Message ID | 20200906031827.16819-1-xie.he.0141@gmail.com |
---|---|
State | Changes Requested |
Delegated to: | David Miller |
Headers | show |
Series | [net] net/packet: Fix a comment about hard_header_len and headroom allocation | expand |
On Sun, Sep 6, 2020 at 5:18 AM Xie He <xie.he.0141@gmail.com> wrote: > > This comment is outdated and no longer reflects the actual implementation > of af_packet.c. If it was previously true, can you point to a commit that changes the behavior? > > Reasons for the new comment: > > 1. > > In this file, the function packet_snd first reserves a headroom of > length (dev->hard_header_len + dev->needed_headroom). > Then if the socket is a SOCK_DGRAM socket, it calls dev_hard_header, > which calls dev->header_ops->create, to create the link layer header. > If the socket is a SOCK_RAW socket, it "un-reserves" a headroom of > length (dev->hard_header_len), and checks if the user has provided a > header of length (dev->hard_header_len) (in dev_validate_header). Not entirely, a header greater than dev->min_header_len that passes dev_validate_header. > This shows the developers of af_packet.c expect hard_header_len to > be consistent with header_ops. > > 2. > > In this file, the function packet_sendmsg_spkt has a FIXME comment. > That comment states that prepending an LL header internally in a driver > is considered a bug. I believe this bug can be fixed by setting > hard_header_len to 0, making the internal header completely invisible > to af_packet.c (and requesting the headroom in needed_headroom instead). Ack. > 3. > > There is a commit for a WiFi driver: > commit 9454f7a895b8 ("mwifiex: set needed_headroom, not hard_header_len") > According to the discussion about it at: > https://patchwork.kernel.org/patch/11407493/ > The author tried to set the WiFi driver's hard_header_len to the Ethernet > header length, and request additional header space internally needed by > setting needed_headroom. This means this usage is already adopted by > driver developers. > > Cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com> > Cc: Eric Dumazet <eric.dumazet@gmail.com> > Cc: Brian Norris <briannorris@chromium.org> > Cc: Cong Wang <xiyou.wangcong@gmail.com> > Signed-off-by: Xie He <xie.he.0141@gmail.com> > --- > net/packet/af_packet.c | 15 +++++++++------ > 1 file changed, 9 insertions(+), 6 deletions(-) > > diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c > index 2b33e977a905..c808c76efa71 100644 > --- a/net/packet/af_packet.c > +++ b/net/packet/af_packet.c > @@ -93,12 +93,15 @@ > > /* > Assumptions: > - - if device has no dev->hard_header routine, it adds and removes ll header > - inside itself. In this case ll header is invisible outside of device, > - but higher levels still should reserve dev->hard_header_len. > - Some devices are enough clever to reallocate skb, when header > - will not fit to reserved space (tunnel), another ones are silly > - (PPP). > + - If the device has no dev->header_ops, there is no LL header visible > + outside of the device. In this case, its hard_header_len should be 0. Such a constraint is more robustly captured with a compile time BUILD_BUG_ON check. Please do add a comment that summarizes why the invariant holds. More about the older comment, but if reusing: it's not entirely clear to me what "outside of the device" means. The upper layers that receive data from the device and send data to it, including packet_snd, I suppose? Not the lower layers, clearly. Maybe that can be more specific. > + The device may prepend its own header internally. In this case, its > + needed_headroom should be set to the space needed for it to add its > + internal header. > + For example, a WiFi driver pretending to be an Ethernet driver should > + set its hard_header_len to be the Ethernet header length, and set its > + needed_headroom to be (the real WiFi header length - the fake Ethernet > + header length). > - packet socket receives packets with pulled ll header, > so that SOCK_RAW should push it back. > > -- > 2.25.1 >
Thank you for your comment! On Mon, Sep 7, 2020 at 2:41 AM Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote: > > On Sun, Sep 6, 2020 at 5:18 AM Xie He <xie.he.0141@gmail.com> wrote: > > > > This comment is outdated and no longer reflects the actual implementation > > of af_packet.c. > > If it was previously true, can you point to a commit that changes the behavior? This is my understanding about the history of "af_packet.c": 1. Pre git history At first, before "needed_headroom" was introduced, "hard_header_len" was the only way for a driver to request headroom. However, "hard_header_len" was also used in "af_packet.c" for processing the header. There was a confusion / disagreement between "af_packet.c" developers and driver developers about the use of "hard_header_len". "af_packet.c" developers would assume that all headers were visible to them through dev->header_ops (called dev->hard_header at that time?). But the developers of some drivers were not able to expose all their headers to "af_packet.c" through header_ops (for example, in tunnel drivers). These drivers still requested the headroom via "hard_header_len" but this created bugs for "af_packet.c" because "af_packet.c" would assume "hard_header_len" was the length of the header visible to them through header_ops. Therefore, in Linux version 2.1.43pre1, the FIXME comment was added. In this comment, "af_packet.c" developers clearly stated that not exposing the header through header_ops was a bug that needed to be fixed in the drivers. But I think driver developers were not able to agree because some drivers really had a need to add their own header without using header_ops (for example in tunnel drivers). In Linux version 2.1.68, the developer of "af_packet.c" compromised and recognized the use of "hard_header_len" even when there is no header_ops, by adding the comment I'm trying to change now. But I guess some other developers of "af_packet.c" continued to treat "hard_header_len" to be the length of header of header_ops and created a lot of problems. 2. Introduction of "needed_headroom" Because this issue has troubled for developers for long, in 2008, developers introduced "needed_headroom" to solve this problem. "needed_headroom" has only one purpose - reserve headroom. It is not used in af_packet.c for processing so drivers can safely use it to request headroom without exposing the header via header_ops. The commit was: commit f5184d267c1a ("net: Allow netdevices to specify needed head/tailroom") After "needed_headroom" was introduced, all drivers that needed to reserve the headroom but didn't want "af_packet.c" to interfere should change to "needed_headroom". From this point on, "af_packet.c" developers were able to assume "hard_header_len" was only used for header processing purposes in "af_packet.c". 3. Not reserving the headroom of hard_header_len for RAW sockets Another very important point in history is these two commits in 2018: commit b84bbaf7a6c8 ("packet: in packet_snd start writing at link layer allocation") commit 9aad13b087ab ("packet: fix reserve calculation") These two commits changed packet_snd to the present state and made it no long reserve the headroom of hard_header_len for RAW sockets. This made drivers' switching from hard_header_len to needed_headroom became urgent because otherwise they might have a kernel panic when used with RAW sockets. > > In this file, the function packet_snd first reserves a headroom of > > length (dev->hard_header_len + dev->needed_headroom). > > Then if the socket is a SOCK_DGRAM socket, it calls dev_hard_header, > > which calls dev->header_ops->create, to create the link layer header. > > If the socket is a SOCK_RAW socket, it "un-reserves" a headroom of > > length (dev->hard_header_len), and checks if the user has provided a > > header of length (dev->hard_header_len) (in dev_validate_header). > > Not entirely, a header greater than dev->min_header_len that passes > dev_validate_header. Yes, I understand. The function checks both hard_header_len and min_header_len. I want to explain the role of hard_header_len in dev_validate_header. But I feel a little hard to concisely explain this without simplifying a little bit. > > /* > > Assumptions: > > - - if device has no dev->hard_header routine, it adds and removes ll header > > - inside itself. In this case ll header is invisible outside of device, > > - but higher levels still should reserve dev->hard_header_len. > > - Some devices are enough clever to reallocate skb, when header > > - will not fit to reserved space (tunnel), another ones are silly > > - (PPP). > > + - If the device has no dev->header_ops, there is no LL header visible > > + outside of the device. In this case, its hard_header_len should be 0. > > Such a constraint is more robustly captured with a compile time > BUILD_BUG_ON check. Please do add a comment that summarizes why the > invariant holds. I'm not sure how to do this. I guess both header_ops and hard_header_len are assigned at runtime. (Right?) I guess we are not able to check this at compile-time. > More about the older comment, but if reusing: it's not entirely clear > to me what "outside of the device" means. The upper layers that > receive data from the device and send data to it, including > packet_snd, I suppose? Not the lower layers, clearly. Maybe that can > be more specific. Yes, right. If a header is visible "outside of the device", it means the header is exposed to upper layers via "header_ops". If a header is not visible "outside of the device" and is only used "internally", it means the header is not exposed to upper layers via "header_ops". Maybe we can change it to "outside of the device driver"? We can borrow the idea of encapsulation in object-oriented programming - some things that happen inside a software component should not be visible outside of that software component.
On Tue, Sep 8, 2020 at 2:07 AM Xie He <xie.he.0141@gmail.com> wrote: > > Thank you for your comment! > > On Mon, Sep 7, 2020 at 2:41 AM Willem de Bruijn > <willemdebruijn.kernel@gmail.com> wrote: > > > > On Sun, Sep 6, 2020 at 5:18 AM Xie He <xie.he.0141@gmail.com> wrote: > > > > > > This comment is outdated and no longer reflects the actual implementation > > > of af_packet.c. > > > > If it was previously true, can you point to a commit that changes the behavior? > > This is my understanding about the history of "af_packet.c": > > 1. Pre git history > > At first, before "needed_headroom" was introduced, "hard_header_len" > was the only way for a driver to request headroom. However, > "hard_header_len" was also used in "af_packet.c" for processing the > header. There was a confusion / disagreement between "af_packet.c" > developers and driver developers about the use of "hard_header_len". > "af_packet.c" developers would assume that all headers were visible to > them through dev->header_ops (called dev->hard_header at that time?). > But the developers of some drivers were not able to expose all their > headers to "af_packet.c" through header_ops (for example, in tunnel > drivers). These drivers still requested the headroom via > "hard_header_len" but this created bugs for "af_packet.c" because > "af_packet.c" would assume "hard_header_len" was the length of the > header visible to them through header_ops. > > Therefore, in Linux version 2.1.43pre1, the FIXME comment was added. > In this comment, "af_packet.c" developers clearly stated that not > exposing the header through header_ops was a bug that needed to be > fixed in the drivers. But I think driver developers were not able to > agree because some drivers really had a need to add their own header > without using header_ops (for example in tunnel drivers). > > In Linux version 2.1.68, the developer of "af_packet.c" compromised > and recognized the use of "hard_header_len" even when there is no > header_ops, by adding the comment I'm trying to change now. But I > guess some other developers of "af_packet.c" continued to treat > "hard_header_len" to be the length of header of header_ops and created > a lot of problems. > > 2. Introduction of "needed_headroom" > > Because this issue has troubled for developers for long, in 2008, > developers introduced "needed_headroom" to solve this problem. > "needed_headroom" has only one purpose - reserve headroom. It is not > used in af_packet.c for processing so drivers can safely use it to > request headroom without exposing the header via header_ops. > > The commit was: > commit f5184d267c1a ("net: Allow netdevices to specify needed head/tailroom") > > After "needed_headroom" was introduced, all drivers that needed to > reserve the headroom but didn't want "af_packet.c" to interfere should > change to "needed_headroom". > > From this point on, "af_packet.c" developers were able to assume > "hard_header_len" was only used for header processing purposes in > "af_packet.c". Very nice archeology! Thanks for summarizing. > 3. Not reserving the headroom of hard_header_len for RAW sockets > > Another very important point in history is these two commits in 2018: > commit b84bbaf7a6c8 ("packet: in packet_snd start writing at link > layer allocation") > commit 9aad13b087ab ("packet: fix reserve calculation") > > These two commits changed packet_snd to the present state and made it > no long reserve the headroom of hard_header_len for RAW sockets. This > made drivers' switching from hard_header_len to needed_headroom became > urgent because otherwise they might have a kernel panic when used with > RAW sockets. > > > > In this file, the function packet_snd first reserves a headroom of > > > length (dev->hard_header_len + dev->needed_headroom). > > > Then if the socket is a SOCK_DGRAM socket, it calls dev_hard_header, > > > which calls dev->header_ops->create, to create the link layer header. > > > If the socket is a SOCK_RAW socket, it "un-reserves" a headroom of > > > length (dev->hard_header_len), and checks if the user has provided a > > > header of length (dev->hard_header_len) (in dev_validate_header). > > > > Not entirely, a header greater than dev->min_header_len that passes > > dev_validate_header. > > Yes, I understand. The function checks both hard_header_len and > min_header_len. I want to explain the role of hard_header_len in > dev_validate_header. But I feel a little hard to concisely explain > this without simplifying a little bit. Ack. > > > /* > > > Assumptions: > > > - - if device has no dev->hard_header routine, it adds and removes ll header > > > - inside itself. In this case ll header is invisible outside of device, > > > - but higher levels still should reserve dev->hard_header_len. > > > - Some devices are enough clever to reallocate skb, when header > > > - will not fit to reserved space (tunnel), another ones are silly > > > - (PPP). > > > + - If the device has no dev->header_ops, there is no LL header visible > > > + outside of the device. In this case, its hard_header_len should be 0. > > > > Such a constraint is more robustly captured with a compile time > > BUILD_BUG_ON check. Please do add a comment that summarizes why the > > invariant holds. > > I'm not sure how to do this. I guess both header_ops and > hard_header_len are assigned at runtime. (Right?) I guess we are not > able to check this at compile-time. header_ops should be compile constant, and most devices use struct initializers for hard_header_len, but of course you're right. Perhaps a WARN_ON_ONCE, then. > > More about the older comment, but if reusing: it's not entirely clear > > to me what "outside of the device" means. The upper layers that > > receive data from the device and send data to it, including > > packet_snd, I suppose? Not the lower layers, clearly. Maybe that can > > be more specific. > > Yes, right. If a header is visible "outside of the device", it means > the header is exposed to upper layers via "header_ops". If a header is > not visible "outside of the device" and is only used "internally", it > means the header is not exposed to upper layers via "header_ops". > Maybe we can change it to "outside of the device driver"? We can > borrow the idea of encapsulation in object-oriented programming - some > things that happen inside a software component should not be visible > outside of that software component. How about "above"? If sketched as a network stack diagram, the code paths and devices below the (possibly tunnel) device do see packets with link layer header.
On Tue, Sep 8, 2020 at 1:56 AM Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote: > > > > > /* > > > > Assumptions: > > > > - - if device has no dev->hard_header routine, it adds and removes ll header > > > > - inside itself. In this case ll header is invisible outside of device, > > > > - but higher levels still should reserve dev->hard_header_len. > > > > - Some devices are enough clever to reallocate skb, when header > > > > - will not fit to reserved space (tunnel), another ones are silly > > > > - (PPP). > > > > + - If the device has no dev->header_ops, there is no LL header visible > > > > + outside of the device. In this case, its hard_header_len should be 0. > > > > > > Such a constraint is more robustly captured with a compile time > > > BUILD_BUG_ON check. Please do add a comment that summarizes why the > > > invariant holds. > > > > I'm not sure how to do this. I guess both header_ops and > > hard_header_len are assigned at runtime. (Right?) I guess we are not > > able to check this at compile-time. > > header_ops should be compile constant, and most devices use > struct initializers for hard_header_len, but of course you're right. > > Perhaps a WARN_ON_ONCE, then. OK. Thank you for your suggestion! Actually I was not aware of these macros. So thank you for introducing them to me! I'll surely add this warning. > > > More about the older comment, but if reusing: it's not entirely clear > > > to me what "outside of the device" means. The upper layers that > > > receive data from the device and send data to it, including > > > packet_snd, I suppose? Not the lower layers, clearly. Maybe that can > > > be more specific. > > > > Yes, right. If a header is visible "outside of the device", it means > > the header is exposed to upper layers via "header_ops". If a header is > > not visible "outside of the device" and is only used "internally", it > > means the header is not exposed to upper layers via "header_ops". > > Maybe we can change it to "outside of the device driver"? We can > > borrow the idea of encapsulation in object-oriented programming - some > > things that happen inside a software component should not be visible > > outside of that software component. > > How about "above"? If sketched as a network stack diagram, the code > paths and devices below the (possibly tunnel) device do see packets > with link layer header. OK. I understand what you mean now. We need to make it clear that the header is only invisible to upper layers but not to "lower layers" that the device may rely on. I'm thinking about a way to clearly phrase this. "Above the device" might be confusing to people. Do you think this is good: "invisible to upper-layer code including the code in af_packet.c"? Or simply "invisible to upper-layer code"? Or just "invisible to upper layers"? (I don't like the last one because I feel according to the network stack diagram "upper layers" should already and always not care about the LL header.)
> > > > More about the older comment, but if reusing: it's not entirely clear > > > > to me what "outside of the device" means. The upper layers that > > > > receive data from the device and send data to it, including > > > > packet_snd, I suppose? Not the lower layers, clearly. Maybe that can > > > > be more specific. > > > > > > Yes, right. If a header is visible "outside of the device", it means > > > the header is exposed to upper layers via "header_ops". If a header is > > > not visible "outside of the device" and is only used "internally", it > > > means the header is not exposed to upper layers via "header_ops". > > > Maybe we can change it to "outside of the device driver"? We can > > > borrow the idea of encapsulation in object-oriented programming - some > > > things that happen inside a software component should not be visible > > > outside of that software component. > > > > How about "above"? If sketched as a network stack diagram, the code > > paths and devices below the (possibly tunnel) device do see packets > > with link layer header. > > OK. I understand what you mean now. We need to make it clear that the > header is only invisible to upper layers but not to "lower layers" > that the device may rely on. > > I'm thinking about a way to clearly phrase this. "Above the device" > might be confusing to people. Do you think this is good: "invisible to > upper-layer code including the code in af_packet.c"? Or simply > "invisible to upper-layer code"? Or just "invisible to upper layers"? > (I don't like the last one because I feel according to the network > stack diagram "upper layers" should already and always not care about > the LL header.) Upper layers is often understood to imply the network stack diagram indeed, excluding other stacks, such as virtual devices or packet sockets. Hence above. But either works. The commit message will disambiguate.
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c index 2b33e977a905..c808c76efa71 100644 --- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -93,12 +93,15 @@ /* Assumptions: - - if device has no dev->hard_header routine, it adds and removes ll header - inside itself. In this case ll header is invisible outside of device, - but higher levels still should reserve dev->hard_header_len. - Some devices are enough clever to reallocate skb, when header - will not fit to reserved space (tunnel), another ones are silly - (PPP). + - If the device has no dev->header_ops, there is no LL header visible + outside of the device. In this case, its hard_header_len should be 0. + The device may prepend its own header internally. In this case, its + needed_headroom should be set to the space needed for it to add its + internal header. + For example, a WiFi driver pretending to be an Ethernet driver should + set its hard_header_len to be the Ethernet header length, and set its + needed_headroom to be (the real WiFi header length - the fake Ethernet + header length). - packet socket receives packets with pulled ll header, so that SOCK_RAW should push it back.
This comment is outdated and no longer reflects the actual implementation of af_packet.c. Reasons for the new comment: 1. In this file, the function packet_snd first reserves a headroom of length (dev->hard_header_len + dev->needed_headroom). Then if the socket is a SOCK_DGRAM socket, it calls dev_hard_header, which calls dev->header_ops->create, to create the link layer header. If the socket is a SOCK_RAW socket, it "un-reserves" a headroom of length (dev->hard_header_len), and checks if the user has provided a header of length (dev->hard_header_len) (in dev_validate_header). This shows the developers of af_packet.c expect hard_header_len to be consistent with header_ops. 2. In this file, the function packet_sendmsg_spkt has a FIXME comment. That comment states that prepending an LL header internally in a driver is considered a bug. I believe this bug can be fixed by setting hard_header_len to 0, making the internal header completely invisible to af_packet.c (and requesting the headroom in needed_headroom instead). 3. There is a commit for a WiFi driver: commit 9454f7a895b8 ("mwifiex: set needed_headroom, not hard_header_len") According to the discussion about it at: https://patchwork.kernel.org/patch/11407493/ The author tried to set the WiFi driver's hard_header_len to the Ethernet header length, and request additional header space internally needed by setting needed_headroom. This means this usage is already adopted by driver developers. Cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Brian Norris <briannorris@chromium.org> Cc: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Xie He <xie.he.0141@gmail.com> --- net/packet/af_packet.c | 15 +++++++++------ 1 file changed, 9 insertions(+), 6 deletions(-)