diff mbox series

[net] net/packet: Fix a comment about hard_header_len and headroom allocation

Message ID 20200906031827.16819-1-xie.he.0141@gmail.com
State Changes Requested
Delegated to: David Miller
Headers show
Series [net] net/packet: Fix a comment about hard_header_len and headroom allocation | expand

Commit Message

Xie He Sept. 6, 2020, 3:18 a.m. UTC
This comment is outdated and no longer reflects the actual implementation
of af_packet.c.

Reasons for the new comment:

1.

In this file, the function packet_snd first reserves a headroom of
length (dev->hard_header_len + dev->needed_headroom).
Then if the socket is a SOCK_DGRAM socket, it calls dev_hard_header,
which calls dev->header_ops->create, to create the link layer header.
If the socket is a SOCK_RAW socket, it "un-reserves" a headroom of
length (dev->hard_header_len), and checks if the user has provided a
header of length (dev->hard_header_len) (in dev_validate_header).
This shows the developers of af_packet.c expect hard_header_len to
be consistent with header_ops.

2.

In this file, the function packet_sendmsg_spkt has a FIXME comment.
That comment states that prepending an LL header internally in a driver
is considered a bug. I believe this bug can be fixed by setting
hard_header_len to 0, making the internal header completely invisible
to af_packet.c (and requesting the headroom in needed_headroom instead).

3.

There is a commit for a WiFi driver:
commit 9454f7a895b8 ("mwifiex: set needed_headroom, not hard_header_len")
According to the discussion about it at:
  https://patchwork.kernel.org/patch/11407493/
The author tried to set the WiFi driver's hard_header_len to the Ethernet
header length, and request additional header space internally needed by
setting needed_headroom. This means this usage is already adopted by
driver developers.

Cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Brian Norris <briannorris@chromium.org>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Xie He <xie.he.0141@gmail.com>
---
 net/packet/af_packet.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

Comments

Willem de Bruijn Sept. 7, 2020, 9:40 a.m. UTC | #1
On Sun, Sep 6, 2020 at 5:18 AM Xie He <xie.he.0141@gmail.com> wrote:
>
> This comment is outdated and no longer reflects the actual implementation
> of af_packet.c.

If it was previously true, can you point to a commit that changes the behavior?

>
> Reasons for the new comment:
>
> 1.
>
> In this file, the function packet_snd first reserves a headroom of
> length (dev->hard_header_len + dev->needed_headroom).
> Then if the socket is a SOCK_DGRAM socket, it calls dev_hard_header,
> which calls dev->header_ops->create, to create the link layer header.
> If the socket is a SOCK_RAW socket, it "un-reserves" a headroom of
> length (dev->hard_header_len), and checks if the user has provided a
> header of length (dev->hard_header_len) (in dev_validate_header).

Not entirely, a header greater than dev->min_header_len that passes
dev_validate_header.

> This shows the developers of af_packet.c expect hard_header_len to
> be consistent with header_ops.
>
> 2.
>
> In this file, the function packet_sendmsg_spkt has a FIXME comment.
> That comment states that prepending an LL header internally in a driver
> is considered a bug. I believe this bug can be fixed by setting
> hard_header_len to 0, making the internal header completely invisible
> to af_packet.c (and requesting the headroom in needed_headroom instead).

Ack.

> 3.
>
> There is a commit for a WiFi driver:
> commit 9454f7a895b8 ("mwifiex: set needed_headroom, not hard_header_len")
> According to the discussion about it at:
>   https://patchwork.kernel.org/patch/11407493/
> The author tried to set the WiFi driver's hard_header_len to the Ethernet
> header length, and request additional header space internally needed by
> setting needed_headroom. This means this usage is already adopted by
> driver developers.
>
> Cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
> Cc: Eric Dumazet <eric.dumazet@gmail.com>
> Cc: Brian Norris <briannorris@chromium.org>
> Cc: Cong Wang <xiyou.wangcong@gmail.com>
> Signed-off-by: Xie He <xie.he.0141@gmail.com>
> ---
>  net/packet/af_packet.c | 15 +++++++++------
>  1 file changed, 9 insertions(+), 6 deletions(-)
>
> diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
> index 2b33e977a905..c808c76efa71 100644
> --- a/net/packet/af_packet.c
> +++ b/net/packet/af_packet.c
> @@ -93,12 +93,15 @@
>
>  /*
>     Assumptions:
> -   - if device has no dev->hard_header routine, it adds and removes ll header
> -     inside itself. In this case ll header is invisible outside of device,
> -     but higher levels still should reserve dev->hard_header_len.
> -     Some devices are enough clever to reallocate skb, when header
> -     will not fit to reserved space (tunnel), another ones are silly
> -     (PPP).
> +   - If the device has no dev->header_ops, there is no LL header visible
> +     outside of the device. In this case, its hard_header_len should be 0.

Such a constraint is more robustly captured with a compile time
BUILD_BUG_ON check. Please do add a comment that summarizes why the
invariant holds.

More about the older comment, but if reusing: it's not entirely clear
to me what "outside of the device" means. The upper layers that
receive data from the device and send data to it, including
packet_snd, I suppose? Not the lower layers, clearly. Maybe that can
be more specific.

> +     The device may prepend its own header internally. In this case, its
> +     needed_headroom should be set to the space needed for it to add its
> +     internal header.
> +     For example, a WiFi driver pretending to be an Ethernet driver should
> +     set its hard_header_len to be the Ethernet header length, and set its
> +     needed_headroom to be (the real WiFi header length - the fake Ethernet
> +     header length).
>     - packet socket receives packets with pulled ll header,
>       so that SOCK_RAW should push it back.
>
> --
> 2.25.1
>
Xie He Sept. 8, 2020, 12:07 a.m. UTC | #2
Thank you for your comment!

On Mon, Sep 7, 2020 at 2:41 AM Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
>
> On Sun, Sep 6, 2020 at 5:18 AM Xie He <xie.he.0141@gmail.com> wrote:
> >
> > This comment is outdated and no longer reflects the actual implementation
> > of af_packet.c.
>
> If it was previously true, can you point to a commit that changes the behavior?

This is my understanding about the history of "af_packet.c":

1. Pre git history

At first, before "needed_headroom" was introduced, "hard_header_len"
was the only way for a driver to request headroom. However,
"hard_header_len" was also used in "af_packet.c" for processing the
header. There was a confusion / disagreement between "af_packet.c"
developers and driver developers about the use of "hard_header_len".
"af_packet.c" developers would assume that all headers were visible to
them through dev->header_ops (called dev->hard_header at that time?).
But the developers of some drivers were not able to expose all their
headers to "af_packet.c" through header_ops (for example, in tunnel
drivers). These drivers still requested the headroom via
"hard_header_len" but this created bugs for "af_packet.c" because
"af_packet.c" would assume "hard_header_len" was the length of the
header visible to them through header_ops.

Therefore, in Linux version 2.1.43pre1, the FIXME comment was added.
In this comment, "af_packet.c" developers clearly stated that not
exposing the header through header_ops was a bug that needed to be
fixed in the drivers. But I think driver developers were not able to
agree because some drivers really had a need to add their own header
without using header_ops (for example in tunnel drivers).

In Linux version 2.1.68, the developer of "af_packet.c" compromised
and recognized the use of "hard_header_len" even when there is no
header_ops, by adding the comment I'm trying to change now. But I
guess some other developers of "af_packet.c" continued to treat
"hard_header_len" to be the length of header of header_ops and created
a lot of problems.

2. Introduction of "needed_headroom"

Because this issue has troubled for developers for long, in 2008,
developers introduced "needed_headroom" to solve this problem.
"needed_headroom" has only one purpose - reserve headroom. It is not
used in af_packet.c for processing so drivers can safely use it to
request headroom without exposing the header via header_ops.

The commit was:
commit f5184d267c1a ("net: Allow netdevices to specify needed head/tailroom")

After "needed_headroom" was introduced, all drivers that needed to
reserve the headroom but didn't want "af_packet.c" to interfere should
change to "needed_headroom".

From this point on, "af_packet.c" developers were able to assume
"hard_header_len" was only used for header processing purposes in
"af_packet.c".

3. Not reserving the headroom of hard_header_len for RAW sockets

Another very important point in history is these two commits in 2018:
commit b84bbaf7a6c8 ("packet: in packet_snd start writing at link
layer allocation")
commit 9aad13b087ab ("packet: fix reserve calculation")

These two commits changed packet_snd to the present state and made it
no long reserve the headroom of hard_header_len for RAW sockets. This
made drivers' switching from hard_header_len to needed_headroom became
urgent because otherwise they might have a kernel panic when used with
RAW sockets.

> > In this file, the function packet_snd first reserves a headroom of
> > length (dev->hard_header_len + dev->needed_headroom).
> > Then if the socket is a SOCK_DGRAM socket, it calls dev_hard_header,
> > which calls dev->header_ops->create, to create the link layer header.
> > If the socket is a SOCK_RAW socket, it "un-reserves" a headroom of
> > length (dev->hard_header_len), and checks if the user has provided a
> > header of length (dev->hard_header_len) (in dev_validate_header).
>
> Not entirely, a header greater than dev->min_header_len that passes
> dev_validate_header.

Yes, I understand. The function checks both hard_header_len and
min_header_len. I want to explain the role of hard_header_len in
dev_validate_header. But I feel a little hard to concisely explain
this without simplifying a little bit.

> >  /*
> >     Assumptions:
> > -   - if device has no dev->hard_header routine, it adds and removes ll header
> > -     inside itself. In this case ll header is invisible outside of device,
> > -     but higher levels still should reserve dev->hard_header_len.
> > -     Some devices are enough clever to reallocate skb, when header
> > -     will not fit to reserved space (tunnel), another ones are silly
> > -     (PPP).
> > +   - If the device has no dev->header_ops, there is no LL header visible
> > +     outside of the device. In this case, its hard_header_len should be 0.
>
> Such a constraint is more robustly captured with a compile time
> BUILD_BUG_ON check. Please do add a comment that summarizes why the
> invariant holds.

I'm not sure how to do this. I guess both header_ops and
hard_header_len are assigned at runtime. (Right?) I guess we are not
able to check this at compile-time.

> More about the older comment, but if reusing: it's not entirely clear
> to me what "outside of the device" means. The upper layers that
> receive data from the device and send data to it, including
> packet_snd, I suppose? Not the lower layers, clearly. Maybe that can
> be more specific.

Yes, right. If a header is visible "outside of the device", it means
the header is exposed to upper layers via "header_ops". If a header is
not visible "outside of the device" and is only used "internally", it
means the header is not exposed to upper layers via "header_ops".
Maybe we can change it to "outside of the device driver"? We can
borrow the idea of encapsulation in object-oriented programming - some
things that happen inside a software component should not be visible
outside of that software component.
Willem de Bruijn Sept. 8, 2020, 8:55 a.m. UTC | #3
On Tue, Sep 8, 2020 at 2:07 AM Xie He <xie.he.0141@gmail.com> wrote:
>
> Thank you for your comment!
>
> On Mon, Sep 7, 2020 at 2:41 AM Willem de Bruijn
> <willemdebruijn.kernel@gmail.com> wrote:
> >
> > On Sun, Sep 6, 2020 at 5:18 AM Xie He <xie.he.0141@gmail.com> wrote:
> > >
> > > This comment is outdated and no longer reflects the actual implementation
> > > of af_packet.c.
> >
> > If it was previously true, can you point to a commit that changes the behavior?
>
> This is my understanding about the history of "af_packet.c":
>
> 1. Pre git history
>
> At first, before "needed_headroom" was introduced, "hard_header_len"
> was the only way for a driver to request headroom. However,
> "hard_header_len" was also used in "af_packet.c" for processing the
> header. There was a confusion / disagreement between "af_packet.c"
> developers and driver developers about the use of "hard_header_len".
> "af_packet.c" developers would assume that all headers were visible to
> them through dev->header_ops (called dev->hard_header at that time?).
> But the developers of some drivers were not able to expose all their
> headers to "af_packet.c" through header_ops (for example, in tunnel
> drivers). These drivers still requested the headroom via
> "hard_header_len" but this created bugs for "af_packet.c" because
> "af_packet.c" would assume "hard_header_len" was the length of the
> header visible to them through header_ops.
>
> Therefore, in Linux version 2.1.43pre1, the FIXME comment was added.
> In this comment, "af_packet.c" developers clearly stated that not
> exposing the header through header_ops was a bug that needed to be
> fixed in the drivers. But I think driver developers were not able to
> agree because some drivers really had a need to add their own header
> without using header_ops (for example in tunnel drivers).
>
> In Linux version 2.1.68, the developer of "af_packet.c" compromised
> and recognized the use of "hard_header_len" even when there is no
> header_ops, by adding the comment I'm trying to change now. But I
> guess some other developers of "af_packet.c" continued to treat
> "hard_header_len" to be the length of header of header_ops and created
> a lot of problems.
>
> 2. Introduction of "needed_headroom"
>
> Because this issue has troubled for developers for long, in 2008,
> developers introduced "needed_headroom" to solve this problem.
> "needed_headroom" has only one purpose - reserve headroom. It is not
> used in af_packet.c for processing so drivers can safely use it to
> request headroom without exposing the header via header_ops.
>
> The commit was:
> commit f5184d267c1a ("net: Allow netdevices to specify needed head/tailroom")
>
> After "needed_headroom" was introduced, all drivers that needed to
> reserve the headroom but didn't want "af_packet.c" to interfere should
> change to "needed_headroom".
>
> From this point on, "af_packet.c" developers were able to assume
> "hard_header_len" was only used for header processing purposes in
> "af_packet.c".

Very nice archeology!

Thanks for summarizing.

> 3. Not reserving the headroom of hard_header_len for RAW sockets
>
> Another very important point in history is these two commits in 2018:
> commit b84bbaf7a6c8 ("packet: in packet_snd start writing at link
> layer allocation")
> commit 9aad13b087ab ("packet: fix reserve calculation")
>
> These two commits changed packet_snd to the present state and made it
> no long reserve the headroom of hard_header_len for RAW sockets. This
> made drivers' switching from hard_header_len to needed_headroom became
> urgent because otherwise they might have a kernel panic when used with
> RAW sockets.
>
> > > In this file, the function packet_snd first reserves a headroom of
> > > length (dev->hard_header_len + dev->needed_headroom).
> > > Then if the socket is a SOCK_DGRAM socket, it calls dev_hard_header,
> > > which calls dev->header_ops->create, to create the link layer header.
> > > If the socket is a SOCK_RAW socket, it "un-reserves" a headroom of
> > > length (dev->hard_header_len), and checks if the user has provided a
> > > header of length (dev->hard_header_len) (in dev_validate_header).
> >
> > Not entirely, a header greater than dev->min_header_len that passes
> > dev_validate_header.
>
> Yes, I understand. The function checks both hard_header_len and
> min_header_len. I want to explain the role of hard_header_len in
> dev_validate_header. But I feel a little hard to concisely explain
> this without simplifying a little bit.

Ack.

> > >  /*
> > >     Assumptions:
> > > -   - if device has no dev->hard_header routine, it adds and removes ll header
> > > -     inside itself. In this case ll header is invisible outside of device,
> > > -     but higher levels still should reserve dev->hard_header_len.
> > > -     Some devices are enough clever to reallocate skb, when header
> > > -     will not fit to reserved space (tunnel), another ones are silly
> > > -     (PPP).
> > > +   - If the device has no dev->header_ops, there is no LL header visible
> > > +     outside of the device. In this case, its hard_header_len should be 0.
> >
> > Such a constraint is more robustly captured with a compile time
> > BUILD_BUG_ON check. Please do add a comment that summarizes why the
> > invariant holds.
>
> I'm not sure how to do this. I guess both header_ops and
> hard_header_len are assigned at runtime. (Right?) I guess we are not
> able to check this at compile-time.

header_ops should be compile constant, and most devices use
struct initializers for hard_header_len, but of course you're right.

Perhaps a WARN_ON_ONCE, then.

> > More about the older comment, but if reusing: it's not entirely clear
> > to me what "outside of the device" means. The upper layers that
> > receive data from the device and send data to it, including
> > packet_snd, I suppose? Not the lower layers, clearly. Maybe that can
> > be more specific.
>
> Yes, right. If a header is visible "outside of the device", it means
> the header is exposed to upper layers via "header_ops". If a header is
> not visible "outside of the device" and is only used "internally", it
> means the header is not exposed to upper layers via "header_ops".
> Maybe we can change it to "outside of the device driver"? We can
> borrow the idea of encapsulation in object-oriented programming - some
> things that happen inside a software component should not be visible
> outside of that software component.

How about "above"? If sketched as a network stack diagram, the code
paths and devices below the (possibly tunnel) device do see packets
with link layer header.
Xie He Sept. 8, 2020, 11:27 a.m. UTC | #4
On Tue, Sep 8, 2020 at 1:56 AM Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
>
> > > >  /*
> > > >     Assumptions:
> > > > -   - if device has no dev->hard_header routine, it adds and removes ll header
> > > > -     inside itself. In this case ll header is invisible outside of device,
> > > > -     but higher levels still should reserve dev->hard_header_len.
> > > > -     Some devices are enough clever to reallocate skb, when header
> > > > -     will not fit to reserved space (tunnel), another ones are silly
> > > > -     (PPP).
> > > > +   - If the device has no dev->header_ops, there is no LL header visible
> > > > +     outside of the device. In this case, its hard_header_len should be 0.
> > >
> > > Such a constraint is more robustly captured with a compile time
> > > BUILD_BUG_ON check. Please do add a comment that summarizes why the
> > > invariant holds.
> >
> > I'm not sure how to do this. I guess both header_ops and
> > hard_header_len are assigned at runtime. (Right?) I guess we are not
> > able to check this at compile-time.
>
> header_ops should be compile constant, and most devices use
> struct initializers for hard_header_len, but of course you're right.
>
> Perhaps a WARN_ON_ONCE, then.

OK. Thank you for your suggestion! Actually I was not aware of these
macros. So thank you for introducing them to me! I'll surely add this
warning.

> > > More about the older comment, but if reusing: it's not entirely clear
> > > to me what "outside of the device" means. The upper layers that
> > > receive data from the device and send data to it, including
> > > packet_snd, I suppose? Not the lower layers, clearly. Maybe that can
> > > be more specific.
> >
> > Yes, right. If a header is visible "outside of the device", it means
> > the header is exposed to upper layers via "header_ops". If a header is
> > not visible "outside of the device" and is only used "internally", it
> > means the header is not exposed to upper layers via "header_ops".
> > Maybe we can change it to "outside of the device driver"? We can
> > borrow the idea of encapsulation in object-oriented programming - some
> > things that happen inside a software component should not be visible
> > outside of that software component.
>
> How about "above"? If sketched as a network stack diagram, the code
> paths and devices below the (possibly tunnel) device do see packets
> with link layer header.

OK. I understand what you mean now. We need to make it clear that the
header is only invisible to upper layers but not to "lower layers"
that the device may rely on.

I'm thinking about a way to clearly phrase this. "Above the device"
might be confusing to people. Do you think this is good: "invisible to
upper-layer code including the code in af_packet.c"? Or simply
"invisible to upper-layer code"? Or just "invisible to upper layers"?
(I don't like the last one because I feel according to the network
stack diagram "upper layers" should already and always not care about
the LL header.)
Willem de Bruijn Sept. 8, 2020, 1:55 p.m. UTC | #5
> > > > More about the older comment, but if reusing: it's not entirely clear
> > > > to me what "outside of the device" means. The upper layers that
> > > > receive data from the device and send data to it, including
> > > > packet_snd, I suppose? Not the lower layers, clearly. Maybe that can
> > > > be more specific.
> > >
> > > Yes, right. If a header is visible "outside of the device", it means
> > > the header is exposed to upper layers via "header_ops". If a header is
> > > not visible "outside of the device" and is only used "internally", it
> > > means the header is not exposed to upper layers via "header_ops".
> > > Maybe we can change it to "outside of the device driver"? We can
> > > borrow the idea of encapsulation in object-oriented programming - some
> > > things that happen inside a software component should not be visible
> > > outside of that software component.
> >
> > How about "above"? If sketched as a network stack diagram, the code
> > paths and devices below the (possibly tunnel) device do see packets
> > with link layer header.
>
> OK. I understand what you mean now. We need to make it clear that the
> header is only invisible to upper layers but not to "lower layers"
> that the device may rely on.
>
> I'm thinking about a way to clearly phrase this. "Above the device"
> might be confusing to people. Do you think this is good: "invisible to
> upper-layer code including the code in af_packet.c"? Or simply
> "invisible to upper-layer code"? Or just "invisible to upper layers"?
> (I don't like the last one because I feel according to the network
> stack diagram "upper layers" should already and always not care about
> the LL header.)

Upper layers is often understood to imply the network stack diagram
indeed, excluding other stacks, such as virtual devices or packet
sockets. Hence above. But either works. The commit message will
disambiguate.
diff mbox series

Patch

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 2b33e977a905..c808c76efa71 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -93,12 +93,15 @@ 
 
 /*
    Assumptions:
-   - if device has no dev->hard_header routine, it adds and removes ll header
-     inside itself. In this case ll header is invisible outside of device,
-     but higher levels still should reserve dev->hard_header_len.
-     Some devices are enough clever to reallocate skb, when header
-     will not fit to reserved space (tunnel), another ones are silly
-     (PPP).
+   - If the device has no dev->header_ops, there is no LL header visible
+     outside of the device. In this case, its hard_header_len should be 0.
+     The device may prepend its own header internally. In this case, its
+     needed_headroom should be set to the space needed for it to add its
+     internal header.
+     For example, a WiFi driver pretending to be an Ethernet driver should
+     set its hard_header_len to be the Ethernet header length, and set its
+     needed_headroom to be (the real WiFi header length - the fake Ethernet
+     header length).
    - packet socket receives packets with pulled ll header,
      so that SOCK_RAW should push it back.