diff mbox series

[nf] netfilter: nf_reject: init skb->dev for reset packet

Message ID 20240604120311.27300-1-fw@strlen.de
State Not Applicable
Headers show
Series [nf] netfilter: nf_reject: init skb->dev for reset packet | expand

Commit Message

Florian Westphal June 4, 2024, 12:03 p.m. UTC
skb_get_hash() triggers a (harmless) warn when neither skb->sk or skb->dev
is set.

In case of nf-generated tcp reset, both sk and dev are NULL:

WARNING: .. net/core/flow_dissector.c:1104
[..]
 skb_flow_dissect_flow_keys include/linux/skbuff.h:1536 [inline]
 skb_get_hash include/linux/skbuff.h:1578 [inline]
 nft_trace_init+0x7d/0x120 net/netfilter/nf_tables_trace.c:320
 nft_do_chain+0xb26/0xb90 net/netfilter/nf_tables_core.c:268
 nft_do_chain_ipv4+0x7a/0xa0 net/netfilter/nft_chain_filter.c:23
 nf_hook_slow+0x57/0x160 net/netfilter/core.c:626
 __ip_local_out+0x21d/0x260 net/ipv4/ip_output.c:118
 ip_local_out+0x26/0x1e0 net/ipv4/ip_output.c:127
 nf_send_reset+0x58c/0x700 net/ipv4/netfilter/nf_reject_ipv4.c:308
 nft_reject_ipv4_eval+0x53/0x90 net/ipv4/netfilter/nft_reject_ipv4.c:30
 [..]

Fixes: d0e13a1488ad ("flow_dissector: lookup netns by skb->sk if skb->dev is NULL")
Reported-by: Christoph Paasch <cpaasch@apple.com>
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/494
Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/ipv4/netfilter/nf_reject_ipv4.c | 1 +
 net/ipv6/netfilter/nf_reject_ipv6.c | 1 +
 2 files changed, 2 insertions(+)

Comments

Florian Westphal June 5, 2024, 6:14 p.m. UTC | #1
Christoph Paasch <cpaasch@apple.com> wrote:
> > Reported-by: Christoph Paasch <cpaasch@apple.com>
> > Suggested-by: Paolo Abeni <pabeni@redhat.com>
> > Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/494
> > Signed-off-by: Florian Westphal <fw@strlen.de>
> 
> I just gave this one a shot in my syzkaller instances and am still hitting the issue.

No, different bug, this patch is correct.

I refuse to touch the flow dissector.
Pablo Neira Ayuso June 5, 2024, 6:38 p.m. UTC | #2
On Wed, Jun 05, 2024 at 08:14:50PM +0200, Florian Westphal wrote:
> Christoph Paasch <cpaasch@apple.com> wrote:
> > > Reported-by: Christoph Paasch <cpaasch@apple.com>
> > > Suggested-by: Paolo Abeni <pabeni@redhat.com>
> > > Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/494
> > > Signed-off-by: Florian Westphal <fw@strlen.de>
> >
> > I just gave this one a shot in my syzkaller instances and am still hitting the issue.
>
> No, different bug, this patch is correct.
>
> I refuse to touch the flow dissector.

I see callers of ip_local_out() in the tree which do not set skb->dev.

I don't understand this:

bool __skb_flow_dissect(const struct net *net,
                        const struct sk_buff *skb,
                        struct flow_dissector *flow_dissector,
                        void *target_container, const void *data,
                        __be16 proto, int nhoff, int hlen, unsigned int flags)
{
[...]
        WARN_ON_ONCE(!net);
        if (net) {

it was added by 9b52e3f267a6 ("flow_dissector: handle no-skb use case")

Is this WARN_ON_ONCE() bogus?
Florian Westphal June 5, 2024, 7:08 p.m. UTC | #3
Pablo Neira Ayuso <pablo@netfilter.org> wrote:

[ CC Willem ]

> On Wed, Jun 05, 2024 at 08:14:50PM +0200, Florian Westphal wrote:
> > Christoph Paasch <cpaasch@apple.com> wrote:
> > > > Reported-by: Christoph Paasch <cpaasch@apple.com>
> > > > Suggested-by: Paolo Abeni <pabeni@redhat.com>
> > > > Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/494
> > > > Signed-off-by: Florian Westphal <fw@strlen.de>
> > >
> > > I just gave this one a shot in my syzkaller instances and am still hitting the issue.
> >
> > No, different bug, this patch is correct.
> >
> > I refuse to touch the flow dissector.
> 
> I see callers of ip_local_out() in the tree which do not set skb->dev.
> 
> I don't understand this:
> 
> bool __skb_flow_dissect(const struct net *net,
>                         const struct sk_buff *skb,
>                         struct flow_dissector *flow_dissector,
>                         void *target_container, const void *data,
>                         __be16 proto, int nhoff, int hlen, unsigned int flags)
> {
> [...]
>         WARN_ON_ONCE(!net);
>         if (net) {
> 
> it was added by 9b52e3f267a6 ("flow_dissector: handle no-skb use case")
> 
> Is this WARN_ON_ONCE() bogus?

When this was added (handle dissection from bpf prog, per netns), the correct
solution would have been to pass 'struct net' explicitly via skb_get_hash()
and all variants.  As that was likely deemed to be too much code churn it
tries to infer struct net via skb->{dev,sk}.

So there are several options here:
1. remove the WARN_ON_ONCE and be done with it
2. remove the WARN_ON_ONCE and pretend net was init_net
3. also look at skb_dst(skb)->dev if skb->dev is unset, then back to 1)
   or 2)
4. stop using skb_get_hash() from netfilter (but there are likely other
   callers that might hit this).
5. fix up callers, one by one
6. assign skb->dev inside netfilter if its unset

3 and 2 combined are probably going to be the least invasive.

5 might take some time, we now know two, namely tcp resets generated
from netfilter and igmp_send_report().  No idea if there are more.

I dislike 3) mainly because of the 'guess the netns' design, not because it
adds more code to a way too large function however, so maybe its
acceptable?
Pablo Neira Ayuso June 5, 2024, 7:45 p.m. UTC | #4
On Wed, Jun 05, 2024 at 09:08:33PM +0200, Florian Westphal wrote:
> Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> 
> [ CC Willem ]
> 
> > On Wed, Jun 05, 2024 at 08:14:50PM +0200, Florian Westphal wrote:
> > > Christoph Paasch <cpaasch@apple.com> wrote:
> > > > > Reported-by: Christoph Paasch <cpaasch@apple.com>
> > > > > Suggested-by: Paolo Abeni <pabeni@redhat.com>
> > > > > Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/494
> > > > > Signed-off-by: Florian Westphal <fw@strlen.de>
> > > >
> > > > I just gave this one a shot in my syzkaller instances and am still hitting the issue.
> > >
> > > No, different bug, this patch is correct.
> > >
> > > I refuse to touch the flow dissector.
> > 
> > I see callers of ip_local_out() in the tree which do not set skb->dev.
> > 
> > I don't understand this:
> > 
> > bool __skb_flow_dissect(const struct net *net,
> >                         const struct sk_buff *skb,
> >                         struct flow_dissector *flow_dissector,
> >                         void *target_container, const void *data,
> >                         __be16 proto, int nhoff, int hlen, unsigned int flags)
> > {
> > [...]
> >         WARN_ON_ONCE(!net);
> >         if (net) {
> > 
> > it was added by 9b52e3f267a6 ("flow_dissector: handle no-skb use case")
> > 
> > Is this WARN_ON_ONCE() bogus?
> 
> When this was added (handle dissection from bpf prog, per netns), the correct
> solution would have been to pass 'struct net' explicitly via skb_get_hash()
> and all variants.  As that was likely deemed to be too much code churn it
> tries to infer struct net via skb->{dev,sk}.
> 
> So there are several options here:
> 1. remove the WARN_ON_ONCE and be done with it
> 2. remove the WARN_ON_ONCE and pretend net was init_net
> 3. also look at skb_dst(skb)->dev if skb->dev is unset, then back to 1)
>    or 2)
> 4. stop using skb_get_hash() from netfilter (but there are likely other
>    callers that might hit this).
> 5. fix up callers, one by one
> 6. assign skb->dev inside netfilter if its unset
> 
> 3 and 2 combined are probably going to be the least invasive.
> 
> 5 might take some time, we now know two, namely tcp resets generated
> from netfilter and igmp_send_report().  No idea if there are more.

Quickly browsing, synproxy and tee also calls ip_local_out() too.

icmp_send() which is used, eg. to send destination unreachable too to
reset.

There is also __skb_get_hash_symmetric() that could hit this from
nft_hash?

No idea what more callers need to be adjusted to remove this splat,
that was a cursory tree review.

And ip_output() sets on skb->dev from postrouting path, then if
callers are fixed, then skb->dev would be once then again from output path?
Willem de Bruijn June 5, 2024, 9:38 p.m. UTC | #5
On Wed, Jun 5, 2024 at 3:45 PM Pablo Neira Ayuso <pablo@netfilter.org> wrote:
>
> On Wed, Jun 05, 2024 at 09:08:33PM +0200, Florian Westphal wrote:
> > Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> >
> > [ CC Willem ]
> >
> > > On Wed, Jun 05, 2024 at 08:14:50PM +0200, Florian Westphal wrote:
> > > > Christoph Paasch <cpaasch@apple.com> wrote:
> > > > > > Reported-by: Christoph Paasch <cpaasch@apple.com>
> > > > > > Suggested-by: Paolo Abeni <pabeni@redhat.com>
> > > > > > Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/494
> > > > > > Signed-off-by: Florian Westphal <fw@strlen.de>
> > > > >
> > > > > I just gave this one a shot in my syzkaller instances and am still hitting the issue.
> > > >
> > > > No, different bug, this patch is correct.
> > > >
> > > > I refuse to touch the flow dissector.
> > >
> > > I see callers of ip_local_out() in the tree which do not set skb->dev.
> > >
> > > I don't understand this:
> > >
> > > bool __skb_flow_dissect(const struct net *net,
> > >                         const struct sk_buff *skb,
> > >                         struct flow_dissector *flow_dissector,
> > >                         void *target_container, const void *data,
> > >                         __be16 proto, int nhoff, int hlen, unsigned int flags)
> > > {
> > > [...]
> > >         WARN_ON_ONCE(!net);
> > >         if (net) {
> > >
> > > it was added by 9b52e3f267a6 ("flow_dissector: handle no-skb use case")
> > >
> > > Is this WARN_ON_ONCE() bogus?
> >
> > When this was added (handle dissection from bpf prog, per netns), the correct
> > solution would have been to pass 'struct net' explicitly via skb_get_hash()
> > and all variants.  As that was likely deemed to be too much code churn it
> > tries to infer struct net via skb->{dev,sk}.

It has been a while, but I think we just did not anticipate skb's with
neither dev nor sk set.

Digging through the layers from skb_hash to __skb_flow_dissect
now, it does look impractical to add such an explicit API.

> > So there are several options here:
> > 1. remove the WARN_ON_ONCE and be done with it
> > 2. remove the WARN_ON_ONCE and pretend net was init_net
> > 3. also look at skb_dst(skb)->dev if skb->dev is unset, then back to 1)
> >    or 2)
> > 4. stop using skb_get_hash() from netfilter (but there are likely other
> >    callers that might hit this).
> > 5. fix up callers, one by one
> > 6. assign skb->dev inside netfilter if its unset

Is 6 a realistic option?

> >
> > 3 and 2 combined are probably going to be the least invasive.
> >
> > 5 might take some time, we now know two, namely tcp resets generated
> > from netfilter and igmp_send_report().  No idea if there are more.
>
> Quickly browsing, synproxy and tee also calls ip_local_out() too.
>
> icmp_send() which is used, eg. to send destination unreachable too to
> reset.

Since this uses ip_append_data the generated response should have
its skb->sk set.

> There is also __skb_get_hash_symmetric() that could hit this from
> nft_hash?
>
> No idea what more callers need to be adjusted to remove this splat,
> that was a cursory tree review.
>
> And ip_output() sets on skb->dev from postrouting path, then if
> callers are fixed, then skb->dev would be once then again from output path?
Pablo Neira Ayuso June 5, 2024, 10:16 p.m. UTC | #6
On Wed, Jun 05, 2024 at 05:38:00PM -0400, Willem de Bruijn wrote:
> On Wed, Jun 5, 2024 at 3:45 PM Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> >
> > On Wed, Jun 05, 2024 at 09:08:33PM +0200, Florian Westphal wrote:
> > > Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > >
> > > [ CC Willem ]
> > >
> > > > On Wed, Jun 05, 2024 at 08:14:50PM +0200, Florian Westphal wrote:
> > > > > Christoph Paasch <cpaasch@apple.com> wrote:
> > > > > > > Reported-by: Christoph Paasch <cpaasch@apple.com>
> > > > > > > Suggested-by: Paolo Abeni <pabeni@redhat.com>
> > > > > > > Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/494
> > > > > > > Signed-off-by: Florian Westphal <fw@strlen.de>
> > > > > >
> > > > > > I just gave this one a shot in my syzkaller instances and am still hitting the issue.
> > > > >
> > > > > No, different bug, this patch is correct.
> > > > >
> > > > > I refuse to touch the flow dissector.
> > > >
> > > > I see callers of ip_local_out() in the tree which do not set skb->dev.
> > > >
> > > > I don't understand this:
> > > >
> > > > bool __skb_flow_dissect(const struct net *net,
> > > >                         const struct sk_buff *skb,
> > > >                         struct flow_dissector *flow_dissector,
> > > >                         void *target_container, const void *data,
> > > >                         __be16 proto, int nhoff, int hlen, unsigned int flags)
> > > > {
> > > > [...]
> > > >         WARN_ON_ONCE(!net);
> > > >         if (net) {
> > > >
> > > > it was added by 9b52e3f267a6 ("flow_dissector: handle no-skb use case")
> > > >
> > > > Is this WARN_ON_ONCE() bogus?
> > >
> > > When this was added (handle dissection from bpf prog, per netns), the correct
> > > solution would have been to pass 'struct net' explicitly via skb_get_hash()
> > > and all variants.  As that was likely deemed to be too much code churn it
> > > tries to infer struct net via skb->{dev,sk}.
> 
> It has been a while, but I think we just did not anticipate skb's with
> neither dev nor sk set.
> 
> Digging through the layers from skb_hash to __skb_flow_dissect
> now, it does look impractical to add such an explicit API.
> 
> > > So there are several options here:
> > > 1. remove the WARN_ON_ONCE and be done with it
> > > 2. remove the WARN_ON_ONCE and pretend net was init_net
> > > 3. also look at skb_dst(skb)->dev if skb->dev is unset, then back to 1)
> > >    or 2)
> > > 4. stop using skb_get_hash() from netfilter (but there are likely other
> > >    callers that might hit this).
> > > 5. fix up callers, one by one
> > > 6. assign skb->dev inside netfilter if its unset
> 
> Is 6 a realistic option?

Postrouting path already sets on skb->dev via ip_output(), if callers
are "fixed" then skb->dev will get set twice.

the netfilter tracing infrastructure only needs a hash identifier for
this packet to be displayed from userspace when tracing rules, how is
the running the custom bpf dissector hook useful in such case?

the most correct solution is to pass struct net explicitly to the
dissector API instead of guessing what net this packet belongs to.

Else, remove this WARN_ON_ONCE which is just a oneliner.

> > > 3 and 2 combined are probably going to be the least invasive.
> > >
> > > 5 might take some time, we now know two, namely tcp resets generated
> > > from netfilter and igmp_send_report().  No idea if there are more.
> >
> > Quickly browsing, synproxy and tee also calls ip_local_out() too.
> >
> > icmp_send() which is used, eg. to send destination unreachable too to
> > reset.
> 
> Since this uses ip_append_data the generated response should have
> its skb->sk set.

thanks for explaining.

> > There is also __skb_get_hash_symmetric() that could hit this from
> > nft_hash?
> >
> > No idea what more callers need to be adjusted to remove this splat,
> > that was a cursory tree review.
> >
> > And ip_output() sets on skb->dev from postrouting path, then if
> > callers are fixed, then skb->dev would be once then again from output path?
Willem de Bruijn June 6, 2024, 1:54 a.m. UTC | #7
On Wed, Jun 5, 2024 at 6:16 PM Pablo Neira Ayuso <pablo@netfilter.org> wrote:
>
> On Wed, Jun 05, 2024 at 05:38:00PM -0400, Willem de Bruijn wrote:
> > On Wed, Jun 5, 2024 at 3:45 PM Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > >
> > > On Wed, Jun 05, 2024 at 09:08:33PM +0200, Florian Westphal wrote:
> > > > Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > > >
> > > > [ CC Willem ]
> > > >
> > > > > On Wed, Jun 05, 2024 at 08:14:50PM +0200, Florian Westphal wrote:
> > > > > > Christoph Paasch <cpaasch@apple.com> wrote:
> > > > > > > > Reported-by: Christoph Paasch <cpaasch@apple.com>
> > > > > > > > Suggested-by: Paolo Abeni <pabeni@redhat.com>
> > > > > > > > Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/494
> > > > > > > > Signed-off-by: Florian Westphal <fw@strlen.de>
> > > > > > >
> > > > > > > I just gave this one a shot in my syzkaller instances and am still hitting the issue.
> > > > > >
> > > > > > No, different bug, this patch is correct.
> > > > > >
> > > > > > I refuse to touch the flow dissector.
> > > > >
> > > > > I see callers of ip_local_out() in the tree which do not set skb->dev.
> > > > >
> > > > > I don't understand this:
> > > > >
> > > > > bool __skb_flow_dissect(const struct net *net,
> > > > >                         const struct sk_buff *skb,
> > > > >                         struct flow_dissector *flow_dissector,
> > > > >                         void *target_container, const void *data,
> > > > >                         __be16 proto, int nhoff, int hlen, unsigned int flags)
> > > > > {
> > > > > [...]
> > > > >         WARN_ON_ONCE(!net);
> > > > >         if (net) {
> > > > >
> > > > > it was added by 9b52e3f267a6 ("flow_dissector: handle no-skb use case")
> > > > >
> > > > > Is this WARN_ON_ONCE() bogus?
> > > >
> > > > When this was added (handle dissection from bpf prog, per netns), the correct
> > > > solution would have been to pass 'struct net' explicitly via skb_get_hash()
> > > > and all variants.  As that was likely deemed to be too much code churn it
> > > > tries to infer struct net via skb->{dev,sk}.
> >
> > It has been a while, but I think we just did not anticipate skb's with
> > neither dev nor sk set.
> >
> > Digging through the layers from skb_hash to __skb_flow_dissect
> > now, it does look impractical to add such an explicit API.
> >
> > > > So there are several options here:
> > > > 1. remove the WARN_ON_ONCE and be done with it
> > > > 2. remove the WARN_ON_ONCE and pretend net was init_net
> > > > 3. also look at skb_dst(skb)->dev if skb->dev is unset, then back to 1)
> > > >    or 2)
> > > > 4. stop using skb_get_hash() from netfilter (but there are likely other
> > > >    callers that might hit this).
> > > > 5. fix up callers, one by one
> > > > 6. assign skb->dev inside netfilter if its unset
> >
> > Is 6 a realistic option?
>
> Postrouting path already sets on skb->dev via ip_output(), if callers
> are "fixed" then skb->dev will get set twice.

I meant to set it just before calling skb_get_hash and unset
right after. Just using the skb to piggy back the information.

> the netfilter tracing infrastructure only needs a hash identifier for
> this packet to be displayed from userspace when tracing rules, how is
> the running the custom bpf dissector hook useful in such case?

The BPF flow dissector is there as much to short circuit the
hard-coded C protocol dissectors as to expand on the existing
feature set. I did not want production machines exposed to
every protocol parser, as that set kept expanding.

Having different dissection algorithms depending on where the
packet enters the dissector is also surprising behavior?

If all that is needed is an opaque ID, and on egress skb->hash
is derived with skb_set_hash_from_sk from sk_txhash, then
this can even be pseudo-random from net_tx_rndhash().

> the most correct solution is to pass struct net explicitly to the
> dissector API instead of guessing what net this packet belongs to.

Unfortunately from skb_get_hash to __skb_flow_dissect is four
layers of indirections, including one with three underscores already,
that cannot be easily circumvented.

Temporarily passing it through skb (and unsetting after) seems
the simplest fix to me.

> Else, remove this WARN_ON_ONCE which is just a oneliner.

According to the commit that introduced it, this was to sniff out drivers
that don't set this (via eth_get_headlen).

The problem is that nothing else warns loudly, so you just quietly
lose the BPF dissection coverage.

This is the first time it warned. You point out that the value of BPF
is limited to you in this case. But that's not true for all such cases.
I'd rather dissection breaks hard than that it falls back quietly for
input from an untrusted network.

Maybe reduce it to DEBUG_NET_WARN_ON_ONCE?

> > > > 3 and 2 combined are probably going to be the least invasive.
> > > >
> > > > 5 might take some time, we now know two, namely tcp resets generated
> > > > from netfilter and igmp_send_report().  No idea if there are more.
> > >
> > > Quickly browsing, synproxy and tee also calls ip_local_out() too.
> > >
> > > icmp_send() which is used, eg. to send destination unreachable too to
> > > reset.
> >
> > Since this uses ip_append_data the generated response should have
> > its skb->sk set.
>
> thanks for explaining.
>
> > > There is also __skb_get_hash_symmetric() that could hit this from
> > > nft_hash?
> > >
> > > No idea what more callers need to be adjusted to remove this splat,
> > > that was a cursory tree review.
> > >
> > > And ip_output() sets on skb->dev from postrouting path, then if
> > > callers are fixed, then skb->dev would be once then again from output path?
Pablo Neira Ayuso June 6, 2024, 6:20 a.m. UTC | #8
Hi Willem,

On Wed, Jun 05, 2024 at 09:54:59PM -0400, Willem de Bruijn wrote:
> On Wed, Jun 5, 2024 at 6:16 PM Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> >
> > On Wed, Jun 05, 2024 at 05:38:00PM -0400, Willem de Bruijn wrote:
> > > On Wed, Jun 5, 2024 at 3:45 PM Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > > >
> > > > On Wed, Jun 05, 2024 at 09:08:33PM +0200, Florian Westphal wrote:
> > > > > Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > > > >
> > > > > [ CC Willem ]
> > > > >
> > > > > > On Wed, Jun 05, 2024 at 08:14:50PM +0200, Florian Westphal wrote:
> > > > > > > Christoph Paasch <cpaasch@apple.com> wrote:
> > > > > > > > > Reported-by: Christoph Paasch <cpaasch@apple.com>
> > > > > > > > > Suggested-by: Paolo Abeni <pabeni@redhat.com>
> > > > > > > > > Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/494
> > > > > > > > > Signed-off-by: Florian Westphal <fw@strlen.de>
> > > > > > > >
> > > > > > > > I just gave this one a shot in my syzkaller instances and am still hitting the issue.
> > > > > > >
> > > > > > > No, different bug, this patch is correct.
> > > > > > >
> > > > > > > I refuse to touch the flow dissector.
> > > > > >
> > > > > > I see callers of ip_local_out() in the tree which do not set skb->dev.
> > > > > >
> > > > > > I don't understand this:
> > > > > >
> > > > > > bool __skb_flow_dissect(const struct net *net,
> > > > > >                         const struct sk_buff *skb,
> > > > > >                         struct flow_dissector *flow_dissector,
> > > > > >                         void *target_container, const void *data,
> > > > > >                         __be16 proto, int nhoff, int hlen, unsigned int flags)
> > > > > > {
> > > > > > [...]
> > > > > >         WARN_ON_ONCE(!net);
> > > > > >         if (net) {
> > > > > >
> > > > > > it was added by 9b52e3f267a6 ("flow_dissector: handle no-skb use case")
> > > > > >
> > > > > > Is this WARN_ON_ONCE() bogus?
> > > > >
> > > > > When this was added (handle dissection from bpf prog, per netns), the correct
> > > > > solution would have been to pass 'struct net' explicitly via skb_get_hash()
> > > > > and all variants.  As that was likely deemed to be too much code churn it
> > > > > tries to infer struct net via skb->{dev,sk}.
> > >
> > > It has been a while, but I think we just did not anticipate skb's with
> > > neither dev nor sk set.
> > >
> > > Digging through the layers from skb_hash to __skb_flow_dissect
> > > now, it does look impractical to add such an explicit API.
> > >
> > > > > So there are several options here:
> > > > > 1. remove the WARN_ON_ONCE and be done with it
> > > > > 2. remove the WARN_ON_ONCE and pretend net was init_net
> > > > > 3. also look at skb_dst(skb)->dev if skb->dev is unset, then back to 1)
> > > > >    or 2)
> > > > > 4. stop using skb_get_hash() from netfilter (but there are likely other
> > > > >    callers that might hit this).
> > > > > 5. fix up callers, one by one
> > > > > 6. assign skb->dev inside netfilter if its unset
> > >
> > > Is 6 a realistic option?
> >
> > Postrouting path already sets on skb->dev via ip_output(), if callers
> > are "fixed" then skb->dev will get set twice.
> 
> I meant to set it just before calling skb_get_hash and unset
> right after. Just using the skb to piggy back the information.
> 
> > the netfilter tracing infrastructure only needs a hash identifier for
> > this packet to be displayed from userspace when tracing rules, how is
> > the running the custom bpf dissector hook useful in such case?
> 
> The BPF flow dissector is there as much to short circuit the
> hard-coded C protocol dissectors as to expand on the existing
> feature set. I did not want production machines exposed to
> every protocol parser, as that set kept expanding.
> 
> Having different dissection algorithms depending on where the
> packet enters the dissector is also surprising behavior?

I would say: "having different dissection _operation modes_ depending on
where the packets enters the dissector is the expected behaviour", so
the dissector code adapts to the use-case.

> If all that is needed is an opaque ID, and on egress skb->hash
> is derived with skb_set_hash_from_sk from sk_txhash, then
> this can even be pseudo-random from net_tx_rndhash().

I think this will not work for the netfilter use-case, such
pseudo-random number would be then generated in the scope of the chain
and that packet could go over several chains while being processed.
Thus, displaying different IDs for the same packet.

> > the most correct solution is to pass struct net explicitly to the
> > dissector API instead of guessing what net this packet belongs to.
> 
> Unfortunately from skb_get_hash to __skb_flow_dissect is four
> layers of indirections, including one with three underscores already,
> that cannot be easily circumvented.
> 
> Temporarily passing it through skb (and unsetting after) seems
> the simplest fix to me.

it sounds like a temporary workaround. I understand your concern that
it requires a much larger patch to pass net to the flow dissector
infrastructure.

> > Else, remove this WARN_ON_ONCE which is just a oneliner.
> 
> According to the commit that introduced it, this was to sniff out drivers
> that don't set this (via eth_get_headlen).
> 
> The problem is that nothing else warns loudly, so you just quietly
> lose the BPF dissection coverage.
> 
> This is the first time it warned. You point out that the value of BPF
> is limited to you in this case. But that's not true for all such cases.
> I'd rather dissection breaks hard than that it falls back quietly for
> input from an untrusted network.
> 
> Maybe reduce it to DEBUG_NET_WARN_ON_ONCE?

I am afraid syzkaller and such will still hit this for Christoph,
since DEBUG_NET is usually recommended to be turned on in such case.

Another possibility is that netfilter stops using the dissector code
for the tracing infrastructure at the cost of replicating code.

Thanks.
Florian Westphal June 6, 2024, 8:39 a.m. UTC | #9
Willem de Bruijn <willemb@google.com> wrote:
> On Wed, Jun 5, 2024 at 3:45 PM Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> >
> > On Wed, Jun 05, 2024 at 09:08:33PM +0200, Florian Westphal wrote:
> > > So there are several options here:
> > > 1. remove the WARN_ON_ONCE and be done with it
> > > 2. remove the WARN_ON_ONCE and pretend net was init_net
> > > 3. also look at skb_dst(skb)->dev if skb->dev is unset, then back to 1)
> > >    or 2)
> > > 4. stop using skb_get_hash() from netfilter (but there are likely other
> > >    callers that might hit this).
> > > 5. fix up callers, one by one
> > > 6. assign skb->dev inside netfilter if its unset
> 
> Is 6 a realistic option?

The output hook has to outdev available (its skb_dst(skb)->dev, passed
in from __ip_local_out()).

So we could set skb->dev = outdev, before calling skb_get_hash and
__skb_get_hash_symmetric.
Florian Westphal June 6, 2024, 9:26 a.m. UTC | #10
Florian Westphal <fw@strlen.de> wrote:
> When this was added (handle dissection from bpf prog, per netns), the correct
> solution would have been to pass 'struct net' explicitly via skb_get_hash()
> and all variants.  As that was likely deemed to be too much code churn it
> tries to infer struct net via skb->{dev,sk}.
> 
> So there are several options here:
> 1. remove the WARN_ON_ONCE and be done with it
> 2. remove the WARN_ON_ONCE and pretend net was init_net
> 3. also look at skb_dst(skb)->dev if skb->dev is unset, then back to 1)
>    or 2)
> 4. stop using skb_get_hash() from netfilter (but there are likely other
>    callers that might hit this).

diff --git a/net/netfilter/nf_tables_trace.c b/net/netfilter/nf_tables_trace.c
--- a/net/netfilter/nf_tables_trace.c
+++ b/net/netfilter/nf_tables_trace.c
@@ -303,6 +303,30 @@ void nft_trace_notify(const struct nft_pktinfo *pkt,
        kfree_skb(skb);
 }
 
+static u32 __nf_skb_get_hash(const struct net *net, struct sk_buff *skb)
+{
+       struct flow_keys keys;
+       u32 hash;
+
+       memset(&keys, 0, sizeof(keys));
+
+       __skb_flow_dissect(net, skb, &flow_keys_dissector,
+                          &keys, NULL, 0, 0, 0,
+                          FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL);
+
+       hash = flow_hash_from_keys(&keys);
+       __skb_set_sw_hash(skb, hash, flow_keys_have_l4(&keys));
+       return hash;
+}
+
+static u32 nf_skb_get_hash(const struct net *net, struct sk_buff *skb)
+{
+       if (!skb->l4_hash && !skb->sw_hash)
+               return __nf_skb_get_hash(net, skb);
+
+       return skb->hash;
+}
+
 void nft_trace_init(struct nft_traceinfo *info, const struct nft_pktinfo *pkt,
                    const struct nft_chain *chain)
 {
@@ -317,7 +341,7 @@ void nft_trace_init(struct nft_traceinfo *info, const struct nft_pktinfo *pkt,
        net_get_random_once(&trace_key, sizeof(trace_key));
 
        info->skbid = (u32)siphash_3u32(hash32_ptr(skb),
-                                       skb_get_hash(skb),
+                                       nf_skb_get_hash(nft_net(pkt), skb),
                                        skb->skb_iif,
                                        &trace_key);
 }


... doesn't solve the nft_hash.c issue (which calls _symmetric version, and
that uses flow_key definiton that isn't exported outside flow_dissector.o.
Florian Westphal June 6, 2024, 1:04 p.m. UTC | #11
Florian Westphal <fw@strlen.de> wrote:
> ... doesn't solve the nft_hash.c issue (which calls _symmetric version, and
> that uses flow_key definiton that isn't exported outside flow_dissector.o.

and here is the diff that would pass net for _symmetric, not too
horrible I think.

With that and the copypaste of skb_get_hash into nf_trace infra
netfilter can still pass skbs to the flow dissector with NULL skb->sk,dev
but the WARN would no longer trigger as struct net is non-null.

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 9254bca2813d..e9e6cf3d148c 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -524,12 +524,13 @@ static inline void tun_flow_save_rps_rxhash(struct tun_flow_entry *e, u32 hash)
  */
 static u16 tun_automq_select_queue(struct tun_struct *tun, struct sk_buff *skb)
 {
+	const struct net *net = dev_net(tun->dev);
 	struct tun_flow_entry *e;
 	u32 txq, numqueues;
 
 	numqueues = READ_ONCE(tun->numqueues);
 
-	txq = __skb_get_hash_symmetric(skb);
+	txq = __skb_get_hash_symmetric(net, skb);
 	e = tun_flow_find(&tun->flows[tun_hashfn(txq)], txq);
 	if (e) {
 		tun_flow_save_rps_rxhash(e, txq);
@@ -1038,10 +1039,11 @@ static void tun_automq_xmit(struct tun_struct *tun, struct sk_buff *skb)
 		/* Select queue was not called for the skbuff, so we extract the
 		 * RPS hash and save it into the flow_table here.
 		 */
+		const struct net *net = dev_net(tun->dev);
 		struct tun_flow_entry *e;
 		__u32 rxhash;
 
-		rxhash = __skb_get_hash_symmetric(skb);
+		rxhash = __skb_get_hash_symmetric(net, skb);
 		e = tun_flow_find(&tun->flows[tun_hashfn(rxhash)], rxhash);
 		if (e)
 			tun_flow_save_rps_rxhash(e, rxhash);
@@ -1938,7 +1940,7 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
 	 */
 	if (!rcu_access_pointer(tun->steering_prog) && tun->numqueues > 1 &&
 	    !tfile->detached)
-		rxhash = __skb_get_hash_symmetric(skb);
+		rxhash = __skb_get_hash_symmetric(dev_net(tun->dev), skb);
 
 	rcu_read_lock();
 	if (unlikely(!(tun->dev->flags & IFF_UP))) {
@@ -2521,7 +2523,7 @@ static int tun_xdp_one(struct tun_struct *tun,
 
 	if (!rcu_dereference(tun->steering_prog) && tun->numqueues > 1 &&
 	    !tfile->detached)
-		rxhash = __skb_get_hash_symmetric(skb);
+		rxhash = __skb_get_hash_symmetric(dev_net(tun->dev), skb);
 
 	if (tfile->napi_enabled) {
 		queue = &tfile->sk.sk_write_queue;
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 1c2902eaebd3..60a4dc5586c8 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1493,7 +1493,7 @@ __skb_set_sw_hash(struct sk_buff *skb, __u32 hash, bool is_l4)
 }
 
 void __skb_get_hash(struct sk_buff *skb);
-u32 __skb_get_hash_symmetric(const struct sk_buff *skb);
+u32 __skb_get_hash_symmetric(const struct net *net, const struct sk_buff *skb);
 u32 skb_get_poff(const struct sk_buff *skb);
 u32 __skb_get_poff(const struct sk_buff *skb, const void *data,
 		   const struct flow_keys_basic *keys, int hlen);
diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index f82e9a7d3b37..634896129780 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -1831,14 +1831,14 @@ EXPORT_SYMBOL(make_flow_keys_digest);
 
 static struct flow_dissector flow_keys_dissector_symmetric __read_mostly;
 
-u32 __skb_get_hash_symmetric(const struct sk_buff *skb)
+u32 __skb_get_hash_symmetric(const struct net *net, const struct sk_buff *skb)
 {
 	struct flow_keys keys;
 
 	__flow_hash_secret_init();
 
 	memset(&keys, 0, sizeof(keys));
-	__skb_flow_dissect(NULL, skb, &flow_keys_dissector_symmetric,
+	__skb_flow_dissect(net, skb, &flow_keys_dissector_symmetric,
 			   &keys, NULL, 0, 0, 0, 0);
 
 	return __flow_hash_from_keys(&keys, &hashrnd);
diff --git a/net/netfilter/nft_hash.c b/net/netfilter/nft_hash.c
index 92d47e469204..3e7296ed5319 100644
--- a/net/netfilter/nft_hash.c
+++ b/net/netfilter/nft_hash.c
@@ -51,7 +51,8 @@ static void nft_symhash_eval(const struct nft_expr *expr,
 	struct sk_buff *skb = pkt->skb;
 	u32 h;
 
-	h = reciprocal_scale(__skb_get_hash_symmetric(skb), priv->modulus);
+	h = reciprocal_scale(__skb_get_hash_symmetric(nft_net(pkt), skb),
+			     priv->modulus);
 
 	regs->data[priv->dreg] = h + priv->offset;
 }
diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index 964225580824..0e6166784972 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -1084,7 +1084,8 @@ static int clone(struct datapath *dp, struct sk_buff *skb,
 			     !dont_clone_flow_key);
 }
 
-static void execute_hash(struct sk_buff *skb, struct sw_flow_key *key,
+static void execute_hash(const struct net *net,
+			 struct sk_buff *skb, struct sw_flow_key *key,
 			 const struct nlattr *attr)
 {
 	struct ovs_action_hash *hash_act = nla_data(attr);
@@ -1097,7 +1098,7 @@ static void execute_hash(struct sk_buff *skb, struct sw_flow_key *key,
 		/* OVS_HASH_ALG_SYM_L4 hashing type.  NOTE: this doesn't
 		 * extend past an encapsulated header.
 		 */
-		hash = __skb_get_hash_symmetric(skb);
+		hash = __skb_get_hash_symmetric(net, skb);
 	}
 
 	hash = jhash_1word(hash, hash_act->hash_basis);
@@ -1359,7 +1360,7 @@ static int do_execute_actions(struct datapath *dp, struct sk_buff *skb,
 			break;
 
 		case OVS_ACTION_ATTR_HASH:
-			execute_hash(skb, key, a);
+			execute_hash(ovs_dp_get_net(dp), skb, key, a);
 			break;
 
 		case OVS_ACTION_ATTR_PUSH_MPLS: {
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index ea3ebc160e25..b047fdb0c02c 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -1364,7 +1364,9 @@ static unsigned int fanout_demux_hash(struct packet_fanout *f,
 				      struct sk_buff *skb,
 				      unsigned int num)
 {
-	return reciprocal_scale(__skb_get_hash_symmetric(skb), num);
+	struct net *net = read_pnet(&f->net);
+
+	return reciprocal_scale(__skb_get_hash_symmetric(net, skb), num);
 }
 
 static unsigned int fanout_demux_lb(struct packet_fanout *f,
Willem de Bruijn June 6, 2024, 2:09 p.m. UTC | #12
Florian Westphal wrote:
> Florian Westphal <fw@strlen.de> wrote:
> > ... doesn't solve the nft_hash.c issue (which calls _symmetric version, and
> > that uses flow_key definiton that isn't exported outside flow_dissector.o.
> 
> and here is the diff that would pass net for _symmetric, not too
> horrible I think.
> 
> With that and the copypaste of skb_get_hash into nf_trace infra
> netfilter can still pass skbs to the flow dissector with NULL skb->sk,dev
> but the WARN would no longer trigger as struct net is non-null.

Thanks for coding this up Florian. This overall looks good to me.

One suggested change is to introduce a three underscore variant (yes
really) ___skb_get_hash_symmetric that takes the optional net, and
leave the existing callers of the two underscore version as is.

The copypaste probably belongs with the other flow dissector wrappers
in sk_buff.h.

> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index 9254bca2813d..e9e6cf3d148c 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -524,12 +524,13 @@ static inline void tun_flow_save_rps_rxhash(struct tun_flow_entry *e, u32 hash)
>   */
>  static u16 tun_automq_select_queue(struct tun_struct *tun, struct sk_buff *skb)
>  {
> +	const struct net *net = dev_net(tun->dev);
>  	struct tun_flow_entry *e;
>  	u32 txq, numqueues;
>  
>  	numqueues = READ_ONCE(tun->numqueues);
>  
> -	txq = __skb_get_hash_symmetric(skb);
> +	txq = __skb_get_hash_symmetric(net, skb);
>  	e = tun_flow_find(&tun->flows[tun_hashfn(txq)], txq);
>  	if (e) {
>  		tun_flow_save_rps_rxhash(e, txq);
> @@ -1038,10 +1039,11 @@ static void tun_automq_xmit(struct tun_struct *tun, struct sk_buff *skb)
>  		/* Select queue was not called for the skbuff, so we extract the
>  		 * RPS hash and save it into the flow_table here.
>  		 */
> +		const struct net *net = dev_net(tun->dev);
>  		struct tun_flow_entry *e;
>  		__u32 rxhash;
>  
> -		rxhash = __skb_get_hash_symmetric(skb);
> +		rxhash = __skb_get_hash_symmetric(net, skb);
>  		e = tun_flow_find(&tun->flows[tun_hashfn(rxhash)], rxhash);
>  		if (e)
>  			tun_flow_save_rps_rxhash(e, rxhash);
> @@ -1938,7 +1940,7 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
>  	 */
>  	if (!rcu_access_pointer(tun->steering_prog) && tun->numqueues > 1 &&
>  	    !tfile->detached)
> -		rxhash = __skb_get_hash_symmetric(skb);
> +		rxhash = __skb_get_hash_symmetric(dev_net(tun->dev), skb);
>  
>  	rcu_read_lock();
>  	if (unlikely(!(tun->dev->flags & IFF_UP))) {
> @@ -2521,7 +2523,7 @@ static int tun_xdp_one(struct tun_struct *tun,
>  
>  	if (!rcu_dereference(tun->steering_prog) && tun->numqueues > 1 &&
>  	    !tfile->detached)
> -		rxhash = __skb_get_hash_symmetric(skb);
> +		rxhash = __skb_get_hash_symmetric(dev_net(tun->dev), skb);
>  
>  	if (tfile->napi_enabled) {
>  		queue = &tfile->sk.sk_write_queue;
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index 1c2902eaebd3..60a4dc5586c8 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -1493,7 +1493,7 @@ __skb_set_sw_hash(struct sk_buff *skb, __u32 hash, bool is_l4)
>  }
>  
>  void __skb_get_hash(struct sk_buff *skb);
> -u32 __skb_get_hash_symmetric(const struct sk_buff *skb);
> +u32 __skb_get_hash_symmetric(const struct net *net, const struct sk_buff *skb);
>  u32 skb_get_poff(const struct sk_buff *skb);
>  u32 __skb_get_poff(const struct sk_buff *skb, const void *data,
>  		   const struct flow_keys_basic *keys, int hlen);
> diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
> index f82e9a7d3b37..634896129780 100644
> --- a/net/core/flow_dissector.c
> +++ b/net/core/flow_dissector.c
> @@ -1831,14 +1831,14 @@ EXPORT_SYMBOL(make_flow_keys_digest);
>  
>  static struct flow_dissector flow_keys_dissector_symmetric __read_mostly;
>  
> -u32 __skb_get_hash_symmetric(const struct sk_buff *skb)
> +u32 __skb_get_hash_symmetric(const struct net *net, const struct sk_buff *skb)
>  {
>  	struct flow_keys keys;
>  
>  	__flow_hash_secret_init();
>  
>  	memset(&keys, 0, sizeof(keys));
> -	__skb_flow_dissect(NULL, skb, &flow_keys_dissector_symmetric,
> +	__skb_flow_dissect(net, skb, &flow_keys_dissector_symmetric,
>  			   &keys, NULL, 0, 0, 0, 0);
>  
>  	return __flow_hash_from_keys(&keys, &hashrnd);
> diff --git a/net/netfilter/nft_hash.c b/net/netfilter/nft_hash.c
> index 92d47e469204..3e7296ed5319 100644
> --- a/net/netfilter/nft_hash.c
> +++ b/net/netfilter/nft_hash.c
> @@ -51,7 +51,8 @@ static void nft_symhash_eval(const struct nft_expr *expr,
>  	struct sk_buff *skb = pkt->skb;
>  	u32 h;
>  
> -	h = reciprocal_scale(__skb_get_hash_symmetric(skb), priv->modulus);
> +	h = reciprocal_scale(__skb_get_hash_symmetric(nft_net(pkt), skb),
> +			     priv->modulus);
>  
>  	regs->data[priv->dreg] = h + priv->offset;
>  }
> diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
> index 964225580824..0e6166784972 100644
> --- a/net/openvswitch/actions.c
> +++ b/net/openvswitch/actions.c
> @@ -1084,7 +1084,8 @@ static int clone(struct datapath *dp, struct sk_buff *skb,
>  			     !dont_clone_flow_key);
>  }
>  
> -static void execute_hash(struct sk_buff *skb, struct sw_flow_key *key,
> +static void execute_hash(const struct net *net,
> +			 struct sk_buff *skb, struct sw_flow_key *key,
>  			 const struct nlattr *attr)
>  {
>  	struct ovs_action_hash *hash_act = nla_data(attr);
> @@ -1097,7 +1098,7 @@ static void execute_hash(struct sk_buff *skb, struct sw_flow_key *key,
>  		/* OVS_HASH_ALG_SYM_L4 hashing type.  NOTE: this doesn't
>  		 * extend past an encapsulated header.
>  		 */
> -		hash = __skb_get_hash_symmetric(skb);
> +		hash = __skb_get_hash_symmetric(net, skb);
>  	}
>  
>  	hash = jhash_1word(hash, hash_act->hash_basis);
> @@ -1359,7 +1360,7 @@ static int do_execute_actions(struct datapath *dp, struct sk_buff *skb,
>  			break;
>  
>  		case OVS_ACTION_ATTR_HASH:
> -			execute_hash(skb, key, a);
> +			execute_hash(ovs_dp_get_net(dp), skb, key, a);
>  			break;
>  
>  		case OVS_ACTION_ATTR_PUSH_MPLS: {
> diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
> index ea3ebc160e25..b047fdb0c02c 100644
> --- a/net/packet/af_packet.c
> +++ b/net/packet/af_packet.c
> @@ -1364,7 +1364,9 @@ static unsigned int fanout_demux_hash(struct packet_fanout *f,
>  				      struct sk_buff *skb,
>  				      unsigned int num)
>  {
> -	return reciprocal_scale(__skb_get_hash_symmetric(skb), num);
> +	struct net *net = read_pnet(&f->net);
> +
> +	return reciprocal_scale(__skb_get_hash_symmetric(net, skb), num);
>  }
>  
>  static unsigned int fanout_demux_lb(struct packet_fanout *f,
Florian Westphal June 6, 2024, 2:15 p.m. UTC | #13
Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote:
> Florian Westphal wrote:
> > Florian Westphal <fw@strlen.de> wrote:
> > > ... doesn't solve the nft_hash.c issue (which calls _symmetric version, and
> > > that uses flow_key definiton that isn't exported outside flow_dissector.o.
> > 
> > and here is the diff that would pass net for _symmetric, not too
> > horrible I think.
> > 
> > With that and the copypaste of skb_get_hash into nf_trace infra
> > netfilter can still pass skbs to the flow dissector with NULL skb->sk,dev
> > but the WARN would no longer trigger as struct net is non-null.
> 
> Thanks for coding this up Florian. This overall looks good to me.

Thanks for reviewing.

> One suggested change is to introduce a three underscore variant (yes
> really) ___skb_get_hash_symmetric that takes the optional net, and
> leave the existing callers of the two underscore version as is.

Okay, that reduces the code churn.

> The copypaste probably belongs with the other flow dissector wrappers
> in sk_buff.h.

skb_get_hash(skb);
__skb_get_hash_symmetric(skb);
____skb_get_hash_symmetric(net, skb);

I named the copypasta as nf_skb_get_hash. If placed in sk_buff.h:
net_get_hash_net()?
skb_get_hash()?

And if either of that exists, maybe then use
skb_get_hash_symmetric_net(net, skb)

or similar?

(There is no skb_get_hash_symmetric, no idea why it
 uses __prefix).
Willem de Bruijn June 6, 2024, 2:28 p.m. UTC | #14
Florian Westphal wrote:
> Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote:
> > Florian Westphal wrote:
> > > Florian Westphal <fw@strlen.de> wrote:
> > > > ... doesn't solve the nft_hash.c issue (which calls _symmetric version, and
> > > > that uses flow_key definiton that isn't exported outside flow_dissector.o.
> > > 
> > > and here is the diff that would pass net for _symmetric, not too
> > > horrible I think.
> > > 
> > > With that and the copypaste of skb_get_hash into nf_trace infra
> > > netfilter can still pass skbs to the flow dissector with NULL skb->sk,dev
> > > but the WARN would no longer trigger as struct net is non-null.
> > 
> > Thanks for coding this up Florian. This overall looks good to me.
> 
> Thanks for reviewing.
> 
> > One suggested change is to introduce a three underscore variant (yes
> > really) ___skb_get_hash_symmetric that takes the optional net, and
> > leave the existing callers of the two underscore version as is.
> 
> Okay, that reduces the code churn.
> 
> > The copypaste probably belongs with the other flow dissector wrappers
> > in sk_buff.h.
> 
> skb_get_hash(skb);
> __skb_get_hash_symmetric(skb);
> ____skb_get_hash_symmetric(net, skb);
> 
> I named the copypasta as nf_skb_get_hash. If placed in sk_buff.h:
> net_get_hash_net()?
> skb_get_hash()?

Still passing an skb too, so skb_get_hash_net()?
 
> And if either of that exists, maybe then use
> skb_get_hash_symmetric_net(net, skb)

If symmetric is equally good for nft, that would be preferable, as it
avoids the extra function. But I suppose it aliases the two flow
directions, which may be exactly what you don't want?

> or similar?
> 
> (There is no skb_get_hash_symmetric, no idea why it
>  uses __prefix).

Perhaps because it is more closely analogous to __skb_get_hash, than
to skb_get_hash.
Florian Westphal June 6, 2024, 2:38 p.m. UTC | #15
Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote:
> > I named the copypasta as nf_skb_get_hash. If placed in sk_buff.h:
> > net_get_hash_net()?
> > skb_get_hash()?
> 
> Still passing an skb too, so skb_get_hash_net()?

Sounds good to me.

> > And if either of that exists, maybe then use
> > skb_get_hash_symmetric_net(net, skb)
> 
> If symmetric is equally good for nft, that would be preferable, as it
> avoids the extra function. But I suppose it aliases the two flow
> directions, which may be exactly what you don't want?

It would actually be fine, but the more important part is that
skb->hash is set.

For the trace id, I want a stable identifier that won't change
(e.g. when nat rewrites addresses).

This currently works because skb_get_hash computes it at most once.

skb_get_hash_symmetric_net() will be used from nft_hash.c as
__skb_get_hash_symmetric "replacement".

Pablo, you can drop this patch, I will try the 'pass net to dissector'
route.
Willem de Bruijn June 6, 2024, 2:43 p.m. UTC | #16
Florian Westphal wrote:
> Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote:
> > > I named the copypasta as nf_skb_get_hash. If placed in sk_buff.h:
> > > net_get_hash_net()?
> > > skb_get_hash()?
> > 
> > Still passing an skb too, so skb_get_hash_net()?
> 
> Sounds good to me.
> 
> > > And if either of that exists, maybe then use
> > > skb_get_hash_symmetric_net(net, skb)
> > 
> > If symmetric is equally good for nft, that would be preferable, as it
> > avoids the extra function. But I suppose it aliases the two flow
> > directions, which may be exactly what you don't want?
> 
> It would actually be fine, but the more important part is that
> skb->hash is set.
> 
> For the trace id, I want a stable identifier that won't change
> (e.g. when nat rewrites addresses).
> 
> This currently works because skb_get_hash computes it at most once.

Probably not relevant to these skbs, that don't have an skb->sk.

But in case skbs coming from the TCP stack are also in scope: can
sk_rethink_txhash cause problems?
 
> skb_get_hash_symmetric_net() will be used from nft_hash.c as
> __skb_get_hash_symmetric "replacement".
> 
> Pablo, you can drop this patch, I will try the 'pass net to dissector'
> route.
Florian Westphal June 6, 2024, 2:52 p.m. UTC | #17
Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote:
> > This currently works because skb_get_hash computes it at most once.
> 
> Probably not relevant to these skbs, that don't have an skb->sk.
> 
> But in case skbs coming from the TCP stack are also in scope: can
> sk_rethink_txhash cause problems?

No, thats fine for the nf_trace infra.
diff mbox series

Patch

diff --git a/net/ipv4/netfilter/nf_reject_ipv4.c b/net/ipv4/netfilter/nf_reject_ipv4.c
index 04504b2b51df..9333a779eab2 100644
--- a/net/ipv4/netfilter/nf_reject_ipv4.c
+++ b/net/ipv4/netfilter/nf_reject_ipv4.c
@@ -278,6 +278,7 @@  void nf_send_reset(struct net *net, struct sock *sk, struct sk_buff *oldskb,
 	if (nskb->len > dst_mtu(skb_dst(nskb)))
 		goto free_nskb;
 
+	nskb->dev = skb_dst(nskb)->dev;
 	nf_ct_attach(nskb, oldskb);
 	nf_ct_set_closing(skb_nfct(oldskb));
 
diff --git a/net/ipv6/netfilter/nf_reject_ipv6.c b/net/ipv6/netfilter/nf_reject_ipv6.c
index dedee264b8f6..386223311579 100644
--- a/net/ipv6/netfilter/nf_reject_ipv6.c
+++ b/net/ipv6/netfilter/nf_reject_ipv6.c
@@ -334,6 +334,7 @@  void nf_send_reset6(struct net *net, struct sock *sk, struct sk_buff *oldskb,
 		return;
 	}
 
+	nskb->dev = dst->dev;
 	skb_dst_set(nskb, dst);
 
 	nskb->mark = fl6.flowi6_mark;