diff mbox series

[nf] netfilter: nfnetlink_queue: reroute reinjected packets from postrouting

Message ID 20240912185832.11962-1-pablo@netfilter.org
State Not Applicable
Headers show
Series [nf] netfilter: nfnetlink_queue: reroute reinjected packets from postrouting | expand

Commit Message

Pablo Neira Ayuso Sept. 12, 2024, 6:58 p.m. UTC
368982cd7d1b ("netfilter: nfnetlink_queue: resolve clash for unconfirmed
conntracks") adjusts NAT again in case that packet loses race to confirm
the conntrack entry.

The reinject path triggers a route lookup again for the output hook, but
not for the postrouting hook where queue to userspace is also possible.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: Antonio Ojea <antonio.ojea.garcia@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
I tried but I am not managing to make a selftest that runs reliable.
I can reproduce it manually and validate that this works.

./nft_queue -d 1000 helps by introducing a delay of 1000ms in the
userspace queue processing which helps trigger the race more easily,
socat needs to send several packets in the same UDP flow.

@Antonio: Could you try this patch meanwhile there is a testcase for
this.

 net/netfilter/nfnetlink_queue.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Antonio Ojea Sept. 13, 2024, 6:24 a.m. UTC | #1
On Thu, 12 Sept 2024 at 20:58, Pablo Neira Ayuso <pablo@netfilter.org> wrote:
>
> 368982cd7d1b ("netfilter: nfnetlink_queue: resolve clash for unconfirmed
> conntracks") adjusts NAT again in case that packet loses race to confirm
> the conntrack entry.
>
> The reinject path triggers a route lookup again for the output hook, but
> not for the postrouting hook where queue to userspace is also possible.
>
> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> Reported-by: Antonio Ojea <antonio.ojea.garcia@gmail.com>
> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
> ---
> I tried but I am not managing to make a selftest that runs reliable.
> I can reproduce it manually and validate that this works.
>
> ./nft_queue -d 1000 helps by introducing a delay of 1000ms in the
> userspace queue processing which helps trigger the race more easily,
> socat needs to send several packets in the same UDP flow.
>
> @Antonio: Could you try this patch meanwhile there is a testcase for
> this.

Let me test it and report back

>  net/netfilter/nfnetlink_queue.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/net/netfilter/nfnetlink_queue.c b/net/netfilter/nfnetlink_queue.c
> index e0716da256bf..aeb354271e85 100644
> --- a/net/netfilter/nfnetlink_queue.c
> +++ b/net/netfilter/nfnetlink_queue.c
> @@ -276,7 +276,8 @@ static int nf_ip_reroute(struct sk_buff *skb, const struct nf_queue_entry *entry
>  #ifdef CONFIG_INET
>         const struct ip_rt_info *rt_info = nf_queue_entry_reroute(entry);
>
> -       if (entry->state.hook == NF_INET_LOCAL_OUT) {
> +       if (entry->state.hook == NF_INET_LOCAL_OUT ||
> +           entry->state.hook == NF_INET_POST_ROUTING) {
>                 const struct iphdr *iph = ip_hdr(skb);
>
>                 if (!(iph->tos == rt_info->tos &&
> --
> 2.30.2
>
Antonio Ojea Sept. 17, 2024, 10:01 p.m. UTC | #2
On Fri, 13 Sept 2024 at 07:24, Antonio Ojea
<antonio.ojea.garcia@gmail.com> wrote:
>
> On Thu, 12 Sept 2024 at 20:58, Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> >
> > 368982cd7d1b ("netfilter: nfnetlink_queue: resolve clash for unconfirmed
> > conntracks") adjusts NAT again in case that packet loses race to confirm
> > the conntrack entry.
> >
> > The reinject path triggers a route lookup again for the output hook, but
> > not for the postrouting hook where queue to userspace is also possible.
> >
> > Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> > Reported-by: Antonio Ojea <antonio.ojea.garcia@gmail.com>
> > Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
> > ---
> > I tried but I am not managing to make a selftest that runs reliable.
> > I can reproduce it manually and validate that this works.
> >
> > ./nft_queue -d 1000 helps by introducing a delay of 1000ms in the
> > userspace queue processing which helps trigger the race more easily,
> > socat needs to send several packets in the same UDP flow.
> >
> > @Antonio: Could you try this patch meanwhile there is a testcase for
> > this.
>
> Let me test it and report back
>

Ok, I finally managed to get this tested, and it does not seem to
solve the problem, it keeps dnating twice after the packet is enqueued
by nfqueue

See trace obtained with pwru, origin 10.244.0.3, virtual IP of DNS
server 10.96.0.10 that DNATs to 10.244.0.2 and 10.244.0.4

21:44:13.066 0xffff97ff83662280 0   <empty>:3552     2007043994686
10.244.0.3:39492->10.96.0.10:53(udp) nf_checksum
21:44:13.066 0xffff97ff83662280 0   <empty>:3552     2007043995059
10.244.0.3:39492->10.96.0.10:53(udp) nf_ip_checksum
21:44:13.066 0xffff97ff83662280 0   <empty>:3552     2007043995538
10.244.0.3:39492->10.96.0.10:53(udp) nf_nat_ipv4_pre_routing
21:44:13.066 0xffff97ff83662280 0   <empty>:3552     2007043995957
10.244.0.3:39492->10.96.0.10:53(udp) nf_nat_inet_fn
21:44:13.066 0xffff97ff83662280 0   <empty>:3552     2007043996439
10.244.0.3:39492->10.96.0.10:53(udp) nft_nat_do_chain
21:44:13.067 0xffff97ff83662280 0   <empty>:3552     2007043999827
10.244.0.3:39492->10.96.0.10:53(udp) xt_dnat_target_v2
21:44:13.067 0xffff97ff83662280 0   <empty>:3552     2007044000721
10.244.0.3:39492->10.96.0.10:53(udp) nf_nat_manip_pkt
21:44:13.067 0xffff97ff83662280 0   <empty>:3552     2007044023444
10.244.0.3:39492->10.96.0.10:53(udp) nf_nat_ipv4_manip_pkt
21:44:13.067 0xffff97ff83662280 0   <empty>:3552     2007044024162
10.244.0.3:39492->10.96.0.10:53(udp) skb_ensure_writable
21:44:13.068 0xffff97ff83662280 0   <empty>:3552     2007044024819
10.244.0.3:39492->10.96.0.10:53(udp) l4proto_manip_pkt
21:44:13.068 0xffff97ff83662280 0   <empty>:3552     2007044025158
10.244.0.3:39492->10.96.0.10:53(udp) skb_ensure_writable
21:44:13.068 0xffff97ff83662280 0   <empty>:3552     2007044025711
10.244.0.3:39492->10.96.0.10:53(udp) nf_csum_update
21:44:13.068 0xffff97ff83662280 0   <empty>:3552     2007044026381
10.244.0.3:39492->10.96.0.10:53(udp) inet_proto_csum_replace4
21:44:13.068 0xffff97ff83662280 0   <empty>:3552     2007044026730
10.244.0.3:39492->10.96.0.10:53(udp) inet_proto_csum_replace4
21:44:13.069 0xffff97ff83662280 0   <empty>:3552     2007044027433
10.244.0.3:39492->10.244.0.2:53(udp) ip_rcv_finish
21:44:13.069 0xffff97ff83662280 0   <empty>:3552     2007044028188
10.244.0.3:39492->10.244.0.2:53(udp) udp_v4_early_demux
21:44:13.069 0xffff97ff83662280 0   <empty>:3552     2007044029235
10.244.0.3:39492->10.244.0.2:53(udp) ip_route_input_noref
21:44:13.069 0xffff97ff83662280 0   <empty>:3552     2007044029696
10.244.0.3:39492->10.244.0.2:53(udp) ip_route_input_slow
21:44:13.069 0xffff97ff83662280 0   <empty>:3552     2007044030986
10.244.0.3:39492->10.244.0.2:53(udp) fib_validate_source
21:44:13.069 0xffff97ff83662280 0   <empty>:3552     2007044031571
10.244.0.3:39492->10.244.0.2:53(udp) __fib_validate_source
21:44:13.069 0xffff97ff83662280 0   <empty>:3552     2007044032576
10.244.0.3:39492->10.244.0.2:53(udp) ip_forward
21:44:13.069 0xffff97ff83662280 0   <empty>:3552     2007044033236
10.244.0.3:39492->10.244.0.2:53(udp) nf_hook_slow
21:44:13.069 0xffff97ff83662280 0   <empty>:3552     2007044034004
10.244.0.3:39492->10.244.0.2:53(udp) selinux_ip_forward
21:44:13.069 0xffff97ff83662280 0   <empty>:3552     2007044034601
10.244.0.3:39492->10.244.0.2:53(udp) nft_do_chain_ipv4
21:44:13.069 0xffff97ff83662280 0   <empty>:3552     2007044037452
10.244.0.3:39492->10.244.0.2:53(udp) ip_output
21:44:13.069 0xffff97ff83662280 0   <empty>:3552     2007044037796
10.244.0.3:39492->10.244.0.2:53(udp) nf_hook_slow
21:44:13.070 0xffff97ff83662280 0   <empty>:3552     2007044038241
10.244.0.3:39492->10.244.0.2:53(udp) nft_do_chain_inet
21:44:13.070 0xffff97ff83662280 0   <empty>:3552     2007044040343
10.244.0.3:39492->10.244.0.2:53(udp) nf_queue
--- snipped other skbs ---
21:44:13.149 0xffff97ff83662280 0   <empty>:1463     2007052515236
10.244.0.3:39492->10.244.0.2:53(udp) nf_conntrack_update
21:44:13.149 0xffff97ff83662280 0   <empty>:1463     2007052538616
10.244.0.3:39492->10.244.0.2:53(udp) nf_nat_manip_pkt
21:44:13.149 0xffff97ff83662280 0   <empty>:1463     2007052539511
10.244.0.3:39492->10.244.0.2:53(udp) nf_nat_ipv4_manip_pkt
21:44:13.150 0xffff97ff83662280 0   <empty>:1463     2007052540123
10.244.0.3:39492->10.244.0.2:53(udp) skb_ensure_writable
21:44:13.150 0xffff97ff83662280 0   <empty>:1463     2007052540589
10.244.0.3:39492->10.244.0.2:53(udp) l4proto_manip_pkt
21:44:13.150 0xffff97ff83662280 0   <empty>:1463     2007052540875
10.244.0.3:39492->10.244.0.2:53(udp) skb_ensure_writable
21:44:13.150 0xffff97ff83662280 0   <empty>:1463     2007052541326
10.244.0.3:39492->10.244.0.2:53(udp) nf_csum_update
21:44:13.151 0xffff97ff83662280 0   <empty>:1463     2007052541944
10.244.0.3:39492->10.244.0.2:53(udp) inet_proto_csum_replace4
21:44:13.151 0xffff97ff83662280 0   <empty>:1463     2007052542259
10.244.0.3:39492->10.244.0.2:53(udp) inet_proto_csum_replace4  <<<<
DNATed twice
21:44:13.151 0xffff97ff83662280 0   <empty>:1463     2007052543321
10.244.0.3:39492->10.244.0.4:53(udp) ip_route_me_harder
21:44:13.151 0xffff97ff83662280 0   <empty>:1463     2007052545374
10.244.0.3:39492->10.244.0.4:53(udp) __xfrm_decode_session
21:44:13.151 0xffff97ff83662280 0   <empty>:1463     2007052546324
10.244.0.3:39492->10.244.0.4:53(udp) nf_nat_ipv4_out
21:44:13.151 0xffff97ff83662280 0   <empty>:1463     2007052546676
10.244.0.3:39492->10.244.0.4:53(udp) nf_nat_inet_fn
21:44:13.152 0xffff97ff83662280 0   <empty>:1463     2007052547186
10.244.0.3:39492->10.244.0.4:53(udp) selinux_ip_postroute
21:44:13.152 0xffff97ff83662280 0   <empty>:1463     2007052547732
10.244.0.3:39492->10.244.0.4:53(udp) selinux_ip_postroute_compat
21:44:13.152 0xffff97ff83662280 0   <empty>:1463     2007052548217
10.244.0.3:39492->10.244.0.4:53(udp) nf_confirm
21:44:13.152 0xffff97ff83662280 0   <empty>:1463     2007052548744
10.244.0.3:39492->10.244.0.4:53(udp) ip_finish_output
21:44:13.152 0xffff97ff83662280 0   <empty>:1463     2007052549162
10.244.0.3:39492->10.244.0.4:53(udp) __ip_finish_output
21:44:13.152 0xffff97ff83662280 0   <empty>:1463     2007052549614
10.244.0.3:39492->10.244.0.4:53(udp) ip_finish_output2
21:44:13.152 0xffff97ff83662280 0   <empty>:1463     2007052550159
10.244.0.3:39492->10.244.0.4:53(udp) __dev_queue_xmit
21:44:13.152 0xffff97ff83662280 0   <empty>:1463     2007052550656
10.244.0.3:39492->10.244.0.4:53(udp) netdev_core_pick_tx
21:44:13.152 0xffff97ff83662280 0   <empty>:1463     2007052551436
10.244.0.3:39492->10.244.0.4:53(udp) validate_xmit_skb
21:44:13.152 0xffff97ff83662280 0   <empty>:1463     2007052551882
10.244.0.3:39492->10.244.0.4:53(udp) netif_skb_features
21:44:13.153 0xffff97ff83662280 0   <empty>:1463     2007052552291
10.244.0.3:39492->10.244.0.4:53(udp) passthru_features_check
21:44:13.153 0xffff97ff83662280 0   <empty>:1463     2007052552672
10.244.0.3:39492->10.244.0.4:53(udp) skb_network_protocol
21:44:13.153 0xffff97ff83662280 0   <empty>:1463     2007052553191
10.244.0.3:39492->10.244.0.4:53(udp) skb_csum_hwoffload_help
21:44:13.154 0xffff97ff83662280 0   <empty>:1463     2007052553566
10.244.0.3:39492->10.244.0.4:53(udp) validate_xmit_xfrm
21:44:13.155 0xffff97ff83662280 0   <empty>:1463     2007052554026
10.244.0.3:39492->10.244.0.4:53(udp) dev_hard_start_xmit
21:44:13.155 0xffff97ff83662280 0   <empty>:1463     2007052554482
10.244.0.3:39492->10.244.0.4:53(udp) veth_xmit
21:44:13.155 0xffff97ff83662280 0   <empty>:1463     2007052555156
10.244.0.3:39492->10.244.0.4:53(udp) __dev_forward_skb
21:44:13.155 0xffff97ff83662280 0   <empty>:1463     2007052555604
10.244.0.3:39492->10.244.0.4:53(udp) __dev_forward_skb2
21:44:13.155 0xffff97ff83662280 0   <empty>:1463     2007052556045
10.244.0.3:39492->10.244.0.4:53(udp) skb_scrub_packet
21:44:13.156 0xffff97ff83662280 0   <empty>:1463     2007052556449
10.244.0.3:39492->10.244.0.4:53(udp) eth_type_trans
21:44:13.156 0xffff97ff83662280 0   <empty>:1463     2007052557536
10.244.0.3:39492->10.244.0.4:53(udp) __netif_rx
21:44:13.156 0xffff97ff83662280 0   <empty>:1463     2007052559424
10.244.0.3:39492->10.244.0.4:53(udp) netif_rx_internal
21:44:13.156 0xffff97ff83662280 0   <empty>:1463     2007052559872
10.244.0.3:39492->10.244.0.4:53(udp) enqueue_to_backlog
21:44:13.156 0xffff97ff83662280 0   <empty>:1463     2007052560827
10.244.0.3:39492->10.244.0.4:53(udp) __netif_receive_skb_one_core
21:44:13.156 0xffff97ff83662280 0   <empty>:1463     2007052561410
10.244.0.3:39492->10.244.0.4:53(udp) ip_rcv
21:44:13.156 0xffff97ff83662280 0   <empty>:1463     2007052561845
10.244.0.3:39492->10.244.0.4:53(udp) ip_rcv_core
21:44:13.156 0xffff97ff83662280 0   <empty>:1463     2007052564056
10.244.0.3:39492->10.244.0.4:53(udp)
kfree_skb_reason(SKB_DROP_REASON_OTHERHOST)




> >  net/netfilter/nfnetlink_queue.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/net/netfilter/nfnetlink_queue.c b/net/netfilter/nfnetlink_queue.c
> > index e0716da256bf..aeb354271e85 100644
> > --- a/net/netfilter/nfnetlink_queue.c
> > +++ b/net/netfilter/nfnetlink_queue.c
> > @@ -276,7 +276,8 @@ static int nf_ip_reroute(struct sk_buff *skb, const struct nf_queue_entry *entry
> >  #ifdef CONFIG_INET
> >         const struct ip_rt_info *rt_info = nf_queue_entry_reroute(entry);
> >
> > -       if (entry->state.hook == NF_INET_LOCAL_OUT) {
> > +       if (entry->state.hook == NF_INET_LOCAL_OUT ||
> > +           entry->state.hook == NF_INET_POST_ROUTING) {
> >                 const struct iphdr *iph = ip_hdr(skb);
> >
> >                 if (!(iph->tos == rt_info->tos &&
> > --
> > 2.30.2
> >
Pablo Neira Ayuso Sept. 18, 2024, 8:30 a.m. UTC | #3
Hi Antonio,

On Tue, Sep 17, 2024 at 11:01:31PM +0100, Antonio Ojea wrote:
> On Fri, 13 Sept 2024 at 07:24, Antonio Ojea
> <antonio.ojea.garcia@gmail.com> wrote:
> >
> > On Thu, 12 Sept 2024 at 20:58, Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > >
> > > 368982cd7d1b ("netfilter: nfnetlink_queue: resolve clash for unconfirmed
> > > conntracks") adjusts NAT again in case that packet loses race to confirm
> > > the conntrack entry.
> > >
> > > The reinject path triggers a route lookup again for the output hook, but
> > > not for the postrouting hook where queue to userspace is also possible.
> > >
> > > Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> > > Reported-by: Antonio Ojea <antonio.ojea.garcia@gmail.com>
> > > Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
> > > ---
> > > I tried but I am not managing to make a selftest that runs reliable.
> > > I can reproduce it manually and validate that this works.
> > >
> > > ./nft_queue -d 1000 helps by introducing a delay of 1000ms in the
> > > userspace queue processing which helps trigger the race more easily,
> > > socat needs to send several packets in the same UDP flow.
> > >
> > > @Antonio: Could you try this patch meanwhile there is a testcase for
> > > this.
> >
> > Let me test it and report back
> >
> 
> Ok, I finally managed to get this tested, and it does not seem to
> solve the problem, it keeps dnating twice after the packet is enqueued
> by nfqueue

dnatting twice is required to deal with the conntrack confirmation race.

packet 1 enters prerouting, dnat is done using IP A as destination
packet 2 enters prerouting, dnat is done using IP B as destination
packet 1 is enqueued to userspace from postrouting, with unconfirmed conntrack
packet 2 is enqueued to userspace from postrouting, with unconfirmed conntrack
packet 1 is reinjected back to kernelspace from postrouting, conntrack
is confirmed, it uses IP A as destination.
*packet 2* is reinjected back to kernelspace from postrouting, ct lookup
tells us packet lost race, unconfirmed conntrack is dropped, update
mangling to use IP A for consistency (because packet 2 is using IP).

So far we have been assuming races with conntrack confirmation are
unlikely, thus, this is scenario is handled as a corner case which
requires this double mangling.

You still see packets being dropped, right? That should not happen,
I might be then missing something else because my patch triggers also
a re-routing which is required because packet 2 was finally mangled
to use IP A while route still points to IP B.
Florian Westphal Sept. 18, 2024, 8:42 a.m. UTC | #4
Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > Ok, I finally managed to get this tested, and it does not seem to
> > solve the problem, it keeps dnating twice after the packet is enqueued
> > by nfqueue
> 
> dnatting twice is required to deal with the conntrack confirmation race.

Antonio could also try this hack:

diff --git a/net/netfilter/nfnetlink_queue.c b/net/netfilter/nfnetlink_queue.c
--- a/net/netfilter/nfnetlink_queue.c
+++ b/net/netfilter/nfnetlink_queue.c
@@ -379,7 +379,7 @@ static void nfqnl_reinject(struct nf_queue_entry *entry, unsigned int verdict)
                unsigned int ct_verdict = verdict;
 
                rcu_read_lock();
-               ct_hook = rcu_dereference(nf_ct_hook);
+               ct_hook = NULL;
                if (ct_hook)
                        ct_verdict = ct_hook->update(entry->state.net, entry->skb);
                rcu_read_unlock();

which defers this to the clash resolution logic.
The ct_hook->update infra predates this, I'm not sure we need
it anymore.
Antonio Ojea Sept. 18, 2024, 9:51 a.m. UTC | #5
On Wed, 18 Sept 2024 at 09:42, Florian Westphal <fw@strlen.de> wrote:
>
> Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > > Ok, I finally managed to get this tested, and it does not seem to
> > > solve the problem, it keeps dnating twice after the packet is enqueued
> > > by nfqueue
> >
> > dnatting twice is required to deal with the conntrack confirmation race.
>
> Antonio could also try this hack:
>
> diff --git a/net/netfilter/nfnetlink_queue.c b/net/netfilter/nfnetlink_queue.c
> --- a/net/netfilter/nfnetlink_queue.c
> +++ b/net/netfilter/nfnetlink_queue.c
> @@ -379,7 +379,7 @@ static void nfqnl_reinject(struct nf_queue_entry *entry, unsigned int verdict)
>                 unsigned int ct_verdict = verdict;
>
>                 rcu_read_lock();
> -               ct_hook = rcu_dereference(nf_ct_hook);
> +               ct_hook = NULL;
>                 if (ct_hook)
>                         ct_verdict = ct_hook->update(entry->state.net, entry->skb);
>                 rcu_read_unlock();
>
> which defers this to the clash resolution logic.
> The ct_hook->update infra predates this, I'm not sure we need
> it anymore.

Awesome, it works perfectly

I have these patches in addition to this last one

c3d69b2c40bb selftests: netfilter: add reverse-clash resolution test case
fd7c45a0aa7a netfilter: conntrack: add clash resolution for reverse collisions
8bb12723d1c4 netfilter: nf_nat: don't try nat source port reallocation
for reverse dir clash

It works with and without 610cea0d00f8 netfilter: nfnetlink_queue:
reroute reinjected packets from postrouting
Florian Westphal Sept. 18, 2024, 9:54 a.m. UTC | #6
Antonio Ojea <antonio.ojea.garcia@gmail.com> wrote:
> > Antonio could also try this hack:
> >
> > diff --git a/net/netfilter/nfnetlink_queue.c b/net/netfilter/nfnetlink_queue.c
> > --- a/net/netfilter/nfnetlink_queue.c
> > +++ b/net/netfilter/nfnetlink_queue.c
> > @@ -379,7 +379,7 @@ static void nfqnl_reinject(struct nf_queue_entry *entry, unsigned int verdict)
> >                 unsigned int ct_verdict = verdict;
> >
> >                 rcu_read_lock();
> > -               ct_hook = rcu_dereference(nf_ct_hook);
> > +               ct_hook = NULL;
> >                 if (ct_hook)
> >                         ct_verdict = ct_hook->update(entry->state.net, entry->skb);
> >                 rcu_read_unlock();
> >
> > which defers this to the clash resolution logic.
> > The ct_hook->update infra predates this, I'm not sure we need
> > it anymore.
> 
> Awesome, it works perfectly

Great, I'll send a formal patch.
Pablo Neira Ayuso Sept. 18, 2024, 8:53 p.m. UTC | #7
On Tue, Sep 17, 2024 at 11:01:31PM +0100, Antonio Ojea wrote:
> On Fri, 13 Sept 2024 at 07:24, Antonio Ojea
> <antonio.ojea.garcia@gmail.com> wrote:
> >
> > On Thu, 12 Sept 2024 at 20:58, Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > >
> > > 368982cd7d1b ("netfilter: nfnetlink_queue: resolve clash for unconfirmed
> > > conntracks") adjusts NAT again in case that packet loses race to confirm
> > > the conntrack entry.
> > >
> > > The reinject path triggers a route lookup again for the output hook, but
> > > not for the postrouting hook where queue to userspace is also possible.
> > >
> > > Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> > > Reported-by: Antonio Ojea <antonio.ojea.garcia@gmail.com>
> > > Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
> > > ---
> > > I tried but I am not managing to make a selftest that runs reliable.
> > > I can reproduce it manually and validate that this works.
> > >
> > > ./nft_queue -d 1000 helps by introducing a delay of 1000ms in the
> > > userspace queue processing which helps trigger the race more easily,
> > > socat needs to send several packets in the same UDP flow.
> > >
> > > @Antonio: Could you try this patch meanwhile there is a testcase for
> > > this.
> >
> > Let me test it and report back
> >
> 
> Ok, I finally managed to get this tested, and it does not seem to
> solve the problem, it keeps dnating twice after the packet is enqueued
> by nfqueue

I really don't understand why my patch did not work.

In this new test run you have use postrouting/filter chain, but it
does not call nf_reroute as in the previous trace.

Does program issues the NF_STOP verdict instead?

[...]
> 10.244.0.3:39492->10.244.0.2:53(udp) ip_output
> 21:44:13.069 0xffff97ff83662280 0   <empty>:3552     2007044037796
> 10.244.0.3:39492->10.244.0.2:53(udp) nf_hook_slow
> 21:44:13.070 0xffff97ff83662280 0   <empty>:3552     2007044038241
> 10.244.0.3:39492->10.244.0.2:53(udp) nft_do_chain_inet
> 21:44:13.070 0xffff97ff83662280 0   <empty>:3552     2007044040343
> 10.244.0.3:39492->10.244.0.2:53(udp) nf_queue
> --- snipped other skbs ---
> 21:44:13.149 0xffff97ff83662280 0   <empty>:1463     2007052515236
> 10.244.0.3:39492->10.244.0.2:53(udp) nf_conntrack_update
> 21:44:13.149 0xffff97ff83662280 0   <empty>:1463     2007052538616
> 10.244.0.3:39492->10.244.0.2:53(udp) nf_nat_manip_pkt
> 21:44:13.149 0xffff97ff83662280 0   <empty>:1463     2007052539511
> 10.244.0.3:39492->10.244.0.2:53(udp) nf_nat_ipv4_manip_pkt
> 21:44:13.150 0xffff97ff83662280 0   <empty>:1463     2007052540123
> 10.244.0.3:39492->10.244.0.2:53(udp) skb_ensure_writable
> 21:44:13.150 0xffff97ff83662280 0   <empty>:1463     2007052540589
> 10.244.0.3:39492->10.244.0.2:53(udp) l4proto_manip_pkt
> 21:44:13.150 0xffff97ff83662280 0   <empty>:1463     2007052540875
> 10.244.0.3:39492->10.244.0.2:53(udp) skb_ensure_writable
> 21:44:13.150 0xffff97ff83662280 0   <empty>:1463     2007052541326
> 10.244.0.3:39492->10.244.0.2:53(udp) nf_csum_update
> 21:44:13.151 0xffff97ff83662280 0   <empty>:1463     2007052541944
> 10.244.0.3:39492->10.244.0.2:53(udp) inet_proto_csum_replace4
> 21:44:13.151 0xffff97ff83662280 0   <empty>:1463     2007052542259
> 10.244.0.3:39492->10.244.0.2:53(udp) inet_proto_csum_replace4  <<<<
> DNATed twice
> 21:44:13.151 0xffff97ff83662280 0   <empty>:1463     2007052543321
> 10.244.0.3:39492->10.244.0.4:53(udp) ip_route_me_harder
> 21:44:13.151 0xffff97ff83662280 0   <empty>:1463     2007052545374
> 10.244.0.3:39492->10.244.0.4:53(udp) __xfrm_decode_session
> 21:44:13.151 0xffff97ff83662280 0   <empty>:1463     2007052546324
> 10.244.0.3:39492->10.244.0.4:53(udp) nf_nat_ipv4_out
> 21:44:13.151 0xffff97ff83662280 0   <empty>:1463     2007052546676
> 10.244.0.3:39492->10.244.0.4:53(udp) nf_nat_inet_fn
> 21:44:13.152 0xffff97ff83662280 0   <empty>:1463     2007052547186
> 10.244.0.3:39492->10.244.0.4:53(udp) selinux_ip_postroute
> 21:44:13.152 0xffff97ff83662280 0   <empty>:1463     2007052547732
> 10.244.0.3:39492->10.244.0.4:53(udp) selinux_ip_postroute_compat
> 21:44:13.152 0xffff97ff83662280 0   <empty>:1463     2007052548217
> 10.244.0.3:39492->10.244.0.4:53(udp) nf_confirm
> 21:44:13.152 0xffff97ff83662280 0   <empty>:1463     2007052548744
> 10.244.0.3:39492->10.244.0.4:53(udp) ip_finish_output
> 21:44:13.152 0xffff97ff83662280 0   <empty>:1463     2007052549162
> 10.244.0.3:39492->10.244.0.4:53(udp) __ip_finish_output
> 21:44:13.152 0xffff97ff83662280 0   <empty>:1463     2007052549614
> 10.244.0.3:39492->10.244.0.4:53(udp) ip_finish_output2
> 21:44:13.152 0xffff97ff83662280 0   <empty>:1463     2007052550159
> 10.244.0.3:39492->10.244.0.4:53(udp) __dev_queue_xmit
> 21:44:13.152 0xffff97ff83662280 0   <empty>:1463     2007052550656
> 10.244.0.3:39492->10.244.0.4:53(udp) netdev_core_pick_tx
> 21:44:13.152 0xffff97ff83662280 0   <empty>:1463     2007052551436
> 10.244.0.3:39492->10.244.0.4:53(udp) validate_xmit_skb
> 21:44:13.152 0xffff97ff83662280 0   <empty>:1463     2007052551882
> 10.244.0.3:39492->10.244.0.4:53(udp) netif_skb_features
> 21:44:13.153 0xffff97ff83662280 0   <empty>:1463     2007052552291
> 10.244.0.3:39492->10.244.0.4:53(udp) passthru_features_check
> 21:44:13.153 0xffff97ff83662280 0   <empty>:1463     2007052552672
> 10.244.0.3:39492->10.244.0.4:53(udp) skb_network_protocol
> 21:44:13.153 0xffff97ff83662280 0   <empty>:1463     2007052553191
> 10.244.0.3:39492->10.244.0.4:53(udp) skb_csum_hwoffload_help
> 21:44:13.154 0xffff97ff83662280 0   <empty>:1463     2007052553566
> 10.244.0.3:39492->10.244.0.4:53(udp) validate_xmit_xfrm
> 21:44:13.155 0xffff97ff83662280 0   <empty>:1463     2007052554026
> 10.244.0.3:39492->10.244.0.4:53(udp) dev_hard_start_xmit
> 21:44:13.155 0xffff97ff83662280 0   <empty>:1463     2007052554482
> 10.244.0.3:39492->10.244.0.4:53(udp) veth_xmit
> 21:44:13.155 0xffff97ff83662280 0   <empty>:1463     2007052555156
> 10.244.0.3:39492->10.244.0.4:53(udp) __dev_forward_skb
> 21:44:13.155 0xffff97ff83662280 0   <empty>:1463     2007052555604
> 10.244.0.3:39492->10.244.0.4:53(udp) __dev_forward_skb2
> 21:44:13.155 0xffff97ff83662280 0   <empty>:1463     2007052556045
> 10.244.0.3:39492->10.244.0.4:53(udp) skb_scrub_packet
> 21:44:13.156 0xffff97ff83662280 0   <empty>:1463     2007052556449
> 10.244.0.3:39492->10.244.0.4:53(udp) eth_type_trans
> 21:44:13.156 0xffff97ff83662280 0   <empty>:1463     2007052557536
> 10.244.0.3:39492->10.244.0.4:53(udp) __netif_rx
> 21:44:13.156 0xffff97ff83662280 0   <empty>:1463     2007052559424
> 10.244.0.3:39492->10.244.0.4:53(udp) netif_rx_internal
> 21:44:13.156 0xffff97ff83662280 0   <empty>:1463     2007052559872
> 10.244.0.3:39492->10.244.0.4:53(udp) enqueue_to_backlog
> 21:44:13.156 0xffff97ff83662280 0   <empty>:1463     2007052560827
> 10.244.0.3:39492->10.244.0.4:53(udp) __netif_receive_skb_one_core
> 21:44:13.156 0xffff97ff83662280 0   <empty>:1463     2007052561410
> 10.244.0.3:39492->10.244.0.4:53(udp) ip_rcv
> 21:44:13.156 0xffff97ff83662280 0   <empty>:1463     2007052561845
> 10.244.0.3:39492->10.244.0.4:53(udp) ip_rcv_core
> 21:44:13.156 0xffff97ff83662280 0   <empty>:1463     2007052564056
> 10.244.0.3:39492->10.244.0.4:53(udp)
> kfree_skb_reason(SKB_DROP_REASON_OTHERHOST)
Antonio Ojea Sept. 18, 2024, 9:42 p.m. UTC | #8
On Wed, 18 Sept 2024 at 21:53, Pablo Neira Ayuso <pablo@netfilter.org> wrote:
>
> On Tue, Sep 17, 2024 at 11:01:31PM +0100, Antonio Ojea wrote:
> > On Fri, 13 Sept 2024 at 07:24, Antonio Ojea
> > <antonio.ojea.garcia@gmail.com> wrote:
> > >
> > > On Thu, 12 Sept 2024 at 20:58, Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > > >
> > > > 368982cd7d1b ("netfilter: nfnetlink_queue: resolve clash for unconfirmed
> > > > conntracks") adjusts NAT again in case that packet loses race to confirm
> > > > the conntrack entry.
> > > >
> > > > The reinject path triggers a route lookup again for the output hook, but
> > > > not for the postrouting hook where queue to userspace is also possible.
> > > >
> > > > Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> > > > Reported-by: Antonio Ojea <antonio.ojea.garcia@gmail.com>
> > > > Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
> > > > ---
> > > > I tried but I am not managing to make a selftest that runs reliable.
> > > > I can reproduce it manually and validate that this works.
> > > >
> > > > ./nft_queue -d 1000 helps by introducing a delay of 1000ms in the
> > > > userspace queue processing which helps trigger the race more easily,
> > > > socat needs to send several packets in the same UDP flow.
> > > >
> > > > @Antonio: Could you try this patch meanwhile there is a testcase for
> > > > this.
> > >
> > > Let me test it and report back
> > >
> >
> > Ok, I finally managed to get this tested, and it does not seem to
> > solve the problem, it keeps dnating twice after the packet is enqueued
> > by nfqueue
>
> I really don't understand why my patch did not work.
>
> In this new test run you have use postrouting/filter chain, but it
> does not call nf_reroute as in the previous trace.
>
> Does program issues the NF_STOP verdict instead?
>

The program is the same and the environment the same, I just change
the kernel of the VM.
I have observed different drops reasons, see my original comment
https://bugzilla.netfilter.org/show_bug.cgi?id=1766#c0 has
SKB_DROP_REASON_IP_RPFILTER and SKB_DROP_REASON_NEIGH_FAILED, and in
this trace we have SKB_DROP_REASON_OTHERHOST ... , can it be possible
you fixed one scenario with your patch but not the others?

> [...]
> > 10.244.0.3:39492->10.244.0.2:53(udp) ip_output
> > 21:44:13.069 0xffff97ff83662280 0   <empty>:3552     2007044037796
> > 10.244.0.3:39492->10.244.0.2:53(udp) nf_hook_slow
> > 21:44:13.070 0xffff97ff83662280 0   <empty>:3552     2007044038241
> > 10.244.0.3:39492->10.244.0.2:53(udp) nft_do_chain_inet
> > 21:44:13.070 0xffff97ff83662280 0   <empty>:3552     2007044040343
> > 10.244.0.3:39492->10.244.0.2:53(udp) nf_queue
> > --- snipped other skbs ---
> > 21:44:13.149 0xffff97ff83662280 0   <empty>:1463     2007052515236
> > 10.244.0.3:39492->10.244.0.2:53(udp) nf_conntrack_update
> > 21:44:13.149 0xffff97ff83662280 0   <empty>:1463     2007052538616
> > 10.244.0.3:39492->10.244.0.2:53(udp) nf_nat_manip_pkt
> > 21:44:13.149 0xffff97ff83662280 0   <empty>:1463     2007052539511
> > 10.244.0.3:39492->10.244.0.2:53(udp) nf_nat_ipv4_manip_pkt
> > 21:44:13.150 0xffff97ff83662280 0   <empty>:1463     2007052540123
> > 10.244.0.3:39492->10.244.0.2:53(udp) skb_ensure_writable
> > 21:44:13.150 0xffff97ff83662280 0   <empty>:1463     2007052540589
> > 10.244.0.3:39492->10.244.0.2:53(udp) l4proto_manip_pkt
> > 21:44:13.150 0xffff97ff83662280 0   <empty>:1463     2007052540875
> > 10.244.0.3:39492->10.244.0.2:53(udp) skb_ensure_writable
> > 21:44:13.150 0xffff97ff83662280 0   <empty>:1463     2007052541326
> > 10.244.0.3:39492->10.244.0.2:53(udp) nf_csum_update
> > 21:44:13.151 0xffff97ff83662280 0   <empty>:1463     2007052541944
> > 10.244.0.3:39492->10.244.0.2:53(udp) inet_proto_csum_replace4
> > 21:44:13.151 0xffff97ff83662280 0   <empty>:1463     2007052542259
> > 10.244.0.3:39492->10.244.0.2:53(udp) inet_proto_csum_replace4  <<<<
> > DNATed twice
> > 21:44:13.151 0xffff97ff83662280 0   <empty>:1463     2007052543321
> > 10.244.0.3:39492->10.244.0.4:53(udp) ip_route_me_harder
> > 21:44:13.151 0xffff97ff83662280 0   <empty>:1463     2007052545374
> > 10.244.0.3:39492->10.244.0.4:53(udp) __xfrm_decode_session
> > 21:44:13.151 0xffff97ff83662280 0   <empty>:1463     2007052546324
> > 10.244.0.3:39492->10.244.0.4:53(udp) nf_nat_ipv4_out
> > 21:44:13.151 0xffff97ff83662280 0   <empty>:1463     2007052546676
> > 10.244.0.3:39492->10.244.0.4:53(udp) nf_nat_inet_fn
> > 21:44:13.152 0xffff97ff83662280 0   <empty>:1463     2007052547186
> > 10.244.0.3:39492->10.244.0.4:53(udp) selinux_ip_postroute
> > 21:44:13.152 0xffff97ff83662280 0   <empty>:1463     2007052547732
> > 10.244.0.3:39492->10.244.0.4:53(udp) selinux_ip_postroute_compat
> > 21:44:13.152 0xffff97ff83662280 0   <empty>:1463     2007052548217
> > 10.244.0.3:39492->10.244.0.4:53(udp) nf_confirm
> > 21:44:13.152 0xffff97ff83662280 0   <empty>:1463     2007052548744
> > 10.244.0.3:39492->10.244.0.4:53(udp) ip_finish_output
> > 21:44:13.152 0xffff97ff83662280 0   <empty>:1463     2007052549162
> > 10.244.0.3:39492->10.244.0.4:53(udp) __ip_finish_output
> > 21:44:13.152 0xffff97ff83662280 0   <empty>:1463     2007052549614
> > 10.244.0.3:39492->10.244.0.4:53(udp) ip_finish_output2
> > 21:44:13.152 0xffff97ff83662280 0   <empty>:1463     2007052550159
> > 10.244.0.3:39492->10.244.0.4:53(udp) __dev_queue_xmit
> > 21:44:13.152 0xffff97ff83662280 0   <empty>:1463     2007052550656
> > 10.244.0.3:39492->10.244.0.4:53(udp) netdev_core_pick_tx
> > 21:44:13.152 0xffff97ff83662280 0   <empty>:1463     2007052551436
> > 10.244.0.3:39492->10.244.0.4:53(udp) validate_xmit_skb
> > 21:44:13.152 0xffff97ff83662280 0   <empty>:1463     2007052551882
> > 10.244.0.3:39492->10.244.0.4:53(udp) netif_skb_features
> > 21:44:13.153 0xffff97ff83662280 0   <empty>:1463     2007052552291
> > 10.244.0.3:39492->10.244.0.4:53(udp) passthru_features_check
> > 21:44:13.153 0xffff97ff83662280 0   <empty>:1463     2007052552672
> > 10.244.0.3:39492->10.244.0.4:53(udp) skb_network_protocol
> > 21:44:13.153 0xffff97ff83662280 0   <empty>:1463     2007052553191
> > 10.244.0.3:39492->10.244.0.4:53(udp) skb_csum_hwoffload_help
> > 21:44:13.154 0xffff97ff83662280 0   <empty>:1463     2007052553566
> > 10.244.0.3:39492->10.244.0.4:53(udp) validate_xmit_xfrm
> > 21:44:13.155 0xffff97ff83662280 0   <empty>:1463     2007052554026
> > 10.244.0.3:39492->10.244.0.4:53(udp) dev_hard_start_xmit
> > 21:44:13.155 0xffff97ff83662280 0   <empty>:1463     2007052554482
> > 10.244.0.3:39492->10.244.0.4:53(udp) veth_xmit
> > 21:44:13.155 0xffff97ff83662280 0   <empty>:1463     2007052555156
> > 10.244.0.3:39492->10.244.0.4:53(udp) __dev_forward_skb
> > 21:44:13.155 0xffff97ff83662280 0   <empty>:1463     2007052555604
> > 10.244.0.3:39492->10.244.0.4:53(udp) __dev_forward_skb2
> > 21:44:13.155 0xffff97ff83662280 0   <empty>:1463     2007052556045
> > 10.244.0.3:39492->10.244.0.4:53(udp) skb_scrub_packet
> > 21:44:13.156 0xffff97ff83662280 0   <empty>:1463     2007052556449
> > 10.244.0.3:39492->10.244.0.4:53(udp) eth_type_trans
> > 21:44:13.156 0xffff97ff83662280 0   <empty>:1463     2007052557536
> > 10.244.0.3:39492->10.244.0.4:53(udp) __netif_rx
> > 21:44:13.156 0xffff97ff83662280 0   <empty>:1463     2007052559424
> > 10.244.0.3:39492->10.244.0.4:53(udp) netif_rx_internal
> > 21:44:13.156 0xffff97ff83662280 0   <empty>:1463     2007052559872
> > 10.244.0.3:39492->10.244.0.4:53(udp) enqueue_to_backlog
> > 21:44:13.156 0xffff97ff83662280 0   <empty>:1463     2007052560827
> > 10.244.0.3:39492->10.244.0.4:53(udp) __netif_receive_skb_one_core
> > 21:44:13.156 0xffff97ff83662280 0   <empty>:1463     2007052561410
> > 10.244.0.3:39492->10.244.0.4:53(udp) ip_rcv
> > 21:44:13.156 0xffff97ff83662280 0   <empty>:1463     2007052561845
> > 10.244.0.3:39492->10.244.0.4:53(udp) ip_rcv_core
> > 21:44:13.156 0xffff97ff83662280 0   <empty>:1463     2007052564056
> > 10.244.0.3:39492->10.244.0.4:53(udp)
> > kfree_skb_reason(SKB_DROP_REASON_OTHERHOST)
Pablo Neira Ayuso Sept. 19, 2024, 12:02 p.m. UTC | #9
On Wed, Sep 18, 2024 at 10:42:25PM +0100, Antonio Ojea wrote:
> On Wed, 18 Sept 2024 at 21:53, Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> >
> > On Tue, Sep 17, 2024 at 11:01:31PM +0100, Antonio Ojea wrote:
> > > On Fri, 13 Sept 2024 at 07:24, Antonio Ojea
> > > <antonio.ojea.garcia@gmail.com> wrote:
> > > >
> > > > On Thu, 12 Sept 2024 at 20:58, Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > > > >
> > > > > 368982cd7d1b ("netfilter: nfnetlink_queue: resolve clash for unconfirmed
> > > > > conntracks") adjusts NAT again in case that packet loses race to confirm
> > > > > the conntrack entry.
> > > > >
> > > > > The reinject path triggers a route lookup again for the output hook, but
> > > > > not for the postrouting hook where queue to userspace is also possible.
> > > > >
> > > > > Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> > > > > Reported-by: Antonio Ojea <antonio.ojea.garcia@gmail.com>
> > > > > Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
> > > > > ---
> > > > > I tried but I am not managing to make a selftest that runs reliable.
> > > > > I can reproduce it manually and validate that this works.
> > > > >
> > > > > ./nft_queue -d 1000 helps by introducing a delay of 1000ms in the
> > > > > userspace queue processing which helps trigger the race more easily,
> > > > > socat needs to send several packets in the same UDP flow.
> > > > >
> > > > > @Antonio: Could you try this patch meanwhile there is a testcase for
> > > > > this.
> > > >
> > > > Let me test it and report back
> > > >
> > >
> > > Ok, I finally managed to get this tested, and it does not seem to
> > > solve the problem, it keeps dnating twice after the packet is enqueued
> > > by nfqueue
> >
> > I really don't understand why my patch did not work.
> >
> > In this new test run you have use postrouting/filter chain, but it
> > does not call nf_reroute as in the previous trace.
> >
> > Does program issues the NF_STOP verdict instead?
> >
> 
> The program is the same and the environment the same, I just change
> the kernel of the VM.
> I have observed different drops reasons, see my original comment
> https://bugzilla.netfilter.org/show_bug.cgi?id=1766#c0 has
> SKB_DROP_REASON_IP_RPFILTER and SKB_DROP_REASON_NEIGH_FAILED, and in
> this trace we have SKB_DROP_REASON_OTHERHOST ... , can it be possible
> you fixed one scenario with your patch but not the others?

It could be different scenario. I was expecting consistency in UDP packet
distribution is a requirement, but I understood goal at this stage is
to ensure packets are not dropped while dealing with clash resolution.

I have applied Florian's patch to nf.git, thanks.
Antonio Ojea Oct. 6, 2024, 2:44 p.m. UTC | #10
>
> It could be different scenario. I was expecting consistency in UDP packet
> distribution is a requirement, but I understood goal at this stage is
> to ensure packets are not dropped while dealing with clash resolution.
>
> I have applied Florian's patch to nf.git, thanks.

Is there a workaround I can apply in the meantime? kernels fixes take
a long time to be on users' distros and I have continuous reports
about this problem.

I was thinking that I can track the tuples in userspace and hold the
duplicate for some time, but I'm not sure this will completely solve
the problem and I want to consider this as a last resort.
Is there any feature in nftables that can help? any ideas/suggestions
I can explore?
Antonio Ojea Oct. 7, 2024, 8:14 a.m. UTC | #11
On Sun, 6 Oct 2024 at 15:44, Antonio Ojea <antonio.ojea.garcia@gmail.com> wrote:
>
> >
> > It could be different scenario. I was expecting consistency in UDP packet
> > distribution is a requirement, but I understood goal at this stage is
> > to ensure packets are not dropped while dealing with clash resolution.
> >
> > I have applied Florian's patch to nf.git, thanks.
>
> Is there a workaround I can apply in the meantime? kernels fixes take
> a long time to be on users' distros and I have continuous reports
> about this problem.
>
> I was thinking that I can track the tuples in userspace and hold the
> duplicate for some time, but I'm not sure this will completely solve
> the problem and I want to consider this as a last resort.
> Is there any feature in nftables that can help? any ideas/suggestions
> I can explore?

answering myself and for reference in case someone hits the same
problem, I just special cased the DNS traffic to be processed only in
the PREROUTING hook after DNAT and skip it in POSTROUTING, this does
not seem to trigger the race problem.
Pablo Neira Ayuso Oct. 7, 2024, 8:30 a.m. UTC | #12
On Mon, Oct 07, 2024 at 09:14:41AM +0100, Antonio Ojea wrote:
> On Sun, 6 Oct 2024 at 15:44, Antonio Ojea <antonio.ojea.garcia@gmail.com> wrote:
> >
> > >
> > > It could be different scenario. I was expecting consistency in UDP packet
> > > distribution is a requirement, but I understood goal at this stage is
> > > to ensure packets are not dropped while dealing with clash resolution.
> > >
> > > I have applied Florian's patch to nf.git, thanks.
> >
> > Is there a workaround I can apply in the meantime? kernels fixes take
> > a long time to be on users' distros and I have continuous reports
> > about this problem.
> >
> > I was thinking that I can track the tuples in userspace and hold the
> > duplicate for some time, but I'm not sure this will completely solve
> > the problem and I want to consider this as a last resort.
> > Is there any feature in nftables that can help? any ideas/suggestions
> > I can explore?
> 
> answering myself and for reference in case someone hits the same
> problem, I just special cased the DNS traffic to be processed only in
> the PREROUTING hook after DNAT and skip it in POSTROUTING, this does
> not seem to trigger the race problem.

I am going to request inclusion of this patch to -stable so you don't
have to carry this workaround in the near future.
diff mbox series

Patch

diff --git a/net/netfilter/nfnetlink_queue.c b/net/netfilter/nfnetlink_queue.c
index e0716da256bf..aeb354271e85 100644
--- a/net/netfilter/nfnetlink_queue.c
+++ b/net/netfilter/nfnetlink_queue.c
@@ -276,7 +276,8 @@  static int nf_ip_reroute(struct sk_buff *skb, const struct nf_queue_entry *entry
 #ifdef CONFIG_INET
 	const struct ip_rt_info *rt_info = nf_queue_entry_reroute(entry);
 
-	if (entry->state.hook == NF_INET_LOCAL_OUT) {
+	if (entry->state.hook == NF_INET_LOCAL_OUT ||
+	    entry->state.hook == NF_INET_POST_ROUTING) {
 		const struct iphdr *iph = ip_hdr(skb);
 
 		if (!(iph->tos == rt_info->tos &&