Message ID | 20240912185832.11962-1-pablo@netfilter.org |
---|---|
State | Not Applicable |
Headers | show |
Series | [nf] netfilter: nfnetlink_queue: reroute reinjected packets from postrouting | expand |
On Thu, 12 Sept 2024 at 20:58, Pablo Neira Ayuso <pablo@netfilter.org> wrote: > > 368982cd7d1b ("netfilter: nfnetlink_queue: resolve clash for unconfirmed > conntracks") adjusts NAT again in case that packet loses race to confirm > the conntrack entry. > > The reinject path triggers a route lookup again for the output hook, but > not for the postrouting hook where queue to userspace is also possible. > > Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") > Reported-by: Antonio Ojea <antonio.ojea.garcia@gmail.com> > Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> > --- > I tried but I am not managing to make a selftest that runs reliable. > I can reproduce it manually and validate that this works. > > ./nft_queue -d 1000 helps by introducing a delay of 1000ms in the > userspace queue processing which helps trigger the race more easily, > socat needs to send several packets in the same UDP flow. > > @Antonio: Could you try this patch meanwhile there is a testcase for > this. Let me test it and report back > net/netfilter/nfnetlink_queue.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/net/netfilter/nfnetlink_queue.c b/net/netfilter/nfnetlink_queue.c > index e0716da256bf..aeb354271e85 100644 > --- a/net/netfilter/nfnetlink_queue.c > +++ b/net/netfilter/nfnetlink_queue.c > @@ -276,7 +276,8 @@ static int nf_ip_reroute(struct sk_buff *skb, const struct nf_queue_entry *entry > #ifdef CONFIG_INET > const struct ip_rt_info *rt_info = nf_queue_entry_reroute(entry); > > - if (entry->state.hook == NF_INET_LOCAL_OUT) { > + if (entry->state.hook == NF_INET_LOCAL_OUT || > + entry->state.hook == NF_INET_POST_ROUTING) { > const struct iphdr *iph = ip_hdr(skb); > > if (!(iph->tos == rt_info->tos && > -- > 2.30.2 >
On Fri, 13 Sept 2024 at 07:24, Antonio Ojea <antonio.ojea.garcia@gmail.com> wrote: > > On Thu, 12 Sept 2024 at 20:58, Pablo Neira Ayuso <pablo@netfilter.org> wrote: > > > > 368982cd7d1b ("netfilter: nfnetlink_queue: resolve clash for unconfirmed > > conntracks") adjusts NAT again in case that packet loses race to confirm > > the conntrack entry. > > > > The reinject path triggers a route lookup again for the output hook, but > > not for the postrouting hook where queue to userspace is also possible. > > > > Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") > > Reported-by: Antonio Ojea <antonio.ojea.garcia@gmail.com> > > Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> > > --- > > I tried but I am not managing to make a selftest that runs reliable. > > I can reproduce it manually and validate that this works. > > > > ./nft_queue -d 1000 helps by introducing a delay of 1000ms in the > > userspace queue processing which helps trigger the race more easily, > > socat needs to send several packets in the same UDP flow. > > > > @Antonio: Could you try this patch meanwhile there is a testcase for > > this. > > Let me test it and report back > Ok, I finally managed to get this tested, and it does not seem to solve the problem, it keeps dnating twice after the packet is enqueued by nfqueue See trace obtained with pwru, origin 10.244.0.3, virtual IP of DNS server 10.96.0.10 that DNATs to 10.244.0.2 and 10.244.0.4 21:44:13.066 0xffff97ff83662280 0 <empty>:3552 2007043994686 10.244.0.3:39492->10.96.0.10:53(udp) nf_checksum 21:44:13.066 0xffff97ff83662280 0 <empty>:3552 2007043995059 10.244.0.3:39492->10.96.0.10:53(udp) nf_ip_checksum 21:44:13.066 0xffff97ff83662280 0 <empty>:3552 2007043995538 10.244.0.3:39492->10.96.0.10:53(udp) nf_nat_ipv4_pre_routing 21:44:13.066 0xffff97ff83662280 0 <empty>:3552 2007043995957 10.244.0.3:39492->10.96.0.10:53(udp) nf_nat_inet_fn 21:44:13.066 0xffff97ff83662280 0 <empty>:3552 2007043996439 10.244.0.3:39492->10.96.0.10:53(udp) nft_nat_do_chain 21:44:13.067 0xffff97ff83662280 0 <empty>:3552 2007043999827 10.244.0.3:39492->10.96.0.10:53(udp) xt_dnat_target_v2 21:44:13.067 0xffff97ff83662280 0 <empty>:3552 2007044000721 10.244.0.3:39492->10.96.0.10:53(udp) nf_nat_manip_pkt 21:44:13.067 0xffff97ff83662280 0 <empty>:3552 2007044023444 10.244.0.3:39492->10.96.0.10:53(udp) nf_nat_ipv4_manip_pkt 21:44:13.067 0xffff97ff83662280 0 <empty>:3552 2007044024162 10.244.0.3:39492->10.96.0.10:53(udp) skb_ensure_writable 21:44:13.068 0xffff97ff83662280 0 <empty>:3552 2007044024819 10.244.0.3:39492->10.96.0.10:53(udp) l4proto_manip_pkt 21:44:13.068 0xffff97ff83662280 0 <empty>:3552 2007044025158 10.244.0.3:39492->10.96.0.10:53(udp) skb_ensure_writable 21:44:13.068 0xffff97ff83662280 0 <empty>:3552 2007044025711 10.244.0.3:39492->10.96.0.10:53(udp) nf_csum_update 21:44:13.068 0xffff97ff83662280 0 <empty>:3552 2007044026381 10.244.0.3:39492->10.96.0.10:53(udp) inet_proto_csum_replace4 21:44:13.068 0xffff97ff83662280 0 <empty>:3552 2007044026730 10.244.0.3:39492->10.96.0.10:53(udp) inet_proto_csum_replace4 21:44:13.069 0xffff97ff83662280 0 <empty>:3552 2007044027433 10.244.0.3:39492->10.244.0.2:53(udp) ip_rcv_finish 21:44:13.069 0xffff97ff83662280 0 <empty>:3552 2007044028188 10.244.0.3:39492->10.244.0.2:53(udp) udp_v4_early_demux 21:44:13.069 0xffff97ff83662280 0 <empty>:3552 2007044029235 10.244.0.3:39492->10.244.0.2:53(udp) ip_route_input_noref 21:44:13.069 0xffff97ff83662280 0 <empty>:3552 2007044029696 10.244.0.3:39492->10.244.0.2:53(udp) ip_route_input_slow 21:44:13.069 0xffff97ff83662280 0 <empty>:3552 2007044030986 10.244.0.3:39492->10.244.0.2:53(udp) fib_validate_source 21:44:13.069 0xffff97ff83662280 0 <empty>:3552 2007044031571 10.244.0.3:39492->10.244.0.2:53(udp) __fib_validate_source 21:44:13.069 0xffff97ff83662280 0 <empty>:3552 2007044032576 10.244.0.3:39492->10.244.0.2:53(udp) ip_forward 21:44:13.069 0xffff97ff83662280 0 <empty>:3552 2007044033236 10.244.0.3:39492->10.244.0.2:53(udp) nf_hook_slow 21:44:13.069 0xffff97ff83662280 0 <empty>:3552 2007044034004 10.244.0.3:39492->10.244.0.2:53(udp) selinux_ip_forward 21:44:13.069 0xffff97ff83662280 0 <empty>:3552 2007044034601 10.244.0.3:39492->10.244.0.2:53(udp) nft_do_chain_ipv4 21:44:13.069 0xffff97ff83662280 0 <empty>:3552 2007044037452 10.244.0.3:39492->10.244.0.2:53(udp) ip_output 21:44:13.069 0xffff97ff83662280 0 <empty>:3552 2007044037796 10.244.0.3:39492->10.244.0.2:53(udp) nf_hook_slow 21:44:13.070 0xffff97ff83662280 0 <empty>:3552 2007044038241 10.244.0.3:39492->10.244.0.2:53(udp) nft_do_chain_inet 21:44:13.070 0xffff97ff83662280 0 <empty>:3552 2007044040343 10.244.0.3:39492->10.244.0.2:53(udp) nf_queue --- snipped other skbs --- 21:44:13.149 0xffff97ff83662280 0 <empty>:1463 2007052515236 10.244.0.3:39492->10.244.0.2:53(udp) nf_conntrack_update 21:44:13.149 0xffff97ff83662280 0 <empty>:1463 2007052538616 10.244.0.3:39492->10.244.0.2:53(udp) nf_nat_manip_pkt 21:44:13.149 0xffff97ff83662280 0 <empty>:1463 2007052539511 10.244.0.3:39492->10.244.0.2:53(udp) nf_nat_ipv4_manip_pkt 21:44:13.150 0xffff97ff83662280 0 <empty>:1463 2007052540123 10.244.0.3:39492->10.244.0.2:53(udp) skb_ensure_writable 21:44:13.150 0xffff97ff83662280 0 <empty>:1463 2007052540589 10.244.0.3:39492->10.244.0.2:53(udp) l4proto_manip_pkt 21:44:13.150 0xffff97ff83662280 0 <empty>:1463 2007052540875 10.244.0.3:39492->10.244.0.2:53(udp) skb_ensure_writable 21:44:13.150 0xffff97ff83662280 0 <empty>:1463 2007052541326 10.244.0.3:39492->10.244.0.2:53(udp) nf_csum_update 21:44:13.151 0xffff97ff83662280 0 <empty>:1463 2007052541944 10.244.0.3:39492->10.244.0.2:53(udp) inet_proto_csum_replace4 21:44:13.151 0xffff97ff83662280 0 <empty>:1463 2007052542259 10.244.0.3:39492->10.244.0.2:53(udp) inet_proto_csum_replace4 <<<< DNATed twice 21:44:13.151 0xffff97ff83662280 0 <empty>:1463 2007052543321 10.244.0.3:39492->10.244.0.4:53(udp) ip_route_me_harder 21:44:13.151 0xffff97ff83662280 0 <empty>:1463 2007052545374 10.244.0.3:39492->10.244.0.4:53(udp) __xfrm_decode_session 21:44:13.151 0xffff97ff83662280 0 <empty>:1463 2007052546324 10.244.0.3:39492->10.244.0.4:53(udp) nf_nat_ipv4_out 21:44:13.151 0xffff97ff83662280 0 <empty>:1463 2007052546676 10.244.0.3:39492->10.244.0.4:53(udp) nf_nat_inet_fn 21:44:13.152 0xffff97ff83662280 0 <empty>:1463 2007052547186 10.244.0.3:39492->10.244.0.4:53(udp) selinux_ip_postroute 21:44:13.152 0xffff97ff83662280 0 <empty>:1463 2007052547732 10.244.0.3:39492->10.244.0.4:53(udp) selinux_ip_postroute_compat 21:44:13.152 0xffff97ff83662280 0 <empty>:1463 2007052548217 10.244.0.3:39492->10.244.0.4:53(udp) nf_confirm 21:44:13.152 0xffff97ff83662280 0 <empty>:1463 2007052548744 10.244.0.3:39492->10.244.0.4:53(udp) ip_finish_output 21:44:13.152 0xffff97ff83662280 0 <empty>:1463 2007052549162 10.244.0.3:39492->10.244.0.4:53(udp) __ip_finish_output 21:44:13.152 0xffff97ff83662280 0 <empty>:1463 2007052549614 10.244.0.3:39492->10.244.0.4:53(udp) ip_finish_output2 21:44:13.152 0xffff97ff83662280 0 <empty>:1463 2007052550159 10.244.0.3:39492->10.244.0.4:53(udp) __dev_queue_xmit 21:44:13.152 0xffff97ff83662280 0 <empty>:1463 2007052550656 10.244.0.3:39492->10.244.0.4:53(udp) netdev_core_pick_tx 21:44:13.152 0xffff97ff83662280 0 <empty>:1463 2007052551436 10.244.0.3:39492->10.244.0.4:53(udp) validate_xmit_skb 21:44:13.152 0xffff97ff83662280 0 <empty>:1463 2007052551882 10.244.0.3:39492->10.244.0.4:53(udp) netif_skb_features 21:44:13.153 0xffff97ff83662280 0 <empty>:1463 2007052552291 10.244.0.3:39492->10.244.0.4:53(udp) passthru_features_check 21:44:13.153 0xffff97ff83662280 0 <empty>:1463 2007052552672 10.244.0.3:39492->10.244.0.4:53(udp) skb_network_protocol 21:44:13.153 0xffff97ff83662280 0 <empty>:1463 2007052553191 10.244.0.3:39492->10.244.0.4:53(udp) skb_csum_hwoffload_help 21:44:13.154 0xffff97ff83662280 0 <empty>:1463 2007052553566 10.244.0.3:39492->10.244.0.4:53(udp) validate_xmit_xfrm 21:44:13.155 0xffff97ff83662280 0 <empty>:1463 2007052554026 10.244.0.3:39492->10.244.0.4:53(udp) dev_hard_start_xmit 21:44:13.155 0xffff97ff83662280 0 <empty>:1463 2007052554482 10.244.0.3:39492->10.244.0.4:53(udp) veth_xmit 21:44:13.155 0xffff97ff83662280 0 <empty>:1463 2007052555156 10.244.0.3:39492->10.244.0.4:53(udp) __dev_forward_skb 21:44:13.155 0xffff97ff83662280 0 <empty>:1463 2007052555604 10.244.0.3:39492->10.244.0.4:53(udp) __dev_forward_skb2 21:44:13.155 0xffff97ff83662280 0 <empty>:1463 2007052556045 10.244.0.3:39492->10.244.0.4:53(udp) skb_scrub_packet 21:44:13.156 0xffff97ff83662280 0 <empty>:1463 2007052556449 10.244.0.3:39492->10.244.0.4:53(udp) eth_type_trans 21:44:13.156 0xffff97ff83662280 0 <empty>:1463 2007052557536 10.244.0.3:39492->10.244.0.4:53(udp) __netif_rx 21:44:13.156 0xffff97ff83662280 0 <empty>:1463 2007052559424 10.244.0.3:39492->10.244.0.4:53(udp) netif_rx_internal 21:44:13.156 0xffff97ff83662280 0 <empty>:1463 2007052559872 10.244.0.3:39492->10.244.0.4:53(udp) enqueue_to_backlog 21:44:13.156 0xffff97ff83662280 0 <empty>:1463 2007052560827 10.244.0.3:39492->10.244.0.4:53(udp) __netif_receive_skb_one_core 21:44:13.156 0xffff97ff83662280 0 <empty>:1463 2007052561410 10.244.0.3:39492->10.244.0.4:53(udp) ip_rcv 21:44:13.156 0xffff97ff83662280 0 <empty>:1463 2007052561845 10.244.0.3:39492->10.244.0.4:53(udp) ip_rcv_core 21:44:13.156 0xffff97ff83662280 0 <empty>:1463 2007052564056 10.244.0.3:39492->10.244.0.4:53(udp) kfree_skb_reason(SKB_DROP_REASON_OTHERHOST) > > net/netfilter/nfnetlink_queue.c | 3 ++- > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > diff --git a/net/netfilter/nfnetlink_queue.c b/net/netfilter/nfnetlink_queue.c > > index e0716da256bf..aeb354271e85 100644 > > --- a/net/netfilter/nfnetlink_queue.c > > +++ b/net/netfilter/nfnetlink_queue.c > > @@ -276,7 +276,8 @@ static int nf_ip_reroute(struct sk_buff *skb, const struct nf_queue_entry *entry > > #ifdef CONFIG_INET > > const struct ip_rt_info *rt_info = nf_queue_entry_reroute(entry); > > > > - if (entry->state.hook == NF_INET_LOCAL_OUT) { > > + if (entry->state.hook == NF_INET_LOCAL_OUT || > > + entry->state.hook == NF_INET_POST_ROUTING) { > > const struct iphdr *iph = ip_hdr(skb); > > > > if (!(iph->tos == rt_info->tos && > > -- > > 2.30.2 > >
Hi Antonio, On Tue, Sep 17, 2024 at 11:01:31PM +0100, Antonio Ojea wrote: > On Fri, 13 Sept 2024 at 07:24, Antonio Ojea > <antonio.ojea.garcia@gmail.com> wrote: > > > > On Thu, 12 Sept 2024 at 20:58, Pablo Neira Ayuso <pablo@netfilter.org> wrote: > > > > > > 368982cd7d1b ("netfilter: nfnetlink_queue: resolve clash for unconfirmed > > > conntracks") adjusts NAT again in case that packet loses race to confirm > > > the conntrack entry. > > > > > > The reinject path triggers a route lookup again for the output hook, but > > > not for the postrouting hook where queue to userspace is also possible. > > > > > > Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") > > > Reported-by: Antonio Ojea <antonio.ojea.garcia@gmail.com> > > > Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> > > > --- > > > I tried but I am not managing to make a selftest that runs reliable. > > > I can reproduce it manually and validate that this works. > > > > > > ./nft_queue -d 1000 helps by introducing a delay of 1000ms in the > > > userspace queue processing which helps trigger the race more easily, > > > socat needs to send several packets in the same UDP flow. > > > > > > @Antonio: Could you try this patch meanwhile there is a testcase for > > > this. > > > > Let me test it and report back > > > > Ok, I finally managed to get this tested, and it does not seem to > solve the problem, it keeps dnating twice after the packet is enqueued > by nfqueue dnatting twice is required to deal with the conntrack confirmation race. packet 1 enters prerouting, dnat is done using IP A as destination packet 2 enters prerouting, dnat is done using IP B as destination packet 1 is enqueued to userspace from postrouting, with unconfirmed conntrack packet 2 is enqueued to userspace from postrouting, with unconfirmed conntrack packet 1 is reinjected back to kernelspace from postrouting, conntrack is confirmed, it uses IP A as destination. *packet 2* is reinjected back to kernelspace from postrouting, ct lookup tells us packet lost race, unconfirmed conntrack is dropped, update mangling to use IP A for consistency (because packet 2 is using IP). So far we have been assuming races with conntrack confirmation are unlikely, thus, this is scenario is handled as a corner case which requires this double mangling. You still see packets being dropped, right? That should not happen, I might be then missing something else because my patch triggers also a re-routing which is required because packet 2 was finally mangled to use IP A while route still points to IP B.
Pablo Neira Ayuso <pablo@netfilter.org> wrote: > > Ok, I finally managed to get this tested, and it does not seem to > > solve the problem, it keeps dnating twice after the packet is enqueued > > by nfqueue > > dnatting twice is required to deal with the conntrack confirmation race. Antonio could also try this hack: diff --git a/net/netfilter/nfnetlink_queue.c b/net/netfilter/nfnetlink_queue.c --- a/net/netfilter/nfnetlink_queue.c +++ b/net/netfilter/nfnetlink_queue.c @@ -379,7 +379,7 @@ static void nfqnl_reinject(struct nf_queue_entry *entry, unsigned int verdict) unsigned int ct_verdict = verdict; rcu_read_lock(); - ct_hook = rcu_dereference(nf_ct_hook); + ct_hook = NULL; if (ct_hook) ct_verdict = ct_hook->update(entry->state.net, entry->skb); rcu_read_unlock(); which defers this to the clash resolution logic. The ct_hook->update infra predates this, I'm not sure we need it anymore.
On Wed, 18 Sept 2024 at 09:42, Florian Westphal <fw@strlen.de> wrote: > > Pablo Neira Ayuso <pablo@netfilter.org> wrote: > > > Ok, I finally managed to get this tested, and it does not seem to > > > solve the problem, it keeps dnating twice after the packet is enqueued > > > by nfqueue > > > > dnatting twice is required to deal with the conntrack confirmation race. > > Antonio could also try this hack: > > diff --git a/net/netfilter/nfnetlink_queue.c b/net/netfilter/nfnetlink_queue.c > --- a/net/netfilter/nfnetlink_queue.c > +++ b/net/netfilter/nfnetlink_queue.c > @@ -379,7 +379,7 @@ static void nfqnl_reinject(struct nf_queue_entry *entry, unsigned int verdict) > unsigned int ct_verdict = verdict; > > rcu_read_lock(); > - ct_hook = rcu_dereference(nf_ct_hook); > + ct_hook = NULL; > if (ct_hook) > ct_verdict = ct_hook->update(entry->state.net, entry->skb); > rcu_read_unlock(); > > which defers this to the clash resolution logic. > The ct_hook->update infra predates this, I'm not sure we need > it anymore. Awesome, it works perfectly I have these patches in addition to this last one c3d69b2c40bb selftests: netfilter: add reverse-clash resolution test case fd7c45a0aa7a netfilter: conntrack: add clash resolution for reverse collisions 8bb12723d1c4 netfilter: nf_nat: don't try nat source port reallocation for reverse dir clash It works with and without 610cea0d00f8 netfilter: nfnetlink_queue: reroute reinjected packets from postrouting
Antonio Ojea <antonio.ojea.garcia@gmail.com> wrote: > > Antonio could also try this hack: > > > > diff --git a/net/netfilter/nfnetlink_queue.c b/net/netfilter/nfnetlink_queue.c > > --- a/net/netfilter/nfnetlink_queue.c > > +++ b/net/netfilter/nfnetlink_queue.c > > @@ -379,7 +379,7 @@ static void nfqnl_reinject(struct nf_queue_entry *entry, unsigned int verdict) > > unsigned int ct_verdict = verdict; > > > > rcu_read_lock(); > > - ct_hook = rcu_dereference(nf_ct_hook); > > + ct_hook = NULL; > > if (ct_hook) > > ct_verdict = ct_hook->update(entry->state.net, entry->skb); > > rcu_read_unlock(); > > > > which defers this to the clash resolution logic. > > The ct_hook->update infra predates this, I'm not sure we need > > it anymore. > > Awesome, it works perfectly Great, I'll send a formal patch.
On Tue, Sep 17, 2024 at 11:01:31PM +0100, Antonio Ojea wrote: > On Fri, 13 Sept 2024 at 07:24, Antonio Ojea > <antonio.ojea.garcia@gmail.com> wrote: > > > > On Thu, 12 Sept 2024 at 20:58, Pablo Neira Ayuso <pablo@netfilter.org> wrote: > > > > > > 368982cd7d1b ("netfilter: nfnetlink_queue: resolve clash for unconfirmed > > > conntracks") adjusts NAT again in case that packet loses race to confirm > > > the conntrack entry. > > > > > > The reinject path triggers a route lookup again for the output hook, but > > > not for the postrouting hook where queue to userspace is also possible. > > > > > > Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") > > > Reported-by: Antonio Ojea <antonio.ojea.garcia@gmail.com> > > > Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> > > > --- > > > I tried but I am not managing to make a selftest that runs reliable. > > > I can reproduce it manually and validate that this works. > > > > > > ./nft_queue -d 1000 helps by introducing a delay of 1000ms in the > > > userspace queue processing which helps trigger the race more easily, > > > socat needs to send several packets in the same UDP flow. > > > > > > @Antonio: Could you try this patch meanwhile there is a testcase for > > > this. > > > > Let me test it and report back > > > > Ok, I finally managed to get this tested, and it does not seem to > solve the problem, it keeps dnating twice after the packet is enqueued > by nfqueue I really don't understand why my patch did not work. In this new test run you have use postrouting/filter chain, but it does not call nf_reroute as in the previous trace. Does program issues the NF_STOP verdict instead? [...] > 10.244.0.3:39492->10.244.0.2:53(udp) ip_output > 21:44:13.069 0xffff97ff83662280 0 <empty>:3552 2007044037796 > 10.244.0.3:39492->10.244.0.2:53(udp) nf_hook_slow > 21:44:13.070 0xffff97ff83662280 0 <empty>:3552 2007044038241 > 10.244.0.3:39492->10.244.0.2:53(udp) nft_do_chain_inet > 21:44:13.070 0xffff97ff83662280 0 <empty>:3552 2007044040343 > 10.244.0.3:39492->10.244.0.2:53(udp) nf_queue > --- snipped other skbs --- > 21:44:13.149 0xffff97ff83662280 0 <empty>:1463 2007052515236 > 10.244.0.3:39492->10.244.0.2:53(udp) nf_conntrack_update > 21:44:13.149 0xffff97ff83662280 0 <empty>:1463 2007052538616 > 10.244.0.3:39492->10.244.0.2:53(udp) nf_nat_manip_pkt > 21:44:13.149 0xffff97ff83662280 0 <empty>:1463 2007052539511 > 10.244.0.3:39492->10.244.0.2:53(udp) nf_nat_ipv4_manip_pkt > 21:44:13.150 0xffff97ff83662280 0 <empty>:1463 2007052540123 > 10.244.0.3:39492->10.244.0.2:53(udp) skb_ensure_writable > 21:44:13.150 0xffff97ff83662280 0 <empty>:1463 2007052540589 > 10.244.0.3:39492->10.244.0.2:53(udp) l4proto_manip_pkt > 21:44:13.150 0xffff97ff83662280 0 <empty>:1463 2007052540875 > 10.244.0.3:39492->10.244.0.2:53(udp) skb_ensure_writable > 21:44:13.150 0xffff97ff83662280 0 <empty>:1463 2007052541326 > 10.244.0.3:39492->10.244.0.2:53(udp) nf_csum_update > 21:44:13.151 0xffff97ff83662280 0 <empty>:1463 2007052541944 > 10.244.0.3:39492->10.244.0.2:53(udp) inet_proto_csum_replace4 > 21:44:13.151 0xffff97ff83662280 0 <empty>:1463 2007052542259 > 10.244.0.3:39492->10.244.0.2:53(udp) inet_proto_csum_replace4 <<<< > DNATed twice > 21:44:13.151 0xffff97ff83662280 0 <empty>:1463 2007052543321 > 10.244.0.3:39492->10.244.0.4:53(udp) ip_route_me_harder > 21:44:13.151 0xffff97ff83662280 0 <empty>:1463 2007052545374 > 10.244.0.3:39492->10.244.0.4:53(udp) __xfrm_decode_session > 21:44:13.151 0xffff97ff83662280 0 <empty>:1463 2007052546324 > 10.244.0.3:39492->10.244.0.4:53(udp) nf_nat_ipv4_out > 21:44:13.151 0xffff97ff83662280 0 <empty>:1463 2007052546676 > 10.244.0.3:39492->10.244.0.4:53(udp) nf_nat_inet_fn > 21:44:13.152 0xffff97ff83662280 0 <empty>:1463 2007052547186 > 10.244.0.3:39492->10.244.0.4:53(udp) selinux_ip_postroute > 21:44:13.152 0xffff97ff83662280 0 <empty>:1463 2007052547732 > 10.244.0.3:39492->10.244.0.4:53(udp) selinux_ip_postroute_compat > 21:44:13.152 0xffff97ff83662280 0 <empty>:1463 2007052548217 > 10.244.0.3:39492->10.244.0.4:53(udp) nf_confirm > 21:44:13.152 0xffff97ff83662280 0 <empty>:1463 2007052548744 > 10.244.0.3:39492->10.244.0.4:53(udp) ip_finish_output > 21:44:13.152 0xffff97ff83662280 0 <empty>:1463 2007052549162 > 10.244.0.3:39492->10.244.0.4:53(udp) __ip_finish_output > 21:44:13.152 0xffff97ff83662280 0 <empty>:1463 2007052549614 > 10.244.0.3:39492->10.244.0.4:53(udp) ip_finish_output2 > 21:44:13.152 0xffff97ff83662280 0 <empty>:1463 2007052550159 > 10.244.0.3:39492->10.244.0.4:53(udp) __dev_queue_xmit > 21:44:13.152 0xffff97ff83662280 0 <empty>:1463 2007052550656 > 10.244.0.3:39492->10.244.0.4:53(udp) netdev_core_pick_tx > 21:44:13.152 0xffff97ff83662280 0 <empty>:1463 2007052551436 > 10.244.0.3:39492->10.244.0.4:53(udp) validate_xmit_skb > 21:44:13.152 0xffff97ff83662280 0 <empty>:1463 2007052551882 > 10.244.0.3:39492->10.244.0.4:53(udp) netif_skb_features > 21:44:13.153 0xffff97ff83662280 0 <empty>:1463 2007052552291 > 10.244.0.3:39492->10.244.0.4:53(udp) passthru_features_check > 21:44:13.153 0xffff97ff83662280 0 <empty>:1463 2007052552672 > 10.244.0.3:39492->10.244.0.4:53(udp) skb_network_protocol > 21:44:13.153 0xffff97ff83662280 0 <empty>:1463 2007052553191 > 10.244.0.3:39492->10.244.0.4:53(udp) skb_csum_hwoffload_help > 21:44:13.154 0xffff97ff83662280 0 <empty>:1463 2007052553566 > 10.244.0.3:39492->10.244.0.4:53(udp) validate_xmit_xfrm > 21:44:13.155 0xffff97ff83662280 0 <empty>:1463 2007052554026 > 10.244.0.3:39492->10.244.0.4:53(udp) dev_hard_start_xmit > 21:44:13.155 0xffff97ff83662280 0 <empty>:1463 2007052554482 > 10.244.0.3:39492->10.244.0.4:53(udp) veth_xmit > 21:44:13.155 0xffff97ff83662280 0 <empty>:1463 2007052555156 > 10.244.0.3:39492->10.244.0.4:53(udp) __dev_forward_skb > 21:44:13.155 0xffff97ff83662280 0 <empty>:1463 2007052555604 > 10.244.0.3:39492->10.244.0.4:53(udp) __dev_forward_skb2 > 21:44:13.155 0xffff97ff83662280 0 <empty>:1463 2007052556045 > 10.244.0.3:39492->10.244.0.4:53(udp) skb_scrub_packet > 21:44:13.156 0xffff97ff83662280 0 <empty>:1463 2007052556449 > 10.244.0.3:39492->10.244.0.4:53(udp) eth_type_trans > 21:44:13.156 0xffff97ff83662280 0 <empty>:1463 2007052557536 > 10.244.0.3:39492->10.244.0.4:53(udp) __netif_rx > 21:44:13.156 0xffff97ff83662280 0 <empty>:1463 2007052559424 > 10.244.0.3:39492->10.244.0.4:53(udp) netif_rx_internal > 21:44:13.156 0xffff97ff83662280 0 <empty>:1463 2007052559872 > 10.244.0.3:39492->10.244.0.4:53(udp) enqueue_to_backlog > 21:44:13.156 0xffff97ff83662280 0 <empty>:1463 2007052560827 > 10.244.0.3:39492->10.244.0.4:53(udp) __netif_receive_skb_one_core > 21:44:13.156 0xffff97ff83662280 0 <empty>:1463 2007052561410 > 10.244.0.3:39492->10.244.0.4:53(udp) ip_rcv > 21:44:13.156 0xffff97ff83662280 0 <empty>:1463 2007052561845 > 10.244.0.3:39492->10.244.0.4:53(udp) ip_rcv_core > 21:44:13.156 0xffff97ff83662280 0 <empty>:1463 2007052564056 > 10.244.0.3:39492->10.244.0.4:53(udp) > kfree_skb_reason(SKB_DROP_REASON_OTHERHOST)
On Wed, 18 Sept 2024 at 21:53, Pablo Neira Ayuso <pablo@netfilter.org> wrote: > > On Tue, Sep 17, 2024 at 11:01:31PM +0100, Antonio Ojea wrote: > > On Fri, 13 Sept 2024 at 07:24, Antonio Ojea > > <antonio.ojea.garcia@gmail.com> wrote: > > > > > > On Thu, 12 Sept 2024 at 20:58, Pablo Neira Ayuso <pablo@netfilter.org> wrote: > > > > > > > > 368982cd7d1b ("netfilter: nfnetlink_queue: resolve clash for unconfirmed > > > > conntracks") adjusts NAT again in case that packet loses race to confirm > > > > the conntrack entry. > > > > > > > > The reinject path triggers a route lookup again for the output hook, but > > > > not for the postrouting hook where queue to userspace is also possible. > > > > > > > > Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") > > > > Reported-by: Antonio Ojea <antonio.ojea.garcia@gmail.com> > > > > Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> > > > > --- > > > > I tried but I am not managing to make a selftest that runs reliable. > > > > I can reproduce it manually and validate that this works. > > > > > > > > ./nft_queue -d 1000 helps by introducing a delay of 1000ms in the > > > > userspace queue processing which helps trigger the race more easily, > > > > socat needs to send several packets in the same UDP flow. > > > > > > > > @Antonio: Could you try this patch meanwhile there is a testcase for > > > > this. > > > > > > Let me test it and report back > > > > > > > Ok, I finally managed to get this tested, and it does not seem to > > solve the problem, it keeps dnating twice after the packet is enqueued > > by nfqueue > > I really don't understand why my patch did not work. > > In this new test run you have use postrouting/filter chain, but it > does not call nf_reroute as in the previous trace. > > Does program issues the NF_STOP verdict instead? > The program is the same and the environment the same, I just change the kernel of the VM. I have observed different drops reasons, see my original comment https://bugzilla.netfilter.org/show_bug.cgi?id=1766#c0 has SKB_DROP_REASON_IP_RPFILTER and SKB_DROP_REASON_NEIGH_FAILED, and in this trace we have SKB_DROP_REASON_OTHERHOST ... , can it be possible you fixed one scenario with your patch but not the others? > [...] > > 10.244.0.3:39492->10.244.0.2:53(udp) ip_output > > 21:44:13.069 0xffff97ff83662280 0 <empty>:3552 2007044037796 > > 10.244.0.3:39492->10.244.0.2:53(udp) nf_hook_slow > > 21:44:13.070 0xffff97ff83662280 0 <empty>:3552 2007044038241 > > 10.244.0.3:39492->10.244.0.2:53(udp) nft_do_chain_inet > > 21:44:13.070 0xffff97ff83662280 0 <empty>:3552 2007044040343 > > 10.244.0.3:39492->10.244.0.2:53(udp) nf_queue > > --- snipped other skbs --- > > 21:44:13.149 0xffff97ff83662280 0 <empty>:1463 2007052515236 > > 10.244.0.3:39492->10.244.0.2:53(udp) nf_conntrack_update > > 21:44:13.149 0xffff97ff83662280 0 <empty>:1463 2007052538616 > > 10.244.0.3:39492->10.244.0.2:53(udp) nf_nat_manip_pkt > > 21:44:13.149 0xffff97ff83662280 0 <empty>:1463 2007052539511 > > 10.244.0.3:39492->10.244.0.2:53(udp) nf_nat_ipv4_manip_pkt > > 21:44:13.150 0xffff97ff83662280 0 <empty>:1463 2007052540123 > > 10.244.0.3:39492->10.244.0.2:53(udp) skb_ensure_writable > > 21:44:13.150 0xffff97ff83662280 0 <empty>:1463 2007052540589 > > 10.244.0.3:39492->10.244.0.2:53(udp) l4proto_manip_pkt > > 21:44:13.150 0xffff97ff83662280 0 <empty>:1463 2007052540875 > > 10.244.0.3:39492->10.244.0.2:53(udp) skb_ensure_writable > > 21:44:13.150 0xffff97ff83662280 0 <empty>:1463 2007052541326 > > 10.244.0.3:39492->10.244.0.2:53(udp) nf_csum_update > > 21:44:13.151 0xffff97ff83662280 0 <empty>:1463 2007052541944 > > 10.244.0.3:39492->10.244.0.2:53(udp) inet_proto_csum_replace4 > > 21:44:13.151 0xffff97ff83662280 0 <empty>:1463 2007052542259 > > 10.244.0.3:39492->10.244.0.2:53(udp) inet_proto_csum_replace4 <<<< > > DNATed twice > > 21:44:13.151 0xffff97ff83662280 0 <empty>:1463 2007052543321 > > 10.244.0.3:39492->10.244.0.4:53(udp) ip_route_me_harder > > 21:44:13.151 0xffff97ff83662280 0 <empty>:1463 2007052545374 > > 10.244.0.3:39492->10.244.0.4:53(udp) __xfrm_decode_session > > 21:44:13.151 0xffff97ff83662280 0 <empty>:1463 2007052546324 > > 10.244.0.3:39492->10.244.0.4:53(udp) nf_nat_ipv4_out > > 21:44:13.151 0xffff97ff83662280 0 <empty>:1463 2007052546676 > > 10.244.0.3:39492->10.244.0.4:53(udp) nf_nat_inet_fn > > 21:44:13.152 0xffff97ff83662280 0 <empty>:1463 2007052547186 > > 10.244.0.3:39492->10.244.0.4:53(udp) selinux_ip_postroute > > 21:44:13.152 0xffff97ff83662280 0 <empty>:1463 2007052547732 > > 10.244.0.3:39492->10.244.0.4:53(udp) selinux_ip_postroute_compat > > 21:44:13.152 0xffff97ff83662280 0 <empty>:1463 2007052548217 > > 10.244.0.3:39492->10.244.0.4:53(udp) nf_confirm > > 21:44:13.152 0xffff97ff83662280 0 <empty>:1463 2007052548744 > > 10.244.0.3:39492->10.244.0.4:53(udp) ip_finish_output > > 21:44:13.152 0xffff97ff83662280 0 <empty>:1463 2007052549162 > > 10.244.0.3:39492->10.244.0.4:53(udp) __ip_finish_output > > 21:44:13.152 0xffff97ff83662280 0 <empty>:1463 2007052549614 > > 10.244.0.3:39492->10.244.0.4:53(udp) ip_finish_output2 > > 21:44:13.152 0xffff97ff83662280 0 <empty>:1463 2007052550159 > > 10.244.0.3:39492->10.244.0.4:53(udp) __dev_queue_xmit > > 21:44:13.152 0xffff97ff83662280 0 <empty>:1463 2007052550656 > > 10.244.0.3:39492->10.244.0.4:53(udp) netdev_core_pick_tx > > 21:44:13.152 0xffff97ff83662280 0 <empty>:1463 2007052551436 > > 10.244.0.3:39492->10.244.0.4:53(udp) validate_xmit_skb > > 21:44:13.152 0xffff97ff83662280 0 <empty>:1463 2007052551882 > > 10.244.0.3:39492->10.244.0.4:53(udp) netif_skb_features > > 21:44:13.153 0xffff97ff83662280 0 <empty>:1463 2007052552291 > > 10.244.0.3:39492->10.244.0.4:53(udp) passthru_features_check > > 21:44:13.153 0xffff97ff83662280 0 <empty>:1463 2007052552672 > > 10.244.0.3:39492->10.244.0.4:53(udp) skb_network_protocol > > 21:44:13.153 0xffff97ff83662280 0 <empty>:1463 2007052553191 > > 10.244.0.3:39492->10.244.0.4:53(udp) skb_csum_hwoffload_help > > 21:44:13.154 0xffff97ff83662280 0 <empty>:1463 2007052553566 > > 10.244.0.3:39492->10.244.0.4:53(udp) validate_xmit_xfrm > > 21:44:13.155 0xffff97ff83662280 0 <empty>:1463 2007052554026 > > 10.244.0.3:39492->10.244.0.4:53(udp) dev_hard_start_xmit > > 21:44:13.155 0xffff97ff83662280 0 <empty>:1463 2007052554482 > > 10.244.0.3:39492->10.244.0.4:53(udp) veth_xmit > > 21:44:13.155 0xffff97ff83662280 0 <empty>:1463 2007052555156 > > 10.244.0.3:39492->10.244.0.4:53(udp) __dev_forward_skb > > 21:44:13.155 0xffff97ff83662280 0 <empty>:1463 2007052555604 > > 10.244.0.3:39492->10.244.0.4:53(udp) __dev_forward_skb2 > > 21:44:13.155 0xffff97ff83662280 0 <empty>:1463 2007052556045 > > 10.244.0.3:39492->10.244.0.4:53(udp) skb_scrub_packet > > 21:44:13.156 0xffff97ff83662280 0 <empty>:1463 2007052556449 > > 10.244.0.3:39492->10.244.0.4:53(udp) eth_type_trans > > 21:44:13.156 0xffff97ff83662280 0 <empty>:1463 2007052557536 > > 10.244.0.3:39492->10.244.0.4:53(udp) __netif_rx > > 21:44:13.156 0xffff97ff83662280 0 <empty>:1463 2007052559424 > > 10.244.0.3:39492->10.244.0.4:53(udp) netif_rx_internal > > 21:44:13.156 0xffff97ff83662280 0 <empty>:1463 2007052559872 > > 10.244.0.3:39492->10.244.0.4:53(udp) enqueue_to_backlog > > 21:44:13.156 0xffff97ff83662280 0 <empty>:1463 2007052560827 > > 10.244.0.3:39492->10.244.0.4:53(udp) __netif_receive_skb_one_core > > 21:44:13.156 0xffff97ff83662280 0 <empty>:1463 2007052561410 > > 10.244.0.3:39492->10.244.0.4:53(udp) ip_rcv > > 21:44:13.156 0xffff97ff83662280 0 <empty>:1463 2007052561845 > > 10.244.0.3:39492->10.244.0.4:53(udp) ip_rcv_core > > 21:44:13.156 0xffff97ff83662280 0 <empty>:1463 2007052564056 > > 10.244.0.3:39492->10.244.0.4:53(udp) > > kfree_skb_reason(SKB_DROP_REASON_OTHERHOST)
On Wed, Sep 18, 2024 at 10:42:25PM +0100, Antonio Ojea wrote: > On Wed, 18 Sept 2024 at 21:53, Pablo Neira Ayuso <pablo@netfilter.org> wrote: > > > > On Tue, Sep 17, 2024 at 11:01:31PM +0100, Antonio Ojea wrote: > > > On Fri, 13 Sept 2024 at 07:24, Antonio Ojea > > > <antonio.ojea.garcia@gmail.com> wrote: > > > > > > > > On Thu, 12 Sept 2024 at 20:58, Pablo Neira Ayuso <pablo@netfilter.org> wrote: > > > > > > > > > > 368982cd7d1b ("netfilter: nfnetlink_queue: resolve clash for unconfirmed > > > > > conntracks") adjusts NAT again in case that packet loses race to confirm > > > > > the conntrack entry. > > > > > > > > > > The reinject path triggers a route lookup again for the output hook, but > > > > > not for the postrouting hook where queue to userspace is also possible. > > > > > > > > > > Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") > > > > > Reported-by: Antonio Ojea <antonio.ojea.garcia@gmail.com> > > > > > Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> > > > > > --- > > > > > I tried but I am not managing to make a selftest that runs reliable. > > > > > I can reproduce it manually and validate that this works. > > > > > > > > > > ./nft_queue -d 1000 helps by introducing a delay of 1000ms in the > > > > > userspace queue processing which helps trigger the race more easily, > > > > > socat needs to send several packets in the same UDP flow. > > > > > > > > > > @Antonio: Could you try this patch meanwhile there is a testcase for > > > > > this. > > > > > > > > Let me test it and report back > > > > > > > > > > Ok, I finally managed to get this tested, and it does not seem to > > > solve the problem, it keeps dnating twice after the packet is enqueued > > > by nfqueue > > > > I really don't understand why my patch did not work. > > > > In this new test run you have use postrouting/filter chain, but it > > does not call nf_reroute as in the previous trace. > > > > Does program issues the NF_STOP verdict instead? > > > > The program is the same and the environment the same, I just change > the kernel of the VM. > I have observed different drops reasons, see my original comment > https://bugzilla.netfilter.org/show_bug.cgi?id=1766#c0 has > SKB_DROP_REASON_IP_RPFILTER and SKB_DROP_REASON_NEIGH_FAILED, and in > this trace we have SKB_DROP_REASON_OTHERHOST ... , can it be possible > you fixed one scenario with your patch but not the others? It could be different scenario. I was expecting consistency in UDP packet distribution is a requirement, but I understood goal at this stage is to ensure packets are not dropped while dealing with clash resolution. I have applied Florian's patch to nf.git, thanks.
> > It could be different scenario. I was expecting consistency in UDP packet > distribution is a requirement, but I understood goal at this stage is > to ensure packets are not dropped while dealing with clash resolution. > > I have applied Florian's patch to nf.git, thanks. Is there a workaround I can apply in the meantime? kernels fixes take a long time to be on users' distros and I have continuous reports about this problem. I was thinking that I can track the tuples in userspace and hold the duplicate for some time, but I'm not sure this will completely solve the problem and I want to consider this as a last resort. Is there any feature in nftables that can help? any ideas/suggestions I can explore?
On Sun, 6 Oct 2024 at 15:44, Antonio Ojea <antonio.ojea.garcia@gmail.com> wrote: > > > > > It could be different scenario. I was expecting consistency in UDP packet > > distribution is a requirement, but I understood goal at this stage is > > to ensure packets are not dropped while dealing with clash resolution. > > > > I have applied Florian's patch to nf.git, thanks. > > Is there a workaround I can apply in the meantime? kernels fixes take > a long time to be on users' distros and I have continuous reports > about this problem. > > I was thinking that I can track the tuples in userspace and hold the > duplicate for some time, but I'm not sure this will completely solve > the problem and I want to consider this as a last resort. > Is there any feature in nftables that can help? any ideas/suggestions > I can explore? answering myself and for reference in case someone hits the same problem, I just special cased the DNS traffic to be processed only in the PREROUTING hook after DNAT and skip it in POSTROUTING, this does not seem to trigger the race problem.
On Mon, Oct 07, 2024 at 09:14:41AM +0100, Antonio Ojea wrote: > On Sun, 6 Oct 2024 at 15:44, Antonio Ojea <antonio.ojea.garcia@gmail.com> wrote: > > > > > > > > It could be different scenario. I was expecting consistency in UDP packet > > > distribution is a requirement, but I understood goal at this stage is > > > to ensure packets are not dropped while dealing with clash resolution. > > > > > > I have applied Florian's patch to nf.git, thanks. > > > > Is there a workaround I can apply in the meantime? kernels fixes take > > a long time to be on users' distros and I have continuous reports > > about this problem. > > > > I was thinking that I can track the tuples in userspace and hold the > > duplicate for some time, but I'm not sure this will completely solve > > the problem and I want to consider this as a last resort. > > Is there any feature in nftables that can help? any ideas/suggestions > > I can explore? > > answering myself and for reference in case someone hits the same > problem, I just special cased the DNS traffic to be processed only in > the PREROUTING hook after DNAT and skip it in POSTROUTING, this does > not seem to trigger the race problem. I am going to request inclusion of this patch to -stable so you don't have to carry this workaround in the near future.
diff --git a/net/netfilter/nfnetlink_queue.c b/net/netfilter/nfnetlink_queue.c index e0716da256bf..aeb354271e85 100644 --- a/net/netfilter/nfnetlink_queue.c +++ b/net/netfilter/nfnetlink_queue.c @@ -276,7 +276,8 @@ static int nf_ip_reroute(struct sk_buff *skb, const struct nf_queue_entry *entry #ifdef CONFIG_INET const struct ip_rt_info *rt_info = nf_queue_entry_reroute(entry); - if (entry->state.hook == NF_INET_LOCAL_OUT) { + if (entry->state.hook == NF_INET_LOCAL_OUT || + entry->state.hook == NF_INET_POST_ROUTING) { const struct iphdr *iph = ip_hdr(skb); if (!(iph->tos == rt_info->tos &&
368982cd7d1b ("netfilter: nfnetlink_queue: resolve clash for unconfirmed conntracks") adjusts NAT again in case that packet loses race to confirm the conntrack entry. The reinject path triggers a route lookup again for the output hook, but not for the postrouting hook where queue to userspace is also possible. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: Antonio Ojea <antonio.ojea.garcia@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> --- I tried but I am not managing to make a selftest that runs reliable. I can reproduce it manually and validate that this works. ./nft_queue -d 1000 helps by introducing a delay of 1000ms in the userspace queue processing which helps trigger the race more easily, socat needs to send several packets in the same UDP flow. @Antonio: Could you try this patch meanwhile there is a testcase for this. net/netfilter/nfnetlink_queue.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)