Message ID | 20200625224435.GA2325089@tws
State | Changes Requested
Delegated to | David Miller
Series | [v3] IPv4: Tunnel: Fix effective path mtu calculation
On Fri, 26 Jun 2020 00:44:35 +0200 Oliver Herms wrote:
> The calculation of the effective tunnel MTU, which is used to create
> MTU exceptions where necessary, is currently not done correctly. This
> leads to unnecessary entries in the IPv6 route cache for any packet
> sent through the tunnel.
>
> The root cause is that dev->hard_header_len is subtracted from the
> tunnel destination's path MTU, subtracting too much whenever
> dev->hard_header_len is non-zero. This is the case for SIT tunnels,
> where hard_header_len is the underlying device's hard_header_len
> (e.g. 14 for Ethernet) plus the 20-byte IP header (see net/ipv6/sit.c:1091).

It seems like SIT possibly got missed in the evolution of the ip_tunnel
code? It seems to duplicate a lot of code, including PMTU checking, and
doesn't call ip_tunnel_init()...

My understanding is that for a while now tunnels are not supposed to use
dev->hard_header_len to reserve skb space and should use
dev->needed_headroom instead. sit uses hard_header_len and doesn't even
copy needed_headroom of the lower device.

> However, the path MTU already excludes the Ethernet header, and the
> 20 bytes for the IP header are subtracted separately. Thus
> hard_header_len is removed from this calculation.
>
> For IPIP and GRE tunnels this doesn't change anything, as
> hard_header_len is zero in those cases anyway.

This statement is definitely not true. Please see the calls to
ether_setup() in ip_gre.c and the implementation of that function.

> This patch also corrects the calculation of the payload's packet size.
>
> Fixes: c54419321455 ("GRE: Refactor GRE tunneling code.")
> Signed-off-by: Oliver Herms <oliver.peter.herms@gmail.com>

All in all, I think it's the SIT code that needs work, not ip_tunnel.
On 30.06.20 08:22, Jakub Kicinski wrote:
> On Fri, 26 Jun 2020 00:44:35 +0200 Oliver Herms wrote:
>> The calculation of the effective tunnel MTU, which is used to create
>> MTU exceptions where necessary, is currently not done correctly. This
>> leads to unnecessary entries in the IPv6 route cache for any packet
>> sent through the tunnel.
>>
>> The root cause is that dev->hard_header_len is subtracted from the
>> tunnel destination's path MTU, subtracting too much whenever
>> dev->hard_header_len is non-zero. This is the case for SIT tunnels,
>> where hard_header_len is the underlying device's hard_header_len
>> (e.g. 14 for Ethernet) plus the 20-byte IP header (see net/ipv6/sit.c:1091).
>
> It seems like SIT possibly got missed in the evolution of the ip_tunnel
> code? It seems to duplicate a lot of code, including PMTU checking, and
> doesn't call ip_tunnel_init()...

Are you open to patches cleaning this up?

> My understanding is that for a while now tunnels are not supposed to use
> dev->hard_header_len to reserve skb space and should use
> dev->needed_headroom instead. sit uses hard_header_len and doesn't even
> copy needed_headroom of the lower device.
>
>> However, the path MTU already excludes the Ethernet header, and the
>> 20 bytes for the IP header are subtracted separately. Thus
>> hard_header_len is removed from this calculation.
>>
>> For IPIP and GRE tunnels this doesn't change anything, as
>> hard_header_len is zero in those cases anyway.
>
> This statement is definitely not true. Please see the calls to
> ether_setup() in ip_gre.c and the implementation of that function.

Right. I have to admit I only checked the L3 tunnels, using printk on
dev->hard_header_len, which showed 0 for IPIP and GRE.

So shall I send a patch that changes hard_header_len for SIT tunnels to 0?
On 30/06/2020 08:22, Jakub Kicinski wrote:
[snip]
> My understanding is that for a while now tunnels are not supposed to use
> dev->hard_header_len to reserve skb space and should use
> dev->needed_headroom instead. sit uses hard_header_len and doesn't even
> copy needed_headroom of the lower device.

I missed this. I was wondering why the IPv6 tunnels use hard_header_len,
and whether there was a "good" reason:

$ git grep "hard_header_len.*=" net/ipv6/
net/ipv6/ip6_tunnel.c:  dev->hard_header_len = tdev->hard_header_len + t_hlen;
net/ipv6/ip6_tunnel.c:  dev->hard_header_len = LL_MAX_HEADER + t_hlen;
net/ipv6/sit.c:         dev->hard_header_len = tdev->hard_header_len + sizeof(struct iphdr);
net/ipv6/sit.c:         dev->hard_header_len = LL_MAX_HEADER + t_hlen;

A cleanup would be nice ;-)
On Tue, 30 Jun 2020 12:21:14 +0200 Oliver Herms wrote:
> On 30.06.20 08:22, Jakub Kicinski wrote:
>> On Fri, 26 Jun 2020 00:44:35 +0200 Oliver Herms wrote:
>>> The calculation of the effective tunnel MTU, which is used to create
>>> MTU exceptions where necessary, is currently not done correctly. This
>>> leads to unnecessary entries in the IPv6 route cache for any packet
>>> sent through the tunnel.
>>>
>>> The root cause is that dev->hard_header_len is subtracted from the
>>> tunnel destination's path MTU, subtracting too much whenever
>>> dev->hard_header_len is non-zero. This is the case for SIT tunnels,
>>> where hard_header_len is the underlying device's hard_header_len
>>> (e.g. 14 for Ethernet) plus the 20-byte IP header (see net/ipv6/sit.c:1091).
>>
>> It seems like SIT possibly got missed in the evolution of the ip_tunnel
>> code? It seems to duplicate a lot of code, including PMTU checking, and
>> doesn't call ip_tunnel_init()...
>
> Are you open to patches cleaning this up?

Certainly! Maybe some of the oddities are justified, but a cleanup /
re-alignment with the rest of ip_tunnel would be nice.

Not sure how much of it qualifies as a bug, so perhaps two series would
be needed: one for net / stable with the bug fixes and another of pure
cleanups for net-next?
On Tue, 30 Jun 2020 17:51:41 +0200 Nicolas Dichtel wrote:
> On 30/06/2020 08:22, Jakub Kicinski wrote:
> [snip]
>> My understanding is that for a while now tunnels are not supposed to use
>> dev->hard_header_len to reserve skb space and should use
>> dev->needed_headroom instead. sit uses hard_header_len and doesn't even
>> copy needed_headroom of the lower device.
>
> I missed this. I was wondering why the IPv6 tunnels use hard_header_len,
> and whether there was a "good" reason:
>
> $ git grep "hard_header_len.*=" net/ipv6/
> net/ipv6/ip6_tunnel.c:  dev->hard_header_len = tdev->hard_header_len + t_hlen;
> net/ipv6/ip6_tunnel.c:  dev->hard_header_len = LL_MAX_HEADER + t_hlen;
> net/ipv6/sit.c:         dev->hard_header_len = tdev->hard_header_len + sizeof(struct iphdr);
> net/ipv6/sit.c:         dev->hard_header_len = LL_MAX_HEADER + t_hlen;
>
> A cleanup would be nice ;-)

I did some archaeological investigating yesterday and found
c95b819ad75b ("gre: Use needed_headroom"), which converted GRE. Then I
think Pravin used GRE as a base for the better ip_tunnel infrastructure,
and the no-hard_header_len-abuse gospel spread to the other IPv4
tunnels. AFAICT the IPv6 tunnels were not as lucky, and SIT just got
missed in the IPv4 conversion.
On 30/06/2020 19:33, Jakub Kicinski wrote:
> On Tue, 30 Jun 2020 17:51:41 +0200 Nicolas Dichtel wrote:
>> On 30/06/2020 08:22, Jakub Kicinski wrote:
>> [snip]
>>> My understanding is that for a while now tunnels are not supposed to use
>>> dev->hard_header_len to reserve skb space and should use
>>> dev->needed_headroom instead. sit uses hard_header_len and doesn't even
>>> copy needed_headroom of the lower device.
>>
>> I missed this. I was wondering why the IPv6 tunnels use hard_header_len,
>> and whether there was a "good" reason:
>>
>> $ git grep "hard_header_len.*=" net/ipv6/
>> net/ipv6/ip6_tunnel.c:  dev->hard_header_len = tdev->hard_header_len + t_hlen;
>> net/ipv6/ip6_tunnel.c:  dev->hard_header_len = LL_MAX_HEADER + t_hlen;
>> net/ipv6/sit.c:         dev->hard_header_len = tdev->hard_header_len + sizeof(struct iphdr);
>> net/ipv6/sit.c:         dev->hard_header_len = LL_MAX_HEADER + t_hlen;
>>
>> A cleanup would be nice ;-)
>
> I did some archaeological investigating yesterday and found
> c95b819ad75b ("gre: Use needed_headroom"), which converted GRE.

Thanks for the pointer.

> Then I think Pravin used GRE as a base for the better ip_tunnel
> infrastructure, and the no-hard_header_len-abuse gospel spread to the
> other IPv4 tunnels. AFAICT the IPv6 tunnels were not as lucky, and SIT
> just got missed in the IPv4 conversion.

Yep, I agree with you, it's probably "historical".
diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c
index f4f1d11eab50..66565647122d 100644
--- a/net/ipv4/ip_tunnel.c
+++ b/net/ipv4/ip_tunnel.c
@@ -492,11 +492,10 @@ static int tnl_update_pmtu(struct net_device *dev, struct sk_buff *skb,
 	int mtu;
 
 	tunnel_hlen = md ? tunnel_hlen : tunnel->hlen;
-	pkt_size = skb->len - tunnel_hlen - dev->hard_header_len;
+	pkt_size = skb->len - tunnel_hlen;
 
 	if (df)
-		mtu = dst_mtu(&rt->dst) - dev->hard_header_len
-					- sizeof(struct iphdr) - tunnel_hlen;
+		mtu = dst_mtu(&rt->dst) - sizeof(struct iphdr) - tunnel_hlen;
 	else
 		mtu = skb_valid_dst(skb) ? dst_mtu(skb_dst(skb)) : dev->mtu;
The calculation of the effective tunnel MTU, which is used to create
MTU exceptions where necessary, is currently not done correctly. This
leads to unnecessary entries in the IPv6 route cache for any packet
sent through the tunnel.

The root cause is that dev->hard_header_len is subtracted from the
tunnel destination's path MTU, subtracting too much whenever
dev->hard_header_len is non-zero. This is the case for SIT tunnels,
where hard_header_len is the underlying device's hard_header_len
(e.g. 14 for Ethernet) plus the 20-byte IP header (see net/ipv6/sit.c:1091).

However, the path MTU already excludes the Ethernet header, and the
20 bytes for the IP header are subtracted separately. Thus
hard_header_len is removed from this calculation.

For IPIP and GRE tunnels this doesn't change anything, as
hard_header_len is zero in those cases anyway.

This patch also corrects the calculation of the payload's packet size.

Fixes: c54419321455 ("GRE: Refactor GRE tunneling code.")
Signed-off-by: Oliver Herms <oliver.peter.herms@gmail.com>
---
 net/ipv4/ip_tunnel.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)