[v3] IPv4: Tunnel: Fix effective path mtu calculation

Message ID 20200625224435.GA2325089@tws
State Changes Requested
Delegated to: David Miller
Series [v3] IPv4: Tunnel: Fix effective path mtu calculation

Commit Message

Oliver Herms June 25, 2020, 10:44 p.m. UTC
The calculation of the effective tunnel MTU, which is used to create
MTU exceptions if necessary, is currently not done correctly. This
leads to unnecessary entries in the IPv6 route cache for any
packet sent through the tunnel.

The root cause is that "dev->hard_header_len" is subtracted from the
tunnel destination's path MTU, which subtracts too much if
dev->hard_header_len is filled in. This is the case for SIT tunnels,
where hard_header_len is the underlying device's hard_header_len (e.g. 14
for Ethernet) + 20 bytes IP header (see net/ipv6/sit.c:1091).

However, the MTU of the path is exclusive of the ethernet header
and the 20 bytes for the IP header are being subtracted separately
already. Thus hard_header_len is removed from this calculation.

For IPIP and GRE tunnels this doesn't change anything as hard_header_len
is zero in those cases anyways.

This patch also corrects the calculation of the payload's packet size.

Fixes: c54419321455 ("GRE: Refactor GRE tunneling code.")
Signed-off-by: Oliver Herms <oliver.peter.herms@gmail.com>
---
 net/ipv4/ip_tunnel.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)
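
The arithmetic described in the commit message can be sketched in user
space as follows. This is only an illustration, not kernel code: the
function names, the 1500-byte path MTU and the tunnel_hlen of 0 are
assumptions made for the example; the 14 and 20 byte header sizes are the
ones quoted above.

```c
#include <assert.h>
#include <stdio.h>

/* Header sizes quoted in the commit message; the names are hypothetical. */
enum {
	ETH_HDR_LEN  = 14,	/* Ethernet header */
	IPV4_HDR_LEN = 20,	/* outer IPv4 header without options */
};

/* Effective tunnel MTU as computed before the patch (df set). */
int tunnel_mtu_old(int path_mtu, int hard_header_len, int tunnel_hlen)
{
	assert(path_mtu > 0);
	return path_mtu - hard_header_len - IPV4_HDR_LEN - tunnel_hlen;
}

/* Effective tunnel MTU as computed with the patch applied (df set). */
int tunnel_mtu_new(int path_mtu, int tunnel_hlen)
{
	assert(path_mtu > 0);
	return path_mtu - IPV4_HDR_LEN - tunnel_hlen;
}

/* Print both results for a SIT device stacked on Ethernet. */
void show_sit_example(void)
{
	int path_mtu = 1500;		/* assumed path MTU */
	int tunnel_hlen = 0;		/* assumed: no extra encapsulation */
	int sit_hard_header_len = ETH_HDR_LEN + IPV4_HDR_LEN;	/* 34 */

	printf("old: %d\n",
	       tunnel_mtu_old(path_mtu, sit_hard_header_len, tunnel_hlen));
	printf("new: %d\n", tunnel_mtu_new(path_mtu, tunnel_hlen));
}
```

Under these assumptions the old formula yields 1446 and the new one 1480;
the difference of 34 bytes is exactly the SIT device's hard_header_len.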

Comments

Jakub Kicinski June 30, 2020, 6:22 a.m. UTC | #1
On Fri, 26 Jun 2020 00:44:35 +0200 Oliver Herms wrote:
> The calculation of the effective tunnel MTU, which is used to create
> MTU exceptions if necessary, is currently not done correctly. This
> leads to unnecessary entries in the IPv6 route cache for any
> packet sent through the tunnel.
> 
> The root cause is that "dev->hard_header_len" is subtracted from the
> tunnel destination's path MTU, which subtracts too much if
> dev->hard_header_len is filled in. This is the case for SIT tunnels,
> where hard_header_len is the underlying device's hard_header_len (e.g. 14
> for Ethernet) + 20 bytes IP header (see net/ipv6/sit.c:1091).

It seems like SIT possibly got missed in evolution of the ip_tunnel
code? It seems to duplicate a lot of code, including pmtu checking.
Doesn't call ip_tunnel_init()...

My understanding is that for a while now tunnels are not supposed to use
dev->hard_header_len to reserve skb space, and use dev->needed_headroom, 
instead. sit uses hard_header_len and doesn't even copy needed_headroom
of the lower device.

> However, the MTU of the path is exclusive of the ethernet header
> and the 20 bytes for the IP header are being subtracted separately
> already. Thus hard_header_len is removed from this calculation.
> 
> For IPIP and GRE tunnels this doesn't change anything as
> hard_header_len is zero in those cases anyways.

This statement is definitely not true. Please see the calls to
ether_setup() in ip_gre.c, and the implementation of this function.

> This patch also corrects the calculation of the payload's packet size.
> 
> Fixes: c54419321455 ("GRE: Refactor GRE tunneling code.")
> Signed-off-by: Oliver Herms <oliver.peter.herms@gmail.com>

All in all, I think it's the SIT code that needs work, not ip_tunnel.
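
The objection about ether_setup() can be illustrated with a small
user-space model. It is a sketch under stated assumptions, not the kernel
implementation: struct fake_netdev and the fake_/pkt_size_ helpers are
hypothetical, and only the hard_header_len = ETH_HLEN assignment is taken
from ether_setup(). The example also assumes that skb->len covers the
tunnel header, the inner Ethernet header and the inner IP packet.

```c
#include <assert.h>
#include <stddef.h>

#define ETH_HLEN 14	/* same value as the kernel's ETH_HLEN */

/* Hypothetical stand-in for the struct net_device fields used here. */
struct fake_netdev {
	unsigned short hard_header_len;
	unsigned short needed_headroom;
};

/* Models only the hard_header_len assignment done by ether_setup(). */
void fake_ether_setup(struct fake_netdev *dev)
{
	assert(dev != NULL);
	dev->hard_header_len = ETH_HLEN;
}

/* pkt_size as computed before the patch. */
int pkt_size_old(int skb_len, int tunnel_hlen, const struct fake_netdev *dev)
{
	assert(dev != NULL);
	return skb_len - tunnel_hlen - dev->hard_header_len;
}

/* pkt_size as computed with the patch applied. */
int pkt_size_new(int skb_len, int tunnel_hlen)
{
	return skb_len - tunnel_hlen;
}
```

For a device initialised this way, with an assumed 4-byte GRE header and a
1500-byte inner IP packet (skb_len = 1518), pkt_size_old() gives 1500 while
pkt_size_new() gives 1514. So the patch would change behaviour for
Ethernet-style GRE devices, contrary to the claim in the commit message.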
Oliver Herms June 30, 2020, 10:21 a.m. UTC | #2
On 30.06.20 08:22, Jakub Kicinski wrote:
> On Fri, 26 Jun 2020 00:44:35 +0200 Oliver Herms wrote:
>> The calculation of the effective tunnel MTU, which is used to create
>> MTU exceptions if necessary, is currently not done correctly. This
>> leads to unnecessary entries in the IPv6 route cache for any
>> packet sent through the tunnel.
>>
>> The root cause is that "dev->hard_header_len" is subtracted from the
>> tunnel destination's path MTU, which subtracts too much if
>> dev->hard_header_len is filled in. This is the case for SIT tunnels,
>> where hard_header_len is the underlying device's hard_header_len (e.g. 14
>> for Ethernet) + 20 bytes IP header (see net/ipv6/sit.c:1091).
> 
> It seems like SIT possibly got missed in evolution of the ip_tunnel
> code? It seems to duplicate a lot of code, including pmtu checking.
> Doesn't call ip_tunnel_init()...

Are you open to patches cleaning this up?

> 
> My understanding is that for a while now tunnels are not supposed to use
> dev->hard_header_len to reserve skb space, and use dev->needed_headroom, 
> instead. sit uses hard_header_len and doesn't even copy needed_headroom
> of the lower device.
> 
>> However, the MTU of the path is exclusive of the ethernet header
>> and the 20 bytes for the IP header are being subtracted separately
>> already. Thus hard_header_len is removed from this calculation.
>>
>> For IPIP and GRE tunnels this doesn't change anything as
>> hard_header_len is zero in those cases anyways.
> 
> This statement is definitely not true. Please see the calls to
> ether_setup() in ip_gre.c, and the implementation of this function.

Right. I have to admit I only checked L3 tunnels, using printk on
dev->hard_header_len, which showed 0 for IPIP and GRE.

So shall I file a patch that changes hard_header_len for SIT tunnels to 0?
Nicolas Dichtel June 30, 2020, 3:51 p.m. UTC | #3
Le 30/06/2020 à 08:22, Jakub Kicinski a écrit :
[snip]
> My understanding is that for a while now tunnels are not supposed to use
> dev->hard_header_len to reserve skb space, and use dev->needed_headroom, 
> instead. sit uses hard_header_len and doesn't even copy needed_headroom
> of the lower device.

I missed this. I was wondering why IPv6 tunnels use hard_header_len, and
whether there was a "good" reason:

$ git grep "hard_header_len.*=" net/ipv6/
net/ipv6/ip6_tunnel.c:                  dev->hard_header_len = tdev->hard_header_len + t_hlen;
net/ipv6/ip6_tunnel.c:  dev->hard_header_len = LL_MAX_HEADER + t_hlen;
net/ipv6/sit.c:         dev->hard_header_len = tdev->hard_header_len + sizeof(struct iphdr);
net/ipv6/sit.c: dev->hard_header_len    = LL_MAX_HEADER + t_hlen;

A cleanup would be nice ;-)
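
To illustrate why such a cleanup need not cost any headroom, here is a
user-space sketch modelled on the kernel's LL_RESERVED_SPACE() macro, which
sums hard_header_len and needed_headroom before rounding. struct
fake_netdev and ll_reserved_space() are hypothetical names, and the 34-byte
value is just the SIT-over-Ethernet figure from the commit message; this is
not a proposal for what sit.c should actually set.

```c
#include <assert.h>
#include <stddef.h>

#define HH_DATA_MOD 16	/* alignment unit used by LL_RESERVED_SPACE() */

/* Hypothetical stand-in for the struct net_device fields used here. */
struct fake_netdev {
	unsigned short hard_header_len;
	unsigned short needed_headroom;
};

/* Headroom reserved for a device: both fields are summed, then rounded. */
unsigned int ll_reserved_space(const struct fake_netdev *dev)
{
	unsigned int sum;

	assert(dev != NULL);
	sum = (unsigned int)dev->hard_header_len + dev->needed_headroom;
	return (sum & ~(unsigned int)(HH_DATA_MOD - 1)) + HH_DATA_MOD;
}
```

With { .hard_header_len = 34, .needed_headroom = 0 } and with
{ .hard_header_len = 0, .needed_headroom = 34 } the function returns 48 in
both cases. In this model, moving the tunnel overhead to needed_headroom
keeps the reservation unchanged while no longer inflating hard_header_len
in MTU calculations.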
Jakub Kicinski June 30, 2020, 5:27 p.m. UTC | #4
On Tue, 30 Jun 2020 12:21:14 +0200 Oliver Herms wrote:
> On 30.06.20 08:22, Jakub Kicinski wrote:
> > On Fri, 26 Jun 2020 00:44:35 +0200 Oliver Herms wrote:  
> >> The calculation of the effective tunnel MTU, which is used to create
> >> MTU exceptions if necessary, is currently not done correctly. This
> >> leads to unnecessary entries in the IPv6 route cache for any
> >> packet sent through the tunnel.
> >>
> >> The root cause is that "dev->hard_header_len" is subtracted from the
> >> tunnel destination's path MTU, which subtracts too much if
> >> dev->hard_header_len is filled in. This is the case for SIT tunnels,
> >> where hard_header_len is the underlying device's hard_header_len (e.g. 14
> >> for Ethernet) + 20 bytes IP header (see net/ipv6/sit.c:1091).
> > 
> > It seems like SIT possibly got missed in evolution of the ip_tunnel
> > code? It seems to duplicate a lot of code, including pmtu checking.
> > Doesn't call ip_tunnel_init()...  
> 
> Are you open for patches cleaning this up?

Certainly! Maybe some of the oddities are justified, but cleanup /
re-aligning with the rest of ip_tunnels would be nice.

Not sure how much of it is qualifying as a bug, so perhaps two series
would be needed - one for net / stable with bug fixes and another of
pure cleanups for net-next?
Jakub Kicinski June 30, 2020, 5:33 p.m. UTC | #5
On Tue, 30 Jun 2020 17:51:41 +0200 Nicolas Dichtel wrote:
> Le 30/06/2020 à 08:22, Jakub Kicinski a écrit :
> [snip]
> > My understanding is that for a while now tunnels are not supposed to use
> > dev->hard_header_len to reserve skb space, and use dev->needed_headroom, 
> > instead. sit uses hard_header_len and doesn't even copy needed_headroom
> > of the lower device.  
> 
> I missed this. I was wondering why IPv6 tunnels uses hard_header_len, if there
> was a "good" reason:
> 
> $ git grep "hard_header_len.*=" net/ipv6/
> net/ipv6/ip6_tunnel.c:                  dev->hard_header_len = tdev->hard_header_len + t_hlen;
> net/ipv6/ip6_tunnel.c:  dev->hard_header_len = LL_MAX_HEADER + t_hlen;
> net/ipv6/sit.c:         dev->hard_header_len = tdev->hard_header_len + sizeof(struct iphdr);
> net/ipv6/sit.c: dev->hard_header_len    = LL_MAX_HEADER + t_hlen;
> 
> A cleanup would be nice ;-)

I did some archaeological investigatin' yesterday, and I saw
c95b819ad75b ("gre: Use needed_headroom") which converted GRE.
Then I think Pravin used GRE as a base for better ip_tunnel infra 
and the no-hard_header_len-abuse gospel has spread to other IPv4
tunnels. AFAICT IPv6 tunnels were not as lucky, and SIT just got
missed in the IPv4 conversion.
Nicolas Dichtel June 30, 2020, 10:27 p.m. UTC | #6
Le 30/06/2020 à 19:33, Jakub Kicinski a écrit :
> On Tue, 30 Jun 2020 17:51:41 +0200 Nicolas Dichtel wrote:
>> Le 30/06/2020 à 08:22, Jakub Kicinski a écrit :
>> [snip]
>>> My understanding is that for a while now tunnels are not supposed to use
>>> dev->hard_header_len to reserve skb space, and use dev->needed_headroom, 
>>> instead. sit uses hard_header_len and doesn't even copy needed_headroom
>>> of the lower device.  
>>
>> I missed this. I was wondering why IPv6 tunnels uses hard_header_len, if there
>> was a "good" reason:
>>
>> $ git grep "hard_header_len.*=" net/ipv6/
>> net/ipv6/ip6_tunnel.c:                  dev->hard_header_len = tdev->hard_header_len + t_hlen;
>> net/ipv6/ip6_tunnel.c:  dev->hard_header_len = LL_MAX_HEADER + t_hlen;
>> net/ipv6/sit.c:         dev->hard_header_len = tdev->hard_header_len + sizeof(struct iphdr);
>> net/ipv6/sit.c: dev->hard_header_len    = LL_MAX_HEADER + t_hlen;
>>
>> A cleanup would be nice ;-)
> 
> I did some archaeological investigatin' yesterday, and I saw
> c95b819ad75b ("gre: Use needed_headroom") which converted GRE.
Thanks for the pointer.

> Then I think Pravin used GRE as a base for better ip_tunnel infra 
> and the no-hard_header_len-abuse gospel has spread to other IPv4
> tunnels. AFAICT IPv6 tunnels were not as lucky, and SIT just got
> missed in the IPv4 conversion.
Yep, I agree with you, it's probably "historical".

Patch

diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c
index f4f1d11eab50..66565647122d 100644
--- a/net/ipv4/ip_tunnel.c
+++ b/net/ipv4/ip_tunnel.c
@@ -492,11 +492,10 @@  static int tnl_update_pmtu(struct net_device *dev, struct sk_buff *skb,
 	int mtu;
 
 	tunnel_hlen = md ? tunnel_hlen : tunnel->hlen;
-	pkt_size = skb->len - tunnel_hlen - dev->hard_header_len;
+	pkt_size = skb->len - tunnel_hlen;
 
 	if (df)
-		mtu = dst_mtu(&rt->dst) - dev->hard_header_len
-					- sizeof(struct iphdr) - tunnel_hlen;
+		mtu = dst_mtu(&rt->dst) - sizeof(struct iphdr) - tunnel_hlen;
 	else
 		mtu = skb_valid_dst(skb) ? dst_mtu(skb_dst(skb)) : dev->mtu;