Message ID | 1456603028-12589-2-git-send-email-bpoirier@suse.com |
---|---|
State | Superseded, archived |
Delegated to: | David Miller |
Headers | show |
Hi Benjamin, [ -Cc my old RH address ] On 02/27/2016 08:57 PM, Benjamin Poirier wrote: > The current reserved_tailroom calculation fails to take hlen and tlen into > account. > > skb: > [__hlen__|__data____________|__tlen___|__extra__] > ^ ^ > head skb_end_offset > > In this representation, hlen + data + tlen is the size passed to alloc_skb. > "extra" is the extra space made available in __alloc_skb because of > rounding up by kmalloc. We can reorder the representation like so: > > [__hlen__|__data____________|__extra__|__tlen___] > ^ ^ > head skb_end_offset > > The maximum space available for ip headers and payload without > fragmentation is min(mtu, data + extra). Therefore, > reserved_tailroom > = data + extra + tlen - min(mtu, data + extra) > = skb_end_offset - hlen - min(mtu, skb_end_offset - hlen - tlen) > = skb_tailroom - min(mtu, skb_tailroom - tlen) ; after skb_reserve(hlen) > > Compare the second line to the current expression: > reserved_tailroom = skb_end_offset - min(mtu, skb_end_offset) > and we can see that hlen and tlen are not taken into account. > > Depending on hlen, tlen, mtu and the number of multicast address records, > the current code may output skbs that have less tailroom than > dev->needed_tailroom or it may output more skbs than needed because not all > space available is used. > > Fixes: 4c672e4b ("ipv6: mld: fix add_grhead skb_over_panic for devs with large MTUs") > Signed-off-by: Benjamin Poirier <bpoirier@suse.com> > --- > net/ipv4/igmp.c | 4 ++-- > net/ipv6/mcast.c | 4 ++-- > 2 files changed, 4 insertions(+), 4 deletions(-) > > diff --git a/net/ipv4/igmp.c b/net/ipv4/igmp.c > index 05e4cba..b5d28a4 100644 > --- a/net/ipv4/igmp.c > +++ b/net/ipv4/igmp.c [...] [ cutting the IPv4 part off as diff is the same ] > diff --git a/net/ipv6/mcast.c b/net/ipv6/mcast.c > index 5ee56d0..c157edc 100644 > --- a/net/ipv6/mcast.c > +++ b/net/ipv6/mcast.c > @@ -1574,9 +1574,9 @@ static struct sk_buff *mld_newpack(struct inet6_dev *idev, unsigned int mtu) > return NULL; > > skb->priority = TC_PRIO_CONTROL; > - skb->reserved_tailroom = skb_end_offset(skb) - > - min(mtu, skb_end_offset(skb)); > skb_reserve(skb, hlen); > + skb->reserved_tailroom = skb_tailroom(skb) - > + min_t(int, mtu, skb_tailroom(skb) - tlen); Are you sure this is correct? Wouldn't that mean (assuming we allocated enough space), that I could now fill a larger than MTU frame? > if (__ipv6_get_lladdr(idev, &addr_buf, IFA_F_TENTATIVE)) { > /* <draft-ietf-magma-mld-source-05.txt>: Thanks, Daniel
On 2016/02/29 15:57, Daniel Borkmann wrote: [...] > > [ cutting the IPv4 part off as diff is the same ] > > >diff --git a/net/ipv6/mcast.c b/net/ipv6/mcast.c > >index 5ee56d0..c157edc 100644 > >--- a/net/ipv6/mcast.c > >+++ b/net/ipv6/mcast.c > >@@ -1574,9 +1574,9 @@ static struct sk_buff *mld_newpack(struct inet6_dev *idev, unsigned int mtu) > > return NULL; > > > > skb->priority = TC_PRIO_CONTROL; > >- skb->reserved_tailroom = skb_end_offset(skb) - > >- min(mtu, skb_end_offset(skb)); > > skb_reserve(skb, hlen); > >+ skb->reserved_tailroom = skb_tailroom(skb) - > >+ min_t(int, mtu, skb_tailroom(skb) - tlen); > > Are you sure this is correct? Wouldn't that mean (assuming we allocated > enough space), that I could now fill a larger than MTU frame? Quoting back a part of the log: > >The maximum space available for ip headers and payload without > >fragmentation is min(mtu, data + extra). Therefore, > >reserved_tailroom > >= data + extra + tlen - min(mtu, data + extra) > >= skb_end_offset - hlen - min(mtu, skb_end_offset - hlen - tlen) > >= skb_tailroom - min(mtu, skb_tailroom - tlen) ; after skb_reserve(hlen) The min() takes care of the situation you describe, ie. if the allocated space is large, reserved_tailroom will be large enough that we do not use more space than the mtu. I tested the mld and igmp code with different driver parameters, mtu values, number of multicast address records and even allocation failures. If you think the formula is wrong, please provide a counter-example with hlen, tlen, mtu and size values. Regards, -Benjamin
On 02/29/2016 04:19 PM, Benjamin Poirier wrote: > On 2016/02/29 15:57, Daniel Borkmann wrote: > [...] >> >> [ cutting the IPv4 part off as diff is the same ] >> >>> diff --git a/net/ipv6/mcast.c b/net/ipv6/mcast.c >>> index 5ee56d0..c157edc 100644 >>> --- a/net/ipv6/mcast.c >>> +++ b/net/ipv6/mcast.c >>> @@ -1574,9 +1574,9 @@ static struct sk_buff *mld_newpack(struct inet6_dev *idev, unsigned int mtu) >>> return NULL; >>> >>> skb->priority = TC_PRIO_CONTROL; >>> - skb->reserved_tailroom = skb_end_offset(skb) - >>> - min(mtu, skb_end_offset(skb)); >>> skb_reserve(skb, hlen); >>> + skb->reserved_tailroom = skb_tailroom(skb) - >>> + min_t(int, mtu, skb_tailroom(skb) - tlen); >> >> Are you sure this is correct? Wouldn't that mean (assuming we allocated >> enough space), that I could now fill a larger than MTU frame? > > Quoting back a part of the log: > >>> The maximum space available for ip headers and payload without >>> fragmentation is min(mtu, data + extra). Therefore, >>> reserved_tailroom >>> = data + extra + tlen - min(mtu, data + extra) >>> = skb_end_offset - hlen - min(mtu, skb_end_offset - hlen - tlen) >>> = skb_tailroom - min(mtu, skb_tailroom - tlen) ; after skb_reserve(hlen) > > The min() takes care of the situation you describe, ie. if the allocated > space is large, reserved_tailroom will be large enough that we do not > use more space than the mtu. Hmm, sorry, you are right, I had a bug in my thought process wrt the skb_reserve() that is now done first. Code is fine, patch would be against -net tree: Acked-by: Daniel Borkmann <daniel@iogearbox.net> Thanks, Benjamin!
On 29.02.2016 16:19, Benjamin Poirier wrote: > On 2016/02/29 15:57, Daniel Borkmann wrote: > [...] >> >> [ cutting the IPv4 part off as diff is the same ] >> >>> diff --git a/net/ipv6/mcast.c b/net/ipv6/mcast.c >>> index 5ee56d0..c157edc 100644 >>> --- a/net/ipv6/mcast.c >>> +++ b/net/ipv6/mcast.c >>> @@ -1574,9 +1574,9 @@ static struct sk_buff *mld_newpack(struct inet6_dev *idev, unsigned int mtu) >>> return NULL; >>> >>> skb->priority = TC_PRIO_CONTROL; >>> - skb->reserved_tailroom = skb_end_offset(skb) - >>> - min(mtu, skb_end_offset(skb)); >>> skb_reserve(skb, hlen); >>> + skb->reserved_tailroom = skb_tailroom(skb) - >>> + min_t(int, mtu, skb_tailroom(skb) - tlen); >> >> Are you sure this is correct? Wouldn't that mean (assuming we allocated >> enough space), that I could now fill a larger than MTU frame? > > Quoting back a part of the log: > >>> The maximum space available for ip headers and payload without >>> fragmentation is min(mtu, data + extra). Therefore, >>> reserved_tailroom >>> = data + extra + tlen - min(mtu, data + extra) >>> = skb_end_offset - hlen - min(mtu, skb_end_offset - hlen - tlen) >>> = skb_tailroom - min(mtu, skb_tailroom - tlen) ; after skb_reserve(hlen) > > The min() takes care of the situation you describe, ie. if the allocated > space is large, reserved_tailroom will be large enough that we do not > use more space than the mtu. > > I tested the mld and igmp code with different driver parameters, mtu > values, number of multicast address records and even allocation > failures. If you think the formula is wrong, please provide a > counter-example with hlen, tlen, mtu and size values. I think the code is fine albeit I think we should remove the min macro and just do something: if (skb_tailroom(skb) > mtu) skb->reserved_tailroom = skb_tailroom(skb) - mtu; Does that make sense? I think it is much more readable. Thanks, Hannes
On 2016/02/29 16:43, Hannes Frederic Sowa wrote: > On 29.02.2016 16:19, Benjamin Poirier wrote: > >On 2016/02/29 15:57, Daniel Borkmann wrote: > >[...] > >> > >>[ cutting the IPv4 part off as diff is the same ] > >> > >>>diff --git a/net/ipv6/mcast.c b/net/ipv6/mcast.c > >>>index 5ee56d0..c157edc 100644 > >>>--- a/net/ipv6/mcast.c > >>>+++ b/net/ipv6/mcast.c > >>>@@ -1574,9 +1574,9 @@ static struct sk_buff *mld_newpack(struct inet6_dev *idev, unsigned int mtu) > >>> return NULL; > >>> > >>> skb->priority = TC_PRIO_CONTROL; > >>>- skb->reserved_tailroom = skb_end_offset(skb) - > >>>- min(mtu, skb_end_offset(skb)); > >>> skb_reserve(skb, hlen); > >>>+ skb->reserved_tailroom = skb_tailroom(skb) - > >>>+ min_t(int, mtu, skb_tailroom(skb) - tlen); > >> > >>Are you sure this is correct? Wouldn't that mean (assuming we allocated > >>enough space), that I could now fill a larger than MTU frame? > > > >Quoting back a part of the log: > > > >>>The maximum space available for ip headers and payload without > >>>fragmentation is min(mtu, data + extra). Therefore, > >>>reserved_tailroom > >>>= data + extra + tlen - min(mtu, data + extra) > >>>= skb_end_offset - hlen - min(mtu, skb_end_offset - hlen - tlen) > >>>= skb_tailroom - min(mtu, skb_tailroom - tlen) ; after skb_reserve(hlen) > > > >The min() takes care of the situation you describe, ie. if the allocated > >space is large, reserved_tailroom will be large enough that we do not > >use more space than the mtu. > > > >I tested the mld and igmp code with different driver parameters, mtu > >values, number of multicast address records and even allocation > >failures. If you think the formula is wrong, please provide a > >counter-example with hlen, tlen, mtu and size values. > > I think the code is fine albeit I think we should remove the min macro and > just do something: > > if (skb_tailroom(skb) > mtu) > skb->reserved_tailroom = skb_tailroom(skb) - mtu; > > Does that make sense? I think it is much more readable. That is not equivalent. It fails to take tlen into account. For igmp, consider this case: with hlen = 16, mtu = 9000, tlen = 8, additionally, suppose that the first iteration of the allocation loop (alloc_skb(9000 + 16 + 8, ...) which requires 4 pages) fails and the second iteration (alloc_skb((9000 >> 1) + 16 + 8, ...) which requires 2 pages) succeeds: size = (9000 >> 1) + 16 + 8 = 4524 skb_end_offset = 8192 - 320 = 7872 tailroom = 7872 - 16 = 7856 data = 9000 >> 1 = 4500 extra = 7872 - 4524 = 3348 reserved tailroom (patch version) = 4500 + 3348 + 8 - min(9000, 4500 + 3348) = 8 reserved tailroom (your version) = 0 Headers are ipv4 + igmpv3 = 24 + 8 = 32, records are 8 bytes With 978 igmpv3 records, with your version, we would output an skb that has less tailroom (0) than dev->needed_tailroom (8). For mld, consider this case: with hlen = 16, mtu = 9000, tlen = 8: size = 3776 (SKB_MAX_ORDER case) skb_end_offset = 3776 tailroom = 3776 - 16 = 3760 data = 3776 - 16 - 8 = 3752 extra = 0 reserved tailroom (patch version) = 3752 + 0 + 8 - min(9000, 3752 + 0) = 8 reserved tailroom (your version) = 0 Headers are ipv6 + icmpv6 = 48 + 8 = 56, records are 20 bytes With 185 mld records, with your formula, we would output an skb that has less tailroom (4) than dev->needed_tailroom (8). If you think we should write the expression with "if" instead of "min", instead of the current + skb->reserved_tailroom = skb_tailroom(skb) - + min_t(int, mtu, skb_tailroom(skb) - tlen); it should be: + if (mtu < skb_tailroom(skb) - tlen) + skb->reserved_tailroom = skb_tailroom(skb) - mtu; + else + skb->reserved_tailroom = tlen; The second alternative does not look more readable to me but I have been looking at that expression for a while. If you think that it is more readable, I will resend the patch expressed that way. Please let me know.
On 29.02.2016 19:08, Benjamin Poirier wrote: > If you think we should write the expression with "if" instead of "min", > instead of the current > > + skb->reserved_tailroom = skb_tailroom(skb) - > + min_t(int, mtu, skb_tailroom(skb) - tlen); > > it should be: > > + if (mtu < skb_tailroom(skb) - tlen) > + skb->reserved_tailroom = skb_tailroom(skb) - mtu; > + else > + skb->reserved_tailroom = tlen; > > The second alternative does not look more readable to me but I have been > looking at that expression for a while. If you think that it is more > readable, I will resend the patch expressed that way. Please let me > know. I would still find it more readable actually, but no strong opinion, I would leave it up to you. Could it make sense to put this code into a static inline helper and reuse it for both, igmp and mld? Thanks, Hannes
diff --git a/net/ipv4/igmp.c b/net/ipv4/igmp.c index 05e4cba..b5d28a4 100644 --- a/net/ipv4/igmp.c +++ b/net/ipv4/igmp.c @@ -356,9 +356,9 @@ static struct sk_buff *igmpv3_newpack(struct net_device *dev, unsigned int mtu) skb_dst_set(skb, &rt->dst); skb->dev = dev; - skb->reserved_tailroom = skb_end_offset(skb) - - min(mtu, skb_end_offset(skb)); skb_reserve(skb, hlen); + skb->reserved_tailroom = skb_tailroom(skb) - + min_t(int, mtu, skb_tailroom(skb) - tlen); skb_reset_network_header(skb); pip = ip_hdr(skb); diff --git a/net/ipv6/mcast.c b/net/ipv6/mcast.c index 5ee56d0..c157edc 100644 --- a/net/ipv6/mcast.c +++ b/net/ipv6/mcast.c @@ -1574,9 +1574,9 @@ static struct sk_buff *mld_newpack(struct inet6_dev *idev, unsigned int mtu) return NULL; skb->priority = TC_PRIO_CONTROL; - skb->reserved_tailroom = skb_end_offset(skb) - - min(mtu, skb_end_offset(skb)); skb_reserve(skb, hlen); + skb->reserved_tailroom = skb_tailroom(skb) - + min_t(int, mtu, skb_tailroom(skb) - tlen); if (__ipv6_get_lladdr(idev, &addr_buf, IFA_F_TENTATIVE)) { /* <draft-ietf-magma-mld-source-05.txt>:
The current reserved_tailroom calculation fails to take hlen and tlen into account. skb: [__hlen__|__data____________|__tlen___|__extra__] ^ ^ head skb_end_offset In this representation, hlen + data + tlen is the size passed to alloc_skb. "extra" is the extra space made available in __alloc_skb because of rounding up by kmalloc. We can reorder the representation like so: [__hlen__|__data____________|__extra__|__tlen___] ^ ^ head skb_end_offset The maximum space available for ip headers and payload without fragmentation is min(mtu, data + extra). Therefore, reserved_tailroom = data + extra + tlen - min(mtu, data + extra) = skb_end_offset - hlen - min(mtu, skb_end_offset - hlen - tlen) = skb_tailroom - min(mtu, skb_tailroom - tlen) ; after skb_reserve(hlen) Compare the second line to the current expression: reserved_tailroom = skb_end_offset - min(mtu, skb_end_offset) and we can see that hlen and tlen are not taken into account. Depending on hlen, tlen, mtu and the number of multicast address records, the current code may output skbs that have less tailroom than dev->needed_tailroom or it may output more skbs than needed because not all space available is used. Fixes: 4c672e4b ("ipv6: mld: fix add_grhead skb_over_panic for devs with large MTUs") Signed-off-by: Benjamin Poirier <bpoirier@suse.com> --- net/ipv4/igmp.c | 4 ++-- net/ipv6/mcast.c | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-)