diff mbox

[net,v3] ipv4: allow local fragmentation in ip_finish_output_gso()

Message ID 1478118977-19608-1-git-send-email-lrichard@redhat.com
State Accepted, archived
Delegated to: David Miller
Headers show

Commit Message

Lance Richardson Nov. 2, 2016, 8:36 p.m. UTC
Some configurations (e.g. geneve interface with default
MTU of 1500 over an ethernet interface with 1500 MTU) result
in the transmission of packets that exceed the configured MTU.
While this should be considered to be a "bad" configuration,
it is still allowed and should not result in the sending
of packets that exceed the configured MTU.

Fix by dropping the assumption in ip_finish_output_gso() that
locally originated gso packets will never need fragmentation.
Basic testing using iperf (observing CPU usage and bandwidth)
have shown no measurable performance impact for traffic not
requiring fragmentation.

Fixes: c7ba65d7b649 ("net: ip: push gso skb forwarding handling down the stack")
Reported-by: Jan Tluka <jtluka@redhat.com>
Signed-off-by: Lance Richardson <lrichard@redhat.com>
---
 v2: IPSKB_FRAG_SEGS is no longer useful, remove it.
 v3: Eliminate unused variable warning.
 include/net/ip.h          |  3 +--
 net/ipv4/ip_forward.c     |  2 +-
 net/ipv4/ip_output.c      |  6 ++----
 net/ipv4/ip_tunnel_core.c | 11 -----------
 net/ipv4/ipmr.c           |  2 +-
 5 files changed, 5 insertions(+), 19 deletions(-)

Comments

Shmulik Ladkani Nov. 3, 2016, 7:42 a.m. UTC | #1
On Wed,  2 Nov 2016 16:36:17 -0400
Lance Richardson <lrichard@redhat.com> wrote:

> -	/* common case: fragmentation of segments is not allowed,
> -	 * or seglen is <= mtu
> +	/* common case: seglen is <= mtu
>  	 */
> -	if (((IPCB(skb)->flags & IPSKB_FRAG_SEGS) == 0) ||
> -	      skb_gso_validate_mtu(skb, mtu))
> +	if (skb_gso_validate_mtu(skb, mtu))
>  		return ip_finish_output2(net, sk, skb);

OK.
Resembles https://patchwork.ozlabs.org/patch/644724/ ;)

> -	if (skb_iif && !(df & htons(IP_DF))) {
> -		/* Arrived from an ingress interface, got encapsulated, with
> -		 * fragmentation of encapulating frames allowed.
> -		 * If skb is gso, the resulting encapsulated network segments
> -		 * may exceed dst mtu.
> -		 * Allow IP Fragmentation of segments.
> -		 */
> -		IPCB(skb)->flags |= IPSKB_FRAG_SEGS;
> -	}

The only potential issue I see removing this, is that the KNOWLEDGE of
the forwarding/tunnel-bridging usecases (that require a
skb_gso_validate_mtu check) is lost.

Meaning, if one decides (as claimed in the past) that locally generated
traffic must NOT go through fragmentation validation, then no one
remembers those other valid usecases (which are very specific and not
trivial to imagine) that MUST go through it; Therefore it can easily get
broken.

Maybe we can elaborate in the comment above the call to
skb_gso_validate_mtu in ip_finish_output_gso?

Best,
Shmulik
Hannes Frederic Sowa Nov. 3, 2016, 9:42 a.m. UTC | #2
On 02.11.2016 21:36, Lance Richardson wrote:
> Some configurations (e.g. geneve interface with default
> MTU of 1500 over an ethernet interface with 1500 MTU) result
> in the transmission of packets that exceed the configured MTU.
> While this should be considered to be a "bad" configuration,
> it is still allowed and should not result in the sending
> of packets that exceed the configured MTU.
> 
> Fix by dropping the assumption in ip_finish_output_gso() that
> locally originated gso packets will never need fragmentation.
> Basic testing using iperf (observing CPU usage and bandwidth)
> have shown no measurable performance impact for traffic not
> requiring fragmentation.
> 
> Fixes: c7ba65d7b649 ("net: ip: push gso skb forwarding handling down the stack")
> Reported-by: Jan Tluka <jtluka@redhat.com>
> Signed-off-by: Lance Richardson <lrichard@redhat.com>

Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>

Thanks Florian for remembering that we can remove IPSKB_FRAG_SEGS now
again. :)
Hannes Frederic Sowa Nov. 3, 2016, 9:44 a.m. UTC | #3
On 03.11.2016 08:42, Shmulik Ladkani wrote:
> On Wed,  2 Nov 2016 16:36:17 -0400
> Lance Richardson <lrichard@redhat.com> wrote:
> 
>> -	/* common case: fragmentation of segments is not allowed,
>> -	 * or seglen is <= mtu
>> +	/* common case: seglen is <= mtu
>>  	 */
>> -	if (((IPCB(skb)->flags & IPSKB_FRAG_SEGS) == 0) ||
>> -	      skb_gso_validate_mtu(skb, mtu))
>> +	if (skb_gso_validate_mtu(skb, mtu))
>>  		return ip_finish_output2(net, sk, skb);
> 
> OK.
> Resembles https://patchwork.ozlabs.org/patch/644724/ ;)
> 
>> -	if (skb_iif && !(df & htons(IP_DF))) {
>> -		/* Arrived from an ingress interface, got encapsulated, with
>> -		 * fragmentation of encapulating frames allowed.
>> -		 * If skb is gso, the resulting encapsulated network segments
>> -		 * may exceed dst mtu.
>> -		 * Allow IP Fragmentation of segments.
>> -		 */
>> -		IPCB(skb)->flags |= IPSKB_FRAG_SEGS;
>> -	}
> 
> The only potential issue I see removing this, is that the KNOWLEDGE of
> the forwarding/tunnel-bridging usecases (that require a
> skb_gso_validate_mtu check) is lost.
> 
> Meaning, if one decides (as claimed in the past) that locally generated
> traffic must NOT go through fragmentation validation, then no one
> remembers those other valid usecases (which are very specific and not
> trivial to imagine) that MUST go through it; Therefore it can easily get
> broken.

I agree but I think the git changelog reassembles the changes in this
area in good enough detail and it can be added if there is need for that
currently I don't see a reason why to leave this code there.

I am a bit sad to remove this condition, but the alternative, to
generate oversized frames, is absolutely not acceptable. :/

Thanks,
Hannes
Lance Richardson Nov. 3, 2016, 1:06 p.m. UTC | #4
> From: "Shmulik Ladkani" <shmulik.ladkani@gmail.com>
> To: "Lance Richardson" <lrichard@redhat.com>
> Cc: netdev@vger.kernel.org, fw@strlen.de, jtluka@redhat.com, hannes@stressinduktion.org
> Sent: Thursday, November 3, 2016 3:42:39 AM
> Subject: Re: [PATCH net v3] ipv4: allow local fragmentation in ip_finish_output_gso()
> 
> On Wed,  2 Nov 2016 16:36:17 -0400
> Lance Richardson <lrichard@redhat.com> wrote:
> 
> > -	/* common case: fragmentation of segments is not allowed,
> > -	 * or seglen is <= mtu
> > +	/* common case: seglen is <= mtu
> >  	 */
> > -	if (((IPCB(skb)->flags & IPSKB_FRAG_SEGS) == 0) ||
> > -	      skb_gso_validate_mtu(skb, mtu))
> > +	if (skb_gso_validate_mtu(skb, mtu))
> >  		return ip_finish_output2(net, sk, skb);
> 
> OK.
> Resembles https://patchwork.ozlabs.org/patch/644724/ ;)
> 
> > -	if (skb_iif && !(df & htons(IP_DF))) {
> > -		/* Arrived from an ingress interface, got encapsulated, with
> > -		 * fragmentation of encapulating frames allowed.
> > -		 * If skb is gso, the resulting encapsulated network segments
> > -		 * may exceed dst mtu.
> > -		 * Allow IP Fragmentation of segments.
> > -		 */
> > -		IPCB(skb)->flags |= IPSKB_FRAG_SEGS;
> > -	}
> 
> The only potential issue I see removing this, is that the KNOWLEDGE of
> the forwarding/tunnel-bridging usecases (that require a
> skb_gso_validate_mtu check) is lost.
> 
> Meaning, if one decides (as claimed in the past) that locally generated
> traffic must NOT go through fragmentation validation, then no one
> remembers those other valid usecases (which are very specific and not
> trivial to imagine) that MUST go through it; Therefore it can easily get
> broken.
> 

Hi Shmulik,

It is my understanding that the test to avoid fragmentation validation
for locally originated packets was a performance optimization which was
based on the premise that the IP stack will never produce locally originated
packets with an MTU mismatch. Tunneling encapsulation breaks this premise.
For correctness (honoring configured MTU), it seems we cannot avoid calling
skb_gso_validate_mtu().

> Maybe we can elaborate in the comment above the call to
> skb_gso_validate_mtu in ip_finish_output_gso?

I'm not sure what could be added that would help, was there something
specific you had in mind?

Thanks for reviewing this.

Regards,

   Lance
> 
> Best,
> Shmulik
>
David Miller Nov. 3, 2016, 8:12 p.m. UTC | #5
From: Lance Richardson <lrichard@redhat.com>
Date: Wed,  2 Nov 2016 16:36:17 -0400

> Some configurations (e.g. geneve interface with default
> MTU of 1500 over an ethernet interface with 1500 MTU) result
> in the transmission of packets that exceed the configured MTU.
> While this should be considered to be a "bad" configuration,
> it is still allowed and should not result in the sending
> of packets that exceed the configured MTU.
> 
> Fix by dropping the assumption in ip_finish_output_gso() that
> locally originated gso packets will never need fragmentation.
> Basic testing using iperf (observing CPU usage and bandwidth)
> have shown no measurable performance impact for traffic not
> requiring fragmentation.
> 
> Fixes: c7ba65d7b649 ("net: ip: push gso skb forwarding handling down the stack")
> Reported-by: Jan Tluka <jtluka@redhat.com>
> Signed-off-by: Lance Richardson <lrichard@redhat.com>

Applied and queued up for -stable.
Shmulik Ladkani Nov. 3, 2016, 8:27 p.m. UTC | #6
Hi Hannes, Lance,

On Wed,  2 Nov 2016 16:36:17 -0400 Lance Richardson <lrichard@redhat.com> wrote:
>  
> -	if (skb_iif && !(df & htons(IP_DF))) {
> -		/* Arrived from an ingress interface, got encapsulated, with
> -		 * fragmentation of encapulating frames allowed.
> -		 * If skb is gso, the resulting encapsulated network segments
> -		 * may exceed dst mtu.
> -		 * Allow IP Fragmentation of segments.
> -		 */
> -		IPCB(skb)->flags |= IPSKB_FRAG_SEGS;
> -	}

Thinking this over, I'm concerned of this change.

Few months back, we discussed this and got to the conclusion that in the
"ingress,tunnel,egress" scenario, segments are allowed to be
fragmented if the original inner ip packet does NOT have the DF.

See 
  https://patchwork.ozlabs.org/patch/657132/
  https://patchwork.ozlabs.org/patch/661219/

I think you expressed that those tunneled skbs already having DF set
should go through pmtu discovery.

Suggested patch unconditionally calls skb_gso_validate_mtu().

Thus we're changing behavior for "ingress,tunnel,egress" scenario of
the tunneled packets having DF set in the inner iph.

WDYT?
Shmulik Ladkani Nov. 3, 2016, 8:40 p.m. UTC | #7
On Thu, 03 Nov 2016 16:12:44 -0400 (EDT) David Miller <davem@davemloft.net> wrote:
> Applied and queued up for -stable.

Dave, my response lagged your "Applied" by few minutes ;)

This seems to deserve some more thought to make sure nothing got broken,
as expressed last in https://patchwork.ozlabs.org/patch/690594/

Best,
Shmulik
David Miller Nov. 3, 2016, 8:56 p.m. UTC | #8
From: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Date: Thu, 3 Nov 2016 22:40:52 +0200

> On Thu, 03 Nov 2016 16:12:44 -0400 (EDT) David Miller <davem@davemloft.net> wrote:
>> Applied and queued up for -stable.
> 
> Dave, my response lagged your "Applied" by few minutes ;)
> 
> This seems to deserve some more thought to make sure nothing got broken,
> as expressed last in https://patchwork.ozlabs.org/patch/690594/

Feel free to send a followup or a revert if necessary.
Lance Richardson Nov. 3, 2016, 9:05 p.m. UTC | #9
> From: "Shmulik Ladkani" <shmulik.ladkani@gmail.com>
> To: "Lance Richardson" <lrichard@redhat.com>, fw@strlen.de, hannes@stressinduktion.org
> Cc: netdev@vger.kernel.org, jtluka@redhat.com
> Sent: Thursday, November 3, 2016 4:27:51 PM
> Subject: Re: [PATCH net v3] ipv4: allow local fragmentation in ip_finish_output_gso()
> 
> Hi Hannes, Lance,
> 
> On Wed,  2 Nov 2016 16:36:17 -0400 Lance Richardson <lrichard@redhat.com>
> wrote:
> >  
> > -	if (skb_iif && !(df & htons(IP_DF))) {
> > -		/* Arrived from an ingress interface, got encapsulated, with
> > -		 * fragmentation of encapulating frames allowed.
> > -		 * If skb is gso, the resulting encapsulated network segments
> > -		 * may exceed dst mtu.
> > -		 * Allow IP Fragmentation of segments.
> > -		 */
> > -		IPCB(skb)->flags |= IPSKB_FRAG_SEGS;
> > -	}
> 
> Thinking this over, I'm concerned of this change.
> 
> Few months back, we discussed this and got to the conclusion that in the
> "ingress,tunnel,egress" scenario, segments are allowed to be
> fragmented if the original inner ip packet does NOT have the DF.
> 
> See
>   https://patchwork.ozlabs.org/patch/657132/
>   https://patchwork.ozlabs.org/patch/661219/
> 
> I think you expressed that those tunneled skbs already having DF set
> should go through pmtu discovery.
> 
> Suggested patch unconditionally calls skb_gso_validate_mtu().
> 
> Thus we're changing behavior for "ingress,tunnel,egress" scenario of
> the tunneled packets having DF set in the inner iph.
> 
> WDYT?
> 

I'm still digesting the patchwork history, but it seems to me:

   1) If we call skb_gso_validate_mtu() and it returns true, ip_finish_output2() will
      be called, just as before, so nothing changes.

   2) If we were to avoid calling skb_gso_validate_mtu() when it would have returned
      false and call ip_finish_output2() without performing fragmentation, we
      would transmit (or attempt to transmit) a packet that exceeds the configured
      MTU.

   3) If we do call skb_gso_validate_mtu() and it returns false, ip_finish_output_gso()
      will call ip_fragment() to perform needed fragmentation. Whether fragmentation
      actually occurs at this point depends on the value of the DF flag in the
      IP header (and perhaps skb->ignore_df, frag_max_size, etc.).

Is the issue you're pointing out about cases in which the inner IP header has DF set
but the tunnel header does not?

Thanks,

   Lance
Hannes Frederic Sowa Nov. 3, 2016, 9:34 p.m. UTC | #10
On 03.11.2016 22:05, Lance Richardson wrote:
>> From: "Shmulik Ladkani" <shmulik.ladkani@gmail.com>
>> To: "Lance Richardson" <lrichard@redhat.com>, fw@strlen.de, hannes@stressinduktion.org
>> Cc: netdev@vger.kernel.org, jtluka@redhat.com
>> Sent: Thursday, November 3, 2016 4:27:51 PM
>> Subject: Re: [PATCH net v3] ipv4: allow local fragmentation in ip_finish_output_gso()
>>
>> Hi Hannes, Lance,
>>
>> On Wed,  2 Nov 2016 16:36:17 -0400 Lance Richardson <lrichard@redhat.com>
>> wrote:
>>>  
>>> -	if (skb_iif && !(df & htons(IP_DF))) {
>>> -		/* Arrived from an ingress interface, got encapsulated, with
>>> -		 * fragmentation of encapulating frames allowed.
>>> -		 * If skb is gso, the resulting encapsulated network segments
>>> -		 * may exceed dst mtu.
>>> -		 * Allow IP Fragmentation of segments.
>>> -		 */
>>> -		IPCB(skb)->flags |= IPSKB_FRAG_SEGS;
>>> -	}
>>
>> Thinking this over, I'm concerned of this change.
>>
>> Few months back, we discussed this and got to the conclusion that in the
>> "ingress,tunnel,egress" scenario, segments are allowed to be
>> fragmented if the original inner ip packet does NOT have the DF.



>>
>> See
>>   https://patchwork.ozlabs.org/patch/657132/
>>   https://patchwork.ozlabs.org/patch/661219/
>>
>> I think you expressed that those tunneled skbs already having DF set
>> should go through pmtu discovery.
>>
>> Suggested patch unconditionally calls skb_gso_validate_mtu().
>>
>> Thus we're changing behavior for "ingress,tunnel,egress" scenario of
>> the tunneled packets having DF set in the inner iph.
>>
>> WDYT?
>>
> 
> I'm still digesting the patchwork history, but it seems to me:
> 
>    1) If we call skb_gso_validate_mtu() and it returns true, ip_finish_output2() will
>       be called, just as before, so nothing changes.
> 
>    2) If we were to avoid calling skb_gso_validate_mtu() when it would have returned
>       false and call ip_finish_output2() without performing fragmentation, we
>       would transmit (or attempt to transmit) a packet that exceeds the configured
>       MTU.
> 
>    3) If we do call skb_gso_validate_mtu() and it returns false, ip_finish_output_gso()
>       will call ip_fragment() to perform needed fragmentation. Whether fragmentation
>       actually occurs at this point depends on the value of the DF flag in the
>       IP header (and perhaps skb->ignore_df, frag_max_size, etc.).
> 
> Is the issue you're pointing out about cases in which the inner IP header has DF set
> but the tunnel header does not?

Correct, but we should maybe redefine the code a bit. From my
understanding we can now create an ICMP storm in case every fragment gets.

I think for net this is currently fine and we certainly don't want to
generate oversized datagrams.

I fear the more special case logic we add the sooner or later it will
bite us again. :/

Right now, I see the problem we might end up generating lots of error
callbacks for large gso segments. Maybe we can also just abort after
fragmenting the first frame generated an error. Florian? Or just
overoptimize and jump to the last one, which could have a different size. :)

Anyway, the best thing would be that vxlan etc. inherits the inner DF bit.

The problem to solve here is that for some time we generated oversized
packets on the wire, which is absolutely a no-go. If we now start to
break things for people with "wrong" configuration, we could also get
more complains. Currently I think this is the only way out, unfortunately.

Any other ideas?
Shmulik Ladkani Nov. 4, 2016, 8:02 a.m. UTC | #11
Hi,

On Thu, 3 Nov 2016 17:05:54 -0400 (EDT) Lance Richardson <lrichard@redhat.com> wrote:
> 
> I'm still digesting the patchwork history, but it seems to me:
> 
>    1) If we call skb_gso_validate_mtu() and it returns true, ip_finish_output2() will
>       be called, just as before, so nothing changes.
> 
>    2) If we were to avoid calling skb_gso_validate_mtu() when it would have returned
>       false and call ip_finish_output2() without performing fragmentation, we
>       would transmit (or attempt to transmit) a packet that exceeds the configured
>       MTU.

Yes, this seemed strange to me as well.

I proposed your exact same patch, but Hannes wanted to limit its
behavior, expressing the following concerns (see [1]), quote:

    "The reason why I rather don't want to see a general solution is that
    currently gre and ipip are pmtu-safe and I rather would like to keep the
    old behavior that gre and ipip packets don't get treated alike. I
    couldn't check the current throughly but it seems they would also be
    affected by this change.

    My idea was to put those IPSKB_FORWARD just into the vxlan and geneve
    endpoints which could be seen as UDP endpoints and don't forward and
    care about pmtu events."

and

    "My argumentation is that vxlan and geneve can be seen as endpoints and
    don't have proper pmtu handling. Thus I would be fine with adding hacks
    to just make it working. For GRE I would like to keep strict internet
    pmtu handling and also keep the normal ethernet semantics in place, that
    we should drop packets if another interface did transmit on an ethernet
    bridge with a too big frame size."

Point is, I have no objection to the suggested patch as a whole.
I think (and thought) it is generally correct, as it fixes the anomalies
happending with GSO skbs not getting same treatment as non-gso skbs.

Just want to check whether Hannes' concerns are still valid, and if so,
refine the patch as needed.

Best,
Shmulik

[1] https://patchwork.ozlabs.org/patch/646683/
Shmulik Ladkani Nov. 4, 2016, 9:24 a.m. UTC | #12
Hi,

On Thu, 3 Nov 2016 09:06:27 -0400 (EDT) Lance Richardson <lrichard@redhat.com> wrote:
> I'm not sure what could be added that would help, was there something
> specific you had in mind?

How about something like this (preliminary, feel free to massage):

@@ -248,10 +248,16 @@ static int ip_finish_output_gso(struct net *net, struct sock *sk,
 
 	/* Slowpath -  GSO segment length is exceeding the dst MTU.
 	 *
-	 * This can happen in two cases:
-	 * 1) TCP GRO packet, DF bit not set
-	 * 2) skb arrived via virtio-net, we thus get TSO/GSO skbs directly
-	 * from host network stack.
+	 * This can happen in several cases:
+	 *  - Forwarding of TCP GRO packet, DF bit not set
+	 *  - Forwarding of skb arrived in a virtualization environment (from
+	 *    virtio-net/vhost/tap) with TSO/GSO size set by other's network
+	 *    stack
+	 *  - Local GSO skb xmitted on an NETIF_F_TSO tunnel stacked over an
+	 *    interface with a smaller mtu
+	 *  - Arriving GRO skb (or GSO skb in a virtualized env) that gets L2
+	 *    bridged to a NETIF_F_TSO tunnel stacked over an interface with an
+	 *    insufficent mtu
 	 */
 	features = netif_skb_features(skb);
 	BUILD_BUG_ON(sizeof(*IPCB(skb)) > SKB_SGO_CB_OFFSET);
Shmulik Ladkani Nov. 4, 2016, 9:40 a.m. UTC | #13
On Thu, 3 Nov 2016 22:34:34 +0100 Hannes Frederic Sowa <hannes@stressinduktion.org> wrote:
> Correct, but we should maybe redefine the code a bit. From my
> understanding we can now create an ICMP storm in case every fragment gets.

Yes, you are right.

Each segment gets into ip_fragment, and due to outer DF being set,
ICMP_FRAG_NEEDED is sent per segment.

BTW, suppose GRO is off, and sender actually did send a burst of
(non-gso) packets with outer DF set, and each was tunnel encapsulated,
resulting in oversized frames.

Would'nt the stack just send the ICMP_FRAG_NEEDED per encapsulated
frame?

If so, then the GRO behaviour is aligned, and there's nothing to fix.

Best,
Shmulik
Lance Richardson Nov. 4, 2016, 1:48 p.m. UTC | #14
> From: "Shmulik Ladkani" <shmulik.ladkani@gmail.com>
> To: "Lance Richardson" <lrichard@redhat.com>
> Cc: netdev@vger.kernel.org, fw@strlen.de, jtluka@redhat.com, hannes@stressinduktion.org
> Sent: Friday, November 4, 2016 5:24:09 AM
> Subject: Re: [PATCH net v3] ipv4: allow local fragmentation in ip_finish_output_gso()
> 
> Hi,
> 
> On Thu, 3 Nov 2016 09:06:27 -0400 (EDT) Lance Richardson
> <lrichard@redhat.com> wrote:
> > I'm not sure what could be added that would help, was there something
> > specific you had in mind?
> 
> How about something like this (preliminary, feel free to massage):
> 
> @@ -248,10 +248,16 @@ static int ip_finish_output_gso(struct net *net, struct
> sock *sk,
>  
>  	/* Slowpath -  GSO segment length is exceeding the dst MTU.
>  	 *
> -	 * This can happen in two cases:
> -	 * 1) TCP GRO packet, DF bit not set
> -	 * 2) skb arrived via virtio-net, we thus get TSO/GSO skbs directly
> -	 * from host network stack.
> +	 * This can happen in several cases:
> +	 *  - Forwarding of TCP GRO packet, DF bit not set
> +	 *  - Forwarding of skb arrived in a virtualization environment (from
> +	 *    virtio-net/vhost/tap) with TSO/GSO size set by other's network
> +	 *    stack
> +	 *  - Local GSO skb xmitted on an NETIF_F_TSO tunnel stacked over an
> +	 *    interface with a smaller mtu
> +	 *  - Arriving GRO skb (or GSO skb in a virtualized env) that gets L2
> +	 *    bridged to a NETIF_F_TSO tunnel stacked over an interface with an
> +	 *    insufficent mtu
>  	 */
>  	features = netif_skb_features(skb);
>  	BUILD_BUG_ON(sizeof(*IPCB(skb)) > SKB_SGO_CB_OFFSET);
> 

Thanks, that looks good to me. I can send a follow-up patch with this change,
if you like (there seems to be agreement that the original patch is OK).

   Lance
Lance Richardson Nov. 4, 2016, 1:49 p.m. UTC | #15
> From: "Shmulik Ladkani" <shmulik.ladkani@gmail.com>
> To: "Hannes Frederic Sowa" <hannes@stressinduktion.org>
> Cc: "Lance Richardson" <lrichard@redhat.com>, fw@strlen.de, netdev@vger.kernel.org, jtluka@redhat.com
> Sent: Friday, November 4, 2016 5:40:14 AM
> Subject: Re: [PATCH net v3] ipv4: allow local fragmentation in ip_finish_output_gso()
> 
> On Thu, 3 Nov 2016 22:34:34 +0100 Hannes Frederic Sowa
> <hannes@stressinduktion.org> wrote:
> > Correct, but we should maybe redefine the code a bit. From my
> > understanding we can now create an ICMP storm in case every fragment gets.
> 
> Yes, you are right.
> 
> Each segment gets into ip_fragment, and due to outer DF being set,
> ICMP_FRAG_NEEDED is sent per segment.
> 
> BTW, suppose GRO is off, and sender actually did send a burst of
> (non-gso) packets with outer DF set, and each was tunnel encapsulated,
> resulting in oversized frames.
> 
> Would'nt the stack just send the ICMP_FRAG_NEEDED per encapsulated
> frame?
> 
> If so, then the GRO behaviour is aligned, and there's nothing to fix.
> 

Agree.

> Best,
> Shmulik
>
diff mbox

Patch

diff --git a/include/net/ip.h b/include/net/ip.h
index 5413883..d3a1078 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -47,8 +47,7 @@  struct inet_skb_parm {
 #define IPSKB_REROUTED		BIT(4)
 #define IPSKB_DOREDIRECT	BIT(5)
 #define IPSKB_FRAG_PMTU		BIT(6)
-#define IPSKB_FRAG_SEGS		BIT(7)
-#define IPSKB_L3SLAVE		BIT(8)
+#define IPSKB_L3SLAVE		BIT(7)
 
 	u16			frag_max_size;
 };
diff --git a/net/ipv4/ip_forward.c b/net/ipv4/ip_forward.c
index 8b4ffd2..9f0a7b9 100644
--- a/net/ipv4/ip_forward.c
+++ b/net/ipv4/ip_forward.c
@@ -117,7 +117,7 @@  int ip_forward(struct sk_buff *skb)
 	if (opt->is_strictroute && rt->rt_uses_gateway)
 		goto sr_failed;
 
-	IPCB(skb)->flags |= IPSKB_FORWARDED | IPSKB_FRAG_SEGS;
+	IPCB(skb)->flags |= IPSKB_FORWARDED;
 	mtu = ip_dst_mtu_maybe_forward(&rt->dst, true);
 	if (ip_exceeds_mtu(skb, mtu)) {
 		IP_INC_STATS(net, IPSTATS_MIB_FRAGFAILS);
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 03e7f73..4971401 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -239,11 +239,9 @@  static int ip_finish_output_gso(struct net *net, struct sock *sk,
 	struct sk_buff *segs;
 	int ret = 0;
 
-	/* common case: fragmentation of segments is not allowed,
-	 * or seglen is <= mtu
+	/* common case: seglen is <= mtu
 	 */
-	if (((IPCB(skb)->flags & IPSKB_FRAG_SEGS) == 0) ||
-	      skb_gso_validate_mtu(skb, mtu))
+	if (skb_gso_validate_mtu(skb, mtu))
 		return ip_finish_output2(net, sk, skb);
 
 	/* Slowpath -  GSO segment length is exceeding the dst MTU.
diff --git a/net/ipv4/ip_tunnel_core.c b/net/ipv4/ip_tunnel_core.c
index 777bc18..fed3d29 100644
--- a/net/ipv4/ip_tunnel_core.c
+++ b/net/ipv4/ip_tunnel_core.c
@@ -63,7 +63,6 @@  void iptunnel_xmit(struct sock *sk, struct rtable *rt, struct sk_buff *skb,
 	int pkt_len = skb->len - skb_inner_network_offset(skb);
 	struct net *net = dev_net(rt->dst.dev);
 	struct net_device *dev = skb->dev;
-	int skb_iif = skb->skb_iif;
 	struct iphdr *iph;
 	int err;
 
@@ -73,16 +72,6 @@  void iptunnel_xmit(struct sock *sk, struct rtable *rt, struct sk_buff *skb,
 	skb_dst_set(skb, &rt->dst);
 	memset(IPCB(skb), 0, sizeof(*IPCB(skb)));
 
-	if (skb_iif && !(df & htons(IP_DF))) {
-		/* Arrived from an ingress interface, got encapsulated, with
-		 * fragmentation of encapulating frames allowed.
-		 * If skb is gso, the resulting encapsulated network segments
-		 * may exceed dst mtu.
-		 * Allow IP Fragmentation of segments.
-		 */
-		IPCB(skb)->flags |= IPSKB_FRAG_SEGS;
-	}
-
 	/* Push down and install the IP header. */
 	skb_push(skb, sizeof(struct iphdr));
 	skb_reset_network_header(skb);
diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index 5f006e1..27089f5 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -1749,7 +1749,7 @@  static void ipmr_queue_xmit(struct net *net, struct mr_table *mrt,
 		vif->dev->stats.tx_bytes += skb->len;
 	}
 
-	IPCB(skb)->flags |= IPSKB_FORWARDED | IPSKB_FRAG_SEGS;
+	IPCB(skb)->flags |= IPSKB_FORWARDED;
 
 	/* RFC1584 teaches, that DVMRP/PIM router must deliver packets locally
 	 * not only before forwarding, but after forwarding on all output