diff mbox

xfrm: don't segment UFO packets

Message ID 20160316160026.GB19258@midget.suse.cz
State Changes Requested, archived
Delegated to: David Miller
Headers show

Commit Message

Jiri Bohac March 16, 2016, 4 p.m. UTC
xfrm_output() will segment GSO packets, including UDP (UFO) packets.
this is wrong per RFC4303, section 3.3.4.  Fragmentation:

   If necessary, fragmentation is performed after ESP
   processing within an IPsec implementation.  Thus,
   transport mode ESP is applied only to whole IP
   datagrams (not to IP fragments).

Prevent xfrm_output() from segmenting UFO packets so that they will be
fragmented after the xfrm transforms.

Signed-off-by: Jiri Bohac <jbohac@suse.cz>

Comments

Herbert Xu March 17, 2016, 5:03 a.m. UTC | #1
On Wed, Mar 16, 2016 at 05:00:26PM +0100, Jiri Bohac wrote:
> xfrm_output() will segment GSO packets, including UDP (UFO) packets.
> this is wrong per RFC4303, section 3.3.4.  Fragmentation:
> 
>    If necessary, fragmentation is performed after ESP
>    processing within an IPsec implementation.  Thus,
>    transport mode ESP is applied only to whole IP
>    datagrams (not to IP fragments).
> 
> Prevent xfrm_output() from segmenting UFO packets so that they will be
> fragmented after the xfrm transforms.
> 
> Signed-off-by: Jiri Bohac <jbohac@suse.cz>

Fair enough.  But I wonder if this is enough.  Wouldn't UDP notice
that we're doing IPsec and prefragment the packet anyway? So I think
this check may also be needed in the UDP output path.

Thanks,
Jiri Bohac March 17, 2016, 9:41 a.m. UTC | #2
On Thu, Mar 17, 2016 at 01:03:59PM +0800, Herbert Xu wrote:
> On Wed, Mar 16, 2016 at 05:00:26PM +0100, Jiri Bohac wrote:
> > Prevent xfrm_output() from segmenting UFO packets so that they will be
> > fragmented after the xfrm transforms.
> 
> Fair enough.  But I wonder if this is enough.  Wouldn't UDP notice
> that we're doing IPsec and prefragment the packet anyway? So I think
> this check may also be needed in the UDP output path.

Fixes my broken case. Ftracing a sendmsg call that sends ~8k of
data over the UDP socket, I see a single 8k skb travel through the
stack all the way to xfrm_output(). MTU is 1500.
That's the whole poing of fragmentation offloading, to pass all
the data at once as far as we can, isn't it? What UDP code did
you think would "notice and prefragment"?

Thanks,
Steffen Klassert March 17, 2016, 10:24 a.m. UTC | #3
On Thu, Mar 17, 2016 at 10:41:15AM +0100, Jiri Bohac wrote:
> On Thu, Mar 17, 2016 at 01:03:59PM +0800, Herbert Xu wrote:
> > On Wed, Mar 16, 2016 at 05:00:26PM +0100, Jiri Bohac wrote:
> > > Prevent xfrm_output() from segmenting UFO packets so that they will be
> > > fragmented after the xfrm transforms.
> > 
> > Fair enough.  But I wonder if this is enough.  Wouldn't UDP notice
> > that we're doing IPsec and prefragment the packet anyway? So I think
> > this check may also be needed in the UDP output path.
> 
> Fixes my broken case. 

Is this IPv4 or IPv6? IPv4 should not create a GSO skb
if IPsec is done. It checks for rt->dst.header_len
in __ip_append_data() and does a fallback to the
standard case if rt->dst.header_len is non zero.

In IPv6 this check is missing, so this could be the
problem if this is IPv6.
Jiri Bohac March 17, 2016, 10:49 a.m. UTC | #4
On Thu, Mar 17, 2016 at 11:24:59AM +0100, Steffen Klassert wrote:
> > > On Wed, Mar 16, 2016 at 05:00:26PM +0100, Jiri Bohac wrote:
> > Fixes my broken case. 
> 
> Is this IPv4 or IPv6? IPv4 should not create a GSO skb
> if IPsec is done. It checks for rt->dst.header_len
> in __ip_append_data() and does a fallback to the
> standard case if rt->dst.header_len is non zero.

It's IPv6.

> In IPv6 this check is missing, so this could be the
> problem if this is IPv6.

Doesn't the check do exactly the opposite of what the RFC says?
The RFC wants ESP to be performed first and fragmentation after
that. UDPv4 currently seems to be doing the opposite. Well at
least it works, unlike in the IPv6 case, where the packet is
fragmented, but not enough space is reserved, so after adding the
ESP headers, it is fragmented once more.

(Details can be found in my first e-mail in this thread, I now
replied into the old thread after >1 month, sorry for that:
http://thread.gmane.org/gmane.linux.network/396952
)
Steffen Klassert March 17, 2016, 11:01 a.m. UTC | #5
On Thu, Mar 17, 2016 at 11:49:53AM +0100, Jiri Bohac wrote:
> On Thu, Mar 17, 2016 at 11:24:59AM +0100, Steffen Klassert wrote:
> > > > On Wed, Mar 16, 2016 at 05:00:26PM +0100, Jiri Bohac wrote:
> > > Fixes my broken case. 
> > 
> > Is this IPv4 or IPv6? IPv4 should not create a GSO skb
> > if IPsec is done. It checks for rt->dst.header_len
> > in __ip_append_data() and does a fallback to the
> > standard case if rt->dst.header_len is non zero.
> 
> It's IPv6.
> 
> > In IPv6 this check is missing, so this could be the
> > problem if this is IPv6.
> 
> Doesn't the check do exactly the opposite of what the RFC says?
> The RFC wants ESP to be performed first and fragmentation after
> that. UDPv4 currently seems to be doing the opposite. 

No, __ip_append_data() only prepares the packets for fragmentation
and enqueues them. Then __ip_make_skb() dequeues and builds
one skb with a fraglist. Then the xfrm layer is called, so
esp linearizes (unfortunately) the skb and applies the
transformation. Fragmentation happens after that.
diff mbox

Patch

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 4355129..6f3e814 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -3501,6 +3501,12 @@  static inline bool skb_is_gso_v6(const struct sk_buff *skb)
 	return skb_shinfo(skb)->gso_type & SKB_GSO_TCPV6;
 }
 
+/* Note: Should be called only if skb_is_gso(skb) is true */
+static inline bool skb_is_ufo(const struct sk_buff *skb)
+{
+	return skb_shinfo(skb)->gso_type & SKB_GSO_UDP;
+}
+
 void __skb_warn_lro_forwarding(const struct sk_buff *skb);
 
 static inline bool skb_warn_if_lro(const struct sk_buff *skb)
diff --git a/net/xfrm/xfrm_output.c b/net/xfrm/xfrm_output.c
index cc3676e..c52cc8b 100644
--- a/net/xfrm/xfrm_output.c
+++ b/net/xfrm/xfrm_output.c
@@ -197,8 +197,12 @@  int xfrm_output(struct sock *sk, struct sk_buff *skb)
 	struct net *net = dev_net(skb_dst(skb)->dev);
 	int err;
 
-	if (skb_is_gso(skb))
-		return xfrm_output_gso(net, sk, skb);
+	if (skb_is_gso(skb)) {
+		if (skb_is_ufo(skb))
+			return xfrm_output2(net, sk, skb);
+		else
+			return xfrm_output_gso(net, sk, skb);
+	}
 
 	if (skb->ip_summed == CHECKSUM_PARTIAL) {
 		err = skb_checksum_help(skb);