From patchwork Fri Jul 15 11:43:07 2016
X-Patchwork-Submitter: Shmulik Ladkani
X-Patchwork-Id: 648790
X-Patchwork-Delegate: davem@davemloft.net
From: Shmulik Ladkani
To: "David S. Miller", netdev@vger.kernel.org
Cc: shmulik.ladkani@ravellosystems.com, Eric Dumazet, Shmulik Ladkani, Hannes Frederic Sowa, Florian Westphal
Subject: [PATCH v2] net: ip_finish_output_gso: If skb_gso_network_seglen exceeds MTU, allow segmentation for local udp tunneled skbs
Date: Fri, 15 Jul 2016 14:43:07 +0300
Message-Id: <1468582987-18990-1-git-send-email-shmulik.ladkani@gmail.com>
X-Mailing-List: netdev@vger.kernel.org

Given:
- tap0 and vxlan0 are bridged
- vxlan0 stacked on eth0, eth0 having a small mtu (e.g. 1400)

Assume GSO skbs arriving from tap0 having a gso_size as determined by a user-provided virtio_net_hdr (e.g. 1460, corresponding to a VM mtu of 1500).

After encapsulation these skbs have a skb_gso_network_seglen that exceeds eth0's ip_skb_dst_mtu.

These skbs are accidentally passed to ip_finish_output2 AS IS. Alas, each final segment (segmented either by validate_xmit_skb or by hardware UFO) would be larger than eth0's mtu. As a result, those above-mtu segments get dropped on certain networks.

This behavior is not aligned with the NON-GSO case: assume a non-gso 1500-sized IP packet arrives from tap0.
After encapsulation, the vxlan datagram is fragmented normally at the ip_finish_output --> ip_fragment code path.

The expected behavior for the GSO case would be segmenting the "gso-oversized" skb first, then fragmenting each segment according to the dst mtu, and finally passing the resulting fragments to ip_finish_output2.

'ip_finish_output_gso' already supports this "slowpath" behavior, but it is only taken if IPSKB_FORWARDED is set (which is not set in the bridged case).

In order to support the bridged case, we mark skbs that arrived from an ingress interface and got udp-encapsulated as "allowed to be fragmented". This mark (as well as the original IPSKB_FORWARDED mark) gets tested in 'ip_finish_output_gso' in order to determine whether validating the network seglen is needed.

Note that the TUNNEL_DONT_FRAGMENT tun_flag is still honoured (both in the gso and non-gso cases), which serves users wishing to forbid fragmentation at the udp tunnel endpoint.

Cc: Hannes Frederic Sowa
Cc: Florian Westphal
Signed-off-by: Shmulik Ladkani
Acked-by: Hannes Frederic Sowa
---
v2: Instead of completely removing the IPSKB_FORWARDED condition of 'ip_finish_output_gso' (forcing an expensive 'skb_gso_validate_mtu' on all local traffic), augment the condition for the tunneled usecase, as suggested by Florian and Hannes.
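To illustrate the semantics of the augmented condition, here is a hypothetical stand-alone model of the fast-path test (not kernel code; the flag values are copied from include/net/ip.h, and 'seglen_ok' stands in for the result of skb_gso_validate_mtu()):

```c
#include <assert.h>
#include <stdbool.h>

/* Flag values as defined in include/net/ip.h */
#define IPSKB_FORWARDED  (1 << 0) /* BIT(0) */
#define IPSKB_FRAG_SEGS  (1 << 7) /* BIT(7), added by this patch */

/* Models the patched test in ip_finish_output_gso(): take the fast
 * path straight to ip_finish_output2() when fragmentation of segments
 * is not allowed, or when every segment already fits the dst MTU.
 */
static bool gso_fast_path(unsigned int ipcb_flags, bool seglen_ok)
{
	bool allow_frag = ipcb_flags & (IPSKB_FORWARDED | IPSKB_FRAG_SEGS);

	return !allow_frag || seglen_ok;
}
```

Locally created, non-tunneled skbs (neither flag set) keep taking the fast path without paying for the seglen validation; forwarded or udp-tunneled skbs only take it when their segments fit the mtu.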
 include/net/ip.h          |  1 +
 net/ipv4/ip_output.c      | 10 +++++++---
 net/ipv4/ip_tunnel_core.c |  9 +++++++++
 3 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/include/net/ip.h b/include/net/ip.h
index 08f36cd2b8..9742b92dc9 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -47,6 +47,7 @@ struct inet_skb_parm {
 #define IPSKB_REROUTED		BIT(4)
 #define IPSKB_DOREDIRECT	BIT(5)
 #define IPSKB_FRAG_PMTU		BIT(6)
+#define IPSKB_FRAG_SEGS		BIT(7)
 
 	u16			frag_max_size;
 };
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index e23f141c9b..18bb7639dd 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -221,11 +221,15 @@ static int ip_finish_output_gso(struct net *net, struct sock *sk,
 {
 	netdev_features_t features;
 	struct sk_buff *segs;
+	int allow_frag;
 	int ret = 0;
 
-	/* common case: locally created skb or seglen is <= mtu */
-	if (((IPCB(skb)->flags & IPSKB_FORWARDED) == 0) ||
-	    skb_gso_validate_mtu(skb, mtu))
+	allow_frag = IPCB(skb)->flags & (IPSKB_FORWARDED | IPSKB_FRAG_SEGS);
+
+	/* common case: locally created skb and fragmentation of segments is
+	 * not allowed, or seglen is <= mtu
+	 */
+	if (!allow_frag || skb_gso_validate_mtu(skb, mtu))
 		return ip_finish_output2(net, sk, skb);
 
 	/* Slowpath - GSO segment length is exceeding the dst MTU.
diff --git a/net/ipv4/ip_tunnel_core.c b/net/ipv4/ip_tunnel_core.c
index afd6b5968c..9d847c3025 100644
--- a/net/ipv4/ip_tunnel_core.c
+++ b/net/ipv4/ip_tunnel_core.c
@@ -63,6 +63,7 @@ void iptunnel_xmit(struct sock *sk, struct rtable *rt, struct sk_buff *skb,
 	int pkt_len = skb->len - skb_inner_network_offset(skb);
 	struct net *net = dev_net(rt->dst.dev);
 	struct net_device *dev = skb->dev;
+	int skb_iif = skb->skb_iif;
 	struct iphdr *iph;
 	int err;
 
@@ -72,6 +73,14 @@ void iptunnel_xmit(struct sock *sk, struct rtable *rt, struct sk_buff *skb,
 	skb_dst_set(skb, &rt->dst);
 	memset(IPCB(skb), 0, sizeof(*IPCB(skb)));
 
+	if (skb_iif && proto == IPPROTO_UDP) {
+		/* Arrived from an ingress interface and got udp encapsulated.
+		 * The encapsulated network segment length may exceed dst mtu.
+		 * Allow IP Fragmentation of segments.
+		 */
+		IPCB(skb)->flags |= IPSKB_FRAG_SEGS;
+	}
+
 	/* Push down and install the IP header. */
 	skb_push(skb, sizeof(struct iphdr));
 	skb_reset_network_header(skb);
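For reference, the marking rule added to iptunnel_xmit() can be modeled as a tiny stand-alone predicate (hypothetical helper, not kernel code; 17 and 47 are the IANA protocol numbers behind IPPROTO_UDP and IPPROTO_GRE):

```c
#include <assert.h>
#include <stdbool.h>

#define PROTO_UDP 17 /* IPPROTO_UDP */
#define PROTO_GRE 47 /* IPPROTO_GRE */

/* Models the condition added to iptunnel_xmit(): only skbs that
 * arrived on an ingress interface (skb_iif != 0) and are being
 * udp-encapsulated get the IPSKB_FRAG_SEGS mark; locally generated
 * skbs and non-udp tunnels are left untouched.
 */
static bool should_mark_frag_segs(int skb_iif, unsigned char proto)
{
	return skb_iif != 0 && proto == PROTO_UDP;
}
```

This keeps the mark (and hence the seglen validation in 'ip_finish_output_gso') off the purely local, non-udp-tunneled path.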