From patchwork Fri Feb 19 18:44:38 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Joe Stringer
X-Patchwork-Id: 585377
From: Joe Stringer
To: jesse@kernel.org, dev@openvswitch.org
Cc: David Wragg
Date: Fri, 19 Feb 2016 10:44:38 -0800
Message-Id: <1455907481-24507-4-git-send-email-joe@ovn.org>
In-Reply-To: <1455907481-24507-1-git-send-email-joe@ovn.org>
References: <1455907481-24507-1-git-send-email-joe@ovn.org>
X-Mailer: git-send-email 2.1.4
Subject: [ovs-dev] [PATCHv2 3/6] datapath: Set a large MTU on tunnel devices.

From: David Wragg

Upstream commit:
    Prior to 4.3, openvswitch tunnel vports (vxlan, gre and geneve) could
    transmit vxlan packets of any size, constrained only by the ability to
    send out the resulting packets.  4.3 introduced netdevs corresponding
    to tunnel vports.
    These netdevs have an MTU, which limits the size of a packet that can
    be successfully encapsulated.  The default MTU values are low (1500 or
    less), which is awkwardly small in the context of physical networks
    supporting jumbo frames, and leads to a conspicuous change in
    behaviour for userspace.

    Instead, set the MTU on openvswitch-created netdevs to be the relevant
    maximum (i.e. the maximum IP packet size minus any relevant overhead),
    effectively restoring the behaviour prior to 4.3.

    Signed-off-by: David Wragg
    Signed-off-by: David S. Miller

Upstream: 7e059158d57b ("vxlan, gre, geneve: Set a large MTU on ovs-created tunnel devices")
Signed-off-by: Joe Stringer
---
 acinclude.m4                                   |  1 +
 datapath/linux/compat/geneve.c                 | 18 ++++++++++++++----
 datapath/linux/compat/include/net/ip_tunnels.h |  6 ++++++
 datapath/linux/compat/ip_gre.c                 |  8 ++++++++
 datapath/linux/compat/ip_tunnel.c              | 19 ++++++++++++++++---
 datapath/linux/compat/vxlan.c                  | 11 ++++++++---
 datapath/vport-vxlan.c                         |  2 ++
 7 files changed, 55 insertions(+), 10 deletions(-)

diff --git a/acinclude.m4 b/acinclude.m4
index dc06be6323ee..11c77877d46c 100644
--- a/acinclude.m4
+++ b/acinclude.m4
@@ -354,6 +354,7 @@ AC_DEFUN([OVS_CHECK_LINUX_COMPAT], [
 
   OVS_GREP_IFELSE([$KSRC/include/net/ip.h], [IPSKB_FRAG_PMTU],
                   [OVS_DEFINE([HAVE_CORRECT_MRU_HANDLING])])
+  OVS_GREP_IFELSE([$KSRC/include/net/ip_tunnels.h], [__ip_tunnel_change_mtu])
   OVS_GREP_IFELSE([$KSRC/include/net/inet_frag.h], [hashfn.*const],
                   [OVS_DEFINE([HAVE_INET_FRAGS_CONST])])
   OVS_GREP_IFELSE([$KSRC/include/net/inet_frag.h], [last_in],
diff --git a/datapath/linux/compat/geneve.c b/datapath/linux/compat/geneve.c
index 29a65a0a49d2..50ed3936d038 100644
--- a/datapath/linux/compat/geneve.c
+++ b/datapath/linux/compat/geneve.c
@@ -1039,11 +1039,21 @@ struct net_device *rpl_geneve_dev_create_fb(struct net *net, const char *name,
 		return dev;
 
 	err = geneve_configure(net, dev, 0, 0, 0, 0, htons(dst_port), true);
-	if (err) {
-		free_netdev(dev);
-		return ERR_PTR(err);
-	}
+	if (err)
+		goto err;
+
+	/* openvswitch users expect packet sizes to be unrestricted,
+	 * so set the largest MTU we can.
+	 */
+	err = geneve_change_mtu(dev, IP_MAX_MTU);
+	if (err)
+		goto err;
+
 	return dev;
+
+err:
+	free_netdev(dev);
+	return ERR_PTR(err);
 }
 EXPORT_SYMBOL_GPL(rpl_geneve_dev_create_fb);
 
diff --git a/datapath/linux/compat/include/net/ip_tunnels.h b/datapath/linux/compat/include/net/ip_tunnels.h
index 3e1ceef21930..5eda8a2e3b0f 100644
--- a/datapath/linux/compat/include/net/ip_tunnels.h
+++ b/datapath/linux/compat/include/net/ip_tunnels.h
@@ -327,4 +327,10 @@ int rpl_ip_tunnel_get_iflink(const struct net_device *dev);
 #define ip_tunnel_get_link_net rpl_ip_tunnel_get_link_net
 struct net *rpl_ip_tunnel_get_link_net(const struct net_device *dev);
 #endif /* HAVE_METADATA_DST */
+
+#ifndef HAVE___IP_TUNNEL_CHANGE_MTU
+#define __ip_tunnel_change_mtu rpl___ip_tunnel_change_mtu
+int rpl___ip_tunnel_change_mtu(struct net_device *dev, int new_mtu, bool strict);
+#endif
+
 #endif /* __NET_IP_TUNNELS_H */
diff --git a/datapath/linux/compat/ip_gre.c b/datapath/linux/compat/ip_gre.c
index c9197e9652fd..f6a841fbc4d1 100644
--- a/datapath/linux/compat/ip_gre.c
+++ b/datapath/linux/compat/ip_gre.c
@@ -613,6 +613,14 @@ struct net_device *rpl_gretap_fb_dev_create(struct net *net, const char *name,
 #endif
 	if (err < 0)
 		goto out;
+
+	/* openvswitch users expect packet sizes to be unrestricted,
+	 * so set the largest MTU we can.
+	 */
+	err = __ip_tunnel_change_mtu(dev, IP_MAX_MTU, false);
+	if (err)
+		goto out;
+
 	return dev;
 out:
 	free_netdev(dev);
diff --git a/datapath/linux/compat/ip_tunnel.c b/datapath/linux/compat/ip_tunnel.c
index 2d4070eccb01..81909370271c 100644
--- a/datapath/linux/compat/ip_tunnel.c
+++ b/datapath/linux/compat/ip_tunnel.c
@@ -137,18 +137,31 @@ static int ip_tunnel_bind_dev(struct net_device *dev)
 	return mtu;
 }
 
-int rpl_ip_tunnel_change_mtu(struct net_device *dev, int new_mtu)
+int rpl___ip_tunnel_change_mtu(struct net_device *dev, int new_mtu, bool strict)
 {
 	struct ip_tunnel *tunnel = netdev_priv(dev);
 	int t_hlen = tunnel->hlen + sizeof(struct iphdr);
+	int max_mtu = 0xFFF8 - dev->hard_header_len - t_hlen;
 
-	if (new_mtu < 68 ||
-	    new_mtu > 0xFFF8 - dev->hard_header_len - t_hlen)
+	if (new_mtu < 68)
 		return -EINVAL;
+
+	if (new_mtu > max_mtu) {
+		if (strict)
+			return -EINVAL;
+
+		new_mtu = max_mtu;
+	}
+
 	dev->mtu = new_mtu;
 	return 0;
 }
 
+int rpl_ip_tunnel_change_mtu(struct net_device *dev, int new_mtu)
+{
+	return rpl___ip_tunnel_change_mtu(dev, new_mtu, true);
+}
+
 static void ip_tunnel_dev_free(struct net_device *dev)
 {
 #ifdef HAVE_DEV_TSTATS
diff --git a/datapath/linux/compat/vxlan.c b/datapath/linux/compat/vxlan.c
index f443a1b0352a..769b76f53ac8 100644
--- a/datapath/linux/compat/vxlan.c
+++ b/datapath/linux/compat/vxlan.c
@@ -1905,6 +1905,7 @@ static int vxlan_dev_configure(struct net *src_net, struct net_device *dev,
 	int err;
 	bool use_ipv6 = false;
 	__be16 default_port = vxlan->cfg.dst_port;
+	struct net_device *lowerdev = NULL;
 
 	vxlan->net = src_net;
 
@@ -1924,9 +1925,7 @@ static int vxlan_dev_configure(struct net *src_net, struct net_device *dev,
 	}
 
 	if (conf->remote_ifindex) {
-		struct net_device *lowerdev
-			 = __dev_get_by_index(src_net, conf->remote_ifindex);
-
+		lowerdev = __dev_get_by_index(src_net, conf->remote_ifindex);
 		dst->remote_ifindex = conf->remote_ifindex;
 
 		if (!lowerdev) {
@@ -1957,6 +1956,12 @@ static int vxlan_dev_configure(struct net *src_net, struct net_device *dev,
 		dev->needed_headroom = ETH_HLEN + VXLAN_HEADROOM;
 	}
 
+	if (conf->mtu) {
+		err = __vxlan_change_mtu(dev, lowerdev, dst, conf->mtu, false);
+		if (err)
+			return err;
+	}
+
 	memcpy(&vxlan->cfg, conf, sizeof(*conf));
 	if (!vxlan->cfg.dst_port)
 		vxlan->cfg.dst_port = default_port;
diff --git a/datapath/vport-vxlan.c b/datapath/vport-vxlan.c
index 66b79f4dba6e..c05f5d447e08 100644
--- a/datapath/vport-vxlan.c
+++ b/datapath/vport-vxlan.c
@@ -91,6 +91,8 @@ static struct vport *vxlan_tnl_create(const struct vport_parms *parms)
 	struct vxlan_config conf = {
 		.no_share = true,
 		.flags = VXLAN_F_COLLECT_METADATA,
+		/* Don't restrict the packets that can be sent by MTU */
+		.mtu = IP_MAX_MTU,
 	};
 
 	if (!options) {
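
Illustration only, not part of the patch: the strict/non-strict split added to
rpl___ip_tunnel_change_mtu() can be modelled in plain userspace C. This is a
minimal sketch under stated assumptions: struct model_dev, model_change_mtu()
and the MODEL_* constants are hypothetical stand-ins invented for this example
(they are not kernel or OVS symbols), and the single "overhead" field collapses
dev->hard_header_len + t_hlen into one number.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; not kernel or OVS symbols. */
#define MODEL_MIN_MTU   68      /* lower bound checked in the patch */
#define MODEL_IP_LIMIT  0xFFF8  /* upper bound used in the patch */

struct model_dev {
	int mtu;
	int overhead;	/* models dev->hard_header_len + t_hlen */
};

/*
 * Mirrors the control flow of the patched function: a too-small MTU is
 * always rejected; a too-large MTU is rejected when strict is true and
 * clamped to the maximum when strict is false.
 */
static int model_change_mtu(struct model_dev *dev, int new_mtu, bool strict)
{
	int max_mtu = MODEL_IP_LIMIT - dev->overhead;

	if (new_mtu < MODEL_MIN_MTU)
		return -EINVAL;

	if (new_mtu > max_mtu) {
		if (strict)
			return -EINVAL;	/* strict caller: reject */

		new_mtu = max_mtu;	/* non-strict caller: clamp */
	}

	dev->mtu = new_mtu;
	return 0;
}
```

With an arbitrary overhead of 38 bytes, a non-strict request for 65535 succeeds
and leaves the model MTU at 0xFFF8 - 38, while the same request with strict set
returns -EINVAL and leaves the MTU unchanged. That matches the intent of the
patch: the device-creation paths pass strict=false so that asking for
IP_MAX_MTU yields the largest usable value, while rpl_ip_tunnel_change_mtu()
keeps the old rejecting behaviour by passing strict=true.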