From patchwork Sun Jul 12 20:07:03 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Florian Westphal
X-Patchwork-Id: 1327599
X-Patchwork-Delegate: davem@davemloft.net
Return-Path:
X-Original-To: patchwork-incoming-netdev@ozlabs.org
Delivered-To: patchwork-incoming-netdev@ozlabs.org
Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org (client-ip=23.128.96.18; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=)
Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=strlen.de
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by ozlabs.org (Postfix) with ESMTP id 4B4d9r2xCdz9sRN for ; Mon, 13 Jul 2020 06:07:44 +1000 (AEST)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729401AbgGLUH1 (ORCPT ); Sun, 12 Jul 2020 16:07:27 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56460 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729012AbgGLUH0 (ORCPT ); Sun, 12 Jul 2020 16:07:26 -0400
Received: from Chamillionaire.breakpoint.cc (Chamillionaire.breakpoint.cc [IPv6:2a0a:51c0:0:12e:520::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 68B96C061794 for ; Sun, 12 Jul 2020 13:07:26 -0700 (PDT)
Received: from fw by Chamillionaire.breakpoint.cc with local (Exim 4.92) (envelope-from ) id 1juiFx-0002dC-2J; Sun, 12 Jul 2020 22:07:25 +0200
From: Florian Westphal
To:
Cc: aconole@redhat.com, sbrivio@redhat.com, Florian Westphal
Subject: [PATCH net-next 1/3] udp_tunnel: allow to turn off path mtu discovery on encap sockets
Date: Sun, 12 Jul 2020 22:07:03 +0200
Message-Id: <20200712200705.9796-2-fw@strlen.de>
X-Mailer: git-send-email 2.26.2
In-Reply-To: <20200712200705.9796-1-fw@strlen.de>
References: <20200712200705.9796-1-fw@strlen.de>
MIME-Version: 1.0
Sender: netdev-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: netdev@vger.kernel.org

vxlan and geneve take the to-be-transmitted skb, prepend the
encapsulation header and send the result.

Neither vxlan nor geneve can do anything about a lowered path mtu
except notifying the peer/upper dst entry.
In routed setups, vxlan takes the updated pmtu from the encap socket's
dst entry and will notify/update the dst entry of the current skb.

Some setups, however, will use vxlan as a bridge port (or openvswitch
vport). In both cases, no upper dst entry exists.

Without this patch:

1. Client sends x bytes, where x == MTU of vxlan/geneve interface.
2. The encap header is prepended and the encap packet is passed to
   ip_output.
3. If the sk received a pmtu error in the meantime, then ip_output
   will fetch the mtu from the encap socket instead of dev->mtu.
4. ip_output emits an ICMP error to the encap socket.

Step #4 prevents the route exception from timing out, and the setup
remains in a state where the upper layer cannot send MTU-sized packets,
even though the encapsulated packet doesn't exceed the link MTU.

It appears best to configure the encap socket to never learn about
path MTU in these setups.

The next patch will add the VXLAN config plane to use this.
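For readers less familiar with the knob involved: the in-kernel helper
ip_sock_set_mtu_discover() used by this series sets the same per-socket
pmtudisc state that userspace reaches through the IP_MTU_DISCOVER socket
option. The following is only a userspace sketch of that option on a plain
UDP socket, not part of the patch; the helper names set_pmtudisc() and
get_pmtudisc() are made up for illustration, and a Linux libc that exposes
IP_MTU_DISCOVER via <netinet/in.h> is assumed.

```c
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper (not from the patch): set the path MTU discovery
 * mode of an IPv4 socket, e.g. IP_PMTUDISC_DONT or IP_PMTUDISC_OMIT.
 * Returns 0 on success, -1 on error with errno set by setsockopt(). */
static int set_pmtudisc(int fd, int mode)
{
	return setsockopt(fd, IPPROTO_IP, IP_MTU_DISCOVER,
			  &mode, sizeof(mode));
}

/* Hypothetical helper (not from the patch): read the current mode back.
 * Returns the mode, or -1 if getsockopt() fails. */
static int get_pmtudisc(int fd)
{
	int mode = -1;
	socklen_t len = sizeof(mode);

	if (getsockopt(fd, IPPROTO_IP, IP_MTU_DISCOVER, &mode, &len) < 0)
		return -1;
	return mode;
}
```

A caller would open a socket with socket(AF_INET, SOCK_DGRAM, 0), pass the
descriptor to set_pmtudisc() with the desired mode, and close() it when done.
The series does the equivalent from inside the kernel when the tunnel's UDP
socket is created.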

Signed-off-by: Florian Westphal
---
 include/net/ipv6.h         | 7 +++++++
 include/net/udp_tunnel.h   | 2 ++
 net/ipv4/udp_tunnel_core.c | 2 ++
 net/ipv6/ip6_udp_tunnel.c  | 7 +++++++
 4 files changed, 18 insertions(+)

diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index 5e65bf2fd32d..fa8e546546e3 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -1195,6 +1195,13 @@ static inline void ip6_sock_set_recverr(struct sock *sk)
 	release_sock(sk);
 }
 
+static inline void ip6_sock_set_mtu_discover(struct sock *sk, int val)
+{
+	lock_sock(sk);
+	inet6_sk(sk)->pmtudisc = val;
+	release_sock(sk);
+}
+
 static inline int __ip6_sock_set_addr_preferences(struct sock *sk, int val)
 {
 	unsigned int pref = 0;
diff --git a/include/net/udp_tunnel.h b/include/net/udp_tunnel.h
index dd20ce99740c..f02be73bdae1 100644
--- a/include/net/udp_tunnel.h
+++ b/include/net/udp_tunnel.h
@@ -34,6 +34,8 @@ struct udp_port_cfg {
 	unsigned int		use_udp_checksums:1,
 				use_udp6_tx_checksums:1,
 				use_udp6_rx_checksums:1,
+				ip_pmtudisc:1,
+				ip_pmtudiscv:3,
 				ipv6_v6only:1;
 };
 
diff --git a/net/ipv4/udp_tunnel_core.c b/net/ipv4/udp_tunnel_core.c
index 3eecba0874aa..1d20bd5b72ac 100644
--- a/net/ipv4/udp_tunnel_core.c
+++ b/net/ipv4/udp_tunnel_core.c
@@ -26,6 +26,8 @@ int udp_sock_create4(struct net *net, struct udp_port_cfg *cfg,
 		if (err < 0)
 			goto error;
 	}
+	if (cfg->ip_pmtudisc)
+		ip_sock_set_mtu_discover(sock->sk, cfg->ip_pmtudiscv);
 
 	udp_addr.sin_family = AF_INET;
 	udp_addr.sin_addr = cfg->local_ip;
diff --git a/net/ipv6/ip6_udp_tunnel.c b/net/ipv6/ip6_udp_tunnel.c
index cdc4d4ee2420..63c22252a76f 100644
--- a/net/ipv6/ip6_udp_tunnel.c
+++ b/net/ipv6/ip6_udp_tunnel.c
@@ -34,6 +34,13 @@ int udp_sock_create6(struct net *net, struct udp_port_cfg *cfg,
 		if (err < 0)
 			goto error;
 	}
+	if (cfg->ip_pmtudisc) {
+		BUILD_BUG_ON(IP_PMTUDISC_DONT != IPV6_PMTUDISC_DONT);
+		BUILD_BUG_ON(IP_PMTUDISC_OMIT != IPV6_PMTUDISC_OMIT);
+
+		ip_sock_set_mtu_discover(sock->sk, cfg->ip_pmtudiscv);
+		ip6_sock_set_mtu_discover(sock->sk, cfg->ip_pmtudiscv);
+	}
 
 	udp6_addr.sin6_family = AF_INET6;
 	memcpy(&udp6_addr.sin6_addr, &cfg->local_ip6,

From patchwork Sun Jul 12 20:07:04 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Florian Westphal
X-Patchwork-Id: 1327600
X-Patchwork-Delegate: davem@davemloft.net
Return-Path:
X-Original-To: patchwork-incoming-netdev@ozlabs.org
Delivered-To: patchwork-incoming-netdev@ozlabs.org
Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org (client-ip=23.128.96.18; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=)
Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=strlen.de
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by ozlabs.org (Postfix) with ESMTP id 4B4d9r5ft3z9sQt for ; Mon, 13 Jul 2020 06:07:44 +1000 (AEST)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729409AbgGLUHc (ORCPT ); Sun, 12 Jul 2020 16:07:32 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56476 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729012AbgGLUHb (ORCPT ); Sun, 12 Jul 2020 16:07:31 -0400
Received: from Chamillionaire.breakpoint.cc (Chamillionaire.breakpoint.cc [IPv6:2a0a:51c0:0:12e:520::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DFE10C061794 for ; Sun, 12 Jul 2020 13:07:30 -0700 (PDT)
Received: from fw by Chamillionaire.breakpoint.cc with local (Exim 4.92) (envelope-from ) id 1juiG1-0002dR-K0; Sun, 12 Jul 2020 22:07:29 +0200
From: Florian Westphal
To:
Cc: aconole@redhat.com, sbrivio@redhat.com, Florian Westphal
Subject: [PATCH net-next 2/3] vxlan: allow to disable path mtu learning on encap socket
Date: Sun, 12 Jul 2020 22:07:04 +0200
Message-Id: <20200712200705.9796-3-fw@strlen.de>
X-Mailer: git-send-email 2.26.2
In-Reply-To: <20200712200705.9796-1-fw@strlen.de>
References:
 <20200712200705.9796-1-fw@strlen.de>
MIME-Version: 1.0
Sender: netdev-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: netdev@vger.kernel.org

While it's already possible to configure VXLAN to never set the DF bit
on packets that it sends, it was not yet possible to tell the kernel to
not update the encapsulation socket's path MTU.

This can be used to tell the IP stack to always use the interface MTU
when VXLAN wants to send a packet.

When packets are routed, VXLAN uses the skb's existing dst entry to
propagate the MTU update to the overlay, but on a bridge this doesn't
work (no routing, no dst entry, and no IP forwarding takes place, so
nothing emits an ICMP packet with the MTU update to the sender).

This is only useful when VXLAN is used as a bridge port and the
network is known to accept packets up to the link MTU, to prevent bogus
PMTU ICMP packets from stopping tunneled traffic.

Signed-off-by: Florian Westphal
---
 drivers/net/vxlan.c          | 65 +++++++++++++++++++++++++++++++-----
 include/net/vxlan.h          |  2 ++
 include/uapi/linux/if_link.h |  1 +
 3 files changed, 59 insertions(+), 9 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index a43c97b13924..ceb2940a2a62 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -3316,6 +3316,7 @@ static const struct nla_policy vxlan_policy[IFLA_VXLAN_MAX + 1] = {
 	[IFLA_VXLAN_REMCSUM_NOPARTIAL]	= { .type = NLA_FLAG },
 	[IFLA_VXLAN_TTL_INHERIT]	= { .type = NLA_FLAG },
 	[IFLA_VXLAN_DF]		= { .type = NLA_U8 },
+	[IFLA_VXLAN_PMTUDISC]	= { .type = NLA_U8 },
 };
 
 static int vxlan_validate(struct nlattr *tb[], struct nlattr *data[],
@@ -3418,7 +3419,8 @@ static const struct ethtool_ops vxlan_ethtool_ops = {
 };
 
 static struct socket *vxlan_create_sock(struct net *net, bool ipv6,
-					__be16 port, u32 flags, int ifindex)
+					const struct vxlan_config *cfg,
+					int ifindex)
 {
 	struct socket *sock;
 	struct udp_port_cfg udp_conf;
@@ -3429,13 +3431,18 @@ static struct socket *vxlan_create_sock(struct net *net, bool ipv6,
 	if (ipv6) {
 		udp_conf.family = AF_INET6;
 		udp_conf.use_udp6_rx_checksums =
-		    !(flags & VXLAN_F_UDP_ZERO_CSUM6_RX);
+		    !(cfg->flags & VXLAN_F_UDP_ZERO_CSUM6_RX);
 		udp_conf.ipv6_v6only = 1;
 	} else {
 		udp_conf.family = AF_INET;
 	}
 
-	udp_conf.local_udp_port = port;
+	if (cfg->pmtudisc) {
+		udp_conf.ip_pmtudisc = 1;
+		udp_conf.ip_pmtudiscv = cfg->pmtudiscv;
+	}
+
+	udp_conf.local_udp_port = cfg->dst_port;
 	udp_conf.bind_ifindex = ifindex;
 
 	/* Open UDP socket */
@@ -3448,7 +3455,7 @@ static struct socket *vxlan_create_sock(struct net *net, bool ipv6,
 
 /* Create new listen socket if needed */
 static struct vxlan_sock *vxlan_socket_create(struct net *net, bool ipv6,
-					      __be16 port, u32 flags,
+					      const struct vxlan_config *cfg,
 					      int ifindex)
 {
 	struct vxlan_net *vn = net_generic(net, vxlan_net_id);
@@ -3464,7 +3471,7 @@ static struct vxlan_sock *vxlan_socket_create(struct net *net, bool ipv6,
 	for (h = 0; h < VNI_HASH_SIZE; ++h)
 		INIT_HLIST_HEAD(&vs->vni_list[h]);
 
-	sock = vxlan_create_sock(net, ipv6, port, flags, ifindex);
+	sock = vxlan_create_sock(net, ipv6, cfg, ifindex);
 	if (IS_ERR(sock)) {
 		kfree(vs);
 		return ERR_CAST(sock);
@@ -3472,10 +3479,10 @@ static struct vxlan_sock *vxlan_socket_create(struct net *net, bool ipv6,
 
 	vs->sock = sock;
 	refcount_set(&vs->refcnt, 1);
-	vs->flags = (flags & VXLAN_F_RCV_FLAGS);
+	vs->flags = (cfg->flags & VXLAN_F_RCV_FLAGS);
 
 	spin_lock(&vn->sock_lock);
-	hlist_add_head_rcu(&vs->hlist, vs_head(net, port));
+	hlist_add_head_rcu(&vs->hlist, vs_head(net, cfg->dst_port));
 	udp_tunnel_notify_add_rx_port(sock,
 				      (vs->flags & VXLAN_F_GPE) ?
 				      UDP_TUNNEL_TYPE_VXLAN_GPE :
@@ -3521,8 +3528,7 @@ static int __vxlan_sock_add(struct vxlan_dev *vxlan, bool ipv6)
 	}
 	if (!vs)
 		vs = vxlan_socket_create(vxlan->net, ipv6,
-					 vxlan->cfg.dst_port, vxlan->cfg.flags,
-					 l3mdev_index);
+					 &vxlan->cfg, l3mdev_index);
 	if (IS_ERR(vs))
 		return PTR_ERR(vs);
 #if IS_ENABLED(CONFIG_IPV6)
@@ -3984,6 +3990,21 @@ static int vxlan_nl2conf(struct nlattr *tb[], struct nlattr *data[],
 	if (data[IFLA_VXLAN_LINK])
 		conf->remote_ifindex = nla_get_u32(data[IFLA_VXLAN_LINK]);
 
+	if (data[IFLA_VXLAN_PMTUDISC]) {
+		int pmtuv = nla_get_u8(data[IFLA_VXLAN_PMTUDISC]);
+
+		if (pmtuv < IP_PMTUDISC_DONT) {
+			NL_SET_ERR_MSG_ATTR(extack, tb[IFLA_VXLAN_PMTUDISC], "PMTUDISC Value < 0");
+			return -EOPNOTSUPP;
+		}
+		if (pmtuv > IP_PMTUDISC_OMIT) {
+			NL_SET_ERR_MSG_ATTR(extack, tb[IFLA_VXLAN_PMTUDISC], "PMTUDISC Value > IP_PMTUDISC_OMIT");
+			return -EOPNOTSUPP;
+		}
+
+		conf->pmtudisc = 1;
+		conf->pmtudiscv = pmtuv;
+	}
 	if (data[IFLA_VXLAN_TOS])
 		conf->tos = nla_get_u8(data[IFLA_VXLAN_TOS]);
 
@@ -4249,6 +4270,27 @@ static int vxlan_changelink(struct net_device *dev, struct nlattr *tb[],
 		netdev_adjacent_change_commit(dst->remote_dev, lowerdev, dev);
 	if (lowerdev && lowerdev != dst->remote_dev)
 		dst->remote_dev = lowerdev;
+
+	if (conf.pmtudisc && conf.pmtudiscv != vxlan->cfg.pmtudiscv) {
+		struct vxlan_sock *sock4 = rtnl_dereference(vxlan->vn4_sock);
+#if IS_ENABLED(CONFIG_IPV6)
+		struct vxlan_sock *sock6 = rtnl_dereference(vxlan->vn6_sock);
+#endif
+		struct socket *sock;
+
+		if (sock4) {
+			sock = sock4->sock;
+			ip_sock_set_mtu_discover(sock->sk, conf.pmtudiscv);
+		}
+#if IS_ENABLED(CONFIG_IPV6)
+		if (sock6) {
+			sock = sock6->sock;
+			ip6_sock_set_mtu_discover(sock->sk, conf.pmtudiscv);
+			ip_sock_set_mtu_discover(sock->sk, conf.pmtudiscv);
+		}
+#endif
+	}
+
 	vxlan_config_apply(dev, &conf, lowerdev, vxlan->net, true);
 	return 0;
 }
@@ -4276,6 +4318,7 @@ static size_t vxlan_get_size(const struct net_device *dev)
 		nla_total_size(sizeof(__u8)) +	/* IFLA_VXLAN_TTL_INHERIT */
 		nla_total_size(sizeof(__u8)) +	/* IFLA_VXLAN_TOS */
 		nla_total_size(sizeof(__u8)) +	/* IFLA_VXLAN_DF */
+		nla_total_size(sizeof(__u8)) +	/* IFLA_VXLAN_PMTUDISC */
 		nla_total_size(sizeof(__be32)) +	/* IFLA_VXLAN_LABEL */
 		nla_total_size(sizeof(__u8)) +	/* IFLA_VXLAN_LEARNING */
 		nla_total_size(sizeof(__u8)) +	/* IFLA_VXLAN_PROXY */
@@ -4374,6 +4417,10 @@ static int vxlan_fill_info(struct sk_buff *skb, const struct net_device *dev)
 	if (nla_put(skb, IFLA_VXLAN_PORT_RANGE, sizeof(ports), &ports))
 		goto nla_put_failure;
 
+	if (vxlan->cfg.pmtudisc &&
+	    nla_put_u8(skb, IFLA_VXLAN_PMTUDISC, vxlan->cfg.pmtudiscv))
+		goto nla_put_failure;
+
 	if (vxlan->cfg.flags & VXLAN_F_GBP &&
 	    nla_put_flag(skb, IFLA_VXLAN_GBP))
 		goto nla_put_failure;
diff --git a/include/net/vxlan.h b/include/net/vxlan.h
index 3a41627cbdfe..1414cfa2005f 100644
--- a/include/net/vxlan.h
+++ b/include/net/vxlan.h
@@ -220,6 +220,8 @@ struct vxlan_config {
 	unsigned long		age_interval;
 	unsigned int		addrmax;
 	bool			no_share;
+	u8			pmtudisc:1;
+	u8			pmtudiscv:3;
 	enum ifla_vxlan_df	df;
 };
 
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index cc185a007ade..f22cf508871c 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -548,6 +548,7 @@ enum {
 	IFLA_VXLAN_GPE,
 	IFLA_VXLAN_TTL_INHERIT,
 	IFLA_VXLAN_DF,
+	IFLA_VXLAN_PMTUDISC,
 	__IFLA_VXLAN_MAX
 };
 #define IFLA_VXLAN_MAX	(__IFLA_VXLAN_MAX - 1)

From patchwork Sun Jul 12 20:07:05 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Florian Westphal
X-Patchwork-Id: 1327601
X-Patchwork-Delegate: davem@davemloft.net
Return-Path:
X-Original-To: patchwork-incoming-netdev@ozlabs.org
Delivered-To: patchwork-incoming-netdev@ozlabs.org
Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org (client-ip=23.128.96.18; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=)
Authentication-Results: ozlabs.org; dmarc=none
 (p=none dis=none) header.from=strlen.de
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by ozlabs.org (Postfix) with ESMTP id 4B4d9s1P4Vz9sR4 for ; Mon, 13 Jul 2020 06:07:45 +1000 (AEST)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729421AbgGLUHh (ORCPT ); Sun, 12 Jul 2020 16:07:37 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56490 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729012AbgGLUHf (ORCPT ); Sun, 12 Jul 2020 16:07:35 -0400
Received: from Chamillionaire.breakpoint.cc (Chamillionaire.breakpoint.cc [IPv6:2a0a:51c0:0:12e:520::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8503EC061794 for ; Sun, 12 Jul 2020 13:07:35 -0700 (PDT)
Received: from fw by Chamillionaire.breakpoint.cc with local (Exim 4.92) (envelope-from ) id 1juiG6-0002dh-4g; Sun, 12 Jul 2020 22:07:34 +0200
From: Florian Westphal
To:
Cc: aconole@redhat.com, sbrivio@redhat.com, Florian Westphal
Subject: [PATCH net-next 3/3] geneve: allow disabling of pmtu detection on encap sk
Date: Sun, 12 Jul 2020 22:07:05 +0200
Message-Id: <20200712200705.9796-4-fw@strlen.de>
X-Mailer: git-send-email 2.26.2
In-Reply-To: <20200712200705.9796-1-fw@strlen.de>
References: <20200712200705.9796-1-fw@strlen.de>
MIME-Version: 1.0
Sender: netdev-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: netdev@vger.kernel.org

Same as the vxlan change, compile tested only.

Signed-off-by: Florian Westphal
---
 drivers/net/geneve.c         | 59 ++++++++++++++++++++++++++++++++----
 include/uapi/linux/if_link.h |  1 +
 2 files changed, 54 insertions(+), 6 deletions(-)

diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
index 49b00def2eef..19c1c74f6b5e 100644
--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -53,6 +53,8 @@ struct geneve_config {
 	bool			collect_md;
 	bool			use_udp6_rx_checksums;
 	bool			ttl_inherit;
+	u8			pmtudisc:1;
+	u8			pmtudiscv:3;
 	enum ifla_geneve_df	df;
 };
 
@@ -442,8 +444,10 @@ static int geneve_udp_encap_err_lookup(struct sock *sk, struct sk_buff *skb)
 }
 
 static struct socket *geneve_create_sock(struct net *net, bool ipv6,
-					 __be16 port, bool ipv6_rx_csum)
+					 const struct geneve_config *cfg)
 {
+	bool ipv6_rx_csum = cfg->use_udp6_rx_checksums;
+	__be16 port = cfg->info.key.tp_dst;
 	struct socket *sock;
 	struct udp_port_cfg udp_conf;
 	int err;
@@ -459,6 +463,11 @@ static struct socket *geneve_create_sock(struct net *net, bool ipv6,
 		udp_conf.local_ip.s_addr = htonl(INADDR_ANY);
 	}
 
+	if (cfg->pmtudisc) {
+		udp_conf.ip_pmtudisc = 1;
+		udp_conf.ip_pmtudiscv = cfg->pmtudiscv;
+	}
+
 	udp_conf.local_udp_port = port;
 
 	/* Open UDP socket */
@@ -564,8 +573,9 @@ static int geneve_gro_complete(struct sock *sk, struct sk_buff *skb,
 }
 
 /* Create new listen socket if needed */
-static struct geneve_sock *geneve_socket_create(struct net *net, __be16 port,
-						bool ipv6, bool ipv6_rx_csum)
+static struct geneve_sock *geneve_socket_create(struct net *net,
+						const struct geneve_config *cfg,
+						bool ipv6)
 {
 	struct geneve_net *gn = net_generic(net, geneve_net_id);
 	struct geneve_sock *gs;
@@ -577,7 +587,7 @@ static struct geneve_sock *geneve_socket_create(struct net *net, __be16 port,
 	if (!gs)
 		return ERR_PTR(-ENOMEM);
 
-	sock = geneve_create_sock(net, ipv6, port, ipv6_rx_csum);
+	sock = geneve_create_sock(net, ipv6, cfg);
 	if (IS_ERR(sock)) {
 		kfree(gs);
 		return ERR_CAST(sock);
@@ -664,8 +674,7 @@ static int geneve_sock_add(struct geneve_dev *geneve, bool ipv6)
 		goto out;
 	}
 
-	gs = geneve_socket_create(net, geneve->cfg.info.key.tp_dst, ipv6,
-				  geneve->cfg.use_udp6_rx_checksums);
+	gs = geneve_socket_create(net, &geneve->cfg, ipv6);
 	if (IS_ERR(gs))
 		return PTR_ERR(gs);
 
@@ -1173,6 +1182,7 @@ static const struct nla_policy geneve_policy[IFLA_GENEVE_MAX + 1] = {
 	[IFLA_GENEVE_UDP_ZERO_CSUM6_RX]	= { .type = NLA_U8 },
 	[IFLA_GENEVE_TTL_INHERIT]	= { .type = NLA_U8 },
 	[IFLA_GENEVE_DF]		= { .type = NLA_U8 },
+	[IFLA_GENEVE_PMTUDISC]		= { .type = NLA_U8 },
 };
 
 static int geneve_validate(struct nlattr *tb[], struct nlattr *data[],
@@ -1411,6 +1421,21 @@ static int geneve_nl2info(struct nlattr *tb[], struct nlattr *data[],
 		info->key.ttl = nla_get_u8(data[IFLA_GENEVE_TTL]);
 		cfg->ttl_inherit = false;
 	}
+	if (data[IFLA_GENEVE_PMTUDISC]) {
+		int pmtuv = nla_get_u8(data[IFLA_GENEVE_PMTUDISC]);
+
+		if (pmtuv < IP_PMTUDISC_DONT) {
+			NL_SET_ERR_MSG_ATTR(extack, tb[IFLA_GENEVE_PMTUDISC], "PMTUDISC Value < 0");
+			return -EOPNOTSUPP;
+		}
+		if (pmtuv > IP_PMTUDISC_OMIT) {
+			NL_SET_ERR_MSG_ATTR(extack, tb[IFLA_GENEVE_PMTUDISC], "PMTUDISC Value > IP_PMTUDISC_OMIT");
+			return -EOPNOTSUPP;
+		}
+
+		cfg->pmtudisc = 1;
+		cfg->pmtudiscv = pmtuv;
+	}
 
 	if (data[IFLA_GENEVE_TOS])
 		info->key.tos = nla_get_u8(data[IFLA_GENEVE_TOS]);
@@ -1634,6 +1659,23 @@ static int geneve_changelink(struct net_device *dev, struct nlattr *tb[],
 	}
 
 	geneve_quiesce(geneve, &gs4, &gs6);
+
+	if (cfg.pmtudisc && cfg.pmtudiscv != geneve->cfg.pmtudiscv) {
+		struct socket *sock;
+
+		if (gs4) {
+			sock = gs4->sock;
+			ip_sock_set_mtu_discover(sock->sk, cfg.pmtudiscv);
+		}
+#if IS_ENABLED(CONFIG_IPV6)
+		if (gs6) {
+			sock = gs6->sock;
+			ip6_sock_set_mtu_discover(sock->sk, cfg.pmtudiscv);
+			ip_sock_set_mtu_discover(sock->sk, cfg.pmtudiscv);
+		}
+#endif
+	}
+
 	memcpy(&geneve->cfg, &cfg, sizeof(cfg));
 	geneve_unquiesce(geneve, gs4, gs6);
 
@@ -1655,6 +1697,7 @@ static size_t geneve_get_size(const struct net_device *dev)
 		nla_total_size(sizeof(__u8)) +	/* IFLA_GENEVE_TTL */
 		nla_total_size(sizeof(__u8)) +	/* IFLA_GENEVE_TOS */
 		nla_total_size(sizeof(__u8)) +	/* IFLA_GENEVE_DF */
+		nla_total_size(sizeof(__u8)) +	/* IFLA_GENEVE_PMTUDISC */
 		nla_total_size(sizeof(__be32)) +	/* IFLA_GENEVE_LABEL */
 		nla_total_size(sizeof(__be16)) +	/* IFLA_GENEVE_PORT */
 		nla_total_size(0) +	/* IFLA_GENEVE_COLLECT_METADATA */
@@ -1706,6 +1749,10 @@ static int geneve_fill_info(struct sk_buff *skb, const struct net_device *dev)
 	if (nla_put_u8(skb, IFLA_GENEVE_DF, geneve->cfg.df))
 		goto nla_put_failure;
 
+	if (geneve->cfg.pmtudisc &&
+	    nla_put_u8(skb, IFLA_GENEVE_PMTUDISC, geneve->cfg.pmtudiscv))
+		goto nla_put_failure;
+
 	if (nla_put_be16(skb, IFLA_GENEVE_PORT, info->key.tp_dst))
 		goto nla_put_failure;
 
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index f22cf508871c..2ca0059b7d1a 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -582,6 +582,7 @@ enum {
 	IFLA_GENEVE_LABEL,
 	IFLA_GENEVE_TTL_INHERIT,
 	IFLA_GENEVE_DF,
+	IFLA_GENEVE_PMTUDISC,
 	__IFLA_GENEVE_MAX
 };
 #define IFLA_GENEVE_MAX	(__IFLA_GENEVE_MAX - 1)