From patchwork Tue Sep 20 14:01:05 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Thadeu Lima de Souza Cascardo
X-Patchwork-Id: 672284
X-Patchwork-Delegate: pshelar@ovn.org
From: Thadeu Lima de Souza Cascardo
To: dev@openvswitch.org
Cc: e@erig.me
Date: Tue, 20 Sep 2016 11:01:05 -0300
Message-Id: <1474380065-2957-1-git-send-email-cascardo@redhat.com>
Subject: [ovs-dev] [RFC PATCH] datapath: allow tunnels to be created with rtnetlink

In order to use rtnetlink to create new tunnel vports, the backported
tunnels require some code that was removed from their upstream version,
mainly the necessary code for newlink and for start_xmit.

This patch adds back the necessary code for VXLAN, GRE and Geneve
tunnels.

Signed-off-by: Eric Garver
Signed-off-by: Thadeu Lima de Souza Cascardo
---
 datapath/linux/Modules.mk                       |   1 +
 datapath/linux/compat/geneve.c                  |  15 +--
 datapath/linux/compat/include/linux/if_tunnel.h |  71 ++++++++++++
 datapath/linux/compat/ip_gre.c                  |  65 ++++++++---
 datapath/linux/compat/vxlan.c                   | 147 +++++++++++++++++++++---
 5 files changed, 261 insertions(+), 38 deletions(-)
 create mode 100644 datapath/linux/compat/include/linux/if_tunnel.h

diff --git a/datapath/linux/Modules.mk b/datapath/linux/Modules.mk
index 26f6d22..ad7d14a 100644
--- a/datapath/linux/Modules.mk
+++ b/datapath/linux/Modules.mk
@@ -38,6 +38,7 @@ openvswitch_headers += \
 	linux/compat/include/linux/if.h \
 	linux/compat/include/linux/if_ether.h \
 	linux/compat/include/linux/if_link.h \
+	linux/compat/include/linux/if_tunnel.h \
 	linux/compat/include/linux/if_vlan.h \
 	linux/compat/include/linux/in.h \
 	linux/compat/include/linux/jiffies.h \
diff --git a/datapath/linux/compat/geneve.c b/datapath/linux/compat/geneve.c
index 0c5b58a..79bb0ba 100644
--- a/datapath/linux/compat/geneve.c
+++ b/datapath/linux/compat/geneve.c
@@ -1112,9 +1112,8 @@ tx_error:
 }
 #endif
 
-netdev_tx_t rpl_geneve_xmit(struct sk_buff *skb)
+static netdev_tx_t geneve_dev_xmit(struct sk_buff *skb, struct net_device *dev)
 {
-	struct net_device *dev = skb->dev;
 	struct geneve_dev *geneve = netdev_priv(dev);
 	struct ip_tunnel_info *info = NULL;
 
@@ -1128,18 +1127,12 @@ netdev_tx_t rpl_geneve_xmit(struct sk_buff *skb)
 #endif
 	return geneve_xmit_skb(skb, dev, info);
 }
-EXPORT_SYMBOL_GPL(rpl_geneve_xmit);
 
-static netdev_tx_t geneve_dev_xmit(struct sk_buff *skb, struct net_device *dev)
+netdev_tx_t rpl_geneve_xmit(struct sk_buff *skb)
 {
-	/* Drop All packets coming from networking stack. OVS-CB is
-	 * not initialized for these packets.
-	 */
-
-	dev_kfree_skb(skb);
-	dev->stats.tx_dropped++;
-	return NETDEV_TX_OK;
+	return geneve_dev_xmit(skb, skb->dev);
 }
+EXPORT_SYMBOL_GPL(rpl_geneve_xmit);
 
 static int __geneve_change_mtu(struct net_device *dev, int new_mtu, bool strict)
 {
diff --git a/datapath/linux/compat/include/linux/if_tunnel.h b/datapath/linux/compat/include/linux/if_tunnel.h
new file mode 100644
index 0000000..476fe3c
--- /dev/null
+++ b/datapath/linux/compat/include/linux/if_tunnel.h
@@ -0,0 +1,71 @@
+#ifndef _LINUX_IF_TUNNEL_WRAPPER_H
+#define _LINUX_IF_TUNNEL_WRAPPER_H
+
+#include_next
+
+/* GRE section */
+enum {
+#define IFLA_GRE_UNSPEC rpl_IFLA_GRE_UNSPEC
+	IFLA_GRE_UNSPEC,
+
+#define IFLA_GRE_LINK rpl_IFLA_GRE_LINK
+	IFLA_GRE_LINK,
+
+#define IFLA_GRE_IFLAGS rpl_IFLA_GRE_IFLAGS
+	IFLA_GRE_IFLAGS,
+
+#define IFLA_GRE_OFLAGS rpl_IFLA_GRE_OFLAGS
+	IFLA_GRE_OFLAGS,
+
+#define IFLA_GRE_IKEY rpl_IFLA_GRE_IKEY
+	IFLA_GRE_IKEY,
+
+#define IFLA_GRE_OKEY rpl_IFLA_GRE_OKEY
+	IFLA_GRE_OKEY,
+
+#define IFLA_GRE_LOCAL rpl_IFLA_GRE_LOCAL
+	IFLA_GRE_LOCAL,
+
+#define IFLA_GRE_REMOTE rpl_IFLA_GRE_REMOTE
+	IFLA_GRE_REMOTE,
+
+#define IFLA_GRE_TTL rpl_IFLA_GRE_TTL
+	IFLA_GRE_TTL,
+
+#define IFLA_GRE_TOS rpl_IFLA_GRE_TOS
+	IFLA_GRE_TOS,
+
+#define IFLA_GRE_PMTUDISC rpl_IFLA_GRE_PMTUDISC
+	IFLA_GRE_PMTUDISC,
+
+#define IFLA_GRE_ENCAP_LIMIT rpl_IFLA_GRE_ENCAP_LIMIT
+	IFLA_GRE_ENCAP_LIMIT,
+
+#define IFLA_GRE_FLOWINFO rpl_IFLA_GRE_FLOWINFO
+	IFLA_GRE_FLOWINFO,
+
+#define IFLA_GRE_FLAGS rpl_IFLA_GRE_FLAGS
+	IFLA_GRE_FLAGS,
+
+#define IFLA_GRE_ENCAP_TYPE rpl_IFLA_GRE_ENCAP_TYPE
+	IFLA_GRE_ENCAP_TYPE,
+
+#define IFLA_GRE_ENCAP_FLAGS rpl_IFLA_GRE_ENCAP_FLAGS
+	IFLA_GRE_ENCAP_FLAGS,
+
+#define IFLA_GRE_ENCAP_SPORT rpl_IFLA_GRE_ENCAP_SPORT
+	IFLA_GRE_ENCAP_SPORT,
+
+#define IFLA_GRE_ENCAP_DPORT rpl_IFLA_GRE_ENCAP_DPORT
+	IFLA_GRE_ENCAP_DPORT,
+
+#define IFLA_GRE_COLLECT_METADATA rpl_IFLA_GRE_COLLECT_METADATA
+	IFLA_GRE_COLLECT_METADATA,
+
+#define __IFLA_GRE_MAX rpl__IFLA_GRE_MAX
+	__IFLA_GRE_MAX
+};
+#undef IFLA_GRE_MAX
+#define IFLA_GRE_MAX (__IFLA_GRE_MAX - 1)
+
+#endif
diff --git a/datapath/linux/compat/ip_gre.c b/datapath/linux/compat/ip_gre.c
index 03c5435..ab04dab 100644
--- a/datapath/linux/compat/ip_gre.c
+++ b/datapath/linux/compat/ip_gre.c
@@ -273,9 +273,8 @@ static struct rtable *gre_get_rt(struct sk_buff *skb,
 	return ip_route_output_key(net, fl);
 }
 
-netdev_tx_t rpl_gre_fb_xmit(struct sk_buff *skb)
+static netdev_tx_t gre_dev_xmit(struct sk_buff *skb, struct net_device *dev)
 {
-	struct net_device *dev = skb->dev;
 	struct ip_tunnel_info *tun_info;
 	const struct ip_tunnel_key *key;
 	struct flowi4 fl;
@@ -338,7 +337,6 @@ err_free_skb:
 	dev->stats.tx_dropped++;
 	return NETDEV_TX_OK;
 }
-EXPORT_SYMBOL(rpl_gre_fb_xmit);
 
 #define GRE_FEATURES (NETIF_F_SG | \
 		      NETIF_F_FRAGLIST | \
@@ -443,6 +441,47 @@ static void ipgre_netlink_parms(struct net_device *dev,
 	memset(parms, 0, sizeof(*parms));
 
 	parms->iph.protocol = IPPROTO_GRE;
+
+	if (!data)
+		return;
+
+	if (data[IFLA_GRE_LINK])
+		parms->link = nla_get_u32(data[IFLA_GRE_LINK]);
+
+	if (data[IFLA_GRE_IFLAGS])
+		parms->i_flags = gre_flags_to_tnl_flags(nla_get_be16(data[IFLA_GRE_IFLAGS]));
+
+	if (data[IFLA_GRE_OFLAGS])
+		parms->o_flags = gre_flags_to_tnl_flags(nla_get_be16(data[IFLA_GRE_OFLAGS]));
+
+	if (data[IFLA_GRE_IKEY])
+		parms->i_key = nla_get_be32(data[IFLA_GRE_IKEY]);
+
+	if (data[IFLA_GRE_OKEY])
+		parms->o_key = nla_get_be32(data[IFLA_GRE_OKEY]);
+
+	if (data[IFLA_GRE_LOCAL])
+		parms->iph.saddr = nla_get_in_addr(data[IFLA_GRE_LOCAL]);
+
+	if (data[IFLA_GRE_REMOTE])
+		parms->iph.daddr = nla_get_in_addr(data[IFLA_GRE_REMOTE]);
+
+	if (data[IFLA_GRE_TTL])
+		parms->iph.ttl = nla_get_u8(data[IFLA_GRE_TTL]);
+
+	if (data[IFLA_GRE_TOS])
+		parms->iph.tos = nla_get_u8(data[IFLA_GRE_TOS]);
+
+	if (!data[IFLA_GRE_PMTUDISC] || nla_get_u8(data[IFLA_GRE_PMTUDISC]))
+		parms->iph.frag_off = htons(IP_DF);
+
+	if (data[IFLA_GRE_COLLECT_METADATA]) {
+		struct ip_tunnel *t = netdev_priv(dev);
+
+		t->collect_md = true;
+		if (dev->type == ARPHRD_IPGRE)
+			dev->type = ARPHRD_NONE;
+	}
 }
 
 static int gre_tap_init(struct net_device *dev)
@@ -453,16 +492,11 @@ static int gre_tap_init(struct net_device *dev)
 	return ip_tunnel_init(dev);
 }
 
-static netdev_tx_t gre_dev_xmit(struct sk_buff *skb, struct net_device *dev)
+netdev_tx_t rpl_gre_fb_xmit(struct sk_buff *skb)
 {
-	/* Drop All packets coming from networking stack. OVS-CB is
-	 * not initialized for these packets.
-	 */
-
-	dev_kfree_skb(skb);
-	dev->stats.tx_dropped++;
-	return NETDEV_TX_OK;
+	return gre_dev_xmit(skb, skb->dev);
 }
+EXPORT_SYMBOL(rpl_gre_fb_xmit);
 
 int ovs_gre_fill_metadata_dst(struct net_device *dev, struct sk_buff *skb)
 {
@@ -518,11 +552,9 @@ static int ipgre_newlink(struct net_device *dev,
 #endif
 {
 	struct ip_tunnel_parm p;
-	int err;
 
 	ipgre_netlink_parms(dev, data, tb, &p);
-	err = ip_tunnel_newlink(dev, tb, &p);
-	return err;
+	return ip_tunnel_newlink(dev, tb, &p);
 
 }
 
@@ -580,6 +612,11 @@ static int ipgre_fill_info(struct sk_buff *skb, const struct net_device *dev)
 		       !!(p->iph.frag_off & htons(IP_DF))))
 		goto nla_put_failure;
 
+	if (t->collect_md) {
+		if (nla_put_flag(skb, IFLA_GRE_COLLECT_METADATA))
+			goto nla_put_failure;
+	}
+
 	return 0;
 
 nla_put_failure:
diff --git a/datapath/linux/compat/vxlan.c b/datapath/linux/compat/vxlan.c
index 47a5a68..73b260e 100644
--- a/datapath/linux/compat/vxlan.c
+++ b/datapath/linux/compat/vxlan.c
@@ -1225,9 +1225,8 @@ tx_free:
  * Outer UDP destination is the VXLAN assigned port.
  * source port is based on hash of flow
  */
-netdev_tx_t rpl_vxlan_xmit(struct sk_buff *skb)
+static netdev_tx_t vxlan_dev_xmit(struct sk_buff *skb, struct net_device *dev)
 {
-	struct net_device *dev = skb->dev;
 	struct vxlan_dev *vxlan = netdev_priv(dev);
 	const struct ip_tunnel_info *info;
 
@@ -1244,7 +1243,6 @@ netdev_tx_t rpl_vxlan_xmit(struct sk_buff *skb)
 	kfree_skb(skb);
 	return NETDEV_TX_OK;
 }
-EXPORT_SYMBOL_GPL(rpl_vxlan_xmit);
 
 /* Walk the forwarding table and purge stale entries */
 static void vxlan_cleanup(unsigned long arg)
@@ -1466,16 +1464,11 @@ int ovs_vxlan_fill_metadata_dst(struct net_device *dev, struct sk_buff *skb)
 }
 EXPORT_SYMBOL_GPL(ovs_vxlan_fill_metadata_dst);
 
-static netdev_tx_t vxlan_dev_xmit(struct sk_buff *skb, struct net_device *dev)
+netdev_tx_t rpl_vxlan_xmit(struct sk_buff *skb)
 {
-	/* Drop All packets coming from networking stack. OVS-CB is
-	 * not initialized for these packets.
-	 */
-
-	dev_kfree_skb(skb);
-	dev->stats.tx_dropped++;
-	return NETDEV_TX_OK;
+	return vxlan_dev_xmit(skb, skb->dev);
 }
+EXPORT_SYMBOL_GPL(rpl_vxlan_xmit);
 
 static const struct net_device_ops vxlan_netdev_ether_ops = {
 	.ndo_init = vxlan_init,
@@ -1950,8 +1943,136 @@ static int vxlan_dev_configure(struct net *src_net, struct net_device *dev,
 static int vxlan_newlink(struct net *src_net, struct net_device *dev,
 			 struct nlattr *tb[], struct nlattr *data[])
 {
-	pr_info("unsupported operation\n");
-	return -EINVAL;
+	struct vxlan_config conf;
+	int err;
+
+	memset(&conf, 0, sizeof(conf));
+
+	if (data[IFLA_VXLAN_ID])
+		conf.vni = cpu_to_be32(nla_get_u32(data[IFLA_VXLAN_ID]));
+
+	if (data[IFLA_VXLAN_GROUP]) {
+		conf.remote_ip.sin.sin_addr.s_addr = nla_get_in_addr(data[IFLA_VXLAN_GROUP]);
+	} else if (data[IFLA_VXLAN_GROUP6]) {
+		if (!IS_ENABLED(CONFIG_IPV6))
+			return -EPFNOSUPPORT;
+
+		conf.remote_ip.sin6.sin6_addr = nla_get_in6_addr(data[IFLA_VXLAN_GROUP6]);
+		conf.remote_ip.sa.sa_family = AF_INET6;
+	}
+
+	if (data[IFLA_VXLAN_LOCAL]) {
+		conf.saddr.sin.sin_addr.s_addr = nla_get_in_addr(data[IFLA_VXLAN_LOCAL]);
+		conf.saddr.sa.sa_family = AF_INET;
+	} else if (data[IFLA_VXLAN_LOCAL6]) {
+		if (!IS_ENABLED(CONFIG_IPV6))
+			return -EPFNOSUPPORT;
+
+		/* TODO: respect scope id */
+		conf.saddr.sin6.sin6_addr = nla_get_in6_addr(data[IFLA_VXLAN_LOCAL6]);
+		conf.saddr.sa.sa_family = AF_INET6;
+	}
+
+	if (data[IFLA_VXLAN_LINK])
+		conf.remote_ifindex = nla_get_u32(data[IFLA_VXLAN_LINK]);
+
+	if (data[IFLA_VXLAN_TOS])
+		conf.tos = nla_get_u8(data[IFLA_VXLAN_TOS]);
+
+	if (data[IFLA_VXLAN_TTL])
+		conf.ttl = nla_get_u8(data[IFLA_VXLAN_TTL]);
+
+	if (data[IFLA_VXLAN_LABEL])
+		conf.label = nla_get_be32(data[IFLA_VXLAN_LABEL]) &
+			     IPV6_FLOWLABEL_MASK;
+
+	if (!data[IFLA_VXLAN_LEARNING] || nla_get_u8(data[IFLA_VXLAN_LEARNING]))
+		conf.flags |= VXLAN_F_LEARN;
+
+	if (data[IFLA_VXLAN_AGEING])
+		conf.age_interval = nla_get_u32(data[IFLA_VXLAN_AGEING]);
+
+	if (data[IFLA_VXLAN_PROXY] && nla_get_u8(data[IFLA_VXLAN_PROXY]))
+		conf.flags |= VXLAN_F_PROXY;
+
+	if (data[IFLA_VXLAN_RSC] && nla_get_u8(data[IFLA_VXLAN_RSC]))
+		conf.flags |= VXLAN_F_RSC;
+
+	if (data[IFLA_VXLAN_L2MISS] && nla_get_u8(data[IFLA_VXLAN_L2MISS]))
+		conf.flags |= VXLAN_F_L2MISS;
+
+	if (data[IFLA_VXLAN_L3MISS] && nla_get_u8(data[IFLA_VXLAN_L3MISS]))
+		conf.flags |= VXLAN_F_L3MISS;
+
+	if (data[IFLA_VXLAN_LIMIT])
+		conf.addrmax = nla_get_u32(data[IFLA_VXLAN_LIMIT]);
+
+	if (data[IFLA_VXLAN_COLLECT_METADATA] &&
+	    nla_get_u8(data[IFLA_VXLAN_COLLECT_METADATA]))
+		conf.flags |= VXLAN_F_COLLECT_METADATA;
+
+	if (data[IFLA_VXLAN_PORT_RANGE]) {
+		const struct ifla_vxlan_port_range *p
+			= nla_data(data[IFLA_VXLAN_PORT_RANGE]);
+		conf.port_min = ntohs(p->low);
+		conf.port_max = ntohs(p->high);
+	}
+
+	if (data[IFLA_VXLAN_PORT])
+		conf.dst_port = nla_get_be16(data[IFLA_VXLAN_PORT]);
+
+	if (data[IFLA_VXLAN_UDP_CSUM] &&
+	    !nla_get_u8(data[IFLA_VXLAN_UDP_CSUM]))
+		conf.flags |= VXLAN_F_UDP_ZERO_CSUM_TX;
+
+	if (data[IFLA_VXLAN_UDP_ZERO_CSUM6_TX] &&
+	    nla_get_u8(data[IFLA_VXLAN_UDP_ZERO_CSUM6_TX]))
+		conf.flags |= VXLAN_F_UDP_ZERO_CSUM6_TX;
+
+	if (data[IFLA_VXLAN_UDP_ZERO_CSUM6_RX] &&
+	    nla_get_u8(data[IFLA_VXLAN_UDP_ZERO_CSUM6_RX]))
+		conf.flags |= VXLAN_F_UDP_ZERO_CSUM6_RX;
+
+	if (data[IFLA_VXLAN_REMCSUM_TX] &&
+	    nla_get_u8(data[IFLA_VXLAN_REMCSUM_TX]))
+		conf.flags |= VXLAN_F_REMCSUM_TX;
+
+	if (data[IFLA_VXLAN_REMCSUM_RX] &&
+	    nla_get_u8(data[IFLA_VXLAN_REMCSUM_RX]))
+		conf.flags |= VXLAN_F_REMCSUM_RX;
+
+	if (data[IFLA_VXLAN_GBP])
+		conf.flags |= VXLAN_F_GBP;
+
+	if (data[IFLA_VXLAN_GPE])
+		conf.flags |= VXLAN_F_GPE;
+
+	if (data[IFLA_VXLAN_REMCSUM_NOPARTIAL])
+		conf.flags |= VXLAN_F_REMCSUM_NOPARTIAL;
+
+	if (tb[IFLA_MTU])
+		conf.mtu = nla_get_u32(tb[IFLA_MTU]);
+
+	err = vxlan_dev_configure(src_net, dev, &conf);
+	switch (err) {
+	case -ENODEV:
+		pr_info("ifindex %d does not exist\n", conf.remote_ifindex);
+		break;
+
+	case -EPERM:
+		pr_info("IPv6 is disabled via sysctl\n");
+		break;
+
+	case -EEXIST:
+		pr_info("duplicate VNI %u\n", be32_to_cpu(conf.vni));
+		break;
+
+	case -EINVAL:
+		pr_info("unsupported combination of extensions\n");
+		break;
+	}
+
+	return err;
 }
 
 static void vxlan_dellink(struct net_device *dev, struct list_head *head)