From patchwork Fri Apr 15 21:30:59 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Thadeu Lima de Souza Cascardo X-Patchwork-Id: 611251 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from archives.nicira.com (archives.nicira.com [96.126.127.54]) by ozlabs.org (Postfix) with ESMTP id 3qmrM71Pbgz9t6t for ; Sat, 16 Apr 2016 07:31:15 +1000 (AEST) Received: from archives.nicira.com (localhost [127.0.0.1]) by archives.nicira.com (Postfix) with ESMTP id D7BE01086B; Fri, 15 Apr 2016 14:31:13 -0700 (PDT) X-Original-To: dev@openvswitch.org Delivered-To: dev@openvswitch.org Received: from mx1e4.cudamail.com (mx1.cudamail.com [69.90.118.67]) by archives.nicira.com (Postfix) with ESMTPS id 75A841076B for ; Fri, 15 Apr 2016 14:31:12 -0700 (PDT) Received: from bar5.cudamail.com (unknown [192.168.21.12]) by mx1e4.cudamail.com (Postfix) with ESMTPS id 01A4A1E0392 for ; Fri, 15 Apr 2016 15:31:12 -0600 (MDT) X-ASG-Debug-ID: 1460755870-09eadd4b7f314b0001-byXFYA Received: from mx3-pf1.cudamail.com ([192.168.14.2]) by bar5.cudamail.com with ESMTP id ZehH7bjWNJGGyRLo (version=TLSv1 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO) for ; Fri, 15 Apr 2016 15:31:10 -0600 (MDT) X-Barracuda-Envelope-From: cascardo@redhat.com X-Barracuda-RBL-Trusted-Forwarder: 192.168.14.2 Received: from unknown (HELO mx1.redhat.com) (209.132.183.28) by mx3-pf1.cudamail.com with ESMTPS (DHE-RSA-AES256-SHA encrypted); 15 Apr 2016 21:31:10 -0000 Received-SPF: pass (mx3-pf1.cudamail.com: SPF record at _spf1.redhat.com designates 209.132.183.28 as permitted sender) X-Barracuda-Apparent-Source-IP: 209.132.183.28 X-Barracuda-RBL-IP: 209.132.183.28 Received: from int-mx14.intmail.prod.int.phx2.redhat.com (int-mx14.intmail.prod.int.phx2.redhat.com [10.5.11.27]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id CE9B23B71F for ; Fri, 15 Apr 2016 21:31:08 +0000 (UTC) Received: from indiana.gru.redhat.com (ovpn-113-95.phx2.redhat.com [10.3.113.95]) by int-mx14.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id u3FLV6fc031758; Fri, 15 Apr 2016 17:31:07 -0400 X-CudaMail-Envelope-Sender: cascardo@redhat.com From: Thadeu Lima de Souza Cascardo To: dev@openvswitch.org X-CudaMail-Whitelist-To: dev@openvswitch.org X-CudaMail-MID: CM-V1-414047444 X-CudaMail-DTE: 041516 X-CudaMail-Originating-IP: 209.132.183.28 Date: Fri, 15 Apr 2016 18:30:59 -0300 X-ASG-Orig-Subj: [##CM-V1-414047444##][RFC PATCH] create vxlan device using rtnetlink interface Message-Id: <1460755859-4282-1-git-send-email-cascardo@redhat.com> X-Scanned-By: MIMEDefang 2.68 on 10.5.11.27 X-Barracuda-Connect: UNKNOWN[192.168.14.2] X-Barracuda-Start-Time: 1460755870 X-Barracuda-Encrypted: DHE-RSA-AES256-SHA X-Barracuda-URL: https://web.cudamail.com:443/cgi-mod/mark.cgi X-ASG-Whitelist: Header =?UTF-8?B?eFwtY3VkYW1haWxcLXdoaXRlbGlzdFwtdG8=?= X-Virus-Scanned: by bsmtpd at cudamail.com X-Barracuda-BRTS-Status: 1 Cc: jbenc@redhat.com Subject: [ovs-dev] [RFC PATCH] create vxlan device using rtnetlink interface X-BeenThere: dev@openvswitch.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Errors-To: dev-bounces@openvswitch.org Sender: "dev" Hi, this is an RFC patch (could probably be 2 patches - more below), that creates VXLAN devices in userspace and adds them as netdev vports, instead of using the vxlan vport code. The reason for this change is that it has been mentioned in the past that some of the vport code should be considered compat code, that is, it won't receive new features or flags. That means tunnel devices should be created in userspace and added as netdev vports. Some problems are noticeable in this patch, and I have gone through some iterations back and forth. First is the creation of the device under a switch/case. I tried to make this a netdev_class function, overriding the netdev class by dpif_port_open_type. That required exporting a lot of the vport functions. I can still do it that way, if that's preferrable, but this seems more simple. The other problem is during port_del, that since we don't have a netdev, the dpif_port type would not be vxlan, and deciding whether to remove it or not could not be made. In fact, this is one of the main reasons why this is an RFC. The solution here is to decide on the type by the device's name, and there is even a function for this, though it looks up for the netdev's already created, those that probably come from the database. However, when OVS starts, it removes ports from the datapath, and that information is not available, and may not even be. In fact, since the dpif_port has a different name because it's a vport, it won't match any netdev at all. So, I did the opposite of getting the type from the dpif_port name, ie, if it contains vxlan_sys, it's of vxlan type. Even if we make this more strict and use the port, we could still be in conflict with a vxlan device created by the user with such name. This should have been one patch by itself. Since we already have similar problems, like failing to port_add when there is a device named vxlan_sys_ created by the user, one could argue that this patch does not introduce a new problem. Jiri has suggested that we require users to create the interfaces themselves, by using whatever method their OS has, and add them as netdev devices. That would still require fetching some of the configuration from the device in order to make it properly work with flow-based tunnels. In fact, if we set the remote or local IP addresses on those devices, this would require multiple interfaces instead of only one just to be able to specify the same level of configuration as ovsdb allows us to. And that still requires some proper changes in order to support flow-based tunnels (calling tnl_port_add/tnl_port_del with proper configuration, for example). Another option instead of relying on the device names is to use ifalias. It's a user settable name that is opaque to the kernel and tools like iproute and net-tools. Any opinions on these options or other comments on the patch? --- lib/dpif-netlink.c | 285 +++++++++++++++++++++++++++++++++++++++++++---------- lib/netdev-vport.c | 47 ++++++--- lib/netdev-vport.h | 1 + lib/netdev.c | 7 ++ 4 files changed, 273 insertions(+), 67 deletions(-) diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c index 015ba20..6be6f52 100644 --- a/lib/dpif-netlink.c +++ b/lib/dpif-netlink.c @@ -25,6 +25,7 @@ #include #include #include +#include #include #include #include @@ -780,10 +781,8 @@ get_vport_type(const struct dpif_netlink_vport *vport) } static enum ovs_vport_type -netdev_to_ovs_vport_type(const struct netdev *netdev) +netdev_to_ovs_vport_type(const char *type) { - const char *type = netdev_get_type(netdev); - if (!strcmp(type, "tap") || !strcmp(type, "system")) { return OVS_VPORT_TYPE_NETDEV; } else if (!strcmp(type, "internal")) { @@ -804,19 +803,14 @@ netdev_to_ovs_vport_type(const struct netdev *netdev) } static int -dpif_netlink_port_add__(struct dpif_netlink *dpif, struct netdev *netdev, +dpif_netlink_port_add__(struct dpif_netlink *dpif, const char *name, + enum ovs_vport_type type, + struct ofpbuf *options, odp_port_t *port_nop) OVS_REQ_WRLOCK(dpif->upcall_lock) { - const struct netdev_tunnel_config *tnl_cfg; - char namebuf[NETDEV_VPORT_NAME_BUFSIZE]; - const char *name = netdev_vport_get_dpif_port(netdev, - namebuf, sizeof namebuf); - const char *type = netdev_get_type(netdev); struct dpif_netlink_vport request, reply; struct ofpbuf *buf; - uint64_t options_stub[64 / 8]; - struct ofpbuf options; struct nl_sock **socksp = NULL; uint32_t *upcall_pids; int error = 0; @@ -831,52 +825,19 @@ dpif_netlink_port_add__(struct dpif_netlink *dpif, struct netdev *netdev, dpif_netlink_vport_init(&request); request.cmd = OVS_VPORT_CMD_NEW; request.dp_ifindex = dpif->dp_ifindex; - request.type = netdev_to_ovs_vport_type(netdev); - if (request.type == OVS_VPORT_TYPE_UNSPEC) { - VLOG_WARN_RL(&error_rl, "%s: cannot create port `%s' because it has " - "unsupported type `%s'", - dpif_name(&dpif->dpif), name, type); - vport_del_socksp(dpif, socksp); - return EINVAL; - } + request.type = type; request.name = name; - if (request.type == OVS_VPORT_TYPE_NETDEV) { -#ifdef _WIN32 - /* XXX : Map appropiate Windows handle */ -#else - netdev_linux_ethtool_set_flag(netdev, ETH_FLAG_LRO, "LRO", false); -#endif - } - - tnl_cfg = netdev_get_tunnel_config(netdev); - if (tnl_cfg && (tnl_cfg->dst_port != 0 || tnl_cfg->exts)) { - ofpbuf_use_stack(&options, options_stub, sizeof options_stub); - if (tnl_cfg->dst_port) { - nl_msg_put_u16(&options, OVS_TUNNEL_ATTR_DST_PORT, - ntohs(tnl_cfg->dst_port)); - } - if (tnl_cfg->exts) { - size_t ext_ofs; - int i; - - ext_ofs = nl_msg_start_nested(&options, OVS_TUNNEL_ATTR_EXTENSION); - for (i = 0; i < 32; i++) { - if (tnl_cfg->exts & (1 << i)) { - nl_msg_put_flag(&options, i); - } - } - nl_msg_end_nested(&options, ext_ofs); - } - request.options = options.data; - request.options_len = options.size; - } - request.port_no = *port_nop; upcall_pids = vport_socksp_to_pids(socksp, dpif->n_handlers); request.n_upcall_pids = socksp ? dpif->n_handlers : 1; request.upcall_pids = upcall_pids; + if (options) { + request.options = options->data; + request.options_len = options->size; + } + error = dpif_netlink_vport_transact(&request, &reply, &buf); if (!error) { *port_nop = reply.port_no; @@ -916,6 +877,215 @@ exit: } static int +dpif_netlink_port_add_compat(struct dpif_netlink *dpif, struct netdev *netdev, + odp_port_t *port_nop) + OVS_REQ_WRLOCK(dpif->upcall_lock) +{ + const struct netdev_tunnel_config *tnl_cfg; + char namebuf[NETDEV_VPORT_NAME_BUFSIZE]; + const char *name = netdev_vport_get_dpif_port(netdev, + namebuf, sizeof namebuf); + const char *type = netdev_get_type(netdev); + uint64_t options_stub[64 / 8]; + struct ofpbuf options; + enum ovs_vport_type ovs_type; + + ovs_type = netdev_to_ovs_vport_type(netdev_get_type(netdev)); + if (ovs_type == OVS_VPORT_TYPE_UNSPEC) { + VLOG_WARN_RL(&error_rl, "%s: cannot create port `%s' because it has " + "unsupported type `%s'", + dpif_name(&dpif->dpif), name, type); + return EINVAL; + } + + if (ovs_type == OVS_VPORT_TYPE_NETDEV) { +#ifdef _WIN32 + /* XXX : Map appropiate Windows handle */ +#else + netdev_linux_ethtool_set_flag(netdev, ETH_FLAG_LRO, "LRO", false); +#endif + } + + tnl_cfg = netdev_get_tunnel_config(netdev); + if (tnl_cfg && (tnl_cfg->dst_port != 0 || tnl_cfg->exts)) { + ofpbuf_use_stack(&options, options_stub, sizeof options_stub); + if (tnl_cfg->dst_port) { + nl_msg_put_u16(&options, OVS_TUNNEL_ATTR_DST_PORT, + ntohs(tnl_cfg->dst_port)); + } + if (tnl_cfg->exts) { + size_t ext_ofs; + int i; + + ext_ofs = nl_msg_start_nested(&options, OVS_TUNNEL_ATTR_EXTENSION); + for (i = 0; i < 32; i++) { + if (tnl_cfg->exts & (1 << i)) { + nl_msg_put_flag(&options, i); + } + } + nl_msg_end_nested(&options, ext_ofs); + } + return dpif_netlink_port_add__(dpif, name, ovs_type, &options, port_nop); + } else { + return dpif_netlink_port_add__(dpif, name, ovs_type, NULL, port_nop); + } + +} + +#ifdef __linux__ + +static int +netdev_vxlan_create(struct netdev *netdev) +{ + int err; + struct ofpbuf request, *reply; + size_t linkinfo_off, infodata_off; + char namebuf[NETDEV_VPORT_NAME_BUFSIZE]; + const char *name = netdev_vport_get_dpif_port(netdev, + namebuf, sizeof namebuf); + const struct netdev_tunnel_config *tnl_cfg; + tnl_cfg = netdev_get_tunnel_config(netdev); + if (!tnl_cfg) { /* or assert? */ + return EINVAL; + } + + ofpbuf_init(&request, 0); + nl_msg_put_nlmsghdr(&request, 0, RTM_NEWLINK, + NLM_F_REQUEST | NLM_F_ACK | NLM_F_CREATE); + ofpbuf_put_zeros(&request, sizeof(struct ifinfomsg)); + nl_msg_put_string(&request, IFLA_IFNAME, name); + linkinfo_off = nl_msg_start_nested(&request, IFLA_LINKINFO); + nl_msg_put_string(&request, IFLA_INFO_KIND, "vxlan"); + infodata_off = nl_msg_start_nested(&request, IFLA_INFO_DATA); + nl_msg_put_u8(&request, IFLA_VXLAN_COLLECT_METADATA, 1); + nl_msg_put_u8(&request, IFLA_VXLAN_UDP_ZERO_CSUM6_RX, 1); + if (tnl_cfg->exts & (1 << OVS_VXLAN_EXT_GBP)) { + nl_msg_put_flag(&request, IFLA_VXLAN_GBP); + } + nl_msg_put_be16(&request, IFLA_VXLAN_PORT, tnl_cfg->dst_port); + nl_msg_end_nested(&request, infodata_off); + nl_msg_end_nested(&request, linkinfo_off); + + err = nl_transact(NETLINK_ROUTE, &request, &reply); + + if (!err) { + ofpbuf_uninit(reply); + } + + if (err == EINVAL) { + err = EOPNOTSUPP; + } + + ofpbuf_uninit(&request); + return err; +} + +static int +netdev_vxlan_destroy(const char *name) +{ + int err; + struct ofpbuf request, *reply; + + ofpbuf_init(&request, 0); + nl_msg_put_nlmsghdr(&request, 0, RTM_DELLINK, + NLM_F_REQUEST | NLM_F_ACK); + ofpbuf_put_zeros(&request, sizeof(struct ifinfomsg)); + nl_msg_put_string(&request, IFLA_IFNAME, name); + + err = nl_transact(NETLINK_ROUTE, &request, &reply); + + if (!err) { + ofpbuf_uninit(reply); + } + + ofpbuf_uninit(&request); + return err; +} + +#else + +static int +netdev_vxlan_create(struct netdev *netdev OVS_UNUSED) +{ + return EOPNOTSUPP; +} + +static int +netdev_vxlan_destroy(const char *name OVS_UNUSED) +{ + return EOPNOTSUPP; +} + +#endif + +static int +dpif_netlink_port_query__(const struct dpif_netlink *dpif, odp_port_t port_no, + const char *port_name, struct dpif_port *dpif_port); + +static int +dpif_netlink_port_create(struct netdev *netdev) +{ + switch (netdev_to_ovs_vport_type(netdev_get_type(netdev))) { + case OVS_VPORT_TYPE_VXLAN: + return netdev_vxlan_create(netdev); + case OVS_VPORT_TYPE_NETDEV: + case OVS_VPORT_TYPE_INTERNAL: + case OVS_VPORT_TYPE_GENEVE: + case OVS_VPORT_TYPE_GRE: + case OVS_VPORT_TYPE_LISP: + case OVS_VPORT_TYPE_STT: + case OVS_VPORT_TYPE_UNSPEC: + case __OVS_VPORT_TYPE_MAX: + default: + return EOPNOTSUPP; + } + return 0; +} + +static int +dpif_netlink_port_destroy(const char *name, const char *type) +{ + switch (netdev_to_ovs_vport_type(type)) { + case OVS_VPORT_TYPE_VXLAN: + return netdev_vxlan_destroy(name); + case OVS_VPORT_TYPE_NETDEV: + case OVS_VPORT_TYPE_INTERNAL: + case OVS_VPORT_TYPE_GENEVE: + case OVS_VPORT_TYPE_GRE: + case OVS_VPORT_TYPE_LISP: + case OVS_VPORT_TYPE_STT: + case OVS_VPORT_TYPE_UNSPEC: + case __OVS_VPORT_TYPE_MAX: + default: + return EOPNOTSUPP; + } + return 0; +} + +static int +dpif_netlink_port_create_and_add(struct dpif_netlink *dpif, struct netdev *netdev, + odp_port_t *port_nop) + OVS_REQ_WRLOCK(dpif->upcall_lock) +{ + int error; + char namebuf[NETDEV_VPORT_NAME_BUFSIZE]; + const char *name = netdev_vport_get_dpif_port(netdev, + namebuf, sizeof namebuf); + + error = dpif_netlink_port_create(netdev); + if (error) { + return error; + } + + error = dpif_netlink_port_add__(dpif, name, OVS_VPORT_TYPE_NETDEV, NULL, port_nop); + if (error) { + VLOG_DBG("failed to add port, destroying: %d", error); + dpif_netlink_port_destroy(name, netdev_get_type(netdev)); + } + return error; +} + +static int dpif_netlink_port_add(struct dpif *dpif_, struct netdev *netdev, odp_port_t *port_nop) { @@ -923,7 +1093,10 @@ dpif_netlink_port_add(struct dpif *dpif_, struct netdev *netdev, int error; fat_rwlock_wrlock(&dpif->upcall_lock); - error = dpif_netlink_port_add__(dpif, netdev, port_nop); + error = dpif_netlink_port_create_and_add(dpif, netdev, port_nop); + if (error == EOPNOTSUPP) { + error = dpif_netlink_port_add_compat(dpif, netdev, port_nop); + } fat_rwlock_unlock(&dpif->upcall_lock); return error; @@ -935,6 +1108,12 @@ dpif_netlink_port_del__(struct dpif_netlink *dpif, odp_port_t port_no) { struct dpif_netlink_vport vport; int error; + struct dpif_port dpif_port; + + error = dpif_netlink_port_query__(dpif, port_no, NULL, &dpif_port); + if (error) { + return error; + } dpif_netlink_vport_init(&vport); vport.cmd = OVS_VPORT_CMD_DEL; @@ -944,6 +1123,9 @@ dpif_netlink_port_del__(struct dpif_netlink *dpif, odp_port_t port_no) vport_del_channels(dpif, port_no); + dpif_netlink_port_destroy(dpif_port.name, dpif_port.type); + dpif_port_destroy(&dpif_port); + return error; } @@ -991,6 +1173,7 @@ dpif_netlink_port_query__(const struct dpif_netlink *dpif, odp_port_t port_no, return error; } + static int dpif_netlink_port_query_by_number(const struct dpif *dpif_, odp_port_t port_no, struct dpif_port *dpif_port) diff --git a/lib/netdev-vport.c b/lib/netdev-vport.c index e398562..57578c3 100644 --- a/lib/netdev-vport.c +++ b/lib/netdev-vport.c @@ -90,6 +90,8 @@ struct vport_class { struct netdev_class netdev_class; }; +static const struct vport_class vport_classes[]; + /* Last read of the route-table's change number. */ static uint64_t rt_change_seqno; @@ -1547,25 +1549,38 @@ netdev_vport_range(struct unixctl_conn *conn, int argc, tunnel_get_status, \ BUILD_HEADER, PUSH_HEADER, POP_HEADER) }} +/* The name of the dpif_port should be short enough to accomodate adding + * a port number to the end if one is necessary. */ +static const struct vport_class vport_classes[] = { + TUNNEL_CLASS("geneve", "genev_sys", netdev_geneve_build_header, + push_udp_header, + netdev_geneve_pop_header), + TUNNEL_CLASS("gre", "gre_sys", netdev_gre_build_header, + netdev_gre_push_header, + netdev_gre_pop_header), + TUNNEL_CLASS("ipsec_gre", "gre_sys", NULL, NULL, NULL), + TUNNEL_CLASS("vxlan", "vxlan_sys", netdev_vxlan_build_header, + push_udp_header, + netdev_vxlan_pop_header), + TUNNEL_CLASS("lisp", "lisp_sys", NULL, NULL, NULL), + TUNNEL_CLASS("stt", "stt_sys", NULL, NULL, NULL), +}; + +const struct netdev_class * +netdev_dpif_port_get_vport_class(const char *dpif_port) +{ + int i; + for (i = 0; i < ARRAY_SIZE(vport_classes); i++) { + if (strstr(dpif_port, vport_classes[i].dpif_port) == dpif_port) { + return &vport_classes[i].netdev_class; + } + } + return NULL; +} + void netdev_vport_tunnel_register(void) { - /* The name of the dpif_port should be short enough to accomodate adding - * a port number to the end if one is necessary. */ - static const struct vport_class vport_classes[] = { - TUNNEL_CLASS("geneve", "genev_sys", netdev_geneve_build_header, - push_udp_header, - netdev_geneve_pop_header), - TUNNEL_CLASS("gre", "gre_sys", netdev_gre_build_header, - netdev_gre_push_header, - netdev_gre_pop_header), - TUNNEL_CLASS("ipsec_gre", "gre_sys", NULL, NULL, NULL), - TUNNEL_CLASS("vxlan", "vxlan_sys", netdev_vxlan_build_header, - push_udp_header, - netdev_vxlan_pop_header), - TUNNEL_CLASS("lisp", "lisp_sys", NULL, NULL, NULL), - TUNNEL_CLASS("stt", "stt_sys", NULL, NULL, NULL), - }; static struct ovsthread_once once = OVSTHREAD_ONCE_INITIALIZER; if (ovsthread_once_start(&once)) { diff --git a/lib/netdev-vport.h b/lib/netdev-vport.h index be02cb5..7e32bbd 100644 --- a/lib/netdev-vport.h +++ b/lib/netdev-vport.h @@ -42,6 +42,7 @@ void netdev_vport_inc_tx(const struct netdev *, bool netdev_vport_is_vport_class(const struct netdev_class *); const char *netdev_vport_class_get_dpif_port(const struct netdev_class *); +const struct netdev_class *netdev_dpif_port_get_vport_class(const char *); #ifndef _WIN32 enum { NETDEV_VPORT_NAME_BUFSIZE = 16 }; diff --git a/lib/netdev.c b/lib/netdev.c index 3e50694..0bc4d04 100644 --- a/lib/netdev.c +++ b/lib/netdev.c @@ -1789,6 +1789,13 @@ netdev_get_type_from_name(const char *name) struct netdev *dev = netdev_from_name(name); const char *type = dev ? netdev_get_type(dev) : NULL; netdev_close(dev); + if (!type) { + const struct netdev_class * netdev_class; + netdev_class = netdev_dpif_port_get_vport_class(name); + if (netdev_class) { + type = netdev_class->type; + } + } return type; }