From: William Tu
To: dev@openvswitch.org
Cc: sergey.madaminov@gmail.com, ocardona@microsoft.com, pallavi.kadam@intel.com, blp@cs.stanford.edu, i.maximets@ovn.org, tuc@vmware.com, Dmitry.Kozliuk@gmail.com, talshn@nvidia.com
Date: Wed, 6 Oct 2021 17:00:44 +0000
Message-Id: <20211006170044.496-1-u9012063@gmail.com>
Subject: [ovs-dev] [PATCH] RFC: netdev-dpdk: Add Windows support.

This patch adds OVS-DPDK support on Windows.

Motivation
----------
Currently OVS supports multiple datapath implementations: the Linux kernel, the Windows kernel, the userspace datapath (OVS-DPDK on Linux), and HW offload. Adding any new feature to the OVS datapath requires OS-specific expertise, and usually ends up with feature mismatches (e.g., the Linux kernel supports feature A but Windows does not) and high maintenance cost [1]. It would be great if OVS used a single datapath across different platforms, and if that datapath were portable, performant, and easy to maintain.
The natural choice is dpif-netdev, the userspace netdev datapath used by OVS-DPDK and AF_XDP, which currently only works on Linux. The last missing piece is therefore to make OVS-DPDK run on Windows. With this work, the OVS userspace datapath runs on Windows, Linux, and FreeBSD, and any new datapath feature should naturally work across these OSes.

The performance of the userspace datapath should be equal to or better than that of the kernel datapath, thanks to the kernel-bypass design of DPDK/AF_XDP and the optimizations in the OVS userspace datapath. OVS can also translate its datapath flows to HW offload-capable NICs, such as the Mellanox CX6. With the userspace datapath, HW offload works through tc-flower (when using the AF_XDP netdev) or the rte_flow API (when using the DPDK netdev). We've tried offloading non-trivial OVS actions, such as conntrack and tunnel actions, to a Mellanox card.

Moving forward, OVS will have better cross-platform support, better performance, and easier maintenance. So far I haven't seen any virtual switch capable of doing all of the above.

Implementation on Windows
-------------------------
It was harder than I thought, given my Linux-only background. Sergey and I first needed to add meson build support to OVS to make compiling and linking against the Windows DPDK library easier [2]. In this patch, we use the clang compiler on Windows to compile both DPDK and OVS, and OVS links against the DPDK static library (dynamic libraries are not yet supported by DPDK on Windows).

DPDK has not yet finished porting all of its libraries and drivers to Windows, so some of the code that OVS uses together with DPDK doesn't compile. For example:
1) ovs-numa.c: used to detect the lcores used by DPDK.
2) open_memstream, fopencookie: used to redirect DPDK logs.
3) vhost-user: not supported; used to connect VMs/containers.
4) dpdk-socket-mem: configuration option, not supported.
I simply removed them in this patch. In the future, I will probably refactor them or put them behind #ifdef.

In addition, only a few DPDK PMD drivers are supported (please check the DPDK doc [3]):
1) Physical: ice, i40e, mlx5, ixgbe
2) Virtual: none (still a work in progress)
For ice, i40e, and ixgbe, you will need two additional Windows drivers, netuio and virt2phys. For mlx5, you will need WINOF-2.

I tested this patch using the mlx5 PMD on Azure cloud and on-prem, and ixgbevf on AWS EC2, using Windows Server 2019. I only tested basic OpenFlow actions, such as matching IPv4 headers, drop, and output.

Future Work
-----------
In priority order:
a. Add CI or a Dockerfile for building the code.
b. Performance numbers compared with the OVS kernel datapath.
c. Try rte_flow HW offload on Windows.
d. Resolve all the compiler warnings.
e. Refactor the vhost-user code.

Some thoughts:
f. With clang support on Windows, can we remove MSVC? Then OVS would have less compiler-specific code, e.g., include\openvswitch\compiler.h.
g. DPDK has started to implement pthreads on Windows; can OVS use the pthread library from DPDK instead of pthreads4w?
h. MinGW/MSYS is pretty slow; how much does switching to meson+clang improve the build process?
i. How does OVS-DPDK connect to VMs/containers on Windows? There is no vhost-user, so we should look at netvsc/vmbus.

I got lots of help from many people in the DPDK community, thanks!
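The Implementation on Windows section lists open_memstream() and fopencookie() among the APIs that do not build on Windows, and says the removed code may later return behind #ifdef. As a hedged sketch of what that could look like (not code from this patch, and untested on Windows), the helper below captures everything a FILE *-based dump routine writes into a heap-allocated string. This is the pattern the removed dpdk_init__() code used for rte_memzone_dump() and rte_log_dump(). It uses open_memstream() on POSIX systems and, as an assumption, falls back to tmpfile() plus fread() on Windows. dump_to_string() and dump_fn are hypothetical names, not existing OVS or DPDK APIs.

```c
#define _POSIX_C_SOURCE 200809L /* Exposes open_memstream() on POSIX. */
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical callback type: any routine that writes its output to a
 * FILE *, e.g. a wrapper that calls rte_memzone_dump(stream). */
typedef void (*dump_fn)(FILE *stream);

/* Hypothetical helper, not an OVS or DPDK API.  Runs 'dump' and returns
 * everything it wrote as a NUL-terminated, heap-allocated string, or NULL
 * on failure.  The caller must free() the result. */
static char *
dump_to_string(dump_fn dump)
{
#ifndef _WIN32
    /* POSIX path: open_memstream() grows 'buf' as data is written. */
    char *buf = NULL;
    size_t size = 0;
    FILE *stream = open_memstream(&buf, &size);

    if (!stream) {
        return NULL;
    }
    dump(stream);
    if (fclose(stream)) {       /* fclose() finalizes 'buf' and 'size'. */
        free(buf);
        return NULL;
    }
    return buf;
#else
    /* Assumed Windows fallback: write to a temporary file, then read the
     * whole file back into memory. */
    FILE *stream = tmpfile();
    char *buf;
    long len;

    if (!stream) {
        return NULL;
    }
    dump(stream);
    if (fflush(stream) || fseek(stream, 0, SEEK_END)
        || (len = ftell(stream)) < 0 || fseek(stream, 0, SEEK_SET)) {
        fclose(stream);
        return NULL;
    }
    buf = malloc((size_t) len + 1);
    if (buf) {
        size_t n = fread(buf, 1, (size_t) len, stream);
        buf[n] = '\0';
    }
    fclose(stream);
    return buf;
#endif
}
```

A caller could pass a small wrapper that calls rte_memzone_dump(stream), log the returned string with VLOG_DBG(), and then free() it. The temporary-file fallback avoids non-standard C APIs at the cost of file I/O, and the tmpfile() behavior of the Windows C runtime should be checked before relying on it.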
References
----------
[1] https://dl.acm.org/doi/10.1145/3452296.3472914
[2] http://patchwork.ozlabs.org/project/openvswitch/cover/20210808014931.320242-1-sergey.madaminov@gmail.com/
[3] https://doc.dpdk.org/guides/windows_gsg/
[4] Porting OvS-DPDK to Windows with Meson: https://github.com/smadaminov/ovs-dpdk-meson-issues

Signed-off-by: William Tu
Signed-off-by: William Tu
---
 ovs/config.h.meson                 |   7 -
 ovs/include/openvswitch/compiler.h |   4 +-
 ovs/include/openvswitch/util.h     |   2 +-
 ovs/lib/dpdk.c                     | 114 +---
 ovs/lib/dpif-netdev.c              |   4 +
 ovs/lib/meson.build                |  12 +-
 ovs/lib/netdev-dpdk.c              | 934 +----------------------------
 ovs/lib/sflow_api.h                |   1 +
 ovs/lib/util.c                     |   4 -
 ovs/lib/vlog.c                     |   7 +
 ovs/meson.build                    |  88 ++-
 ovs/meson_options.txt              |   2 +
 ovs/ofproto/meson.build            |   3 +
 ovs/ovsdb/meson.build              |   3 +
 ovs/utilities/meson.build          |   3 +
 ovs/vswitchd/meson.build           |  10 +
 ovs/vtep/meson.build               |   3 +
 17 files changed, 140 insertions(+), 1061 deletions(-)

diff --git a/ovs/config.h.meson b/ovs/config.h.meson
index 3aaa1cb..fd6d79c 100644
--- a/ovs/config.h.meson
+++ b/ovs/config.h.meson
@@ -377,11 +377,4 @@
 #include "include/windows/windefs.h"
 #endif
 
-/* MSR: use meson checks to enable/disable this
-   and still not clear why it does not work on Linux
-   or maybe I should include this header only in Windows */
-#if defined(__clang__) && defined(WIN32)
-#include
-#endif
-
 #mesondefine HAVE_THREAD_SAFETY
diff --git a/ovs/include/openvswitch/compiler.h b/ovs/include/openvswitch/compiler.h
index 997d5eb..23b088a 100644
--- a/ovs/include/openvswitch/compiler.h
+++ b/ovs/include/openvswitch/compiler.h
@@ -172,7 +172,7 @@
  * OVS_PACKED_ENUM is intended for use only as a space optimization, since it
  * only works with GCC. That means that it must not be used in wire protocols
  * or otherwise exposed outside of a single process.
  */
-#if __GNUC__ && !__CHECKER__
+#if __GNUC__ && !__CHECKER__ && !__clang__
 #define OVS_PACKED_ENUM __attribute__((__packed__))
 #define HAVE_PACKED_ENUM
 #else
@@ -278,7 +278,7 @@ extern int (*build_assert(void))[BUILD_ASSERT__(EXPR)]
 #endif
 
-#ifdef __GNUC__
+#if defined(__GNUC__) && !defined(__clang__)
 #define BUILD_ASSERT_GCCONLY(EXPR) BUILD_ASSERT(EXPR)
 #define BUILD_ASSERT_DECL_GCCONLY(EXPR) BUILD_ASSERT_DECL(EXPR)
 #else
diff --git a/ovs/include/openvswitch/util.h b/ovs/include/openvswitch/util.h
index 228b185..d73634f 100644
--- a/ovs/include/openvswitch/util.h
+++ b/ovs/include/openvswitch/util.h
@@ -100,7 +100,7 @@ OVS_NO_RETURN void ovs_assert_failure(const char *, const char *, const char *);
  * dereference any pointer, so it would be surprising for it to cause any
  * problems in practice.
  */
-#ifdef __GNUC__
+#if defined(__GNUC__) && !defined(__clang__)
 #define OBJECT_OFFSETOF(OBJECT, MEMBER) offsetof(typeof(*(OBJECT)), MEMBER)
 #else
 #define OBJECT_OFFSETOF(OBJECT, MEMBER) \
diff --git a/ovs/lib/dpdk.c b/ovs/lib/dpdk.c
index 855467b..d0c4c2d 100644
--- a/ovs/lib/dpdk.c
+++ b/ovs/lib/dpdk.c
@@ -277,42 +277,6 @@ construct_dpdk_args(const struct smap *ovs_other_config, struct svec *args)
     construct_dpdk_mutex_options(ovs_other_config, args);
 }
 
-static ssize_t
-dpdk_log_write(void *c OVS_UNUSED, const char *buf, size_t size)
-{
-    static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(600, 600);
-    static struct vlog_rate_limit dbg_rl = VLOG_RATE_LIMIT_INIT(600, 600);
-
-    switch (rte_log_cur_msg_loglevel()) {
-    case RTE_LOG_DEBUG:
-        VLOG_DBG_RL(&dbg_rl, "%.*s", (int) size, buf);
-        break;
-    case RTE_LOG_INFO:
-    case RTE_LOG_NOTICE:
-        VLOG_INFO_RL(&rl, "%.*s", (int) size, buf);
-        break;
-    case RTE_LOG_WARNING:
-        VLOG_WARN_RL(&rl, "%.*s", (int) size, buf);
-        break;
-    case RTE_LOG_ERR:
-        VLOG_ERR_RL(&rl, "%.*s", (int) size, buf);
-        break;
-    case RTE_LOG_CRIT:
-    case RTE_LOG_ALERT:
-    case RTE_LOG_EMERG:
-        VLOG_EMER("%.*s", (int) size, buf);
-        break;
-    default:
-        OVS_NOT_REACHED();
-    }
-
-    return size;
-}
-
-static cookie_io_functions_t dpdk_log_func = {
-    .write = dpdk_log_write,
-};
-
 static void
 dpdk_unixctl_mem_stream(struct unixctl_conn *conn, int argc OVS_UNUSED,
                         const char *argv[] OVS_UNUSED, void *aux)
@@ -423,51 +387,9 @@ dpdk_init__(const struct smap *ovs_other_config)
     struct ovs_numa_dump *affinity = NULL;
     struct svec args = SVEC_EMPTY_INITIALIZER;
 
-    log_stream = fopencookie(NULL, "w+", dpdk_log_func);
-    if (log_stream == NULL) {
-        VLOG_ERR("Can't redirect DPDK log: %s.", ovs_strerror(errno));
-    } else {
-        setbuf(log_stream, NULL);
-        rte_openlog_stream(log_stream);
-    }
-
-    if (process_vhost_flags("vhost-sock-dir", ovs_rundir(),
-                            NAME_MAX, ovs_other_config,
-                            &sock_dir_subcomponent)) {
-        struct stat s;
-        if (!strstr(sock_dir_subcomponent, "..")) {
-            vhost_sock_dir = xasprintf("%s/%s", ovs_rundir(),
-                                       sock_dir_subcomponent);
-
-            err = stat(vhost_sock_dir, &s);
-            if (err) {
-                VLOG_ERR("vhost-user sock directory '%s' does not exist.",
-                         vhost_sock_dir);
-            }
-        } else {
-            vhost_sock_dir = xstrdup(ovs_rundir());
-            VLOG_ERR("vhost-user sock directory request '%s/%s' has invalid"
-                     "characters '..' - using %s instead.",
-                     ovs_rundir(), sock_dir_subcomponent, ovs_rundir());
-        }
-        free(sock_dir_subcomponent);
-    } else {
-        vhost_sock_dir = sock_dir_subcomponent;
-    }
-
-    vhost_iommu_enabled = smap_get_bool(ovs_other_config,
-                                        "vhost-iommu-support", false);
-    VLOG_INFO("IOMMU support for vhost-user-client %s.",
-               vhost_iommu_enabled ? "enabled" : "disabled");
+    VLOG_INFO("Can't redirect DPDK log in Windows: %s.", ovs_strerror(errno));
 
-    vhost_postcopy_enabled = smap_get_bool(ovs_other_config,
-                                           "vhost-postcopy-support", false);
-    if (vhost_postcopy_enabled && memory_locked()) {
-        VLOG_WARN("vhost-postcopy-support and mlockall are not compatible.");
-        vhost_postcopy_enabled = false;
-    }
-    VLOG_INFO("POSTCOPY support for vhost-user-client %s.",
-              vhost_postcopy_enabled ? "enabled" : "disabled");
+    rte_openlog_stream(log_stream);
 
     per_port_memory = smap_get_bool(ovs_other_config,
                                     "per-port-memory", false);
@@ -561,40 +483,12 @@ dpdk_init__(const struct smap *ovs_other_config)
         VLOG_EMER("Unable to initialize DPDK: %s", ovs_strerror(rte_errno));
         return false;
     }
-
-    if (VLOG_IS_DBG_ENABLED()) {
-        size_t size;
-        char *response = NULL;
-        FILE *stream = open_memstream(&response, &size);
-
-        if (stream) {
-            fprintf(stream, "rte_memzone_dump:\n");
-            rte_memzone_dump(stream);
-            fprintf(stream, "rte_log_dump:\n");
-            rte_log_dump(stream);
-            fclose(stream);
-            VLOG_DBG("%s", response);
-            free(response);
-        } else {
-            VLOG_DBG("Could not dump memzone and log levels. "
-                     "Unable to open memstream: %s.", ovs_strerror(errno));
-        }
-    }
-
-    unixctl_command_register("dpdk/log-list", "", 0, 0,
-                             dpdk_unixctl_mem_stream, rte_log_dump);
-    unixctl_command_register("dpdk/log-set", "{level | pattern:level}", 0,
-                             INT_MAX, dpdk_unixctl_log_set, NULL);
-    unixctl_command_register("dpdk/get-malloc-stats", "", 0, 0,
-                             dpdk_unixctl_mem_stream,
-                             malloc_dump_stats_wrapper);
-
     /* We are called from the main thread here */
     RTE_PER_LCORE(_lcore_id) = NON_PMD_CORE_ID;
 
+    VLOG_INFO("Register the dpdk classes");
     /* Finally, register the dpdk classes */
     netdev_dpdk_register();
-    netdev_register_flow_api_provider(&netdev_offload_dpdk);
     return true;
 }
 
@@ -615,7 +509,7 @@ dpdk_init(const struct smap *ovs_other_config)
     static struct ovsthread_once once_enable = OVSTHREAD_ONCE_INITIALIZER;
 
     if (ovsthread_once_start(&once_enable)) {
-        VLOG_INFO("Using %s", rte_version());
+        VLOG_INFO("Using %s", "rte_version"); /* not exposed in DPDK. */
         VLOG_INFO("DPDK Enabled - initializing...");
         enabled = dpdk_init__(ovs_other_config);
         if (enabled) {
diff --git a/ovs/lib/dpif-netdev.c b/ovs/lib/dpif-netdev.c
index 59e326f..0f986ce 100644
--- a/ovs/lib/dpif-netdev.c
+++ b/ovs/lib/dpif-netdev.c
@@ -1575,7 +1575,11 @@ dpif_netdev_port_open_type(const struct dpif_class *class, const char *type)
 {
     return strcmp(type, "internal") ? type
                   : dpif_netdev_class_is_dummy(class) ? "dummy-internal"
+#ifdef _WIN32
+                  : "internal";
+#else
                   : "tap";
+#endif
 }
 
 static struct dpif *
diff --git a/ovs/lib/meson.build b/ovs/lib/meson.build
index 42e0bbf..9c768b4 100644
--- a/ovs/lib/meson.build
+++ b/ovs/lib/meson.build
@@ -370,7 +370,6 @@ sources = files(
   'lldp/lldp.c',
   'lldp/lldpd.c',
   'lldp/lldpd-structs.c',
-  'dpdk-stub.c',
   'async-append-null.c', # no POSIX_AIO
   'stream-nossl.c',
   'dns-resolve.h',
@@ -393,7 +392,18 @@ if build_machine.system() == 'windows'
     'netlink-notifier.c',
     'netlink-socket.c',
     'wmi.c',
+#    'syslog-direct.c',
+#    'syslog-libc.c',
+    'syslog-null.c',
   )
+
+  if enable_dpdk
+    sources += files(
+      'dpdk.c',
+      'netdev-dpdk.c',
+    )
+    deps += dpdk_lib_deps
+  endif
 endif
 
 if build_machine.system() == 'linux'
diff --git a/ovs/lib/netdev-dpdk.c b/ovs/lib/netdev-dpdk.c
index 45a96b9..69dc065 100644
--- a/ovs/lib/netdev-dpdk.c
+++ b/ovs/lib/netdev-dpdk.c
@@ -22,9 +22,7 @@
 #include
 #include
 #include
-#include
 #include
-#include
 #include
 #include
 
@@ -37,7 +35,6 @@
 #include
 #include
 #include
-#include
 
 #include "cmap.h"
 #include "coverage.h"
@@ -181,17 +178,6 @@ static int vring_state_changed(int vid, uint16_t queue_id, int enable);
 static void destroy_connection(int vid);
 static void vhost_guest_notified(int vid);
 
-static const struct vhost_device_ops virtio_net_device_ops =
-{
-    .new_device = new_device,
-    .destroy_device = destroy_device,
-    .vring_state_changed = vring_state_changed,
-    .features_changed = NULL,
-    .new_connection = NULL,
-    .destroy_connection = destroy_connection,
-    .guest_notified = vhost_guest_notified,
-};
-
 /* Custom software stats for dpdk ports */
 struct netdev_dpdk_sw_stats {
     /* No. of retries when unable to transmit. */
@@ -536,7 +522,6 @@ struct netdev_rxq_dpdk {
 };
 
 static void netdev_dpdk_destruct(struct netdev *netdev);
-static void netdev_dpdk_vhost_destruct(struct netdev *netdev);
 
 static int netdev_dpdk_get_sw_custom_stats(const struct netdev *,
                                            struct netdev_custom_stats *);
@@ -550,8 +535,7 @@ netdev_dpdk_get_ingress_policer(const struct netdev_dpdk *dev);
 static bool
 is_dpdk_class(const struct netdev_class *class)
 {
-    return class->destruct == netdev_dpdk_destruct
-           || class->destruct == netdev_dpdk_vhost_destruct;
+    return class->destruct == netdev_dpdk_destruct;
 }
 
 /* DPDK NIC drivers allocate RX buffers at a particular granularity, typically
@@ -1165,8 +1149,10 @@ dpdk_eth_dev_init(struct netdev_dpdk *dev)
     dev->started = true;
 
     rte_eth_promiscuous_enable(dev->port_id);
+#ifndef _WIN32
+    /* Windows WINOF2.70 not support. */
     rte_eth_allmulticast_enable(dev->port_id);
-
+#endif
     memset(&eth_addr, 0x0, sizeof(eth_addr));
     rte_eth_macaddr_get(dev->port_id, &eth_addr);
     VLOG_INFO_RL(&rl, "Port "DPDK_PORT_ID_FMT": "ETH_ADDR_FMT,
@@ -1302,128 +1288,6 @@ netdev_dpdk_get_num_ports(struct rte_device *device)
     return count;
 }
 
-static int
-vhost_common_construct(struct netdev *netdev)
-    OVS_REQUIRES(dpdk_mutex)
-{
-    int socket_id = rte_lcore_to_socket_id(rte_get_main_lcore());
-    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
-
-    dev->vhost_rxq_enabled = dpdk_rte_mzalloc(OVS_VHOST_MAX_QUEUE_NUM *
-                                              sizeof *dev->vhost_rxq_enabled);
-    if (!dev->vhost_rxq_enabled) {
-        return ENOMEM;
-    }
-    dev->tx_q = netdev_dpdk_alloc_txq(OVS_VHOST_MAX_QUEUE_NUM);
-    if (!dev->tx_q) {
-        rte_free(dev->vhost_rxq_enabled);
-        return ENOMEM;
-    }
-
-    atomic_init(&dev->vhost_tx_retries_max, VHOST_ENQ_RETRY_DEF);
-
-    return common_construct(netdev, DPDK_ETH_PORT_ID_INVALID,
-                            DPDK_DEV_VHOST, socket_id);
-}
-
-static int
-netdev_dpdk_vhost_construct(struct netdev *netdev)
-{
-    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
-    const char *name = netdev->name;
-    int err;
-
-    /* 'name' is appended to 'vhost_sock_dir' and used to create a socket in
-     * the file system. '/' or '\' would traverse directories, so they're not
-     * acceptable in 'name'. */
-    if (strchr(name, '/') || strchr(name, '\\')) {
-        VLOG_ERR("\"%s\" is not a valid name for a vhost-user port. "
-                 "A valid name must not include '/' or '\\'",
-                 name);
-        return EINVAL;
-    }
-
-    ovs_mutex_lock(&dpdk_mutex);
-    /* Take the name of the vhost-user port and append it to the location where
-     * the socket is to be created, then register the socket.
-     */
-    dev->vhost_id = xasprintf("%s/%s", dpdk_get_vhost_sock_dir(), name);
-
-    dev->vhost_driver_flags &= ~RTE_VHOST_USER_CLIENT;
-
-    /* There is no support for multi-segments buffers. */
-    dev->vhost_driver_flags |= RTE_VHOST_USER_LINEARBUF_SUPPORT;
-    err = rte_vhost_driver_register(dev->vhost_id, dev->vhost_driver_flags);
-    if (err) {
-        VLOG_ERR("vhost-user socket device setup failure for socket %s\n",
-                 dev->vhost_id);
-        goto out;
-    } else {
-        fatal_signal_add_file_to_unlink(dev->vhost_id);
-        VLOG_INFO("Socket %s created for vhost-user port %s\n",
-                  dev->vhost_id, name);
-    }
-
-    err = rte_vhost_driver_callback_register(dev->vhost_id,
-                                             &virtio_net_device_ops);
-    if (err) {
-        VLOG_ERR("rte_vhost_driver_callback_register failed for vhost user "
-                 "port: %s\n", name);
-        goto out;
-    }
-
-    if (!userspace_tso_enabled()) {
-        err = rte_vhost_driver_disable_features(dev->vhost_id,
-                                    1ULL << VIRTIO_NET_F_HOST_TSO4
-                                    | 1ULL << VIRTIO_NET_F_HOST_TSO6
-                                    | 1ULL << VIRTIO_NET_F_CSUM);
-        if (err) {
-            VLOG_ERR("rte_vhost_driver_disable_features failed for vhost user "
-                     "port: %s\n", name);
-            goto out;
-        }
-    }
-
-    err = rte_vhost_driver_start(dev->vhost_id);
-    if (err) {
-        VLOG_ERR("rte_vhost_driver_start failed for vhost user "
-                 "port: %s\n", name);
-        goto out;
-    }
-
-    err = vhost_common_construct(netdev);
-    if (err) {
-        VLOG_ERR("vhost_common_construct failed for vhost user "
-                 "port: %s\n", name);
-    }
-
-out:
-    if (err) {
-        free(dev->vhost_id);
-        dev->vhost_id = NULL;
-    }
-
-    ovs_mutex_unlock(&dpdk_mutex);
-    VLOG_WARN_ONCE("dpdkvhostuser ports are considered deprecated; "
-                   "please migrate to dpdkvhostuserclient ports.");
-    return err;
-}
-
-static int
-netdev_dpdk_vhost_client_construct(struct netdev *netdev)
-{
-    int err;
-
-    ovs_mutex_lock(&dpdk_mutex);
-    err = vhost_common_construct(netdev);
-    if (err) {
-        VLOG_ERR("vhost_common_construct failed for vhost user client"
-                 "port: %s\n", netdev->name);
-    }
-    ovs_mutex_unlock(&dpdk_mutex);
-    return err;
-}
-
 static int
 netdev_dpdk_construct(struct netdev *netdev)
 {
@@ -1500,58 +1364,6 @@ netdev_dpdk_destruct(struct netdev *netdev)
     ovs_mutex_unlock(&dpdk_mutex);
 }
 
-/* rte_vhost_driver_unregister() can call back destroy_device(), which will
- * try to acquire 'dpdk_mutex' and possibly 'dev->mutex'. To avoid a
- * deadlock, none of the mutexes must be held while calling this function. */
-static int
-dpdk_vhost_driver_unregister(struct netdev_dpdk *dev OVS_UNUSED,
-                             char *vhost_id)
-    OVS_EXCLUDED(dpdk_mutex)
-    OVS_EXCLUDED(dev->mutex)
-{
-    return rte_vhost_driver_unregister(vhost_id);
-}
-
-static void
-netdev_dpdk_vhost_destruct(struct netdev *netdev)
-{
-    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
-    char *vhost_id;
-
-    ovs_mutex_lock(&dpdk_mutex);
-
-    /* Guest becomes an orphan if still attached. */
-    if (netdev_dpdk_get_vid(dev) >= 0
-        && !(dev->vhost_driver_flags & RTE_VHOST_USER_CLIENT)) {
-        VLOG_ERR("Removing port '%s' while vhost device still attached.",
-                 netdev->name);
-        VLOG_ERR("To restore connectivity after re-adding of port, VM on "
-                 "socket '%s' must be restarted.", dev->vhost_id);
-    }
-
-    vhost_id = dev->vhost_id;
-    dev->vhost_id = NULL;
-    rte_free(dev->vhost_rxq_enabled);
-
-    common_destruct(dev);
-
-    ovs_mutex_unlock(&dpdk_mutex);
-
-    if (!vhost_id) {
-        goto out;
-    }
-
-    if (dpdk_vhost_driver_unregister(dev, vhost_id)) {
-        VLOG_ERR("%s: Unable to unregister vhost driver for socket '%s'.\n",
-                 netdev->name, vhost_id);
-    } else if (!(dev->vhost_driver_flags & RTE_VHOST_USER_CLIENT)) {
-        /* OVS server mode - remove this socket from list for deletion */
-        fatal_signal_remove_file_to_unlink(vhost_id);
-    }
-out:
-    free(vhost_id);
-}
-
 static void
 netdev_dpdk_dealloc(struct netdev *netdev)
 {
@@ -2053,42 +1865,6 @@ out:
     return err;
 }
 
-static int
-netdev_dpdk_vhost_client_set_config(struct netdev *netdev,
-                                    const struct smap *args,
-                                    char **errp OVS_UNUSED)
-{
-    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
-    const char *path;
-    int max_tx_retries, cur_max_tx_retries;
-
-    ovs_mutex_lock(&dev->mutex);
-    if (!(dev->vhost_driver_flags & RTE_VHOST_USER_CLIENT)) {
-        path = smap_get(args, "vhost-server-path");
-        if (!nullable_string_is_equal(path, dev->vhost_id)) {
-            free(dev->vhost_id);
-            dev->vhost_id = nullable_xstrdup(path);
-            netdev_request_reconfigure(netdev);
-        }
-    }
-
-    max_tx_retries = smap_get_int(args, "tx-retries-max",
-                                  VHOST_ENQ_RETRY_DEF);
-    if (max_tx_retries < VHOST_ENQ_RETRY_MIN
-        || max_tx_retries > VHOST_ENQ_RETRY_MAX) {
-        max_tx_retries = VHOST_ENQ_RETRY_DEF;
-    }
-    atomic_read_relaxed(&dev->vhost_tx_retries_max, &cur_max_tx_retries);
-    if (max_tx_retries != cur_max_tx_retries) {
-        atomic_store_relaxed(&dev->vhost_tx_retries_max, max_tx_retries);
-        VLOG_INFO("Max Tx retries for vhost device '%s' set to %d",
-                  netdev_get_name(netdev), max_tx_retries);
-    }
-    ovs_mutex_unlock(&dev->mutex);
-
-    return 0;
-}
-
 static int
 netdev_dpdk_get_numa_id(const struct netdev *netdev)
 {
@@ -2326,7 +2102,7 @@ ingress_policer_run(struct ingress_policer *policer, struct rte_mbuf **pkts,
 static bool
 is_vhost_running(struct netdev_dpdk *dev)
 {
-    return (netdev_dpdk_get_vid(dev) >= 0 && dev->vhost_reconfigured);
+    return false;
 }
 
 static inline void
@@ -2395,69 +2171,6 @@ netdev_dpdk_vhost_update_rx_counters(struct netdev_dpdk *dev,
     }
 }
 
-/*
- * The receive path for the vhost port is the TX path out from guest.
- */
-static int
-netdev_dpdk_vhost_rxq_recv(struct netdev_rxq *rxq,
-                           struct dp_packet_batch *batch, int *qfill)
-{
-    struct netdev_dpdk *dev = netdev_dpdk_cast(rxq->netdev);
-    struct ingress_policer *policer = netdev_dpdk_get_ingress_policer(dev);
-    uint16_t nb_rx = 0;
-    uint16_t qos_drops = 0;
-    int qid = rxq->queue_id * VIRTIO_QNUM + VIRTIO_TXQ;
-    int vid = netdev_dpdk_get_vid(dev);
-
-    if (OVS_UNLIKELY(vid < 0 || !dev->vhost_reconfigured
-                     || !(dev->flags & NETDEV_UP))) {
-        return EAGAIN;
-    }
-
-    nb_rx = rte_vhost_dequeue_burst(vid, qid, dev->dpdk_mp->mp,
-                                    (struct rte_mbuf **) batch->packets,
-                                    NETDEV_MAX_BURST);
-    if (!nb_rx) {
-        return EAGAIN;
-    }
-
-    if (qfill) {
-        if (nb_rx == NETDEV_MAX_BURST) {
-            /* The DPDK API returns a uint32_t which often has invalid bits in
-             * the upper 16-bits. Need to restrict the value to uint16_t. */
-            *qfill = rte_vhost_rx_queue_count(vid, qid) & UINT16_MAX;
-        } else {
-            *qfill = 0;
-        }
-    }
-
-    if (policer) {
-        qos_drops = nb_rx;
-        nb_rx = ingress_policer_run(policer,
-                                    (struct rte_mbuf **) batch->packets,
-                                    nb_rx, true);
-        qos_drops -= nb_rx;
-    }
-
-    rte_spinlock_lock(&dev->stats_lock);
-    netdev_dpdk_vhost_update_rx_counters(dev, batch->packets,
-                                         nb_rx, qos_drops);
-    rte_spinlock_unlock(&dev->stats_lock);
-
-    batch->count = nb_rx;
-    dp_packet_batch_init_packet_fields(batch);
-
-    return 0;
-}
-
-static bool
-netdev_dpdk_vhost_rxq_enabled(struct netdev_rxq *rxq)
-{
-    struct netdev_dpdk *dev = netdev_dpdk_cast(rxq->netdev);
-
-    return dev->vhost_rxq_enabled[rxq->queue_id];
-}
-
 static int
 netdev_dpdk_rxq_recv(struct netdev_rxq *rxq, struct dp_packet_batch *batch,
                      int *qfill)
@@ -2553,122 +2266,6 @@ netdev_dpdk_filter_packet_len(struct netdev_dpdk *dev, struct rte_mbuf **pkts,
     return cnt;
 }
 
-static inline void
-netdev_dpdk_vhost_update_tx_counters(struct netdev_dpdk *dev,
-                                     struct dp_packet **packets,
-                                     int attempted,
-                                     struct netdev_dpdk_sw_stats *sw_stats_add)
-{
-    int dropped = sw_stats_add->tx_mtu_exceeded_drops +
-                  sw_stats_add->tx_qos_drops +
-                  sw_stats_add->tx_failure_drops +
-                  sw_stats_add->tx_invalid_hwol_drops;
-    struct netdev_stats *stats = &dev->stats;
-    int sent = attempted - dropped;
-    int i;
-
-    stats->tx_packets += sent;
-    stats->tx_dropped += dropped;
-
-    for (i = 0; i < sent; i++) {
-        stats->tx_bytes += dp_packet_size(packets[i]);
-    }
-
-    if (OVS_UNLIKELY(dropped || sw_stats_add->tx_retries)) {
-        struct netdev_dpdk_sw_stats *sw_stats = dev->sw_stats;
-
-        sw_stats->tx_retries += sw_stats_add->tx_retries;
-        sw_stats->tx_failure_drops += sw_stats_add->tx_failure_drops;
-        sw_stats->tx_mtu_exceeded_drops += sw_stats_add->tx_mtu_exceeded_drops;
-        sw_stats->tx_qos_drops += sw_stats_add->tx_qos_drops;
-        sw_stats->tx_invalid_hwol_drops += sw_stats_add->tx_invalid_hwol_drops;
-    }
-}
-
-static void
-__netdev_dpdk_vhost_send(struct netdev *netdev, int qid,
-                         struct dp_packet **pkts, int cnt)
-{
-    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
-    struct rte_mbuf **cur_pkts = (struct rte_mbuf **) pkts;
-    struct netdev_dpdk_sw_stats sw_stats_add;
-    unsigned int n_packets_to_free = cnt;
-    unsigned int total_packets = cnt;
-    int i, retries = 0;
-    int max_retries = VHOST_ENQ_RETRY_MIN;
-    int vid = netdev_dpdk_get_vid(dev);
-
-    qid = dev->tx_q[qid % netdev->n_txq].map;
-
-    if (OVS_UNLIKELY(vid < 0 || !dev->vhost_reconfigured || qid < 0
-                     || !(dev->flags & NETDEV_UP))) {
-        rte_spinlock_lock(&dev->stats_lock);
-        dev->stats.tx_dropped+= cnt;
-        rte_spinlock_unlock(&dev->stats_lock);
-        goto out;
-    }
-
-    if (OVS_UNLIKELY(!rte_spinlock_trylock(&dev->tx_q[qid].tx_lock))) {
-        COVERAGE_INC(vhost_tx_contention);
-        rte_spinlock_lock(&dev->tx_q[qid].tx_lock);
-    }
-
-    sw_stats_add.tx_invalid_hwol_drops = cnt;
-    if (userspace_tso_enabled()) {
-        cnt = netdev_dpdk_prep_hwol_batch(dev, cur_pkts, cnt);
-    }
-
-    sw_stats_add.tx_invalid_hwol_drops -= cnt;
-    sw_stats_add.tx_mtu_exceeded_drops = cnt;
-    cnt = netdev_dpdk_filter_packet_len(dev, cur_pkts, cnt);
-    sw_stats_add.tx_mtu_exceeded_drops -= cnt;
-
-    /* Check has QoS has been configured for the netdev */
-    sw_stats_add.tx_qos_drops = cnt;
-    cnt = netdev_dpdk_qos_run(dev, cur_pkts, cnt, true);
-    sw_stats_add.tx_qos_drops -= cnt;
-
-    n_packets_to_free = cnt;
-
-    do {
-        int vhost_qid = qid * VIRTIO_QNUM + VIRTIO_RXQ;
-        unsigned int tx_pkts;
-
-        tx_pkts = rte_vhost_enqueue_burst(vid, vhost_qid, cur_pkts, cnt);
-        if (OVS_LIKELY(tx_pkts)) {
-            /* Packets have been sent.*/
-            cnt -= tx_pkts;
-            /* Prepare for possible retry.*/
-            cur_pkts = &cur_pkts[tx_pkts];
-            if (OVS_UNLIKELY(cnt && !retries)) {
-                /*
-                 * Read max retries as there are packets not sent
-                 * and no retries have already occurred.
-                 */
-                atomic_read_relaxed(&dev->vhost_tx_retries_max, &max_retries);
-            }
-        } else {
-            /* No packets sent - do not retry.*/
-            break;
-        }
-    } while (cnt && (retries++ < max_retries));
-
-    rte_spinlock_unlock(&dev->tx_q[qid].tx_lock);
-
-    sw_stats_add.tx_failure_drops = cnt;
-    sw_stats_add.tx_retries = MIN(retries, max_retries);
-
-    rte_spinlock_lock(&dev->stats_lock);
-    netdev_dpdk_vhost_update_tx_counters(dev, pkts, total_packets,
-                                         &sw_stats_add);
-    rte_spinlock_unlock(&dev->stats_lock);
-
-out:
-    for (i = 0; i < n_packets_to_free; i++) {
-        dp_packet_delete(pkts[i]);
-    }
-}
-
 static void
 netdev_dpdk_extbuf_free(void *addr OVS_UNUSED, void *opaque)
 {
@@ -2825,13 +2422,9 @@ dpdk_do_tx_copy(struct netdev *netdev, int qid, struct dp_packet_batch *batch)
     }
 
     if (OVS_LIKELY(txcnt)) {
-        if (dev->type == DPDK_DEV_VHOST) {
-            __netdev_dpdk_vhost_send(netdev, qid, pkts, txcnt);
-        } else {
             tx_failure += netdev_dpdk_eth_tx_burst(dev, qid,
                                                    (struct rte_mbuf **)pkts,
                                                    txcnt);
-        }
     }
 
     dropped += qos_drops + mtu_drops + tx_failure;
@@ -2845,22 +2438,6 @@ dpdk_do_tx_copy(struct netdev *netdev, int qid, struct dp_packet_batch *batch)
     }
 }
 
-static int
-netdev_dpdk_vhost_send(struct netdev *netdev, int qid,
-                       struct dp_packet_batch *batch,
-                       bool concurrent_txq OVS_UNUSED)
-{
-
-    if (OVS_UNLIKELY(batch->packets[0]->source != DPBUF_DPDK)) {
-        dpdk_do_tx_copy(netdev, qid, batch);
-        dp_packet_delete_batch(batch, true);
-    } else {
-        __netdev_dpdk_vhost_send(netdev, qid, batch->packets,
-                                 dp_packet_batch_size(batch));
-    }
-    return 0;
-}
-
 static inline void
 netdev_dpdk_send__(struct netdev_dpdk *dev, int qid,
                    struct dp_packet_batch *batch,
@@ -3581,63 +3158,6 @@ netdev_dpdk_update_flags(struct netdev *netdev,
     return error;
 }
 
-static int
-netdev_dpdk_vhost_user_get_status(const struct netdev *netdev,
-                                  struct smap *args)
-{
-    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
-
-    ovs_mutex_lock(&dev->mutex);
-
-    bool client_mode = dev->vhost_driver_flags & RTE_VHOST_USER_CLIENT;
-    smap_add_format(args, "mode", "%s", client_mode ? "client" : "server");
-
-    int vid = netdev_dpdk_get_vid(dev);
-    if (vid < 0) {
-        smap_add_format(args, "status", "disconnected");
-        ovs_mutex_unlock(&dev->mutex);
-        return 0;
-    } else {
-        smap_add_format(args, "status", "connected");
-    }
-
-    char socket_name[PATH_MAX];
-    if (!rte_vhost_get_ifname(vid, socket_name, PATH_MAX)) {
-        smap_add_format(args, "socket", "%s", socket_name);
-    }
-
-    uint64_t features;
-    if (!rte_vhost_get_negotiated_features(vid, &features)) {
-        smap_add_format(args, "features", "0x%016"PRIx64, features);
-    }
-
-    uint16_t mtu;
-    if (!rte_vhost_get_mtu(vid, &mtu)) {
-        smap_add_format(args, "mtu", "%d", mtu);
-    }
-
-    int numa = rte_vhost_get_numa_node(vid);
-    if (numa >= 0) {
-        smap_add_format(args, "numa", "%d", numa);
-    }
-
-    uint16_t vring_num = rte_vhost_get_vring_num(vid);
-    if (vring_num) {
-        smap_add_format(args, "num_of_vrings", "%d", vring_num);
-    }
-
-    for (int i = 0; i < vring_num; i++) {
-        struct rte_vhost_vring vring;
-
-        rte_vhost_get_vhost_vring(vid, i, &vring);
-        smap_add_nocopy(args, xasprintf("vring_%d_size", i),
-                        xasprintf("%d", vring.size));
-    }
-
-    ovs_mutex_unlock(&dev->mutex);
-    return 0;
-}
-
 /*
  * Convert a given uint32_t link speed defined in DPDK to a string
  * equivalent.
@@ -3713,7 +3233,7 @@ netdev_dpdk_get_status(const struct netdev *netdev, struct smap *args) enum { IF_TYPE_ETHERNETCSMACD = 6 }; smap_add_format(args, "if_type", "%"PRIu32, IF_TYPE_ETHERNETCSMACD); - smap_add_format(args, "if_descr", "%s %s", rte_version(), + smap_add_format(args, "if_descr", "%s %s", "rte_version()", dev_info.driver_name); smap_add_format(args, "pci-vendor_id", "0x%x", vendor_id); smap_add_format(args, "pci-device_id", "0x%x", device_id); @@ -3871,7 +3391,7 @@ netdev_dpdk_get_mempool_info(struct unixctl_conn *conn, goto out; } } - +#ifndef _WIN32 stream = open_memstream(&response, &size); if (!stream) { response = xasprintf("Unable to open memstream: %s.", @@ -3879,7 +3399,7 @@ netdev_dpdk_get_mempool_info(struct unixctl_conn *conn, unixctl_command_reply_error(conn, response); goto out; } - +#endif if (netdev) { struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); @@ -3904,19 +3424,6 @@ out: netdev_close(netdev); } -/* - * Set virtqueue flags so that we do not receive interrupts. - */ -static void -set_irq_status(int vid) -{ - uint32_t i; - - for (i = 0; i < rte_vhost_get_vring_num(vid); i++) { - rte_vhost_enable_guest_notification(vid, i, 0); - } -} - /* * Fixes mapping for vhost-user tx queues. Must be called after each * enabling/disabling of queues and n_txq modifications. @@ -3966,73 +3473,6 @@ netdev_dpdk_remap_txqs(struct netdev_dpdk *dev) free(enabled_queues); } -/* - * A new virtio-net device is added to a vhost port. - */ -static int -new_device(int vid) -{ - struct netdev_dpdk *dev; - bool exists = false; - int newnode = 0; - char ifname[IF_NAME_SZ]; - - rte_vhost_get_ifname(vid, ifname, sizeof ifname); - - ovs_mutex_lock(&dpdk_mutex); - /* Add device to the vhost port with the same name as that passed down. 
*/ - LIST_FOR_EACH(dev, list_node, &dpdk_list) { - ovs_mutex_lock(&dev->mutex); - if (nullable_string_is_equal(ifname, dev->vhost_id)) { - uint32_t qp_num = rte_vhost_get_vring_num(vid) / VIRTIO_QNUM; - - /* Get NUMA information */ - newnode = rte_vhost_get_numa_node(vid); - if (newnode == -1) { -#ifdef VHOST_NUMA - VLOG_INFO("Error getting NUMA info for vHost Device '%s'", - ifname); -#endif - newnode = dev->socket_id; - } - - if (dev->requested_n_txq < qp_num - || dev->requested_n_rxq < qp_num - || dev->requested_socket_id != newnode) { - dev->requested_socket_id = newnode; - dev->requested_n_rxq = qp_num; - dev->requested_n_txq = qp_num; - netdev_request_reconfigure(&dev->up); - } else { - /* Reconfiguration not required. */ - dev->vhost_reconfigured = true; - } - - ovsrcu_index_set(&dev->vid, vid); - exists = true; - - /* Disable notifications. */ - set_irq_status(vid); - netdev_change_seq_changed(&dev->up); - ovs_mutex_unlock(&dev->mutex); - break; - } - ovs_mutex_unlock(&dev->mutex); - } - ovs_mutex_unlock(&dpdk_mutex); - - if (!exists) { - VLOG_INFO("vHost Device '%s' can't be added - name not found", ifname); - - return -1; - } - - VLOG_INFO("vHost Device '%s' has been added on numa node %i", - ifname, newnode); - - return 0; -} - /* Clears mapping for all available queues of vhost interface. */ static void netdev_dpdk_txq_map_clear(struct netdev_dpdk *dev) @@ -4045,175 +3485,6 @@ netdev_dpdk_txq_map_clear(struct netdev_dpdk *dev) } } -/* - * Remove a virtio-net device from the specific vhost port. Use dev->remove - * flag to stop any more packets from being sent or received to/from a VM and - * ensure all currently queued packets have been sent/received before removing - * the device. 
- */ -static void -destroy_device(int vid) -{ - struct netdev_dpdk *dev; - bool exists = false; - char ifname[IF_NAME_SZ]; - - rte_vhost_get_ifname(vid, ifname, sizeof ifname); - - ovs_mutex_lock(&dpdk_mutex); - LIST_FOR_EACH (dev, list_node, &dpdk_list) { - if (netdev_dpdk_get_vid(dev) == vid) { - - ovs_mutex_lock(&dev->mutex); - dev->vhost_reconfigured = false; - ovsrcu_index_set(&dev->vid, -1); - memset(dev->vhost_rxq_enabled, 0, - dev->up.n_rxq * sizeof *dev->vhost_rxq_enabled); - netdev_dpdk_txq_map_clear(dev); - - netdev_change_seq_changed(&dev->up); - ovs_mutex_unlock(&dev->mutex); - exists = true; - break; - } - } - - ovs_mutex_unlock(&dpdk_mutex); - - if (exists) { - /* - * Wait for other threads to quiesce after setting the 'virtio_dev' - * to NULL, before returning. - */ - ovsrcu_synchronize(); - /* - * As call to ovsrcu_synchronize() will end the quiescent state, - * put thread back into quiescent state before returning. - */ - ovsrcu_quiesce_start(); - VLOG_INFO("vHost Device '%s' has been removed", ifname); - } else { - VLOG_INFO("vHost Device '%s' not found", ifname); - } -} - -static int -vring_state_changed(int vid, uint16_t queue_id, int enable) -{ - struct netdev_dpdk *dev; - bool exists = false; - int qid = queue_id / VIRTIO_QNUM; - bool is_rx = (queue_id % VIRTIO_QNUM) == VIRTIO_TXQ; - char ifname[IF_NAME_SZ]; - - rte_vhost_get_ifname(vid, ifname, sizeof ifname); - - ovs_mutex_lock(&dpdk_mutex); - LIST_FOR_EACH (dev, list_node, &dpdk_list) { - ovs_mutex_lock(&dev->mutex); - if (nullable_string_is_equal(ifname, dev->vhost_id)) { - if (is_rx) { - bool old_state = dev->vhost_rxq_enabled[qid]; - - dev->vhost_rxq_enabled[qid] = enable != 0; - if (old_state != dev->vhost_rxq_enabled[qid]) { - netdev_change_seq_changed(&dev->up); - } - } else { - if (enable) { - dev->tx_q[qid].map = qid; - } else { - dev->tx_q[qid].map = OVS_VHOST_QUEUE_DISABLED; - } - netdev_dpdk_remap_txqs(dev); - } - exists = true; - ovs_mutex_unlock(&dev->mutex); - break; - } - 
ovs_mutex_unlock(&dev->mutex); - } - ovs_mutex_unlock(&dpdk_mutex); - - if (exists) { - VLOG_INFO("State of queue %d ( %s_qid %d ) of vhost device '%s' " - "changed to \'%s\'", queue_id, is_rx == true ? "rx" : "tx", - qid, ifname, (enable == 1) ? "enabled" : "disabled"); - } else { - VLOG_INFO("vHost Device '%s' not found", ifname); - return -1; - } - - return 0; -} - -static void -destroy_connection(int vid) -{ - struct netdev_dpdk *dev; - char ifname[IF_NAME_SZ]; - bool exists = false; - - rte_vhost_get_ifname(vid, ifname, sizeof ifname); - - ovs_mutex_lock(&dpdk_mutex); - LIST_FOR_EACH (dev, list_node, &dpdk_list) { - ovs_mutex_lock(&dev->mutex); - if (nullable_string_is_equal(ifname, dev->vhost_id)) { - uint32_t qp_num = NR_QUEUE; - - if (netdev_dpdk_get_vid(dev) >= 0) { - VLOG_ERR("Connection on socket '%s' destroyed while vhost " - "device still attached.", dev->vhost_id); - } - - /* Restore the number of queue pairs to default. */ - if (dev->requested_n_txq != qp_num - || dev->requested_n_rxq != qp_num) { - dev->requested_n_rxq = qp_num; - dev->requested_n_txq = qp_num; - netdev_request_reconfigure(&dev->up); - } - ovs_mutex_unlock(&dev->mutex); - exists = true; - break; - } - ovs_mutex_unlock(&dev->mutex); - } - ovs_mutex_unlock(&dpdk_mutex); - - if (exists) { - VLOG_INFO("vHost Device '%s' connection has been destroyed", ifname); - } else { - VLOG_INFO("vHost Device '%s' not found", ifname); - } -} - -static -void vhost_guest_notified(int vid OVS_UNUSED) -{ - COVERAGE_INC(vhost_notification); -} - -/* - * Retrieve the DPDK virtio device ID (vid) associated with a vhostuser - * or vhostuserclient netdev. - * - * Returns a value greater or equal to zero for a valid vid or '-1' if - * there is no valid vid associated. A vid of '-1' must not be used in - * rte_vhost_ APi calls. - * - * Once obtained and validated, a vid can be used by a PMD for multiple - * subsequent rte_vhost API calls until the PMD quiesces. 
A PMD should - * not fetch the vid again for each of a series of API calls. - */ - -int -netdev_dpdk_get_vid(const struct netdev_dpdk *dev) -{ - return ovsrcu_index_get(&dev->vid); -} - struct ingress_policer * netdev_dpdk_get_ingress_policer(const struct netdev_dpdk *dev) { @@ -4229,7 +3500,7 @@ netdev_dpdk_class_init(void) * needs to be done only once */ if (ovsthread_once_start(&once)) { int ret; - +#ifndef _WIN32 ovs_thread_create("dpdk_watchdog", dpdk_watchdog, NULL); unixctl_command_register("netdev-dpdk/set-admin-state", "[netdev] up|down", 1, 2, @@ -4250,7 +3521,7 @@ netdev_dpdk_class_init(void) VLOG_ERR("Ethernet device callback register error: %s", rte_strerror(-ret)); } - +#endif ovsthread_once_done(&once); } @@ -5042,158 +4313,6 @@ out: return err; } -static int -dpdk_vhost_reconfigure_helper(struct netdev_dpdk *dev) - OVS_REQUIRES(dev->mutex) -{ - dev->up.n_txq = dev->requested_n_txq; - dev->up.n_rxq = dev->requested_n_rxq; - int err; - - /* Always keep RX queue 0 enabled for implementations that won't - * report vring states. */ - dev->vhost_rxq_enabled[0] = true; - - /* Enable TX queue 0 by default if it wasn't disabled. */ - if (dev->tx_q[0].map == OVS_VHOST_QUEUE_MAP_UNKNOWN) { - dev->tx_q[0].map = 0; - } - - if (userspace_tso_enabled()) { - dev->hw_ol_features |= NETDEV_TX_TSO_OFFLOAD; - VLOG_DBG("%s: TSO enabled on vhost port", netdev_get_name(&dev->up)); - } - - netdev_dpdk_remap_txqs(dev); - - err = netdev_dpdk_mempool_configure(dev); - if (!err) { - /* A new mempool was created or re-used. */ - netdev_change_seq_changed(&dev->up); - } else if (err != EEXIST) { - return err; - } - if (netdev_dpdk_get_vid(dev) >= 0) { - if (dev->vhost_reconfigured == false) { - dev->vhost_reconfigured = true; - /* Carrier status may need updating. 
*/ - netdev_change_seq_changed(&dev->up); - } - } - - return 0; -} - -static int -netdev_dpdk_vhost_reconfigure(struct netdev *netdev) -{ - struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); - int err; - - ovs_mutex_lock(&dev->mutex); - err = dpdk_vhost_reconfigure_helper(dev); - ovs_mutex_unlock(&dev->mutex); - - return err; -} - -static int -netdev_dpdk_vhost_client_reconfigure(struct netdev *netdev) -{ - struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); - int err; - uint64_t vhost_flags = 0; - uint64_t vhost_unsup_flags; - - ovs_mutex_lock(&dev->mutex); - - /* Configure vHost client mode if requested and if the following criteria - * are met: - * 1. Device hasn't been registered yet. - * 2. A path has been specified. - */ - if (!(dev->vhost_driver_flags & RTE_VHOST_USER_CLIENT) && dev->vhost_id) { - /* Register client-mode device. */ - vhost_flags |= RTE_VHOST_USER_CLIENT; - - /* There is no support for multi-segments buffers. */ - vhost_flags |= RTE_VHOST_USER_LINEARBUF_SUPPORT; - - /* Enable IOMMU support, if explicitly requested. */ - if (dpdk_vhost_iommu_enabled()) { - vhost_flags |= RTE_VHOST_USER_IOMMU_SUPPORT; - } - - /* Enable POSTCOPY support, if explicitly requested. */ - if (dpdk_vhost_postcopy_enabled()) { - vhost_flags |= RTE_VHOST_USER_POSTCOPY_SUPPORT; - } - - /* Enable External Buffers if TCP Segmentation Offload is enabled. 
*/ - if (userspace_tso_enabled()) { - vhost_flags |= RTE_VHOST_USER_EXTBUF_SUPPORT; - } - - err = rte_vhost_driver_register(dev->vhost_id, vhost_flags); - if (err) { - VLOG_ERR("vhost-user device setup failure for device %s\n", - dev->vhost_id); - goto unlock; - } else { - /* Configuration successful */ - dev->vhost_driver_flags |= vhost_flags; - VLOG_INFO("vHost User device '%s' created in 'client' mode, " - "using client socket '%s'", - dev->up.name, dev->vhost_id); - } - - err = rte_vhost_driver_callback_register(dev->vhost_id, - &virtio_net_device_ops); - if (err) { - VLOG_ERR("rte_vhost_driver_callback_register failed for " - "vhost user client port: %s\n", dev->up.name); - goto unlock; - } - - if (userspace_tso_enabled()) { - netdev->ol_flags |= NETDEV_TX_OFFLOAD_TCP_TSO; - netdev->ol_flags |= NETDEV_TX_OFFLOAD_TCP_CKSUM; - netdev->ol_flags |= NETDEV_TX_OFFLOAD_UDP_CKSUM; - netdev->ol_flags |= NETDEV_TX_OFFLOAD_SCTP_CKSUM; - netdev->ol_flags |= NETDEV_TX_OFFLOAD_IPV4_CKSUM; - vhost_unsup_flags = 1ULL << VIRTIO_NET_F_HOST_ECN - | 1ULL << VIRTIO_NET_F_HOST_UFO; - } else { - /* This disables checksum offloading and all the features - * that depends on it (TSO, UFO, ECN) according to virtio - * specification. 
*/ - vhost_unsup_flags = 1ULL << VIRTIO_NET_F_CSUM; - } - - err = rte_vhost_driver_disable_features(dev->vhost_id, - vhost_unsup_flags); - if (err) { - VLOG_ERR("rte_vhost_driver_disable_features failed for " - "vhost user client port: %s\n", dev->up.name); - goto unlock; - } - - err = rte_vhost_driver_start(dev->vhost_id); - if (err) { - VLOG_ERR("rte_vhost_driver_start failed for vhost user " - "client port: %s\n", dev->up.name); - goto unlock; - } - } - - err = dpdk_vhost_reconfigure_helper(dev); - -unlock: - ovs_mutex_unlock(&dev->mutex); - - return err; -} - int netdev_dpdk_get_port_id(struct netdev *netdev) { @@ -5462,41 +4581,8 @@ static const struct netdev_class dpdk_class = { .send = netdev_dpdk_eth_send, }; -static const struct netdev_class dpdk_vhost_class = { - .type = "dpdkvhostuser", - NETDEV_DPDK_CLASS_COMMON, - .construct = netdev_dpdk_vhost_construct, - .destruct = netdev_dpdk_vhost_destruct, - .send = netdev_dpdk_vhost_send, - .get_carrier = netdev_dpdk_vhost_get_carrier, - .get_stats = netdev_dpdk_vhost_get_stats, - .get_custom_stats = netdev_dpdk_get_sw_custom_stats, - .get_status = netdev_dpdk_vhost_user_get_status, - .reconfigure = netdev_dpdk_vhost_reconfigure, - .rxq_recv = netdev_dpdk_vhost_rxq_recv, - .rxq_enabled = netdev_dpdk_vhost_rxq_enabled, -}; - -static const struct netdev_class dpdk_vhost_client_class = { - .type = "dpdkvhostuserclient", - NETDEV_DPDK_CLASS_COMMON, - .construct = netdev_dpdk_vhost_client_construct, - .destruct = netdev_dpdk_vhost_destruct, - .set_config = netdev_dpdk_vhost_client_set_config, - .send = netdev_dpdk_vhost_send, - .get_carrier = netdev_dpdk_vhost_get_carrier, - .get_stats = netdev_dpdk_vhost_get_stats, - .get_custom_stats = netdev_dpdk_get_sw_custom_stats, - .get_status = netdev_dpdk_vhost_user_get_status, - .reconfigure = netdev_dpdk_vhost_client_reconfigure, - .rxq_recv = netdev_dpdk_vhost_rxq_recv, - .rxq_enabled = netdev_dpdk_vhost_rxq_enabled, -}; - void netdev_dpdk_register(void) { 
netdev_register_provider(&dpdk_class); - netdev_register_provider(&dpdk_vhost_class); - netdev_register_provider(&dpdk_vhost_client_class); } diff --git a/ovs/lib/sflow_api.h b/ovs/lib/sflow_api.h index 7264fc1..180648d 100644 --- a/ovs/lib/sflow_api.h +++ b/ovs/lib/sflow_api.h @@ -21,6 +21,7 @@ #include #include #include +#include #ifdef SFLOW_DO_SOCKET #include diff --git a/ovs/lib/util.c b/ovs/lib/util.c index 1195c79..3542f48 100644 --- a/ovs/lib/util.c +++ b/ovs/lib/util.c @@ -591,10 +591,6 @@ ovs_set_program_name(const char *argv0, const char *version) size_t max_len = strlen(argv0) + 1; SetErrorMode(GetErrorMode() | SEM_NOGPFAULTERRORBOX); -#if _MSC_VER < 1900 - /* This function is deprecated from 1900 (Visual Studio 2015) */ - _set_output_format(_TWO_DIGIT_EXPONENT); -#endif basename = xmalloc(max_len); _splitpath_s(argv0, NULL, 0, NULL, 0, basename, max_len, NULL, 0); diff --git a/ovs/lib/vlog.c b/ovs/lib/vlog.c index 533f937..00d5d92 100644 --- a/ovs/lib/vlog.c +++ b/ovs/lib/vlog.c @@ -215,9 +215,16 @@ vlog_get_destination_val(const char *name) void vlog_insert_module(struct ovs_list *vlog) { +/* There is a bug here related to linking pthread. + * I think it's related to WHOLEARCHIVE option. + * */ +#ifndef _WIN32 ovs_mutex_lock(&log_file_mutex); +#endif ovs_list_insert(&vlog_modules, vlog); +#ifndef _WIN32 ovs_mutex_unlock(&log_file_mutex); +#endif } /* Returns the name for logging module 'module'. 
*/ diff --git a/ovs/meson.build b/ovs/meson.build index 72d20f2..9d57716 100644 --- a/ovs/meson.build +++ b/ovs/meson.build @@ -9,7 +9,7 @@ project('openvswitch', 'C', version: '2.16.90', license: 'Apache2.0', - default_options: ['buildtype=release', 'default_library=static'], + default_options: ['buildtype=debug', 'default_library=static'], meson_version: '>= 0.59.0' ) @@ -25,6 +25,8 @@ conf_soversion = configuration_data() conf_soversion.set('SOVERSION', as_soversion) cc = meson.get_compiler('c') +enable_dpdk = false +global_inc = [] prog_python = import('python').find_installation('python3') python_env = environment() @@ -42,15 +44,47 @@ if build_machine.system() == 'windows' if get_option('with-pthread') == 'optval' error('"with-pthread" is a required option on Windows') endif + + # suppress all warnings + # add_global_arguments('-w', language : 'c') + + add_global_arguments('-U_MSC_VER', language : 'c') + add_global_arguments('-D__GNUC__', language : 'c') + # add_global_arguments('-std=c11', language : 'c') + add_global_arguments('-std=gnu99', language : 'c') + add_global_arguments('-D_TIMESPEC_DEFINED', language : 'c') + # add_global_arguments('-D_CRT_NO_TIME_T', language : 'c') + add_global_arguments('-D_CRT_NONSTDC_NO_DEPRECATE -D_CRT_SECURE_NO_WARNINGS', language : 'c') + add_global_arguments('-D_CRT_SECURE_NO_DEPRECATE', language : 'c') + + + if get_option('with-dpdk') != 'optval' + dpdk_lib_path = get_option('with-dpdk') + message('DPDK path=', dpdk_lib_path) + cdata.set('DPDK_NETDEV', 1) + cdata.set('HAVE_RTE_CONFIG_H', 1) + dpdk_lib_deps = dependency('libdpdk', required : true, static : true) + enable_dpdk = true + add_global_arguments('-msse3', language : 'c') + add_global_arguments('-mssse3', language : 'c') + add_global_arguments('-U__AVX512F__', language : 'c') + add_project_link_arguments('-Wl,/FORCE:MULTIPLE', language: 'c') + add_project_link_arguments('-Wl,/WHOLEARCHIVE', language: 'c') + add_project_link_arguments('-lmincore', language: 'c') + 
add_project_link_arguments('-ladvapi32', '-lsetupapi', language: 'c') + add_project_link_arguments('-ldbghelp', language: 'c') + endif + win_pthread_path = get_option('with-pthread') thread_dep = cc.find_library( - 'pthreadVC3-w32', + 'pthreadVC3', dirs : [ # Some may place DLL and LIB files under 'lib' subfolder win_pthread_path, win_pthread_path + '/lib', ], required : true, + static: true, ) windows_libs = [ 'Ws2_32', # Resolve symbols for Winsock @@ -65,6 +99,19 @@ if build_machine.system() == 'windows' required : true, ) endforeach + + enable_mlx5 = false + if enable_mlx5 + res_lib = run_command(prog_python, '-c', 'import os; print(os.environ["DEVX_LIB_PATH"])') + devx_lib_dir = res_lib.stdout().strip() + mlx5devx_lib = cc.find_library('mlx5devx', dirs: devx_lib_dir, required: true, static : false) + windows_deps += mlx5devx_lib + + res_inc = run_command(prog_python, '-c', 'import os; print(os.environ["DEVX_INC_PATH"])') + devx_inc_dir = res_inc.stdout().strip() + global_inc += include_directories(devx_inc_dir) + endif + endif check_headers = [ @@ -236,12 +283,25 @@ subdir('include') cin_processing = configuration_data() cin_processing.set('BANNER', '/* -*- mode: c; buffer-read-only: t -*- */') # this a temporary hack and may break backwards compatability with autotools cin_processing.set('srcdir', meson.project_source_root()) -cin_processing.set_quoted('LOGDIR', '/var/log/openvswitch') -cin_processing.set_quoted('RUNDIR', '/var/run/openvswitch') -cin_processing.set_quoted('DBDIR', '/etc/openvswitch') -cin_processing.set_quoted('bindir', '/bin') -cin_processing.set_quoted('sysconfdir', '/etc') -cin_processing.set_quoted('pkgdatadir', '/usr/share/openvswitch') + + +if build_machine.system() == 'windows' + install_prefix = 'c:/openvswitch/usr/' + cin_processing.set_quoted('LOGDIR', 'c:/openvswitch/var/log/') + cin_processing.set_quoted('RUNDIR', 'c:/openvswitch/var/run/') + cin_processing.set_quoted('DBDIR', 'c:/openvswitch/etc/') + 
cin_processing.set_quoted('bindir', 'c:/openvswitch/bin') + cin_processing.set_quoted('sysconfdir', 'c:/openvswitch/etc') + cin_processing.set_quoted('pkgdatadir', 'c:/openvswitch/usr/share/') +else + install_prefix = '/usr/local/' + cin_processing.set_quoted('LOGDIR', '/var/log/openvswitch') + cin_processing.set_quoted('RUNDIR', '/var/run/openvswitch') + cin_processing.set_quoted('DBDIR', '/etc/openvswitch') + cin_processing.set_quoted('bindir', '/bin') + cin_processing.set_quoted('sysconfdir', '/etc') + cin_processing.set_quoted('pkgdatadir', '/usr/share/openvswitch') +endif subdir('python/ovs') @@ -257,14 +317,14 @@ foreach cin : c_in_files ) endforeach -global_inc = include_directories( +global_inc += include_directories( '.', 'include', 'lib', ) if build_machine.system() == 'windows' - global_inc = include_directories( + global_inc += include_directories( '.', 'include', 'lib', @@ -274,6 +334,12 @@ if build_machine.system() == 'windows' win_pthread_path, win_pthread_path + '/include', ) + if enable_dpdk + dpdk_lib_path = get_option('with-dpdk') + global_inc += include_directories( + dpdk_lib_path + '/include', + ) + endif endif global_libs = [] @@ -306,8 +372,6 @@ add_project_arguments( language : 'c', ) -install_prefix = '/usr/local/' - lib_ovsdb_server_idl_ovsidl = custom_target( 'ovsdb-server-idl.ovsidl', input : [ diff --git a/ovs/meson_options.txt b/ovs/meson_options.txt index 2631b26..e857de9 100644 --- a/ovs/meson_options.txt +++ b/ovs/meson_options.txt @@ -1,2 +1,4 @@ option('with-pthread', type : 'string', value : 'optval', description : 'Path to the POSIX threads API library for Windows') +option('with-dpdk', type : 'string', value : 'optval', + description : 'Path to DPDK library for Windows') diff --git a/ovs/ofproto/meson.build b/ovs/ofproto/meson.build index 4f3b3fe..f318660 100644 --- a/ovs/ofproto/meson.build +++ b/ovs/ofproto/meson.build @@ -44,6 +44,9 @@ sources += [ ] deps = [] +if enable_dpdk + deps += dpdk_lib_deps +endif mapfile = 
meson.project_build_root() + '/ofproto.map' vflag = '-Wl'.format(meson.current_source_dir(), mapfile) diff --git a/ovs/ovsdb/meson.build b/ovs/ovsdb/meson.build index 1f7299e..206708e 100644 --- a/ovs/ovsdb/meson.build +++ b/ovs/ovsdb/meson.build @@ -43,6 +43,9 @@ sources = files( ) deps = [] +if enable_dpdk + deps += dpdk_lib_deps +endif mapfile = meson.project_build_root() + '/ovsdb.map' vflag = '-Wl'.format(meson.current_source_dir(), mapfile) diff --git a/ovs/utilities/meson.build b/ovs/utilities/meson.build index 49c7200..f4e62e5 100644 --- a/ovs/utilities/meson.build +++ b/ovs/utilities/meson.build @@ -6,6 +6,9 @@ sources = files( ) deps = [] +if enable_dpdk + deps += dpdk_lib_deps +endif executable('ovs-appctl', sources, dependencies : deps, diff --git a/ovs/vswitchd/meson.build b/ovs/vswitchd/meson.build index b8e3298..b438009 100644 --- a/ovs/vswitchd/meson.build +++ b/ovs/vswitchd/meson.build @@ -16,9 +16,19 @@ sources += lib_vswitch_idl_h deps = [] + +if enable_dpdk + deps += dpdk_lib_deps +endif + if build_machine.system() == 'linux' deps += m_dep endif +if build_machine.system() == 'windows' + deps += windows_deps +# deps += mlx5devx_lib + message('vswitchd: add windows_deps') +endif executable('ovs-vswitchd', sources, dependencies : deps, diff --git a/ovs/vtep/meson.build b/ovs/vtep/meson.build index abdae7e..7f9f3c1 100644 --- a/ovs/vtep/meson.build +++ b/ovs/vtep/meson.build @@ -40,6 +40,9 @@ sources = [ ] deps = [] +if enable_dpdk + deps += dpdk_lib_deps +endif mapfile = meson.project_build_root() + '/vtep.map' vflag = '-Wl'.format(meson.current_source_dir(), mapfile)
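The vhost transmit path that this patch removes (`__netdev_dpdk_vhost_send()`) works as follows. It re-submits the unsent tail of the batch to `rte_vhost_enqueue_burst()` only while each call makes progress, up to a retry limit read from `vhost_tx_retries_max`. It then records `MIN(retries, max_retries)` as the retry statistic. The code below is a minimal, self-contained sketch of that loop shape under stated assumptions. `mock_enqueue()` is a hypothetical stand-in for the DPDK call, and no real vhost device is involved. `send_with_retries()` is an illustrative helper, not an OVS function. The sketch also leaves out the lazy read of the retry limit and the per-queue `tx_lock`.

```c
#include <assert.h>

#define MIN(a, b) ((a) < (b) ? (a) : (b))

/* Hypothetical stand-in for rte_vhost_enqueue_burst(): on call number
 * 'call' the virtqueue has room for at most caps[call] packets; once the
 * capacity list is exhausted, the ring is treated as full (room 0). */
static unsigned int
mock_enqueue(const unsigned int *caps, int n_caps, int call, unsigned int cnt)
{
    unsigned int room = call < n_caps ? caps[call] : 0;

    return cnt < room ? cnt : room;
}

/* Sketch of the removed retry loop: keep enqueueing the unsent tail while
 * each attempt makes progress, allowing at most 'max_retries' extra
 * attempts. Returns the number of packets left unsent (counted as
 * tx_failure_drops in the original) and stores the clamped retry count
 * in *retries_out. */
static unsigned int
send_with_retries(const unsigned int *caps, int n_caps, unsigned int cnt,
                  int max_retries, int *retries_out)
{
    int retries = 0;
    int call = 0;

    do {
        unsigned int sent = mock_enqueue(caps, n_caps, call++, cnt);

        assert(sent <= cnt);
        if (!sent) {
            break;              /* No packets sent: do not retry. */
        }
        cnt -= sent;            /* Skip past the packets that went out. */
    } while (cnt && retries++ < max_retries);

    /* 'retries' is post-incremented in the loop condition, so it can end
     * one above the limit; clamp it as the original code does. */
    *retries_out = MIN(retries, max_retries);
    return cnt;
}
```

For example, take per-call capacities of {10, 5, 3} followed by a full ring. Sending 32 packets with a limit of 8 leaves 14 unsent after 3 retries. With a limit of 1, the loop stops with 17 unsent. In that case the post-incremented counter reaches 2, which is why the caller clamps it with `MIN()`.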