Message ID | 20200104011326.92161-2-yihung.wei@gmail.com |
---|---|
State | Accepted |
Delegated to: | Ilya Maximets |
Headers | show |
Series | [ovs-dev,v9,1/2] netdev-linux: Detect numa node id. | expand |
On Fri, Jan 3, 2020 at 5:13 PM Yi-Hung Wei <yihung.wei@gmail.com> wrote: > > Currently, the AF_XDP socket (XSK) related memory are allocated by main > thread in the main thread's NUMA domain. With the patch that detects > netdev-linux's NUMA node id, the PMD thread of AF_XDP port will be run on > the AF_XDP netdev's NUMA domain. If the net device's NUMA domain > is different from the main thread's NUMA domain, we will have two > cross-NUMA memory accesses (netdev <-> memory, memory <-> CPU). > > This patch addresses the aforementioned issue by allocating > the memory in the net device's NUMA domain. > > Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com> Thanks for working on this! Acked-by: William Tu <u9012063@gmail.com>
On 04.01.2020 02:13, Yi-Hung Wei wrote: > Currently, the AF_XDP socket (XSK) related memory are allocated by main > thread in the main thread's NUMA domain. With the patch that detects > netdev-linux's NUMA node id, the PMD thread of AF_XDP port will be run on > the AF_XDP netdev's NUMA domain. If the net device's NUMA domain > is different from the main thread's NUMA domain, we will have two > cross-NUMA memory accesses (netdev <-> memory, memory <-> CPU). > > This patch addresses the aforementioned issue by allocating > the memory in the net device's NUMA domain. > > Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com> Suggesting following incremental for both patches: --- diff --git a/lib/netdev-afxdp.c b/lib/netdev-afxdp.c index 0e43c09ee..ae55944d4 100644 --- a/lib/netdev-afxdp.c +++ b/lib/netdev-afxdp.c @@ -674,7 +674,7 @@ netdev_afxdp_reconfigure(struct netdev *netdev) /* Allocate all the xsk related memory in the netdev's NUMA domain. */ struct bitmask *old_bm = NULL; int old_policy, numa_id; - if (numa_available() != -1) { + if (numa_available() != -1 && ovs_numa_get_n_numas() > 1) { numa_id = netdev_get_numa_id(netdev); if (numa_id != NETDEV_NUMA_UNSPEC) { old_bm = numa_allocate_nodemask(); @@ -723,6 +723,9 @@ out: if (old_bm) { if (set_mempolicy(old_policy, old_bm->maskp, old_bm->size + 1)) { VLOG_WARN("Failed to restore NUMA memory policy."); + /* Can't restore correctly. Try to use localalloc as the most + * likely default memory policy. */ + numa_set_localalloc(); } numa_bitmask_free(old_bm); } diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c index af2a34aa9..e1ef58bef 100644 --- a/lib/netdev-linux.c +++ b/lib/netdev-linux.c @@ -1459,6 +1459,11 @@ netdev_linux_get_numa_id__(struct netdev_linux *netdev) netdev->numa_id = 0; netdev->cache_valid |= VALID_NUMA_ID; + if (ovs_numa_get_n_numas() < 2) { + /* No need to check on system with a single NUMA node. */ + return 0; + } + name = netdev_get_name(&netdev->up); if (strpbrk(name, "/\\")) { VLOG_ERR_RL(&rl, "\"%s\" is not a valid name for a port. " --- It solves 3 issues: 1. Avoiding warning while using physical device on system without NUMA topology. 2. Attempt to restore memory policy to default and less likely to fail in case real restoring failed. 3. Saving some time by avoiding checking NUMA node on system without NUMA. If you're OK with that, I could squash this in while applying the patch. Best regards, Ilya Maximets.
On Fri, Jan 17, 2020 at 2:58 PM Ilya Maximets <i.maximets@ovn.org> wrote: > > On 04.01.2020 02:13, Yi-Hung Wei wrote: > > Currently, the AF_XDP socket (XSK) related memory are allocated by main > > thread in the main thread's NUMA domain. With the patch that detects > > netdev-linux's NUMA node id, the PMD thread of AF_XDP port will be run on > > the AF_XDP netdev's NUMA domain. If the net device's NUMA domain > > is different from the main thread's NUMA domain, we will have two > > cross-NUMA memory accesses (netdev <-> memory, memory <-> CPU). > > > > This patch addresses the aforementioned issue by allocating > > the memory in the net device's NUMA domain. > > > > Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com> > > > Suggesting following incremental for both patches: > > --- > diff --git a/lib/netdev-afxdp.c b/lib/netdev-afxdp.c > index 0e43c09ee..ae55944d4 100644 > --- a/lib/netdev-afxdp.c > +++ b/lib/netdev-afxdp.c > @@ -674,7 +674,7 @@ netdev_afxdp_reconfigure(struct netdev *netdev) > /* Allocate all the xsk related memory in the netdev's NUMA domain. */ > struct bitmask *old_bm = NULL; > int old_policy, numa_id; > - if (numa_available() != -1) { > + if (numa_available() != -1 && ovs_numa_get_n_numas() > 1) { > numa_id = netdev_get_numa_id(netdev); > if (numa_id != NETDEV_NUMA_UNSPEC) { > old_bm = numa_allocate_nodemask(); > @@ -723,6 +723,9 @@ out: > if (old_bm) { > if (set_mempolicy(old_policy, old_bm->maskp, old_bm->size + 1)) { > VLOG_WARN("Failed to restore NUMA memory policy."); > + /* Can't restore correctly. Try to use localalloc as the most > + * likely default memory policy. */ > + numa_set_localalloc(); > } > numa_bitmask_free(old_bm); > } > diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c > index af2a34aa9..e1ef58bef 100644 > --- a/lib/netdev-linux.c > +++ b/lib/netdev-linux.c > @@ -1459,6 +1459,11 @@ netdev_linux_get_numa_id__(struct netdev_linux *netdev) > netdev->numa_id = 0; > netdev->cache_valid |= VALID_NUMA_ID; > > + if (ovs_numa_get_n_numas() < 2) { > + /* No need to check on system with a single NUMA node. */ > + return 0; > + } > + > name = netdev_get_name(&netdev->up); > if (strpbrk(name, "/\\")) { > VLOG_ERR_RL(&rl, "\"%s\" is not a valid name for a port. " > > --- > > > It solves 3 issues: > 1. Avoiding warning while using physical device on system without NUMA topology. > 2. Attempt to restore memory policy to default and less likely to fail in case > real restoring failed. > 3. Saving some time by avoiding checking NUMA node on system without NUMA. > > > If you're OK with that, I could squash this in while applying the patch. > Hi Ilya, Thanks for the diff! Yi-Hung is on vacation, and yes, the change looks good to me. Regards, William
On 18.01.2020 00:19, William Tu wrote: > On Fri, Jan 17, 2020 at 2:58 PM Ilya Maximets <i.maximets@ovn.org> wrote: >> >> On 04.01.2020 02:13, Yi-Hung Wei wrote: >>> Currently, the AF_XDP socket (XSK) related memory are allocated by main >>> thread in the main thread's NUMA domain. With the patch that detects >>> netdev-linux's NUMA node id, the PMD thread of AF_XDP port will be run on >>> the AF_XDP netdev's NUMA domain. If the net device's NUMA domain >>> is different from the main thread's NUMA domain, we will have two >>> cross-NUMA memory accesses (netdev <-> memory, memory <-> CPU). >>> >>> This patch addresses the aforementioned issue by allocating >>> the memory in the net device's NUMA domain. >>> >>> Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com> >> >> >> Suggesting following incremental for both patches: >> >> --- >> diff --git a/lib/netdev-afxdp.c b/lib/netdev-afxdp.c >> index 0e43c09ee..ae55944d4 100644 >> --- a/lib/netdev-afxdp.c >> +++ b/lib/netdev-afxdp.c >> @@ -674,7 +674,7 @@ netdev_afxdp_reconfigure(struct netdev *netdev) >> /* Allocate all the xsk related memory in the netdev's NUMA domain. */ >> struct bitmask *old_bm = NULL; >> int old_policy, numa_id; >> - if (numa_available() != -1) { >> + if (numa_available() != -1 && ovs_numa_get_n_numas() > 1) { >> numa_id = netdev_get_numa_id(netdev); >> if (numa_id != NETDEV_NUMA_UNSPEC) { >> old_bm = numa_allocate_nodemask(); >> @@ -723,6 +723,9 @@ out: >> if (old_bm) { >> if (set_mempolicy(old_policy, old_bm->maskp, old_bm->size + 1)) { >> VLOG_WARN("Failed to restore NUMA memory policy."); >> + /* Can't restore correctly. Try to use localalloc as the most >> + * likely default memory policy. */ >> + numa_set_localalloc(); >> } >> numa_bitmask_free(old_bm); >> } >> diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c >> index af2a34aa9..e1ef58bef 100644 >> --- a/lib/netdev-linux.c >> +++ b/lib/netdev-linux.c >> @@ -1459,6 +1459,11 @@ netdev_linux_get_numa_id__(struct netdev_linux *netdev) >> netdev->numa_id = 0; >> netdev->cache_valid |= VALID_NUMA_ID; >> >> + if (ovs_numa_get_n_numas() < 2) { >> + /* No need to check on system with a single NUMA node. */ >> + return 0; >> + } >> + >> name = netdev_get_name(&netdev->up); >> if (strpbrk(name, "/\\")) { >> VLOG_ERR_RL(&rl, "\"%s\" is not a valid name for a port. " >> >> --- >> >> >> It solves 3 issues: >> 1. Avoiding warning while using physical device on system without NUMA topology. >> 2. Attempt to restore memory policy to default and less likely to fail in case >> real restoring failed. >> 3. Saving some time by avoiding checking NUMA node on system without NUMA. >> >> >> If you're OK with that, I could squash this in while applying the patch. >> > > Hi Ilya, > > Thanks for the diff! > Yi-Hung is on vacation, and yes, the change looks good to me. Thanks! I folded this in and applied both patches to master. Best regards, Ilya Maximets.
diff --git a/Documentation/intro/install/afxdp.rst b/Documentation/intro/install/afxdp.rst index 7b0736c96114..c4685fa7ebac 100644 --- a/Documentation/intro/install/afxdp.rst +++ b/Documentation/intro/install/afxdp.rst @@ -164,7 +164,7 @@ If a test case fails, check the log at:: Setup AF_XDP netdev ------------------- -Before running OVS with AF_XDP, make sure the libbpf and libelf are +Before running OVS with AF_XDP, make sure the libbpf, libelf, and libnuma are set-up right:: ldd vswitchd/ovs-vswitchd diff --git a/acinclude.m4 b/acinclude.m4 index 542637ac8cb8..f73dc9bf7e3c 100644 --- a/acinclude.m4 +++ b/acinclude.m4 @@ -286,6 +286,8 @@ AC_DEFUN([OVS_CHECK_LINUX_AF_XDP], [ AC_CHECK_FUNCS([pthread_spin_lock], [], [AC_MSG_ERROR([unable to find pthread_spin_lock for AF_XDP support])]) + OVS_FIND_DEPENDENCY([numa_alloc_onnode], [numa], [libnuma]) + AC_DEFINE([HAVE_AF_XDP], [1], [Define to 1 if AF_XDP support is available and enabled.]) LIBBPF_LDADD=" -lbpf -lelf" diff --git a/include/sparse/automake.mk b/include/sparse/automake.mk index 073631e8c082..974ad3fe55f7 100644 --- a/include/sparse/automake.mk +++ b/include/sparse/automake.mk @@ -5,6 +5,7 @@ noinst_HEADERS += \ include/sparse/bits/floatn.h \ include/sparse/assert.h \ include/sparse/math.h \ + include/sparse/numa.h \ include/sparse/netinet/in.h \ include/sparse/netinet/ip6.h \ include/sparse/netpacket/packet.h \ diff --git a/include/sparse/numa.h b/include/sparse/numa.h new file mode 100644 index 000000000000..3691a0eaf729 --- /dev/null +++ b/include/sparse/numa.h @@ -0,0 +1,27 @@ +/* + * Copyright (c) 2019 Nicira, Inc. + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +#ifndef __CHECKER__ +#error "Use this header only with sparse. It is not a correct implementation." +#endif + +/* Avoid sparse warning: non-ANSI function declaration of function" */ +#define numa_get_membind_compat() numa_get_membind_compat(void) +#define numa_get_interleave_mask_compat() numa_get_interleave_mask_compat(void) +#define numa_get_run_node_mask_compat() numa_get_run_node_mask_compat(void) + +/* Get actual <numa.h> definitions for us to annotate and build on. */ +#include_next<numa.h> diff --git a/lib/netdev-afxdp.c b/lib/netdev-afxdp.c index 426dce944977..fad0aab9d550 100644 --- a/lib/netdev-afxdp.c +++ b/lib/netdev-afxdp.c @@ -26,6 +26,8 @@ #include <linux/rtnetlink.h> #include <linux/if_xdp.h> #include <net/if.h> +#include <numa.h> +#include <numaif.h> #include <poll.h> #include <stdlib.h> #include <sys/resource.h> @@ -669,6 +671,24 @@ netdev_afxdp_reconfigure(struct netdev *netdev) struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY}; int err = 0; + /* Allocate all the xsk related memory in the netdev's NUMA domain. */ + struct bitmask *old_bm = NULL; + int old_policy, numa_id; + if (numa_available() != -1) { + numa_id = netdev_get_numa_id(netdev); + if (numa_id != NETDEV_NUMA_UNSPEC) { + old_bm = numa_allocate_nodemask(); + if (get_mempolicy(&old_policy, old_bm->maskp, old_bm->size + 1, + NULL, 0)) { + VLOG_INFO("Failed to get NUMA memory policy."); + numa_bitmask_free(old_bm); + old_bm = NULL; + } else { + numa_set_preferred(numa_id); + } + } + } + ovs_mutex_lock(&dev->mutex); if (netdev->n_rxq == dev->requested_n_rxq @@ -700,6 +720,12 @@ netdev_afxdp_reconfigure(struct netdev *netdev) netdev_change_seq_changed(netdev); out: ovs_mutex_unlock(&dev->mutex); + if (old_bm) { + if (set_mempolicy(old_policy, old_bm->maskp, old_bm->size + 1)) { + VLOG_WARN("Failed to restore NUMA memory policy."); + } + numa_bitmask_free(old_bm); + } return err; }
Currently, the AF_XDP socket (XSK) related memory are allocated by main thread in the main thread's NUMA domain. With the patch that detects netdev-linux's NUMA node id, the PMD thread of AF_XDP port will be run on the AF_XDP netdev's NUMA domain. If the net device's NUMA domain is different from the main thread's NUMA domain, we will have two cross-NUMA memory accesses (netdev <-> memory, memory <-> CPU). This patch addresses the aforementioned issue by allocating the memory in the net device's NUMA domain. Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com> --- v9: - Addreess review comments from Ilya in patch 2. * Add check on numa_available() * Properly restore memory policy with get/set_mempolicy. v8: - Addreess review comments from Eelco and Ilya in patch 2. * Use OVS_FIND_DEPENDENCY(). * Avoid the locking issue when calling netdev_get_numa_id(). * Check NETDEV_NUMA_UNSPEC. * Use return value from netdev_get_numa_id() directly, and check NETDEV_NUMA_UNSPEC case. * Use numa_set_preferred(). --- Documentation/intro/install/afxdp.rst | 2 +- acinclude.m4 | 2 ++ include/sparse/automake.mk | 1 + include/sparse/numa.h | 27 +++++++++++++++++++++++++++ lib/netdev-afxdp.c | 26 ++++++++++++++++++++++++++ 5 files changed, 57 insertions(+), 1 deletion(-) create mode 100644 include/sparse/numa.h