[ovs-dev,v8,2/2] netdev-afxdp: NUMA-aware memory allocation for XSK related memory

Message ID 20191218203106.85695-2-yihung.wei@gmail.com
State Superseded
Series [ovs-dev,v8,1/2] netdev-linux: Detect numa node id.

Commit Message

Yi-Hung Wei Dec. 18, 2019, 8:31 p.m. UTC
Currently, the AF_XDP socket (XSK) related memory is allocated by the main
thread in the main thread's NUMA domain.  With the patch that detects
netdev-linux's NUMA node id, the PMD thread of AF_XDP port will be run on
the AF_XDP netdev's NUMA domain.  If the net device's NUMA domain
is different from the main thread's NUMA domain, we will have two
cross-NUMA memory accesses (netdev <-> memory, memory <-> CPU).

This patch addresses the aforementioned issue by allocating
the memory in the net device's NUMA domain.

Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
---
v8:
  - Address review comments from Eelco and Ilya in patch 2.
    * Use OVS_FIND_DEPENDENCY().
    * Avoid the locking issue when calling netdev_get_numa_id().
    * Check NETDEV_NUMA_UNSPEC.
    * Use return value from netdev_get_numa_id() directly, and
      check NETDEV_NUMA_UNSPEC case.
    * Use numa_set_preferred().

---
 Documentation/intro/install/afxdp.rst |  2 +-
 acinclude.m4                          |  2 ++
 include/sparse/automake.mk            |  1 +
 include/sparse/numa.h                 | 27 +++++++++++++++++++++++++++
 lib/netdev-afxdp.c                    | 13 +++++++++++++
 5 files changed, 44 insertions(+), 1 deletion(-)
 create mode 100644 include/sparse/numa.h

Comments

William Tu Dec. 20, 2019, 6:52 p.m. UTC | #1
On Wed, Dec 18, 2019 at 12:31:06PM -0800, Yi-Hung Wei wrote:
> Currently, the AF_XDP socket (XSK) related memory is allocated by the main
> thread in the main thread's NUMA domain.  With the patch that detects
> netdev-linux's NUMA node id, the PMD thread of AF_XDP port will be run on
> the AF_XDP netdev's NUMA domain.  If the net device's NUMA domain
> is different from the main thread's NUMA domain, we will have two
> cross-NUMA memory accesses (netdev <-> memory, memory <-> CPU).
> 
> This patch addresses the aforementioned issue by allocating
> the memory in the net device's NUMA domain.
> 
> Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>

LGTM, Thanks for working on this!
(I wasn't able to test a NIC on a different NUMA node because I don't
have physical access to the hardware, so I only verified that NUMA id 0 works.)

Tested-by: William Tu <u9012063@gmail.com>
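
To see which NUMA node a NIC sits on before testing, the kernel's sysfs view can be read directly. This is a minimal sketch, assuming the device exposes a `device/numa_node` attribute under `/sys/class/net/` (typical for PCI NICs; virtual devices such as `lo` have none). The helper name `nic_numa_node` is hypothetical and is not taken from either patch in this series.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Returns the NUMA node reported by sysfs for 'ifname', or -1 if the
 * node is unknown (no such device, no backing hardware device, or the
 * kernel itself reports -1). */
static int
nic_numa_node(const char *ifname)
{
    char path[128];
    int node = -1;
    FILE *f;

    assert(ifname != NULL);

    /* Reject empty names, over-long names, and anything with a path
     * separator so that the name cannot escape /sys/class/net/. */
    if (!ifname[0] || strlen(ifname) > 15 || strchr(ifname, '/')) {
        return -1;
    }

    snprintf(path, sizeof path, "/sys/class/net/%s/device/numa_node",
             ifname);
    f = fopen(path, "r");
    if (!f) {
        return -1;
    }
    if (fscanf(f, "%d", &node) != 1) {
        node = -1;
    }
    fclose(f);

    return node < 0 ? -1 : node;
}
```

A result other than 0 on a multi-socket machine would indicate a NIC suitable for exercising the cross-NUMA case described in the commit message.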

Ilya Maximets Jan. 3, 2020, 3:09 p.m. UTC | #2
On 18.12.2019 21:31, Yi-Hung Wei wrote:
> Currently, the AF_XDP socket (XSK) related memory is allocated by the main
> thread in the main thread's NUMA domain.  With the patch that detects
> netdev-linux's NUMA node id, the PMD thread of AF_XDP port will be run on
> the AF_XDP netdev's NUMA domain.  If the net device's NUMA domain
> is different from the main thread's NUMA domain, we will have two
> cross-NUMA memory accesses (netdev <-> memory, memory <-> CPU).
> 
> This patch addresses the aforementioned issue by allocating
> the memory in the net device's NUMA domain.
> 
> Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
> ---
> v8:
>   - Address review comments from Eelco and Ilya in patch 2.
>     * Use OVS_FIND_DEPENDENCY().
>     * Avoid the locking issue when calling netdev_get_numa_id().
>     * Check NETDEV_NUMA_UNSPEC.
>     * Use return value from netdev_get_numa_id() directly, and
>       check NETDEV_NUMA_UNSPEC case.
>     * Use numa_set_preferred().
> 
> ---
>  Documentation/intro/install/afxdp.rst |  2 +-
>  acinclude.m4                          |  2 ++
>  include/sparse/automake.mk            |  1 +
>  include/sparse/numa.h                 | 27 +++++++++++++++++++++++++++
>  lib/netdev-afxdp.c                    | 13 +++++++++++++
>  5 files changed, 44 insertions(+), 1 deletion(-)
>  create mode 100644 include/sparse/numa.h
> 
> diff --git a/Documentation/intro/install/afxdp.rst b/Documentation/intro/install/afxdp.rst
> index 7b0736c96114..c4685fa7ebac 100644
> --- a/Documentation/intro/install/afxdp.rst
> +++ b/Documentation/intro/install/afxdp.rst
> @@ -164,7 +164,7 @@ If a test case fails, check the log at::
>  
>  Setup AF_XDP netdev
>  -------------------
> -Before running OVS with AF_XDP, make sure the libbpf and libelf are
> +Before running OVS with AF_XDP, make sure the libbpf, libelf, and libnuma are
>  set-up right::
>  
>    ldd vswitchd/ovs-vswitchd
> diff --git a/acinclude.m4 b/acinclude.m4
> index 542637ac8cb8..f73dc9bf7e3c 100644
> --- a/acinclude.m4
> +++ b/acinclude.m4
> @@ -286,6 +286,8 @@ AC_DEFUN([OVS_CHECK_LINUX_AF_XDP], [
>      AC_CHECK_FUNCS([pthread_spin_lock], [],
>        [AC_MSG_ERROR([unable to find pthread_spin_lock for AF_XDP support])])
>  
> +    OVS_FIND_DEPENDENCY([numa_alloc_onnode], [numa], [libnuma])
> +
>      AC_DEFINE([HAVE_AF_XDP], [1],
>                [Define to 1 if AF_XDP support is available and enabled.])
>      LIBBPF_LDADD=" -lbpf -lelf"
> diff --git a/include/sparse/automake.mk b/include/sparse/automake.mk
> index 073631e8c082..974ad3fe55f7 100644
> --- a/include/sparse/automake.mk
> +++ b/include/sparse/automake.mk
> @@ -5,6 +5,7 @@ noinst_HEADERS += \
>          include/sparse/bits/floatn.h \
>          include/sparse/assert.h \
>          include/sparse/math.h \
> +        include/sparse/numa.h \
>          include/sparse/netinet/in.h \
>          include/sparse/netinet/ip6.h \
>          include/sparse/netpacket/packet.h \
> diff --git a/include/sparse/numa.h b/include/sparse/numa.h
> new file mode 100644
> index 000000000000..3691a0eaf729
> --- /dev/null
> +++ b/include/sparse/numa.h
> @@ -0,0 +1,27 @@
> +/*
> + * Copyright (c) 2019 Nicira, Inc.
> + *
> + * Licensed under the Apache License, Version 2.0 (the "License");
> + * you may not use this file except in compliance with the License.
> + * You may obtain a copy of the License at:
> + *
> + *     http://www.apache.org/licenses/LICENSE-2.0
> + *
> + * Unless required by applicable law or agreed to in writing, software
> + * distributed under the License is distributed on an "AS IS" BASIS,
> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> + * See the License for the specific language governing permissions and
> + * limitations under the License.
> + */
> +
> +#ifndef __CHECKER__
> +#error "Use this header only with sparse.  It is not a correct implementation."
> +#endif
> +
> +/* Avoid sparse warning: "non-ANSI function declaration of function". */
> +#define numa_get_membind_compat() numa_get_membind_compat(void)
> +#define numa_get_interleave_mask_compat() numa_get_interleave_mask_compat(void)
> +#define numa_get_run_node_mask_compat() numa_get_run_node_mask_compat(void)
> +
> +/* Get actual <numa.h> definitions for us to annotate and build on. */
> +#include_next<numa.h>
> diff --git a/lib/netdev-afxdp.c b/lib/netdev-afxdp.c
> index 91b70b298e57..9a7dd8208f8f 100644
> --- a/lib/netdev-afxdp.c
> +++ b/lib/netdev-afxdp.c
> @@ -26,6 +26,7 @@
>  #include <linux/rtnetlink.h>
>  #include <linux/if_xdp.h>
>  #include <net/if.h>
> +#include <numa.h>
>  #include <poll.h>
>  #include <stdlib.h>
>  #include <sys/resource.h>
> @@ -661,6 +662,14 @@ netdev_afxdp_reconfigure(struct netdev *netdev)
>      struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
>      int err = 0;
>  
> +    /* Allocate all the xsk related memory in the netdev's NUMA domain. */
> +    struct bitmask *old_bm = NULL;
> +    int numa_id = netdev_get_numa_id(netdev);
> +    if (numa_id != NETDEV_NUMA_UNSPEC) {
> +        old_bm = numa_get_membind();
> +        numa_set_preferred(numa_id);
> +    }
> +
>      ovs_mutex_lock(&dev->mutex);
>  
>      if (netdev->n_rxq == dev->requested_n_rxq
> @@ -692,6 +701,10 @@ netdev_afxdp_reconfigure(struct netdev *netdev)
>      netdev_change_seq_changed(netdev);
>  out:
>      ovs_mutex_unlock(&dev->mutex);
> +    if (old_bm) {
> +        numa_set_membind(old_bm);

This will not restore the previous NUMA policy.  It will set the policy
to membind, which the user might not expect.

I don't see a suitable wrapper for that, so it seems like the only
way is to use get/set_mempolicy directly for restoring the original
memory policy.

BTW, you're not allowed to use any libnuma functions if numa_available()
returns -1.  You need to check it first somewhere.

> +        numa_bitmask_free(old_bm);
> +    }
>      return err;
>  }
>  
>

Yi-Hung Wei Jan. 4, 2020, 1:12 a.m. UTC | #3
On Fri, Jan 3, 2020 at 7:09 AM Ilya Maximets <i.maximets@ovn.org> wrote:
>
> On 18.12.2019 21:31, Yi-Hung Wei wrote:
> > --- a/lib/netdev-afxdp.c
> > +++ b/lib/netdev-afxdp.c
> > @@ -692,6 +701,10 @@ netdev_afxdp_reconfigure(struct netdev *netdev)
> >      netdev_change_seq_changed(netdev);
> >  out:
> >      ovs_mutex_unlock(&dev->mutex);
> > +    if (old_bm) {
> > +        numa_set_membind(old_bm);
>
> This will not return previous numa policy, it will set policy
> to membind, which might be not expected by the user.
>
> I don't see a valid wrapper for that, so it seems like the only
> way is to use get/set_mempolicy directly for restoring the original
> memory policy.

Thanks for pointing this out.  Yes, after checking libnuma, there is
no proper wrapper to save and restore the original memory policy.
I will use get/set_mempolicy to achieve that in the next version.

>
> BTW, you're not allowed to use any libnuma functions if !numa_available().
> You need to check it first somewhere.

Sure, I will add a check with numa_available() in the next version.

Thanks,

-Yi-Hung
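
The save/restore approach discussed above could look roughly like the sketch below. It is an illustration only, not the code from any later revision of this patch. It calls the get_mempolicy/set_mempolicy system calls directly so that it builds without libnuma, and it treats a failing get_mempolicy as "NUMA policy unavailable", which is the same condition a numa_available() check guards against. All names (`saved_mempolicy`, `MAX_NODES`, `run_with_preferred_node`, `touch_buffer`) are hypothetical, and the 1024-node mask size is an assumption.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/mempolicy.h>    /* MPOL_PREFERRED */
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumption: 1024 node bits are enough for the running kernel. */
#define MAX_NODES 1024
#define BITS_PER_ULONG (8 * sizeof(unsigned long))
#define MASK_LONGS (MAX_NODES / BITS_PER_ULONG)
#define MAXNODE_ARG ((unsigned long) MAX_NODES + 1)

struct saved_mempolicy {
    int mode;                           /* Policy mode plus mode flags. */
    unsigned long nodemask[MASK_LONGS];
};

/* Records the calling thread's policy.  Returns 0 or an errno value. */
static int
mempolicy_save(struct saved_mempolicy *s)
{
    memset(s, 0, sizeof *s);
    return syscall(SYS_get_mempolicy, &s->mode, s->nodemask,
                   MAXNODE_ARG, NULL, 0UL) == -1 ? errno : 0;
}

/* Reinstalls a policy recorded by mempolicy_save(), mode included. */
static int
mempolicy_restore(const struct saved_mempolicy *s)
{
    return syscall(SYS_set_mempolicy, s->mode, s->nodemask,
                   MAXNODE_ARG) == -1 ? errno : 0;
}

/* Prefers 'node' for new pages, falling back to other nodes if needed. */
static int
mempolicy_prefer_node(int node)
{
    unsigned long mask[MASK_LONGS] = {0};
    size_t n = (size_t) node;

    mask[n / BITS_PER_ULONG] |= 1UL << (n % BITS_PER_ULONG);
    return syscall(SYS_set_mempolicy, MPOL_PREFERRED, mask,
                   MAXNODE_ARG) == -1 ? errno : 0;
}

/* Runs fn(aux) with 'node' preferred, then restores the old policy.
 * Best effort: if the policy cannot be read or changed, fn() still runs
 * under the unchanged policy.  Returns fn()'s result. */
static int
run_with_preferred_node(int node, int (*fn)(void *), void *aux)
{
    struct saved_mempolicy old;
    bool changed = false;
    int ret;

    assert(fn != NULL);
    if (node >= 0 && node < MAX_NODES && !mempolicy_save(&old)) {
        changed = !mempolicy_prefer_node(node);
    }
    ret = fn(aux);
    if (changed) {
        mempolicy_restore(&old);
    }
    return ret;
}

/* Stand-in for the XSK memory setup: allocates and touches a buffer.
 * Pages are placed when first touched, so the touch must happen while
 * the preferred policy is still in effect. */
static int
touch_buffer(void *aux)
{
    size_t len = *(size_t *) aux;
    char *buf = malloc(len);

    if (!buf) {
        return ENOMEM;
    }
    memset(buf, 0, len);
    free(buf);
    return 0;
}
```

Because the saved state includes the mode as well as the node mask, restoring it does not turn a default or preferred policy into membind, which is the problem raised in the review.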
Patch

diff --git a/Documentation/intro/install/afxdp.rst b/Documentation/intro/install/afxdp.rst
index 7b0736c96114..c4685fa7ebac 100644
--- a/Documentation/intro/install/afxdp.rst
+++ b/Documentation/intro/install/afxdp.rst
@@ -164,7 +164,7 @@  If a test case fails, check the log at::
 
 Setup AF_XDP netdev
 -------------------
-Before running OVS with AF_XDP, make sure the libbpf and libelf are
+Before running OVS with AF_XDP, make sure the libbpf, libelf, and libnuma are
 set-up right::
 
   ldd vswitchd/ovs-vswitchd
diff --git a/acinclude.m4 b/acinclude.m4
index 542637ac8cb8..f73dc9bf7e3c 100644
--- a/acinclude.m4
+++ b/acinclude.m4
@@ -286,6 +286,8 @@  AC_DEFUN([OVS_CHECK_LINUX_AF_XDP], [
     AC_CHECK_FUNCS([pthread_spin_lock], [],
       [AC_MSG_ERROR([unable to find pthread_spin_lock for AF_XDP support])])
 
+    OVS_FIND_DEPENDENCY([numa_alloc_onnode], [numa], [libnuma])
+
     AC_DEFINE([HAVE_AF_XDP], [1],
               [Define to 1 if AF_XDP support is available and enabled.])
     LIBBPF_LDADD=" -lbpf -lelf"
diff --git a/include/sparse/automake.mk b/include/sparse/automake.mk
index 073631e8c082..974ad3fe55f7 100644
--- a/include/sparse/automake.mk
+++ b/include/sparse/automake.mk
@@ -5,6 +5,7 @@  noinst_HEADERS += \
         include/sparse/bits/floatn.h \
         include/sparse/assert.h \
         include/sparse/math.h \
+        include/sparse/numa.h \
         include/sparse/netinet/in.h \
         include/sparse/netinet/ip6.h \
         include/sparse/netpacket/packet.h \
diff --git a/include/sparse/numa.h b/include/sparse/numa.h
new file mode 100644
index 000000000000..3691a0eaf729
--- /dev/null
+++ b/include/sparse/numa.h
@@ -0,0 +1,27 @@ 
+/*
+ * Copyright (c) 2019 Nicira, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#ifndef __CHECKER__
+#error "Use this header only with sparse.  It is not a correct implementation."
+#endif
+
+/* Avoid sparse warning: "non-ANSI function declaration of function". */
+#define numa_get_membind_compat() numa_get_membind_compat(void)
+#define numa_get_interleave_mask_compat() numa_get_interleave_mask_compat(void)
+#define numa_get_run_node_mask_compat() numa_get_run_node_mask_compat(void)
+
+/* Get actual <numa.h> definitions for us to annotate and build on. */
+#include_next<numa.h>
diff --git a/lib/netdev-afxdp.c b/lib/netdev-afxdp.c
index 91b70b298e57..9a7dd8208f8f 100644
--- a/lib/netdev-afxdp.c
+++ b/lib/netdev-afxdp.c
@@ -26,6 +26,7 @@ 
 #include <linux/rtnetlink.h>
 #include <linux/if_xdp.h>
 #include <net/if.h>
+#include <numa.h>
 #include <poll.h>
 #include <stdlib.h>
 #include <sys/resource.h>
@@ -661,6 +662,14 @@  netdev_afxdp_reconfigure(struct netdev *netdev)
     struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
     int err = 0;
 
+    /* Allocate all the xsk related memory in the netdev's NUMA domain. */
+    struct bitmask *old_bm = NULL;
+    int numa_id = netdev_get_numa_id(netdev);
+    if (numa_id != NETDEV_NUMA_UNSPEC) {
+        old_bm = numa_get_membind();
+        numa_set_preferred(numa_id);
+    }
+
     ovs_mutex_lock(&dev->mutex);
 
     if (netdev->n_rxq == dev->requested_n_rxq
@@ -692,6 +701,10 @@  netdev_afxdp_reconfigure(struct netdev *netdev)
     netdev_change_seq_changed(netdev);
 out:
     ovs_mutex_unlock(&dev->mutex);
+    if (old_bm) {
+        numa_set_membind(old_bm);
+        numa_bitmask_free(old_bm);
+    }
     return err;
 }