@@ -10,6 +10,7 @@ DOC_SOURCE = \
Documentation/intro/why-ovs.rst \
Documentation/intro/install/index.rst \
Documentation/intro/install/bash-completion.rst \
+ Documentation/intro/install/afxdp.rst \
Documentation/intro/install/debian.rst \
Documentation/intro/install/documentation.rst \
Documentation/intro/install/distributions.rst \
@@ -59,6 +59,7 @@ vSwitch? Start here.
:doc:`intro/install/windows` |
:doc:`intro/install/xenserver` |
:doc:`intro/install/dpdk` |
+ :doc:`intro/install/afxdp` |
:doc:`Installation FAQs <faq/releases>`
- **Tutorials:** :doc:`tutorials/faucet` |
new file mode 100644
@@ -0,0 +1,433 @@
+..
+ Licensed under the Apache License, Version 2.0 (the "License"); you may
+ not use this file except in compliance with the License. You may obtain
+ a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+ WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+ License for the specific language governing permissions and limitations
+ under the License.
+
+ Convention for heading levels in Open vSwitch documentation:
+
+ ======= Heading 0 (reserved for the title in a document)
+ ------- Heading 1
+ ~~~~~~~ Heading 2
+ +++++++ Heading 3
+ ''''''' Heading 4
+
+ Avoid deeper levels because they do not render well.
+
+
+========================
+Open vSwitch with AF_XDP
+========================
+
+This document describes how to build and install Open vSwitch using
+the AF_XDP netdev.
+
+.. warning::
+ The AF_XDP support of Open vSwitch is considered 'experimental',
+ and it is not compiled in by default.
+
+
+Introduction
+------------
+AF_XDP, the Address Family of the eXpress Data Path, is a Linux socket type
+built upon eBPF and XDP technology. It aims to deliver performance
+comparable to DPDK while cooperating better with the kernel's existing
+networking stack. An AF_XDP socket receives and sends packets through an
+eBPF/XDP program attached to the netdev, bypassing several Linux kernel
+subsystems. As a result, an AF_XDP socket performs much better than AF_PACKET.
+For more details about AF_XDP, see the Linux kernel's
+Documentation/networking/af_xdp.rst.
+
+
+AF_XDP Netdev
+-------------
+OVS has several netdev types, e.g., system, tap, or
+dpdk. The AF_XDP feature adds a new netdev type called
+"afxdp", and implements its configuration, packet reception,
+and transmit functions. Since the AF_XDP socket, called xsk,
+operates in userspace, once ovs-vswitchd receives packets
+from xsk, the afxdp netdev re-uses the existing userspace
+dpif-netdev datapath. As a result, most of the packet processing
+happens in userspace instead of the Linux kernel.
+
+::
+
+              |   +-------------------+
+              |   |    ovs-vswitchd   |<-->ovsdb-server
+              |   +-------------------+
+              |   |      ofproto      |<-->OpenFlow controllers
+              |   +--------+-+--------+
+              |   | netdev | |ofproto-|
+    userspace |   +--------+ |  dpif  |
+              |   | afxdp  | +--------+
+              |   | netdev | |  dpif  |
+              |   +---||---+ +--------+
+              |       ||     |  dpif- |
+              |       ||     | netdev |
+              |_      ||     +--------+
+                      ||
+               _  +---||-----+--------+
+              |   | AF_XDP prog +     |
+       kernel |   |   xsk_map         |
+              |_  +--------||---------+
+                           ||
+                        physical
+                           NIC
+
+
+Build requirements
+------------------
+
+In addition to the requirements described in :doc:`general`, building Open
+vSwitch with AF_XDP will require the following:
+
+- libbpf from kernel source tree (kernel 5.0.0 or later)
+
+- Linux kernel XDP support, with the following options (required)
+
+ * CONFIG_BPF=y
+
+ * CONFIG_BPF_SYSCALL=y
+
+ * CONFIG_XDP_SOCKETS=y
+
+
+- The following optional Kconfig options are also recommended, but not
+ required:
+
+ * CONFIG_BPF_JIT=y (Performance)
+
+ * CONFIG_HAVE_BPF_JIT=y (Performance)
+
+ * CONFIG_XDP_SOCKETS_DIAG=y (Debugging)
+
+- Once your AF_XDP-enabled kernel is ready, if possible, run
+ **./xdpsock -r -N -z -i <your device>** under linux/samples/bpf.
+ This is an OVS-independent benchmark tool for AF_XDP.
+ It verifies that your kernel meets the basic requirements for AF_XDP.
+
+
+Installing
+----------
+For OVS to use AF_XDP netdev, it has to be configured with LIBBPF support.
+First, clone a recent version of the Linux bpf-next tree::
+
+ git clone git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git
+
+Second, go into the Linux source directory and build libbpf in the tools
+directory::
+
+ cd bpf-next/
+ cd tools/lib/bpf/
+ make && make install
+ make install_headers
+
+.. note::
+ Make sure xsk.h and bpf.h are installed in the system's include path,
+ e.g. /usr/local/include/bpf/ or /usr/include/bpf/
+
+Make sure the libbpf.so is installed correctly::
+
+ ldconfig
+ ldconfig -p | grep libbpf
+
+Third, ensure the standard OVS requirements are installed and
+bootstrap/configure the package::
+
+ ./boot.sh && ./configure --enable-afxdp
+
+Finally, build and install OVS::
+
+ make && make install
+
+To kick start end-to-end autotesting::
+
+ uname -a # make sure the kernel is 5.0 or later
+ make check-afxdp TESTSUITEFLAGS='1'
+
+If a test case fails, check the log at::
+
+ cat tests/system-afxdp-testsuite.dir/001/system-afxdp-testsuite.log
+
+
+Setup AF_XDP netdev
+-------------------
+Before running OVS with AF_XDP, make sure libbpf and libelf are
+linked correctly::
+
+ ldd vswitchd/ovs-vswitchd
+
+Open vSwitch should be started using userspace datapath as described
+in :doc:`general`::
+
+ ovs-vswitchd ...
+ ovs-vsctl -- add-br br0 -- set Bridge br0 datapath_type=netdev
+
+Make sure your device driver supports AF_XDP. To use 1 PMD (on core 4)
+on a device with 1 queue (queue 0), configure these options: **pmd-cpu-mask,
+pmd-rxq-affinity, and n_rxq**. The **xdpmode** can be "drv" or "skb"::
+
+ ethtool -L enp2s0 combined 1
+ ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x10
+ ovs-vsctl add-port br0 enp2s0 -- set interface enp2s0 type="afxdp" \
+ options:n_rxq=1 options:xdpmode=drv \
+ other_config:pmd-rxq-affinity="0:4"
+
+Or, use 4 pmds/cores and 4 queues by doing::
+
+ ethtool -L enp2s0 combined 4
+ ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x1e
+ ovs-vsctl add-port br0 enp2s0 -- set interface enp2s0 type="afxdp" \
+ options:n_rxq=4 options:xdpmode=drv \
+ other_config:pmd-rxq-affinity="0:1,1:2,2:3,3:4"
+
+.. note::
+ pmd-rxq-affinity is optional. If not specified, system will auto-assign.
+
+To validate that the bridge has been instantiated successfully, run::
+
+ ovs-vsctl show
+
+The output should include something like::
+
+ Port "ens802f0"
+ Interface "ens802f0"
+ type: afxdp
+ options: {n_rxq="1", xdpmode=drv}
+
+Otherwise, enable debugging by::
+
+ ovs-appctl vlog/set netdev_afxdp::dbg
+
+
+References
+----------
+Most of the design details are described in the paper presented at
+Linux Plumbers Conference 2018, "Bringing the Power of eBPF to Open
+vSwitch"[1], section 4, and slides[2][4].
+"The Path to DPDK Speeds for AF XDP"[3] gives a very good introduction
+to current and future AF_XDP work.
+
+[1] http://vger.kernel.org/lpc_net2018_talks/ovs-ebpf-afxdp.pdf
+
+[2] http://vger.kernel.org/lpc_net2018_talks/ovs-ebpf-lpc18-presentation.pdf
+
+[3] http://vger.kernel.org/lpc_net2018_talks/lpc18_paper_af_xdp_perf-v2.pdf
+
+[4] https://ovsfall2018.sched.com/event/IO7p/fast-userspace-ovs-with-afxdp
+
+
+Performance Tuning
+------------------
+The name of the game is to keep your CPU running in userspace, allowing the
+PMD to keep polling the AF_XDP queues without interference from the kernel.
+
+#. Make sure everything is on the same NUMA node (memory used by AF_XDP, cores
+ running the PMD threads, device plug-in slot)
+
+#. Isolate your CPUs with the isolcpus kernel parameter in the GRUB config.
+
+#. Do not steer IRQs to cores that run PMD threads.
+
+#. The Spectre and Meltdown fixes increase the overhead of system calls.
+
+
+Debugging performance issue
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+While running traffic, use the Linux perf tool to see where your CPU
+spends its cycles::
+
+ cd bpf-next/tools/perf
+ make
+ ./perf record -p `pidof ovs-vswitchd` sleep 10
+ ./perf report
+
+Measure your system call rate by doing::
+
+ pstree -p `pidof ovs-vswitchd`
+ strace -c -p <your pmd's PID>
+
+Or, use the OVS PMD statistics command::
+
+ ovs-appctl dpif-netdev/pmd-stats-show
+
+
+Example Script
+--------------
+
+Below is a script using namespaces and veth peers::
+
+ #!/bin/bash
+ ovs-vswitchd --no-chdir --pidfile -vvconn -vofproto_dpif -vunixctl \
+ --disable-system --detach
+ ovs-vsctl -- add-br br0 -- set Bridge br0 \
+ protocols=OpenFlow10,OpenFlow11,OpenFlow12,OpenFlow13,OpenFlow14 \
+ fail-mode=secure datapath_type=netdev
+ ovs-vsctl --may-exist add-br br0 -- set Bridge br0 datapath_type=netdev
+
+ ip netns add at_ns0
+ ovs-appctl vlog/set netdev_afxdp::dbg
+
+ ip link add p0 type veth peer name afxdp-p0
+ ip link set p0 netns at_ns0
+ ip link set dev afxdp-p0 up
+ ovs-vsctl add-port br0 afxdp-p0 -- \
+ set interface afxdp-p0 external-ids:iface-id="p0" type="afxdp"
+
+ ip netns exec at_ns0 sh << NS_EXEC_HEREDOC
+ ip addr add "10.1.1.1/24" dev p0
+ ip link set dev p0 up
+ NS_EXEC_HEREDOC
+
+ ip netns add at_ns1
+ ip link add p1 type veth peer name afxdp-p1
+ ip link set p1 netns at_ns1
+ ip link set dev afxdp-p1 up
+
+ ovs-vsctl add-port br0 afxdp-p1 -- \
+ set interface afxdp-p1 external-ids:iface-id="p1" type="afxdp"
+ ip netns exec at_ns1 sh << NS_EXEC_HEREDOC
+ ip addr add "10.1.1.2/24" dev p1
+ ip link set dev p1 up
+ NS_EXEC_HEREDOC
+
+ ip netns exec at_ns0 ping -i .2 10.1.1.2
+
+
+Limitations/Known Issues
+------------------------
+#. The device's NUMA ID is always reported as 0; OVS needs a way to look it up.
+#. No QoS support because the AF_XDP netdev bypasses the Linux TC layer. A
+ possible work-around is to use the OpenFlow meter action.
+#. Removing an AF_XDP device from a bridge and adding it again fails.
+#. Most of the tests were done using a single i40e port. Multiple ports and
+ the ixgbe driver also need to be tested.
+#. No latency test results yet (TODO item).
+
+
+PVP using tap device
+--------------------
+Assume you have enp2s0 as the physical NIC and a tap device connected to a VM.
+First, start OVS, then add physical port::
+
+ ethtool -L enp2s0 combined 1
+ ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x10
+ ovs-vsctl add-port br0 enp2s0 -- set interface enp2s0 type="afxdp" \
+ options:n_rxq=1 options:xdpmode=drv \
+ other_config:pmd-rxq-affinity="0:4"
+
+Start a VM with virtio and tap device::
+
+ qemu-system-x86_64 -hda ubuntu1810.qcow \
+ -m 4096 \
+ -cpu host,+x2apic -enable-kvm \
+ -device virtio-net-pci,mac=00:02:00:00:00:01,netdev=net0,mq=on,\
+ vectors=10,mrg_rxbuf=on,rx_queue_size=1024 \
+ -netdev type=tap,id=net0,vhost=on,queues=8 \
+ -object memory-backend-file,id=mem,size=4096M,\
+ mem-path=/dev/hugepages,share=on \
+ -numa node,memdev=mem -mem-prealloc -smp 2
+
+Add the tap device to the bridge and create OpenFlow rules::
+
+ ovs-vsctl add-port br0 tap0 -- set interface tap0 type="afxdp"
+ ovs-ofctl del-flows br0
+ ovs-ofctl add-flow br0 "in_port=enp2s0, actions=output:tap0"
+ ovs-ofctl add-flow br0 "in_port=tap0, actions=output:enp2s0"
+
+Inside the VM, use xdp_rxq_info to bounce back the traffic::
+
+ ./xdp_rxq_info --dev ens3 --action XDP_TX
+
+The measured performance is around 1.6 Mpps.
+The rate is limited by the kernel's tap interface, which requires copying
+packets into the kernel from the umem buffer in userspace.
+
+
+PVP using vhostuser device
+--------------------------
+First, build OVS with DPDK and AF_XDP::
+
+ ./configure --enable-afxdp --with-dpdk=<dpdk path>
+ make -j4 && make install
+
+Create a vhost-user port from OVS::
+
+ ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
+ ovs-vsctl -- add-br br0 -- set Bridge br0 datapath_type=netdev \
+ other_config:pmd-cpu-mask=0xfff
+ ovs-vsctl add-port br0 vhost-user-1 \
+ -- set Interface vhost-user-1 type=dpdkvhostuser
+
+Start VM using vhost-user mode::
+
+ qemu-system-x86_64 -hda ubuntu1810.qcow \
+ -m 4096 \
+ -cpu host,+x2apic -enable-kvm \
+ -chardev socket,id=char1,path=/usr/local/var/run/openvswitch/vhost-user-1 \
+ -netdev type=vhost-user,id=mynet1,chardev=char1,vhostforce,queues=4 \
+ -device virtio-net-pci,mac=00:00:00:00:00:01,\
+ netdev=mynet1,mq=on,vectors=10 \
+ -object memory-backend-file,id=mem,size=4096M,\
+ mem-path=/dev/hugepages,share=on \
+ -numa node,memdev=mem -mem-prealloc -smp 2
+
+Set up the OpenFlow rules::
+
+ ovs-ofctl del-flows br0
+ ovs-ofctl add-flow br0 "in_port=enp2s0, actions=output:vhost-user-1"
+ ovs-ofctl add-flow br0 "in_port=vhost-user-1, actions=output:enp2s0"
+
+Inside the VM, use xdp_rxq_info to drop or bounce back the traffic::
+
+ ./xdp_rxq_info --dev ens3 --action XDP_DROP
+ ./xdp_rxq_info --dev ens3 --action XDP_TX
+
+Performance: for RX_DROP: 6.6Mpps, TX: 2.3Mpps
+
+
+PCP container using veth
+------------------------
+Create namespace and veth peer devices::
+
+ ip netns add at_ns0
+ ip link add p0 type veth peer name afxdp-p0
+ ip link set p0 netns at_ns0
+ ip link set dev afxdp-p0 up
+ ip netns exec at_ns0 ip link set dev p0 up
+
+Attach the veth port to br0 (linux kernel mode)::
+
+ ovs-vsctl add-port br0 afxdp-p0 -- \
+ set interface afxdp-p0 options:n_rxq=1
+
+Or, use AF_XDP with skb mode::
+
+ ovs-vsctl add-port br0 afxdp-p0 -- \
+ set interface afxdp-p0 type="afxdp" options:n_rxq=1 options:xdpmode=skb
+
+Set up the OpenFlow rules::
+
+ ovs-ofctl del-flows br0
+ ovs-ofctl add-flow br0 "in_port=enp2s0, actions=output:afxdp-p0"
+ ovs-ofctl add-flow br0 "in_port=afxdp-p0, actions=output:enp2s0"
+
+In the namespace, drop or bounce back the packets::
+
+ ip netns exec at_ns0 ./xdp_rxq_info --dev p0 --action XDP_DROP
+ ip netns exec at_ns0 ./xdp_rxq_info --dev p0 --action XDP_TX
+
+Performance: for RX_DROP: 800Kpps, TX: 700Kpps
+
+
+Bug Reporting
+-------------
+
+Please report problems to dev@openvswitch.org.
@@ -45,6 +45,7 @@ Installation from Source
xenserver
userspace
dpdk
+ afxdp
Installation from Packages
--------------------------
@@ -221,6 +221,41 @@ AC_DEFUN([OVS_FIND_DEPENDENCY], [
])
])
+dnl OVS_CHECK_LINUX_AF_XDP
+dnl
+dnl Check both Linux kernel AF_XDP and libbpf support
+AC_DEFUN([OVS_CHECK_LINUX_AF_XDP], [
+ AC_ARG_ENABLE([afxdp],
+ [AC_HELP_STRING([--enable-afxdp], [Enable AF_XDP support])],
+ [], [enable_afxdp=no])
+ AC_MSG_CHECKING([whether AF_XDP is enabled])
+ if test "$enable_afxdp" != yes; then
+ AC_MSG_RESULT([no])
+ AF_XDP_ENABLE=false
+ else
+ AC_MSG_RESULT([yes])
+ AF_XDP_ENABLE=true
+
+ AC_CHECK_HEADER([bpf/libbpf.h], [],
+ [AC_MSG_ERROR([unable to find bpf/libbpf.h for AF_XDP support])])
+
+ AC_CHECK_HEADER([linux/if_xdp.h], [],
+ [AC_MSG_ERROR([unable to find linux/if_xdp.h for AF_XDP support])])
+
+ AC_CHECK_HEADER([bpf/xsk.h], [],
+ [AC_MSG_ERROR([unable to find bpf/xsk.h for AF_XDP support])])
+
+ AC_CHECK_HEADER([bpf/libbpf_util.h], [],
+ [AC_MSG_ERROR([unable to find bpf/libbpf_util.h for AF_XDP support])])
+
+ AC_DEFINE([HAVE_AF_XDP], [1],
+ [Define to 1 if AF_XDP support is available and enabled.])
+ LIBBPF_LDADD=" -lbpf -lelf"
+ AC_SUBST([LIBBPF_LDADD])
+ fi
+ AM_CONDITIONAL([HAVE_AF_XDP], test "$AF_XDP_ENABLE" = true)
+])
+
dnl OVS_CHECK_DPDK
dnl
dnl Configure DPDK source tree
@@ -99,6 +99,7 @@ OVS_CHECK_SPHINX
OVS_CHECK_DOT
OVS_CHECK_IF_DL
OVS_CHECK_STRTOK_R
+OVS_CHECK_LINUX_AF_XDP
AC_CHECK_DECLS([sys_siglist], [], [], [[#include <signal.h>]])
AC_CHECK_MEMBERS([struct stat.st_mtim.tv_nsec, struct stat.st_mtimensec],
[], [], [[#include <sys/stat.h>]])
@@ -14,6 +14,10 @@ if WIN32
lib_libopenvswitch_la_LIBADD += ${PTHREAD_LIBS}
endif
+if HAVE_AF_XDP
+lib_libopenvswitch_la_LIBADD += $(LIBBPF_LDADD)
+endif
+
lib_libopenvswitch_la_LDFLAGS = \
$(OVS_LTINFO) \
-Wl,--version-script=$(top_builddir)/lib/libopenvswitch.sym \
@@ -392,6 +396,7 @@ lib_libopenvswitch_la_SOURCES += \
lib/if-notifier.h \
lib/netdev-linux.c \
lib/netdev-linux.h \
+ lib/netdev-linux-private.h \
lib/netdev-tc-offloads.c \
lib/netdev-tc-offloads.h \
lib/netlink-conntrack.c \
@@ -409,6 +414,15 @@ lib_libopenvswitch_la_SOURCES += \
lib/tc.h
endif
+if HAVE_AF_XDP
+lib_libopenvswitch_la_SOURCES += \
+ lib/xdpsock.c \
+ lib/xdpsock.h \
+ lib/netdev-afxdp.c \
+ lib/netdev-afxdp.h \
+ lib/spinlock.h
+endif
+
if DPDK_NETDEV
lib_libopenvswitch_la_SOURCES += \
lib/dpdk.c \
@@ -19,6 +19,7 @@
#include <string.h>
#include "dp-packet.h"
+#include "netdev-afxdp.h"
#include "netdev-dpdk.h"
#include "openvswitch/dynamic-string.h"
#include "util.h"
@@ -59,6 +60,27 @@ dp_packet_use(struct dp_packet *b, void *base, size_t allocated)
dp_packet_use__(b, base, allocated, DPBUF_MALLOC);
}
+#if HAVE_AF_XDP
+/* Initialize 'b' as an empty dp_packet that contains
+ * memory starting at AF_XDP umem base.
+ */
+void
+dp_packet_use_afxdp(struct dp_packet *b, void *base, size_t allocated)
+{
+ dp_packet_set_base(b, base);
+ dp_packet_set_data(b, base);
+ dp_packet_set_size(b, 0);
+
+ dp_packet_set_allocated(b, allocated);
+ b->source = DPBUF_AFXDP;
+ dp_packet_reset_offsets(b);
+ pkt_metadata_init(&b->md, 0);
+ dp_packet_reset_cutlen(b);
+ dp_packet_reset_offload(b);
+ b->packet_type = htonl(PT_ETH);
+}
+#endif
+
/* Initializes 'b' as an empty dp_packet that contains the 'allocated' bytes of
* memory starting at 'base'. 'base' should point to a buffer on the stack.
* (Nothing actually relies on 'base' being allocated on the stack. It could
@@ -122,6 +144,8 @@ dp_packet_uninit(struct dp_packet *b)
* created as a dp_packet */
free_dpdk_buf((struct dp_packet*) b);
#endif
+ } else if (b->source == DPBUF_AFXDP) {
+ free_afxdp_buf(b);
}
}
}
@@ -248,6 +272,9 @@ dp_packet_resize__(struct dp_packet *b, size_t new_headroom, size_t new_tailroom
case DPBUF_STACK:
OVS_NOT_REACHED();
+ case DPBUF_AFXDP:
+ OVS_NOT_REACHED();
+
case DPBUF_STUB:
b->source = DPBUF_MALLOC;
new_base = xmalloc(new_allocated);
@@ -433,6 +460,7 @@ dp_packet_steal_data(struct dp_packet *b)
{
void *p;
ovs_assert(b->source != DPBUF_DPDK);
+ ovs_assert(b->source != DPBUF_AFXDP);
if (b->source == DPBUF_MALLOC && dp_packet_data(b) == dp_packet_base(b)) {
p = dp_packet_data(b);
@@ -25,6 +25,7 @@
#include <rte_mbuf.h>
#endif
+#include "netdev-afxdp.h"
#include "netdev-dpdk.h"
#include "openvswitch/list.h"
#include "packets.h"
@@ -42,6 +43,7 @@ enum OVS_PACKED_ENUM dp_packet_source {
DPBUF_DPDK, /* buffer data is from DPDK allocated memory.
* ref to dp_packet_init_dpdk() in dp-packet.c.
*/
+ DPBUF_AFXDP, /* buffer data from XDP frame */
};
#define DP_PACKET_CONTEXT_SIZE 64
@@ -89,6 +91,13 @@ struct dp_packet {
};
};
+#if HAVE_AF_XDP
+struct dp_packet_afxdp {
+ struct umem_pool *mpool;
+ struct dp_packet packet;
+};
+#endif
+
static inline void *dp_packet_data(const struct dp_packet *);
static inline void dp_packet_set_data(struct dp_packet *, void *);
static inline void *dp_packet_base(const struct dp_packet *);
@@ -122,7 +131,9 @@ static inline const void *dp_packet_get_nd_payload(const struct dp_packet *);
void dp_packet_use(struct dp_packet *, void *, size_t);
void dp_packet_use_stub(struct dp_packet *, void *, size_t);
void dp_packet_use_const(struct dp_packet *, const void *, size_t);
-
+#if HAVE_AF_XDP
+void dp_packet_use_afxdp(struct dp_packet *, void *, size_t);
+#endif
void dp_packet_init_dpdk(struct dp_packet *);
void dp_packet_init(struct dp_packet *, size_t);
@@ -184,6 +195,11 @@ dp_packet_delete(struct dp_packet *b)
return;
}
+ if (b->source == DPBUF_AFXDP) {
+ free_afxdp_buf(b);
+ return;
+ }
+
dp_packet_uninit(b);
free(b);
}
@@ -198,6 +198,12 @@ cycles_counter_update(struct pmd_perf_stats *s)
{
#ifdef DPDK_NETDEV
return s->last_tsc = rte_get_tsc_cycles();
+#elif HAVE_AF_XDP
+ /* This is an x86-specific instruction. */
+ uint32_t h, l;
+ asm volatile("rdtsc" : "=a" (l), "=d" (h));
+
+ return s->last_tsc = ((uint64_t) h << 32) | l;
#else
return s->last_tsc = 0;
#endif
new file mode 100644
@@ -0,0 +1,858 @@
+/*
+ * Copyright (c) 2018, 2019 Nicira, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#if !defined(__i386__) && !defined(__x86_64__)
+#error AF_XDP supported only for Linux on x86 or x86_64
+#endif
+
+#include <config.h>
+
+#include "netdev-linux-private.h"
+#include "netdev-linux.h"
+#include "netdev-afxdp.h"
+
+#include <errno.h>
+#include <inttypes.h>
+#include <linux/rtnetlink.h>
+#include <linux/if_xdp.h>
+#include <net/if.h>
+#include <stdlib.h>
+#include <sys/resource.h>
+#include <sys/socket.h>
+#include <sys/types.h>
+#include <unistd.h>
+
+#include "dp-packet.h"
+#include "dpif-netdev.h"
+#include "openvswitch/dynamic-string.h"
+#include "openvswitch/vlog.h"
+#include "packets.h"
+#include "socket-util.h"
+#include "spinlock.h"
+#include "util.h"
+#include "xdpsock.h"
+
+#ifndef SOL_XDP
+#define SOL_XDP 283
+#endif
+
+VLOG_DEFINE_THIS_MODULE(netdev_afxdp);
+static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
+
+#define UMEM2DESC(elem, base) ((uint64_t)((char *)elem - (char *)base))
+#define UMEM2XPKT(base, i) \
+ ALIGNED_CAST(struct dp_packet_afxdp *, (char *)base + \
+ i * sizeof(struct dp_packet_afxdp))
+
+static uint32_t prog_id;
+static struct xsk_socket_info *xsk_configure(int ifindex, int xdp_queue_id,
+ int mode);
+static void xsk_remove_xdp_program(uint32_t ifindex, int xdpmode);
+static void xsk_destroy(struct xsk_socket_info *xsk);
+
+static struct xsk_umem_info *xsk_configure_umem(void *buffer, uint64_t size,
+ int xdpmode)
+{
+ struct xsk_umem_config uconfig OVS_UNUSED;
+ struct xsk_umem_info *umem;
+ int ret;
+ int i;
+
+ umem = xcalloc(1, sizeof(*umem));
+ ret = xsk_umem__create(&umem->umem, buffer, size, &umem->fq, &umem->cq,
+ NULL);
+ if (ret) {
+ VLOG_ERR("xsk_umem__create failed (%s) mode: %s",
+ ovs_strerror(errno),
+ xdpmode == XDP_COPY ? "SKB": "DRV");
+ free(umem);
+ return NULL;
+ }
+
+ umem->buffer = buffer;
+
+ /* set-up umem pool */
+ if (umem_pool_init(&umem->mpool, NUM_FRAMES) < 0) {
+ VLOG_ERR("umem_pool_init failed");
+ if (xsk_umem__delete(umem->umem)) {
+ VLOG_ERR("xsk_umem__delete failed");
+ }
+ free(umem);
+ return NULL;
+ }
+
+ for (i = NUM_FRAMES - 1; i >= 0; i--) {
+ struct umem_elem *elem;
+
+ elem = ALIGNED_CAST(struct umem_elem *,
+ (char *)umem->buffer + i * FRAME_SIZE);
+ umem_elem_push(&umem->mpool, elem);
+ }
+
+ /* set-up metadata */
+ if (xpacket_pool_init(&umem->xpool, NUM_FRAMES) < 0) {
+ VLOG_ERR("xpacket_pool_init failed");
+ umem_pool_cleanup(&umem->mpool);
+ if (xsk_umem__delete(umem->umem)) {
+ VLOG_ERR("xsk_umem__delete failed");
+ }
+ free(umem);
+ return NULL;
+ }
+
+ VLOG_DBG("%s xpacket pool from %p to %p", __func__,
+ umem->xpool.array,
+ (char *)umem->xpool.array +
+ NUM_FRAMES * sizeof(struct dp_packet_afxdp));
+
+ for (i = NUM_FRAMES - 1; i >= 0; i--) {
+ struct dp_packet_afxdp *xpacket;
+ struct dp_packet *packet;
+
+ xpacket = UMEM2XPKT(umem->xpool.array, i);
+ xpacket->mpool = &umem->mpool;
+
+ packet = &xpacket->packet;
+ packet->source = DPBUF_AFXDP;
+ }
+
+ return umem;
+}
+
+static struct xsk_socket_info *
+xsk_configure_socket(struct xsk_umem_info *umem, uint32_t ifindex,
+ uint32_t queue_id, int xdpmode)
+{
+ struct xsk_socket_config cfg;
+ struct xsk_socket_info *xsk;
+ char devname[IF_NAMESIZE];
+ uint32_t idx = 0;
+ int ret;
+ int i;
+
+ xsk = xcalloc(1, sizeof(*xsk));
+ xsk->umem = umem;
+ cfg.rx_size = CONS_NUM_DESCS;
+ cfg.tx_size = PROD_NUM_DESCS;
+ cfg.libbpf_flags = 0;
+
+ if (xdpmode == XDP_ZEROCOPY) {
+ cfg.bind_flags = XDP_ZEROCOPY;
+ cfg.xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST | XDP_FLAGS_DRV_MODE;
+ } else {
+ cfg.bind_flags = XDP_COPY;
+ cfg.xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST | XDP_FLAGS_SKB_MODE;
+ }
+
+ if (if_indextoname(ifindex, devname) == NULL) {
+ VLOG_ERR("ifindex %d to devname failed (%s)",
+ ifindex, ovs_strerror(errno));
+ free(xsk);
+ return NULL;
+ }
+
+ ret = xsk_socket__create(&xsk->xsk, devname, queue_id, umem->umem,
+ &xsk->rx, &xsk->tx, &cfg);
+ if (ret) {
+ VLOG_ERR("xsk_socket__create failed (%s) mode: %s qid: %d",
+ ovs_strerror(errno),
+ xdpmode == XDP_COPY ? "SKB": "DRV",
+ queue_id);
+ free(xsk);
+ return NULL;
+ }
+
+ /* Make sure the built-in AF_XDP program is loaded */
+ ret = bpf_get_link_xdp_id(ifindex, &prog_id, cfg.xdp_flags);
+ if (ret) {
+ VLOG_ERR("Get XDP prog ID failed (%s)", ovs_strerror(errno));
+ xsk_socket__delete(xsk->xsk);
+ free(xsk);
+ return NULL;
+ }
+
+ /* Populate (PROD_NUM_DESCS - BATCH_SIZE) elems to the FILL queue */
+ while (!xsk_ring_prod__reserve(&xsk->umem->fq,
+ PROD_NUM_DESCS - BATCH_SIZE, &idx)) {
+ VLOG_WARN_RL(&rl, "Retry xsk_ring_prod__reserve to FILL queue");
+ }
+
+ for (i = 0;
+ i < (PROD_NUM_DESCS - BATCH_SIZE) * FRAME_SIZE;
+ i += FRAME_SIZE) {
+ struct umem_elem *elem;
+ uint64_t addr;
+
+ elem = umem_elem_pop(&xsk->umem->mpool);
+ addr = UMEM2DESC(elem, xsk->umem->buffer);
+
+ *xsk_ring_prod__fill_addr(&xsk->umem->fq, idx++) = addr;
+ }
+
+ xsk_ring_prod__submit(&xsk->umem->fq,
+ PROD_NUM_DESCS - BATCH_SIZE);
+ return xsk;
+}
+
+static struct xsk_socket_info *
+xsk_configure(int ifindex, int xdp_queue_id, int xdpmode)
+{
+ struct xsk_socket_info *xsk;
+ struct xsk_umem_info *umem;
+ void *bufs;
+ int ret;
+
+ /* umem memory region */
+ ret = posix_memalign(&bufs, get_page_size(),
+ NUM_FRAMES * FRAME_SIZE);
+ if (ret) {
+ VLOG_ERR("posix_memalign fails: %s", ovs_strerror(ret));
+ return NULL;
+ }
+ memset(bufs, 0, NUM_FRAMES * FRAME_SIZE);
+
+ /* create AF_XDP socket */
+ umem = xsk_configure_umem(bufs,
+ NUM_FRAMES * FRAME_SIZE,
+ xdpmode);
+ if (!umem) {
+ free(bufs);
+ return NULL;
+ }
+
+ xsk = xsk_configure_socket(umem, ifindex, xdp_queue_id, xdpmode);
+ if (!xsk) {
+ /* clean up umem and xpacket pool */
+ if (xsk_umem__delete(umem->umem)) {
+ VLOG_ERR("xsk_umem__delete failed");
+ }
+ free(bufs);
+ umem_pool_cleanup(&umem->mpool);
+ xpacket_pool_cleanup(&umem->xpool);
+ free(umem);
+ }
+ return xsk;
+}
+
+int
+xsk_configure_all(struct netdev *netdev)
+{
+ struct netdev_linux *dev = netdev_linux_cast(netdev);
+ struct xsk_socket_info *xsk;
+ int i, ifindex;
+
+ ifindex = linux_get_ifindex(netdev_get_name(netdev));
+
+ /* configure each queue */
+ for (i = 0; i < netdev->n_rxq; i++) {
+ VLOG_INFO("%s configure queue %d mode %s", __func__, i,
+ dev->xdpmode == XDP_COPY ? "SKB" : "DRV");
+ xsk = xsk_configure(ifindex, i, dev->xdpmode);
+ if (!xsk) {
+ VLOG_ERR("failed to create AF_XDP socket on queue %d", i);
+ goto err;
+ }
+ dev->xsk[i] = xsk;
+ xsk->rx_dropped = 0;
+ xsk->tx_dropped = 0;
+ }
+
+ return 0;
+
+err:
+ xsk_destroy_all(netdev);
+ return EINVAL;
+}
+
+static void
+xsk_destroy(struct xsk_socket_info *xsk)
+{
+ struct xsk_umem *umem;
+
+ if (!xsk) {
+ return;
+ }
+
+ umem = xsk->umem->umem;
+ xsk_socket__delete(xsk->xsk);
+ if (xsk_umem__delete(umem)) {
+ VLOG_ERR("xsk_umem__delete failed");
+ }
+
+ /* free the packet buffer */
+ free(xsk->umem->buffer);
+
+ /* cleanup umem pool */
+ umem_pool_cleanup(&xsk->umem->mpool);
+
+ /* cleanup metadata pool */
+ xpacket_pool_cleanup(&xsk->umem->xpool);
+
+ free(xsk->umem);
+ free(xsk);
+}
+
+void
+xsk_destroy_all(struct netdev *netdev)
+{
+ struct netdev_linux *dev = netdev_linux_cast(netdev);
+ int i, ifindex;
+
+ ifindex = linux_get_ifindex(netdev_get_name(netdev));
+
+ for (i = 0; i < MAX_XSKQ; i++) {
+ if (dev->xsk[i]) {
+ VLOG_INFO("destroy xsk[%d]", i);
+ xsk_destroy(dev->xsk[i]);
+ dev->xsk[i] = NULL;
+ /* The drop counters live inside the xsk that was just freed,
+ * so there is nothing left to reset here. */
+ }
+ }
+ VLOG_INFO("remove xdp program");
+ xsk_remove_xdp_program(ifindex, dev->xdpmode);
+}
+
+static inline void OVS_UNUSED
+log_xsk_stat(struct xsk_socket_info *xsk OVS_UNUSED) {
+ struct xdp_statistics stat;
+ socklen_t optlen;
+
+ optlen = sizeof stat;
+ ovs_assert(getsockopt(xsk_socket__fd(xsk->xsk), SOL_XDP, XDP_STATISTICS,
+ &stat, &optlen) == 0);
+
+ VLOG_DBG_RL(&rl, "rx dropped %llu, rx_invalid %llu, tx_invalid %llu",
+ stat.rx_dropped,
+ stat.rx_invalid_descs,
+ stat.tx_invalid_descs);
+}
+
+int
+netdev_afxdp_set_config(struct netdev *netdev, const struct smap *args,
+ char **errp OVS_UNUSED)
+{
+ struct netdev_linux *dev = netdev_linux_cast(netdev);
+ const char *str_xdpmode;
+ int xdpmode, new_n_rxq;
+
+ ovs_mutex_lock(&dev->mutex);
+ new_n_rxq = MAX(smap_get_int(args, "n_rxq", NR_QUEUE), 1);
+ if (new_n_rxq > MAX_XSKQ) {
+ ovs_mutex_unlock(&dev->mutex);
+ VLOG_ERR("%s: Too big 'n_rxq' (%d > %d).",
+ netdev_get_name(netdev), new_n_rxq, MAX_XSKQ);
+ return EINVAL;
+ }
+
+ str_xdpmode = smap_get_def(args, "xdpmode", "skb");
+ if (!strcasecmp(str_xdpmode, "drv")) {
+ xdpmode = XDP_ZEROCOPY;
+ } else if (!strcasecmp(str_xdpmode, "skb")) {
+ xdpmode = XDP_COPY;
+ } else {
+ VLOG_ERR("%s: Incorrect xdpmode (%s).",
+ netdev_get_name(netdev), str_xdpmode);
+ ovs_mutex_unlock(&dev->mutex);
+ return EINVAL;
+ }
+
+ if (dev->requested_n_rxq != new_n_rxq
+ || dev->requested_xdpmode != xdpmode) {
+ dev->requested_n_rxq = new_n_rxq;
+ dev->requested_xdpmode = xdpmode;
+ netdev_request_reconfigure(netdev);
+ }
+ ovs_mutex_unlock(&dev->mutex);
+ return 0;
+}
+
+int
+netdev_afxdp_get_config(const struct netdev *netdev, struct smap *args)
+{
+ struct netdev_linux *dev = netdev_linux_cast(netdev);
+
+ ovs_mutex_lock(&dev->mutex);
+ smap_add_format(args, "n_rxq", "%d", netdev->n_rxq);
+ smap_add_format(args, "xdpmode", "%s",
+ dev->xdp_bind_flags == XDP_ZEROCOPY ? "drv" : "skb");
+ ovs_mutex_unlock(&dev->mutex);
+ return 0;
+}
+
+int
+netdev_afxdp_reconfigure(struct netdev *netdev)
+{
+ struct netdev_linux *dev = netdev_linux_cast(netdev);
+ struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
+ int err = 0;
+
+ ovs_mutex_lock(&dev->mutex);
+
+ if (netdev->n_rxq == dev->requested_n_rxq
+ && dev->xdpmode == dev->requested_xdpmode) {
+ goto out;
+ }
+
+ xsk_destroy_all(netdev);
+ netdev->n_rxq = dev->requested_n_rxq;
+
+ if (dev->requested_xdpmode == XDP_ZEROCOPY) {
+ VLOG_INFO("AF_XDP device %s in DRV mode", netdev_get_name(netdev));
+ /* From SKB mode to DRV mode */
+ dev->xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST | XDP_FLAGS_DRV_MODE;
+ dev->xdp_bind_flags = XDP_ZEROCOPY;
+ dev->xdpmode = XDP_ZEROCOPY;
+
+ if (setrlimit(RLIMIT_MEMLOCK, &r)) {
+ VLOG_ERR("ERROR: setrlimit(RLIMIT_MEMLOCK): %s",
+ ovs_strerror(errno));
+ }
+ } else {
+ VLOG_INFO("AF_XDP device %s in SKB mode", netdev_get_name(netdev));
+ /* From DRV mode to SKB mode */
+ dev->xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST | XDP_FLAGS_SKB_MODE;
+ dev->xdp_bind_flags = XDP_COPY;
+ dev->xdpmode = XDP_COPY;
+ /* TODO: set rlimit back to previous value
+ * when no device is in DRV mode.
+ */
+ }
+
+ err = xsk_configure_all(netdev);
+ if (err) {
+ VLOG_ERR("AF_XDP device %s reconfig fails", netdev_get_name(netdev));
+ }
+ netdev_change_seq_changed(netdev);
+out:
+ ovs_mutex_unlock(&dev->mutex);
+ return err;
+}
+
+int
+netdev_afxdp_get_numa_id(const struct netdev *netdev)
+{
+ /* FIXME: Get netdev's PCIe device ID, then find
+ * its NUMA node id.
+ */
+ VLOG_INFO("FIXME: Device %s always use numa id 0",
+ netdev_get_name(netdev));
+ return 0;
+}
+
+static void
+xsk_remove_xdp_program(uint32_t ifindex, int xdpmode)
+{
+ uint32_t curr_prog_id = 0;
+ uint32_t flags;
+
+ /* remove_xdp_program() */
+ if (xdpmode == XDP_COPY) {
+ flags = XDP_FLAGS_UPDATE_IF_NOEXIST | XDP_FLAGS_SKB_MODE;
+ } else {
+ flags = XDP_FLAGS_UPDATE_IF_NOEXIST | XDP_FLAGS_DRV_MODE;
+ }
+
+ if (bpf_get_link_xdp_id(ifindex, &curr_prog_id, flags)) {
+ bpf_set_link_xdp_fd(ifindex, -1, flags);
+ }
+ if (prog_id == curr_prog_id) {
+ bpf_set_link_xdp_fd(ifindex, -1, flags);
+ } else if (!curr_prog_id) {
+ VLOG_INFO("couldn't find a prog id on a given interface");
+ } else {
+ VLOG_INFO("program on interface changed, not removing");
+ }
+}
+
+void
+signal_remove_xdp(struct netdev *netdev)
+{
+ struct netdev_linux *dev = netdev_linux_cast(netdev);
+ int ifindex;
+
+ ifindex = linux_get_ifindex(netdev_get_name(netdev));
+
+ VLOG_WARN("force remove xdp program");
+ xsk_remove_xdp_program(ifindex, dev->xdpmode);
+}
+
+static struct dp_packet_afxdp *
+dp_packet_cast_afxdp(const struct dp_packet *d)
+{
+ ovs_assert(d->source == DPBUF_AFXDP);
+ return CONTAINER_OF(d, struct dp_packet_afxdp, packet);
+}
+
+void
+free_afxdp_buf(struct dp_packet *p)
+{
+ struct dp_packet_afxdp *xpacket;
+ unsigned long addr;
+
+ xpacket = dp_packet_cast_afxdp(p);
+ if (xpacket->mpool) {
+ void *base = dp_packet_base(p);
+
+ addr = (unsigned long)base & (~FRAME_SHIFT_MASK);
+ umem_elem_push(xpacket->mpool, (void *)addr);
+ }
+}
+
+static void
+free_afxdp_buf_batch(struct dp_packet_batch *batch)
+{
+ struct dp_packet_afxdp *xpacket = NULL;
+ struct dp_packet *packet;
+ void *elems[BATCH_SIZE];
+ unsigned long addr;
+
+ /* All packets are AF_XDP, so free their umem frames in one batch. */
+ DP_PACKET_BATCH_FOR_EACH (i, packet, batch) {
+ xpacket = dp_packet_cast_afxdp(packet);
+ if (xpacket->mpool) {
+ void *base = dp_packet_base(packet);
+
+ addr = (unsigned long)base & (~FRAME_SHIFT_MASK);
+ elems[i] = (void *)addr;
+ }
+ }
+ umem_elem_push_n(xpacket->mpool, batch->count, elems);
+ dp_packet_batch_init(batch);
+}
+
+int
+netdev_afxdp_rxq_recv(struct netdev_rxq *rxq_, struct dp_packet_batch *batch,
+ int *qfill)
+{
+ struct netdev_rxq_linux *rx = netdev_rxq_linux_cast(rxq_);
+ struct netdev *netdev = rx->up.netdev;
+ struct netdev_linux *dev = netdev_linux_cast(netdev);
+ struct umem_elem *elems[BATCH_SIZE];
+ uint32_t idx_rx = 0, idx_fq = 0;
+ struct xsk_socket_info *xsk;
+ int qid = rxq_->queue_id;
+ unsigned int rcvd, i;
+ int ret = 0;
+
+ xsk = dev->xsk[qid];
+ rx->fd = xsk_socket__fd(xsk->xsk);
+
+ /* See if there is any packet on the RX queue. If so, idx_rx is the
+ * index of the first received descriptor.
+ */
+ rcvd = xsk_ring_cons__peek(&xsk->rx, BATCH_SIZE, &idx_rx);
+ if (!rcvd) {
+ return 0;
+ }
+
+ ret = umem_elem_pop_n(&xsk->umem->mpool, rcvd, (void **)elems);
+ if (OVS_UNLIKELY(ret)) {
+ xsk_ring_cons__release(&xsk->rx, rcvd);
+ xsk->rx_dropped += rcvd;
+ return ENOMEM;
+ }
+
+ /* Prepare for the FILL queue */
+ if (!xsk_ring_prod__reserve(&xsk->umem->fq, rcvd, &idx_fq)) {
+ /* The FILL queue is full, don't retry or process rx. Wait for kernel
+ * to move received packets from FILL queue to RX queue.
+ */
+ umem_elem_push_n(&xsk->umem->mpool, rcvd, (void **)elems);
+ xsk_ring_cons__release(&xsk->rx, rcvd);
+ xsk->rx_dropped += rcvd;
+ return ENOMEM;
+ }
+
+ /* Setup a dp_packet batch from descriptors in RX queue */
+ for (i = 0; i < rcvd; i++) {
+ uint64_t addr = xsk_ring_cons__rx_desc(&xsk->rx, idx_rx)->addr;
+ uint32_t len = xsk_ring_cons__rx_desc(&xsk->rx, idx_rx)->len;
+ char *pkt = xsk_umem__get_data(xsk->umem->buffer, addr);
+ uint64_t index;
+
+ struct dp_packet_afxdp *xpacket;
+ struct dp_packet *packet;
+
+ index = addr >> FRAME_SHIFT;
+ xpacket = UMEM2XPKT(xsk->umem->xpool.array, index);
+ packet = &xpacket->packet;
+
+ /* Initialize the struct dp_packet */
+ dp_packet_use_afxdp(packet, pkt, FRAME_SIZE - FRAME_HEADROOM);
+ dp_packet_set_size(packet, len);
+
+ /* Add packet into batch, increase batch->count */
+ dp_packet_batch_add(batch, packet);
+
+ idx_rx++;
+ }
+ /* Release the RX queue */
+ xsk_ring_cons__release(&xsk->rx, rcvd);
+
+ for (i = 0; i < rcvd; i++) {
+ uint64_t index;
+ struct umem_elem *elem;
+
+ /* Get one free umem, program it into FILL queue */
+ elem = elems[i];
+ index = (uint64_t)((char *)elem - (char *)xsk->umem->buffer);
+ ovs_assert((index & FRAME_SHIFT_MASK) == 0);
+ *xsk_ring_prod__fill_addr(&xsk->umem->fq, idx_fq) = index;
+
+ idx_fq++;
+ }
+ xsk_ring_prod__submit(&xsk->umem->fq, rcvd);
+
+ if (qfill) {
+ /* TODO: return the number of remaining packets in the queue. */
+ *qfill = 0;
+ }
+
+#ifdef AFXDP_DEBUG
+ log_xsk_stat(xsk);
+#endif
+ return 0;
+}
+
+static inline int
+kick_tx(struct xsk_socket_info *xsk)
+{
+ int ret;
+
+ /* This causes system call into kernel's xsk_sendmsg, and
+ * xsk_generic_xmit (skb mode) or xsk_async_xmit (driver mode).
+ */
+ ret = sendto(xsk_socket__fd(xsk->xsk), NULL, 0, MSG_DONTWAIT, NULL, 0);
+ if (OVS_UNLIKELY(ret < 0)) {
+ if (errno == ENXIO || errno == ENOBUFS || errno == EOPNOTSUPP) {
+ return errno;
+ }
+ }
+ /* no error, or EBUSY or EAGAIN */
+ return 0;
+}
+
+static inline bool
+check_free_batch(struct dp_packet_batch *batch)
+{
+ struct umem_pool *first_mpool = NULL;
+ struct dp_packet_afxdp *xpacket;
+ struct dp_packet *packet;
+
+ DP_PACKET_BATCH_FOR_EACH (i, packet, batch) {
+ if (packet->source != DPBUF_AFXDP) {
+ return false;
+ }
+ xpacket = dp_packet_cast_afxdp(packet);
+ if (i == 0) {
+ first_mpool = xpacket->mpool;
+ continue;
+ }
+ if (xpacket->mpool != first_mpool) {
+ return false;
+ }
+ }
+ /* All packets are DPBUF_AFXDP and from the same mpool */
+ return true;
+}
+
+static inline void
+afxdp_complete_tx(struct xsk_socket_info *xsk)
+{
+ struct umem_elem *elems_push[BATCH_SIZE];
+ uint32_t idx_cq = 0;
+ int tx_done, j, ret;
+
+ if (!xsk->outstanding_tx) {
+ return;
+ }
+
+ ret = kick_tx(xsk);
+ if (OVS_UNLIKELY(ret)) {
+ VLOG_WARN_RL(&rl, "error sending AF_XDP packet: %s",
+ ovs_strerror(ret));
+ }
+
+ tx_done = xsk_ring_cons__peek(&xsk->umem->cq, BATCH_SIZE, &idx_cq);
+
+ /* Recycle completed frames back to the umem pool. Read the addresses
+ * before releasing the CQ entries so the kernel cannot overwrite them. */
+ for (j = 0; j < tx_done; j++) {
+ struct umem_elem *elem;
+ uint64_t addr;
+
+ addr = *xsk_ring_cons__comp_addr(&xsk->umem->cq, idx_cq++);
+ elem = ALIGNED_CAST(struct umem_elem *,
+ (char *)xsk->umem->buffer + addr);
+ elems_push[j] = elem;
+ }
+ umem_elem_push_n(&xsk->umem->mpool, tx_done, (void **)elems_push);
+ if (tx_done > 0) {
+ xsk_ring_cons__release(&xsk->umem->cq, tx_done);
+ xsk->outstanding_tx -= tx_done;
+ }
+}
+
+int
+netdev_afxdp_batch_send(struct netdev *netdev_, int qid,
+ struct dp_packet_batch *batch,
+ bool concurrent_txq)
+{
+ struct netdev_linux *dev = netdev_linux_cast(netdev_);
+ struct xsk_socket_info *xsk = dev->xsk[qid];
+ struct umem_elem *elems_pop[BATCH_SIZE];
+ struct dp_packet *packet;
+ bool free_batch = true;
+ uint32_t idx = 0;
+ int error = 0;
+ int ret;
+
+ if (OVS_UNLIKELY(concurrent_txq)) {
+ ovs_spin_lock(&dev->tx_lock);
+ }
+
+ /* Process CQ first. */
+ afxdp_complete_tx(xsk);
+
+ free_batch = check_free_batch(batch);
+
+ ret = umem_elem_pop_n(&xsk->umem->mpool, batch->count, (void **)elems_pop);
+ if (OVS_UNLIKELY(ret)) {
+ xsk->tx_dropped += batch->count;
+ error = ENOMEM;
+ goto out;
+ }
+
+ /* Make sure we have enough TX descs */
+ ret = xsk_ring_prod__reserve(&xsk->tx, batch->count, &idx);
+ if (OVS_UNLIKELY(ret == 0)) {
+ umem_elem_push_n(&xsk->umem->mpool, batch->count, (void **)elems_pop);
+ xsk->tx_dropped += batch->count;
+ error = ENOMEM;
+ goto out;
+ }
+
+ DP_PACKET_BATCH_FOR_EACH (i, packet, batch) {
+ struct umem_elem *elem;
+ uint64_t index;
+
+ elem = elems_pop[i];
+ /* Copy the packet to the umem we just popped from the umem pool.
+ * TODO: avoid this copy if the packet and the popped umem
+ * are located in the same umem.
+ */
+ memcpy(elem, dp_packet_data(packet), dp_packet_size(packet));
+
+ index = (uint64_t)((char *)elem - (char *)xsk->umem->buffer);
+ xsk_ring_prod__tx_desc(&xsk->tx, idx + i)->addr = index;
+ xsk_ring_prod__tx_desc(&xsk->tx, idx + i)->len
+ = dp_packet_size(packet);
+ }
+ xsk_ring_prod__submit(&xsk->tx, batch->count);
+ xsk->outstanding_tx += batch->count;
+
+ ret = kick_tx(xsk);
+ if (OVS_UNLIKELY(ret)) {
+ /* The frames are already on the TX ring; completion recycles them. */
+ VLOG_WARN_RL(&rl, "error sending AF_XDP packet: %s",
+ ovs_strerror(ret));
+ }
+
+out:
+ if (free_batch) {
+ free_afxdp_buf_batch(batch);
+ } else {
+ dp_packet_delete_batch(batch, true);
+ }
+
+ if (OVS_UNLIKELY(concurrent_txq)) {
+ ovs_spin_unlock(&dev->tx_lock);
+ }
+ return error;
+}
+
+int
+netdev_afxdp_rxq_construct(struct netdev_rxq *rxq_ OVS_UNUSED)
+{
+ /* Done at reconfigure */
+ return 0;
+}
+
+void
+netdev_afxdp_destruct(struct netdev *netdev_)
+{
+ struct netdev_linux *netdev = netdev_linux_cast(netdev_);
+
+ /* Note: tc is by-passed when using drv-mode, but when using
+ * skb-mode, we might need to clean up tc. */
+
+ xsk_destroy_all(netdev_);
+ ovs_mutex_destroy(&netdev->mutex);
+}
+
+int
+netdev_afxdp_get_stats(const struct netdev *netdev_,
+ struct netdev_stats *stats)
+{
+ struct netdev_linux *dev = netdev_linux_cast(netdev_);
+ struct netdev_stats dev_stats;
+ struct xsk_socket_info *xsk;
+ int error, i;
+
+ ovs_mutex_lock(&dev->mutex);
+
+ error = get_stats_via_netlink(netdev_, &dev_stats);
+ if (error) {
+ VLOG_WARN_RL(&rl, "Error getting AF_XDP statistics");
+ } else {
+ /* Use kernel netdev's packet and byte counts */
+ stats->rx_packets = dev_stats.rx_packets;
+ stats->rx_bytes = dev_stats.rx_bytes;
+ stats->tx_packets = dev_stats.tx_packets;
+ stats->tx_bytes = dev_stats.tx_bytes;
+
+ stats->rx_errors = dev_stats.rx_errors;
+ stats->tx_errors = dev_stats.tx_errors;
+ stats->rx_dropped = dev_stats.rx_dropped;
+ stats->tx_dropped = dev_stats.tx_dropped;
+ stats->multicast = dev_stats.multicast;
+ stats->collisions = dev_stats.collisions;
+ stats->rx_length_errors = dev_stats.rx_length_errors;
+ stats->rx_over_errors = dev_stats.rx_over_errors;
+ stats->rx_crc_errors = dev_stats.rx_crc_errors;
+ stats->rx_frame_errors = dev_stats.rx_frame_errors;
+ stats->rx_fifo_errors = dev_stats.rx_fifo_errors;
+ stats->rx_missed_errors = dev_stats.rx_missed_errors;
+ stats->tx_aborted_errors = dev_stats.tx_aborted_errors;
+ stats->tx_carrier_errors = dev_stats.tx_carrier_errors;
+ stats->tx_fifo_errors = dev_stats.tx_fifo_errors;
+ stats->tx_heartbeat_errors = dev_stats.tx_heartbeat_errors;
+ stats->tx_window_errors = dev_stats.tx_window_errors;
+
+ /* Account for the packets dropped in each xsk. */
+ for (i = 0; i < MAX_XSKQ; i++) {
+ xsk = dev->xsk[i];
+ if (xsk) {
+ stats->rx_dropped += xsk->rx_dropped;
+ stats->tx_dropped += xsk->tx_dropped;
+ }
+ }
+ }
+ ovs_mutex_unlock(&dev->mutex);
+
+ return error;
+}
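The buffer paths in the file above rely on two pieces of frame arithmetic: free_afxdp_buf() masks a packet pointer with ~FRAME_SHIFT_MASK to find the start of its umem frame, and netdev_afxdp_rxq_recv() shifts a descriptor address right by FRAME_SHIFT to get the frame index. The standalone sketch below illustrates both steps outside of OVS. All DEMO_* names are hypothetical, and the shift of 11 (2048-byte frames) is an assumption; the patch takes the real value from XSK_UMEM__DEFAULT_FRAME_SHIFT in <bpf/xsk.h>.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed frame geometry, for illustration only. */
#define DEMO_FRAME_SHIFT 11
#define DEMO_FRAME_SIZE (UINT64_C(1) << DEMO_FRAME_SHIFT)
#define DEMO_FRAME_SHIFT_MASK (DEMO_FRAME_SIZE - 1)

/* Rounds 'addr' down to the start of the frame that contains it. */
static inline uint64_t
demo_frame_base(uint64_t addr)
{
    return addr & ~DEMO_FRAME_SHIFT_MASK;
}

/* Returns the index of the frame that contains 'addr'. */
static inline uint64_t
demo_frame_index(uint64_t addr)
{
    return addr >> DEMO_FRAME_SHIFT;
}
```

Because the frame size is a power of two, both operations are a single mask or shift, which is probably why the patch asserts that every address pushed back to the pool has its low FRAME_SHIFT bits clear.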
new file mode 100644
@@ -0,0 +1,77 @@
+/*
+ * Copyright (c) 2018, 2019 Nicira, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#ifndef NETDEV_AFXDP_H
+#define NETDEV_AFXDP_H 1
+
+#include <config.h>
+
+#ifdef HAVE_AF_XDP
+
+#include <stdint.h>
+#include <stdbool.h>
+
+/* These functions are Linux AF_XDP specific, so they should be used directly
+ * only by Linux-specific code. */
+
+#define MAX_XSKQ 16
+
+struct netdev;
+struct xsk_socket_info;
+struct xdp_umem;
+struct dp_packet_batch;
+struct smap;
+struct dp_packet;
+struct netdev_rxq;
+struct netdev_stats;
+
+int xsk_configure_all(struct netdev *netdev);
+void xsk_destroy_all(struct netdev *netdev);
+
+int netdev_afxdp_rxq_construct(struct netdev_rxq *rxq_);
+void netdev_afxdp_destruct(struct netdev *netdev_);
+
+int netdev_afxdp_rxq_recv(struct netdev_rxq *rxq_,
+ struct dp_packet_batch *batch,
+ int *qfill);
+int netdev_afxdp_batch_send(struct netdev *netdev_, int qid,
+ struct dp_packet_batch *batch,
+ bool concurrent_txq);
+int netdev_afxdp_set_config(struct netdev *netdev, const struct smap *args,
+ char **errp);
+int netdev_afxdp_get_config(const struct netdev *netdev, struct smap *args);
+int netdev_afxdp_get_numa_id(const struct netdev *netdev);
+int netdev_afxdp_get_stats(const struct netdev *netdev_,
+ struct netdev_stats *stats);
+
+void free_afxdp_buf(struct dp_packet *p);
+int netdev_afxdp_reconfigure(struct netdev *netdev);
+void signal_remove_xdp(struct netdev *netdev);
+
+#else /* !HAVE_AF_XDP */
+
+#include "openvswitch/compiler.h"
+
+struct dp_packet;
+
+static inline void
+free_afxdp_buf(struct dp_packet *p OVS_UNUSED)
+{
+ /* Nothing */
+}
+
+#endif /* HAVE_AF_XDP */
+#endif /* netdev-afxdp.h */
new file mode 100644
@@ -0,0 +1,139 @@
+/*
+ * Copyright (c) 2019 Nicira, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#ifndef NETDEV_LINUX_PRIVATE_H
+#define NETDEV_LINUX_PRIVATE_H 1
+
+#include <config.h>
+
+#include <linux/filter.h>
+#include <linux/gen_stats.h>
+#include <linux/if_ether.h>
+#include <linux/if_tun.h>
+#include <linux/types.h>
+#include <linux/ethtool.h>
+#include <linux/mii.h>
+#include <stdint.h>
+#include <stdbool.h>
+
+#include "netdev-afxdp.h"
+#include "netdev-provider.h"
+#include "netdev-tc-offloads.h"
+#include "netdev-vport.h"
+#include "openvswitch/thread.h"
+#include "ovs-atomic.h"
+#include "timer.h"
+#include "xdpsock.h"
+
+/* These functions are Linux specific, so they should be used directly only by
+ * Linux-specific code. */
+
+struct netdev;
+
+struct netdev_rxq_linux {
+ struct netdev_rxq up;
+ bool is_tap;
+ int fd;
+};
+
+void netdev_linux_run(const struct netdev_class *);
+
+int netdev_linux_ethtool_set_flag(struct netdev *netdev, uint32_t flag,
+ const char *flag_name, bool enable);
+
+int get_stats_via_netlink(const struct netdev *netdev_,
+ struct netdev_stats *stats);
+
+struct netdev_linux {
+ struct netdev up;
+
+ /* Protects all members below. */
+ struct ovs_mutex mutex;
+
+ unsigned int cache_valid;
+
+ bool miimon; /* Link status of last poll. */
+ long long int miimon_interval; /* Miimon Poll rate. Disabled if <= 0. */
+ struct timer miimon_timer;
+
+ int netnsid; /* Network namespace ID. */
+ /* The following are figured out "on demand" only. They are only valid
+ * when the corresponding VALID_* bit in 'cache_valid' is set. */
+ int ifindex;
+ struct eth_addr etheraddr;
+ int mtu;
+ unsigned int ifi_flags;
+ long long int carrier_resets;
+ uint32_t kbits_rate; /* Policing data. */
+ uint32_t kbits_burst;
+ int vport_stats_error; /* Cached error code from vport_get_stats().
+ 0 or an errno value. */
+ int netdev_mtu_error; /* Cached error code from SIOCGIFMTU
+ * or SIOCSIFMTU.
+ */
+ int ether_addr_error; /* Cached error code from set/get etheraddr. */
+ int netdev_policing_error; /* Cached error code from set policing. */
+ int get_features_error; /* Cached error code from ETHTOOL_GSET. */
+ int get_ifindex_error; /* Cached error code from SIOCGIFINDEX. */
+
+ enum netdev_features current; /* Cached from ETHTOOL_GSET. */
+ enum netdev_features advertised; /* Cached from ETHTOOL_GSET. */
+ enum netdev_features supported; /* Cached from ETHTOOL_GSET. */
+
+ struct ethtool_drvinfo drvinfo; /* Cached from ETHTOOL_GDRVINFO. */
+ struct tc *tc;
+
+ /* For devices of class netdev_tap_class only. */
+ int tap_fd;
+ bool present; /* If the device is present in the namespace */
+ uint64_t tx_dropped; /* tap device can drop if the iface is down */
+
+ /* LAG information. */
+ bool is_lag_master; /* True if the netdev is a LAG master. */
+
+ /* AF_XDP information */
+#ifdef HAVE_AF_XDP
+ struct xsk_socket_info *xsk[MAX_XSKQ];
+ int requested_n_rxq;
+ int xdpmode, requested_xdpmode; /* detect mode changed */
+ int xdp_flags, xdp_bind_flags;
+ ovs_spinlock_t tx_lock;
+#endif
+};
+
+static bool
+is_netdev_linux_class(const struct netdev_class *netdev_class)
+{
+ return netdev_class->run == netdev_linux_run;
+}
+
+static struct netdev_linux *
+netdev_linux_cast(const struct netdev *netdev)
+{
+ ovs_assert(is_netdev_linux_class(netdev_get_class(netdev)));
+
+ return CONTAINER_OF(netdev, struct netdev_linux, up);
+}
+
+static struct netdev_rxq_linux *
+netdev_rxq_linux_cast(const struct netdev_rxq *rx)
+{
+ ovs_assert(is_netdev_linux_class(netdev_get_class(rx->netdev)));
+
+ return CONTAINER_OF(rx, struct netdev_rxq_linux, up);
+}
+
+#endif /* netdev-linux-private.h */
@@ -17,6 +17,7 @@
#include <config.h>
#include "netdev-linux.h"
+#include "netdev-linux-private.h"
#include <errno.h>
#include <fcntl.h>
@@ -54,6 +55,7 @@
#include "fatal-signal.h"
#include "hash.h"
#include "openvswitch/hmap.h"
+#include "netdev-afxdp.h"
#include "netdev-provider.h"
#include "netdev-tc-offloads.h"
#include "netdev-vport.h"
@@ -487,57 +489,6 @@ static int tc_calc_cell_log(unsigned int mtu);
static void tc_fill_rate(struct tc_ratespec *rate, uint64_t bps, int mtu);
static int tc_calc_buffer(unsigned int Bps, int mtu, uint64_t burst_bytes);
-struct netdev_linux {
- struct netdev up;
-
- /* Protects all members below. */
- struct ovs_mutex mutex;
-
- unsigned int cache_valid;
-
- bool miimon; /* Link status of last poll. */
- long long int miimon_interval; /* Miimon Poll rate. Disabled if <= 0. */
- struct timer miimon_timer;
-
- int netnsid; /* Network namespace ID. */
- /* The following are figured out "on demand" only. They are only valid
- * when the corresponding VALID_* bit in 'cache_valid' is set. */
- int ifindex;
- struct eth_addr etheraddr;
- int mtu;
- unsigned int ifi_flags;
- long long int carrier_resets;
- uint32_t kbits_rate; /* Policing data. */
- uint32_t kbits_burst;
- int vport_stats_error; /* Cached error code from vport_get_stats().
- 0 or an errno value. */
- int netdev_mtu_error; /* Cached error code from SIOCGIFMTU or SIOCSIFMTU. */
- int ether_addr_error; /* Cached error code from set/get etheraddr. */
- int netdev_policing_error; /* Cached error code from set policing. */
- int get_features_error; /* Cached error code from ETHTOOL_GSET. */
- int get_ifindex_error; /* Cached error code from SIOCGIFINDEX. */
-
- enum netdev_features current; /* Cached from ETHTOOL_GSET. */
- enum netdev_features advertised; /* Cached from ETHTOOL_GSET. */
- enum netdev_features supported; /* Cached from ETHTOOL_GSET. */
-
- struct ethtool_drvinfo drvinfo; /* Cached from ETHTOOL_GDRVINFO. */
- struct tc *tc;
-
- /* For devices of class netdev_tap_class only. */
- int tap_fd;
- bool present; /* If the device is present in the namespace */
- uint64_t tx_dropped; /* tap device can drop if the iface is down */
-
- /* LAG information. */
- bool is_lag_master; /* True if the netdev is a LAG master. */
-};
-
-struct netdev_rxq_linux {
- struct netdev_rxq up;
- bool is_tap;
- int fd;
-};
/* This is set pretty low because we probably won't learn anything from the
* additional log messages. */
@@ -551,8 +502,6 @@ static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
* changes in the device miimon status, so we can use atomic_count. */
static atomic_count miimon_cnt = ATOMIC_COUNT_INIT(0);
-static void netdev_linux_run(const struct netdev_class *);
-
static int netdev_linux_do_ethtool(const char *name, struct ethtool_cmd *,
int cmd, const char *cmd_name);
static int get_flags(const struct netdev *, unsigned int *flags);
@@ -566,7 +515,6 @@ static int do_set_addr(struct netdev *netdev,
struct in_addr addr);
static int get_etheraddr(const char *netdev_name, struct eth_addr *ea);
static int set_etheraddr(const char *netdev_name, const struct eth_addr);
-static int get_stats_via_netlink(const struct netdev *, struct netdev_stats *);
static int af_packet_sock(void);
static bool netdev_linux_miimon_enabled(void);
static void netdev_linux_miimon_run(void);
@@ -574,31 +522,10 @@ static void netdev_linux_miimon_wait(void);
static int netdev_linux_get_mtu__(struct netdev_linux *netdev, int *mtup);
static bool
-is_netdev_linux_class(const struct netdev_class *netdev_class)
-{
- return netdev_class->run == netdev_linux_run;
-}
-
-static bool
is_tap_netdev(const struct netdev *netdev)
{
return netdev_get_class(netdev) == &netdev_tap_class;
}
-
-static struct netdev_linux *
-netdev_linux_cast(const struct netdev *netdev)
-{
- ovs_assert(is_netdev_linux_class(netdev_get_class(netdev)));
-
- return CONTAINER_OF(netdev, struct netdev_linux, up);
-}
-
-static struct netdev_rxq_linux *
-netdev_rxq_linux_cast(const struct netdev_rxq *rx)
-{
- ovs_assert(is_netdev_linux_class(netdev_get_class(rx->netdev)));
- return CONTAINER_OF(rx, struct netdev_rxq_linux, up);
-}
static int
netdev_linux_netnsid_update__(struct netdev_linux *netdev)
@@ -774,7 +701,7 @@ netdev_linux_update_lag(struct rtnetlink_change *change)
}
}
-static void
+void
netdev_linux_run(const struct netdev_class *netdev_class OVS_UNUSED)
{
struct nl_sock *sock;
@@ -3279,9 +3206,7 @@ exit:
.run = netdev_linux_run, \
.wait = netdev_linux_wait, \
.alloc = netdev_linux_alloc, \
- .destruct = netdev_linux_destruct, \
.dealloc = netdev_linux_dealloc, \
- .send = netdev_linux_send, \
.send_wait = netdev_linux_send_wait, \
.set_etheraddr = netdev_linux_set_etheraddr, \
.get_etheraddr = netdev_linux_get_etheraddr, \
@@ -3312,10 +3237,8 @@ exit:
.arp_lookup = netdev_linux_arp_lookup, \
.update_flags = netdev_linux_update_flags, \
.rxq_alloc = netdev_linux_rxq_alloc, \
- .rxq_construct = netdev_linux_rxq_construct, \
.rxq_destruct = netdev_linux_rxq_destruct, \
.rxq_dealloc = netdev_linux_rxq_dealloc, \
- .rxq_recv = netdev_linux_rxq_recv, \
.rxq_wait = netdev_linux_rxq_wait, \
.rxq_drain = netdev_linux_rxq_drain
@@ -3323,30 +3246,64 @@ const struct netdev_class netdev_linux_class = {
NETDEV_LINUX_CLASS_COMMON,
LINUX_FLOW_OFFLOAD_API,
.type = "system",
+ .is_pmd = false,
.construct = netdev_linux_construct,
+ .destruct = netdev_linux_destruct,
.get_stats = netdev_linux_get_stats,
.get_features = netdev_linux_get_features,
.get_status = netdev_linux_get_status,
- .get_block_id = netdev_linux_get_block_id
+ .get_block_id = netdev_linux_get_block_id,
+ .send = netdev_linux_send,
+ .rxq_construct = netdev_linux_rxq_construct,
+ .rxq_recv = netdev_linux_rxq_recv,
};
const struct netdev_class netdev_tap_class = {
NETDEV_LINUX_CLASS_COMMON,
.type = "tap",
+ .is_pmd = false,
.construct = netdev_linux_construct_tap,
+ .destruct = netdev_linux_destruct,
.get_stats = netdev_tap_get_stats,
.get_features = netdev_linux_get_features,
.get_status = netdev_linux_get_status,
+ .send = netdev_linux_send,
+ .rxq_construct = netdev_linux_rxq_construct,
+ .rxq_recv = netdev_linux_rxq_recv,
};
const struct netdev_class netdev_internal_class = {
NETDEV_LINUX_CLASS_COMMON,
LINUX_FLOW_OFFLOAD_API,
.type = "internal",
+ .is_pmd = false,
.construct = netdev_linux_construct,
+ .destruct = netdev_linux_destruct,
.get_stats = netdev_internal_get_stats,
.get_status = netdev_internal_get_status,
+ .send = netdev_linux_send,
+ .rxq_construct = netdev_linux_rxq_construct,
+ .rxq_recv = netdev_linux_rxq_recv,
};
+
+#ifdef HAVE_AF_XDP
+const struct netdev_class netdev_afxdp_class = {
+ NETDEV_LINUX_CLASS_COMMON,
+ .type = "afxdp",
+ .is_pmd = true,
+ .construct = netdev_linux_construct,
+ .destruct = netdev_afxdp_destruct,
+ .get_stats = netdev_afxdp_get_stats,
+ .get_status = netdev_linux_get_status,
+ .set_config = netdev_afxdp_set_config,
+ .get_config = netdev_afxdp_get_config,
+ .reconfigure = netdev_afxdp_reconfigure,
+ .get_numa_id = netdev_afxdp_get_numa_id,
+ .send = netdev_afxdp_batch_send,
+ .rxq_construct = netdev_afxdp_rxq_construct,
+ .rxq_recv = netdev_afxdp_rxq_recv,
+};
+#endif
#define CODEL_N_QUEUES 0x0000
@@ -5918,7 +5875,7 @@ netdev_stats_from_rtnl_link_stats64(struct netdev_stats *dst,
dst->tx_window_errors = src->tx_window_errors;
}
-static int
+int
get_stats_via_netlink(const struct netdev *netdev_, struct netdev_stats *stats)
{
struct ofpbuf request;
@@ -903,6 +903,9 @@ extern const struct netdev_class netdev_linux_class;
extern const struct netdev_class netdev_internal_class;
extern const struct netdev_class netdev_tap_class;
+#ifdef HAVE_AF_XDP
+extern const struct netdev_class netdev_afxdp_class;
+#endif
#ifdef __cplusplus
}
#endif
@@ -104,6 +104,9 @@ static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
static void restore_all_flags(void *aux OVS_UNUSED);
void update_device_args(struct netdev *, const struct shash *args);
+#ifdef HAVE_AF_XDP
+void signal_remove_xdp(struct netdev *netdev);
+#endif
int
netdev_n_txq(const struct netdev *netdev)
@@ -146,6 +149,9 @@ netdev_initialize(void)
netdev_register_provider(&netdev_internal_class);
netdev_register_provider(&netdev_tap_class);
netdev_vport_tunnel_register();
+#ifdef HAVE_AF_XDP
+ netdev_register_provider(&netdev_afxdp_class);
+#endif
#endif
#if defined(__FreeBSD__) || defined(__NetBSD__)
netdev_register_provider(&netdev_tap_class);
@@ -2007,6 +2013,11 @@ restore_all_flags(void *aux OVS_UNUSED)
saved_flags & ~saved_values,
&old_flags);
}
+#ifdef HAVE_AF_XDP
+ if (netdev->netdev_class == &netdev_afxdp_class) {
+ signal_remove_xdp(netdev);
+ }
+#endif
}
}
new file mode 100644
@@ -0,0 +1,70 @@
+/*
+ * Copyright (c) 2018, 2019 Nicira, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+#ifndef SPINLOCK_H
+#define SPINLOCK_H 1
+
+#include <config.h>
+
+#include <ctype.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <stdarg.h>
+#include <stdlib.h>
+#include <unistd.h>
+
+#include "ovs-atomic.h"
+
+typedef struct {
+ atomic_int locked;
+} ovs_spinlock_t;
+
+static inline void
+ovs_spinlock_init(ovs_spinlock_t *sl)
+{
+ atomic_init(&sl->locked, 0);
+}
+
+static inline void
+ovs_spin_lock(ovs_spinlock_t *sl)
+{
+ int exp = 0, locked = 0;
+
+ while (!atomic_compare_exchange_strong_explicit(&sl->locked, &exp, 1,
+ memory_order_acquire,
+ memory_order_relaxed)) {
+ locked = 1;
+ while (locked) {
+ atomic_read_relaxed(&sl->locked, &locked);
+ }
+ exp = 0;
+ }
+}
+
+static inline void
+ovs_spin_unlock(ovs_spinlock_t *sl)
+{
+ atomic_store_explicit(&sl->locked, 0, memory_order_release);
+}
+
+static inline int OVS_UNUSED
+ovs_spin_trylock(ovs_spinlock_t *sl)
+{
+ int exp = 0;
+ return atomic_compare_exchange_strong_explicit(&sl->locked, &exp, 1,
+ memory_order_acquire,
+ memory_order_relaxed);
+}
+#endif
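The lock in the header above is a test-and-test-and-set spinlock: a compare-and-swap with acquire ordering takes the lock, and waiters spin on a relaxed load until the lock looks free before retrying. A minimal standalone sketch of the same idea follows, written against C11 <stdatomic.h> rather than ovs-atomic.h. The demo_* names are hypothetical and this is not the OVS implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for ovs_spinlock_t. */
struct demo_spinlock {
    atomic_int locked;
};

static inline void
demo_spin_init(struct demo_spinlock *sl)
{
    atomic_init(&sl->locked, 0);
}

/* Returns true if the lock was acquired, false if it was already held. */
static inline bool
demo_spin_trylock(struct demo_spinlock *sl)
{
    int expected = 0;

    return atomic_compare_exchange_strong_explicit(&sl->locked, &expected, 1,
                                                   memory_order_acquire,
                                                   memory_order_relaxed);
}

static inline void
demo_spin_lock(struct demo_spinlock *sl)
{
    while (!demo_spin_trylock(sl)) {
        /* Spin on a plain load until the lock looks free, then retry. */
        while (atomic_load_explicit(&sl->locked, memory_order_relaxed)) {
            continue;
        }
    }
}

static inline void
demo_spin_unlock(struct demo_spinlock *sl)
{
    atomic_store_explicit(&sl->locked, 0, memory_order_release);
}
```

Note that the trylock here, like ovs_spin_trylock() in the patch, reports success with a nonzero value, which is the opposite of the pthread_*_trylock() convention of returning 0 on success.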
new file mode 100644
@@ -0,0 +1,183 @@
+/*
+ * Copyright (c) 2018, 2019 Nicira, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+#include <config.h>
+
+#if !HAVE_POSIX_MEMALIGN
+#error POSIX_MEMALIGN is required for AF_XDP
+#endif
+
+#include "xdpsock.h"
+#include "dp-packet.h"
+#include "openvswitch/compiler.h"
+
+/* Note:
+ * umem_elem_push* shouldn't overflow because we always pop
+ * elem first, then push back to the stack.
+ */
+static inline void
+__umem_elem_push_n(struct umem_pool *umemp, int n, void **addrs)
+{
+ void *ptr;
+
+ if (OVS_UNLIKELY(umemp->index + n > umemp->size)) {
+ OVS_NOT_REACHED();
+ }
+
+ ptr = &umemp->array[umemp->index];
+ memcpy(ptr, addrs, n * sizeof(void *));
+ umemp->index += n;
+}
+
+void umem_elem_push_n(struct umem_pool *umemp, int n, void **addrs)
+{
+ ovs_spin_lock(&umemp->mutex);
+ __umem_elem_push_n(umemp, n, addrs);
+ ovs_spin_unlock(&umemp->mutex);
+}
+
+static inline void
+__umem_elem_push(struct umem_pool *umemp, void *addr)
+{
+ if (OVS_UNLIKELY(umemp->index + 1 > umemp->size)) {
+ OVS_NOT_REACHED();
+ }
+
+ umemp->array[umemp->index++] = addr;
+}
+
+void
+umem_elem_push(struct umem_pool *umemp, void *addr)
+{
+
+ ovs_assert(((uint64_t)addr & FRAME_SHIFT_MASK) == 0);
+
+ ovs_spin_lock(&umemp->mutex);
+ __umem_elem_push(umemp, addr);
+ ovs_spin_unlock(&umemp->mutex);
+}
+
+static inline int
+__umem_elem_pop_n(struct umem_pool *umemp, int n, void **addrs)
+{
+ void *ptr;
+
+ if (OVS_UNLIKELY(umemp->index - n < 0)) {
+ return -ENOMEM;
+ }
+
+ umemp->index -= n;
+ ptr = &umemp->array[umemp->index];
+ memcpy(addrs, ptr, n * sizeof(void *));
+
+ return 0;
+}
+
+int
+umem_elem_pop_n(struct umem_pool *umemp, int n, void **addrs)
+{
+ int ret;
+
+ ovs_spin_lock(&umemp->mutex);
+ ret = __umem_elem_pop_n(umemp, n, addrs);
+ ovs_spin_unlock(&umemp->mutex);
+
+ return ret;
+}
+
+static inline void *
+__umem_elem_pop(struct umem_pool *umemp)
+{
+ if (OVS_UNLIKELY(umemp->index - 1 < 0)) {
+ return NULL;
+ }
+
+ return umemp->array[--umemp->index];
+}
+
+void *
+umem_elem_pop(struct umem_pool *umemp)
+{
+ void *ptr;
+
+ ovs_spin_lock(&umemp->mutex);
+ ptr = __umem_elem_pop(umemp);
+ ovs_spin_unlock(&umemp->mutex);
+
+ return ptr;
+}
+
+static void **
+__umem_pool_alloc(unsigned int size)
+{
+ void *bufs;
+ int ret;
+
+ ret = posix_memalign(&bufs, getpagesize(),
+ size * sizeof(void *));
+ if (ret) {
+ return NULL;
+ }
+
+ memset(bufs, 0, size * sizeof(void *));
+ return (void **)bufs;
+}
+
+int
+umem_pool_init(struct umem_pool *umemp, unsigned int size)
+{
+ umemp->array = __umem_pool_alloc(size);
+ if (!umemp->array) {
+ return -ENOMEM;
+ }
+
+ umemp->size = size;
+ umemp->index = 0;
+ ovs_spinlock_init(&umemp->mutex);
+ return 0;
+}
+
+void
+umem_pool_cleanup(struct umem_pool *umemp)
+{
+ free(umemp->array);
+ umemp->array = NULL;
+}
+
+/* AF_XDP metadata init/destroy */
+int
+xpacket_pool_init(struct xpacket_pool *xp, unsigned int size)
+{
+ void *bufs;
+ int ret;
+
+ ret = posix_memalign(&bufs, getpagesize(),
+ size * sizeof(struct dp_packet_afxdp));
+ if (ret) {
+ return -ENOMEM;
+ }
+ memset(bufs, 0, size * sizeof(struct dp_packet_afxdp));
+
+ xp->array = bufs;
+ xp->size = size;
+ return 0;
+}
+
+void
+xpacket_pool_cleanup(struct xpacket_pool *xp)
+{
+ free(xp->array);
+ xp->array = NULL;
+}
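The umem pool in the file above is a LIFO stack of frame pointers: 'index' counts the stored entries, a push copies pointers in at 'index', and a pop copies them out from below it. The sketch below reproduces that layout in a standalone, single-threaded form. The demo_* names are hypothetical, the spinlock is omitted, and, unlike the patch, an overflowing push returns an error instead of aborting.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical single-threaded analogue of struct umem_pool. */
struct demo_pool {
    int index;          /* Number of pointers currently stored. */
    unsigned int size;  /* Capacity of 'array'. */
    void **array;
};

static inline int
demo_pool_init(struct demo_pool *p, unsigned int size)
{
    p->array = calloc(size, sizeof *p->array);
    if (!p->array) {
        return -ENOMEM;
    }
    p->size = size;
    p->index = 0;
    return 0;
}

/* Pushes 'n' pointers; fails without side effects if they do not fit. */
static inline int
demo_pool_push_n(struct demo_pool *p, int n, void **addrs)
{
    if (n < 0 || (unsigned int) p->index + n > p->size) {
        return -ENOSPC;
    }
    memcpy(&p->array[p->index], addrs, n * sizeof *addrs);
    p->index += n;
    return 0;
}

/* Pops 'n' pointers; fails without side effects if fewer are stored. */
static inline int
demo_pool_pop_n(struct demo_pool *p, int n, void **addrs)
{
    if (n < 0 || p->index - n < 0) {
        return -ENOMEM;
    }
    p->index -= n;
    memcpy(addrs, &p->array[p->index], n * sizeof *addrs);
    return 0;
}

static inline void
demo_pool_cleanup(struct demo_pool *p)
{
    free(p->array);
    p->array = NULL;
}
```

Batch push and pop are all-or-nothing here, matching how netdev_afxdp_rxq_recv() and netdev_afxdp_batch_send() treat a failed umem_elem_pop_n() as a drop of the whole batch.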
new file mode 100644
@@ -0,0 +1,101 @@
+/*
+ * Copyright (c) 2018, 2019 Nicira, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#ifndef XDPSOCK_H
+#define XDPSOCK_H 1
+
+#include <config.h>
+
+#ifdef HAVE_AF_XDP
+
+#include <bpf/xsk.h>
+#include <errno.h>
+#include <stdbool.h>
+#include <stdio.h>
+
+#include "openvswitch/thread.h"
+#include "ovs-atomic.h"
+#include "spinlock.h"
+
+#define FRAME_HEADROOM XDP_PACKET_HEADROOM
+#define FRAME_SIZE XSK_UMEM__DEFAULT_FRAME_SIZE
+#define FRAME_SHIFT XSK_UMEM__DEFAULT_FRAME_SHIFT
+#define FRAME_SHIFT_MASK ((1 << FRAME_SHIFT) - 1)
+
+#define PROD_NUM_DESCS XSK_RING_PROD__DEFAULT_NUM_DESCS
+#define CONS_NUM_DESCS XSK_RING_CONS__DEFAULT_NUM_DESCS
+
+/* The worst case is all 4 queues TX/CQ/RX/FILL are full.
+ * Setting NUM_FRAMES to this makes sure umem_pop always succeeds.
+ */
+#define NUM_FRAMES (2 * (PROD_NUM_DESCS + CONS_NUM_DESCS))
+
+#define BATCH_SIZE NETDEV_MAX_BURST
+
+BUILD_ASSERT_DECL(IS_POW2(NUM_FRAMES));
+BUILD_ASSERT_DECL(PROD_NUM_DESCS == CONS_NUM_DESCS);
+BUILD_ASSERT_DECL(NUM_FRAMES == 2 * (PROD_NUM_DESCS + CONS_NUM_DESCS));
+
+/* LIFO ptr_array */
+struct umem_pool {
+ int index; /* Points to the top of the stack. */
+ unsigned int size;
+ ovs_spinlock_t mutex;
+ void **array; /* Pointer array; entries point to umem buffers. */
+};
+
+/* array-based dp_packet_afxdp */
+struct xpacket_pool {
+ unsigned int size;
+ struct dp_packet_afxdp **array;
+};
+
+struct xsk_umem_info {
+ struct umem_pool mpool;
+ struct xpacket_pool xpool;
+ struct xsk_ring_prod fq;
+ struct xsk_ring_cons cq;
+ struct xsk_umem *umem;
+ void *buffer;
+};
+
+struct xsk_socket_info {
+ struct xsk_ring_cons rx;
+ struct xsk_ring_prod tx;
+ struct xsk_umem_info *umem;
+ struct xsk_socket *xsk;
+ unsigned long rx_dropped;
+ unsigned long tx_dropped;
+ uint32_t outstanding_tx;
+};
+
+struct umem_elem {
+ struct umem_elem *next;
+};
+
+void umem_elem_push(struct umem_pool *umemp, void *addr);
+void umem_elem_push_n(struct umem_pool *umemp, int n, void **addrs);
+
+void *umem_elem_pop(struct umem_pool *umemp);
+int umem_elem_pop_n(struct umem_pool *umemp, int n, void **addrs);
+
+int umem_pool_init(struct umem_pool *umemp, unsigned int size);
+void umem_pool_cleanup(struct umem_pool *umemp);
+int xpacket_pool_init(struct xpacket_pool *xp, unsigned int size);
+void xpacket_pool_cleanup(struct xpacket_pool *xp);
+
+#endif
+#endif
@@ -4,12 +4,14 @@ EXTRA_DIST += \
$(SYSTEM_TESTSUITE_AT) \
$(SYSTEM_KMOD_TESTSUITE_AT) \
$(SYSTEM_USERSPACE_TESTSUITE_AT) \
+ $(SYSTEM_AFXDP_TESTSUITE_AT) \
$(SYSTEM_OFFLOADS_TESTSUITE_AT) \
$(SYSTEM_DPDK_TESTSUITE_AT) \
$(OVSDB_CLUSTER_TESTSUITE_AT) \
$(TESTSUITE) \
$(SYSTEM_KMOD_TESTSUITE) \
$(SYSTEM_USERSPACE_TESTSUITE) \
+ $(SYSTEM_AFXDP_TESTSUITE) \
$(SYSTEM_OFFLOADS_TESTSUITE) \
$(SYSTEM_DPDK_TESTSUITE) \
$(OVSDB_CLUSTER_TESTSUITE) \
@@ -159,6 +161,10 @@ SYSTEM_USERSPACE_TESTSUITE_AT = \
tests/system-userspace-macros.at \
tests/system-userspace-packet-type-aware.at
+SYSTEM_AFXDP_TESTSUITE_AT = \
+ tests/system-afxdp-testsuite.at \
+ tests/system-afxdp-macros.at
+
SYSTEM_TESTSUITE_AT = \
tests/system-common-macros.at \
tests/system-ovn.at \
@@ -183,6 +189,7 @@ TESTSUITE = $(srcdir)/tests/testsuite
TESTSUITE_PATCH = $(srcdir)/tests/testsuite.patch
SYSTEM_KMOD_TESTSUITE = $(srcdir)/tests/system-kmod-testsuite
SYSTEM_USERSPACE_TESTSUITE = $(srcdir)/tests/system-userspace-testsuite
+SYSTEM_AFXDP_TESTSUITE = $(srcdir)/tests/system-afxdp-testsuite
SYSTEM_OFFLOADS_TESTSUITE = $(srcdir)/tests/system-offloads-testsuite
SYSTEM_DPDK_TESTSUITE = $(srcdir)/tests/system-dpdk-testsuite
OVSDB_CLUSTER_TESTSUITE = $(srcdir)/tests/ovsdb-cluster-testsuite
@@ -316,6 +323,11 @@ check-system-userspace: all
set $(SHELL) '$(SYSTEM_USERSPACE_TESTSUITE)' -C tests AUTOTEST_PATH='$(AUTOTEST_PATH)'; \
"$$@" $(TESTSUITEFLAGS) -j1 || (test X'$(RECHECK)' = Xyes && "$$@" --recheck)
+check-afxdp: all
+ $(MAKE) install
+ set $(SHELL) '$(SYSTEM_AFXDP_TESTSUITE)' -C tests AUTOTEST_PATH='$(AUTOTEST_PATH)' $(TESTSUITEFLAGS) -j1; \
+ "$$@" || (test X'$(RECHECK)' = Xyes && "$$@" --recheck)
+
check-offloads: all
set $(SHELL) '$(SYSTEM_OFFLOADS_TESTSUITE)' -C tests AUTOTEST_PATH='$(AUTOTEST_PATH)'; \
"$$@" $(TESTSUITEFLAGS) -j1 || (test X'$(RECHECK)' = Xyes && "$$@" --recheck)
@@ -353,6 +365,10 @@ $(SYSTEM_USERSPACE_TESTSUITE): package.m4 $(SYSTEM_TESTSUITE_AT) $(SYSTEM_USERSP
$(AM_V_GEN)$(AUTOTEST) -I '$(srcdir)' -o $@.tmp $@.at
$(AM_V_at)mv $@.tmp $@
+$(SYSTEM_AFXDP_TESTSUITE): package.m4 $(SYSTEM_TESTSUITE_AT) $(SYSTEM_AFXDP_TESTSUITE_AT) $(COMMON_MACROS_AT)
+ $(AM_V_GEN)$(AUTOTEST) -I '$(srcdir)' -o $@.tmp $@.at
+ $(AM_V_at)mv $@.tmp $@
+
$(SYSTEM_OFFLOADS_TESTSUITE): package.m4 $(SYSTEM_TESTSUITE_AT) $(SYSTEM_OFFLOADS_TESTSUITE_AT) $(COMMON_MACROS_AT)
$(AM_V_GEN)$(AUTOTEST) -I '$(srcdir)' -o $@.tmp $@.at
$(AM_V_at)mv $@.tmp $@
new file mode 100644
@@ -0,0 +1,20 @@
+# Add port to ovs bridge by using afxdp mode.
+# This will use generic XDP support in the veth driver.
+m4_define([ADD_VETH],
+ [ AT_CHECK([ip link add $1 type veth peer name ovs-$1 || return 77])
+ CONFIGURE_VETH_OFFLOADS([$1])
+ AT_CHECK([ip link set $1 netns $2])
+ AT_CHECK([ip link set dev ovs-$1 up])
+ AT_CHECK([ovs-vsctl add-port $3 ovs-$1 -- \
+ set interface ovs-$1 external-ids:iface-id="$1" type="afxdp"])
+ NS_CHECK_EXEC([$2], [ip addr add $4 dev $1 $7])
+ NS_CHECK_EXEC([$2], [ip link set dev $1 up])
+ if test -n "$5"; then
+ NS_CHECK_EXEC([$2], [ip link set dev $1 address $5])
+ fi
+ if test -n "$6"; then
+ NS_CHECK_EXEC([$2], [ip route add default via $6])
+ fi
+ on_exit 'ip link del ovs-$1'
+ ]
+)
new file mode 100644
@@ -0,0 +1,26 @@
+AT_INIT
+
+AT_COPYRIGHT([Copyright (c) 2018, 2019 Nicira, Inc.
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at:
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.])
+
+m4_ifdef([AT_COLOR_TESTS], [AT_COLOR_TESTS])
+
+m4_include([tests/ovs-macros.at])
+m4_include([tests/ovsdb-macros.at])
+m4_include([tests/ofproto-macros.at])
+m4_include([tests/system-common-macros.at])
+m4_include([tests/system-userspace-macros.at])
+m4_include([tests/system-afxdp-macros.at])
+
+m4_include([tests/system-traffic.at])
@@ -3082,6 +3082,21 @@ ovs-vsctl add-port br0 p0 -- set Interface p0 type=patch options:peer=p1 \
</p>
</column>
+ <column name="other_config" key="xdpmode"
+ type='{"type": "string",
+ "enum": ["set", ["skb", "drv"]]}'>
+ <p>
+ Specifies the operational mode of the XDP program.
+ If "drv", the XDP program is loaded into the device driver with
+ zero-copy RX and TX enabled. This mode requires a device driver with
+ AF_XDP support and has the best performance.
+ If "skb", the XDP program uses the kernel's generic XDP mode, with
+ extra data copying between userspace and kernel. No device driver
+ support is needed. This option applies to the afxdp netdev type only.
+ Defaults to "skb" mode.
+ </p>
+ </column>
+
<column name="options" key="vhost-server-path"
type='{"type": "string"}'>
<p>
The patch introduces experimental AF_XDP support for OVS netdev.
AF_XDP, the Address Family of the eXpress Data Path, is a new Linux socket
type built upon the eBPF and XDP technology. It aims to have comparable
performance to DPDK but cooperate better with the existing kernel
networking stack. An AF_XDP socket receives and sends packets from an
eBPF/XDP program attached to the netdev, bypassing a couple of the Linux
kernel's subsystems. As a result, an AF_XDP socket shows much better
performance than AF_PACKET. For more details about AF_XDP, please see the
Linux kernel's Documentation/networking/af_xdp.rst. Note that by default,
this feature is not compiled in.

Signed-off-by: William Tu <u9012063@gmail.com>
---
v1->v2:
- add a list to maintain unused umem elements
- remove copy from rx umem to ovs internal buffer
- use hugetlb to reduce misses (not much difference)
- use pmd mode netdev in OVS (huge performance improvement)
- remove malloc dp_packet, instead put dp_packet in umem

v2->v3:
- rebase on the OVS master, 7ab4b0653784
  ("configure: Check for more specific function to pull in pthread library.")
- remove the dependency on libbpf and dpif-bpf.
  instead, use the built-in XDP_ATTACH feature.
- data structure optimizations for better performance, see [1]
- more test cases support
v3: https://mail.openvswitch.org/pipermail/ovs-dev/2018-November/354179.html

v3->v4:
- Use AF_XDP API provided by libbpf
- Remove the dependency on XDP_ATTACH kernel patch set
- Add documentation, bpf.rst

v4->v5:
- rebase to master
- remove rfc, squash all into a single patch
- add --enable-afxdp, so by default, AF_XDP is not compiled
- add options: xdpmode=drv,skb
- add multiple queue and multiple PMD support, with options: n_rxq
- improve documentation, rename bpf.rst to af_xdp.rst

v5->v6:
- rebase to master, commit 0cdd5b13de91b98
- address errors from sparse and clang
- pass travis-ci test
- address feedback from Ben
- fix issues reported by 0-day robot
- improved documentation

v6->v7:
- rebase to master, commit abf11558c1515bf3b1
- address feedback from Ilya, Ben, and Eelco, see:
  https://www.mail-archive.com/ovs-dev@openvswitch.org/msg32357.html
- add XDP mode change, implement get/set_config, reconfigure
- Fix reconfiguration/crash issue caused by libbpf, see patch:
  [PATCH bpf 0/2] libbpf: fixes for AF_XDP teardown
- perf optimization for batching umem_push/pop
- perf optimization for batching kick_tx
- test build with dpdk
- fix/refactor atomic operation
- make AF_XDP x86 specific, otherwise fail at build time
- lots of code refactoring
- add PVP setup in documentation

v7->v8:
- Address feedback from Ilya at:
  https://patchwork.ozlabs.org/patch/1095019/
- add netdev-linux-private.h
- fix afxdp reconfigure issue
- sort include headers
- remove unnecessary OVS_UNUSED
- coding style fixes
- error case handling and memory leak

v8->v9:
- rebase to master 180bbbed3a3867d52
- Address review feedback from Ben, Ilya and Eelco, at:
  https://patchwork.ozlabs.org/patch/1097740/
- == From Ilya ==
- Optimize the reconfiguration logic
- Implement .rxq_recv and .send for afxdp
- Remove system-afxdp-traffic.at, reuse existing code
- Use Ilya's rdtsc code
- remove --disable-system
- == From Eelco ==
- Fix bug when remove br0, util(revalidator49)|EMER|lib/poll-loop.c:111:
  assertion !fd != !wevent failed
- Fix bug and use default value from libbpf, ex: XSK_RING_PROD__DEFAULT...
- Clear xdp program when receive signal, ctrl+c
- Add options to vswitch.xml, set xdpmode default to skb-mode
- No support for ARM and PPC, now x86_64 only
- remove redundant header includes and function/macro definitions
- remove some ifdef HAVE_AF_XDP
- == From others/both about afxdp rx and tx ==
- Several umem push/pop error handling improvement/fixes
- add lock to address concurrent_txq case
- improve error handling
- add stats
- Things that are not done yet
- MTU limitation
- n_txq_desc/n_rxq_desc option.
---
 Documentation/automake.mk             |   1 +
 Documentation/index.rst               |   1 +
 Documentation/intro/install/afxdp.rst | 433 +++++++++++++++++
 Documentation/intro/install/index.rst |   1 +
 acinclude.m4                          |  35 ++
 configure.ac                          |   1 +
 lib/automake.mk                       |  14 +
 lib/dp-packet.c                       |  28 ++
 lib/dp-packet.h                       |  18 +-
 lib/dpif-netdev-perf.h                |   6 +
 lib/netdev-afxdp.c                    | 858 ++++++++++++++++++++++++++++++++++
 lib/netdev-afxdp.h                    |  77 +++
 lib/netdev-linux-private.h            | 139 ++++++
 lib/netdev-linux.c                    | 121 ++---
 lib/netdev-provider.h                 |   3 +
 lib/netdev.c                          |  11 +
 lib/spinlock.h                        |  70 +++
 lib/xdpsock.c                         | 183 ++++++++
 lib/xdpsock.h                         | 101 ++++
 tests/automake.mk                     |  16 +
 tests/system-afxdp-macros.at          |  20 +
 tests/system-afxdp-testsuite.at       |  26 ++
 vswitchd/vswitch.xml                  |  15 +
 23 files changed, 2095 insertions(+), 83 deletions(-)
 create mode 100644 Documentation/intro/install/afxdp.rst
 create mode 100644 lib/netdev-afxdp.c
 create mode 100644 lib/netdev-afxdp.h
 create mode 100644 lib/netdev-linux-private.h
 create mode 100644 lib/spinlock.h
 create mode 100644 lib/xdpsock.c
 create mode 100644 lib/xdpsock.h
 create mode 100644 tests/system-afxdp-macros.at
 create mode 100644 tests/system-afxdp-testsuite.at
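
Note for reviewers: the changelog mentions a list of unused umem elements and batched umem push/pop, and lib/xdpsock.h declares umem_elem_push_n() and umem_elem_pop_n() over a LIFO pointer array. The sketch below is only an illustration of that idea, not the code in lib/xdpsock.c. The demo_* names are hypothetical, the spinlock is omitted (so it is single-threaded only), and the all-or-nothing pop and the newest-first pop order are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, lock-free stand-in for the patch's struct umem_pool.
 * The real structure also carries an ovs_spinlock_t. */
struct demo_pool {
    int index;          /* Number of free buffers; also the next free slot. */
    unsigned int size;  /* Capacity of 'array'. */
    void **array;       /* Stack of free buffer addresses. */
};

/* Allocates an empty pool with room for 'size' addresses. */
int
demo_pool_init(struct demo_pool *p, unsigned int size)
{
    p->array = calloc(size, sizeof *p->array);
    if (!p->array) {
        return -1;
    }
    p->index = 0;
    p->size = size;
    return 0;
}

void
demo_pool_cleanup(struct demo_pool *p)
{
    free(p->array);
    p->array = NULL;
}

/* Returns 'n' buffers to the pool in one call.  The caller must not push
 * more addresses than the pool can hold. */
void
demo_push_n(struct demo_pool *p, int n, void **addrs)
{
    assert(n >= 0 && (unsigned int) (p->index + n) <= p->size);
    for (int i = 0; i < n; i++) {
        p->array[p->index++] = addrs[i];
    }
}

/* Takes 'n' buffers from the pool, most recently pushed first.
 * All-or-nothing (an assumption of this sketch): returns 0 on success,
 * or -1 without touching the pool if fewer than 'n' buffers are free. */
int
demo_pop_n(struct demo_pool *p, int n, void **addrs)
{
    if (n < 0 || p->index < n) {
        return -1;
    }
    for (int i = 0; i < n; i++) {
        addrs[i] = p->array[--p->index];
    }
    return 0;
}
```

Moving a whole batch per call means that a locked version would take the lock once per batch rather than once per buffer, which is presumably the point of the batching optimization listed under v6->v7.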