
[bpf-next,v4,0/2] libbpf: adding AF_XDP support

Message ID 1549631126-29067-1-git-send-email-magnus.karlsson@intel.com

Message

Magnus Karlsson Feb. 8, 2019, 1:05 p.m. UTC
This patch set proposes to add AF_XDP support to libbpf. The main
reason for this is to facilitate writing applications that use AF_XDP
by offering higher-level APIs that hide many of the details of the
AF_XDP uapi. This is in the same vein as how libbpf facilitates XDP
adoption by offering easy-to-use, higher level interfaces to XDP
functionality. Hopefully this will facilitate adoption of AF_XDP, make
applications using it simpler and smaller, and finally also make it
possible for applications to benefit from optimizations in the AF_XDP
user space access code. Previously, people just copied and pasted the
code from the sample application into their application, which is not
desirable.

The proposed interface is composed of two parts:

* Low-level access interface to the four rings and the packet data
* High-level control plane interface for creating and setting up umems
  and AF_XDP sockets. This interface also loads a simple XDP program
  that routes all traffic on a queue up to the AF_XDP socket. (A setup
  sketch follows this list.)
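
To give a flavor of the high-level control plane, here is a minimal
setup sketch. The names and signatures are illustrative of the
proposed xsk.h interface and may not match this revision exactly;
error handling is abbreviated.

#include <stdlib.h>
#include <unistd.h>
#include <linux/types.h>
#include "xsk.h"

#define NUM_FRAMES 4096
#define FRAME_SIZE 2048

static struct xsk_umem *umem;
static struct xsk_socket *xsk;
static struct xsk_ring_prod fq;   /* umem fill ring */
static struct xsk_ring_cons cq;   /* umem completion ring */
static struct xsk_ring_cons rx;   /* socket Rx ring */
static struct xsk_ring_prod tx;   /* socket Tx ring */

static int setup(const char *ifname, __u32 queue_id)
{
	void *bufs;
	int ret;

	/* Packet buffer area that will back the umem. */
	ret = posix_memalign(&bufs, getpagesize(),
			     NUM_FRAMES * FRAME_SIZE);
	if (ret)
		return -ret;

	/* Register the umem and get its fill/completion rings. */
	ret = xsk_umem__create(&umem, bufs, NUM_FRAMES * FRAME_SIZE,
			       &fq, &cq, NULL);
	if (ret)
		return ret;

	/* Create an AF_XDP socket bound to one queue. With a default
	 * (NULL) config this also loads the simple XDP program that
	 * steers the queue's traffic to the socket. */
	return xsk_socket__create(&xsk, ifname, queue_id, umem,
				  &rx, &tx, NULL);
}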

The sample program has been updated to use this new interface and in
that process it lost roughly 300 lines of code. I cannot detect any
performance degradation due to the use of this library instead of the
previous functions that were inlined in the sample application. But I
did measure this on a slower machine and not the Broadwell that we
normally use.

The rings are now called xsk_ring: when a producer operates on a ring
it is an xsk_ring_prod, and when a consumer does it is an
xsk_ring_cons. This way we get some compile-time error checking that
the rings are used correctly.
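
A small sketch of the idea; the operation name below is illustrative,
not the exact xsk.h name, and the struct fields are elided:

#include <linux/types.h>

/* Two distinct types for the two roles of a ring. */
struct xsk_ring_prod { __u32 cached_prod; };
struct xsk_ring_cons { __u32 cached_cons; };

/* Producer-only operations take the producer type. */
static inline void ring_prod_submit(struct xsk_ring_prod *prod, __u32 nb)
{
	prod->cached_prod += nb;
}

static void example(struct xsk_ring_prod *tx_ring,
		    struct xsk_ring_cons *rx_ring)
{
	ring_prod_submit(tx_ring, 16);	/* fine: producer ring */
	/* ring_prod_submit(rx_ring, 16) would be flagged by the
	 * compiler, since rx_ring is a struct xsk_ring_cons *. */
	(void)rx_ring;
}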

Comments and contemplations:

* The current behaviour is that the library loads an XDP program (if
  requested to do so) but the clean up of this program is left to the
  application (a sketch of such application-side cleanup follows after
  this list). It would be possible to implement this cleanup in the
  library, but it would require keeping state at the netdev level,
  which there is none of at the moment, and synchronizing that state
  between processes. All of this adds complexity. But once we get an
  XDP program per queue id, it becomes trivial to also remove the XDP
  program when the application exits. This proposal from Jesper,
  Björn and others will also improve the performance of libbpf, since
  most of the XDP program code can be removed when that feature is
  supported.

* In a future release, I am planning on adding a higher level data
  plane interface too. This will be based around recvmsg and sendmsg
  with the use of struct iovec for batching, without the user having
  to know anything about the underlying four rings of an AF_XDP
  socket. There will be one semantic difference though from the
  standard recvmsg and that is that the kernel will fill in the iovecs
  instead of the application. But the rest should be the same as the
  libc versions so that application writers feel at home.
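
As mentioned in the first item above, unloading the XDP program is
currently up to the application. A sketch of what that can look like,
using libbpf's existing bpf_set_link_xdp_fd() and assuming the same
flags as were used at attach time:

#include <net/if.h>
#include <signal.h>
#include <stdlib.h>
#include <linux/types.h>
#include <bpf/libbpf.h>	/* or "libbpf.h" when building in-tree */

static int ifindex;
static __u32 xdp_flags;	/* same flags as used when attaching */

/* Detach whatever XDP program is attached by passing -1 as the fd. */
static void remove_xdp_program(int sig)
{
	(void)sig;
	bpf_set_link_xdp_fd(ifindex, -1, xdp_flags);
	exit(EXIT_SUCCESS);
}

/* In main(): ifindex = if_nametoindex(ifname);
 *            signal(SIGINT, remove_xdp_program); */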

Patch 1: adds AF_XDP support in libbpf
Patch 2: updates the xdpsock sample application to use the libbpf functions.

Changes v3 to v4:
  * Dropped the pr_*() patch in favor of Yonghong Song's patch set
  * Addressed the review comments of Daniel Borkmann, mainly leaking
    of file descriptors at clean up and making the data plane APIs
    all static inline (with the exception of xsk_umem__get_data that
    uses an internal structure I do not want to expose).
  * Fixed the netlink callback as suggested by Maciej Fijalkowski.
  * Removed an unnecessary include in the sample program as spotted by
    Ilia Fillipov.
Changes v2 to v3:
  * Added automatic loading of a simple XDP program that routes all
    traffic on a queue up to the AF_XDP socket. This program loading
    can be disabled.
  * Updated function names to be consistent with the libbpf naming
    convention
  * Moved all code to xsk.[ch]
  * Removed all the XDP program loading code from the sample since
    this is now done by libbpf
  * The initialization functions now return a handle as suggested by
    Alexei
  * const qualifiers added in the API where applicable.
Changes v1 to v2:
  * Fixed cleanup of library state on error.
  * Moved API to initial version
  * Prefixed all public functions by xsk__ instead of xsk_
  * Added comment about changed default ring sizes, batch size and umem
    size in the sample application commit message
  * The library now only creates an Rx or Tx ring if the respective
    parameter is != NULL

Note that for zero-copy to work on FVL you need the following patch:
https://lore.kernel.org/netdev/1548770597-16141-1-git-send-email-magnus.karlsson@intel.com/.
For ixgbe, you need a similar patch, which can be found here:
https://lore.kernel.org/netdev/CAJ8uoz1GJBmC0GFbURvEzY4kDZZ6C7O9+1F+gV0y=GOMGLobUQ@mail.gmail.com/.

I based this patch set on bpf-next commit a4021a3579c5 ("tools/bpf: add log_level to bpf_load_program_attr")

Thanks: Magnus

Magnus Karlsson (2):
  libbpf: add support for using AF_XDP sockets
  samples/bpf: convert xdpsock to use libbpf for AF_XDP access

 samples/bpf/Makefile              |   1 -
 samples/bpf/xdpsock.h             |  11 -
 samples/bpf/xdpsock_kern.c        |  56 ---
 samples/bpf/xdpsock_user.c        | 839 ++++++++++++--------------------------
 tools/include/uapi/linux/if_xdp.h |  78 ++++
 tools/lib/bpf/Build               |   2 +-
 tools/lib/bpf/Makefile            |   5 +-
 tools/lib/bpf/README.rst          |  11 +-
 tools/lib/bpf/libbpf.map          |   7 +
 tools/lib/bpf/xsk.c               | 742 +++++++++++++++++++++++++++++++++
 tools/lib/bpf/xsk.h               | 205 ++++++++++
 11 files changed, 1306 insertions(+), 651 deletions(-)
 delete mode 100644 samples/bpf/xdpsock.h
 delete mode 100644 samples/bpf/xdpsock_kern.c
 create mode 100644 tools/include/uapi/linux/if_xdp.h
 create mode 100644 tools/lib/bpf/xsk.c
 create mode 100644 tools/lib/bpf/xsk.h

--
2.7.4

Comments

Jean-Mickael Guerin Feb. 11, 2019, 6:33 a.m. UTC | #1
Hi Magnus,

> * In a future release, I am planning on adding a higher level data
>   plane interface too. This will be based around recvmsg and sendmsg
>   with the use of struct iovec for batching, without the user having
>   to know anything about the underlying four rings of an AF_XDP
>   socket. There will be one semantic difference though from the
>   standard recvmsg and that is that the kernel will fill in the iovecs
>   instead of the application. But the rest should be the same as the
>   libc versions so that application writers feel at home.

You might consider recvmmsg() and sendmmsg() (bulk of multi-segment packets?)
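
For reference, a minimal sketch of what batched receive looks like
with the existing recvmmsg() syscall, just to illustrate the batching
model (not the planned AF_XDP interface):

#define _GNU_SOURCE
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

#define BATCH	16
#define BUF_SZ	2048

static char bufs[BATCH][BUF_SZ];
static struct mmsghdr msgs[BATCH];
static struct iovec iovecs[BATCH];

static int rx_batch(int fd)
{
	int i;

	memset(msgs, 0, sizeof(msgs));
	for (i = 0; i < BATCH; i++) {
		iovecs[i].iov_base = bufs[i];
		iovecs[i].iov_len = BUF_SZ;
		msgs[i].msg_hdr.msg_iov = &iovecs[i];
		msgs[i].msg_hdr.msg_iovlen = 1;
	}
	/* Returns the number of messages received; msgs[i].msg_len
	 * holds the length of message i. */
	return recvmmsg(fd, msgs, BATCH, MSG_DONTWAIT, NULL);
}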

Jean-Mickael
Magnus Karlsson Feb. 11, 2019, 7:52 a.m. UTC | #2
On Mon, Feb 11, 2019 at 7:34 AM Jean-Mickael Guerin <jmg@6wind.com> wrote:
>
> Hi Magnus,
>
> > * In a future release, I am planning on adding a higher level data
> >   plane interface too. This will be based around recvmsg and sendmsg
> >   with the use of struct iovec for batching, without the user having
> >   to know anything about the underlying four rings of an AF_XDP
> >   socket. There will be one semantic difference though from the
> >   standard recvmsg and that is that the kernel will fill in the iovecs
> >   instead of the application. But the rest should be the same as the
> >   libc versions so that application writers feel at home.
>
> You might consider recvmmsg() and sendmmsg() (bulk of multi segments packets?)

Exactly :-). Spot on.

/Magnus

> Jean-Mickael
Jonathan Lemon Feb. 11, 2019, 7:48 p.m. UTC | #3
On 8 Feb 2019, at 5:05, Magnus Karlsson wrote:

> This patch proposes to add AF_XDP support to libbpf. The main reason
> for this is to facilitate writing applications that use AF_XDP by
> offering higher-level APIs that hide many of the details of the AF_XDP
> uapi. This is in the same vein as libbpf facilitates XDP adoption by
> offering easy-to-use higher level interfaces of XDP
> functionality. Hopefully this will facilitate adoption of AF_XDP, make
> applications using it simpler and smaller, and finally also make it
> possible for applications to benefit from optimizations in the AF_XDP
> user space access code. Previously, people just copied and pasted the
> code from the sample application into their application, which is not
> desirable.

I like the idea of encapsulating the boilerplate logic in a library.

I do think there is an important missing piece though - there should be
some code which queries the netdev for how many queues are attached, and
create the appropriate number of umem/AF_XDP sockets.

I ran into this issue when testing the current AF_XDP code - on my test
boxes, the mlx5 card has 55 channels (aka queues), so when the test program
binds only to channel 0, nothing works as expected, since not all traffic
is being intercepted.  While obvious in hindsight, this took a while to
track down.
Magnus Karlsson Feb. 13, 2019, 11:32 a.m. UTC | #4
On Mon, Feb 11, 2019 at 9:44 PM Jonathan Lemon <jonathan.lemon@gmail.com> wrote:
>
> On 8 Feb 2019, at 5:05, Magnus Karlsson wrote:
>
> > This patch proposes to add AF_XDP support to libbpf. The main reason
> > for this is to facilitate writing applications that use AF_XDP by
> > offering higher-level APIs that hide many of the details of the AF_XDP
> > uapi. This is in the same vein as libbpf facilitates XDP adoption by
> > offering easy-to-use higher level interfaces of XDP
> > functionality. Hopefully this will facilitate adoption of AF_XDP, make
> > applications using it simpler and smaller, and finally also make it
> > possible for applications to benefit from optimizations in the AF_XDP
> > user space access code. Previously, people just copied and pasted the
> > code from the sample application into their application, which is not
> > desirable.
>
> I like the idea of encapsulating the boilerplate logic in a library.
>
> I do think there is an important missing piece though - there should be
> some code which queries the netdev for how many queues are attached, and
> create the appropriate number of umem/AF_XDP sockets.
>
> I ran into this issue when testing the current AF_XDP code - on my test
> boxes, the mlx5 card has 55 channels (aka queues), so when the test program
> binds only to channel 0, nothing works as expected, since not all traffic
> is being intercepted.  While obvious in hindsight, this took a while to
> track down.

Yes, agreed. You are not the first one to stumble upon this problem
:-). Let me think a little bit on how to solve this in a good way. We
need this to be simple and intuitive, as you say.

/Magnus

> --
> Jonathan
Jesper Dangaard Brouer Feb. 13, 2019, 11:55 a.m. UTC | #5
On Wed, 13 Feb 2019 12:32:47 +0100
Magnus Karlsson <magnus.karlsson@gmail.com> wrote:

> On Mon, Feb 11, 2019 at 9:44 PM Jonathan Lemon <jonathan.lemon@gmail.com> wrote:
> >
> > On 8 Feb 2019, at 5:05, Magnus Karlsson wrote:
> >  
> > > This patch proposes to add AF_XDP support to libbpf. The main reason
> > > for this is to facilitate writing applications that use AF_XDP by
> > > offering higher-level APIs that hide many of the details of the AF_XDP
> > > uapi. This is in the same vein as libbpf facilitates XDP adoption by
> > > offering easy-to-use higher level interfaces of XDP
> > > functionality. Hopefully this will facilitate adoption of AF_XDP, make
> > > applications using it simpler and smaller, and finally also make it
> > > possible for applications to benefit from optimizations in the AF_XDP
> > > user space access code. Previously, people just copied and pasted the
> > > code from the sample application into their application, which is not
> > > desirable.  
> >
> > I like the idea of encapsulating the boilerplate logic in a library.
> >
> > I do think there is an important missing piece though - there should be
> > some code which queries the netdev for how many queues are attached, and
> > create the appropriate number of umem/AF_XDP sockets.
> >
> > I ran into this issue when testing the current AF_XDP code - on my test
> > boxes, the mlx5 card has 55 channels (aka queues), so when the test program
> > binds only to channel 0, nothing works as expected, since not all traffic
> > is being intercepted.  While obvious in hindsight, this took a while to
> > track down.  
> 
> Yes, agreed. You are not the first one to stumble upon this problem
> :-). Let me think a little bit on how to solve this in a good way. We
> need this to be simple and intuitive, as you say.

I see people hitting this with AF_XDP all the time... I had some
backup-slides[2] in our FOSDEM presentation[1] that describe the issue,
give the performance reason why and propose a workaround.

[1] https://github.com/xdp-project/xdp-project/tree/master/conference/FOSDEM2019
[2] https://github.com/xdp-project/xdp-project/blob/master/conference/FOSDEM2019/xdp_building_block.org#backup-slides

Alternative work-around
  * Create as many AF_XDP sockets as RXQs
  * Have userspace poll()/select on all sockets
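
Roughly, that work-around looks like this (sketch, assuming one
already-created struct xsk_socket per RXQ and an accessor like
xsk_socket__fd() from the proposed library):

#include <poll.h>
#include "xsk.h"	/* for struct xsk_socket and xsk_socket__fd() */

#define MAX_QUEUES 64

/* Wait for Rx traffic on any of the per-queue AF_XDP sockets. */
static int wait_for_rx(struct xsk_socket **xsks, int num_queues)
{
	struct pollfd fds[MAX_QUEUES];
	int i;

	for (i = 0; i < num_queues; i++) {
		fds[i].fd = xsk_socket__fd(xsks[i]);
		fds[i].events = POLLIN;
	}

	/* Returns > 0 when at least one socket has packets to consume. */
	return poll(fds, num_queues, -1);
}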
Jonathan Lemon Feb. 13, 2019, 8:49 p.m. UTC | #6
On 13 Feb 2019, at 3:32, Magnus Karlsson wrote:

> On Mon, Feb 11, 2019 at 9:44 PM Jonathan Lemon 
> <jonathan.lemon@gmail.com> wrote:
>>
>> On 8 Feb 2019, at 5:05, Magnus Karlsson wrote:
>>
>>> This patch proposes to add AF_XDP support to libbpf. The main reason
>>> for this is to facilitate writing applications that use AF_XDP by
>>> offering higher-level APIs that hide many of the details of the 
>>> AF_XDP
>>> uapi. This is in the same vein as libbpf facilitates XDP adoption by
>>> offering easy-to-use higher level interfaces of XDP
>>> functionality. Hopefully this will facilitate adoption of AF_XDP, 
>>> make
>>> applications using it simpler and smaller, and finally also make it
>>> possible for applications to benefit from optimizations in the 
>>> AF_XDP
>>> user space access code. Previously, people just copied and pasted 
>>> the
>>> code from the sample application into their application, which is 
>>> not
>>> desirable.
>>
>> I like the idea of encapsulating the boilerplate logic in a library.
>>
>> I do think there is an important missing piece though - there should 
>> be
>> some code which queries the netdev for how many queues are attached, 
>> and
>> create the appropriate number of umem/AF_XDP sockets.
>>
>> I ran into this issue when testing the current AF_XDP code - on my 
>> test
>> boxes, the mlx5 card has 55 channels (aka queues), so when the test 
>> program
>> binds only to channel 0, nothing works as expected, since not all 
>> traffic
>> is being intercepted.  While obvious in hindsight, this took a while 
>> to
>> track down.
>
> Yes, agreed. You are not the first one to stumble upon this problem
> :-). Let me think a little bit on how to solve this in a good way. We
> need this to be simple and intuitive, as you say.

Has any investigation been done on using some variant of MPSC
implementation as an intermediate form for AF_XDP?  E.g.: something
like LCRQ or the bulkQ in bpf devmap/cpumap.  I'm aware that this
would be slightly slower, as it would introduce a lock in the path,
but I'd think that having DEVMAP, CPUMAP and XSKMAP all behave the
same way would add more flexibility.

Ideally, if the configuration matches the underlying hardware, then
the implementation would reduce to the current setup (and allow ZC
implementations), but a non-matching configuration would still work -
as opposed to the current situation.
Magnus Karlsson Feb. 14, 2019, 8:25 a.m. UTC | #7
On Wed, Feb 13, 2019 at 9:49 PM Jonathan Lemon <jonathan.lemon@gmail.com> wrote:
>
> On 13 Feb 2019, at 3:32, Magnus Karlsson wrote:
>
> > On Mon, Feb 11, 2019 at 9:44 PM Jonathan Lemon
> > <jonathan.lemon@gmail.com> wrote:
> >>
> >> On 8 Feb 2019, at 5:05, Magnus Karlsson wrote:
> >>
> >>> This patch proposes to add AF_XDP support to libbpf. The main reason
> >>> for this is to facilitate writing applications that use AF_XDP by
> >>> offering higher-level APIs that hide many of the details of the
> >>> AF_XDP
> >>> uapi. This is in the same vein as libbpf facilitates XDP adoption by
> >>> offering easy-to-use higher level interfaces of XDP
> >>> functionality. Hopefully this will facilitate adoption of AF_XDP,
> >>> make
> >>> applications using it simpler and smaller, and finally also make it
> >>> possible for applications to benefit from optimizations in the
> >>> AF_XDP
> >>> user space access code. Previously, people just copied and pasted
> >>> the
> >>> code from the sample application into their application, which is
> >>> not
> >>> desirable.
> >>
> >> I like the idea of encapsulating the boilerplate logic in a library.
> >>
> >> I do think there is an important missing piece though - there should
> >> be
> >> some code which queries the netdev for how many queues are attached,
> >> and
> >> create the appropriate number of umem/AF_XDP sockets.
> >>
> >> I ran into this issue when testing the current AF_XDP code - on my
> >> test
> >> boxes, the mlx5 card has 55 channels (aka queues), so when the test
> >> program
> >> binds only to channel 0, nothing works as expected, since not all
> >> traffic
> >> is being intercepted.  While obvious in hindsight, this took a while
> >> to
> >> track down.
> >
> > Yes, agreed. You are not the first one to stumble upon this problem
> > :-). Let me think a little bit on how to solve this in a good way. We
> > need this to be simple and intuitive, as you say.
>
> Has any investigation been done on using some variant of MPSC
> implementation
> as an intermediate form for AF_XDP?  E.g.: something like LCRQ or the
> bulkQ
> in bpf devmap/cpumap.  I'm aware that this would be slightly slower, as
> it
> would introduce a lock in the path, but I'd think that having DEVMAP,
> CPUMAP
> and XSKMAP all behave the same way would add more flexibility.

Not as far as I know. But adding the option of having MPSC or even
MPMC queues has been on the todo list for a while. However, the
current focus of Björn and myself is to upstream the performance
improvements from the Linux Plumbers paper, improve ease-of-use, and
help Jesper et al. with the per-queue XDP program implementation
(which will increase both performance and ease-of-use). If anyone has
some spare cycles out there, please go ahead and give MPSC or MPMC
queues a try :-).

/Magnus

> Ideally, if the configuration matches the underlying hardware, then the
> implementation would reduce to the current setup (and allow ZC
> implementations),
> but a non-matching configuration would still work - as opposed to the
> current
> situation.
> --
> Jonathan
Daniel Borkmann Feb. 15, 2019, 4:20 p.m. UTC | #8
On 02/13/2019 12:55 PM, Jesper Dangaard Brouer wrote:
> On Wed, 13 Feb 2019 12:32:47 +0100
> Magnus Karlsson <magnus.karlsson@gmail.com> wrote:
>> On Mon, Feb 11, 2019 at 9:44 PM Jonathan Lemon <jonathan.lemon@gmail.com> wrote:
>>> On 8 Feb 2019, at 5:05, Magnus Karlsson wrote:
>>>  
>>>> This patch proposes to add AF_XDP support to libbpf. The main reason
>>>> for this is to facilitate writing applications that use AF_XDP by
>>>> offering higher-level APIs that hide many of the details of the AF_XDP
>>>> uapi. This is in the same vein as libbpf facilitates XDP adoption by
>>>> offering easy-to-use higher level interfaces of XDP
>>>> functionality. Hopefully this will facilitate adoption of AF_XDP, make
>>>> applications using it simpler and smaller, and finally also make it
>>>> possible for applications to benefit from optimizations in the AF_XDP
>>>> user space access code. Previously, people just copied and pasted the
>>>> code from the sample application into their application, which is not
>>>> desirable.  
>>>
>>> I like the idea of encapsulating the boilerplate logic in a library.
>>>
>>> I do think there is an important missing piece though - there should be
>>> some code which queries the netdev for how many queues are attached, and
>>> create the appropriate number of umem/AF_XDP sockets.
>>>
>>> I ran into this issue when testing the current AF_XDP code - on my test
>>> boxes, the mlx5 card has 55 channels (aka queues), so when the test program
>>> binds only to channel 0, nothing works as expected, since not all traffic
>>> is being intercepted.  While obvious in hindsight, this took a while to
>>> track down.  
>>
>> Yes, agreed. You are not the first one to stumble upon this problem
>> :-). Let me think a little bit on how to solve this in a good way. We
>> need this to be simple and intuitive, as you say.
> 
> I see people hitting this with AF_XDP all the time... I had some
> backup-slides[2] in our FOSDEM presentation[1] that describe the issue,
> give the performance reason why and propose a workaround.

Magnus, I presume you're going to address this for the initial libbpf merge
since the plan is to make it easier to consume for users?

Few more minor items in individual patches, will reply there.

Thanks,
Daniel

> [1] https://github.com/xdp-project/xdp-project/tree/master/conference/FOSDEM2019
> [2] https://github.com/xdp-project/xdp-project/blob/master/conference/FOSDEM2019/xdp_building_block.org#backup-slides
> 
> Alternative work-around
>   * Create as many AF_XDP sockets as RXQs
>   * Have userspace poll()/select on all sockets
>
Magnus Karlsson Feb. 18, 2019, 8:20 a.m. UTC | #9
On Fri, Feb 15, 2019 at 5:48 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> On 02/13/2019 12:55 PM, Jesper Dangaard Brouer wrote:
> > On Wed, 13 Feb 2019 12:32:47 +0100
> > Magnus Karlsson <magnus.karlsson@gmail.com> wrote:
> >> On Mon, Feb 11, 2019 at 9:44 PM Jonathan Lemon <jonathan.lemon@gmail.com> wrote:
> >>> On 8 Feb 2019, at 5:05, Magnus Karlsson wrote:
> >>>
> >>>> This patch proposes to add AF_XDP support to libbpf. The main reason
> >>>> for this is to facilitate writing applications that use AF_XDP by
> >>>> offering higher-level APIs that hide many of the details of the AF_XDP
> >>>> uapi. This is in the same vein as libbpf facilitates XDP adoption by
> >>>> offering easy-to-use higher level interfaces of XDP
> >>>> functionality. Hopefully this will facilitate adoption of AF_XDP, make
> >>>> applications using it simpler and smaller, and finally also make it
> >>>> possible for applications to benefit from optimizations in the AF_XDP
> >>>> user space access code. Previously, people just copied and pasted the
> >>>> code from the sample application into their application, which is not
> >>>> desirable.
> >>>
> >>> I like the idea of encapsulating the boilerplate logic in a library.
> >>>
> >>> I do think there is an important missing piece though - there should be
> >>> some code which queries the netdev for how many queues are attached, and
> >>> create the appropriate number of umem/AF_XDP sockets.
> >>>
> >>> I ran into this issue when testing the current AF_XDP code - on my test
> >>> boxes, the mlx5 card has 55 channels (aka queues), so when the test program
> >>> binds only to channel 0, nothing works as expected, since not all traffic
> >>> is being intercepted.  While obvious in hindsight, this took a while to
> >>> track down.
> >>
> >> Yes, agreed. You are not the first one to stumble upon this problem
> >> :-). Let me think a little bit on how to solve this in a good way. We
> >> need this to be simple and intuitive, as you say.
> >
> > I see people hitting this with AF_XDP all the time... I had some
> > backup-slides[2] in our FOSDEM presentation[1] that describe the issue,
> > give the performance reason why and propose a workaround.
>
> Magnus, I presume you're going to address this for the initial libbpf merge
> since the plan is to make it easier to consume for users?

I think the first thing we need is education and documentation. Have a
FAQ or "common mistakes" section in the Documentation. And of course,
sending Jesper around the world reminding people about this ;-).

To address this on a libbpf interface level, I think the best way is
to reprogram the NIC to send all traffic to the queue that you
provided in the xsk_socket__create call. This "set up NIC routing"
behavior can then be disabled with a flag, just as the XDP program
loading can be disabled. The standard config of xsk_socket__create
will then set up as many things for the user as possible just to get
up and running quickly. More advanced users can then disable parts of
it to gain more flexibility. Does this sound OK? I do not want to go
the route of polling multiple sockets and aggregating the traffic, as
this will have significant negative performance implications.
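
To illustrate with the socket config struct (field and constant names
as in the proposed xsk.h, modulo any later renames; the steering flag
below is purely hypothetical at this point):

#include "xsk.h"

/* Hypothetical flag to opt out of the NIC steering discussed above;
 * only the prog-load flag exists in the proposal. */
#define XSK_LIBBPF_FLAGS__INHIBIT_NIC_STEERING (1 << 1)

struct xsk_socket_config cfg = {
	.rx_size = XSK_RING_CONS__DEFAULT_NUM_DESCS,
	.tx_size = XSK_RING_PROD__DEFAULT_NUM_DESCS,
	/* The default (0) would load the XDP program and, if added,
	 * also set up the NIC steering. Advanced users opt out: */
	.libbpf_flags = XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD |
			XSK_LIBBPF_FLAGS__INHIBIT_NIC_STEERING,
};

/* Then: xsk_socket__create(&xsk, ifname, queue_id, umem, &rx, &tx, &cfg); */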

/Magnus

> Few more minor items in individual patches, will reply there.
>
> Thanks,
> Daniel
>
> > [1] https://github.com/xdp-project/xdp-project/tree/master/conference/FOSDEM2019
> > [2] https://github.com/xdp-project/xdp-project/blob/master/conference/FOSDEM2019/xdp_building_block.org#backup-slides
> >
> > Alternative work-around
> >   * Create as many AF_XDP sockets as RXQs
> >   * Have userspace poll()/select on all sockets
> >
>
Daniel Borkmann Feb. 18, 2019, 9:38 a.m. UTC | #10
On 02/18/2019 09:20 AM, Magnus Karlsson wrote:
> On Fri, Feb 15, 2019 at 5:48 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
>>
>> On 02/13/2019 12:55 PM, Jesper Dangaard Brouer wrote:
>>> On Wed, 13 Feb 2019 12:32:47 +0100
>>> Magnus Karlsson <magnus.karlsson@gmail.com> wrote:
>>>> On Mon, Feb 11, 2019 at 9:44 PM Jonathan Lemon <jonathan.lemon@gmail.com> wrote:
>>>>> On 8 Feb 2019, at 5:05, Magnus Karlsson wrote:
>>>>>
>>>>>> This patch proposes to add AF_XDP support to libbpf. The main reason
>>>>>> for this is to facilitate writing applications that use AF_XDP by
>>>>>> offering higher-level APIs that hide many of the details of the AF_XDP
>>>>>> uapi. This is in the same vein as libbpf facilitates XDP adoption by
>>>>>> offering easy-to-use higher level interfaces of XDP
>>>>>> functionality. Hopefully this will facilitate adoption of AF_XDP, make
>>>>>> applications using it simpler and smaller, and finally also make it
>>>>>> possible for applications to benefit from optimizations in the AF_XDP
>>>>>> user space access code. Previously, people just copied and pasted the
>>>>>> code from the sample application into their application, which is not
>>>>>> desirable.
>>>>>
>>>>> I like the idea of encapsulating the boilerplate logic in a library.
>>>>>
>>>>> I do think there is an important missing piece though - there should be
>>>>> some code which queries the netdev for how many queues are attached, and
>>>>> create the appropriate number of umem/AF_XDP sockets.
>>>>>
>>>>> I ran into this issue when testing the current AF_XDP code - on my test
>>>>> boxes, the mlx5 card has 55 channels (aka queues), so when the test program
>>>>> binds only to channel 0, nothing works as expected, since not all traffic
>>>>> is being intercepted.  While obvious in hindsight, this took a while to
>>>>> track down.
>>>>
>>>> Yes, agreed. You are not the first one to stumble upon this problem
>>>> :-). Let me think a little bit on how to solve this in a good way. We
>>>> need this to be simple and intuitive, as you say.
>>>
>>> I see people hitting this with AF_XDP all the time... I had some
>>> backup-slides[2] in our FOSDEM presentation[1] that describe the issue,
>>> give the performance reason why and propose a workaround.
>>
>> Magnus, I presume you're going to address this for the initial libbpf merge
>> since the plan is to make it easier to consume for users?
> 
> I think the first thing we need is education and documentation. Have a
> FAQ or "common mistakes" section in the Documentation. And of course,
> sending Jesper around the world reminding people about this ;-).
> 
> To address this on a libbpf interface level, I think the best way is
> to reprogram the NIC to send all traffic to the queue that you
> provided in the xsk_socket__create call. This "set up NIC routing"
> behavior can then be disable with a flag, just as the XDP program
> loading can be disabled. The standard config of xsk_socket__create
> will then set up as many things for the user as possible just to get
> up and running quickly. More advanced users can then disable parts of
> it to gain more flexibility. Does this sound OK? Do not want to go the
> route of polling multiple sockets and aggregating the traffic as this
> will have significant negative performance implications.

I think that is fine, I would probably make this one a dedicated API call
in order to have some more flexibility than just simple flag. E.g. once
nfp AF_XDP support lands at some point, I could imagine that this call
resp. a drop-in replacement API call for more advanced steering could
also take an offloaded BPF prog fd, for example, which then would program
the steering on the NIC [0]. Seems at least there's enough complexity on
its own to have a dedicated API for it. Thoughts?

Thanks,
Daniel

  [0] https://patchwork.ozlabs.org/cover/910614/
Magnus Karlsson Feb. 18, 2019, 10:09 a.m. UTC | #11
On Mon, Feb 18, 2019 at 10:38 AM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> On 02/18/2019 09:20 AM, Magnus Karlsson wrote:
> > On Fri, Feb 15, 2019 at 5:48 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
> >>
> >> On 02/13/2019 12:55 PM, Jesper Dangaard Brouer wrote:
> >>> On Wed, 13 Feb 2019 12:32:47 +0100
> >>> Magnus Karlsson <magnus.karlsson@gmail.com> wrote:
> >>>> On Mon, Feb 11, 2019 at 9:44 PM Jonathan Lemon <jonathan.lemon@gmail.com> wrote:
> >>>>> On 8 Feb 2019, at 5:05, Magnus Karlsson wrote:
> >>>>>
> >>>>>> This patch proposes to add AF_XDP support to libbpf. The main reason
> >>>>>> for this is to facilitate writing applications that use AF_XDP by
> >>>>>> offering higher-level APIs that hide many of the details of the AF_XDP
> >>>>>> uapi. This is in the same vein as libbpf facilitates XDP adoption by
> >>>>>> offering easy-to-use higher level interfaces of XDP
> >>>>>> functionality. Hopefully this will facilitate adoption of AF_XDP, make
> >>>>>> applications using it simpler and smaller, and finally also make it
> >>>>>> possible for applications to benefit from optimizations in the AF_XDP
> >>>>>> user space access code. Previously, people just copied and pasted the
> >>>>>> code from the sample application into their application, which is not
> >>>>>> desirable.
> >>>>>
> >>>>> I like the idea of encapsulating the boilerplate logic in a library.
> >>>>>
> >>>>> I do think there is an important missing piece though - there should be
> >>>>> some code which queries the netdev for how many queues are attached, and
> >>>>> create the appropriate number of umem/AF_XDP sockets.
> >>>>>
> >>>>> I ran into this issue when testing the current AF_XDP code - on my test
> >>>>> boxes, the mlx5 card has 55 channels (aka queues), so when the test program
> >>>>> binds only to channel 0, nothing works as expected, since not all traffic
> >>>>> is being intercepted.  While obvious in hindsight, this took a while to
> >>>>> track down.
> >>>>
> >>>> Yes, agreed. You are not the first one to stumble upon this problem
> >>>> :-). Let me think a little bit on how to solve this in a good way. We
> >>>> need this to be simple and intuitive, as you say.
> >>>
> >>> I see people hitting this with AF_XDP all the time... I had some
> >>> backup-slides[2] in our FOSDEM presentation[1] that describe the issue,
> >>> give the performance reason why and propose a workaround.
> >>
> >> Magnus, I presume you're going to address this for the initial libbpf merge
> >> since the plan is to make it easier to consume for users?
> >
> > I think the first thing we need is education and documentation. Have a
> > FAQ or "common mistakes" section in the Documentation. And of course,
> > sending Jesper around the world reminding people about this ;-).
> >
> > To address this on a libbpf interface level, I think the best way is
> > to reprogram the NIC to send all traffic to the queue that you
> > provided in the xsk_socket__create call. This "set up NIC routing"
> > behavior can then be disable with a flag, just as the XDP program
> > loading can be disabled. The standard config of xsk_socket__create
> > will then set up as many things for the user as possible just to get
> > up and running quickly. More advanced users can then disable parts of
> > it to gain more flexibility. Does this sound OK? Do not want to go the
> > route of polling multiple sockets and aggregating the traffic as this
> > will have significant negative performance implications.
>
> I think that is fine, I would probably make this one a dedicated API call
> in order to have some more flexibility than just simple flag. E.g. once
> nfp AF_XDP support lands at some point, I could imagine that this call
> resp. a drop-in replacement API call for more advanced steering could
> also take an offloaded BPF prog fd, for example, which then would program
> the steering on the NIC [0]. Seems at least there's enough complexity on
> its own to have a dedicated API for it. Thoughts?

I agree that there is probably enough complexity to warrant adding a
higher level API to deal with this problem (flow steering). But there
are likely a number of cases we have not thought of that would complicate
it even further. This is why I suggest that this functionality should
be in its own patch set that I can devote some time and thought to.
IMO, the current patch set and functionality does already lower the
bar of entry significantly and has a value even without hiding or
controlling the steering of traffic. What I would like to do in this
patch set is to add a FAQ section in
Documentation/networking/af_xdp.rst explaining this problem. Something
like: "Q: Why am I not seeing any traffic? A: Check these four
things.....". Could add some text in the libbpf README referring to
this document also. Opinions?

Thanks: Magnus

> Thanks,
> Daniel
>
>   [0] https://patchwork.ozlabs.org/cover/910614/