[RFC,0/5] ebpf: Added ebpf helper for libvirtd.

Message ID	20210609100457.142570-1-andrew@daynix.com
Headers	From: Andrew Melnychenko <andrew@daynix.com>
To: mst@redhat.com, yuri.benditovich@daynix.com, jasowang@redhat.com, armbru@redhat.com, eblake@redhat.com, berrange@redhat.com
Cc: yan@daynix.com, qemu-devel@nongnu.org
Subject: [RFC PATCH 0/5] ebpf: Added ebpf helper for libvirtd.
Date: Wed, 9 Jun 2021 13:04:52 +0300
Series	ebpf: Added ebpf helper for libvirtd.
  [RFC,0/5] ebpf: Added ebpf helper for libvirtd.
  [RFC,1/5] ebpf: Added eBPF initialization by fds.
  [RFC,2/5] virtio-net: Added property to load eBPF RSS with fds.
  [RFC,3/5] ebpf_rss_helper: Added helper for eBPF RSS.
  [RFC,4/5] qmp: Added qemu-ebpf-rss-path command.
  [RFC,5/5] meson: libbpf dependency now exclusively for Linux.

Andrew Melnichenko June 9, 2021, 10:04 a.m. UTC

Libvirt usually launches qemu with strict permissions.
To enable eBPF RSS steering, qemu-ebpf-rss-helper was added.

Added the "ebpf_rss_fds" property for "virtio-net", which allows
initializing the eBPF RSS context with passed program and map fds.

Added qemu-ebpf-rss-helper, a simple helper that loads the eBPF
context and passes the fds through a unix socket.
Libvirt should call the helper and pass the fds to qemu through
the "ebpf_rss_fds" property.
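Passing descriptors "through unix socket" is normally done with SCM_RIGHTS ancillary data. Below is a minimal sketch of both ends, assuming a single message carries all program and map fds. send_fds() and recv_fds() are hypothetical names; the helper's actual wire format is not described in this thread.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

#define MAX_FDS 8

/* Send `nfds` descriptors in one message as SCM_RIGHTS ancillary data.
 * Returns 0 on success, -1 on error. */
static int send_fds(int sock, const int *fds, int nfds)
{
    char byte = 0;   /* stream sockets need at least one byte of payload */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(MAX_FDS * sizeof(int))];
        struct cmsghdr align;   /* guarantees correct alignment of buf */
    } u;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    assert(fds != NULL);
    if (nfds <= 0 || nfds > MAX_FDS) {
        return -1;
    }
    memset(&u, 0, sizeof(u));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = CMSG_SPACE(nfds * sizeof(int));

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(nfds * sizeof(int));
    memcpy(CMSG_DATA(cmsg), fds, nfds * sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive up to `max` descriptors; they arrive with close-on-exec set.
 * Returns the number received, or -1 on error. */
static int recv_fds(int sock, int *fds, int max)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(MAX_FDS * sizeof(int))];
        struct cmsghdr align;
    } u;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int n;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    if (recvmsg(sock, &msg, MSG_CMSG_CLOEXEC) != 1 ||
        (msg.msg_flags & MSG_CTRUNC)) {
        return -1;
    }
    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS) {
        return -1;
    }
    n = (int)((cmsg->cmsg_len - CMSG_LEN(0)) / sizeof(int));
    if (n > max) {
        return -1;   /* a real implementation would close the extras */
    }
    memcpy(fds, CMSG_DATA(cmsg), n * sizeof(int));
    return n;
}
```

The receiver uses MSG_CMSG_CLOEXEC so the fds do not leak into unrelated children. The management layer would then clear close-on-exec only for the qemu process that is meant to inherit them.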

Added explicit target OS check for libbpf dependency in meson.
eBPF RSS works only with Linux TAP, so there is no reason to
build eBPF loader/helper for non-Linux.

Overall, the libvirt process should not need to be aware of the eBPF
RSS "interface": it will not know the eBPF map/program "types" or their
quantity. That's why qemu and the helper should come from the same build
and be "synchronized". Technically, each qemu may have its own helper.
That's why the "query-helper-paths" qmp command was added: qemu returns
the path to the helper that suits it, and libvirt should use "that"
helper for "that" emulator.

qmp sample:
C: { "execute": "query-helper-paths" }
S: { "return": [
     {
       "name": "qemu-ebpf-rss-helper",
       "path": "/usr/local/libexec/qemu-ebpf-rss-helper"
     }
    ]
   }
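On the management side, the "query, then pick the helper by name" step could look like the sketch below. It assumes the QMP reply has already been parsed into a hypothetical struct helper_path array (JSON parsing is omitted). A missing entry is treated as "this build has no eBPF RSS helper".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical parsed form of one "query-helper-paths" entry. */
struct helper_path {
    const char *name;
    const char *path;
};

/* Return the path of the helper called `name`, or NULL if this qemu
 * build did not report it (the feature should then be treated as absent). */
static const char *find_helper_path(const struct helper_path *entries,
                                    size_t n, const char *name)
{
    assert(entries != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(entries[i].name, name) == 0) {
            return entries[i].path;
        }
    }
    return NULL;
}
```

Per the cover letter, the returned path should only be used with the emulator that reported it, because qemu and its helper must come from the same build.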

Andrew Melnychenko (5):
  ebpf: Added eBPF initialization by fds.
  virtio-net: Added property to load eBPF RSS with fds.
  ebpf_rss_helper: Added helper for eBPF RSS.
  qmp: Added qemu-ebpf-rss-path command.
  meson: libbpf dependency now exclusively for Linux.

 ebpf/ebpf_rss-stub.c           |   6 ++
 ebpf/ebpf_rss.c                |  31 +++++++-
 ebpf/ebpf_rss.h                |   5 ++
 ebpf/qemu-ebpf-rss-helper.c    | 130 +++++++++++++++++++++++++++++++++
 hw/net/virtio-net.c            |  77 ++++++++++++++++++-
 include/hw/virtio/virtio-net.h |   1 +
 meson.build                    |  37 ++++++----
 monitor/qmp-cmds.c             |  78 ++++++++++++++++++++
 qapi/misc.json                 |  29 ++++++++
 9 files changed, 374 insertions(+), 20 deletions(-)
 create mode 100644 ebpf/qemu-ebpf-rss-helper.c

Jason Wang June 10, 2021, 6:41 a.m. UTC | #1

On 2021/6/9 6:04 PM, Andrew Melnychenko wrote:
> Libvirt usually launches qemu with strict permissions.
> To enable eBPF RSS steering, qemu-ebpf-rss-helper was added.


A silly question:

The kernel has the following permission checks in the bpf syscall:

        if (sysctl_unprivileged_bpf_disabled && !bpf_capable())
                return -EPERM;
...

        err = security_bpf(cmd, &attr, size);
        if (err < 0)
                return err;

So if I understand the code correctly, the bpf syscall can only be done if:

1) unprivileged_bpf is enabled, or
2) the caller has the capability and passes the LSM checks

So I think the series is for the case where unprivileged_bpf is disabled.
If I'm not wrong, I guess the policy is to grant CAP_BPF but do
fine-grained checks via LSM.

If this is correct, it needs to be described in the commit log.


>
> Added property "ebpf_rss_fds" for "virtio-net" that allows to
> initialize eBPF RSS context with passed program & maps fds.
>
> Added qemu-ebpf-rss-helper - simple helper that loads eBPF
> context and passes fds through unix socket.
> Libvirt should call the helper and pass fds to qemu through
> "ebpf_rss_fds" property.
>
> Added explicit target OS check for libbpf dependency in meson.
> eBPF RSS works only with Linux TAP, so there is no reason to
> build eBPF loader/helper for non-Linux.
>
> Overall, libvirt process should not be aware of the "interface"
> of eBPF RSS, it will not be aware of eBPF maps/program "type" and
> their quantity.


I'm not sure this is the best approach. We have several examples that
involve libvirt:

1) create TAP device (and the TUN_SETIFF)

2) open vhost devices


>   That's why qemu and the helper should be from
> the same build and be "synchronized". Technically each qemu may
> have its own helper. That's why "query-helper-paths" qmp command
> was added. Qemu should return the path to the helper that suits
> and libvirt should use "that" helper for "that" emulator.
>
> qmp sample:
> C: { "execute": "query-helper-paths" }
> S: { "return": [
>       {
>         "name": "qemu-ebpf-rss-helper",
>         "path": "/usr/local/libexec/qemu-ebpf-rss-helper"
>       }
>      ]
>     }


I think we need an example with detailed steps showing how libvirt is
expected to use this.

Thanks



Yuri Benditovich June 10, 2021, 6:55 a.m. UTC | #2

On Thu, Jun 10, 2021 at 9:41 AM Jason Wang <jasowang@redhat.com> wrote:
> I think we need an example on the detail steps for how libvirt is
> expected to use this.

The preliminary patches for libvirt are at
https://github.com/daynix/libvirt/tree/RSSv1


Jason Wang June 11, 2021, 5:36 a.m. UTC | #3

On 2021/6/10 2:55 PM, Yuri Benditovich wrote:
> On Thu, Jun 10, 2021 at 9:41 AM Jason Wang<jasowang@redhat.com>  wrote:
>> I think we need an example on the detail steps for how libvirt is
>> expected to use this.
> The preliminary patches for libvirt are at
> https://github.com/daynix/libvirt/tree/RSSv1


Will have a look, but it would be better if the assumptions about the
management layer were detailed here to ease review.

Thanks



Andrew Melnichenko June 11, 2021, 4:49 p.m. UTC | #4

Hi,

> So I think the series is for unprivileged_bpf disabled. If I'm not
> wrong, I guess the policy is to grant CAP_BPF but do fine grain checks
> via LSM.
>

The main idea is to run eBPF RSS with qemu without any permissions.
Libvirt should handle everything and pass the proper eBPF file descriptors.
For the current eBPF RSS, CAP_SYS_ADMIN (to bypass some limitations) is
also required, and other permissions may be needed in the future.

> I'm not sure this is the best. We have several examples that let libvirt
> to involve. Examples:
>
> 1) create TAP device (and the TUN_SETIFF)
>
> 2) open vhost devices
>

Technically, TAP/vhost are not tied to a particular qemu emulator, so common
TAP creation should fit any modern qemu. The eBPF fds (program and maps) must
match the interface of the current qemu, e.g. some qemu builds may have
different map structures or a different number of maps. It's necessary that
qemu gets fds prepared by the helper that was built with that qemu.

> I think we need an example on the detail steps for how libvirt is
> expected to use this.
>

The simplified workflow looks like this:

   1. Libvirt gets the "emulator" from the domain document.
   2. Libvirt queries the qemu capabilities.
   3. One of the capabilities is the "qemu-ebpf-rss-helper" path (if present).
   4. On NIC preparation, libvirt checks for virtio-net + rss configurations.
   5. If required, "qemu-ebpf-rss-helper" is called and the fds are received
      through a unix socket.
   6. These eBPF RSS fds are passed to the child process, qemu.
   7. Qemu is launched with the virtio-net-pci properties "rss" and "ebpf_rss_fds".
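Steps 5 to 7 depend on the received fds surviving the exec of qemu. Below is a sketch that assumes the fds arrive with close-on-exec set (as with MSG_CMSG_CLOEXEC). spawn_with_fds() and wait_exit_status() are hypothetical names. Building the actual qemu command line, including the value format of "ebpf_rss_fds", is left out because the thread does not specify it.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork and exec `path` so that the child inherits fds[0..n-1].
 * FD_CLOEXEC is cleared only in the child, between fork() and execv().
 * Returns the child pid, or -1 if fork() fails. */
static pid_t spawn_with_fds(const char *path, char *const argv[],
                            const int *fds, int n)
{
    pid_t pid;

    assert(n >= 0);
    pid = fork();
    if (pid < 0) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        /* Child: only async-signal-safe calls until execv(). */
        for (int i = 0; i < n; i++) {
            int flags = fcntl(fds[i], F_GETFD);
            if (flags < 0 ||
                fcntl(fds[i], F_SETFD, flags & ~FD_CLOEXEC) < 0) {
                _exit(126);
            }
        }
        execv(path, argv);
        _exit(127);                  /* execv() only returns on error */
    }
    return pid;
}

/* Wait for `pid`; return its exit status, or -1 if it did not exit normally. */
static int wait_exit_status(pid_t pid)
{
    int status;

    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status)) {
        return -1;
    }
    return WEXITSTATUS(status);
}
```

Clearing FD_CLOEXEC only in the child keeps the eBPF fds out of every other process that the management daemon spawns.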



Daniel P. Berrangé June 11, 2021, 5:24 p.m. UTC | #5

On Fri, Jun 11, 2021 at 07:49:21PM +0300, Andrew Melnichenko wrote:
> The simplified workflow looks like this:
> 
>    1. Libvirt got "emulator" from domain document.
>    2. Libvirt queries for qemu capabilities.
>    3. One of the capabilities is "qemu-ebpf-rss-helper" path(if present).
>    4. On NIC preparation Libvirt checks for virtio-net + rss configurations.
>    5. If required, the "qemu-ebpf-rss-helper" called and fds are received
>    through unix fd.
>    6. Those fds are for eBPF RSS, which passed to child process - qemu.
>    7. Qemu launched with virtio-net-pci property "rss" and "ebpf_rss_fds".

So this basically works in the same way as the qemu bridge
helper, with the extra advantage that we can actually query
QEMU for the right helper instead of libvirt hardcoding the
helper path.  We should make your QMP query command also
return the paths for the existing QEMU helpers (bridge helper
and pr helper).

Anyway, this approach is obviously viable for libvirt, since
it matches what we already do for other features.


Regards,
Daniel

Jason Wang June 15, 2021, 9:13 a.m. UTC | #6

On 2021/6/12 12:49 AM, Andrew Melnichenko wrote:
> Hi,
>
>     So I think the series is for unprivileged_bpf disabled. If I'm not
>     wrong, I guess the policy is to grant CAP_BPF but do fine grain checks
>     via LSM.
>
>
> The main idea is to run eBPF RSS with qemu without any permission.
> Libvirt should handle everything and pass proper eBPF file descriptors.
> For current eBPF RSS, CAP_SYS_ADMIN(bypass some limitations)
> also required, and in the future may be other permissions.


I may be missing something.

But RSS requires updating the map. This won't work if you don't grant
any permission to qemu.

Thanks



Andrew Melnichenko June 15, 2021, 10:18 p.m. UTC | #7

Hi,

> I may miss something.
>
> But RSS requires to update the map. This won't work if you don't grant
> any permission to qemu.
>
> Thanks
>

Partly. With "kernel.unprivileged_bpf_disabled=0", no capabilities are
required to update maps.
With "kernel.unprivileged_bpf_disabled=1", setting maps will fail (without
CAP_BPF) and the "in-qemu" RSS will be used instead.
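The fallback described above can be sketched with the raw bpf(2) syscall from <linux/bpf.h>. The series itself uses libbpf. choose_rss_mode() and the other functions are hypothetical. The point is only that a failed BPF_MAP_UPDATE_ELEM selects the in-qemu path, for example -EPERM when unprivileged bpf is disabled and qemu lacks CAP_BPF.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/bpf.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

enum rss_mode { RSS_MODE_EBPF, RSS_MODE_IN_QEMU };

/* glibc provides no bpf() wrapper, so invoke the syscall directly. */
static long sys_bpf(int cmd, union bpf_attr *attr)
{
    return syscall(__NR_bpf, cmd, attr, sizeof(*attr));
}

/* Create a u32 -> u32 array map (what a loader/helper would do).
 * Returns the map fd, or -errno. */
static int map_create_array_u32(uint32_t max_entries)
{
    union bpf_attr attr;
    long fd;

    assert(max_entries > 0);
    memset(&attr, 0, sizeof(attr));
    attr.map_type = BPF_MAP_TYPE_ARRAY;
    attr.key_size = sizeof(uint32_t);
    attr.value_size = sizeof(uint32_t);
    attr.max_entries = max_entries;
    fd = sys_bpf(BPF_MAP_CREATE, &attr);
    return fd < 0 ? -errno : (int)fd;
}

/* Update one element of an already-open map fd; returns 0 or -errno. */
static int map_update_u32(int map_fd, uint32_t key, uint32_t value)
{
    union bpf_attr attr;

    memset(&attr, 0, sizeof(attr));
    attr.map_fd = (uint32_t)map_fd;
    attr.key = (uint64_t)(uintptr_t)&key;
    attr.value = (uint64_t)(uintptr_t)&value;
    attr.flags = BPF_ANY;
    return sys_bpf(BPF_MAP_UPDATE_ELEM, &attr) == 0 ? 0 : -errno;
}

/* Hypothetical policy: keep eBPF steering only if this process can update
 * the map; otherwise compute RSS inside qemu. */
static enum rss_mode choose_rss_mode(int map_fd, uint32_t key, uint32_t value)
{
    return map_update_u32(map_fd, key, value) == 0 ? RSS_MODE_EBPF
                                                    : RSS_MODE_IN_QEMU;
}
```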


Andrew Melnichenko June 18, 2021, 8:03 p.m. UTC | #8

Hi Jason,
I've checked "kernel.unprivileged_bpf_disabled=0" on Fedora, Ubuntu, and
Debian: no permissions are needed to update BPF maps.
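You can check the sysctl on a given host from an unprivileged process. This assumes the procfs entry is world-readable, which is worth verifying on the target distribution. Below is a minimal sketch; the function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Read kernel.unprivileged_bpf_disabled through procfs.
 * 0 allows unprivileged bpf(); 1 and 2 disable it (2 can later be changed
 * by an admin). Returns the value, or -1 if it cannot be read. */
static int read_unprivileged_bpf_disabled(void)
{
    FILE *f = fopen("/proc/sys/kernel/unprivileged_bpf_disabled", "r");
    int val = -1;

    if (f == NULL) {
        return -1;
    }
    if (fscanf(f, "%d", &val) != 1) {
        val = -1;
    }
    fclose(f);
    return val;
}

/* An unreadable value is reported as "not allowed". */
static int unprivileged_bpf_allowed(void)
{
    int v = read_unprivileged_bpf_disabled();

    assert(v >= -1);
    return v == 0;
}
```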

On Wed, Jun 16, 2021 at 1:18 AM Andrew Melnichenko <andrew@daynix.com>
wrote:

> Hi,
>
>> I may miss something.
>>
>> But RSS requires to update the map. This won't work if you don't grant
>> any permission to qemu.
>>
>> Thanks
>>
>
> Partly - with "kernel.unprivileged_bpf_disabled=0" capabilities is not
> required to update maps.
> With "kernel.unprivileged_bpf_disabled=1" - setting maps will fail(without
> CAP_BPF) and "in-qemu" RSS will be used.
>
> On Tue, Jun 15, 2021 at 12:13 PM Jason Wang <jasowang@redhat.com> wrote:
>
>>
>> 在 2021/6/12 上午12:49, Andrew Melnichenko 写道:
>> > Hi,
>> >
>> >     So I think the series targets the case where unprivileged_bpf is
>> >     disabled. If I'm not wrong, I guess the policy is to grant CAP_BPF
>> >     but do fine-grained checks via LSM.
>> >
>> >
>> > The main idea is to run eBPF RSS with qemu without any permissions.
>> > Libvirt should handle everything and pass the proper eBPF file
>> > descriptors. The current eBPF RSS also requires CAP_SYS_ADMIN (to
>> > bypass some limitations), and other permissions may be needed in the
>> > future.
>>
>>
>> I may be missing something.
>>
>> But RSS requires updating the map. This won't work if you don't grant
>> any permissions to qemu.
>>
>> Thanks
>>
>>
>> >
>> >     I'm not sure this is the best approach. We have several examples
>> >     where libvirt is involved:
>> >
>> >     1) create the TAP device (and the TUNSETIFF ioctl)
>> >
>> >     2) open vhost devices
>> >
>> >
>> > Technically, TAP/vhost is not tied to a particular qemu emulator, so
>> > generic TAP creation should work with any modern qemu. The eBPF fds
>> > (program and maps), however, must match the interface of the current
>> > qemu; e.g., some qemu builds may have different map structures or a
>> > different number of maps. So qemu must get fds prepared by the helper
>> > that was built together with that qemu.
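To make the "same build" requirement concrete, qemu could at least check that a received fd set has the shape its own eBPF RSS context expects. A sketch assuming one program fd followed by three map fds (configuration, Toeplitz key, indirection table), mirroring the fields of QEMU's eBPF RSS context; treat that count and order as an assumption for this RFC. The enum and `ebpf_rss_fds_valid()` are hypothetical, and verifying the actual BPF object types via the bpf() syscall is left out:

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical fd layout that a qemu build and its helper agree on. */
enum ebpf_rss_fd_index {
    EBPF_RSS_FD_PROGRAM,
    EBPF_RSS_FD_MAP_CONFIGURATION,
    EBPF_RSS_FD_MAP_TOEPLITZ_KEY,
    EBPF_RSS_FD_MAP_INDIRECTIONS_TABLE,
    EBPF_RSS_FD_COUNT
};

/*
 * Reject an fd set that cannot belong to this build: a wrong number of
 * descriptors, or a descriptor that is not open in this process.
 * Checking that each fd really is the expected BPF program/map is omitted.
 */
static bool ebpf_rss_fds_valid(const int *fds, size_t nfds)
{
    if (fds == NULL || nfds != EBPF_RSS_FD_COUNT) {
        return false;
    }
    for (size_t i = 0; i < nfds; i++) {
        if (fds[i] < 0 || fcntl(fds[i], F_GETFD) == -1) {
            return false;
        }
    }
    return true;
}
```

With a check like this, fds produced by a helper from a different build would fail early with a clear error instead of being used with the wrong layout.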
>> >
>> >     I think we need an example with detailed steps showing how libvirt
>> >     is expected to use this.
>> >
>> >
>> > The simplified workflow looks like this:
>> >
>> >  1. Libvirt gets the "emulator" from the domain document.
>> >  2. Libvirt queries the qemu capabilities.
>> >  3. One of the capabilities is the "qemu-ebpf-rss-helper" path (if
>> >     present).
>> >  4. During NIC preparation, libvirt checks for a virtio-net + rss
>> >     configuration.
>> >  5. If required, "qemu-ebpf-rss-helper" is called and the fds are
>> >     received through a unix socket.
>> >  6. Those eBPF RSS fds are passed to the child process (qemu).
>> >  7. Qemu is launched with the virtio-net-pci properties "rss" and
>> >     "ebpf_rss_fds".
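Step 5 relies on ordinary Unix fd passing. A minimal C sketch of both ends, using SCM_RIGHTS ancillary data over an AF_UNIX socket; `send_fds()`, `recv_fds()` and `RSS_HELPER_MAX_FDS` are hypothetical names, and sending all fds in one message with a single dummy byte is an assumption, since the thread does not describe the helper's actual wire format:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Hypothetical upper bound on how many fds one helper reply may carry. */
#define RSS_HELPER_MAX_FDS 16

/* Helper side: send `nfds` descriptors plus one dummy byte in one message. */
static int send_fds(int sock, const int *fds, size_t nfds)
{
    char byte = 0; /* SCM_RIGHTS must accompany at least one data byte */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(RSS_HELPER_MAX_FDS * sizeof(int))];
        struct cmsghdr align; /* ensures buf is suitably aligned */
    } u;
    struct msghdr msg = { 0 };
    struct cmsghdr *cmsg;

    assert(fds != NULL);
    if (nfds == 0 || nfds > RSS_HELPER_MAX_FDS) {
        return -1;
    }
    memset(&u, 0, sizeof(u));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = CMSG_SPACE(nfds * sizeof(int));

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(nfds * sizeof(int));
    memcpy(CMSG_DATA(cmsg), fds, nfds * sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Management side: store received fds in `fds`; returns their count or -1. */
static int recv_fds(int sock, int *fds, size_t max)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(RSS_HELPER_MAX_FDS * sizeof(int))];
        struct cmsghdr align;
    } u;
    struct msghdr msg = { 0 };
    struct cmsghdr *cmsg;
    size_t n;

    assert(fds != NULL);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    if (recvmsg(sock, &msg, 0) != 1 || (msg.msg_flags & MSG_CTRUNC)) {
        return -1;
    }
    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS) {
        return -1;
    }
    n = (cmsg->cmsg_len - CMSG_LEN(0)) / sizeof(int);
    if (n > max) {
        return -1; /* a real implementation would close the received fds */
    }
    memcpy(fds, CMSG_DATA(cmsg), n * sizeof(int));
    return (int)n;
}
```

After `recv_fds()`, the management process would keep these descriptors open (without close-on-exec) across the fork/exec of qemu and reference them in the "ebpf_rss_fds" property; the exact value syntax of that property is not shown in this thread.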
>> >
>> >
>> > On Fri, Jun 11, 2021 at 8:36 AM Jason Wang <jasowang@redhat.com> wrote:
>> >
>> >
>> >     On 2021/6/10 2:55 PM, Yuri Benditovich wrote:
>> >     > On Thu, Jun 10, 2021 at 9:41 AM Jason Wang <jasowang@redhat.com> wrote:
>> >     >> On 2021/6/9 6:04 PM, Andrew Melnychenko wrote:
>> >     >>> Libvirt usually launches qemu with strict permissions.
>> >     >>> To enable eBPF RSS steering, qemu-ebpf-rss-helper was added.
>> >     >> A silly question:
>> >     >>
>> >     >> The kernel has the following permission checks in the bpf syscall:
>> >     >>
>> >     >>          if (sysctl_unprivileged_bpf_disabled && !bpf_capable())
>> >     >>                   return -EPERM;
>> >     >> ...
>> >     >>
>> >     >>           err = security_bpf(cmd, &attr, size);
>> >     >>           if (err < 0)
>> >     >>                   return err;
>> >     >>
>> >     >> So if I understand the code correctly, the bpf syscall can only
>> >     >> succeed if:
>> >     >>
>> >     >> 1) unprivileged_bpf is enabled, or
>> >     >> 2) the caller has the capability and passes the LSM checks
>> >     >>
>> >     >> So I think the series targets the case where unprivileged_bpf is
>> >     >> disabled. If I'm not wrong, I guess the policy is to grant CAP_BPF
>> >     >> but do fine-grained checks via LSM.
>> >     >>
>> >     >> If this is correct, it needs to be described in the commit log.
>> >     >>
>> >     >>
>> >     >>> Added the property "ebpf_rss_fds" for "virtio-net", which allows
>> >     >>> initializing the eBPF RSS context with passed program and map fds.
>> >     >>>
>> >     >>> Added qemu-ebpf-rss-helper, a simple helper that loads the eBPF
>> >     >>> context and passes the fds through a unix socket.
>> >     >>> Libvirt should call the helper and pass the fds to qemu through
>> >     >>> the "ebpf_rss_fds" property.
>> >     >>>
>> >     >>> Added an explicit target OS check for the libbpf dependency in
>> >     >>> meson. eBPF RSS works only with Linux TAP, so there is no reason
>> >     >>> to build the eBPF loader/helper for non-Linux targets.
>> >     >>>
>> >     >>> Overall, the libvirt process should not need to know the
>> >     >>> "interface" of eBPF RSS; it will not be aware of the eBPF
>> >     >>> maps/program "type" or their quantity.
>> >     >> I'm not sure this is the best approach. We have several examples
>> >     >> where libvirt is involved:
>> >     >>
>> >     >> 1) create the TAP device (and the TUNSETIFF ioctl)
>> >     >>
>> >     >> 2) open vhost devices
>> >     >>
>> >     >>
>> >     >>> That's why qemu and the helper should come from the same build
>> >     >>> and be "synchronized". Technically, each qemu may have its own
>> >     >>> helper. That's why the "query-helper-paths" qmp command was added:
>> >     >>> qemu should return the path to the matching helper, and libvirt
>> >     >>> should use "that" helper for "that" emulator.
>> >     >>>
>> >     >>> qmp sample:
>> >     >>> C: { "execute": "query-helper-paths" }
>> >     >>> S: { "return": [
>> >     >>>        {
>> >     >>>          "name": "qemu-ebpf-rss-helper",
>> >     >>>          "path": "/usr/local/libexec/qemu-ebpf-rss-helper"
>> >     >>>        }
>> >     >>>       ]
>> >     >>>      }
>> >     >> I think we need an example with detailed steps showing how libvirt
>> >     >> is expected to use this.
>> >     > The preliminary patches for libvirt are at
>> >     > https://github.com/daynix/libvirt/tree/RSSv1
>> >
>> >
>> >     Will have a look, but it would be better if the assumptions about the
>> >     management layer were detailed here to make review easier.
>> >
>> >     Thanks
>> >
>> >
>> >     >
>> >
>>
>>

Jason Wang June 21, 2021, 9:20 a.m. UTC | #9

On 2021/6/19 4:03 AM, Andrew Melnichenko wrote:
> Hi Jason,
> I've checked that "kernel.unprivileged_bpf_disabled=0" on Fedora, Ubuntu,
> and Debian, so no permissions are needed there to update BPF maps.


How about RHEL :) ?

Thanks



Yuri Benditovich June 22, 2021, 3:29 a.m. UTC | #10

On Mon, Jun 21, 2021 at 12:20 PM Jason Wang <jasowang@redhat.com> wrote:
>
>
> On 2021/6/19 4:03 AM, Andrew Melnichenko wrote:
> > Hi Jason,
> > I've checked that "kernel.unprivileged_bpf_disabled=0" on Fedora, Ubuntu,
> > and Debian, so no permissions are needed there to update BPF maps.
>
>
> How about RHEL :) ?

If I'm not mistaken, RHEL releases do not use modern kernels yet
(for BPF we need 5.8+), so this will (probably) be relevant for
RHEL 9. Please correct me if I'm wrong.


>
> Thanks
>

Jason Wang June 22, 2021, 4:58 a.m. UTC | #11

On 2021/6/22 11:29 AM, Yuri Benditovich wrote:
> On Mon, Jun 21, 2021 at 12:20 PM Jason Wang <jasowang@redhat.com> wrote:
>>
>> On 2021/6/19 4:03 AM, Andrew Melnichenko wrote:
>>> Hi Jason,
>>> I've checked that "kernel.unprivileged_bpf_disabled=0" on Fedora, Ubuntu,
>>> and Debian, so no permissions are needed there to update BPF maps.
>>
>> How about RHEL :) ?
> If I'm not mistaken, RHEL releases do not use modern kernels yet
> (for BPF we need 5.8+), so this will (probably) be relevant for
> RHEL 9. Please correct me if I'm wrong.


Adding Toke for more ideas on this.

Thanks


>
>
>> Thanks

Toke Høiland-Jørgensen June 22, 2021, 8:25 a.m. UTC | #12

Jason Wang <jasowang@redhat.com> writes:

> On 2021/6/22 11:29 AM, Yuri Benditovich wrote:
>> On Mon, Jun 21, 2021 at 12:20 PM Jason Wang <jasowang@redhat.com> wrote:
>>>
>>> On 2021/6/19 4:03 AM, Andrew Melnichenko wrote:
>>>> Hi Jason,
>>>> I've checked "kernel.unprivileged_bpf_disabled=0" on Fedora, Ubuntu,
>>>> and Debian: no permissions are needed to update BPF maps.
>>>
>>> How about RHEL :) ?
>> If I'm not mistaken, the RHEL releases do not use modern kernels yet
>> (for BPF we need 5.8+).
>> So this will be (probably) relevant for RHEL 9. Please correct me if I'm wrong.
>
> Adding Toke for more ideas on this.

Ignore the kernel version number; we backport all of BPF to RHEL,
basically. RHEL8.4 is up to upstream kernel 5.10, feature-wise.

However, we completely disable unprivileged BPF on RHEL kernels. Also,
there's upstream commit:
08389d888287 ("bpf: Add kconfig knob for disabling unpriv bpf by default")

which adds a new value of '2' to the unprivileged_bpf_disabled sysctl. I
believe this may end up being the default on Fedora as well.

So any design relying on unprivileged BPF is likely to break; I'd
suggest you look into how you can get this to work with CAP_BPF :)

-Toke

Daniel P. Berrangé June 22, 2021, 8:27 a.m. UTC | #13

On Tue, Jun 22, 2021 at 10:25:19AM +0200, Toke Høiland-Jørgensen wrote:
> Jason Wang <jasowang@redhat.com> writes:
> 
> > On 2021/6/22 11:29 AM, Yuri Benditovich wrote:
> >> On Mon, Jun 21, 2021 at 12:20 PM Jason Wang <jasowang@redhat.com> wrote:
> >>>
> >>> On 2021/6/19 4:03 AM, Andrew Melnichenko wrote:
> >>>> Hi Jason,
> >>>> I've checked "kernel.unprivileged_bpf_disabled=0" on Fedora, Ubuntu,
> >>>> and Debian: no permissions are needed to update BPF maps.
> >>>
> >>> How about RHEL :) ?
> >> If I'm not mistaken, the RHEL releases do not use modern kernels yet
> >> (for BPF we need 5.8+).
> >> So this will be (probably) relevant for RHEL 9. Please correct me if I'm wrong.
> >
> > Adding Toke for more ideas on this.
> 
> Ignore the kernel version number; we backport all of BPF to RHEL,
> basically. RHEL8.4 is up to upstream kernel 5.10, feature-wise.
> 
> However, we completely disable unprivileged BPF on RHEL kernels. Also,
> there's upstream commit:
> 08389d888287 ("bpf: Add kconfig knob for disabling unpriv bpf by default")
> 
> which adds a new value of '2' to the unprivileged_bpf_disabled sysctl. I
> believe this may end up being the default on Fedora as well.
> 
> So any design relying on unprivileged BPF is likely to break; I'd
> suggest you look into how you can get this to work with CAP_BPF :)

QEMU will never have any capabilities. Any resources that required
privileges have to be opened by a separate privileged helper, and the
open FD then passed across to the QEMU process. This relies on the
capabilities checks only being performed at time of initial opening,
and *not* on operations performed on the already open FD.
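For reference, handing an already-open FD from a privileged process to an unprivileged one is normally done with SCM_RIGHTS ancillary data on a Unix domain socket, which matches the cover letter's description of qemu-ebpf-rss-helper passing FDs "through unix socket". The following is a minimal C sketch under that assumption; `send_fds()`, `recv_fds()` and `MAX_FDS` are hypothetical names, and the helper's real wire format (message framing, which FD is the program and which are maps) is not described in this thread.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

#define MAX_FDS 8 /* arbitrary cap chosen for this sketch */

/* Send nfds open descriptors over a connected Unix socket (SCM_RIGHTS). */
static int send_fds(int sock, const int *fds, size_t nfds)
{
    char byte = 0; /* at least one byte of real data must carry the FDs */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int) * MAX_FDS)];
        struct cmsghdr align; /* forces correct alignment of buf */
    } ctrl;
    struct msghdr msg = {0};
    struct cmsghdr *cmsg;

    assert(nfds > 0 && nfds <= MAX_FDS);
    memset(&ctrl, 0, sizeof(ctrl));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = CMSG_SPACE(sizeof(int) * nfds);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int) * nfds);
    memcpy(CMSG_DATA(cmsg), fds, sizeof(int) * nfds);

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive descriptors into fds; returns how many were received, or -1. */
static int recv_fds(int sock, int *fds, size_t max_fds)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int) * MAX_FDS)];
        struct cmsghdr align;
    } ctrl;
    struct msghdr msg = {0};
    struct cmsghdr *cmsg;
    size_t n;

    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    if (recvmsg(sock, &msg, 0) != 1 || (msg.msg_flags & MSG_CTRUNC)) {
        return -1;
    }
    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS) {
        return -1;
    }
    n = (cmsg->cmsg_len - CMSG_LEN(0)) / sizeof(int);
    if (n > max_fds) {
        /* Caller's buffer is too small: close what we got instead of leaking. */
        int tmp[MAX_FDS];
        memcpy(tmp, CMSG_DATA(cmsg), sizeof(int) * n);
        for (size_t i = 0; i < n; i++) {
            close(tmp[i]);
        }
        return -1;
    }
    memcpy(fds, CMSG_DATA(cmsg), sizeof(int) * n);
    return (int)n;
}
```

The receiver gets new descriptor numbers that refer to the same kernel objects, so nothing is re-opened on the QEMU side; any capability check already happened when the helper created them.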

Regards,
Daniel

Toke Høiland-Jørgensen June 22, 2021, 9:09 a.m. UTC | #14

Daniel P. Berrangé <berrange@redhat.com> writes:

> On Tue, Jun 22, 2021 at 10:25:19AM +0200, Toke Høiland-Jørgensen wrote:
>> Jason Wang <jasowang@redhat.com> writes:
>> 
>> > On 2021/6/22 11:29 AM, Yuri Benditovich wrote:
>> >> On Mon, Jun 21, 2021 at 12:20 PM Jason Wang <jasowang@redhat.com> wrote:
>> >>>
>> >>> On 2021/6/19 4:03 AM, Andrew Melnichenko wrote:
>> >>>> Hi Jason,
>> >>>> I've checked "kernel.unprivileged_bpf_disabled=0" on Fedora, Ubuntu,
>> >>>> and Debian: no permissions are needed to update BPF maps.
>> >>>
>> >>> How about RHEL :) ?
>> >> If I'm not mistaken, the RHEL releases do not use modern kernels yet
>> >> (for BPF we need 5.8+).
>> >> So this will be (probably) relevant for RHEL 9. Please correct me if I'm wrong.
>> >
>> > Adding Toke for more ideas on this.
>> 
>> Ignore the kernel version number; we backport all of BPF to RHEL,
>> basically. RHEL8.4 is up to upstream kernel 5.10, feature-wise.
>> 
>> However, we completely disable unprivileged BPF on RHEL kernels. Also,
>> there's upstream commit:
>> 08389d888287 ("bpf: Add kconfig knob for disabling unpriv bpf by default")
>> 
>> which adds a new value of '2' to the unprivileged_bpf_disabled sysctl. I
>> believe this may end up being the default on Fedora as well.
>> 
>> So any design relying on unprivileged BPF is likely to break; I'd
>> suggest you look into how you can get this to work with CAP_BPF :)
>
> QEMU will never have any capabilities. Any resources that required
> privileges have to be opened by a separate privileged helper, and the
> open FD then passed across to the QEMU process. This relies on the
> capabilities checks only being performed at time of initial opening,
> and *not* on operations performed on the already open FD.

That won't work for regular map updates either, unfortunately: you still
have to perform a bpf() syscall to update an element, and that is a
privileged operation.
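To make Toke's point concrete: an element update is its own bpf(2) command, BPF_MAP_UPDATE_ELEM, issued against the map FD. As he notes, with unprivileged BPF disabled this call is refused for a process without CAP_BPF (or CAP_SYS_ADMIN), even though the FD itself was handed over legitimately. A minimal sketch using the raw syscall and the UAPI header `<linux/bpf.h>`; `map_update_elem()` is a hypothetical wrapper (libbpf ships a similar `bpf_map_update_elem()`).

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/bpf.h>

/*
 * Update one element of an existing map through its FD using the raw
 * bpf(2) syscall. Returns 0 on success, or -1 with errno set (e.g. EPERM
 * when the caller is not allowed to use bpf(2) at all).
 */
static int map_update_elem(int map_fd, const void *key, const void *value,
                           uint64_t flags)
{
    union bpf_attr attr;

    assert(key != NULL && value != NULL);
    memset(&attr, 0, sizeof(attr));
    attr.map_fd = (uint32_t)map_fd;
    attr.key = (uint64_t)(uintptr_t)key;
    attr.value = (uint64_t)(uintptr_t)value;
    attr.flags = flags; /* BPF_ANY, BPF_NOEXIST or BPF_EXIST */

    return (int)syscall(__NR_bpf, BPF_MAP_UPDATE_ELEM, &attr, sizeof(attr));
}
```

Under Toke's second option below, the long-lived privileged helper would be the process making this call on QEMU's behalf; the key and value layouts of QEMU's RSS maps are not shown here.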

You may be able to get around this by using an array map type and
mmap()'ing the map contents, but I'm not sure how well that will work
across process boundaries.
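A rough sketch of the mmap() route from the QEMU side, assuming the helper created a BPF_MAP_TYPE_ARRAY with the BPF_F_MMAPABLE flag (Linux 5.5+) and passed its FD over. The kernel lays out such a map so that value i sits at offset i * round_up(value_size, 8) of the mapping, so an update becomes a plain memory store with no bpf(2) call. The struct and function names are hypothetical, and plain stores are not atomic with respect to a BPF program reading the same entry concurrently; this sketch does not address that.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical view of an mmap()'ed BPF_F_MMAPABLE array map. */
struct array_map_view {
    uint8_t *base;        /* start of the shared value area */
    size_t len;           /* mapped length, page aligned */
    size_t elem_size;     /* value_size rounded up to 8 bytes */
    uint32_t max_entries;
};

static size_t round_up_sz(size_t v, size_t align)
{
    return (v + align - 1) / align * align;
}

/* Map the values of an array map whose FD was received from the helper. */
static int array_map_mmap(int map_fd, uint32_t value_size,
                          uint32_t max_entries, struct array_map_view *view)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    void *p;

    assert(value_size > 0 && max_entries > 0);
    view->elem_size = round_up_sz(value_size, 8);
    view->max_entries = max_entries;
    view->len = round_up_sz(view->elem_size * max_entries, page);

    p = mmap(NULL, view->len, PROT_READ | PROT_WRITE, MAP_SHARED, map_fd, 0);
    if (p == MAP_FAILED) {
        perror("mmap");
        return -1;
    }
    view->base = p;
    return 0;
}

/* Overwrite entry 'index' in place; no bpf(2) syscall is involved. */
static int array_map_store(struct array_map_view *view, uint32_t index,
                           const void *value, size_t value_size)
{
    if (index >= view->max_entries || value_size > view->elem_size) {
        return -1;
    }
    memcpy(view->base + (size_t)index * view->elem_size, value, value_size);
    return 0;
}

static void array_map_unmap(struct array_map_view *view)
{
    munmap(view->base, view->len);
    view->base = NULL;
}
```

Because the mapping only needs an FD, the same code works whether the FD came from the helper over SCM_RIGHTS or elsewhere; for experiments without privileges, an ordinary file of the same size can stand in for the map FD.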

If it doesn't, I only see two possibilities: populate the map
ahead-of-time and leave it in place, or keep the privileged helper
process around to perform map updates on behalf of QEMU...

-Toke

Andrew Melnichenko June 22, 2021, 1:01 p.m. UTC | #15

Hi,
Thank you for your comments.
I'll experiment with mmap() on array-type maps and will follow up with a solution.


Toke Høiland-Jørgensen June 22, 2021, 1:17 p.m. UTC | #16

Andrew Melnichenko <andrew@daynix.com> writes:

> Hi,
> Thank you for your comments.
> I'll play with array type mmap. And later will provide some solution.

Cool - you're welcome! :)

-Toke

Jason Wang June 23, 2021, 12:47 a.m. UTC | #17

On 2021/6/22 5:09 PM, Toke Høiland-Jørgensen wrote:
> Daniel P. Berrangé <berrange@redhat.com> writes:
>
>> On Tue, Jun 22, 2021 at 10:25:19AM +0200, Toke Høiland-Jørgensen wrote:
>>> Jason Wang <jasowang@redhat.com> writes:
>>>
>>>> On 2021/6/22 11:29 AM, Yuri Benditovich wrote:
>>>>> On Mon, Jun 21, 2021 at 12:20 PM Jason Wang <jasowang@redhat.com> wrote:
>>>>>> On 2021/6/19 4:03 AM, Andrew Melnichenko wrote:
>>>>>>> Hi Jason,
>>>>>>> I've checked "kernel.unprivileged_bpf_disabled=0" on Fedora, Ubuntu,
>>>>>>> and Debian: no permissions are needed to update BPF maps.
>>>>>> How about RHEL :) ?
>>>>> If I'm not mistaken, the RHEL releases do not use modern kernels yet
>>>>> (for BPF we need 5.8+).
>>>>> So this will be (probably) relevant for RHEL 9. Please correct me if I'm wrong.
>>>> Adding Toke for more ideas on this.
>>> Ignore the kernel version number; we backport all of BPF to RHEL,
>>> basically. RHEL8.4 is up to upstream kernel 5.10, feature-wise.
>>>
>>> However, we completely disable unprivileged BPF on RHEL kernels. Also,
>>> there's upstream commit:
>>> 08389d888287 ("bpf: Add kconfig knob for disabling unpriv bpf by default")
>>>
>>> which adds a new value of '2' to the unprivileged_bpf_disabled sysctl. I
>>> believe this may end up being the default on Fedora as well.
>>>
>>> So any design relying on unprivileged BPF is likely to break; I'd
>>> suggest you look into how you can get this to work with CAP_BPF :)
>> QEMU will never have any capabilities. Any resources that required
>> privileges have to be opened by a separate privileged helper, and the
>> open FD then passed across to the QEMU process. This relies on the
>> capabilities checks only being performed at time of initial opening,
>> and *not* on operations performed on the already open FD.
> That won't work for regular map updates either, unfortunately: you still
> have to perform a bpf() syscall to update an element, and that is a
> privileged operation.
>
> You may be able to get around this by using an array map type and
> mmap()'ing the map contents, but I'm not sure how well that will work
> across process boundaries.
>
> If it doesn't, I only see two possibilities: populate the map
> ahead-of-time and leave it in place, or keep the privileged helper
> process around to perform map updates on behalf of QEMU...


Right, and this could probably be done by extending and tracking the RSS
update via rx filter event.

Thanks


>
> -Toke
>

Yuri Benditovich June 28, 2021, 11:18 a.m. UTC | #18

On Wed, Jun 23, 2021 at 3:47 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
> On 2021/6/22 5:09 PM, Toke Høiland-Jørgensen wrote:
> > Daniel P. Berrangé <berrange@redhat.com> writes:
> >
> >> On Tue, Jun 22, 2021 at 10:25:19AM +0200, Toke Høiland-Jørgensen wrote:
> >>> Jason Wang <jasowang@redhat.com> writes:
> >>>
> >>>> On 2021/6/22 11:29 AM, Yuri Benditovich wrote:
> >>>>> On Mon, Jun 21, 2021 at 12:20 PM Jason Wang <jasowang@redhat.com> wrote:
> >>>>>> On 2021/6/19 4:03 AM, Andrew Melnichenko wrote:
> >>>>>>> Hi Jason,
> >>>>>>> I've checked "kernel.unprivileged_bpf_disabled=0" on Fedora, Ubuntu,
> >>>>>>> and Debian: no permissions are needed to update BPF maps.
> >>>>>> How about RHEL :) ?
> >>>>> If I'm not mistaken, the RHEL releases do not use modern kernels yet
> >>>>> (for BPF we need 5.8+).
> >>>>> So this will be (probably) relevant for RHEL 9. Please correct me if I'm wrong.
> >>>> Adding Toke for more ideas on this.
> >>> Ignore the kernel version number; we backport all of BPF to RHEL,
> >>> basically. RHEL8.4 is up to upstream kernel 5.10, feature-wise.
> >>>
> >>> However, we completely disable unprivileged BPF on RHEL kernels. Also,
> >>> there's upstream commit:
> >>> 08389d888287 ("bpf: Add kconfig knob for disabling unpriv bpf by default")
> >>>
> >>> which adds a new value of '2' to the unprivileged_bpf_disabled sysctl. I
> >>> believe this may end up being the default on Fedora as well.
> >>>
> >>> So any design relying on unprivileged BPF is likely to break; I'd
> >>> suggest you look into how you can get this to work with CAP_BPF :)
> >> QEMU will never have any capabilities. Any resources that required
> >> privileges have to be opened by a separate privileged helper, and the
> >> open FD then passed across to the QEMU process. This relies on the
> >> capabilities checks only being performed at time of initial opening,
> >> and *not* on operations performed on the already open FD.
> > That won't work for regular map updates either, unfortunately: you still
> > have to perform a bpf() syscall to update an element, and that is a
> > privileged operation.
> >
> > You may be able to get around this by using an array map type and
> > mmap()'ing the map contents, but I'm not sure how well that will work
> > across process boundaries.
> >
> > If it doesn't, I only see two possibilities: populate the map
> > ahead-of-time and leave it in place, or keep the privileged helper
> > process around to perform map updates on behalf of QEMU...
>
>
> Right, and this could be probably done by extending and tracking the RSS
> update via rx filter event.

Jason,
Can you please go into a little more detail: what do you mean by
'extending and tracking the RSS update via rx filter event'?

Thanks,
Yuri

>
> Thanks
>
>
> >
> > -Toke
> >
>

Jason Wang June 29, 2021, 3:39 a.m. UTC | #19

On 2021/6/28 7:18 PM, Yuri Benditovich wrote:
> On Wed, Jun 23, 2021 at 3:47 AM Jason Wang <jasowang@redhat.com> wrote:
>>
>> On 2021/6/22 5:09 PM, Toke Høiland-Jørgensen wrote:
>>> Daniel P. Berrangé <berrange@redhat.com> writes:
>>>
>>>> On Tue, Jun 22, 2021 at 10:25:19AM +0200, Toke Høiland-Jørgensen wrote:
>>>>> Jason Wang <jasowang@redhat.com> writes:
>>>>>
>>>>>> On 2021/6/22 11:29 AM, Yuri Benditovich wrote:
>>>>>>> On Mon, Jun 21, 2021 at 12:20 PM Jason Wang <jasowang@redhat.com> wrote:
>>>>>>>> On 2021/6/19 4:03 AM, Andrew Melnichenko wrote:
>>>>>>>>> Hi Jason,
>>>>>>>>> I've checked "kernel.unprivileged_bpf_disabled=0" on Fedora, Ubuntu,
>>>>>>>>> and Debian: no permissions are needed to update BPF maps.
>>>>>>>> How about RHEL :) ?
>>>>>>> If I'm not mistaken, the RHEL releases do not use modern kernels yet
>>>>>>> (for BPF we need 5.8+).
>>>>>>> So this will be (probably) relevant for RHEL 9. Please correct me if I'm wrong.
>>>>>> Adding Toke for more ideas on this.
>>>>> Ignore the kernel version number; we backport all of BPF to RHEL,
>>>>> basically. RHEL8.4 is up to upstream kernel 5.10, feature-wise.
>>>>>
>>>>> However, we completely disable unprivileged BPF on RHEL kernels. Also,
>>>>> there's upstream commit:
>>>>> 08389d888287 ("bpf: Add kconfig knob for disabling unpriv bpf by default")
>>>>>
>>>>> which adds a new value of '2' to the unprivileged_bpf_disabled sysctl. I
>>>>> believe this may end up being the default on Fedora as well.
>>>>>
>>>>> So any design relying on unprivileged BPF is likely to break; I'd
>>>>> suggest you look into how you can get this to work with CAP_BPF :)
>>>> QEMU will never have any capabilities. Any resources that required
>>>> privileges have to be opened by a separate privileged helper, and the
>>>> open FD then passed across to the QEMU process. This relies on the
>>>> capabilities checks only being performed at time of initial opening,
>>>> and *not* on operations performed on the already open FD.
>>> That won't work for regular map updates either, unfortunately: you still
>>> have to perform a bpf() syscall to update an element, and that is a
>>> privileged operation.
>>>
>>> You may be able to get around this by using an array map type and
>>> mmap()'ing the map contents, but I'm not sure how well that will work
>>> across process boundaries.
>>>
>>> If it doesn't, I only see two possibilities: populate the map
>>> ahead-of-time and leave it in place, or keep the privileged helper
>>> process around to perform map updates on behalf of QEMU...
>>
>> Right, and this could be probably done by extending and tracking the RSS
>> update via rx filter event.
> Jason,
> Can you please go into a little more detail: what do you mean by
> 'extending and tracking the RSS update via rx filter event'?


There's a monitor event that QEMU can use to notify a privileged
application (e.g. one that has CAP_NET_ADMIN) to update the rx filter
attributes of the host networking device.

It works like this: when the guest updates its rx filters, QEMU generates
an rx filter update event (see rxfilter_notify()), which the privileged
application can capture.

The privileged application then queries the rx filter information via the
query-rx-filter command and performs the appropriate setup.

This was designed for macvtap, but I think it could be used for RSS as well.

The helper could monitor the rx-filter event and update the eBPF maps, but
I'm not sure whether that would need some coordination with libvirt.

Thanks



Andrew Melnichenko June 30, 2021, 4:40 p.m. UTC | #20

Hi all,
Thank you for your comments.
I've tested a few possible solutions, and I'll prepare new RFC patches
with mmap()-based eBPF in the near future.

