Message ID: 20220531004237.3872754-1-numans@ovn.org
Series: Basic eBPF/XDP support in OVN.
On Mon, May 30, 2022 at 5:46 PM <numans@ovn.org> wrote:
>
> From: Numan Siddique <numans@ovn.org>
>
> The XDP program ovn_xdp.c, added in this RFC patch series, implements basic
> port security and drops any packet if the port security check fails.
> There are still a few TODOs in the port security checks, such as:
>      - Make OVN XDP configurable.
>      - Remove the ingress OpenFlow rules from tables 73 and 74 if OVN XDP
>        is enabled.
>      - Add IPv6 support.
>      - Enhance the port security XDP program for ARP/IPv6 ND checks.
>
> This patch adds basic XDP support in OVN, and in the future we can
> leverage more eBPF/XDP features.
>
> I'm not sure how much value this RFC patch adds by making use of eBPF/XDP
> just for port security.  Submitting as RFC to get some feedback and
> start a conversation on eBPF/XDP in OVN.
>
Hi Numan,

This is really cool.  It demonstrates how OVN could leverage eBPF/XDP.

On the other hand, for the port-security feature in XDP, I keep thinking
about the scenarios and it is still not very clear to me.  One advantage I
can think of is preventing DOS attacks from a VM/Pod when invalid IP/MAC
addresses are used: XDP may perform better and drop packets at lower CPU
cost (compared with the OVS kernel datapath).  However, I am also wondering
why an attacker would use invalid IP/MAC for DOS attacks.  Do you have more
thoughts about the use cases?  And do you have any performance results
comparing with the current OVS implementation?

Another question: would it work with smart NIC HW offload, where VF
representor ports are added to OVS on the smart NIC?  I guess XDP doesn't
support representor ports, right?

Thanks,
Han

> In order to attach and detach XDP programs, libxdp [1] and libbpf are used.
>
> To test it out locally, please install libxdp-devel and libbpf-devel,
> compile OVN first, and then compile ovn_xdp by running "make bpf".
> Copy ovn_xdp.o to either /usr/share/ovn/ or /usr/local/share/ovn/.
>
>
> Numan Siddique (2):
>   RFC: Add basic xdp/eBPF support in OVN.
>   RFC: ovn-controller: Attach XDP progs to the VIFs of the logical
>     ports.
>
>  Makefile.am                 |   6 +-
>  bpf/.gitignore              |   5 +
>  bpf/automake.mk             |  23 +++
>  bpf/ovn_xdp.c               | 156 +++++++++++++++
>  configure.ac                |   2 +
>  controller/automake.mk      |   4 +-
>  controller/binding.c        |  45 +++--
>  controller/binding.h        |   7 +
>  controller/ovn-controller.c |  79 +++++++-
>  controller/xdp.c            | 389 ++++++++++++++++++++++++++++++++++++
>  controller/xdp.h            |  41 ++++
>  m4/ovn.m4                   |  20 ++
>  tests/automake.mk           |   1 +
>  13 files changed, 753 insertions(+), 25 deletions(-)
>  create mode 100644 bpf/.gitignore
>  create mode 100644 bpf/automake.mk
>  create mode 100644 bpf/ovn_xdp.c
>  create mode 100644 controller/xdp.c
>  create mode 100644 controller/xdp.h
>
> --
> 2.35.3
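To make the port-security check described in the cover letter above more concrete, here is a minimal sketch of what such an XDP program could look like. It is illustrative only and is not the bpf/ovn_xdp.c from the series: the map name, the single-MAC map layout, and the MAC-only check are assumptions, and the real program would also need the IP and ARP/ND checks listed in the TODOs.

```c
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/types.h>
#include <bpf/bpf_helpers.h>

struct mac_addr {
    __u8 addr[ETH_ALEN];
};

/* Hypothetical single-entry map holding the MAC bound to this port;
 * ovn-controller would be expected to fill it in when attaching. */
struct {
    __uint(type, BPF_MAP_TYPE_ARRAY);
    __uint(max_entries, 1);
    __type(key, __u32);
    __type(value, struct mac_addr);
} allowed_src_mac SEC(".maps");

SEC("xdp")
int ovn_port_security(struct xdp_md *ctx)
{
    void *data = (void *)(long) ctx->data;
    void *data_end = (void *)(long) ctx->data_end;
    struct ethhdr *eth = data;
    struct mac_addr *allowed;
    __u32 key = 0;
    int i;

    /* Bounds check required by the verifier before touching the header. */
    if ((void *) (eth + 1) > data_end) {
        return XDP_DROP;
    }

    allowed = bpf_map_lookup_elem(&allowed_src_mac, &key);
    if (!allowed) {
        return XDP_PASS;            /* No policy configured: let OVS decide. */
    }

    for (i = 0; i < ETH_ALEN; i++) {
        if (eth->h_source[i] != allowed->addr[i]) {
            return XDP_DROP;        /* Port security check failed. */
        }
    }

    return XDP_PASS;                /* Hand the frame to the OVS datapath. */
}

char _license[] SEC("license") = "GPL";
```

In the actual series, the per-port policy would presumably be populated by ovn-controller when it binds the logical port; the sketch leaves that part out.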
Just to give some input about eBPF/XDP support:

We used to use the OVN L2 LB to replace kube-proxy in Kubernetes, but found
that the L2 LB uses conntrack and an OVS clone, which hurts performance
badly.  The latency for a 1-byte UDP packet jumps from 18.5us to 25.7us,
and bandwidth drops from 6Mb/s to 2.8Mb/s.

Even traffic that does not target the LB VIPs sees the same performance
drop, and it also means the datapath cannot be fully offloaded to hardware.

And finally we turned to using Cilium's chaining mode to replace the OVN L2
LB for kube-proxy and resolve the above issues.  We hope to see LB
optimization by eBPF/XDP on the OVN side.

On Wed, 8 Jun 2022 at 14:43, Han Zhou <zhouhan@gmail.com> wrote:
> [...]
On Wed, Jun 8, 2022 at 6:34 AM 刘梦馨 <liumengxinfly@gmail.com> wrote:
>
> Just to give some input about eBPF/XDP support:
>
> We used to use the OVN L2 LB to replace kube-proxy in Kubernetes, but
> found that the L2 LB uses conntrack and an OVS clone, which hurts
> performance badly.  The latency for a 1-byte UDP packet jumps from 18.5us
> to 25.7us, and bandwidth drops from 6Mb/s to 2.8Mb/s.
>
> Even traffic that does not target the LB VIPs sees the same performance
> drop, and it also means the datapath cannot be fully offloaded to
> hardware.
>
> And finally we turned to using Cilium's chaining mode to replace the OVN
> L2 LB for kube-proxy and resolve the above issues.  We hope to see LB
> optimization by eBPF/XDP on the OVN side.
>

Thanks for your comments and inputs.  I think we should definitely explore
optimizing this use case and see if it's possible to leverage eBPF/XDP for
this.

> On Wed, 8 Jun 2022 at 14:43, Han Zhou <zhouhan@gmail.com> wrote:
> > [...]
> >
> > On the other hand, for the port-security feature in XDP, I keep thinking
> > about the scenarios and it is still not very clear to me.  One advantage
> > I can think of is preventing DOS attacks from a VM/Pod when invalid
> > IP/MAC addresses are used: XDP may perform better and drop packets at
> > lower CPU cost (compared with the OVS kernel datapath).  However, I am
> > also wondering why an attacker would use invalid IP/MAC for DOS attacks.
> > Do you have more thoughts about the use cases?

My idea was to demonstrate the use of eBPF/XDP, and port security checks
were easy to do before the packet hits the OVS pipeline.

If we were to move the port security check to XDP, then the only advantage
we would be getting, in my opinion, is removing the corresponding ingress
port security check OF rules from ovs-vswitchd, thereby decreasing some
lookups during flow translation.

I'm not sure why an attacker would use invalid IP/MAC for DOS attacks.
But from what I know, ovn-kubernetes does want to restrict each pod to its
assigned IP/MAC.

> > And do you have any performance results comparing with the current OVS
> > implementation?

I didn't do any scale/performance related tests.
If we were to move the port security feature to XDP in OVN, then I think we
need to:
  - Complete the TODOs, like adding IPv6 and ARP/ND related checks.
  - Do some scale testing and see whether it reduces the memory footprint
    of ovs-vswitchd and ovn-controller because of the reduction in OF
    rules.

> > Another question: would it work with smart NIC HW offload, where VF
> > representor ports are added to OVS on the smart NIC?  I guess XDP
> > doesn't support representor ports, right?

I think so.  I don't have much experience/knowledge on this.  From what I
understand, if datapath flows are offloaded, and since XDP is not
offloaded, the XDP checks will be totally missed.  So if XDP is to be used,
then offloading should be disabled.

Thanks
Numan
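The cover letter notes that libxdp [1] and libbpf are used to attach and detach the programs. As a rough illustration of that API (this is not the controller/xdp.c from the patch; the ELF section name and the native-to-SKB fallback are assumptions for this sketch), attaching the compiled object to a VIF with libxdp could look roughly like this:

```c
#include <net/if.h>
#include <xdp/libxdp.h>

/* Attach the (already installed) ovn_xdp.o object to one interface.
 * "xdp" as the ELF section name is an assumption for this sketch. */
int attach_ovn_xdp(const char *ifname)
{
    int ifindex = if_nametoindex(ifname);
    struct xdp_program *prog;
    int err;

    if (!ifindex) {
        return -1;                          /* Interface does not exist. */
    }

    prog = xdp_program__open_file("/usr/share/ovn/ovn_xdp.o", "xdp", NULL);
    if (libxdp_get_error(prog)) {
        return -1;                          /* Failed to open/load object. */
    }

    /* Prefer driver (native) mode, fall back to generic SKB mode. */
    err = xdp_program__attach(prog, ifindex, XDP_MODE_NATIVE, 0);
    if (err) {
        err = xdp_program__attach(prog, ifindex, XDP_MODE_SKB, 0);
    }
    if (err) {
        xdp_program__close(prog);
    }
    return err;
}
```

Detaching when the port is unbound would similarly use xdp_program__detach() with the same ifindex and attach mode.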
On Wed, Jun 8, 2022 at 8:08 AM Numan Siddique <numans@ovn.org> wrote:
>
> On Wed, Jun 8, 2022 at 6:34 AM 刘梦馨 <liumengxinfly@gmail.com> wrote:
> >
> > We used to use the OVN L2 LB to replace kube-proxy in Kubernetes, but
> > found that the L2 LB uses conntrack and an OVS clone, which hurts
> > performance badly.  The latency for a 1-byte UDP packet jumps from
> > 18.5us to 25.7us, and bandwidth drops from 6Mb/s to 2.8Mb/s.

Thanks for the input!
Could you tell us roughly how many packets were sent in a single test?  Was
the latency measured as an average over all the UDP packets?  I am asking
because if the packets hit megaflows in the kernel cache, it shouldn't be
slower than kube-proxy, which also uses conntrack.  If it is HW offloaded,
it should be faster.

> > Even traffic that does not target the LB VIPs sees the same performance
> > drop, and it also means the datapath cannot be fully offloaded to
> > hardware.

Was it clear why the total datapath cannot be offloaded to HW?  There might
be problems supporting HW offloading in earlier versions of OVN.  There
have been improvements to make it more HW-offload friendly.

> > And finally we turned to using Cilium's chaining mode to replace the
> > OVN L2 LB for kube-proxy and resolve the above issues.  We hope to see
> > LB optimization by eBPF/XDP on the OVN side.
>
> Thanks for your comments and inputs.  I think we should definitely
> explore optimizing this use case and see if it's possible to leverage
> eBPF/XDP for this.

I am sorry, but I am confused by OVN "L2" LB.  I think you might mean OVN
"L3/L4" LB?

Some general thoughts on this: OVN is primarily there to program OVS (or
another OpenFlow-based datapath) to implement SDN.  OVS OpenFlow is a
data-driven approach (as mentioned by Ben in several talks).  The advantage
is that it uses caches to accelerate the datapath, regardless of the number
of pipeline stages in the forwarding logic; the disadvantage is of course
that when a packet has a cache miss, it will be slow.  So I would think the
direction of using eBPF/XDP is better taken within OVS itself, instead of
adding an extra stage that cannot be cached within the OVS framework,
because even if the extra stage is very fast, it is still extra.

I would consider such an extra eBPF/XDP stage in OVN directly only for the
cases where we know it is likely to miss the OVS/HW flow caches.  One
example may be DOS attacks that always trigger CT unestablished entries,
which is not HW-offload friendly.  (But I don't have concrete use
cases/scenarios.)

In the case of OVN LB, I don't see a reason why it would miss the cache
except for the first packets.  Adding an extra eBPF/XDP stage on top of the
OVS cached pipeline doesn't seem to improve the performance.

> My idea was to demonstrate the use of eBPF/XDP, and port security checks
> were easy to do before the packet hits the OVS pipeline.

Understood.  It is indeed a great demonstration.

> If we were to move the port security check to XDP, then the only
> advantage we would be getting, in my opinion, is removing the
> corresponding ingress port security check OF rules from ovs-vswitchd,
> thereby decreasing some lookups during flow translation.

For the slow path, it might reduce the lookups in two tables, but
considering that we have tens of tables, this cost may be negligible?  For
the fast path, there is no impact on the megaflow cache.

> I'm not sure why an attacker would use invalid IP/MAC for DOS attacks.
> But from what I know, ovn-kubernetes does want to restrict each pod to
> its assigned IP/MAC.

Yes, restricting pods to their assigned IP/MAC is port security, which is
implemented by the port-security flows.  I was talking about DOS attacks
just to imagine a use case that utilizes the performance advantage of XDP.
If it is just to detect and drop a regular amount of packets that try to
use a fake IP/MAC to circumvent security policies (ACLs), it doesn't
reflect the benefit of XDP.

> If we were to move the port security feature to XDP in OVN, then I think
> we need to:
>   - Complete the TODOs, like adding IPv6 and ARP/ND related checks.
>   - Do some scale testing and see whether it reduces the memory footprint
>     of ovs-vswitchd and ovn-controller because of the reduction in OF
>     rules.

Maybe I am wrong, but I think port-security flows are only related to the
local LSPs on each node, which doesn't contribute much to the
OVS/ovn-controller memory footprint, and thanks to your patches that move
port-security flow generation from northd to ovn-controller, the central
components are already out of the picture for the port-security related
costs.  So I guess we won't see obvious differences in scale tests.

> > Another question: would it work with smart NIC HW offload, where VF
> > representor ports are added to OVS on the smart NIC?  I guess XDP
> > doesn't support representor ports, right?
>
> I think so.  I don't have much experience/knowledge on this.  From what
> I understand, if datapath flows are offloaded, and since XDP is not
> offloaded, the XDP checks will be totally missed.  So if XDP is to be
> used, then offloading should be disabled.

Agreed, although I did hope it could help in HW-offload enabled
environments to mitigate the scenarios where packets would miss the HW flow
cache.

Thanks,
Han
> Could you tell us roughly how many packets were sent in a single test?
> Was the latency measured as an average over all the UDP packets?

Let me describe my test method more clearly.  In fact, we only tested
pod-to-pod performance, *not* pod-to-service, and then profiled with a
flame graph and found that load-balancer processing took about 30% of the
CPU usage.

We run two pods on two different nodes; one runs a qperf server and the
other runs a qperf client to test UDP latency and bandwidth with the
command `qperf {another Pod IP} -ub -oo msg_size:1 -vu udp_lat udp_bw`.

In the first test, we used the kube-ovn default setup, which uses the OVN
load balancer to replace kube-proxy, and got a latency of 25.7us and a
bandwidth of 2.8Mb/s.

Then we manually deleted all OVN load-balancer rules bound to the logical
switch, and got a much better result: 18.5us and 6Mb/s.

> Was it clear why the total datapath cannot be offloaded to HW?

The issue we met with hw-offload is that Mellanox CX5/CX6 doesn't support
dp_hash and hash at the moment, and these two methods are used by the group
table to select a backend.  What makes things worse is that when any LB is
bound to a logical switch, all packets go through the LB pipeline even if
they are not destined to a service.  So the total logical-switch datapath
cannot be offloaded.

We have a customized patch to bypass the LB pipeline if traffic is not
destined to a service here:
https://github.com/kubeovn/ovn/commit/d26ae4de0ab070f6b602688ba808c8963f69d5c4.patch

> I am sorry, but I am confused by OVN "L2" LB.  I think you might mean OVN
> "L3/L4" LB?

I mean load balancers added to a logical switch with ls-lb-add; kube-ovn
uses them to replace kube-proxy.

> I am asking because if the packets hit megaflows in the kernel cache, it
> shouldn't be slower than kube-proxy, which also uses conntrack.  If it is
> HW offloaded, it should be faster.

In my previous profiling it seems unrelated to the megaflow cache.  The
flame graph shows that there is an extra ovs clone and reprocessing
compared with the flame graph without the LB.  I have presented before how
to profile and optimize kube-ovn performance, with more detail about the LB
performance issue at the beginning of this video (in Chinese):
https://www.youtube.com/watch?v=eqKHs05NUlg&t=27s.  Hope it can provide
more help.

On Wed, 8 Jun 2022 at 23:53, Han Zhou <zhouhan@gmail.com> wrote:
> [...]
On Thu, 2022-06-09 at 00:41 +0800, 刘梦馨 wrote:
> > Could you tell us roughly how many packets were sent in a single test?
> > Was the latency measured as an average over all the UDP packets?
>
> Let me describe my test method more clearly.  In fact, we only tested
> pod-to-pod performance, *not* pod-to-service, and then profiled with a
> flame graph and found that load-balancer processing took about 30% of
> the CPU usage.

pod -> pod (directly to the other pod's IP) shouldn't go through any load
balancer related flows though, right?  That seems curious to me...  It
might hit OVN's load balancer stages but (I think!) shouldn't be matching
any rules in them, because the packet's destination IP wouldn't be an LB
VIP.

Did you do an ofproto/trace to see what OVS flows the packet was hitting
and whether any were OVN LB related?

Dan

> [...]
> pod -> pod (directly to the other pod's IP) shouldn't go through any load
> balancer related flows though, right?

It didn't match the final VIP and ct_lb action.  But when the LB rule
exists, it first sends all packets to conntrack, which leads to
recirculation with an ovs clone, and that hurts the performance.  And I
find that the initial commit that sends all traffic to conntrack,
https://github.com/ovn-org/ovn/commit/64cc065e2c59c0696edeef738180989d993ceceb,
was to fix a bug.

Even if we bypass the conntrack action in the ingress pipeline with a
customized OVN, we still cannot bypass conntrack in the egress pipeline.
All egress packets still need to be sent to conntrack to test whether they
match a NAT session.

I cannot find the full performance test data at the moment.  What I can
find is that with the patch to bypass ingress conntrack, with LB rules, the
latency for the pod-to-pod qperf test dropped from 118us to 97us.  And if
no LB rules exist, the pod-to-pod latency drops to 88us.

On Thu, 9 Jun 2022 at 01:52, Dan Williams <dcbw@redhat.com> wrote:
> [...]
> > > > In my previous profile it seems unrelated to mega flow cache. The > > flame > > graph shows that there is extra ovs clone and reprocess compared to > > the > > flame graph without lb. I have introduced how to profile and optimize > > kube-ovn performance before and give more detail about the lb > > performance > > issue at the beginning of the video in Chinese > > https://www.youtube.com/watch?v=eqKHs05NUlg&t=27s hope it can provide > > more > > help > > > > On Wed, 8 Jun 2022 at 23:53, Han Zhou <zhouhan@gmail.com> wrote: > > > > > > > > > > > On Wed, Jun 8, 2022 at 8:08 AM Numan Siddique <numans@ovn.org> > > > wrote: > > > > > > > > On Wed, Jun 8, 2022 at 6:34 AM 刘梦馨 <liumengxinfly@gmail.com> > > > > wrote: > > > > > > > > > > Just give some input about eBPF/XDP support. > > > > > > > > > > We used to use OVN L2 LB to replace kube-proxy in Kubernetes, > > > > > but found > > > > > that > > > > > the L2 LB will use conntrack and ovs clone which hurts > > > > > performance > > > badly. > > > > > The latency > > > > > for 1byte udp packet jumps from 18.5us to 25.7us and bandwidth > > > > > drop > > > from > > > > > 6Mb/s to 2.8Mb/s. > > > > > > > > Thanks for the input! > > > Could you tell roughly how many packets were sent in a single test? > > > Was > > > the latency measured for all the UDP packets in average? I am > > > asking > > > because if the packets hit mega flows in the kernel cache, it > > > shouldn't be > > > slower than kube-proxy which also uses conntrack. If it is HW > > > offloaded it > > > should be faster. > > > > > > > > Even if the traffic does not target to LB VIPs has the same > > > > > performance > > > > > drop and it also leads to the > > > > > total datapath cannot be offloaded to hardware. > > > > > > > > > > > Was it clear why the total datapath cannot be offloaded to HW? > > > There might > > > be problems of supporting HW offloading in earlier version of OVN. > > > There > > > have been improvements to make it more HW offload friendly. > > > > > > > > And finally we turn to using Cilium's chaining mode to replace > > > > > the OVN > > > L2 > > > > > LB to implement kube-proxy to > > > > > resolve the above issues. We hope to see the lb optimization by > > > eBPF/XDP on > > > > > the OVN side. > > > > > > > > > > > > > Thanks for your comments and inputs. I think we should > > > > definitely > > > > explore optimizing this use case > > > > and see if its possible to leverage eBPF/XDP for this. > > > > > > > > > > I am sorry that I am confused by OVN "L2" LB. I think you might > > > mean OVN > > > "L3/L4" LB? > > > > > > Some general thoughts on this is, OVN is primarily to program OVS > > > (or > > > other OpenFlow based datapath) to implement SDN. OVS OpenFlow is a > > > data-driven approach (as mentioned by Ben in several talks). The > > > advantage > > > is that it uses caches to accelerate datapath, regardless of the > > > number of > > > pipeline stages in the forwarding logic; and the disadvantage is of > > > course > > > when a packet has a cache miss, it will be slow. So I would think > > > the > > > direction of using eBPF/XDP is better to be within OVS itself, > > > instead of > > > adding an extra stage that cannot be cached within the OVS > > > framework, > > > because even if the extra stage is very fast, it is still extra. > > > > > > I would consider such an extra eBPF/XDP stage in OVN directly only > > > for the > > > cases that we know it is likely to miss the OVS/HW flow caches. 
One > > > example > > > may be DOS attacks that always trigger CT unestablished entries, > > > which is > > > not HW offload friendly. (But I don't have concrete use > > > cases/scenarios) > > > > > > In the case of OVN LB, I don't see a reason why it would miss the > > > cache > > > except for the first packets. Adding an extra eBPF/XDP stage on top > > > of the > > > OVS cached pipeline doesn't seem to improve the performance. > > > > > > > > On Wed, 8 Jun 2022 at 14:43, Han Zhou <zhouhan@gmail.com> > > > > > wrote: > > > > > > > > > > > On Mon, May 30, 2022 at 5:46 PM <numans@ovn.org> wrote: > > > > > > > > > > > > > > From: Numan Siddique <numans@ovn.org> > > > > > > > > > > > > > > XDP program - ovn_xdp.c added in this RFC patch series > > > > > > > implements > > > basic > > > > > > port > > > > > > > security and drops any packet if the port security check > > > > > > > fails. > > > > > > > There are still few TODOs in the port security checks. Like > > > > > > > - Make ovn xdp configurable. > > > > > > > - Removing the ingress Openflow rules from table 73 > > > > > > > and 74 > > > if ovn > > > > > > xdp > > > > > > > is enabled. > > > > > > > - Add IPv6 support. > > > > > > > - Enhance the port security xdp program for ARP/IPv6 > > > > > > > ND > > > checks. > > > > > > > > > > > > > > This patch adds a basic XDP support in OVN and in future we > > > > > > > can > > > > > > > leverage eBPF/XDP features. > > > > > > > > > > > > > > I'm not sure how much value this RFC patch adds to make use > > > > > > > of > > > eBPF/XDP > > > > > > > just for port security. Submitting as RFC to get some > > > > > > > feedback and > > > > > > > start some conversation on eBPF/XDP in OVN. > > > > > > > > > > > > > Hi Numan, > > > > > > > > > > > > This is really cool. It demonstrates how OVN could leverage > > > > > > eBPF/XDP. > > > > > > > > > > > > On the other hand, for the port-security feature in XDP, I > > > > > > keep > > > thinking > > > > > > about the scenarios and it is still not very clear to me. One > > > advantage I > > > > > > can think of is to prevent DOS attacks from VM/Pod when > > > > > > invalid > > > IP/MAC are > > > > > > used, XDP may perform better and drop packets with lower CPU > > > > > > cost > > > > > > (comparing with OVS kernel datapath). However, I am also > > > > > > wondering > > > why > > > > > > would a attacker use invalid IP/MAC for DOS attacks? Do you > > > > > > have > > > some more > > > > > > thoughts about the use cases? > > > > > > > > My idea was to demonstrate the use of eBPF/XDP and port security > > > > checks were easy to do > > > > before the packet hits the OVS pipeline. > > > > > > > Understand. It is indeed a great demonstration. > > > > > > > If we were to move the port security check to XDP, then the only > > > > advantage we would be getting > > > > in my opinion is to remove the corresponding ingress port > > > > security > > > > check related OF rules from ovs-vswitchd, thereby decreasing some > > > > looks up during > > > > flow translation. > > > > > > > For slow path, it might reduce the lookups in two tables, but > > > considering > > > that we have tens of tables, this cost may be negligible? > > > For fast path, there is no impact on the megaflow cache. > > > > > > > I'm not sure why an attacker would use invalid IP/MAC for DOS > > > > attacks. > > > > But from what I know, ovn-kubernetes do want to restrict each POD > > > > to > > > > its assigned IP/MAC. 
> > > > > > > Yes, restricting pods to use assigned IP/MAC is for port security, > > > which > > > is implemented by the port-security flows. I was talking about DOS > > > attacks > > > just to imagine a use case that utilizes the performance advantage > > > of XDP. > > > If it is just to detect and drop a regular amount of packets that > > > try to > > > use fake IP/MAC to circumvent security policies (ACLs), it doesn't > > > reflect > > > the benefit of XDP. > > > > > > > And do you have any performance results > > > > > > comparing with the current OVS implementation? > > > > > > > > I didn't do any scale/performance related tests. > > > > > > > > If we were to move port security feature to XDP in OVN, then I > > > > think we > > > need to > > > > - Complete the TODO's like adding IPv6 and ARP/ND related > > > > checks > > > > - Do some scale testing and see whether its reducing memory > > > > footprint of ovs-vswitchd and ovn-controller because of the > > > > reduction > > > > in OF rules > > > > > > > > > > Maybe I am wrong, but I think port-security flows are only related > > > to > > > local LSPs on each node, which doesn't contribute much to the > > > OVS/ovn-controller memory footprint, and thanks to your patches > > > that moves > > > port-security flow generation from northd to ovn-controller, the > > > central > > > components are already out of the picture of the port-security > > > related > > > costs. So I guess we won't see obvious differences in scale tests. > > > > > > > > > > > > > > > Another question is, would it work with smart NIC HW-offload, > > > > > > where > > > VF > > > > > > representer ports are added to OVS on the smart NIC? I guess > > > > > > XDP > > > doesn't > > > > > > support representer port, right? > > > > > > > > I think so. I don't have much experience/knowledge on this. From > > > > what > > > > I understand, if datapath flows are offloaded and since XDP is > > > > not > > > > offloaded, the xdo checks will be totally missed. > > > > So if XDP is to be used, then offloading should be disabled. > > > > > > > > > > Agree, although I did hope it could help for HW offload enabled > > > environments to mitigate the scenarios when packets would miss the > > > HW flow > > > cache. > > > > > > Thanks, > > > Han > > > > > > > Thanks > > > > Numan > > > > > > > > > > > > > > > > Thanks, > > > > > > Han > > > > > > > > > > > > > In order to attach and detach xdp programs, libxdp [1] and > > > > > > > libbpf > > > is > > > > > > used. > > > > > > > > > > > > > > To test it out locally, please install libxdp-devel and > > > libbpf-devel > > > > > > > and the compile OVN first and then compile ovn_xdp by > > > > > > > running "make > > > > > > > bpf". Copy ovn_xdp.o to either /usr/share/ovn/ or > > > /usr/local/share/ovn/ > > > > > > > > > > > > > > > > > > > > > Numan Siddique (2): > > > > > > > RFC: Add basic xdp/eBPF support in OVN. > > > > > > > RFC: ovn-controller: Attach XDP progs to the VIFs of the > > > > > > > logical > > > > > > > ports. 
> > > > > > > > > > > > > > Makefile.am | 6 +- > > > > > > > bpf/.gitignore | 5 + > > > > > > > bpf/automake.mk | 23 +++ > > > > > > > bpf/ovn_xdp.c | 156 +++++++++++++++ > > > > > > > configure.ac | 2 + > > > > > > > controller/automake.mk | 4 +- > > > > > > > controller/binding.c | 45 +++-- > > > > > > > controller/binding.h | 7 + > > > > > > > controller/ovn-controller.c | 79 +++++++- > > > > > > > controller/xdp.c | 389 > > > ++++++++++++++++++++++++++++++++++++ > > > > > > > controller/xdp.h | 41 ++++ > > > > > > > m4/ovn.m4 | 20 ++ > > > > > > > tests/automake.mk | 1 + > > > > > > > 13 files changed, 753 insertions(+), 25 deletions(-) > > > > > > > create mode 100644 bpf/.gitignore > > > > > > > create mode 100644 bpf/automake.mk > > > > > > > create mode 100644 bpf/ovn_xdp.c > > > > > > > create mode 100644 controller/xdp.c > > > > > > > create mode 100644 controller/xdp.h > > > > > > > > > > > > > > -- > > > > > > > 2.35.3 > > > > > > > > > > > > > > _______________________________________________ > > > > > > > dev mailing list > > > > > > > dev@openvswitch.org > > > > > > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > > > > > > _______________________________________________ > > > > > > dev mailing list > > > > > > dev@openvswitch.org > > > > > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > > > > > > > > > > > > > > > > > > > > > -- > > > > > 刘梦馨 > > > > > Blog: http://oilbeater.com > > > > > Weibo: @oilbeater <http://weibo.com/oilbeater> > > > > > _______________________________________________ > > > > > dev mailing list > > > > > dev@openvswitch.org > > > > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > > > > > > > > >
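[Editor's note] The port-security check discussed above is conceptually simple at the XDP layer: inspect the headers of each incoming frame and drop it before it ever reaches the OVS datapath if the source addresses are not the ones assigned to the logical port. The sketch below is illustrative only and is not the ovn_xdp.c from the series; the allowed MAC is hard-coded for the example, whereas the real program would look up the per-port allowed addresses in a BPF map populated by ovn-controller, and would also cover IP/ARP/ND once the listed TODOs are done.

    /* Minimal XDP port-security sketch (not the patch's ovn_xdp.c).
     * Assumption: the allowed source MAC is known at compile time; the
     * real program reads it from a BPF map filled in by ovn-controller. */
    #include <linux/bpf.h>
    #include <linux/if_ether.h>
    #include <bpf/bpf_helpers.h>

    SEC("xdp")
    int port_security(struct xdp_md *ctx)
    {
        void *data = (void *)(long)ctx->data;
        void *data_end = (void *)(long)ctx->data_end;
        struct ethhdr *eth = data;
        const unsigned char allowed_mac[ETH_ALEN] =
            { 0x0a, 0x00, 0x00, 0x00, 0x00, 0x01 };   /* hypothetical VIF MAC */

        if ((void *)(eth + 1) > data_end)
            return XDP_DROP;                  /* frame too short for an Ethernet header */

        for (int i = 0; i < ETH_ALEN; i++) {
            if (eth->h_source[i] != allowed_mac[i])
                return XDP_DROP;              /* source MAC fails the port-security check */
        }

        return XDP_PASS;                      /* hand the frame to the OVS datapath */
    }

    char _license[] SEC("license") = "GPL";

Because the verdict is taken before an skb is allocated (in native mode), a drop here is cheaper than a drop in the OVS kernel datapath, which is the lower-CPU-cost argument made earlier in the thread.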
In our profile, the conntrack is the main reason for performance drop. And when need related functions like lb and acl, we have to carefully check if other unrelated flows are affected like this patch for multicast traffic http://patchwork.ozlabs.org/project/ovn/patch/20211217141645.9931-1-dceara@redhat.com For some performance test poc scenarios we also have to disable functions related to conntrack for a better result. If XDP/eBPF can help with the conntrack performance issues, I think it will be a big boost and we don't need lots of customization or turn to Cilium to replace some functions but bring in lots of complexity. On Thu, 9 Jun 2022 at 11:19, 刘梦馨 <liumengxinfly@gmail.com> wrote: > > pod -> pod (directly to the other Pod IP) shouldn't go through any load > balancer related flows though, right? > > It didn't match the final vip and ct_lb action. But when the lb rule > exists, it will first send all packets to conntrack and lead recirculation > with ovs clone and it hurts the performance. > > And I find the initial commit that send all traffic to conntrack here > https://github.com/ovn-org/ovn/commit/64cc065e2c59c0696edeef738180989d993ceceb > is to fix a bug. > > Even if we bypass the conntrack action in ingress pipeline by a customized > ovn, we still cannot bypass the conntrack in the egress pipeline. All > egress packets still need to be sent to conntrack to test if they match a > nat session. > > I cannot find the full performance test data at the moment. What I find is > that with the patch to bypass ingress conntrack, with lb rules, the latency > for pod-to-pod qperf test dropped from 118us to 97us. And if no lb rules > exist, the pod-to-pod latency drops to 88us. > > On Thu, 9 Jun 2022 at 01:52, Dan Williams <dcbw@redhat.com> wrote: > >> On Thu, 2022-06-09 at 00:41 +0800, 刘梦馨 wrote: >> > > Could you tell roughly how many packets were sent in a single test? >> > > Was >> > the latency measured for all the UDP packets in average? >> > >> > Let me describe my test method more clearly. In fact, we only tested >> > pod-to-pod performance *not* pod-to-service and then do profile with >> > flamegraph and find the loadbalancer process took about 30% CPU >> > usage. >> >> pod -> pod (directly to the other Pod IP) shouldn't go through any load >> balancer related flows though, right? That seems curious to me... It >> might hit OVN's load balancer stages but (I think!) shouldn't be >> matching any rules in them, because the packet's destination IP >> wouldn't be a LB VIP. >> >> Did you do an ofproto/trace to see what OVS flows the packet was >> hitting and if any were OVN LB related? >> >> Dan >> >> > >> > Run two Pods in two different node, and one run qperf server the >> > other run >> > qperf client to test udp latency and bandwidth performance with >> > command >> > `qperf {another Pod IP} -ub -oo msg_size:1 -vu udp_lat udp_bw`. >> > >> > In the first test, we use kube-ovn default setup which use ovn >> > loadbalancer >> > to replace kube-proxy and got the result latency 25.7us and >> > bandwidth >> > 2.8Mb/s >> > >> > Then we manually delete all ovn loadbalancer rules bind to the >> > logical >> > switch, and got a much better result 18.5us and 6Mb/s >> > >> > > Was it clear why the total datapath cannot be offloaded to HW? >> > The issue we meet with hw-offload is that mellanox cx5/cx6 didn't >> > support >> > dp_hash and hash at the moment and these two method are used by >> > group table to select a backend. 
>> > What makes things worse is that when any lb bind to a ls all packet >> > will go >> > through the lb pipeline even if it not designate to service. So the >> > total >> > ls datapath cannot be offloaded. >> > >> > We have a customized path to bypaas the lb pipeline if traffic not >> > designate to service here >> > >> https://github.com/kubeovn/ovn/commit/d26ae4de0ab070f6b602688ba808c8963f69d5c4.patch >> > >> > > I am sorry that I am confused by OVN "L2" LB. I think you might >> > > mean OVN >> > "L3/L4" LB? >> > I mean loadbalancers add to ls by ls-lb-add, kube-ovn uses it to >> > replace >> > kube-proxy >> > >> > > I am asking because if the packets hit mega flows in the kernel >> > > cache, >> > it shouldn't be slower than kube-proxy which also uses conntrack. If >> > it is >> > HW offloaded it should be faster. >> > >> > In my previous profile it seems unrelated to mega flow cache. The >> > flame >> > graph shows that there is extra ovs clone and reprocess compared to >> > the >> > flame graph without lb. I have introduced how to profile and optimize >> > kube-ovn performance before and give more detail about the lb >> > performance >> > issue at the beginning of the video in Chinese >> > https://www.youtube.com/watch?v=eqKHs05NUlg&t=27s hope it can provide >> > more >> > help >> > >> > On Wed, 8 Jun 2022 at 23:53, Han Zhou <zhouhan@gmail.com> wrote: >> > >> > > >> > > >> > > On Wed, Jun 8, 2022 at 8:08 AM Numan Siddique <numans@ovn.org> >> > > wrote: >> > > > >> > > > On Wed, Jun 8, 2022 at 6:34 AM 刘梦馨 <liumengxinfly@gmail.com> >> > > > wrote: >> > > > > >> > > > > Just give some input about eBPF/XDP support. >> > > > > >> > > > > We used to use OVN L2 LB to replace kube-proxy in Kubernetes, >> > > > > but found >> > > > > that >> > > > > the L2 LB will use conntrack and ovs clone which hurts >> > > > > performance >> > > badly. >> > > > > The latency >> > > > > for 1byte udp packet jumps from 18.5us to 25.7us and bandwidth >> > > > > drop >> > > from >> > > > > 6Mb/s to 2.8Mb/s. >> > > > > >> > > Thanks for the input! >> > > Could you tell roughly how many packets were sent in a single test? >> > > Was >> > > the latency measured for all the UDP packets in average? I am >> > > asking >> > > because if the packets hit mega flows in the kernel cache, it >> > > shouldn't be >> > > slower than kube-proxy which also uses conntrack. If it is HW >> > > offloaded it >> > > should be faster. >> > > >> > > > > Even if the traffic does not target to LB VIPs has the same >> > > > > performance >> > > > > drop and it also leads to the >> > > > > total datapath cannot be offloaded to hardware. >> > > > > >> > > >> > > Was it clear why the total datapath cannot be offloaded to HW? >> > > There might >> > > be problems of supporting HW offloading in earlier version of OVN. >> > > There >> > > have been improvements to make it more HW offload friendly. >> > > >> > > > > And finally we turn to using Cilium's chaining mode to replace >> > > > > the OVN >> > > L2 >> > > > > LB to implement kube-proxy to >> > > > > resolve the above issues. We hope to see the lb optimization by >> > > eBPF/XDP on >> > > > > the OVN side. >> > > > > >> > > > >> > > > Thanks for your comments and inputs. I think we should >> > > > definitely >> > > > explore optimizing this use case >> > > > and see if its possible to leverage eBPF/XDP for this. >> > > > >> > > >> > > I am sorry that I am confused by OVN "L2" LB. I think you might >> > > mean OVN >> > > "L3/L4" LB? 
>> > > >> > > Some general thoughts on this is, OVN is primarily to program OVS >> > > (or >> > > other OpenFlow based datapath) to implement SDN. OVS OpenFlow is a >> > > data-driven approach (as mentioned by Ben in several talks). The >> > > advantage >> > > is that it uses caches to accelerate datapath, regardless of the >> > > number of >> > > pipeline stages in the forwarding logic; and the disadvantage is of >> > > course >> > > when a packet has a cache miss, it will be slow. So I would think >> > > the >> > > direction of using eBPF/XDP is better to be within OVS itself, >> > > instead of >> > > adding an extra stage that cannot be cached within the OVS >> > > framework, >> > > because even if the extra stage is very fast, it is still extra. >> > > >> > > I would consider such an extra eBPF/XDP stage in OVN directly only >> > > for the >> > > cases that we know it is likely to miss the OVS/HW flow caches. One >> > > example >> > > may be DOS attacks that always trigger CT unestablished entries, >> > > which is >> > > not HW offload friendly. (But I don't have concrete use >> > > cases/scenarios) >> > > >> > > In the case of OVN LB, I don't see a reason why it would miss the >> > > cache >> > > except for the first packets. Adding an extra eBPF/XDP stage on top >> > > of the >> > > OVS cached pipeline doesn't seem to improve the performance. >> > > >> > > > > On Wed, 8 Jun 2022 at 14:43, Han Zhou <zhouhan@gmail.com> >> > > > > wrote: >> > > > > >> > > > > > On Mon, May 30, 2022 at 5:46 PM <numans@ovn.org> wrote: >> > > > > > > >> > > > > > > From: Numan Siddique <numans@ovn.org> >> > > > > > > >> > > > > > > XDP program - ovn_xdp.c added in this RFC patch series >> > > > > > > implements >> > > basic >> > > > > > port >> > > > > > > security and drops any packet if the port security check >> > > > > > > fails. >> > > > > > > There are still few TODOs in the port security checks. Like >> > > > > > > - Make ovn xdp configurable. >> > > > > > > - Removing the ingress Openflow rules from table 73 >> > > > > > > and 74 >> > > if ovn >> > > > > > xdp >> > > > > > > is enabled. >> > > > > > > - Add IPv6 support. >> > > > > > > - Enhance the port security xdp program for ARP/IPv6 >> > > > > > > ND >> > > checks. >> > > > > > > >> > > > > > > This patch adds a basic XDP support in OVN and in future we >> > > > > > > can >> > > > > > > leverage eBPF/XDP features. >> > > > > > > >> > > > > > > I'm not sure how much value this RFC patch adds to make use >> > > > > > > of >> > > eBPF/XDP >> > > > > > > just for port security. Submitting as RFC to get some >> > > > > > > feedback and >> > > > > > > start some conversation on eBPF/XDP in OVN. >> > > > > > > >> > > > > > Hi Numan, >> > > > > > >> > > > > > This is really cool. It demonstrates how OVN could leverage >> > > > > > eBPF/XDP. >> > > > > > >> > > > > > On the other hand, for the port-security feature in XDP, I >> > > > > > keep >> > > thinking >> > > > > > about the scenarios and it is still not very clear to me. One >> > > advantage I >> > > > > > can think of is to prevent DOS attacks from VM/Pod when >> > > > > > invalid >> > > IP/MAC are >> > > > > > used, XDP may perform better and drop packets with lower CPU >> > > > > > cost >> > > > > > (comparing with OVS kernel datapath). However, I am also >> > > > > > wondering >> > > why >> > > > > > would a attacker use invalid IP/MAC for DOS attacks? Do you >> > > > > > have >> > > some more >> > > > > > thoughts about the use cases? 
>> > > > >> > > > My idea was to demonstrate the use of eBPF/XDP and port security >> > > > checks were easy to do >> > > > before the packet hits the OVS pipeline. >> > > > >> > > Understand. It is indeed a great demonstration. >> > > >> > > > If we were to move the port security check to XDP, then the only >> > > > advantage we would be getting >> > > > in my opinion is to remove the corresponding ingress port >> > > > security >> > > > check related OF rules from ovs-vswitchd, thereby decreasing some >> > > > looks up during >> > > > flow translation. >> > > > >> > > For slow path, it might reduce the lookups in two tables, but >> > > considering >> > > that we have tens of tables, this cost may be negligible? >> > > For fast path, there is no impact on the megaflow cache. >> > > >> > > > I'm not sure why an attacker would use invalid IP/MAC for DOS >> > > > attacks. >> > > > But from what I know, ovn-kubernetes do want to restrict each POD >> > > > to >> > > > its assigned IP/MAC. >> > > > >> > > Yes, restricting pods to use assigned IP/MAC is for port security, >> > > which >> > > is implemented by the port-security flows. I was talking about DOS >> > > attacks >> > > just to imagine a use case that utilizes the performance advantage >> > > of XDP. >> > > If it is just to detect and drop a regular amount of packets that >> > > try to >> > > use fake IP/MAC to circumvent security policies (ACLs), it doesn't >> > > reflect >> > > the benefit of XDP. >> > > >> > > > And do you have any performance results >> > > > > > comparing with the current OVS implementation? >> > > > >> > > > I didn't do any scale/performance related tests. >> > > > >> > > > If we were to move port security feature to XDP in OVN, then I >> > > > think we >> > > need to >> > > > - Complete the TODO's like adding IPv6 and ARP/ND related >> > > > checks >> > > > - Do some scale testing and see whether its reducing memory >> > > > footprint of ovs-vswitchd and ovn-controller because of the >> > > > reduction >> > > > in OF rules >> > > > >> > > >> > > Maybe I am wrong, but I think port-security flows are only related >> > > to >> > > local LSPs on each node, which doesn't contribute much to the >> > > OVS/ovn-controller memory footprint, and thanks to your patches >> > > that moves >> > > port-security flow generation from northd to ovn-controller, the >> > > central >> > > components are already out of the picture of the port-security >> > > related >> > > costs. So I guess we won't see obvious differences in scale tests. >> > > >> > > > > > >> > > > > > Another question is, would it work with smart NIC HW-offload, >> > > > > > where >> > > VF >> > > > > > representer ports are added to OVS on the smart NIC? I guess >> > > > > > XDP >> > > doesn't >> > > > > > support representer port, right? >> > > > >> > > > I think so. I don't have much experience/knowledge on this. From >> > > > what >> > > > I understand, if datapath flows are offloaded and since XDP is >> > > > not >> > > > offloaded, the xdo checks will be totally missed. >> > > > So if XDP is to be used, then offloading should be disabled. >> > > > >> > > >> > > Agree, although I did hope it could help for HW offload enabled >> > > environments to mitigate the scenarios when packets would miss the >> > > HW flow >> > > cache. 
>> > > >> > > Thanks, >> > > Han >> > > >> > > > Thanks >> > > > Numan >> > > > >> > > > > > >> > > > > > Thanks, >> > > > > > Han >> > > > > > >> > > > > > > In order to attach and detach xdp programs, libxdp [1] and >> > > > > > > libbpf >> > > is >> > > > > > used. >> > > > > > > >> > > > > > > To test it out locally, please install libxdp-devel and >> > > libbpf-devel >> > > > > > > and the compile OVN first and then compile ovn_xdp by >> > > > > > > running "make >> > > > > > > bpf". Copy ovn_xdp.o to either /usr/share/ovn/ or >> > > /usr/local/share/ovn/ >> > > > > > > >> > > > > > > >> > > > > > > Numan Siddique (2): >> > > > > > > RFC: Add basic xdp/eBPF support in OVN. >> > > > > > > RFC: ovn-controller: Attach XDP progs to the VIFs of the >> > > > > > > logical >> > > > > > > ports. >> > > > > > > >> > > > > > > Makefile.am | 6 +- >> > > > > > > bpf/.gitignore | 5 + >> > > > > > > bpf/automake.mk | 23 +++ >> > > > > > > bpf/ovn_xdp.c | 156 +++++++++++++++ >> > > > > > > configure.ac | 2 + >> > > > > > > controller/automake.mk | 4 +- >> > > > > > > controller/binding.c | 45 +++-- >> > > > > > > controller/binding.h | 7 + >> > > > > > > controller/ovn-controller.c | 79 +++++++- >> > > > > > > controller/xdp.c | 389 >> > > ++++++++++++++++++++++++++++++++++++ >> > > > > > > controller/xdp.h | 41 ++++ >> > > > > > > m4/ovn.m4 | 20 ++ >> > > > > > > tests/automake.mk | 1 + >> > > > > > > 13 files changed, 753 insertions(+), 25 deletions(-) >> > > > > > > create mode 100644 bpf/.gitignore >> > > > > > > create mode 100644 bpf/automake.mk >> > > > > > > create mode 100644 bpf/ovn_xdp.c >> > > > > > > create mode 100644 controller/xdp.c >> > > > > > > create mode 100644 controller/xdp.h >> > > > > > > >> > > > > > > -- >> > > > > > > 2.35.3 >> > > > > > > >> > > > > > > _______________________________________________ >> > > > > > > dev mailing list >> > > > > > > dev@openvswitch.org >> > > > > > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev >> > > > > > _______________________________________________ >> > > > > > dev mailing list >> > > > > > dev@openvswitch.org >> > > > > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev >> > > > > > >> > > > > >> > > > > >> > > > > -- >> > > > > 刘梦馨 >> > > > > Blog: http://oilbeater.com >> > > > > Weibo: @oilbeater <http://weibo.com/oilbeater> >> > > > > _______________________________________________ >> > > > > dev mailing list >> > > > > dev@openvswitch.org >> > > > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev >> > > >> > >> > >> >> > > -- > 刘梦馨 > Blog: http://oilbeater.com > Weibo: @oilbeater <http://weibo.com/oilbeater> >
On Wed, Jun 8, 2022 at 8:50 PM 刘梦馨 <liumengxinfly@gmail.com> wrote: > In our profile, the conntrack is the main reason for performance drop. > > And when need related functions like lb and acl, we have to carefully > check if other unrelated flows are affected like this patch for multicast > traffic > > http://patchwork.ozlabs.org/project/ovn/patch/20211217141645.9931-1-dceara@redhat.com > > For some performance test poc scenarios we also have to disable functions > related to conntrack for a better result. If XDP/eBPF can help with the > conntrack performance issues, I think it will be a big boost and we don't > need lots of customization or turn to Cilium to replace some functions but > bring in lots of complexity. > > On Thu, 9 Jun 2022 at 11:19, 刘梦馨 <liumengxinfly@gmail.com> wrote: > >> > pod -> pod (directly to the other Pod IP) shouldn't go through any load >> balancer related flows though, right? >> >> It didn't match the final vip and ct_lb action. But when the lb rule >> exists, it will first send all packets to conntrack and lead recirculation >> with ovs clone and it hurts the performance. >> >> And I find the initial commit that send all traffic to conntrack here >> https://github.com/ovn-org/ovn/commit/64cc065e2c59c0696edeef738180989d993ceceb >> is to fix a bug. >> >> Even if we bypass the conntrack action in ingress pipeline by a >> customized ovn, we still cannot bypass the conntrack in the egress >> pipeline. All egress packets still need to be sent to conntrack to test if >> they match a nat session. >> >> I cannot find the full performance test data at the moment. What I find >> is that with the patch to bypass ingress conntrack, with lb rules, the >> latency for pod-to-pod qperf test dropped from 118us to 97us. And if no lb >> rules exist, the pod-to-pod latency drops to 88us. >> > To revisit this discussion, now with the below patch, the stateless ACLs can be used to bypass conntrack even when there are LBs. It's in the main branch and will be in 23.03 release: https://github.com/ovn-org/ovn/commit/a0f82efdd9dfd3ef2d9606c1890e353df1097a51 Hope this helps. Regards, Han > >> On Thu, 9 Jun 2022 at 01:52, Dan Williams <dcbw@redhat.com> wrote: >> >>> On Thu, 2022-06-09 at 00:41 +0800, 刘梦馨 wrote: >>> > > Could you tell roughly how many packets were sent in a single test? >>> > > Was >>> > the latency measured for all the UDP packets in average? >>> > >>> > Let me describe my test method more clearly. In fact, we only tested >>> > pod-to-pod performance *not* pod-to-service and then do profile with >>> > flamegraph and find the loadbalancer process took about 30% CPU >>> > usage. >>> >>> pod -> pod (directly to the other Pod IP) shouldn't go through any load >>> balancer related flows though, right? That seems curious to me... It >>> might hit OVN's load balancer stages but (I think!) shouldn't be >>> matching any rules in them, because the packet's destination IP >>> wouldn't be a LB VIP. >>> >>> Did you do an ofproto/trace to see what OVS flows the packet was >>> hitting and if any were OVN LB related? >>> >>> Dan >>> >>> > >>> > Run two Pods in two different node, and one run qperf server the >>> > other run >>> > qperf client to test udp latency and bandwidth performance with >>> > command >>> > `qperf {another Pod IP} -ub -oo msg_size:1 -vu udp_lat udp_bw`. 
>>> > >>> > In the first test, we use kube-ovn default setup which use ovn >>> > loadbalancer >>> > to replace kube-proxy and got the result latency 25.7us and >>> > bandwidth >>> > 2.8Mb/s >>> > >>> > Then we manually delete all ovn loadbalancer rules bind to the >>> > logical >>> > switch, and got a much better result 18.5us and 6Mb/s >>> > >>> > > Was it clear why the total datapath cannot be offloaded to HW? >>> > The issue we meet with hw-offload is that mellanox cx5/cx6 didn't >>> > support >>> > dp_hash and hash at the moment and these two method are used by >>> > group table to select a backend. >>> > What makes things worse is that when any lb bind to a ls all packet >>> > will go >>> > through the lb pipeline even if it not designate to service. So the >>> > total >>> > ls datapath cannot be offloaded. >>> > >>> > We have a customized path to bypaas the lb pipeline if traffic not >>> > designate to service here >>> > >>> https://github.com/kubeovn/ovn/commit/d26ae4de0ab070f6b602688ba808c8963f69d5c4.patch >>> > >>> > > I am sorry that I am confused by OVN "L2" LB. I think you might >>> > > mean OVN >>> > "L3/L4" LB? >>> > I mean loadbalancers add to ls by ls-lb-add, kube-ovn uses it to >>> > replace >>> > kube-proxy >>> > >>> > > I am asking because if the packets hit mega flows in the kernel >>> > > cache, >>> > it shouldn't be slower than kube-proxy which also uses conntrack. If >>> > it is >>> > HW offloaded it should be faster. >>> > >>> > In my previous profile it seems unrelated to mega flow cache. The >>> > flame >>> > graph shows that there is extra ovs clone and reprocess compared to >>> > the >>> > flame graph without lb. I have introduced how to profile and optimize >>> > kube-ovn performance before and give more detail about the lb >>> > performance >>> > issue at the beginning of the video in Chinese >>> > https://www.youtube.com/watch?v=eqKHs05NUlg&t=27s hope it can provide >>> > more >>> > help >>> > >>> > On Wed, 8 Jun 2022 at 23:53, Han Zhou <zhouhan@gmail.com> wrote: >>> > >>> > > >>> > > >>> > > On Wed, Jun 8, 2022 at 8:08 AM Numan Siddique <numans@ovn.org> >>> > > wrote: >>> > > > >>> > > > On Wed, Jun 8, 2022 at 6:34 AM 刘梦馨 <liumengxinfly@gmail.com> >>> > > > wrote: >>> > > > > >>> > > > > Just give some input about eBPF/XDP support. >>> > > > > >>> > > > > We used to use OVN L2 LB to replace kube-proxy in Kubernetes, >>> > > > > but found >>> > > > > that >>> > > > > the L2 LB will use conntrack and ovs clone which hurts >>> > > > > performance >>> > > badly. >>> > > > > The latency >>> > > > > for 1byte udp packet jumps from 18.5us to 25.7us and bandwidth >>> > > > > drop >>> > > from >>> > > > > 6Mb/s to 2.8Mb/s. >>> > > > > >>> > > Thanks for the input! >>> > > Could you tell roughly how many packets were sent in a single test? >>> > > Was >>> > > the latency measured for all the UDP packets in average? I am >>> > > asking >>> > > because if the packets hit mega flows in the kernel cache, it >>> > > shouldn't be >>> > > slower than kube-proxy which also uses conntrack. If it is HW >>> > > offloaded it >>> > > should be faster. >>> > > >>> > > > > Even if the traffic does not target to LB VIPs has the same >>> > > > > performance >>> > > > > drop and it also leads to the >>> > > > > total datapath cannot be offloaded to hardware. >>> > > > > >>> > > >>> > > Was it clear why the total datapath cannot be offloaded to HW? >>> > > There might >>> > > be problems of supporting HW offloading in earlier version of OVN. 
>>> > > There >>> > > have been improvements to make it more HW offload friendly. >>> > > >>> > > > > And finally we turn to using Cilium's chaining mode to replace >>> > > > > the OVN >>> > > L2 >>> > > > > LB to implement kube-proxy to >>> > > > > resolve the above issues. We hope to see the lb optimization by >>> > > eBPF/XDP on >>> > > > > the OVN side. >>> > > > > >>> > > > >>> > > > Thanks for your comments and inputs. I think we should >>> > > > definitely >>> > > > explore optimizing this use case >>> > > > and see if its possible to leverage eBPF/XDP for this. >>> > > > >>> > > >>> > > I am sorry that I am confused by OVN "L2" LB. I think you might >>> > > mean OVN >>> > > "L3/L4" LB? >>> > > >>> > > Some general thoughts on this is, OVN is primarily to program OVS >>> > > (or >>> > > other OpenFlow based datapath) to implement SDN. OVS OpenFlow is a >>> > > data-driven approach (as mentioned by Ben in several talks). The >>> > > advantage >>> > > is that it uses caches to accelerate datapath, regardless of the >>> > > number of >>> > > pipeline stages in the forwarding logic; and the disadvantage is of >>> > > course >>> > > when a packet has a cache miss, it will be slow. So I would think >>> > > the >>> > > direction of using eBPF/XDP is better to be within OVS itself, >>> > > instead of >>> > > adding an extra stage that cannot be cached within the OVS >>> > > framework, >>> > > because even if the extra stage is very fast, it is still extra. >>> > > >>> > > I would consider such an extra eBPF/XDP stage in OVN directly only >>> > > for the >>> > > cases that we know it is likely to miss the OVS/HW flow caches. One >>> > > example >>> > > may be DOS attacks that always trigger CT unestablished entries, >>> > > which is >>> > > not HW offload friendly. (But I don't have concrete use >>> > > cases/scenarios) >>> > > >>> > > In the case of OVN LB, I don't see a reason why it would miss the >>> > > cache >>> > > except for the first packets. Adding an extra eBPF/XDP stage on top >>> > > of the >>> > > OVS cached pipeline doesn't seem to improve the performance. >>> > > >>> > > > > On Wed, 8 Jun 2022 at 14:43, Han Zhou <zhouhan@gmail.com> >>> > > > > wrote: >>> > > > > >>> > > > > > On Mon, May 30, 2022 at 5:46 PM <numans@ovn.org> wrote: >>> > > > > > > >>> > > > > > > From: Numan Siddique <numans@ovn.org> >>> > > > > > > >>> > > > > > > XDP program - ovn_xdp.c added in this RFC patch series >>> > > > > > > implements >>> > > basic >>> > > > > > port >>> > > > > > > security and drops any packet if the port security check >>> > > > > > > fails. >>> > > > > > > There are still few TODOs in the port security checks. Like >>> > > > > > > - Make ovn xdp configurable. >>> > > > > > > - Removing the ingress Openflow rules from table 73 >>> > > > > > > and 74 >>> > > if ovn >>> > > > > > xdp >>> > > > > > > is enabled. >>> > > > > > > - Add IPv6 support. >>> > > > > > > - Enhance the port security xdp program for ARP/IPv6 >>> > > > > > > ND >>> > > checks. >>> > > > > > > >>> > > > > > > This patch adds a basic XDP support in OVN and in future we >>> > > > > > > can >>> > > > > > > leverage eBPF/XDP features. >>> > > > > > > >>> > > > > > > I'm not sure how much value this RFC patch adds to make use >>> > > > > > > of >>> > > eBPF/XDP >>> > > > > > > just for port security. Submitting as RFC to get some >>> > > > > > > feedback and >>> > > > > > > start some conversation on eBPF/XDP in OVN. >>> > > > > > > >>> > > > > > Hi Numan, >>> > > > > > >>> > > > > > This is really cool. 
It demonstrates how OVN could leverage >>> > > > > > eBPF/XDP. >>> > > > > > >>> > > > > > On the other hand, for the port-security feature in XDP, I >>> > > > > > keep >>> > > thinking >>> > > > > > about the scenarios and it is still not very clear to me. One >>> > > advantage I >>> > > > > > can think of is to prevent DOS attacks from VM/Pod when >>> > > > > > invalid >>> > > IP/MAC are >>> > > > > > used, XDP may perform better and drop packets with lower CPU >>> > > > > > cost >>> > > > > > (comparing with OVS kernel datapath). However, I am also >>> > > > > > wondering >>> > > why >>> > > > > > would a attacker use invalid IP/MAC for DOS attacks? Do you >>> > > > > > have >>> > > some more >>> > > > > > thoughts about the use cases? >>> > > > >>> > > > My idea was to demonstrate the use of eBPF/XDP and port security >>> > > > checks were easy to do >>> > > > before the packet hits the OVS pipeline. >>> > > > >>> > > Understand. It is indeed a great demonstration. >>> > > >>> > > > If we were to move the port security check to XDP, then the only >>> > > > advantage we would be getting >>> > > > in my opinion is to remove the corresponding ingress port >>> > > > security >>> > > > check related OF rules from ovs-vswitchd, thereby decreasing some >>> > > > looks up during >>> > > > flow translation. >>> > > > >>> > > For slow path, it might reduce the lookups in two tables, but >>> > > considering >>> > > that we have tens of tables, this cost may be negligible? >>> > > For fast path, there is no impact on the megaflow cache. >>> > > >>> > > > I'm not sure why an attacker would use invalid IP/MAC for DOS >>> > > > attacks. >>> > > > But from what I know, ovn-kubernetes do want to restrict each POD >>> > > > to >>> > > > its assigned IP/MAC. >>> > > > >>> > > Yes, restricting pods to use assigned IP/MAC is for port security, >>> > > which >>> > > is implemented by the port-security flows. I was talking about DOS >>> > > attacks >>> > > just to imagine a use case that utilizes the performance advantage >>> > > of XDP. >>> > > If it is just to detect and drop a regular amount of packets that >>> > > try to >>> > > use fake IP/MAC to circumvent security policies (ACLs), it doesn't >>> > > reflect >>> > > the benefit of XDP. >>> > > >>> > > > And do you have any performance results >>> > > > > > comparing with the current OVS implementation? >>> > > > >>> > > > I didn't do any scale/performance related tests. >>> > > > >>> > > > If we were to move port security feature to XDP in OVN, then I >>> > > > think we >>> > > need to >>> > > > - Complete the TODO's like adding IPv6 and ARP/ND related >>> > > > checks >>> > > > - Do some scale testing and see whether its reducing memory >>> > > > footprint of ovs-vswitchd and ovn-controller because of the >>> > > > reduction >>> > > > in OF rules >>> > > > >>> > > >>> > > Maybe I am wrong, but I think port-security flows are only related >>> > > to >>> > > local LSPs on each node, which doesn't contribute much to the >>> > > OVS/ovn-controller memory footprint, and thanks to your patches >>> > > that moves >>> > > port-security flow generation from northd to ovn-controller, the >>> > > central >>> > > components are already out of the picture of the port-security >>> > > related >>> > > costs. So I guess we won't see obvious differences in scale tests. >>> > > >>> > > > > > >>> > > > > > Another question is, would it work with smart NIC HW-offload, >>> > > > > > where >>> > > VF >>> > > > > > representer ports are added to OVS on the smart NIC? 
I guess >>> > > > > > XDP >>> > > doesn't >>> > > > > > support representer port, right? >>> > > > >>> > > > I think so. I don't have much experience/knowledge on this. From >>> > > > what >>> > > > I understand, if datapath flows are offloaded and since XDP is >>> > > > not >>> > > > offloaded, the xdo checks will be totally missed. >>> > > > So if XDP is to be used, then offloading should be disabled. >>> > > > >>> > > >>> > > Agree, although I did hope it could help for HW offload enabled >>> > > environments to mitigate the scenarios when packets would miss the >>> > > HW flow >>> > > cache. >>> > > >>> > > Thanks, >>> > > Han >>> > > >>> > > > Thanks >>> > > > Numan >>> > > > >>> > > > > > >>> > > > > > Thanks, >>> > > > > > Han >>> > > > > > >>> > > > > > > In order to attach and detach xdp programs, libxdp [1] and >>> > > > > > > libbpf >>> > > is >>> > > > > > used. >>> > > > > > > >>> > > > > > > To test it out locally, please install libxdp-devel and >>> > > libbpf-devel >>> > > > > > > and the compile OVN first and then compile ovn_xdp by >>> > > > > > > running "make >>> > > > > > > bpf". Copy ovn_xdp.o to either /usr/share/ovn/ or >>> > > /usr/local/share/ovn/ >>> > > > > > > >>> > > > > > > >>> > > > > > > Numan Siddique (2): >>> > > > > > > RFC: Add basic xdp/eBPF support in OVN. >>> > > > > > > RFC: ovn-controller: Attach XDP progs to the VIFs of the >>> > > > > > > logical >>> > > > > > > ports. >>> > > > > > > >>> > > > > > > Makefile.am | 6 +- >>> > > > > > > bpf/.gitignore | 5 + >>> > > > > > > bpf/automake.mk | 23 +++ >>> > > > > > > bpf/ovn_xdp.c | 156 +++++++++++++++ >>> > > > > > > configure.ac | 2 + >>> > > > > > > controller/automake.mk | 4 +- >>> > > > > > > controller/binding.c | 45 +++-- >>> > > > > > > controller/binding.h | 7 + >>> > > > > > > controller/ovn-controller.c | 79 +++++++- >>> > > > > > > controller/xdp.c | 389 >>> > > ++++++++++++++++++++++++++++++++++++ >>> > > > > > > controller/xdp.h | 41 ++++ >>> > > > > > > m4/ovn.m4 | 20 ++ >>> > > > > > > tests/automake.mk | 1 + >>> > > > > > > 13 files changed, 753 insertions(+), 25 deletions(-) >>> > > > > > > create mode 100644 bpf/.gitignore >>> > > > > > > create mode 100644 bpf/automake.mk >>> > > > > > > create mode 100644 bpf/ovn_xdp.c >>> > > > > > > create mode 100644 controller/xdp.c >>> > > > > > > create mode 100644 controller/xdp.h >>> > > > > > > >>> > > > > > > -- >>> > > > > > > 2.35.3 >>> > > > > > > >>> > > > > > > _______________________________________________ >>> > > > > > > dev mailing list >>> > > > > > > dev@openvswitch.org >>> > > > > > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev >>> > > > > > _______________________________________________ >>> > > > > > dev mailing list >>> > > > > > dev@openvswitch.org >>> > > > > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev >>> > > > > > >>> > > > > >>> > > > > >>> > > > > -- >>> > > > > 刘梦馨 >>> > > > > Blog: http://oilbeater.com >>> > > > > Weibo: @oilbeater <http://weibo.com/oilbeater> >>> > > > > _______________________________________________ >>> > > > > dev mailing list >>> > > > > dev@openvswitch.org >>> > > > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev >>> > > >>> > >>> > >>> >>> >> >> -- >> 刘梦馨 >> Blog: http://oilbeater.com >> Weibo: @oilbeater <http://weibo.com/oilbeater> >> > > > -- > 刘梦馨 > Blog: http://oilbeater.com > Weibo: @oilbeater <http://weibo.com/oilbeater> >
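[Editor's note] The attach/detach side that the second RFC patch ("ovn-controller: Attach XDP progs to the VIFs of the logical ports") handles is done through libxdp and libbpf, as the cover letter notes. Below is a minimal stand-alone sketch of that flow under stated assumptions: the interface name "tap0" and the object path "ovn_xdp.o" are made up for illustration, and the patch's controller/xdp.c will differ in structure and error handling.

    /* Minimal libxdp attach/detach sketch (assumes libxdp's
     * xdp_program__open_file()/xdp_program__attach() API; link with
     * -lxdp -lbpf).  Not the code from controller/xdp.c. */
    #include <stdio.h>
    #include <net/if.h>
    #include <xdp/libxdp.h>

    int main(void)
    {
        int ifindex = if_nametoindex("tap0");   /* VIF to protect (hypothetical name) */
        if (!ifindex) {
            perror("if_nametoindex");
            return 1;
        }

        /* Load the compiled XDP object file. */
        struct xdp_program *prog = xdp_program__open_file("ovn_xdp.o", NULL, NULL);
        if (libxdp_get_error(prog)) {
            fprintf(stderr, "failed to open ovn_xdp.o\n");
            return 1;
        }

        /* Attach; XDP_MODE_UNSPEC lets libxdp pick native driver mode when
         * available and fall back to generic (SKB) mode otherwise. */
        if (xdp_program__attach(prog, ifindex, XDP_MODE_UNSPEC, 0)) {
            fprintf(stderr, "failed to attach XDP program\n");
            xdp_program__close(prog);
            return 1;
        }

        /* ... when the port is released: detach and clean up. */
        xdp_program__detach(prog, ifindex, XDP_MODE_UNSPEC, 0);
        xdp_program__close(prog);
        return 0;
    }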