Message ID: 20220531004237.3872754-1-numans@ovn.org
Series: Basic eBPF/XDP support in OVN.
On Mon, May 30, 2022 at 5:46 PM <numans@ovn.org> wrote:
>
> From: Numan Siddique <numans@ovn.org>
>
> The XDP program ovn_xdp.c, added in this RFC patch series, implements basic
> port security and drops any packet if the port security check fails.
> There are still a few TODOs in the port security checks, such as:
>      - Make OVN XDP configurable.
>      - Remove the ingress OpenFlow rules from tables 73 and 74 if OVN XDP
>        is enabled.
>      - Add IPv6 support.
>      - Enhance the port security XDP program for ARP/IPv6 ND checks.
>
> This patch adds basic XDP support in OVN, and in the future we can
> leverage more eBPF/XDP features.
>
> I'm not sure how much value this RFC patch adds by making use of eBPF/XDP
> just for port security.  Submitting as RFC to get some feedback and
> start a conversation on eBPF/XDP in OVN.
>
Hi Numan,

This is really cool.  It demonstrates how OVN could leverage eBPF/XDP.

On the other hand, for the port-security feature in XDP, I keep thinking
about the scenarios and it is still not very clear to me.  One advantage I
can think of is preventing DOS attacks from a VM/Pod when invalid IP/MAC
addresses are used: XDP may perform better and drop packets at lower CPU
cost (compared with the OVS kernel datapath).  However, I am also wondering
why an attacker would use invalid IP/MAC for DOS attacks.  Do you have more
thoughts about the use cases?  And do you have any performance results
comparing with the current OVS implementation?

Another question: would it work with smart NIC HW offload, where VF
representor ports are added to OVS on the smart NIC?  I guess XDP doesn't
support representor ports, right?

Thanks,
Han

> In order to attach and detach XDP programs, libxdp [1] and libbpf are used.
>
> To test it out locally, please install libxdp-devel and libbpf-devel,
> compile OVN first, and then compile ovn_xdp by running "make bpf".
> Copy ovn_xdp.o to either /usr/share/ovn/ or /usr/local/share/ovn/.
>
>
> Numan Siddique (2):
>   RFC: Add basic xdp/eBPF support in OVN.
>   RFC: ovn-controller: Attach XDP progs to the VIFs of the logical
>     ports.
>
>  Makefile.am                 |   6 +-
>  bpf/.gitignore              |   5 +
>  bpf/automake.mk             |  23 +++
>  bpf/ovn_xdp.c               | 156 +++++++++++++++
>  configure.ac                |   2 +
>  controller/automake.mk      |   4 +-
>  controller/binding.c        |  45 +++--
>  controller/binding.h        |   7 +
>  controller/ovn-controller.c |  79 +++++++-
>  controller/xdp.c            | 389 ++++++++++++++++++++++++++++++++++++
>  controller/xdp.h            |  41 ++++
>  m4/ovn.m4                   |  20 ++
>  tests/automake.mk           |   1 +
>  13 files changed, 753 insertions(+), 25 deletions(-)
>  create mode 100644 bpf/.gitignore
>  create mode 100644 bpf/automake.mk
>  create mode 100644 bpf/ovn_xdp.c
>  create mode 100644 controller/xdp.c
>  create mode 100644 controller/xdp.h
>
> --
> 2.35.3
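To make the port-security check described in the cover letter above more concrete, here is a minimal sketch of what such an XDP program could look like. It is illustrative only and is not the bpf/ovn_xdp.c from the series: the map name, the single-MAC map layout, and the MAC-only check are assumptions, and the real program would also need the IP and ARP/ND checks listed in the TODOs.

```c
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/types.h>
#include <bpf/bpf_helpers.h>

struct mac_addr {
    __u8 addr[ETH_ALEN];
};

/* Hypothetical single-entry map holding the MAC bound to this port;
 * ovn-controller would be expected to fill it in when attaching. */
struct {
    __uint(type, BPF_MAP_TYPE_ARRAY);
    __uint(max_entries, 1);
    __type(key, __u32);
    __type(value, struct mac_addr);
} allowed_src_mac SEC(".maps");

SEC("xdp")
int ovn_port_security(struct xdp_md *ctx)
{
    void *data = (void *)(long) ctx->data;
    void *data_end = (void *)(long) ctx->data_end;
    struct ethhdr *eth = data;
    struct mac_addr *allowed;
    __u32 key = 0;
    int i;

    /* Bounds check required by the verifier before touching the header. */
    if ((void *) (eth + 1) > data_end) {
        return XDP_DROP;
    }

    allowed = bpf_map_lookup_elem(&allowed_src_mac, &key);
    if (!allowed) {
        return XDP_PASS;            /* No policy configured: let OVS decide. */
    }

    for (i = 0; i < ETH_ALEN; i++) {
        if (eth->h_source[i] != allowed->addr[i]) {
            return XDP_DROP;        /* Port security check failed. */
        }
    }

    return XDP_PASS;                /* Hand the frame to the OVS datapath. */
}

char _license[] SEC("license") = "GPL";
```

In the actual series, the per-port policy would presumably be populated by ovn-controller when it binds the logical port; the sketch leaves that part out.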
Just to give some input about eBPF/XDP support:

We used to use the OVN L2 LB to replace kube-proxy in Kubernetes, but found
that the L2 LB uses conntrack and an OVS clone, which hurts performance
badly.  The latency for a 1-byte UDP packet jumps from 18.5us to 25.7us,
and bandwidth drops from 6Mb/s to 2.8Mb/s.

Even traffic that does not target the LB VIPs sees the same performance
drop, and it also means the datapath cannot be fully offloaded to hardware.

And finally we turned to using Cilium's chaining mode to replace the OVN L2
LB for kube-proxy and resolve the above issues.  We hope to see LB
optimization by eBPF/XDP on the OVN side.

On Wed, 8 Jun 2022 at 14:43, Han Zhou <zhouhan@gmail.com> wrote:
> [...]
On Wed, Jun 8, 2022 at 6:34 AM 刘梦馨 <liumengxinfly@gmail.com> wrote:
>
> Just to give some input about eBPF/XDP support:
>
> We used to use the OVN L2 LB to replace kube-proxy in Kubernetes, but
> found that the L2 LB uses conntrack and an OVS clone, which hurts
> performance badly.  The latency for a 1-byte UDP packet jumps from 18.5us
> to 25.7us, and bandwidth drops from 6Mb/s to 2.8Mb/s.
>
> Even traffic that does not target the LB VIPs sees the same performance
> drop, and it also means the datapath cannot be fully offloaded to
> hardware.
>
> And finally we turned to using Cilium's chaining mode to replace the OVN
> L2 LB for kube-proxy and resolve the above issues.  We hope to see LB
> optimization by eBPF/XDP on the OVN side.
>

Thanks for your comments and inputs.  I think we should definitely explore
optimizing this use case and see if it's possible to leverage eBPF/XDP for
this.

> On Wed, 8 Jun 2022 at 14:43, Han Zhou <zhouhan@gmail.com> wrote:
> > [...]
> >
> > On the other hand, for the port-security feature in XDP, I keep thinking
> > about the scenarios and it is still not very clear to me.  One advantage
> > I can think of is preventing DOS attacks from a VM/Pod when invalid
> > IP/MAC addresses are used: XDP may perform better and drop packets at
> > lower CPU cost (compared with the OVS kernel datapath).  However, I am
> > also wondering why an attacker would use invalid IP/MAC for DOS attacks.
> > Do you have more thoughts about the use cases?

My idea was to demonstrate the use of eBPF/XDP, and port security checks
were easy to do before the packet hits the OVS pipeline.

If we were to move the port security check to XDP, then the only advantage
we would be getting, in my opinion, is removing the corresponding ingress
port security check OF rules from ovs-vswitchd, thereby decreasing some
lookups during flow translation.

I'm not sure why an attacker would use invalid IP/MAC for DOS attacks.
But from what I know, ovn-kubernetes does want to restrict each pod to its
assigned IP/MAC.

> > And do you have any performance results comparing with the current OVS
> > implementation?

I didn't do any scale/performance related tests.
If we were to move the port security feature to XDP in OVN, then I think we
need to:
  - Complete the TODOs, like adding IPv6 and ARP/ND related checks.
  - Do some scale testing and see whether it reduces the memory footprint
    of ovs-vswitchd and ovn-controller because of the reduction in OF
    rules.

> > Another question: would it work with smart NIC HW offload, where VF
> > representor ports are added to OVS on the smart NIC?  I guess XDP
> > doesn't support representor ports, right?

I think so.  I don't have much experience/knowledge on this.  From what I
understand, if datapath flows are offloaded, and since XDP is not
offloaded, the XDP checks will be totally missed.  So if XDP is to be used,
then offloading should be disabled.

Thanks
Numan
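The cover letter notes that libxdp [1] and libbpf are used to attach and detach the programs. As a rough illustration of that API (this is not the controller/xdp.c from the patch; the ELF section name and the native-to-SKB fallback are assumptions for this sketch), attaching the compiled object to a VIF with libxdp could look roughly like this:

```c
#include <net/if.h>
#include <xdp/libxdp.h>

/* Attach the (already installed) ovn_xdp.o object to one interface.
 * "xdp" as the ELF section name is an assumption for this sketch. */
int attach_ovn_xdp(const char *ifname)
{
    int ifindex = if_nametoindex(ifname);
    struct xdp_program *prog;
    int err;

    if (!ifindex) {
        return -1;                          /* Interface does not exist. */
    }

    prog = xdp_program__open_file("/usr/share/ovn/ovn_xdp.o", "xdp", NULL);
    if (libxdp_get_error(prog)) {
        return -1;                          /* Failed to open/load object. */
    }

    /* Prefer driver (native) mode, fall back to generic SKB mode. */
    err = xdp_program__attach(prog, ifindex, XDP_MODE_NATIVE, 0);
    if (err) {
        err = xdp_program__attach(prog, ifindex, XDP_MODE_SKB, 0);
    }
    if (err) {
        xdp_program__close(prog);
    }
    return err;
}
```

Detaching when the port is unbound would similarly use xdp_program__detach() with the same ifindex and attach mode.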
On Wed, Jun 8, 2022 at 8:08 AM Numan Siddique <numans@ovn.org> wrote:
>
> On Wed, Jun 8, 2022 at 6:34 AM 刘梦馨 <liumengxinfly@gmail.com> wrote:
> >
> > We used to use the OVN L2 LB to replace kube-proxy in Kubernetes, but
> > found that the L2 LB uses conntrack and an OVS clone, which hurts
> > performance badly.  The latency for a 1-byte UDP packet jumps from
> > 18.5us to 25.7us, and bandwidth drops from 6Mb/s to 2.8Mb/s.

Thanks for the input!
Could you tell us roughly how many packets were sent in a single test?  Was
the latency measured as an average over all the UDP packets?  I am asking
because if the packets hit megaflows in the kernel cache, it shouldn't be
slower than kube-proxy, which also uses conntrack.  If it is HW offloaded,
it should be faster.

> > Even traffic that does not target the LB VIPs sees the same performance
> > drop, and it also means the datapath cannot be fully offloaded to
> > hardware.

Was it clear why the total datapath cannot be offloaded to HW?  There might
be problems supporting HW offloading in earlier versions of OVN.  There
have been improvements to make it more HW-offload friendly.

> > And finally we turned to using Cilium's chaining mode to replace the
> > OVN L2 LB for kube-proxy and resolve the above issues.  We hope to see
> > LB optimization by eBPF/XDP on the OVN side.
>
> Thanks for your comments and inputs.  I think we should definitely
> explore optimizing this use case and see if it's possible to leverage
> eBPF/XDP for this.

I am sorry, but I am confused by OVN "L2" LB.  I think you might mean OVN
"L3/L4" LB?

Some general thoughts on this: OVN is primarily there to program OVS (or
another OpenFlow-based datapath) to implement SDN.  OVS OpenFlow is a
data-driven approach (as mentioned by Ben in several talks).  The advantage
is that it uses caches to accelerate the datapath, regardless of the number
of pipeline stages in the forwarding logic; the disadvantage is of course
that when a packet has a cache miss, it will be slow.  So I would think the
direction of using eBPF/XDP is better taken within OVS itself, instead of
adding an extra stage that cannot be cached within the OVS framework,
because even if the extra stage is very fast, it is still extra.

I would consider such an extra eBPF/XDP stage in OVN directly only for the
cases where we know it is likely to miss the OVS/HW flow caches.  One
example may be DOS attacks that always trigger CT unestablished entries,
which is not HW-offload friendly.  (But I don't have concrete use
cases/scenarios.)

In the case of OVN LB, I don't see a reason why it would miss the cache
except for the first packets.  Adding an extra eBPF/XDP stage on top of the
OVS cached pipeline doesn't seem to improve the performance.

> My idea was to demonstrate the use of eBPF/XDP, and port security checks
> were easy to do before the packet hits the OVS pipeline.

Understood.  It is indeed a great demonstration.

> If we were to move the port security check to XDP, then the only
> advantage we would be getting, in my opinion, is removing the
> corresponding ingress port security check OF rules from ovs-vswitchd,
> thereby decreasing some lookups during flow translation.

For the slow path, it might reduce the lookups in two tables, but
considering that we have tens of tables, this cost may be negligible?  For
the fast path, there is no impact on the megaflow cache.

> I'm not sure why an attacker would use invalid IP/MAC for DOS attacks.
> But from what I know, ovn-kubernetes does want to restrict each pod to
> its assigned IP/MAC.

Yes, restricting pods to their assigned IP/MAC is port security, which is
implemented by the port-security flows.  I was talking about DOS attacks
just to imagine a use case that utilizes the performance advantage of XDP.
If it is just to detect and drop a regular amount of packets that try to
use a fake IP/MAC to circumvent security policies (ACLs), it doesn't
reflect the benefit of XDP.

> If we were to move the port security feature to XDP in OVN, then I think
> we need to:
>   - Complete the TODOs, like adding IPv6 and ARP/ND related checks.
>   - Do some scale testing and see whether it reduces the memory footprint
>     of ovs-vswitchd and ovn-controller because of the reduction in OF
>     rules.

Maybe I am wrong, but I think port-security flows are only related to the
local LSPs on each node, which doesn't contribute much to the
OVS/ovn-controller memory footprint, and thanks to your patches that move
port-security flow generation from northd to ovn-controller, the central
components are already out of the picture for the port-security related
costs.  So I guess we won't see obvious differences in scale tests.

> > Another question: would it work with smart NIC HW offload, where VF
> > representor ports are added to OVS on the smart NIC?  I guess XDP
> > doesn't support representor ports, right?
>
> I think so.  I don't have much experience/knowledge on this.  From what
> I understand, if datapath flows are offloaded, and since XDP is not
> offloaded, the XDP checks will be totally missed.  So if XDP is to be
> used, then offloading should be disabled.

Agreed, although I did hope it could help in HW-offload enabled
environments to mitigate the scenarios where packets would miss the HW flow
cache.

Thanks,
Han
> Could you tell us roughly how many packets were sent in a single test?
> Was the latency measured as an average over all the UDP packets?

Let me describe my test method more clearly.  In fact, we only tested
pod-to-pod performance, *not* pod-to-service, and then profiled with a
flame graph and found that load-balancer processing took about 30% of the
CPU usage.

We run two pods on two different nodes; one runs a qperf server and the
other runs a qperf client to test UDP latency and bandwidth with the
command `qperf {another Pod IP} -ub -oo msg_size:1 -vu udp_lat udp_bw`.

In the first test, we used the kube-ovn default setup, which uses the OVN
load balancer to replace kube-proxy, and got a latency of 25.7us and a
bandwidth of 2.8Mb/s.

Then we manually deleted all OVN load-balancer rules bound to the logical
switch, and got a much better result: 18.5us and 6Mb/s.

> Was it clear why the total datapath cannot be offloaded to HW?

The issue we met with hw-offload is that Mellanox CX5/CX6 doesn't support
dp_hash and hash at the moment, and these two methods are used by the group
table to select a backend.  What makes things worse is that when any LB is
bound to a logical switch, all packets go through the LB pipeline even if
they are not destined to a service.  So the total logical-switch datapath
cannot be offloaded.

We have a customized patch to bypass the LB pipeline if traffic is not
destined to a service here:
https://github.com/kubeovn/ovn/commit/d26ae4de0ab070f6b602688ba808c8963f69d5c4.patch

> I am sorry, but I am confused by OVN "L2" LB.  I think you might mean OVN
> "L3/L4" LB?

I mean load balancers added to a logical switch with ls-lb-add; kube-ovn
uses them to replace kube-proxy.

> I am asking because if the packets hit megaflows in the kernel cache, it
> shouldn't be slower than kube-proxy, which also uses conntrack.  If it is
> HW offloaded, it should be faster.

In my previous profiling it seems unrelated to the megaflow cache.  The
flame graph shows that there is an extra ovs clone and reprocessing
compared with the flame graph without the LB.  I have presented before how
to profile and optimize kube-ovn performance, with more detail about the LB
performance issue at the beginning of this video (in Chinese):
https://www.youtube.com/watch?v=eqKHs05NUlg&t=27s.  Hope it can provide
more help.

On Wed, 8 Jun 2022 at 23:53, Han Zhou <zhouhan@gmail.com> wrote:
> [...]
On Thu, 2022-06-09 at 00:41 +0800, 刘梦馨 wrote:
> > Could you tell us roughly how many packets were sent in a single test?
> > Was the latency measured as an average over all the UDP packets?
>
> Let me describe my test method more clearly.  In fact, we only tested
> pod-to-pod performance, *not* pod-to-service, and then profiled with a
> flame graph and found that load-balancer processing took about 30% of
> the CPU usage.

pod -> pod (directly to the other pod's IP) shouldn't go through any load
balancer related flows though, right?  That seems curious to me...  It
might hit OVN's load balancer stages but (I think!) shouldn't be matching
any rules in them, because the packet's destination IP wouldn't be an LB
VIP.

Did you do an ofproto/trace to see what OVS flows the packet was hitting
and whether any were OVN LB related?

Dan

> [...]
> pod -> pod (directly to the other pod's IP) shouldn't go through any load
> balancer related flows though, right?

It didn't match the final VIP and ct_lb action.  But when the LB rule
exists, it first sends all packets to conntrack, which leads to
recirculation with an ovs clone, and that hurts the performance.  And I
find that the initial commit that sends all traffic to conntrack,
https://github.com/ovn-org/ovn/commit/64cc065e2c59c0696edeef738180989d993ceceb,
was to fix a bug.

Even if we bypass the conntrack action in the ingress pipeline with a
customized OVN, we still cannot bypass conntrack in the egress pipeline.
All egress packets still need to be sent to conntrack to test whether they
match a NAT session.

I cannot find the full performance test data at the moment.  What I can
find is that with the patch to bypass ingress conntrack, with LB rules, the
latency for the pod-to-pod qperf test dropped from 118us to 97us.  And if
no LB rules exist, the pod-to-pod latency drops to 88us.

On Thu, 9 Jun 2022 at 01:52, Dan Williams <dcbw@redhat.com> wrote:
> [...]
> > > > In my previous profile it seems unrelated to mega flow cache. The > > flame > > graph shows that there is extra ovs clone and reprocess compared to > > the > > flame graph without lb. I have introduced how to profile and optimize > > kube-ovn performance before and give more detail about the lb > > performance > > issue at the beginning of the video in Chinese > > https://www.youtube.com/watch?v=eqKHs05NUlg&t=27s hope it can provide > > more > > help > > > > On Wed, 8 Jun 2022 at 23:53, Han Zhou <zhouhan@gmail.com> wrote: > > > > > > > > > > > On Wed, Jun 8, 2022 at 8:08 AM Numan Siddique <numans@ovn.org> > > > wrote: > > > > > > > > On Wed, Jun 8, 2022 at 6:34 AM 刘梦馨 <liumengxinfly@gmail.com> > > > > wrote: > > > > > > > > > > Just give some input about eBPF/XDP support. > > > > > > > > > > We used to use OVN L2 LB to replace kube-proxy in Kubernetes, > > > > > but found > > > > > that > > > > > the L2 LB will use conntrack and ovs clone which hurts > > > > > performance > > > badly. > > > > > The latency > > > > > for 1byte udp packet jumps from 18.5us to 25.7us and bandwidth > > > > > drop > > > from > > > > > 6Mb/s to 2.8Mb/s. > > > > > > > > Thanks for the input! > > > Could you tell roughly how many packets were sent in a single test? > > > Was > > > the latency measured for all the UDP packets in average? I am > > > asking > > > because if the packets hit mega flows in the kernel cache, it > > > shouldn't be > > > slower than kube-proxy which also uses conntrack. If it is HW > > > offloaded it > > > should be faster. > > > > > > > > Even if the traffic does not target to LB VIPs has the same > > > > > performance > > > > > drop and it also leads to the > > > > > total datapath cannot be offloaded to hardware. > > > > > > > > > > > Was it clear why the total datapath cannot be offloaded to HW? > > > There might > > > be problems of supporting HW offloading in earlier version of OVN. > > > There > > > have been improvements to make it more HW offload friendly. > > > > > > > > And finally we turn to using Cilium's chaining mode to replace > > > > > the OVN > > > L2 > > > > > LB to implement kube-proxy to > > > > > resolve the above issues. We hope to see the lb optimization by > > > eBPF/XDP on > > > > > the OVN side. > > > > > > > > > > > > > Thanks for your comments and inputs. I think we should > > > > definitely > > > > explore optimizing this use case > > > > and see if its possible to leverage eBPF/XDP for this. > > > > > > > > > > I am sorry that I am confused by OVN "L2" LB. I think you might > > > mean OVN > > > "L3/L4" LB? > > > > > > Some general thoughts on this is, OVN is primarily to program OVS > > > (or > > > other OpenFlow based datapath) to implement SDN. OVS OpenFlow is a > > > data-driven approach (as mentioned by Ben in several talks). The > > > advantage > > > is that it uses caches to accelerate datapath, regardless of the > > > number of > > > pipeline stages in the forwarding logic; and the disadvantage is of > > > course > > > when a packet has a cache miss, it will be slow. So I would think > > > the > > > direction of using eBPF/XDP is better to be within OVS itself, > > > instead of > > > adding an extra stage that cannot be cached within the OVS > > > framework, > > > because even if the extra stage is very fast, it is still extra. > > > > > > I would consider such an extra eBPF/XDP stage in OVN directly only > > > for the > > > cases that we know it is likely to miss the OVS/HW flow caches. 
One > > > example > > > may be DOS attacks that always trigger CT unestablished entries, > > > which is > > > not HW offload friendly. (But I don't have concrete use > > > cases/scenarios) > > > > > > In the case of OVN LB, I don't see a reason why it would miss the > > > cache > > > except for the first packets. Adding an extra eBPF/XDP stage on top > > > of the > > > OVS cached pipeline doesn't seem to improve the performance. > > > > > > > > On Wed, 8 Jun 2022 at 14:43, Han Zhou <zhouhan@gmail.com> > > > > > wrote: > > > > > > > > > > > On Mon, May 30, 2022 at 5:46 PM <numans@ovn.org> wrote: > > > > > > > > > > > > > > From: Numan Siddique <numans@ovn.org> > > > > > > > > > > > > > > XDP program - ovn_xdp.c added in this RFC patch series > > > > > > > implements > > > basic > > > > > > port > > > > > > > security and drops any packet if the port security check > > > > > > > fails. > > > > > > > There are still few TODOs in the port security checks. Like > > > > > > > - Make ovn xdp configurable. > > > > > > > - Removing the ingress Openflow rules from table 73 > > > > > > > and 74 > > > if ovn > > > > > > xdp > > > > > > > is enabled. > > > > > > > - Add IPv6 support. > > > > > > > - Enhance the port security xdp program for ARP/IPv6 > > > > > > > ND > > > checks. > > > > > > > > > > > > > > This patch adds a basic XDP support in OVN and in future we > > > > > > > can > > > > > > > leverage eBPF/XDP features. > > > > > > > > > > > > > > I'm not sure how much value this RFC patch adds to make use > > > > > > > of > > > eBPF/XDP > > > > > > > just for port security. Submitting as RFC to get some > > > > > > > feedback and > > > > > > > start some conversation on eBPF/XDP in OVN. > > > > > > > > > > > > > Hi Numan, > > > > > > > > > > > > This is really cool. It demonstrates how OVN could leverage > > > > > > eBPF/XDP. > > > > > > > > > > > > On the other hand, for the port-security feature in XDP, I > > > > > > keep > > > thinking > > > > > > about the scenarios and it is still not very clear to me. One > > > advantage I > > > > > > can think of is to prevent DOS attacks from VM/Pod when > > > > > > invalid > > > IP/MAC are > > > > > > used, XDP may perform better and drop packets with lower CPU > > > > > > cost > > > > > > (comparing with OVS kernel datapath). However, I am also > > > > > > wondering > > > why > > > > > > would a attacker use invalid IP/MAC for DOS attacks? Do you > > > > > > have > > > some more > > > > > > thoughts about the use cases? > > > > > > > > My idea was to demonstrate the use of eBPF/XDP and port security > > > > checks were easy to do > > > > before the packet hits the OVS pipeline. > > > > > > > Understand. It is indeed a great demonstration. > > > > > > > If we were to move the port security check to XDP, then the only > > > > advantage we would be getting > > > > in my opinion is to remove the corresponding ingress port > > > > security > > > > check related OF rules from ovs-vswitchd, thereby decreasing some > > > > looks up during > > > > flow translation. > > > > > > > For slow path, it might reduce the lookups in two tables, but > > > considering > > > that we have tens of tables, this cost may be negligible? > > > For fast path, there is no impact on the megaflow cache. > > > > > > > I'm not sure why an attacker would use invalid IP/MAC for DOS > > > > attacks. > > > > But from what I know, ovn-kubernetes do want to restrict each POD > > > > to > > > > its assigned IP/MAC. 
> > > > > > > Yes, restricting pods to use assigned IP/MAC is for port security, > > > which > > > is implemented by the port-security flows. I was talking about DOS > > > attacks > > > just to imagine a use case that utilizes the performance advantage > > > of XDP. > > > If it is just to detect and drop a regular amount of packets that > > > try to > > > use fake IP/MAC to circumvent security policies (ACLs), it doesn't > > > reflect > > > the benefit of XDP. > > > > > > > And do you have any performance results > > > > > > comparing with the current OVS implementation? > > > > > > > > I didn't do any scale/performance related tests. > > > > > > > > If we were to move port security feature to XDP in OVN, then I > > > > think we > > > need to > > > > - Complete the TODO's like adding IPv6 and ARP/ND related > > > > checks > > > > - Do some scale testing and see whether its reducing memory > > > > footprint of ovs-vswitchd and ovn-controller because of the > > > > reduction > > > > in OF rules > > > > > > > > > > Maybe I am wrong, but I think port-security flows are only related > > > to > > > local LSPs on each node, which doesn't contribute much to the > > > OVS/ovn-controller memory footprint, and thanks to your patches > > > that moves > > > port-security flow generation from northd to ovn-controller, the > > > central > > > components are already out of the picture of the port-security > > > related > > > costs. So I guess we won't see obvious differences in scale tests. > > > > > > > > > > > > > > > Another question is, would it work with smart NIC HW-offload, > > > > > > where > > > VF > > > > > > representer ports are added to OVS on the smart NIC? I guess > > > > > > XDP > > > doesn't > > > > > > support representer port, right? > > > > > > > > I think so. I don't have much experience/knowledge on this. From > > > > what > > > > I understand, if datapath flows are offloaded and since XDP is > > > > not > > > > offloaded, the xdo checks will be totally missed. > > > > So if XDP is to be used, then offloading should be disabled. > > > > > > > > > > Agree, although I did hope it could help for HW offload enabled > > > environments to mitigate the scenarios when packets would miss the > > > HW flow > > > cache. > > > > > > Thanks, > > > Han > > > > > > > Thanks > > > > Numan > > > > > > > > > > > > > > > > Thanks, > > > > > > Han > > > > > > > > > > > > > In order to attach and detach xdp programs, libxdp [1] and > > > > > > > libbpf > > > is > > > > > > used. > > > > > > > > > > > > > > To test it out locally, please install libxdp-devel and > > > libbpf-devel > > > > > > > and the compile OVN first and then compile ovn_xdp by > > > > > > > running "make > > > > > > > bpf". Copy ovn_xdp.o to either /usr/share/ovn/ or > > > /usr/local/share/ovn/ > > > > > > > > > > > > > > > > > > > > > Numan Siddique (2): > > > > > > > RFC: Add basic xdp/eBPF support in OVN. > > > > > > > RFC: ovn-controller: Attach XDP progs to the VIFs of the > > > > > > > logical > > > > > > > ports. 
> > > > > > > > > > > > > > Makefile.am | 6 +- > > > > > > > bpf/.gitignore | 5 + > > > > > > > bpf/automake.mk | 23 +++ > > > > > > > bpf/ovn_xdp.c | 156 +++++++++++++++ > > > > > > > configure.ac | 2 + > > > > > > > controller/automake.mk | 4 +- > > > > > > > controller/binding.c | 45 +++-- > > > > > > > controller/binding.h | 7 + > > > > > > > controller/ovn-controller.c | 79 +++++++- > > > > > > > controller/xdp.c | 389 > > > ++++++++++++++++++++++++++++++++++++ > > > > > > > controller/xdp.h | 41 ++++ > > > > > > > m4/ovn.m4 | 20 ++ > > > > > > > tests/automake.mk | 1 + > > > > > > > 13 files changed, 753 insertions(+), 25 deletions(-) > > > > > > > create mode 100644 bpf/.gitignore > > > > > > > create mode 100644 bpf/automake.mk > > > > > > > create mode 100644 bpf/ovn_xdp.c > > > > > > > create mode 100644 controller/xdp.c > > > > > > > create mode 100644 controller/xdp.h > > > > > > > > > > > > > > -- > > > > > > > 2.35.3 > > > > > > > > > > > > > > _______________________________________________ > > > > > > > dev mailing list > > > > > > > dev@openvswitch.org > > > > > > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > > > > > > _______________________________________________ > > > > > > dev mailing list > > > > > > dev@openvswitch.org > > > > > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > > > > > > > > > > > > > > > > > > > > > -- > > > > > 刘梦馨 > > > > > Blog: http://oilbeater.com > > > > > Weibo: @oilbeater <http://weibo.com/oilbeater> > > > > > _______________________________________________ > > > > > dev mailing list > > > > > dev@openvswitch.org > > > > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > > > > > > > > >
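[Editor's note] The port-security check discussed above is conceptually simple at the XDP layer: inspect the headers of each incoming frame and drop it before it ever reaches the OVS datapath if the source addresses are not the ones assigned to the logical port. The sketch below is illustrative only and is not the ovn_xdp.c from the series; the allowed MAC is hard-coded for the example, whereas the real program would look up the per-port allowed addresses in a BPF map populated by ovn-controller, and would also cover IP/ARP/ND once the listed TODOs are done.

    /* Minimal XDP port-security sketch (not the patch's ovn_xdp.c).
     * Assumption: the allowed source MAC is known at compile time; the
     * real program reads it from a BPF map filled in by ovn-controller. */
    #include <linux/bpf.h>
    #include <linux/if_ether.h>
    #include <bpf/bpf_helpers.h>

    SEC("xdp")
    int port_security(struct xdp_md *ctx)
    {
        void *data = (void *)(long)ctx->data;
        void *data_end = (void *)(long)ctx->data_end;
        struct ethhdr *eth = data;
        const unsigned char allowed_mac[ETH_ALEN] =
            { 0x0a, 0x00, 0x00, 0x00, 0x00, 0x01 };   /* hypothetical VIF MAC */

        if ((void *)(eth + 1) > data_end)
            return XDP_DROP;                  /* frame too short for an Ethernet header */

        for (int i = 0; i < ETH_ALEN; i++) {
            if (eth->h_source[i] != allowed_mac[i])
                return XDP_DROP;              /* source MAC fails the port-security check */
        }

        return XDP_PASS;                      /* hand the frame to the OVS datapath */
    }

    char _license[] SEC("license") = "GPL";

Because the verdict is taken before an skb is allocated (in native mode), a drop here is cheaper than a drop in the OVS kernel datapath, which is the lower-CPU-cost argument made earlier in the thread.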
In our profile, the conntrack is the main reason for performance drop. And when need related functions like lb and acl, we have to carefully check if other unrelated flows are affected like this patch for multicast traffic http://patchwork.ozlabs.org/project/ovn/patch/20211217141645.9931-1-dceara@redhat.com For some performance test poc scenarios we also have to disable functions related to conntrack for a better result. If XDP/eBPF can help with the conntrack performance issues, I think it will be a big boost and we don't need lots of customization or turn to Cilium to replace some functions but bring in lots of complexity. On Thu, 9 Jun 2022 at 11:19, 刘梦馨 <liumengxinfly@gmail.com> wrote: > > pod -> pod (directly to the other Pod IP) shouldn't go through any load > balancer related flows though, right? > > It didn't match the final vip and ct_lb action. But when the lb rule > exists, it will first send all packets to conntrack and lead recirculation > with ovs clone and it hurts the performance. > > And I find the initial commit that send all traffic to conntrack here > https://github.com/ovn-org/ovn/commit/64cc065e2c59c0696edeef738180989d993ceceb > is to fix a bug. > > Even if we bypass the conntrack action in ingress pipeline by a customized > ovn, we still cannot bypass the conntrack in the egress pipeline. All > egress packets still need to be sent to conntrack to test if they match a > nat session. > > I cannot find the full performance test data at the moment. What I find is > that with the patch to bypass ingress conntrack, with lb rules, the latency > for pod-to-pod qperf test dropped from 118us to 97us. And if no lb rules > exist, the pod-to-pod latency drops to 88us. > > On Thu, 9 Jun 2022 at 01:52, Dan Williams <dcbw@redhat.com> wrote: > >> On Thu, 2022-06-09 at 00:41 +0800, 刘梦馨 wrote: >> > > Could you tell roughly how many packets were sent in a single test? >> > > Was >> > the latency measured for all the UDP packets in average? >> > >> > Let me describe my test method more clearly. In fact, we only tested >> > pod-to-pod performance *not* pod-to-service and then do profile with >> > flamegraph and find the loadbalancer process took about 30% CPU >> > usage. >> >> pod -> pod (directly to the other Pod IP) shouldn't go through any load >> balancer related flows though, right? That seems curious to me... It >> might hit OVN's load balancer stages but (I think!) shouldn't be >> matching any rules in them, because the packet's destination IP >> wouldn't be a LB VIP. >> >> Did you do an ofproto/trace to see what OVS flows the packet was >> hitting and if any were OVN LB related? >> >> Dan >> >> > >> > Run two Pods in two different node, and one run qperf server the >> > other run >> > qperf client to test udp latency and bandwidth performance with >> > command >> > `qperf {another Pod IP} -ub -oo msg_size:1 -vu udp_lat udp_bw`. >> > >> > In the first test, we use kube-ovn default setup which use ovn >> > loadbalancer >> > to replace kube-proxy and got the result latency 25.7us and >> > bandwidth >> > 2.8Mb/s >> > >> > Then we manually delete all ovn loadbalancer rules bind to the >> > logical >> > switch, and got a much better result 18.5us and 6Mb/s >> > >> > > Was it clear why the total datapath cannot be offloaded to HW? >> > The issue we meet with hw-offload is that mellanox cx5/cx6 didn't >> > support >> > dp_hash and hash at the moment and these two method are used by >> > group table to select a backend. 
>> > What makes things worse is that when any lb bind to a ls all packet >> > will go >> > through the lb pipeline even if it not designate to service. So the >> > total >> > ls datapath cannot be offloaded. >> > >> > We have a customized path to bypaas the lb pipeline if traffic not >> > designate to service here >> > >> https://github.com/kubeovn/ovn/commit/d26ae4de0ab070f6b602688ba808c8963f69d5c4.patch >> > >> > > I am sorry that I am confused by OVN "L2" LB. I think you might >> > > mean OVN >> > "L3/L4" LB? >> > I mean loadbalancers add to ls by ls-lb-add, kube-ovn uses it to >> > replace >> > kube-proxy >> > >> > > I am asking because if the packets hit mega flows in the kernel >> > > cache, >> > it shouldn't be slower than kube-proxy which also uses conntrack. If >> > it is >> > HW offloaded it should be faster. >> > >> > In my previous profile it seems unrelated to mega flow cache. The >> > flame >> > graph shows that there is extra ovs clone and reprocess compared to >> > the >> > flame graph without lb. I have introduced how to profile and optimize >> > kube-ovn performance before and give more detail about the lb >> > performance >> > issue at the beginning of the video in Chinese >> > https://www.youtube.com/watch?v=eqKHs05NUlg&t=27s hope it can provide >> > more >> > help >> > >> > On Wed, 8 Jun 2022 at 23:53, Han Zhou <zhouhan@gmail.com> wrote: >> > >> > > >> > > >> > > On Wed, Jun 8, 2022 at 8:08 AM Numan Siddique <numans@ovn.org> >> > > wrote: >> > > > >> > > > On Wed, Jun 8, 2022 at 6:34 AM 刘梦馨 <liumengxinfly@gmail.com> >> > > > wrote: >> > > > > >> > > > > Just give some input about eBPF/XDP support. >> > > > > >> > > > > We used to use OVN L2 LB to replace kube-proxy in Kubernetes, >> > > > > but found >> > > > > that >> > > > > the L2 LB will use conntrack and ovs clone which hurts >> > > > > performance >> > > badly. >> > > > > The latency >> > > > > for 1byte udp packet jumps from 18.5us to 25.7us and bandwidth >> > > > > drop >> > > from >> > > > > 6Mb/s to 2.8Mb/s. >> > > > > >> > > Thanks for the input! >> > > Could you tell roughly how many packets were sent in a single test? >> > > Was >> > > the latency measured for all the UDP packets in average? I am >> > > asking >> > > because if the packets hit mega flows in the kernel cache, it >> > > shouldn't be >> > > slower than kube-proxy which also uses conntrack. If it is HW >> > > offloaded it >> > > should be faster. >> > > >> > > > > Even if the traffic does not target to LB VIPs has the same >> > > > > performance >> > > > > drop and it also leads to the >> > > > > total datapath cannot be offloaded to hardware. >> > > > > >> > > >> > > Was it clear why the total datapath cannot be offloaded to HW? >> > > There might >> > > be problems of supporting HW offloading in earlier version of OVN. >> > > There >> > > have been improvements to make it more HW offload friendly. >> > > >> > > > > And finally we turn to using Cilium's chaining mode to replace >> > > > > the OVN >> > > L2 >> > > > > LB to implement kube-proxy to >> > > > > resolve the above issues. We hope to see the lb optimization by >> > > eBPF/XDP on >> > > > > the OVN side. >> > > > > >> > > > >> > > > Thanks for your comments and inputs. I think we should >> > > > definitely >> > > > explore optimizing this use case >> > > > and see if its possible to leverage eBPF/XDP for this. >> > > > >> > > >> > > I am sorry that I am confused by OVN "L2" LB. I think you might >> > > mean OVN >> > > "L3/L4" LB? 
>> > > >> > > Some general thoughts on this is, OVN is primarily to program OVS >> > > (or >> > > other OpenFlow based datapath) to implement SDN. OVS OpenFlow is a >> > > data-driven approach (as mentioned by Ben in several talks). The >> > > advantage >> > > is that it uses caches to accelerate datapath, regardless of the >> > > number of >> > > pipeline stages in the forwarding logic; and the disadvantage is of >> > > course >> > > when a packet has a cache miss, it will be slow. So I would think >> > > the >> > > direction of using eBPF/XDP is better to be within OVS itself, >> > > instead of >> > > adding an extra stage that cannot be cached within the OVS >> > > framework, >> > > because even if the extra stage is very fast, it is still extra. >> > > >> > > I would consider such an extra eBPF/XDP stage in OVN directly only >> > > for the >> > > cases that we know it is likely to miss the OVS/HW flow caches. One >> > > example >> > > may be DOS attacks that always trigger CT unestablished entries, >> > > which is >> > > not HW offload friendly. (But I don't have concrete use >> > > cases/scenarios) >> > > >> > > In the case of OVN LB, I don't see a reason why it would miss the >> > > cache >> > > except for the first packets. Adding an extra eBPF/XDP stage on top >> > > of the >> > > OVS cached pipeline doesn't seem to improve the performance. >> > > >> > > > > On Wed, 8 Jun 2022 at 14:43, Han Zhou <zhouhan@gmail.com> >> > > > > wrote: >> > > > > >> > > > > > On Mon, May 30, 2022 at 5:46 PM <numans@ovn.org> wrote: >> > > > > > > >> > > > > > > From: Numan Siddique <numans@ovn.org> >> > > > > > > >> > > > > > > XDP program - ovn_xdp.c added in this RFC patch series >> > > > > > > implements >> > > basic >> > > > > > port >> > > > > > > security and drops any packet if the port security check >> > > > > > > fails. >> > > > > > > There are still few TODOs in the port security checks. Like >> > > > > > > - Make ovn xdp configurable. >> > > > > > > - Removing the ingress Openflow rules from table 73 >> > > > > > > and 74 >> > > if ovn >> > > > > > xdp >> > > > > > > is enabled. >> > > > > > > - Add IPv6 support. >> > > > > > > - Enhance the port security xdp program for ARP/IPv6 >> > > > > > > ND >> > > checks. >> > > > > > > >> > > > > > > This patch adds a basic XDP support in OVN and in future we >> > > > > > > can >> > > > > > > leverage eBPF/XDP features. >> > > > > > > >> > > > > > > I'm not sure how much value this RFC patch adds to make use >> > > > > > > of >> > > eBPF/XDP >> > > > > > > just for port security. Submitting as RFC to get some >> > > > > > > feedback and >> > > > > > > start some conversation on eBPF/XDP in OVN. >> > > > > > > >> > > > > > Hi Numan, >> > > > > > >> > > > > > This is really cool. It demonstrates how OVN could leverage >> > > > > > eBPF/XDP. >> > > > > > >> > > > > > On the other hand, for the port-security feature in XDP, I >> > > > > > keep >> > > thinking >> > > > > > about the scenarios and it is still not very clear to me. One >> > > advantage I >> > > > > > can think of is to prevent DOS attacks from VM/Pod when >> > > > > > invalid >> > > IP/MAC are >> > > > > > used, XDP may perform better and drop packets with lower CPU >> > > > > > cost >> > > > > > (comparing with OVS kernel datapath). However, I am also >> > > > > > wondering >> > > why >> > > > > > would a attacker use invalid IP/MAC for DOS attacks? Do you >> > > > > > have >> > > some more >> > > > > > thoughts about the use cases? 
>> > > > >> > > > My idea was to demonstrate the use of eBPF/XDP and port security >> > > > checks were easy to do >> > > > before the packet hits the OVS pipeline. >> > > > >> > > Understand. It is indeed a great demonstration. >> > > >> > > > If we were to move the port security check to XDP, then the only >> > > > advantage we would be getting >> > > > in my opinion is to remove the corresponding ingress port >> > > > security >> > > > check related OF rules from ovs-vswitchd, thereby decreasing some >> > > > looks up during >> > > > flow translation. >> > > > >> > > For slow path, it might reduce the lookups in two tables, but >> > > considering >> > > that we have tens of tables, this cost may be negligible? >> > > For fast path, there is no impact on the megaflow cache. >> > > >> > > > I'm not sure why an attacker would use invalid IP/MAC for DOS >> > > > attacks. >> > > > But from what I know, ovn-kubernetes do want to restrict each POD >> > > > to >> > > > its assigned IP/MAC. >> > > > >> > > Yes, restricting pods to use assigned IP/MAC is for port security, >> > > which >> > > is implemented by the port-security flows. I was talking about DOS >> > > attacks >> > > just to imagine a use case that utilizes the performance advantage >> > > of XDP. >> > > If it is just to detect and drop a regular amount of packets that >> > > try to >> > > use fake IP/MAC to circumvent security policies (ACLs), it doesn't >> > > reflect >> > > the benefit of XDP. >> > > >> > > > And do you have any performance results >> > > > > > comparing with the current OVS implementation? >> > > > >> > > > I didn't do any scale/performance related tests. >> > > > >> > > > If we were to move port security feature to XDP in OVN, then I >> > > > think we >> > > need to >> > > > - Complete the TODO's like adding IPv6 and ARP/ND related >> > > > checks >> > > > - Do some scale testing and see whether its reducing memory >> > > > footprint of ovs-vswitchd and ovn-controller because of the >> > > > reduction >> > > > in OF rules >> > > > >> > > >> > > Maybe I am wrong, but I think port-security flows are only related >> > > to >> > > local LSPs on each node, which doesn't contribute much to the >> > > OVS/ovn-controller memory footprint, and thanks to your patches >> > > that moves >> > > port-security flow generation from northd to ovn-controller, the >> > > central >> > > components are already out of the picture of the port-security >> > > related >> > > costs. So I guess we won't see obvious differences in scale tests. >> > > >> > > > > > >> > > > > > Another question is, would it work with smart NIC HW-offload, >> > > > > > where >> > > VF >> > > > > > representer ports are added to OVS on the smart NIC? I guess >> > > > > > XDP >> > > doesn't >> > > > > > support representer port, right? >> > > > >> > > > I think so. I don't have much experience/knowledge on this. From >> > > > what >> > > > I understand, if datapath flows are offloaded and since XDP is >> > > > not >> > > > offloaded, the xdo checks will be totally missed. >> > > > So if XDP is to be used, then offloading should be disabled. >> > > > >> > > >> > > Agree, although I did hope it could help for HW offload enabled >> > > environments to mitigate the scenarios when packets would miss the >> > > HW flow >> > > cache. 
>> > > >> > > Thanks, >> > > Han >> > > >> > > > Thanks >> > > > Numan >> > > > >> > > > > > >> > > > > > Thanks, >> > > > > > Han >> > > > > > >> > > > > > > In order to attach and detach xdp programs, libxdp [1] and >> > > > > > > libbpf >> > > is >> > > > > > used. >> > > > > > > >> > > > > > > To test it out locally, please install libxdp-devel and >> > > libbpf-devel >> > > > > > > and the compile OVN first and then compile ovn_xdp by >> > > > > > > running "make >> > > > > > > bpf". Copy ovn_xdp.o to either /usr/share/ovn/ or >> > > /usr/local/share/ovn/ >> > > > > > > >> > > > > > > >> > > > > > > Numan Siddique (2): >> > > > > > > RFC: Add basic xdp/eBPF support in OVN. >> > > > > > > RFC: ovn-controller: Attach XDP progs to the VIFs of the >> > > > > > > logical >> > > > > > > ports. >> > > > > > > >> > > > > > > Makefile.am | 6 +- >> > > > > > > bpf/.gitignore | 5 + >> > > > > > > bpf/automake.mk | 23 +++ >> > > > > > > bpf/ovn_xdp.c | 156 +++++++++++++++ >> > > > > > > configure.ac | 2 + >> > > > > > > controller/automake.mk | 4 +- >> > > > > > > controller/binding.c | 45 +++-- >> > > > > > > controller/binding.h | 7 + >> > > > > > > controller/ovn-controller.c | 79 +++++++- >> > > > > > > controller/xdp.c | 389 >> > > ++++++++++++++++++++++++++++++++++++ >> > > > > > > controller/xdp.h | 41 ++++ >> > > > > > > m4/ovn.m4 | 20 ++ >> > > > > > > tests/automake.mk | 1 + >> > > > > > > 13 files changed, 753 insertions(+), 25 deletions(-) >> > > > > > > create mode 100644 bpf/.gitignore >> > > > > > > create mode 100644 bpf/automake.mk >> > > > > > > create mode 100644 bpf/ovn_xdp.c >> > > > > > > create mode 100644 controller/xdp.c >> > > > > > > create mode 100644 controller/xdp.h >> > > > > > > >> > > > > > > -- >> > > > > > > 2.35.3 >> > > > > > > >> > > > > > > _______________________________________________ >> > > > > > > dev mailing list >> > > > > > > dev@openvswitch.org >> > > > > > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev >> > > > > > _______________________________________________ >> > > > > > dev mailing list >> > > > > > dev@openvswitch.org >> > > > > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev >> > > > > > >> > > > > >> > > > > >> > > > > -- >> > > > > 刘梦馨 >> > > > > Blog: http://oilbeater.com >> > > > > Weibo: @oilbeater <http://weibo.com/oilbeater> >> > > > > _______________________________________________ >> > > > > dev mailing list >> > > > > dev@openvswitch.org >> > > > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev >> > > >> > >> > >> >> > > -- > 刘梦馨 > Blog: http://oilbeater.com > Weibo: @oilbeater <http://weibo.com/oilbeater> >
On Wed, Jun 8, 2022 at 8:50 PM 刘梦馨 <liumengxinfly@gmail.com> wrote: > In our profile, the conntrack is the main reason for performance drop. > > And when need related functions like lb and acl, we have to carefully > check if other unrelated flows are affected like this patch for multicast > traffic > > http://patchwork.ozlabs.org/project/ovn/patch/20211217141645.9931-1-dceara@redhat.com > > For some performance test poc scenarios we also have to disable functions > related to conntrack for a better result. If XDP/eBPF can help with the > conntrack performance issues, I think it will be a big boost and we don't > need lots of customization or turn to Cilium to replace some functions but > bring in lots of complexity. > > On Thu, 9 Jun 2022 at 11:19, 刘梦馨 <liumengxinfly@gmail.com> wrote: > >> > pod -> pod (directly to the other Pod IP) shouldn't go through any load >> balancer related flows though, right? >> >> It didn't match the final vip and ct_lb action. But when the lb rule >> exists, it will first send all packets to conntrack and lead recirculation >> with ovs clone and it hurts the performance. >> >> And I find the initial commit that send all traffic to conntrack here >> https://github.com/ovn-org/ovn/commit/64cc065e2c59c0696edeef738180989d993ceceb >> is to fix a bug. >> >> Even if we bypass the conntrack action in ingress pipeline by a >> customized ovn, we still cannot bypass the conntrack in the egress >> pipeline. All egress packets still need to be sent to conntrack to test if >> they match a nat session. >> >> I cannot find the full performance test data at the moment. What I find >> is that with the patch to bypass ingress conntrack, with lb rules, the >> latency for pod-to-pod qperf test dropped from 118us to 97us. And if no lb >> rules exist, the pod-to-pod latency drops to 88us. >> > To revisit this discussion, now with the below patch, the stateless ACLs can be used to bypass conntrack even when there are LBs. It's in the main branch and will be in 23.03 release: https://github.com/ovn-org/ovn/commit/a0f82efdd9dfd3ef2d9606c1890e353df1097a51 Hope this helps. Regards, Han > >> On Thu, 9 Jun 2022 at 01:52, Dan Williams <dcbw@redhat.com> wrote: >> >>> On Thu, 2022-06-09 at 00:41 +0800, 刘梦馨 wrote: >>> > > Could you tell roughly how many packets were sent in a single test? >>> > > Was >>> > the latency measured for all the UDP packets in average? >>> > >>> > Let me describe my test method more clearly. In fact, we only tested >>> > pod-to-pod performance *not* pod-to-service and then do profile with >>> > flamegraph and find the loadbalancer process took about 30% CPU >>> > usage. >>> >>> pod -> pod (directly to the other Pod IP) shouldn't go through any load >>> balancer related flows though, right? That seems curious to me... It >>> might hit OVN's load balancer stages but (I think!) shouldn't be >>> matching any rules in them, because the packet's destination IP >>> wouldn't be a LB VIP. >>> >>> Did you do an ofproto/trace to see what OVS flows the packet was >>> hitting and if any were OVN LB related? >>> >>> Dan >>> >>> > >>> > Run two Pods in two different node, and one run qperf server the >>> > other run >>> > qperf client to test udp latency and bandwidth performance with >>> > command >>> > `qperf {another Pod IP} -ub -oo msg_size:1 -vu udp_lat udp_bw`. 
>>> > >>> > In the first test, we use kube-ovn default setup which use ovn >>> > loadbalancer >>> > to replace kube-proxy and got the result latency 25.7us and >>> > bandwidth >>> > 2.8Mb/s >>> > >>> > Then we manually delete all ovn loadbalancer rules bind to the >>> > logical >>> > switch, and got a much better result 18.5us and 6Mb/s >>> > >>> > > Was it clear why the total datapath cannot be offloaded to HW? >>> > The issue we meet with hw-offload is that mellanox cx5/cx6 didn't >>> > support >>> > dp_hash and hash at the moment and these two method are used by >>> > group table to select a backend. >>> > What makes things worse is that when any lb bind to a ls all packet >>> > will go >>> > through the lb pipeline even if it not designate to service. So the >>> > total >>> > ls datapath cannot be offloaded. >>> > >>> > We have a customized path to bypaas the lb pipeline if traffic not >>> > designate to service here >>> > >>> https://github.com/kubeovn/ovn/commit/d26ae4de0ab070f6b602688ba808c8963f69d5c4.patch >>> > >>> > > I am sorry that I am confused by OVN "L2" LB. I think you might >>> > > mean OVN >>> > "L3/L4" LB? >>> > I mean loadbalancers add to ls by ls-lb-add, kube-ovn uses it to >>> > replace >>> > kube-proxy >>> > >>> > > I am asking because if the packets hit mega flows in the kernel >>> > > cache, >>> > it shouldn't be slower than kube-proxy which also uses conntrack. If >>> > it is >>> > HW offloaded it should be faster. >>> > >>> > In my previous profile it seems unrelated to mega flow cache. The >>> > flame >>> > graph shows that there is extra ovs clone and reprocess compared to >>> > the >>> > flame graph without lb. I have introduced how to profile and optimize >>> > kube-ovn performance before and give more detail about the lb >>> > performance >>> > issue at the beginning of the video in Chinese >>> > https://www.youtube.com/watch?v=eqKHs05NUlg&t=27s hope it can provide >>> > more >>> > help >>> > >>> > On Wed, 8 Jun 2022 at 23:53, Han Zhou <zhouhan@gmail.com> wrote: >>> > >>> > > >>> > > >>> > > On Wed, Jun 8, 2022 at 8:08 AM Numan Siddique <numans@ovn.org> >>> > > wrote: >>> > > > >>> > > > On Wed, Jun 8, 2022 at 6:34 AM 刘梦馨 <liumengxinfly@gmail.com> >>> > > > wrote: >>> > > > > >>> > > > > Just give some input about eBPF/XDP support. >>> > > > > >>> > > > > We used to use OVN L2 LB to replace kube-proxy in Kubernetes, >>> > > > > but found >>> > > > > that >>> > > > > the L2 LB will use conntrack and ovs clone which hurts >>> > > > > performance >>> > > badly. >>> > > > > The latency >>> > > > > for 1byte udp packet jumps from 18.5us to 25.7us and bandwidth >>> > > > > drop >>> > > from >>> > > > > 6Mb/s to 2.8Mb/s. >>> > > > > >>> > > Thanks for the input! >>> > > Could you tell roughly how many packets were sent in a single test? >>> > > Was >>> > > the latency measured for all the UDP packets in average? I am >>> > > asking >>> > > because if the packets hit mega flows in the kernel cache, it >>> > > shouldn't be >>> > > slower than kube-proxy which also uses conntrack. If it is HW >>> > > offloaded it >>> > > should be faster. >>> > > >>> > > > > Even if the traffic does not target to LB VIPs has the same >>> > > > > performance >>> > > > > drop and it also leads to the >>> > > > > total datapath cannot be offloaded to hardware. >>> > > > > >>> > > >>> > > Was it clear why the total datapath cannot be offloaded to HW? >>> > > There might >>> > > be problems of supporting HW offloading in earlier version of OVN. 
>>> > > There >>> > > have been improvements to make it more HW offload friendly. >>> > > >>> > > > > And finally we turn to using Cilium's chaining mode to replace >>> > > > > the OVN >>> > > L2 >>> > > > > LB to implement kube-proxy to >>> > > > > resolve the above issues. We hope to see the lb optimization by >>> > > eBPF/XDP on >>> > > > > the OVN side. >>> > > > > >>> > > > >>> > > > Thanks for your comments and inputs. I think we should >>> > > > definitely >>> > > > explore optimizing this use case >>> > > > and see if its possible to leverage eBPF/XDP for this. >>> > > > >>> > > >>> > > I am sorry that I am confused by OVN "L2" LB. I think you might >>> > > mean OVN >>> > > "L3/L4" LB? >>> > > >>> > > Some general thoughts on this is, OVN is primarily to program OVS >>> > > (or >>> > > other OpenFlow based datapath) to implement SDN. OVS OpenFlow is a >>> > > data-driven approach (as mentioned by Ben in several talks). The >>> > > advantage >>> > > is that it uses caches to accelerate datapath, regardless of the >>> > > number of >>> > > pipeline stages in the forwarding logic; and the disadvantage is of >>> > > course >>> > > when a packet has a cache miss, it will be slow. So I would think >>> > > the >>> > > direction of using eBPF/XDP is better to be within OVS itself, >>> > > instead of >>> > > adding an extra stage that cannot be cached within the OVS >>> > > framework, >>> > > because even if the extra stage is very fast, it is still extra. >>> > > >>> > > I would consider such an extra eBPF/XDP stage in OVN directly only >>> > > for the >>> > > cases that we know it is likely to miss the OVS/HW flow caches. One >>> > > example >>> > > may be DOS attacks that always trigger CT unestablished entries, >>> > > which is >>> > > not HW offload friendly. (But I don't have concrete use >>> > > cases/scenarios) >>> > > >>> > > In the case of OVN LB, I don't see a reason why it would miss the >>> > > cache >>> > > except for the first packets. Adding an extra eBPF/XDP stage on top >>> > > of the >>> > > OVS cached pipeline doesn't seem to improve the performance. >>> > > >>> > > > > On Wed, 8 Jun 2022 at 14:43, Han Zhou <zhouhan@gmail.com> >>> > > > > wrote: >>> > > > > >>> > > > > > On Mon, May 30, 2022 at 5:46 PM <numans@ovn.org> wrote: >>> > > > > > > >>> > > > > > > From: Numan Siddique <numans@ovn.org> >>> > > > > > > >>> > > > > > > XDP program - ovn_xdp.c added in this RFC patch series >>> > > > > > > implements >>> > > basic >>> > > > > > port >>> > > > > > > security and drops any packet if the port security check >>> > > > > > > fails. >>> > > > > > > There are still few TODOs in the port security checks. Like >>> > > > > > > - Make ovn xdp configurable. >>> > > > > > > - Removing the ingress Openflow rules from table 73 >>> > > > > > > and 74 >>> > > if ovn >>> > > > > > xdp >>> > > > > > > is enabled. >>> > > > > > > - Add IPv6 support. >>> > > > > > > - Enhance the port security xdp program for ARP/IPv6 >>> > > > > > > ND >>> > > checks. >>> > > > > > > >>> > > > > > > This patch adds a basic XDP support in OVN and in future we >>> > > > > > > can >>> > > > > > > leverage eBPF/XDP features. >>> > > > > > > >>> > > > > > > I'm not sure how much value this RFC patch adds to make use >>> > > > > > > of >>> > > eBPF/XDP >>> > > > > > > just for port security. Submitting as RFC to get some >>> > > > > > > feedback and >>> > > > > > > start some conversation on eBPF/XDP in OVN. >>> > > > > > > >>> > > > > > Hi Numan, >>> > > > > > >>> > > > > > This is really cool. 
It demonstrates how OVN could leverage >>> > > > > > eBPF/XDP. >>> > > > > > >>> > > > > > On the other hand, for the port-security feature in XDP, I >>> > > > > > keep >>> > > thinking >>> > > > > > about the scenarios and it is still not very clear to me. One >>> > > advantage I >>> > > > > > can think of is to prevent DOS attacks from VM/Pod when >>> > > > > > invalid >>> > > IP/MAC are >>> > > > > > used, XDP may perform better and drop packets with lower CPU >>> > > > > > cost >>> > > > > > (comparing with OVS kernel datapath). However, I am also >>> > > > > > wondering >>> > > why >>> > > > > > would a attacker use invalid IP/MAC for DOS attacks? Do you >>> > > > > > have >>> > > some more >>> > > > > > thoughts about the use cases? >>> > > > >>> > > > My idea was to demonstrate the use of eBPF/XDP and port security >>> > > > checks were easy to do >>> > > > before the packet hits the OVS pipeline. >>> > > > >>> > > Understand. It is indeed a great demonstration. >>> > > >>> > > > If we were to move the port security check to XDP, then the only >>> > > > advantage we would be getting >>> > > > in my opinion is to remove the corresponding ingress port >>> > > > security >>> > > > check related OF rules from ovs-vswitchd, thereby decreasing some >>> > > > looks up during >>> > > > flow translation. >>> > > > >>> > > For slow path, it might reduce the lookups in two tables, but >>> > > considering >>> > > that we have tens of tables, this cost may be negligible? >>> > > For fast path, there is no impact on the megaflow cache. >>> > > >>> > > > I'm not sure why an attacker would use invalid IP/MAC for DOS >>> > > > attacks. >>> > > > But from what I know, ovn-kubernetes do want to restrict each POD >>> > > > to >>> > > > its assigned IP/MAC. >>> > > > >>> > > Yes, restricting pods to use assigned IP/MAC is for port security, >>> > > which >>> > > is implemented by the port-security flows. I was talking about DOS >>> > > attacks >>> > > just to imagine a use case that utilizes the performance advantage >>> > > of XDP. >>> > > If it is just to detect and drop a regular amount of packets that >>> > > try to >>> > > use fake IP/MAC to circumvent security policies (ACLs), it doesn't >>> > > reflect >>> > > the benefit of XDP. >>> > > >>> > > > And do you have any performance results >>> > > > > > comparing with the current OVS implementation? >>> > > > >>> > > > I didn't do any scale/performance related tests. >>> > > > >>> > > > If we were to move port security feature to XDP in OVN, then I >>> > > > think we >>> > > need to >>> > > > - Complete the TODO's like adding IPv6 and ARP/ND related >>> > > > checks >>> > > > - Do some scale testing and see whether its reducing memory >>> > > > footprint of ovs-vswitchd and ovn-controller because of the >>> > > > reduction >>> > > > in OF rules >>> > > > >>> > > >>> > > Maybe I am wrong, but I think port-security flows are only related >>> > > to >>> > > local LSPs on each node, which doesn't contribute much to the >>> > > OVS/ovn-controller memory footprint, and thanks to your patches >>> > > that moves >>> > > port-security flow generation from northd to ovn-controller, the >>> > > central >>> > > components are already out of the picture of the port-security >>> > > related >>> > > costs. So I guess we won't see obvious differences in scale tests. >>> > > >>> > > > > > >>> > > > > > Another question is, would it work with smart NIC HW-offload, >>> > > > > > where >>> > > VF >>> > > > > > representer ports are added to OVS on the smart NIC? 
I guess >>> > > > > > XDP >>> > > doesn't >>> > > > > > support representer port, right? >>> > > > >>> > > > I think so. I don't have much experience/knowledge on this. From >>> > > > what >>> > > > I understand, if datapath flows are offloaded and since XDP is >>> > > > not >>> > > > offloaded, the xdo checks will be totally missed. >>> > > > So if XDP is to be used, then offloading should be disabled. >>> > > > >>> > > >>> > > Agree, although I did hope it could help for HW offload enabled >>> > > environments to mitigate the scenarios when packets would miss the >>> > > HW flow >>> > > cache. >>> > > >>> > > Thanks, >>> > > Han >>> > > >>> > > > Thanks >>> > > > Numan >>> > > > >>> > > > > > >>> > > > > > Thanks, >>> > > > > > Han >>> > > > > > >>> > > > > > > In order to attach and detach xdp programs, libxdp [1] and >>> > > > > > > libbpf >>> > > is >>> > > > > > used. >>> > > > > > > >>> > > > > > > To test it out locally, please install libxdp-devel and >>> > > libbpf-devel >>> > > > > > > and the compile OVN first and then compile ovn_xdp by >>> > > > > > > running "make >>> > > > > > > bpf". Copy ovn_xdp.o to either /usr/share/ovn/ or >>> > > /usr/local/share/ovn/ >>> > > > > > > >>> > > > > > > >>> > > > > > > Numan Siddique (2): >>> > > > > > > RFC: Add basic xdp/eBPF support in OVN. >>> > > > > > > RFC: ovn-controller: Attach XDP progs to the VIFs of the >>> > > > > > > logical >>> > > > > > > ports. >>> > > > > > > >>> > > > > > > Makefile.am | 6 +- >>> > > > > > > bpf/.gitignore | 5 + >>> > > > > > > bpf/automake.mk | 23 +++ >>> > > > > > > bpf/ovn_xdp.c | 156 +++++++++++++++ >>> > > > > > > configure.ac | 2 + >>> > > > > > > controller/automake.mk | 4 +- >>> > > > > > > controller/binding.c | 45 +++-- >>> > > > > > > controller/binding.h | 7 + >>> > > > > > > controller/ovn-controller.c | 79 +++++++- >>> > > > > > > controller/xdp.c | 389 >>> > > ++++++++++++++++++++++++++++++++++++ >>> > > > > > > controller/xdp.h | 41 ++++ >>> > > > > > > m4/ovn.m4 | 20 ++ >>> > > > > > > tests/automake.mk | 1 + >>> > > > > > > 13 files changed, 753 insertions(+), 25 deletions(-) >>> > > > > > > create mode 100644 bpf/.gitignore >>> > > > > > > create mode 100644 bpf/automake.mk >>> > > > > > > create mode 100644 bpf/ovn_xdp.c >>> > > > > > > create mode 100644 controller/xdp.c >>> > > > > > > create mode 100644 controller/xdp.h >>> > > > > > > >>> > > > > > > -- >>> > > > > > > 2.35.3 >>> > > > > > > >>> > > > > > > _______________________________________________ >>> > > > > > > dev mailing list >>> > > > > > > dev@openvswitch.org >>> > > > > > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev >>> > > > > > _______________________________________________ >>> > > > > > dev mailing list >>> > > > > > dev@openvswitch.org >>> > > > > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev >>> > > > > > >>> > > > > >>> > > > > >>> > > > > -- >>> > > > > 刘梦馨 >>> > > > > Blog: http://oilbeater.com >>> > > > > Weibo: @oilbeater <http://weibo.com/oilbeater> >>> > > > > _______________________________________________ >>> > > > > dev mailing list >>> > > > > dev@openvswitch.org >>> > > > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev >>> > > >>> > >>> > >>> >>> >> >> -- >> 刘梦馨 >> Blog: http://oilbeater.com >> Weibo: @oilbeater <http://weibo.com/oilbeater> >> > > > -- > 刘梦馨 > Blog: http://oilbeater.com > Weibo: @oilbeater <http://weibo.com/oilbeater> >
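[Editor's note] The attach/detach side that the second RFC patch ("ovn-controller: Attach XDP progs to the VIFs of the logical ports") handles is done through libxdp and libbpf, as the cover letter notes. Below is a minimal stand-alone sketch of that flow under stated assumptions: the interface name "tap0" and the object path "ovn_xdp.o" are made up for illustration, and the patch's controller/xdp.c will differ in structure and error handling.

    /* Minimal libxdp attach/detach sketch (assumes libxdp's
     * xdp_program__open_file()/xdp_program__attach() API; link with
     * -lxdp -lbpf).  Not the code from controller/xdp.c. */
    #include <stdio.h>
    #include <net/if.h>
    #include <xdp/libxdp.h>

    int main(void)
    {
        int ifindex = if_nametoindex("tap0");   /* VIF to protect (hypothetical name) */
        if (!ifindex) {
            perror("if_nametoindex");
            return 1;
        }

        /* Load the compiled XDP object file. */
        struct xdp_program *prog = xdp_program__open_file("ovn_xdp.o", NULL, NULL);
        if (libxdp_get_error(prog)) {
            fprintf(stderr, "failed to open ovn_xdp.o\n");
            return 1;
        }

        /* Attach; XDP_MODE_UNSPEC lets libxdp pick native driver mode when
         * available and fall back to generic (SKB) mode otherwise. */
        if (xdp_program__attach(prog, ifindex, XDP_MODE_UNSPEC, 0)) {
            fprintf(stderr, "failed to attach XDP program\n");
            xdp_program__close(prog);
            return 1;
        }

        /* ... when the port is released: detach and clean up. */
        xdp_program__detach(prog, ifindex, XDP_MODE_UNSPEC, 0);
        xdp_program__close(prog);
        return 0;
    }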