Message ID | 1594224636-42337-1-git-send-email-u9012063@gmail.com |
---|---|
Series | Add VxLAN encap support for tc offload. |
On 7/8/20 6:10 PM, William Tu wrote:
> The patch adds VxLAN encap tc-offload support. The flow format of the userspace
> datapath, dpif-netdev, differs from that of the kernel datapath in case of tunnel
> encap. Unlike the kernel, dpif-netdev does not use set and output actions, but uses
> a single clone action with all the tunnel info nested inside. See the example below:
> actions:clone(tnl_push(tnl_port(5),
> header(size=50,type=4,eth(dst=06:1d:6e:a3:f1:61,src=26:df:25:f6:7b:4f,dl_type=0x0800),
> ipv4(src=172.31.1.100,dst=172.31.1.1,proto=17,tos=0,ttl=64,frag=0x4000),
> udp(src=0,dst=4789,csum=0x0),
> vxlan(flags=0x8000000,vni=0x0)),out_port(2)
> ), 3)
>
> The patch parses the above tunnel encap format and passes it to tc for
> offloading the VxLAN tunnel. The idea is similar to the recent dpdk
> offload patchset:
> netdev-offload-dpdk: Support offload of clone tnl_push/output actions
>
> Example of tc format:
> $ tc -s filter show dev ovs-p1 ingress
> filter protocol ip pref 3 flower chain 0
> filter protocol ip pref 3 flower chain 0 handle 0x1
>   dst_mac 56:2a:1f:3c:bb:f2
>   src_mac 96:0c:a7:b0:60:a4
>   eth_type ipv4
>   ip_tos 0/0x3
>   ip_flags nofrag
>   skip_hw
>   not_in_hw
>     action order 1: tunnel_key set
>     src_ip 172.31.1.100
>     dst_ip 172.31.1.1
>     key_id 0
>     dst_port 4789
>     nocsum
>     ttl 64 pipe
>     index 2 ref 1 bind 1 installed 0 sec used 0 sec
>     Action statistics:
>     Sent 168 bytes 2 pkt (dropped 0, overlimits 0 requeues 0)
>     backlog 0b 0p requeues 0
>     no_percpu
>
>     action order 2: mirred (Egress Redirect to device ovs-p0) stolen
>     index 2 ref 1 bind 1 installed 0 sec used 0 sec
>     Action statistics:
>     Sent 168 bytes 2 pkt (dropped 0, overlimits 0 requeues 0)
>     backlog 0b 0p requeues 0
>     cookie b46e99079448ce581d0fe7a9853c0bb5
>     no_percpu
>
> Ilya Maximets (2):
>   netdev: Allow storing dpif type into netdev structure.
>   netdev-offload: Use dpif type instead of class.
>
> William Tu (1):
>   netdev-offload-tc: Add VxLAN encap support.
>
>  lib/dpif-netdev.c             |  15 ++---
>  lib/dpif-netlink.c            |  23 ++++----
>  lib/dpif.c                    |  21 ++++---
>  lib/netdev-offload-dpdk.c     |  17 +++---
>  lib/netdev-offload-tc.c       | 124 +++++++++++++++++++++++++++++++++++++++++-
>  lib/netdev-offload.c          |  52 +++++++++---------
>  lib/netdev-offload.h          |  16 +++---
>  lib/netdev-provider.h         |   3 +-
>  lib/netdev.c                  |  16 ++++++
>  lib/netdev.h                  |   2 +
>  ofproto/ofproto-dpif-upcall.c |   5 +-
>  11 files changed, 217 insertions(+), 77 deletions(-)
>

Hi.

That is an interesting thing. I didn't look at the code, but I have a question.
IIUC, you're running the userspace datapath with some linux ports and the linux_tc
offloading provider enabled for them. I tried this combination previously
and it has an issue: with a RAW socket open, even if the packet was redirected
by TC to another OVS port, we will still receive it via the RAW socket, at least
on the destination port. I'm not sure how to work around this issue.
Do you have any thoughts?

Or are you using HW offloading with afxdp and/or the skip_sw flag? I guess there
should be no such issue in this case if the packet never reaches the kernel tc.

BTW, I merged the patch-set from Eli, so the first two patches are in the repository
now.

Best regards, Ilya Maximets.
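For readers following the cover letter, the mapping from the nested tnl_push header to the `tunnel_key set` parameters shown in the tc output can be sketched roughly as below. This is only an illustration in Python and not the patch's C code: the helper names `fields` and `to_tunnel_key` are hypothetical, and treating `csum=0x0` as the `nocsum` case is an assumption made for the example.

```python
import re

# Datapath action string from the cover letter, joined onto one line.
ACTION = (
    "clone(tnl_push(tnl_port(5),"
    "header(size=50,type=4,"
    "eth(dst=06:1d:6e:a3:f1:61,src=26:df:25:f6:7b:4f,dl_type=0x0800),"
    "ipv4(src=172.31.1.100,dst=172.31.1.1,proto=17,tos=0,ttl=64,frag=0x4000),"
    "udp(src=0,dst=4789,csum=0x0),"
    "vxlan(flags=0x8000000,vni=0x0)),out_port(2)),3)"
)


def fields(action, layer):
    """Return the key=value pairs of one flat header layer (e.g. 'ipv4')."""
    match = re.search(r"\b" + re.escape(layer) + r"\(([^()]*)\)", action)
    if match is None:
        raise ValueError("layer %r not found" % layer)
    return dict(item.split("=", 1) for item in match.group(1).split(","))


def to_tunnel_key(action):
    """Hypothetical helper: collect the values tc's tunnel_key action needs."""
    ipv4 = fields(action, "ipv4")
    udp = fields(action, "udp")
    vxlan = fields(action, "vxlan")
    out_port = re.search(r"out_port\((\d+)\)", action)
    return {
        "src_ip": ipv4["src"],
        "dst_ip": ipv4["dst"],
        "ttl": int(ipv4["ttl"]),
        "tos": int(ipv4["tos"]),
        "dst_port": int(udp["dst"]),
        "key_id": int(vxlan["vni"], 16),
        # Assumption for this sketch: a zero UDP checksum maps to "nocsum".
        "csum": int(udp["csum"], 16) != 0,
        # Datapath port number that the mirred redirect would target.
        "out_port": int(out_port.group(1)) if out_port else None,
    }
```

Calling `to_tunnel_key(ACTION)` yields the same src_ip, dst_ip, key_id, dst_port and ttl values that appear under `action order 1` in the tc dump above.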
On Wed, Jul 08, 2020 at 07:55:58PM +0200, Ilya Maximets wrote:
> On 7/8/20 6:10 PM, William Tu wrote:
> > The patch adds VxLAN encap tc-offload support. The flow format of the userspace
> > datapath, dpif-netdev, differs from that of the kernel datapath in case of tunnel
> > encap. Unlike the kernel, dpif-netdev does not use set and output actions, but uses
> > a single clone action with all the tunnel info nested inside. See the example below:
> > actions:clone(tnl_push(tnl_port(5),
> > header(size=50,type=4,eth(dst=06:1d:6e:a3:f1:61,src=26:df:25:f6:7b:4f,dl_type=0x0800),
> > ipv4(src=172.31.1.100,dst=172.31.1.1,proto=17,tos=0,ttl=64,frag=0x4000),
> > udp(src=0,dst=4789,csum=0x0),
> > vxlan(flags=0x8000000,vni=0x0)),out_port(2)
> > ), 3)
> >
> > The patch parses the above tunnel encap format and passes it to tc for
> > offloading the VxLAN tunnel. The idea is similar to the recent dpdk
> > offload patchset:
> > netdev-offload-dpdk: Support offload of clone tnl_push/output actions

snip

>
> Hi.
>
> That is an interesting thing. I didn't look at the code, but I have a question.
> IIUC, you're running the userspace datapath with some linux ports and the linux_tc
> offloading provider enabled for them. I tried this combination previously
> and it has an issue: with a RAW socket open, even if the packet was redirected
> by TC to another OVS port, we will still receive it via the RAW socket, at least
> on the destination port. I'm not sure how to work around this issue.
> Do you have any thoughts?

Yes, I encountered the same issue. IIUC, the reason is that when a raw socket
is registered, the kernel's __netif_receive_skb_core() delivers the packet to the
raw socket first and only then calls sch_handle_ingress(). So even though we
return TC_ACT_SHOT at the TC layer, the packet has already been delivered to the
raw socket and seen by OVS. This causes my ping test to report duplicates:

64 bytes from 10.1.1.2: icmp_seq=7 ttl=64 time=0.503 ms (DUP!)
64 bytes from 10.1.1.2: icmp_seq=7 ttl=64 time=0.508 ms (DUP!)
Even using an afxdp socket has the same issue, because the skb delivery point,
do_xdp_generic(), is also before tc.

>
> Or are you using HW offloading with afxdp and/or the skip_sw flag? I guess there
> should be no such issue in this case if the packet never reaches the kernel tc.
>

I don't have a solution to this problem, but my plan is this: for testing, I'm
using the software tc-flower (skip_hw), and once everything works, we should
use HW offload (skip_sw) with afxdp.

> BTW, I merged the patch-set from Eli, so the first two patches are in the repository
> now.
>

Thanks for your comment, I will work on v2.

William
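The receive-path ordering described in this reply can be illustrated with a toy model. This is a Python sketch and not kernel code: `receive` and `simulate` are hypothetical names, the real path has many more steps, and the only thing modelled is that the raw-socket copy is made before the tc ingress verdict is known. The `TC_ACT_OK` and `TC_ACT_SHOT` values match the Linux definitions.

```python
TC_ACT_OK = 0
TC_ACT_SHOT = 2


def receive(packet, raw_socket, tc_ingress, stack):
    """Model one packet arriving on a port, in the order described above."""
    raw_socket.append(packet)        # 1. the raw socket gets its copy first
    verdict = tc_ingress(packet)     # 2. tc ingress runs only afterwards
    if verdict != TC_ACT_SHOT:
        stack.append(packet)         # 3. otherwise the packet continues upward


def simulate(packet):
    """Return (packets sent out of the destination port, packets left for the stack)."""
    egress = []       # what leaves on the destination port
    raw_socket = []   # what the OVS userspace datapath reads
    stack = []

    def offloaded_flow(pkt):
        egress.append(pkt)       # the tc rule forwards the packet itself...
        return TC_ACT_SHOT       # ...and ends kernel processing for it

    receive(packet, raw_socket, offloaded_flow, stack)
    # OVS userspace has still seen the packet and forwards it a second time.
    egress.extend(raw_socket)
    return egress, stack
```

In this model a single incoming packet ends up on the egress list twice, once from the tc rule and once from userspace, which is the shape of the duplicate ping replies reported above.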