
[net-next,v2,0/4] net/sched: Introduce tc connection tracking

Message ID 1561038141-31370-1-git-send-email-paulb@mellanox.com

Message

Paul Blakey June 20, 2019, 1:42 p.m. UTC
Hi,

This patch series adds connection tracking capabilities in tc sw datapath.
It does so via a new tc action, called act_ct, and new tc flower classifier matching
on conntrack state, mark and label.

Usage is as follows:
$ tc qdisc add dev ens1f0_0 ingress
$ tc qdisc add dev ens1f0_1 ingress

$ tc filter add dev ens1f0_0 ingress \
  prio 1 chain 0 proto ip \
  flower ip_proto tcp ct_state -trk \
  action ct zone 2 pipe \
  action goto chain 2
$ tc filter add dev ens1f0_0 ingress \
  prio 1 chain 2 proto ip \
  flower ct_state +trk+new \
  action ct zone 2 commit mark 0xbb nat src addr 5.5.5.7 pipe \
  action mirred egress redirect dev ens1f0_1
$ tc filter add dev ens1f0_0 ingress \
  prio 1 chain 2 proto ip \
  flower ct_zone 2 ct_mark 0xbb ct_state +trk+est \
  action ct nat pipe \
  action mirred egress redirect dev ens1f0_1

$ tc filter add dev ens1f0_1 ingress \
  prio 1 chain 0 proto ip \
  flower ip_proto tcp ct_state -trk \
  action ct zone 2 pipe \
  action goto chain 1
$ tc filter add dev ens1f0_1 ingress \
  prio 1 chain 1 proto ip \
  flower ct_zone 2 ct_mark 0xbb ct_state +trk+est \
  action ct nat pipe \
  action mirred egress redirect dev ens1f0_0

The pattern used in the design here closely resembles OvS, as the plan is to also offload
OvS conntrack rules to tc. OvS datapath rules use its recirculation mechanism to send
specific packets to conntrack, and return with the new conntrack state (ct_state) on some other recirc_id
to be matched again (we use goto chain for this).

This results in the following OvS datapath rules:

recirc_id(0),in_port(ens1f0_0),ct_state(-trk),... actions:ct(zone=2),recirc(2)
recirc_id(2),in_port(ens1f0_0),ct_state(+new+trk),ct_mark(0xbb),... actions:ct(commit,zone=2,nat(src=5.5.5.7),mark=0xbb),ens1f0_1
recirc_id(2),in_port(ens1f0_0),ct_state(+est+trk),ct_mark(0xbb),... actions:ct(zone=2,nat),ens1f0_1

recirc_id(0),in_port(ens1f0_1),ct_state(-trk),... actions:ct(zone=2),recirc(1)
recirc_id(1),in_port(ens1f0_1),ct_state(+est+trk),... actions:ct(zone=2,nat),ens1f0_0
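
Since each recirc_id above corresponds to a tc chain, one way to check which
stage packets reach is to dump each chain with its hit counters. The
following is only a minimal sketch: the run and show_chain helpers and the
DRY_RUN variable are made up for illustration, and by default the script
just prints the commands instead of executing them.

```shell
#!/bin/sh
# Sketch: list the filters and statistics of each tc chain used above.
# Assumption: DRY_RUN=1 (the default here) only prints the commands;
# set DRY_RUN=0 and run as root to actually execute them.
DRY_RUN="${DRY_RUN:-1}"

# run CMD...: print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# show_chain DEV CHAIN: dump one chain of the ingress qdisc with counters.
show_chain() {
    run tc -s filter show dev "$1" ingress chain "$2"
}

show_chain ens1f0_0 0   # untracked packets, before the ct action
show_chain ens1f0_0 2   # packets re-matched after ct zone 2
show_chain ens1f0_1 0
show_chain ens1f0_1 1
```

If the counters on chain 2 (or chain 1 for ens1f0_1) stay at zero, packets
are not coming back from the ct action into the later chain.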

Changelog:
	See individual patches.

Paul Blakey (4):
  net/sched: Introduce action ct
  net/flow_dissector: add connection tracking dissection
  net/sched: cls_flower: Add matching on conntrack info
  tc-tests: Add tc action ct tests

 include/linux/skbuff.h                             |  10 +
 include/net/flow_dissector.h                       |  15 +
 include/net/flow_offload.h                         |   5 +
 include/net/tc_act/tc_ct.h                         |  64 ++
 include/uapi/linux/pkt_cls.h                       |  17 +
 include/uapi/linux/tc_act/tc_ct.h                  |  41 +
 net/core/flow_dissector.c                          |  44 +
 net/sched/Kconfig                                  |  11 +
 net/sched/Makefile                                 |   1 +
 net/sched/act_ct.c                                 | 978 +++++++++++++++++++++
 net/sched/cls_api.c                                |   5 +
 net/sched/cls_flower.c                             | 127 ++-
 .../selftests/tc-testing/tc-tests/actions/ct.json  | 314 +++++++
 13 files changed, 1627 insertions(+), 5 deletions(-)
 create mode 100644 include/net/tc_act/tc_ct.h
 create mode 100644 include/uapi/linux/tc_act/tc_ct.h
 create mode 100644 net/sched/act_ct.c
 create mode 100644 tools/testing/selftests/tc-testing/tc-tests/actions/ct.json

Comments

Cong Wang June 24, 2019, 5:59 p.m. UTC | #1
On Thu, Jun 20, 2019 at 6:43 AM Paul Blakey <paulb@mellanox.com> wrote:
>
> Hi,
>
> This patch series adds connection tracking capabilities in tc sw datapath.
> It does so via a new tc action, called act_ct, and new tc flower classifier matching
> on conntrack state, mark and label.

Thanks for more detailed description here.

I still don't see why we have to do this in L2, mind to be more specific?

IOW, if you really want to manipulate conntrack info and use it for
matching, why not do it in netfilter layer as it is where conntrack is?

BTW, if the cls_flower ct_state matching is not in upstream yet, please
try to push it first, as it is a justification of this patchset.

Thanks.

Paul Blakey June 30, 2019, 8:43 a.m. UTC | #2
On 6/24/2019 8:59 PM, Cong Wang wrote:

> On Thu, Jun 20, 2019 at 6:43 AM Paul Blakey <paulb@mellanox.com> wrote:
>> Hi,
>>
>> This patch series adds connection tracking capabilities in tc sw datapath.
>> It does so via a new tc action, called act_ct, and new tc flower classifier matching
>> on conntrack state, mark and label.
> Thanks for more detailed description here.
>
> I still don't see why we have to do this in L2, mind to be more specific?

tc is a complete datapath, and does its routing/manipulation before
the kernel stack (here the hooks are on the device ingress qdisc). For
example, take this simple namespace setup:

#setup 2 reps
sudo ip netns add ns0
sudo ip netns add ns1
sudo ip link add vm type veth peer name vm_rep
sudo ip link add vm2 type veth peer name vm2_rep
sudo ip link set vm netns ns0
sudo ip link set vm2 netns ns1
sudo ip netns exec ns0 ifconfig vm 3.3.3.3/24 up
sudo ip netns exec ns1 ifconfig vm2 3.3.3.4/24 up
sudo ifconfig vm_rep up
sudo ifconfig vm2_rep up
sudo tc qdisc add dev vm_rep ingress
sudo tc qdisc add dev vm2_rep ingress

#outbound
sudo tc filter add dev vm_rep ingress proto ip chain 0 prio 1 flower \
  ct_state -trk action mirred egress redirect dev vm2_rep
sudo tc filter add dev vm_rep ingress proto ip chain 1 prio 1 flower \
  ct_state +trk+new action ct commit pipe \
  action mirred egress redirect dev vm2_rep
sudo tc filter add dev vm_rep ingress proto ip chain 1 prio 1 flower \
  ct_state +trk+est action mirred egress redirect dev vm2_rep

#inbound
sudo tc filter add dev vm2_rep ingress proto ip chain 0 prio 1 flower \
  ct_state -trk action mirred egress redirect dev vm_rep
sudo tc filter add dev vm2_rep ingress proto ip chain 1 prio 1 flower \
  ct_state +trk+est action mirred egress redirect dev vm_rep

#handle arps
sudo tc filter add dev vm2_rep ingress proto arp chain 0 prio 2 flower \
  action mirred egress redirect dev vm_rep
sudo tc filter add dev vm_rep ingress proto arp chain 0 prio 2 flower \
  action mirred egress redirect dev vm2_rep

#run traffic
sudo timeout 20 ip netns exec ns1 iperf -s&
sudo ip netns exec ns0 iperf -c 3.3.3.4 -t 10
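
To tear the setup down afterwards, deleting the two namespaces should be
enough: the veth ends inside them are destroyed together with their peers
(vm_rep, vm2_rep), and the ingress qdiscs and filters go away with the
devices. A minimal sketch, where the run and cleanup helpers and the DRY_RUN
variable are hypothetical and only print the commands by default:

```shell
#!/bin/sh
# Sketch: remove the namespaces created by the setup above.
# Assumption: DRY_RUN=1 (the default here) only prints the commands;
# set DRY_RUN=0 and run as root to actually execute them.
DRY_RUN="${DRY_RUN:-1}"

# run CMD...: print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# cleanup: delete both namespaces; their veth devices and peers go with them.
cleanup() {
    run ip netns del ns0
    run ip netns del ns1
}

cleanup
```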


The traffic is handled in the tc datapath layer, and the user here decides
how to route the packets.

In a real-world example, we are going to use it with SR-IOV, where the tc
rules are on representors and the vm devices above are SR-IOV VFs attached
to VMs. We also don't want to send every packet to conntrack, only those
that we choose, and we might manipulate the packet before sending it to
conntrack, for example with the tc pedit action in a router setup
(changing MACs, IPs).
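
As a rough illustration of manipulating a packet before conntrack, the
chain 0 rule for vm_rep could be written along the following lines, in
place of the one in the setup above: rewrite the destination MAC with
pedit, send the packet to conntrack, then continue in chain 1 (the same
ct + goto chain pattern as in the cover letter). This is a sketch only and
has not been run against this series. The MAC address is a placeholder,
and the run and add_rule helpers and the DRY_RUN variable are made up for
the example; by default the command is only printed.

```shell
#!/bin/sh
# Sketch: pedit (MAC rewrite) -> ct -> goto chain 1 on untracked packets.
# Assumptions: NEXT_HOP_MAC is a placeholder address, and DRY_RUN=1 (the
# default here) only prints the command; set DRY_RUN=0 and run as root to
# actually install the filter.
DRY_RUN="${DRY_RUN:-1}"
NEXT_HOP_MAC="${NEXT_HOP_MAC:-02:00:00:00:00:01}"

# run CMD...: print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# add_rule: untracked IP packets get their destination MAC rewritten,
# are passed through conntrack, and are then re-matched in chain 1.
add_rule() {
    run tc filter add dev vm_rep ingress proto ip chain 0 prio 1 \
        flower ct_state -trk \
        action pedit ex munge eth dst set "$NEXT_HOP_MAC" pipe \
        action ct pipe \
        action goto chain 1
}

add_rule
```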

>
> IOW, if you really want to manipulate conntrack info and use it for
> matching, why not do it in netfilter layer as it is where conntrack is?
>
> BTW, if the cls_flower ct_state matching is not in upstream yet, please
> try to push it first, as it is a justification of this patchset.
>
> Thanks.

It's patch 3/4 of this patch set; I can move it to be first.