Message ID: 1589464110-7571-1-git-send-email-paulb@mellanox.com
Series: net/sched: act_ct: Add support for specifying tuple offload policy
On 14/05/2020 14:48, Paul Blakey wrote:
> To avoid conflicting policies, the policy is applied per zone on the first
> act ct instance for that zone, and must be repeated in all further act ct
> instances of the same zone.
Is the scope of this the entire zone, or just offload of that zone to a
specific device?
Either way, the need to repeat the policy on every tc command suggests
that there really ought to instead be a separate API for configuring
conntrack offload policy, either per zone or per (zone, device) pair,
as appropriate.

-ed
Thu, May 14, 2020 at 04:04:02PM CEST, ecree@solarflare.com wrote:
> On 14/05/2020 14:48, Paul Blakey wrote:
>> To avoid conflicting policies, the policy is applied per zone on the first
>> act ct instance for that zone, and must be repeated in all further act ct
>> instances of the same zone.
> Is the scope of this the entire zone, or just offload of that zone to a
> specific device?
> Either way, the need to repeat the policy on every tc command suggests
> that there really ought to instead be a separate API for configuring
> conntrack offload policy, either per zone or per (zone, device) pair,
> as appropriate.

You don't have to repeat. You specify it in the first one, in the rest
you omit it.

>
> -ed
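[For readers following along: the specify-once-then-omit flow Jiri describes
might look roughly like the sketch below. The `offload_timeout`/`offload_pkts`
attribute names are illustrative placeholders, not the patchset's actual uAPI.]

```shell
# Hypothetical syntax; the policy attribute names are placeholders.
# The first act_ct instance for zone 1 sets the offload policy for the
# whole zone:
tc filter add dev eth0 ingress proto ip flower \
    action ct zone 1 offload_timeout 120 offload_pkts 10 \
    action goto chain 1

# Later act_ct instances in the same zone omit the policy and inherit
# whatever zone 1 was configured with above:
tc filter add dev eth1 ingress proto ip flower \
    action ct zone 1 \
    action goto chain 1
```

This is what makes command ordering significant: if the two commands were
issued in the opposite order, the zone would first be created with the
default policy.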
On 14/05/2020 15:49, Jiri Pirko wrote:
> Thu, May 14, 2020 at 04:04:02PM CEST, ecree@solarflare.com wrote:
>> Either way, the need to repeat the policy on every tc command suggests
>> that there really ought to instead be a separate API for configuring
>> conntrack offload policy, either per zone or per (zone, device) pair,
>> as appropriate.
> You don't have to repeat. You specify it in the first one, in the rest
> you omit it.
Ok, well (a) the commit message needs changing to make that clear, and
(b) that kind of implicit action-at-a-distance slightly bothers me.
If you don't repeat, then the order of tc commands matters, and any
program (e.g. OvS) generating these commands will need to either keep
track of which zones it's configured policy for already, or just
repeat on every command just in case.
It really doesn't feel like an orthogonal, Unixy API to me.

<offtopic rambliness="very">
TBH I think that when a tc rule with a ct action is offloaded, drivers
should get three callbacks in order:
1) a CT zone callback (if the CT zone is 'new')
2) an action callback, including a pointer to the CT zone info (in case
   the driver chose to ignore the CT zone callback because it had no
   offloading work to do at that point) (if the action is 'new')
3) a rule callback, including a pointer to the action info (in case the
   driver ignored the action creation).
And each of these should be drivable directly from userspace as well as
being called automatically by the level below it.
Currently we have (2) as a distinct entity in TC, but no-one offloads it
(and as I've ranted before, that makes a mess of stats), and AIUI it's
not called when the user creates a rule, only when using the 'tc action'
command directly. And (1) doesn't exist at all; drivers just have to
notice the first time a tc ct action they're offloading mentions a given
zone, and call into nf_flow_table_offload to register for tracked conns
in that zone.

I feel that this hierarchical 'offload dependencies first' model would
make drivers simpler and more robust, as well as helping to ensure
different drivers share a consistent interpretation of the API.
RFC on the above? Obviously I'm not likely to start implementing it
until after we have a TC-supporting sfc driver upstream (it's coming
Real Soon Now™, I swear!) but I thought it worthwhile to throw the
design up for discussion earlier rather than later.
</offtopic>

-ed
On 14/05/2020 18:28, Edward Cree wrote:
> On 14/05/2020 15:49, Jiri Pirko wrote:
>> Thu, May 14, 2020 at 04:04:02PM CEST, ecree@solarflare.com wrote:
>>> Either way, the need to repeat the policy on every tc command suggests
>>> that there really ought to instead be a separate API for configuring
>>> conntrack offload policy, either per zone or per (zone, device) pair,
>>> as appropriate.
>> You don't have to repeat. You specify it in the first one, in the rest
>> you omit it.
> Ok, well (a) the commit message needs changing to make that clear, and
> (b) that kind of implicit action-at-a-distance slightly bothers me.
> If you don't repeat, then the order of tc commands matters, and any
> program (e.g. OvS) generating these commands will need to either keep
> track of which zones it's configured policy for already, or just
> repeat on every command just in case.
> It really doesn't feel like an orthogonal, Unixy API to me.

You were right that a tc user needs to supply the same policy for all
ct action instances of the same zone (not per zone/dev).

Actually, in my previous internal version of this patchset, if the user
doesn't specify a policy, it uses whatever was configured before; and if
the user does specify one, it must match the previously configured table
parameter or override a previously unconfigured table parameter (or a
new table).

But we think, as you pointed out, explicit as here is better; there is
just no API to configure the flow table currently, so we suggested this.
Do you have any suggestion for an API that would be better?

Yes, in OvS we planned on submitting the same global policy for all ct
actions.

Paul.
On 18/05/2020 17:17, Paul Blakey wrote:
> But we think, as you pointed out, explicit as here is better, there is
> just no API to configure the flow table currently so we suggested this.
> Do you have any suggestion for an API that would be better?
I see two possible approaches. We could either say "conntrack is
part of netfilter, so this should be an nftnetlink API", or we
could say "this is about configuring TC offload (of conntracks),
so it belongs in a TC command". I lean towards the latter, mainly
so that I don't have to install & learn netfilter commands (the
current conntrack offload can be enabled without touching them).
So it'd be something like "tc ct add zone 1 timeout 120 pkts 10",
and if a 'tc filter add' or 'tc action add' references a ct zone
that hasn't been 'tc ct add'ed, it gets automatically added with
the default policy (and if you come along later and try to 'tc ct
add' it you get an EBUSY or EEXIST or something).

WDYT?

-ed
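[For readers following along: Ed's proposed explicit-object API might look
like the sketch below. Only the first command is taken from his message;
the rest is a hypothetical extrapolation — no `tc ct` object exists.]

```shell
# Hypothetical 'tc ct' object API, extrapolated from the proposal above.
tc ct add zone 1 timeout 120 pkts 10   # explicitly create zone 1 with policy

# A filter referencing a zone that hasn't been 'tc ct add'ed would
# auto-create that zone with the default policy:
tc filter add dev eth0 ingress proto ip flower action ct zone 2

# Trying to 'tc ct add' an already-existing zone would then fail
# (EBUSY or EEXIST "or something", per the proposal):
tc ct add zone 2 timeout 60 pkts 5
```

The appeal is that policy lives on a named, independently managed object
rather than piggybacking on whichever act_ct instance happens to be created
first.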
Mon, May 18, 2020 at 06:48:46PM CEST, ecree@solarflare.com wrote:
> On 18/05/2020 17:17, Paul Blakey wrote:
>> But we think, as you pointed out, explicit as here is better, there is
>> just no API to configure the flow table currently so we suggested this.
>> Do you have any suggestion for an API that would be better?
> I see two possible approaches. We could either say "conntrack is
> part of netfilter, so this should be an nftnetlink API", or we
> could say "this is about configuring TC offload (of conntracks),
> so it belongs in a TC command". I lean towards the latter, mainly
> so that I don't have to install & learn netfilter commands (the
> current conntrack offload can be enabled without touching them).
> So it'd be something like "tc ct add zone 1 timeout 120 pkts 10",
> and if a 'tc filter add' or 'tc action add' references a ct zone
> that hasn't been 'tc ct add'ed, it gets automatically added with
> the default policy (and if you come along later and try to 'tc ct
> add' it you get an EBUSY or EEXIST or something).

Is it worth having an object just for this particular purpose? In the
past I was trying to push a tc block object that could be added/removed
and used to insert filters w/o being attached to any qdisc. This was
frowned upon and refused because the existence of a block does not have
any meaning w/o being attached.

What you suggest with zone sounds quite similar. More to that, it is
related only to act_ct. Is it a good idea to have a common object in TC
which is actually used as an internal part of act_ct only?

To be honest, I don't like the idea.

>
> WDYT?
>
> -ed
On 18/05/2020 18:25, Jiri Pirko wrote:
> Is it worth having an object just for this particular purpose? In the
> past I was trying to push a tc block object that could be added/removed
> and used to insert filters w/o being attached to any qdisc. This was
> frowned upon and refused because the existence of a block does not have
> any meaning w/o being attached.
A tc action doesn't have any meaning either until it is attached to a
filter. Is the consensus that the 'tc action' API/command set was a
mistake, or am I misunderstanding the objection?

> What you suggest with zone sounds quite similar. More to that, it is
> related only to act_ct. Is it a good idea to have a common object in TC
> which is actually used as an internal part of act_ct only?
Well, really it's related as much to flower ct_state as to act_ct: the
policy numbers control when a conntrack rule (from the zone) gets
offloaded into drivers, thus determining whether a packet (which has
been through an act_ct to make it +trk) is ±est.
It's because it has a scope broader than a single ct action that I'm
resistant to hanging it off act_ct in this way.

Also it still somewhat bothers me that this policy isn't scoped to the
device; I realise that the current implementation of a single flow
table shared by all offloading devices is what forces that, but it
just doesn't seem semantically right that the policy on when to
offload a connection is global across devices with potentially
differing capabilities (e.g. size of their conntrack tables) that
might want different policies.
(And a 'tc ct add dev eth0 zone 1 policy_blah...' would conveniently
give a hook point for callback (1) from my offtopic ramble, that the
driver could use to register for connections in the zone and start
offloading them to hardware, rather than doing it the first time it
sees that zone show up in an act_ct it's offloading. You don't
really want to do the same in the non-device-qualified case because
that could use up HW table space for connections in a zone you're
not offloading any rules for.)

Basically I'm just dreaming of a world where TC does a lot more with
explicit objects that it creates and then references, rather than
drivers having to implicitly create HW objects for things the first
time a rule tries to reference them.
"Is it worth" all these extra objects? Really that depends on how
much simpler the drivers can become as a result; this is the control
path, so programmer time is worth more than machine time, and space
in the programmer's head is worth more than machine RAM ;-)

-ed
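[For readers following along: the device-scoped variant Ed sketches would
differ from the global one only in qualifying the policy by netdev. This
is purely hypothetical syntax, reusing the policy attributes from his
earlier example; the thread later agrees to drop the per-device idea.]

```shell
# Hypothetical device-qualified zone policy -- never implemented.
# Each device could carry a policy matched to its own conntrack table size:
tc ct add dev eth0 zone 1 timeout 120 pkts 10   # large HW table: aggressive
tc ct add dev eth1 zone 1 timeout 600 pkts 100  # small HW table: conservative
```

Creating such an object would double as the "CT zone callback" hook point:
the driver for that specific device could register for the zone's tracked
connections at `tc ct add` time, without burning table space on devices
that never reference the zone.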
On 5/18/2020 9:02 PM, Edward Cree wrote:
> On 18/05/2020 18:25, Jiri Pirko wrote:
>> Is it worth having an object just for this particular purpose? In the
>> past I was trying to push a tc block object that could be added/removed
>> and used to insert filters w/o being attached to any qdisc. This was
>> frowned upon and refused because the existence of a block does not have
>> any meaning w/o being attached.
> A tc action doesn't have any meaning either until it is attached to a
> filter. Is the consensus that the 'tc action' API/command set was a
> mistake, or am I misunderstanding the objection?
>
>> What you suggest with zone sounds quite similar. More to that, it is
>> related only to act_ct. Is it a good idea to have a common object in TC
>> which is actually used as an internal part of act_ct only?
> Well, really it's related as much to flower ct_state as to act_ct: the
> policy numbers control when a conntrack rule (from the zone) gets
> offloaded into drivers, thus determining whether a packet (which has
> been through an act_ct to make it +trk) is ±est.

It doesn't affect when a connection will become established (+est),
just the offloading of such connections.

> It's because it has a scope broader than a single ct action that I'm
> resistant to hanging it off act_ct in this way.
>
> Also it still somewhat bothers me that this policy isn't scoped to the
> device; I realise that the current implementation of a single flow
> table shared by all offloading devices is what forces that, but it
> just doesn't seem semantically right that the policy on when to
> offload a connection is global across devices with potentially
> differing capabilities (e.g. size of their conntrack tables) that
> might want different policies.
> (And a 'tc ct add dev eth0 zone 1 policy_blah...' would conveniently
> give a hook point for callback (1) from my offtopic ramble, that the
> driver could use to register for connections in the zone and start
> offloading them to hardware, rather than doing it the first time it
> sees that zone show up in an act_ct it's offloading. You don't
> really want to do the same in the non-device-qualified case because
> that could use up HW table space for connections in a zone you're
> not offloading any rules for.)
>
> Basically I'm just dreaming of a world where TC does a lot more with
> explicit objects that it creates and then references, rather than
> drivers having to implicitly create HW objects for things the first
> time a rule tries to reference them.
> "Is it worth" all these extra objects? Really that depends on how
> much simpler the drivers can become as a result; this is the control
> path, so programmer time is worth more than machine time, and space
> in the programmer's head is worth more than machine RAM ;-)
>
> -ed

I see what you mean here, but this is only used to control act_ct
behavior, and we don't expect this to be used or referenced in other
actions/filters.

What you are suggesting will require a new userspace and kernel
(builtin) tc netlink API to manage conntrack zone/nf flow table
policies. I'm not sure how well it will sit with the flow table having
a device while the filter has a tc block which can have multiple
devices.

And then we have the single IPS_OFFLOAD_BIT, so a flow can't currently
be shared between the different flow tables that would be created for
different devices. We would need to do an atomic lookup/insert into
each table.

So this will need a lot of work, and I think it might be overkill till
we have more use cases besides the policy-per-device case, which can
still be achieved, if needed, with different conntrack zones.
On 26/05/2020 10:25, Paul Blakey wrote:
> On 5/18/2020 9:02 PM, Edward Cree wrote:
>> Well, really it's related as much to flower ct_state as to act_ct: the
>> policy numbers control when a conntrack rule (from the zone) gets
>> offloaded into drivers, thus determining whether a packet (which has
>> been through an act_ct to make it +trk) is ±est.
> It doesn't affect when a connection will become established (+est),
> just the offloading of such connections.
Yes; what I meant was that the packet wouldn't match a +trk+est rule
in hardware if the connection hadn't been offloaded (it would instead
go to software, where it would match). So the policy is in a sense
controlling ct_state offload, as much as it is act_ct offload.

> I'm not sure how well it will sit with the flow table having a device
> while the filter has a tc block which can have multiple devices.
>
> And then we have the single IPS_OFFLOAD_BIT so a flow can't currently
> be shared between different flow tables that will be created for
> different devices. We will need to do an atomic lookup/insert to each
> table.
I see; it does get a bit complicated. I agree that the 'policy per
device' use case is too speculative to justify all that work.
But I think I can still make my dream happen with the global policy,
just by putting the "we used this CT zone in an action, we need to
offload it" smarts into TC rather than all the drivers. Which I
suppose is orthogonal to the TC CT policy uAPI question; but I still
prefer the 'explicit CT object' approach. To do those smarts, TC
would need a CT object internally anyway, so we may as well expose it
and get a clean policy API into the bargain.

> What you are suggesting will require new userspace and kernel (builtin)
> tc netlink API to manage conntrack zones/nf flow tables policies.
Either way there's new API. Yours may be a smaller change (just adding
new attributes hung off the act_ct), but you still need to update
userspace to take advantage of it, else you get the default policy.
And `tc ct add zone 1 policy_timeout 120` (global policy) can be done
backwards-compatibly, too, as I described in my message before last.
(Policy-per-device would be much more of a pain new-uAPI-wise; again,
I agree to drop that idea.)

-ed