Message ID | 20191022141804.27639-1-vladbu@mellanox.com |
---|---|
Headers | show |
Series | Control action percpu counters allocation by netlink flag | expand |
On Tue, Oct 22, 2019 at 05:17:51PM +0300, Vlad Buslov wrote: > Currently, significant fraction of CPU time during TC filter allocation > is spent in percpu allocator. Moreover, percpu allocator is protected > with single global mutex which negates any potential to improve its > performance by means of recent developments in TC filter update API that > removed rtnl lock for some Qdiscs and classifiers. In order to > significantly improve filter update rate and reduce memory usage we > would like to allow users to skip percpu counters allocation for > specific action if they don't expect high traffic rate hitting the > action, which is a reasonable expectation for hardware-offloaded setup. > In that case any potential gains to software fast-path performance > gained by usage of percpu-allocated counters compared to regular integer > counters protected by spinlock are not important, but amount of > additional CPU and memory consumed by them is significant. Yes! I wonder how this can play together with conntrack offloading. With it the sw datapath will be more used, as a conntrack entry can only be offloaded after the handshake. That said, the host can have to process quite some handshakes in sw datapath. Seems OvS can then just not set this flag in act_ct (and others for this rule), and such cases will be able to leverage the percpu stats. Right? > allocator, but not for action idr lock, which is per-action. Note that > percpu allocator is still used by dst_cache in tunnel_key actions and > consumes 4.68% CPU time. Dst_cache seems like good opportunity for > further insertion rate optimization but is not addressed by this change. I vented this idea re dst_cache last week with Paolo. He sent me a draft patch but I didn't test it yet. Thanks, Marcelo
On Tue 22 Oct 2019 at 17:35, Marcelo Ricardo Leitner <mleitner@redhat.com> wrote: > On Tue, Oct 22, 2019 at 05:17:51PM +0300, Vlad Buslov wrote: >> Currently, significant fraction of CPU time during TC filter allocation >> is spent in percpu allocator. Moreover, percpu allocator is protected >> with single global mutex which negates any potential to improve its >> performance by means of recent developments in TC filter update API that >> removed rtnl lock for some Qdiscs and classifiers. In order to >> significantly improve filter update rate and reduce memory usage we >> would like to allow users to skip percpu counters allocation for >> specific action if they don't expect high traffic rate hitting the >> action, which is a reasonable expectation for hardware-offloaded setup. >> In that case any potential gains to software fast-path performance >> gained by usage of percpu-allocated counters compared to regular integer >> counters protected by spinlock are not important, but amount of >> additional CPU and memory consumed by them is significant. > > Yes! > > I wonder how this can play together with conntrack offloading. With > it the sw datapath will be more used, as a conntrack entry can only be > offloaded after the handshake. That said, the host can have to > process quite some handshakes in sw datapath. Seems OvS can then just > not set this flag in act_ct (and others for this rule), and such cases > will be able to leverage the percpu stats. Right? The flag is set per each actions instance so client can chose not to use the flag in case-by-case basis. Conntrack use case requires further investigation since I'm not entirely convinced that handling first few packets in sw (before connection reaches established state and is offloaded) warrants having percpu counter. > >> allocator, but not for action idr lock, which is per-action. Note that >> percpu allocator is still used by dst_cache in tunnel_key actions and >> consumes 4.68% CPU time. Dst_cache seems like good opportunity for >> further insertion rate optimization but is not addressed by this change. > > I vented this idea re dst_cache last week with Paolo. He sent me a > draft patch but I didn't test it yet. Looking forward to it! > > Thanks, > Marcelo
On Tue, Oct 22, 2019 at 05:17:51PM +0300, Vlad Buslov wrote: > - Extend actions that are used for hardware offloads with optional > netlink 32bit flags field. Add TCA_ACT_FLAGS_FAST_INIT action flag and > update affected actions to not allocate percpu counters when the flag > is set. I just went over all the patches and they mostly make sense to me. So far the only point I'm uncertain of is the naming of the flag, "fast_init". That is not clear on what it does and can be overloaded with other stuff later and we probably don't want that. Say, for example, we want percpu counters but to disable allocating the stats for hw, to make the counter in 28169abadb08 ("net/sched: Add hardware specific counters to TC actions") optional. So what about: TCA_ACT_FLAGS_NO_PERCPU_STATS TCA_ACT_FLAGS_NO_HW_STATS (this one to be done on a subsequent patchset, yes) ? Marcelo
On Tue 22 Oct 2019 at 18:15, Marcelo Ricardo Leitner <mleitner@redhat.com> wrote: > On Tue, Oct 22, 2019 at 05:17:51PM +0300, Vlad Buslov wrote: >> - Extend actions that are used for hardware offloads with optional >> netlink 32bit flags field. Add TCA_ACT_FLAGS_FAST_INIT action flag and >> update affected actions to not allocate percpu counters when the flag >> is set. > > I just went over all the patches and they mostly make sense to me. So > far the only point I'm uncertain of is the naming of the flag, > "fast_init". That is not clear on what it does and can be overloaded > with other stuff later and we probably don't want that. I intentionally named it like that because I do want to overload it with other stuff in future, instead of adding new flag value for every single small optimization we might come up with :) Also, I didn't want to hardcode implementation details into UAPI that we will have to maintain for long time after percpu allocator in kernel is potentially replaced with something new and better (like idr is being replaced with xarray now, for example) Anyway, lets see what other people think. I'm open to changing it. > > Say, for example, we want percpu counters but to disable allocating > the stats for hw, to make the counter in 28169abadb08 ("net/sched: Add > hardware specific counters to TC actions") optional. > > So what about: > TCA_ACT_FLAGS_NO_PERCPU_STATS > TCA_ACT_FLAGS_NO_HW_STATS (this one to be done on a subsequent patchset, yes) > ? > > Marcelo
On Tue, Oct 22, 2019 at 03:52:31PM +0000, Vlad Buslov wrote: > > On Tue 22 Oct 2019 at 18:15, Marcelo Ricardo Leitner <mleitner@redhat.com> wrote: > > On Tue, Oct 22, 2019 at 05:17:51PM +0300, Vlad Buslov wrote: > >> - Extend actions that are used for hardware offloads with optional > >> netlink 32bit flags field. Add TCA_ACT_FLAGS_FAST_INIT action flag and > >> update affected actions to not allocate percpu counters when the flag > >> is set. > > > > I just went over all the patches and they mostly make sense to me. So > > far the only point I'm uncertain of is the naming of the flag, > > "fast_init". That is not clear on what it does and can be overloaded > > with other stuff later and we probably don't want that. > > I intentionally named it like that because I do want to overload it with > other stuff in future, instead of adding new flag value for every single > small optimization we might come up with :) Hah :-) > > Also, I didn't want to hardcode implementation details into UAPI that we > will have to maintain for long time after percpu allocator in kernel is > potentially replaced with something new and better (like idr is being > replaced with xarray now, for example) I see. OTOH, this also means that the UAPI here would be unstable (different meanings over time for the same call), and hopefully new behaviors would always be backwards compatible. > > Anyway, lets see what other people think. I'm open to changing it. > > > > > Say, for example, we want percpu counters but to disable allocating > > the stats for hw, to make the counter in 28169abadb08 ("net/sched: Add > > hardware specific counters to TC actions") optional. > > > > So what about: > > TCA_ACT_FLAGS_NO_PERCPU_STATS > > TCA_ACT_FLAGS_NO_HW_STATS (this one to be done on a subsequent patchset, yes) > > ? > > > > Marcelo >
Vlad Buslov <vladbu@mellanox.com> writes: > On Tue 22 Oct 2019 at 17:35, Marcelo Ricardo Leitner <mleitner@redhat.com> wrote: >> On Tue, Oct 22, 2019 at 05:17:51PM +0300, Vlad Buslov wrote: >>> Currently, significant fraction of CPU time during TC filter allocation >>> is spent in percpu allocator. Moreover, percpu allocator is protected >>> with single global mutex which negates any potential to improve its >>> performance by means of recent developments in TC filter update API that >>> removed rtnl lock for some Qdiscs and classifiers. In order to >>> significantly improve filter update rate and reduce memory usage we >>> would like to allow users to skip percpu counters allocation for >>> specific action if they don't expect high traffic rate hitting the >>> action, which is a reasonable expectation for hardware-offloaded setup. >>> In that case any potential gains to software fast-path performance >>> gained by usage of percpu-allocated counters compared to regular integer >>> counters protected by spinlock are not important, but amount of >>> additional CPU and memory consumed by them is significant. >> >> Yes! >> >> I wonder how this can play together with conntrack offloading. With >> it the sw datapath will be more used, as a conntrack entry can only be >> offloaded after the handshake. That said, the host can have to >> process quite some handshakes in sw datapath. Seems OvS can then just >> not set this flag in act_ct (and others for this rule), and such cases >> will be able to leverage the percpu stats. Right? > > The flag is set per each actions instance so client can chose not to use > the flag in case-by-case basis. Conntrack use case requires further > investigation since I'm not entirely convinced that handling first few > packets in sw (before connection reaches established state and is > offloaded) warrants having percpu counter. Hi Vlad, Did you consider using TCA_ROOT_FLAGS instead of adding another per-action 32-bit flag?
On Tue 22 Oct 2019 at 21:17, Roman Mashak <mrv@mojatatu.com> wrote: > Vlad Buslov <vladbu@mellanox.com> writes: > >> On Tue 22 Oct 2019 at 17:35, Marcelo Ricardo Leitner <mleitner@redhat.com> wrote: >>> On Tue, Oct 22, 2019 at 05:17:51PM +0300, Vlad Buslov wrote: >>>> Currently, significant fraction of CPU time during TC filter allocation >>>> is spent in percpu allocator. Moreover, percpu allocator is protected >>>> with single global mutex which negates any potential to improve its >>>> performance by means of recent developments in TC filter update API that >>>> removed rtnl lock for some Qdiscs and classifiers. In order to >>>> significantly improve filter update rate and reduce memory usage we >>>> would like to allow users to skip percpu counters allocation for >>>> specific action if they don't expect high traffic rate hitting the >>>> action, which is a reasonable expectation for hardware-offloaded setup. >>>> In that case any potential gains to software fast-path performance >>>> gained by usage of percpu-allocated counters compared to regular integer >>>> counters protected by spinlock are not important, but amount of >>>> additional CPU and memory consumed by them is significant. >>> >>> Yes! >>> >>> I wonder how this can play together with conntrack offloading. With >>> it the sw datapath will be more used, as a conntrack entry can only be >>> offloaded after the handshake. That said, the host can have to >>> process quite some handshakes in sw datapath. Seems OvS can then just >>> not set this flag in act_ct (and others for this rule), and such cases >>> will be able to leverage the percpu stats. Right? >> >> The flag is set per each actions instance so client can chose not to use >> the flag in case-by-case basis. Conntrack use case requires further >> investigation since I'm not entirely convinced that handling first few >> packets in sw (before connection reaches established state and is >> offloaded) warrants having percpu counter. > > Hi Vlad, > > Did you consider using TCA_ROOT_FLAGS instead of adding another > per-action 32-bit flag? Hi Roman, I considered it, but didn't find good way to implement my change with TCA_ROOT_FLAGS. I needed some flags to be per-action for following reasons: 1. Not all actions support the flag (only implemented for hw offloaded actions). 2. TCA_ROOT_FLAGS is act API specific and I need this to work when actions are created when actions are created with filters through cls API. I guess I could have changed tcf_action_init_1() to require having TCA_ROOT_FLAGS before actions attribute and then pass obtained value to act_ops->init() as additional argument, but it sounds more complex and ugly. Regards, Vlad
On Tue 22 Oct 2019 at 20:09, Marcelo Ricardo Leitner <mleitner@redhat.com> wrote: > On Tue, Oct 22, 2019 at 03:52:31PM +0000, Vlad Buslov wrote: >> >> On Tue 22 Oct 2019 at 18:15, Marcelo Ricardo Leitner <mleitner@redhat.com> wrote: >> > On Tue, Oct 22, 2019 at 05:17:51PM +0300, Vlad Buslov wrote: >> >> - Extend actions that are used for hardware offloads with optional >> >> netlink 32bit flags field. Add TCA_ACT_FLAGS_FAST_INIT action flag and >> >> update affected actions to not allocate percpu counters when the flag >> >> is set. >> > >> > I just went over all the patches and they mostly make sense to me. So >> > far the only point I'm uncertain of is the naming of the flag, >> > "fast_init". That is not clear on what it does and can be overloaded >> > with other stuff later and we probably don't want that. >> >> I intentionally named it like that because I do want to overload it with >> other stuff in future, instead of adding new flag value for every single >> small optimization we might come up with :) > > Hah :-) > >> >> Also, I didn't want to hardcode implementation details into UAPI that we >> will have to maintain for long time after percpu allocator in kernel is >> potentially replaced with something new and better (like idr is being >> replaced with xarray now, for example) > > I see. OTOH, this also means that the UAPI here would be unstable > (different meanings over time for the same call), and hopefully new > behaviors would always be backwards compatible. This flag doesn't change any userland-visible behavior, just suggests different performance trade off (insertion rate for sw packet processing speed). Counters continue to work with or without the flag. Any optimization that actually modifies cls or act API behavior will have to use dedicated flag value. > >> >> Anyway, lets see what other people think. I'm open to changing it. >> >> > >> > Say, for example, we want percpu counters but to disable allocating >> > the stats for hw, to make the counter in 28169abadb08 ("net/sched: Add >> > hardware specific counters to TC actions") optional. >> > >> > So what about: >> > TCA_ACT_FLAGS_NO_PERCPU_STATS >> > TCA_ACT_FLAGS_NO_HW_STATS (this one to be done on a subsequent patchset, yes) >> > ? >> > >> > Marcelo >>
Hi Vlad, On 2019-10-22 10:17 a.m., Vlad Buslov wrote: > Currently, significant fraction of CPU time during TC filter allocation > is spent in percpu allocator. Moreover, percpu allocator is protected > with single global mutex which negates any potential to improve its > performance by means of recent developments in TC filter update API that > removed rtnl lock for some Qdiscs and classifiers. In order to > significantly improve filter update rate and reduce memory usage we > would like to allow users to skip percpu counters allocation for > specific action if they don't expect high traffic rate hitting the > action, which is a reasonable expectation for hardware-offloaded setup. > In that case any potential gains to software fast-path performance > gained by usage of percpu-allocated counters compared to regular integer > counters protected by spinlock are not important, but amount of > additional CPU and memory consumed by them is significant. Great to see this becoming low hanging on the fruit tree after your improvements. Note: had a discussion a few years back with Eric D.(on Cc) when i was trying to improve action dumping; what you are seeing was very visible when doing a large batch creation of actions. At the time i was thinking of amortizing the cost of that mutex in a batch action create i.e you ask the per cpu allocator to alloc a batch of the stats instead of singular. I understand your use case being different since it is for h/w offload. If you have time can you test with batching many actions and seeing the before/after improvement? Note: even for h/w offload it makes sense to first create the actions then bind to filters (in my world thats what we end up doing). If we can improve the first phase it is a win for both s/w and hw use cases. Question: Given TCA_ACT_FLAGS_FAST_INIT is common to all actions would it make sense to use Could you have used a TLV in the namespace of TCA_ACT_MAX (outer TLV)? You will have to pass a param to ->init(). cheers, jamal
I shouldve read the thread backward. My earlier email was asking similar question to Roman. On 2019-10-23 2:38 a.m., Vlad Buslov wrote: > > On Tue 22 Oct 2019 at 21:17, Roman Mashak <mrv@mojatatu.com> wrote: > Hi Roman, > > I considered it, but didn't find good way to implement my change with > TCA_ROOT_FLAGS. I needed some flags to be per-action for following > reasons: > > 1. Not all actions support the flag (only implemented for hw offloaded > actions). > > > 2. TCA_ROOT_FLAGS is act API specific and I need this to work when > actions are created when actions are created with filters through cls > API. I guess I could have changed tcf_action_init_1() to require > having TCA_ROOT_FLAGS before actions attribute and then pass obtained > value to act_ops->init() as additional argument, but it sounds more > complex and ugly. I really shouldve read the thread backwards;-> The question is if this uglier than introducing a new TLV for every action. cheers, jamal
On Wed 23 Oct 2019 at 15:49, Jamal Hadi Salim <jhs@mojatatu.com> wrote: > Hi Vlad, > > On 2019-10-22 10:17 a.m., Vlad Buslov wrote: >> Currently, significant fraction of CPU time during TC filter allocation >> is spent in percpu allocator. Moreover, percpu allocator is protected >> with single global mutex which negates any potential to improve its >> performance by means of recent developments in TC filter update API that >> removed rtnl lock for some Qdiscs and classifiers. In order to >> significantly improve filter update rate and reduce memory usage we >> would like to allow users to skip percpu counters allocation for >> specific action if they don't expect high traffic rate hitting the >> action, which is a reasonable expectation for hardware-offloaded setup. >> In that case any potential gains to software fast-path performance >> gained by usage of percpu-allocated counters compared to regular integer >> counters protected by spinlock are not important, but amount of >> additional CPU and memory consumed by them is significant. > > Great to see this becoming low hanging on the fruit tree > after your improvements. > Note: had a discussion a few years back with Eric D.(on Cc) > when i was trying to improve action dumping; what you are seeing > was very visible when doing a large batch creation of actions. > At the time i was thinking of amortizing the cost of that mutex > in a batch action create i.e you ask the per cpu allocator > to alloc a batch of the stats instead of singular. > > I understand your use case being different since it is for h/w > offload. If you have time can you test with batching many actions > and seeing the before/after improvement? Will do. > > Note: even for h/w offload it makes sense to first create the actions > then bind to filters (in my world thats what we end up doing). > If we can improve the first phase it is a win for both s/w and hw use > cases. > > Question: > Given TCA_ACT_FLAGS_FAST_INIT is common to all actions would it make > sense to use Could you have used a TLV in the namespace of TCA_ACT_MAX > (outer TLV)? You will have to pass a param to ->init(). It is not common for all actions. I omitted modifying actions that are not offloaded and some actions don't user percpu allocator at all (pedit, for example) and have no use for this flag at the moment. > > cheers, > jamal
On Wed 23 Oct 2019 at 16:02, Jamal Hadi Salim <jhs@mojatatu.com> wrote: > I shouldve read the thread backward. My earlier email was asking > similar question to Roman. > > On 2019-10-23 2:38 a.m., Vlad Buslov wrote: >> >> On Tue 22 Oct 2019 at 21:17, Roman Mashak <mrv@mojatatu.com> wrote: > >> Hi Roman, >> >> I considered it, but didn't find good way to implement my change with >> TCA_ROOT_FLAGS. I needed some flags to be per-action for following >> reasons: >> >> 1. Not all actions support the flag (only implemented for hw offloaded >> actions). >> >> >> 2. TCA_ROOT_FLAGS is act API specific and I need this to work when >> actions are created when actions are created with filters through cls >> API. I guess I could have changed tcf_action_init_1() to require >> having TCA_ROOT_FLAGS before actions attribute and then pass obtained >> value to act_ops->init() as additional argument, but it sounds more >> complex and ugly. > > I really shouldve read the thread backwards;-> > The question is if this uglier than introducing a new TLV for every > action. I'm not introducing it for every action, only for minority of them. > > cheers, > jamal
On 2019-10-23 9:04 a.m., Vlad Buslov wrote: > > On Wed 23 Oct 2019 at 15:49, Jamal Hadi Salim <jhs@mojatatu.com> wrote: >> Hi Vlad, >> >> I understand your use case being different since it is for h/w >> offload. If you have time can you test with batching many actions >> and seeing the before/after improvement? > > Will do. Thanks. I think you may have published number before, but would be interesting to see the before and after of adding the action first and measuring the filter improvement without caring about the allocator. > >> >> Note: even for h/w offload it makes sense to first create the actions >> then bind to filters (in my world thats what we end up doing). >> If we can improve the first phase it is a win for both s/w and hw use >> cases. >> >> Question: >> Given TCA_ACT_FLAGS_FAST_INIT is common to all actions would it make >> sense to use Could you have used a TLV in the namespace of TCA_ACT_MAX >> (outer TLV)? You will have to pass a param to ->init(). > > It is not common for all actions. I omitted modifying actions that are > not offloaded and some actions don't user percpu allocator at all > (pedit, for example) and have no use for this flag at the moment. pedit just never got updated (its simple to update). There is value in the software to have _all_ the actions use per cpu stats. It improves fast path performance. Jiri complains constantly about all these new per-action TLVs which are generic. He promised to "fix it all" someday. Jiri i notice your ack here, what happened? ;-> cheers, jamal
On Wed 23 Oct 2019 at 17:21, Jamal Hadi Salim <jhs@mojatatu.com> wrote: > On 2019-10-23 9:04 a.m., Vlad Buslov wrote: >> >> On Wed 23 Oct 2019 at 15:49, Jamal Hadi Salim <jhs@mojatatu.com> wrote: >>> Hi Vlad, >>> > >>> I understand your use case being different since it is for h/w >>> offload. If you have time can you test with batching many actions >>> and seeing the before/after improvement? >> >> Will do. > > Thanks. > > I think you may have published number before, but would be interesting > to see the before and after of adding the action first and measuring the > filter improvement without caring about the allocator. For filter with single gact drop action (first line in insertion rate table in the cover letter) I get insertion rate of 412k rules/sec with all of the actions preallocated in advance, which is 2x improvement. > >> >>> >>> Note: even for h/w offload it makes sense to first create the actions >>> then bind to filters (in my world thats what we end up doing). >>> If we can improve the first phase it is a win for both s/w and hw use >>> cases. >>> >>> Question: >>> Given TCA_ACT_FLAGS_FAST_INIT is common to all actions would it make >>> sense to use Could you have used a TLV in the namespace of TCA_ACT_MAX >>> (outer TLV)? You will have to pass a param to ->init(). >> >> It is not common for all actions. I omitted modifying actions that are >> not offloaded and some actions don't user percpu allocator at all >> (pedit, for example) and have no use for this flag at the moment. > > pedit just never got updated (its simple to update). There is > value in the software to have _all_ the actions use per cpu stats. > It improves fast path performance. > > Jiri complains constantly about all these new per-action TLVs > which are generic. He promised to "fix it all" someday. Jiri i notice > your ack here, what happened? ;-> > > cheers, > jamal
Wed, Oct 23, 2019 at 04:21:51PM CEST, jhs@mojatatu.com wrote: >On 2019-10-23 9:04 a.m., Vlad Buslov wrote: >> >> On Wed 23 Oct 2019 at 15:49, Jamal Hadi Salim <jhs@mojatatu.com> wrote: >> > Hi Vlad, >> > > >> > I understand your use case being different since it is for h/w >> > offload. If you have time can you test with batching many actions >> > and seeing the before/after improvement? >> >> Will do. > >Thanks. > >I think you may have published number before, but would be interesting >to see the before and after of adding the action first and measuring the >filter improvement without caring about the allocator. > >> >> > >> > Note: even for h/w offload it makes sense to first create the actions >> > then bind to filters (in my world thats what we end up doing). >> > If we can improve the first phase it is a win for both s/w and hw use >> > cases. >> > >> > Question: >> > Given TCA_ACT_FLAGS_FAST_INIT is common to all actions would it make >> > sense to use Could you have used a TLV in the namespace of TCA_ACT_MAX >> > (outer TLV)? You will have to pass a param to ->init(). >> >> It is not common for all actions. I omitted modifying actions that are >> not offloaded and some actions don't user percpu allocator at all >> (pedit, for example) and have no use for this flag at the moment. > >pedit just never got updated (its simple to update). There is >value in the software to have _all_ the actions use per cpu stats. >It improves fast path performance. > >Jiri complains constantly about all these new per-action TLVs >which are generic. He promised to "fix it all" someday. Jiri i notice >your ack here, what happened? ;-> Correct, it would be great. However not sure how exactly to do that now. Do you have some ideas. But basically this patchset does what was done many many times in the past. I think it was a mistake in the original design not to have some "common attrs" :/ Lesson learned for next interfaces.
On Tue, Oct 22, 2019 at 10:10 AM Marcelo Ricardo Leitner <mleitner@redhat.com> wrote: > > On Tue, Oct 22, 2019 at 03:52:31PM +0000, Vlad Buslov wrote: > > > > On Tue 22 Oct 2019 at 18:15, Marcelo Ricardo Leitner <mleitner@redhat.com> wrote: > > > On Tue, Oct 22, 2019 at 05:17:51PM +0300, Vlad Buslov wrote: > > >> - Extend actions that are used for hardware offloads with optional > > >> netlink 32bit flags field. Add TCA_ACT_FLAGS_FAST_INIT action flag and > > >> update affected actions to not allocate percpu counters when the flag > > >> is set. > > > > > > I just went over all the patches and they mostly make sense to me. So > > > far the only point I'm uncertain of is the naming of the flag, > > > "fast_init". That is not clear on what it does and can be overloaded > > > with other stuff later and we probably don't want that. > > > > I intentionally named it like that because I do want to overload it with > > other stuff in future, instead of adding new flag value for every single > > small optimization we might come up with :) > > Hah :-) > > > > > Also, I didn't want to hardcode implementation details into UAPI that we > > will have to maintain for long time after percpu allocator in kernel is > > potentially replaced with something new and better (like idr is being > > replaced with xarray now, for example) > > I see. OTOH, this also means that the UAPI here would be unstable > (different meanings over time for the same call), and hopefully new > behaviors would always be backwards compatible. +1, I also think optimization flags should be specific to what they optimize. TCA_ACT_FLAGS_NO_PERCPU_STATS seems like a better choice. It allows user to explicitly request for it.
On Thu 24 Oct 2019 at 18:12, Roopa Prabhu <roopa@cumulusnetworks.com> wrote: > On Tue, Oct 22, 2019 at 10:10 AM Marcelo Ricardo Leitner > <mleitner@redhat.com> wrote: >> >> On Tue, Oct 22, 2019 at 03:52:31PM +0000, Vlad Buslov wrote: >> > >> > On Tue 22 Oct 2019 at 18:15, Marcelo Ricardo Leitner <mleitner@redhat.com> wrote: >> > > On Tue, Oct 22, 2019 at 05:17:51PM +0300, Vlad Buslov wrote: >> > >> - Extend actions that are used for hardware offloads with optional >> > >> netlink 32bit flags field. Add TCA_ACT_FLAGS_FAST_INIT action flag and >> > >> update affected actions to not allocate percpu counters when the flag >> > >> is set. >> > > >> > > I just went over all the patches and they mostly make sense to me. So >> > > far the only point I'm uncertain of is the naming of the flag, >> > > "fast_init". That is not clear on what it does and can be overloaded >> > > with other stuff later and we probably don't want that. >> > >> > I intentionally named it like that because I do want to overload it with >> > other stuff in future, instead of adding new flag value for every single >> > small optimization we might come up with :) >> >> Hah :-) >> >> > >> > Also, I didn't want to hardcode implementation details into UAPI that we >> > will have to maintain for long time after percpu allocator in kernel is >> > potentially replaced with something new and better (like idr is being >> > replaced with xarray now, for example) >> >> I see. OTOH, this also means that the UAPI here would be unstable >> (different meanings over time for the same call), and hopefully new >> behaviors would always be backwards compatible. > > +1, I also think optimization flags should be specific to what they optimize. > TCA_ACT_FLAGS_NO_PERCPU_STATS seems like a better choice. It allows > user to explicitly request for it. Why would user care about details of optimizations that doesn't produce visible side effects for user land? Again, counters continue to work the same with or without the flag.
On Thu 24 Oct 2019 at 10:35, Jiri Pirko <jiri@resnulli.us> wrote: > Wed, Oct 23, 2019 at 04:21:51PM CEST, jhs@mojatatu.com wrote: >>On 2019-10-23 9:04 a.m., Vlad Buslov wrote: >>> >>> On Wed 23 Oct 2019 at 15:49, Jamal Hadi Salim <jhs@mojatatu.com> wrote: >>> > Hi Vlad, >>> > >> >>> > I understand your use case being different since it is for h/w >>> > offload. If you have time can you test with batching many actions >>> > and seeing the before/after improvement? >>> >>> Will do. >> >>Thanks. >> >>I think you may have published number before, but would be interesting >>to see the before and after of adding the action first and measuring the >>filter improvement without caring about the allocator. >> >>> >>> > >>> > Note: even for h/w offload it makes sense to first create the actions >>> > then bind to filters (in my world thats what we end up doing). >>> > If we can improve the first phase it is a win for both s/w and hw use >>> > cases. >>> > >>> > Question: >>> > Given TCA_ACT_FLAGS_FAST_INIT is common to all actions would it make >>> > sense to use Could you have used a TLV in the namespace of TCA_ACT_MAX >>> > (outer TLV)? You will have to pass a param to ->init(). >>> >>> It is not common for all actions. I omitted modifying actions that are >>> not offloaded and some actions don't user percpu allocator at all >>> (pedit, for example) and have no use for this flag at the moment. >> >>pedit just never got updated (its simple to update). There is >>value in the software to have _all_ the actions use per cpu stats. >>It improves fast path performance. >> >>Jiri complains constantly about all these new per-action TLVs >>which are generic. He promised to "fix it all" someday. Jiri i notice >>your ack here, what happened? ;-> > > Correct, it would be great. However not sure how exactly to do that now. > Do you have some ideas. > > But basically this patchset does what was done many many times in the > past. I think it was a mistake in the original design not to have some > "common attrs" :/ Lesson learned for next interfaces. Jamal, can we reach some conclusion? Do you still suggest to refactor the patches to use TCA_ROOT_FLAGS and parse it in act API instead of individual actions? Thanks, Vlad
On 2019-10-24 11:23 a.m., Vlad Buslov wrote: > On Thu 24 Oct 2019 at 10:35, Jiri Pirko <jiri@resnulli.us> wrote: >> Wed, Oct 23, 2019 at 04:21:51PM CEST, jhs@mojatatu.com wrote: >>> On 2019-10-23 9:04 a.m., Vlad Buslov wrote: >>>> [..] >>> Jiri complains constantly about all these new per-action TLVs >>> which are generic. He promised to "fix it all" someday. Jiri i notice >>> your ack here, what happened? ;-> >> >> Correct, it would be great. However not sure how exactly to do that now. >> Do you have some ideas. >> >> >> But basically this patchset does what was done many many times in the >> past. I think it was a mistake in the original design not to have some >> "common attrs" :/ Lesson learned for next interfaces. > Jiri, we have a high level TLV space which can be applied to all actions. See discussion below with Vlad. At least for this specific change we can get away from repeating that mistake. > Jamal, can we reach some conclusion? Do you still suggest to refactor > the patches to use TCA_ROOT_FLAGS and parse it in act API instead of > individual actions? > IMO this would certainly help us walk away from having every action replicate the same attribute with common semantics. refactoring ->init() to take an extra attribute may look ugly but is cleaner from a uapi pov. Actions that dont implement the feature can ignore the extra parameter(s). It doesnt have to be TCA_ROOT_FLAGS but certainly that high level TLV space is the right place to put it. WDYT? cheers, jamal
On Thu 24 Oct 2019 at 19:12, Jamal Hadi Salim <jhs@mojatatu.com> wrote: > On 2019-10-24 11:23 a.m., Vlad Buslov wrote: >> On Thu 24 Oct 2019 at 10:35, Jiri Pirko <jiri@resnulli.us> wrote: >>> Wed, Oct 23, 2019 at 04:21:51PM CEST, jhs@mojatatu.com wrote: >>>> On 2019-10-23 9:04 a.m., Vlad Buslov wrote: >>>>> > > [..] >>>> Jiri complains constantly about all these new per-action TLVs >>>> which are generic. He promised to "fix it all" someday. Jiri i notice >>>> your ack here, what happened? ;-> >>> >>> Correct, it would be great. However not sure how exactly to do that now. >>> Do you have some ideas. >>> >>> >>> But basically this patchset does what was done many many times in the >>> past. I think it was a mistake in the original design not to have some >>> "common attrs" :/ Lesson learned for next interfaces. >> > > Jiri, we have a high level TLV space which can be applied to all > actions. See discussion below with Vlad. At least for this specific > change we can get away from repeating that mistake. > >> Jamal, can we reach some conclusion? Do you still suggest to refactor >> the patches to use TCA_ROOT_FLAGS and parse it in act API instead of >> individual actions? >> > > IMO this would certainly help us walk away from having > every action replicate the same attribute with common semantics. > refactoring ->init() to take an extra attribute may look ugly but > is cleaner from a uapi pov. Actions that dont implement the feature > can ignore the extra parameter(s). It doesnt have to be TCA_ROOT_FLAGS > but certainly that high level TLV space is the right place to put it. > WDYT? > > cheers, > jamal Well, I like having it per-action better because of reasons I explained before (some actions don't use percpu allocator at all and some actions that are not hw offloaded don't need it), but I think both solutions have their benefits and drawbacks, so I'm fine with refactoring it. Do you have any opinion regarding flag naming? Several people suggested to be more specific, but I strongly dislike the idea of hardcoding the name of a internal kernel data structure in UAPI constant that will potentially outlive the data structure by a long time.
Hi Vlad, On 2019-10-24 12:44 p.m., Vlad Buslov wrote: > > Well, I like having it per-action better because of reasons I explained > before (some actions don't use percpu allocator at all and some actions > that are not hw offloaded don't need it), but I think both solutions > have their benefits and drawbacks, so I'm fine with refactoring it. > I am happy you are doing all this great work already. I would be happier if you did it at the root level. It is something that we have been meaning to deal with for a while now. > Do you have any opinion regarding flag naming? Several people suggested > to be more specific, but I strongly dislike the idea of hardcoding the > name of a internal kernel data structure in UAPI constant that will > potentially outlive the data structure by a long time. Could you not just name the bit with a define to say what the bit is for and still use the top level flag? Example we have a bit called "TCA_FLAG_LARGE_DUMP_ON" cheers, jamal
On Thu, Oct 24, 2019 at 03:18:00PM +0000, Vlad Buslov wrote: > > On Thu 24 Oct 2019 at 18:12, Roopa Prabhu <roopa@cumulusnetworks.com> wrote: > > On Tue, Oct 22, 2019 at 10:10 AM Marcelo Ricardo Leitner > > <mleitner@redhat.com> wrote: > >> > >> On Tue, Oct 22, 2019 at 03:52:31PM +0000, Vlad Buslov wrote: > >> > > >> > On Tue 22 Oct 2019 at 18:15, Marcelo Ricardo Leitner <mleitner@redhat.com> wrote: > >> > > On Tue, Oct 22, 2019 at 05:17:51PM +0300, Vlad Buslov wrote: > >> > >> - Extend actions that are used for hardware offloads with optional > >> > >> netlink 32bit flags field. Add TCA_ACT_FLAGS_FAST_INIT action flag and > >> > >> update affected actions to not allocate percpu counters when the flag > >> > >> is set. > >> > > > >> > > I just went over all the patches and they mostly make sense to me. So > >> > > far the only point I'm uncertain of is the naming of the flag, > >> > > "fast_init". That is not clear on what it does and can be overloaded > >> > > with other stuff later and we probably don't want that. > >> > > >> > I intentionally named it like that because I do want to overload it with > >> > other stuff in future, instead of adding new flag value for every single > >> > small optimization we might come up with :) > >> > >> Hah :-) > >> > >> > > >> > Also, I didn't want to hardcode implementation details into UAPI that we > >> > will have to maintain for long time after percpu allocator in kernel is > >> > potentially replaced with something new and better (like idr is being > >> > replaced with xarray now, for example) > >> > >> I see. OTOH, this also means that the UAPI here would be unstable > >> (different meanings over time for the same call), and hopefully new > >> behaviors would always be backwards compatible. > > > > +1, I also think optimization flags should be specific to what they optimize. > > TCA_ACT_FLAGS_NO_PERCPU_STATS seems like a better choice. It allows > > user to explicitly request for it. > > Why would user care about details of optimizations that doesn't produce > visible side effects for user land? Again, counters continue to work the > same with or without the flag. It's just just details of optimizations, on whether to use likely() or not, and it does produce user visible effects. Not in terms of API but of system behavior. Otherwise we wouldn't need the flag, right? _FAST_INIT, or the fact that it inits faster, is just one of the aspects of it, but one could also want it for being lighther in footprint as currently it is really easy to eat Gigs of RAM away on these percpu counters. So how should it be called, _FAST_INIT or _SLIM_RULES? It may be implementation detail, yes, but we shouldn't be building usage profiles and instead let the user pick what they want. Likewise, we don't have net.ipv4.fast_tcp, but net.ipv4.tcp_sack, tcp_timestamps & cia. If we can find another name then, without using 'percpu' on it but without stablishing profiles, that would be nice. Like TCA_ACT_FLAGS_SIMPLE_STATS, or so. Even though I still prefer the PERCPU, as it's as explicit as it can be. Note that bpf does it like that already: uapi]$ grep BPF.*HASH -- linux/bpf.h BPF_MAP_TYPE_HASH, BPF_MAP_TYPE_PERCPU_HASH, BPF_MAP_TYPE_LRU_HASH, BPF_MAP_TYPE_LRU_PERCPU_HASH, ...
On Thu 24 Oct 2019 at 20:31, Marcelo Ricardo Leitner <mleitner@redhat.com> wrote: > On Thu, Oct 24, 2019 at 03:18:00PM +0000, Vlad Buslov wrote: >> >> On Thu 24 Oct 2019 at 18:12, Roopa Prabhu <roopa@cumulusnetworks.com> wrote: >> > On Tue, Oct 22, 2019 at 10:10 AM Marcelo Ricardo Leitner >> > <mleitner@redhat.com> wrote: >> >> >> >> On Tue, Oct 22, 2019 at 03:52:31PM +0000, Vlad Buslov wrote: >> >> > >> >> > On Tue 22 Oct 2019 at 18:15, Marcelo Ricardo Leitner <mleitner@redhat.com> wrote: >> >> > > On Tue, Oct 22, 2019 at 05:17:51PM +0300, Vlad Buslov wrote: >> >> > >> - Extend actions that are used for hardware offloads with optional >> >> > >> netlink 32bit flags field. Add TCA_ACT_FLAGS_FAST_INIT action flag and >> >> > >> update affected actions to not allocate percpu counters when the flag >> >> > >> is set. >> >> > > >> >> > > I just went over all the patches and they mostly make sense to me. So >> >> > > far the only point I'm uncertain of is the naming of the flag, >> >> > > "fast_init". That is not clear on what it does and can be overloaded >> >> > > with other stuff later and we probably don't want that. >> >> > >> >> > I intentionally named it like that because I do want to overload it with >> >> > other stuff in future, instead of adding new flag value for every single >> >> > small optimization we might come up with :) >> >> >> >> Hah :-) >> >> >> >> > >> >> > Also, I didn't want to hardcode implementation details into UAPI that we >> >> > will have to maintain for long time after percpu allocator in kernel is >> >> > potentially replaced with something new and better (like idr is being >> >> > replaced with xarray now, for example) >> >> >> >> I see. OTOH, this also means that the UAPI here would be unstable >> >> (different meanings over time for the same call), and hopefully new >> >> behaviors would always be backwards compatible. >> > >> > +1, I also think optimization flags should be specific to what they optimize. >> > TCA_ACT_FLAGS_NO_PERCPU_STATS seems like a better choice. It allows >> > user to explicitly request for it. >> >> Why would user care about details of optimizations that doesn't produce >> visible side effects for user land? Again, counters continue to work the >> same with or without the flag. > > It's just just details of optimizations, on whether to use likely() or > not, and it does produce user visible effects. Not in terms of API but > of system behavior. Otherwise we wouldn't need the flag, right? > _FAST_INIT, or the fact that it inits faster, is just one of the > aspects of it, but one could also want it for being lighther in > footprint as currently it is really easy to eat Gigs of RAM away on > these percpu counters. So how should it be called, _FAST_INIT or > _SLIM_RULES? Hmm, yes, I think I emphasize insertion rate too much because it was my main goal, but memory usage is also important. > > It may be implementation detail, yes, but we shouldn't be building > usage profiles and instead let the user pick what they want. Likewise, > we don't have net.ipv4.fast_tcp, but net.ipv4.tcp_sack, tcp_timestamps > & cia. > > If we can find another name then, without using 'percpu' on it but > without stablishing profiles, that would be nice. > Like TCA_ACT_FLAGS_SIMPLE_STATS, or so. > > Even though I still prefer the PERCPU, as it's as explicit as it can be. Note > that bpf does it like that already: > uapi]$ grep BPF.*HASH -- linux/bpf.h > BPF_MAP_TYPE_HASH, > BPF_MAP_TYPE_PERCPU_HASH, > BPF_MAP_TYPE_LRU_HASH, > BPF_MAP_TYPE_LRU_PERCPU_HASH, > ... I would argue that all of these produce visible side effect for the user. Selective acks are clearly visible in packet dumps, BPF maps are directly accessed from BPF programs loaded from user space, etc. But I get your point that naming can be improved from FAST_INIT which is too abstract.
On Thu 24 Oct 2019 at 20:17, Jamal Hadi Salim <jhs@mojatatu.com> wrote: > Hi Vlad, > > On 2019-10-24 12:44 p.m., Vlad Buslov wrote: > >> >> Well, I like having it per-action better because of reasons I explained >> before (some actions don't use percpu allocator at all and some actions >> that are not hw offloaded don't need it), but I think both solutions >> have their benefits and drawbacks, so I'm fine with refactoring it. >> > > I am happy you are doing all this great work already. I would be happier if you > did it at the root level. It is something that we have been > meaning to deal with for a while now. > >> Do you have any opinion regarding flag naming? Several people suggested >> to be more specific, but I strongly dislike the idea of hardcoding the >> name of a internal kernel data structure in UAPI constant that will >> potentially outlive the data structure by a long time. > > Could you not just name the bit with a define to say what the bit > is for and still use the top level flag? Example we have > a bit called "TCA_FLAG_LARGE_DUMP_ON" Yes, of course. I was talking strictly about naming of TCA_ACT_FLAGS_FAST_INIT flag value constant. > > cheers, > jamal
On 2019-10-24 2:05 p.m., Vlad Buslov wrote: > > > Yes, of course. I was talking strictly about naming of > TCA_ACT_FLAGS_FAST_INIT flag value constant. > Sorry, I missed that. You know discussions on names and punctiations can last a lifetime;->[1]. A name which reflects semantics or established style is always helpful. So Marcelo's suggestion is reasonable.. In the same spirit as TCA_FLAG_LARGE_DUMP_ON, how about: TCA_FLAG_NO_PERCPU_STATS? cheers, jamal [1]http://bikeshed.com/
On Fri 25 Oct 2019 at 14:46, Jamal Hadi Salim <jhs@mojatatu.com> wrote: > On 2019-10-24 2:05 p.m., Vlad Buslov wrote: >> > >> >> Yes, of course. I was talking strictly about naming of >> TCA_ACT_FLAGS_FAST_INIT flag value constant. >> > > Sorry, I missed that. You know discussions on names and > punctiations can last a lifetime;->[1]. > A name which reflects semantics or established style > is always helpful. So Marcelo's suggestion is reasonable.. > In the same spirit as TCA_FLAG_LARGE_DUMP_ON, how about: > TCA_FLAG_NO_PERCPU_STATS? > > cheers, > jamal > [1]http://bikeshed.com/ It looks like I'm the only one who doesn't like to hardcode internal kernel data structure names into UAPI. Will change it in V2. Thanks, Vlad
On Thu 24 Oct 2019 at 19:12, Jamal Hadi Salim <jhs@mojatatu.com> wrote: > On 2019-10-24 11:23 a.m., Vlad Buslov wrote: >> On Thu 24 Oct 2019 at 10:35, Jiri Pirko <jiri@resnulli.us> wrote: >>> Wed, Oct 23, 2019 at 04:21:51PM CEST, jhs@mojatatu.com wrote: >>>> On 2019-10-23 9:04 a.m., Vlad Buslov wrote: >>>>> > > [..] >>>> Jiri complains constantly about all these new per-action TLVs >>>> which are generic. He promised to "fix it all" someday. Jiri i notice >>>> your ack here, what happened? ;-> >>> >>> Correct, it would be great. However not sure how exactly to do that now. >>> Do you have some ideas. >>> >>> >>> But basically this patchset does what was done many many times in the >>> past. I think it was a mistake in the original design not to have some >>> "common attrs" :/ Lesson learned for next interfaces. >> > > Jiri, we have a high level TLV space which can be applied to all > actions. See discussion below with Vlad. At least for this specific > change we can get away from repeating that mistake. > >> Jamal, can we reach some conclusion? Do you still suggest to refactor >> the patches to use TCA_ROOT_FLAGS and parse it in act API instead of >> individual actions? >> > > IMO this would certainly help us walk away from having > every action replicate the same attribute with common semantics. > refactoring ->init() to take an extra attribute may look ugly but > is cleaner from a uapi pov. Actions that dont implement the feature > can ignore the extra parameter(s). It doesnt have to be TCA_ROOT_FLAGS > but certainly that high level TLV space is the right place to put it. > WDYT? > > cheers, > jamal Hi Jamal, I've been trying to re-design this change to use high level TLV, but I don't understand how to make it backward compatible. If I just put new attribute before nested action attributes, it will fail to parse when older client send netlink packet without it and I don't see straightforward way to distinguish between the new attribute and following nested action attribute (which might accidentally have same length and fall into allowed attribute range). I don't have much experience working with netlink, so I might be missing something obvious here. However, extending existing action attributes (which are already conveniently parsed in tcf_action_init_1() function) with new optional 'flags' value seems like straightforward and backward compatible solution. Rough outline in code: 2 files changed, 6 insertions(+), 2 deletions(-) include/uapi/linux/pkt_cls.h | 1 + net/sched/act_api.c | 7 +++++-- modified include/uapi/linux/pkt_cls.h @@ -16,6 +16,7 @@ enum { TCA_ACT_STATS, TCA_ACT_PAD, TCA_ACT_COOKIE, + TCA_ACT_FLAGS, __TCA_ACT_MAX }; modified net/sched/act_api.c @@ -852,6 +852,7 @@ struct tc_action *tcf_action_init_1(struct net *net, struct tcf_proto *tp, char act_name[IFNAMSIZ]; struct nlattr *tb[TCA_ACT_MAX + 1]; struct nlattr *kind; + u32 flags = 0; int err; if (name == NULL) { @@ -877,6 +878,8 @@ struct tc_action *tcf_action_init_1(struct net *net, struct tcf_proto *tp, goto err_out; } } + if (tb[TCA_ACT_FLAGS]) + flags = nla_get_bitfield32(tb[TCA_ACT_FLAGS]).value; } else { if (strlcpy(act_name, name, IFNAMSIZ) >= IFNAMSIZ) { NL_SET_ERR_MSG(extack, "TC action name too long"); @@ -915,10 +918,10 @@ struct tc_action *tcf_action_init_1(struct net *net, struct tcf_proto *tp, /* backward compatibility for policer */ if (name == NULL) err = a_o->init(net, tb[TCA_ACT_OPTIONS], est, &a, ovr, bind, - rtnl_held, tp, extack); + rtnl_held, tp, flags, extack); else err = a_o->init(net, nla, est, &a, ovr, bind, rtnl_held, - tp, extack); + tp, flags, extack); if (err < 0) goto err_mod; WDYT? Regards, Vlad
On 2019-10-25 10:26 a.m., Vlad Buslov wrote: > I've been trying to re-design this change to use high level TLV, but I > don't understand how to make it backward compatible. If I just put new > attribute before nested action attributes, it will fail to parse when > older client send netlink packet without it and I don't see > straightforward way to distinguish between the new attribute and > following nested action attribute (which might accidentally have same > length and fall into allowed attribute range). I don't have much > experience working with netlink, so I might be missing something obvious > here. However, extending existing action attributes (which are already > conveniently parsed in tcf_action_init_1() function) with new optional > 'flags' value seems like straightforward and backward compatible > solution. > I think i understand what you are saying. Let me take a quick look at the code and get back to you. cheers, jamal
On 2019-10-25 10:57 a.m., Jamal Hadi Salim wrote: > On 2019-10-25 10:26 a.m., Vlad Buslov wrote: > > I think i understand what you are saying. Let me take a quick > look at the code and get back to you. > How about something like this (not even compile tested).. Yes, that init is getting uglier... over time if we are going to move things into shared attributes we will need to introduce a new data struct to pass to ->init() cheers, jamal diff --git a/net/sched/act_api.c b/net/sched/act_api.c index 69d4676a402f..b0d1e00fe2c2 100644 --- a/net/sched/act_api.c +++ b/net/sched/act_api.c @@ -842,6 +842,7 @@ static const struct nla_policy tcf_action_policy[TCA_ACT_MAX + 1] = { struct tc_action *tcf_action_init_1(struct net *net, struct tcf_proto *tp, struct nlattr *nla, struct nlattr *est, char *name, int ovr, int bind, + struct nla_bitfield32 *root_flags, bool rtnl_held, struct netlink_ext_ack *extack) { @@ -914,10 +915,10 @@ struct tc_action *tcf_action_init_1(struct net *net, struct tcf_proto *tp, /* backward compatibility for policer */ if (name == NULL) err = a_o->init(net, tb[TCA_ACT_OPTIONS], est, &a, ovr, bind, - rtnl_held, tp, extack); + rtnl_held, root_flags, tp, extack); else err = a_o->init(net, nla, est, &a, ovr, bind, rtnl_held, - tp, extack); + root_flags, tp, extack); if (err < 0) goto err_mod; @@ -955,7 +956,8 @@ struct tc_action *tcf_action_init_1(struct net *net, struct tcf_proto *tp, int tcf_action_init(struct net *net, struct tcf_proto *tp, struct nlattr *nla, struct nlattr *est, char *name, int ovr, int bind, struct tc_action *actions[], size_t *attr_size, - bool rtnl_held, struct netlink_ext_ack *extack) + bool rtnl_held, struct nla_bitfield32 *root_flags, + struct netlink_ext_ack *extack) { struct nlattr *tb[TCA_ACT_MAX_PRIO + 1]; struct tc_action *act; @@ -970,7 +972,7 @@ int tcf_action_init(struct net *net, struct tcf_proto *tp, struct nlattr *nla, for (i = 1; i <= TCA_ACT_MAX_PRIO && tb[i]; i++) { act = tcf_action_init_1(net, tp, tb[i], est, name, ovr, bind, - rtnl_held, extack); + root_flags, rtnl_held, extack); if (IS_ERR(act)) { err = PTR_ERR(act); goto err; @@ -1350,6 +1352,7 @@ tcf_add_notify(struct net *net, struct nlmsghdr *n, struct tc_action *actions[], static int tcf_action_add(struct net *net, struct nlattr *nla, struct nlmsghdr *n, u32 portid, int ovr, + struct nla_bitfield32 *root_flags, struct netlink_ext_ack *extack) { size_t attr_size = 0; @@ -1358,7 +1361,8 @@ static int tcf_action_add(struct net *net, struct nlattr *nla, for (loop = 0; loop < 10; loop++) { ret = tcf_action_init(net, NULL, nla, NULL, NULL, ovr, 0, - actions, &attr_size, true, extack); + actions, &attr_size, true, root_flags, + extack); if (ret != -EAGAIN) break; } @@ -1385,6 +1389,7 @@ static int tc_ctl_action(struct sk_buff *skb, struct nlmsghdr *n, struct net *net = sock_net(skb->sk); struct nlattr *tca[TCA_ROOT_MAX + 1]; u32 portid = skb ? NETLINK_CB(skb).portid : 0; + struct nla_bitfield32 root_flags = {0, 0}; int ret = 0, ovr = 0; if ((n->nlmsg_type != RTM_GETACTION) && @@ -1412,8 +1417,11 @@ static int tc_ctl_action(struct sk_buff *skb, struct nlmsghdr *n, */ if (n->nlmsg_flags & NLM_F_REPLACE) ovr = 1; + if (tb[TCA_ROOT_FLAGS]) + root_flags = nla_get_bitfield32(tb[TCA_ROOT_FLAGS]); + ret = tcf_action_add(net, tca[TCA_ACT_TAB], n, portid, ovr, - extack); + &root_flags, extack); break; case RTM_DELACTION: ret = tca_action_gd(net, tca[TCA_ACT_TAB], n,
On Fri 25 Oct 2019 at 18:08, Jamal Hadi Salim <jhs@mojatatu.com> wrote: > On 2019-10-25 10:57 a.m., Jamal Hadi Salim wrote: >> On 2019-10-25 10:26 a.m., Vlad Buslov wrote: > >> >> I think i understand what you are saying. Let me take a quick >> look at the code and get back to you. >> > > How about something like this (not even compile tested).. > > Yes, that init is getting uglier... over time if we > are going to move things into shared attributes we will > need to introduce a new data struct to pass to ->init() > > cheers, > jamal The problem with this approach is that it only works when actions are created through act API, and not when they are created together with filter by cls API which doesn't expect or parse TCA_ROOT. That is why I wanted to have something in tcf_action_init_1() which is called by both of them.
On 2019-10-25 11:18 a.m., Vlad Buslov wrote: > > The problem with this approach is that it only works when actions are > created through act API, and not when they are created together with > filter by cls API which doesn't expect or parse TCA_ROOT. That is why I > wanted to have something in tcf_action_init_1() which is called by both > of them. > Aha. So the call path for tcf_action_init_1() via cls_api also needs to have this infra. I think i understand better what you wanted to do earlier with changing those enums. This is fixable - we just need to have the classifier side also take a new action root flags bitfield (I can send a sample). To your original comment, it is ugly. But maybe "fixing it" is pushing the boundaries and we should just go on and let your original approach in. My only worry is, given this is uapi, if we go back to your original idea we continue to propagate the bad design and we cant take it back (all the tooling etc would be cast for the next 5 years even if we did fix it in a month). Thoughts? cheers, jamal
On 2019-10-25 11:43 a.m., Jamal Hadi Salim wrote: > On 2019-10-25 11:18 a.m., Vlad Buslov wrote: >> > >> The problem with this approach is that it only works when actions are >> created through act API, and not when they are created together with >> filter by cls API which doesn't expect or parse TCA_ROOT. That is why I >> wanted to have something in tcf_action_init_1() which is called by both >> of them. >> > > Aha. So the call path for tcf_action_init_1() via cls_api also needs > to have this infra. I think i understand better what you wanted > to do earlier with changing those enums. > Hold on. Looking more at the code, direct call for tcf_action_init_1() from the cls code path is for backward compat of old policer approach. I think even modern iproute2 doesnt support that kind of call anymore. So you can pass NULL there for the *flags. But: for direct call to tcf_action_init() we would have to extract the flag from the TLV. The TLV already has TCA_ROOT_FLAGS in it. cheers, jamal
On Fri 25 Oct 2019 at 18:43, Jamal Hadi Salim <jhs@mojatatu.com> wrote: > On 2019-10-25 11:18 a.m., Vlad Buslov wrote: >> > >> The problem with this approach is that it only works when actions are >> created through act API, and not when they are created together with >> filter by cls API which doesn't expect or parse TCA_ROOT. That is why I >> wanted to have something in tcf_action_init_1() which is called by both >> of them. >> > > Aha. So the call path for tcf_action_init_1() via cls_api also needs > to have this infra. I think i understand better what you wanted > to do earlier with changing those enums. > > This is fixable - we just need to have the classifier side also take a > new action root flags bitfield (I can send a sample). To your original Trying to add this infra to cls API leads back to my original question: how do I do it in backward compatible manner? I assume that we can't break users of RTM_NEWTFILTER. > comment, it is ugly. But maybe "fixing it" is pushing the boundaries > and we should just go on and let your original approach in. > My only worry is, given this is uapi, if we go back to your original > idea we continue to propagate the bad design and we cant take it back > (all the tooling etc would be cast for the next 5 years even if we > did fix it in a month). Thoughts? I don't see anything particularly ugly with extending either action-specific attributes or TCA_ACT_* attributes, so its hard for me to reason about problems with current approach :) > > cheers, > jamal
On Fri 25 Oct 2019 at 19:06, Jamal Hadi Salim <jhs@mojatatu.com> wrote: > On 2019-10-25 11:43 a.m., Jamal Hadi Salim wrote: >> On 2019-10-25 11:18 a.m., Vlad Buslov wrote: >>> >> >>> The problem with this approach is that it only works when actions are >>> created through act API, and not when they are created together with >>> filter by cls API which doesn't expect or parse TCA_ROOT. That is why I >>> wanted to have something in tcf_action_init_1() which is called by both >>> of them. >>> >> >> Aha. So the call path for tcf_action_init_1() via cls_api also needs >> to have this infra. I think i understand better what you wanted >> to do earlier with changing those enums. >> > > Hold on. Looking more at the code, direct call for tcf_action_init_1() > from the cls code path is for backward compat of old policer approach. > I think even modern iproute2 doesnt support that kind of call > anymore. So you can pass NULL there for the *flags. But having the FAST_INIT flag set when creating actions through cls API is my main use case. Are you suggesting to only have flags when actions created through act API? > > But: for direct call to tcf_action_init() we would have > to extract the flag from the TLV. > The TLV already has TCA_ROOT_FLAGS in it. > > > cheers, > jamal
On Fri 25 Oct 2019 at 19:06, Jamal Hadi Salim <jhs@mojatatu.com> wrote: > On 2019-10-25 11:43 a.m., Jamal Hadi Salim wrote: >> On 2019-10-25 11:18 a.m., Vlad Buslov wrote: >>> >> >>> The problem with this approach is that it only works when actions are >>> created through act API, and not when they are created together with >>> filter by cls API which doesn't expect or parse TCA_ROOT. That is why I >>> wanted to have something in tcf_action_init_1() which is called by both >>> of them. >>> >> >> Aha. So the call path for tcf_action_init_1() via cls_api also needs >> to have this infra. I think i understand better what you wanted >> to do earlier with changing those enums. >> > > Hold on. Looking more at the code, direct call for tcf_action_init_1() > from the cls code path is for backward compat of old policer approach. > I think even modern iproute2 doesnt support that kind of call > anymore. So you can pass NULL there for the *flags. > > But: for direct call to tcf_action_init() we would have > to extract the flag from the TLV. > The TLV already has TCA_ROOT_FLAGS in it. After re-reading the code I think what you suggesting is that I can somehow parse TCA_ROOT_FLAGS in following functions: int tcf_action_init(struct net *net, struct tcf_proto *tp, struct nlattr *nla, struct nlattr *est, char *name, int ovr, int bind, struct tc_action *actions[], size_t *attr_size, bool rtnl_held, struct netlink_ext_ack *extack) { struct nlattr *tb[TCA_ACT_MAX_PRIO + 1]; struct tc_action *act; size_t sz = 0; int err; int i; -> err = nla_parse_nested_deprecated(tb, TCA_ACT_MAX_PRIO, nla, NULL, extack); if (err < 0) return err; for (i = 1; i <= TCA_ACT_MAX_PRIO && tb[i]; i++) { act = tcf_action_init_1(net, tp, tb[i], est, name, ovr, bind, rtnl_held, extack); if (IS_ERR(act)) { err = PTR_ERR(act); goto err; } act->order = i; sz += tcf_action_fill_size(act); /* Start from index 0 */ actions[i - 1] = act; } *attr_size = tcf_action_full_attrs_size(sz); return i - 1; err: tcf_action_destroy(actions, bind); return err; } Again, I'm not too familiar with netlink, but looking at that I assume that the code parses up to TCA_ACT_MAX_PRIO nested TCA_ACT attributes and passes them one-by-one to tcf_action_init_1(). Are you suggesting that it also somehow includes TCA_ROOT? Regards, Vlad
On 2019-10-25 12:10 p.m., Vlad Buslov wrote: > >> Hold on. Looking more at the code, direct call for tcf_action_init_1() >> from the cls code path is for backward compat of old policer approach. >> I think even modern iproute2 doesnt support that kind of call >> anymore. So you can pass NULL there for the *flags. > > But having the FAST_INIT flag set when creating actions through cls API > is my main use case. Are you suggesting to only have flags when actions > created through act API? > Not at all. Here's my thinking... I didnt see your iproute2 change; however, in user space - i think all the classifiers eventually call parse_action()? You can stick the flags there under TCA_ACT_ROOT_FLAGS In the kernel, tcf_exts_validate() - two spots: tcf_action_init_1() pass NULL tcf_action_init() before invocation extract the TCA_ACT_ROOT_FLAGS (similar to the act_api approach). Am i missing something? Sorry - dont have much cycles right now but i could do a prototype later. cheers, jamal
On Fri 25 Oct 2019 at 19:29, Jamal Hadi Salim <jhs@mojatatu.com> wrote: > On 2019-10-25 12:10 p.m., Vlad Buslov wrote: >> > >>> Hold on. Looking more at the code, direct call for tcf_action_init_1() >>> from the cls code path is for backward compat of old policer approach. >>> I think even modern iproute2 doesnt support that kind of call >>> anymore. So you can pass NULL there for the *flags. >> >> But having the FAST_INIT flag set when creating actions through cls API >> is my main use case. Are you suggesting to only have flags when actions >> created through act API? >> > > Not at all. Here's my thinking... > > I didnt see your iproute2 change; however, in user space - i think all > the classifiers eventually call parse_action()? You can stick the flags > there under TCA_ACT_ROOT_FLAGS > > In the kernel, tcf_exts_validate() - two spots: > tcf_action_init_1() pass NULL > tcf_action_init() before invocation extract the TCA_ACT_ROOT_FLAGS > (similar to the act_api approach). > > Am i missing something? Sorry - dont have much cycles right now > but i could do a prototype later. > > cheers, > jamal I don't exactly follow. In case of act API we have following call chain: tc_ctl_action()->tcf_action_add()->tcf_action_init(). In this case TCA_ROOT is parsed in tc_ctl_action() and tcf_action_init() is called with one of its nested attributes - tca[TCA_ACT_TAB]. When tcf_action_init() is called TCA_ROOT is already "parsed out", but it is easy to pass it as an argument. For cls API lets take flower as an example: fl_change() parses TCA_FLOWER, and calls fl_set_parms()->tcf_exts_validate()->tcf_action_init() with TCA_FLOWER_ACT nested attribute. No TCA_ROOT is expected, TCA_FLOWER_ACT contains up to TCA_ACT_MAX_PRIO nested TCA_ACT attributes. So where can I include it without breaking backward compatibility? Regards, Vlad
On 2019-10-25 12:53 p.m., Vlad Buslov wrote: [..] Sorry, the distractions here are not helping. When i responded i had started to do below enum { TCA_ACT_UNSPEC, TCA_ACT_KIND, TCA_ACT_OPTIONS, TCA_ACT_INDEX, TCA_ACT_STATS, TCA_ACT_PAD, TCA_ACT_COOKIE, TCA_ACT_ROOT_FLAGS, __TCA_ACT_MAX }; Note: "TCA_ACT_ROOT_FLAGS" I think your other email may have tried to do the same? So i claimed it was there in that email because grep showed it but that's because i had added it;-> > For cls API lets take flower as an example: fl_change() parses TCA_FLOWER, and calls > fl_set_parms()->tcf_exts_validate()->tcf_action_init() with > TCA_FLOWER_ACT nested attribute. No TCA_ROOT is expected, TCA_FLOWER_ACT > contains up to TCA_ACT_MAX_PRIO nested TCA_ACT attributes. So where can > I include it without breaking backward compatibility? > Not a clean solution but avoids the per action TLV attribute: in user space parse_action() you add TCA_ACT_ROOT_FLAGS if the user specifies this on the command line. in the kernel everything just passes NULL for root_flags when invocation is from tcf_exts_validate() i.e both tcf_action_init_1() and tcf_action_init() In tcf_action_init_1(), if root_flags is NULL you try to get flags from from tb[TCA_ACT_ROOT_FLAGS] Apologies in advance for any latency - I will have time tomorrow. cheers, jamal
Like attached... cheers, jamal On 2019-10-25 2:17 p.m., Jamal Hadi Salim wrote: > On 2019-10-25 12:53 p.m., Vlad Buslov wrote: > [..] > > Sorry, the distractions here are not helping. > When i responded i had started to do below > > enum { > TCA_ACT_UNSPEC, > TCA_ACT_KIND, > TCA_ACT_OPTIONS, > TCA_ACT_INDEX, > TCA_ACT_STATS, > TCA_ACT_PAD, > TCA_ACT_COOKIE, > TCA_ACT_ROOT_FLAGS, > __TCA_ACT_MAX > }; > > Note: "TCA_ACT_ROOT_FLAGS" > I think your other email may have tried to do the same? > So i claimed it was there in that email because grep showed it > but that's because i had added it;-> > >> For cls API lets take flower as an example: fl_change() parses >> TCA_FLOWER, and calls >> fl_set_parms()->tcf_exts_validate()->tcf_action_init() with >> TCA_FLOWER_ACT nested attribute. No TCA_ROOT is expected, TCA_FLOWER_ACT >> contains up to TCA_ACT_MAX_PRIO nested TCA_ACT attributes. So where can >> I include it without breaking backward compatibility? >> > Not a clean solution but avoids the per action TLV attribute: > > in user space parse_action() you add TCA_ACT_ROOT_FLAGS > if the user specifies this on the command line. > > in the kernel everything just passes NULL for root_flags > when invocation is from tcf_exts_validate() i.e both > tcf_action_init_1() and > tcf_action_init() > > In tcf_action_init_1(), if root_flags is NULL > you try to get flags from from tb[TCA_ACT_ROOT_FLAGS] > > Apologies in advance for any latency - I will have time tomorrow. > > cheers, > jamal diff --git a/include/net/act_api.h b/include/net/act_api.h index b18c699681ca..ec52b28ddfc5 100644 --- a/include/net/act_api.h +++ b/include/net/act_api.h @@ -93,7 +93,8 @@ struct tc_action_ops { int (*lookup)(struct net *net, struct tc_action **a, u32 index); int (*init)(struct net *net, struct nlattr *nla, struct nlattr *est, struct tc_action **act, int ovr, - int bind, bool rtnl_held, struct tcf_proto *tp, + int bind, bool rtnl_held, struct nla_bitfield32 *rf, + struct tcf_proto *tp, struct netlink_ext_ack *extack); int (*walk)(struct net *, struct sk_buff *, struct netlink_callback *, int, @@ -176,10 +177,12 @@ int tcf_action_exec(struct sk_buff *skb, struct tc_action **actions, int tcf_action_init(struct net *net, struct tcf_proto *tp, struct nlattr *nla, struct nlattr *est, char *name, int ovr, int bind, struct tc_action *actions[], size_t *attr_size, - bool rtnl_held, struct netlink_ext_ack *extack); + bool rtnl_held, struct nla_bitfield32 *root_flags, + struct netlink_ext_ack *extack); struct tc_action *tcf_action_init_1(struct net *net, struct tcf_proto *tp, struct nlattr *nla, struct nlattr *est, char *name, int ovr, int bind, + struct nla_bitfield32 *root_flags, bool rtnl_held, struct netlink_ext_ack *extack); int tcf_action_dump(struct sk_buff *skb, struct tc_action *actions[], int bind, diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h index a6aa466fac9e..d92f3d0f2c79 100644 --- a/include/uapi/linux/pkt_cls.h +++ b/include/uapi/linux/pkt_cls.h @@ -16,6 +16,7 @@ enum { TCA_ACT_STATS, TCA_ACT_PAD, TCA_ACT_COOKIE, + TCA_ACT_ROOT_FLAGS, __TCA_ACT_MAX }; diff --git a/net/sched/act_api.c b/net/sched/act_api.c index 69d4676a402f..ef7b0bb735c7 100644 --- a/net/sched/act_api.c +++ b/net/sched/act_api.c @@ -842,6 +842,7 @@ static const struct nla_policy tcf_action_policy[TCA_ACT_MAX + 1] = { struct tc_action *tcf_action_init_1(struct net *net, struct tcf_proto *tp, struct nlattr *nla, struct nlattr *est, char *name, int ovr, int bind, + struct nla_bitfield32 *root_flags, bool rtnl_held, struct netlink_ext_ack *extack) { @@ -852,6 +853,7 @@ struct tc_action *tcf_action_init_1(struct net *net, struct tcf_proto *tp, struct nlattr *tb[TCA_ACT_MAX + 1]; struct nlattr *kind; int err; + struct nla_bitfield32 rf = {0, 0}; if (name == NULL) { err = nla_parse_nested_deprecated(tb, TCA_ACT_MAX, nla, @@ -884,6 +886,10 @@ struct tc_action *tcf_action_init_1(struct net *net, struct tcf_proto *tp, } } + if (!root_flags && tb[TCA_ACT_ROOT_FLAGS]) { + rf = nla_get_bitfield32(tb[TCA_ACT_ROOT_FLAGS]); + root_flags = &rf; + } a_o = tc_lookup_action_n(act_name); if (a_o == NULL) { #ifdef CONFIG_MODULES @@ -914,10 +920,10 @@ struct tc_action *tcf_action_init_1(struct net *net, struct tcf_proto *tp, /* backward compatibility for policer */ if (name == NULL) err = a_o->init(net, tb[TCA_ACT_OPTIONS], est, &a, ovr, bind, - rtnl_held, tp, extack); + rtnl_held, root_flags, tp, extack); else err = a_o->init(net, nla, est, &a, ovr, bind, rtnl_held, - tp, extack); + root_flags, tp, extack); if (err < 0) goto err_mod; @@ -955,7 +961,8 @@ struct tc_action *tcf_action_init_1(struct net *net, struct tcf_proto *tp, int tcf_action_init(struct net *net, struct tcf_proto *tp, struct nlattr *nla, struct nlattr *est, char *name, int ovr, int bind, struct tc_action *actions[], size_t *attr_size, - bool rtnl_held, struct netlink_ext_ack *extack) + bool rtnl_held, struct nla_bitfield32 *root_flags, + struct netlink_ext_ack *extack) { struct nlattr *tb[TCA_ACT_MAX_PRIO + 1]; struct tc_action *act; @@ -970,7 +977,7 @@ int tcf_action_init(struct net *net, struct tcf_proto *tp, struct nlattr *nla, for (i = 1; i <= TCA_ACT_MAX_PRIO && tb[i]; i++) { act = tcf_action_init_1(net, tp, tb[i], est, name, ovr, bind, - rtnl_held, extack); + root_flags, rtnl_held, extack); if (IS_ERR(act)) { err = PTR_ERR(act); goto err; @@ -1350,6 +1357,7 @@ tcf_add_notify(struct net *net, struct nlmsghdr *n, struct tc_action *actions[], static int tcf_action_add(struct net *net, struct nlattr *nla, struct nlmsghdr *n, u32 portid, int ovr, + struct nla_bitfield32 *root_flags, struct netlink_ext_ack *extack) { size_t attr_size = 0; @@ -1358,7 +1366,8 @@ static int tcf_action_add(struct net *net, struct nlattr *nla, for (loop = 0; loop < 10; loop++) { ret = tcf_action_init(net, NULL, nla, NULL, NULL, ovr, 0, - actions, &attr_size, true, extack); + actions, &attr_size, true, root_flags, + extack); if (ret != -EAGAIN) break; } @@ -1385,6 +1394,7 @@ static int tc_ctl_action(struct sk_buff *skb, struct nlmsghdr *n, struct net *net = sock_net(skb->sk); struct nlattr *tca[TCA_ROOT_MAX + 1]; u32 portid = skb ? NETLINK_CB(skb).portid : 0; + struct nla_bitfield32 root_flags = {0, 0}; int ret = 0, ovr = 0; if ((n->nlmsg_type != RTM_GETACTION) && @@ -1412,8 +1422,11 @@ static int tc_ctl_action(struct sk_buff *skb, struct nlmsghdr *n, */ if (n->nlmsg_flags & NLM_F_REPLACE) ovr = 1; + if (tb[TCA_ROOT_FLAGS]) + root_flags = nla_get_bitfield32(tb[TCA_ROOT_FLAGS]); + ret = tcf_action_add(net, tca[TCA_ACT_TAB], n, portid, ovr, - extack); + &root_flags, extack); break; case RTM_DELACTION: ret = tca_action_gd(net, tca[TCA_ACT_TAB], n, diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c index 8717c0b26c90..9dc5bf0d637e 100644 --- a/net/sched/cls_api.c +++ b/net/sched/cls_api.c @@ -2946,7 +2946,7 @@ int tcf_exts_validate(struct net *net, struct tcf_proto *tp, struct nlattr **tb, act = tcf_action_init_1(net, tp, tb[exts->police], rate_tlv, "police", ovr, TCA_ACT_BIND, rtnl_held, - extack); + NULL, extack); if (IS_ERR(act)) return PTR_ERR(act); @@ -2959,7 +2959,7 @@ int tcf_exts_validate(struct net *net, struct tcf_proto *tp, struct nlattr **tb, err = tcf_action_init(net, tp, tb[exts->action], rate_tlv, NULL, ovr, TCA_ACT_BIND, exts->actions, &attr_size, - rtnl_held, extack); + rtnl_held, NULL, extack); if (err < 0) return err; exts->nr_actions = err;
On 2019-10-25 5:05 p.m., Jamal Hadi Salim wrote: > + if (!root_flags && tb[TCA_ACT_ROOT_FLAGS]) { > + rf = nla_get_bitfield32(tb[TCA_ACT_ROOT_FLAGS]); > + root_flags = &rf; > + } !root_flags check doesnt look right. Hopefully it makes more sense now.... cheers, jamal
On 2019-10-25 5:10 p.m., Jamal Hadi Salim wrote: > On 2019-10-25 5:05 p.m., Jamal Hadi Salim wrote: >> + if (!root_flags && tb[TCA_ACT_ROOT_FLAGS]) { >> + rf = nla_get_bitfield32(tb[TCA_ACT_ROOT_FLAGS]); >> + root_flags = &rf; >> + } > > > > !root_flags check doesnt look right. > Hopefully it makes more sense now.... > For completion: It compiled ;-> cheers, jamal
On Sat 26 Oct 2019 at 00:52, Jamal Hadi Salim <jhs@mojatatu.com> wrote: > On 2019-10-25 5:10 p.m., Jamal Hadi Salim wrote: >> On 2019-10-25 5:05 p.m., Jamal Hadi Salim wrote: >>> + if (!root_flags && tb[TCA_ACT_ROOT_FLAGS]) { >>> + rf = nla_get_bitfield32(tb[TCA_ACT_ROOT_FLAGS]); >>> + root_flags = &rf; >>> + } >> >> >> >> !root_flags check doesnt look right. >> Hopefully it makes more sense now.... >> > > For completion: > It compiled ;-> > > > cheers, > jamal Okay, I understand now what you suggest. But why not unify cls and act API, and always have flags parsed in tcf_action_init_1() as TCA_ACT_ROOT_FLAGS like I suggested in one of my previous mails? That way we don't have to pass pointers around. Regards, Vlad
On 2019-10-26 5:44 a.m., Vlad Buslov wrote: > > Okay, I understand now what you suggest. But why not unify cls and act > API, and always have flags parsed in tcf_action_init_1() as > TCA_ACT_ROOT_FLAGS like I suggested in one of my previous mails? That > way we don't have to pass pointers around. That would work. I am being a sucker for optimization - one flag for a batch of actions vs one per action. It is a good compromise, go for it. cheers, jamal
Jamal Hadi Salim <jhs@mojatatu.com> writes: > On 2019-10-26 5:44 a.m., Vlad Buslov wrote: >> > >> Okay, I understand now what you suggest. But why not unify cls and act >> API, and always have flags parsed in tcf_action_init_1() as >> TCA_ACT_ROOT_FLAGS like I suggested in one of my previous mails? That >> way we don't have to pass pointers around. > > That would work. > I am being a sucker for optimization - one flag for a batch > of actions vs one per action. > It is a good compromise, go for it. But why do we need to have two attributes, one at the root level TCA_ROOT_FLAGS and the other at the inner TCA_ACT_* level, but in fact serving the same purpose -- passing flags for optimizations? The whole nest of action attributes including root ones is passed as 3rd argument of tcf_exts_validate(), so it can be validated and extracted at that level and passed to tcf_action_init_1() as pointer to 32-bit flag, admittedly it's ugly given the growing number of arguments to tcf_action_init_1(). With old iproute2 the pointer will always be NULL, so I think backward compatibilty will be preserved.
On 2019-10-26 10:52 a.m., Roman Mashak wrote: [..] > > But why do we need to have two attributes, one at the root level > TCA_ROOT_FLAGS and the other at the inner TCA_ACT_* level, but in fact > serving the same purpose -- passing flags for optimizations? > > > The whole nest of action attributes including root ones is passed as 3rd > argument of tcf_exts_validate(), so it can be validated and extracted at > that level and passed to tcf_action_init_1() as pointer to 32-bit flag, > admittedly it's ugly given the growing number of arguments to > tcf_action_init_1(). With old iproute2 the pointer will always be NULL, > so I think backward compatibilty will be preserved. Note: we only call tcf_action_init_1() at that level for very old policer api for backward compatibility reasons. I think what would make sense is to be able to call tcf_action_init()(the else statement in tcf_exts_validate()) from that level with a global flag but for that we would need to introduce something like TCA_ROOT_FLAGS under this space: --- enum { TCA_UNSPEC, TCA_KIND, TCA_OPTIONS, TCA_STATS, TCA_XSTATS, TCA_RATE, TCA_FCNT, TCA_STATS2, TCA_STAB, TCA_PAD, TCA_DUMP_INVISIBLE, TCA_CHAIN, TCA_HW_OFFLOAD, TCA_INGRESS_BLOCK, TCA_EGRESS_BLOCK, __TCA_MAX }; --- which would be a cleaner solution but would require _a lot more code_ both in user/kernel. Thats why i feel Vlad's suggestion is a reasonable compromise because it gets rid of the original issue of per-specific-action TLVs. On optimization: The current suggestion from Vlad is a bit inefficient, example, if was trying to batch 100 actions i now have 1200 bytes of overhead instead of 12 bytes. cheers, jamal
On Sat 26 Oct 2019 at 19:06, Jamal Hadi Salim <jhs@mojatatu.com> wrote: > On 2019-10-26 10:52 a.m., Roman Mashak wrote: > [..] >> >> But why do we need to have two attributes, one at the root level >> TCA_ROOT_FLAGS and the other at the inner TCA_ACT_* level, but in fact >> serving the same purpose -- passing flags for optimizations? >> >> >> The whole nest of action attributes including root ones is passed as 3rd >> argument of tcf_exts_validate(), so it can be validated and extracted at >> that level and passed to tcf_action_init_1() as pointer to 32-bit flag, >> admittedly it's ugly given the growing number of arguments to >> tcf_action_init_1(). With old iproute2 the pointer will always be NULL, >> so I think backward compatibilty will be preserved. > > Note: we only call tcf_action_init_1() at that level for very > old policer api for backward compatibility reasons. I think what > would make sense is to be able to call tcf_action_init()(the else > statement in tcf_exts_validate()) from that level with a global flag > but for that we would need to introduce something like TCA_ROOT_FLAGS > under this space: > --- > enum { > TCA_UNSPEC, > TCA_KIND, > TCA_OPTIONS, > TCA_STATS, > TCA_XSTATS, > TCA_RATE, > TCA_FCNT, > TCA_STATS2, > TCA_STAB, > TCA_PAD, > TCA_DUMP_INVISIBLE, > TCA_CHAIN, > TCA_HW_OFFLOAD, > TCA_INGRESS_BLOCK, > TCA_EGRESS_BLOCK, > __TCA_MAX > }; > --- > > which would be a cleaner solution but would require > _a lot more code_ both in user/kernel. > Thats why i feel Vlad's suggestion is a reasonable compromise > because it gets rid of the original issue of per-specific-action > TLVs. > > On optimization: > The current suggestion from Vlad is a bit inefficient, > example, if was trying to batch 100 actions i now have 1200 > bytes of overhead instead of 12 bytes. > > cheers, > jamal Hmm, yes, this looks quite redundant. At the same time having basically same flags in two different spaces is ugly. Maybe correct approach would be not to add act API at all and only extend tcf_action_init_1() to be used from cls API? I don't see a use for act API at the moment because the only flag value (skip percpu allocation) is hardware offloads specific and such clients generally create action through cls new filter API. WDYT? Regards, Vlad
On 2019-10-26 12:42 p.m., Vlad Buslov wrote: > > On Sat 26 Oct 2019 at 19:06, Jamal Hadi Salim <jhs@mojatatu.com> wrote: > Hmm, yes, this looks quite redundant. It's legit. Two different code paths for two different objects that can be configured independently or together. Only other thing that does this is FIB/NH path. > At the same time having basically > same flags in two different spaces is ugly. Maybe correct approach would > be not to add act API at all and only extend tcf_action_init_1() to be > used from cls API? I don't see a use for act API at the moment because > the only flag value (skip percpu allocation) is hardware offloads > specific and such clients generally create action through cls new filter > API. WDYT? > I am not sure if there is a gain. Your code path is tcf_exts_validate()->tcf_action_init()->tcf_action_init_1() Do you want to replicate the tcf_action_init() in cls? cheers, jamal
On Sat 26 Oct 2019 at 21:38, Jamal Hadi Salim <jhs@mojatatu.com> wrote: > On 2019-10-26 12:42 p.m., Vlad Buslov wrote: >> >> On Sat 26 Oct 2019 at 19:06, Jamal Hadi Salim <jhs@mojatatu.com> wrote: > >> Hmm, yes, this looks quite redundant. > > It's legit. > Two different code paths for two different objects > that can be configured independently or together. > Only other thing that does this is FIB/NH path. > >> At the same time having basically >> same flags in two different spaces is ugly. Maybe correct approach would >> be not to add act API at all and only extend tcf_action_init_1() to be >> used from cls API? I don't see a use for act API at the moment because >> the only flag value (skip percpu allocation) is hardware offloads >> specific and such clients generally create action through cls new filter >> API. WDYT? >> > > I am not sure if there is a gain. Your code path is > tcf_exts_validate()->tcf_action_init()->tcf_action_init_1() > > Do you want to replicate the tcf_action_init() in > cls? > > cheers, > jamal No, I meant that we don't need to change iproute2 to always send new TCA_ACT_ROOT_FLAGS. They can be set conditionally (only when user set the flag) or only when creating filters with actions attached (cls API). Such approach wouldn't introduce any additional overhead to netlink packets when batch creating actions. Regards, Vlad
On 2019-10-27 4:31 a.m., Vlad Buslov wrote: > > No, I meant that we don't need to change iproute2 to always send new > TCA_ACT_ROOT_FLAGS. They can be set conditionally (only when user set > the flag) or only when creating filters with actions attached (cls API). > Such approach wouldn't introduce any additional overhead to netlink > packets when batch creating actions. > iproute2 will only send conditionally. parse_actions() in iproute2 will only set this field if the user explicitly set it on the command line for that action - either when an action is specified as standalone(or in a batch) or when bound to a filter should make no difference. No? The overhead i was talking about was worst case scenario. cheers, jamal