[RFC,net-next,0/3] TC: Introduce qevents

Message ID	cover.1590512901.git.petrm@mellanox.com
Headers	show Return-Path: <netdev-owner@vger.kernel.org> From: Petr Machata <petrm@mellanox.com> To: netdev@vger.kernel.org Cc: Jakub Kicinski <kuba@kernel.org>, Eric Dumazet <eric.dumazet@gmail.com>, jhs@mojatatu.com, jiri@mellanox.com, idosch@mellanox.com, Petr Machata <petrm@mellanox.com> Subject: [RFC PATCH net-next 0/3] TC: Introduce qevents Date: Tue, 26 May 2020 20:10:04 +0300 Message-Id: <cover.1590512901.git.petrm@mellanox.com> Content-Transfer-Encoding: 8bit Content-Type: text/plain MIME-Version: 1.0 X-MS-Exchange-MessageSentRepresentingType: 1 Sender: netdev-owner@vger.kernel.org Precedence: bulk
Series	TC: Introduce qevents \| expand [RFC,net-next,0/3] TC: Introduce qevents [RFC,net-next,1/3] net: sched: Introduce helpers for qevent blocks [RFC,net-next,2/3] net: sched: sch_red: Split init and change callbacks [RFC,net-next,3/3] net: sched: sch_red: Add qevents "early" and "mark"

Petr Machata May 26, 2020, 5:10 p.m. UTC

The Spectrum hardware allows execution of one of several actions as a
result of queue management events: tail-dropping, early-dropping, marking a
packet, or passing a configured latency threshold or buffer size. Such
packets can be mirrored, trapped, or sampled.

Modeling the action to be taken as simply a TC action is very attractive,
but it is not obvious where to put these actions. At least with ECN marking
one could imagine a tree of qdiscs and classifiers that effectively
accomplishes this task, albeit in an impractically complex manner. But
there is just no way to match on dropped-ness of a packet, let alone
dropped-ness due to a particular reason.

To allow configuring user-defined actions as a result of inner workings of
a qdisc, this patch set introduces a concept of qevents. Those are attach
points for TC blocks, where filters can be put that are executed as the
packet hits well-defined points in the qdisc algorithms. The attached
blocks can be shared, in a manner similar to clsact ingress and egress
blocks, arbitrary classifiers with arbitrary actions can be put on them,
etc.

For example:

# tc qdisc add dev eth0 root handle 1: \
	red limit 500K avpkt 1K qevent early block 10
# tc filter add block 10 \
	matchall action mirred egress mirror dev eth1

Patch #1 of this set introduces several helpers to allow easy and uniform
addition of qevents to qdiscs. The following two patches, #2 and #3, then
add two qevents to the RED qdisc: "early" qevent fires when a packet is
early-dropped; "mark" qevent, when it is ECN-marked.

This patch set does not deal with offloading. The idea there is that a
driver will be able to figure out that a given block is used in qevent
context by looking at binder type. A future patch-set will add a qdisc
pointer to struct flow_block_offload, which a driver will be able to
consult to glean the TC or other relevant attributes.

Petr Machata (3):
  net: sched: Introduce helpers for qevent blocks
  net: sched: sch_red: Split init and change callbacks
  net: sched: sch_red: Add qevents "early" and "mark"

 include/net/flow_offload.h     |   2 +
 include/net/pkt_cls.h          |  48 +++++++++++++++
 include/uapi/linux/pkt_sched.h |   2 +
 net/sched/cls_api.c            | 107 +++++++++++++++++++++++++++++++++
 net/sched/sch_red.c            | 100 ++++++++++++++++++++++++++----
 5 files changed, 247 insertions(+), 12 deletions(-)

Cong Wang May 27, 2020, 4:09 a.m. UTC | #1

On Tue, May 26, 2020 at 10:11 AM Petr Machata <petrm@mellanox.com> wrote:
>
> The Spectrum hardware allows execution of one of several actions as a
> result of queue management events: tail-dropping, early-dropping, marking a
> packet, or passing a configured latency threshold or buffer size. Such
> packets can be mirrored, trapped, or sampled.
>
> Modeling the action to be taken as simply a TC action is very attractive,
> but it is not obvious where to put these actions. At least with ECN marking
> one could imagine a tree of qdiscs and classifiers that effectively
> accomplishes this task, albeit in an impractically complex manner. But
> there is just no way to match on dropped-ness of a packet, let alone
> dropped-ness due to a particular reason.
>
> To allow configuring user-defined actions as a result of inner workings of
> a qdisc, this patch set introduces a concept of qevents. Those are attach
> points for TC blocks, where filters can be put that are executed as the
> packet hits well-defined points in the qdisc algorithms. The attached
> blocks can be shared, in a manner similar to clsact ingress and egress
> blocks, arbitrary classifiers with arbitrary actions can be put on them,
> etc.

This concept does not fit well into qdisc, essentially you still want to
install filters (and actions) somewhere on qdisc, but currently all filters
are executed at enqueue, basically you want to execute them at other
pre-defined locations too, for example early drop.

So, perhaps adding a "position" in tc filter is better? Something like:

tc qdisc add dev x root handle 1: ... # same as before
tc filter add dev x parent 1:0 position early_drop matchall action....

And obviously default position must be "enqueue". Makes sense?

(The word "position" may be not accurate, but hope you get my point
here.)

Thanks.

Petr Machata May 27, 2020, 9:56 a.m. UTC | #2

Cong Wang <xiyou.wangcong@gmail.com> writes:

> On Tue, May 26, 2020 at 10:11 AM Petr Machata <petrm@mellanox.com> wrote:
>>
>> The Spectrum hardware allows execution of one of several actions as a
>> result of queue management events: tail-dropping, early-dropping, marking a
>> packet, or passing a configured latency threshold or buffer size. Such
>> packets can be mirrored, trapped, or sampled.
>>
>> Modeling the action to be taken as simply a TC action is very attractive,
>> but it is not obvious where to put these actions. At least with ECN marking
>> one could imagine a tree of qdiscs and classifiers that effectively
>> accomplishes this task, albeit in an impractically complex manner. But
>> there is just no way to match on dropped-ness of a packet, let alone
>> dropped-ness due to a particular reason.
>>
>> To allow configuring user-defined actions as a result of inner workings of
>> a qdisc, this patch set introduces a concept of qevents. Those are attach
>> points for TC blocks, where filters can be put that are executed as the
>> packet hits well-defined points in the qdisc algorithms. The attached
>> blocks can be shared, in a manner similar to clsact ingress and egress
>> blocks, arbitrary classifiers with arbitrary actions can be put on them,
>> etc.
>
> This concept does not fit well into qdisc, essentially you still want to
> install filters (and actions) somewhere on qdisc, but currently all filters
> are executed at enqueue, basically you want to execute them at other
> pre-defined locations too, for example early drop.
>
> So, perhaps adding a "position" in tc filter is better? Something like:
>
> tc qdisc add dev x root handle 1: ... # same as before
> tc filter add dev x parent 1:0 position early_drop matchall action....

Position would just be a chain index.

> And obviously default position must be "enqueue". Makes sense?

Chain 0.

So if I understand the proposal correctly: a qdisc has a classification
block (cl_ops->tcf_block). Qevents then reference a chain on that
classification block.

If the above is correct, I disagree that this is a better model. RED
does not need to classify anything, modelling this as a classification
block is wrong. It should be another block. (Also shareable, because as
an operator you likely want to treat all, say, early drops the same, and
therefore to add just one rule for all 128 or so of your qdiscs.)

Vladimir Oltean May 27, 2020, 3:19 p.m. UTC | #3

Hi Petr,

On Tue, 26 May 2020 at 20:11, Petr Machata <petrm@mellanox.com> wrote:
>
> The Spectrum hardware allows execution of one of several actions as a
> result of queue management events: tail-dropping, early-dropping, marking a
> packet, or passing a configured latency threshold or buffer size. Such
> packets can be mirrored, trapped, or sampled.
>
> Modeling the action to be taken as simply a TC action is very attractive,
> but it is not obvious where to put these actions. At least with ECN marking
> one could imagine a tree of qdiscs and classifiers that effectively
> accomplishes this task, albeit in an impractically complex manner. But
> there is just no way to match on dropped-ness of a packet, let alone
> dropped-ness due to a particular reason.
>
> To allow configuring user-defined actions as a result of inner workings of
> a qdisc, this patch set introduces a concept of qevents. Those are attach
> points for TC blocks, where filters can be put that are executed as the
> packet hits well-defined points in the qdisc algorithms. The attached
> blocks can be shared, in a manner similar to clsact ingress and egress
> blocks, arbitrary classifiers with arbitrary actions can be put on them,
> etc.
>
> For example:
>
> # tc qdisc add dev eth0 root handle 1: \
>         red limit 500K avpkt 1K qevent early block 10
> # tc filter add block 10 \
>         matchall action mirred egress mirror dev eth1
>
> Patch #1 of this set introduces several helpers to allow easy and uniform
> addition of qevents to qdiscs. The following two patches, #2 and #3, then
> add two qevents to the RED qdisc: "early" qevent fires when a packet is
> early-dropped; "mark" qevent, when it is ECN-marked.
>
> This patch set does not deal with offloading. The idea there is that a
> driver will be able to figure out that a given block is used in qevent
> context by looking at binder type. A future patch-set will add a qdisc
> pointer to struct flow_block_offload, which a driver will be able to
> consult to glean the TC or other relevant attributes.
>
> Petr Machata (3):
>   net: sched: Introduce helpers for qevent blocks
>   net: sched: sch_red: Split init and change callbacks
>   net: sched: sch_red: Add qevents "early" and "mark"
>
>  include/net/flow_offload.h     |   2 +
>  include/net/pkt_cls.h          |  48 +++++++++++++++
>  include/uapi/linux/pkt_sched.h |   2 +
>  net/sched/cls_api.c            | 107 +++++++++++++++++++++++++++++++++
>  net/sched/sch_red.c            | 100 ++++++++++++++++++++++++++----
>  5 files changed, 247 insertions(+), 12 deletions(-)
>
> --
> 2.20.1
>

I only took a cursory glance at your patches. Can these "qevents" be
added to code outside of the packet scheduler, like to the bridge, for
example? Or can the bridge mark the packets somehow, and then any
generic qdisc be able to recognize this mark without specific code?
A very common use case which is currently not possible to implement is
to rate-limit flooded (broadcast, unknown unicast, unknown multicast)
traffic. Can your "qevents" be used to describe this, or must it be
described separately?

Thanks,
-Vladimir

Petr Machata May 27, 2020, 4:25 p.m. UTC | #4

Vladimir Oltean <olteanv@gmail.com> writes:

> I only took a cursory glance at your patches. Can these "qevents" be
> added to code outside of the packet scheduler, like to the bridge, for
> example? Or can the bridge mark the packets somehow, and then any
> generic qdisc be able to recognize this mark without specific code?
> A very common use case which is currently not possible to implement is
> to rate-limit flooded (broadcast, unknown unicast, unknown multicast)
> traffic. Can your "qevents" be used to describe this, or must it be
> described separately?

You mean something like a "flood" qevent? In principle nothing prevents
this, but it does not strike me as a very good fit. These events are
meant to be used on qdiscs, hence the "q" in the name. I am not sure it
makes sense to reuse them for bridge traffic policing.

Peeking in 802.1Q, I see "Managed objects for per-stream filtering and
policing". If that's related, it seems like it would be conservative to
model the policing directly in the bridge, instead of this round-about
through TC.

Cong Wang May 28, 2020, 4 a.m. UTC | #5

On Wed, May 27, 2020 at 2:56 AM Petr Machata <petrm@mellanox.com> wrote:
>
>
> Cong Wang <xiyou.wangcong@gmail.com> writes:
>
> > On Tue, May 26, 2020 at 10:11 AM Petr Machata <petrm@mellanox.com> wrote:
> >>
> >> The Spectrum hardware allows execution of one of several actions as a
> >> result of queue management events: tail-dropping, early-dropping, marking a
> >> packet, or passing a configured latency threshold or buffer size. Such
> >> packets can be mirrored, trapped, or sampled.
> >>
> >> Modeling the action to be taken as simply a TC action is very attractive,
> >> but it is not obvious where to put these actions. At least with ECN marking
> >> one could imagine a tree of qdiscs and classifiers that effectively
> >> accomplishes this task, albeit in an impractically complex manner. But
> >> there is just no way to match on dropped-ness of a packet, let alone
> >> dropped-ness due to a particular reason.
> >>
> >> To allow configuring user-defined actions as a result of inner workings of
> >> a qdisc, this patch set introduces a concept of qevents. Those are attach
> >> points for TC blocks, where filters can be put that are executed as the
> >> packet hits well-defined points in the qdisc algorithms. The attached
> >> blocks can be shared, in a manner similar to clsact ingress and egress
> >> blocks, arbitrary classifiers with arbitrary actions can be put on them,
> >> etc.
> >
> > This concept does not fit well into qdisc, essentially you still want to
> > install filters (and actions) somewhere on qdisc, but currently all filters
> > are executed at enqueue, basically you want to execute them at other
> > pre-defined locations too, for example early drop.
> >
> > So, perhaps adding a "position" in tc filter is better? Something like:
> >
> > tc qdisc add dev x root handle 1: ... # same as before
> > tc filter add dev x parent 1:0 position early_drop matchall action....
>
> Position would just be a chain index.

Why? By position, I mean a place where we _execute_ tc filters on
a qdisc, currently there is only "enqueue". I don't see how this is
close to a chain which is basically a group of filters.


>
> > And obviously default position must be "enqueue". Makes sense?
>
> Chain 0.
>
> So if I understand the proposal correctly: a qdisc has a classification
> block (cl_ops->tcf_block). Qevents then reference a chain on that
> classification block.

No, I am suggesting to replace your qevents with position, because
as I said it does not fit well there.

>
> If the above is correct, I disagree that this is a better model. RED
> does not need to classify anything, modelling this as a classification
> block is wrong. It should be another block. (Also shareable, because as
> an operator you likely want to treat all, say, early drops the same, and
> therefore to add just one rule for all 128 or so of your qdiscs.)

You can still choose not to classify anything by using matchall. No
one is saying you have to do classification.

If you want to jump to a block, you can just use a goto action like
normal. So your above example can be replaced with:

# tc qdisc add dev eth0 root handle 1: \
        red limit 500K avpkt 1K

# tc filter add dev eth0 parent 1:0 position early_drop matchall \
   action goto chain 10

# tc chain add dev eth0 index 10 ...

See the difference?

Thanks.

Petr Machata May 28, 2020, 9:48 a.m. UTC | #6

Cong Wang <xiyou.wangcong@gmail.com> writes:

> On Wed, May 27, 2020 at 2:56 AM Petr Machata <petrm@mellanox.com> wrote:
>>
>>
>> Cong Wang <xiyou.wangcong@gmail.com> writes:
>>
>> > On Tue, May 26, 2020 at 10:11 AM Petr Machata <petrm@mellanox.com> wrote:
>> >>
>> >> The Spectrum hardware allows execution of one of several actions as a
>> >> result of queue management events: tail-dropping, early-dropping, marking a
>> >> packet, or passing a configured latency threshold or buffer size. Such
>> >> packets can be mirrored, trapped, or sampled.
>> >>
>> >> Modeling the action to be taken as simply a TC action is very attractive,
>> >> but it is not obvious where to put these actions. At least with ECN marking
>> >> one could imagine a tree of qdiscs and classifiers that effectively
>> >> accomplishes this task, albeit in an impractically complex manner. But
>> >> there is just no way to match on dropped-ness of a packet, let alone
>> >> dropped-ness due to a particular reason.
>> >>
>> >> To allow configuring user-defined actions as a result of inner workings of
>> >> a qdisc, this patch set introduces a concept of qevents. Those are attach
>> >> points for TC blocks, where filters can be put that are executed as the
>> >> packet hits well-defined points in the qdisc algorithms. The attached
>> >> blocks can be shared, in a manner similar to clsact ingress and egress
>> >> blocks, arbitrary classifiers with arbitrary actions can be put on them,
>> >> etc.
>> >
>> > This concept does not fit well into qdisc, essentially you still want to
>> > install filters (and actions) somewhere on qdisc, but currently all filters
>> > are executed at enqueue, basically you want to execute them at other
>> > pre-defined locations too, for example early drop.
>> >
>> > So, perhaps adding a "position" in tc filter is better? Something like:
>> >
>> > tc qdisc add dev x root handle 1: ... # same as before
>> > tc filter add dev x parent 1:0 position early_drop matchall action....
>>
>> Position would just be a chain index.
>
> Why? By position, I mean a place where we _execute_ tc filters on
> a qdisc, currently there is only "enqueue". I don't see how this is
> close to a chain which is basically a group of filters.

OK, I misunderstood what you mean.

So you propose to have further division within the block? To have sort
of namespaces within blocks or chains, where depending on the context,
only filters in the corresponding namespace are executed?

>>
>> > And obviously default position must be "enqueue". Makes sense?
>>
>> Chain 0.
>>
>> So if I understand the proposal correctly: a qdisc has a classification
>> block (cl_ops->tcf_block). Qevents then reference a chain on that
>> classification block.
>
> No, I am suggesting to replace your qevents with position, because
> as I said it does not fit well there.
>
>>
>> If the above is correct, I disagree that this is a better model. RED
>> does not need to classify anything, modelling this as a classification
>> block is wrong. It should be another block. (Also shareable, because as
>> an operator you likely want to treat all, say, early drops the same, and
>> therefore to add just one rule for all 128 or so of your qdiscs.)
>
> You can still choose not to classify anything by using matchall. No
> one is saying you have to do classification.

The point here is filter reuse, not classification.

> If you want to jump to a block, you can just use a goto action like

I don't think you can jump to a block. You can jump to a chain within
the same block.

> normal. So your above example can be replaced with:
>
> # tc qdisc add dev eth0 root handle 1: \
>         red limit 500K avpkt 1K
>
> # tc filter add dev eth0 parent 1:0 position early_drop matchall \
>    action goto chain 10
>
> # tc chain add dev eth0 index 10 ...
>
> See the difference?

Cong Wang May 30, 2020, 4:48 a.m. UTC | #7

On Thu, May 28, 2020 at 2:48 AM Petr Machata <petrm@mellanox.com> wrote:
> So you propose to have further division within the block? To have sort
> of namespaces within blocks or chains, where depending on the context,
> only filters in the corresponding namespace are executed?

What I suggest is to let filters (or chain or block) decide where
they belong to, because I think that fit naturally.

What you suggest is to let qdisc's decide where they put filters,
which looks odd to me.

Hope it is all clear now.

Thanks.

Petr Machata May 30, 2020, 8:55 a.m. UTC | #8

Cong Wang <xiyou.wangcong@gmail.com> writes:

> On Thu, May 28, 2020 at 2:48 AM Petr Machata <petrm@mellanox.com> wrote:
>> So you propose to have further division within the block? To have sort
>> of namespaces within blocks or chains, where depending on the context,
>> only filters in the corresponding namespace are executed?
>
> What I suggest is to let filters (or chain or block) decide where
> they belong to, because I think that fit naturally.

So filters would have this attribute that marks them for execution in
the qevent context?

Does "goto chain" keep this position? I.e. are the executed filters from
the "position" context or not? If they keep the position, then this new
system can be equivalently modeled as simply a block binding point.

If "goto chain" loses the position, that is just a block binding point
and a chain to start on, instead of the default zero.

So introducing the position does not seem to allow us to model anything
that could not be modeled before. But it is a more complicated system.

> What you suggest is to let qdisc's decide where they put filters,
> which looks odd to me.

Ultimately the qdisc decides what qevents to expose. Qevents are closely
tied to the inner workings of a qdisc algorithm, they can't be probed as
modules the way qdiscs, filters and actions can. If a user wishes to
make use of them, they will have to let qdiscs "dictate" where to put
the filters one way or another.

Jiri Pirko June 1, 2020, 1:35 p.m. UTC | #9

Thu, May 28, 2020 at 11:48:50AM CEST, petrm@mellanox.com wrote:
>
>Cong Wang <xiyou.wangcong@gmail.com> writes:
>
>> On Wed, May 27, 2020 at 2:56 AM Petr Machata <petrm@mellanox.com> wrote:
>>>
>>>
>>> Cong Wang <xiyou.wangcong@gmail.com> writes:
>>>
>>> > On Tue, May 26, 2020 at 10:11 AM Petr Machata <petrm@mellanox.com> wrote:
>>> >>
>>> >> The Spectrum hardware allows execution of one of several actions as a
>>> >> result of queue management events: tail-dropping, early-dropping, marking a
>>> >> packet, or passing a configured latency threshold or buffer size. Such
>>> >> packets can be mirrored, trapped, or sampled.
>>> >>
>>> >> Modeling the action to be taken as simply a TC action is very attractive,
>>> >> but it is not obvious where to put these actions. At least with ECN marking
>>> >> one could imagine a tree of qdiscs and classifiers that effectively
>>> >> accomplishes this task, albeit in an impractically complex manner. But
>>> >> there is just no way to match on dropped-ness of a packet, let alone
>>> >> dropped-ness due to a particular reason.
>>> >>
>>> >> To allow configuring user-defined actions as a result of inner workings of
>>> >> a qdisc, this patch set introduces a concept of qevents. Those are attach
>>> >> points for TC blocks, where filters can be put that are executed as the
>>> >> packet hits well-defined points in the qdisc algorithms. The attached
>>> >> blocks can be shared, in a manner similar to clsact ingress and egress
>>> >> blocks, arbitrary classifiers with arbitrary actions can be put on them,
>>> >> etc.
>>> >
>>> > This concept does not fit well into qdisc, essentially you still want to
>>> > install filters (and actions) somewhere on qdisc, but currently all filters
>>> > are executed at enqueue, basically you want to execute them at other
>>> > pre-defined locations too, for example early drop.
>>> >
>>> > So, perhaps adding a "position" in tc filter is better? Something like:
>>> >
>>> > tc qdisc add dev x root handle 1: ... # same as before
>>> > tc filter add dev x parent 1:0 position early_drop matchall action....
>>>
>>> Position would just be a chain index.
>>
>> Why? By position, I mean a place where we _execute_ tc filters on
>> a qdisc, currently there is only "enqueue". I don't see how this is
>> close to a chain which is basically a group of filters.
>
>OK, I misunderstood what you mean.
>
>So you propose to have further division within the block? To have sort
>of namespaces within blocks or chains, where depending on the context,
>only filters in the corresponding namespace are executed?

Please take the block as a whole entity. It has one entry ->chain0
processing. The gotos to other chains should be contained within the
block.

Please don't divide the block. If you want to process the block from a
different entry point, please process it as a whole.


>
>>>
>>> > And obviously default position must be "enqueue". Makes sense?
>>>
>>> Chain 0.
>>>
>>> So if I understand the proposal correctly: a qdisc has a classification
>>> block (cl_ops->tcf_block). Qevents then reference a chain on that
>>> classification block.
>>
>> No, I am suggesting to replace your qevents with position, because
>> as I said it does not fit well there.
>>
>>>
>>> If the above is correct, I disagree that this is a better model. RED
>>> does not need to classify anything, modelling this as a classification
>>> block is wrong. It should be another block. (Also shareable, because as
>>> an operator you likely want to treat all, say, early drops the same, and
>>> therefore to add just one rule for all 128 or so of your qdiscs.)
>>
>> You can still choose not to classify anything by using matchall. No
>> one is saying you have to do classification.
>
>The point here is filter reuse, not classification.
>
>> If you want to jump to a block, you can just use a goto action like
>
>I don't think you can jump to a block. You can jump to a chain within
>the same block.

Correct.

entrypoint->
   block X
     chain 0
     chain A
     chain B
     chain ...

There's no goto/jump out of the block.

>
>> normal. So your above example can be replaced with:
>>
>> # tc qdisc add dev eth0 root handle 1: \
>>         red limit 500K avpkt 1K
>>
>> # tc filter add dev eth0 parent 1:0 position early_drop matchall \
>>    action goto chain 10
>>
>> # tc chain add dev eth0 index 10 ...
>>
>> See the difference?

Jiri Pirko June 1, 2020, 1:40 p.m. UTC | #10

Wed, May 27, 2020 at 06:09:03AM CEST, xiyou.wangcong@gmail.com wrote:
>On Tue, May 26, 2020 at 10:11 AM Petr Machata <petrm@mellanox.com> wrote:
>>
>> The Spectrum hardware allows execution of one of several actions as a
>> result of queue management events: tail-dropping, early-dropping, marking a
>> packet, or passing a configured latency threshold or buffer size. Such
>> packets can be mirrored, trapped, or sampled.
>>
>> Modeling the action to be taken as simply a TC action is very attractive,
>> but it is not obvious where to put these actions. At least with ECN marking
>> one could imagine a tree of qdiscs and classifiers that effectively
>> accomplishes this task, albeit in an impractically complex manner. But
>> there is just no way to match on dropped-ness of a packet, let alone
>> dropped-ness due to a particular reason.
>>
>> To allow configuring user-defined actions as a result of inner workings of
>> a qdisc, this patch set introduces a concept of qevents. Those are attach
>> points for TC blocks, where filters can be put that are executed as the
>> packet hits well-defined points in the qdisc algorithms. The attached
>> blocks can be shared, in a manner similar to clsact ingress and egress
>> blocks, arbitrary classifiers with arbitrary actions can be put on them,
>> etc.
>
>This concept does not fit well into qdisc, essentially you still want to
>install filters (and actions) somewhere on qdisc, but currently all filters
>are executed at enqueue, basically you want to execute them at other
>pre-defined locations too, for example early drop.
>
>So, perhaps adding a "position" in tc filter is better? Something like:
>
>tc qdisc add dev x root handle 1: ... # same as before
>tc filter add dev x parent 1:0 position early_drop matchall action....
>
>And obviously default position must be "enqueue". Makes sense?


Well, if you look at the examples in the cover letter, I think that they
are showing something very similar you are talking about:

# tc qdisc add dev eth0 root handle 1: \
        red limit 500K avpkt 1K qevent early block 10
# tc filter add block 10 \
        matchall action mirred egress mirror dev eth1


The first command just says "early drop position should be processed by
block 10"

The second command just adds a filter to the block 10.



We have this concept of blocks, we use them in "tc filter" as a handle.

The block as a unit could be attached to be processed not only to
"enqueue" but to anything else, like some qdisc stage.

Looks quite neat to me.



>
>(The word "position" may be not accurate, but hope you get my point
>here.)
>
>Thanks.

Cong Wang June 1, 2020, 7:50 p.m. UTC | #11

On Mon, Jun 1, 2020 at 6:40 AM Jiri Pirko <jiri@resnulli.us> wrote:
> The first command just says "early drop position should be processed by
> block 10"
>
> The second command just adds a filter to the block 10.

This is exactly why it looks odd to me, because it _reads_ like
'tc qdisc' creates the block to hold tc filters... I think tc filters should
create whatever placeholder for themselves.

I know in memory block (or chain or filters) are stored in qdisc, but
it is still different to me who initiates the creation.

Thanks.

Cong Wang June 1, 2020, 8:01 p.m. UTC | #12

On Sat, May 30, 2020 at 1:55 AM Petr Machata <petrm@mellanox.com> wrote:
>
>
> Cong Wang <xiyou.wangcong@gmail.com> writes:
>
> > On Thu, May 28, 2020 at 2:48 AM Petr Machata <petrm@mellanox.com> wrote:
> >> So you propose to have further division within the block? To have sort
> >> of namespaces within blocks or chains, where depending on the context,
> >> only filters in the corresponding namespace are executed?
> >
> > What I suggest is to let filters (or chain or block) decide where
> > they belong to, because I think that fit naturally.
>
> So filters would have this attribute that marks them for execution in
> the qevent context?

If you view it as position rather than qevent, sure, we already need to
specify the "context" for tc filter creations anyway, "dev... parent ..." is
is context to locate the filter placeholders. This is why it makes sense
to add one more piece, say "position", that is "dev... parent... position...".

It is you who calls it qevent which of course looks like only belong to
qdisc's.

>
> Ultimately the qdisc decides what qevents to expose. Qevents are closely
> tied to the inner workings of a qdisc algorithm, they can't be probed as
> modules the way qdiscs, filters and actions can. If a user wishes to
> make use of them, they will have to let qdiscs "dictate" where to put
> the filters one way or another.

For a specific event like early drop, sure. But if we think it generally,
"enqueue" has the same problem, there are a few qdisc's don't even
support filtering (noop). We can just return ENOSUPP anyway.

Petr Machata June 1, 2020, 10:37 p.m. UTC | #13

Cong Wang <xiyou.wangcong@gmail.com> writes:

> On Mon, Jun 1, 2020 at 6:40 AM Jiri Pirko <jiri@resnulli.us> wrote:
>> The first command just says "early drop position should be processed by
>> block 10"
>>
>> The second command just adds a filter to the block 10.

> This is exactly why it looks odd to me, because it _reads_ like
> 'tc qdisc' creates the block to hold tc filters... I think tc filters should
> create whatever placeholder for themselves.

Look at clsact. It creates blocks in exactly the same way.

> I know in memory block (or chain or filters) are stored in qdisc, but
> it is still different to me who initiates the creation.

The block binding mechanics are not new. The patch just reuses them. If
you are unhappy about how this is currently done, I too would see merit
in creating a block explicitly, like with chains. But it has nothing to
do with this patchset, which would just naturally pick up whatever this
new mechanic is.

Jiri Pirko June 2, 2020, 6:05 a.m. UTC | #14

Mon, Jun 01, 2020 at 09:50:23PM CEST, xiyou.wangcong@gmail.com wrote:
>On Mon, Jun 1, 2020 at 6:40 AM Jiri Pirko <jiri@resnulli.us> wrote:
>> The first command just says "early drop position should be processed by
>> block 10"
>>
>> The second command just adds a filter to the block 10.
>
>This is exactly why it looks odd to me, because it _reads_ like
>'tc qdisc' creates the block to hold tc filters... I think tc filters should
>create whatever placeholder for themselves.

That is how it is done already today. We have the block object. The
instances are created separatelly (clsact for example), user may specify
the block id to identify the block. Then the user may use this block id
to add/remove filters directly to/from the block.

What you propose with "position" on the other hand look unnatural for
me. It would slice the exising blocks in some way. Currently, the block
has 1 clearly defined entrypoint. With "positions", all of the sudden
there would be many of them? I can't really imagine how that is supposed
to work :/

>
>I know in memory block (or chain or filters) are stored in qdisc, but
>it is still different to me who initiates the creation.

Qdiscs create blocks.

>
>Thanks.

Cong Wang June 3, 2020, 7:05 a.m. UTC | #15

On Mon, Jun 1, 2020 at 11:05 PM Jiri Pirko <jiri@resnulli.us> wrote:
>
> Mon, Jun 01, 2020 at 09:50:23PM CEST, xiyou.wangcong@gmail.com wrote:
> >On Mon, Jun 1, 2020 at 6:40 AM Jiri Pirko <jiri@resnulli.us> wrote:
> >> The first command just says "early drop position should be processed by
> >> block 10"
> >>
> >> The second command just adds a filter to the block 10.
> >
> >This is exactly why it looks odd to me, because it _reads_ like
> >'tc qdisc' creates the block to hold tc filters... I think tc filters should
> >create whatever placeholder for themselves.
>
> That is how it is done already today. We have the block object. The
> instances are created separatelly (clsact for example), user may specify
> the block id to identify the block. Then the user may use this block id
> to add/remove filters directly to/from the block.
>
> What you propose with "position" on the other hand look unnatural for
> me. It would slice the exising blocks in some way. Currently, the block
> has 1 clearly defined entrypoint. With "positions", all of the sudden
> there would be many of them? I can't really imagine how that is supposed
> to work :/

I imagine we could introduce multiple blocks for a qdisc.

Currently we have:

struct some_qdisc_data {
  struct tcf_block *block;
};

Maybe we can extend it to:

struct some_qdisc_data {
  struct tcf_block *blocks[3];
};

#define ENQUEUE 0
#define DEQUEUE 1
#define DROP 2

static struct tcf_block *foo_tcf_block(struct Qdisc *sch, unsigned long cl,
                                            struct netlink_ext_ack
*extack, int position)
{
        struct some_qdisc_data *q = qdisc_priv(sch);

        if (cl)
                return NULL;
        return q->block[position];
}


Just some scratches on my mind.

Thanks.

Petr Machata June 3, 2020, 10:08 a.m. UTC | #16

Cong Wang <xiyou.wangcong@gmail.com> writes:

> I imagine we could introduce multiple blocks for a qdisc.

Yes, and that's what the patchset does. If you look at struct
tcf_qevent, it is just some block bookkeeping and an attribute name.

> Currently we have:
>
> struct some_qdisc_data {
>   struct tcf_block *block;
> };
>
> Maybe we can extend it to:
>
> struct some_qdisc_data {
>   struct tcf_block *blocks[3];

Yeah, except not all qdiscs will implement all qevents, so let's instead
make it a handful of fields, like in the patchset.

> };
>
> #define ENQUEUE 0
> #define DEQUEUE 1
> #define DROP 2
>
> static struct tcf_block *foo_tcf_block(struct Qdisc *sch, unsigned long cl,
>                                             struct netlink_ext_ack
> *extack, int position)
> {
>         struct some_qdisc_data *q = qdisc_priv(sch);
>
>         if (cl)
>                 return NULL;
>         return q->block[position];
> }

Interestingly, this is close to my original approach, pre-RFC. But there
needs to be this global list of all existing qevents. On its own, that's
a negative--at least it's an extra uAPI to maintain. What does it bring?

It theoretically allows one to refer to blocks symbolically, through
binding point coordinates (dev D parent P qevent Q) not by indices
(block B). But then one block could be referenced by several different
coordinates, which is confusing. That is the reason TC disallows editing
filters on shared blocks. Qevents should be the same.

What else is there?

[RFC,net-next,0/3] TC: Introduce qevents

Message

Comments