
[v2,net-next,0/7] dpaa2-eth: add support for Rx traffic classes

Message ID 20200515184753.15080-1-ioana.ciornei@nxp.com

Message

Ioana Ciornei May 15, 2020, 6:47 p.m. UTC
This patch set adds support for Rx traffic classes on DPAA2 Ethernet
devices.

The first two patches make the necessary changes so that multiple
traffic classes are configured and their statistics are displayed in the
debugfs. The third patch adds a static distribution to said traffic
classes based on the VLAN PCP field.

The last patches add support for the congestion group taildrop mechanism
that allows us to control the number of frames that can accumulate on a
group of Rx frame queues belonging to the same traffic class.
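
As a rough sketch of how a per-traffic-class congestion group taildrop threshold
could be applied (illustrative only; DPNI_CP_GROUP, the dpni_set_taildrop()
parameter order and the dpaa2_eth_tc_count() helper are assumptions based on
this series, not quoted from it):

#include "dpaa2-eth.h"

/* Sketch: enable a frame-count taildrop threshold on the congestion group
 * of every Rx traffic class.  DPNI_CP_GROUP and the dpni_set_taildrop()
 * parameters are assumptions based on this series.
 */
static int sketch_set_cgroup_taildrop(struct dpaa2_eth_priv *priv,
				      u32 max_frames)
{
	struct dpni_taildrop td = {
		.enable = 1,
		.units = DPNI_CONGESTION_UNIT_FRAMES,
		.threshold = max_frames,
	};
	int i, err;

	/* One congestion group per Rx traffic class */
	for (i = 0; i < dpaa2_eth_tc_count(priv); i++) {
		err = dpni_set_taildrop(priv->mc_io, 0, priv->mc_token,
					DPNI_CP_GROUP, DPNI_QUEUE_RX,
					i, 0, &td);
		if (err)
			return err;
	}

	return 0;
}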

Changes in v2:
  - use mask as u16* in set_vlan_qos - 3/7

Ioana Radulescu (7):
  dpaa2-eth: Add support for Rx traffic classes
  dpaa2-eth: Trim debugfs FQ stats
  dpaa2-eth: Distribute ingress frames based on VLAN prio
  dpaa2-eth: Add helper functions
  dpaa2-eth: Minor cleanup in dpaa2_eth_set_rx_taildrop()
  dpaa2-eth: Add congestion group taildrop
  dpaa2-eth: Update FQ taildrop threshold and buffer pool count

 .../freescale/dpaa2/dpaa2-eth-debugfs.c       |  11 +-
 .../net/ethernet/freescale/dpaa2/dpaa2-eth.c  | 225 +++++++++++++++---
 .../net/ethernet/freescale/dpaa2/dpaa2-eth.h  |  44 +++-
 .../ethernet/freescale/dpaa2/dpaa2-ethtool.c  |  24 +-
 .../net/ethernet/freescale/dpaa2/dpni-cmd.h   |  34 +++
 drivers/net/ethernet/freescale/dpaa2/dpni.c   | 131 ++++++++++
 drivers/net/ethernet/freescale/dpaa2/dpni.h   |  36 +++
 7 files changed, 450 insertions(+), 55 deletions(-)

Comments

Jakub Kicinski May 15, 2020, 7:20 p.m. UTC | #1
On Fri, 15 May 2020 21:47:46 +0300 Ioana Ciornei wrote:
> This patch set adds support for Rx traffic classes on DPAA2 Ethernet
> devices.
> 
> The first two patches make the necessary changes so that multiple
> traffic classes are configured and their statistics are displayed in the
> debugfs. The third patch adds a static distribution to said traffic
> classes based on the VLAN PCP field.
> 
> The last patches add support for the congestion group taildrop mechanism
> that allows us to control the number of frames that can accumulate on a
> group of Rx frame queues belonging to the same traffic class.

Ah, I missed that you already sent a v2. Same question applies:

> How is this configured from the user perspective? I looked through the
> patches and I see no information on how the input is taken from the
> user.
Ioana Ciornei May 15, 2020, 7:31 p.m. UTC | #2
> Subject: Re: [PATCH v2 net-next 0/7] dpaa2-eth: add support for Rx traffic
> classes
> 
> On Fri, 15 May 2020 21:47:46 +0300 Ioana Ciornei wrote:
> > This patch set adds support for Rx traffic classes on DPAA2 Ethernet
> > devices.
> >
> > The first two patches make the necessary changes so that multiple
> > traffic classes are configured and their statistics are displayed in
> > the debugfs. The third patch adds a static distribution to said
> > traffic classes based on the VLAN PCP field.
> >
> > The last patches add support for the congestion group taildrop
> > mechanism that allows us to control the number of frames that can
> > accumulate on a group of Rx frame queues belonging to the same traffic class.
> 
> Ah, I missed that you already sent a v2. Same question applies:
> 
> > How is this configured from the user perspective? I looked through the
> > patches and I see no information on how the input is taken from the
> > user.

There is no input taken from the user at the moment. The traffic class id is statically selected based on the VLAN PCP field. The configuration for this is added in patch 3/7.
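
To illustrate just that policy (a sketch, not the driver code: patch 3/7 programs
the mapping into the hardware QoS table rather than doing it in software, and how
out-of-range PCP values are folded is an assumption here):

#include <linux/if_vlan.h>
#include <linux/kernel.h>

/* Illustration only: the 3-bit VLAN PCP selects the Rx traffic class,
 * clamped to the number of TCs the interface was provisioned with.
 * In the real driver this lookup is done by hardware, not software.
 */
static u8 sketch_tc_from_vlan(u16 vlan_tci, u8 num_tcs)
{
	u8 pcp = (vlan_tci & VLAN_PRIO_MASK) >> VLAN_PRIO_SHIFT;

	return min_t(u8, pcp, num_tcs - 1);
}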

Ioana
Jakub Kicinski May 15, 2020, 7:40 p.m. UTC | #3
On Fri, 15 May 2020 19:31:18 +0000 Ioana Ciornei wrote:
> > Subject: Re: [PATCH v2 net-next 0/7] dpaa2-eth: add support for Rx traffic
> > classes
> > 
> > On Fri, 15 May 2020 21:47:46 +0300 Ioana Ciornei wrote:  
> > > This patch set adds support for Rx traffic classes on DPAA2 Ethernet
> > > devices.
> > >
> > > The first two patches make the necessary changes so that multiple
> > > traffic classes are configured and their statistics are displayed in
> > > the debugfs. The third patch adds a static distribution to said
> > > traffic classes based on the VLAN PCP field.
> > >
> > > The last patches add support for the congestion group taildrop
> > > mechanism that allows us to control the number of frames that can
> > > accumulate on a group of Rx frame queues belonging to the same traffic class.  
> > 
> > Ah, I missed that you already sent a v2. Same question applies:
> >   
> > > How is this configured from the user perspective? I looked through the
> > > patches and I see no information on how the input is taken from the
> > > user.  
> 
> There is no input taken from the user at the moment. The traffic
> class id is statically selected based on the VLAN PCP field. The
> configuration for this is added in patch 3/7.

Having some defaults for RX queues per TC is understandable. But patch 1
changes how many RX queues are used in the first place. Why, if the user
does not need RX queues per TC?
Ioana Ciornei May 15, 2020, 8:48 p.m. UTC | #4
> Subject: Re: [PATCH v2 net-next 0/7] dpaa2-eth: add support for Rx traffic
> classes
> 
> On Fri, 15 May 2020 19:31:18 +0000 Ioana Ciornei wrote:
> > > Subject: Re: [PATCH v2 net-next 0/7] dpaa2-eth: add support for Rx
> > > traffic classes
> > >
> > > On Fri, 15 May 2020 21:47:46 +0300 Ioana Ciornei wrote:
> > > > This patch set adds support for Rx traffic classes on DPAA2
> > > > Ethernet devices.
> > > >
> > > > The first two patches make the necessary changes so that multiple
> > > > traffic classes are configured and their statistics are displayed
> > > > in the debugfs. The third patch adds a static distribution to said
> > > > traffic classes based on the VLAN PCP field.
> > > >
> > > > The last patches add support for the congestion group taildrop
> > > > mechanism that allows us to control the number of frames that can
> > > > accumulate on a group of Rx frame queues belonging to the same traffic
> > > > class.
> > >
> > > Ah, I missed that you already sent a v2. Same question applies:
> > >
> > > > How is this configured from the user perspective? I looked through
> > > > the patches and I see no information on how the input is taken
> > > > from the user.
> >
> > There is no input taken from the user at the moment. The traffic class
> > id is statically selected based on the VLAN PCP field. The
> > configuration for this is added in patch 3/7.
> 
> Having some defaults for RX queue per TC is understandable. But patch 1
> changes how many RX queues are used in the first place. Why if user does not
> need RX queues per TC?

In DPAA2 we have a boot-time-configurable system in which the user can select,
for each interface, how many queues and how many traffic classes it needs.

The driver picks these up from firmware and configures the traffic class
distribution only if there is more than one requested.
With one TC the behavior of the driver is exactly as before.

Ioana
Jakub Kicinski May 15, 2020, 10:25 p.m. UTC | #5
On Fri, 15 May 2020 20:48:27 +0000 Ioana Ciornei wrote:
> > > There is no input taken from the user at the moment. The traffic class
> > > id is statically selected based on the VLAN PCP field. The
> > > configuration for this is added in patch 3/7.  
> > 
> > Having some defaults for RX queue per TC is understandable. But patch 1
> > changes how many RX queues are used in the first place. Why if user does not
> > need RX queues per TC?  
> 
> In DPAA2 we have a boot time configurable system in which the user can select
> for each interface how many queues and how many traffic classes it needs.

Looking at the UG online DPNI_CREATE has a NUM_RX_TCS param. You're not
using that for the kernel driver?

> The driver picks these up from firmware and configures the traffic class
> distribution only if there is more than one requested.
> With one TC the behavior of the driver is exactly as before.

Configuring things statically via some direct FW interface when the
system boots really sounds like a typical "using Linux to boot a
proprietary networking stack" scenario.

With the Rx QoS features users won't even be able to tell via standard
Linux interfaces what the config was.
David Miller May 15, 2020, 11:33 p.m. UTC | #6
From: Jakub Kicinski <kuba@kernel.org>
Date: Fri, 15 May 2020 15:25:00 -0700

> With the Rx QoS features users won't even be able to tell via standard
> Linux interfaces what the config was.

+1
Ioana Ciornei May 16, 2020, 8:16 a.m. UTC | #7
> Subject: Re: [PATCH v2 net-next 0/7] dpaa2-eth: add support for Rx traffic
> classes
> 
> On Fri, 15 May 2020 20:48:27 +0000 Ioana Ciornei wrote:
> > > > There is no input taken from the user at the moment. The traffic
> > > > class id is statically selected based on the VLAN PCP field. The
> > > > configuration for this is added in patch 3/7.
> > >
> > > Having some defaults for RX queue per TC is understandable. But
> > > patch 1 changes how many RX queues are used in the first place. Why
> > > if user does not need RX queues per TC?
> >
> > In DPAA2 we have a boot time configurable system in which the user can
> > select for each interface how many queues and how many traffic classes it
> > needs.
> 
> Looking at the UG online DPNI_CREATE has a NUM_RX_TCS param. You're not
> using that for the kernel driver?

I have to give a bit of context here. At the hardware level, DPAA2 is
reconfigurable: new interfaces (DPNIs) can be created or destroyed at runtime
using the DPNI_CREATE command that you found in the UG.
This runtime reconfiguration is not supported upstream. What we rely on
upstream is a static configuration of all the resources (how many interfaces
are needed and how these interfaces should be provisioned), which is applied
and configured at boot time, even before Linux boots and gets to probe those
interfaces.

In the kernel driver we just read the num_tcs and num_queues parameters
(which are set in stone by then) and configure everything based on them.
This is why the kernel driver does not use the DPNI_CREATE command at all.
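
For reference, a minimal sketch of that probe-time step (dpni_get_attributes()
and the dpni_attr fields are what the upstream driver already uses; the
surrounding context and logging are simplified and partly assumed here):

#include "dpaa2-eth.h"

/* Sketch: at probe time the driver only queries the DPNI object that the
 * boot-time configuration already created; it never issues DPNI_CREATE.
 * Error handling and surrounding context are simplified.
 */
static int sketch_read_dpni_config(struct dpaa2_eth_priv *priv)
{
	int err;

	err = dpni_get_attributes(priv->mc_io, 0, priv->mc_token,
				  &priv->dpni_attrs);
	if (err)
		return err;

	/* num_queues: Rx queues per traffic class (typically one per core)
	 * num_tcs:    traffic classes provisioned for this interface
	 * Both are fixed before Linux boots; the driver only distributes
	 * traffic across them.
	 */
	dev_info(priv->net_dev->dev.parent, "%u queues, %u traffic classes\n",
		 priv->dpni_attrs.num_queues, priv->dpni_attrs.num_tcs);

	return 0;
}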
 
> 
> > The driver picks these up from firmware and configures the traffic
> > class distribution only if there is more than one requested.
> > With one TC the behavior of the driver is exactly as before.
> 
> This configuring things statically via some direct FW interface when system
> boots really sounds like a typical "using Linux to boot a proprietary networking
> stack" scenario.

I may have explained it poorly before, but the kernel is not in control of how
many TCs or queues there are; it merely configures the distribution
depending on what the hardware was set up for.

> 
> With the Rx QoS features users won't even be able to tell via standard Linux
> interfaces what the config was.

Ok, that is true. So how should this information be exported to the user?

Ioana
Jakub Kicinski May 18, 2020, 7:35 p.m. UTC | #8
On Sat, 16 May 2020 08:16:47 +0000 Ioana Ciornei wrote:
> > With the Rx QoS features users won't even be able to tell via standard Linux
> > interfaces what the config was.  
> 
> Ok, that is true. So how should this information be exported to the user?

I believe no such interface currently exists.
Ioana Ciornei May 19, 2020, 7:38 a.m. UTC | #9
> Subject: Re: [PATCH v2 net-next 0/7] dpaa2-eth: add support for Rx traffic
> classes
> 
> On Sat, 16 May 2020 08:16:47 +0000 Ioana Ciornei wrote:
> > > With the Rx QoS features users won't even be able to tell via
> > > standard Linux interfaces what the config was.
> >
> > Ok, that is true. So how should this information be exported to the user?
> 
> I believe no such interface currently exists.

I am having a bit of trouble understanding what should be the route for this feature to get accepted.

Is the problem having the classification to a TC based on the VLAN PCP or is there anything else?

Regards,
Ioana
Jakub Kicinski May 19, 2020, 6:43 p.m. UTC | #10
On Tue, 19 May 2020 07:38:57 +0000 Ioana Ciornei wrote:
> > Subject: Re: [PATCH v2 net-next 0/7] dpaa2-eth: add support for Rx traffic classes
> > 
> > On Sat, 16 May 2020 08:16:47 +0000 Ioana Ciornei wrote:  
> > > > With the Rx QoS features users won't even be able to tell via
> > > > standard Linux interfaces what the config was.  
> > >
> > > Ok, that is true. So how should this information be exported to the user?  
> > 
> > I believe no such interface currently exists.  
> 
> I am having a bit of trouble understanding what should be the route
> for this feature to get accepted.

What's the feature you're trying to get accepted? Driver's datapath to
behave correctly when some proprietary out-of-tree API is used to do the
actual configuration? Unexciting.

> Is the problem having the classification to a TC based on the VLAN
> PCP or is there anything else?

What you have is basically RX version of mqprio, right? Multiple rings
per "channel" each gets frames with specific priorities? This needs to
be well integrated with the rest of the stack, but I don't think TC
qdisc offload is a fit. Given we don't have qdiscs on ingress. As I
said a new API for this would most likely have to be created.
Ioana Ciornei May 19, 2020, 8:58 p.m. UTC | #11
> Subject: Re: [PATCH v2 net-next 0/7] dpaa2-eth: add support for Rx traffic
> classes
> 
> On Tue, 19 May 2020 07:38:57 +0000 Ioana Ciornei wrote:
> > > Subject: Re: [PATCH v2 net-next 0/7] dpaa2-eth: add support for Rx
> > > traffic classes
> > >
> > > On Sat, 16 May 2020 08:16:47 +0000 Ioana Ciornei wrote:
> > > > > With the Rx QoS features users won't even be able to tell via
> > > > > standard Linux interfaces what the config was.
> > > >
> > > > Ok, that is true. So how should this information be exported to the user?
> > >
> > > I believe no such interface currently exists.
> >
> > I am having a bit of trouble understanding what should be the route
> > for this feature to get accepted.
> 
> What's the feature you're trying to get accepted? Driver's datapath to behave
> correctly when some proprietary out-of-tree API is used to do the actual
> configuration? Unexciting.
> 

I don't think this comment is very productive. The out-of-tree API has nothing
to do with this. The NIC parameters like number of traffic classes and number of
queues per traffic class are fixed as far as the Linux driver is concerned. 

What this patch set does is make use of the queues corresponding to traffic classes
other than 0, if those exist.

> > Is the problem having the classification to a TC based on the VLAN PCP
> > or is there anything else?
> 
> What you have is basically RX version of mqprio, right? Multiple rings per
> "channel" each gets frames with specific priorities? 

You can think of it like that (with the caveat, indeed, that mqprio cannot be
attached to the ingress qdisc).
DPAA2 has many peculiarities but I don't think this is one of them. There are multiple
Rx queues, and the ones given the same hardware priority at dequeue time together form
a traffic class. The assumption is that you don't want low-priority traffic to cause
congestion for high-priority traffic, that you want lower latency for high-priority
traffic, etc.
Some 802.1Q standards like PFC actually depend on having multiple traffic classes too,
based on VLAN PCP.

> This needs to be well
> integrated with the rest of the stack, but I don't think TC qdisc offload is a fit.
> Given we don't have qdiscs on ingress. As I said a new API for this would most
> likely have to be created.

For just assigning a traffic class based on packet headers a tc filter with the
skbedit priority action on ingress is enough (maybe even too much since there are
other drivers that have the same default prioritization based on VLAN PCP).

But you are correct that this would not be enough to cover all possible use cases except
for the most simple ones. There are per-traffic class ingress policers, and those become
tricky to support since there's nothing that denotes the traffic class to match on,
currently. I see 2 possible approaches, each with its own drawbacks:
- Allow clsact to be classful, similar to mqprio, and attach filters to its classes (each
  class would correspond to an ingress traffic class). But this would make the skbedit
  action redundant, since QoS classification with a classful clsact should be done
  completely differently now. Also, the classful clsact would have to deny qdiscs attached
  to it that don't make sense, because all of those were written with egress in mind.
- Try to linearize the ingress filter rules under the classless clsact, both the ones that
  have a skbedit action, and the ones that match on a skb priority in order to perform
  ingress policing. But this would be very brittle because the matching order would depend
  on the order in which the rules were introduced:
  rule 1: flower skb-priority 5 action police rate 34Mbps # note: matching on skb-priority doesn't exist (yet?)
  rule 2: flower vlan_prio 5 action skbedit priority 5
  In this case, traffic with VLAN priority 5 would not get rate-limited to 34Mbps.

So this is one of the reasons why I preferred to defer the hard questions and start with
something simple (which for some reason still gets pushback).

Ioana
Jakub Kicinski May 19, 2020, 9:35 p.m. UTC | #12
On Tue, 19 May 2020 20:58:50 +0000 Ioana Ciornei wrote:
> > This needs to be well
> > integrated with the rest of the stack, but I don't think TC qdisc offload is a fit.
> > Given we don't have qdiscs on ingress. As I said a new API for this would most
> > likely have to be created.  
> 
> For just assigning a traffic class based on packet headers a tc filter with the
> skbedit priority action on ingress is enough (maybe even too much since there are
> other drivers that have the same default prioritization based on VLAN PCP).
> 
> But you are correct that this would not be enough to cover all possible use cases except
> for the most simple ones. There are per-traffic class ingress policers, and those become
> tricky to support since there's nothing that denotes the traffic class to match on,
> currently. I see 2 possible approaches, each with its own drawbacks:
> - Allow clsact to be classful, similar to mqprio, and attach filters to its classes (each
>   class would correspond to an ingress traffic class). But this would make the skbedit
>   action redundant, since QoS classification with a classful clsact should be done
>   completely differently now. Also, the classful clsact would have to deny qdiscs attached
>   to it that don't make sense, because all of those were written with egress in mind.
> - Try to linearize the ingress filter rules under the classless clsact, both the ones that
>   have a skbedit action, and the ones that match on a skb priority in order to perform
>   ingress policing. But this would be very brittle because the matching order would depend
>   on the order in which the rules were introduced:
>   rule 1: flower skb-priority 5 action police rate 34Mbps # note: matching on skb-priority doesn't exist (yet?)
>   rule 2: flower vlan_prio 5 action skbedit priority 5
>   In this case, traffic with VLAN priority 5 would not get rate-limited to 34Mbps.
> 
> So this is one of the reasons why I preferred to defer the hard questions and start with
> something simple (which for some reason still gets pushback).

You're jumping to classification while the configuration of the queues
itself is still not defined. How does the user know how many queues
there are to classify into?

Does this driver have descriptor rings for RX / completion? How does it
decide which queue to poll at NAPI time?
Ioana Ciornei May 20, 2020, 3:10 p.m. UTC | #13
> Subject: Re: [PATCH v2 net-next 0/7] dpaa2-eth: add support for Rx traffic
> classes
> 
> On Tue, 19 May 2020 20:58:50 +0000 Ioana Ciornei wrote:
> > > This needs to be well
> > > integrated with the rest of the stack, but I don't think TC qdisc offload is a fit.
> > > Given we don't have qdiscs on ingress. As I said a new API for this
> > > would most likely have to be created.
> >
> > For just assigning a traffic class based on packet headers a tc filter
> > with the skbedit priority action on ingress is enough (maybe even too
> > much since there are other drivers that have the same default
> > prioritization based on VLAN PCP).
> >
> > But you are correct that this would not be enough to cover all
> > possible use cases except for the most simple ones. There are
> > per-traffic class ingress policers, and those become tricky to support
> > since there's nothing that denotes the traffic class to match on,
> > currently. I see 2 possible approaches, each with its own drawbacks:
> > - Allow clsact to be classful, similar to mqprio, and attach filters to
> >   its classes (each class would correspond to an ingress traffic class).
> >   But this would make the skbedit action redundant, since QoS
> >   classification with a classful clsact should be done completely
> >   differently now. Also, the classful clsact would have to deny qdiscs
> >   attached to it that don't make sense, because all of those were
> >   written with egress in mind.
> > - Try to linearize the ingress filter rules under the classless clsact,
> >   both the ones that have a skbedit action, and the ones that match on
> >   a skb priority in order to perform ingress policing. But this would
> >   be very brittle because the matching order would depend on the order
> >   in which the rules were introduced:
> >   rule 1: flower skb-priority 5 action police rate 34Mbps # note: matching on skb-priority doesn't exist (yet?)
> >   rule 2: flower vlan_prio 5 action skbedit priority 5
> >   In this case, traffic with VLAN priority 5 would not get rate-limited
> >   to 34Mbps.
> >
> > So this is one of the reasons why I preferred to defer the hard
> > questions and start with something simple (which for some reason still
> > gets pushback).
> 
> You're jumping to classification while the configuration of the queues itself is
> still not defined. How does the user know how many queues there are to classify
> into?
> 
> Does this driver have descriptor rings for RX / completion? How does it decide
> which queue to poll at NAPI time?

DPAA2 has frame queues per Rx traffic class and the decision from which queue
to pull frames from is made by the HW based on the queue priority within a channel
(there is one channel per CPU).

If this should be modeled in software, then I assume there should be a NAPI instance
for each traffic class and the stack should know in which order to call the poll()
callbacks so that the priority is respected.

Ioana
Jakub Kicinski May 20, 2020, 7:12 p.m. UTC | #14
On Wed, 20 May 2020 15:10:42 +0000 Ioana Ciornei wrote:
> DPAA2 has frame queues per each Rx traffic class and the decision from which queue
> to pull frames from is made by the HW based on the queue priority within a channel
> (there is one channel per each CPU).

IOW you're reading the descriptor for the device memory/iomem address
and the HW will return the next descriptor based on configured priority?
Presumably strict priority?

> If this should be modeled in software, then I assume there should be a NAPI instance
> for each traffic class and the stack should know in which order to call the poll()
> callbacks so that the priority is respected.

Right, something like that. But IMHO not needed if HW can serve the
right descriptor upon poll.
Ioana Ciornei May 20, 2020, 8:24 p.m. UTC | #15
> Subject: Re: [PATCH v2 net-next 0/7] dpaa2-eth: add support for Rx traffic
> classes
> 
> On Wed, 20 May 2020 15:10:42 +0000 Ioana Ciornei wrote:
> > DPAA2 has frame queues per each Rx traffic class and the decision from
> > which queue to pull frames from is made by the HW based on the queue
> > priority within a channel (there is one channel per each CPU).
> 
> IOW you're reading the descriptor for the device memory/iomem address and
> the HW will return the next descriptor based on configured priority?

That's the general idea but the decision is not made on a frame-by-frame basis
but rather on a dequeue operation which can, at a maximum, return
16 frame descriptors at a time.

> Presumably strict priority?

Only the two highest traffic classes are in strict priority, while the other 6 TCs
form two priority tiers - medium (4 TCs) and low (last two TCs).

> 
> > If this should be modeled in software, then I assume there should be a
> > NAPI instance for each traffic class and the stack should know in
> > which order to call the poll() callbacks so that the priority is respected.
> 
> Right, something like that. But IMHO not needed if HW can serve the right
> descriptor upon poll.

After thinking this through I don't actually believe that multiple NAPI instances
would solve this in any circumstance at all:

- If you have hardware prioritization with full scheduling on dequeue, then the job on
the driver side is already done.
- If you only have hardware assist for prioritization (i.e. hardware gives you multiple
rings but doesn't tell you from which one to dequeue), then you can still use a single
NAPI instance just fine and pick the highest-priority non-empty ring on the fly,
basically (see the sketch below).
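
A minimal sketch of that second case, with a single NAPI instance walking the
rings in priority order (generic and hypothetical: none of these types or
helpers are dpaa2-eth code, since on DPAA2 the selection is offloaded to
hardware at dequeue time):

#include <linux/netdevice.h>

/* Hypothetical per-channel state: one ring per traffic class, index 0
 * being the highest priority.  Purely illustrative.
 */
struct prio_ring {
	/* ... descriptor ring state ... */
};

struct prio_channel {
	struct napi_struct napi;
	struct prio_ring rings[8];	/* one per TC, 0 = highest prio */
	int num_rings;
};

static bool ring_has_work(struct prio_ring *ring)
{
	/* would check the ring's consumer/producer indices */
	return false;
}

static int ring_poll(struct prio_ring *ring, int budget)
{
	/* would dequeue up to @budget frames and push them up the stack */
	return 0;
}

static int prio_channel_poll(struct napi_struct *napi, int budget)
{
	struct prio_channel *ch = container_of(napi, struct prio_channel, napi);
	int work_done = 0;
	int i;

	/* Always start from the highest-priority ring, so that within every
	 * NAPI budget high-priority traffic preempts lower-priority traffic
	 * that may be queued behind it.
	 */
	for (i = 0; i < ch->num_rings && work_done < budget; i++) {
		if (!ring_has_work(&ch->rings[i]))
			continue;
		work_done += ring_poll(&ch->rings[i], budget - work_done);
	}

	if (work_done < budget)
		napi_complete_done(napi, work_done);

	return work_done;
}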

What I am having trouble understanding is how a fully software implementation
of this possible new Rx qdisc should work. Somehow the skb->priority should be taken
into account while the skb is passing through the stack (i.e. a higher-priority skb
should be able to overtake a previously received lower-priority skb when the latter's
priority queue is congested).

I don't have a very deep understanding of the stack but I am thinking that the
enqueue_to_backlog()/process_backlog() area could be a candidate place for sorting out
bottlenecks. If we did that, I don't see why a qdisc would be necessary at all; everybody
could benefit from prioritization based on skb->priority.

Ioana
Jakub Kicinski May 21, 2020, 7:07 p.m. UTC | #16
On Wed, 20 May 2020 20:24:43 +0000 Ioana Ciornei wrote:
> > Subject: Re: [PATCH v2 net-next 0/7] dpaa2-eth: add support for Rx traffic
> > classes
> > 
> > On Wed, 20 May 2020 15:10:42 +0000 Ioana Ciornei wrote:  
> > > DPAA2 has frame queues per each Rx traffic class and the decision from
> > > which queue to pull frames from is made by the HW based on the queue
> > > priority within a channel (there is one channel per each CPU).  
> > 
> > IOW you're reading the descriptor for the device memory/iomem address and
> > the HW will return the next descriptor based on configured priority?  
> 
> That's the general idea but the decision is not made on a frame-by-frame basis
> but rather on a dequeue operation which can, at a maximum, return
> 16 frame descriptors at a time.

I see!

> > Presumably strict priority?  
> 
> Only the two highest traffic classes are in strict priority, while the other 6 TCs
> form two priority tiers - medium(4 TCs) and low (last two TCs).
> 
> > > If this should be modeled in software, then I assume there should be a
> > > NAPI instance for each traffic class and the stack should know in
> > > which order to call the poll() callbacks so that the priority is respected.  
> > 
> > Right, something like that. But IMHO not needed if HW can serve the right
> > descriptor upon poll.  
> 
> After thinking this through I don't actually believe that multiple NAPI instances
> would solve this in any circumstance at all:
> 
> - If you have hardware prioritization with full scheduling on dequeue then job on the
> driver side is already done.
> - If you only have hardware assist for prioritization (ie hardware gives you multiple
> rings but doesn't tell you from which one to dequeue) then you can still use a single
> NAPI instance just fine and pick the highest priority non-empty ring on-the-fly basically.
> 
> What I am having trouble understanding is how the fully software implementation
> of this possible new Rx qdisc should work. Somehow the skb->priority should be taken
> into account when the skb is passing though the stack (ie a higher priority skb should
> surpass another previously received skb even if the latter one was received first, but
> its priority queue is congested).

I'd think the SW implementation would come down to which ring to
service first. If there are multiple rings on the host NAPI can try
to read from highest priority ring first and then move on to next prio.
Not sure if there would be a use case for multiple NAPIs for busy
polling or not.

I was hoping we could solve this with the new ring config API (which is
coming any day now, ehh) - in which I hope user space will be able to
assign rings to NAPI instances; all we would have needed on top of that
is control over the querying order. But that doesn't really work for you,
it seems, since the selection is offloaded to HW :S

> I don't have a very deep understanding of the stack but I am thinking that the
> enqueue_to_backlog()/process_backlog() area could be a candidate place for sorting out
> bottlenecks. In case we do that I don't see why a qdisc would be necessary at all and not
> have everybody benefit from prioritization based on skb->priority.

I think once the driver picks the frame up it should run with it to
completion (+/-GRO). We have natural batching with NAPI processing.
Every NAPI budget high priority rings get a chance to preempt lower
ones.
Ioana Ciornei May 22, 2020, 1:58 p.m. UTC | #17
> Subject: Re: [PATCH v2 net-next 0/7] dpaa2-eth: add support for Rx traffic
> classes
> 
> On Wed, 20 May 2020 20:24:43 +0000 Ioana Ciornei wrote:
> > > Subject: Re: [PATCH v2 net-next 0/7] dpaa2-eth: add support for Rx
> > > traffic classes
> > >
> > > On Wed, 20 May 2020 15:10:42 +0000 Ioana Ciornei wrote:
> > > > DPAA2 has frame queues per each Rx traffic class and the decision
> > > > from which queue to pull frames from is made by the HW based on
> > > > the queue priority within a channel (there is one channel per each CPU).
> > >
> > > IOW you're reading the descriptor for the device memory/iomem
> > > address and the HW will return the next descriptor based on configured
> > > priority?
> >
> > That's the general idea but the decision is not made on a frame by
> > frame basis but rather on a dequeue operation which can, at a maximum,
> > return
> > 16 frame descriptors at a time.
> 
> I see!
> 
> > > Presumably strict priority?
> >
> > Only the two highest traffic classes are in strict priority, while the
> > other 6 TCs form two priority tiers - medium(4 TCs) and low (last two TCs).
> >
> > > > If this should be modeled in software, then I assume there should
> > > > be a NAPI instance for each traffic class and the stack should
> > > > know in which order to call the poll() callbacks so that the priority is
> > > > respected.
> > >
> > > Right, something like that. But IMHO not needed if HW can serve the
> > > right descriptor upon poll.
> >
> > After thinking this through I don't actually believe that multiple
> > NAPI instances would solve this in any circumstance at all:
> >
> > - If you have hardware prioritization with full scheduling on dequeue
> > then job on the driver side is already done.
> > - If you only have hardware assist for prioritization (ie hardware
> > gives you multiple rings but doesn't tell you from which one to
> > dequeue) then you can still use a single NAPI instance just fine and pick the
> > highest priority non-empty ring on-the-fly basically.
> >
> > What I am having trouble understanding is how the fully software
> > implementation of this possible new Rx qdisc should work. Somehow the
> > skb->priority should be taken into account when the skb is passing
> > though the stack (ie a higher priority skb should surpass another
> > previously received skb even if the latter one was received first, but its priority
> > queue is congested).
> 
> I'd think the SW implementation would come down to which ring to service first.
> If there are multiple rings on the host NAPI can try to read from highest priority
> ring first and then move on to next prio.
> Not sure if there would be a use case for multiple NAPIs for busy polling or not.
> 
> I was hoping we can solve this with the new ring config API (which is coming any
> day now, ehh) - in which I hope user space will be able to assign rings to NAPI
> instances, all we would have needed would be also controlling the querying
> order. But that doesn't really work for you, it seems, since the selection is
> offloaded to HW :S
> 

Yes, I would need only the configuration of traffic classes and their priorities and
not the software prioritization.

I'll keep a close eye on the mailing list to see what the new ring config API
that you're referring to is adding.

> > I don't have a very deep understanding of the stack but I am thinking
> > that the
> > enqueue_to_backlog()/process_backlog() area could be a candidate place
> > for sorting out bottlenecks. In case we do that I don't see why a
> > qdisc would be necessary at all and not have everybody benefit from
> > prioritization based on skb->priority.
> 
> I think once the driver picks the frame up it should run with it to completion (+/-
> GRO). We have natural batching with NAPI processing.
> Every NAPI budget high priority rings get a chance to preempt lower ones.
Ioana Ciornei May 29, 2020, 11:45 a.m. UTC | #18
> Subject: Re: [PATCH v2 net-next 0/7] dpaa2-eth: add support for Rx traffic
> classes
> 
> On Sat, 16 May 2020 08:16:47 +0000 Ioana Ciornei wrote:
> > > With the Rx QoS features users won't even be able to tell via
> > > standard Linux interfaces what the config was.
> >
> > Ok, that is true. So how should this information be exported to the user?
> 
> I believe no such interface currently exists.

Somehow I missed this the first time around but the number of Rx traffic classes
can be exported through the DCB ops if those traffic classes are PFC enabled.
Also, adding PFC support was the main target of this patch set.

An output like the following would convey to the user how many traffic classes
are available and which of them are PFC enabled.

root@localhost:~# lldptool -t -i eth1 -V PFC
IEEE 8021QAZ PFC TLV
         Willing: yes
         MACsec Bypass Capable: no
         PFC capable traffic classes: 8
         PFC enabled: 1 3 7
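
On the driver side this maps onto the standard dcbnl ops; a rough sketch of the
relevant callback (struct ieee_pfc and dcbnl_rtnl_ops are the standard kernel
API, while the dpaa2-specific priv fields and helpers are assumptions here,
since the actual PFC code is not part of this series):

#include <net/dcbnl.h>
#include "dpaa2-eth.h"

/* Sketch: report PFC capability through the standard DCB netlink
 * interface, which is what lldptool queries.  The pfc_enabled bitmap
 * and dpaa2_eth_tc_count() usage are assumptions for illustration.
 */
static int sketch_dcbnl_ieee_getpfc(struct net_device *net_dev,
				    struct ieee_pfc *pfc)
{
	struct dpaa2_eth_priv *priv = netdev_priv(net_dev);

	pfc->pfc_cap = dpaa2_eth_tc_count(priv);  /* e.g. 8 */
	pfc->pfc_en = priv->pfc_enabled;          /* bitmap, e.g. BIT(1) | BIT(3) | BIT(7) */

	return 0;
}

static const struct dcbnl_rtnl_ops sketch_dcbnl_ops = {
	.ieee_getpfc = sketch_dcbnl_ieee_getpfc,
};

/* Hooked up at probe time with: net_dev->dcbnl_ops = &sketch_dcbnl_ops; */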

Ioana
Jakub Kicinski May 29, 2020, 7:40 p.m. UTC | #19
On Fri, 29 May 2020 11:45:08 +0000 Ioana Ciornei wrote:
> > Subject: Re: [PATCH v2 net-next 0/7] dpaa2-eth: add support for Rx traffic
> > classes
> > 
> > On Sat, 16 May 2020 08:16:47 +0000 Ioana Ciornei wrote:  
> > > > With the Rx QoS features users won't even be able to tell via
> > > > standard Linux interfaces what the config was.  
> > >
> > > Ok, that is true. So how should this information be exported to the user?  
> > 
> > I believe no such interface currently exists.  
> 
> Somehow I missed this the first time around but the number of Rx traffic classes
> can be exported through the DCB ops if those traffic classes are PFC enabled.
> Also, adding PFC support was the main target of this patch set.
> 
> An output like the following would convey to the user how many traffic classes
> are available and which of them are PFC enabled.
> 
> root@localhost:~# lldptool -t -i eth1 -V PFC
> IEEE 8021QAZ PFC TLV
>          Willing: yes
>          MACsec Bypass Capable: no
>          PFC capable traffic classes: 8
>          PFC enabled: 1 3 7

Ack, the DCB APIs are probably the closest today to what you need. I'm
not sure whether there is an established relation between the TCs
there and the number of queues reported and used in ethtool, though :(