
[RFC,net-next,0/6] devlink: Add device metric support

Message ID 20200817125059.193242-1-idosch@idosch.org

Message

Ido Schimmel Aug. 17, 2020, 12:50 p.m. UTC
From: Ido Schimmel <idosch@nvidia.com>

This patch set extends devlink to allow device drivers to expose device
metrics to user space in a standard and extensible fashion, as opposed
to the driver-specific debugfs approach.

This is joint work with Amit Cohen and Danielle Ratson, done during a
two-day company hackathon.

Motivation
==========

Certain devices have metrics (e.g., diagnostic counters, histograms)
that are useful to expose to user space for testing and debugging
purposes.

Currently, there is no standardized interface through which these
metrics can be exposed, and most drivers resort to debugfs, which is not
very welcome in the networking subsystem, for good reasons. For one, it
is not a stable interface on which users can rely. Secondly, it results
in duplicated code and inconsistent interfaces when drivers implement
similar functionality.

While Ethernet drivers can expose per-port counters to user space via
ethtool, they cannot expose device-wide metrics or configurable metrics
such as counters that can be enabled / disabled or histogram agents.

Solution overview
=================

Currently, the only supported metric type is a counter; histograms will
be added in the future. The interface is:

devlink dev metric show [ DEV metric METRIC | group GROUP ]
devlink dev metric set DEV metric METRIC [ group GROUP ]

Device drivers can dynamically register or unregister their metrics with
devlink by calling devlink_metric_counter_create() /
devlink_metric_destroy().
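
For illustration only, a driver-side registration might look roughly
like the sketch below. The ops structure, the devlink_metric_priv()
helper and the exact signatures are assumptions made for the sketch
(only the two function names above are referenced in this cover
letter); see the first patch for the actual definitions:

#include <linux/err.h>
#include <net/devlink.h>

struct foo_metrics {
	struct devlink_metric *rx_cache_miss;
	u64 rx_cache_misses;
};

/* Counter read callback invoked when user space queries the metric
 * with "devlink -s dev metric show". Name and signature are assumed.
 */
static int foo_rx_cache_miss_get(struct devlink_metric *metric, u64 *p_val)
{
	struct foo_metrics *m = devlink_metric_priv(metric);

	*p_val = READ_ONCE(m->rx_cache_misses);
	return 0;
}

static const struct devlink_metric_ops foo_rx_cache_miss_ops = {
	.counter_value_get = foo_rx_cache_miss_get,
};

static int foo_metrics_init(struct devlink *devlink, struct foo_metrics *m)
{
	/* Register a counter named "rx_cache_miss" with devlink. */
	m->rx_cache_miss = devlink_metric_counter_create(devlink,
							 "rx_cache_miss",
							 &foo_rx_cache_miss_ops,
							 m);
	return PTR_ERR_OR_ZERO(m->rx_cache_miss);
}

static void foo_metrics_fini(struct devlink *devlink, struct foo_metrics *m)
{
	devlink_metric_destroy(devlink, m->rx_cache_miss);
}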

Grouping allows user space to bundle related metrics together so that
they can be queried from the kernel with a single request and retrieved
in a single response filtered by the kernel (i.e., the kernel sets
NLM_F_DUMP_FILTERED).

Example
=======

Instantiate two netdevsim devices:

# echo "10 1" > /sys/bus/netdevsim/new_device
# echo "20 1" > /sys/bus/netdevsim/new_device

Dump all available metrics:

# devlink -s dev metric show
netdevsim/netdevsim10:
   metric dummy_counter type counter group 0 value 2
netdevsim/netdevsim20:
   metric dummy_counter type counter group 0 value 2

Dump a specific metric:

# devlink -s dev metric show netdevsim/netdevsim10 metric dummy_counter
netdevsim/netdevsim10:
   metric dummy_counter type counter group 0 value 3

Set a metric to a different group:

# devlink dev metric set netdevsim/netdevsim10 metric dummy_counter group 10

Dump all metrics in a specific group:

# devlink -s dev metric show group 10
netdevsim/netdevsim10:
   metric dummy_counter type counter group 10 value 4

Future extensions
=================

1. Enablement and disablement of metrics. This is useful when a metric
adds latency while enabled or consumes limited resources (e.g.,
counters or histogram agents). It is up to the device driver to decide
whether a metric is enabled by default. Proposed interface:

devlink dev metric set DEV metric METRIC [ group GROUP ]
	[ enable { true | false } ]

2. Histogram metrics. Some devices have the ability to calculate
histograms in hardware by sampling a specific parameter multiple times
per second. For example, the transmission queue depth of a port. This
enables the debugging of microbursts which would otherwise be invisible.
While this can be achieved in software using BPF, it is not applicable
when the data plane is offloaded as the CPU does not see the traffic.
Proposed interface:

devlink dev metric set DEV metric METRIC [ group GROUP ]
	[ enable { true | false } ] [ hist_type { linear | exp } ]
	[ hist_sample_interval SAMPLE ] [ hist_min MIN ] [ hist_max MAX ]
	[ hist_buckets BUCKETS ]

3. Per-port metrics. While all metrics could be exposed as global ones
and namespaced per port by naming them accordingly, there is value in
allowing user space to dump all metrics related to a certain port.
Proposed interface:

devlink port metric set DEV/PORT_INDEX metric METRIC [ group GROUP ]
	[ enable { true | false } ] [ hist_type { linear | exp } ]
	[ hist_sample_interval SAMPLE ] [ hist_min MIN ] [ hist_max MAX ]
	[ hist_buckets BUCKETS ]
devlink port metric show [ DEV/PORT_INDEX metric METRIC | group GROUP ]

To avoid duplicating ethtool functionality, we can decide to expose via
this interface only:

1. Configurable metrics
2. Metrics that are not relevant only to Ethernet ports

TODO
====

1. Add devlink-metric man page
2. Add selftests over mlxsw

Ido Schimmel (6):
  devlink: Add device metric infrastructure
  netdevsim: Add devlink metric support
  selftests: netdevsim: Add devlink metric tests
  mlxsw: reg: Add Tunneling NVE Counters Register
  mlxsw: reg: Add Tunneling NVE Counters Register Version 2
  mlxsw: spectrum_nve: Expose VXLAN counters via devlink-metric

 .../networking/devlink/devlink-metric.rst     |  37 ++
 Documentation/networking/devlink/index.rst    |   1 +
 Documentation/networking/devlink/mlxsw.rst    |  36 ++
 drivers/net/ethernet/mellanox/mlxsw/reg.h     | 104 ++++++
 .../ethernet/mellanox/mlxsw/spectrum_nve.h    |  10 +
 .../mellanox/mlxsw/spectrum_nve_vxlan.c       | 285 +++++++++++++++
 drivers/net/netdevsim/dev.c                   |  92 ++++-
 drivers/net/netdevsim/netdevsim.h             |   1 +
 include/net/devlink.h                         |  18 +
 include/uapi/linux/devlink.h                  |  19 +
 net/core/devlink.c                            | 346 ++++++++++++++++++
 .../drivers/net/netdevsim/devlink.sh          |  49 ++-
 12 files changed, 995 insertions(+), 3 deletions(-)
 create mode 100644 Documentation/networking/devlink/devlink-metric.rst

Comments

Jakub Kicinski Aug. 19, 2020, 12:24 a.m. UTC | #1
On Mon, 17 Aug 2020 15:50:53 +0300 Ido Schimmel wrote:
> From: Ido Schimmel <idosch@nvidia.com>
> 
> This patch set extends devlink to allow device drivers to expose device
> metrics to user space in a standard and extensible fashion, as opposed
> to the driver-specific debugfs approach.

I feel like all those loose hardware interfaces are a huge maintenance
burden. I don't know what the solution is, but the status quo is not
great.

I spend way too much time patrolling ethtool -S outputs already.

I've done my absolute best to make sure that something as simple as
updating device firmware can be done in a vendor-agnostic fashion, 
and I'm not sure I've succeeded. Every single vendor comes up with
their own twists.

Long story short, I'm extremely unexcited about another interface where
drivers expose random strings of their own picking. Maybe the udp_tunnel
module can have "global" stats?

This would be a good topic for netconf, or LPC hallway discussion :(
David Ahern Aug. 19, 2020, 2:43 a.m. UTC | #2
On 8/18/20 6:24 PM, Jakub Kicinski wrote:
> On Mon, 17 Aug 2020 15:50:53 +0300 Ido Schimmel wrote:
>> From: Ido Schimmel <idosch@nvidia.com>
>>
>> This patch set extends devlink to allow device drivers to expose device
>> metrics to user space in a standard and extensible fashion, as opposed
>> to the driver-specific debugfs approach.
> 
> I feel like all those loose hardware interfaces are a huge maintenance
> burden. I don't know what the solution is, but the status quo is not
> great.

I don't agree with the 'loose' characterization. Ido and team are
pushing what is arguably a modern version of `ethtool -S`, so it
provides a better API for retrieving data.

> 
> I spend way too much time patrolling ethtool -S outputs already.
> 

But that's the nature of detailed stats which are often essential to
ensuring the system is operating as expected or debugging some problem.
Commonality is certainly desired in names when relevant to be able to
build tooling around the stats. As an example, per-queue stats have been
essential to me for recent investigations. ethq has been really helpful
in crossing NIC vendors and viewing those stats as it handles the
per-vendor naming differences, but it requires changes to show anything
else - errors per queue, xdp stats, drops, etc. This part could be simpler.

As for this set, I believe the metrics exposed here are more unique to
switch ASICs. At least one company I know of has built a business model
around exposing detailed telemetry of switch ASICs, so clearly some find
them quite valuable.
Jakub Kicinski Aug. 19, 2020, 3:35 a.m. UTC | #3
On Tue, 18 Aug 2020 20:43:11 -0600 David Ahern wrote:
> On 8/18/20 6:24 PM, Jakub Kicinski wrote:
> > On Mon, 17 Aug 2020 15:50:53 +0300 Ido Schimmel wrote:  
> >> From: Ido Schimmel <idosch@nvidia.com>
> >>
> >> This patch set extends devlink to allow device drivers to expose device
> >> metrics to user space in a standard and extensible fashion, as opposed
> >> to the driver-specific debugfs approach.  
> > 
> > I feel like all those loose hardware interfaces are a huge maintenance
> > burden. I don't know what the solution is, but the status quo is not
> > great.  
> 
> I don't agree with the 'loose' characterization.

Loose as in not bound by any standard or best practices.

> Ido and team are pushing what is arguably a modern version of
> `ethtool -S`, so it provides a better API for retrieving data.

ethtool -S is absolutely terrible. Everybody comes up with their own
names for IEEE stats, and dumps stats which clearly have corresponding
fields in rtnl_link_stats64 there. We don't need a modern ethtool -S,
we need to get away from that mess.

> > I spend way too much time patrolling ethtool -S outputs already.
> 
> But that's the nature of detailed stats which are often essential to
> ensuring the system is operating as expected or debugging some problem.
> Commonality is certainly desired in names when relevant to be able to
> build tooling around the stats.

There are stats which are clearly detailed and device specific, 
but what ends up happening is that people expose very much not
implementation specific stats through the free form interfaces, 
because it's the easiest. 

And users are left picking up the pieces, having to ask vendors what
each stat means, and trying to create abstractions in their user space
glue.

> As an example, per-queue stats have been
> essential to me for recent investigations. ethq has been really helpful
> in crossing NIC vendors and viewing those stats as it handles the
> per-vendor naming differences, but it requires changes to show anything
> else - errors per queue, xdp stats, drops, etc. This part could be simpler.

Sounds like you're agreeing with me?

> As for this set, I believe the metrics exposed here are more unique to
> switch ASICs.

This is the list from patch 6:

   * - ``nve_vxlan_encap``
   * - ``nve_vxlan_decap``
   * - ``nve_vxlan_decap_errors``
   * - ``nve_vxlan_decap_discards``

What's so unique?

> At least one company I know of has built a business model
> around exposing detailed telemetry of switch ASICs, so clearly some find
> them quite valuable.

It's a question of interface, not the value of exposed data.

If I have to download vendor documentation and tooling, or adapt my own
scripts for every new vendor, I could have as well downloaded an SDK.
Florian Fainelli Aug. 19, 2020, 4:30 a.m. UTC | #4
On 8/18/2020 8:35 PM, Jakub Kicinski wrote:
> On Tue, 18 Aug 2020 20:43:11 -0600 David Ahern wrote:
>> On 8/18/20 6:24 PM, Jakub Kicinski wrote:
>>> On Mon, 17 Aug 2020 15:50:53 +0300 Ido Schimmel wrote:
>>>> From: Ido Schimmel <idosch@nvidia.com>
>>>>
>>>> This patch set extends devlink to allow device drivers to expose device
>>>> metrics to user space in a standard and extensible fashion, as opposed
>>>> to the driver-specific debugfs approach.
>>>
>>> I feel like all those loose hardware interfaces are a huge maintenance
>>> burden. I don't know what the solution is, but the status quo is not
>>> great.
>>
>> I don't agree with the 'loose' characterization.
> 
> Loose as in not bound by any standard or best practices.
> 
>> Ido and team are pushing what is arguably a modern version of
>> `ethtool -S`, so it provides a better API for retrieving data.
> 
> ethtool -S is absolutely terrible. Everybody comes up with their own
> names for IEEE stats, and dumps stats which clearly have corresponding
> fields in rtnl_link_stats64 there. We don't need a modern ethtool -S,
> we need to get away from that mess.
> 
>>> I spend way too much time patrolling ethtool -S outputs already.
>>
>> But that's the nature of detailed stats which are often essential to
>> ensuring the system is operating as expected or debugging some problem.
>> Commonality is certainly desired in names when relevant to be able to
>> build tooling around the stats.
> 
> There are stats which are clearly detailed and device specific,
> but what ends up happening is that people expose very much not
> implementation specific stats through the free form interfaces,
> because it's the easiest.
> 
> And users are left picking up the pieces, having to ask vendors what
> each stat means, and trying to create abstractions in their user space
> glue.

Should we require vendors to either provide a Documentation/ entry for 
each statistic they have (and be guaranteed that it will be outdated 
unless someone notices), or would you rather have the statistics 
description be part of the devlink interface itself? Should we define 
namespaces such that standard metrics should be under the standard 
namespace and the vendor standard is the wild west?

> 
>> As an example, per-queue stats have been
>> essential to me for recent investigations. ethq has been really helpful
>> in crossing NIC vendors and viewing those stats as it handles the
>> per-vendor naming differences, but it requires changes to show anything
>> else - errors per queue, xdp stats, drops, etc. This part could be simpler.
> 
> Sounds like you're agreeing with me?
> 
>> As for this set, I believe the metrics exposed here are more unique to
>> switch ASICs.
> 
> This is the list from patch 6:
> 
>     * - ``nve_vxlan_encap``
>     * - ``nve_vxlan_decap``
>     * - ``nve_vxlan_decap_errors``
>     * - ``nve_vxlan_decap_discards``
> 
> What's so unique?
> 
>> At least one company I know of has built a business model
>> around exposing detailed telemetry of switch ASICs, so clearly some find
>> them quite valuable.
> 
> It's a question of interface, not the value of exposed data.
> 
> If I have to download vendor documentation and tooling, or adapt my own
> scripts for every new vendor, I could have as well downloaded an SDK.

Are not you being a bit over dramatic here with your example? At least 
you can run the same command to obtain the stats regardless of the 
driver and vendor, so from that perspective Linux continues to be the 
abstraction and that is not broken.
Jakub Kicinski Aug. 19, 2020, 4:18 p.m. UTC | #5
On Tue, 18 Aug 2020 21:30:16 -0700 Florian Fainelli wrote:
> >>> I spend way too much time patrolling ethtool -S outputs already.  
> >>
> >> But that's the nature of detailed stats which are often essential to
> >> ensuring the system is operating as expected or debugging some problem.
> >> Commonality is certainly desired in names when relevant to be able to
> >> build tooling around the stats.  
> > 
> > There are stats which are clearly detailed and device specific,
> > but what ends up happening is that people expose very much not
> > implementation specific stats through the free form interfaces,
> > because it's the easiest.
> > 
> > And users are left picking up the pieces, having to ask vendors what
> > each stat means, and trying to create abstractions in their user space
> > glue.  
> 
> Should we require vendors to either provide a Documentation/ entry for 
> each statistic they have (and be guaranteed that it will be outdated 
> unless someone notices), or would you rather have the statistics 
> description be part of the devlink interface itself? Should we define 
> namespaces such that standard metrics should be under the standard 
> namespace and the vendor standard is the wild west?

I'm trying to find a solution which will not require a policeman to
constantly monitor the compliance. Please see my effort to ensure
drivers document and use the same ethtool -S stats in the TLS offload
implementations. I've been trying to improve this situation for a long
time, and it's getting old.

Please focus on the stats this set adds, instead of fantasizing of what
could be. These are absolutely not implementation specific!

> > If I have to download vendor documentation and tooling, or adapt my own
> > scripts for every new vendor, I could have as well downloaded an SDK.  
> 
> Are not you being a bit over dramatic here with your example? 

I hope not. It's very hard/impossible today to run a fleet of Linux
machines without resorting to vendor tooling.

> At least  you can run the same command to obtain the stats regardless
> of the driver and vendor, so from that perspective Linux continues to
> be the abstraction and that is not broken.

Format of the data is no abstraction.
Florian Fainelli Aug. 19, 2020, 5:20 p.m. UTC | #6
On 8/19/20 9:18 AM, Jakub Kicinski wrote:
> On Tue, 18 Aug 2020 21:30:16 -0700 Florian Fainelli wrote:
>>>>> I spend way too much time patrolling ethtool -S outputs already.  
>>>>
>>>> But that's the nature of detailed stats which are often essential to
>>>> ensuring the system is operating as expected or debugging some problem.
>>>> Commonality is certainly desired in names when relevant to be able to
>>>> build tooling around the stats.  
>>>
>>> There are stats which are clearly detailed and device specific,
>>> but what ends up happening is that people expose very much not
>>> implementation specific stats through the free form interfaces,
>>> because it's the easiest.
>>>
>>> And users are left picking up the pieces, having to ask vendors what
>>> each stat means, and trying to create abstractions in their user space
>>> glue.  
>>
>> Should we require vendors to either provide a Documentation/ entry for 
>> each statistic they have (and be guaranteed that it will be outdated 
>> unless someone notices), or would you rather have the statistics 
>> description be part of the devlink interface itself? Should we define 
>> namespaces such that standard metrics should be under the standard 
>> namespace and the vendor standard is the wild west?
> 
> I'm trying to find a solution which will not require a policeman to
> constantly monitor the compliance. Please see my effort to ensure
> drivers document and use the same ethtool -S stats in the TLS offload
> implementations. I've been trying to improve this situation for a long
> time, and it's getting old.

Which is why I am asking genuinely what do you think should be done
besides doing more code reviews? It does not seem to me that there is an
easy way to catch new stats being added with tools/scripts/whatever and
then determine what they are about, right?

> 
> Please focus on the stats this set adds, instead of fantasizing of what
> could be. These are absolutely not implementation specific!

Not sure if fantasizing is quite what I would use. I am just pointing
out that given the inability to standardize on statistics maybe we
should have namespaces and try our best to have everything fit into the
standard namespace along with a standard set of names, and push back
whenever we see vendor stats being added (or more pragmatically, ask
what they are). But maybe this very idea is moot.

> 
>>> If I have to download vendor documentation and tooling, or adapt my own
>>> scripts for every new vendor, I could have as well downloaded an SDK.  
>>
>> Are not you being a bit over dramatic here with your example? 
> 
> I hope not. It's very hard/impossible today to run a fleet of Linux
> machines without resorting to vendor tooling.

Your argument was putting on the same level resorting to vendor tooling
to extract meaningful statistics/counters versus using a SDK to operate
the hardware (this is how I understood it), and I do not believe this is
fair.

> 
>> At least  you can run the same command to obtain the stats regardless
>> of the driver and vendor, so from that perspective Linux continues to
>> be the abstraction and that is not broken.
> 
> Format of the data is no abstraction.
>
Jakub Kicinski Aug. 19, 2020, 6:07 p.m. UTC | #7
On Wed, 19 Aug 2020 10:20:08 -0700 Florian Fainelli wrote:
> > I'm trying to find a solution which will not require a policeman to
> > constantly monitor the compliance. Please see my effort to ensure
> > drivers document and use the same ethtool -S stats in the TLS offload
> > implementations. I've been trying to improve this situation for a long
> > time, and it's getting old.  
> 
> Which is why I am asking genuinely what do you think should be done
> besides doing more code reviews? It does not seem to me that there is an
> easy way to catch new stats being added with tools/scripts/whatever and
> then determine what they are about, right?

I don't have a great way forward in mind, sadly. All I can think of is
that we should try to create more well defined interfaces and steer
away from free-form ones.

Example, here if the stats are vxlan decap/encap/error - we should
expose that from the vxlan module. That way vxlan module defines one
set of stats for everyone.

In general unless we attach stats to the object they relate to, we will
end up building parallel structures for exposing statistics from the
drivers. I posted a set once which was implementing hierarchical stats,
but I've abandoned it for this reason.

> > Please focus on the stats this set adds, instead of fantasizing of what
> > could be. These are absolutely not implementation specific!  
> 
> Not sure if fantasizing is quite what I would use. I am just pointing
> out that given the inability to standardize on statistics maybe we
> should have namespaces and try our best to have everything fit into the
> standard namespace along with a standard set of names, and push back
> whenever we see vendor stats being added (or more pragmatically, ask
> what they are). But maybe this very idea is moot.

IDK. I just don't feel like this is going to fly, see how many names
people invented for the CRC error statistic in ethtool -S, even tho
there is a standard stat for that! And users are actually parsing the
output of ethtool -S to get CRC stats because (a) it became the go-to
place for NIC stats and (b) some drivers forget to report in the
standard place.

The cover letter says this set replaces the bad debugfs with a good,
standard API. It may look good and standard for _vendors_ because they
will know where to dump their counters, but it makes very little
difference for _users_. If I have to parse names for every vendor I use,
I can as well add a per-vendor debugfs path to my script.

The bar for implementation-specific driver stats has to be high.

> >>> If I have to download vendor documentation and tooling, or adapt my own
> >>> scripts for every new vendor, I could have as well downloaded an SDK.    
> >>
> >> Are not you being a bit over dramatic here with your example?   
> > 
> > I hope not. It's very hard/impossible today to run a fleet of Linux
> > machines without resorting to vendor tooling.  
> 
> Your argument was putting on the same level resorting to vendor tooling
> to extract meaningful statistics/counters versus using a SDK to operate
> the hardware (this is how I understood it), and I do not believe this is
> fair.

Okay, fair. I just think that in datacenter deployments we are way
closer to the SDK model than people may want to admit.
David Ahern Aug. 20, 2020, 2:35 p.m. UTC | #8
On 8/19/20 12:07 PM, Jakub Kicinski wrote:
> On Wed, 19 Aug 2020 10:20:08 -0700 Florian Fainelli wrote:
>>> I'm trying to find a solution which will not require a policeman to
>>> constantly monitor the compliance. Please see my effort to ensure
>>> drivers document and use the same ethtool -S stats in the TLS offload
>>> implementations. I've been trying to improve this situation for a long
>>> time, and it's getting old.  
>>
>> Which is why I am asking genuinely what do you think should be done
>> besides doing more code reviews? It does not seem to me that there is an
>> easy way to catch new stats being added with tools/scripts/whatever and
>> then determine what they are about, right?
> 
> I don't have a great way forward in mind, sadly. All I can think of is
> that we should try to create more well defined interfaces and steer
> away from free-form ones.

There is a lot of value in free-form too.

> 
> Example, here if the stats are vxlan decap/encap/error - we should
> expose that from the vxlan module. That way vxlan module defines one
> set of stats for everyone.
> 
> In general unless we attach stats to the object they relate to, we will
> end up building parallel structures for exposing statistics from the
> drivers. I posted a set once which was implementing hierarchical stats,
> but I've abandoned it for this reason.
> 
>>> Please focus on the stats this set adds, instead of fantasizing of what
>>> could be. These are absolutely not implementation specific!  
>>
>> Not sure if fantasizing is quite what I would use. I am just pointing
>> out that given the inability to standardize on statistics maybe we
>> should have namespaces and try our best to have everything fit into the
>> standard namespace along with a standard set of names, and push back
>> whenever we see vendor stats being added (or more pragmatically, ask
>> what they are). But maybe this very idea is moot.
> 
> IDK. I just don't feel like this is going to fly, see how many names
> people invented for the CRC error statistic in ethtool -S, even tho
> there is a standard stat for that! And users are actually parsing the
> output of ethtool -S to get CRC stats because (a) it became the go-to
> place for NIC stats and (b) some drivers forget to report in the
> standard place.
> 
> The cover letter says this set replaces the bad debugfs with a good,
> standard API. It may look good and standard for _vendors_ because they
> will know where to dump their counters, but it makes very little
> difference for _users_. If I have to parse names for every vendor I use,
> I can as well add a per-vendor debugfs path to my script.
> 
> The bar for implementation-specific driver stats has to be high.

My take away from this is you do not like the names - the strings side
of it.

Do you object to the netlink API? The netlink API via devlink?

'perf' has json files to describe and document counters
(tools/perf/pmu-events). Would something like that be acceptable as a
form of in-tree documentation of counters? (vs Documentation/networking
or URLs like
https://community.mellanox.com/s/article/understanding-mlx5-ethtool-counters)

> 
>>>>> If I have to download vendor documentation and tooling, or adapt my own
>>>>> scripts for every new vendor, I could have as well downloaded an SDK.    
>>>>
>>>> Are not you being a bit over dramatic here with your example?   
>>>
>>> I hope not. It's very hard/impossible today to run a fleet of Linux
>>> machines without resorting to vendor tooling.  
>>
>> Your argument was putting on the same level resorting to vendor tooling
>> to extract meaningful statistics/counters versus using a SDK to operate
>> the hardware (this is how I understood it), and I do not believe this is
>> fair.
> 
> Okay, fair. I just think that in datacenter deployments we are way
> closer to the SDK model than people may want to admit.
> 

I do not agree with that; the SDK model means you *must* use vendor code
to make something work. Your argument here is about labels for stats and
an understanding of their meaning.
Jakub Kicinski Aug. 20, 2020, 4:09 p.m. UTC | #9
On Thu, 20 Aug 2020 08:35:25 -0600 David Ahern wrote:
> On 8/19/20 12:07 PM, Jakub Kicinski wrote:
> > I don't have a great way forward in mind, sadly. All I can think of is
> > that we should try to create more well defined interfaces and steer
> > away from free-form ones.  
> 
> There is a lot of value in free-form too.

On Tue, 18 Aug 2020 20:35:01 -0700 Jakub Kicinski wrote:
> It's a question of interface, not the value of exposed data.

> > Example, here if the stats are vxlan decap/encap/error - we should
> > expose that from the vxlan module. That way vxlan module defines one
> > set of stats for everyone.
> > 
> > In general unless we attach stats to the object they relate to, we will
> > end up building parallel structures for exposing statistics from the
> > drivers. I posted a set once which was implementing hierarchical stats,
> > but I've abandoned it for this reason.
> > > [...]
> > 
> > IDK. I just don't feel like this is going to fly, see how many names
> > people invented for the CRC error statistic in ethtool -S, even tho
> > there is a standard stat for that! And users are actually parsing the
> > output of ethtool -S to get CRC stats because (a) it became the go-to
> > place for NIC stats and (b) some drivers forget to report in the
> > standard place.
> > 
> > The cover letter says this set replaces the bad debugfs with a good,
> > standard API. It may look good and standard for _vendors_ because they
> > will know where to dump their counters, but it makes very little
> > difference for _users_. If I have to parse names for every vendor I use,
> > I can as well add a per-vendor debugfs path to my script.
> > 
> > The bar for implementation-specific driver stats has to be high.  
> 
> My take away from this is you do not like the names - the strings side
> of it.
> 
> Do you object to the netlink API? The netlink API via devlink?
> 
> 'perf' has json files to describe and document counters
> (tools/perf/pmu-events). Would something like that be acceptable as a
> form of in-tree documentation of counters? (vs Documentation/networking
> or URLs like
> https://community.mellanox.com/s/article/understanding-mlx5-ethtool-counters)

Please refer to what I said twice now about the definition of the stats
exposed here belonging with the VxLAN code, not the driver.

> > Okay, fair. I just think that in datacenter deployments we are way
> > closer to the SDK model than people may want to admit.
> 
> I do not agree with that; the SDK model means you *must* use vendor code
> to make something work. Your argument here is about labels for stats and
> an understanding of their meaning.

Sure, no "must" for passing packets, but you "must" use vendor tooling
to operate a fleet.

Since everybody already has vendor tools what value does this API add?
I still need per vendor logic. Let's try to build APIs which will
actually make user's life easier, which users will want to switch to.
Ido Schimmel Aug. 21, 2020, 10:30 a.m. UTC | #10
On Thu, Aug 20, 2020 at 09:09:42AM -0700, Jakub Kicinski wrote:
> On Thu, 20 Aug 2020 08:35:25 -0600 David Ahern wrote:
> > On 8/19/20 12:07 PM, Jakub Kicinski wrote:
> > > I don't have a great way forward in mind, sadly. All I can think of is
> > > that we should try to create more well defined interfaces and steer
> > > away from free-form ones.  
> > 
> > There is a lot of value in free-form too.
> 
> On Tue, 18 Aug 2020 20:35:01 -0700 Jakub Kicinski wrote:
> > It's a question of interface, not the value of exposed data.
> 
> > > Example, here if the stats are vxlan decap/encap/error - we should
> > > expose that from the vxlan module. That way vxlan module defines one
> > > set of stats for everyone.
> > > 
> > > In general unless we attach stats to the object they relate to, we will
> > > end up building parallel structures for exposing statistics from the
> > > drivers. I posted a set once which was implementing hierarchical stats,
> > > but I've abandoned it for this reason.
> > > > [...]
> > > 
> > > IDK. I just don't feel like this is going to fly, see how many names
> > > people invented for the CRC error statistic in ethtool -S, even tho
> > > there is a standard stat for that! And users are actually parsing the
> > > output of ethtool -S to get CRC stats because (a) it became the go-to
> > > place for NIC stats and (b) some drivers forget to report in the
> > > standard place.
> > > 
> > > The cover letter says this set replaces the bad debugfs with a good,
> > > standard API. It may look good and standard for _vendors_ because they
> > > will know where to dump their counters, but it makes very little
> > > difference for _users_. If I have to parse names for every vendor I use,
> > > I can as well add a per-vendor debugfs path to my script.
> > > 
> > > The bar for implementation-specific driver stats has to be high.  
> > 
> > My take away from this is you do not like the names - the strings side
> > of it.
> > 
> > Do you object to the netlink API? The netlink API via devlink?
> > 
> > 'perf' has json files to describe and document counters
> > (tools/perf/pmu-events). Would something like that be acceptable as a
> > form of in-tree documentation of counters? (vs Documentation/networking
> > or URLs like
> > https://community.mellanox.com/s/article/understanding-mlx5-ethtool-counters)
> 
> Please refer to what I said twice now about the definition of the stats
> exposed here belonging with the VxLAN code, not the driver.

Please refer to the changelog:

"The Spectrum ASICs have a single hardware VTEP that is able to perform
VXLAN encapsulation and decapsulation. The VTEP is logically mapped by
mlxsw to the multiple VXLAN netdevs that are using it. Exposing the
counters of this VTEP via the multiple VXLAN netdevs that are using it
would be both inaccurate and confusing for users.
    
Instead, expose the counters of the VTEP via devlink-metric. Note that
Spectrum-1 supports a different set of counters compared to newer ASICs
in the Spectrum family."

Hardware implementations will rarely fit 1:1 to the nice and discrete
software implementations that they try to accelerate. The purpose of
this API is exposing metrics specific to these hardware implementations.
This results in better visibility which can be leveraged for faster
debugging and more thorough testing.

The reason I came up with this interface is not the specific VXLAN
metrics that bother you, but a new platform we are working on. It uses
the ASIC as a cache that refers lookups to an external device in case of
cache misses. It is completely transparent to user space (you get better
scale), but the driver is very much aware of this stuff as it needs to
insert objects (e.g., routes) in a way that will minimize cache misses.
Just checking that ping works is hardly enough. We must be able to read
the cache counters to ensure we do not see cache misses when we do not
expect them.

As another example, consider the algorithmic TCAM implementation we have
in Spectrum-2 for ACLs [1]. While a user simply adds / deletes filters,
the driver needs to jump through multiple hoops in order to program them
in a way that will result in a better scale and reduced latency. We
currently do not have an interface through which we can expose metrics
related to this specific implementation.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=756cd36626f773e9a72a39c1dd12da4deacfacdf

> 
> > > Okay, fair. I just think that in datacenter deployments we are way
> > > closer to the SDK model than people may want to admit.
> > 
> > I do not agree with that; the SDK model means you *must* use vendor code
> > to make something work. Your argument here is about labels for stats and
> > an understanding of their meaning.
> 
> Sure, no "must" for passing packets, but you "must" use vendor tooling
> to operate a fleet.
> 
> Since everybody already has vendor tools what value does this API add?

We don't have any "vendor tools" to get this information. Our team is
doing everything it possibly can in order to move away from such an
approach.

> I still need per vendor logic. Let's try to build APIs which will
> actually make user's life easier, which users will want to switch to.

Developers are also users and they should be able to read whatever
information they need from the device in order to help them do their
work. You have a multitude of tools (e.g., kprobes, tracepoints) to get
better visibility into the software data path. Commonality is not a
reason to be blind as a bat when looking into the hardware data path.
Jakub Kicinski Aug. 21, 2020, 4:53 p.m. UTC | #11
On Fri, 21 Aug 2020 13:30:21 +0300 Ido Schimmel wrote:
> On Thu, Aug 20, 2020 at 09:09:42AM -0700, Jakub Kicinski wrote:
> > On Thu, 20 Aug 2020 08:35:25 -0600 David Ahern wrote:  
> > > On 8/19/20 12:07 PM, Jakub Kicinski wrote:  
> > > > I don't have a great way forward in mind, sadly. All I can think of is
> > > > that we should try to create more well defined interfaces and steer
> > > > away from free-form ones.    
> > > 
> > > There is a lot of value in free-form too.  
> > 
> > On Tue, 18 Aug 2020 20:35:01 -0700 Jakub Kicinski wrote:  
> > > It's a question of interface, not the value of exposed data.  
> >   
> > > > Example, here if the stats are vxlan decap/encap/error - we should
> > > > expose that from the vxlan module. That way vxlan module defines one
> > > > set of stats for everyone.
> > > > 
> > > > In general unless we attach stats to the object they relate to, we will
> > > > end up building parallel structures for exposing statistics from the
> > > > drivers. I posted a set once which was implementing hierarchical stats,
> > > > but I've abandoned it for this reason.  
> > > > > [...]  
> > > > 
> > > > IDK. I just don't feel like this is going to fly, see how many names
> > > > people invented for the CRC error statistic in ethtool -S, even tho
> > > > there is a standard stat for that! And users are actually parsing the
> > > > output of ethtool -S to get CRC stats because (a) it became the go-to
> > > > place for NIC stats and (b) some drivers forget to report in the
> > > > standard place.
> > > > 
> > > > The cover letter says this set replaces the bad debugfs with a good,
> > > > standard API. It may look good and standard for _vendors_ because they
> > > > will know where to dump their counters, but it makes very little
> > > > difference for _users_. If I have to parse names for every vendor I use,
> > > > I can as well add a per-vendor debugfs path to my script.
> > > > 
> > > > The bar for implementation-specific driver stats has to be high.    
> > > 
> > > My take away from this is you do not like the names - the strings side
> > > of it.
> > > 
> > > Do you object to the netlink API? The netlink API via devlink?
> > > 
> > > 'perf' has json files to describe and document counters
> > > (tools/perf/pmu-events). Would something like that be acceptable as a
> > > form of in-tree documentation of counters? (vs Documentation/networking
> > > or URLs like
> > > https://community.mellanox.com/s/article/understanding-mlx5-ethtool-counters)  
> > 
> > Please refer to what I said twice now about the definition of the stats
> > exposed here belonging with the VxLAN code, not the driver.  
> 
> Please refer to the changelog:
> 
> "The Spectrum ASICs have a single hardware VTEP that is able to perform
> VXLAN encapsulation and decapsulation. The VTEP is logically mapped by
> mlxsw to the multiple VXLAN netdevs that are using it. Exposing the
> counters of this VTEP via the multiple VXLAN netdevs that are using it
> would be both inaccurate and confusing for users.
>     
> Instead, expose the counters of the VTEP via devlink-metric. Note that
> Spectrum-1 supports a different set of counters compared to newer ASICs
> in the Spectrum family."

I read your cover letter; I didn't say the metrics have to be reported
on particular netdevs.

> Hardware implementations will rarely fit 1:1 to the nice and discrete
> software implementations that they try to accelerate. The purpose of
> this API is exposing metrics specific to these hardware implementations.
> This results in better visibility which can be leveraged for faster
> debugging and more thorough testing.
> 
> The reason I came up with this interface is not the specific VXLAN
> metrics that bother you, but a new platform we are working on. It uses
> the ASIC as a cache that refers lookups to an external device in case of
> cache misses. It is completely transparent to user space (you get better
> scale), but the driver is very much aware of this stuff as it needs to
> insert objects (e.g., routes) in a way that will minimize cache misses.
> Just checking that ping works is hardly enough. We must be able to read
> the cache counters to ensure we do not see cache misses when we do not
> expect them.
> 
> As another example, consider the algorithmic TCAM implementation we have
> in Spectrum-2 for ACLs [1]. While a user simply adds / deletes filters,
> the driver needs to jump through multiple hoops in order to program them
> in a way that will result in a better scale and reduced latency. We
> currently do not have an interface through which we can expose metrics
> related to this specific implementation.
> 
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=756cd36626f773e9a72a39c1dd12da4deacfacdf
> 
> >   
> > > > Okay, fair. I just think that in datacenter deployments we are way
> > > > closer to the SDK model than people may want to admit.  
> > > 
> > > I do not agree with that; the SDK model means you *must* use vendor code
> > > to make something work. Your argument here is about labels for stats and
> > > an understanding of their meaning.  
> > 
> > Sure, no "must" for passing packets, but you "must" use vendor tooling
> > to operate a fleet.
> > 
> > Since everybody already has vendor tools what value does this API add?  
> 
> We don't have any "vendor tools" to get this information. Our team is
> doing everything it possibly can in order to move away from such an
> approach.

For clarity, I wasn't speaking about your switches; sadly, I have no
experience with those.

> > I still need per vendor logic. Let's try to build APIs which will
> > actually make user's life easier, which users will want to switch to.  
> 
> Developers are also users and they should be able to read whatever
> information they need from the device in order to help them do their
> work. You have a multitude of tools (e.g., kprobes, tracepoints) to get
> better visibility into the software data path. Commonality is not a
> reason to be blind as a bat when looking into the hardware data path.

How many times do I have to say that I'm not arguing against the value
of the data? 

If you open up this interface either someone will police it, or it will
become a dumpster.
David Ahern Aug. 21, 2020, 7:12 p.m. UTC | #12
On 8/21/20 10:53 AM, Jakub Kicinski wrote:
> How many times do I have to say that I'm not arguing against the value
> of the data? 
> 
> If you open up this interface either someone will police it, or it will
> become a dumpster.

I am not following what you are proposing as a solution. You do not like
Ido's idea of stats going through devlink, but you are not being clear
on what you think is a better way.

You say vxlan stats belong in the vxlan driver, but the stats do not
have to be reported on particular netdevs. How then do h/w stats get
exposed via vxlan code?
Jakub Kicinski Aug. 21, 2020, 11:50 p.m. UTC | #13
On Fri, 21 Aug 2020 13:12:59 -0600 David Ahern wrote:
> On 8/21/20 10:53 AM, Jakub Kicinski wrote:
> > How many times do I have to say that I'm not arguing against the value
> > of the data? 
> > 
> > If you open up this interface either someone will police it, or it will
> > become a dumpster.  
> 
> I am not following what you are proposing as a solution. You do not like
> Ido's idea of stats going through devlink, but you are not being clear
> on what you think is a better way.
> 
> You say vxlan stats belong in the vxlan driver, but the stats do not
> have to be reported on particular netdevs. How then do h/w stats get
> exposed via vxlan code?

No strong preference, for TLS I've done:

# cat /proc/net/tls_stat 
TlsCurrTxSw                     	0
TlsCurrRxSw                     	0
TlsCurrTxDevice                 	0
TlsCurrRxDevice                 	0
TlsTxSw                         	0
TlsRxSw                         	0
TlsTxDevice                     	0
TlsRxDevice                     	0
TlsDecryptError                 	0
TlsRxDeviceResync               	0

We can add something over netlink, I opted for simplicity since global
stats don't have to scale with number of interfaces.
David Ahern Aug. 21, 2020, 11:59 p.m. UTC | #14
On 8/21/20 5:50 PM, Jakub Kicinski wrote:
> On Fri, 21 Aug 2020 13:12:59 -0600 David Ahern wrote:
>> On 8/21/20 10:53 AM, Jakub Kicinski wrote:
>>> How many times do I have to say that I'm not arguing against the value
>>> of the data? 
>>>
>>> If you open up this interface either someone will police it, or it will
>>> become a dumpster.  
>>
>> I am not following what you are proposing as a solution. You do not like
>> Ido's idea of stats going through devlink, but you are not being clear
>> on what you think is a better way.
>>
>> You say vxlan stats belong in the vxlan driver, but the stats do not
>> have to be reported on particular netdevs. How then do h/w stats get
>> exposed via vxlan code?
> 
> No strong preference, for TLS I've done:

But you clearly *do* have a strong preference.

> 
> # cat /proc/net/tls_stat 

I do not agree with adding files under /proc/net for this.

> TlsCurrTxSw                     	0
> TlsCurrRxSw                     	0
> TlsCurrTxDevice                 	0
> TlsCurrRxDevice                 	0
> TlsTxSw                         	0
> TlsRxSw                         	0
> TlsTxDevice                     	0
> TlsRxDevice                     	0
> TlsDecryptError                 	0
> TlsRxDeviceResync               	0
> 
> We can add something over netlink, I opted for simplicity since global
> stats don't have to scale with number of interfaces. 
> 

IMHO, netlink is the right "channel" to move data from kernel to
userspace, and opting in to *specific* stats is a must have feature.

I think devlink is the right framework given that the stats are device
based but not specific to any particular netdev instance. Further, this
allows discrimination of hardware stats from software stats which if
tied to vxlan as a protocol and somehow pulled from the vxlan driver
those would be combined into one (at least how my mind is thinking of this).

####

Let's say the direction is for these specific stats (as opposed to the
general problem that Ido and others are considering) to be pulled from
the vxlan driver. How does that driver get access to hardware stats?
vxlan is a protocol and not tied to devices. How should the connection
be made?
Jakub Kicinski Aug. 22, 2020, 12:37 a.m. UTC | #15
On Fri, 21 Aug 2020 17:59:57 -0600 David Ahern wrote:
> On 8/21/20 5:50 PM, Jakub Kicinski wrote:
> > On Fri, 21 Aug 2020 13:12:59 -0600 David Ahern wrote:  
> >> I am not following what you are proposing as a solution. You do not like
> >> Ido's idea of stats going through devlink, but you are not being clear
> >> on what you think is a better way.
> >>
> >> You say vxlan stats belong in the vxlan driver, but the stats do not
> >> have to be reported on particular netdevs. How then do h/w stats get
> >> exposed via vxlan code?  
> > 
> > No strong preference, for TLS I've done:  
> 
> But you clearly *do* have a strong preference.

I'm answering your question.

The question is "How then do h/w stats get exposed via vxlan code?"

Please note that the question includes "via vxlan code".

So no, I have no preference as long as it's "via vxlan code", and not
directly from the driver with a vendor-invented name.

> > # cat /proc/net/tls_stat   
> 
> I do not agree with adding files under /proc/net for this.

Yeah it's not the best, with higher LoC a better solution should be
within reach.

> > TlsCurrTxSw                     	0
> > TlsCurrRxSw                     	0
> > TlsCurrTxDevice                 	0
> > TlsCurrRxDevice                 	0
> > TlsTxSw                         	0
> > TlsRxSw                         	0
> > TlsTxDevice                     	0
> > TlsRxDevice                     	0
> > TlsDecryptError                 	0
> > TlsRxDeviceResync               	0
> > 
> > We can add something over netlink, I opted for simplicity since global
> > stats don't have to scale with number of interfaces. 
> 
> IMHO, netlink is the right "channel" to move data from kernel to
> userspace, and opting in to *specific* stats is a must have feature.
> 
> I think devlink is the right framework given that the stats are device
> based but not specific to any particular netdev instance. 

I'd be careful with the "not specific to any particular netdev
instance". A perfect API would be flexible when it comes to scoping :)

> Further, this
> allows discrimination of hardware stats from software stats which if
> tied to vxlan as a protocol and somehow pulled from the vxlan driver
> those would be combined into one (at least how my mind is thinking of this).

Right, for tls the stats which have "Device" in the name are hardware.
But netlink will have better ways of separating the two.

> ####
> 
> Let's say the direction is for these specific stats (as opposed to the
> general problem that Ido and others are considering) to be pulled from
> the vxlan driver. How does that driver get access to hardware stats?
> vxlan is a protocol and not tied to devices. How should the connection
> be made?

Drivers which offload VxLAN already have a dependency on it, right?
They can just register with it and get queried on dump. Or if we want
scoping we can piggyback on whatever object stats are scoped to.
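
To make that concrete, a purely hypothetical hook owned by the vxlan
module could look along these lines (none of these structures or
functions exist today; the names are made up for illustration):

#include <linux/list.h>
#include <linux/types.h>

/* The vxlan module owns the counter set and its names; an offloading
 * driver only supplies the numbers when the module is queried.
 */
struct vxlan_hw_stats {
	u64 encap_packets;
	u64 decap_packets;
	u64 decap_errors;
	u64 decap_discards;
};

struct vxlan_hw_stats_provider {
	struct list_head list;
	/* Called by the vxlan module on a stats dump. */
	void (*get_stats)(struct vxlan_hw_stats_provider *provider,
			  struct vxlan_hw_stats *stats);
};

/* Registration helpers the vxlan module would export, so no driver
 * ever invents counter names of its own.
 */
int vxlan_hw_stats_provider_register(struct vxlan_hw_stats_provider *provider);
void vxlan_hw_stats_provider_unregister(struct vxlan_hw_stats_provider *provider);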

*If* we scope on HW objects do we need to worry about some user some
day wanting to have stats per vxlan netdev and per HW instance?
David Ahern Aug. 22, 2020, 1:18 a.m. UTC | #16
On 8/21/20 6:37 PM, Jakub Kicinski wrote:
>>> # cat /proc/net/tls_stat   
>>
>> I do not agree with adding files under /proc/net for this.
> 
> Yeah it's not the best, with higher LoC a better solution should be
> within reach.

The duplicity here is mind-boggling. TLS stats from hardware are on par
with Ido's *example* of vxlan stats from an ASIC. You agree that
/proc/net files are wrong, but you did it anyway and now you want the
next person to solve the problem you did not want to tackle but have
strong opinions on.

Ido has a history of thinking through problems and solutions in a proper
Linux Way. netlink is the right API, and devlink was created for
'device' stuff versus 'netdev' stuff. Hence, I agree with this
*framework* for extracting asic stats.
Jakub Kicinski Aug. 22, 2020, 4:27 p.m. UTC | #17
On Fri, 21 Aug 2020 19:18:37 -0600 David Ahern wrote:
> On 8/21/20 6:37 PM, Jakub Kicinski wrote:
> >>> # cat /proc/net/tls_stat     
> >>
> >> I do not agree with adding files under /proc/net for this.  
> > 
> > Yeah it's not the best, with higher LoC a better solution should be
> > within reach.  
> 
> The duplicity here is mind-boggling. TLS stats from hardware are on par
> with Ido's *example* of vxlan stats from an ASIC. You agree that
> /proc/net files are wrong,

I didn't say /proc/net was wrong, I'm just trying to be agreeable.
Maybe I need to improve my command of the English language.

AFAIK /proc/net is where protocol stats are.

> but you did it anyway and now you want the
> next person to solve the problem you did not want to tackle but have
> strong opinions on.

I have no need and interest in vxlan stats.

> Ido has a history of thinking through problems and solutions in a proper
> Linux Way. netlink is the right API, and devlink was created for
> 'device' stuff versus 'netdev' stuff. Hence, I agree with this
> *framework* for extracting asic stats.

You seem to focus on less relevant points. I primarily care about the
statistics being defined and identified by Linux, not every vendor for
themselves.

No question about Ido's ability and contributions, but then again 
(from the cover letter):

> This is joint work [...] done during a two-day company hackathon.
Ido Schimmel Aug. 23, 2020, 7:04 a.m. UTC | #18
On Sat, Aug 22, 2020 at 09:27:39AM -0700, Jakub Kicinski wrote:
> On Fri, 21 Aug 2020 19:18:37 -0600 David Ahern wrote:
> > On 8/21/20 6:37 PM, Jakub Kicinski wrote:
> > >>> # cat /proc/net/tls_stat     
> > >>
> > >> I do not agree with adding files under /proc/net for this.  
> > > 
> > > Yeah it's not the best, with higher LoC a better solution should be
> > > within reach.  
> > 
> > The duplicity here is mind-boggling. TLS stats from hardware are on par
> > with Ido's *example* of vxlan stats from an ASIC. You agree that
> > /proc/net files are wrong,
> 
> I didn't say /proc/net was wrong, I'm just trying to be agreeable.
> Maybe I need to improve my command of the English language.
> 
> AFAIK /proc/net is where protocol stats are.
> 
> > but you did it anyway and now you want the
> > next person to solve the problem you did not want to tackle but have
> > strong opinions on.
> 
> I have no need and interest in vxlan stats.
> 
> > Ido has a history of thinking through problems and solutions in a proper
> > Linux Way. netlink is the right API, and devlink was created for
> > 'device' stuff versus 'netdev' stuff. Hence, I agree with this
> > *framework* for extracting asic stats.
> 
> You seem to focus on less relevant points. I primarily care about the
> statistics being defined and identified by Linux, not every vendor for
> themselves.

Trying to understand how we can move this forward. The issue is with the
specific VXLAN metrics, but you generally agree with the need for the
framework? See my two other examples: Cache counters and algorithmic
TCAM counters.

> No question about Ido's ability and contributions, but then again 
> (from the cover letter):
> 
> > This is joint work [...] done during a two-day company hackathon.

The *implementation* was done during the hackathon. The *design* was
done before. I sent the team three design proposals (CLI / netlink
API / device driver API) before we landed on this. I started thinking
about this idea a few months ago, and the hackathon was simply a good
opportunity to implement and showcase it.
Jakub Kicinski Aug. 24, 2020, 7:11 p.m. UTC | #19
On Sun, 23 Aug 2020 10:04:34 +0300 Ido Schimmel wrote:
> > You seem to focus on less relevant points. I primarily care about the
> > statistics being defined and identified by Linux, not every vendor for
> > themselves.  
> 
> Trying to understand how we can move this forward. The issue is with the
> specific VXLAN metrics, but you generally agree with the need for the
> framework? See my two other examples: Cache counters and algorithmic
> TCAM counters.

Yes, we will likely need a way to report design-specific performance
counters no matter what. That said, I would prefer to pave the way for
exposing standardized stats first, so that reviewers (e.g. myself) have
a clear place to point folks to.

My last attempt was to just try to standardize the strings for the
per-netdev TLS offload stats (those are in addition to the /proc stats),
and document them in Documentation/. It turned out to have quite a
high review overhead, and the convergence is not satisfactory.

The only strong use case I have right now is FEC stats, and I'm planning
to add IEEE-based counters to devlink ports. The scoping of MAC/PHY
counters to dl-port is, I hope, reasonable, although it remains to be
seen what the PHY folks think about it.

As I previously said, I think that protocol stats are best exported
from the protocol driver; otherwise the API may need to grow parallel
hierarchies. E.g. the semantics of per-queue NIC counters get confusing
unless they are reported together with information about the queues,
and sadly no API for that exists. In particular, the lifetime of objects
is hard to match with the lifetime of statistics. The same goes for low
granularity counters related to traffic classification.

Long story short, it's a complicated topic; IDK how much of it I can
expect you to tackle. At the minimum, I'd like us to have a clear
separation between Linux/standard stats that drivers should share and
justifiably implementation-specific values.

Neither the DEVLINK_..GENERIC identifiers nor trying to standardize on
strings is working for me, either as a reviewer or as an infrastructure
engineer.