[net-next,0/4] cxgb4: add ethtool self_test support

Message ID 20200717134759.8268-1-vishal@chelsio.com

Message

Vishal Kulkarni July 17, 2020, 1:47 p.m. UTC
This series of patches adds support for the tests below.
1. Adapter status test
2. Link test
3. Link speed test
4. Loopback test

Vishal Kulkarni (4):
  cxgb4: Add ethtool self-test support
  cxgb4: Add link test to ethtool self test.
  cxgb4: Add adapter status check to ethtool
  cxgb4: Add speed link test to ethtool self_test

 drivers/net/ethernet/chelsio/cxgb4/cxgb4.h    |  10 ++
 .../ethernet/chelsio/cxgb4/cxgb4_ethtool.c    | 137 ++++++++++++++++++
 drivers/net/ethernet/chelsio/cxgb4/sge.c      | 117 ++++++++++++++-
 3 files changed, 261 insertions(+), 3 deletions(-)

Comments

Andrew Lunn July 17, 2020, 6:02 p.m. UTC | #1
On Fri, Jul 17, 2020 at 07:17:55PM +0530, Vishal Kulkarni wrote:
> This series of patches add support for below tests.
> 1. Adapter status test
> 2. Link test
> 3. Link speed test
> 4. Loopback test

Hi Vishal

The loopback test is pretty usual for an ethtool self test. But the
first 3 are rather odd. They don't really seem to be self tests. What
reason do you have for adding these? Are you trying to debug a
specific problem?

	 Andrew
Vishal Kulkarni July 20, 2020, 6:28 a.m. UTC | #2
On Friday, July 07/17/20, 2020 at 20:02:51 +0200, Andrew Lunn wrote:
> On Fri, Jul 17, 2020 at 07:17:55PM +0530, Vishal Kulkarni wrote:
> > This series of patches add support for below tests.
> > 1. Adapter status test
> > 2. Link test
> > 3. Link speed test
> > 4. Loopback test
> 
> Hi Vishal
> 
> The loopback test is pretty usual for an ethtool self test. But the
> first 3 are rather odd. They don't really seem to be self tests. What
> reason do you have for adding these? Are you trying to debug a
> specific problem?
> 
> 	 Andrew
Hi Andrew,

Our requirement is a set of self tests that can report, in a single command during system init,
whether the adapter is functioning properly. The tests above are the ones most commonly run by
our on-field diagnostics team. They also seem to be the most common among other drivers.

Hence we have added
1. Adapter status test: Checks whether the adapter is alive or has crashed.
2. Link test: Checks whether the adapter PHY is up.
3. Link speed test: Checks whether the adapter has negotiated the link speed correctly.

-Vishal
Andrew Lunn July 20, 2020, 1:35 p.m. UTC | #3
On Mon, Jul 20, 2020 at 11:58:37AM +0530, Vishal Kulkarni wrote:
> On Friday, July 07/17/20, 2020 at 20:02:51 +0200, Andrew Lunn wrote:
> > On Fri, Jul 17, 2020 at 07:17:55PM +0530, Vishal Kulkarni wrote:
> > > This series of patches add support for below tests.
> > > 1. Adapter status test
> > > 2. Link test
> > > 3. Link speed test
> > > 4. Loopback test
> > 
> > Hi Vishal
> > 
> > The loopback test is pretty usual for an ethtool self test. But the
> > first 3 are rather odd. They don't really seem to be self tests. What
> > reason do you have for adding these? Are you trying to debug a
> > specific problem?
> > 
> > 	 Andrew
> Hi Andrew,
> 
> Our requirement is to add a list of self tests that can summarize if the adapter is functioning
> properly in a single command during system init. The above tests are the most common ones run by
> our on-field diagnostics team. Besides, these tests seem to be the most common among other drivers as well.
> 
> Hence we have added
> 1. Adapter status test: Tests whether the adapter is alive or crashed
> 2. Link test: Adapter PHY is up or not.
> 3. Link speed test: Adapter has negotiated link speed correctly or not.

Hi Vishal

Knowing that the field team does this is useful. But I still don't see
these as self tests.

From the man page:

       -t --test
              Executes adapter selftest on the specified network
              device. Possible test modes are:

           offline
                  Perform full set of tests, possibly interrupting normal
                  operation during the tests,

           online Perform limited set of tests, not interrupting normal
                  operation,

           external_lb
                  Perform full set of tests, as for offline, and additionally
                  an external-loopback test.


Maybe a crashed adapter could be considered a self test, but

1) I expect nearly everything else is failing so it is pretty obvious
2) devlink health seems like a better API

Whether the PHY is up or not is only partially to do with self. It has a
lot to do with the link partner and the cable. Plus ip link show will tell
you this.

3) This actually sounds like a bug. Why would it have negotiated a link
speed it cannot support? If you have non-overlapping sets of
advertised link modes, i.e. there is no common mode to select, the
link should remain down, but this is not an error. You can use ethtool
to list both the local and peer advertised modes. You could also
report this via the new link state properties Mellanox just added.

       Andrew
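
The third point above can be illustrated with a small sketch: autonegotiation
picks a mode from the intersection of the local and peer advertised sets, and
an empty intersection means the link stays down rather than a test failing.
The mode names follow ethtool's naming style, but the function names and the
priority rule (higher speed first, then full duplex) are hypothetical and
simplified for illustration; real resolution follows the IEEE 802.3 priority
rules.

```python
def parse_mode(mode):
    """Split an ethtool-style mode name such as '1000baseT/Full'.

    Returns (speed_in_mbps, is_full_duplex) so that modes can be ranked.
    """
    name, duplex = mode.split("/")
    speed = int(name[: name.index("base")])
    return speed, duplex == "Full"


def resolve_link_mode(local, peer):
    """Return the best mode both ends advertise, or None if there is none.

    Assumption: "best" means highest speed, with full duplex preferred
    over half duplex at the same speed.
    """
    common = set(local) & set(peer)
    if not common:
        # No overlap: the link stays down, which is not an error.
        return None
    return max(common, key=parse_mode)


if __name__ == "__main__":
    local = ["100baseT/Full", "1000baseT/Full"]
    peer = ["10baseT/Half", "100baseT/Full"]
    print(resolve_link_mode(local, peer))
```

A result of None here corresponds to the "link should remain down" case, which
is why a speed check says little about the adapter itself.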
Vishal Kulkarni July 21, 2020, 1:38 p.m. UTC | #4
On Monday, July 07/20/20, 2020 at 15:35:54 +0200, Andrew Lunn wrote:
> On Mon, Jul 20, 2020 at 11:58:37AM +0530, Vishal Kulkarni wrote:
> > On Friday, July 07/17/20, 2020 at 20:02:51 +0200, Andrew Lunn wrote:
> > > On Fri, Jul 17, 2020 at 07:17:55PM +0530, Vishal Kulkarni wrote:
> > > > This series of patches add support for below tests.
> > > > 1. Adapter status test
> > > > 2. Link test
> > > > 3. Link speed test
> > > > 4. Loopback test
> > > 
> > > Hi Vishal
> > > 
> > > The loopback test is pretty usual for an ethtool self test. But the
> > > first 3 are rather odd. They don't really seem to be self tests. What
> > > reason do you have for adding these? Are you trying to debug a
> > > specific problem?
> > > 
> > > 	 Andrew
> > Hi Andrew,
> > 
> > Our requirement is to add a list of self tests that can summarize if the adapter is functioning
> > properly in a single command during system init. The above tests are the most common ones run by
> > our on-field diagnostics team. Besides, these tests seem to be the most common among other drivers as well.
> > 
> > Hence we have added
> > 1. Adapter status test: Tests whether the adapter is alive or crashed
> > 2. Link test: Adapter PHY is up or not.
> > 3. Link speed test: Adapter has negotiated link speed correctly or not.
> 
> Hi Vishal
> 
> Knowing that the field team does this is useful. But i still don't see
> these as self tests.
> 
> From the man page:
> 
>        -t --test
>               Executes adapter selftest on the specified network
> 	      device. Possible test modes are:
> 
>            offline
>                   Perform full set of tests, possibly interrupting normal
> 		  operation during the tests,
> 
>            online Perform limited set of tests, not interrupting normal
> 	   operation,
> 
>            external_lb
>                   Perform full set of tests, as for offline, and additionally
> 		  an external-loopback test.
> 
> 
> Maybe a crashed adaptor could be considered a self test, but
> 
> 1) I expect nearly everything else is failing so it is pretty obvious
> 2) devlink health seems like a better API
> 
> The PHY is up or not is only partially to do with self. It has a lot
> to do with the link partner and the cable. Plus ip link show will tell
> you this.
> 
> 3) This actually sounds like a bug. Why would it of negotiated a link
> speed it cannot support? If you have non-overlapping sets of
> advertised link modes, i.e. there is no common mode to select, the
> link should remain down, but this is not an error. You can use ethtool
> to list both the local and peer advertised modes. You could also
> report this via the new link state properties Mellanox just added.
> 
>        Andrew

Hi Andrew,

Our requirement is to get overall adapter health from a single tool and a single command.
Using devlink and ip will require multiple tools and commands.

-Vishal
Andrew Lunn July 21, 2020, 1:41 p.m. UTC | #5
> Hi Andrew,
> 
> Our requirement is to get overall adapter health from single tool and command.
> Using devlink and ip will require multiple tools and commands.

That is not a good reason to abuse kernel norms and do odd things.

     Andrew
Jakub Kicinski July 21, 2020, 4:49 p.m. UTC | #6
On Tue, 21 Jul 2020 15:41:45 +0200 Andrew Lunn wrote:
> > Hi Andrew,
> > 
> > Our requirement is to get overall adapter health from single tool and command.
> > Using devlink and ip will require multiple tools and commands.  
> 
> That is not a good reason to abuse the Kernel norms and do odd things.

+1 

You should probably build your own tool if you have this single tool
requirement. This single tool fallacy leads to very bad outcomes, like
people trying to report system state in device dumps, 'cause they want
system state in their customer bug reports :/
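
As a rough illustration of the "build your own tool" suggestion, the sketch
below gathers link state for an interface in userspace from the standard
sysfs attributes operstate, carrier and speed under /sys/class/net. The
function names and the returned dictionary layout are hypothetical, and
adapter crash state is not covered here; that would still come from devlink
health.

```python
from pathlib import Path


def read_attr(dev_dir, name):
    """Read one sysfs attribute, returning None if it cannot be read.

    Some attributes (for example carrier on a downed interface) return
    an error on read, so failures are treated as "unknown".
    """
    try:
        return (dev_dir / name).read_text().strip()
    except OSError:
        return None


def link_summary(iface, sysfs_root="/sys/class/net"):
    """Collect link state for one interface into a single dictionary."""
    dev_dir = Path(sysfs_root) / iface
    return {
        "operstate": read_attr(dev_dir, "operstate"),
        "carrier": read_attr(dev_dir, "carrier"),
        "speed_mbps": read_attr(dev_dir, "speed"),
    }


if __name__ == "__main__":
    import sys

    # Usage: python3 link_summary.py eth0 eth1
    for iface in sys.argv[1:]:
        print(iface, link_summary(iface))
```

A wrapper like this keeps the single-command workflow for a field team
without adding non-self-test checks to the driver's ethtool self test.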
David Miller July 21, 2020, 11:02 p.m. UTC | #7
From: Vishal Kulkarni <vishal@chelsio.com>
Date: Tue, 21 Jul 2020 19:08:35 +0530

> Our requirement is to get overall adapter health from single tool and command.
> Using devlink and ip will require multiple tools and commands.

This is an invalid argument.

We have multiple facilities, each of which handles a specific task that it
was designed for.  You shall use such facilities, as appropriate, to fulfill
your needs.
David Miller July 21, 2020, 11:02 p.m. UTC | #8
From: Andrew Lunn <andrew@lunn.ch>
Date: Tue, 21 Jul 2020 15:41:45 +0200

>> Hi Andrew,
>> 
>> Our requirement is to get overall adapter health from single tool and command.
>> Using devlink and ip will require multiple tools and commands.
> 
> That is not a good reason to abuse the Kernel norms and do odd things.

+1