Message ID | 20200717134759.8268-1-vishal@chelsio.com |
---|---|
Series | cxgb4: add ethtool self_test support |
On Fri, Jul 17, 2020 at 07:17:55PM +0530, Vishal Kulkarni wrote:
> This series of patches add support for below tests.
> 1. Adapter status test
> 2. Link test
> 3. Link speed test
> 4. Loopback test

Hi Vishal

The loopback test is pretty usual for an ethtool self test. But the
first 3 are rather odd. They don't really seem to be self tests. What
reason do you have for adding these? Are you trying to debug a
specific problem?

Andrew

On Friday, July 07/17/20, 2020 at 20:02:51 +0200, Andrew Lunn wrote:
> On Fri, Jul 17, 2020 at 07:17:55PM +0530, Vishal Kulkarni wrote:
> > This series of patches add support for below tests.
> > 1. Adapter status test
> > 2. Link test
> > 3. Link speed test
> > 4. Loopback test
>
> Hi Vishal
>
> The loopback test is pretty usual for an ethtool self test. But the
> first 3 are rather odd. They don't really seem to be self tests. What
> reason do you have for adding these? Are you trying to debug a
> specific problem?
>
> Andrew

Hi Andrew,

Our requirement is to add a list of self tests that can summarize if the adapter is functioning
properly in a single command during system init. The above tests are the most common ones run by
our on-field diagnostics team. Besides, these tests seem to be the most common among other drivers as well.

Hence we have added
1. Adapter status test: Tests whether the adapter is alive or crashed
2. Link test: Adapter PHY is up or not.
3. Link speed test: Adapter has negotiated link speed correctly or not.

-Vishal

On Mon, Jul 20, 2020 at 11:58:37AM +0530, Vishal Kulkarni wrote:
> On Friday, July 07/17/20, 2020 at 20:02:51 +0200, Andrew Lunn wrote:
> > On Fri, Jul 17, 2020 at 07:17:55PM +0530, Vishal Kulkarni wrote:
> > > This series of patches add support for below tests.
> > > 1. Adapter status test
> > > 2. Link test
> > > 3. Link speed test
> > > 4. Loopback test
> >
> > Hi Vishal
> >
> > The loopback test is pretty usual for an ethtool self test. But the
> > first 3 are rather odd. They don't really seem to be self tests. What
> > reason do you have for adding these? Are you trying to debug a
> > specific problem?
> >
> > Andrew
> Hi Andrew,
>
> Our requirement is to add a list of self tests that can summarize if the adapter is functioning
> properly in a single command during system init. The above tests are the most common ones run by
> our on-field diagnostics team. Besides, these tests seem to be the most common among other drivers as well.
>
> Hence we have added
> 1. Adapter status test: Tests whether the adapter is alive or crashed
> 2. Link test: Adapter PHY is up or not.
> 3. Link speed test: Adapter has negotiated link speed correctly or not.

Hi Vishal

Knowing that the field team does this is useful. But I still don't see
these as self tests.

From the man page:

       -t --test
              Executes adapter selftest on the specified network
              device. Possible test modes are:

              offline
                     Perform full set of tests, possibly interrupting normal
                     operation during the tests,

              online Perform limited set of tests, not interrupting normal
                     operation,

              external_lb
                     Perform full set of tests, as for offline, and additionally
                     an external-loopback test.

Maybe a crashed adaptor could be considered a self test, but

1) I expect nearly everything else is failing so it is pretty obvious
2) devlink health seems like a better API

The PHY is up or not is only partially to do with self. It has a lot
to do with the link partner and the cable. Plus ip link show will tell
you this.

3) This actually sounds like a bug. Why would it have negotiated a link
speed it cannot support? If you have non-overlapping sets of
advertised link modes, i.e. there is no common mode to select, the
link should remain down, but this is not an error. You can use ethtool
to list both the local and peer advertised modes. You could also
report this via the new link state properties Mellanox just added.

Andrew

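The point about non-overlapping advertised modes can be illustrated with a toy model. This is only a sketch, not driver or ethtool code: the mode names, the speed table, and the function `resolve_link` are all hypothetical, and real autonegotiation priority resolution is more involved than picking the fastest common entry.

```python
# Toy model of link mode resolution; every name here is illustrative.
SPEED_MBPS = {
    "1000baseT/Full": 1000,
    "10000baseT/Full": 10000,
    "25000baseCR/Full": 25000,
}


def resolve_link(local_advertised, peer_advertised):
    """Return the fastest mode both ends advertise, or None if none is shared."""
    common = set(local_advertised) & set(peer_advertised)
    if not common:
        # No common mode: the link stays down, which is not an error.
        return None
    # Simplification: rank the shared modes by speed only.
    return max(common, key=SPEED_MBPS.__getitem__)
```

In this model a `None` result corresponds to "link down, nothing shared", which is a configuration outcome to report rather than a failed self test.
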
On Monday, July 07/20/20, 2020 at 15:35:54 +0200, Andrew Lunn wrote:
> On Mon, Jul 20, 2020 at 11:58:37AM +0530, Vishal Kulkarni wrote:
> > On Friday, July 07/17/20, 2020 at 20:02:51 +0200, Andrew Lunn wrote:
> > > On Fri, Jul 17, 2020 at 07:17:55PM +0530, Vishal Kulkarni wrote:
> > > > This series of patches add support for below tests.
> > > > 1. Adapter status test
> > > > 2. Link test
> > > > 3. Link speed test
> > > > 4. Loopback test
> > >
> > > Hi Vishal
> > >
> > > The loopback test is pretty usual for an ethtool self test. But the
> > > first 3 are rather odd. They don't really seem to be self tests. What
> > > reason do you have for adding these? Are you trying to debug a
> > > specific problem?
> > >
> > > Andrew
> > Hi Andrew,
> >
> > Our requirement is to add a list of self tests that can summarize if the adapter is functioning
> > properly in a single command during system init. The above tests are the most common ones run by
> > our on-field diagnostics team. Besides, these tests seem to be the most common among other drivers as well.
> >
> > Hence we have added
> > 1. Adapter status test: Tests whether the adapter is alive or crashed
> > 2. Link test: Adapter PHY is up or not.
> > 3. Link speed test: Adapter has negotiated link speed correctly or not.
>
> Hi Vishal
>
> Knowing that the field team does this is useful. But I still don't see
> these as self tests.
>
> From the man page:
>
>        -t --test
>               Executes adapter selftest on the specified network
>               device. Possible test modes are:
>
>               offline
>                      Perform full set of tests, possibly interrupting normal
>                      operation during the tests,
>
>               online Perform limited set of tests, not interrupting normal
>                      operation,
>
>               external_lb
>                      Perform full set of tests, as for offline, and additionally
>                      an external-loopback test.
>
> Maybe a crashed adaptor could be considered a self test, but
>
> 1) I expect nearly everything else is failing so it is pretty obvious
> 2) devlink health seems like a better API
>
> The PHY is up or not is only partially to do with self. It has a lot
> to do with the link partner and the cable. Plus ip link show will tell
> you this.
>
> 3) This actually sounds like a bug. Why would it have negotiated a link
> speed it cannot support? If you have non-overlapping sets of
> advertised link modes, i.e. there is no common mode to select, the
> link should remain down, but this is not an error. You can use ethtool
> to list both the local and peer advertised modes. You could also
> report this via the new link state properties Mellanox just added.
>
> Andrew

Hi Andrew,

Our requirement is to get overall adapter health from single tool and command.
Using devlink and ip will require multiple tools and commands.

-Vishal

> Hi Andrew,
>
> Our requirement is to get overall adapter health from single tool and command.
> Using devlink and ip will require multiple tools and commands.

That is not a good reason to abuse the Kernel norms and do odd things.

Andrew

On Tue, 21 Jul 2020 15:41:45 +0200 Andrew Lunn wrote:
> > Hi Andrew,
> >
> > Our requirement is to get overall adapter health from single tool and command.
> > Using devlink and ip will require multiple tools and commands.
>
> That is not a good reason to abuse the Kernel norms and do odd things.

+1

You should probably build your own tool if you have this single tool
requirement. This single tool fallacy leads to very bad outcomes, like
people trying to report system state in device dumps, 'cause they want
system state in their customer bug reports :/

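As a rough illustration of what such a separate userspace tool might start from, the sketch below collects link state for one interface from the standard `/sys/class/net/<iface>/` attributes (`operstate`, `carrier`, `speed`). The helper names `read_attr` and `link_summary` and the `sysfs_root` parameter are hypothetical, chosen for this example; nothing here comes from the cxgb4 series. Adapter health and the loopback self test would still come from their own facilities (devlink health, `ethtool -t`), which such a wrapper could invoke separately.

```python
from pathlib import Path


def read_attr(iface_dir, name):
    """Return a sysfs attribute as a stripped string, or None if unreadable."""
    try:
        return (iface_dir / name).read_text().strip()
    except OSError:
        # Missing attribute, or one the kernel refuses to report right now
        # (for example when the interface is administratively down).
        return None


def link_summary(iface, sysfs_root="/sys/class/net"):
    """Collect basic link state for one interface into a dict."""
    iface_dir = Path(sysfs_root) / iface
    return {
        "operstate": read_attr(iface_dir, "operstate"),
        "carrier": read_attr(iface_dir, "carrier"),
        "speed_mbps": read_attr(iface_dir, "speed"),
    }
```

A call such as `link_summary("eth0")` returns raw strings and leaves interpretation to the caller, so the tool only reports state that the existing interfaces already expose.
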
From: Vishal Kulkarni <vishal@chelsio.com>
Date: Tue, 21 Jul 2020 19:08:35 +0530

> Our requirement is to get overall adapter health from single tool and command.
> Using devlink and ip will require multiple tools and commands.

This is an invalid argument.

We have multiple facilities, each of which handles a specific task
that it was designed for. You shall use such facilities, as
appropriate, to fulfill your needs.

From: Andrew Lunn <andrew@lunn.ch>
Date: Tue, 21 Jul 2020 15:41:45 +0200

>> Hi Andrew,
>>
>> Our requirement is to get overall adapter health from single tool and command.
>> Using devlink and ip will require multiple tools and commands.
>
> That is not a good reason to abuse the Kernel norms and do odd things.

+1