
[net] net/ibmvnic: Report last valid speed and duplex values to ethtool

Message ID: 1561655353-17114-1-git-send-email-tlfalcon@linux.ibm.com
State: Changes Requested
Delegated to: David Miller
Series: [net] net/ibmvnic: Report last valid speed and duplex values to ethtool

Commit Message

Thomas Falcon June 27, 2019, 5:09 p.m. UTC
This patch resolves an issue with sensitive bonding modes
that require valid speed and duplex settings to function
properly. Currently, the adapter will report that device
speed and duplex are unknown if the communication link
with firmware is unavailable. This decision can break LACP
configurations if the timing is right.

For example, if invalid speeds are reported, the slave
device's link state is set to a transitional "fail" state
and the LACP port is disabled. However, if valid speeds
are reported later but the link state has not been altered,
the LACP port will remain disabled. If the link state then
transitions from "fail" back to "up", the slave ends up
reporting valid speed/duplex and being up, while the LACP
port remains disabled.

Work around this by reporting the last recorded speed
and duplex settings unless the device has never been
activated. In that case or when the hypervisor gives
invalid values, continue to report unknown speed or
duplex to ethtool.

Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com>
---
 drivers/net/ethernet/ibm/ibmvnic.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

Comments

Andrew Lunn June 27, 2019, 5:57 p.m. UTC | #1
On Thu, Jun 27, 2019 at 12:09:13PM -0500, Thomas Falcon wrote:
> This patch resolves an issue with sensitive bonding modes
> that require valid speed and duplex settings to function
> properly. Currently, the adapter will report that device
> speed and duplex are unknown if the communication link
> with firmware is unavailable.

Dumb question. If you cannot communicate with the firmware, isn't the
device FUBAR? So setting the LACP port to disabled is the correct
thing to do.

       Andrew
Thomas Falcon June 27, 2019, 8:56 p.m. UTC | #2
On 6/27/19 12:57 PM, Andrew Lunn wrote:
> On Thu, Jun 27, 2019 at 12:09:13PM -0500, Thomas Falcon wrote:
>> This patch resolves an issue with sensitive bonding modes
>> that require valid speed and duplex settings to function
>> properly. Currently, the adapter will report that device
>> speed and duplex are unknown if the communication link
>> with firmware is unavailable.
> Dumb question. If you cannot communicate with the firmware, isn't the
> device FUBAR? So setting the LACP port to disabled is the correct
> thing to do.
>
>         Andrew
>
Yes, I think that is correct too. The problem is that the link is only
down temporarily. In this case (we are testing with a pseries logical
partition), the partition is migrated to another server. The driver must
wait for a signal from the hypervisor to resume operation with the new
device. Once it resumes, we see that the device reboots and gets
correct speed settings, but the port flag (AD_LACP_PORT_ENABLED) is
still cleared.

Tom
David Miller July 2, 2019, 9:01 p.m. UTC | #3
From: Thomas Falcon <tlfalcon@linux.ibm.com>
Date: Thu, 27 Jun 2019 12:09:13 -0500

> This patch resolves an issue with sensitive bonding modes
> that require valid speed and duplex settings to function
> properly. Currently, the adapter will report that device
> speed and duplex are unknown if the communication link
> with firmware is unavailable. This decision can break LACP
> configurations if the timing is right.
> 
> For example, if invalid speeds are reported, the slave
> device's link state is set to a transitional "fail" state
> and the LACP port is disabled. However, if valid speeds
> are reported later but the link state has not been altered,
> the LACP port will remain disabled. If the link state then
> transitions from "fail" back to "up", the slave ends up
> reporting valid speed/duplex and being up, while the LACP
> port remains disabled.
> 
> Work around this by reporting the last recorded speed
> and duplex settings unless the device has never been
> activated. In that case or when the hypervisor gives
> invalid values, continue to report unknown speed or
> duplex to ethtool.
> 
> Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com>

Like Andrew, I have my concerns about this.

If the firmware is unavailable, the link is effectively down.  So
you should report link down and unknown link parameters.

Bonding and LACP should do the right thing when the firmware is
reachable again after the migration and the link goes back up.

If bonding/LACP isn't doing that, then the bug is there.

Patch

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index 3da6800..7c14e33 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -2276,10 +2276,8 @@  static int ibmvnic_get_link_ksettings(struct net_device *netdev,
 	int rc;
 
 	rc = send_query_phys_parms(adapter);
-	if (rc) {
-		adapter->speed = SPEED_UNKNOWN;
-		adapter->duplex = DUPLEX_UNKNOWN;
-	}
+	if (rc)
+		netdev_warn(netdev, "Device query of current speed and duplex settings failed; reported values may be stale.\n");
 	cmd->base.speed = adapter->speed;
 	cmd->base.duplex = adapter->duplex;
 	cmd->base.port = PORT_FIBRE;
@@ -4834,6 +4832,8 @@  static int ibmvnic_probe(struct vio_dev *dev, const struct vio_device_id *id)
 	dev_set_drvdata(&dev->dev, netdev);
 	adapter->vdev = dev;
 	adapter->netdev = netdev;
+	adapter->speed = SPEED_UNKNOWN;
+	adapter->duplex = DUPLEX_UNKNOWN;
 
 	ether_addr_copy(adapter->mac_addr, mac_addr_p);
 	ether_addr_copy(netdev->dev_addr, adapter->mac_addr);