Message ID | 20190716081655.7676-1-bpoirier@suse.com |
---|---|
State | Accepted |
Delegated to: | David Miller |
Series | [net] be2net: Signal that the device cannot transmit during reconfiguration |
From: Benjamin Poirier <bpoirier@suse.com>
Date: Tue, 16 Jul 2019 17:16:55 +0900

> While changing the number of interrupt channels, be2net stops adapter
> operation (including netif_tx_disable()) but it doesn't signal that it
> cannot transmit. This may lead dev_watchdog() to falsely trigger during
> that time.
>
> Add the missing call to netif_carrier_off(), following the pattern used in
> many other drivers. netif_carrier_on() is already taken care of in
> be_open().
>
> Signed-off-by: Benjamin Poirier <bpoirier@suse.com>

Applied.
I think there is a problem if dev_watchdog() is triggered before netif_carrier_off(). dev_watchdog() might call ->ndo_tx_timeout(), i.e. be_tx_timeout(), if txq timeout happens. Thus be_tx_timeout() could still be able to access the memory which is being freed by be_update_queues().

Thanks,
Firo
On 2019/07/17 13:23, Firo Yang wrote:
> I think there is a problem if dev_watchdog() is triggered before netif_carrier_off(). dev_watchdog() might call ->ndo_tx_timeout(), i.e. be_tx_timeout(), if txq timeout happens. Thus be_tx_timeout() could still be able to access the memory which is being freed by be_update_queues().
Good point. That's a separate problem which would occur in case of real
tx timeout. How about this followup change:
--- a/drivers/net/ethernet/emulex/benet/be_main.c
+++ b/drivers/net/ethernet/emulex/benet/be_main.c
@@ -4698,8 +4698,13 @@ int be_update_queues(struct be_adapter *adapter)
 	int status;
 
 	if (netif_running(netdev)) {
+		/* be_tx_timeout() must not run concurrently with this
+		 * function, synchronize with an already-running dev_watchdog
+		 */
+		netif_tx_lock_bh(netdev);
 		/* device cannot transmit now, avoid dev_watchdog timeouts */
 		netif_carrier_off(netdev);
+		netif_tx_unlock_bh(netdev);
 
 		be_close(netdev);
 	}
I don't think this change could fix this problem because if SMP, dev_watchdog() could run on a different CPU.

Thanks,
Firo
On 2019/07/17 17:56, Firo Yang wrote:
> I don't think this change could fix this problem because if SMP, dev_watchdog() could run on a different CPU.
hmm, SMP is clearly part of the picture here. The change I proposed
revolves around the synchronization offered by dev->tx_global_lock:
we have

\ dev_watchdog
	\ netif_tx_lock
		spin_lock(&dev->tx_global_lock);
		...
	\ netif_tx_unlock

and

\ be_update_queues
	\ netif_tx_lock_bh
		\ netif_tx_lock
			spin_lock(&dev->tx_global_lock);

Makes sense?
Crystal clear. Many thanks. // Firo
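
The locking argument from the exchange above can be illustrated with a small userspace model. This is only a sketch, not kernel code: a pthread mutex stands in for dev->tx_global_lock, and every name prefixed with model_ is hypothetical. It assumes the watchdog checks the carrier state and runs the timeout handler while holding the lock, so a watchdog pass either finishes before the carrier is turned off or sees it already off.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the parts of struct net_device discussed above. */
struct model_dev {
	pthread_mutex_t tx_global_lock; /* models dev->tx_global_lock */
	bool carrier_ok;                /* models the carrier state */
	bool queues_valid;              /* false once queue memory is torn down */
	int timeouts_handled;           /* how often the timeout handler ran */
	int stale_accesses;             /* handler runs that saw freed queues */
};

void model_dev_init(struct model_dev *dev)
{
	assert(dev != NULL);
	pthread_mutex_init(&dev->tx_global_lock, NULL);
	dev->carrier_ok = true;
	dev->queues_valid = true;
	dev->timeouts_handled = 0;
	dev->stale_accesses = 0;
}

/* Models one dev_watchdog() pass: the carrier check and the timeout
 * handler both run with the lock held.
 */
void model_watchdog(struct model_dev *dev)
{
	pthread_mutex_lock(&dev->tx_global_lock);
	if (dev->carrier_ok) {
		/* stands in for ->ndo_tx_timeout() touching queue memory */
		if (!dev->queues_valid)
			dev->stale_accesses++;
		dev->timeouts_handled++;
	}
	pthread_mutex_unlock(&dev->tx_global_lock);
}

/* Models the proposed be_update_queues() prologue: turn the carrier off
 * under the lock, and only tear the queues down after unlocking.
 */
void model_update_queues(struct model_dev *dev)
{
	pthread_mutex_lock(&dev->tx_global_lock);
	dev->carrier_ok = false;
	pthread_mutex_unlock(&dev->tx_global_lock);

	/* stands in for be_close() and the queue teardown that follows */
	dev->queues_valid = false;
}
```

In this model, a model_watchdog() call that already holds the lock makes model_update_queues() wait, and one that takes the lock afterwards sees carrier_ok == false and skips the handler. In both orderings stale_accesses stays at zero, which is the property the lock/unlock pair around netif_carrier_off() is meant to provide.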
diff --git a/drivers/net/ethernet/emulex/benet/be_main.c b/drivers/net/ethernet/emulex/benet/be_main.c
index 82015c8a5ed7..b7a246b33599 100644
--- a/drivers/net/ethernet/emulex/benet/be_main.c
+++ b/drivers/net/ethernet/emulex/benet/be_main.c
@@ -4697,8 +4697,12 @@ int be_update_queues(struct be_adapter *adapter)
 	struct net_device *netdev = adapter->netdev;
 	int status;
 
-	if (netif_running(netdev))
+	if (netif_running(netdev)) {
+		/* device cannot transmit now, avoid dev_watchdog timeouts */
+		netif_carrier_off(netdev);
+
 		be_close(netdev);
+	}
 
 	be_cancel_worker(adapter);
While changing the number of interrupt channels, be2net stops adapter
operation (including netif_tx_disable()) but it doesn't signal that it
cannot transmit. This may lead dev_watchdog() to falsely trigger during
that time.

Add the missing call to netif_carrier_off(), following the pattern used in
many other drivers. netif_carrier_on() is already taken care of in
be_open().

Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
---
 drivers/net/ethernet/emulex/benet/be_main.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)
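
To illustrate why the missing netif_carrier_off() matters, here is a simplified userspace model of the condition dev_watchdog() evaluates before reporting a transmit timeout. It is a sketch under stated assumptions, not the kernel implementation: the struct and the model_ functions are hypothetical, and the real watchdog checks each tx queue separately and also depends on the qdisc state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, single-queue stand-in for the device state involved. */
struct model_netdev {
	bool present;                 /* models netif_device_present() */
	bool running;                 /* models netif_running() */
	bool carrier_ok;              /* models netif_carrier_ok() */
	bool queue_stopped;           /* set by netif_tx_disable() in the driver */
	unsigned long trans_start;    /* time of the last transmit, in ticks */
	unsigned long watchdog_timeo; /* timeout, in ticks */
};

/* Wraparound-safe "a is later than b", in the spirit of time_after(). */
static bool model_time_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* Returns true if the modeled watchdog would report a tx timeout at 'now'. */
bool model_watchdog_would_fire(const struct model_netdev *dev,
			       unsigned long now)
{
	assert(dev != NULL);

	/* With the carrier off, the watchdog does not look at the queue. */
	if (!(dev->present && dev->running && dev->carrier_ok))
		return false;

	return dev->queue_stopped &&
	       model_time_after(now, dev->trans_start + dev->watchdog_timeo);
}
```

With queue_stopped set and carrier_ok still true, which corresponds to the state be_update_queues() left the device in before this patch, the model reports a timeout once the reconfiguration outlasts watchdog_timeo. Clearing carrier_ok first, as the patch does, suppresses it.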