Message ID | 20190718014218.16610-1-bpoirier@suse.com |
---|---|
State | Accepted |
Delegated to: | David Miller |
Headers | show |
Series | [net] be2net: Synchronize be_update_queues with dev_watchdog | expand |
On 7/17/19 6:42 PM, Benjamin Poirier wrote: > As pointed out by Firo Yang, a netdev tx timeout may trigger just before an > ethtool set_channels operation is started. be_tx_timeout(), which dumps > some queue structures, is not written to run concurrently with > be_update_queues(), which frees/allocates those queues structures. Add some > synchronization between the two. > > Message-id: <CH2PR18MB31898E033896F9760D36BFF288C90@CH2PR18MB3189.namprd18.prod.outlook.com> > Signed-off-by: Benjamin Poirier <bpoirier@suse.com> Would not moving the netif_tx_disable() in be_close() further up in the function resolve that problem as well?
On 2019/07/18 10:23, Florian Fainelli wrote: > On 7/17/19 6:42 PM, Benjamin Poirier wrote: > > As pointed out by Firo Yang, a netdev tx timeout may trigger just before an > > ethtool set_channels operation is started. be_tx_timeout(), which dumps > > some queue structures, is not written to run concurrently with > > be_update_queues(), which frees/allocates those queues structures. Add some > > synchronization between the two. > > > > Message-id: <CH2PR18MB31898E033896F9760D36BFF288C90@CH2PR18MB3189.namprd18.prod.outlook.com> > > Signed-off-by: Benjamin Poirier <bpoirier@suse.com> > > Would not moving the netif_tx_disable() in be_close() further up in the > function resolve that problem as well? Thanks for your review Florian, No, netif_tx_disable() doesn't provide mutual exclusion with dev_watchdog(). You can have: cpu0 cpu1 \ dev_watchdog \ netif_tx_lock \ be_tx_timeout ... \ be_set_channels \ be_update_queues \ netif_carrier_off \ netif_tx_disable ... \ be_clear_queues still running in be_tx_timeout(), boom!
From: Benjamin Poirier <bpoirier@suse.com> Date: Thu, 18 Jul 2019 10:42:18 +0900 > As pointed out by Firo Yang, a netdev tx timeout may trigger just before an > ethtool set_channels operation is started. be_tx_timeout(), which dumps > some queue structures, is not written to run concurrently with > be_update_queues(), which frees/allocates those queues structures. Add some > synchronization between the two. > > Message-id: <CH2PR18MB31898E033896F9760D36BFF288C90@CH2PR18MB3189.namprd18.prod.outlook.com> > Signed-off-by: Benjamin Poirier <bpoirier@suse.com> Applied and queued up for -stable, thanks.
diff --git a/drivers/net/ethernet/emulex/benet/be_main.c b/drivers/net/ethernet/emulex/benet/be_main.c index b7a246b33599..2edb86ec9fe9 100644 --- a/drivers/net/ethernet/emulex/benet/be_main.c +++ b/drivers/net/ethernet/emulex/benet/be_main.c @@ -4698,8 +4698,13 @@ int be_update_queues(struct be_adapter *adapter) int status; if (netif_running(netdev)) { + /* be_tx_timeout() must not run concurrently with this + * function, synchronize with an already-running dev_watchdog + */ + netif_tx_lock_bh(netdev); /* device cannot transmit now, avoid dev_watchdog timeouts */ netif_carrier_off(netdev); + netif_tx_unlock_bh(netdev); be_close(netdev); }
As pointed out by Firo Yang, a netdev tx timeout may trigger just before an ethtool set_channels operation is started. be_tx_timeout(), which dumps some queue structures, is not written to run concurrently with be_update_queues(), which frees/allocates those queues structures. Add some synchronization between the two. Message-id: <CH2PR18MB31898E033896F9760D36BFF288C90@CH2PR18MB3189.namprd18.prod.outlook.com> Signed-off-by: Benjamin Poirier <bpoirier@suse.com> --- drivers/net/ethernet/emulex/benet/be_main.c | 5 +++++ 1 file changed, 5 insertions(+)