diff mbox series

[V2,net,1/3] net: ena: Prevent reset after device destruction

Message ID 20200819090443.24917-2-shayagr@amazon.com
State Changes Requested
Delegated to: David Miller
Headers show
Series Bug fixes for ENA ethernet driver | expand

Commit Message

Shay Agroskin Aug. 19, 2020, 9:04 a.m. UTC
The reset work is scheduled by the timer routine whenever it
detects that a device reset is required (e.g. when a keep_alive signal
is missing).
When releasing device resources in ena_destroy_device() the driver cancels the
scheduling of the timer routine without destroying the reset
work explicitly.

This creates the following bug:
    The driver is suspended and the ena_suspend() function is called
	-> This function calls ena_destroy_device() to free the net device
	   resources
	    -> The driver waits for the timer routine to finish
	    its execution and then cancels it, thus preventing from it
	    to be called again.

    If, in its final execution, the timer routine schedules a reset,
    the reset routine might be called afterwards, and a redundant call to
    ena_restore_device() would be made.

By changing the reset routine we allow it to read the device's state
accurately.
This is achieved by checking whether ENA_FLAG_TRIGGER_RESET flag is set
before resetting the device and making both the destruction function and
the flag check are under rtnl lock.
The ENA_FLAG_TRIGGER_RESET is cleared at the end of the destruction
routine. Also surround the flag check with 'likely' because
we expect that the reset routine would be called only when
ENA_FLAG_TRIGGER_RESET flag is set.

The destruction of the timer and reset services in __ena_shutoff() have to
stay, even though the timer routine is destroyed in ena_destroy_device().
This is to avoid a case in which the reset routine is scheduled after
free_netdev() in __ena_shutoff(), which would create an access to freed
memory in adapter->flags.

Fixes: 84a629e ("[New feature] ena_netdev: Add hibernation support")
Signed-off-by: Shay Agroskin <shayagr@amazon.com>
---
 drivers/net/ethernet/amazon/ena/ena_netdev.c | 19 ++++++++++---------
 1 file changed, 10 insertions(+), 9 deletions(-)

Comments

Jakub Kicinski Aug. 19, 2020, 3:46 p.m. UTC | #1
On Wed, 19 Aug 2020 12:04:41 +0300 Shay Agroskin wrote:
> Fixes: 84a629e ("[New feature] ena_netdev: Add hibernation support")

Fixes tag: Fixes: 84a629e ("[New feature] ena_netdev: Add hibernation support")
Has these problem(s):
	- Target SHA1 does not exist


Also hash needs to be 12 characters.
Shay Agroskin Aug. 19, 2020, 5:27 p.m. UTC | #2
Jakub Kicinski <kuba@kernel.org> writes:

> On Wed, 19 Aug 2020 12:04:41 +0300 Shay Agroskin wrote:
>> Fixes: 84a629e ("[New feature] ena_netdev: Add hibernation 
>> support")
>
> Fixes tag: Fixes: 84a629e ("[New feature] ena_netdev: Add 
> hibernation support")
> Has these problem(s):
> 	- Target SHA1 does not exist
>
>
> Also hash needs to be 12 characters.

Yup, sorry for that. Should have been more careful with it.

I'll fix it in next patchset. Thanks
diff mbox series

Patch

diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c
index 2a6c9725e092..44aeace196f0 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
@@ -3601,16 +3601,14 @@  static void ena_fw_reset_device(struct work_struct *work)
 {
 	struct ena_adapter *adapter =
 		container_of(work, struct ena_adapter, reset_task);
-	struct pci_dev *pdev = adapter->pdev;
 
-	if (unlikely(!test_bit(ENA_FLAG_TRIGGER_RESET, &adapter->flags))) {
-		dev_err(&pdev->dev,
-			"device reset schedule while reset bit is off\n");
-		return;
-	}
 	rtnl_lock();
-	ena_destroy_device(adapter, false);
-	ena_restore_device(adapter);
+
+	if (likely(test_bit(ENA_FLAG_TRIGGER_RESET, &adapter->flags))) {
+		ena_destroy_device(adapter, false);
+		ena_restore_device(adapter);
+	}
+
 	rtnl_unlock();
 }
 
@@ -4389,8 +4387,11 @@  static void __ena_shutoff(struct pci_dev *pdev, bool shutdown)
 		netdev->rx_cpu_rmap = NULL;
 	}
 #endif /* CONFIG_RFS_ACCEL */
-	del_timer_sync(&adapter->timer_service);
 
+	/* Make sure timer and reset routine won't be called after
+	 * freeing device resources.
+	 */
+	del_timer_sync(&adapter->timer_service);
 	cancel_work_sync(&adapter->reset_task);
 
 	rtnl_lock(); /* lock released inside the below if-else block */