Message ID | 20200812101059.5501-2-shayagr@amazon.com |
---|---|
State | Changes Requested |
Delegated to: | David Miller |
Headers | show |
Series | Bug fixes for ENA ethernet driver | expand |
On Wed, 12 Aug 2020 13:10:57 +0300 Shay Agroskin wrote: > This patch also removes the destruction of the timer and reset services > from ena_remove() since the timer is destroyed by the destruction > routine and the reset work is handled by this patch. You'd still have a use after free if the work runs after the device is removed. I think cancel_work_sync() gotta stay.
Jakub Kicinski <kuba@kernel.org> writes: > On Wed, 12 Aug 2020 13:10:57 +0300 Shay Agroskin wrote: >> This patch also removes the destruction of the timer and reset >> services >> from ena_remove() since the timer is destroyed by the >> destruction >> routine and the reset work is handled by this patch. > > You'd still have a use after free if the work runs after the > device is > removed. I think cancel_work_sync() gotta stay. Hi, thank you for reviewing the patch. Short answer: I verified that the ENA_FLAG_TRIGGER_RESET flag cannot be set after ena_destroy_device() finishes its execution. Long answer: The ena_destroy_device() function is called with rtnl_lock() held, so it cannot run in parallel with the reset function. Also the destroy function clears the bit ENA_FLAG_TRIGGER_RESET without which the reset function just exits without doing anything. A problem can then only happen when some routine sets the ENA_FLAG_TRIGGER_RESET bit before the reset function is executed, the following describes all functions from which this bit can be set: - check_* functions: these function are called from the timer routine which is destroyed in ena_destroy_device(), so by the time the rtnl_lock() released, the bit is cleared - napi related functions (io_poll, xdp_io_poll, validate_rx_req_id etc.): the napi is de-registered in ena_destroy_device(), so none of these functions is called after destroying the device. - xmit functions (ena_xmit_common, ena_tx_timeout): the device is brought down and all its RX/TX resources are freed before releasing the lock. These are all the occurrences I found. Without this bit set, the reset function would fail the 'if' check in this patch, and exit without doing anything. Destroying the reset function explicitly won't help since by the time we do it, the function can already be executed.
On Thu, 13 Aug 2020 15:51:46 +0300 Shay Agroskin wrote: > Long answer: > The ena_destroy_device() function is called with rtnl_lock() held, > so it cannot run in parallel with the reset function. Also the > destroy function clears the bit ENA_FLAG_TRIGGER_RESET without > which the reset function just exits without doing anything. > > A problem can then only happen when some routine sets the > ENA_FLAG_TRIGGER_RESET bit before the reset function is executed, > the following describes all functions from which this bit can be > set: ena_fw_reset_device() runs from a workqueue, it can be preempted right before it tries to take the rtnl_lock. Then after arbitrarily long delay it will start again, take the lock, and dereference adapter->flags. But adapter could have been long freed at this point. Unless you flush a workqueue or cancel_work_sync() you can never be sure it's not scheduled. And I can only see a flush when module is unloaded now.
Jakub Kicinski <kuba@kernel.org> writes: > On Thu, 13 Aug 2020 15:51:46 +0300 Shay Agroskin wrote: >> Long answer: >> The ena_destroy_device() function is called with rtnl_lock() >> held, >> so it cannot run in parallel with the reset function. Also the >> destroy function clears the bit ENA_FLAG_TRIGGER_RESET without >> which the reset function just exits without doing anything. >> >> A problem can then only happen when some routine sets the >> ENA_FLAG_TRIGGER_RESET bit before the reset function is >> executed, >> the following describes all functions from which this bit can >> be >> set: > > ena_fw_reset_device() runs from a workqueue, it can be preempted > right > before it tries to take the rtnl_lock. Then after arbitrarily > long > delay it will start again, take the lock, and dereference > adapter->flags. But adapter could have been long freed at this > point. Missed that the check for the 'flags' field also requires that netdev_priv field (adapter variable) would be allocated. Thank you for pointing that out, this indeed needs to be fixed. I'll add reset work destruction in next patchset. Thank you for reviewing it > > Unless you flush a workqueue or cancel_work_sync() you can never > be > sure it's not scheduled. And I can only see a flush when module > is > unloaded now.
diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c index 2a6c9725e092..0488fcbf48f7 100644 --- a/drivers/net/ethernet/amazon/ena/ena_netdev.c +++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c @@ -3601,16 +3601,14 @@ static void ena_fw_reset_device(struct work_struct *work) { struct ena_adapter *adapter = container_of(work, struct ena_adapter, reset_task); - struct pci_dev *pdev = adapter->pdev; - if (unlikely(!test_bit(ENA_FLAG_TRIGGER_RESET, &adapter->flags))) { - dev_err(&pdev->dev, - "device reset schedule while reset bit is off\n"); - return; - } rtnl_lock(); - ena_destroy_device(adapter, false); - ena_restore_device(adapter); + + if (likely(test_bit(ENA_FLAG_TRIGGER_RESET, &adapter->flags))) { + ena_destroy_device(adapter, false); + ena_restore_device(adapter); + } + rtnl_unlock(); } @@ -4389,9 +4387,6 @@ static void __ena_shutoff(struct pci_dev *pdev, bool shutdown) netdev->rx_cpu_rmap = NULL; } #endif /* CONFIG_RFS_ACCEL */ - del_timer_sync(&adapter->timer_service); - - cancel_work_sync(&adapter->reset_task); rtnl_lock(); /* lock released inside the below if-else block */ adapter->reset_reason = ENA_REGS_RESET_SHUTDOWN;
The reset work is scheduled by the timer routine whenever it detects that a device reset is required (e.g. when a keep_alive signal is missing). When releasing device resources in ena_destroy() the driver cancels the scheduling of the timer routine without destroying the reset work explicitly. This creates the following bug: The driver is suspended and the ena_suspend() function is called -> This function calls ena_destroy() to free the net device resources -> The driver waits for the timer routine to finish its execution and then cancels it, thus preventing from it to be called again. If, in its final execution, the timer routine schedules a reset, the reset routine might be called after the device resources are freed, which might cause a kernel panic. By changing the reset routine so that it cannot run simultaneously with the destruction routine, we allow the reset routine read the device's state accurately. This is achieved by checking whether ENA_FLAG_TRIGGER_RESET flag is set before resetting the device and making both the destruction function and the flag check are under rtnl lock. The ENA_FLAG_TRIGGER_RESET is cleared at the end of the destruction routine. Also surround the flag check with 'likely' because we expect that the reset routine would be called only when ENA_FLAG_TRIGGER_RESET flag is set. This patch also removes the destruction of the timer and reset services from ena_remove() since the timer is destroyed by the destruction routine and the reset work is handled by this patch. Fixes: 8c5c7abdeb2d ("net: ena: add power management ops to the ENA driver") Signed-off-by: Shay Agroskin <shayagr@amazon.com> --- drivers/net/ethernet/amazon/ena/ena_netdev.c | 17 ++++++----------- 1 file changed, 6 insertions(+), 11 deletions(-)