Message ID | 20200629211801.C3D7095C0900@us180.sjc.aristanetworks.com |
---|---|
State | Awaiting Upstream |
Delegated to: | David Miller |
Headers | show |
Series | igb: reinit_locked() should be called with rtnl_lock | expand |
On Mon, 29 Jun 2020 14:18:01 -0700 Francesco Ruggeri wrote: > We observed a panic in igb_reset_task caused by this race condition > when doing a reboot -f: > > kworker reboot -f > > igb_reset_task > igb_reinit_locked > igb_down > napi_synchronize > __igb_shutdown > igb_clear_interrupt_scheme > igb_free_q_vectors > igb_free_q_vector > adapter->q_vector[v_idx] = NULL; > napi_disable > Panics trying to access > adapter->q_vector[v_idx].napi_state > > This commit applies to igb the same changes that were applied to ixgbe > in commit 8f4c5c9fb87a ("ixgbe: reinit_locked() should be called with > rtnl_lock") and commit 88adce4ea8f9 ("ixgbe: fix possible race in > reset subtask"). > > Signed-off-by: Francesco Ruggeri <fruggeri@arista.com> Thanks for the patch.. Would you mind adding a fixes tag here? Probably: Fixes: 9d5c824399de ("igb: PCI-Express 82575 Gigabit Ethernet driver") And as a matter of fact it looks like e1000e and e1000 have the same bug :/ Would you mind checking all Intel driver producing matches for all the affected ones?
> Would you mind adding a fixes tag here? Probably: > > Fixes: 9d5c824399de ("igb: PCI-Express 82575 Gigabit Ethernet driver") That seems to be the commit that introduced the driver in 2.6.25. I am not familiar with the history of the driver to tell if this was a day 1 problem or if it became an issue later. > > And as a matter of fact it looks like e1000e and e1000 have the same > bug :/ Would you mind checking all Intel driver producing matches for > all the affected ones? Do you mean identify all Intel drivers that may have the same issue? Francesco
> -----Original Message----- > From: Francesco Ruggeri <fruggeri@arista.com> > Sent: Monday, June 29, 2020 21:51 > To: Jakub Kicinski <kuba@kernel.org> > Cc: open list <linux-kernel@vger.kernel.org>; netdev > <netdev@vger.kernel.org>; intel-wired-lan@lists.osuosl.org; David Miller > <davem@davemloft.net>; Kirsher, Jeffrey T <jeffrey.t.kirsher@intel.com> > Subject: Re: [PATCH] igb: reinit_locked() should be called with rtnl_lock > > > Would you mind adding a fixes tag here? Probably: > > > > Fixes: 9d5c824399de ("igb: PCI-Express 82575 Gigabit Ethernet driver") > > That seems to be the commit that introduced the driver in 2.6.25. > I am not familiar with the history of the driver to tell if this was a day 1 > problem or if it became an issue later. > > > > > And as a matter of fact it looks like e1000e and e1000 have the same > > bug :/ Would you mind checking all Intel driver producing matches for > > all the affected ones? > > Do you mean identify all Intel drivers that may have the same issue? > Do not worry about the other Intel drivers, I have our developers looking at each of our drivers for the locking issue. @David Miller - I am picking up this patch
> Do not worry about the other Intel drivers, I have our developers looking at each of our drivers for the locking issue. > > @David Miller - I am picking up this patch There seems to be a second race, independent from the original one, that results in a divide error: kworker reboot -f tx packet igb_reset_task __igb_shutdown rtnl_lock() ... igb_clear_interrupt_scheme igb_free_q_vectors adapter->num_tx_queues = 0 ... rtnl_unlock() rtnl_lock() igb_reinit_locked igb_down igb_up netif_tx_start_all_queues dev_hard_start_xmit igb_xmit_frame igb_tx_queue_mapping Panics on r_idx % adapter->num_tx_queues Using in igb_reset_task a logic similar to the one in ixgbe_reset_subtask (bailing if __IGB_DOWN or __IGB_RESETTING is set) seems to avoid the panic. That logic was first introduced in ixgbe as part of commit 2f90b8657ec ('ixgbe: this patch adds support for DCB to the kernel and ixgbe driver'). Both fixes seem to be needed.
> -----Original Message----- > From: Francesco Ruggeri <fruggeri@arista.com> > Sent: Thursday, July 2, 2020 12:35 > To: Kirsher, Jeffrey T <jeffrey.t.kirsher@intel.com> > Cc: Jakub Kicinski <kuba@kernel.org>; David Miller <davem@davemloft.net>; > open list <linux-kernel@vger.kernel.org>; netdev <netdev@vger.kernel.org>; > intel-wired-lan@lists.osuosl.org > Subject: Re: [PATCH] igb: reinit_locked() should be called with rtnl_lock > > > Do not worry about the other Intel drivers, I have our developers looking at > each of our drivers for the locking issue. > > > > @David Miller - I am picking up this patch > > There seems to be a second race, independent from the original one, that > results in a divide error: > > kworker reboot -f tx packet > > igb_reset_task > __igb_shutdown > rtnl_lock() > ... > igb_clear_interrupt_scheme > igb_free_q_vectors > adapter->num_tx_queues = 0 > ... > rtnl_unlock() > rtnl_lock() > igb_reinit_locked > igb_down > igb_up > netif_tx_start_all_queues > dev_hard_start_xmit > igb_xmit_frame > igb_tx_queue_mapping > Panics on > r_idx % adapter->num_tx_queues > > Using in igb_reset_task a logic similar to the one in ixgbe_reset_subtask (bailing > if __IGB_DOWN or __IGB_RESETTING is set) seems to avoid the panic. > That logic was first introduced in ixgbe as part of commit 2f90b8657ec ('ixgbe: > this patch adds support for DCB to the kernel and ixgbe driver'). > Both fixes seem to be needed. So will you be sending a v2 of your patch to include the second fix?
> > So will you be sending a v2 of your patch to include the second fix? Yes, I am working on it. Just to confirm, v2 should include both fixes, right? Thanks, Francesco
> -----Original Message----- > From: Francesco Ruggeri <fruggeri@arista.com> > Sent: Thursday, July 2, 2020 13:20 > To: Kirsher, Jeffrey T <jeffrey.t.kirsher@intel.com> > Cc: Nguyen, Anthony L <anthony.l.nguyen@intel.com>; Jakub Kicinski > <kuba@kernel.org>; David Miller <davem@davemloft.net>; open list <linux- > kernel@vger.kernel.org>; netdev <netdev@vger.kernel.org>; intel-wired- > lan@lists.osuosl.org > Subject: Re: [PATCH] igb: reinit_locked() should be called with rtnl_lock > > > > > So will you be sending a v2 of your patch to include the second fix? > > Yes, I am working on it. Just to confirm, v2 should include both fixes, right? Correct.
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c index 8bb3db2cbd41..b79a78e102f3 100644 --- a/drivers/net/ethernet/intel/igb/igb_main.c +++ b/drivers/net/ethernet/intel/igb/igb_main.c @@ -6224,9 +6224,11 @@ static void igb_reset_task(struct work_struct *work) struct igb_adapter *adapter; adapter = container_of(work, struct igb_adapter, reset_task); + rtnl_lock(); igb_dump(adapter); netdev_err(adapter->netdev, "Reset adapter\n"); igb_reinit_locked(adapter); + rtnl_unlock(); } /**
We observed a panic in igb_reset_task caused by this race condition when doing a reboot -f: kworker reboot -f igb_reset_task igb_reinit_locked igb_down napi_synchronize __igb_shutdown igb_clear_interrupt_scheme igb_free_q_vectors igb_free_q_vector adapter->q_vector[v_idx] = NULL; napi_disable Panics trying to access adapter->q_vector[v_idx].napi_state This commit applies to igb the same changes that were applied to ixgbe in commit 8f4c5c9fb87a ("ixgbe: reinit_locked() should be called with rtnl_lock") and commit 88adce4ea8f9 ("ixgbe: fix possible race in reset subtask"). Signed-off-by: Francesco Ruggeri <fruggeri@arista.com> --- drivers/net/ethernet/intel/igb/igb_main.c | 2 ++ 1 file changed, 2 insertions(+)