diff mbox series

igb: reinit_locked() should be called with rtnl_lock

Message ID 20200629211801.C3D7095C0900@us180.sjc.aristanetworks.com
State Awaiting Upstream
Delegated to: David Miller
Headers show
Series igb: reinit_locked() should be called with rtnl_lock | expand

Commit Message

Francesco Ruggeri June 29, 2020, 9:18 p.m. UTC
We observed a panic in igb_reset_task caused by this race condition
when doing a reboot -f:

	kworker			reboot -f

	igb_reset_task
	igb_reinit_locked
	igb_down
	napi_synchronize
				__igb_shutdown
				igb_clear_interrupt_scheme
				igb_free_q_vectors
				igb_free_q_vector
				adapter->q_vector[v_idx] = NULL;
	napi_disable
	Panics trying to access
	adapter->q_vector[v_idx].napi_state

This commit applies to igb the same changes that were applied to ixgbe
in commit 8f4c5c9fb87a ("ixgbe: reinit_locked() should be called with
rtnl_lock") and commit 88adce4ea8f9 ("ixgbe: fix possible race in
reset subtask").

Signed-off-by: Francesco Ruggeri <fruggeri@arista.com>
---
 drivers/net/ethernet/intel/igb/igb_main.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Jakub Kicinski June 30, 2020, 12:16 a.m. UTC | #1
On Mon, 29 Jun 2020 14:18:01 -0700 Francesco Ruggeri wrote:
> We observed a panic in igb_reset_task caused by this race condition
> when doing a reboot -f:
> 
> 	kworker			reboot -f
> 
> 	igb_reset_task
> 	igb_reinit_locked
> 	igb_down
> 	napi_synchronize
> 				__igb_shutdown
> 				igb_clear_interrupt_scheme
> 				igb_free_q_vectors
> 				igb_free_q_vector
> 				adapter->q_vector[v_idx] = NULL;
> 	napi_disable
> 	Panics trying to access
> 	adapter->q_vector[v_idx].napi_state
> 
> This commit applies to igb the same changes that were applied to ixgbe
> in commit 8f4c5c9fb87a ("ixgbe: reinit_locked() should be called with
> rtnl_lock") and commit 88adce4ea8f9 ("ixgbe: fix possible race in
> reset subtask").
> 
> Signed-off-by: Francesco Ruggeri <fruggeri@arista.com>

Thanks for the patch..

Would you mind adding a fixes tag here? Probably:

Fixes: 9d5c824399de ("igb: PCI-Express 82575 Gigabit Ethernet driver")

And as a matter of fact it looks like e1000e and e1000 have the same
bug :/ Would you mind checking all Intel driver producing matches for
all the affected ones?
Francesco Ruggeri June 30, 2020, 4:50 a.m. UTC | #2
> Would you mind adding a fixes tag here? Probably:
>
> Fixes: 9d5c824399de ("igb: PCI-Express 82575 Gigabit Ethernet driver")

That seems to be the commit that introduced the driver in 2.6.25.
I am not familiar with the history of the driver to tell if this was a day 1
problem or if it became an issue later.

>
> And as a matter of fact it looks like e1000e and e1000 have the same
> bug :/ Would you mind checking all Intel driver producing matches for
> all the affected ones?

Do you mean identify all Intel drivers that may have the same issue?

Francesco
Kirsher, Jeffrey T July 1, 2020, 1:37 a.m. UTC | #3
> -----Original Message-----
> From: Francesco Ruggeri <fruggeri@arista.com>
> Sent: Monday, June 29, 2020 21:51
> To: Jakub Kicinski <kuba@kernel.org>
> Cc: open list <linux-kernel@vger.kernel.org>; netdev
> <netdev@vger.kernel.org>; intel-wired-lan@lists.osuosl.org; David Miller
> <davem@davemloft.net>; Kirsher, Jeffrey T <jeffrey.t.kirsher@intel.com>
> Subject: Re: [PATCH] igb: reinit_locked() should be called with rtnl_lock
> 
> > Would you mind adding a fixes tag here? Probably:
> >
> > Fixes: 9d5c824399de ("igb: PCI-Express 82575 Gigabit Ethernet driver")
> 
> That seems to be the commit that introduced the driver in 2.6.25.
> I am not familiar with the history of the driver to tell if this was a day 1
> problem or if it became an issue later.
> 
> >
> > And as a matter of fact it looks like e1000e and e1000 have the same
> > bug :/ Would you mind checking all Intel driver producing matches for
> > all the affected ones?
> 
> Do you mean identify all Intel drivers that may have the same issue?
> 

Do not worry about the other Intel drivers, I have our developers looking at each of our drivers for the locking issue.

@David Miller - I am picking up this patch
Francesco Ruggeri July 2, 2020, 7:35 p.m. UTC | #4
> Do not worry about the other Intel drivers, I have our developers looking at each of our drivers for the locking issue.
>
> @David Miller - I am picking up this patch

There seems to be a second race, independent from the
original one, that results in a divide error:

kworker         reboot -f       tx packet

igb_reset_task
                __igb_shutdown
                rtnl_lock()
                ...
                igb_clear_interrupt_scheme
                igb_free_q_vectors
                adapter->num_tx_queues = 0
                ...
                rtnl_unlock()
rtnl_lock()
igb_reinit_locked
igb_down
igb_up
netif_tx_start_all_queues
                                dev_hard_start_xmit
                                igb_xmit_frame
                                igb_tx_queue_mapping
                                Panics on
                                r_idx % adapter->num_tx_queues

Using in igb_reset_task a logic similar to the one in
ixgbe_reset_subtask (bailing if __IGB_DOWN or __IGB_RESETTING
is set) seems to avoid the panic.
That logic was first introduced in ixgbe as part of commit
2f90b8657ec ('ixgbe: this patch adds support for DCB to the
kernel and ixgbe driver').
Both fixes seem to be needed.
Kirsher, Jeffrey T July 2, 2020, 8:05 p.m. UTC | #5
> -----Original Message-----
> From: Francesco Ruggeri <fruggeri@arista.com>
> Sent: Thursday, July 2, 2020 12:35
> To: Kirsher, Jeffrey T <jeffrey.t.kirsher@intel.com>
> Cc: Jakub Kicinski <kuba@kernel.org>; David Miller <davem@davemloft.net>;
> open list <linux-kernel@vger.kernel.org>; netdev <netdev@vger.kernel.org>;
> intel-wired-lan@lists.osuosl.org
> Subject: Re: [PATCH] igb: reinit_locked() should be called with rtnl_lock
> 
> > Do not worry about the other Intel drivers, I have our developers looking at
> each of our drivers for the locking issue.
> >
> > @David Miller - I am picking up this patch
> 
> There seems to be a second race, independent from the original one, that
> results in a divide error:
> 
> kworker         reboot -f       tx packet
> 
> igb_reset_task
>                 __igb_shutdown
>                 rtnl_lock()
>                 ...
>                 igb_clear_interrupt_scheme
>                 igb_free_q_vectors
>                 adapter->num_tx_queues = 0
>                 ...
>                 rtnl_unlock()
> rtnl_lock()
> igb_reinit_locked
> igb_down
> igb_up
> netif_tx_start_all_queues
>                                 dev_hard_start_xmit
>                                 igb_xmit_frame
>                                 igb_tx_queue_mapping
>                                 Panics on
>                                 r_idx % adapter->num_tx_queues
> 
> Using in igb_reset_task a logic similar to the one in ixgbe_reset_subtask (bailing
> if __IGB_DOWN or __IGB_RESETTING is set) seems to avoid the panic.
> That logic was first introduced in ixgbe as part of commit 2f90b8657ec ('ixgbe:
> this patch adds support for DCB to the kernel and ixgbe driver').
> Both fixes seem to be needed.

So will you be sending a v2 of your patch to include the second fix?
Francesco Ruggeri July 2, 2020, 8:20 p.m. UTC | #6
>
> So will you be sending a v2 of your patch to include the second fix?

Yes, I am working on it. Just to confirm, v2 should include both fixes, right?

Thanks,
Francesco
Kirsher, Jeffrey T July 2, 2020, 9:26 p.m. UTC | #7
> -----Original Message-----
> From: Francesco Ruggeri <fruggeri@arista.com>
> Sent: Thursday, July 2, 2020 13:20
> To: Kirsher, Jeffrey T <jeffrey.t.kirsher@intel.com>
> Cc: Nguyen, Anthony L <anthony.l.nguyen@intel.com>; Jakub Kicinski
> <kuba@kernel.org>; David Miller <davem@davemloft.net>; open list <linux-
> kernel@vger.kernel.org>; netdev <netdev@vger.kernel.org>; intel-wired-
> lan@lists.osuosl.org
> Subject: Re: [PATCH] igb: reinit_locked() should be called with rtnl_lock
> 
> >
> > So will you be sending a v2 of your patch to include the second fix?
> 
> Yes, I am working on it. Just to confirm, v2 should include both fixes, right?

Correct.
diff mbox series

Patch

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 8bb3db2cbd41..b79a78e102f3 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -6224,9 +6224,11 @@  static void igb_reset_task(struct work_struct *work)
 	struct igb_adapter *adapter;
 	adapter = container_of(work, struct igb_adapter, reset_task);
 
+	rtnl_lock();
 	igb_dump(adapter);
 	netdev_err(adapter->netdev, "Reset adapter\n");
 	igb_reinit_locked(adapter);
+	rtnl_unlock();
 }
 
 /**