diff mbox

[2.6.35-rc1] net-next: vmxnet3 fixes [4/5] Do not reset when the device is not opened

Message ID alpine.LRH.2.00.1007151448260.25472@localhost.localdomain
State RFC, archived
Delegated to: David Miller
Headers show

Commit Message

Shreyas Bhatewara July 16, 2010, 1:20 a.m. UTC
On Wed, 14 Jul 2010, David Miller wrote:

> From: Shreyas Bhatewara <sbhatewara@vmware.com>
> Date: Tue, 13 Jul 2010 17:49:52 -0700 (PDT)
> 
> > 
> > Do not reset when the device is not opened
> > 
> > If a reset is scheduled, and the device goes thru close and open, it
> > may happen that reset and open may run in parallel.
> > The reset code now bails out if the device is not opened.
> >  
> > Signed-off-by: Ronghua Zang <ronghua@vmware.com>
> > Signed-off-by: Matthieu Bucchianeri <matthieu@vmware.com>
> > Signed-off-by: Shreyas Bhatewara <sbhatewara@vmware.com>
> 
> Testing IFF_UP is just making your race hole a little smaller but
> it isn't fixing the problem.
> 
> In fact, there really isn't any legitimate reason for a driver
> to test the IFF_UP state bit.  netif_running() is the correct
> test, and asynchronous work queue things like resets should
> synchronize with open/close using the RTNL lock so that your
> test of netif_running() is a stable one.
> 

Is this what you suggest :

---

Hold rtnl_lock to get the right link state.

While asynchronously resetting the device, hold rtnl_lock to get the 
right value from netif_running. If a reset is scheduled, and the device
goes thru close and open, it may happen that reset and open may run in 
parallel. Holding rtnl_lock will avoid this.

---

 drivers/net/vmxnet3/vmxnet3_drv.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

David Miller July 16, 2010, 1:32 a.m. UTC | #1
From: Shreyas Bhatewara <sbhatewara@vmware.com>
Date: Thu, 15 Jul 2010 18:20:52 -0700 (PDT)

> Is this what you suggest :
> 
> ---
> 
> Hold rtnl_lock to get the right link state.

It ought to work, but make sure that it is legal to take the
RTNL semaphore in all contexts in which this code block
might be called.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Shreyas Bhatewara July 16, 2010, 8:17 a.m. UTC | #2
On Thu, 15 Jul 2010, David Miller wrote:

> From: Shreyas Bhatewara <sbhatewara@vmware.com>
> Date: Thu, 15 Jul 2010 18:20:52 -0700 (PDT)
> 
> > Is this what you suggest :
> > 
> > ---
> > 
> > Hold rtnl_lock to get the right link state.
> 
> It ought to work, but make sure that it is legal to take the
> RTNL semaphore in all contexts in which this code block
> might be called.
> 

This code block is called only from the workqueue handler, which runs in
process context, so it is legal to take rtnl semaphore.
Tested this code by simulating event interrupts (which schedule this 
code) at considerable frequency while the interface was brought up and
down in a loop. Similar stress testing had revealed the bug originally. 
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller July 17, 2010, 11:35 p.m. UTC | #3
From: Shreyas Bhatewara <sbhatewara@vmware.com>
Date: Fri, 16 Jul 2010 01:17:29 -0700 (PDT)

> 
> 
> On Thu, 15 Jul 2010, David Miller wrote:
> 
>> From: Shreyas Bhatewara <sbhatewara@vmware.com>
>> Date: Thu, 15 Jul 2010 18:20:52 -0700 (PDT)
>> 
>> > Is this what you suggest :
>> > 
>> > ---
>> > 
>> > Hold rtnl_lock to get the right link state.
>> 
>> It ought to work, but make sure that it is legal to take the
>> RTNL semaphore in all contexts in which this code block
>> might be called.
>> 
> 
> This code block is called only from the workqueue handler, which runs in
> process context, so it is legal to take rtnl semaphore.
> Tested this code by simulating event interrupts (which schedule this 
> code) at considerable frequency while the interface was brought up and
> down in a loop. Similar stress testing had revealed the bug originally. 

Awesome, please submit this formally.  The copy you sent lacked a commit
message and signoff.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/net/vmxnet3/vmxnet3_drv.c b/drivers/net/vmxnet3/vmxnet3_drv.c
index 1b0ce8c..c4d7e42 100644
--- a/drivers/net/vmxnet3/vmxnet3_drv.c
+++ b/drivers/net/vmxnet3/vmxnet3_drv.c
@@ -2420,6 +2420,7 @@  vmxnet3_reset_work(struct work_struct *data)
 		return;
 
 	/* if the device is closed, we must leave it alone */
+	rtnl_lock();
 	if (netif_running(adapter->netdev)) {
 		printk(KERN_INFO "%s: resetting\n", adapter->netdev->name);
 		vmxnet3_quiesce_dev(adapter);
@@ -2428,6 +2429,7 @@  vmxnet3_reset_work(struct work_struct *data)
 	} else {
 		printk(KERN_INFO "%s: already closed\n", adapter->netdev->name);
 	}
+	rtnl_unlock();
 
 	clear_bit(VMXNET3_STATE_BIT_RESETTING, &adapter->state);
 }