diff mbox

usbnet: driver_info->stop required to stop USB interrupts?

Message ID 1394529263.16128.6.camel@linux-fkkt.site
State RFC, archived
Delegated to: David Miller
Headers show

Commit Message

Oliver Neukum March 11, 2014, 9:14 a.m. UTC
On Mon, 2014-03-10 at 19:53 -0700, Julius Werner wrote:

> I think the best solution would be to just make dev->wait a directly
> embedded structure inside struct usbnet instead of a pointer to
> something stack-allocated. usbnet_bh() could just call wake_up()
> unconditionally (if empty it will be a noop), and then one other check
> for !dev->wait could be replaced with a call to waitqueue_active().
> Then the waitqueue-internal locks should be enough to protect all
> accesses.

What do you think?

	Regards
		Oliver

From 5ea2cb8a2893953fc7067aeae093ab20239d5108 Mon Sep 17 00:00:00 2001
From: Oliver Neukum <oneukum@suse.de>
Date: Tue, 11 Mar 2014 09:59:48 +0100
Subject: [PATCH] usbnet: include wait queue head in device structure

This fixes a race which happens by freeing an object on the stack.
Quoting Julius:
> The issue is
> that it calls usbnet_terminate_urbs() before that, which temporarily
> installs a waitqueue in dev->wait in order to be able to wait on the
> tasklet to run and finish up some queues. The waiting itself looks
> okay, but the access to 'dev->wait' is totally unprotected and can
> race arbitrarily. I think in this case usbnet_bh() managed to succeed
> it's dev->wait check just before usbnet_terminate_urbs() sets it back
> to NULL. The latter then finishes and the waitqueue_t structure on its
> stack gets overwritten by other functions halfway through the
> wake_up() call in usbnet_bh().

The fix is to just not allocate the data structure on the stack.
As dev->wait is abused as a flag it also takes a runtime PM change
to fix this bug.

Signed-off-by: Oliver Neukum <oneukum@suse.de>
---
 drivers/net/usb/usbnet.c   | 27 ++++++++++++++-------------
 include/linux/usb/usbnet.h |  2 +-
 2 files changed, 15 insertions(+), 14 deletions(-)

Comments

Julius Werner March 11, 2014, 5:27 p.m. UTC | #1
I don't know enough about how runtime PM works in this driver to
really review that patch, sorry. (Would this do a complete resume of
the device if we call usbnet_stop() while it was suspended? Is that
what we want?)

I think you could have also preserved the original logic of using
dev->wait as a flag if you had just replaced 'if (!dev->wait &&' with
'if (!waitqueue_active(&dev->wait) &&' to check whether the waitqueue
is empty.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Julius Werner March 17, 2014, 9:53 p.m. UTC | #2
Hi Oliver,

Any update on the state of this patch? Did it get picked up for merge
somewhere? Do you agree with my suggestion of sticking closer to the
original logic instead of adding that autopm_get(), or do you maybe
want to add some more reviewers to confirm your code? I don't really
care that much which way we choose in the end, I just want to make
sure this bug gets fixed and we don't forget about it.

Thanks,
Julius
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Oliver Neukum March 17, 2014, 9:55 p.m. UTC | #3
On Mon, 2014-03-17 at 14:53 -0700, Julius Werner wrote:
> Hi Oliver,
> 
> Any update on the state of this patch? Did it get picked up for merge
> somewhere? Do you agree with my suggestion of sticking closer to the
> original logic instead of adding that autopm_get(), or do you maybe
> want to add some more reviewers to confirm your code? I don't really
> care that much which way we choose in the end, I just want to make
> sure this bug gets fixed and we don't forget about it.

I am hping for the reporter of the original bug to test it.

	Regards
		Oliver


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Grant Grundler March 17, 2014, 10:02 p.m. UTC | #4
On Mon, Mar 17, 2014 at 2:55 PM, Oliver Neukum <oneukum@suse.de> wrote:
> On Mon, 2014-03-17 at 14:53 -0700, Julius Werner wrote:
>> Hi Oliver,
>>
>> Any update on the state of this patch? Did it get picked up for merge
>> somewhere? Do you agree with my suggestion of sticking closer to the
>> original logic instead of adding that autopm_get(), or do you maybe
>> want to add some more reviewers to confirm your code? I don't really
>> care that much which way we choose in the end, I just want to make
>> sure this bug gets fixed and we don't forget about it.
>
> I am hping for the reporter of the original bug to test it.

Ok. I didn't realize I was part of the dependency chain here. :)
It seemed "obvious" to me once Julius described the race.

I'll try it on the offending system but fear I will run into some other problem.

cheers,
grant

>
>         Regards
>                 Oliver
>
>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Grant Grundler March 18, 2014, 8:51 p.m. UTC | #5
On Mon, Mar 17, 2014 at 2:55 PM, Oliver Neukum <oneukum@suse.de> wrote:
> I am hping for the reporter of the original bug to test it.

Oliver,
on a haswell system running ChromeOS-3.8 kernel, this patch as-is
resulted in a "Bad Spinlock Magic" error and subsequent pagefault.
I believe the sequence was:
   usbnet_open -> tasklet_schedule(dev->bh) -> usbnet_bh -> wake_up
(&dev->wait) -> panic

I tried adding the following change on top of your patch but believe
the plumbing still isn't quite correct since the USB device (eth0) is
reporting a link but no TX or RX of traffic:
 @@ -805,6 +807,9 @@ int usbnet_open (struct net_device *net)
                goto done;
        }

+       /* usbnet_bh() expects the spinlock to be initialized. */
+       init_waitqueue_head(&dev->wait);
+
        /* hard_mtu or rx_urb_size may change in reset() */
        usbnet_update_max_qlen(dev);

I suspect this hunk of your patch is now causing different problems at
init time:
@@ -1438,10 +1440,8 @@ static void usbnet_bh (unsigned long param)
        clear_bit(EVENT_RX_KILL, &dev->flags);

        // waiting for all pending urbs to complete?
-       if (dev->wait) {
-               if ((dev->txq.qlen + dev->rxq.qlen + dev->done.qlen) == 0) {
-                       wake_up (dev->wait);
-               }
+       if ((dev->txq.qlen + dev->rxq.qlen + dev->done.qlen) == 0) {
+               wake_up (&dev->wait);

        // or are we maybe short a few urbs?
        } else if (netif_running (dev->net) &&

Please advise on what you'd like me to try next.

cheers,
grant
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Grant Grundler March 18, 2014, 8:52 p.m. UTC | #6
On Tue, Mar 18, 2014 at 1:51 PM, Grant Grundler <grundler@google.com> wrote:
> On Mon, Mar 17, 2014 at 2:55 PM, Oliver Neukum <oneukum@suse.de> wrote:
>> I am hping for the reporter of the original bug to test it.
>
> Oliver,
> on a haswell system running ChromeOS-3.8 kernel, this patch as-is
> resulted in a "Bad Spinlock Magic" error and subsequent pagefault.
> I believe the sequence was:
>    usbnet_open -> tasklet_schedule(dev->bh) -> usbnet_bh -> wake_up
> (&dev->wait) -> panic

BTW, full console output is available on this public patch:
    https://code.google.com/p/chromium/issues/detail?id=353648

cheers,
grant

>
> I tried adding the following change on top of your patch but believe
> the plumbing still isn't quite correct since the USB device (eth0) is
> reporting a link but no TX or RX of traffic:
>  @@ -805,6 +807,9 @@ int usbnet_open (struct net_device *net)
>                 goto done;
>         }
>
> +       /* usbnet_bh() expects the spinlock to be initialized. */
> +       init_waitqueue_head(&dev->wait);
> +
>         /* hard_mtu or rx_urb_size may change in reset() */
>         usbnet_update_max_qlen(dev);
>
> I suspect this hunk of your patch is now causing different problems at
> init time:
> @@ -1438,10 +1440,8 @@ static void usbnet_bh (unsigned long param)
>         clear_bit(EVENT_RX_KILL, &dev->flags);
>
>         // waiting for all pending urbs to complete?
> -       if (dev->wait) {
> -               if ((dev->txq.qlen + dev->rxq.qlen + dev->done.qlen) == 0) {
> -                       wake_up (dev->wait);
> -               }
> +       if ((dev->txq.qlen + dev->rxq.qlen + dev->done.qlen) == 0) {
> +               wake_up (&dev->wait);
>
>         // or are we maybe short a few urbs?
>         } else if (netif_running (dev->net) &&
>
> Please advise on what you'd like me to try next.
>
> cheers,
> grant
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Julius Werner March 19, 2014, 1:09 a.m. UTC | #7
> I tried adding the following change on top of your patch but believe
> the plumbing still isn't quite correct since the USB device (eth0) is
> reporting a link but no TX or RX of traffic:
>  @@ -805,6 +807,9 @@ int usbnet_open (struct net_device *net)
>                 goto done;
>         }
>
> +       /* usbnet_bh() expects the spinlock to be initialized. */
> +       init_waitqueue_head(&dev->wait);
> +
>         /* hard_mtu or rx_urb_size may change in reset() */
>         usbnet_update_max_qlen(dev);


I think a better place for this would be in usbnet_probe() (together
with all the other dev->xxx initialization). I'm not quite sure how
that could cause the transfer problems you are seeing, but at least
you will no longer initialize the waitqueue multiple times on multiple
usbnet_open (which is probably a bad idea).
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Grant Grundler March 19, 2014, 1:40 a.m. UTC | #8
On Tue, Mar 18, 2014 at 6:09 PM, Julius Werner <jwerner@chromium.org> wrote:
> I think a better place for this would be in usbnet_probe() (together
> with all the other dev->xxx initialization).

Definitely better.

@@ -1536,6 +1536,7 @@ usbnet_probe (struct usb_interface *udev, const struct usb
        dev->driver_name = name;
        dev->msg_enable = netif_msg_init (msg_level, NETIF_MSG_DRV
                                | NETIF_MSG_PROBE | NETIF_MSG_LINK);
+       init_waitqueue_head(&dev->wait);
        skb_queue_head_init (&dev->rxq);
        skb_queue_head_init (&dev->txq);
        skb_queue_head_init (&dev->done);


> I'm not quite sure how
> that could cause the transfer problems you are seeing, but at least
> you will no longer initialize the waitqueue multiple times on multiple
> usbnet_open (which is probably a bad idea).

Yeah, I agree. I don't expect usbnet_open would get called multiple
times but I'll try the above anyway.

thanks,
grant
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Grant Grundler March 20, 2014, 12:58 a.m. UTC | #9
On Tue, Mar 18, 2014 at 6:40 PM, Grant Grundler <grundler@google.com> wrote:
> On Tue, Mar 18, 2014 at 6:09 PM, Julius Werner <jwerner@chromium.org> wrote:
>> I think a better place for this would be in usbnet_probe() (together
>> with all the other dev->xxx initialization).
>
> Definitely better.
>
> @@ -1536,6 +1536,7 @@ usbnet_probe (struct usb_interface *udev, const struct usb
>         dev->driver_name = name;
>         dev->msg_enable = netif_msg_init (msg_level, NETIF_MSG_DRV
>                                 | NETIF_MSG_PROBE | NETIF_MSG_LINK);
> +       init_waitqueue_head(&dev->wait);
>         skb_queue_head_init (&dev->rxq);
>         skb_queue_head_init (&dev->txq);
>         skb_queue_head_init (&dev->done);

Oliver,
So even with this additional change to usbnet_probe, the device is
reporting a link but can't transmit packets.
I've tried with three different USB dongles (AX88178, AX88772B, SMSC75xx).

In the last case, I ran "ifconfig eth0 192.168.1.100" and then tried
to ping 192.168.1.1. "ifconfig eth0" then reports "RX packets 0 bytes
0" and "TX packets 0 bytes 10796". The TX packets != bytes seems odd
and suggests (to me) that data getting discarded someplace in the
stack. I confirmed by running tcpdump on the other end of the link
(192.168.1.1 side) and got nothing from 192.168.1.100.

I believe the "10796" TX bytes is due to the connection manager
"trying really hard" to get an IP address (via DHCP).

Your patch looks like it should work. But it's missing something and I
don't have a clue what it is. Ideas?

cheers,
grant
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
index dd10d58..b5bcb48 100644
--- a/drivers/net/usb/usbnet.c
+++ b/drivers/net/usb/usbnet.c
@@ -752,14 +752,12 @@  EXPORT_SYMBOL_GPL(usbnet_unlink_rx_urbs);
 // precondition: never called in_interrupt
 static void usbnet_terminate_urbs(struct usbnet *dev)
 {
-	DECLARE_WAIT_QUEUE_HEAD_ONSTACK(unlink_wakeup);
 	DECLARE_WAITQUEUE(wait, current);
 	int temp;
 
 	/* ensure there are no more active urbs */
-	add_wait_queue(&unlink_wakeup, &wait);
+	add_wait_queue(&dev->wait, &wait);
 	set_current_state(TASK_UNINTERRUPTIBLE);
-	dev->wait = &unlink_wakeup;
 	temp = unlink_urbs(dev, &dev->txq) +
 		unlink_urbs(dev, &dev->rxq);
 
@@ -773,15 +771,14 @@  static void usbnet_terminate_urbs(struct usbnet *dev)
 				  "waited for %d urb completions\n", temp);
 	}
 	set_current_state(TASK_RUNNING);
-	dev->wait = NULL;
-	remove_wait_queue(&unlink_wakeup, &wait);
+	remove_wait_queue(&dev->wait, &wait);
 }
 
 int usbnet_stop (struct net_device *net)
 {
 	struct usbnet		*dev = netdev_priv(net);
 	struct driver_info	*info = dev->driver_info;
-	int			retval;
+	int			retval, pm;
 
 	clear_bit(EVENT_DEV_OPEN, &dev->flags);
 	netif_stop_queue (net);
@@ -791,6 +788,8 @@  int usbnet_stop (struct net_device *net)
 		   net->stats.rx_packets, net->stats.tx_packets,
 		   net->stats.rx_errors, net->stats.tx_errors);
 
+	/* to not race resume */
+	pm = usb_autopm_get_interface(dev->intf);
 	/* allow minidriver to stop correctly (wireless devices to turn off
 	 * radio etc) */
 	if (info->stop) {
@@ -817,6 +816,9 @@  int usbnet_stop (struct net_device *net)
 	dev->flags = 0;
 	del_timer_sync (&dev->delay);
 	tasklet_kill (&dev->bh);
+	if (!pm)
+		usb_autopm_put_interface(dev->intf);
+
 	if (info->manage_power &&
 	    !test_and_clear_bit(EVENT_NO_RUNTIME_PM, &dev->flags))
 		info->manage_power(dev, 0);
@@ -1438,10 +1440,8 @@  static void usbnet_bh (unsigned long param)
 	clear_bit(EVENT_RX_KILL, &dev->flags);
 
 	// waiting for all pending urbs to complete?
-	if (dev->wait) {
-		if ((dev->txq.qlen + dev->rxq.qlen + dev->done.qlen) == 0) {
-			wake_up (dev->wait);
-		}
+	if ((dev->txq.qlen + dev->rxq.qlen + dev->done.qlen) == 0) {
+		wake_up (&dev->wait);
 
 	// or are we maybe short a few urbs?
 	} else if (netif_running (dev->net) &&
@@ -1791,9 +1791,10 @@  int usbnet_resume (struct usb_interface *intf)
 		spin_unlock_irq(&dev->txq.lock);
 
 		if (test_bit(EVENT_DEV_OPEN, &dev->flags)) {
-			/* handle remote wakeup ASAP */
-			if (!dev->wait &&
-				netif_device_present(dev->net) &&
+			/* handle remote wakeup ASAP
+			 * we cannot race against stop
+			 */
+			if (netif_device_present(dev->net) &&
 				!timer_pending(&dev->delay) &&
 				!test_bit(EVENT_RX_HALT, &dev->flags))
 					rx_alloc_submit(dev, GFP_NOIO);
diff --git a/include/linux/usb/usbnet.h b/include/linux/usb/usbnet.h
index e303eef..0662e98 100644
--- a/include/linux/usb/usbnet.h
+++ b/include/linux/usb/usbnet.h
@@ -30,7 +30,7 @@  struct usbnet {
 	struct driver_info	*driver_info;
 	const char		*driver_name;
 	void			*driver_priv;
-	wait_queue_head_t	*wait;
+	wait_queue_head_t	wait;
 	struct mutex		phy_mutex;
 	unsigned char		suspend_count;
 	unsigned char		pkt_cnt, pkt_err;