Message ID | 1394529263.16128.6.camel@linux-fkkt.site |
---|---|
State | RFC, archived |
Delegated to: | David Miller |
Headers | show |
I don't know enough about how runtime PM works in this driver to really review that patch, sorry. (Would this do a complete resume of the device if we call usbnet_stop() while it was suspended? Is that what we want?) I think you could have also preserved the original logic of using dev->wait as a flag if you had just replaced 'if (!dev->wait &&' with 'if (!waitqueue_active(&dev->wait) &&' to check whether the waitqueue is empty. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Oliver, Any update on the state of this patch? Did it get picked up for merge somewhere? Do you agree with my suggestion of sticking closer to the original logic instead of adding that autopm_get(), or do you maybe want to add some more reviewers to confirm your code? I don't really care that much which way we choose in the end, I just want to make sure this bug gets fixed and we don't forget about it. Thanks, Julius -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, 2014-03-17 at 14:53 -0700, Julius Werner wrote: > Hi Oliver, > > Any update on the state of this patch? Did it get picked up for merge > somewhere? Do you agree with my suggestion of sticking closer to the > original logic instead of adding that autopm_get(), or do you maybe > want to add some more reviewers to confirm your code? I don't really > care that much which way we choose in the end, I just want to make > sure this bug gets fixed and we don't forget about it. I am hping for the reporter of the original bug to test it. Regards Oliver -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Mar 17, 2014 at 2:55 PM, Oliver Neukum <oneukum@suse.de> wrote: > On Mon, 2014-03-17 at 14:53 -0700, Julius Werner wrote: >> Hi Oliver, >> >> Any update on the state of this patch? Did it get picked up for merge >> somewhere? Do you agree with my suggestion of sticking closer to the >> original logic instead of adding that autopm_get(), or do you maybe >> want to add some more reviewers to confirm your code? I don't really >> care that much which way we choose in the end, I just want to make >> sure this bug gets fixed and we don't forget about it. > > I am hping for the reporter of the original bug to test it. Ok. I didn't realize I was part of the dependency chain here. :) It seemed "obvious" to me once Julius described the race. I'll try it on the offending system but fear I will run into some other problem. cheers, grant > > Regards > Oliver > > -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Mar 17, 2014 at 2:55 PM, Oliver Neukum <oneukum@suse.de> wrote:
> I am hping for the reporter of the original bug to test it.
Oliver,
on a haswell system running ChromeOS-3.8 kernel, this patch as-is
resulted in a "Bad Spinlock Magic" error and subsequent pagefault.
I believe the sequence was:
usbnet_open -> tasklet_schedule(dev->bh) -> usbnet_bh -> wake_up
(&dev->wait) -> panic
I tried adding the following change on top of your patch but believe
the plumbing still isn't quite correct since the USB device (eth0) is
reporting a link but no TX or RX of traffic:
@@ -805,6 +807,9 @@ int usbnet_open (struct net_device *net)
goto done;
}
+ /* usbnet_bh() expects the spinlock to be initialized. */
+ init_waitqueue_head(&dev->wait);
+
/* hard_mtu or rx_urb_size may change in reset() */
usbnet_update_max_qlen(dev);
I suspect this hunk of your patch is now causing different problems at
init time:
@@ -1438,10 +1440,8 @@ static void usbnet_bh (unsigned long param)
clear_bit(EVENT_RX_KILL, &dev->flags);
// waiting for all pending urbs to complete?
- if (dev->wait) {
- if ((dev->txq.qlen + dev->rxq.qlen + dev->done.qlen) == 0) {
- wake_up (dev->wait);
- }
+ if ((dev->txq.qlen + dev->rxq.qlen + dev->done.qlen) == 0) {
+ wake_up (&dev->wait);
// or are we maybe short a few urbs?
} else if (netif_running (dev->net) &&
Please advise on what you'd like me to try next.
cheers,
grant
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Mar 18, 2014 at 1:51 PM, Grant Grundler <grundler@google.com> wrote: > On Mon, Mar 17, 2014 at 2:55 PM, Oliver Neukum <oneukum@suse.de> wrote: >> I am hping for the reporter of the original bug to test it. > > Oliver, > on a haswell system running ChromeOS-3.8 kernel, this patch as-is > resulted in a "Bad Spinlock Magic" error and subsequent pagefault. > I believe the sequence was: > usbnet_open -> tasklet_schedule(dev->bh) -> usbnet_bh -> wake_up > (&dev->wait) -> panic BTW, full console output is available on this public patch: https://code.google.com/p/chromium/issues/detail?id=353648 cheers, grant > > I tried adding the following change on top of your patch but believe > the plumbing still isn't quite correct since the USB device (eth0) is > reporting a link but no TX or RX of traffic: > @@ -805,6 +807,9 @@ int usbnet_open (struct net_device *net) > goto done; > } > > + /* usbnet_bh() expects the spinlock to be initialized. */ > + init_waitqueue_head(&dev->wait); > + > /* hard_mtu or rx_urb_size may change in reset() */ > usbnet_update_max_qlen(dev); > > I suspect this hunk of your patch is now causing different problems at > init time: > @@ -1438,10 +1440,8 @@ static void usbnet_bh (unsigned long param) > clear_bit(EVENT_RX_KILL, &dev->flags); > > // waiting for all pending urbs to complete? > - if (dev->wait) { > - if ((dev->txq.qlen + dev->rxq.qlen + dev->done.qlen) == 0) { > - wake_up (dev->wait); > - } > + if ((dev->txq.qlen + dev->rxq.qlen + dev->done.qlen) == 0) { > + wake_up (&dev->wait); > > // or are we maybe short a few urbs? > } else if (netif_running (dev->net) && > > Please advise on what you'd like me to try next. > > cheers, > grant -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> I tried adding the following change on top of your patch but believe > the plumbing still isn't quite correct since the USB device (eth0) is > reporting a link but no TX or RX of traffic: > @@ -805,6 +807,9 @@ int usbnet_open (struct net_device *net) > goto done; > } > > + /* usbnet_bh() expects the spinlock to be initialized. */ > + init_waitqueue_head(&dev->wait); > + > /* hard_mtu or rx_urb_size may change in reset() */ > usbnet_update_max_qlen(dev); I think a better place for this would be in usbnet_probe() (together with all the other dev->xxx initialization). I'm not quite sure how that could cause the transfer problems you are seeing, but at least you will no longer initialize the waitqueue multiple times on multiple usbnet_open (which is probably a bad idea). -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Mar 18, 2014 at 6:09 PM, Julius Werner <jwerner@chromium.org> wrote: > I think a better place for this would be in usbnet_probe() (together > with all the other dev->xxx initialization). Definitely better. @@ -1536,6 +1536,7 @@ usbnet_probe (struct usb_interface *udev, const struct usb dev->driver_name = name; dev->msg_enable = netif_msg_init (msg_level, NETIF_MSG_DRV | NETIF_MSG_PROBE | NETIF_MSG_LINK); + init_waitqueue_head(&dev->wait); skb_queue_head_init (&dev->rxq); skb_queue_head_init (&dev->txq); skb_queue_head_init (&dev->done); > I'm not quite sure how > that could cause the transfer problems you are seeing, but at least > you will no longer initialize the waitqueue multiple times on multiple > usbnet_open (which is probably a bad idea). Yeah, I agree. I don't expect usbnet_open would get called multiple times but I'll try the above anyway. thanks, grant -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Mar 18, 2014 at 6:40 PM, Grant Grundler <grundler@google.com> wrote: > On Tue, Mar 18, 2014 at 6:09 PM, Julius Werner <jwerner@chromium.org> wrote: >> I think a better place for this would be in usbnet_probe() (together >> with all the other dev->xxx initialization). > > Definitely better. > > @@ -1536,6 +1536,7 @@ usbnet_probe (struct usb_interface *udev, const struct usb > dev->driver_name = name; > dev->msg_enable = netif_msg_init (msg_level, NETIF_MSG_DRV > | NETIF_MSG_PROBE | NETIF_MSG_LINK); > + init_waitqueue_head(&dev->wait); > skb_queue_head_init (&dev->rxq); > skb_queue_head_init (&dev->txq); > skb_queue_head_init (&dev->done); Oliver, So even with this additional change to usbnet_probe, the device is reporting a link but can't transmit packets. I've tried with three different USB dongles (AX88178, AX88772B, SMSC75xx). In the last case, I ran "ifconfig eth0 192.168.1.100" and then tried to ping 192.168.1.1. "ifconfig eth0" then reports "RX packets 0 bytes 0" and "TX packets 0 bytes 10796". The TX packets != bytes seems odd and suggests (to me) that data getting discarded someplace in the stack. I confirmed by running tcpdump on the other end of the link (192.168.1.1 side) and got nothing from 192.168.1.100. I believe the "10796" TX bytes is due to the connection manager "trying really hard" to get an IP address (via DHCP). Your patch looks like it should work. But it's missing something and I don't have a clue what it is. Ideas? cheers, grant -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c index dd10d58..b5bcb48 100644 --- a/drivers/net/usb/usbnet.c +++ b/drivers/net/usb/usbnet.c @@ -752,14 +752,12 @@ EXPORT_SYMBOL_GPL(usbnet_unlink_rx_urbs); // precondition: never called in_interrupt static void usbnet_terminate_urbs(struct usbnet *dev) { - DECLARE_WAIT_QUEUE_HEAD_ONSTACK(unlink_wakeup); DECLARE_WAITQUEUE(wait, current); int temp; /* ensure there are no more active urbs */ - add_wait_queue(&unlink_wakeup, &wait); + add_wait_queue(&dev->wait, &wait); set_current_state(TASK_UNINTERRUPTIBLE); - dev->wait = &unlink_wakeup; temp = unlink_urbs(dev, &dev->txq) + unlink_urbs(dev, &dev->rxq); @@ -773,15 +771,14 @@ static void usbnet_terminate_urbs(struct usbnet *dev) "waited for %d urb completions\n", temp); } set_current_state(TASK_RUNNING); - dev->wait = NULL; - remove_wait_queue(&unlink_wakeup, &wait); + remove_wait_queue(&dev->wait, &wait); } int usbnet_stop (struct net_device *net) { struct usbnet *dev = netdev_priv(net); struct driver_info *info = dev->driver_info; - int retval; + int retval, pm; clear_bit(EVENT_DEV_OPEN, &dev->flags); netif_stop_queue (net); @@ -791,6 +788,8 @@ int usbnet_stop (struct net_device *net) net->stats.rx_packets, net->stats.tx_packets, net->stats.rx_errors, net->stats.tx_errors); + /* to not race resume */ + pm = usb_autopm_get_interface(dev->intf); /* allow minidriver to stop correctly (wireless devices to turn off * radio etc) */ if (info->stop) { @@ -817,6 +816,9 @@ int usbnet_stop (struct net_device *net) dev->flags = 0; del_timer_sync (&dev->delay); tasklet_kill (&dev->bh); + if (!pm) + usb_autopm_put_interface(dev->intf); + if (info->manage_power && !test_and_clear_bit(EVENT_NO_RUNTIME_PM, &dev->flags)) info->manage_power(dev, 0); @@ -1438,10 +1440,8 @@ static void usbnet_bh (unsigned long param) clear_bit(EVENT_RX_KILL, &dev->flags); // waiting for all pending urbs to complete? - if (dev->wait) { - if ((dev->txq.qlen + dev->rxq.qlen + dev->done.qlen) == 0) { - wake_up (dev->wait); - } + if ((dev->txq.qlen + dev->rxq.qlen + dev->done.qlen) == 0) { + wake_up (&dev->wait); // or are we maybe short a few urbs? } else if (netif_running (dev->net) && @@ -1791,9 +1791,10 @@ int usbnet_resume (struct usb_interface *intf) spin_unlock_irq(&dev->txq.lock); if (test_bit(EVENT_DEV_OPEN, &dev->flags)) { - /* handle remote wakeup ASAP */ - if (!dev->wait && - netif_device_present(dev->net) && + /* handle remote wakeup ASAP + * we cannot race against stop + */ + if (netif_device_present(dev->net) && !timer_pending(&dev->delay) && !test_bit(EVENT_RX_HALT, &dev->flags)) rx_alloc_submit(dev, GFP_NOIO); diff --git a/include/linux/usb/usbnet.h b/include/linux/usb/usbnet.h index e303eef..0662e98 100644 --- a/include/linux/usb/usbnet.h +++ b/include/linux/usb/usbnet.h @@ -30,7 +30,7 @@ struct usbnet { struct driver_info *driver_info; const char *driver_name; void *driver_priv; - wait_queue_head_t *wait; + wait_queue_head_t wait; struct mutex phy_mutex; unsigned char suspend_count; unsigned char pkt_cnt, pkt_err;