Message ID | CA+55aFwxbs_hLG58Q_xSK2vpufjmwMk-xkqxTNh_5h-A8y4vbg@mail.gmail.com |
---|---|
State | RFC, archived |
Delegated to: | David Miller |
Headers | show |
On Sat, Mar 5, 2016 at 1:43 AM, Linus Torvalds <torvalds@linux-foundation.org> wrote: > On Fri, Mar 4, 2016 at 2:26 PM, Andrey Konovalov <andreyknvl@gmail.com> wrote: >> >> and when I run the vm and connect the device I get: >> >> [ 23.672662] cdc_ncm 1-1:1.6: bind() failure >> [ 23.673447] usbnet_probe(): freeing netdev: ffff88006ab48000 >> [ 23.675822] usbnet_probe(): freeing netdev: ffff88006ab48000 >> >> So this seems to be a double-free (or at least a double free_netdev() >> call), but the object gets freed twice from usbnet_probe() and not >> from usbnet_disconnect(), so you're right that the latter doesn't get >> called. I'm not sure how usbnet_probe() ends up being called twice. > > I still don't think it's a double free. I think the probe thing is > called twice, and that's why to different allocations get free'd twice > (and the reason it's the same pointer is that the same allocation got > reused) > > But I don't know that driver, really. > >>> Which particular usbnet bind routine is this? There are multiple >>> sub-drivers for usbnet that all do different things. >> >> cdc_ncm_bind() > > Ok, so that matches my theory. > > Does this attached stupid patch make the warning go away? Because from > what I can tell, usbnet_link_change() really will start using that > "dev" that simply isn't valid - and will be free'd - if the bind > fails. Yes, KASAN doesn't report anything with your patch. > > So you have usbnet_defer_kevent() getting triggered, which in turn > ends up using "usbnet->kevent" > > But somebody like Oliver is really the right person to check this. For > example, it's entirely possible that we should just instead do > > cancel_work_sync(&dev->kevent); > > before the "free_netdev(net)" in the "out1" label. > > And there might be other things that bind() can have touched than the > kevent workqueue. > > Linus
On Fri, 2016-03-04 at 14:43 -0800, Linus Torvalds wrote: > So you have usbnet_defer_kevent() getting triggered, which in turn > ends up using "usbnet->kevent" > > But somebody like Oliver is really the right person to check this. For > example, it's entirely possible that we should just instead do > > cancel_work_sync(&dev->kevent); > > before the "free_netdev(net)" in the "out1" label. Hi Bjorn, I thinbk Linus has analyzed this correctly, but the fix really needs to cancel the work, because we can also fail later after bind() has already run. However, still cdc-ncm and the other drivers should clean up after themselves if bind() fails, as usbnet really cannot known what the subdrivers have done. So in conclusion, I think Linus' fix should also go into cdc-ncm. Regards Oliver
On March 5, 2016 4:51:30 PM CET, Oliver Neukum <oneukum@suse.com> wrote: >On Fri, 2016-03-04 at 14:43 -0800, Linus Torvalds wrote: > >> So you have usbnet_defer_kevent() getting triggered, which in turn >> ends up using "usbnet->kevent" >> >> But somebody like Oliver is really the right person to check this. >For >> example, it's entirely possible that we should just instead do >> >> cancel_work_sync(&dev->kevent); >> >> before the "free_netdev(net)" in the "out1" label. > >Hi Bjorn, > >I thinbk Linus has analyzed this correctly, but the fix really needs >to cancel the work, because we can also fail later after bind() has >already run. However, still cdc-ncm and the other drivers should clean >up after themselves if bind() fails, as usbnet really cannot known what >the subdrivers have done. > >So in conclusion, I think Linus' fix should also go into cdc-ncm. Definitely. The patch is so obviously correct that we can only wonder how it was possible to miss it it the first place :) Will take a look to see if we could do a better job cleaning up in other places. (I do also wonder a bit about the failure to bind - is that expected or is there some bug in the cdc_ncm descriptor parsing?) Bjørn
On Fri, Mar 4, 2016 at 11:43 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote: > On Fri, Mar 4, 2016 at 2:26 PM, Andrey Konovalov <andreyknvl@gmail.com> wrote: >> >> and when I run the vm and connect the device I get: >> >> [ 23.672662] cdc_ncm 1-1:1.6: bind() failure >> [ 23.673447] usbnet_probe(): freeing netdev: ffff88006ab48000 >> [ 23.675822] usbnet_probe(): freeing netdev: ffff88006ab48000 >> >> So this seems to be a double-free (or at least a double free_netdev() >> call), but the object gets freed twice from usbnet_probe() and not >> from usbnet_disconnect(), so you're right that the latter doesn't get >> called. I'm not sure how usbnet_probe() ends up being called twice. > > I still don't think it's a double free. I think the probe thing is > called twice, and that's why to different allocations get free'd twice > (and the reason it's the same pointer is that the same allocation got > reused) FYI, we have a KASAN patch in flight that adds "quarantine" for freed memory (memory is reused only after a significant delay). It should help to easily differentiate between use-after-free, double-free and heap-out-of-bound. Yes, now it is confusing. > But I don't know that driver, really. > >>> Which particular usbnet bind routine is this? There are multiple >>> sub-drivers for usbnet that all do different things. >> >> cdc_ncm_bind() > > Ok, so that matches my theory. > > Does this attached stupid patch make the warning go away? Because from > what I can tell, usbnet_link_change() really will start using that > "dev" that simply isn't valid - and will be free'd - if the bind > fails. > > So you have usbnet_defer_kevent() getting triggered, which in turn > ends up using "usbnet->kevent" > > But somebody like Oliver is really the right person to check this. For > example, it's entirely possible that we should just instead do > > cancel_work_sync(&dev->kevent); > > before the "free_netdev(net)" in the "out1" label. > > And there might be other things that bind() can have touched than the > kevent workqueue. > > Linus
On Sat, Mar 5, 2016 at 11:53 AM, Bjørn Mork <bjorn@mork.no> wrote: > > > Definitely. The patch is so obviously correct that we can only wonder how it was possible to miss it it the first place :) > > Will take a look to see if we could do a better job cleaning up in other places. What should I do for 4.5? Will there be a pull request for this, or should I just commit my cdc_ncm_bind() patch directly? Linus
diff --git a/drivers/net/usb/cdc_ncm.c b/drivers/net/usb/cdc_ncm.c index dc0212c3cc28..5878b54cc563 100644 --- a/drivers/net/usb/cdc_ncm.c +++ b/drivers/net/usb/cdc_ncm.c @@ -995,6 +995,8 @@ static int cdc_ncm_bind(struct usbnet *dev, struct usb_interface *intf) * placed NDP. */ ret = cdc_ncm_bind_common(dev, intf, CDC_NCM_DATA_ALTSETTING_NCM, 0); + if (ret < 0) + return ret; /* * We should get an event when network connection is "connected" or @@ -1003,7 +1005,7 @@ static int cdc_ncm_bind(struct usbnet *dev, struct usb_interface *intf) * start IPv6 negotiation and more. */ usbnet_link_change(dev, 0, 0); - return ret; + return 0; } static void cdc_ncm_align_tail(struct sk_buff *skb, size_t modulus, size_t remainder, size_t max)