diff mbox

Possible double-free in the usbnet driver

Message ID CA+55aFwxbs_hLG58Q_xSK2vpufjmwMk-xkqxTNh_5h-A8y4vbg@mail.gmail.com
State RFC, archived
Delegated to: David Miller
Headers show

Commit Message

Linus Torvalds March 4, 2016, 10:43 p.m. UTC
On Fri, Mar 4, 2016 at 2:26 PM, Andrey Konovalov <andreyknvl@gmail.com> wrote:
>
> and when I run the vm and connect the device I get:
>
> [   23.672662] cdc_ncm 1-1:1.6: bind() failure
> [   23.673447] usbnet_probe(): freeing netdev: ffff88006ab48000
> [   23.675822] usbnet_probe(): freeing netdev: ffff88006ab48000
>
> So this seems to be a double-free (or at least a double free_netdev()
> call), but the object gets freed twice from usbnet_probe() and not
> from usbnet_disconnect(), so you're right that the latter doesn't get
> called. I'm not sure how usbnet_probe() ends up being called twice.

I still don't think it's a double free. I think the probe thing is
called twice, and that's why to different allocations get free'd twice
(and the reason it's the same pointer is that the same allocation got
reused)

But I don't know that driver, really.

>> Which particular usbnet bind routine is this? There are multiple
>> sub-drivers for usbnet that all do different things.
>
> cdc_ncm_bind()

Ok, so that matches my theory.

Does this attached stupid patch make the warning go away? Because from
what I can tell, usbnet_link_change() really will start using that
"dev" that simply isn't valid - and will be free'd - if the bind
fails.

So you have usbnet_defer_kevent() getting triggered, which in turn
ends up using "usbnet->kevent"

But somebody like Oliver is really the right person to check this. For
example, it's entirely possible that we should just instead do

        cancel_work_sync(&dev->kevent);

before the "free_netdev(net)" in the "out1" label.

And there might be other things that bind() can have touched than the
kevent workqueue.

               Linus
drivers/net/usb/cdc_ncm.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Comments

Andrey Konovalov March 4, 2016, 11 p.m. UTC | #1
On Sat, Mar 5, 2016 at 1:43 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Fri, Mar 4, 2016 at 2:26 PM, Andrey Konovalov <andreyknvl@gmail.com> wrote:
>>
>> and when I run the vm and connect the device I get:
>>
>> [   23.672662] cdc_ncm 1-1:1.6: bind() failure
>> [   23.673447] usbnet_probe(): freeing netdev: ffff88006ab48000
>> [   23.675822] usbnet_probe(): freeing netdev: ffff88006ab48000
>>
>> So this seems to be a double-free (or at least a double free_netdev()
>> call), but the object gets freed twice from usbnet_probe() and not
>> from usbnet_disconnect(), so you're right that the latter doesn't get
>> called. I'm not sure how usbnet_probe() ends up being called twice.
>
> I still don't think it's a double free. I think the probe thing is
> called twice, and that's why to different allocations get free'd twice
> (and the reason it's the same pointer is that the same allocation got
> reused)
>
> But I don't know that driver, really.
>
>>> Which particular usbnet bind routine is this? There are multiple
>>> sub-drivers for usbnet that all do different things.
>>
>> cdc_ncm_bind()
>
> Ok, so that matches my theory.
>
> Does this attached stupid patch make the warning go away? Because from
> what I can tell, usbnet_link_change() really will start using that
> "dev" that simply isn't valid - and will be free'd - if the bind
> fails.

Yes, KASAN doesn't report anything with your patch.

>
> So you have usbnet_defer_kevent() getting triggered, which in turn
> ends up using "usbnet->kevent"
>
> But somebody like Oliver is really the right person to check this. For
> example, it's entirely possible that we should just instead do
>
>         cancel_work_sync(&dev->kevent);
>
> before the "free_netdev(net)" in the "out1" label.
>
> And there might be other things that bind() can have touched than the
> kevent workqueue.
>
>                Linus
Oliver Neukum March 5, 2016, 3:51 p.m. UTC | #2
On Fri, 2016-03-04 at 14:43 -0800, Linus Torvalds wrote:

> So you have usbnet_defer_kevent() getting triggered, which in turn
> ends up using "usbnet->kevent"
> 
> But somebody like Oliver is really the right person to check this. For
> example, it's entirely possible that we should just instead do
> 
>         cancel_work_sync(&dev->kevent);
> 
> before the "free_netdev(net)" in the "out1" label.

Hi Bjorn,

I thinbk Linus has analyzed this correctly, but the fix really needs
to cancel the work, because we can also fail later after bind() has
already run. However, still cdc-ncm and the other drivers should clean
up after themselves if bind() fails, as usbnet really cannot known what
the subdrivers have done.

So in conclusion, I think Linus' fix should also go into cdc-ncm.

	Regards
		Oliver
Bjørn Mork March 5, 2016, 7:53 p.m. UTC | #3
On March 5, 2016 4:51:30 PM CET, Oliver Neukum <oneukum@suse.com> wrote:
>On Fri, 2016-03-04 at 14:43 -0800, Linus Torvalds wrote:
>
>> So you have usbnet_defer_kevent() getting triggered, which in turn
>> ends up using "usbnet->kevent"
>> 
>> But somebody like Oliver is really the right person to check this.
>For
>> example, it's entirely possible that we should just instead do
>> 
>>         cancel_work_sync(&dev->kevent);
>> 
>> before the "free_netdev(net)" in the "out1" label.
>
>Hi Bjorn,
>
>I thinbk Linus has analyzed this correctly, but the fix really needs
>to cancel the work, because we can also fail later after bind() has
>already run. However, still cdc-ncm and the other drivers should clean
>up after themselves if bind() fails, as usbnet really cannot known what
>the subdrivers have done.
>
>So in conclusion, I think Linus' fix should also go into cdc-ncm.

Definitely.  The patch is so obviously correct that we can only wonder how it was possible to miss it it the first place :)

Will take a look to see if we could do a better job cleaning up in other places.

(I do also wonder a bit about the failure to bind - is that expected or is there some bug in the cdc_ncm descriptor parsing?)


Bjørn
Dmitry Vyukov March 7, 2016, 9:08 a.m. UTC | #4
On Fri, Mar 4, 2016 at 11:43 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Fri, Mar 4, 2016 at 2:26 PM, Andrey Konovalov <andreyknvl@gmail.com> wrote:
>>
>> and when I run the vm and connect the device I get:
>>
>> [   23.672662] cdc_ncm 1-1:1.6: bind() failure
>> [   23.673447] usbnet_probe(): freeing netdev: ffff88006ab48000
>> [   23.675822] usbnet_probe(): freeing netdev: ffff88006ab48000
>>
>> So this seems to be a double-free (or at least a double free_netdev()
>> call), but the object gets freed twice from usbnet_probe() and not
>> from usbnet_disconnect(), so you're right that the latter doesn't get
>> called. I'm not sure how usbnet_probe() ends up being called twice.
>
> I still don't think it's a double free. I think the probe thing is
> called twice, and that's why to different allocations get free'd twice
> (and the reason it's the same pointer is that the same allocation got
> reused)


FYI, we have a KASAN patch in flight that adds "quarantine" for freed
memory (memory is reused only after a significant delay). It should
help to easily differentiate between use-after-free, double-free and
heap-out-of-bound. Yes, now it is confusing.


> But I don't know that driver, really.
>
>>> Which particular usbnet bind routine is this? There are multiple
>>> sub-drivers for usbnet that all do different things.
>>
>> cdc_ncm_bind()
>
> Ok, so that matches my theory.
>
> Does this attached stupid patch make the warning go away? Because from
> what I can tell, usbnet_link_change() really will start using that
> "dev" that simply isn't valid - and will be free'd - if the bind
> fails.
>
> So you have usbnet_defer_kevent() getting triggered, which in turn
> ends up using "usbnet->kevent"
>
> But somebody like Oliver is really the right person to check this. For
> example, it's entirely possible that we should just instead do
>
>         cancel_work_sync(&dev->kevent);
>
> before the "free_netdev(net)" in the "out1" label.
>
> And there might be other things that bind() can have touched than the
> kevent workqueue.
>
>                Linus
Linus Torvalds March 7, 2016, 6:13 p.m. UTC | #5
On Sat, Mar 5, 2016 at 11:53 AM, Bjørn Mork <bjorn@mork.no> wrote:
>
>
> Definitely.  The patch is so obviously correct that we can only wonder how it was possible to miss it it the first place :)
>
> Will take a look to see if we could do a better job cleaning up in other places.

What should I do for 4.5? Will there be a pull request for this, or
should I just commit my cdc_ncm_bind() patch directly?

              Linus
diff mbox

Patch

diff --git a/drivers/net/usb/cdc_ncm.c b/drivers/net/usb/cdc_ncm.c
index dc0212c3cc28..5878b54cc563 100644
--- a/drivers/net/usb/cdc_ncm.c
+++ b/drivers/net/usb/cdc_ncm.c
@@ -995,6 +995,8 @@  static int cdc_ncm_bind(struct usbnet *dev, struct usb_interface *intf)
 	 * placed NDP.
 	 */
 	ret = cdc_ncm_bind_common(dev, intf, CDC_NCM_DATA_ALTSETTING_NCM, 0);
+	if (ret < 0)
+		return ret;
 
 	/*
 	 * We should get an event when network connection is "connected" or
@@ -1003,7 +1005,7 @@  static int cdc_ncm_bind(struct usbnet *dev, struct usb_interface *intf)
 	 * start IPv6 negotiation and more.
 	 */
 	usbnet_link_change(dev, 0, 0);
-	return ret;
+	return 0;
 }
 
 static void cdc_ncm_align_tail(struct sk_buff *skb, size_t modulus, size_t remainder, size_t max)