Message ID | CA+55aFwg=OtMWmU153uM27Fwk5eVv5ZBGBA9phQqVqn3nNAy6Q@mail.gmail.com |
---|---|
State | RFC, archived |
Delegated to: | David Miller |
Headers | show |
On 21/04/17 20:42, Linus Torvalds wrote: > On Fri, Apr 21, 2017 at 10:25 AM, Linus Torvalds > <torvalds@linux-foundation.org> wrote: >> >> I'm assuming that the real cause is simply that "dev->reg_state" ends >> up being NETREG_UNREGISTERING or something. Maybe the BUG_ON() could >> be just removed, and replaced by the previous warning about >> NETREG_UNINITIALIZED. >> >> Something like the attached (TOTALLY UNTESTED) patch. > > .. might as well test it. > > That patch doesn't fix the problem, but it does show that yes, it was > NETREG_UNREGISTERING: > > unregister_netdevice: device pim6reg/ffff962dc4606000 was not registered (2) > > but then immediately afterwards we get > > general protection fault: 0000 [#1] SMP > Workqueue: netns cleanup_net > RIP: 0010:dev_shutdown+0xe/0xc0 > Call Trace: > rollback_registered_many+0x2a5/0x440 > unregister_netdevice_many+0x1e/0xb0 > default_device_exit_batch+0x145/0x170 > > which is due to a > > mov 0x388(%rdi),%eax > > where %rdi is 0xdead000000000090. That is at the very beginning of > dev_shutdown, it's "dev" itself that has that value, so it comes from > (_another_) invocation of rollback_registered_many(), when it does > that > > list_for_each_entry(dev, head, unreg_list) { > > so it seems to be a case of another "list_del() leaves list in bad > state", and it was the added test for "dev->reg_state != > NETREG_REGISTERED" that did that > > list_del(&dev->unreg_list); > > and left random contents in the unreg_list. > > So that "handle error case" was almost certainly just buggy too. > > And the bug seems to be that we're trying to unregister a netdevice > that has already been unregistered. > > Over to Eric and networking people. This oops is user-triggerable, and > leaves the machine in a bad state (the original BUG_ON() and the new > GP fault both happen while holding the RTNL, so networking is not > healthy afterwards. > > Linus > Right, I've already posted a patch for ip6mr that should fix the issue. CCed you and LKML just now. Thanks, Nik
On Fri, Apr 21, 2017 at 10:25 AM, Linus Torvalds <torvalds@linux-foundation.org> wrote: > On Fri, Apr 21, 2017 at 5:48 AM, Andrey Konovalov <andreyknvl@google.com> wrote: >> >> I've got the following error report while fuzzing the kernel with syzkaller. >> >> ------------[ cut here ]------------ >> kernel BUG at net/core/dev.c:6813! > > Another useless BUG_ON() that I think we are double-unregister'ing the pim6reg device, we probably need something like: commit 7dc00c82cbb0119cf4663f65bbaa2cc55f961db2 Author: Wang Chen <wangchen@cn.fujitsu.com> Date: Mon Jul 14 20:56:34 2008 -0700 ipv4: Fix ipmr unregister device oops An oops happens during device unregister.
From: Linus Torvalds <torvalds@linux-foundation.org> Date: Fri, 21 Apr 2017 10:42:48 -0700 > Over to Eric and networking people. This oops is user-triggerable, and > leaves the machine in a bad state (the original BUG_ON() and the new > GP fault both happen while holding the RTNL, so networking is not > healthy afterwards. I have the fix in my tree and will push it to shortly.
From: Cong Wang <xiyou.wangcong@gmail.com> Date: Fri, 21 Apr 2017 11:55:04 -0700 > On Fri, Apr 21, 2017 at 10:25 AM, Linus Torvalds > <torvalds@linux-foundation.org> wrote: >> On Fri, Apr 21, 2017 at 5:48 AM, Andrey Konovalov <andreyknvl@google.com> wrote: >>> >>> I've got the following error report while fuzzing the kernel with syzkaller. >>> >>> ------------[ cut here ]------------ >>> kernel BUG at net/core/dev.c:6813! >> >> Another useless BUG_ON() that > > I think we are double-unregister'ing the pim6reg device, > we probably need something like: This particular bug is fixed by Nikolay's fix. I'm not saying you haven't spotted another bug.
diff --git a/net/core/dev.c b/net/core/dev.c index 533a6d6f6092..4e50b2fb3a90 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -6801,16 +6801,14 @@ static void rollback_registered_many(struct list_head *head) * for initialization unwind. Remove those * devices and proceed with the remaining. */ - if (dev->reg_state == NETREG_UNINITIALIZED) { - pr_debug("unregister_netdevice: device %s/%p never was registered\n", - dev->name, dev); + if (WARN_ON_ONCE(dev->reg_state != NETREG_REGISTERED)) { + printk("unregister_netdevice: device %s/%p was not registered (%d)\n", + dev->name, dev, dev->reg_state); - WARN_ON(1); list_del(&dev->unreg_list); continue; } dev->dismantle = true; - BUG_ON(dev->reg_state != NETREG_REGISTERED); } /* If device is running, close it first. */