Message ID | 20100420174401.GB1334@midget.suse.cz |
---|---|
State | RFC, archived |
Delegated to: | David Miller |
Headers | show |
Le mardi 20 avril 2010 à 19:44 +0200, Jiri Bohac a écrit : > Hi, > > I found what I believe is a race condition in __ipv6_ifa_notify(), in the call > to dst_free(). > > __ipv6_ifa_notify() contains: > > case RTM_DELADDR: > if (ifp->idev->cnf.forwarding) > addrconf_leave_anycast(ifp); > addrconf_leave_solict(ifp->idev, &ifp->addr); > dst_hold(&ifp->rt->u.dst); > if (ip6_del_rt(ifp->rt)) > dst_free(&ifp->rt->u.dst); > break; > > AFAICT, ip6_del_rt() will call dst_free() itself if it finds and actually > deletes the route: > ip6_del_rt() -> __ip6_del_rt() -> fib6_del() -> fib6_del_route() -> > -> rt6_release() -> dst_free() > > If it fails (like when it races with another invocation of ip6_del_rt()), it > will return nonzero and this will cause the above code to call dst_free() on its own. > > dst_free() has no protection against concurrent invocation and if Sorry ? of course dst_free() has a protection... By definition, the dst_destroy() is called only by the last thread with the final refcount on object. > two invocations make it through the "if (dst->obsolete > 1)" > check before one of them calls __dst_free(), the same dst_entry > may end up either: > 1) dst_destroy()ed and put on the dst_garbage.list, or > 2) put on the dst_garbage.list twice > both resulting in trouble once the GC is run. > > One possible code path leading to two invocations of __ipv6_ifa_notify() seems > to be when two bonding slaves receive a NS/NA with the bonds IPv6 address when > the bonding master is in the DAD phase with a tentative address: > > netif_receive_skb() gets invoked on two CPUs and sets skb->dev to the bonding master ... > ... ip6_mc_input() -> ip6_input_finish() -> icmpv6_rcv() -> ndisc_rcv() -> > -> ndisc_recv_ns() -> addrconf_dad_failure() -> ipv6_del_addr() -> ipv6_ifa_notify() -> > -> __ipv6_ifa_notify > > > What is the reason __ipv6_ifa_notify() calls dst_free() when > ip6_del_rt() fails? I don't see a way ip6_del_rt() could fail > with the dst still needing to be freed. > > I am just testing whether the following will help: > > --- a/net/ipv6/addrconf.c 2010-04-17 00:12:32.000000000 +0200 > +++ b/net/ipv6/addrconf.c 2010-04-20 19:07:35.000000000 +0200 > @@ -3974,8 +3974,7 @@ static void __ipv6_ifa_notify(int event, > addrconf_leave_anycast(ifp); > addrconf_leave_solict(ifp->idev, &ifp->addr); > dst_hold(&ifp->rt->u.dst); > - if (ip6_del_rt(ifp->rt)) > - dst_free(&ifp->rt->u.dst); > + ip6_del_rt(ifp->rt); > break; > } > } > I dont understand the problem Jiri. We just did dst_hold(&ifp->rt->u.dst), so if ip6_del_rt() fails we must dst_free(), or we leak a refcount. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Apr 20, 2010 at 07:57:27PM +0200, Eric Dumazet wrote: > Le mardi 20 avril 2010 à 19:44 +0200, Jiri Bohac a écrit : > > --- a/net/ipv6/addrconf.c 2010-04-17 00:12:32.000000000 +0200 > > +++ b/net/ipv6/addrconf.c 2010-04-20 19:07:35.000000000 +0200 > > @@ -3974,8 +3974,7 @@ static void __ipv6_ifa_notify(int event, > > addrconf_leave_anycast(ifp); > > addrconf_leave_solict(ifp->idev, &ifp->addr); > > dst_hold(&ifp->rt->u.dst); > > - if (ip6_del_rt(ifp->rt)) > > - dst_free(&ifp->rt->u.dst); > > + ip6_del_rt(ifp->rt); > > break; > > } > > } > > > > > I dont understand the problem Jiri. > > We just did dst_hold(&ifp->rt->u.dst), so if ip6_del_rt() fails we must > dst_free(), or we leak a refcount. Well, no ... dst_free() does not decrement the refcount. The "opposite" of dst_hold() is dst_release(). And it does not automatically call dst_free() when the refcount drops to 0. dst_free() needs to be called explicitly and it apparently expects the caller to ensure that two dst_free()s won't be called simultaneously. But my bonding example shows this is not the case.
Hi, On Tue, Apr 20, 2010 at 07:44:01PM +0200, Jiri Bohac wrote: > What is the reason __ipv6_ifa_notify() calls dst_free() when > ip6_del_rt() fails? I don't see a way ip6_del_rt() could fail > with the dst still needing to be freed. checked again and I still think that if ip6_del_rt() fails, ifp->rt must have been freed already. Anybody with a counterexample? > I am just testing whether the following will help: > > --- a/net/ipv6/addrconf.c 2010-04-17 00:12:32.000000000 +0200 > +++ b/net/ipv6/addrconf.c 2010-04-20 19:07:35.000000000 +0200 > @@ -3974,8 +3974,7 @@ static void __ipv6_ifa_notify(int event, > addrconf_leave_anycast(ifp); > addrconf_leave_solict(ifp->idev, &ifp->addr); > dst_hold(&ifp->rt->u.dst); > - if (ip6_del_rt(ifp->rt)) > - dst_free(&ifp->rt->u.dst); > + ip6_del_rt(ifp->rt); > break; > } > } not surprisingly, it helps my case -- the race condition does not happen and the resulting oopses disappears. Of course, this does not prove the patch is correct. Could anybody with more insight into the dst refcounting please take a look? The code originally came from: Author: Hideaki Yoshifuji <yoshfuji@linux-ipv6.org> Date: Wed Aug 18 05:25:16 2004 +0900 [IPV6] refer inet6 device via corresponding local route from address structure. And has been modified later by: commit 4641e7a334adf6856300a98e7296dfc886c446af Author: Herbert Xu <herbert@gondor.apana.org.au> Date: Thu Feb 2 16:55:45 2006 -0800 [IPV6]: Don't hold extra ref count in ipv6_ifa_notify Thanks,
On Wed, Apr 21, 2010 at 11:34:29PM +0200, Jiri Bohac wrote: > Hi, > > On Tue, Apr 20, 2010 at 07:44:01PM +0200, Jiri Bohac wrote: > > What is the reason __ipv6_ifa_notify() calls dst_free() when > > ip6_del_rt() fails? I don't see a way ip6_del_rt() could fail > > with the dst still needing to be freed. > > checked again and I still think that if ip6_del_rt() fails, > ifp->rt must have been freed already. Anybody with a > counterexample? I agree with your diagnosis and the two duplicate NDISC messages scenario sounds plausible. Anyway, I think the root of the issue is the fact that NDISC is calling addrconf_dad_failure with no locking whatsoever. The latter is not idempotent so some form of locking is needed. This bug appears to have been around since the very start. I'll dig deeper to see where we might be able to add some locks. Cheers,
From: Herbert Xu <herbert@gondor.apana.org.au> Date: Thu, 22 Apr 2010 10:32:11 +0800 > Anyway, I think the root of the issue is the fact that NDISC is > calling addrconf_dad_failure with no locking whatsoever. The > latter is not idempotent so some form of locking is needed. > > This bug appears to have been around since the very start. > > I'll dig deeper to see where we might be able to add some locks. Thanks Herbert. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
--- a/net/ipv6/addrconf.c 2010-04-17 00:12:32.000000000 +0200 +++ b/net/ipv6/addrconf.c 2010-04-20 19:07:35.000000000 +0200 @@ -3974,8 +3974,7 @@ static void __ipv6_ifa_notify(int event, addrconf_leave_anycast(ifp); addrconf_leave_solict(ifp->idev, &ifp->addr); dst_hold(&ifp->rt->u.dst); - if (ip6_del_rt(ifp->rt)) - dst_free(&ifp->rt->u.dst); + ip6_del_rt(ifp->rt); break; } }