Message ID | 519ea12b-4c24-9e8e-c5eb-ca02c9c7d264@i-love.sakura.ne.jp |
---|---|
State | Rejected |
Delegated to: | David Miller |
Headers | show |
Series | ipv4: Delete uncached routes upon unregistration of loopback device. | expand |
On 5/4/19 10:52 AM, Tetsuo Handa wrote: > syzbot is hitting infinite loop when a loopback device in a namespace is > unregistered [1]. This is because rt_flush_dev() is moving the refcount of > "any device to unregister" to "a loopback device in that namespace" but > nobody can drop the refcount moved from non loopback devices when the > loopback device in that namespace is unregistered. > > This behavior was introduced by commit caacf05e5ad1abf0 ("ipv4: Properly > purge netdev references on uncached routes.") but there is no description > why we have to temporarily move the refcount to "a loopback device in that > namespace" and why it is safe to do so, for rt_flush_dev() becomes a no-op > when "a loopback device in that namespace" is about to be unregistered. > > Since I don't know the reason, this patch breaks the infinite loop by > deleting the uncached route (which eventually drops the refcount via > dst_destroy()) when "a loopback device in that namespace" is unregistered > rather than when "non-loopback devices in that namespace" is unregistered. Well, you have not fixed a bug, you simply made sure that whatever cpu is using the routes you forcibly deleted is going to crash the host very soon (use-after-frees have undefined behavior, but KASAN should crash most of the times) Please do not send patches like that with a huge CC list, keep networking patches to netdev mailing list. Mahesh has an alternative patch, adding a fake device that can not be dismantled to make sure we fully intercept skbs sent through a dead route, instead of relying on loopback dropping them later at some point.
On 2019/05/05 0:56, Eric Dumazet wrote:> > Well, you have not fixed a bug, you simply made sure that whatever cpu is using the > routes you forcibly deleted is going to crash the host very soon (use-after-frees have > undefined behavior, but KASAN should crash most of the times) I confirmed that this patch survives "#syz test:" before submitting. But you know that this patch is deleting the route entry too early. OK. > > Please do not send patches like that with a huge CC list, keep networking patches > to netdev mailing list. If netdev people started working on this "minutely crashing bug" earlier, I would not have written a patch... > > Mahesh has an alternative patch, adding a fake device that can not be dismantled > to make sure we fully intercept skbs sent through a dead route, instead of relying > on loopback dropping them later at some point. So, the reason to temporarily move the refcount is to give enough period so that the route entry is no longer used. But moving the refcount to a loopback device in a namespace was wrong. Is this understanding correct? Compared to moving the refcount to the loopback device in the init namespace, the fake device can somehow drop the refcount moved via rt_flush_dev(), can't it? Anyway, I'll wait for Mahesh.
On 5/4/19 1:09 PM, Tetsuo Handa wrote: > On 2019/05/05 0:56, Eric Dumazet wrote:> >> Well, you have not fixed a bug, you simply made sure that whatever cpu is using the >> routes you forcibly deleted is going to crash the host very soon (use-after-frees have >> undefined behavior, but KASAN should crash most of the times) > > I confirmed that this patch survives "#syz test:" before submitting. > But you know that this patch is deleting the route entry too early. OK. > >> >> Please do not send patches like that with a huge CC list, keep networking patches >> to netdev mailing list. > > If netdev people started working on this "minutely crashing bug" earlier, > I would not have written a patch... So, just that you know, we are working on bug fixes, and this is best effort. It is not because _you_ want to fix a particular bug (out of hundreds) that we need to stop everything and work full time on a particular bug. And here the root cause of the problem is elsewhere. A dst is leaking somewhere, and prevents the netns dismantle. We had many dst leaks in the past, and they keep being added by new bugs. > >> >> Mahesh has an alternative patch, adding a fake device that can not be dismantled >> to make sure we fully intercept skbs sent through a dead route, instead of relying >> on loopback dropping them later at some point. > > So, the reason to temporarily move the refcount is to give enough period > so that the route entry is no longer used. But moving the refcount to a > loopback device in a namespace was wrong. Is this understanding correct? I believe you need spend more time on studying the networking code by yourself, add tracing if you believe this could be useful to you and others. > > Compared to moving the refcount to the loopback device in the init namespace, > the fake device can somehow drop the refcount moved via rt_flush_dev(), can't it? > The fake device wont ever disappear. > Anyway, I'll wait for Mahesh. >
Hello, On Sat, 4 May 2019, Tetsuo Handa wrote: > syzbot is hitting infinite loop when a loopback device in a namespace is > unregistered [1]. This is because rt_flush_dev() is moving the refcount of > "any device to unregister" to "a loopback device in that namespace" but > nobody can drop the refcount moved from non loopback devices when the > loopback device in that namespace is unregistered. > > This behavior was introduced by commit caacf05e5ad1abf0 ("ipv4: Properly > purge netdev references on uncached routes.") but there is no description > why we have to temporarily move the refcount to "a loopback device in that > namespace" and why it is safe to do so, for rt_flush_dev() becomes a no-op > when "a loopback device in that namespace" is about to be unregistered. > > Since I don't know the reason, this patch breaks the infinite loop by > deleting the uncached route (which eventually drops the refcount via > dst_destroy()) when "a loopback device in that namespace" is unregistered > rather than when "non-loopback devices in that namespace" is unregistered. There is one simple rule: code that holds device references should catch device events and to put the references. This should happen not after the NETDEV_UNREGISTER event. But there are users such as dsts that have longer life because there are other objects that can hold dsts - sockets, for example. Sockets can hold dst as result of route resolving (sk_dst_cache) and to reuse it as long as it is valid. And many sockets can point to same dst. Obviously, socket can be idle while device disappears. We do not propagate device events to every socket. What we do is to mark the dst entry as invalid and to drop its dev reference. So, the problem can be noticed on next sending when the cached dst is revalidated. As the dst entry is marked as obsolete, it will be dropped. What you see in rt_flush_dev() is that the IPv4 route subsystem drops its dev references at the right time. Why net->loopback_dev ? Because we prefer the leaked dsts to prevent netns to be released, so that we have more info where is the problem. If we account the leaks to init netns, nobody will notice them. These are leaks that missed many cleanup steps: device events and even net-exit events. If you see "lo" in error messages, it is more likely a dst leak, i.e. we got a dst reference but dst_release() was not called before netns cleanup. In such case you need to track not the dev references but the dst references. If another device is shown, it is not a dst leak but some dev leak (dev_put not called). > [1] https://syzkaller.appspot.com/bug?id=bae9a2236bfede42cf3d219e6bf6740c583568a4 > > Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> > Reported-by: syzbot <syzbot+30209ea299c09d8785c9@syzkaller.appspotmail.com> > --- > net/ipv4/route.c | 8 +++++++- > 1 file changed, 7 insertions(+), 1 deletion(-) > > diff --git a/net/ipv4/route.c b/net/ipv4/route.c > index 6fdf1c195d8e..7e865c11d4f3 100644 > --- a/net/ipv4/route.c > +++ b/net/ipv4/route.c > @@ -1522,15 +1522,21 @@ void rt_flush_dev(struct net_device *dev) > { > struct net *net = dev_net(dev); > struct rtable *rt; > + struct rtable *tmp; > int cpu; > > for_each_possible_cpu(cpu) { > struct uncached_list *ul = &per_cpu(rt_uncached_list, cpu); > > spin_lock_bh(&ul->lock); > - list_for_each_entry(rt, &ul->head, rt_uncached) { > + list_for_each_entry_safe(rt, tmp, &ul->head, rt_uncached) { > if (rt->dst.dev != dev) > continue; > + if (dev == net->loopback_dev) { > + list_del_init(&rt->rt_uncached); > + ip_rt_put(rt); > + continue; > + } > rt->dst.dev = net->loopback_dev; > dev_hold(rt->dst.dev); > dev_put(dev); > -- > 2.17.1 Regards -- Julian Anastasov <ja@ssi.bg>
diff --git a/net/ipv4/route.c b/net/ipv4/route.c index 6fdf1c195d8e..7e865c11d4f3 100644 --- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -1522,15 +1522,21 @@ void rt_flush_dev(struct net_device *dev) { struct net *net = dev_net(dev); struct rtable *rt; + struct rtable *tmp; int cpu; for_each_possible_cpu(cpu) { struct uncached_list *ul = &per_cpu(rt_uncached_list, cpu); spin_lock_bh(&ul->lock); - list_for_each_entry(rt, &ul->head, rt_uncached) { + list_for_each_entry_safe(rt, tmp, &ul->head, rt_uncached) { if (rt->dst.dev != dev) continue; + if (dev == net->loopback_dev) { + list_del_init(&rt->rt_uncached); + ip_rt_put(rt); + continue; + } rt->dst.dev = net->loopback_dev; dev_hold(rt->dst.dev); dev_put(dev);
syzbot is hitting infinite loop when a loopback device in a namespace is unregistered [1]. This is because rt_flush_dev() is moving the refcount of "any device to unregister" to "a loopback device in that namespace" but nobody can drop the refcount moved from non loopback devices when the loopback device in that namespace is unregistered. This behavior was introduced by commit caacf05e5ad1abf0 ("ipv4: Properly purge netdev references on uncached routes.") but there is no description why we have to temporarily move the refcount to "a loopback device in that namespace" and why it is safe to do so, for rt_flush_dev() becomes a no-op when "a loopback device in that namespace" is about to be unregistered. Since I don't know the reason, this patch breaks the infinite loop by deleting the uncached route (which eventually drops the refcount via dst_destroy()) when "a loopback device in that namespace" is unregistered rather than when "non-loopback devices in that namespace" is unregistered. [1] https://syzkaller.appspot.com/bug?id=bae9a2236bfede42cf3d219e6bf6740c583568a4 Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reported-by: syzbot <syzbot+30209ea299c09d8785c9@syzkaller.appspotmail.com> --- net/ipv4/route.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-)