From patchwork Wed Jun 10 10:40:43 2015 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Konstantin Khlebnikov X-Patchwork-Id: 482592 Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id B607B1401CB for ; Wed, 10 Jun 2015 20:41:02 +1000 (AEST) Authentication-Results: ozlabs.org; dkim=fail reason="signature verification failed" (1024-bit key; unprotected) header.d=yandex-team.ru header.i=@yandex-team.ru header.b=j5hUt2lp; dkim-atps=neutral Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S933397AbbFJKkz (ORCPT ); Wed, 10 Jun 2015 06:40:55 -0400 Received: from forward-corp1g.mail.yandex.net ([95.108.253.251]:35893 "EHLO forward-corp1g.mail.yandex.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932397AbbFJKkq (ORCPT ); Wed, 10 Jun 2015 06:40:46 -0400 Received: from smtpcorp1m.mail.yandex.net (smtpcorp1m.mail.yandex.net [77.88.61.150]) by forward-corp1g.mail.yandex.net (Yandex) with ESMTP id 4CE313660C88; Wed, 10 Jun 2015 13:40:44 +0300 (MSK) Received: from smtpcorp1m.mail.yandex.net (localhost [127.0.0.1]) by smtpcorp1m.mail.yandex.net (Yandex) with ESMTP id 13E642CA0521; Wed, 10 Jun 2015 13:40:44 +0300 (MSK) Received: from unknown (unknown [2a02:6b8:0:408:ae:3705:67a7:32ac]) by smtpcorp1m.mail.yandex.net (nwsmtp/Yandex) with ESMTPSA id yEQuqgepSI-eicWiuVY; Wed, 10 Jun 2015 13:40:44 +0300 (using TLSv1.2 with cipher AES128-GCM-SHA256 (128/128 bits)) (Client certificate not present) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yandex-team.ru; s=default; t=1433932844; bh=sUy4g9g37omVV9ucNmV2Yb7LvoBVAIgUj40/UnvokdA=; h=Subject:From:To:Cc:Date:Message-ID:In-Reply-To:References: User-Agent:MIME-Version:Content-Type:Content-Transfer-Encoding; b=j5hUt2lpq3nUGVq7YVqCO3JtyWN7EiRFCs6o7Vr+J+zHB9GAQAcuQLxXn1QZoefjT F2FGYGGEd8GET/+gyRqv2lkK75MnRqmmLsLX1peggXNvvvmtyMlk3d3RPefysLLByw ih43WfT+ebhpxlCqBOZNSTLowAAT4wmcFuvHPj7s= Authentication-Results: smtpcorp1m.mail.yandex.net; dkim=pass header.i=@yandex-team.ru Subject: [PATCH 3.10.y 1/2] ipv6: prevent fib6_run_gc() contention From: Konstantin Khlebnikov To: stable@vger.kernel.org Cc: Michal Kubecek , netdev@vger.kernel.org, "David S. Miller" Date: Wed, 10 Jun 2015 13:40:43 +0300 Message-ID: <20150610104043.3791.92411.stgit@buzz> In-Reply-To: <20150610103926.3791.22866.stgit@buzz> References: <20150610103926.3791.22866.stgit@buzz> User-Agent: StGit/0.17.1-dirty MIME-Version: 1.0 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org From: Michal Kubeček commit 2ac3ac8f86f2fe065d746d9a9abaca867adec577 upstream On a high-traffic router with many processors and many IPv6 dst entries, soft lockup in fib6_run_gc() can occur when number of entries reaches gc_thresh. This happens because fib6_run_gc() uses fib6_gc_lock to allow only one thread to run the garbage collector but ip6_dst_gc() doesn't update net->ipv6.ip6_rt_last_gc until fib6_run_gc() returns. On a system with many entries, this can take some time so that in the meantime, other threads pass the tests in ip6_dst_gc() (ip6_rt_last_gc is still not updated) and wait for the lock. They then have to run the garbage collector one after another which blocks them for quite long. Resolve this by replacing special value ~0UL of expire parameter to fib6_run_gc() by explicit "force" parameter to choose between spin_lock_bh() and spin_trylock_bh() and call fib6_run_gc() with force=false if gc_thresh is reached but not max_size. Signed-off-by: Michal Kubecek Signed-off-by: David S. Miller --- include/net/ip6_fib.h | 2 +- net/ipv6/ip6_fib.c | 19 ++++++++----------- net/ipv6/ndisc.c | 4 ++-- net/ipv6/route.c | 4 ++-- 4 files changed, 13 insertions(+), 16 deletions(-) -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h index 665e0cee59bd..5e661a979694 100644 --- a/include/net/ip6_fib.h +++ b/include/net/ip6_fib.h @@ -301,7 +301,7 @@ extern void inet6_rt_notify(int event, struct rt6_info *rt, struct nl_info *info); extern void fib6_run_gc(unsigned long expires, - struct net *net); + struct net *net, bool force); extern void fib6_gc_cleanup(void); diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c index ceeb9458bb60..0b5e9086322d 100644 --- a/net/ipv6/ip6_fib.c +++ b/net/ipv6/ip6_fib.c @@ -1648,19 +1648,16 @@ static int fib6_age(struct rt6_info *rt, void *arg) static DEFINE_SPINLOCK(fib6_gc_lock); -void fib6_run_gc(unsigned long expires, struct net *net) +void fib6_run_gc(unsigned long expires, struct net *net, bool force) { - if (expires != ~0UL) { + if (force) { spin_lock_bh(&fib6_gc_lock); - gc_args.timeout = expires ? (int)expires : - net->ipv6.sysctl.ip6_rt_gc_interval; - } else { - if (!spin_trylock_bh(&fib6_gc_lock)) { - mod_timer(&net->ipv6.ip6_fib_timer, jiffies + HZ); - return; - } - gc_args.timeout = net->ipv6.sysctl.ip6_rt_gc_interval; + } else if (!spin_trylock_bh(&fib6_gc_lock)) { + mod_timer(&net->ipv6.ip6_fib_timer, jiffies + HZ); + return; } + gc_args.timeout = expires ? (int)expires : + net->ipv6.sysctl.ip6_rt_gc_interval; gc_args.more = icmp6_dst_gc(); @@ -1677,7 +1674,7 @@ void fib6_run_gc(unsigned long expires, struct net *net) static void fib6_gc_timer_cb(unsigned long arg) { - fib6_run_gc(0, (struct net *)arg); + fib6_run_gc(0, (struct net *)arg, true); } static int __net_init fib6_net_init(struct net *net) diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c index 05f361338c2e..deedf7ddbc6e 100644 --- a/net/ipv6/ndisc.c +++ b/net/ipv6/ndisc.c @@ -1584,7 +1584,7 @@ static int ndisc_netdev_event(struct notifier_block *this, unsigned long event, switch (event) { case NETDEV_CHANGEADDR: neigh_changeaddr(&nd_tbl, dev); - fib6_run_gc(~0UL, net); + fib6_run_gc(0, net, false); idev = in6_dev_get(dev); if (!idev) break; @@ -1594,7 +1594,7 @@ static int ndisc_netdev_event(struct notifier_block *this, unsigned long event, break; case NETDEV_DOWN: neigh_ifdown(&nd_tbl, dev); - fib6_run_gc(~0UL, net); + fib6_run_gc(0, net, false); break; case NETDEV_NOTIFY_PEERS: ndisc_send_unsol_na(dev); diff --git a/net/ipv6/route.c b/net/ipv6/route.c index d94d224f7e68..bd83c90f970c 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -1349,7 +1349,7 @@ static int ip6_dst_gc(struct dst_ops *ops) goto out; net->ipv6.ip6_rt_gc_expire++; - fib6_run_gc(net->ipv6.ip6_rt_gc_expire, net); + fib6_run_gc(net->ipv6.ip6_rt_gc_expire, net, entries > rt_max_size); net->ipv6.ip6_rt_last_gc = now; entries = dst_entries_get_slow(ops); if (entries < ops->gc_thresh) @@ -2849,7 +2849,7 @@ int ipv6_sysctl_rtcache_flush(ctl_table *ctl, int write, net = (struct net *)ctl->extra1; delay = net->ipv6.sysctl.flush_delay; proc_dointvec(ctl, write, buffer, lenp, ppos); - fib6_run_gc(delay <= 0 ? ~0UL : (unsigned long)delay, net); + fib6_run_gc(delay <= 0 ? 0 : (unsigned long)delay, net, delay > 0); return 0; }