From patchwork Tue May 19 21:45:23 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Ahmed S. Darwish" X-Patchwork-Id: 1293779 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org (client-ip=23.128.96.18; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=linutronix.de Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by ozlabs.org (Postfix) with ESMTP id 49RTwg32yvz9sTK for ; Wed, 20 May 2020 07:46:27 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728008AbgESVqX (ORCPT ); Tue, 19 May 2020 17:46:23 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40742 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726030AbgESVqX (ORCPT ); Tue, 19 May 2020 17:46:23 -0400 Received: from Galois.linutronix.de (Galois.linutronix.de [IPv6:2a0a:51c0:0:12e:550::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E8C07C08C5C0; Tue, 19 May 2020 14:46:22 -0700 (PDT) Received: from [5.158.153.53] (helo=debian-buster-darwi.lab.linutronix.de.) by Galois.linutronix.de with esmtpsa (TLS1.2:DHE_RSA_AES_256_CBC_SHA1:256) (Exim 4.80) (envelope-from ) id 1jbA3e-0002Yy-7I; Tue, 19 May 2020 23:45:54 +0200 From: "Ahmed S. Darwish" To: Peter Zijlstra , Ingo Molnar , Will Deacon Cc: Thomas Gleixner , "Paul E. McKenney" , "Sebastian A. Siewior" , Steven Rostedt , LKML , "Ahmed S. Darwish" , "David S. Miller" , Jakub Kicinski , netdev@vger.kernel.org Subject: [PATCH v1 01/25] net: core: device_rename: Use rwsem instead of a seqcount Date: Tue, 19 May 2020 23:45:23 +0200 Message-Id: <20200519214547.352050-2-a.darwish@linutronix.de> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20200519214547.352050-1-a.darwish@linutronix.de> References: <20200519214547.352050-1-a.darwish@linutronix.de> MIME-Version: 1.0 X-Linutronix-Spam-Score: -1.0 X-Linutronix-Spam-Level: - X-Linutronix-Spam-Status: No , -1.0 points, 5.0 required, ALL_TRUSTED=-1,SHORTCIRCUIT=-0.0001 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Sequence counters write paths are critical sections that must never be preempted, and blocking, even for CONFIG_PREEMPTION=n, is not allowed. Commit 5dbe7c178d3f ("net: fix kernel deadlock with interface rename and netdev name retrieval.") handled a deadlock, observed with CONFIG_PREEMPTION=n, where the devnet_rename seqcount read side was infinitely spinning: it got scheduled after the seqcount write side blocked inside its own critical section. To fix that deadlock, among other issues, the commit added a cond_resched() inside the read side section. While this will get the non-preemptible kernel eventually unstuck, the seqcount reader is fully exhausting its slice just spinning -- until TIF_NEED_RESCHED is set. The fix is also still broken: if the seqcount reader belongs to a real-time scheduling policy, it can spin forever and the kernel will livelock. Disabling preemption over the seqcount write side critical section will not work: inside it are a number of GFP_KERNEL allocations and mutex locking through the drivers/base/ :: device_rename() call chain. From all the above, replace the seqcount with a rwsem. Fixes: 5dbe7c178d3f (net: fix kernel deadlock with interface rename and netdev name retrieval.) Fixes: 30e6c9fa93cf (net: devnet_rename_seq should be a seqcount) Fixes: c91f6df2db49 (sockopt: Change getsockopt() of SO_BINDTODEVICE to return an interface name) Cc: Signed-off-by: Ahmed S. Darwish Reviewed-by: Sebastian Andrzej Siewior Reviewed-by: Eric Dumazet Reported-by: kbuild test robot Reported-by: Dan Carpenter --- net/core/dev.c | 30 ++++++++++++------------------ 1 file changed, 12 insertions(+), 18 deletions(-) diff --git a/net/core/dev.c b/net/core/dev.c index 522288177bbd..e18a4c23df0e 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -79,6 +79,7 @@ #include #include #include +#include #include #include #include @@ -194,7 +195,7 @@ static DEFINE_SPINLOCK(napi_hash_lock); static unsigned int napi_gen_id = NR_CPUS; static DEFINE_READ_MOSTLY_HASHTABLE(napi_hash, 8); -static seqcount_t devnet_rename_seq; +static DECLARE_RWSEM(devnet_rename_sem); static inline void dev_base_seq_inc(struct net *net) { @@ -930,18 +931,13 @@ EXPORT_SYMBOL(dev_get_by_napi_id); * @net: network namespace * @name: a pointer to the buffer where the name will be stored. * @ifindex: the ifindex of the interface to get the name from. - * - * The use of raw_seqcount_begin() and cond_resched() before - * retrying is required as we want to give the writers a chance - * to complete when CONFIG_PREEMPTION is not set. */ int netdev_get_name(struct net *net, char *name, int ifindex) { struct net_device *dev; - unsigned int seq; -retry: - seq = raw_seqcount_begin(&devnet_rename_seq); + down_read(&devnet_rename_sem); + rcu_read_lock(); dev = dev_get_by_index_rcu(net, ifindex); if (!dev) { @@ -951,10 +947,8 @@ int netdev_get_name(struct net *net, char *name, int ifindex) strcpy(name, dev->name); rcu_read_unlock(); - if (read_seqcount_retry(&devnet_rename_seq, seq)) { - cond_resched(); - goto retry; - } + + up_read(&devnet_rename_sem); return 0; } @@ -1228,10 +1222,10 @@ int dev_change_name(struct net_device *dev, const char *newname) likely(!(dev->priv_flags & IFF_LIVE_RENAME_OK))) return -EBUSY; - write_seqcount_begin(&devnet_rename_seq); + down_write(&devnet_rename_sem); if (strncmp(newname, dev->name, IFNAMSIZ) == 0) { - write_seqcount_end(&devnet_rename_seq); + up_write(&devnet_rename_sem); return 0; } @@ -1239,7 +1233,7 @@ int dev_change_name(struct net_device *dev, const char *newname) err = dev_get_valid_name(net, dev, newname); if (err < 0) { - write_seqcount_end(&devnet_rename_seq); + up_write(&devnet_rename_sem); return err; } @@ -1254,11 +1248,11 @@ int dev_change_name(struct net_device *dev, const char *newname) if (ret) { memcpy(dev->name, oldname, IFNAMSIZ); dev->name_assign_type = old_assign_type; - write_seqcount_end(&devnet_rename_seq); + up_write(&devnet_rename_sem); return ret; } - write_seqcount_end(&devnet_rename_seq); + up_write(&devnet_rename_sem); netdev_adjacent_rename_links(dev, oldname); @@ -1279,7 +1273,7 @@ int dev_change_name(struct net_device *dev, const char *newname) /* err >= 0 after dev_alloc_name() or stores the first errno */ if (err >= 0) { err = ret; - write_seqcount_begin(&devnet_rename_seq); + down_write(&devnet_rename_sem); memcpy(dev->name, oldname, IFNAMSIZ); memcpy(oldname, newname, IFNAMSIZ); dev->name_assign_type = old_assign_type; From patchwork Tue May 19 21:45:25 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Ahmed S. Darwish" X-Patchwork-Id: 1293783 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org (client-ip=23.128.96.18; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=linutronix.de Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by ozlabs.org (Postfix) with ESMTP id 49RTxJ23mvz9sTM for ; Wed, 20 May 2020 07:47:00 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728277AbgESVq5 (ORCPT ); Tue, 19 May 2020 17:46:57 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40838 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728219AbgESVqx (ORCPT ); Tue, 19 May 2020 17:46:53 -0400 Received: from Galois.linutronix.de (Galois.linutronix.de [IPv6:2a0a:51c0:0:12e:550::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 64344C08C5C0; Tue, 19 May 2020 14:46:53 -0700 (PDT) Received: from [5.158.153.53] (helo=debian-buster-darwi.lab.linutronix.de.) by Galois.linutronix.de with esmtpsa (TLS1.2:DHE_RSA_AES_256_CBC_SHA1:256) (Exim 4.80) (envelope-from ) id 1jbA3o-0002bS-13; Tue, 19 May 2020 23:46:04 +0200 From: "Ahmed S. Darwish" To: Peter Zijlstra , Ingo Molnar , Will Deacon Cc: Thomas Gleixner , "Paul E. McKenney" , "Sebastian A. Siewior" , Steven Rostedt , LKML , "Ahmed S. Darwish" , Andrew Lunn , Florian Fainelli , Heiner Kallweit , Russell King , "David S. Miller" , netdev@vger.kernel.org Subject: [PATCH v1 03/25] net: phy: fixed_phy: Remove unused seqcount Date: Tue, 19 May 2020 23:45:25 +0200 Message-Id: <20200519214547.352050-4-a.darwish@linutronix.de> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20200519214547.352050-1-a.darwish@linutronix.de> References: <20200519214547.352050-1-a.darwish@linutronix.de> MIME-Version: 1.0 X-Linutronix-Spam-Score: -1.0 X-Linutronix-Spam-Level: - X-Linutronix-Spam-Status: No , -1.0 points, 5.0 required, ALL_TRUSTED=-1,SHORTCIRCUIT=-0.0001 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Commit bf7afb29d545 ("phy: improve safety of fixed-phy MII register reading") protected the fixed PHY status with a sequence counter. Two years later, commit d2b977939b18 ("net: phy: fixed-phy: remove fixed_phy_update_state()") removed the sequence counter's write side critical section -- neutralizing its read side retry loop. Remove the unused seqcount. Signed-off-by: Ahmed S. Darwish Reviewed-by: Sebastian Andrzej Siewior --- drivers/net/phy/fixed_phy.c | 25 ++++++++++--------------- 1 file changed, 10 insertions(+), 15 deletions(-) diff --git a/drivers/net/phy/fixed_phy.c b/drivers/net/phy/fixed_phy.c index 4a3d34f40cb9..f55365c9d1f7 100644 --- a/drivers/net/phy/fixed_phy.c +++ b/drivers/net/phy/fixed_phy.c @@ -34,7 +34,6 @@ struct fixed_mdio_bus { struct fixed_phy { int addr; struct phy_device *phydev; - seqcount_t seqcount; struct fixed_phy_status status; bool no_carrier; int (*link_update)(struct net_device *, struct fixed_phy_status *); @@ -80,19 +79,17 @@ static int fixed_mdio_read(struct mii_bus *bus, int phy_addr, int reg_num) list_for_each_entry(fp, &fmb->phys, node) { if (fp->addr == phy_addr) { struct fixed_phy_status state; - int s; - do { - s = read_seqcount_begin(&fp->seqcount); - fp->status.link = !fp->no_carrier; - /* Issue callback if user registered it. */ - if (fp->link_update) - fp->link_update(fp->phydev->attached_dev, - &fp->status); - /* Check the GPIO for change in status */ - fixed_phy_update(fp); - state = fp->status; - } while (read_seqcount_retry(&fp->seqcount, s)); + fp->status.link = !fp->no_carrier; + + /* Issue callback if user registered it. */ + if (fp->link_update) + fp->link_update(fp->phydev->attached_dev, + &fp->status); + + /* Check the GPIO for change in status */ + fixed_phy_update(fp); + state = fp->status; return swphy_read_reg(reg_num, &state); } @@ -150,8 +147,6 @@ static int fixed_phy_add_gpiod(unsigned int irq, int phy_addr, if (!fp) return -ENOMEM; - seqcount_init(&fp->seqcount); - if (irq != PHY_POLL) fmb->mii_bus->irq[phy_addr] = irq; From patchwork Tue May 19 21:45:27 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Ahmed S. Darwish" X-Patchwork-Id: 1293782 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org (client-ip=23.128.96.18; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=linutronix.de Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by ozlabs.org (Postfix) with ESMTP id 49RTxC0TVyz9sT6 for ; Wed, 20 May 2020 07:46:55 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728229AbgESVqx (ORCPT ); Tue, 19 May 2020 17:46:53 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40832 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726030AbgESVqw (ORCPT ); Tue, 19 May 2020 17:46:52 -0400 Received: from Galois.linutronix.de (Galois.linutronix.de [IPv6:2a0a:51c0:0:12e:550::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 31BCCC08C5C0; Tue, 19 May 2020 14:46:52 -0700 (PDT) Received: from [5.158.153.53] (helo=debian-buster-darwi.lab.linutronix.de.) by Galois.linutronix.de with esmtpsa (TLS1.2:DHE_RSA_AES_256_CBC_SHA1:256) (Exim 4.80) (envelope-from ) id 1jbA3x-0002d6-SK; Tue, 19 May 2020 23:46:14 +0200 From: "Ahmed S. Darwish" To: Peter Zijlstra , Ingo Molnar , Will Deacon Cc: Thomas Gleixner , "Paul E. McKenney" , "Sebastian A. Siewior" , Steven Rostedt , LKML , "Ahmed S. Darwish" , "David S. Miller" , Jakub Kicinski , netdev@vger.kernel.org Subject: [PATCH v1 05/25] u64_stats: Document writer non-preemptibility requirement Date: Tue, 19 May 2020 23:45:27 +0200 Message-Id: <20200519214547.352050-6-a.darwish@linutronix.de> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20200519214547.352050-1-a.darwish@linutronix.de> References: <20200519214547.352050-1-a.darwish@linutronix.de> MIME-Version: 1.0 X-Linutronix-Spam-Score: -1.0 X-Linutronix-Spam-Level: - X-Linutronix-Spam-Status: No , -1.0 points, 5.0 required, ALL_TRUSTED=-1,SHORTCIRCUIT=-0.0001 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org The u64_stats mechanism uses sequence counters to protect against 64-bit values tearing on 32-bit architectures. Updating such statistics is a sequence counter write side critical section. Preemption must be disabled before entering this seqcount write critical section. Failing to do so, the seqcount read side can preempt the write side section and spin for the entire scheduler tick. If that reader belongs to a real-time scheduling class, it can spin forever and the kernel will livelock. Document this statistics update side non-preemptibility requirement. Reword the u64_stats header file top comment to always mention "Reader" or "Writer" at the start of each bullet point, making it easier to follow which side each point is actually for. Fix the statement "whole thing is a NOOP on 64bit arches or UP kernels". For 32-bit UP kernels, preemption is always disabled for the statistics read side section. Signed-off-by: Ahmed S. Darwish Reviewed-by: Sebastian Andrzej Siewior --- include/linux/u64_stats_sync.h | 38 ++++++++++++++++++---------------- 1 file changed, 20 insertions(+), 18 deletions(-) diff --git a/include/linux/u64_stats_sync.h b/include/linux/u64_stats_sync.h index 9de5c10293f5..30358ce3d8fe 100644 --- a/include/linux/u64_stats_sync.h +++ b/include/linux/u64_stats_sync.h @@ -7,29 +7,31 @@ * we provide a synchronization point, that is a noop on 64bit or UP kernels. * * Key points : - * 1) Use a seqcount on SMP 32bits, with low overhead. - * 2) Whole thing is a noop on 64bit arches or UP kernels. - * 3) Write side must ensure mutual exclusion or one seqcount update could + * + * 1) Use a seqcount on 32-bit SMP, only disable preemption for 32-bit UP. + * + * 2) The whole thing is a no-op on 64-bit architectures. + * + * 3) Write side must ensure mutual exclusion, or one seqcount update could * be lost, thus blocking readers forever. - * If this synchronization point is not a mutex, but a spinlock or - * spinlock_bh() or disable_bh() : - * 3.1) Write side should not sleep. - * 3.2) Write side should not allow preemption. - * 3.3) If applicable, interrupts should be disabled. * - * 4) If reader fetches several counters, there is no guarantee the whole values - * are consistent (remember point 1) : this is a noop on 64bit arches anyway) + * 4) Write side must disable preemption, or a seqcount reader can preempt the + * writer and also spin forever. * - * 5) readers are allowed to sleep or be preempted/interrupted : They perform - * pure reads. But if they have to fetch many values, it's better to not allow - * preemptions/interruptions to avoid many retries. + * 5) Write side must use the _irqsave() variant if other writers, or a reader, + * can be invoked from an IRQ context. * - * 6) If counter might be written by an interrupt, readers should block interrupts. - * (On UP, there is no seqcount_t protection, a reader allowing interrupts could - * read partial values) + * 6) If reader fetches several counters, there is no guarantee the whole values + * are consistent w.r.t. each other (remember point #2: seqcounts are not + * used for 64bit architectures). * - * 7) For irq and softirq uses, readers can use u64_stats_fetch_begin_irq() and - * u64_stats_fetch_retry_irq() helpers + * 7) Readers are allowed to sleep or be preempted/interrupted: they perform + * pure reads. + * + * 8) Readers must use both u64_stats_fetch_{begin,retry}_irq() if the stats + * might be updated from a hardirq or softirq context (remember point #1: + * seqcounts are not used for UP kernels). 32-bit UP stat readers could read + * corrupted 64-bit values otherwise. * * Usage : * From patchwork Tue May 19 21:45:37 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Ahmed S. Darwish" X-Patchwork-Id: 1293785 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org (client-ip=23.128.96.18; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=linutronix.de Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by ozlabs.org (Postfix) with ESMTP id 49RTxz4J4tz9sTH for ; Wed, 20 May 2020 07:47:35 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728426AbgESVre (ORCPT ); Tue, 19 May 2020 17:47:34 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40966 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728351AbgESVrc (ORCPT ); Tue, 19 May 2020 17:47:32 -0400 Received: from Galois.linutronix.de (Galois.linutronix.de [IPv6:2a0a:51c0:0:12e:550::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5B1B6C08C5C0; Tue, 19 May 2020 14:47:32 -0700 (PDT) Received: from [5.158.153.53] (helo=debian-buster-darwi.lab.linutronix.de.) by Galois.linutronix.de with esmtpsa (TLS1.2:DHE_RSA_AES_256_CBC_SHA1:256) (Exim 4.80) (envelope-from ) id 1jbA4m-0002mk-OU; Tue, 19 May 2020 23:47:04 +0200 From: "Ahmed S. Darwish" To: Peter Zijlstra , Ingo Molnar , Will Deacon Cc: Thomas Gleixner , "Paul E. McKenney" , "Sebastian A. Siewior" , Steven Rostedt , LKML , "Ahmed S. Darwish" , Pablo Neira Ayuso , Jozsef Kadlecsik , Florian Westphal , "David S. Miller" , Jakub Kicinski , netfilter-devel@vger.kernel.org, coreteam@netfilter.org, netdev@vger.kernel.org Subject: [PATCH v1 15/25] netfilter: conntrack: Use sequence counter with associated spinlock Date: Tue, 19 May 2020 23:45:37 +0200 Message-Id: <20200519214547.352050-16-a.darwish@linutronix.de> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20200519214547.352050-1-a.darwish@linutronix.de> References: <20200519214547.352050-1-a.darwish@linutronix.de> MIME-Version: 1.0 X-Linutronix-Spam-Score: -1.0 X-Linutronix-Spam-Level: - X-Linutronix-Spam-Status: No , -1.0 points, 5.0 required, ALL_TRUSTED=-1,SHORTCIRCUIT=-0.0001 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org A sequence counter write side critical section must be protected by some form of locking to serialize writers. A plain seqcount_t does not contain the information of which lock must be held when entering a write side critical section. Use the new seqcount_spinlock_t data type, which allows to associate a spinlock with the sequence counter. This enables lockdep to verify that the spinlock used for writer serialization is held when the write side critical section is entered. If lockdep is disabled this lock association is compiled out and has neither storage size nor runtime overhead. Signed-off-by: Ahmed S. Darwish --- include/net/netfilter/nf_conntrack.h | 2 +- net/netfilter/nf_conntrack_core.c | 5 +++-- 2 files changed, 4 insertions(+), 3 deletions(-) diff --git a/include/net/netfilter/nf_conntrack.h b/include/net/netfilter/nf_conntrack.h index 9f551f3b69c6..333fd54aec30 100644 --- a/include/net/netfilter/nf_conntrack.h +++ b/include/net/netfilter/nf_conntrack.h @@ -286,7 +286,7 @@ int nf_conntrack_hash_resize(unsigned int hashsize); extern struct hlist_nulls_head *nf_conntrack_hash; extern unsigned int nf_conntrack_htable_size; -extern seqcount_t nf_conntrack_generation; +extern seqcount_spinlock_t nf_conntrack_generation; extern unsigned int nf_conntrack_max; /* must be called with rcu read lock held */ diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c index c4582eb71766..48a839377da2 100644 --- a/net/netfilter/nf_conntrack_core.c +++ b/net/netfilter/nf_conntrack_core.c @@ -180,7 +180,7 @@ EXPORT_SYMBOL_GPL(nf_conntrack_htable_size); unsigned int nf_conntrack_max __read_mostly; EXPORT_SYMBOL_GPL(nf_conntrack_max); -seqcount_t nf_conntrack_generation __read_mostly; +seqcount_spinlock_t nf_conntrack_generation __read_mostly; static unsigned int nf_conntrack_hash_rnd __read_mostly; static u32 hash_conntrack_raw(const struct nf_conntrack_tuple *tuple, @@ -2512,7 +2512,8 @@ int nf_conntrack_init_start(void) /* struct nf_ct_ext uses u8 to store offsets/size */ BUILD_BUG_ON(total_extension_size() > 255u); - seqcount_init(&nf_conntrack_generation); + seqcount_spinlock_init(&nf_conntrack_generation, + &nf_conntrack_locks_all_lock); for (i = 0; i < CONNTRACK_LOCKS; i++) spin_lock_init(&nf_conntrack_locks[i]); From patchwork Tue May 19 21:45:38 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Ahmed S. Darwish" X-Patchwork-Id: 1293788 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org (client-ip=23.128.96.18; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=linutronix.de Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by ozlabs.org (Postfix) with ESMTP id 49RTym1YCGz9sTH for ; Wed, 20 May 2020 07:48:16 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728480AbgESVrl (ORCPT ); Tue, 19 May 2020 17:47:41 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40976 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728351AbgESVrf (ORCPT ); Tue, 19 May 2020 17:47:35 -0400 Received: from Galois.linutronix.de (Galois.linutronix.de [IPv6:2a0a:51c0:0:12e:550::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5EB28C08C5C0; Tue, 19 May 2020 14:47:35 -0700 (PDT) Received: from [5.158.153.53] (helo=debian-buster-darwi.lab.linutronix.de.) by Galois.linutronix.de with esmtpsa (TLS1.2:DHE_RSA_AES_256_CBC_SHA1:256) (Exim 4.80) (envelope-from ) id 1jbA4r-0002nf-QD; Tue, 19 May 2020 23:47:09 +0200 From: "Ahmed S. Darwish" To: Peter Zijlstra , Ingo Molnar , Will Deacon Cc: Thomas Gleixner , "Paul E. McKenney" , "Sebastian A. Siewior" , Steven Rostedt , LKML , "Ahmed S. Darwish" , Pablo Neira Ayuso , Jozsef Kadlecsik , Florian Westphal , "David S. Miller" , Jakub Kicinski , netfilter-devel@vger.kernel.org, coreteam@netfilter.org, netdev@vger.kernel.org Subject: [PATCH v1 16/25] netfilter: nft_set_rbtree: Use sequence counter with associated rwlock Date: Tue, 19 May 2020 23:45:38 +0200 Message-Id: <20200519214547.352050-17-a.darwish@linutronix.de> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20200519214547.352050-1-a.darwish@linutronix.de> References: <20200519214547.352050-1-a.darwish@linutronix.de> MIME-Version: 1.0 X-Linutronix-Spam-Score: -1.0 X-Linutronix-Spam-Level: - X-Linutronix-Spam-Status: No , -1.0 points, 5.0 required, ALL_TRUSTED=-1,SHORTCIRCUIT=-0.0001 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org A sequence counter write side critical section must be protected by some form of locking to serialize writers. A plain seqcount_t does not contain the information of which lock must be held when entering a write side critical section. Use the new seqcount_rwlock_t data type, which allows to associate a rwlock with the sequence counter. This enables lockdep to verify that the rwlock used for writer serialization is held when the write side critical section is entered. If lockdep is disabled this lock association is compiled out and has neither storage size nor runtime overhead. Signed-off-by: Ahmed S. Darwish --- net/netfilter/nft_set_rbtree.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/net/netfilter/nft_set_rbtree.c b/net/netfilter/nft_set_rbtree.c index 3ffef454d469..f50d986d43c5 100644 --- a/net/netfilter/nft_set_rbtree.c +++ b/net/netfilter/nft_set_rbtree.c @@ -18,7 +18,7 @@ struct nft_rbtree { struct rb_root root; rwlock_t lock; - seqcount_t count; + seqcount_rwlock_t count; struct delayed_work gc_work; }; @@ -505,7 +505,7 @@ static int nft_rbtree_init(const struct nft_set *set, struct nft_rbtree *priv = nft_set_priv(set); rwlock_init(&priv->lock); - seqcount_init(&priv->count); + seqcount_rwlock_init(&priv->count, &priv->lock); priv->root = RB_ROOT; INIT_DEFERRABLE_WORK(&priv->gc_work, nft_rbtree_gc); From patchwork Tue May 19 21:45:39 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Ahmed S. Darwish" X-Patchwork-Id: 1293790 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org (client-ip=23.128.96.18; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=linutronix.de Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by ozlabs.org (Postfix) with ESMTP id 49RTyq5Ts8z9sTS for ; Wed, 20 May 2020 07:48:19 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728458AbgESVrk (ORCPT ); Tue, 19 May 2020 17:47:40 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40990 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728437AbgESVrj (ORCPT ); Tue, 19 May 2020 17:47:39 -0400 Received: from Galois.linutronix.de (Galois.linutronix.de [IPv6:2a0a:51c0:0:12e:550::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DBA42C08C5C3; Tue, 19 May 2020 14:47:38 -0700 (PDT) Received: from [5.158.153.53] (helo=debian-buster-darwi.lab.linutronix.de.) by Galois.linutronix.de with esmtpsa (TLS1.2:DHE_RSA_AES_256_CBC_SHA1:256) (Exim 4.80) (envelope-from ) id 1jbA4w-0002oZ-PC; Tue, 19 May 2020 23:47:14 +0200 From: "Ahmed S. Darwish" To: Peter Zijlstra , Ingo Molnar , Will Deacon Cc: Thomas Gleixner , "Paul E. McKenney" , "Sebastian A. Siewior" , Steven Rostedt , LKML , "Ahmed S. Darwish" , Steffen Klassert , Herbert Xu , "David S. Miller" , Jakub Kicinski , netdev@vger.kernel.org Subject: [PATCH v1 17/25] xfrm: policy: Use sequence counters with associated lock Date: Tue, 19 May 2020 23:45:39 +0200 Message-Id: <20200519214547.352050-18-a.darwish@linutronix.de> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20200519214547.352050-1-a.darwish@linutronix.de> References: <20200519214547.352050-1-a.darwish@linutronix.de> MIME-Version: 1.0 X-Linutronix-Spam-Score: -1.0 X-Linutronix-Spam-Level: - X-Linutronix-Spam-Status: No , -1.0 points, 5.0 required, ALL_TRUSTED=-1,SHORTCIRCUIT=-0.0001 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org A sequence counter write side critical section must be protected by some form of locking to serialize writers. If the serialization primitive is not disabling preemption implicitly, preemption has to be explicitly disabled before entering the sequence counter write side critical section. A plain seqcount_t does not contain the information of which lock must be held when entering a write side critical section. Use the new seqcount_spinlock_t and seqcount_mutex_t data types instead, which allow to associate a lock with the sequence counter. This enables lockdep to verify that the lock used for writer serialization is held when the write side critical section is entered. If lockdep is disabled this lock association is compiled out and has neither storage size nor runtime overhead. Signed-off-by: Ahmed S. Darwish --- net/xfrm/xfrm_policy.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c index 297b2fdb3c29..aae78a7aecd7 100644 --- a/net/xfrm/xfrm_policy.c +++ b/net/xfrm/xfrm_policy.c @@ -122,7 +122,7 @@ struct xfrm_pol_inexact_bin { /* list containing '*:*' policies */ struct hlist_head hhead; - seqcount_t count; + seqcount_spinlock_t count; /* tree sorted by daddr/prefix */ struct rb_root root_d; @@ -155,7 +155,7 @@ static struct xfrm_policy_afinfo const __rcu *xfrm_policy_afinfo[AF_INET6 + 1] __read_mostly; static struct kmem_cache *xfrm_dst_cache __ro_after_init; -static __read_mostly seqcount_t xfrm_policy_hash_generation; +static __read_mostly seqcount_mutex_t xfrm_policy_hash_generation; static struct rhashtable xfrm_policy_inexact_table; static const struct rhashtable_params xfrm_pol_inexact_params; @@ -719,7 +719,7 @@ xfrm_policy_inexact_alloc_bin(const struct xfrm_policy *pol, u8 dir) INIT_HLIST_HEAD(&bin->hhead); bin->root_d = RB_ROOT; bin->root_s = RB_ROOT; - seqcount_init(&bin->count); + seqcount_spinlock_init(&bin->count, &net->xfrm.xfrm_policy_lock); prev = rhashtable_lookup_get_insert_key(&xfrm_policy_inexact_table, &bin->k, &bin->head, @@ -1911,7 +1911,7 @@ static int xfrm_policy_match(const struct xfrm_policy *pol, static struct xfrm_pol_inexact_node * xfrm_policy_lookup_inexact_addr(const struct rb_root *r, - seqcount_t *count, + seqcount_spinlock_t *count, const xfrm_address_t *addr, u16 family) { const struct rb_node *parent; @@ -4158,7 +4158,7 @@ void __init xfrm_init(void) { register_pernet_subsys(&xfrm_net_ops); xfrm_dev_init(); - seqcount_init(&xfrm_policy_hash_generation); + seqcount_mutex_init(&xfrm_policy_hash_generation, &hash_resize_mutex); xfrm_input_init(); #ifdef CONFIG_INET_ESPINTCP