From patchwork Wed Jan 18 08:00:08 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nicholas Piggin X-Patchwork-Id: 1728047 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=lists.ozlabs.org (client-ip=2404:9400:2:0:216:3eff:fee1:b9f1; helo=lists.ozlabs.org; envelope-from=linuxppc-dev-bounces+incoming=patchwork.ozlabs.org@lists.ozlabs.org; receiver=) Authentication-Results: legolas.ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.a=rsa-sha256 header.s=20210112 header.b=qt6g0zZx; dkim-atps=neutral Received: from lists.ozlabs.org (lists.ozlabs.org [IPv6:2404:9400:2:0:216:3eff:fee1:b9f1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-384)) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4NxdXV2Z1Xz23gM for ; Wed, 18 Jan 2023 19:02:34 +1100 (AEDT) Received: from boromir.ozlabs.org (localhost [IPv6:::1]) by lists.ozlabs.org (Postfix) with ESMTP id 4NxdXT6qVbz3chJ for ; Wed, 18 Jan 2023 19:02:33 +1100 (AEDT) Authentication-Results: lists.ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.a=rsa-sha256 header.s=20210112 header.b=qt6g0zZx; dkim-atps=neutral X-Original-To: linuxppc-dev@lists.ozlabs.org Delivered-To: linuxppc-dev@lists.ozlabs.org Authentication-Results: lists.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gmail.com (client-ip=2607:f8b0:4864:20::1029; helo=mail-pj1-x1029.google.com; envelope-from=npiggin@gmail.com; receiver=) Authentication-Results: lists.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.a=rsa-sha256 header.s=20210112 header.b=qt6g0zZx; dkim-atps=neutral Received: from mail-pj1-x1029.google.com (mail-pj1-x1029.google.com [IPv6:2607:f8b0:4864:20::1029]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by lists.ozlabs.org (Postfix) with ESMTPS id 4NxdV85Ys3z3c2W for ; Wed, 18 Jan 2023 19:00:32 +1100 (AEDT) Received: by mail-pj1-x1029.google.com with SMTP id v10-20020a17090abb8a00b00229c517a6eeso1495455pjr.5 for ; Wed, 18 Jan 2023 00:00:32 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=j17jMfq2pjg82rh+LLiiAYltqlq3/q9a3KAqNyLA6fY=; b=qt6g0zZxclXPX/KufLzDrxetxOa+Y0KuO2kDBGjEirYTyEuDL2FU0kh1CW+bvKzQ8e 1PgIn8vJJDHYwOXsY+eXFKzMReN8kzIkL3cqJbHC/KcVaDPk5+4ASp5rwAzbthNV2jCy sFId0D+KFl7RHqDm5hu7vWp9CHS1bb0XbHOhZNmj1FkqrWt4HCW+AaZLa2qrD0ImjpmG OLknz9aRVzEP5mWTfavQXBkdiCAYcQpJIhtMx/MGg9B5RRIVRIGc2Rw6L2BEqeC36yiX RdYtdnH476ugfbY0kv33lImerLZIfQVPZX7TrvFbeii8oXEK4Tdej1T6R4YBH6qhLF7Z 2J+g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=j17jMfq2pjg82rh+LLiiAYltqlq3/q9a3KAqNyLA6fY=; b=7qQg5mBEJ5ZMwm3Pt2OrsOPV/vOQOpXy7e2Dyd+Jx2M51d7wCNLqRvK7wWiDc2NLg0 mc6Ktv7S4wTNV3qM9bF/M37VyZfEMfx8hXAqKWr0/K93HW8MHUH+mnne/dxZp7Gphdgl vSqM0ClEH+gwur7nJZ0SW8jxolGWMpera/9iJpGLGg7mw9nqdTAtNWSHM4n51yfLQszC LsC8W47ByQCJuPD2aejBmVi6xi9E5NVdbZ4yMZtLORpZq1nkWQYvuBpO+rp/HwYeVep7 VrK36Y109K09UEEybIHDtdFPGdqR0S0rdZOyrlTcehHdquglHSBM5Da2iDyjtmcl37qS Gy5g== X-Gm-Message-State: AFqh2krs6MHfuTzNmxnM+0bWABqChrmJWtb4k+pNL4ZG04PUhssnUgd1 CYdngsxn9aVJ7Cv8uqVFmLI= X-Google-Smtp-Source: AMrXdXtbjb5k3yKNxPZ6Ry87Y+0HPuS/QyUv3IYX7qbgXoBriB0NOKeG0RZHtZHLAEw3QY6HwQqHbg== X-Received: by 2002:a17:90b:1d04:b0:229:2b7d:ee41 with SMTP id on4-20020a17090b1d0400b002292b7dee41mr5967999pjb.45.1674028830421; Wed, 18 Jan 2023 00:00:30 -0800 (PST) Received: from bobo.ibm.com (193-116-102-45.tpgi.com.au. [193.116.102.45]) by smtp.gmail.com with ESMTPSA id y2-20020a17090a16c200b002272616d3e1sm738462pje.40.2023.01.18.00.00.26 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 18 Jan 2023 00:00:29 -0800 (PST) From: Nicholas Piggin To: Andrew Morton Subject: [PATCH v6 2/5] lazy tlb: allow lazy tlb mm refcounting to be configurable Date: Wed, 18 Jan 2023 18:00:08 +1000 Message-Id: <20230118080011.2258375-3-npiggin@gmail.com> X-Mailer: git-send-email 2.37.2 In-Reply-To: <20230118080011.2258375-1-npiggin@gmail.com> References: <20230118080011.2258375-1-npiggin@gmail.com> MIME-Version: 1.0 X-BeenThere: linuxppc-dev@lists.ozlabs.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Linux on PowerPC Developers Mail List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: linux-arch , Linus Torvalds , Nicholas Piggin , linux-mm , Andy Lutomirski , linuxppc-dev@lists.ozlabs.org Errors-To: linuxppc-dev-bounces+incoming=patchwork.ozlabs.org@lists.ozlabs.org Sender: "Linuxppc-dev" Add CONFIG_MMU_TLB_REFCOUNT which enables refcounting of the lazy tlb mm when it is context switched. This can be disabled by architectures that don't require this refcounting if they clean up lazy tlb mms when the last refcount is dropped. Currently this is always enabled, which is what existing code does, so the patch is effectively a no-op. Rename rq->prev_mm to rq->prev_lazy_mm, because that's what it is. Signed-off-by: Nicholas Piggin --- Documentation/mm/active_mm.rst | 6 ++++++ arch/Kconfig | 17 +++++++++++++++++ include/linux/sched/mm.h | 18 +++++++++++++++--- kernel/sched/core.c | 22 ++++++++++++++++++---- kernel/sched/sched.h | 4 +++- 5 files changed, 59 insertions(+), 8 deletions(-) diff --git a/Documentation/mm/active_mm.rst b/Documentation/mm/active_mm.rst index 6f8269c284ed..2b0d08332400 100644 --- a/Documentation/mm/active_mm.rst +++ b/Documentation/mm/active_mm.rst @@ -4,6 +4,12 @@ Active MM ========= +Note, the mm_count refcount may no longer include the "lazy" users +(running tasks with ->active_mm == mm && ->mm == NULL) on kernels +with CONFIG_MMU_LAZY_TLB_REFCOUNT=n. Taking and releasing these lazy +references must be done with mmgrab_lazy_tlb() and mmdrop_lazy_tlb() +helpers which abstracts this config option. + :: List: linux-kernel diff --git a/arch/Kconfig b/arch/Kconfig index 12e3ddabac9d..b07d36f08fea 100644 --- a/arch/Kconfig +++ b/arch/Kconfig @@ -465,6 +465,23 @@ config ARCH_WANT_IRQS_OFF_ACTIVATE_MM irqs disabled over activate_mm. Architectures that do IPI based TLB shootdowns should enable this. +# Use normal mm refcounting for MMU_LAZY_TLB kernel thread references. +# MMU_LAZY_TLB_REFCOUNT=n can improve the scalability of context switching +# to/from kernel threads when the same mm is running on a lot of CPUs (a large +# multi-threaded application), by reducing contention on the mm refcount. +# +# This can be disabled if the architecture ensures no CPUs are using an mm as a +# "lazy tlb" beyond its final refcount (i.e., by the time __mmdrop frees the mm +# or its kernel page tables). This could be arranged by arch_exit_mmap(), or +# final exit(2) TLB flush, for example. +# +# To implement this, an arch *must*: +# Ensure the _lazy_tlb variants of mmgrab/mmdrop are used when dropping the +# lazy reference of a kthread's ->active_mm (non-arch code has been converted +# already). +config MMU_LAZY_TLB_REFCOUNT + def_bool y + config ARCH_HAVE_NMI_SAFE_CMPXCHG bool diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h index 5376caf6fcf3..68bbe8d90c2e 100644 --- a/include/linux/sched/mm.h +++ b/include/linux/sched/mm.h @@ -82,17 +82,29 @@ static inline void mmdrop_sched(struct mm_struct *mm) /* Helpers for lazy TLB mm refcounting */ static inline void mmgrab_lazy_tlb(struct mm_struct *mm) { - mmgrab(mm); + if (IS_ENABLED(CONFIG_MMU_LAZY_TLB_REFCOUNT)) + mmgrab(mm); } static inline void mmdrop_lazy_tlb(struct mm_struct *mm) { - mmdrop(mm); + if (IS_ENABLED(CONFIG_MMU_LAZY_TLB_REFCOUNT)) { + mmdrop(mm); + } else { + /* + * mmdrop_lazy_tlb must provide a full memory barrier, see the + * membarrier comment finish_task_switch which relies on this. + */ + smp_mb(); + } } static inline void mmdrop_lazy_tlb_sched(struct mm_struct *mm) { - mmdrop_sched(mm); + if (IS_ENABLED(CONFIG_MMU_LAZY_TLB_REFCOUNT)) + mmdrop_sched(mm); + else + smp_mb(); // see above } /** diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 26aaa974ee6d..1ea14d849a0d 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -5081,7 +5081,7 @@ static struct rq *finish_task_switch(struct task_struct *prev) __releases(rq->lock) { struct rq *rq = this_rq(); - struct mm_struct *mm = rq->prev_mm; + struct mm_struct *mm = NULL; unsigned int prev_state; /* @@ -5100,7 +5100,10 @@ static struct rq *finish_task_switch(struct task_struct *prev) current->comm, current->pid, preempt_count())) preempt_count_set(FORK_PREEMPT_COUNT); - rq->prev_mm = NULL; +#ifdef CONFIG_MMU_LAZY_TLB_REFCOUNT + mm = rq->prev_lazy_mm; + rq->prev_lazy_mm = NULL; +#endif /* * A task struct has one reference for the use as "current". @@ -5231,9 +5234,20 @@ context_switch(struct rq *rq, struct task_struct *prev, lru_gen_use_mm(next->mm); if (!prev->mm) { // from kernel - /* will mmdrop_lazy_tlb() in finish_task_switch(). */ - rq->prev_mm = prev->active_mm; +#ifdef CONFIG_MMU_LAZY_TLB_REFCOUNT + /* Will mmdrop_lazy_tlb() in finish_task_switch(). */ + rq->prev_lazy_mm = prev->active_mm; prev->active_mm = NULL; +#else + /* + * Without MMU_LAZY_TLB_REFCOUNT there is no lazy + * tracking (because no rq->prev_lazy_mm) in + * finish_task_switch, so no mmdrop_lazy_tlb(), so no + * memory barrier for membarrier (see the membarrier + * comment in finish_task_switch()). Do it here. + */ + smp_mb(); +#endif } } diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index 771f8ddb7053..33da8fa8b5a5 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -1009,7 +1009,9 @@ struct rq { struct task_struct *idle; struct task_struct *stop; unsigned long next_balance; - struct mm_struct *prev_mm; +#ifdef CONFIG_MMU_LAZY_TLB_REFCOUNT + struct mm_struct *prev_lazy_mm; +#endif unsigned int clock_update_flags; u64 clock;