From patchwork Tue Jun 20 07:40:00 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Hugh Dickins X-Patchwork-Id: 1796941 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=lists.ozlabs.org (client-ip=2404:9400:2:0:216:3eff:fee1:b9f1; helo=lists.ozlabs.org; envelope-from=linuxppc-dev-bounces+incoming=patchwork.ozlabs.org@lists.ozlabs.org; receiver=) Authentication-Results: legolas.ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=google.com header.i=@google.com header.a=rsa-sha256 header.s=20221208 header.b=0LMD2/Bp; dkim-atps=neutral Received: from lists.ozlabs.org (lists.ozlabs.org [IPv6:2404:9400:2:0:216:3eff:fee1:b9f1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-384) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4QldpT379zz20XZ for ; Tue, 20 Jun 2023 17:40:33 +1000 (AEST) Authentication-Results: lists.ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=google.com header.i=@google.com header.a=rsa-sha256 header.s=20221208 header.b=0LMD2/Bp; dkim-atps=neutral Received: from boromir.ozlabs.org (localhost [IPv6:::1]) by lists.ozlabs.org (Postfix) with ESMTP id 4QldpT1v1gz30hB for ; Tue, 20 Jun 2023 17:40:33 +1000 (AEST) X-Original-To: linuxppc-dev@lists.ozlabs.org Delivered-To: linuxppc-dev@lists.ozlabs.org Authentication-Results: lists.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=google.com header.i=@google.com header.a=rsa-sha256 header.s=20221208 header.b=0LMD2/Bp; dkim-atps=neutral Authentication-Results: lists.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=google.com (client-ip=2607:f8b0:4864:20::1129; helo=mail-yw1-x1129.google.com; envelope-from=hughd@google.com; receiver=lists.ozlabs.org) Received: from mail-yw1-x1129.google.com (mail-yw1-x1129.google.com [IPv6:2607:f8b0:4864:20::1129]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by lists.ozlabs.org (Postfix) with ESMTPS id 4Qldp85srVz3038 for ; Tue, 20 Jun 2023 17:40:16 +1000 (AEST) Received: by mail-yw1-x1129.google.com with SMTP id 00721157ae682-56d304e5f83so49427027b3.2 for ; Tue, 20 Jun 2023 00:40:16 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1687246813; x=1689838813; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:from:to:cc:subject:date:message-id:reply-to; bh=2Xgtva00uaRjPQjQYWCbyYlUVerqtzlz4H3/JM5nRsA=; b=0LMD2/Bp9LkZIBYgPbN4jKWx5l3xqsV5p+9EAezDq6xR/SWmyYTEAe1FdDy1xbMmRb H8PBsbrNU4fYI4mKZfhyMwB/CU4nG5b31JuXXrIAyWt0sajI0olzTibXN+ktWNYF0oep x0ccngPNp6kQepRw/XRcKOWVrIwm+WEupexqrFO6FFWjiu1SlHy8yMCBdGb2CGeJXPQe rQ6klP2gqbiJAoxVWslA7bD8MQXYp6OVctc8Pm9x7E9rZqqXloCfaCSI/DExJxOsqSJT 4d6fjthovhopd44fEKacEstPh46MmIG/u5ZNkKED0IyMDYNloqJ4rKSwMXR8oHsnZsZm /6EA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1687246813; x=1689838813; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=2Xgtva00uaRjPQjQYWCbyYlUVerqtzlz4H3/JM5nRsA=; b=AuBdgGwNvB5f7eSH86823eu9wtHBWNbdzBzxvij0zfhOozFLDWUFcAta711sLtWXBS TpeQoyiATxMc4JIfGgktzEuPT1iWZr294shNdyROWJ3JPMHqx8HGprulRbNFVkHJYidg yZSDWLwPi/I2ScB0iCDeW4QX5+wjZYwfhvRV4t0U4G4O2JBq68M1aCagMGPA4NuUT9LG blCTfFygHW9DG/h0nuw2YRWWQWqIjoXxkDqmgvXQNOlu8zesHIf3BlgUczarUBdTJbIv fkeDugVnwITJbvDpjla4K6hMb7URdYUgX6URJfrUHIcQVEorY08DNoGomFEflaS97X1B KOJg== X-Gm-Message-State: AC+VfDxcA/wt1hnBbfQChugND7G4Zt6KYkxREHFqIyapJ6sq2Sx0PBAl pLbA7a9Uv7UhNowC40DFCQnztA== X-Google-Smtp-Source: ACHHUZ75OBuUzMptMwBPGDkg2ahPnjw/MJzrATJW6OVlCqaB4pyoJIMCLPp7Z4SL32qkhEKJol5B6w== X-Received: by 2002:a81:75d6:0:b0:56d:b98:cc16 with SMTP id q205-20020a8175d6000000b0056d0b98cc16mr13330173ywc.45.1687246813120; Tue, 20 Jun 2023 00:40:13 -0700 (PDT) Received: from ripple.attlocal.net (172-10-233-147.lightspeed.sntcca.sbcglobal.net. [172.10.233.147]) by smtp.gmail.com with ESMTPSA id x8-20020a817c08000000b005623ae13106sm368166ywc.100.2023.06.20.00.40.09 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 20 Jun 2023 00:40:12 -0700 (PDT) Date: Tue, 20 Jun 2023 00:40:00 -0700 (PDT) From: Hugh Dickins X-X-Sender: hugh@ripple.attlocal.net To: Andrew Morton Subject: [PATCH v2 01/12] mm/pgtable: add rcu_read_lock() and rcu_read_unlock()s In-Reply-To: <54cb04f-3762-987f-8294-91dafd8ebfb0@google.com> Message-ID: <53514a65-9053-1e8a-c76a-c158f8965@google.com> References: <54cb04f-3762-987f-8294-91dafd8ebfb0@google.com> MIME-Version: 1.0 X-BeenThere: linuxppc-dev@lists.ozlabs.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Linux on PowerPC Developers Mail List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Miaohe Lin , David Hildenbrand , Peter Zijlstra , Yang Shi , Peter Xu , linux-kernel@vger.kernel.org, Song Liu , sparclinux@vger.kernel.org, Alexander Gordeev , Claudio Imbrenda , Will Deacon , linux-s390@vger.kernel.org, Yu Zhao , Ira Weiny , Alistair Popple , Russell King , Matthew Wilcox , Steven Price , Christoph Hellwig , Jason Gunthorpe , "Aneesh Kumar K.V" , Huang Ying , Axel Rasmussen , Gerald Shaefer , Christian Borntraeger , Thomas Hellstrom , Ralph Campbell , Pasha Tatashin , Vasily Gorbik , Anshuman Khandual , Heiko Carstens , Qi Zheng , Suren Baghdasaryan , Vlastimil Babka , linux-arm-kernel@lists.infradead.org, SeongJae Park , Lorenzo Stoakes , Jann Horn , linux-mm@kvack.org, linuxppc-dev@lists.ozlabs.org, Naoya Horiguchi , Zack Rusin , Vishal Moola , Minchan Kim , "Kirill A. Shutemov" , Mel Gorman , "David S. Miller" , Mike Rapoport , Mike Kravetz Errors-To: linuxppc-dev-bounces+incoming=patchwork.ozlabs.org@lists.ozlabs.org Sender: "Linuxppc-dev" Before putting them to use (several commits later), add rcu_read_lock() to pte_offset_map(), and rcu_read_unlock() to pte_unmap(). Make this a separate commit, since it risks exposing imbalances: prior commits have fixed all the known imbalances, but we may find some have been missed. Signed-off-by: Hugh Dickins --- include/linux/pgtable.h | 4 ++-- mm/pgtable-generic.c | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h index a1326e61d7ee..8b0fc7fdc46f 100644 --- a/include/linux/pgtable.h +++ b/include/linux/pgtable.h @@ -99,7 +99,7 @@ static inline pte_t *pte_offset_kernel(pmd_t *pmd, unsigned long address) ((pte_t *)kmap_local_page(pmd_page(*(pmd))) + pte_index((address))) #define pte_unmap(pte) do { \ kunmap_local((pte)); \ - /* rcu_read_unlock() to be added later */ \ + rcu_read_unlock(); \ } while (0) #else static inline pte_t *__pte_map(pmd_t *pmd, unsigned long address) @@ -108,7 +108,7 @@ static inline pte_t *__pte_map(pmd_t *pmd, unsigned long address) } static inline void pte_unmap(pte_t *pte) { - /* rcu_read_unlock() to be added later */ + rcu_read_unlock(); } #endif diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c index c7ab18a5fb77..674671835631 100644 --- a/mm/pgtable-generic.c +++ b/mm/pgtable-generic.c @@ -236,7 +236,7 @@ pte_t *__pte_offset_map(pmd_t *pmd, unsigned long addr, pmd_t *pmdvalp) { pmd_t pmdval; - /* rcu_read_lock() to be added later */ + rcu_read_lock(); pmdval = pmdp_get_lockless(pmd); if (pmdvalp) *pmdvalp = pmdval; @@ -250,7 +250,7 @@ pte_t *__pte_offset_map(pmd_t *pmd, unsigned long addr, pmd_t *pmdvalp) } return __pte_map(&pmdval, addr); nomap: - /* rcu_read_unlock() to be added later */ + rcu_read_unlock(); return NULL; } From patchwork Tue Jun 20 07:42:13 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Hugh Dickins X-Patchwork-Id: 1796943 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=lists.ozlabs.org (client-ip=2404:9400:2:0:216:3eff:fee1:b9f1; helo=lists.ozlabs.org; envelope-from=linuxppc-dev-bounces+incoming=patchwork.ozlabs.org@lists.ozlabs.org; receiver=) Authentication-Results: legolas.ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=google.com header.i=@google.com header.a=rsa-sha256 header.s=20221208 header.b=Rk9aZz3l; dkim-atps=neutral Received: from lists.ozlabs.org (lists.ozlabs.org [IPv6:2404:9400:2:0:216:3eff:fee1:b9f1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-384)) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4Qldry0q8Qz20XZ for ; Tue, 20 Jun 2023 17:42:42 +1000 (AEST) Authentication-Results: lists.ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=google.com header.i=@google.com header.a=rsa-sha256 header.s=20221208 header.b=Rk9aZz3l; dkim-atps=neutral Received: from boromir.ozlabs.org (localhost [IPv6:::1]) by lists.ozlabs.org (Postfix) with ESMTP id 4Qldrx6ZkFz30gj for ; Tue, 20 Jun 2023 17:42:41 +1000 (AEST) X-Original-To: linuxppc-dev@lists.ozlabs.org Delivered-To: linuxppc-dev@lists.ozlabs.org Authentication-Results: lists.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=google.com header.i=@google.com header.a=rsa-sha256 header.s=20221208 header.b=Rk9aZz3l; dkim-atps=neutral Authentication-Results: lists.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=google.com (client-ip=2607:f8b0:4864:20::b2a; helo=mail-yb1-xb2a.google.com; envelope-from=hughd@google.com; receiver=lists.ozlabs.org) Received: from mail-yb1-xb2a.google.com (mail-yb1-xb2a.google.com [IPv6:2607:f8b0:4864:20::b2a]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by lists.ozlabs.org (Postfix) with ESMTPS id 4Qldrb3VLxz2yyc for ; Tue, 20 Jun 2023 17:42:22 +1000 (AEST) Received: by mail-yb1-xb2a.google.com with SMTP id 3f1490d57ef6-bc476bf5239so4498011276.2 for ; Tue, 20 Jun 2023 00:42:22 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1687246939; x=1689838939; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:from:to:cc:subject:date:message-id:reply-to; bh=4mQCeX//121ajNwvCY66Rysl9c9mZt4CnPkKAdBrlyo=; b=Rk9aZz3lCgWM/wzoeIdA8TQQHo7EP0qUx4FBk6w6vBIv3V9EGmhpCtWeOKkVLalH/B 3/IhZXP8pzSAUtUwv1YZ0rZFRUBVXIfjAUyirXzYoBsOop5Z2sZPS+eTpvJIhr7MrLYm LUjzgRD1GyQ/XuvAm8PLozNnLUaUsJj8Srr/G3h67lgjS83i0yMgd02EIVoO5rsRx9kI daqlfGCCUSLKQt7wMh3o0YW7GwW3VGqW8slprzgeDxp+5B0Z6u1+4uuHf8AelDneMwjV DKf7ckeYtLcfEfqu2RjXHw4XHJyjxhdMznrF0GKkr5IkH8s4Le9upNEGz9sVfSI4hq5l 39mg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1687246939; x=1689838939; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=4mQCeX//121ajNwvCY66Rysl9c9mZt4CnPkKAdBrlyo=; b=Or7bhc67CHDylNvIGiRhzy/3Kw8nlTm2YtMPBiqxvVfYsVxHWGRDFa6PQvLlHhydIX Gcn/uXhFAmllPKZzW/7Ftcru9PmuvSTzdkoK/UOky1/Xcfj3pEeT38g9y5/QHjGVadpI l40jmpb2qbX2a1A2q8ajz6KmKVekpIP61a03PCrMbo/J5LXls5Q6tJmekHj3IJ9ITcKg 4qcpv+exTY8mgVCkrtJ1/FZZr5Wk+Q4n2mlZv9Saua1A9IIvMXCZ7q8xLfpOhstj0w4r whAqisSFCGHSHvlNog48Cxd/LWdGTP6HDxdatoY7ofUGQo84x+2UW6CKZWcR2KDaXJEu ZgPw== X-Gm-Message-State: AC+VfDw/sAYeMe991QhQiLQ0Csca9MHwZf0hvyHsZkabHOv6raoIy5vn PB/IOTWAXwEJ5wuQMr1y8XPEKw== X-Google-Smtp-Source: ACHHUZ5rgE5ZeLfP+qyjhwCRCkWSPHoZuMre6lfq8YixODGIptTSfSS2sgODZVUUkcQ4STmKqbbGHw== X-Received: by 2002:a25:4dc2:0:b0:bac:4dd6:d0d1 with SMTP id a185-20020a254dc2000000b00bac4dd6d0d1mr7498427ybb.19.1687246938747; Tue, 20 Jun 2023 00:42:18 -0700 (PDT) Received: from ripple.attlocal.net (172-10-233-147.lightspeed.sntcca.sbcglobal.net. [172.10.233.147]) by smtp.gmail.com with ESMTPSA id j200-20020a2523d1000000b00be6cf9d2544sm254636ybj.40.2023.06.20.00.42.14 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 20 Jun 2023 00:42:18 -0700 (PDT) Date: Tue, 20 Jun 2023 00:42:13 -0700 (PDT) From: Hugh Dickins X-X-Sender: hugh@ripple.attlocal.net To: Andrew Morton Subject: [PATCH v2 02/12] mm/pgtable: add PAE safety to __pte_offset_map() In-Reply-To: <54cb04f-3762-987f-8294-91dafd8ebfb0@google.com> Message-ID: References: <54cb04f-3762-987f-8294-91dafd8ebfb0@google.com> MIME-Version: 1.0 X-BeenThere: linuxppc-dev@lists.ozlabs.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Linux on PowerPC Developers Mail List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Miaohe Lin , David Hildenbrand , Peter Zijlstra , Yang Shi , Peter Xu , linux-kernel@vger.kernel.org, Song Liu , sparclinux@vger.kernel.org, Alexander Gordeev , Claudio Imbrenda , Will Deacon , linux-s390@vger.kernel.org, Yu Zhao , Ira Weiny , Alistair Popple , Russell King , Matthew Wilcox , Steven Price , Christoph Hellwig , Jason Gunthorpe , "Aneesh Kumar K.V" , Huang Ying , Axel Rasmussen , Gerald Schaefer , Christian Borntraeger , Thomas Hellstrom , Ralph Campbell , Pasha Tatashin , Vasily Gorbik , Anshuman Khandual , Heiko Carstens , Qi Zheng , Suren Baghdasaryan , Vlastimil Babka , linux-arm-kernel@lists.infradead.org, SeongJae Park , Lorenzo Stoakes , Jann Horn , linux-mm@kvack.org, linuxppc-dev@lists.ozlabs.org, Naoya Horiguchi , Zack Rusin , Vishal Moola , Minchan Kim , "Kirill A. Shutemov" , Mel Gorman , "David Sc. Miller" , Mike Rapoport , Mike Kravetz Errors-To: linuxppc-dev-bounces+incoming=patchwork.ozlabs.org@lists.ozlabs.org Sender: "Linuxppc-dev" There is a faint risk that __pte_offset_map(), on a 32-bit architecture with a 64-bit pmd_t e.g. x86-32 with CONFIG_X86_PAE=y, would succeed on a pmdval assembled from a pmd_low and a pmd_high which never belonged together: their combination not pointing to a page table at all, perhaps not even a valid pfn. pmdp_get_lockless() is not enough to prevent that. Guard against that (on such configs) by local_irq_save() blocking TLB flush between present updates, as linux/pgtable.h suggests. It's only needed around the pmdp_get_lockless() in __pte_offset_map(): a race when __pte_offset_map_lock() repeats the pmdp_get_lockless() after getting the lock, would just send it back to __pte_offset_map() again. Complement this pmdp_get_lockless_start() and pmdp_get_lockless_end(), used only locally in __pte_offset_map(), with a pmdp_get_lockless_sync() synonym for tlb_remove_table_sync_one(): to send the necessary interrupt at the right moment on those configs which do not already send it. CONFIG_GUP_GET_PXX_LOW_HIGH is enabled when required by mips, sh and x86. It is not enabled by arm-32 CONFIG_ARM_LPAE: my understanding is that Will Deacon's 2020 enhancements to READ_ONCE() are sufficient for arm. It is not enabled by arc, but its pmd_t is 32-bit even when pte_t 64-bit. Limit the IRQ disablement to CONFIG_HIGHPTE? Perhaps, but would need a little more work, to retry if pmd_low good for page table, but pmd_high non-zero from THP (and that might be making x86-specific assumptions). Signed-off-by: Hugh Dickins --- include/linux/pgtable.h | 4 ++++ mm/pgtable-generic.c | 29 +++++++++++++++++++++++++++++ 2 files changed, 33 insertions(+) diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h index 8b0fc7fdc46f..525f1782b466 100644 --- a/include/linux/pgtable.h +++ b/include/linux/pgtable.h @@ -390,6 +390,7 @@ static inline pmd_t pmdp_get_lockless(pmd_t *pmdp) return pmd; } #define pmdp_get_lockless pmdp_get_lockless +#define pmdp_get_lockless_sync() tlb_remove_table_sync_one() #endif /* CONFIG_PGTABLE_LEVELS > 2 */ #endif /* CONFIG_GUP_GET_PXX_LOW_HIGH */ @@ -408,6 +409,9 @@ static inline pmd_t pmdp_get_lockless(pmd_t *pmdp) { return pmdp_get(pmdp); } +static inline void pmdp_get_lockless_sync(void) +{ +} #endif #ifdef CONFIG_TRANSPARENT_HUGEPAGE diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c index 674671835631..5e85a625ab30 100644 --- a/mm/pgtable-generic.c +++ b/mm/pgtable-generic.c @@ -232,12 +232,41 @@ pmd_t pmdp_collapse_flush(struct vm_area_struct *vma, unsigned long address, #endif #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ +#if defined(CONFIG_GUP_GET_PXX_LOW_HIGH) && \ + (defined(CONFIG_SMP) || defined(CONFIG_PREEMPT_RCU)) +/* + * See the comment above ptep_get_lockless() in include/linux/pgtable.h: + * the barriers in pmdp_get_lockless() cannot guarantee that the value in + * pmd_high actually belongs with the value in pmd_low; but holding interrupts + * off blocks the TLB flush between present updates, which guarantees that a + * successful __pte_offset_map() points to a page from matched halves. + */ +static unsigned long pmdp_get_lockless_start(void) +{ + unsigned long irqflags; + + local_irq_save(irqflags); + return irqflags; +} +static void pmdp_get_lockless_end(unsigned long irqflags) +{ + local_irq_restore(irqflags); +} +#else +static unsigned long pmdp_get_lockless_start(void) { return 0; } +static void pmdp_get_lockless_end(unsigned long irqflags) { } +#endif + pte_t *__pte_offset_map(pmd_t *pmd, unsigned long addr, pmd_t *pmdvalp) { + unsigned long irqflags; pmd_t pmdval; rcu_read_lock(); + irqflags = pmdp_get_lockless_start(); pmdval = pmdp_get_lockless(pmd); + pmdp_get_lockless_end(irqflags); + if (pmdvalp) *pmdvalp = pmdval; if (unlikely(pmd_none(pmdval) || is_pmd_migration_entry(pmdval))) From patchwork Tue Jun 20 07:43:50 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Hugh Dickins X-Patchwork-Id: 1796945 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=lists.ozlabs.org (client-ip=2404:9400:2:0:216:3eff:fee1:b9f1; helo=lists.ozlabs.org; envelope-from=linuxppc-dev-bounces+incoming=patchwork.ozlabs.org@lists.ozlabs.org; receiver=) Authentication-Results: legolas.ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=google.com header.i=@google.com header.a=rsa-sha256 header.s=20221208 header.b=qhwhngGU; dkim-atps=neutral Received: from lists.ozlabs.org (lists.ozlabs.org [IPv6:2404:9400:2:0:216:3eff:fee1:b9f1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-384)) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4Qldvq2H3tz20XZ for ; Tue, 20 Jun 2023 17:45:11 +1000 (AEST) Authentication-Results: lists.ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=google.com header.i=@google.com header.a=rsa-sha256 header.s=20221208 header.b=qhwhngGU; dkim-atps=neutral Received: from boromir.ozlabs.org (localhost [IPv6:::1]) by lists.ozlabs.org (Postfix) with ESMTP id 4Qldvq0FtRz3bY9 for ; Tue, 20 Jun 2023 17:45:11 +1000 (AEST) X-Original-To: linuxppc-dev@lists.ozlabs.org Delivered-To: linuxppc-dev@lists.ozlabs.org Authentication-Results: lists.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=google.com header.i=@google.com header.a=rsa-sha256 header.s=20221208 header.b=qhwhngGU; dkim-atps=neutral Authentication-Results: lists.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=google.com (client-ip=2607:f8b0:4864:20::1133; helo=mail-yw1-x1133.google.com; envelope-from=hughd@google.com; receiver=lists.ozlabs.org) Received: from mail-yw1-x1133.google.com (mail-yw1-x1133.google.com [IPv6:2607:f8b0:4864:20::1133]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by lists.ozlabs.org (Postfix) with ESMTPS id 4QldtQ6WLlz30f7 for ; Tue, 20 Jun 2023 17:43:58 +1000 (AEST) Received: by mail-yw1-x1133.google.com with SMTP id 00721157ae682-570114e1feaso51317047b3.3 for ; Tue, 20 Jun 2023 00:43:58 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1687247035; x=1689839035; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:from:to:cc:subject:date:message-id:reply-to; bh=QJ3N9RNUACQk0rp/xTQEOUK2bEy9LceuxJNDbuu7Hdo=; b=qhwhngGUpgrcZDlnvH+b7VWG9WLTHUEVswW9xK/3YCFZXun6Sku2aj+qmeuJtlpDm9 NL1p7tRRaSaVuFlEgJZSd1u9Osk9e4zYWPVYSmjFqeH51Two+LdeRyVK6tn3qt9RQ2/C FXROYrneK++r2OPYxjNpEmp15a3hsycB8EeXDTwBWnGuwYFgeqd19dhMRsM+70lw7kZD UxM84oMlRLUMCqnzuOHx9Iap0algo8Gh2vtX9lFjHG25G0LFiMapQLFR33NHeN7LFjU7 xqRJD57e/d0/xxEOFNXoz5t1uVw5lEYV2V7WrZMTJMmAFZ9r11+4uO5ZNp31jPHEHE3P 8GXg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1687247035; x=1689839035; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=QJ3N9RNUACQk0rp/xTQEOUK2bEy9LceuxJNDbuu7Hdo=; b=KsoEeyqcqIHJM2bo9S6lFxtenG1WWjZGxnd/VZuJlu9e98h6VQxXivbSwyjkTHV2nQ e1SJkrBEWAc6NdlnlXxw6wKuBXcEYey23w4i+QqQTejFX4XlIvcCPMWv43j68aA2QzxD I/YjSgkzESMsjvpMSUGNorw9SSCQ4Ldq0fGAec/mLfBRxhtfZDNeYjoDRrnlCTs0wzHF 5nd5hmORqfFo2l65vRitSuVKTJmZqCT1bZ5eS+ZvdQZoVPjly7dhPBdFNFRaNRdJ8AtY dbEIRjqMh5rhnMkwhucd0j7hluWzU+GZatPTji46q3poCx8OdZaxOe2BQAAnnQ/FkIEG rUVg== X-Gm-Message-State: AC+VfDw38v80AutfVH0RaKwfEfnO2kJMZKBCA8vM6pvhYdaPvkyWNlkg bEJ0/e/lrvLJ1jdxAs/JsTA3tA== X-Google-Smtp-Source: ACHHUZ7JD/5gT47oLcluDex4538IBoUvSmCiuwTDk3dXuw6uEp5PSF1HYd+jz7L8cEHWLsd+Ww/fjQ== X-Received: by 2002:a0d:cf82:0:b0:56d:2b1e:3d88 with SMTP id r124-20020a0dcf82000000b0056d2b1e3d88mr11408597ywd.24.1687247035289; Tue, 20 Jun 2023 00:43:55 -0700 (PDT) Received: from ripple.attlocal.net (172-10-233-147.lightspeed.sntcca.sbcglobal.net. [172.10.233.147]) by smtp.gmail.com with ESMTPSA id y188-20020a8188c5000000b00545a08184f8sm365085ywf.136.2023.06.20.00.43.51 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 20 Jun 2023 00:43:54 -0700 (PDT) Date: Tue, 20 Jun 2023 00:43:50 -0700 (PDT) From: Hugh Dickins X-X-Sender: hugh@ripple.attlocal.net To: Andrew Morton Subject: [PATCH v2 03/12] arm: adjust_pte() use pte_offset_map_nolock() In-Reply-To: <54cb04f-3762-987f-8294-91dafd8ebfb0@google.com> Message-ID: References: <54cb04f-3762-987f-8294-91dafd8ebfb0@google.com> MIME-Version: 1.0 X-BeenThere: linuxppc-dev@lists.ozlabs.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Linux on PowerPC Developers Mail List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Miaohe Lin , David Hildenbrand , Peter Zijlstra , Yang Shi , Peter Xu , linux-kernel@vger.kernel.org, Song Liu , sparclinux@vger.kernel.org, Alexander Gordeev , Claudio Imbrenda , Will Deacon , linux-s390@vger.kernel.org, Yu Zhao , Ira Weiny , Alistair Popple , Russell King , Matthew Wilcox , Steven Price , Christoph Hellwig , Jason Gunthorpe , "Aneesh Kumar K.V" , Huang Ying , Axel Rasmussen , Gerald Schaefer , Christian Borntraeger , Thomas Hellstrom , Ralph Campbell , Pasha Tatashin , Vasily Gorbik , Anshuman Khandual , Heiko Carstens , Qi Zheng , Suren Baghdasaryan , Vlastimil Babka , linux-arm-kernel@lists.infradead.org, SeongJae Park , Lorenzo Stoakes , Jann Horn , linux-mm@kvack.org, linuxppc-dev@lists.ozlabs.org, Naoya Horiguchi , Zack Rusin , Vishal Moola , Minchan Kim , "Kirill A. Shutemov" , Mel Gorman , "David S. Miller" , Mike Rapoport , Mike Kravetz Errors-To: linuxppc-dev-bounces+incoming=patchwork.ozlabs.org@lists.ozlabs.org Sender: "Linuxppc-dev" Instead of pte_lockptr(), use the recently added pte_offset_map_nolock() in adjust_pte(): because it gives the not-locked ptl for precisely that pte, which the caller can then safely lock; whereas pte_lockptr() is not so tightly coupled, because it dereferences the pmd pointer again. Signed-off-by: Hugh Dickins --- arch/arm/mm/fault-armv.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/arch/arm/mm/fault-armv.c b/arch/arm/mm/fault-armv.c index ca5302b0b7ee..7cb125497976 100644 --- a/arch/arm/mm/fault-armv.c +++ b/arch/arm/mm/fault-armv.c @@ -117,11 +117,10 @@ static int adjust_pte(struct vm_area_struct *vma, unsigned long address, * must use the nested version. This also means we need to * open-code the spin-locking. */ - pte = pte_offset_map(pmd, address); + pte = pte_offset_map_nolock(vma->vm_mm, pmd, address, &ptl); if (!pte) return 0; - ptl = pte_lockptr(vma->vm_mm, pmd); do_pte_lock(ptl); ret = do_adjust_pte(vma, address, pfn, pte); From patchwork Tue Jun 20 07:45:26 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Hugh Dickins X-Patchwork-Id: 1796947 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=lists.ozlabs.org (client-ip=2404:9400:2:0:216:3eff:fee1:b9f1; helo=lists.ozlabs.org; envelope-from=linuxppc-dev-bounces+incoming=patchwork.ozlabs.org@lists.ozlabs.org; receiver=) Authentication-Results: legolas.ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=google.com header.i=@google.com header.a=rsa-sha256 header.s=20221208 header.b=jZI7lGkG; dkim-atps=neutral Received: from lists.ozlabs.org (lists.ozlabs.org [IPv6:2404:9400:2:0:216:3eff:fee1:b9f1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-384)) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4Qldwp3y4Jz20XS for ; Tue, 20 Jun 2023 17:46:02 +1000 (AEST) Authentication-Results: lists.ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=google.com header.i=@google.com header.a=rsa-sha256 header.s=20221208 header.b=jZI7lGkG; dkim-atps=neutral Received: from boromir.ozlabs.org (localhost [IPv6:::1]) by lists.ozlabs.org (Postfix) with ESMTP id 4Qldwp1FCHz3c2J for ; Tue, 20 Jun 2023 17:46:02 +1000 (AEST) X-Original-To: linuxppc-dev@lists.ozlabs.org Delivered-To: linuxppc-dev@lists.ozlabs.org Authentication-Results: lists.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=google.com header.i=@google.com header.a=rsa-sha256 header.s=20221208 header.b=jZI7lGkG; dkim-atps=neutral Authentication-Results: lists.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=google.com (client-ip=2607:f8b0:4864:20::112a; helo=mail-yw1-x112a.google.com; envelope-from=hughd@google.com; receiver=lists.ozlabs.org) Received: from mail-yw1-x112a.google.com (mail-yw1-x112a.google.com [IPv6:2607:f8b0:4864:20::112a]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by lists.ozlabs.org (Postfix) with ESMTPS id 4QldwH2hYRz3bW1 for ; Tue, 20 Jun 2023 17:45:35 +1000 (AEST) Received: by mail-yw1-x112a.google.com with SMTP id 00721157ae682-5703cb4bcb4so39361697b3.3 for ; Tue, 20 Jun 2023 00:45:35 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1687247132; x=1689839132; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:from:to:cc:subject:date:message-id:reply-to; bh=7tbEFbFk5pzLjwVra+37SCseEW8dJexk2Np2YDtZH+c=; b=jZI7lGkGOA9VplRFSG9/39QEHLqEHFn81AVfRltHoq/8iNx7aAWlNrNyFe+XyF0lml Bnzro2qd7JBTjoGR7k3DplV7l1biYn3/LoxKNyegFiBFIRoYqaL2y+rd8Ux0pQmhTgvs IUPCYgX4z0WdKLSspIafzLUn3c05nMTizwc56jirtYHMfRSc/oqFfK7oTa/fwkNDoThq 2HdR/Am0ViDDg3zBY7MZWP/MVjmD1Yx9xMfwqoDnIfAkP9LmxM3rFBDQvW9AFIfnglmj paE2iSuMCCu/tpYLxht5C/v0kudDdPO6A0RSV0ftjBtK/MFrCHoTJHxirOjy2uYrAODS XFsQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1687247132; x=1689839132; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=7tbEFbFk5pzLjwVra+37SCseEW8dJexk2Np2YDtZH+c=; b=cNAjs17KN5KGfgXrCCE6UymmJM3ip6PW61u3C7KLexRCexndb2ezriFug2VvsIQkwz JvAF3VBv3sEIMx4GCJpsT49UBTOgCT6BA3NeWD8pi8aChVbImjvg4sA2tnmc64bw/xKp pyt3e+TPlqS9/QHg40X3NuGDJ88lJag2KY59S/Q5nAxysaMgX9xTRgFDWCHHc/B7Llx5 NBONoxGhQui7Oflo6Pz3iOUbsrqpoKZ8QOYa4mlZnKPndjL3wQHv4Xbdh+dZ51o079Xn rIPhkm+D1Pd4q59v6mHX0QDPMz2lkC/GRLiLr4uXOC14gv4/K6lg4qzNpJl6rxza06jU Ry5Q== X-Gm-Message-State: AC+VfDyiNDJAvAl6t6cIVHgwozL71DHrwLxEuVfLRBI2x9eyD2ioOzEg r12PhXs1Wu9OOyF6jK1bWG1tEg== X-Google-Smtp-Source: ACHHUZ6lVvN31sBriyHtQUo4iIlWjUc8vgsx+qnIrz5Kb8Jhe8UJO9QGV+cM44ErYZUti1LdEdlZpw== X-Received: by 2002:a81:7189:0:b0:54f:9cd0:990 with SMTP id m131-20020a817189000000b0054f9cd00990mr2846044ywc.18.1687247131680; Tue, 20 Jun 2023 00:45:31 -0700 (PDT) Received: from ripple.attlocal.net (172-10-233-147.lightspeed.sntcca.sbcglobal.net. [172.10.233.147]) by smtp.gmail.com with ESMTPSA id o9-20020a0dcc09000000b0056d2fce4e09sm379759ywd.42.2023.06.20.00.45.27 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 20 Jun 2023 00:45:31 -0700 (PDT) Date: Tue, 20 Jun 2023 00:45:26 -0700 (PDT) From: Hugh Dickins X-X-Sender: hugh@ripple.attlocal.net To: Andrew Morton Subject: [PATCH v2 04/12] powerpc: assert_pte_locked() use pte_offset_map_nolock() In-Reply-To: <54cb04f-3762-987f-8294-91dafd8ebfb0@google.com> Message-ID: <7ae6836b-b612-23f1-63e0-babda6e96e2c@google.com> References: <54cb04f-3762-987f-8294-91dafd8ebfb0@google.com> MIME-Version: 1.0 X-BeenThere: linuxppc-dev@lists.ozlabs.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Linux on PowerPC Developers Mail List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Miaohe Lin , David Hildenbrand , Peter Zijlstra , Yang Shi , Peter Xu , linux-kernel@vger.kernel.org, Song Liu , sparclinux@vger.kernel.org, Alexander Gordeev , Claudio Imbrenda , Will Deacon , linux-s390@vger.kernel.org, Yu Zhao , Ira Weiny , Alistair Popple , Russell King , Matthew Wilcox , Steven Price , Christoph Hellwig , Jason Gunthorpe , "Aneesh Kumar K.V" , Huang Ying , Axel Rasmussen , Gerald Schaefer , Christian Borntraeger , Thomas Hellstrom , Ralph Campbell , Pasha Tatashin , Vasily Gorbik , Anshuman Khandual , Heiko Carstens , Qi Zheng , Suren Baghdasaryan , Vlastimil Babka , linux-arm-kernel@lists.infradead.org, SeongJae Park , Lorenzo Stoakes , Jann Horn , linux-mm@kvack.org, linuxppc-dev@lists.ozlabs.org, Naoya Horiguchi , Zack Rusin , Vishal Moola , Minchan Kim , "Kirill A. Shutemov" , Mel Gorman , "David S. Miller" , Mike Rapoport , Mike Kravetz Errors-To: linuxppc-dev-bounces+incoming=patchwork.ozlabs.org@lists.ozlabs.org Sender: "Linuxppc-dev" Instead of pte_lockptr(), use the recently added pte_offset_map_nolock() in assert_pte_locked(). BUG if pte_offset_map_nolock() fails: this is stricter than the previous implementation, which skipped when pmd_none() (with a comment on khugepaged collapse transitions): but wouldn't we want to know, if an assert_pte_locked() caller can be racing such transitions? This mod might cause new crashes: which either expose my ignorance, or indicate issues to be fixed, or limit the usage of assert_pte_locked(). Signed-off-by: Hugh Dickins --- arch/powerpc/mm/pgtable.c | 16 ++++++---------- 1 file changed, 6 insertions(+), 10 deletions(-) diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c index cb2dcdb18f8e..16b061af86d7 100644 --- a/arch/powerpc/mm/pgtable.c +++ b/arch/powerpc/mm/pgtable.c @@ -311,6 +311,8 @@ void assert_pte_locked(struct mm_struct *mm, unsigned long addr) p4d_t *p4d; pud_t *pud; pmd_t *pmd; + pte_t *pte; + spinlock_t *ptl; if (mm == &init_mm) return; @@ -321,16 +323,10 @@ void assert_pte_locked(struct mm_struct *mm, unsigned long addr) pud = pud_offset(p4d, addr); BUG_ON(pud_none(*pud)); pmd = pmd_offset(pud, addr); - /* - * khugepaged to collapse normal pages to hugepage, first set - * pmd to none to force page fault/gup to take mmap_lock. After - * pmd is set to none, we do a pte_clear which does this assertion - * so if we find pmd none, return. - */ - if (pmd_none(*pmd)) - return; - BUG_ON(!pmd_present(*pmd)); - assert_spin_locked(pte_lockptr(mm, pmd)); + pte = pte_offset_map_nolock(mm, pmd, addr, &ptl); + BUG_ON(!pte); + assert_spin_locked(ptl); + pte_unmap(pte); } #endif /* CONFIG_DEBUG_VM */ From patchwork Tue Jun 20 07:47:54 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Hugh Dickins X-Patchwork-Id: 1796955 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=lists.ozlabs.org (client-ip=2404:9400:2:0:216:3eff:fee1:b9f1; helo=lists.ozlabs.org; envelope-from=linuxppc-dev-bounces+incoming=patchwork.ozlabs.org@lists.ozlabs.org; receiver=) Authentication-Results: legolas.ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=google.com header.i=@google.com header.a=rsa-sha256 header.s=20221208 header.b=FTDl5iE6; dkim-atps=neutral Received: from lists.ozlabs.org (lists.ozlabs.org [IPv6:2404:9400:2:0:216:3eff:fee1:b9f1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-384)) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4QldzW20zGz20XS for ; Tue, 20 Jun 2023 17:48:23 +1000 (AEST) Authentication-Results: lists.ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=google.com header.i=@google.com header.a=rsa-sha256 header.s=20221208 header.b=FTDl5iE6; dkim-atps=neutral Received: from boromir.ozlabs.org (localhost [IPv6:::1]) by lists.ozlabs.org (Postfix) with ESMTP id 4QldzV4R5vz30h7 for ; Tue, 20 Jun 2023 17:48:22 +1000 (AEST) X-Original-To: linuxppc-dev@lists.ozlabs.org Delivered-To: linuxppc-dev@lists.ozlabs.org Authentication-Results: lists.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=google.com header.i=@google.com header.a=rsa-sha256 header.s=20221208 header.b=FTDl5iE6; dkim-atps=neutral Authentication-Results: lists.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=google.com (client-ip=2607:f8b0:4864:20::b2f; helo=mail-yb1-xb2f.google.com; envelope-from=hughd@google.com; receiver=lists.ozlabs.org) Received: from mail-yb1-xb2f.google.com (mail-yb1-xb2f.google.com [IPv6:2607:f8b0:4864:20::b2f]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by lists.ozlabs.org (Postfix) with ESMTPS id 4Qldz75KbGz2yxt for ; Tue, 20 Jun 2023 17:48:02 +1000 (AEST) Received: by mail-yb1-xb2f.google.com with SMTP id 3f1490d57ef6-be30cbe88b3so4509743276.1 for ; Tue, 20 Jun 2023 00:48:02 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1687247279; x=1689839279; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:from:to:cc:subject:date:message-id:reply-to; bh=VQ3G/mOw1YDakj6K/QDY3ISJ4AmLryMvPDR0lMuqBuw=; b=FTDl5iE634tOzkrqSCVEu9Nf5D7hLqctOZFWgKMkQ3RqUmNsUi/5jC5UBELVK+62a/ tppQbX64R9NP8cj1gGJ6SvM5CU9vfUE39QVPjAKeWzgQ/S5s5xCa2DCQNDmPIUw76uDU docHziNctEA/rKYRNqAAe9o2ycs/Al7k/GvMXYgsm/K5xYfYahKx2SHDAiqufCA//I5i fpqnLbMwekkTgKZJTegbsUNCWCDlJKVeWVzpAxM2KcHVartgtsYK5L6lD7NGrGn8gZ8V H4EcXMPG6hTq4PUDvh8MLdynIDaaJQky/EdiXCX827BTAn0GvFQ+WeIvsKvPBLKBi8Cj Dm0w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1687247279; x=1689839279; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=VQ3G/mOw1YDakj6K/QDY3ISJ4AmLryMvPDR0lMuqBuw=; b=Py0dDUxKtMK5lATWNlLQMIbHzDAW3lBT0jbK1Ml18WiXLZpvA8gzJNzJLI5G5vGvBF /zMtZe/kFkFVlgIb178P0N7vJhYmD7pWnam5zEXVpDTM1NKLaraD14dE8rJ/inRoBxWt /eYIZWj5PxUcymrYaZM5wo+7ZAWbUJRtAlpoIZbP4PUs3H7HTTY4p7SqT/GFjx2qkV/Q YtJnrM2+k6lzMSN0U/8YhA941ZMYVAjjTW0uEG5RV1fqe0lLd76y8bplXsMb/7vkQZd3 /7sNYqbGrD1JLMpoOvjfF0U3idhtSZ6YRibSgLOfQM8vP8wFCXDL1UBzacOsTW6O/hDi hJUg== X-Gm-Message-State: AC+VfDyvX0+zA8Ey7A3xfhqgYJcC790BpJQ6y3VsoJRjxJHOUFp4MAEg xYK61dhYRlq97f4jQt5PlpnrZA== X-Google-Smtp-Source: ACHHUZ4S0WdBV687zylMKJcs5BTJOEB/HsGWCswlmI7GBgVKK08qTW3e1TZ6XRwEWVZtJ7HVB5KlNQ== X-Received: by 2002:a0d:e64d:0:b0:569:74f3:f3e1 with SMTP id p74-20020a0de64d000000b0056974f3f3e1mr11385428ywe.0.1687247279055; Tue, 20 Jun 2023 00:47:59 -0700 (PDT) Received: from ripple.attlocal.net (172-10-233-147.lightspeed.sntcca.sbcglobal.net. [172.10.233.147]) by smtp.gmail.com with ESMTPSA id j127-20020a0df985000000b005612fc707bfsm364068ywf.120.2023.06.20.00.47.55 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 20 Jun 2023 00:47:58 -0700 (PDT) Date: Tue, 20 Jun 2023 00:47:54 -0700 (PDT) From: Hugh Dickins X-X-Sender: hugh@ripple.attlocal.net To: Andrew Morton Subject: [PATCH v2 05/12] powerpc: add pte_free_defer() for pgtables sharing page In-Reply-To: <54cb04f-3762-987f-8294-91dafd8ebfb0@google.com> Message-ID: <5cd9f442-61da-4c3d-eca-b7f44d22aa5f@google.com> References: <54cb04f-3762-987f-8294-91dafd8ebfb0@google.com> MIME-Version: 1.0 X-BeenThere: linuxppc-dev@lists.ozlabs.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Linux on PowerPC Developers Mail List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Miaohe Lin , David Hildenbrand , Peter Zijlstra , Yang Shi , Peter Xu , linux-kernel@vger.kernel.org, Song Liu , sparclinux@vger.kernel.org, Alexander Gordeev , Claudio Imbrenda , Will Deacon , linux-s390@vger.kernel.org, Yu Zhao , Ira Weiny , Alistair Popple , Russell King , Matthew Wilcox , Steven Price , Christoph Hellwig , Jason Gunthorpe , "Aneesh Kumar K.V" , Huang Ying , Axel Rasmussen , Gerald Schaefer , Christian Borntraeger , Thomas Hellstrom , Ralph Campbell , Pasha Tatashin , Vasily Gorbik , Anshuman Khandual , Heiko Carstens , Qi Zheng , Suren Baghdasaryan , Vlastimil Babka , linux-arm-kernel@lists.infradead.org, SeongJae Park , Lorenzo Stoakes , Jann Horn , linux-mm@kvack.org, linuxppc-dev@lists.ozlabs.org, Naoya Horiguchi , Zack Rusin , Vishal Moola , Minchan Kim , "Kirill A. Shutemov" , Mel Gorman , "David Sc. Miller" , Mike Rapoport , Mike Kravetz Errors-To: linuxppc-dev-bounces+incoming=patchwork.ozlabs.org@lists.ozlabs.org Sender: "Linuxppc-dev" Add powerpc-specific pte_free_defer(), to call pte_free() via call_rcu(). pte_free_defer() will be called inside khugepaged's retract_page_tables() loop, where allocating extra memory cannot be relied upon. This precedes the generic version to avoid build breakage from incompatible pgtable_t. This is awkward because the struct page contains only one rcu_head, but that page may be shared between PTE_FRAG_NR pagetables, each wanting to use the rcu_head at the same time: account concurrent deferrals with a heightened refcount, only the first making use of the rcu_head, but re-deferring if more deferrals arrived during its grace period. Signed-off-by: Hugh Dickins Signed-off-by: Hugh Dickins --- arch/powerpc/include/asm/pgalloc.h | 4 +++ arch/powerpc/mm/pgtable-frag.c | 51 ++++++++++++++++++++++++++++++ 2 files changed, 55 insertions(+) diff --git a/arch/powerpc/include/asm/pgalloc.h b/arch/powerpc/include/asm/pgalloc.h index 3360cad78ace..3a971e2a8c73 100644 --- a/arch/powerpc/include/asm/pgalloc.h +++ b/arch/powerpc/include/asm/pgalloc.h @@ -45,6 +45,10 @@ static inline void pte_free(struct mm_struct *mm, pgtable_t ptepage) pte_fragment_free((unsigned long *)ptepage, 0); } +/* arch use pte_free_defer() implementation in arch/powerpc/mm/pgtable-frag.c */ +#define pte_free_defer pte_free_defer +void pte_free_defer(struct mm_struct *mm, pgtable_t pgtable); + /* * Functions that deal with pagetables that could be at any level of * the table need to be passed an "index_size" so they know how to diff --git a/arch/powerpc/mm/pgtable-frag.c b/arch/powerpc/mm/pgtable-frag.c index 20652daa1d7e..e4f58c5fc2ac 100644 --- a/arch/powerpc/mm/pgtable-frag.c +++ b/arch/powerpc/mm/pgtable-frag.c @@ -120,3 +120,54 @@ void pte_fragment_free(unsigned long *table, int kernel) __free_page(page); } } + +#ifdef CONFIG_TRANSPARENT_HUGEPAGE +#define PTE_FREE_DEFERRED 0x10000 /* beyond any PTE_FRAG_NR */ + +static void pte_free_now(struct rcu_head *head) +{ + struct page *page; + int refcount; + + page = container_of(head, struct page, rcu_head); + refcount = atomic_sub_return(PTE_FREE_DEFERRED - 1, + &page->pt_frag_refcount); + if (refcount < PTE_FREE_DEFERRED) { + pte_fragment_free((unsigned long *)page_address(page), 0); + return; + } + /* + * One page may be shared between PTE_FRAG_NR pagetables. + * At least one more call to pte_free_defer() came in while we + * were already deferring, so the free must be deferred again; + * but just for one grace period, however many calls came in. + */ + while (refcount >= PTE_FREE_DEFERRED + PTE_FREE_DEFERRED) { + refcount = atomic_sub_return(PTE_FREE_DEFERRED, + &page->pt_frag_refcount); + } + /* Remove that refcount of 1 left for fragment freeing above */ + atomic_dec(&page->pt_frag_refcount); + call_rcu(&page->rcu_head, pte_free_now); +} + +void pte_free_defer(struct mm_struct *mm, pgtable_t pgtable) +{ + struct page *page; + + page = virt_to_page(pgtable); + /* + * One page may be shared between PTE_FRAG_NR pagetables: only queue + * it once for freeing, but note whenever the free must be deferred. + * + * (This would be much simpler if the struct page had an rcu_head for + * each fragment, or if we could allocate a separate array for that.) + * + * Convert our refcount of 1 to a refcount of PTE_FREE_DEFERRED, and + * proceed to call_rcu() only when the rcu_head is not already in use. + */ + if (atomic_add_return(PTE_FREE_DEFERRED - 1, &page->pt_frag_refcount) < + PTE_FREE_DEFERRED + PTE_FREE_DEFERRED) + call_rcu(&page->rcu_head, pte_free_now); +} +#endif /* CONFIG_TRANSPARENT_HUGEPAGE */ From patchwork Tue Jun 20 07:49:34 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Hugh Dickins X-Patchwork-Id: 1796966 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=lists.ozlabs.org (client-ip=2404:9400:2:0:216:3eff:fee1:b9f1; helo=lists.ozlabs.org; envelope-from=linuxppc-dev-bounces+incoming=patchwork.ozlabs.org@lists.ozlabs.org; receiver=) Authentication-Results: legolas.ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=google.com header.i=@google.com header.a=rsa-sha256 header.s=20221208 header.b=Wlyhpqj5; dkim-atps=neutral Received: from lists.ozlabs.org (lists.ozlabs.org [IPv6:2404:9400:2:0:216:3eff:fee1:b9f1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-384)) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4Qlf1g2DfHz20Xf for ; Tue, 20 Jun 2023 17:50:15 +1000 (AEST) Authentication-Results: lists.ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=google.com header.i=@google.com header.a=rsa-sha256 header.s=20221208 header.b=Wlyhpqj5; dkim-atps=neutral Received: from boromir.ozlabs.org (localhost [IPv6:::1]) by lists.ozlabs.org (Postfix) with ESMTP id 4Qlf1f5BBSz3bxH for ; Tue, 20 Jun 2023 17:50:14 +1000 (AEST) X-Original-To: linuxppc-dev@lists.ozlabs.org Delivered-To: linuxppc-dev@lists.ozlabs.org Authentication-Results: lists.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=google.com header.i=@google.com header.a=rsa-sha256 header.s=20221208 header.b=Wlyhpqj5; dkim-atps=neutral Authentication-Results: lists.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=google.com (client-ip=2607:f8b0:4864:20::112b; helo=mail-yw1-x112b.google.com; envelope-from=hughd@google.com; receiver=lists.ozlabs.org) Received: from mail-yw1-x112b.google.com (mail-yw1-x112b.google.com [IPv6:2607:f8b0:4864:20::112b]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by lists.ozlabs.org (Postfix) with ESMTPS id 4Qlf130qbGz3bVf for ; Tue, 20 Jun 2023 17:49:42 +1000 (AEST) Received: by mail-yw1-x112b.google.com with SMTP id 00721157ae682-5704fce0f23so47448417b3.3 for ; Tue, 20 Jun 2023 00:49:42 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1687247379; x=1689839379; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:from:to:cc:subject:date:message-id:reply-to; bh=JyT8tmsQKzrYxiMI/rVOY2yPaVbA/RNeIBfphNdBMF4=; b=Wlyhpqj5qzfuvutrjOa/4V6PuPsDCDOgWHtSW/4kwbuOkurc1q0KurhYLqHW1JVDng ocfjumMCL+5VTdlkv6JFAWzHFSaf/DQcTveKuW/F1nUw+Sj6E3jgL+8I+UQrrKTcSspC zGEBBcceErIi3jvfUdE8XIwgkG18lQEbl2V2HS4GGiBXkvY4t9/NkaOVpMEahcggUUzl u8oefzeOOrLLnM3RkTGo6WsoKQCjwWkScujGOaUdyRth+pzrlMusEw0XYP/BcTSABeYV rleoq5m+PuifcHOelghsbpsP6kubTOIzWI9OP98uqMngj6OJlbnVDdZn1qt2293RkTnb aaaQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1687247379; x=1689839379; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=JyT8tmsQKzrYxiMI/rVOY2yPaVbA/RNeIBfphNdBMF4=; b=bzM34/KVtorxTnYx6sr66MrtPxJ5+HLUA4L9gj1lK8Uy0ODEg1mvpdXO0tlynDT7vr 0hQ2tENAEwZWXgKL33SJlC6AOsfC7QkFD5qbG5tFJZq9c0ixggWK4zt1I0qlQNNC5zrI ZHI9BIQr03ASKno4UyVXZr5ZXnguRfJsKHW8PwMRF9ODSvPAMg2GXjj9PEMukLhFgvGc cVaYV9piSDNw4jwogFBSfWpEF+Y9WvYUwXyx9zQ8A6fAwVqesxCnNpIRE5AyHl8dWKnc uYCJDVu4Caazchqr5bCorBLeHCoTE9jlwol3J1RgbS3ywYQLbs3nqX0Tgkc2y8k77ZAq FnKg== X-Gm-Message-State: AC+VfDxt+aYLm5YBQtz65GfmsCpGIIBo1l4n6lRZqAQE1wXRO/GqpgIl JK3I1rpIBmw0GzLzzfjjP/VDhw== X-Google-Smtp-Source: ACHHUZ5+IOPW+qBQbjfJiBwfK7g3HrB1Jv9bM1vDxBV2sAeLMGBBrGsscHleOg916F+4k1/7bPeS5w== X-Received: by 2002:a81:8311:0:b0:568:d63e:dd2c with SMTP id t17-20020a818311000000b00568d63edd2cmr10308263ywf.11.1687247379257; Tue, 20 Jun 2023 00:49:39 -0700 (PDT) Received: from ripple.attlocal.net (172-10-233-147.lightspeed.sntcca.sbcglobal.net. [172.10.233.147]) by smtp.gmail.com with ESMTPSA id o17-20020a0dcc11000000b005702597583fsm381836ywd.26.2023.06.20.00.49.35 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 20 Jun 2023 00:49:38 -0700 (PDT) Date: Tue, 20 Jun 2023 00:49:34 -0700 (PDT) From: Hugh Dickins X-X-Sender: hugh@ripple.attlocal.net To: Andrew Morton Subject: [PATCH v2 06/12] sparc: add pte_free_defer() for pte_t *pgtable_t In-Reply-To: <54cb04f-3762-987f-8294-91dafd8ebfb0@google.com> Message-ID: References: <54cb04f-3762-987f-8294-91dafd8ebfb0@google.com> MIME-Version: 1.0 X-BeenThere: linuxppc-dev@lists.ozlabs.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Linux on PowerPC Developers Mail List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Miaohe Lin , David Hildenbrand , Peter Zijlstra , Yang Shi , Peter Xu , linux-kernel@vger.kernel.org, Song Liu , sparclinux@vger.kernel.org, Alexander Gordeev , Claudio Imbrenda , Will Deacon , linux-s390@vger.kernel.org, Yu Zhao , Ira Weiny , Alistair Popple , Russell King , Matthew Wilcox , Steven Price , Christoph Hellwig , Jason Gunthorpe , "Aneesh Kumar K.V" , Huang Ying , Axel Rasmussen , Gerald Schaefer , Christian Borntraeger , Thomas Hellstrom , Ralph Campbell , Pasha Tatashin , Vasily Gorbik , Anshuman Khandual , Heiko Carstens , Qi Zheng , Suren Baghdasaryan , Vlastimil Babka , linux-arm-kernel@lists.infradead.org, SeongJae Park , Lorenzo Stoakes , Jann Horn , linux-mm@kvack.org, linuxppc-dev@lists.ozlabs.org, Naoya Horiguchi , Zack Rusin , Vishal Moola , Minchan Kim , "Kirill A. Shutemov" , Mel Gorman , "David S. Miller" , Mike Rapoport , Mike Kravetz Errors-To: linuxppc-dev-bounces+incoming=patchwork.ozlabs.org@lists.ozlabs.org Sender: "Linuxppc-dev" Add sparc-specific pte_free_defer(), to call pte_free() via call_rcu(). pte_free_defer() will be called inside khugepaged's retract_page_tables() loop, where allocating extra memory cannot be relied upon. This precedes the generic version to avoid build breakage from incompatible pgtable_t. sparc32 supports pagetables sharing a page, but does not support THP; sparc64 supports THP, but does not support pagetables sharing a page. So the sparc-specific pte_free_defer() is as simple as the generic one, except for converting between pte_t *pgtable_t and struct page *. Signed-off-by: Hugh Dickins --- arch/sparc/include/asm/pgalloc_64.h | 4 ++++ arch/sparc/mm/init_64.c | 16 ++++++++++++++++ 2 files changed, 20 insertions(+) diff --git a/arch/sparc/include/asm/pgalloc_64.h b/arch/sparc/include/asm/pgalloc_64.h index 7b5561d17ab1..caa7632be4c2 100644 --- a/arch/sparc/include/asm/pgalloc_64.h +++ b/arch/sparc/include/asm/pgalloc_64.h @@ -65,6 +65,10 @@ pgtable_t pte_alloc_one(struct mm_struct *mm); void pte_free_kernel(struct mm_struct *mm, pte_t *pte); void pte_free(struct mm_struct *mm, pgtable_t ptepage); +/* arch use pte_free_defer() implementation in arch/sparc/mm/init_64.c */ +#define pte_free_defer pte_free_defer +void pte_free_defer(struct mm_struct *mm, pgtable_t pgtable); + #define pmd_populate_kernel(MM, PMD, PTE) pmd_set(MM, PMD, PTE) #define pmd_populate(MM, PMD, PTE) pmd_set(MM, PMD, PTE) diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c index 04f9db0c3111..0d7fd793924c 100644 --- a/arch/sparc/mm/init_64.c +++ b/arch/sparc/mm/init_64.c @@ -2930,6 +2930,22 @@ void pgtable_free(void *table, bool is_page) } #ifdef CONFIG_TRANSPARENT_HUGEPAGE +static void pte_free_now(struct rcu_head *head) +{ + struct page *page; + + page = container_of(head, struct page, rcu_head); + __pte_free((pgtable_t)page_address(page)); +} + +void pte_free_defer(struct mm_struct *mm, pgtable_t pgtable) +{ + struct page *page; + + page = virt_to_page(pgtable); + call_rcu(&page->rcu_head, pte_free_now); +} + void update_mmu_cache_pmd(struct vm_area_struct *vma, unsigned long addr, pmd_t *pmd) { From patchwork Tue Jun 20 07:51:19 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Hugh Dickins X-Patchwork-Id: 1796969 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=lists.ozlabs.org (client-ip=2404:9400:2:0:216:3eff:fee1:b9f1; helo=lists.ozlabs.org; envelope-from=linuxppc-dev-bounces+incoming=patchwork.ozlabs.org@lists.ozlabs.org; receiver=) Authentication-Results: legolas.ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=google.com header.i=@google.com header.a=rsa-sha256 header.s=20221208 header.b=yMZZZ71U; dkim-atps=neutral Received: from lists.ozlabs.org (lists.ozlabs.org [IPv6:2404:9400:2:0:216:3eff:fee1:b9f1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-384)) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4Qlf3P1wmGz20WT for ; Tue, 20 Jun 2023 17:51:45 +1000 (AEST) Authentication-Results: lists.ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=google.com header.i=@google.com header.a=rsa-sha256 header.s=20221208 header.b=yMZZZ71U; dkim-atps=neutral Received: from boromir.ozlabs.org (localhost [IPv6:::1]) by lists.ozlabs.org (Postfix) with ESMTP id 4Qlf3N6Qhrz3bb4 for ; Tue, 20 Jun 2023 17:51:44 +1000 (AEST) X-Original-To: linuxppc-dev@lists.ozlabs.org Delivered-To: linuxppc-dev@lists.ozlabs.org Authentication-Results: lists.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=google.com header.i=@google.com header.a=rsa-sha256 header.s=20221208 header.b=yMZZZ71U; dkim-atps=neutral Authentication-Results: lists.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=google.com (client-ip=2607:f8b0:4864:20::1136; helo=mail-yw1-x1136.google.com; envelope-from=hughd@google.com; receiver=lists.ozlabs.org) Received: from mail-yw1-x1136.google.com (mail-yw1-x1136.google.com [IPv6:2607:f8b0:4864:20::1136]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by lists.ozlabs.org (Postfix) with ESMTPS id 4Qlf343vzHz2yxt for ; Tue, 20 Jun 2023 17:51:28 +1000 (AEST) Received: by mail-yw1-x1136.google.com with SMTP id 00721157ae682-5728df0a7d9so33776337b3.1 for ; Tue, 20 Jun 2023 00:51:28 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1687247485; x=1689839485; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:from:to:cc:subject:date:message-id:reply-to; bh=+jUseASXhwGaNM4F6CkFzwvn2Dk1xhlzNLQjSeyjV8Y=; b=yMZZZ71UME7oK4yGdGKbgAGJwHWifsRBUE1ty8JHuL5GZamUEZ4PagH7+CA+ACRuXJ sVYkjhp9M9lk4smKqdKCXO0MTjpP1EkTbYT8MDDYBlSiF5OQTuGuvrMV46BD/K8KMf/8 VOuKBB/cMCFRHgkyZqVnyK9Ko9EAea3S5WqfjuYhhZ6hhTM9p4DwQdnfw6MNXWL2d4va sxwz/XIyotH6XV5mKGrc5rIq2+19pPbxhawaqiUWsN1AbxSzM+rTQINZrSJH0bvVIXEy SnNN73ZM1CD0WeufHS2IZ+FCcyXO+JAYLl5u8OigLfNh6mFqqNG2vS4i+tU2O9Fk9zM/ R+ow== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1687247485; x=1689839485; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=+jUseASXhwGaNM4F6CkFzwvn2Dk1xhlzNLQjSeyjV8Y=; b=iC+MsnteYriBp/BIUUPas/blJudjklK2/jGJH7c+PINXEWHMHqZCauaxawuXbEQOCt +3olFYEENeh9HgIbs15/dQZC9aDcyRcMmDv++xKNzsPyx7cE1mGgy9selLZRuMMhs+v4 DckUJ0YKMO/r8TnWdNXeEwtfmBYoR17ILzGxacCez1CTqBU1/Xy8UvRkRFh9fSU+w4HN fm0GJ3zFP93SVfCuoENTIEilwBL+Pp+1cnphHJhQiCjnnkuMwS1WRfXgkAofzGoo6jpl NnYklGpYVqEgaPC6WYrxUoL7v7g5pCIbawGnijcHZeZdCgffMIzprUvX0vpHYmPbcA7a cBqg== X-Gm-Message-State: AC+VfDxB/7qRYC4eN6Gzaf9rFButi0s52cmNBV+vUdV1TU/uVbK9Ggo1 8XEjEuc/F2yjqcbUd51qh9WhcA== X-Google-Smtp-Source: ACHHUZ5V40k+FwqrX2L16LirpEKemNO0AwOCEL4uHyPeS6tT3McwesGGFGOm1IWTgWqHC9fJk3bo5g== X-Received: by 2002:a25:b296:0:b0:ba8:7122:2917 with SMTP id k22-20020a25b296000000b00ba871222917mr8863514ybj.0.1687247484917; Tue, 20 Jun 2023 00:51:24 -0700 (PDT) Received: from ripple.attlocal.net (172-10-233-147.lightspeed.sntcca.sbcglobal.net. [172.10.233.147]) by smtp.gmail.com with ESMTPSA id o195-20020a2541cc000000b00bb21aeddeffsm258356yba.18.2023.06.20.00.51.20 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 20 Jun 2023 00:51:24 -0700 (PDT) Date: Tue, 20 Jun 2023 00:51:19 -0700 (PDT) From: Hugh Dickins X-X-Sender: hugh@ripple.attlocal.net To: Andrew Morton Subject: [PATCH v2 07/12] s390: add pte_free_defer() for pgtables sharing page In-Reply-To: <54cb04f-3762-987f-8294-91dafd8ebfb0@google.com> Message-ID: References: <54cb04f-3762-987f-8294-91dafd8ebfb0@google.com> MIME-Version: 1.0 X-BeenThere: linuxppc-dev@lists.ozlabs.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Linux on PowerPC Developers Mail List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Miaohe Lin , David Hildenbrand , Peter Zijlstra , Yang Shi , Peter Xu , linux-kernel@vger.kernel.org, Song Liu , sparclinux@vger.kernel.org, Alexander Gordeev , Claudio Imbrenda , Will Deacon , linux-s390@vger.kernel.org, Yu Zhao , Ira Weiny , Alistair Popple , Russell King , Matthew Wilcox , Steven Price , Christoph Hellwig , Jason Gunthorpe , "Aneesh Kumar K.V" , Huang Ying , Axel Rasmussen , Gerald Schaefer , Christian Borntraeger , Thomas Hellstrom , Ralph Campbell , Pasha Tatashin , Vasily Gorbik , Anshuman Khandual , Heiko Carstens , Qi Zheng , Suren Baghdasaryan , Vlastimil Babka , linux-arm-kernel@lists.infradead.org, SeongJae Park , Lorenzo Stoakes , Jann Horn , linux-mm@kvack.org, linuxppc-dev@lists.ozlabs.org, Naoya Horiguchi , Zack Rusin , Vishal Moola , Minchan Kim , "Kirill A. Shutemov" , Mel Gorman , "David S. Miller" , Mike Rapoport , Mike Kravetz Errors-To: linuxppc-dev-bounces+incoming=patchwork.ozlabs.org@lists.ozlabs.org Sender: "Linuxppc-dev" Add s390-specific pte_free_defer(), to call pte_free() via call_rcu(). pte_free_defer() will be called inside khugepaged's retract_page_tables() loop, where allocating extra memory cannot be relied upon. This precedes the generic version to avoid build breakage from incompatible pgtable_t. This version is more complicated than others: because s390 fits two 2K page tables into one 4K page (so page->rcu_head must be shared between both halves), and already uses page->lru (which page->rcu_head overlays) to list any free halves; with clever management by page->_refcount bits. Build upon the existing management, adjusted to follow a new rule: that a page is not linked to mm_context_t::pgtable_list while either half is pending free, by either tlb_remove_table() or pte_free_defer(); but is afterwards either relinked to the list (if other half is allocated), or freed (if other half is free): by __tlb_remove_table() in both cases. This rule ensures that page->lru is no longer in use while page->rcu_head may be needed for use by pte_free_defer(). And a fortuitous byproduct of following this rule is that page_table_free() no longer needs its curious two-step manipulation of _refcount - read commit c2c224932fd0 ("s390/mm: fix 2KB pgtable release race") for what to think of there. But it does not solve the problem that two halves may need rcu_head at the same time. For that, add HHead bits between s390's AAllocated and PPending bits in the upper byte of page->_refcount: then the second pte_free_defer() can see that rcu_head is already in use, and the RCU callee pte_free_half() can see that it needs to make a further call_rcu() for that other half. page_table_alloc() set the page->pt_mm field, so __tlb_remove_table() knows where to link the freed half while its other half is allocated. But linking to the list needs mm->context.lock: and although AA bit set guarantees that pt_mm must still be valid, it does not guarantee that mm is still valid an instant later: so acquiring mm->context.lock would not be safe. For now, use a static global mm_pgtable_list_lock instead: then a soon-to-follow commit will split it per-mm as before (probably by using a SLAB_TYPESAFE_BY_RCU structure for the list head and its lock); and update the commentary on the pgtable_list. Signed-off-by: Hugh Dickins Signed-off-by: Gerald Schaefer Signed-off-by: Hugh Dickins Reviewed-by: Gerald Schaefer --- arch/s390/include/asm/pgalloc.h | 4 + arch/s390/mm/pgalloc.c | 205 +++++++++++++++++++++++--------- include/linux/mm_types.h | 2 +- 3 files changed, 154 insertions(+), 57 deletions(-) diff --git a/arch/s390/include/asm/pgalloc.h b/arch/s390/include/asm/pgalloc.h index 17eb618f1348..89a9d5ef94f8 100644 --- a/arch/s390/include/asm/pgalloc.h +++ b/arch/s390/include/asm/pgalloc.h @@ -143,6 +143,10 @@ static inline void pmd_populate(struct mm_struct *mm, #define pte_free_kernel(mm, pte) page_table_free(mm, (unsigned long *) pte) #define pte_free(mm, pte) page_table_free(mm, (unsigned long *) pte) +/* arch use pte_free_defer() implementation in arch/s390/mm/pgalloc.c */ +#define pte_free_defer pte_free_defer +void pte_free_defer(struct mm_struct *mm, pgtable_t pgtable); + void vmem_map_init(void); void *vmem_crst_alloc(unsigned long val); pte_t *vmem_pte_alloc(void); diff --git a/arch/s390/mm/pgalloc.c b/arch/s390/mm/pgalloc.c index 66ab68db9842..11983a3ff95a 100644 --- a/arch/s390/mm/pgalloc.c +++ b/arch/s390/mm/pgalloc.c @@ -159,6 +159,11 @@ void page_table_free_pgste(struct page *page) #endif /* CONFIG_PGSTE */ +/* + * Temporarily use a global spinlock instead of mm->context.lock. + * This will be replaced by a per-mm spinlock in a followup commit. + */ +static DEFINE_SPINLOCK(mm_pgtable_list_lock); /* * A 2KB-pgtable is either upper or lower half of a normal page. * The second half of the page may be unused or used as another @@ -172,7 +177,7 @@ void page_table_free_pgste(struct page *page) * When a parent page gets fully allocated it contains 2KB-pgtables in both * upper and lower halves and is removed from mm_context_t::pgtable_list. * - * When 2KB-pgtable is freed from to fully allocated parent page that + * When 2KB-pgtable is freed from the fully allocated parent page that * page turns partially allocated and added to mm_context_t::pgtable_list. * * If 2KB-pgtable is freed from the partially allocated parent page that @@ -182,16 +187,24 @@ void page_table_free_pgste(struct page *page) * As follows from the above, no unallocated or fully allocated parent * pages are contained in mm_context_t::pgtable_list. * + * NOTE NOTE NOTE: The commentary above and below has not yet been updated: + * the new rule is that a page is not linked to mm_context_t::pgtable_list + * while either half is pending free by any method; but afterwards is + * either relinked to it, or freed, by __tlb_remove_table(). This allows + * pte_free_defer() to use the page->rcu_head (which overlays page->lru). + * * The upper byte (bits 24-31) of the parent page _refcount is used * for tracking contained 2KB-pgtables and has the following format: * - * PP AA - * 01234567 upper byte (bits 24-31) of struct page::_refcount - * || || - * || |+--- upper 2KB-pgtable is allocated - * || +---- lower 2KB-pgtable is allocated - * |+------- upper 2KB-pgtable is pending for removal - * +-------- lower 2KB-pgtable is pending for removal + * PPHHAA + * 76543210 upper byte (bits 24-31) of struct page::_refcount + * |||||| + * |||||+--- lower 2KB-pgtable is allocated + * ||||+---- upper 2KB-pgtable is allocated + * |||+----- lower 2KB-pgtable is pending free by page->rcu_head + * ||+------ upper 2KB-pgtable is pending free by page->rcu_head + * |+------- lower 2KB-pgtable is pending free by any method + * +-------- upper 2KB-pgtable is pending free by any method * * (See commit 620b4e903179 ("s390: use _refcount for pgtables") on why * using _refcount is possible). @@ -200,7 +213,7 @@ void page_table_free_pgste(struct page *page) * The parent page is either: * - added to mm_context_t::pgtable_list in case the second half of the * parent page is still unallocated; - * - removed from mm_context_t::pgtable_list in case both hales of the + * - removed from mm_context_t::pgtable_list in case both halves of the * parent page are allocated; * These operations are protected with mm_context_t::lock. * @@ -239,32 +252,22 @@ unsigned long *page_table_alloc(struct mm_struct *mm) /* Try to get a fragment of a 4K page as a 2K page table */ if (!mm_alloc_pgste(mm)) { table = NULL; - spin_lock_bh(&mm->context.lock); + spin_lock_bh(&mm_pgtable_list_lock); if (!list_empty(&mm->context.pgtable_list)) { page = list_first_entry(&mm->context.pgtable_list, struct page, lru); mask = atomic_read(&page->_refcount) >> 24; - /* - * The pending removal bits must also be checked. - * Failure to do so might lead to an impossible - * value of (i.e 0x13 or 0x23) written to _refcount. - * Such values violate the assumption that pending and - * allocation bits are mutually exclusive, and the rest - * of the code unrails as result. That could lead to - * a whole bunch of races and corruptions. - */ - mask = (mask | (mask >> 4)) & 0x03U; - if (mask != 0x03U) { - table = (unsigned long *) page_to_virt(page); - bit = mask & 1; /* =1 -> second 2K */ - if (bit) - table += PTRS_PER_PTE; - atomic_xor_bits(&page->_refcount, - 0x01U << (bit + 24)); - list_del(&page->lru); - } + /* Cannot be on this list if either half pending free */ + WARN_ON_ONCE(mask & ~0x03U); + /* One or other half must be available, but not both */ + WARN_ON_ONCE(mask == 0x00U || mask == 0x03U); + table = (unsigned long *)page_to_virt(page); + bit = mask & 0x01U; /* =1 -> second 2K available */ + table += bit * PTRS_PER_PTE; + atomic_xor_bits(&page->_refcount, 0x01U << (bit + 24)); + list_del(&page->lru); } - spin_unlock_bh(&mm->context.lock); + spin_unlock_bh(&mm_pgtable_list_lock); if (table) return table; } @@ -278,6 +281,7 @@ unsigned long *page_table_alloc(struct mm_struct *mm) } arch_set_page_dat(page, 0); /* Initialize page table */ + page->pt_mm = mm; table = (unsigned long *) page_to_virt(page); if (mm_alloc_pgste(mm)) { /* Return 4K page table with PGSTEs */ @@ -288,14 +292,14 @@ unsigned long *page_table_alloc(struct mm_struct *mm) /* Return the first 2K fragment of the page */ atomic_xor_bits(&page->_refcount, 0x01U << 24); memset64((u64 *)table, _PAGE_INVALID, 2 * PTRS_PER_PTE); - spin_lock_bh(&mm->context.lock); + spin_lock_bh(&mm_pgtable_list_lock); list_add(&page->lru, &mm->context.pgtable_list); - spin_unlock_bh(&mm->context.lock); + spin_unlock_bh(&mm_pgtable_list_lock); } return table; } -static void page_table_release_check(struct page *page, void *table, +static void page_table_release_check(struct page *page, unsigned long *table, unsigned int half, unsigned int mask) { char msg[128]; @@ -317,21 +321,18 @@ void page_table_free(struct mm_struct *mm, unsigned long *table) if (!mm_alloc_pgste(mm)) { /* Free 2K page table fragment of a 4K page */ bit = ((unsigned long) table & ~PAGE_MASK)/(PTRS_PER_PTE*sizeof(pte_t)); - spin_lock_bh(&mm->context.lock); + spin_lock_bh(&mm_pgtable_list_lock); /* - * Mark the page for delayed release. The actual release - * will happen outside of the critical section from this - * function or from __tlb_remove_table() + * Mark the page for release. The actual release will happen + * below from this function, or later from __tlb_remove_table(). */ - mask = atomic_xor_bits(&page->_refcount, 0x11U << (bit + 24)); + mask = atomic_xor_bits(&page->_refcount, 0x01U << (bit + 24)); mask >>= 24; - if (mask & 0x03U) + if (mask & 0x03U) /* other half is allocated */ list_add(&page->lru, &mm->context.pgtable_list); - else + else if (!(mask & 0x30U)) /* other half not pending */ list_del(&page->lru); - spin_unlock_bh(&mm->context.lock); - mask = atomic_xor_bits(&page->_refcount, 0x10U << (bit + 24)); - mask >>= 24; + spin_unlock_bh(&mm_pgtable_list_lock); if (mask != 0x00U) return; half = 0x01U << bit; @@ -362,19 +363,17 @@ void page_table_free_rcu(struct mmu_gather *tlb, unsigned long *table, return; } bit = ((unsigned long) table & ~PAGE_MASK) / (PTRS_PER_PTE*sizeof(pte_t)); - spin_lock_bh(&mm->context.lock); + spin_lock_bh(&mm_pgtable_list_lock); /* - * Mark the page for delayed release. The actual release will happen - * outside of the critical section from __tlb_remove_table() or from - * page_table_free() + * Mark the page for delayed release. + * The actual release will happen later, from __tlb_remove_table(). */ mask = atomic_xor_bits(&page->_refcount, 0x11U << (bit + 24)); mask >>= 24; - if (mask & 0x03U) - list_add_tail(&page->lru, &mm->context.pgtable_list); - else + /* Other half not allocated? Other half not already pending free? */ + if ((mask & 0x03U) == 0x00U && (mask & 0x30U) != 0x30U) list_del(&page->lru); - spin_unlock_bh(&mm->context.lock); + spin_unlock_bh(&mm_pgtable_list_lock); table = (unsigned long *) ((unsigned long) table | (0x01U << bit)); tlb_remove_table(tlb, table); } @@ -382,17 +381,40 @@ void page_table_free_rcu(struct mmu_gather *tlb, unsigned long *table, void __tlb_remove_table(void *_table) { unsigned int mask = (unsigned long) _table & 0x03U, half = mask; - void *table = (void *)((unsigned long) _table ^ mask); + unsigned long *table = (unsigned long *)((unsigned long) _table ^ mask); struct page *page = virt_to_page(table); switch (half) { case 0x00U: /* pmd, pud, or p4d */ - free_pages((unsigned long)table, CRST_ALLOC_ORDER); + __free_pages(page, CRST_ALLOC_ORDER); return; case 0x01U: /* lower 2K of a 4K page table */ - case 0x02U: /* higher 2K of a 4K page table */ - mask = atomic_xor_bits(&page->_refcount, mask << (4 + 24)); - mask >>= 24; + case 0x02U: /* upper 2K of a 4K page table */ + /* + * If the other half is marked as allocated, page->pt_mm must + * still be valid, page->rcu_head no longer in use so page->lru + * good for use, so now make the freed half available for reuse. + * But be wary of races with that other half being freed. + */ + if (atomic_read(&page->_refcount) & (0x03U << 24)) { + struct mm_struct *mm = page->pt_mm; + /* + * It is safe to use page->pt_mm when the other half + * is seen allocated while holding pgtable_list lock; + * but how will it be safe to acquire that spinlock? + * Global mm_pgtable_list_lock is safe and easy for + * now, then a followup commit will split it per-mm. + */ + spin_lock_bh(&mm_pgtable_list_lock); + mask = atomic_xor_bits(&page->_refcount, mask << 28); + mask >>= 24; + if (mask & 0x03U) + list_add(&page->lru, &mm->context.pgtable_list); + spin_unlock_bh(&mm_pgtable_list_lock); + } else { + mask = atomic_xor_bits(&page->_refcount, mask << 28); + mask >>= 24; + } if (mask != 0x00U) return; break; @@ -407,6 +429,77 @@ void __tlb_remove_table(void *_table) __free_page(page); } +#ifdef CONFIG_TRANSPARENT_HUGEPAGE +static void pte_free_now0(struct rcu_head *head); +static void pte_free_now1(struct rcu_head *head); + +static void pte_free_pgste(struct rcu_head *head) +{ + unsigned long *table; + struct page *page; + + page = container_of(head, struct page, rcu_head); + table = (unsigned long *)page_to_virt(page); + table = (unsigned long *)((unsigned long)table | 0x03U); + __tlb_remove_table(table); +} + +static void pte_free_half(struct rcu_head *head, unsigned int bit) +{ + unsigned long *table; + struct page *page; + unsigned int mask; + + page = container_of(head, struct page, rcu_head); + mask = atomic_xor_bits(&page->_refcount, 0x04U << (bit + 24)); + + table = (unsigned long *)page_to_virt(page); + table += bit * PTRS_PER_PTE; + table = (unsigned long *)((unsigned long)table | (0x01U << bit)); + __tlb_remove_table(table); + + /* If pte_free_defer() of the other half came in, queue it now */ + if (mask & 0x0CU) + call_rcu(&page->rcu_head, bit ? pte_free_now0 : pte_free_now1); +} + +static void pte_free_now0(struct rcu_head *head) +{ + pte_free_half(head, 0); +} + +static void pte_free_now1(struct rcu_head *head) +{ + pte_free_half(head, 1); +} + +void pte_free_defer(struct mm_struct *mm, pgtable_t pgtable) +{ + unsigned int bit, mask; + struct page *page; + + page = virt_to_page(pgtable); + if (mm_alloc_pgste(mm)) { + call_rcu(&page->rcu_head, pte_free_pgste); + return; + } + bit = ((unsigned long)pgtable & ~PAGE_MASK) / + (PTRS_PER_PTE * sizeof(pte_t)); + + spin_lock_bh(&mm_pgtable_list_lock); + mask = atomic_xor_bits(&page->_refcount, 0x15U << (bit + 24)); + mask >>= 24; + /* Other half not allocated? Other half not already pending free? */ + if ((mask & 0x03U) == 0x00U && (mask & 0x30U) != 0x30U) + list_del(&page->lru); + spin_unlock_bh(&mm_pgtable_list_lock); + + /* Do not relink on rcu_head if other half already linked on rcu_head */ + if ((mask & 0x0CU) != 0x0CU) + call_rcu(&page->rcu_head, bit ? pte_free_now1 : pte_free_now0); +} +#endif /* CONFIG_TRANSPARENT_HUGEPAGE */ + /* * Base infrastructure required to generate basic asces, region, segment, * and page tables that do not make use of enhanced features like EDAT1. diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 306a3d1a0fa6..1667a1bdb8a8 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -146,7 +146,7 @@ struct page { pgtable_t pmd_huge_pte; /* protected by page->ptl */ unsigned long _pt_pad_2; /* mapping */ union { - struct mm_struct *pt_mm; /* x86 pgds only */ + struct mm_struct *pt_mm; /* x86 pgd, s390 */ atomic_t pt_frag_refcount; /* powerpc */ }; #if ALLOC_SPLIT_PTLOCKS From patchwork Tue Jun 20 07:53:03 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Hugh Dickins X-Patchwork-Id: 1796975 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=lists.ozlabs.org (client-ip=2404:9400:2:0:216:3eff:fee1:b9f1; helo=lists.ozlabs.org; envelope-from=linuxppc-dev-bounces+incoming=patchwork.ozlabs.org@lists.ozlabs.org; receiver=) Authentication-Results: legolas.ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=google.com header.i=@google.com header.a=rsa-sha256 header.s=20221208 header.b=2GY6tb59; dkim-atps=neutral Received: from lists.ozlabs.org (lists.ozlabs.org [IPv6:2404:9400:2:0:216:3eff:fee1:b9f1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-384)) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4Qlf5R29w8z20WT for ; Tue, 20 Jun 2023 17:53:31 +1000 (AEST) Authentication-Results: lists.ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=google.com header.i=@google.com header.a=rsa-sha256 header.s=20221208 header.b=2GY6tb59; dkim-atps=neutral Received: from boromir.ozlabs.org (localhost [IPv6:::1]) by lists.ozlabs.org (Postfix) with ESMTP id 4Qlf5Q5dPLz30hK for ; Tue, 20 Jun 2023 17:53:30 +1000 (AEST) X-Original-To: linuxppc-dev@lists.ozlabs.org Delivered-To: linuxppc-dev@lists.ozlabs.org Authentication-Results: lists.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=google.com header.i=@google.com header.a=rsa-sha256 header.s=20221208 header.b=2GY6tb59; dkim-atps=neutral Authentication-Results: lists.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=google.com (client-ip=2607:f8b0:4864:20::1135; helo=mail-yw1-x1135.google.com; envelope-from=hughd@google.com; receiver=lists.ozlabs.org) Received: from mail-yw1-x1135.google.com (mail-yw1-x1135.google.com [IPv6:2607:f8b0:4864:20::1135]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by lists.ozlabs.org (Postfix) with ESMTPS id 4Qlf543KfNz2yyc for ; Tue, 20 Jun 2023 17:53:11 +1000 (AEST) Received: by mail-yw1-x1135.google.com with SMTP id 00721157ae682-5707b429540so53753377b3.1 for ; Tue, 20 Jun 2023 00:53:11 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1687247588; x=1689839588; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:from:to:cc:subject:date:message-id:reply-to; bh=Q5R0zcB6mybnkU9NBbTtghvpq67JAa0crCVDjZ9SpHk=; b=2GY6tb59UZYGOHu8Jpard49Ijl8S41fZ8HDgfi6a4gvFJ5750jQeYrmu9nVrvi1jW2 tp8ypG/banZ4ws6Js9iQQN9cGknEmtht9BWd32J2yMcrVqteXUvW9YR9uMMhKiTOhwCj DlNulmZyq+f8KzKXIgNcUSNUMmKGW1CGHEEgbEp/J7IxewrE//XtQewkerpcgsbwL40t WfRaz/SdO8RzSMtTrw2rYV18y0/dvvOSll1UA2/d8Zuh32W7Pnj2B4fhDdSSvEN+0aoF 9f1IZjfj8naj8IynmrWiGUnfEPBGZdaKRscDXhw1Sy1gWU3iNWnHKGQGUPwpRy5JQoX9 lMYQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1687247588; x=1689839588; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=Q5R0zcB6mybnkU9NBbTtghvpq67JAa0crCVDjZ9SpHk=; b=LphCZU410+gq6F7ympLMAsyKB4g7KREzlswzHuQQxJ+2daz+IMezhpx4QwzUUx3hZS TZYNhjcWjN29j2gdftvodM7ePaKhZLdrNvCD9HUVM4ENqSZN7lF4qluxLmtCaxIidPVM 5lhfXngugZtxuyQziZt3d35oJHHZkiv+gMk42mK4kKTRJKi6+XsSW/oWeSXRmL1qjKFi lMccHgI3J65st8ubgQCJ4/ybBCreBhQzb8fp683ZeGH3yUOGrq9SwZif2JL3ggxAwlZG hHT3UhXrQEPzeVl9G5PskT7XyK8AfL8qtn9NEMxaHTBkqjwYExxuh81e0iCQVcFb8IQr qLPQ== X-Gm-Message-State: AC+VfDyTyFECI+sAdFbySKbNvQgsYv4XfqrqV0arAdQcGsyFU4sXW3yq 4kKZfIVJWKV5sDtRjBbquTMONw== X-Google-Smtp-Source: ACHHUZ5uDhFE53584OsFJztx7V9NmKNE+1aQZMwUHlsEN2oJtZCLrBIKo4HQBYW6kI7RzecFkIc3yA== X-Received: by 2002:a81:6c11:0:b0:56f:f83f:618 with SMTP id h17-20020a816c11000000b0056ff83f0618mr10840742ywc.19.1687247587950; Tue, 20 Jun 2023 00:53:07 -0700 (PDT) Received: from ripple.attlocal.net (172-10-233-147.lightspeed.sntcca.sbcglobal.net. [172.10.233.147]) by smtp.gmail.com with ESMTPSA id x11-20020a81630b000000b0056ffca5fb01sm370775ywb.117.2023.06.20.00.53.04 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 20 Jun 2023 00:53:07 -0700 (PDT) Date: Tue, 20 Jun 2023 00:53:03 -0700 (PDT) From: Hugh Dickins X-X-Sender: hugh@ripple.attlocal.net To: Andrew Morton Subject: [PATCH v2 08/12] mm/pgtable: add pte_free_defer() for pgtable as page In-Reply-To: <54cb04f-3762-987f-8294-91dafd8ebfb0@google.com> Message-ID: <3e5961a2-26e5-d1ab-5c4c-527e273e3cc5@google.com> References: <54cb04f-3762-987f-8294-91dafd8ebfb0@google.com> MIME-Version: 1.0 X-BeenThere: linuxppc-dev@lists.ozlabs.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Linux on PowerPC Developers Mail List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Miaohe Lin , David Hildenbrand , Peter Zijlstra , Yang Shi , Peter Xu , linux-kernel@vger.kernel.org, Song Liu , sparclinux@vger.kernel.org, Alexander Gordeev , Claudio Imbrenda , Will Deacon , linux-s390@vger.kernel.org, Yu Zhao , Ira Weiny , Alistair Popple , Russell King , Matthew Wilcox , Steven Price , Christoph Hellwig , Jason Gunthorpe , "Aneesh Kumar K.V" , Huang Ying , Axel Rasmussen , Gerald Schaefer , Christian Borntraeger , Thomas Hellstrom , Ralph Campbell , Pasha Tatashin , Vasily Gorbik , Anshuman Khandual , Heiko Carstens , Qi Zheng , Suren Baghdasaryan , Vlastimil Babka , linux-arm-kernel@lists.infradead.org, SeongJae Park , Lorenzo Stoakes , Jann Horn , linux-mm@kvack.org, linuxppc-dev@lists.ozlabs.org, Naoya Horiguchi , Zack Rusin , Vishal Moola , Minchan Kim , "Kirill A. Shutemov" , Mel Gorman , "David S. Miller" , Mike Rapoport , Mike Kravetz Errors-To: linuxppc-dev-bounces+incoming=patchwork.ozlabs.org@lists.ozlabs.org Sender: "Linuxppc-dev" Add the generic pte_free_defer(), to call pte_free() via call_rcu(). pte_free_defer() will be called inside khugepaged's retract_page_tables() loop, where allocating extra memory cannot be relied upon. This version suits all those architectures which use an unfragmented page for one page table (none of whose pte_free()s use the mm arg which was passed to it). Signed-off-by: Hugh Dickins --- include/linux/mm_types.h | 4 ++++ include/linux/pgtable.h | 2 ++ mm/pgtable-generic.c | 20 ++++++++++++++++++++ 3 files changed, 26 insertions(+) diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 1667a1bdb8a8..09335fa28c41 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -144,6 +144,10 @@ struct page { struct { /* Page table pages */ unsigned long _pt_pad_1; /* compound_head */ pgtable_t pmd_huge_pte; /* protected by page->ptl */ + /* + * A PTE page table page might be freed by use of + * rcu_head: which overlays those two fields above. + */ unsigned long _pt_pad_2; /* mapping */ union { struct mm_struct *pt_mm; /* x86 pgd, s390 */ diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h index 525f1782b466..d18d3e963967 100644 --- a/include/linux/pgtable.h +++ b/include/linux/pgtable.h @@ -112,6 +112,8 @@ static inline void pte_unmap(pte_t *pte) } #endif +void pte_free_defer(struct mm_struct *mm, pgtable_t pgtable); + /* Find an entry in the second-level page table.. */ #ifndef pmd_offset static inline pmd_t *pmd_offset(pud_t *pud, unsigned long address) diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c index 5e85a625ab30..ab3741064bb8 100644 --- a/mm/pgtable-generic.c +++ b/mm/pgtable-generic.c @@ -13,6 +13,7 @@ #include #include #include +#include #include /* @@ -230,6 +231,25 @@ pmd_t pmdp_collapse_flush(struct vm_area_struct *vma, unsigned long address, return pmd; } #endif + +/* arch define pte_free_defer in asm/pgalloc.h for its own implementation */ +#ifndef pte_free_defer +static void pte_free_now(struct rcu_head *head) +{ + struct page *page; + + page = container_of(head, struct page, rcu_head); + pte_free(NULL /* mm not passed and not used */, (pgtable_t)page); +} + +void pte_free_defer(struct mm_struct *mm, pgtable_t pgtable) +{ + struct page *page; + + page = pgtable; + call_rcu(&page->rcu_head, pte_free_now); +} +#endif /* pte_free_defer */ #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ #if defined(CONFIG_GUP_GET_PXX_LOW_HIGH) && \ From patchwork Tue Jun 20 07:54:56 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Hugh Dickins X-Patchwork-Id: 1796979 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=lists.ozlabs.org (client-ip=2404:9400:2:0:216:3eff:fee1:b9f1; helo=lists.ozlabs.org; envelope-from=linuxppc-dev-bounces+incoming=patchwork.ozlabs.org@lists.ozlabs.org; receiver=) Authentication-Results: legolas.ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=google.com header.i=@google.com header.a=rsa-sha256 header.s=20221208 header.b=J8nzRWA0; dkim-atps=neutral Received: from lists.ozlabs.org (lists.ozlabs.org [IPv6:2404:9400:2:0:216:3eff:fee1:b9f1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-384)) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4Qlf7Z0G2Lz20XZ for ; Tue, 20 Jun 2023 17:55:21 +1000 (AEST) Authentication-Results: lists.ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=google.com header.i=@google.com header.a=rsa-sha256 header.s=20221208 header.b=J8nzRWA0; dkim-atps=neutral Received: from boromir.ozlabs.org (localhost [IPv6:::1]) by lists.ozlabs.org (Postfix) with ESMTP id 4Qlf7Y4QtQz30gy for ; Tue, 20 Jun 2023 17:55:21 +1000 (AEST) X-Original-To: linuxppc-dev@lists.ozlabs.org Delivered-To: linuxppc-dev@lists.ozlabs.org Authentication-Results: lists.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=google.com header.i=@google.com header.a=rsa-sha256 header.s=20221208 header.b=J8nzRWA0; dkim-atps=neutral Authentication-Results: lists.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=google.com (client-ip=2607:f8b0:4864:20::735; helo=mail-qk1-x735.google.com; envelope-from=hughd@google.com; receiver=lists.ozlabs.org) Received: from mail-qk1-x735.google.com (mail-qk1-x735.google.com [IPv6:2607:f8b0:4864:20::735]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by lists.ozlabs.org (Postfix) with ESMTPS id 4Qlf7F2fVKz2xFm for ; Tue, 20 Jun 2023 17:55:05 +1000 (AEST) Received: by mail-qk1-x735.google.com with SMTP id af79cd13be357-763a360868aso120754785a.0 for ; Tue, 20 Jun 2023 00:55:04 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1687247701; x=1689839701; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:from:to:cc:subject:date:message-id:reply-to; bh=EoggPrivOwaF6YYpZ0mcfv9qnK5xaGr29ovQpplAZx8=; b=J8nzRWA0uvpss9wXeQOicc8Ot9y4/UFuc6plZhigByzD/E6z56SmcPuzuWHVg6RLIr zeSwVuKlgmRjiVprT2c29o+LY/xl0mKSG7EI+HK+ZmF9VRuRMDrYHfurcBSQUvbGMDJM BrTznrXs5U8ERPP2CrQPh2E6mB/xxogY+UVNmpVE8mP4PFPg1/HAfaCo5DqPYBJr1jNU jJBNI0/yqplj0fpPoW7vqJFOdNB1zyVXocOEAXeD5BoEwwZ8uu8OHUDQYXH1raSp0ip4 Z7eB2HLsP8hFBVrCu74QJhwf7v/CbG+Ae1Bf3UqUxfhWUQfg52/Gluq44YXQQXKCwSWh Wn8g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1687247701; x=1689839701; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=EoggPrivOwaF6YYpZ0mcfv9qnK5xaGr29ovQpplAZx8=; b=Qxgpfq+hmqlwWGTUj++ZAIU44mnyAV/+Qg4nVkN5Ou7Vu6IWR9YR0l90hHVFp86s05 kra7pGonlIPlpsR3npb1CzpyS/lR0SQOrRC7OQXhbf2bZn3MpwkT/vCQnAhk24BG8J5V YopHNBMNNWqavCk2znQys/mmzhcG/MfK7PDdAWC/jN+9rULgrwJGjJNfpD6qLfpMzizC g5QYZrbXGQC1RymPv8uWW5HIFgAqXy/7i01DKqVdkoUjbgskcxTEwnWSo2VuipL/X9/2 5exHzO/ngc1p4+M856h7zPLHt4MT41Or1HOx+5XiBIqTJ8Ij/VDXYoFwSWChu24KAPVd drXg== X-Gm-Message-State: AC+VfDxGyzVOZMgi9vSdUSBC+fHNiKHFJ4WT9/PXEnGHvd07n4rh5Nde 7McSY1xDxkvSh+OQtj7jbEAQ0Q== X-Google-Smtp-Source: ACHHUZ4DJO4EDwJbG0wgrFZYngaLjP4vE5bfDEN/dCam98OaAjzy+Vc4NRjPKMFiq2LjNC4HChBWFA== X-Received: by 2002:a05:620a:2890:b0:75d:4145:154e with SMTP id j16-20020a05620a289000b0075d4145154emr12424317qkp.65.1687247701263; Tue, 20 Jun 2023 00:55:01 -0700 (PDT) Received: from ripple.attlocal.net (172-10-233-147.lightspeed.sntcca.sbcglobal.net. [172.10.233.147]) by smtp.gmail.com with ESMTPSA id j1-20020a0df901000000b0054f50f71834sm368662ywf.124.2023.06.20.00.54.57 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 20 Jun 2023 00:55:00 -0700 (PDT) Date: Tue, 20 Jun 2023 00:54:56 -0700 (PDT) From: Hugh Dickins X-X-Sender: hugh@ripple.attlocal.net To: Andrew Morton Subject: [PATCH v2 09/12] mm/khugepaged: retract_page_tables() without mmap or vma lock In-Reply-To: <54cb04f-3762-987f-8294-91dafd8ebfb0@google.com> Message-ID: References: <54cb04f-3762-987f-8294-91dafd8ebfb0@google.com> MIME-Version: 1.0 X-BeenThere: linuxppc-dev@lists.ozlabs.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Linux on PowerPC Developers Mail List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Miaohe Lin , David Hildenbrand , Peter Zijlstra , Yang Shi , Peter Xu , linux-kernel@vger.kernel.org, Song Liu , sparclinux@vger.kernel.org, Alexander Gordeev , Claudio Imbrenda , Will Deacon , linux-s390@vger.kernel.org, Yu Zhao , Ira Weiny , Alistair Popple , Russell King , Matthew Wilcox , Steven Price , Christoph Hellwig , Jason Gunthorpe , "Aneesh Kumar K.V" , Huang Ying , Axel Rasmussen , Gerald Schaefer , Christian Borntraeger , Thomas Hellstrom , Ralph Campbell , Pasha Tatashin , Vasily Gorbik , Anshuman Khandual , Heiko Carstens , Qi Zheng , Suren Baghdasaryan , Vlastimil Babka , linux-arm-kernel@lists.infradead.org, SeongJae Park , Lorenzo Stoakes , Jann Horn , linux-mm@kvack.org, linuxppc-dev@lists.ozlabs.org, Naoya Horiguchi , Zack Rusin , Vishal Moola , Minchan Kim , "Kirill A. Shutemov" , Mel Gorman , "David S. Miller" , Mike Rapoport , Mike Kravetz Errors-To: linuxppc-dev-bounces+incoming=patchwork.ozlabs.org@lists.ozlabs.org Sender: "Linuxppc-dev" Simplify shmem and file THP collapse's retract_page_tables(), and relax its locking: to improve its success rate and to lessen impact on others. Instead of its MADV_COLLAPSE case doing set_huge_pmd() at target_addr of target_mm, leave that part of the work to madvise_collapse() calling collapse_pte_mapped_thp() afterwards: just adjust collapse_file()'s result code to arrange for that. That spares retract_page_tables() four arguments; and since it will be successful in retracting all of the page tables expected of it, no need to track and return a result code itself. It needs i_mmap_lock_read(mapping) for traversing the vma interval tree, but it does not need i_mmap_lock_write() for that: page_vma_mapped_walk() allows for pte_offset_map_lock() etc to fail, and uses pmd_lock() for THPs. retract_page_tables() just needs to use those same spinlocks to exclude it briefly, while transitioning pmd from page table to none: so restore its use of pmd_lock() inside of which pte lock is nested. Users of pte_offset_map_lock() etc all now allow for them to fail: so retract_page_tables() now has no use for mmap_write_trylock() or vma_try_start_write(). In common with rmap and page_vma_mapped_walk(), it does not even need the mmap_read_lock(). But those users do expect the page table to remain a good page table, until they unlock and rcu_read_unlock(): so the page table cannot be freed immediately, but rather by the recently added pte_free_defer(). Use the (usually a no-op) pmdp_get_lockless_sync() to send an interrupt when PAE, and pmdp_collapse_flush() did not already do so: to make sure that the start,pmdp_get_lockless(),end sequence in __pte_offset_map() cannot pick up a pmd entry with mismatched pmd_low and pmd_high. retract_page_tables() can be enhanced to replace_page_tables(), which inserts the final huge pmd without mmap lock: going through an invalid state instead of pmd_none() followed by fault. But that enhancement does raise some more questions: leave it until a later release. Signed-off-by: Hugh Dickins --- mm/khugepaged.c | 184 ++++++++++++++++++++---------------------------- 1 file changed, 75 insertions(+), 109 deletions(-) diff --git a/mm/khugepaged.c b/mm/khugepaged.c index 1083f0e38a07..f7a0f7673127 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -1617,9 +1617,8 @@ int collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr, break; case SCAN_PMD_NONE: /* - * In MADV_COLLAPSE path, possible race with khugepaged where - * all pte entries have been removed and pmd cleared. If so, - * skip all the pte checks and just update the pmd mapping. + * All pte entries have been removed and pmd cleared. + * Skip all the pte checks and just update the pmd mapping. */ goto maybe_install_pmd; default: @@ -1748,123 +1747,88 @@ static void khugepaged_collapse_pte_mapped_thps(struct khugepaged_mm_slot *mm_sl mmap_write_unlock(mm); } -static int retract_page_tables(struct address_space *mapping, pgoff_t pgoff, - struct mm_struct *target_mm, - unsigned long target_addr, struct page *hpage, - struct collapse_control *cc) +static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff) { struct vm_area_struct *vma; - int target_result = SCAN_FAIL; - i_mmap_lock_write(mapping); + i_mmap_lock_read(mapping); vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff, pgoff) { - int result = SCAN_FAIL; - struct mm_struct *mm = NULL; - unsigned long addr = 0; - pmd_t *pmd; - bool is_target = false; + struct mmu_notifier_range range; + struct mm_struct *mm; + unsigned long addr; + pmd_t *pmd, pgt_pmd; + spinlock_t *pml; + spinlock_t *ptl; + bool skipped_uffd = false; /* * Check vma->anon_vma to exclude MAP_PRIVATE mappings that - * got written to. These VMAs are likely not worth investing - * mmap_write_lock(mm) as PMD-mapping is likely to be split - * later. - * - * Note that vma->anon_vma check is racy: it can be set up after - * the check but before we took mmap_lock by the fault path. - * But page lock would prevent establishing any new ptes of the - * page, so we are safe. - * - * An alternative would be drop the check, but check that page - * table is clear before calling pmdp_collapse_flush() under - * ptl. It has higher chance to recover THP for the VMA, but - * has higher cost too. It would also probably require locking - * the anon_vma. + * got written to. These VMAs are likely not worth removing + * page tables from, as PMD-mapping is likely to be split later. */ - if (READ_ONCE(vma->anon_vma)) { - result = SCAN_PAGE_ANON; - goto next; - } + if (READ_ONCE(vma->anon_vma)) + continue; + addr = vma->vm_start + ((pgoff - vma->vm_pgoff) << PAGE_SHIFT); if (addr & ~HPAGE_PMD_MASK || - vma->vm_end < addr + HPAGE_PMD_SIZE) { - result = SCAN_VMA_CHECK; - goto next; - } - mm = vma->vm_mm; - is_target = mm == target_mm && addr == target_addr; - result = find_pmd_or_thp_or_none(mm, addr, &pmd); - if (result != SCAN_SUCCEED) - goto next; - /* - * We need exclusive mmap_lock to retract page table. - * - * We use trylock due to lock inversion: we need to acquire - * mmap_lock while holding page lock. Fault path does it in - * reverse order. Trylock is a way to avoid deadlock. - * - * Also, it's not MADV_COLLAPSE's job to collapse other - * mappings - let khugepaged take care of them later. - */ - result = SCAN_PTE_MAPPED_HUGEPAGE; - if ((cc->is_khugepaged || is_target) && - mmap_write_trylock(mm)) { - /* trylock for the same lock inversion as above */ - if (!vma_try_start_write(vma)) - goto unlock_next; - - /* - * Re-check whether we have an ->anon_vma, because - * collapse_and_free_pmd() requires that either no - * ->anon_vma exists or the anon_vma is locked. - * We already checked ->anon_vma above, but that check - * is racy because ->anon_vma can be populated under the - * mmap lock in read mode. - */ - if (vma->anon_vma) { - result = SCAN_PAGE_ANON; - goto unlock_next; - } - /* - * When a vma is registered with uffd-wp, we can't - * recycle the pmd pgtable because there can be pte - * markers installed. Skip it only, so the rest mm/vma - * can still have the same file mapped hugely, however - * it'll always mapped in small page size for uffd-wp - * registered ranges. - */ - if (hpage_collapse_test_exit(mm)) { - result = SCAN_ANY_PROCESS; - goto unlock_next; - } - if (userfaultfd_wp(vma)) { - result = SCAN_PTE_UFFD_WP; - goto unlock_next; - } - collapse_and_free_pmd(mm, vma, addr, pmd); - if (!cc->is_khugepaged && is_target) - result = set_huge_pmd(vma, addr, pmd, hpage); - else - result = SCAN_SUCCEED; - -unlock_next: - mmap_write_unlock(mm); - goto next; - } - /* - * Calling context will handle target mm/addr. Otherwise, let - * khugepaged try again later. - */ - if (!is_target) { - khugepaged_add_pte_mapped_thp(mm, addr); + vma->vm_end < addr + HPAGE_PMD_SIZE) continue; + + mm = vma->vm_mm; + if (find_pmd_or_thp_or_none(mm, addr, &pmd) != SCAN_SUCCEED) + continue; + + if (hpage_collapse_test_exit(mm)) + continue; + /* + * When a vma is registered with uffd-wp, we cannot recycle + * the page table because there may be pte markers installed. + * Other vmas can still have the same file mapped hugely, but + * skip this one: it will always be mapped in small page size + * for uffd-wp registered ranges. + */ + if (userfaultfd_wp(vma)) + continue; + + /* PTEs were notified when unmapped; but now for the PMD? */ + mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, mm, + addr, addr + HPAGE_PMD_SIZE); + mmu_notifier_invalidate_range_start(&range); + + pml = pmd_lock(mm, pmd); + ptl = pte_lockptr(mm, pmd); + if (ptl != pml) + spin_lock_nested(ptl, SINGLE_DEPTH_NESTING); + + /* + * Huge page lock is still held, so normally the page table + * must remain empty; and we have already skipped anon_vma + * and userfaultfd_wp() vmas. But since the mmap_lock is not + * held, it is still possible for a racing userfaultfd_ioctl() + * to have inserted ptes or markers. Now that we hold ptlock, + * repeating the anon_vma check protects from one category, + * and repeating the userfaultfd_wp() check from another. + */ + if (unlikely(vma->anon_vma || userfaultfd_wp(vma))) { + skipped_uffd = true; + } else { + pgt_pmd = pmdp_collapse_flush(vma, addr, pmd); + pmdp_get_lockless_sync(); + } + + if (ptl != pml) + spin_unlock(ptl); + spin_unlock(pml); + + mmu_notifier_invalidate_range_end(&range); + + if (!skipped_uffd) { + mm_dec_nr_ptes(mm); + page_table_check_pte_clear_range(mm, addr, pgt_pmd); + pte_free_defer(mm, pmd_pgtable(pgt_pmd)); } -next: - if (is_target) - target_result = result; } - i_mmap_unlock_write(mapping); - return target_result; + i_mmap_unlock_read(mapping); } /** @@ -2261,9 +2225,11 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr, /* * Remove pte page tables, so we can re-fault the page as huge. + * If MADV_COLLAPSE, adjust result to call collapse_pte_mapped_thp(). */ - result = retract_page_tables(mapping, start, mm, addr, hpage, - cc); + retract_page_tables(mapping, start); + if (cc && !cc->is_khugepaged) + result = SCAN_PTE_MAPPED_HUGEPAGE; unlock_page(hpage); /* From patchwork Tue Jun 20 07:56:31 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Hugh Dickins X-Patchwork-Id: 1796981 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=lists.ozlabs.org (client-ip=2404:9400:2:0:216:3eff:fee1:b9f1; helo=lists.ozlabs.org; envelope-from=linuxppc-dev-bounces+incoming=patchwork.ozlabs.org@lists.ozlabs.org; receiver=) Authentication-Results: legolas.ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=google.com header.i=@google.com header.a=rsa-sha256 header.s=20221208 header.b=aVESce+3; dkim-atps=neutral Received: from lists.ozlabs.org (lists.ozlabs.org [IPv6:2404:9400:2:0:216:3eff:fee1:b9f1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-384)) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4Qlf9P3mDgz20XZ for ; Tue, 20 Jun 2023 17:56:57 +1000 (AEST) Authentication-Results: lists.ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=google.com header.i=@google.com header.a=rsa-sha256 header.s=20221208 header.b=aVESce+3; dkim-atps=neutral Received: from boromir.ozlabs.org (localhost [IPv6:::1]) by lists.ozlabs.org (Postfix) with ESMTP id 4Qlf9P1Fx3z30h7 for ; Tue, 20 Jun 2023 17:56:57 +1000 (AEST) X-Original-To: linuxppc-dev@lists.ozlabs.org Delivered-To: linuxppc-dev@lists.ozlabs.org Authentication-Results: lists.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=google.com header.i=@google.com header.a=rsa-sha256 header.s=20221208 header.b=aVESce+3; dkim-atps=neutral Authentication-Results: lists.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=google.com (client-ip=2607:f8b0:4864:20::82c; helo=mail-qt1-x82c.google.com; envelope-from=hughd@google.com; receiver=lists.ozlabs.org) Received: from mail-qt1-x82c.google.com (mail-qt1-x82c.google.com [IPv6:2607:f8b0:4864:20::82c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by lists.ozlabs.org (Postfix) with ESMTPS id 4Qlf941zS9z2xFm for ; Tue, 20 Jun 2023 17:56:40 +1000 (AEST) Received: by mail-qt1-x82c.google.com with SMTP id d75a77b69052e-3f9cf20da51so37768481cf.2 for ; Tue, 20 Jun 2023 00:56:39 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1687247796; x=1689839796; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:from:to:cc:subject:date:message-id:reply-to; bh=6+vwC/lRMNRi87hMBE1YekLdovsofTWKPL6p1SpVoTA=; b=aVESce+35OzsmAlAStgyGN3RAj7WDCAE8sHC0KoMDM6cH2/moaTq02nEqNPwD3/T+W noO2Ww2L93Fcf5p9VnVnxgvjusPf47iNS1mjXHxFvgvoI/RPajs6xy2hlJm3pPwx/ukk EEVdSHMxTCdD+Z/QL780HcIsKFRRn53QQIJPB599L7CCSrk3CJVO4ksGsL2XLtesId9S XBJIAQuaqrldCBAWpVU1/KQoK1mGqrJ3H9wgg6dhtt5FIThtuxRlpvzAq0tV3zGFdrNA jHvRkgQ7WhWy8ZMMUEOCC7EiG5C12tuJVl2FBkFi/jQT8V3wZP59XIwC46nMyksOaHFh Ka8g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1687247796; x=1689839796; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=6+vwC/lRMNRi87hMBE1YekLdovsofTWKPL6p1SpVoTA=; b=He4slZFKbL8DOGwfwhBa3eMNBmuLQPAotAHlR49jbHWlQONXWBW5aBkHb1xksmdL7a asT7C93Ow8stW0lMWpn5hjFA6gzIX9pFeQBt400rEYcLjXi1xBwrgMN9EBrfnhS8wSZ/ VcUpkthX0lepC6bV4ChTN1sTHN7mqL9jSe6DXr0Pz/hfWrDbeRGWbty1IiSDwyICbv5J +Zb/Z+hePnID+J7p1ZeTRd6v/QHL6tnPjcZ/4+hogzsKvRkiCj+//6/Io2iuXb5EuiDF /oZUVc/rN0xFhDYy5/835rUuviUZtPyUUDSCFUakNRorahX2lceT/sWIcrs0l8jA0uLp J0oQ== X-Gm-Message-State: AC+VfDyUeTLkP6MQyykz45u3RsUyghb3dzMo3fMiEpmPl62V2E+wmhki eLYYk3FiDLB5k3beAklX+jT7eQ== X-Google-Smtp-Source: ACHHUZ7b5Od5F2ztLp7089p/6rSzydLtc+ipnNwtJnuDVoDpL+0rndgulPeJ/Tox+jZ/pysI1HkLKw== X-Received: by 2002:a05:622a:1708:b0:3f0:ac80:1ed7 with SMTP id h8-20020a05622a170800b003f0ac801ed7mr15885756qtk.45.1687247796145; Tue, 20 Jun 2023 00:56:36 -0700 (PDT) Received: from ripple.attlocal.net (172-10-233-147.lightspeed.sntcca.sbcglobal.net. [172.10.233.147]) by smtp.gmail.com with ESMTPSA id v25-20020a25fc19000000b00bab9a67a4cesm257974ybd.29.2023.06.20.00.56.32 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 20 Jun 2023 00:56:35 -0700 (PDT) Date: Tue, 20 Jun 2023 00:56:31 -0700 (PDT) From: Hugh Dickins X-X-Sender: hugh@ripple.attlocal.net To: Andrew Morton Subject: [PATCH v2 10/12] mm/khugepaged: collapse_pte_mapped_thp() with mmap_read_lock() In-Reply-To: <54cb04f-3762-987f-8294-91dafd8ebfb0@google.com> Message-ID: References: <54cb04f-3762-987f-8294-91dafd8ebfb0@google.com> MIME-Version: 1.0 X-BeenThere: linuxppc-dev@lists.ozlabs.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Linux on PowerPC Developers Mail List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Miaohe Lin , David Hildenbrand , Peter Zijlstra , Yang Shi , Peter Xu , linux-kernel@vger.kernel.org, Song Liu , sparclinux@vger.kernel.org, Alexander Gordeev , Claudio Imbrenda , Will Deacon , linux-s390@vger.kernel.org, Yu Zhao , Ira Weiny , Alistair Popple , Russell King , Matthew Wilcox , Steven Price , Christoph Hellwig , Jason Gunthorpe , "Aneesh Kumar K.V" , Huang Ying , Axel Rasmussen , Gerald Schaefer , Christian Borntraeger , Thomas Hellstrom , Ralph Campbell , Pasha Tatashin , Vasily Gorbik , Anshuman Khandual , Heiko Carstens , Qi Zheng , Suren Baghdasaryan , Vlastimil Babka , linux-arm-kernel@lists.infradead.org, SeongJae Park , Lorenzo Stoakes , Jann Horn , linux-mm@kvack.org, linuxppc-dev@lists.ozlabs.org, Naoya Horiguchi , Zack Rusin , Vishal Moola , Minchan Kim , "Kirill A. Shutemov" , Mel Gorman , "David S. Miller" , Mike Rapoport , Mike Kravetz Errors-To: linuxppc-dev-bounces+incoming=patchwork.ozlabs.org@lists.ozlabs.org Sender: "Linuxppc-dev" Bring collapse_and_free_pmd() back into collapse_pte_mapped_thp(). It does need mmap_read_lock(), but it does not need mmap_write_lock(), nor vma_start_write() nor i_mmap lock nor anon_vma lock. All racing paths are relying on pte_offset_map_lock() and pmd_lock(), so use those. Follow the pattern in retract_page_tables(); and using pte_free_defer() removes most of the need for tlb_remove_table_sync_one() here; but call pmdp_get_lockless_sync() to use it in the PAE case. First check the VMA, in case page tables are being torn down: from JannH. Confirm the preliminary find_pmd_or_thp_or_none() once page lock has been acquired and the page looks suitable: from then on its state is stable. However, collapse_pte_mapped_thp() was doing something others don't: freeing a page table still containing "valid" entries. i_mmap lock did stop a racing truncate from double-freeing those pages, but we prefer collapse_pte_mapped_thp() to clear the entries as usual. Their TLB flush can wait until the pmdp_collapse_flush() which follows, but the mmu_notifier_invalidate_range_start() has to be done earlier. Do the "step 1" checking loop without mmu_notifier: it wouldn't be good for khugepaged to keep on repeatedly invalidating a range which is then found unsuitable e.g. contains COWs. "step 2", which does the clearing, must then be more careful (after dropping ptl to do mmu_notifier), with abort prepared to correct the accounting like "step 3". But with those entries now cleared, "step 4" (after dropping ptl to do pmd_lock) is kept safe by the huge page lock, which stops new PTEs from being faulted in. Signed-off-by: Hugh Dickins --- mm/khugepaged.c | 172 ++++++++++++++++++++++-------------------------- 1 file changed, 77 insertions(+), 95 deletions(-) diff --git a/mm/khugepaged.c b/mm/khugepaged.c index f7a0f7673127..060ac8789a1e 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -1485,7 +1485,7 @@ static bool khugepaged_add_pte_mapped_thp(struct mm_struct *mm, return ret; } -/* hpage must be locked, and mmap_lock must be held in write */ +/* hpage must be locked, and mmap_lock must be held */ static int set_huge_pmd(struct vm_area_struct *vma, unsigned long addr, pmd_t *pmdp, struct page *hpage) { @@ -1497,7 +1497,7 @@ static int set_huge_pmd(struct vm_area_struct *vma, unsigned long addr, }; VM_BUG_ON(!PageTransHuge(hpage)); - mmap_assert_write_locked(vma->vm_mm); + mmap_assert_locked(vma->vm_mm); if (do_set_pmd(&vmf, hpage)) return SCAN_FAIL; @@ -1506,48 +1506,6 @@ static int set_huge_pmd(struct vm_area_struct *vma, unsigned long addr, return SCAN_SUCCEED; } -/* - * A note about locking: - * Trying to take the page table spinlocks would be useless here because those - * are only used to synchronize: - * - * - modifying terminal entries (ones that point to a data page, not to another - * page table) - * - installing *new* non-terminal entries - * - * Instead, we need roughly the same kind of protection as free_pgtables() or - * mm_take_all_locks() (but only for a single VMA): - * The mmap lock together with this VMA's rmap locks covers all paths towards - * the page table entries we're messing with here, except for hardware page - * table walks and lockless_pages_from_mm(). - */ -static void collapse_and_free_pmd(struct mm_struct *mm, struct vm_area_struct *vma, - unsigned long addr, pmd_t *pmdp) -{ - pmd_t pmd; - struct mmu_notifier_range range; - - mmap_assert_write_locked(mm); - if (vma->vm_file) - lockdep_assert_held_write(&vma->vm_file->f_mapping->i_mmap_rwsem); - /* - * All anon_vmas attached to the VMA have the same root and are - * therefore locked by the same lock. - */ - if (vma->anon_vma) - lockdep_assert_held_write(&vma->anon_vma->root->rwsem); - - mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, mm, addr, - addr + HPAGE_PMD_SIZE); - mmu_notifier_invalidate_range_start(&range); - pmd = pmdp_collapse_flush(vma, addr, pmdp); - tlb_remove_table_sync_one(); - mmu_notifier_invalidate_range_end(&range); - mm_dec_nr_ptes(mm); - page_table_check_pte_clear_range(mm, addr, pmd); - pte_free(mm, pmd_pgtable(pmd)); -} - /** * collapse_pte_mapped_thp - Try to collapse a pte-mapped THP for mm at * address haddr. @@ -1563,26 +1521,29 @@ static void collapse_and_free_pmd(struct mm_struct *mm, struct vm_area_struct *v int collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr, bool install_pmd) { + struct mmu_notifier_range range; + bool notified = false; unsigned long haddr = addr & HPAGE_PMD_MASK; struct vm_area_struct *vma = vma_lookup(mm, haddr); struct page *hpage; pte_t *start_pte, *pte; - pmd_t *pmd; - spinlock_t *ptl; - int count = 0, result = SCAN_FAIL; + pmd_t *pmd, pgt_pmd; + spinlock_t *pml, *ptl; + int nr_ptes = 0, result = SCAN_FAIL; int i; - mmap_assert_write_locked(mm); + mmap_assert_locked(mm); + + /* First check VMA found, in case page tables are being torn down */ + if (!vma || !vma->vm_file || + !range_in_vma(vma, haddr, haddr + HPAGE_PMD_SIZE)) + return SCAN_VMA_CHECK; /* Fast check before locking page if already PMD-mapped */ result = find_pmd_or_thp_or_none(mm, haddr, &pmd); if (result == SCAN_PMD_MAPPED) return result; - if (!vma || !vma->vm_file || - !range_in_vma(vma, haddr, haddr + HPAGE_PMD_SIZE)) - return SCAN_VMA_CHECK; - /* * If we are here, we've succeeded in replacing all the native pages * in the page cache with a single hugepage. If a mm were to fault-in @@ -1612,6 +1573,7 @@ int collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr, goto drop_hpage; } + result = find_pmd_or_thp_or_none(mm, haddr, &pmd); switch (result) { case SCAN_SUCCEED: break; @@ -1625,27 +1587,10 @@ int collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr, goto drop_hpage; } - /* Lock the vma before taking i_mmap and page table locks */ - vma_start_write(vma); - - /* - * We need to lock the mapping so that from here on, only GUP-fast and - * hardware page walks can access the parts of the page tables that - * we're operating on. - * See collapse_and_free_pmd(). - */ - i_mmap_lock_write(vma->vm_file->f_mapping); - - /* - * This spinlock should be unnecessary: Nobody else should be accessing - * the page tables under spinlock protection here, only - * lockless_pages_from_mm() and the hardware page walker can access page - * tables while all the high-level locks are held in write mode. - */ result = SCAN_FAIL; start_pte = pte_offset_map_lock(mm, pmd, haddr, &ptl); - if (!start_pte) - goto drop_immap; + if (!start_pte) /* mmap_lock + page lock should prevent this */ + goto drop_hpage; /* step 1: check all mapped PTEs are to the right huge page */ for (i = 0, addr = haddr, pte = start_pte; @@ -1671,57 +1616,94 @@ int collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr, */ if (hpage + i != page) goto abort; - count++; } - /* step 2: adjust rmap */ + pte_unmap_unlock(start_pte, ptl); + mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, mm, + haddr, haddr + HPAGE_PMD_SIZE); + mmu_notifier_invalidate_range_start(&range); + notified = true; + start_pte = pte_offset_map_lock(mm, pmd, haddr, &ptl); + if (!start_pte) /* mmap_lock + page lock should prevent this */ + goto abort; + + /* step 2: clear page table and adjust rmap */ for (i = 0, addr = haddr, pte = start_pte; i < HPAGE_PMD_NR; i++, addr += PAGE_SIZE, pte++) { struct page *page; if (pte_none(*pte)) continue; - page = vm_normal_page(vma, addr, *pte); - if (WARN_ON_ONCE(page && is_zone_device_page(page))) + /* + * We dropped ptl after the first scan, to do the mmu_notifier: + * page lock stops more PTEs of the hpage being faulted in, but + * does not stop write faults COWing anon copies from existing + * PTEs; and does not stop those being swapped out or migrated. + */ + if (!pte_present(*pte)) { + result = SCAN_PTE_NON_PRESENT; goto abort; + } + page = vm_normal_page(vma, addr, *pte); + if (hpage + i != page) + goto abort; + + /* + * Must clear entry, or a racing truncate may re-remove it. + * TLB flush can be left until pmdp_collapse_flush() does it. + * PTE dirty? Shmem page is already dirty; file is read-only. + */ + pte_clear(mm, addr, pte); page_remove_rmap(page, vma, false); + nr_ptes++; } pte_unmap_unlock(start_pte, ptl); /* step 3: set proper refcount and mm_counters. */ - if (count) { - page_ref_sub(hpage, count); - add_mm_counter(vma->vm_mm, mm_counter_file(hpage), -count); + if (nr_ptes) { + page_ref_sub(hpage, nr_ptes); + add_mm_counter(mm, mm_counter_file(hpage), -nr_ptes); } - /* step 4: remove pte entries */ - /* we make no change to anon, but protect concurrent anon page lookup */ - if (vma->anon_vma) - anon_vma_lock_write(vma->anon_vma); + /* step 4: remove page table */ - collapse_and_free_pmd(mm, vma, haddr, pmd); + /* Huge page lock is still held, so page table must remain empty */ + pml = pmd_lock(mm, pmd); + if (ptl != pml) + spin_lock_nested(ptl, SINGLE_DEPTH_NESTING); + pgt_pmd = pmdp_collapse_flush(vma, haddr, pmd); + pmdp_get_lockless_sync(); + if (ptl != pml) + spin_unlock(ptl); + spin_unlock(pml); - if (vma->anon_vma) - anon_vma_unlock_write(vma->anon_vma); - i_mmap_unlock_write(vma->vm_file->f_mapping); + mmu_notifier_invalidate_range_end(&range); + + mm_dec_nr_ptes(mm); + page_table_check_pte_clear_range(mm, haddr, pgt_pmd); + pte_free_defer(mm, pmd_pgtable(pgt_pmd)); maybe_install_pmd: /* step 5: install pmd entry */ result = install_pmd ? set_huge_pmd(vma, haddr, pmd, hpage) : SCAN_SUCCEED; - + goto drop_hpage; +abort: + if (nr_ptes) { + flush_tlb_mm(mm); + page_ref_sub(hpage, nr_ptes); + add_mm_counter(mm, mm_counter_file(hpage), -nr_ptes); + } + if (start_pte) + pte_unmap_unlock(start_pte, ptl); + if (notified) + mmu_notifier_invalidate_range_end(&range); drop_hpage: unlock_page(hpage); put_page(hpage); return result; - -abort: - pte_unmap_unlock(start_pte, ptl); -drop_immap: - i_mmap_unlock_write(vma->vm_file->f_mapping); - goto drop_hpage; } static void khugepaged_collapse_pte_mapped_thps(struct khugepaged_mm_slot *mm_slot) @@ -2857,9 +2839,9 @@ int madvise_collapse(struct vm_area_struct *vma, struct vm_area_struct **prev, case SCAN_PTE_MAPPED_HUGEPAGE: BUG_ON(mmap_locked); BUG_ON(*prev); - mmap_write_lock(mm); + mmap_read_lock(mm); result = collapse_pte_mapped_thp(mm, addr, true); - mmap_write_unlock(mm); + mmap_locked = true; goto handle_result; /* Whitelisted set of results where continuing OK */ case SCAN_PMD_NULL: From patchwork Tue Jun 20 07:58:07 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Hugh Dickins X-Patchwork-Id: 1796983 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=lists.ozlabs.org (client-ip=2404:9400:2:0:216:3eff:fee1:b9f1; helo=lists.ozlabs.org; envelope-from=linuxppc-dev-bounces+incoming=patchwork.ozlabs.org@lists.ozlabs.org; receiver=) Authentication-Results: legolas.ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=google.com header.i=@google.com header.a=rsa-sha256 header.s=20221208 header.b=jjSq9aTI; dkim-atps=neutral Received: from lists.ozlabs.org (lists.ozlabs.org [IPv6:2404:9400:2:0:216:3eff:fee1:b9f1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-384)) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4QlfCF12qcz20XZ for ; Tue, 20 Jun 2023 17:58:33 +1000 (AEST) Authentication-Results: lists.ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=google.com header.i=@google.com header.a=rsa-sha256 header.s=20221208 header.b=jjSq9aTI; dkim-atps=neutral Received: from boromir.ozlabs.org (localhost [IPv6:::1]) by lists.ozlabs.org (Postfix) with ESMTP id 4QlfCD5B4vz30h9 for ; Tue, 20 Jun 2023 17:58:32 +1000 (AEST) X-Original-To: linuxppc-dev@lists.ozlabs.org Delivered-To: linuxppc-dev@lists.ozlabs.org Authentication-Results: lists.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=google.com header.i=@google.com header.a=rsa-sha256 header.s=20221208 header.b=jjSq9aTI; dkim-atps=neutral Authentication-Results: lists.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=google.com (client-ip=2607:f8b0:4864:20::112a; helo=mail-yw1-x112a.google.com; envelope-from=hughd@google.com; receiver=lists.ozlabs.org) Received: from mail-yw1-x112a.google.com (mail-yw1-x112a.google.com [IPv6:2607:f8b0:4864:20::112a]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by lists.ozlabs.org (Postfix) with ESMTPS id 4QlfBv4RCRz2xFm for ; Tue, 20 Jun 2023 17:58:14 +1000 (AEST) Received: by mail-yw1-x112a.google.com with SMTP id 00721157ae682-5701eaf0d04so46505627b3.2 for ; Tue, 20 Jun 2023 00:58:14 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1687247892; x=1689839892; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:from:to:cc:subject:date:message-id:reply-to; bh=oZY6w9J7toyiTSnrDYdTl/iZtQ4DwvkRvcpzIuD6Z/M=; b=jjSq9aTI2DDqxscN4pq6tIW2t1oB2Us2M+ybVjj+WbNNuPa0SHl8anbtzK7Rjar1SY FtbdELwb+qm7aZfRDGlas9htldGEgGJo8EUel1IgLBRhuOjybXZ2JhWFWoNydWhWnLoe sDikPgviTF+F9BLGHUmNHshe9jLjcHp4eHYN78AHXDshMx1gmgi0lwXRrXrymzvfy1oS sSS/SWZGOylcmqt/OOyH7yW5jIVhDF2myOH5nmIHjTp4hCoBYxyIV0dBLcoGfKjX+nSm SCqmPcSA1OfsDZ4JORRyiDnJMkamZlwDweNLYjOmQrj5114Aq+ZG9d2oxbylEIDpX6zC XvUA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1687247892; x=1689839892; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=oZY6w9J7toyiTSnrDYdTl/iZtQ4DwvkRvcpzIuD6Z/M=; b=XXYZDaqfFDuKOEDnfSXV6FQRxJXGiOaZuQDSM0+cl4drAzCP9S5/PwZZi4x3BrMr0t /dMuQLjrUAVLKWsgo/ZFND1uYajaJyaALSVOZkodveucvSMekg8MEiTy1OxINXyhlFou BbAQL5FRSTkNrDG1FGeviqPNt8BzaMWmOiur9dr1cLk1ZXT0KVDQV8Z3qtgYqs4gwRKS XYN/lQtczUeuBVtlsJAH/MpwSrj0VBRqAs1I8WovtlULh7Nemy362hzJlShorGbDQrOx tlNHU4jJDYphs8RaEvrYQiw2I1KVKikrypqdY7JZf8AiVRJW+uRO4Akt7RbZt71PAR+H IeWA== X-Gm-Message-State: AC+VfDxFm+8q+3I3Rel/wZPTdlM3ZFgQIg5QXQq+wRMYh1QpXvo2C2+c cAaIUn0d2Xt5abvjIehiz96iIg== X-Google-Smtp-Source: ACHHUZ6WOKlFi1AP0MbqxWtb/dOy1NBeYqPcXgHn95ddmUtg48jpbqy+Ay6U+MoHnAVEhjTgTFamZg== X-Received: by 2002:a81:9157:0:b0:570:8802:ec9f with SMTP id i84-20020a819157000000b005708802ec9fmr10023541ywg.19.1687247892063; Tue, 20 Jun 2023 00:58:12 -0700 (PDT) Received: from ripple.attlocal.net (172-10-233-147.lightspeed.sntcca.sbcglobal.net. [172.10.233.147]) by smtp.gmail.com with ESMTPSA id a133-20020a81668b000000b00569ff2d94f6sm385736ywc.19.2023.06.20.00.58.08 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 20 Jun 2023 00:58:11 -0700 (PDT) Date: Tue, 20 Jun 2023 00:58:07 -0700 (PDT) From: Hugh Dickins X-X-Sender: hugh@ripple.attlocal.net To: Andrew Morton Subject: [PATCH v2 11/12] mm/khugepaged: delete khugepaged_collapse_pte_mapped_thps() In-Reply-To: <54cb04f-3762-987f-8294-91dafd8ebfb0@google.com> Message-ID: <90cd6860-eb92-db66-9a8-5fa7b494a10@google.com> References: <54cb04f-3762-987f-8294-91dafd8ebfb0@google.com> MIME-Version: 1.0 X-BeenThere: linuxppc-dev@lists.ozlabs.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Linux on PowerPC Developers Mail List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Miaohe Lin , David Hildenbrand , Peter Zijlstra , Yang Shi , Peter Xu , linux-kernel@vger.kernel.org, Song Liu , sparclinux@vger.kernel.org, Alexander Gordeev , Claudio Imbrenda , Will Deacon , linux-s390@vger.kernel.org, Yu Zhao , Ira Weiny , Alistair Popple , Russell King , Matthew Wilcox , Steven Price , Christoph Hellwig , Jason Gunthorpe , "Aneesh Kumar K.V" , Huang Ying , Axel Rasmussen , Gerald Schaefer , Christian Borntraeger , Thomas Hellstrom , Ralph Campbell , Pasha Tatashin , Vasily Gorbik , Anshuman Khandual , Heiko Carstens , Qi Zheng , Suren Baghdasaryan , Vlastimil Babka , linux-arm-kernel@lists.infradead.org, SeongJae Park , Lorenzo Stoakes , Jann Horn , linux-mm@kvack.org, linuxppc-dev@lists.ozlabs.org, Naoya Horiguchi , Zack Rusin , Vishal Moola , Minchan Kim , "Kirill A. Shutemov" , Mel Gorman , "David S. Miller" , Mike Rapoport , Mike Kravetz Errors-To: linuxppc-dev-bounces+incoming=patchwork.ozlabs.org@lists.ozlabs.org Sender: "Linuxppc-dev" Now that retract_page_tables() can retract page tables reliably, without depending on trylocks, delete all the apparatus for khugepaged to try again later: khugepaged_collapse_pte_mapped_thps() etc; and free up the per-mm memory which was set aside for that in the khugepaged_mm_slot. But one part of that is worth keeping: when hpage_collapse_scan_file() found SCAN_PTE_MAPPED_HUGEPAGE, that address was noted in the mm_slot to be tried for retraction later - catching, for example, page tables where a reversible mprotect() of a portion had required splitting the pmd, but now it can be recollapsed. Call collapse_pte_mapped_thp() directly in this case (why was it deferred before? I assume an issue with needing mmap_lock for write, but now it's only needed for read). Signed-off-by: Hugh Dickins --- mm/khugepaged.c | 125 +++++++----------------------------------------- 1 file changed, 16 insertions(+), 109 deletions(-) diff --git a/mm/khugepaged.c b/mm/khugepaged.c index 060ac8789a1e..06c659e6a89e 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -92,8 +92,6 @@ static __read_mostly DEFINE_HASHTABLE(mm_slots_hash, MM_SLOTS_HASH_BITS); static struct kmem_cache *mm_slot_cache __read_mostly; -#define MAX_PTE_MAPPED_THP 8 - struct collapse_control { bool is_khugepaged; @@ -107,15 +105,9 @@ struct collapse_control { /** * struct khugepaged_mm_slot - khugepaged information per mm that is being scanned * @slot: hash lookup from mm to mm_slot - * @nr_pte_mapped_thp: number of pte mapped THP - * @pte_mapped_thp: address array corresponding pte mapped THP */ struct khugepaged_mm_slot { struct mm_slot slot; - - /* pte-mapped THP in this mm */ - int nr_pte_mapped_thp; - unsigned long pte_mapped_thp[MAX_PTE_MAPPED_THP]; }; /** @@ -1441,50 +1433,6 @@ static void collect_mm_slot(struct khugepaged_mm_slot *mm_slot) } #ifdef CONFIG_SHMEM -/* - * Notify khugepaged that given addr of the mm is pte-mapped THP. Then - * khugepaged should try to collapse the page table. - * - * Note that following race exists: - * (1) khugepaged calls khugepaged_collapse_pte_mapped_thps() for mm_struct A, - * emptying the A's ->pte_mapped_thp[] array. - * (2) MADV_COLLAPSE collapses some file extent with target mm_struct B, and - * retract_page_tables() finds a VMA in mm_struct A mapping the same extent - * (at virtual address X) and adds an entry (for X) into mm_struct A's - * ->pte-mapped_thp[] array. - * (3) khugepaged calls khugepaged_collapse_scan_file() for mm_struct A at X, - * sees a pte-mapped THP (SCAN_PTE_MAPPED_HUGEPAGE) and adds an entry - * (for X) into mm_struct A's ->pte-mapped_thp[] array. - * Thus, it's possible the same address is added multiple times for the same - * mm_struct. Should this happen, we'll simply attempt - * collapse_pte_mapped_thp() multiple times for the same address, under the same - * exclusive mmap_lock, and assuming the first call is successful, subsequent - * attempts will return quickly (without grabbing any additional locks) when - * a huge pmd is found in find_pmd_or_thp_or_none(). Since this is a cheap - * check, and since this is a rare occurrence, the cost of preventing this - * "multiple-add" is thought to be more expensive than just handling it, should - * it occur. - */ -static bool khugepaged_add_pte_mapped_thp(struct mm_struct *mm, - unsigned long addr) -{ - struct khugepaged_mm_slot *mm_slot; - struct mm_slot *slot; - bool ret = false; - - VM_BUG_ON(addr & ~HPAGE_PMD_MASK); - - spin_lock(&khugepaged_mm_lock); - slot = mm_slot_lookup(mm_slots_hash, mm); - mm_slot = mm_slot_entry(slot, struct khugepaged_mm_slot, slot); - if (likely(mm_slot && mm_slot->nr_pte_mapped_thp < MAX_PTE_MAPPED_THP)) { - mm_slot->pte_mapped_thp[mm_slot->nr_pte_mapped_thp++] = addr; - ret = true; - } - spin_unlock(&khugepaged_mm_lock); - return ret; -} - /* hpage must be locked, and mmap_lock must be held */ static int set_huge_pmd(struct vm_area_struct *vma, unsigned long addr, pmd_t *pmdp, struct page *hpage) @@ -1706,29 +1654,6 @@ int collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr, return result; } -static void khugepaged_collapse_pte_mapped_thps(struct khugepaged_mm_slot *mm_slot) -{ - struct mm_slot *slot = &mm_slot->slot; - struct mm_struct *mm = slot->mm; - int i; - - if (likely(mm_slot->nr_pte_mapped_thp == 0)) - return; - - if (!mmap_write_trylock(mm)) - return; - - if (unlikely(hpage_collapse_test_exit(mm))) - goto out; - - for (i = 0; i < mm_slot->nr_pte_mapped_thp; i++) - collapse_pte_mapped_thp(mm, mm_slot->pte_mapped_thp[i], false); - -out: - mm_slot->nr_pte_mapped_thp = 0; - mmap_write_unlock(mm); -} - static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff) { struct vm_area_struct *vma; @@ -2372,16 +2297,6 @@ static int hpage_collapse_scan_file(struct mm_struct *mm, unsigned long addr, { BUILD_BUG(); } - -static void khugepaged_collapse_pte_mapped_thps(struct khugepaged_mm_slot *mm_slot) -{ -} - -static bool khugepaged_add_pte_mapped_thp(struct mm_struct *mm, - unsigned long addr) -{ - return false; -} #endif static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result, @@ -2411,7 +2326,6 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result, khugepaged_scan.mm_slot = mm_slot; } spin_unlock(&khugepaged_mm_lock); - khugepaged_collapse_pte_mapped_thps(mm_slot); mm = slot->mm; /* @@ -2464,36 +2378,29 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result, khugepaged_scan.address); mmap_read_unlock(mm); - *result = hpage_collapse_scan_file(mm, - khugepaged_scan.address, - file, pgoff, cc); mmap_locked = false; + *result = hpage_collapse_scan_file(mm, + khugepaged_scan.address, file, pgoff, cc); + if (*result == SCAN_PTE_MAPPED_HUGEPAGE) { + mmap_read_lock(mm); + mmap_locked = true; + if (hpage_collapse_test_exit(mm)) { + fput(file); + goto breakouterloop; + } + *result = collapse_pte_mapped_thp(mm, + khugepaged_scan.address, false); + if (*result == SCAN_PMD_MAPPED) + *result = SCAN_SUCCEED; + } fput(file); } else { *result = hpage_collapse_scan_pmd(mm, vma, - khugepaged_scan.address, - &mmap_locked, - cc); + khugepaged_scan.address, &mmap_locked, cc); } - switch (*result) { - case SCAN_PTE_MAPPED_HUGEPAGE: { - pmd_t *pmd; - *result = find_pmd_or_thp_or_none(mm, - khugepaged_scan.address, - &pmd); - if (*result != SCAN_SUCCEED) - break; - if (!khugepaged_add_pte_mapped_thp(mm, - khugepaged_scan.address)) - break; - } fallthrough; - case SCAN_SUCCEED: + if (*result == SCAN_SUCCEED) ++khugepaged_pages_collapsed; - break; - default: - break; - } /* move to next address */ khugepaged_scan.address += HPAGE_PMD_SIZE; From patchwork Tue Jun 20 07:59:48 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Hugh Dickins X-Patchwork-Id: 1796986 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=lists.ozlabs.org (client-ip=2404:9400:2:0:216:3eff:fee1:b9f1; helo=lists.ozlabs.org; envelope-from=linuxppc-dev-bounces+incoming=patchwork.ozlabs.org@lists.ozlabs.org; receiver=) Authentication-Results: legolas.ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=google.com header.i=@google.com header.a=rsa-sha256 header.s=20221208 header.b=PTqSZx2n; dkim-atps=neutral Received: from lists.ozlabs.org (lists.ozlabs.org [IPv6:2404:9400:2:0:216:3eff:fee1:b9f1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-384)) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4QlfFF3NjCz20XS for ; Tue, 20 Jun 2023 18:00:17 +1000 (AEST) Authentication-Results: lists.ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=google.com header.i=@google.com header.a=rsa-sha256 header.s=20221208 header.b=PTqSZx2n; dkim-atps=neutral Received: from boromir.ozlabs.org (localhost [IPv6:::1]) by lists.ozlabs.org (Postfix) with ESMTP id 4QlfFD6nHTz30fx for ; Tue, 20 Jun 2023 18:00:16 +1000 (AEST) X-Original-To: linuxppc-dev@lists.ozlabs.org Delivered-To: linuxppc-dev@lists.ozlabs.org Authentication-Results: lists.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=google.com header.i=@google.com header.a=rsa-sha256 header.s=20221208 header.b=PTqSZx2n; dkim-atps=neutral Authentication-Results: lists.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=google.com (client-ip=2607:f8b0:4864:20::1129; helo=mail-yw1-x1129.google.com; envelope-from=hughd@google.com; receiver=lists.ozlabs.org) Received: from mail-yw1-x1129.google.com (mail-yw1-x1129.google.com [IPv6:2607:f8b0:4864:20::1129]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by lists.ozlabs.org (Postfix) with ESMTPS id 4QlfDs25hqz2xG9 for ; Tue, 20 Jun 2023 17:59:57 +1000 (AEST) Received: by mail-yw1-x1129.google.com with SMTP id 00721157ae682-5701810884aso39722327b3.0 for ; Tue, 20 Jun 2023 00:59:56 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1687247995; x=1689839995; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:from:to:cc:subject:date:message-id:reply-to; bh=Aa5HWe45UW6rQUJiJNVLC4MPu6AngULI5k321Sbm9Ew=; b=PTqSZx2nNI4PV4JwXAt99NVosmlLWlQ/KYn1Uf1f3Yd1Z3NVe7obF1sAci/vRU4j/I /dODEkNukKEVJX5QJOgx8A4SVofs7E+EBk4aUKnEe/gxIGiuFO4K0lGYei4SR6tcYKnK AduYU9eLhwRb6vByQAsUQ6tUe6SjjlibwcGiUn9EmoAEYZk/fTkwOJxts/LviFdlBYyU 2HDG5towuNEQ5CvzS8P4/0FmCn98RyYyORjokeprTYbqe1PhIQYQex8ifvvnbwDxRbO6 McJrcKmHoDDBglMt6fs5aQdKWv2A5unXxMstk7k6pfo7pQ4FA/PbwXapoPL1ZYXTeDU7 ON7A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1687247995; x=1689839995; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=Aa5HWe45UW6rQUJiJNVLC4MPu6AngULI5k321Sbm9Ew=; b=gw/m8I5n2+kiURPqQLuBvdbafZxLrdBcOLEoD17xhe8FFUgQAHeUg3aI+LBa3FJg6/ WwYC3FpLVpkkWDPARBjBEp3JoUA4lWV7oUYz9swI0+qRBqZBMAzfMnT3M4fTbRyEhtLy Gh5l+Q5HxkCnWWSeWM1zxuveRLF7DSb1FAPKEbyMTE6IPXSR88HjAQXUtULJxmN+kUx5 ISMteF0fRr/OtYVspxobn02qT8KhJHJmMwPtyZWuWyopfgwUO85tlTzd6bfFa1M4Ir9/ IP4k5AQGnr+IrWtvv1mTZla9IdfvGlH+Dim1lYxru12v1Mf1hMEqaBzm6dcwh87j9ZP/ O7AQ== X-Gm-Message-State: AC+VfDyjXYM17yeI61l53d6+z6bqsM58U35yrx0VWsqtKe/GSlkXWBHn EMPrctD5u1szKNg+1/SpaCuwrg== X-Google-Smtp-Source: ACHHUZ515t86hS4ZGbNoo4Ty1vmElBXGedGoznKelWut2eqFES40KQtwKfY1cJoUB4hv3Ba9PnrZlQ== X-Received: by 2002:a5b:bc9:0:b0:bb4:14a2:fb4a with SMTP id c9-20020a5b0bc9000000b00bb414a2fb4amr2548922ybr.9.1687247993367; Tue, 20 Jun 2023 00:59:53 -0700 (PDT) Received: from ripple.attlocal.net (172-10-233-147.lightspeed.sntcca.sbcglobal.net. [172.10.233.147]) by smtp.gmail.com with ESMTPSA id v190-20020a257ac7000000b00ba88763e5b5sm268132ybc.2.2023.06.20.00.59.49 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 20 Jun 2023 00:59:53 -0700 (PDT) Date: Tue, 20 Jun 2023 00:59:48 -0700 (PDT) From: Hugh Dickins X-X-Sender: hugh@ripple.attlocal.net To: Andrew Morton Subject: [PATCH v2 12/12] mm: delete mmap_write_trylock() and vma_try_start_write() In-Reply-To: <54cb04f-3762-987f-8294-91dafd8ebfb0@google.com> Message-ID: <27505a8-e717-61ce-ab70-5f79d9bf646b@google.com> References: <54cb04f-3762-987f-8294-91dafd8ebfb0@google.com> MIME-Version: 1.0 X-BeenThere: linuxppc-dev@lists.ozlabs.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Linux on PowerPC Developers Mail List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Miaohe Lin , David Hildenbrand , Peter Zijlstra , Yang Shi , Peter Xu , linux-kernel@vger.kernel.org, Song Liu , sparclinux@vger.kernel.org, Alexander Gordeev , Claudio Imbrenda , Will Deacon , linux-s390@vger.kernel.org, Yu Zhao , Ira Weiny , Alistair Popple , Russell King , Matthew Wilcox , Steven Price , Christoph Hellwig , Jason Gunthorpe , "Aneesh Kumar K.V" , Huang Ying , Axel Rasmussen , Gerald Schaefer , Christian Borntraeger , Thomas Hellstrom , Ralph Campbell , Pasha Tatashin , Vasily Gorbik , Anshuman Khandual , Heiko Carstens , Qi Zheng , Suren Baghdasaryan , Vlastimil Babka , linux-arm-kernel@lists.infradead.org, SeongJae Park , Lorenzo Stoakes , Jann Horn , linux-mm@kvack.org, linuxppc-dev@lists.ozlabs.org, Naoya Horiguchi , Zack Rusin , Vishal Moola , Minchan Kim , "Kirill A. Shutemov" , Mel Gorman , "David S. Miller" , Mike Rapoport , Mike Kravetz Errors-To: linuxppc-dev-bounces+incoming=patchwork.ozlabs.org@lists.ozlabs.org Sender: "Linuxppc-dev" mmap_write_trylock() and vma_try_start_write() were added just for khugepaged, but now it has no use for them: delete. Signed-off-by: Hugh Dickins --- include/linux/mm.h | 17 ----------------- include/linux/mmap_lock.h | 10 ---------- 2 files changed, 27 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index 3c2e56980853..9b24f8fbf899 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -690,21 +690,6 @@ static inline void vma_start_write(struct vm_area_struct *vma) up_write(&vma->vm_lock->lock); } -static inline bool vma_try_start_write(struct vm_area_struct *vma) -{ - int mm_lock_seq; - - if (__is_vma_write_locked(vma, &mm_lock_seq)) - return true; - - if (!down_write_trylock(&vma->vm_lock->lock)) - return false; - - vma->vm_lock_seq = mm_lock_seq; - up_write(&vma->vm_lock->lock); - return true; -} - static inline void vma_assert_write_locked(struct vm_area_struct *vma) { int mm_lock_seq; @@ -730,8 +715,6 @@ static inline bool vma_start_read(struct vm_area_struct *vma) { return false; } static inline void vma_end_read(struct vm_area_struct *vma) {} static inline void vma_start_write(struct vm_area_struct *vma) {} -static inline bool vma_try_start_write(struct vm_area_struct *vma) - { return true; } static inline void vma_assert_write_locked(struct vm_area_struct *vma) {} static inline void vma_mark_detached(struct vm_area_struct *vma, bool detached) {} diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h index aab8f1b28d26..d1191f02c7fa 100644 --- a/include/linux/mmap_lock.h +++ b/include/linux/mmap_lock.h @@ -112,16 +112,6 @@ static inline int mmap_write_lock_killable(struct mm_struct *mm) return ret; } -static inline bool mmap_write_trylock(struct mm_struct *mm) -{ - bool ret; - - __mmap_lock_trace_start_locking(mm, true); - ret = down_write_trylock(&mm->mmap_lock) != 0; - __mmap_lock_trace_acquire_returned(mm, true, ret); - return ret; -} - static inline void mmap_write_unlock(struct mm_struct *mm) { __mmap_lock_trace_released(mm, true);