From patchwork Wed Jul 17 22:02:14 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 1961810
From: Peter Xu
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH RFC 1/6] mm/treewide: Remove pgd_devmap()
Date: Wed, 17 Jul 2024 18:02:14 -0400
Message-ID: <20240717220219.3743374-2-peterx@redhat.com>
X-Mailer: git-send-email 2.45.0
In-Reply-To: <20240717220219.3743374-1-peterx@redhat.com>
References: <20240717220219.3743374-1-peterx@redhat.com>
Cc: linux-arm-kernel@lists.infradead.org, linux-s390@vger.kernel.org,
 Alistair Popple, Ryan Roberts, David Hildenbrand, x86@kernel.org,
 Hugh Dickins, peterx@redhat.com, Michal Hocko, Alex Williamson,
 linux-riscv@lists.infradead.org, Matthew Wilcox, Jason Gunthorpe,
 sparclinux@vger.kernel.org, Axel Rasmussen, Andrew Morton,
 linuxppc-dev@lists.ozlabs.org, Dan Williams, Vlastimil Babka,
 Oscar Salvador

pgd_devmap() always returns 0 on every architecture, and there is no sign
that devmap will be supported even at the p4d level in the near future.
Remove it until it is actually needed.

Signed-off-by: Peter Xu
---
 arch/arm64/include/asm/pgtable.h             | 5 -----
 arch/powerpc/include/asm/book3s/64/pgtable.h | 5 -----
 arch/x86/include/asm/pgtable.h               | 5 -----
 include/linux/pgtable.h                      | 4 ----
 mm/gup.c                                     | 2 --
 5 files changed, 21 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index f8efbc128446..5d5d1b18b837 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -1119,11 +1119,6 @@ static inline int pud_devmap(pud_t pud)
 {
 	return 0;
 }
-
-static inline int pgd_devmap(pgd_t pgd)
-{
-	return 0;
-}
 #endif
 
 #ifdef CONFIG_PAGE_TABLE_CHECK
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 5da92ba68a45..051b1b6d729c 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -1431,11 +1431,6 @@ static inline int pud_devmap(pud_t pud)
 {
 	return pte_devmap(pud_pte(pud));
 }
-
-static inline int pgd_devmap(pgd_t pgd)
-{
-	return 0;
-}
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
 #define __HAVE_ARCH_PTEP_MODIFY_PROT_TRANSACTION
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 701593c53f3b..0d234f48ceeb 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -311,11 +311,6 @@ static inline int pud_devmap(pud_t pud)
 	return 0;
 }
 #endif
-
-static inline int pgd_devmap(pgd_t pgd)
-{
-	return 0;
-}
 #endif
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 2289e9f7aa1b..0a904300ac90 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -1626,10 +1626,6 @@ static inline int pud_devmap(pud_t pud)
 {
 	return 0;
 }
-static inline int pgd_devmap(pgd_t pgd)
-{
-	return 0;
-}
 #endif
 
 #if !defined(CONFIG_TRANSPARENT_HUGEPAGE) || \
diff --git a/mm/gup.c b/mm/gup.c
index 54d0dc3831fb..b023bcd38235 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -3149,8 +3149,6 @@ static int gup_fast_pgd_leaf(pgd_t orig, pgd_t *pgdp, unsigned long addr,
 	if (!pgd_access_permitted(orig, flags & FOLL_WRITE))
 		return 0;
 
-	BUILD_BUG_ON(pgd_devmap(orig));
-
 	page = pgd_page(orig);
 	refs = record_subpages(page, PGDIR_SIZE, addr, end, pages + *nr);
 

From patchwork Wed Jul 17 22:02:15 2024
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 1961807
From: Peter Xu
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH RFC 2/6] mm: PGTABLE_HAS_P[MU]D_LEAVES config options
Date: Wed, 17 Jul 2024 18:02:15 -0400
Message-ID: <20240717220219.3743374-3-peterx@redhat.com>
In-Reply-To: <20240717220219.3743374-1-peterx@redhat.com>
References: <20240717220219.3743374-1-peterx@redhat.com>

Introduce two more sub-options for PGTABLE_HAS_HUGE_LEAVES:

  - PGTABLE_HAS_PMD_LEAVES: set when there can be PMD mappings
  - PGTABLE_HAS_PUD_LEAVES: set when there can be PUD
mappings

It will help to identify whether the current build may only want PMD
helpers but not PUD ones, as these sub-options also check the arch support
via HAVE_ARCH_TRANSPARENT_HUGEPAGE[_PUD].

Note that having them depend on HAVE_ARCH_TRANSPARENT_HUGEPAGE[_PUD] is
still an intermediate step.  The best way is to have an option saying
"whether arch XXX supports PMD/PUD mappings" and so on.  However let's
leave that for later as that's the easy part.  So far, we use these options
to stably detect per-arch huge mapping support.

Signed-off-by: Peter Xu
---
 include/linux/huge_mm.h | 10 +++++++---
 mm/Kconfig              |  6 ++++++
 2 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 711632df7edf..37482c8445d1 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -96,14 +96,18 @@ extern struct kobj_attribute thpsize_shmem_enabled_attr;
 #define thp_vma_allowable_order(vma, vm_flags, tva_flags, order) \
 	(!!thp_vma_allowable_orders(vma, vm_flags, tva_flags, BIT(order)))
 
-#ifdef CONFIG_PGTABLE_HAS_HUGE_LEAVES
-#define HPAGE_PMD_SHIFT PMD_SHIFT
+#ifdef CONFIG_PGTABLE_HAS_PUD_LEAVES
 #define HPAGE_PUD_SHIFT PUD_SHIFT
 #else
-#define HPAGE_PMD_SHIFT ({ BUILD_BUG(); 0; })
 #define HPAGE_PUD_SHIFT ({ BUILD_BUG(); 0; })
 #endif
 
+#ifdef CONFIG_PGTABLE_HAS_PMD_LEAVES
+#define HPAGE_PMD_SHIFT PMD_SHIFT
+#else
+#define HPAGE_PMD_SHIFT ({ BUILD_BUG(); 0; })
+#endif
+
 #define HPAGE_PMD_ORDER (HPAGE_PMD_SHIFT-PAGE_SHIFT)
 #define HPAGE_PMD_NR (1<

X-Patchwork-Id: 1961812
From: Peter Xu
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH RFC 3/6] mm/treewide: Make pgtable-generic.c THP agnostic
Date: Wed, 17 Jul 2024 18:02:16 -0400
Message-ID: <20240717220219.3743374-4-peterx@redhat.com>
In-Reply-To: <20240717220219.3743374-1-peterx@redhat.com>
References: <20240717220219.3743374-1-peterx@redhat.com>

Make the pmd/pud helpers rely on the new PGTABLE_HAS_*_LEAVES options
rather than on THP alone, as THP is only one form of huge mapping.

Signed-off-by: Peter Xu
---
 arch/arm64/include/asm/pgtable.h             |  6 ++--
 arch/powerpc/include/asm/book3s/64/pgtable.h |  2 +-
 arch/powerpc/mm/book3s64/pgtable.c           |  2 +-
 arch/riscv/include/asm/pgtable.h             |  4 +--
 arch/s390/include/asm/pgtable.h              |  2 +-
 arch/s390/mm/pgtable.c                       |  4 +--
 arch/sparc/mm/tlb.c                          |  2 +-
 arch/x86/mm/pgtable.c                        | 15 ++++-----
 include/linux/mm_types.h                     |  2 +-
 include/linux/pgtable.h                      |  4 +--
 mm/memory.c                                  |  2 +-
 mm/pgtable-generic.c                         | 32 ++++++++++----------
 12 files changed, 40 insertions(+), 37 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 5d5d1b18b837..b93c03256ada 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -1105,7 +1105,7 @@ extern int __ptep_set_access_flags(struct vm_area_struct *vma,
 				 unsigned long address, pte_t *ptep,
 				 pte_t entry, int dirty);
 
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+#ifdef CONFIG_PGTABLE_HAS_PMD_LEAVES
 #define __HAVE_ARCH_PMDP_SET_ACCESS_FLAGS
 static inline int pmdp_set_access_flags(struct vm_area_struct *vma,
 					unsigned long address, pmd_t *pmdp,
@@ -1114,7 +1114,9 @@ static inline int pmdp_set_access_flags(struct vm_area_struct *vma,
 	return __ptep_set_access_flags(vma, address, (pte_t *)pmdp,
 					pmd_pte(entry), dirty);
 }
+#endif
 
+#ifdef CONFIG_PGTABLE_HAS_PUD_LEAVES
 static inline int pud_devmap(pud_t pud)
 {
 	return 0;
@@ -1178,7 +1180,7 @@ static inline int __ptep_clear_flush_young(struct vm_area_struct *vma,
 	return young;
 }
 
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+#ifdef CONFIG_PGTABLE_HAS_PMD_LEAVES
 #define __HAVE_ARCH_PMDP_TEST_AND_CLEAR_YOUNG
 static inline int pmdp_test_and_clear_young(struct vm_area_struct *vma,
 					    unsigned long address,
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 051b1b6d729c..84cf55e18334 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -1119,7 +1119,7 @@ static inline bool pmd_access_permitted(pmd_t pmd, bool write)
 	return pte_access_permitted(pmd_pte(pmd), write);
 }
 
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+#ifdef CONFIG_PGTABLE_HAS_PMD_LEAVES
 extern pmd_t pfn_pmd(unsigned long pfn, pgprot_t pgprot);
 extern pud_t pfn_pud(unsigned long pfn, pgprot_t pgprot);
 extern pmd_t mk_pmd(struct page *page, pgprot_t pgprot);
diff --git a/arch/powerpc/mm/book3s64/pgtable.c b/arch/powerpc/mm/book3s64/pgtable.c
index 5a4a75369043..d6a5457627df 100644
--- a/arch/powerpc/mm/book3s64/pgtable.c
+++ b/arch/powerpc/mm/book3s64/pgtable.c
@@ -37,7 +37,7 @@ EXPORT_SYMBOL(__pmd_frag_nr);
 unsigned long __pmd_frag_size_shift;
 EXPORT_SYMBOL(__pmd_frag_size_shift);
 
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+#ifdef CONFIG_PGTABLE_HAS_HUGE_LEAVES
 /*
  * This is called when relaxing access to a hugepage. It's also called in the page
  * fault path when we don't hit any of the major fault cases, ie, a minor
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index ebfe8faafb79..8c28f15f601b 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -752,7 +752,7 @@ static inline bool pud_user_accessible_page(pud_t pud)
 }
 #endif
 
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+#ifdef CONFIG_PGTABLE_HAS_PMD_LEAVES
 static inline int pmd_trans_huge(pmd_t pmd)
 {
 	return pmd_leaf(pmd);
@@ -802,7 +802,7 @@ static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
 #define pmdp_collapse_flush pmdp_collapse_flush
 extern pmd_t pmdp_collapse_flush(struct vm_area_struct *vma,
 				 unsigned long address, pmd_t *pmdp);
-#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
+#endif /* CONFIG_PGTABLE_HAS_PMD_LEAVES */
 
 /*
  * Encode/decode swap entries and swap PTEs. Swap PTEs are all PTEs that
diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index fb6870384b97..398bbed20dee 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -1710,7 +1710,7 @@ pmd_t pmdp_xchg_direct(struct mm_struct *, unsigned long, pmd_t *, pmd_t);
 pmd_t pmdp_xchg_lazy(struct mm_struct *, unsigned long, pmd_t *, pmd_t);
 pud_t pudp_xchg_direct(struct mm_struct *, unsigned long, pud_t *, pud_t);
 
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+#ifdef CONFIG_PGTABLE_HAS_PMD_LEAVES
 
 #define __HAVE_ARCH_PGTABLE_DEPOSIT
 void pgtable_trans_huge_deposit(struct mm_struct *mm, pmd_t *pmdp,
diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c
index 2c944bafb030..c4481068734e 100644
--- a/arch/s390/mm/pgtable.c
+++ b/arch/s390/mm/pgtable.c
@@ -561,7 +561,7 @@ pud_t pudp_xchg_direct(struct mm_struct *mm, unsigned long addr,
 }
 EXPORT_SYMBOL(pudp_xchg_direct);
 
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+#ifdef CONFIG_PGTABLE_HAS_PMD_LEAVES
 void pgtable_trans_huge_deposit(struct mm_struct *mm, pmd_t *pmdp,
 				pgtable_t pgtable)
 {
@@ -600,7 +600,7 @@ pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp)
 	set_pte(ptep, __pte(_PAGE_INVALID));
 	return pgtable;
 }
-#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
+#endif /* CONFIG_PGTABLE_HAS_PMD_LEAVES */
 
 #ifdef CONFIG_PGSTE
 void ptep_set_pte_at(struct mm_struct *mm, unsigned long addr,
diff --git a/arch/sparc/mm/tlb.c b/arch/sparc/mm/tlb.c
index 8648a50afe88..140813d07c9f 100644
--- a/arch/sparc/mm/tlb.c
+++ b/arch/sparc/mm/tlb.c
@@ -143,7 +143,7 @@ void tlb_batch_add(struct mm_struct *mm, unsigned long vaddr,
 	tlb_batch_add_one(mm, vaddr, pte_exec(orig), hugepage_shift);
 }
 
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+#ifdef CONFIG_PGTABLE_HAS_PMD_LEAVES
 static void tlb_batch_pmd_scan(struct mm_struct *mm, unsigned long vaddr,
 			       pmd_t pmd)
 {
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index fa77411bb266..7b10d4a0c0cd 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -511,7 +511,7 @@ int ptep_set_access_flags(struct vm_area_struct *vma,
 	return changed;
 }
 
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+#ifdef CONFIG_PGTABLE_HAS_PMD_LEAVES
 int pmdp_set_access_flags(struct vm_area_struct *vma,
 			  unsigned long address, pmd_t *pmdp,
 			  pmd_t entry, int dirty)
@@ -532,7 +532,9 @@ int pmdp_set_access_flags(struct vm_area_struct *vma,
 
 	return changed;
 }
+#endif /* PGTABLE_HAS_PMD_LEAVES */
 
+#ifdef CONFIG_PGTABLE_HAS_PUD_LEAVES
 int pudp_set_access_flags(struct vm_area_struct *vma, unsigned long address,
 			  pud_t *pudp, pud_t entry, int dirty)
 {
@@ -552,7 +554,7 @@ int pudp_set_access_flags(struct vm_area_struct *vma, unsigned long address,
 
 	return changed;
 }
-#endif
+#endif /* PGTABLE_HAS_PUD_LEAVES */
 
 int ptep_test_and_clear_young(struct vm_area_struct *vma,
 			      unsigned long addr, pte_t *ptep)
@@ -566,7 +568,7 @@ int ptep_test_and_clear_young(struct vm_area_struct *vma,
 	return ret;
 }
 
-#if defined(CONFIG_TRANSPARENT_HUGEPAGE) || defined(CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG)
+#if defined(CONFIG_PGTABLE_HAS_PMD_LEAVES) || defined(CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG)
 int pmdp_test_and_clear_young(struct vm_area_struct *vma,
 			      unsigned long addr, pmd_t *pmdp)
 {
@@ -580,7 +582,7 @@ int pmdp_test_and_clear_young(struct vm_area_struct *vma,
 }
 #endif
 
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+#ifdef CONFIG_PGTABLE_HAS_PUD_LEAVES
 int pudp_test_and_clear_young(struct vm_area_struct *vma,
 			      unsigned long addr, pud_t *pudp)
 {
@@ -613,7 +615,7 @@ int ptep_clear_flush_young(struct vm_area_struct *vma,
 	return ptep_test_and_clear_young(vma, address, ptep);
 }
 
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+#ifdef CONFIG_PGTABLE_HAS_PMD_LEAVES
 int pmdp_clear_flush_young(struct vm_area_struct *vma,
 			   unsigned long address, pmd_t *pmdp)
 {
@@ -641,8 +643,7 @@ pmd_t pmdp_invalidate_ad(struct vm_area_struct *vma, unsigned long address,
 }
 #endif
 
-#if defined(CONFIG_TRANSPARENT_HUGEPAGE) && \
-	defined(CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD)
+#ifdef CONFIG_PGTABLE_HAS_PUD_LEAVES
 pud_t pudp_invalidate(struct vm_area_struct *vma, unsigned long address,
 		      pud_t *pudp)
 {
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index ef09c4eef6d3..44ef91ce720c 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -942,7 +942,7 @@ struct mm_struct {
 #ifdef CONFIG_MMU_NOTIFIER
 		struct mmu_notifier_subscriptions *notifier_subscriptions;
 #endif
-#if defined(CONFIG_TRANSPARENT_HUGEPAGE) && !USE_SPLIT_PMD_PTLOCKS
+#if defined(CONFIG_PGTABLE_HAS_PMD_LEAVES) && !USE_SPLIT_PMD_PTLOCKS
 		pgtable_t pmd_huge_pte; /* protected by page_table_lock */
 #endif
 #ifdef CONFIG_NUMA_BALANCING
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 0a904300ac90..5a5aaee5fa1c 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -362,7 +362,7 @@ static inline int ptep_test_and_clear_young(struct vm_area_struct *vma,
 #endif
 
 #ifndef __HAVE_ARCH_PMDP_TEST_AND_CLEAR_YOUNG
-#if defined(CONFIG_TRANSPARENT_HUGEPAGE) || defined(CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG)
+#if defined(CONFIG_PGTABLE_HAS_PMD_LEAVES) || defined(CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG)
 static inline int pmdp_test_and_clear_young(struct vm_area_struct *vma,
 					    unsigned long address,
 					    pmd_t *pmdp)
@@ -383,7 +383,7 @@ static inline int pmdp_test_and_clear_young(struct vm_area_struct *vma,
 	BUILD_BUG();
 	return 0;
 }
-#endif /* CONFIG_TRANSPARENT_HUGEPAGE || CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG */
+#endif /* CONFIG_PGTABLE_HAS_PMD_LEAVES || CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG */
 #endif
 
 #ifndef __HAVE_ARCH_PTEP_CLEAR_YOUNG_FLUSH
diff --git a/mm/memory.c b/mm/memory.c
index 802d0d8a40f9..126ee0903c79 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -666,7 +666,7 @@ struct folio *vm_normal_folio(struct vm_area_struct *vma, unsigned long addr,
 	return NULL;
 }
 
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+#ifdef CONFIG_PGTABLE_HAS_PMD_LEAVES
 struct page *vm_normal_page_pmd(struct vm_area_struct *vma, unsigned long addr,
 				pmd_t pmd)
 {
diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
index
a78a4adf711a..e9fc3f6774a6 100644 --- a/mm/pgtable-generic.c +++ b/mm/pgtable-generic.c @@ -103,7 +103,7 @@ pte_t ptep_clear_flush(struct vm_area_struct *vma, unsigned long address, } #endif -#ifdef CONFIG_TRANSPARENT_HUGEPAGE +#ifdef CONFIG_PGTABLE_HAS_PMD_LEAVES #ifndef __HAVE_ARCH_PMDP_SET_ACCESS_FLAGS int pmdp_set_access_flags(struct vm_area_struct *vma, @@ -145,20 +145,6 @@ pmd_t pmdp_huge_clear_flush(struct vm_area_struct *vma, unsigned long address, flush_pmd_tlb_range(vma, address, address + HPAGE_PMD_SIZE); return pmd; } - -#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD -pud_t pudp_huge_clear_flush(struct vm_area_struct *vma, unsigned long address, - pud_t *pudp) -{ - pud_t pud; - - VM_BUG_ON(address & ~HPAGE_PUD_MASK); - VM_BUG_ON(!pud_trans_huge(*pudp) && !pud_devmap(*pudp)); - pud = pudp_huge_get_and_clear(vma->vm_mm, address, pudp); - flush_pud_tlb_range(vma, address, address + HPAGE_PUD_SIZE); - return pud; -} -#endif #endif #ifndef __HAVE_ARCH_PGTABLE_DEPOSIT @@ -252,7 +238,21 @@ void pte_free_defer(struct mm_struct *mm, pgtable_t pgtable) call_rcu(&page->rcu_head, pte_free_now); } #endif /* pte_free_defer */ -#endif /* CONFIG_TRANSPARENT_HUGEPAGE */ +#endif /* CONFIG_PGTABLE_HAS_PMD_LEAVES */ + +#ifdef CONFIG_PGTABLE_HAS_PUD_LEAVES +pud_t pudp_huge_clear_flush(struct vm_area_struct *vma, unsigned long address, + pud_t *pudp) +{ + pud_t pud; + + VM_BUG_ON(address & ~HPAGE_PUD_MASK); + VM_BUG_ON(!pud_trans_huge(*pudp) && !pud_devmap(*pudp)); + pud = pudp_huge_get_and_clear(vma->vm_mm, address, pudp); + flush_pud_tlb_range(vma, address, address + HPAGE_PUD_SIZE); + return pud; +} +#endif /* CONFIG_PGTABLE_HAS_PUD_LEAVES */ #if defined(CONFIG_GUP_GET_PXX_LOW_HIGH) && \ (defined(CONFIG_SMP) || defined(CONFIG_PREEMPT_RCU)) From patchwork Wed Jul 17 22:02:17 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 1961813 Return-Path: X-Original-To: 
incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=fail reason="signature verification failed" (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=UZJZGZQI; dkim=fail reason="signature verification failed" (1024-bit key) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=JfQlVEUV; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=lists.ozlabs.org (client-ip=2404:9400:2:0:216:3eff:fee1:b9f1; helo=lists.ozlabs.org; envelope-from=linuxppc-dev-bounces+incoming=patchwork.ozlabs.org@lists.ozlabs.org; receiver=patchwork.ozlabs.org) Received: from lists.ozlabs.org (lists.ozlabs.org [IPv6:2404:9400:2:0:216:3eff:fee1:b9f1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1)) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4WPVPD2f4Lz1yY1 for ; Thu, 18 Jul 2024 08:05:36 +1000 (AEST) Authentication-Results: lists.ozlabs.org; dkim=fail reason="signature verification failed" (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=UZJZGZQI; dkim=fail reason="signature verification failed" (1024-bit key) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=JfQlVEUV; dkim-atps=neutral Received: from boromir.ozlabs.org (localhost [IPv6:::1]) by lists.ozlabs.org (Postfix) with ESMTP id 4WPVPD1XLVz3cBN for ; Thu, 18 Jul 2024 08:05:36 +1000 (AEST) X-Original-To: linuxppc-dev@lists.ozlabs.org Delivered-To: linuxppc-dev@lists.ozlabs.org Authentication-Results: lists.ozlabs.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: lists.ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=redhat.com 
header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=UZJZGZQI; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=JfQlVEUV; dkim-atps=neutral Authentication-Results: lists.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=redhat.com (client-ip=170.10.133.124; helo=us-smtp-delivery-124.mimecast.com; envelope-from=peterx@redhat.com; receiver=lists.ozlabs.org) Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by lists.ozlabs.org (Postfix) with ESMTPS id 4WPVKm1mwyz3dGt for ; Thu, 18 Jul 2024 08:02:36 +1000 (AEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1721253753; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=zJ8eA1hPxoK8G44ES49PoyN6H9fuen7ZiOJ63tJzju0=; b=UZJZGZQI5N1saBiwJ6j2rxTh3Yq5/Dk89IZ2y4ngk+myHoaK8F2cL9hwSfgTA3IrXvlp7v jh7NHmDlvCue842vJv/i6yQkbqPYA5fQ7/N5A5qIy0XKGdzoaIXkiCEsDNuRyrUbM7UXsi 7SARwxd+DhCPjKxII/5Fha8mHP0ek8Q= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1721253754; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=zJ8eA1hPxoK8G44ES49PoyN6H9fuen7ZiOJ63tJzju0=; b=JfQlVEUVMOPpY652T1C1RBr+zzBkkHWWDdi4G1t5rg7cjR44Tm5rzY9tszfSWD5it2+wwg qitRdNyvVWN1WxDFZhZfQQjHkdSnYoMwsK0HHia5MQlpvtELx2a4Dyw3WdlFJ4m55BVLH0 pRn2ggEuIZF41b3Yb3a3u946NNMh9Fo= Received: from 
mail-qv1-f71.google.com (mail-qv1-f71.google.com [209.85.219.71]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-320-trzvn2cTMqiG-YKsGzdTMg-1; Wed, 17 Jul 2024 18:02:31 -0400 X-MC-Unique: trzvn2cTMqiG-YKsGzdTMg-1 Received: by mail-qv1-f71.google.com with SMTP id 6a1803df08f44-6b79c5c972eso514236d6.1 for ; Wed, 17 Jul 2024 15:02:31 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1721253751; x=1721858551; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=zJ8eA1hPxoK8G44ES49PoyN6H9fuen7ZiOJ63tJzju0=; b=dKDzB1vu1K8ehi44G3fjDAaMQ2UU9CkILZuTR+2fHXuz9H0eKQh3IlOTVXNilIScb3 2mWegZMP4PIl7V32rDprn8KKztUi6X3X6IXwcW4lCD7eX3GsWsrLnV8qp0kerJocadI/ mP4UVYtEDcNHiP6Jsllq+HTynxqHKcZTbOlBhJ3E/KggTRAAZ/tS50udWjEVhueHZdEX nG3JmWZNgavT1hTR1V+9KJ8vOxOpgWTIotmVX+Ujln8MBMLCOWa9LMaBHw5zdEIKsvQ1 I0MG0DVd2MCvz8anyJxRivG16BfPuEZFIZQAi4P9GA8ygxJAbhZAeEnwG/e8T/je6lXk J6LQ== X-Forwarded-Encrypted: i=1; AJvYcCV62UBgRo+xOcQg3vBdRhrL13BdWve6x6WPMnFlXw6lrRV4N4VqHqxhMXDJA2+OO404zFeips9kx94+Z+2XiYrXotdnP+Tc8G+VRSn3LQ== X-Gm-Message-State: AOJu0YyxatKZyPEniyYCiFf4ijM3aQxlfIJvyWLTgwktV/XphZfju/Lc /4EV+7bHuYsdy+A0q6/BtcfiQ6cTlKgY9gOGdOS6JzbaEilkB2q8KXl3Kawvhl2PvS5IApRVzQn ubrxJffz1aAAb6qjiOah3SyRChLGenenlnTtz/Oegtd0MEFnoO994rw1e5E7xEUc= X-Received: by 2002:a05:622a:164b:b0:44e:cff7:3743 with SMTP id d75a77b69052e-44f86e7339emr21652001cf.9.1721253751287; Wed, 17 Jul 2024 15:02:31 -0700 (PDT) X-Google-Smtp-Source: AGHT+IHgaCdw1UOkSL+KIFT2eMjRU01W5DATHpkzikOyR2irQlfZJvXc4O4/CXTqU/rdsuwB1sVYnA== X-Received: by 2002:a05:622a:164b:b0:44e:cff7:3743 with SMTP id d75a77b69052e-44f86e7339emr21651491cf.9.1721253750364; Wed, 17 Jul 2024 15:02:30 -0700 (PDT) Received: from x1n.redhat.com (pool-99-254-121-117.cpe.net.cable.rogers.com. 
[99.254.121.117]) by smtp.gmail.com with ESMTPSA id d75a77b69052e-44f5b83f632sm53071651cf.85.2024.07.17.15.02.28 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 17 Jul 2024 15:02:29 -0700 (PDT) From: Peter Xu To: linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH RFC 4/6] mm: Move huge mapping declarations from internal.h to huge_mm.h Date: Wed, 17 Jul 2024 18:02:17 -0400 Message-ID: <20240717220219.3743374-5-peterx@redhat.com> X-Mailer: git-send-email 2.45.0 In-Reply-To: <20240717220219.3743374-1-peterx@redhat.com> References: <20240717220219.3743374-1-peterx@redhat.com> MIME-Version: 1.0 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com X-BeenThere: linuxppc-dev@lists.ozlabs.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Linux on PowerPC Developers Mail List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: linux-arm-kernel@lists.infradead.org, linux-s390@vger.kernel.org, Alistair Popple , Ryan Roberts , David Hildenbrand , x86@kernel.org, Hugh Dickins , peterx@redhat.com, Michal Hocko , Alex Williamson , linux-riscv@lists.infradead.org, Matthew Wilcox , Jason Gunthorpe , sparclinux@vger.kernel.org, Axel Rasmussen , Andrew Morton , linuxppc-dev@lists.ozlabs.org, Dan Williams , Vlastimil Babka , Oscar Salvador Errors-To: linuxppc-dev-bounces+incoming=patchwork.ozlabs.org@lists.ozlabs.org Sender: "Linuxppc-dev"

Most helpers related to huge mappings are declared in huge_mm.h, not internal.h. Move the few remaining ones from internal.h into huge_mm.h. Moving pmd_needs_soft_dirty_wp() also requires moving vma_soft_dirty_enabled() into mm.h, because it will later be needed by two headers (internal.h and huge_mm.h).
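For illustration only (not part of this patch): a minimal userspace sketch of the check that vma_soft_dirty_enabled() performs, showing why the "is the feature built in" test has to come before the flag test. All mock_* names are hypothetical, and the flag value is an assumption used only to make the example concrete.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Userspace model of the check in vma_soft_dirty_enabled(). The mock_*
 * names and the flag values are illustrative assumptions, not kernel API.
 */
#define MOCK_VM_SOFTDIRTY_ON	0x08000000UL	/* flag bit when soft-dirty is built in */
#define MOCK_VM_SOFTDIRTY_OFF	0x0UL		/* flag is 0 when it is compiled out */

/* Naive form: only looks at the flag bit. */
static bool mock_naive_enabled(unsigned long vm_flags, unsigned long softdirty_bit)
{
	return !(vm_flags & softdirty_bit);
}

/* Guarded form: check whether the feature is built in first. */
static bool mock_guarded_enabled(bool built_in, unsigned long vm_flags,
				 unsigned long softdirty_bit)
{
	if (!built_in)
		return false;
	/* Tracking is active while the flag is NOT set on the vma. */
	return !(vm_flags & softdirty_bit);
}

/* Same shape as pmd_needs_soft_dirty_wp(): tracking on, entry not yet soft-dirty. */
static bool mock_needs_soft_dirty_wp(bool tracking, bool entry_soft_dirty)
{
	return tracking && !entry_soft_dirty;
}

static void mock_soft_dirty_selfcheck(void)
{
	/* Compiled out: the naive form wrongly reports tracking as enabled. */
	assert(mock_naive_enabled(0, MOCK_VM_SOFTDIRTY_OFF) == true);
	assert(mock_guarded_enabled(false, 0, MOCK_VM_SOFTDIRTY_OFF) == false);

	/* Built in: flag clear means tracking, flag set means not tracking. */
	assert(mock_guarded_enabled(true, 0, MOCK_VM_SOFTDIRTY_ON) == true);
	assert(mock_guarded_enabled(true, MOCK_VM_SOFTDIRTY_ON,
				    MOCK_VM_SOFTDIRTY_ON) == false);
}
```

Both the pmd and pte variants of the "needs soft-dirty write-protect" helper build on this one predicate, which is why it has to be visible from every header that defines such a variant.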
Signed-off-by: Peter Xu --- include/linux/huge_mm.h | 10 ++++++++++ include/linux/mm.h | 18 ++++++++++++++++++ mm/internal.h | 33 --------------------------------- 3 files changed, 28 insertions(+), 33 deletions(-) diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index 37482c8445d1..d8b642ad512d 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -8,6 +8,11 @@ #include /* only for vma_is_dax() */ #include +void touch_pud(struct vm_area_struct *vma, unsigned long addr, + pud_t *pud, bool write); +void touch_pmd(struct vm_area_struct *vma, unsigned long addr, + pmd_t *pmd, bool write); +pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma); vm_fault_t do_huge_pmd_anonymous_page(struct vm_fault *vmf); int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm, pmd_t *dst_pmd, pmd_t *src_pmd, unsigned long addr, @@ -629,4 +634,9 @@ static inline int split_folio_to_order(struct folio *folio, int new_order) #define split_folio_to_list(f, l) split_folio_to_list_to_order(f, l, 0) #define split_folio(f) split_folio_to_order(f, 0) +static inline bool pmd_needs_soft_dirty_wp(struct vm_area_struct *vma, pmd_t pmd) +{ + return vma_soft_dirty_enabled(vma) && !pmd_soft_dirty(pmd); +} + #endif /* _LINUX_HUGE_MM_H */ diff --git a/include/linux/mm.h b/include/linux/mm.h index 5f1075d19600..fa10802d8faa 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1117,6 +1117,24 @@ static inline unsigned int folio_order(struct folio *folio) return folio->_flags_1 & 0xff; } +static inline bool vma_soft_dirty_enabled(struct vm_area_struct *vma) +{ + /* + * NOTE: we must check this before VM_SOFTDIRTY on soft-dirty + * enablements, because when without soft-dirty being compiled in, + * VM_SOFTDIRTY is defined as 0x0, then !(vm_flags & VM_SOFTDIRTY) + * will be constantly true. + */ + if (!IS_ENABLED(CONFIG_MEM_SOFT_DIRTY)) + return false; + + /* + * Soft-dirty is kind of special: its tracking is enabled when the + * vma flags not set. 
+ */ + return !(vma->vm_flags & VM_SOFTDIRTY); +} + #include /* diff --git a/mm/internal.h b/mm/internal.h index b4d86436565b..e49941747749 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -917,8 +917,6 @@ bool need_mlock_drain(int cpu); void mlock_drain_local(void); void mlock_drain_remote(int cpu); -extern pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma); - /** * vma_address - Find the virtual address a page range is mapped at * @vma: The vma which maps this object. @@ -1229,14 +1227,6 @@ int migrate_device_coherent_page(struct page *page); int __must_check try_grab_folio(struct folio *folio, int refs, unsigned int flags); -/* - * mm/huge_memory.c - */ -void touch_pud(struct vm_area_struct *vma, unsigned long addr, - pud_t *pud, bool write); -void touch_pmd(struct vm_area_struct *vma, unsigned long addr, - pmd_t *pmd, bool write); - /* * mm/mmap.c */ @@ -1342,29 +1332,6 @@ static __always_inline void vma_set_range(struct vm_area_struct *vma, vma->vm_pgoff = pgoff; } -static inline bool vma_soft_dirty_enabled(struct vm_area_struct *vma) -{ - /* - * NOTE: we must check this before VM_SOFTDIRTY on soft-dirty - * enablements, because when without soft-dirty being compiled in, - * VM_SOFTDIRTY is defined as 0x0, then !(vm_flags & VM_SOFTDIRTY) - * will be constantly true. - */ - if (!IS_ENABLED(CONFIG_MEM_SOFT_DIRTY)) - return false; - - /* - * Soft-dirty is kind of special: its tracking is enabled when the - * vma flags not set. 
- */ - return !(vma->vm_flags & VM_SOFTDIRTY); -} - -static inline bool pmd_needs_soft_dirty_wp(struct vm_area_struct *vma, pmd_t pmd) -{ - return vma_soft_dirty_enabled(vma) && !pmd_soft_dirty(pmd); -} - static inline bool pte_needs_soft_dirty_wp(struct vm_area_struct *vma, pte_t pte) { return vma_soft_dirty_enabled(vma) && !pte_soft_dirty(pte); From patchwork Wed Jul 17 22:02:18 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 1961815 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=fail reason="signature verification failed" (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=VOlm/XVE; dkim=fail reason="signature verification failed" (1024-bit key) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=EAuUkb1Q; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=lists.ozlabs.org (client-ip=112.213.38.117; helo=lists.ozlabs.org; envelope-from=linuxppc-dev-bounces+incoming=patchwork.ozlabs.org@lists.ozlabs.org; receiver=patchwork.ozlabs.org) Received: from lists.ozlabs.org (lists.ozlabs.org [112.213.38.117]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1)) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4WPVQv2zrLz1yY1 for ; Thu, 18 Jul 2024 08:07:03 +1000 (AEST) Authentication-Results: lists.ozlabs.org; dkim=fail reason="signature verification failed" (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=VOlm/XVE; dkim=fail reason="signature verification failed" (1024-bit key) header.d=redhat.com header.i=@redhat.com 
header.a=rsa-sha256 header.s=mimecast20190719 header.b=EAuUkb1Q; dkim-atps=neutral Received: from boromir.ozlabs.org (localhost [IPv6:::1]) by lists.ozlabs.org (Postfix) with ESMTP id 4WPVQv1wsnz3cNB for ; Thu, 18 Jul 2024 08:07:03 +1000 (AEST) X-Original-To: linuxppc-dev@lists.ozlabs.org Delivered-To: linuxppc-dev@lists.ozlabs.org Authentication-Results: lists.ozlabs.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: lists.ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=VOlm/XVE; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=EAuUkb1Q; dkim-atps=neutral Authentication-Results: lists.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=redhat.com (client-ip=170.10.129.124; helo=us-smtp-delivery-124.mimecast.com; envelope-from=peterx@redhat.com; receiver=lists.ozlabs.org) Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by lists.ozlabs.org (Postfix) with ESMTPS id 4WPVKt3Bsnz3cbX for ; Thu, 18 Jul 2024 08:02:42 +1000 (AEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1721253758; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=UK+6jdtPUvq+CqmhBfq9bfGaFmN7od9JwWK0He3qEgI=; b=VOlm/XVEQv1w+8cG1/huqZ6PENNtkyCZruAvvAh9gtogaR4RFKamYdJ+ZyH0w0sciB3jpE 48fsRAX79jPK+FtqUcOrjv65vzGbcCJhPm10U79INu6zQZz7aVYd/BIFFCuYP5C3ggWYdZ VYwE1knVPcw+72k9SKApXBOzHoBGJ8Y= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; 
d=redhat.com; s=mimecast20190719; t=1721253759; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=UK+6jdtPUvq+CqmhBfq9bfGaFmN7od9JwWK0He3qEgI=; b=EAuUkb1QZaupZ2P2BtaoTK92Ry5PO0A9L+/MEypTGHUn4FpxsFiUlKPf3vQmSpPMBZRp0X Ik1iebhkICXS0CsuU2GGlfXP/z5esf2Updhkx15JQV42tJ8Lno/66ZfJzhHDJhr+ZHgczS L7+kehDz5bPDw1TWmKO9HDEaQ8lPqik= Received: from mail-qv1-f71.google.com (mail-qv1-f71.google.com [209.85.219.71]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-195-s8eU9wbNPI2n2E2SEZvy9A-1; Wed, 17 Jul 2024 18:02:35 -0400 X-MC-Unique: s8eU9wbNPI2n2E2SEZvy9A-1 Received: by mail-qv1-f71.google.com with SMTP id 6a1803df08f44-6b792d6fe5bso618606d6.2 for ; Wed, 17 Jul 2024 15:02:34 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1721253754; x=1721858554; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=UK+6jdtPUvq+CqmhBfq9bfGaFmN7od9JwWK0He3qEgI=; b=jNZBzB1Kvv0WQ5ZBaG5PkBK3kw0qrIycwmjsxLY1pgK7pNgidrxtXKvelaPa1s8IDy 82xuCic8Y1hL+9UQ0q1NpX/kWfhiVU/pXhlebLhYrsI9p5RmfQ6U6wlZX1t+oBipYXRz rAn2pNMota3MGQFNnPXT0YjhTWMnz88hcjvdWD38wz5ZiQGeAUjq1TW0YokC3Hj7V00G Wp+5eXWJnvW8VPK6CQpKsfa7/BNfD/PrIVMFst1pRfx98J5X40dRsfL9WUh4CVLya97X dq08ClkMHQJYSELCPUAGhv9v2cS21smHBb7uhS8vPngHL6bIkozBOMjF9nIbzasKGHQQ AKMQ== X-Forwarded-Encrypted: i=1; AJvYcCVgyiTzGCAfr3BQvM70+mwPgzxepMUby0Zq1bU9c13K0zktmaCYelJWPSzOufNAtErQSfXSQDkFGw3IFqdcF0hfSYzzSbeU+usxx6yivw== X-Gm-Message-State: AOJu0YyJkJwdNUKBCM5D7SOFTYwbDlv4/MP37SV1R+DhsWbUG9QJ6qv4 2+BTd6UgH9bcKwWZQetnlvKIj/twBx3gOwdvrdoElcUwdu92lqTTdULKS7MqqEIiSCskgJWh2Ej 7R9kxim9y8XCN9GCfULUac6NpQwXeEHHodwhuHsYpVzugUc77JpcxbOquBWYYnkE= X-Received: by 
2002:a05:622a:178f:b0:447:f3d8:e394 with SMTP id d75a77b69052e-44f86186c77mr22821021cf.2.1721253753692; Wed, 17 Jul 2024 15:02:33 -0700 (PDT) X-Google-Smtp-Source: AGHT+IHjoecSnH17WsLUAfbk8D5niviaBnv9p9n7X7ysSM2hooHGbHoRt5hE2vvjgIUgbHso/7wvBQ== X-Received: by 2002:a05:622a:178f:b0:447:f3d8:e394 with SMTP id d75a77b69052e-44f86186c77mr22820481cf.2.1721253752855; Wed, 17 Jul 2024 15:02:32 -0700 (PDT) Received: from x1n.redhat.com (pool-99-254-121-117.cpe.net.cable.rogers.com. [99.254.121.117]) by smtp.gmail.com with ESMTPSA id d75a77b69052e-44f5b83f632sm53071651cf.85.2024.07.17.15.02.30 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 17 Jul 2024 15:02:32 -0700 (PDT) From: Peter Xu To: linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH RFC 5/6] mm/huge_mapping: Create huge_mapping_pxx.c Date: Wed, 17 Jul 2024 18:02:18 -0400 Message-ID: <20240717220219.3743374-6-peterx@redhat.com> X-Mailer: git-send-email 2.45.0 In-Reply-To: <20240717220219.3743374-1-peterx@redhat.com> References: <20240717220219.3743374-1-peterx@redhat.com> MIME-Version: 1.0 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com X-BeenThere: linuxppc-dev@lists.ozlabs.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Linux on PowerPC Developers Mail List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: linux-arm-kernel@lists.infradead.org, linux-s390@vger.kernel.org, Alistair Popple , Ryan Roberts , David Hildenbrand , x86@kernel.org, Hugh Dickins , peterx@redhat.com, Michal Hocko , Alex Williamson , linux-riscv@lists.infradead.org, Matthew Wilcox , Jason Gunthorpe , sparclinux@vger.kernel.org, Axel Rasmussen , Andrew Morton , linuxppc-dev@lists.ozlabs.org, Dan Williams , Vlastimil Babka , Oscar Salvador Errors-To: linuxppc-dev-bounces+incoming=patchwork.ozlabs.org@lists.ozlabs.org Sender: "Linuxppc-dev"

At some point, we need to decouple "huge mapping" from THP, for the sake of any non-THP huge mappings in the future (hugetlb, pfnmap, etc.).
This is the first step towards it.

Or rather, we already started doing this when the PGTABLE_HAS_HUGE_LEAVES option was introduced: that was the first time Linux started to describe LEAVEs rather than THPs when talking about huge mappings. Before that, almost any huge mapping had THP involved, devmap included. Hugetlb is special only because we duplicated the whole world there, but we also have a demand to decouple that now.

Linux has huge_memory.c, which only compiles with THP enabled; I wish it had been called thp.c from the start. In reality, it contains more than THP processing: any huge mapping (even one that does not fall into the THP category) could leverage many of these helpers, but unfortunately the file is not compiled if !THP. These helpers are mostly about pgtable operations, which may not depend on what type of huge folio (e.g. THP) is underneath, or even on whether there is a vmemmap to back it.

It's better to move them out of the THP world.

Create a new set of files, huge_mapping_p[mu]d.c. This patch starts by moving quite a few essential helpers from huge_memory.c into these new files, so that they compile based on PGTABLE_HAS_PXX_LEAVES rather than THP. Split them into two files by page table level so that e.g. archs that only support PMD huge mappings can avoid compiling the whole -pud file, in the hope of reducing the size of the objects compiled and linked.

No functional change intended, only code movement. That said, some "ifdef" machinery changes are needed to pass all kinds of compilations.
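For illustration only (not part of this patch): the "ifdef" machinery mentioned above mostly follows one pattern, sketched here in plain userspace C. When the option is on, a thin inline wrapper tests the entry and calls the out-of-line helper; when it is off, a static inline stub returns a neutral value so callers need no #ifdef of their own. MOCK_HAS_PUD_LEAVES stands in for CONFIG_PGTABLE_HAS_PUD_LEAVES, and every mock_* name and the leaf bit are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for CONFIG_PGTABLE_HAS_PUD_LEAVES; build with
 * -DMOCK_HAS_PUD_LEAVES=0 to get the stub variant. None of the mock_*
 * names exist in the kernel.
 */
#ifndef MOCK_HAS_PUD_LEAVES
#define MOCK_HAS_PUD_LEAVES 1
#endif

#define MOCK_PUD_LEAF 0x1UL	/* illustrative "this entry is a huge leaf" bit */

struct mock_pud { unsigned long val; };
struct mock_lock { int held; };

#if MOCK_HAS_PUD_LEAVES
/* The out-of-line part; in the series this kind of code moves to a -pud file. */
static struct mock_lock mock_pud_lock;

static struct mock_lock *mock_pud_huge_lock_slow(struct mock_pud *pud)
{
	(void)pud;
	mock_pud_lock.held = 1;
	return &mock_pud_lock;
}

/* Cheap inline test first; take the lock only for leaf entries. */
static inline struct mock_lock *mock_pud_huge_lock(struct mock_pud *pud)
{
	if (pud->val & MOCK_PUD_LEAF)
		return mock_pud_huge_lock_slow(pud);
	return NULL;
}
#else
/* Stub: callers compile unchanged and their leaf branch becomes dead code. */
static inline struct mock_lock *mock_pud_huge_lock(struct mock_pud *pud)
{
	(void)pud;
	return NULL;
}
#endif

/* A caller written once, with no #ifdef. Returns 1 if handled as a leaf. */
static int mock_handle_pud(struct mock_pud *pud)
{
	struct mock_lock *ptl = mock_pud_huge_lock(pud);

	if (!ptl)
		return 0;	/* not a huge leaf, or support compiled out */
	ptl->held = 0;		/* "unlock" */
	return 1;
}
```

With the stub variant the compiler can drop the leaf-handling branch in each caller entirely, which is how the split is expected to reduce object size on archs without PUD leaves.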
Cc: Jason Gunthorpe Cc: Matthew Wilcox Cc: Oscar Salvador Signed-off-by: Peter Xu --- include/linux/huge_mm.h | 318 +++++--- include/linux/pgtable.h | 23 +- include/trace/events/huge_mapping.h | 41 + include/trace/events/thp.h | 28 - mm/Makefile | 2 + mm/huge_mapping_pmd.c | 979 +++++++++++++++++++++++ mm/huge_mapping_pud.c | 235 ++++++ mm/huge_memory.c | 1125 +-------------------------- 8 files changed, 1472 insertions(+), 1279 deletions(-) create mode 100644 include/trace/events/huge_mapping.h create mode 100644 mm/huge_mapping_pmd.c create mode 100644 mm/huge_mapping_pud.c diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index d8b642ad512d..aea2784df8ef 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -8,43 +8,214 @@ #include /* only for vma_is_dax() */ #include +#ifdef CONFIG_PGTABLE_HAS_PUD_LEAVES +void huge_pud_set_accessed(struct vm_fault *vmf, pud_t orig_pud); void touch_pud(struct vm_area_struct *vma, unsigned long addr, pud_t *pud, bool write); -void touch_pmd(struct vm_area_struct *vma, unsigned long addr, - pmd_t *pmd, bool write); -pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma); -vm_fault_t do_huge_pmd_anonymous_page(struct vm_fault *vmf); -int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm, - pmd_t *dst_pmd, pmd_t *src_pmd, unsigned long addr, - struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma); -void huge_pmd_set_accessed(struct vm_fault *vmf); int copy_huge_pud(struct mm_struct *dst_mm, struct mm_struct *src_mm, pud_t *dst_pud, pud_t *src_pud, unsigned long addr, struct vm_area_struct *vma); +int zap_huge_pud(struct mmu_gather *tlb, struct vm_area_struct *vma, pud_t *pud, + unsigned long addr); +int change_huge_pud(struct mmu_gather *tlb, struct vm_area_struct *vma, + pud_t *pudp, unsigned long addr, pgprot_t newprot, + unsigned long cp_flags); +void __split_huge_pud(struct vm_area_struct *vma, pud_t *pud, + unsigned long address); +spinlock_t 
*__pud_trans_huge_lock(pud_t *pud, struct vm_area_struct *vma); -#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD -void huge_pud_set_accessed(struct vm_fault *vmf, pud_t orig_pud); -#else -static inline void huge_pud_set_accessed(struct vm_fault *vmf, pud_t orig_pud) +static inline spinlock_t * +pud_trans_huge_lock(pud_t *pud, struct vm_area_struct *vma) { + if (pud_trans_huge(*pud) || pud_devmap(*pud)) + return __pud_trans_huge_lock(pud, vma); + else + return NULL; } -#endif +#define split_huge_pud(__vma, __pud, __address) \ + do { \ + pud_t *____pud = (__pud); \ + if (pud_trans_huge(*____pud) || pud_devmap(*____pud)) \ + __split_huge_pud(__vma, __pud, __address); \ + } while (0) +#else /* CONFIG_PGTABLE_HAS_PUD_LEAVES */ +static inline void +huge_pud_set_accessed(struct vm_fault *vmf, pud_t orig_pud) +{ +} + +static inline int +change_huge_pud(struct mmu_gather *tlb, struct vm_area_struct *vma, + pud_t *pudp, unsigned long addr, pgprot_t newprot, + unsigned long cp_flags) +{ + return 0; +} + +static inline spinlock_t * +pud_trans_huge_lock(pud_t *pud, + struct vm_area_struct *vma) +{ + return NULL; +} + +static inline void +touch_pud(struct vm_area_struct *vma, unsigned long addr, + pud_t *pud, bool write) +{ +} + +static inline int +copy_huge_pud(struct mm_struct *dst_mm, struct mm_struct *src_mm, + pud_t *dst_pud, pud_t *src_pud, unsigned long addr, + struct vm_area_struct *vma) +{ + return 0; +} + +static inline int +zap_huge_pud(struct mmu_gather *tlb, struct vm_area_struct *vma, pud_t *pud, + unsigned long addr) +{ + return 0; +} + +static inline void +__split_huge_pud(struct vm_area_struct *vma, pud_t *pud, unsigned long address) +{ +} + +#define split_huge_pud(__vma, __pud, __address) do {} while (0) +#endif /* CONFIG_PGTABLE_HAS_PUD_LEAVES */ + +#ifdef CONFIG_PGTABLE_HAS_PMD_LEAVES +void touch_pmd(struct vm_area_struct *vma, unsigned long addr, + pmd_t *pmd, bool write); +pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma); +int 
copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm, + pmd_t *dst_pmd, pmd_t *src_pmd, unsigned long addr, + struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma); +void huge_pmd_set_accessed(struct vm_fault *vmf); vm_fault_t do_huge_pmd_wp_page(struct vm_fault *vmf); -bool madvise_free_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma, - pmd_t *pmd, unsigned long addr, unsigned long next); int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma, pmd_t *pmd, unsigned long addr); -int zap_huge_pud(struct mmu_gather *tlb, struct vm_area_struct *vma, pud_t *pud, - unsigned long addr); bool move_huge_pmd(struct vm_area_struct *vma, unsigned long old_addr, unsigned long new_addr, pmd_t *old_pmd, pmd_t *new_pmd); int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma, pmd_t *pmd, unsigned long addr, pgprot_t newprot, unsigned long cp_flags); +void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd, + unsigned long address, bool freeze, struct folio *folio); +void split_huge_pmd_locked(struct vm_area_struct *vma, unsigned long address, + pmd_t *pmd, bool freeze, struct folio *folio); +void split_huge_pmd_address(struct vm_area_struct *vma, unsigned long address, + bool freeze, struct folio *folio); +spinlock_t *__pmd_trans_huge_lock(pmd_t *pmd, struct vm_area_struct *vma); +bool can_change_pmd_writable(struct vm_area_struct *vma, unsigned long addr, + pmd_t pmd); +void zap_deposited_table(struct mm_struct *mm, pmd_t *pmd); + +static inline int is_swap_pmd(pmd_t pmd) +{ + return !pmd_none(pmd) && !pmd_present(pmd); +} + +/* mmap_lock must be held on entry */ +static inline spinlock_t * +pmd_trans_huge_lock(pmd_t *pmd, struct vm_area_struct *vma) +{ + if (is_swap_pmd(*pmd) || pmd_trans_huge(*pmd) || pmd_devmap(*pmd)) + return __pmd_trans_huge_lock(pmd, vma); + else + return NULL; +} + +#define split_huge_pmd(__vma, __pmd, __address) \ + do { \ + pmd_t *____pmd = (__pmd); \ + if (is_swap_pmd(*____pmd) || 
pmd_is_leaf(*____pmd)) \ + __split_huge_pmd(__vma, __pmd, __address, \ + false, NULL); \ + } while (0) +#else /* CONFIG_PGTABLE_HAS_PMD_LEAVES */ +static inline spinlock_t * +pmd_trans_huge_lock(pmd_t *pmd, struct vm_area_struct *vma) +{ + return NULL; +} + +static inline int is_swap_pmd(pmd_t pmd) +{ + return 0; +} +static inline void +__split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd, + unsigned long address, bool freeze, struct folio *folio) +{ +} +#define split_huge_pmd(__vma, __pmd, __address) do {} while (0) + +static inline int +copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm, + pmd_t *dst_pmd, pmd_t *src_pmd, unsigned long addr, + struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma) +{ + return 0; +} + +static inline int +zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma, pmd_t *pmd, + unsigned long addr) +{ + return 0; +} + +static inline vm_fault_t +do_huge_pmd_wp_page(struct vm_fault *vmf) +{ + return 0; +} + +static inline void +huge_pmd_set_accessed(struct vm_fault *vmf) +{ +} + +static inline int +change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma, + pmd_t *pmd, unsigned long addr, pgprot_t newprot, + unsigned long cp_flags) +{ + return 0; +} + +static inline bool +move_huge_pmd(struct vm_area_struct *vma, unsigned long old_addr, + unsigned long new_addr, pmd_t *old_pmd, pmd_t *new_pmd) +{ + return false; +} + +static inline void +split_huge_pmd_locked(struct vm_area_struct *vma, unsigned long address, + pmd_t *pmd, bool freeze, struct folio *folio) +{ +} + +static inline void +split_huge_pmd_address(struct vm_area_struct *vma, unsigned long address, + bool freeze, struct folio *folio) +{ +} +#endif /* CONFIG_PGTABLE_HAS_PMD_LEAVES */ + +bool madvise_free_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma, + pmd_t *pmd, unsigned long addr, unsigned long next); vm_fault_t vmf_insert_pfn_pmd(struct vm_fault *vmf, pfn_t pfn, bool write); vm_fault_t vmf_insert_pfn_pud(struct vm_fault 
*vmf, pfn_t pfn, bool write); +struct folio *mm_get_huge_zero_folio(struct mm_struct *mm); enum transparent_hugepage_flag { TRANSPARENT_HUGEPAGE_UNSUPPORTED, @@ -130,6 +301,9 @@ extern unsigned long huge_anon_orders_always; extern unsigned long huge_anon_orders_madvise; extern unsigned long huge_anon_orders_inherit; +void __split_huge_zero_page_pmd(struct vm_area_struct *vma, + unsigned long haddr, pmd_t *pmd); + static inline bool hugepage_global_enabled(void) { return transparent_hugepage_flags & @@ -332,44 +506,6 @@ static inline int split_huge_page(struct page *page) } void deferred_split_folio(struct folio *folio); -void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd, - unsigned long address, bool freeze, struct folio *folio); - -#define split_huge_pmd(__vma, __pmd, __address) \ - do { \ - pmd_t *____pmd = (__pmd); \ - if (is_swap_pmd(*____pmd) || pmd_trans_huge(*____pmd) \ - || pmd_devmap(*____pmd)) \ - __split_huge_pmd(__vma, __pmd, __address, \ - false, NULL); \ - } while (0) - - -void split_huge_pmd_address(struct vm_area_struct *vma, unsigned long address, - bool freeze, struct folio *folio); - -void __split_huge_pud(struct vm_area_struct *vma, pud_t *pud, - unsigned long address); - -#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD -int change_huge_pud(struct mmu_gather *tlb, struct vm_area_struct *vma, - pud_t *pudp, unsigned long addr, pgprot_t newprot, - unsigned long cp_flags); -#else -static inline int -change_huge_pud(struct mmu_gather *tlb, struct vm_area_struct *vma, - pud_t *pudp, unsigned long addr, pgprot_t newprot, - unsigned long cp_flags) { return 0; } -#endif - -#define split_huge_pud(__vma, __pud, __address) \ - do { \ - pud_t *____pud = (__pud); \ - if (pud_trans_huge(*____pud) \ - || pud_devmap(*____pud)) \ - __split_huge_pud(__vma, __pud, __address); \ - } while (0) - int hugepage_madvise(struct vm_area_struct *vma, unsigned long *vm_flags, int advice); int madvise_collapse(struct vm_area_struct *vma, @@ -377,31 +513,6 @@ int 
madvise_collapse(struct vm_area_struct *vma, unsigned long start, unsigned long end); void vma_adjust_trans_huge(struct vm_area_struct *vma, unsigned long start, unsigned long end, long adjust_next); -spinlock_t *__pmd_trans_huge_lock(pmd_t *pmd, struct vm_area_struct *vma); -spinlock_t *__pud_trans_huge_lock(pud_t *pud, struct vm_area_struct *vma); - -static inline int is_swap_pmd(pmd_t pmd) -{ - return !pmd_none(pmd) && !pmd_present(pmd); -} - -/* mmap_lock must be held on entry */ -static inline spinlock_t *pmd_trans_huge_lock(pmd_t *pmd, - struct vm_area_struct *vma) -{ - if (is_swap_pmd(*pmd) || pmd_trans_huge(*pmd) || pmd_devmap(*pmd)) - return __pmd_trans_huge_lock(pmd, vma); - else - return NULL; -} -static inline spinlock_t *pud_trans_huge_lock(pud_t *pud, - struct vm_area_struct *vma) -{ - if (pud_trans_huge(*pud) || pud_devmap(*pud)) - return __pud_trans_huge_lock(pud, vma); - else - return NULL; -} /** * folio_test_pmd_mappable - Can we map this folio with a PMD? @@ -416,6 +527,7 @@ struct page *follow_devmap_pmd(struct vm_area_struct *vma, unsigned long addr, pmd_t *pmd, int flags, struct dev_pagemap **pgmap); vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf); +vm_fault_t do_huge_pmd_anonymous_page(struct vm_fault *vmf); extern struct folio *huge_zero_folio; extern unsigned long huge_zero_pfn; @@ -445,13 +557,17 @@ static inline bool thp_migration_supported(void) return IS_ENABLED(CONFIG_ARCH_ENABLE_THP_MIGRATION); } -void split_huge_pmd_locked(struct vm_area_struct *vma, unsigned long address, - pmd_t *pmd, bool freeze, struct folio *folio); bool unmap_huge_pmd_locked(struct vm_area_struct *vma, unsigned long addr, pmd_t *pmdp, struct folio *folio); #else /* CONFIG_TRANSPARENT_HUGEPAGE */ +static inline void +__split_huge_zero_page_pmd(struct vm_area_struct *vma, + unsigned long haddr, pmd_t *pmd) +{ +} + static inline bool folio_test_pmd_mappable(struct folio *folio) { return false; @@ -505,16 +621,6 @@ static inline int split_huge_page(struct 
page *page) return 0; } static inline void deferred_split_folio(struct folio *folio) {} -#define split_huge_pmd(__vma, __pmd, __address) \ - do { } while (0) - -static inline void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd, - unsigned long address, bool freeze, struct folio *folio) {} -static inline void split_huge_pmd_address(struct vm_area_struct *vma, - unsigned long address, bool freeze, struct folio *folio) {} -static inline void split_huge_pmd_locked(struct vm_area_struct *vma, - unsigned long address, pmd_t *pmd, - bool freeze, struct folio *folio) {} static inline bool unmap_huge_pmd_locked(struct vm_area_struct *vma, unsigned long addr, pmd_t *pmdp, @@ -523,9 +629,6 @@ static inline bool unmap_huge_pmd_locked(struct vm_area_struct *vma, return false; } -#define split_huge_pud(__vma, __pmd, __address) \ - do { } while (0) - static inline int hugepage_madvise(struct vm_area_struct *vma, unsigned long *vm_flags, int advice) { @@ -545,20 +648,6 @@ static inline void vma_adjust_trans_huge(struct vm_area_struct *vma, long adjust_next) { } -static inline int is_swap_pmd(pmd_t pmd) -{ - return 0; -} -static inline spinlock_t *pmd_trans_huge_lock(pmd_t *pmd, - struct vm_area_struct *vma) -{ - return NULL; -} -static inline spinlock_t *pud_trans_huge_lock(pud_t *pud, - struct vm_area_struct *vma) -{ - return NULL; -} static inline vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf) { @@ -606,15 +695,8 @@ static inline int next_order(unsigned long *orders, int prev) return 0; } -static inline void __split_huge_pud(struct vm_area_struct *vma, pud_t *pud, - unsigned long address) -{ -} - -static inline int change_huge_pud(struct mmu_gather *tlb, - struct vm_area_struct *vma, pud_t *pudp, - unsigned long addr, pgprot_t newprot, - unsigned long cp_flags) +static inline vm_fault_t +do_huge_pmd_anonymous_page(struct vm_fault *vmf) { return 0; } diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h index 5a5aaee5fa1c..5e505373b113 100644 --- 
a/include/linux/pgtable.h +++ b/include/linux/pgtable.h @@ -628,8 +628,8 @@ static inline pud_t pudp_huge_get_and_clear(struct mm_struct *mm, #endif /* __HAVE_ARCH_PUDP_HUGE_GET_AND_CLEAR */ #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ -#ifdef CONFIG_TRANSPARENT_HUGEPAGE -#ifndef __HAVE_ARCH_PMDP_HUGE_GET_AND_CLEAR_FULL +#if defined(CONFIG_PGTABLE_HAS_PMD_LEAVES) && \ + !defined(__HAVE_ARCH_PMDP_HUGE_GET_AND_CLEAR_FULL) static inline pmd_t pmdp_huge_get_and_clear_full(struct vm_area_struct *vma, unsigned long address, pmd_t *pmdp, int full) @@ -638,14 +638,14 @@ static inline pmd_t pmdp_huge_get_and_clear_full(struct vm_area_struct *vma, } #endif -#ifndef __HAVE_ARCH_PUDP_HUGE_GET_AND_CLEAR_FULL +#if defined(CONFIG_PGTABLE_HAS_PUD_LEAVES) && \ + !defined(__HAVE_ARCH_PUDP_HUGE_GET_AND_CLEAR_FULL) static inline pud_t pudp_huge_get_and_clear_full(struct vm_area_struct *vma, unsigned long address, pud_t *pudp, int full) { return pudp_huge_get_and_clear(vma->vm_mm, address, pudp); } -#endif #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ #ifndef __HAVE_ARCH_PTEP_GET_AND_CLEAR_FULL @@ -894,9 +894,9 @@ static inline void pmdp_set_wrprotect(struct mm_struct *mm, } #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ #endif + #ifndef __HAVE_ARCH_PUDP_SET_WRPROTECT -#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD -#ifdef CONFIG_TRANSPARENT_HUGEPAGE +#ifdef CONFIG_PGTABLE_HAS_PUD_LEAVES static inline void pudp_set_wrprotect(struct mm_struct *mm, unsigned long address, pud_t *pudp) { @@ -910,8 +910,7 @@ static inline void pudp_set_wrprotect(struct mm_struct *mm, { BUILD_BUG(); } -#endif /* CONFIG_TRANSPARENT_HUGEPAGE */ -#endif /* CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD */ +#endif /* CONFIG_PGTABLE_HAS_PUD_LEAVES */ #endif #ifndef pmdp_collapse_flush @@ -1735,7 +1734,6 @@ static inline int pmd_free_pte_page(pmd_t *pmd, unsigned long addr) #endif /* CONFIG_HAVE_ARCH_HUGE_VMAP */ #ifndef __HAVE_ARCH_FLUSH_PMD_TLB_RANGE -#ifdef CONFIG_TRANSPARENT_HUGEPAGE /* * ARCHes with special requirements for 
evicting THP backing TLB entries can * implement this. Otherwise also, it can help optimize normal TLB flush in @@ -1745,10 +1743,15 @@ static inline int pmd_free_pte_page(pmd_t *pmd, unsigned long addr) * invalidate the entire TLB which is not desirable. * e.g. see arch/arc: flush_pmd_tlb_range */ +#ifdef CONFIG_PGTABLE_HAS_PMD_LEAVES #define flush_pmd_tlb_range(vma, addr, end) flush_tlb_range(vma, addr, end) -#define flush_pud_tlb_range(vma, addr, end) flush_tlb_range(vma, addr, end) #else #define flush_pmd_tlb_range(vma, addr, end) BUILD_BUG() +#endif + +#ifdef CONFIG_PGTABLE_HAS_PUD_LEAVES +#define flush_pud_tlb_range(vma, addr, end) flush_tlb_range(vma, addr, end) +#else #define flush_pud_tlb_range(vma, addr, end) BUILD_BUG() #endif #endif diff --git a/include/trace/events/huge_mapping.h b/include/trace/events/huge_mapping.h new file mode 100644 index 000000000000..20036d090ce5 --- /dev/null +++ b/include/trace/events/huge_mapping.h @@ -0,0 +1,41 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#undef TRACE_SYSTEM +#define TRACE_SYSTEM huge_mapping + +#if !defined(_TRACE_HUGE_MAPPING_H) || defined(TRACE_HEADER_MULTI_READ) +#define _TRACE_HUGE_MAPPING_H + +#include +#include + +DECLARE_EVENT_CLASS(migration_pmd, + + TP_PROTO(unsigned long addr, unsigned long pmd), + + TP_ARGS(addr, pmd), + + TP_STRUCT__entry( + __field(unsigned long, addr) + __field(unsigned long, pmd) + ), + + TP_fast_assign( + __entry->addr = addr; + __entry->pmd = pmd; + ), + TP_printk("addr=%lx, pmd=%lx", __entry->addr, __entry->pmd) +); + +DEFINE_EVENT(migration_pmd, set_migration_pmd, + TP_PROTO(unsigned long addr, unsigned long pmd), + TP_ARGS(addr, pmd) +); + +DEFINE_EVENT(migration_pmd, remove_migration_pmd, + TP_PROTO(unsigned long addr, unsigned long pmd), + TP_ARGS(addr, pmd) +); +#endif /* _TRACE_HUGE_MAPPING_H */ + +/* This part must be outside protection */ +#include diff --git a/include/trace/events/thp.h b/include/trace/events/thp.h index f50048af5fcc..395b574b1c79 100644 --- 
a/include/trace/events/thp.h +++ b/include/trace/events/thp.h @@ -66,34 +66,6 @@ DEFINE_EVENT(hugepage_update, hugepage_update_pud, TP_PROTO(unsigned long addr, unsigned long pud, unsigned long clr, unsigned long set), TP_ARGS(addr, pud, clr, set) ); - -DECLARE_EVENT_CLASS(migration_pmd, - - TP_PROTO(unsigned long addr, unsigned long pmd), - - TP_ARGS(addr, pmd), - - TP_STRUCT__entry( - __field(unsigned long, addr) - __field(unsigned long, pmd) - ), - - TP_fast_assign( - __entry->addr = addr; - __entry->pmd = pmd; - ), - TP_printk("addr=%lx, pmd=%lx", __entry->addr, __entry->pmd) -); - -DEFINE_EVENT(migration_pmd, set_migration_pmd, - TP_PROTO(unsigned long addr, unsigned long pmd), - TP_ARGS(addr, pmd) -); - -DEFINE_EVENT(migration_pmd, remove_migration_pmd, - TP_PROTO(unsigned long addr, unsigned long pmd), - TP_ARGS(addr, pmd) -); #endif /* _TRACE_THP_H */ /* This part must be outside protection */ diff --git a/mm/Makefile b/mm/Makefile index d2915f8c9dc0..3a846121b1f5 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -95,6 +95,8 @@ obj-$(CONFIG_MIGRATION) += migrate.o obj-$(CONFIG_NUMA) += memory-tiers.o obj-$(CONFIG_DEVICE_MIGRATION) += migrate_device.o obj-$(CONFIG_TRANSPARENT_HUGEPAGE) += huge_memory.o khugepaged.o +obj-$(CONFIG_PGTABLE_HAS_PMD_LEAVES) += huge_mapping_pmd.o +obj-$(CONFIG_PGTABLE_HAS_PUD_LEAVES) += huge_mapping_pud.o obj-$(CONFIG_PAGE_COUNTER) += page_counter.o obj-$(CONFIG_MEMCG_V1) += memcontrol-v1.o obj-$(CONFIG_MEMCG) += memcontrol.o vmpressure.o diff --git a/mm/huge_mapping_pmd.c b/mm/huge_mapping_pmd.c new file mode 100644 index 000000000000..7b85e2a564d6 --- /dev/null +++ b/mm/huge_mapping_pmd.c @@ -0,0 +1,979 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Copyright (C) 2024 Red Hat, Inc. 
+ */ + +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include "internal.h" +#include "swap.h" + +#define CREATE_TRACE_POINTS +#include + +/* + * Returns page table lock pointer if a given pmd maps a thp, NULL otherwise. + * + * Note that if it returns page table lock pointer, this routine returns without + * unlocking page table lock. So callers must unlock it. + */ +spinlock_t *__pmd_trans_huge_lock(pmd_t *pmd, struct vm_area_struct *vma) +{ + spinlock_t *ptl; + + ptl = pmd_lock(vma->vm_mm, pmd); + if (likely(is_swap_pmd(*pmd) || pmd_trans_huge(*pmd) || + pmd_devmap(*pmd))) + return ptl; + spin_unlock(ptl); + return NULL; +} + +pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma) +{ + if (likely(vma->vm_flags & VM_WRITE)) + pmd = pmd_mkwrite(pmd, vma); + return pmd; +} + +void touch_pmd(struct vm_area_struct *vma, unsigned long addr, + pmd_t *pmd, bool write) +{ + pmd_t _pmd; + + _pmd = pmd_mkyoung(*pmd); + if (write) + _pmd = pmd_mkdirty(_pmd); + if (pmdp_set_access_flags(vma, addr & HPAGE_PMD_MASK, + pmd, _pmd, write)) + update_mmu_cache_pmd(vma, addr, pmd); +} + +int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm, + pmd_t *dst_pmd, pmd_t *src_pmd, unsigned long addr, + struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma) +{ + spinlock_t *dst_ptl, *src_ptl; + struct page *src_page; + struct folio *src_folio; + pmd_t pmd; + pgtable_t pgtable = NULL; + int ret = -ENOMEM; + + /* Skip if can be re-fill on fault */ + if (!vma_is_anonymous(dst_vma)) + return 0; + + pgtable = pte_alloc_one(dst_mm); + if (unlikely(!pgtable)) + goto out; + + dst_ptl = 
pmd_lock(dst_mm, dst_pmd); + src_ptl = pmd_lockptr(src_mm, src_pmd); + spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING); + + ret = -EAGAIN; + pmd = *src_pmd; + +#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION + if (unlikely(is_swap_pmd(pmd))) { + swp_entry_t entry = pmd_to_swp_entry(pmd); + + VM_BUG_ON(!is_pmd_migration_entry(pmd)); + if (!is_readable_migration_entry(entry)) { + entry = make_readable_migration_entry( + swp_offset(entry)); + pmd = swp_entry_to_pmd(entry); + if (pmd_swp_soft_dirty(*src_pmd)) + pmd = pmd_swp_mksoft_dirty(pmd); + if (pmd_swp_uffd_wp(*src_pmd)) + pmd = pmd_swp_mkuffd_wp(pmd); + set_pmd_at(src_mm, addr, src_pmd, pmd); + } + add_mm_counter(dst_mm, MM_ANONPAGES, HPAGE_PMD_NR); + mm_inc_nr_ptes(dst_mm); + pgtable_trans_huge_deposit(dst_mm, dst_pmd, pgtable); + if (!userfaultfd_wp(dst_vma)) + pmd = pmd_swp_clear_uffd_wp(pmd); + set_pmd_at(dst_mm, addr, dst_pmd, pmd); + ret = 0; + goto out_unlock; + } +#endif + + if (unlikely(!pmd_trans_huge(pmd))) { + pte_free(dst_mm, pgtable); + goto out_unlock; + } + /* + * When page table lock is held, the huge zero pmd should not be + * under splitting since we don't split the page itself, only pmd to + * a page table. + */ + if (is_huge_zero_pmd(pmd)) { + /* + * mm_get_huge_zero_folio() will never allocate a new + * folio here, since we already have a zero page to + * copy. It just takes a reference. + */ + mm_get_huge_zero_folio(dst_mm); + goto out_zero_page; + } + + src_page = pmd_page(pmd); + VM_BUG_ON_PAGE(!PageHead(src_page), src_page); + src_folio = page_folio(src_page); + + folio_get(src_folio); + if (unlikely(folio_try_dup_anon_rmap_pmd(src_folio, src_page, src_vma))) { + /* Page maybe pinned: split and retry the fault on PTEs. 
*/ + folio_put(src_folio); + pte_free(dst_mm, pgtable); + spin_unlock(src_ptl); + spin_unlock(dst_ptl); + __split_huge_pmd(src_vma, src_pmd, addr, false, NULL); + return -EAGAIN; + } + add_mm_counter(dst_mm, MM_ANONPAGES, HPAGE_PMD_NR); +out_zero_page: + mm_inc_nr_ptes(dst_mm); + pgtable_trans_huge_deposit(dst_mm, dst_pmd, pgtable); + pmdp_set_wrprotect(src_mm, addr, src_pmd); + if (!userfaultfd_wp(dst_vma)) + pmd = pmd_clear_uffd_wp(pmd); + pmd = pmd_mkold(pmd_wrprotect(pmd)); + set_pmd_at(dst_mm, addr, dst_pmd, pmd); + + ret = 0; +out_unlock: + spin_unlock(src_ptl); + spin_unlock(dst_ptl); +out: + return ret; +} + +void huge_pmd_set_accessed(struct vm_fault *vmf) +{ + bool write = vmf->flags & FAULT_FLAG_WRITE; + + vmf->ptl = pmd_lock(vmf->vma->vm_mm, vmf->pmd); + if (unlikely(!pmd_same(*vmf->pmd, vmf->orig_pmd))) + goto unlock; + + touch_pmd(vmf->vma, vmf->address, vmf->pmd, write); + +unlock: + spin_unlock(vmf->ptl); +} + +vm_fault_t do_huge_pmd_wp_page(struct vm_fault *vmf) +{ + const bool unshare = vmf->flags & FAULT_FLAG_UNSHARE; + struct vm_area_struct *vma = vmf->vma; + struct folio *folio; + struct page *page; + unsigned long haddr = vmf->address & HPAGE_PMD_MASK; + pmd_t orig_pmd = vmf->orig_pmd; + + vmf->ptl = pmd_lockptr(vma->vm_mm, vmf->pmd); + VM_BUG_ON_VMA(!vma->anon_vma, vma); + + if (is_huge_zero_pmd(orig_pmd)) + goto fallback; + + spin_lock(vmf->ptl); + + if (unlikely(!pmd_same(*vmf->pmd, orig_pmd))) { + spin_unlock(vmf->ptl); + return 0; + } + + page = pmd_page(orig_pmd); + folio = page_folio(page); + VM_BUG_ON_PAGE(!PageHead(page), page); + + /* Early check when only holding the PT lock. 
*/ + if (PageAnonExclusive(page)) + goto reuse; + + if (!folio_trylock(folio)) { + folio_get(folio); + spin_unlock(vmf->ptl); + folio_lock(folio); + spin_lock(vmf->ptl); + if (unlikely(!pmd_same(*vmf->pmd, orig_pmd))) { + spin_unlock(vmf->ptl); + folio_unlock(folio); + folio_put(folio); + return 0; + } + folio_put(folio); + } + + /* Recheck after temporarily dropping the PT lock. */ + if (PageAnonExclusive(page)) { + folio_unlock(folio); + goto reuse; + } + + /* + * See do_wp_page(): we can only reuse the folio exclusively if + * there are no additional references. Note that we always drain + * the LRU cache immediately after adding a THP. + */ + if (folio_ref_count(folio) > + 1 + folio_test_swapcache(folio) * folio_nr_pages(folio)) + goto unlock_fallback; + if (folio_test_swapcache(folio)) + folio_free_swap(folio); + if (folio_ref_count(folio) == 1) { + pmd_t entry; + + folio_move_anon_rmap(folio, vma); + SetPageAnonExclusive(page); + folio_unlock(folio); +reuse: + if (unlikely(unshare)) { + spin_unlock(vmf->ptl); + return 0; + } + entry = pmd_mkyoung(orig_pmd); + entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma); + if (pmdp_set_access_flags(vma, haddr, vmf->pmd, entry, 1)) + update_mmu_cache_pmd(vma, vmf->address, vmf->pmd); + spin_unlock(vmf->ptl); + return 0; + } + +unlock_fallback: + folio_unlock(folio); + spin_unlock(vmf->ptl); +fallback: + __split_huge_pmd(vma, vmf->pmd, vmf->address, false, NULL); + return VM_FAULT_FALLBACK; +} + +bool can_change_pmd_writable(struct vm_area_struct *vma, unsigned long addr, + pmd_t pmd) +{ + struct page *page; + + if (WARN_ON_ONCE(!(vma->vm_flags & VM_WRITE))) + return false; + + /* Don't touch entries that are not even readable (NUMA hinting). */ + if (pmd_protnone(pmd)) + return false; + + /* Do we need write faults for softdirty tracking? */ + if (pmd_needs_soft_dirty_wp(vma, pmd)) + return false; + + /* Do we need write faults for uffd-wp tracking? 
*/ + if (userfaultfd_huge_pmd_wp(vma, pmd)) + return false; + + if (!(vma->vm_flags & VM_SHARED)) { + /* See can_change_pte_writable(). */ + page = vm_normal_page_pmd(vma, addr, pmd); + return page && PageAnon(page) && PageAnonExclusive(page); + } + + /* See can_change_pte_writable(). */ + return pmd_dirty(pmd); +} + +void zap_deposited_table(struct mm_struct *mm, pmd_t *pmd) +{ + pgtable_t pgtable; + + pgtable = pgtable_trans_huge_withdraw(mm, pmd); + pte_free(mm, pgtable); + mm_dec_nr_ptes(mm); +} + +int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma, + pmd_t *pmd, unsigned long addr) +{ + pmd_t orig_pmd; + spinlock_t *ptl; + + tlb_change_page_size(tlb, HPAGE_PMD_SIZE); + + ptl = __pmd_trans_huge_lock(pmd, vma); + if (!ptl) + return 0; + /* + * For architectures like ppc64 we look at deposited pgtable + * when calling pmdp_huge_get_and_clear. So do the + * pgtable_trans_huge_withdraw after finishing pmdp related + * operations. + */ + orig_pmd = pmdp_huge_get_and_clear_full(vma, addr, pmd, + tlb->fullmm); + arch_check_zapped_pmd(vma, orig_pmd); + tlb_remove_pmd_tlb_entry(tlb, pmd, addr); + if (vma_is_special_huge(vma)) { + if (arch_needs_pgtable_deposit()) + zap_deposited_table(tlb->mm, pmd); + spin_unlock(ptl); + } else if (is_huge_zero_pmd(orig_pmd)) { + zap_deposited_table(tlb->mm, pmd); + spin_unlock(ptl); + } else { + struct folio *folio = NULL; + int flush_needed = 1; + + if (pmd_present(orig_pmd)) { + struct page *page = pmd_page(orig_pmd); + + folio = page_folio(page); + folio_remove_rmap_pmd(folio, page, vma); + WARN_ON_ONCE(folio_mapcount(folio) < 0); + VM_BUG_ON_PAGE(!PageHead(page), page); + } else if (thp_migration_supported()) { + swp_entry_t entry; + + VM_BUG_ON(!is_pmd_migration_entry(orig_pmd)); + entry = pmd_to_swp_entry(orig_pmd); + folio = pfn_swap_entry_folio(entry); + flush_needed = 0; + } else + WARN_ONCE(1, "Non present huge pmd without pmd migration enabled!"); + + if (folio_test_anon(folio)) { + 
zap_deposited_table(tlb->mm, pmd); + add_mm_counter(tlb->mm, MM_ANONPAGES, -HPAGE_PMD_NR); + } else { + if (arch_needs_pgtable_deposit()) + zap_deposited_table(tlb->mm, pmd); + add_mm_counter(tlb->mm, mm_counter_file(folio), + -HPAGE_PMD_NR); + } + + spin_unlock(ptl); + if (flush_needed) + tlb_remove_page_size(tlb, &folio->page, HPAGE_PMD_SIZE); + } + return 1; +} + +static pmd_t move_soft_dirty_pmd(pmd_t pmd) +{ +#ifdef CONFIG_MEM_SOFT_DIRTY + if (unlikely(is_pmd_migration_entry(pmd))) + pmd = pmd_swp_mksoft_dirty(pmd); + else if (pmd_present(pmd)) + pmd = pmd_mksoft_dirty(pmd); +#endif + return pmd; +} + +#ifndef pmd_move_must_withdraw +static inline int pmd_move_must_withdraw(spinlock_t *new_pmd_ptl, + spinlock_t *old_pmd_ptl, + struct vm_area_struct *vma) +{ + /* + * With split pmd lock we also need to move preallocated + * PTE page table if new_pmd is on different PMD page table. + * + * We also don't deposit and withdraw tables for file pages. + */ + return (new_pmd_ptl != old_pmd_ptl) && vma_is_anonymous(vma); +} +#endif + +bool move_huge_pmd(struct vm_area_struct *vma, unsigned long old_addr, + unsigned long new_addr, pmd_t *old_pmd, pmd_t *new_pmd) +{ + spinlock_t *old_ptl, *new_ptl; + pmd_t pmd; + struct mm_struct *mm = vma->vm_mm; + bool force_flush = false; + + /* + * The destination pmd shouldn't be established, free_pgtables() + * should have released it; but move_page_tables() might have already + * inserted a page table, if racing against shmem/file collapse. + */ + if (!pmd_none(*new_pmd)) { + VM_BUG_ON(pmd_trans_huge(*new_pmd)); + return false; + } + + /* + * We don't have to worry about the ordering of src and dst + * ptlocks because exclusive mmap_lock prevents deadlock. 
+ */ + old_ptl = __pmd_trans_huge_lock(old_pmd, vma); + if (old_ptl) { + new_ptl = pmd_lockptr(mm, new_pmd); + if (new_ptl != old_ptl) + spin_lock_nested(new_ptl, SINGLE_DEPTH_NESTING); + pmd = pmdp_huge_get_and_clear(mm, old_addr, old_pmd); + if (pmd_present(pmd)) + force_flush = true; + VM_BUG_ON(!pmd_none(*new_pmd)); + + if (pmd_move_must_withdraw(new_ptl, old_ptl, vma)) { + pgtable_t pgtable; + pgtable = pgtable_trans_huge_withdraw(mm, old_pmd); + pgtable_trans_huge_deposit(mm, new_pmd, pgtable); + } + pmd = move_soft_dirty_pmd(pmd); + set_pmd_at(mm, new_addr, new_pmd, pmd); + if (force_flush) + flush_pmd_tlb_range(vma, old_addr, old_addr + PMD_SIZE); + if (new_ptl != old_ptl) + spin_unlock(new_ptl); + spin_unlock(old_ptl); + return true; + } + return false; +} + +/* + * Returns + * - 0 if PMD could not be locked + * - 1 if PMD was locked but protections unchanged and TLB flush unnecessary + * or if prot_numa but THP migration is not supported + * - HPAGE_PMD_NR if protections changed and TLB flush necessary + */ +int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma, + pmd_t *pmd, unsigned long addr, pgprot_t newprot, + unsigned long cp_flags) +{ + struct mm_struct *mm = vma->vm_mm; + spinlock_t *ptl; + pmd_t oldpmd, entry; + bool prot_numa = cp_flags & MM_CP_PROT_NUMA; + bool uffd_wp = cp_flags & MM_CP_UFFD_WP; + bool uffd_wp_resolve = cp_flags & MM_CP_UFFD_WP_RESOLVE; + int ret = 1; + + tlb_change_page_size(tlb, HPAGE_PMD_SIZE); + + if (prot_numa && !thp_migration_supported()) + return 1; + + ptl = __pmd_trans_huge_lock(pmd, vma); + if (!ptl) + return 0; + +#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION + if (is_swap_pmd(*pmd)) { + swp_entry_t entry = pmd_to_swp_entry(*pmd); + struct folio *folio = pfn_swap_entry_folio(entry); + pmd_t newpmd; + + VM_BUG_ON(!is_pmd_migration_entry(*pmd)); + if (is_writable_migration_entry(entry)) { + /* + * A protection check is difficult so + * just be safe and disable write + */ + if (folio_test_anon(folio)) + 
entry = make_readable_exclusive_migration_entry(swp_offset(entry)); + else + entry = make_readable_migration_entry(swp_offset(entry)); + newpmd = swp_entry_to_pmd(entry); + if (pmd_swp_soft_dirty(*pmd)) + newpmd = pmd_swp_mksoft_dirty(newpmd); + } else { + newpmd = *pmd; + } + + if (uffd_wp) + newpmd = pmd_swp_mkuffd_wp(newpmd); + else if (uffd_wp_resolve) + newpmd = pmd_swp_clear_uffd_wp(newpmd); + if (!pmd_same(*pmd, newpmd)) + set_pmd_at(mm, addr, pmd, newpmd); + goto unlock; + } +#endif + + if (prot_numa) { + struct folio *folio; + bool toptier; + /* + * Avoid trapping faults against the zero page. The read-only + * data is likely to be read-cached on the local CPU and + * local/remote hits to the zero page are not interesting. + */ + if (is_huge_zero_pmd(*pmd)) + goto unlock; + + if (pmd_protnone(*pmd)) + goto unlock; + + folio = pmd_folio(*pmd); + toptier = node_is_toptier(folio_nid(folio)); + /* + * Skip scanning top tier node if normal numa + * balancing is disabled + */ + if (!(sysctl_numa_balancing_mode & NUMA_BALANCING_NORMAL) && + toptier) + goto unlock; + + if (sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING && + !toptier) + folio_xchg_access_time(folio, + jiffies_to_msecs(jiffies)); + } + /* + * In case prot_numa, we are under mmap_read_lock(mm). It's critical + * to not clear pmd intermittently to avoid race with MADV_DONTNEED + * which is also under mmap_read_lock(mm): + * + * CPU0: CPU1: + * change_huge_pmd(prot_numa=1) + * pmdp_huge_get_and_clear_notify() + * madvise_dontneed() + * zap_pmd_range() + * pmd_trans_huge(*pmd) == 0 (without ptl) + * // skip the pmd + * set_pmd_at(); + * // pmd is re-established + * + * The race makes MADV_DONTNEED miss the huge pmd and don't clear it + * which may break userspace. + * + * pmdp_invalidate_ad() is required to make sure we don't miss + * dirty/young flags set by hardware. 
+ */ + oldpmd = pmdp_invalidate_ad(vma, addr, pmd); + + entry = pmd_modify(oldpmd, newprot); + if (uffd_wp) + entry = pmd_mkuffd_wp(entry); + else if (uffd_wp_resolve) + /* + * Leave the write bit to be handled by PF interrupt + * handler, then things like COW could be properly + * handled. + */ + entry = pmd_clear_uffd_wp(entry); + + /* See change_pte_range(). */ + if ((cp_flags & MM_CP_TRY_CHANGE_WRITABLE) && !pmd_write(entry) && + can_change_pmd_writable(vma, addr, entry)) + entry = pmd_mkwrite(entry, vma); + + ret = HPAGE_PMD_NR; + set_pmd_at(mm, addr, pmd, entry); + + if (huge_pmd_needs_flush(oldpmd, entry)) + tlb_flush_pmd_range(tlb, addr, HPAGE_PMD_SIZE); +unlock: + spin_unlock(ptl); + return ret; +} + +static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd, + unsigned long haddr, bool freeze) +{ + struct mm_struct *mm = vma->vm_mm; + struct folio *folio; + struct page *page; + pgtable_t pgtable; + pmd_t old_pmd, _pmd; + bool young, write, soft_dirty, pmd_migration = false, uffd_wp = false; + bool anon_exclusive = false, dirty = false; + unsigned long addr; + pte_t *pte; + int i; + + VM_BUG_ON(haddr & ~HPAGE_PMD_MASK); + VM_BUG_ON_VMA(vma->vm_start > haddr, vma); + VM_BUG_ON_VMA(vma->vm_end < haddr + HPAGE_PMD_SIZE, vma); + VM_BUG_ON(!is_pmd_migration_entry(*pmd) && !pmd_trans_huge(*pmd) && + !pmd_devmap(*pmd)); + +#ifdef CONFIG_TRANSPARENT_HUGEPAGE + count_vm_event(THP_SPLIT_PMD); +#endif + + if (!vma_is_anonymous(vma)) { + old_pmd = pmdp_huge_clear_flush(vma, haddr, pmd); + /* + * We are going to unmap this huge page. 
So + * just go ahead and zap it + */ + if (arch_needs_pgtable_deposit()) + zap_deposited_table(mm, pmd); + if (vma_is_special_huge(vma)) + return; + if (unlikely(is_pmd_migration_entry(old_pmd))) { + swp_entry_t entry; + + entry = pmd_to_swp_entry(old_pmd); + folio = pfn_swap_entry_folio(entry); + } else { + page = pmd_page(old_pmd); + folio = page_folio(page); + if (!folio_test_dirty(folio) && pmd_dirty(old_pmd)) + folio_mark_dirty(folio); + if (!folio_test_referenced(folio) && pmd_young(old_pmd)) + folio_set_referenced(folio); + folio_remove_rmap_pmd(folio, page, vma); + folio_put(folio); + } + add_mm_counter(mm, mm_counter_file(folio), -HPAGE_PMD_NR); + return; + } + + if (is_huge_zero_pmd(*pmd)) { + /* + * FIXME: Do we want to invalidate secondary mmu by calling + * mmu_notifier_arch_invalidate_secondary_tlbs() see comments below + * inside __split_huge_pmd() ? + * + * We are going from a zero huge page write protected to zero + * small page also write protected so it does not seems useful + * to invalidate secondary mmu at this time. + */ + return __split_huge_zero_page_pmd(vma, haddr, pmd); + } + + pmd_migration = is_pmd_migration_entry(*pmd); + if (unlikely(pmd_migration)) { + swp_entry_t entry; + + old_pmd = *pmd; + entry = pmd_to_swp_entry(old_pmd); + page = pfn_swap_entry_to_page(entry); + write = is_writable_migration_entry(entry); + if (PageAnon(page)) + anon_exclusive = is_readable_exclusive_migration_entry(entry); + young = is_migration_entry_young(entry); + dirty = is_migration_entry_dirty(entry); + soft_dirty = pmd_swp_soft_dirty(old_pmd); + uffd_wp = pmd_swp_uffd_wp(old_pmd); + } else { + /* + * Up to this point the pmd is present and huge and userland has + * the whole access to the hugepage during the split (which + * happens in place). 
If we overwrite the pmd with the not-huge + * version pointing to the pte here (which of course we could if + * all CPUs were bug free), userland could trigger a small page + * size TLB miss on the small sized TLB while the hugepage TLB + * entry is still established in the huge TLB. Some CPU doesn't + * like that. See + * http://support.amd.com/TechDocs/41322_10h_Rev_Gd.pdf, Erratum + * 383 on page 105. Intel should be safe but is also warns that + * it's only safe if the permission and cache attributes of the + * two entries loaded in the two TLB is identical (which should + * be the case here). But it is generally safer to never allow + * small and huge TLB entries for the same virtual address to be + * loaded simultaneously. So instead of doing "pmd_populate(); + * flush_pmd_tlb_range();" we first mark the current pmd + * notpresent (atomically because here the pmd_trans_huge must + * remain set at all times on the pmd until the split is + * complete for this pmd), then we flush the SMP TLB and finally + * we write the non-huge version of the pmd entry with + * pmd_populate. + */ + old_pmd = pmdp_invalidate(vma, haddr, pmd); + page = pmd_page(old_pmd); + folio = page_folio(page); + if (pmd_dirty(old_pmd)) { + dirty = true; + folio_set_dirty(folio); + } + write = pmd_write(old_pmd); + young = pmd_young(old_pmd); + soft_dirty = pmd_soft_dirty(old_pmd); + uffd_wp = pmd_uffd_wp(old_pmd); + + VM_WARN_ON_FOLIO(!folio_ref_count(folio), folio); + VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio); + + /* + * Without "freeze", we'll simply split the PMD, propagating the + * PageAnonExclusive() flag for each PTE by setting it for + * each subpage -- no need to (temporarily) clear. + * + * With "freeze" we want to replace mapped pages by + * migration entries right away. This is only possible if we + * managed to clear PageAnonExclusive() -- see + * set_pmd_migration_entry(). 
+ * + * In case we cannot clear PageAnonExclusive(), split the PMD + * only and let try_to_migrate_one() fail later. + * + * See folio_try_share_anon_rmap_pmd(): invalidate PMD first. + */ + anon_exclusive = PageAnonExclusive(page); + if (freeze && anon_exclusive && + folio_try_share_anon_rmap_pmd(folio, page)) + freeze = false; + if (!freeze) { + rmap_t rmap_flags = RMAP_NONE; + + folio_ref_add(folio, HPAGE_PMD_NR - 1); + if (anon_exclusive) + rmap_flags |= RMAP_EXCLUSIVE; + folio_add_anon_rmap_ptes(folio, page, HPAGE_PMD_NR, + vma, haddr, rmap_flags); + } + } + + /* + * Withdraw the table only after we mark the pmd entry invalid. + * This's critical for some architectures (Power). + */ + pgtable = pgtable_trans_huge_withdraw(mm, pmd); + pmd_populate(mm, &_pmd, pgtable); + + pte = pte_offset_map(&_pmd, haddr); + VM_BUG_ON(!pte); + + /* + * Note that NUMA hinting access restrictions are not transferred to + * avoid any possibility of altering permissions across VMAs. + */ + if (freeze || pmd_migration) { + for (i = 0, addr = haddr; i < HPAGE_PMD_NR; i++, addr += PAGE_SIZE) { + pte_t entry; + swp_entry_t swp_entry; + + if (write) + swp_entry = make_writable_migration_entry( + page_to_pfn(page + i)); + else if (anon_exclusive) + swp_entry = make_readable_exclusive_migration_entry( + page_to_pfn(page + i)); + else + swp_entry = make_readable_migration_entry( + page_to_pfn(page + i)); + if (young) + swp_entry = make_migration_entry_young(swp_entry); + if (dirty) + swp_entry = make_migration_entry_dirty(swp_entry); + entry = swp_entry_to_pte(swp_entry); + if (soft_dirty) + entry = pte_swp_mksoft_dirty(entry); + if (uffd_wp) + entry = pte_swp_mkuffd_wp(entry); + + VM_WARN_ON(!pte_none(ptep_get(pte + i))); + set_pte_at(mm, addr, pte + i, entry); + } + } else { + pte_t entry; + + entry = mk_pte(page, READ_ONCE(vma->vm_page_prot)); + if (write) + entry = pte_mkwrite(entry, vma); + if (!young) + entry = pte_mkold(entry); + /* NOTE: this may set soft-dirty too on some archs 
*/ + if (dirty) + entry = pte_mkdirty(entry); + if (soft_dirty) + entry = pte_mksoft_dirty(entry); + if (uffd_wp) + entry = pte_mkuffd_wp(entry); + + for (i = 0; i < HPAGE_PMD_NR; i++) + VM_WARN_ON(!pte_none(ptep_get(pte + i))); + + set_ptes(mm, haddr, pte, entry, HPAGE_PMD_NR); + } + pte_unmap(pte); + + if (!pmd_migration) + folio_remove_rmap_pmd(folio, page, vma); + if (freeze) + put_page(page); + + smp_wmb(); /* make pte visible before pmd */ + pmd_populate(mm, pmd, pgtable); +} + +void split_huge_pmd_locked(struct vm_area_struct *vma, unsigned long address, + pmd_t *pmd, bool freeze, struct folio *folio) +{ + VM_WARN_ON_ONCE(folio && !folio_test_pmd_mappable(folio)); + VM_WARN_ON_ONCE(!IS_ALIGNED(address, HPAGE_PMD_SIZE)); + VM_WARN_ON_ONCE(folio && !folio_test_locked(folio)); + VM_BUG_ON(freeze && !folio); + + /* + * When the caller requests to set up a migration entry, we + * require a folio to check the PMD against. Otherwise, there + * is a risk of replacing the wrong folio. + */ + if (pmd_trans_huge(*pmd) || pmd_devmap(*pmd) || + is_pmd_migration_entry(*pmd)) { + if (folio && folio != pmd_folio(*pmd)) + return; + __split_huge_pmd_locked(vma, pmd, address, freeze); + } +} + +void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd, + unsigned long address, bool freeze, struct folio *folio) +{ + spinlock_t *ptl; + struct mmu_notifier_range range; + + mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma->vm_mm, + address & HPAGE_PMD_MASK, + (address & HPAGE_PMD_MASK) + HPAGE_PMD_SIZE); + mmu_notifier_invalidate_range_start(&range); + ptl = pmd_lock(vma->vm_mm, pmd); + split_huge_pmd_locked(vma, range.start, pmd, freeze, folio); + spin_unlock(ptl); + mmu_notifier_invalidate_range_end(&range); +} + +void split_huge_pmd_address(struct vm_area_struct *vma, unsigned long address, + bool freeze, struct folio *folio) +{ + pmd_t *pmd = mm_find_pmd(vma->vm_mm, address); + + if (!pmd) + return; + + __split_huge_pmd(vma, pmd, address, freeze, folio); +} + 
+#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION +int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw, + struct page *page) +{ + struct folio *folio = page_folio(page); + struct vm_area_struct *vma = pvmw->vma; + struct mm_struct *mm = vma->vm_mm; + unsigned long address = pvmw->address; + bool anon_exclusive; + pmd_t pmdval; + swp_entry_t entry; + pmd_t pmdswp; + + if (!(pvmw->pmd && !pvmw->pte)) + return 0; + + flush_cache_range(vma, address, address + HPAGE_PMD_SIZE); + pmdval = pmdp_invalidate(vma, address, pvmw->pmd); + + /* See folio_try_share_anon_rmap_pmd(): invalidate PMD first. */ + anon_exclusive = folio_test_anon(folio) && PageAnonExclusive(page); + if (anon_exclusive && folio_try_share_anon_rmap_pmd(folio, page)) { + set_pmd_at(mm, address, pvmw->pmd, pmdval); + return -EBUSY; + } + + if (pmd_dirty(pmdval)) + folio_mark_dirty(folio); + if (pmd_write(pmdval)) + entry = make_writable_migration_entry(page_to_pfn(page)); + else if (anon_exclusive) + entry = make_readable_exclusive_migration_entry(page_to_pfn(page)); + else + entry = make_readable_migration_entry(page_to_pfn(page)); + if (pmd_young(pmdval)) + entry = make_migration_entry_young(entry); + if (pmd_dirty(pmdval)) + entry = make_migration_entry_dirty(entry); + pmdswp = swp_entry_to_pmd(entry); + if (pmd_soft_dirty(pmdval)) + pmdswp = pmd_swp_mksoft_dirty(pmdswp); + if (pmd_uffd_wp(pmdval)) + pmdswp = pmd_swp_mkuffd_wp(pmdswp); + set_pmd_at(mm, address, pvmw->pmd, pmdswp); + folio_remove_rmap_pmd(folio, page, vma); + folio_put(folio); + trace_set_migration_pmd(address, pmd_val(pmdswp)); + + return 0; +} + +void remove_migration_pmd(struct page_vma_mapped_walk *pvmw, struct page *new) +{ + struct folio *folio = page_folio(new); + struct vm_area_struct *vma = pvmw->vma; + struct mm_struct *mm = vma->vm_mm; + unsigned long address = pvmw->address; + unsigned long haddr = address & HPAGE_PMD_MASK; + pmd_t pmde; + swp_entry_t entry; + + if (!(pvmw->pmd && !pvmw->pte)) + return; + + entry = 
pmd_to_swp_entry(*pvmw->pmd); + folio_get(folio); + pmde = mk_huge_pmd(new, READ_ONCE(vma->vm_page_prot)); + if (pmd_swp_soft_dirty(*pvmw->pmd)) + pmde = pmd_mksoft_dirty(pmde); + if (is_writable_migration_entry(entry)) + pmde = pmd_mkwrite(pmde, vma); + if (pmd_swp_uffd_wp(*pvmw->pmd)) + pmde = pmd_mkuffd_wp(pmde); + if (!is_migration_entry_young(entry)) + pmde = pmd_mkold(pmde); + /* NOTE: this may contain setting soft-dirty on some archs */ + if (folio_test_dirty(folio) && is_migration_entry_dirty(entry)) + pmde = pmd_mkdirty(pmde); + + if (folio_test_anon(folio)) { + rmap_t rmap_flags = RMAP_NONE; + + if (!is_readable_migration_entry(entry)) + rmap_flags |= RMAP_EXCLUSIVE; + + folio_add_anon_rmap_pmd(folio, new, vma, haddr, rmap_flags); + } else { + folio_add_file_rmap_pmd(folio, new, vma); + } + VM_BUG_ON(pmd_write(pmde) && folio_test_anon(folio) && !PageAnonExclusive(new)); + set_pmd_at(mm, haddr, pvmw->pmd, pmde); + + /* No need to invalidate - it was non-present before */ + update_mmu_cache_pmd(vma, address, pvmw->pmd); + trace_remove_migration_pmd(address, pmd_val(pmde)); +} +#endif diff --git a/mm/huge_mapping_pud.c b/mm/huge_mapping_pud.c new file mode 100644 index 000000000000..c3a6bffe2871 --- /dev/null +++ b/mm/huge_mapping_pud.c @@ -0,0 +1,235 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Copyright (C) 2024 Red Hat, Inc. + */ + +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include "internal.h" +#include "swap.h" + +/* + * Returns page table lock pointer if a given pud maps a thp, NULL otherwise. 
+ * + * Note that if it returns page table lock pointer, this routine returns without + * unlocking page table lock. So callers must unlock it. + */ +spinlock_t *__pud_trans_huge_lock(pud_t *pud, struct vm_area_struct *vma) +{ + spinlock_t *ptl; + + ptl = pud_lock(vma->vm_mm, pud); + if (likely(pud_trans_huge(*pud) || pud_devmap(*pud))) + return ptl; + spin_unlock(ptl); + return NULL; +} + +void touch_pud(struct vm_area_struct *vma, unsigned long addr, + pud_t *pud, bool write) +{ + pud_t _pud; + + _pud = pud_mkyoung(*pud); + if (write) + _pud = pud_mkdirty(_pud); + if (pudp_set_access_flags(vma, addr & HPAGE_PUD_MASK, + pud, _pud, write)) + update_mmu_cache_pud(vma, addr, pud); +} + +int copy_huge_pud(struct mm_struct *dst_mm, struct mm_struct *src_mm, + pud_t *dst_pud, pud_t *src_pud, unsigned long addr, + struct vm_area_struct *vma) +{ + spinlock_t *dst_ptl, *src_ptl; + pud_t pud; + int ret; + + dst_ptl = pud_lock(dst_mm, dst_pud); + src_ptl = pud_lockptr(src_mm, src_pud); + spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING); + + ret = -EAGAIN; + pud = *src_pud; + if (unlikely(!pud_trans_huge(pud) && !pud_devmap(pud))) + goto out_unlock; + + /* + * When page table lock is held, the huge zero pud should not be + * under splitting since we don't split the page itself, only pud to + * a page table. + */ + if (is_huge_zero_pud(pud)) { + /* No huge zero pud yet */ + } + + /* + * TODO: once we support anonymous pages, use + * folio_try_dup_anon_rmap_*() and split if duplicating fails. 
+ */ + pudp_set_wrprotect(src_mm, addr, src_pud); + pud = pud_mkold(pud_wrprotect(pud)); + set_pud_at(dst_mm, addr, dst_pud, pud); + + ret = 0; +out_unlock: + spin_unlock(src_ptl); + spin_unlock(dst_ptl); + return ret; +} + +void huge_pud_set_accessed(struct vm_fault *vmf, pud_t orig_pud) +{ + bool write = vmf->flags & FAULT_FLAG_WRITE; + + vmf->ptl = pud_lock(vmf->vma->vm_mm, vmf->pud); + if (unlikely(!pud_same(*vmf->pud, orig_pud))) + goto unlock; + + touch_pud(vmf->vma, vmf->address, vmf->pud, write); +unlock: + spin_unlock(vmf->ptl); +} + +/* + * Returns: + * + * - 0: if pud leaf changed from under us + * - 1: if pud can be skipped + * - HPAGE_PUD_NR: if pud was successfully processed + */ +int change_huge_pud(struct mmu_gather *tlb, struct vm_area_struct *vma, + pud_t *pudp, unsigned long addr, pgprot_t newprot, + unsigned long cp_flags) +{ + struct mm_struct *mm = vma->vm_mm; + pud_t oldpud, entry; + spinlock_t *ptl; + + tlb_change_page_size(tlb, HPAGE_PUD_SIZE); + + /* NUMA balancing doesn't apply to dax */ + if (cp_flags & MM_CP_PROT_NUMA) + return 1; + + /* + * Huge entries on userfault-wp only works with anonymous, while we + * don't have anonymous PUDs yet. + */ + if (WARN_ON_ONCE(cp_flags & MM_CP_UFFD_WP_ALL)) + return 1; + + ptl = __pud_trans_huge_lock(pudp, vma); + if (!ptl) + return 0; + + /* + * Can't clear PUD or it can race with concurrent zapping. See + * change_huge_pmd(). 
+ */ + oldpud = pudp_invalidate(vma, addr, pudp); + entry = pud_modify(oldpud, newprot); + set_pud_at(mm, addr, pudp, entry); + tlb_flush_pud_range(tlb, addr, HPAGE_PUD_SIZE); + + spin_unlock(ptl); + return HPAGE_PUD_NR; +} + +int zap_huge_pud(struct mmu_gather *tlb, struct vm_area_struct *vma, + pud_t *pud, unsigned long addr) +{ + spinlock_t *ptl; + pud_t orig_pud; + + ptl = __pud_trans_huge_lock(pud, vma); + if (!ptl) + return 0; + + orig_pud = pudp_huge_get_and_clear_full(vma, addr, pud, tlb->fullmm); + arch_check_zapped_pud(vma, orig_pud); + tlb_remove_pud_tlb_entry(tlb, pud, addr); + if (vma_is_special_huge(vma)) { + spin_unlock(ptl); + /* No zero page support yet */ + } else { + /* No support for anonymous PUD pages yet */ + BUG(); + } + return 1; +} + +static void __split_huge_pud_locked(struct vm_area_struct *vma, pud_t *pud, + unsigned long haddr) +{ + VM_BUG_ON(haddr & ~HPAGE_PUD_MASK); + VM_BUG_ON_VMA(vma->vm_start > haddr, vma); + VM_BUG_ON_VMA(vma->vm_end < haddr + HPAGE_PUD_SIZE, vma); + VM_BUG_ON(!pud_trans_huge(*pud) && !pud_devmap(*pud)); + +#if defined(CONFIG_TRANSPARENT_HUGEPAGE) && \ + defined(CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD) + count_vm_event(THP_SPLIT_PUD); +#endif + + pudp_huge_clear_flush(vma, haddr, pud); +} + +void __split_huge_pud(struct vm_area_struct *vma, pud_t *pud, + unsigned long address) +{ + spinlock_t *ptl; + struct mmu_notifier_range range; + + mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma->vm_mm, + address & HPAGE_PUD_MASK, + (address & HPAGE_PUD_MASK) + HPAGE_PUD_SIZE); + mmu_notifier_invalidate_range_start(&range); + ptl = pud_lock(vma->vm_mm, pud); + if (unlikely(!pud_trans_huge(*pud) && !pud_devmap(*pud))) + goto out; + __split_huge_pud_locked(vma, pud, range.start); + +out: + spin_unlock(ptl); + mmu_notifier_invalidate_range_end(&range); +} diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 554dec14b768..11aee24ce21a 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -838,13 +838,6 @@ static 
int __init setup_transparent_hugepage(char *str) } __setup("transparent_hugepage=", setup_transparent_hugepage); -pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma) -{ - if (likely(vma->vm_flags & VM_WRITE)) - pmd = pmd_mkwrite(pmd, vma); - return pmd; -} - #ifdef CONFIG_MEMCG static inline struct deferred_split *get_deferred_split_queue(struct folio *folio) @@ -1313,19 +1306,6 @@ vm_fault_t vmf_insert_pfn_pud(struct vm_fault *vmf, pfn_t pfn, bool write) EXPORT_SYMBOL_GPL(vmf_insert_pfn_pud); #endif /* CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD */ -void touch_pmd(struct vm_area_struct *vma, unsigned long addr, - pmd_t *pmd, bool write) -{ - pmd_t _pmd; - - _pmd = pmd_mkyoung(*pmd); - if (write) - _pmd = pmd_mkdirty(_pmd); - if (pmdp_set_access_flags(vma, addr & HPAGE_PMD_MASK, - pmd, _pmd, write)) - update_mmu_cache_pmd(vma, addr, pmd); -} - struct page *follow_devmap_pmd(struct vm_area_struct *vma, unsigned long addr, pmd_t *pmd, int flags, struct dev_pagemap **pgmap) { @@ -1366,309 +1346,6 @@ struct page *follow_devmap_pmd(struct vm_area_struct *vma, unsigned long addr, return page; } -int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm, - pmd_t *dst_pmd, pmd_t *src_pmd, unsigned long addr, - struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma) -{ - spinlock_t *dst_ptl, *src_ptl; - struct page *src_page; - struct folio *src_folio; - pmd_t pmd; - pgtable_t pgtable = NULL; - int ret = -ENOMEM; - - /* Skip if can be re-fill on fault */ - if (!vma_is_anonymous(dst_vma)) - return 0; - - pgtable = pte_alloc_one(dst_mm); - if (unlikely(!pgtable)) - goto out; - - dst_ptl = pmd_lock(dst_mm, dst_pmd); - src_ptl = pmd_lockptr(src_mm, src_pmd); - spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING); - - ret = -EAGAIN; - pmd = *src_pmd; - -#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION - if (unlikely(is_swap_pmd(pmd))) { - swp_entry_t entry = pmd_to_swp_entry(pmd); - - VM_BUG_ON(!is_pmd_migration_entry(pmd)); - if 
(!is_readable_migration_entry(entry)) { - entry = make_readable_migration_entry( - swp_offset(entry)); - pmd = swp_entry_to_pmd(entry); - if (pmd_swp_soft_dirty(*src_pmd)) - pmd = pmd_swp_mksoft_dirty(pmd); - if (pmd_swp_uffd_wp(*src_pmd)) - pmd = pmd_swp_mkuffd_wp(pmd); - set_pmd_at(src_mm, addr, src_pmd, pmd); - } - add_mm_counter(dst_mm, MM_ANONPAGES, HPAGE_PMD_NR); - mm_inc_nr_ptes(dst_mm); - pgtable_trans_huge_deposit(dst_mm, dst_pmd, pgtable); - if (!userfaultfd_wp(dst_vma)) - pmd = pmd_swp_clear_uffd_wp(pmd); - set_pmd_at(dst_mm, addr, dst_pmd, pmd); - ret = 0; - goto out_unlock; - } -#endif - - if (unlikely(!pmd_trans_huge(pmd))) { - pte_free(dst_mm, pgtable); - goto out_unlock; - } - /* - * When page table lock is held, the huge zero pmd should not be - * under splitting since we don't split the page itself, only pmd to - * a page table. - */ - if (is_huge_zero_pmd(pmd)) { - /* - * mm_get_huge_zero_folio() will never allocate a new - * folio here, since we already have a zero page to - * copy. It just takes a reference. - */ - mm_get_huge_zero_folio(dst_mm); - goto out_zero_page; - } - - src_page = pmd_page(pmd); - VM_BUG_ON_PAGE(!PageHead(src_page), src_page); - src_folio = page_folio(src_page); - - folio_get(src_folio); - if (unlikely(folio_try_dup_anon_rmap_pmd(src_folio, src_page, src_vma))) { - /* Page maybe pinned: split and retry the fault on PTEs. 
*/ - folio_put(src_folio); - pte_free(dst_mm, pgtable); - spin_unlock(src_ptl); - spin_unlock(dst_ptl); - __split_huge_pmd(src_vma, src_pmd, addr, false, NULL); - return -EAGAIN; - } - add_mm_counter(dst_mm, MM_ANONPAGES, HPAGE_PMD_NR); -out_zero_page: - mm_inc_nr_ptes(dst_mm); - pgtable_trans_huge_deposit(dst_mm, dst_pmd, pgtable); - pmdp_set_wrprotect(src_mm, addr, src_pmd); - if (!userfaultfd_wp(dst_vma)) - pmd = pmd_clear_uffd_wp(pmd); - pmd = pmd_mkold(pmd_wrprotect(pmd)); - set_pmd_at(dst_mm, addr, dst_pmd, pmd); - - ret = 0; -out_unlock: - spin_unlock(src_ptl); - spin_unlock(dst_ptl); -out: - return ret; -} - -#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD -void touch_pud(struct vm_area_struct *vma, unsigned long addr, - pud_t *pud, bool write) -{ - pud_t _pud; - - _pud = pud_mkyoung(*pud); - if (write) - _pud = pud_mkdirty(_pud); - if (pudp_set_access_flags(vma, addr & HPAGE_PUD_MASK, - pud, _pud, write)) - update_mmu_cache_pud(vma, addr, pud); -} - -int copy_huge_pud(struct mm_struct *dst_mm, struct mm_struct *src_mm, - pud_t *dst_pud, pud_t *src_pud, unsigned long addr, - struct vm_area_struct *vma) -{ - spinlock_t *dst_ptl, *src_ptl; - pud_t pud; - int ret; - - dst_ptl = pud_lock(dst_mm, dst_pud); - src_ptl = pud_lockptr(src_mm, src_pud); - spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING); - - ret = -EAGAIN; - pud = *src_pud; - if (unlikely(!pud_trans_huge(pud) && !pud_devmap(pud))) - goto out_unlock; - - /* - * When page table lock is held, the huge zero pud should not be - * under splitting since we don't split the page itself, only pud to - * a page table. - */ - if (is_huge_zero_pud(pud)) { - /* No huge zero pud yet */ - } - - /* - * TODO: once we support anonymous pages, use - * folio_try_dup_anon_rmap_*() and split if duplicating fails. 
- */ - pudp_set_wrprotect(src_mm, addr, src_pud); - pud = pud_mkold(pud_wrprotect(pud)); - set_pud_at(dst_mm, addr, dst_pud, pud); - - ret = 0; -out_unlock: - spin_unlock(src_ptl); - spin_unlock(dst_ptl); - return ret; -} - -void huge_pud_set_accessed(struct vm_fault *vmf, pud_t orig_pud) -{ - bool write = vmf->flags & FAULT_FLAG_WRITE; - - vmf->ptl = pud_lock(vmf->vma->vm_mm, vmf->pud); - if (unlikely(!pud_same(*vmf->pud, orig_pud))) - goto unlock; - - touch_pud(vmf->vma, vmf->address, vmf->pud, write); -unlock: - spin_unlock(vmf->ptl); -} -#endif /* CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD */ - -void huge_pmd_set_accessed(struct vm_fault *vmf) -{ - bool write = vmf->flags & FAULT_FLAG_WRITE; - - vmf->ptl = pmd_lock(vmf->vma->vm_mm, vmf->pmd); - if (unlikely(!pmd_same(*vmf->pmd, vmf->orig_pmd))) - goto unlock; - - touch_pmd(vmf->vma, vmf->address, vmf->pmd, write); - -unlock: - spin_unlock(vmf->ptl); -} - -vm_fault_t do_huge_pmd_wp_page(struct vm_fault *vmf) -{ - const bool unshare = vmf->flags & FAULT_FLAG_UNSHARE; - struct vm_area_struct *vma = vmf->vma; - struct folio *folio; - struct page *page; - unsigned long haddr = vmf->address & HPAGE_PMD_MASK; - pmd_t orig_pmd = vmf->orig_pmd; - - vmf->ptl = pmd_lockptr(vma->vm_mm, vmf->pmd); - VM_BUG_ON_VMA(!vma->anon_vma, vma); - - if (is_huge_zero_pmd(orig_pmd)) - goto fallback; - - spin_lock(vmf->ptl); - - if (unlikely(!pmd_same(*vmf->pmd, orig_pmd))) { - spin_unlock(vmf->ptl); - return 0; - } - - page = pmd_page(orig_pmd); - folio = page_folio(page); - VM_BUG_ON_PAGE(!PageHead(page), page); - - /* Early check when only holding the PT lock. */ - if (PageAnonExclusive(page)) - goto reuse; - - if (!folio_trylock(folio)) { - folio_get(folio); - spin_unlock(vmf->ptl); - folio_lock(folio); - spin_lock(vmf->ptl); - if (unlikely(!pmd_same(*vmf->pmd, orig_pmd))) { - spin_unlock(vmf->ptl); - folio_unlock(folio); - folio_put(folio); - return 0; - } - folio_put(folio); - } - - /* Recheck after temporarily dropping the PT lock. 
*/ - if (PageAnonExclusive(page)) { - folio_unlock(folio); - goto reuse; - } - - /* - * See do_wp_page(): we can only reuse the folio exclusively if - * there are no additional references. Note that we always drain - * the LRU cache immediately after adding a THP. - */ - if (folio_ref_count(folio) > - 1 + folio_test_swapcache(folio) * folio_nr_pages(folio)) - goto unlock_fallback; - if (folio_test_swapcache(folio)) - folio_free_swap(folio); - if (folio_ref_count(folio) == 1) { - pmd_t entry; - - folio_move_anon_rmap(folio, vma); - SetPageAnonExclusive(page); - folio_unlock(folio); -reuse: - if (unlikely(unshare)) { - spin_unlock(vmf->ptl); - return 0; - } - entry = pmd_mkyoung(orig_pmd); - entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma); - if (pmdp_set_access_flags(vma, haddr, vmf->pmd, entry, 1)) - update_mmu_cache_pmd(vma, vmf->address, vmf->pmd); - spin_unlock(vmf->ptl); - return 0; - } - -unlock_fallback: - folio_unlock(folio); - spin_unlock(vmf->ptl); -fallback: - __split_huge_pmd(vma, vmf->pmd, vmf->address, false, NULL); - return VM_FAULT_FALLBACK; -} - -static inline bool can_change_pmd_writable(struct vm_area_struct *vma, - unsigned long addr, pmd_t pmd) -{ - struct page *page; - - if (WARN_ON_ONCE(!(vma->vm_flags & VM_WRITE))) - return false; - - /* Don't touch entries that are not even readable (NUMA hinting). */ - if (pmd_protnone(pmd)) - return false; - - /* Do we need write faults for softdirty tracking? */ - if (pmd_needs_soft_dirty_wp(vma, pmd)) - return false; - - /* Do we need write faults for uffd-wp tracking? */ - if (userfaultfd_huge_pmd_wp(vma, pmd)) - return false; - - if (!(vma->vm_flags & VM_SHARED)) { - /* See can_change_pte_writable(). */ - page = vm_normal_page_pmd(vma, addr, pmd); - return page && PageAnon(page) && PageAnonExclusive(page); - } - - /* See can_change_pte_writable(). 
*/ - return pmd_dirty(pmd); -} - /* NUMA hinting page fault entry point for trans huge pmds */ vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf) { @@ -1830,342 +1507,6 @@ bool madvise_free_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma, return ret; } -static inline void zap_deposited_table(struct mm_struct *mm, pmd_t *pmd) -{ - pgtable_t pgtable; - - pgtable = pgtable_trans_huge_withdraw(mm, pmd); - pte_free(mm, pgtable); - mm_dec_nr_ptes(mm); -} - -int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma, - pmd_t *pmd, unsigned long addr) -{ - pmd_t orig_pmd; - spinlock_t *ptl; - - tlb_change_page_size(tlb, HPAGE_PMD_SIZE); - - ptl = __pmd_trans_huge_lock(pmd, vma); - if (!ptl) - return 0; - /* - * For architectures like ppc64 we look at deposited pgtable - * when calling pmdp_huge_get_and_clear. So do the - * pgtable_trans_huge_withdraw after finishing pmdp related - * operations. - */ - orig_pmd = pmdp_huge_get_and_clear_full(vma, addr, pmd, - tlb->fullmm); - arch_check_zapped_pmd(vma, orig_pmd); - tlb_remove_pmd_tlb_entry(tlb, pmd, addr); - if (vma_is_special_huge(vma)) { - if (arch_needs_pgtable_deposit()) - zap_deposited_table(tlb->mm, pmd); - spin_unlock(ptl); - } else if (is_huge_zero_pmd(orig_pmd)) { - zap_deposited_table(tlb->mm, pmd); - spin_unlock(ptl); - } else { - struct folio *folio = NULL; - int flush_needed = 1; - - if (pmd_present(orig_pmd)) { - struct page *page = pmd_page(orig_pmd); - - folio = page_folio(page); - folio_remove_rmap_pmd(folio, page, vma); - WARN_ON_ONCE(folio_mapcount(folio) < 0); - VM_BUG_ON_PAGE(!PageHead(page), page); - } else if (thp_migration_supported()) { - swp_entry_t entry; - - VM_BUG_ON(!is_pmd_migration_entry(orig_pmd)); - entry = pmd_to_swp_entry(orig_pmd); - folio = pfn_swap_entry_folio(entry); - flush_needed = 0; - } else - WARN_ONCE(1, "Non present huge pmd without pmd migration enabled!"); - - if (folio_test_anon(folio)) { - zap_deposited_table(tlb->mm, pmd); - add_mm_counter(tlb->mm, 
MM_ANONPAGES, -HPAGE_PMD_NR); - } else { - if (arch_needs_pgtable_deposit()) - zap_deposited_table(tlb->mm, pmd); - add_mm_counter(tlb->mm, mm_counter_file(folio), - -HPAGE_PMD_NR); - } - - spin_unlock(ptl); - if (flush_needed) - tlb_remove_page_size(tlb, &folio->page, HPAGE_PMD_SIZE); - } - return 1; -} - -#ifndef pmd_move_must_withdraw -static inline int pmd_move_must_withdraw(spinlock_t *new_pmd_ptl, - spinlock_t *old_pmd_ptl, - struct vm_area_struct *vma) -{ - /* - * With split pmd lock we also need to move preallocated - * PTE page table if new_pmd is on different PMD page table. - * - * We also don't deposit and withdraw tables for file pages. - */ - return (new_pmd_ptl != old_pmd_ptl) && vma_is_anonymous(vma); -} -#endif - -static pmd_t move_soft_dirty_pmd(pmd_t pmd) -{ -#ifdef CONFIG_MEM_SOFT_DIRTY - if (unlikely(is_pmd_migration_entry(pmd))) - pmd = pmd_swp_mksoft_dirty(pmd); - else if (pmd_present(pmd)) - pmd = pmd_mksoft_dirty(pmd); -#endif - return pmd; -} - -bool move_huge_pmd(struct vm_area_struct *vma, unsigned long old_addr, - unsigned long new_addr, pmd_t *old_pmd, pmd_t *new_pmd) -{ - spinlock_t *old_ptl, *new_ptl; - pmd_t pmd; - struct mm_struct *mm = vma->vm_mm; - bool force_flush = false; - - /* - * The destination pmd shouldn't be established, free_pgtables() - * should have released it; but move_page_tables() might have already - * inserted a page table, if racing against shmem/file collapse. - */ - if (!pmd_none(*new_pmd)) { - VM_BUG_ON(pmd_trans_huge(*new_pmd)); - return false; - } - - /* - * We don't have to worry about the ordering of src and dst - * ptlocks because exclusive mmap_lock prevents deadlock. 
- */ - old_ptl = __pmd_trans_huge_lock(old_pmd, vma); - if (old_ptl) { - new_ptl = pmd_lockptr(mm, new_pmd); - if (new_ptl != old_ptl) - spin_lock_nested(new_ptl, SINGLE_DEPTH_NESTING); - pmd = pmdp_huge_get_and_clear(mm, old_addr, old_pmd); - if (pmd_present(pmd)) - force_flush = true; - VM_BUG_ON(!pmd_none(*new_pmd)); - - if (pmd_move_must_withdraw(new_ptl, old_ptl, vma)) { - pgtable_t pgtable; - pgtable = pgtable_trans_huge_withdraw(mm, old_pmd); - pgtable_trans_huge_deposit(mm, new_pmd, pgtable); - } - pmd = move_soft_dirty_pmd(pmd); - set_pmd_at(mm, new_addr, new_pmd, pmd); - if (force_flush) - flush_pmd_tlb_range(vma, old_addr, old_addr + PMD_SIZE); - if (new_ptl != old_ptl) - spin_unlock(new_ptl); - spin_unlock(old_ptl); - return true; - } - return false; -} - -/* - * Returns - * - 0 if PMD could not be locked - * - 1 if PMD was locked but protections unchanged and TLB flush unnecessary - * or if prot_numa but THP migration is not supported - * - HPAGE_PMD_NR if protections changed and TLB flush necessary - */ -int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma, - pmd_t *pmd, unsigned long addr, pgprot_t newprot, - unsigned long cp_flags) -{ - struct mm_struct *mm = vma->vm_mm; - spinlock_t *ptl; - pmd_t oldpmd, entry; - bool prot_numa = cp_flags & MM_CP_PROT_NUMA; - bool uffd_wp = cp_flags & MM_CP_UFFD_WP; - bool uffd_wp_resolve = cp_flags & MM_CP_UFFD_WP_RESOLVE; - int ret = 1; - - tlb_change_page_size(tlb, HPAGE_PMD_SIZE); - - if (prot_numa && !thp_migration_supported()) - return 1; - - ptl = __pmd_trans_huge_lock(pmd, vma); - if (!ptl) - return 0; - -#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION - if (is_swap_pmd(*pmd)) { - swp_entry_t entry = pmd_to_swp_entry(*pmd); - struct folio *folio = pfn_swap_entry_folio(entry); - pmd_t newpmd; - - VM_BUG_ON(!is_pmd_migration_entry(*pmd)); - if (is_writable_migration_entry(entry)) { - /* - * A protection check is difficult so - * just be safe and disable write - */ - if (folio_test_anon(folio)) - 
entry = make_readable_exclusive_migration_entry(swp_offset(entry)); - else - entry = make_readable_migration_entry(swp_offset(entry)); - newpmd = swp_entry_to_pmd(entry); - if (pmd_swp_soft_dirty(*pmd)) - newpmd = pmd_swp_mksoft_dirty(newpmd); - } else { - newpmd = *pmd; - } - - if (uffd_wp) - newpmd = pmd_swp_mkuffd_wp(newpmd); - else if (uffd_wp_resolve) - newpmd = pmd_swp_clear_uffd_wp(newpmd); - if (!pmd_same(*pmd, newpmd)) - set_pmd_at(mm, addr, pmd, newpmd); - goto unlock; - } -#endif - - if (prot_numa) { - struct folio *folio; - bool toptier; - /* - * Avoid trapping faults against the zero page. The read-only - * data is likely to be read-cached on the local CPU and - * local/remote hits to the zero page are not interesting. - */ - if (is_huge_zero_pmd(*pmd)) - goto unlock; - - if (pmd_protnone(*pmd)) - goto unlock; - - folio = pmd_folio(*pmd); - toptier = node_is_toptier(folio_nid(folio)); - /* - * Skip scanning top tier node if normal numa - * balancing is disabled - */ - if (!(sysctl_numa_balancing_mode & NUMA_BALANCING_NORMAL) && - toptier) - goto unlock; - - if (sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING && - !toptier) - folio_xchg_access_time(folio, - jiffies_to_msecs(jiffies)); - } - /* - * In case prot_numa, we are under mmap_read_lock(mm). It's critical - * to not clear pmd intermittently to avoid race with MADV_DONTNEED - * which is also under mmap_read_lock(mm): - * - * CPU0: CPU1: - * change_huge_pmd(prot_numa=1) - * pmdp_huge_get_and_clear_notify() - * madvise_dontneed() - * zap_pmd_range() - * pmd_trans_huge(*pmd) == 0 (without ptl) - * // skip the pmd - * set_pmd_at(); - * // pmd is re-established - * - * The race makes MADV_DONTNEED miss the huge pmd and don't clear it - * which may break userspace. - * - * pmdp_invalidate_ad() is required to make sure we don't miss - * dirty/young flags set by hardware. 
- */ - oldpmd = pmdp_invalidate_ad(vma, addr, pmd); - - entry = pmd_modify(oldpmd, newprot); - if (uffd_wp) - entry = pmd_mkuffd_wp(entry); - else if (uffd_wp_resolve) - /* - * Leave the write bit to be handled by PF interrupt - * handler, then things like COW could be properly - * handled. - */ - entry = pmd_clear_uffd_wp(entry); - - /* See change_pte_range(). */ - if ((cp_flags & MM_CP_TRY_CHANGE_WRITABLE) && !pmd_write(entry) && - can_change_pmd_writable(vma, addr, entry)) - entry = pmd_mkwrite(entry, vma); - - ret = HPAGE_PMD_NR; - set_pmd_at(mm, addr, pmd, entry); - - if (huge_pmd_needs_flush(oldpmd, entry)) - tlb_flush_pmd_range(tlb, addr, HPAGE_PMD_SIZE); -unlock: - spin_unlock(ptl); - return ret; -} - -/* - * Returns: - * - * - 0: if pud leaf changed from under us - * - 1: if pud can be skipped - * - HPAGE_PUD_NR: if pud was successfully processed - */ -#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD -int change_huge_pud(struct mmu_gather *tlb, struct vm_area_struct *vma, - pud_t *pudp, unsigned long addr, pgprot_t newprot, - unsigned long cp_flags) -{ - struct mm_struct *mm = vma->vm_mm; - pud_t oldpud, entry; - spinlock_t *ptl; - - tlb_change_page_size(tlb, HPAGE_PUD_SIZE); - - /* NUMA balancing doesn't apply to dax */ - if (cp_flags & MM_CP_PROT_NUMA) - return 1; - - /* - * Huge entries on userfault-wp only works with anonymous, while we - * don't have anonymous PUDs yet. - */ - if (WARN_ON_ONCE(cp_flags & MM_CP_UFFD_WP_ALL)) - return 1; - - ptl = __pud_trans_huge_lock(pudp, vma); - if (!ptl) - return 0; - - /* - * Can't clear PUD or it can race with concurrent zapping. See - * change_huge_pmd(). 
- */ - oldpud = pudp_invalidate(vma, addr, pudp); - entry = pud_modify(oldpud, newprot); - set_pud_at(mm, addr, pudp, entry); - tlb_flush_pud_range(tlb, addr, HPAGE_PUD_SIZE); - - spin_unlock(ptl); - return HPAGE_PUD_NR; -} -#endif - #ifdef CONFIG_USERFAULTFD /* * The PT lock for src_pmd and dst_vma/src_vma (for reading) are locked by @@ -2306,105 +1647,8 @@ int move_pages_huge_pmd(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd, pm } #endif /* CONFIG_USERFAULTFD */ -/* - * Returns page table lock pointer if a given pmd maps a thp, NULL otherwise. - * - * Note that if it returns page table lock pointer, this routine returns without - * unlocking page table lock. So callers must unlock it. - */ -spinlock_t *__pmd_trans_huge_lock(pmd_t *pmd, struct vm_area_struct *vma) -{ - spinlock_t *ptl; - ptl = pmd_lock(vma->vm_mm, pmd); - if (likely(is_swap_pmd(*pmd) || pmd_trans_huge(*pmd) || - pmd_devmap(*pmd))) - return ptl; - spin_unlock(ptl); - return NULL; -} - -/* - * Returns page table lock pointer if a given pud maps a thp, NULL otherwise. - * - * Note that if it returns page table lock pointer, this routine returns without - * unlocking page table lock. So callers must unlock it. 
- */ -spinlock_t *__pud_trans_huge_lock(pud_t *pud, struct vm_area_struct *vma) -{ - spinlock_t *ptl; - - ptl = pud_lock(vma->vm_mm, pud); - if (likely(pud_trans_huge(*pud) || pud_devmap(*pud))) - return ptl; - spin_unlock(ptl); - return NULL; -} - -#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD -int zap_huge_pud(struct mmu_gather *tlb, struct vm_area_struct *vma, - pud_t *pud, unsigned long addr) -{ - spinlock_t *ptl; - pud_t orig_pud; - - ptl = __pud_trans_huge_lock(pud, vma); - if (!ptl) - return 0; - - orig_pud = pudp_huge_get_and_clear_full(vma, addr, pud, tlb->fullmm); - arch_check_zapped_pud(vma, orig_pud); - tlb_remove_pud_tlb_entry(tlb, pud, addr); - if (vma_is_special_huge(vma)) { - spin_unlock(ptl); - /* No zero page support yet */ - } else { - /* No support for anonymous PUD pages yet */ - BUG(); - } - return 1; -} - -static void __split_huge_pud_locked(struct vm_area_struct *vma, pud_t *pud, - unsigned long haddr) -{ - VM_BUG_ON(haddr & ~HPAGE_PUD_MASK); - VM_BUG_ON_VMA(vma->vm_start > haddr, vma); - VM_BUG_ON_VMA(vma->vm_end < haddr + HPAGE_PUD_SIZE, vma); - VM_BUG_ON(!pud_trans_huge(*pud) && !pud_devmap(*pud)); - - count_vm_event(THP_SPLIT_PUD); - - pudp_huge_clear_flush(vma, haddr, pud); -} - -void __split_huge_pud(struct vm_area_struct *vma, pud_t *pud, - unsigned long address) -{ - spinlock_t *ptl; - struct mmu_notifier_range range; - - mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma->vm_mm, - address & HPAGE_PUD_MASK, - (address & HPAGE_PUD_MASK) + HPAGE_PUD_SIZE); - mmu_notifier_invalidate_range_start(&range); - ptl = pud_lock(vma->vm_mm, pud); - if (unlikely(!pud_trans_huge(*pud) && !pud_devmap(*pud))) - goto out; - __split_huge_pud_locked(vma, pud, range.start); - -out: - spin_unlock(ptl); - mmu_notifier_invalidate_range_end(&range); -} -#else -void __split_huge_pud(struct vm_area_struct *vma, pud_t *pud, - unsigned long address) -{ -} -#endif /* CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD */ - -static void 
__split_huge_zero_page_pmd(struct vm_area_struct *vma, - unsigned long haddr, pmd_t *pmd) +void __split_huge_zero_page_pmd(struct vm_area_struct *vma, + unsigned long haddr, pmd_t *pmd) { struct mm_struct *mm = vma->vm_mm; pgtable_t pgtable; @@ -2444,274 +1688,6 @@ static void __split_huge_zero_page_pmd(struct vm_area_struct *vma, pmd_populate(mm, pmd, pgtable); } -static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd, - unsigned long haddr, bool freeze) -{ - struct mm_struct *mm = vma->vm_mm; - struct folio *folio; - struct page *page; - pgtable_t pgtable; - pmd_t old_pmd, _pmd; - bool young, write, soft_dirty, pmd_migration = false, uffd_wp = false; - bool anon_exclusive = false, dirty = false; - unsigned long addr; - pte_t *pte; - int i; - - VM_BUG_ON(haddr & ~HPAGE_PMD_MASK); - VM_BUG_ON_VMA(vma->vm_start > haddr, vma); - VM_BUG_ON_VMA(vma->vm_end < haddr + HPAGE_PMD_SIZE, vma); - VM_BUG_ON(!is_pmd_migration_entry(*pmd) && !pmd_trans_huge(*pmd) - && !pmd_devmap(*pmd)); - - count_vm_event(THP_SPLIT_PMD); - - if (!vma_is_anonymous(vma)) { - old_pmd = pmdp_huge_clear_flush(vma, haddr, pmd); - /* - * We are going to unmap this huge page. 
So - * just go ahead and zap it - */ - if (arch_needs_pgtable_deposit()) - zap_deposited_table(mm, pmd); - if (vma_is_special_huge(vma)) - return; - if (unlikely(is_pmd_migration_entry(old_pmd))) { - swp_entry_t entry; - - entry = pmd_to_swp_entry(old_pmd); - folio = pfn_swap_entry_folio(entry); - } else { - page = pmd_page(old_pmd); - folio = page_folio(page); - if (!folio_test_dirty(folio) && pmd_dirty(old_pmd)) - folio_mark_dirty(folio); - if (!folio_test_referenced(folio) && pmd_young(old_pmd)) - folio_set_referenced(folio); - folio_remove_rmap_pmd(folio, page, vma); - folio_put(folio); - } - add_mm_counter(mm, mm_counter_file(folio), -HPAGE_PMD_NR); - return; - } - - if (is_huge_zero_pmd(*pmd)) { - /* - * FIXME: Do we want to invalidate secondary mmu by calling - * mmu_notifier_arch_invalidate_secondary_tlbs() see comments below - * inside __split_huge_pmd() ? - * - * We are going from a zero huge page write protected to zero - * small page also write protected so it does not seems useful - * to invalidate secondary mmu at this time. - */ - return __split_huge_zero_page_pmd(vma, haddr, pmd); - } - - pmd_migration = is_pmd_migration_entry(*pmd); - if (unlikely(pmd_migration)) { - swp_entry_t entry; - - old_pmd = *pmd; - entry = pmd_to_swp_entry(old_pmd); - page = pfn_swap_entry_to_page(entry); - write = is_writable_migration_entry(entry); - if (PageAnon(page)) - anon_exclusive = is_readable_exclusive_migration_entry(entry); - young = is_migration_entry_young(entry); - dirty = is_migration_entry_dirty(entry); - soft_dirty = pmd_swp_soft_dirty(old_pmd); - uffd_wp = pmd_swp_uffd_wp(old_pmd); - } else { - /* - * Up to this point the pmd is present and huge and userland has - * the whole access to the hugepage during the split (which - * happens in place). 
If we overwrite the pmd with the not-huge - * version pointing to the pte here (which of course we could if - * all CPUs were bug free), userland could trigger a small page - * size TLB miss on the small sized TLB while the hugepage TLB - * entry is still established in the huge TLB. Some CPU doesn't - * like that. See - * http://support.amd.com/TechDocs/41322_10h_Rev_Gd.pdf, Erratum - * 383 on page 105. Intel should be safe but is also warns that - * it's only safe if the permission and cache attributes of the - * two entries loaded in the two TLB is identical (which should - * be the case here). But it is generally safer to never allow - * small and huge TLB entries for the same virtual address to be - * loaded simultaneously. So instead of doing "pmd_populate(); - * flush_pmd_tlb_range();" we first mark the current pmd - * notpresent (atomically because here the pmd_trans_huge must - * remain set at all times on the pmd until the split is - * complete for this pmd), then we flush the SMP TLB and finally - * we write the non-huge version of the pmd entry with - * pmd_populate. - */ - old_pmd = pmdp_invalidate(vma, haddr, pmd); - page = pmd_page(old_pmd); - folio = page_folio(page); - if (pmd_dirty(old_pmd)) { - dirty = true; - folio_set_dirty(folio); - } - write = pmd_write(old_pmd); - young = pmd_young(old_pmd); - soft_dirty = pmd_soft_dirty(old_pmd); - uffd_wp = pmd_uffd_wp(old_pmd); - - VM_WARN_ON_FOLIO(!folio_ref_count(folio), folio); - VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio); - - /* - * Without "freeze", we'll simply split the PMD, propagating the - * PageAnonExclusive() flag for each PTE by setting it for - * each subpage -- no need to (temporarily) clear. - * - * With "freeze" we want to replace mapped pages by - * migration entries right away. This is only possible if we - * managed to clear PageAnonExclusive() -- see - * set_pmd_migration_entry(). 
- * - * In case we cannot clear PageAnonExclusive(), split the PMD - * only and let try_to_migrate_one() fail later. - * - * See folio_try_share_anon_rmap_pmd(): invalidate PMD first. - */ - anon_exclusive = PageAnonExclusive(page); - if (freeze && anon_exclusive && - folio_try_share_anon_rmap_pmd(folio, page)) - freeze = false; - if (!freeze) { - rmap_t rmap_flags = RMAP_NONE; - - folio_ref_add(folio, HPAGE_PMD_NR - 1); - if (anon_exclusive) - rmap_flags |= RMAP_EXCLUSIVE; - folio_add_anon_rmap_ptes(folio, page, HPAGE_PMD_NR, - vma, haddr, rmap_flags); - } - } - - /* - * Withdraw the table only after we mark the pmd entry invalid. - * This's critical for some architectures (Power). - */ - pgtable = pgtable_trans_huge_withdraw(mm, pmd); - pmd_populate(mm, &_pmd, pgtable); - - pte = pte_offset_map(&_pmd, haddr); - VM_BUG_ON(!pte); - - /* - * Note that NUMA hinting access restrictions are not transferred to - * avoid any possibility of altering permissions across VMAs. - */ - if (freeze || pmd_migration) { - for (i = 0, addr = haddr; i < HPAGE_PMD_NR; i++, addr += PAGE_SIZE) { - pte_t entry; - swp_entry_t swp_entry; - - if (write) - swp_entry = make_writable_migration_entry( - page_to_pfn(page + i)); - else if (anon_exclusive) - swp_entry = make_readable_exclusive_migration_entry( - page_to_pfn(page + i)); - else - swp_entry = make_readable_migration_entry( - page_to_pfn(page + i)); - if (young) - swp_entry = make_migration_entry_young(swp_entry); - if (dirty) - swp_entry = make_migration_entry_dirty(swp_entry); - entry = swp_entry_to_pte(swp_entry); - if (soft_dirty) - entry = pte_swp_mksoft_dirty(entry); - if (uffd_wp) - entry = pte_swp_mkuffd_wp(entry); - - VM_WARN_ON(!pte_none(ptep_get(pte + i))); - set_pte_at(mm, addr, pte + i, entry); - } - } else { - pte_t entry; - - entry = mk_pte(page, READ_ONCE(vma->vm_page_prot)); - if (write) - entry = pte_mkwrite(entry, vma); - if (!young) - entry = pte_mkold(entry); - /* NOTE: this may set soft-dirty too on some archs 
*/ - if (dirty) - entry = pte_mkdirty(entry); - if (soft_dirty) - entry = pte_mksoft_dirty(entry); - if (uffd_wp) - entry = pte_mkuffd_wp(entry); - - for (i = 0; i < HPAGE_PMD_NR; i++) - VM_WARN_ON(!pte_none(ptep_get(pte + i))); - - set_ptes(mm, haddr, pte, entry, HPAGE_PMD_NR); - } - pte_unmap(pte); - - if (!pmd_migration) - folio_remove_rmap_pmd(folio, page, vma); - if (freeze) - put_page(page); - - smp_wmb(); /* make pte visible before pmd */ - pmd_populate(mm, pmd, pgtable); -} - -void split_huge_pmd_locked(struct vm_area_struct *vma, unsigned long address, - pmd_t *pmd, bool freeze, struct folio *folio) -{ - VM_WARN_ON_ONCE(folio && !folio_test_pmd_mappable(folio)); - VM_WARN_ON_ONCE(!IS_ALIGNED(address, HPAGE_PMD_SIZE)); - VM_WARN_ON_ONCE(folio && !folio_test_locked(folio)); - VM_BUG_ON(freeze && !folio); - - /* - * When the caller requests to set up a migration entry, we - * require a folio to check the PMD against. Otherwise, there - * is a risk of replacing the wrong folio. - */ - if (pmd_trans_huge(*pmd) || pmd_devmap(*pmd) || - is_pmd_migration_entry(*pmd)) { - if (folio && folio != pmd_folio(*pmd)) - return; - __split_huge_pmd_locked(vma, pmd, address, freeze); - } -} - -void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd, - unsigned long address, bool freeze, struct folio *folio) -{ - spinlock_t *ptl; - struct mmu_notifier_range range; - - mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma->vm_mm, - address & HPAGE_PMD_MASK, - (address & HPAGE_PMD_MASK) + HPAGE_PMD_SIZE); - mmu_notifier_invalidate_range_start(&range); - ptl = pmd_lock(vma->vm_mm, pmd); - split_huge_pmd_locked(vma, range.start, pmd, freeze, folio); - spin_unlock(ptl); - mmu_notifier_invalidate_range_end(&range); -} - -void split_huge_pmd_address(struct vm_area_struct *vma, unsigned long address, - bool freeze, struct folio *folio) -{ - pmd_t *pmd = mm_find_pmd(vma->vm_mm, address); - - if (!pmd) - return; - - __split_huge_pmd(vma, pmd, address, freeze, folio); -} - static 
inline void split_huge_pmd_if_needed(struct vm_area_struct *vma, unsigned long address) { /* @@ -3772,100 +2748,3 @@ static int __init split_huge_pages_debugfs(void) late_initcall(split_huge_pages_debugfs); #endif -#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION -int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw, - struct page *page) -{ - struct folio *folio = page_folio(page); - struct vm_area_struct *vma = pvmw->vma; - struct mm_struct *mm = vma->vm_mm; - unsigned long address = pvmw->address; - bool anon_exclusive; - pmd_t pmdval; - swp_entry_t entry; - pmd_t pmdswp; - - if (!(pvmw->pmd && !pvmw->pte)) - return 0; - - flush_cache_range(vma, address, address + HPAGE_PMD_SIZE); - pmdval = pmdp_invalidate(vma, address, pvmw->pmd); - - /* See folio_try_share_anon_rmap_pmd(): invalidate PMD first. */ - anon_exclusive = folio_test_anon(folio) && PageAnonExclusive(page); - if (anon_exclusive && folio_try_share_anon_rmap_pmd(folio, page)) { - set_pmd_at(mm, address, pvmw->pmd, pmdval); - return -EBUSY; - } - - if (pmd_dirty(pmdval)) - folio_mark_dirty(folio); - if (pmd_write(pmdval)) - entry = make_writable_migration_entry(page_to_pfn(page)); - else if (anon_exclusive) - entry = make_readable_exclusive_migration_entry(page_to_pfn(page)); - else - entry = make_readable_migration_entry(page_to_pfn(page)); - if (pmd_young(pmdval)) - entry = make_migration_entry_young(entry); - if (pmd_dirty(pmdval)) - entry = make_migration_entry_dirty(entry); - pmdswp = swp_entry_to_pmd(entry); - if (pmd_soft_dirty(pmdval)) - pmdswp = pmd_swp_mksoft_dirty(pmdswp); - if (pmd_uffd_wp(pmdval)) - pmdswp = pmd_swp_mkuffd_wp(pmdswp); - set_pmd_at(mm, address, pvmw->pmd, pmdswp); - folio_remove_rmap_pmd(folio, page, vma); - folio_put(folio); - trace_set_migration_pmd(address, pmd_val(pmdswp)); - - return 0; -} - -void remove_migration_pmd(struct page_vma_mapped_walk *pvmw, struct page *new) -{ - struct folio *folio = page_folio(new); - struct vm_area_struct *vma = pvmw->vma; - struct 
mm_struct *mm = vma->vm_mm; - unsigned long address = pvmw->address; - unsigned long haddr = address & HPAGE_PMD_MASK; - pmd_t pmde; - swp_entry_t entry; - - if (!(pvmw->pmd && !pvmw->pte)) - return; - - entry = pmd_to_swp_entry(*pvmw->pmd); - folio_get(folio); - pmde = mk_huge_pmd(new, READ_ONCE(vma->vm_page_prot)); - if (pmd_swp_soft_dirty(*pvmw->pmd)) - pmde = pmd_mksoft_dirty(pmde); - if (is_writable_migration_entry(entry)) - pmde = pmd_mkwrite(pmde, vma); - if (pmd_swp_uffd_wp(*pvmw->pmd)) - pmde = pmd_mkuffd_wp(pmde); - if (!is_migration_entry_young(entry)) - pmde = pmd_mkold(pmde); - /* NOTE: this may contain setting soft-dirty on some archs */ - if (folio_test_dirty(folio) && is_migration_entry_dirty(entry)) - pmde = pmd_mkdirty(pmde); - - if (folio_test_anon(folio)) { - rmap_t rmap_flags = RMAP_NONE; - - if (!is_readable_migration_entry(entry)) - rmap_flags |= RMAP_EXCLUSIVE; - - folio_add_anon_rmap_pmd(folio, new, vma, haddr, rmap_flags); - } else { - folio_add_file_rmap_pmd(folio, new, vma); - } - VM_BUG_ON(pmd_write(pmde) && folio_test_anon(folio) && !PageAnonExclusive(new)); - set_pmd_at(mm, haddr, pvmw->pmd, pmde); - - /* No need to invalidate - it was non-present before */ - update_mmu_cache_pmd(vma, address, pvmw->pmd); - trace_remove_migration_pmd(address, pmd_val(pmde)); -} -#endif

From patchwork Wed Jul 17 22:02:19 2024
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 1961814
From: Peter Xu
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH RFC 6/6] mm: Convert "*_trans_huge() || *_devmap()" to use *_leaf()
Date: Wed, 17 Jul 2024 18:02:19 -0400
Message-ID: <20240717220219.3743374-7-peterx@redhat.com>
In-Reply-To: <20240717220219.3743374-1-peterx@redhat.com>
References: <20240717220219.3743374-1-peterx@redhat.com>
Cc: linux-arm-kernel@lists.infradead.org, linux-s390@vger.kernel.org, Alistair Popple, Ryan Roberts, David Hildenbrand, x86@kernel.org, Hugh Dickins, peterx@redhat.com, Michal Hocko, Alex Williamson, linux-riscv@lists.infradead.org, Matthew Wilcox, Jason Gunthorpe, sparclinux@vger.kernel.org, Axel Rasmussen, Andrew Morton, linuxppc-dev@lists.ozlabs.org, Dan Williams, Vlastimil Babka, Oscar Salvador

This patch converts all such checks into one *_leaf() check under common mm/, as "thp+devmap" should cover everything a *_leaf() can be for now. I didn't yet touch arch code in other directories, as some archs may need special attention, so I left those to be handled separately.

It should start to save some cycles on such checks and pave the way for new leaf types. E.g., when a new type of leaf is introduced, it'll naturally take the same route as what we have now for thp+devmap.

One issue with the pxx_leaf() API is that it is defined by the arch and doesn't take the kernel config into account.
For example, the "if" branch below cannot be automatically optimized:

  if (pmd_leaf()) { ... }

even if neither THP nor HUGETLB is enabled (which means pmd_leaf() can never return true).

To give compilers a chance to optimize and omit code when possible, introduce a light wrapper for them and call them pxx_is_leaf(). The wrapper takes the kernel config into account and properly allows omitting branches when the compiler knows the check will always return false. This tries to mimic what we used to have with pxx_trans_huge() when !THP, so it now also applies to the pxx_leaf() API.

Cc: Alistair Popple
Cc: Dan Williams
Cc: Jason Gunthorpe
Signed-off-by: Peter Xu
---
 include/linux/huge_mm.h | 6 +++---
 include/linux/pgtable.h | 30 +++++++++++++++++++++++++++++-
 mm/hmm.c | 4 ++--
 mm/huge_mapping_pmd.c | 9 +++------
 mm/huge_mapping_pud.c | 6 +++---
 mm/mapping_dirty_helpers.c | 4 ++--
 mm/memory.c | 14 ++++++--------
 mm/migrate_device.c | 2 +-
 mm/mprotect.c | 4 ++--
 mm/mremap.c | 5 ++---
 mm/page_vma_mapped.c | 5 ++---
 mm/pgtable-generic.c | 7 +++----
 12 files changed, 58 insertions(+), 38 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index aea2784df8ef..a5b026d0731e 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -27,7 +27,7 @@ spinlock_t *__pud_trans_huge_lock(pud_t *pud, struct vm_area_struct *vma); static inline spinlock_t * pud_trans_huge_lock(pud_t *pud, struct vm_area_struct *vma) { - if (pud_trans_huge(*pud) || pud_devmap(*pud)) + if (pud_is_leaf(*pud)) return __pud_trans_huge_lock(pud, vma); else return NULL; @@ -36,7 +36,7 @@ pud_trans_huge_lock(pud_t *pud, struct vm_area_struct *vma) #define split_huge_pud(__vma, __pud, __address) \ do { \ pud_t *____pud = (__pud); \ - if (pud_trans_huge(*____pud) || pud_devmap(*____pud)) \ + if (pud_is_leaf(*____pud)) \ __split_huge_pud(__vma, __pud, __address); \ } while (0) #else /* CONFIG_PGTABLE_HAS_PUD_LEAVES */ @@ -125,7 +125,7 @@ static inline int is_swap_pmd(pmd_t pmd) static inline
spinlock_t * pmd_trans_huge_lock(pmd_t *pmd, struct vm_area_struct *vma) { - if (is_swap_pmd(*pmd) || pmd_trans_huge(*pmd) || pmd_devmap(*pmd)) + if (is_swap_pmd(*pmd) || pmd_is_leaf(*pmd)) return __pmd_trans_huge_lock(pmd, vma); else return NULL; diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h index 5e505373b113..af7709a132aa 100644 --- a/include/linux/pgtable.h +++ b/include/linux/pgtable.h @@ -1641,7 +1641,7 @@ static inline int pud_trans_unstable(pud_t *pud) defined(CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD) pud_t pudval = READ_ONCE(*pud); - if (pud_none(pudval) || pud_trans_huge(pudval) || pud_devmap(pudval)) + if (pud_none(pudval) || pud_leaf(pudval)) return 1; if (unlikely(pud_bad(pudval))) { pud_clear_bad(pud); @@ -1901,6 +1901,34 @@ typedef unsigned int pgtbl_mod_mask; #define pmd_leaf(x) false #endif +/* + * Wrapper of pxx_leaf() helpers. + * + * Comparing to pxx_leaf() API, the only difference is: using these macros + * can help code generation, so unnecessary code can be omitted when the + * specific level of leaf is not possible due to kernel config. It is + * needed because normally pxx_leaf() can be defined in arch code without + * knowing the kernel config. + * + * Currently we only need pmd/pud versions, because the largest leaf Linux + * supports so far is pud. + * + * Defining here also means that in arch's pgtable headers these macros + * cannot be used, pxx_leaf()s need to be used instead, because this file + * will not be included in arch's pgtable headers. 
+ */ +#ifdef CONFIG_PGTABLE_HAS_PMD_LEAVES +#define pmd_is_leaf(x) pmd_leaf(x) +#else +#define pmd_is_leaf(x) false +#endif + +#ifdef CONFIG_PGTABLE_HAS_PUD_LEAVES +#define pud_is_leaf(x) pud_leaf(x) +#else +#define pud_is_leaf(x) false +#endif + #ifndef pgd_leaf_size #define pgd_leaf_size(x) (1ULL << PGDIR_SHIFT) #endif diff --git a/mm/hmm.c b/mm/hmm.c index 7e0229ae4a5a..8d985bbbfee9 100644 --- a/mm/hmm.c +++ b/mm/hmm.c @@ -351,7 +351,7 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp, return hmm_pfns_fill(start, end, range, HMM_PFN_ERROR); } - if (pmd_devmap(pmd) || pmd_trans_huge(pmd)) { + if (pmd_is_leaf(pmd)) { /* * No need to take pmd_lock here, even if some other thread * is splitting the huge pmd we will get that event through @@ -362,7 +362,7 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp, * values. */ pmd = pmdp_get_lockless(pmdp); - if (!pmd_devmap(pmd) && !pmd_trans_huge(pmd)) + if (!pmd_is_leaf(pmd)) goto again; return hmm_vma_handle_pmd(walk, addr, end, hmm_pfns, pmd); diff --git a/mm/huge_mapping_pmd.c b/mm/huge_mapping_pmd.c index 7b85e2a564d6..d30c60685f66 100644 --- a/mm/huge_mapping_pmd.c +++ b/mm/huge_mapping_pmd.c @@ -60,8 +60,7 @@ spinlock_t *__pmd_trans_huge_lock(pmd_t *pmd, struct vm_area_struct *vma) spinlock_t *ptl; ptl = pmd_lock(vma->vm_mm, pmd); - if (likely(is_swap_pmd(*pmd) || pmd_trans_huge(*pmd) || - pmd_devmap(*pmd))) + if (likely(is_swap_pmd(*pmd) || pmd_is_leaf(*pmd))) return ptl; spin_unlock(ptl); return NULL; @@ -627,8 +626,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd, VM_BUG_ON(haddr & ~HPAGE_PMD_MASK); VM_BUG_ON_VMA(vma->vm_start > haddr, vma); VM_BUG_ON_VMA(vma->vm_end < haddr + HPAGE_PMD_SIZE, vma); - VM_BUG_ON(!is_pmd_migration_entry(*pmd) && !pmd_trans_huge(*pmd) && - !pmd_devmap(*pmd)); + VM_BUG_ON(!is_pmd_migration_entry(*pmd) && !pmd_is_leaf(*pmd)); #ifdef CONFIG_TRANSPARENT_HUGEPAGE count_vm_event(THP_SPLIT_PMD); @@ -845,8 +843,7 @@ void split_huge_pmd_locked(struct vm_area_struct *vma, 
unsigned long address, * require a folio to check the PMD against. Otherwise, there * is a risk of replacing the wrong folio. */ - if (pmd_trans_huge(*pmd) || pmd_devmap(*pmd) || - is_pmd_migration_entry(*pmd)) { + if (pmd_is_leaf(*pmd) || is_pmd_migration_entry(*pmd)) { if (folio && folio != pmd_folio(*pmd)) return; __split_huge_pmd_locked(vma, pmd, address, freeze); diff --git a/mm/huge_mapping_pud.c b/mm/huge_mapping_pud.c index c3a6bffe2871..58871dd74df2 100644 --- a/mm/huge_mapping_pud.c +++ b/mm/huge_mapping_pud.c @@ -57,7 +57,7 @@ spinlock_t *__pud_trans_huge_lock(pud_t *pud, struct vm_area_struct *vma) spinlock_t *ptl; ptl = pud_lock(vma->vm_mm, pud); - if (likely(pud_trans_huge(*pud) || pud_devmap(*pud))) + if (likely(pud_is_leaf(*pud))) return ptl; spin_unlock(ptl); return NULL; @@ -90,7 +90,7 @@ int copy_huge_pud(struct mm_struct *dst_mm, struct mm_struct *src_mm, ret = -EAGAIN; pud = *src_pud; - if (unlikely(!pud_trans_huge(pud) && !pud_devmap(pud))) + if (unlikely(!pud_leaf(pud))) goto out_unlock; /* @@ -225,7 +225,7 @@ void __split_huge_pud(struct vm_area_struct *vma, pud_t *pud, (address & HPAGE_PUD_MASK) + HPAGE_PUD_SIZE); mmu_notifier_invalidate_range_start(&range); ptl = pud_lock(vma->vm_mm, pud); - if (unlikely(!pud_trans_huge(*pud) && !pud_devmap(*pud))) + if (unlikely(!pud_is_leaf(*pud))) goto out; __split_huge_pud_locked(vma, pud, range.start); diff --git a/mm/mapping_dirty_helpers.c b/mm/mapping_dirty_helpers.c index 2f8829b3541a..a9ea767d2d73 100644 --- a/mm/mapping_dirty_helpers.c +++ b/mm/mapping_dirty_helpers.c @@ -129,7 +129,7 @@ static int wp_clean_pmd_entry(pmd_t *pmd, unsigned long addr, unsigned long end, pmd_t pmdval = pmdp_get_lockless(pmd); /* Do not split a huge pmd, present or migrated */ - if (pmd_trans_huge(pmdval) || pmd_devmap(pmdval)) { + if (pmd_is_leaf(pmdval)) { WARN_ON(pmd_write(pmdval) || pmd_dirty(pmdval)); walk->action = ACTION_CONTINUE; } @@ -152,7 +152,7 @@ static int wp_clean_pud_entry(pud_t *pud, unsigned long 
addr, unsigned long end,
 	pud_t pudval = READ_ONCE(*pud);
 
 	/* Do not split a huge pud */
-	if (pud_trans_huge(pudval) || pud_devmap(pudval)) {
+	if (pud_is_leaf(pudval)) {
 		WARN_ON(pud_write(pudval) || pud_dirty(pudval));
 		walk->action = ACTION_CONTINUE;
 	}
diff --git a/mm/memory.c b/mm/memory.c
index 126ee0903c79..6dc92c514bb7 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1235,8 +1235,7 @@ copy_pmd_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma,
 	src_pmd = pmd_offset(src_pud, addr);
 	do {
 		next = pmd_addr_end(addr, end);
-		if (is_swap_pmd(*src_pmd) || pmd_trans_huge(*src_pmd)
-			|| pmd_devmap(*src_pmd)) {
+		if (is_swap_pmd(*src_pmd) || pmd_is_leaf(*src_pmd)) {
 			int err;
 			VM_BUG_ON_VMA(next-addr != HPAGE_PMD_SIZE, src_vma);
 			err = copy_huge_pmd(dst_mm, src_mm, dst_pmd, src_pmd,
@@ -1272,7 +1271,7 @@ copy_pud_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma,
 	src_pud = pud_offset(src_p4d, addr);
 	do {
 		next = pud_addr_end(addr, end);
-		if (pud_trans_huge(*src_pud) || pud_devmap(*src_pud)) {
+		if (pud_is_leaf(*src_pud)) {
 			int err;
 
 			VM_BUG_ON_VMA(next-addr != HPAGE_PUD_SIZE, src_vma);
@@ -1710,7 +1709,7 @@ static inline unsigned long zap_pmd_range(struct mmu_gather *tlb,
 	pmd = pmd_offset(pud, addr);
 	do {
 		next = pmd_addr_end(addr, end);
-		if (is_swap_pmd(*pmd) || pmd_trans_huge(*pmd) || pmd_devmap(*pmd)) {
+		if (is_swap_pmd(*pmd) || pmd_is_leaf(*pmd)) {
 			if (next - addr != HPAGE_PMD_SIZE)
 				__split_huge_pmd(vma, pmd, addr, false, NULL);
 			else if (zap_huge_pmd(tlb, vma, pmd, addr)) {
@@ -1752,7 +1751,7 @@ static inline unsigned long zap_pud_range(struct mmu_gather *tlb,
 	pud = pud_offset(p4d, addr);
 	do {
 		next = pud_addr_end(addr, end);
-		if (pud_trans_huge(*pud) || pud_devmap(*pud)) {
+		if (pud_is_leaf(*pud)) {
 			if (next - addr != HPAGE_PUD_SIZE) {
 				mmap_assert_locked(tlb->mm);
 				split_huge_pud(vma, pud, addr);
@@ -5605,8 +5604,7 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
 		pud_t orig_pud = *vmf.pud;
 
 		barrier();
-		if (pud_trans_huge(orig_pud) || pud_devmap(orig_pud)) {
-
+		if (pud_is_leaf(orig_pud)) {
 			/*
 			 * TODO once we support anonymous PUDs: NUMA case and
 			 * FAULT_FLAG_UNSHARE handling.
@@ -5646,7 +5644,7 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
 				pmd_migration_entry_wait(mm, vmf.pmd);
 			return 0;
 		}
-		if (pmd_trans_huge(vmf.orig_pmd) || pmd_devmap(vmf.orig_pmd)) {
+		if (pmd_is_leaf(vmf.orig_pmd)) {
 			if (pmd_protnone(vmf.orig_pmd) && vma_is_accessible(vma))
 				return do_huge_pmd_numa_page(&vmf);
 
diff --git a/mm/migrate_device.c b/mm/migrate_device.c
index 6d66dc1c6ffa..1fbeee9619c8 100644
--- a/mm/migrate_device.c
+++ b/mm/migrate_device.c
@@ -596,7 +596,7 @@ static void migrate_vma_insert_page(struct migrate_vma *migrate,
 	pmdp = pmd_alloc(mm, pudp, addr);
 	if (!pmdp)
 		goto abort;
-	if (pmd_trans_huge(*pmdp) || pmd_devmap(*pmdp))
+	if (pmd_leaf(*pmdp))
 		goto abort;
 	if (pte_alloc(mm, pmdp))
 		goto abort;
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 694f13b83864..ddfee216a02b 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -381,7 +381,7 @@ static inline long change_pmd_range(struct mmu_gather *tlb,
 			goto next;
 
 		_pmd = pmdp_get_lockless(pmd);
-		if (is_swap_pmd(_pmd) || pmd_trans_huge(_pmd) || pmd_devmap(_pmd)) {
+		if (is_swap_pmd(_pmd) || pmd_is_leaf(_pmd)) {
 			if ((next - addr != HPAGE_PMD_SIZE) ||
 			    pgtable_split_needed(vma, cp_flags)) {
 				__split_huge_pmd(vma, pmd, addr, false, NULL);
@@ -452,7 +452,7 @@ static inline long change_pud_range(struct mmu_gather *tlb,
 			mmu_notifier_invalidate_range_start(&range);
 		}
 
-		if (pud_leaf(pud)) {
+		if (pud_is_leaf(pud)) {
 			if ((next - addr != PUD_SIZE) ||
 			    pgtable_split_needed(vma, cp_flags)) {
 				__split_huge_pud(vma, pudp, addr);
diff --git a/mm/mremap.c b/mm/mremap.c
index e7ae140fc640..f5c9884ea1f8 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -587,7 +587,7 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
 		new_pud = alloc_new_pud(vma->vm_mm, vma, new_addr);
 		if (!new_pud)
 			break;
-		if (pud_trans_huge(*old_pud) || pud_devmap(*old_pud)) {
+		if (pud_is_leaf(*old_pud)) {
 			if (extent == HPAGE_PUD_SIZE) {
 				move_pgt_entry(HPAGE_PUD, vma, old_addr, new_addr,
 					       old_pud, new_pud, need_rmap_locks);
@@ -609,8 +609,7 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
 		if (!new_pmd)
 			break;
 again:
-		if (is_swap_pmd(*old_pmd) || pmd_trans_huge(*old_pmd) ||
-		    pmd_devmap(*old_pmd)) {
+		if (is_swap_pmd(*old_pmd) || pmd_is_leaf(*old_pmd)) {
 			if (extent == HPAGE_PMD_SIZE &&
 			    move_pgt_entry(HPAGE_PMD, vma, old_addr, new_addr,
 					   old_pmd, new_pmd, need_rmap_locks))
diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
index ae5cc42aa208..891bea8062d2 100644
--- a/mm/page_vma_mapped.c
+++ b/mm/page_vma_mapped.c
@@ -235,8 +235,7 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
 		 */
 		pmde = pmdp_get_lockless(pvmw->pmd);
 
-		if (pmd_trans_huge(pmde) || is_pmd_migration_entry(pmde) ||
-		    (pmd_present(pmde) && pmd_devmap(pmde))) {
+		if (pmd_is_leaf(pmde) || is_pmd_migration_entry(pmde)) {
 			pvmw->ptl = pmd_lock(mm, pvmw->pmd);
 			pmde = *pvmw->pmd;
 			if (!pmd_present(pmde)) {
@@ -251,7 +250,7 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
 					return not_found(pvmw);
 				return true;
 			}
-			if (likely(pmd_trans_huge(pmde) || pmd_devmap(pmde))) {
+			if (likely(pmd_is_leaf(pmde))) {
 				if (pvmw->flags & PVMW_MIGRATION)
 					return not_found(pvmw);
 				if (!check_pmd(pmd_pfn(pmde), pvmw))
diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
index e9fc3f6774a6..c7b7a803f4ad 100644
--- a/mm/pgtable-generic.c
+++ b/mm/pgtable-generic.c
@@ -139,8 +139,7 @@ pmd_t pmdp_huge_clear_flush(struct vm_area_struct *vma, unsigned long address,
 {
 	pmd_t pmd;
 	VM_BUG_ON(address & ~HPAGE_PMD_MASK);
-	VM_BUG_ON(pmd_present(*pmdp) && !pmd_trans_huge(*pmdp) &&
-			   !pmd_devmap(*pmdp));
+	VM_BUG_ON(pmd_present(*pmdp) && !pmd_leaf(*pmdp));
 	pmd = pmdp_huge_get_and_clear(vma->vm_mm, address, pmdp);
 	flush_pmd_tlb_range(vma, address, address + HPAGE_PMD_SIZE);
 	return pmd;
@@ -247,7 +246,7 @@ pud_t pudp_huge_clear_flush(struct vm_area_struct *vma, unsigned long address,
 	pud_t pud;
 
 	VM_BUG_ON(address & ~HPAGE_PUD_MASK);
-	VM_BUG_ON(!pud_trans_huge(*pudp) && !pud_devmap(*pudp));
+	VM_BUG_ON(!pud_leaf(*pudp));
 	pud = pudp_huge_get_and_clear(vma->vm_mm, address, pudp);
 	flush_pud_tlb_range(vma, address, address + HPAGE_PUD_SIZE);
 	return pud;
@@ -293,7 +292,7 @@ pte_t *__pte_offset_map(pmd_t *pmd, unsigned long addr, pmd_t *pmdvalp)
 		*pmdvalp = pmdval;
 	if (unlikely(pmd_none(pmdval) || is_pmd_migration_entry(pmdval)))
 		goto nomap;
-	if (unlikely(pmd_trans_huge(pmdval) || pmd_devmap(pmdval)))
+	if (unlikely(pmd_leaf(pmdval)))
 		goto nomap;
 	if (unlikely(pmd_bad(pmdval))) {
 		pmd_clear_bad(pmd);