From patchwork Thu Oct 10 18:23:02 2024
Date: Thu, 10 Oct 2024 11:23:02 -0700
Message-ID: <20241010182427.1434605-1-seanjc@google.com>
Subject: [PATCH v13 00/85] KVM: Stop grabbing references to PFNMAP'd pages
From: Sean Christopherson
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
    Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
    Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
    Claudio Imbrenda, Sean Christopherson
Cc: kvm@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
    kvmarm@lists.linux.dev, loongarch@lists.linux.dev,
    linux-mips@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
    kvm-riscv@lists.infradead.org, linux-riscv@lists.infradead.org,
    linux-kernel@vger.kernel.org, Alex Bennée, Yan Zhao, David Matlack,
    David Stevens, Andrew Jones
TL;DR: Eliminate KVM's long-standing (and heinous) behavior of essentially
guessing which pfns are refcounted pages (see kvm_pfn_to_refcounted_page()).

Getting there requires "fixing" arch code that isn't obviously broken.
Specifically, to get rid of kvm_pfn_to_refcounted_page(), KVM needs to stop
marking pages/folios dirty/accessed based solely on the pfn that's stored
in KVM's stage-2 page tables.  Instead of tracking which SPTEs correspond
to refcounted pages, simply remove all of the code that operates on
"struct page" based on the pfn in stage-2 PTEs.  This is the back ~40-50%
of the series.

For x86 in particular, which sets accessed/dirty status when that info
would otherwise be "lost", e.g. when SPTEs are zapped or KVM clears the
dirty flag in a SPTE, foregoing the updates provides very measurable
performance improvements for related operations, e.g. when clearing dirty
bits as part of dirty logging, and when zapping SPTEs to reconstitute huge
pages when disabling dirty logging.

The front ~40% of the series is cleanups and prep work, and most of it is
x86 focused (purely because x86 added the most special cases, *sigh*).
E.g. several of the inputs to hva_to_pfn() (and its myriad wrappers) can
be removed by cleaning up and deduplicating x86 code.

v13:
 - Rebased onto v6.12-rc2.
 - Collected reviews. [Alex and others]
 - Fix a transient bug in arm64 and RISC-V where KVM would leak a page
   refcount. [Oliver]
 - Fix a dangling comment. [Alex]
 - Drop kvm_lookup_pfn(), as the x86 code that "needed" it was stupid and
   is (was?) eliminated in v6.12.
 - Drop check_user_page_hwpoison(). [Paolo]
 - Drop the arm64 MTE fixes that went into 6.12.
 - Slightly redo the guest_memfd interaction to account for 6.12 changes.
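As an illustration of the ordering change (dirty at SPTE creation instead of
at zap time), here's a toy userspace sketch.  This is NOT KVM's actual API:
toy_page, toy_spte, old_zap_spte() and new_make_spte() are all hypothetical
names standing in for struct page/folio and the real SPTE paths.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins: real KVM operates on struct page/folio and real pfns. */
struct toy_page {
	int refcount;
	bool dirty;
};

struct toy_spte {
	unsigned long pfn;
	bool writable;
	struct toy_page *page;	/* NULL if the pfn has no struct page */
};

/*
 * Old scheme (sketch): at zap time, guess whether the pfn is backed by a
 * refcounted page (what kvm_pfn_to_refcounted_page() did) and transfer
 * dirty state to it then, because the SPTE's info is about to be lost.
 */
static void old_zap_spte(struct toy_spte *spte)
{
	struct toy_page *page = spte->page;	/* the "guess" */

	if (page && spte->writable)
		page->dirty = true;
	spte->pfn = 0;
}

/*
 * New scheme (sketch): mark the page dirty at the origin, when a writable
 * SPTE is installed; zap paths never touch struct page at all.
 */
static void new_make_spte(struct toy_spte *spte, unsigned long pfn,
			  struct toy_page *page, bool writable)
{
	spte->pfn = pfn;
	spte->page = page;
	spte->writable = writable;
	if (page && writable)
		page->dirty = true;	/* dirty at creation, not teardown */
}

static void new_zap_spte(struct toy_spte *spte)
{
	spte->pfn = 0;	/* no struct page deref, no refcount guessing */
}
```

The observable dirty state ends up the same either way; the point is that
teardown paths (zapping, clearing dirty bits) no longer need to guess
whether a pfn has an associated "struct page" at all.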
v12: https://lore.kernel.org/all/20240726235234.228822-1-seanjc@google.com

David Stevens (3):
  KVM: Replace "async" pointer in gfn=>pfn with "no_wait" and error code
  KVM: Introduce kvm_follow_pfn() to eventually replace "gfn_to_pfn" APIs
  KVM: Migrate kvm_vcpu_map() to kvm_follow_pfn()

Sean Christopherson (82):
  KVM: Drop KVM_ERR_PTR_BAD_PAGE and instead return NULL to indicate an error
  KVM: Allow calling kvm_release_page_{clean,dirty}() on a NULL page pointer
  KVM: Add kvm_release_page_unused() API to put pages that KVM never consumes
  KVM: x86/mmu: Skip the "try unsync" path iff the old SPTE was a leaf SPTE
  KVM: x86/mmu: Don't overwrite shadow-present MMU SPTEs when prefaulting
  KVM: x86/mmu: Invert @can_unsync and renamed to @synchronizing
  KVM: x86/mmu: Mark new SPTE as Accessed when synchronizing existing SPTE
  KVM: x86/mmu: Mark folio dirty when creating SPTE, not when zapping/modifying
  KVM: x86/mmu: Mark page/folio accessed only when zapping leaf SPTEs
  KVM: x86/mmu: Use gfn_to_page_many_atomic() when prefetching indirect PTEs
  KVM: Rename gfn_to_page_many_atomic() to kvm_prefetch_pages()
  KVM: Drop @atomic param from gfn=>pfn and hva=>pfn APIs
  KVM: Annotate that all paths in hva_to_pfn() might sleep
  KVM: Return ERR_SIGPENDING from hva_to_pfn() if GUP returns -EGAIN
  KVM: Drop extra GUP (via check_user_page_hwpoison()) to detect poisoned page
  KVM: x86/mmu: Drop kvm_page_fault.hva, i.e. don't track intermediate hva
  KVM: Drop unused "hva" pointer from __gfn_to_pfn_memslot()
  KVM: Remove pointless sanity check on @map param to kvm_vcpu_(un)map()
  KVM: Explicitly initialize all fields at the start of kvm_vcpu_map()
  KVM: Use NULL for struct page pointer to indicate mremapped memory
  KVM: nVMX: Rely on kvm_vcpu_unmap() to track validity of eVMCS mapping
  KVM: nVMX: Drop pointless msr_bitmap_map field from struct nested_vmx
  KVM: nVMX: Add helper to put (unmap) vmcs12 pages
  KVM: Use plain "struct page" pointer instead of single-entry array
  KVM: Provide refcounted page as output field in struct kvm_follow_pfn
  KVM: Move kvm_{set,release}_page_{clean,dirty}() helpers up in kvm_main.c
  KVM: pfncache: Precisely track refcounted pages
  KVM: Pin (as in FOLL_PIN) pages during kvm_vcpu_map()
  KVM: nVMX: Mark vmcs12's APIC access page dirty when unmapping
  KVM: Pass in write/dirty to kvm_vcpu_map(), not kvm_vcpu_unmap()
  KVM: Get writable mapping for __kvm_vcpu_map() only when necessary
  KVM: Disallow direct access (w/o mmu_notifier) to unpinned pfn by default
  KVM: x86: Don't fault-in APIC access page during initial allocation
  KVM: x86/mmu: Add "mmu" prefix fault-in helpers to free up generic names
  KVM: x86/mmu: Put direct prefetched pages via kvm_release_page_clean()
  KVM: x86/mmu: Add common helper to handle prefetching SPTEs
  KVM: x86/mmu: Add helper to "finish" handling a guest page fault
  KVM: x86/mmu: Mark pages/folios dirty at the origin of make_spte()
  KVM: Move declarations of memslot accessors up in kvm_host.h
  KVM: Add kvm_faultin_pfn() to specifically service guest page faults
  KVM: x86/mmu: Convert page fault paths to kvm_faultin_pfn()
  KVM: guest_memfd: Pass index, not gfn, to __kvm_gmem_get_pfn()
  KVM: guest_memfd: Provide "struct page" as output from kvm_gmem_get_pfn()
  KVM: x86/mmu: Put refcounted pages instead of blindly releasing pfns
  KVM: x86/mmu: Don't mark unused faultin pages as accessed
  KVM: Move x86's API to release a faultin page to common KVM
  KVM: VMX: Hold mmu_lock until page is released when updating APIC access page
  KVM: VMX: Use __kvm_faultin_page() to get APIC access page/pfn
  KVM: PPC: e500: Mark "struct page" dirty in kvmppc_e500_shadow_map()
  KVM: PPC: e500: Mark "struct page" pfn accessed before dropping mmu_lock
  KVM: PPC: e500: Use __kvm_faultin_pfn() to handle page faults
  KVM: arm64: Mark "struct page" pfns accessed/dirty before dropping mmu_lock
  KVM: arm64: Use __kvm_faultin_pfn() to handle memory aborts
  KVM: RISC-V: Mark "struct page" pfns dirty iff a stage-2 PTE is installed
  KVM: RISC-V: Mark "struct page" pfns accessed before dropping mmu_lock
  KVM: RISC-V: Use kvm_faultin_pfn() when mapping pfns into the guest
  KVM: PPC: Use __kvm_faultin_pfn() to handle page faults on Book3s HV
  KVM: PPC: Use __kvm_faultin_pfn() to handle page faults on Book3s Radix
  KVM: PPC: Drop unused @kvm_ro param from kvmppc_book3s_instantiate_page()
  KVM: PPC: Book3S: Mark "struct page" pfns dirty/accessed after installing PTE
  KVM: PPC: Use kvm_faultin_pfn() to handle page faults on Book3s PR
  KVM: LoongArch: Mark "struct page" pfns dirty only in "slow" page fault path
  KVM: LoongArch: Mark "struct page" pfns accessed only in "slow" page fault path
  KVM: LoongArch: Mark "struct page" pfn accessed before dropping mmu_lock
  KVM: LoongArch: Use kvm_faultin_pfn() to map pfns into the guest
  KVM: MIPS: Mark "struct page" pfns dirty only in "slow" page fault path
  KVM: MIPS: Mark "struct page" pfns accessed only in "slow" page fault path
  KVM: MIPS: Mark "struct page" pfns accessed prior to dropping mmu_lock
  KVM: MIPS: Use kvm_faultin_pfn() to map pfns into the guest
  KVM: PPC: Remove extra get_page() to fix page refcount leak
  KVM: PPC: Use kvm_vcpu_map() to map guest memory to patch dcbz instructions
  KVM: Convert gfn_to_page() to use kvm_follow_pfn()
  KVM: Add support for read-only usage of gfn_to_page()
  KVM: arm64: Use __gfn_to_page() when copying MTE tags to/from userspace
  KVM: PPC: Explicitly require struct page memory for Ultravisor sharing
  KVM: Drop gfn_to_pfn() APIs now that all users are gone
  KVM: s390: Use kvm_release_page_dirty() to unpin "struct page" memory
  KVM: Make kvm_follow_pfn.refcounted_page a required field
  KVM: x86/mmu: Don't mark "struct page" accessed when zapping SPTEs
  KVM: arm64: Don't mark "struct page" accessed when making SPTE young
  KVM: Drop APIs that manipulate "struct page" via pfns
  KVM: Don't grab reference on VM_MIXEDMAP pfns that have a "struct page"

 Documentation/virt/kvm/locking.rst     |  80 ++--
 arch/arm64/include/asm/kvm_pgtable.h   |   4 +-
 arch/arm64/kvm/guest.c                 |  21 +-
 arch/arm64/kvm/hyp/pgtable.c           |   7 +-
 arch/arm64/kvm/mmu.c                   |  21 +-
 arch/loongarch/kvm/mmu.c               |  40 +-
 arch/mips/kvm/mmu.c                    |  26 +-
 arch/powerpc/include/asm/kvm_book3s.h  |   4 +-
 arch/powerpc/kvm/book3s.c              |   7 +-
 arch/powerpc/kvm/book3s_32_mmu_host.c  |   7 +-
 arch/powerpc/kvm/book3s_64_mmu_host.c  |  12 +-
 arch/powerpc/kvm/book3s_64_mmu_hv.c    |  25 +-
 arch/powerpc/kvm/book3s_64_mmu_radix.c |  35 +-
 arch/powerpc/kvm/book3s_hv_nested.c    |   4 +-
 arch/powerpc/kvm/book3s_hv_uvmem.c     |  25 +-
 arch/powerpc/kvm/book3s_pr.c           |  14 +-
 arch/powerpc/kvm/book3s_xive_native.c  |   2 +-
 arch/powerpc/kvm/e500_mmu_host.c       |  19 +-
 arch/riscv/kvm/mmu.c                   |   9 +-
 arch/s390/kvm/vsie.c                   |   4 +-
 arch/x86/kvm/lapic.c                   |  12 -
 arch/x86/kvm/mmu/mmu.c                 | 181 ++++----
 arch/x86/kvm/mmu/mmu_internal.h        |   7 +-
 arch/x86/kvm/mmu/paging_tmpl.h         |  31 +-
 arch/x86/kvm/mmu/spte.c                |  31 +-
 arch/x86/kvm/mmu/spte.h                |   2 +-
 arch/x86/kvm/mmu/tdp_mmu.c             |  23 +-
 arch/x86/kvm/svm/nested.c              |   4 +-
 arch/x86/kvm/svm/sev.c                 |  12 +-
 arch/x86/kvm/svm/svm.c                 |   8 +-
 arch/x86/kvm/vmx/nested.c              |  42 +-
 arch/x86/kvm/vmx/vmx.c                 |  28 +-
 arch/x86/kvm/vmx/vmx.h                 |   2 -
 include/linux/kvm_host.h               | 123 +++--
 virt/kvm/guest_memfd.c                 |  28 +-
 virt/kvm/kvm_main.c                    | 602 +++++++++----------------
 virt/kvm/kvm_mm.h                      |  36 +-
 virt/kvm/pfncache.c                    |  20 +-
 38 files changed, 689 insertions(+), 869 deletions(-)

base-commit: 8cf0b93919e13d1e8d4466eb4080a4c4d9d66d7b
Tested-by: Dmitry Osipenko