From patchwork Thu Oct 10 18:23:33 2024
X-Patchwork-Id: 1995818
From: Sean Christopherson
Date: Thu, 10 Oct 2024 11:23:33 -0700
Subject: [PATCH v13 31/85] KVM: Pin (as in FOLL_PIN) pages during kvm_vcpu_map()
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao, Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank, Claudio Imbrenda, Sean Christopherson
Cc: kvm@vger.kernel.org, linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev, loongarch@lists.linux.dev, linux-mips@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, kvm-riscv@lists.infradead.org, linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org, Alex Bennée, Yan Zhao, David Matlack, David Stevens, Andrew Jones
Message-ID: <20241010182427.1434605-32-seanjc@google.com>
In-Reply-To: <20241010182427.1434605-1-seanjc@google.com>
References: <20241010182427.1434605-1-seanjc@google.com>

Pin, as in FOLL_PIN, pages when mapping them for direct access by KVM.
As per Documentation/core-api/pin_user_pages.rst, writing to a page that
was gotten via FOLL_GET is explicitly disallowed.

  Correct (uses FOLL_PIN calls):
      pin_user_pages()
      write to the data within the pages
      unpin_user_pages()

  INCORRECT (uses FOLL_GET calls):
      get_user_pages()
      write to the data within the pages
      put_page()

Unfortunately, FOLL_PIN is a "private" flag, and so kvm_follow_pfn must
use a one-off bool instead of being able to piggyback the "flags" field.

Link: https://lwn.net/Articles/930667
Link: https://lore.kernel.org/all/cover.1683044162.git.lstoakes@gmail.com
Tested-by: Alex Bennée
Signed-off-by: Sean Christopherson
---
 include/linux/kvm_host.h |  2 +-
 virt/kvm/kvm_main.c      | 54 +++++++++++++++++++++++++++++-----------
 virt/kvm/kvm_mm.h        |  7 ++++++
 3 files changed, 47 insertions(+), 16 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 02ab3a657aa6..8739b905d85b 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -280,7 +280,7 @@ struct kvm_host_map {
 	 * can be used as guest memory but they are not managed by host
 	 * kernel).
 	 */
-	struct page *refcounted_page;
+	struct page *pinned_page;
 	struct page *page;
 	void *hva;
 	kvm_pfn_t pfn;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index b1c1b7e4f33a..40a59526d466 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2814,9 +2814,12 @@ static kvm_pfn_t kvm_resolve_pfn(struct kvm_follow_pfn *kfp, struct page *page,
 	 */
 	if (map) {
 		pfn = map->pfn;
-		page = kvm_pfn_to_refcounted_page(pfn);
-		if (page && !get_page_unless_zero(page))
-			return KVM_PFN_ERR_FAULT;
+
+		if (!kfp->pin) {
+			page = kvm_pfn_to_refcounted_page(pfn);
+			if (page && !get_page_unless_zero(page))
+				return KVM_PFN_ERR_FAULT;
+		}
 	} else {
 		pfn = page_to_pfn(page);
 	}
@@ -2834,16 +2837,24 @@ static kvm_pfn_t kvm_resolve_pfn(struct kvm_follow_pfn *kfp, struct page *page,
 static bool hva_to_pfn_fast(struct kvm_follow_pfn *kfp, kvm_pfn_t *pfn)
 {
 	struct page *page;
+	bool r;
 
 	/*
-	 * Fast pin a writable pfn only if it is a write fault request
-	 * or the caller allows to map a writable pfn for a read fault
-	 * request.
+	 * Try the fast-only path when the caller wants to pin/get the page for
+	 * writing. If the caller only wants to read the page, KVM must go
+	 * down the full, slow path in order to avoid racing an operation that
+	 * breaks Copy-on-Write (CoW), e.g. so that KVM doesn't end up pointing
+	 * at the old, read-only page while mm/ points at a new, writable page.
 	 */
 	if (!((kfp->flags & FOLL_WRITE) || kfp->map_writable))
 		return false;
 
-	if (get_user_page_fast_only(kfp->hva, FOLL_WRITE, &page)) {
+	if (kfp->pin)
+		r = pin_user_pages_fast(kfp->hva, 1, FOLL_WRITE, &page) == 1;
+	else
+		r = get_user_page_fast_only(kfp->hva, FOLL_WRITE, &page);
+
+	if (r) {
 		*pfn = kvm_resolve_pfn(kfp, page, NULL, true);
 		return true;
 	}
@@ -2872,10 +2883,21 @@ static int hva_to_pfn_slow(struct kvm_follow_pfn *kfp, kvm_pfn_t *pfn)
 	struct page *page, *wpage;
 	int npages;
 
-	npages = get_user_pages_unlocked(kfp->hva, 1, &page, flags);
+	if (kfp->pin)
+		npages = pin_user_pages_unlocked(kfp->hva, 1, &page, flags);
+	else
+		npages = get_user_pages_unlocked(kfp->hva, 1, &page, flags);
 	if (npages != 1)
 		return npages;
 
+	/*
+	 * Pinning is mutually exclusive with opportunistically mapping a read
+	 * fault as writable, as KVM should never pin pages when mapping memory
+	 * into the guest (pinning is only for direct accesses from KVM).
+	 */
+	if (WARN_ON_ONCE(kfp->map_writable && kfp->pin))
+		goto out;
+
 	/* map read fault as writable if possible */
 	if (!(flags & FOLL_WRITE) && kfp->map_writable &&
 	    get_user_page_fast_only(kfp->hva, FOLL_WRITE, &wpage)) {
@@ -2884,6 +2906,7 @@ static int hva_to_pfn_slow(struct kvm_follow_pfn *kfp, kvm_pfn_t *pfn)
 		flags |= FOLL_WRITE;
 	}
 
+out:
 	*pfn = kvm_resolve_pfn(kfp, page, NULL, flags & FOLL_WRITE);
 	return npages;
 }
@@ -3099,10 +3122,11 @@ int kvm_vcpu_map(struct kvm_vcpu *vcpu, gfn_t gfn, struct kvm_host_map *map)
 		.slot = gfn_to_memslot(vcpu->kvm, gfn),
 		.gfn = gfn,
 		.flags = FOLL_WRITE,
-		.refcounted_page = &map->refcounted_page,
+		.refcounted_page = &map->pinned_page,
+		.pin = true,
 	};
 
-	map->refcounted_page = NULL;
+	map->pinned_page = NULL;
 	map->page = NULL;
 	map->hva = NULL;
 	map->gfn = gfn;
@@ -3139,16 +3163,16 @@ void kvm_vcpu_unmap(struct kvm_vcpu *vcpu, struct kvm_host_map *map, bool dirty)
 	if (dirty)
 		kvm_vcpu_mark_page_dirty(vcpu, map->gfn);
 
-	if (map->refcounted_page) {
+	if (map->pinned_page) {
 		if (dirty)
-			kvm_release_page_dirty(map->refcounted_page);
-		else
-			kvm_release_page_clean(map->refcounted_page);
+			kvm_set_page_dirty(map->pinned_page);
+		kvm_set_page_accessed(map->pinned_page);
+		unpin_user_page(map->pinned_page);
 	}
 
 	map->hva = NULL;
 	map->page = NULL;
-	map->refcounted_page = NULL;
+	map->pinned_page = NULL;
 }
 EXPORT_SYMBOL_GPL(kvm_vcpu_unmap);
 
diff --git a/virt/kvm/kvm_mm.h b/virt/kvm/kvm_mm.h
index d3ac1ba8ba66..acef3f5c582a 100644
--- a/virt/kvm/kvm_mm.h
+++ b/virt/kvm/kvm_mm.h
@@ -30,6 +30,13 @@ struct kvm_follow_pfn {
 	/* FOLL_* flags modifying lookup behavior, e.g. FOLL_WRITE. */
 	unsigned int flags;
 
+	/*
+	 * Pin the page (effectively FOLL_PIN, which is an mm/ internal flag).
+	 * The page *must* be pinned if KVM will write to the page via a kernel
+	 * mapping, e.g. via kmap(), mremap(), etc.
+	 */
+	bool pin;
+
 	/*
 	 * If non-NULL, try to get a writable mapping even for a read fault.
 	 * Set to true if a writable mapping was obtained.
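The pin-versus-get selection that this patch threads through hva_to_pfn_fast() and hva_to_pfn_slow() can be illustrated with a small userspace model. This is a sketch under stated assumptions, not kernel code: model_page, model_follow and the model_* helpers are hypothetical, the two counters merely stand in for mm/'s refcount and pin accounting, and the bool map_writable stands in for "the kvm_follow_pfn map_writable pointer is non-NULL". The model checks three rules from the diff: read-only lookups skip the fast path, exactly one reference of the requested kind is taken, and pinning is never combined with the opportunistic read-as-writable upgrade.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* HYPOTHETICAL stand-in for a struct page; these counters are NOT mm/ APIs. */
struct model_page {
	int refcount;	/* references taken via the FOLL_GET-style path */
	int pincount;	/* references taken via the FOLL_PIN-style path */
};

/* HYPOTHETICAL mirror of the kvm_follow_pfn fields that matter here. */
struct model_follow {
	bool write;		/* stands in for (flags & FOLL_WRITE) */
	bool map_writable;	/* stands in for "map_writable pointer is non-NULL" */
	bool pin;		/* the new one-off bool added by this patch */
};

/*
 * Mirrors the early return in hva_to_pfn_fast(): a lookup that only wants
 * to read the page goes down the slow path to avoid racing a CoW break.
 */
static bool model_fast_path_allowed(const struct model_follow *kfp)
{
	return kfp->write || kfp->map_writable;
}

/* Take exactly one reference, of the kind the caller asked for. */
static void model_acquire(const struct model_follow *kfp, struct model_page *pg)
{
	if (kfp->pin)
		pg->pincount++;	/* pin_user_pages_*() analogue */
	else
		pg->refcount++;	/* get_user_page*() analogue */
}

/*
 * Returns whether hva_to_pfn_slow() would attempt to map a read fault as
 * writable. Pin + map_writable is the case the kernel WARNs on and skips.
 */
static bool model_may_upgrade_to_writable(const struct model_follow *kfp)
{
	if (kfp->map_writable && kfp->pin)
		return false;
	return !kfp->write && kfp->map_writable;
}
```

In this model, kvm_vcpu_map() corresponds to write = true, pin = true and map_writable = false, so it takes the fast path, takes a pin rather than a plain reference, and never attempts the writable upgrade.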
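In the new kvm_vcpu_unmap(), the page's dirty and accessed state is recorded while the page is still pinned, and unpin_user_page() comes last. If the pin was the last reference, the page may be freed once it is dropped, so nothing may touch it afterwards. The userspace sketch below models that ordering only. demo_page, demo_host_map, demo_map() and demo_unmap() are hypothetical stand-ins that use booleans and a counter instead of real page flags and pin accounting. The dirty-log update done by kvm_vcpu_mark_page_dirty() is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* HYPOTHETICAL page model: the flags stand in for what kvm_set_page_dirty()
 * and kvm_set_page_accessed() record on a real struct page. */
struct demo_page {
	int pincount;
	bool dirty;
	bool accessed;
};

/* HYPOTHETICAL subset of struct kvm_host_map after this patch. */
struct demo_host_map {
	struct demo_page *pinned_page;
	void *hva;
};

/* Analogue of a successful kvm_vcpu_map(): pin the page and record it. */
static void demo_map(struct demo_host_map *map, struct demo_page *pg, void *hva)
{
	pg->pincount++;			/* pin_user_pages_*() analogue */
	map->pinned_page = pg;
	map->hva = hva;
}

/* Analogue of kvm_vcpu_unmap(): update page state first, unpin last. */
static void demo_unmap(struct demo_host_map *map, bool dirty)
{
	struct demo_page *pg = map->pinned_page;

	if (pg) {
		if (dirty)
			pg->dirty = true;	/* kvm_set_page_dirty() analogue */
		pg->accessed = true;		/* kvm_set_page_accessed() analogue */
		pg->pincount--;			/* unpin_user_page() analogue */
	}
	map->hva = NULL;
	map->pinned_page = NULL;
}
```

Note that, unlike the removed kvm_release_page_dirty()/kvm_release_page_clean() pair, the pinned page is marked accessed on every unmap and marked dirty only when the caller wrote to it.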