From patchwork Fri Jul 26 23:51:38 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Sean Christopherson
X-Patchwork-Id: 1965474
Date: Fri, 26 Jul 2024 16:51:38 -0700
In-Reply-To: <20240726235234.228822-1-seanjc@google.com>
Mime-Version: 1.0
References: <20240726235234.228822-1-seanjc@google.com>
X-Mailer: git-send-email 2.46.0.rc1.232.g9752f9e123-goog
Message-ID: <20240726235234.228822-30-seanjc@google.com>
Subject: [PATCH v12 29/84] KVM: Pin (as in FOLL_PIN) pages during kvm_vcpu_map()
From: Sean Christopherson
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
	Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
	Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
	Claudio Imbrenda, Sean Christopherson
Cc: kvm@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
	kvmarm@lists.linux.dev, loongarch@lists.linux.dev,
	linux-mips@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
	kvm-riscv@lists.infradead.org, linux-riscv@lists.infradead.org,
	linux-kernel@vger.kernel.org, David Matlack, David Stevens
Reply-To: Sean Christopherson

Pin, as in FOLL_PIN, pages when mapping them for direct access by KVM.
As per Documentation/core-api/pin_user_pages.rst, writing to a page that
was gotten via FOLL_GET is explicitly disallowed.

  Correct (uses FOLL_PIN calls):
      pin_user_pages()
      write to the data within the pages
      unpin_user_pages()

  INCORRECT (uses FOLL_GET calls):
      get_user_pages()
      write to the data within the pages
      put_page()

Unfortunately, FOLL_PIN is a "private" flag, and so kvm_follow_pfn must
use a one-off bool instead of being able to piggyback the "flags" field.

Link: https://lwn.net/Articles/930667
Link: https://lore.kernel.org/all/cover.1683044162.git.lstoakes@gmail.com
Signed-off-by: Sean Christopherson
---
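Reviewer note, not part of the changelog: for anyone who hasn't read
pin_user_pages.rst recently, the "Correct" pattern cited above boils down to
the sketch below.  This is purely illustrative and is not code from this
series; the helper name and its arguments are made up.

#include <linux/types.h>
#include <linux/errno.h>
#include <linux/mm.h>
#include <linux/highmem.h>

/* Hypothetical helper showing the FOLL_PIN pattern for kernel writes. */
static int demo_write_user_byte(unsigned long uaddr, u8 val)
{
	struct page *page;
	void *kaddr;
	int ret;

	/* FOLL_PIN under the hood; required because the kernel will write. */
	ret = pin_user_pages_fast(uaddr, 1, FOLL_WRITE, &page);
	if (ret != 1)
		return ret < 0 ? ret : -EFAULT;

	kaddr = kmap_local_page(page);
	*((u8 *)kaddr + offset_in_page(uaddr)) = val;
	kunmap_local(kaddr);

	/* Drop the pin and mark the page dirty in one shot. */
	unpin_user_pages_dirty_lock(&page, 1, true);
	return 0;
}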
 include/linux/kvm_host.h |  2 +-
 virt/kvm/kvm_main.c      | 54 +++++++++++++++++++++++++++++-----------
 virt/kvm/kvm_mm.h        |  7 ++++++
 3 files changed, 47 insertions(+), 16 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 8b5ac3305b05..3d4094ece479 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -280,7 +280,7 @@ struct kvm_host_map {
 	 * can be used as guest memory but they are not managed by host
 	 * kernel).
 	 */
-	struct page *refcounted_page;
+	struct page *pinned_page;
 	struct page *page;
 	void *hva;
 	kvm_pfn_t pfn;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 255cbed83b40..4a9b99c11355 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2824,9 +2824,12 @@ static kvm_pfn_t kvm_resolve_pfn(struct kvm_follow_pfn *kfp, struct page *page,
 	 */
 	if (pte) {
 		pfn = pte_pfn(*pte);
-		page = kvm_pfn_to_refcounted_page(pfn);
-		if (page && !get_page_unless_zero(page))
-			return KVM_PFN_ERR_FAULT;
+
+		if (!kfp->pin) {
+			page = kvm_pfn_to_refcounted_page(pfn);
+			if (page && !get_page_unless_zero(page))
+				return KVM_PFN_ERR_FAULT;
+		}
 	} else {
 		pfn = page_to_pfn(page);
 	}
@@ -2845,16 +2848,24 @@ static kvm_pfn_t kvm_resolve_pfn(struct kvm_follow_pfn *kfp, struct page *page,
 static bool hva_to_pfn_fast(struct kvm_follow_pfn *kfp, kvm_pfn_t *pfn)
 {
 	struct page *page;
+	bool r;
 
 	/*
-	 * Fast pin a writable pfn only if it is a write fault request
-	 * or the caller allows to map a writable pfn for a read fault
-	 * request.
+	 * Try the fast-only path when the caller wants to pin/get the page for
+	 * writing.  If the caller only wants to read the page, KVM must go
+	 * down the full, slow path in order to avoid racing an operation that
+	 * breaks Copy-on-Write (CoW), e.g. so that KVM doesn't end up pointing
+	 * at the old, read-only page while mm/ points at a new, writable page.
 	 */
 	if (!((kfp->flags & FOLL_WRITE) || kfp->map_writable))
 		return false;
 
-	if (get_user_page_fast_only(kfp->hva, FOLL_WRITE, &page)) {
+	if (kfp->pin)
+		r = pin_user_pages_fast(kfp->hva, 1, FOLL_WRITE, &page) == 1;
+	else
+		r = get_user_page_fast_only(kfp->hva, FOLL_WRITE, &page);
+
+	if (r) {
 		*pfn = kvm_resolve_pfn(kfp, page, NULL, true);
 		return true;
 	}
@@ -2883,10 +2894,21 @@ static int hva_to_pfn_slow(struct kvm_follow_pfn *kfp, kvm_pfn_t *pfn)
 	struct page *page, *wpage;
 	int npages;
 
-	npages = get_user_pages_unlocked(kfp->hva, 1, &page, flags);
+	if (kfp->pin)
+		npages = pin_user_pages_unlocked(kfp->hva, 1, &page, flags);
+	else
+		npages = get_user_pages_unlocked(kfp->hva, 1, &page, flags);
 	if (npages != 1)
 		return npages;
 
+	/*
+	 * Pinning is mutually exclusive with opportunistically mapping a read
+	 * fault as writable, as KVM should never pin pages when mapping memory
+	 * into the guest (pinning is only for direct accesses from KVM).
+	 */
+	if (WARN_ON_ONCE(kfp->map_writable && kfp->pin))
+		goto out;
+
 	/* map read fault as writable if possible */
 	if (!(flags & FOLL_WRITE) && kfp->map_writable &&
 	    get_user_page_fast_only(kfp->hva, FOLL_WRITE, &wpage)) {
@@ -2895,6 +2917,7 @@ static int hva_to_pfn_slow(struct kvm_follow_pfn *kfp, kvm_pfn_t *pfn)
 		flags |= FOLL_WRITE;
 	}
 
+out:
 	*pfn = kvm_resolve_pfn(kfp, page, NULL, flags & FOLL_WRITE);
 	return npages;
 }
@@ -3119,10 +3142,11 @@ int kvm_vcpu_map(struct kvm_vcpu *vcpu, gfn_t gfn, struct kvm_host_map *map)
 		.slot = gfn_to_memslot(vcpu->kvm, gfn),
 		.gfn = gfn,
 		.flags = FOLL_WRITE,
-		.refcounted_page = &map->refcounted_page,
+		.refcounted_page = &map->pinned_page,
+		.pin = true,
 	};
 
-	map->refcounted_page = NULL;
+	map->pinned_page = NULL;
 	map->page = NULL;
 	map->hva = NULL;
 	map->gfn = gfn;
@@ -3159,16 +3183,16 @@ void kvm_vcpu_unmap(struct kvm_vcpu *vcpu, struct kvm_host_map *map, bool dirty)
 	if (dirty)
 		kvm_vcpu_mark_page_dirty(vcpu, map->gfn);
 
-	if (map->refcounted_page) {
+	if (map->pinned_page) {
 		if (dirty)
-			kvm_release_page_dirty(map->refcounted_page);
-		else
-			kvm_release_page_clean(map->refcounted_page);
+			kvm_set_page_dirty(map->pinned_page);
+		kvm_set_page_accessed(map->pinned_page);
+		unpin_user_page(map->pinned_page);
 	}
 
 	map->hva = NULL;
 	map->page = NULL;
-	map->refcounted_page = NULL;
+	map->pinned_page = NULL;
 }
 EXPORT_SYMBOL_GPL(kvm_vcpu_unmap);
 
diff --git a/virt/kvm/kvm_mm.h b/virt/kvm/kvm_mm.h
index d3ac1ba8ba66..acef3f5c582a 100644
--- a/virt/kvm/kvm_mm.h
+++ b/virt/kvm/kvm_mm.h
@@ -30,6 +30,13 @@ struct kvm_follow_pfn {
 	/* FOLL_* flags modifying lookup behavior, e.g. FOLL_WRITE. */
 	unsigned int flags;
 
+	/*
+	 * Pin the page (effectively FOLL_PIN, which is an mm/ internal flag).
+	 * The page *must* be pinned if KVM will write to the page via a kernel
+	 * mapping, e.g. via kmap(), mremap(), etc.
+	 */
+	bool pin;
+
 	/*
 	 * If non-NULL, try to get a writable mapping even for a read fault.
 	 * Set to true if a writable mapping was obtained.
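
Also not part of the patch: for context on why kvm_vcpu_map() now has to pin,
below is a minimal sketch of a typical KVM-internal caller of the API touched
above.  The helper name, the offset handling (assumed to stay within the
mapped page), and the error handling are invented for illustration; only
kvm_vcpu_map(), kvm_vcpu_unmap(), and map.hva come from the code being
modified.

#include <linux/kvm_host.h>

/*
 * Hypothetical caller: KVM writes guest memory through a kernel mapping, so
 * the backing page must stay pinned (FOLL_PIN) until kvm_vcpu_unmap() runs.
 */
static int demo_write_guest_u32(struct kvm_vcpu *vcpu, gfn_t gfn,
				unsigned int offset, u32 val)
{
	struct kvm_host_map map;
	int r;

	r = kvm_vcpu_map(vcpu, gfn, &map);
	if (r)
		return r;

	/* Direct write from KVM into guest memory: exactly the FOLL_PIN case. */
	*(u32 *)(map.hva + offset) = val;

	/* dirty == true: the pinned page is marked dirty before being unpinned. */
	kvm_vcpu_unmap(vcpu, &map, true);
	return 0;
}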