From patchwork Thu Oct 10 18:23:29 2024
X-Patchwork-Submitter: Sean Christopherson
X-Patchwork-Id: 1995694
Date: Thu, 10 Oct 2024 11:23:29 -0700
In-Reply-To: <20241010182427.1434605-1-seanjc@google.com>
References: <20241010182427.1434605-1-seanjc@google.com>
X-Mailer: git-send-email 2.47.0.rc1.288.g06298d1525-goog
Message-ID: <20241010182427.1434605-28-seanjc@google.com>
Subject: [PATCH v13 27/85] KVM: Provide refcounted page as output field in struct kvm_follow_pfn
From: Sean Christopherson
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
 Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley, Palmer Dabbelt,
 Albert Ou, Christian Borntraeger, Janosch Frank, Claudio Imbrenda,
 Sean Christopherson
Cc: kvm@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
 kvmarm@lists.linux.dev, loongarch@lists.linux.dev, linux-mips@vger.kernel.org,
 linuxppc-dev@lists.ozlabs.org, kvm-riscv@lists.infradead.org,
 linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org,
 Alex Bennée, Yan Zhao, David Matlack, David Stevens, Andrew Jones
Add kvm_follow_pfn.refcounted_page as an output for the "to pfn" APIs to
"return" the struct page that is associated with the returned pfn (if KVM
acquired a reference to the page).  This will eventually allow removing
KVM's hacky kvm_pfn_to_refcounted_page() code, which is error prone and
can't detect pfns that are valid, but aren't (currently) refcounted.

Tested-by: Alex Bennée
Signed-off-by: Sean Christopherson
---
 virt/kvm/kvm_main.c | 99 +++++++++++++++++++++------------------------
 virt/kvm/kvm_mm.h   |  9 +++++
 2 files changed, 56 insertions(+), 52 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index d3e48fcc4fb0..e29f78ed6f48 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2746,6 +2746,46 @@ unsigned long kvm_vcpu_gfn_to_hva_prot(struct kvm_vcpu *vcpu, gfn_t gfn, bool *w
 	return gfn_to_hva_memslot_prot(slot, gfn, writable);
 }
 
+static kvm_pfn_t kvm_resolve_pfn(struct kvm_follow_pfn *kfp, struct page *page,
+				 struct follow_pfnmap_args *map, bool writable)
+{
+	kvm_pfn_t pfn;
+
+	WARN_ON_ONCE(!!page == !!map);
+
+	if (kfp->map_writable)
+		*kfp->map_writable = writable;
+
+	/*
+	 * FIXME: Remove this once KVM no longer blindly calls put_page() on
+	 * every pfn that points at a struct page.
+	 *
+	 * Get a reference for follow_pte() pfns if they happen to point at a
+	 * struct page, as KVM will ultimately call kvm_release_pfn_clean() on
+	 * the returned pfn, i.e. KVM expects to have a reference.
+	 *
+	 * Certain IO or PFNMAP mappings can be backed with valid struct pages,
+	 * but be allocated without refcounting, e.g. tail pages of
+	 * non-compound higher order allocations.  Grabbing and putting a
+	 * reference to such pages would cause KVM to prematurely free a page
+	 * it doesn't own (KVM gets and puts the one and only reference).
+	 * Don't allow those pages until the FIXME is resolved.
+	 */
+	if (map) {
+		pfn = map->pfn;
+		page = kvm_pfn_to_refcounted_page(pfn);
+		if (page && !get_page_unless_zero(page))
+			return KVM_PFN_ERR_FAULT;
+	} else {
+		pfn = page_to_pfn(page);
+	}
+
+	if (kfp->refcounted_page)
+		*kfp->refcounted_page = page;
+
+	return pfn;
+}
+
 /*
  * The fast path to get the writable pfn which will be stored in @pfn,
  * true indicates success, otherwise false is returned.
@@ -2763,9 +2803,7 @@ static bool hva_to_pfn_fast(struct kvm_follow_pfn *kfp, kvm_pfn_t *pfn)
 		return false;
 
 	if (get_user_page_fast_only(kfp->hva, FOLL_WRITE, &page)) {
-		*pfn = page_to_pfn(page);
-		if (kfp->map_writable)
-			*kfp->map_writable = true;
+		*pfn = kvm_resolve_pfn(kfp, page, NULL, true);
 		return true;
 	}
 
@@ -2797,23 +2835,15 @@ static int hva_to_pfn_slow(struct kvm_follow_pfn *kfp, kvm_pfn_t *pfn)
 	if (npages != 1)
 		return npages;
 
-	if (!kfp->map_writable)
-		goto out;
-
-	if (kfp->flags & FOLL_WRITE) {
-		*kfp->map_writable = true;
-		goto out;
-	}
-
 	/* map read fault as writable if possible */
-	if (get_user_page_fast_only(kfp->hva, FOLL_WRITE, &wpage)) {
-		*kfp->map_writable = true;
+	if (!(flags & FOLL_WRITE) && kfp->map_writable &&
+	    get_user_page_fast_only(kfp->hva, FOLL_WRITE, &wpage)) {
 		put_page(page);
 		page = wpage;
+		flags |= FOLL_WRITE;
 	}
 
-out:
-	*pfn = page_to_pfn(page);
+	*pfn = kvm_resolve_pfn(kfp, page, NULL, flags & FOLL_WRITE);
 	return npages;
 }
 
@@ -2828,22 +2858,11 @@ static bool vma_is_valid(struct vm_area_struct *vma, bool write_fault)
 	return true;
 }
 
-static int kvm_try_get_pfn(kvm_pfn_t pfn)
-{
-	struct page *page = kvm_pfn_to_refcounted_page(pfn);
-
-	if (!page)
-		return 1;
-
-	return get_page_unless_zero(page);
-}
-
 static int hva_to_pfn_remapped(struct vm_area_struct *vma,
 			       struct kvm_follow_pfn *kfp, kvm_pfn_t *p_pfn)
 {
 	struct follow_pfnmap_args args = { .vma = vma, .address = kfp->hva };
 	bool write_fault = kfp->flags & FOLL_WRITE;
-	kvm_pfn_t pfn;
 	int r;
 
 	r = follow_pfnmap_start(&args);
@@ -2867,37 +2886,13 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
 	}
 
 	if (write_fault && !args.writable) {
-		pfn = KVM_PFN_ERR_RO_FAULT;
+		*p_pfn = KVM_PFN_ERR_RO_FAULT;
 		goto out;
 	}
 
-	if (kfp->map_writable)
-		*kfp->map_writable = args.writable;
-	pfn = args.pfn;
-
-	/*
-	 * Get a reference here because callers of *hva_to_pfn* and
-	 * *gfn_to_pfn* ultimately call kvm_release_pfn_clean on the
-	 * returned pfn.  This is only needed if the VMA has VM_MIXEDMAP
-	 * set, but the kvm_try_get_pfn/kvm_release_pfn_clean pair will
-	 * simply do nothing for reserved pfns.
-	 *
-	 * Whoever called remap_pfn_range is also going to call e.g.
-	 * unmap_mapping_range before the underlying pages are freed,
-	 * causing a call to our MMU notifier.
-	 *
-	 * Certain IO or PFNMAP mappings can be backed with valid
-	 * struct pages, but be allocated without refcounting e.g.,
-	 * tail pages of non-compound higher order allocations, which
-	 * would then underflow the refcount when the caller does the
-	 * required put_page.  Don't allow those pages here.
-	 */
-	if (!kvm_try_get_pfn(pfn))
-		r = -EFAULT;
+	*p_pfn = kvm_resolve_pfn(kfp, NULL, &args, args.writable);
 
 out:
 	follow_pfnmap_end(&args);
-	*p_pfn = pfn;
-
 	return r;
 }
 
diff --git a/virt/kvm/kvm_mm.h b/virt/kvm/kvm_mm.h
index d5a215958f06..d3ac1ba8ba66 100644
--- a/virt/kvm/kvm_mm.h
+++ b/virt/kvm/kvm_mm.h
@@ -35,6 +35,15 @@ struct kvm_follow_pfn {
 	 * Set to true if a writable mapping was obtained.
 	 */
 	bool *map_writable;
+
+	/*
+	 * Optional output.  Set to a valid "struct page" if the returned pfn
+	 * is for a refcounted or pinned struct page, NULL if the returned pfn
+	 * has no struct page or if the struct page is not being refcounted
+	 * (e.g. tail pages of non-compound higher order allocations from
+	 * IO/PFNMAP mappings).
+	 */
+	struct page **refcounted_page;
 };
 
 kvm_pfn_t hva_to_pfn(struct kvm_follow_pfn *kfp);
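
[Editor's note: a minimal caller sketch, not part of the patch, showing how
the new output field would be consumed.  It assumes only the kvm_follow_pfn
fields visible in this patch (hva, flags, map_writable, refcounted_page);
example_hva_to_pfn() is a hypothetical wrapper name, and whether .hva may be
set directly by a caller is an assumption for illustration.]

/*
 * Illustrative sketch only, NOT kernel code from this series.  Resolve an
 * hva to a pfn and report, via *page, whether KVM acquired a reference to
 * the backing struct page.
 */
static kvm_pfn_t example_hva_to_pfn(unsigned long hva, bool *writable,
				    struct page **page)
{
	struct kvm_follow_pfn kfp = {
		.hva		 = hva,
		.flags		 = FOLL_WRITE,
		.map_writable	 = writable,
		/* New output: set iff KVM acquired a page reference. */
		.refcounted_page = page,
	};

	*page = NULL;	/* defensively clear the output before the call */
	return hva_to_pfn(&kfp);
}

On return, a non-NULL *page means KVM holds a reference the caller must
eventually release; a NULL *page with a valid pfn means the mapping has no
refcounted struct page (e.g. certain IO/PFNMAP mappings) and there is
nothing to put.  The point of the new field is that this is now reported
explicitly, instead of callers reverse-engineering it from the pfn via
kvm_pfn_to_refcounted_page().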