From patchwork Tue Oct 31 13:19:09 2023
X-Patchwork-Submitter: Tim Gardner
X-Patchwork-Id: 1857568
From: Tim Gardner
To: kernel-team@lists.ubuntu.com
Subject: [PATCH 2/2] UBUNTU: SAUCE: Refresh the TDX support and support DDA for a TDX VM with paravisor
Date: Tue, 31 Oct 2023 07:19:09 -0600
Message-Id: <20231031131909.99632-3-tim.gardner@canonical.com>
In-Reply-To: <20231031131909.99632-1-tim.gardner@canonical.com>
References: <20231031131909.99632-1-tim.gardner@canonical.com>

From: Dexuan Cui

BugLink: https://bugs.launchpad.net/bugs/2042096

Ideally we would revert commit b8b46adebbd8 ("UBUNTU: SAUCE: Support TDX+HCL
(July 9, 2023)") and apply both "[PATCH v7 0/8] x86/hyperv: Add AMD sev-snp
enlightened guest support on hyperv" [1] and "[PATCH v3 00/10] Support TDX
guests on Hyper-V (the Hyper-V specific part)" [2] (note: [2] depends on [1]),
but that would introduce too many changes. In addition, the "AMD sev-snp
enlightened guest support on hyperv" series still needs some extra patches
that are not upstream yet, e.g. Tianyu Lan's #HV interrupt injection patch [3].
A better way to pick up [2] is therefore a single patch that adds the missing
parts of [2] to the 6.2-based linux-azure kernel; hence this patch.

This patch mainly does two things:

a) Add commit 23378295042a ("Drivers: hv: vmbus: Bring the post_msg_page back
   for TDX VMs with the paravisor") [4].

   This fixes a bug in the hv_pci driver for discrete device assignment (DDA)
   to a TDX VM with the paravisor: in such a VM, the hyperv_pcpu_input_arg
   must be private (i.e. encrypted), otherwise the hypercalls in hv_pci fail,
   since the hypercalls in such a VM are handled by the paravisor rather than
   by the hypervisor.

b) Undo some hack code introduced by commit b8b46adebbd8 ("UBUNTU: SAUCE:
   Support TDX+HCL (July 9, 2023)"); e.g. in hyperv_init(), this patch moves
   the below code back to its original place:

       cpuhp_setup_state(CPUHP_AP_HYPERV_ONLINE, "x86/hyperv_init:online",
                         hv_cpu_init, hv_cpu_die);

With this patch, hyperv_init() in this 6.2-based linux-azure kernel is now
exactly the same as the version in the mainline kernel.

I tested the patch on a TDX VM without and with the paravisor, a VBS VM, an
SNP VM with the paravisor, and a regular VM. All the VMs have 128 vCPUs and
20 GB of memory. All worked as expected.
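To make the two buffer rules above easier to see at a glance, here is a
minimal standalone sketch (not part of the patch; the helper names are made
up, and the booleans stand in for ms_hyperv.paravisor_present,
hv_isolation_type_snp() and hv_isolation_type_tdx() used in the hunks below):

/* Illustrative sketch only -- not part of the patch. */
#include <stdbool.h>
#include <stdio.h>

/* hv_common_cpu_init() hunk: the per-CPU hyperv_pcpu_input_arg is made
 * decrypted (shared with the hypervisor) only for an SNP/TDX VM with no
 * paravisor; with a paravisor it stays encrypted (private), which is what
 * the paravisor-handled hypercalls used by DDA need. */
static bool input_arg_is_decrypted(bool paravisor, bool snp, bool tdx)
{
	return !paravisor && (snp || tdx);
}

/* hv_post_message() hunk: only a TDX VM with the paravisor posts messages
 * via the dedicated decrypted post_msg_page; every other VM type keeps
 * using the per-CPU hyperv_pcpu_input_arg. */
static bool use_post_msg_page(bool paravisor, bool tdx)
{
	return paravisor && tdx;
}

int main(void)
{
	/* TDX VM with the paravisor: private input arg + separate post_msg_page. */
	printf("TDX+paravisor: input_arg decrypted=%d, post_msg_page=%d\n",
	       input_arg_is_decrypted(true, false, true),
	       use_post_msg_page(true, true));

	/* TDX VM with no paravisor: shared input arg, no post_msg_page needed. */
	printf("TDX, no paravisor: input_arg decrypted=%d, post_msg_page=%d\n",
	       input_arg_is_decrypted(false, false, true),
	       use_post_msg_page(false, true));
	return 0;
}

The kernel code performs these checks inline in hv_common_cpu_init() and
hv_post_message(); the sketch only captures the decision logic.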
References:
[1] https://lwn.net/ml/linux-kernel/ZOQMiLEdPsD+pF8q@liuwe-devbox-debian-v2/
[2] https://lwn.net/ml/linux-kernel/ZOfwSDjW0wlHozYV@liuwe-devbox-debian-v2/
[3] https://lwn.net/ml/linux-kernel/20230515165917.1306922-3-ltykernel@gmail.com/
[4] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=23378295042a4bcaeec350733a4771678e7a1f3a

Signed-off-by: Dexuan Cui
Signed-off-by: Tim Gardner
---
 arch/x86/hyperv/hv_init.c | 66 ++++++++++++++++---------------------
 drivers/hv/hv.c           | 69 +++++++++++++++++++++++++++++++++++----
 drivers/hv/hv_common.c    |  3 +-
 drivers/hv/hyperv_vmbus.h | 11 +++++++
 4 files changed, 104 insertions(+), 45 deletions(-)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index a6ccc041539d..af3653caefd3 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -51,7 +51,7 @@ static int hyperv_init_ghcb(void)
 	void *ghcb_va;
 	void **ghcb_base;
 
-	if (!hv_isolation_type_snp())
+	if (!ms_hyperv.paravisor_present || !hv_isolation_type_snp())
 		return 0;
 
 	if (!hv_ghcb_pg)
@@ -457,7 +457,7 @@ void __init hyperv_init(void)
 		goto common_free;
 	}
 
-	if (hv_isolation_type_snp()) {
+	if (ms_hyperv.paravisor_present && hv_isolation_type_snp()) {
 		/* Negotiate GHCB Version. */
 		if (!hv_ghcb_negotiate_protocol())
 			hv_ghcb_terminate(SEV_TERM_SET_GEN,
@@ -468,36 +468,39 @@ void __init hyperv_init(void)
 			goto free_vp_assist_page;
 	}
 
+	cpuhp = cpuhp_setup_state(CPUHP_AP_HYPERV_ONLINE, "x86/hyperv_init:online",
+				  hv_cpu_init, hv_cpu_die);
+	if (cpuhp < 0)
+		goto free_ghcb_page;
+
 	/*
 	 * Setup the hypercall page and enable hypercalls.
 	 * 1. Register the guest ID
 	 * 2. Enable the hypercall and register the hypercall page
 	 *
-	 * A TDX VM with no paravisor uses GHCI rather than hv_hypercall_pg.
-	 * When the VM needs to pass an input page to Hyper-V, the page must
-	 * be a shared page, e.g. hv_post_message() uses the per-CPU shared
-	 * page hyperv_pcpu_input_arg.
+	 * A TDX VM with no paravisor only uses TDX GHCI rather than hv_hypercall_pg:
+	 * when the hypercall input is a page, such a VM must pass a decrypted
+	 * page to Hyper-V, e.g. hv_post_message() uses the per-CPU page
+	 * hyperv_pcpu_input_arg, which is decrypted if no paravisor is present.
 	 *
 	 * A TDX VM with the paravisor uses hv_hypercall_pg for most hypercalls,
-	 * which are handled by the paravisor and a private input page must be
-	 * used, e.g. see hv_mark_gpa_visibility(). The VM uses GHCI for
-	 * two hypercalls: HVCALL_SIGNAL_EVENT (see vmbus_set_event()) and
-	 * HVCALL_POST_MESSAGE (the input page must be a shared page, i.e.
-	 * hv_post_message() uses the per-CPU shared hyperv_pcpu_input_arg.)
-	 * NOTE: we must initialize hv_hypercall_pg before hv_cpu_init(),
-	 * because hv_cpu_init() -> hv_common_cpu_init() -> set_memory_decrypted()
-	 * -> ... -> hv_vtom_set_host_visibility() -> ... -> hv_do_hypercall()
-	 * needs to call the hv_hypercall_pg.
-	 */
-
-	/*
-	 * In the case of TDX with the paravisor, we should write the MSR
-	 * before hv_cpu_init(), which needs to call the paravisor-handled
-	 * HVCALL_MODIFY_SPARSE_GPA_PAGE_HOST_VISIBILITY.
+	 * which are handled by the paravisor and the VM must use an encrypted
+	 * input page: in such a VM, the hyperv_pcpu_input_arg is encrypted and
+	 * used in the hypercalls, e.g. see hv_mark_gpa_visibility() and
+	 * hv_arch_irq_unmask(). Such a VM uses TDX GHCI for two hypercalls:
+	 * 1. HVCALL_SIGNAL_EVENT: see vmbus_set_event() and _hv_do_fast_hypercall8().
+	 * 2. HVCALL_POST_MESSAGE: the input page must be a decrypted page, i.e.
+	 * hv_post_message() in such a VM can't use the encrypted hyperv_pcpu_input_arg;
+	 * instead, hv_post_message() uses the post_msg_page, which is decrypted
+	 * in such a VM and is only used in such a VM.
 	 */
 	guest_id = hv_generate_guest_id(LINUX_VERSION_CODE);
 	wrmsrl(HV_X64_MSR_GUEST_OS_ID, guest_id);
 
+	/* With the paravisor, the VM must also write the ID via GHCB/GHCI */
+	hv_ivm_msr_write(HV_X64_MSR_GUEST_OS_ID, guest_id);
+
+	/* A TDX VM with no paravisor only uses TDX GHCI rather than hv_hypercall_pg */
 	if (hv_isolation_type_tdx() && !hyperv_paravisor_present)
 		goto skip_hypercall_pg_init;
 
@@ -506,7 +509,7 @@ void __init hyperv_init(void)
 			VM_FLUSH_RESET_PERMS, NUMA_NO_NODE,
 			__builtin_return_address(0));
 	if (hv_hypercall_pg == NULL)
-		goto free_ghcb_page;
+		goto clean_guest_os_id;
 
 	rdmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
 	hypercall_msr.enable = 1;
@@ -541,18 +544,6 @@ void __init hyperv_init(void)
 	}
 
 skip_hypercall_pg_init:
-	cpuhp = cpuhp_setup_state(CPUHP_AP_HYPERV_ONLINE, "x86/hyperv_init:online",
-				  hv_cpu_init, hv_cpu_die);
-	if (cpuhp < 0)
-		goto clean_guest_os_id;
-
-	/*
-	 * In the case of SNP with the paravisor, we must write the MSR to
-	 * the hypervisor after hv_cpu_init(), which maps the hv_ghcb_pg first.
-	 */
-	if (hyperv_paravisor_present)
-		hv_ivm_msr_write(HV_X64_MSR_GUEST_OS_ID, guest_id);
-
 	/*
 	 * hyperv_init() is called before LAPIC is initialized: see
 	 * apic_intr_mode_init() -> x86_platform.apic_post_init() and
@@ -592,8 +583,8 @@ void __init hyperv_init(void)
 
 clean_guest_os_id:
 	wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
-	if (hyperv_paravisor_present)
-		hv_ivm_msr_write(HV_X64_MSR_GUEST_OS_ID, 0);
+	hv_ivm_msr_write(HV_X64_MSR_GUEST_OS_ID, 0);
+	cpuhp_remove_state(cpuhp);
 free_ghcb_page:
 	free_percpu(hv_ghcb_pg);
 free_vp_assist_page:
@@ -613,8 +604,7 @@ void hyperv_cleanup(void)
 
 	/* Reset our OS id */
 	wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
-	if (hyperv_paravisor_present)
-		hv_ivm_msr_write(HV_X64_MSR_GUEST_OS_ID, 0);
+	hv_ivm_msr_write(HV_X64_MSR_GUEST_OS_ID, 0);
 
 	/*
 	 * Reset hypercall page reference before reset the page,
diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
index a5d388f3706c..8c5fa0807456 100644
--- a/drivers/hv/hv.c
+++ b/drivers/hv/hv.c
@@ -93,7 +93,17 @@ int hv_post_message(union hv_connection_id connection_id,
 
 	local_irq_save(flags);
 
-	aligned_msg = *this_cpu_ptr(hyperv_pcpu_input_arg);
+	/*
+	 * A TDX VM with the paravisor must use the decrypted post_msg_page: see
+	 * the comment in struct hv_per_cpu_context. A SNP VM with the paravisor
+	 * can use the encrypted hyperv_pcpu_input_arg because it copies the
+	 * input into the GHCB page, which has been decrypted by the paravisor.
+	 */
+	if (hv_isolation_type_tdx() && ms_hyperv.paravisor_present)
+		aligned_msg = this_cpu_ptr(hv_context.cpu_context)->post_msg_page;
+	else
+		aligned_msg = *this_cpu_ptr(hyperv_pcpu_input_arg);
+
 	aligned_msg->connectionid = connection_id;
 	aligned_msg->reserved = 0;
 	aligned_msg->message_type = message_type;
@@ -142,6 +152,24 @@ int hv_synic_alloc(void)
 		tasklet_init(&hv_cpu->msg_dpc,
 			     vmbus_on_msg_dpc, (unsigned long) hv_cpu);
 
+		if (ms_hyperv.paravisor_present && hv_isolation_type_tdx()) {
+			hv_cpu->post_msg_page = (void *)get_zeroed_page(GFP_ATOMIC);
+			if (hv_cpu->post_msg_page == NULL) {
+				pr_err("Unable to allocate post msg page\n");
+				goto err;
+			}
+
+			ret = set_memory_decrypted((unsigned long)hv_cpu->post_msg_page, 1);
+			if (ret) {
+				pr_err("Failed to decrypt post msg page: %d\n", ret);
+				/* Just leak the page, as it's unsafe to free the page. */
+				hv_cpu->post_msg_page = NULL;
+				goto err;
+			}
+
+			memset(hv_cpu->post_msg_page, 0, PAGE_SIZE);
+		}
+
 		/*
 		 * Synic message and event pages are allocated by paravisor.
 		 * Skip these pages allocation here.
@@ -158,6 +186,9 @@ int hv_synic_alloc(void)
 				(void *)get_zeroed_page(GFP_ATOMIC);
 			if (hv_cpu->synic_event_page == NULL) {
 				pr_err("Unable to allocate SYNIC event page\n");
+
+				free_page((unsigned long)hv_cpu->synic_message_page);
+				hv_cpu->synic_message_page = NULL;
 				goto err;
 			}
 		}
@@ -168,6 +199,14 @@ int hv_synic_alloc(void)
 				(unsigned long)hv_cpu->synic_message_page, 1);
 			if (ret) {
 				pr_err("Failed to decrypt SYNIC msg page\n");
+				hv_cpu->synic_message_page = NULL;
+
+				/*
+				 * Free the event page here so that hv_synic_free()
+				 * won't later try to re-encrypt it.
+				 */
+				free_page((unsigned long)hv_cpu->synic_event_page);
+				hv_cpu->synic_event_page = NULL;
 				goto err;
 			}
 
@@ -175,8 +214,12 @@ int hv_synic_alloc(void)
 				(unsigned long)hv_cpu->synic_event_page, 1);
 			if (ret) {
 				pr_err("Failed to decrypt SYNIC event page\n");
+				hv_cpu->synic_event_page = NULL;
 				goto err;
 			}
+
+			memset(hv_cpu->synic_message_page, 0, PAGE_SIZE);
+			memset(hv_cpu->synic_event_page, 0, PAGE_SIZE);
 		}
 	}
 
@@ -200,6 +243,17 @@ void hv_synic_free(void)
 			= per_cpu_ptr(hv_context.cpu_context, cpu);
 
 		/* It's better to leak the page if the encryption fails. */
+		if (ms_hyperv.paravisor_present && hv_isolation_type_tdx()) {
+			if (hv_cpu->post_msg_page) {
+				ret = set_memory_encrypted((unsigned long)
+					hv_cpu->post_msg_page, 1);
+				if (ret) {
+					pr_err("Failed to encrypt post msg page: %d\n", ret);
+					hv_cpu->post_msg_page = NULL;
+				}
+			}
+		}
+
 		if (hv_isolation_type_tdx() && !hyperv_paravisor_present) {
 			if (hv_cpu->synic_message_page) {
 				ret = set_memory_encrypted((unsigned long)
@@ -210,14 +264,17 @@ void hv_synic_free(void)
 				}
 			}
 
-			ret = set_memory_encrypted(
-				(unsigned long)hv_cpu->synic_event_page, 1);
-			if (ret) {
-				pr_err("Failed to encrypt SYNIC event page\n");
-				continue;
+			if (hv_cpu->synic_event_page) {
+				ret = set_memory_encrypted(
+					(unsigned long)hv_cpu->synic_event_page, 1);
+				if (ret) {
+					pr_err("Failed to encrypt SYNIC event page\n");
+					hv_cpu->synic_event_page = NULL;
+				}
 			}
 		}
 
+		free_page((unsigned long)hv_cpu->post_msg_page);
 		free_page((unsigned long)hv_cpu->synic_event_page);
 		free_page((unsigned long)hv_cpu->synic_message_page);
 	}
diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
index 308d7d485803..20033df9031d 100644
--- a/drivers/hv/hv_common.c
+++ b/drivers/hv/hv_common.c
@@ -149,7 +149,8 @@ int hv_common_cpu_init(unsigned int cpu)
 		if (!mem)
 			return -ENOMEM;
 
-		if (hv_isolation_type_tdx()) {
+		if (!ms_hyperv.paravisor_present &&
+		    (hv_isolation_type_snp() || hv_isolation_type_tdx())) {
 			ret = set_memory_decrypted((unsigned long)mem, pgcount);
 
 			/* It may be unsafe to free mem upon error. */
diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
index 55f2086841ae..f6b1e710f805 100644
--- a/drivers/hv/hyperv_vmbus.h
+++ b/drivers/hv/hyperv_vmbus.h
@@ -123,6 +123,17 @@ struct hv_per_cpu_context {
 	void *synic_message_page;
 	void *synic_event_page;
 
+	/*
+	 * The page is only used in hv_post_message() for a TDX VM (with the
+	 * paravisor) to post a messages to Hyper-V: when such a VM calls
+	 * HVCALL_POST_MESSAGE, it can't use the hyperv_pcpu_input_arg (which
+	 * is encrypted in such a VM) as the hypercall input page, because
+	 * the input page for HVCALL_POST_MESSAGE must be decrypted in such a
+	 * VM, so post_msg_page (which is decrypted in hv_synic_alloc()) is
+	 * introduced for this purpose. See hyperv_init() for more comments.
+	 */
+	void *post_msg_page;
+
 	/*
 	 * Starting with win8, we can take channel interrupts on any CPU;
 	 * we will manage the tasklet that handles events messages on a per CPU