From patchwork Tue Oct 31 13:19:09 2023
X-Patchwork-Submitter: Tim Gardner
X-Patchwork-Id: 1857568
From: Tim Gardner
To: kernel-team@lists.ubuntu.com
Subject: [PATCH 2/2] UBUNTU: SAUCE: Refresh the TDX support and support DDA for a TDX VM with paravisor
Date: Tue, 31 Oct 2023 07:19:09 -0600
Message-Id: <20231031131909.99632-3-tim.gardner@canonical.com>
In-Reply-To: <20231031131909.99632-1-tim.gardner@canonical.com>
References: <20231031131909.99632-1-tim.gardner@canonical.com>

From: Dexuan Cui

BugLink: https://bugs.launchpad.net/bugs/2042096

Ideally we would revert commit b8b46adebbd8 ("UBUNTU: SAUCE: Support TDX+HCL
(July 9, 2023)") and apply both "[PATCH v7 0/8] x86/hyperv: Add AMD sev-snp
enlightened guest support on hyperv" [1] and "[PATCH v3 00/10] Support TDX
guests on Hyper-V (the Hyper-V specific part)" [2] (note: [2] depends on [1]),
but that would introduce too many changes. In addition, the "AMD sev-snp
enlightened guest support on hyperv" series still needs some extra patches
that are not upstream yet, e.g. Tianyu Lan's #HV interrupt injection patch [3].
A better way to pick up [2] is therefore a single patch that adds the missing
parts of [2] to the 6.2-based linux-azure kernel; hence this patch.

This patch mainly does two things:

a) Add commit 23378295042a ("Drivers: hv: vmbus: Bring the post_msg_page back
   for TDX VMs with the paravisor") [4].

   This fixes a bug in the hv_pci driver for discrete device assignment (DDA)
   to a TDX VM with the paravisor: in such a VM, the hyperv_pcpu_input_arg
   must be private (i.e. encrypted), otherwise the hypercalls in hv_pci fail,
   since the hypercalls in such a VM are handled by the paravisor rather than
   by the hypervisor.

b) Undo some hack code introduced by commit b8b46adebbd8 ("UBUNTU: SAUCE:
   Support TDX+HCL (July 9, 2023)"); e.g. in hyperv_init(), this patch moves
   the below code back to its original place:

       cpuhp_setup_state(CPUHP_AP_HYPERV_ONLINE, "x86/hyperv_init:online",
                         hv_cpu_init, hv_cpu_die);

With this patch, hyperv_init() in this 6.2-based linux-azure kernel is now
exactly the same as the version in the mainline kernel.

I tested the patch on a TDX VM without and with the paravisor, a VBS VM, an
SNP VM with the paravisor, and a regular VM. All the VMs have 128 vCPUs and
20 GB of memory. All worked as expected.
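To make the two buffer rules above easier to see at a glance, here is a
minimal standalone sketch (not part of the patch; the helper names are made
up, and the booleans stand in for ms_hyperv.paravisor_present,
hv_isolation_type_snp() and hv_isolation_type_tdx() used in the hunks below):

/* Illustrative sketch only -- not part of the patch. */
#include <stdbool.h>
#include <stdio.h>

/* hv_common_cpu_init() hunk: the per-CPU hyperv_pcpu_input_arg is made
 * decrypted (shared with the hypervisor) only for an SNP/TDX VM with no
 * paravisor; with a paravisor it stays encrypted (private), which is what
 * the paravisor-handled hypercalls used by DDA need. */
static bool input_arg_is_decrypted(bool paravisor, bool snp, bool tdx)
{
	return !paravisor && (snp || tdx);
}

/* hv_post_message() hunk: only a TDX VM with the paravisor posts messages
 * via the dedicated decrypted post_msg_page; every other VM type keeps
 * using the per-CPU hyperv_pcpu_input_arg. */
static bool use_post_msg_page(bool paravisor, bool tdx)
{
	return paravisor && tdx;
}

int main(void)
{
	/* TDX VM with the paravisor: private input arg + separate post_msg_page. */
	printf("TDX+paravisor: input_arg decrypted=%d, post_msg_page=%d\n",
	       input_arg_is_decrypted(true, false, true),
	       use_post_msg_page(true, true));

	/* TDX VM with no paravisor: shared input arg, no post_msg_page needed. */
	printf("TDX, no paravisor: input_arg decrypted=%d, post_msg_page=%d\n",
	       input_arg_is_decrypted(false, false, true),
	       use_post_msg_page(false, true));
	return 0;
}

The kernel code performs these checks inline in hv_common_cpu_init() and
hv_post_message(); the sketch only captures the decision logic.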
References:
[1] https://lwn.net/ml/linux-kernel/ZOQMiLEdPsD+pF8q@liuwe-devbox-debian-v2/
[2] https://lwn.net/ml/linux-kernel/ZOfwSDjW0wlHozYV@liuwe-devbox-debian-v2/
[3] https://lwn.net/ml/linux-kernel/20230515165917.1306922-3-ltykernel@gmail.com/
[4] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=23378295042a4bcaeec350733a4771678e7a1f3a

Signed-off-by: Dexuan Cui
Signed-off-by: Tim Gardner
---
 arch/x86/hyperv/hv_init.c | 66 ++++++++++++++++---------------------
 drivers/hv/hv.c           | 69 +++++++++++++++++++++++++++++++++++----
 drivers/hv/hv_common.c    |  3 +-
 drivers/hv/hyperv_vmbus.h | 11 +++++++
 4 files changed, 104 insertions(+), 45 deletions(-)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index a6ccc041539d..af3653caefd3 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -51,7 +51,7 @@ static int hyperv_init_ghcb(void)
 	void *ghcb_va;
 	void **ghcb_base;
 
-	if (!hv_isolation_type_snp())
+	if (!ms_hyperv.paravisor_present || !hv_isolation_type_snp())
 		return 0;
 
 	if (!hv_ghcb_pg)
@@ -457,7 +457,7 @@ void __init hyperv_init(void)
 		goto common_free;
 	}
 
-	if (hv_isolation_type_snp()) {
+	if (ms_hyperv.paravisor_present && hv_isolation_type_snp()) {
 		/* Negotiate GHCB Version. */
 		if (!hv_ghcb_negotiate_protocol())
 			hv_ghcb_terminate(SEV_TERM_SET_GEN,
@@ -468,36 +468,39 @@ void __init hyperv_init(void)
 			goto free_vp_assist_page;
 	}
 
+	cpuhp = cpuhp_setup_state(CPUHP_AP_HYPERV_ONLINE, "x86/hyperv_init:online",
+				  hv_cpu_init, hv_cpu_die);
+	if (cpuhp < 0)
+		goto free_ghcb_page;
+
 	/*
 	 * Setup the hypercall page and enable hypercalls.
 	 * 1. Register the guest ID
 	 * 2. Enable the hypercall and register the hypercall page
 	 *
-	 * A TDX VM with no paravisor uses GHCI rather than hv_hypercall_pg.
-	 * When the VM needs to pass an input page to Hyper-V, the page must
-	 * be a shared page, e.g. hv_post_message() uses the per-CPU shared
-	 * page hyperv_pcpu_input_arg.
+	 * A TDX VM with no paravisor only uses TDX GHCI rather than hv_hypercall_pg:
+	 * when the hypercall input is a page, such a VM must pass a decrypted
+	 * page to Hyper-V, e.g. hv_post_message() uses the per-CPU page
+	 * hyperv_pcpu_input_arg, which is decrypted if no paravisor is present.
 	 *
 	 * A TDX VM with the paravisor uses hv_hypercall_pg for most hypercalls,
-	 * which are handled by the paravisor and a private input page must be
-	 * used, e.g. see hv_mark_gpa_visibility(). The VM uses GHCI for
-	 * two hypercalls: HVCALL_SIGNAL_EVENT (see vmbus_set_event()) and
-	 * HVCALL_POST_MESSAGE (the input page must be a shared page, i.e.
-	 * hv_post_message() uses the per-CPU shared hyperv_pcpu_input_arg.)
-	 * NOTE: we must initialize hv_hypercall_pg before hv_cpu_init(),
-	 * because hv_cpu_init() -> hv_common_cpu_init() -> set_memory_decrypted()
-	 * -> ... -> hv_vtom_set_host_visibility() -> ... -> hv_do_hypercall()
-	 * needs to call the hv_hypercall_pg.
-	 */
-
-	/*
-	 * In the case of TDX with the paravisor, we should write the MSR
-	 * before hv_cpu_init(), which needs to call the paravisor-handled
-	 * HVCALL_MODIFY_SPARSE_GPA_PAGE_HOST_VISIBILITY.
+	 * which are handled by the paravisor and the VM must use an encrypted
+	 * input page: in such a VM, the hyperv_pcpu_input_arg is encrypted and
+	 * used in the hypercalls, e.g. see hv_mark_gpa_visibility() and
+	 * hv_arch_irq_unmask(). Such a VM uses TDX GHCI for two hypercalls:
+	 * 1. HVCALL_SIGNAL_EVENT: see vmbus_set_event() and _hv_do_fast_hypercall8().
+	 * 2. HVCALL_POST_MESSAGE: the input page must be a decrypted page, i.e.
+	 * hv_post_message() in such a VM can't use the encrypted hyperv_pcpu_input_arg;
+	 * instead, hv_post_message() uses the post_msg_page, which is decrypted
+	 * in such a VM and is only used in such a VM.
 	 */
 	guest_id = hv_generate_guest_id(LINUX_VERSION_CODE);
 	wrmsrl(HV_X64_MSR_GUEST_OS_ID, guest_id);
 
+	/* With the paravisor, the VM must also write the ID via GHCB/GHCI */
+	hv_ivm_msr_write(HV_X64_MSR_GUEST_OS_ID, guest_id);
+
+	/* A TDX VM with no paravisor only uses TDX GHCI rather than hv_hypercall_pg */
 	if (hv_isolation_type_tdx() && !hyperv_paravisor_present)
 		goto skip_hypercall_pg_init;
 
@@ -506,7 +509,7 @@ void __init hyperv_init(void)
 			VM_FLUSH_RESET_PERMS, NUMA_NO_NODE,
 			__builtin_return_address(0));
 	if (hv_hypercall_pg == NULL)
-		goto free_ghcb_page;
+		goto clean_guest_os_id;
 
 	rdmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
 	hypercall_msr.enable = 1;
@@ -541,18 +544,6 @@ void __init hyperv_init(void)
 	}
 
 skip_hypercall_pg_init:
-	cpuhp = cpuhp_setup_state(CPUHP_AP_HYPERV_ONLINE, "x86/hyperv_init:online",
-				  hv_cpu_init, hv_cpu_die);
-	if (cpuhp < 0)
-		goto clean_guest_os_id;
-
-	/*
-	 * In the case of SNP with the paravisor, we must write the MSR to
-	 * the hypervisor after hv_cpu_init(), which maps the hv_ghcb_pg first.
-	 */
-	if (hyperv_paravisor_present)
-		hv_ivm_msr_write(HV_X64_MSR_GUEST_OS_ID, guest_id);
-
 	/*
 	 * hyperv_init() is called before LAPIC is initialized: see
 	 * apic_intr_mode_init() -> x86_platform.apic_post_init() and
@@ -592,8 +583,8 @@ void __init hyperv_init(void)
 
 clean_guest_os_id:
 	wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
-	if (hyperv_paravisor_present)
-		hv_ivm_msr_write(HV_X64_MSR_GUEST_OS_ID, 0);
+	hv_ivm_msr_write(HV_X64_MSR_GUEST_OS_ID, 0);
+	cpuhp_remove_state(cpuhp);
 free_ghcb_page:
 	free_percpu(hv_ghcb_pg);
 free_vp_assist_page:
@@ -613,8 +604,7 @@ void hyperv_cleanup(void)
 
 	/* Reset our OS id */
 	wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
-	if (hyperv_paravisor_present)
-		hv_ivm_msr_write(HV_X64_MSR_GUEST_OS_ID, 0);
+	hv_ivm_msr_write(HV_X64_MSR_GUEST_OS_ID, 0);
 
 	/*
 	 * Reset hypercall page reference before reset the page,
diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
index a5d388f3706c..8c5fa0807456 100644
--- a/drivers/hv/hv.c
+++ b/drivers/hv/hv.c
@@ -93,7 +93,17 @@ int hv_post_message(union hv_connection_id connection_id,
 
 	local_irq_save(flags);
 
-	aligned_msg = *this_cpu_ptr(hyperv_pcpu_input_arg);
+	/*
+	 * A TDX VM with the paravisor must use the decrypted post_msg_page: see
+	 * the comment in struct hv_per_cpu_context. A SNP VM with the paravisor
+	 * can use the encrypted hyperv_pcpu_input_arg because it copies the
+	 * input into the GHCB page, which has been decrypted by the paravisor.
+	 */
+	if (hv_isolation_type_tdx() && ms_hyperv.paravisor_present)
+		aligned_msg = this_cpu_ptr(hv_context.cpu_context)->post_msg_page;
+	else
+		aligned_msg = *this_cpu_ptr(hyperv_pcpu_input_arg);
+
 	aligned_msg->connectionid = connection_id;
 	aligned_msg->reserved = 0;
 	aligned_msg->message_type = message_type;
@@ -142,6 +152,24 @@ int hv_synic_alloc(void)
 		tasklet_init(&hv_cpu->msg_dpc,
 			     vmbus_on_msg_dpc, (unsigned long) hv_cpu);
 
+		if (ms_hyperv.paravisor_present && hv_isolation_type_tdx()) {
+			hv_cpu->post_msg_page = (void *)get_zeroed_page(GFP_ATOMIC);
+			if (hv_cpu->post_msg_page == NULL) {
+				pr_err("Unable to allocate post msg page\n");
+				goto err;
+			}
+
+			ret = set_memory_decrypted((unsigned long)hv_cpu->post_msg_page, 1);
+			if (ret) {
+				pr_err("Failed to decrypt post msg page: %d\n", ret);
+				/* Just leak the page, as it's unsafe to free the page. */
+				hv_cpu->post_msg_page = NULL;
+				goto err;
+			}
+
+			memset(hv_cpu->post_msg_page, 0, PAGE_SIZE);
+		}
+
 		/*
 		 * Synic message and event pages are allocated by paravisor.
 		 * Skip these pages allocation here.
@@ -158,6 +186,9 @@ int hv_synic_alloc(void)
 				(void *)get_zeroed_page(GFP_ATOMIC);
 			if (hv_cpu->synic_event_page == NULL) {
 				pr_err("Unable to allocate SYNIC event page\n");
+
+				free_page((unsigned long)hv_cpu->synic_message_page);
+				hv_cpu->synic_message_page = NULL;
 				goto err;
 			}
 		}
@@ -168,6 +199,14 @@ int hv_synic_alloc(void)
 				(unsigned long)hv_cpu->synic_message_page, 1);
 			if (ret) {
 				pr_err("Failed to decrypt SYNIC msg page\n");
+				hv_cpu->synic_message_page = NULL;
+
+				/*
+				 * Free the event page here so that hv_synic_free()
+				 * won't later try to re-encrypt it.
+				 */
+				free_page((unsigned long)hv_cpu->synic_event_page);
+				hv_cpu->synic_event_page = NULL;
 				goto err;
 			}
 
@@ -175,8 +214,12 @@ int hv_synic_alloc(void)
 				(unsigned long)hv_cpu->synic_event_page, 1);
 			if (ret) {
 				pr_err("Failed to decrypt SYNIC event page\n");
+				hv_cpu->synic_event_page = NULL;
 				goto err;
 			}
+
+			memset(hv_cpu->synic_message_page, 0, PAGE_SIZE);
+			memset(hv_cpu->synic_event_page, 0, PAGE_SIZE);
 		}
 	}
 
@@ -200,6 +243,17 @@ void hv_synic_free(void)
 			= per_cpu_ptr(hv_context.cpu_context, cpu);
 
 		/* It's better to leak the page if the encryption fails. */
+		if (ms_hyperv.paravisor_present && hv_isolation_type_tdx()) {
+			if (hv_cpu->post_msg_page) {
+				ret = set_memory_encrypted((unsigned long)
+					hv_cpu->post_msg_page, 1);
+				if (ret) {
+					pr_err("Failed to encrypt post msg page: %d\n", ret);
+					hv_cpu->post_msg_page = NULL;
+				}
+			}
+		}
+
 		if (hv_isolation_type_tdx() && !hyperv_paravisor_present) {
 			if (hv_cpu->synic_message_page) {
 				ret = set_memory_encrypted((unsigned long)
@@ -210,14 +264,17 @@ void hv_synic_free(void)
 				}
 			}
 
-			ret = set_memory_encrypted(
-				(unsigned long)hv_cpu->synic_event_page, 1);
-			if (ret) {
-				pr_err("Failed to encrypt SYNIC event page\n");
-				continue;
+			if (hv_cpu->synic_event_page) {
+				ret = set_memory_encrypted(
+					(unsigned long)hv_cpu->synic_event_page, 1);
+				if (ret) {
+					pr_err("Failed to encrypt SYNIC event page\n");
+					hv_cpu->synic_event_page = NULL;
+				}
 			}
 		}
 
+		free_page((unsigned long)hv_cpu->post_msg_page);
 		free_page((unsigned long)hv_cpu->synic_event_page);
 		free_page((unsigned long)hv_cpu->synic_message_page);
 	}
diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
index 308d7d485803..20033df9031d 100644
--- a/drivers/hv/hv_common.c
+++ b/drivers/hv/hv_common.c
@@ -149,7 +149,8 @@ int hv_common_cpu_init(unsigned int cpu)
 		if (!mem)
 			return -ENOMEM;
 
-		if (hv_isolation_type_tdx()) {
+		if (!ms_hyperv.paravisor_present &&
+		    (hv_isolation_type_snp() || hv_isolation_type_tdx())) {
 			ret = set_memory_decrypted((unsigned long)mem, pgcount);
 
 			/* It may be unsafe to free mem upon error. */
diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
index 55f2086841ae..f6b1e710f805 100644
--- a/drivers/hv/hyperv_vmbus.h
+++ b/drivers/hv/hyperv_vmbus.h
@@ -123,6 +123,17 @@ struct hv_per_cpu_context {
 	void *synic_message_page;
 	void *synic_event_page;
 
+	/*
+	 * The page is only used in hv_post_message() for a TDX VM (with the
+	 * paravisor) to post a messages to Hyper-V: when such a VM calls
+	 * HVCALL_POST_MESSAGE, it can't use the hyperv_pcpu_input_arg (which
+	 * is encrypted in such a VM) as the hypercall input page, because
+	 * the input page for HVCALL_POST_MESSAGE must be decrypted in such a
+	 * VM, so post_msg_page (which is decrypted in hv_synic_alloc()) is
+	 * introduced for this purpose. See hyperv_init() for more comments.
+	 */
+	void *post_msg_page;
+
 	/*
 	 * Starting with win8, we can take channel interrupts on any CPU;
 	 * we will manage the tasklet that handles events messages on a per CPU