diff mbox series

[SRU,N:intel,7/8] UBUNTU: SAUCE: KVM: x86: Advertise max mappable GPA in CPUID.0x80000008.GuestPhysBits

Message ID 20240626135232.2731811-8-thibault.ferrante@canonical.com
State New
Headers show
Series Support Kexec/Kdump | expand

Commit Message

Thibault Ferrante June 26, 2024, 1:52 p.m. UTC
From: Gerd Hoffmann <kraxel@redhat.com>

BugLink: https://bugs.launchpad.net/bugs/2070356

Use the GuestPhysBits field in CPUID.0x80000008 to communicate the max
mappable GPA to userspace, i.e. the max GPA that is addressable by the
CPU itself.  Typically this is identical to the max effective GPA, except
in the case where the CPU supports MAXPHYADDR > 48 but does not support
5-level TDP (the CPU consults bits 51:48 of the GPA only when walking the
fifth level TDP page table entry).

Enumerating the max mappable GPA via CPUID will allow guest firmware to
map resources like PCI bars in the highest possible address space, while
ensuring that the GPA is addressable by the CPU.  Without precise
knowledge about the max mappable GPA, the guest must assume that 5-level
paging is unsupported and thus restrict its mappings to the lower 48 bits.

Advertise the max mappable GPA via KVM_GET_SUPPORTED_CPUID as userspace
doesn't have easy access to whether or not 5-level paging is supported,
and to play nice with userspace VMMs that reflect the supported CPUID
directly into the guest.

AMD's APM (3.35) defines GuestPhysBits (EAX[23:16]) as:

  Maximum guest physical address size in bits.  This number applies
  only to guests using nested paging.  When this field is zero, refer
  to the PhysAddrSize field for the maximum guest physical address size.

Tom Lendacky confirmed that the purpose of GuestPhysBits is software use
and KVM can use it as described above.  Real hardware always returns zero.

Leave GuestPhysBits as '0' when TDP is disabled in order to comply with
the APM's statement that GuestPhysBits "applies only to guest using nested
paging".  As above, guest firmware will likely create suboptimal mappings,
but that is a very minor issue and not a functional concern.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Link: https://lore.kernel.org/r/20240313125844.912415-3-kraxel@redhat.com
[sean: massage changelog]
Signed-off-by: Sean Christopherson <seanjc@google.com>
(cherry picked from http://github.com/intel/kernel-downstream.git/v6.8-tdx from commit e631d610805278e9817c7ab6741ca297ad72b36d)
Signed-off-by: Thibault Ferrante <thibault.ferrante@canonical.com>
---
 arch/x86/kvm/cpuid.c   | 28 +++++++++++++++++++++++++---
 arch/x86/kvm/mmu.h     |  2 ++
 arch/x86/kvm/mmu/mmu.c |  5 +++++
 3 files changed, 32 insertions(+), 3 deletions(-)
diff mbox series

Patch

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 54b064dd8a48..064cc2e87113 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -1226,8 +1226,22 @@  static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
 		entry->eax = entry->ebx = entry->ecx = 0;
 		break;
 	case 0x80000008: {
+		/*
+		 * GuestPhysAddrSize (EAX[23:16]) is intended for software
+		 * use.
+		 *
+		 * KVM's ABI is to report the effective MAXPHYADDR for the
+		 * guest in PhysAddrSize (phys_as), and the maximum
+		 * *addressable* GPA in GuestPhysAddrSize (g_phys_as).
+		 *
+		 * GuestPhysAddrSize is valid if and only if TDP is enabled,
+		 * in which case the max GPA that can be addressed by KVM may
+		 * be less than the max GPA that can be legally generated by
+		 * the guest, e.g. if MAXPHYADDR>48 but the CPU doesn't
+		 * support 5-level TDP.
+		 */
 		unsigned int virt_as = max((entry->eax >> 8) & 0xff, 48U);
-		unsigned int phys_as;
+		unsigned int phys_as, g_phys_as;
 
 		/*
 		 * If TDP (NPT) is disabled use the adjusted host MAXPHYADDR as
@@ -1236,15 +1250,23 @@  static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
 		 * paging, too.
 		 *
 		 * If TDP is enabled, use the raw bare metal MAXPHYADDR as
-		 * reductions to the HPAs do not affect GPAs.
+		 * reductions to the HPAs do not affect GPAs.  The max
+		 * addressable GPA is the same as the max effective GPA, except
+		 * that it's capped at 48 bits if 5-level TDP isn't supported
+		 * (hardware processes bits 51:48 only when walking the fifth
+		 * level page table).
 		 */
 		if (!tdp_enabled) {
 			phys_as = boot_cpu_data.x86_phys_bits;
+			g_phys_as = 0;
 		} else {
 			phys_as = entry->eax & 0xff;
+			g_phys_as = phys_as;
+			if (kvm_mmu_get_max_tdp_level() < 5)
+				g_phys_as = min(g_phys_as, 48);
 		}
 
-		entry->eax = phys_as | (virt_as << 8);
+		entry->eax = phys_as | (virt_as << 8) | (g_phys_as << 16);
 		entry->ecx &= ~(GENMASK(31, 16) | GENMASK(11, 8));
 		entry->edx = 0;
 		cpuid_entry_override(entry, CPUID_8000_0008_EBX);
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index ab2854f337ab..725bc5caa217 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -100,6 +100,8 @@  static inline u8 kvm_get_shadow_phys_bits(void)
 	return boot_cpu_data.x86_phys_bits;
 }
 
+u8 kvm_mmu_get_max_tdp_level(void);
+
 void kvm_mmu_set_mmio_spte_mask(u64 mmio_value, u64 mmio_mask, u64 access_mask);
 void kvm_mmu_set_mmio_spte_value(struct kvm *kvm, u64 mmio_value);
 void kvm_mmu_set_me_spte_mask(u64 me_value, u64 me_mask);
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 20eae3e759db..440d8a5dad20 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5419,6 +5419,11 @@  static inline int kvm_mmu_get_tdp_level(struct kvm_vcpu *vcpu)
 	return max_tdp_level;
 }
 
+u8 kvm_mmu_get_max_tdp_level(void)
+{
+	return tdp_root_level ? tdp_root_level : max_tdp_level;
+}
+
 static union kvm_mmu_page_role
 kvm_calc_tdp_mmu_root_page_role(struct kvm_vcpu *vcpu,
 				union kvm_cpu_role cpu_role)