Message ID | 20211009003711.1390019-4-sathyanarayanan.kuppuswamy@linux.intel.com |
---|---|
State | New |
Headers | show |
Series | Add TDX Guest Support (shared-mm support) | expand |
On Fri, Oct 08, 2021, Kuppuswamy Sathyanarayanan wrote: > From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> > > Just like MKTME, TDX reassigns bits of the physical address for > metadata. MKTME used several bits for an encryption KeyID. TDX > uses a single bit in guests to communicate whether a physical page > should be protected by TDX as private memory (bit set to 0) or > unprotected and shared with the VMM (bit set to 1). > > Add a helper, tdx_shared_mask() to generate the mask. The processor > enumerates its physical address width to include the shared bit, which > means it gets included in __PHYSICAL_MASK by default. This is incorrect. The shared bit _may_ be a legal PA bit, but AIUI it's not a hard requirement. > Remove the shared mask from 'physical_mask' since any bits in > tdx_shared_mask() are not used for physical addresses in page table > entries. ... > @@ -94,6 +100,9 @@ static void tdx_get_info(void) > > td_info.gpa_width = out.rcx & GENMASK(5, 0); > td_info.attributes = out.rdx; > + > + /* Exclude Shared bit from the __PHYSICAL_MASK */ > + physical_mask &= ~tdx_shared_mask(); This is insufficient, though it's not really the fault of this patch, the specs themselves botch this whole thing. The TDX Module spec explicitly states that GPAs above GPAW are considered reserved. 10.11.1. GPAW-Relate EPT Violations GPA bits higher than the SHARED bit are considered reserved and must be 0. Address translation with any of the reserved bits set to 1 cause a #PF with PFEC (Page Fault Error Code) RSVD bit set. But this is contradicted by the architectural extensions spec, which states that a GPA that satisfies MAXPA >= GPA > GPAW "can" cause an EPT violation, not #PF. Note, this section also appears to have a bug, as it states that GPA bit 47 is both the SHARED bit and reserved. I assume that blurb is intended to clarify that bit 47 _would_ be reserved if it weren't the SHARED bit, but because it's the shared bit it's ok to access. 1.4.2 Guest Physical Address Translation If the CPU's maximum physical-address width (MAXPA) is 52 and the guest physical address width is configured to be 48, accesses with GPA bits 51:48 not all being 0 can cause an EPT-violation, where such EPT-violations are not mutated to #VE, even if the “EPT-violations #VE” execution control is 1. If the CPU's physical-address width (MAXPA) is less than 48 and the SHARED bit is configured to be in bit position 47, GPA bit 47 would be reserved, and GPA ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ bits 46:MAXPA would be reserved. On such CPUs, setting bits 51:48 or bits 46:MAXPA in any paging structure can cause a reserved bit page fault on access. The Module spec also calls out that the effective GPA is not to be confused with MAXPA, which combined with the above blurb about MAXPA < GPAW, suggests that MAXPA is enumerated separately by design so that the guest doesn't incorrectly think 46:MAXPA are usable. But that is problematic for the case where MAXPA > GPAW. The effective GPA width (in bits) for this TD (do not confuse with MAXPA). SHARED bit is at GPA bit GPAW-1. I can't find the exact reference, but the TDX module always passes through host's MAXPHYADDR. As it pertains to this patch, just doing physical_mask &= ~tdx_shared_mask() means that a guest running with GPAW=0 and MAXPHYADDR>48 will have a discontiguous physical_mask, and could access "reserved" memory. If the VMM defines legal memory with bits [MAXPHYADDR:48]!=0, explosions may ensue. That's arguably a VMM bug, but given that the VMM is untrusted I think the guest should be paranoid when handling the SHARED bit. I also don't know that the kernel will play nice with a discontiguous mask. Specs aside, unless Intel makes a hardware change to treat GPAW as guest.MAXPHYADDR, or the TDX Module emulates on EPT violations to inject #PF(RSVD) when appropriate, this mess isn't going to be truly fixed from the guest perspective. So, IMO all bits >= GPAW should be cleared, and the kernel should warn and/or refuse to boot if the host has defined legal memory in that range. FWIW, from a VMM perspective, I'm pretty sure the only sane approach is to force GPAW=1, a.k.a. SHARED bit == 51, if host.MAXPHYADDR>=49. But on the guest side, I think we should be paranoid.
On Fri, Nov 05, 2021 at 10:11:48PM +0000, Sean Christopherson wrote: > On Fri, Oct 08, 2021, Kuppuswamy Sathyanarayanan wrote: > > From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> > > > > Just like MKTME, TDX reassigns bits of the physical address for > > metadata. MKTME used several bits for an encryption KeyID. TDX > > uses a single bit in guests to communicate whether a physical page > > should be protected by TDX as private memory (bit set to 0) or > > unprotected and shared with the VMM (bit set to 1). > > > > Add a helper, tdx_shared_mask() to generate the mask. The processor > > enumerates its physical address width to include the shared bit, which > > means it gets included in __PHYSICAL_MASK by default. > > This is incorrect. The shared bit _may_ be a legal PA bit, but AIUI it's not a > hard requirement. Good point, will fix. > > Remove the shared mask from 'physical_mask' since any bits in > > tdx_shared_mask() are not used for physical addresses in page table > > entries. > > ... > > > @@ -94,6 +100,9 @@ static void tdx_get_info(void) > > > > td_info.gpa_width = out.rcx & GENMASK(5, 0); > > td_info.attributes = out.rdx; > > + > > + /* Exclude Shared bit from the __PHYSICAL_MASK */ > > + physical_mask &= ~tdx_shared_mask(); > > This is insufficient, though it's not really the fault of this patch, the specs > themselves botch this whole thing. > > The TDX Module spec explicitly states that GPAs above GPAW are considered reserved. > > 10.11.1. GPAW-Relate EPT Violations > GPA bits higher than the SHARED bit are considered reserved and must be 0. > Address translation with any of the reserved bits set to 1 cause a #PF with > PFEC (Page Fault Error Code) RSVD bit set. > > But this is contradicted by the architectural extensions spec, which states that > a GPA that satisfies MAXPA >= GPA > GPAW "can" cause an EPT violation, not #PF. > Note, this section also appears to have a bug, as it states that GPA bit 47 is > both the SHARED bit and reserved. I assume that blurb is intended to clarify > that bit 47 _would_ be reserved if it weren't the SHARED bit, but because it's > the shared bit it's ok to access. > > 1.4.2 > Guest Physical Address Translation > If the CPU's maximum physical-address width (MAXPA) is 52 and the guest physical > address width is configured to be 48, accesses with GPA bits 51:48 not all being > 0 can cause an EPT-violation, where such EPT-violations are not mutated to #VE, > even if the “EPT-violations #VE” execution control is 1. > > If the CPU's physical-address width (MAXPA) is less than 48 and the SHARED bit > is configured to be in bit position 47, GPA bit 47 would be reserved, and GPA > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > bits 46:MAXPA would be reserved. On such CPUs, setting bits 51:48 or bits > 46:MAXPA in any paging structure can cause a reserved bit page fault on access. > > The Module spec also calls out that the effective GPA is not to be confused with > MAXPA, which combined with the above blurb about MAXPA < GPAW, suggests that MAXPA > is enumerated separately by design so that the guest doesn't incorrectly think > 46:MAXPA are usable. But that is problematic for the case where MAXPA > GPAW. > > The effective GPA width (in bits) for this TD (do not confuse with MAXPA). > SHARED bit is at GPA bit GPAW-1. > > I can't find the exact reference, but the TDX module always passes through host's > MAXPHYADDR. As it pertains to this patch, just doing > > physical_mask &= ~tdx_shared_mask() > > means that a guest running with GPAW=0 and MAXPHYADDR>48 will have a discontiguous > physical_mask, and could access "reserved" memory. If the VMM defines legal memory > with bits [MAXPHYADDR:48]!=0, explosions may ensue. That's arguably a VMM bug, but > given that the VMM is untrusted I think the guest should be paranoid when handling > the SHARED bit. I also don't know that the kernel will play nice with a discontiguous > mask. I expect it to be buggy. > Specs aside, unless Intel makes a hardware change to treat GPAW as guest.MAXPHYADDR, > or the TDX Module emulates on EPT violations to inject #PF(RSVD) when appropriate, > this mess isn't going to be truly fixed from the guest perspective. > > So, IMO all bits >= GPAW should be cleared, and the kernel should warn and/or > refuse to boot if the host has defined legal memory in that range. Right. But only >= GPAW-1 as shared bit is the MSB within GPAW: physical_mask &= GENMASK_ULL(td_info.gpa_width - 2, 0); '2' here smells bad, but well... Given that physical_mask is now contiguous we can truncate anything from e820 that cannot be addressed with adjusted __PHYSICAL_MASK: iff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c index bc0657f0deed..16d57a8769e8 100644 --- a/arch/x86/kernel/e820.c +++ b/arch/x86/kernel/e820.c @@ -833,6 +833,9 @@ static unsigned long __init e820_end_pfn(unsigned long limit_pfn, enum e820_type unsigned long last_pfn = 0; unsigned long max_arch_pfn = MAX_ARCH_PFN; + if (max_arch_pfn > PHYS_PFN(__PHYSICAL_MASK + 1)) + max_arch_pfn = PHYS_PFN(__PHYSICAL_MASK + 1); + for (i = 0; i < e820_table->nr_entries; i++) { struct e820_entry *entry = &e820_table->entries[i]; unsigned long start_pfn; Does it look reasonable? > FWIW, from a VMM perspective, I'm pretty sure the only sane approach is to force > GPAW=1, a.k.a. SHARED bit == 51, if host.MAXPHYADDR>=49. But on the guest side, > I think we should be paranoid.
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 37b27412f52e..e99c669e633a 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -871,6 +871,7 @@ config INTEL_TDX_GUEST depends on SECURITY depends on X86_X2APIC select ARCH_HAS_CC_PLATFORM + select X86_MEM_ENCRYPT_COMMON help Provide support for running in a trusted domain on Intel processors equipped with Trusted Domain Extensions. TDX is a Intel technology diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h index eb5e9dbe1861..b8f758dbbea9 100644 --- a/arch/x86/include/asm/tdx.h +++ b/arch/x86/include/asm/tdx.h @@ -76,6 +76,8 @@ bool tdx_handle_virtualization_exception(struct pt_regs *regs, bool tdx_early_handle_ve(struct pt_regs *regs); +extern phys_addr_t tdx_shared_mask(void); + /* * To support I/O port access in decompressor or early kernel init * code, since #VE exception handler cannot be used, use paravirt @@ -141,6 +143,8 @@ static inline void tdx_early_init(void) { }; static inline bool tdx_early_handle_ve(struct pt_regs *regs) { return false; } +static inline phys_addr_t tdx_shared_mask(void) { return 0; } + #endif /* CONFIG_INTEL_TDX_GUEST */ #if defined(CONFIG_KVM_GUEST) && defined(CONFIG_INTEL_TDX_GUEST) diff --git a/arch/x86/kernel/tdx.c b/arch/x86/kernel/tdx.c index bb237cf291e6..8a37ab0c6cbf 100644 --- a/arch/x86/kernel/tdx.c +++ b/arch/x86/kernel/tdx.c @@ -71,6 +71,12 @@ static inline u64 _tdx_hypercall(u64 fn, u64 r12, u64 r13, u64 r14, return out->r10; } +/* The highest bit of a guest physical address is the "sharing" bit */ +phys_addr_t tdx_shared_mask(void) +{ + return 1ULL << (td_info.gpa_width - 1); +} + static void tdx_get_info(void) { struct tdx_module_output out; @@ -94,6 +100,9 @@ static void tdx_get_info(void) td_info.gpa_width = out.rcx & GENMASK(5, 0); td_info.attributes = out.rdx; + + /* Exclude Shared bit from the __PHYSICAL_MASK */ + physical_mask &= ~tdx_shared_mask(); } static __cpuidle void _tdx_halt(const bool irq_disabled, const bool do_sti)