Message ID: 20180917104144.19188-1-suzuki.poulose@arm.com
Series: kvm: arm64: Dynamic IPA and 52bit IPA
Hi Suzuki, On 9/17/18 12:41 PM, Suzuki K Poulose wrote: > Add support for setting the VTCR_EL2 per VM, rather than hard > coding a value at boot time per CPU. This would allow us to tune > the stage2 page table parameters per VM in the later changes. > > We compute the VTCR fields based on the system wide sanitised > feature registers, except for the hardware management of Access > Flags (VTCR_EL2.HA). It is fine to run a system with a mix of > CPUs that may or may not update the page table Access Flags. > Since the bit is RES0 on CPUs that don't support it, the bit > should be ignored on them. > > Suggested-by: Marc Zyngier <marc.zyngier@arm.com> > Acked-by: Christoffer Dall <cdall@kernel.org> > Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> > --- > arch/arm64/include/asm/kvm_arm.h | 3 +- > arch/arm64/include/asm/kvm_asm.h | 2 - > arch/arm64/include/asm/kvm_host.h | 12 ++++-- > arch/arm64/include/asm/kvm_hyp.h | 1 + > arch/arm64/kvm/hyp/Makefile | 1 - > arch/arm64/kvm/hyp/s2-setup.c | 72 ------------------------------- > arch/arm64/kvm/reset.c | 30 +++++++++++++ > 7 files changed, 40 insertions(+), 81 deletions(-) > delete mode 100644 arch/arm64/kvm/hyp/s2-setup.c > > diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h > index 5f807b680a5f..14317b3a1820 100644 > --- a/arch/arm64/include/asm/kvm_arm.h > +++ b/arch/arm64/include/asm/kvm_arm.h > @@ -135,8 +135,7 @@ > * 40 bits wide (T0SZ = 24). Systems with a PARange smaller than 40 bits are > * not known to exist and will break with this configuration. > * > - * VTCR_EL2.PS is extracted from ID_AA64MMFR0_EL1.PARange at boot time > - * (see hyp-init.S). > + * The VTCR_EL2 is configured per VM and is initialised in kvm_arm_config_vm(). > * > * Note that when using 4K pages, we concatenate two first level page tables > * together. With 16K pages, we concatenate 16 first level page tables. 
> diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h > index 102b5a5c47b6..0b53c72e7591 100644 > --- a/arch/arm64/include/asm/kvm_asm.h > +++ b/arch/arm64/include/asm/kvm_asm.h > @@ -72,8 +72,6 @@ extern void __vgic_v3_init_lrs(void); > > extern u32 __kvm_get_mdcr_el2(void); > > -extern u32 __init_stage2_translation(void); > - > /* Home-grown __this_cpu_{ptr,read} variants that always work at HYP */ > #define __hyp_this_cpu_ptr(sym) \ > ({ \ > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h > index b04280ae1be0..5ecd457bce7d 100644 > --- a/arch/arm64/include/asm/kvm_host.h > +++ b/arch/arm64/include/asm/kvm_host.h > @@ -61,11 +61,13 @@ struct kvm_arch { > u64 vmid_gen; > u32 vmid; > > - /* 1-level 2nd stage table, protected by kvm->mmu_lock */ > + /* stage2 entry level table */ > pgd_t *pgd; > > /* VTTBR value associated with above pgd and vmid */ > u64 vttbr; > + /* VTCR_EL2 value for this VM */ > + u64 vtcr; > > /* The last vcpu id that ran on each physical CPU */ > int __percpu *last_vcpu_ran; > @@ -442,10 +444,12 @@ int kvm_arm_vcpu_arch_has_attr(struct kvm_vcpu *vcpu, > > static inline void __cpu_init_stage2(void) > { > - u32 parange = kvm_call_hyp(__init_stage2_translation); > + u32 ps; > > - WARN_ONCE(parange < 40, > - "PARange is %d bits, unsupported configuration!", parange); > + /* Sanity check for minimum IPA size support */ > + ps = id_aa64mmfr0_parange_to_phys_shift(read_sysreg(id_aa64mmfr0_el1) & 0x7); > + WARN_ONCE(ps < 40, > + "PARange is %d bits, unsupported configuration!", ps); > } > > /* Guest/host FPSIMD coordination helpers */ > diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h > index d1bd1e0f14d7..23aca66767f9 100644 > --- a/arch/arm64/include/asm/kvm_hyp.h > +++ b/arch/arm64/include/asm/kvm_hyp.h > @@ -161,6 +161,7 @@ void __noreturn __hyp_do_panic(unsigned long, ...); > */ > static __always_inline void __hyp_text __load_guest_stage2(struct kvm *kvm) > { > + write_sysreg(kvm->arch.vtcr, vtcr_el2); > write_sysreg(kvm->arch.vttbr, vttbr_el2); > } > > diff --git a/arch/arm64/kvm/hyp/Makefile b/arch/arm64/kvm/hyp/Makefile > index 2fabc2dc1966..82d1904328ad 100644 > --- a/arch/arm64/kvm/hyp/Makefile > +++ b/arch/arm64/kvm/hyp/Makefile > @@ -19,7 +19,6 @@ obj-$(CONFIG_KVM_ARM_HOST) += switch.o > obj-$(CONFIG_KVM_ARM_HOST) += fpsimd.o > obj-$(CONFIG_KVM_ARM_HOST) += tlb.o > obj-$(CONFIG_KVM_ARM_HOST) += hyp-entry.o > -obj-$(CONFIG_KVM_ARM_HOST) += s2-setup.o > > # KVM code is run at a different exception code with a different map, so > # compiler instrumentation that inserts callbacks or checks into the code may > diff --git a/arch/arm64/kvm/hyp/s2-setup.c b/arch/arm64/kvm/hyp/s2-setup.c > deleted file mode 100644 > index e1ca672e937a..000000000000 > --- a/arch/arm64/kvm/hyp/s2-setup.c > +++ /dev/null > @@ -1,72 +0,0 @@ > -/* > - * Copyright (C) 2016 - ARM Ltd > - * Author: Marc Zyngier <marc.zyngier@arm.com> > - * > - * This program is free software; you can redistribute it and/or modify > - * it under the terms of the GNU General Public License version 2 as > - * published by the Free Software Foundation. > - * > - * This program is distributed in the hope that it will be useful, > - * but WITHOUT ANY WARRANTY; without even the implied warranty of > - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > - * GNU General Public License for more details. > - * > - * You should have received a copy of the GNU General Public License > - * along with this program. 
If not, see <http://www.gnu.org/licenses/>. > - */ > - > -#include <linux/types.h> > -#include <asm/kvm_arm.h> > -#include <asm/kvm_asm.h> > -#include <asm/kvm_hyp.h> > -#include <asm/cpufeature.h> > - > -u32 __hyp_text __init_stage2_translation(void) > -{ > - u64 val = VTCR_EL2_FLAGS; > - u64 parange; > - u32 phys_shift; > - u64 tmp; > - > - /* > - * Read the PARange bits from ID_AA64MMFR0_EL1 and set the PS > - * bits in VTCR_EL2. Amusingly, the PARange is 4 bits, but the > - * allocated values are limited to 3bits. > - */ > - parange = read_sysreg(id_aa64mmfr0_el1) & 7; > - if (parange > ID_AA64MMFR0_PARANGE_MAX) > - parange = ID_AA64MMFR0_PARANGE_MAX; > - val |= parange << VTCR_EL2_PS_SHIFT; > - > - /* Compute the actual PARange... */ > - phys_shift = id_aa64mmfr0_parange_to_phys_shift(parange); > - > - /* > - * ... and clamp it to 40 bits, unless we have some braindead > - * HW that implements less than that. In all cases, we'll > - * return that value for the rest of the kernel to decide what > - * to do. > - */ > - val |= VTCR_EL2_T0SZ(phys_shift > 40 ? 40 : phys_shift); > - > - /* > - * Check the availability of Hardware Access Flag / Dirty Bit > - * Management in ID_AA64MMFR1_EL1 and enable the feature in VTCR_EL2. > - */ > - tmp = (read_sysreg(id_aa64mmfr1_el1) >> ID_AA64MMFR1_HADBS_SHIFT) & 0xf; > - if (tmp) > - val |= VTCR_EL2_HA; > - > - /* > - * Read the VMIDBits bits from ID_AA64MMFR1_EL1 and set the VS > - * bit in VTCR_EL2. > - */ > - tmp = (read_sysreg(id_aa64mmfr1_el1) >> ID_AA64MMFR1_VMIDBITS_SHIFT) & 0xf; > - val |= (tmp == ID_AA64MMFR1_VMIDBITS_16) ? > - VTCR_EL2_VS_16BIT : > - VTCR_EL2_VS_8BIT; > - > - write_sysreg(val, vtcr_el2); > - > - return phys_shift; > -} > diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c > index b0c07dab5cb3..e0c49377b771 100644 > --- a/arch/arm64/kvm/reset.c > +++ b/arch/arm64/kvm/reset.c > @@ -26,6 +26,7 @@ > > #include <kvm/arm_arch_timer.h> > > +#include <asm/cpufeature.h> > #include <asm/cputype.h> > #include <asm/ptrace.h> > #include <asm/kvm_arm.h> > @@ -134,9 +135,38 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu) > return kvm_timer_vcpu_reset(vcpu); > } > > +/* > + * Configure the VTCR_EL2 for this VM. The VTCR value is common > + * across all the physical CPUs on the system. We use system wide > + * sanitised values to fill in different fields, except for Hardware > + * Management of Access Flags. HA Flag is set unconditionally on > + * all CPUs, as it is safe to run with or without the feature and > + * the bit is RES0 on CPUs that don't support it. > + */ > int kvm_arm_config_vm(struct kvm *kvm, unsigned long type) > { > + u64 vtcr = VTCR_EL2_FLAGS; #define VTCR_EL2_FLAGS (VTCR_EL2_COMMON_BITS | VTCR_EL2_TGRAN_FLAGS) in include/asm/kvm_arm.h I don't see T0SZ=24 encoded there and I don't see it set either in the code below? For bisection purpose. Thanks Eric > + u64 parange; > + > if (type) > return -EINVAL; > + > + parange = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1) & 7; > + if (parange > ID_AA64MMFR0_PARANGE_MAX) > + parange = ID_AA64MMFR0_PARANGE_MAX; > + vtcr |= parange << VTCR_EL2_PS_SHIFT; > + > + /* > + * Enable the Hardware Access Flag management, unconditionally > + * on all CPUs. The features is RES0 on CPUs without the support > + * and must be ignored by the CPUs. > + */ > + vtcr |= VTCR_EL2_HA; > + > + /* Set the vmid bits */ > + vtcr |= (kvm_get_vmid_bits() == 16) ? > + VTCR_EL2_VS_16BIT : > + VTCR_EL2_VS_8BIT; > + kvm->arch.vtcr = vtcr; > return 0; > } >
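[Editorial note] A minimal sketch (not the patch itself) of how the per-VM VTCR value is assembled from the sanitised ID registers may help picture Eric's T0SZ question. The helper name compute_vm_vtcr() and the explicit T0SZ line are illustrative assumptions; the rest mirrors the kvm_arm_config_vm() hunk above and assumes the usual <asm/kvm_arm.h> and <asm/cpufeature.h> definitions.

/*
 * Illustrative sketch only -- compute_vm_vtcr() is a hypothetical helper,
 * not part of the patch. It mirrors kvm_arm_config_vm() and shows where a
 * T0SZ setting (Eric's point) would have to be folded in, since
 * VTCR_EL2_FLAGS no longer carries T0SZ=24.
 */
static u64 compute_vm_vtcr(u32 ipa_shift)
{
	u64 vtcr = VTCR_EL2_FLAGS;	/* common bits + granule, no T0SZ */
	u64 parange;

	/* PS comes from the system-wide sanitised ID_AA64MMFR0_EL1.PARange */
	parange = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1) & 0x7;
	if (parange > ID_AA64MMFR0_PARANGE_MAX)
		parange = ID_AA64MMFR0_PARANGE_MAX;
	vtcr |= parange << VTCR_EL2_PS_SHIFT;

	/* HA is RES0 on CPUs without the feature, so it is safe to always set */
	vtcr |= VTCR_EL2_HA;

	/* VMID width from the sanitised VMIDBits field */
	vtcr |= (kvm_get_vmid_bits() == 16) ? VTCR_EL2_VS_16BIT : VTCR_EL2_VS_8BIT;

	/* Not in the patch as posted: T0SZ for the VM's IPA size */
	vtcr |= VTCR_EL2_T0SZ(ipa_shift);	/* e.g. ipa_shift == 40 => T0SZ = 24 */

	return vtcr;
}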
Hi Suzuki, On 9/17/18 12:41 PM, Suzuki K Poulose wrote: > Allow the arch backends to perform VM specific initialisation. > This will be later used to handle IPA size configuration and per-VM > VTCR configuration on arm64. > > Cc: Marc Zyngier <marc.zyngier@arm.com> > Cc: Christoffer Dall <cdall@kernel.org> > Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Thanks Eric > --- > arch/arm/include/asm/kvm_host.h | 7 +++++++ > arch/arm64/include/asm/kvm_host.h | 2 ++ > arch/arm64/kvm/reset.c | 7 +++++++ > virt/kvm/arm/arm.c | 5 +++-- > 4 files changed, 19 insertions(+), 2 deletions(-) > > diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h > index 3ad482d2f1eb..72d46418e1ef 100644 > --- a/arch/arm/include/asm/kvm_host.h > +++ b/arch/arm/include/asm/kvm_host.h > @@ -354,4 +354,11 @@ static inline void kvm_vcpu_put_sysregs(struct kvm_vcpu *vcpu) {} > struct kvm *kvm_arch_alloc_vm(void); > void kvm_arch_free_vm(struct kvm *kvm); > > +static inline int kvm_arm_config_vm(struct kvm *kvm, unsigned long type) > +{ > + if (type) > + return -EINVAL; > + return 0; > +} > + > #endif /* __ARM_KVM_HOST_H__ */ > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h > index 3d6d7336f871..b04280ae1be0 100644 > --- a/arch/arm64/include/asm/kvm_host.h > +++ b/arch/arm64/include/asm/kvm_host.h > @@ -513,4 +513,6 @@ void kvm_vcpu_put_sysregs(struct kvm_vcpu *vcpu); > struct kvm *kvm_arch_alloc_vm(void); > void kvm_arch_free_vm(struct kvm *kvm); > > +int kvm_arm_config_vm(struct kvm *kvm, unsigned long type); > + > #endif /* __ARM64_KVM_HOST_H__ */ > diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c > index e37c78bbe1ca..b0c07dab5cb3 100644 > --- a/arch/arm64/kvm/reset.c > +++ b/arch/arm64/kvm/reset.c > @@ -133,3 +133,10 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu) > /* Reset timer */ > return kvm_timer_vcpu_reset(vcpu); > } > + > +int kvm_arm_config_vm(struct kvm *kvm, unsigned long type) > +{ > + if (type) > + return -EINVAL; > + return 0; > +} > diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c > index c92053bc3f96..327d0fd28380 100644 > --- a/virt/kvm/arm/arm.c > +++ b/virt/kvm/arm/arm.c > @@ -120,8 +120,9 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type) > { > int ret, cpu; > > - if (type) > - return -EINVAL; > + ret = kvm_arm_config_vm(kvm, type); > + if (ret) > + return ret; > > kvm->arch.last_vcpu_ran = alloc_percpu(typeof(*kvm->arch.last_vcpu_ran)); > if (!kvm->arch.last_vcpu_ran) >
Hi Suzuki, On 9/17/18 12:41 PM, Suzuki K Poulose wrote: > On arm64 VTTBR_EL2:BADDR holds the base address for the stage2 > translation table. The Arm ARM mandates that the bits BADDR[x-1:0] > should be 0, where 'x' is defined for a given IPA Size and the > number of levels for a translation granule size. It is defined > using some magical constants. This patch is a reverse engineered > implementation to calculate the 'x' at runtime for a given ipa and > number of page table levels. See patch for more details. > > Cc: Marc Zyngier <marc.zyngier@arm.com> > Cc: Christoffer Dall <cdall@kernel.org> > Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> > --- > Changes since V3: > - Update reference to latest ARM ARM and improve commentary > --- > arch/arm64/include/asm/kvm_arm.h | 63 +++++++++++++++++++++++++++++--- > arch/arm64/include/asm/kvm_mmu.h | 25 ++++++++++++- > 2 files changed, 81 insertions(+), 7 deletions(-) > > diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h > index 14317b3a1820..3fb1d440be6e 100644 > --- a/arch/arm64/include/asm/kvm_arm.h > +++ b/arch/arm64/include/asm/kvm_arm.h > @@ -123,7 +123,6 @@ > #define VTCR_EL2_SL0_MASK (3 << VTCR_EL2_SL0_SHIFT) > #define VTCR_EL2_SL0_LVL1 (1 << VTCR_EL2_SL0_SHIFT) > #define VTCR_EL2_T0SZ_MASK 0x3f > -#define VTCR_EL2_T0SZ_40B 24 > #define VTCR_EL2_VS_SHIFT 19 > #define VTCR_EL2_VS_8BIT (0 << VTCR_EL2_VS_SHIFT) > #define VTCR_EL2_VS_16BIT (1 << VTCR_EL2_VS_SHIFT) > @@ -140,11 +139,8 @@ > * Note that when using 4K pages, we concatenate two first level page tables > * together. With 16K pages, we concatenate 16 first level page tables. > * > - * The magic numbers used for VTTBR_X in this patch can be found in Tables > - * D4-23 and D4-25 in ARM DDI 0487A.b. > */ > > -#define VTCR_EL2_T0SZ_IPA VTCR_EL2_T0SZ_40B > #define VTCR_EL2_COMMON_BITS (VTCR_EL2_SH0_INNER | VTCR_EL2_ORGN0_WBWA | \ > VTCR_EL2_IRGN0_WBWA | VTCR_EL2_RES1) > > @@ -175,9 +171,64 @@ > #endif > > #define VTCR_EL2_FLAGS (VTCR_EL2_COMMON_BITS | VTCR_EL2_TGRAN_FLAGS) > -#define VTTBR_X (VTTBR_X_TGRAN_MAGIC - VTCR_EL2_T0SZ_IPA) > +/* > + * ARM VMSAv8-64 defines an algorithm for finding the translation table > + * descriptors in section D4.2.8 in ARM DDI 0487C.a. > + * > + * The algorithm defines the expectations on the BaseAddress (for the page > + * table) bits resolved at each level based on the page size, entry level > + * and T0SZ. The variable "x" in the algorithm also affects the VTTBR:BADDR > + * for stage2 page table. > + * > + * The value of "x" is calculated as : > + * x = Magic_N - T0SZ What is not crystal clear to me is the "if SL0b,c = n" case where x get a value not based on Magic_N. Please could you explain why it is not relevant? 
Thanks Eric > + * > + * where Magic_N is an integer depending on the page size and the entry > + * level of the page table as below: > + * > + * -------------------------------------------- > + * | Entry level | 4K 16K 64K | > + * -------------------------------------------- > + * | Level: 0 (4 levels) | 28 | - | - | > + * -------------------------------------------- > + * | Level: 1 (3 levels) | 37 | 31 | 25 | > + * -------------------------------------------- > + * | Level: 2 (2 levels) | 46 | 42 | 38 | > + * -------------------------------------------- > + * | Level: 3 (1 level) | - | 53 | 51 | > + * -------------------------------------------- > + * > + * We have a magic formula for the Magic_N below: > + * > + * Magic_N(PAGE_SIZE, Level) = 64 - ((PAGE_SHIFT - 3) * Number_of_levels) > + * > + * where Number_of_levels = (4 - Level). We are only interested in the > + * value for Entry_Level for the stage2 page table. > + * > + * So, given that T0SZ = (64 - IPA_SHIFT), we can compute 'x' as follows: > + * > + * x = (64 - ((PAGE_SHIFT - 3) * Number_of_levels)) - (64 - IPA_SHIFT) > + * = IPA_SHIFT - ((PAGE_SHIFT - 3) * Number of levels) > + * > + * Here is one way to explain the Magic Formula: > + * > + * x = log2(Size_of_Entry_Level_Table) > + * > + * Since, we can resolve (PAGE_SHIFT - 3) bits at each level, and another > + * PAGE_SHIFT bits in the PTE, we have : > + * > + * Bits_Entry_level = IPA_SHIFT - ((PAGE_SHIFT - 3) * (n - 1) + PAGE_SHIFT) > + * = IPA_SHIFT - (PAGE_SHIFT - 3) * n - 3 > + * where n = number of levels, and since each pointer is 8bytes, we have: > + * > + * x = Bits_Entry_Level + 3 > + * = IPA_SHIFT - (PAGE_SHIFT - 3) * n > + * > + * The only constraint here is that, we have to find the number of page table > + * levels for a given IPA size (which we do, see stage2_pt_levels()) > + */ > +#define ARM64_VTTBR_X(ipa, levels) ((ipa) - ((levels) * (PAGE_SHIFT - 3))) > > -#define VTTBR_BADDR_MASK (((UL(1) << (PHYS_MASK_SHIFT - VTTBR_X)) - 1) << VTTBR_X) > #define VTTBR_VMID_SHIFT (UL(48)) > #define VTTBR_VMID_MASK(size) (_AT(u64, (1 << size) - 1) << VTTBR_VMID_SHIFT) > > diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h > index 7342d2c51773..ac3ca9690bad 100644 > --- a/arch/arm64/include/asm/kvm_mmu.h > +++ b/arch/arm64/include/asm/kvm_mmu.h > @@ -145,7 +145,6 @@ static inline unsigned long __kern_hyp_va(unsigned long v) > #define kvm_phys_shift(kvm) KVM_PHYS_SHIFT > #define kvm_phys_size(kvm) (_AC(1, ULL) << kvm_phys_shift(kvm)) > #define kvm_phys_mask(kvm) (kvm_phys_size(kvm) - _AC(1, ULL)) > -#define kvm_vttbr_baddr_mask(kvm) VTTBR_BADDR_MASK > > static inline bool kvm_page_empty(void *ptr) > { > @@ -520,5 +519,29 @@ static inline int hyp_map_aux_data(void) > > #define kvm_phys_to_vttbr(addr) phys_to_ttbr(addr) > > +/* > + * Get the magic number 'x' for VTTBR:BADDR of this KVM instance. > + * With v8.2 LVA extensions, 'x' should be a minimum of 6 with > + * 52bit IPS. > + */ > +static inline int arm64_vttbr_x(u32 ipa_shift, u32 levels) > +{ > + int x = ARM64_VTTBR_X(ipa_shift, levels); > + > + return (IS_ENABLED(CONFIG_ARM64_PA_BITS_52) && x < 6) ? 
6 : x; > +} > + > +static inline u64 vttbr_baddr_mask(u32 ipa_shift, u32 levels) > +{ > + unsigned int x = arm64_vttbr_x(ipa_shift, levels); > + > + return GENMASK_ULL(PHYS_MASK_SHIFT - 1, x); > +} > + > +static inline u64 kvm_vttbr_baddr_mask(struct kvm *kvm) > +{ > + return vttbr_baddr_mask(kvm_phys_shift(kvm), kvm_stage2_levels(kvm)); > +} > + > #endif /* __ASSEMBLY__ */ > #endif /* __ARM64_KVM_MMU_H__ */ >
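[Editorial note] A worked example (not from the patch) for the common 4K-page, 40-bit IPA stage2 configuration ties the formula to the macros above:

/*
 * Editorial cross-check, assuming 4K pages (PAGE_SHIFT == 12), a 40bit IPA
 * and 3 levels of stage2 translation:
 *
 *   ARM64_VTTBR_X(40, 3)    == 40 - 3 * (12 - 3) == 13
 *   Magic-table route:         Magic_N(4K, level 1) = 37, T0SZ = 24,
 *                              so x = 37 - 24 = 13 as well
 *   vttbr_baddr_mask(40, 3) == GENMASK_ULL(47, 13)   (PHYS_MASK_SHIFT == 48)
 *
 * i.e. BADDR[12:0] must be zero. The entry level table has 2^(40 - 30) =
 * 1024 entries = 8KB (the two concatenated first level tables mentioned in
 * the comment), and log2(8KB) = 13, so the 8KB alignment requirement and
 * the "x = log2(Size_of_Entry_Level_Table)" view agree.
 */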
Hi Suzuki, On 9/17/18 12:41 PM, Suzuki K Poulose wrote: > Switch to dynamic stage2 page table layout based on the given > VM. So far we had a common stage2 table layout determined at > compile time. Make decision based on the VM instance depending > on the IPA limit for the VM. Adds helpers to compute the stage2 > parameters based on the guest's IPA and uses them to make the decisions. > > The IPA limit is still fixed to 40bits and the build time check > to ensure the stage2 doesn't exceed the host kernels page table > levels is retained. Also make sure that the host has pud/pmd helpers > are used only when they are available at host. needs some rewording. > > Cc: Christoffer Dall <cdall@kernel.org> > Cc: Marc Zyngier <marc.zyngier@arm.com> > Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> > --- > arch/arm64/include/asm/stage2_pgtable.h | 84 +++++++++++++++---------- > 1 file changed, 52 insertions(+), 32 deletions(-) > > diff --git a/arch/arm64/include/asm/stage2_pgtable.h b/arch/arm64/include/asm/stage2_pgtable.h > index 384f9e982cba..e5acda8e2e31 100644 > --- a/arch/arm64/include/asm/stage2_pgtable.h > +++ b/arch/arm64/include/asm/stage2_pgtable.h > @@ -22,6 +22,13 @@ > #include <linux/hugetlb.h> > #include <asm/pgtable.h> > > +/* > + * PGDIR_SHIFT determines the size a top-level page table entry can map strictly speaking it is the index of the start level, right? size of the top-level page table entry is 1 << PGDIR_SHIFT > + * and depends on the number of levels in the page table. Compute the > + * PGDIR_SHIFT for a given number of levels. > + */ > +#define pt_levels_pgdir_shift(n) ARM64_HW_PGTABLE_LEVEL_SHIFT(4 - (n)) why not using lvls instead of n as below? Besides Reviewed-by: Eric Auger <eric.auger@redhat.com> Thanks Eric > + > /* > * The hardware supports concatenation of up to 16 tables at stage2 entry level > * and we use the feature whenever possible. > @@ -30,11 +37,13 @@ > * On arm64, the smallest PAGE_SIZE supported is 4k, which means > * (PAGE_SHIFT - 3) > 4 holds for all page sizes. > * This implies, the total number of page table levels at stage2 expected > - * by the hardware is actually the number of levels required for (KVM_PHYS_SHIFT - 4) > + * by the hardware is actually the number of levels required for (IPA_SHIFT - 4) > * in normal translations(e.g, stage1), since we cannot have another level in > - * the range (KVM_PHYS_SHIFT, KVM_PHYS_SHIFT - 4). > + * the range (IPA_SHIFT, IPA_SHIFT - 4). > */ > -#define STAGE2_PGTABLE_LEVELS ARM64_HW_PGTABLE_LEVELS(KVM_PHYS_SHIFT - 4) > +#define stage2_pgtable_levels(ipa) ARM64_HW_PGTABLE_LEVELS((ipa) - 4) > +#define STAGE2_PGTABLE_LEVELS stage2_pgtable_levels(KVM_PHYS_SHIFT) > +#define kvm_stage2_levels(kvm) stage2_pgtable_levels(kvm_phys_shift(kvm)) > > /* > * With all the supported VA_BITs and 40bit guest IPA, the following condition > @@ -54,33 +63,42 @@ > #error "Unsupported combination of guest IPA and host VA_BITS." 
> #endif > > -/* S2_PGDIR_SHIFT is the size mapped by top-level stage2 entry */ > -#define S2_PGDIR_SHIFT ARM64_HW_PGTABLE_LEVEL_SHIFT(4 - STAGE2_PGTABLE_LEVELS) > -#define S2_PGDIR_SIZE (1UL << S2_PGDIR_SHIFT) > -#define S2_PGDIR_MASK (~(S2_PGDIR_SIZE - 1)) > + > +/* stage2_pgdir_shift() is the size mapped by top-level stage2 entry for the VM */ > +#define stage2_pgdir_shift(kvm) pt_levels_pgdir_shift(kvm_stage2_levels(kvm)) > +#define stage2_pgdir_size(kvm) (1ULL << stage2_pgdir_shift(kvm)) > +#define stage2_pgdir_mask(kvm) ~(stage2_pgdir_size(kvm) - 1) > > /* > * The number of PTRS across all concatenated stage2 tables given by the > * number of bits resolved at the initial level. > */ > -#define PTRS_PER_S2_PGD (1 << (KVM_PHYS_SHIFT - S2_PGDIR_SHIFT)) > +#define __s2_pgd_ptrs(ipa, lvls) (1 << ((ipa) - pt_levels_pgdir_shift((lvls)))) > +#define __s2_pgd_size(ipa, lvls) (__s2_pgd_ptrs((ipa), (lvls)) * sizeof(pgd_t)) > + > +#define stage2_pgd_ptrs(kvm) __s2_pgd_ptrs(kvm_phys_shift(kvm), kvm_stage2_levels(kvm)) > +#define stage2_pgd_size(kvm) __s2_pgd_size(kvm_phys_shift(kvm), kvm_stage2_levels(kvm)) > > /* > * kvm_mmmu_cache_min_pages() is the number of pages required to install > * a stage-2 translation. We pre-allocate the entry level page table at > * the VM creation. > */ > -#define kvm_mmu_cache_min_pages(kvm) (STAGE2_PGTABLE_LEVELS - 1) > +#define kvm_mmu_cache_min_pages(kvm) (kvm_stage2_levels(kvm) - 1) > > /* Stage2 PUD definitions when the level is present */ > -#define STAGE2_PGTABLE_HAS_PUD (STAGE2_PGTABLE_LEVELS > 3) > +static inline bool kvm_stage2_has_pud(struct kvm *kvm) > +{ > + return (CONFIG_PGTABLE_LEVELS > 3) && (kvm_stage2_levels(kvm) > 3); > +} > + > #define S2_PUD_SHIFT ARM64_HW_PGTABLE_LEVEL_SHIFT(1) > #define S2_PUD_SIZE (1UL << S2_PUD_SHIFT) > #define S2_PUD_MASK (~(S2_PUD_SIZE - 1)) > > static inline bool stage2_pgd_none(struct kvm *kvm, pgd_t pgd) > { > - if (STAGE2_PGTABLE_HAS_PUD) > + if (kvm_stage2_has_pud(kvm)) > return pgd_none(pgd); > else > return 0; > @@ -88,13 +106,13 @@ static inline bool stage2_pgd_none(struct kvm *kvm, pgd_t pgd) > > static inline void stage2_pgd_clear(struct kvm *kvm, pgd_t *pgdp) > { > - if (STAGE2_PGTABLE_HAS_PUD) > + if (kvm_stage2_has_pud(kvm)) > pgd_clear(pgdp); > } > > static inline bool stage2_pgd_present(struct kvm *kvm, pgd_t pgd) > { > - if (STAGE2_PGTABLE_HAS_PUD) > + if (kvm_stage2_has_pud(kvm)) > return pgd_present(pgd); > else > return 1; > @@ -102,14 +120,14 @@ static inline bool stage2_pgd_present(struct kvm *kvm, pgd_t pgd) > > static inline void stage2_pgd_populate(struct kvm *kvm, pgd_t *pgd, pud_t *pud) > { > - if (STAGE2_PGTABLE_HAS_PUD) > + if (kvm_stage2_has_pud(kvm)) > pgd_populate(NULL, pgd, pud); > } > > static inline pud_t *stage2_pud_offset(struct kvm *kvm, > pgd_t *pgd, unsigned long address) > { > - if (STAGE2_PGTABLE_HAS_PUD) > + if (kvm_stage2_has_pud(kvm)) > return pud_offset(pgd, address); > else > return (pud_t *)pgd; > @@ -117,13 +135,13 @@ static inline pud_t *stage2_pud_offset(struct kvm *kvm, > > static inline void stage2_pud_free(struct kvm *kvm, pud_t *pud) > { > - if (STAGE2_PGTABLE_HAS_PUD) > + if (kvm_stage2_has_pud(kvm)) > pud_free(NULL, pud); > } > > static inline bool stage2_pud_table_empty(struct kvm *kvm, pud_t *pudp) > { > - if (STAGE2_PGTABLE_HAS_PUD) > + if (kvm_stage2_has_pud(kvm)) > return kvm_page_empty(pudp); > else > return false; > @@ -132,7 +150,7 @@ static inline bool stage2_pud_table_empty(struct kvm *kvm, pud_t *pudp) > static inline phys_addr_t > 
stage2_pud_addr_end(struct kvm *kvm, phys_addr_t addr, phys_addr_t end) > { > - if (STAGE2_PGTABLE_HAS_PUD) { > + if (kvm_stage2_has_pud(kvm)) { > phys_addr_t boundary = (addr + S2_PUD_SIZE) & S2_PUD_MASK; > > return (boundary - 1 < end - 1) ? boundary : end; > @@ -142,14 +160,18 @@ stage2_pud_addr_end(struct kvm *kvm, phys_addr_t addr, phys_addr_t end) > } > > /* Stage2 PMD definitions when the level is present */ > -#define STAGE2_PGTABLE_HAS_PMD (STAGE2_PGTABLE_LEVELS > 2) > +static inline bool kvm_stage2_has_pmd(struct kvm *kvm) > +{ > + return (CONFIG_PGTABLE_LEVELS > 2) && (kvm_stage2_levels(kvm) > 2); > +} > + > #define S2_PMD_SHIFT ARM64_HW_PGTABLE_LEVEL_SHIFT(2) > #define S2_PMD_SIZE (1UL << S2_PMD_SHIFT) > #define S2_PMD_MASK (~(S2_PMD_SIZE - 1)) > > static inline bool stage2_pud_none(struct kvm *kvm, pud_t pud) > { > - if (STAGE2_PGTABLE_HAS_PMD) > + if (kvm_stage2_has_pmd(kvm)) > return pud_none(pud); > else > return 0; > @@ -157,13 +179,13 @@ static inline bool stage2_pud_none(struct kvm *kvm, pud_t pud) > > static inline void stage2_pud_clear(struct kvm *kvm, pud_t *pud) > { > - if (STAGE2_PGTABLE_HAS_PMD) > + if (kvm_stage2_has_pmd(kvm)) > pud_clear(pud); > } > > static inline bool stage2_pud_present(struct kvm *kvm, pud_t pud) > { > - if (STAGE2_PGTABLE_HAS_PMD) > + if (kvm_stage2_has_pmd(kvm)) > return pud_present(pud); > else > return 1; > @@ -171,14 +193,14 @@ static inline bool stage2_pud_present(struct kvm *kvm, pud_t pud) > > static inline void stage2_pud_populate(struct kvm *kvm, pud_t *pud, pmd_t *pmd) > { > - if (STAGE2_PGTABLE_HAS_PMD) > + if (kvm_stage2_has_pmd(kvm)) > pud_populate(NULL, pud, pmd); > } > > static inline pmd_t *stage2_pmd_offset(struct kvm *kvm, > pud_t *pud, unsigned long address) > { > - if (STAGE2_PGTABLE_HAS_PMD) > + if (kvm_stage2_has_pmd(kvm)) > return pmd_offset(pud, address); > else > return (pmd_t *)pud; > @@ -186,13 +208,13 @@ static inline pmd_t *stage2_pmd_offset(struct kvm *kvm, > > static inline void stage2_pmd_free(struct kvm *kvm, pmd_t *pmd) > { > - if (STAGE2_PGTABLE_HAS_PMD) > + if (kvm_stage2_has_pmd(kvm)) > pmd_free(NULL, pmd); > } > > static inline bool stage2_pud_huge(struct kvm *kvm, pud_t pud) > { > - if (STAGE2_PGTABLE_HAS_PMD) > + if (kvm_stage2_has_pmd(kvm)) > return pud_huge(pud); > else > return 0; > @@ -200,7 +222,7 @@ static inline bool stage2_pud_huge(struct kvm *kvm, pud_t pud) > > static inline bool stage2_pmd_table_empty(struct kvm *kvm, pmd_t *pmdp) > { > - if (STAGE2_PGTABLE_HAS_PMD) > + if (kvm_stage2_has_pmd(kvm)) > return kvm_page_empty(pmdp); > else > return 0; > @@ -209,7 +231,7 @@ static inline bool stage2_pmd_table_empty(struct kvm *kvm, pmd_t *pmdp) > static inline phys_addr_t > stage2_pmd_addr_end(struct kvm *kvm, phys_addr_t addr, phys_addr_t end) > { > - if (STAGE2_PGTABLE_HAS_PMD) { > + if (kvm_stage2_has_pmd(kvm)) { > phys_addr_t boundary = (addr + S2_PMD_SIZE) & S2_PMD_MASK; > > return (boundary - 1 < end - 1) ? 
boundary : end; > @@ -223,17 +245,15 @@ static inline bool stage2_pte_table_empty(struct kvm *kvm, pte_t *ptep) > return kvm_page_empty(ptep); > } > > -#define stage2_pgd_size(kvm) (PTRS_PER_S2_PGD * sizeof(pgd_t)) > - > static inline unsigned long stage2_pgd_index(struct kvm *kvm, phys_addr_t addr) > { > - return (((addr) >> S2_PGDIR_SHIFT) & (PTRS_PER_S2_PGD - 1)); > + return (((addr) >> stage2_pgdir_shift(kvm)) & (stage2_pgd_ptrs(kvm) - 1)); > } > > static inline phys_addr_t > stage2_pgd_addr_end(struct kvm *kvm, phys_addr_t addr, phys_addr_t end) > { > - phys_addr_t boundary = (addr + S2_PGDIR_SIZE) & S2_PGDIR_MASK; > + phys_addr_t boundary = (addr + stage2_pgdir_size(kvm)) & stage2_pgdir_mask(kvm); > > return (boundary - 1 < end - 1) ? boundary : end; > } >
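[Editorial note] A worked example (assuming 4K pages and the still-fixed 40-bit IPA) of how the new per-VM helpers resolve, and that they reproduce the old compile-time values:

/*
 * Editorial worked example, assuming PAGE_SHIFT == 12 and
 * kvm_phys_shift(kvm) == 40:
 *
 *   kvm_stage2_levels(kvm)       = ARM64_HW_PGTABLE_LEVELS(40 - 4) = 3
 *   stage2_pgdir_shift(kvm)      = ARM64_HW_PGTABLE_LEVEL_SHIFT(4 - 3) = 30
 *   stage2_pgd_ptrs(kvm)         = 1 << (40 - 30) = 1024 entries
 *   stage2_pgd_size(kvm)         = 1024 * sizeof(pgd_t) = 8KB
 *   kvm_mmu_cache_min_pages(kvm) = 3 - 1 = 2
 *
 * matching the previous STAGE2_PGTABLE_LEVELS / S2_PGDIR_* constants for
 * the same configuration.
 */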
Hi Suzuki, On 9/17/18 12:41 PM, Suzuki K Poulose wrote: > Right now the stage2 page table for a VM is hard coded, assuming > an IPA of 40bits. As we are about to add support for per VM IPA, > prepare the stage2 page table helpers to accept the kvm instance > to make the right decision for the VM. No functional changes. > Adds stage2_pgd_size(kvm) to replace S2_PGD_SIZE. Also, moves > some of the definitions in arm32 to align with the arm64. > Also drop the _AC() specifier constants wherever possible. > > Cc: Christoffer Dall <cdall@kernel.org> > Acked-by: Marc Zyngier <marc.zyngier@arm.com> > Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Thanks Eric > --- > Changes since V3: > - Improve the comment about kvm_mmu_cache_min_pages() > - Drop _AC() in arm64 definitions > - Move kvm_mmu_cache_min_pages() in arm to stage2_pgtable.h in > line with arm64. > --- > arch/arm/include/asm/kvm_arm.h | 3 +- > arch/arm/include/asm/kvm_mmu.h | 13 +- > arch/arm/include/asm/stage2_pgtable.h | 50 +++++--- > arch/arm64/include/asm/kvm_mmu.h | 7 +- > arch/arm64/include/asm/stage2_pgtable-nopmd.h | 18 +-- > arch/arm64/include/asm/stage2_pgtable-nopud.h | 16 +-- > arch/arm64/include/asm/stage2_pgtable.h | 58 +++++---- > virt/kvm/arm/arm.c | 2 +- > virt/kvm/arm/mmu.c | 119 +++++++++--------- > virt/kvm/arm/vgic/vgic-kvm-device.c | 2 +- > 10 files changed, 156 insertions(+), 132 deletions(-) > > diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h > index 3ab8b3781bfe..c3f1f9b304b7 100644 > --- a/arch/arm/include/asm/kvm_arm.h > +++ b/arch/arm/include/asm/kvm_arm.h > @@ -133,8 +133,7 @@ > * space. > */ > #define KVM_PHYS_SHIFT (40) > -#define KVM_PHYS_SIZE (_AC(1, ULL) << KVM_PHYS_SHIFT) > -#define KVM_PHYS_MASK (KVM_PHYS_SIZE - _AC(1, ULL)) > + > #define PTRS_PER_S2_PGD (_AC(1, ULL) << (KVM_PHYS_SHIFT - 30)) > > /* Virtualization Translation Control Register (VTCR) bits */ > diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h > index 265ea9cf7df7..12ae5fbbcf01 100644 > --- a/arch/arm/include/asm/kvm_mmu.h > +++ b/arch/arm/include/asm/kvm_mmu.h > @@ -35,16 +35,12 @@ > addr; \ > }) > > -/* > - * KVM_MMU_CACHE_MIN_PAGES is the number of stage2 page table translation levels. 
> - */ > -#define KVM_MMU_CACHE_MIN_PAGES 2 > - > #ifndef __ASSEMBLY__ > > #include <linux/highmem.h> > #include <asm/cacheflush.h> > #include <asm/cputype.h> > +#include <asm/kvm_arm.h> > #include <asm/kvm_hyp.h> > #include <asm/pgalloc.h> > #include <asm/stage2_pgtable.h> > @@ -52,6 +48,13 @@ > /* Ensure compatibility with arm64 */ > #define VA_BITS 32 > > +#define kvm_phys_shift(kvm) KVM_PHYS_SHIFT > +#define kvm_phys_size(kvm) (1ULL << kvm_phys_shift(kvm)) > +#define kvm_phys_mask(kvm) (kvm_phys_size(kvm) - 1ULL) > +#define kvm_vttbr_baddr_mask(kvm) VTTBR_BADDR_MASK > + > +#define stage2_pgd_size(kvm) (PTRS_PER_S2_PGD * sizeof(pgd_t)) > + > int create_hyp_mappings(void *from, void *to, pgprot_t prot); > int create_hyp_io_mappings(phys_addr_t phys_addr, size_t size, > void __iomem **kaddr, > diff --git a/arch/arm/include/asm/stage2_pgtable.h b/arch/arm/include/asm/stage2_pgtable.h > index 460d616bb2d6..f6a7ea805232 100644 > --- a/arch/arm/include/asm/stage2_pgtable.h > +++ b/arch/arm/include/asm/stage2_pgtable.h > @@ -19,43 +19,53 @@ > #ifndef __ARM_S2_PGTABLE_H_ > #define __ARM_S2_PGTABLE_H_ > > -#define stage2_pgd_none(pgd) pgd_none(pgd) > -#define stage2_pgd_clear(pgd) pgd_clear(pgd) > -#define stage2_pgd_present(pgd) pgd_present(pgd) > -#define stage2_pgd_populate(pgd, pud) pgd_populate(NULL, pgd, pud) > -#define stage2_pud_offset(pgd, address) pud_offset(pgd, address) > -#define stage2_pud_free(pud) pud_free(NULL, pud) > +/* > + * kvm_mmu_cache_min_pages() is the number of pages required > + * to install a stage-2 translation. We pre-allocate the entry > + * level table at VM creation. Since we have a 3 level page-table, > + * we need only two pages to add a new mapping. > + */ > +#define kvm_mmu_cache_min_pages(kvm) 2 > > -#define stage2_pud_none(pud) pud_none(pud) > -#define stage2_pud_clear(pud) pud_clear(pud) > -#define stage2_pud_present(pud) pud_present(pud) > -#define stage2_pud_populate(pud, pmd) pud_populate(NULL, pud, pmd) > -#define stage2_pmd_offset(pud, address) pmd_offset(pud, address) > -#define stage2_pmd_free(pmd) pmd_free(NULL, pmd) > +#define stage2_pgd_none(kvm, pgd) pgd_none(pgd) > +#define stage2_pgd_clear(kvm, pgd) pgd_clear(pgd) > +#define stage2_pgd_present(kvm, pgd) pgd_present(pgd) > +#define stage2_pgd_populate(kvm, pgd, pud) pgd_populate(NULL, pgd, pud) > +#define stage2_pud_offset(kvm, pgd, address) pud_offset(pgd, address) > +#define stage2_pud_free(kvm, pud) pud_free(NULL, pud) > > -#define stage2_pud_huge(pud) pud_huge(pud) > +#define stage2_pud_none(kvm, pud) pud_none(pud) > +#define stage2_pud_clear(kvm, pud) pud_clear(pud) > +#define stage2_pud_present(kvm, pud) pud_present(pud) > +#define stage2_pud_populate(kvm, pud, pmd) pud_populate(NULL, pud, pmd) > +#define stage2_pmd_offset(kvm, pud, address) pmd_offset(pud, address) > +#define stage2_pmd_free(kvm, pmd) pmd_free(NULL, pmd) > + > +#define stage2_pud_huge(kvm, pud) pud_huge(pud) > > /* Open coded p*d_addr_end that can deal with 64bit addresses */ > -static inline phys_addr_t stage2_pgd_addr_end(phys_addr_t addr, phys_addr_t end) > +static inline phys_addr_t > +stage2_pgd_addr_end(struct kvm *kvm, phys_addr_t addr, phys_addr_t end) > { > phys_addr_t boundary = (addr + PGDIR_SIZE) & PGDIR_MASK; > > return (boundary - 1 < end - 1) ? 
boundary : end; > } > > -#define stage2_pud_addr_end(addr, end) (end) > +#define stage2_pud_addr_end(kvm, addr, end) (end) > > -static inline phys_addr_t stage2_pmd_addr_end(phys_addr_t addr, phys_addr_t end) > +static inline phys_addr_t > +stage2_pmd_addr_end(struct kvm *kvm, phys_addr_t addr, phys_addr_t end) > { > phys_addr_t boundary = (addr + PMD_SIZE) & PMD_MASK; > > return (boundary - 1 < end - 1) ? boundary : end; > } > > -#define stage2_pgd_index(addr) pgd_index(addr) > +#define stage2_pgd_index(kvm, addr) pgd_index(addr) > > -#define stage2_pte_table_empty(ptep) kvm_page_empty(ptep) > -#define stage2_pmd_table_empty(pmdp) kvm_page_empty(pmdp) > -#define stage2_pud_table_empty(pudp) false > +#define stage2_pte_table_empty(kvm, ptep) kvm_page_empty(ptep) > +#define stage2_pmd_table_empty(kvm, pmdp) kvm_page_empty(pmdp) > +#define stage2_pud_table_empty(kvm, pudp) false > > #endif /* __ARM_S2_PGTABLE_H_ */ > diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h > index d6fff7de5539..3a032066e52c 100644 > --- a/arch/arm64/include/asm/kvm_mmu.h > +++ b/arch/arm64/include/asm/kvm_mmu.h > @@ -141,8 +141,11 @@ static inline unsigned long __kern_hyp_va(unsigned long v) > * We currently only support a 40bit IPA. > */ > #define KVM_PHYS_SHIFT (40) > -#define KVM_PHYS_SIZE (1UL << KVM_PHYS_SHIFT) > -#define KVM_PHYS_MASK (KVM_PHYS_SIZE - 1UL) > + > +#define kvm_phys_shift(kvm) KVM_PHYS_SHIFT > +#define kvm_phys_size(kvm) (_AC(1, ULL) << kvm_phys_shift(kvm)) > +#define kvm_phys_mask(kvm) (kvm_phys_size(kvm) - _AC(1, ULL)) > +#define kvm_vttbr_baddr_mask(kvm) VTTBR_BADDR_MASK > > #include <asm/stage2_pgtable.h> > > diff --git a/arch/arm64/include/asm/stage2_pgtable-nopmd.h b/arch/arm64/include/asm/stage2_pgtable-nopmd.h > index 2656a0fd05a6..0280dedbf75f 100644 > --- a/arch/arm64/include/asm/stage2_pgtable-nopmd.h > +++ b/arch/arm64/include/asm/stage2_pgtable-nopmd.h > @@ -26,17 +26,17 @@ > #define S2_PMD_SIZE (1UL << S2_PMD_SHIFT) > #define S2_PMD_MASK (~(S2_PMD_SIZE-1)) > > -#define stage2_pud_none(pud) (0) > -#define stage2_pud_present(pud) (1) > -#define stage2_pud_clear(pud) do { } while (0) > -#define stage2_pud_populate(pud, pmd) do { } while (0) > -#define stage2_pmd_offset(pud, address) ((pmd_t *)(pud)) > +#define stage2_pud_none(kvm, pud) (0) > +#define stage2_pud_present(kvm, pud) (1) > +#define stage2_pud_clear(kvm, pud) do { } while (0) > +#define stage2_pud_populate(kvm, pud, pmd) do { } while (0) > +#define stage2_pmd_offset(kvm, pud, address) ((pmd_t *)(pud)) > > -#define stage2_pmd_free(pmd) do { } while (0) > +#define stage2_pmd_free(kvm, pmd) do { } while (0) > > -#define stage2_pmd_addr_end(addr, end) (end) > +#define stage2_pmd_addr_end(kvm, addr, end) (end) > > -#define stage2_pud_huge(pud) (0) > -#define stage2_pmd_table_empty(pmdp) (0) > +#define stage2_pud_huge(kvm, pud) (0) > +#define stage2_pmd_table_empty(kvm, pmdp) (0) > > #endif > diff --git a/arch/arm64/include/asm/stage2_pgtable-nopud.h b/arch/arm64/include/asm/stage2_pgtable-nopud.h > index 5ee87b54ebf3..cd6304e203be 100644 > --- a/arch/arm64/include/asm/stage2_pgtable-nopud.h > +++ b/arch/arm64/include/asm/stage2_pgtable-nopud.h > @@ -24,16 +24,16 @@ > #define S2_PUD_SIZE (_AC(1, UL) << S2_PUD_SHIFT) > #define S2_PUD_MASK (~(S2_PUD_SIZE-1)) > > -#define stage2_pgd_none(pgd) (0) > -#define stage2_pgd_present(pgd) (1) > -#define stage2_pgd_clear(pgd) do { } while (0) > -#define stage2_pgd_populate(pgd, pud) do { } while (0) > +#define stage2_pgd_none(kvm, pgd) (0) > +#define 
stage2_pgd_present(kvm, pgd) (1) > +#define stage2_pgd_clear(kvm, pgd) do { } while (0) > +#define stage2_pgd_populate(kvm, pgd, pud) do { } while (0) > > -#define stage2_pud_offset(pgd, address) ((pud_t *)(pgd)) > +#define stage2_pud_offset(kvm, pgd, address) ((pud_t *)(pgd)) > > -#define stage2_pud_free(x) do { } while (0) > +#define stage2_pud_free(kvm, x) do { } while (0) > > -#define stage2_pud_addr_end(addr, end) (end) > -#define stage2_pud_table_empty(pmdp) (0) > +#define stage2_pud_addr_end(kvm, addr, end) (end) > +#define stage2_pud_table_empty(kvm, pmdp) (0) > > #endif > diff --git a/arch/arm64/include/asm/stage2_pgtable.h b/arch/arm64/include/asm/stage2_pgtable.h > index 8b68099348e5..11891612be14 100644 > --- a/arch/arm64/include/asm/stage2_pgtable.h > +++ b/arch/arm64/include/asm/stage2_pgtable.h > @@ -55,7 +55,7 @@ > > /* S2_PGDIR_SHIFT is the size mapped by top-level stage2 entry */ > #define S2_PGDIR_SHIFT ARM64_HW_PGTABLE_LEVEL_SHIFT(4 - STAGE2_PGTABLE_LEVELS) > -#define S2_PGDIR_SIZE (_AC(1, UL) << S2_PGDIR_SHIFT) > +#define S2_PGDIR_SIZE (1UL << S2_PGDIR_SHIFT) > #define S2_PGDIR_MASK (~(S2_PGDIR_SIZE - 1)) > > /* > @@ -65,28 +65,30 @@ > #define PTRS_PER_S2_PGD (1 << (KVM_PHYS_SHIFT - S2_PGDIR_SHIFT)) > > /* > - * KVM_MMU_CACHE_MIN_PAGES is the number of stage2 page table translation > - * levels in addition to the PGD. > + * kvm_mmmu_cache_min_pages() is the number of pages required to install > + * a stage-2 translation. We pre-allocate the entry level page table at > + * the VM creation. > */ > -#define KVM_MMU_CACHE_MIN_PAGES (STAGE2_PGTABLE_LEVELS - 1) > +#define kvm_mmu_cache_min_pages(kvm) (STAGE2_PGTABLE_LEVELS - 1) > > > #if STAGE2_PGTABLE_LEVELS > 3 > > #define S2_PUD_SHIFT ARM64_HW_PGTABLE_LEVEL_SHIFT(1) > -#define S2_PUD_SIZE (_AC(1, UL) << S2_PUD_SHIFT) > +#define S2_PUD_SIZE (1UL << S2_PUD_SHIFT) > #define S2_PUD_MASK (~(S2_PUD_SIZE - 1)) > > -#define stage2_pgd_none(pgd) pgd_none(pgd) > -#define stage2_pgd_clear(pgd) pgd_clear(pgd) > -#define stage2_pgd_present(pgd) pgd_present(pgd) > -#define stage2_pgd_populate(pgd, pud) pgd_populate(NULL, pgd, pud) > -#define stage2_pud_offset(pgd, address) pud_offset(pgd, address) > -#define stage2_pud_free(pud) pud_free(NULL, pud) > +#define stage2_pgd_none(kvm, pgd) pgd_none(pgd) > +#define stage2_pgd_clear(kvm, pgd) pgd_clear(pgd) > +#define stage2_pgd_present(kvm, pgd) pgd_present(pgd) > +#define stage2_pgd_populate(kvm, pgd, pud) pgd_populate(NULL, pgd, pud) > +#define stage2_pud_offset(kvm, pgd, address) pud_offset(pgd, address) > +#define stage2_pud_free(kvm, pud) pud_free(NULL, pud) > > -#define stage2_pud_table_empty(pudp) kvm_page_empty(pudp) > +#define stage2_pud_table_empty(kvm, pudp) kvm_page_empty(pudp) > > -static inline phys_addr_t stage2_pud_addr_end(phys_addr_t addr, phys_addr_t end) > +static inline phys_addr_t > +stage2_pud_addr_end(struct kvm *kvm, phys_addr_t addr, phys_addr_t end) > { > phys_addr_t boundary = (addr + S2_PUD_SIZE) & S2_PUD_MASK; > > @@ -99,20 +101,21 @@ static inline phys_addr_t stage2_pud_addr_end(phys_addr_t addr, phys_addr_t end) > #if STAGE2_PGTABLE_LEVELS > 2 > > #define S2_PMD_SHIFT ARM64_HW_PGTABLE_LEVEL_SHIFT(2) > -#define S2_PMD_SIZE (_AC(1, UL) << S2_PMD_SHIFT) > +#define S2_PMD_SIZE (1UL << S2_PMD_SHIFT) > #define S2_PMD_MASK (~(S2_PMD_SIZE - 1)) > > -#define stage2_pud_none(pud) pud_none(pud) > -#define stage2_pud_clear(pud) pud_clear(pud) > -#define stage2_pud_present(pud) pud_present(pud) > -#define stage2_pud_populate(pud, pmd) pud_populate(NULL, pud, pmd) > 
-#define stage2_pmd_offset(pud, address) pmd_offset(pud, address) > -#define stage2_pmd_free(pmd) pmd_free(NULL, pmd) > +#define stage2_pud_none(kvm, pud) pud_none(pud) > +#define stage2_pud_clear(kvm, pud) pud_clear(pud) > +#define stage2_pud_present(kvm, pud) pud_present(pud) > +#define stage2_pud_populate(kvm, pud, pmd) pud_populate(NULL, pud, pmd) > +#define stage2_pmd_offset(kvm, pud, address) pmd_offset(pud, address) > +#define stage2_pmd_free(kvm, pmd) pmd_free(NULL, pmd) > > -#define stage2_pud_huge(pud) pud_huge(pud) > -#define stage2_pmd_table_empty(pmdp) kvm_page_empty(pmdp) > +#define stage2_pud_huge(kvm, pud) pud_huge(pud) > +#define stage2_pmd_table_empty(kvm, pmdp) kvm_page_empty(pmdp) > > -static inline phys_addr_t stage2_pmd_addr_end(phys_addr_t addr, phys_addr_t end) > +static inline phys_addr_t > +stage2_pmd_addr_end(struct kvm *kvm, phys_addr_t addr, phys_addr_t end) > { > phys_addr_t boundary = (addr + S2_PMD_SIZE) & S2_PMD_MASK; > > @@ -121,7 +124,7 @@ static inline phys_addr_t stage2_pmd_addr_end(phys_addr_t addr, phys_addr_t end) > > #endif /* STAGE2_PGTABLE_LEVELS > 2 */ > > -#define stage2_pte_table_empty(ptep) kvm_page_empty(ptep) > +#define stage2_pte_table_empty(kvm, ptep) kvm_page_empty(ptep) > > #if STAGE2_PGTABLE_LEVELS == 2 > #include <asm/stage2_pgtable-nopmd.h> > @@ -129,10 +132,13 @@ static inline phys_addr_t stage2_pmd_addr_end(phys_addr_t addr, phys_addr_t end) > #include <asm/stage2_pgtable-nopud.h> > #endif > > +#define stage2_pgd_size(kvm) (PTRS_PER_S2_PGD * sizeof(pgd_t)) > > -#define stage2_pgd_index(addr) (((addr) >> S2_PGDIR_SHIFT) & (PTRS_PER_S2_PGD - 1)) > +#define stage2_pgd_index(kvm, addr) \ > + (((addr) >> S2_PGDIR_SHIFT) & (PTRS_PER_S2_PGD - 1)) > > -static inline phys_addr_t stage2_pgd_addr_end(phys_addr_t addr, phys_addr_t end) > +static inline phys_addr_t > +stage2_pgd_addr_end(struct kvm *kvm, phys_addr_t addr, phys_addr_t end) > { > phys_addr_t boundary = (addr + S2_PGDIR_SIZE) & S2_PGDIR_MASK; > > diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c > index 327d0fd28380..43e716bc3f08 100644 > --- a/virt/kvm/arm/arm.c > +++ b/virt/kvm/arm/arm.c > @@ -545,7 +545,7 @@ static void update_vttbr(struct kvm *kvm) > > /* update vttbr to be used with the new vmid */ > pgd_phys = virt_to_phys(kvm->arch.pgd); > - BUG_ON(pgd_phys & ~VTTBR_BADDR_MASK); > + BUG_ON(pgd_phys & ~kvm_vttbr_baddr_mask(kvm)); > vmid = ((u64)(kvm->arch.vmid) << VTTBR_VMID_SHIFT) & VTTBR_VMID_MASK(kvm_vmid_bits); > kvm->arch.vttbr = kvm_phys_to_vttbr(pgd_phys) | vmid; > > diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c > index 4a285d760ce0..7e477b3cae5b 100644 > --- a/virt/kvm/arm/mmu.c > +++ b/virt/kvm/arm/mmu.c > @@ -45,7 +45,6 @@ static phys_addr_t hyp_idmap_vector; > > static unsigned long io_map_base; > > -#define S2_PGD_SIZE (PTRS_PER_S2_PGD * sizeof(pgd_t)) > #define hyp_pgd_order get_order(PTRS_PER_PGD * sizeof(pgd_t)) > > #define KVM_S2PTE_FLAG_IS_IOMAP (1UL << 0) > @@ -150,20 +149,20 @@ static void *mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc) > > static void clear_stage2_pgd_entry(struct kvm *kvm, pgd_t *pgd, phys_addr_t addr) > { > - pud_t *pud_table __maybe_unused = stage2_pud_offset(pgd, 0UL); > - stage2_pgd_clear(pgd); > + pud_t *pud_table __maybe_unused = stage2_pud_offset(kvm, pgd, 0UL); > + stage2_pgd_clear(kvm, pgd); > kvm_tlb_flush_vmid_ipa(kvm, addr); > - stage2_pud_free(pud_table); > + stage2_pud_free(kvm, pud_table); > put_page(virt_to_page(pgd)); > } > > static void clear_stage2_pud_entry(struct kvm *kvm, pud_t *pud, phys_addr_t 
addr) > { > - pmd_t *pmd_table __maybe_unused = stage2_pmd_offset(pud, 0); > - VM_BUG_ON(stage2_pud_huge(*pud)); > - stage2_pud_clear(pud); > + pmd_t *pmd_table __maybe_unused = stage2_pmd_offset(kvm, pud, 0); > + VM_BUG_ON(stage2_pud_huge(kvm, *pud)); > + stage2_pud_clear(kvm, pud); > kvm_tlb_flush_vmid_ipa(kvm, addr); > - stage2_pmd_free(pmd_table); > + stage2_pmd_free(kvm, pmd_table); > put_page(virt_to_page(pud)); > } > > @@ -252,7 +251,7 @@ static void unmap_stage2_ptes(struct kvm *kvm, pmd_t *pmd, > } > } while (pte++, addr += PAGE_SIZE, addr != end); > > - if (stage2_pte_table_empty(start_pte)) > + if (stage2_pte_table_empty(kvm, start_pte)) > clear_stage2_pmd_entry(kvm, pmd, start_addr); > } > > @@ -262,9 +261,9 @@ static void unmap_stage2_pmds(struct kvm *kvm, pud_t *pud, > phys_addr_t next, start_addr = addr; > pmd_t *pmd, *start_pmd; > > - start_pmd = pmd = stage2_pmd_offset(pud, addr); > + start_pmd = pmd = stage2_pmd_offset(kvm, pud, addr); > do { > - next = stage2_pmd_addr_end(addr, end); > + next = stage2_pmd_addr_end(kvm, addr, end); > if (!pmd_none(*pmd)) { > if (pmd_thp_or_huge(*pmd)) { > pmd_t old_pmd = *pmd; > @@ -281,7 +280,7 @@ static void unmap_stage2_pmds(struct kvm *kvm, pud_t *pud, > } > } while (pmd++, addr = next, addr != end); > > - if (stage2_pmd_table_empty(start_pmd)) > + if (stage2_pmd_table_empty(kvm, start_pmd)) > clear_stage2_pud_entry(kvm, pud, start_addr); > } > > @@ -291,14 +290,14 @@ static void unmap_stage2_puds(struct kvm *kvm, pgd_t *pgd, > phys_addr_t next, start_addr = addr; > pud_t *pud, *start_pud; > > - start_pud = pud = stage2_pud_offset(pgd, addr); > + start_pud = pud = stage2_pud_offset(kvm, pgd, addr); > do { > - next = stage2_pud_addr_end(addr, end); > - if (!stage2_pud_none(*pud)) { > - if (stage2_pud_huge(*pud)) { > + next = stage2_pud_addr_end(kvm, addr, end); > + if (!stage2_pud_none(kvm, *pud)) { > + if (stage2_pud_huge(kvm, *pud)) { > pud_t old_pud = *pud; > > - stage2_pud_clear(pud); > + stage2_pud_clear(kvm, pud); > kvm_tlb_flush_vmid_ipa(kvm, addr); > kvm_flush_dcache_pud(old_pud); > put_page(virt_to_page(pud)); > @@ -308,7 +307,7 @@ static void unmap_stage2_puds(struct kvm *kvm, pgd_t *pgd, > } > } while (pud++, addr = next, addr != end); > > - if (stage2_pud_table_empty(start_pud)) > + if (stage2_pud_table_empty(kvm, start_pud)) > clear_stage2_pgd_entry(kvm, pgd, start_addr); > } > > @@ -332,7 +331,7 @@ static void unmap_stage2_range(struct kvm *kvm, phys_addr_t start, u64 size) > assert_spin_locked(&kvm->mmu_lock); > WARN_ON(size & ~PAGE_MASK); > > - pgd = kvm->arch.pgd + stage2_pgd_index(addr); > + pgd = kvm->arch.pgd + stage2_pgd_index(kvm, addr); > do { > /* > * Make sure the page table is still active, as another thread > @@ -341,8 +340,8 @@ static void unmap_stage2_range(struct kvm *kvm, phys_addr_t start, u64 size) > */ > if (!READ_ONCE(kvm->arch.pgd)) > break; > - next = stage2_pgd_addr_end(addr, end); > - if (!stage2_pgd_none(*pgd)) > + next = stage2_pgd_addr_end(kvm, addr, end); > + if (!stage2_pgd_none(kvm, *pgd)) > unmap_stage2_puds(kvm, pgd, addr, next); > /* > * If the range is too large, release the kvm->mmu_lock > @@ -371,9 +370,9 @@ static void stage2_flush_pmds(struct kvm *kvm, pud_t *pud, > pmd_t *pmd; > phys_addr_t next; > > - pmd = stage2_pmd_offset(pud, addr); > + pmd = stage2_pmd_offset(kvm, pud, addr); > do { > - next = stage2_pmd_addr_end(addr, end); > + next = stage2_pmd_addr_end(kvm, addr, end); > if (!pmd_none(*pmd)) { > if (pmd_thp_or_huge(*pmd)) > kvm_flush_dcache_pmd(*pmd); > @@ -389,11 +388,11 
@@ static void stage2_flush_puds(struct kvm *kvm, pgd_t *pgd, > pud_t *pud; > phys_addr_t next; > > - pud = stage2_pud_offset(pgd, addr); > + pud = stage2_pud_offset(kvm, pgd, addr); > do { > - next = stage2_pud_addr_end(addr, end); > - if (!stage2_pud_none(*pud)) { > - if (stage2_pud_huge(*pud)) > + next = stage2_pud_addr_end(kvm, addr, end); > + if (!stage2_pud_none(kvm, *pud)) { > + if (stage2_pud_huge(kvm, *pud)) > kvm_flush_dcache_pud(*pud); > else > stage2_flush_pmds(kvm, pud, addr, next); > @@ -409,10 +408,10 @@ static void stage2_flush_memslot(struct kvm *kvm, > phys_addr_t next; > pgd_t *pgd; > > - pgd = kvm->arch.pgd + stage2_pgd_index(addr); > + pgd = kvm->arch.pgd + stage2_pgd_index(kvm, addr); > do { > - next = stage2_pgd_addr_end(addr, end); > - if (!stage2_pgd_none(*pgd)) > + next = stage2_pgd_addr_end(kvm, addr, end); > + if (!stage2_pgd_none(kvm, *pgd)) > stage2_flush_puds(kvm, pgd, addr, next); > } while (pgd++, addr = next, addr != end); > } > @@ -898,7 +897,7 @@ int kvm_alloc_stage2_pgd(struct kvm *kvm) > } > > /* Allocate the HW PGD, making sure that each page gets its own refcount */ > - pgd = alloc_pages_exact(S2_PGD_SIZE, GFP_KERNEL | __GFP_ZERO); > + pgd = alloc_pages_exact(stage2_pgd_size(kvm), GFP_KERNEL | __GFP_ZERO); > if (!pgd) > return -ENOMEM; > > @@ -987,7 +986,7 @@ void kvm_free_stage2_pgd(struct kvm *kvm) > > spin_lock(&kvm->mmu_lock); > if (kvm->arch.pgd) { > - unmap_stage2_range(kvm, 0, KVM_PHYS_SIZE); > + unmap_stage2_range(kvm, 0, kvm_phys_size(kvm)); > pgd = READ_ONCE(kvm->arch.pgd); > kvm->arch.pgd = NULL; > } > @@ -995,7 +994,7 @@ void kvm_free_stage2_pgd(struct kvm *kvm) > > /* Free the HW pgd, one page at a time */ > if (pgd) > - free_pages_exact(pgd, S2_PGD_SIZE); > + free_pages_exact(pgd, stage2_pgd_size(kvm)); > } > > static pud_t *stage2_get_pud(struct kvm *kvm, struct kvm_mmu_memory_cache *cache, > @@ -1004,16 +1003,16 @@ static pud_t *stage2_get_pud(struct kvm *kvm, struct kvm_mmu_memory_cache *cache > pgd_t *pgd; > pud_t *pud; > > - pgd = kvm->arch.pgd + stage2_pgd_index(addr); > - if (stage2_pgd_none(*pgd)) { > + pgd = kvm->arch.pgd + stage2_pgd_index(kvm, addr); > + if (stage2_pgd_none(kvm, *pgd)) { > if (!cache) > return NULL; > pud = mmu_memory_cache_alloc(cache); > - stage2_pgd_populate(pgd, pud); > + stage2_pgd_populate(kvm, pgd, pud); > get_page(virt_to_page(pgd)); > } > > - return stage2_pud_offset(pgd, addr); > + return stage2_pud_offset(kvm, pgd, addr); > } > > static pmd_t *stage2_get_pmd(struct kvm *kvm, struct kvm_mmu_memory_cache *cache, > @@ -1026,15 +1025,15 @@ static pmd_t *stage2_get_pmd(struct kvm *kvm, struct kvm_mmu_memory_cache *cache > if (!pud) > return NULL; > > - if (stage2_pud_none(*pud)) { > + if (stage2_pud_none(kvm, *pud)) { > if (!cache) > return NULL; > pmd = mmu_memory_cache_alloc(cache); > - stage2_pud_populate(pud, pmd); > + stage2_pud_populate(kvm, pud, pmd); > get_page(virt_to_page(pud)); > } > > - return stage2_pmd_offset(pud, addr); > + return stage2_pmd_offset(kvm, pud, addr); > } > > static int stage2_set_pmd_huge(struct kvm *kvm, struct kvm_mmu_memory_cache > @@ -1208,8 +1207,9 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa, > if (writable) > pte = kvm_s2pte_mkwrite(pte); > > - ret = mmu_topup_memory_cache(&cache, KVM_MMU_CACHE_MIN_PAGES, > - KVM_NR_MEM_OBJS); > + ret = mmu_topup_memory_cache(&cache, > + kvm_mmu_cache_min_pages(kvm), > + KVM_NR_MEM_OBJS); > if (ret) > goto out; > spin_lock(&kvm->mmu_lock); > @@ -1297,19 +1297,21 @@ static void stage2_wp_ptes(pmd_t *pmd, 
phys_addr_t addr, phys_addr_t end) > > /** > * stage2_wp_pmds - write protect PUD range > + * kvm: kvm instance for the VM > * @pud: pointer to pud entry > * @addr: range start address > * @end: range end address > */ > -static void stage2_wp_pmds(pud_t *pud, phys_addr_t addr, phys_addr_t end) > +static void stage2_wp_pmds(struct kvm *kvm, pud_t *pud, > + phys_addr_t addr, phys_addr_t end) > { > pmd_t *pmd; > phys_addr_t next; > > - pmd = stage2_pmd_offset(pud, addr); > + pmd = stage2_pmd_offset(kvm, pud, addr); > > do { > - next = stage2_pmd_addr_end(addr, end); > + next = stage2_pmd_addr_end(kvm, addr, end); > if (!pmd_none(*pmd)) { > if (pmd_thp_or_huge(*pmd)) { > if (!kvm_s2pmd_readonly(pmd)) > @@ -1329,18 +1331,19 @@ static void stage2_wp_pmds(pud_t *pud, phys_addr_t addr, phys_addr_t end) > * > * Process PUD entries, for a huge PUD we cause a panic. > */ > -static void stage2_wp_puds(pgd_t *pgd, phys_addr_t addr, phys_addr_t end) > +static void stage2_wp_puds(struct kvm *kvm, pgd_t *pgd, > + phys_addr_t addr, phys_addr_t end) > { > pud_t *pud; > phys_addr_t next; > > - pud = stage2_pud_offset(pgd, addr); > + pud = stage2_pud_offset(kvm, pgd, addr); > do { > - next = stage2_pud_addr_end(addr, end); > - if (!stage2_pud_none(*pud)) { > + next = stage2_pud_addr_end(kvm, addr, end); > + if (!stage2_pud_none(kvm, *pud)) { > /* TODO:PUD not supported, revisit later if supported */ > - BUG_ON(stage2_pud_huge(*pud)); > - stage2_wp_pmds(pud, addr, next); > + BUG_ON(stage2_pud_huge(kvm, *pud)); > + stage2_wp_pmds(kvm, pud, addr, next); > } > } while (pud++, addr = next, addr != end); > } > @@ -1356,7 +1359,7 @@ static void stage2_wp_range(struct kvm *kvm, phys_addr_t addr, phys_addr_t end) > pgd_t *pgd; > phys_addr_t next; > > - pgd = kvm->arch.pgd + stage2_pgd_index(addr); > + pgd = kvm->arch.pgd + stage2_pgd_index(kvm, addr); > do { > /* > * Release kvm_mmu_lock periodically if the memory region is > @@ -1370,9 +1373,9 @@ static void stage2_wp_range(struct kvm *kvm, phys_addr_t addr, phys_addr_t end) > cond_resched_lock(&kvm->mmu_lock); > if (!READ_ONCE(kvm->arch.pgd)) > break; > - next = stage2_pgd_addr_end(addr, end); > - if (stage2_pgd_present(*pgd)) > - stage2_wp_puds(pgd, addr, next); > + next = stage2_pgd_addr_end(kvm, addr, end); > + if (stage2_pgd_present(kvm, *pgd)) > + stage2_wp_puds(kvm, pgd, addr, next); > } while (pgd++, addr = next, addr != end); > } > > @@ -1521,7 +1524,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, > up_read(¤t->mm->mmap_sem); > > /* We need minimum second+third level pages */ > - ret = mmu_topup_memory_cache(memcache, KVM_MMU_CACHE_MIN_PAGES, > + ret = mmu_topup_memory_cache(memcache, kvm_mmu_cache_min_pages(kvm), > KVM_NR_MEM_OBJS); > if (ret) > return ret; > @@ -1764,7 +1767,7 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run) > } > > /* Userspace should not be able to register out-of-bounds IPAs */ > - VM_BUG_ON(fault_ipa >= KVM_PHYS_SIZE); > + VM_BUG_ON(fault_ipa >= kvm_phys_size(vcpu->kvm)); > > if (fault_status == FSC_ACCESS) { > handle_access_fault(vcpu, fault_ipa); > @@ -2063,7 +2066,7 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm, > * space addressable by the KVM guest IPA space. 
> */ > if (memslot->base_gfn + memslot->npages >= > - (KVM_PHYS_SIZE >> PAGE_SHIFT)) > + (kvm_phys_size(kvm) >> PAGE_SHIFT)) > return -EFAULT; > > down_read(¤t->mm->mmap_sem); > diff --git a/virt/kvm/arm/vgic/vgic-kvm-device.c b/virt/kvm/arm/vgic/vgic-kvm-device.c > index 6ada2432e37c..114dce9f4bf5 100644 > --- a/virt/kvm/arm/vgic/vgic-kvm-device.c > +++ b/virt/kvm/arm/vgic/vgic-kvm-device.c > @@ -25,7 +25,7 @@ > int vgic_check_ioaddr(struct kvm *kvm, phys_addr_t *ioaddr, > phys_addr_t addr, phys_addr_t alignment) > { > - if (addr & ~KVM_PHYS_MASK) > + if (addr & ~kvm_phys_mask(kvm)) > return -E2BIG; > > if (!IS_ALIGNED(addr, alignment)) >
Hi Suzuki, On 9/17/18 12:41 PM, Suzuki K Poulose wrote: > VTCR_EL2 holds the following key stage2 translation table > parameters: > SL0 - Entry level in the page table lookup. > T0SZ - Denotes the size of the memory addressed by the table. > > We have been using fixed values for the SL0 depending on the > page size as we have a fixed IPA size. But since we are about > to make it dynamic, we need to calculate the SL0 at runtime > per VM. This patch adds a helper to compute the value of SL0 > for a VM based on the IPA size. > > Cc: Marc Zyngier <marc.zyngier@arm.com> > Cc: Christoffer Dall <cdall@kernel.org> > Cc: Eric Auger <eric.auger@redhat.com> > Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> > --- > Changes since v3: > - Update reference to latest ARM ARM. > - Update per-vm VTCR value of SL0. > - Add helpers to decode levels from SL0. > - Didn't pick up Reviewed-by tag from Eric, as there > are some new changes in this version > --- > arch/arm64/include/asm/kvm_arm.h | 51 +++++++++++++++++++++++++------- > arch/arm64/kvm/reset.c | 1 + > 2 files changed, 41 insertions(+), 11 deletions(-) > > diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h > index 3fb1d440be6e..5c1487dc5dca 100644 > --- a/arch/arm64/include/asm/kvm_arm.h > +++ b/arch/arm64/include/asm/kvm_arm.h > @@ -121,7 +121,6 @@ > #define VTCR_EL2_IRGN0_WBWA TCR_IRGN0_WBWA > #define VTCR_EL2_SL0_SHIFT 6 > #define VTCR_EL2_SL0_MASK (3 << VTCR_EL2_SL0_SHIFT) > -#define VTCR_EL2_SL0_LVL1 (1 << VTCR_EL2_SL0_SHIFT) > #define VTCR_EL2_T0SZ_MASK 0x3f > #define VTCR_EL2_VS_SHIFT 19 > #define VTCR_EL2_VS_8BIT (0 << VTCR_EL2_VS_SHIFT) > @@ -148,29 +147,59 @@ > /* > * Stage2 translation configuration: > * 64kB pages (TG0 = 1) > - * 2 level page tables (SL = 1) > */ > -#define VTCR_EL2_TGRAN_FLAGS (VTCR_EL2_TG0_64K | VTCR_EL2_SL0_LVL1) > -#define VTTBR_X_TGRAN_MAGIC 38 > +#define VTCR_EL2_TGRAN VTCR_EL2_TG0_64K > +#define VTCR_EL2_TGRAN_SL0_BASE 3UL the name if not obvious. I understand this is yet another magic number used in the formulae below: SL0(PAGE_SIZE, Entry_level) = SL0_BASE(PAGE_SIZE) - Entry_Level I first tried to map this onto some spec fields. May be worth a comment? Besides Reviewed-by: Eric Auger <eric.auger@redhat.com> Thanks Eric > + > #elif defined(CONFIG_ARM64_16K_PAGES) > /* > * Stage2 translation configuration: > * 16kB pages (TG0 = 2) > - * 2 level page tables (SL = 1) > */ > -#define VTCR_EL2_TGRAN_FLAGS (VTCR_EL2_TG0_16K | VTCR_EL2_SL0_LVL1) > -#define VTTBR_X_TGRAN_MAGIC 42 > +#define VTCR_EL2_TGRAN VTCR_EL2_TG0_16K > +#define VTCR_EL2_TGRAN_SL0_BASE 3UL > #else /* 4K */ > /* > * Stage2 translation configuration: > * 4kB pages (TG0 = 0) > - * 3 level page tables (SL = 1) > */ > -#define VTCR_EL2_TGRAN_FLAGS (VTCR_EL2_TG0_4K | VTCR_EL2_SL0_LVL1) > -#define VTTBR_X_TGRAN_MAGIC 37 > +#define VTCR_EL2_TGRAN VTCR_EL2_TG0_4K > +#define VTCR_EL2_TGRAN_SL0_BASE 2UL > #endif > > -#define VTCR_EL2_FLAGS (VTCR_EL2_COMMON_BITS | VTCR_EL2_TGRAN_FLAGS) > +#define VTCR_EL2_FLAGS (VTCR_EL2_COMMON_BITS | VTCR_EL2_TGRAN) > +/* > + * VTCR_EL2:SL0 indicates the entry level for Stage2 translation. > + * Interestingly, it depends on the page size. 
> + * See D.10.2.121, VTCR_EL2, in ARM DDI 0487C.a > + * > + * ----------------------------------------- > + * | Entry level | 4K | 16K/64K | > + * ------------------------------------------ > + * | Level: 0 | 2 | - | > + * ------------------------------------------ > + * | Level: 1 | 1 | 2 | > + * ------------------------------------------ > + * | Level: 2 | 0 | 1 | > + * ------------------------------------------ > + * | Level: 3 | - | 0 | > + * ------------------------------------------ > + * > + * That table roughly translates to : > + * > + * SL0(PAGE_SIZE, Entry_level) = SL0_BASE(PAGE_SIZE) - Entry_Level > + * > + * Where SL0_BASE(4K) = 2 and SL0_BASE(16K) = 3, SL0_BASE(64K) = 3, provided > + * we take care of ruling out the unsupported cases and > + * Entry_Level = 4 - Number_of_levels. > + * > + */ > +#define VTCR_EL2_LVLS_TO_SL0(levels) \ > + ((VTCR_EL2_TGRAN_SL0_BASE - (4 - (levels))) << VTCR_EL2_SL0_SHIFT) > +#define VTCR_EL2_SL0_TO_LVLS(sl0) \ > + ((sl0) + 4 - VTCR_EL2_TGRAN_SL0_BASE) > +#define VTCR_EL2_LVLS(vtcr) \ > + VTCR_EL2_SL0_TO_LVLS(((vtcr) & VTCR_EL2_SL0_MASK) >> VTCR_EL2_SL0_SHIFT) > /* > * ARM VMSAv8-64 defines an algorithm for finding the translation table > * descriptors in section D4.2.8 in ARM DDI 0487C.a. > diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c > index e0c49377b771..d9b7a00993b6 100644 > --- a/arch/arm64/kvm/reset.c > +++ b/arch/arm64/kvm/reset.c > @@ -167,6 +167,7 @@ int kvm_arm_config_vm(struct kvm *kvm, unsigned long type) > vtcr |= (kvm_get_vmid_bits() == 16) ? > VTCR_EL2_VS_16BIT : > VTCR_EL2_VS_8BIT; > + vtcr |= VTCR_EL2_LVLS_TO_SL0(kvm_stage2_levels(kvm)); > kvm->arch.vtcr = vtcr; > return 0; > } >
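The SL0 arithmetic above can be sanity-checked with a small standalone program (plain userspace C, not the kernel macros; the helper names are invented for the example). It simply restates SL0 = SL0_BASE(PAGE_SIZE) - (4 - levels) and the reverse mapping:

/*
 * Standalone sketch of the SL0 <-> levels conversion described above.
 * The SL0_BASE values (2 for 4K, 3 for 16K/64K) come from the table in
 * the patch; the functions themselves are illustrative only.
 */
#include <stdio.h>

static unsigned int sl0_base(unsigned int page_shift)
{
	/* 4K granule (page_shift == 12) has SL0_BASE = 2, 16K/64K have 3 */
	return page_shift == 12 ? 2 : 3;
}

static unsigned int lvls_to_sl0(unsigned int page_shift, unsigned int levels)
{
	/* Entry_Level = 4 - Number_of_levels; SL0 = SL0_BASE - Entry_Level */
	return sl0_base(page_shift) - (4 - levels);
}

static unsigned int sl0_to_lvls(unsigned int page_shift, unsigned int sl0)
{
	return sl0 + 4 - sl0_base(page_shift);
}

int main(void)
{
	/* 4K granule, 3-level stage2 table: entry level 1 -> SL0 = 1 */
	printf("4K,  3 levels -> SL0 = %u\n", lvls_to_sl0(12, 3));
	/* 64K granule, 2-level stage2 table: entry level 2 -> SL0 = 1 */
	printf("64K, 2 levels -> SL0 = %u\n", lvls_to_sl0(16, 2));
	/* Round trip back to the number of levels */
	printf("64K, SL0 = 1  -> %u levels\n", sl0_to_lvls(16, 1));
	return 0;
}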
On 20/09/18 15:07, Auger Eric wrote: > Hi Suzuki, > On 9/17/18 12:41 PM, Suzuki K Poulose wrote: >> On arm64 VTTBR_EL2:BADDR holds the base address for the stage2 >> translation table. The Arm ARM mandates that the bits BADDR[x-1:0] >> should be 0, where 'x' is defined for a given IPA Size and the >> number of levels for a translation granule size. It is defined >> using some magical constants. This patch is a reverse engineered >> implementation to calculate the 'x' at runtime for a given ipa and >> number of page table levels. See patch for more details. >> >> Cc: Marc Zyngier <marc.zyngier@arm.com> >> Cc: Christoffer Dall <cdall@kernel.org> >> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> > >> --- >> Changes since V3: >> - Update reference to latest ARM ARM and improve commentary >> --- >> arch/arm64/include/asm/kvm_arm.h | 63 +++++++++++++++++++++++++++++--- >> arch/arm64/include/asm/kvm_mmu.h | 25 ++++++++++++- >> 2 files changed, 81 insertions(+), 7 deletions(-) >> >> diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h >> index 14317b3a1820..3fb1d440be6e 100644 >> --- a/arch/arm64/include/asm/kvm_arm.h >> +++ b/arch/arm64/include/asm/kvm_arm.h >> @@ -123,7 +123,6 @@ >> #define VTCR_EL2_SL0_MASK (3 << VTCR_EL2_SL0_SHIFT) >> #define VTCR_EL2_SL0_LVL1 (1 << VTCR_EL2_SL0_SHIFT) >> #define VTCR_EL2_T0SZ_MASK 0x3f >> -#define VTCR_EL2_T0SZ_40B 24 >> #define VTCR_EL2_VS_SHIFT 19 >> #define VTCR_EL2_VS_8BIT (0 << VTCR_EL2_VS_SHIFT) >> #define VTCR_EL2_VS_16BIT (1 << VTCR_EL2_VS_SHIFT) >> @@ -140,11 +139,8 @@ >> * Note that when using 4K pages, we concatenate two first level page tables >> * together. With 16K pages, we concatenate 16 first level page tables. >> * >> - * The magic numbers used for VTTBR_X in this patch can be found in Tables >> - * D4-23 and D4-25 in ARM DDI 0487A.b. >> */ >> >> -#define VTCR_EL2_T0SZ_IPA VTCR_EL2_T0SZ_40B >> #define VTCR_EL2_COMMON_BITS (VTCR_EL2_SH0_INNER | VTCR_EL2_ORGN0_WBWA | \ >> VTCR_EL2_IRGN0_WBWA | VTCR_EL2_RES1) >> >> @@ -175,9 +171,64 @@ >> #endif >> >> #define VTCR_EL2_FLAGS (VTCR_EL2_COMMON_BITS | VTCR_EL2_TGRAN_FLAGS) >> -#define VTTBR_X (VTTBR_X_TGRAN_MAGIC - VTCR_EL2_T0SZ_IPA) >> +/* >> + * ARM VMSAv8-64 defines an algorithm for finding the translation table >> + * descriptors in section D4.2.8 in ARM DDI 0487C.a. >> + * >> + * The algorithm defines the expectations on the BaseAddress (for the page >> + * table) bits resolved at each level based on the page size, entry level >> + * and T0SZ. The variable "x" in the algorithm also affects the VTTBR:BADDR >> + * for stage2 page table. >> + * >> + * The value of "x" is calculated as : >> + * x = Magic_N - T0SZ > > What is not crystal clear to me is the "if SL0b,c = n" case where x get > a value not based on Magic_N. Please could you explain why it is not > relevant? We only care about the "x" for the "entry" level of the table look up to make sure that the VTTBR is physical address meets the required alignment. In both cases, if SL0 b,c == n, x is (PAGE_SHIFT) iff the level you are looking at is not the "entry level". So this should always be page aligned, like any intermediate level table. The Magic value is needed only needed for the "entry" level due to the fact that we may have lesser bits to resolve (i.e, depending on your PAMax or in other words T0SZ) than the intermediate levels (where we always resolve {PAGE_SHIFT - 3} bits. This is further complicated by the fact that Stage2 could use different number of levels for a given T0SZ than the stage1. 
I acknowledge that the algorithm is a bit too cryptic and I spent quite some time decoding it to the formula we use below ;-). I could update the comment to : /* * ARM VMSAv8-64 defines an algorithm for finding the translation table * descriptors in section D4.2.8 in ARM DDI 0487C.a. * * The algorithm defines the expectations on the translation table * addresses for each level, based on PAGE_SIZE, entry level * and the translation table size (T0SZ). The variable "x" in the * algorithm determines the alignment of a table base address at a given * level and thus determines the alignment of VTTBR:BADDR for stage2 * page table entry level. * Since the number of bits resolved at the entry level could vary * depending on the T0SZ, the value of "x" is defined based on a * Magic constant for a given PAGE_SIZE and Entry Level. The * intermediate levels must be always aligned to the PAGE_SIZE (i.e, * x = PAGE_SHIFT). * * The value of "x" for entry level is calculated as : * x = Magic_N - T0SZ * ... Suzuki
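For reference, the "x = Magic_N - T0SZ" relation can be cross-checked against the old fixed 40bit magic numbers (37/42/38 for 4K/16K/64K) quoted earlier in the thread. Writing Magic_N as 64 - levels * (PAGE_SHIFT - 3) is one way to express the reverse-engineered formula; the sketch below is illustrative userspace C, not necessarily the exact macro used in the patch:

/*
 * Illustrative check of the "x = Magic_N - T0SZ" relation for the
 * VTTBR:BADDR alignment. With T0SZ = 64 - IPA and
 * Magic_N = 64 - levels * (PAGE_SHIFT - 3), x collapses to
 * x = IPA - levels * (PAGE_SHIFT - 3). The assertions reproduce the
 * old fixed-IPA (40bit, T0SZ = 24) magic numbers from the thread.
 */
#include <assert.h>
#include <stdio.h>

static unsigned int vttbr_x(unsigned int ipa, unsigned int levels,
			    unsigned int page_shift)
{
	/* Each level resolves (page_shift - 3) bits with 8-byte descriptors */
	return ipa - levels * (page_shift - 3);
}

int main(void)
{
	assert(vttbr_x(40, 3, 12) == 37 - 24);	/* 4K,  3 levels */
	assert(vttbr_x(40, 2, 14) == 42 - 24);	/* 16K, 2 levels */
	assert(vttbr_x(40, 2, 16) == 38 - 24);	/* 64K, 2 levels */

	/* BADDR[x-1:0] must be zero, i.e. the entry table is (1 << x)-byte aligned */
	printf("64K/40bit IPA entry table alignment: %u bytes\n",
	       1u << vttbr_x(40, 2, 16));
	return 0;
}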
Hi Eric, On 20/09/18 15:25, Auger Eric wrote: > Hi Suzuki, > On 9/17/18 12:41 PM, Suzuki K Poulose wrote: >> VTCR_EL2 holds the following key stage2 translation table >> parameters: >> SL0 - Entry level in the page table lookup. >> T0SZ - Denotes the size of the memory addressed by the table. >> >> We have been using fixed values for the SL0 depending on the >> page size as we have a fixed IPA size. But since we are about >> to make it dynamic, we need to calculate the SL0 at runtime >> per VM. This patch adds a helper to compute the value of SL0 >> for a VM based on the IPA size. >> >> Cc: Marc Zyngier <marc.zyngier@arm.com> >> Cc: Christoffer Dall <cdall@kernel.org> >> Cc: Eric Auger <eric.auger@redhat.com> >> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> >> --- >> Changes since v3: >> - Update reference to latest ARM ARM. >> - Update per-vm VTCR value of SL0. >> - Add helpers to decode levels from SL0. >> - Didn't pick up Reviewed-by tag from Eric, as there >> are some new changes in this version (-) >> >> diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h >> index 3fb1d440be6e..5c1487dc5dca 100644 >> --- a/arch/arm64/include/asm/kvm_arm.h >> +++ b/arch/arm64/include/asm/kvm_arm.h >> @@ -121,7 +121,6 @@ >> #define VTCR_EL2_IRGN0_WBWA TCR_IRGN0_WBWA >> #define VTCR_EL2_SL0_SHIFT 6 >> #define VTCR_EL2_SL0_MASK (3 << VTCR_EL2_SL0_SHIFT) >> -#define VTCR_EL2_SL0_LVL1 (1 << VTCR_EL2_SL0_SHIFT) >> #define VTCR_EL2_T0SZ_MASK 0x3f >> #define VTCR_EL2_VS_SHIFT 19 >> #define VTCR_EL2_VS_8BIT (0 << VTCR_EL2_VS_SHIFT) >> @@ -148,29 +147,59 @@ >> /* >> * Stage2 translation configuration: >> * 64kB pages (TG0 = 1) >> - * 2 level page tables (SL = 1) >> */ >> -#define VTCR_EL2_TGRAN_FLAGS (VTCR_EL2_TG0_64K | VTCR_EL2_SL0_LVL1) >> -#define VTTBR_X_TGRAN_MAGIC 38 >> +#define VTCR_EL2_TGRAN VTCR_EL2_TG0_64K >> +#define VTCR_EL2_TGRAN_SL0_BASE 3UL > the name if not obvious. I understand this is yet another magic number > used in the formulae below: > SL0(PAGE_SIZE, Entry_level) = SL0_BASE(PAGE_SIZE) - Entry_Level > I first tried to map this onto some spec fields. May be worth a comment? Sure, I will make that clear. > > Besides > Reviewed-by: Eric Auger <eric.auger@redhat.com> Thanks Suzuki
Hi Suzuki, On 9/17/18 12:41 PM, Suzuki K Poulose wrote: > From: Kristina Martsenko <kristina.martsenko@arm.com> > > Add support for handling 52bit guest physical address to the > VGIC layer. So far we have limited the guest physical address > to 48bits, by explicitly masking the upper bits. This patch > removes the restriction. We do not have to check if the host > supports 52bit as the gpa is always validated during an access. > (e.g, kvm_{read/write}_guest, kvm_is_visible_gfn()). > Also, the ITS table save-restore is also not affected with > the enhancement. The DTE entries already store the bits[51:8] > of the ITT_addr (with a 256byte alignment). > > Cc: Marc Zyngier <marc.zyngier@arm.com> > Cc: Christoffer Dall <cdall@kernel.org> > Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com> > [ Macro clean ups, fix PROPBASER and PENDBASER accesses ] > Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> > --- > include/linux/irqchip/arm-gic-v3.h | 5 +++++ > virt/kvm/arm/vgic/vgic-its.c | 36 +++++++++--------------------- > virt/kvm/arm/vgic/vgic-mmio-v3.c | 2 -- > 3 files changed, 15 insertions(+), 28 deletions(-) > > diff --git a/include/linux/irqchip/arm-gic-v3.h b/include/linux/irqchip/arm-gic-v3.h > index 8bdbb5f29494..e961f40992d7 100644 > --- a/include/linux/irqchip/arm-gic-v3.h > +++ b/include/linux/irqchip/arm-gic-v3.h > @@ -357,6 +357,8 @@ > #define GITS_CBASER_RaWaWt GIC_BASER_CACHEABILITY(GITS_CBASER, INNER, RaWaWt) > #define GITS_CBASER_RaWaWb GIC_BASER_CACHEABILITY(GITS_CBASER, INNER, RaWaWb) > > +#define GITS_CBASER_ADDRESS(cbaser) ((cbaser) & GENMASK_ULL(52, 12)) nit GENMASK_ULL(51, 12), bit 52 is RES0 > + > #define GITS_BASER_NR_REGS 8 > > #define GITS_BASER_VALID (1ULL << 63) > @@ -388,6 +390,9 @@ > #define GITS_BASER_ENTRY_SIZE_MASK GENMASK_ULL(52, 48) > #define GITS_BASER_PHYS_52_to_48(phys) \ > (((phys) & GENMASK_ULL(47, 16)) | (((phys) >> 48) & 0xf) << 12) > +#define GITS_BASER_ADDR_48_to_52(baser) \ > + (((baser) & GENMASK_ULL(47, 16)) | (((baser) >> 12) & 0xf) << 48) > + > #define GITS_BASER_SHAREABILITY_SHIFT (10) > #define GITS_BASER_InnerShareable \ > GIC_BASER_SHAREABILITY(GITS_BASER, InnerShareable) > diff --git a/virt/kvm/arm/vgic/vgic-its.c b/virt/kvm/arm/vgic/vgic-its.c > index 12502251727e..eb2a390a6c86 100644 > --- a/virt/kvm/arm/vgic/vgic-its.c > +++ b/virt/kvm/arm/vgic/vgic-its.c > @@ -241,13 +241,6 @@ static struct its_ite *find_ite(struct vgic_its *its, u32 device_id, > list_for_each_entry(dev, &(its)->device_list, dev_list) \ > list_for_each_entry(ite, &(dev)->itt_head, ite_list) > > -/* > - * We only implement 48 bits of PA at the moment, although the ITS > - * supports more. Let's be restrictive here. 
> - */ > -#define BASER_ADDRESS(x) ((x) & GENMASK_ULL(47, 16)) > -#define CBASER_ADDRESS(x) ((x) & GENMASK_ULL(47, 12)) > - > #define GIC_LPI_OFFSET 8192 > > #define VITS_TYPER_IDBITS 16 > @@ -759,6 +752,7 @@ static bool vgic_its_check_id(struct vgic_its *its, u64 baser, u32 id, > { > int l1_tbl_size = GITS_BASER_NR_PAGES(baser) * SZ_64K; > u64 indirect_ptr, type = GITS_BASER_TYPE(baser); > + phys_addr_t base = GITS_BASER_ADDR_48_to_52(baser); > int esz = GITS_BASER_ENTRY_SIZE(baser); > int index; > gfn_t gfn; > @@ -783,7 +777,7 @@ static bool vgic_its_check_id(struct vgic_its *its, u64 baser, u32 id, > if (id >= (l1_tbl_size / esz)) > return false; > > - addr = BASER_ADDRESS(baser) + id * esz; > + addr = base + id * esz; > gfn = addr >> PAGE_SHIFT; > > if (eaddr) > @@ -798,7 +792,7 @@ static bool vgic_its_check_id(struct vgic_its *its, u64 baser, u32 id, > > /* Each 1st level entry is represented by a 64-bit value. */ > if (kvm_read_guest_lock(its->dev->kvm, > - BASER_ADDRESS(baser) + index * sizeof(indirect_ptr), > + base + index * sizeof(indirect_ptr), > &indirect_ptr, sizeof(indirect_ptr))) > return false; > > @@ -808,11 +802,7 @@ static bool vgic_its_check_id(struct vgic_its *its, u64 baser, u32 id, > if (!(indirect_ptr & BIT_ULL(63))) > return false; > > - /* > - * Mask the guest physical address and calculate the frame number. > - * Any address beyond our supported 48 bits of PA will be caught > - * by the actual check in the final step. > - */ > + /* Mask the guest physical address and calculate the frame number. */ > indirect_ptr &= GENMASK_ULL(51, 16); > > /* Find the address of the actual entry */ > @@ -1304,9 +1294,6 @@ static u64 vgic_sanitise_its_baser(u64 reg) > GITS_BASER_OUTER_CACHEABILITY_SHIFT, > vgic_sanitise_outer_cacheability); > > - /* Bits 15:12 contain bits 51:48 of the PA, which we don't support. */ > - reg &= ~GENMASK_ULL(15, 12); > - > /* We support only one (ITS) page size: 64K */ > reg = (reg & ~GITS_BASER_PAGE_SIZE_MASK) | GITS_BASER_PAGE_SIZE_64K; > > @@ -1325,11 +1312,8 @@ static u64 vgic_sanitise_its_cbaser(u64 reg) > GITS_CBASER_OUTER_CACHEABILITY_SHIFT, > vgic_sanitise_outer_cacheability); > > - /* > - * Sanitise the physical address to be 64k aligned. > - * Also limit the physical addresses to 48 bits. > - */ > - reg &= ~(GENMASK_ULL(51, 48) | GENMASK_ULL(15, 12)); > + /* Sanitise the physical address to be 64k aligned. 
*/ > + reg &= ~GENMASK_ULL(15, 12); > > return reg; > } > @@ -1375,7 +1359,7 @@ static void vgic_its_process_commands(struct kvm *kvm, struct vgic_its *its) > if (!its->enabled) > return; > > - cbaser = CBASER_ADDRESS(its->cbaser); > + cbaser = GITS_CBASER_ADDRESS(its->cbaser); > > while (its->cwriter != its->creadr) { > int ret = kvm_read_guest_lock(kvm, cbaser + its->creadr, > @@ -2233,7 +2217,7 @@ static int vgic_its_restore_device_tables(struct vgic_its *its) > if (!(baser & GITS_BASER_VALID)) > return 0; > > - l1_gpa = BASER_ADDRESS(baser); > + l1_gpa = GITS_BASER_ADDR_48_to_52(baser); > > if (baser & GITS_BASER_INDIRECT) { > l1_esz = GITS_LVL1_ENTRY_SIZE; > @@ -2305,7 +2289,7 @@ static int vgic_its_save_collection_table(struct vgic_its *its) > { > const struct vgic_its_abi *abi = vgic_its_get_abi(its); > u64 baser = its->baser_coll_table; > - gpa_t gpa = BASER_ADDRESS(baser); > + gpa_t gpa = GITS_BASER_ADDR_48_to_52(baser); > struct its_collection *collection; > u64 val; > size_t max_size, filled = 0; > @@ -2354,7 +2338,7 @@ static int vgic_its_restore_collection_table(struct vgic_its *its) > if (!(baser & GITS_BASER_VALID)) > return 0; > > - gpa = BASER_ADDRESS(baser); > + gpa = GITS_BASER_ADDR_48_to_52(baser); > > max_size = GITS_BASER_NR_PAGES(baser) * SZ_64K; > > diff --git a/virt/kvm/arm/vgic/vgic-mmio-v3.c b/virt/kvm/arm/vgic/vgic-mmio-v3.c > index a2a175b08b17..b3d1f0985117 100644 > --- a/virt/kvm/arm/vgic/vgic-mmio-v3.c > +++ b/virt/kvm/arm/vgic/vgic-mmio-v3.c > @@ -364,7 +364,6 @@ static u64 vgic_sanitise_pendbaser(u64 reg) > vgic_sanitise_outer_cacheability); > > reg &= ~PENDBASER_RES0_MASK; > - reg &= ~GENMASK_ULL(51, 48); > > return reg; > } > @@ -382,7 +381,6 @@ static u64 vgic_sanitise_propbaser(u64 reg) > vgic_sanitise_outer_cacheability); > > reg &= ~PROPBASER_RES0_MASK; > - reg &= ~GENMASK_ULL(51, 48); > return reg; > } > > Besides looks good to me. Reviewed-by: Eric Auger <eric.auger@redhat.com> Thanks Eric
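The GITS_BASER_PHYS_52_to_48()/GITS_BASER_ADDR_48_to_52() pair quoted above can be exercised in isolation. The round-trip below is plain userspace C with GENMASK_ULL open-coded, so it is only an illustration of the bit packing, not kernel code:

/*
 * Quick round-trip of the GITS_BASER address packing: bits [51:48] of a
 * 64K aligned physical address live in BASER[15:12].
 */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define GENMASK_ULL(h, l) \
	(((~0ULL) << (l)) & (~0ULL >> (63 - (h))))

static uint64_t baser_phys_52_to_48(uint64_t phys)
{
	return (phys & GENMASK_ULL(47, 16)) | (((phys >> 48) & 0xf) << 12);
}

static uint64_t baser_addr_48_to_52(uint64_t baser)
{
	return (baser & GENMASK_ULL(47, 16)) | (((baser >> 12) & 0xf) << 48);
}

int main(void)
{
	/* A 64K aligned physical address with bits above 48 set */
	uint64_t pa = 0x000A5A5A5A5A0000ULL & GENMASK_ULL(51, 16);
	uint64_t reg = baser_phys_52_to_48(pa);

	/* Bits [51:48] of the PA are stashed in reg[15:12] ... */
	assert(((reg >> 12) & 0xf) == ((pa >> 48) & 0xf));
	/* ... and the decode macro recovers the original address. */
	assert(baser_addr_48_to_52(reg) == pa);
	printf("pa=%#llx reg=%#llx\n",
	       (unsigned long long)pa, (unsigned long long)reg);
	return 0;
}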
Hi Suzuki, On 9/17/18 12:41 PM, Suzuki K Poulose wrote: > So far we have restricted the IPA size of the VM to the default > value (40bits). Now that we can manage the IPA size per VM and > support dynamic stage2 page tables, we can allow VMs to have > larger IPA. This patch introduces a the maximum IPA size > supported on the host. to be reworded This is decided by the following factors : > > 1) Maximum PARange supported by the CPUs - This can be inferred > from the system wide safe value. > 2) Maximum PA size supported by the host kernel (48 vs 52) > 3) Number of levels in the host page table (as we base our > stage2 tables on the host table helpers). > > Since the stage2 page table code is dependent on the stage1 > page table, we always ensure that : > > Number of Levels at Stage1 >= Number of Levels at Stage2 > > So we limit the IPA to make sure that the above condition > is satisfied. This will affect the following combinations > of VA_BITS and IPA for different page sizes. > > Host configuration | Unsupported IPA ranges > 39bit VA, 4K | [44, 48] > 36bit VA, 16K | [41, 48] > 42bit VA, 64K | [47, 52] > > Supporting the above combinations need independent stage2 > page table manipulation code, which would need substantial > changes. We could purse the solution independently and > switch the page table code once we have it ready. > > Cc: Catalin Marinas <catalin.marinas@arm.com> > Cc: Marc Zyngier <marc.zyngier@arm.com> > Cc: Christoffer Dall <cdall@kernel.org> > Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> > --- > Changes since V2: > - Restrict the IPA size to limit the number of page table > levels in stage2 to that of stage1 or less. > --- > arch/arm/include/asm/kvm_mmu.h | 2 ++ > arch/arm64/include/asm/kvm_host.h | 2 ++ > arch/arm64/kvm/reset.c | 43 +++++++++++++++++++++++++++++++ > virt/kvm/arm/arm.c | 2 ++ > 4 files changed, 49 insertions(+) > > diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h > index 12ae5fbbcf01..5ad1a54f98dc 100644 > --- a/arch/arm/include/asm/kvm_mmu.h > +++ b/arch/arm/include/asm/kvm_mmu.h > @@ -358,6 +358,8 @@ static inline int hyp_map_aux_data(void) > > #define kvm_phys_to_vttbr(addr) (addr) > > +static inline void kvm_set_ipa_limit(void) {} > + > #endif /* !__ASSEMBLY__ */ > > #endif /* __ARM_KVM_MMU_H__ */ > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h > index 5ecd457bce7d..f0474061851d 100644 > --- a/arch/arm64/include/asm/kvm_host.h > +++ b/arch/arm64/include/asm/kvm_host.h > @@ -513,6 +513,8 @@ static inline int kvm_arm_have_ssbd(void) > void kvm_vcpu_load_sysregs(struct kvm_vcpu *vcpu); > void kvm_vcpu_put_sysregs(struct kvm_vcpu *vcpu); > > +void kvm_set_ipa_limit(void); > + > #define __KVM_HAVE_ARCH_VM_ALLOC > struct kvm *kvm_arch_alloc_vm(void); > void kvm_arch_free_vm(struct kvm *kvm); > diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c > index 51ecf0f7c912..76972b19bdd7 100644 > --- a/arch/arm64/kvm/reset.c > +++ b/arch/arm64/kvm/reset.c > @@ -34,6 +34,9 @@ > #include <asm/kvm_coproc.h> > #include <asm/kvm_mmu.h> > > +/* Maximum phys_shift supported for any VM on this host */ > +static u32 kvm_ipa_limit; > + > /* > * ARMv8 Reset Values > */ > @@ -135,6 +138,46 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu) > return kvm_timer_vcpu_reset(vcpu); > } > > +void kvm_set_ipa_limit(void) > +{ > + unsigned int ipa_max, va_max, parange; > + > + parange = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1) & 0x7; > + ipa_max = id_aa64mmfr0_parange_to_phys_shift(parange); > + > + 
/* Raise the limit to the default size for backward compatibility */ > + if (ipa_max < KVM_PHYS_SHIFT) { > + WARN_ONCE(1, > + "PARange is %d bits, unsupported configuration!", > + ipa_max); > + ipa_max = KVM_PHYS_SHIFT; I don't really get what does happen in this case. The CPU cannot handle PA up to ipa_max so can the VM run properly? In case it is a showstopper, kvm_set_ipa_limit should return an error, cascaded by init_common_resources. Otherwise the warning message may be reworded. > + } > + > + /* Clamp it to the PA size supported by the kernel */ > + ipa_max = (ipa_max > PHYS_MASK_SHIFT) ? PHYS_MASK_SHIFT : ipa_max; > + /* > + * Since our stage2 table is dependent on the stage1 page table code, > + * we must always honor the following condition: > + * > + * Number of levels in Stage1 >= Number of levels in Stage2. > + * > + * So clamp the ipa limit further down to limit the number of levels. > + * Since we can concatenate upto 16 tables at entry level, we could > + * go upto 4bits above the maximum VA addressible with the current addressable? > + * number of levels. > + */ > + va_max = PGDIR_SHIFT + PAGE_SHIFT - 3; > + va_max += 4; > + > + if (va_max < ipa_max) { > + kvm_info("Limiting IPA limit to %dbytes due to host VA bits limitation\n",> + va_max); > + ipa_max = va_max; you have a trace for this limitation but none for the comparison against PHYS_MASK_SHIFT. > + } > + > + kvm_ipa_limit = ipa_max; > +} > + > /* > * Configure the VTCR_EL2 for this VM. The VTCR value is common > * across all the physical CPUs on the system. We use system wide > diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c > index 43e716bc3f08..631f9a3ad99a 100644 > --- a/virt/kvm/arm/arm.c > +++ b/virt/kvm/arm/arm.c > @@ -1413,6 +1413,8 @@ static int init_common_resources(void) > kvm_vmid_bits = kvm_get_vmid_bits(); > kvm_info("%d-bit VMID\n", kvm_vmid_bits); > > + kvm_set_ipa_limit(); As we have a kvm_info for the supported vmid_bits, may be good to output the max IPA size supported by the host whatever the applied clamps? Thanks Eric > + > return 0; > } > >
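A condensed restatement of the clamping order in kvm_set_ipa_limit() may help when reading the review comments above. The sketch below is userspace-only and takes the PARange-derived shift, the kernel PA size and the host PGDIR_SHIFT/PAGE_SHIFT as plain parameters; the numbers in main() are example inputs, nothing is probed from hardware:

/*
 * Condensed restatement of the clamping sequence: the final limit is
 * min(PARange-derived PA shift, kernel PA size, host VA coverage plus
 * 4 bits of entry-level concatenation), never below the legacy 40bit
 * KVM_PHYS_SHIFT.
 */
#include <stdio.h>

#define KVM_PHYS_SHIFT	40U

static unsigned int ipa_limit(unsigned int parange_shift,
			      unsigned int phys_mask_shift,
			      unsigned int pgdir_shift,
			      unsigned int page_shift)
{
	unsigned int ipa_max = parange_shift;
	/* Up to 16 concatenated entry level tables: 4 extra bits of VA */
	unsigned int va_max = pgdir_shift + page_shift - 3 + 4;

	if (ipa_max < KVM_PHYS_SHIFT)	/* backward compatibility floor */
		ipa_max = KVM_PHYS_SHIFT;
	if (ipa_max > phys_mask_shift)	/* clamp to kernel PA size */
		ipa_max = phys_mask_shift;
	if (ipa_max > va_max)		/* keep stage1 levels >= stage2 levels */
		ipa_max = va_max;
	return ipa_max;
}

int main(void)
{
	/* e.g. 48bit PARange, 48bit kernel PA, 4K/48bit VA host (PGDIR_SHIFT = 39) */
	printf("IPA limit: %u\n", ipa_limit(48, 48, 39, 12));
	/* e.g. 52bit PARange but a 39bit VA / 4K host (PGDIR_SHIFT = 30) */
	printf("IPA limit: %u\n", ipa_limit(52, 48, 30, 12));
	return 0;
}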
Hi Suzuki, On 9/17/18 12:41 PM, Suzuki K Poulose wrote: > Add support for handling 52bit addresses in PAR to HPFAR > conversion. Instead of hardcoding the address limits, we > now use PHYS_MASK_SHIFT. > > Cc: Marc Zyngier <marc.zyngier@arm.com> > Cc: Christoffer Dall <cdall@kernel.org> > Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Thanks Eric > --- > arch/arm64/include/asm/kvm_arm.h | 7 +++++++ > arch/arm64/kvm/hyp/switch.c | 2 +- > 2 files changed, 8 insertions(+), 1 deletion(-) > > diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h > index 0a37c0513ede..241d1622fa19 100644 > --- a/arch/arm64/include/asm/kvm_arm.h > +++ b/arch/arm64/include/asm/kvm_arm.h > @@ -308,6 +308,13 @@ > > /* Hyp Prefetch Fault Address Register (HPFAR/HDFAR) */ > #define HPFAR_MASK (~UL(0xf)) > +/* > + * We have > + * PAR [PA_Shift - 1 : 12] = PA [PA_Shift - 1 : 12] > + * HPFAR [PA_Shift - 9 : 4] = FIPA [PA_Shift - 1 : 12] > + */ > +#define PAR_TO_HPFAR(par) \ > + (((par) & GENMASK_ULL(PHYS_MASK_SHIFT - 1, 12)) >> 8) > > #define kvm_arm_exception_type \ > {0, "IRQ" }, \ > diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c > index 9d5ce1a3039a..7cc175c88a37 100644 > --- a/arch/arm64/kvm/hyp/switch.c > +++ b/arch/arm64/kvm/hyp/switch.c > @@ -263,7 +263,7 @@ static bool __hyp_text __translate_far_to_hpfar(u64 far, u64 *hpfar) > return false; /* Translation failed, back to guest */ > > /* Convert PAR to HPFAR format */ > - *hpfar = ((tmp >> 12) & ((1UL << 36) - 1)) << 4; > + *hpfar = PAR_TO_HPFAR(tmp); > return true; > } > >
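The PAR_TO_HPFAR() definition above reduces to the old hard-coded 48bit expression when PHYS_MASK_SHIFT is 48, which a quick standalone check confirms:

/*
 * Sanity check that PAR_TO_HPFAR matches the old conversion for a 48bit
 * PHYS_MASK_SHIFT: the fault IPA bits [PA_Shift-1:12] of PAR land in
 * HPFAR bits [PA_Shift-9:4].
 */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define GENMASK_ULL(h, l) \
	(((~0ULL) << (l)) & (~0ULL >> (63 - (h))))

#define PHYS_MASK_SHIFT 48
#define PAR_TO_HPFAR(par) \
	(((par) & GENMASK_ULL(PHYS_MASK_SHIFT - 1, 12)) >> 8)

int main(void)
{
	uint64_t par = 0x0000876543210ABCULL;	/* arbitrary PAR_EL1 value */
	uint64_t old = ((par >> 12) & ((1ULL << 36) - 1)) << 4;

	assert(PAR_TO_HPFAR(par) == old);
	printf("hpfar=%#llx\n", (unsigned long long)PAR_TO_HPFAR(par));
	return 0;
}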
Hi Suzuki, On 9/17/18 12:41 PM, Suzuki K Poulose wrote: > Since we are about to remove the lower limit on the IPA size, > make sure that we do not go to 1 level page table (e.g, with > 32bit IPA on 64K host with concatenation) to avoid splitting > the host PMD huge pages at stage2. > > Cc: Marc Zyngier <marc.zyngier@arm.com> > Cc: Christoffer Dall <cdall@kernel.org> > Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> > --- > arch/arm64/include/asm/stage2_pgtable.h | 8 +++++++- > arch/arm64/kvm/reset.c | 12 +++++++++++- > 2 files changed, 18 insertions(+), 2 deletions(-) > > diff --git a/arch/arm64/include/asm/stage2_pgtable.h b/arch/arm64/include/asm/stage2_pgtable.h > index 352ec4158fdf..6a56fdff0823 100644 > --- a/arch/arm64/include/asm/stage2_pgtable.h > +++ b/arch/arm64/include/asm/stage2_pgtable.h > @@ -72,8 +72,14 @@ > /* > * The number of PTRS across all concatenated stage2 tables given by the > * number of bits resolved at the initial level. > + * If we force more number of levels than necessary, we may have more levels? > + * stage2_pgdir_shift > IPA, in which case, stage2_pgd_ptrs will have > + * one entry. > */ > -#define __s2_pgd_ptrs(ipa, lvls) (1 << ((ipa) - pt_levels_pgdir_shift((lvls)))) > +#define pgd_ptrs_shift(ipa, pgdir_shift) \ > + ((ipa) > (pgdir_shift) ? ((ipa) - (pgdir_shift)) : 0) > +#define __s2_pgd_ptrs(ipa, lvls) \ > + (1 << (pgd_ptrs_shift((ipa), pt_levels_pgdir_shift(lvls)))) > #define __s2_pgd_size(ipa, lvls) (__s2_pgd_ptrs((ipa), (lvls)) * sizeof(pgd_t)) > > #define stage2_pgd_ptrs(kvm) __s2_pgd_ptrs(kvm_phys_shift(kvm), kvm_stage2_levels(kvm)) > diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c > index 76972b19bdd7..0393bb974b23 100644 > --- a/arch/arm64/kvm/reset.c > +++ b/arch/arm64/kvm/reset.c > @@ -190,10 +190,19 @@ int kvm_arm_config_vm(struct kvm *kvm, unsigned long type) > { > u64 vtcr = VTCR_EL2_FLAGS; > u64 parange; > + u8 lvls; > > if (type) > return -EINVAL; > > + /* > + * Use a minimum 2 level page table to prevent splitting > + * host PMD huge pages at stage2. > + */ > + lvls = stage2_pgtable_levels(KVM_PHYS_SHIFT); > + if (lvls < 2) > + lvls = 2; > + > parange = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1) & 7; > if (parange > ID_AA64MMFR0_PARANGE_MAX) > parange = ID_AA64MMFR0_PARANGE_MAX; > @@ -210,7 +219,8 @@ int kvm_arm_config_vm(struct kvm *kvm, unsigned long type) > vtcr |= (kvm_get_vmid_bits() == 16) ? > VTCR_EL2_VS_16BIT : > VTCR_EL2_VS_8BIT; > - vtcr |= VTCR_EL2_LVLS_TO_SL0(stage2_pgtable_levels(KVM_PHYS_SHIFT)); > + nit: new line not requested Thanks Eric > + vtcr |= VTCR_EL2_LVLS_TO_SL0(lvls); > vtcr |= VTCR_EL2_T0SZ(KVM_PHYS_SHIFT); > > kvm->arch.vtcr = vtcr; >
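A rough model of why the clamp to two levels is needed: assuming the stage2 level count follows the usual arm64 ARM64_HW_PGTABLE_LEVELS() calculation applied to (IPA - 4), to account for up to 16 concatenated tables at the entry level (an assumption about the series' helpers, shown here as plain functions), a 32bit IPA on a 64K host would otherwise come out as a single level:

/*
 * Rough model, not the kernel macros: level count for an address space
 * of 'va_bits' is (va_bits - 4) / (page_shift - 3), and the stage2
 * count is taken for (ipa - 4) because of entry-level concatenation.
 */
#include <stdio.h>

static unsigned int hw_pgtable_levels(unsigned int va_bits,
				      unsigned int page_shift)
{
	return (va_bits - 4) / (page_shift - 3);
}

static unsigned int stage2_levels(unsigned int ipa, unsigned int page_shift)
{
	unsigned int lvls = hw_pgtable_levels(ipa - 4, page_shift);

	/* Never drop to a single level: it would split host PMD huge pages */
	return lvls < 2 ? 2 : lvls;
}

int main(void)
{
	printf("64K, 32bit IPA: %u levels\n", stage2_levels(32, 16)); /* clamped to 2 */
	printf("64K, 40bit IPA: %u levels\n", stage2_levels(40, 16)); /* 2 */
	printf("4K,  40bit IPA: %u levels\n", stage2_levels(40, 12)); /* 3 */
	return 0;
}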
On 09/25/2018 11:00 AM, Auger Eric wrote: > Hi Suzuki, > > On 9/17/18 12:41 PM, Suzuki K Poulose wrote: >> Since we are about to remove the lower limit on the IPA size, >> make sure that we do not go to 1 level page table (e.g, with >> 32bit IPA on 64K host with concatenation) to avoid splitting >> the host PMD huge pages at stage2. >> >> Cc: Marc Zyngier <marc.zyngier@arm.com> >> Cc: Christoffer Dall <cdall@kernel.org> >> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> >> --- >> arch/arm64/include/asm/stage2_pgtable.h | 8 +++++++- >> arch/arm64/kvm/reset.c | 12 +++++++++++- >> 2 files changed, 18 insertions(+), 2 deletions(-) >> >> diff --git a/arch/arm64/include/asm/stage2_pgtable.h b/arch/arm64/include/asm/stage2_pgtable.h >> index 352ec4158fdf..6a56fdff0823 100644 >> --- a/arch/arm64/include/asm/stage2_pgtable.h >> +++ b/arch/arm64/include/asm/stage2_pgtable.h >> @@ -72,8 +72,14 @@ >> /* >> * The number of PTRS across all concatenated stage2 tables given by the >> * number of bits resolved at the initial level. >> + * If we force more number of levels than necessary, we may have > more levels? >> diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c >> index 76972b19bdd7..0393bb974b23 100644 >> --- a/arch/arm64/kvm/reset.c >> +++ b/arch/arm64/kvm/reset.c >> @@ -210,7 +219,8 @@ int kvm_arm_config_vm(struct kvm *kvm, unsigned long type) >> vtcr |= (kvm_get_vmid_bits() == 16) ? >> VTCR_EL2_VS_16BIT : >> VTCR_EL2_VS_8BIT; >> - vtcr |= VTCR_EL2_LVLS_TO_SL0(stage2_pgtable_levels(KVM_PHYS_SHIFT)); >> + > nit: new line not requested > Fixed all the above Suzuki
Hi Eric On 09/21/2018 03:57 PM, Auger Eric wrote: > Hi Suzuki, > > On 9/17/18 12:41 PM, Suzuki K Poulose wrote: >> From: Kristina Martsenko <kristina.martsenko@arm.com> >> >> Add support for handling 52bit guest physical address to the >> VGIC layer. So far we have limited the guest physical address >> to 48bits, by explicitly masking the upper bits. This patch >> removes the restriction. We do not have to check if the host >> supports 52bit as the gpa is always validated during an access. >> (e.g, kvm_{read/write}_guest, kvm_is_visible_gfn()). >> Also, the ITS table save-restore is also not affected with >> the enhancement. The DTE entries already store the bits[51:8] >> of the ITT_addr (with a 256byte alignment). >> >> Cc: Marc Zyngier <marc.zyngier@arm.com> >> Cc: Christoffer Dall <cdall@kernel.org> >> Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com> >> [ Macro clean ups, fix PROPBASER and PENDBASER accesses ] >> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> >> --- >> include/linux/irqchip/arm-gic-v3.h | 5 +++++ >> virt/kvm/arm/vgic/vgic-its.c | 36 +++++++++--------------------- >> virt/kvm/arm/vgic/vgic-mmio-v3.c | 2 -- >> 3 files changed, 15 insertions(+), 28 deletions(-) >> >> diff --git a/include/linux/irqchip/arm-gic-v3.h b/include/linux/irqchip/arm-gic-v3.h >> index 8bdbb5f29494..e961f40992d7 100644 >> --- a/include/linux/irqchip/arm-gic-v3.h >> +++ b/include/linux/irqchip/arm-gic-v3.h >> @@ -357,6 +357,8 @@ >> #define GITS_CBASER_RaWaWt GIC_BASER_CACHEABILITY(GITS_CBASER, INNER, RaWaWt) >> #define GITS_CBASER_RaWaWb GIC_BASER_CACHEABILITY(GITS_CBASER, INNER, RaWaWb) >> >> +#define GITS_CBASER_ADDRESS(cbaser) ((cbaser) & GENMASK_ULL(52, 12)) > nit GENMASK_ULL(51, 12), bit 52 is RES0 I will fix this. >> diff --git a/virt/kvm/arm/vgic/vgic-mmio-v3.c b/virt/kvm/arm/vgic/vgic-mmio-v3.c >> index a2a175b08b17..b3d1f0985117 100644 >> --- a/virt/kvm/arm/vgic/vgic-mmio-v3.c >> +++ b/virt/kvm/arm/vgic/vgic-mmio-v3.c >> @@ -364,7 +364,6 @@ static u64 vgic_sanitise_pendbaser(u64 reg) >> vgic_sanitise_outer_an); >> >> reg &= ~PENDBASER_RES0_MASK; >> - reg &= ~GENMASK_ULL(51, 48); >> >> return reg; >> } >> @@ -382,7 +381,6 @@ static u64 vgic_sanitise_propbaser(u64 reg) >> vgic_sanitise_outer_cacheability); >> >> reg &= ~PROPBASER_RES0_MASK; >> - reg &= ~GENMASK_ULL(51, 48); >> return reg; >> } >> >> > Besides looks good to me. > Reviewed-by: Eric Auger <eric.auger@redhat.com> Thanks Suzuki
On 09/25/2018 10:59 AM, Auger Eric wrote: > Hi Suzuki, > > On 9/17/18 12:41 PM, Suzuki K Poulose wrote: >> So far we have restricted the IPA size of the VM to the default >> value (40bits). Now that we can manage the IPA size per VM and >> support dynamic stage2 page tables, we can allow VMs to have >> larger IPA. This patch introduces a the maximum IPA size >> supported on the host. > to be reworded > This is decided by the following factors : Sure >> diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c >> index 51ecf0f7c912..76972b19bdd7 100644 >> --- a/arch/arm64/kvm/reset.c >> +++ b/arch/arm64/kvm/reset.c >> @@ -34,6 +34,9 @@ >> #include <asm/kvm_coproc.h> >> #include <asm/kvm_mmu.h> >> >> +/* Maximum phys_shift supported for any VM on this host */ >> +static u32 kvm_ipa_limit; >> + >> /* >> * ARMv8 Reset Values >> */ >> @@ -135,6 +138,46 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu) >> return kvm_timer_vcpu_reset(vcpu); >> } >> >> +void kvm_set_ipa_limit(void) >> +{ >> + unsigned int ipa_max, va_max, parange; >> + >> + parange = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1) & 0x7; >> + ipa_max = id_aa64mmfr0_parange_to_phys_shift(parange); >> + >> + /* Raise the limit to the default size for backward compatibility */ >> + if (ipa_max < KVM_PHYS_SHIFT) { >> + WARN_ONCE(1, >> + "PARange is %d bits, unsupported configuration!", >> + ipa_max); >> + ipa_max = KVM_PHYS_SHIFT; > I don't really get what does happen in this case. The CPU cannot handle > PA up to ipa_max so can the VM run properly? In case it is a > showstopper, kvm_set_ipa_limit should return an error, cascaded by > init_common_resources. Otherwise the warning message may be reworded. I think this was a warning added to warn against the older Foundation model which had a 36bit PA size. So the VTCR was progammed with a 36bit limit, while the KVM guest was allowed to create 40bit IPA space, though it wouldn't fly well if someone tried to. With this series, I think we may expose the real IPA_MAX (which could be < 40bit) and warn the user if someone tried to create a VM with 40bit IPA (vm_type == 0) and let the call succeed (for the sake of ABI). Marc, Christoffer, Eric Thoughts ? >> + } >> + >> + /* Clamp it to the PA size supported by the kernel */ >> + ipa_max = (ipa_max > PHYS_MASK_SHIFT) ? PHYS_MASK_SHIFT : ipa_max; >> + /* >> + * Since our stage2 table is dependent on the stage1 page table code, >> + * we must always honor the following condition: >> + * >> + * Number of levels in Stage1 >= Number of levels in Stage2. >> + * >> + * So clamp the ipa limit further down to limit the number of levels. >> + * Since we can concatenate upto 16 tables at entry level, we could >> + * go upto 4bits above the maximum VA addressible with the current > addressable? Sure >> + * number of levels. >> + */ >> + va_max = PGDIR_SHIFT + PAGE_SHIFT - 3; >> + va_max += 4; >> + >> + if (va_max < ipa_max) { >> + kvm_info("Limiting IPA limit to %dbytes due to host VA bits limitation\n", >> + va_max); >> + ipa_max = va_max; > you have a trace for this limitation but none for the comparison against > PHYS_MASK_SHIFT. May be I could add a message which only mentions what is the limiting factor kernel VA vs kernel PA support >> + } >> + >> + kvm_ipa_limit = ipa_max; >> +} >> + >> /* >> * Configure the VTCR_EL2 for this VM. The VTCR value is common >> * across all the physical CPUs on the system. 
We use system wide >> diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c >> index 43e716bc3f08..631f9a3ad99a 100644 >> --- a/virt/kvm/arm/arm.c >> +++ b/virt/kvm/arm/arm.c >> @@ -1413,6 +1413,8 @@ static int init_common_resources(void) >> kvm_vmid_bits = kvm_get_vmid_bits(); >> kvm_info("%d-bit VMID\n", kvm_vmid_bits); >> >> + kvm_set_ipa_limit(); > As we have a kvm_info for the supported vmid_bits, may be good to output > the max IPA size supported by the host whatever the applied clamps? Sure, will do that. Thanks Suzuki
Hi Suzuki, On 9/20/18 5:22 PM, Suzuki K Poulose wrote: > > > On 20/09/18 15:07, Auger Eric wrote: >> Hi Suzuki, >> On 9/17/18 12:41 PM, Suzuki K Poulose wrote: >>> On arm64 VTTBR_EL2:BADDR holds the base address for the stage2 >>> translation table. The Arm ARM mandates that the bits BADDR[x-1:0] >>> should be 0, where 'x' is defined for a given IPA Size and the >>> number of levels for a translation granule size. It is defined >>> using some magical constants. This patch is a reverse engineered >>> implementation to calculate the 'x' at runtime for a given ipa and >>> number of page table levels. See patch for more details. >>> >>> Cc: Marc Zyngier <marc.zyngier@arm.com> >>> Cc: Christoffer Dall <cdall@kernel.org> >>> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> >> >>> --- >>> Changes since V3: >>> - Update reference to latest ARM ARM and improve commentary >>> --- >>> arch/arm64/include/asm/kvm_arm.h | 63 +++++++++++++++++++++++++++++--- >>> arch/arm64/include/asm/kvm_mmu.h | 25 ++++++++++++- >>> 2 files changed, 81 insertions(+), 7 deletions(-) >>> >>> diff --git a/arch/arm64/include/asm/kvm_arm.h >>> b/arch/arm64/include/asm/kvm_arm.h >>> index 14317b3a1820..3fb1d440be6e 100644 >>> --- a/arch/arm64/include/asm/kvm_arm.h >>> +++ b/arch/arm64/include/asm/kvm_arm.h >>> @@ -123,7 +123,6 @@ >>> #define VTCR_EL2_SL0_MASK (3 << VTCR_EL2_SL0_SHIFT) >>> #define VTCR_EL2_SL0_LVL1 (1 << VTCR_EL2_SL0_SHIFT) >>> #define VTCR_EL2_T0SZ_MASK 0x3f >>> -#define VTCR_EL2_T0SZ_40B 24 >>> #define VTCR_EL2_VS_SHIFT 19 >>> #define VTCR_EL2_VS_8BIT (0 << VTCR_EL2_VS_SHIFT) >>> #define VTCR_EL2_VS_16BIT (1 << VTCR_EL2_VS_SHIFT) >>> @@ -140,11 +139,8 @@ >>> * Note that when using 4K pages, we concatenate two first level >>> page tables >>> * together. With 16K pages, we concatenate 16 first level page >>> tables. >>> * >>> - * The magic numbers used for VTTBR_X in this patch can be found in >>> Tables >>> - * D4-23 and D4-25 in ARM DDI 0487A.b. >>> */ >>> >>> -#define VTCR_EL2_T0SZ_IPA VTCR_EL2_T0SZ_40B >>> #define VTCR_EL2_COMMON_BITS (VTCR_EL2_SH0_INNER | >>> VTCR_EL2_ORGN0_WBWA | \ >>> VTCR_EL2_IRGN0_WBWA | VTCR_EL2_RES1) >>> >>> @@ -175,9 +171,64 @@ >>> #endif >>> >>> #define VTCR_EL2_FLAGS (VTCR_EL2_COMMON_BITS | >>> VTCR_EL2_TGRAN_FLAGS) >>> -#define VTTBR_X (VTTBR_X_TGRAN_MAGIC - >>> VTCR_EL2_T0SZ_IPA) >>> +/* >>> + * ARM VMSAv8-64 defines an algorithm for finding the translation table >>> + * descriptors in section D4.2.8 in ARM DDI 0487C.a. >>> + * >>> + * The algorithm defines the expectations on the BaseAddress (for >>> the page >>> + * table) bits resolved at each level based on the page size, entry >>> level >>> + * and T0SZ. The variable "x" in the algorithm also affects the >>> VTTBR:BADDR >>> + * for stage2 page table. >>> + * >>> + * The value of "x" is calculated as : >>> + * x = Magic_N - T0SZ >> >> What is not crystal clear to me is the "if SL0b,c = n" case where x get >> a value not based on Magic_N. Please could you explain why it is not >> relevant? > > We only care about the "x" for the "entry" level of the table look up > to make sure that the VTTBR is physical address meets the required > alignment. In both cases, if SL0 b,c == n, x is (PAGE_SHIFT) iff the > level you are looking at is not the "entry level". So this should always > be page aligned, like any intermediate level table. Oh OK I get it now. 
> > The Magic value is needed only needed for the "entry" level due to the > fact that we may have lesser bits to resolve (i.e, depending on your > PAMax or in other words T0SZ) than the intermediate levels (where we > always resolve {PAGE_SHIFT - 3} bits. This is further complicated by the > fact that Stage2 could use different number of levels for a given T0SZ > than the stage1. > I acknowledge that the algorithm is a bit too cryptic and I spent quite > some time decoding it to the formula we use below ;-). I could update the > comment to : > > /* > * ARM VMSAv8-64 defines an algorithm for finding the translation table > * descriptors in section D4.2.8 in ARM DDI 0487C.a. > * > * The algorithm defines the expectations on the translation table > * addresses for each level, based on PAGE_SIZE, entry level > * and the translation table size (T0SZ). The variable "x" in the > * algorithm determines the alignment of a table base address at a given > * level and thus determines the alignment of VTTBR:BADDR for stage2 > * page table entry level. > * Since the number of bits resolved at the entry level could vary > * depending on the T0SZ, the value of "x" is defined based on a > * Magic constant for a given PAGE_SIZE and Entry Level. The > * intermediate levels must be always aligned to the PAGE_SIZE (i.e, > * x = PAGE_SHIFT). > * > * The value of "x" for entry level is calculated as : > * x = Magic_N - T0SZ > * Looks OK. Thank you for the explanation. Eric > > ... > > Suzuki