Message ID: 1522156531-28348-1-git-send-email-suzuki.poulose@arm.com
Series: kvm: arm64: Dynamic & 52bit IPA support
On Tue, Mar 27, 2018 at 02:15:11PM +0100, Suzuki K Poulose wrote:
> virtio-mmio with virtio-v1 uses a 32bit PFN for the queue.
> If the queue pfn is too large to fit in 32bits, which
> we could hit on arm64 systems with 52bit physical addresses
> (even with 64K page size), we simply miss out a proper link
> to the other side of the queue.
>
> Add a check to validate the PFN, rather than silently breaking
> the devices.
>
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: Jason Wang <jasowang@redhat.com>
> Cc: Marc Zyngier <marc.zyngier@arm.com>
> Cc: Christoffer Dall <cdall@kernel.org>
> Cc: Peter Maydel <peter.maydell@linaro.org>
> Cc: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>

OK - seems harmless so I will queue this. But I really think effort
should be spent on adding v1.0 support in QEMU.

> ---
> drivers/virtio/virtio_mmio.c | 18 ++++++++++++++++--
> 1 file changed, 16 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
> index 67763d3..b2f9b5c 100644
> --- a/drivers/virtio/virtio_mmio.c
> +++ b/drivers/virtio/virtio_mmio.c
> @@ -397,9 +397,21 @@ static struct virtqueue *vm_setup_vq(struct virtio_device *vdev, unsigned index,
>  	/* Activate the queue */
>  	writel(virtqueue_get_vring_size(vq), vm_dev->base + VIRTIO_MMIO_QUEUE_NUM);
>  	if (vm_dev->version == 1) {
> +		u64 q_pfn = virtqueue_get_desc_addr(vq) >> PAGE_SHIFT;
> +
> +		/*
> +		 * virtio-mmio v1 uses a 32bit QUEUE PFN. If we have something
> +		 * that doesn't fit in 32bit, fail the setup rather than
> +		 * pretending to be successful.
> +		 */
> +		if (q_pfn >> 32) {
> +			dev_err(&vdev->dev, "virtio-mmio: queue address too large\n");
> +			err = -ENOMEM;
> +			goto error_bad_pfn;
> +		}
> +
>  		writel(PAGE_SIZE, vm_dev->base + VIRTIO_MMIO_QUEUE_ALIGN);
> -		writel(virtqueue_get_desc_addr(vq) >> PAGE_SHIFT,
> -				vm_dev->base + VIRTIO_MMIO_QUEUE_PFN);
> +		writel(q_pfn, vm_dev->base + VIRTIO_MMIO_QUEUE_PFN);
>  	} else {
>  		u64 addr;
>
> @@ -430,6 +442,8 @@ static struct virtqueue *vm_setup_vq(struct virtio_device *vdev, unsigned index,
>
>  	return vq;
>
> +error_bad_pfn:
> +	vring_del_virtqueue(vq);
>  error_new_virtqueue:
>  	if (vm_dev->version == 1) {
>  		writel(0, vm_dev->base + VIRTIO_MMIO_QUEUE_PFN);
> --
> 2.7.4
On Tue, Mar 27, 2018 at 02:15:12PM +0100, Suzuki K Poulose wrote:
> Legacy PCI over virtio uses a 32bit PFN for the queue. If the
> queue pfn is too large to fit in 32bits, which we could hit on
> arm64 systems with 52bit physical addresses (even with 64K page
> size), we simply miss out a proper link to the other side of
> the queue.
>
> Add a check to validate the PFN, rather than silently breaking
> the devices.
>
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: Jason Wang <jasowang@redhat.com>
> Cc: Marc Zyngier <marc.zyngier@arm.com>
> Cc: Christoffer Dall <cdall@kernel.org>
> Cc: Peter Maydel <peter.maydell@linaro.org>
> Cc: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> ---
> drivers/virtio/virtio_pci_legacy.c | 12 ++++++++++--
> 1 file changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/virtio/virtio_pci_legacy.c b/drivers/virtio/virtio_pci_legacy.c
> index 2780886..4b84a75 100644
> --- a/drivers/virtio/virtio_pci_legacy.c
> +++ b/drivers/virtio/virtio_pci_legacy.c
> @@ -122,6 +122,7 @@ static struct virtqueue *setup_vq(struct virtio_pci_device *vp_dev,
>  	struct virtqueue *vq;
>  	u16 num;
>  	int err;
> +	u64 q_pfn;
>
>  	/* Select the queue we're interested in */
>  	iowrite16(index, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_SEL);
> @@ -141,9 +142,15 @@ static struct virtqueue *setup_vq(struct virtio_pci_device *vp_dev,
>  	if (!vq)
>  		return ERR_PTR(-ENOMEM);
>
> +	q_pfn = virtqueue_get_desc_addr(vq) >> VIRTIO_PCI_QUEUE_ADDR_SHIFT;
> +	if (q_pfn >> 32) {
> +		dev_err(&vp_dev->pci_dev->dev, "virtio-pci queue PFN too large\n");
> +		err = -ENOMEM;

ENOMEM seems wrong here. E2BIG?

> +		goto out_del_vq;
> +	}
> +
>  	/* activate the queue */
> -	iowrite32(virtqueue_get_desc_addr(vq) >> VIRTIO_PCI_QUEUE_ADDR_SHIFT,
> -		  vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN);
> +	iowrite32((u32)q_pfn, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN);

Is the cast really necessary here?

>
>  	vq->priv = (void __force *)vp_dev->ioaddr + VIRTIO_PCI_QUEUE_NOTIFY;
>
> @@ -160,6 +167,7 @@ static struct virtqueue *setup_vq(struct virtio_pci_device *vp_dev,
>
>  out_deactivate:
>  	iowrite32(0, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN);
> +out_del_vq:
>  	vring_del_virtqueue(vq);
>  	return ERR_PTR(err);
> }
> --
> 2.7.4
Hi Suzuki,

On 27/03/18 14:15, Suzuki K Poulose wrote:
> If the guest wants to use a larger physical address space place
> the RAM at upper half of the address space. Otherwise, it uses the
> default layout.
>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> ---
> arm/aarch32/include/kvm/kvm-arch.h | 1 +
> arm/aarch64/include/kvm/kvm-arch.h | 15 ++++++++++++---
> arm/include/arm-common/kvm-arch.h | 11 ++++++-----
> arm/kvm.c | 2 +-
> 4 files changed, 20 insertions(+), 9 deletions(-)
>
> diff --git a/arm/aarch32/include/kvm/kvm-arch.h b/arm/aarch32/include/kvm/kvm-arch.h
> index cd31e72..2d62aab 100644
> --- a/arm/aarch32/include/kvm/kvm-arch.h
> +++ b/arm/aarch32/include/kvm/kvm-arch.h
> @@ -4,6 +4,7 @@
>  #define ARM_KERN_OFFSET(...)	0x8000
>
>  #define ARM_MAX_MEMORY(...)	ARM_LOMAP_MAX_MEMORY

I guess this should now be ARM32_MAX_MEMORY?

Thanks,
Jean
Hi Suzuki,

On 27/03/18 14:15, Suzuki K Poulose wrote:
> We set VTCR_EL2 very early during the stage2 init and don't
> touch it ever. This is fine as we had a fixed IPA size. This
> patch changes the behavior to set the VTCR for a given VM,
> depending on its stage2 table. The common configuration for
> VTCR is still performed during the early init as we have to
> retain the hardware access flag update bits (VTCR_EL2_HA)
> per CPU (as they are only set for the CPUs which are capabile).

(Nit: capable)

> The bits defining the number of levels in the page table (SL0)
> and and the size of the Input address to the translation (T0SZ)
> are programmed for each VM upon entry to the guest.

> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> index 870f4b1..5ccd3ae 100644
> --- a/arch/arm64/kvm/hyp/switch.c
> +++ b/arch/arm64/kvm/hyp/switch.c
> @@ -164,6 +164,12 @@ static void __hyp_text __deactivate_traps(struct kvm_vcpu *vcpu)
>  static void __hyp_text __activate_vm(struct kvm_vcpu *vcpu)
>  {
>  	struct kvm *kvm = kern_hyp_va(vcpu->kvm);
> +	u64 vtcr = read_sysreg(vtcr_el2);
> +
> +	vtcr &= ~VTCR_EL2_PRIVATE_MASK;
> +	vtcr |= VTCR_EL2_SL0(kvm_stage2_levels(kvm)) |
> +		VTCR_EL2_T0SZ(kvm_phys_shift(kvm));
> +	write_sysreg(vtcr, vtcr_el2);
>  	write_sysreg(kvm->arch.vttbr, vttbr_el2);
>  }

Do we need to set this register for tlb maintenance too?
e.g. tlbi for a 3-level-stage2 vm when a 2-level-stage2 vm's vtcr is loaded...

(The ARM-ARM has 'Any of the bits of VTCR_EL2 are permitted to be cached in a TLB'.)

Thanks,
James
On 03/04/18 15:58, James Morse wrote:
> Hi Suzuki,
>
> On 27/03/18 14:15, Suzuki K Poulose wrote:
>> We set VTCR_EL2 very early during the stage2 init and don't
>> touch it ever. This is fine as we had a fixed IPA size. This
>> patch changes the behavior to set the VTCR for a given VM,
>> depending on its stage2 table. The common configuration for
>> VTCR is still performed during the early init as we have to
>> retain the hardware access flag update bits (VTCR_EL2_HA)
>> per CPU (as they are only set for the CPUs which are capabile).
>
> (Nit: capable)

Thanks for spotting, will fix it.

>> The bits defining the number of levels in the page table (SL0)
>> and and the size of the Input address to the translation (T0SZ)
>> are programmed for each VM upon entry to the guest.
>
>> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
>> index 870f4b1..5ccd3ae 100644
>> --- a/arch/arm64/kvm/hyp/switch.c
>> +++ b/arch/arm64/kvm/hyp/switch.c
>> @@ -164,6 +164,12 @@ static void __hyp_text __deactivate_traps(struct kvm_vcpu *vcpu)
>>  static void __hyp_text __activate_vm(struct kvm_vcpu *vcpu)
>>  {
>>  	struct kvm *kvm = kern_hyp_va(vcpu->kvm);
>> +	u64 vtcr = read_sysreg(vtcr_el2);
>> +
>> +	vtcr &= ~VTCR_EL2_PRIVATE_MASK;
>> +	vtcr |= VTCR_EL2_SL0(kvm_stage2_levels(kvm)) |
>> +		VTCR_EL2_T0SZ(kvm_phys_shift(kvm));
>> +	write_sysreg(vtcr, vtcr_el2);
>>  	write_sysreg(kvm->arch.vttbr, vttbr_el2);
>>  }
>
> Do we need to set this register for tlb maintenance too?
> e.g. tlbi for a 3-level-stage2 vm when a 2-level-stage2 vm's vtcr is loaded...
>
> (The ARM-ARM has 'Any of the bits of VTCR_EL2 are permitted to be cached in a TLB'.)

You're right. We need to set the VTCR for the tlb operations. I think
we can do this by hooking it to the __tlb_switch_to_guest() routine.
Will address it in the next version.

Cheers
Suzuki
On 27 March 2018 at 14:15, Suzuki K Poulose <suzuki.poulose@arm.com> wrote:
> Expose the maximum physical address size supported by the host
> for a VM. This could be later used by the userspace to choose the
> appropriate size for a given VM. The limit is determined as the
> minimum of actual CPU limit, the kernel limit (i.e, either 48 or 52)
> and the stage2 page table support limit (which is 40bits at the moment).
> For backward compatibility, we support a minimum of 40bits. The limit
> will be lifted as we add support for the stage2 to support the host
> kernel PA limit.
>
> This value may be different from what is exposed to the VM via
> CPU ID registers. The limit only applies to the stage2 page table.
>
> Cc: Christoffer Dall <cdall@kernel.org>
> Cc: Marc Zyngier <marc.zyngier@arm.com>
> Cc: Peter Maydel <peter.maydell@linaro.org>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> ---
> Documentation/virtual/kvm/api.txt | 14 ++++++++++++++
> arch/arm/include/asm/kvm_mmu.h | 5 +++++
> arch/arm64/include/asm/kvm_mmu.h | 5 +++++
> include/uapi/linux/kvm.h | 6 ++++++
> virt/kvm/arm/arm.c | 6 ++++++
> 5 files changed, 36 insertions(+)
>
> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> index 792fa87..55908a8 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -3500,6 +3500,20 @@ Returns: 0 on success; -1 on error
>  This ioctl can be used to unregister the guest memory region registered
>  with KVM_MEMORY_ENCRYPT_REG_REGION ioctl above.
>
> +4.113 KVM_ARM_GET_MAX_VM_PHYS_SHIFT
> +Capability: basic
> +Architectures: arm, arm64
> +Type: system ioctl
> +Parameters: none
> +Returns: log2(Maximum physical address space size) supported by the
> +hyperviosr.

typo: "hypervisor".

> +
> +This ioctl can be used to identify the maximum physical address space size
> +supported by the hypervisor.

Is that the physical address space on the host, or the physical
address space size we present to the guest?

> The returned value indicates the maximum size
> +of the address that can be resolved by the stage2 translation table on
> +arm/arm64. On arm64, the value is decided based on the host kernel
> +configuration and the system wide safe value of ID_AA64MMFR0_EL1:PARange.
> +This may not match the value exposed to the VM in CPU ID registers.

Isn't it likely to confuse the guest if we lie to it about the PA range it
sees? When would the two values differ?

Do we also need a 'set' operation, so userspace can create a VM
that has a 40 bit userspace on a CPU that supports more than that,
or does it just work?

What's the x86 API for KVM to tell userspace about physical address
range restrictions?

thanks
-- PMM
Hi Suzuki,

I haven't had a chance to look at the code but noticed one issue below.

Suzuki K Poulose <suzuki.poulose@arm.com> writes:

> Now that we can manage the stage2 page table per VM, switch the
> configuration details to per VM instance. We keep track of the
> IPA bits, number of page table levels and the VTCR bits (which
> depends on the IPA and the number of levels). While at it, remove
> unused pgd_lock field from kvm_arch for arm64.
>
> Cc: Marc Zyngier <marc.zyngier@arm.com>
> Cc: Christoffer Dall <cdall@kernel.org>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> ---
> arch/arm64/include/asm/kvm_host.h | 14 ++++++++++++--
> arch/arm64/include/asm/kvm_mmu.h | 11 +++++++++--
> arch/arm64/include/asm/stage2_pgtable.h | 1 -
> arch/arm64/kvm/hyp/switch.c | 3 +--
> virt/kvm/arm/mmu.c | 4 ++++
> 5 files changed, 26 insertions(+), 7 deletions(-)

[...]

> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> index bb458bf..e86d7f4 100644
> --- a/arch/arm64/include/asm/kvm_mmu.h
> +++ b/arch/arm64/include/asm/kvm_mmu.h
> @@ -136,9 +136,10 @@ static inline unsigned long __kern_hyp_va(unsigned long v)
>   */
>  #define KVM_PHYS_SHIFT	(40)
>
> -#define kvm_phys_shift(kvm)		KVM_PHYS_SHIFT
> +#define kvm_phys_shift(kvm)		(kvm->arch.phys_shift)
>  #define kvm_phys_size(kvm)		(_AC(1, ULL) << kvm_phys_shift(kvm))
>  #define kvm_phys_mask(kvm)		(kvm_phys_size(kvm) - _AC(1, ULL))
> +#define kvm_stage2_levels(kvm)		(kvm->arch.s2_levels)
>
>  static inline bool kvm_page_empty(void *ptr)
>  {

[...]

> diff --git a/arch/arm64/include/asm/stage2_pgtable.h b/arch/arm64/include/asm/stage2_pgtable.h
> index 33e8ebb..9b75b83 100644
> --- a/arch/arm64/include/asm/stage2_pgtable.h
> +++ b/arch/arm64/include/asm/stage2_pgtable.h
> @@ -44,7 +44,6 @@
>   */
>  #define __s2_pgd_ptrs(pa, lvls)	(1 << ((pa) - pt_levels_pgdir_shift((lvls))))
>
> -#define kvm_stage2_levels(kvm)		stage2_pt_levels(kvm_phys_shift(kvm))
>  #define stage2_pgdir_shift(kvm)	\
>  		pt_levels_pgdir_shift(kvm_stage2_levels(kvm))
>  #define stage2_pgdir_size(kvm)	(_AC(1, UL) << stage2_pgdir_shift((kvm)))

[...]

> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
> index 7a264c6..746f38e 100644
> --- a/virt/kvm/arm/mmu.c
> +++ b/virt/kvm/arm/mmu.c
> @@ -753,6 +753,10 @@ int kvm_alloc_stage2_pgd(struct kvm *kvm)
>  		return -EINVAL;
>  	}
>
> +	/* Make sure we have the stage2 configured for this VM */
> +	if (WARN_ON(!kvm_stage2_levels(kvm)))
> +		return -EINVAL;
> +

This hunk breaks the 32-bit build as kvm_stage2_levels() isn't defined
on arm.

Thanks,
Punit

>  	/* Allocate the HW PGD, making sure that each page gets its own refcount */
>  	pgd = alloc_pages_exact(stage2_pgd_size(kvm), GFP_KERNEL | __GFP_ZERO);
>  	if (!pgd)
On 13/04/18 14:21, Peter Maydell wrote:
> On 27 March 2018 at 14:15, Suzuki K Poulose <suzuki.poulose@arm.com> wrote:
>> Expose the maximum physical address size supported by the host
>> for a VM. This could be later used by the userspace to choose the
>> appropriate size for a given VM. The limit is determined as the
>> minimum of actual CPU limit, the kernel limit (i.e, either 48 or 52)
>> and the stage2 page table support limit (which is 40bits at the moment).
>> For backward compatibility, we support a minimum of 40bits. The limit
>> will be lifted as we add support for the stage2 to support the host
>> kernel PA limit.
>>
>> This value may be different from what is exposed to the VM via
>> CPU ID registers. The limit only applies to the stage2 page table.
>>
>> Cc: Christoffer Dall <cdall@kernel.org>
>> Cc: Marc Zyngier <marc.zyngier@arm.com>
>> Cc: Peter Maydel <peter.maydell@linaro.org>
>> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
>> ---
>> Documentation/virtual/kvm/api.txt | 14 ++++++++++++++
>> arch/arm/include/asm/kvm_mmu.h | 5 +++++
>> arch/arm64/include/asm/kvm_mmu.h | 5 +++++
>> include/uapi/linux/kvm.h | 6 ++++++
>> virt/kvm/arm/arm.c | 6 ++++++
>> 5 files changed, 36 insertions(+)
>>
>> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
>> index 792fa87..55908a8 100644
>> --- a/Documentation/virtual/kvm/api.txt
>> +++ b/Documentation/virtual/kvm/api.txt
>> @@ -3500,6 +3500,20 @@ Returns: 0 on success; -1 on error
>>  This ioctl can be used to unregister the guest memory region registered
>>  with KVM_MEMORY_ENCRYPT_REG_REGION ioctl above.
>>
>> +4.113 KVM_ARM_GET_MAX_VM_PHYS_SHIFT
>> +Capability: basic
>> +Architectures: arm, arm64
>> +Type: system ioctl
>> +Parameters: none
>> +Returns: log2(Maximum physical address space size) supported by the
>> +hyperviosr.
>
> typo: "hypervisor".

Will fix it.

>> +
>> +This ioctl can be used to identify the maximum physical address space size
>> +supported by the hypervisor.
>
> Is that the physical address space on the host, or the physical
> address space size we present to the guest?

It is the size of the address space we present to the guest. I will
update the documentation to make it more clear.

>> The returned value indicates the maximum size
>> +of the address that can be resolved by the stage2 translation table on
>> +arm/arm64. On arm64, the value is decided based on the host kernel
>> +configuration and the system wide safe value of ID_AA64MMFR0_EL1:PARange.
>> +This may not match the value exposed to the VM in CPU ID registers.
>
> Isn't it likely to confuse the guest if we lie to it about the PA range it
> sees? When would the two values differ?

On a heterogeneous system, the guest could see different values of
PARange on the same VCPU. So that is not safe for a guest at the moment.
Ideally, we should emulate the PARange to provide the system wide safe
value, which the guest can read.

We don't touch the emulation of PARange in the ID registers in this set.
All we do is (in the next patches) limiting the address space size
provided to the guest. Maybe we could update PARange to the limit
imposed and emulate the field.

> Do we also need a 'set' operation, so userspace can create a VM
> that has a 40 bit userspace on a CPU that supports more than that,
> or does it just work?

It just works as before, creating a 40bit userspace, without any
additional steps. All we do is, allowing to create a VM with bigger
address space by specifying the size in the "type" field.

The other question is, does it really matter what a guest sees in
PARange and what it is provided with? e.g, on my Juno, the A53's have
40bit and A57 has 44bit, while the system uses only 40bit. This will be
true even with the new change. i.e, we don't allow a size beyond the
limit supported by all the CPUs on the system.

> What's the x86 API for KVM to tell userspace about physical address
> range restrictions?

From a quick look, the limit comes from cpuid (leaf 0x80000008 ?). So,
it could be via the existing per-VCPU get/set_cpuid{,2}() API on x86.

Suzuki
On 13/04/18 17:27, Punit Agrawal wrote:
> Hi Suzuki,
>
> I haven't had a chance to look at the code but noticed one issue below.
>
> Suzuki K Poulose <suzuki.poulose@arm.com> writes:
>
>> Now that we can manage the stage2 page table per VM, switch the
>> configuration details to per VM instance. We keep track of the
>> IPA bits, number of page table levels and the VTCR bits (which
>> depends on the IPA and the number of levels). While at it, remove
>> unused pgd_lock field from kvm_arch for arm64.
>>
>> Cc: Marc Zyngier <marc.zyngier@arm.com>
>> Cc: Christoffer Dall <cdall@kernel.org>
>> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
>> ---
>> arch/arm64/include/asm/kvm_host.h | 14 ++++++++++++--
>> arch/arm64/include/asm/kvm_mmu.h | 11 +++++++++--
>> arch/arm64/include/asm/stage2_pgtable.h | 1 -
>> arch/arm64/kvm/hyp/switch.c | 3 +--
>> virt/kvm/arm/mmu.c | 4 ++++
>> 5 files changed, 26 insertions(+), 7 deletions(-)
>
> [...]
>
>> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
>> index bb458bf..e86d7f4 100644
>> --- a/arch/arm64/include/asm/kvm_mmu.h
>> +++ b/arch/arm64/include/asm/kvm_mmu.h
>> @@ -136,9 +136,10 @@ static inline unsigned long __kern_hyp_va(unsigned long v)
>>   */
>>  #define KVM_PHYS_SHIFT	(40)
>>
>> -#define kvm_phys_shift(kvm)		KVM_PHYS_SHIFT
>> +#define kvm_phys_shift(kvm)		(kvm->arch.phys_shift)
>>  #define kvm_phys_size(kvm)		(_AC(1, ULL) << kvm_phys_shift(kvm))
>>  #define kvm_phys_mask(kvm)		(kvm_phys_size(kvm) - _AC(1, ULL))
>> +#define kvm_stage2_levels(kvm)		(kvm->arch.s2_levels)
>>
>>  static inline bool kvm_page_empty(void *ptr)
>>  {
>
> [...]
>
>> diff --git a/arch/arm64/include/asm/stage2_pgtable.h b/arch/arm64/include/asm/stage2_pgtable.h
>> index 33e8ebb..9b75b83 100644
>> --- a/arch/arm64/include/asm/stage2_pgtable.h
>> +++ b/arch/arm64/include/asm/stage2_pgtable.h
>> @@ -44,7 +44,6 @@
>>   */
>>  #define __s2_pgd_ptrs(pa, lvls)	(1 << ((pa) - pt_levels_pgdir_shift((lvls))))
>>
>> -#define kvm_stage2_levels(kvm)		stage2_pt_levels(kvm_phys_shift(kvm))
>>  #define stage2_pgdir_shift(kvm)	\
>>  		pt_levels_pgdir_shift(kvm_stage2_levels(kvm))
>>  #define stage2_pgdir_size(kvm)	(_AC(1, UL) << stage2_pgdir_shift((kvm)))
>
> [...]
>
>> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
>> index 7a264c6..746f38e 100644
>> --- a/virt/kvm/arm/mmu.c
>> +++ b/virt/kvm/arm/mmu.c
>> @@ -753,6 +753,10 @@ int kvm_alloc_stage2_pgd(struct kvm *kvm)
>>  		return -EINVAL;
>>  	}
>>
>> +	/* Make sure we have the stage2 configured for this VM */
>> +	if (WARN_ON(!kvm_stage2_levels(kvm)))
>> +		return -EINVAL;
>> +
>
> This hunk breaks the 32-bit build as kvm_stage2_levels() isn't defined
> on arm.

Thanks for spotting. I have fixed this locally in my next version to
check for the kvm_phys_shift(), as I plan to delay the levels selection
to the actual allocation of the table, giving us a fall back to increase
the level if we are unable to allocate contiguous pages (e.g, 16 * 64K
pages with say 46bit IPA).

Cheers
Suzuki
Hi Suzuki,

On 27/03/18 14:15, Suzuki K Poulose wrote:
> So far we had a static stage2 page table handling code, based on a
> fixed IPA of 40bits. As we prepare for a configurable IPA size per
> VM, make the our stage2 page table code dynamic to do the right thing
> for a given VM.
>
> Support for the IPA size configuration needs other changes in the way
> we configure the EL2 registers (VTTBR and VTCR). So, the IPA is still
> fixed to 40bits. The patch also moves the kvm_page_empty() in asm/kvm_mmu.h
> to the top, before including the asm/stage2_pgtable.h to avoid a forward
> declaration.
>
> Cc: Marc Zyngier <marc.zyngier@arm.com>
> Cc: Christoffer Dall <cdall@kernel.org>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> ---
> arch/arm64/include/asm/kvm_mmu.h | 16 +-
> arch/arm64/include/asm/stage2_pgtable-nopmd.h | 42 ------
> arch/arm64/include/asm/stage2_pgtable-nopud.h | 39 -----
> arch/arm64/include/asm/stage2_pgtable.h | 203 +++++++++++++++++---------
> 4 files changed, 145 insertions(+), 155 deletions(-)
> delete mode 100644 arch/arm64/include/asm/stage2_pgtable-nopmd.h
> delete mode 100644 arch/arm64/include/asm/stage2_pgtable-nopud.h
>
> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> index 594c4e6..bc133ce 100644
> --- a/arch/arm64/include/asm/kvm_mmu.h
> +++ b/arch/arm64/include/asm/kvm_mmu.h
> @@ -18,9 +18,10 @@
>  #ifndef __ARM64_KVM_MMU_H__
>  #define __ARM64_KVM_MMU_H__
>
> +#include <asm/cpufeature.h>
>  #include <asm/page.h>
>  #include <asm/memory.h>
> -#include <asm/cpufeature.h>

OOI, why was the include of cpufeature.h moved earlier?

Cheers,
On 25/04/18 17:35, Julien Grall wrote:
> Hi Suzuki,
>
> On 27/03/18 14:15, Suzuki K Poulose wrote:
>> So far we had a static stage2 page table handling code, based on a
>> fixed IPA of 40bits. As we prepare for a configurable IPA size per
>> VM, make the our stage2 page table code dynamic to do the right thing
>> for a given VM.
>>
>> Support for the IPA size configuration needs other changes in the way
>> we configure the EL2 registers (VTTBR and VTCR). So, the IPA is still
>> fixed to 40bits. The patch also moves the kvm_page_empty() in asm/kvm_mmu.h
>> to the top, before including the asm/stage2_pgtable.h to avoid a forward
>> declaration.
>>
>> Cc: Marc Zyngier <marc.zyngier@arm.com>
>> Cc: Christoffer Dall <cdall@kernel.org>
>> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
>> ---
>> arch/arm64/include/asm/kvm_mmu.h | 16 +-
>> arch/arm64/include/asm/stage2_pgtable-nopmd.h | 42 ------
>> arch/arm64/include/asm/stage2_pgtable-nopud.h | 39 -----
>> arch/arm64/include/asm/stage2_pgtable.h | 203 +++++++++++++++++---------
>> 4 files changed, 145 insertions(+), 155 deletions(-)
>> delete mode 100644 arch/arm64/include/asm/stage2_pgtable-nopmd.h
>> delete mode 100644 arch/arm64/include/asm/stage2_pgtable-nopud.h
>>
>> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
>> index 594c4e6..bc133ce 100644
>> --- a/arch/arm64/include/asm/kvm_mmu.h
>> +++ b/arch/arm64/include/asm/kvm_mmu.h
>> @@ -18,9 +18,10 @@
>>  #ifndef __ARM64_KVM_MMU_H__
>>  #define __ARM64_KVM_MMU_H__
>>
>> +#include <asm/cpufeature.h>
>>  #include <asm/page.h>
>>  #include <asm/memory.h>
>> -#include <asm/cpufeature.h>
>
> OOI, why was the include of cpufeature.h moved earlier?

I think it was a minor cleanup to keep the includes in order. I will
either mention it or drop it from the hunk altogether.

Cheers
Suzuki
Hi Suzuki,

On 27/03/18 14:15, Suzuki K Poulose wrote:
> This patch rearranges the page table level helpers so that it can
> be reused for a page table with different number of levels
> (e.g, stage2 page table for a VM) than the kernel page tables.
> As such there is no functional change with this patch.
>
> The page table helpers are defined to do the right thing for the
> fixed page table levels set for the kernel. This patch tries to
> refactor the code such that, we can have helpers for each level,
> which should be used when the caller knows that the level exists
> for the page table dealt with. Since the kernel defines helpers
> p.d_action and __p.d_action, for consistency, we name the raw
> page table action helpers __raw_p.d_action.
>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Will Deacon <will.deacon@arm.com>
> Cc: Steve Capper <steve.capper@arm.com>
> Cc: Marc Zyngier <marc.zyngier@arm.com>
> Cc: Christoffer Dall <cdall@kernel.org>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>

Reviewed-by: Julien Grall <julien.grall@arm.com>

Cheers,

> ---
> arch/arm64/include/asm/pgalloc.h | 34 +++++++++++++++++++----
> arch/arm64/include/asm/pgtable.h | 60 +++++++++++++++++++++++++---------------
> 2 files changed, 66 insertions(+), 28 deletions(-)
>
> diff --git a/arch/arm64/include/asm/pgalloc.h b/arch/arm64/include/asm/pgalloc.h
> index 2e05bcd..ab4662c 100644
> --- a/arch/arm64/include/asm/pgalloc.h
> +++ b/arch/arm64/include/asm/pgalloc.h
> @@ -29,6 +29,30 @@
>  #define PGALLOC_GFP	(GFP_KERNEL | __GFP_ZERO)
>  #define PGD_SIZE	(PTRS_PER_PGD * sizeof(pgd_t))
>
> +static inline void __raw_pmd_free(pmd_t *pmdp)
> +{
> +	BUG_ON((unsigned long)pmdp & (PAGE_SIZE-1));
> +	free_page((unsigned long)pmdp);
> +}
> +
> +static inline void
> +__raw_pud_populate(pud_t *pudp, phys_addr_t pmdp, pudval_t prot)
> +{
> +	__raw_set_pud(pudp, __pud(__phys_to_pud_val(pmdp) | prot));
> +}
> +
> +static inline void __raw_pud_free(pud_t *pudp)
> +{
> +	BUG_ON((unsigned long)pudp & (PAGE_SIZE-1));
> +	free_page((unsigned long)pudp);
> +}
> +
> +static inline void
> +__raw_pgd_populate(pgd_t *pgdp, phys_addr_t pudp, pgdval_t prot)
> +{
> +	__raw_set_pgd(pgdp, __pgd(__phys_to_pgd_val(pudp) | prot));
> +}
> +
>  #if CONFIG_PGTABLE_LEVELS > 2
>
>  static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr)
> @@ -38,13 +62,12 @@ static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr)
>
>  static inline void pmd_free(struct mm_struct *mm, pmd_t *pmdp)
>  {
> -	BUG_ON((unsigned long)pmdp & (PAGE_SIZE-1));
> -	free_page((unsigned long)pmdp);
> +	__raw_pmd_free(pmdp);
>  }
>
>  static inline void __pud_populate(pud_t *pudp, phys_addr_t pmdp, pudval_t prot)
>  {
> -	set_pud(pudp, __pud(__phys_to_pud_val(pmdp) | prot));
> +	__raw_pud_populate(pudp, pmdp, prot);
>  }
>
>  static inline void pud_populate(struct mm_struct *mm, pud_t *pudp, pmd_t *pmdp)
> @@ -67,13 +90,12 @@ static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long addr)
>
>  static inline void pud_free(struct mm_struct *mm, pud_t *pudp)
>  {
> -	BUG_ON((unsigned long)pudp & (PAGE_SIZE-1));
> -	free_page((unsigned long)pudp);
> +	__raw_pud_free(pudp);
>  }
>
>  static inline void __pgd_populate(pgd_t *pgdp, phys_addr_t pudp, pgdval_t prot)
>  {
> -	set_pgd(pgdp, __pgd(__phys_to_pgd_val(pudp) | prot));
> +	__raw_pgd_populate(pgdp, pudp, prot);
>  }
>
>  static inline void pgd_populate(struct mm_struct *mm, pgd_t *pgdp, pud_t *pudp)
> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index 7e2c27e..5e22503 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -475,31 +475,39 @@ static inline phys_addr_t pmd_page_paddr(pmd_t pmd)
>   */
>  #define mk_pte(page,prot)	pfn_pte(page_to_pfn(page),prot)
>
> -#if CONFIG_PGTABLE_LEVELS > 2
> +#define __raw_pud_none(pud)	(!pud_val(pud))
> +#define __raw_pud_bad(pud)	(!(pud_val(pud) & PUD_TABLE_BIT))
> +#define __raw_pud_present(pud)	pte_present(pud_pte(pud))
>
> -#define pmd_ERROR(pmd)		__pmd_error(__FILE__, __LINE__, pmd_val(pmd))
> -
> -#define pud_none(pud)		(!pud_val(pud))
> -#define pud_bad(pud)		(!(pud_val(pud) & PUD_TABLE_BIT))
> -#define pud_present(pud)	pte_present(pud_pte(pud))
> -
> -static inline void set_pud(pud_t *pudp, pud_t pud)
> +static inline void __raw_set_pud(pud_t *pudp, pud_t pud)
>  {
>  	WRITE_ONCE(*pudp, pud);
>  	dsb(ishst);
>  	isb();
>  }
>
> -static inline void pud_clear(pud_t *pudp)
> +static inline void __raw_pud_clear(pud_t *pudp)
>  {
> -	set_pud(pudp, __pud(0));
> +	__raw_set_pud(pudp, __pud(0));
>  }
>
> -static inline phys_addr_t pud_page_paddr(pud_t pud)
> +static inline phys_addr_t __raw_pud_page_paddr(pud_t pud)
>  {
>  	return __pud_to_phys(pud);
>  }
>
> +#if CONFIG_PGTABLE_LEVELS > 2
> +
> +#define pmd_ERROR(pmd)		__pmd_error(__FILE__, __LINE__, pmd_val(pmd))
> +
> +#define pud_none(pud)		__raw_pud_none(pud)
> +#define pud_bad(pud)		__raw_pud_bad(pud)
> +#define pud_present(pud)	__raw_pud_present(pud)
> +
> +#define set_pud(pudp, pud)	__raw_set_pud((pudp), (pud))
> +#define pud_clear(pudp)		__raw_pud_clear((pudp))
> +#define pud_page_paddr(pud)	__raw_pud_page_paddr((pud))
> +
>  /* Find an entry in the second-level page table. */
>  #define pmd_index(addr)		(((addr) >> PMD_SHIFT) & (PTRS_PER_PMD - 1))
>
> @@ -528,30 +536,38 @@ static inline phys_addr_t pud_page_paddr(pud_t pud)
>
>  #endif	/* CONFIG_PGTABLE_LEVELS > 2 */
>
> -#if CONFIG_PGTABLE_LEVELS > 3
> +#define __raw_pgd_none(pgd)	(!pgd_val(pgd))
> +#define __raw_pgd_bad(pgd)	(!(pgd_val(pgd) & 2))
> +#define __raw_pgd_present(pgd)	(pgd_val(pgd))
>
> -#define pud_ERROR(pud)		__pud_error(__FILE__, __LINE__, pud_val(pud))
> -
> -#define pgd_none(pgd)		(!pgd_val(pgd))
> -#define pgd_bad(pgd)		(!(pgd_val(pgd) & 2))
> -#define pgd_present(pgd)	(pgd_val(pgd))
> -
> -static inline void set_pgd(pgd_t *pgdp, pgd_t pgd)
> +static inline void __raw_set_pgd(pgd_t *pgdp, pgd_t pgd)
>  {
>  	WRITE_ONCE(*pgdp, pgd);
>  	dsb(ishst);
>  }
>
> -static inline void pgd_clear(pgd_t *pgdp)
> +static inline void __raw_pgd_clear(pgd_t *pgdp)
>  {
> -	set_pgd(pgdp, __pgd(0));
> +	__raw_set_pgd(pgdp, __pgd(0));
>  }
>
> -static inline phys_addr_t pgd_page_paddr(pgd_t pgd)
> +static inline phys_addr_t __raw_pgd_page_paddr(pgd_t pgd)
>  {
>  	return __pgd_to_phys(pgd);
>  }
>
> +#if CONFIG_PGTABLE_LEVELS > 3
> +
> +#define pud_ERROR(pud)		__pud_error(__FILE__, __LINE__, pud_val(pud))
> +
> +#define pgd_none(pgd)		__raw_pgd_none((pgd))
> +#define pgd_bad(pgd)		__raw_pgd_bad((pgd))
> +#define pgd_present(pgd)	__raw_pgd_present((pgd))
> +
> +#define set_pgd(pgdp, pgd)	__raw_set_pgd((pgdp), (pgd))
> +#define pgd_clear(pgdp)		__raw_pgd_clear((pgdp))
> +#define pgd_page_paddr(pgd)	__raw_pgd_page_paddr((pgd))
> +
>  /* Find an entry in the frst-level page table. */
>  #define pud_index(addr)		(((addr) >> PUD_SHIFT) & (PTRS_PER_PUD - 1))
>
Hi Suzuki,

On 27/03/18 14:15, Suzuki K Poulose wrote:
> Make pud_huge reusable for stage2 tables, independent
> of the stage1 levels.
>
> Cc: Steve Capper <steve.capper@arm.com>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Will Deacon <will.deacon@arm.com>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Christoffer Dall <cdall@kernel.org>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>

Reviewed-by: Julien Grall <julien.grall@arm.com>

Cheers,

> ---
> arch/arm64/include/asm/pgtable.h | 5 +++++
> arch/arm64/mm/hugetlbpage.c | 2 +-
> 2 files changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index 5e22503..4a16c11 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -479,6 +479,11 @@ static inline phys_addr_t pmd_page_paddr(pmd_t pmd)
>  #define __raw_pud_bad(pud)	(!(pud_val(pud) & PUD_TABLE_BIT))
>  #define __raw_pud_present(pud)	pte_present(pud_pte(pud))
>
> +static inline int __raw_pud_huge(pud_t pud)
> +{
> +	return pud_val(pud) && !(pud_val(pud) & PUD_TABLE_BIT);
> +}
> +
>  static inline void __raw_set_pud(pud_t *pudp, pud_t pud)
>  {
>  	WRITE_ONCE(*pudp, pud);
> diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
> index ecc6818..3f4bb43 100644
> --- a/arch/arm64/mm/hugetlbpage.c
> +++ b/arch/arm64/mm/hugetlbpage.c
> @@ -35,7 +35,7 @@ int pmd_huge(pmd_t pmd)
>  int pud_huge(pud_t pud)
>  {
>  #ifndef __PAGETABLE_PMD_FOLDED
> -	return pud_val(pud) && !(pud_val(pud) & PUD_TABLE_BIT);
> +	return __raw_pud_huge(pud);
>  #else
>  	return 0;
>  #endif
>  }
Hi Suzuki, On 27/03/18 14:15, Suzuki K Poulose wrote: > Add a helper to convert ID_AA64MMFR0_EL1:PARange to they physical > size shift. Limit the size to the maximum supported by the kernel. > We are about to move the user of this code and this helps to > keep the changes cleaner. It is probably worth to mention that you are also adding 52-bit support in the patch. Cheers, > > Cc: Mark Rutland <mark.rutland@arm.com> > Cc: Catalin Marinas <catalin.marinas@arm.com> > Cc: Will Deacon <will.deacon@arm.com> > Cc: Marc Zyngier <marc.zyngier@arm.com> > Cc: Christoffer Dall <cdall@kernel.org> > Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> > --- > arch/arm64/include/asm/cpufeature.h | 16 ++++++++++++++++ > arch/arm64/kvm/hyp/s2-setup.c | 28 +++++----------------------- > 2 files changed, 21 insertions(+), 23 deletions(-) > > diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h > index fbf0aab..1f2a5dd 100644 > --- a/arch/arm64/include/asm/cpufeature.h > +++ b/arch/arm64/include/asm/cpufeature.h > @@ -311,6 +311,22 @@ static inline u64 read_zcr_features(void) > return zcr; > } > > +static inline u32 id_aa64mmfr0_parange_to_phys_shift(int parange) > +{ > + switch (parange) { > + case 0: return 32; > + case 1: return 36; > + case 2: return 40; > + case 3: return 42; > + case 4: return 44; > + /* Report 48 bit if the kernel doesn't support 52bit */ > + default: > + case 5: return 48; > +#ifdef CONFIG_ARM64_PA_BITS_52 > + case 6: return 52; > +#endif > + } > +} > #endif /* __ASSEMBLY__ */ > > #endif > diff --git a/arch/arm64/kvm/hyp/s2-setup.c b/arch/arm64/kvm/hyp/s2-setup.c > index 603e1ee..b1129c8 100644 > --- a/arch/arm64/kvm/hyp/s2-setup.c > +++ b/arch/arm64/kvm/hyp/s2-setup.c > @@ -19,11 +19,13 @@ > #include <asm/kvm_arm.h> > #include <asm/kvm_asm.h> > #include <asm/kvm_hyp.h> > +#include <asm/cpufeature.h> > > u32 __hyp_text __init_stage2_translation(void) > { > u64 val = VTCR_EL2_FLAGS; > u64 parange; > + u32 phys_shift; > u64 
tmp; > > /* > @@ -37,27 +39,7 @@ u32 __hyp_text __init_stage2_translation(void) > val |= parange << 16; > > /* Compute the actual PARange... */ > - switch (parange) { > - case 0: > - parange = 32; > - break; > - case 1: > - parange = 36; > - break; > - case 2: > - parange = 40; > - break; > - case 3: > - parange = 42; > - break; > - case 4: > - parange = 44; > - break; > - case 5: > - default: > - parange = 48; > - break; > - } > + phys_shift = id_aa64mmfr0_parange_to_phys_shift(parange); > > /* > * ... and clamp it to 40 bits, unless we have some braindead > @@ -65,7 +47,7 @@ u32 __hyp_text __init_stage2_translation(void) > * return that value for the rest of the kernel to decide what > * to do. > */ > - val |= 64 - (parange > 40 ? 40 : parange); > + val |= 64 - (phys_shift > 40 ? 40 : phys_shift); > > /* > * Check the availability of Hardware Access Flag / Dirty Bit > @@ -86,5 +68,5 @@ u32 __hyp_text __init_stage2_translation(void) > > write_sysreg(val, vtcr_el2); > > - return parange; > + return phys_shift; > } >
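The helper under review is a pure table lookup, so it can be exercised outside the kernel. The sketch below mirrors the quoted patch; the `CONFIG_ARM64_PA_BITS_52` guard is modelled as a plain macro here, purely for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for CONFIG_ARM64_PA_BITS_52; illustration only. */
#define PA_BITS_52_SUPPORTED 1

/* Mirror of the quoted id_aa64mmfr0_parange_to_phys_shift(): maps the
 * 4-bit ID_AA64MMFR0_EL1.PARange field value to a physical address
 * shift. Reserved encodings (7 and up) fall through to the 48-bit
 * default. */
static uint32_t parange_to_phys_shift(int parange)
{
	switch (parange) {
	case 0: return 32;
	case 1: return 36;
	case 2: return 40;
	case 3: return 42;
	case 4: return 44;
	/* Report 48 bits if the kernel doesn't support 52-bit PAs */
	default:
	case 5: return 48;
#if PA_BITS_52_SUPPORTED
	case 6: return 52;
#endif
	}
}
```

Note that `default` is legal anywhere in the case list, which is what lets the reserved encodings share the 48-bit return with case 5.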
Hi Suzuki, On 27/03/18 14:15, Suzuki K Poulose wrote: > Right now the stage2 page table for a VM is hard coded, assuming > an IPA of 40bits. As we are about to add support for per VM IPA, > prepare the stage2 page table helpers to accept the kvm instance > to make the right decision for the VM. No functional changes. > > Cc: Marc Zyngier <marc.zyngier@arm.com> > Cc: Christoffer Dall <cdall@kernel.org> > Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> > --- > arch/arm/include/asm/kvm_arm.h | 3 +- > arch/arm/include/asm/kvm_mmu.h | 15 +++- > arch/arm/include/asm/stage2_pgtable.h | 42 ++++----- > arch/arm64/include/asm/kvm_mmu.h | 7 +- > arch/arm64/include/asm/stage2_pgtable-nopmd.h | 18 ++-- > arch/arm64/include/asm/stage2_pgtable-nopud.h | 16 ++-- > arch/arm64/include/asm/stage2_pgtable.h | 49 ++++++----- > virt/kvm/arm/arm.c | 2 +- > virt/kvm/arm/mmu.c | 119 +++++++++++++------------- > virt/kvm/arm/vgic/vgic-kvm-device.c | 2 +- > 10 files changed, 148 insertions(+), 125 deletions(-) > > diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h > index 3ab8b37..c3f1f9b 100644 > --- a/arch/arm/include/asm/kvm_arm.h > +++ b/arch/arm/include/asm/kvm_arm.h > @@ -133,8 +133,7 @@ > * space. > */ > #define KVM_PHYS_SHIFT (40) > -#define KVM_PHYS_SIZE (_AC(1, ULL) << KVM_PHYS_SHIFT) > -#define KVM_PHYS_MASK (KVM_PHYS_SIZE - _AC(1, ULL)) I assume you are moving them to kvm_mmu.h in order to match the arm64 side, right? If so, would not it make sense to make KVM_PHYS_SHIFT with it? [...] > diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h > index 7faed6e..594c4e6 100644 > --- a/arch/arm64/include/asm/kvm_mmu.h > +++ b/arch/arm64/include/asm/kvm_mmu.h > @@ -134,8 +134,11 @@ static inline unsigned long __kern_hyp_va(unsigned long v) > * We currently only support a 40bit IPA. 
> */ > #define KVM_PHYS_SHIFT (40) > -#define KVM_PHYS_SIZE (1UL << KVM_PHYS_SHIFT) > -#define KVM_PHYS_MASK (KVM_PHYS_SIZE - 1UL) > + > +#define kvm_phys_shift(kvm) KVM_PHYS_SHIFT > +#define kvm_phys_size(kvm) (_AC(1, ULL) << kvm_phys_shift(kvm)) NIT: is the _AC(...) necessary? It does not look like this function is going to be usable in assembly. > +#define kvm_phys_mask(kvm) (kvm_phys_size(kvm) - _AC(1, ULL)) Same here.
Hi Suzuki, On 27/03/18 14:15, Suzuki K Poulose wrote: > Specify the physical size for the VM encoded in the vm type. > > Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> > --- > arm/include/arm-common/kvm-arch.h | 6 +++++- > arm/kvm.c | 21 +++++++++++++++++++++ > 2 files changed, 26 insertions(+), 1 deletion(-) > > diff --git a/arm/include/arm-common/kvm-arch.h b/arm/include/arm-common/kvm-arch.h > index ca7ab0f..64bc92a 100644 > --- a/arm/include/arm-common/kvm-arch.h > +++ b/arm/include/arm-common/kvm-arch.h > @@ -42,7 +42,11 @@ > > #define KVM_IRQ_OFFSET GIC_SPI_IRQ_BASE > > -#define KVM_VM_TYPE 0 > +extern unsigned long kvm_arm_type; > +extern void kvm__arch_init_hyp(struct kvm *kvm); > + > +#define KVM_VM_TYPE kvm_arm_type > +#define kvm__arch_init_hyp kvm__arch_init_hyp > > #define VIRTIO_DEFAULT_TRANS(kvm) \ > ((kvm)->cfg.arch.virtio_trans_pci ? VIRTIO_PCI : VIRTIO_MMIO) > diff --git a/arm/kvm.c b/arm/kvm.c > index 5701d41..a9a9140 100644 > --- a/arm/kvm.c > +++ b/arm/kvm.c > @@ -11,6 +11,8 @@ > #include <linux/kvm.h> > #include <linux/sizes.h> > > +unsigned long kvm_arm_type; > + > struct kvm_ext kvm_req_ext[] = { > { DEFINE_KVM_EXT(KVM_CAP_IRQCHIP) }, > { DEFINE_KVM_EXT(KVM_CAP_ONE_REG) }, > @@ -18,6 +20,25 @@ struct kvm_ext kvm_req_ext[] = { > { 0, 0 }, > }; > > +#ifndef KVM_ARM_GET_MAX_VM_PHYS_SHIFT > +#define KVM_ARM_GET_MAX_VM_PHYS_SHIFT _IO(KVMIO, 0x0a) > +#endif Once the IOCTL is agreed, I think you would need to update include/linux/kvm.h. 
> + > +void kvm__arch_init_hyp(struct kvm *kvm) > +{ > + unsigned max_ipa; > + > + max_ipa = ioctl(kvm->sys_fd, KVM_ARM_GET_MAX_VM_PHYS_SHIFT); > + if (max_ipa < 0) > + max_ipa = 40; > + if (!kvm->cfg.arch.phys_shift) > + kvm->cfg.arch.phys_shift = 40; > + if (kvm->cfg.arch.phys_shift > max_ipa) > + die("Requested PA size (%u) is not supported by the host (%ubits)\n", > + kvm->cfg.arch.phys_shift, max_ipa); > + kvm_arm_type = kvm->cfg.arch.phys_shift; FWIW, older Linux will not accept a KVM type other than 0. So I think the best would be to bail out if KVM_ARM_GET_MAX_VM_PHYS_SHIFT does not exist. For safety, you could even check that arch.phys_shift is always 40 when running under older Linux. Cheers,
On 26/04/18 11:58, Julien Grall wrote: > Hi Suzuki, > > On 27/03/18 14:15, Suzuki K Poulose wrote: >> Add a helper to convert ID_AA64MMFR0_EL1:PARange to they physical >> size shift. Limit the size to the maximum supported by the kernel. >> We are about to move the user of this code and this helps to >> keep the changes cleaner. > > It is probably worth to mention that you are also adding 52-bit support in the patch. Sure, will do. Can I take that as a Reviewed-by with the fixed commit description ? Cheers Suzuki > > Cheers, > >> >> Cc: Mark Rutland <mark.rutland@arm.com> >> Cc: Catalin Marinas <catalin.marinas@arm.com> >> Cc: Will Deacon <will.deacon@arm.com> >> Cc: Marc Zyngier <marc.zyngier@arm.com> >> Cc: Christoffer Dall <cdall@kernel.org> >> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> >> --- >> arch/arm64/include/asm/cpufeature.h | 16 ++++++++++++++++ >> arch/arm64/kvm/hyp/s2-setup.c | 28 +++++----------------------- >> 2 files changed, 21 insertions(+), 23 deletions(-) >> >> diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h >> index fbf0aab..1f2a5dd 100644 >> --- a/arch/arm64/include/asm/cpufeature.h >> +++ b/arch/arm64/include/asm/cpufeature.h >> @@ -311,6 +311,22 @@ static inline u64 read_zcr_features(void) >> return zcr; >> } >> +static inline u32 id_aa64mmfr0_parange_to_phys_shift(int parange) >> +{ >> + switch (parange) { >> + case 0: return 32; >> + case 1: return 36; >> + case 2: return 40; >> + case 3: return 42; >> + case 4: return 44; >> + /* Report 48 bit if the kernel doesn't support 52bit */ >> + default: >> + case 5: return 48; >> +#ifdef CONFIG_ARM64_PA_BITS_52 >> + case 6: return 52; >> +#endif >> + } >> +} >> #endif /* __ASSEMBLY__ */ >> #endif >> diff --git a/arch/arm64/kvm/hyp/s2-setup.c b/arch/arm64/kvm/hyp/s2-setup.c >> index 603e1ee..b1129c8 100644 >> --- a/arch/arm64/kvm/hyp/s2-setup.c >> +++ b/arch/arm64/kvm/hyp/s2-setup.c >> @@ -19,11 +19,13 @@ >> #include <asm/kvm_arm.h> >> #include 
<asm/kvm_asm.h> >> #include <asm/kvm_hyp.h> >> +#include <asm/cpufeature.h> >> u32 __hyp_text __init_stage2_translation(void) >> { >> u64 val = VTCR_EL2_FLAGS; >> u64 parange; >> + u32 phys_shift; >> u64 tmp; >> /* >> @@ -37,27 +39,7 @@ u32 __hyp_text __init_stage2_translation(void) >> val |= parange << 16; >> /* Compute the actual PARange... */ >> - switch (parange) { >> - case 0: >> - parange = 32; >> - break; >> - case 1: >> - parange = 36; >> - break; >> - case 2: >> - parange = 40; >> - break; >> - case 3: >> - parange = 42; >> - break; >> - case 4: >> - parange = 44; >> - break; >> - case 5: >> - default: >> - parange = 48; >> - break; >> - } >> + phys_shift = id_aa64mmfr0_parange_to_phys_shift(parange); >> /* >> * ... and clamp it to 40 bits, unless we have some braindead >> @@ -65,7 +47,7 @@ u32 __hyp_text __init_stage2_translation(void) >> * return that value for the rest of the kernel to decide what >> * to do. >> */ >> - val |= 64 - (parange > 40 ? 40 : parange); >> + val |= 64 - (phys_shift > 40 ? 40 : phys_shift); >> /* >> * Check the availability of Hardware Access Flag / Dirty Bit >> @@ -86,5 +68,5 @@ u32 __hyp_text __init_stage2_translation(void) >> write_sysreg(val, vtcr_el2); >> - return parange; >> + return phys_shift; >> } >> >
On 27/04/18 16:18, Suzuki K Poulose wrote: > On 26/04/18 11:58, Julien Grall wrote: >> Hi Suzuki, >> >> On 27/03/18 14:15, Suzuki K Poulose wrote: >>> Add a helper to convert ID_AA64MMFR0_EL1:PARange to they physical >>> size shift. Limit the size to the maximum supported by the kernel. >>> We are about to move the user of this code and this helps to >>> keep the changes cleaner. >> >> It is probably worth to mention that you are also adding 52-bit >> support in the patch. > > Sure, will do. Can I take that as a Reviewed-by with the fixed > commit description ? Yes. Here we go: Reviewed-by: Julien Grall <julien.grall@arm.com> Cheers, > > Cheers > Suzuki > >> >> Cheers, >> >>> >>> Cc: Mark Rutland <mark.rutland@arm.com> >>> Cc: Catalin Marinas <catalin.marinas@arm.com> >>> Cc: Will Deacon <will.deacon@arm.com> >>> Cc: Marc Zyngier <marc.zyngier@arm.com> >>> Cc: Christoffer Dall <cdall@kernel.org> >>> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> >>> --- >>> arch/arm64/include/asm/cpufeature.h | 16 ++++++++++++++++ >>> arch/arm64/kvm/hyp/s2-setup.c | 28 +++++----------------------- >>> 2 files changed, 21 insertions(+), 23 deletions(-) >>> >>> diff --git a/arch/arm64/include/asm/cpufeature.h >>> b/arch/arm64/include/asm/cpufeature.h >>> index fbf0aab..1f2a5dd 100644 >>> --- a/arch/arm64/include/asm/cpufeature.h >>> +++ b/arch/arm64/include/asm/cpufeature.h >>> @@ -311,6 +311,22 @@ static inline u64 read_zcr_features(void) >>> return zcr; >>> } >>> +static inline u32 id_aa64mmfr0_parange_to_phys_shift(int parange) >>> +{ >>> + switch (parange) { >>> + case 0: return 32; >>> + case 1: return 36; >>> + case 2: return 40; >>> + case 3: return 42; >>> + case 4: return 44; >>> + /* Report 48 bit if the kernel doesn't support 52bit */ >>> + default: >>> + case 5: return 48; >>> +#ifdef CONFIG_ARM64_PA_BITS_52 >>> + case 6: return 52; >>> +#endif >>> + } >>> +} >>> #endif /* __ASSEMBLY__ */ >>> #endif >>> diff --git a/arch/arm64/kvm/hyp/s2-setup.c >>> 
b/arch/arm64/kvm/hyp/s2-setup.c >>> index 603e1ee..b1129c8 100644 >>> --- a/arch/arm64/kvm/hyp/s2-setup.c >>> +++ b/arch/arm64/kvm/hyp/s2-setup.c >>> @@ -19,11 +19,13 @@ >>> #include <asm/kvm_arm.h> >>> #include <asm/kvm_asm.h> >>> #include <asm/kvm_hyp.h> >>> +#include <asm/cpufeature.h> >>> u32 __hyp_text __init_stage2_translation(void) >>> { >>> u64 val = VTCR_EL2_FLAGS; >>> u64 parange; >>> + u32 phys_shift; >>> u64 tmp; >>> /* >>> @@ -37,27 +39,7 @@ u32 __hyp_text __init_stage2_translation(void) >>> val |= parange << 16; >>> /* Compute the actual PARange... */ >>> - switch (parange) { >>> - case 0: >>> - parange = 32; >>> - break; >>> - case 1: >>> - parange = 36; >>> - break; >>> - case 2: >>> - parange = 40; >>> - break; >>> - case 3: >>> - parange = 42; >>> - break; >>> - case 4: >>> - parange = 44; >>> - break; >>> - case 5: >>> - default: >>> - parange = 48; >>> - break; >>> - } >>> + phys_shift = id_aa64mmfr0_parange_to_phys_shift(parange); >>> /* >>> * ... and clamp it to 40 bits, unless we have some braindead >>> @@ -65,7 +47,7 @@ u32 __hyp_text __init_stage2_translation(void) >>> * return that value for the rest of the kernel to decide what >>> * to do. >>> */ >>> - val |= 64 - (parange > 40 ? 40 : parange); >>> + val |= 64 - (phys_shift > 40 ? 40 : phys_shift); >>> /* >>> * Check the availability of Hardware Access Flag / Dirty Bit >>> @@ -86,5 +68,5 @@ u32 __hyp_text __init_stage2_translation(void) >>> write_sysreg(val, vtcr_el2); >>> - return parange; >>> + return phys_shift; >>> } >>> >> >
On 26/04/18 14:35, Julien Grall wrote: > Hi Suzuki, > > On 27/03/18 14:15, Suzuki K Poulose wrote: >> Right now the stage2 page table for a VM is hard coded, assuming >> an IPA of 40bits. As we are about to add support for per VM IPA, >> prepare the stage2 page table helpers to accept the kvm instance >> to make the right decision for the VM. No functional changes. >> diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h >> index 3ab8b37..c3f1f9b 100644 >> --- a/arch/arm/include/asm/kvm_arm.h >> +++ b/arch/arm/include/asm/kvm_arm.h >> @@ -133,8 +133,7 @@ >> * space. >> */ >> #define KVM_PHYS_SHIFT (40) >> -#define KVM_PHYS_SIZE (_AC(1, ULL) << KVM_PHYS_SHIFT) >> -#define KVM_PHYS_MASK (KVM_PHYS_SIZE - _AC(1, ULL)) > > I assume you are moving them to kvm_mmu.h in order to match the arm64 side, right? If so, would not it make sense to make KVM_PHYS_SHIFT with it? > > [...] I am moving all the macros that depend on the "kvm" instance to kvm_mmu.h. I will see if I can move the KVM_PHYS_SHIFT without much trouble. > >> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h >> index 7faed6e..594c4e6 100644 >> --- a/arch/arm64/include/asm/kvm_mmu.h >> +++ b/arch/arm64/include/asm/kvm_mmu.h >> @@ -134,8 +134,11 @@ static inline unsigned long __kern_hyp_va(unsigned long v) >> * We currently only support a 40bit IPA. >> */ >> #define KVM_PHYS_SHIFT (40) >> -#define KVM_PHYS_SIZE (1UL << KVM_PHYS_SHIFT) >> -#define KVM_PHYS_MASK (KVM_PHYS_SIZE - 1UL) >> + >> +#define kvm_phys_shift(kvm) KVM_PHYS_SHIFT >> +#define kvm_phys_size(kvm) (_AC(1, ULL) << kvm_phys_shift(kvm)) > > NIT: is the _AC(...) necessary? It does not look like this function is going to be usable in assembly. > >> +#define kvm_phys_mask(kvm) (kvm_phys_size(kvm) - _AC(1, ULL)) > > Same here. > Ok, will drop them. Cheers Suzuki
On 27/04/18 16:22, Suzuki K Poulose wrote: > On 26/04/18 14:35, Julien Grall wrote: >> Hi Suzuki, >> >> On 27/03/18 14:15, Suzuki K Poulose wrote: >>> Right now the stage2 page table for a VM is hard coded, assuming >>> an IPA of 40bits. As we are about to add support for per VM IPA, >>> prepare the stage2 page table helpers to accept the kvm instance >>> to make the right decision for the VM. No functional changes. > > >>> diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h >>> index 3ab8b37..c3f1f9b 100644 >>> --- a/arch/arm/include/asm/kvm_arm.h >>> +++ b/arch/arm/include/asm/kvm_arm.h >>> @@ -133,8 +133,7 @@ >>> * space. >>> */ >>> #define KVM_PHYS_SHIFT (40) >>> -#define KVM_PHYS_SIZE (_AC(1, ULL) << KVM_PHYS_SHIFT) >>> -#define KVM_PHYS_MASK (KVM_PHYS_SIZE - _AC(1, ULL)) >> >> I assume you are moving them to kvm_mmu.h in order to match the arm64 side, right? If so, would not it make sense to make KVM_PHYS_SHIFT with it? >> >> [...] > > I am moving all the macros that depend on the "kvm" instance to kvm_mmu.h. > I will see if I can move the KVM_PHYS_SHIFT without much trouble. It looks like we can't do that easily. KVM_PHYS_SHIFT is used for KVM_T0SZ on arm, even though that can be simply hard coded to avoid the dependency on KVM_PHYS_SHIFT (like we did for arm64, T0SZ is defined to 24). I would leave it as it is to avoid the noise. Cheers Suzuki
Hi Suzuki, On 27/04/18 16:58, Suzuki K Poulose wrote: > On 27/04/18 16:22, Suzuki K Poulose wrote: >> On 26/04/18 14:35, Julien Grall wrote: >>> Hi Suzuki, >>> >>> On 27/03/18 14:15, Suzuki K Poulose wrote: >>>> Right now the stage2 page table for a VM is hard coded, assuming >>>> an IPA of 40bits. As we are about to add support for per VM IPA, >>>> prepare the stage2 page table helpers to accept the kvm instance >>>> to make the right decision for the VM. No functional changes. >> >> >>>> diff --git a/arch/arm/include/asm/kvm_arm.h >>>> b/arch/arm/include/asm/kvm_arm.h >>>> index 3ab8b37..c3f1f9b 100644 >>>> --- a/arch/arm/include/asm/kvm_arm.h >>>> +++ b/arch/arm/include/asm/kvm_arm.h >>>> @@ -133,8 +133,7 @@ >>>> * space. >>>> */ >>>> #define KVM_PHYS_SHIFT (40) >>>> -#define KVM_PHYS_SIZE (_AC(1, ULL) << KVM_PHYS_SHIFT) >>>> -#define KVM_PHYS_MASK (KVM_PHYS_SIZE - _AC(1, ULL)) >>> >>> I assume you are moving them to kvm_mmu.h in order to match the arm64 >>> side, right? If so, would not it make sense to make KVM_PHYS_SHIFT >>> with it? >>> >>> [...] >> >> I am moving all the macros that depend on the "kvm" instance to >> kvm_mmu.h. >> I will see if I can move the KVM_PHYS_SHIFT without much trouble. > > It looks like we can't do that easily. KVM_PHYS_SHIFT is used for KVM_T0SZ > on arm, even though that can be simply hard coded to avoid the > dependency on > KVM_PHYS_SHIFT (like we did for arm64, T0SZ is defined to 24). I would > leave it > as it is to avoid the noise. Fine. That was only a suggestion :). Cheers,
Hi Suzuki, The algos in this patch looks good to me. A couple of NIT below. On 27/03/18 14:15, Suzuki K Poulose wrote: > diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h > index b0c8417..176551c 100644 > --- a/arch/arm64/include/asm/kvm_arm.h > +++ b/arch/arm64/include/asm/kvm_arm.h > @@ -124,6 +124,8 @@ > #define VTCR_EL2_VS_8BIT (0 << VTCR_EL2_VS_SHIFT) > #define VTCR_EL2_VS_16BIT (1 << VTCR_EL2_VS_SHIFT) > > +#define VTCR_EL2_T0SZ(x) TCR_T0SZ((x)) NIT: The inner parentheses should not be necessary. [...] > +/* > + * VTCR_EL2:SL0 indicates the entry level for Stage2 translation. > + * Interestingly, it depends on the page size. > + * See D.10.2.110, VTCR_EL2, in ARM DDI 0487B.b > + * > + * ----------------------------------------- > + * | Entry level | 4K | 16K/64K | > + * ------------------------------------------ > + * | Level: 0 | 2 | - | > + * ------------------------------------------ > + * | Level: 1 | 1 | 2 | > + * ------------------------------------------ > + * | Level: 2 | 0 | 1 | > + * ------------------------------------------ > + * | Level: 3 | - | 0 | > + * ------------------------------------------ > + * > + * That table roughly translates to : > + * > + * SL0(PAGE_SIZE, Entry_level) = SL0_BASE(PAGE_SIZE) - Entry_Level > + * > + * Where SL0_BASE(4K) = 2 and SL0_BASE(16K) = 3, SL0_BASE(64K) = 3, provided > + * we take care of ruling out the unsupported cases and > + * Entry_Level = 4 - Number_of_levels. > + * > + */ > +#define VTCR_EL2_SL0(levels) \ > + ((VTCR_EL2_TGRAN_SL0_BASE - (4 - (levels))) << VTCR_EL2_SL0_SHIFT) > +/* > + * ARM VMSAv8-64 defines an algorithm for finding the translation table > + * descriptors in section D4.2.8 in ARM DDI 0487B.b. > + * > + * The algorithm defines the expectaions on the BaseAddress (for the page NIT: s/expectaions/expectations/ Cheers,
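The SL0 formula in the quoted comment is easy to sanity-check against the table. A minimal sketch, with SL0_BASE passed in explicitly (2 for 4K granules, 3 for 16K/64K, per the quoted comment):

```c
#include <assert.h>

/* SL0(page size, levels) = SL0_BASE(page size) - entry_level, where
 * entry_level = 4 - number_of_levels, as described in the quoted
 * comment (VTCR_EL2.SL0, ARM DDI 0487B.b). */
static unsigned int vtcr_sl0(unsigned int sl0_base, unsigned int levels)
{
	return sl0_base - (4 - levels);
}
```

Each table entry checks out: e.g. a 4-level 4K walk starts at level 0, giving SL0 = 2 - 0 = 2, and a 3-level 64K walk starts at level 1, giving SL0 = 3 - 1 = 2.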
Hi, On 27/03/18 14:15, Suzuki K Poulose wrote: > diff --git a/arm/kvm.c b/arm/kvm.c > index 5701d41..a9a9140 100644 > --- a/arm/kvm.c > +++ b/arm/kvm.c > @@ -11,6 +11,8 @@ > #include <linux/kvm.h> > #include <linux/sizes.h> > > +unsigned long kvm_arm_type; > + > struct kvm_ext kvm_req_ext[] = { > { DEFINE_KVM_EXT(KVM_CAP_IRQCHIP) }, > { DEFINE_KVM_EXT(KVM_CAP_ONE_REG) }, > @@ -18,6 +20,25 @@ struct kvm_ext kvm_req_ext[] = { > { 0, 0 }, > }; > > +#ifndef KVM_ARM_GET_MAX_VM_PHYS_SHIFT > +#define KVM_ARM_GET_MAX_VM_PHYS_SHIFT _IO(KVMIO, 0x0a) > +#endif > + > +void kvm__arch_init_hyp(struct kvm *kvm) > +{ > + unsigned max_ipa; > + > + max_ipa = ioctl(kvm->sys_fd, KVM_ARM_GET_MAX_VM_PHYS_SHIFT); > + if (max_ipa < 0) Another issue spotted while doing some testing. This will always be false because max_ipa is unsigned. I think we want to turn max_ipa to signed. Cheers,
On 30/04/18 15:17, Julien Grall wrote: > Hi, > > On 27/03/18 14:15, Suzuki K Poulose wrote: >> diff --git a/arm/kvm.c b/arm/kvm.c >> index 5701d41..a9a9140 100644 >> --- a/arm/kvm.c >> +++ b/arm/kvm.c >> @@ -11,6 +11,8 @@ >> #include <linux/kvm.h> >> #include <linux/sizes.h> >> +unsigned long kvm_arm_type; >> + >> struct kvm_ext kvm_req_ext[] = { >> { DEFINE_KVM_EXT(KVM_CAP_IRQCHIP) }, >> { DEFINE_KVM_EXT(KVM_CAP_ONE_REG) }, >> @@ -18,6 +20,25 @@ struct kvm_ext kvm_req_ext[] = { >> { 0, 0 }, >> }; >> +#ifndef KVM_ARM_GET_MAX_VM_PHYS_SHIFT >> +#define KVM_ARM_GET_MAX_VM_PHYS_SHIFT _IO(KVMIO, 0x0a) >> +#endif >> + >> +void kvm__arch_init_hyp(struct kvm *kvm) >> +{ >> + unsigned max_ipa; >> + >> + max_ipa = ioctl(kvm->sys_fd, KVM_ARM_GET_MAX_VM_PHYS_SHIFT); >> + if (max_ipa < 0) > > Another issue spotted while doing some testing. This will always be false because max_ipa is unsigned. > > I think we want to turn max_ipa to signed. Good catch. I will fix it. Cheers Suzuki
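The bug confirmed above is the classic signed/unsigned comparison trap: `ioctl()` returns a signed `int` and -1 on failure, but once the result is stored in an `unsigned` variable, the `< 0` test can never fire. A standalone illustration (function names made up for the sketch):

```c
#include <assert.h>

/* What the quoted code effectively does: the -1 error return wraps
 * to a huge unsigned value, so a "< 0" fallback check on it is dead
 * code and the error goes unnoticed. */
static unsigned buggy_max_ipa(int ioctl_ret)
{
	unsigned max_ipa = ioctl_ret;	/* -1 becomes UINT_MAX */
	return max_ipa;
}

/* The proposed fix: keep the variable signed so the failure is
 * visible and the 40-bit default can be applied. */
static int fixed_max_ipa(int ioctl_ret)
{
	int max_ipa = ioctl_ret;
	if (max_ipa < 0)
		max_ipa = 40;
	return max_ipa;
}
```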
Hi Suzuki, On 27/03/18 14:15, Suzuki K Poulose wrote: > We set VTCR_EL2 very early during the stage2 init and don't > touch it ever. This is fine as we had a fixed IPA size. This > patch changes the behavior to set the VTCR for a given VM, > depending on its stage2 table. The common configuration for > VTCR is still performed during the early init as we have to > retain the hardware access flag update bits (VTCR_EL2_HA) > per CPU (as they are only set for the CPUs which are capable). > The bits defining the number of levels in the page table (SL0) > and the size of the Input address to the translation (T0SZ) > are programmed for each VM upon entry to the guest. > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h > index 596f8e4..9f3c8b8 100644 > --- a/arch/arm64/include/asm/kvm_host.h > +++ b/arch/arm64/include/asm/kvm_host.h > @@ -392,10 +392,12 @@ int kvm_arm_vcpu_arch_has_attr(struct kvm_vcpu *vcpu, > > static inline void __cpu_init_stage2(void) > { > - u32 parange = kvm_call_hyp(__init_stage2_translation); > + u32 ps; > > - WARN_ONCE(parange < 40, > - "PARange is %d bits, unsupported configuration!", parange); > + kvm_call_hyp(__init_stage2_translation); > + ps = id_aa64mmfr0_parange_to_phys_shift(read_sysreg(id_aa64mmfr0_el1)); Doesn't id_aa64mmfr0_parange_to_phys_shift() expect you to do the mask and shift for it? This will always hit the default case. > + WARN_ONCE(ps < 40, > + "PARange is %d bits, unsupported configuration!", ps); > } Thanks, James
Hi Suzuki, Nit: KVM in the subject line? On 27/03/18 14:15, Suzuki K Poulose wrote: > Add a helper to convert ID_AA64MMFR0_EL1:PARange to they physical > size shift. Limit the size to the maximum supported by the kernel. > We are about to move the user of this code and this helps to > keep the changes cleaner. > diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h > index fbf0aab..1f2a5dd 100644 > --- a/arch/arm64/include/asm/cpufeature.h > +++ b/arch/arm64/include/asm/cpufeature.h > @@ -311,6 +311,22 @@ static inline u64 read_zcr_features(void) > return zcr; > } > > +static inline u32 id_aa64mmfr0_parange_to_phys_shift(int parange) > +{ > + switch (parange) { > + case 0: return 32; > + case 1: return 36; > + case 2: return 40; > + case 3: return 42; > + case 4: return 44; > + /* Report 48 bit if the kernel doesn't support 52bit */ > + default: > + case 5: return 48; > +#ifdef CONFIG_ARM64_PA_BITS_52 > + case 6: return 52; > +#endif Eeew. I thought 'default' had to appear at the end of the list, but evidently not! If the last three bit value ever gets used this is going to look really weird. Can't we have a helper that just does the mapping, then apply the clamping with something like: | parange = min(CONFIG_ARM64_PA_BITS, parange); Its odd that the helper has the id-register in the name, but expects you do the shift and mask for it... (and for this patch, KVM has already done the 52bit clamping with: | if (parange > ID_AA64MMFR0_PARANGE_MAX) | parange = ID_AA64MMFR0_PARANGE_MAX; ) > diff --git a/arch/arm64/kvm/hyp/s2-setup.c b/arch/arm64/kvm/hyp/s2-setup.c > index 603e1ee..b1129c8 100644 > --- a/arch/arm64/kvm/hyp/s2-setup.c > +++ b/arch/arm64/kvm/hyp/s2-setup.c > @@ -19,11 +19,13 @@ > #include <asm/kvm_arm.h> > #include <asm/kvm_asm.h> > #include <asm/kvm_hyp.h> > +#include <asm/cpufeature.h> > > u32 __hyp_text __init_stage2_translation(void) > { Nit: Why change the variable you put this in, if its all removed again in patch 11? 
Thanks, James
On 03/05/18 15:39, James Morse wrote: > Hi Suzuki, > > On 27/03/18 14:15, Suzuki K Poulose wrote: >> We set VTCR_EL2 very early during the stage2 init and don't >> touch it ever. This is fine as we had a fixed IPA size. This >> patch changes the behavior to set the VTCR for a given VM, >> depending on its stage2 table. The common configuration for >> VTCR is still performed during the early init as we have to >> retain the hardware access flag update bits (VTCR_EL2_HA) >> per CPU (as they are only set for the CPUs which are capable). >> The bits defining the number of levels in the page table (SL0) >> and the size of the Input address to the translation (T0SZ) >> are programmed for each VM upon entry to the guest. > > >> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h >> index 596f8e4..9f3c8b8 100644 >> --- a/arch/arm64/include/asm/kvm_host.h >> +++ b/arch/arm64/include/asm/kvm_host.h >> @@ -392,10 +392,12 @@ int kvm_arm_vcpu_arch_has_attr(struct kvm_vcpu *vcpu, >> >> static inline void __cpu_init_stage2(void) >> { >> - u32 parange = kvm_call_hyp(__init_stage2_translation); >> + u32 ps; >> >> - WARN_ONCE(parange < 40, >> - "PARange is %d bits, unsupported configuration!", parange); >> + kvm_call_hyp(__init_stage2_translation); >> + ps = id_aa64mmfr0_parange_to_phys_shift(read_sysreg(id_aa64mmfr0_el1)); > > Doesn't id_aa64mmfr0_parange_to_phys_shift() expect you to do the mask and shift > for it? This will always hit the default case. Good catch ! The error case was not hit on the system I tested, as it was indeed having 48bit PA. I should have done more testing with Juno where it is 40bit PA (which doesn't really allow different phys-shift ranges). I will change the helper to extract the parange and then convert it. Also, rename it to : id_aa64mmfr0_phys_shift() Cheers Suzuki > > >> + WARN_ONCE(ps < 40, >> + "PARange is %d bits, unsupported configuration!", ps); >> } > > > > Thanks, > > James >
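The fix Suzuki describes amounts to extracting the 4-bit PARange field before converting it: PARange sits in bits [3:0] of ID_AA64MMFR0_EL1. The constant names below are chosen for the sketch and may not match the kernel's exact definitions.

```c
#include <assert.h>
#include <stdint.h>

#define PARANGE_SHIFT	0	/* PARange is bits [3:0] of ID_AA64MMFR0_EL1 */
#define PARANGE_MASK	0xfULL

/* Extract the raw 4-bit field; this is what the conversion helper
 * actually expects. Passing the whole register value instead, as in
 * the quoted hunk, lands in the "default" arm for any realistic
 * register contents. */
static int extract_parange(uint64_t mmfr0)
{
	return (mmfr0 >> PARANGE_SHIFT) & PARANGE_MASK;
}
```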
On 03/05/18 15:39, James Morse wrote: > Hi Suzuki, > > Nit: KVM in the subject line? Well, the helper is generic and its just that KVM makes use of it. > > On 27/03/18 14:15, Suzuki K Poulose wrote: >> Add a helper to convert ID_AA64MMFR0_EL1:PARange to they physical >> size shift. Limit the size to the maximum supported by the kernel. >> We are about to move the user of this code and this helps to >> keep the changes cleaner. > >> diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h >> index fbf0aab..1f2a5dd 100644 >> --- a/arch/arm64/include/asm/cpufeature.h >> +++ b/arch/arm64/include/asm/cpufeature.h >> @@ -311,6 +311,22 @@ static inline u64 read_zcr_features(void) >> return zcr; >> } >> >> +static inline u32 id_aa64mmfr0_parange_to_phys_shift(int parange) >> +{ >> + switch (parange) { >> + case 0: return 32; >> + case 1: return 36; >> + case 2: return 40; >> + case 3: return 42; >> + case 4: return 44; >> + /* Report 48 bit if the kernel doesn't support 52bit */ >> + default: >> + case 5: return 48; >> +#ifdef CONFIG_ARM64_PA_BITS_52 >> + case 6: return 52; >> +#endif > > Eeew. I thought 'default' had to appear at the end of the list, but evidently > not! If the last three bit value ever gets used this is going to look really weird. I could rearrange them a bit to : case 4: .. #ifdef CONFIG_ARM64_PA_BITS_52 case 6: ... #endif case 5: /* Report 48 bit if the kernel doesn't support 52bit */ default: return 48; > > Can't we have a helper that just does the mapping, then apply the clamping with > something like: > | parange = min(CONFIG_ARM64_PA_BITS, parange); Yes, I think that might be a bit cleaner. > > > Its odd that the helper has the id-register in the name, but expects you do the > shift and mask for it... 
> > (and for this patch, KVM has already done the 52bit clamping with: > | if (parange > ID_AA64MMFR0_PARANGE_MAX) > | parange = ID_AA64MMFR0_PARANGE_MAX; > ) > As mentioned in the other thread, I will change the name of the helper. > >> diff --git a/arch/arm64/kvm/hyp/s2-setup.c b/arch/arm64/kvm/hyp/s2-setup.c >> index 603e1ee..b1129c8 100644 >> --- a/arch/arm64/kvm/hyp/s2-setup.c >> +++ b/arch/arm64/kvm/hyp/s2-setup.c >> @@ -19,11 +19,13 @@ >> #include <asm/kvm_arm.h> >> #include <asm/kvm_asm.h> >> #include <asm/kvm_hyp.h> >> +#include <asm/cpufeature.h> >> >> u32 __hyp_text __init_stage2_translation(void) >> { > > Nit: Why change the variable you put this in, if its all removed again in patch 11? The parange holds the PARange initially, used to set the VTCR.IPS and is then overloaded with the converted "phys-shift". The change is a minor cleanup to make that clear, even though we remove it later as we don't deal with the phys-shifts anymore. I would prefer to keep it as it is. Cheers Suzuki
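James's `min()` suggestion separates the raw mapping from the clamping policy. A sketch of that split, with the kernel's supported PA width passed as a parameter in place of `CONFIG_ARM64_PA_BITS`:

```c
#include <assert.h>

/* Full PARange -> shift table, including the 52-bit entry, with no
 * config-dependent cases. */
static unsigned raw_parange_to_shift(int parange)
{
	switch (parange) {
	case 0: return 32;
	case 1: return 36;
	case 2: return 40;
	case 3: return 42;
	case 4: return 44;
	case 6: return 52;
	case 5:
	default: return 48;
	}
}

/* Clamp against what the kernel was built to support, in the spirit
 * of "parange = min(CONFIG_ARM64_PA_BITS, parange)". */
static unsigned clamped_phys_shift(int parange, unsigned kernel_pa_bits)
{
	unsigned shift = raw_parange_to_shift(parange);
	return shift < kernel_pa_bits ? shift : kernel_pa_bits;
}
```

With this shape, a kernel built for 48-bit PAs reports 48 even on 52-bit-capable hardware, while a 52-bit build passes the full value through.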
On Tue, Mar 27, 2018 at 05:11:04PM +0300, Michael S. Tsirkin wrote:
> On Tue, Mar 27, 2018 at 02:15:12PM +0100, Suzuki K Poulose wrote:
> > Legacy PCI over virtio uses a 32bit PFN for the queue. If the
> > queue pfn is too large to fit in 32bits, which we could hit on
> > arm64 systems with 52bit physical addresses (even with 64K page
> > size), we simply miss out a proper link to the other side of
> > the queue.
> >
> > Add a check to validate the PFN, rather than silently breaking
> > the devices.
> >
> > Cc: "Michael S. Tsirkin" <mst@redhat.com>
> > Cc: Jason Wang <jasowang@redhat.com>
> > Cc: Marc Zyngier <marc.zyngier@arm.com>
> > Cc: Christoffer Dall <cdall@kernel.org>
> > Cc: Peter Maydel <peter.maydell@linaro.org>
> > Cc: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
> > Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> > ---
> >  drivers/virtio/virtio_pci_legacy.c | 12 ++++++++++--
> >  1 file changed, 10 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/virtio/virtio_pci_legacy.c b/drivers/virtio/virtio_pci_legacy.c
> > index 2780886..4b84a75 100644
> > --- a/drivers/virtio/virtio_pci_legacy.c
> > +++ b/drivers/virtio/virtio_pci_legacy.c
> > @@ -122,6 +122,7 @@ static struct virtqueue *setup_vq(struct virtio_pci_device *vp_dev,
> >  	struct virtqueue *vq;
> >  	u16 num;
> >  	int err;
> > +	u64 q_pfn;
> >
> >  	/* Select the queue we're interested in */
> >  	iowrite16(index, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_SEL);
> > @@ -141,9 +142,15 @@ static struct virtqueue *setup_vq(struct virtio_pci_device *vp_dev,
> >  	if (!vq)
> >  		return ERR_PTR(-ENOMEM);
> >
> > +	q_pfn = virtqueue_get_desc_addr(vq) >> VIRTIO_PCI_QUEUE_ADDR_SHIFT;
> > +	if (q_pfn >> 32) {
> > +		dev_err(&vp_dev->pci_dev->dev, "virtio-pci queue PFN too large\n");
> > +		err = -ENOMEM;
>
> ENOMEM seems wrong here. E2BIG?
>
> > +		goto out_del_vq;
> > +	}
> > +
> >  	/* activate the queue */
> > -	iowrite32(virtqueue_get_desc_addr(vq) >> VIRTIO_PCI_QUEUE_ADDR_SHIFT,
> > -		  vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN);
> > +	iowrite32((u32)q_pfn, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN);
>
> Is the cast really necessary here?
>
> >
> >  	vq->priv = (void __force *)vp_dev->ioaddr + VIRTIO_PCI_QUEUE_NOTIFY;
> >
> > @@ -160,6 +167,7 @@ static struct virtqueue *setup_vq(struct virtio_pci_device *vp_dev,
> >
> >  out_deactivate:
> >  	iowrite32(0, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN);
> > +out_del_vq:
> >  	vring_del_virtqueue(vq);
> >  	return ERR_PTR(err);
> >  }
> > --
> > 2.7.4

Ping are you going to address and repost, or should I drop this?
On 13/07/18 01:36, Michael S. Tsirkin wrote:
> On Tue, Mar 27, 2018 at 05:11:04PM +0300, Michael S. Tsirkin wrote:
>> On Tue, Mar 27, 2018 at 02:15:12PM +0100, Suzuki K Poulose wrote:
>>> Legacy PCI over virtio uses a 32bit PFN for the queue. If the
>>> queue pfn is too large to fit in 32bits, which we could hit on
>>> arm64 systems with 52bit physical addresses (even with 64K page
>>> size), we simply miss out a proper link to the other side of
>>> the queue.
>>>
>>> Add a check to validate the PFN, rather than silently breaking
>>> the devices.
>>>
>>> Cc: "Michael S. Tsirkin" <mst@redhat.com>
>>> Cc: Jason Wang <jasowang@redhat.com>
>>> Cc: Marc Zyngier <marc.zyngier@arm.com>
>>> Cc: Christoffer Dall <cdall@kernel.org>
>>> Cc: Peter Maydel <peter.maydell@linaro.org>
>>> Cc: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
>>> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
>>> ---
>>>  drivers/virtio/virtio_pci_legacy.c | 12 ++++++++++--
>>>  1 file changed, 10 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/virtio/virtio_pci_legacy.c b/drivers/virtio/virtio_pci_legacy.c
>>> index 2780886..4b84a75 100644
>>> --- a/drivers/virtio/virtio_pci_legacy.c
>>> +++ b/drivers/virtio/virtio_pci_legacy.c
>>> @@ -122,6 +122,7 @@ static struct virtqueue *setup_vq(struct virtio_pci_device *vp_dev,
>>>  	struct virtqueue *vq;
>>>  	u16 num;
>>>  	int err;
>>> +	u64 q_pfn;
>>>
>>>  	/* Select the queue we're interested in */
>>>  	iowrite16(index, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_SEL);
>>> @@ -141,9 +142,15 @@ static struct virtqueue *setup_vq(struct virtio_pci_device *vp_dev,
>>>  	if (!vq)
>>>  		return ERR_PTR(-ENOMEM);
>>>
>>> +	q_pfn = virtqueue_get_desc_addr(vq) >> VIRTIO_PCI_QUEUE_ADDR_SHIFT;
>>> +	if (q_pfn >> 32) {
>>> +		dev_err(&vp_dev->pci_dev->dev, "virtio-pci queue PFN too large\n");
>>> +		err = -ENOMEM;
>>
>> ENOMEM seems wrong here. E2BIG?
>>
>>> +		goto out_del_vq;
>>> +	}
>>> +
>>>  	/* activate the queue */
>>> -	iowrite32(virtqueue_get_desc_addr(vq) >> VIRTIO_PCI_QUEUE_ADDR_SHIFT,
>>> -		  vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN);
>>> +	iowrite32((u32)q_pfn, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN);
>>
>> Is the cast really necessary here?
>>
>>>
>>>  	vq->priv = (void __force *)vp_dev->ioaddr + VIRTIO_PCI_QUEUE_NOTIFY;
>>>
>>> @@ -160,6 +167,7 @@ static struct virtqueue *setup_vq(struct virtio_pci_device *vp_dev,
>>>
>>>  out_deactivate:
>>>  	iowrite32(0, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN);
>>> +out_del_vq:
>>>  	vring_del_virtqueue(vq);
>>>  	return ERR_PTR(err);
>>>  }
>>> --
>>> 2.7.4

Michael,

>
> Ping are you going to address and repost, or should I drop this?

This was addressed and reposted as v3, which needs a minor update to the
error message as mentioned here [0]. I will post the fixed version today.

[0] http://lists.infradead.org/pipermail/linux-arm-kernel/2018-July/588398.html

Thanks
Suzuki