
[v2,00/17] kvm: arm64: Dynamic & 52bit IPA support

Message ID 1522156531-28348-1-git-send-email-suzuki.poulose@arm.com

Message

Suzuki K Poulose March 27, 2018, 1:15 p.m. UTC
The physical address space size (IPA size) for a VM on arm/arm64 is
currently fixed at 40bits. This series adds support for a limit
specific to each VM, allowing a VM to use any IPA size supported by
the host (based on the host kernel configuration and CPU support).
The default and minimum size remains 40bits. We also add support for
handling the 52bit IPA addresses introduced by the Armv8.2 extensions.

As mentioned above, the IPA size supported by a host could be different
from the system's PARange reported by the CPUs (e.g., due to a kernel
limit on the PA size). So we expose the limit via a new system ioctl
request - KVM_ARM_GET_MAX_VM_PHYS_SHIFT - on arm/arm64. The chosen
size can then be passed to the KVM_CREATE_VM ioctl, encoded in the
"type" field. Bits [7:0] of the type are reserved for the IPA size.
This approach allows simpler management of the stage2 page table and
guest memory slots.
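As a hypothetical illustration of the proposed userspace flow (the
helper and mask names below are invented for this sketch; only the
"IPA size in bits [7:0] of the type" encoding comes from the series):

```c
#include <assert.h>

/*
 * Hypothetical userspace helper: encode a VM's IPA size into the
 * KVM_CREATE_VM "type" argument. Per the proposal, only bits [7:0]
 * of the type carry the IPA size; VM_TYPE_IPA_MASK is an invented
 * name for this sketch.
 */
#define VM_TYPE_IPA_MASK 0xffUL

static inline unsigned long encode_vm_type(unsigned int ipa_shift)
{
	return (unsigned long)ipa_shift & VM_TYPE_IPA_MASK;
}
```

A caller would presumably first query the host limit via the new
ioctl, clamp its requested size to that, and then do something like
vm_fd = ioctl(kvm_fd, KVM_CREATE_VM, encode_vm_type(ipa_shift));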

The arm64 page table level helpers are defined based on the page
table levels used by the host VA. So the accessors may not work if
the guest uses more levels in stage2 than the host's stage1. In order
to provide an independent stage2 page table, we refactor the arm64
page table helpers to give us raw accessors for each level, which
should only be used when that level is present. Then, based on the
VM, we choose the stage2 page table layout using the raw accessors.
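To make the per-VM level selection concrete, here is a rough,
hypothetical sketch of how the number of stage2 levels relates to a
VM's IPA size (the series computes this differently and additionally
supports concatenating pages at the initial level, which can remove
one level; the helper name is invented):

```c
#include <assert.h>

/*
 * Rough sketch: number of translation levels needed to resolve an
 * ipa_shift-bit address with the given page size, ignoring stage2
 * initial-level page concatenation.
 */
static inline unsigned int s2_levels_needed(unsigned int ipa_shift,
					    unsigned int page_shift)
{
	/* Each level resolves (page_shift - 3) bits of the address. */
	unsigned int bits_per_level = page_shift - 3;
	unsigned int bits_to_map = ipa_shift - page_shift;

	/* Round up: a partially used level still needs a table. */
	return (bits_to_map + bits_per_level - 1) / bits_per_level;
}
```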

52bit support is added for the VGIC (including ITS emulation) and for
the handling of the PAR and HPFAR registers.

The series applies on arm64 for-next/core. A tree is available here:

	 git://linux-arm.org/linux-skp.git ipa52/v2

Changes since V1:
 - Change the userspace API for configuring VM to encode the IPA
   size in the VM type.  (suggested by Christoffer)
 - Expose the IPA limit on the host via ioctl on /dev/kvm
 - Handle 52bit addresses in PAR & HPFAR
 - Drop patch changing the lifetime of the stage2 PGD
 - Rename macros for 48-to-52 bit conversion for GIC ITS BASER.
   (suggested by Christoffer)
 - Split virtio PFN check patches and address comments.

The series also adds:
 - Support for handling 52bit IPAs in the vgic ITS.
 - A cleanup in virtio to handle errors when the PFN used by
   the virtio transport doesn't fit in 32bits.
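The virtio check mentioned above boils down to verifying that the
descriptor ring's page frame number fits in the legacy transport's
32bit QUEUE PFN register. A simplified sketch (the real checks live in
the patched vm_setup_vq()/setup_vq(); this standalone helper is
invented for illustration):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * A legacy virtio transport can only program a 32bit queue PFN.
 * Any ring whose page frame number needs more than 32 bits must
 * fail setup instead of silently truncating the address.
 */
static inline bool queue_pfn_fits(uint64_t desc_addr,
				  unsigned int page_shift)
{
	uint64_t q_pfn = desc_addr >> page_shift;

	return (q_pfn >> 32) == 0;
}
```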

Tested with:
  - A modified kvmtool (patches included in the series for
    reference / testing), which can be used:
    * with virtio-pci up to 44bit PA (due to the 4K page size assumed
      by kvmtool's virtio-pci legacy implementation)
    * up to 48bit PA with virtio-mmio, due to the 32bit PFN limitation.
  - A hacked QEMU (boot loader support for highmem, phys-shift support)
    * with virtio-pci, GICv3 ITS & MSI up to 52bit on the Foundation
      model.

Kristina Martsenko (1):
  vgic: Add support for 52bit guest physical address

Suzuki K Poulose (16):
  virtio: mmio-v1: Validate queue PFN
  virtio: pci-legacy: Validate queue pfn
  arm64: Make page table helpers reusable
  arm64: Refactor pud_huge for reusability
  arm64: Helper for parange to PASize
  kvm: arm/arm64: Fix stage2_flush_memslot for 4 level page table
  kvm: arm/arm64: Remove spurious WARN_ON
  kvm: arm/arm64: Prepare for VM specific stage2 translations
  kvm: arm64: Make stage2 page table layout dynamic
  kvm: arm64: Dynamic configuration of VTCR and VTTBR mask
  kvm: arm64: Configure VTCR per VM
  kvm: arm/arm64: Expose supported physical address limit for VM
  kvm: arm/arm64: Allow tuning the physical address size for VM
  kvm: arm64: Switch to per VM IPA limit
  kvm: arm64: Add support for handling 52bit IPA
  kvm: arm64: Allow IPA size supported by the system

 Documentation/virtual/kvm/api.txt             |  14 ++
 arch/arm/include/asm/kvm_arm.h                |   3 +-
 arch/arm/include/asm/kvm_mmu.h                |  22 ++-
 arch/arm/include/asm/stage2_pgtable.h         |  42 ++---
 arch/arm64/include/asm/cpufeature.h           |  16 ++
 arch/arm64/include/asm/kvm_arm.h              | 119 +++++++++++++--
 arch/arm64/include/asm/kvm_asm.h              |   2 +-
 arch/arm64/include/asm/kvm_host.h             |  19 ++-
 arch/arm64/include/asm/kvm_mmu.h              |  71 +++++++--
 arch/arm64/include/asm/pgalloc.h              |  34 ++++-
 arch/arm64/include/asm/pgtable.h              |  63 +++++---
 arch/arm64/include/asm/stage2_pgtable-nopmd.h |  42 -----
 arch/arm64/include/asm/stage2_pgtable-nopud.h |  39 -----
 arch/arm64/include/asm/stage2_pgtable.h       | 211 +++++++++++++++++---------
 arch/arm64/kvm/hyp/s2-setup.c                 |  34 +----
 arch/arm64/kvm/hyp/switch.c                   |   7 +-
 arch/arm64/mm/hugetlbpage.c                   |   2 +-
 drivers/virtio/virtio_mmio.c                  |  18 ++-
 drivers/virtio/virtio_pci_legacy.c            |  12 +-
 include/linux/irqchip/arm-gic-v3.h            |   5 +
 include/uapi/linux/kvm.h                      |  16 ++
 virt/kvm/arm/arm.c                            |  32 +++-
 virt/kvm/arm/mmu.c                            | 124 ++++++++-------
 virt/kvm/arm/vgic/vgic-its.c                  |  37 ++---
 virt/kvm/arm/vgic/vgic-kvm-device.c           |   2 +-
 virt/kvm/arm/vgic/vgic-mmio-v3.c              |   2 -
 26 files changed, 626 insertions(+), 362 deletions(-)
 delete mode 100644 arch/arm64/include/asm/stage2_pgtable-nopmd.h
 delete mode 100644 arch/arm64/include/asm/stage2_pgtable-nopud.h

kvmtool hack for IPA support:

Suzuki K Poulose (4):
  kvmtool: Allow backends to run checks on the KVM device fd
  kvmtool: arm64: Add support for guest physical address size
  kvmtool: arm64: Switch memory layout
  kvmtool: arm: Add support for creating VM with PA size

 arm/aarch32/include/kvm/kvm-arch.h        |  1 +
 arm/aarch64/include/kvm/kvm-arch.h        | 15 ++++++++++++---
 arm/aarch64/include/kvm/kvm-config-arch.h |  5 ++++-
 arm/include/arm-common/kvm-arch.h         | 17 +++++++++++------
 arm/include/arm-common/kvm-config-arch.h  |  1 +
 arm/kvm.c                                 | 23 ++++++++++++++++++++++-
 include/kvm/kvm.h                         |  4 ++++
 kvm.c                                     |  2 ++
 8 files changed, 57 insertions(+), 11 deletions(-)

Comments

Michael S. Tsirkin March 27, 2018, 2:07 p.m. UTC | #1
On Tue, Mar 27, 2018 at 02:15:11PM +0100, Suzuki K Poulose wrote:
> virtio-mmio with virtio-v1 uses a 32bit PFN for the queue.
> If the queue pfn is too large to fit in 32bits, which
> we could hit on arm64 systems with 52bit physical addresses
> (even with 64K page size), we simply miss out a proper link
> to the other side of the queue.
> 
> Add a check to validate the PFN, rather than silently breaking
> the devices.
> 
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: Jason Wang <jasowang@redhat.com>
> Cc: Marc Zyngier <marc.zyngier@arm.com>
> Cc: Christoffer Dall <cdall@kernel.org>
> Cc: Peter Maydel <peter.maydell@linaro.org>
> Cc: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>

OK - seems harmless so I will queue this.
But I really think effort should be spent on
adding v1.0 support in QEMU.

> ---
>  drivers/virtio/virtio_mmio.c | 18 ++++++++++++++++--
>  1 file changed, 16 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
> index 67763d3..b2f9b5c 100644
> --- a/drivers/virtio/virtio_mmio.c
> +++ b/drivers/virtio/virtio_mmio.c
> @@ -397,9 +397,21 @@ static struct virtqueue *vm_setup_vq(struct virtio_device *vdev, unsigned index,
>  	/* Activate the queue */
>  	writel(virtqueue_get_vring_size(vq), vm_dev->base + VIRTIO_MMIO_QUEUE_NUM);
>  	if (vm_dev->version == 1) {
> +		u64 q_pfn = virtqueue_get_desc_addr(vq) >> PAGE_SHIFT;
> +
> +		/*
> +		 * virtio-mmio v1 uses a 32bit QUEUE PFN. If we have something
> +		 * that doesn't fit in 32bit, fail the setup rather than
> +		 * pretending to be successful.
> +		 */
> +		if (q_pfn >> 32) {
> +			dev_err(&vdev->dev, "virtio-mmio: queue address too large\n");
> +			err = -ENOMEM;
> +			goto error_bad_pfn;
> +		}
> +
>  		writel(PAGE_SIZE, vm_dev->base + VIRTIO_MMIO_QUEUE_ALIGN);
> -		writel(virtqueue_get_desc_addr(vq) >> PAGE_SHIFT,
> -				vm_dev->base + VIRTIO_MMIO_QUEUE_PFN);
> +		writel(q_pfn, vm_dev->base + VIRTIO_MMIO_QUEUE_PFN);
>  	} else {
>  		u64 addr;
>  
> @@ -430,6 +442,8 @@ static struct virtqueue *vm_setup_vq(struct virtio_device *vdev, unsigned index,
>  
>  	return vq;
>  
> +error_bad_pfn:
> +	vring_del_virtqueue(vq);
>  error_new_virtqueue:
>  	if (vm_dev->version == 1) {
>  		writel(0, vm_dev->base + VIRTIO_MMIO_QUEUE_PFN);
> -- 
> 2.7.4
Michael S. Tsirkin March 27, 2018, 2:11 p.m. UTC | #2
On Tue, Mar 27, 2018 at 02:15:12PM +0100, Suzuki K Poulose wrote:
> Legacy PCI over virtio uses a 32bit PFN for the queue. If the
> queue pfn is too large to fit in 32bits, which we could hit on
> arm64 systems with 52bit physical addresses (even with 64K page
> size), we simply miss out a proper link to the other side of
> the queue.
> 
> Add a check to validate the PFN, rather than silently breaking
> the devices.
> 
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: Jason Wang <jasowang@redhat.com>
> Cc: Marc Zyngier <marc.zyngier@arm.com>
> Cc: Christoffer Dall <cdall@kernel.org>
> Cc: Peter Maydel <peter.maydell@linaro.org>
> Cc: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> ---
>  drivers/virtio/virtio_pci_legacy.c | 12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/virtio/virtio_pci_legacy.c b/drivers/virtio/virtio_pci_legacy.c
> index 2780886..4b84a75 100644
> --- a/drivers/virtio/virtio_pci_legacy.c
> +++ b/drivers/virtio/virtio_pci_legacy.c
> @@ -122,6 +122,7 @@ static struct virtqueue *setup_vq(struct virtio_pci_device *vp_dev,
>  	struct virtqueue *vq;
>  	u16 num;
>  	int err;
> +	u64 q_pfn;
>  
>  	/* Select the queue we're interested in */
>  	iowrite16(index, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_SEL);
> @@ -141,9 +142,15 @@ static struct virtqueue *setup_vq(struct virtio_pci_device *vp_dev,
>  	if (!vq)
>  		return ERR_PTR(-ENOMEM);
>  
> +	q_pfn = virtqueue_get_desc_addr(vq) >> VIRTIO_PCI_QUEUE_ADDR_SHIFT;
> +	if (q_pfn >> 32) {
> +		dev_err(&vp_dev->pci_dev->dev, "virtio-pci queue PFN too large\n");
> +		err = -ENOMEM;

ENOMEM seems wrong here. E2BIG?

> +		goto out_del_vq;
> +	}
> +
>  	/* activate the queue */
> -	iowrite32(virtqueue_get_desc_addr(vq) >> VIRTIO_PCI_QUEUE_ADDR_SHIFT,
> -		  vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN);
> +	iowrite32((u32)q_pfn, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN);

Is the cast really necessary here?

>  
>  	vq->priv = (void __force *)vp_dev->ioaddr + VIRTIO_PCI_QUEUE_NOTIFY;
>  
> @@ -160,6 +167,7 @@ static struct virtqueue *setup_vq(struct virtio_pci_device *vp_dev,
>  
>  out_deactivate:
>  	iowrite32(0, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN);
> +out_del_vq:
>  	vring_del_virtqueue(vq);
>  	return ERR_PTR(err);
>  }
> -- 
> 2.7.4
Jean-Philippe Brucker April 3, 2018, 12:34 p.m. UTC | #3
Hi Suzuki,

On 27/03/18 14:15, Suzuki K Poulose wrote:
> If the guest wants to use a larger physical address space place
> the RAM at upper half of the address space. Otherwise, it uses the
> default layout.
> 
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> ---
>  arm/aarch32/include/kvm/kvm-arch.h |  1 +
>  arm/aarch64/include/kvm/kvm-arch.h | 15 ++++++++++++---
>  arm/include/arm-common/kvm-arch.h  | 11 ++++++-----
>  arm/kvm.c                          |  2 +-
>  4 files changed, 20 insertions(+), 9 deletions(-)
> 
> diff --git a/arm/aarch32/include/kvm/kvm-arch.h b/arm/aarch32/include/kvm/kvm-arch.h
> index cd31e72..2d62aab 100644
> --- a/arm/aarch32/include/kvm/kvm-arch.h
> +++ b/arm/aarch32/include/kvm/kvm-arch.h
> @@ -4,6 +4,7 @@
>  #define ARM_KERN_OFFSET(...)	0x8000
>  
>  #define ARM_MAX_MEMORY(...)	ARM_LOMAP_MAX_MEMORY

I guess this should now be ARM32_MAX_MEMORY?

Thanks,
Jean
James Morse April 3, 2018, 2:58 p.m. UTC | #4
Hi Suzuki,

On 27/03/18 14:15, Suzuki K Poulose wrote:
> We set VTCR_EL2 very early during the stage2 init and don't
> touch it ever. This is fine as we had a fixed IPA size. This
> patch changes the behavior to set the VTCR for a given VM,
> depending on its stage2 table. The common configuration for
> VTCR is still performed during the early init as we have to
> retain the hardware access flag update bits (VTCR_EL2_HA)
> per CPU (as they are only set for the CPUs which are capabile).

(Nit: capable)


> The bits defining the number of levels in the page table (SL0)
> and and the size of the Input address to the translation (T0SZ)
> are programmed for each VM upon entry to the guest.

> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> index 870f4b1..5ccd3ae 100644
> --- a/arch/arm64/kvm/hyp/switch.c
> +++ b/arch/arm64/kvm/hyp/switch.c
> @@ -164,6 +164,12 @@ static void __hyp_text __deactivate_traps(struct kvm_vcpu *vcpu)
>  static void __hyp_text __activate_vm(struct kvm_vcpu *vcpu)
>  {
>  	struct kvm *kvm = kern_hyp_va(vcpu->kvm);
> +	u64 vtcr = read_sysreg(vtcr_el2);
> +
> +	vtcr &= ~VTCR_EL2_PRIVATE_MASK;
> +	vtcr |= VTCR_EL2_SL0(kvm_stage2_levels(kvm)) |
> +		VTCR_EL2_T0SZ(kvm_phys_shift(kvm));
> +	write_sysreg(vtcr, vtcr_el2);
>  	write_sysreg(kvm->arch.vttbr, vttbr_el2);
>  }

Do we need to set this register for tlb maintenance too?
e.g. tlbi for a 3-level-stage2 vm when a 2-level-stage2 vm's vtcr is loaded...

(The ARM-ARM has 'Any of the bits of VTCR_EL2 are permitted to be cached in a TLB'.)


Thanks,

James
Suzuki K Poulose April 3, 2018, 3:44 p.m. UTC | #5
On 03/04/18 15:58, James Morse wrote:
> Hi Suzuki,
> 
> On 27/03/18 14:15, Suzuki K Poulose wrote:
>> We set VTCR_EL2 very early during the stage2 init and don't
>> touch it ever. This is fine as we had a fixed IPA size. This
>> patch changes the behavior to set the VTCR for a given VM,
>> depending on its stage2 table. The common configuration for
>> VTCR is still performed during the early init as we have to
>> retain the hardware access flag update bits (VTCR_EL2_HA)
>> per CPU (as they are only set for the CPUs which are capabile).
> 
> (Nit: capable)
> 

Thanks for spotting, will fix it.

> 
>> The bits defining the number of levels in the page table (SL0)
>> and and the size of the Input address to the translation (T0SZ)
>> are programmed for each VM upon entry to the guest.
> 
>> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
>> index 870f4b1..5ccd3ae 100644
>> --- a/arch/arm64/kvm/hyp/switch.c
>> +++ b/arch/arm64/kvm/hyp/switch.c
>> @@ -164,6 +164,12 @@ static void __hyp_text __deactivate_traps(struct kvm_vcpu *vcpu)
>>   static void __hyp_text __activate_vm(struct kvm_vcpu *vcpu)
>>   {
>>   	struct kvm *kvm = kern_hyp_va(vcpu->kvm);
>> +	u64 vtcr = read_sysreg(vtcr_el2);
>> +
>> +	vtcr &= ~VTCR_EL2_PRIVATE_MASK;
>> +	vtcr |= VTCR_EL2_SL0(kvm_stage2_levels(kvm)) |
>> +		VTCR_EL2_T0SZ(kvm_phys_shift(kvm));
>> +	write_sysreg(vtcr, vtcr_el2);
>>   	write_sysreg(kvm->arch.vttbr, vttbr_el2);
>>   }
> 
> Do we need to set this register for tlb maintenance too?
> e.g. tlbi for a 3-level-stage2 vm when a 2-level-stage2 vm's vtcr is loaded...
> 
> (The ARM-ARM has 'Any of the bits of VTCR_EL2 are permitted to be cached in a TLB'.)

You're right. We need to set the VTCR for the tlb operations. I think
we can do this by hooking it to the __tlb_switch_to_guest() routine.
Will address it in the next version.

Cheers
Suzuki
Peter Maydell April 13, 2018, 1:21 p.m. UTC | #6
On 27 March 2018 at 14:15, Suzuki K Poulose <suzuki.poulose@arm.com> wrote:
> Expose the maximum physical address size supported by the host
> for a VM. This could be later used by the userspace to choose the
> appropriate size for a given VM. The limit is determined as the
> minimum of actual CPU limit, the kernel limit (i.e, either 48 or 52)
> and the stage2 page table support limit (which is 40bits at the moment).
> For backward compatibility, we support a minimum of 40bits. The limit
> will be lifted as we add support for the stage2 to support the host
> kernel PA limit.
>
> This value may be different from what is exposed to the VM via
> CPU ID registers. The limit only applies to the stage2 page table.
>
> Cc: Christoffer Dall <cdall@kernel.org>
> Cc: Marc Zyngier <marc.zyngier@arm.com>
> Cc: Peter Maydel <peter.maydell@linaro.org>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> ---
>  Documentation/virtual/kvm/api.txt | 14 ++++++++++++++
>  arch/arm/include/asm/kvm_mmu.h    |  5 +++++
>  arch/arm64/include/asm/kvm_mmu.h  |  5 +++++
>  include/uapi/linux/kvm.h          |  6 ++++++
>  virt/kvm/arm/arm.c                |  6 ++++++
>  5 files changed, 36 insertions(+)
>
> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> index 792fa87..55908a8 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -3500,6 +3500,20 @@ Returns: 0 on success; -1 on error
>  This ioctl can be used to unregister the guest memory region registered
>  with KVM_MEMORY_ENCRYPT_REG_REGION ioctl above.
>
> +4.113 KVM_ARM_GET_MAX_VM_PHYS_SHIFT
> +Capability: basic
> +Architectures: arm, arm64
> +Type: system ioctl
> +Parameters: none
> +Returns: log2(Maximum physical address space size) supported by the
> +hyperviosr.

typo: "hypervisor".

> +
> +This ioctl can be used to identify the maximum physical address space size
> +supported by the hypervisor.

Is that the physical address space on the host, or the physical
address space size we present to the guest?

> The returned value indicates the maximum size
> +of the address that can be resolved by the stage2 translation table on
> +arm/arm64. On arm64, the value is decided based on the host kernel
> +configuration and the system wide safe value of ID_AA64MMFR0_EL1:PARange.
> +This may not match the value exposed to the VM in CPU ID registers.

Isn't it likely to confuse the guest if we lie to it about the PA range it
sees? When would the two values differ?

Do we also need a 'set' operation, so userspace can create a VM
that has a 40 bit userspace on a CPU that supports more than that,
or does it just work?

What's the x86 API for KVM to tell userspace about physical address
range restrictions?

thanks
-- PMM
Punit Agrawal April 13, 2018, 4:27 p.m. UTC | #7
Hi Suzuki,

I haven't had a chance to look at the code but noticed one issue below.

Suzuki K Poulose <suzuki.poulose@arm.com> writes:

> Now that we can manage the stage2 page table per VM, switch the
> configuration details to per VM instance. We keep track of the
> IPA bits, number of page table levels and the VTCR bits (which
> depends on the IPA and the number of levels). While at it, remove
> unused pgd_lock field from kvm_arch for arm64.
>
> Cc: Marc Zyngier <marc.zyngier@arm.com>
> Cc: Christoffer Dall <cdall@kernel.org>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> ---
>  arch/arm64/include/asm/kvm_host.h       | 14 ++++++++++++--
>  arch/arm64/include/asm/kvm_mmu.h        | 11 +++++++++--
>  arch/arm64/include/asm/stage2_pgtable.h |  1 -
>  arch/arm64/kvm/hyp/switch.c             |  3 +--
>  virt/kvm/arm/mmu.c                      |  4 ++++
>  5 files changed, 26 insertions(+), 7 deletions(-)
>

[...]

> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> index bb458bf..e86d7f4 100644
> --- a/arch/arm64/include/asm/kvm_mmu.h
> +++ b/arch/arm64/include/asm/kvm_mmu.h
> @@ -136,9 +136,10 @@ static inline unsigned long __kern_hyp_va(unsigned long v)
>   */
>  #define KVM_PHYS_SHIFT	(40)
>  
> -#define kvm_phys_shift(kvm)		KVM_PHYS_SHIFT
> +#define kvm_phys_shift(kvm)		(kvm->arch.phys_shift)
>  #define kvm_phys_size(kvm)		(_AC(1, ULL) << kvm_phys_shift(kvm))
>  #define kvm_phys_mask(kvm)		(kvm_phys_size(kvm) - _AC(1, ULL))
> +#define kvm_stage2_levels(kvm)		(kvm->arch.s2_levels)
>  
>  static inline bool kvm_page_empty(void *ptr)
>  {

[...]

> diff --git a/arch/arm64/include/asm/stage2_pgtable.h b/arch/arm64/include/asm/stage2_pgtable.h
> index 33e8ebb..9b75b83 100644
> --- a/arch/arm64/include/asm/stage2_pgtable.h
> +++ b/arch/arm64/include/asm/stage2_pgtable.h
> @@ -44,7 +44,6 @@
>   */
>  #define __s2_pgd_ptrs(pa, lvls)	(1 << ((pa) - pt_levels_pgdir_shift((lvls))))
>  
> -#define kvm_stage2_levels(kvm)		stage2_pt_levels(kvm_phys_shift(kvm))
>  #define stage2_pgdir_shift(kvm)	\
>  		pt_levels_pgdir_shift(kvm_stage2_levels(kvm))
>  #define stage2_pgdir_size(kvm)		(_AC(1, UL) << stage2_pgdir_shift((kvm)))

[...]

> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
> index 7a264c6..746f38e 100644
> --- a/virt/kvm/arm/mmu.c
> +++ b/virt/kvm/arm/mmu.c
> @@ -753,6 +753,10 @@ int kvm_alloc_stage2_pgd(struct kvm *kvm)
>  		return -EINVAL;
>  	}
>  
> +	/* Make sure we have the stage2 configured for this VM */
> +	if (WARN_ON(!kvm_stage2_levels(kvm)))
> +		return -EINVAL;
> +

This hunk breaks the 32-bit build as kvm_stag2_levels() isn't defined on
arm.

Thanks,
Punit

>  	/* Allocate the HW PGD, making sure that each page gets its own refcount */
>  	pgd = alloc_pages_exact(stage2_pgd_size(kvm), GFP_KERNEL | __GFP_ZERO);
>  	if (!pgd)
Suzuki K Poulose April 16, 2018, 10:23 a.m. UTC | #8
On 13/04/18 14:21, Peter Maydell wrote:
> On 27 March 2018 at 14:15, Suzuki K Poulose <suzuki.poulose@arm.com> wrote:
>> Expose the maximum physical address size supported by the host
>> for a VM. This could be later used by the userspace to choose the
>> appropriate size for a given VM. The limit is determined as the
>> minimum of actual CPU limit, the kernel limit (i.e, either 48 or 52)
>> and the stage2 page table support limit (which is 40bits at the moment).
>> For backward compatibility, we support a minimum of 40bits. The limit
>> will be lifted as we add support for the stage2 to support the host
>> kernel PA limit.
>>
>> This value may be different from what is exposed to the VM via
>> CPU ID registers. The limit only applies to the stage2 page table.
>>
>> Cc: Christoffer Dall <cdall@kernel.org>
>> Cc: Marc Zyngier <marc.zyngier@arm.com>
>> Cc: Peter Maydel <peter.maydell@linaro.org>
>> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
>> ---
>>   Documentation/virtual/kvm/api.txt | 14 ++++++++++++++
>>   arch/arm/include/asm/kvm_mmu.h    |  5 +++++
>>   arch/arm64/include/asm/kvm_mmu.h  |  5 +++++
>>   include/uapi/linux/kvm.h          |  6 ++++++
>>   virt/kvm/arm/arm.c                |  6 ++++++
>>   5 files changed, 36 insertions(+)
>>
>> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
>> index 792fa87..55908a8 100644
>> --- a/Documentation/virtual/kvm/api.txt
>> +++ b/Documentation/virtual/kvm/api.txt
>> @@ -3500,6 +3500,20 @@ Returns: 0 on success; -1 on error
>>   This ioctl can be used to unregister the guest memory region registered
>>   with KVM_MEMORY_ENCRYPT_REG_REGION ioctl above.
>>
>> +4.113 KVM_ARM_GET_MAX_VM_PHYS_SHIFT
>> +Capability: basic
>> +Architectures: arm, arm64
>> +Type: system ioctl
>> +Parameters: none
>> +Returns: log2(Maximum physical address space size) supported by the
>> +hyperviosr.
> 
> typo: "hypervisor".
> 

Will fix it.

>> +
>> +This ioctl can be used to identify the maximum physical address space size
>> +supported by the hypervisor.
> 
> Is that the physical address space on the host, or the physical
> address space size we present to the guest?

It is the size of the address space we present to the guest. I will update
the documentation to make it more clear.

> 
>> The returned value indicates the maximum size
>> +of the address that can be resolved by the stage2 translation table on
>> +arm/arm64. On arm64, the value is decided based on the host kernel
>> +configuration and the system wide safe value of ID_AA64MMFR0_EL1:PARange.
>> +This may not match the value exposed to the VM in CPU ID registers.
> 
> Isn't it likely to confuse the guest if we lie to it about the PA range it
> sees? When would the two values differ?

On a heterogeneous system, the guest could see different values
of PARange on the same VCPU. So that is not safe for a guest at the moment.
Ideally, we should emulate the PARange to provide the system wide safe value, which the
guest can read.

We don't touch the emulation of PARange in the ID registers in this set.
All we do is (in the next patches) limiting the address space size provided
to the guest. May be we could update PARange to the limit imposed and emulate
the field.

> 
> Do we also need a 'set' operation, so userspace can create a VM
> that has a 40 bit userspace on a CPU that supports more than that,
> or does it just work?

It just works as before, creating a 40bit userspace, without any additional
steps. All we do is, allowing to create a VM with bigger address space
by specifying the size in the "type" field. The other question is, does
it really matter what a guest sees in PARange and what it is provided
with ? e.g, on my Juno, the A53's have 40bit and A57 has 44bit, while
the system uses only 40bit.

This will be true even with the new change. i.e, we don't allow a size
beyond the limit supported by all the CPUs on the system.
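The limit computation described here (and in the patch exposing it)
could be sketched roughly as follows; the helper name and the exact
combination are illustrative, based on the cover letter's description
of taking the minimum of the CPU limit, the kernel limit and the
stage2 limit, with a 40bit floor for backward compatibility:

```c
#include <assert.h>

/*
 * Hypothetical sketch: the IPA limit exposed to userspace is the
 * minimum of the system-wide safe CPU PARange, the kernel PA limit
 * and the stage2 page table limit, but never less than 40 bits
 * (the legacy fixed size) for backward compatibility.
 */
static inline unsigned int host_ipa_limit(unsigned int cpu_parange_bits,
					  unsigned int kernel_pa_bits,
					  unsigned int stage2_max_bits)
{
	unsigned int limit = cpu_parange_bits;

	if (kernel_pa_bits < limit)
		limit = kernel_pa_bits;
	if (stage2_max_bits < limit)
		limit = stage2_max_bits;

	return limit < 40 ? 40 : limit;
}
```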

> 
> What's the x86 API for KVM to tell userspace about physical address
> range restrictions?

 From a quick look, the limit comes from cpuid (leaf 0x80000008 ?). So, it
could be via the existing per-VCPU get/set_cpuid{,2}() API on x86.

Suzuki
Suzuki K Poulose April 16, 2018, 10:25 a.m. UTC | #9
On 13/04/18 17:27, Punit Agrawal wrote:
> Hi Suzuki,
> 
> I haven't had a chance to look at the code but noticed one issue below.
> 
> Suzuki K Poulose <suzuki.poulose@arm.com> writes:
> 
>> Now that we can manage the stage2 page table per VM, switch the
>> configuration details to per VM instance. We keep track of the
>> IPA bits, number of page table levels and the VTCR bits (which
>> depends on the IPA and the number of levels). While at it, remove
>> unused pgd_lock field from kvm_arch for arm64.
>>
>> Cc: Marc Zyngier <marc.zyngier@arm.com>
>> Cc: Christoffer Dall <cdall@kernel.org>
>> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
>> ---
>>   arch/arm64/include/asm/kvm_host.h       | 14 ++++++++++++--
>>   arch/arm64/include/asm/kvm_mmu.h        | 11 +++++++++--
>>   arch/arm64/include/asm/stage2_pgtable.h |  1 -
>>   arch/arm64/kvm/hyp/switch.c             |  3 +--
>>   virt/kvm/arm/mmu.c                      |  4 ++++
>>   5 files changed, 26 insertions(+), 7 deletions(-)
>>
> 
> [...]
> 
>> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
>> index bb458bf..e86d7f4 100644
>> --- a/arch/arm64/include/asm/kvm_mmu.h
>> +++ b/arch/arm64/include/asm/kvm_mmu.h
>> @@ -136,9 +136,10 @@ static inline unsigned long __kern_hyp_va(unsigned long v)
>>    */
>>   #define KVM_PHYS_SHIFT	(40)
>>   
>> -#define kvm_phys_shift(kvm)		KVM_PHYS_SHIFT
>> +#define kvm_phys_shift(kvm)		(kvm->arch.phys_shift)
>>   #define kvm_phys_size(kvm)		(_AC(1, ULL) << kvm_phys_shift(kvm))
>>   #define kvm_phys_mask(kvm)		(kvm_phys_size(kvm) - _AC(1, ULL))
>> +#define kvm_stage2_levels(kvm)		(kvm->arch.s2_levels)
>>   
>>   static inline bool kvm_page_empty(void *ptr)
>>   {
> 
> [...]
> 
>> diff --git a/arch/arm64/include/asm/stage2_pgtable.h b/arch/arm64/include/asm/stage2_pgtable.h
>> index 33e8ebb..9b75b83 100644
>> --- a/arch/arm64/include/asm/stage2_pgtable.h
>> +++ b/arch/arm64/include/asm/stage2_pgtable.h
>> @@ -44,7 +44,6 @@
>>    */
>>   #define __s2_pgd_ptrs(pa, lvls)	(1 << ((pa) - pt_levels_pgdir_shift((lvls))))
>>   
>> -#define kvm_stage2_levels(kvm)		stage2_pt_levels(kvm_phys_shift(kvm))
>>   #define stage2_pgdir_shift(kvm)	\
>>   		pt_levels_pgdir_shift(kvm_stage2_levels(kvm))
>>   #define stage2_pgdir_size(kvm)		(_AC(1, UL) << stage2_pgdir_shift((kvm)))
> 
> [...]
> 
>> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
>> index 7a264c6..746f38e 100644
>> --- a/virt/kvm/arm/mmu.c
>> +++ b/virt/kvm/arm/mmu.c
>> @@ -753,6 +753,10 @@ int kvm_alloc_stage2_pgd(struct kvm *kvm)
>>   		return -EINVAL;
>>   	}
>>   
>> +	/* Make sure we have the stage2 configured for this VM */
>> +	if (WARN_ON(!kvm_stage2_levels(kvm)))
>> +		return -EINVAL;
>> +
> 
> This hunk breaks the 32-bit build as kvm_stag2_levels() isn't defined on
> arm.

Thanks for spotting. I have fixed this locally in my next version to
check for kvm_phys_shift() instead, as I plan to delay the level
selection until the actual allocation of the table, giving us a
fallback to increase the number of levels if we are unable to allocate
contiguous pages (e.g., 16 * 64K pages with, say, a 46bit IPA).


Cheers
Suzuki
Julien Grall April 25, 2018, 4:35 p.m. UTC | #10
Hi Suzuki,

On 27/03/18 14:15, Suzuki K Poulose wrote:
> So far we had a static stage2 page table handling code, based on a
> fixed IPA of 40bits. As we prepare for a configurable IPA size per
> VM, make the our stage2 page table code dynamic to do the right thing
> for a given VM.
> 
> Support for the IPA size configuration needs other changes in the way
> we configure the EL2 registers (VTTBR and VTCR). So, the IPA is still
> fixed to 40bits. The patch also moves the kvm_page_empty() in asm/kvm_mmu.h
> to the top, before including the asm/stage2_pgtable.h to avoid a forward
> declaration.
> 
> Cc: Marc Zyngier <marc.zyngier@arm.com>
> Cc: Christoffer Dall <cdall@kernel.org>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> ---
>   arch/arm64/include/asm/kvm_mmu.h              |  16 +-
>   arch/arm64/include/asm/stage2_pgtable-nopmd.h |  42 ------
>   arch/arm64/include/asm/stage2_pgtable-nopud.h |  39 -----
>   arch/arm64/include/asm/stage2_pgtable.h       | 203 +++++++++++++++++---------
>   4 files changed, 145 insertions(+), 155 deletions(-)
>   delete mode 100644 arch/arm64/include/asm/stage2_pgtable-nopmd.h
>   delete mode 100644 arch/arm64/include/asm/stage2_pgtable-nopud.h
> 
> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> index 594c4e6..bc133ce 100644
> --- a/arch/arm64/include/asm/kvm_mmu.h
> +++ b/arch/arm64/include/asm/kvm_mmu.h
> @@ -18,9 +18,10 @@
>   #ifndef __ARM64_KVM_MMU_H__
>   #define __ARM64_KVM_MMU_H__
>   
> +#include <asm/cpufeature.h>
>   #include <asm/page.h>
>   #include <asm/memory.h>
> -#include <asm/cpufeature.h>

OOI, why was the include of cpufeature.h moved earlier?

Cheers,
Suzuki K Poulose April 25, 2018, 4:37 p.m. UTC | #11
On 25/04/18 17:35, Julien Grall wrote:
> Hi Suzuki,
> 
> On 27/03/18 14:15, Suzuki K Poulose wrote:
>> So far we had a static stage2 page table handling code, based on a
>> fixed IPA of 40bits. As we prepare for a configurable IPA size per
>> VM, make the our stage2 page table code dynamic to do the right thing
>> for a given VM.
>>
>> Support for the IPA size configuration needs other changes in the way
>> we configure the EL2 registers (VTTBR and VTCR). So, the IPA is still
>> fixed to 40bits. The patch also moves the kvm_page_empty() in asm/kvm_mmu.h
>> to the top, before including the asm/stage2_pgtable.h to avoid a forward
>> declaration.
>>
>> Cc: Marc Zyngier <marc.zyngier@arm.com>
>> Cc: Christoffer Dall <cdall@kernel.org>
>> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
>> ---
>>   arch/arm64/include/asm/kvm_mmu.h              |  16 +-
>>   arch/arm64/include/asm/stage2_pgtable-nopmd.h |  42 ------
>>   arch/arm64/include/asm/stage2_pgtable-nopud.h |  39 -----
>>   arch/arm64/include/asm/stage2_pgtable.h       | 203 +++++++++++++++++---------
>>   4 files changed, 145 insertions(+), 155 deletions(-)
>>   delete mode 100644 arch/arm64/include/asm/stage2_pgtable-nopmd.h
>>   delete mode 100644 arch/arm64/include/asm/stage2_pgtable-nopud.h
>>
>> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
>> index 594c4e6..bc133ce 100644
>> --- a/arch/arm64/include/asm/kvm_mmu.h
>> +++ b/arch/arm64/include/asm/kvm_mmu.h
>> @@ -18,9 +18,10 @@
>>   #ifndef __ARM64_KVM_MMU_H__
>>   #define __ARM64_KVM_MMU_H__
>> +#include <asm/cpufeature.h>
>>   #include <asm/page.h>
>>   #include <asm/memory.h>
>> -#include <asm/cpufeature.h>
> 
> OOI, why was the include of cpufeature.h moved earlier?

I think it was a minor cleanup to keep the includes in order. I will either
mention it or drop it from the hunk altogether.

Cheers
Suzuki
Julien Grall April 26, 2018, 10:54 a.m. UTC | #12
Hi Suzuki,

On 27/03/18 14:15, Suzuki K Poulose wrote:
> This patch rearranges the page table level helpers so that they can
> be reused for a page table with a different number of levels
> (e.g, stage2 page table for a VM) than the kernel page tables.
> As such there is no functional change with this patch.
> 
> The page table helpers are defined to do the right thing for the
> fixed page table levels set for the kernel. This patch tries to
> refactor the code such that we can have helpers for each level,
> which should be used when the caller knows that the level exists
> for the page table dealt with. Since the kernel defines helpers
> p.d_action and __p.d_action, for consistency, we name the raw
> page table action helpers __raw_p.d_action.
> 
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Will Deacon <will.deacon@arm.com>
> Cc: Steve Capper <steve.capper@arm.com>
> Cc: Marc Zyngier <marc.zyngier@arm.com>
> Cc: Christoffer Dall <cdall@kernel.org>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>

Reviewed-by: Julien Grall <julien.grall@arm.com>

Cheers,

> ---
>   arch/arm64/include/asm/pgalloc.h | 34 +++++++++++++++++++----
>   arch/arm64/include/asm/pgtable.h | 60 +++++++++++++++++++++++++---------------
>   2 files changed, 66 insertions(+), 28 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/pgalloc.h b/arch/arm64/include/asm/pgalloc.h
> index 2e05bcd..ab4662c 100644
> --- a/arch/arm64/include/asm/pgalloc.h
> +++ b/arch/arm64/include/asm/pgalloc.h
> @@ -29,6 +29,30 @@
>   #define PGALLOC_GFP	(GFP_KERNEL | __GFP_ZERO)
>   #define PGD_SIZE	(PTRS_PER_PGD * sizeof(pgd_t))
>   
> +static inline void __raw_pmd_free(pmd_t *pmdp)
> +{
> +	BUG_ON((unsigned long)pmdp & (PAGE_SIZE-1));
> +	free_page((unsigned long)pmdp);
> +}
> +
> +static inline void
> +__raw_pud_populate(pud_t *pudp, phys_addr_t pmdp, pudval_t prot)
> +{
> +	__raw_set_pud(pudp, __pud(__phys_to_pud_val(pmdp) | prot));
> +}
> +
> +static inline void __raw_pud_free(pud_t *pudp)
> +{
> +	BUG_ON((unsigned long)pudp & (PAGE_SIZE-1));
> +	free_page((unsigned long)pudp);
> +}
> +
> +static inline void
> +__raw_pgd_populate(pgd_t *pgdp, phys_addr_t pudp, pgdval_t prot)
> +{
> +	__raw_set_pgd(pgdp, __pgd(__phys_to_pgd_val(pudp) | prot));
> +}
> +
>   #if CONFIG_PGTABLE_LEVELS > 2
>   
>   static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr)
> @@ -38,13 +62,12 @@ static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr)
>   
>   static inline void pmd_free(struct mm_struct *mm, pmd_t *pmdp)
>   {
> -	BUG_ON((unsigned long)pmdp & (PAGE_SIZE-1));
> -	free_page((unsigned long)pmdp);
> +	__raw_pmd_free(pmdp);
>   }
>   
>   static inline void __pud_populate(pud_t *pudp, phys_addr_t pmdp, pudval_t prot)
>   {
> -	set_pud(pudp, __pud(__phys_to_pud_val(pmdp) | prot));
> +	__raw_pud_populate(pudp, pmdp, prot);
>   }
>   
>   static inline void pud_populate(struct mm_struct *mm, pud_t *pudp, pmd_t *pmdp)
> @@ -67,13 +90,12 @@ static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long addr)
>   
>   static inline void pud_free(struct mm_struct *mm, pud_t *pudp)
>   {
> -	BUG_ON((unsigned long)pudp & (PAGE_SIZE-1));
> -	free_page((unsigned long)pudp);
> +	__raw_pud_free(pudp);
>   }
>   
>   static inline void __pgd_populate(pgd_t *pgdp, phys_addr_t pudp, pgdval_t prot)
>   {
> -	set_pgd(pgdp, __pgd(__phys_to_pgd_val(pudp) | prot));
> +	__raw_pgd_populate(pgdp, pudp, prot);
>   }
>   
>   static inline void pgd_populate(struct mm_struct *mm, pgd_t *pgdp, pud_t *pudp)
> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index 7e2c27e..5e22503 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -475,31 +475,39 @@ static inline phys_addr_t pmd_page_paddr(pmd_t pmd)
>    */
>   #define mk_pte(page,prot)	pfn_pte(page_to_pfn(page),prot)
>   
> -#if CONFIG_PGTABLE_LEVELS > 2
> +#define __raw_pud_none(pud)	(!pud_val(pud))
> +#define __raw_pud_bad(pud)	(!(pud_val(pud) & PUD_TABLE_BIT))
> +#define __raw_pud_present(pud)	pte_present(pud_pte(pud))
>   
> -#define pmd_ERROR(pmd)		__pmd_error(__FILE__, __LINE__, pmd_val(pmd))
> -
> -#define pud_none(pud)		(!pud_val(pud))
> -#define pud_bad(pud)		(!(pud_val(pud) & PUD_TABLE_BIT))
> -#define pud_present(pud)	pte_present(pud_pte(pud))
> -
> -static inline void set_pud(pud_t *pudp, pud_t pud)
> +static inline void __raw_set_pud(pud_t *pudp, pud_t pud)
>   {
>   	WRITE_ONCE(*pudp, pud);
>   	dsb(ishst);
>   	isb();
>   }
>   
> -static inline void pud_clear(pud_t *pudp)
> +static inline void __raw_pud_clear(pud_t *pudp)
>   {
> -	set_pud(pudp, __pud(0));
> +	__raw_set_pud(pudp, __pud(0));
>   }
>   
> -static inline phys_addr_t pud_page_paddr(pud_t pud)
> +static inline phys_addr_t __raw_pud_page_paddr(pud_t pud)
>   {
>   	return __pud_to_phys(pud);
>   }
>   
> +#if CONFIG_PGTABLE_LEVELS > 2
> +
> +#define pmd_ERROR(pmd)		__pmd_error(__FILE__, __LINE__, pmd_val(pmd))
> +
> +#define pud_none(pud)		__raw_pud_none(pud)
> +#define pud_bad(pud)		__raw_pud_bad(pud)
> +#define pud_present(pud)	__raw_pud_present(pud)
> +
> +#define set_pud(pudp, pud)	__raw_set_pud((pudp), (pud))
> +#define pud_clear(pudp)		__raw_pud_clear((pudp))
> +#define pud_page_paddr(pud)	__raw_pud_page_paddr((pud))
> +
>   /* Find an entry in the second-level page table. */
>   #define pmd_index(addr)		(((addr) >> PMD_SHIFT) & (PTRS_PER_PMD - 1))
>   
> @@ -528,30 +536,38 @@ static inline phys_addr_t pud_page_paddr(pud_t pud)
>   
>   #endif	/* CONFIG_PGTABLE_LEVELS > 2 */
>   
> -#if CONFIG_PGTABLE_LEVELS > 3
> +#define __raw_pgd_none(pgd)	(!pgd_val(pgd))
> +#define __raw_pgd_bad(pgd)	(!(pgd_val(pgd) & 2))
> +#define __raw_pgd_present(pgd)	(pgd_val(pgd))
>   
> -#define pud_ERROR(pud)		__pud_error(__FILE__, __LINE__, pud_val(pud))
> -
> -#define pgd_none(pgd)		(!pgd_val(pgd))
> -#define pgd_bad(pgd)		(!(pgd_val(pgd) & 2))
> -#define pgd_present(pgd)	(pgd_val(pgd))
> -
> -static inline void set_pgd(pgd_t *pgdp, pgd_t pgd)
> +static inline void __raw_set_pgd(pgd_t *pgdp, pgd_t pgd)
>   {
>   	WRITE_ONCE(*pgdp, pgd);
>   	dsb(ishst);
>   }
>   
> -static inline void pgd_clear(pgd_t *pgdp)
> +static inline void __raw_pgd_clear(pgd_t *pgdp)
>   {
> -	set_pgd(pgdp, __pgd(0));
> +	__raw_set_pgd(pgdp, __pgd(0));
>   }
>   
> -static inline phys_addr_t pgd_page_paddr(pgd_t pgd)
> +static inline phys_addr_t __raw_pgd_page_paddr(pgd_t pgd)
>   {
>   	return __pgd_to_phys(pgd);
>   }
>   
> +#if CONFIG_PGTABLE_LEVELS > 3
> +
> +#define pud_ERROR(pud)		__pud_error(__FILE__, __LINE__, pud_val(pud))
> +
> +#define pgd_none(pgd)		__raw_pgd_none((pgd))
> +#define pgd_bad(pgd)		__raw_pgd_bad((pgd))
> +#define pgd_present(pgd)	__raw_pgd_present((pgd))
> +
> +#define set_pgd(pgdp, pgd)	__raw_set_pgd((pgdp), (pgd))
> +#define pgd_clear(pgdp)		__raw_pgd_clear((pgdp))
> +#define pgd_page_paddr(pgd)	__raw_pgd_page_paddr((pgd))
> +
>   /* Find an entry in the first-level page table. */
>   #define pud_index(addr)		(((addr) >> PUD_SHIFT) & (PTRS_PER_PUD - 1))
>   
>
Julien Grall April 26, 2018, 10:55 a.m. UTC | #13
Hi Suzuki,

On 27/03/18 14:15, Suzuki K Poulose wrote:
> Make pud_huge reusable for stage2 tables, independent
> of the stage1 levels.
> 
> Cc: Steve Capper <steve.capper@arm.com>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Will Deacon <will.deacon@arm.com>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Christoffer Dall <cdall@kernel.org>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>

Reviewed-by: Julien Grall <julien.grall@arm.com>

Cheers,

> ---
>   arch/arm64/include/asm/pgtable.h | 5 +++++
>   arch/arm64/mm/hugetlbpage.c      | 2 +-
>   2 files changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index 5e22503..4a16c11 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -479,6 +479,11 @@ static inline phys_addr_t pmd_page_paddr(pmd_t pmd)
>   #define __raw_pud_bad(pud)	(!(pud_val(pud) & PUD_TABLE_BIT))
>   #define __raw_pud_present(pud)	pte_present(pud_pte(pud))
>   
> +static inline int __raw_pud_huge(pud_t pud)
> +{
> +	return pud_val(pud) && !(pud_val(pud) & PUD_TABLE_BIT);
> +}
> +
>   static inline void __raw_set_pud(pud_t *pudp, pud_t pud)
>   {
>   	WRITE_ONCE(*pudp, pud);
> diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
> index ecc6818..3f4bb43 100644
> --- a/arch/arm64/mm/hugetlbpage.c
> +++ b/arch/arm64/mm/hugetlbpage.c
> @@ -35,7 +35,7 @@ int pmd_huge(pmd_t pmd)
>   int pud_huge(pud_t pud)
>   {
>   #ifndef __PAGETABLE_PMD_FOLDED
> -	return pud_val(pud) && !(pud_val(pud) & PUD_TABLE_BIT);
> +	return __raw_pud_huge(pud);
>   #else
>   	return 0;
>   #endif
>
Julien Grall April 26, 2018, 10:58 a.m. UTC | #14
Hi Suzuki,

On 27/03/18 14:15, Suzuki K Poulose wrote:
> Add a helper to convert ID_AA64MMFR0_EL1:PARange to the physical
> size shift. Limit the size to the maximum supported by the kernel.
> We are about to move the user of this code and this helps to
> keep the changes cleaner.

It is probably worth mentioning that you are also adding 52-bit support 
in the patch.

Cheers,

> 
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will.deacon@arm.com>
> Cc: Marc Zyngier <marc.zyngier@arm.com>
> Cc: Christoffer Dall <cdall@kernel.org>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> ---
>   arch/arm64/include/asm/cpufeature.h | 16 ++++++++++++++++
>   arch/arm64/kvm/hyp/s2-setup.c       | 28 +++++-----------------------
>   2 files changed, 21 insertions(+), 23 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
> index fbf0aab..1f2a5dd 100644
> --- a/arch/arm64/include/asm/cpufeature.h
> +++ b/arch/arm64/include/asm/cpufeature.h
> @@ -311,6 +311,22 @@ static inline u64 read_zcr_features(void)
>   	return zcr;
>   }
>   
> +static inline u32 id_aa64mmfr0_parange_to_phys_shift(int parange)
> +{
> +	switch (parange) {
> +	case 0: return 32;
> +	case 1: return 36;
> +	case 2: return 40;
> +	case 3: return 42;
> +	case 4: return 44;
> +	/* Report 48 bit if the kernel doesn't support 52bit */
> +	default:
> +	case 5: return 48;
> +#ifdef CONFIG_ARM64_PA_BITS_52
> +	case 6: return 52;
> +#endif
> +	}
> +}
>   #endif /* __ASSEMBLY__ */
>   
>   #endif
> diff --git a/arch/arm64/kvm/hyp/s2-setup.c b/arch/arm64/kvm/hyp/s2-setup.c
> index 603e1ee..b1129c8 100644
> --- a/arch/arm64/kvm/hyp/s2-setup.c
> +++ b/arch/arm64/kvm/hyp/s2-setup.c
> @@ -19,11 +19,13 @@
>   #include <asm/kvm_arm.h>
>   #include <asm/kvm_asm.h>
>   #include <asm/kvm_hyp.h>
> +#include <asm/cpufeature.h>
>   
>   u32 __hyp_text __init_stage2_translation(void)
>   {
>   	u64 val = VTCR_EL2_FLAGS;
>   	u64 parange;
> +	u32 phys_shift;
>   	u64 tmp;
>   
>   	/*
> @@ -37,27 +39,7 @@ u32 __hyp_text __init_stage2_translation(void)
>   	val |= parange << 16;
>   
>   	/* Compute the actual PARange... */
> -	switch (parange) {
> -	case 0:
> -		parange = 32;
> -		break;
> -	case 1:
> -		parange = 36;
> -		break;
> -	case 2:
> -		parange = 40;
> -		break;
> -	case 3:
> -		parange = 42;
> -		break;
> -	case 4:
> -		parange = 44;
> -		break;
> -	case 5:
> -	default:
> -		parange = 48;
> -		break;
> -	}
> +	phys_shift = id_aa64mmfr0_parange_to_phys_shift(parange);
>   
>   	/*
>   	 * ... and clamp it to 40 bits, unless we have some braindead
> @@ -65,7 +47,7 @@ u32 __hyp_text __init_stage2_translation(void)
>   	 * return that value for the rest of the kernel to decide what
>   	 * to do.
>   	 */
> -	val |= 64 - (parange > 40 ? 40 : parange);
> +	val |= 64 - (phys_shift > 40 ? 40 : phys_shift);
>   
>   	/*
>   	 * Check the availability of Hardware Access Flag / Dirty Bit
> @@ -86,5 +68,5 @@ u32 __hyp_text __init_stage2_translation(void)
>   
>   	write_sysreg(val, vtcr_el2);
>   
> -	return parange;
> +	return phys_shift;
>   }
>
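For readers following the hunk above, the PARange decode plus the 40-bit clamp can be sketched as standalone C. This is an illustration, not the kernel code: the function names are invented here, and unlike the kernel helper (which hides the 52-bit case behind CONFIG_ARM64_PA_BITS_52), this sketch always decodes encoding 6.

```c
#include <assert.h>

/* Decode ID_AA64MMFR0_EL1.PARange to a PA size in bits.
 * Unknown/reserved encodings conservatively report 48 bits,
 * mirroring the default case in the quoted patch. */
static unsigned int parange_to_phys_shift(unsigned int parange)
{
	switch (parange) {
	case 0: return 32;
	case 1: return 36;
	case 2: return 40;
	case 3: return 42;
	case 4: return 44;
	case 6: return 52;
	case 5:			/* fall through */
	default: return 48;
	}
}

/* The T0SZ value folded into VTCR_EL2: the IPA is clamped to
 * 40 bits even when the CPU supports more, as in the quoted hunk. */
static unsigned int vtcr_t0sz(unsigned int phys_shift)
{
	return 64 - (phys_shift > 40 ? 40 : phys_shift);
}
```

The clamp is why a 48-bit PARange CPU still programs T0SZ for a 40-bit stage2 input range (64 - 40 = 24) until per-VM IPA configuration lands later in the series.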
Julien Grall April 26, 2018, 1:35 p.m. UTC | #15
Hi Suzuki,

On 27/03/18 14:15, Suzuki K Poulose wrote:
> Right now the stage2 page table for a VM is hard coded, assuming
> an IPA of 40bits. As we are about to add support for per VM IPA,
> prepare the stage2 page table helpers to accept the kvm instance
> to make the right decision for the VM. No functional changes.
> 
> Cc: Marc Zyngier <marc.zyngier@arm.com>
> Cc: Christoffer Dall <cdall@kernel.org>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> ---
>   arch/arm/include/asm/kvm_arm.h                |   3 +-
>   arch/arm/include/asm/kvm_mmu.h                |  15 +++-
>   arch/arm/include/asm/stage2_pgtable.h         |  42 ++++-----
>   arch/arm64/include/asm/kvm_mmu.h              |   7 +-
>   arch/arm64/include/asm/stage2_pgtable-nopmd.h |  18 ++--
>   arch/arm64/include/asm/stage2_pgtable-nopud.h |  16 ++--
>   arch/arm64/include/asm/stage2_pgtable.h       |  49 ++++++-----
>   virt/kvm/arm/arm.c                            |   2 +-
>   virt/kvm/arm/mmu.c                            | 119 +++++++++++++-------------
>   virt/kvm/arm/vgic/vgic-kvm-device.c           |   2 +-
>   10 files changed, 148 insertions(+), 125 deletions(-)
> 
> diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
> index 3ab8b37..c3f1f9b 100644
> --- a/arch/arm/include/asm/kvm_arm.h
> +++ b/arch/arm/include/asm/kvm_arm.h
> @@ -133,8 +133,7 @@
>    * space.
>    */
>   #define KVM_PHYS_SHIFT	(40)
> -#define KVM_PHYS_SIZE	(_AC(1, ULL) << KVM_PHYS_SHIFT)
> -#define KVM_PHYS_MASK	(KVM_PHYS_SIZE - _AC(1, ULL))

I assume you are moving them to kvm_mmu.h in order to match the arm64 
side, right? If so, would it not make sense to move KVM_PHYS_SHIFT with it?

[...]

> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> index 7faed6e..594c4e6 100644
> --- a/arch/arm64/include/asm/kvm_mmu.h
> +++ b/arch/arm64/include/asm/kvm_mmu.h
> @@ -134,8 +134,11 @@ static inline unsigned long __kern_hyp_va(unsigned long v)
>    * We currently only support a 40bit IPA.
>    */
>   #define KVM_PHYS_SHIFT	(40)
> -#define KVM_PHYS_SIZE	(1UL << KVM_PHYS_SHIFT)
> -#define KVM_PHYS_MASK	(KVM_PHYS_SIZE - 1UL)
> +
> +#define kvm_phys_shift(kvm)		KVM_PHYS_SHIFT
> +#define kvm_phys_size(kvm)		(_AC(1, ULL) << kvm_phys_shift(kvm))

NIT: is the _AC(...) necessary? It does not look like this function is 
going to be usable in assembly.

> +#define kvm_phys_mask(kvm)		(kvm_phys_size(kvm) - _AC(1, ULL))

Same here.
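The NIT above boils down to this: `_AC(1, ULL)` only differs from a plain `1ULL` when a header is pulled into assembly, where C integer suffixes are not accepted. In pure C code the two spellings are equivalent, as a quick sketch shows (the macro names mirror the quoted hunk, and the `kvm` argument is unused here exactly as in the patch at this stage):

```c
#include <assert.h>

#define KVM_PHYS_SHIFT		40

/* Spelled with a plain ULL suffix, as suggested in the review;
 * the kvm argument is a placeholder until per-VM IPA arrives. */
#define kvm_phys_shift(kvm)	KVM_PHYS_SHIFT
#define kvm_phys_size(kvm)	(1ULL << kvm_phys_shift(kvm))
#define kvm_phys_mask(kvm)	(kvm_phys_size(kvm) - 1ULL)
```

Since these helpers are only ever expanded from C, dropping `_AC()` loses nothing.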
Julien Grall April 26, 2018, 2:08 p.m. UTC | #16
Hi Suzuki,

On 27/03/18 14:15, Suzuki K Poulose wrote:
> Specify the physical size for the VM encoded in the vm type.
> 
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> ---
>   arm/include/arm-common/kvm-arch.h |  6 +++++-
>   arm/kvm.c                         | 21 +++++++++++++++++++++
>   2 files changed, 26 insertions(+), 1 deletion(-)
> 
> diff --git a/arm/include/arm-common/kvm-arch.h b/arm/include/arm-common/kvm-arch.h
> index ca7ab0f..64bc92a 100644
> --- a/arm/include/arm-common/kvm-arch.h
> +++ b/arm/include/arm-common/kvm-arch.h
> @@ -42,7 +42,11 @@
>   
>   #define KVM_IRQ_OFFSET		GIC_SPI_IRQ_BASE
>   
> -#define KVM_VM_TYPE		0
> +extern unsigned long		kvm_arm_type;
> +extern void kvm__arch_init_hyp(struct kvm *kvm);
> +
> +#define KVM_VM_TYPE		kvm_arm_type
> +#define kvm__arch_init_hyp	kvm__arch_init_hyp
>   
>   #define VIRTIO_DEFAULT_TRANS(kvm)	\
>   	((kvm)->cfg.arch.virtio_trans_pci ? VIRTIO_PCI : VIRTIO_MMIO)
> diff --git a/arm/kvm.c b/arm/kvm.c
> index 5701d41..a9a9140 100644
> --- a/arm/kvm.c
> +++ b/arm/kvm.c
> @@ -11,6 +11,8 @@
>   #include <linux/kvm.h>
>   #include <linux/sizes.h>
>   
> +unsigned long kvm_arm_type;
> +
>   struct kvm_ext kvm_req_ext[] = {
>   	{ DEFINE_KVM_EXT(KVM_CAP_IRQCHIP) },
>   	{ DEFINE_KVM_EXT(KVM_CAP_ONE_REG) },
> @@ -18,6 +20,25 @@ struct kvm_ext kvm_req_ext[] = {
>   	{ 0, 0 },
>   };
>   
> +#ifndef KVM_ARM_GET_MAX_VM_PHYS_SHIFT
> +#define KVM_ARM_GET_MAX_VM_PHYS_SHIFT		_IO(KVMIO, 0x0a)
> +#endif

Once the IOCTL is agreed, I think you would need to update 
include/linux/kvm.h.

> +
> +void kvm__arch_init_hyp(struct kvm *kvm)
> +{
> +	unsigned max_ipa;
> +
> +	max_ipa = ioctl(kvm->sys_fd, KVM_ARM_GET_MAX_VM_PHYS_SHIFT);
> +	if (max_ipa < 0)
> +		max_ipa = 40;
> +	if (!kvm->cfg.arch.phys_shift)
> +		kvm->cfg.arch.phys_shift = 40;
> +	if (kvm->cfg.arch.phys_shift > max_ipa)
> +		die("Requested PA size (%u) is not supported by the host (%ubits)\n",
> +		    kvm->cfg.arch.phys_shift, max_ipa);
> +	kvm_arm_type = kvm->cfg.arch.phys_shift;

FWIW, older Linux will not accept a KVM type other than 0. So I think 
the best would be to bail out if KVM_ARM_GET_MAX_VM_PHYS_SHIFT does not 
exist.

For safety, you could even check that arch.phys_shift is always 40 when 
running under older Linux.

Cheers,
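The cover letter reserves bits [7:0] of the KVM_CREATE_VM `type` argument for the IPA size, so the kvmtool change above amounts to storing the (validated) shift in those bits. A sketch of the encode/decode pair — the macro and function names here are illustrative, not taken from the series:

```c
#include <assert.h>

/* Bits [7:0] of the KVM_CREATE_VM type carry the IPA size in bits;
 * 0 keeps the legacy meaning (default 40-bit IPA). */
#define VM_TYPE_IPA_MASK	0xffUL

static unsigned long vm_type_from_ipa(unsigned int phys_shift)
{
	return (unsigned long)phys_shift & VM_TYPE_IPA_MASK;
}

static unsigned int ipa_from_vm_type(unsigned long type)
{
	return (unsigned int)(type & VM_TYPE_IPA_MASK);
}
```

This also makes Julien's backward-compatibility point concrete: an older kernel rejects any non-zero `type`, so a userspace that cannot probe the limit must pass 0 (i.e. the legacy 40-bit default) rather than an explicit shift.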
Suzuki K Poulose April 27, 2018, 3:18 p.m. UTC | #17
On 26/04/18 11:58, Julien Grall wrote:
> Hi Suzuki,
> 
> On 27/03/18 14:15, Suzuki K Poulose wrote:
>> Add a helper to convert ID_AA64MMFR0_EL1:PARange to the physical
>> size shift. Limit the size to the maximum supported by the kernel.
>> We are about to move the user of this code and this helps to
>> keep the changes cleaner.
> 
> It is probably worth mentioning that you are also adding 52-bit support in the patch.

Sure, will do. Can I take that as a Reviewed-by with the fixed
commit description ?

Cheers
Suzuki

> 
> Cheers,
> 
>>
>> Cc: Mark Rutland <mark.rutland@arm.com>
>> Cc: Catalin Marinas <catalin.marinas@arm.com>
>> Cc: Will Deacon <will.deacon@arm.com>
>> Cc: Marc Zyngier <marc.zyngier@arm.com>
>> Cc: Christoffer Dall <cdall@kernel.org>
>> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
>> ---
>>   arch/arm64/include/asm/cpufeature.h | 16 ++++++++++++++++
>>   arch/arm64/kvm/hyp/s2-setup.c       | 28 +++++-----------------------
>>   2 files changed, 21 insertions(+), 23 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
>> index fbf0aab..1f2a5dd 100644
>> --- a/arch/arm64/include/asm/cpufeature.h
>> +++ b/arch/arm64/include/asm/cpufeature.h
>> @@ -311,6 +311,22 @@ static inline u64 read_zcr_features(void)
>>       return zcr;
>>   }
>> +static inline u32 id_aa64mmfr0_parange_to_phys_shift(int parange)
>> +{
>> +    switch (parange) {
>> +    case 0: return 32;
>> +    case 1: return 36;
>> +    case 2: return 40;
>> +    case 3: return 42;
>> +    case 4: return 44;
>> +    /* Report 48 bit if the kernel doesn't support 52bit */
>> +    default:
>> +    case 5: return 48;
>> +#ifdef CONFIG_ARM64_PA_BITS_52
>> +    case 6: return 52;
>> +#endif
>> +    }
>> +}
>>   #endif /* __ASSEMBLY__ */
>>   #endif
>> diff --git a/arch/arm64/kvm/hyp/s2-setup.c b/arch/arm64/kvm/hyp/s2-setup.c
>> index 603e1ee..b1129c8 100644
>> --- a/arch/arm64/kvm/hyp/s2-setup.c
>> +++ b/arch/arm64/kvm/hyp/s2-setup.c
>> @@ -19,11 +19,13 @@
>>   #include <asm/kvm_arm.h>
>>   #include <asm/kvm_asm.h>
>>   #include <asm/kvm_hyp.h>
>> +#include <asm/cpufeature.h>
>>   u32 __hyp_text __init_stage2_translation(void)
>>   {
>>       u64 val = VTCR_EL2_FLAGS;
>>       u64 parange;
>> +    u32 phys_shift;
>>       u64 tmp;
>>       /*
>> @@ -37,27 +39,7 @@ u32 __hyp_text __init_stage2_translation(void)
>>       val |= parange << 16;
>>       /* Compute the actual PARange... */
>> -    switch (parange) {
>> -    case 0:
>> -        parange = 32;
>> -        break;
>> -    case 1:
>> -        parange = 36;
>> -        break;
>> -    case 2:
>> -        parange = 40;
>> -        break;
>> -    case 3:
>> -        parange = 42;
>> -        break;
>> -    case 4:
>> -        parange = 44;
>> -        break;
>> -    case 5:
>> -    default:
>> -        parange = 48;
>> -        break;
>> -    }
>> +    phys_shift = id_aa64mmfr0_parange_to_phys_shift(parange);
>>       /*
>>        * ... and clamp it to 40 bits, unless we have some braindead
>> @@ -65,7 +47,7 @@ u32 __hyp_text __init_stage2_translation(void)
>>        * return that value for the rest of the kernel to decide what
>>        * to do.
>>        */
>> -    val |= 64 - (parange > 40 ? 40 : parange);
>> +    val |= 64 - (phys_shift > 40 ? 40 : phys_shift);
>>       /*
>>        * Check the availability of Hardware Access Flag / Dirty Bit
>> @@ -86,5 +68,5 @@ u32 __hyp_text __init_stage2_translation(void)
>>       write_sysreg(val, vtcr_el2);
>> -    return parange;
>> +    return phys_shift;
>>   }
>>
>
Julien Grall April 27, 2018, 3:18 p.m. UTC | #18
On 27/04/18 16:18, Suzuki K Poulose wrote:
> On 26/04/18 11:58, Julien Grall wrote:
>> Hi Suzuki,
>>
>> On 27/03/18 14:15, Suzuki K Poulose wrote:
>>> Add a helper to convert ID_AA64MMFR0_EL1:PARange to the physical
>>> size shift. Limit the size to the maximum supported by the kernel.
>>> We are about to move the user of this code and this helps to
>>> keep the changes cleaner.
>>
>> It is probably worth mentioning that you are also adding 52-bit 
>> support in the patch.
> 
> Sure, will do. Can I take that as a Reviewed-by with the fixed
> commit description ?

Yes. Here we go:

Reviewed-by: Julien Grall <julien.grall@arm.com>

Cheers,

> 
> Cheers
> Suzuki
> 
>>
>> Cheers,
>>
>>>
>>> Cc: Mark Rutland <mark.rutland@arm.com>
>>> Cc: Catalin Marinas <catalin.marinas@arm.com>
>>> Cc: Will Deacon <will.deacon@arm.com>
>>> Cc: Marc Zyngier <marc.zyngier@arm.com>
>>> Cc: Christoffer Dall <cdall@kernel.org>
>>> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
>>> ---
>>>   arch/arm64/include/asm/cpufeature.h | 16 ++++++++++++++++
>>>   arch/arm64/kvm/hyp/s2-setup.c       | 28 +++++-----------------------
>>>   2 files changed, 21 insertions(+), 23 deletions(-)
>>>
>>> diff --git a/arch/arm64/include/asm/cpufeature.h 
>>> b/arch/arm64/include/asm/cpufeature.h
>>> index fbf0aab..1f2a5dd 100644
>>> --- a/arch/arm64/include/asm/cpufeature.h
>>> +++ b/arch/arm64/include/asm/cpufeature.h
>>> @@ -311,6 +311,22 @@ static inline u64 read_zcr_features(void)
>>>       return zcr;
>>>   }
>>> +static inline u32 id_aa64mmfr0_parange_to_phys_shift(int parange)
>>> +{
>>> +    switch (parange) {
>>> +    case 0: return 32;
>>> +    case 1: return 36;
>>> +    case 2: return 40;
>>> +    case 3: return 42;
>>> +    case 4: return 44;
>>> +    /* Report 48 bit if the kernel doesn't support 52bit */
>>> +    default:
>>> +    case 5: return 48;
>>> +#ifdef CONFIG_ARM64_PA_BITS_52
>>> +    case 6: return 52;
>>> +#endif
>>> +    }
>>> +}
>>>   #endif /* __ASSEMBLY__ */
>>>   #endif
>>> diff --git a/arch/arm64/kvm/hyp/s2-setup.c 
>>> b/arch/arm64/kvm/hyp/s2-setup.c
>>> index 603e1ee..b1129c8 100644
>>> --- a/arch/arm64/kvm/hyp/s2-setup.c
>>> +++ b/arch/arm64/kvm/hyp/s2-setup.c
>>> @@ -19,11 +19,13 @@
>>>   #include <asm/kvm_arm.h>
>>>   #include <asm/kvm_asm.h>
>>>   #include <asm/kvm_hyp.h>
>>> +#include <asm/cpufeature.h>
>>>   u32 __hyp_text __init_stage2_translation(void)
>>>   {
>>>       u64 val = VTCR_EL2_FLAGS;
>>>       u64 parange;
>>> +    u32 phys_shift;
>>>       u64 tmp;
>>>       /*
>>> @@ -37,27 +39,7 @@ u32 __hyp_text __init_stage2_translation(void)
>>>       val |= parange << 16;
>>>       /* Compute the actual PARange... */
>>> -    switch (parange) {
>>> -    case 0:
>>> -        parange = 32;
>>> -        break;
>>> -    case 1:
>>> -        parange = 36;
>>> -        break;
>>> -    case 2:
>>> -        parange = 40;
>>> -        break;
>>> -    case 3:
>>> -        parange = 42;
>>> -        break;
>>> -    case 4:
>>> -        parange = 44;
>>> -        break;
>>> -    case 5:
>>> -    default:
>>> -        parange = 48;
>>> -        break;
>>> -    }
>>> +    phys_shift = id_aa64mmfr0_parange_to_phys_shift(parange);
>>>       /*
>>>        * ... and clamp it to 40 bits, unless we have some braindead
>>> @@ -65,7 +47,7 @@ u32 __hyp_text __init_stage2_translation(void)
>>>        * return that value for the rest of the kernel to decide what
>>>        * to do.
>>>        */
>>> -    val |= 64 - (parange > 40 ? 40 : parange);
>>> +    val |= 64 - (phys_shift > 40 ? 40 : phys_shift);
>>>       /*
>>>        * Check the availability of Hardware Access Flag / Dirty Bit
>>> @@ -86,5 +68,5 @@ u32 __hyp_text __init_stage2_translation(void)
>>>       write_sysreg(val, vtcr_el2);
>>> -    return parange;
>>> +    return phys_shift;
>>>   }
>>>
>>
>
Suzuki K Poulose April 27, 2018, 3:22 p.m. UTC | #19
On 26/04/18 14:35, Julien Grall wrote:
> Hi Suzuki,
> 
> On 27/03/18 14:15, Suzuki K Poulose wrote:
>> Right now the stage2 page table for a VM is hard coded, assuming
>> an IPA of 40bits. As we are about to add support for per VM IPA,
>> prepare the stage2 page table helpers to accept the kvm instance
>> to make the right decision for the VM. No functional changes.


>> diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
>> index 3ab8b37..c3f1f9b 100644
>> --- a/arch/arm/include/asm/kvm_arm.h
>> +++ b/arch/arm/include/asm/kvm_arm.h
>> @@ -133,8 +133,7 @@
>>    * space.
>>    */
>>   #define KVM_PHYS_SHIFT    (40)
>> -#define KVM_PHYS_SIZE    (_AC(1, ULL) << KVM_PHYS_SHIFT)
>> -#define KVM_PHYS_MASK    (KVM_PHYS_SIZE - _AC(1, ULL))
> 
> I assume you are moving them to kvm_mmu.h in order to match the arm64 side, right? If so, would it not make sense to move KVM_PHYS_SHIFT with it?
> 
> [...]

I am moving all the macros that depend on the "kvm" instance to kvm_mmu.h.
I will see if I can move the KVM_PHYS_SHIFT without much trouble.

> 
>> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
>> index 7faed6e..594c4e6 100644
>> --- a/arch/arm64/include/asm/kvm_mmu.h
>> +++ b/arch/arm64/include/asm/kvm_mmu.h
>> @@ -134,8 +134,11 @@ static inline unsigned long __kern_hyp_va(unsigned long v)
>>    * We currently only support a 40bit IPA.
>>    */
>>   #define KVM_PHYS_SHIFT    (40)
>> -#define KVM_PHYS_SIZE    (1UL << KVM_PHYS_SHIFT)
>> -#define KVM_PHYS_MASK    (KVM_PHYS_SIZE - 1UL)
>> +
>> +#define kvm_phys_shift(kvm)        KVM_PHYS_SHIFT
>> +#define kvm_phys_size(kvm)        (_AC(1, ULL) << kvm_phys_shift(kvm))
> 
> NIT: is the _AC(...) necessary? It does not look like this function is going to be usable in assembly.
> 
>> +#define kvm_phys_mask(kvm)        (kvm_phys_size(kvm) - _AC(1, ULL))
> 
> Same here.
> 

Ok, will drop them.

Cheers
Suzuki
Suzuki K Poulose April 27, 2018, 3:58 p.m. UTC | #20
On 27/04/18 16:22, Suzuki K Poulose wrote:
> On 26/04/18 14:35, Julien Grall wrote:
>> Hi Suzuki,
>>
>> On 27/03/18 14:15, Suzuki K Poulose wrote:
>>> Right now the stage2 page table for a VM is hard coded, assuming
>>> an IPA of 40bits. As we are about to add support for per VM IPA,
>>> prepare the stage2 page table helpers to accept the kvm instance
>>> to make the right decision for the VM. No functional changes.
> 
> 
>>> diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
>>> index 3ab8b37..c3f1f9b 100644
>>> --- a/arch/arm/include/asm/kvm_arm.h
>>> +++ b/arch/arm/include/asm/kvm_arm.h
>>> @@ -133,8 +133,7 @@
>>>    * space.
>>>    */
>>>   #define KVM_PHYS_SHIFT    (40)
>>> -#define KVM_PHYS_SIZE    (_AC(1, ULL) << KVM_PHYS_SHIFT)
>>> -#define KVM_PHYS_MASK    (KVM_PHYS_SIZE - _AC(1, ULL))
>>
>> I assume you are moving them to kvm_mmu.h in order to match the arm64 side, right? If so, would it not make sense to move KVM_PHYS_SHIFT with it?
>>
>> [...]
> 
> I am moving all the macros that depend on the "kvm" instance to kvm_mmu.h.
> I will see if I can move the KVM_PHYS_SHIFT without much trouble.

It looks like we can't do that easily. KVM_PHYS_SHIFT is used for KVM_T0SZ
on arm, even though that can be simply hard coded to avoid the dependency on
KVM_PHYS_SHIFT (like we did for arm64, T0SZ is defined to 24). I would leave it
as it is to avoid the noise.

Cheers
Suzuki
Julien Grall April 27, 2018, 4:04 p.m. UTC | #21
Hi Suzuki,

On 27/04/18 16:58, Suzuki K Poulose wrote:
> On 27/04/18 16:22, Suzuki K Poulose wrote:
>> On 26/04/18 14:35, Julien Grall wrote:
>>> Hi Suzuki,
>>>
>>> On 27/03/18 14:15, Suzuki K Poulose wrote:
>>>> Right now the stage2 page table for a VM is hard coded, assuming
>>>> an IPA of 40bits. As we are about to add support for per VM IPA,
>>>> prepare the stage2 page table helpers to accept the kvm instance
>>>> to make the right decision for the VM. No functional changes.
>>
>>
>>>> diff --git a/arch/arm/include/asm/kvm_arm.h 
>>>> b/arch/arm/include/asm/kvm_arm.h
>>>> index 3ab8b37..c3f1f9b 100644
>>>> --- a/arch/arm/include/asm/kvm_arm.h
>>>> +++ b/arch/arm/include/asm/kvm_arm.h
>>>> @@ -133,8 +133,7 @@
>>>>    * space.
>>>>    */
>>>>   #define KVM_PHYS_SHIFT    (40)
>>>> -#define KVM_PHYS_SIZE    (_AC(1, ULL) << KVM_PHYS_SHIFT)
>>>> -#define KVM_PHYS_MASK    (KVM_PHYS_SIZE - _AC(1, ULL))
>>>
>>> I assume you are moving them to kvm_mmu.h in order to match the arm64 
>>> side, right? If so, would it not make sense to move KVM_PHYS_SHIFT 
>>> with it?
>>>
>>> [...]
>>
>> I am moving all the macros that depend on the "kvm" instance to 
>> kvm_mmu.h.
>> I will see if I can move the KVM_PHYS_SHIFT without much trouble.
> 
> It looks like we can't do that easily. KVM_PHYS_SHIFT is used for KVM_T0SZ
> on arm, even though that can be simply hard coded to avoid the 
> dependency on
> KVM_PHYS_SHIFT (like we did for arm64, T0SZ is defined to 24). I would 
> leave it
> as it is to avoid the noise.

Fine. That was only a suggestion :).

Cheers,
Julien Grall April 30, 2018, 11:14 a.m. UTC | #22
Hi Suzuki,

The algos in this patch look good to me. A couple of NITs below.

On 27/03/18 14:15, Suzuki K Poulose wrote:
> diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
> index b0c8417..176551c 100644
> --- a/arch/arm64/include/asm/kvm_arm.h
> +++ b/arch/arm64/include/asm/kvm_arm.h
> @@ -124,6 +124,8 @@
>   #define VTCR_EL2_VS_8BIT	(0 << VTCR_EL2_VS_SHIFT)
>   #define VTCR_EL2_VS_16BIT	(1 << VTCR_EL2_VS_SHIFT)
>   
> +#define VTCR_EL2_T0SZ(x)	TCR_T0SZ((x))

NIT: The inner parentheses should not be necessary.

[...]

> +/*
> + * VTCR_EL2:SL0 indicates the entry level for Stage2 translation.
> + * Interestingly, it depends on the page size.
> + * See D.10.2.110, VTCR_EL2, in ARM DDI 0487B.b
> + *
> + *	-----------------------------------------
> + *	| Entry level		|  4K  | 16K/64K |
> + *	------------------------------------------
> + *	| Level: 0		|  2   |   -     |
> + *	------------------------------------------
> + *	| Level: 1		|  1   |   2     |
> + *	------------------------------------------
> + *	| Level: 2		|  0   |   1     |
> + *	------------------------------------------
> + *	| Level: 3		|  -   |   0     |
> + *	------------------------------------------
> + *
> + * That table roughly translates to :
> + *
> + *	SL0(PAGE_SIZE, Entry_level) = SL0_BASE(PAGE_SIZE) - Entry_Level
> + *
> + * Where SL0_BASE(4K) = 2 and SL0_BASE(16K) = 3, SL0_BASE(64K) = 3, provided
> + * we take care of ruling out the unsupported cases and
> + * Entry_Level = 4 - Number_of_levels.
> + *
> + */
> +#define VTCR_EL2_SL0(levels) \
> +	((VTCR_EL2_TGRAN_SL0_BASE - (4 - (levels))) << VTCR_EL2_SL0_SHIFT)
> +/*
> + * ARM VMSAv8-64 defines an algorithm for finding the translation table
> + * descriptors in section D4.2.8 in ARM DDI 0487B.b.
> + *
> + * The algorithm defines the expectaions on the BaseAddress (for the page

NIT: s/expectaions/expectations/

Cheers,
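The SL0 table and formula quoted above can be checked numerically: with SL0_BASE(4K) = 2 and SL0_BASE(16K/64K) = 3, SL0 = SL0_BASE - Entry_Level, where Entry_Level = 4 - Number_of_levels. A minimal sketch (the function name is illustrative, and the shift into the SL0 field position is omitted for clarity):

```c
#include <assert.h>

/* SL0 field value from the page-size base and the number of stage2
 * levels, per the formula in the quoted comment:
 * SL0 = SL0_BASE(PAGE_SIZE) - (4 - levels). */
static unsigned int vtcr_sl0(unsigned int sl0_base, unsigned int levels)
{
	return sl0_base - (4 - levels);
}
```

Running each supported row of the table through this reproduces the SL0 column exactly, which is the point of folding the table into one expression.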
Julien Grall April 30, 2018, 2:17 p.m. UTC | #23
Hi,

On 27/03/18 14:15, Suzuki K Poulose wrote:
> diff --git a/arm/kvm.c b/arm/kvm.c
> index 5701d41..a9a9140 100644
> --- a/arm/kvm.c
> +++ b/arm/kvm.c
> @@ -11,6 +11,8 @@
>   #include <linux/kvm.h>
>   #include <linux/sizes.h>
>   
> +unsigned long kvm_arm_type;
> +
>   struct kvm_ext kvm_req_ext[] = {
>   	{ DEFINE_KVM_EXT(KVM_CAP_IRQCHIP) },
>   	{ DEFINE_KVM_EXT(KVM_CAP_ONE_REG) },
> @@ -18,6 +20,25 @@ struct kvm_ext kvm_req_ext[] = {
>   	{ 0, 0 },
>   };
>   
> +#ifndef KVM_ARM_GET_MAX_VM_PHYS_SHIFT
> +#define KVM_ARM_GET_MAX_VM_PHYS_SHIFT		_IO(KVMIO, 0x0a)
> +#endif
> +
> +void kvm__arch_init_hyp(struct kvm *kvm)
> +{
> +	unsigned max_ipa;
> +
> +	max_ipa = ioctl(kvm->sys_fd, KVM_ARM_GET_MAX_VM_PHYS_SHIFT);
> +	if (max_ipa < 0)

Another issue spotted while doing some testing: this check will always be
false because max_ipa is unsigned.

I think we want to turn max_ipa to signed.

Cheers,
Suzuki K Poulose April 30, 2018, 2:18 p.m. UTC | #24
On 30/04/18 15:17, Julien Grall wrote:
> Hi,
> 
> On 27/03/18 14:15, Suzuki K Poulose wrote:
>> diff --git a/arm/kvm.c b/arm/kvm.c
>> index 5701d41..a9a9140 100644
>> --- a/arm/kvm.c
>> +++ b/arm/kvm.c
>> @@ -11,6 +11,8 @@
>>   #include <linux/kvm.h>
>>   #include <linux/sizes.h>
>> +unsigned long kvm_arm_type;
>> +
>>   struct kvm_ext kvm_req_ext[] = {
>>       { DEFINE_KVM_EXT(KVM_CAP_IRQCHIP) },
>>       { DEFINE_KVM_EXT(KVM_CAP_ONE_REG) },
>> @@ -18,6 +20,25 @@ struct kvm_ext kvm_req_ext[] = {
>>       { 0, 0 },
>>   };
>> +#ifndef KVM_ARM_GET_MAX_VM_PHYS_SHIFT
>> +#define KVM_ARM_GET_MAX_VM_PHYS_SHIFT        _IO(KVMIO, 0x0a)
>> +#endif
>> +
>> +void kvm__arch_init_hyp(struct kvm *kvm)
>> +{
>> +    unsigned max_ipa;
>> +
>> +    max_ipa = ioctl(kvm->sys_fd, KVM_ARM_GET_MAX_VM_PHYS_SHIFT);
>> +    if (max_ipa < 0)
> 
> Another issue spotted while doing some testing: this check will always be false because max_ipa is unsigned.
> 
> I think we want to turn max_ipa to signed.

Good catch. I will fix it.

Cheers
Suzuki
James Morse May 3, 2018, 2:39 p.m. UTC | #25
Hi Suzuki,

On 27/03/18 14:15, Suzuki K Poulose wrote:
> We set VTCR_EL2 very early during the stage2 init and don't
> touch it ever. This is fine as we had a fixed IPA size. This
> patch changes the behavior to set the VTCR for a given VM,
> depending on its stage2 table. The common configuration for
> VTCR is still performed during the early init as we have to
> retain the hardware access flag update bits (VTCR_EL2_HA)
> per CPU (as they are only set for the CPUs which are capabile).
> The bits defining the number of levels in the page table (SL0)
> and and the size of the Input address to the translation (T0SZ)
> are programmed for each VM upon entry to the guest.


> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 596f8e4..9f3c8b8 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -392,10 +392,12 @@ int kvm_arm_vcpu_arch_has_attr(struct kvm_vcpu *vcpu,
>  
>  static inline void __cpu_init_stage2(void)
>  {
> -	u32 parange = kvm_call_hyp(__init_stage2_translation);
> +	u32 ps;
>  
> -	WARN_ONCE(parange < 40,
> -		  "PARange is %d bits, unsupported configuration!", parange);
> +	kvm_call_hyp(__init_stage2_translation);
> +	ps = id_aa64mmfr0_parange_to_phys_shift(read_sysreg(id_aa64mmfr0_el1));

Doesn't id_aa64mmfr0_parange_to_phys_shift() expect you to do the mask and shift
for it? This will always hit the default case.


> +	WARN_ONCE(ps < 40,
> +		  "PARange is %d bits, unsupported configuration!", ps);
>  }



Thanks,

James
James Morse May 3, 2018, 2:39 p.m. UTC | #26
Hi Suzuki,

Nit: KVM in the subject line?

On 27/03/18 14:15, Suzuki K Poulose wrote:
> Add a helper to convert ID_AA64MMFR0_EL1:PARange to they physical
> size shift. Limit the size to the maximum supported by the kernel.
> We are about to move the user of this code and this helps to
> keep the changes cleaner.

> diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
> index fbf0aab..1f2a5dd 100644
> --- a/arch/arm64/include/asm/cpufeature.h
> +++ b/arch/arm64/include/asm/cpufeature.h
> @@ -311,6 +311,22 @@ static inline u64 read_zcr_features(void)
>  	return zcr;
>  }
>  
> +static inline u32 id_aa64mmfr0_parange_to_phys_shift(int parange)
> +{
> +	switch (parange) {
> +	case 0: return 32;
> +	case 1: return 36;
> +	case 2: return 40;
> +	case 3: return 42;
> +	case 4: return 44;
> +	/* Report 48 bit if the kernel doesn't support 52bit */
> +	default:
> +	case 5: return 48;
> +#ifdef CONFIG_ARM64_PA_BITS_52
> +	case 6: return 52;
> +#endif

Eeew. I thought 'default' had to appear at the end of the list, but evidently
not! If the last three-bit value ever gets used, this is going to look really weird.

Can't we have a helper that just does the mapping, then apply the clamping with
something like:
| parange = min(CONFIG_ARM64_PA_BITS, parange);


It's odd that the helper has the id-register in the name, but expects you to do
the shift and mask for it...

(and for this patch, KVM has already done the 52bit clamping with:
| 	if (parange > ID_AA64MMFR0_PARANGE_MAX)
|		parange = ID_AA64MMFR0_PARANGE_MAX;
)


> diff --git a/arch/arm64/kvm/hyp/s2-setup.c b/arch/arm64/kvm/hyp/s2-setup.c
> index 603e1ee..b1129c8 100644
> --- a/arch/arm64/kvm/hyp/s2-setup.c
> +++ b/arch/arm64/kvm/hyp/s2-setup.c
> @@ -19,11 +19,13 @@
>  #include <asm/kvm_arm.h>
>  #include <asm/kvm_asm.h>
>  #include <asm/kvm_hyp.h>
> +#include <asm/cpufeature.h>
>  
>  u32 __hyp_text __init_stage2_translation(void)
>  {

Nit: Why change the variable you put this in, if it's all removed again in patch 11?


Thanks,

James
Suzuki K Poulose May 8, 2018, 11:16 a.m. UTC | #27
On 03/05/18 15:39, James Morse wrote:
> Hi Suzuki,
> 
> On 27/03/18 14:15, Suzuki K Poulose wrote:
>> We set VTCR_EL2 very early during the stage2 init and don't
>> touch it ever. This is fine as we had a fixed IPA size. This
>> patch changes the behavior to set the VTCR for a given VM,
>> depending on its stage2 table. The common configuration for
>> VTCR is still performed during the early init as we have to
>> retain the hardware access flag update bits (VTCR_EL2_HA)
>> per CPU (as they are only set for the CPUs which are capabile).
>> The bits defining the number of levels in the page table (SL0)
>> and and the size of the Input address to the translation (T0SZ)
>> are programmed for each VM upon entry to the guest.
> 
> 
>> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
>> index 596f8e4..9f3c8b8 100644
>> --- a/arch/arm64/include/asm/kvm_host.h
>> +++ b/arch/arm64/include/asm/kvm_host.h
>> @@ -392,10 +392,12 @@ int kvm_arm_vcpu_arch_has_attr(struct kvm_vcpu *vcpu,
>>   
>>   static inline void __cpu_init_stage2(void)
>>   {
>> -	u32 parange = kvm_call_hyp(__init_stage2_translation);
>> +	u32 ps;
>>   
>> -	WARN_ONCE(parange < 40,
>> -		  "PARange is %d bits, unsupported configuration!", parange);
>> +	kvm_call_hyp(__init_stage2_translation);
>> +	ps = id_aa64mmfr0_parange_to_phys_shift(read_sysreg(id_aa64mmfr0_el1));
> 
> Doesn't id_aa64mmfr0_parange_to_phys_shift() expect you to do the mask and shift
> for it? This will always hit the default case.

Good catch! The error case was not hit on the system I tested, as it
indeed has a 48-bit PA. I should have done more testing on Juno, where it
has a 40-bit PA (which doesn't really exercise different phys-shift ranges).

I will change the helper to extract the parange and then convert it.
Also, rename it to:

id_aa64mmfr0_phys_shift()

Cheers
Suzuki

> 
> 
>> +	WARN_ONCE(ps < 40,
>> +		  "PARange is %d bits, unsupported configuration!", ps);
>>   }
> 
> 
> 
> Thanks,
> 
> James
>
Suzuki K Poulose May 8, 2018, 1:47 p.m. UTC | #28
On 03/05/18 15:39, James Morse wrote:
> Hi Suzuki,
> 
> Nit: KVM in the subject line?

Well, the helper is generic and it's just that KVM makes use of it.

> 
> On 27/03/18 14:15, Suzuki K Poulose wrote:
>> Add a helper to convert ID_AA64MMFR0_EL1:PARange to they physical
>> size shift. Limit the size to the maximum supported by the kernel.
>> We are about to move the user of this code and this helps to
>> keep the changes cleaner.
> 
>> diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
>> index fbf0aab..1f2a5dd 100644
>> --- a/arch/arm64/include/asm/cpufeature.h
>> +++ b/arch/arm64/include/asm/cpufeature.h
>> @@ -311,6 +311,22 @@ static inline u64 read_zcr_features(void)
>>   	return zcr;
>>   }
>>   
>> +static inline u32 id_aa64mmfr0_parange_to_phys_shift(int parange)
>> +{
>> +	switch (parange) {
>> +	case 0: return 32;
>> +	case 1: return 36;
>> +	case 2: return 40;
>> +	case 3: return 42;
>> +	case 4: return 44;
>> +	/* Report 48 bit if the kernel doesn't support 52bit */
>> +	default:
>> +	case 5: return 48;
>> +#ifdef CONFIG_ARM64_PA_BITS_52
>> +	case 6: return 52;
>> +#endif
> 
> Eeew. I thought 'default' had to appear at the end of the list, but evidently
> not! If the last three bit value ever gets used this is going to look really weird.

I could rearrange them a bit to :

	case 4: ..
#ifdef CONFIG_ARM64_PA_BITS_52
	case 6: ...
#endif
	case 5:
	/* Report 48 bit if the kernel doesn't support 52bit */
	default: return 48;

> 
> Can't we have a helper that just does the mapping, then apply the clamping with
> something like:
> | parange = min(CONFIG_ARM64_PA_BITS, parange);

Yes, I think that might be a bit cleaner.

> 
> 
> It's odd that the helper has the id-register in the name, but expects you to do the
> shift and mask for it...
> 
> (and for this patch, KVM has already done the 52bit clamping with:
> | 	if (parange > ID_AA64MMFR0_PARANGE_MAX)
> |		parange = ID_AA64MMFR0_PARANGE_MAX;
> )
> 

As mentioned in the other thread, I will change the name of the helper.

> 
>> diff --git a/arch/arm64/kvm/hyp/s2-setup.c b/arch/arm64/kvm/hyp/s2-setup.c
>> index 603e1ee..b1129c8 100644
>> --- a/arch/arm64/kvm/hyp/s2-setup.c
>> +++ b/arch/arm64/kvm/hyp/s2-setup.c
>> @@ -19,11 +19,13 @@
>>   #include <asm/kvm_arm.h>
>>   #include <asm/kvm_asm.h>
>>   #include <asm/kvm_hyp.h>
>> +#include <asm/cpufeature.h>
>>   
>>   u32 __hyp_text __init_stage2_translation(void)
>>   {
> 
> Nit: Why change the variable you put this in, if its all removed again in patch 11?

The variable initially holds the PARange, used to set VTCR.IPS, and is then
overloaded with the converted "phys-shift". The change is a minor cleanup to
make that clear, even though we remove it later once we no longer deal with
the phys-shift. I would prefer to keep it as it is.

Cheers
Suzuki
Michael S. Tsirkin July 13, 2018, 12:36 a.m. UTC | #29
On Tue, Mar 27, 2018 at 05:11:04PM +0300, Michael S. Tsirkin wrote:
> On Tue, Mar 27, 2018 at 02:15:12PM +0100, Suzuki K Poulose wrote:
> > Legacy PCI over virtio uses a 32bit PFN for the queue. If the
> > queue pfn is too large to fit in 32bits, which we could hit on
> > arm64 systems with 52bit physical addresses (even with 64K page
> > size), we simply miss out a proper link to the other side of
> > the queue.
> > 
> > Add a check to validate the PFN, rather than silently breaking
> > the devices.
> > 
> > Cc: "Michael S. Tsirkin" <mst@redhat.com>
> > Cc: Jason Wang <jasowang@redhat.com>
> > Cc: Marc Zyngier <marc.zyngier@arm.com>
> > Cc: Christoffer Dall <cdall@kernel.org>
> > Cc: Peter Maydel <peter.maydell@linaro.org>
> > Cc: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
> > Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> > ---
> >  drivers/virtio/virtio_pci_legacy.c | 12 ++++++++++--
> >  1 file changed, 10 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/virtio/virtio_pci_legacy.c b/drivers/virtio/virtio_pci_legacy.c
> > index 2780886..4b84a75 100644
> > --- a/drivers/virtio/virtio_pci_legacy.c
> > +++ b/drivers/virtio/virtio_pci_legacy.c
> > @@ -122,6 +122,7 @@ static struct virtqueue *setup_vq(struct virtio_pci_device *vp_dev,
> >  	struct virtqueue *vq;
> >  	u16 num;
> >  	int err;
> > +	u64 q_pfn;
> >  
> >  	/* Select the queue we're interested in */
> >  	iowrite16(index, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_SEL);
> > @@ -141,9 +142,15 @@ static struct virtqueue *setup_vq(struct virtio_pci_device *vp_dev,
> >  	if (!vq)
> >  		return ERR_PTR(-ENOMEM);
> >  
> > +	q_pfn = virtqueue_get_desc_addr(vq) >> VIRTIO_PCI_QUEUE_ADDR_SHIFT;
> > +	if (q_pfn >> 32) {
> > +		dev_err(&vp_dev->pci_dev->dev, "virtio-pci queue PFN too large\n");
> > +		err = -ENOMEM;
> 
> ENOMEM seems wrong here. E2BIG?
> 
> > +		goto out_del_vq;
> > +	}
> > +
> >  	/* activate the queue */
> > -	iowrite32(virtqueue_get_desc_addr(vq) >> VIRTIO_PCI_QUEUE_ADDR_SHIFT,
> > -		  vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN);
> > +	iowrite32((u32)q_pfn, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN);
> 
> Is the cast really necessary here?
> 
> >  
> >  	vq->priv = (void __force *)vp_dev->ioaddr + VIRTIO_PCI_QUEUE_NOTIFY;
> >  
> > @@ -160,6 +167,7 @@ static struct virtqueue *setup_vq(struct virtio_pci_device *vp_dev,
> >  
> >  out_deactivate:
> >  	iowrite32(0, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN);
> > +out_del_vq:
> >  	vring_del_virtqueue(vq);
> >  	return ERR_PTR(err);
> >  }
> > -- 
> > 2.7.4

Ping. Are you going to address and repost, or should I drop this?
Suzuki K Poulose July 13, 2018, 8:54 a.m. UTC | #30
On 13/07/18 01:36, Michael S. Tsirkin wrote:
> On Tue, Mar 27, 2018 at 05:11:04PM +0300, Michael S. Tsirkin wrote:
>> On Tue, Mar 27, 2018 at 02:15:12PM +0100, Suzuki K Poulose wrote:
>>> Legacy PCI over virtio uses a 32bit PFN for the queue. If the
>>> queue pfn is too large to fit in 32bits, which we could hit on
>>> arm64 systems with 52bit physical addresses (even with 64K page
>>> size), we simply miss out a proper link to the other side of
>>> the queue.
>>>
>>> Add a check to validate the PFN, rather than silently breaking
>>> the devices.
>>>
>>> Cc: "Michael S. Tsirkin" <mst@redhat.com>
>>> Cc: Jason Wang <jasowang@redhat.com>
>>> Cc: Marc Zyngier <marc.zyngier@arm.com>
>>> Cc: Christoffer Dall <cdall@kernel.org>
>>> Cc: Peter Maydel <peter.maydell@linaro.org>
>>> Cc: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
>>> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
>>> ---
>>>   drivers/virtio/virtio_pci_legacy.c | 12 ++++++++++--
>>>   1 file changed, 10 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/virtio/virtio_pci_legacy.c b/drivers/virtio/virtio_pci_legacy.c
>>> index 2780886..4b84a75 100644
>>> --- a/drivers/virtio/virtio_pci_legacy.c
>>> +++ b/drivers/virtio/virtio_pci_legacy.c
>>> @@ -122,6 +122,7 @@ static struct virtqueue *setup_vq(struct virtio_pci_device *vp_dev,
>>>   	struct virtqueue *vq;
>>>   	u16 num;
>>>   	int err;
>>> +	u64 q_pfn;
>>>   
>>>   	/* Select the queue we're interested in */
>>>   	iowrite16(index, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_SEL);
>>> @@ -141,9 +142,15 @@ static struct virtqueue *setup_vq(struct virtio_pci_device *vp_dev,
>>>   	if (!vq)
>>>   		return ERR_PTR(-ENOMEM);
>>>   
>>> +	q_pfn = virtqueue_get_desc_addr(vq) >> VIRTIO_PCI_QUEUE_ADDR_SHIFT;
>>> +	if (q_pfn >> 32) {
>>> +		dev_err(&vp_dev->pci_dev->dev, "virtio-pci queue PFN too large\n");
>>> +		err = -ENOMEM;
>>
>> ENOMEM seems wrong here. E2BIG?
>>
>>> +		goto out_del_vq;
>>> +	}
>>> +
>>>   	/* activate the queue */
>>> -	iowrite32(virtqueue_get_desc_addr(vq) >> VIRTIO_PCI_QUEUE_ADDR_SHIFT,
>>> -		  vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN);
>>> +	iowrite32((u32)q_pfn, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN);
>>
>> Is the cast really necessary here?
>>
>>>   
>>>   	vq->priv = (void __force *)vp_dev->ioaddr + VIRTIO_PCI_QUEUE_NOTIFY;
>>>   
>>> @@ -160,6 +167,7 @@ static struct virtqueue *setup_vq(struct virtio_pci_device *vp_dev,
>>>   
>>>   out_deactivate:
>>>   	iowrite32(0, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN);
>>> +out_del_vq:
>>>   	vring_del_virtqueue(vq);
>>>   	return ERR_PTR(err);
>>>   }
>>> -- 
>>> 2.7.4

Michael,

> 
> Ping are you going to address and repost, or should I drop this?

This was addressed and reposted as v3, which needs a minor update to the
error message as mentioned here [0]. I will post the fixed version today.

[0] http://lists.infradead.org/pipermail/linux-arm-kernel/2018-July/588398.html

Thanks
Suzuki