
[v4,00/20] kvm: arm64: Dynamic IPA and 52bit IPA

Message ID 1531905547-25478-1-git-send-email-suzuki.poulose@arm.com

Message

Suzuki K Poulose July 18, 2018, 9:18 a.m. UTC
The physical address space size for a VM (the IPA size) on arm/arm64 is
currently fixed at 40bits. This series adds support for an IPA size
specific to each VM, allowing any size supported by the host (based on
the host kernel configuration and CPU support). The default size stays
40bits. On arm64, the limit can also be lowered (restricting stage2 to
2 levels of page tables, to prevent splitting the host PMD huge pages
at stage2). We also add support for handling the 52bit IPA addresses
introduced by the Arm v8.2 extensions, where supported.

As mentioned above, the IPA size supported on a host could differ from
the PARange indicated by the CPUs (e.g., due to a kernel limit on the
PA size). So we expose the limit via a new capability,
KVM_CAP_ARM_VM_MAX_PHYS_SHIFT, on arm/arm64. See below for further
discussion of the userspace API.
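For illustration, userspace could query the limit with KVM_CHECK_EXTENSION
on the /dev/kvm fd. A minimal sketch (KVM_CAP_ARM_VM_MAX_PHYS_SHIFT is the
capability proposed by this series, so it needs the series' headers):

	#include <fcntl.h>
	#include <stdio.h>
	#include <sys/ioctl.h>
	#include <linux/kvm.h>

	int main(void)
	{
		int max_shift, kvm_fd = open("/dev/kvm", O_RDWR);

		if (kvm_fd < 0)
			return 1;
		/* Returns the maximum IPA shift usable for a VM on this host */
		max_shift = ioctl(kvm_fd, KVM_CHECK_EXTENSION,
				  KVM_CAP_ARM_VM_MAX_PHYS_SHIFT);
		if (max_shift > 0)
			printf("max IPA: %d bits\n", max_shift);
		return 0;
	}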

Supporting different IPA sizes requires modifications to the stage2 page
table code. The arm64 page table level helpers are defined based on the
page table levels used for the host VA, so the accessors may not work
if the guest uses more levels in stage2 than the host does in stage1.
The previous versions (v1 & v2) of this series refactored the stage1
page table accessors so that the low-level accessors could be reused for
an independent stage2 table. However, due to the level folding in the
generic code, the types are redefined as well. i.e., if the PUD is
folded, pud_t is defined as:

 typedef struct { pgd_t pgd; } pud_t;

and similarly for pmd_t. So, without stage2 page table entry types
independent of stage1, we could be dealing with a different type for
each of the level 0-2 entries. This works in practice on arm/arm64, as
the entries have the same format and size and we always use the
appropriate accessors to get the raw value (i.e., pud_val/pmd_val etc.),
but it is not ideal for an upstream solution. So, this version caps the
stage2 page table levels at the number used by stage1. This has the
following impact on the IPA support for various pagesize/host-VA
combinations:


x-----------------------------------------------------x
| host\ipa    | 40bit | 42bit | 44bit | 48bit | 52bit |
-------------------------------------------------------
| 39bit-4K    |  y    |   y   |  n    |   n   |  n/a  |
-------------------------------------------------------
| 48bit-4K    |  y    |   y   |  y    |   y   |  n/a  |
-------------------------------------------------------
| 36bit-16K   |  y    |   n   |  n    |   n   |  n/a  |
-------------------------------------------------------
| 47bit-16K   |  y    |   y   |  y    |   y   |  n/a  |
-------------------------------------------------------
| 48bit-16K   |  y    |   y   |  y    |   y   |  n/a  |
-------------------------------------------------------
| 42bit-64K   |  y    |   y   |  y    |   n   |  n    |
-------------------------------------------------------
| 48bit-64K   |  y    |   y   |  y    |   y   |  y    |
x-----------------------------------------------------x

Put differently, the following list shows what cannot be supported:

 39bit-4K host  | [44 - 48]
 36bit-16K host | [41 - 48]
 42bit-64K host | [47 - 52]

which is not a severe restriction (see the sketch below for where these
limits come from). We can pursue the independent stage2 page table
support and lift the restriction once we get there. Given there is a
proposal for a new generic page table walker [0], it makes sense to keep
our efforts in sync with it, to avoid diverging from a common API.
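To see where these limits come from, a rough sketch of the level
arithmetic (an illustration only, not code from the series; the extra
4 bits come from concatenating up to 16 tables at the stage2 entry
level):

	/* Page table levels needed to resolve 'va_bits' on arm64 */
	static int pgtable_levels(int va_bits, int page_shift)
	{
		int bits_per_level = page_shift - 3;	/* 8-byte descriptors */

		return (va_bits - page_shift + bits_per_level - 1) / bits_per_level;
	}

	/*
	 * Capping stage2 at the host's stage1 level count limits the IPA to
	 * the host VA bits plus up to 4 bits from entry level concatenation:
	 * e.g. a 39bit-4K host has 3 levels, so at most 39 + 4 = 43bit IPA,
	 * hence [44 - 48] cannot be supported.
	 */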

52bit support is added for the VGIC (including ITS emulation) and for
the handling of the PAR and HPFAR registers.

We need to set the IPA limit as early as VM creation, to keep the code
simple and avoid sprinkling checks everywhere to ensure that the IPA is
configured. So, this version encodes the IPA size in the machine_type
argument to the KVM_CREATE_VM ioctl: bits [7:0] of the type are reserved
for the IPA size. However, there are other reasons (e.g., the SVE vector
length on arm64) why we need a better way to tune the parameters of a VM
early enough to finalise the settings before any additional
devices/memory/CPUs are added to the VM. See [2] for the discussion on
the previous version.
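With the encoding in this series, creating a VM with a non-default IPA
size might look like the sketch below (bits [7:0] follow this series'
proposal; a type of 0 keeps the 40bit default):

	unsigned long type = ipa_bits & 0xffUL;	/* IPA size in type bits [7:0] */
	int vm_fd = ioctl(kvm_fd, KVM_CREATE_VM, type);

	if (vm_fd < 0)
		perror("KVM_CREATE_VM");	/* e.g. size unsupported by the host */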

To summarise, these are the options:

 1) Add a new CREATE_VM ioctl (e.g. KVM_CREATE_VM2) which imposes one of the
following semantics on the user:
  a) Pass a list of attributes/capabilities to be configured when the
     VM is created.

	vm_fd = ioctl(kvm_dev_fd, KVM_CREATE_VM2, &attributes);
     where "attributes" could be a new different structure or existing
     struct kvm_enable_cap {}.

	OR

  b) Create the VM in an unconfigured state, where no resources can
     be allocated until a further ioctl is issued after configuring
     the different attributes/capabilities, i.e.:
		vm_fd = ioctl(kvm_dev_fd, KVM_CREATE_VM2, ..);
		/* Creates a VM unconfigured, not runnable */
		/* Configure the VM attributes on vm_fd */
		rc = ioctl(vm_fd, KVM_CREATE_VM_COMPLETE,...);
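As an illustration of option (a), reusing the existing structure (purely
hypothetical; neither KVM_CREATE_VM2 nor this use of the capability is
defined anywhere yet):

	struct kvm_enable_cap attr = {
		.cap  = KVM_CAP_ARM_VM_MAX_PHYS_SHIFT,	/* hypothetical reuse */
		.args = { ipa_bits },
	};

	vm_fd = ioctl(kvm_dev_fd, KVM_CREATE_VM2, &attr);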

The series applies on top of 4.18-rc4. A tree is available here:

	 git://linux-arm.org/linux-skp.git ipa52/v4

Tested with:
  - A modified kvmtool (patches included in the series for
    reference/testing), which can only be used:
    * with virtio-pci up to 44bit PA (due to the 4K page size for legacy
      virtio-pci as implemented by kvmtool)
    * up to 48bit PA with virtio-mmio, due to its 32bit PFN limitation
      (see the arithmetic sketch below).
  - A hacked Qemu (boot loader support for highmem, phys-shift support):
    * with virtio-pci, GIC-v3 ITS & MSI, up to 52bit on the Foundation
      model.
    Also see [1] for the Qemu support.
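For reference, these PA ceilings fall out of the 32bit PFN fields (a
back-of-the-envelope sketch, not code from the series):

	/*
	 * A 32bit PFN addresses at most 2^(32 + page_shift) bytes:
	 *   legacy virtio-pci: fixed 4K units -> 2^(32 + 12) = 44bit PA
	 *   virtio-mmio v1, 64K pages         -> 2^(32 + 16) = 48bit PA
	 */
	static unsigned long long max_pfn_reachable_pa(int page_shift)
	{
		return 1ULL << (32 + page_shift);
	}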

[0] https://lkml.org/lkml/2018/4/24/777
[1] https://lists.gnu.org/archive/html/qemu-devel/2018-06/msg05759.html
[2] https://lkml.org/lkml/2018/7/4/729

Changes since V3:
 - Use per-VM VTCR instead of per-VM private VTCR bits
 - Allow IPA less than 40bits
 - Split the patch adding support for stage2 dynamic page tables
 - Rearrange the series to keep the userspace API at the end, which
   needs further discussion.
 - Collect Reviews/Acks from Eric & Marc

Changes since V2:
 - Drop "refactoring of host page table helpers" and restrict the IPA size
   to make sure stage2 doesn't use more page table levels than that of the host.
 - Load VTCR for TLB operations on behalf of the VM (Pointed-by: James Morse)
 - Split a couple of patches to make them easier to review.
 - Fall back to normal (non-concatenated) entry level page table support if
   possible.
 - Bump the IOCTL number

Changes since V1:
 - Change the userspace API for configuring VM to encode the IPA
   size in the VM type.  (suggested by Christoffer)
 - Expose the IPA limit on the host via ioctl on /dev/kvm
 - Handle 52bit addresses in PAR & HPFAR
 - Drop patch changing the life time of stage2 PGD
 - Rename macros for 48-to-52 bit conversion for GIC ITS BASER.
   (suggested by Christoffer)
 - Split virtio PFN check patches and address comments.


Kristina Martsenko (1):
  vgic: Add support for 52bit guest physical address

Suzuki K Poulose (19):
  virtio: mmio-v1: Validate queue PFN
  virtio: pci-legacy: Validate queue pfn
  kvm: arm/arm64: Fix stage2_flush_memslot for 4 level page table
  kvm: arm/arm64: Remove spurious WARN_ON
  kvm: arm64: Add helper for loading the stage2 setting for a VM
  arm64: Add a helper for PARange to physical shift conversion
  kvm: arm64: Clean up VTCR_EL2 initialisation
  kvm: arm64: Configure VTCR_EL2 per VM
  kvm: arm/arm64: Prepare for VM specific stage2 translations
  kvm: arm64: Prepare for dynamic stage2 page table layout
  kvm: arm64: Make stage2 page table layout dynamic
  kvm: arm64: Dynamic configuration of VTTBR mask
  kvm: arm64: Configure VTCR_EL2.SL0 per VM
  kvm: arm64: Switch to per VM IPA limit
  kvm: arm64: Add 52bit support for PAR to HPFAR conversion
  kvm: arm64: Set a limit on the IPA size
  kvm: arm64: Limit the minimum number of page table levels
  kvm: arm/arm64: Expose supported physical address limit for VM
  kvm: arm64: Allow tuning the physical address size for VM

 Documentation/virtual/kvm/api.txt             |   4 +
 arch/arm/include/asm/kvm_arm.h                |   3 +-
 arch/arm/include/asm/kvm_host.h               |   7 +
 arch/arm/include/asm/kvm_mmu.h                |  18 +-
 arch/arm/include/asm/stage2_pgtable.h         |  50 +++---
 arch/arm64/include/asm/cpufeature.h           |  13 ++
 arch/arm64/include/asm/kvm_arm.h              | 130 +++++++++++---
 arch/arm64/include/asm/kvm_asm.h              |   2 -
 arch/arm64/include/asm/kvm_host.h             |  15 +-
 arch/arm64/include/asm/kvm_hyp.h              |   7 +
 arch/arm64/include/asm/kvm_mmu.h              |  82 ++++++++-
 arch/arm64/include/asm/stage2_pgtable-nopmd.h |  42 -----
 arch/arm64/include/asm/stage2_pgtable-nopud.h |  39 -----
 arch/arm64/include/asm/stage2_pgtable.h       | 237 +++++++++++++++++++-------
 arch/arm64/kvm/guest.c                        |  49 +++++-
 arch/arm64/kvm/hyp/Makefile                   |   1 -
 arch/arm64/kvm/hyp/s2-setup.c                 |  90 ----------
 arch/arm64/kvm/hyp/switch.c                   |   4 +-
 arch/arm64/kvm/hyp/tlb.c                      |   4 +-
 drivers/virtio/virtio_mmio.c                  |  20 ++-
 drivers/virtio/virtio_pci_legacy.c            |  14 +-
 include/linux/irqchip/arm-gic-v3.h            |   5 +
 include/uapi/linux/kvm.h                      |  10 ++
 virt/kvm/arm/arm.c                            |  23 ++-
 virt/kvm/arm/mmu.c                            | 120 ++++++-------
 virt/kvm/arm/vgic/vgic-its.c                  |  36 ++--
 virt/kvm/arm/vgic/vgic-kvm-device.c           |   2 +-
 virt/kvm/arm/vgic/vgic-mmio-v3.c              |   2 -
 28 files changed, 631 insertions(+), 398 deletions(-)
 delete mode 100644 arch/arm64/include/asm/stage2_pgtable-nopmd.h
 delete mode 100644 arch/arm64/include/asm/stage2_pgtable-nopud.h
 delete mode 100644 arch/arm64/kvm/hyp/s2-setup.c


kvmtool changes :

Suzuki K Poulose (4):
  kvmtool: Allow backends to run checks on the KVM device fd
  kvmtool: arm64: Add support for guest physical address size
  kvmtool: arm64: Switch memory layout
  kvmtool: arm: Add support for creating VM with PA size

 arm/aarch32/include/kvm/kvm-arch.h        |  6 ++++--
 arm/aarch64/include/kvm/kvm-arch.h        | 15 ++++++++++++---
 arm/aarch64/include/kvm/kvm-config-arch.h |  5 ++++-
 arm/include/arm-common/kvm-arch.h         | 17 +++++++++++------
 arm/include/arm-common/kvm-config-arch.h  |  1 +
 arm/kvm.c                                 | 24 +++++++++++++++++++++++-
 include/kvm/kvm.h                         |  4 ++++
 kvm.c                                     |  2 ++
 8 files changed, 61 insertions(+), 13 deletions(-)

Comments

Michael S. Tsirkin July 22, 2018, 3:53 p.m. UTC | #1
On Wed, Jul 18, 2018 at 10:18:45AM +0100, Suzuki K Poulose wrote:
> Legacy PCI over virtio uses a 32bit PFN for the queue. If the
> queue pfn is too large to fit in 32bits, which we could hit on
> arm64 systems with 52bit physical addresses (even with 64K page
> size), we simply miss out a proper link to the other side of
> the queue.
> 
> Add a check to validate the PFN, rather than silently breaking
> the devices.
> 
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: Jason Wang <jasowang@redhat.com>
> Cc: Marc Zyngier <marc.zyngier@arm.com>
> Cc: Christoffer Dall <cdall@kernel.org>
> Cc: Peter Maydell <peter.maydell@linaro.org>
> Cc: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>

Acked-by: Michael S. Tsirkin <mst@redhat.com>

I assume this will be merged through some other tree.

> ---
> Changes since v2:
>  - Change errno to -E2BIG
> ---
>  drivers/virtio/virtio_pci_legacy.c | 14 ++++++++++++--
>  1 file changed, 12 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/virtio/virtio_pci_legacy.c b/drivers/virtio/virtio_pci_legacy.c
> index 2780886..de062fb 100644
> --- a/drivers/virtio/virtio_pci_legacy.c
> +++ b/drivers/virtio/virtio_pci_legacy.c
> @@ -122,6 +122,7 @@ static struct virtqueue *setup_vq(struct virtio_pci_device *vp_dev,
>  	struct virtqueue *vq;
>  	u16 num;
>  	int err;
> +	u64 q_pfn;
>  
>  	/* Select the queue we're interested in */
>  	iowrite16(index, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_SEL);
> @@ -141,9 +142,17 @@ static struct virtqueue *setup_vq(struct virtio_pci_device *vp_dev,
>  	if (!vq)
>  		return ERR_PTR(-ENOMEM);
>  
> +	q_pfn = virtqueue_get_desc_addr(vq) >> VIRTIO_PCI_QUEUE_ADDR_SHIFT;
> +	if (q_pfn >> 32) {
> +		dev_err(&vp_dev->pci_dev->dev,
> +			"platform bug: legacy virtio-pci must not be used with RAM above 0x%llxGB\n",
> +			0x1ULL << (32 + PAGE_SHIFT - 30));
> +		err = -E2BIG;
> +		goto out_del_vq;
> +	}
> +
>  	/* activate the queue */
> -	iowrite32(virtqueue_get_desc_addr(vq) >> VIRTIO_PCI_QUEUE_ADDR_SHIFT,
> -		  vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN);
> +	iowrite32(q_pfn, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN);
>  
>  	vq->priv = (void __force *)vp_dev->ioaddr + VIRTIO_PCI_QUEUE_NOTIFY;
>  
> @@ -160,6 +169,7 @@ static struct virtqueue *setup_vq(struct virtio_pci_device *vp_dev,
>  
>  out_deactivate:
>  	iowrite32(0, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN);
> +out_del_vq:
>  	vring_del_virtqueue(vq);
>  	return ERR_PTR(err);
>  }
> -- 
> 2.7.4
Michael S. Tsirkin July 22, 2018, 3:55 p.m. UTC | #2
On Wed, Jul 18, 2018 at 10:18:44AM +0100, Suzuki K Poulose wrote:
> virtio-mmio with virtio-v1 uses a 32bit PFN for the queue.
> If the queue pfn is too large to fit in 32bits, which
> we could hit on arm64 systems with 52bit physical addresses
> (even with 64K page size), we simply miss out a proper link
> to the other side of the queue.
> 
> Add a check to validate the PFN, rather than silently breaking
> the devices.
> 
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: Jason Wang <jasowang@redhat.com>
> Cc: Marc Zyngier <marc.zyngier@arm.com>
> Cc: Christoffer Dall <cdall@kernel.org>
> Cc: Peter Maydell <peter.maydell@linaro.org>
> Cc: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>

Acked-by: Michael S. Tsirkin <mst@redhat.com>

I assume this will be merged through some other tree.

> ---
> Changes since v2:
>  - Change errno to -E2BIG
> ---
>  drivers/virtio/virtio_mmio.c | 20 ++++++++++++++++++--
>  1 file changed, 18 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
> index 67763d3..4cd9ea5 100644
> --- a/drivers/virtio/virtio_mmio.c
> +++ b/drivers/virtio/virtio_mmio.c
> @@ -397,9 +397,23 @@ static struct virtqueue *vm_setup_vq(struct virtio_device *vdev, unsigned index,
>  	/* Activate the queue */
>  	writel(virtqueue_get_vring_size(vq), vm_dev->base + VIRTIO_MMIO_QUEUE_NUM);
>  	if (vm_dev->version == 1) {
> +		u64 q_pfn = virtqueue_get_desc_addr(vq) >> PAGE_SHIFT;
> +
> +		/*
> +		 * virtio-mmio v1 uses a 32bit QUEUE PFN. If we have something
> +		 * that doesn't fit in 32bit, fail the setup rather than
> +		 * pretending to be successful.

I'd drop the "rather than pretending to be successful": if you fail,
you are not pretending to be successful. Please fix if you have to
respin anyway.

> +		 */
> +		if (q_pfn >> 32) {
> +			dev_err(&vdev->dev,
> +				"platform bug: legacy virtio-mmio must not be used with RAM above 0x%llxGB\n",
> +				0x1ULL << (32 + PAGE_SHIFT - 30));
> +			err = -E2BIG;
> +			goto error_bad_pfn;
> +		}
> +
>  		writel(PAGE_SIZE, vm_dev->base + VIRTIO_MMIO_QUEUE_ALIGN);
> -		writel(virtqueue_get_desc_addr(vq) >> PAGE_SHIFT,
> -				vm_dev->base + VIRTIO_MMIO_QUEUE_PFN);
> +		writel(q_pfn, vm_dev->base + VIRTIO_MMIO_QUEUE_PFN);
>  	} else {
>  		u64 addr;
>  
> @@ -430,6 +444,8 @@ static struct virtqueue *vm_setup_vq(struct virtio_device *vdev, unsigned index,
>  
>  	return vq;
>  
> +error_bad_pfn:
> +	vring_del_virtqueue(vq);
>  error_new_virtqueue:
>  	if (vm_dev->version == 1) {
>  		writel(0, vm_dev->base + VIRTIO_MMIO_QUEUE_PFN);
> -- 
> 2.7.4
Suzuki K Poulose July 23, 2018, 9:44 a.m. UTC | #3
On 07/22/2018 04:53 PM, Michael S. Tsirkin wrote:
> On Wed, Jul 18, 2018 at 10:18:45AM +0100, Suzuki K Poulose wrote:
>> Legacy PCI over virtio uses a 32bit PFN for the queue. If the
>> queue pfn is too large to fit in 32bits, which we could hit on
>> arm64 systems with 52bit physical addresses (even with 64K page
>> size), we simply miss out a proper link to the other side of
>> the queue.
>>
>> Add a check to validate the PFN, rather than silently breaking
>> the devices.
>>
>> Cc: "Michael S. Tsirkin" <mst@redhat.com>
>> Cc: Jason Wang <jasowang@redhat.com>
>> Cc: Marc Zyngier <marc.zyngier@arm.com>
>> Cc: Christoffer Dall <cdall@kernel.org>
>> Cc: Peter Maydell <peter.maydell@linaro.org>
>> Cc: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
>> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> 
> Acked-by: Michael S. Tsirkin <mst@redhat.com>


Michael,

Thanks.

> 
> I assume this will be merged through some other tree.
>


As such these two virtio patches do not have any code dependencies on
the rest of the series. So, if you could pick them up, that should be fine.
Otherwise, maybe Marc can push them with the rest of the series.

Marc,

Are you OK with that?

Suzuki
Marc Zyngier July 23, 2018, 12:54 p.m. UTC | #4
On 23/07/18 10:44, Suzuki K Poulose wrote:
> On 07/22/2018 04:53 PM, Michael S. Tsirkin wrote:
>> On Wed, Jul 18, 2018 at 10:18:45AM +0100, Suzuki K Poulose wrote:
>>> Legacy PCI over virtio uses a 32bit PFN for the queue. If the
>>> queue pfn is too large to fit in 32bits, which we could hit on
>>> arm64 systems with 52bit physical addresses (even with 64K page
>>> size), we simply miss out a proper link to the other side of
>>> the queue.
>>>
>>> Add a check to validate the PFN, rather than silently breaking
>>> the devices.
>>>
>>> Cc: "Michael S. Tsirkin" <mst@redhat.com>
>>> Cc: Jason Wang <jasowang@redhat.com>
>>> Cc: Marc Zyngier <marc.zyngier@arm.com>
>>> Cc: Christoffer Dall <cdall@kernel.org>
>>> Cc: Peter Maydell <peter.maydell@linaro.org>
>>> Cc: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
>>> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
>>
>> Acked-by: Michael S. Tsirkin <mst@redhat.com>
> 
> 
> Michael,
> 
> Thanks.
> 
>>
>> I assume this will be merged through some other tree.
>>
> 
> 
> As such these two virtio patches do not have any code dependencies on
> the rest of the series. So, if you could pick them up, that should be fine.
> Otherwise, maybe Marc can push them with the rest of the series.
> 
> Marc,
> 
> Are you OK with that?

Given that these two patches are completely independent, I think their
natural path should be the virtio tree. But if Michael doesn't want to
pick them, I'll do it as part of this series.

Thanks,

	M.
Michael S. Tsirkin July 23, 2018, 2:20 p.m. UTC | #5
On Mon, Jul 23, 2018 at 01:54:10PM +0100, Marc Zyngier wrote:
> On 23/07/18 10:44, Suzuki K Poulose wrote:
> > On 07/22/2018 04:53 PM, Michael S. Tsirkin wrote:
> >> On Wed, Jul 18, 2018 at 10:18:45AM +0100, Suzuki K Poulose wrote:
> >>> Legacy PCI over virtio uses a 32bit PFN for the queue. If the
> >>> queue pfn is too large to fit in 32bits, which we could hit on
> >>> arm64 systems with 52bit physical addresses (even with 64K page
> >>> size), we simply miss out a proper link to the other side of
> >>> the queue.
> >>>
> >>> Add a check to validate the PFN, rather than silently breaking
> >>> the devices.
> >>>
> >>> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> >>> Cc: Jason Wang <jasowang@redhat.com>
> >>> Cc: Marc Zyngier <marc.zyngier@arm.com>
> >>> Cc: Christoffer Dall <cdall@kernel.org>
> >>> Cc: Peter Maydell <peter.maydell@linaro.org>
> >>> Cc: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
> >>> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> >>
> >> Acked-by: Michael S. Tsirkin <mst@redhat.com>
> > 
> > 
> > Michael,
> > 
> > Thanks.
> > 
> >>
> >> I assume this will be merged through some other tree.
> >>
> > 
> > 
> > As such these two virtio patches do not have any code dependencies on
> > the rest of the series. So, if you could pick them up, that should be fine.
> > Otherwise, maybe Marc can push them with the rest of the series.
> > 
> > Marc,
> > 
> > Are you OK with that?
> 
> Given that these two patches are completely independent, I think their
> natural path should be the virtio tree. But if Michael doesn't want to
> pick them, I'll do it as part of this series.
> 
> Thanks,
> 
> 	M.

It's ok, I can pick them up.

> -- 
> Jazz is not dead. It just smells funny...
Christoffer Dall Aug. 30, 2018, 9:39 a.m. UTC | #6
On Wed, Jul 18, 2018 at 10:18:48AM +0100, Suzuki K Poulose wrote:
> We load the stage2 context of a guest for different operations,
> including running the guest and tlb maintenance on behalf of the
> guest. As of now only the vttbr is private to the guest, but this
> is about to change with IPA per VM. Add a helper to load the stage2
> configuration for a VM, which could do the right thing with the
> future changes.
> 
> Cc: Christoffer Dall <cdall@kernel.org>
> Cc: Marc Zyngier <marc.zyngier@arm.com>
> Reviewed-by: Eric Auger <eric.auger@redhat.com>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> ---
> Changes since v2:
>  - New patch
> ---
>  arch/arm64/include/asm/kvm_hyp.h | 6 ++++++
>  arch/arm64/kvm/hyp/switch.c      | 2 +-
>  arch/arm64/kvm/hyp/tlb.c         | 4 ++--
>  3 files changed, 9 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
> index 384c343..82f9994 100644
> --- a/arch/arm64/include/asm/kvm_hyp.h
> +++ b/arch/arm64/include/asm/kvm_hyp.h
> @@ -155,5 +155,11 @@ void deactivate_traps_vhe_put(void);
>  u64 __guest_enter(struct kvm_vcpu *vcpu, struct kvm_cpu_context *host_ctxt);
>  void __noreturn __hyp_do_panic(unsigned long, ...);
>  
> +/* Must be called from hyp code running at EL2 */

More important than having to run this at EL2 is that the caller must
have gone through the proper sequence of update_vttbr() and disabling
interrupts to avoid using a stale VMID.

> +static __always_inline void __hyp_text __load_guest_stage2(struct kvm *kvm)
> +{
> +	write_sysreg(kvm->arch.vttbr, vttbr_el2);
> +}
> +
>  #endif /* __ARM64_KVM_HYP_H__ */
>  
> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> index d496ef5..355fb25 100644
> --- a/arch/arm64/kvm/hyp/switch.c
> +++ b/arch/arm64/kvm/hyp/switch.c
> @@ -195,7 +195,7 @@ void deactivate_traps_vhe_put(void)
>  
>  static void __hyp_text __activate_vm(struct kvm *kvm)
>  {
> -	write_sysreg(kvm->arch.vttbr, vttbr_el2);
> +	__load_guest_stage2(kvm);
>  }
>  
>  static void __hyp_text __deactivate_vm(struct kvm_vcpu *vcpu)
> diff --git a/arch/arm64/kvm/hyp/tlb.c b/arch/arm64/kvm/hyp/tlb.c
> index 131c777..4dbd9c6 100644
> --- a/arch/arm64/kvm/hyp/tlb.c
> +++ b/arch/arm64/kvm/hyp/tlb.c
> @@ -30,7 +30,7 @@ static void __hyp_text __tlb_switch_to_guest_vhe(struct kvm *kvm)
>  	 * bits. Changing E2H is impossible (goodbye TTBR1_EL2), so
>  	 * let's flip TGE before executing the TLB operation.
>  	 */
> -	write_sysreg(kvm->arch.vttbr, vttbr_el2);
> +	__load_guest_stage2(kvm);
>  	val = read_sysreg(hcr_el2);
>  	val &= ~HCR_TGE;
>  	write_sysreg(val, hcr_el2);
> @@ -39,7 +39,7 @@ static void __hyp_text __tlb_switch_to_guest_vhe(struct kvm *kvm)
>  
>  static void __hyp_text __tlb_switch_to_guest_nvhe(struct kvm *kvm)
>  {
> -	write_sysreg(kvm->arch.vttbr, vttbr_el2);
> +	__load_guest_stage2(kvm);
>  	isb();
>  }
>  
> -- 
> 2.7.4
> 

Thanks,
-Christoffer
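
For context, a rough sketch of the sequence in question, based on the
vcpu run loop in virt/kvm/arm/arm.c as of 4.18 (simplified, with error
handling and unrelated steps elided):

	/* Simplified from kvm_arch_vcpu_ioctl_run() */
	update_vttbr(vcpu->kvm);	/* pick a fresh VMID if the generation rolled over */

	local_irq_disable();		/* a VMID rollover cannot overtake us past this point */

	/* ... pending signal/request checks ... */

	ret = kvm_call_hyp(__kvm_vcpu_run, vcpu); /* hyp code loads the stage2 context */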
Christoffer Dall Aug. 30, 2018, 9:42 a.m. UTC | #7
On Wed, Jul 18, 2018 at 10:18:49AM +0100, Suzuki K Poulose wrote:
> On arm64, ID_AA64MMFR0_EL1.PARange encodes the maximum Physical
> Address range supported by the CPU. Add a helper to decode this
> to actual physical shift. If we hit an unallocated value, return
> the maximum range supported by the kernel.
> This will be used by KVM to set the VTCR_EL2.T0SZ, as it
> is about to move its place. Having this helper keeps the code
> movement cleaner.
> 
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Marc Zyngier <marc.zyngier@arm.com>
> Cc: James Morse <james.morse@arm.com>
> Cc: Christoffer Dall <cdall@kernel.org>
> Reviewed-by: Eric Auger <eric.auger@redhat.com>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> ---
> Changes since V2:
>  - Split the patch
>  - Limit the physical shift only for values unrecognized.
> ---
>  arch/arm64/include/asm/cpufeature.h | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
> index 1717ba1..855cf0e 100644
> --- a/arch/arm64/include/asm/cpufeature.h
> +++ b/arch/arm64/include/asm/cpufeature.h
> @@ -530,6 +530,19 @@ void arm64_set_ssbd_mitigation(bool state);
>  static inline void arm64_set_ssbd_mitigation(bool state) {}
>  #endif
>  
> +static inline u32 id_aa64mmfr0_parange_to_phys_shift(int parange)
> +{
> +	switch (parange) {
> +	case 0: return 32;
> +	case 1: return 36;
> +	case 2: return 40;
> +	case 3: return 42;
> +	case 4: return 44;
> +	case 5: return 48;
> +	case 6: return 52;
> +	default: return CONFIG_ARM64_PA_BITS;

I don't understand this case. Shouldn't this include at least a WARN()?

Thanks,
    Christoffer

> +	}
> +}
>  #endif /* __ASSEMBLY__ */
>  
>  #endif
> -- 
> 2.7.4
> 
> _______________________________________________
> kvmarm mailing list
> kvmarm@lists.cs.columbia.edu
> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Christoffer Dall Aug. 30, 2018, 10:02 a.m. UTC | #8
On Wed, Jul 18, 2018 at 10:18:51AM +0100, Suzuki K Poulose wrote:
> Add support for setting the VTCR_EL2 per VM, rather than hard
> coding a value at boot time per CPU. This would allow us to tune
> the stage2 page table parameters per VM in the later changes.
> 
> We compute the VTCR fields based on the system wide sanitised
> feature registers, except for the hardware management of Access
> Flags (VTCR_EL2.HA). It is fine to run a system with a mix of
> CPUs that may or may not update the page table Access Flags.
> Since the bit is RES0 on CPUs that don't support it, the bit
> should be ignored on them.

Acked-by: Christoffer Dall <christoffer.dall@arm.com>

> 
> Suggested-by: Marc Zyngier <marc.zyngier@arm.com>
> Cc: Christoffer Dall <cdall@kernel.org>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> ---
>  arch/arm/include/asm/kvm_host.h   |  5 +++
>  arch/arm64/include/asm/kvm_arm.h  |  3 +-
>  arch/arm64/include/asm/kvm_asm.h  |  2 --
>  arch/arm64/include/asm/kvm_host.h | 15 +++++---
>  arch/arm64/include/asm/kvm_hyp.h  |  1 +
>  arch/arm64/kvm/guest.c            | 38 ++++++++++++++++++++-
>  arch/arm64/kvm/hyp/Makefile       |  1 -
>  arch/arm64/kvm/hyp/s2-setup.c     | 72 ---------------------------------------
>  virt/kvm/arm/arm.c                |  4 +++
>  9 files changed, 58 insertions(+), 83 deletions(-)
>  delete mode 100644 arch/arm64/kvm/hyp/s2-setup.c
> 
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index 1f1fe410..86f43ab 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -350,4 +350,9 @@ static inline void kvm_vcpu_put_sysregs(struct kvm_vcpu *vcpu) {}
>  struct kvm *kvm_arch_alloc_vm(void);
>  void kvm_arch_free_vm(struct kvm *kvm);
>  
> +static inline int kvm_arm_config_vm(struct kvm *kvm)
> +{
> +	return 0;
> +}
> +
>  #endif /* __ARM_KVM_HOST_H__ */
> diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
> index 3dffd38..87d2db6 100644
> --- a/arch/arm64/include/asm/kvm_arm.h
> +++ b/arch/arm64/include/asm/kvm_arm.h
> @@ -134,8 +134,7 @@
>   * 40 bits wide (T0SZ = 24).  Systems with a PARange smaller than 40 bits are
>   * not known to exist and will break with this configuration.
>   *
> - * VTCR_EL2.PS is extracted from ID_AA64MMFR0_EL1.PARange at boot time
> - * (see hyp-init.S).
> + * The VTCR_EL2 is configured per VM and is initialised in kvm_arm_config_vm().
>   *
>   * Note that when using 4K pages, we concatenate two first level page tables
>   * together. With 16K pages, we concatenate 16 first level page tables.
> diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
> index 102b5a5..0b53c72 100644
> --- a/arch/arm64/include/asm/kvm_asm.h
> +++ b/arch/arm64/include/asm/kvm_asm.h
> @@ -72,8 +72,6 @@ extern void __vgic_v3_init_lrs(void);
>  
>  extern u32 __kvm_get_mdcr_el2(void);
>  
> -extern u32 __init_stage2_translation(void);
> -
>  /* Home-grown __this_cpu_{ptr,read} variants that always work at HYP */
>  #define __hyp_this_cpu_ptr(sym)						\
>  	({								\
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index fe8777b..b1ffaf3 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -61,12 +61,13 @@ struct kvm_arch {
>  	u64    vmid_gen;
>  	u32    vmid;
>  
> -	/* 1-level 2nd stage table and lock */
> -	spinlock_t pgd_lock;
> +	/* stage2 entry level table */
>  	pgd_t *pgd;
>  
>  	/* VTTBR value associated with above pgd and vmid */
>  	u64    vttbr;
> +	/* VTCR_EL2 value for this VM */
> +	u64    vtcr;
>  
>  	/* The last vcpu id that ran on each physical CPU */
>  	int __percpu *last_vcpu_ran;
> @@ -442,10 +443,12 @@ int kvm_arm_vcpu_arch_has_attr(struct kvm_vcpu *vcpu,
>  
>  static inline void __cpu_init_stage2(void)
>  {
> -	u32 parange = kvm_call_hyp(__init_stage2_translation);
> +	u32 ps;
>  
> -	WARN_ONCE(parange < 40,
> -		  "PARange is %d bits, unsupported configuration!", parange);
> +	/* Sanity check for minimum IPA size support */
> +	ps = id_aa64mmfr0_parange_to_phys_shift(read_sysreg(id_aa64mmfr0_el1) & 0x7);
> +	WARN_ONCE(ps < 40,
> +		  "PARange is %d bits, unsupported configuration!", ps);
>  }
>  
>  /* Guest/host FPSIMD coordination helpers */
> @@ -513,4 +516,6 @@ void kvm_vcpu_put_sysregs(struct kvm_vcpu *vcpu);
>  struct kvm *kvm_arch_alloc_vm(void);
>  void kvm_arch_free_vm(struct kvm *kvm);
>  
> +int kvm_arm_config_vm(struct kvm *kvm);
> +
>  #endif /* __ARM64_KVM_HOST_H__ */
> diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
> index 82f9994..6f47cc7 100644
> --- a/arch/arm64/include/asm/kvm_hyp.h
> +++ b/arch/arm64/include/asm/kvm_hyp.h
> @@ -158,6 +158,7 @@ void __noreturn __hyp_do_panic(unsigned long, ...);
>  /* Must be called from hyp code running at EL2 */
>  static __always_inline void __hyp_text __load_guest_stage2(struct kvm *kvm)
>  {
> +	write_sysreg(kvm->arch.vtcr, vtcr_el2);
>  	write_sysreg(kvm->arch.vttbr, vttbr_el2);
>  }
>  
> diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
> index 56a0260..d24ee23 100644
> --- a/arch/arm64/kvm/guest.c
> +++ b/arch/arm64/kvm/guest.c
> @@ -26,11 +26,13 @@
>  #include <linux/vmalloc.h>
>  #include <linux/fs.h>
>  #include <kvm/arm_psci.h>
> -#include <asm/cputype.h>
>  #include <linux/uaccess.h>
> +#include <asm/cpufeature.h>
> +#include <asm/cputype.h>
>  #include <asm/kvm.h>
>  #include <asm/kvm_emulate.h>
>  #include <asm/kvm_coproc.h>
> +#include <asm/kvm_mmu.h>
>  
>  #include "trace.h"
>  
> @@ -458,3 +460,37 @@ int kvm_arm_vcpu_arch_has_attr(struct kvm_vcpu *vcpu,
>  
>  	return ret;
>  }
> +
> +/*
> + * Configure the VTCR_EL2 for this VM. The VTCR value is common
> + * across all the physical CPUs on the system. We use system wide
> + * sanitised values to fill in different fields, except for Hardware
> + * Management of Access Flags. HA Flag is set unconditionally on
> + * all CPUs, as it is safe to run with or without the feature and
> + * the bit is RES0 on CPUs that don't support it.
> + */
> +int kvm_arm_config_vm(struct kvm *kvm)
> +{
> +	u64 vtcr = VTCR_EL2_FLAGS;
> +	u64 parange;
> +
> +
> +	parange = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1) & 7;
> +	if (parange > ID_AA64MMFR0_PARANGE_MAX)
> +		parange = ID_AA64MMFR0_PARANGE_MAX;
> +	vtcr |= parange << VTCR_EL2_PS_SHIFT;
> +
> +	/*
> +	 * Enable the Hardware Access Flag management, unconditionally
> +	 * on all CPUs. The features is RES0 on CPUs without the support
> +	 * and must be ignored by the CPUs.
> +	 */
> +	vtcr |= VTCR_EL2_HA;
> +
> +	/* Set the vmid bits */
> +	vtcr |= (kvm_get_vmid_bits() == 16) ?
> +		VTCR_EL2_VS_16BIT :
> +		VTCR_EL2_VS_8BIT;
> +	kvm->arch.vtcr = vtcr;
> +	return 0;
> +}
> diff --git a/arch/arm64/kvm/hyp/Makefile b/arch/arm64/kvm/hyp/Makefile
> index 4313f74..1b80a9a 100644
> --- a/arch/arm64/kvm/hyp/Makefile
> +++ b/arch/arm64/kvm/hyp/Makefile
> @@ -18,7 +18,6 @@ obj-$(CONFIG_KVM_ARM_HOST) += switch.o
>  obj-$(CONFIG_KVM_ARM_HOST) += fpsimd.o
>  obj-$(CONFIG_KVM_ARM_HOST) += tlb.o
>  obj-$(CONFIG_KVM_ARM_HOST) += hyp-entry.o
> -obj-$(CONFIG_KVM_ARM_HOST) += s2-setup.o
>  
>  # KVM code is run at a different exception code with a different map, so
>  # compiler instrumentation that inserts callbacks or checks into the code may
> diff --git a/arch/arm64/kvm/hyp/s2-setup.c b/arch/arm64/kvm/hyp/s2-setup.c
> deleted file mode 100644
> index e1ca672..0000000
> --- a/arch/arm64/kvm/hyp/s2-setup.c
> +++ /dev/null
> @@ -1,72 +0,0 @@
> -/*
> - * Copyright (C) 2016 - ARM Ltd
> - * Author: Marc Zyngier <marc.zyngier@arm.com>
> - *
> - * This program is free software; you can redistribute it and/or modify
> - * it under the terms of the GNU General Public License version 2 as
> - * published by the Free Software Foundation.
> - *
> - * This program is distributed in the hope that it will be useful,
> - * but WITHOUT ANY WARRANTY; without even the implied warranty of
> - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> - * GNU General Public License for more details.
> - *
> - * You should have received a copy of the GNU General Public License
> - * along with this program.  If not, see <http://www.gnu.org/licenses/>.
> - */
> -
> -#include <linux/types.h>
> -#include <asm/kvm_arm.h>
> -#include <asm/kvm_asm.h>
> -#include <asm/kvm_hyp.h>
> -#include <asm/cpufeature.h>
> -
> -u32 __hyp_text __init_stage2_translation(void)
> -{
> -	u64 val = VTCR_EL2_FLAGS;
> -	u64 parange;
> -	u32 phys_shift;
> -	u64 tmp;
> -
> -	/*
> -	 * Read the PARange bits from ID_AA64MMFR0_EL1 and set the PS
> -	 * bits in VTCR_EL2. Amusingly, the PARange is 4 bits, but the
> -	 * allocated values are limited to 3bits.
> -	 */
> -	parange = read_sysreg(id_aa64mmfr0_el1) & 7;
> -	if (parange > ID_AA64MMFR0_PARANGE_MAX)
> -		parange = ID_AA64MMFR0_PARANGE_MAX;
> -	val |= parange << VTCR_EL2_PS_SHIFT;
> -
> -	/* Compute the actual PARange... */
> -	phys_shift = id_aa64mmfr0_parange_to_phys_shift(parange);
> -
> -	/*
> -	 * ... and clamp it to 40 bits, unless we have some braindead
> -	 * HW that implements less than that. In all cases, we'll
> -	 * return that value for the rest of the kernel to decide what
> -	 * to do.
> -	 */
> -	val |= VTCR_EL2_T0SZ(phys_shift > 40 ? 40 : phys_shift);
> -
> -	/*
> -	 * Check the availability of Hardware Access Flag / Dirty Bit
> -	 * Management in ID_AA64MMFR1_EL1 and enable the feature in VTCR_EL2.
> -	 */
> -	tmp = (read_sysreg(id_aa64mmfr1_el1) >> ID_AA64MMFR1_HADBS_SHIFT) & 0xf;
> -	if (tmp)
> -		val |= VTCR_EL2_HA;
> -
> -	/*
> -	 * Read the VMIDBits bits from ID_AA64MMFR1_EL1 and set the VS
> -	 * bit in VTCR_EL2.
> -	 */
> -	tmp = (read_sysreg(id_aa64mmfr1_el1) >> ID_AA64MMFR1_VMIDBITS_SHIFT) & 0xf;
> -	val |= (tmp == ID_AA64MMFR1_VMIDBITS_16) ?
> -			VTCR_EL2_VS_16BIT :
> -			VTCR_EL2_VS_8BIT;
> -
> -	write_sysreg(val, vtcr_el2);
> -
> -	return phys_shift;
> -}
> diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
> index 04e554c..37e46e4 100644
> --- a/virt/kvm/arm/arm.c
> +++ b/virt/kvm/arm/arm.c
> @@ -122,6 +122,10 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
>  	if (type)
>  		return -EINVAL;
>  
> +	ret = kvm_arm_config_vm(kvm);
> +	if (ret)
> +		return ret;
> +
>  	kvm->arch.last_vcpu_ran = alloc_percpu(typeof(*kvm->arch.last_vcpu_ran));
>  	if (!kvm->arch.last_vcpu_ran)
>  		return -ENOMEM;
> -- 
> 2.7.4
> 
> _______________________________________________
> kvmarm mailing list
> kvmarm@lists.cs.columbia.edu
> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Suzuki K Poulose Sept. 3, 2018, 10:03 a.m. UTC | #9
On 30/08/18 10:39, Christoffer Dall wrote:
> On Wed, Jul 18, 2018 at 10:18:48AM +0100, Suzuki K Poulose wrote:
>> We load the stage2 context of a guest for different operations,
>> including running the guest and tlb maintenance on behalf of the
>> guest. As of now only the vttbr is private to the guest, but this
>> is about to change with IPA per VM. Add a helper to load the stage2
>> configuration for a VM, which could do the right thing with the
>> future changes.
>>
>> Cc: Christoffer Dall <cdall@kernel.org>
>> Cc: Marc Zyngier <marc.zyngier@arm.com>
>> Reviewed-by: Eric Auger <eric.auger@redhat.com>
>> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
>> ---
>> Changes since v2:
>>   - New patch
>> ---
>>   arch/arm64/include/asm/kvm_hyp.h | 6 ++++++
>>   arch/arm64/kvm/hyp/switch.c      | 2 +-
>>   arch/arm64/kvm/hyp/tlb.c         | 4 ++--
>>   3 files changed, 9 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
>> index 384c343..82f9994 100644
>> --- a/arch/arm64/include/asm/kvm_hyp.h
>> +++ b/arch/arm64/include/asm/kvm_hyp.h
>> @@ -155,5 +155,11 @@ void deactivate_traps_vhe_put(void);
>>   u64 __guest_enter(struct kvm_vcpu *vcpu, struct kvm_cpu_context *host_ctxt);
>>   void __noreturn __hyp_do_panic(unsigned long, ...);
>>   
>> +/* Must be called from hyp code running at EL2 */
> 
> More important than having to run this at EL2 is that the caller must
> have gone through the proper sequence of update_vttbr() and disabling
> interrupts to avoid using a stale VMID.

Right, I will update the comment.

Cheers
Suzuki
Suzuki K Poulose Sept. 3, 2018, 10:06 a.m. UTC | #10
On 30/08/18 10:42, Christoffer Dall wrote:
> On Wed, Jul 18, 2018 at 10:18:49AM +0100, Suzuki K Poulose wrote:
>> On arm64, ID_AA64MMFR0_EL1.PARange encodes the maximum Physical
>> Address range supported by the CPU. Add a helper to decode this
>> to actual physical shift. If we hit an unallocated value, return
>> the maximum range supported by the kernel.
>> This will be used by KVM to set the VTCR_EL2.T0SZ, as it
>> is about to move its place. Having this helper keeps the code
>> movement cleaner.
>>
>> Cc: Catalin Marinas <catalin.marinas@arm.com>
>> Cc: Marc Zyngier <marc.zyngier@arm.com>
>> Cc: James Morse <james.morse@arm.com>
>> Cc: Christoffer Dall <cdall@kernel.org>
>> Reviewed-by: Eric Auger <eric.auger@redhat.com>
>> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
>> ---
>> Changes since V2:
>>   - Split the patch
>>   - Limit the physical shift only for values unrecognized.
>> ---
>>   arch/arm64/include/asm/cpufeature.h | 13 +++++++++++++
>>   1 file changed, 13 insertions(+)
>>
>> diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
>> index 1717ba1..855cf0e 100644
>> --- a/arch/arm64/include/asm/cpufeature.h
>> +++ b/arch/arm64/include/asm/cpufeature.h
>> @@ -530,6 +530,19 @@ void arm64_set_ssbd_mitigation(bool state);
>>   static inline void arm64_set_ssbd_mitigation(bool state) {}
>>   #endif
>>   
>> +static inline u32 id_aa64mmfr0_parange_to_phys_shift(int parange)
>> +{
>> +	switch (parange) {
>> +	case 0: return 32;
>> +	case 1: return 36;
>> +	case 2: return 40;
>> +	case 3: return 42;
>> +	case 4: return 44;
>> +	case 5: return 48;
>> +	case 6: return 52;
>> +	default: return CONFIG_ARM64_PA_BITS;
> 
> >I don't understand this case. Shouldn't this include at least a WARN()?

If a new value gets assigned in the future, an older kernel might not
be aware of it. As per the Arm ARM ID feature value rules, we are
guaranteed that a higher field value indicates a larger PA range.
So, WARN() may not be the right choice. Hence, we restrict it to the
maximum value supported by the kernel.

Suzuki
Christoffer Dall Sept. 3, 2018, 11:13 a.m. UTC | #11
On Mon, Sep 03, 2018 at 11:06:44AM +0100, Suzuki K Poulose wrote:
> On 30/08/18 10:42, Christoffer Dall wrote:
> >On Wed, Jul 18, 2018 at 10:18:49AM +0100, Suzuki K Poulose wrote:
> >>On arm64, ID_AA64MMFR0_EL1.PARange encodes the maximum Physical
> >>Address range supported by the CPU. Add a helper to decode this
> >>to actual physical shift. If we hit an unallocated value, return
> >>the maximum range supported by the kernel.
> >>This will be used by KVM to set the VTCR_EL2.T0SZ, as it
> >>is about to move its place. Having this helper keeps the code
> >>movement cleaner.
> >>
> >>Cc: Catalin Marinas <catalin.marinas@arm.com>
> >>Cc: Marc Zyngier <marc.zyngier@arm.com>
> >>Cc: James Morse <james.morse@arm.com>
> >>Cc: Christoffer Dall <cdall@kernel.org>
> >>Reviewed-by: Eric Auger <eric.auger@redhat.com>
> >>Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> >>---
> >>Changes since V2:
> >>  - Split the patch
> >>  - Limit the physical shift only for values unrecognized.
> >>---
> >>  arch/arm64/include/asm/cpufeature.h | 13 +++++++++++++
> >>  1 file changed, 13 insertions(+)
> >>
> >>diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
> >>index 1717ba1..855cf0e 100644
> >>--- a/arch/arm64/include/asm/cpufeature.h
> >>+++ b/arch/arm64/include/asm/cpufeature.h
> >>@@ -530,6 +530,19 @@ void arm64_set_ssbd_mitigation(bool state);
> >>  static inline void arm64_set_ssbd_mitigation(bool state) {}
> >>  #endif
> >>+static inline u32 id_aa64mmfr0_parange_to_phys_shift(int parange)
> >>+{
> >>+	switch (parange) {
> >>+	case 0: return 32;
> >>+	case 1: return 36;
> >>+	case 2: return 40;
> >>+	case 3: return 42;
> >>+	case 4: return 44;
> >>+	case 5: return 48;
> >>+	case 6: return 52;
> >>+	default: return CONFIG_ARM64_PA_BITS;
> >
> >I don't understand this case. Shouldn't this include at least a WARN()?
> 
> If a new value gets assigned in the future, an older kernel might not
> be aware of it. As per the Arm ARM ID feature value rules, we are
> guaranteed that a higher field value indicates a larger PA range.
> So, WARN() may not be the right choice. Hence, we restrict it to the
> maximum value supported by the kernel.
> 

ok, fair enough.

Thanks,

    Christoffer