Message ID: 20240123181753.413961-1-eric.auger@redhat.com
Series: VIRTIO-IOMMU: Introduce an aw-bits option
Hi Eric,

On Tue, Jan 23, 2024 at 07:15:54PM +0100, Eric Auger wrote:
> In [1] and [2] we attempted to fix a case where a VFIO-PCI device
> protected with a virtio-iommu is assigned to an x86 guest. On x86
> the physical IOMMU may have an address width (gaw) of 39 or 48 bits
> whereas the virtio-iommu exposes a 64b input address space by default.
> Hence the guest may try to use the full 64b space and DMA MAP
> failures may be encountered. To work around this issue we endeavoured
> to pass the usable host IOVA regions (excluding the out-of-range space)
> from VFIO to the virtio-iommu device, so that the virtio-iommu driver
> can query them during the probe request and let the guest iommu
> kernel subsystem carve them out.
>
> However, if there are several devices in the same iommu group,
> only the reserved regions of the first one are taken into
> account by the iommu subsystem of the guest. This generally
> works on bare metal because devices are not going to
> expose different reserved regions. In our case, however, this
> may prevent the host iommu geometry from being taken into account.
>
> So the simplest solution to this problem looks to be introducing an
> input address width option, aw-bits, matching what is
> done on the intel-iommu. By default, from now on it is set
> to 39 bits with pc_q35 and 64b with arm virt.

Doesn't Arm have the same problem? The TTB0 page tables limit what can be
mapped to 48-bit, or 52-bit when SMMU_IDR5.VAX==1 and the granule is 64kB.
A Linux host driver could configure smaller VA sizes:
* SMMUv2 limits the VA to SMMU_IDR2.UBS (upstream bus size), which
  can go as low as 32-bit (I'm assuming we don't care about 32-bit hosts).
* SMMUv3 currently limits the VA to CONFIG_ARM64_VA_BITS, which
  could be as low as 36 bits (but realistically 39, since 36 depends on
  16kB pages and CONFIG_EXPERT).

But 64-bit definitely can't work for VFIO, and I suppose it isn't useful
for virtual devices, so maybe 39 is also a reasonable default on Arm.

Thanks,
Jean

> This replaces the
> previous default value of 64b, so we need to introduce a compat
> for pc_q35 machines older than 9.0 to behave similarly.
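For concreteness, here is a minimal sketch of the clamping such an aw-bits option implies. The helper name is made up for illustration, but the input_range field is how the virtio-iommu config space advertises the usable input address range to the guest driver:

    #include <stdint.h>

    /* Hedged sketch, not the actual patch: the last valid input address
     * a virtio-iommu would advertise (via the input_range.end field of
     * its config space) for a given aw-bits property value. */
    static inline uint64_t aw_bits_to_input_end(unsigned aw_bits)
    {
        /* Shifting a 64-bit value by 64 is undefined in C, so handle
         * the "unrestricted" 64b case explicitly. */
        return aw_bits >= 64 ? UINT64_MAX : (1ULL << aw_bits) - 1;
    }

With aw-bits=39 this yields 0x7fffffffff, i.e. a 512GB input space, instead of the current UINT64_MAX default.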
Hi Jean-Philippe,

On 1/29/24 13:23, Jean-Philippe Brucker wrote:
> Hi Eric,
>
> On Tue, Jan 23, 2024 at 07:15:54PM +0100, Eric Auger wrote:
>> In [1] and [2] we attempted to fix a case where a VFIO-PCI device
>> protected with a virtio-iommu is assigned to an x86 guest. On x86
>> the physical IOMMU may have an address width (gaw) of 39 or 48 bits
>> whereas the virtio-iommu exposes a 64b input address space by default.
>> Hence the guest may try to use the full 64b space and DMA MAP
>> failures may be encountered. To work around this issue we endeavoured
>> to pass the usable host IOVA regions (excluding the out-of-range space)
>> from VFIO to the virtio-iommu device, so that the virtio-iommu driver
>> can query them during the probe request and let the guest iommu
>> kernel subsystem carve them out.
>>
>> However, if there are several devices in the same iommu group,
>> only the reserved regions of the first one are taken into
>> account by the iommu subsystem of the guest. This generally
>> works on bare metal because devices are not going to
>> expose different reserved regions. In our case, however, this
>> may prevent the host iommu geometry from being taken into account.
>>
>> So the simplest solution to this problem looks to be introducing an
>> input address width option, aw-bits, matching what is
>> done on the intel-iommu. By default, from now on it is set
>> to 39 bits with pc_q35 and 64b with arm virt.
> Doesn't Arm have the same problem? The TTB0 page tables limit what can be
> mapped to 48-bit, or 52-bit when SMMU_IDR5.VAX==1 and the granule is 64kB.
> A Linux host driver could configure smaller VA sizes:
> * SMMUv2 limits the VA to SMMU_IDR2.UBS (upstream bus size), which
>   can go as low as 32-bit (I'm assuming we don't care about 32-bit hosts).
Yes, I think we can ignore that use case.
> * SMMUv3 currently limits the VA to CONFIG_ARM64_VA_BITS, which
>   could be as low as 36 bits (but realistically 39, since 36 depends on
>   16kB pages and CONFIG_EXPERT).
Further reading "3.4.1 Input address size and Virtual Address size", it
indeed looks like SMMU_IDR5.VAX gives the physical SMMU's actual
implementation maximum (which matches the intel-iommu gaw). I missed
that. Now I am unsure whether we should limit the VAS to 39 to
accommodate the worst-case host SW configuration, or use 48 instead.
If we set such a low value as 39b, won't it prevent some guests from
working properly?

Thanks

Eric
>
> But 64-bit definitely can't work for VFIO, and I suppose it isn't useful
> for virtual devices, so maybe 39 is also a reasonable default on Arm.
>
> Thanks,
> Jean
>
>> This replaces the
>> previous default value of 64b, so we need to introduce a compat
>> for pc_q35 machines older than 9.0 to behave similarly.
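The two limits being weighed here can be made explicit with a hedged sketch (names are illustrative, not the Linux driver's): the usable input size on an SMMUv3 host is the minimum of what the hardware reports and what the host kernel's stage-1 tables can cover.

    /* Hedged sketch of the host-side limit under discussion: the
     * hardware allows 48-bit input (52 with SMMU_IDR5.VAX==1 and 64kB
     * granules), but a Linux SMMUv3 host further caps it at its own
     * CONFIG_ARM64_VA_BITS.  Names are illustrative. */
    static unsigned host_smmu_usable_input_bits(int vax, int granule_64k,
                                                unsigned va_bits)
    {
        unsigned hw_max = (vax && granule_64k) ? 52 : 48;
        return hw_max < va_bits ? hw_max : va_bits;
    }

With VA_BITS=39 this returns 39, which is what motivates the conservative default; with the common VA_BITS=48 configuration it returns 48.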
On Mon, Jan 29, 2024 at 03:07:41PM +0100, Eric Auger wrote:
> Hi Jean-Philippe,
>
> On 1/29/24 13:23, Jean-Philippe Brucker wrote:
> > Hi Eric,
> >
> > On Tue, Jan 23, 2024 at 07:15:54PM +0100, Eric Auger wrote:
> >> In [1] and [2] we attempted to fix a case where a VFIO-PCI device
> >> protected with a virtio-iommu is assigned to an x86 guest. On x86
> >> the physical IOMMU may have an address width (gaw) of 39 or 48 bits
> >> whereas the virtio-iommu exposes a 64b input address space by default.
> >> Hence the guest may try to use the full 64b space and DMA MAP
> >> failures may be encountered. To work around this issue we endeavoured
> >> to pass the usable host IOVA regions (excluding the out-of-range space)
> >> from VFIO to the virtio-iommu device, so that the virtio-iommu driver
> >> can query them during the probe request and let the guest iommu
> >> kernel subsystem carve them out.
> >>
> >> However, if there are several devices in the same iommu group,
> >> only the reserved regions of the first one are taken into
> >> account by the iommu subsystem of the guest. This generally
> >> works on bare metal because devices are not going to
> >> expose different reserved regions. In our case, however, this
> >> may prevent the host iommu geometry from being taken into account.
> >>
> >> So the simplest solution to this problem looks to be introducing an
> >> input address width option, aw-bits, matching what is
> >> done on the intel-iommu. By default, from now on it is set
> >> to 39 bits with pc_q35 and 64b with arm virt.
> > Doesn't Arm have the same problem? The TTB0 page tables limit what can be
> > mapped to 48-bit, or 52-bit when SMMU_IDR5.VAX==1 and the granule is 64kB.
> > A Linux host driver could configure smaller VA sizes:
> > * SMMUv2 limits the VA to SMMU_IDR2.UBS (upstream bus size), which
> >   can go as low as 32-bit (I'm assuming we don't care about 32-bit hosts).
> Yes, I think we can ignore that use case.
> > * SMMUv3 currently limits the VA to CONFIG_ARM64_VA_BITS, which
> >   could be as low as 36 bits (but realistically 39, since 36 depends on
> >   16kB pages and CONFIG_EXPERT).
> Further reading "3.4.1 Input address size and Virtual Address size", it
> indeed looks like SMMU_IDR5.VAX gives the physical SMMU's actual
> implementation maximum (which matches the intel-iommu gaw). I missed
> that. Now I am unsure whether we should limit the VAS to 39 to
> accommodate the worst-case host SW configuration, or use 48 instead.

I don't know what's best either. 48 should be fine if hosts normally
enable VA_BITS_48 (I see Debian has it [1]; not sure how to check the
others).

[1] https://salsa.debian.org/kernel-team/linux/-/blob/master/debian/config/arm64/config?ref_type=heads#L18

> If we set such a low value as 39b, won't it prevent some guests from
> working properly?

It's not that low, since it gives each endpoint a private 512GB address
space, but yes, there might be special cases that reach the limit. Maybe
assign a multi-queue NIC to a 256-vCPU guest: if you want per-vCPU DMA
pools, then with a 39-bit address space you only get 2GB per vCPU. With
48-bit you get 1TB, which should be plenty.

52-bit private IOVA space doesn't seem useful; I doubt we'll ever need
to support that on the MAP/UNMAP interface.

So I guess 48-bit can be the default, and users with special setups can
override aw-bits.

Thanks,
Jean
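The arithmetic behind those figures, as a self-contained back-of-envelope check:

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        /* Per-vCPU share of the private IOVA space for a 256-vCPU
         * guest, at a 39-bit vs a 48-bit input address width. */
        const uint64_t vcpus = 256;
        const unsigned aw[] = { 39, 48 };

        for (int i = 0; i < 2; i++) {
            uint64_t total_gib = (1ULL << aw[i]) >> 30;
            printf("aw=%u: %llu GiB total, %llu GiB per vCPU\n", aw[i],
                   (unsigned long long)total_gib,
                   (unsigned long long)(total_gib / vcpus));
        }
        return 0;
    }

This prints 512 GiB total / 2 GiB per vCPU for 39 bits, and 262144 GiB total / 1024 GiB per vCPU for 48 bits, matching the numbers above.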
Hi Jean,

On 1/29/24 18:42, Jean-Philippe Brucker wrote:
> On Mon, Jan 29, 2024 at 03:07:41PM +0100, Eric Auger wrote:
>> Hi Jean-Philippe,
>>
>> On 1/29/24 13:23, Jean-Philippe Brucker wrote:
>>> Hi Eric,
>>>
>>> On Tue, Jan 23, 2024 at 07:15:54PM +0100, Eric Auger wrote:
>>>> In [1] and [2] we attempted to fix a case where a VFIO-PCI device
>>>> protected with a virtio-iommu is assigned to an x86 guest. On x86
>>>> the physical IOMMU may have an address width (gaw) of 39 or 48 bits
>>>> whereas the virtio-iommu exposes a 64b input address space by default.
>>>> Hence the guest may try to use the full 64b space and DMA MAP
>>>> failures may be encountered. To work around this issue we endeavoured
>>>> to pass the usable host IOVA regions (excluding the out-of-range space)
>>>> from VFIO to the virtio-iommu device, so that the virtio-iommu driver
>>>> can query them during the probe request and let the guest iommu
>>>> kernel subsystem carve them out.
>>>>
>>>> However, if there are several devices in the same iommu group,
>>>> only the reserved regions of the first one are taken into
>>>> account by the iommu subsystem of the guest. This generally
>>>> works on bare metal because devices are not going to
>>>> expose different reserved regions. In our case, however, this
>>>> may prevent the host iommu geometry from being taken into account.
>>>>
>>>> So the simplest solution to this problem looks to be introducing an
>>>> input address width option, aw-bits, matching what is
>>>> done on the intel-iommu. By default, from now on it is set
>>>> to 39 bits with pc_q35 and 64b with arm virt.
>>> Doesn't Arm have the same problem? The TTB0 page tables limit what can be
>>> mapped to 48-bit, or 52-bit when SMMU_IDR5.VAX==1 and the granule is 64kB.
>>> A Linux host driver could configure smaller VA sizes:
>>> * SMMUv2 limits the VA to SMMU_IDR2.UBS (upstream bus size), which
>>>   can go as low as 32-bit (I'm assuming we don't care about 32-bit hosts).
>> Yes, I think we can ignore that use case.
>>> * SMMUv3 currently limits the VA to CONFIG_ARM64_VA_BITS, which
>>>   could be as low as 36 bits (but realistically 39, since 36 depends on
>>>   16kB pages and CONFIG_EXPERT).
>> Further reading "3.4.1 Input address size and Virtual Address size", it
>> indeed looks like SMMU_IDR5.VAX gives the physical SMMU's actual
>> implementation maximum (which matches the intel-iommu gaw). I missed
>> that. Now I am unsure whether we should limit the VAS to 39 to
>> accommodate the worst-case host SW configuration, or use 48 instead.
> I don't know what's best either. 48 should be fine if hosts normally
> enable VA_BITS_48 (I see Debian has it [1]; not sure how to check the
> others).
>
> [1] https://salsa.debian.org/kernel-team/linux/-/blob/master/debian/config/arm64/config?ref_type=heads#L18
>
>> If we set such a low value as 39b, won't it prevent some guests from
>> working properly?
> It's not that low, since it gives each endpoint a private 512GB address
> space, but yes, there might be special cases that reach the limit. Maybe
> assign a multi-queue NIC to a 256-vCPU guest: if you want per-vCPU DMA
> pools, then with a 39-bit address space you only get 2GB per vCPU. With
> 48-bit you get 1TB, which should be plenty.
>
> 52-bit private IOVA space doesn't seem useful; I doubt we'll ever need
> to support that on the MAP/UNMAP interface.
>
> So I guess 48-bit can be the default, and users with special setups can
> override aw-bits.
Yes, it looks safe on my side too. I will respin with a 48b default then.

Thank you!

Eric
>
> Thanks,
> Jean
>
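Assuming the respun series keeps the same property name as intel-iommu, overriding the default would look something like this (hedged sketch; the exact syntax depends on the final patch, and the VFIO host address is a placeholder):

    qemu-system-aarch64 -machine virt -cpu host -enable-kvm \
        -device virtio-iommu-pci,aw-bits=48 \
        -device vfio-pci,host=0000:01:00.0 ...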