[RFC,v2,0/6] powerpc: pSeries: vfio: iommu: Re-enable support for SPAPR TCE VFIO

Message ID 171450753489.10851.3056035705169121613.stgit@linux.ibm.com (mailing list archive)

Message

Shivaprasad G Bhat April 30, 2024, 8:05 p.m. UTC
RFC v1 was posted here [1]. As I was testing more and fixing the
issues, I realized it's cleaner to have the table_group_ops implemented
the way it is done on PowerNV and to stop 'borrowing' the DMA windows
for pSeries.

This patch-set implements the iommu table_group_ops for pSeries for
VFIO SPAPR TCE sub-driver thereby enabling the VFIO support on POWER
pSeries machines.

So, this patchset is a rewrite and is not close to v1 except
for a few changes.

Structure of the patchset:
-------------------------
The first and fifth patches are just code movement.

The second patch takes care of collecting the TCE and DDW information
for the vfio_iommu_spapr_tce_ddw_info during probe.

The third patch fixes the convention of using table[1] for VFs on
pSeries when used by the host driver.

The fourth patch fixes VFIO to clear the TCEs before unsetting the
window.

The last patch has the API implementations; please find the
details in its commit description.
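
The teardown ordering named for the fourth patch (clear the TCEs
first, unset the window second) can be illustrated with a toy model.
This is only a sketch, not the kernel code: DmaWindow, clear_all(),
unset() and remove_window() are hypothetical names made up for this
example, and the model simply assumes that unsetting a window which
still holds translations is an error.

```python
class DmaWindow:
    """Toy stand-in for one DMA window (hypothetical, not the kernel's iommu_table)."""

    def __init__(self, num_entries):
        # None means "no translation installed" for that entry.
        self.entries = [None] * num_entries
        self.is_set = True

    def map(self, index, host_addr):
        # Install one translation entry (a TCE in this toy model).
        self.entries[index] = host_addr

    def clear_all(self):
        # Drop every translation while the window itself stays in place.
        self.entries = [None] * len(self.entries)

    def unset(self):
        # Assumption of this model: a window with live entries must not be dropped.
        if any(entry is not None for entry in self.entries):
            raise RuntimeError("window still has live TCEs")
        self.is_set = False


def remove_window(window):
    """Tear a window down: clear the TCEs first, then unset the window."""
    window.clear_all()
    window.unset()


win = DmaWindow(num_entries=4)
win.map(0, 0x1000)
win.map(3, 0x4000)
remove_window(win)
print(win.is_set, win.entries)
```

In this model, calling unset() directly on a window that still has
mapped entries raises an error, while remove_window() always succeeds
because it clears the entries first.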

Testing:
-------
Tested with an NVMe card and a Mellanox multi-function card by
attaching them to a nested KVM guest running on a pSeries LPAR.
Also, vfio-test [2] by Alex Williamson was forked and updated to
add support for pSeries guests, and used to test these patches [3].

Limitations/Known Issues:
------------------------
* The restriction of at most one DMA window in SR-IOV VF scenarios
is already handled in the current patches. However, the changes
needed in vfio_iommu_spapr_tce_ddw_info to expose that the default
window is a 64-bit one, and the QEMU changes to handle that, will
be addressed in later versions.
* KVM guest boot throws a warning at remap_pfn_range_notrack() on
the host; I will post the fix in the next versions.
* A DLPAR hotplugged device has no FDT entry until the next reboot,
so the default DMA window property has to be preserved differently
for this case.

References:
----------
[1] https://lore.kernel.org/linuxppc-dev/171026724548.8367.8321359354119254395.stgit@linux.ibm.com/
[2] https://github.com/awilliam/tests
[3] https://github.com/nnmwebmin/vfio-ppc-tests/tree/vfio-ppc-ex

---
Changelog:
v1: https://lore.kernel.org/linuxppc-dev/171026724548.8367.8321359354119254395.stgit@linux.ibm.com/
 - Rewritten to stop borrowing the DMA windows and to implement
 the table_group_ops for pSeries.
 - The cover letter and Patch 6 have more details, as this was a rewrite.

Shivaprasad G Bhat (6):
      powerpc/iommu: Move pSeries specific functions to pseries/iommu.c
      powerpc/pseries/iommu: Fix the VFIO_IOMMU_SPAPR_TCE_GET_INFO ioctl output
      powerpc/pseries/iommu: Use the iommu table[0] for IOV VF's DDW
      vfio/spapr: Always clear TCEs before unsetting the window
      powerpc/iommu: Move dev_has_iommu_table() to iommu.c
      powerpc/iommu: Implement the iommu_table_group_ops for pSeries


 arch/powerpc/include/asm/iommu.h          |   9 +-
 arch/powerpc/kernel/eeh.c                 |  16 -
 arch/powerpc/kernel/iommu.c               | 170 +----
 arch/powerpc/platforms/powernv/pci-ioda.c |   6 +-
 arch/powerpc/platforms/pseries/iommu.c    | 720 +++++++++++++++++++++-
 drivers/vfio/vfio_iommu_spapr_tce.c       |  13 +-
 6 files changed, 729 insertions(+), 205 deletions(-)

Comments

Jason Gunthorpe May 1, 2024, 2:09 p.m. UTC | #1
On Tue, Apr 30, 2024 at 03:05:34PM -0500, Shivaprasad G Bhat wrote:
> RFC v1 was posted here [1]. As I was testing more and fixing the
> issues, I realized it's cleaner to have the table_group_ops implemented
> the way it is done on PowerNV and stop 'borrowing' the DMA windows
> for pSeries.
> 
> This patch-set implements the iommu table_group_ops for pSeries for
> VFIO SPAPR TCE sub-driver thereby enabling the VFIO support on POWER
> pSeries machines.

Wait, did they previously not have any support?

Again, this TCE stuff needs to go away, not grow. I can grudgingly
accept fixing it where it used to work, but not enabling more HW that
never worked before! :(

Jason
Alexey Kardashevskiy May 2, 2024, 1:29 a.m. UTC | #2
On 2/5/24 00:09, Jason Gunthorpe wrote:
> On Tue, Apr 30, 2024 at 03:05:34PM -0500, Shivaprasad G Bhat wrote:
>> RFC v1 was posted here [1]. As I was testing more and fixing the
>> issues, I realized it's cleaner to have the table_group_ops implemented
>> the way it is done on PowerNV and stop 'borrowing' the DMA windows
>> for pSeries.
>>
>> This patch-set implements the iommu table_group_ops for pSeries for
>> VFIO SPAPR TCE sub-driver thereby enabling the VFIO support on POWER
>> pSeries machines.
> 
> Wait, did they previously not have any support?
>
> Again, this TCE stuff needs to go away, not grow. I can grudgingly
> accept fixing it where it used to work, but not enabling more HW that
> never worked before! :(


This used to work when I last tried it 2+ years ago; it is not new.
Thanks,
Shivaprasad G Bhat May 3, 2024, 7:03 p.m. UTC | #3
On 5/2/24 06:59, Alexey Kardashevskiy wrote:
>
>
> On 2/5/24 00:09, Jason Gunthorpe wrote:
>> On Tue, Apr 30, 2024 at 03:05:34PM -0500, Shivaprasad G Bhat wrote:
>>> RFC v1 was posted here [1]. As I was testing more and fixing the
>>> issues, I realized it's cleaner to have the table_group_ops implemented
>>> the way it is done on PowerNV and stop 'borrowing' the DMA windows
>>> for pSeries.
>>>
>>> This patch-set implements the iommu table_group_ops for pSeries for
>>> VFIO SPAPR TCE sub-driver thereby enabling the VFIO support on POWER
>>> pSeries machines.
>>
>> Wait, did they previously not have any support?
>>
>> Again, this TCE stuff needs to go away, not grow. I can grudgingly
>> accept fixing it where it used to work, but not enabling more HW that
>> never worked before! :(
>
>
> This used to work when I last tried it 2+ years ago; it is not new.
> Thanks,
>
Thanks Alexey for pitching in.


Hi Jason,


As Alexey implied, this used to work in the past.


Support for pSeries VFIO has existed for a long time, and support
for VFIO_SPAPR_TCE_v2_IOMMU was also added with
9d67c9433509 ("powerpc/iommu: Add "borrowing" iommu_table_group_ops").


Commit 090bad39b237a ("powerpc/powernv: Add indirect levels to
it_userspace") broke the userspace view for pSeries, which Patch 6
here tries to bring back.


We found more issues with 9d67c9433509, and I felt it's
better to stop "borrowing" the DMA windows as that would be
cleaner, which is what is done in Patch 6.


In this process we discovered a few bugs in upstream as well, which
we have been trying to fix, and we have posted a few fixes earlier, such as:
d2d00e15808 powerpc: iommu: Bring back table group release_ownership() call
83b3836bf83 iommu: Allow ops->default_domain to work when !CONFIG_IOMMU_DMA


So, this patch series tries to fix some more issues (patches 2, 4 and 6),
coupled with some code refactoring (patches 1, 3, 5 and 6) to stop
"borrowing" DMA windows.


We have legacy workloads using VFIO in userspace/kvm guests running
on downstream distro kernels. We want these workloads to be able to
continue running on our arch.


Going forward, we are planning to add IOMMUFD support for PPC64.
I firmly believe the refactoring in this patch series is a step in
that direction.


Thanks,
Shivaprasad
Jason Gunthorpe May 6, 2024, 5:43 p.m. UTC | #4
On Sat, May 04, 2024 at 12:33:53AM +0530, Shivaprasad G Bhat wrote:
> We have legacy workloads using VFIO in userspace/kvm guests running
> on downstream distro kernels. We want these workloads to be able to
> continue running on our arch.

It has been broken since 2018, I don't find this reasoning entirely
reasonable :\

> I firmly believe the refactoring in this patch series is a step in
> that direction.

But fine, as long as we are going to fix it. PPC really needs this to
be resolved to keep working.

Jason
Shivaprasad G Bhat May 7, 2024, 3:10 p.m. UTC | #5
Hi Jason,


On 5/6/24 23:13, Jason Gunthorpe wrote:
> On Sat, May 04, 2024 at 12:33:53AM +0530, Shivaprasad G Bhat wrote:
>> We have legacy workloads using VFIO in userspace/kvm guests running
>> on downstream distro kernels. We want these workloads to be able to
>> continue running on our arch.
> It has been broken since 2018, I don't find this reasoning entirely
> reasonable :\

Though upstream has been broken since 2018 for pSeries, the breaking
patches got trickled into downstream distro kernels only in the last few
years. The legacy workloads that were running on PowerNV with these
downstream distros are now broken on the pSeries logical partitions
without the fixes in this series.

>> I firmly believe the refactoring in this patch series is a step in
>> that direction.
> But fine, as long as we are going to fix it. PPC really needs this to
> be resolved to keep working.

Thanks, we are working on it.


Regards,

Shivaprasad

>
> Jason
Shawn Anastasio May 10, 2024, 6:33 p.m. UTC | #6
On 5/6/24 12:43 PM, Jason Gunthorpe wrote:
> On Sat, May 04, 2024 at 12:33:53AM +0530, Shivaprasad G Bhat wrote:
>> We have legacy workloads using VFIO in userspace/kvm guests running
>> on downstream distro kernels. We want these workloads to be able to
>> continue running on our arch.
> 
> It has been broken since 2018, I don't find this reasoning entirely
> reasonable :\
>

Raptor is currently working on an automated test runner setup to
exercise the VFIO subsystem on PowerNV and (to a lesser extent) pSeries,
so breakages like this going forward will hopefully be caught much more
quickly.

>> I firmly believe the refactoring in this patch series is a step in
>> that direction.
> 
> But fine, as long as we are going to fix it. PPC really needs this to
> be resolved to keep working.
>

Agreed. Modernizing/de-cluttering PPC's IOMMU code in general is another
task that we're working towards. As mentioned previously on the list,
we're working towards a more standard IOMMU driver for PPC that can be
used with dma_iommu, which I think will be a good step towards cleaning
this up. Initially PowerNV is going to be our target, but to the extent
that it is possible and useful, pSeries support could be brought in
later.

> Jason

Thanks,
Shawn