[RFCv1,07/14] iommufd: Add viommu set/unset_dev_id ops

Message ID: 6e57d7b5aa1705bdd547b1cd2aca93d3bf70dfa4.1712978212.git.nicolinc@nvidia.com
State: Handled Elsewhere
Series: Add Tegra241 (Grace) CMDQV Support (part 2/2)

Commit Message

Nicolin Chen April 13, 2024, 3:47 a.m. UTC
Add a pair of ops to set and unset a device's virtual ID that belongs
to a viommu object. They will be used, in the following patch, by
iommufd to support HW-acceleration features at the host level.

For instance, every device behind an ARM SMMU has a Stream ID. The ID
is used by ATC invalidation commands so the SMMU HW can direct
invalidation requests to the corresponding PCI device that the ID
belongs to. In a virtualization use case, a passed-through device in
the VM will have a virtual Stream ID, used by the ATC invalidation
commands in the guest system. NVIDIA's CMDQV extension for SMMUv3
provides a v-interface to execute the guest-level ATC invalidation
commands directly, yet it needs the HW to be aware of the virtual
Stream ID so it can replace it with the physical Stream ID.

Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 include/linux/iommufd.h | 4 ++++
 1 file changed, 4 insertions(+)
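
To illustrate how these ops are meant to be driven, here is a minimal
core-side sketch. The handler name and the set-virtual-ID ioctl path it
implies are assumptions for illustration (the actual uAPI only lands in
a following patch), and it assumes struct iommufd_viommu carries an ops
pointer as in this RFC:

	/* Hypothetical core-side plumbing: route a set-virtual-ID request
	 * from userspace down to the driver op added by this patch.
	 */
	static int iommufd_viommu_set_vdev_id(struct iommufd_viommu *viommu,
					      struct device *dev, u64 vdev_id)
	{
		if (!viommu->ops || !viommu->ops->set_dev_id)
			return -EOPNOTSUPP;
		/* The driver records the vdev_id -> dev mapping (e.g. vSID -> pSID) */
		return viommu->ops->set_dev_id(viommu, dev, vdev_id);
	}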

Comments

Jason Gunthorpe May 12, 2024, 2:46 p.m. UTC | #1
On Fri, Apr 12, 2024 at 08:47:04PM -0700, Nicolin Chen wrote:
> Add a pair of ops to set and unset a device's virtual ID that belongs
> to a viommu object. They will be used, in the following patch, by
> iommufd to support HW-acceleration features at the host level.
> 
> For instance, every device behind an ARM SMMU has a Stream ID. The ID
> is used by ATC invalidation commands so the SMMU HW can direct
> invalidation requests to the corresponding PCI device that the ID
> belongs to. In a virtualization use case, a passed-through device in
> the VM will have a virtual Stream ID, used by the ATC invalidation
> commands in the guest system. NVIDIA's CMDQV extension for SMMUv3
> provides a v-interface to execute the guest-level ATC invalidation
> commands directly, yet it needs the HW to be aware of the virtual
> Stream ID so it can replace it with the physical Stream ID.

I imagine using this as well for the ATC invalidation commands. It
would be very easy and simplifying if the command fixup just extracted
the vSID from the ATC invalidation and used an xarray to turn it into
a pSID and then pushed the resulting command.
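
A minimal sketch of that fixup, assuming a driver-private xarray (named
vdev_ids here) that set_dev_id() populates with vSID -> pSID entries;
the my_* names are hypothetical and the CMDQ_ATC_0_SID mask follows the
SMMUv3 driver's CMD_ATC_INV layout:

	/* Pull the vSID out of a guest CMD_ATC_INV, translate it through
	 * the xarray, and rewrite the command with the physical SID before
	 * pushing it onto the real command queue.
	 */
	static int my_viommu_fixup_atc_inv(struct my_viommu *viommu, u64 *cmd)
	{
		u64 vsid = FIELD_GET(CMDQ_ATC_0_SID, cmd[0]);
		struct my_vdev *vdev = xa_load(&viommu->vdev_ids, vsid);

		if (!vdev)
			return -EINVAL;	/* the guest used an unknown vSID */
		cmd[0] &= ~CMDQ_ATC_0_SID;
		cmd[0] |= FIELD_PREP(CMDQ_ATC_0_SID, vdev->psid);
		return 0;	/* the caller pushes the fixed-up command */
	}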

Seems fine as is

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason
Jason Gunthorpe May 12, 2024, 2:51 p.m. UTC | #2
On Fri, Apr 12, 2024 at 08:47:04PM -0700, Nicolin Chen wrote:
> @@ -94,9 +94,13 @@ struct iommufd_viommu {
>   * struct iommufd_viommu_ops - viommu specific operations
>   * @free: Free all driver-specific parts of an iommufd_viommu. The memory
>   *        of the entire viommu will be free-ed by iommufd core
> + * @set/unset_dev_id: set/unset a user space virtual id for a device
>   */
>  struct iommufd_viommu_ops {
>  	void (*free)(struct iommufd_viommu *viommu);
> +	int (*set_dev_id)(struct iommufd_viommu *viommu,
> +			  struct device *dev, u64 dev_id);
> +	void (*unset_dev_id)(struct iommufd_viommu *viommu, struct device *dev);

Actually, looking at this more closely, why doesn't the core pass in
dev_id to unset_dev_id? Not doing so seems like it will create a bunch
of unnecessary work for the driver.
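
Sketched out, that would be (a signature suggestion, not code from this
series); with dev_id passed back, a driver could do a direct xa_erase()
instead of searching its table for the entry matching @dev:

	/* Suggested variant: hand the virtual ID back on unset */
	void (*unset_dev_id)(struct iommufd_viommu *viommu,
			     struct device *dev, u64 dev_id);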

Jason
Nicolin Chen May 13, 2024, 4:39 a.m. UTC | #3
On Sun, May 12, 2024 at 11:46:54AM -0300, Jason Gunthorpe wrote:
> On Fri, Apr 12, 2024 at 08:47:04PM -0700, Nicolin Chen wrote:
> > Add a pair of ops to set and unset a device's virtual ID that belongs
> > to a viommu object. They will be used, in the following patch, by
> > iommufd to support HW-acceleration features at the host level.
> > 
> > For instance, every device behind an ARM SMMU has a Stream ID. The ID
> > is used by ATC invalidation commands so the SMMU HW can direct
> > invalidation requests to the corresponding PCI device that the ID
> > belongs to. In a virtualization use case, a passed-through device in
> > the VM will have a virtual Stream ID, used by the ATC invalidation
> > commands in the guest system. NVIDIA's CMDQV extension for SMMUv3
> > provides a v-interface to execute the guest-level ATC invalidation
> > commands directly, yet it needs the HW to be aware of the virtual
> > Stream ID so it can replace it with the physical Stream ID.
> 
> I imagine using this as well for the ATC invalidation commands. It
> would be very easy and simplifying if the command fixup just extracted
> the vSID from the ATC invalidation and used an xarray to turn it into
> a pSID and then pushed the resulting command.

You mean the nested SMMU series right? Actually the set_dev_id
ioctl was a part of that until we wanted to try DEV_INVALIDATE.

So again, yes, it makes sense to me that we move viommu and the
set_dev_id to the nested series, and then drop DEV_INVALIDATE.

Thanks
Nicolin
Jason Gunthorpe May 14, 2024, 3:53 p.m. UTC | #4
On Sun, May 12, 2024 at 09:39:15PM -0700, Nicolin Chen wrote:
> On Sun, May 12, 2024 at 11:46:54AM -0300, Jason Gunthorpe wrote:
> > On Fri, Apr 12, 2024 at 08:47:04PM -0700, Nicolin Chen wrote:
> > > Add a pair of ops to set and unset a device's virtual ID that belongs
> > > to a viommu object. They will be used, in the following patch, by
> > > iommufd to support HW-acceleration features at the host level.
> > > 
> > > For instance, every device behind an ARM SMMU has a Stream ID. The ID
> > > is used by ATC invalidation commands so the SMMU HW can direct
> > > invalidation requests to the corresponding PCI device that the ID
> > > belongs to. In a virtualization use case, a passed-through device in
> > > the VM will have a virtual Stream ID, used by the ATC invalidation
> > > commands in the guest system. NVIDIA's CMDQV extension for SMMUv3
> > > provides a v-interface to execute the guest-level ATC invalidation
> > > commands directly, yet it needs the HW to be aware of the virtual
> > > Stream ID so it can replace it with the physical Stream ID.
> > 
> > I imagine using this as well for the ATC invalidation commands. It
> > would be very easy and simplifying if the command fixup just extracted
> > the vSID from the ATC invalidation and used an xarray to turn it into
> > a pSID and then pushed the resulting command.
> 
> You mean the nested SMMU series right? Actually the set_dev_id
> ioctl was a part of that until we wanted to try DEV_INVALIDATE.

Yes, there is nothing inherently wrong with DEV_INVALIDATE, we could
continue to use that as the API and automatically pick up the VIOMMU
instance from the nesting domain to process the ATS.

The VMM needs a reliable place to send the CMDQ data; on ARM/AMD this
needs to be an always-available, global-to-the-viommu thing. Intel
needs to associate the virtual invalidation with the correct nesting
domain as well.

So I originally thought it would be nice and simple to have a
VIOMMU_INVALIDATE as well.

But.. if we need a nesting domain that is identity (ie the S2) then
when the VIOMMU is created we'd also create an identity nesting
domain as well. Functionally we could use that global nesting domain
to deliver the DEV_INVALIDATE too.

It is a bit quirky, but it would be OK.

> So again, yes, it makes sense to me that we move viommu and the
> set_dev_id to the nested series, and then drop DEV_INVALIDATE.

I would like to do this bit by bit. viommu is a big series on its own.

DEV_INVALIDATE is fine, it just can't do ATS invalidation.

We can add ATS invalidation afterwards, either as an enhancement or as
part of adding the VIOMMU, as DEV_INVALIDATE or VIOMMU_INVALIDATE (or
both)

Jason
Nicolin Chen May 15, 2024, 1:59 a.m. UTC | #5
On Tue, May 14, 2024 at 12:53:23PM -0300, Jason Gunthorpe wrote:
> On Sun, May 12, 2024 at 09:39:15PM -0700, Nicolin Chen wrote:
> > On Sun, May 12, 2024 at 11:46:54AM -0300, Jason Gunthorpe wrote:
> > > On Fri, Apr 12, 2024 at 08:47:04PM -0700, Nicolin Chen wrote:
> > > > Add a pair of ops to set and unset a device's virtual ID that belongs
> > > > to a viommu object. They will be used, in the following patch, by
> > > > iommufd to support HW-acceleration features at the host level.
> > > > 
> > > > For instance, every device behind an ARM SMMU has a Stream ID. The ID
> > > > is used by ATC invalidation commands so the SMMU HW can direct
> > > > invalidation requests to the corresponding PCI device that the ID
> > > > belongs to. In a virtualization use case, a passed-through device in
> > > > the VM will have a virtual Stream ID, used by the ATC invalidation
> > > > commands in the guest system. NVIDIA's CMDQV extension for SMMUv3
> > > > provides a v-interface to execute the guest-level ATC invalidation
> > > > commands directly, yet it needs the HW to be aware of the virtual
> > > > Stream ID so it can replace it with the physical Stream ID.
> > > 
> > > I imagine using this as well for the ATC invalidation commands. It
> > > would be very easy and simplifying if the command fixup just extracted
> > > the vSID from the ATC invalidation and used an xarray to turn it into
> > > a pSID and then pushed the resulting command.
> > 
> > You mean the nested SMMU series right? Actually the set_dev_id
> > ioctl was a part of that until we wanted to try DEV_INVALIDATE.
> 
> Yes, there is nothing inherently wrong with DEV_INVALIDATE, we could
> continue to use that as the API and automatically pick up the VIOMMU
> instance from the nesting domain to process the ATS.
> 
> The VMM needs a reliable place to send the CMDQ data; on ARM/AMD this
> needs to be an always-available, global-to-the-viommu thing. Intel
> needs to associate the virtual invalidation with the correct nesting
> domain as well.
> 
> So I originally thought it would be nice and simple to have a
> VIOMMU_INVALIDATE as well.
> 
> But.. if we need a nesting domain that is identity (ie the S2) then
> when the VIOMMU is created we'd also create an identity nesting
> domain as well.

So, you want a proxy S1 domain for a device to attach, in case
of a stage-2 only setup, because an S2 domain will no longer have
a VMID, since it's shared among viommus. In the SMMU driver case,
an arm_smmu_domain won't have an smmu pointer, so a device can't
attach to an S2 domain but always to a nested S1 domain, right?

> Functionally we could use that global nesting domain
> to deliver the DEV_INVALIDATE too.

If my narrative above is correct, the device is actually still
attached to S2 domain via a proxy nested S1 domain. What cache
do we need to invalidate except S2 mappings in this case?

> > So again, yes, it makes sense to me that we move viommu and the
> > set_dev_id to the nested series, and then drop DEV_INVALIDATE.
> 
> I would like to do this bit by bit. viommu is a big series on its own.
> 
> DEV_INVALIDATE is fine, it just can't do ATS invalidation.

I am not very sure about AMD. Intel doesn't need DEV_INVALIDATE
at this moment. SMMU only uses DEV_INVALIDATE for ATC and CD
invalidations, which can both be shifted to VIOMMU_INVALIDATE.

Same question: what other cases can we use DEV_INVALIDATE for?

> We can add ATS invalidation afterwards, either as an enhancement or as
> part of adding the VIOMMU, as DEV_INVALIDATE or VIOMMU_INVALIDATE (or
> both)

Yea, maybe step by step like this:

Part-1 VIOMMU_ALLOC and VIOMMU_ATTACH
Part-2 VIOMMU_SET/UNSET_VDEV_ID
Part-3 VIOMMU_INVALIDATE
Part-4 VQUEUE_ALLOC
...

Thanks
Nicolin
Jason Gunthorpe May 21, 2024, 6:24 p.m. UTC | #6
On Tue, May 14, 2024 at 06:59:07PM -0700, Nicolin Chen wrote:
> So, you want a proxy S1 domain for a device to attach, in case
> of a stage-2 only setup, because an S2 domain will no longer have
> a VMID, since it's shared among viommus. In the SMMU driver case,
> an arm_smmu_domain won't have an smmu pointer, so a device can't
> attach to an S2 domain but always to a nested S1 domain, right?

That seems like a simple solution to the VMID lifetime, but it means
the kernel has to decode more types of vSTE.

> > Functionally we could use that global nesting domain
> > to deliver the DEV_INVALIDATE too.
> 
> If my narrative above is correct, the device is actually still
> attached to S2 domain via a proxy nested S1 domain. What cache
> do we need to invalidate except S2 mappings in this case?

qemu needs a reliable place to send the invalidation commands to (ie
what ID to provide).

If we add IOMMU_VIOMMU_INVALIDATE then the ID is the viommu id.

If we enhance IOMMU_HWPT_INVALIDATE then the ID is the identity
nesting domain.

Either case leads to the viommu object in the kernel.

I don't know if there is merit one way or the other. A more specific
API surface is nice, but the two APIs are completely duplicative.

So maybe:

#define IOMMU_VIOMMU_INVALIDATE IOMMU_HWPT_INVALIDATE

As documentation and have the kernel just detect based on the type of
the passed ID?
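
A rough sketch of what that type-based detection could look like in a
common ioctl path. The per-type helpers and the exact object-type
constants here are illustrative assumptions, not final iommufd code:

	/* One invalidate ioctl: resolve the passed ID to either a viommu
	 * or a nesting HWPT and dispatch accordingly.
	 */
	static int iommufd_do_invalidate(struct iommufd_ucmd *ucmd, u32 id)
	{
		struct iommufd_object *obj;
		int rc;

		obj = iommufd_get_object(ucmd->ictx, id, IOMMUFD_OBJ_ANY);
		if (IS_ERR(obj))
			return PTR_ERR(obj);

		switch (obj->type) {
		case IOMMUFD_OBJ_VIOMMU:	/* the ID named a viommu */
			rc = viommu_invalidate(ucmd, obj);
			break;
		case IOMMUFD_OBJ_HWPT_NESTED:	/* the ID named a nesting domain */
			rc = hwpt_invalidate(ucmd, obj);
			break;
		default:
			rc = -EINVAL;
			break;
		}
		iommufd_put_object(ucmd->ictx, obj);
		return rc;
	}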

> > > So again, yes, it makes sense to me that we move viommu and the
> > > set_dev_id to the nested series, and then drop DEV_INVALIDATE.
> > 
> > I would like to do this bit by bit. viommu is a big series on its own.
> > 
> > DEV_INVALIDATE is fine, it just can't do ATS invalidation.
> 
> I am not very sure about AMD.

AMD will need the same vRID -> pRID stuff and we want that to run on
the VIOMMU

> Same question: what other cases can we use DEV_INVALIDATE for?

DEV_INVALIDATE was interesting before the viommu idea because
userspace could process each invalidation command and when it reached
ATS it would invoke the correct DEV_INVALIDATE.

With viommu we expect ATS supporting drivers to support viommu and
then to do vRID -> pRID in the other invalidation paths. In this case
I don't see a reason to do DEV_INVALIDATE right now.

> > We can add ATS invalidation afterwards, either as an enhancement or as
> > part of adding the VIOMMU, as DEV_INVALIDATE or VIOMMU_INVALIDATE (or
> > both)
> 
> Yea, maybe step by step like this:
> 
> Part-1 VIOMMU_ALLOC and VIOMMU_ATTACH
> Part-2 VIOMMU_SET/UNSET_VDEV_ID
> Part-3 VIOMMU_INVALIDATE
> Part-4 VQUEUE_ALLOC
> ...

So we have this stuff still open:
 - Identity STE with PASID (part 2b)
 - IOMMU_GET_HW_INFO (part 3)
 - IOMMU_HWPT_ALLOC_NEST_PARENT (part 3)
 - IOMMU_HWPT_DATA_ARM_SMMUV3 (part 3)
 - IOMMU_HWPT_INVALIDATE_DATA_ARM_SMMUV3
 - VIOMMU_ALLOC, VIOMMU_ATTACH
 - VIOMMU_INVALIDATE
 - VIOMMU_SET/UNSET_VDEV_ID
 - VQUEUE_ALLOC / vCMDQ

I feel like IOMMU_HWPT_INVALIDATE_DATA_ARM_SMMUV3 is a reasonable fit
to part 3. Then part 4 would be VIOMMU_ALLOC -> VIOMMU_SET/UNSET_VDEV_ID
which brings ATS support to the API. vCMDQ hypervisor support would sit
on top of that with just VQUEUE?

Jason
Nicolin Chen May 21, 2024, 10:27 p.m. UTC | #7
On Tue, May 21, 2024 at 03:24:48PM -0300, Jason Gunthorpe wrote:
> On Tue, May 14, 2024 at 06:59:07PM -0700, Nicolin Chen wrote:
> > So, you want a proxy S1 domain for a device to attach, in case
> > of a stage-2 only setup, because an S2 domain will no longer have
> > a VMID, since it's shared among viommus. In the SMMU driver case,
> > an arm_smmu_domain won't have an smmu pointer, so a device can't
> > attach to an S2 domain but always to a nested S1 domain, right?
> 
> That seems like a simple solution to the VMID lifetime, but it means
> the kernel has to decode more types of vSTE.

Yea. For vSTE=abort, likely we need a nested block domain too?

> > > Functionally we could use that global nesting domain
> > > to deliver the DEV_INVALIDATE too.
> > 
> > If my narrative above is correct, the device is actually still
> > attached to S2 domain via a proxy nested S1 domain. What cache
> > do we need to invalidate except S2 mappings in this case?
> 
> qemu needs a reliable place to send the invalidation commands to (ie
> what ID to provide).
> 
> If we add IOMMU_VIOMMU_INVALIDATE then the ID is the viommu id.
> 
> If we enhance IOMMU_HWPT_INVALIDATE then the ID is the identity
> nesting domain.
> 
> Either case leads to the viommu object in the kernel.

Ooohh! I didn't connect the dots this far. Yes. This turns
IOMMU_HWPT_INVALIDATE back into the very first version supporting
device cache flush. Though using IOMMU_HWPT_INVALIDATE feels a
bit rule-breaking now, since it assumes the nested HWPT keeps a
vdev_id lookup table somewhere among its associated objects...

> I don't know if there is merit one way or the other. A more specific
> API surface is nice, but the two APIs are completely duplicative.
> 
> So maybe:
> 
> #define IOMMU_VIOMMU_INVALIDATE IOMMU_HWPT_INVALIDATE
> 
> As documentation and have the kernel just detect based on the type of
> the passed ID?

Yea, the only difference is viommu_id v.s. hwpt_id that we can
document.

Then in this case, we have two mostly identical uAPIs for the
SMMU driver to use. Should we implement both?

> > > > So again, yes, it makes sense to me that we move viommu and the
> > > > set_dev_id to the nested series, and then drop DEV_INVALIDATE.
> > > 
> > > I would like to do this bit by bit. viommu is a big series on its own.
> > > 
> > > DEV_INVALIDATE is fine, it just can't do ATS invalidation.
> > 
> > I am not very sure about AMD.
> 
> AMD will need the same vRID -> pRID stuff and we want that to run on
> the VIOMMU
> 
> > Same question: what other cases can we use DEV_INVALIDATE for?
> 
> DEV_INVALIDATE was interesting before the viommu idea because
> userspace could process each invalidation command and when it reached
> ATS it would invoke the correct DEV_INVALIDATE.

Agreed. That helped a lot with the VMM dispatching the invalidation
requests.

> With viommu we expect ATS supporting drivers to support viommu and
> then to do vRID -> pRID in the other invalidation paths. In this case
> I don't see a reason to do DEV_INVALIDATE right now.

Yea. I guessed so.

> > > We can add ATS invalidation afterwards, either as an enhancement or as
> > > part of adding the VIOMMU, as DEV_INVALIDATE or VIOMMU_INVALIDATE (or
> > > both)
> > 
> > Yea, maybe step by step like this:
> > 
> > Part-1 VIOMMU_ALLOC and VIOMMU_ATTACH
> > Part-2 VIOMMU_SET/UNSET_VDEV_ID
> > Part-3 VIOMMU_INVALIDATE
> > Part-4 VQUEUE_ALLOC
> > ...
> 
> So we have this stuff still open:
>  - Identity STE with PASID (part 2b)
>  - IOMMU_GET_HW_INFO (part 3)
>  - IOMMU_HWPT_ALLOC_NEST_PARENT (part 3)
>  - IOMMU_HWPT_DATA_ARM_SMMUV3 (part 3)
>  - IOMMU_HWPT_INVALIDATE_DATA_ARM_SMMUV3
>  - VIOMMU_ALLOC, VIOMMU_ATTACH
>  - VIOMMU_INVALIDATE
>  - VIOMMU_SET/UNSET_VDEV_ID
>  - VQUEUE_ALLOC / vCMDQ
> 
> I feel like IOMMU_HWPT_INVALIDATE_DATA_ARM_SMMUV3 is a reasonable fit
> to part 3. Then part 4 would be VIOMMU_ALLOC -> VIOMMU_SET/UNSET_VDEV_ID
> which brings ATS support to the API.

There is some conflict between passing in viommu_id/viommu v.s. parent
hwpt_id/domain for a nested domain allocation. Do you think that
should be addressed later in the VIOMMU series v.s. part 3?

More specifically, I have two drafts in my viommu series:
87a659e65229 WAR: iommufd: Allow pt_it to carry viommu_id
7c5fd8f50bc9 WAR pass in viommu pointer to domain_alloc_user op

I know that these two only make sense with VIOMMU_ALLOC. Yet, will
there be a problem if we establish nested domain allocation with a
parent domain/hwpt in the uAPI by part 3, and then change it later?
Will we end up supporting two for backward compatibility?

> vCMDQ hypervisor support would sit on top of that with just VQUEUE?

Yea.

Thanks
Nicolin
Jason Gunthorpe May 22, 2024, 1:59 p.m. UTC | #8
On Tue, May 21, 2024 at 03:27:02PM -0700, Nicolin Chen wrote:
> On Tue, May 21, 2024 at 03:24:48PM -0300, Jason Gunthorpe wrote:
> > On Tue, May 14, 2024 at 06:59:07PM -0700, Nicolin Chen wrote:
> > > So, you want a proxy S1 domain for a device to attach, in case
> > > of a stage-2 only setup, because an S2 domain will no longer have
> > > a VMID, since it's shared among viommus. In the SMMU driver case,
> > > an arm_smmu_domain won't have an smmu pointer, so a device can't
> > > attach to an S2 domain but always to a nested S1 domain, right?
> > 
> > That seems like a simple solution to the VMID lifetime, but it means
> > the kernel has to decode more types of vSTE.
> 
> Yea. For vSTE=abort, likely we need a nested block domain too?

Sure, it is easy to do
 
> > I don't know if there is merit one way or the other. A more specific
> > API surface is nice, but the two APIs are completely duplicative.
> > 
> > So maybe:
> > 
> > #define IOMMU_VIOMMU_INVALIDATE IOMMU_HWPT_INVALIDATE
> > 
> > As documentation and have the kernel just detect based on the type of
> > the passed ID?
> 
> Yea, the only difference is viommu_id v.s. hwpt_id that we can
> document.
> 
> Then in this case, we have two mostly identical uAPIs for the
> SMMU driver to use. Should we implement both?

I suspect it will turn out nicely naturally; let's try and see
 
> > > > We can add ATS invalidation afterwards, either as an enhancement or as
> > > > part of adding the VIOMMU, as DEV_INVALIDATE or VIOMMU_INVALIDATE (or
> > > > both)
> > > 
> > > Yea, maybe step by step like this:
> > > 
> > > Part-1 VIOMMU_ALLOC and VIOMMU_ATTACH
> > > Part-2 VIOMMU_SET/UNSET_VDEV_ID
> > > Part-3 VIOMMU_INVALIDATE
> > > Part-4 VQUEUE_ALLOC
> > > ...
> > 
> > So we have this stuff still open:
> >  - Identity STE with PASID (part 2b)
> >  - IOMMU_GET_HW_INFO (part 3)
> >  - IOMMU_HWPT_ALLOC_NEST_PARENT (part 3)
> >  - IOMMU_HWPT_DATA_ARM_SMMUV3 (part 3)
> >  - IOMMU_HWPT_INVALIDATE_DATA_ARM_SMMUV3
> >  - VIOMMU_ALLOC, VIOMMU_ATTACH
> >  - VIOMMU_INVALIDATE
> >  - VIOMMU_SET/UNSET_VDEV_ID
> >  - VQUEUE_ALLOC / vCMDQ
> > 
> > I feel like IOMMU_HWPT_INVALIDATE_DATA_ARM_SMMUV3 is a reasonable fit
> > to part 3. Then part 4 would be VIOMMU_ALLOC -> VIOMMU_SET/UNSET_VDEV_ID
> > which brings ATS support to the API.
> 
> There is some conflict between passing in viommu_id/viommu v.s. parent
> hwpt_id/domain for a nested domain allocation. Do you think that
> should be addressed later in the VIOMMU series v.s. part 3?
>
> More specifically, I have two drafts in my viommu series:
> 87a659e65229 WAR: iommufd: Allow pt_it to carry viommu_id
> 7c5fd8f50bc9 WAR pass in viommu pointer to domain_alloc_user op

It would be good for viommu to come with all the uAPI changes in one
shot, so all the pt_ids should be updated to accept viommu to pass the
S2 HWPT.

Then whatever driver changes are needed to make ATS work should come
together too.

> I know that these two only make sense with VIOMMU_ALLOC. Yet, will
> there be a problem if we establish nested domain allocation with a
> parent domain/hwpt in the uAPI by part 3, and then change it later?
> Will we end up supporting two for backward compatibility?

I think this is a fairly minor compatibility concern, let's see.

Jason
Tian, Kevin May 23, 2024, 5:44 a.m. UTC | #9
> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Monday, May 13, 2024 12:39 PM
> 
> On Sun, May 12, 2024 at 11:46:54AM -0300, Jason Gunthorpe wrote:
> > On Fri, Apr 12, 2024 at 08:47:04PM -0700, Nicolin Chen wrote:
> > > Add a pair of ops to set and unset a device's virtual ID that belongs
> > > to a viommu object. They will be used, in the following patch, by
> > > iommufd to support HW-acceleration features at the host level.
> > >
> > > For instance, every device behind an ARM SMMU has a Stream ID. The ID
> > > is used by ATC invalidation commands so the SMMU HW can direct
> > > invalidation requests to the corresponding PCI device that the ID
> > > belongs to. In a virtualization use case, a passed-through device in
> > > the VM will have a virtual Stream ID, used by the ATC invalidation
> > > commands in the guest system. NVIDIA's CMDQV extension for SMMUv3
> > > provides a v-interface to execute the guest-level ATC invalidation
> > > commands directly, yet it needs the HW to be aware of the virtual
> > > Stream ID so it can replace it with the physical Stream ID.
> >
> > I imagine using this as well for the ATC invalidation commands. It
> > would be very easy and simplifying if the command fixup just extracted
> > the vSID from the ATC invalidation and used an xarray to turn it into
> > a pSID and then pushed the resulting command.
> 
> You mean the nested SMMU series right? Actually the set_dev_id
> ioctl was a part of that until we wanted to try DEV_INVALIDATE.
> 
> So again, yes, it makes sense to me that we move viommu and the
> set_dev_id to the nested series, and then drop DEV_INVALIDATE.
> 

I was just about to ask how the nesting series is going. Per earlier
discussion, iirc the nesting series will go in before VCMDQ?
Nicolin Chen May 23, 2024, 6:09 a.m. UTC | #10
On Thu, May 23, 2024 at 05:44:40AM +0000, Tian, Kevin wrote:
> > From: Nicolin Chen <nicolinc@nvidia.com>
> > Sent: Monday, May 13, 2024 12:39 PM
> >
> > On Sun, May 12, 2024 at 11:46:54AM -0300, Jason Gunthorpe wrote:
> > > On Fri, Apr 12, 2024 at 08:47:04PM -0700, Nicolin Chen wrote:
> > > > Add a pair of ops to set and unset a device's virtual ID that belongs
> > > > to a viommu object. They will be used, in the following patch, by
> > > > iommufd to support HW-acceleration features at the host level.
> > > >
> > > > For instance, every device behind an ARM SMMU has a Stream ID. The ID
> > > > is used by ATC invalidation commands so the SMMU HW can direct
> > > > invalidation requests to the corresponding PCI device that the ID
> > > > belongs to. In a virtualization use case, a passed-through device in
> > > > the VM will have a virtual Stream ID, used by the ATC invalidation
> > > > commands in the guest system. NVIDIA's CMDQV extension for SMMUv3
> > > > provides a v-interface to execute the guest-level ATC invalidation
> > > > commands directly, yet it needs the HW to be aware of the virtual
> > > > Stream ID so it can replace it with the physical Stream ID.
> > >
> > > I imagine using this as well for the ATC invalidation commands. It
> > > would be very easy and simplifying if the command fixup just extracted
> > > the vSID from the ATC invalidation and used an xarray to turn it into
> > > a pSID and then pushed the resulting command.
> >
> > You mean the nested SMMU series right? Actually the set_dev_id
> > ioctl was a part of that until we wanted to try DEV_INVALIDATE.
> >
> > So again, yes, it makes sense to me that we move viommu and the
> > set_dev_id to the nested series, and then drop DEV_INVALIDATE.
> 
> I was just about to ask how the nesting series is going. Per earlier
> discussion, iirc the nesting series will go in before VCMDQ?

Yes. It still should. Yet we ended up adding VIOMMU to the
nested SMMU series too. A shared S2 domain/hwpt isn't exclusive
to the VCMDQ use case; it also serves regular nesting on a
multi-SMMU setup. So, VIOMMU turns out to be the best object we
have at this moment to hold individual VMIDs for different
physical SMMUs sharing a single S2 domain. Its virtual device ID
lookup feature also allows us to forget about the DEV_INVALIDATE
ioctl for now.

Jason listed all the tasks ahead in this thread too, using SMMU
as an example:
> So we have this stuff still open:
>  - Identity STE with PASID (part 2b)
>  - IOMMU_GET_HW_INFO (part 3)
>  - IOMMU_HWPT_ALLOC_NEST_PARENT (part 3)
>  - IOMMU_HWPT_DATA_ARM_SMMUV3 (part 3)
>  - IOMMU_HWPT_INVALIDATE_DATA_ARM_SMMUV3
>  - VIOMMU_ALLOC, VIOMMU_ATTACH
>  - VIOMMU_INVALIDATE

With this series, nesting setup is done. We need Baolu's solution
or VQUEUE for fault reporting after that.

Thanks
Nicolin
Tian, Kevin May 23, 2024, 6:19 a.m. UTC | #11
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Wednesday, May 22, 2024 2:25 AM
> 
> On Tue, May 14, 2024 at 06:59:07PM -0700, Nicolin Chen wrote:
> > So, you want a proxy S1 domain for a device to attach, in case
> > of a stage-2 only setup, because an S2 domain will no longer have
> > a VMID, since it's shared among viommus. In the SMMU driver case,
> > an arm_smmu_domain won't have an smmu pointer, so a device can't
> > attach to an S2 domain but always to a nested S1 domain, right?
> 
> That seems like a simple solution to the VMID lifetime, but it means
> the kernel has to decode more types of vSTE.
> 

why does ATC invalidation need to know about VMID?

Or is the above talking about something beyond the discussion between DEV_INVALIDATE
vs. VIOMMU_INVALIDATE?
Tian, Kevin May 23, 2024, 6:22 a.m. UTC | #12
> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Thursday, May 23, 2024 2:10 PM
> 
> On Thu, May 23, 2024 at 05:44:40AM +0000, Tian, Kevin wrote:
> > > From: Nicolin Chen <nicolinc@nvidia.com>
> > > Sent: Monday, May 13, 2024 12:39 PM
> > >
> > > On Sun, May 12, 2024 at 11:46:54AM -0300, Jason Gunthorpe wrote:
> > > > On Fri, Apr 12, 2024 at 08:47:04PM -0700, Nicolin Chen wrote:
> > > > > Add a pair of ops to set and unset a device's virtual ID that belongs
> > > > > to a viommu object. They will be used, in the following patch, by
> > > > > iommufd to support HW-acceleration features at the host level.
> > > > >
> > > > > For instance, every device behind an ARM SMMU has a Stream ID. The ID
> > > > > is used by ATC invalidation commands so the SMMU HW can direct
> > > > > invalidation requests to the corresponding PCI device that the ID
> > > > > belongs to. In a virtualization use case, a passed-through device in
> > > > > the VM will have a virtual Stream ID, used by the ATC invalidation
> > > > > commands in the guest system. NVIDIA's CMDQV extension for SMMUv3
> > > > > provides a v-interface to execute the guest-level ATC invalidation
> > > > > commands directly, yet it needs the HW to be aware of the virtual
> > > > > Stream ID so it can replace it with the physical Stream ID.
> > > >
> > > > I imagine using this as well for the ATC invalidation commands. It
> > > > would be very easy and simplifying if the command fixup just extracted
> > > > the vSID from the ATC invalidation and used an xarray to turn it into
> > > > a pSID and then pushed the resulting command.
> > >
> > > You mean the nested SMMU series right? Actually the set_dev_id
> > > ioctl was a part of that until we wanted to try DEV_INVALIDATE.
> > >
> > > So again, yes, it makes sense to me that we move viommu and the
> > > set_dev_id to the nested series, and then drop DEV_INVALIDATE.
> >
> > I was just about to ask how the nesting series is going. Per earlier
> > discussion, iirc the nesting series will go in before VCMDQ?
> 
> Yes. It still should. Yet we ended up adding VIOMMU to the
> nested SMMU series too. A shared S2 domain/hwpt isn't exclusive
> to the VCMDQ use case; it also serves regular nesting on a
> multi-SMMU setup. So, VIOMMU turns out to be the best object we
> have at this moment to hold individual VMIDs for different
> physical SMMUs sharing a single S2 domain. Its virtual device ID
> lookup feature also allows us to forget about the DEV_INVALIDATE
> ioctl for now.

Yeah, that makes sense. Btw, it's probably helpful to future reviews
if you can try to include more explanation of the design tradeoffs
based on those discussions. Somehow it's not easy for me to catch up
with those discussions while lacking much of the context here. 😊

> 
> Jason listed all the tasks ahead in this thread too, using SMMU
> as an example:
> > So we have this stuff still open:
> >  - Identity STE with PASID (part 2b)
> >  - IOMMU_GET_HW_INFO (part 3)
> >  - IOMMU_HWPT_ALLOC_NEST_PARENT (part 3)
> >  - IOMMU_HWPT_DATA_ARM_SMMUV3 (part 3)
> >  - IOMMU_HWPT_INVALIDATE_DATA_ARM_SMMUV3
> >  - VIOMMU_ALLOC, VIOMMU_ATTACH
> >  - VIOMMU_INVALIDATE
> 
> With this series, nesting setup is done. We need Baolu's solution
> or VQUEUE for fault reporting after that.
> 

this is a clear plan.
Jason Gunthorpe May 23, 2024, 1:33 p.m. UTC | #13
On Thu, May 23, 2024 at 05:44:40AM +0000, Tian, Kevin wrote:
> > On Sun, May 12, 2024 at 11:46:54AM -0300, Jason Gunthorpe wrote:
> > > On Fri, Apr 12, 2024 at 08:47:04PM -0700, Nicolin Chen wrote:
> > > > Add a pair of ops to set and unset a device's virtual ID that belongs
> > > > to a viommu object. They will be used, in the following patch, by
> > > > iommufd to support HW-acceleration features at the host level.
> > > >
> > > > For instance, every device behind an ARM SMMU has a Stream ID. The ID
> > > > is used by ATC invalidation commands so the SMMU HW can direct
> > > > invalidation requests to the corresponding PCI device that the ID
> > > > belongs to. In a virtualization use case, a passed-through device in
> > > > the VM will have a virtual Stream ID, used by the ATC invalidation
> > > > commands in the guest system. NVIDIA's CMDQV extension for SMMUv3
> > > > provides a v-interface to execute the guest-level ATC invalidation
> > > > commands directly, yet it needs the HW to be aware of the virtual
> > > > Stream ID so it can replace it with the physical Stream ID.
> > >
> > > I imagine using this as well for the ATC invalidation commands. It
> > > would be very easy and simplifying if the command fixup just extracted
> > > the vSID from the ATC invalidation and used an xarray to turn it into
> > > a pSID and then pushed the resulting command.
> > 
> > You mean the nested SMMU series right? Actually the set_dev_id
> > ioctl was a part of that until we wanted to try DEV_INVALIDATE.
> > 
> > So again, yes, it makes sense to me that we move viommu and the
> > set_dev_id to the nested series, and then drop DEV_INVALIDATE.
> > 
> 
> I was just about to ask how the nesting series is going. Per earlier
> discussion, iirc the nesting series will go in before VCMDQ?

It has been sitting on my github; since it is now the N+1 series to
merge, I will post it in a few weeks. There are still some details
yet to be typed in, unfortunately.

Jason
Jason Gunthorpe May 23, 2024, 3:01 p.m. UTC | #14
On Thu, May 23, 2024 at 06:19:59AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Wednesday, May 22, 2024 2:25 AM
> > 
> > On Tue, May 14, 2024 at 06:59:07PM -0700, Nicolin Chen wrote:
> > > So, you want a proxy S1 domain for a device to attach, in case
> > > of a stage-2 only setup, because an S2 domain will no longer have
> > > a VMID, since it's shared among viommus. In the SMMU driver case,
> > > an arm_smmu_domain won't have an smmu pointer, so a device can't
> > > attach to an S2 domain but always to a nested S1 domain, right?
> > 
> > That seems like a simple solution to the VMID lifetime, but it means
> > the kernel has to decode more types of vSTE.
> > 
> 
> why does ATC invalidation need to know about VMID?

ATC invalidation always requires a vRID to pRID translation and the
VIOMMU will hold that translation.

On vCMDQ HW and on AMD HW the vRID to pRID translation is pushed into
HW, and vCMDQ requires the VMID to do that.

Jason
Tian, Kevin May 24, 2024, 2:21 a.m. UTC | #15
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Thursday, May 23, 2024 11:01 PM
> 
> On Thu, May 23, 2024 at 06:19:59AM +0000, Tian, Kevin wrote:
> > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > Sent: Wednesday, May 22, 2024 2:25 AM
> > >
> > > On Tue, May 14, 2024 at 06:59:07PM -0700, Nicolin Chen wrote:
> > > > So, you want a proxy S1 domain for a device to attach, in case
> > > > of a stage-2 only setup, because an S2 domain will no longer have
> > > > a VMID, since it's shared among viommus. In the SMMU driver case,
> > > > an arm_smmu_domain won't have an smmu pointer, so a device can't
> > > > attach to an S2 domain but always to a nested S1 domain, right?
> > >
> > > That seems like a simple solution to the VMID lifetime, but it means
> > > the kernel has to decode more types of vSTE.
> > >
> >
> > why does ATC invalidation need to know about VMID?
> 
> ATC invalidation always requires a vRID to pRID translation and the
> VIOMMU will hold that translation.
> 
> On vCMDQ HW and on AMD HW the vRID to pRID translation is pushed into
> HW, and vCMDQ requires the VMID to do that.
> 

At a quick glance VMID and vRID->pRID translation are both configurations
of a vintf.

My impression was that vintf->vmid is added to guest cmd when it's
about iotlb invalidation.

then vintf->sid_slots is walked when handling a guest cmd for ATC
invalidation.

I'm not sure why the latter one requires a valid VMID to do the walking
except that it's an implementation choice made by vCMDQ?
Nicolin Chen May 24, 2024, 3:26 a.m. UTC | #16
On Fri, May 24, 2024 at 02:21:23AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Thursday, May 23, 2024 11:01 PM
> >
> > On Thu, May 23, 2024 at 06:19:59AM +0000, Tian, Kevin wrote:
> > > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > > Sent: Wednesday, May 22, 2024 2:25 AM
> > > >
> > > > On Tue, May 14, 2024 at 06:59:07PM -0700, Nicolin Chen wrote:
> > > > > So, you want a proxy S1 domain for a device to attach, in case
> > > > > of a stage-2 only setup, because an S2 domain will no longer has
> > > > > a VMID, since it's shared among viommus. In the SMMU driver case,
> > > > > an arm_smmu_domain won't have an smmu pointer, so a device can't
> > > > > attach to an S2 domain but always an nested S1 domain, right?
> > > >
> > > > That seems like a simple solution to the VMID lifetime, but it means
> > > > the kernel has to decode more types of vSTE.
> > > >
> > >
> > > why does ATC invalidation need to know about VMID?
> >
> > ATC invalidation always requires a vRID to pRID translation and the
> > VIOMMU will hold that translation.
> >
> > On vCMDQ HW and on AMD HW the vRID to pRID translation is pushed into
> > HW, and vCMDQ requires the VMID to do that.
> >
> 
> At a quick glance VMID and vRID->pRID translation are both configurations
> of a vintf.
> 
> My impression was that vintf->vmid is added to guest cmd when it's
> about iotlb invalidation.
> 
> then vintf->sid_slots is walked when handling a guest cmd for ATC
> invalidation.
> 
> I'm not sure why the latter one requires a valid VMID to do the walking
> except that it's an implementation choice made by vCMDQ?

Well, we haven't thought about a case of doing ATC invalidation
via VINTF/VCMDQ without setting up a VMID, as "VMID" is a field
in the VINTF_CONFIG register next to the Enable bit and must be
set prior to enabling a VINTF, though your understanding of the
HW work flow is probably accurate :)

And the narrative at the top was trying to describe the links:
  [ device ] => [ proxy identity S1 ] => [ viommu [ shared S2 ] ]
v.s.
  [ device ] => [ non-shareable S2 ]

So the first case can take advantage of VIOMMU_INVALIDATE v.s.
the second case requires a DEV_INVALIDATE.

Another conclusion between the lines: since ARM SMMU will have
the first case (with viommu), there is no longer any use of a
DEV_INVALIDATE ioctl. So, we would likely drop it in the coming
version.

Thanks
Nicolin
Tian, Kevin May 24, 2024, 5:24 a.m. UTC | #17
> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Friday, May 24, 2024 11:26 AM
> 
> On Fri, May 24, 2024 at 02:21:23AM +0000, Tian, Kevin wrote:
> > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > Sent: Thursday, May 23, 2024 11:01 PM
> > >
> > > On Thu, May 23, 2024 at 06:19:59AM +0000, Tian, Kevin wrote:
> > > > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > > > Sent: Wednesday, May 22, 2024 2:25 AM
> > > > >
> > > > > On Tue, May 14, 2024 at 06:59:07PM -0700, Nicolin Chen wrote:
> > > > > > So, you want a proxy S1 domain for a device to attach, in case
> > > > > > of a stage-2 only setup, because an S2 domain will no longer have
> > > > > > a VMID, since it's shared among viommus. In the SMMU driver case,
> > > > > > an arm_smmu_domain won't have an smmu pointer, so a device can't
> > > > > > attach to an S2 domain but always to a nested S1 domain, right?
> > > > >
> > > > > That seems like a simple solution to the VMID lifetime, but it means
> > > > > the kernel has to decode more types of vSTE.
> > > > >
> > > >
> > > > why does ATC invalidation need to know about VMID?
> > >
> > > ATC invalidation always requires a vRID to pRID translation and the
> > > VIOMMU will hold that translation.
> > >
> > > On vCMDQ HW and on AMD HW the vRID to pRID translation is pushed
> into
> > > HW, and vCMDQ requires the VMID to do that.
> > >
> >
> > At a quick glance VMID and vRID->pRID translation are both configurations
> > of a vintf.
> >
> > My impression was that vintf->vmid is added to guest cmd when it's
> > about iotlb invalidation.
> >
> > then vintf->sid_slots is walked when handling a guest cmd for ATC
> > invalidation.
> >
> > I'm not sure why the latter one requires a valid VMID to do the walking
> > except that it's an implementation choice made by vCMDQ?
> 
> Well, we haven't thought about a case of doing ATC invalidation
> via VINTF/VCMDQ without setting up a VMID, as "VMID" is a field
> in the VINTF_CONFIG register next to the Enable bit and must be
> set prior to enabling a VINTF, though your understanding of the
> HW work flow is probably accurate :)

Okay, that explains it. They are unrelated in concept but become
related due to that detail. 😊

> 
> And the narrative at the top was trying to describe the links:
>   [ device ] => [ proxy identity S1 ] => [ viommu [ shared S2 ] ]
> v.s.
>   [ device ] => [ non-shareable S2 ]
> 
> So the first case can take advantage of VIOMMU_INVALIDATE v.s.
> the second case requires a DEV_INVALIDATE.

and one side-effect in the first case is to save one VMID for the
non-shareable S2, hence improving iotlb efficiency.

> 
> Another conclusion between the lines: since ARM SMMU will have
> the first case (with viommu), there is no longer any use of a
> DEV_INVALIDATE ioctl. So, we would likely drop it in the coming
> version.
> 
> Thanks
> Nicolin
Nicolin Chen May 24, 2024, 5:57 a.m. UTC | #18
On Fri, May 24, 2024 at 05:24:11AM +0000, Tian, Kevin wrote:
> > > > > > On Tue, May 14, 2024 at 06:59:07PM -0700, Nicolin Chen wrote:
> > > > > > > So, you want a proxy S1 domain for a device to attach, in case
> > > > > > > of a stage-2 only setup, because an S2 domain will no longer have
> > > > > > > a VMID, since it's shared among viommus. In the SMMU driver case,
> > > > > > > an arm_smmu_domain won't have an smmu pointer, so a device can't
> > > > > > > attach to an S2 domain but always to a nested S1 domain, right?
> > > > > >
> > > > > > That seems like a simple solution to the VMID lifetime, but it means
> > > > > > the kernel has to decode more types of vSTE.

> > And the narrative at the top was trying to describe the links:
> >   [ device ] => [ proxy identity S1 ] => [ viommu [ shared S2 ] ]
> > v.s.
> >   [ device ] => [ non-shareable S2 ]
> >
> > So the first case can take advantage of VIOMMU_INVALIDATE v.s.
> > the second case requires a DEV_INVALIDATE.
> 
> and one side-effect in the first case is to save one VMID for the
> non-shareable S2, hence improving iotlb efficiency.

Hmm, how is that?

VMID is currently stored in an S2 domain, actually. The viommu
is a VMID holder to potentially decouple VMID from S2 domain,
because VMID is per SMMU instance while S2 domain is shareable:
   [ dev0 ] => [ S1 dom0 ] => [ viommu0 (VMID0) [ shared S2 ] ]
   [ dev1 ] => [ S1 dom1 ] => [ viommu1 (VMID1) [ shared S2 ] ]

By the way, we can also have (very likely our initial version):
   [ dev0 ] => [ S1 dom0 ] => [ viommu0 [ non-sharable S2 dom0 (VMID0) ] ]
   [ dev1 ] => [ S1 dom1 ] => [ viommu1 [ non-sharable S2 dom1 (VMID1) ] ]
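
A rough data layout implied by the two diagrams: the VMID moves out of
the (shareable) S2 domain and into the per-instance viommu. Struct and
field names are illustrative, not the SMMU driver as merged:

	struct my_smmu_s2_domain {		/* shareable across SMMU instances */
		struct iommu_domain domain;
		/* no smmu pointer and no vmid in here anymore */
	};

	struct my_smmu_viommu {			/* one per physical SMMU instance */
		struct iommufd_viommu core;
		struct arm_smmu_device *smmu;	/* instance backpointer */
		u16 vmid;			/* per-instance VMID */
		struct my_smmu_s2_domain *s2;	/* shared S2 parent */
	};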

Thanks
Nicolin
Tian, Kevin May 24, 2024, 7:21 a.m. UTC | #19
> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Friday, May 24, 2024 1:58 PM
> 
> On Fri, May 24, 2024 at 05:24:11AM +0000, Tian, Kevin wrote:
> > > > > > > On Tue, May 14, 2024 at 06:59:07PM -0700, Nicolin Chen wrote:
> > > > > > > > So, you want a proxy S1 domain for a device to attach, in case
> > > > > > > > of a stage-2 only setup, because an S2 domain will no longer have
> > > > > > > > a VMID, since it's shared among viommus. In the SMMU driver case,
> > > > > > > > an arm_smmu_domain won't have an smmu pointer, so a device can't
> > > > > > > > attach to an S2 domain but always to a nested S1 domain, right?
> > > > > > >
> > > > > > > That seems like a simple solution to the VMID lifetime, but it
> means
> > > > > > > the kernel has to decode more types of vSTE.
> 
> > > And the narrative at the top was trying to describe the links:
> > >   [ device ] => [ proxy identity S1 ] => [ viommu [ shared S2 ] ]
> > > v.s.
> > >   [ device ] => [ non-shareable S2 ]
> > >
> > > So the first case can take advantage of VIOMMU_INVALIDATE v.s.
> > > the second case requires a DEV_INVALIDATE.
> >
> > and one side-effect in the first case is to save one VMID for the
> > non-shareable S2, hence improving iotlb efficiency.
> 
> Hmm, how is that?
> 
> VMID is currently stored in an S2 domain, actually. The viommu
> is a VMID holder to potentially decouple VMID from S2 domain,
> because VMID is per SMMU instance while S2 domain is shareable:
>    [ dev0 ] => [ S1 dom0 ] => [ viommu0 (VMID0) [ shared S2 ] ]
>    [ dev1 ] => [ S1 dom1 ] => [ viommu1 (VMID1) [ shared S2 ] ]

My point was based on Jason's example about 3 VMIDs:

   hwpt_alloc(deva, nesting_parent=true) = shared_s2
   viommu_alloc(deva, shared_s2) = viommu1
   viommu_alloc(devb, shared_s2) = viommu2
   hwpt_alloc(deva, viommu1, vste) = deva_vste
   hwpt_alloc(devb, viommu2, vste) = devb_vste
   attach(deva, deva_vste)
   attach(devb, devb_vste)
   attach(devc, shared_s2)

for devc it could be:
   hwpt_alloc(devc, viommu1, vproxy_s1) = devc_proxys1
   attach(devc, devc_proxys1)

then devc will reuse the VMID of viommu1 and we save one VMID.

Does that not work, so we'd need to create another viommu to hold the
proxy identity S1 and then still need a 3rd VMID?
Jason Gunthorpe May 24, 2024, 1:05 p.m. UTC | #20
On Fri, May 24, 2024 at 02:21:23AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Thursday, May 23, 2024 11:01 PM
> > 
> > On Thu, May 23, 2024 at 06:19:59AM +0000, Tian, Kevin wrote:
> > > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > > Sent: Wednesday, May 22, 2024 2:25 AM
> > > >
> > > > On Tue, May 14, 2024 at 06:59:07PM -0700, Nicolin Chen wrote:
> > > > > So, you want a proxy S1 domain for a device to attach, in case
> > > > > of a stage-2 only setup, because an S2 domain will no longer have
> > > > > a VMID, since it's shared among viommus. In the SMMU driver case,
> > > > > an arm_smmu_domain won't have an smmu pointer, so a device can't
> > > > > attach to an S2 domain but always to a nested S1 domain, right?
> > > >
> > > > That seems like a simple solution to the VMID lifetime, but it means
> > > > the kernel has to decode more types of vSTE.
> > > >
> > >
> > > why does ATC invalidation need to know about VMID?
> > 
> > ATC invalidation always requires a vRID to pRID translation and the
> > VIOMMU will hold that translation.
> > 
> > On vCMDQ HW and on AMD HW the vRID to pRID translation is pushed into
> > HW, and vCMDQ requires the VMID to do that.
> > 
> 
> At a quick glance VMID and vRID->pRID translation are both configurations
> of a vintf.
> 
> My impression was that vintf->vmid is added to guest cmd when it's
> about iotlb invalidation.
> 
> then vintf->sid_slots is walked when handling a guest cmd for ATC
> invalidation.
> 
> I'm not sure why the latter one requires a valid VMID to do the walking
> except that it's an implementation choice made by vCMDQ?

Yeah, that is probably right. And then in iommufd we are sort of
doubling down by saying the VIOMMU object holds the S2 and the
vRID->pRID as a bundle together.

Jason
Jason Gunthorpe May 24, 2024, 1:12 p.m. UTC | #21
On Fri, May 24, 2024 at 07:21:59AM +0000, Tian, Kevin wrote:

> My point was based on Jason's example about 3 VMIDs:
> 
>    hwpt_alloc(deva, nesting_parent=true) = shared_s2
>    viommu_alloc(deva, shared_s2) = viommu1
>    viommu_alloc(devb, shared_s2) = viommu2
>    hwpt_alloc(deva, viommu1, vste) = deva_vste
>    hwpt_alloc(devb, viommu2, vste) = devb_vste
>    attach(deva, deva_vste)
>    attach(devb, devb_vste)
>    attach(devc, shared_s2)
> 
> for devc it could be:
>    hwpt_alloc(devc, viommu1, vproxy_s1) = devc_proxys1
>    attach(devc, devc_proxys1)
> 
> then devc will reuse the VMID of viommu1 and we save one VMID.

I think that can work, at least I haven't thought of a reason why the
non-vIOMMU device's VMID couldn't be shared with one of the
vIOMMUs. 

It is all quite a contrived case; I don't expect people to want to do
anything like this. It is just showing that the API is properly set up
so that it could be done. Userspace is not restricted to a single VIOMMU.

> Does that not work, so we'd need to create another viommu to hold the
> proxy identity S1 and then still need a 3rd VMID?

Yes, I think that can be OK.

Jason

Patch

diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h
index dec10c6bb261..ca6ac8a1ffd0 100644
--- a/include/linux/iommufd.h
+++ b/include/linux/iommufd.h
@@ -94,9 +94,13 @@  struct iommufd_viommu {
  * struct iommufd_viommu_ops - viommu specific operations
  * @free: Free all driver-specific parts of an iommufd_viommu. The memory
  *        of the entire viommu will be free-ed by iommufd core
+ * @set/unset_dev_id: set/unset a user space virtual id for a device
  */
 struct iommufd_viommu_ops {
 	void (*free)(struct iommufd_viommu *viommu);
+	int (*set_dev_id)(struct iommufd_viommu *viommu,
+			  struct device *dev, u64 dev_id);
+	void (*unset_dev_id)(struct iommufd_viommu *viommu, struct device *dev);
 };
 
 #if IS_ENABLED(CONFIG_IOMMUFD)
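
As an illustration of how a driver might back these two ops, here is a
minimal sketch keyed off an xarray, following the vSID -> pSID lookup
idea from the thread above. All my_* names are hypothetical, and the
linear search in unset_dev_id() is exactly the extra work Jason points
out in comment #2 that passing dev_id back would avoid:

	struct my_viommu {
		struct iommufd_viommu core;
		struct xarray vdev_ids;	/* virtual ID -> struct device * */
	};

	static int my_viommu_set_dev_id(struct iommufd_viommu *viommu,
					struct device *dev, u64 dev_id)
	{
		struct my_viommu *mv = container_of(viommu, struct my_viommu, core);

		/* xarray indices are unsigned long; reject IDs that can't fit */
		if (dev_id > ULONG_MAX)
			return -EINVAL;
		/* Fails with -EBUSY if the guest reuses an ID already set */
		return xa_insert(&mv->vdev_ids, dev_id, dev, GFP_KERNEL);
	}

	static void my_viommu_unset_dev_id(struct iommufd_viommu *viommu,
					   struct device *dev)
	{
		struct my_viommu *mv = container_of(viommu, struct my_viommu, core);
		struct device *entry;
		unsigned long index;

		/* Without a dev_id argument, search for the entry matching @dev */
		xa_for_each(&mv->vdev_ids, index, entry) {
			if (entry == dev) {
				xa_erase(&mv->vdev_ids, index);
				break;
			}
		}
	}

	static void my_viommu_free(struct iommufd_viommu *viommu)
	{
		struct my_viommu *mv = container_of(viommu, struct my_viommu, core);

		xa_destroy(&mv->vdev_ids);
	}

	static const struct iommufd_viommu_ops my_viommu_ops = {
		.free		= my_viommu_free,
		.set_dev_id	= my_viommu_set_dev_id,
		.unset_dev_id	= my_viommu_unset_dev_id,
	};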