Message ID | 6e57d7b5aa1705bdd547b1cd2aca93d3bf70dfa4.1712978212.git.nicolinc@nvidia.com
---|---
State | Handled Elsewhere
Series | Add Tegra241 (Grace) CMDQV Support (part 2/2)
On Fri, Apr 12, 2024 at 08:47:04PM -0700, Nicolin Chen wrote:
> Add a pair of ops to set and unset a device's virtual ID that belongs
> to a viommu object. They will be used, in the following patch, by
> iommufd to support some HW-acceleration features from the host level.
>
> For instance, every device behind an ARM SMMU has a Stream ID. The ID
> is used by ATC invalidation commands so the SMMU HW can direct
> invalidation requests to the corresponding PCI device that the ID
> belongs to. In a virtualization use case, a passed-through device in
> the VM will have a virtual Stream ID, used by the ATC invalidation
> commands in the guest system. NVIDIA's CMDQV extension for SMMUv3
> provides a v-interface to execute the guest-level ATC invalidation
> commands directly, yet it needs the HW to be aware of the virtual
> Stream ID so it can replace it with the physical Stream ID.

I imagine using this as well for the ATC invalidation commands. It
would be very easy and simplifying if the command fixup just extracted
the vSID from the ATC invalidation and used an xarray to turn it into
a pSID and then pushed the resulting command.

Seems fine as is

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason
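A minimal sketch of the fixup Jason describes, assuming a per-viommu
xarray (vsid_to_psid) that the set_dev_id op fills with xa_mk_value()
entries; CMDQ_ATC_0_SID follows the SMMUv3 driver's command layout,
while the function and table names are hypothetical:

#include <linux/bitfield.h>
#include <linux/xarray.h>

static int my_viommu_fixup_atc_inv(struct xarray *vsid_to_psid, u64 *cmd)
{
	unsigned long vsid = FIELD_GET(CMDQ_ATC_0_SID, cmd[0]);
	void *entry = xa_load(vsid_to_psid, vsid);

	if (!entry)
		return -EINVAL;	/* guest used a vSID that was never set */

	/* Swap the guest's virtual SID for the physical one, keep the rest */
	cmd[0] &= ~CMDQ_ATC_0_SID;
	cmd[0] |= FIELD_PREP(CMDQ_ATC_0_SID, (u64)xa_to_value(entry));
	return 0;	/* caller pushes the rewritten command to the real CMDQ */
}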
On Fri, Apr 12, 2024 at 08:47:04PM -0700, Nicolin Chen wrote:
> @@ -94,9 +94,13 @@ struct iommufd_viommu {
>  * struct iommufd_viommu_ops - viommu specific operations
>  * @free: Free all driver-specific parts of an iommufd_viommu. The memory
>  *        of the entire viommu will be free-ed by iommufd core
> + * @set/unset_dev_id: set/unset a user space virtual id for a device
>  */
>  struct iommufd_viommu_ops {
>  	void (*free)(struct iommufd_viommu *viommu);
> +	int (*set_dev_id)(struct iommufd_viommu *viommu,
> +			  struct device *dev, u64 dev_id);
> +	void (*unset_dev_id)(struct iommufd_viommu *viommu, struct device *dev);

Actually, looking at this more closely, why doesn't the core pass in
dev_id to unset_dev_id? Not doing so seems like it will create a bunch
of unnecessary work for the driver.

Jason
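To illustrate Jason's point, a hypothetical driver implementation of
the two ops: with dev_id passed back in, teardown would be a single
xa_erase(); without it, the driver must scan its table.
my_get_physical_sid() and the wrapper struct are stand-ins, not real
driver code:

struct my_viommu {
	struct iommufd_viommu core;
	struct xarray vsid_to_psid;	/* vSID -> pSID, filled by set_dev_id */
};

static int my_set_dev_id(struct iommufd_viommu *viommu,
			 struct device *dev, u64 dev_id)
{
	struct my_viommu *vi = container_of(viommu, struct my_viommu, core);
	u32 psid = my_get_physical_sid(dev);	/* hypothetical helper */

	/* dev_id is the guest's vSID; remember which pSID it stands for */
	return xa_insert(&vi->vsid_to_psid, dev_id, xa_mk_value(psid),
			 GFP_KERNEL);
}

static void my_unset_dev_id(struct iommufd_viommu *viommu, struct device *dev)
{
	struct my_viommu *vi = container_of(viommu, struct my_viommu, core);
	unsigned long vsid;
	void *entry;

	/* The O(n) scan that passing dev_id back in would eliminate */
	xa_for_each(&vi->vsid_to_psid, vsid, entry)
		if (xa_to_value(entry) == my_get_physical_sid(dev))
			xa_erase(&vi->vsid_to_psid, vsid);
}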
On Sun, May 12, 2024 at 11:46:54AM -0300, Jason Gunthorpe wrote:
> On Fri, Apr 12, 2024 at 08:47:04PM -0700, Nicolin Chen wrote:
> > Add a pair of ops to set and unset a device's virtual ID that belongs
> > to a viommu object. They will be used, in the following patch, by
> > iommufd to support some HW-acceleration features from the host level.
> >
> > For instance, every device behind an ARM SMMU has a Stream ID. The ID
> > is used by ATC invalidation commands so the SMMU HW can direct
> > invalidation requests to the corresponding PCI device that the ID
> > belongs to. In a virtualization use case, a passed-through device in
> > the VM will have a virtual Stream ID, used by the ATC invalidation
> > commands in the guest system. NVIDIA's CMDQV extension for SMMUv3
> > provides a v-interface to execute the guest-level ATC invalidation
> > commands directly, yet it needs the HW to be aware of the virtual
> > Stream ID so it can replace it with the physical Stream ID.
>
> I imagine using this as well for the ATC invalidation commands. It
> would be very easy and simplifying if the command fixup just extracted
> the vSID from the ATC invalidation and used an xarray to turn it into
> a pSID and then pushed the resulting command.

You mean the nested SMMU series, right? Actually the set_dev_id
ioctl was a part of that until we wanted to try DEV_INVALIDATE.

So again, yes, it makes sense to me that we move viommu and the
set_dev_id to the nested series, and then drop DEV_INVALIDATE.

Thanks
Nicolin
On Sun, May 12, 2024 at 09:39:15PM -0700, Nicolin Chen wrote:
> On Sun, May 12, 2024 at 11:46:54AM -0300, Jason Gunthorpe wrote:
> > On Fri, Apr 12, 2024 at 08:47:04PM -0700, Nicolin Chen wrote:
> > > Add a pair of ops to set and unset a device's virtual ID that belongs
> > > to a viommu object. They will be used, in the following patch, by
> > > iommufd to support some HW-acceleration features from the host level.
> > >
> > > For instance, every device behind an ARM SMMU has a Stream ID. The ID
> > > is used by ATC invalidation commands so the SMMU HW can direct
> > > invalidation requests to the corresponding PCI device that the ID
> > > belongs to. In a virtualization use case, a passed-through device in
> > > the VM will have a virtual Stream ID, used by the ATC invalidation
> > > commands in the guest system. NVIDIA's CMDQV extension for SMMUv3
> > > provides a v-interface to execute the guest-level ATC invalidation
> > > commands directly, yet it needs the HW to be aware of the virtual
> > > Stream ID so it can replace it with the physical Stream ID.
> >
> > I imagine using this as well for the ATC invalidation commands. It
> > would be very easy and simplifying if the command fixup just extracted
> > the vSID from the ATC invalidation and used an xarray to turn it into
> > a pSID and then pushed the resulting command.
>
> You mean the nested SMMU series, right? Actually the set_dev_id
> ioctl was a part of that until we wanted to try DEV_INVALIDATE.

Yes, there is nothing inherently wrong with DEV_INVALIDATE, we could
continue to use that as the API and automatically pick up the VIOMMU
instance from the nesting domain to process the ATS.

The VMM needs a reliable place to send the CMDQ data, on ARM/AMD this
needs to be an always available global-to-the-viommu thing. Intel
needs to associate the virtual invalidation with the correct nesting
domain as well.

So I originally thought it would be nice and simple to have a
VIOMMU_INVALIDATE as well.

But.. If we need a nesting domain that is identity (ie the S2) then
when the VIOMMU is created we'd also create an identity nesting
domain as well. Functionally we could use that global nesting domain
to deliver the DEV_INVALIDATE too. It is a bit quirky, but it would
be OK.

> So again, yes, it makes sense to me that we move viommu and the
> set_dev_id to the nested series, and then drop DEV_INVALIDATE.

I would like to do this bit by bit. viommu is a big series on its
own. DEV_INVALIDATE is fine, it just can't do ATS invalidation.

We can add ATS invalidation after either as an enhancement as part of
adding the VIOMMU either as DEV_INVALIDATE or VIOMMU_INVALIDATE (or
both)

Jason
On Tue, May 14, 2024 at 12:53:23PM -0300, Jason Gunthorpe wrote:
> On Sun, May 12, 2024 at 09:39:15PM -0700, Nicolin Chen wrote:
> > On Sun, May 12, 2024 at 11:46:54AM -0300, Jason Gunthorpe wrote:
> > > On Fri, Apr 12, 2024 at 08:47:04PM -0700, Nicolin Chen wrote:
> > > > Add a pair of ops to set and unset a device's virtual ID that belongs
> > > > to a viommu object. They will be used, in the following patch, by
> > > > iommufd to support some HW-acceleration features from the host level.
> > > >
> > > > For instance, every device behind an ARM SMMU has a Stream ID. The ID
> > > > is used by ATC invalidation commands so the SMMU HW can direct
> > > > invalidation requests to the corresponding PCI device that the ID
> > > > belongs to. In a virtualization use case, a passed-through device in
> > > > the VM will have a virtual Stream ID, used by the ATC invalidation
> > > > commands in the guest system. NVIDIA's CMDQV extension for SMMUv3
> > > > provides a v-interface to execute the guest-level ATC invalidation
> > > > commands directly, yet it needs the HW to be aware of the virtual
> > > > Stream ID so it can replace it with the physical Stream ID.
> > >
> > > I imagine using this as well for the ATC invalidation commands. It
> > > would be very easy and simplifying if the command fixup just extracted
> > > the vSID from the ATC invalidation and used an xarray to turn it into
> > > a pSID and then pushed the resulting command.
> >
> > You mean the nested SMMU series, right? Actually the set_dev_id
> > ioctl was a part of that until we wanted to try DEV_INVALIDATE.
>
> Yes, there is nothing inherently wrong with DEV_INVALIDATE, we could
> continue to use that as the API and automatically pick up the VIOMMU
> instance from the nesting domain to process the ATS.
>
> The VMM needs a reliable place to send the CMDQ data, on ARM/AMD this
> needs to be an always available global-to-the-viommu thing. Intel
> needs to associate the virtual invalidation with the correct nesting
> domain as well.
>
> So I originally thought it would be nice and simple to have a
> VIOMMU_INVALIDATE as well.
>
> But.. If we need a nesting domain that is identity (ie the S2) then
> when the VIOMMU is created we'd also create an identity nesting
> domain as well.

So, you want a proxy S1 domain for a device to attach, in case
of a stage-2 only setup, because an S2 domain will no longer have
a VMID, since it's shared among viommus. In the SMMU driver case,
an arm_smmu_domain won't have an smmu pointer, so a device can't
attach to an S2 domain but always to a nested S1 domain, right?

> Functionally we could use that global nesting domain
> to deliver the DEV_INVALIDATE too.

If my narrative above is correct, the device is actually still
attached to the S2 domain via a proxy nested S1 domain. What cache
do we need to invalidate except S2 mappings in this case?

> > So again, yes, it makes sense to me that we move viommu and the
> > set_dev_id to the nested series, and then drop DEV_INVALIDATE.
>
> I would like to do this bit by bit. viommu is a big series on its own.
>
> DEV_INVALIDATE is fine, it just can't do ATS invalidation.

I am not very sure about AMD. Intel doesn't need DEV_INVALIDATE at
this moment. SMMU only uses DEV_INVALIDATE for ATC and CD
invalidations, which can both be shifted to VIOMMU_INVALIDATE.

Same question: any other case where we can use DEV_INVALIDATE?

> We can add ATS invalidation after either as an enhancement as part of
> adding the VIOMMU either as DEV_INVALIDATE or VIOMMU_INVALIDATE (or
> both)

Yea, maybe step by step like this:

Part-1 VIOMMU_ALLOC and VIOMMU_ATTACH
Part-2 VIOMMU_SET/UNSET_VDEV_ID
Part-3 VIOMMU_INVALIDATE
Part-4 VQUEUE_ALLOC
...

Thanks
Nicolin
On Tue, May 14, 2024 at 06:59:07PM -0700, Nicolin Chen wrote:
> So, you want a proxy S1 domain for a device to attach, in case
> of a stage-2 only setup, because an S2 domain will no longer have
> a VMID, since it's shared among viommus. In the SMMU driver case,
> an arm_smmu_domain won't have an smmu pointer, so a device can't
> attach to an S2 domain but always to a nested S1 domain, right?

That seems like a simple solution to the VMID lifetime, but it means
the kernel has to decode more types of vSTE.

> > Functionally we could use that global nesting domain
> > to deliver the DEV_INVALIDATE too.
>
> If my narrative above is correct, the device is actually still
> attached to the S2 domain via a proxy nested S1 domain. What cache
> do we need to invalidate except S2 mappings in this case?

qemu needs a reliable place to send the invalidation commands to (ie
what ID to provide).

If we add IOMMU_VIOMMU_INVALIDATE then the ID is the viommu id.

If we enhance IOMMU_HWPT_INVALIDATE then the ID is the identity
nesting domain.

Either case leads to the viommu object in the kernel.

I don't know if there is merit one way or the other. A more specific
API surface is nice, but the two APIs are complete duplicates.

So maybe:

#define IOMMU_VIOMMU_INVALIDATE IOMMU_HWPT_INVALIDATE

As documentation and have the kernel just detect based on the type of
the passed ID?

> > > So again, yes, it makes sense to me that we move viommu and the
> > > set_dev_id to the nested series, and then drop DEV_INVALIDATE.
> >
> > I would like to do this bit by bit. viommu is a big series on its own.
> >
> > DEV_INVALIDATE is fine, it just can't do ATS invalidation.
>
> I am not very sure about AMD.

AMD will need the same vRID -> pRID stuff and we want that to run on
the VIOMMU

> Same question: any other case where we can use DEV_INVALIDATE?

DEV_INVALIDATE was interesting before the viommu idea because
userspace could process each invalidation command and when it reaches
ATS it would invoke the correct DEV_INVALIDATE.

With viommu we expect ATS supporting drivers to support viommu and
then to do vRID -> pRID in the other invalidation paths. In this case
I don't see a reason to do DEV_INVALIDATE right now.

> > We can add ATS invalidation after either as an enhancement as part of
> > adding the VIOMMU either as DEV_INVALIDATE or VIOMMU_INVALIDATE (or
> > both)
>
> Yea, maybe step by step like this:
>
> Part-1 VIOMMU_ALLOC and VIOMMU_ATTACH
> Part-2 VIOMMU_SET/UNSET_VDEV_ID
> Part-3 VIOMMU_INVALIDATE
> Part-4 VQUEUE_ALLOC
> ...

So we have this stuff still open:
- Identity STE with PASID (part 2b)
- IOMMU_GET_HW_INFO (part 3)
- IOMMU_HWPT_ALLOC_NEST_PARENT (part 3)
- IOMMU_HWPT_DATA_ARM_SMMUV3 (part 3)
- IOMMU_HWPT_INVALIDATE_DATA_ARM_SMMUV3
- VIOMMU_ALLOC, VIOMMU_ATTACH
- VIOMMU_INVALIDATE
- VIOMMU_SET/UNSET_VDEV_ID
- VQUEUE_ALLOC / vCMDQ

I feel like IOMMU_HWPT_INVALIDATE_DATA_ARM_SMMUV3 is a reasonable fit
for part 3. Then part 4 would be VIOMMU_ALLOC -> VIOMMU_SET/UNSET_VDEV_ID,
which brings ATS support to the API.

vCMDQ hypervisor support would sit on top of that with just VQUEUE?

Jason
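A rough sketch of the "one ioctl, detect by ID type" idea Jason floats
above: resolve the user-provided ID to an object first, then route
either spelling to the same viommu invalidation path. This assumes the
viommu embeds its iommufd_object as ->obj; to_hwpt_nested(),
my_viommu_invalidate(), and the object-type constants for the proposed
objects are stand-ins, not the real iommufd internals:

static int invalidate_by_id(struct iommufd_ucmd *ucmd, u32 id)
{
	struct iommufd_object *obj;
	struct iommufd_viommu *viommu;
	int rc;

	obj = iommufd_get_object(ucmd->ictx, id, IOMMUFD_OBJ_ANY);
	if (IS_ERR(obj))
		return PTR_ERR(obj);

	switch (obj->type) {
	case IOMMUFD_OBJ_VIOMMU:
		/* IOMMU_VIOMMU_INVALIDATE spelling: the ID is the viommu */
		viommu = container_of(obj, struct iommufd_viommu, obj);
		break;
	case IOMMUFD_OBJ_HWPT_NESTED:
		/* IOMMU_HWPT_INVALIDATE spelling: the (identity) nesting
		 * domain leads back to its viommu */
		viommu = to_hwpt_nested(obj)->viommu;	/* hypothetical */
		break;
	default:
		rc = -EINVAL;
		goto out_put;
	}
	rc = my_viommu_invalidate(viommu, ucmd);	/* hypothetical */
out_put:
	iommufd_put_object(ucmd->ictx, obj);
	return rc;
}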
On Tue, May 21, 2024 at 03:24:48PM -0300, Jason Gunthorpe wrote:
> On Tue, May 14, 2024 at 06:59:07PM -0700, Nicolin Chen wrote:
> > So, you want a proxy S1 domain for a device to attach, in case
> > of a stage-2 only setup, because an S2 domain will no longer have
> > a VMID, since it's shared among viommus. In the SMMU driver case,
> > an arm_smmu_domain won't have an smmu pointer, so a device can't
> > attach to an S2 domain but always to a nested S1 domain, right?
>
> That seems like a simple solution to the VMID lifetime, but it means
> the kernel has to decode more types of vSTE.

Yea. For vSTE=abort, likely we need a nested block domain too?

> > > Functionally we could use that global nesting domain
> > > to deliver the DEV_INVALIDATE too.
> >
> > If my narrative above is correct, the device is actually still
> > attached to the S2 domain via a proxy nested S1 domain. What cache
> > do we need to invalidate except S2 mappings in this case?
>
> qemu needs a reliable place to send the invalidation commands to (ie
> what ID to provide).
>
> If we add IOMMU_VIOMMU_INVALIDATE then the ID is the viommu id.
>
> If we enhance IOMMU_HWPT_INVALIDATE then the ID is the identity
> nesting domain.
>
> Either case leads to the viommu object in the kernel.

Ooohh! I didn't connect the dots this far. Yes. This turns the
IOMMU_HWPT_INVALIDATE back to the very first version supporting
device cache flush.

Though using IOMMU_HWPT_INVALIDATE feels a bit rule-breaking now,
since it assumes the nested HWPT keeps a vdev_id lookup table
somewhere in its associates...

> I don't know if there is merit one way or the other. A more specific
> API surface is nice, but the two APIs are complete duplicates.
>
> So maybe:
>
> #define IOMMU_VIOMMU_INVALIDATE IOMMU_HWPT_INVALIDATE
>
> As documentation and have the kernel just detect based on the type of
> the passed ID?

Yea, the only difference is viommu_id v.s. hwpt_id that we can
document.

Then in this case, we have two mostly identical uAPIs for the
SMMU driver to use. Should we implement both?

> > > > So again, yes, it makes sense to me that we move viommu and the
> > > > set_dev_id to the nested series, and then drop DEV_INVALIDATE.
> > >
> > > I would like to do this bit by bit. viommu is a big series on its own.
> > >
> > > DEV_INVALIDATE is fine, it just can't do ATS invalidation.
> >
> > I am not very sure about AMD.
>
> AMD will need the same vRID -> pRID stuff and we want that to run on
> the VIOMMU
>
> > Same question: any other case where we can use DEV_INVALIDATE?
>
> DEV_INVALIDATE was interesting before the viommu idea because
> userspace could process each invalidation command and when it reaches
> ATS it would invoke the correct DEV_INVALIDATE.

Agreed. That helped a lot in VMM dispatching the invalidation
requests.

> With viommu we expect ATS supporting drivers to support viommu and
> then to do vRID -> pRID in the other invalidation paths. In this case
> I don't see a reason to do DEV_INVALIDATE right now.

Yea. I guessed so.

> > > We can add ATS invalidation after either as an enhancement as part of
> > > adding the VIOMMU either as DEV_INVALIDATE or VIOMMU_INVALIDATE (or
> > > both)
> >
> > Yea, maybe step by step like this:
> >
> > Part-1 VIOMMU_ALLOC and VIOMMU_ATTACH
> > Part-2 VIOMMU_SET/UNSET_VDEV_ID
> > Part-3 VIOMMU_INVALIDATE
> > Part-4 VQUEUE_ALLOC
> > ...
>
> So we have this stuff still open:
> - Identity STE with PASID (part 2b)
> - IOMMU_GET_HW_INFO (part 3)
> - IOMMU_HWPT_ALLOC_NEST_PARENT (part 3)
> - IOMMU_HWPT_DATA_ARM_SMMUV3 (part 3)
> - IOMMU_HWPT_INVALIDATE_DATA_ARM_SMMUV3
> - VIOMMU_ALLOC, VIOMMU_ATTACH
> - VIOMMU_INVALIDATE
> - VIOMMU_SET/UNSET_VDEV_ID
> - VQUEUE_ALLOC / vCMDQ
>
> I feel like IOMMU_HWPT_INVALIDATE_DATA_ARM_SMMUV3 is a reasonable fit
> for part 3. Then part 4 would be VIOMMU_ALLOC -> VIOMMU_SET/UNSET_VDEV_ID,
> which brings ATS support to the API.

There is some conflict at passing in viommu_id/viommu v.s. parent
hwpt_id/domain for a nested domain allocation. Do you think that
should be addressed later in the VIOMMU series v.s. part 3?

More specifically, I have two drafts in my viommu series:
87a659e65229 WAR: iommufd: Allow pt_id to carry viommu_id
7c5fd8f50bc9 WAR pass in viommu pointer to domain_alloc_user op

I know that these two only make sense with VIOMMU_ALLOC. Yet, will
there be a problem, if we establish nested domain allocation with
parent domain/hwpt by part 3, in the uAPI, and then change later?
Will we end up with supporting two for backward compatibility?

> vCMDQ hypervisor support would sit on top of that with just VQUEUE?

Yea.

Thanks
Nicolin
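The vSTE decode being discussed (bypass needs the proxy identity S1,
and per Nicolin's question, abort would need a nested block domain),
sketched with hypothetical names; the real decode would live in the
SMMUv3 nesting code:

static struct iommu_domain *my_decode_vste(struct my_viommu *viommu,
					   const struct arm_smmu_ste *vste)
{
	switch (my_vste_config(vste)) {	/* hypothetical field decode */
	case STE_CFG_ABORT:
		/* vSTE=abort: nested "block" domain over the shared S2 */
		return viommu->blocked_nested;
	case STE_CFG_BYPASS:
		/* vSTE=bypass: proxy identity S1 over the shared S2 */
		return viommu->identity_nested;
	case STE_CFG_S1_TRANS:
		/* Normal case: guest-owned S1 tables nested over the S2 */
		return my_alloc_user_nested(viommu, vste);
	default:
		return ERR_PTR(-EINVAL);
	}
}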
On Tue, May 21, 2024 at 03:27:02PM -0700, Nicolin Chen wrote:
> On Tue, May 21, 2024 at 03:24:48PM -0300, Jason Gunthorpe wrote:
> > On Tue, May 14, 2024 at 06:59:07PM -0700, Nicolin Chen wrote:
> > > So, you want a proxy S1 domain for a device to attach, in case
> > > of a stage-2 only setup, because an S2 domain will no longer have
> > > a VMID, since it's shared among viommus. In the SMMU driver case,
> > > an arm_smmu_domain won't have an smmu pointer, so a device can't
> > > attach to an S2 domain but always to a nested S1 domain, right?
> >
> > That seems like a simple solution to the VMID lifetime, but it means
> > the kernel has to decode more types of vSTE.
>
> Yea. For vSTE=abort, likely we need a nested block domain too?

Sure, it is easy to do

> > I don't know if there is merit one way or the other. A more specific
> > API surface is nice, but the two APIs are complete duplicates.
> >
> > So maybe:
> >
> > #define IOMMU_VIOMMU_INVALIDATE IOMMU_HWPT_INVALIDATE
> >
> > As documentation and have the kernel just detect based on the type of
> > the passed ID?
>
> Yea, the only difference is viommu_id v.s. hwpt_id that we can
> document.
>
> Then in this case, we have two mostly identical uAPIs for the
> SMMU driver to use. Should we implement both?

I suspect it will turn out nicely naturally, lets try and see

> > > > We can add ATS invalidation after either as an enhancement as part of
> > > > adding the VIOMMU either as DEV_INVALIDATE or VIOMMU_INVALIDATE (or
> > > > both)
> > >
> > > Yea, maybe step by step like this:
> > >
> > > Part-1 VIOMMU_ALLOC and VIOMMU_ATTACH
> > > Part-2 VIOMMU_SET/UNSET_VDEV_ID
> > > Part-3 VIOMMU_INVALIDATE
> > > Part-4 VQUEUE_ALLOC
> > > ...
> >
> > So we have this stuff still open:
> > - Identity STE with PASID (part 2b)
> > - IOMMU_GET_HW_INFO (part 3)
> > - IOMMU_HWPT_ALLOC_NEST_PARENT (part 3)
> > - IOMMU_HWPT_DATA_ARM_SMMUV3 (part 3)
> > - IOMMU_HWPT_INVALIDATE_DATA_ARM_SMMUV3
> > - VIOMMU_ALLOC, VIOMMU_ATTACH
> > - VIOMMU_INVALIDATE
> > - VIOMMU_SET/UNSET_VDEV_ID
> > - VQUEUE_ALLOC / vCMDQ
> >
> > I feel like IOMMU_HWPT_INVALIDATE_DATA_ARM_SMMUV3 is a reasonable fit
> > for part 3. Then part 4 would be VIOMMU_ALLOC -> VIOMMU_SET/UNSET_VDEV_ID,
> > which brings ATS support to the API.
>
> There is some conflict at passing in viommu_id/viommu v.s. parent
> hwpt_id/domain for a nested domain allocation. Do you think that
> should be addressed later in the VIOMMU series v.s. part 3?
>
> More specifically, I have two drafts in my viommu series:
> 87a659e65229 WAR: iommufd: Allow pt_id to carry viommu_id
> 7c5fd8f50bc9 WAR pass in viommu pointer to domain_alloc_user op

It would be good for viommu to come with all the uAPI changes in one
shot, so all the pt_ids should be updated to accept viommu to pass
the S2 HWPT. Then whatever driver changes are needed to make ATS work
should come together too.

> I know that these two only make sense with VIOMMU_ALLOC. Yet, will
> there be a problem, if we establish nested domain allocation with
> parent domain/hwpt by part 3, in the uAPI, and then change later?
> Will we end up with supporting two for backward compatibility?

I think this is a fairly minor compatibility issue, let's see.

Jason
> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Monday, May 13, 2024 12:39 PM
>
> On Sun, May 12, 2024 at 11:46:54AM -0300, Jason Gunthorpe wrote:
> > On Fri, Apr 12, 2024 at 08:47:04PM -0700, Nicolin Chen wrote:
> > > Add a pair of ops to set and unset a device's virtual ID that belongs
> > > to a viommu object. They will be used, in the following patch, by
> > > iommufd to support some HW-acceleration features from the host level.
> > >
> > > For instance, every device behind an ARM SMMU has a Stream ID. The ID
> > > is used by ATC invalidation commands so the SMMU HW can direct
> > > invalidation requests to the corresponding PCI device that the ID
> > > belongs to. In a virtualization use case, a passed-through device in
> > > the VM will have a virtual Stream ID, used by the ATC invalidation
> > > commands in the guest system. NVIDIA's CMDQV extension for SMMUv3
> > > provides a v-interface to execute the guest-level ATC invalidation
> > > commands directly, yet it needs the HW to be aware of the virtual
> > > Stream ID so it can replace it with the physical Stream ID.
> >
> > I imagine using this as well for the ATC invalidation commands. It
> > would be very easy and simplifying if the command fixup just extracted
> > the vSID from the ATC invalidation and used an xarray to turn it into
> > a pSID and then pushed the resulting command.
>
> You mean the nested SMMU series, right? Actually the set_dev_id
> ioctl was a part of that until we wanted to try DEV_INVALIDATE.
>
> So again, yes, it makes sense to me that we move viommu and the
> set_dev_id to the nested series, and then drop DEV_INVALIDATE.

I was just about to ask how the nesting series is going. Per earlier
discussion iirc the nesting series will go in before VCMDQ?
On Thu, May 23, 2024 at 05:44:40AM +0000, Tian, Kevin wrote:
> > From: Nicolin Chen <nicolinc@nvidia.com>
> > Sent: Monday, May 13, 2024 12:39 PM
> >
> > On Sun, May 12, 2024 at 11:46:54AM -0300, Jason Gunthorpe wrote:
> > > On Fri, Apr 12, 2024 at 08:47:04PM -0700, Nicolin Chen wrote:
> > > > Add a pair of ops to set and unset a device's virtual ID that belongs
> > > > to a viommu object. They will be used, in the following patch, by
> > > > iommufd to support some HW-acceleration features from the host level.
> > > >
> > > > For instance, every device behind an ARM SMMU has a Stream ID. The ID
> > > > is used by ATC invalidation commands so the SMMU HW can direct
> > > > invalidation requests to the corresponding PCI device that the ID
> > > > belongs to. In a virtualization use case, a passed-through device in
> > > > the VM will have a virtual Stream ID, used by the ATC invalidation
> > > > commands in the guest system. NVIDIA's CMDQV extension for SMMUv3
> > > > provides a v-interface to execute the guest-level ATC invalidation
> > > > commands directly, yet it needs the HW to be aware of the virtual
> > > > Stream ID so it can replace it with the physical Stream ID.
> > >
> > > I imagine using this as well for the ATC invalidation commands. It
> > > would be very easy and simplifying if the command fixup just extracted
> > > the vSID from the ATC invalidation and used an xarray to turn it into
> > > a pSID and then pushed the resulting command.
> >
> > You mean the nested SMMU series, right? Actually the set_dev_id
> > ioctl was a part of that until we wanted to try DEV_INVALIDATE.
> >
> > So again, yes, it makes sense to me that we move viommu and the
> > set_dev_id to the nested series, and then drop DEV_INVALIDATE.
>
> I was just about to ask how the nesting series is going. Per earlier
> discussion iirc the nesting series will go in before VCMDQ?

Yes. It still should. Yet we ended up with adding VIOMMU to the
nested SMMU series too. A shared S2 domain/hwpt isn't exclusive
to the VCMDQ use case but also serves regular nesting on a
multi-SMMU setup. So, VIOMMU turns out to be the best object that
we have at this moment to hold individual VMIDs for different
physical SMMUs sharing a single S2 domain. Its virtual device ID
lookup feature can also allow us to forget about the DEV_INVALIDATE
ioctl for now.

Jason listed all the tasks ahead in this thread too, using SMMU
as an example:
> So we have this stuff still open:
> - Identity STE with PASID (part 2b)
> - IOMMU_GET_HW_INFO (part 3)
> - IOMMU_HWPT_ALLOC_NEST_PARENT (part 3)
> - IOMMU_HWPT_DATA_ARM_SMMUV3 (part 3)
> - IOMMU_HWPT_INVALIDATE_DATA_ARM_SMMUV3
> - VIOMMU_ALLOC, VIOMMU_ATTACH
> - VIOMMU_INVALIDATE

By this series nesting setup is done. We need Baolu's solution
or VQUEUE for fault reporting after that.

Thanks
Nicolin
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Wednesday, May 22, 2024 2:25 AM
>
> On Tue, May 14, 2024 at 06:59:07PM -0700, Nicolin Chen wrote:
> > So, you want a proxy S1 domain for a device to attach, in case
> > of a stage-2 only setup, because an S2 domain will no longer have
> > a VMID, since it's shared among viommus. In the SMMU driver case,
> > an arm_smmu_domain won't have an smmu pointer, so a device can't
> > attach to an S2 domain but always to a nested S1 domain, right?
>
> That seems like a simple solution to the VMID lifetime, but it means
> the kernel has to decode more types of vSTE.

Why does ATC invalidation need to know about VMID? Or is the above
talking about something beyond the discussion of DEV_INVALIDATE vs.
VIOMMU_INVALIDATE?
> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Thursday, May 23, 2024 2:10 PM
>
> On Thu, May 23, 2024 at 05:44:40AM +0000, Tian, Kevin wrote:
> > > From: Nicolin Chen <nicolinc@nvidia.com>
> > > Sent: Monday, May 13, 2024 12:39 PM
> > >
> > > On Sun, May 12, 2024 at 11:46:54AM -0300, Jason Gunthorpe wrote:
> > > > On Fri, Apr 12, 2024 at 08:47:04PM -0700, Nicolin Chen wrote:
> > > > > Add a pair of ops to set and unset a device's virtual ID that belongs
> > > > > to a viommu object. They will be used, in the following patch, by
> > > > > iommufd to support some HW-acceleration features from the host level.
> > > > >
> > > > > For instance, every device behind an ARM SMMU has a Stream ID. The ID
> > > > > is used by ATC invalidation commands so the SMMU HW can direct
> > > > > invalidation requests to the corresponding PCI device that the ID
> > > > > belongs to. In a virtualization use case, a passed-through device in
> > > > > the VM will have a virtual Stream ID, used by the ATC invalidation
> > > > > commands in the guest system. NVIDIA's CMDQV extension for SMMUv3
> > > > > provides a v-interface to execute the guest-level ATC invalidation
> > > > > commands directly, yet it needs the HW to be aware of the virtual
> > > > > Stream ID so it can replace it with the physical Stream ID.
> > > >
> > > > I imagine using this as well for the ATC invalidation commands. It
> > > > would be very easy and simplifying if the command fixup just extracted
> > > > the vSID from the ATC invalidation and used an xarray to turn it into
> > > > a pSID and then pushed the resulting command.
> > >
> > > You mean the nested SMMU series, right? Actually the set_dev_id
> > > ioctl was a part of that until we wanted to try DEV_INVALIDATE.
> > >
> > > So again, yes, it makes sense to me that we move viommu and the
> > > set_dev_id to the nested series, and then drop DEV_INVALIDATE.
> >
> > I was just about to ask how the nesting series is going. Per earlier
> > discussion iirc the nesting series will go in before VCMDQ?
>
> Yes. It still should. Yet we ended up with adding VIOMMU to the
> nested SMMU series too. A shared S2 domain/hwpt isn't exclusive
> to the VCMDQ use case but also serves regular nesting on a
> multi-SMMU setup. So, VIOMMU turns out to be the best object that
> we have at this moment to hold individual VMIDs for different
> physical SMMUs sharing a single S2 domain. Its virtual device ID
> lookup feature can also allow us to forget about the DEV_INVALIDATE
> ioctl for now.

Yeah, that makes sense. btw it's probably helpful to future reviews
if you can try to include more explanations about the design tradeoffs
based on those discussions. Somehow it's not easy for me to catch up
with those discussions as I'm lacking a lot of the context here. 😊

> Jason listed all the tasks ahead in this thread too, using SMMU
> as an example:
> > So we have this stuff still open:
> > - Identity STE with PASID (part 2b)
> > - IOMMU_GET_HW_INFO (part 3)
> > - IOMMU_HWPT_ALLOC_NEST_PARENT (part 3)
> > - IOMMU_HWPT_DATA_ARM_SMMUV3 (part 3)
> > - IOMMU_HWPT_INVALIDATE_DATA_ARM_SMMUV3
> > - VIOMMU_ALLOC, VIOMMU_ATTACH
> > - VIOMMU_INVALIDATE
>
> By this series nesting setup is done. We need Baolu's solution
> or VQUEUE for fault reporting after that.

This is a clear plan.
On Thu, May 23, 2024 at 05:44:40AM +0000, Tian, Kevin wrote:
> > On Sun, May 12, 2024 at 11:46:54AM -0300, Jason Gunthorpe wrote:
> > > On Fri, Apr 12, 2024 at 08:47:04PM -0700, Nicolin Chen wrote:
> > > > Add a pair of ops to set and unset a device's virtual ID that belongs
> > > > to a viommu object. They will be used, in the following patch, by
> > > > iommufd to support some HW-acceleration features from the host level.
> > > >
> > > > For instance, every device behind an ARM SMMU has a Stream ID. The ID
> > > > is used by ATC invalidation commands so the SMMU HW can direct
> > > > invalidation requests to the corresponding PCI device that the ID
> > > > belongs to. In a virtualization use case, a passed-through device in
> > > > the VM will have a virtual Stream ID, used by the ATC invalidation
> > > > commands in the guest system. NVIDIA's CMDQV extension for SMMUv3
> > > > provides a v-interface to execute the guest-level ATC invalidation
> > > > commands directly, yet it needs the HW to be aware of the virtual
> > > > Stream ID so it can replace it with the physical Stream ID.
> > >
> > > I imagine using this as well for the ATC invalidation commands. It
> > > would be very easy and simplifying if the command fixup just extracted
> > > the vSID from the ATC invalidation and used an xarray to turn it into
> > > a pSID and then pushed the resulting command.
> >
> > You mean the nested SMMU series, right? Actually the set_dev_id
> > ioctl was a part of that until we wanted to try DEV_INVALIDATE.
> >
> > So again, yes, it makes sense to me that we move viommu and the
> > set_dev_id to the nested series, and then drop DEV_INVALIDATE.
>
> I was just about to ask how the nesting series is going. Per earlier
> discussion iirc the nesting series will go in before VCMDQ?

It has been sitting on my github; since it is now the N+1 series to
merge, I will post it in a few weeks. There are still some details
yet to be typed in unfortunately.

Jason
On Thu, May 23, 2024 at 06:19:59AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Wednesday, May 22, 2024 2:25 AM
> >
> > On Tue, May 14, 2024 at 06:59:07PM -0700, Nicolin Chen wrote:
> > > So, you want a proxy S1 domain for a device to attach, in case
> > > of a stage-2 only setup, because an S2 domain will no longer have
> > > a VMID, since it's shared among viommus. In the SMMU driver case,
> > > an arm_smmu_domain won't have an smmu pointer, so a device can't
> > > attach to an S2 domain but always to a nested S1 domain, right?
> >
> > That seems like a simple solution to the VMID lifetime, but it means
> > the kernel has to decode more types of vSTE.
>
> Why does ATC invalidation need to know about VMID?

ATC invalidation always requires a vRID to pRID translation and the
VIOMMU will hold that translation.

On vCMDQ HW and on AMD HW the vRID to pRID translation is pushed into
HW, and vCMDQ requires the VMID to do that.

Jason
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Thursday, May 23, 2024 11:01 PM
>
> On Thu, May 23, 2024 at 06:19:59AM +0000, Tian, Kevin wrote:
> > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > Sent: Wednesday, May 22, 2024 2:25 AM
> > >
> > > On Tue, May 14, 2024 at 06:59:07PM -0700, Nicolin Chen wrote:
> > > > So, you want a proxy S1 domain for a device to attach, in case
> > > > of a stage-2 only setup, because an S2 domain will no longer have
> > > > a VMID, since it's shared among viommus. In the SMMU driver case,
> > > > an arm_smmu_domain won't have an smmu pointer, so a device can't
> > > > attach to an S2 domain but always to a nested S1 domain, right?
> > >
> > > That seems like a simple solution to the VMID lifetime, but it means
> > > the kernel has to decode more types of vSTE.
> >
> > Why does ATC invalidation need to know about VMID?
>
> ATC invalidation always requires a vRID to pRID translation and the
> VIOMMU will hold that translation.
>
> On vCMDQ HW and on AMD HW the vRID to pRID translation is pushed into
> HW, and vCMDQ requires the VMID to do that.

At a quick glance VMID and vRID->pRID translation are both
configurations of a vintf.

My impression was that vintf->vmid is added to a guest cmd when it's
about iotlb invalidation.

Then vintf->sid_slots is walked when handling a guest cmd for ATC
invalidation.

I'm not sure why the latter requires a valid VMID to do the walking,
except that it's an implementation choice made by vCMDQ?
On Fri, May 24, 2024 at 02:21:23AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Thursday, May 23, 2024 11:01 PM
> >
> > On Thu, May 23, 2024 at 06:19:59AM +0000, Tian, Kevin wrote:
> > > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > > Sent: Wednesday, May 22, 2024 2:25 AM
> > > >
> > > > On Tue, May 14, 2024 at 06:59:07PM -0700, Nicolin Chen wrote:
> > > > > So, you want a proxy S1 domain for a device to attach, in case
> > > > > of a stage-2 only setup, because an S2 domain will no longer have
> > > > > a VMID, since it's shared among viommus. In the SMMU driver case,
> > > > > an arm_smmu_domain won't have an smmu pointer, so a device can't
> > > > > attach to an S2 domain but always to a nested S1 domain, right?
> > > >
> > > > That seems like a simple solution to the VMID lifetime, but it means
> > > > the kernel has to decode more types of vSTE.
> > >
> > > Why does ATC invalidation need to know about VMID?
> >
> > ATC invalidation always requires a vRID to pRID translation and the
> > VIOMMU will hold that translation.
> >
> > On vCMDQ HW and on AMD HW the vRID to pRID translation is pushed into
> > HW, and vCMDQ requires the VMID to do that.
>
> At a quick glance VMID and vRID->pRID translation are both
> configurations of a vintf.
>
> My impression was that vintf->vmid is added to a guest cmd when it's
> about iotlb invalidation.
>
> Then vintf->sid_slots is walked when handling a guest cmd for ATC
> invalidation.
>
> I'm not sure why the latter requires a valid VMID to do the walking,
> except that it's an implementation choice made by vCMDQ?

Well, we haven't thought about a case of doing ATC invalidation
via VINTF/VCMDQ without setting up a VMID, as "VMID" is a field
in the VINTF_CONFIG register next to the Enable bit and must be
set prior to enabling a VINTF, though your understanding of the
HW workflow is probably accurate :)

And the narrative at the top was trying to describe the links:
[ device ] => [ proxy identity S1 ] => [ viommu [ shared S2 ] ]
v.s.
[ device ] => [ non-shareable S2 ]

So the first case can take advantage of VIOMMU_INVALIDATE v.s.
the second case requires a DEV_INVALIDATE.

Another conclusion between the lines: since ARM SMMU will have
the first case (with viommu), there is no longer any use for a
DEV_INVALIDATE ioctl. So, we would likely drop it in the coming
version.

Thanks
Nicolin
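The ordering constraint Nicolin describes, in pseudo-register form;
the offset and field layout below are placeholders for illustration,
not the actual Tegra241 CMDQV programming model:

#define VINTF_CONFIG		0x0		/* placeholder offset */
#define VINTF_CONFIG_EN		BIT(0)		/* placeholder layout */
#define VINTF_CONFIG_VMID	GENMASK(16, 1)

static void my_vintf_enable(void __iomem *vintf_base, u16 vmid)
{
	/* VMID shares the register with the enable bit, so it gets set
	 * no later than the moment the VINTF is enabled */
	u32 val = FIELD_PREP(VINTF_CONFIG_VMID, vmid) | VINTF_CONFIG_EN;

	writel_relaxed(val, vintf_base + VINTF_CONFIG);
}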
> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Friday, May 24, 2024 11:26 AM
>
> On Fri, May 24, 2024 at 02:21:23AM +0000, Tian, Kevin wrote:
> > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > Sent: Thursday, May 23, 2024 11:01 PM
> > >
> > > On Thu, May 23, 2024 at 06:19:59AM +0000, Tian, Kevin wrote:
> > > > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > > > Sent: Wednesday, May 22, 2024 2:25 AM
> > > > >
> > > > > On Tue, May 14, 2024 at 06:59:07PM -0700, Nicolin Chen wrote:
> > > > > > So, you want a proxy S1 domain for a device to attach, in case
> > > > > > of a stage-2 only setup, because an S2 domain will no longer have
> > > > > > a VMID, since it's shared among viommus. In the SMMU driver case,
> > > > > > an arm_smmu_domain won't have an smmu pointer, so a device can't
> > > > > > attach to an S2 domain but always to a nested S1 domain, right?
> > > > >
> > > > > That seems like a simple solution to the VMID lifetime, but it means
> > > > > the kernel has to decode more types of vSTE.
> > > >
> > > > Why does ATC invalidation need to know about VMID?
> > >
> > > ATC invalidation always requires a vRID to pRID translation and the
> > > VIOMMU will hold that translation.
> > >
> > > On vCMDQ HW and on AMD HW the vRID to pRID translation is pushed into
> > > HW, and vCMDQ requires the VMID to do that.
> >
> > At a quick glance VMID and vRID->pRID translation are both
> > configurations of a vintf.
> >
> > My impression was that vintf->vmid is added to a guest cmd when it's
> > about iotlb invalidation.
> >
> > Then vintf->sid_slots is walked when handling a guest cmd for ATC
> > invalidation.
> >
> > I'm not sure why the latter requires a valid VMID to do the walking,
> > except that it's an implementation choice made by vCMDQ?
>
> Well, we haven't thought about a case of doing ATC invalidation
> via VINTF/VCMDQ without setting up a VMID, as "VMID" is a field
> in the VINTF_CONFIG register next to the Enable bit and must be
> set prior to enabling a VINTF, though your understanding of the
> HW workflow is probably accurate :)

Okay, that explains it. They are independent in concept but become
related due to that detail. 😊

> And the narrative at the top was trying to describe the links:
> [ device ] => [ proxy identity S1 ] => [ viommu [ shared S2 ] ]
> v.s.
> [ device ] => [ non-shareable S2 ]
>
> So the first case can take advantage of VIOMMU_INVALIDATE v.s.
> the second case requires a DEV_INVALIDATE.

And one side-effect in the first case is to save one VMID versus the
non-shareable S2, hence improving IOTLB efficiency.

> Another conclusion between the lines: since ARM SMMU will have
> the first case (with viommu), there is no longer any use for a
> DEV_INVALIDATE ioctl. So, we would likely drop it in the coming
> version.
>
> Thanks
> Nicolin
On Fri, May 24, 2024 at 05:24:11AM +0000, Tian, Kevin wrote:
> > > > > > On Tue, May 14, 2024 at 06:59:07PM -0700, Nicolin Chen wrote:
> > > > > > > So, you want a proxy S1 domain for a device to attach, in case
> > > > > > > of a stage-2 only setup, because an S2 domain will no longer have
> > > > > > > a VMID, since it's shared among viommus. In the SMMU driver case,
> > > > > > > an arm_smmu_domain won't have an smmu pointer, so a device can't
> > > > > > > attach to an S2 domain but always to a nested S1 domain, right?
> > > > > >
> > > > > > That seems like a simple solution to the VMID lifetime, but it means
> > > > > > the kernel has to decode more types of vSTE.
>
> > And the narrative at the top was trying to describe the links:
> > [ device ] => [ proxy identity S1 ] => [ viommu [ shared S2 ] ]
> > v.s.
> > [ device ] => [ non-shareable S2 ]
> >
> > So the first case can take advantage of VIOMMU_INVALIDATE v.s.
> > the second case requires a DEV_INVALIDATE.
>
> And one side-effect in the first case is to save one VMID versus the
> non-shareable S2, hence improving IOTLB efficiency.

Hmm, how is that?

VMID is currently stored in an S2 domain, actually. The viommu
is a VMID holder to potentially decouple VMID from the S2 domain,
because VMID is per SMMU instance while an S2 domain is shareable:

[ dev0 ] => [ S1 dom0 ] => [ viommu0 (VMID0) [ shared S2 ] ]
[ dev1 ] => [ S1 dom1 ] => [ viommu1 (VMID1) [ shared S2 ] ]

By the way, we can also have (very likely our initial version):

[ dev0 ] => [ S1 dom0 ] => [ viommu0 [ non-sharable S2 dom0 (VMID0) ] ]
[ dev1 ] => [ S1 dom1 ] => [ viommu1 [ non-sharable S2 dom1 (VMID1) ] ]

Thanks
Nicolin
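The decoupling in the diagrams above, expressed as a data-structure
sketch: the shareable S2 carries no VMID, while each viommu pairs one
physical SMMU instance with one VMID and its vSID table. Field and
type names are illustrative, not the posted driver code:

struct my_smmu_viommu {
	struct iommufd_viommu core;
	struct arm_smmu_device *smmu;	/* the physical SMMU instance */
	u16 vmid;			/* per-(viommu, instance), moved out of the S2 */
	struct arm_smmu_domain *s2;	/* shared nesting parent, VMID-less */
	struct xarray vsid_to_psid;	/* vSID -> pSID for this viommu */
};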
> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Friday, May 24, 2024 1:58 PM
>
> On Fri, May 24, 2024 at 05:24:11AM +0000, Tian, Kevin wrote:
> > > > > > > On Tue, May 14, 2024 at 06:59:07PM -0700, Nicolin Chen wrote:
> > > > > > > > So, you want a proxy S1 domain for a device to attach, in case
> > > > > > > > of a stage-2 only setup, because an S2 domain will no longer have
> > > > > > > > a VMID, since it's shared among viommus. In the SMMU driver case,
> > > > > > > > an arm_smmu_domain won't have an smmu pointer, so a device can't
> > > > > > > > attach to an S2 domain but always to a nested S1 domain, right?
> > > > > > >
> > > > > > > That seems like a simple solution to the VMID lifetime, but it means
> > > > > > > the kernel has to decode more types of vSTE.
>
> > > And the narrative at the top was trying to describe the links:
> > > [ device ] => [ proxy identity S1 ] => [ viommu [ shared S2 ] ]
> > > v.s.
> > > [ device ] => [ non-shareable S2 ]
> > >
> > > So the first case can take advantage of VIOMMU_INVALIDATE v.s.
> > > the second case requires a DEV_INVALIDATE.
> >
> > And one side-effect in the first case is to save one VMID versus the
> > non-shareable S2, hence improving IOTLB efficiency.
>
> Hmm, how is that?
>
> VMID is currently stored in an S2 domain, actually. The viommu
> is a VMID holder to potentially decouple VMID from the S2 domain,
> because VMID is per SMMU instance while an S2 domain is shareable:
>
> [ dev0 ] => [ S1 dom0 ] => [ viommu0 (VMID0) [ shared S2 ] ]
> [ dev1 ] => [ S1 dom1 ] => [ viommu1 (VMID1) [ shared S2 ] ]

My point was based on Jason's example about 3 VMIDs:

hwpt_alloc(deva, nesting_parent=true) = shared_s2
viommu_alloc(deva, shared_s2) = viommu1
viommu_alloc(devb, shared_s2) = viommu2
hwpt_alloc(deva, viommu1, vste) = deva_vste
hwpt_alloc(devb, viommu2, vste) = devb_vste
attach(deva, deva_vste)
attach(devb, devb_vste)
attach(devc, shared_s2)

for devc it could be:

hwpt_alloc(deva, viommu1, vproxy_s1) = devc_proxys1
attach(devc, devc_proxys1)

then devc will reuse the VMID of viommu1 and we save one VMID.

Does that not work, so that we'd need to create another viommu to
hold the proxy identity S1 and still need a 3rd VMID?
On Fri, May 24, 2024 at 02:21:23AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Thursday, May 23, 2024 11:01 PM
> >
> > On Thu, May 23, 2024 at 06:19:59AM +0000, Tian, Kevin wrote:
> > > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > > Sent: Wednesday, May 22, 2024 2:25 AM
> > > >
> > > > On Tue, May 14, 2024 at 06:59:07PM -0700, Nicolin Chen wrote:
> > > > > So, you want a proxy S1 domain for a device to attach, in case
> > > > > of a stage-2 only setup, because an S2 domain will no longer have
> > > > > a VMID, since it's shared among viommus. In the SMMU driver case,
> > > > > an arm_smmu_domain won't have an smmu pointer, so a device can't
> > > > > attach to an S2 domain but always to a nested S1 domain, right?
> > > >
> > > > That seems like a simple solution to the VMID lifetime, but it means
> > > > the kernel has to decode more types of vSTE.
> > >
> > > Why does ATC invalidation need to know about VMID?
> >
> > ATC invalidation always requires a vRID to pRID translation and the
> > VIOMMU will hold that translation.
> >
> > On vCMDQ HW and on AMD HW the vRID to pRID translation is pushed into
> > HW, and vCMDQ requires the VMID to do that.
>
> At a quick glance VMID and vRID->pRID translation are both
> configurations of a vintf.
>
> My impression was that vintf->vmid is added to a guest cmd when it's
> about iotlb invalidation.
>
> Then vintf->sid_slots is walked when handling a guest cmd for ATC
> invalidation.
>
> I'm not sure why the latter requires a valid VMID to do the walking,
> except that it's an implementation choice made by vCMDQ?

Yeah, that is probably right.

And then in iommufd we are sort of doubling down by saying the VIOMMU
object holds the S2 and the vRID->pRID as a bundle together.

Jason
On Fri, May 24, 2024 at 07:21:59AM +0000, Tian, Kevin wrote:
> My point was based on Jason's example about 3 VMIDs:
>
> hwpt_alloc(deva, nesting_parent=true) = shared_s2
> viommu_alloc(deva, shared_s2) = viommu1
> viommu_alloc(devb, shared_s2) = viommu2
> hwpt_alloc(deva, viommu1, vste) = deva_vste
> hwpt_alloc(devb, viommu2, vste) = devb_vste
> attach(deva, deva_vste)
> attach(devb, devb_vste)
> attach(devc, shared_s2)
>
> for devc it could be:
>
> hwpt_alloc(deva, viommu1, vproxy_s1) = devc_proxys1
> attach(devc, devc_proxys1)
>
> then devc will reuse the VMID of viommu1 and we save one VMID.

I think that can work, at least I haven't thought of a reason why the
non-vIOMMU device's VMID couldn't be shared with one of the vIOMMUs.

It is all quite a contrived case; I don't expect people to want to do
anything like this. It is just showing that the API is set up properly
so that it could be done. Userspace is not restricted to a single
VIOMMU.

> Does that not work, so that we'd need to create another viommu to
> hold the proxy identity S1 and still need a 3rd VMID?

Yes, I think that can be OK.

Jason
diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h
index dec10c6bb261..ca6ac8a1ffd0 100644
--- a/include/linux/iommufd.h
+++ b/include/linux/iommufd.h
@@ -94,9 +94,13 @@ struct iommufd_viommu {
  * struct iommufd_viommu_ops - viommu specific operations
  * @free: Free all driver-specific parts of an iommufd_viommu. The memory
  *        of the entire viommu will be free-ed by iommufd core
+ * @set/unset_dev_id: set/unset a user space virtual id for a device
  */
 struct iommufd_viommu_ops {
 	void (*free)(struct iommufd_viommu *viommu);
+	int (*set_dev_id)(struct iommufd_viommu *viommu,
+			  struct device *dev, u64 dev_id);
+	void (*unset_dev_id)(struct iommufd_viommu *viommu, struct device *dev);
 };
 
 #if IS_ENABLED(CONFIG_IOMMUFD)
Add a pair of ops to set and unset a device's virtual ID that belongs
to a viommu object. They will be used, in the following patch, by
iommufd to support some HW-acceleration features from the host level.

For instance, every device behind an ARM SMMU has a Stream ID. The ID
is used by ATC invalidation commands so the SMMU HW can direct
invalidation requests to the corresponding PCI device that the ID
belongs to. In a virtualization use case, a passed-through device in
the VM will have a virtual Stream ID, used by the ATC invalidation
commands in the guest system. NVIDIA's CMDQV extension for SMMUv3
provides a v-interface to execute the guest-level ATC invalidation
commands directly, yet it needs the HW to be aware of the virtual
Stream ID so it can replace it with the physical Stream ID.

Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 include/linux/iommufd.h | 4 ++++
 1 file changed, 4 insertions(+)