mbox series

[v6,0/9] hw/iommufd: IOMMUFD Dirty Tracking

Message ID 20240722211326.70162-1-joao.m.martins@oracle.com
Headers show
Series hw/iommufd: IOMMUFD Dirty Tracking | expand

Message

Joao Martins July 22, 2024, 9:13 p.m. UTC
This small series adds support for IOMMU dirty tracking support via the
IOMMUFD backend. The hardware capability is available on most recent x86
hardware (and these SMMUv3 in upcoming v6.11). The series is divided
organized as follows:

* Patches 1 - 7: IOMMUFD backend support for dirty tracking;

Introduce auto domains -- Patch 3 goes into more detail, but the gist is that
we will find and attach a device to a compatible IOMMU domain, or allocate a new
hardware pagetable *or* rely on kernel IOAS attach (for mdevs). Afterwards the
workflow is relatively simple:

1) Probe device and allow dirty tracking in the HWPT
2) Toggling dirty tracking on/off
3) Read-and-clear of Dirty IOVAs

The heuristics selected for (1) were to always request the HWPT for
dirty tracking if supported, or rely on device dirty page tracking. This
is a little simplistic and we aren't necessarily utilizing IOMMU dirty
tracking even if we ask during hwpt allocation.

The unmap case is deferred until further vIOMMU support with migration
is added[3] which will then introduce the usage of
IOMMU_HWPT_GET_DIRTY_BITMAP_NO_CLEAR in GET_DIRTY_BITMAP ioctl in the
dma unmap bitmap flow.

* Patches 8 - 9: Don't block live migration where there's no VF dirty
tracker, considering that we have IOMMU dirty tracking.

Comments and feedback appreciated (on patches 1, 5, 8, 9)

Cheers,
    Joao

P.S. Suggest v6.11-rc as hypervisor kernel as there's
some bugs fixed there with regards to IOMMU hugepage dirty tracking.

Changes since v5[6]:
* Remove patches 1-4 as these were commited to vfio-next
* Add the Rb by Cedric and Zhenzhong (previously patches 7, 8, 10, 11)
* Introduce VFIODevice::iommu_dirty_tracking and use it on patch 5, 8
to store whether we can use IOMMU dirty tracking.

Changes since v4[5]:
* Add various Reviewed-by in patches 2,3,4,6,8,11
* Change error messages to mention IOMMU (Zhenzhong)
* Better improve the checking of dirty page tracking in
  vfio_migration_realize() to detect per-device IOMMU instead of using
  container dirty_page_supported().
* Improve various commit messages (Eric)
* Extract the caps::hw_caps into its own patch as it was miosleading to
be hidden in another patch (new patch 7)
* Restructure patch 1 helper to be vfio_device_is_mdev() and use
vfio::mdev directly in rest of patches (Cedric)
* Improve error messages of set,query dirty tracking (Cedric)
* Add missing casts to uintptr and uint64_t* (Cedric)
* Add missing commens to struct doc from aw_bits removal (and hw_caps
addition) (Eric)
* Fix the detach flow in auto domains (Eric)
* Set hwpt to NULL on detach (Eric)
* Spurious line (Eric)

Changes since v3[5]:
* Skip HostIOMMUDevice::realize for mdev, and introduce a helper to check if the VFIO
  device is mdev. (Zhenzhong)
* Skip setting IOMMU device for mdev (Zhenzhong)
* Add Zhenzhong review tag in patch 3
* Utilize vbasedev::bcontainer::dirty_pages_supported instead of introducing
  a new HostIOMMUDevice capability and thus remove the cap patch from the series (Zhenzhong)
* Move the HostIOMMUDevice::realize() to be part of VFIODevice initialization in attach_device()
while skipping it all together for mdev. (Cedric)
* Due to the previous item, had to remove aw_bits because it depends on device attach being
finished, instead defer it to when get_cap() gets called.
* Skip auto domains for mdev instead of purposedly erroring out (Zhenzhong)
* Pass errp in all cases, and instead just free the error in case of -EINVAL
  in most of all patches, and also pass Error* in iommufd_backend_alloc_hwpt() amd
  set/query dirty. This is made better thanks in part to skipping auto domains for mdev (Cedric)

Changes since RFCv2[4]:
* Always allocate hwpt with IOMMU_HWPT_ALLOC_DIRTY_TRACKING even if
we end up not actually toggling dirty tracking. (Avihai)
* Fix error handling widely in auto domains logic and all patches (Avihai)
* Reuse iommufd_backend_get_device_info() for capabilities (Zhenzhong)
* New patches 1 and 2 taking into consideration previous comments.
* Store hwpt::flags to know if we have dirty tracking (Avihai)
* New patch 8, that allows to query dirty tracking support after
provisioning. This is a cleaner way to check IOMMU dirty tracking support
when vfio::migration is iniitalized, as opposed to RFCv2 via device caps.
device caps way is still used because at vfio attach we aren't yet with
a fully initialized migration state.
* Adopt error propagation in query,set dirty tracking
* Misc improvements overall broadly and Avihai
* Drop hugepages as it's a bit unrelated; I can pursue that patch
* separately. The main motivation is to provide a way to test
without hugepages similar to what vfio_type1_iommu.disable_hugepages=1
does.

Changes since RFCv1[2]:
* Remove intel/amd dirty tracking emulation enabling
* Remove the dirtyrate improvement for VF/IOMMU dirty tracking
[Will pursue these two in separate series]
* Introduce auto domains support
* Enforce dirty tracking following the IOMMUFD UAPI for this
* Add support for toggling hugepages in IOMMUFD
* Auto enable support when VF supports migration to use IOMMU
when it doesn't have VF dirty tracking
* Add a parameter to toggle VF dirty tracking

[0] https://lore.kernel.org/qemu-devel/20240201072818.327930-1-zhenzhong.duan@intel.com/
[1] https://lore.kernel.org/qemu-devel/20240201072818.327930-10-zhenzhong.duan@intel.com/
[2] https://lore.kernel.org/qemu-devel/20220428211351.3897-1-joao.m.martins@oracle.com/
[3] https://lore.kernel.org/qemu-devel/20230622214845.3980-1-joao.m.martins@oracle.com/
[4] https://lore.kernel.org/qemu-devel/20240212135643.5858-1-joao.m.martins@oracle.com/
[5] https://lore.kernel.org/qemu-devel/20240708143420.16953-1-joao.m.martins@oracle.com/
[6] https://lore.kernel.org/qemu-devel/20240719120501.81279-1-joao.m.martins@oracle.com/

Joao Martins (9):
  vfio/iommufd: Introduce auto domain creation
  vfio/{iommufd,container}: Remove caps::aw_bits
  vfio/iommufd: Add hw_caps field to HostIOMMUDeviceCaps
  vfio/{iommufd,container}: Invoke HostIOMMUDevice::realize() during
    attach_device()
  vfio/iommufd: Probe and request hwpt dirty tracking capability
  vfio/iommufd: Implement VFIOIOMMUClass::set_dirty_tracking support
  vfio/iommufd: Implement VFIOIOMMUClass::query_dirty_bitmap support
  vfio/migration: Don't block migration device dirty tracking is
    unsupported
  vfio/common: Allow disabling device dirty page tracking

 include/hw/vfio/vfio-common.h      |  13 +++
 include/sysemu/host_iommu_device.h |   5 +-
 include/sysemu/iommufd.h           |  11 ++
 backends/iommufd.c                 |  85 ++++++++++++++-
 hw/vfio/common.c                   |  19 ++--
 hw/vfio/container.c                |   9 +-
 hw/vfio/helpers.c                  |  11 ++
 hw/vfio/iommufd.c                  | 170 ++++++++++++++++++++++++++++-
 hw/vfio/migration.c                |  12 +-
 hw/vfio/pci.c                      |   3 +
 backends/trace-events              |   3 +
 11 files changed, 318 insertions(+), 23 deletions(-)

Comments

Eric Auger July 23, 2024, 7:38 a.m. UTC | #1
Hi Joao,

On 7/22/24 23:13, Joao Martins wrote:
> Move the HostIOMMUDevice::realize() to be invoked during the attach of the device
> before we allocate IOMMUFD hardware pagetable objects (HWPT). This allows the use
> of the hw_caps obtained by IOMMU_GET_HW_INFO that essentially tell if the IOMMU
> behind the device supports dirty tracking.
>
> Note: The HostIOMMUDevice data from legacy backend is static and doesn't
> need any information from the (type1-iommu) backend to be initialized.
> In contrast however, the IOMMUFD HostIOMMUDevice data requires the
> iommufd FD to be connected and having a devid to be able to successfully
Nit: maybe this comment shall be also added in iommufd.c before the call
to vfio_device_hiod_realize() to avoid someone else to move that call
earlier at some point
> GET_HW_INFO. This means vfio_device_hiod_realize() is called in
> different places within the backend .attach_device() implementation.
>
> Suggested-by: Cédric Le Goater <clg@redhat.cm>
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
> ---
>  include/hw/vfio/vfio-common.h |  1 +
>  hw/vfio/common.c              | 16 ++++++----------
>  hw/vfio/container.c           |  4 ++++
>  hw/vfio/helpers.c             | 11 +++++++++++
>  hw/vfio/iommufd.c             |  4 ++++
>  5 files changed, 26 insertions(+), 10 deletions(-)
>
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index 1a96678f8c38..4e44b26d3c45 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -242,6 +242,7 @@ void vfio_region_finalize(VFIORegion *region);
>  void vfio_reset_handler(void *opaque);
>  struct vfio_device_info *vfio_get_device_info(int fd);
>  bool vfio_device_is_mdev(VFIODevice *vbasedev);
> +bool vfio_device_hiod_realize(VFIODevice *vbasedev, Error **errp);
>  bool vfio_attach_device(char *name, VFIODevice *vbasedev,
>                          AddressSpace *as, Error **errp);
>  void vfio_detach_device(VFIODevice *vbasedev);
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 784e266e6aab..da12cbd56408 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -1537,7 +1537,7 @@ bool vfio_attach_device(char *name, VFIODevice *vbasedev,
>  {
>      const VFIOIOMMUClass *ops =
>          VFIO_IOMMU_CLASS(object_class_by_name(TYPE_VFIO_IOMMU_LEGACY));
> -    HostIOMMUDevice *hiod;
> +    HostIOMMUDevice *hiod = NULL;
>  
>      if (vbasedev->iommufd) {
>          ops = VFIO_IOMMU_CLASS(object_class_by_name(TYPE_VFIO_IOMMU_IOMMUFD));
> @@ -1545,21 +1545,17 @@ bool vfio_attach_device(char *name, VFIODevice *vbasedev,
>  
>      assert(ops);
>  
> -    if (!ops->attach_device(name, vbasedev, as, errp)) {
> -        return false;
> -    }
>  
> -    if (vbasedev->mdev) {
> -        return true;
> +    if (!vbasedev->mdev) {
> +        hiod = HOST_IOMMU_DEVICE(object_new(ops->hiod_typename));
> +        vbasedev->hiod = hiod;
>      }
>  
> -    hiod = HOST_IOMMU_DEVICE(object_new(ops->hiod_typename));
> -    if (!HOST_IOMMU_DEVICE_GET_CLASS(hiod)->realize(hiod, vbasedev, errp)) {
> +    if (!ops->attach_device(name, vbasedev, as, errp)) {
>          object_unref(hiod);
> -        ops->detach_device(vbasedev);
> +        vbasedev->hiod = NULL;
>          return false;
>      }
> -    vbasedev->hiod = hiod;
>  
>      return true;
>  }
> diff --git a/hw/vfio/container.c b/hw/vfio/container.c
> index 10cb4b4320ac..9ccdb639ac84 100644
> --- a/hw/vfio/container.c
> +++ b/hw/vfio/container.c
> @@ -914,6 +914,10 @@ static bool vfio_legacy_attach_device(const char *name, VFIODevice *vbasedev,
>  
>      trace_vfio_attach_device(vbasedev->name, groupid);
>  
> +    if (!vfio_device_hiod_realize(vbasedev, errp)) {
> +        return false;
don't you want to go to err_alloc_ioas instead?
> +    }
> +
>      group = vfio_get_group(groupid, as, errp);
>      if (!group) {
>          return false;
> diff --git a/hw/vfio/helpers.c b/hw/vfio/helpers.c
> index 7e23e9080c9d..ea15c79db0a3 100644
> --- a/hw/vfio/helpers.c
> +++ b/hw/vfio/helpers.c
> @@ -689,3 +689,14 @@ bool vfio_device_is_mdev(VFIODevice *vbasedev)
>      subsys = realpath(tmp, NULL);
>      return subsys && (strcmp(subsys, "/sys/bus/mdev") == 0);
>  }
> +
> +bool vfio_device_hiod_realize(VFIODevice *vbasedev, Error **errp)
> +{
> +    HostIOMMUDevice *hiod = vbasedev->hiod;
> +
> +    if (!hiod) {
> +        return true;
> +    }
> +
> +    return HOST_IOMMU_DEVICE_GET_CLASS(hiod)->realize(hiod, vbasedev, errp);
> +}
> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
> index 5e2fc1ce089d..2324bf892c56 100644
> --- a/hw/vfio/iommufd.c
> +++ b/hw/vfio/iommufd.c
> @@ -403,6 +403,10 @@ static bool iommufd_cdev_attach(const char *name, VFIODevice *vbasedev,
>  
>      space = vfio_get_address_space(as);
>  
> +    if (!vfio_device_hiod_realize(vbasedev, errp)) {
> +        return false;
> +    }
> +
>      /* try to attach to an existing container in this space */
>      QLIST_FOREACH(bcontainer, &space->containers, next) {
>          container = container_of(bcontainer, VFIOIOMMUFDContainer, bcontainer);
Eric
Cédric Le Goater July 23, 2024, 7:44 a.m. UTC | #2
On 7/23/24 09:38, Eric Auger wrote:
> Hi Joao,
> 
> On 7/22/24 23:13, Joao Martins wrote:
>> Move the HostIOMMUDevice::realize() to be invoked during the attach of the device
>> before we allocate IOMMUFD hardware pagetable objects (HWPT). This allows the use
>> of the hw_caps obtained by IOMMU_GET_HW_INFO that essentially tell if the IOMMU
>> behind the device supports dirty tracking.
>>
>> Note: The HostIOMMUDevice data from legacy backend is static and doesn't
>> need any information from the (type1-iommu) backend to be initialized.
>> In contrast however, the IOMMUFD HostIOMMUDevice data requires the
>> iommufd FD to be connected and having a devid to be able to successfully
> Nit: maybe this comment shall be also added in iommufd.c before the call
> to vfio_device_hiod_realize() to avoid someone else to move that call
> earlier at some point
>> GET_HW_INFO. This means vfio_device_hiod_realize() is called in
>> different places within the backend .attach_device() implementation.
>>
>> Suggested-by: Cédric Le Goater <clg@redhat.cm>
>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>> ---
>>   include/hw/vfio/vfio-common.h |  1 +
>>   hw/vfio/common.c              | 16 ++++++----------
>>   hw/vfio/container.c           |  4 ++++
>>   hw/vfio/helpers.c             | 11 +++++++++++
>>   hw/vfio/iommufd.c             |  4 ++++
>>   5 files changed, 26 insertions(+), 10 deletions(-)
>>
>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
>> index 1a96678f8c38..4e44b26d3c45 100644
>> --- a/include/hw/vfio/vfio-common.h
>> +++ b/include/hw/vfio/vfio-common.h
>> @@ -242,6 +242,7 @@ void vfio_region_finalize(VFIORegion *region);
>>   void vfio_reset_handler(void *opaque);
>>   struct vfio_device_info *vfio_get_device_info(int fd);
>>   bool vfio_device_is_mdev(VFIODevice *vbasedev);
>> +bool vfio_device_hiod_realize(VFIODevice *vbasedev, Error **errp);
>>   bool vfio_attach_device(char *name, VFIODevice *vbasedev,
>>                           AddressSpace *as, Error **errp);
>>   void vfio_detach_device(VFIODevice *vbasedev);
>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>> index 784e266e6aab..da12cbd56408 100644
>> --- a/hw/vfio/common.c
>> +++ b/hw/vfio/common.c
>> @@ -1537,7 +1537,7 @@ bool vfio_attach_device(char *name, VFIODevice *vbasedev,
>>   {
>>       const VFIOIOMMUClass *ops =
>>           VFIO_IOMMU_CLASS(object_class_by_name(TYPE_VFIO_IOMMU_LEGACY));
>> -    HostIOMMUDevice *hiod;
>> +    HostIOMMUDevice *hiod = NULL;
>>   
>>       if (vbasedev->iommufd) {
>>           ops = VFIO_IOMMU_CLASS(object_class_by_name(TYPE_VFIO_IOMMU_IOMMUFD));
>> @@ -1545,21 +1545,17 @@ bool vfio_attach_device(char *name, VFIODevice *vbasedev,
>>   
>>       assert(ops);
>>   
>> -    if (!ops->attach_device(name, vbasedev, as, errp)) {
>> -        return false;
>> -    }
>>   
>> -    if (vbasedev->mdev) {
>> -        return true;
>> +    if (!vbasedev->mdev) {
>> +        hiod = HOST_IOMMU_DEVICE(object_new(ops->hiod_typename));
>> +        vbasedev->hiod = hiod;
>>       }
>>   
>> -    hiod = HOST_IOMMU_DEVICE(object_new(ops->hiod_typename));
>> -    if (!HOST_IOMMU_DEVICE_GET_CLASS(hiod)->realize(hiod, vbasedev, errp)) {
>> +    if (!ops->attach_device(name, vbasedev, as, errp)) {
>>           object_unref(hiod);
>> -        ops->detach_device(vbasedev);
>> +        vbasedev->hiod = NULL;
>>           return false;
>>       }
>> -    vbasedev->hiod = hiod;
>>   
>>       return true;
>>   }
>> diff --git a/hw/vfio/container.c b/hw/vfio/container.c
>> index 10cb4b4320ac..9ccdb639ac84 100644
>> --- a/hw/vfio/container.c
>> +++ b/hw/vfio/container.c
>> @@ -914,6 +914,10 @@ static bool vfio_legacy_attach_device(const char *name, VFIODevice *vbasedev,
>>   
>>       trace_vfio_attach_device(vbasedev->name, groupid);
>>   
>> +    if (!vfio_device_hiod_realize(vbasedev, errp)) {
>> +        return false;
> don't you want to go to err_alloc_ioas instead?

hmm, the err_alloc_ioas label is in a different function iommufd_cdev_attach().

may be you meant the comment for routine iommufd_cdev_attach() and
label err_connect_bind ?


Thanks,

C.


>> +    }
>> +
>>       group = vfio_get_group(groupid, as, errp);
>>       if (!group) {
>>           return false;
>> diff --git a/hw/vfio/helpers.c b/hw/vfio/helpers.c
>> index 7e23e9080c9d..ea15c79db0a3 100644
>> --- a/hw/vfio/helpers.c
>> +++ b/hw/vfio/helpers.c
>> @@ -689,3 +689,14 @@ bool vfio_device_is_mdev(VFIODevice *vbasedev)
>>       subsys = realpath(tmp, NULL);
>>       return subsys && (strcmp(subsys, "/sys/bus/mdev") == 0);
>>   }
>> +
>> +bool vfio_device_hiod_realize(VFIODevice *vbasedev, Error **errp)
>> +{
>> +    HostIOMMUDevice *hiod = vbasedev->hiod;
>> +
>> +    if (!hiod) {
>> +        return true;
>> +    }
>> +
>> +    return HOST_IOMMU_DEVICE_GET_CLASS(hiod)->realize(hiod, vbasedev, errp);
>> +}
>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>> index 5e2fc1ce089d..2324bf892c56 100644
>> --- a/hw/vfio/iommufd.c
>> +++ b/hw/vfio/iommufd.c
>> @@ -403,6 +403,10 @@ static bool iommufd_cdev_attach(const char *name, VFIODevice *vbasedev,
>>   
>>       space = vfio_get_address_space(as);
>>   
>> +    if (!vfio_device_hiod_realize(vbasedev, errp)) {
>> +        return false;
>> +    }
>> +
>>       /* try to attach to an existing container in this space */
>>       QLIST_FOREACH(bcontainer, &space->containers, next) {
>>           container = container_of(bcontainer, VFIOIOMMUFDContainer, bcontainer);
> Eric
>
Joao Martins July 23, 2024, 7:53 a.m. UTC | #3
On 23/07/2024 08:38, Eric Auger wrote:
> Hi Joao,
> 
> On 7/22/24 23:13, Joao Martins wrote:
>> Move the HostIOMMUDevice::realize() to be invoked during the attach of the device
>> before we allocate IOMMUFD hardware pagetable objects (HWPT). This allows the use
>> of the hw_caps obtained by IOMMU_GET_HW_INFO that essentially tell if the IOMMU
>> behind the device supports dirty tracking.
>>
>> Note: The HostIOMMUDevice data from legacy backend is static and doesn't
>> need any information from the (type1-iommu) backend to be initialized.
>> In contrast however, the IOMMUFD HostIOMMUDevice data requires the
>> iommufd FD to be connected and having a devid to be able to successfully
> Nit: maybe this comment shall be also added in iommufd.c before the call
> to vfio_device_hiod_realize() to avoid someone else to move that call
> earlier at some point
>> GET_HW_INFO. This means vfio_device_hiod_realize() is called in
>> different places within the backend .attach_device() implementation.
>>
>> Suggested-by: Cédric Le Goater <clg@redhat.cm>
>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>> ---
>>  include/hw/vfio/vfio-common.h |  1 +
>>  hw/vfio/common.c              | 16 ++++++----------
>>  hw/vfio/container.c           |  4 ++++
>>  hw/vfio/helpers.c             | 11 +++++++++++
>>  hw/vfio/iommufd.c             |  4 ++++
>>  5 files changed, 26 insertions(+), 10 deletions(-)
>>
>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
>> index 1a96678f8c38..4e44b26d3c45 100644
>> --- a/include/hw/vfio/vfio-common.h
>> +++ b/include/hw/vfio/vfio-common.h
>> @@ -242,6 +242,7 @@ void vfio_region_finalize(VFIORegion *region);
>>  void vfio_reset_handler(void *opaque);
>>  struct vfio_device_info *vfio_get_device_info(int fd);
>>  bool vfio_device_is_mdev(VFIODevice *vbasedev);
>> +bool vfio_device_hiod_realize(VFIODevice *vbasedev, Error **errp);
>>  bool vfio_attach_device(char *name, VFIODevice *vbasedev,
>>                          AddressSpace *as, Error **errp);
>>  void vfio_detach_device(VFIODevice *vbasedev);
>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>> index 784e266e6aab..da12cbd56408 100644
>> --- a/hw/vfio/common.c
>> +++ b/hw/vfio/common.c
>> @@ -1537,7 +1537,7 @@ bool vfio_attach_device(char *name, VFIODevice *vbasedev,
>>  {
>>      const VFIOIOMMUClass *ops =
>>          VFIO_IOMMU_CLASS(object_class_by_name(TYPE_VFIO_IOMMU_LEGACY));
>> -    HostIOMMUDevice *hiod;
>> +    HostIOMMUDevice *hiod = NULL;
>>  
>>      if (vbasedev->iommufd) {
>>          ops = VFIO_IOMMU_CLASS(object_class_by_name(TYPE_VFIO_IOMMU_IOMMUFD));
>> @@ -1545,21 +1545,17 @@ bool vfio_attach_device(char *name, VFIODevice *vbasedev,
>>  
>>      assert(ops);
>>  
>> -    if (!ops->attach_device(name, vbasedev, as, errp)) {
>> -        return false;
>> -    }
>>  
>> -    if (vbasedev->mdev) {
>> -        return true;
>> +    if (!vbasedev->mdev) {
>> +        hiod = HOST_IOMMU_DEVICE(object_new(ops->hiod_typename));
>> +        vbasedev->hiod = hiod;
>>      }
>>  
>> -    hiod = HOST_IOMMU_DEVICE(object_new(ops->hiod_typename));
>> -    if (!HOST_IOMMU_DEVICE_GET_CLASS(hiod)->realize(hiod, vbasedev, errp)) {
>> +    if (!ops->attach_device(name, vbasedev, as, errp)) {
>>          object_unref(hiod);
>> -        ops->detach_device(vbasedev);
>> +        vbasedev->hiod = NULL;
>>          return false;
>>      }
>> -    vbasedev->hiod = hiod;
>>  
>>      return true;
>>  }
>> diff --git a/hw/vfio/container.c b/hw/vfio/container.c
>> index 10cb4b4320ac..9ccdb639ac84 100644
>> --- a/hw/vfio/container.c
>> +++ b/hw/vfio/container.c
>> @@ -914,6 +914,10 @@ static bool vfio_legacy_attach_device(const char *name, VFIODevice *vbasedev,
>>  
>>      trace_vfio_attach_device(vbasedev->name, groupid);
>>  
>> +    if (!vfio_device_hiod_realize(vbasedev, errp)) {
>> +        return false;
> don't you want to go to err_alloc_ioas instead?

Oh, yes, I thought I was doing that, but I am not :( Thanks for catching that

Your comment is spot on but in the wrong place.

vfio_legacy_attach_device() can just return false as it's at the top of the
function but here (...)

>> +    }
>> +
>>      group = vfio_get_group(groupid, as, errp);
>>      if (!group) {
>>          return false;
>> diff --git a/hw/vfio/helpers.c b/hw/vfio/helpers.c
>> index 7e23e9080c9d..ea15c79db0a3 100644
>> --- a/hw/vfio/helpers.c
>> +++ b/hw/vfio/helpers.c
>> @@ -689,3 +689,14 @@ bool vfio_device_is_mdev(VFIODevice *vbasedev)
>>      subsys = realpath(tmp, NULL);
>>      return subsys && (strcmp(subsys, "/sys/bus/mdev") == 0);
>>  }
>> +
>> +bool vfio_device_hiod_realize(VFIODevice *vbasedev, Error **errp)
>> +{
>> +    HostIOMMUDevice *hiod = vbasedev->hiod;
>> +
>> +    if (!hiod) {
>> +        return true;
>> +    }
>> +
>> +    return HOST_IOMMU_DEVICE_GET_CLASS(hiod)->realize(hiod, vbasedev, errp);
>> +}
>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>> index 5e2fc1ce089d..2324bf892c56 100644
>> --- a/hw/vfio/iommufd.c
>> +++ b/hw/vfio/iommufd.c
>> @@ -403,6 +403,10 @@ static bool iommufd_cdev_attach(const char *name, VFIODevice *vbasedev,
>>  
>>      space = vfio_get_address_space(as);
>>  
>> +    if (!vfio_device_hiod_realize(vbasedev, errp)) {
>> +        return false;
>> +    }
>> +

(...) we definitely need a goto err_alloc_ioas here.

Snip below:

@@ -482,7 +483,7 @@ static bool iommufd_cdev_attach(const char *name, VFIODevice
*vbasedev,
     space = vfio_get_address_space(as);

     if (!vfio_device_hiod_realize(vbasedev, errp)) {
-        return false;
+        goto err_alloc_ioas;
     }

     /* try to attach to an existing container in this space */
Eric Auger July 23, 2024, 7:55 a.m. UTC | #4
On 7/23/24 09:44, Cédric Le Goater wrote:
> On 7/23/24 09:38, Eric Auger wrote:
>> Hi Joao,
>>
>> On 7/22/24 23:13, Joao Martins wrote:
>>> Move the HostIOMMUDevice::realize() to be invoked during the attach
>>> of the device
>>> before we allocate IOMMUFD hardware pagetable objects (HWPT). This
>>> allows the use
>>> of the hw_caps obtained by IOMMU_GET_HW_INFO that essentially tell
>>> if the IOMMU
>>> behind the device supports dirty tracking.
>>>
>>> Note: The HostIOMMUDevice data from legacy backend is static and
>>> doesn't
>>> need any information from the (type1-iommu) backend to be initialized.
>>> In contrast however, the IOMMUFD HostIOMMUDevice data requires the
>>> iommufd FD to be connected and having a devid to be able to
>>> successfully
>> Nit: maybe this comment shall be also added in iommufd.c before the call
>> to vfio_device_hiod_realize() to avoid someone else to move that call
>> earlier at some point
>>> GET_HW_INFO. This means vfio_device_hiod_realize() is called in
>>> different places within the backend .attach_device() implementation.
>>>
>>> Suggested-by: Cédric Le Goater <clg@redhat.cm>
>>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>>> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>>> ---
>>>   include/hw/vfio/vfio-common.h |  1 +
>>>   hw/vfio/common.c              | 16 ++++++----------
>>>   hw/vfio/container.c           |  4 ++++
>>>   hw/vfio/helpers.c             | 11 +++++++++++
>>>   hw/vfio/iommufd.c             |  4 ++++
>>>   5 files changed, 26 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/include/hw/vfio/vfio-common.h
>>> b/include/hw/vfio/vfio-common.h
>>> index 1a96678f8c38..4e44b26d3c45 100644
>>> --- a/include/hw/vfio/vfio-common.h
>>> +++ b/include/hw/vfio/vfio-common.h
>>> @@ -242,6 +242,7 @@ void vfio_region_finalize(VFIORegion *region);
>>>   void vfio_reset_handler(void *opaque);
>>>   struct vfio_device_info *vfio_get_device_info(int fd);
>>>   bool vfio_device_is_mdev(VFIODevice *vbasedev);
>>> +bool vfio_device_hiod_realize(VFIODevice *vbasedev, Error **errp);
>>>   bool vfio_attach_device(char *name, VFIODevice *vbasedev,
>>>                           AddressSpace *as, Error **errp);
>>>   void vfio_detach_device(VFIODevice *vbasedev);
>>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>>> index 784e266e6aab..da12cbd56408 100644
>>> --- a/hw/vfio/common.c
>>> +++ b/hw/vfio/common.c
>>> @@ -1537,7 +1537,7 @@ bool vfio_attach_device(char *name, VFIODevice
>>> *vbasedev,
>>>   {
>>>       const VFIOIOMMUClass *ops =
>>>          
>>> VFIO_IOMMU_CLASS(object_class_by_name(TYPE_VFIO_IOMMU_LEGACY));
>>> -    HostIOMMUDevice *hiod;
>>> +    HostIOMMUDevice *hiod = NULL;
>>>         if (vbasedev->iommufd) {
>>>           ops =
>>> VFIO_IOMMU_CLASS(object_class_by_name(TYPE_VFIO_IOMMU_IOMMUFD));
>>> @@ -1545,21 +1545,17 @@ bool vfio_attach_device(char *name,
>>> VFIODevice *vbasedev,
>>>         assert(ops);
>>>   -    if (!ops->attach_device(name, vbasedev, as, errp)) {
>>> -        return false;
>>> -    }
>>>   -    if (vbasedev->mdev) {
>>> -        return true;
>>> +    if (!vbasedev->mdev) {
>>> +        hiod = HOST_IOMMU_DEVICE(object_new(ops->hiod_typename));
>>> +        vbasedev->hiod = hiod;
>>>       }
>>>   -    hiod = HOST_IOMMU_DEVICE(object_new(ops->hiod_typename));
>>> -    if (!HOST_IOMMU_DEVICE_GET_CLASS(hiod)->realize(hiod, vbasedev,
>>> errp)) {
>>> +    if (!ops->attach_device(name, vbasedev, as, errp)) {
>>>           object_unref(hiod);
>>> -        ops->detach_device(vbasedev);
>>> +        vbasedev->hiod = NULL;
>>>           return false;
>>>       }
>>> -    vbasedev->hiod = hiod;
>>>         return true;
>>>   }
>>> diff --git a/hw/vfio/container.c b/hw/vfio/container.c
>>> index 10cb4b4320ac..9ccdb639ac84 100644
>>> --- a/hw/vfio/container.c
>>> +++ b/hw/vfio/container.c
>>> @@ -914,6 +914,10 @@ static bool vfio_legacy_attach_device(const
>>> char *name, VFIODevice *vbasedev,
>>>         trace_vfio_attach_device(vbasedev->name, groupid);
>>>   +    if (!vfio_device_hiod_realize(vbasedev, errp)) {
>>> +        return false;
>> don't you want to go to err_alloc_ioas instead?
>
> hmm, the err_alloc_ioas label is in a different function
> iommufd_cdev_attach().
>
> may be you meant the comment for routine iommufd_cdev_attach() and
> label err_connect_bind ?
>
>
> Thanks,
>
> C.
>
>
>>> +    }
>>> +
>>>       group = vfio_get_group(groupid, as, errp);
>>>       if (!group) {
>>>           return false;
>>> diff --git a/hw/vfio/helpers.c b/hw/vfio/helpers.c
>>> index 7e23e9080c9d..ea15c79db0a3 100644
>>> --- a/hw/vfio/helpers.c
>>> +++ b/hw/vfio/helpers.c
>>> @@ -689,3 +689,14 @@ bool vfio_device_is_mdev(VFIODevice *vbasedev)
>>>       subsys = realpath(tmp, NULL);
>>>       return subsys && (strcmp(subsys, "/sys/bus/mdev") == 0);
>>>   }
>>> +
>>> +bool vfio_device_hiod_realize(VFIODevice *vbasedev, Error **errp)
>>> +{
>>> +    HostIOMMUDevice *hiod = vbasedev->hiod;
>>> +
>>> +    if (!hiod) {
>>> +        return true;
>>> +    }
>>> +
>>> +    return HOST_IOMMU_DEVICE_GET_CLASS(hiod)->realize(hiod,
>>> vbasedev, errp);
>>> +}
>>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>>> index 5e2fc1ce089d..2324bf892c56 100644
>>> --- a/hw/vfio/iommufd.c
>>> +++ b/hw/vfio/iommufd.c
>>> @@ -403,6 +403,10 @@ static bool iommufd_cdev_attach(const char
>>> *name, VFIODevice *vbasedev,
>>>         space = vfio_get_address_space(as);
>>>   +    if (!vfio_device_hiod_realize(vbasedev, errp)) {
>>> +        return false;
Hum sorry my previous comment was targetting that place. I think
unrolling is needed up to put_address_space

so effectively this does not match err_alloc_ioas but I guess we would
need another label

Eric
>>> +    }
>>> +
>>>       /* try to attach to an existing container in this space */
>>>       QLIST_FOREACH(bcontainer, &space->containers, next) {
>>>           container = container_of(bcontainer, VFIOIOMMUFDContainer,
>>> bcontainer);
>> Eric
>>
>
Cédric Le Goater July 23, 2024, 8 a.m. UTC | #5
On 7/23/24 09:53, Joao Martins wrote:
> On 23/07/2024 08:38, Eric Auger wrote:
>> Hi Joao,
>>
>> On 7/22/24 23:13, Joao Martins wrote:
>>> Move the HostIOMMUDevice::realize() to be invoked during the attach of the device
>>> before we allocate IOMMUFD hardware pagetable objects (HWPT). This allows the use
>>> of the hw_caps obtained by IOMMU_GET_HW_INFO that essentially tell if the IOMMU
>>> behind the device supports dirty tracking.
>>>
>>> Note: The HostIOMMUDevice data from legacy backend is static and doesn't
>>> need any information from the (type1-iommu) backend to be initialized.
>>> In contrast however, the IOMMUFD HostIOMMUDevice data requires the
>>> iommufd FD to be connected and having a devid to be able to successfully
>> Nit: maybe this comment shall be also added in iommufd.c before the call
>> to vfio_device_hiod_realize() to avoid someone else to move that call
>> earlier at some point
>>> GET_HW_INFO. This means vfio_device_hiod_realize() is called in
>>> different places within the backend .attach_device() implementation.
>>>
>>> Suggested-by: Cédric Le Goater <clg@redhat.cm>
>>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>>> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>>> ---
>>>   include/hw/vfio/vfio-common.h |  1 +
>>>   hw/vfio/common.c              | 16 ++++++----------
>>>   hw/vfio/container.c           |  4 ++++
>>>   hw/vfio/helpers.c             | 11 +++++++++++
>>>   hw/vfio/iommufd.c             |  4 ++++
>>>   5 files changed, 26 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
>>> index 1a96678f8c38..4e44b26d3c45 100644
>>> --- a/include/hw/vfio/vfio-common.h
>>> +++ b/include/hw/vfio/vfio-common.h
>>> @@ -242,6 +242,7 @@ void vfio_region_finalize(VFIORegion *region);
>>>   void vfio_reset_handler(void *opaque);
>>>   struct vfio_device_info *vfio_get_device_info(int fd);
>>>   bool vfio_device_is_mdev(VFIODevice *vbasedev);
>>> +bool vfio_device_hiod_realize(VFIODevice *vbasedev, Error **errp);
>>>   bool vfio_attach_device(char *name, VFIODevice *vbasedev,
>>>                           AddressSpace *as, Error **errp);
>>>   void vfio_detach_device(VFIODevice *vbasedev);
>>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>>> index 784e266e6aab..da12cbd56408 100644
>>> --- a/hw/vfio/common.c
>>> +++ b/hw/vfio/common.c
>>> @@ -1537,7 +1537,7 @@ bool vfio_attach_device(char *name, VFIODevice *vbasedev,
>>>   {
>>>       const VFIOIOMMUClass *ops =
>>>           VFIO_IOMMU_CLASS(object_class_by_name(TYPE_VFIO_IOMMU_LEGACY));
>>> -    HostIOMMUDevice *hiod;
>>> +    HostIOMMUDevice *hiod = NULL;
>>>   
>>>       if (vbasedev->iommufd) {
>>>           ops = VFIO_IOMMU_CLASS(object_class_by_name(TYPE_VFIO_IOMMU_IOMMUFD));
>>> @@ -1545,21 +1545,17 @@ bool vfio_attach_device(char *name, VFIODevice *vbasedev,
>>>   
>>>       assert(ops);
>>>   
>>> -    if (!ops->attach_device(name, vbasedev, as, errp)) {
>>> -        return false;
>>> -    }
>>>   
>>> -    if (vbasedev->mdev) {
>>> -        return true;
>>> +    if (!vbasedev->mdev) {
>>> +        hiod = HOST_IOMMU_DEVICE(object_new(ops->hiod_typename));
>>> +        vbasedev->hiod = hiod;
>>>       }
>>>   
>>> -    hiod = HOST_IOMMU_DEVICE(object_new(ops->hiod_typename));
>>> -    if (!HOST_IOMMU_DEVICE_GET_CLASS(hiod)->realize(hiod, vbasedev, errp)) {
>>> +    if (!ops->attach_device(name, vbasedev, as, errp)) {
>>>           object_unref(hiod);
>>> -        ops->detach_device(vbasedev);
>>> +        vbasedev->hiod = NULL;
>>>           return false;
>>>       }
>>> -    vbasedev->hiod = hiod;
>>>   
>>>       return true;
>>>   }
>>> diff --git a/hw/vfio/container.c b/hw/vfio/container.c
>>> index 10cb4b4320ac..9ccdb639ac84 100644
>>> --- a/hw/vfio/container.c
>>> +++ b/hw/vfio/container.c
>>> @@ -914,6 +914,10 @@ static bool vfio_legacy_attach_device(const char *name, VFIODevice *vbasedev,
>>>   
>>>       trace_vfio_attach_device(vbasedev->name, groupid);
>>>   
>>> +    if (!vfio_device_hiod_realize(vbasedev, errp)) {
>>> +        return false;
>> don't you want to go to err_alloc_ioas instead?
> 
> Oh, yes, I thought I was doing that, but I am not :( Thanks for catching that
> 
> Your comment is spot on but in the wrong place.
> 
> vfio_legacy_attach_device() can just return false as it's at the top of the
> function but here (...)
> 
>>> +    }
>>> +
>>>       group = vfio_get_group(groupid, as, errp);
>>>       if (!group) {
>>>           return false;
>>> diff --git a/hw/vfio/helpers.c b/hw/vfio/helpers.c
>>> index 7e23e9080c9d..ea15c79db0a3 100644
>>> --- a/hw/vfio/helpers.c
>>> +++ b/hw/vfio/helpers.c
>>> @@ -689,3 +689,14 @@ bool vfio_device_is_mdev(VFIODevice *vbasedev)
>>>       subsys = realpath(tmp, NULL);
>>>       return subsys && (strcmp(subsys, "/sys/bus/mdev") == 0);
>>>   }
>>> +
>>> +bool vfio_device_hiod_realize(VFIODevice *vbasedev, Error **errp)
>>> +{
>>> +    HostIOMMUDevice *hiod = vbasedev->hiod;
>>> +
>>> +    if (!hiod) {
>>> +        return true;
>>> +    }
>>> +
>>> +    return HOST_IOMMU_DEVICE_GET_CLASS(hiod)->realize(hiod, vbasedev, errp);
>>> +}
>>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>>> index 5e2fc1ce089d..2324bf892c56 100644
>>> --- a/hw/vfio/iommufd.c
>>> +++ b/hw/vfio/iommufd.c
>>> @@ -403,6 +403,10 @@ static bool iommufd_cdev_attach(const char *name, VFIODevice *vbasedev,
>>>   
>>>       space = vfio_get_address_space(as);
>>>   
>>> +    if (!vfio_device_hiod_realize(vbasedev, errp)) {
>>> +        return false;
>>> +    }
>>> +
> 
> (...) we definitely need a goto err_alloc_ioas here.
> 
> Snip below:
> 
> @@ -482,7 +483,7 @@ static bool iommufd_cdev_attach(const char *name, VFIODevice
> *vbasedev,
>       space = vfio_get_address_space(as);
> 
>       if (!vfio_device_hiod_realize(vbasedev, errp)) {
> -        return false;
> +        goto err_alloc_ioas;
>       }
> 
>       /* try to attach to an existing container in this space */
> 

ok. Applied the changes.


Thanks,

C.
Joao Martins July 23, 2024, 8:05 a.m. UTC | #6
On 23/07/2024 08:55, Eric Auger wrote:
> 
> 
> On 7/23/24 09:44, Cédric Le Goater wrote:
>> On 7/23/24 09:38, Eric Auger wrote:
>>> Hi Joao,
>>>
>>> On 7/22/24 23:13, Joao Martins wrote:
>>>> Move the HostIOMMUDevice::realize() to be invoked during the attach
>>>> of the device
>>>> before we allocate IOMMUFD hardware pagetable objects (HWPT). This
>>>> allows the use
>>>> of the hw_caps obtained by IOMMU_GET_HW_INFO that essentially tell
>>>> if the IOMMU
>>>> behind the device supports dirty tracking.
>>>>
>>>> Note: The HostIOMMUDevice data from legacy backend is static and
>>>> doesn't
>>>> need any information from the (type1-iommu) backend to be initialized.
>>>> In contrast however, the IOMMUFD HostIOMMUDevice data requires the
>>>> iommufd FD to be connected and having a devid to be able to
>>>> successfully
>>> Nit: maybe this comment shall be also added in iommufd.c before the call
>>> to vfio_device_hiod_realize() to avoid someone else to move that call
>>> earlier at some point
>>>> GET_HW_INFO. This means vfio_device_hiod_realize() is called in
>>>> different places within the backend .attach_device() implementation.
>>>>
>>>> Suggested-by: Cédric Le Goater <clg@redhat.cm>
>>>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>>>> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>>>> ---
>>>>   include/hw/vfio/vfio-common.h |  1 +
>>>>   hw/vfio/common.c              | 16 ++++++----------
>>>>   hw/vfio/container.c           |  4 ++++
>>>>   hw/vfio/helpers.c             | 11 +++++++++++
>>>>   hw/vfio/iommufd.c             |  4 ++++
>>>>   5 files changed, 26 insertions(+), 10 deletions(-)
>>>>
>>>> diff --git a/include/hw/vfio/vfio-common.h
>>>> b/include/hw/vfio/vfio-common.h
>>>> index 1a96678f8c38..4e44b26d3c45 100644
>>>> --- a/include/hw/vfio/vfio-common.h
>>>> +++ b/include/hw/vfio/vfio-common.h
>>>> @@ -242,6 +242,7 @@ void vfio_region_finalize(VFIORegion *region);
>>>>   void vfio_reset_handler(void *opaque);
>>>>   struct vfio_device_info *vfio_get_device_info(int fd);
>>>>   bool vfio_device_is_mdev(VFIODevice *vbasedev);
>>>> +bool vfio_device_hiod_realize(VFIODevice *vbasedev, Error **errp);
>>>>   bool vfio_attach_device(char *name, VFIODevice *vbasedev,
>>>>                           AddressSpace *as, Error **errp);
>>>>   void vfio_detach_device(VFIODevice *vbasedev);
>>>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>>>> index 784e266e6aab..da12cbd56408 100644
>>>> --- a/hw/vfio/common.c
>>>> +++ b/hw/vfio/common.c
>>>> @@ -1537,7 +1537,7 @@ bool vfio_attach_device(char *name, VFIODevice
>>>> *vbasedev,
>>>>   {
>>>>       const VFIOIOMMUClass *ops =
>>>>          
>>>> VFIO_IOMMU_CLASS(object_class_by_name(TYPE_VFIO_IOMMU_LEGACY));
>>>> -    HostIOMMUDevice *hiod;
>>>> +    HostIOMMUDevice *hiod = NULL;
>>>>         if (vbasedev->iommufd) {
>>>>           ops =
>>>> VFIO_IOMMU_CLASS(object_class_by_name(TYPE_VFIO_IOMMU_IOMMUFD));
>>>> @@ -1545,21 +1545,17 @@ bool vfio_attach_device(char *name,
>>>> VFIODevice *vbasedev,
>>>>         assert(ops);
>>>>   -    if (!ops->attach_device(name, vbasedev, as, errp)) {
>>>> -        return false;
>>>> -    }
>>>>   -    if (vbasedev->mdev) {
>>>> -        return true;
>>>> +    if (!vbasedev->mdev) {
>>>> +        hiod = HOST_IOMMU_DEVICE(object_new(ops->hiod_typename));
>>>> +        vbasedev->hiod = hiod;
>>>>       }
>>>>   -    hiod = HOST_IOMMU_DEVICE(object_new(ops->hiod_typename));
>>>> -    if (!HOST_IOMMU_DEVICE_GET_CLASS(hiod)->realize(hiod, vbasedev,
>>>> errp)) {
>>>> +    if (!ops->attach_device(name, vbasedev, as, errp)) {
>>>>           object_unref(hiod);
>>>> -        ops->detach_device(vbasedev);
>>>> +        vbasedev->hiod = NULL;
>>>>           return false;
>>>>       }
>>>> -    vbasedev->hiod = hiod;
>>>>         return true;
>>>>   }
>>>> diff --git a/hw/vfio/container.c b/hw/vfio/container.c
>>>> index 10cb4b4320ac..9ccdb639ac84 100644
>>>> --- a/hw/vfio/container.c
>>>> +++ b/hw/vfio/container.c
>>>> @@ -914,6 +914,10 @@ static bool vfio_legacy_attach_device(const
>>>> char *name, VFIODevice *vbasedev,
>>>>         trace_vfio_attach_device(vbasedev->name, groupid);
>>>>   +    if (!vfio_device_hiod_realize(vbasedev, errp)) {
>>>> +        return false;
>>> don't you want to go to err_alloc_ioas instead?
>>
>> hmm, the err_alloc_ioas label is in a different function
>> iommufd_cdev_attach().
>>
>> may be you meant the comment for routine iommufd_cdev_attach() and
>> label err_connect_bind ?
>>
>>
>> Thanks,
>>
>> C.
>>
>>
>>>> +    }
>>>> +
>>>>       group = vfio_get_group(groupid, as, errp);
>>>>       if (!group) {
>>>>           return false;
>>>> diff --git a/hw/vfio/helpers.c b/hw/vfio/helpers.c
>>>> index 7e23e9080c9d..ea15c79db0a3 100644
>>>> --- a/hw/vfio/helpers.c
>>>> +++ b/hw/vfio/helpers.c
>>>> @@ -689,3 +689,14 @@ bool vfio_device_is_mdev(VFIODevice *vbasedev)
>>>>       subsys = realpath(tmp, NULL);
>>>>       return subsys && (strcmp(subsys, "/sys/bus/mdev") == 0);
>>>>   }
>>>> +
>>>> +bool vfio_device_hiod_realize(VFIODevice *vbasedev, Error **errp)
>>>> +{
>>>> +    HostIOMMUDevice *hiod = vbasedev->hiod;
>>>> +
>>>> +    if (!hiod) {
>>>> +        return true;
>>>> +    }
>>>> +
>>>> +    return HOST_IOMMU_DEVICE_GET_CLASS(hiod)->realize(hiod,
>>>> vbasedev, errp);
>>>> +}
>>>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>>>> index 5e2fc1ce089d..2324bf892c56 100644
>>>> --- a/hw/vfio/iommufd.c
>>>> +++ b/hw/vfio/iommufd.c
>>>> @@ -403,6 +403,10 @@ static bool iommufd_cdev_attach(const char
>>>> *name, VFIODevice *vbasedev,
>>>>         space = vfio_get_address_space(as);
>>>>   +    if (!vfio_device_hiod_realize(vbasedev, errp)) {
>>>> +        return false;
> Hum sorry my previous comment was targetting that place. I think
> unrolling is needed up to put_address_space
> 
> so effectively this does not match err_alloc_ioas but I guess we would
> need another label
> 

You're right. We haven't yet attached rthe device and that's what err_alloc_ioas
would do. Adding another label not sure would make things cleaner given the
ordering requirement. So maybe this instead?

@@ -482,7 +483,8 @@ static bool iommufd_cdev_attach(const char *name, VFIODevice
*vbasedev,
     space = vfio_get_address_space(as);

     if (!vfio_device_hiod_realize(vbasedev, errp)) {
-        return false;
+        vfio_put_address_space(space);
+        goto err_connect_bind;
     }

     /* try to attach to an existing container in this space */
Cédric Le Goater July 23, 2024, 8:08 a.m. UTC | #7
On 7/23/24 10:05, Joao Martins wrote:
> On 23/07/2024 08:55, Eric Auger wrote:
>>
>>
>> On 7/23/24 09:44, Cédric Le Goater wrote:
>>> On 7/23/24 09:38, Eric Auger wrote:
>>>> Hi Joao,
>>>>
>>>> On 7/22/24 23:13, Joao Martins wrote:
>>>>> Move the HostIOMMUDevice::realize() to be invoked during the attach
>>>>> of the device
>>>>> before we allocate IOMMUFD hardware pagetable objects (HWPT). This
>>>>> allows the use
>>>>> of the hw_caps obtained by IOMMU_GET_HW_INFO that essentially tell
>>>>> if the IOMMU
>>>>> behind the device supports dirty tracking.
>>>>>
>>>>> Note: The HostIOMMUDevice data from legacy backend is static and
>>>>> doesn't
>>>>> need any information from the (type1-iommu) backend to be initialized.
>>>>> In contrast however, the IOMMUFD HostIOMMUDevice data requires the
>>>>> iommufd FD to be connected and having a devid to be able to
>>>>> successfully
>>>> Nit: maybe this comment shall be also added in iommufd.c before the call
>>>> to vfio_device_hiod_realize() to avoid someone else to move that call
>>>> earlier at some point
>>>>> GET_HW_INFO. This means vfio_device_hiod_realize() is called in
>>>>> different places within the backend .attach_device() implementation.
>>>>>
>>>>> Suggested-by: Cédric Le Goater <clg@redhat.cm>
>>>>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>>>>> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>>>>> ---
>>>>>    include/hw/vfio/vfio-common.h |  1 +
>>>>>    hw/vfio/common.c              | 16 ++++++----------
>>>>>    hw/vfio/container.c           |  4 ++++
>>>>>    hw/vfio/helpers.c             | 11 +++++++++++
>>>>>    hw/vfio/iommufd.c             |  4 ++++
>>>>>    5 files changed, 26 insertions(+), 10 deletions(-)
>>>>>
>>>>> diff --git a/include/hw/vfio/vfio-common.h
>>>>> b/include/hw/vfio/vfio-common.h
>>>>> index 1a96678f8c38..4e44b26d3c45 100644
>>>>> --- a/include/hw/vfio/vfio-common.h
>>>>> +++ b/include/hw/vfio/vfio-common.h
>>>>> @@ -242,6 +242,7 @@ void vfio_region_finalize(VFIORegion *region);
>>>>>    void vfio_reset_handler(void *opaque);
>>>>>    struct vfio_device_info *vfio_get_device_info(int fd);
>>>>>    bool vfio_device_is_mdev(VFIODevice *vbasedev);
>>>>> +bool vfio_device_hiod_realize(VFIODevice *vbasedev, Error **errp);
>>>>>    bool vfio_attach_device(char *name, VFIODevice *vbasedev,
>>>>>                            AddressSpace *as, Error **errp);
>>>>>    void vfio_detach_device(VFIODevice *vbasedev);
>>>>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>>>>> index 784e266e6aab..da12cbd56408 100644
>>>>> --- a/hw/vfio/common.c
>>>>> +++ b/hw/vfio/common.c
>>>>> @@ -1537,7 +1537,7 @@ bool vfio_attach_device(char *name, VFIODevice
>>>>> *vbasedev,
>>>>>    {
>>>>>        const VFIOIOMMUClass *ops =
>>>>>           
>>>>> VFIO_IOMMU_CLASS(object_class_by_name(TYPE_VFIO_IOMMU_LEGACY));
>>>>> -    HostIOMMUDevice *hiod;
>>>>> +    HostIOMMUDevice *hiod = NULL;
>>>>>          if (vbasedev->iommufd) {
>>>>>            ops =
>>>>> VFIO_IOMMU_CLASS(object_class_by_name(TYPE_VFIO_IOMMU_IOMMUFD));
>>>>> @@ -1545,21 +1545,17 @@ bool vfio_attach_device(char *name,
>>>>> VFIODevice *vbasedev,
>>>>>          assert(ops);
>>>>>    -    if (!ops->attach_device(name, vbasedev, as, errp)) {
>>>>> -        return false;
>>>>> -    }
>>>>>    -    if (vbasedev->mdev) {
>>>>> -        return true;
>>>>> +    if (!vbasedev->mdev) {
>>>>> +        hiod = HOST_IOMMU_DEVICE(object_new(ops->hiod_typename));
>>>>> +        vbasedev->hiod = hiod;
>>>>>        }
>>>>>    -    hiod = HOST_IOMMU_DEVICE(object_new(ops->hiod_typename));
>>>>> -    if (!HOST_IOMMU_DEVICE_GET_CLASS(hiod)->realize(hiod, vbasedev,
>>>>> errp)) {
>>>>> +    if (!ops->attach_device(name, vbasedev, as, errp)) {
>>>>>            object_unref(hiod);
>>>>> -        ops->detach_device(vbasedev);
>>>>> +        vbasedev->hiod = NULL;
>>>>>            return false;
>>>>>        }
>>>>> -    vbasedev->hiod = hiod;
>>>>>          return true;
>>>>>    }
>>>>> diff --git a/hw/vfio/container.c b/hw/vfio/container.c
>>>>> index 10cb4b4320ac..9ccdb639ac84 100644
>>>>> --- a/hw/vfio/container.c
>>>>> +++ b/hw/vfio/container.c
>>>>> @@ -914,6 +914,10 @@ static bool vfio_legacy_attach_device(const
>>>>> char *name, VFIODevice *vbasedev,
>>>>>          trace_vfio_attach_device(vbasedev->name, groupid);
>>>>>    +    if (!vfio_device_hiod_realize(vbasedev, errp)) {
>>>>> +        return false;
>>>> don't you want to go to err_alloc_ioas instead?
>>>
>>> hmm, the err_alloc_ioas label is in a different function
>>> iommufd_cdev_attach().
>>>
>>> may be you meant the comment for routine iommufd_cdev_attach() and
>>> label err_connect_bind ?
>>>
>>>
>>> Thanks,
>>>
>>> C.
>>>
>>>
>>>>> +    }
>>>>> +
>>>>>        group = vfio_get_group(groupid, as, errp);
>>>>>        if (!group) {
>>>>>            return false;
>>>>> diff --git a/hw/vfio/helpers.c b/hw/vfio/helpers.c
>>>>> index 7e23e9080c9d..ea15c79db0a3 100644
>>>>> --- a/hw/vfio/helpers.c
>>>>> +++ b/hw/vfio/helpers.c
>>>>> @@ -689,3 +689,14 @@ bool vfio_device_is_mdev(VFIODevice *vbasedev)
>>>>>        subsys = realpath(tmp, NULL);
>>>>>        return subsys && (strcmp(subsys, "/sys/bus/mdev") == 0);
>>>>>    }
>>>>> +
>>>>> +bool vfio_device_hiod_realize(VFIODevice *vbasedev, Error **errp)
>>>>> +{
>>>>> +    HostIOMMUDevice *hiod = vbasedev->hiod;
>>>>> +
>>>>> +    if (!hiod) {
>>>>> +        return true;
>>>>> +    }
>>>>> +
>>>>> +    return HOST_IOMMU_DEVICE_GET_CLASS(hiod)->realize(hiod,
>>>>> vbasedev, errp);
>>>>> +}
>>>>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>>>>> index 5e2fc1ce089d..2324bf892c56 100644
>>>>> --- a/hw/vfio/iommufd.c
>>>>> +++ b/hw/vfio/iommufd.c
>>>>> @@ -403,6 +403,10 @@ static bool iommufd_cdev_attach(const char
>>>>> *name, VFIODevice *vbasedev,
>>>>>          space = vfio_get_address_space(as);
>>>>>    +    if (!vfio_device_hiod_realize(vbasedev, errp)) {
>>>>> +        return false;
>> Hum sorry my previous comment was targetting that place. I think
>> unrolling is needed up to put_address_space
>>
>> so effectively this does not match err_alloc_ioas but I guess we would
>> need another label
>>
> 
> You're right. We haven't yet attached rthe device and that's what err_alloc_ioas
> would do. Adding another label not sure would make things cleaner given the
> ordering requirement. So maybe this instead?
> 
> @@ -482,7 +483,8 @@ static bool iommufd_cdev_attach(const char *name, VFIODevice
> *vbasedev,
>       space = vfio_get_address_space(as);
> 
>       if (!vfio_device_hiod_realize(vbasedev, errp)) {
> -        return false;
> +        vfio_put_address_space(space);
> +        goto err_connect_bind;
>       }
> 
>       /* try to attach to an existing container in this space */
> 

LGTM.


Thanks,

C.
Eric Auger July 23, 2024, 8:10 a.m. UTC | #8
On 7/23/24 10:05, Joao Martins wrote:
> On 23/07/2024 08:55, Eric Auger wrote:
>>
>> On 7/23/24 09:44, Cédric Le Goater wrote:
>>> On 7/23/24 09:38, Eric Auger wrote:
>>>> Hi Joao,
>>>>
>>>> On 7/22/24 23:13, Joao Martins wrote:
>>>>> Move the HostIOMMUDevice::realize() to be invoked during the attach
>>>>> of the device
>>>>> before we allocate IOMMUFD hardware pagetable objects (HWPT). This
>>>>> allows the use
>>>>> of the hw_caps obtained by IOMMU_GET_HW_INFO that essentially tell
>>>>> if the IOMMU
>>>>> behind the device supports dirty tracking.
>>>>>
>>>>> Note: The HostIOMMUDevice data from legacy backend is static and
>>>>> doesn't
>>>>> need any information from the (type1-iommu) backend to be initialized.
>>>>> In contrast however, the IOMMUFD HostIOMMUDevice data requires the
>>>>> iommufd FD to be connected and having a devid to be able to
>>>>> successfully
>>>> Nit: maybe this comment shall be also added in iommufd.c before the call
>>>> to vfio_device_hiod_realize() to avoid someone else to move that call
>>>> earlier at some point
>>>>> GET_HW_INFO. This means vfio_device_hiod_realize() is called in
>>>>> different places within the backend .attach_device() implementation.
>>>>>
>>>>> Suggested-by: Cédric Le Goater <clg@redhat.cm>
>>>>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>>>>> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>>>>> ---
>>>>>   include/hw/vfio/vfio-common.h |  1 +
>>>>>   hw/vfio/common.c              | 16 ++++++----------
>>>>>   hw/vfio/container.c           |  4 ++++
>>>>>   hw/vfio/helpers.c             | 11 +++++++++++
>>>>>   hw/vfio/iommufd.c             |  4 ++++
>>>>>   5 files changed, 26 insertions(+), 10 deletions(-)
>>>>>
>>>>> diff --git a/include/hw/vfio/vfio-common.h
>>>>> b/include/hw/vfio/vfio-common.h
>>>>> index 1a96678f8c38..4e44b26d3c45 100644
>>>>> --- a/include/hw/vfio/vfio-common.h
>>>>> +++ b/include/hw/vfio/vfio-common.h
>>>>> @@ -242,6 +242,7 @@ void vfio_region_finalize(VFIORegion *region);
>>>>>   void vfio_reset_handler(void *opaque);
>>>>>   struct vfio_device_info *vfio_get_device_info(int fd);
>>>>>   bool vfio_device_is_mdev(VFIODevice *vbasedev);
>>>>> +bool vfio_device_hiod_realize(VFIODevice *vbasedev, Error **errp);
>>>>>   bool vfio_attach_device(char *name, VFIODevice *vbasedev,
>>>>>                           AddressSpace *as, Error **errp);
>>>>>   void vfio_detach_device(VFIODevice *vbasedev);
>>>>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>>>>> index 784e266e6aab..da12cbd56408 100644
>>>>> --- a/hw/vfio/common.c
>>>>> +++ b/hw/vfio/common.c
>>>>> @@ -1537,7 +1537,7 @@ bool vfio_attach_device(char *name, VFIODevice
>>>>> *vbasedev,
>>>>>   {
>>>>>       const VFIOIOMMUClass *ops =
>>>>>          
>>>>> VFIO_IOMMU_CLASS(object_class_by_name(TYPE_VFIO_IOMMU_LEGACY));
>>>>> -    HostIOMMUDevice *hiod;
>>>>> +    HostIOMMUDevice *hiod = NULL;
>>>>>         if (vbasedev->iommufd) {
>>>>>           ops =
>>>>> VFIO_IOMMU_CLASS(object_class_by_name(TYPE_VFIO_IOMMU_IOMMUFD));
>>>>> @@ -1545,21 +1545,17 @@ bool vfio_attach_device(char *name,
>>>>> VFIODevice *vbasedev,
>>>>>         assert(ops);
>>>>>   -    if (!ops->attach_device(name, vbasedev, as, errp)) {
>>>>> -        return false;
>>>>> -    }
>>>>>   -    if (vbasedev->mdev) {
>>>>> -        return true;
>>>>> +    if (!vbasedev->mdev) {
>>>>> +        hiod = HOST_IOMMU_DEVICE(object_new(ops->hiod_typename));
>>>>> +        vbasedev->hiod = hiod;
>>>>>       }
>>>>>   -    hiod = HOST_IOMMU_DEVICE(object_new(ops->hiod_typename));
>>>>> -    if (!HOST_IOMMU_DEVICE_GET_CLASS(hiod)->realize(hiod, vbasedev,
>>>>> errp)) {
>>>>> +    if (!ops->attach_device(name, vbasedev, as, errp)) {
>>>>>           object_unref(hiod);
>>>>> -        ops->detach_device(vbasedev);
>>>>> +        vbasedev->hiod = NULL;
>>>>>           return false;
>>>>>       }
>>>>> -    vbasedev->hiod = hiod;
>>>>>         return true;
>>>>>   }
>>>>> diff --git a/hw/vfio/container.c b/hw/vfio/container.c
>>>>> index 10cb4b4320ac..9ccdb639ac84 100644
>>>>> --- a/hw/vfio/container.c
>>>>> +++ b/hw/vfio/container.c
>>>>> @@ -914,6 +914,10 @@ static bool vfio_legacy_attach_device(const
>>>>> char *name, VFIODevice *vbasedev,
>>>>>         trace_vfio_attach_device(vbasedev->name, groupid);
>>>>>   +    if (!vfio_device_hiod_realize(vbasedev, errp)) {
>>>>> +        return false;
>>>> don't you want to go to err_alloc_ioas instead?
>>> hmm, the err_alloc_ioas label is in a different function
>>> iommufd_cdev_attach().
>>>
>>> may be you meant the comment for routine iommufd_cdev_attach() and
>>> label err_connect_bind ?
>>>
>>>
>>> Thanks,
>>>
>>> C.
>>>
>>>
>>>>> +    }
>>>>> +
>>>>>       group = vfio_get_group(groupid, as, errp);
>>>>>       if (!group) {
>>>>>           return false;
>>>>> diff --git a/hw/vfio/helpers.c b/hw/vfio/helpers.c
>>>>> index 7e23e9080c9d..ea15c79db0a3 100644
>>>>> --- a/hw/vfio/helpers.c
>>>>> +++ b/hw/vfio/helpers.c
>>>>> @@ -689,3 +689,14 @@ bool vfio_device_is_mdev(VFIODevice *vbasedev)
>>>>>       subsys = realpath(tmp, NULL);
>>>>>       return subsys && (strcmp(subsys, "/sys/bus/mdev") == 0);
>>>>>   }
>>>>> +
>>>>> +bool vfio_device_hiod_realize(VFIODevice *vbasedev, Error **errp)
>>>>> +{
>>>>> +    HostIOMMUDevice *hiod = vbasedev->hiod;
>>>>> +
>>>>> +    if (!hiod) {
>>>>> +        return true;
>>>>> +    }
>>>>> +
>>>>> +    return HOST_IOMMU_DEVICE_GET_CLASS(hiod)->realize(hiod,
>>>>> vbasedev, errp);
>>>>> +}
>>>>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>>>>> index 5e2fc1ce089d..2324bf892c56 100644
>>>>> --- a/hw/vfio/iommufd.c
>>>>> +++ b/hw/vfio/iommufd.c
>>>>> @@ -403,6 +403,10 @@ static bool iommufd_cdev_attach(const char
>>>>> *name, VFIODevice *vbasedev,
>>>>>         space = vfio_get_address_space(as);
>>>>>   +    if (!vfio_device_hiod_realize(vbasedev, errp)) {
>>>>> +        return false;
>> Hum sorry my previous comment was targetting that place. I think
>> unrolling is needed up to put_address_space
>>
>> so effectively this does not match err_alloc_ioas but I guess we would
>> need another label
>>
> You're right. We haven't yet attached rthe device and that's what err_alloc_ioas
> would do. Adding another label not sure would make things cleaner given the
> ordering requirement. So maybe this instead?
>
> @@ -482,7 +483,8 @@ static bool iommufd_cdev_attach(const char *name, VFIODevice
> *vbasedev,
>      space = vfio_get_address_space(as);
>
>      if (!vfio_device_hiod_realize(vbasedev, errp)) {
> -        return false;
> +        vfio_put_address_space(space);
> +        goto err_connect_bind;
>      }
>
>      /* try to attach to an existing container in this space */
>
With that addition
Reviewed-by: Eric Auger <eric.auger@redhat.com>

Eric
Duan, Zhenzhong July 23, 2024, 8:20 a.m. UTC | #9
>-----Original Message-----
>From: Joao Martins <joao.m.martins@oracle.com>
>Subject: Re: [PATCH v6 4/9] vfio/{iommufd,container}: Invoke
>HostIOMMUDevice::realize() during attach_device()
>
>On 23/07/2024 08:55, Eric Auger wrote:
>>
>>
>> On 7/23/24 09:44, Cédric Le Goater wrote:
>>> On 7/23/24 09:38, Eric Auger wrote:
>>>> Hi Joao,
>>>>
>>>> On 7/22/24 23:13, Joao Martins wrote:
>>>>> Move the HostIOMMUDevice::realize() to be invoked during the attach
>>>>> of the device
>>>>> before we allocate IOMMUFD hardware pagetable objects (HWPT). This
>>>>> allows the use
>>>>> of the hw_caps obtained by IOMMU_GET_HW_INFO that essentially
>tell
>>>>> if the IOMMU
>>>>> behind the device supports dirty tracking.
>>>>>
>>>>> Note: The HostIOMMUDevice data from legacy backend is static and
>>>>> doesn't
>>>>> need any information from the (type1-iommu) backend to be
>initialized.
>>>>> In contrast however, the IOMMUFD HostIOMMUDevice data requires
>the
>>>>> iommufd FD to be connected and having a devid to be able to
>>>>> successfully
>>>> Nit: maybe this comment shall be also added in iommufd.c before the
>call
>>>> to vfio_device_hiod_realize() to avoid someone else to move that call
>>>> earlier at some point
>>>>> GET_HW_INFO. This means vfio_device_hiod_realize() is called in
>>>>> different places within the backend .attach_device() implementation.
>>>>>
>>>>> Suggested-by: Cédric Le Goater <clg@redhat.cm>
>>>>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>>>>> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>>>>> ---
>>>>>   include/hw/vfio/vfio-common.h |  1 +
>>>>>   hw/vfio/common.c              | 16 ++++++----------
>>>>>   hw/vfio/container.c           |  4 ++++
>>>>>   hw/vfio/helpers.c             | 11 +++++++++++
>>>>>   hw/vfio/iommufd.c             |  4 ++++
>>>>>   5 files changed, 26 insertions(+), 10 deletions(-)
>>>>>
>>>>> diff --git a/include/hw/vfio/vfio-common.h
>>>>> b/include/hw/vfio/vfio-common.h
>>>>> index 1a96678f8c38..4e44b26d3c45 100644
>>>>> --- a/include/hw/vfio/vfio-common.h
>>>>> +++ b/include/hw/vfio/vfio-common.h
>>>>> @@ -242,6 +242,7 @@ void vfio_region_finalize(VFIORegion *region);
>>>>>   void vfio_reset_handler(void *opaque);
>>>>>   struct vfio_device_info *vfio_get_device_info(int fd);
>>>>>   bool vfio_device_is_mdev(VFIODevice *vbasedev);
>>>>> +bool vfio_device_hiod_realize(VFIODevice *vbasedev, Error **errp);
>>>>>   bool vfio_attach_device(char *name, VFIODevice *vbasedev,
>>>>>                           AddressSpace *as, Error **errp);
>>>>>   void vfio_detach_device(VFIODevice *vbasedev);
>>>>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>>>>> index 784e266e6aab..da12cbd56408 100644
>>>>> --- a/hw/vfio/common.c
>>>>> +++ b/hw/vfio/common.c
>>>>> @@ -1537,7 +1537,7 @@ bool vfio_attach_device(char *name,
>VFIODevice
>>>>> *vbasedev,
>>>>>   {
>>>>>       const VFIOIOMMUClass *ops =
>>>>>
>>>>>
>VFIO_IOMMU_CLASS(object_class_by_name(TYPE_VFIO_IOMMU_LEGACY));
>>>>> -    HostIOMMUDevice *hiod;
>>>>> +    HostIOMMUDevice *hiod = NULL;
>>>>>         if (vbasedev->iommufd) {
>>>>>           ops =
>>>>>
>VFIO_IOMMU_CLASS(object_class_by_name(TYPE_VFIO_IOMMU_IOMMUF
>D));
>>>>> @@ -1545,21 +1545,17 @@ bool vfio_attach_device(char *name,
>>>>> VFIODevice *vbasedev,
>>>>>         assert(ops);
>>>>>   -    if (!ops->attach_device(name, vbasedev, as, errp)) {
>>>>> -        return false;
>>>>> -    }
>>>>>   -    if (vbasedev->mdev) {
>>>>> -        return true;
>>>>> +    if (!vbasedev->mdev) {
>>>>> +        hiod = HOST_IOMMU_DEVICE(object_new(ops-
>>hiod_typename));
>>>>> +        vbasedev->hiod = hiod;
>>>>>       }
>>>>>   -    hiod = HOST_IOMMU_DEVICE(object_new(ops->hiod_typename));
>>>>> -    if (!HOST_IOMMU_DEVICE_GET_CLASS(hiod)->realize(hiod,
>vbasedev,
>>>>> errp)) {
>>>>> +    if (!ops->attach_device(name, vbasedev, as, errp)) {
>>>>>           object_unref(hiod);
>>>>> -        ops->detach_device(vbasedev);
>>>>> +        vbasedev->hiod = NULL;
>>>>>           return false;
>>>>>       }
>>>>> -    vbasedev->hiod = hiod;
>>>>>         return true;
>>>>>   }
>>>>> diff --git a/hw/vfio/container.c b/hw/vfio/container.c
>>>>> index 10cb4b4320ac..9ccdb639ac84 100644
>>>>> --- a/hw/vfio/container.c
>>>>> +++ b/hw/vfio/container.c
>>>>> @@ -914,6 +914,10 @@ static bool vfio_legacy_attach_device(const
>>>>> char *name, VFIODevice *vbasedev,
>>>>>         trace_vfio_attach_device(vbasedev->name, groupid);
>>>>>   +    if (!vfio_device_hiod_realize(vbasedev, errp)) {
>>>>> +        return false;
>>>> don't you want to go to err_alloc_ioas instead?
>>>
>>> hmm, the err_alloc_ioas label is in a different function
>>> iommufd_cdev_attach().
>>>
>>> may be you meant the comment for routine iommufd_cdev_attach() and
>>> label err_connect_bind ?
>>>
>>>
>>> Thanks,
>>>
>>> C.
>>>
>>>
>>>>> +    }
>>>>> +
>>>>>       group = vfio_get_group(groupid, as, errp);
>>>>>       if (!group) {
>>>>>           return false;
>>>>> diff --git a/hw/vfio/helpers.c b/hw/vfio/helpers.c
>>>>> index 7e23e9080c9d..ea15c79db0a3 100644
>>>>> --- a/hw/vfio/helpers.c
>>>>> +++ b/hw/vfio/helpers.c
>>>>> @@ -689,3 +689,14 @@ bool vfio_device_is_mdev(VFIODevice
>*vbasedev)
>>>>>       subsys = realpath(tmp, NULL);
>>>>>       return subsys && (strcmp(subsys, "/sys/bus/mdev") == 0);
>>>>>   }
>>>>> +
>>>>> +bool vfio_device_hiod_realize(VFIODevice *vbasedev, Error **errp)
>>>>> +{
>>>>> +    HostIOMMUDevice *hiod = vbasedev->hiod;
>>>>> +
>>>>> +    if (!hiod) {
>>>>> +        return true;
>>>>> +    }
>>>>> +
>>>>> +    return HOST_IOMMU_DEVICE_GET_CLASS(hiod)->realize(hiod,
>>>>> vbasedev, errp);
>>>>> +}
>>>>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>>>>> index 5e2fc1ce089d..2324bf892c56 100644
>>>>> --- a/hw/vfio/iommufd.c
>>>>> +++ b/hw/vfio/iommufd.c
>>>>> @@ -403,6 +403,10 @@ static bool iommufd_cdev_attach(const char
>>>>> *name, VFIODevice *vbasedev,
>>>>>         space = vfio_get_address_space(as);
>>>>>   +    if (!vfio_device_hiod_realize(vbasedev, errp)) {
>>>>> +        return false;
>> Hum sorry my previous comment was targetting that place. I think
>> unrolling is needed up to put_address_space
>>
>> so effectively this does not match err_alloc_ioas but I guess we would
>> need another label
>>
>
>You're right. We haven't yet attached rthe device and that's what
>err_alloc_ioas
>would do. Adding another label not sure would make things cleaner given
>the
>ordering requirement. So maybe this instead?
>
>@@ -482,7 +483,8 @@ static bool iommufd_cdev_attach(const char *name,
>VFIODevice
>*vbasedev,
>     space = vfio_get_address_space(as);
>
>     if (!vfio_device_hiod_realize(vbasedev, errp)) {
>-        return false;
>+        vfio_put_address_space(space);
>+        goto err_connect_bind;
>     }
>
>     /* try to attach to an existing container in this space */

I was confused though Cedric and Eric both ACK this change. Don't we miss the iommufd_cdev_unbind_and_disconnect() call?
Eric Auger July 23, 2024, 8:24 a.m. UTC | #10
On 7/23/24 10:20, Duan, Zhenzhong wrote:
>
>> -----Original Message-----
>> From: Joao Martins <joao.m.martins@oracle.com>
>> Subject: Re: [PATCH v6 4/9] vfio/{iommufd,container}: Invoke
>> HostIOMMUDevice::realize() during attach_device()
>>
>> On 23/07/2024 08:55, Eric Auger wrote:
>>>
>>> On 7/23/24 09:44, Cédric Le Goater wrote:
>>>> On 7/23/24 09:38, Eric Auger wrote:
>>>>> Hi Joao,
>>>>>
>>>>> On 7/22/24 23:13, Joao Martins wrote:
>>>>>> Move the HostIOMMUDevice::realize() to be invoked during the attach
>>>>>> of the device
>>>>>> before we allocate IOMMUFD hardware pagetable objects (HWPT). This
>>>>>> allows the use
>>>>>> of the hw_caps obtained by IOMMU_GET_HW_INFO that essentially
>> tell
>>>>>> if the IOMMU
>>>>>> behind the device supports dirty tracking.
>>>>>>
>>>>>> Note: The HostIOMMUDevice data from legacy backend is static and
>>>>>> doesn't
>>>>>> need any information from the (type1-iommu) backend to be
>> initialized.
>>>>>> In contrast however, the IOMMUFD HostIOMMUDevice data requires
>> the
>>>>>> iommufd FD to be connected and having a devid to be able to
>>>>>> successfully
>>>>> Nit: maybe this comment shall be also added in iommufd.c before the
>> call
>>>>> to vfio_device_hiod_realize() to avoid someone else to move that call
>>>>> earlier at some point
>>>>>> GET_HW_INFO. This means vfio_device_hiod_realize() is called in
>>>>>> different places within the backend .attach_device() implementation.
>>>>>>
>>>>>> Suggested-by: Cédric Le Goater <clg@redhat.cm>
>>>>>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>>>>>> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>>>>>> ---
>>>>>>   include/hw/vfio/vfio-common.h |  1 +
>>>>>>   hw/vfio/common.c              | 16 ++++++----------
>>>>>>   hw/vfio/container.c           |  4 ++++
>>>>>>   hw/vfio/helpers.c             | 11 +++++++++++
>>>>>>   hw/vfio/iommufd.c             |  4 ++++
>>>>>>   5 files changed, 26 insertions(+), 10 deletions(-)
>>>>>>
>>>>>> diff --git a/include/hw/vfio/vfio-common.h
>>>>>> b/include/hw/vfio/vfio-common.h
>>>>>> index 1a96678f8c38..4e44b26d3c45 100644
>>>>>> --- a/include/hw/vfio/vfio-common.h
>>>>>> +++ b/include/hw/vfio/vfio-common.h
>>>>>> @@ -242,6 +242,7 @@ void vfio_region_finalize(VFIORegion *region);
>>>>>>   void vfio_reset_handler(void *opaque);
>>>>>>   struct vfio_device_info *vfio_get_device_info(int fd);
>>>>>>   bool vfio_device_is_mdev(VFIODevice *vbasedev);
>>>>>> +bool vfio_device_hiod_realize(VFIODevice *vbasedev, Error **errp);
>>>>>>   bool vfio_attach_device(char *name, VFIODevice *vbasedev,
>>>>>>                           AddressSpace *as, Error **errp);
>>>>>>   void vfio_detach_device(VFIODevice *vbasedev);
>>>>>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>>>>>> index 784e266e6aab..da12cbd56408 100644
>>>>>> --- a/hw/vfio/common.c
>>>>>> +++ b/hw/vfio/common.c
>>>>>> @@ -1537,7 +1537,7 @@ bool vfio_attach_device(char *name,
>> VFIODevice
>>>>>> *vbasedev,
>>>>>>   {
>>>>>>       const VFIOIOMMUClass *ops =
>>>>>>
>>>>>>
>> VFIO_IOMMU_CLASS(object_class_by_name(TYPE_VFIO_IOMMU_LEGACY));
>>>>>> -    HostIOMMUDevice *hiod;
>>>>>> +    HostIOMMUDevice *hiod = NULL;
>>>>>>         if (vbasedev->iommufd) {
>>>>>>           ops =
>>>>>>
>> VFIO_IOMMU_CLASS(object_class_by_name(TYPE_VFIO_IOMMU_IOMMUF
>> D));
>>>>>> @@ -1545,21 +1545,17 @@ bool vfio_attach_device(char *name,
>>>>>> VFIODevice *vbasedev,
>>>>>>         assert(ops);
>>>>>>   -    if (!ops->attach_device(name, vbasedev, as, errp)) {
>>>>>> -        return false;
>>>>>> -    }
>>>>>>   -    if (vbasedev->mdev) {
>>>>>> -        return true;
>>>>>> +    if (!vbasedev->mdev) {
>>>>>> +        hiod = HOST_IOMMU_DEVICE(object_new(ops-
>>> hiod_typename));
>>>>>> +        vbasedev->hiod = hiod;
>>>>>>       }
>>>>>>   -    hiod = HOST_IOMMU_DEVICE(object_new(ops->hiod_typename));
>>>>>> -    if (!HOST_IOMMU_DEVICE_GET_CLASS(hiod)->realize(hiod,
>> vbasedev,
>>>>>> errp)) {
>>>>>> +    if (!ops->attach_device(name, vbasedev, as, errp)) {
>>>>>>           object_unref(hiod);
>>>>>> -        ops->detach_device(vbasedev);
>>>>>> +        vbasedev->hiod = NULL;
>>>>>>           return false;
>>>>>>       }
>>>>>> -    vbasedev->hiod = hiod;
>>>>>>         return true;
>>>>>>   }
>>>>>> diff --git a/hw/vfio/container.c b/hw/vfio/container.c
>>>>>> index 10cb4b4320ac..9ccdb639ac84 100644
>>>>>> --- a/hw/vfio/container.c
>>>>>> +++ b/hw/vfio/container.c
>>>>>> @@ -914,6 +914,10 @@ static bool vfio_legacy_attach_device(const
>>>>>> char *name, VFIODevice *vbasedev,
>>>>>>         trace_vfio_attach_device(vbasedev->name, groupid);
>>>>>>   +    if (!vfio_device_hiod_realize(vbasedev, errp)) {
>>>>>> +        return false;
>>>>> don't you want to go to err_alloc_ioas instead?
>>>> hmm, the err_alloc_ioas label is in a different function
>>>> iommufd_cdev_attach().
>>>>
>>>> may be you meant the comment for routine iommufd_cdev_attach() and
>>>> label err_connect_bind ?
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> C.
>>>>
>>>>
>>>>>> +    }
>>>>>> +
>>>>>>       group = vfio_get_group(groupid, as, errp);
>>>>>>       if (!group) {
>>>>>>           return false;
>>>>>> diff --git a/hw/vfio/helpers.c b/hw/vfio/helpers.c
>>>>>> index 7e23e9080c9d..ea15c79db0a3 100644
>>>>>> --- a/hw/vfio/helpers.c
>>>>>> +++ b/hw/vfio/helpers.c
>>>>>> @@ -689,3 +689,14 @@ bool vfio_device_is_mdev(VFIODevice
>> *vbasedev)
>>>>>>       subsys = realpath(tmp, NULL);
>>>>>>       return subsys && (strcmp(subsys, "/sys/bus/mdev") == 0);
>>>>>>   }
>>>>>> +
>>>>>> +bool vfio_device_hiod_realize(VFIODevice *vbasedev, Error **errp)
>>>>>> +{
>>>>>> +    HostIOMMUDevice *hiod = vbasedev->hiod;
>>>>>> +
>>>>>> +    if (!hiod) {
>>>>>> +        return true;
>>>>>> +    }
>>>>>> +
>>>>>> +    return HOST_IOMMU_DEVICE_GET_CLASS(hiod)->realize(hiod,
>>>>>> vbasedev, errp);
>>>>>> +}
>>>>>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>>>>>> index 5e2fc1ce089d..2324bf892c56 100644
>>>>>> --- a/hw/vfio/iommufd.c
>>>>>> +++ b/hw/vfio/iommufd.c
>>>>>> @@ -403,6 +403,10 @@ static bool iommufd_cdev_attach(const char
>>>>>> *name, VFIODevice *vbasedev,
>>>>>>         space = vfio_get_address_space(as);
>>>>>>   +    if (!vfio_device_hiod_realize(vbasedev, errp)) {
>>>>>> +        return false;
>>> Hum sorry my previous comment was targetting that place. I think
>>> unrolling is needed up to put_address_space
>>>
>>> so effectively this does not match err_alloc_ioas but I guess we would
>>> need another label
>>>
>> You're right. We haven't yet attached rthe device and that's what
>> err_alloc_ioas
>> would do. Adding another label not sure would make things cleaner given
>> the
>> ordering requirement. So maybe this instead?
>>
>> @@ -482,7 +483,8 @@ static bool iommufd_cdev_attach(const char *name,
>> VFIODevice
>> *vbasedev,
>>     space = vfio_get_address_space(as);
>>
>>     if (!vfio_device_hiod_realize(vbasedev, errp)) {
>> -        return false;
>> +        vfio_put_address_space(space);
>> +        goto err_connect_bind;
>>     }
>>
>>     /* try to attach to an existing container in this space */
> I was confused though Cedric and Eric both ACK this change. Don't we miss the iommufd_cdev_unbind_and_disconnect() call?
Hum yes you're right. connect and bind was done. I thought this was done
later. so err_alloc_ioas label looks good

Eric
>
Joao Martins July 23, 2024, 8:26 a.m. UTC | #11
On 23/07/2024 09:24, Eric Auger wrote:
> 
> 
> On 7/23/24 10:20, Duan, Zhenzhong wrote:
>>
>>> -----Original Message-----
>>> From: Joao Martins <joao.m.martins@oracle.com>
>>> Subject: Re: [PATCH v6 4/9] vfio/{iommufd,container}: Invoke
>>> HostIOMMUDevice::realize() during attach_device()
>>>
>>> On 23/07/2024 08:55, Eric Auger wrote:
>>>>
>>>> On 7/23/24 09:44, Cédric Le Goater wrote:
>>>>> On 7/23/24 09:38, Eric Auger wrote:
>>>>>> Hi Joao,
>>>>>>
>>>>>> On 7/22/24 23:13, Joao Martins wrote:
>>>>>>> Move the HostIOMMUDevice::realize() to be invoked during the attach
>>>>>>> of the device
>>>>>>> before we allocate IOMMUFD hardware pagetable objects (HWPT). This
>>>>>>> allows the use
>>>>>>> of the hw_caps obtained by IOMMU_GET_HW_INFO that essentially
>>> tell
>>>>>>> if the IOMMU
>>>>>>> behind the device supports dirty tracking.
>>>>>>>
>>>>>>> Note: The HostIOMMUDevice data from legacy backend is static and
>>>>>>> doesn't
>>>>>>> need any information from the (type1-iommu) backend to be
>>> initialized.
>>>>>>> In contrast however, the IOMMUFD HostIOMMUDevice data requires
>>> the
>>>>>>> iommufd FD to be connected and having a devid to be able to
>>>>>>> successfully
>>>>>> Nit: maybe this comment shall be also added in iommufd.c before the
>>> call
>>>>>> to vfio_device_hiod_realize() to avoid someone else to move that call
>>>>>> earlier at some point
>>>>>>> GET_HW_INFO. This means vfio_device_hiod_realize() is called in
>>>>>>> different places within the backend .attach_device() implementation.
>>>>>>>
>>>>>>> Suggested-by: Cédric Le Goater <clg@redhat.cm>
>>>>>>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>>>>>>> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>>>>>>> ---
>>>>>>>   include/hw/vfio/vfio-common.h |  1 +
>>>>>>>   hw/vfio/common.c              | 16 ++++++----------
>>>>>>>   hw/vfio/container.c           |  4 ++++
>>>>>>>   hw/vfio/helpers.c             | 11 +++++++++++
>>>>>>>   hw/vfio/iommufd.c             |  4 ++++
>>>>>>>   5 files changed, 26 insertions(+), 10 deletions(-)
>>>>>>>
>>>>>>> diff --git a/include/hw/vfio/vfio-common.h
>>>>>>> b/include/hw/vfio/vfio-common.h
>>>>>>> index 1a96678f8c38..4e44b26d3c45 100644
>>>>>>> --- a/include/hw/vfio/vfio-common.h
>>>>>>> +++ b/include/hw/vfio/vfio-common.h
>>>>>>> @@ -242,6 +242,7 @@ void vfio_region_finalize(VFIORegion *region);
>>>>>>>   void vfio_reset_handler(void *opaque);
>>>>>>>   struct vfio_device_info *vfio_get_device_info(int fd);
>>>>>>>   bool vfio_device_is_mdev(VFIODevice *vbasedev);
>>>>>>> +bool vfio_device_hiod_realize(VFIODevice *vbasedev, Error **errp);
>>>>>>>   bool vfio_attach_device(char *name, VFIODevice *vbasedev,
>>>>>>>                           AddressSpace *as, Error **errp);
>>>>>>>   void vfio_detach_device(VFIODevice *vbasedev);
>>>>>>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>>>>>>> index 784e266e6aab..da12cbd56408 100644
>>>>>>> --- a/hw/vfio/common.c
>>>>>>> +++ b/hw/vfio/common.c
>>>>>>> @@ -1537,7 +1537,7 @@ bool vfio_attach_device(char *name,
>>> VFIODevice
>>>>>>> *vbasedev,
>>>>>>>   {
>>>>>>>       const VFIOIOMMUClass *ops =
>>>>>>>
>>>>>>>
>>> VFIO_IOMMU_CLASS(object_class_by_name(TYPE_VFIO_IOMMU_LEGACY));
>>>>>>> -    HostIOMMUDevice *hiod;
>>>>>>> +    HostIOMMUDevice *hiod = NULL;
>>>>>>>         if (vbasedev->iommufd) {
>>>>>>>           ops =
>>>>>>>
>>> VFIO_IOMMU_CLASS(object_class_by_name(TYPE_VFIO_IOMMU_IOMMUF
>>> D));
>>>>>>> @@ -1545,21 +1545,17 @@ bool vfio_attach_device(char *name,
>>>>>>> VFIODevice *vbasedev,
>>>>>>>         assert(ops);
>>>>>>>   -    if (!ops->attach_device(name, vbasedev, as, errp)) {
>>>>>>> -        return false;
>>>>>>> -    }
>>>>>>>   -    if (vbasedev->mdev) {
>>>>>>> -        return true;
>>>>>>> +    if (!vbasedev->mdev) {
>>>>>>> +        hiod = HOST_IOMMU_DEVICE(object_new(ops-
>>>> hiod_typename));
>>>>>>> +        vbasedev->hiod = hiod;
>>>>>>>       }
>>>>>>>   -    hiod = HOST_IOMMU_DEVICE(object_new(ops->hiod_typename));
>>>>>>> -    if (!HOST_IOMMU_DEVICE_GET_CLASS(hiod)->realize(hiod,
>>> vbasedev,
>>>>>>> errp)) {
>>>>>>> +    if (!ops->attach_device(name, vbasedev, as, errp)) {
>>>>>>>           object_unref(hiod);
>>>>>>> -        ops->detach_device(vbasedev);
>>>>>>> +        vbasedev->hiod = NULL;
>>>>>>>           return false;
>>>>>>>       }
>>>>>>> -    vbasedev->hiod = hiod;
>>>>>>>         return true;
>>>>>>>   }
>>>>>>> diff --git a/hw/vfio/container.c b/hw/vfio/container.c
>>>>>>> index 10cb4b4320ac..9ccdb639ac84 100644
>>>>>>> --- a/hw/vfio/container.c
>>>>>>> +++ b/hw/vfio/container.c
>>>>>>> @@ -914,6 +914,10 @@ static bool vfio_legacy_attach_device(const
>>>>>>> char *name, VFIODevice *vbasedev,
>>>>>>>         trace_vfio_attach_device(vbasedev->name, groupid);
>>>>>>>   +    if (!vfio_device_hiod_realize(vbasedev, errp)) {
>>>>>>> +        return false;
>>>>>> don't you want to go to err_alloc_ioas instead?
>>>>> hmm, the err_alloc_ioas label is in a different function
>>>>> iommufd_cdev_attach().
>>>>>
>>>>> may be you meant the comment for routine iommufd_cdev_attach() and
>>>>> label err_connect_bind ?
>>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>> C.
>>>>>
>>>>>
>>>>>>> +    }
>>>>>>> +
>>>>>>>       group = vfio_get_group(groupid, as, errp);
>>>>>>>       if (!group) {
>>>>>>>           return false;
>>>>>>> diff --git a/hw/vfio/helpers.c b/hw/vfio/helpers.c
>>>>>>> index 7e23e9080c9d..ea15c79db0a3 100644
>>>>>>> --- a/hw/vfio/helpers.c
>>>>>>> +++ b/hw/vfio/helpers.c
>>>>>>> @@ -689,3 +689,14 @@ bool vfio_device_is_mdev(VFIODevice
>>> *vbasedev)
>>>>>>>       subsys = realpath(tmp, NULL);
>>>>>>>       return subsys && (strcmp(subsys, "/sys/bus/mdev") == 0);
>>>>>>>   }
>>>>>>> +
>>>>>>> +bool vfio_device_hiod_realize(VFIODevice *vbasedev, Error **errp)
>>>>>>> +{
>>>>>>> +    HostIOMMUDevice *hiod = vbasedev->hiod;
>>>>>>> +
>>>>>>> +    if (!hiod) {
>>>>>>> +        return true;
>>>>>>> +    }
>>>>>>> +
>>>>>>> +    return HOST_IOMMU_DEVICE_GET_CLASS(hiod)->realize(hiod,
>>>>>>> vbasedev, errp);
>>>>>>> +}
>>>>>>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>>>>>>> index 5e2fc1ce089d..2324bf892c56 100644
>>>>>>> --- a/hw/vfio/iommufd.c
>>>>>>> +++ b/hw/vfio/iommufd.c
>>>>>>> @@ -403,6 +403,10 @@ static bool iommufd_cdev_attach(const char
>>>>>>> *name, VFIODevice *vbasedev,
>>>>>>>         space = vfio_get_address_space(as);
>>>>>>>   +    if (!vfio_device_hiod_realize(vbasedev, errp)) {
>>>>>>> +        return false;
>>>> Hum sorry my previous comment was targetting that place. I think
>>>> unrolling is needed up to put_address_space
>>>>
>>>> so effectively this does not match err_alloc_ioas but I guess we would
>>>> need another label
>>>>
>>> You're right. We haven't yet attached rthe device and that's what
>>> err_alloc_ioas
>>> would do. Adding another label not sure would make things cleaner given
>>> the
>>> ordering requirement. So maybe this instead?
>>>
>>> @@ -482,7 +483,8 @@ static bool iommufd_cdev_attach(const char *name,
>>> VFIODevice
>>> *vbasedev,
>>>     space = vfio_get_address_space(as);
>>>
>>>     if (!vfio_device_hiod_realize(vbasedev, errp)) {
>>> -        return false;
>>> +        vfio_put_address_space(space);
>>> +        goto err_connect_bind;
>>>     }
>>>
>>>     /* try to attach to an existing container in this space */
>> I was confused though Cedric and Eric both ACK this change. Don't we miss the iommufd_cdev_unbind_and_disconnect() call?
> Hum yes you're right. connect and bind was done. I thought this was done
> later. so err_alloc_ioas label looks good
> 
It seems you were right the first time.

I definitely haven't got coffee yet.
Cédric Le Goater July 23, 2024, 8:35 a.m. UTC | #12
On 7/22/24 23:13, Joao Martins wrote:
> This small series adds support for IOMMU dirty tracking support via the
> IOMMUFD backend. The hardware capability is available on most recent x86
> hardware (and these SMMUv3 in upcoming v6.11). The series is divided
> organized as follows:
> 
> * Patches 1 - 7: IOMMUFD backend support for dirty tracking;
> 
> Introduce auto domains -- Patch 3 goes into more detail, but the gist is that
> we will find and attach a device to a compatible IOMMU domain, or allocate a new
> hardware pagetable *or* rely on kernel IOAS attach (for mdevs). Afterwards the
> workflow is relatively simple:
> 
> 1) Probe device and allow dirty tracking in the HWPT
> 2) Toggling dirty tracking on/off
> 3) Read-and-clear of Dirty IOVAs
> 
> The heuristics selected for (1) were to always request the HWPT for
> dirty tracking if supported, or rely on device dirty page tracking. This
> is a little simplistic and we aren't necessarily utilizing IOMMU dirty
> tracking even if we ask during hwpt allocation.
> 
> The unmap case is deferred until further vIOMMU support with migration
> is added[3] which will then introduce the usage of
> IOMMU_HWPT_GET_DIRTY_BITMAP_NO_CLEAR in GET_DIRTY_BITMAP ioctl in the
> dma unmap bitmap flow.
> 
> * Patches 8 - 9: Don't block live migration where there's no VF dirty
> tracker, considering that we have IOMMU dirty tracking.
> 
> Comments and feedback appreciated (on patches 1, 5, 8, 9)
> 
> Cheers,
>      Joao
> 
> P.S. Suggest v6.11-rc as hypervisor kernel as there's
> some bugs fixed there with regards to IOMMU hugepage dirty tracking.
> 
> Changes since v5[6]:
> * Remove patches 1-4 as these were commited to vfio-next
> * Add the Rb by Cedric and Zhenzhong (previously patches 7, 8, 10, 11)
> * Introduce VFIODevice::iommu_dirty_tracking and use it on patch 5, 8
> to store whether we can use IOMMU dirty tracking.
> 
> Changes since v4[5]:
> * Add various Reviewed-by in patches 2,3,4,6,8,11
> * Change error messages to mention IOMMU (Zhenzhong)
> * Better improve the checking of dirty page tracking in
>    vfio_migration_realize() to detect per-device IOMMU instead of using
>    container dirty_page_supported().
> * Improve various commit messages (Eric)
> * Extract the caps::hw_caps into its own patch as it was miosleading to
> be hidden in another patch (new patch 7)
> * Restructure patch 1 helper to be vfio_device_is_mdev() and use
> vfio::mdev directly in rest of patches (Cedric)
> * Improve error messages of set,query dirty tracking (Cedric)
> * Add missing casts to uintptr and uint64_t* (Cedric)
> * Add missing commens to struct doc from aw_bits removal (and hw_caps
> addition) (Eric)
> * Fix the detach flow in auto domains (Eric)
> * Set hwpt to NULL on detach (Eric)
> * Spurious line (Eric)
> 
> Changes since v3[5]:
> * Skip HostIOMMUDevice::realize for mdev, and introduce a helper to check if the VFIO
>    device is mdev. (Zhenzhong)
> * Skip setting IOMMU device for mdev (Zhenzhong)
> * Add Zhenzhong review tag in patch 3
> * Utilize vbasedev::bcontainer::dirty_pages_supported instead of introducing
>    a new HostIOMMUDevice capability and thus remove the cap patch from the series (Zhenzhong)
> * Move the HostIOMMUDevice::realize() to be part of VFIODevice initialization in attach_device()
> while skipping it all together for mdev. (Cedric)
> * Due to the previous item, had to remove aw_bits because it depends on device attach being
> finished, instead defer it to when get_cap() gets called.
> * Skip auto domains for mdev instead of purposedly erroring out (Zhenzhong)
> * Pass errp in all cases, and instead just free the error in case of -EINVAL
>    in most of all patches, and also pass Error* in iommufd_backend_alloc_hwpt() amd
>    set/query dirty. This is made better thanks in part to skipping auto domains for mdev (Cedric)
> 
> Changes since RFCv2[4]:
> * Always allocate hwpt with IOMMU_HWPT_ALLOC_DIRTY_TRACKING even if
> we end up not actually toggling dirty tracking. (Avihai)
> * Fix error handling widely in auto domains logic and all patches (Avihai)
> * Reuse iommufd_backend_get_device_info() for capabilities (Zhenzhong)
> * New patches 1 and 2 taking into consideration previous comments.
> * Store hwpt::flags to know if we have dirty tracking (Avihai)
> * New patch 8, that allows to query dirty tracking support after
> provisioning. This is a cleaner way to check IOMMU dirty tracking support
> when vfio::migration is iniitalized, as opposed to RFCv2 via device caps.
> device caps way is still used because at vfio attach we aren't yet with
> a fully initialized migration state.
> * Adopt error propagation in query,set dirty tracking
> * Misc improvements overall broadly and Avihai
> * Drop hugepages as it's a bit unrelated; I can pursue that patch
> * separately. The main motivation is to provide a way to test
> without hugepages similar to what vfio_type1_iommu.disable_hugepages=1
> does.
> 
> Changes since RFCv1[2]:
> * Remove intel/amd dirty tracking emulation enabling
> * Remove the dirtyrate improvement for VF/IOMMU dirty tracking
> [Will pursue these two in separate series]
> * Introduce auto domains support
> * Enforce dirty tracking following the IOMMUFD UAPI for this
> * Add support for toggling hugepages in IOMMUFD
> * Auto enable support when VF supports migration to use IOMMU
> when it doesn't have VF dirty tracking
> * Add a parameter to toggle VF dirty tracking
> 
> [0] https://lore.kernel.org/qemu-devel/20240201072818.327930-1-zhenzhong.duan@intel.com/
> [1] https://lore.kernel.org/qemu-devel/20240201072818.327930-10-zhenzhong.duan@intel.com/
> [2] https://lore.kernel.org/qemu-devel/20220428211351.3897-1-joao.m.martins@oracle.com/
> [3] https://lore.kernel.org/qemu-devel/20230622214845.3980-1-joao.m.martins@oracle.com/
> [4] https://lore.kernel.org/qemu-devel/20240212135643.5858-1-joao.m.martins@oracle.com/
> [5] https://lore.kernel.org/qemu-devel/20240708143420.16953-1-joao.m.martins@oracle.com/
> [6] https://lore.kernel.org/qemu-devel/20240719120501.81279-1-joao.m.martins@oracle.com/
> 
> Joao Martins (9):
>    vfio/iommufd: Introduce auto domain creation
>    vfio/{iommufd,container}: Remove caps::aw_bits
>    vfio/iommufd: Add hw_caps field to HostIOMMUDeviceCaps
>    vfio/{iommufd,container}: Invoke HostIOMMUDevice::realize() during
>      attach_device()
>    vfio/iommufd: Probe and request hwpt dirty tracking capability
>    vfio/iommufd: Implement VFIOIOMMUClass::set_dirty_tracking support
>    vfio/iommufd: Implement VFIOIOMMUClass::query_dirty_bitmap support
>    vfio/migration: Don't block migration device dirty tracking is
>      unsupported
>    vfio/common: Allow disabling device dirty page tracking
> 
>   include/hw/vfio/vfio-common.h      |  13 +++
>   include/sysemu/host_iommu_device.h |   5 +-
>   include/sysemu/iommufd.h           |  11 ++
>   backends/iommufd.c                 |  85 ++++++++++++++-
>   hw/vfio/common.c                   |  19 ++--
>   hw/vfio/container.c                |   9 +-
>   hw/vfio/helpers.c                  |  11 ++
>   hw/vfio/iommufd.c                  | 170 ++++++++++++++++++++++++++++-
>   hw/vfio/migration.c                |  12 +-
>   hw/vfio/pci.c                      |   3 +
>   backends/trace-events              |   3 +
>   11 files changed, 318 insertions(+), 23 deletions(-)
> 


Applied to vfio-next with the changes that were discussed this morning.
Please check.

Thanks,

C.
Joao Martins July 23, 2024, 8:56 a.m. UTC | #13
On 23/07/2024 09:35, Cédric Le Goater wrote:
> On 7/22/24 23:13, Joao Martins wrote:
>> This small series adds support for IOMMU dirty tracking support via the
>> IOMMUFD backend. The hardware capability is available on most recent x86
>> hardware (and these SMMUv3 in upcoming v6.11). The series is divided
>> organized as follows:
>>
>> * Patches 1 - 7: IOMMUFD backend support for dirty tracking;
>>
>> Introduce auto domains -- Patch 3 goes into more detail, but the gist is that
>> we will find and attach a device to a compatible IOMMU domain, or allocate a new
>> hardware pagetable *or* rely on kernel IOAS attach (for mdevs). Afterwards the
>> workflow is relatively simple:
>>
>> 1) Probe device and allow dirty tracking in the HWPT
>> 2) Toggling dirty tracking on/off
>> 3) Read-and-clear of Dirty IOVAs
>>
>> The heuristics selected for (1) were to always request the HWPT for
>> dirty tracking if supported, or rely on device dirty page tracking. This
>> is a little simplistic and we aren't necessarily utilizing IOMMU dirty
>> tracking even if we ask during hwpt allocation.
>>
>> The unmap case is deferred until further vIOMMU support with migration
>> is added[3] which will then introduce the usage of
>> IOMMU_HWPT_GET_DIRTY_BITMAP_NO_CLEAR in GET_DIRTY_BITMAP ioctl in the
>> dma unmap bitmap flow.
>>
>> * Patches 8 - 9: Don't block live migration where there's no VF dirty
>> tracker, considering that we have IOMMU dirty tracking.
>>
>> Comments and feedback appreciated (on patches 1, 5, 8, 9)
>>
>> Cheers,
>>      Joao
>>
>> P.S. Suggest v6.11-rc as hypervisor kernel as there's
>> some bugs fixed there with regards to IOMMU hugepage dirty tracking.
>>
>> Changes since v5[6]:
>> * Remove patches 1-4 as these were commited to vfio-next
>> * Add the Rb by Cedric and Zhenzhong (previously patches 7, 8, 10, 11)
>> * Introduce VFIODevice::iommu_dirty_tracking and use it on patch 5, 8
>> to store whether we can use IOMMU dirty tracking.
>>
>> Changes since v4[5]:
>> * Add various Reviewed-by in patches 2,3,4,6,8,11
>> * Change error messages to mention IOMMU (Zhenzhong)
>> * Better improve the checking of dirty page tracking in
>>    vfio_migration_realize() to detect per-device IOMMU instead of using
>>    container dirty_page_supported().
>> * Improve various commit messages (Eric)
>> * Extract the caps::hw_caps into its own patch as it was miosleading to
>> be hidden in another patch (new patch 7)
>> * Restructure patch 1 helper to be vfio_device_is_mdev() and use
>> vfio::mdev directly in rest of patches (Cedric)
>> * Improve error messages of set,query dirty tracking (Cedric)
>> * Add missing casts to uintptr and uint64_t* (Cedric)
>> * Add missing commens to struct doc from aw_bits removal (and hw_caps
>> addition) (Eric)
>> * Fix the detach flow in auto domains (Eric)
>> * Set hwpt to NULL on detach (Eric)
>> * Spurious line (Eric)
>>
>> Changes since v3[5]:
>> * Skip HostIOMMUDevice::realize for mdev, and introduce a helper to check if
>> the VFIO
>>    device is mdev. (Zhenzhong)
>> * Skip setting IOMMU device for mdev (Zhenzhong)
>> * Add Zhenzhong review tag in patch 3
>> * Utilize vbasedev::bcontainer::dirty_pages_supported instead of introducing
>>    a new HostIOMMUDevice capability and thus remove the cap patch from the
>> series (Zhenzhong)
>> * Move the HostIOMMUDevice::realize() to be part of VFIODevice initialization
>> in attach_device()
>> while skipping it all together for mdev. (Cedric)
>> * Due to the previous item, had to remove aw_bits because it depends on device
>> attach being
>> finished, instead defer it to when get_cap() gets called.
>> * Skip auto domains for mdev instead of purposedly erroring out (Zhenzhong)
>> * Pass errp in all cases, and instead just free the error in case of -EINVAL
>>    in most of all patches, and also pass Error* in
>> iommufd_backend_alloc_hwpt() amd
>>    set/query dirty. This is made better thanks in part to skipping auto
>> domains for mdev (Cedric)
>>
>> Changes since RFCv2[4]:
>> * Always allocate hwpt with IOMMU_HWPT_ALLOC_DIRTY_TRACKING even if
>> we end up not actually toggling dirty tracking. (Avihai)
>> * Fix error handling widely in auto domains logic and all patches (Avihai)
>> * Reuse iommufd_backend_get_device_info() for capabilities (Zhenzhong)
>> * New patches 1 and 2 taking into consideration previous comments.
>> * Store hwpt::flags to know if we have dirty tracking (Avihai)
>> * New patch 8, that allows to query dirty tracking support after
>> provisioning. This is a cleaner way to check IOMMU dirty tracking support
>> when vfio::migration is iniitalized, as opposed to RFCv2 via device caps.
>> device caps way is still used because at vfio attach we aren't yet with
>> a fully initialized migration state.
>> * Adopt error propagation in query,set dirty tracking
>> * Misc improvements overall broadly and Avihai
>> * Drop hugepages as it's a bit unrelated; I can pursue that patch
>> * separately. The main motivation is to provide a way to test
>> without hugepages similar to what vfio_type1_iommu.disable_hugepages=1
>> does.
>>
>> Changes since RFCv1[2]:
>> * Remove intel/amd dirty tracking emulation enabling
>> * Remove the dirtyrate improvement for VF/IOMMU dirty tracking
>> [Will pursue these two in separate series]
>> * Introduce auto domains support
>> * Enforce dirty tracking following the IOMMUFD UAPI for this
>> * Add support for toggling hugepages in IOMMUFD
>> * Auto enable support when VF supports migration to use IOMMU
>> when it doesn't have VF dirty tracking
>> * Add a parameter to toggle VF dirty tracking
>>
>> [0]
>> https://lore.kernel.org/qemu-devel/20240201072818.327930-1-zhenzhong.duan@intel.com/
>> [1]
>> https://lore.kernel.org/qemu-devel/20240201072818.327930-10-zhenzhong.duan@intel.com/
>> [2]
>> https://lore.kernel.org/qemu-devel/20220428211351.3897-1-joao.m.martins@oracle.com/
>> [3]
>> https://lore.kernel.org/qemu-devel/20230622214845.3980-1-joao.m.martins@oracle.com/
>> [4]
>> https://lore.kernel.org/qemu-devel/20240212135643.5858-1-joao.m.martins@oracle.com/
>> [5]
>> https://lore.kernel.org/qemu-devel/20240708143420.16953-1-joao.m.martins@oracle.com/
>> [6]
>> https://lore.kernel.org/qemu-devel/20240719120501.81279-1-joao.m.martins@oracle.com/
>>
>> Joao Martins (9):
>>    vfio/iommufd: Introduce auto domain creation
>>    vfio/{iommufd,container}: Remove caps::aw_bits
>>    vfio/iommufd: Add hw_caps field to HostIOMMUDeviceCaps
>>    vfio/{iommufd,container}: Invoke HostIOMMUDevice::realize() during
>>      attach_device()
>>    vfio/iommufd: Probe and request hwpt dirty tracking capability
>>    vfio/iommufd: Implement VFIOIOMMUClass::set_dirty_tracking support
>>    vfio/iommufd: Implement VFIOIOMMUClass::query_dirty_bitmap support
>>    vfio/migration: Don't block migration device dirty tracking is
>>      unsupported
>>    vfio/common: Allow disabling device dirty page tracking
>>
>>   include/hw/vfio/vfio-common.h      |  13 +++
>>   include/sysemu/host_iommu_device.h |   5 +-
>>   include/sysemu/iommufd.h           |  11 ++
>>   backends/iommufd.c                 |  85 ++++++++++++++-
>>   hw/vfio/common.c                   |  19 ++--
>>   hw/vfio/container.c                |   9 +-
>>   hw/vfio/helpers.c                  |  11 ++
>>   hw/vfio/iommufd.c                  | 170 ++++++++++++++++++++++++++++-
>>   hw/vfio/migration.c                |  12 +-
>>   hw/vfio/pci.c                      |   3 +
>>   backends/trace-events              |   3 +
>>   11 files changed, 318 insertions(+), 23 deletions(-)
> 
> Applied to vfio-next with the changes that were discussed this morning.
> Please check.
> 

I think the only thing missing is in the fourth patch to add the comment Eric
suggested (see below). Other than that, looks good to me.

diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index 50ffa4b77090..abb6f1a4b4a8 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -488,6 +488,13 @@ static bool iommufd_cdev_attach(const char *name,
VFIODevice *vbasedev,

     space = vfio_get_address_space(as);

+    /*
+     * The HostIOMMUDevice data from legacy backend is static and doesn't need
+     * any information from the (type1-iommu) backend to be initialized. In
+     * contrast however, the IOMMUFD HostIOMMUDevice data requires the iommufd
+     * FD to be connected and having a devid to be able to successfully call
+     * iommufd_backend_get_device_info().
+     */
     if (!vfio_device_hiod_realize(vbasedev, errp)) {
         goto err_alloc_ioas;
     }
Cédric Le Goater July 23, 2024, 9:08 a.m. UTC | #14
On 7/23/24 10:56, Joao Martins wrote:
> On 23/07/2024 09:35, Cédric Le Goater wrote:
>> On 7/22/24 23:13, Joao Martins wrote:
>>> This small series adds support for IOMMU dirty tracking support via the
>>> IOMMUFD backend. The hardware capability is available on most recent x86
>>> hardware (and these SMMUv3 in upcoming v6.11). The series is divided
>>> organized as follows:
>>>
>>> * Patches 1 - 7: IOMMUFD backend support for dirty tracking;
>>>
>>> Introduce auto domains -- Patch 3 goes into more detail, but the gist is that
>>> we will find and attach a device to a compatible IOMMU domain, or allocate a new
>>> hardware pagetable *or* rely on kernel IOAS attach (for mdevs). Afterwards the
>>> workflow is relatively simple:
>>>
>>> 1) Probe device and allow dirty tracking in the HWPT
>>> 2) Toggling dirty tracking on/off
>>> 3) Read-and-clear of Dirty IOVAs
>>>
>>> The heuristics selected for (1) were to always request the HWPT for
>>> dirty tracking if supported, or rely on device dirty page tracking. This
>>> is a little simplistic and we aren't necessarily utilizing IOMMU dirty
>>> tracking even if we ask during hwpt allocation.
>>>
>>> The unmap case is deferred until further vIOMMU support with migration
>>> is added[3] which will then introduce the usage of
>>> IOMMU_HWPT_GET_DIRTY_BITMAP_NO_CLEAR in GET_DIRTY_BITMAP ioctl in the
>>> dma unmap bitmap flow.
>>>
>>> * Patches 8 - 9: Don't block live migration where there's no VF dirty
>>> tracker, considering that we have IOMMU dirty tracking.
>>>
>>> Comments and feedback appreciated (on patches 1, 5, 8, 9)
>>>
>>> Cheers,
>>>       Joao
>>>
>>> P.S. Suggest v6.11-rc as hypervisor kernel as there's
>>> some bugs fixed there with regards to IOMMU hugepage dirty tracking.
>>>
>>> Changes since v5[6]:
>>> * Remove patches 1-4 as these were commited to vfio-next
>>> * Add the Rb by Cedric and Zhenzhong (previously patches 7, 8, 10, 11)
>>> * Introduce VFIODevice::iommu_dirty_tracking and use it on patch 5, 8
>>> to store whether we can use IOMMU dirty tracking.
>>>
>>> Changes since v4[5]:
>>> * Add various Reviewed-by in patches 2,3,4,6,8,11
>>> * Change error messages to mention IOMMU (Zhenzhong)
>>> * Better improve the checking of dirty page tracking in
>>>     vfio_migration_realize() to detect per-device IOMMU instead of using
>>>     container dirty_page_supported().
>>> * Improve various commit messages (Eric)
>>> * Extract the caps::hw_caps into its own patch as it was miosleading to
>>> be hidden in another patch (new patch 7)
>>> * Restructure patch 1 helper to be vfio_device_is_mdev() and use
>>> vfio::mdev directly in rest of patches (Cedric)
>>> * Improve error messages of set,query dirty tracking (Cedric)
>>> * Add missing casts to uintptr and uint64_t* (Cedric)
>>> * Add missing commens to struct doc from aw_bits removal (and hw_caps
>>> addition) (Eric)
>>> * Fix the detach flow in auto domains (Eric)
>>> * Set hwpt to NULL on detach (Eric)
>>> * Spurious line (Eric)
>>>
>>> Changes since v3[5]:
>>> * Skip HostIOMMUDevice::realize for mdev, and introduce a helper to check if
>>> the VFIO
>>>     device is mdev. (Zhenzhong)
>>> * Skip setting IOMMU device for mdev (Zhenzhong)
>>> * Add Zhenzhong review tag in patch 3
>>> * Utilize vbasedev::bcontainer::dirty_pages_supported instead of introducing
>>>     a new HostIOMMUDevice capability and thus remove the cap patch from the
>>> series (Zhenzhong)
>>> * Move the HostIOMMUDevice::realize() to be part of VFIODevice initialization
>>> in attach_device()
>>> while skipping it all together for mdev. (Cedric)
>>> * Due to the previous item, had to remove aw_bits because it depends on device
>>> attach being
>>> finished, instead defer it to when get_cap() gets called.
>>> * Skip auto domains for mdev instead of purposedly erroring out (Zhenzhong)
>>> * Pass errp in all cases, and instead just free the error in case of -EINVAL
>>>     in most of all patches, and also pass Error* in
>>> iommufd_backend_alloc_hwpt() amd
>>>     set/query dirty. This is made better thanks in part to skipping auto
>>> domains for mdev (Cedric)
>>>
>>> Changes since RFCv2[4]:
>>> * Always allocate hwpt with IOMMU_HWPT_ALLOC_DIRTY_TRACKING even if
>>> we end up not actually toggling dirty tracking. (Avihai)
>>> * Fix error handling widely in auto domains logic and all patches (Avihai)
>>> * Reuse iommufd_backend_get_device_info() for capabilities (Zhenzhong)
>>> * New patches 1 and 2 taking into consideration previous comments.
>>> * Store hwpt::flags to know if we have dirty tracking (Avihai)
>>> * New patch 8, that allows to query dirty tracking support after
>>> provisioning. This is a cleaner way to check IOMMU dirty tracking support
>>> when vfio::migration is iniitalized, as opposed to RFCv2 via device caps.
>>> device caps way is still used because at vfio attach we aren't yet with
>>> a fully initialized migration state.
>>> * Adopt error propagation in query,set dirty tracking
>>> * Misc improvements overall broadly and Avihai
>>> * Drop hugepages as it's a bit unrelated; I can pursue that patch
>>> * separately. The main motivation is to provide a way to test
>>> without hugepages similar to what vfio_type1_iommu.disable_hugepages=1
>>> does.
>>>
>>> Changes since RFCv1[2]:
>>> * Remove intel/amd dirty tracking emulation enabling
>>> * Remove the dirtyrate improvement for VF/IOMMU dirty tracking
>>> [Will pursue these two in separate series]
>>> * Introduce auto domains support
>>> * Enforce dirty tracking following the IOMMUFD UAPI for this
>>> * Add support for toggling hugepages in IOMMUFD
>>> * Auto enable support when VF supports migration to use IOMMU
>>> when it doesn't have VF dirty tracking
>>> * Add a parameter to toggle VF dirty tracking
>>>
>>> [0]
>>> https://lore.kernel.org/qemu-devel/20240201072818.327930-1-zhenzhong.duan@intel.com/
>>> [1]
>>> https://lore.kernel.org/qemu-devel/20240201072818.327930-10-zhenzhong.duan@intel.com/
>>> [2]
>>> https://lore.kernel.org/qemu-devel/20220428211351.3897-1-joao.m.martins@oracle.com/
>>> [3]
>>> https://lore.kernel.org/qemu-devel/20230622214845.3980-1-joao.m.martins@oracle.com/
>>> [4]
>>> https://lore.kernel.org/qemu-devel/20240212135643.5858-1-joao.m.martins@oracle.com/
>>> [5]
>>> https://lore.kernel.org/qemu-devel/20240708143420.16953-1-joao.m.martins@oracle.com/
>>> [6]
>>> https://lore.kernel.org/qemu-devel/20240719120501.81279-1-joao.m.martins@oracle.com/
>>>
>>> Joao Martins (9):
>>>     vfio/iommufd: Introduce auto domain creation
>>>     vfio/{iommufd,container}: Remove caps::aw_bits
>>>     vfio/iommufd: Add hw_caps field to HostIOMMUDeviceCaps
>>>     vfio/{iommufd,container}: Invoke HostIOMMUDevice::realize() during
>>>       attach_device()
>>>     vfio/iommufd: Probe and request hwpt dirty tracking capability
>>>     vfio/iommufd: Implement VFIOIOMMUClass::set_dirty_tracking support
>>>     vfio/iommufd: Implement VFIOIOMMUClass::query_dirty_bitmap support
>>>     vfio/migration: Don't block migration device dirty tracking is
>>>       unsupported
>>>     vfio/common: Allow disabling device dirty page tracking
>>>
>>>    include/hw/vfio/vfio-common.h      |  13 +++
>>>    include/sysemu/host_iommu_device.h |   5 +-
>>>    include/sysemu/iommufd.h           |  11 ++
>>>    backends/iommufd.c                 |  85 ++++++++++++++-
>>>    hw/vfio/common.c                   |  19 ++--
>>>    hw/vfio/container.c                |   9 +-
>>>    hw/vfio/helpers.c                  |  11 ++
>>>    hw/vfio/iommufd.c                  | 170 ++++++++++++++++++++++++++++-
>>>    hw/vfio/migration.c                |  12 +-
>>>    hw/vfio/pci.c                      |   3 +
>>>    backends/trace-events              |   3 +
>>>    11 files changed, 318 insertions(+), 23 deletions(-)
>>
>> Applied to vfio-next with the changes that were discussed this morning.
>> Please check.
>>
> 
> I think the only thing missing is in the fourth patch to add the comment Eric
> suggested (see below). Other than that, looks good to me.
> 
> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
> index 50ffa4b77090..abb6f1a4b4a8 100644
> --- a/hw/vfio/iommufd.c
> +++ b/hw/vfio/iommufd.c
> @@ -488,6 +488,13 @@ static bool iommufd_cdev_attach(const char *name,
> VFIODevice *vbasedev,
> 
>       space = vfio_get_address_space(as);
> 
> +    /*
> +     * The HostIOMMUDevice data from legacy backend is static and doesn't need
> +     * any information from the (type1-iommu) backend to be initialized. In
> +     * contrast however, the IOMMUFD HostIOMMUDevice data requires the iommufd
> +     * FD to be connected and having a devid to be able to successfully call
> +     * iommufd_backend_get_device_info().
> +     */
>       if (!vfio_device_hiod_realize(vbasedev, errp)) {
>           goto err_alloc_ioas;
>       }
> 

Yep. This is fixed now. I will send a vfio PR in a couple of hours.

Thanks,

C.
Joao Martins July 23, 2024, 2:21 p.m. UTC | #15
On 23/07/2024 15:23, Yi Liu wrote:
> On 2024/7/23 05:13, Joao Martins wrote:
>> This small series adds support for IOMMU dirty tracking support via the
>> IOMMUFD backend. The hardware capability is available on most recent x86
>> hardware (and these SMMUv3 in upcoming v6.11). The series is divided
>> organized as follows:
>>
>> * Patches 1 - 7: IOMMUFD backend support for dirty tracking;
>>
>> Introduce auto domains -- Patch 3 goes into more detail, but the gist is that
>> we will find and attach a device to a compatible IOMMU domain, or allocate a new
>> hardware pagetable *or* rely on kernel IOAS attach (for mdevs). Afterwards the
>> workflow is relatively simple:
>>
>> 1) Probe device and allow dirty tracking in the HWPT
>> 2) Toggling dirty tracking on/off
>> 3) Read-and-clear of Dirty IOVAs
>>
>> The heuristics selected for (1) were to always request the HWPT for
>> dirty tracking if supported, or rely on device dirty page tracking. This
>> is a little simplistic and we aren't necessarily utilizing IOMMU dirty
>> tracking even if we ask during hwpt allocation.
>>
>> The unmap case is deferred until further vIOMMU support with migration
>> is added[3] which will then introduce the usage of
>> IOMMU_HWPT_GET_DIRTY_BITMAP_NO_CLEAR in GET_DIRTY_BITMAP ioctl in the
>> dma unmap bitmap flow.
>>
>> * Patches 8 - 9: Don't block live migration where there's no VF dirty
>> tracker, considering that we have IOMMU dirty tracking.
>>
>> Comments and feedback appreciated (on patches 1, 5, 8, 9)
> 
> Hi Joao,
> 
> Do you have a github branch for this version? :)
> 

No, but you can probably use Cedric's vfio-next branch:

	https://github.com/legoater/qemu/tree/vfio-next

... Given that he has just submitted his PR with this series.

	Joao
Yi Liu July 23, 2024, 2:23 p.m. UTC | #16
On 2024/7/23 05:13, Joao Martins wrote:
> This small series adds support for IOMMU dirty tracking support via the
> IOMMUFD backend. The hardware capability is available on most recent x86
> hardware (and these SMMUv3 in upcoming v6.11). The series is divided
> organized as follows:
> 
> * Patches 1 - 7: IOMMUFD backend support for dirty tracking;
> 
> Introduce auto domains -- Patch 3 goes into more detail, but the gist is that
> we will find and attach a device to a compatible IOMMU domain, or allocate a new
> hardware pagetable *or* rely on kernel IOAS attach (for mdevs). Afterwards the
> workflow is relatively simple:
> 
> 1) Probe device and allow dirty tracking in the HWPT
> 2) Toggling dirty tracking on/off
> 3) Read-and-clear of Dirty IOVAs
> 
> The heuristics selected for (1) were to always request the HWPT for
> dirty tracking if supported, or rely on device dirty page tracking. This
> is a little simplistic and we aren't necessarily utilizing IOMMU dirty
> tracking even if we ask during hwpt allocation.
> 
> The unmap case is deferred until further vIOMMU support with migration
> is added[3] which will then introduce the usage of
> IOMMU_HWPT_GET_DIRTY_BITMAP_NO_CLEAR in GET_DIRTY_BITMAP ioctl in the
> dma unmap bitmap flow.
> 
> * Patches 8 - 9: Don't block live migration where there's no VF dirty
> tracker, considering that we have IOMMU dirty tracking.
> 
> Comments and feedback appreciated (on patches 1, 5, 8, 9)

Hi Joao,

Do you have a github branch for this version? :)

> Cheers,
>      Joao
> 
> P.S. Suggest v6.11-rc as hypervisor kernel as there's
> some bugs fixed there with regards to IOMMU hugepage dirty tracking.
> 
> Changes since v5[6]:
> * Remove patches 1-4 as these were commited to vfio-next
> * Add the Rb by Cedric and Zhenzhong (previously patches 7, 8, 10, 11)
> * Introduce VFIODevice::iommu_dirty_tracking and use it on patch 5, 8
> to store whether we can use IOMMU dirty tracking.
> 
> Changes since v4[5]:
> * Add various Reviewed-by in patches 2,3,4,6,8,11
> * Change error messages to mention IOMMU (Zhenzhong)
> * Better improve the checking of dirty page tracking in
>    vfio_migration_realize() to detect per-device IOMMU instead of using
>    container dirty_page_supported().
> * Improve various commit messages (Eric)
> * Extract the caps::hw_caps into its own patch as it was miosleading to
> be hidden in another patch (new patch 7)
> * Restructure patch 1 helper to be vfio_device_is_mdev() and use
> vfio::mdev directly in rest of patches (Cedric)
> * Improve error messages of set,query dirty tracking (Cedric)
> * Add missing casts to uintptr and uint64_t* (Cedric)
> * Add missing commens to struct doc from aw_bits removal (and hw_caps
> addition) (Eric)
> * Fix the detach flow in auto domains (Eric)
> * Set hwpt to NULL on detach (Eric)
> * Spurious line (Eric)
> 
> Changes since v3[5]:
> * Skip HostIOMMUDevice::realize for mdev, and introduce a helper to check if the VFIO
>    device is mdev. (Zhenzhong)
> * Skip setting IOMMU device for mdev (Zhenzhong)
> * Add Zhenzhong review tag in patch 3
> * Utilize vbasedev::bcontainer::dirty_pages_supported instead of introducing
>    a new HostIOMMUDevice capability and thus remove the cap patch from the series (Zhenzhong)
> * Move the HostIOMMUDevice::realize() to be part of VFIODevice initialization in attach_device()
> while skipping it all together for mdev. (Cedric)
> * Due to the previous item, had to remove aw_bits because it depends on device attach being
> finished, instead defer it to when get_cap() gets called.
> * Skip auto domains for mdev instead of purposedly erroring out (Zhenzhong)
> * Pass errp in all cases, and instead just free the error in case of -EINVAL
>    in most of all patches, and also pass Error* in iommufd_backend_alloc_hwpt() amd
>    set/query dirty. This is made better thanks in part to skipping auto domains for mdev (Cedric)
> 
> Changes since RFCv2[4]:
> * Always allocate hwpt with IOMMU_HWPT_ALLOC_DIRTY_TRACKING even if
> we end up not actually toggling dirty tracking. (Avihai)
> * Fix error handling widely in auto domains logic and all patches (Avihai)
> * Reuse iommufd_backend_get_device_info() for capabilities (Zhenzhong)
> * New patches 1 and 2 taking into consideration previous comments.
> * Store hwpt::flags to know if we have dirty tracking (Avihai)
> * New patch 8, that allows to query dirty tracking support after
> provisioning. This is a cleaner way to check IOMMU dirty tracking support
> when vfio::migration is iniitalized, as opposed to RFCv2 via device caps.
> device caps way is still used because at vfio attach we aren't yet with
> a fully initialized migration state.
> * Adopt error propagation in query,set dirty tracking
> * Misc improvements overall broadly and Avihai
> * Drop hugepages as it's a bit unrelated; I can pursue that patch
> * separately. The main motivation is to provide a way to test
> without hugepages similar to what vfio_type1_iommu.disable_hugepages=1
> does.
> 
> Changes since RFCv1[2]:
> * Remove intel/amd dirty tracking emulation enabling
> * Remove the dirtyrate improvement for VF/IOMMU dirty tracking
> [Will pursue these two in separate series]
> * Introduce auto domains support
> * Enforce dirty tracking following the IOMMUFD UAPI for this
> * Add support for toggling hugepages in IOMMUFD
> * Auto enable support when VF supports migration to use IOMMU
> when it doesn't have VF dirty tracking
> * Add a parameter to toggle VF dirty tracking
> 
> [0] https://lore.kernel.org/qemu-devel/20240201072818.327930-1-zhenzhong.duan@intel.com/
> [1] https://lore.kernel.org/qemu-devel/20240201072818.327930-10-zhenzhong.duan@intel.com/
> [2] https://lore.kernel.org/qemu-devel/20220428211351.3897-1-joao.m.martins@oracle.com/
> [3] https://lore.kernel.org/qemu-devel/20230622214845.3980-1-joao.m.martins@oracle.com/
> [4] https://lore.kernel.org/qemu-devel/20240212135643.5858-1-joao.m.martins@oracle.com/
> [5] https://lore.kernel.org/qemu-devel/20240708143420.16953-1-joao.m.martins@oracle.com/
> [6] https://lore.kernel.org/qemu-devel/20240719120501.81279-1-joao.m.martins@oracle.com/
> 
> Joao Martins (9):
>    vfio/iommufd: Introduce auto domain creation
>    vfio/{iommufd,container}: Remove caps::aw_bits
>    vfio/iommufd: Add hw_caps field to HostIOMMUDeviceCaps
>    vfio/{iommufd,container}: Invoke HostIOMMUDevice::realize() during
>      attach_device()
>    vfio/iommufd: Probe and request hwpt dirty tracking capability
>    vfio/iommufd: Implement VFIOIOMMUClass::set_dirty_tracking support
>    vfio/iommufd: Implement VFIOIOMMUClass::query_dirty_bitmap support
>    vfio/migration: Don't block migration device dirty tracking is
>      unsupported
>    vfio/common: Allow disabling device dirty page tracking
> 
>   include/hw/vfio/vfio-common.h      |  13 +++
>   include/sysemu/host_iommu_device.h |   5 +-
>   include/sysemu/iommufd.h           |  11 ++
>   backends/iommufd.c                 |  85 ++++++++++++++-
>   hw/vfio/common.c                   |  19 ++--
>   hw/vfio/container.c                |   9 +-
>   hw/vfio/helpers.c                  |  11 ++
>   hw/vfio/iommufd.c                  | 170 ++++++++++++++++++++++++++++-
>   hw/vfio/migration.c                |  12 +-
>   hw/vfio/pci.c                      |   3 +
>   backends/trace-events              |   3 +
>   11 files changed, 318 insertions(+), 23 deletions(-)
>
Cédric Le Goater July 23, 2024, 2:24 p.m. UTC | #17
Hello

On 7/23/24 16:23, Yi Liu wrote:
> On 2024/7/23 05:13, Joao Martins wrote:
>> This small series adds support for IOMMU dirty tracking support via the
>> IOMMUFD backend. The hardware capability is available on most recent x86
>> hardware (and these SMMUv3 in upcoming v6.11). The series is divided
>> organized as follows:
>>
>> * Patches 1 - 7: IOMMUFD backend support for dirty tracking;
>>
>> Introduce auto domains -- Patch 3 goes into more detail, but the gist is that
>> we will find and attach a device to a compatible IOMMU domain, or allocate a new
>> hardware pagetable *or* rely on kernel IOAS attach (for mdevs). Afterwards the
>> workflow is relatively simple:
>>
>> 1) Probe device and allow dirty tracking in the HWPT
>> 2) Toggling dirty tracking on/off
>> 3) Read-and-clear of Dirty IOVAs
>>
>> The heuristics selected for (1) were to always request the HWPT for
>> dirty tracking if supported, or rely on device dirty page tracking. This
>> is a little simplistic and we aren't necessarily utilizing IOMMU dirty
>> tracking even if we ask during hwpt allocation.
>>
>> The unmap case is deferred until further vIOMMU support with migration
>> is added[3] which will then introduce the usage of
>> IOMMU_HWPT_GET_DIRTY_BITMAP_NO_CLEAR in GET_DIRTY_BITMAP ioctl in the
>> dma unmap bitmap flow.
>>
>> * Patches 8 - 9: Don't block live migration where there's no VF dirty
>> tracker, considering that we have IOMMU dirty tracking.
>>
>> Comments and feedback appreciated (on patches 1, 5, 8, 9)
> 
> Hi Joao,
> 
> Do you have a github branch for this version? :)

There has been some updates since. Please use :

   https://github.com/legoater/qemu/commits/vfio-next

or you could wait for the PR to be merged :

   https://lore.kernel.org/qemu-devel/20240723140019.387786-1-clg@redhat.com/

Thanks,

C.


> 
>> Cheers,
>>      Joao
>>
>> P.S. Suggest v6.11-rc as hypervisor kernel as there's
>> some bugs fixed there with regards to IOMMU hugepage dirty tracking.
>>
>> Changes since v5[6]:
>> * Remove patches 1-4 as these were commited to vfio-next
>> * Add the Rb by Cedric and Zhenzhong (previously patches 7, 8, 10, 11)
>> * Introduce VFIODevice::iommu_dirty_tracking and use it on patch 5, 8
>> to store whether we can use IOMMU dirty tracking.
>>
>> Changes since v4[5]:
>> * Add various Reviewed-by in patches 2,3,4,6,8,11
>> * Change error messages to mention IOMMU (Zhenzhong)
>> * Better improve the checking of dirty page tracking in
>>    vfio_migration_realize() to detect per-device IOMMU instead of using
>>    container dirty_page_supported().
>> * Improve various commit messages (Eric)
>> * Extract the caps::hw_caps into its own patch as it was miosleading to
>> be hidden in another patch (new patch 7)
>> * Restructure patch 1 helper to be vfio_device_is_mdev() and use
>> vfio::mdev directly in rest of patches (Cedric)
>> * Improve error messages of set,query dirty tracking (Cedric)
>> * Add missing casts to uintptr and uint64_t* (Cedric)
>> * Add missing commens to struct doc from aw_bits removal (and hw_caps
>> addition) (Eric)
>> * Fix the detach flow in auto domains (Eric)
>> * Set hwpt to NULL on detach (Eric)
>> * Spurious line (Eric)
>>
>> Changes since v3[5]:
>> * Skip HostIOMMUDevice::realize for mdev, and introduce a helper to check if the VFIO
>>    device is mdev. (Zhenzhong)
>> * Skip setting IOMMU device for mdev (Zhenzhong)
>> * Add Zhenzhong review tag in patch 3
>> * Utilize vbasedev::bcontainer::dirty_pages_supported instead of introducing
>>    a new HostIOMMUDevice capability and thus remove the cap patch from the series (Zhenzhong)
>> * Move the HostIOMMUDevice::realize() to be part of VFIODevice initialization in attach_device()
>> while skipping it all together for mdev. (Cedric)
>> * Due to the previous item, had to remove aw_bits because it depends on device attach being
>> finished, instead defer it to when get_cap() gets called.
>> * Skip auto domains for mdev instead of purposedly erroring out (Zhenzhong)
>> * Pass errp in all cases, and instead just free the error in case of -EINVAL
>>    in most of all patches, and also pass Error* in iommufd_backend_alloc_hwpt() amd
>>    set/query dirty. This is made better thanks in part to skipping auto domains for mdev (Cedric)
>>
>> Changes since RFCv2[4]:
>> * Always allocate hwpt with IOMMU_HWPT_ALLOC_DIRTY_TRACKING even if
>> we end up not actually toggling dirty tracking. (Avihai)
>> * Fix error handling widely in auto domains logic and all patches (Avihai)
>> * Reuse iommufd_backend_get_device_info() for capabilities (Zhenzhong)
>> * New patches 1 and 2 taking into consideration previous comments.
>> * Store hwpt::flags to know if we have dirty tracking (Avihai)
>> * New patch 8, that allows to query dirty tracking support after
>> provisioning. This is a cleaner way to check IOMMU dirty tracking support
>> when vfio::migration is iniitalized, as opposed to RFCv2 via device caps.
>> device caps way is still used because at vfio attach we aren't yet with
>> a fully initialized migration state.
>> * Adopt error propagation in query,set dirty tracking
>> * Misc improvements overall broadly and Avihai
>> * Drop hugepages as it's a bit unrelated; I can pursue that patch
>> * separately. The main motivation is to provide a way to test
>> without hugepages similar to what vfio_type1_iommu.disable_hugepages=1
>> does.
>>
>> Changes since RFCv1[2]:
>> * Remove intel/amd dirty tracking emulation enabling
>> * Remove the dirtyrate improvement for VF/IOMMU dirty tracking
>> [Will pursue these two in separate series]
>> * Introduce auto domains support
>> * Enforce dirty tracking following the IOMMUFD UAPI for this
>> * Add support for toggling hugepages in IOMMUFD
>> * Auto enable support when VF supports migration to use IOMMU
>> when it doesn't have VF dirty tracking
>> * Add a parameter to toggle VF dirty tracking
>>
>> [0] https://lore.kernel.org/qemu-devel/20240201072818.327930-1-zhenzhong.duan@intel.com/
>> [1] https://lore.kernel.org/qemu-devel/20240201072818.327930-10-zhenzhong.duan@intel.com/
>> [2] https://lore.kernel.org/qemu-devel/20220428211351.3897-1-joao.m.martins@oracle.com/
>> [3] https://lore.kernel.org/qemu-devel/20230622214845.3980-1-joao.m.martins@oracle.com/
>> [4] https://lore.kernel.org/qemu-devel/20240212135643.5858-1-joao.m.martins@oracle.com/
>> [5] https://lore.kernel.org/qemu-devel/20240708143420.16953-1-joao.m.martins@oracle.com/
>> [6] https://lore.kernel.org/qemu-devel/20240719120501.81279-1-joao.m.martins@oracle.com/
>>
>> Joao Martins (9):
>>    vfio/iommufd: Introduce auto domain creation
>>    vfio/{iommufd,container}: Remove caps::aw_bits
>>    vfio/iommufd: Add hw_caps field to HostIOMMUDeviceCaps
>>    vfio/{iommufd,container}: Invoke HostIOMMUDevice::realize() during
>>      attach_device()
>>    vfio/iommufd: Probe and request hwpt dirty tracking capability
>>    vfio/iommufd: Implement VFIOIOMMUClass::set_dirty_tracking support
>>    vfio/iommufd: Implement VFIOIOMMUClass::query_dirty_bitmap support
>>    vfio/migration: Don't block migration device dirty tracking is
>>      unsupported
>>    vfio/common: Allow disabling device dirty page tracking
>>
>>   include/hw/vfio/vfio-common.h      |  13 +++
>>   include/sysemu/host_iommu_device.h |   5 +-
>>   include/sysemu/iommufd.h           |  11 ++
>>   backends/iommufd.c                 |  85 ++++++++++++++-
>>   hw/vfio/common.c                   |  19 ++--
>>   hw/vfio/container.c                |   9 +-
>>   hw/vfio/helpers.c                  |  11 ++
>>   hw/vfio/iommufd.c                  | 170 ++++++++++++++++++++++++++++-
>>   hw/vfio/migration.c                |  12 +-
>>   hw/vfio/pci.c                      |   3 +
>>   backends/trace-events              |   3 +
>>   11 files changed, 318 insertions(+), 23 deletions(-)
>>
>