mbox series

[v4,00/41] vfio: Adopt iommufd

Message ID 20231102071302.1818071-1-zhenzhong.duan@intel.com
Headers show
Series vfio: Adopt iommufd | expand

Message

Duan, Zhenzhong Nov. 2, 2023, 7:12 a.m. UTC
Hi,

Thanks all for giving guides and comments on previous series, here is
the v4 of pure iommufd support part.

Based on Cédric's suggestion, this series includes an effort to remove
spapr code from container.c, now all spapr functions are moved to spapr.c
or spapr_pci_vfio.c, but there are still a few trival check on
VFIO_SPAPR_TCE_*_IOMMU which I am not sure if deserved to introduce many
callbacks and duplicate code just to remove them. Some functions are moved
to spapr.c instead of spapr_pci_vfio.c to avoid compile issue because
spapr_pci_vfio.c is arch specific, or else we need to introduce stub
functions to those spapr functions moved.


PATCH 1-5: Move spapr functions to spapr*.c
PATCH 6-20: Abstract out base container
PATCH 21-24: Introduce sparpr container and its specific interface
PATCH 25: Add --enable/--disable-iommufd config support
PATCH 26: Introduce iommufd object
PATCH 27-33: add IOMMUFD container and cdev support
PATCH 34-39: fd passing for IOMMUFD object and cdev
PATCH 40: make VFIOContainerBase parameter const
PATCH 41: Compile out for PPC


We have done wide test with different combinations, e.g:
- PCI device were tested
- FD passing and hot reset with some trick.
- device hotplug test with legacy and iommufd backends
- with or without vIOMMU for legacy and iommufd backends
- divices linked to different iommufds
- VFIO migration with a E800 net card(no dirty sync support) passthrough
- platform, ccw and ap were only compile-tested due to environment limit


Given some iommufd kernel limitations, the iommufd backend is
not yet fully on par with the legacy backend w.r.t. features like:
- p2p mappings (you will see related error traces)
- dirty page sync
- and etc.


qemu code: https://github.com/yiliu1765/qemu/commits/zhenzhong/iommufd_cdev_v4
Based on vfio-next, commit id: f686924775

--------------------------------------------------------------------------

Below are some background and graph about the design:

With the introduction of iommufd, the Linux kernel provides a generic
interface for userspace drivers to propagate their DMA mappings to kernel
for assigned devices. This series does the porting of the VFIO devices
onto the /dev/iommu uapi and let it coexist with the legacy implementation.

At QEMU level, interactions with the /dev/iommu are abstracted by a new
iommufd object (compiled in with the CONFIG_IOMMUFD option).

Any QEMU device (e.g. vfio device) wishing to use /dev/iommu must be
linked with an iommufd object. In this series, the vfio-pci device is
granted with such capability (other VFIO devices are not yet ready):

It gets a new optional parameter named iommufd which allows to pass
an iommufd object:

    -object iommufd,id=iommufd0
    -device vfio-pci,host=0000:02:00.0,iommufd=iommufd0

Note the /dev/iommu and vfio cdev can be externally opened by a
management layer. In such a case the fd is passed:

    -object iommufd,id=iommufd0,fd=22
    -device vfio-pci,iommufd=iommufd0,fd=23

If the fd parameter is not passed, the fd is opened by QEMU.
See https://www.mail-archive.com/qemu-devel@nongnu.org/msg937155.html
for detailed discuss on this requirement.

If no iommufd option is passed to the vfio-pci device, iommufd is not
used and the end-user gets the behavior based on the legacy vfio iommu
interfaces:

    -device vfio-pci,host=0000:02:00.0

While the legacy kernel interface is group-centric, the new iommufd
interface is device-centric, relying on device fd and iommufd.

To support both interfaces in the QEMU VFIO device we reworked the vfio
container abstraction so that the generic VFIO code can use either
backend.

The VFIOContainer object becomes a base object derived into
a) the legacy VFIO container and
b) the new iommufd based container.

The base object implements generic code such as code related to
memory_listener and address space management whereas the derived
objects implement callbacks specific to either BE, legacy and
iommufd. Indeed each backend has its own way to setup secure context
and dma management interface. The below diagram shows how it looks
like with both BEs.

                    VFIO                           AddressSpace/Memory
    +-------+  +----------+  +-----+  +-----+
    |  pci  |  | platform |  |  ap |  | ccw |
    +---+---+  +----+-----+  +--+--+  +--+--+     +----------------------+
        |           |           |        |        |   AddressSpace       |
        |           |           |        |        +------------+---------+
    +---V-----------V-----------V--------V----+               /
    |           VFIOAddressSpace              | <------------+
    |                  |                      |  MemoryListener
    |          VFIOContainer list             |
    +-------+----------------------------+----+
            |                            |
            |                            |
    +-------V------+            +--------V----------+
    |   iommufd    |            |    vfio legacy    |
    |  container   |            |     container     |
    +-------+------+            +--------+----------+
            |                            |
            | /dev/iommu                 | /dev/vfio/vfio
            | /dev/vfio/devices/vfioX    | /dev/vfio/$group_id
Userspace   |                            |
============+============================+===========================
Kernel      |  device fd                 |
            +---------------+            | group/container fd
            | (BIND_IOMMUFD |            | (SET_CONTAINER/SET_IOMMU)
            |  ATTACH_IOAS) |            | device fd
            |               |            |
            |       +-------V------------V-----------------+
    iommufd |       |                vfio                  |
(map/unmap  |       +---------+--------------------+-------+
ioas_copy)  |                 |                    | map/unmap
            |                 |                    |
     +------V------+    +-----V------+      +------V--------+
     | iommfd core |    |  device    |      |  vfio iommu   |
     +-------------+    +------------+      +---------------+

[Secure Context setup]
- iommufd BE: uses device fd and iommufd to setup secure context
              (bind_iommufd, attach_ioas)
- vfio legacy BE: uses group fd and container fd to setup secure context
                  (set_container, set_iommu)
[Device access]
- iommufd BE: device fd is opened through /dev/vfio/devices/vfioX
- vfio legacy BE: device fd is retrieved from group fd ioctl
[DMA Mapping flow]
1. VFIOAddressSpace receives MemoryRegion add/del via MemoryListener
2. VFIO populates DMA map/unmap via the container BEs
   *) iommufd BE: uses iommufd
   *) vfio legacy BE: uses container fd


Changelog:
v4:
- add CONFIG_IOMMUFD check for IOMMUFDProperties (Markus)
- add doc for default case without fd (Markus)
- Fix build issue reported by Markus and Cédric
- Simply use SPDX identifier in new file (Cédric)
- make vfio_container_init/destroy helper a seperate patch (Cédric)
- make vrdl_list movement a seperate patch (Cédric)
- add const for some callback parameters (Cédric)
- add g_assert in VFIOIOMMUOps callback (Cédric)
- introduce pci_hot_reset callback (Cédric)
- remove VFIOIOMMUSpaprOps (Cédric)
- initialize g_autofree to NULL (Cédric)
- adjust func name prefix and trace event in iommufd.c (Cédric)
- add RB

v3:
- Rename base container as VFIOContainerBase and legacy container as container (Cédric)
- Drop VFIO_IOMMU_BACKEND_OPS class and use struct instead (Cédric)
- Cleanup container.c by introducing spapr backend and move spapr code out (Cédric)
- Introduce vfio_iommu_spapr_ops (Cédric)
- Add doc of iommufd in qom.json and have iommufd member sorted (Markus)
- patch19 and patch21 are splitted to two parts to facilitate review

v2:
- patch "vfio: Add base container" in v1 is split into patch1-15 per Cédric
- add fd passing to platform/ap/ccw vfio device
- add (uintptr_t) cast in iommufd_backend_map_dma() per Cédric
- rename char_dev.h to chardev_open.h for same naming scheme per Daniel
- add full copyright per Daniel and Jason


Note changelog below are from full IOMMUFD series:

v1:
- Alloc hwpt instead of using auto hwpt
- elaborate iommufd code per Nicolin
- consolidate two patches and drop as.c
- typo error fix and function rename

rfcv4:
- rebase on top of v8.0.3
- Add one patch from Yi which is about vfio device add in kvm
- Remove IOAS_COPY optimization and focus on functions in this patchset
- Fix wrong name issue reported and fix suggested by Matthew
- Fix compilation issue reported and fix sugggsted by Nicolin
- Use query_dirty_bitmap callback to replace get_dirty_bitmap for better
granularity
- Add dev_iter_next() callback to avoid adding so many callback
  at container scope, add VFIODevice.hwpt to support that
- Restore all functions back to common from container whenever possible,
  mainly migration and reset related functions
- Add --enable/disable-iommufd config option, enabled by default in linux
- Remove VFIODevice.hwpt_next as it's redundant with VFIODevice.next
- Adapt new VFIO_DEVICE_PCI_HOT_RESET uAPI for IOMMUFD backed device
- vfio_kvm_device_add/del_group call vfio_kvm_device_add/del_fd to remove
redundant code
- Add FD passing support for vfio device backed by IOMMUFD
- Fix hot unplug resource leak issue in vfio_legacy_detach_device()
- Fix FD leak in vfio_get_devicefd()

rfcv3:
- rebase on top of v7.2.0
- Fix the compilation with CONFIG_IOMMUFD unset by using true classes for
  VFIO backends
- Fix use after free in error path, reported by Alister
- Split common.c in several steps to ease the review

rfcv2:
- remove the first three patches of rfcv1
- add open cdev helper suggested by Jason
- remove the QOMification of the VFIOContainer and simply use standard ops
(David)
- add "-object iommufd" suggested by Alex

Thanks
Zhenzhong

Eric Auger (11):
  vfio/container: Switch to dma_map|unmap API
  vfio/common: Move giommu_list in base container
  vfio/container: Move space field to base container
  vfio/container: Switch to IOMMU BE
    set_dirty_page_tracking/query_dirty_bitmap API
  vfio/container: Convert functions to base container
  vfio/container: Move pgsizes and dma_max_mappings to base container
  vfio/container: Move listener to base container
  vfio/container: Move dirty_pgsizes and max_dirty_bitmap_size to base
    container
  vfio/container: Implement attach/detach_device
  backends/iommufd: Introduce the iommufd object
  vfio/pci: Allow the selection of a given iommu backend

Yi Liu (2):
  util/char_dev: Add open_cdev()
  vfio/iommufd: Implement the iommufd backend

Zhenzhong Duan (28):
  vfio/container: Move IBM EEH related functions into spapr_pci_vfio.c
  vfio/container: Move vfio_container_add/del_section_window into
    spapr.c
  vfio/container: Move spapr specific init/deinit into spapr.c
  vfio/spapr: Make vfio_spapr_create/remove_window static
  vfio/common: Move vfio_host_win_add/del into spapr.c
  vfio: Introduce base object for VFIOContainer and targeted interface
  vfio/container: Introduce a empty VFIOIOMMUOps
  vfio/common: Introduce vfio_container_init/destroy helper
  vfio/container: Move per container device list in base container
  vfio/container: Move vrdl_list to base container
  vfio/container: Move iova_ranges to base container
  vfio/spapr: Introduce spapr backend and target interface
  vfio/spapr: switch to spapr IOMMU BE add/del_section_window
  vfio/spapr: Move prereg_listener into spapr container
  vfio/spapr: Move hostwin_list into spapr container
  Add iommufd configure option
  vfio/iommufd: Relax assert check for iommufd backend
  vfio/iommufd: Add support for iova_ranges
  vfio/pci: Extract out a helper vfio_pci_get_pci_hot_reset_info
  vfio/pci: Introduce a vfio pci hot reset interface
  vfio/iommufd: Enable pci hot reset through iommufd cdev interface
  vfio/pci: Make vfio cdev pre-openable by passing a file handle
  vfio: Allow the selection of a given iommu backend for platform ap and
    ccw
  vfio/platform: Make vfio cdev pre-openable by passing a file handle
  vfio/ap: Make vfio cdev pre-openable by passing a file handle
  vfio/ccw: Make vfio cdev pre-openable by passing a file handle
  vfio: Make VFIOContainerBase poiner parameter const in VFIOIOMMUOps
    callbacks
  vfio: Compile out iommufd for PPC target

 MAINTAINERS                           |  13 +
 meson.build                           |   6 +
 qapi/qom.json                         |  22 +
 hw/vfio/pci.h                         |   6 +
 include/hw/vfio/vfio-common.h         | 118 ++---
 include/hw/vfio/vfio-container-base.h | 121 +++++
 include/hw/vfio/vfio-platform.h       |   1 +
 include/hw/vfio/vfio.h                |   7 -
 include/qemu/chardev_open.h           |  16 +
 include/sysemu/iommufd.h              |  46 ++
 backends/iommufd-stub.c               |  59 +++
 backends/iommufd.c                    | 257 ++++++++++
 hw/ppc/spapr_pci_vfio.c               | 100 +++-
 hw/vfio/ap.c                          |  38 +-
 hw/vfio/ccw.c                         |  40 +-
 hw/vfio/common.c                      | 330 ++++++------
 hw/vfio/container-base.c              | 101 ++++
 hw/vfio/container.c                   | 443 ++++------------
 hw/vfio/helpers.c                     |  34 +-
 hw/vfio/iommufd.c                     | 697 ++++++++++++++++++++++++++
 hw/vfio/pci.c                         | 112 +++--
 hw/vfio/platform.c                    |  45 +-
 hw/vfio/spapr.c                       | 338 ++++++++++++-
 util/chardev_open.c                   |  81 +++
 backends/Kconfig                      |   4 +
 backends/meson.build                  |   5 +
 backends/trace-events                 |  12 +
 hw/vfio/meson.build                   |   4 +
 hw/vfio/trace-events                  |  18 +-
 meson_options.txt                     |   2 +
 qemu-options.hx                       |  13 +
 scripts/meson-buildoptions.sh         |   3 +
 util/meson.build                      |   1 +
 33 files changed, 2403 insertions(+), 690 deletions(-)
 create mode 100644 include/hw/vfio/vfio-container-base.h
 delete mode 100644 include/hw/vfio/vfio.h
 create mode 100644 include/qemu/chardev_open.h
 create mode 100644 include/sysemu/iommufd.h
 create mode 100644 backends/iommufd-stub.c
 create mode 100644 backends/iommufd.c
 create mode 100644 hw/vfio/container-base.c
 create mode 100644 hw/vfio/iommufd.c
 create mode 100644 util/chardev_open.c

Comments

Cédric Le Goater Nov. 6, 2023, 2:23 p.m. UTC | #1
On 11/2/23 08:12, Zhenzhong Duan wrote:
> Hi,
> 
> Thanks all for giving guides and comments on previous series, here is
> the v4 of pure iommufd support part.
> 
> Based on Cédric's suggestion, this series includes an effort to remove
> spapr code from container.c, now all spapr functions are moved to spapr.c
> or spapr_pci_vfio.c, but there are still a few trival check on
> VFIO_SPAPR_TCE_*_IOMMU which I am not sure if deserved to introduce many
> callbacks and duplicate code just to remove them. Some functions are moved
> to spapr.c instead of spapr_pci_vfio.c to avoid compile issue because
> spapr_pci_vfio.c is arch specific, or else we need to introduce stub
> functions to those spapr functions moved.
> 
> 
> PATCH 1-5: Move spapr functions to spapr*.c

PATCH 1-5 applied to vfio-next.

Thanks,

C.
Cédric Le Goater Nov. 7, 2023, 6:28 p.m. UTC | #2
On 11/2/23 08:12, Zhenzhong Duan wrote:
> Hi,
> 
> Thanks all for giving guides and comments on previous series, here is
> the v4 of pure iommufd support part.
> 
> Based on Cédric's suggestion, this series includes an effort to remove
> spapr code from container.c, now all spapr functions are moved to spapr.c
> or spapr_pci_vfio.c, but there are still a few trival check on
> VFIO_SPAPR_TCE_*_IOMMU which I am not sure if deserved to introduce many
> callbacks and duplicate code just to remove them. Some functions are moved
> to spapr.c instead of spapr_pci_vfio.c to avoid compile issue because
> spapr_pci_vfio.c is arch specific, or else we need to introduce stub
> functions to those spapr functions moved.
> 
> 
> PATCH 1-5: Move spapr functions to spapr*.c
> PATCH 6-20: Abstract out base container
> PATCH 21-24: Introduce sparpr container and its specific interface

PATCH 6-24 applied to vfio-next :

   https://github.com/legoater/qemu/commits/vfio-next

(with a global s/fucntional/functional/)


I also pushed the remaining patches on :

   https://github.com/legoater/qemu/commits/vfio-8.2

with a slight rework of the IOMMUFD configuration, now done per platform.
The VFIO frontend and the 'iommufd' object are only available on x86_64,
arm, s390x.

Thanks,

C.


> PATCH 25: Add --enable/--disable-iommufd config support
> PATCH 26: Introduce iommufd object
> PATCH 27-33: add IOMMUFD container and cdev support
> PATCH 34-39: fd passing for IOMMUFD object and cdev
> PATCH 40: make VFIOContainerBase parameter const
> PATCH 41: Compile out for PPC
> 
> 
> We have done wide test with different combinations, e.g:
> - PCI device were tested
> - FD passing and hot reset with some trick.
> - device hotplug test with legacy and iommufd backends
> - with or without vIOMMU for legacy and iommufd backends
> - divices linked to different iommufds
> - VFIO migration with a E800 net card(no dirty sync support) passthrough
> - platform, ccw and ap were only compile-tested due to environment limit
> 
> 
> Given some iommufd kernel limitations, the iommufd backend is
> not yet fully on par with the legacy backend w.r.t. features like:
> - p2p mappings (you will see related error traces)
> - dirty page sync
> - and etc.
> 
> 
> qemu code: https://github.com/yiliu1765/qemu/commits/zhenzhong/iommufd_cdev_v4
> Based on vfio-next, commit id: f686924775
> 
> --------------------------------------------------------------------------
> 
> Below are some background and graph about the design:
> 
> With the introduction of iommufd, the Linux kernel provides a generic
> interface for userspace drivers to propagate their DMA mappings to kernel
> for assigned devices. This series does the porting of the VFIO devices
> onto the /dev/iommu uapi and let it coexist with the legacy implementation.
> 
> At QEMU level, interactions with the /dev/iommu are abstracted by a new
> iommufd object (compiled in with the CONFIG_IOMMUFD option).
> 
> Any QEMU device (e.g. vfio device) wishing to use /dev/iommu must be
> linked with an iommufd object. In this series, the vfio-pci device is
> granted with such capability (other VFIO devices are not yet ready):
> 
> It gets a new optional parameter named iommufd which allows to pass
> an iommufd object:
> 
>      -object iommufd,id=iommufd0
>      -device vfio-pci,host=0000:02:00.0,iommufd=iommufd0
> 
> Note the /dev/iommu and vfio cdev can be externally opened by a
> management layer. In such a case the fd is passed:
> 
>      -object iommufd,id=iommufd0,fd=22
>      -device vfio-pci,iommufd=iommufd0,fd=23
> 
> If the fd parameter is not passed, the fd is opened by QEMU.
> See https://www.mail-archive.com/qemu-devel@nongnu.org/msg937155.html
> for detailed discuss on this requirement.
> 
> If no iommufd option is passed to the vfio-pci device, iommufd is not
> used and the end-user gets the behavior based on the legacy vfio iommu
> interfaces:
> 
>      -device vfio-pci,host=0000:02:00.0
> 
> While the legacy kernel interface is group-centric, the new iommufd
> interface is device-centric, relying on device fd and iommufd.
> 
> To support both interfaces in the QEMU VFIO device we reworked the vfio
> container abstraction so that the generic VFIO code can use either
> backend.
> 
> The VFIOContainer object becomes a base object derived into
> a) the legacy VFIO container and
> b) the new iommufd based container.
> 
> The base object implements generic code such as code related to
> memory_listener and address space management whereas the derived
> objects implement callbacks specific to either BE, legacy and
> iommufd. Indeed each backend has its own way to setup secure context
> and dma management interface. The below diagram shows how it looks
> like with both BEs.
> 
>                      VFIO                           AddressSpace/Memory
>      +-------+  +----------+  +-----+  +-----+
>      |  pci  |  | platform |  |  ap |  | ccw |
>      +---+---+  +----+-----+  +--+--+  +--+--+     +----------------------+
>          |           |           |        |        |   AddressSpace       |
>          |           |           |        |        +------------+---------+
>      +---V-----------V-----------V--------V----+               /
>      |           VFIOAddressSpace              | <------------+
>      |                  |                      |  MemoryListener
>      |          VFIOContainer list             |
>      +-------+----------------------------+----+
>              |                            |
>              |                            |
>      +-------V------+            +--------V----------+
>      |   iommufd    |            |    vfio legacy    |
>      |  container   |            |     container     |
>      +-------+------+            +--------+----------+
>              |                            |
>              | /dev/iommu                 | /dev/vfio/vfio
>              | /dev/vfio/devices/vfioX    | /dev/vfio/$group_id
> Userspace   |                            |
> ============+============================+===========================
> Kernel      |  device fd                 |
>              +---------------+            | group/container fd
>              | (BIND_IOMMUFD |            | (SET_CONTAINER/SET_IOMMU)
>              |  ATTACH_IOAS) |            | device fd
>              |               |            |
>              |       +-------V------------V-----------------+
>      iommufd |       |                vfio                  |
> (map/unmap  |       +---------+--------------------+-------+
> ioas_copy)  |                 |                    | map/unmap
>              |                 |                    |
>       +------V------+    +-----V------+      +------V--------+
>       | iommfd core |    |  device    |      |  vfio iommu   |
>       +-------------+    +------------+      +---------------+
> 
> [Secure Context setup]
> - iommufd BE: uses device fd and iommufd to setup secure context
>                (bind_iommufd, attach_ioas)
> - vfio legacy BE: uses group fd and container fd to setup secure context
>                    (set_container, set_iommu)
> [Device access]
> - iommufd BE: device fd is opened through /dev/vfio/devices/vfioX
> - vfio legacy BE: device fd is retrieved from group fd ioctl
> [DMA Mapping flow]
> 1. VFIOAddressSpace receives MemoryRegion add/del via MemoryListener
> 2. VFIO populates DMA map/unmap via the container BEs
>     *) iommufd BE: uses iommufd
>     *) vfio legacy BE: uses container fd
> 
> 
> Changelog:
> v4:
> - add CONFIG_IOMMUFD check for IOMMUFDProperties (Markus)
> - add doc for default case without fd (Markus)
> - Fix build issue reported by Markus and Cédric
> - Simply use SPDX identifier in new file (Cédric)
> - make vfio_container_init/destroy helper a seperate patch (Cédric)
> - make vrdl_list movement a seperate patch (Cédric)
> - add const for some callback parameters (Cédric)
> - add g_assert in VFIOIOMMUOps callback (Cédric)
> - introduce pci_hot_reset callback (Cédric)
> - remove VFIOIOMMUSpaprOps (Cédric)
> - initialize g_autofree to NULL (Cédric)
> - adjust func name prefix and trace event in iommufd.c (Cédric)
> - add RB
> 
> v3:
> - Rename base container as VFIOContainerBase and legacy container as container (Cédric)
> - Drop VFIO_IOMMU_BACKEND_OPS class and use struct instead (Cédric)
> - Cleanup container.c by introducing spapr backend and move spapr code out (Cédric)
> - Introduce vfio_iommu_spapr_ops (Cédric)
> - Add doc of iommufd in qom.json and have iommufd member sorted (Markus)
> - patch19 and patch21 are splitted to two parts to facilitate review
> 
> v2:
> - patch "vfio: Add base container" in v1 is split into patch1-15 per Cédric
> - add fd passing to platform/ap/ccw vfio device
> - add (uintptr_t) cast in iommufd_backend_map_dma() per Cédric
> - rename char_dev.h to chardev_open.h for same naming scheme per Daniel
> - add full copyright per Daniel and Jason
> 
> 
> Note changelog below are from full IOMMUFD series:
> 
> v1:
> - Alloc hwpt instead of using auto hwpt
> - elaborate iommufd code per Nicolin
> - consolidate two patches and drop as.c
> - typo error fix and function rename
> 
> rfcv4:
> - rebase on top of v8.0.3
> - Add one patch from Yi which is about vfio device add in kvm
> - Remove IOAS_COPY optimization and focus on functions in this patchset
> - Fix wrong name issue reported and fix suggested by Matthew
> - Fix compilation issue reported and fix sugggsted by Nicolin
> - Use query_dirty_bitmap callback to replace get_dirty_bitmap for better
> granularity
> - Add dev_iter_next() callback to avoid adding so many callback
>    at container scope, add VFIODevice.hwpt to support that
> - Restore all functions back to common from container whenever possible,
>    mainly migration and reset related functions
> - Add --enable/disable-iommufd config option, enabled by default in linux
> - Remove VFIODevice.hwpt_next as it's redundant with VFIODevice.next
> - Adapt new VFIO_DEVICE_PCI_HOT_RESET uAPI for IOMMUFD backed device
> - vfio_kvm_device_add/del_group call vfio_kvm_device_add/del_fd to remove
> redundant code
> - Add FD passing support for vfio device backed by IOMMUFD
> - Fix hot unplug resource leak issue in vfio_legacy_detach_device()
> - Fix FD leak in vfio_get_devicefd()
> 
> rfcv3:
> - rebase on top of v7.2.0
> - Fix the compilation with CONFIG_IOMMUFD unset by using true classes for
>    VFIO backends
> - Fix use after free in error path, reported by Alister
> - Split common.c in several steps to ease the review
> 
> rfcv2:
> - remove the first three patches of rfcv1
> - add open cdev helper suggested by Jason
> - remove the QOMification of the VFIOContainer and simply use standard ops
> (David)
> - add "-object iommufd" suggested by Alex
> 
> Thanks
> Zhenzhong
> 
> Eric Auger (11):
>    vfio/container: Switch to dma_map|unmap API
>    vfio/common: Move giommu_list in base container
>    vfio/container: Move space field to base container
>    vfio/container: Switch to IOMMU BE
>      set_dirty_page_tracking/query_dirty_bitmap API
>    vfio/container: Convert functions to base container
>    vfio/container: Move pgsizes and dma_max_mappings to base container
>    vfio/container: Move listener to base container
>    vfio/container: Move dirty_pgsizes and max_dirty_bitmap_size to base
>      container
>    vfio/container: Implement attach/detach_device
>    backends/iommufd: Introduce the iommufd object
>    vfio/pci: Allow the selection of a given iommu backend
> 
> Yi Liu (2):
>    util/char_dev: Add open_cdev()
>    vfio/iommufd: Implement the iommufd backend
> 
> Zhenzhong Duan (28):
>    vfio/container: Move IBM EEH related functions into spapr_pci_vfio.c
>    vfio/container: Move vfio_container_add/del_section_window into
>      spapr.c
>    vfio/container: Move spapr specific init/deinit into spapr.c
>    vfio/spapr: Make vfio_spapr_create/remove_window static
>    vfio/common: Move vfio_host_win_add/del into spapr.c
>    vfio: Introduce base object for VFIOContainer and targeted interface
>    vfio/container: Introduce a empty VFIOIOMMUOps
>    vfio/common: Introduce vfio_container_init/destroy helper
>    vfio/container: Move per container device list in base container
>    vfio/container: Move vrdl_list to base container
>    vfio/container: Move iova_ranges to base container
>    vfio/spapr: Introduce spapr backend and target interface
>    vfio/spapr: switch to spapr IOMMU BE add/del_section_window
>    vfio/spapr: Move prereg_listener into spapr container
>    vfio/spapr: Move hostwin_list into spapr container
>    Add iommufd configure option
>    vfio/iommufd: Relax assert check for iommufd backend
>    vfio/iommufd: Add support for iova_ranges
>    vfio/pci: Extract out a helper vfio_pci_get_pci_hot_reset_info
>    vfio/pci: Introduce a vfio pci hot reset interface
>    vfio/iommufd: Enable pci hot reset through iommufd cdev interface
>    vfio/pci: Make vfio cdev pre-openable by passing a file handle
>    vfio: Allow the selection of a given iommu backend for platform ap and
>      ccw
>    vfio/platform: Make vfio cdev pre-openable by passing a file handle
>    vfio/ap: Make vfio cdev pre-openable by passing a file handle
>    vfio/ccw: Make vfio cdev pre-openable by passing a file handle
>    vfio: Make VFIOContainerBase poiner parameter const in VFIOIOMMUOps
>      callbacks
>    vfio: Compile out iommufd for PPC target
> 
>   MAINTAINERS                           |  13 +
>   meson.build                           |   6 +
>   qapi/qom.json                         |  22 +
>   hw/vfio/pci.h                         |   6 +
>   include/hw/vfio/vfio-common.h         | 118 ++---
>   include/hw/vfio/vfio-container-base.h | 121 +++++
>   include/hw/vfio/vfio-platform.h       |   1 +
>   include/hw/vfio/vfio.h                |   7 -
>   include/qemu/chardev_open.h           |  16 +
>   include/sysemu/iommufd.h              |  46 ++
>   backends/iommufd-stub.c               |  59 +++
>   backends/iommufd.c                    | 257 ++++++++++
>   hw/ppc/spapr_pci_vfio.c               | 100 +++-
>   hw/vfio/ap.c                          |  38 +-
>   hw/vfio/ccw.c                         |  40 +-
>   hw/vfio/common.c                      | 330 ++++++------
>   hw/vfio/container-base.c              | 101 ++++
>   hw/vfio/container.c                   | 443 ++++------------
>   hw/vfio/helpers.c                     |  34 +-
>   hw/vfio/iommufd.c                     | 697 ++++++++++++++++++++++++++
>   hw/vfio/pci.c                         | 112 +++--
>   hw/vfio/platform.c                    |  45 +-
>   hw/vfio/spapr.c                       | 338 ++++++++++++-
>   util/chardev_open.c                   |  81 +++
>   backends/Kconfig                      |   4 +
>   backends/meson.build                  |   5 +
>   backends/trace-events                 |  12 +
>   hw/vfio/meson.build                   |   4 +
>   hw/vfio/trace-events                  |  18 +-
>   meson_options.txt                     |   2 +
>   qemu-options.hx                       |  13 +
>   scripts/meson-buildoptions.sh         |   3 +
>   util/meson.build                      |   1 +
>   33 files changed, 2403 insertions(+), 690 deletions(-)
>   create mode 100644 include/hw/vfio/vfio-container-base.h
>   delete mode 100644 include/hw/vfio/vfio.h
>   create mode 100644 include/qemu/chardev_open.h
>   create mode 100644 include/sysemu/iommufd.h
>   create mode 100644 backends/iommufd-stub.c
>   create mode 100644 backends/iommufd.c
>   create mode 100644 hw/vfio/container-base.c
>   create mode 100644 hw/vfio/iommufd.c
>   create mode 100644 util/chardev_open.c
>
Matthew Rosato Nov. 8, 2023, 3:26 a.m. UTC | #3
On 11/7/23 1:28 PM, Cédric Le Goater wrote:
> On 11/2/23 08:12, Zhenzhong Duan wrote:
>> Hi,
>>
>> Thanks all for giving guides and comments on previous series, here is
>> the v4 of pure iommufd support part.
>>
>> Based on Cédric's suggestion, this series includes an effort to remove
>> spapr code from container.c, now all spapr functions are moved to spapr.c
>> or spapr_pci_vfio.c, but there are still a few trival check on
>> VFIO_SPAPR_TCE_*_IOMMU which I am not sure if deserved to introduce many
>> callbacks and duplicate code just to remove them. Some functions are moved
>> to spapr.c instead of spapr_pci_vfio.c to avoid compile issue because
>> spapr_pci_vfio.c is arch specific, or else we need to introduce stub
>> functions to those spapr functions moved.
>>
>>
>> PATCH 1-5: Move spapr functions to spapr*.c
>> PATCH 6-20: Abstract out base container
>> PATCH 21-24: Introduce sparpr container and its specific interface
> 
> PATCH 6-24 applied to vfio-next :
> 
>   https://github.com/legoater/qemu/commits/vfio-next
> 
> (with a global s/fucntional/functional/)
> 
> 
> I also pushed the remaining patches on :
> 
>   https://github.com/legoater/qemu/commits/vfio-8.2
> 
> with a slight rework of the IOMMUFD configuration, now done per platform.
> The VFIO frontend and the 'iommufd' object are only available on x86_64,
> arm, s390x.

FYI, I first tried this vfio-8.2 branch on s390x but wasn't actually able to use the iommufd backend (was getting errors like Property 'vfio-pci.iommufd' not found) so I think something isn't actually enabling IOMMUFD as expected with your change...

Instead I tested on s390x using vfio-next + patches 25-41 of this series on top.  

Legacy backend regression testing worked fine for vfio-pci, vfio-ap and vfio-ccw.

Using iommufd backend for vfio-pci on s390 exposes an s390-only issue related to accounting of vfio DMA limit (code in hw/s390x/s390-pci-vfio.c assumes VFIODevice.group is never null, but that's no longer true when we use the iommufd backend with cdev).  We don't even need to track this when using the iommufd backend -- With that issue bypassed, vfio-pci testing on s390x looks good so far.  I'll send a separate fix for that.

Using the iommufd backend for vfio-ccw and vfio-ap did not work, see response to patch 28.

Thanks,
Matt
Duan, Zhenzhong Nov. 8, 2023, 8:37 a.m. UTC | #4
>-----Original Message-----
>From: Matthew Rosato <mjrosato@linux.ibm.com>
>Sent: Wednesday, November 8, 2023 11:27 AM
>Subject: Re: [PATCH v4 00/41] vfio: Adopt iommufd
>
>On 11/7/23 1:28 PM, Cédric Le Goater wrote:
>> On 11/2/23 08:12, Zhenzhong Duan wrote:
>>> Hi,
>>>
>>> Thanks all for giving guides and comments on previous series, here is
>>> the v4 of pure iommufd support part.
>>>
>>> Based on Cédric's suggestion, this series includes an effort to remove
>>> spapr code from container.c, now all spapr functions are moved to spapr.c
>>> or spapr_pci_vfio.c, but there are still a few trival check on
>>> VFIO_SPAPR_TCE_*_IOMMU which I am not sure if deserved to introduce
>many
>>> callbacks and duplicate code just to remove them. Some functions are moved
>>> to spapr.c instead of spapr_pci_vfio.c to avoid compile issue because
>>> spapr_pci_vfio.c is arch specific, or else we need to introduce stub
>>> functions to those spapr functions moved.
>>>
>>>
>>> PATCH 1-5: Move spapr functions to spapr*.c
>>> PATCH 6-20: Abstract out base container
>>> PATCH 21-24: Introduce sparpr container and its specific interface
>>
>> PATCH 6-24 applied to vfio-next :
>>
>>   https://github.com/legoater/qemu/commits/vfio-next
>>
>> (with a global s/fucntional/functional/)
>>
>>
>> I also pushed the remaining patches on :
>>
>>   https://github.com/legoater/qemu/commits/vfio-8.2
>>
>> with a slight rework of the IOMMUFD configuration, now done per platform.
>> The VFIO frontend and the 'iommufd' object are only available on x86_64,
>> arm, s390x.

Thanks Cédric.

>
>FYI, I first tried this vfio-8.2 branch on s390x but wasn't actually able to use the
>iommufd backend (was getting errors like Property 'vfio-pci.iommufd' not found)
>so I think something isn't actually enabling IOMMUFD as expected with your
>change...

It looks CONFIG_IOMMUFD is recognized by Kconfig sub-system but not received
by compiler. I'm still digging how to pass CONFIG_IOMMUFD to compiler.

>
>Instead I tested on s390x using vfio-next + patches 25-41 of this series on top.
>
>Legacy backend regression testing worked fine for vfio-pci, vfio-ap and vfio-ccw.
>
>Using iommufd backend for vfio-pci on s390 exposes an s390-only issue related
>to accounting of vfio DMA limit (code in hw/s390x/s390-pci-vfio.c assumes
>VFIODevice.group is never null, but that's no longer true when we use the
>iommufd backend with cdev).  We don't even need to track this when using the
>iommufd backend -- With that issue bypassed, vfio-pci testing on s390x looks
>good so far.  I'll send a separate fix for that.

Thanks for fixing that.

BRs.
Zhenzhong

>
>Using the iommufd backend for vfio-ccw and vfio-ap did not work, see response
>to patch 28.
>
>Thanks,
>Matt
Duan, Zhenzhong Nov. 8, 2023, 9:07 a.m. UTC | #5
>-----Original Message-----
>From: Duan, Zhenzhong
>Sent: Wednesday, November 8, 2023 4:38 PM
>Subject: RE: [PATCH v4 00/41] vfio: Adopt iommufd
>
>
>
>>-----Original Message-----
>>From: Matthew Rosato <mjrosato@linux.ibm.com>
>>Sent: Wednesday, November 8, 2023 11:27 AM
>>Subject: Re: [PATCH v4 00/41] vfio: Adopt iommufd
>>
>>On 11/7/23 1:28 PM, Cédric Le Goater wrote:
>>> On 11/2/23 08:12, Zhenzhong Duan wrote:
>>>> Hi,
>>>>
>>>> Thanks all for giving guides and comments on previous series, here is
>>>> the v4 of pure iommufd support part.
>>>>
>>>> Based on Cédric's suggestion, this series includes an effort to remove
>>>> spapr code from container.c, now all spapr functions are moved to spapr.c
>>>> or spapr_pci_vfio.c, but there are still a few trival check on
>>>> VFIO_SPAPR_TCE_*_IOMMU which I am not sure if deserved to introduce
>>many
>>>> callbacks and duplicate code just to remove them. Some functions are moved
>>>> to spapr.c instead of spapr_pci_vfio.c to avoid compile issue because
>>>> spapr_pci_vfio.c is arch specific, or else we need to introduce stub
>>>> functions to those spapr functions moved.
>>>>
>>>>
>>>> PATCH 1-5: Move spapr functions to spapr*.c
>>>> PATCH 6-20: Abstract out base container
>>>> PATCH 21-24: Introduce sparpr container and its specific interface
>>>
>>> PATCH 6-24 applied to vfio-next :
>>>
>>>   https://github.com/legoater/qemu/commits/vfio-next
>>>
>>> (with a global s/fucntional/functional/)
>>>
>>>
>>> I also pushed the remaining patches on :
>>>
>>>   https://github.com/legoater/qemu/commits/vfio-8.2
>>>
>>> with a slight rework of the IOMMUFD configuration, now done per platform.
>>> The VFIO frontend and the 'iommufd' object are only available on x86_64,
>>> arm, s390x.
>
>Thanks Cédric.
>
>>
>>FYI, I first tried this vfio-8.2 branch on s390x but wasn't actually able to use the
>>iommufd backend (was getting errors like Property 'vfio-pci.iommufd' not found)
>>so I think something isn't actually enabling IOMMUFD as expected with your
>>change...
>
>It looks CONFIG_IOMMUFD is recognized by Kconfig sub-system but not received
>by compiler. I'm still digging how to pass CONFIG_IOMMUFD to compiler.

Need below change to pass CONFIG_IOMMUFD to compiler.

diff --git a/hw/vfio/ap.c b/hw/vfio/ap.c
index 0a810f8b88..2a3263b51f 100644
--- a/hw/vfio/ap.c
+++ b/hw/vfio/ap.c
@@ -30,6 +30,7 @@
 #include "exec/address-spaces.h"
 #include "qom/object.h"
 #include "monitor/monitor.h"
+#include CONFIG_DEVICES

 #define TYPE_VFIO_AP_DEVICE      "vfio-ap"

diff --git a/hw/vfio/ccw.c b/hw/vfio/ccw.c
index a674bd8d6d..08101ad445 100644
--- a/hw/vfio/ccw.c
+++ b/hw/vfio/ccw.c
@@ -31,6 +31,7 @@
 #include "qemu/main-loop.h"
 #include "qemu/module.h"
 #include "monitor/monitor.h"
+#include CONFIG_DEVICES

 struct VFIOCCWDevice {
     S390CCWDevice cdev;
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index d8f658ea47..3121b5f985 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -44,6 +44,7 @@
 #include "migration/qemu-file.h"
 #include "sysemu/iommufd.h"
 #include "monitor/monitor.h"
+#include CONFIG_DEVICES

 #define TYPE_VFIO_PCI_NOHOTPLUG "vfio-pci-nohotplug"

Thanks
Zhenzhong
Cédric Le Goater Nov. 8, 2023, 9:21 a.m. UTC | #6
On 11/8/23 04:26, Matthew Rosato wrote:
> On 11/7/23 1:28 PM, Cédric Le Goater wrote:
>> On 11/2/23 08:12, Zhenzhong Duan wrote:
>>> Hi,
>>>
>>> Thanks all for giving guides and comments on previous series, here is
>>> the v4 of pure iommufd support part.
>>>
>>> Based on Cédric's suggestion, this series includes an effort to remove
>>> spapr code from container.c, now all spapr functions are moved to spapr.c
>>> or spapr_pci_vfio.c, but there are still a few trival check on
>>> VFIO_SPAPR_TCE_*_IOMMU which I am not sure if deserved to introduce many
>>> callbacks and duplicate code just to remove them. Some functions are moved
>>> to spapr.c instead of spapr_pci_vfio.c to avoid compile issue because
>>> spapr_pci_vfio.c is arch specific, or else we need to introduce stub
>>> functions to those spapr functions moved.
>>>
>>>
>>> PATCH 1-5: Move spapr functions to spapr*.c
>>> PATCH 6-20: Abstract out base container
>>> PATCH 21-24: Introduce sparpr container and its specific interface
>>
>> PATCH 6-24 applied to vfio-next :
>>
>>    https://github.com/legoater/qemu/commits/vfio-next
>>
>> (with a global s/fucntional/functional/)
>>
>>
>> I also pushed the remaining patches on :
>>
>>    https://github.com/legoater/qemu/commits/vfio-8.2
>>
>> with a slight rework of the IOMMUFD configuration, now done per platform.
>> The VFIO frontend and the 'iommufd' object are only available on x86_64,
>> arm, s390x.
> 
> FYI, I first tried this vfio-8.2 branch on s390x but wasn't actually able to use the iommufd backend (was getting errors like Property 'vfio-pci.iommufd' not found) so I think something isn't actually enabling IOMMUFD as expected with your change...

yes. The previous method used to enable the IOMMUFD device with
a ./configure script option was exposing the CONFIG_IOMMUFD define
globally.

The current method using the Kconfig files requires an extra :

#include CONFIG_DEVICES

in each file using CONFIG_IOMMUFD.

I didn't see it because when compiled natively on x86_64 the
CONFIG_IOMMUFD define is included for some (magic) reason.
It is not the case on other arches, ppc64, aarch64, s390x.

I did the update and repushed vfio-8.2. Should work now.

> 
> Instead I tested on s390x using vfio-next + patches 25-41 of this series on top.
> 
> Legacy backend regression testing worked fine for vfio-pci, vfio-ap and vfio-ccw.

ok. Good. This means that vfio-next is in good shape.

> Using iommufd backend for vfio-pci on s390 exposes an s390-only issue related to accounting of vfio DMA limit (code in hw/s390x/s390-pci-vfio.c assumes VFIODevice.group is never null, but that's no longer true when we use the iommufd backend with cdev).  We don't even need to track this when using the iommufd backend -- With that issue bypassed, vfio-pci testing on s390x looks good so far.  I'll send a separate fix for that.

Thanks,

C.

  
> Using the iommufd backend for vfio-ccw and vfio-ap did not work, see response to patch 28.
> 
> Thanks,
> Matt
>
Cédric Le Goater Nov. 8, 2023, 9:23 a.m. UTC | #7
>>> FYI, I first tried this vfio-8.2 branch on s390x but wasn't actually able to use the
>>> iommufd backend (was getting errors like Property 'vfio-pci.iommufd' not found)
>>> so I think something isn't actually enabling IOMMUFD as expected with your
>>> change...
>>
>> It looks CONFIG_IOMMUFD is recognized by Kconfig sub-system but not received
>> by compiler. I'm still digging how to pass CONFIG_IOMMUFD to compiler.
> 
> Need below change to pass CONFIG_IOMMUFD to compiler.
> 
> diff --git a/hw/vfio/ap.c b/hw/vfio/ap.c
> index 0a810f8b88..2a3263b51f 100644
> --- a/hw/vfio/ap.c
> +++ b/hw/vfio/ap.c
> @@ -30,6 +30,7 @@
>   #include "exec/address-spaces.h"
>   #include "qom/object.h"
>   #include "monitor/monitor.h"
> +#include CONFIG_DEVICES
> 
>   #define TYPE_VFIO_AP_DEVICE      "vfio-ap"
> 
> diff --git a/hw/vfio/ccw.c b/hw/vfio/ccw.c
> index a674bd8d6d..08101ad445 100644
> --- a/hw/vfio/ccw.c
> +++ b/hw/vfio/ccw.c
> @@ -31,6 +31,7 @@
>   #include "qemu/main-loop.h"
>   #include "qemu/module.h"
>   #include "monitor/monitor.h"
> +#include CONFIG_DEVICES
> 
>   struct VFIOCCWDevice {
>       S390CCWDevice cdev;
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index d8f658ea47..3121b5f985 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -44,6 +44,7 @@
>   #include "migration/qemu-file.h"
>   #include "sysemu/iommufd.h"
>   #include "monitor/monitor.h"
> +#include CONFIG_DEVICES
> 
>   #define TYPE_VFIO_PCI_NOHOTPLUG "vfio-pci-nohotplug"


yep. I pushed forced vfio-8.2 with these changes.

Thanks,

C.