[0/3] vfio/pci: Add NVIDIA GPUDirect P2P clique support

Message ID 20170829214929.31136.21144.stgit@gimli.home

Message

Alex Williamson Aug. 29, 2017, 10:05 p.m. UTC
NVIDIA has a specification for exposing a virtual vendor capability
which provides a hint to guest drivers as to which sets of GPUs can
support direct peer-to-peer DMA.  Devices with the same clique ID are
expected to support this.  The user can specify a clique ID for an
NVIDIA graphics device using the new vfio-pci x-nv-gpudirect-clique=
option, where a valid clique ID is a 4-bit integer.  It's entirely the
user's responsibility to specify sets of devices for which P2P works
correctly and provides some benefit.  This is only useful for DMA
between NVIDIA GPUs, so it only makes sense to specify cliques
composed of more than one GPU.  Furthermore, this does not enable DMA
between VMs and does not change VM DMA mapping; it only exposes
hints about existing DMA paths to the guest driver.  Thanks,

Alex

---

Alex Williamson (3):
      vfio/pci: Do not unwind on error
      vfio/pci: Add virtual capabilities quirk infrastructure
      vfio/pci: Add NVIDIA GPUDirect Cliques support


 hw/vfio/pci-quirks.c |  114 ++++++++++++++++++++++++++++++++++++++++++++++++++
 hw/vfio/pci.c        |   17 +++++++
 hw/vfio/pci.h        |    4 ++
 3 files changed, 133 insertions(+), 2 deletions(-)
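
For readers unfamiliar with the capability format, the fragment below is a
rough, hypothetical sketch of how a guest driver might locate the clique
hint in config space.  It is not part of this series; the byte layout it
assumes (a vendor-specific capability, ID 0x09, length 8, carrying a "P2P"
signature followed by a clique byte) is inferred from the hand-rolled quirk
quoted later in this thread, and read_cfg_byte() stands in for whatever
config-space accessor the driver actually uses.

    #include <stdint.h>

    #define PCI_CAP_ID_VNDR  0x09

    /* Hypothetical config-space accessor: returns one byte at "off". */
    extern uint8_t read_cfg_byte(void *dev, unsigned off);

    /* Walk the standard capability list and return the clique hint byte,
     * or -1 if no GPUDirect P2P capability is found. */
    static int find_p2p_clique(void *dev)
    {
        uint8_t pos = read_cfg_byte(dev, 0x34);     /* capabilities pointer */

        while (pos) {
            uint8_t id   = read_cfg_byte(dev, pos);
            uint8_t next = read_cfg_byte(dev, pos + 1);

            if (id == PCI_CAP_ID_VNDR &&
                read_cfg_byte(dev, pos + 3) == 'P' &&
                read_cfg_byte(dev, pos + 4) == '2' &&
                read_cfg_byte(dev, pos + 5) == 'P') {
                return read_cfg_byte(dev, pos + 6); /* clique byte */
            }
            pos = next;
        }
        return -1;
    }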

Comments

Bob Chen Oct. 26, 2017, 10:45 a.m. UTC | #1
There seem to be some bugs in these patches, causing my VM to fail to boot.

Test case:

0. Merge these 3 patches into release 2.10.1

1. qemu-system-x86_64_2.10.1  ... \
-device vfio-pci,host=04:00.0 \
-device vfio-pci,host=05:00.0 \
-device vfio-pci,host=08:00.0 \
-device vfio-pci,host=09:00.0 \
-device vfio-pci,host=85:00.0 \
-device vfio-pci,host=86:00.0 \
-device vfio-pci,host=89:00.0 \
-device vfio-pci,host=8a:00.0 ...

The guest was able to boot up.

2. qemu-system-x86_64_2.10.1  ... \
-device vfio-pci,host=04:00.0,x-nv-gpudirect-clique=0 \
-device vfio-pci,host=05:00.0,x-nv-gpudirect-clique=0 \
-device vfio-pci,host=08:00.0,x-nv-gpudirect-clique=0 \
-device vfio-pci,host=09:00.0,x-nv-gpudirect-clique=0 \
-device vfio-pci,host=85:00.0,x-nv-gpudirect-clique=8 \
-device vfio-pci,host=86:00.0,x-nv-gpudirect-clique=8 \
-device vfio-pci,host=89:00.0,x-nv-gpudirect-clique=8 \
-device vfio-pci,host=8a:00.0,x-nv-gpudirect-clique=8 \

Hang. VNC couldn't connect.


My personal patch used to work, although it was done by straightforward
hacking and isn't that friendly to read.

--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c

 @@ static int vfio_initfn(PCIDevice *pdev)

    /* Emulate a vendor-specific capability (ID 0x09, length 8) at 0xc8:
     * 0x50080009 decodes to cap ID 0x09, next 0x00, length 0x08, 'P'. */
    vfio_add_emulated_long(vdev, 0xc8, 0x50080009, ~0);
    /* '2' 'P' signature bytes plus what appears to be the clique byte
     * (0x00 vs 0x08); "count" is the number of assigned GPUs, defined
     * elsewhere in this local hack. */
    if (count < 4) {
        vfio_add_emulated_long(vdev, 0xcc, 0x00005032, ~0);
    } else {
        vfio_add_emulated_long(vdev, 0xcc, 0x00085032, ~0);
    }
    /* Chain the new capability in by pointing the next-capability byte
     * of the capability at 0x78 to 0xc8. */
    vfio_add_emulated_word(vdev, 0x78, 0xc810, ~0);

2017-08-30 6:05 GMT+08:00 Alex Williamson <alex.williamson@redhat.com>:

> NVIDIA has a specification for exposing a virtual vendor capability
> which provides a hint to guest drivers as to which sets of GPUs can
> support direct peer-to-peer DMA.
> [...]
Bob Chen Nov. 20, 2017, 10:45 a.m. UTC | #2
It was my mistake, please ignore it. This patch series does work.

2017-10-26 18:45 GMT+08:00 Bob Chen <a175818323@gmail.com>:

> There seem to be some bugs in these patches, causing my VM to fail to boot.
> [...]