[0/3] vfio/pci: Add NVIDIA GPUDirect P2P clique support

Message ID 20170829214929.31136.21144.stgit@gimli.home

Message

Alex Williamson Aug. 29, 2017, 10:05 p.m. UTC
NVIDIA has a specification for exposing a virtual vendor capability
which provides a hint to guest drivers as to which sets of GPUs can
support direct peer-to-peer DMA.  Devices with the same clique ID are
expected to support this.  The user can specify a clique ID for an
NVIDIA graphics device using the new vfio-pci x-nv-gpudirect-clique=
option, where a valid clique ID is a 4-bit integer.  It's entirely the
user's responsibility to specify sets of devices for which P2P works
correctly and provides some benefit.  This is only useful for DMA
between NVIDIA GPUs, so it only makes sense to specify cliques
composed of more than one GPU.  Furthermore, this does not enable DMA
between VMs and does not change VM DMA mapping; it only exposes
hints about existing DMA paths to the guest driver.  Thanks,

Alex

---

Alex Williamson (3):
      vfio/pci: Do not unwind on error
      vfio/pci: Add virtual capabilities quirk infrastructure
      vfio/pci: Add NVIDIA GPUDirect Cliques support


 hw/vfio/pci-quirks.c |  114 ++++++++++++++++++++++++++++++++++++++++++++++++++
 hw/vfio/pci.c        |   17 +++++++
 hw/vfio/pci.h        |    4 ++
 3 files changed, 133 insertions(+), 2 deletions(-)
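
For readers unfamiliar with the capability format, the fragment below is a
rough, hypothetical sketch of how a guest driver might locate the clique
hint in config space.  It is not part of this series; the byte layout it
assumes (a vendor-specific capability, ID 0x09, length 8, carrying a "P2P"
signature followed by a clique byte) is inferred from the hand-rolled quirk
quoted later in this thread, and read_cfg_byte() stands in for whatever
config-space accessor the driver actually uses.

    #include <stdint.h>

    #define PCI_CAP_ID_VNDR  0x09

    /* Hypothetical config-space accessor: returns one byte at "off". */
    extern uint8_t read_cfg_byte(void *dev, unsigned off);

    /* Walk the standard capability list and return the clique hint byte,
     * or -1 if no GPUDirect P2P capability is found. */
    static int find_p2p_clique(void *dev)
    {
        uint8_t pos = read_cfg_byte(dev, 0x34);     /* capabilities pointer */

        while (pos) {
            uint8_t id   = read_cfg_byte(dev, pos);
            uint8_t next = read_cfg_byte(dev, pos + 1);

            if (id == PCI_CAP_ID_VNDR &&
                read_cfg_byte(dev, pos + 3) == 'P' &&
                read_cfg_byte(dev, pos + 4) == '2' &&
                read_cfg_byte(dev, pos + 5) == 'P') {
                return read_cfg_byte(dev, pos + 6); /* clique byte */
            }
            pos = next;
        }
        return -1;
    }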

Comments

Bob Chen Oct. 26, 2017, 10:45 a.m. UTC | #1
There seem to be some bugs in these patches, causing my VM to fail to boot.

Test case:

0. Merge these 3 patches into release 2.10.1

1. qemu-system-x86_64_2.10.1  ... \
-device vfio-pci,host=04:00.0 \
-device vfio-pci,host=05:00.0 \
-device vfio-pci,host=08:00.0 \
-device vfio-pci,host=09:00.0 \
-device vfio-pci,host=85:00.0 \
-device vfio-pci,host=86:00.0 \
-device vfio-pci,host=89:00.0 \
-device vfio-pci,host=8a:00.0 ...

The guest was able to boot up.

2. qemu-system-x86_64_2.10.1  ... \
-device vfio-pci,host=04:00.0,x-nv-gpudirect-clique=0 \
-device vfio-pci,host=05:00.0,x-nv-gpudirect-clique=0 \
-device vfio-pci,host=08:00.0,x-nv-gpudirect-clique=0 \
-device vfio-pci,host=09:00.0,x-nv-gpudirect-clique=0 \
-device vfio-pci,host=85:00.0,x-nv-gpudirect-clique=8 \
-device vfio-pci,host=86:00.0,x-nv-gpudirect-clique=8 \
-device vfio-pci,host=89:00.0,x-nv-gpudirect-clique=8 \
-device vfio-pci,host=8a:00.0,x-nv-gpudirect-clique=8 \

Hang. VNC couldn't connect.


My personal patch used to work, although it was done by straightforward
hacking and isn't that friendly to read.

--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c

 @@ static int vfio_initfn(PCIDevice *pdev)

    /* Emulate a vendor-specific capability (ID 0x09, length 8) at 0xc8:
     * 0x50080009 decodes to cap ID 0x09, next 0x00, length 0x08, 'P'. */
    vfio_add_emulated_long(vdev, 0xc8, 0x50080009, ~0);
    /* '2' 'P' signature bytes plus what appears to be the clique byte
     * (0x00 vs 0x08); "count" is the number of assigned GPUs, defined
     * elsewhere in this local hack. */
    if (count < 4) {
        vfio_add_emulated_long(vdev, 0xcc, 0x00005032, ~0);
    } else {
        vfio_add_emulated_long(vdev, 0xcc, 0x00085032, ~0);
    }
    /* Chain the new capability in by pointing the next-capability byte
     * of the capability at 0x78 to 0xc8. */
    vfio_add_emulated_word(vdev, 0x78, 0xc810, ~0);

2017-08-30 6:05 GMT+08:00 Alex Williamson <alex.williamson@redhat.com>:

> NVIDIA has a specification for exposing a virtual vendor capability
> which provides a hint to guest drivers as to which sets of GPUs can
> support direct peer-to-peer DMA.
> [...]
Bob Chen Nov. 20, 2017, 10:45 a.m. UTC | #2
It was my mistake, please ignore it. This patch series does work.

2017-10-26 18:45 GMT+08:00 Bob Chen <a175818323@gmail.com>:

> There seem to be some bugs in these patches, causing my VM to fail to boot.
> [...]