diff mbox

vfio-pci: Fix Nvidia MSI ACK through 0x88000 quirk

Message ID 20131111214137.4219.53967.stgit@bling.home
State New
Headers show

Commit Message

Alex Williamson Nov. 11, 2013, 9:43 p.m. UTC
When MSI is enabled on Nvidia GeForce cards the driver seems to
acknowledge the interrupt by writing a 0xff byte to the MSI capability
ID register using the PCI config space mirror at offset 0x88000 from
BAR0.  Without this, the device will only fire a single interrupt.
VFIO handles the PCI capability ID/next registers as virtual w/o write
support, so any write through config space is currently dropped.  Add
a check for this and allow the write through the BAR window.  The
registers are read-only anyway.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---

We might be able to do something with an ioeventfd to make the QEMU
component of this asynchronous... tbd.

 hw/misc/vfio.c |   30 +++++++++++++++++++++++++++++-
 1 file changed, 29 insertions(+), 1 deletion(-)

Comments

Dave Airlie Nov. 11, 2013, 9:55 p.m. UTC | #1
On Tue, Nov 12, 2013 at 7:43 AM, Alex Williamson
<alex.williamson@redhat.com> wrote:
> When MSI is enabled on Nvidia GeForce cards the driver seems to
> acknowledge the interrupt by writing a 0xff byte to the MSI capability
> ID register using the PCI config space mirror at offset 0x88000 from
> BAR0.  Without this, the device will only fire a single interrupt.
> VFIO handles the PCI capability ID/next registers as virtual w/o write
> support, so any write through config space is currently dropped.  Add
> a check for this and allow the write through the BAR window.  The
> registers are read-only anyway.

This is only half the truth, I'm afraid if I'm right its much worse than that.

At least on some GPUs the MSI ack is done via PCI config space itself,
and on some its done via the mirror, and yes it matters on some cards
which way it works.

Dave.
Alex Williamson Nov. 11, 2013, 10:32 p.m. UTC | #2
On Tue, 2013-11-12 at 07:55 +1000, Dave Airlie wrote:
> On Tue, Nov 12, 2013 at 7:43 AM, Alex Williamson
> <alex.williamson@redhat.com> wrote:
> > When MSI is enabled on Nvidia GeForce cards the driver seems to
> > acknowledge the interrupt by writing a 0xff byte to the MSI capability
> > ID register using the PCI config space mirror at offset 0x88000 from
> > BAR0.  Without this, the device will only fire a single interrupt.
> > VFIO handles the PCI capability ID/next registers as virtual w/o write
> > support, so any write through config space is currently dropped.  Add
> > a check for this and allow the write through the BAR window.  The
> > registers are read-only anyway.
> 
> This is only half the truth, I'm afraid if I'm right its much worse than that.
> 
> At least on some GPUs the MSI ack is done via PCI config space itself,
> and on some its done via the mirror, and yes it matters on some cards
> which way it works.

I was hoping that wouldn't be the case since it seems fairly universal
that PCI config space access should be considered slow and avoided for
things like this.  But, I suppose with MMConfig it's no worse than
device MMIO space.

On my GTX660 I did actually test both paths.  The existing quirk
converts all accesses into the config space mirror into proper config
space accesses, so I changed the kernel to not drop the write.  Then I
figured I'd rather solve this in QEMU using the mirror directly.  So in
my case it didn't matter which path.

I'll keep this in mind though, I may want to enable kernel support for
this first and advertise it via a flag on the config space region, then
I can avoid doing a double write in QEMU and really breaking the state
machine.  Thanks for the feedback,

Alex
Dave Airlie Nov. 11, 2013, 10:56 p.m. UTC | #3
On Tue, Nov 12, 2013 at 8:32 AM, Alex Williamson
<alex.williamson@redhat.com> wrote:
> On Tue, 2013-11-12 at 07:55 +1000, Dave Airlie wrote:
>> On Tue, Nov 12, 2013 at 7:43 AM, Alex Williamson
>> <alex.williamson@redhat.com> wrote:
>> > When MSI is enabled on Nvidia GeForce cards the driver seems to
>> > acknowledge the interrupt by writing a 0xff byte to the MSI capability
>> > ID register using the PCI config space mirror at offset 0x88000 from
>> > BAR0.  Without this, the device will only fire a single interrupt.
>> > VFIO handles the PCI capability ID/next registers as virtual w/o write
>> > support, so any write through config space is currently dropped.  Add
>> > a check for this and allow the write through the BAR window.  The
>> > registers are read-only anyway.
>>
>> This is only half the truth, I'm afraid if I'm right its much worse than that.
>>
>> At least on some GPUs the MSI ack is done via PCI config space itself,
>> and on some its done via the mirror, and yes it matters on some cards
>> which way it works.
>
> I was hoping that wouldn't be the case since it seems fairly universal
> that PCI config space access should be considered slow and avoided for
> things like this.  But, I suppose with MMConfig it's no worse than
> device MMIO space.

For reference dig around in here

http://www.mail-archive.com/nouveau@lists.freedesktop.org/msg14437.html

nvidia commented on it as well, it may be they require the slowness or
just some other coherency issue.

Dave.
diff mbox

Patch

diff --git a/hw/misc/vfio.c b/hw/misc/vfio.c
index 5a8c305..3f50872 100644
--- a/hw/misc/vfio.c
+++ b/hw/misc/vfio.c
@@ -1811,6 +1811,34 @@  static void vfio_probe_nvidia_bar5_window_quirk(VFIODevice *vdev, int nr)
             vdev->host.function);
 }
 
+static void vfio_nvidia_88000_quirk_write(void *opaque, hwaddr addr,
+                                          uint64_t data, unsigned size)
+{
+    VFIOQuirk *quirk = opaque;
+    VFIODevice *vdev = quirk->vdev;
+    PCIDevice *pdev = &vdev->pdev;
+    hwaddr base = quirk->data.address_match & TARGET_PAGE_MASK;
+
+    vfio_generic_quirk_write(opaque, addr, data, size);
+
+    /*
+     * Nvidia seems to acknowledge MSI interrupts by writing 0xff to the
+     * MSI capability ID register.  Both the ID and next register are
+     * read-only, so we allow writes covering either of those to real hw.
+     * NB - only fixed for the 0x88000 MMIO window.
+     */
+    if ((pdev->cap_present & QEMU_PCI_CAP_MSI) &&
+        vfio_range_contained(addr, size, pdev->msi_cap, PCI_MSI_FLAGS)) {
+        vfio_bar_write(&vdev->bars[quirk->data.bar], addr + base, data, size);
+    }
+}
+
+static const MemoryRegionOps vfio_nvidia_88000_quirk = {
+    .read = vfio_generic_quirk_read,
+    .write = vfio_nvidia_88000_quirk_write,
+    .endianness = DEVICE_LITTLE_ENDIAN,
+};
+
 /*
  * Finally, BAR0 itself.  We want to redirect any accesses to either
  * 0x1800 or 0x88000 through the PCI config space access functions.
@@ -1837,7 +1865,7 @@  static void vfio_probe_nvidia_bar0_88000_quirk(VFIODevice *vdev, int nr)
     quirk->data.address_mask = PCIE_CONFIG_SPACE_SIZE - 1;
     quirk->data.bar = nr;
 
-    memory_region_init_io(&quirk->mem, OBJECT(vdev), &vfio_generic_quirk,
+    memory_region_init_io(&quirk->mem, OBJECT(vdev), &vfio_nvidia_88000_quirk,
                           quirk, "vfio-nvidia-bar0-88000-quirk",
                           TARGET_PAGE_ALIGN(quirk->data.address_mask + 1));
     memory_region_add_subregion_overlap(&vdev->bars[nr].mem,