Message ID | 20131111214137.4219.53967.stgit@bling.home |
---|---|
State | New |
Headers | show |
On Tue, Nov 12, 2013 at 7:43 AM, Alex Williamson <alex.williamson@redhat.com> wrote: > When MSI is enabled on Nvidia GeForce cards the driver seems to > acknowledge the interrupt by writing a 0xff byte to the MSI capability > ID register using the PCI config space mirror at offset 0x88000 from > BAR0. Without this, the device will only fire a single interrupt. > VFIO handles the PCI capability ID/next registers as virtual w/o write > support, so any write through config space is currently dropped. Add > a check for this and allow the write through the BAR window. The > registers are read-only anyway. This is only half the truth, I'm afraid if I'm right its much worse than that. At least on some GPUs the MSI ack is done via PCI config space itself, and on some its done via the mirror, and yes it matters on some cards which way it works. Dave.
On Tue, 2013-11-12 at 07:55 +1000, Dave Airlie wrote: > On Tue, Nov 12, 2013 at 7:43 AM, Alex Williamson > <alex.williamson@redhat.com> wrote: > > When MSI is enabled on Nvidia GeForce cards the driver seems to > > acknowledge the interrupt by writing a 0xff byte to the MSI capability > > ID register using the PCI config space mirror at offset 0x88000 from > > BAR0. Without this, the device will only fire a single interrupt. > > VFIO handles the PCI capability ID/next registers as virtual w/o write > > support, so any write through config space is currently dropped. Add > > a check for this and allow the write through the BAR window. The > > registers are read-only anyway. > > This is only half the truth, I'm afraid if I'm right its much worse than that. > > At least on some GPUs the MSI ack is done via PCI config space itself, > and on some its done via the mirror, and yes it matters on some cards > which way it works. I was hoping that wouldn't be the case since it seems fairly universal that PCI config space access should be considered slow and avoided for things like this. But, I suppose with MMConfig it's no worse than device MMIO space. On my GTX660 I did actually test both paths. The existing quirk converts all accesses into the config space mirror into proper config space accesses, so I changed the kernel to not drop the write. Then I figured I'd rather solve this in QEMU using the mirror directly. So in my case it didn't matter which path. I'll keep this in mind though, I may want to enable kernel support for this first and advertise it via a flag on the config space region, then I can avoid doing a double write in QEMU and really breaking the state machine. Thanks for the feedback, Alex
On Tue, Nov 12, 2013 at 8:32 AM, Alex Williamson <alex.williamson@redhat.com> wrote: > On Tue, 2013-11-12 at 07:55 +1000, Dave Airlie wrote: >> On Tue, Nov 12, 2013 at 7:43 AM, Alex Williamson >> <alex.williamson@redhat.com> wrote: >> > When MSI is enabled on Nvidia GeForce cards the driver seems to >> > acknowledge the interrupt by writing a 0xff byte to the MSI capability >> > ID register using the PCI config space mirror at offset 0x88000 from >> > BAR0. Without this, the device will only fire a single interrupt. >> > VFIO handles the PCI capability ID/next registers as virtual w/o write >> > support, so any write through config space is currently dropped. Add >> > a check for this and allow the write through the BAR window. The >> > registers are read-only anyway. >> >> This is only half the truth, I'm afraid if I'm right its much worse than that. >> >> At least on some GPUs the MSI ack is done via PCI config space itself, >> and on some its done via the mirror, and yes it matters on some cards >> which way it works. > > I was hoping that wouldn't be the case since it seems fairly universal > that PCI config space access should be considered slow and avoided for > things like this. But, I suppose with MMConfig it's no worse than > device MMIO space. For reference dig around in here http://www.mail-archive.com/nouveau@lists.freedesktop.org/msg14437.html nvidia commented on it as well, it may be they require the slowness or just some other coherency issue. Dave.
diff --git a/hw/misc/vfio.c b/hw/misc/vfio.c index 5a8c305..3f50872 100644 --- a/hw/misc/vfio.c +++ b/hw/misc/vfio.c @@ -1811,6 +1811,34 @@ static void vfio_probe_nvidia_bar5_window_quirk(VFIODevice *vdev, int nr) vdev->host.function); } +static void vfio_nvidia_88000_quirk_write(void *opaque, hwaddr addr, + uint64_t data, unsigned size) +{ + VFIOQuirk *quirk = opaque; + VFIODevice *vdev = quirk->vdev; + PCIDevice *pdev = &vdev->pdev; + hwaddr base = quirk->data.address_match & TARGET_PAGE_MASK; + + vfio_generic_quirk_write(opaque, addr, data, size); + + /* + * Nvidia seems to acknowledge MSI interrupts by writing 0xff to the + * MSI capability ID register. Both the ID and next register are + * read-only, so we allow writes covering either of those to real hw. + * NB - only fixed for the 0x88000 MMIO window. + */ + if ((pdev->cap_present & QEMU_PCI_CAP_MSI) && + vfio_range_contained(addr, size, pdev->msi_cap, PCI_MSI_FLAGS)) { + vfio_bar_write(&vdev->bars[quirk->data.bar], addr + base, data, size); + } +} + +static const MemoryRegionOps vfio_nvidia_88000_quirk = { + .read = vfio_generic_quirk_read, + .write = vfio_nvidia_88000_quirk_write, + .endianness = DEVICE_LITTLE_ENDIAN, +}; + /* * Finally, BAR0 itself. We want to redirect any accesses to either * 0x1800 or 0x88000 through the PCI config space access functions. @@ -1837,7 +1865,7 @@ static void vfio_probe_nvidia_bar0_88000_quirk(VFIODevice *vdev, int nr) quirk->data.address_mask = PCIE_CONFIG_SPACE_SIZE - 1; quirk->data.bar = nr; - memory_region_init_io(&quirk->mem, OBJECT(vdev), &vfio_generic_quirk, + memory_region_init_io(&quirk->mem, OBJECT(vdev), &vfio_nvidia_88000_quirk, quirk, "vfio-nvidia-bar0-88000-quirk", TARGET_PAGE_ALIGN(quirk->data.address_mask + 1)); memory_region_add_subregion_overlap(&vdev->bars[nr].mem,
When MSI is enabled on Nvidia GeForce cards the driver seems to acknowledge the interrupt by writing a 0xff byte to the MSI capability ID register using the PCI config space mirror at offset 0x88000 from BAR0. Without this, the device will only fire a single interrupt. VFIO handles the PCI capability ID/next registers as virtual w/o write support, so any write through config space is currently dropped. Add a check for this and allow the write through the BAR window. The registers are read-only anyway. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> --- We might be able to do something with an ioeventfd to make the QEMU component of this asynchronous... tbd. hw/misc/vfio.c | 30 +++++++++++++++++++++++++++++- 1 file changed, 29 insertions(+), 1 deletion(-)