From patchwork Tue May 1 16:43:05 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Alex Williamson
X-Patchwork-Id: 907126
Return-Path:
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=nongnu.org (client-ip=208.118.235.17; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=)
Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=redhat.com
Received: from lists.gnu.org (lists.gnu.org [208.118.235.17]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 40b6gC1xp4z9s35 for ; Wed, 2 May 2018 02:43:51 +1000 (AEST)
Received: from localhost ([::1]:45249 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fDYNZ-0003yT-3C for incoming@patchwork.ozlabs.org; Tue, 01 May 2018 12:43:49 -0400
Received: from eggs.gnu.org ([2001:4830:134:3::10]:55339) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fDYN5-0003xB-2r for qemu-devel@nongnu.org; Tue, 01 May 2018 12:43:20 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1fDYN1-0007EY-4U for qemu-devel@nongnu.org; Tue, 01 May 2018 12:43:19 -0400
Received: from mx1.redhat.com ([209.132.183.28]:45162) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1fDYN0-0007DH-QX for qemu-devel@nongnu.org; Tue, 01 May 2018 12:43:15 -0400
Received: from smtp.corp.redhat.com (int-mx12.intmail.prod.int.phx2.redhat.com [10.5.11.27]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id BA68C3003A24; Tue, 1 May 2018 16:43:13 +0000 (UTC)
Received: from gimli.home (ovpn-116-103.phx2.redhat.com [10.3.116.103]) by smtp.corp.redhat.com (Postfix) with ESMTP id A2C254D327; Tue, 1 May 2018 16:43:06 +0000 (UTC)
From: Alex Williamson
To: qemu-devel@nongnu.org
Date: Tue, 01 May 2018 10:43:05 -0600
Message-ID: <20180501164305.28940.50928.stgit@gimli.home>
In-Reply-To: <20180501162901.28940.1075.stgit@gimli.home>
References: <20180501162901.28940.1075.stgit@gimli.home>
User-Agent: StGit/0.18-102-gdf9f
MIME-Version: 1.0
X-Scanned-By: MIMEDefang 2.84 on 10.5.11.27
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.42]); Tue, 01 May 2018 16:43:13 +0000 (UTC)
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] [fuzzy]
X-Received-From: 209.132.183.28
Subject: [Qemu-devel] [PATCH v2 1/4] vfio/quirks: Add common quirk alloc helper
X-BeenThere: qemu-devel@nongnu.org
X-Mailman-Version: 2.1.21
Precedence: list
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Cc: eric.auger@redhat.com, peterx@redhat.com, kvm@vger.kernel.org
Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org
Sender: "Qemu-devel"

This will later be used to include list initialization.

Reviewed-by: Eric Auger
Reviewed-by: Peter Xu
Signed-off-by: Alex Williamson
---
 hw/vfio/pci-quirks.c | 48 +++++++++++++++++++++---------------------------
 1 file changed, 21 insertions(+), 27 deletions(-)

diff --git a/hw/vfio/pci-quirks.c b/hw/vfio/pci-quirks.c
index e5779a7ad35b..cc3a74ed992a 100644
--- a/hw/vfio/pci-quirks.c
+++ b/hw/vfio/pci-quirks.c
@@ -275,6 +275,15 @@ static const MemoryRegionOps vfio_ati_3c3_quirk = {
     .endianness = DEVICE_LITTLE_ENDIAN,
 };
 
+static VFIOQuirk *vfio_quirk_alloc(int nr_mem)
+{
+    VFIOQuirk *quirk = g_new0(VFIOQuirk, 1);
+    quirk->mem = g_new0(MemoryRegion, nr_mem);
+    quirk->nr_mem = nr_mem;
+
+    return quirk;
+}
+
 static void vfio_vga_probe_ati_3c3_quirk(VFIOPCIDevice *vdev)
 {
     VFIOQuirk *quirk;
@@ -288,9 +297,7 @@ static void vfio_vga_probe_ati_3c3_quirk(VFIOPCIDevice *vdev)
         return;
     }
 
-    quirk = g_malloc0(sizeof(*quirk));
-    quirk->mem = g_new0(MemoryRegion, 1);
-    quirk->nr_mem = 1;
+    quirk = vfio_quirk_alloc(1);
 
     memory_region_init_io(quirk->mem, OBJECT(vdev), &vfio_ati_3c3_quirk, vdev,
                           "vfio-ati-3c3-quirk", 1);
@@ -323,9 +330,7 @@ static void vfio_probe_ati_bar4_quirk(VFIOPCIDevice *vdev, int nr)
         return;
     }
 
-    quirk = g_malloc0(sizeof(*quirk));
-    quirk->mem = g_new0(MemoryRegion, 2);
-    quirk->nr_mem = 2;
+    quirk = vfio_quirk_alloc(2);
     window = quirk->data = g_malloc0(sizeof(*window) +
                                      sizeof(VFIOConfigWindowMatch));
     window->vdev = vdev;
@@ -371,10 +376,9 @@ static void vfio_probe_ati_bar2_quirk(VFIOPCIDevice *vdev, int nr)
         return;
     }
 
-    quirk = g_malloc0(sizeof(*quirk));
+    quirk = vfio_quirk_alloc(1);
     mirror = quirk->data = g_malloc0(sizeof(*mirror));
-    mirror->mem = quirk->mem = g_new0(MemoryRegion, 1);
-    quirk->nr_mem = 1;
+    mirror->mem = quirk->mem;
     mirror->vdev = vdev;
     mirror->offset = 0x4000;
     mirror->bar = nr;
@@ -548,10 +552,8 @@ static void vfio_vga_probe_nvidia_3d0_quirk(VFIOPCIDevice *vdev)
         return;
     }
 
-    quirk = g_malloc0(sizeof(*quirk));
+    quirk = vfio_quirk_alloc(2);
     quirk->data = data = g_malloc0(sizeof(*data));
-    quirk->mem = g_new0(MemoryRegion, 2);
-    quirk->nr_mem = 2;
     data->vdev = vdev;
 
     memory_region_init_io(&quirk->mem[0], OBJECT(vdev), &vfio_nvidia_3d4_quirk,
@@ -667,9 +669,7 @@ static void vfio_probe_nvidia_bar5_quirk(VFIOPCIDevice *vdev, int nr)
         return;
     }
 
-    quirk = g_malloc0(sizeof(*quirk));
-    quirk->mem = g_new0(MemoryRegion, 4);
-    quirk->nr_mem = 4;
+    quirk = vfio_quirk_alloc(4);
     bar5 = quirk->data = g_malloc0(sizeof(*bar5) +
                                    (sizeof(VFIOConfigWindowMatch) * 2));
     window = &bar5->window;
@@ -762,10 +762,9 @@ static void vfio_probe_nvidia_bar0_quirk(VFIOPCIDevice *vdev, int nr)
         return;
     }
 
-    quirk = g_malloc0(sizeof(*quirk));
+    quirk = vfio_quirk_alloc(1);
     mirror = quirk->data = g_malloc0(sizeof(*mirror));
-    mirror->mem = quirk->mem = g_new0(MemoryRegion, 1);
-    quirk->nr_mem = 1;
+    mirror->mem = quirk->mem;
     mirror->vdev = vdev;
     mirror->offset = 0x88000;
     mirror->bar = nr;
@@ -781,10 +780,9 @@ static void vfio_probe_nvidia_bar0_quirk(VFIOPCIDevice *vdev, int nr)
 
     /* The 0x1800 offset mirror only seems to get used by legacy VGA */
     if (vdev->vga) {
-        quirk = g_malloc0(sizeof(*quirk));
+        quirk = vfio_quirk_alloc(1);
         mirror = quirk->data = g_malloc0(sizeof(*mirror));
-        mirror->mem = quirk->mem = g_new0(MemoryRegion, 1);
-        quirk->nr_mem = 1;
+        mirror->mem = quirk->mem;
         mirror->vdev = vdev;
         mirror->offset = 0x1800;
         mirror->bar = nr;
@@ -945,9 +943,7 @@ static void vfio_probe_rtl8168_bar2_quirk(VFIOPCIDevice *vdev, int nr)
         return;
     }
 
-    quirk = g_malloc0(sizeof(*quirk));
-    quirk->mem = g_new0(MemoryRegion, 2);
-    quirk->nr_mem = 2;
+    quirk = vfio_quirk_alloc(2);
     quirk->data = rtl = g_malloc0(sizeof(*rtl));
     rtl->vdev = vdev;
 
@@ -1507,9 +1503,7 @@ static void vfio_probe_igd_bar4_quirk(VFIOPCIDevice *vdev, int nr)
     }
 
     /* Setup our quirk to munge GTT addresses to the VM allocated buffer */
-    quirk = g_malloc0(sizeof(*quirk));
-    quirk->mem = g_new0(MemoryRegion, 2);
-    quirk->nr_mem = 2;
+    quirk = vfio_quirk_alloc(2);
     igd = quirk->data = g_malloc0(sizeof(*igd));
     igd->vdev = vdev;
     igd->index = ~0;

From patchwork Tue May 1 16:43:18 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Alex Williamson
X-Patchwork-Id: 907129
Return-Path:
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=nongnu.org (client-ip=208.118.235.17; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=)
Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=redhat.com
Received: from lists.gnu.org (lists.gnu.org [208.118.235.17]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 40b6jl3mTJz9s1w for ; Wed, 2 May 2018 02:46:03 +1000 (AEST)
Received: from localhost ([::1]:45267 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fDYPh-0005gv-9H for incoming@patchwork.ozlabs.org; Tue, 01 May 2018 12:46:01 -0400
Received: from eggs.gnu.org ([2001:4830:134:3::10]:55402) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fDYNI-00048I-0C for qemu-devel@nongnu.org; Tue, 01 May 2018 12:43:33 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1fDYNE-0007LY-3B for qemu-devel@nongnu.org; Tue, 01 May 2018 12:43:32 -0400
Received: from mx1.redhat.com ([209.132.183.28]:58248) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1fDYND-0007Ky-TQ for qemu-devel@nongnu.org; Tue, 01 May 2018 12:43:28 -0400
Received: from smtp.corp.redhat.com (int-mx11.intmail.prod.int.phx2.redhat.com [10.5.11.26]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 091B781DE9; Tue, 1 May 2018 16:43:27 +0000 (UTC)
Received: from gimli.home (ovpn-116-103.phx2.redhat.com [10.3.116.103]) by smtp.corp.redhat.com (Postfix) with ESMTP id C0482313DD08; Tue, 1 May 2018 16:43:19 +0000 (UTC)
From: Alex Williamson
To: qemu-devel@nongnu.org
Date: Tue, 01 May 2018 10:43:18 -0600
Message-ID: <20180501164318.28940.89195.stgit@gimli.home>
In-Reply-To: <20180501162901.28940.1075.stgit@gimli.home>
References: <20180501162901.28940.1075.stgit@gimli.home>
User-Agent: StGit/0.18-102-gdf9f
MIME-Version: 1.0
X-Scanned-By: MIMEDefang 2.84 on 10.5.11.26
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.25]); Tue, 01 May 2018 16:43:27 +0000 (UTC)
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] [fuzzy]
X-Received-From: 209.132.183.28
Subject: [Qemu-devel] [PATCH v2 2/4] vfio/quirks: Add quirk reset callback
X-BeenThere: qemu-devel@nongnu.org
X-Mailman-Version: 2.1.21
Precedence: list
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Cc: eric.auger@redhat.com, peterx@redhat.com, kvm@vger.kernel.org
Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org
Sender: "Qemu-devel"

Quirks can be self-modifying; provide a hook to allow them to clean up on
device reset if desired.

Reviewed-by: Eric Auger
Reviewed-by: Peter Xu
Signed-off-by: Alex Williamson
---
 hw/vfio/pci-quirks.c | 15 +++++++++++++++
 hw/vfio/pci.c | 2 ++
 hw/vfio/pci.h | 2 ++
 3 files changed, 19 insertions(+)

diff --git a/hw/vfio/pci-quirks.c b/hw/vfio/pci-quirks.c
index cc3a74ed992a..f0947cbf152f 100644
--- a/hw/vfio/pci-quirks.c
+++ b/hw/vfio/pci-quirks.c
@@ -1694,6 +1694,21 @@ void vfio_bar_quirk_finalize(VFIOPCIDevice *vdev, int nr)
 /*
  * Reset quirks
  */
+void vfio_quirk_reset(VFIOPCIDevice *vdev)
+{
+    int i;
+
+    for (i = 0; i < PCI_ROM_SLOT; i++) {
+        VFIOQuirk *quirk;
+        VFIOBAR *bar = &vdev->bars[i];
+
+        QLIST_FOREACH(quirk, &bar->quirks, next) {
+            if (quirk->reset) {
+                quirk->reset(vdev, quirk);
+            }
+        }
+    }
+}
 
 /*
  * AMD Radeon PCI config reset, based on Linux:
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 4947fe39a28c..65446fb42845 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2207,6 +2207,8 @@ static void vfio_pci_post_reset(VFIOPCIDevice *vdev)
                          vdev->vbasedev.name, nr);
         }
     }
+
+    vfio_quirk_reset(vdev);
 }
 
 static bool vfio_pci_host_match(PCIHostDeviceAddress *addr, const char *name)
diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
index 59ab7757a300..594a5bd00593 100644
--- a/hw/vfio/pci.h
+++ b/hw/vfio/pci.h
@@ -29,6 +29,7 @@ typedef struct VFIOQuirk {
     void *data;
     int nr_mem;
     MemoryRegion *mem;
+    void (*reset)(struct VFIOPCIDevice *vdev, struct VFIOQuirk *quirk);
 } VFIOQuirk;
 
 typedef struct VFIOBAR {
@@ -167,6 +168,7 @@ void vfio_bar_quirk_exit(VFIOPCIDevice *vdev, int nr);
 void vfio_bar_quirk_finalize(VFIOPCIDevice *vdev, int nr);
 void vfio_setup_resetfn_quirk(VFIOPCIDevice *vdev);
 int vfio_add_virt_caps(VFIOPCIDevice *vdev, Error **errp);
+void vfio_quirk_reset(VFIOPCIDevice *vdev);
 
 extern const PropertyInfo qdev_prop_nv_gpudirect_clique;
 

From patchwork Tue May 1 16:43:32 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Alex Williamson
X-Patchwork-Id: 907127
Return-Path:
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=nongnu.org (client-ip=208.118.235.17; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=)
Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=redhat.com
Received: from lists.gnu.org (lists.gnu.org [208.118.235.17]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 40b6gt2lm9z9s2t for ; Wed, 2 May 2018 02:44:26 +1000 (AEST)
Received: from localhost ([::1]:45250 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fDYO8-0004Jh-9V for incoming@patchwork.ozlabs.org; Tue, 01 May 2018 12:44:24 -0400
Received: from eggs.gnu.org ([2001:4830:134:3::10]:55452) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fDYNW-0004HS-Gv for qemu-devel@nongnu.org; Tue, 01 May 2018 12:43:48 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1fDYNS-0007Tm-FO for qemu-devel@nongnu.org; Tue, 01 May 2018 12:43:46 -0400
Received: from mx1.redhat.com ([209.132.183.28]:42112) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1fDYNS-0007TH-6m for qemu-devel@nongnu.org; Tue, 01 May 2018 12:43:42 -0400
Received: from smtp.corp.redhat.com (int-mx09.intmail.prod.int.phx2.redhat.com [10.5.11.24]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 67599C057FAD; Tue, 1 May 2018 16:43:41 +0000 (UTC)
Received: from gimli.home (ovpn-116-103.phx2.redhat.com [10.3.116.103]) by smtp.corp.redhat.com (Postfix) with ESMTP id D2ABF30012B5; Tue, 1 May 2018 16:43:32 +0000 (UTC)
From: Alex Williamson
To: qemu-devel@nongnu.org
Date: Tue, 01 May 2018 10:43:32 -0600
Message-ID: <20180501164332.28940.40383.stgit@gimli.home>
In-Reply-To: <20180501162901.28940.1075.stgit@gimli.home>
References: <20180501162901.28940.1075.stgit@gimli.home>
User-Agent: StGit/0.18-102-gdf9f
MIME-Version: 1.0
X-Scanned-By: MIMEDefang 2.84 on 10.5.11.24
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.32]); Tue, 01 May 2018 16:43:41 +0000 (UTC)
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] [fuzzy]
X-Received-From: 209.132.183.28
Subject: [Qemu-devel] [PATCH v2 3/4] vfio/quirks: ioeventfd quirk acceleration
X-BeenThere: qemu-devel@nongnu.org
X-Mailman-Version: 2.1.21
Precedence: list
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Cc: eric.auger@redhat.com, peterx@redhat.com, kvm@vger.kernel.org
Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org
Sender: "Qemu-devel"

The NVIDIA BAR0 quirks virtualize the PCI config space mirrors found
in device MMIO space. Normally PCI config space is considered a slow
path and further optimization is unnecessary; however, NVIDIA uses a
register here to enable the MSI interrupt to re-trigger. Exiting
to QEMU for this MSI-ACK handling can therefore rate limit our
interrupt handling. Fortunately the MSI-ACK write is easily detected
since the quirk MemoryRegion otherwise has very few accesses, so simply
looking for consecutive writes with the same data is sufficient; the
threshold of 10 consecutive writes with the same data and size is
arbitrarily chosen. We configure the KVM ioeventfd with data match, so
there's no risk of triggering for the wrong data or size, but we do risk
that pathological driver behavior might consume all of QEMU's file
descriptors, so we cap ourselves to 10 ioeventfds for this purpose.

In support of the above, generic ioeventfd infrastructure is added
for vfio quirks.
This automatically initializes an ioeventfd list per quirk, disables
and frees ioeventfds on exit, and allows ioeventfds marked as dynamic
to be dropped on device reset. The rationale for this latter feature
is that useful ioeventfds may depend on specific driver behavior and
since we necessarily place a cap on our use of ioeventfds, a machine
reset is a reasonable point at which to assume a new driver and
re-profile.

Signed-off-by: Alex Williamson
Reviewed-by: Peter Xu
Reviewed-by: Eric Auger
---
 hw/vfio/pci-quirks.c | 174 +++++++++++++++++++++++++++++++++++++++++++++++++-
 hw/vfio/pci.c | 2 +
 hw/vfio/pci.h | 15 ++++
 hw/vfio/trace-events | 3 +
 4 files changed, 192 insertions(+), 2 deletions(-)

diff --git a/hw/vfio/pci-quirks.c b/hw/vfio/pci-quirks.c
index f0947cbf152f..4cedc733bc0a 100644
--- a/hw/vfio/pci-quirks.c
+++ b/hw/vfio/pci-quirks.c
@@ -12,6 +12,7 @@
 
 #include "qemu/osdep.h"
 #include "qemu/error-report.h"
+#include "qemu/main-loop.h"
 #include "qemu/range.h"
 #include "qapi/error.h"
 #include "qapi/visitor.h"
@@ -202,6 +203,7 @@ typedef struct VFIOConfigMirrorQuirk {
     uint32_t offset;
     uint8_t bar;
     MemoryRegion *mem;
+    uint8_t data[];
 } VFIOConfigMirrorQuirk;
 
 static uint64_t vfio_generic_quirk_mirror_read(void *opaque,
@@ -278,12 +280,98 @@ static const MemoryRegionOps vfio_ati_3c3_quirk = {
 static VFIOQuirk *vfio_quirk_alloc(int nr_mem)
 {
     VFIOQuirk *quirk = g_new0(VFIOQuirk, 1);
+    QLIST_INIT(&quirk->ioeventfds);
     quirk->mem = g_new0(MemoryRegion, nr_mem);
     quirk->nr_mem = nr_mem;
 
     return quirk;
 }
 
+static void vfio_ioeventfd_exit(VFIOIOEventFD *ioeventfd)
+{
+    QLIST_REMOVE(ioeventfd, next);
+    memory_region_del_eventfd(ioeventfd->mr, ioeventfd->addr, ioeventfd->size,
+                              ioeventfd->match_data, ioeventfd->data,
+                              &ioeventfd->e);
+    qemu_set_fd_handler(event_notifier_get_fd(&ioeventfd->e), NULL, NULL, NULL);
+    event_notifier_cleanup(&ioeventfd->e);
+    trace_vfio_ioeventfd_exit(memory_region_name(ioeventfd->mr),
+                              (uint64_t)ioeventfd->addr, ioeventfd->size,
+                              ioeventfd->data);
+    g_free(ioeventfd);
+}
+
+static void vfio_drop_dynamic_eventfds(VFIOPCIDevice *vdev, VFIOQuirk *quirk)
+{
+    VFIOIOEventFD *ioeventfd, *tmp;
+
+    QLIST_FOREACH_SAFE(ioeventfd, &quirk->ioeventfds, next, tmp) {
+        if (ioeventfd->dynamic) {
+            vfio_ioeventfd_exit(ioeventfd);
+        }
+    }
+}
+
+static void vfio_ioeventfd_handler(void *opaque)
+{
+    VFIOIOEventFD *ioeventfd = opaque;
+
+    if (event_notifier_test_and_clear(&ioeventfd->e)) {
+        vfio_region_write(ioeventfd->region, ioeventfd->region_addr,
+                          ioeventfd->data, ioeventfd->size);
+        trace_vfio_ioeventfd_handler(memory_region_name(ioeventfd->mr),
+                                     (uint64_t)ioeventfd->addr, ioeventfd->size,
+                                     ioeventfd->data);
+    }
+}
+
+static VFIOIOEventFD *vfio_ioeventfd_init(VFIOPCIDevice *vdev,
+                                          MemoryRegion *mr, hwaddr addr,
+                                          unsigned size, uint64_t data,
+                                          VFIORegion *region,
+                                          hwaddr region_addr, bool dynamic)
+{
+    VFIOIOEventFD *ioeventfd;
+
+    if (vdev->no_kvm_ioeventfd) {
+        return NULL;
+    }
+
+    ioeventfd = g_malloc0(sizeof(*ioeventfd));
+
+    if (event_notifier_init(&ioeventfd->e, 0)) {
+        g_free(ioeventfd);
+        return NULL;
+    }
+
+    /*
+     * MemoryRegion and relative offset, plus additional ioeventfd setup
+     * parameters for configuring and later tearing down KVM ioeventfd.
+     */
+    ioeventfd->mr = mr;
+    ioeventfd->addr = addr;
+    ioeventfd->size = size;
+    ioeventfd->data = data;
+    ioeventfd->match_data = true;
+    ioeventfd->dynamic = dynamic;
+    /*
+     * VFIORegion and relative offset for implementing the userspace
+     * handler. data & size fields shared for both uses.
+     */
+    ioeventfd->region = region;
+    ioeventfd->region_addr = region_addr;
+
+    qemu_set_fd_handler(event_notifier_get_fd(&ioeventfd->e),
+                        vfio_ioeventfd_handler, NULL, ioeventfd);
+    memory_region_add_eventfd(ioeventfd->mr, ioeventfd->addr,
+                              ioeventfd->size, ioeventfd->match_data,
+                              ioeventfd->data, &ioeventfd->e);
+    trace_vfio_ioeventfd_init(memory_region_name(mr), (uint64_t)addr,
+                              size, data);
+
+    return ioeventfd;
+}
+
 static void vfio_vga_probe_ati_3c3_quirk(VFIOPCIDevice *vdev)
 {
     VFIOQuirk *quirk;
@@ -719,6 +807,17 @@ static void vfio_probe_nvidia_bar5_quirk(VFIOPCIDevice *vdev, int nr)
     trace_vfio_quirk_nvidia_bar5_probe(vdev->vbasedev.name);
 }
 
+typedef struct LastDataSet {
+    hwaddr addr;
+    uint64_t data;
+    unsigned size;
+    int hits;
+    int added;
+} LastDataSet;
+
+#define MAX_DYN_IOEVENTFD 10
+#define HITS_FOR_IOEVENTFD 10
+
 /*
  * Finally, BAR0 itself. We want to redirect any accesses to either
  * 0x1800 or 0x88000 through the PCI config space access functions.
@@ -729,6 +828,7 @@ static void vfio_nvidia_quirk_mirror_write(void *opaque, hwaddr addr,
     VFIOConfigMirrorQuirk *mirror = opaque;
     VFIOPCIDevice *vdev = mirror->vdev;
     PCIDevice *pdev = &vdev->pdev;
+    LastDataSet *last = (LastDataSet *)&mirror->data;
 
     vfio_generic_quirk_mirror_write(opaque, addr, data, size);
 
@@ -743,6 +843,60 @@ static void vfio_nvidia_quirk_mirror_write(void *opaque, hwaddr addr,
                           addr + mirror->offset, data, size);
         trace_vfio_quirk_nvidia_bar0_msi_ack(vdev->vbasedev.name);
     }
+
+    /*
+     * Automatically add an ioeventfd to handle any repeated write with the
+     * same data and size above the standard PCI config space header. This is
+     * primarily expected to accelerate the MSI-ACK behavior, such as noted
+     * above. Current hardware/drivers should trigger an ioeventfd at config
+     * offset 0x704 (region offset 0x88704), with data 0x0, size 4.
+     *
+     * The criteria of 10 successive hits is arbitrary but reliably adds the
+     * MSI-ACK region. Note that as some writes are bypassed via the ioeventfd,
+     * the remaining ones have a greater chance of being seen successively.
+     * To avoid the pathological case of burning up all of QEMU's open file
+     * handles, arbitrarily limit this algorithm from adding no more than 10
+     * ioeventfds, print an error if we would have added an 11th, and then
+     * stop counting.
+     */
+    if (!vdev->no_kvm_ioeventfd &&
+        addr > PCI_STD_HEADER_SIZEOF && last->added < MAX_DYN_IOEVENTFD + 1) {
+        if (addr != last->addr || data != last->data || size != last->size) {
+            last->addr = addr;
+            last->data = data;
+            last->size = size;
+            last->hits = 1;
+        } else if (++last->hits >= HITS_FOR_IOEVENTFD) {
+            if (last->added < MAX_DYN_IOEVENTFD) {
+                VFIOIOEventFD *ioeventfd;
+                ioeventfd = vfio_ioeventfd_init(vdev, mirror->mem, addr, size,
+                                        data, &vdev->bars[mirror->bar].region,
+                                        mirror->offset + addr, true);
+                if (ioeventfd) {
+                    VFIOQuirk *quirk;
+
+                    QLIST_FOREACH(quirk,
+                                  &vdev->bars[mirror->bar].quirks, next) {
+                        if (quirk->data == mirror) {
+                            QLIST_INSERT_HEAD(&quirk->ioeventfds,
+                                              ioeventfd, next);
+                            break;
+                        }
+                    }
+
+                    assert(quirk != NULL); /* Check not found */
+
+                    last->added++;
+                }
+            } else {
+                last->added++;
+
+                error_report("NVIDIA ioeventfd queue full for %s, unable to "
+                             "accelerate 0x%"HWADDR_PRIx", data 0x%"PRIx64", "
+                             "size %u", vdev->vbasedev.name, addr, data, size);
+            }
+        }
+    }
 }
 
 static const MemoryRegionOps vfio_nvidia_mirror_quirk = {
@@ -751,6 +905,16 @@ static const MemoryRegionOps vfio_nvidia_mirror_quirk = {
     .endianness = DEVICE_LITTLE_ENDIAN,
 };
 
+static void vfio_nvidia_bar0_quirk_reset(VFIOPCIDevice *vdev, VFIOQuirk *quirk)
+{
+    VFIOConfigMirrorQuirk *mirror = quirk->data;
+    LastDataSet *last = (LastDataSet *)&mirror->data;
+
+    memset(last, 0, sizeof(*last));
+
+    vfio_drop_dynamic_eventfds(vdev, quirk);
+}
+
 static void vfio_probe_nvidia_bar0_quirk(VFIOPCIDevice *vdev, int nr)
 {
     VFIOQuirk *quirk;
@@ -763,7 +927,8 @@ static void vfio_probe_nvidia_bar0_quirk(VFIOPCIDevice *vdev, int nr)
     }
 
     quirk = vfio_quirk_alloc(1);
-    mirror = quirk->data = g_malloc0(sizeof(*mirror));
+    quirk->reset = vfio_nvidia_bar0_quirk_reset;
+    mirror = quirk->data = g_malloc0(sizeof(*mirror) + sizeof(LastDataSet));
     mirror->mem = quirk->mem;
     mirror->vdev = vdev;
     mirror->offset = 0x88000;
@@ -781,7 +946,8 @@ static void vfio_probe_nvidia_bar0_quirk(VFIOPCIDevice *vdev, int nr)
     /* The 0x1800 offset mirror only seems to get used by legacy VGA */
     if (vdev->vga) {
         quirk = vfio_quirk_alloc(1);
-        mirror = quirk->data = g_malloc0(sizeof(*mirror));
+        quirk->reset = vfio_nvidia_bar0_quirk_reset;
+        mirror = quirk->data = g_malloc0(sizeof(*mirror) + sizeof(LastDataSet));
         mirror->mem = quirk->mem;
         mirror->vdev = vdev;
         mirror->offset = 0x1800;
@@ -1668,6 +1834,10 @@ void vfio_bar_quirk_exit(VFIOPCIDevice *vdev, int nr)
     int i;
 
     QLIST_FOREACH(quirk, &bar->quirks, next) {
+        while (!QLIST_EMPTY(&quirk->ioeventfds)) {
+            vfio_ioeventfd_exit(QLIST_FIRST(&quirk->ioeventfds));
+        }
+
         for (i = 0; i < quirk->nr_mem; i++) {
             memory_region_del_subregion(bar->region.mem, &quirk->mem[i]);
         }
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 65446fb42845..ba1239551115 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -3175,6 +3175,8 @@ static Property vfio_pci_dev_properties[] = {
     DEFINE_PROP_BOOL("x-no-kvm-msix", VFIOPCIDevice, no_kvm_msix, false),
     DEFINE_PROP_BOOL("x-no-geforce-quirks", VFIOPCIDevice,
                      no_geforce_quirks, false),
+    DEFINE_PROP_BOOL("x-no-kvm-ioeventfd", VFIOPCIDevice, no_kvm_ioeventfd,
+                     false),
     DEFINE_PROP_UINT32("x-pci-vendor-id", VFIOPCIDevice, vendor_id, PCI_ANY_ID),
     DEFINE_PROP_UINT32("x-pci-device-id", VFIOPCIDevice, device_id, PCI_ANY_ID),
     DEFINE_PROP_UINT32("x-pci-sub-vendor-id", VFIOPCIDevice,
diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
index 594a5bd00593..dbb3aca9b3d2 100644
--- a/hw/vfio/pci.h
+++ b/hw/vfio/pci.h
@@ -24,9 +24,23 @@
 
 struct VFIOPCIDevice;
 
+typedef struct VFIOIOEventFD {
+    QLIST_ENTRY(VFIOIOEventFD) next;
+    MemoryRegion *mr;
+    hwaddr addr;
+    unsigned size;
+    uint64_t data;
+    EventNotifier e;
+    VFIORegion *region;
+    hwaddr region_addr;
+    bool match_data;
+    bool dynamic;
+} VFIOIOEventFD;
+
 typedef struct VFIOQuirk {
     QLIST_ENTRY(VFIOQuirk) next;
     void *data;
+    QLIST_HEAD(, VFIOIOEventFD) ioeventfds;
     int nr_mem;
     MemoryRegion *mem;
     void (*reset)(struct VFIOPCIDevice *vdev, struct VFIOQuirk *quirk);
@@ -149,6 +163,7 @@ typedef struct VFIOPCIDevice {
     bool no_kvm_msi;
     bool no_kvm_msix;
     bool no_geforce_quirks;
+    bool no_kvm_ioeventfd;
     VFIODisplay *dpy;
 } VFIOPCIDevice;
 
diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index 20109cb7581f..f8f97d1ff90c 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -77,6 +77,9 @@ vfio_quirk_ati_bonaire_reset_no_smc(const char *name) "%s"
 vfio_quirk_ati_bonaire_reset_timeout(const char *name) "%s"
 vfio_quirk_ati_bonaire_reset_done(const char *name) "%s"
 vfio_quirk_ati_bonaire_reset(const char *name) "%s"
+vfio_ioeventfd_exit(const char *name, uint64_t addr, unsigned size, uint64_t data) "%s+0x%"PRIx64"[%d]:0x%"PRIx64
+vfio_ioeventfd_handler(const char *name, uint64_t addr, unsigned size, uint64_t data) "%s+0x%"PRIx64"[%d] -> 0x%"PRIx64
+vfio_ioeventfd_init(const char *name, uint64_t addr, unsigned size, uint64_t data) "%s+0x%"PRIx64"[%d]:0x%"PRIx64
 vfio_pci_igd_bar4_write(const char *name, uint32_t index, uint32_t data, uint32_t base) "%s [0x%03x] 0x%08x -> 0x%08x"
 vfio_pci_igd_bdsm_enabled(const char *name, int size) "%s %dMB"
 vfio_pci_igd_opregion_enabled(const char *name) "%s"

From patchwork Tue May 1 16:43:46 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Alex Williamson
X-Patchwork-Id: 907128
Return-Path:
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=nongnu.org (client-ip=208.118.235.17; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=)
Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=redhat.com
Received: from lists.gnu.org (lists.gnu.org [208.118.235.17]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 40b6jC3Y6pz9s1w for ; Wed, 2 May 2018 02:45:35 +1000 (AEST)
Received: from localhost ([::1]:45259 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fDYPF-0005In-3U for incoming@patchwork.ozlabs.org; Tue, 01 May 2018 12:45:33 -0400
Received: from eggs.gnu.org ([2001:4830:134:3::10]:55521) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fDYNm-0004U8-Nk for qemu-devel@nongnu.org; Tue, 01 May 2018 12:44:04 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1fDYNl-0007dx-JT for qemu-devel@nongnu.org; Tue, 01 May 2018 12:44:02 -0400
Received: from mx1.redhat.com ([209.132.183.28]:57146) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1fDYNl-0007d4-9I for qemu-devel@nongnu.org; Tue, 01 May 2018 12:44:01 -0400
Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.phx2.redhat.com [10.5.11.23]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 375625F7B9; Tue, 1 May 2018 16:44:00 +0000 (UTC)
Received: from gimli.home (ovpn-116-103.phx2.redhat.com [10.3.116.103]) by smtp.corp.redhat.com (Postfix) with ESMTP id 8502A34206; Tue, 1 May 2018 16:43:46 +0000 (UTC)
From: Alex Williamson
To: qemu-devel@nongnu.org
Date: Tue, 01 May 2018 10:43:46 -0600
Message-ID: <20180501164346.28940.93328.stgit@gimli.home>
In-Reply-To: <20180501162901.28940.1075.stgit@gimli.home>
References: <20180501162901.28940.1075.stgit@gimli.home>
User-Agent: StGit/0.18-102-gdf9f
MIME-Version: 1.0
X-Scanned-By: MIMEDefang 2.84 on 10.5.11.23
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.39]); Tue, 01 May 2018 16:44:00 +0000 (UTC)
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] [fuzzy]
X-Received-From: 209.132.183.28
Subject: [Qemu-devel] [PATCH v2 4/4] vfio/quirks: Enable ioeventfd quirks to be handled by vfio directly
X-BeenThere: qemu-devel@nongnu.org
X-Mailman-Version: 2.1.21
Precedence: list
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Cc: eric.auger@redhat.com, peterx@redhat.com, kvm@vger.kernel.org
Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org
Sender: "Qemu-devel"

With vfio ioeventfd support, we can program vfio-pci to perform a
specified BAR write when an eventfd is triggered. This allows the
KVM ioeventfd to be wired directly to vfio-pci, entirely avoiding
userspace handling for these events. On the same micro-benchmark
where the ioeventfd got us to almost 90% of performance versus
disabling the GeForce quirks, this gets us to within 95%.

Signed-off-by: Alex Williamson
Reviewed-by: Peter Xu
---
 hw/vfio/pci-quirks.c | 50 +++++++++++++++++++++++++++++++++++++++++++-------
 hw/vfio/pci.c        |  2 ++
 hw/vfio/pci.h        |  2 ++
 hw/vfio/trace-events |  2 +-
 4 files changed, 48 insertions(+), 8 deletions(-)

diff --git a/hw/vfio/pci-quirks.c b/hw/vfio/pci-quirks.c
index 4cedc733bc0a..94be27dd0a3b 100644
--- a/hw/vfio/pci-quirks.c
+++ b/hw/vfio/pci-quirks.c
@@ -16,6 +16,7 @@
 #include "qemu/range.h"
 #include "qapi/error.h"
 #include "qapi/visitor.h"
+#include <sys/ioctl.h>
 #include "hw/nvram/fw_cfg.h"
 #include "pci.h"
 #include "trace.h"
@@ -287,13 +288,31 @@ static VFIOQuirk *vfio_quirk_alloc(int nr_mem)
     return quirk;
 }
 
-static void vfio_ioeventfd_exit(VFIOIOEventFD *ioeventfd)
+static void vfio_ioeventfd_exit(VFIOPCIDevice *vdev, VFIOIOEventFD *ioeventfd)
 {
     QLIST_REMOVE(ioeventfd, next);
+
     memory_region_del_eventfd(ioeventfd->mr, ioeventfd->addr, ioeventfd->size,
                               ioeventfd->match_data, ioeventfd->data,
                               &ioeventfd->e);
-    qemu_set_fd_handler(event_notifier_get_fd(&ioeventfd->e), NULL, NULL, NULL);
+
+    if (ioeventfd->vfio) {
+        struct vfio_device_ioeventfd vfio_ioeventfd;
+
+        vfio_ioeventfd.argsz = sizeof(vfio_ioeventfd);
+        vfio_ioeventfd.flags = ioeventfd->size;
+        vfio_ioeventfd.data = ioeventfd->data;
+        vfio_ioeventfd.offset = ioeventfd->region->fd_offset +
+                                ioeventfd->region_addr;
+        vfio_ioeventfd.fd = -1;
+
+        ioctl(vdev->vbasedev.fd, VFIO_DEVICE_IOEVENTFD, &vfio_ioeventfd);
+
+    } else {
+        qemu_set_fd_handler(event_notifier_get_fd(&ioeventfd->e),
+                            NULL, NULL, NULL);
+    }
+
     event_notifier_cleanup(&ioeventfd->e);
     trace_vfio_ioeventfd_exit(memory_region_name(ioeventfd->mr),
                               (uint64_t)ioeventfd->addr, ioeventfd->size,
@@ -307,7 +326,7 @@ static void vfio_drop_dynamic_eventfds(VFIOPCIDevice *vdev, VFIOQuirk *quirk)
 
     QLIST_FOREACH_SAFE(ioeventfd, &quirk->ioeventfds, next, tmp) {
         if (ioeventfd->dynamic) {
-            vfio_ioeventfd_exit(ioeventfd);
+            vfio_ioeventfd_exit(vdev, ioeventfd);
         }
     }
 }
@@ -361,13 +380,30 @@ static VFIOIOEventFD *vfio_ioeventfd_init(VFIOPCIDevice *vdev,
     ioeventfd->region = region;
     ioeventfd->region_addr = region_addr;
 
-    qemu_set_fd_handler(event_notifier_get_fd(&ioeventfd->e),
-                        vfio_ioeventfd_handler, NULL, ioeventfd);
+    if (!vdev->no_vfio_ioeventfd) {
+        struct vfio_device_ioeventfd vfio_ioeventfd;
+
+        vfio_ioeventfd.argsz = sizeof(vfio_ioeventfd);
+        vfio_ioeventfd.flags = ioeventfd->size;
+        vfio_ioeventfd.data = ioeventfd->data;
+        vfio_ioeventfd.offset = ioeventfd->region->fd_offset +
+                                ioeventfd->region_addr;
+        vfio_ioeventfd.fd = event_notifier_get_fd(&ioeventfd->e);
+
+        ioeventfd->vfio = !ioctl(vdev->vbasedev.fd,
+                                 VFIO_DEVICE_IOEVENTFD, &vfio_ioeventfd);
+    }
+
+    if (!ioeventfd->vfio) {
+        qemu_set_fd_handler(event_notifier_get_fd(&ioeventfd->e),
+                            vfio_ioeventfd_handler, NULL, ioeventfd);
+    }
+
     memory_region_add_eventfd(ioeventfd->mr, ioeventfd->addr, ioeventfd->size,
                               ioeventfd->match_data, ioeventfd->data,
                               &ioeventfd->e);
     trace_vfio_ioeventfd_init(memory_region_name(mr), (uint64_t)addr,
-                              size, data);
+                              size, data, ioeventfd->vfio);
 
     return ioeventfd;
 }
@@ -1835,7 +1871,7 @@ void vfio_bar_quirk_exit(VFIOPCIDevice *vdev, int nr)
 
     QLIST_FOREACH(quirk, &bar->quirks, next) {
         while (!QLIST_EMPTY(&quirk->ioeventfds)) {
-            vfio_ioeventfd_exit(QLIST_FIRST(&quirk->ioeventfds));
+            vfio_ioeventfd_exit(vdev, QLIST_FIRST(&quirk->ioeventfds));
         }
 
         for (i = 0; i < quirk->nr_mem; i++) {
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index ba1239551115..84e27c7bb2d1 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -3177,6 +3177,8 @@ static Property vfio_pci_dev_properties[] = {
                      no_geforce_quirks, false),
     DEFINE_PROP_BOOL("x-no-kvm-ioeventfd", VFIOPCIDevice, no_kvm_ioeventfd,
                      false),
+    DEFINE_PROP_BOOL("x-no-vfio-ioeventfd", VFIOPCIDevice, no_vfio_ioeventfd,
+                     false),
     DEFINE_PROP_UINT32("x-pci-vendor-id", VFIOPCIDevice, vendor_id, PCI_ANY_ID),
     DEFINE_PROP_UINT32("x-pci-device-id", VFIOPCIDevice, device_id, PCI_ANY_ID),
     DEFINE_PROP_UINT32("x-pci-sub-vendor-id", VFIOPCIDevice,
diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
index dbb3aca9b3d2..dbb3932b50ef 100644
--- a/hw/vfio/pci.h
+++ b/hw/vfio/pci.h
@@ -35,6 +35,7 @@ typedef struct VFIOIOEventFD {
     hwaddr region_addr;
     bool match_data;
     bool dynamic;
+    bool vfio;
 } VFIOIOEventFD;
 
 typedef struct VFIOQuirk {
@@ -164,6 +165,7 @@ typedef struct VFIOPCIDevice {
     bool no_kvm_msix;
     bool no_geforce_quirks;
     bool no_kvm_ioeventfd;
+    bool no_vfio_ioeventfd;
     VFIODisplay *dpy;
 } VFIOPCIDevice;
 
diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index f8f97d1ff90c..d2a74952e389 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -79,7 +79,7 @@ vfio_quirk_ati_bonaire_reset_done(const char *name) "%s"
 vfio_quirk_ati_bonaire_reset(const char *name) "%s"
 vfio_ioeventfd_exit(const char *name, uint64_t addr, unsigned size, uint64_t data) "%s+0x%"PRIx64"[%d]:0x%"PRIx64
 vfio_ioeventfd_handler(const char *name, uint64_t addr, unsigned size, uint64_t data) "%s+0x%"PRIx64"[%d] -> 0x%"PRIx64
-vfio_ioeventfd_init(const char *name, uint64_t addr, unsigned size, uint64_t data) "%s+0x%"PRIx64"[%d]:0x%"PRIx64
+vfio_ioeventfd_init(const char *name, uint64_t addr, unsigned size, uint64_t data, bool vfio) "%s+0x%"PRIx64"[%d]:0x%"PRIx64" vfio:%d"
 vfio_pci_igd_bar4_write(const char *name, uint32_t index, uint32_t data, uint32_t base) "%s [0x%03x] 0x%08x -> 0x%08x"
 vfio_pci_igd_bdsm_enabled(const char *name, int size) "%s %dMB"
 vfio_pci_igd_opregion_enabled(const char *name) "%s"
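
Illustrative note, not part of the patch: the setup and teardown paths in the diff fill the same request and differ only in the fd, and setup falls back to the userspace handler when the ioctl fails or x-no-vfio-ioeventfd is set. The standalone sketch below models that logic without touching a real device. Every sketch_* name is hypothetical, the struct is a local stand-in whose field widths are an assumption (only the field names come from the patch), and a function pointer takes the place of ioctl(VFIO_DEVICE_IOEVENTFD).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/*
 * Local stand-in for struct vfio_device_ioeventfd. Field names follow the
 * patch; the widths are an assumption of this sketch, not the kernel header.
 */
struct sketch_vfio_ioeventfd {
    uint32_t argsz;
    uint32_t flags;
    uint64_t offset;
    uint64_t data;
    int32_t fd;
};

/* Hypothetical replacement for the ioctl call: returns 0 on success. */
typedef int (*sketch_submit_fn)(const struct sketch_vfio_ioeventfd *req);

/*
 * Build the request the way the patch does: flags carries the access size,
 * offset is the region's fd_offset plus the address within the region.
 * Pass the eventfd to register, or -1 to tear a registration down.
 */
struct sketch_vfio_ioeventfd
sketch_ioeventfd_request(unsigned size, uint64_t data, uint64_t fd_offset,
                         uint64_t region_addr, int fd)
{
    struct sketch_vfio_ioeventfd req;

    assert(size == 1 || size == 2 || size == 4 || size == 8);

    memset(&req, 0, sizeof(req));
    req.argsz = sizeof(req);
    req.flags = size;
    req.data = data;
    req.offset = fd_offset + region_addr;
    req.fd = fd;
    return req;
}

/*
 * Mirrors "ioeventfd->vfio = !ioctl(...)": true means the kernel path took
 * the request, false means the caller must install the userspace handler.
 */
bool sketch_try_vfio(bool no_vfio_ioeventfd, sketch_submit_fn submit,
                     const struct sketch_vfio_ioeventfd *req)
{
    if (no_vfio_ioeventfd) {
        return false;
    }
    return submit(req) == 0;
}

/* Stub backends for exercising both outcomes. */
int sketch_submit_ok(const struct sketch_vfio_ioeventfd *req)
{
    (void)req;
    return 0;
}

int sketch_submit_fail(const struct sketch_vfio_ioeventfd *req)
{
    (void)req;
    return -1;
}
```

With these helpers, a teardown request is the setup request rebuilt with fd set to -1, which is why vfio_ioeventfd_exit() needs the device pointer: it has to issue the same ioctl against vdev->vbasedev.fd.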