From patchwork Tue Aug 7 19:31:22 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alex Williamson X-Patchwork-Id: 954663 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=nongnu.org (client-ip=2001:4830:134:3::11; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=redhat.com Received: from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 41lPqP1t9Sz9s4V for ; Wed, 8 Aug 2018 05:34:57 +1000 (AEST) Received: from localhost ([::1]:40556 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fn7ks-00072L-U1 for incoming@patchwork.ozlabs.org; Tue, 07 Aug 2018 15:34:54 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:44292) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fn7hm-0005Fi-Tr for qemu-devel@nongnu.org; Tue, 07 Aug 2018 15:31:44 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1fn7hk-0007z0-T1 for qemu-devel@nongnu.org; Tue, 07 Aug 2018 15:31:42 -0400 Received: from mx1.redhat.com ([209.132.183.28]:13376) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1fn7hk-0007yQ-N7 for qemu-devel@nongnu.org; Tue, 07 Aug 2018 15:31:40 -0400 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.phx2.redhat.com [10.5.11.16]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id F0E9C821C7; Tue, 7 Aug 2018 19:31:39 +0000 (UTC) Received: from gimli.home (ovpn-116-35.phx2.redhat.com [10.3.116.35]) by smtp.corp.redhat.com (Postfix) with ESMTP id 5E0CE82775; Tue, 7 Aug 2018 19:31:38 +0000 (UTC) From: Alex Williamson To: alex.williamson@redhat.com, qemu-devel@nongnu.org Date: Tue, 7 Aug 2018 13:31:22 -0600 Message-Id: <20180807193125.30378-2-alex.williamson@redhat.com> In-Reply-To: <20180807193125.30378-1-alex.williamson@redhat.com> References: <20180807193125.30378-1-alex.williamson@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.16 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.28]); Tue, 07 Aug 2018 19:31:40 +0000 (UTC) X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] [fuzzy] X-Received-From: 209.132.183.28 Subject: [Qemu-devel] [PATCH v3 1/4] balloon: Allow multiple inhibit users X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Cornelia Huck , david@redhat.com, kvm@vger.kernel.org, peterx@redhat.com, "Michael S. Tsirkin" Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: "Qemu-devel" A simple true/false internal state does not allow multiple users. Fix this within the existing interface by converting to a counter, so long as the counter is elevated, ballooning is inhibited. Reviewed-by: David Hildenbrand Reviewed-by: Peter Xu Reviewed-by: Cornelia Huck Signed-off-by: Alex Williamson --- balloon.c | 13 ++++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git a/balloon.c b/balloon.c index 6bf0a9681377..931987983858 100644 --- a/balloon.c +++ b/balloon.c @@ -26,6 +26,7 @@ #include "qemu/osdep.h" #include "qemu-common.h" +#include "qemu/atomic.h" #include "exec/cpu-common.h" #include "sysemu/kvm.h" #include "sysemu/balloon.h" @@ -37,16 +38,22 @@ static QEMUBalloonEvent *balloon_event_fn; static QEMUBalloonStatus *balloon_stat_fn; static void *balloon_opaque; -static bool balloon_inhibited; +static int balloon_inhibit_count; bool qemu_balloon_is_inhibited(void) { - return balloon_inhibited; + return atomic_read(&balloon_inhibit_count) > 0; } void qemu_balloon_inhibit(bool state) { - balloon_inhibited = state; + if (state) { + atomic_inc(&balloon_inhibit_count); + } else { + atomic_dec(&balloon_inhibit_count); + } + + assert(atomic_read(&balloon_inhibit_count) >= 0); } static bool have_balloon(Error **errp) From patchwork Tue Aug 7 19:31:23 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alex Williamson X-Patchwork-Id: 954661 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=nongnu.org (client-ip=2001:4830:134:3::11; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=redhat.com Received: from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 41lPmb07dvz9s4V for ; Wed, 8 Aug 2018 05:32:31 +1000 (AEST) Received: from localhost ([::1]:40549 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fn7iW-0005I2-Mc for incoming@patchwork.ozlabs.org; Tue, 07 Aug 2018 15:32:28 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:44290) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fn7hm-0005Fg-TU for qemu-devel@nongnu.org; Tue, 07 Aug 2018 15:31:43 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1fn7hl-0007zS-C5 for qemu-devel@nongnu.org; Tue, 07 Aug 2018 15:31:42 -0400 Received: from mx1.redhat.com ([209.132.183.28]:34562) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1fn7hl-0007yl-6Y for qemu-devel@nongnu.org; Tue, 07 Aug 2018 15:31:41 -0400 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.phx2.redhat.com [10.5.11.16]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 7667A36807; Tue, 7 Aug 2018 19:31:40 +0000 (UTC) Received: from gimli.home (ovpn-116-35.phx2.redhat.com [10.3.116.35]) by smtp.corp.redhat.com (Postfix) with ESMTP id 0829282774; Tue, 7 Aug 2018 19:31:39 +0000 (UTC) From: Alex Williamson To: alex.williamson@redhat.com, qemu-devel@nongnu.org Date: Tue, 7 Aug 2018 13:31:23 -0600 Message-Id: <20180807193125.30378-3-alex.williamson@redhat.com> In-Reply-To: <20180807193125.30378-1-alex.williamson@redhat.com> References: <20180807193125.30378-1-alex.williamson@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.16 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.30]); Tue, 07 Aug 2018 19:31:40 +0000 (UTC) X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] [fuzzy] X-Received-From: 209.132.183.28 Subject: [Qemu-devel] [PATCH v3 2/4] kvm: Use inhibit to prevent ballooning without synchronous mmu X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Cornelia Huck , david@redhat.com, kvm@vger.kernel.org, peterx@redhat.com, "Michael S. Tsirkin" Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: "Qemu-devel" Remove KVM specific tests in balloon_page(), instead marking ballooning as inhibited without KVM_CAP_SYNC_MMU support. Reviewed-by: David Hildenbrand Reviewed-by: Peter Xu Reviewed-by: Cornelia Huck Signed-off-by: Alex Williamson Acked-by: Paolo Bonzini --- accel/kvm/kvm-all.c | 4 ++++ hw/virtio/virtio-balloon.c | 4 +--- 2 files changed, 5 insertions(+), 3 deletions(-) diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c index eb7db92a5e3b..38f468d8e2b1 100644 --- a/accel/kvm/kvm-all.c +++ b/accel/kvm/kvm-all.c @@ -39,6 +39,7 @@ #include "trace.h" #include "hw/irq.h" #include "sysemu/sev.h" +#include "sysemu/balloon.h" #include "hw/boards.h" @@ -1698,6 +1699,9 @@ static int kvm_init(MachineState *ms) s->many_ioeventfds = kvm_check_many_ioeventfds(); s->sync_mmu = !!kvm_vm_check_extension(kvm_state, KVM_CAP_SYNC_MMU); + if (!s->sync_mmu) { + qemu_balloon_inhibit(true); + } return 0; diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c index 1f7a87f09429..b5425080c5fb 100644 --- a/hw/virtio/virtio-balloon.c +++ b/hw/virtio/virtio-balloon.c @@ -21,7 +21,6 @@ #include "hw/mem/pc-dimm.h" #include "sysemu/balloon.h" #include "hw/virtio/virtio-balloon.h" -#include "sysemu/kvm.h" #include "exec/address-spaces.h" #include "qapi/error.h" #include "qapi/qapi-events-misc.h" @@ -36,8 +35,7 @@ static void balloon_page(void *addr, int deflate) { - if (!qemu_balloon_is_inhibited() && (!kvm_enabled() || - kvm_has_sync_mmu())) { + if (!qemu_balloon_is_inhibited()) { qemu_madvise(addr, BALLOON_PAGE_SIZE, deflate ? QEMU_MADV_WILLNEED : QEMU_MADV_DONTNEED); } From patchwork Tue Aug 7 19:31:24 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alex Williamson X-Patchwork-Id: 954662 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=nongnu.org (client-ip=2001:4830:134:3::11; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=redhat.com Received: from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 41lPmc4WPWz9s5H for ; Wed, 8 Aug 2018 05:32:32 +1000 (AEST) Received: from localhost ([::1]:40550 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fn7iY-0005KV-5P for incoming@patchwork.ozlabs.org; Tue, 07 Aug 2018 15:32:30 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:44300) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fn7hn-0005Fj-3c for qemu-devel@nongnu.org; Tue, 07 Aug 2018 15:31:45 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1fn7hm-0007zy-0w for qemu-devel@nongnu.org; Tue, 07 Aug 2018 15:31:43 -0400 Received: from mx1.redhat.com ([209.132.183.28]:58974) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1fn7hl-0007zL-Pe for qemu-devel@nongnu.org; Tue, 07 Aug 2018 15:31:41 -0400 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.phx2.redhat.com [10.5.11.16]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 02EC1C047B63; Tue, 7 Aug 2018 19:31:41 +0000 (UTC) Received: from gimli.home (ovpn-116-35.phx2.redhat.com [10.3.116.35]) by smtp.corp.redhat.com (Postfix) with ESMTP id 8445782774; Tue, 7 Aug 2018 19:31:40 +0000 (UTC) From: Alex Williamson To: alex.williamson@redhat.com, qemu-devel@nongnu.org Date: Tue, 7 Aug 2018 13:31:24 -0600 Message-Id: <20180807193125.30378-4-alex.williamson@redhat.com> In-Reply-To: <20180807193125.30378-1-alex.williamson@redhat.com> References: <20180807193125.30378-1-alex.williamson@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.16 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.31]); Tue, 07 Aug 2018 19:31:41 +0000 (UTC) X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] [fuzzy] X-Received-From: 209.132.183.28 Subject: [Qemu-devel] [PATCH v3 3/4] vfio: Inhibit ballooning based on group attachment to a container X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Cornelia Huck , david@redhat.com, kvm@vger.kernel.org, peterx@redhat.com, "Michael S. Tsirkin" Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: "Qemu-devel" We use a VFIOContainer to associate an AddressSpace to one or more VFIOGroups. The VFIOContainer represents the DMA context for that AdressSpace for those VFIOGroups and is synchronized to changes in that AddressSpace via a MemoryListener. For IOMMU backed devices, maintaining the DMA context for a VFIOGroup generally involves pinning a host virtual address in order to create a stable host physical address and then mapping a translation from the associated guest physical address to that host physical address into the IOMMU. While the above maintains the VFIOContainer synchronized to the QEMU memory API of the VM, memory ballooning occurs outside of that API. Inflating the memory balloon (ie. cooperatively capturing pages from the guest for use by the host) simply uses MADV_DONTNEED to "zap" pages from QEMU's host virtual address space. The page pinning and IOMMU mapping above remains in place, negating the host's ability to reuse the page, but the host virtual to host physical mapping of the page is invalidated outside of QEMU's memory API. When the balloon is later deflated, attempting to cooperatively return pages to the guest, the page is simply freed by the guest balloon driver, allowing it to be used in the guest and incurring a page fault when that occurs. The page fault maps a new host physical page backing the existing host virtual address, meanwhile the VFIOContainer still maintains the translation to the original host physical address. At this point the guest vCPU and any assigned devices will map different host physical addresses to the same guest physical address. Badness. The IOMMU typically does not have page level granularity with which it can track this mapping without also incurring inefficiencies in using page size mappings throughout. MMU notifiers in the host kernel also provide indicators for invalidating the mapping on balloon inflation, not for updating the mapping when the balloon is deflated. For these reasons we assume a default behavior that the mapping of each VFIOGroup into the VFIOContainer is incompatible with memory ballooning and increment the balloon inhibitor to match the attached VFIOGroups. Signed-off-by: Alex Williamson Reviewed-by: Peter Xu --- hw/vfio/common.c | 30 ++++++++++++++++++++++++++++++ 1 file changed, 30 insertions(+) diff --git a/hw/vfio/common.c b/hw/vfio/common.c index fb396cf00ac4..a3d758af8d8f 100644 --- a/hw/vfio/common.c +++ b/hw/vfio/common.c @@ -32,6 +32,7 @@ #include "hw/hw.h" #include "qemu/error-report.h" #include "qemu/range.h" +#include "sysemu/balloon.h" #include "sysemu/kvm.h" #include "trace.h" #include "qapi/error.h" @@ -1044,6 +1045,33 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as, space = vfio_get_address_space(as); + /* + * VFIO is currently incompatible with memory ballooning insofar as the + * madvise to purge (zap) the page from QEMU's address space does not + * interact with the memory API and therefore leaves stale virtual to + * physical mappings in the IOMMU if the page was previously pinned. We + * therefore add a balloon inhibit for each group added to a container, + * whether the container is used individually or shared. This provides + * us with options to allow devices within a group to opt-in and allow + * ballooning, so long as it is done consistently for a group (for instance + * if the device is an mdev device where it is known that the host vendor + * driver will never pin pages outside of the working set of the guest + * driver, which would thus not be ballooning candidates). + * + * The first opportunity to induce pinning occurs here where we attempt to + * attach the group to existing containers within the AddressSpace. If any + * pages are already zapped from the virtual address space, such as from a + * previous ballooning opt-in, new pinning will cause valid mappings to be + * re-established. Likewise, when the overall MemoryListener for a new + * container is registered, a replay of mappings within the AddressSpace + * will occur, re-establishing any previously zapped pages as well. + * + * NB. Balloon inhibiting does not currently block operation of the + * balloon driver or revoke previously pinned pages, it only prevents + * calling madvise to modify the virtual mapping of ballooned pages. + */ + qemu_balloon_inhibit(true); + QLIST_FOREACH(container, &space->containers, next) { if (!ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &container->fd)) { group->container = container; @@ -1232,6 +1260,7 @@ close_fd_exit: close(fd); put_space_exit: + qemu_balloon_inhibit(false); vfio_put_address_space(space); return ret; @@ -1352,6 +1381,7 @@ void vfio_put_group(VFIOGroup *group) return; } + qemu_balloon_inhibit(false); vfio_kvm_device_del_group(group); vfio_disconnect_container(group); QLIST_REMOVE(group, next); From patchwork Tue Aug 7 19:31:25 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alex Williamson X-Patchwork-Id: 954664 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=nongnu.org (client-ip=2001:4830:134:3::11; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=redhat.com Received: from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 41lPqV4bgYz9s4V for ; Wed, 8 Aug 2018 05:35:02 +1000 (AEST) Received: from localhost ([::1]:40559 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fn7ky-00077o-9Z for incoming@patchwork.ozlabs.org; Tue, 07 Aug 2018 15:35:00 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:44326) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fn7ho-0005Fz-Cs for qemu-devel@nongnu.org; Tue, 07 Aug 2018 15:31:46 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1fn7hm-00080k-RX for qemu-devel@nongnu.org; Tue, 07 Aug 2018 15:31:44 -0400 Received: from mx1.redhat.com ([209.132.183.28]:43552) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1fn7hm-000806-JG for qemu-devel@nongnu.org; Tue, 07 Aug 2018 15:31:42 -0400 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.phx2.redhat.com [10.5.11.16]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id D9E0A307D85B; Tue, 7 Aug 2018 19:31:41 +0000 (UTC) Received: from gimli.home (ovpn-116-35.phx2.redhat.com [10.3.116.35]) by smtp.corp.redhat.com (Postfix) with ESMTP id 1326082774; Tue, 7 Aug 2018 19:31:41 +0000 (UTC) From: Alex Williamson To: alex.williamson@redhat.com, qemu-devel@nongnu.org Date: Tue, 7 Aug 2018 13:31:25 -0600 Message-Id: <20180807193125.30378-5-alex.williamson@redhat.com> In-Reply-To: <20180807193125.30378-1-alex.williamson@redhat.com> References: <20180807193125.30378-1-alex.williamson@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.16 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.48]); Tue, 07 Aug 2018 19:31:41 +0000 (UTC) X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] [fuzzy] X-Received-From: 209.132.183.28 Subject: [Qemu-devel] [PATCH v3 4/4] vfio/ccw/pci: Allow devices to opt-in for ballooning X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Cornelia Huck , david@redhat.com, kvm@vger.kernel.org, peterx@redhat.com, "Michael S. Tsirkin" Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: "Qemu-devel" If a vfio assigned device makes use of a physical IOMMU, then memory ballooning is necessarily inhibited due to the page pinning, lack of page level granularity at the IOMMU, and sufficient notifiers to both remove the page on balloon inflation and add it back on deflation. However, not all devices are backed by a physical IOMMU. In the case of mediated devices, if a vendor driver is well synchronized with the guest driver, such that only pages actively used by the guest driver are pinned by the host mdev vendor driver, then there should be no overlap between pages available for the balloon driver and pages actively in use by the device. Under these conditions, ballooning should be safe. vfio-ccw devices are always mediated devices and always operate under the constraints above. Therefore we can consider all vfio-ccw devices as balloon compatible. The situation is far from straightforward with vfio-pci. These devices can be physical devices with physical IOMMU backing or mediated devices where it is unknown whether a physical IOMMU is in use or whether the vendor driver is well synchronized to the working set of the guest driver. The safest approach is therefore to assume all vfio-pci devices are incompatible with ballooning, but allow user opt-in should they have further insight into mediated devices. Signed-off-by: Alex Williamson --- hw/vfio/ccw.c | 9 +++++++++ hw/vfio/common.c | 23 ++++++++++++++++++++++- hw/vfio/pci.c | 26 +++++++++++++++++++++++++- hw/vfio/trace-events | 1 + include/hw/vfio/vfio-common.h | 2 ++ 5 files changed, 59 insertions(+), 2 deletions(-) diff --git a/hw/vfio/ccw.c b/hw/vfio/ccw.c index 351b305e1ae7..e96bbdc78b48 100644 --- a/hw/vfio/ccw.c +++ b/hw/vfio/ccw.c @@ -349,6 +349,15 @@ static void vfio_ccw_get_device(VFIOGroup *group, VFIOCCWDevice *vcdev, } } + /* + * All vfio-ccw devices are believed to operate in a way compatible with + * memory ballooning, ie. pages pinned in the host are in the current + * working set of the guest driver and therefore never overlap with pages + * available to the guest balloon driver. This needs to be set before + * vfio_get_device() for vfio common to handle the balloon inhibitor. + */ + vcdev->vdev.balloon_allowed = true; + if (vfio_get_device(group, vcdev->cdev.mdevid, &vcdev->vdev, errp)) { goto out_err; } diff --git a/hw/vfio/common.c b/hw/vfio/common.c index a3d758af8d8f..cd1f4af18abb 100644 --- a/hw/vfio/common.c +++ b/hw/vfio/common.c @@ -1381,7 +1381,9 @@ void vfio_put_group(VFIOGroup *group) return; } - qemu_balloon_inhibit(false); + if (!group->balloon_allowed) { + qemu_balloon_inhibit(false); + } vfio_kvm_device_del_group(group); vfio_disconnect_container(group); QLIST_REMOVE(group, next); @@ -1417,6 +1419,25 @@ int vfio_get_device(VFIOGroup *group, const char *name, return ret; } + /* + * Clear the balloon inhibitor for this group if the driver knows the + * device operates compatibly with ballooning. Setting must be consistent + * per group, but since compatibility is really only possible with mdev + * currently, we expect singleton groups. + */ + if (vbasedev->balloon_allowed != group->balloon_allowed) { + if (!QLIST_EMPTY(&group->device_list)) { + error_setg(errp, + "Inconsistent device balloon setting within group"); + return -1; + } + + if (!group->balloon_allowed) { + group->balloon_allowed = true; + qemu_balloon_inhibit(false); + } + } + vbasedev->fd = fd; vbasedev->group = group; QLIST_INSERT_HEAD(&group->device_list, vbasedev, next); diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c index 6cbb8fa0549d..056f3a887a8f 100644 --- a/hw/vfio/pci.c +++ b/hw/vfio/pci.c @@ -2804,12 +2804,13 @@ static void vfio_realize(PCIDevice *pdev, Error **errp) VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev); VFIODevice *vbasedev_iter; VFIOGroup *group; - char *tmp, group_path[PATH_MAX], *group_name; + char *tmp, *subsys, group_path[PATH_MAX], *group_name; Error *err = NULL; ssize_t len; struct stat st; int groupid; int i, ret; + bool is_mdev; if (!vdev->vbasedev.sysfsdev) { if (!(~vdev->host.domain || ~vdev->host.bus || @@ -2869,6 +2870,27 @@ static void vfio_realize(PCIDevice *pdev, Error **errp) } } + /* + * Mediated devices *might* operate compatibly with memory ballooning, but + * we cannot know for certain, it depends on whether the mdev vendor driver + * stays in sync with the active working set of the guest driver. Prevent + * the x-balloon-allowed option unless this is minimally an mdev device. + */ + tmp = g_strdup_printf("%s/subsystem", vdev->vbasedev.sysfsdev); + subsys = realpath(tmp, NULL); + g_free(tmp); + is_mdev = (strcmp(subsys, "/sys/bus/mdev") == 0); + free(subsys); + + trace_vfio_mdev(vdev->vbasedev.name, is_mdev); + + if (vdev->vbasedev.balloon_allowed && !is_mdev) { + error_setg(errp, "x-balloon-allowed only potentially compatible " + "with mdev devices"); + vfio_put_group(group); + goto error; + } + ret = vfio_get_device(group, vdev->vbasedev.name, &vdev->vbasedev, errp); if (ret) { vfio_put_group(group); @@ -3170,6 +3192,8 @@ static Property vfio_pci_dev_properties[] = { DEFINE_PROP_BIT("x-igd-opregion", VFIOPCIDevice, features, VFIO_FEATURE_ENABLE_IGD_OPREGION_BIT, false), DEFINE_PROP_BOOL("x-no-mmap", VFIOPCIDevice, vbasedev.no_mmap, false), + DEFINE_PROP_BOOL("x-balloon-allowed", VFIOPCIDevice, + vbasedev.balloon_allowed, false), DEFINE_PROP_BOOL("x-no-kvm-intx", VFIOPCIDevice, no_kvm_intx, false), DEFINE_PROP_BOOL("x-no-kvm-msi", VFIOPCIDevice, no_kvm_msi, false), DEFINE_PROP_BOOL("x-no-kvm-msix", VFIOPCIDevice, no_kvm_msix, false), diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events index d2a74952e389..a85e8662eadb 100644 --- a/hw/vfio/trace-events +++ b/hw/vfio/trace-events @@ -39,6 +39,7 @@ vfio_pci_hot_reset_result(const char *name, const char *result) "%s hot reset: % vfio_populate_device_config(const char *name, unsigned long size, unsigned long offset, unsigned long flags) "Device %s config:\n size: 0x%lx, offset: 0x%lx, flags: 0x%lx" vfio_populate_device_get_irq_info_failure(void) "VFIO_DEVICE_GET_IRQ_INFO failure: %m" vfio_realize(const char *name, int group_id) " (%s) group %d" +vfio_mdev(const char *name, bool is_mdev) " (%s) is_mdev %d" vfio_add_ext_cap_dropped(const char *name, uint16_t cap, uint16_t offset) "%s 0x%x@0x%x" vfio_pci_reset(const char *name) " (%s)" vfio_pci_reset_flr(const char *name) "%s FLR/VFIO_DEVICE_RESET" diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h index a9036929b220..15ea6c26fdbc 100644 --- a/include/hw/vfio/vfio-common.h +++ b/include/hw/vfio/vfio-common.h @@ -112,6 +112,7 @@ typedef struct VFIODevice { bool reset_works; bool needs_reset; bool no_mmap; + bool balloon_allowed; VFIODeviceOps *ops; unsigned int num_irqs; unsigned int num_regions; @@ -131,6 +132,7 @@ typedef struct VFIOGroup { QLIST_HEAD(, VFIODevice) device_list; QLIST_ENTRY(VFIOGroup) next; QLIST_ENTRY(VFIOGroup) container_next; + bool balloon_allowed; } VFIOGroup; typedef struct VFIODMABuf {