From patchwork Fri Oct 14 10:37:45 2016
X-Patchwork-Submitter: Jike Song
X-Patchwork-Id: 682204
Message-ID: <5800B579.9000705@intel.com>
Date: Fri, 14 Oct 2016 18:37:45 +0800
From: Jike Song
To: Paolo Bonzini
References: <1475998904-13456-1-git-send-email-xiaoguang.chen@intel.com>
 <1475998904-13456-2-git-send-email-xiaoguang.chen@intel.com>
 <20161009083134.GA19090@nvidia.com>
 <20161010180140.GA27757@nvidia.com>
 <1259cdba-c137-c3da-abe2-ecf51aec6738@linux.intel.com>
 <523e1446-75f1-fe3a-d818-f7d238d57751@redhat.com>
In-Reply-To: <523e1446-75f1-fe3a-d818-f7d238d57751@redhat.com>
Subject: Re: [Qemu-devel] [PATCH 1/2] KVM: page track: add a new notifier type: track_flush_slot
Cc: "Tian, Kevin", Neo Jia, kvm@vger.kernel.org, guangrong.xiao@intel.com,
 Alex Williamson, Xiaoguang Chen, qemu-devel, Kirti Wankhede, Xiao Guangrong

On 10/11/2016 05:47 PM, Paolo Bonzini wrote:
>
>
> On 11/10/2016 11:21, Xiao Guangrong wrote:
>>
>>
>> On 10/11/2016 04:54 PM, Paolo Bonzini wrote:
>>>
>>>
>>> On 11/10/2016 04:39, Xiao Guangrong wrote:
>>>>
>>>>
>>>> On 10/11/2016 02:32 AM, Paolo Bonzini wrote:
>>>>>
>>>>>
>>>>> On 10/10/2016 20:01, Neo Jia wrote:
>>>>>>> Hi Neo,
>>>>>>>
>>>>>>> AFAIK this is needed because KVMGT doesn't paravirtualize the PPGTT,
>>>>>>> while nVidia does.
>>>>>>
>>>>>> Hi Paolo and Xiaoguang,
>>>>>>
>>>>>> I am just wondering how device driver can register a notifier so he
>>>>>> can be notified for write-protected pages when writes are happening.
>>>>>
>>>>> It can't yet, but the API is ready for that.  kvm_vfio_set_group is
>>>>> currently where a struct kvm_device* and struct vfio_group* touch.
>>>>> Given a struct kvm_device*, dev->kvm provides the struct kvm to be
>>>>> passed to kvm_page_track_register_notifier.  So I guess you could add
>>>>> a callback that passes the struct kvm_device* to the mdev device.
>>>>>
>>>>> Xiaoguang and Guangrong, what were your plans?
>>>>> We discussed it briefly at KVM Forum but I don't remember the details.
>>>>
>>>> Your suggestion was that pass kvm fd to KVMGT via VFIO, so that we can
>>>> figure out the kvm instance based on the fd.
>>>>
>>>> We got a new idea, how about search the kvm instance by mm_struct, it
>>>> can work as KVMGT is running in the vcpu context and it is much more
>>>> straightforward.
>>>
>>> Perhaps I didn't understand your suggestion, but the same mm_struct can
>>> have more than 1 struct kvm so I'm not sure that it can work.
>>
>> vcpu->pid is valid during vcpu running so that it can be used to figure
>> out which kvm instance owns the vcpu whose pid is the one as current
>> thread, i think it can work. :)
>
> No, don't do that.  There's no reason for a thread to run a single VCPU,
> and if you can have multiple VCPUs you can also have multiple VCPUs from
> multiple VMs.
>
> Passing file descriptors around are the right way to connect subsystems.

[CC Alex, Kevin and Qemu-devel]

Hi Paolo & Alex,

IIUC, passing file descriptors means touching QEMU and the UAPI between
QEMU and VFIO. Would you have a look at the draft patch below? If it is
heading in the right direction, I'll send it as split patches.

Thanks!

---
Thanks,
Jike

diff --git a/hw/vfio/pci-quirks.c b/hw/vfio/pci-quirks.c
index bec694c..f715d37 100644
--- a/hw/vfio/pci-quirks.c
+++ b/hw/vfio/pci-quirks.c
@@ -10,12 +10,14 @@
  * the COPYING file in the top-level directory.
  */
 
+#include <sys/ioctl.h>
 #include "qemu/osdep.h"
 #include "qemu/error-report.h"
 #include "qemu/range.h"
 #include "qapi/error.h"
 #include "hw/nvram/fw_cfg.h"
 #include "pci.h"
+#include "sysemu/kvm.h"
 #include "trace.h"
 
 /* Use uin32_t for vendor & device so PCI_ANY_ID expands and cannot match hw */
@@ -1844,3 +1846,15 @@ void vfio_setup_resetfn_quirk(VFIOPCIDevice *vdev)
         break;
     }
 }
+
+void vfio_quirk_kvmgt(VFIOPCIDevice *vdev)
+{
+    int vmfd;
+
+    if (!kvm_enabled() || !vdev->kvmgt)
+        return;
+
+    /* Tell the device what KVM it attached */
+    vmfd = kvm_get_vmfd(kvm_state);
+    ioctl(vdev->vbasedev.fd, VFIO_SET_KVMFD, vmfd);
+}
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index a5a620a..8732552 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2561,6 +2561,8 @@ static int vfio_initfn(PCIDevice *pdev)
         return ret;
     }
 
+    vfio_quirk_kvmgt(vdev);
+
     /* Get a copy of config space */
     ret = pread(vdev->vbasedev.fd, vdev->pdev.config,
                 MIN(pci_config_size(&vdev->pdev), vdev->config_size),
@@ -2832,6 +2834,7 @@ static Property vfio_pci_dev_properties[] = {
     DEFINE_PROP_UINT32("x-pci-sub-device-id", VFIOPCIDevice,
                        sub_device_id, PCI_ANY_ID),
     DEFINE_PROP_UINT32("x-igd-gms", VFIOPCIDevice, igd_gms, 0),
+    DEFINE_PROP_BOOL("kvmgt", VFIOPCIDevice, kvmgt, false),
     /*
      * TODO - support passed fds... is this necessary?
      * DEFINE_PROP_STRING("vfiofd", VFIOPCIDevice, vfiofd_name),
diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
index 7d482d9..813832c 100644
--- a/hw/vfio/pci.h
+++ b/hw/vfio/pci.h
@@ -143,6 +143,7 @@ typedef struct VFIOPCIDevice {
     bool no_kvm_intx;
     bool no_kvm_msi;
     bool no_kvm_msix;
+    bool kvmgt;
 } VFIOPCIDevice;
 
 uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len);
@@ -166,4 +167,6 @@ int vfio_populate_vga(VFIOPCIDevice *vdev);
 int vfio_pci_igd_opregion_init(VFIOPCIDevice *vdev,
                                struct vfio_region_info *info);
 
+void vfio_quirk_kvmgt(VFIOPCIDevice *vdev);
+
 #endif /* HW_VFIO_VFIO_PCI_H */
diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index df67cc0..dd8320a 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -254,6 +254,7 @@ void phys_mem_set_alloc(void *(*alloc)(size_t, uint64_t *align));
 
 int kvm_ioctl(KVMState *s, int type, ...);
 
 int kvm_vm_ioctl(KVMState *s, int type, ...);
+int kvm_get_vmfd(KVMState *s);
 
 int kvm_vcpu_ioctl(CPUState *cpu, int type, ...);
diff --git a/kvm-all.c b/kvm-all.c
index efb5fe3..bd72ce3 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -2065,6 +2065,11 @@ int kvm_vm_ioctl(KVMState *s, int type, ...)
     return ret;
 }
 
+int kvm_get_vmfd(KVMState *s)
+{
+    return s->vmfd;
+}
+
 int kvm_vcpu_ioctl(CPUState *cpu, int type, ...)
 {
     int ret;
diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h
index 759b850..952303f 100644
--- a/linux-headers/linux/vfio.h
+++ b/linux-headers/linux/vfio.h
@@ -686,6 +686,12 @@ struct vfio_iommu_spapr_tce_remove {
 };
 #define VFIO_IOMMU_SPAPR_TCE_REMOVE _IO(VFIO_TYPE, VFIO_BASE + 20)
 
+
+/**
+ * VFIO_SET_KVMFD - _IO(VFIO_TYPE, VFIO_BASE + 21, __u32)
+ */
+#define VFIO_SET_KVMFD _IO(VFIO_TYPE, VFIO_BASE + 21)
+
 /* ***************************************************************** */
 
 #endif /* VFIO_H */
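
A note on the draft's vfio_quirk_kvmgt(): it discards the ioctl() return
value, so a host kernel that rejects the request would go unnoticed. Below is
a minimal sketch of a checked variant. The helper name vfio_set_kvmfd() and
its negative-errno convention are hypothetical, and VFIO_SET_KVMFD is only the
ioctl number proposed in this draft, not an existing kernel interface. The
VFIO_TYPE and VFIO_BASE fallbacks are assumed to mirror linux/vfio.h.

```c
#include <errno.h>
#include <sys/ioctl.h>

/* Fallback definitions, assumed to match linux/vfio.h. */
#ifndef VFIO_TYPE
#define VFIO_TYPE (';')
#endif
#ifndef VFIO_BASE
#define VFIO_BASE 100
#endif

/* Draft ioctl number from this patch; not part of any released kernel UAPI. */
#ifndef VFIO_SET_KVMFD
#define VFIO_SET_KVMFD _IO(VFIO_TYPE, VFIO_BASE + 21)
#endif

/*
 * Hypothetical helper: pass the KVM VM file descriptor to a VFIO device fd.
 * Returns 0 on success, or a negative errno value if the ioctl fails
 * (for example when device_fd is not a valid descriptor).
 */
static int vfio_set_kvmfd(int device_fd, int vmfd)
{
    if (ioctl(device_fd, VFIO_SET_KVMFD, vmfd) < 0) {
        return -errno;
    }
    return 0;
}
```

With a checked helper along these lines, vfio_quirk_kvmgt() could report the
failure (for instance through error_report()) instead of continuing silently.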