From patchwork Thu Jan 25 04:03:22 2018
X-Patchwork-Submitter: Tiwei Bie
X-Patchwork-Id: 865713
From: Tiwei Bie <tiwei.bie@intel.com>
To: qemu-devel@nongnu.org, virtio-dev@lists.oasis-open.org, mst@redhat.com,
    alex.williamson@redhat.com, jasowang@redhat.com, pbonzini@redhat.com,
    stefanha@redhat.com
Cc: jianfeng.tan@intel.com, tiwei.bie@intel.com, cunming.liang@intel.com,
    xiao.w.wang@intel.com, zhihong.wang@intel.com, dan.daly@intel.com
Date: Thu, 25 Jan 2018 12:03:22 +0800
Message-Id: <20180125040328.22867-1-tiwei.bie@intel.com>
Subject: [Qemu-devel] [PATCH v1 0/6] Extend vhost-user to support VFIO based accelerators

This patch set makes some small extensions to the vhost-user protocol to
support VFIO based accelerators, making it possible to get performance
similar to VFIO based PCI passthrough while keeping the virtio device
emulation in QEMU.

How does an accelerator accelerate vhost (data path)
====================================================

Any virtio-ring-compatible device can potentially be used as a vhost
data path accelerator. We can set up the accelerator based on the
information (e.g. memory table, features, ring info, etc.) available
on the vhost backend, and the accelerator will then be able to use the
virtio ring provided by the virtio driver in the VM directly. The
virtio driver in the VM can therefore exchange e.g. network packets
with the accelerator directly via the virtio ring. That is to say, we
can use the accelerator to accelerate the vhost data path. We call
this vDPA: vhost Data Path Acceleration.

Notice: although the accelerator can talk to the virtio driver in the
VM directly via the virtio ring, the control path events (e.g.
device start/stop) in the VM will still be trapped and handled by
QEMU, and QEMU will deliver such events to the vhost backend via the
standard vhost protocol.

The link below is an example showing how to set up such an environment
via a nested VM. In this case, the virtio device in the outer VM is
the accelerator: it is used to accelerate the virtio device in the
inner VM. In reality, we could use a virtio-ring-compatible hardware
device as the accelerator.

http://dpdk.org/ml/archives/dev/2017-December/085044.html

The above example doesn't require any changes to QEMU, but it has
lower performance than traditional VFIO based PCI passthrough. That is
the problem this patch set wants to solve.

The performance issue of vDPA/vhost-user and solutions
======================================================

For a vhost-user backend, the critical issue in vDPA is that the data
path performance is relatively low and some host threads are needed
for the data path, because the mechanisms necessary to support the
following are missing:

1) the guest driver notifying the device directly;
2) the device interrupting the guest directly.

So this patch set makes some small extensions to the vhost-user
protocol to make both of them possible. It leverages the same
mechanisms as PCI passthrough (e.g. EPT and Posted-Interrupts on Intel
platforms).

A new protocol feature bit is added to negotiate the accelerator
feature support, and two new slave message types are added to control
the notify region and queue interrupt passthrough for each queue. From
the point of view of the vhost-user protocol design, this is very
flexible: passthrough can be enabled/disabled for each queue
individually, and it's possible to accelerate each queue with a
different device. More design and implementation details can be found
in the last patch.

Difference between vDPA and PCI passthrough
===========================================

The key difference between PCI passthrough and vDPA is that, in vDPA,
only the data path of the device (e.g.
DMA ring, notify region and queue interrupt) is passed through to the
VM; the device control path (e.g. PCI configuration space and MMIO
regions) is still defined and emulated by QEMU.

The benefits of keeping the virtio device emulation in QEMU, compared
with virtio device PCI passthrough, include (but are not limited to):

- a consistent device interface for the guest OS in the VM;
- maximum flexibility in the hardware (i.e. accelerator) design;
- leveraging the existing virtio live-migration framework.

Why extend vhost-user for vDPA
==============================

We have already implemented various virtual switches (e.g. OVS-DPDK)
based on vhost-user for VMs in the cloud. They are purely software
running on CPU cores. When we have accelerators for such NFVi
applications, it's ideal if the applications can keep using the
original interface (i.e. the vhost-user netdev) with QEMU, and the
infrastructure can decide when and how to switch between CPU and
accelerators within that interface. The switching (i.e. between CPU
and accelerators) can then be done flexibly and quickly inside the
applications.

More details about this can be found in Cunming's discussions on the
RFC patch set.
The previous links:
RFC: http://lists.nongnu.org/archive/html/qemu-devel/2017-12/msg04844.html

RFC -> v1:
(approximately sorted by comment posting time)
- Add some details about how vDPA works in cover letter (Alexey)
- Add some details about the OVS offload use-case in cover letter (Jason)
- Move PCI specific stuff out of vhost-user (Jason)
- Handle the virtual IOMMU case (Jason)
- Move VFIO group management code into vfio/common.c (Alex)
- Various refinements

Tiwei Bie (6):
  vhost-user: support receiving file descriptors in slave_read
  vhost-user: introduce shared vhost-user state
  virtio: support adding sub-regions for notify region
  vfio: support getting VFIOGroup from groupfd
  vfio: remove DPRINTF() definition from vfio-common.h
  vhost-user: add VFIO based accelerators support

 Makefile.target                 |   4 +
 docs/interop/vhost-user.txt     |  57 +++++++++
 hw/scsi/vhost-user-scsi.c       |   6 +-
 hw/vfio/common.c                |  96 ++++++++++++++-
 hw/virtio/vhost-user.c          | 250 +++++++++++++++++++++++++++++++++++++++-
 hw/virtio/virtio-pci.c          |  48 ++++++++
 hw/virtio/virtio-pci.h          |   5 +
 hw/virtio/virtio.c              |  39 +++++++
 include/hw/vfio/vfio-common.h   |  11 +-
 include/hw/virtio/vhost-user.h  |  34 ++++++
 include/hw/virtio/virtio-scsi.h |   6 +-
 include/hw/virtio/virtio.h      |   5 +
 include/qemu/osdep.h            |   1 +
 net/vhost-user.c                |  30 ++---
 14 files changed, 559 insertions(+), 33 deletions(-)
 create mode 100644 include/hw/virtio/vhost-user.h