From patchwork Tue May 27 12:06:02 2014 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nikolay Nikolaev X-Patchwork-Id: 352891 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 525191400E1 for ; Tue, 27 May 2014 22:08:18 +1000 (EST) Received: from localhost ([::1]:33846 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1WpGAy-0001cH-1T for incoming@patchwork.ozlabs.org; Tue, 27 May 2014 08:08:16 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:43787) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1WpG93-0006zu-4I for qemu-devel@nongnu.org; Tue, 27 May 2014 08:06:23 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1WpG8w-00084I-Tq for qemu-devel@nongnu.org; Tue, 27 May 2014 08:06:17 -0400 Received: from mail-wg0-f52.google.com ([74.125.82.52]:51767) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1WpG8w-000844-LF for qemu-devel@nongnu.org; Tue, 27 May 2014 08:06:10 -0400 Received: by mail-wg0-f52.google.com with SMTP id l18so9283805wgh.23 for ; Tue, 27 May 2014 05:06:10 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:subject:from:to:cc:date:message-id:in-reply-to :references:user-agent:mime-version:content-type :content-transfer-encoding; bh=eJY/K6h8iB8H0+cuOvJbtVC/jUDSW/KPg8fMjnRhylc=; b=B+VQul1LpKn+FXP+ncMZMLCL2W3AIAugholmlPg0OBirsldu6IlNw3qzQSqdffZe4q eNcQOl4sH1XnwVTOzjreeaTlnaif6fR8NmQ4LHJQglfZu1Ka23760uxilRlWXTLJxCx6 Z2984ldUJ7jieKP0PCJatHiZ4aItjtoJQq26cGb9Uv4KGNieehd8qeZ2IhUQxOuMl3Gw 9opAkNOon6MHSsl3xOZfDSQpixnEIR9s7ISXvaRpz9vZn2LWjhDctxv7yZaQar5KMJJ0 2uY4ABA3ZSiSdPLBAh4v9u/AXevEB4X63/IcDSkvRdvMrHgMEJxq3m/yaZ1k0pLfUPes oNPA== X-Gm-Message-State: ALoCoQmYZWKXl9O52ok8q6d3mu/W7mx2v3TpiAK/RBW2X7ycaNrV8QjE6fbZMsLAUu5l/csTmDs1 X-Received: by 10.180.184.167 with SMTP id ev7mr36823244wic.55.1401192369981; Tue, 27 May 2014 05:06:09 -0700 (PDT) Received: from [0.0.14.236] ([82.146.27.14]) by mx.google.com with ESMTPSA id q2sm7873555wix.5.2014.05.27.05.06.08 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 27 May 2014 05:06:09 -0700 (PDT) From: Nikolay Nikolaev To: snabb-devel@googlegroups.com, qemu-devel@nongnu.org Date: Tue, 27 May 2014 15:06:02 +0300 Message-ID: <20140527120557.15172.79065.stgit@3820> In-Reply-To: <20140527120050.15172.94908.stgit@3820> References: <20140527120050.15172.94908.stgit@3820> User-Agent: StGit/0.17.1-dirty MIME-Version: 1.0 X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] [fuzzy] X-Received-From: 74.125.82.52 Cc: a.motakis@virtualopensystems.com, luke@snabb.co, tech@virtualopensystems.com, n.nikolaev@virtualopensystems.com, mst@redhat.com Subject: [Qemu-devel] [PATCH v10 12/18] Add vhost-user as a vhost backend. X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org The initialization takes a chardev backed by a unix domain socket. It should implement qemu_fe_set_msgfds in order to be able to pass file descriptors to the remote process. Each ioctl request of vhost-kernel has a vhost-user message equivalent, which is sent over the control socket. The general approach is to copy the data from the supplied argument pointer to a designated field in the message. If a file descriptor is to be passed it will be placed in the fds array for inclusion in the sendmsg control header. VHOST_SET_MEM_TABLE ignores the supplied vhost_memory structure and scans the global ram_list for ram blocks with a valid fd field set. This would be set when the '-object memory-file' option with share=on property is used. Signed-off-by: Antonios Motakis Signed-off-by: Nikolay Nikolaev --- hw/virtio/Makefile.objs | 2 hw/virtio/vhost-backend.c | 5 + hw/virtio/vhost-user.c | 342 +++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 348 insertions(+), 1 deletion(-) create mode 100644 hw/virtio/vhost-user.c diff --git a/hw/virtio/Makefile.objs b/hw/virtio/Makefile.objs index 51e5bdb..ec9e855 100644 --- a/hw/virtio/Makefile.objs +++ b/hw/virtio/Makefile.objs @@ -5,4 +5,4 @@ common-obj-y += virtio-mmio.o common-obj-$(CONFIG_VIRTIO_BLK_DATA_PLANE) += dataplane/ obj-y += virtio.o virtio-balloon.o -obj-$(CONFIG_LINUX) += vhost.o vhost-backend.o +obj-$(CONFIG_LINUX) += vhost.o vhost-backend.o vhost-user.o diff --git a/hw/virtio/vhost-backend.c b/hw/virtio/vhost-backend.c index 509e103..35316c4 100644 --- a/hw/virtio/vhost-backend.c +++ b/hw/virtio/vhost-backend.c @@ -14,6 +14,8 @@ #include +extern const VhostOps user_ops; + static int vhost_kernel_call(struct vhost_dev *dev, unsigned long int request, void *arg) { @@ -57,6 +59,9 @@ int vhost_set_backend_type(struct vhost_dev *dev, VhostBackendType backend_type) case VHOST_BACKEND_TYPE_KERNEL: dev->vhost_ops = &kernel_ops; break; + case VHOST_BACKEND_TYPE_USER: + dev->vhost_ops = &user_ops; + break; default: error_report("Unknown vhost backend type\n"); r = -1; diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c new file mode 100644 index 0000000..3244ef8 --- /dev/null +++ b/hw/virtio/vhost-user.c @@ -0,0 +1,342 @@ +/* + * vhost-user + * + * Copyright (c) 2013 Virtual Open Systems Sarl. + * + * This work is licensed under the terms of the GNU GPL, version 2 or later. + * See the COPYING file in the top-level directory. + * + */ + +#include "hw/virtio/vhost.h" +#include "hw/virtio/vhost-backend.h" +#include "sysemu/char.h" +#include "sysemu/kvm.h" +#include "qemu/error-report.h" +#include "qemu/sockets.h" + +#include +#include +#include +#include +#include +#include + +#define VHOST_MEMORY_MAX_NREGIONS 8 + +typedef enum VhostUserRequest { + VHOST_USER_NONE = 0, + VHOST_USER_GET_FEATURES = 1, + VHOST_USER_SET_FEATURES = 2, + VHOST_USER_SET_OWNER = 3, + VHOST_USER_RESET_OWNER = 4, + VHOST_USER_SET_MEM_TABLE = 5, + VHOST_USER_SET_LOG_BASE = 6, + VHOST_USER_SET_LOG_FD = 7, + VHOST_USER_SET_VRING_NUM = 8, + VHOST_USER_SET_VRING_ADDR = 9, + VHOST_USER_SET_VRING_BASE = 10, + VHOST_USER_GET_VRING_BASE = 11, + VHOST_USER_SET_VRING_KICK = 12, + VHOST_USER_SET_VRING_CALL = 13, + VHOST_USER_SET_VRING_ERR = 14, + VHOST_USER_MAX +} VhostUserRequest; + +typedef struct VhostUserMemoryRegion { + uint64_t guest_phys_addr; + uint64_t memory_size; + uint64_t userspace_addr; +} VhostUserMemoryRegion; + +typedef struct VhostUserMemory { + uint32_t nregions; + uint32_t padding; + VhostUserMemoryRegion regions[VHOST_MEMORY_MAX_NREGIONS]; +} VhostUserMemory; + +typedef struct VhostUserMsg { + VhostUserRequest request; + +#define VHOST_USER_VERSION_MASK (0x3) +#define VHOST_USER_REPLY_MASK (0x1<<2) + uint32_t flags; + uint32_t size; /* the following payload size */ + union { +#define VHOST_USER_VRING_IDX_MASK (0xff) +#define VHOST_USER_VRING_NOFD_MASK (0x1<<8) + uint64_t u64; + struct vhost_vring_state state; + struct vhost_vring_addr addr; + VhostUserMemory memory; + }; +} QEMU_PACKED VhostUserMsg; + +static VhostUserMsg m __attribute__ ((unused)); +#define VHOST_USER_HDR_SIZE (sizeof(m.request) \ + + sizeof(m.flags) \ + + sizeof(m.size)) + +#define VHOST_USER_PAYLOAD_SIZE (sizeof(m) - VHOST_USER_HDR_SIZE) + +/* The version of the protocol we support */ +#define VHOST_USER_VERSION (0x1) + +static bool ioeventfd_enabled(void) +{ + return kvm_enabled() && kvm_eventfds_enabled(); +} + +static unsigned long int ioctl_to_vhost_user_request[VHOST_USER_MAX] = { + -1, /* VHOST_USER_NONE */ + VHOST_GET_FEATURES, /* VHOST_USER_GET_FEATURES */ + VHOST_SET_FEATURES, /* VHOST_USER_SET_FEATURES */ + VHOST_SET_OWNER, /* VHOST_USER_SET_OWNER */ + VHOST_RESET_OWNER, /* VHOST_USER_RESET_OWNER */ + VHOST_SET_MEM_TABLE, /* VHOST_USER_SET_MEM_TABLE */ + VHOST_SET_LOG_BASE, /* VHOST_USER_SET_LOG_BASE */ + VHOST_SET_LOG_FD, /* VHOST_USER_SET_LOG_FD */ + VHOST_SET_VRING_NUM, /* VHOST_USER_SET_VRING_NUM */ + VHOST_SET_VRING_ADDR, /* VHOST_USER_SET_VRING_ADDR */ + VHOST_SET_VRING_BASE, /* VHOST_USER_SET_VRING_BASE */ + VHOST_GET_VRING_BASE, /* VHOST_USER_GET_VRING_BASE */ + VHOST_SET_VRING_KICK, /* VHOST_USER_SET_VRING_KICK */ + VHOST_SET_VRING_CALL, /* VHOST_USER_SET_VRING_CALL */ + VHOST_SET_VRING_ERR /* VHOST_USER_SET_VRING_ERR */ +}; + +static VhostUserRequest vhost_user_request_translate(unsigned long int request) +{ + VhostUserRequest idx; + + for (idx = 0; idx < VHOST_USER_MAX; idx++) { + if (ioctl_to_vhost_user_request[idx] == request) { + break; + } + } + + return (idx == VHOST_USER_MAX) ? VHOST_USER_NONE : idx; +} + +static int vhost_user_read(struct vhost_dev *dev, VhostUserMsg *msg) +{ + CharDriverState *chr = dev->opaque; + uint8_t *p = (uint8_t *) msg; + int r, size = VHOST_USER_HDR_SIZE; + + r = qemu_chr_fe_read_all(chr, p, size); + if (r != size) { + error_report("Failed to read msg header. Read %d instead of %d.\n", r, + size); + goto fail; + } + + /* validate received flags */ + if (msg->flags != (VHOST_USER_REPLY_MASK | VHOST_USER_VERSION)) { + error_report("Failed to read msg header." + " Flags 0x%x instead of 0x%x.\n", msg->flags, + VHOST_USER_REPLY_MASK | VHOST_USER_VERSION); + goto fail; + } + + /* validate message size is sane */ + if (msg->size > VHOST_USER_PAYLOAD_SIZE) { + error_report("Failed to read msg header." + " Size %d exceeds the maximum %zu.\n", msg->size, + VHOST_USER_PAYLOAD_SIZE); + goto fail; + } + + if (msg->size) { + p += VHOST_USER_HDR_SIZE; + size = msg->size; + r = qemu_chr_fe_read_all(chr, p, size); + if (r != size) { + error_report("Failed to read msg payload." + " Read %d instead of %d.\n", r, msg->size); + goto fail; + } + } + + return 0; + +fail: + return -1; +} + +static int vhost_user_write(struct vhost_dev *dev, VhostUserMsg *msg, + int *fds, int fd_num) +{ + CharDriverState *chr = dev->opaque; + int size = VHOST_USER_HDR_SIZE + msg->size; + + if (fd_num) { + qemu_chr_fe_set_msgfds(chr, fds, fd_num); + } + + return qemu_chr_fe_write_all(chr, (const uint8_t *) msg, size) == size ? + 0 : -1; +} + +static int vhost_user_call(struct vhost_dev *dev, unsigned long int request, + void *arg) +{ + VhostUserMsg msg; + VhostUserRequest msg_request; + RAMBlock *block = 0; + struct vhost_vring_file *file = 0; + int need_reply = 0; + int fds[VHOST_MEMORY_MAX_NREGIONS]; + size_t fd_num = 0; + + assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_USER); + + msg_request = vhost_user_request_translate(request); + msg.request = msg_request; + msg.flags = VHOST_USER_VERSION; + msg.size = 0; + + switch (request) { + case VHOST_GET_FEATURES: + need_reply = 1; + break; + + case VHOST_SET_FEATURES: + case VHOST_SET_LOG_BASE: + msg.u64 = *((__u64 *) arg); + msg.size = sizeof(m.u64); + break; + + case VHOST_SET_OWNER: + case VHOST_RESET_OWNER: + break; + + case VHOST_SET_MEM_TABLE: + QTAILQ_FOREACH(block, &ram_list.blocks, next) + { + if (block->fd > 0) { + msg.memory.regions[fd_num].userspace_addr = + (uintptr_t) block->host; + msg.memory.regions[fd_num].memory_size = block->length; + msg.memory.regions[fd_num].guest_phys_addr = block->offset; + fds[fd_num++] = block->fd; + } + } + + msg.memory.nregions = fd_num; + + if (!fd_num) { + error_report("Failed initializing vhost-user memory map\n" + "consider using -object memory-file share=on\n"); + return -1; + } + + msg.size = sizeof(m.memory.nregions); + msg.size += sizeof(m.memory.padding); + msg.size += fd_num * sizeof(VhostUserMemoryRegion); + + break; + + case VHOST_SET_LOG_FD: + fds[fd_num++] = *((int *) arg); + break; + + case VHOST_SET_VRING_NUM: + case VHOST_SET_VRING_BASE: + memcpy(&msg.state, arg, sizeof(struct vhost_vring_state)); + msg.size = sizeof(m.state); + break; + + case VHOST_GET_VRING_BASE: + memcpy(&msg.state, arg, sizeof(struct vhost_vring_state)); + msg.size = sizeof(m.state); + need_reply = 1; + break; + + case VHOST_SET_VRING_ADDR: + memcpy(&msg.addr, arg, sizeof(struct vhost_vring_addr)); + msg.size = sizeof(m.addr); + break; + + case VHOST_SET_VRING_KICK: + case VHOST_SET_VRING_CALL: + case VHOST_SET_VRING_ERR: + file = arg; + msg.u64 = file->index & VHOST_USER_VRING_IDX_MASK; + msg.size = sizeof(m.u64); + if (ioeventfd_enabled() && file->fd > 0) { + fds[fd_num++] = file->fd; + } else { + msg.u64 |= VHOST_USER_VRING_NOFD_MASK; + } + break; + default: + error_report("vhost-user trying to send unhandled ioctl\n"); + return -1; + break; + } + + if (vhost_user_write(dev, &msg, fds, fd_num) < 0) { + return 0; + } + + if (need_reply) { + if (vhost_user_read(dev, &msg) < 0) { + return 0; + } + + if (msg_request != msg.request) { + error_report("Received unexpected msg type." + " Expected %d received %d\n", msg_request, msg.request); + return -1; + } + + switch (msg_request) { + case VHOST_USER_GET_FEATURES: + if (msg.size != sizeof(m.u64)) { + error_report("Received bad msg size.\n"); + return -1; + } + *((__u64 *) arg) = msg.u64; + break; + case VHOST_USER_GET_VRING_BASE: + if (msg.size != sizeof(m.state)) { + error_report("Received bad msg size.\n"); + return -1; + } + memcpy(arg, &msg.state, sizeof(struct vhost_vring_state)); + break; + default: + error_report("Received unexpected msg type.\n"); + return -1; + break; + } + } + + return 0; +} + +static int vhost_user_init(struct vhost_dev *dev, void *opaque) +{ + assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_USER); + + dev->opaque = opaque; + + return 0; +} + +static int vhost_user_cleanup(struct vhost_dev *dev) +{ + assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_USER); + + dev->opaque = 0; + + return 0; +} + +const VhostOps user_ops = { + .backend_type = VHOST_BACKEND_TYPE_USER, + .vhost_call = vhost_user_call, + .vhost_backend_init = vhost_user_init, + .vhost_backend_cleanup = vhost_user_cleanup + };