From patchwork Sun Apr 1 09:13:08 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Si-Wei Liu
X-Patchwork-Id: 893969
X-Patchwork-Delegate: davem@davemloft.net
From: Si-Wei Liu
To: mst@redhat.com, jiri@resnulli.us, stephen@networkplumber.org,
 alexander.h.duyck@intel.com, davem@davemloft.net,
 jesse.brandeburg@intel.com, kubakici@wp.pl, jasowang@redhat.com,
 sridhar.samudrala@intel.com, netdev@vger.kernel.org,
 virtualization@lists.linux-foundation.org, virtio-dev@lists.oasis-open.org
Subject: [RFC PATCH 1/3] qemu: virtio-bypass should explicitly bind to a
 passthrough device
Date: Sun, 1 Apr 2018 05:13:08 -0400
Message-Id: <1522573990-5242-2-git-send-email-si-wei.liu@oracle.com>
X-Mailer: git-send-email 1.8.3.1
In-Reply-To: <1522573990-5242-1-git-send-email-si-wei.liu@oracle.com>
References: <1522573990-5242-1-git-send-email-si-wei.liu@oracle.com>
Sender: netdev-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org

The new backup option allows the guest
virtio-bypass driver to explicitly bind to a corresponding passthrough
instance, which is identified by its location in <bus>:<slot>.<function>
notation. The MAC address is still validated in the guest, but it is no
longer the only criterion for pairing the two devices. A MAC address is
more a matter of network configuration than a (virtual) device identifier,
and the latter needs to be unique as part of the VM configuration.
Technically, more than one device in the guest may be configured with the
same MAC address, as long as each belongs to a completely isolated network.

The direct benefit of the explicit binding (or pairing) is that it prevents
improper binding or malicious pairing that would otherwise be possible
given the flexibility of guest MAC address configuration.

More importantly, the guest device location can also be used to reserve
the slot for the corresponding passthrough device in the PCI bus tree when
that device is temporarily absent and has yet to be hot plugged into the
VM. We need to preserve the slot of the passthrough device to which
virtio-bypass is bound, so that once the device is unplugged as a result of
migration the slot cannot be occupied by another device, and any user-space
application that assumes a consistent device location in the bus tree keeps
working.

The usage of the backup option is as follows:

  -device virtio-net-pci, ... ,backup=<bus>:<slot>[.<function>]

for example:

  -device virtio-net-pci,id=net0,mac=52:54:00:e0:58:80,backup=pci.2:0x3 ...
  -device vfio-pci,host=02:10.1,id=hostdev0,bus=pci.2,addr=0x3

Signed-off-by: Si-Wei Liu
---
 hw/net/virtio-net.c                         | 29 ++++++++++++-
 include/hw/pci/pci.h                        |  3 ++
 include/hw/virtio/virtio-net.h              |  2 +
 include/standard-headers/linux/virtio_net.h |  1 +
 qdev-monitor.c                              | 64 +++++++++++++++++++++++++++++
 5 files changed, 97 insertions(+), 2 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index de31b1b98c..a36b169958 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -26,6 +26,7 @@
 #include "qapi-event.h"
 #include "hw/virtio/virtio-access.h"
 #include "migration/misc.h"
+#include "hw/pci/pci.h"
 
 #define VIRTIO_NET_VM_VERSION 11
 
@@ -61,6 +62,8 @@ static VirtIOFeature feature_sizes[] = {
      .end = endof(struct virtio_net_config, max_virtqueue_pairs)},
     {.flags = 1 << VIRTIO_NET_F_MTU,
      .end = endof(struct virtio_net_config, mtu)},
+    {.flags = 1 << VIRTIO_NET_F_BACKUP,
+     .end = endof(struct virtio_net_config, bsf2backup)},
     {}
 };
 
@@ -84,10 +87,24 @@ static void virtio_net_get_config(VirtIODevice *vdev, uint8_t *config)
 {
     VirtIONet *n = VIRTIO_NET(vdev);
     struct virtio_net_config netcfg;
+    uint16_t busdevfn;
 
     virtio_stw_p(vdev, &netcfg.status, n->status);
     virtio_stw_p(vdev, &netcfg.max_virtqueue_pairs, n->max_queues);
     virtio_stw_p(vdev, &netcfg.mtu, n->net_conf.mtu);
+    if (n->net_conf.backup) {
+        /* Below function should not fail as the backup ID string
+         * has been validated when device is being realized.
+         * Until guest starts to run we can get to the
+         * effective bus num in use from pci config space where
+         * guest had written to.
+         */
+        pci_get_busdevfn_by_id(n->net_conf.backup, &busdevfn,
+                               NULL, NULL);
+        busdevfn <<= 8;
+        busdevfn |= (n->backup_devfn & 0xFF);
+        virtio_stw_p(vdev, &netcfg.bsf2backup, busdevfn);
+    }
     memcpy(netcfg.mac, n->mac, ETH_ALEN);
     memcpy(config, &netcfg, n->config_size);
 }
@@ -1935,11 +1952,19 @@ static void virtio_net_device_realize(DeviceState *dev, Error **errp)
     VirtIODevice *vdev = VIRTIO_DEVICE(dev);
     VirtIONet *n = VIRTIO_NET(dev);
     NetClientState *nc;
+    uint16_t bdevfn;
     int i;
 
     if (n->net_conf.mtu) {
         n->host_features |= (0x1 << VIRTIO_NET_F_MTU);
     }
+    if (n->net_conf.backup) {
+        if (pci_get_busdevfn_by_id(n->net_conf.backup, NULL,
+                                   &bdevfn, errp))
+            return;
+        n->backup_devfn = bdevfn;
+        n->host_features |= (0x1 << VIRTIO_NET_F_BACKUP);
+    }
 
     virtio_net_set_config_size(n, n->host_features);
     virtio_init(vdev, "virtio-net", VIRTIO_ID_NET, n->config_size);
@@ -2160,8 +2185,8 @@ static Property virtio_net_properties[] = {
     DEFINE_PROP_UINT16("host_mtu", VirtIONet, net_conf.mtu, 0),
     DEFINE_PROP_BOOL("x-mtu-bypass-backend", VirtIONet, mtu_bypass_backend,
                      true),
-    DEFINE_PROP_BIT("backup", VirtIONet, host_features,
-                    VIRTIO_NET_F_BACKUP, false),
+    DEFINE_PROP_STRING("backup", VirtIONet, net_conf.backup),
+
     DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index d8c18c7fa4..dbb910d162 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -431,6 +431,9 @@ PCIDevice *pci_nic_init_nofail(NICInfo *nd, PCIBus *rootbus,
 
 PCIDevice *pci_vga_init(PCIBus *bus);
 
+int pci_get_busdevfn_by_id(const char *id, uint16_t *busnr,
+                           uint16_t *devfn, Error **errp);
+
 static inline PCIBus *pci_get_bus(const PCIDevice *dev)
 {
     return PCI_BUS(qdev_get_parent_bus(DEVICE(dev)));
diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
index b81b6a4624..276b39f64f 100644
--- a/include/hw/virtio/virtio-net.h
+++ b/include/hw/virtio/virtio-net.h
@@ -38,6 +38,7 @@ typedef struct virtio_net_conf
     uint16_t rx_queue_size;
     uint16_t tx_queue_size;
     uint16_t mtu;
+    char *backup;
 } virtio_net_conf;
 
 /* Maximum packet size we can receive from tap device: header + 64k */
@@ -99,6 +100,7 @@ typedef struct VirtIONet {
     int announce_counter;
     bool needs_vnet_hdr_swap;
     bool mtu_bypass_backend;
+    uint16_t backup_devfn;
 } VirtIONet;
 
 void virtio_net_set_netclient_name(VirtIONet *n, const char *name,
diff --git a/include/standard-headers/linux/virtio_net.h b/include/standard-headers/linux/virtio_net.h
index 65dde3209d..cd936e5521 100644
--- a/include/standard-headers/linux/virtio_net.h
+++ b/include/standard-headers/linux/virtio_net.h
@@ -79,6 +79,7 @@ struct virtio_net_config {
 	uint16_t max_virtqueue_pairs;
 	/* Default maximum transmit unit advice */
 	uint16_t mtu;
+	uint16_t bsf2backup;
 } QEMU_PACKED;
 
 /*
diff --git a/qdev-monitor.c b/qdev-monitor.c
index 846238175f..600a81c73e 100644
--- a/qdev-monitor.c
+++ b/qdev-monitor.c
@@ -32,6 +32,8 @@
 #include "qemu/help_option.h"
 #include "qemu/option.h"
 #include "sysemu/block-backend.h"
+#include "hw/pci/pci.h"
+#include "hw/vfio/pci.h"
 #include "migration/misc.h"
 
 /*
@@ -896,6 +898,68 @@ void qmp_device_del(const char *id, Error **errp)
     }
 }
 
+int pci_get_busdevfn_by_id(const char *id, uint16_t *busnr,
+                           uint16_t *devfn, Error **errp)
+{
+    uint16_t busnum = 0, slot = 0, func = 0;
+    const char *pc, *pd, *pe;
+    Error *local_err = NULL;
+    ObjectClass *class;
+    char value[1024];
+    BusState *bus;
+    uint64_t u64;
+
+    if (!(pc = strchr(id, ':'))) {
+        error_setg(errp, "Invalid id: backup=%s, "
+                   "correct format should be backup="
+                   "':[.]'", id);
+        return -1;
+    }
+    get_opt_name(value, sizeof(value), id, ':');
+    if (pc != id + 1) {
+        bus = qbus_find(value, errp);
+        if (!bus)
+            return -1;
+
+        class = object_get_class(OBJECT(bus));
+        if (class != object_class_by_name(TYPE_PCI_BUS) &&
+            class != object_class_by_name(TYPE_PCIE_BUS)) {
+            error_setg(errp, "%s is not a device on pci bus", id);
+            return -1;
+        }
+        busnum = (uint16_t)pci_bus_num(PCI_BUS(bus));
+    }
+
+    if (!devfn)
+        goto out;
+
+    pd = strchr(pc, '.');
+    pe = get_opt_name(value, sizeof(value), pc + 1, '.');
+    if (pe != pc + 1) {
+        parse_option_number("slot", value, &u64, &local_err);
+        if (local_err) {
+            error_propagate(errp, local_err);
+            return -1;
+        }
+        slot = (uint16_t)u64;
+    }
+    if (pd && *(pd + 1) != '\0') {
+        parse_option_number("function", pd, &u64, &local_err);
+        if (local_err) {
+            error_propagate(errp, local_err);
+            return -1;
+        }
+        func = (uint16_t)u64;
+    }
+
+out:
+    if (busnr)
+        *busnr = busnum;
+    if (devfn)
+        *devfn = ((slot & 0x1F) << 3) | (func & 0x7);
+    return 0;
+}
+
 BlockBackend *blk_by_qdev_id(const char *id, Error **errp)
 {
     DeviceState *dev;
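
For review purposes, here is a minimal standalone sketch of how the 16-bit bsf2backup value appears to be laid out, based only on the expressions in virtio_net_get_config() and pci_get_busdevfn_by_id() above: bus number in the high byte, devfn (5-bit slot, 3-bit function) in the low byte. The helper names pack_devfn(), pack_bsf() and unpack_bsf() are hypothetical and not part of the patch, and the decode side is only an assumption about what a guest driver might do with the field.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: pack slot (5 bits) and function (3 bits) into a
 * PCI devfn byte, mirroring the final expression in
 * pci_get_busdevfn_by_id(). */
static uint16_t pack_devfn(uint16_t slot, uint16_t func)
{
    return (uint16_t)(((slot & 0x1F) << 3) | (func & 0x7));
}

/* Hypothetical helper: combine bus number and devfn the way
 * virtio_net_get_config() builds bsf2backup (bus << 8 | devfn). */
static uint16_t pack_bsf(uint16_t busnr, uint16_t devfn)
{
    return (uint16_t)((busnr << 8) | (devfn & 0xFF));
}

/* Hypothetical guest-side decode of the same layout (an assumption,
 * not taken from the patch). */
static void unpack_bsf(uint16_t bsf, unsigned *bus, unsigned *slot,
                       unsigned *func)
{
    *bus = bsf >> 8;
    *slot = (bsf >> 3) & 0x1F;
    *func = bsf & 0x7;
}

/* Round-trip check for backup=pci.2:0x3, assuming the bus with id
 * "pci.2" ends up with bus number 2 in the guest. */
static void bsf_selfcheck(void)
{
    unsigned bus, slot, func;
    uint16_t bsf = pack_bsf(2, pack_devfn(0x3, 0));

    unpack_bsf(bsf, &bus, &slot, &func);
    assert(bsf == 0x0218);
    assert(bus == 2 && slot == 3 && func == 0);
}
```

Note that the masks silently truncate out-of-range values: a slot above 0x1F or a function above 0x7 would wrap rather than be rejected, which may be worth an explicit range check in the parser.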