From patchwork Sun Apr 1 09:13:08 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Si-Wei Liu X-Patchwork-Id: 893969 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=oracle.com Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=oracle.com header.i=@oracle.com header.b="pg611mBs"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 40DVXv28JGz9s1s for ; Sun, 1 Apr 2018 19:33:51 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753329AbeDAJdk (ORCPT ); Sun, 1 Apr 2018 05:33:40 -0400 Received: from userp2130.oracle.com ([156.151.31.86]:45498 "EHLO userp2130.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753260AbeDAJde (ORCPT ); Sun, 1 Apr 2018 05:33:34 -0400 Received: from pps.filterd (userp2130.oracle.com [127.0.0.1]) by userp2130.oracle.com (8.16.0.22/8.16.0.22) with SMTP id w319I3s1124844; Sun, 1 Apr 2018 09:33:24 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : subject : date : message-id : in-reply-to : references; s=corp-2017-10-26; bh=CuhXY8nVNEdm9BZkR9MkCJrFNNyuERjz0PLbBMkj2jk=; b=pg611mBs2eksDznOhcXtCxmv8psh7BXM7B0+vFBtysFDORUQ9hYBOPyczR3FdxMvdPHN sLJAcaBriLeOpuEbWvnFoGbUgMHE5TFUupRAUCXnInY9mlR4Bg5KROvPYXmHbKLLd7Nt 64/WCTD6R1YlVIPyh44FB0S1FTKawmyFb9YWia7Vkmqdcpq+UXIzwfYNRURX0bO1pPQJ 0hmoDODfQsSuLgk/V8ic6UqzsdKyx9xTaTLgvnsRnJfKHAWmdtRRNqaRw/PNwXO82HqE NszAyfmb9o4lSVGBzJBiCDUbJF2Lfm3BhvGHJWctENTwke2RPuQbOIdgpNlEsuBHze5Z 
xg== Received: from userv0022.oracle.com (userv0022.oracle.com [156.151.31.74]) by userp2130.oracle.com with ESMTP id 2h2vp100ur-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Sun, 01 Apr 2018 09:33:24 +0000 Received: from aserv0122.oracle.com (aserv0122.oracle.com [141.146.126.236]) by userv0022.oracle.com (8.14.4/8.14.4) with ESMTP id w319XNl9025337 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Sun, 1 Apr 2018 09:33:24 GMT Received: from abhmp0016.oracle.com (abhmp0016.oracle.com [141.146.116.22]) by aserv0122.oracle.com (8.14.4/8.14.4) with ESMTP id w319XKD7000697; Sun, 1 Apr 2018 09:33:20 GMT Received: from ban25x6uut24.us.oracle.com (/10.153.73.24) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Sun, 01 Apr 2018 02:33:19 -0700 From: Si-Wei Liu To: mst@redhat.com, jiri@resnulli.us, stephen@networkplumber.org, alexander.h.duyck@intel.com, davem@davemloft.net, jesse.brandeburg@intel.com, kubakici@wp.pl, jasowang@redhat.com, sridhar.samudrala@intel.com, netdev@vger.kernel.org, virtualization@lists.linux-foundation.org, virtio-dev@lists.oasis-open.org Subject: [RFC PATCH 1/3] qemu: virtio-bypass should explicitly bind to a passthrough device Date: Sun, 1 Apr 2018 05:13:08 -0400 Message-Id: <1522573990-5242-2-git-send-email-si-wei.liu@oracle.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1522573990-5242-1-git-send-email-si-wei.liu@oracle.com> References: <1522573990-5242-1-git-send-email-si-wei.liu@oracle.com> X-Proofpoint-Virus-Version: vendor=nai engine=5900 definitions=8849 signatures=668697 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 suspectscore=0 malwarescore=0 phishscore=0 bulkscore=0 spamscore=0 mlxscore=0 mlxlogscore=999 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1711220000 definitions=main-1804010099 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org The new backup option allows guest 
virtio-bypass driver to explicitly bind to a corresponding passthrough instance, which is identified by the <bus-id>:<slot>[.<function>] notation. The MAC address is still validated in the guest, but it is no longer the only criterion for pairing two devices. A MAC address is more a matter of network configuration than a (virtual) device identifier, and the latter needs to be unique as part of the VM configuration. Technically it is possible for more than one device in the guest to be configured with the same MAC address, as long as each belongs to a completely isolated network.

The direct benefit of explicit binding (or pairing) is that it prevents improper binding or malicious pairing that the flexibility of guest MAC address configuration would otherwise allow. More importantly, the guest device location can also be used to reserve the slot for the corresponding passthrough device in the PCI bus tree while that device is temporarily absent and has yet to be hot plugged into the VM. We need to preserve the slot for the passthrough device to which virtio-bypass is bound, so that once the device is unplugged as a result of migration the slot cannot be occupied by another device, and any user-space application that assumes a consistent device location in the bus tree still works.

The usage for the backup option is as follows:

  -device virtio-net-pci, ... ,backup=<bus-id>:<slot>[.<function>]

for example:

  -device virtio-net-pci,id=net0,mac=52:54:00:e0:58:80,backup=pci.2:0x3 ...
-device vfio-pci,host=02:10.1,id=hostdev0,bus=pci.2,addr=0x3 Signed-off-by: Si-Wei Liu --- hw/net/virtio-net.c | 29 ++++++++++++- include/hw/pci/pci.h | 3 ++ include/hw/virtio/virtio-net.h | 2 + include/standard-headers/linux/virtio_net.h | 1 + qdev-monitor.c | 64 +++++++++++++++++++++++++++++ 5 files changed, 97 insertions(+), 2 deletions(-) diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c index de31b1b98c..a36b169958 100644 --- a/hw/net/virtio-net.c +++ b/hw/net/virtio-net.c @@ -26,6 +26,7 @@ #include "qapi-event.h" #include "hw/virtio/virtio-access.h" #include "migration/misc.h" +#include "hw/pci/pci.h" #define VIRTIO_NET_VM_VERSION 11 @@ -61,6 +62,8 @@ static VirtIOFeature feature_sizes[] = { .end = endof(struct virtio_net_config, max_virtqueue_pairs)}, {.flags = 1 << VIRTIO_NET_F_MTU, .end = endof(struct virtio_net_config, mtu)}, + {.flags = 1 << VIRTIO_NET_F_BACKUP, + .end = endof(struct virtio_net_config, bsf2backup)}, {} }; @@ -84,10 +87,24 @@ static void virtio_net_get_config(VirtIODevice *vdev, uint8_t *config) { VirtIONet *n = VIRTIO_NET(vdev); struct virtio_net_config netcfg; + uint16_t busdevfn; virtio_stw_p(vdev, &netcfg.status, n->status); virtio_stw_p(vdev, &netcfg.max_virtqueue_pairs, n->max_queues); virtio_stw_p(vdev, &netcfg.mtu, n->net_conf.mtu); + if (n->net_conf.backup) { + /* Below function should not fail as the backup ID string + * has been validated when device is being realized. + * Until guest starts to run we can get to the + * effective bus num in use from pci config space where + * guest had written to. 
+ */ + pci_get_busdevfn_by_id(n->net_conf.backup, &busdevfn, + NULL, NULL); + busdevfn <<= 8; + busdevfn |= (n->backup_devfn & 0xFF); + virtio_stw_p(vdev, &netcfg.bsf2backup, busdevfn); + } memcpy(netcfg.mac, n->mac, ETH_ALEN); memcpy(config, &netcfg, n->config_size); } @@ -1935,11 +1952,19 @@ static void virtio_net_device_realize(DeviceState *dev, Error **errp) VirtIODevice *vdev = VIRTIO_DEVICE(dev); VirtIONet *n = VIRTIO_NET(dev); NetClientState *nc; + uint16_t bdevfn; int i; if (n->net_conf.mtu) { n->host_features |= (0x1 << VIRTIO_NET_F_MTU); } + if (n->net_conf.backup) { + if (pci_get_busdevfn_by_id(n->net_conf.backup, NULL, + &bdevfn, errp)) + return; + n->backup_devfn = bdevfn; + n->host_features |= (0x1 << VIRTIO_NET_F_BACKUP); + } virtio_net_set_config_size(n, n->host_features); virtio_init(vdev, "virtio-net", VIRTIO_ID_NET, n->config_size); @@ -2160,8 +2185,8 @@ static Property virtio_net_properties[] = { DEFINE_PROP_UINT16("host_mtu", VirtIONet, net_conf.mtu, 0), DEFINE_PROP_BOOL("x-mtu-bypass-backend", VirtIONet, mtu_bypass_backend, true), - DEFINE_PROP_BIT("backup", VirtIONet, host_features, - VIRTIO_NET_F_BACKUP, false), + DEFINE_PROP_STRING("backup", VirtIONet, net_conf.backup), + DEFINE_PROP_END_OF_LIST(), }; diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h index d8c18c7fa4..dbb910d162 100644 --- a/include/hw/pci/pci.h +++ b/include/hw/pci/pci.h @@ -431,6 +431,9 @@ PCIDevice *pci_nic_init_nofail(NICInfo *nd, PCIBus *rootbus, PCIDevice *pci_vga_init(PCIBus *bus); +int pci_get_busdevfn_by_id(const char *id, uint16_t *busnr, + uint16_t *devfn, Error **errp); + static inline PCIBus *pci_get_bus(const PCIDevice *dev) { return PCI_BUS(qdev_get_parent_bus(DEVICE(dev))); diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h index b81b6a4624..276b39f64f 100644 --- a/include/hw/virtio/virtio-net.h +++ b/include/hw/virtio/virtio-net.h @@ -38,6 +38,7 @@ typedef struct virtio_net_conf uint16_t rx_queue_size; uint16_t 
tx_queue_size; uint16_t mtu; + char *backup; } virtio_net_conf; /* Maximum packet size we can receive from tap device: header + 64k */ @@ -99,6 +100,7 @@ typedef struct VirtIONet { int announce_counter; bool needs_vnet_hdr_swap; bool mtu_bypass_backend; + uint16_t backup_devfn; } VirtIONet; void virtio_net_set_netclient_name(VirtIONet *n, const char *name, diff --git a/include/standard-headers/linux/virtio_net.h b/include/standard-headers/linux/virtio_net.h index 65dde3209d..cd936e5521 100644 --- a/include/standard-headers/linux/virtio_net.h +++ b/include/standard-headers/linux/virtio_net.h @@ -79,6 +79,7 @@ struct virtio_net_config { uint16_t max_virtqueue_pairs; /* Default maximum transmit unit advice */ uint16_t mtu; + uint16_t bsf2backup; } QEMU_PACKED; /* diff --git a/qdev-monitor.c b/qdev-monitor.c index 846238175f..600a81c73e 100644 --- a/qdev-monitor.c +++ b/qdev-monitor.c @@ -32,6 +32,8 @@ #include "qemu/help_option.h" #include "qemu/option.h" #include "sysemu/block-backend.h" +#include "hw/pci/pci.h" +#include "hw/vfio/pci.h" #include "migration/misc.h" /* @@ -896,6 +898,68 @@ void qmp_device_del(const char *id, Error **errp) } } +int pci_get_busdevfn_by_id(const char *id, uint16_t *busnr, + uint16_t *devfn, Error **errp) +{ + uint16_t busnum = 0, slot = 0, func = 0; + const char *pc, *pd, *pe; + Error *local_err = NULL; + ObjectClass *class; + char value[1024]; + BusState *bus; + uint64_t u64; + + if (!(pc = strchr(id, ':'))) { + error_setg(errp, "Invalid id: backup=%s, " + "correct format should be backup=" + "':[.]'", id); + return -1; + } + get_opt_name(value, sizeof(value), id, ':'); + if (pc != id + 1) { + bus = qbus_find(value, errp); + if (!bus) + return -1; + + class = object_get_class(OBJECT(bus)); + if (class != object_class_by_name(TYPE_PCI_BUS) && + class != object_class_by_name(TYPE_PCIE_BUS)) { + error_setg(errp, "%s is not a device on pci bus", id); + return -1; + } + busnum = (uint16_t)pci_bus_num(PCI_BUS(bus)); + } + + if (!devfn) + goto 
out; + + pd = strchr(pc, '.'); + pe = get_opt_name(value, sizeof(value), pc + 1, '.'); + if (pe != pc + 1) { + parse_option_number("slot", value, &u64, &local_err); + if (local_err) { + error_propagate(errp, local_err); + return -1; + } + slot = (uint16_t)u64; + } + if (pd && *(pd + 1) != '\0') { + parse_option_number("function", pd, &u64, &local_err); + if (local_err) { + error_propagate(errp, local_err); + return -1; + } + func = (uint16_t)u64; + } + +out: + if (busnr) + *busnr = busnum; + if (devfn) + *devfn = ((slot & 0x1F) << 3) | (func & 0x7); + return 0; +} + BlockBackend *blk_by_qdev_id(const char *id, Error **errp) { DeviceState *dev; From patchwork Sun Apr 1 09:13:09 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Si-Wei Liu X-Patchwork-Id: 893971 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=oracle.com Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=oracle.com header.i=@oracle.com header.b="kPKAX+u0"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 40DVXw64Lfz9s1t for ; Sun, 1 Apr 2018 19:33:52 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753339AbeDAJdl (ORCPT ); Sun, 1 Apr 2018 05:33:41 -0400 Received: from aserp2120.oracle.com ([141.146.126.78]:33292 "EHLO aserp2120.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753228AbeDAJde (ORCPT ); Sun, 1 Apr 2018 05:33:34 -0400 Received: from pps.filterd (aserp2120.oracle.com [127.0.0.1]) by 
aserp2120.oracle.com (8.16.0.22/8.16.0.22) with SMTP id w319IXEd028572; Sun, 1 Apr 2018 09:33:22 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : subject : date : message-id : in-reply-to : references; s=corp-2017-10-26; bh=77MpWFKN2vj3MN0wh/Z636/HZ7HNY1zY+pjsBITFSxQ=; b=kPKAX+u0Al2z66EY0+3H3Q7P9h/E8LrnPwtIDABLuiB7Opk5V+8Gw+4AuudRWDBQonem o6MeKv9kehFcBzhznwluxvlYJ5Pa4mxzD/AG2DAvku/SPqj51mOeDfScGoOKML3F91l5 YNsmnBmEXGKdbbydYNU8Ap2nxFP9vKN3yL1j/uCZE3j6icH3bELknsBusoZWlNkebZea ZSrzMHPAFDO+0Q2us8XQQ5O9mf3iDEgW3mdqdl4BEmXoOf2rJkrMKj7oNT1XdOUETIFs 4C9yopO2yPkKWlVmDnaO4ukosXA3C9K7lsleRxSRXypzK3wqxyRIjmFV4nkyP269uBic dw== Received: from userv0022.oracle.com (userv0022.oracle.com [156.151.31.74]) by aserp2120.oracle.com with ESMTP id 2h2vp100ur-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Sun, 01 Apr 2018 09:33:22 +0000 Received: from userv0121.oracle.com (userv0121.oracle.com [156.151.31.72]) by userv0022.oracle.com (8.14.4/8.14.4) with ESMTP id w319XLPB025295 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Sun, 1 Apr 2018 09:33:21 GMT Received: from abhmp0016.oracle.com (abhmp0016.oracle.com [141.146.116.22]) by userv0121.oracle.com (8.14.4/8.13.8) with ESMTP id w319XLpB015155; Sun, 1 Apr 2018 09:33:21 GMT Received: from ban25x6uut24.us.oracle.com (/10.153.73.24) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Sun, 01 Apr 2018 02:33:20 -0700 From: Si-Wei Liu To: mst@redhat.com, jiri@resnulli.us, stephen@networkplumber.org, alexander.h.duyck@intel.com, davem@davemloft.net, jesse.brandeburg@intel.com, kubakici@wp.pl, jasowang@redhat.com, sridhar.samudrala@intel.com, netdev@vger.kernel.org, virtualization@lists.linux-foundation.org, virtio-dev@lists.oasis-open.org Subject: [RFC PATCH 2/3] netdev: kernel-only IFF_HIDDEN netdevice Date: Sun, 1 Apr 2018 05:13:09 -0400 Message-Id: <1522573990-5242-3-git-send-email-si-wei.liu@oracle.com> X-Mailer: git-send-email 1.8.3.1 
In-Reply-To: <1522573990-5242-1-git-send-email-si-wei.liu@oracle.com> References: <1522573990-5242-1-git-send-email-si-wei.liu@oracle.com> X-Proofpoint-Virus-Version: vendor=nai engine=5900 definitions=8849 signatures=668697 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 suspectscore=2 malwarescore=0 phishscore=0 bulkscore=0 spamscore=0 mlxscore=0 mlxlogscore=999 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1711220000 definitions=main-1804010099 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org

A hidden netdevice is not visible to userspace, so typical network utilities such as ip and ifconfig cannot sense its existence or configure it. Internally, a hidden netdev may be associated with an upper-level netdev that userspace has access to. Although userspace cannot manipulate the lower netdev directly, the user may control or configure the underlying hidden device through the upper-level netdev.

For identification purposes, the kobject of a hidden netdev is still present in the sysfs hierarchy; however, no uevent message is generated when the sysfs entry is created, modified or destroyed.

To that end, a separate namescope needs to be carved out for IFF_HIDDEN netdevs. As of now, a netdev name that starts with a colon (':') is invalid in userspace, since socket ioctls such as SIOCGIFCONF use ':' as the separator for ifname. Because no userspace-visible name can start with ':', that prefix can serve as the namescope for the kernel-only IFF_HIDDEN netdevs.
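To make the naming rule concrete, here is a user-space sketch of the check that __dev_valid_name() performs in this patch. It is an illustration only and not kernel code: valid_name() is a hypothetical name, IFNAMSIZ is redefined locally with the kernel's value of 16, and the length test uses plain strlen().

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

#define IFNAMSIZ 16 /* local copy of the kernel constant (assumption: 16) */

/*
 * Hypothetical user-space model of __dev_valid_name(): a ':' in the
 * name is accepted only when the caller asks for the hidden,
 * kernel-only name scope.
 */
static bool valid_name(const char *name, bool hidden)
{
    if (*name == '\0')
        return false;
    if (strlen(name) >= IFNAMSIZ)
        return false;
    if (!strcmp(name, ".") || !strcmp(name, ".."))
        return false;

    for (; *name; name++) {
        /* '/' and whitespace are rejected in both name scopes */
        if (*name == '/' || isspace((unsigned char)*name))
            return false;
        /* ':' is reserved for hidden netdevs */
        if (!hidden && *name == ':')
            return false;
    }
    return true;
}
```

Under this model ":eth0" is rejected for an ordinary netdev and accepted for a hidden one, while "eth0" is accepted in both cases.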
Signed-off-by: Si-Wei Liu --- include/linux/netdevice.h | 12 ++ include/net/net_namespace.h | 2 + net/core/dev.c | 281 ++++++++++++++++++++++++++++++++++++++------ net/core/net_namespace.c | 1 + 4 files changed, 263 insertions(+), 33 deletions(-) diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index ef789e1..1a70f3a 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -1380,6 +1380,7 @@ struct net_device_ops { * @IFF_PHONY_HEADROOM: the headroom value is controlled by an external * entity (i.e. the master device for bridged veth) * @IFF_MACSEC: device is a MACsec device + * @IFF_HIDDEN: device is not visible to userspace */ enum netdev_priv_flags { IFF_802_1Q_VLAN = 1<<0, @@ -1410,6 +1411,7 @@ enum netdev_priv_flags { IFF_RXFH_CONFIGURED = 1<<25, IFF_PHONY_HEADROOM = 1<<26, IFF_MACSEC = 1<<27, + IFF_HIDDEN = 1<<28, }; #define IFF_802_1Q_VLAN IFF_802_1Q_VLAN @@ -1439,6 +1441,7 @@ enum netdev_priv_flags { #define IFF_TEAM IFF_TEAM #define IFF_RXFH_CONFIGURED IFF_RXFH_CONFIGURED #define IFF_MACSEC IFF_MACSEC +#define IFF_HIDDEN IFF_HIDDEN /** * struct net_device - The DEVICE structure. 
@@ -1659,6 +1662,7 @@ enum netdev_priv_flags { struct net_device { char name[IFNAMSIZ]; struct hlist_node name_hlist; + struct hlist_node name_cmpl_hlist; struct dev_ifalias __rcu *ifalias; /* * I/O specific fields @@ -1680,6 +1684,7 @@ struct net_device { unsigned long state; struct list_head dev_list; + struct list_head dev_cmpl_list; struct list_head napi_list; struct list_head unreg_list; struct list_head close_list; @@ -2326,6 +2331,7 @@ struct netdev_lag_lower_state_info { #define NETDEV_UDP_TUNNEL_PUSH_INFO 0x001C #define NETDEV_UDP_TUNNEL_DROP_INFO 0x001D #define NETDEV_CHANGE_TX_QUEUE_LEN 0x001E +#define NETDEV_PRE_GETNAME 0x001F int register_netdevice_notifier(struct notifier_block *nb); int unregister_netdevice_notifier(struct notifier_block *nb); @@ -2393,6 +2399,8 @@ static inline void netdev_notifier_info_init(struct netdev_notifier_info *info, for_each_netdev_rcu(&init_net, slave) \ if (netdev_master_upper_dev_get_rcu(slave) == (bond)) #define net_device_entry(lh) list_entry(lh, struct net_device, dev_list) +#define for_each_netdev_complete(net, d) \ + list_for_each_entry(d, &(net)->dev_cmpl_head, dev_cmpl_list) static inline struct net_device *next_net_device(struct net_device *dev) { @@ -2462,6 +2470,10 @@ static inline void unregister_netdevice(struct net_device *dev) unregister_netdevice_queue(dev, NULL); } +void netdev_set_hidden(struct net_device *dev); +int hide_netdevice(struct net_device *dev); +void unhide_netdevice(struct net_device *dev); + int netdev_refcnt_read(const struct net_device *dev); void free_netdev(struct net_device *dev); void netdev_freemem(struct net_device *dev); diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h index 0490084..f9ce9b4 100644 --- a/include/net/net_namespace.h +++ b/include/net/net_namespace.h @@ -80,7 +80,9 @@ struct net { struct sock *genl_sock; struct list_head dev_base_head; + struct list_head dev_cmpl_head; struct hlist_head *dev_name_head; + struct hlist_head *dev_name_cmpl_head; 
struct hlist_head *dev_index_head; unsigned int dev_base_seq; /* protected by rtnl_mutex */ int ifindex; diff --git a/net/core/dev.c b/net/core/dev.c index 613fb40..a991b35 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -211,6 +211,13 @@ static inline struct hlist_head *dev_name_hash(struct net *net, const char *name return &net->dev_name_head[hash_32(hash, NETDEV_HASHBITS)]; } +static inline struct hlist_head *dev_cname_hash(struct net *net, const char *name) +{ + unsigned int hash = full_name_hash(net, name, strnlen(name, IFNAMSIZ)); + + return &net->dev_name_cmpl_head[hash_32(hash, NETDEV_HASHBITS)]; +} + static inline struct hlist_head *dev_index_hash(struct net *net, int ifindex) { return &net->dev_index_head[ifindex & (NETDEV_HASHENTRIES - 1)]; @@ -237,11 +244,19 @@ static void list_netdevice(struct net_device *dev) ASSERT_RTNL(); + write_lock_bh(&dev_base_lock); - list_add_tail_rcu(&dev->dev_list, &net->dev_base_head); - hlist_add_head_rcu(&dev->name_hlist, dev_name_hash(net, dev->name)); - hlist_add_head_rcu(&dev->index_hlist, - dev_index_hash(net, dev->ifindex)); + if (!(dev->priv_flags & IFF_HIDDEN)) { + list_add_tail_rcu(&dev->dev_list, &net->dev_base_head); + hlist_add_head_rcu(&dev->name_hlist, + dev_name_hash(net, dev->name)); + hlist_add_head_rcu(&dev->index_hlist, + dev_index_hash(net, dev->ifindex)); + } + list_add_tail_rcu(&dev->dev_cmpl_list, + &net->dev_cmpl_head); + hlist_add_head_rcu(&dev->name_cmpl_hlist, + dev_cname_hash(net, dev->name)); write_unlock_bh(&dev_base_lock); dev_base_seq_inc(net); @@ -256,9 +271,13 @@ static void unlist_netdevice(struct net_device *dev) /* Unlink dev from the device chain */ write_lock_bh(&dev_base_lock); - list_del_rcu(&dev->dev_list); - hlist_del_rcu(&dev->name_hlist); - hlist_del_rcu(&dev->index_hlist); + if (!(dev->priv_flags & IFF_HIDDEN)) { + list_del_rcu(&dev->dev_list); + hlist_del_rcu(&dev->name_hlist); + hlist_del_rcu(&dev->index_hlist); + } + list_del_rcu(&dev->dev_cmpl_list); + 
hlist_del_rcu(&dev->name_cmpl_hlist); write_unlock_bh(&dev_base_lock); dev_base_seq_inc(dev_net(dev)); @@ -736,11 +755,15 @@ int dev_fill_metadata_dst(struct net_device *dev, struct sk_buff *skb) struct net_device *__dev_get_by_name(struct net *net, const char *name) { struct net_device *dev; - struct hlist_head *head = dev_name_hash(net, name); + struct hlist_head *head = dev_cname_hash(net, name); + bool hidden_name = (*name == ':'); - hlist_for_each_entry(dev, head, name_hlist) + hlist_for_each_entry(dev, head, name_cmpl_hlist) { + if (hidden_name && !(dev->priv_flags & IFF_HIDDEN)) + continue; if (!strncmp(dev->name, name, IFNAMSIZ)) return dev; + } return NULL; } @@ -1015,15 +1038,7 @@ struct net_device *__dev_get_by_flags(struct net *net, unsigned short if_flags, } EXPORT_SYMBOL(__dev_get_by_flags); -/** - * dev_valid_name - check if name is okay for network device - * @name: name string - * - * Network device names need to be valid file names to - * to allow sysfs to work. We also disallow any kind of - * whitespace. - */ -bool dev_valid_name(const char *name) +static bool __dev_valid_name(const char *name, bool hidden) { if (*name == '\0') return false; @@ -1033,12 +1048,27 @@ bool dev_valid_name(const char *name) return false; while (*name) { - if (*name == '/' || *name == ':' || isspace(*name)) + if (*name == '/' || isspace(*name)) + return false; + if (!hidden && *name == ':') return false; name++; } return true; } + +/** + * dev_valid_name - check if name is okay for network device + * @name: name string + * + * Network device names need to be valid file names to + * to allow sysfs to work. We also disallow any kind of + * whitespace. 
+ */ +bool dev_valid_name(const char *name) +{ + return __dev_valid_name(name, false); +} EXPORT_SYMBOL(dev_valid_name); /** @@ -1064,9 +1094,6 @@ static int __dev_alloc_name(struct net *net, const char *name, char *buf) unsigned long *inuse; struct net_device *d; - if (!dev_valid_name(name)) - return -EINVAL; - p = strchr(name, '%'); if (p) { /* @@ -1082,7 +1109,7 @@ static int __dev_alloc_name(struct net *net, const char *name, char *buf) if (!inuse) return -ENOMEM; - for_each_netdev(net, d) { + for_each_netdev_complete(net, d) { if (!sscanf(d->name, name, &i)) continue; if (i < 0 || i >= max_netdevices) @@ -1139,18 +1166,18 @@ static int dev_alloc_name_ns(struct net *net, int dev_alloc_name(struct net_device *dev, const char *name) { + if (!dev_valid_name(name)) + return -EINVAL; + return dev_alloc_name_ns(dev_net(dev), dev, name); } EXPORT_SYMBOL(dev_alloc_name); -int dev_get_valid_name(struct net *net, struct net_device *dev, - const char *name) +static int __dev_get_name(struct net *net, struct net_device *dev, + const char *name) { BUG_ON(!net); - if (!dev_valid_name(name)) - return -EINVAL; - if (strchr(name, '%')) return dev_alloc_name_ns(net, dev, name); else if (__dev_get_by_name(net, name)) @@ -1160,6 +1187,15 @@ int dev_get_valid_name(struct net *net, struct net_device *dev, return 0; } + +int dev_get_valid_name(struct net *net, struct net_device *dev, + const char *name) +{ + if (!__dev_valid_name(name, (dev->priv_flags & IFF_HIDDEN))) + return -EINVAL; + + return __dev_get_name(net, dev, name); +} EXPORT_SYMBOL(dev_get_valid_name); /** @@ -1221,12 +1257,15 @@ int dev_change_name(struct net_device *dev, const char *newname) write_lock_bh(&dev_base_lock); hlist_del_rcu(&dev->name_hlist); + hlist_del_rcu(&dev->name_cmpl_hlist); write_unlock_bh(&dev_base_lock); synchronize_rcu(); write_lock_bh(&dev_base_lock); hlist_add_head_rcu(&dev->name_hlist, dev_name_hash(net, dev->name)); + hlist_add_head_rcu(&dev->name_cmpl_hlist, + dev_cname_hash(net, 
dev->name)); write_unlock_bh(&dev_base_lock); ret = call_netdevice_notifiers(NETDEV_CHANGENAME, dev); @@ -1594,7 +1633,7 @@ int register_netdevice_notifier(struct notifier_block *nb) if (dev_boot_phase) goto unlock; for_each_net(net) { - for_each_netdev(net, dev) { + for_each_netdev_complete(net, dev) { err = call_netdevice_notifier(nb, NETDEV_REGISTER, dev); err = notifier_to_errno(err); if (err) @@ -1614,7 +1653,7 @@ int register_netdevice_notifier(struct notifier_block *nb) rollback: last = dev; for_each_net(net) { - for_each_netdev(net, dev) { + for_each_netdev_complete(net, dev) { if (dev == last) goto outroll; @@ -1659,7 +1698,7 @@ int unregister_netdevice_notifier(struct notifier_block *nb) goto unlock; for_each_net(net) { - for_each_netdev(net, dev) { + for_each_netdev_complete(net, dev) { if (dev->flags & IFF_UP) { call_netdevice_notifier(nb, NETDEV_GOING_DOWN, dev); @@ -7642,6 +7681,11 @@ int register_netdevice(struct net_device *dev) spin_lock_init(&dev->addr_list_lock); netdev_set_addr_lockdep_class(dev); + ret = call_netdevice_notifiers(NETDEV_PRE_GETNAME, dev); + ret = notifier_to_errno(ret); + if (ret) + goto out; + ret = dev_get_valid_name(net, dev, dev->name); if (ret < 0) goto out; @@ -8461,6 +8505,166 @@ int dev_change_net_namespace(struct net_device *dev, struct net *net, const char } EXPORT_SYMBOL_GPL(dev_change_net_namespace); +/** + * netdev_set_hidden - indicate a hidden netdev before or at + * early point of driver registration + * @dev: device + * + * Callers must hold the rtnl semaphore, typically before or + * at some early point (e.g. in NETDEV_PRE_GETNAME notifier) + * of driver registration, or it won't take effect to hide + * the netdev post registration. 
+ */ +void netdev_set_hidden(struct net_device *dev) +{ + dev->priv_flags |= IFF_HIDDEN; + strlcpy(dev->name, ":eth%d", IFNAMSIZ); +} +EXPORT_SYMBOL(netdev_set_hidden); + +/** + * hide_netdevice - hide device from userspace's visibility + * @dev: device + * + * This function shuts down a device interface and removes it + * from all userspace visible dev lists, and moves it to + * comprehensive dev lists containing both userspace-visible + * and kernel-only devices. On success 0 is returned, on + * a failure a negative errno code is returned. + */ +int hide_netdevice(struct net_device *dev) +{ + int err; + + rtnl_lock(); + + err = 0; + /* Get out if there is nothing to do */ + if (dev->priv_flags & IFF_HIDDEN) + goto out; + + err = -EINVAL; + /* Ensure the device has been registered */ + if (dev->reg_state != NETREG_REGISTERED) + goto out; + + err = __dev_get_name(dev_net(dev), dev, ":eth%d"); + if (err < 0) + goto out; + + /* + * And now a mini version of register_netdevice unregister_netdevice. + */ + + /* If device is running close it first. */ + dev_close(dev); + + /* And unlink it from device chain */ + unlist_netdevice(dev); + synchronize_net(); + + /* Shutdown queueing discipline. */ + dev_shutdown(dev); + + /* Notify protocols, that we are about to destroy + * this device. They should clean all the things. + * + * Note that dev->reg_state stays at NETREG_REGISTERED. + * This is wanted because this way 8021q and macvlan know + * the device is just moving and can keep their slaves up. 
+ */ + call_netdevice_notifiers(NETDEV_UNREGISTER, dev); + rcu_barrier(); + call_netdevice_notifiers(NETDEV_UNREGISTER_FINAL, dev); + rtmsg_ifinfo(RTM_DELLINK, dev, ~0U, GFP_KERNEL); + + /* + * Flush the unicast and multicast chains + */ + dev_uc_flush(dev); + dev_mc_flush(dev); + + /* Send a netdev-removed uevent to the old namespace */ + kobject_uevent(&dev->dev.kobj, KOBJ_REMOVE); + netdev_adjacent_del_links(dev); + + /* Fixup kobjects */ + err = device_rename(&dev->dev, dev->name); + WARN_ON(err); + + dev->priv_flags |= IFF_HIDDEN; + list_netdevice(dev); + + /* Notify protocols, that a new device appeared. */ + call_netdevice_notifiers(NETDEV_REGISTER, dev); + + synchronize_net(); + err = 0; +out: + rtnl_unlock(); + return err; +} +EXPORT_SYMBOL(hide_netdevice); + +/** + * unhide_netdevice - make a hidden device visible to userspace + * @dev: device + * + * This function moves a hidden device to userspace visible + * interfaces. A %NETDEV_REGISTER message will be sent to + * the netdev notifier chain. + */ +void unhide_netdevice(struct net_device *dev) +{ + int err; + + rtnl_lock(); + /* Get out if there is nothing to do */ + if (!(dev->priv_flags & IFF_HIDDEN)) + goto out; + + /* Ensure the device has been registered */ + if (dev->reg_state != NETREG_REGISTERED) + goto out; + + err = __dev_get_name(dev_net(dev), dev, "eth%d"); + WARN_ON(err < 0); + + /* If device is running close it first. */ + dev_close(dev); + unlist_netdevice(dev); + synchronize_net(); + + /* Shutdown queueing discipline. 
*/ + dev_shutdown(dev); + + call_netdevice_notifiers(NETDEV_UNREGISTER, dev); + rcu_barrier(); + call_netdevice_notifiers(NETDEV_UNREGISTER_FINAL, dev); + dev_uc_flush(dev); + dev_mc_flush(dev); + + /* Send a netdev-add uevent to the new namespace */ + kobject_uevent(&dev->dev.kobj, KOBJ_ADD); + netdev_adjacent_add_links(dev); + + /* Fixup kobjects */ + err = device_rename(&dev->dev, dev->name); + WARN_ON(err); + + /* Add the device back in the hashes */ + dev->priv_flags &= ~IFF_HIDDEN; + list_netdevice(dev); + + call_netdevice_notifiers(NETDEV_REGISTER, dev); + + rtmsg_ifinfo(RTM_NEWLINK, dev, ~0U, GFP_KERNEL); + synchronize_net(); +out: + rtnl_unlock(); +} +EXPORT_SYMBOL(unhide_netdevice); + static int dev_cpu_dead(unsigned int oldcpu) { struct sk_buff **list_skb; @@ -8571,13 +8775,19 @@ static struct hlist_head * __net_init netdev_create_hash(void) /* Initialize per network namespace state */ static int __net_init netdev_init(struct net *net) { - if (net != &init_net) + if (net != &init_net) { INIT_LIST_HEAD(&net->dev_base_head); + INIT_LIST_HEAD(&net->dev_cmpl_head); + } net->dev_name_head = netdev_create_hash(); if (net->dev_name_head == NULL) goto err_name; + net->dev_name_cmpl_head = netdev_create_hash(); + if (net->dev_name_cmpl_head == NULL) + goto err_cname; + net->dev_index_head = netdev_create_hash(); if (net->dev_index_head == NULL) goto err_idx; @@ -8585,6 +8795,8 @@ static int __net_init netdev_init(struct net *net) return 0; err_idx: + kfree(net->dev_name_cmpl_head); +err_cname: kfree(net->dev_name_head); err_name: return -ENOMEM; @@ -8676,9 +8888,12 @@ void func(const struct net_device *dev, const char *fmt, ...) 
\ static void __net_exit netdev_exit(struct net *net) { kfree(net->dev_name_head); + kfree(net->dev_name_cmpl_head); kfree(net->dev_index_head); - if (net != &init_net) + if (net != &init_net) { WARN_ON_ONCE(!list_empty(&net->dev_base_head)); + WARN_ON_ONCE(!list_empty(&net->dev_cmpl_head)); + } } static struct pernet_operations __net_initdata netdev_net_ops = { diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c index 60a71be..1c399e9 100644 --- a/net/core/net_namespace.c +++ b/net/core/net_namespace.c @@ -37,6 +37,7 @@ struct net init_net = { .count = ATOMIC_INIT(1), .dev_base_head = LIST_HEAD_INIT(init_net.dev_base_head), + .dev_cmpl_head = LIST_HEAD_INIT(init_net.dev_cmpl_head), }; EXPORT_SYMBOL(init_net); From patchwork Sun Apr 1 09:13:10 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Si-Wei Liu X-Patchwork-Id: 893972 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=oracle.com Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=oracle.com header.i=@oracle.com header.b="QrUNoTym"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 40DVXy2L4Bz9s15 for ; Sun, 1 Apr 2018 19:33:54 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753311AbeDAJdj (ORCPT ); Sun, 1 Apr 2018 05:33:39 -0400 Received: from aserp2120.oracle.com ([141.146.126.78]:33296 "EHLO aserp2120.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753257AbeDAJde (ORCPT ); Sun, 1 Apr 2018 
05:33:34 -0400 Received: from pps.filterd (aserp2120.oracle.com [127.0.0.1]) by aserp2120.oracle.com (8.16.0.22/8.16.0.22) with SMTP id w319Hw7d027680; Sun, 1 Apr 2018 09:33:24 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : subject : date : message-id : in-reply-to : references; s=corp-2017-10-26; bh=HubITZQ2EsB2AwbymSVZ9lxlZb2evppwzB1Fz4lgEfg=; b=QrUNoTym4sJvF4oGu6H0INKrVVWEwhhozFIL3oMqQ1ksSbdlYRi8P6GZ+S99r/ir4YXM RxQlv8XY3jhbZU9rBYKbpVxtBR09XQmHYnzMaIM8Gz5+sv2R/7HyL3uupDqUck0IQr/x 2Mqb3Y1I3VWMOAHvTna7x0uCGZGqXadrAf79RBO9H3TsUsX8wJoSuAuSGqZ69fyvTGmi dZM6j2lmCgb0h2mjTFfz4M76rjD1M0maoHukCCXMtE3DcohXHlgCp2enZcgsFKYgt0k4 +9m6iy4pmofg416wEwBNasyfViH4Lelthfl/lQl4qveETaGnB5F5i5RxA4xozO4P4at/ YQ== Received: from aserv0022.oracle.com (aserv0022.oracle.com [141.146.126.234]) by aserp2120.oracle.com with ESMTP id 2h2vp100v0-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Sun, 01 Apr 2018 09:33:24 +0000 Received: from aserv0121.oracle.com (aserv0121.oracle.com [141.146.126.235]) by aserv0022.oracle.com (8.14.4/8.14.4) with ESMTP id w319XO2V008887 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Sun, 1 Apr 2018 09:33:24 GMT Received: from abhmp0016.oracle.com (abhmp0016.oracle.com [141.146.116.22]) by aserv0121.oracle.com (8.14.4/8.13.8) with ESMTP id w319XLpR008776; Sun, 1 Apr 2018 09:33:21 GMT Received: from ban25x6uut24.us.oracle.com (/10.153.73.24) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Sun, 01 Apr 2018 02:33:21 -0700 From: Si-Wei Liu To: mst@redhat.com, jiri@resnulli.us, stephen@networkplumber.org, alexander.h.duyck@intel.com, davem@davemloft.net, jesse.brandeburg@intel.com, kubakici@wp.pl, jasowang@redhat.com, sridhar.samudrala@intel.com, netdev@vger.kernel.org, virtualization@lists.linux-foundation.org, virtio-dev@lists.oasis-open.org Subject: [RFC PATCH 3/3] virtio_net: make lower netdevs for virtio_bypass hidden Date: Sun, 1 Apr 2018 05:13:10 -0400 Message-Id: 
<1522573990-5242-4-git-send-email-si-wei.liu@oracle.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1522573990-5242-1-git-send-email-si-wei.liu@oracle.com> References: <1522573990-5242-1-git-send-email-si-wei.liu@oracle.com> X-Proofpoint-Virus-Version: vendor=nai engine=5900 definitions=8849 signatures=668697 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 suspectscore=2 malwarescore=0 phishscore=0 bulkscore=0 spamscore=0 mlxscore=0 mlxlogscore=999 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1711220000 definitions=main-1804010099 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org

We should move virtio_bypass to a 1-upper-with-2-hidden-lower driver model for greater compatibility with regard to preserving the userspace API and ABI.

In addition, virtio_bypass should perform a stricter check before automatically enslaving the corresponding virtual function or passthrough device. It is more reasonable to pair a virtio_bypass instance with a VF or passthrough device 1:1 than to rely on searching for an arbitrary non-virtio netdev that happens to have the same MAC address. One possible way of doing this is to bind virtio_bypass explicitly to a guest PCI device by specifying its <bus> and <slot>:<function> location. Change the BACKUP feature to take these settings into account, so that verifying the target device for auto-enslavement no longer relies on the MAC address alone.
Signed-off-by: Si-Wei Liu --- drivers/net/virtio_net.c | 159 ++++++++++++++++++++++++++++++++++++---- include/uapi/linux/virtio_net.h | 2 + 2 files changed, 148 insertions(+), 13 deletions(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index f850cf6..c54a5bd 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -77,6 +77,8 @@ struct virtnet_stats { u64 rx_packets; }; +static struct workqueue_struct *virtnet_bypass_wq; + /* Internal representation of a send virtqueue */ struct send_queue { /* Virtqueue associated with this send _queue */ @@ -196,6 +198,13 @@ struct padded_vnet_hdr { char padding[4]; }; +struct virtnet_bypass_task { + struct work_struct work; + unsigned long event; + struct net_device *child_netdev; + struct net_device *bypass_netdev; +}; + /* Converting between virtqueue no. and kernel tx/rx queue no. * 0:rx0 1:tx0 2:rx1 3:tx1 ... 2N:rxN 2N+1:txN 2N+2:cvq */ @@ -2557,6 +2566,11 @@ struct virtnet_bypass_info { /* spinlock while updating stats */ spinlock_t stats_lock; + + int bus; + int slot; + int function; + }; static void virtnet_bypass_child_open(struct net_device *dev, @@ -2822,10 +2836,13 @@ static void virtnet_bypass_ethtool_get_drvinfo(struct net_device *dev, .get_link_ksettings = virtnet_bypass_ethtool_get_link_ksettings, }; -static struct net_device *get_virtnet_bypass_bymac(struct net *net, - const u8 *mac) +static struct net_device * +get_virtnet_bypass_bymac(struct net_device *child_netdev) { + struct net *net = dev_net(child_netdev); struct net_device *dev; + struct virtnet_bypass_info *vbi; + int devfn; ASSERT_RTNL(); @@ -2833,7 +2850,29 @@ static struct net_device *get_virtnet_bypass_bymac(struct net *net, if (dev->netdev_ops != &virtnet_bypass_netdev_ops) continue; /* not a virtnet_bypass device */ - if (ether_addr_equal(mac, dev->perm_addr)) + if (!ether_addr_equal(child_netdev->dev_addr, dev->perm_addr)) + continue; /* not matching MAC address */ + + if (!child_netdev->dev.parent) + continue; 
+ + /* Is child_netdev a backup netdev? */ + if (child_netdev->dev.parent == dev->dev.parent) + return dev; + + /* Avoid non pci devices as active netdev */ + if (!dev_is_pci(child_netdev->dev.parent)) + continue; + + vbi = netdev_priv(dev); + devfn = PCI_DEVFN(vbi->slot, vbi->function); + + netdev_info(dev, "bus %d slot %d func %d\n", + vbi->bus, vbi->slot, vbi->function); + + /* Need to match <bus>:<slot>.<function> */ + if (pci_get_bus_and_slot(vbi->bus, devfn) == + to_pci_dev(child_netdev->dev.parent)) return dev; } @@ -2878,10 +2917,61 @@ static rx_handler_result_t virtnet_bypass_handle_frame(struct sk_buff **pskb) return RX_HANDLER_ANOTHER; } +static int virtnet_bypass_pregetname_child(struct net_device *child_netdev) +{ + struct net_device *dev; + + if (child_netdev->addr_len != ETH_ALEN) + return NOTIFY_DONE; + + /* We will use the MAC address to locate the virtnet_bypass netdev + * to associate with the child netdev. If we don't find a matching + * bypass netdev, move on. + */ + dev = get_virtnet_bypass_bymac(child_netdev); + if (!dev) + return NOTIFY_DONE; + + /* Hide only the active (non-backup) child netdev */ + if (child_netdev->dev.parent && + child_netdev->dev.parent != dev->dev.parent) + netdev_set_hidden(child_netdev); + + return NOTIFY_OK; +} + +static void virtnet_bypass_task_fn(struct work_struct *work) +{ + struct virtnet_bypass_task *task; + struct net_device *child_netdev; + int rc; + + task = container_of(work, struct virtnet_bypass_task, work); + child_netdev = task->child_netdev; + + switch (task->event) { + case NETDEV_REGISTER: + rc = hide_netdevice(child_netdev); + if (rc) + netdev_err(child_netdev, + "hide netdev %s failed with error %#x\n", + child_netdev->name, rc); + + break; + case NETDEV_UNREGISTER: + unhide_netdevice(child_netdev); + break; + default: + break; + } + dev_put(child_netdev); + kfree(task); +} + static int virtnet_bypass_register_child(struct net_device *child_netdev) { struct virtnet_bypass_info *vbi; struct net_device *dev; + struct virtnet_bypass_task *task; bool backup; int ret; @@
-2892,25 +2982,34 @@ static int virtnet_bypass_register_child(struct net_device *child_netdev) * to associate with the child netdev. If we don't find a matching * bypass netdev, move on. */ - dev = get_virtnet_bypass_bymac(dev_net(child_netdev), - child_netdev->perm_addr); + dev = get_virtnet_bypass_bymac(child_netdev); if (!dev) return NOTIFY_DONE; vbi = netdev_priv(dev); backup = (child_netdev->dev.parent == dev->dev.parent); if (backup ? rtnl_dereference(vbi->backup_netdev) : - rtnl_dereference(vbi->active_netdev)) { + rtnl_dereference(vbi->active_netdev)) { netdev_info(dev, "%s attempting to join bypass dev when %s already present\n", child_netdev->name, backup ? "backup" : "active"); return NOTIFY_DONE; } - /* Avoid non pci devices as active netdev */ - if (!backup && (!child_netdev->dev.parent || - !dev_is_pci(child_netdev->dev.parent))) - return NOTIFY_DONE; + /* Verify <bus>:<slot>.<function> info */ + if (!backup && !(child_netdev->priv_flags & IFF_HIDDEN)) { + task = kzalloc(sizeof(*task), GFP_ATOMIC); + if (!task) + return NOTIFY_DONE; + task->event = NETDEV_REGISTER; + task->bypass_netdev = dev; + task->child_netdev = child_netdev; + INIT_WORK(&task->work, virtnet_bypass_task_fn); + queue_work(virtnet_bypass_wq, &task->work); + dev_hold(child_netdev); + + return NOTIFY_OK; + } ret = netdev_rx_handler_register(child_netdev, virtnet_bypass_handle_frame, dev); @@ -2981,6 +3080,7 @@ static int virtnet_bypass_unregister_child(struct net_device *child_netdev) { struct virtnet_bypass_info *vbi; struct net_device *dev, *backup; + struct virtnet_bypass_task *task; dev = get_virtnet_bypass_byref(child_netdev); if (!dev) @@ -3003,6 +3103,16 @@ static int virtnet_bypass_unregister_child(struct net_device *child_netdev) dev->min_mtu = backup->min_mtu; dev->max_mtu = backup->max_mtu; } + + task = kzalloc(sizeof(*task), GFP_ATOMIC); + if (task) { + task->event = NETDEV_UNREGISTER; + task->bypass_netdev = dev; + task->child_netdev = child_netdev; + INIT_WORK(&task->work,
virtnet_bypass_task_fn); + queue_work(virtnet_bypass_wq, &task->work); + dev_hold(child_netdev); + } } dev_put(child_netdev); @@ -3059,6 +3169,8 @@ static int virtnet_bypass_event(struct notifier_block *this, return NOTIFY_DONE; switch (event) { + case NETDEV_PRE_GETNAME: + return virtnet_bypass_pregetname_child(event_dev); case NETDEV_REGISTER: return virtnet_bypass_register_child(event_dev); case NETDEV_UNREGISTER: @@ -3076,11 +3188,12 @@ static int virtnet_bypass_event(struct notifier_block *this, .notifier_call = virtnet_bypass_event, }; -static int virtnet_bypass_create(struct virtnet_info *vi) +static int virtnet_bypass_create(struct virtnet_info *vi, int bsf) { struct net_device *backup_netdev = vi->dev; struct device *dev = &vi->vdev->dev; struct net_device *bypass_netdev; + struct virtnet_bypass_info *vbi; int res; /* Alloc at least 2 queues, for now we are going with 16 assuming @@ -3095,6 +3208,11 @@ static int virtnet_bypass_create(struct virtnet_info *vi) dev_net_set(bypass_netdev, dev_net(backup_netdev)); SET_NETDEV_DEV(bypass_netdev, dev); + vbi = netdev_priv(bypass_netdev); + + vbi->bus = (bsf >> 8) & 0xFF; + vbi->slot = (bsf >> 3) & 0x1F; + vbi->function = bsf & 0x7; bypass_netdev->netdev_ops = &virtnet_bypass_netdev_ops; bypass_netdev->ethtool_ops = &virtnet_bypass_ethtool_ops; @@ -3183,7 +3301,7 @@ static int virtnet_probe(struct virtio_device *vdev) struct net_device *dev; struct virtnet_info *vi; u16 max_queue_pairs; - int mtu; + int mtu, bsf; /* Find if host supports multiqueue virtio_net device */ err = virtio_cread_feature(vdev, VIRTIO_NET_F_MQ, @@ -3339,8 +3457,12 @@ static int virtnet_probe(struct virtio_device *vdev) virtnet_init_settings(dev); if (virtio_has_feature(vdev, VIRTIO_NET_F_BACKUP)) { - if (virtnet_bypass_create(vi) != 0) + bsf = virtio_cread16(vdev, + offsetof(struct virtio_net_config, + bsf2backup)); + if (virtnet_bypass_create(vi, bsf) != 0) goto free_vqs; + netdev_set_hidden(dev); } err = register_netdev(dev); @@ -3384,6 
+3506,7 @@ static int virtnet_probe(struct virtio_device *vdev) unregister_netdev(dev); free_bypass: virtnet_bypass_destroy(vi); + free_vqs: cancel_delayed_work_sync(&vi->refill); free_receive_page_frags(vi); @@ -3517,6 +3640,12 @@ static __init int virtio_net_driver_init(void) if (ret) goto err_dead; + virtnet_bypass_wq = create_singlethread_workqueue("virtio_bypass"); + if (!virtnet_bypass_wq) { + ret = -ENOMEM; + goto err_wq; + } + ret = register_virtio_driver(&virtio_net_driver); if (ret) goto err_virtio; @@ -3524,6 +3653,8 @@ static __init int virtio_net_driver_init(void) register_netdevice_notifier(&virtnet_bypass_notifier); return 0; err_virtio: + destroy_workqueue(virtnet_bypass_wq); +err_wq: cpuhp_remove_multi_state(CPUHP_VIRT_NET_DEAD); err_dead: cpuhp_remove_multi_state(virtionet_online); @@ -3535,6 +3666,8 @@ static __init int virtio_net_driver_init(void) static __exit void virtio_net_driver_exit(void) { unregister_netdevice_notifier(&virtnet_bypass_notifier); + if (virtnet_bypass_wq) + destroy_workqueue(virtnet_bypass_wq); unregister_virtio_driver(&virtio_net_driver); cpuhp_remove_multi_state(CPUHP_VIRT_NET_DEAD); cpuhp_remove_multi_state(virtionet_online); diff --git a/include/uapi/linux/virtio_net.h b/include/uapi/linux/virtio_net.h index aa40664..0827b7e 100644 --- a/include/uapi/linux/virtio_net.h +++ b/include/uapi/linux/virtio_net.h @@ -80,6 +80,8 @@ struct virtio_net_config { __u16 max_virtqueue_pairs; /* Default maximum transmit unit advice */ __u16 mtu; + /* Device at bus:slot.function backed up by virtio_net */ + __u16 bsf2backup; } __attribute__((packed)); /*