From patchwork Thu Apr 5 21:08:19 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Samudrala, Sridhar" X-Patchwork-Id: 895541 Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=intel.com Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 40HFmn0bwZz9s1B for ; Fri, 6 Apr 2018 07:08:41 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751494AbeDEVIZ (ORCPT ); Thu, 5 Apr 2018 17:08:25 -0400 Received: from mga04.intel.com ([192.55.52.120]:19399 "EHLO mga04.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750726AbeDEVIY (ORCPT ); Thu, 5 Apr 2018 17:08:24 -0400 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by fmsmga104.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 05 Apr 2018 14:08:23 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.48,412,1517904000"; d="scan'208";a="29876006" Received: from arch-p28.jf.intel.com ([10.166.187.31]) by fmsmga007.fm.intel.com with ESMTP; 05 Apr 2018 14:08:23 -0700 From: Sridhar Samudrala To: mst@redhat.com, stephen@networkplumber.org, davem@davemloft.net, netdev@vger.kernel.org, virtualization@lists.linux-foundation.org, virtio-dev@lists.oasis-open.org, jesse.brandeburg@intel.com, alexander.h.duyck@intel.com, kubakici@wp.pl, sridhar.samudrala@intel.com, jasowang@redhat.com, loseweigh@gmail.com, jiri@resnulli.us Subject: [RFC PATCH net-next v5 0/4] Enable virtio_net to act as a backup for a passthru device Date: Thu, 5 Apr 2018 14:08:19 -0700 Message-Id: <1522962503-3963-1-git-send-email-sridhar.samudrala@intel.com> X-Mailer: git-send-email 1.8.3.1 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org The main motivation for this patch is to enable cloud service providers to provide an accelerated datapath to virtio-net enabled VMs in a transparent manner with no/minimal guest userspace changes. This also enables hypervisor controlled live migration to be supported with VMs that have direct attached SR-IOV VF devices. Patch 1 introduces a new feature bit VIRTIO_NET_F_BACKUP that can be used by hypervisor to indicate that virtio_net interface should act as a backup for another device with the same MAC address. Patch 2 introduces a bypass module that provides a generic interface for paravirtual drivers to listen for netdev register/unregister/link change events from pci ethernet devices with the same MAC and takeover their datapath. The notifier and event handling code is based on the existing netvsc implementation. A paravirtual driver can use this module by registering a set of ops and each instance of the device when it is probed. Patch 3 extends virtio_net to use alternate datapath when available and registered. When BACKUP feature is enabled, virtio_net driver creates an additional 'bypass' netdev that acts as a master device and controls 2 slave devices. The original virtio_net netdev is registered as 'backup' netdev and a passthru/vf device with the same MAC gets registered as 'active' netdev. Both 'bypass' and 'backup' netdevs are associated with the same 'pci' device. The user accesses the network interface via 'bypass' netdev. The 'bypass' netdev chooses 'active' netdev as default for transmits when it is available with link up and running. Patch 4 refactors netvsc to use the registration/notification framework supported by bypass module. As this patch series is initially focusing on usecases where hypervisor fully controls the VM networking and the guest is not expected to directly configure any hardware settings, it doesn't expose all the ndo/ethtool ops that are supported by virtio_net at this time. To support additional usecases, it should be possible to enable additional ops later by caching the state in virtio netdev and replaying when the 'active' netdev gets registered. The hypervisor needs to enable only one datapath at any time so that packets don't get looped back to the VM over the other datapath. When a VF is plugged, the virtio datapath link state can be marked as down. At the time of live migration, the hypervisor needs to unplug the VF device from the guest on the source host and reset the MAC filter of the VF to initiate failover of datapath to virtio before starting the migration. After the migration is completed, the destination hypervisor sets the MAC filter on the VF and plugs it back to the guest to switch over to VF datapath. This patch is based on the discussion initiated by Jesse on this thread. https://marc.info/?l=linux-virtualization&m=151189725224231&w=2 v5 RFC: Based on Jiri's comments, moved the common functionality to a 'bypass' module so that the same notifier and event handlers to handle child register/unregister/link change events can be shared between virtio_net and netvsc. Improved error handling based on Siwei's comments. v4: - Based on the review comments on the v3 version of the RFC patch and Jakub's suggestion for the naming issue with 3 netdev solution, proposed 3 netdev in-driver bonding solution for virtio-net. v3 RFC: - Introduced 3 netdev model and pointed out a couple of issues with that model and proposed 2 netdev model to avoid these issues. - Removed broadcast/multicast optimization and only use virtio as backup path when VF is unplugged. v2 RFC: - Changed VIRTIO_NET_F_MASTER to VIRTIO_NET_F_BACKUP (mst) - made a small change to the virtio-net xmit path to only use VF datapath for unicasts. Broadcasts/multicasts use virtio datapath. This avoids east-west broadcasts to go over the PCI link. - added suppport for the feature bit in qemu Sridhar Samudrala (4): virtio_net: Introduce VIRTIO_NET_F_BACKUP feature bit net: Introduce generic bypass module virtio_net: Extend virtio to use VF datapath when available netvsc: refactor notifier/event handling code to use the bypass framework drivers/net/Kconfig | 1 + drivers/net/hyperv/Kconfig | 1 + drivers/net/hyperv/netvsc_drv.c | 219 ++++---------- drivers/net/virtio_net.c | 614 +++++++++++++++++++++++++++++++++++++++- include/net/bypass.h | 80 ++++++ include/uapi/linux/virtio_net.h | 3 + net/Kconfig | 18 ++ net/core/Makefile | 1 + net/core/bypass.c | 406 ++++++++++++++++++++++++++ 9 files changed, 1184 insertions(+), 159 deletions(-) create mode 100644 include/net/bypass.h create mode 100644 net/core/bypass.c