From patchwork Wed Apr 3 04:52:47 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Si-Wei Liu X-Patchwork-Id: 1075424 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=oracle.com Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=oracle.com header.i=@oracle.com header.b="2tP50hDg"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 44YvVq1X9Pz9sPB for ; Wed, 3 Apr 2019 16:18:26 +1100 (AEDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728558AbfDCFSY (ORCPT ); Wed, 3 Apr 2019 01:18:24 -0400 Received: from aserp2120.oracle.com ([141.146.126.78]:56642 "EHLO aserp2120.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727032AbfDCFSY (ORCPT ); Wed, 3 Apr 2019 01:18:24 -0400 Received: from pps.filterd (aserp2120.oracle.com [127.0.0.1]) by aserp2120.oracle.com (8.16.0.27/8.16.0.27) with SMTP id x335DiTi178060; Wed, 3 Apr 2019 05:18:14 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : cc : subject : date : message-id; s=corp-2018-07-02; bh=xZbJLgiCjKmzPMMyZqzSCpKQwLRJotgEJxDsbUxkABw=; b=2tP50hDgtMv4GvLAgbq69khdgNDQXsebR3gm15j9scpj+ZnovakR9EXzz7Cl55z0ZVty UZfeeDtoWuRY18f+cN4IfQ7c+q24Dnda0c7mH8RLOhdTxU4OOPAj012KBS1xGMZstn3O TPXgkmybi7ukZhgx1sGnjMpFa73LGyxQ7L4e2I9e2km/zKGqpvrYMyG+W50UDT2VCWJO AId1csqjweXzL9fA022Ev0TcimSi2YwwFKQaWK9SmWveVlWPvbLihYmVphoFs+E/GAuj 8caJUCQ5iazvKduWEuD44TuEGckqUzlzAGwGVzGmi3yh+z50theZROImJXEDMg6y+qJ6 rQ== Received: from userp3030.oracle.com (userp3030.oracle.com [156.151.31.80]) by aserp2120.oracle.com with ESMTP id 2rj0dnnux7-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Wed, 03 Apr 2019 05:18:13 +0000 Received: from pps.filterd (userp3030.oracle.com [127.0.0.1]) by userp3030.oracle.com (8.16.0.27/8.16.0.27) with SMTP id x335ID3O036428; Wed, 3 Apr 2019 05:18:13 GMT Received: from userv0122.oracle.com (userv0122.oracle.com [156.151.31.75]) by userp3030.oracle.com with ESMTP id 2rm8f4w2qf-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Wed, 03 Apr 2019 05:18:12 +0000 Received: from abhmp0009.oracle.com (abhmp0009.oracle.com [141.146.116.15]) by userv0122.oracle.com (8.14.4/8.14.4) with ESMTP id x335I8Oi028261; Wed, 3 Apr 2019 05:18:08 GMT Received: from ban25x6uut24.us.oracle.com (/10.153.73.24) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Tue, 02 Apr 2019 22:18:07 -0700 From: Si-Wei Liu To: mst@redhat.com, sridhar.samudrala@intel.com, stephen@networkplumber.org, davem@davemloft.net, kubakici@wp.pl, alexander.duyck@gmail.com, jiri@resnulli.us, netdev@vger.kernel.org, virtualization@lists.linux-foundation.org Cc: liran.alon@oracle.com, boris.ostrovsky@oracle.com, vijay.balakrishna@oracle.com, si-wei liu Subject: [PATCH net v6] failover: allow name change on IFF_UP slave interfaces Date: Wed, 3 Apr 2019 00:52:47 -0400 Message-Id: <1554267167-17561-1-git-send-email-si-wei.liu@oracle.com> X-Mailer: git-send-email 1.8.3.1 X-Proofpoint-Virus-Version: vendor=nai engine=5900 definitions=9215 signatures=668685 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 suspectscore=4 malwarescore=0 phishscore=0 bulkscore=0 spamscore=0 mlxscore=0 mlxlogscore=999 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1810050000 definitions=main-1904030035 X-Proofpoint-Virus-Version: vendor=nai engine=5900 definitions=9215 signatures=668685 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 priorityscore=1501 malwarescore=0 suspectscore=4 phishscore=0 bulkscore=0 spamscore=0 clxscore=1015 lowpriorityscore=0 mlxscore=0 impostorscore=0 mlxlogscore=999 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1810050000 definitions=main-1904030034 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org When a netdev appears through hot plug then gets enslaved by a failover master that is already up and running, the slave will be opened right away after getting enslaved. Today there's a race that userspace (udev) may fail to rename the slave if the kernel (net_failover) opens the slave earlier than when the userspace rename happens. Unlike bond or team, the primary slave of failover can't be renamed by userspace ahead of time, since the kernel initiated auto-enslavement is unable to, or rather, is never meant to be synchronized with the rename request from userspace. As the failover slave interfaces are not designed to be operated directly by userspace apps: IP configuration, filter rules with regard to network traffic passing and etc., should all be done on master interface. In general, userspace apps only care about the name of master interface, while slave names are less important as long as admin users can see reliable names that may carry other information describing the netdev. For e.g., they can infer that "ens3nsby" is a standby slave of "ens3", while for a name like "eth0" they can't tell which master it belongs to. Historically the name of IFF_UP interface can't be changed because there might be admin script or management software that is already relying on such behavior and assumes that the slave name can't be changed once UP. But failover is special: with the in-kernel auto-enslavement mechanism, the userspace expectation for device enumeration and bring-up order is already broken. Previously initramfs and various userspace config tools were modified to bypass failover slaves because of auto-enslavement and duplicate MAC address. Similarly, in case that users care about seeing reliable slave name, the new type of failover slaves needs to be taken care of specifically in userspace anyway. It's less risky to lift up the rename restriction on failover slave which is already UP. Although it's possible this change may potentially break userspace component (most likely configuration scripts or management software) that assumes slave name can't be changed while UP, it's relatively a limited and controllable set among all userspace components, which can be fixed specifically to listen for the rename and/or link down/up events on failover slaves. Userspace component interacting with slaves is expected to be changed to operate on failover master interface instead, as the failover slave is dynamic in nature which may come and go at any point. The goal is to make the role of failover slaves less relevant, and userspace components should only deal with failover master in the long run. Fixes: 30c8bd5aa8b2 ("net: Introduce generic failover module") Signed-off-by: Si-Wei Liu Reviewed-by: Liran Alon Acked-by: Michael S. Tsirkin --- v1 -> v2: - Drop configurable module parameter (Sridhar) v2 -> v3: - Drop additional IFF_SLAVE_RENAME_OK flag (Sridhar) - Send down and up events around rename (Michael S. Tsirkin) v3 -> v4: - Simplify notification to be sent (Stephen Hemminger) v4 -> v5: - Sync up code with latest net-next (Sridhar) - Use proper structure initialization (Stephen, Jiri) v5 -> v6: - Make the property of live name change a generic flag (Stephen) --- include/linux/netdevice.h | 3 +++ net/core/dev.c | 25 ++++++++++++++++++++++++- net/core/failover.c | 6 +++--- 3 files changed, 30 insertions(+), 4 deletions(-) diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 78f5ec4e..ea9a63f 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -1498,6 +1498,7 @@ struct net_device_ops { * @IFF_FAILOVER: device is a failover master device * @IFF_FAILOVER_SLAVE: device is lower dev of a failover master device * @IFF_L3MDEV_RX_HANDLER: only invoke the rx handler of L3 master device + * @IFF_LIVE_NAME_CHANGE: rename is allowed while device is running */ enum netdev_priv_flags { IFF_802_1Q_VLAN = 1<<0, @@ -1530,6 +1531,7 @@ enum netdev_priv_flags { IFF_FAILOVER = 1<<27, IFF_FAILOVER_SLAVE = 1<<28, IFF_L3MDEV_RX_HANDLER = 1<<29, + IFF_LIVE_NAME_CHANGE = 1<<30, }; #define IFF_802_1Q_VLAN IFF_802_1Q_VLAN @@ -1561,6 +1563,7 @@ enum netdev_priv_flags { #define IFF_FAILOVER IFF_FAILOVER #define IFF_FAILOVER_SLAVE IFF_FAILOVER_SLAVE #define IFF_L3MDEV_RX_HANDLER IFF_L3MDEV_RX_HANDLER +#define IFF_LIVE_NAME_CHANGE IFF_LIVE_NAME_CHANGE /** * struct net_device - The DEVICE structure. diff --git a/net/core/dev.c b/net/core/dev.c index 9823b77..48341d5 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -1185,7 +1185,21 @@ int dev_change_name(struct net_device *dev, const char *newname) BUG_ON(!dev_net(dev)); net = dev_net(dev); - if (dev->flags & IFF_UP) + + /* Some auto-enslaved devices e.g. failover slaves are + * special, as userspace might rename the device after + * the interface had been brought up and running since + * the point kernel initiated auto-enslavement. Allow + * live name change even when these slave devices are + * up and running. + * + * Typically, users of these auto-enslaving devices + * don't actually care about slave name change, as + * they are supposed to operate on master interface + * directly. + */ + if (dev->flags & IFF_UP && + likely(!(dev->priv_flags & IFF_LIVE_NAME_CHANGE))) return -EBUSY; write_seqcount_begin(&devnet_rename_seq); @@ -1232,6 +1246,15 @@ int dev_change_name(struct net_device *dev, const char *newname) hlist_add_head_rcu(&dev->name_hlist, dev_name_hash(net, dev->name)); write_unlock_bh(&dev_base_lock); + if (unlikely(dev->flags & IFF_UP)) { + struct netdev_notifier_change_info change_info = { + .info.dev = dev, + }; + + call_netdevice_notifiers_info(NETDEV_CHANGE, + &change_info.info); + } + ret = call_netdevice_notifiers(NETDEV_CHANGENAME, dev); ret = notifier_to_errno(ret); diff --git a/net/core/failover.c b/net/core/failover.c index 4a92a98..b5cd3c7 100644 --- a/net/core/failover.c +++ b/net/core/failover.c @@ -80,14 +80,14 @@ static int failover_slave_register(struct net_device *slave_dev) goto err_upper_link; } - slave_dev->priv_flags |= IFF_FAILOVER_SLAVE; + slave_dev->priv_flags |= (IFF_FAILOVER_SLAVE | IFF_LIVE_NAME_CHANGE); if (fops && fops->slave_register && !fops->slave_register(slave_dev, failover_dev)) return NOTIFY_OK; netdev_upper_dev_unlink(slave_dev, failover_dev); - slave_dev->priv_flags &= ~IFF_FAILOVER_SLAVE; + slave_dev->priv_flags &= ~(IFF_FAILOVER_SLAVE | IFF_LIVE_NAME_CHANGE); err_upper_link: netdev_rx_handler_unregister(slave_dev); done: @@ -121,7 +121,7 @@ int failover_slave_unregister(struct net_device *slave_dev) netdev_rx_handler_unregister(slave_dev); netdev_upper_dev_unlink(slave_dev, failover_dev); - slave_dev->priv_flags &= ~IFF_FAILOVER_SLAVE; + slave_dev->priv_flags &= ~(IFF_FAILOVER_SLAVE | IFF_LIVE_NAME_CHANGE); if (fops && fops->slave_unregister && !fops->slave_unregister(slave_dev, failover_dev))