From: Si-Wei Liu
To: "Michael S. Tsirkin", Sridhar Samudrala, Stephen Hemminger, Jakub Kicinski, Jiri Pirko, David Miller, Netdev, virtualization@lists.linux-foundation.org
Subject: [RFC PATCH net-next] failover: allow name change on IFF_UP slave interfaces
Date: Mon, 4 Mar 2019 19:36:38 -0500
Message-Id: <1551746198-11143-1-git-send-email-si-wei.liu@oracle.com>

When a netdev appears through hot plug and then gets enslaved by a failover master that is already up and running, the slave is opened right after it is enslaved.
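Once the slave has been opened it carries IFF_UP, and on kernels that enforce the IFF_UP restriction any userspace rename request that reaches dev_change_name() in net/core/dev.c is refused with -EBUSY. As a rough illustration of the userspace side, the sketch below issues such a request through the legacy SIOCSIFNAME ioctl. This is an assumption about how a tool might do it, not a description of udev: udev may instead go through rtnetlink (RTM_SETLINK with IFLA_IFNAME), which ends up in the same kernel check. The helper name rename_netdev() is hypothetical, and the call needs CAP_NET_ADMIN to succeed.

```c
#define _DEFAULT_SOURCE /* expose struct ifreq / ifr_newname in glibc */
#include <assert.h>
#include <errno.h>
#include <net/if.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: ask the kernel to rename interface `oldname` to
 * `newname` with SIOCSIFNAME. Returns 0 on success or a negative errno:
 * -EINVAL if a name does not fit in IFNAMSIZ (checked locally),
 * -EPERM without CAP_NET_ADMIN, -ENODEV if `oldname` does not exist,
 * -EBUSY if the device is UP and the kernel refuses the rename.
 */
int rename_netdev(const char *oldname, const char *newname)
{
	struct ifreq ifr;
	int fd, ret = 0;

	assert(oldname != NULL && newname != NULL);
	if (strlen(oldname) >= IFNAMSIZ || strlen(newname) >= IFNAMSIZ)
		return -EINVAL;

	/* Any socket works as a handle for interface ioctls. */
	fd = socket(AF_INET, SOCK_DGRAM, 0);
	if (fd < 0)
		return -errno;

	memset(&ifr, 0, sizeof(ifr));
	memcpy(ifr.ifr_name, oldname, strlen(oldname) + 1);
	memcpy(ifr.ifr_newname, newname, strlen(newname) + 1);

	if (ioctl(fd, SIOCSIFNAME, &ifr) < 0)
		ret = -errno; /* capture errno before close() can change it */
	close(fd);
	return ret;
}
```

Run against a slave that the kernel has already opened, on a kernel without this patch, the call is expected to fail with -EBUSY, which is the failure udev runs into when it loses the race.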

Today there is a race in which userspace (udev) may fail to rename the slave if the kernel (net_failover) opens the slave before the userspace rename happens. Unlike bond or team, the primary slave of failover can't be renamed by userspace ahead of time, since the kernel-initiated auto-enslavement is unable to, or rather is never meant to, be synchronized with the rename request from userspace.

Failover slave interfaces are not designed to be operated directly by userspace apps: IP configuration, filter rules for passing network traffic, etc. should all be done on the master interface. In general, userspace apps only care about the name of the master interface, while slave names matter less as long as admin users can see reliable names that may carry other information describing the netdev. For example, they can infer that "ens3nsby" is a standby slave of "ens3", while from a name like "eth0" they can't tell which master it belongs to.

Historically, the name of an IFF_UP interface can't be changed, because admin scripts or management software may already rely on that behavior and assume that a slave name can't change once it is UP. But failover is special: with the in-kernel auto-enslavement mechanism, the userspace expectation for device enumeration and bring-up order is already broken. Previously, initramfs and various userspace config tools were modified to bypass failover slaves because of auto-enslavement and duplicate MAC addresses. Similarly, if users care about seeing reliable slave names, the new type of failover slave needs to be handled specifically in userspace anyway.

For that to work, introduce a module-level tunable, "slave_rename_ok", that allows users to lift the rename restriction on a failover slave that is already UP.
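The effect of the tunable can be modeled in plain userspace C. The sketch mirrors the patched check at the top of dev_change_name() and the way failover_slave_register() and failover_slave_unregister() set and clear IFF_SLAVE_RENAME_OK according to slave_rename_ok. It is not kernel code: struct model_dev, the MODEL_* constants and the model_* functions are hypothetical stand-ins, with bit values copied from the patch (IFF_UP is bit 0 of net_device.flags).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel flag bits used by the patch. */
#define MODEL_IFF_UP              (1u << 0)  /* in net_device.flags */
#define MODEL_IFF_FAILOVER_SLAVE  (1u << 28) /* in net_device.priv_flags */
#define MODEL_IFF_SLAVE_RENAME_OK (1u << 29) /* new bit added by the patch */

/* Minimal model of the two net_device fields involved. */
struct model_dev {
	unsigned int flags;
	unsigned int priv_flags;
};

/* Models the module parameter; the patch defaults it to true. */
static bool slave_rename_ok = true;

/* Models failover_slave_register(): the rename flag is latched here. */
void model_slave_register(struct model_dev *dev)
{
	dev->priv_flags |= MODEL_IFF_FAILOVER_SLAVE;
	if (slave_rename_ok)
		dev->priv_flags |= MODEL_IFF_SLAVE_RENAME_OK;
}

/* Models failover_slave_unregister(): both bits are cleared together. */
void model_slave_unregister(struct model_dev *dev)
{
	dev->priv_flags &= ~(MODEL_IFF_FAILOVER_SLAVE | MODEL_IFF_SLAVE_RENAME_OK);
}

/* Models the patched early check in dev_change_name(). */
int model_change_name_check(const struct model_dev *dev)
{
	assert(dev != NULL);
	if ((dev->flags & MODEL_IFF_UP) &&
	    !(dev->priv_flags & MODEL_IFF_SLAVE_RENAME_OK))
		return -EBUSY;
	return 0;
}
```

One consequence visible in the model, and in the patch itself: the flag is latched when the slave registers, so changing slave_rename_ok at runtime only affects slaves enslaved afterwards.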

Although this change could break userspace components (most likely configuration scripts or management software) that assume a slave name can't be changed while UP, such components form a relatively limited and controllable set among all userspace components, and they can be fixed specifically to work with the new naming behavior of the failover slave. Userspace components interacting with slaves should be changed to operate on the failover master instead, as a failover slave is dynamic in nature and may come and go at any point. The goal is to make the role of failover slaves less relevant, so that in the long run all userspace deals only with the master.

The default for "slave_rename_ok" is true (1). If userspace doesn't have the right support in place and users don't care about reliable userspace naming, the value can be set to false (0).

Signed-off-by: Si-Wei.Liu@oracle.com
Reviewed-by: Liran Alon
---
 include/linux/netdevice.h |  3 +++
 net/core/dev.c            |  3 ++-
 net/core/failover.c       | 11 +++++++++--
 3 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 857f8ab..6d9e4e0 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1487,6 +1487,7 @@ struct net_device_ops {
  * @IFF_NO_RX_HANDLER: device doesn't support the rx_handler hook
  * @IFF_FAILOVER: device is a failover master device
  * @IFF_FAILOVER_SLAVE: device is lower dev of a failover master device
+ * @IFF_SLAVE_RENAME_OK: rename is allowed while slave device is running
  */
 enum netdev_priv_flags {
 	IFF_802_1Q_VLAN = 1<<0,
@@ -1518,6 +1519,7 @@ enum netdev_priv_flags {
 	IFF_NO_RX_HANDLER = 1<<26,
 	IFF_FAILOVER = 1<<27,
 	IFF_FAILOVER_SLAVE = 1<<28,
+	IFF_SLAVE_RENAME_OK = 1<<29,
 };
 
 #define IFF_802_1Q_VLAN IFF_802_1Q_VLAN
@@ -1548,6 +1550,7 @@ enum netdev_priv_flags {
 #define IFF_NO_RX_HANDLER IFF_NO_RX_HANDLER
 #define IFF_FAILOVER IFF_FAILOVER
 #define IFF_FAILOVER_SLAVE IFF_FAILOVER_SLAVE
+#define IFF_SLAVE_RENAME_OK IFF_SLAVE_RENAME_OK
 
 /**
  * struct net_device - The DEVICE structure.
diff --git a/net/core/dev.c b/net/core/dev.c
index 722d50d..ae070de 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1180,7 +1180,8 @@ int dev_change_name(struct net_device *dev, const char *newname)
 	BUG_ON(!dev_net(dev));
 
 	net = dev_net(dev);
-	if (dev->flags & IFF_UP)
+	if (dev->flags & IFF_UP &&
+	    !(dev->priv_flags & IFF_SLAVE_RENAME_OK))
 		return -EBUSY;
 
 	write_seqcount_begin(&devnet_rename_seq);
diff --git a/net/core/failover.c b/net/core/failover.c
index 4a92a98..1fd8bbb 100644
--- a/net/core/failover.c
+++ b/net/core/failover.c
@@ -16,6 +16,11 @@
 
 static LIST_HEAD(failover_list);
 static DEFINE_SPINLOCK(failover_lock);
+static bool slave_rename_ok = true;
+
+module_param(slave_rename_ok, bool, (S_IRUGO | S_IWUSR));
+MODULE_PARM_DESC(slave_rename_ok,
+		 "If set allow renaming the slave when failover master is up");
 
 static struct net_device *failover_get_bymac(u8 *mac, struct failover_ops **ops)
 {
@@ -81,13 +86,15 @@ static int failover_slave_register(struct net_device *slave_dev)
 	}
 
 	slave_dev->priv_flags |= IFF_FAILOVER_SLAVE;
+	if (slave_rename_ok)
+		slave_dev->priv_flags |= IFF_SLAVE_RENAME_OK;
 
 	if (fops && fops->slave_register &&
 	    !fops->slave_register(slave_dev, failover_dev))
 		return NOTIFY_OK;
 
 	netdev_upper_dev_unlink(slave_dev, failover_dev);
-	slave_dev->priv_flags &= ~IFF_FAILOVER_SLAVE;
+	slave_dev->priv_flags &= ~(IFF_FAILOVER_SLAVE | IFF_SLAVE_RENAME_OK);
 err_upper_link:
 	netdev_rx_handler_unregister(slave_dev);
 done:
@@ -121,7 +128,7 @@ int failover_slave_unregister(struct net_device *slave_dev)
 
 	netdev_rx_handler_unregister(slave_dev);
 	netdev_upper_dev_unlink(slave_dev, failover_dev);
-	slave_dev->priv_flags &= ~IFF_FAILOVER_SLAVE;
+	slave_dev->priv_flags &= ~(IFF_FAILOVER_SLAVE | IFF_SLAVE_RENAME_OK);
 
 	if (fops && fops->slave_unregister &&
 	    !fops->slave_unregister(slave_dev, failover_dev))