[RFC,net-next] failover: allow name change on IFF_UP slave interfaces

Message ID 1551747059-11831-1-git-send-email-si-wei.liu@oracle.com
State Superseded
Delegated to: David Miller

Commit Message

Si-Wei Liu March 5, 2019, 12:50 a.m. UTC
When a netdev appears through hot plug and is then enslaved by a failover
master that is already up and running, the slave is opened right away
after being enslaved. Today there is a race: userspace (udev) may fail to
rename the slave if the kernel (net_failover) opens the slave before the
userspace rename happens. Unlike bond or team, the primary slave of
failover can't be renamed by userspace ahead of time, since the
kernel-initiated auto-enslavement is unable to, or rather, is never meant
to, be synchronized with the rename request from userspace.

Failover slave interfaces are not designed to be operated directly by
userspace apps: IP configuration, filter rules for network traffic, and
so on should all be done on the master interface. In general, userspace
apps only care about the name of the master interface, while slave names
are less important as long as admin users can see reliable names that may
carry other information describing the netdev. For example, they can
infer that "ens3nsby" is a standby slave of "ens3", while for a name like
"eth0" they can't tell which master it belongs to.

Historically the name of an IFF_UP interface can't be changed, because
there might be admin scripts or management software already relying on
that behavior and assuming that the name can't be changed once UP. But
failover is special: with the in-kernel auto-enslavement mechanism, the
userspace expectation for device enumeration and bring-up order is
already broken. Previously, initramfs and various userspace config tools
were modified to bypass failover slaves because of auto-enslavement and
the duplicate MAC address. Similarly, if users care about seeing reliable
slave names, the new type of failover slave needs to be handled
specifically in userspace anyway.

For that to work, introduce a module-level tunable, "slave_rename_ok",
that allows users to lift the rename restriction on a failover slave that
is already UP. Although this change could potentially break userspace
components (most likely configuration scripts or management software)
that assume a slave name can't be changed while UP, that is a relatively
limited and controllable set among all userspace components, and it can
be fixed specifically to work with the new naming behavior of the
failover slave. Userspace components interacting with slaves should be
changed to operate on the failover master instead, as the failover slave
is dynamic in nature and may come and go at any point. The goal is to
make the role of failover slaves less relevant, so that all userspace
only deals with the master in the long run. The default for
"slave_rename_ok" is true (1). If userspace doesn't have the right
support in place and users don't care about reliable userspace naming,
the value can be set to false (0).

Signed-off-by: Si-Wei.Liu@oracle.com
Reviewed-by: Liran Alon <liran.alon@oracle.com>
---
 include/linux/netdevice.h |  3 +++
 net/core/dev.c            |  3 ++-
 net/core/failover.c       | 11 +++++++++--
 3 files changed, 14 insertions(+), 3 deletions(-)
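
For readers who want to see the intended behavior in isolation, the
following is a small userspace model of the gate this patch changes in
dev_change_name(), together with how failover_slave_register() and
failover_slave_unregister() set and clear the new flag. All MODEL_* names,
struct model_netdev, and the model_* functions are hypothetical stand-ins.
The flag bit positions are copied from the patch, and the length check is
a simplification of the kernel's real name validation. This is a sketch
under those assumptions, not kernel code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Stand-ins for the kernel flags; bit positions follow the patch. */
#define MODEL_IFF_UP              (1u << 0)   /* lives in dev->flags */
#define MODEL_IFF_FAILOVER_SLAVE  (1u << 28)  /* lives in dev->priv_flags */
#define MODEL_IFF_SLAVE_RENAME_OK (1u << 29)  /* lives in dev->priv_flags */

#define MODEL_IFNAMSIZ 16

struct model_netdev {
	char name[MODEL_IFNAMSIZ];
	unsigned int flags;
	unsigned int priv_flags;
};

/* Models only the IFF_UP gate in dev_change_name() after the patch. */
static int model_change_name(struct model_netdev *dev, const char *newname)
{
	assert(dev && newname);

	/* An UP device may be renamed only if it carries the new flag. */
	if ((dev->flags & MODEL_IFF_UP) &&
	    !(dev->priv_flags & MODEL_IFF_SLAVE_RENAME_OK))
		return -EBUSY;

	/* Simplified validation: the name must fit with its trailing NUL. */
	if (strlen(newname) >= MODEL_IFNAMSIZ)
		return -EINVAL;

	strcpy(dev->name, newname);
	return 0;
}

/* Models the flag handling added to failover_slave_register(). */
static void model_slave_register(struct model_netdev *dev, bool slave_rename_ok)
{
	dev->priv_flags |= MODEL_IFF_FAILOVER_SLAVE;
	if (slave_rename_ok)
		dev->priv_flags |= MODEL_IFF_SLAVE_RENAME_OK;
}

/* Models the flag clearing in failover_slave_unregister(). */
static void model_slave_unregister(struct model_netdev *dev)
{
	dev->priv_flags &= ~(MODEL_IFF_FAILOVER_SLAVE |
			     MODEL_IFF_SLAVE_RENAME_OK);
}
```

In this model, an UP device that is not a failover slave still gets
-EBUSY on rename, a registered slave with slave_rename_ok set can be
renamed while UP, and the restriction returns once the slave is
unregistered. The tunable is sampled at registration time, so changing it
later would not affect slaves that are already enslaved.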

Comments

Michael S. Tsirkin March 5, 2019, 2:33 a.m. UTC | #1
On Mon, Mar 04, 2019 at 07:50:59PM -0500, Si-Wei Liu wrote:
> When a netdev appears through hot plug then gets enslaved by a failover
> master that is already up and running, the slave will be opened
> right away after getting enslaved. Today there's a race that userspace
> (udev) may fail to rename the slave if the kernel (net_failover)
> opens the slave earlier than when the userspace rename happens.
> Unlike bond or team, the primary slave of failover can't be renamed by
> userspace ahead of time, since the kernel initiated auto-enslavement is
> unable to, or rather, is never meant to be synchronized with the rename
> request from userspace.
> 
> As the failover slave interfaces are not designed to be operated
> directly by userspace apps: IP configuration, filter rules with
> regard to network traffic passing and etc., should all be done on master
> interface. In general, userspace apps only care about the
> name of master interface, while slave names are less important as long
> as admin users can see reliable names that may carry
> other information describing the netdev. For e.g., they can infer that
> "ens3nsby" is a standby slave of "ens3", while for a
> name like "eth0" they can't tell which master it belongs to.
> 
> Historically the name of IFF_UP interface can't be changed because
> there might be admin script or management software that is already
> relying on such behavior and assumes that the slave name can't be
> changed once UP. But failover is special: with the in-kernel
> auto-enslavement mechanism, the userspace expectation for device
> enumeration and bring-up order is already broken. Previously initramfs
> and various userspace config tools were modified to bypass failover
> slaves because of auto-enslavement and duplicate MAC address. Similarly,
> in case that users care about seeing reliable slave name, the new type
> of failover slaves needs to be taken care of specifically in userspace
> anyway.
> 
> For that to work, now introduce a module-level tunable,
> "slave_rename_ok" that allows users to lift up the rename restriction on
> failover slave which is already UP. Although it's possible this change
> potentially break userspace component (most likely configuration scripts
> or management software) that assumes slave name can't be changed while
> UP, it's relatively a limited and controllable set among all userspace
> components, which can be fixed specifically to work with the new naming
> behavior of the failover slave. Userspace component interacting with
> slaves should be changed to operate on failover master instead, as the
> failover slave is dynamic in nature which may come and go at any point.
> The goal is to make the role of failover slaves less relevant, and
> all userspace should only deal with master in the long run. The default
> for the "slave_rename_ok" is set to true(1). If userspace doesn't have
> the right support in place meanwhile users don't care about reliable
> userspace naming, the value can be set to false(0).
> 
> Signed-off-by: Si-Wei.Liu@oracle.com
> Reviewed-by: Liran Alon <liran.alon@oracle.com>

Not sure which of the versions I should reply to.

I have a vague idea: would it work to *not* set
IFF_UP on slave devices at all?

Would this reduce the chances of existing scripts such as dracut being
confused?

And this leaves open the option for scripts to address
slaves by checking some custom attribute.

> ---
>  include/linux/netdevice.h |  3 +++
>  net/core/dev.c            |  3 ++-
>  net/core/failover.c       | 11 +++++++++--
>  3 files changed, 14 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 857f8ab..6d9e4e0 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -1487,6 +1487,7 @@ struct net_device_ops {
>   * @IFF_NO_RX_HANDLER: device doesn't support the rx_handler hook
>   * @IFF_FAILOVER: device is a failover master device
>   * @IFF_FAILOVER_SLAVE: device is lower dev of a failover master device
> + * @IFF_SLAVE_RENAME_OK: rename is allowed while slave device is running
>   */
>  enum netdev_priv_flags {
>  	IFF_802_1Q_VLAN			= 1<<0,
> @@ -1518,6 +1519,7 @@ enum netdev_priv_flags {
>  	IFF_NO_RX_HANDLER		= 1<<26,
>  	IFF_FAILOVER			= 1<<27,
>  	IFF_FAILOVER_SLAVE		= 1<<28,
> +	IFF_SLAVE_RENAME_OK		= 1<<29,
>  };
>  
>  #define IFF_802_1Q_VLAN			IFF_802_1Q_VLAN
> @@ -1548,6 +1550,7 @@ enum netdev_priv_flags {
>  #define IFF_NO_RX_HANDLER		IFF_NO_RX_HANDLER
>  #define IFF_FAILOVER			IFF_FAILOVER
>  #define IFF_FAILOVER_SLAVE		IFF_FAILOVER_SLAVE
> +#define IFF_SLAVE_RENAME_OK		IFF_SLAVE_RENAME_OK
>  
>  /**
>   *	struct net_device - The DEVICE structure.
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 722d50d..ae070de 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -1180,7 +1180,8 @@ int dev_change_name(struct net_device *dev, const char *newname)
>  	BUG_ON(!dev_net(dev));
>  
>  	net = dev_net(dev);
> -	if (dev->flags & IFF_UP)
> +	if (dev->flags & IFF_UP &&
> +	    !(dev->priv_flags & IFF_SLAVE_RENAME_OK))
>  		return -EBUSY;
>  
>  	write_seqcount_begin(&devnet_rename_seq);
> diff --git a/net/core/failover.c b/net/core/failover.c
> index 4a92a98..1fd8bbb 100644
> --- a/net/core/failover.c
> +++ b/net/core/failover.c
> @@ -16,6 +16,11 @@
>  
>  static LIST_HEAD(failover_list);
>  static DEFINE_SPINLOCK(failover_lock);
> +static bool slave_rename_ok = true;
> +
> +module_param(slave_rename_ok, bool, (S_IRUGO | S_IWUSR));
> +MODULE_PARM_DESC(slave_rename_ok,
> +		 "If set allow renaming the slave when failover master is up");
>  
>  static struct net_device *failover_get_bymac(u8 *mac, struct failover_ops **ops)
>  {
> @@ -81,13 +86,15 @@ static int failover_slave_register(struct net_device *slave_dev)
>  	}
>  
>  	slave_dev->priv_flags |= IFF_FAILOVER_SLAVE;
> +	if (slave_rename_ok)
> +		slave_dev->priv_flags |= IFF_SLAVE_RENAME_OK;
>  
>  	if (fops && fops->slave_register &&
>  	    !fops->slave_register(slave_dev, failover_dev))
>  		return NOTIFY_OK;
>  
>  	netdev_upper_dev_unlink(slave_dev, failover_dev);
> -	slave_dev->priv_flags &= ~IFF_FAILOVER_SLAVE;
> +	slave_dev->priv_flags &= ~(IFF_FAILOVER_SLAVE | IFF_SLAVE_RENAME_OK);
>  err_upper_link:
>  	netdev_rx_handler_unregister(slave_dev);
>  done:
> @@ -121,7 +128,7 @@ int failover_slave_unregister(struct net_device *slave_dev)
>  
>  	netdev_rx_handler_unregister(slave_dev);
>  	netdev_upper_dev_unlink(slave_dev, failover_dev);
> -	slave_dev->priv_flags &= ~IFF_FAILOVER_SLAVE;
> +	slave_dev->priv_flags &= ~(IFF_FAILOVER_SLAVE | IFF_SLAVE_RENAME_OK);
>  
>  	if (fops && fops->slave_unregister &&
>  	    !fops->slave_unregister(slave_dev, failover_dev))
> -- 
> 1.8.3.1
Si-Wei Liu March 5, 2019, 7:19 p.m. UTC | #2
On 3/4/2019 6:33 PM, Michael S. Tsirkin wrote:
> On Mon, Mar 04, 2019 at 07:50:59PM -0500, Si-Wei Liu wrote:
>> When a netdev appears through hot plug then gets enslaved by a failover
>> master that is already up and running, the slave will be opened
>> right away after getting enslaved. Today there's a race that userspace
>> (udev) may fail to rename the slave if the kernel (net_failover)
>> opens the slave earlier than when the userspace rename happens.
>> Unlike bond or team, the primary slave of failover can't be renamed by
>> userspace ahead of time, since the kernel initiated auto-enslavement is
>> unable to, or rather, is never meant to be synchronized with the rename
>> request from userspace.
>>
>> As the failover slave interfaces are not designed to be operated
>> directly by userspace apps: IP configuration, filter rules with
>> regard to network traffic passing and etc., should all be done on master
>> interface. In general, userspace apps only care about the
>> name of master interface, while slave names are less important as long
>> as admin users can see reliable names that may carry
>> other information describing the netdev. For e.g., they can infer that
>> "ens3nsby" is a standby slave of "ens3", while for a
>> name like "eth0" they can't tell which master it belongs to.
>>
>> Historically the name of IFF_UP interface can't be changed because
>> there might be admin script or management software that is already
>> relying on such behavior and assumes that the slave name can't be
>> changed once UP. But failover is special: with the in-kernel
>> auto-enslavement mechanism, the userspace expectation for device
>> enumeration and bring-up order is already broken. Previously initramfs
>> and various userspace config tools were modified to bypass failover
>> slaves because of auto-enslavement and duplicate MAC address. Similarly,
>> in case that users care about seeing reliable slave name, the new type
>> of failover slaves needs to be taken care of specifically in userspace
>> anyway.
>>
>> For that to work, now introduce a module-level tunable,
>> "slave_rename_ok" that allows users to lift up the rename restriction on
>> failover slave which is already UP. Although it's possible this change
>> potentially break userspace component (most likely configuration scripts
>> or management software) that assumes slave name can't be changed while
>> UP, it's relatively a limited and controllable set among all userspace
>> components, which can be fixed specifically to work with the new naming
>> behavior of the failover slave. Userspace component interacting with
>> slaves should be changed to operate on failover master instead, as the
>> failover slave is dynamic in nature which may come and go at any point.
>> The goal is to make the role of failover slaves less relevant, and
>> all userspace should only deal with master in the long run. The default
>> for the "slave_rename_ok" is set to true(1). If userspace doesn't have
>> the right support in place meanwhile users don't care about reliable
>> userspace naming, the value can be set to false(0).
>>
>> Signed-off-by: Si-Wei.Liu@oracle.com
>> Reviewed-by: Liran Alon <liran.alon@oracle.com>
> Not sure which of the versions I should reply to.
Sorry for the multiple copies sent. It's fine to reply to this one.

>
> I have a vague idea: would it work to *not* set
> IFF_UP on slave devices at all?
Hmm, I did think about this option, and it appears this solution is
more invasive than required to convert existing scripts, on top of the
controversy of introducing internal netdev state that differs from the
user-visible state. Whether we disallow the slave from being brought up
by the user, or don't set the IFF_UP flag and use an internal one
instead, we could end up with a substantial behavioral change that breaks
scripts. Consider any admin script that runs `ip link set dev ... up'
successfully and then just assumes the link is up and subsequent
operations can be done as usual. While it *may* work for dracut (yet to
be verified), I'm a bit concerned that there are more scripts to be
converted than those that don't follow volatile failover slave names.
It's technically doable, but may not be worth the effort (in terms of
porting existing scripts/apps).

Thanks
-Siwei

>
> Would this reduce the chances of existing scripts such as dracut being
> confused?
>
> And this leaves open the option for scripts to address
> slaves by checking some custom attribute.
>
>> ---
>>   include/linux/netdevice.h |  3 +++
>>   net/core/dev.c            |  3 ++-
>>   net/core/failover.c       | 11 +++++++++--
>>   3 files changed, 14 insertions(+), 3 deletions(-)
>>
>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>> index 857f8ab..6d9e4e0 100644
>> --- a/include/linux/netdevice.h
>> +++ b/include/linux/netdevice.h
>> @@ -1487,6 +1487,7 @@ struct net_device_ops {
>>    * @IFF_NO_RX_HANDLER: device doesn't support the rx_handler hook
>>    * @IFF_FAILOVER: device is a failover master device
>>    * @IFF_FAILOVER_SLAVE: device is lower dev of a failover master device
>> + * @IFF_SLAVE_RENAME_OK: rename is allowed while slave device is running
>>    */
>>   enum netdev_priv_flags {
>>   	IFF_802_1Q_VLAN			= 1<<0,
>> @@ -1518,6 +1519,7 @@ enum netdev_priv_flags {
>>   	IFF_NO_RX_HANDLER		= 1<<26,
>>   	IFF_FAILOVER			= 1<<27,
>>   	IFF_FAILOVER_SLAVE		= 1<<28,
>> +	IFF_SLAVE_RENAME_OK		= 1<<29,
>>   };
>>   
>>   #define IFF_802_1Q_VLAN			IFF_802_1Q_VLAN
>> @@ -1548,6 +1550,7 @@ enum netdev_priv_flags {
>>   #define IFF_NO_RX_HANDLER		IFF_NO_RX_HANDLER
>>   #define IFF_FAILOVER			IFF_FAILOVER
>>   #define IFF_FAILOVER_SLAVE		IFF_FAILOVER_SLAVE
>> +#define IFF_SLAVE_RENAME_OK		IFF_SLAVE_RENAME_OK
>>   
>>   /**
>>    *	struct net_device - The DEVICE structure.
>> diff --git a/net/core/dev.c b/net/core/dev.c
>> index 722d50d..ae070de 100644
>> --- a/net/core/dev.c
>> +++ b/net/core/dev.c
>> @@ -1180,7 +1180,8 @@ int dev_change_name(struct net_device *dev, const char *newname)
>>   	BUG_ON(!dev_net(dev));
>>   
>>   	net = dev_net(dev);
>> -	if (dev->flags & IFF_UP)
>> +	if (dev->flags & IFF_UP &&
>> +	    !(dev->priv_flags & IFF_SLAVE_RENAME_OK))
>>   		return -EBUSY;
>>   
>>   	write_seqcount_begin(&devnet_rename_seq);
>> diff --git a/net/core/failover.c b/net/core/failover.c
>> index 4a92a98..1fd8bbb 100644
>> --- a/net/core/failover.c
>> +++ b/net/core/failover.c
>> @@ -16,6 +16,11 @@
>>   
>>   static LIST_HEAD(failover_list);
>>   static DEFINE_SPINLOCK(failover_lock);
>> +static bool slave_rename_ok = true;
>> +
>> +module_param(slave_rename_ok, bool, (S_IRUGO | S_IWUSR));
>> +MODULE_PARM_DESC(slave_rename_ok,
>> +		 "If set allow renaming the slave when failover master is up");
>>   
>>   static struct net_device *failover_get_bymac(u8 *mac, struct failover_ops **ops)
>>   {
>> @@ -81,13 +86,15 @@ static int failover_slave_register(struct net_device *slave_dev)
>>   	}
>>   
>>   	slave_dev->priv_flags |= IFF_FAILOVER_SLAVE;
>> +	if (slave_rename_ok)
>> +		slave_dev->priv_flags |= IFF_SLAVE_RENAME_OK;
>>   
>>   	if (fops && fops->slave_register &&
>>   	    !fops->slave_register(slave_dev, failover_dev))
>>   		return NOTIFY_OK;
>>   
>>   	netdev_upper_dev_unlink(slave_dev, failover_dev);
>> -	slave_dev->priv_flags &= ~IFF_FAILOVER_SLAVE;
>> +	slave_dev->priv_flags &= ~(IFF_FAILOVER_SLAVE | IFF_SLAVE_RENAME_OK);
>>   err_upper_link:
>>   	netdev_rx_handler_unregister(slave_dev);
>>   done:
>> @@ -121,7 +128,7 @@ int failover_slave_unregister(struct net_device *slave_dev)
>>   
>>   	netdev_rx_handler_unregister(slave_dev);
>>   	netdev_upper_dev_unlink(slave_dev, failover_dev);
>> -	slave_dev->priv_flags &= ~IFF_FAILOVER_SLAVE;
>> +	slave_dev->priv_flags &= ~(IFF_FAILOVER_SLAVE | IFF_SLAVE_RENAME_OK);
>>   
>>   	if (fops && fops->slave_unregister &&
>>   	    !fops->slave_unregister(slave_dev, failover_dev))
>> -- 
>> 1.8.3.1
Stephen Hemminger March 5, 2019, 7:24 p.m. UTC | #3
On Tue, 5 Mar 2019 11:19:32 -0800
si-wei liu <si-wei.liu@oracle.com> wrote:

> > I have a vague idea: would it work to *not* set
> > IFF_UP on slave devices at all?  
> Hmm, I ever thought about this option, and it appears this solution is 
> more invasive than required to convert existing scripts, despite the 
> controversy of introducing internal netdev state to differentiate user 
> visible state. Either we disallow slave to be brought up by user, or to 
> not set IFF_UP flag but instead use the internal one, could end up with 
> substantial behavioral change that breaks scripts. Consider any admin 
> script that does `ip link set dev ... up' successfully just assumes the 
> link is up and subsequent operation can be done as usual. While it *may* 
> work for dracut (yet to be verified), I'm a bit concerned that there are 
> more scripts to be converted than those that don't follow volatile 
> failover slave names. It's technically doable, but may not worth the 
> effort (in terms of porting existing scripts/apps).
> 
> Thanks
> -Siwei

Won't work for most devices.  Many devices turn off the PHY and link
layer if not IFF_UP.
Si-Wei Liu March 5, 2019, 7:35 p.m. UTC | #4
On 3/5/2019 11:24 AM, Stephen Hemminger wrote:
> On Tue, 5 Mar 2019 11:19:32 -0800
> si-wei liu <si-wei.liu@oracle.com> wrote:
>
>>> I have a vague idea: would it work to *not* set
>>> IFF_UP on slave devices at all?
>> Hmm, I ever thought about this option, and it appears this solution is
>> more invasive than required to convert existing scripts, despite the
>> controversy of introducing internal netdev state to differentiate user
>> visible state. Either we disallow slave to be brought up by user, or to
>> not set IFF_UP flag but instead use the internal one, could end up with
>> substantial behavioral change that breaks scripts. Consider any admin
>> script that does `ip link set dev ... up' successfully just assumes the
>> link is up and subsequent operation can be done as usual. While it *may*
>> work for dracut (yet to be verified), I'm a bit concerned that there are
>> more scripts to be converted than those that don't follow volatile
>> failover slave names. It's technically doable, but may not worth the
>> effort (in terms of porting existing scripts/apps).
>>
>> Thanks
>> -Siwei
> Won't work for most devices.  Many devices turn off PHY and link layer
> if not IFF_UP
True, that's what I said about introducing internal state for those
drivers and other kernel components. Very invasive change indeed.

-Siwei
Michael S. Tsirkin March 5, 2019, 8:28 p.m. UTC | #5
On Tue, Mar 05, 2019 at 11:19:32AM -0800, si-wei liu wrote:
> 
> 
> On 3/4/2019 6:33 PM, Michael S. Tsirkin wrote:
> > On Mon, Mar 04, 2019 at 07:50:59PM -0500, Si-Wei Liu wrote:
> > > When a netdev appears through hot plug then gets enslaved by a failover
> > > master that is already up and running, the slave will be opened
> > > right away after getting enslaved. Today there's a race that userspace
> > > (udev) may fail to rename the slave if the kernel (net_failover)
> > > opens the slave earlier than when the userspace rename happens.
> > > Unlike bond or team, the primary slave of failover can't be renamed by
> > > userspace ahead of time, since the kernel initiated auto-enslavement is
> > > unable to, or rather, is never meant to be synchronized with the rename
> > > request from userspace.
> > > 
> > > As the failover slave interfaces are not designed to be operated
> > > directly by userspace apps: IP configuration, filter rules with
> > > regard to network traffic passing and etc., should all be done on master
> > > interface. In general, userspace apps only care about the
> > > name of master interface, while slave names are less important as long
> > > as admin users can see reliable names that may carry
> > > other information describing the netdev. For e.g., they can infer that
> > > "ens3nsby" is a standby slave of "ens3", while for a
> > > name like "eth0" they can't tell which master it belongs to.
> > > 
> > > Historically the name of IFF_UP interface can't be changed because
> > > there might be admin script or management software that is already
> > > relying on such behavior and assumes that the slave name can't be
> > > changed once UP. But failover is special: with the in-kernel
> > > auto-enslavement mechanism, the userspace expectation for device
> > > enumeration and bring-up order is already broken. Previously initramfs
> > > and various userspace config tools were modified to bypass failover
> > > slaves because of auto-enslavement and duplicate MAC address. Similarly,
> > > in case that users care about seeing reliable slave name, the new type
> > > of failover slaves needs to be taken care of specifically in userspace
> > > anyway.
> > > 
> > > For that to work, now introduce a module-level tunable,
> > > "slave_rename_ok" that allows users to lift up the rename restriction on
> > > failover slave which is already UP. Although it's possible this change
> > > potentially break userspace component (most likely configuration scripts
> > > or management software) that assumes slave name can't be changed while
> > > UP, it's relatively a limited and controllable set among all userspace
> > > components, which can be fixed specifically to work with the new naming
> > > behavior of the failover slave. Userspace component interacting with
> > > slaves should be changed to operate on failover master instead, as the
> > > failover slave is dynamic in nature which may come and go at any point.
> > > The goal is to make the role of failover slaves less relevant, and
> > > all userspace should only deal with master in the long run. The default
> > > for the "slave_rename_ok" is set to true(1). If userspace doesn't have
> > > the right support in place meanwhile users don't care about reliable
> > > userspace naming, the value can be set to false(0).
> > > 
> > > Signed-off-by: Si-Wei.Liu@oracle.com
> > > Reviewed-by: Liran Alon <liran.alon@oracle.com>
> > Not sure which of the versions I should reply to.
> Sorry for multiple copies sent. It's fine to reply to this one.
> 
> > 
> > I have a vague idea: would it work to *not* set
> > IFF_UP on slave devices at all?
> Hmm, I ever thought about this option, and it appears this solution is more
> invasive than required to convert existing scripts, despite the controversy
> of introducing internal netdev state to differentiate user visible state.
> Either we disallow slave to be brought up by user, or to not set IFF_UP flag
> but instead use the internal one, could end up with substantial behavioral
> change that breaks scripts. Consider any admin script that does `ip link set
> dev ... up' successfully just assumes the link is up and subsequent
> operation can be done as usual. While it *may* work for dracut (yet to be
> verified), I'm a bit concerned that there are more scripts to be converted
> than those that don't follow volatile failover slave names. It's technically
> doable, but may not worth the effort (in terms of porting existing
> scripts/apps).
> 
> Thanks
> -Siwei


Right. The advantage could be that we prevent all kinds of
misconfigurations, e.g. when one has a route on a slave.

> > 
> > Would this reduce the chances of existing scripts such as dracut being
> > confused?
> > 
> > And this leaves open the option for scripts to address
> > slaves by checking some custom attribute.
> > 
> > > ---
> > >   include/linux/netdevice.h |  3 +++
> > >   net/core/dev.c            |  3 ++-
> > >   net/core/failover.c       | 11 +++++++++--
> > >   3 files changed, 14 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> > > index 857f8ab..6d9e4e0 100644
> > > --- a/include/linux/netdevice.h
> > > +++ b/include/linux/netdevice.h
> > > @@ -1487,6 +1487,7 @@ struct net_device_ops {
> > >    * @IFF_NO_RX_HANDLER: device doesn't support the rx_handler hook
> > >    * @IFF_FAILOVER: device is a failover master device
> > >    * @IFF_FAILOVER_SLAVE: device is lower dev of a failover master device
> > > + * @IFF_SLAVE_RENAME_OK: rename is allowed while slave device is running
> > >    */
> > >   enum netdev_priv_flags {
> > >   	IFF_802_1Q_VLAN			= 1<<0,
> > > @@ -1518,6 +1519,7 @@ enum netdev_priv_flags {
> > >   	IFF_NO_RX_HANDLER		= 1<<26,
> > >   	IFF_FAILOVER			= 1<<27,
> > >   	IFF_FAILOVER_SLAVE		= 1<<28,
> > > +	IFF_SLAVE_RENAME_OK		= 1<<29,
> > >   };
> > >   #define IFF_802_1Q_VLAN			IFF_802_1Q_VLAN
> > > @@ -1548,6 +1550,7 @@ enum netdev_priv_flags {
> > >   #define IFF_NO_RX_HANDLER		IFF_NO_RX_HANDLER
> > >   #define IFF_FAILOVER			IFF_FAILOVER
> > >   #define IFF_FAILOVER_SLAVE		IFF_FAILOVER_SLAVE
> > > +#define IFF_SLAVE_RENAME_OK		IFF_SLAVE_RENAME_OK
> > >   /**
> > >    *	struct net_device - The DEVICE structure.
> > > diff --git a/net/core/dev.c b/net/core/dev.c
> > > index 722d50d..ae070de 100644
> > > --- a/net/core/dev.c
> > > +++ b/net/core/dev.c
> > > @@ -1180,7 +1180,8 @@ int dev_change_name(struct net_device *dev, const char *newname)
> > >   	BUG_ON(!dev_net(dev));
> > >   	net = dev_net(dev);
> > > -	if (dev->flags & IFF_UP)
> > > +	if (dev->flags & IFF_UP &&
> > > +	    !(dev->priv_flags & IFF_SLAVE_RENAME_OK))
> > >   		return -EBUSY;
> > >   	write_seqcount_begin(&devnet_rename_seq);
> > > diff --git a/net/core/failover.c b/net/core/failover.c
> > > index 4a92a98..1fd8bbb 100644
> > > --- a/net/core/failover.c
> > > +++ b/net/core/failover.c
> > > @@ -16,6 +16,11 @@
> > >   static LIST_HEAD(failover_list);
> > >   static DEFINE_SPINLOCK(failover_lock);
> > > +static bool slave_rename_ok = true;
> > > +
> > > +module_param(slave_rename_ok, bool, (S_IRUGO | S_IWUSR));
> > > +MODULE_PARM_DESC(slave_rename_ok,
> > > +		 "If set allow renaming the slave when failover master is up");
> > >   static struct net_device *failover_get_bymac(u8 *mac, struct failover_ops **ops)
> > >   {
> > > @@ -81,13 +86,15 @@ static int failover_slave_register(struct net_device *slave_dev)
> > >   	}
> > >   	slave_dev->priv_flags |= IFF_FAILOVER_SLAVE;
> > > +	if (slave_rename_ok)
> > > +		slave_dev->priv_flags |= IFF_SLAVE_RENAME_OK;
> > >   	if (fops && fops->slave_register &&
> > >   	    !fops->slave_register(slave_dev, failover_dev))
> > >   		return NOTIFY_OK;
> > >   	netdev_upper_dev_unlink(slave_dev, failover_dev);
> > > -	slave_dev->priv_flags &= ~IFF_FAILOVER_SLAVE;
> > > +	slave_dev->priv_flags &= ~(IFF_FAILOVER_SLAVE | IFF_SLAVE_RENAME_OK);
> > >   err_upper_link:
> > >   	netdev_rx_handler_unregister(slave_dev);
> > >   done:
> > > @@ -121,7 +128,7 @@ int failover_slave_unregister(struct net_device *slave_dev)
> > >   	netdev_rx_handler_unregister(slave_dev);
> > >   	netdev_upper_dev_unlink(slave_dev, failover_dev);
> > > -	slave_dev->priv_flags &= ~IFF_FAILOVER_SLAVE;
> > > +	slave_dev->priv_flags &= ~(IFF_FAILOVER_SLAVE | IFF_SLAVE_RENAME_OK);
> > >   	if (fops && fops->slave_unregister &&
> > >   	    !fops->slave_unregister(slave_dev, failover_dev))
> > > -- 
> > > 1.8.3.1
Si-Wei Liu March 5, 2019, 10:49 p.m. UTC | #6
On 3/5/2019 12:28 PM, Michael S. Tsirkin wrote:
> On Tue, Mar 05, 2019 at 11:19:32AM -0800, si-wei liu wrote:
>>
>> On 3/4/2019 6:33 PM, Michael S. Tsirkin wrote:
>>> On Mon, Mar 04, 2019 at 07:50:59PM -0500, Si-Wei Liu wrote:
>>>> When a netdev appears through hot plug then gets enslaved by a failover
>>>> master that is already up and running, the slave will be opened
>>>> right away after getting enslaved. Today there's a race that userspace
>>>> (udev) may fail to rename the slave if the kernel (net_failover)
>>>> opens the slave earlier than when the userspace rename happens.
>>>> Unlike bond or team, the primary slave of failover can't be renamed by
>>>> userspace ahead of time, since the kernel initiated auto-enslavement is
>>>> unable to, or rather, is never meant to be synchronized with the rename
>>>> request from userspace.
>>>>
>>>> As the failover slave interfaces are not designed to be operated
>>>> directly by userspace apps: IP configuration, filter rules with
>>>> regard to network traffic passing and etc., should all be done on master
>>>> interface. In general, userspace apps only care about the
>>>> name of master interface, while slave names are less important as long
>>>> as admin users can see reliable names that may carry
>>>> other information describing the netdev. For e.g., they can infer that
>>>> "ens3nsby" is a standby slave of "ens3", while for a
>>>> name like "eth0" they can't tell which master it belongs to.
>>>>
>>>> Historically the name of IFF_UP interface can't be changed because
>>>> there might be admin script or management software that is already
>>>> relying on such behavior and assumes that the slave name can't be
>>>> changed once UP. But failover is special: with the in-kernel
>>>> auto-enslavement mechanism, the userspace expectation for device
>>>> enumeration and bring-up order is already broken. Previously initramfs
>>>> and various userspace config tools were modified to bypass failover
>>>> slaves because of auto-enslavement and duplicate MAC address. Similarly,
>>>> in case that users care about seeing reliable slave name, the new type
>>>> of failover slaves needs to be taken care of specifically in userspace
>>>> anyway.
>>>>
>>>> For that to work, now introduce a module-level tunable,
>>>> "slave_rename_ok" that allows users to lift up the rename restriction on
>>>> failover slave which is already UP. Although it's possible this change
>>>> potentially break userspace component (most likely configuration scripts
>>>> or management software) that assumes slave name can't be changed while
>>>> UP, it's relatively a limited and controllable set among all userspace
>>>> components, which can be fixed specifically to work with the new naming
>>>> behavior of the failover slave. Userspace component interacting with
>>>> slaves should be changed to operate on failover master instead, as the
>>>> failover slave is dynamic in nature which may come and go at any point.
>>>> The goal is to make the role of failover slaves less relevant, and
>>>> all userspace should only deal with master in the long run. The default
>>>> for the "slave_rename_ok" is set to true(1). If userspace doesn't have
>>>> the right support in place meanwhile users don't care about reliable
>>>> userspace naming, the value can be set to false(0).
>>>>
>>>> Signed-off-by: Si-Wei.Liu@oracle.com
>>>> Reviewed-by: Liran Alon <liran.alon@oracle.com>
>>> Not sure which of the versions I should reply to.
>> Sorry for multiple copies sent. It's fine to reply to this one.
>>
>>> I have a vague idea: would it work to *not* set
>>> IFF_UP on slave devices at all?
>> Hmm, I ever thought about this option, and it appears this solution is more
>> invasive than required to convert existing scripts, despite the controversy
>> of introducing internal netdev state to differentiate user visible state.
>> Either we disallow slave to be brought up by user, or to not set IFF_UP flag
>> but instead use the internal one, could end up with substantial behavioral
>> change that breaks scripts. Consider any admin script that does `ip link set
>> dev ... up' successfully just assumes the link is up and subsequent
>> operation can be done as usual. While it *may* work for dracut (yet to be
>> verified), I'm a bit concerned that there are more scripts to be converted
>> than those that don't follow volatile failover slave names. It's technically
>> doable, but may not worth the effort (in terms of porting existing
>> scripts/apps).
>>
>> Thanks
>> -Siwei
>
> Right. Advantage could be that we prevent all kind of
> misconfigurations e.g. when one has a route on a slave.
The fix for the slave route problem is already there in dracut. The ship
has sailed, no matter how seamlessly upstream thought failover could work
with the existing userspace. I would rather avoid introducing more
breakage to userspace if there's a simple yet less intrusive way to fix
the rename issue itself.

-Siwei

>
>>> Would this reduce the chances of existing scripts such as dracut being
>>> confused?
>>>
>>> And this leaves open the option for scripts to address
>>> slaves by checking some custom attribute.
>>>
>>>> ---
>>>>    include/linux/netdevice.h |  3 +++
>>>>    net/core/dev.c            |  3 ++-
>>>>    net/core/failover.c       | 11 +++++++++--
>>>>    3 files changed, 14 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>>>> index 857f8ab..6d9e4e0 100644
>>>> --- a/include/linux/netdevice.h
>>>> +++ b/include/linux/netdevice.h
>>>> @@ -1487,6 +1487,7 @@ struct net_device_ops {
>>>>     * @IFF_NO_RX_HANDLER: device doesn't support the rx_handler hook
>>>>     * @IFF_FAILOVER: device is a failover master device
>>>>     * @IFF_FAILOVER_SLAVE: device is lower dev of a failover master device
>>>> + * @IFF_SLAVE_RENAME_OK: rename is allowed while slave device is running
>>>>     */
>>>>    enum netdev_priv_flags {
>>>>    	IFF_802_1Q_VLAN			= 1<<0,
>>>> @@ -1518,6 +1519,7 @@ enum netdev_priv_flags {
>>>>    	IFF_NO_RX_HANDLER		= 1<<26,
>>>>    	IFF_FAILOVER			= 1<<27,
>>>>    	IFF_FAILOVER_SLAVE		= 1<<28,
>>>> +	IFF_SLAVE_RENAME_OK		= 1<<29,
>>>>    };
>>>>    #define IFF_802_1Q_VLAN			IFF_802_1Q_VLAN
>>>> @@ -1548,6 +1550,7 @@ enum netdev_priv_flags {
>>>>    #define IFF_NO_RX_HANDLER		IFF_NO_RX_HANDLER
>>>>    #define IFF_FAILOVER			IFF_FAILOVER
>>>>    #define IFF_FAILOVER_SLAVE		IFF_FAILOVER_SLAVE
>>>> +#define IFF_SLAVE_RENAME_OK		IFF_SLAVE_RENAME_OK
>>>>    /**
>>>>     *	struct net_device - The DEVICE structure.
>>>> diff --git a/net/core/dev.c b/net/core/dev.c
>>>> index 722d50d..ae070de 100644
>>>> --- a/net/core/dev.c
>>>> +++ b/net/core/dev.c
>>>> @@ -1180,7 +1180,8 @@ int dev_change_name(struct net_device *dev, const char *newname)
>>>>    	BUG_ON(!dev_net(dev));
>>>>    	net = dev_net(dev);
>>>> -	if (dev->flags & IFF_UP)
>>>> +	if (dev->flags & IFF_UP &&
>>>> +	    !(dev->priv_flags & IFF_SLAVE_RENAME_OK))
>>>>    		return -EBUSY;
>>>>    	write_seqcount_begin(&devnet_rename_seq);
>>>> diff --git a/net/core/failover.c b/net/core/failover.c
>>>> index 4a92a98..1fd8bbb 100644
>>>> --- a/net/core/failover.c
>>>> +++ b/net/core/failover.c
>>>> @@ -16,6 +16,11 @@
>>>>    static LIST_HEAD(failover_list);
>>>>    static DEFINE_SPINLOCK(failover_lock);
>>>> +static bool slave_rename_ok = true;
>>>> +
>>>> +module_param(slave_rename_ok, bool, (S_IRUGO | S_IWUSR));
>>>> +MODULE_PARM_DESC(slave_rename_ok,
>>>> +		 "If set allow renaming the slave when failover master is up");
>>>>    static struct net_device *failover_get_bymac(u8 *mac, struct failover_ops **ops)
>>>>    {
>>>> @@ -81,13 +86,15 @@ static int failover_slave_register(struct net_device *slave_dev)
>>>>    	}
>>>>    	slave_dev->priv_flags |= IFF_FAILOVER_SLAVE;
>>>> +	if (slave_rename_ok)
>>>> +		slave_dev->priv_flags |= IFF_SLAVE_RENAME_OK;
>>>>    	if (fops && fops->slave_register &&
>>>>    	    !fops->slave_register(slave_dev, failover_dev))
>>>>    		return NOTIFY_OK;
>>>>    	netdev_upper_dev_unlink(slave_dev, failover_dev);
>>>> -	slave_dev->priv_flags &= ~IFF_FAILOVER_SLAVE;
>>>> +	slave_dev->priv_flags &= ~(IFF_FAILOVER_SLAVE | IFF_SLAVE_RENAME_OK);
>>>>    err_upper_link:
>>>>    	netdev_rx_handler_unregister(slave_dev);
>>>>    done:
>>>> @@ -121,7 +128,7 @@ int failover_slave_unregister(struct net_device *slave_dev)
>>>>    	netdev_rx_handler_unregister(slave_dev);
>>>>    	netdev_upper_dev_unlink(slave_dev, failover_dev);
>>>> -	slave_dev->priv_flags &= ~IFF_FAILOVER_SLAVE;
>>>> +	slave_dev->priv_flags &= ~(IFF_FAILOVER_SLAVE | IFF_SLAVE_RENAME_OK);
>>>>    	if (fops && fops->slave_unregister &&
>>>>    	    !fops->slave_unregister(slave_dev, failover_dev))
>>>> -- 
>>>> 1.8.3.1
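
For readers following the quoted diff, the combined effect of the dev_change_name() and failover_slave_register() hunks can be modelled in userspace. This is only a minimal sketch under stated assumptions, not kernel code: `struct model_netdev`, `model_slave_register()` and `model_change_name()` are hypothetical stand-ins, the `MODEL_*` constants merely mirror the bit positions shown in the patch (plus IFF_UP as bit 0), and the name-length check is a simplification of the kernel's real name validation.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical constants mirroring the bit positions used in the patch. */
#define MODEL_IFF_UP              (1u << 0)
#define MODEL_IFF_FAILOVER_SLAVE  (1u << 28)
#define MODEL_IFF_SLAVE_RENAME_OK (1u << 29)
#define MODEL_IFNAMSIZ            16

/* Hypothetical, heavily reduced stand-in for struct net_device. */
struct model_netdev {
	char name[MODEL_IFNAMSIZ];
	unsigned int flags;      /* user-visible flags, e.g. IFF_UP */
	unsigned int priv_flags; /* kernel-private flags */
};

/* Models the register hunk: mark the slave, and tag it as renameable
 * only when the module-level tunable is enabled. */
void model_slave_register(struct model_netdev *dev, int slave_rename_ok)
{
	dev->priv_flags |= MODEL_IFF_FAILOVER_SLAVE;
	if (slave_rename_ok)
		dev->priv_flags |= MODEL_IFF_SLAVE_RENAME_OK;
}

/* Models the changed check: an UP device is refused with -EBUSY
 * unless it carries the rename-ok private flag. */
int model_change_name(struct model_netdev *dev, const char *newname)
{
	if ((dev->flags & MODEL_IFF_UP) &&
	    !(dev->priv_flags & MODEL_IFF_SLAVE_RENAME_OK))
		return -EBUSY;

	/* Simplified validation: only reject names that do not fit. */
	if (strlen(newname) >= MODEL_IFNAMSIZ)
		return -EINVAL;

	strcpy(dev->name, newname);
	return 0;
}
```

In this model, an UP device that was registered with the tunable off still gets -EBUSY on rename, which matches the behaviour the commit message describes for slave_rename_ok=0.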
Michael S. Tsirkin March 6, 2019, 12:06 a.m. UTC | #7
On Tue, Mar 05, 2019 at 11:35:50AM -0800, si-wei liu wrote:
> 
> 
> On 3/5/2019 11:24 AM, Stephen Hemminger wrote:
> > On Tue, 5 Mar 2019 11:19:32 -0800
> > si-wei liu <si-wei.liu@oracle.com> wrote:
> > 
> > > > I have a vague idea: would it work to *not* set
> > > > IFF_UP on slave devices at all?
> > > Hmm, I ever thought about this option, and it appears this solution is
> > > more invasive than required to convert existing scripts, despite the
> > > controversy of introducing internal netdev state to differentiate user
> > > visible state. Either we disallow slave to be brought up by user, or to
> > > not set IFF_UP flag but instead use the internal one, could end up with
> > > substantial behavioral change that breaks scripts. Consider any admin
> > > script that does `ip link set dev ... up' successfully just assumes the
> > > link is up and subsequent operation can be done as usual.

How would it work when carrier is off?

> While it *may*
> > > work for dracut (yet to be verified), I'm a bit concerned that there are
> > > more scripts to be converted than those that don't follow volatile
> > > failover slave names. It's technically doable, but may not worth the
> > > effort (in terms of porting existing scripts/apps).
> > > 
> > > Thanks
> > > -Siwei
> > Won't work for most devices.  Many devices turn off PHY and link layer
> > if not IFF_UP
> True, that's what I said about introducing internal state for those driver
> and other kernel component. Very invasive change indeed.
> 
> -Siwei

Well I did say it's vague.
How about hiding IFF_UP from dev_get_flags (and probably
__dev_change_flags)?
Si-Wei Liu March 6, 2019, 12:20 a.m. UTC | #8
On 3/5/2019 4:06 PM, Michael S. Tsirkin wrote:
> On Tue, Mar 05, 2019 at 11:35:50AM -0800, si-wei liu wrote:
>>
>> On 3/5/2019 11:24 AM, Stephen Hemminger wrote:
>>> On Tue, 5 Mar 2019 11:19:32 -0800
>>> si-wei liu <si-wei.liu@oracle.com> wrote:
>>>
>>>>> I have a vague idea: would it work to *not* set
>>>>> IFF_UP on slave devices at all?
>>>> Hmm, I ever thought about this option, and it appears this solution is
>>>> more invasive than required to convert existing scripts, despite the
>>>> controversy of introducing internal netdev state to differentiate user
>>>> visible state. Either we disallow slave to be brought up by user, or to
>>>> not set IFF_UP flag but instead use the internal one, could end up with
>>>> substantial behavioral change that breaks scripts. Consider any admin
>>>> script that does `ip link set dev ... up' successfully just assumes the
>>>> link is up and subsequent operation can be done as usual.
> How would it work when carrier is off?
>
>> While it *may*
>>>> work for dracut (yet to be verified), I'm a bit concerned that there are
>>>> more scripts to be converted than those that don't follow volatile
>>>> failover slave names. It's technically doable, but may not worth the
>>>> effort (in terms of porting existing scripts/apps).
>>>>
>>>> Thanks
>>>> -Siwei
>>> Won't work for most devices.  Many devices turn off PHY and link layer
>>> if not IFF_UP
>> True, that's what I said about introducing internal state for those driver
>> and other kernel component. Very invasive change indeed.
>>
>> -Siwei
> Well I did say it's vague.
> How about hiding IFF_UP from dev_get_flags (and probably
> __dev_change_flags)?
>
Is that any different? The kernel change has a small footprint for sure,
but the discrepancy is still there. Anyone who writes code for IFF_UP
will not notice IFF_FAILOVER_SLAVE.

Not to mention more userspace "fixup" work has to be done due to this 
change.

-Siwei
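
To make the discrepancy concrete, here is one possible reading of the "hide IFF_UP from dev_get_flags" idea, modelled in userspace. No such patch appears in this thread, so everything below is an assumption: `struct sketch_netdev` and `sketch_get_flags()` are hypothetical, and the `SKETCH_*` constants only mirror the bit positions of IFF_UP and IFF_FAILOVER_SLAVE.

```c
/* Hypothetical constants mirroring the relevant flag bits. */
#define SKETCH_IFF_UP             (1u << 0)
#define SKETCH_IFF_FAILOVER_SLAVE (1u << 28)

/* Hypothetical, heavily reduced stand-in for struct net_device. */
struct sketch_netdev {
	unsigned int flags;      /* internal state, still carries IFF_UP */
	unsigned int priv_flags; /* kernel-private flags */
};

/* Reports flags the way userspace would see them under this idea:
 * IFF_UP is masked out for failover slaves, while dev->flags itself
 * is left untouched so drivers keep seeing the device as up. */
unsigned int sketch_get_flags(const struct sketch_netdev *dev)
{
	unsigned int flags = dev->flags;

	if (dev->priv_flags & SKETCH_IFF_FAILOVER_SLAVE)
		flags &= ~SKETCH_IFF_UP;
	return flags;
}
```

The sketch shows where the internal and the reported state diverge: the slave stays up internally, yet a reader of the reported flags sees it as down.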
Michael S. Tsirkin March 6, 2019, 12:36 a.m. UTC | #9
On Tue, Mar 05, 2019 at 04:20:50PM -0800, si-wei liu wrote:
> 
> 
> On 3/5/2019 4:06 PM, Michael S. Tsirkin wrote:
> > On Tue, Mar 05, 2019 at 11:35:50AM -0800, si-wei liu wrote:
> > > 
> > > On 3/5/2019 11:24 AM, Stephen Hemminger wrote:
> > > > On Tue, 5 Mar 2019 11:19:32 -0800
> > > > si-wei liu <si-wei.liu@oracle.com> wrote:
> > > > 
> > > > > > I have a vague idea: would it work to *not* set
> > > > > > IFF_UP on slave devices at all?
> > > > > Hmm, I ever thought about this option, and it appears this solution is
> > > > > more invasive than required to convert existing scripts, despite the
> > > > > controversy of introducing internal netdev state to differentiate user
> > > > > visible state. Either we disallow slave to be brought up by user, or to
> > > > > not set IFF_UP flag but instead use the internal one, could end up with
> > > > > substantial behavioral change that breaks scripts. Consider any admin
> > > > > script that does `ip link set dev ... up' successfully just assumes the
> > > > > link is up and subsequent operation can be done as usual.
> > How would it work when carrier is off?
> > 
> > > While it *may*
> > > > > work for dracut (yet to be verified), I'm a bit concerned that there are
> > > > > more scripts to be converted than those that don't follow volatile
> > > > > failover slave names. It's technically doable, but may not worth the
> > > > > effort (in terms of porting existing scripts/apps).
> > > > > 
> > > > > Thanks
> > > > > -Siwei
> > > > Won't work for most devices.  Many devices turn off PHY and link layer
> > > > if not IFF_UP
> > > True, that's what I said about introducing internal state for those driver
> > > and other kernel component. Very invasive change indeed.
> > > 
> > > -Siwei
> > Well I did say it's vague.
> > How about hiding IFF_UP from dev_get_flags (and probably
> > __dev_change_flags)?
> > 
> Any different? This has small footprint for the kernel change for sure,
> while the discrepancy is still there. Anyone who writes code for IFF_UP will
> not notice IFF_FAILOVER_SLAVE.
> 
> Not to mention more userspace "fixup" work has to be done due to this
> change.
> 
> -Siwei
> 
> 

Point is it's ok since most userspace should just ignore slaves
- hopefully it will just ignore it since it already
ignores interfaces that are down.
Si-Wei Liu March 6, 2019, 12:51 a.m. UTC | #10
On 3/5/2019 4:36 PM, Michael S. Tsirkin wrote:
> On Tue, Mar 05, 2019 at 04:20:50PM -0800, si-wei liu wrote:
>>
>> On 3/5/2019 4:06 PM, Michael S. Tsirkin wrote:
>>> On Tue, Mar 05, 2019 at 11:35:50AM -0800, si-wei liu wrote:
>>>> On 3/5/2019 11:24 AM, Stephen Hemminger wrote:
>>>>> On Tue, 5 Mar 2019 11:19:32 -0800
>>>>> si-wei liu <si-wei.liu@oracle.com> wrote:
>>>>>
>>>>>>> I have a vague idea: would it work to *not* set
>>>>>>> IFF_UP on slave devices at all?
>>>>>> Hmm, I ever thought about this option, and it appears this solution is
>>>>>> more invasive than required to convert existing scripts, despite the
>>>>>> controversy of introducing internal netdev state to differentiate user
>>>>>> visible state. Either we disallow slave to be brought up by user, or to
>>>>>> not set IFF_UP flag but instead use the internal one, could end up with
>>>>>> substantial behavioral change that breaks scripts. Consider any admin
>>>>>> script that does `ip link set dev ... up' successfully just assumes the
>>>>>> link is up and subsequent operation can be done as usual.
>>> How would it work when carrier is off?
>>>
>>>> While it *may*
>>>>>> work for dracut (yet to be verified), I'm a bit concerned that there are
>>>>>> more scripts to be converted than those that don't follow volatile
>>>>>> failover slave names. It's technically doable, but may not worth the
>>>>>> effort (in terms of porting existing scripts/apps).
>>>>>>
>>>>>> Thanks
>>>>>> -Siwei
>>>>> Won't work for most devices.  Many devices turn off PHY and link layer
>>>>> if not IFF_UP
>>>> True, that's what I said about introducing internal state for those driver
>>>> and other kernel component. Very invasive change indeed.
>>>>
>>>> -Siwei
>>> Well I did say it's vague.
>>> How about hiding IFF_UP from dev_get_flags (and probably
>>> __dev_change_flags)?
>>>
>> Any different? This has small footprint for the kernel change for sure,
>> while the discrepancy is still there. Anyone who writes code for IFF_UP will
>> not notice IFF_FAILOVER_SLAVE.
>>
>> Not to mention more userspace "fixup" work has to be done due to this
>> change.
>>
>> -Siwei
>>
>>
> Point is it's ok since most userspace should just ignore slaves
> - hopefully it will just ignore it since it already
> ignores interfaces that are down.
An admin script may assume the interface can be brought up and carry on
with further operations without checking the UP flag. This doesn't look
like a reliable way to prohibit userspace from operating on slaves.

-Siwei
Michael S. Tsirkin March 6, 2019, 6:43 a.m. UTC | #11
On Tue, Mar 05, 2019 at 04:51:00PM -0800, si-wei liu wrote:
> 
> 
> On 3/5/2019 4:36 PM, Michael S. Tsirkin wrote:
> > On Tue, Mar 05, 2019 at 04:20:50PM -0800, si-wei liu wrote:
> > > 
> > > On 3/5/2019 4:06 PM, Michael S. Tsirkin wrote:
> > > > On Tue, Mar 05, 2019 at 11:35:50AM -0800, si-wei liu wrote:
> > > > > On 3/5/2019 11:24 AM, Stephen Hemminger wrote:
> > > > > > On Tue, 5 Mar 2019 11:19:32 -0800
> > > > > > si-wei liu <si-wei.liu@oracle.com> wrote:
> > > > > > 
> > > > > > > > I have a vague idea: would it work to *not* set
> > > > > > > > IFF_UP on slave devices at all?
> > > > > > > Hmm, I ever thought about this option, and it appears this solution is
> > > > > > > more invasive than required to convert existing scripts, despite the
> > > > > > > controversy of introducing internal netdev state to differentiate user
> > > > > > > visible state. Either we disallow slave to be brought up by user, or to
> > > > > > > not set IFF_UP flag but instead use the internal one, could end up with
> > > > > > > substantial behavioral change that breaks scripts. Consider any admin
> > > > > > > script that does `ip link set dev ... up' successfully just assumes the
> > > > > > > link is up and subsequent operation can be done as usual.
> > > > How would it work when carrier is off?
> > > > 
> > > > > While it *may*
> > > > > > > work for dracut (yet to be verified), I'm a bit concerned that there are
> > > > > > > more scripts to be converted than those that don't follow volatile
> > > > > > > failover slave names. It's technically doable, but may not worth the
> > > > > > > effort (in terms of porting existing scripts/apps).
> > > > > > > 
> > > > > > > Thanks
> > > > > > > -Siwei
> > > > > > Won't work for most devices.  Many devices turn off PHY and link layer
> > > > > > if not IFF_UP
> > > > > True, that's what I said about introducing internal state for those driver
> > > > > and other kernel component. Very invasive change indeed.
> > > > > 
> > > > > -Siwei
> > > > Well I did say it's vague.
> > > > How about hiding IFF_UP from dev_get_flags (and probably
> > > > __dev_change_flags)?
> > > > 
> > > Any different? This has small footprint for the kernel change for sure,
> > > while the discrepancy is still there. Anyone who writes code for IFF_UP will
> > > not notice IFF_FAILOVER_SLAVE.
> > > 
> > > Not to mention more userspace "fixup" work has to be done due to this
> > > change.
> > > 
> > > -Siwei
> > > 
> > > 
> > Point is it's ok since most userspace should just ignore slaves
> > - hopefully it will just ignore it since it already
> > ignores interfaces that are down.
> Admin script thought the interface could be bright up and do further
> operations without checking the UP flag.

These scripts then would be broken on any box with multiple interfaces
since not all of these would have carrier.


> It doesn't look to be a reliable
> way of prohibit userspace from operating against slaves.
> 
> -Siwei
> 
> 

This does not mean we shouldn't make an effort to disable broken
configurations.

I am not arguing against your patch. Not at all. I see better
hiding of slaves as a separate enhancement.


Acked-by: Michael S. Tsirkin <mst@redhat.com>
Si-Wei Liu March 6, 2019, 7:15 a.m. UTC | #12
On 3/5/2019 10:43 PM, Michael S. Tsirkin wrote:
> On Tue, Mar 05, 2019 at 04:51:00PM -0800, si-wei liu wrote:
>>
>> On 3/5/2019 4:36 PM, Michael S. Tsirkin wrote:
>>> On Tue, Mar 05, 2019 at 04:20:50PM -0800, si-wei liu wrote:
>>>> On 3/5/2019 4:06 PM, Michael S. Tsirkin wrote:
>>>>> On Tue, Mar 05, 2019 at 11:35:50AM -0800, si-wei liu wrote:
>>>>>> On 3/5/2019 11:24 AM, Stephen Hemminger wrote:
>>>>>>> On Tue, 5 Mar 2019 11:19:32 -0800
>>>>>>> si-wei liu <si-wei.liu@oracle.com> wrote:
>>>>>>>
>>>>>>>>> I have a vague idea: would it work to *not* set
>>>>>>>>> IFF_UP on slave devices at all?
>>>>>>>> Hmm, I ever thought about this option, and it appears this solution is
>>>>>>>> more invasive than required to convert existing scripts, despite the
>>>>>>>> controversy of introducing internal netdev state to differentiate user
>>>>>>>> visible state. Either we disallow slave to be brought up by user, or to
>>>>>>>> not set IFF_UP flag but instead use the internal one, could end up with
>>>>>>>> substantial behavioral change that breaks scripts. Consider any admin
>>>>>>>> script that does `ip link set dev ... up' successfully just assumes the
>>>>>>>> link is up and subsequent operation can be done as usual.
>>>>> How would it work when carrier is off?
>>>>>
>>>>>> While it *may*
>>>>>>>> work for dracut (yet to be verified), I'm a bit concerned that there are
>>>>>>>> more scripts to be converted than those that don't follow volatile
>>>>>>>> failover slave names. It's technically doable, but may not worth the
>>>>>>>> effort (in terms of porting existing scripts/apps).
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> -Siwei
>>>>>>> Won't work for most devices.  Many devices turn off PHY and link layer
>>>>>>> if not IFF_UP
>>>>>> True, that's what I said about introducing internal state for those driver
>>>>>> and other kernel component. Very invasive change indeed.
>>>>>>
>>>>>> -Siwei
>>>>> Well I did say it's vague.
>>>>> How about hiding IFF_UP from dev_get_flags (and probably
>>>>> __dev_change_flags)?
>>>>>
>>>> Any different? This has small footprint for the kernel change for sure,
>>>> while the discrepancy is still there. Anyone who writes code for IFF_UP will
>>>> not notice IFF_FAILOVER_SLAVE.
>>>>
>>>> Not to mention more userspace "fixup" work has to be done due to this
>>>> change.
>>>>
>>>> -Siwei
>>>>
>>>>
>>> Point is it's ok since most userspace should just ignore slaves
>>> - hopefully it will just ignore it since it already
>>> ignores interfaces that are down.
>> Admin script thought the interface could be bright up and do further
>> operations without checking the UP flag.
> These scripts then would be broken  on any box with multiple interfaces
> since not all of these would have carrier.
Consider a script that executes `ifconfig ... up' and, once it succeeds,
runs tcpdump or some other command that relies on the interface being UP.
It's quite common that such scripts don't check the UP flag but instead
rely on the well-known convention that an exit status of 0 means the
interface is UP. This change might well break scripts of that kind.

>
>
>> It doesn't look to be a reliable
>> way of prohibit userspace from operating against slaves.
>>
>> -Siwei
>>
>>
> This does not mean we shouldn't make an effort to disable broken
> configurations.
>
> I am not arguing against your patch. Not at all. I see better
> hiding of slaves as a separate enhancement.
I understand, but my point is that we should try to minimize unnecessary
side effects on current usage in whatever "hiding" effort we make.
It's hard to find the right tradeoff sometimes.

>
>
> Acked-by: Michael S. Tsirkin <mst@redhat.com>
>
>
Thank you.

-Siwei
Michael S. Tsirkin March 6, 2019, 7:23 a.m. UTC | #13
On Tue, Mar 05, 2019 at 11:15:06PM -0800, si-wei liu wrote:
> 
> 
> On 3/5/2019 10:43 PM, Michael S. Tsirkin wrote:
> > On Tue, Mar 05, 2019 at 04:51:00PM -0800, si-wei liu wrote:
> > > 
> > > On 3/5/2019 4:36 PM, Michael S. Tsirkin wrote:
> > > > On Tue, Mar 05, 2019 at 04:20:50PM -0800, si-wei liu wrote:
> > > > > On 3/5/2019 4:06 PM, Michael S. Tsirkin wrote:
> > > > > > On Tue, Mar 05, 2019 at 11:35:50AM -0800, si-wei liu wrote:
> > > > > > > On 3/5/2019 11:24 AM, Stephen Hemminger wrote:
> > > > > > > > On Tue, 5 Mar 2019 11:19:32 -0800
> > > > > > > > si-wei liu <si-wei.liu@oracle.com> wrote:
> > > > > > > > 
> > > > > > > > > > I have a vague idea: would it work to *not* set
> > > > > > > > > > IFF_UP on slave devices at all?
> > > > > > > > > Hmm, I ever thought about this option, and it appears this solution is
> > > > > > > > > more invasive than required to convert existing scripts, despite the
> > > > > > > > > controversy of introducing internal netdev state to differentiate user
> > > > > > > > > visible state. Either we disallow slave to be brought up by user, or to
> > > > > > > > > not set IFF_UP flag but instead use the internal one, could end up with
> > > > > > > > > substantial behavioral change that breaks scripts. Consider any admin
> > > > > > > > > script that does `ip link set dev ... up' successfully just assumes the
> > > > > > > > > link is up and subsequent operation can be done as usual.
> > > > > > How would it work when carrier is off?
> > > > > > 
> > > > > > > While it *may*
> > > > > > > > > work for dracut (yet to be verified), I'm a bit concerned that there are
> > > > > > > > > more scripts to be converted than those that don't follow volatile
> > > > > > > > > failover slave names. It's technically doable, but may not worth the
> > > > > > > > > effort (in terms of porting existing scripts/apps).
> > > > > > > > > 
> > > > > > > > > Thanks
> > > > > > > > > -Siwei
> > > > > > > > Won't work for most devices.  Many devices turn off PHY and link layer
> > > > > > > > if not IFF_UP
> > > > > > > True, that's what I said about introducing internal state for those driver
> > > > > > > and other kernel component. Very invasive change indeed.
> > > > > > > 
> > > > > > > -Siwei
> > > > > > Well I did say it's vague.
> > > > > > How about hiding IFF_UP from dev_get_flags (and probably
> > > > > > __dev_change_flags)?
> > > > > > 
> > > > > Any different? This has small footprint for the kernel change for sure,
> > > > > while the discrepancy is still there. Anyone who writes code for IFF_UP will
> > > > > not notice IFF_FAILOVER_SLAVE.
> > > > > 
> > > > > Not to mention more userspace "fixup" work has to be done due to this
> > > > > change.
> > > > > 
> > > > > -Siwei
> > > > > 
> > > > > 
> > > > Point is it's ok since most userspace should just ignore slaves
> > > > - hopefully it will just ignore it since it already
> > > > ignores interfaces that are down.
> > > Admin script thought the interface could be bright up and do further
> > > operations without checking the UP flag.
> > These scripts then would be broken  on any box with multiple interfaces
> > since not all of these would have carrier.
> Consider a script executing `ifconfig ... up' and once succeeds runs tcpdump
> or some other command relying on UP interface. It's quite common that those
> scripts don't check the UP flag but instead just rely on the well-known fact
> that the command exits with 0 meaning the interface should be UP. This
> change might well break scripts of that kind.

I am sorry I don't get it. Could you give an example
of a script that works now but would be broken?


> > 
> > 
> > > It doesn't look to be a reliable
> > > way of prohibit userspace from operating against slaves.
> > > 
> > > -Siwei
> > > 
> > > 
> > This does not mean we shouldn't make an effort to disable broken
> > configurations.
> > 
> > I am not arguing against your patch. Not at all. I see better
> > hiding of slaves as a separate enhancement.
> I understand, but my point is we should try to minimize unnecessary side
> impact to the current usage for whatever "hiding" effort we can make. It's
> hard to find a tradeoff sometimes.

Yes if some userspace made an assumption and it worked, we should keep
it working I think. I don't necessarily agree we should worry too much
about theoretical issues. In half a year since the feature got merged
it's unlikely there are millions of slightly different scripts using it.

> > 
> > 
> > Acked-by: Michael S. Tsirkin <mst@redhat.com>
> > 
> > 
> Thank you.
> 
> -Siwei
Si-Wei Liu March 6, 2019, 8:20 a.m. UTC | #14
On 3/5/2019 11:23 PM, Michael S. Tsirkin wrote:
> On Tue, Mar 05, 2019 at 11:15:06PM -0800, si-wei liu wrote:
>>
>> On 3/5/2019 10:43 PM, Michael S. Tsirkin wrote:
>>> On Tue, Mar 05, 2019 at 04:51:00PM -0800, si-wei liu wrote:
>>>> On 3/5/2019 4:36 PM, Michael S. Tsirkin wrote:
>>>>> On Tue, Mar 05, 2019 at 04:20:50PM -0800, si-wei liu wrote:
>>>>>> On 3/5/2019 4:06 PM, Michael S. Tsirkin wrote:
>>>>>>> On Tue, Mar 05, 2019 at 11:35:50AM -0800, si-wei liu wrote:
>>>>>>>> On 3/5/2019 11:24 AM, Stephen Hemminger wrote:
>>>>>>>>> On Tue, 5 Mar 2019 11:19:32 -0800
>>>>>>>>> si-wei liu <si-wei.liu@oracle.com> wrote:
>>>>>>>>>
>>>>>>>>>>> I have a vague idea: would it work to *not* set
>>>>>>>>>>> IFF_UP on slave devices at all?
>>>>>>>>>> Hmm, I ever thought about this option, and it appears this solution is
>>>>>>>>>> more invasive than required to convert existing scripts, despite the
>>>>>>>>>> controversy of introducing internal netdev state to differentiate user
>>>>>>>>>> visible state. Either we disallow slave to be brought up by user, or to
>>>>>>>>>> not set IFF_UP flag but instead use the internal one, could end up with
>>>>>>>>>> substantial behavioral change that breaks scripts. Consider any admin
>>>>>>>>>> script that does `ip link set dev ... up' successfully just assumes the
>>>>>>>>>> link is up and subsequent operation can be done as usual.
>>>>>>> How would it work when carrier is off?
>>>>>>>
>>>>>>>> While it *may*
>>>>>>>>>> work for dracut (yet to be verified), I'm a bit concerned that there are
>>>>>>>>>> more scripts to be converted than those that don't follow volatile
>>>>>>>>>> failover slave names. It's technically doable, but may not worth the
>>>>>>>>>> effort (in terms of porting existing scripts/apps).
>>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>> -Siwei
>>>>>>>>> Won't work for most devices.  Many devices turn off PHY and link layer
>>>>>>>>> if not IFF_UP
>>>>>>>> True, that's what I said about introducing internal state for those driver
>>>>>>>> and other kernel component. Very invasive change indeed.
>>>>>>>>
>>>>>>>> -Siwei
>>>>>>> Well I did say it's vague.
>>>>>>> How about hiding IFF_UP from dev_get_flags (and probably
>>>>>>> __dev_change_flags)?
>>>>>>>
>>>>>> Any different? This has small footprint for the kernel change for sure,
>>>>>> while the discrepancy is still there. Anyone who writes code for IFF_UP will
>>>>>> not notice IFF_FAILOVER_SLAVE.
>>>>>>
>>>>>> Not to mention more userspace "fixup" work has to be done due to this
>>>>>> change.
>>>>>>
>>>>>> -Siwei
>>>>>>
>>>>>>
>>>>> Point is it's ok since most userspace should just ignore slaves
>>>>> - hopefully it will just ignore it since it already
>>>>> ignores interfaces that are down.
>>>> Admin script thought the interface could be bright up and do further
>>>> operations without checking the UP flag.
>>> These scripts then would be broken  on any box with multiple interfaces
>>> since not all of these would have carrier.
>> Consider a script executing `ifconfig ... up' and once succeeds runs tcpdump
>> or some other command relying on UP interface. It's quite common that those
>> scripts don't check the UP flag but instead just rely on the well-known fact
>> that the command exits with 0 meaning the interface should be UP. This
>> change might well break scripts of that kind.
> I am sorry I don't get it. Could you give an example
> of a script that works now but would be broken?

https://github.com/torvalds/linux/blob/master/tools/testing/selftests/net/netdevice.sh#L27
https://github.com/WPO-Foundation/wptagent/blob/master/internal/adb.py#L443
https://github.com/openstack/steth/blob/master/steth/agent/api.py#L134

There are more if you keep searching.

-Siwei

>
>
>>>
>>>> It doesn't look to be a reliable
>>>> way of prohibit userspace from operating against slaves.
>>>>
>>>> -Siwei
>>>>
>>>>
>>> This does not mean we shouldn't make an effort to disable broken
>>> configurations.
>>>
>>> I am not arguing against your patch. Not at all. I see better
>>> hiding of slaves as a separate enhancement.
>> I understand, but my point is we should try to minimize unnecessary side
>> impact to the current usage for whatever "hiding" effort we can make. It's
>> hard to find a tradeoff sometimes.
> Yes if some userspace made an assumption and it worked, we should keep
> it working I think. I don't necessarily agree we should worry too much
> about theoretical issues. In half a year since the feature got merged
> it's unlikely there are millions of slightly different scripts using it.
>
>>>
>>> Acked-by: Michael S. Tsirkin <mst@redhat.com>
>>>
>>>
>> Thank you.
>>
>> -Siwei
Jiri Pirko March 6, 2019, 12:04 p.m. UTC | #15
Tue, Mar 05, 2019 at 01:50:59AM CET, si-wei.liu@oracle.com wrote:
>When a netdev appears through hot plug then gets enslaved by a failover
>master that is already up and running, the slave will be opened
>right away after getting enslaved. Today there's a race that userspace
>(udev) may fail to rename the slave if the kernel (net_failover)
>opens the slave earlier than when the userspace rename happens.
>Unlike bond or team, the primary slave of failover can't be renamed by
>userspace ahead of time, since the kernel initiated auto-enslavement is
>unable to, or rather, is never meant to be synchronized with the rename
>request from userspace.
>
>The failover slave interfaces are not designed to be operated
>directly by userspace apps: IP configuration, filter rules for
>network traffic, etc. should all be done on the master
>interface. In general, userspace apps only care about the
>name of the master interface, while slave names are less important as
>long as admin users can see reliable names that may carry
>other information describing the netdev. For example, they can infer
>that "ens3nsby" is a standby slave of "ens3", while for a
>name like "eth0" they can't tell which master it belongs to.
>
>Historically the name of IFF_UP interface can't be changed because
>there might be admin script or management software that is already
>relying on such behavior and assumes that the slave name can't be
>changed once UP. But failover is special: with the in-kernel
>auto-enslavement mechanism, the userspace expectation for device
>enumeration and bring-up order is already broken. Previously initramfs
>and various userspace config tools were modified to bypass failover
>slaves because of auto-enslavement and duplicate MAC address. Similarly,
>in case that users care about seeing reliable slave name, the new type
>of failover slaves needs to be taken care of specifically in userspace
>anyway.
>
>For that to work, introduce a module-level tunable,
>"slave_rename_ok", that allows users to lift the rename restriction on
>a failover slave which is already UP. Although this change may
>break userspace components (most likely configuration scripts
>or management software) that assume a slave name can't be changed while
>UP, that is a relatively limited and controllable set among all userspace
>components, which can be fixed specifically to work with the new naming
>behavior of the failover slave. Userspace components interacting with
>slaves should be changed to operate on the failover master instead, as the
>failover slave is dynamic in nature and may come and go at any point.
>The goal is to make the role of failover slaves less relevant, so that
>all userspace only deals with the master in the long run. The default
>for "slave_rename_ok" is true (1). If userspace doesn't have
>the right support in place and users don't care about reliable
>userspace naming, the value can be set to false (0).
>
>Signed-off-by: Si-Wei.Liu@oracle.com
>Reviewed-by: Liran Alon <liran.alon@oracle.com>
>---
> include/linux/netdevice.h |  3 +++
> net/core/dev.c            |  3 ++-
> net/core/failover.c       | 11 +++++++++--
> 3 files changed, 14 insertions(+), 3 deletions(-)
>
>diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>index 857f8ab..6d9e4e0 100644
>--- a/include/linux/netdevice.h
>+++ b/include/linux/netdevice.h
>@@ -1487,6 +1487,7 @@ struct net_device_ops {
>  * @IFF_NO_RX_HANDLER: device doesn't support the rx_handler hook
>  * @IFF_FAILOVER: device is a failover master device
>  * @IFF_FAILOVER_SLAVE: device is lower dev of a failover master device
>+ * @IFF_SLAVE_RENAME_OK: rename is allowed while slave device is running
>  */
> enum netdev_priv_flags {
> 	IFF_802_1Q_VLAN			= 1<<0,
>@@ -1518,6 +1519,7 @@ enum netdev_priv_flags {
> 	IFF_NO_RX_HANDLER		= 1<<26,
> 	IFF_FAILOVER			= 1<<27,
> 	IFF_FAILOVER_SLAVE		= 1<<28,
>+	IFF_SLAVE_RENAME_OK		= 1<<29,
> };
> 
> #define IFF_802_1Q_VLAN			IFF_802_1Q_VLAN
>@@ -1548,6 +1550,7 @@ enum netdev_priv_flags {
> #define IFF_NO_RX_HANDLER		IFF_NO_RX_HANDLER
> #define IFF_FAILOVER			IFF_FAILOVER
> #define IFF_FAILOVER_SLAVE		IFF_FAILOVER_SLAVE
>+#define IFF_SLAVE_RENAME_OK		IFF_SLAVE_RENAME_OK
> 
> /**
>  *	struct net_device - The DEVICE structure.
>diff --git a/net/core/dev.c b/net/core/dev.c
>index 722d50d..ae070de 100644
>--- a/net/core/dev.c
>+++ b/net/core/dev.c
>@@ -1180,7 +1180,8 @@ int dev_change_name(struct net_device *dev, const char *newname)
> 	BUG_ON(!dev_net(dev));
> 
> 	net = dev_net(dev);
>-	if (dev->flags & IFF_UP)
>+	if (dev->flags & IFF_UP &&
>+	    !(dev->priv_flags & IFF_SLAVE_RENAME_OK))
> 		return -EBUSY;
> 
> 	write_seqcount_begin(&devnet_rename_seq);
>diff --git a/net/core/failover.c b/net/core/failover.c
>index 4a92a98..1fd8bbb 100644
>--- a/net/core/failover.c
>+++ b/net/core/failover.c
>@@ -16,6 +16,11 @@
> 
> static LIST_HEAD(failover_list);
> static DEFINE_SPINLOCK(failover_lock);
>+static bool slave_rename_ok = true;
>+
>+module_param(slave_rename_ok, bool, (S_IRUGO | S_IWUSR));
>+MODULE_PARM_DESC(slave_rename_ok,
>+		 "If set allow renaming the slave when failover master is up");

No module parameters please. If you need to set something do it using
rtnl_link_ops. Thanks.


> 
> static struct net_device *failover_get_bymac(u8 *mac, struct failover_ops **ops)
> {
>@@ -81,13 +86,15 @@ static int failover_slave_register(struct net_device *slave_dev)
> 	}
> 
> 	slave_dev->priv_flags |= IFF_FAILOVER_SLAVE;
>+	if (slave_rename_ok)
>+		slave_dev->priv_flags |= IFF_SLAVE_RENAME_OK;
> 
> 	if (fops && fops->slave_register &&
> 	    !fops->slave_register(slave_dev, failover_dev))
> 		return NOTIFY_OK;
> 
> 	netdev_upper_dev_unlink(slave_dev, failover_dev);
>-	slave_dev->priv_flags &= ~IFF_FAILOVER_SLAVE;
>+	slave_dev->priv_flags &= ~(IFF_FAILOVER_SLAVE | IFF_SLAVE_RENAME_OK);
> err_upper_link:
> 	netdev_rx_handler_unregister(slave_dev);
> done:
>@@ -121,7 +128,7 @@ int failover_slave_unregister(struct net_device *slave_dev)
> 
> 	netdev_rx_handler_unregister(slave_dev);
> 	netdev_upper_dev_unlink(slave_dev, failover_dev);
>-	slave_dev->priv_flags &= ~IFF_FAILOVER_SLAVE;
>+	slave_dev->priv_flags &= ~(IFF_FAILOVER_SLAVE | IFF_SLAVE_RENAME_OK);
> 
> 	if (fops && fops->slave_unregister &&
> 	    !fops->slave_unregister(slave_dev, failover_dev))
>-- 
>1.8.3.1
>
Liran Alon March 6, 2019, 11:36 p.m. UTC | #16
> On 6 Mar 2019, at 23:42, si-wei liu <si-wei.liu@oracle.com> wrote:
> 
> 
> 
> On 3/6/2019 1:36 PM, Samudrala, Sridhar wrote:
>> 
>> On 3/6/2019 1:26 PM, si-wei liu wrote:
>>> 
>>> 
>>> On 3/6/2019 4:04 AM, Jiri Pirko wrote:
>>>>> --- a/net/core/failover.c
>>>>> +++ b/net/core/failover.c
>>>>> @@ -16,6 +16,11 @@
>>>>> 
>>>>> static LIST_HEAD(failover_list);
>>>>> static DEFINE_SPINLOCK(failover_lock);
>>>>> +static bool slave_rename_ok = true;
>>>>> +
>>>>> +module_param(slave_rename_ok, bool, (S_IRUGO | S_IWUSR));
>>>>> +MODULE_PARM_DESC(slave_rename_ok,
>>>>> +		 "If set allow renaming the slave when failover master is up");
>>>>> 
>>>> No module parameters please. If you need to set something do it using
>>>> rtnl_link_ops. Thanks.
>>>> 
>>>> 
>>> I understand what you're asking for, but without a module parameter userspace doesn't work. During boot (dracut) the virtio netdev gets enslaved before userspace comes up, so failover has to determine the setting during initialization/creation. This config is not dynamic; at least for the life cycle of a particular failover link it shouldn't be changed. Without a module parameter, how does userspace specify this value during kernel initialization?
>>> 
>> Can we enable this by default and not make it configurable via a module parameter?
>> Is there any use case where someone expects rename to fail with failover slaves?
> Probably just to cater for those applications that assume a fixed name on an UP interface?
> 
> It's already the default for the configurable. I myself don't think that's a big problem for failover users. So far there's not even QEMU support, so I think everything can still be changed. I don't feel strongly against just fixing the behavior without introducing a configurable. But maybe Michael or others think differently...
> 
> If no one objects, I have no strong objection to making it fixed behavior.
> 
> -Siwei
> 

I agree we should just remove the module parameter.

-Liran

Patch

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 857f8ab..6d9e4e0 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1487,6 +1487,7 @@  struct net_device_ops {
  * @IFF_NO_RX_HANDLER: device doesn't support the rx_handler hook
  * @IFF_FAILOVER: device is a failover master device
  * @IFF_FAILOVER_SLAVE: device is lower dev of a failover master device
+ * @IFF_SLAVE_RENAME_OK: rename is allowed while slave device is running
  */
 enum netdev_priv_flags {
 	IFF_802_1Q_VLAN			= 1<<0,
@@ -1518,6 +1519,7 @@  enum netdev_priv_flags {
 	IFF_NO_RX_HANDLER		= 1<<26,
 	IFF_FAILOVER			= 1<<27,
 	IFF_FAILOVER_SLAVE		= 1<<28,
+	IFF_SLAVE_RENAME_OK		= 1<<29,
 };
 
 #define IFF_802_1Q_VLAN			IFF_802_1Q_VLAN
@@ -1548,6 +1550,7 @@  enum netdev_priv_flags {
 #define IFF_NO_RX_HANDLER		IFF_NO_RX_HANDLER
 #define IFF_FAILOVER			IFF_FAILOVER
 #define IFF_FAILOVER_SLAVE		IFF_FAILOVER_SLAVE
+#define IFF_SLAVE_RENAME_OK		IFF_SLAVE_RENAME_OK
 
 /**
  *	struct net_device - The DEVICE structure.
diff --git a/net/core/dev.c b/net/core/dev.c
index 722d50d..ae070de 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1180,7 +1180,8 @@  int dev_change_name(struct net_device *dev, const char *newname)
 	BUG_ON(!dev_net(dev));
 
 	net = dev_net(dev);
-	if (dev->flags & IFF_UP)
+	if (dev->flags & IFF_UP &&
+	    !(dev->priv_flags & IFF_SLAVE_RENAME_OK))
 		return -EBUSY;
 
 	write_seqcount_begin(&devnet_rename_seq);
diff --git a/net/core/failover.c b/net/core/failover.c
index 4a92a98..1fd8bbb 100644
--- a/net/core/failover.c
+++ b/net/core/failover.c
@@ -16,6 +16,11 @@ 
 
 static LIST_HEAD(failover_list);
 static DEFINE_SPINLOCK(failover_lock);
+static bool slave_rename_ok = true;
+
+module_param(slave_rename_ok, bool, (S_IRUGO | S_IWUSR));
+MODULE_PARM_DESC(slave_rename_ok,
+		 "If set allow renaming the slave when failover master is up");
 
 static struct net_device *failover_get_bymac(u8 *mac, struct failover_ops **ops)
 {
@@ -81,13 +86,15 @@  static int failover_slave_register(struct net_device *slave_dev)
 	}
 
 	slave_dev->priv_flags |= IFF_FAILOVER_SLAVE;
+	if (slave_rename_ok)
+		slave_dev->priv_flags |= IFF_SLAVE_RENAME_OK;
 
 	if (fops && fops->slave_register &&
 	    !fops->slave_register(slave_dev, failover_dev))
 		return NOTIFY_OK;
 
 	netdev_upper_dev_unlink(slave_dev, failover_dev);
-	slave_dev->priv_flags &= ~IFF_FAILOVER_SLAVE;
+	slave_dev->priv_flags &= ~(IFF_FAILOVER_SLAVE | IFF_SLAVE_RENAME_OK);
 err_upper_link:
 	netdev_rx_handler_unregister(slave_dev);
 done:
@@ -121,7 +128,7 @@  int failover_slave_unregister(struct net_device *slave_dev)
 
 	netdev_rx_handler_unregister(slave_dev);
 	netdev_upper_dev_unlink(slave_dev, failover_dev);
-	slave_dev->priv_flags &= ~IFF_FAILOVER_SLAVE;
+	slave_dev->priv_flags &= ~(IFF_FAILOVER_SLAVE | IFF_SLAVE_RENAME_OK);
 
 	if (fops && fops->slave_unregister &&
 	    !fops->slave_unregister(slave_dev, failover_dev))
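
To make the effect of the patch easier to follow, here is a minimal userspace sketch of the rename gate it adds to dev_change_name() and of the flag handling in failover_slave_register()/failover_slave_unregister(). It is a model only, not kernel code: struct model_netdev and every model_* / MODEL_* name are hypothetical stand-ins, the -EINVAL length check is a simplification of the kernel's name validation, and the slave_rename_ok argument stands in for the module parameter that the thread suggests dropping.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/*
 * Hypothetical userspace stand-ins for the kernel definitions.
 * IFF_UP is bit 0 in the kernel; bits 28 and 29 follow the patch.
 */
#define MODEL_IFF_UP              (1u << 0)
#define MODEL_IFF_FAILOVER_SLAVE  (1u << 28)
#define MODEL_IFF_SLAVE_RENAME_OK (1u << 29)
#define MODEL_IFNAMSIZ            16

/* Reduced model of struct net_device: only what the rename gate needs. */
struct model_netdev {
	unsigned int flags;       /* models dev->flags */
	unsigned int priv_flags;  /* models dev->priv_flags */
	char name[MODEL_IFNAMSIZ];
};

/* Models the flag handling added to failover_slave_register(). */
void model_slave_register(struct model_netdev *dev, int slave_rename_ok)
{
	assert(dev != NULL);
	dev->priv_flags |= MODEL_IFF_FAILOVER_SLAVE;
	if (slave_rename_ok)
		dev->priv_flags |= MODEL_IFF_SLAVE_RENAME_OK;
}

/* Models failover_slave_unregister(): both flags are cleared together. */
void model_slave_unregister(struct model_netdev *dev)
{
	assert(dev != NULL);
	dev->priv_flags &= ~(MODEL_IFF_FAILOVER_SLAVE | MODEL_IFF_SLAVE_RENAME_OK);
}

/*
 * Models the patched check in dev_change_name(): an UP device is
 * refused with -EBUSY unless it carries the rename-ok flag.
 */
int model_change_name(struct model_netdev *dev, const char *newname)
{
	assert(dev != NULL && newname != NULL);

	if ((dev->flags & MODEL_IFF_UP) &&
	    !(dev->priv_flags & MODEL_IFF_SLAVE_RENAME_OK))
		return -EBUSY;

	/* Simplified validation; the kernel applies stricter name rules. */
	if (newname[0] == '\0' || strlen(newname) >= MODEL_IFNAMSIZ)
		return -EINVAL;

	strcpy(dev->name, newname);
	return 0;
}
```

In this model, a device that is UP and not registered as a failover slave still gets -EBUSY on rename, which is the pre-patch behavior for every interface. After model_slave_register(dev, 1) the same rename succeeds while the device stays UP, and after model_slave_unregister() the restriction applies again.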