Message ID | 1551747059-11831-1-git-send-email-si-wei.liu@oracle.com |
---|---|
State | Superseded |
Delegated to: | David Miller |
Series | [RFC,net-next] failover: allow name change on IFF_UP slave interfaces |

On Mon, Mar 04, 2019 at 07:50:59PM -0500, Si-Wei Liu wrote:
> When a netdev appears through hot plug then gets enslaved by a failover
> master that is already up and running, the slave will be opened
> right away after getting enslaved. Today there's a race that userspace
> (udev) may fail to rename the slave if the kernel (net_failover)
> opens the slave earlier than when the userspace rename happens.
> Unlike bond or team, the primary slave of failover can't be renamed by
> userspace ahead of time, since the kernel initiated auto-enslavement is
> unable to, or rather, is never meant to be synchronized with the rename
> request from userspace.
>
> As the failover slave interfaces are not designed to be operated
> directly by userspace apps: IP configuration, filter rules with
> regard to network traffic passing and etc., should all be done on master
> interface. In general, userspace apps only care about the
> name of master interface, while slave names are less important as long
> as admin users can see reliable names that may carry
> other information describing the netdev. For e.g., they can infer that
> "ens3nsby" is a standby slave of "ens3", while for a
> name like "eth0" they can't tell which master it belongs to.
>
> Historically the name of IFF_UP interface can't be changed because
> there might be admin script or management software that is already
> relying on such behavior and assumes that the slave name can't be
> changed once UP. But failover is special: with the in-kernel
> auto-enslavement mechanism, the userspace expectation for device
> enumeration and bring-up order is already broken. Previously initramfs
> and various userspace config tools were modified to bypass failover
> slaves because of auto-enslavement and duplicate MAC address. Similarly,
> in case that users care about seeing reliable slave name, the new type
> of failover slaves needs to be taken care of specifically in userspace
> anyway.
>
> For that to work, now introduce a module-level tunable,
> "slave_rename_ok" that allows users to lift up the rename restriction on
> failover slave which is already UP. Although it's possible this change
> potentially break userspace component (most likely configuration scripts
> or management software) that assumes slave name can't be changed while
> UP, it's relatively a limited and controllable set among all userspace
> components, which can be fixed specifically to work with the new naming
> behavior of the failover slave. Userspace component interacting with
> slaves should be changed to operate on failover master instead, as the
> failover slave is dynamic in nature which may come and go at any point.
> The goal is to make the role of failover slaves less relevant, and
> all userspace should only deal with master in the long run. The default
> for the "slave_rename_ok" is set to true(1). If userspace doesn't have
> the right support in place meanwhile users don't care about reliable
> userspace naming, the value can be set to false(0).
>
> Signed-off-by: Si-Wei.Liu@oracle.com
> Reviewed-by: Liran Alon <liran.alon@oracle.com>

Not sure which of the versions I should reply to.

I have a vague idea: would it work to *not* set
IFF_UP on slave devices at all?

Would this reduce the chances of existing scripts such as dracut being
confused?

And this leaves open the option for scripts to address
slaves by checking some custom attribute.

> ---
>  include/linux/netdevice.h |  3 +++
>  net/core/dev.c            |  3 ++-
>  net/core/failover.c       | 11 +++++++++--
>  3 files changed, 14 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 857f8ab..6d9e4e0 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -1487,6 +1487,7 @@ struct net_device_ops {
>   * @IFF_NO_RX_HANDLER: device doesn't support the rx_handler hook
>   * @IFF_FAILOVER: device is a failover master device
>   * @IFF_FAILOVER_SLAVE: device is lower dev of a failover master device
> + * @IFF_SLAVE_RENAME_OK: rename is allowed while slave device is running
>   */
>  enum netdev_priv_flags {
>  	IFF_802_1Q_VLAN			= 1<<0,
> @@ -1518,6 +1519,7 @@ enum netdev_priv_flags {
>  	IFF_NO_RX_HANDLER		= 1<<26,
>  	IFF_FAILOVER			= 1<<27,
>  	IFF_FAILOVER_SLAVE		= 1<<28,
> +	IFF_SLAVE_RENAME_OK		= 1<<29,
>  };
>
>  #define IFF_802_1Q_VLAN			IFF_802_1Q_VLAN
> @@ -1548,6 +1550,7 @@ enum netdev_priv_flags {
>  #define IFF_NO_RX_HANDLER		IFF_NO_RX_HANDLER
>  #define IFF_FAILOVER			IFF_FAILOVER
>  #define IFF_FAILOVER_SLAVE		IFF_FAILOVER_SLAVE
> +#define IFF_SLAVE_RENAME_OK		IFF_SLAVE_RENAME_OK
>
>  /**
>   *	struct net_device - The DEVICE structure.
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 722d50d..ae070de 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -1180,7 +1180,8 @@ int dev_change_name(struct net_device *dev, const char *newname)
>  	BUG_ON(!dev_net(dev));
>
>  	net = dev_net(dev);
> -	if (dev->flags & IFF_UP)
> +	if (dev->flags & IFF_UP &&
> +	    !(dev->priv_flags & IFF_SLAVE_RENAME_OK))
>  		return -EBUSY;
>
>  	write_seqcount_begin(&devnet_rename_seq);
> diff --git a/net/core/failover.c b/net/core/failover.c
> index 4a92a98..1fd8bbb 100644
> --- a/net/core/failover.c
> +++ b/net/core/failover.c
> @@ -16,6 +16,11 @@
>
>  static LIST_HEAD(failover_list);
>  static DEFINE_SPINLOCK(failover_lock);
> +static bool slave_rename_ok = true;
> +
> +module_param(slave_rename_ok, bool, (S_IRUGO | S_IWUSR));
> +MODULE_PARM_DESC(slave_rename_ok,
> +		 "If set allow renaming the slave when failover master is up");
>
>  static struct net_device *failover_get_bymac(u8 *mac, struct failover_ops **ops)
>  {
> @@ -81,13 +86,15 @@ static int failover_slave_register(struct net_device *slave_dev)
>  	}
>
>  	slave_dev->priv_flags |= IFF_FAILOVER_SLAVE;
> +	if (slave_rename_ok)
> +		slave_dev->priv_flags |= IFF_SLAVE_RENAME_OK;
>
>  	if (fops && fops->slave_register &&
>  	    !fops->slave_register(slave_dev, failover_dev))
>  		return NOTIFY_OK;
>
>  	netdev_upper_dev_unlink(slave_dev, failover_dev);
> -	slave_dev->priv_flags &= ~IFF_FAILOVER_SLAVE;
> +	slave_dev->priv_flags &= ~(IFF_FAILOVER_SLAVE | IFF_SLAVE_RENAME_OK);
>  err_upper_link:
>  	netdev_rx_handler_unregister(slave_dev);
>  done:
> @@ -121,7 +128,7 @@ int failover_slave_unregister(struct net_device *slave_dev)
>
>  	netdev_rx_handler_unregister(slave_dev);
>  	netdev_upper_dev_unlink(slave_dev, failover_dev);
> -	slave_dev->priv_flags &= ~IFF_FAILOVER_SLAVE;
> +	slave_dev->priv_flags &= ~(IFF_FAILOVER_SLAVE | IFF_SLAVE_RENAME_OK);
>
>  	if (fops && fops->slave_unregister &&
>  	    !fops->slave_unregister(slave_dev, failover_dev))
> --
> 1.8.3.1

On 3/4/2019 6:33 PM, Michael S. Tsirkin wrote:
> On Mon, Mar 04, 2019 at 07:50:59PM -0500, Si-Wei Liu wrote:
>> [...]
>> Signed-off-by: Si-Wei.Liu@oracle.com
>> Reviewed-by: Liran Alon <liran.alon@oracle.com>
> Not sure which of the versions I should reply to.

Sorry for multiple copies sent. It's fine to reply to this one.

>
> I have a vague idea: would it work to *not* set
> IFF_UP on slave devices at all?

Hmm, I ever thought about this option, and it appears this solution is
more invasive than required to convert existing scripts, despite the
controversy of introducing internal netdev state to differentiate user
visible state. Either we disallow slave to be brought up by user, or to
not set IFF_UP flag but instead use the internal one, could end up with
substantial behavioral change that breaks scripts. Consider any admin
script that does `ip link set dev ... up' successfully just assumes the
link is up and subsequent operation can be done as usual. While it *may*
work for dracut (yet to be verified), I'm a bit concerned that there are
more scripts to be converted than those that don't follow volatile
failover slave names. It's technically doable, but may not worth the
effort (in terms of porting existing scripts/apps).

Thanks
-Siwei

>
> Would this reduce the chances of existing scripts such as dracut being
> confused?
>
> And this leaves open the option for scripts to address
> slaves by checking some custom attribute.
>
>> [...]

On Tue, 5 Mar 2019 11:19:32 -0800
si-wei liu <si-wei.liu@oracle.com> wrote:

> > I have a vague idea: would it work to *not* set
> > IFF_UP on slave devices at all?
> Hmm, I ever thought about this option, and it appears this solution is
> more invasive than required to convert existing scripts, despite the
> controversy of introducing internal netdev state to differentiate user
> visible state. Either we disallow slave to be brought up by user, or to
> not set IFF_UP flag but instead use the internal one, could end up with
> substantial behavioral change that breaks scripts. Consider any admin
> script that does `ip link set dev ... up' successfully just assumes the
> link is up and subsequent operation can be done as usual. While it *may*
> work for dracut (yet to be verified), I'm a bit concerned that there are
> more scripts to be converted than those that don't follow volatile
> failover slave names. It's technically doable, but may not worth the
> effort (in terms of porting existing scripts/apps).
>
> Thanks
> -Siwei

Won't work for most devices. Many devices turn off PHY and link layer
if not IFF_UP

On 3/5/2019 11:24 AM, Stephen Hemminger wrote:
> On Tue, 5 Mar 2019 11:19:32 -0800
> si-wei liu <si-wei.liu@oracle.com> wrote:
>
>>> I have a vague idea: would it work to *not* set
>>> IFF_UP on slave devices at all?
>> Hmm, I ever thought about this option, and it appears this solution is
>> more invasive than required to convert existing scripts, despite the
>> controversy of introducing internal netdev state to differentiate user
>> visible state. Either we disallow slave to be brought up by user, or to
>> not set IFF_UP flag but instead use the internal one, could end up with
>> substantial behavioral change that breaks scripts. Consider any admin
>> script that does `ip link set dev ... up' successfully just assumes the
>> link is up and subsequent operation can be done as usual. While it *may*
>> work for dracut (yet to be verified), I'm a bit concerned that there are
>> more scripts to be converted than those that don't follow volatile
>> failover slave names. It's technically doable, but may not worth the
>> effort (in terms of porting existing scripts/apps).
>>
>> Thanks
>> -Siwei
> Won't work for most devices. Many devices turn off PHY and link layer
> if not IFF_UP

True, that's what I said about introducing internal state for those driver
and other kernel component. Very invasive change indeed.

-Siwei

On Tue, Mar 05, 2019 at 11:19:32AM -0800, si-wei liu wrote:
>
>
> On 3/4/2019 6:33 PM, Michael S. Tsirkin wrote:
> > On Mon, Mar 04, 2019 at 07:50:59PM -0500, Si-Wei Liu wrote:
> > > [...]
> > > Signed-off-by: Si-Wei.Liu@oracle.com
> > > Reviewed-by: Liran Alon <liran.alon@oracle.com>
> > Not sure which of the versions I should reply to.
> Sorry for multiple copies sent. It's fine to reply to this one.
>
> >
> > I have a vague idea: would it work to *not* set
> > IFF_UP on slave devices at all?
> Hmm, I ever thought about this option, and it appears this solution is more
> invasive than required to convert existing scripts, despite the controversy
> of introducing internal netdev state to differentiate user visible state.
> Either we disallow slave to be brought up by user, or to not set IFF_UP flag
> but instead use the internal one, could end up with substantial behavioral
> change that breaks scripts. Consider any admin script that does `ip link set
> dev ... up' successfully just assumes the link is up and subsequent
> operation can be done as usual. While it *may* work for dracut (yet to be
> verified), I'm a bit concerned that there are more scripts to be converted
> than those that don't follow volatile failover slave names. It's technically
> doable, but may not worth the effort (in terms of porting existing
> scripts/apps).
>
> Thanks
> -Siwei

Right. Advantage could be that we prevent all kind of
misconfigurations e.g. when one has a route on a slave.

> >
> > Would this reduce the chances of existing scripts such as dracut being
> > confused?
> >
> > And this leaves open the option for scripts to address
> > slaves by checking some custom attribute.
> >
> > > [...]

On 3/5/2019 12:28 PM, Michael S. Tsirkin wrote:
> On Tue, Mar 05, 2019 at 11:19:32AM -0800, si-wei liu wrote:
>>
>> On 3/4/2019 6:33 PM, Michael S. Tsirkin wrote:
>>> On Mon, Mar 04, 2019 at 07:50:59PM -0500, Si-Wei Liu wrote:
>>>> [...]
>>>> Signed-off-by: Si-Wei.Liu@oracle.com
>>>> Reviewed-by: Liran Alon <liran.alon@oracle.com>
>>> Not sure which of the versions I should reply to.
>> Sorry for multiple copies sent. It's fine to reply to this one.
>>
>>> I have a vague idea: would it work to *not* set
>>> IFF_UP on slave devices at all?
>> Hmm, I ever thought about this option, and it appears this solution is more
>> invasive than required to convert existing scripts, despite the controversy
>> of introducing internal netdev state to differentiate user visible state.
>> Either we disallow slave to be brought up by user, or to not set IFF_UP flag
>> but instead use the internal one, could end up with substantial behavioral
>> change that breaks scripts. Consider any admin script that does `ip link set
>> dev ... up' successfully just assumes the link is up and subsequent
>> operation can be done as usual. While it *may* work for dracut (yet to be
>> verified), I'm a bit concerned that there are more scripts to be converted
>> than those that don't follow volatile failover slave names. It's technically
>> doable, but may not worth the effort (in terms of porting existing
>> scripts/apps).
>>
>> Thanks
>> -Siwei
>
> Right. Advantage could be that we prevent all kind of
> misconfigurations e.g. when one has a route on a slave.

The fix for the slave route problem is already there in dracut. The ship
has sailed, no matter how seamless upstream thought failover could work
with the existing userspace. I would rather avoid introducing more
breakage to userspace if there's simple yet less intrusive way to fix the
rename issue itself.

-Siwei

>
>>> Would this reduce the chances of existing scripts such as dracut being
>>> confused?
>>>
>>> And this leaves open the option for scripts to address
>>> slaves by checking some custom attribute.
>>>
>>>> [...]
On Tue, Mar 05, 2019 at 11:35:50AM -0800, si-wei liu wrote:
>
>
> On 3/5/2019 11:24 AM, Stephen Hemminger wrote:
> > On Tue, 5 Mar 2019 11:19:32 -0800
> > si-wei liu <si-wei.liu@oracle.com> wrote:
> >
> > > > I have a vague idea: would it work to *not* set
> > > > IFF_UP on slave devices at all?
> > > Hmm, I ever thought about this option, and it appears this solution is
> > > more invasive than required to convert existing scripts, despite the
> > > controversy of introducing internal netdev state to differentiate user
> > > visible state. Either we disallow slave to be brought up by user, or to
> > > not set IFF_UP flag but instead use the internal one, could end up with
> > > substantial behavioral change that breaks scripts. Consider any admin
> > > script that does `ip link set dev ... up' successfully just assumes the
> > > link is up and subsequent operation can be done as usual.

How would it work when carrier is off?

> > > While it *may*
> > > work for dracut (yet to be verified), I'm a bit concerned that there are
> > > more scripts to be converted than those that don't follow volatile
> > > failover slave names. It's technically doable, but may not worth the
> > > effort (in terms of porting existing scripts/apps).
> > >
> > > Thanks
> > > -Siwei
> > Won't work for most devices. Many devices turn off PHY and link layer
> > if not IFF_UP
> True, that's what I said about introducing internal state for those driver
> and other kernel component. Very invasive change indeed.
>
> -Siwei

Well I did say it's vague.
How about hiding IFF_UP from dev_get_flags (and probably
__dev_change_flags)?
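To make the "hide IFF_UP" suggestion concrete, here is a minimal userspace sketch of what masking the flag for failover slaves could look like. It is not code from this thread or from the kernel: `demo_visible_flags` and the `DEMO_*` constants are hypothetical stand-ins, with bit positions borrowed from the patch's enum and from the uapi definition of IFF_UP.

```c
/* Hypothetical illustration only: userspace stand-ins for kernel flags. */
#define DEMO_IFF_UP             (1u << 0)   /* same bit as IFF_UP in the uapi headers */
#define DEMO_IFF_FAILOVER_SLAVE (1u << 28)  /* bit shown in the patch's priv_flags enum */

/*
 * Flags as userspace would see them if IFF_UP were hidden on failover
 * slaves. All other flag bits pass through unchanged.
 */
static unsigned int demo_visible_flags(unsigned int flags,
				       unsigned int priv_flags)
{
	if (priv_flags & DEMO_IFF_FAILOVER_SLAVE)
		flags &= ~DEMO_IFF_UP;
	return flags;
}
```

Under this assumption a slave that is operationally up would still be reported as down to anything that reads the flags, which is the discrepancy debated in the replies that follow.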
On 3/5/2019 4:06 PM, Michael S. Tsirkin wrote:
> On Tue, Mar 05, 2019 at 11:35:50AM -0800, si-wei liu wrote:

[...]

> Well I did say it's vague.
> How about hiding IFF_UP from dev_get_flags (and probably
> __dev_change_flags)?
>
Any different? This has small footprint for the kernel change for sure,
while the discrepancy is still there. Anyone who writes code for IFF_UP will
not notice IFF_FAILOVER_SLAVE.

Not to mention more userspace "fixup" work has to be done due to this
change.

-Siwei
On Tue, Mar 05, 2019 at 04:20:50PM -0800, si-wei liu wrote:
> On 3/5/2019 4:06 PM, Michael S. Tsirkin wrote:

[...]

> > Well I did say it's vague.
> > How about hiding IFF_UP from dev_get_flags (and probably
> > __dev_change_flags)?
> >
> Any different? This has small footprint for the kernel change for sure,
> while the discrepancy is still there. Anyone who writes code for IFF_UP will
> not notice IFF_FAILOVER_SLAVE.
>
> Not to mention more userspace "fixup" work has to be done due to this
> change.
>
> -Siwei

Point is it's ok since most userspace should just ignore slaves
- hopefully it will just ignore it since it already
ignores interfaces that are down.
On 3/5/2019 4:36 PM, Michael S. Tsirkin wrote:
> On Tue, Mar 05, 2019 at 04:20:50PM -0800, si-wei liu wrote:

[...]

> Point is it's ok since most userspace should just ignore slaves
> - hopefully it will just ignore it since it already
> ignores interfaces that are down.
Admin script thought the interface could be brought up and do further
operations without checking the UP flag. It doesn't look to be a reliable
way of prohibiting userspace from operating against slaves.

-Siwei
On Tue, Mar 05, 2019 at 04:51:00PM -0800, si-wei liu wrote:
> On 3/5/2019 4:36 PM, Michael S. Tsirkin wrote:

[...]

> > Point is it's ok since most userspace should just ignore slaves
> > - hopefully it will just ignore it since it already
> > ignores interfaces that are down.
> Admin script thought the interface could be brought up and do further
> operations without checking the UP flag.

These scripts then would be broken on any box with multiple interfaces
since not all of these would have carrier.

> It doesn't look to be a reliable
> way of prohibiting userspace from operating against slaves.
>
> -Siwei

This does not mean we shouldn't make an effort to disable broken
configurations.

I am not arguing against your patch. Not at all. I see better
hiding of slaves as a separate enhancement.

Acked-by: Michael S. Tsirkin <mst@redhat.com>
On 3/5/2019 10:43 PM, Michael S. Tsirkin wrote:
> On Tue, Mar 05, 2019 at 04:51:00PM -0800, si-wei liu wrote:

[...]

>> Admin script thought the interface could be brought up and do further
>> operations without checking the UP flag.
> These scripts then would be broken on any box with multiple interfaces
> since not all of these would have carrier.
Consider a script executing `ifconfig ... up' and, once it succeeds, running
tcpdump or some other command relying on UP interface. It's quite common that
those scripts don't check the UP flag but instead just rely on the well-known
fact that the command exits with 0 meaning the interface should be UP. This
change might well break scripts of that kind.

>> It doesn't look to be a reliable
>> way of prohibiting userspace from operating against slaves.
>>
>> -Siwei
> This does not mean we shouldn't make an effort to disable broken
> configurations.
>
> I am not arguing against your patch. Not at all. I see better
> hiding of slaves as a separate enhancement.
I understand, but my point is we should try to minimize unnecessary side
impact to the current usage for whatever "hiding" effort we can make. It's
hard to find a tradeoff sometimes.

> Acked-by: Michael S. Tsirkin <mst@redhat.com>

Thank you.

-Siwei
On Tue, Mar 05, 2019 at 11:15:06PM -0800, si-wei liu wrote:
> On 3/5/2019 10:43 PM, Michael S. Tsirkin wrote:

[...]

> > These scripts then would be broken on any box with multiple interfaces
> > since not all of these would have carrier.
> Consider a script executing `ifconfig ... up' and, once it succeeds, running
> tcpdump or some other command relying on UP interface. It's quite common that
> those scripts don't check the UP flag but instead just rely on the well-known
> fact that the command exits with 0 meaning the interface should be UP. This
> change might well break scripts of that kind.

I am sorry I don't get it. Could you give an example
of a script that works now but would be broken?

[...]

> > I am not arguing against your patch. Not at all. I see better
> > hiding of slaves as a separate enhancement.
> I understand, but my point is we should try to minimize unnecessary side
> impact to the current usage for whatever "hiding" effort we can make. It's
> hard to find a tradeoff sometimes.

Yes if some userspace made an assumption and it worked, we should keep
it working I think. I don't necessarily agree we should worry too much
about theoretical issues. In half a year since the feature got merged
it's unlikely there are millions of slightly different scripts using it.

> > Acked-by: Michael S. Tsirkin <mst@redhat.com>
>
> Thank you.
>
> -Siwei
On 3/5/2019 11:23 PM, Michael S. Tsirkin wrote:
> On Tue, Mar 05, 2019 at 11:15:06PM -0800, si-wei liu wrote:

[...]

>> Consider a script executing `ifconfig ... up' and once succeeds runs tcpdump
>> or some other command relying on UP interface. It's quite common that those
>> scripts don't check the UP flag but instead just rely on the well-known fact
>> that the command exits with 0 meaning the interface should be UP. This
>> change might well break scripts of that kind.
> I am sorry I don't get it. Could you give an example
> of a script that works now but would be broken?
https://github.com/torvalds/linux/blob/master/tools/testing/selftests/net/netdevice.sh#L27
https://github.com/WPO-Foundation/wptagent/blob/master/internal/adb.py#L443
https://github.com/openstack/steth/blob/master/steth/agent/api.py#L134

There are more if you keep searching.

-Siwei

[...]
Tue, Mar 05, 2019 at 01:50:59AM CET, si-wei.liu@oracle.com wrote:
>When a netdev appears through hot plug then gets enslaved by a failover
>master that is already up and running, the slave will be opened
>right away after getting enslaved. Today there's a race that userspace
>(udev) may fail to rename the slave if the kernel (net_failover)
>opens the slave earlier than when the userspace rename happens.
>Unlike bond or team, the primary slave of failover can't be renamed by
>userspace ahead of time, since the kernel initiated auto-enslavement is
>unable to, or rather, is never meant to be synchronized with the rename
>request from userspace.
>
>As the failover slave interfaces are not designed to be operated
>directly by userspace apps: IP configuration, filter rules with
>regard to network traffic passing and etc., should all be done on master
>interface. In general, userspace apps only care about the
>name of master interface, while slave names are less important as long
>as admin users can see reliable names that may carry
>other information describing the netdev. For e.g., they can infer that
>"ens3nsby" is a standby slave of "ens3", while for a
>name like "eth0" they can't tell which master it belongs to.
>
>Historically the name of IFF_UP interface can't be changed because
>there might be admin script or management software that is already
>relying on such behavior and assumes that the slave name can't be
>changed once UP. But failover is special: with the in-kernel
>auto-enslavement mechanism, the userspace expectation for device
>enumeration and bring-up order is already broken. Previously initramfs
>and various userspace config tools were modified to bypass failover
>slaves because of auto-enslavement and duplicate MAC address. Similarly,
>in case that users care about seeing reliable slave name, the new type
>of failover slaves needs to be taken care of specifically in userspace
>anyway.
>
>For that to work, now introduce a module-level tunable,
>"slave_rename_ok" that allows users to lift up the rename restriction on
>failover slave which is already UP. Although it's possible this change
>potentially break userspace component (most likely configuration scripts
>or management software) that assumes slave name can't be changed while
>UP, it's relatively a limited and controllable set among all userspace
>components, which can be fixed specifically to work with the new naming
>behavior of the failover slave. Userspace component interacting with
>slaves should be changed to operate on failover master instead, as the
>failover slave is dynamic in nature which may come and go at any point.
>The goal is to make the role of failover slaves less relevant, and
>all userspace should only deal with master in the long run. The default
>for the "slave_rename_ok" is set to true(1). If userspace doesn't have
>the right support in place meanwhile users don't care about reliable
>userspace naming, the value can be set to false(0).
>
>Signed-off-by: Si-Wei.Liu@oracle.com
>Reviewed-by: Liran Alon <liran.alon@oracle.com>
>---
> include/linux/netdevice.h |  3 +++
> net/core/dev.c            |  3 ++-
> net/core/failover.c       | 11 +++++++++--
> 3 files changed, 14 insertions(+), 3 deletions(-)

[...]

>diff --git a/net/core/failover.c b/net/core/failover.c
>index 4a92a98..1fd8bbb 100644
>--- a/net/core/failover.c
>+++ b/net/core/failover.c
>@@ -16,6 +16,11 @@
>
> static LIST_HEAD(failover_list);
> static DEFINE_SPINLOCK(failover_lock);
>+static bool slave_rename_ok = true;
>+
>+module_param(slave_rename_ok, bool, (S_IRUGO | S_IWUSR));
>+MODULE_PARM_DESC(slave_rename_ok,
>+		 "If set allow renaming the slave when failover master is up");

No module parameters please. If you need to set something do it using
rtnl_link_ops. Thanks.

[...]
> On 6 Mar 2019, at 23:42, si-wei liu <si-wei.liu@oracle.com> wrote:
>
>
>
> On 3/6/2019 1:36 PM, Samudrala, Sridhar wrote:
>>
>> On 3/6/2019 1:26 PM, si-wei liu wrote:
>>>
>>>
>>> On 3/6/2019 4:04 AM, Jiri Pirko wrote:
>>>>> --- a/net/core/failover.c
>>>>> +++ b/net/core/failover.c
>>>>> @@ -16,6 +16,11 @@
>>>>>
>>>>> static LIST_HEAD(failover_list);
>>>>> static DEFINE_SPINLOCK(failover_lock);
>>>>> +static bool slave_rename_ok = true;
>>>>> +
>>>>> +module_param(slave_rename_ok, bool, (S_IRUGO | S_IWUSR));
>>>>> +MODULE_PARM_DESC(slave_rename_ok,
>>>>> +		 "If set allow renaming the slave when failover master is up");
>>>>>
>>>> No module parameters please. If you need to set something do it using
>>>> rtnl_link_ops. Thanks.
>>>>
>>>>
>>> I understand what you ask for, but without module parameters userspace
>>> don't work. During boot (dracut) the virtio netdev gets enslaved earlier
>>> than when userspace comes up, so failover has to determine the setting
>>> during initialization/creation. This config is not dynamic, at least for
>>> the life cycle of a particular failover link it shouldn't be changed.
>>> Without module parameter, how does the userspace specify this value
>>> during kernel initialization?
>>>
>> Can we enable this by default and not make it configurable via module parameter?
>> Is there any usecase where someone expects rename to fail with failover slaves?
> Probably just cater for those application that assumes fixed name on UP interface?
>
> It's already the default for the configurable. I myself don't think that's
> a big problem for failover users. So far there's not even QEMU support I
> think everything can be changed. I don't feel strong to just fix it without
> introducing configurable. But maybe Michael or others think it differently...
>
> If no one objects, I don't feel strong to make it fixed behavior.
>
> -Siwei
>

I agree we should just remove the module parameter.

-Liran
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 857f8ab..6d9e4e0 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1487,6 +1487,7 @@ struct net_device_ops {
  * @IFF_NO_RX_HANDLER: device doesn't support the rx_handler hook
  * @IFF_FAILOVER: device is a failover master device
  * @IFF_FAILOVER_SLAVE: device is lower dev of a failover master device
+ * @IFF_SLAVE_RENAME_OK: rename is allowed while slave device is running
  */
 enum netdev_priv_flags {
 	IFF_802_1Q_VLAN			= 1<<0,
@@ -1518,6 +1519,7 @@ enum netdev_priv_flags {
 	IFF_NO_RX_HANDLER		= 1<<26,
 	IFF_FAILOVER			= 1<<27,
 	IFF_FAILOVER_SLAVE		= 1<<28,
+	IFF_SLAVE_RENAME_OK		= 1<<29,
 };
 
 #define IFF_802_1Q_VLAN			IFF_802_1Q_VLAN
@@ -1548,6 +1550,7 @@ enum netdev_priv_flags {
 #define IFF_NO_RX_HANDLER		IFF_NO_RX_HANDLER
 #define IFF_FAILOVER			IFF_FAILOVER
 #define IFF_FAILOVER_SLAVE		IFF_FAILOVER_SLAVE
+#define IFF_SLAVE_RENAME_OK		IFF_SLAVE_RENAME_OK
 
 /**
  *	struct net_device - The DEVICE structure.
diff --git a/net/core/dev.c b/net/core/dev.c
index 722d50d..ae070de 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1180,7 +1180,8 @@ int dev_change_name(struct net_device *dev, const char *newname)
 	BUG_ON(!dev_net(dev));
 
 	net = dev_net(dev);
-	if (dev->flags & IFF_UP)
+	if (dev->flags & IFF_UP &&
+	    !(dev->priv_flags & IFF_SLAVE_RENAME_OK))
 		return -EBUSY;
 
 	write_seqcount_begin(&devnet_rename_seq);
diff --git a/net/core/failover.c b/net/core/failover.c
index 4a92a98..1fd8bbb 100644
--- a/net/core/failover.c
+++ b/net/core/failover.c
@@ -16,6 +16,11 @@
 
 static LIST_HEAD(failover_list);
 static DEFINE_SPINLOCK(failover_lock);
+static bool slave_rename_ok = true;
+
+module_param(slave_rename_ok, bool, (S_IRUGO | S_IWUSR));
+MODULE_PARM_DESC(slave_rename_ok,
+		 "If set allow renaming the slave when failover master is up");
 
 static struct net_device *failover_get_bymac(u8 *mac, struct failover_ops **ops)
 {
@@ -81,13 +86,15 @@ static int failover_slave_register(struct net_device *slave_dev)
 	}
 
 	slave_dev->priv_flags |= IFF_FAILOVER_SLAVE;
+	if (slave_rename_ok)
+		slave_dev->priv_flags |= IFF_SLAVE_RENAME_OK;
 
 	if (fops && fops->slave_register &&
 	    !fops->slave_register(slave_dev, failover_dev))
 		return NOTIFY_OK;
 
 	netdev_upper_dev_unlink(slave_dev, failover_dev);
-	slave_dev->priv_flags &= ~IFF_FAILOVER_SLAVE;
+	slave_dev->priv_flags &= ~(IFF_FAILOVER_SLAVE | IFF_SLAVE_RENAME_OK);
 err_upper_link:
 	netdev_rx_handler_unregister(slave_dev);
 done:
@@ -121,7 +128,7 @@ int failover_slave_unregister(struct net_device *slave_dev)
 
 	netdev_rx_handler_unregister(slave_dev);
 	netdev_upper_dev_unlink(slave_dev, failover_dev);
-	slave_dev->priv_flags &= ~IFF_FAILOVER_SLAVE;
+	slave_dev->priv_flags &= ~(IFF_FAILOVER_SLAVE | IFF_SLAVE_RENAME_OK);
 
 	if (fops && fops->slave_unregister &&
 	    !fops->slave_unregister(slave_dev, failover_dev))
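
For readers who want to poke at the rename condition outside the kernel, here is a small userspace model of the check the patch adds to dev_change_name(), written for the variant discussed in the replies where the module parameter is dropped and the flag is set unconditionally at slave registration. It is only a sketch, not kernel code: struct model_netdev, the MODEL_* constants and the model_* functions are hypothetical names made up for this illustration, and MODEL_IFF_UP is an arbitrary stand-in bit rather than the real IFF_UP value. Only the two priv_flags bit positions (28 and 29) are taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Stand-in flag bits for this model (hypothetical names). The two priv flag
 * positions mirror the patch; MODEL_IFF_UP is an arbitrary bit. */
#define MODEL_IFF_UP              (1u << 0)
#define MODEL_IFF_FAILOVER_SLAVE  (1u << 28)
#define MODEL_IFF_SLAVE_RENAME_OK (1u << 29)

/* Hypothetical, heavily reduced stand-in for struct net_device. */
struct model_netdev {
	unsigned int flags;      /* administrative state, e.g. UP */
	unsigned int priv_flags; /* kernel-private flags */
};

/* Mirrors the patched condition: a rename is refused with -EBUSY while the
 * device is UP, unless the device carries the rename-ok flag. */
int model_rename_check(const struct model_netdev *dev)
{
	assert(dev != NULL);

	if ((dev->flags & MODEL_IFF_UP) &&
	    !(dev->priv_flags & MODEL_IFF_SLAVE_RENAME_OK))
		return -EBUSY;
	return 0;
}

/* Models slave registration without the module parameter: both flags are
 * always set together (an assumption based on the replies, not on the
 * posted patch, which still gates the second flag on slave_rename_ok). */
void model_slave_register(struct model_netdev *dev)
{
	dev->priv_flags |= MODEL_IFF_FAILOVER_SLAVE | MODEL_IFF_SLAVE_RENAME_OK;
}

/* Models slave unregistration: both flags are cleared together, as in the
 * patch's failover_slave_unregister() hunk. */
void model_slave_unregister(struct model_netdev *dev)
{
	dev->priv_flags &= ~(MODEL_IFF_FAILOVER_SLAVE | MODEL_IFF_SLAVE_RENAME_OK);
}
```

In this model, a device that is UP fails the check until model_slave_register() has run, passes it while registered as a failover slave, and fails it again after model_slave_unregister(), which is the behaviour the flag handling in the patch appears intended to give.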