Message ID | 1553644093-10917-1-git-send-email-si-wei.liu@oracle.com |
---|---|
State | Changes Requested |
Delegated to: | David Miller |
Headers | show |
Series | [net,v3] failover: allow name change on IFF_UP slave interfaces | expand |
On Tue, 26 Mar 2019 19:48:13 -0400 Si-Wei Liu <si-wei.liu@oracle.com> wrote: > When a netdev appears through hot plug then gets enslaved by a failover > master that is already up and running, the slave will be opened > right away after getting enslaved. Today there's a race that userspace > (udev) may fail to rename the slave if the kernel (net_failover) > opens the slave earlier than when the userspace rename happens. > Unlike bond or team, the primary slave of failover can't be renamed by > userspace ahead of time, since the kernel initiated auto-enslavement is > unable to, or rather, is never meant to be synchronized with the rename > request from userspace. > > As the failover slave interfaces are not designed to be operated > directly by userspace apps: IP configuration, filter rules with > regard to network traffic passing and etc., should all be done on master > interface. In general, userspace apps only care about the > name of master interface, while slave names are less important as long > as admin users can see reliable names that may carry > other information describing the netdev. For e.g., they can infer that > "ens3nsby" is a standby slave of "ens3", while for a > name like "eth0" they can't tell which master it belongs to. > > Historically the name of IFF_UP interface can't be changed because > there might be admin script or management software that is already > relying on such behavior and assumes that the slave name can't be > changed once UP. But failover is special: with the in-kernel > auto-enslavement mechanism, the userspace expectation for device > enumeration and bring-up order is already broken. Previously initramfs > and various userspace config tools were modified to bypass failover > slaves because of auto-enslavement and duplicate MAC address. Similarly, > in case that users care about seeing reliable slave name, the new type > of failover slaves needs to be taken care of specifically in userspace > anyway. > > It's less risky to lift up the rename restriction on failover slave > which is already UP. Although it's possible this change may potentially > break userspace component (most likely configuration scripts or > management software) that assumes slave name can't be changed while > UP, it's relatively a limited and controllable set among all userspace > components, which can be fixed specifically to listen for the rename > and/or link down/up events on failover slaves. Userspace component > interacting with slaves is expected to be changed to operate on failover > master interface instead, as the failover slave is dynamic in nature > which may come and go at any point. The goal is to make the role of > failover slaves less relevant, and userspace components should only > deal with failover master in the long run. > > Fixes: 30c8bd5aa8b2 ("net: Introduce generic failover module") > Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com> > Reviewed-by: Liran Alon <liran.alon@oracle.com> Why do you need to do dev_close/dev_open which will bounce the link?
Wed, Mar 27, 2019 at 12:48:13AM CET, si-wei.liu@oracle.com wrote: >When a netdev appears through hot plug then gets enslaved by a failover >master that is already up and running, the slave will be opened >right away after getting enslaved. Today there's a race that userspace >(udev) may fail to rename the slave if the kernel (net_failover) >opens the slave earlier than when the userspace rename happens. >Unlike bond or team, the primary slave of failover can't be renamed by >userspace ahead of time, since the kernel initiated auto-enslavement is >unable to, or rather, is never meant to be synchronized with the rename >request from userspace. > >As the failover slave interfaces are not designed to be operated >directly by userspace apps: IP configuration, filter rules with >regard to network traffic passing and etc., should all be done on master >interface. In general, userspace apps only care about the >name of master interface, while slave names are less important as long >as admin users can see reliable names that may carry >other information describing the netdev. For e.g., they can infer that >"ens3nsby" is a standby slave of "ens3", while for a >name like "eth0" they can't tell which master it belongs to. > >Historically the name of IFF_UP interface can't be changed because >there might be admin script or management software that is already >relying on such behavior and assumes that the slave name can't be >changed once UP. But failover is special: with the in-kernel >auto-enslavement mechanism, the userspace expectation for device >enumeration and bring-up order is already broken. Previously initramfs >and various userspace config tools were modified to bypass failover >slaves because of auto-enslavement and duplicate MAC address. Similarly, >in case that users care about seeing reliable slave name, the new type >of failover slaves needs to be taken care of specifically in userspace >anyway. > >It's less risky to lift up the rename restriction on failover slave >which is already UP. Although it's possible this change may potentially >break userspace component (most likely configuration scripts or >management software) that assumes slave name can't be changed while >UP, it's relatively a limited and controllable set among all userspace >components, which can be fixed specifically to listen for the rename >and/or link down/up events on failover slaves. Userspace component >interacting with slaves is expected to be changed to operate on failover >master interface instead, as the failover slave is dynamic in nature >which may come and go at any point. The goal is to make the role of >failover slaves less relevant, and userspace components should only >deal with failover master in the long run. > >Fixes: 30c8bd5aa8b2 ("net: Introduce generic failover module") >Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com> >Reviewed-by: Liran Alon <liran.alon@oracle.com> > >-- >v1 -> v2: >- Drop configurable module parameter (Sridhar) > >v2 -> v3: >- Drop additional IFF_SLAVE_RENAME_OK flag (Sridhar) >- Send down and up events around rename (Michael S. Tsirkin) >--- > net/core/dev.c | 37 ++++++++++++++++++++++++++++++++++--- > 1 file changed, 34 insertions(+), 3 deletions(-) > >diff --git a/net/core/dev.c b/net/core/dev.c >index 722d50d..3e0cd80 100644 >--- a/net/core/dev.c >+++ b/net/core/dev.c >@@ -1171,6 +1171,7 @@ int dev_get_valid_name(struct net *net, struct net_device *dev, > int dev_change_name(struct net_device *dev, const char *newname) > { > unsigned char old_assign_type; >+ bool reopen_needed = false; > char oldname[IFNAMSIZ]; > int err = 0; > int ret; >@@ -1180,8 +1181,24 @@ int dev_change_name(struct net_device *dev, const char *newname) > BUG_ON(!dev_net(dev)); > > net = dev_net(dev); >- if (dev->flags & IFF_UP) >- return -EBUSY; >+ >+ /* Allow failover slave to rename even when >+ * it is up and running. >+ * >+ * Failover slaves are special, since userspace >+ * might rename the slave after the interface >+ * has been brought up and running due to >+ * auto-enslavement. >+ * >+ * Failover users don't actually care about slave >+ * name change, as they are only expected to operate >+ * on master interface directly. >+ */ >+ if (dev->flags & IFF_UP) { >+ if (likely(!(dev->priv_flags & IFF_FAILOVER_SLAVE))) >+ return -EBUSY; >+ reopen_needed = true; >+ } > > write_seqcount_begin(&devnet_rename_seq); > >@@ -1198,6 +1215,9 @@ int dev_change_name(struct net_device *dev, const char *newname) > return err; > } > >+ if (reopen_needed) >+ dev_close(dev); Ugh. Don't dev_close/dev_open on name change. >+ > if (oldname[0] && !strchr(oldname, '%')) > netdev_info(dev, "renamed from %s\n", oldname); > >@@ -1210,7 +1230,9 @@ int dev_change_name(struct net_device *dev, const char *newname) > memcpy(dev->name, oldname, IFNAMSIZ); > dev->name_assign_type = old_assign_type; > write_seqcount_end(&devnet_rename_seq); >- return ret; >+ if (err >= 0) >+ err = ret; >+ goto reopen; > } > > write_seqcount_end(&devnet_rename_seq); >@@ -1246,6 +1268,15 @@ int dev_change_name(struct net_device *dev, const char *newname) > } > } > >+reopen: >+ if (reopen_needed) { >+ ret = dev_open(dev); >+ if (ret) { >+ pr_err("%s: reopen device failed: %d\n", >+ dev->name, ret); >+ } >+ } >+ > return err; > } > >-- >1.8.3.1 >
On Tue, Mar 26, 2019 at 07:13:42PM -0700, Stephen Hemminger wrote: > On Tue, 26 Mar 2019 19:48:13 -0400 > Si-Wei Liu <si-wei.liu@oracle.com> wrote: > > > When a netdev appears through hot plug then gets enslaved by a failover > > master that is already up and running, the slave will be opened > > right away after getting enslaved. Today there's a race that userspace > > (udev) may fail to rename the slave if the kernel (net_failover) > > opens the slave earlier than when the userspace rename happens. > > Unlike bond or team, the primary slave of failover can't be renamed by > > userspace ahead of time, since the kernel initiated auto-enslavement is > > unable to, or rather, is never meant to be synchronized with the rename > > request from userspace. > > > > As the failover slave interfaces are not designed to be operated > > directly by userspace apps: IP configuration, filter rules with > > regard to network traffic passing and etc., should all be done on master > > interface. In general, userspace apps only care about the > > name of master interface, while slave names are less important as long > > as admin users can see reliable names that may carry > > other information describing the netdev. For e.g., they can infer that > > "ens3nsby" is a standby slave of "ens3", while for a > > name like "eth0" they can't tell which master it belongs to. > > > > Historically the name of IFF_UP interface can't be changed because > > there might be admin script or management software that is already > > relying on such behavior and assumes that the slave name can't be > > changed once UP. But failover is special: with the in-kernel > > auto-enslavement mechanism, the userspace expectation for device > > enumeration and bring-up order is already broken. Previously initramfs > > and various userspace config tools were modified to bypass failover > > slaves because of auto-enslavement and duplicate MAC address. Similarly, > > in case that users care about seeing reliable slave name, the new type > > of failover slaves needs to be taken care of specifically in userspace > > anyway. > > > > It's less risky to lift up the rename restriction on failover slave > > which is already UP. Although it's possible this change may potentially > > break userspace component (most likely configuration scripts or > > management software) that assumes slave name can't be changed while > > UP, it's relatively a limited and controllable set among all userspace > > components, which can be fixed specifically to listen for the rename > > and/or link down/up events on failover slaves. Userspace component > > interacting with slaves is expected to be changed to operate on failover > > master interface instead, as the failover slave is dynamic in nature > > which may come and go at any point. The goal is to make the role of > > failover slaves less relevant, and userspace components should only > > deal with failover master in the long run. > > > > Fixes: 30c8bd5aa8b2 ("net: Introduce generic failover module") > > Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com> > > Reviewed-by: Liran Alon <liran.alon@oracle.com> > > > Why do you need to do dev_close/dev_open which will bounce > the link? What we need is notify userspace that link went up/down. close/open will do that but just sending notifications would do that as well without playing with link states.
On 3/27/2019 6:25 AM, Michael S. Tsirkin wrote: > On Tue, Mar 26, 2019 at 07:13:42PM -0700, Stephen Hemminger wrote: >> On Tue, 26 Mar 2019 19:48:13 -0400 >> Si-Wei Liu <si-wei.liu@oracle.com> wrote: >> >>> When a netdev appears through hot plug then gets enslaved by a failover >>> master that is already up and running, the slave will be opened >>> right away after getting enslaved. Today there's a race that userspace >>> (udev) may fail to rename the slave if the kernel (net_failover) >>> opens the slave earlier than when the userspace rename happens. >>> Unlike bond or team, the primary slave of failover can't be renamed by >>> userspace ahead of time, since the kernel initiated auto-enslavement is >>> unable to, or rather, is never meant to be synchronized with the rename >>> request from userspace. >>> >>> As the failover slave interfaces are not designed to be operated >>> directly by userspace apps: IP configuration, filter rules with >>> regard to network traffic passing and etc., should all be done on master >>> interface. In general, userspace apps only care about the >>> name of master interface, while slave names are less important as long >>> as admin users can see reliable names that may carry >>> other information describing the netdev. For e.g., they can infer that >>> "ens3nsby" is a standby slave of "ens3", while for a >>> name like "eth0" they can't tell which master it belongs to. >>> >>> Historically the name of IFF_UP interface can't be changed because >>> there might be admin script or management software that is already >>> relying on such behavior and assumes that the slave name can't be >>> changed once UP. But failover is special: with the in-kernel >>> auto-enslavement mechanism, the userspace expectation for device >>> enumeration and bring-up order is already broken. Previously initramfs >>> and various userspace config tools were modified to bypass failover >>> slaves because of auto-enslavement and duplicate MAC address. Similarly, >>> in case that users care about seeing reliable slave name, the new type >>> of failover slaves needs to be taken care of specifically in userspace >>> anyway. >>> >>> It's less risky to lift up the rename restriction on failover slave >>> which is already UP. Although it's possible this change may potentially >>> break userspace component (most likely configuration scripts or >>> management software) that assumes slave name can't be changed while >>> UP, it's relatively a limited and controllable set among all userspace >>> components, which can be fixed specifically to listen for the rename >>> and/or link down/up events on failover slaves. Userspace component >>> interacting with slaves is expected to be changed to operate on failover >>> master interface instead, as the failover slave is dynamic in nature >>> which may come and go at any point. The goal is to make the role of >>> failover slaves less relevant, and userspace components should only >>> deal with failover master in the long run. >>> >>> Fixes: 30c8bd5aa8b2 ("net: Introduce generic failover module") >>> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com> >>> Reviewed-by: Liran Alon <liran.alon@oracle.com> >> >> Why do you need to do dev_close/dev_open which will bounce >> the link? > What we need is notify userspace that link went up/down. > close/open will do that but just sending notifications > would do that as well without playing with link states. > Since you were requesting to send fake link down/up events around rename, so as to keep existing userspace intact with this behavioral change, right? The thing is if you can't fake notification with just IFF_UP or ~IFF_UP then claim everything is done. If you look at rtnl_fill_ifinfo() where the notification payload is prepared, you'll find a lot of states and flags are correlated: ifi_flags IFLA_OPERSTATE IFLA_CARRIER IFLA_CARRIER_CHANGES which requires below states to be toggled or taken care of in between: operstate __LINK_STATE_START __LINK_STATE_NOCARRIER carrier_changes for e.g. user mostly treats IFF_RUNNING as the indication of link up/down as opposed to IFF_UP. That would require you to toggle __LINK_STATE_START (and operstate as well) without doing a full dev_close/open. Since __LINK_STATE_START is cleared, there's no sense to let CARRIER_OK remain set, and then you'd need to take care of carrier_changes... Since you don't really shutting down the device, the link watchdog keeps running and may race with inconsistent carrier state in between. dev_close/open may have done unneeded work, but it's the safest option IMHO, as apparently the cost and ugly complexity to fake link down/up events is not something worthwhile compared to simply bouncing the link state. Another point is kernel consumers of the NETDEV_CHANGENAME notifier might well assume the link is already taken down by dev_close() before the rename. I didn't check all those consumers in tree but thought it might be safe to keep the current convention. Now let me turn around and ask you what's your concerns if bouncing the link state. While I can tweak a lightweight version of dev_close/open to bypass ndo_stop and ndo_start while shutting down the link watchdog on behalf of drivers, it's far more involved than make me think if that's really what you had in mind. Another less safer option is that we just notify userspace anyway without sending down/up event around, as I don't see *any real application* cares about the link state or whatsoever when it attempts to detect rename. Given that the scope is limited to failover slave the chance of breaking userspace app would be extremely low in practice. Thanks, -Siwei
On Wed, Mar 27, 2019 at 01:10:10PM -0700, si-wei liu wrote: > Another less safer option is that we just notify userspace anyway without > sending down/up event around, as I don't see *any real application* cares > about the link state or whatsoever when it attempts to detect rename. How do you write a race ree handler then? ATM just detecting link up is sufficient and covers 100% of cases. Seems like a good idea to keep it that way.
On 3/27/2019 3:16 PM, Michael S. Tsirkin wrote: > On Wed, Mar 27, 2019 at 01:10:10PM -0700, si-wei liu wrote: >> Another less safer option is that we just notify userspace anyway without >> sending down/up event around, as I don't see *any real application* cares >> about the link state or whatsoever when it attempts to detect rename. > How do you write a race ree handler then? ATM just detecting link up is > sufficient and covers 100% of cases. Seems like a good idea to keep it > that way. > What do you mean? Which flag or attribute do you think 100% of the userspace regard it as link up? And why userspace that cares about name change needs to double check whether link is up or not? I'm sorry, but are you sure this is the 100% case that every userspace app does it like what you're claiming? -Siwei
On 3/27/2019 3:31 PM, si-wei liu wrote: > > > On 3/27/2019 3:16 PM, Michael S. Tsirkin wrote: >> On Wed, Mar 27, 2019 at 01:10:10PM -0700, si-wei liu wrote: >>> Another less safer option is that we just notify userspace anyway >>> without >>> sending down/up event around, as I don't see *any real application* >>> cares >>> about the link state or whatsoever when it attempts to detect rename. >> How do you write a race ree handler then? ATM just detecting link up is >> sufficient and covers 100% of cases. Seems like a good idea to keep it >> that way. >> > What do you mean? Which flag or attribute do you think 100% of the > userspace regard it as link up? And why userspace that cares about > name change needs to double check whether link is up or not? I'm > sorry, but are you sure this is the 100% case that every userspace app > does it like what you're claiming? > > -Siwei To me even the any flag checking regarding link state is unnecessary as it's hard for userspace apps to follow precisely the name change together with the link state. I'm not sure if you still persist in sending link down/up event. What the userspace apps that care about the name so far as I see chase the name through ifindex. I'm really doubtful if link state is that important to them. But if you really need that consistency I'd say bouncing the link is perhaps the only safe option. -Siwei
On 3/27/2019 4:11 AM, Jiri Pirko wrote: > Wed, Mar 27, 2019 at 12:48:13AM CET, si-wei.liu@oracle.com wrote: >> When a netdev appears through hot plug then gets enslaved by a failover >> master that is already up and running, the slave will be opened >> right away after getting enslaved. Today there's a race that userspace >> (udev) may fail to rename the slave if the kernel (net_failover) >> opens the slave earlier than when the userspace rename happens. >> Unlike bond or team, the primary slave of failover can't be renamed by >> userspace ahead of time, since the kernel initiated auto-enslavement is >> unable to, or rather, is never meant to be synchronized with the rename >> request from userspace. >> >> As the failover slave interfaces are not designed to be operated >> directly by userspace apps: IP configuration, filter rules with >> regard to network traffic passing and etc., should all be done on master >> interface. In general, userspace apps only care about the >> name of master interface, while slave names are less important as long >> as admin users can see reliable names that may carry >> other information describing the netdev. For e.g., they can infer that >> "ens3nsby" is a standby slave of "ens3", while for a >> name like "eth0" they can't tell which master it belongs to. >> >> Historically the name of IFF_UP interface can't be changed because >> there might be admin script or management software that is already >> relying on such behavior and assumes that the slave name can't be >> changed once UP. But failover is special: with the in-kernel >> auto-enslavement mechanism, the userspace expectation for device >> enumeration and bring-up order is already broken. Previously initramfs >> and various userspace config tools were modified to bypass failover >> slaves because of auto-enslavement and duplicate MAC address. Similarly, >> in case that users care about seeing reliable slave name, the new type >> of failover slaves needs to be taken care of specifically in userspace >> anyway. >> >> It's less risky to lift up the rename restriction on failover slave >> which is already UP. Although it's possible this change may potentially >> break userspace component (most likely configuration scripts or >> management software) that assumes slave name can't be changed while >> UP, it's relatively a limited and controllable set among all userspace >> components, which can be fixed specifically to listen for the rename >> and/or link down/up events on failover slaves. Userspace component >> interacting with slaves is expected to be changed to operate on failover >> master interface instead, as the failover slave is dynamic in nature >> which may come and go at any point. The goal is to make the role of >> failover slaves less relevant, and userspace components should only >> deal with failover master in the long run. >> >> Fixes: 30c8bd5aa8b2 ("net: Introduce generic failover module") >> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com> >> Reviewed-by: Liran Alon <liran.alon@oracle.com> >> >> -- >> v1 -> v2: >> - Drop configurable module parameter (Sridhar) >> >> v2 -> v3: >> - Drop additional IFF_SLAVE_RENAME_OK flag (Sridhar) >> - Send down and up events around rename (Michael S. Tsirkin) >> --- >> net/core/dev.c | 37 ++++++++++++++++++++++++++++++++++--- >> 1 file changed, 34 insertions(+), 3 deletions(-) >> >> diff --git a/net/core/dev.c b/net/core/dev.c >> index 722d50d..3e0cd80 100644 >> --- a/net/core/dev.c >> +++ b/net/core/dev.c >> @@ -1171,6 +1171,7 @@ int dev_get_valid_name(struct net *net, struct net_device *dev, >> int dev_change_name(struct net_device *dev, const char *newname) >> { >> unsigned char old_assign_type; >> + bool reopen_needed = false; >> char oldname[IFNAMSIZ]; >> int err = 0; >> int ret; >> @@ -1180,8 +1181,24 @@ int dev_change_name(struct net_device *dev, const char *newname) >> BUG_ON(!dev_net(dev)); >> >> net = dev_net(dev); >> - if (dev->flags & IFF_UP) >> - return -EBUSY; >> + >> + /* Allow failover slave to rename even when >> + * it is up and running. >> + * >> + * Failover slaves are special, since userspace >> + * might rename the slave after the interface >> + * has been brought up and running due to >> + * auto-enslavement. >> + * >> + * Failover users don't actually care about slave >> + * name change, as they are only expected to operate >> + * on master interface directly. >> + */ >> + if (dev->flags & IFF_UP) { >> + if (likely(!(dev->priv_flags & IFF_FAILOVER_SLAVE))) >> + return -EBUSY; >> + reopen_needed = true; >> + } >> >> write_seqcount_begin(&devnet_rename_seq); >> >> @@ -1198,6 +1215,9 @@ int dev_change_name(struct net_device *dev, const char *newname) >> return err; >> } >> >> + if (reopen_needed) >> + dev_close(dev); > Ugh. Don't dev_close/dev_open on name change. See my response to Michael and Stephen. What's your suggestion then? > >> + >> if (oldname[0] && !strchr(oldname, '%')) >> netdev_info(dev, "renamed from %s\n", oldname); >> >> @@ -1210,7 +1230,9 @@ int dev_change_name(struct net_device *dev, const char *newname) >> memcpy(dev->name, oldname, IFNAMSIZ); >> dev->name_assign_type = old_assign_type; >> write_seqcount_end(&devnet_rename_seq); >> - return ret; >> + if (err >= 0) >> + err = ret; >> + goto reopen; >> } >> >> write_seqcount_end(&devnet_rename_seq); >> @@ -1246,6 +1268,15 @@ int dev_change_name(struct net_device *dev, const char *newname) >> } >> } >> >> +reopen: >> + if (reopen_needed) { >> + ret = dev_open(dev); >> + if (ret) { >> + pr_err("%s: reopen device failed: %d\n", >> + dev->name, ret); >> + } >> + } >> + >> return err; >> } >> >> -- >> 1.8.3.1 >>
On Wed, 27 Mar 2019 16:44:19 -0700 si-wei liu <si-wei.liu@oracle.com> wrote: > On 3/27/2019 4:11 AM, Jiri Pirko wrote: > > Wed, Mar 27, 2019 at 12:48:13AM CET, si-wei.liu@oracle.com wrote: > >> When a netdev appears through hot plug then gets enslaved by a failover > >> master that is already up and running, the slave will be opened > >> right away after getting enslaved. Today there's a race that userspace > >> (udev) may fail to rename the slave if the kernel (net_failover) > >> opens the slave earlier than when the userspace rename happens. > >> Unlike bond or team, the primary slave of failover can't be renamed by > >> userspace ahead of time, since the kernel initiated auto-enslavement is > >> unable to, or rather, is never meant to be synchronized with the rename > >> request from userspace. > >> > >> As the failover slave interfaces are not designed to be operated > >> directly by userspace apps: IP configuration, filter rules with > >> regard to network traffic passing and etc., should all be done on master > >> interface. In general, userspace apps only care about the > >> name of master interface, while slave names are less important as long > >> as admin users can see reliable names that may carry > >> other information describing the netdev. For e.g., they can infer that > >> "ens3nsby" is a standby slave of "ens3", while for a > >> name like "eth0" they can't tell which master it belongs to. > >> > >> Historically the name of IFF_UP interface can't be changed because > >> there might be admin script or management software that is already > >> relying on such behavior and assumes that the slave name can't be > >> changed once UP. But failover is special: with the in-kernel > >> auto-enslavement mechanism, the userspace expectation for device > >> enumeration and bring-up order is already broken. Previously initramfs > >> and various userspace config tools were modified to bypass failover > >> slaves because of auto-enslavement and duplicate MAC address. Similarly, > >> in case that users care about seeing reliable slave name, the new type > >> of failover slaves needs to be taken care of specifically in userspace > >> anyway. > >> > >> It's less risky to lift up the rename restriction on failover slave > >> which is already UP. Although it's possible this change may potentially > >> break userspace component (most likely configuration scripts or > >> management software) that assumes slave name can't be changed while > >> UP, it's relatively a limited and controllable set among all userspace > >> components, which can be fixed specifically to listen for the rename > >> and/or link down/up events on failover slaves. Userspace component > >> interacting with slaves is expected to be changed to operate on failover > >> master interface instead, as the failover slave is dynamic in nature > >> which may come and go at any point. The goal is to make the role of > >> failover slaves less relevant, and userspace components should only > >> deal with failover master in the long run. > >> > >> Fixes: 30c8bd5aa8b2 ("net: Introduce generic failover module") > >> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com> > >> Reviewed-by: Liran Alon <liran.alon@oracle.com> > >> > >> -- > >> v1 -> v2: > >> - Drop configurable module parameter (Sridhar) > >> > >> v2 -> v3: > >> - Drop additional IFF_SLAVE_RENAME_OK flag (Sridhar) > >> - Send down and up events around rename (Michael S. Tsirkin) > >> --- > >> net/core/dev.c | 37 ++++++++++++++++++++++++++++++++++--- > >> 1 file changed, 34 insertions(+), 3 deletions(-) > >> > >> diff --git a/net/core/dev.c b/net/core/dev.c > >> index 722d50d..3e0cd80 100644 > >> --- a/net/core/dev.c > >> +++ b/net/core/dev.c > >> @@ -1171,6 +1171,7 @@ int dev_get_valid_name(struct net *net, struct net_device *dev, > >> int dev_change_name(struct net_device *dev, const char *newname) > >> { > >> unsigned char old_assign_type; > >> + bool reopen_needed = false; > >> char oldname[IFNAMSIZ]; > >> int err = 0; > >> int ret; > >> @@ -1180,8 +1181,24 @@ int dev_change_name(struct net_device *dev, const char *newname) > >> BUG_ON(!dev_net(dev)); > >> > >> net = dev_net(dev); > >> - if (dev->flags & IFF_UP) > >> - return -EBUSY; > >> + > >> + /* Allow failover slave to rename even when > >> + * it is up and running. > >> + * > >> + * Failover slaves are special, since userspace > >> + * might rename the slave after the interface > >> + * has been brought up and running due to > >> + * auto-enslavement. > >> + * > >> + * Failover users don't actually care about slave > >> + * name change, as they are only expected to operate > >> + * on master interface directly. > >> + */ > >> + if (dev->flags & IFF_UP) { > >> + if (likely(!(dev->priv_flags & IFF_FAILOVER_SLAVE))) > >> + return -EBUSY; > >> + reopen_needed = true; > >> + } > >> > >> write_seqcount_begin(&devnet_rename_seq); > >> > >> @@ -1198,6 +1215,9 @@ int dev_change_name(struct net_device *dev, const char *newname) > >> return err; > >> } > >> > >> + if (reopen_needed) > >> + dev_close(dev); > > Ugh. Don't dev_close/dev_open on name change. > See my response to Michael and Stephen. What's your suggestion then? To a DEV_CHANGE notification instead? My opinion is that allowing name change is not worth the doing. Also, the kernel should never do the name change, it is up to userspace.
Hi Si-Wei, Thank you for the patch! Yet something to improve: [auto build test ERROR on net/master] url: https://github.com/0day-ci/linux/commits/Si-Wei-Liu/failover-allow-name-change-on-IFF_UP-slave-interfaces/20190329-020744 config: openrisc-or1ksim_defconfig (attached as .config) compiler: or1k-linux-gcc (GCC) 6.0.0 20160327 (experimental) reproduce: wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # save the attached .config to linux build tree make.cross ARCH=openrisc All errors (new ones prefixed by >>): net/core/dev.c: In function 'dev_change_name': >> net/core/dev.c:1277:9: error: too few arguments to function 'dev_open' ret = dev_open(dev); ^~~~~~~~ In file included from net/core/dev.c:93:0: include/linux/netdevice.h:2636:5: note: declared here int dev_open(struct net_device *dev, struct netlink_ext_ack *extack); ^~~~~~~~ vim +/dev_open +1277 net/core/dev.c 1166 1167 /** 1168 * dev_change_name - change name of a device 1169 * @dev: device 1170 * @newname: name (or format string) must be at least IFNAMSIZ 1171 * 1172 * Change name of a device, can pass format strings "eth%d". 1173 * for wildcarding. 1174 */ 1175 int dev_change_name(struct net_device *dev, const char *newname) 1176 { 1177 unsigned char old_assign_type; 1178 bool reopen_needed = false; 1179 char oldname[IFNAMSIZ]; 1180 int err = 0; 1181 int ret; 1182 struct net *net; 1183 1184 ASSERT_RTNL(); 1185 BUG_ON(!dev_net(dev)); 1186 1187 net = dev_net(dev); 1188 1189 /* Allow failover slave to rename even when 1190 * it is up and running. 1191 * 1192 * Failover slaves are special, since userspace 1193 * might rename the slave after the interface 1194 * has been brought up and running due to 1195 * auto-enslavement. 1196 * 1197 * Failover users don't actually care about slave 1198 * name change, as they are only expected to operate 1199 * on master interface directly. 1200 */ 1201 if (dev->flags & IFF_UP) { 1202 if (likely(!(dev->priv_flags & IFF_FAILOVER_SLAVE))) 1203 return -EBUSY; 1204 reopen_needed = true; 1205 } 1206 1207 write_seqcount_begin(&devnet_rename_seq); 1208 1209 if (strncmp(newname, dev->name, IFNAMSIZ) == 0) { 1210 write_seqcount_end(&devnet_rename_seq); 1211 return 0; 1212 } 1213 1214 memcpy(oldname, dev->name, IFNAMSIZ); 1215 1216 err = dev_get_valid_name(net, dev, newname); 1217 if (err < 0) { 1218 write_seqcount_end(&devnet_rename_seq); 1219 return err; 1220 } 1221 1222 if (reopen_needed) 1223 dev_close(dev); 1224 1225 if (oldname[0] && !strchr(oldname, '%')) 1226 netdev_info(dev, "renamed from %s\n", oldname); 1227 1228 old_assign_type = dev->name_assign_type; 1229 dev->name_assign_type = NET_NAME_RENAMED; 1230 1231 rollback: 1232 ret = device_rename(&dev->dev, dev->name); 1233 if (ret) { 1234 memcpy(dev->name, oldname, IFNAMSIZ); 1235 dev->name_assign_type = old_assign_type; 1236 write_seqcount_end(&devnet_rename_seq); 1237 if (err >= 0) 1238 err = ret; 1239 goto reopen; 1240 } 1241 1242 write_seqcount_end(&devnet_rename_seq); 1243 1244 netdev_adjacent_rename_links(dev, oldname); 1245 1246 write_lock_bh(&dev_base_lock); 1247 hlist_del_rcu(&dev->name_hlist); 1248 write_unlock_bh(&dev_base_lock); 1249 1250 synchronize_rcu(); 1251 1252 write_lock_bh(&dev_base_lock); 1253 hlist_add_head_rcu(&dev->name_hlist, dev_name_hash(net, dev->name)); 1254 write_unlock_bh(&dev_base_lock); 1255 1256 ret = call_netdevice_notifiers(NETDEV_CHANGENAME, dev); 1257 ret = notifier_to_errno(ret); 1258 1259 if (ret) { 1260 /* err >= 0 after dev_alloc_name() or stores the first errno */ 1261 if (err >= 0) { 1262 err = ret; 1263 write_seqcount_begin(&devnet_rename_seq); 1264 memcpy(dev->name, oldname, IFNAMSIZ); 1265 memcpy(oldname, newname, IFNAMSIZ); 1266 dev->name_assign_type = old_assign_type; 1267 old_assign_type = NET_NAME_RENAMED; 1268 goto rollback; 1269 } else { 1270 pr_err("%s: name change rollback failed: %d\n", 1271 dev->name, ret); 1272 } 1273 } 1274 1275 reopen: 1276 if (reopen_needed) { > 1277 ret = dev_open(dev); 1278 if (ret) { 1279 pr_err("%s: reopen device failed: %d\n", 1280 dev->name, ret); 1281 } 1282 } 1283 1284 return err; 1285 } 1286 --- 0-DAY kernel test infrastructure Open Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation
On 3/28/2019 10:14 AM, Stephen Hemminger wrote: > On Wed, 27 Mar 2019 16:44:19 -0700 > si-wei liu <si-wei.liu@oracle.com> wrote: > >> On 3/27/2019 4:11 AM, Jiri Pirko wrote: >>> Wed, Mar 27, 2019 at 12:48:13AM CET, si-wei.liu@oracle.com wrote: >>>> When a netdev appears through hot plug then gets enslaved by a failover >>>> master that is already up and running, the slave will be opened >>>> right away after getting enslaved. Today there's a race that userspace >>>> (udev) may fail to rename the slave if the kernel (net_failover) >>>> opens the slave earlier than when the userspace rename happens. >>>> Unlike bond or team, the primary slave of failover can't be renamed by >>>> userspace ahead of time, since the kernel initiated auto-enslavement is >>>> unable to, or rather, is never meant to be synchronized with the rename >>>> request from userspace. >>>> >>>> As the failover slave interfaces are not designed to be operated >>>> directly by userspace apps: IP configuration, filter rules with >>>> regard to network traffic passing and etc., should all be done on master >>>> interface. In general, userspace apps only care about the >>>> name of master interface, while slave names are less important as long >>>> as admin users can see reliable names that may carry >>>> other information describing the netdev. For e.g., they can infer that >>>> "ens3nsby" is a standby slave of "ens3", while for a >>>> name like "eth0" they can't tell which master it belongs to. >>>> >>>> Historically the name of IFF_UP interface can't be changed because >>>> there might be admin script or management software that is already >>>> relying on such behavior and assumes that the slave name can't be >>>> changed once UP. But failover is special: with the in-kernel >>>> auto-enslavement mechanism, the userspace expectation for device >>>> enumeration and bring-up order is already broken. Previously initramfs >>>> and various userspace config tools were modified to bypass failover >>>> slaves because of auto-enslavement and duplicate MAC address. Similarly, >>>> in case that users care about seeing reliable slave name, the new type >>>> of failover slaves needs to be taken care of specifically in userspace >>>> anyway. >>>> >>>> It's less risky to lift up the rename restriction on failover slave >>>> which is already UP. Although it's possible this change may potentially >>>> break userspace component (most likely configuration scripts or >>>> management software) that assumes slave name can't be changed while >>>> UP, it's relatively a limited and controllable set among all userspace >>>> components, which can be fixed specifically to listen for the rename >>>> and/or link down/up events on failover slaves. Userspace component >>>> interacting with slaves is expected to be changed to operate on failover >>>> master interface instead, as the failover slave is dynamic in nature >>>> which may come and go at any point. The goal is to make the role of >>>> failover slaves less relevant, and userspace components should only >>>> deal with failover master in the long run. >>>> >>>> Fixes: 30c8bd5aa8b2 ("net: Introduce generic failover module") >>>> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com> >>>> Reviewed-by: Liran Alon <liran.alon@oracle.com> >>>> >>>> -- >>>> v1 -> v2: >>>> - Drop configurable module parameter (Sridhar) >>>> >>>> v2 -> v3: >>>> - Drop additional IFF_SLAVE_RENAME_OK flag (Sridhar) >>>> - Send down and up events around rename (Michael S. Tsirkin) >>>> --- >>>> net/core/dev.c | 37 ++++++++++++++++++++++++++++++++++--- >>>> 1 file changed, 34 insertions(+), 3 deletions(-) >>>> >>>> diff --git a/net/core/dev.c b/net/core/dev.c >>>> index 722d50d..3e0cd80 100644 >>>> --- a/net/core/dev.c >>>> +++ b/net/core/dev.c >>>> @@ -1171,6 +1171,7 @@ int dev_get_valid_name(struct net *net, struct net_device *dev, >>>> int dev_change_name(struct net_device *dev, const char *newname) >>>> { >>>> unsigned char old_assign_type; >>>> + bool reopen_needed = false; >>>> char oldname[IFNAMSIZ]; >>>> int err = 0; >>>> int ret; >>>> @@ -1180,8 +1181,24 @@ int dev_change_name(struct net_device *dev, const char *newname) >>>> BUG_ON(!dev_net(dev)); >>>> >>>> net = dev_net(dev); >>>> - if (dev->flags & IFF_UP) >>>> - return -EBUSY; >>>> + >>>> + /* Allow failover slave to rename even when >>>> + * it is up and running. >>>> + * >>>> + * Failover slaves are special, since userspace >>>> + * might rename the slave after the interface >>>> + * has been brought up and running due to >>>> + * auto-enslavement. >>>> + * >>>> + * Failover users don't actually care about slave >>>> + * name change, as they are only expected to operate >>>> + * on master interface directly. >>>> + */ >>>> + if (dev->flags & IFF_UP) { >>>> + if (likely(!(dev->priv_flags & IFF_FAILOVER_SLAVE))) >>>> + return -EBUSY; >>>> + reopen_needed = true; >>>> + } >>>> >>>> write_seqcount_begin(&devnet_rename_seq); >>>> >>>> @@ -1198,6 +1215,9 @@ int dev_change_name(struct net_device *dev, const char *newname) >>>> return err; >>>> } >>>> >>>> + if (reopen_needed) >>>> + dev_close(dev); >>> Ugh. Don't dev_close/dev_open on name change. >> See my response to Michael and Stephen. What's your suggestion then? > To a DEV_CHANGE notification instead? Thanks. That is what I was thinking. Will post a v4 to simplify the notification. Not worth the effort to fine tune for cases in hypothesis. -Siwei > > > My opinion is that allowing name change is not worth the doing. > Also, the kernel should never do the name change, it is up to userspace. >
diff --git a/net/core/dev.c b/net/core/dev.c index 722d50d..3e0cd80 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -1171,6 +1171,7 @@ int dev_get_valid_name(struct net *net, struct net_device *dev, int dev_change_name(struct net_device *dev, const char *newname) { unsigned char old_assign_type; + bool reopen_needed = false; char oldname[IFNAMSIZ]; int err = 0; int ret; @@ -1180,8 +1181,24 @@ int dev_change_name(struct net_device *dev, const char *newname) BUG_ON(!dev_net(dev)); net = dev_net(dev); - if (dev->flags & IFF_UP) - return -EBUSY; + + /* Allow failover slave to rename even when + * it is up and running. + * + * Failover slaves are special, since userspace + * might rename the slave after the interface + * has been brought up and running due to + * auto-enslavement. + * + * Failover users don't actually care about slave + * name change, as they are only expected to operate + * on master interface directly. + */ + if (dev->flags & IFF_UP) { + if (likely(!(dev->priv_flags & IFF_FAILOVER_SLAVE))) + return -EBUSY; + reopen_needed = true; + } write_seqcount_begin(&devnet_rename_seq); @@ -1198,6 +1215,9 @@ int dev_change_name(struct net_device *dev, const char *newname) return err; } + if (reopen_needed) + dev_close(dev); + if (oldname[0] && !strchr(oldname, '%')) netdev_info(dev, "renamed from %s\n", oldname); @@ -1210,7 +1230,9 @@ int dev_change_name(struct net_device *dev, const char *newname) memcpy(dev->name, oldname, IFNAMSIZ); dev->name_assign_type = old_assign_type; write_seqcount_end(&devnet_rename_seq); - return ret; + if (err >= 0) + err = ret; + goto reopen; } write_seqcount_end(&devnet_rename_seq); @@ -1246,6 +1268,15 @@ int dev_change_name(struct net_device *dev, const char *newname) } } +reopen: + if (reopen_needed) { + ret = dev_open(dev); + if (ret) { + pr_err("%s: reopen device failed: %d\n", + dev->name, ret); + } + } + return err; }