Message ID | 3a247e91ebe81cdae4bae27ec1631c5015fb943f.1552672441.git.petrm@mellanox.com |
---|---|
State | RFC |
Delegated to: | David Miller |
Headers | show |
Series | RTNL: Add link-down reason reporting | expand |
On Fri, 15 Mar 2019 17:56:07 +0000, Petr Machata wrote: > diff --git a/include/net/rtnetlink.h b/include/net/rtnetlink.h > index e2091bb2b3a8..cfd9e86ff0ca 100644 > --- a/include/net/rtnetlink.h > +++ b/include/net/rtnetlink.h > @@ -110,6 +110,9 @@ struct rtnl_link_ops { > int (*fill_linkxstats)(struct sk_buff *skb, > const struct net_device *dev, > int *prividx, int attr); > + size_t (*link_down_reason_get_size)(const struct net_device *dev); > + int (*fill_link_down_reason)(struct sk_buff *skb, > + const struct net_device *dev); IMHO the API is a little heavy for returning, what is effectively a u64 value sliced in two.. Perhaps the core can just assume the reason will be provided if the NDO is present? And the "fill" NDO should probably fill in the reason structure, rather than getting the skb passed and dealing with netlink directly. Also perhaps this would be a ethtool-nl candidate (which would hopefully land soon after the merge window)? > }; > > int __rtnl_link_register(struct rtnl_link_ops *ops); > diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h > index 5b225ff63b48..a47f42e79741 100644 > --- a/include/uapi/linux/if_link.h > +++ b/include/uapi/linux/if_link.h > @@ -167,6 +167,8 @@ enum { > IFLA_NEW_IFINDEX, > IFLA_MIN_MTU, > IFLA_MAX_MTU, > + IFLA_LINK_DOWN_REASON_MAJOR, /* enum rtnl_link_down_reason_major */ > + IFLA_LINK_DOWN_REASON_MINOR, > __IFLA_MAX > };
Hi Petr > +enum rtnl_link_down_reason_major { > + RTNL_LDR_OTHER, Does 'other' make any sense? Seem better to just not report anything at all, or add a comment that more reasons should be added at the end to reflect whatever the hardware or software can determine. > + RTNL_LDR_NO_CABLE, > + RTNL_LDR_UNSUPPORTED_CABLE, > + RTNL_LDR_AUTONEG_FAILURE, > + RTNL_LDR_NO_LINK_PARTNER, > + RTNL_LDR_LINK_TRAINING_FAILURE, > + RTNL_LDR_LOGICAL_MISMATCH, > + RTNL_LDR_REMOTE_FAULT, > + RTNL_LDR_BAD_SIGNAL_INTEGRITY, > + RTNL_LDR_CALIBRATION_FAILURE, > + RTNL_LDR_POWER_BUDGET_EXCEEDED, > +}; What about SFP cage empty?, i.e. no SFP, SFP+ module in the cage? An SFP can also report LOS. That does not appear to be any of the above. Or that the core SFP code has been unable to read the EEPROM? We have people reporting this problem at the moment. We also have that the SERDES has not yet obtained sync to its peer, which you know from phylink_mac_change. That probably means the peer is using a different bit rate. I think it would be good if you handle the general case errors which phylib and phylink can report, as well as the proprietary cases your driver can report. We don't want this to be a Mellanox only API. Andrew
On Fri, Mar 15, 2019 at 07:26:11PM -0700, Jakub Kicinski wrote: > On Fri, 15 Mar 2019 17:56:07 +0000, Petr Machata wrote: > > diff --git a/include/net/rtnetlink.h b/include/net/rtnetlink.h > > index e2091bb2b3a8..cfd9e86ff0ca 100644 > > --- a/include/net/rtnetlink.h > > +++ b/include/net/rtnetlink.h > > @@ -110,6 +110,9 @@ struct rtnl_link_ops { > > int (*fill_linkxstats)(struct sk_buff *skb, > > const struct net_device *dev, > > int *prividx, int attr); > > + size_t (*link_down_reason_get_size)(const struct net_device *dev); > > + int (*fill_link_down_reason)(struct sk_buff *skb, > > + const struct net_device *dev); > > IMHO the API is a little heavy for returning, what is effectively a u64 > value sliced in two.. Agreed. Unless we can expect more attributes to follow, repeating the same code to generate netlink attributes in each driver doesn't feel right. > Perhaps the core can just assume the reason will be provided if the NDO > is present? And the "fill" NDO should probably fill in the reason > structure, rather than getting the skb passed and dealing with netlink > directly. We could always reserve e.g. 0 as "none" which driver would use e.g. if it only provides major reason id and not minor. > Also perhaps this would be a ethtool-nl candidate (which would > hopefully land soon after the merge window)? Looking at the suggested values for major reasons, it does indeed seem to belong to link related information provided by ETHTOOL_GLINKSETTINGS request. Maybe nesting the reason with link state, i.e. ETHA_SETTINGS_LINK_STATE (nested) ETHA_LINKSTATE_LINK (u8) ETHA_LINKSTATE_DOWN_REASON_MAJOR (u32) ETHA_LINKSTATE_DOWN_REASON_MINOR (u32) Michal
On Fri, Mar 15, 2019 at 10:56 AM Petr Machata <petrm@mellanox.com> wrote: > > In general, after a port is put administratively up, certain handshake > protocols have to finish successfully, otherwise the port is left in a > NO-CARRIER or DORMANT state. When that happens, it would be useful to > communicate the reasons to the administrator, so that the problem that > prevents the link from being established can be corrected. > > Introduce two new link attributes: IFLA_LINK_DOWN_REASON_MAJOR and > _MINOR. Major reason code serve as broad categories intended to convey a > general idea of where the problem is. Minor codes are arbitrary numbers > specific for the driver that add detail to the major reasons. Add enum > rtnl_link_down_reason_major to define the well-known major reason codes. > > The party with visibility into details of this process is the driver. > Therefore add two new RTNL hooks, link_down_reason_get_size and > fill_link_down_reason, to provide the necessary information. > > Link-down reason is not included if the port is up or administratively > down, because those two state are easy to discover through existing > interfaces. > > Signed-off-by: Petr Machata <petrm@mellanox.com> This looks good and will be use-full. But i have some comments on the implementation below. Also, carrier can go down due to protocol down (IFLA_PROTODOWN). And I get asked about supporting reason codes or protocol owner for such protodown reason (I have not given it much thought yet. I will see if there is a way to use your series for that as well). > --- > include/net/rtnetlink.h | 3 +++ > include/uapi/linux/if_link.h | 16 ++++++++++++++++ > net/core/rtnetlink.c | 22 ++++++++++++++++++++++ > 3 files changed, 41 insertions(+) > > diff --git a/include/net/rtnetlink.h b/include/net/rtnetlink.h > index e2091bb2b3a8..cfd9e86ff0ca 100644 > --- a/include/net/rtnetlink.h > +++ b/include/net/rtnetlink.h > @@ -110,6 +110,9 @@ struct rtnl_link_ops { > int (*fill_linkxstats)(struct sk_buff *skb, > const struct net_device *dev, > int *prividx, int attr); > + size_t (*link_down_reason_get_size)(const struct net_device *dev); > + int (*fill_link_down_reason)(struct sk_buff *skb, > + const struct net_device *dev); > }; Any reason to restrict this to network interfaces which support rtnl_link_ops ?. I also saw that you added rtnl_link_ops to mlxsw for this. Which also means every driver will now have to declare rtnl_link_ops to use this ?. I think we should keep rtnl_link_ops to logical links like bridge, bonds etc (ie which support newlink and dellink). Can't we use netdev_ops for this ?. That will allow any driver to just support this readily. > > int __rtnl_link_register(struct rtnl_link_ops *ops); > diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h > index 5b225ff63b48..a47f42e79741 100644 > --- a/include/uapi/linux/if_link.h > +++ b/include/uapi/linux/if_link.h > @@ -167,6 +167,8 @@ enum { > IFLA_NEW_IFINDEX, > IFLA_MIN_MTU, > IFLA_MAX_MTU, > + IFLA_LINK_DOWN_REASON_MAJOR, /* enum rtnl_link_down_reason_major */ > + IFLA_LINK_DOWN_REASON_MINOR, > __IFLA_MAX > }; > > @@ -239,6 +241,20 @@ enum in6_addr_gen_mode { > IN6_ADDR_GEN_MODE_RANDOM, > }; > > +enum rtnl_link_down_reason_major { > + RTNL_LDR_OTHER, > + RTNL_LDR_NO_CABLE, > + RTNL_LDR_UNSUPPORTED_CABLE, > + RTNL_LDR_AUTONEG_FAILURE, > + RTNL_LDR_NO_LINK_PARTNER, > + RTNL_LDR_LINK_TRAINING_FAILURE, > + RTNL_LDR_LOGICAL_MISMATCH, > + RTNL_LDR_REMOTE_FAULT, > + RTNL_LDR_BAD_SIGNAL_INTEGRITY, > + RTNL_LDR_CALIBRATION_FAILURE, > + RTNL_LDR_POWER_BUDGET_EXCEEDED, > +}; prefer LINK_DOWN_REASON_* or LINKDOWN_REASON_* (Though there is no predefined convention, the prefix RTNL makes it feel like a top-level attribute when its really a value for an IFLA_* attribute.) > + > /* Bridge section */ > > enum { > diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c > index a51cab95ba64..206795f13850 100644 > --- a/net/core/rtnetlink.c > +++ b/net/core/rtnetlink.c > @@ -979,6 +979,22 @@ static size_t rtnl_xdp_size(void) > return xdp_size; > } > > +static bool rtnl_should_fill_link_down_reason(const struct net_device *dev) > +{ > + return (dev->flags & IFF_UP) && !netif_oper_up(dev) && > + dev->rtnl_link_ops && > + dev->rtnl_link_ops->link_down_reason_get_size && > + dev->rtnl_link_ops->fill_link_down_reason; > +} > + > +static size_t rtnl_link_down_reason_get_size(const struct net_device *dev) > +{ > + if (!rtnl_should_fill_link_down_reason(dev)) > + return 0; > + > + return dev->rtnl_link_ops->link_down_reason_get_size(dev); > +} > + > static noinline size_t if_nlmsg_size(const struct net_device *dev, > u32 ext_filter_mask) > { > @@ -1026,6 +1042,7 @@ static noinline size_t if_nlmsg_size(const struct net_device *dev, > + nla_total_size(4) /* IFLA_CARRIER_DOWN_COUNT */ > + nla_total_size(4) /* IFLA_MIN_MTU */ > + nla_total_size(4) /* IFLA_MAX_MTU */ > + + rtnl_link_down_reason_get_size(dev) > + 0; > } > > @@ -1683,6 +1700,9 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb, > nla_put_s32(skb, IFLA_NEW_IFINDEX, new_ifindex) < 0) > goto nla_put_failure; > > + if (rtnl_should_fill_link_down_reason(dev) && > + dev->rtnl_link_ops->fill_link_down_reason(skb, dev)) > + goto nla_put_failure; > > rcu_read_lock(); > if (rtnl_fill_link_af(skb, dev, ext_filter_mask)) > @@ -1742,6 +1762,8 @@ static const struct nla_policy ifla_policy[IFLA_MAX+1] = { > [IFLA_CARRIER_DOWN_COUNT] = { .type = NLA_U32 }, > [IFLA_MIN_MTU] = { .type = NLA_U32 }, > [IFLA_MAX_MTU] = { .type = NLA_U32 }, > + [IFLA_LINK_DOWN_REASON_MAJOR] = { .type = NLA_U32 }, > + [IFLA_LINK_DOWN_REASON_MINOR] = { .type = NLA_U32 }, > }; > > static const struct nla_policy ifla_info_policy[IFLA_INFO_MAX+1] = { > -- > 2.4.11 >
> Can't we use netdev_ops for this ?. That will allow any driver to just > support this readily. Hi Roopa I argued this is a PHY layer status information. We don't really want to have to modify every MAC driver to call into phylib/phylink. Yes, we can have a netdev_ops for those drivers which ignore the Linux PHY layer, but ideally we want to transparently call into the PHY layer to get this information, if the MAC is not doing odd things. Andrew
Roopa Prabhu <roopa@cumulusnetworks.com> writes: > This looks good and will be use-full. But i have some comments on the > implementation below. > Also, carrier can go down due to protocol down (IFLA_PROTODOWN). And I > get asked about supporting > reason codes or protocol owner for such protodown reason (I have not > given it much thought yet. I will see if there is a way > to use your series for that as well). My thinking was that since protocol down is set from userspace, it's under admin control, and that's where the reason signalling should be. >> --- >> include/net/rtnetlink.h | 3 +++ >> include/uapi/linux/if_link.h | 16 ++++++++++++++++ >> net/core/rtnetlink.c | 22 ++++++++++++++++++++++ >> 3 files changed, 41 insertions(+) >> >> diff --git a/include/net/rtnetlink.h b/include/net/rtnetlink.h >> index e2091bb2b3a8..cfd9e86ff0ca 100644 >> --- a/include/net/rtnetlink.h >> +++ b/include/net/rtnetlink.h >> @@ -110,6 +110,9 @@ struct rtnl_link_ops { >> int (*fill_linkxstats)(struct sk_buff *skb, >> const struct net_device *dev, >> int *prividx, int attr); >> + size_t (*link_down_reason_get_size)(const struct net_device *dev); >> + int (*fill_link_down_reason)(struct sk_buff *skb, >> + const struct net_device *dev); >> }; > > Any reason to restrict this to network interfaces which support rtnl_link_ops ?. > I also saw that you added rtnl_link_ops to mlxsw for this. Which also > means every driver will now have to declare rtnl_link_ops to use this > ?. > I think we should keep rtnl_link_ops to logical links like bridge, > bonds etc (ie which support newlink and dellink). > > Can't we use netdev_ops for this ?. That will allow any driver to just > support this readily. I guess you are right. >> +enum rtnl_link_down_reason_major { >> + RTNL_LDR_OTHER, >> + RTNL_LDR_NO_CABLE, >> + RTNL_LDR_UNSUPPORTED_CABLE, >> + RTNL_LDR_AUTONEG_FAILURE, >> + RTNL_LDR_NO_LINK_PARTNER, >> + RTNL_LDR_LINK_TRAINING_FAILURE, >> + RTNL_LDR_LOGICAL_MISMATCH, >> + RTNL_LDR_REMOTE_FAULT, >> + RTNL_LDR_BAD_SIGNAL_INTEGRITY, >> + RTNL_LDR_CALIBRATION_FAILURE, >> + RTNL_LDR_POWER_BUDGET_EXCEEDED, >> +}; > > prefer LINK_DOWN_REASON_* or LINKDOWN_REASON_* > (Though there is no predefined convention, the prefix RTNL makes it > feel like a top-level attribute when its really a value for an IFLA_* > attribute.) OK.
Jakub Kicinski <jakub.kicinski@netronome.com> writes: > On Fri, 15 Mar 2019 17:56:07 +0000, Petr Machata wrote: >> diff --git a/include/net/rtnetlink.h b/include/net/rtnetlink.h >> index e2091bb2b3a8..cfd9e86ff0ca 100644 >> --- a/include/net/rtnetlink.h >> +++ b/include/net/rtnetlink.h >> @@ -110,6 +110,9 @@ struct rtnl_link_ops { >> int (*fill_linkxstats)(struct sk_buff *skb, >> const struct net_device *dev, >> int *prividx, int attr); >> + size_t (*link_down_reason_get_size)(const struct net_device *dev); >> + int (*fill_link_down_reason)(struct sk_buff *skb, >> + const struct net_device *dev); > > IMHO the API is a little heavy for returning, what is effectively a u64 > value sliced in two.. If a year from now someone wants to add string reason, this API will just extend naturally to cover that case. > Perhaps the core can just assume the reason will be provided if the NDO > is present? And the "fill" NDO should probably fill in the reason > structure, rather than getting the skb passed and dealing with netlink > directly. This copies how other fill APIs are done, in that the responsibility for filling up the message is deferred to the driver. I think it makes the API easier to extend: if there ever is richer information available, the drivers that want to support it just opt in and provide those attributes. > Also perhaps this would be a ethtool-nl candidate (which would > hopefully land soon after the merge window)? Is this the repository with the patches? I'll take a look. https://github.com/mkubecek/ethnl Thanks, Petr
On Mon, Mar 18, 2019 at 12:34:55PM +0000, Petr Machata wrote: > Jakub Kicinski <jakub.kicinski@netronome.com> writes: > > Also perhaps this would be a ethtool-nl candidate (which would > > hopefully land soon after the merge window)? > > Is this the repository with the patches? I'll take a look. > > https://github.com/mkubecek/ethnl Yes, that's it. Michal
> This copies how other fill APIs are done, in that the responsibility for > filling up the message is deferred to the driver. I think it makes the > API easier to extend: if there ever is richer information available, the > drivers that want to support it just opt in and provide those attributes. Hi Petr Actually, if you look at the PHY layer, none of the drivers have anything to do with netlink messages. Nor does the phylib or phylink core. It all happens a layer above. I would keep the API to the driver simple, just ask it for two u32, and do all the netlink conversion in the core. Andrew
Andrew Lunn <andrew@lunn.ch> writes: >> +enum rtnl_link_down_reason_major { >> + RTNL_LDR_OTHER, > > Does 'other' make any sense? Seem better to just not report anything > at all, or add a comment that more reasons should be added at the end > to reflect whatever the hardware or software can determine. You still have the minor code to give you some information. >> + RTNL_LDR_NO_CABLE, >> + RTNL_LDR_UNSUPPORTED_CABLE, >> + RTNL_LDR_AUTONEG_FAILURE, >> + RTNL_LDR_NO_LINK_PARTNER, >> + RTNL_LDR_LINK_TRAINING_FAILURE, >> + RTNL_LDR_LOGICAL_MISMATCH, >> + RTNL_LDR_REMOTE_FAULT, >> + RTNL_LDR_BAD_SIGNAL_INTEGRITY, >> + RTNL_LDR_CALIBRATION_FAILURE, >> + RTNL_LDR_POWER_BUDGET_EXCEEDED, >> +}; > > What about SFP cage empty?, i.e. no SFP, SFP+ module in the cage? An No cable? Maybe the name needs to change... > SFP can also report LOS. That does not appear to be any of the above. > Or that the core SFP code has been unable to read the EEPROM? We have My assumption was that cable with unreadable EEPROM is simply a bad cable. Does the admin actually care which particular part of the cable is at fault? > people reporting this problem at the moment. We also have that the > SERDES has not yet obtained sync to its peer, which you know from > phylink_mac_change. That probably means the peer is using a different > bit rate. We can add this. > I think it would be good if you handle the general case errors which > phylib and phylink can report, as well as the proprietary cases your > driver can report. We don't want this to be a Mellanox only API. Sure, I'll take a look at that. I didn't need to deal with PHY so far, so I need to figure out what's what.
On Mon, Mar 18, 2019 at 01:15:41PM +0000, Petr Machata wrote: > > Andrew Lunn <andrew@lunn.ch> writes: > > >> +enum rtnl_link_down_reason_major { > >> + RTNL_LDR_OTHER, > > > > Does 'other' make any sense? Seem better to just not report anything > > at all, or add a comment that more reasons should be added at the end > > to reflect whatever the hardware or software can determine. > > You still have the minor code to give you some information. > > >> + RTNL_LDR_NO_CABLE, > >> + RTNL_LDR_UNSUPPORTED_CABLE, > >> + RTNL_LDR_AUTONEG_FAILURE, > >> + RTNL_LDR_NO_LINK_PARTNER, > >> + RTNL_LDR_LINK_TRAINING_FAILURE, > >> + RTNL_LDR_LOGICAL_MISMATCH, > >> + RTNL_LDR_REMOTE_FAULT, > >> + RTNL_LDR_BAD_SIGNAL_INTEGRITY, > >> + RTNL_LDR_CALIBRATION_FAILURE, > >> + RTNL_LDR_POWER_BUDGET_EXCEEDED, > >> +}; > > > > What about SFP cage empty?, i.e. no SFP, SFP+ module in the cage? An > > No cable? Maybe the name needs to change... An SFP module, and the cable plugged into it via LC connectors, are physically different things. And you can also have an SFP with an RJ45 for 1G copper. I know at higher speeds they can be inseparable, but this needs to be a generic API and also work with them being two separate things. > > > SFP can also report LOS. That does not appear to be any of the above. > > Or that the core SFP code has been unable to read the EEPROM? We have > > My assumption was that cable with unreadable EEPROM is simply a bad > cable. Does the admin actually care which particular part of the cable > is at fault? Yes. I throw away the SFP module, because its EEPROM is broke, but don't need to replace the 1KM of fibre cable, or 100m of Cat 6a copper cable. Classic example would be fibre to the home. Do you really think they are going to dig up the road again, to replace the cable, when all they need to do is replace the SFP module? Andrew
Andrew Lunn <andrew@lunn.ch> writes: > On Mon, Mar 18, 2019 at 01:15:41PM +0000, Petr Machata wrote: >> >> Andrew Lunn <andrew@lunn.ch> writes: >> >> >> +enum rtnl_link_down_reason_major { >> >> + RTNL_LDR_OTHER, >> > >> > Does 'other' make any sense? Seem better to just not report anything >> > at all, or add a comment that more reasons should be added at the end >> > to reflect whatever the hardware or software can determine. >> >> You still have the minor code to give you some information. >> >> >> + RTNL_LDR_NO_CABLE, >> >> + RTNL_LDR_UNSUPPORTED_CABLE, >> >> + RTNL_LDR_AUTONEG_FAILURE, >> >> + RTNL_LDR_NO_LINK_PARTNER, >> >> + RTNL_LDR_LINK_TRAINING_FAILURE, >> >> + RTNL_LDR_LOGICAL_MISMATCH, >> >> + RTNL_LDR_REMOTE_FAULT, >> >> + RTNL_LDR_BAD_SIGNAL_INTEGRITY, >> >> + RTNL_LDR_CALIBRATION_FAILURE, >> >> + RTNL_LDR_POWER_BUDGET_EXCEEDED, >> >> +}; >> > >> > What about SFP cage empty?, i.e. no SFP, SFP+ module in the cage? An >> >> No cable? Maybe the name needs to change... > > An SFP module, and the cable plugged into it via LC connectors, are > physically different things. And you can also have an SFP with an RJ45 > for 1G copper. I know at higher speeds they can be inseparable, but > this needs to be a generic API and also work with them being two > separate things. Understood. >> >> > SFP can also report LOS. That does not appear to be any of the above. >> > Or that the core SFP code has been unable to read the EEPROM? We have >> >> My assumption was that cable with unreadable EEPROM is simply a bad >> cable. Does the admin actually care which particular part of the cable >> is at fault? > > Yes. I throw away the SFP module, because its EEPROM is broke, but > don't need to replace the 1KM of fibre cable, or 100m of Cat 6a copper > cable. Classic example would be fibre to the home. OK, gotcha.
On Mon, Mar 18, 2019 at 01:15:41PM +0000, Petr Machata wrote: > > Andrew Lunn <andrew@lunn.ch> writes: > > >> +enum rtnl_link_down_reason_major { > >> + RTNL_LDR_OTHER, > > > > Does 'other' make any sense? Seem better to just not report anything > > at all, or add a comment that more reasons should be added at the end > > to reflect whatever the hardware or software can determine. > > You still have the minor code to give you some information. The problem i have with OTHER, is that you know it is not NO_CABLE, UNSUPPORTED_CABLE, AUTONEG_FAILURE, etc. But for people to know what OTHER cannot be, they have to know all the codes. But then later, some other driver writer does the right thing, adds a new value to the end for a code they can detect. Say for example SFP_OVERHEATED. This happened to be what the previous driver was using for OTHER. Now we have one driver returning SFP_OVERHEATED and the older driver OTHER. So OTHER no longer actually mean 'other', it just means something random, which could actually be the same as one of the listed codes. You can stop this from happening by not having OTHER. Always add a new code if there is something you can report, but there currently is no code for it. And the userspace tool should just print the decimal value if it does not know what text to translate it into. Andrew
On Mon, 18 Mar 2019 15:02:53 +0100 Andrew Lunn <andrew@lunn.ch> wrote: > On Mon, Mar 18, 2019 at 01:15:41PM +0000, Petr Machata wrote: > > > > Andrew Lunn <andrew@lunn.ch> writes: > > > > >> +enum rtnl_link_down_reason_major { > > >> + RTNL_LDR_OTHER, > > > > > > Does 'other' make any sense? Seem better to just not report anything > > > at all, or add a comment that more reasons should be added at the end > > > to reflect whatever the hardware or software can determine. > > > > You still have the minor code to give you some information. > > The problem i have with OTHER, is that you know it is not NO_CABLE, > UNSUPPORTED_CABLE, AUTONEG_FAILURE, etc. But for people to know what > OTHER cannot be, they have to know all the codes. > > But then later, some other driver writer does the right thing, adds a > new value to the end for a code they can detect. Say for example > SFP_OVERHEATED. This happened to be what the previous driver was > using for OTHER. Now we have one driver returning SFP_OVERHEATED and > the older driver OTHER. So OTHER no longer actually mean 'other', it > just means something random, which could actually be the same as one > of the listed codes. > > You can stop this from happening by not having OTHER. Always add a new > code if there is something you can report, but there currently is no > code for it. And the userspace tool should just print the decimal > value if it does not know what text to translate it into. Gut feel is that enumerated values are going to grow and grow and be long term API headache. Would it be possible to use a string like the external ack error message?
Stephen Hemminger <stephen@networkplumber.org> writes: > On Mon, 18 Mar 2019 15:02:53 +0100 > Andrew Lunn <andrew@lunn.ch> wrote: > >> On Mon, Mar 18, 2019 at 01:15:41PM +0000, Petr Machata wrote: >> > >> > Andrew Lunn <andrew@lunn.ch> writes: >> > >> > >> +enum rtnl_link_down_reason_major { >> > >> + RTNL_LDR_OTHER, >> > > >> > > Does 'other' make any sense? Seem better to just not report anything >> > > at all, or add a comment that more reasons should be added at the end >> > > to reflect whatever the hardware or software can determine. >> > >> > You still have the minor code to give you some information. >> >> The problem i have with OTHER, is that you know it is not NO_CABLE, >> UNSUPPORTED_CABLE, AUTONEG_FAILURE, etc. But for people to know what >> OTHER cannot be, they have to know all the codes. >> >> But then later, some other driver writer does the right thing, adds a >> new value to the end for a code they can detect. Say for example >> SFP_OVERHEATED. This happened to be what the previous driver was >> using for OTHER. Now we have one driver returning SFP_OVERHEATED and >> the older driver OTHER. So OTHER no longer actually mean 'other', it >> just means something random, which could actually be the same as one >> of the listed codes. >> >> You can stop this from happening by not having OTHER. Always add a new >> code if there is something you can report, but there currently is no >> code for it. And the userspace tool should just print the decimal >> value if it does not know what text to translate it into. > > Gut feel is that enumerated values are going to grow and grow and be > long term API headache. > > Would it be possible to use a string like the external ack error > message? It would, but then if any automated tools want to make use of it beyond just blindly displaying it, they will need to parse it with all the usual problems. In the end the string itself becomes the API anyway. Adding a string would make sense as an extra piece of information, not as the primary channel. Extack is like this as well, the primary channel there is errno.
On Tue, Mar 19, 2019 at 10:18:00AM +0000, Petr Machata wrote: > Stephen Hemminger <stephen@networkplumber.org> writes: > > Gut feel is that enumerated values are going to grow and grow and be > > long term API headache. > > > > Would it be possible to use a string like the external ack error > > message? > > It would, but then if any automated tools want to make use of it beyond > just blindly displaying it, they will need to parse it with all the > usual problems. In the end the string itself becomes the API anyway. Providing only text description doesn't sound right to me either, I would rather prefer having both text and numeric code in such case. > Adding a string would make sense as an extra piece of information, not > as the primary channel. Extack is like this as well, the primary channel > there is errno. Not exactly, the primary goal of extack was IMHO to provide better granularity than the standard error codes allowed (we all remember the cryptic messages of older iproute2 versions). A better analogy would be numeric error codes which could be passed via extack cookie (e.g. nl_set_extack_cookie_u64()) together with a text representation. But AFAIK only few netlink based interfaces use those. Michal Kubecek
On Tue, 19 Mar 2019 10:18:00 +0000 Petr Machata <petrm@mellanox.com> wrote: > Stephen Hemminger <stephen@networkplumber.org> writes: > > > On Mon, 18 Mar 2019 15:02:53 +0100 > > Andrew Lunn <andrew@lunn.ch> wrote: > > > >> On Mon, Mar 18, 2019 at 01:15:41PM +0000, Petr Machata wrote: > >> > > >> > Andrew Lunn <andrew@lunn.ch> writes: > >> > > >> > >> +enum rtnl_link_down_reason_major { > >> > >> + RTNL_LDR_OTHER, > >> > > > >> > > Does 'other' make any sense? Seem better to just not report anything > >> > > at all, or add a comment that more reasons should be added at the end > >> > > to reflect whatever the hardware or software can determine. > >> > > >> > You still have the minor code to give you some information. > >> > >> The problem i have with OTHER, is that you know it is not NO_CABLE, > >> UNSUPPORTED_CABLE, AUTONEG_FAILURE, etc. But for people to know what > >> OTHER cannot be, they have to know all the codes. > >> > >> But then later, some other driver writer does the right thing, adds a > >> new value to the end for a code they can detect. Say for example > >> SFP_OVERHEATED. This happened to be what the previous driver was > >> using for OTHER. Now we have one driver returning SFP_OVERHEATED and > >> the older driver OTHER. So OTHER no longer actually mean 'other', it > >> just means something random, which could actually be the same as one > >> of the listed codes. > >> > >> You can stop this from happening by not having OTHER. Always add a new > >> code if there is something you can report, but there currently is no > >> code for it. And the userspace tool should just print the decimal > >> value if it does not know what text to translate it into. > > > > Gut feel is that enumerated values are going to grow and grow and be > > long term API headache. > > > > Would it be possible to use a string like the external ack error > > message? > > It would, but then if any automated tools want to make use of it beyond > just blindly displaying it, they will need to parse it with all the > usual problems. In the end the string itself becomes the API anyway. > > Adding a string would make sense as an extra piece of information, not > as the primary channel. Extack is like this as well, the primary channel > there is errno. The problem with codes is that without some standard (like IETF) the values are very system specific and likely to get lots of of version churn.
Stephen Hemminger <stephen@networkplumber.org> writes: > On Tue, 19 Mar 2019 10:18:00 +0000 > Petr Machata <petrm@mellanox.com> wrote: > >> Stephen Hemminger <stephen@networkplumber.org> writes: >> >> > On Mon, 18 Mar 2019 15:02:53 +0100 >> > Andrew Lunn <andrew@lunn.ch> wrote: >> > >> >> On Mon, Mar 18, 2019 at 01:15:41PM +0000, Petr Machata wrote: >> >> > >> >> > Andrew Lunn <andrew@lunn.ch> writes: >> >> > >> >> > >> +enum rtnl_link_down_reason_major { >> >> > >> + RTNL_LDR_OTHER, >> >> > > >> >> > > Does 'other' make any sense? Seem better to just not report anything >> >> > > at all, or add a comment that more reasons should be added at the end >> >> > > to reflect whatever the hardware or software can determine. >> >> > >> >> > You still have the minor code to give you some information. >> >> >> >> The problem i have with OTHER, is that you know it is not NO_CABLE, >> >> UNSUPPORTED_CABLE, AUTONEG_FAILURE, etc. But for people to know what >> >> OTHER cannot be, they have to know all the codes. >> >> >> >> But then later, some other driver writer does the right thing, adds a >> >> new value to the end for a code they can detect. Say for example >> >> SFP_OVERHEATED. This happened to be what the previous driver was >> >> using for OTHER. Now we have one driver returning SFP_OVERHEATED and >> >> the older driver OTHER. So OTHER no longer actually mean 'other', it >> >> just means something random, which could actually be the same as one >> >> of the listed codes. >> >> >> >> You can stop this from happening by not having OTHER. Always add a new >> >> code if there is something you can report, but there currently is no >> >> code for it. And the userspace tool should just print the decimal >> >> value if it does not know what text to translate it into. >> > >> > Gut feel is that enumerated values are going to grow and grow and be >> > long term API headache. >> > >> > Would it be possible to use a string like the external ack error >> > message? >> >> It would, but then if any automated tools want to make use of it beyond >> just blindly displaying it, they will need to parse it with all the >> usual problems. In the end the string itself becomes the API anyway. > > The problem with codes is that without some standard (like IETF) the > values are very system specific and likely to get lots of of version churn. I think this would be even worse with just strings.
Andrew Lunn <andrew@lunn.ch> writes: > I argued this is a PHY layer status information. We don't really want > to have to modify every MAC driver to call into phylib/phylink. Yes, > we can have a netdev_ops for those drivers which ignore the Linux PHY > layer, but ideally we want to transparently call into the PHY layer to > get this information, if the MAC is not doing odd things. From what I see, there are four places where the information could be collected: - MAC driver: dev->ethtool_ops->get_link_down_reason (At least mlxsw and mlx5 need this.) modified include/linux/ethtool.h @@ -395,4 +395,7 @@ struct ethtool_ops { struct ethtool_fecparam *); void (*get_ethtool_phy_stats)(struct net_device *, struct ethtool_stats *, u64 *); + int (*get_link_down_reason)(const struct net_device *, + enum link_down_reason_major *, + u32 *minor); }; Return -ENODATA to indicate there's nothing to report. - PHY driver: dev->phydev->drv->get_link_down_reason Certain PHY drivers might want to have a custom handling for some PHY-specific insight. Something like: modified include/linux/phy.h @@ -636,4 +636,7 @@ struct phy_driver { struct ethtool_tunable *tuna, const void *data); int (*set_loopback)(struct phy_device *dev, bool enable); + int (*get_link_down_reason)(struct phy_device *dev, + enum link_down_reason_major *major, + u32 *minor); }; Return -ENODATA to indicate there's nothing to report. - Generic PHY It would be possible to add a function that e.g. consults the PHY state machine to figure out what went wrong. Phy as such may have to be extended to keep track of e.g. why the state machine is PHY_HALTED, or possibly other problems worthy of reporting. - phylink I'm not sure how to generically plug in the phylink library, because it is a private property of a MAC driver. There are currently only three drivers using phylink (mvvp2, mvneta, DSA), but it looks as if the intention is that phylink is used much more widely. So maybe it would make sense to have an NDO like get_phylink and use that to call into the phylink library for some generic handling. Speaking specifically, my patch set would only do the first step above, because neither mlxsw nor mlx5 use the PHY subsystem. But it would be easy enough to extend for the other cases above. Thoughts? Thanks, Petr
On Thu, Mar 28, 2019 at 05:59:50PM +0000, Petr Machata wrote: Hi Petr > - PHY driver: dev->phydev->drv->get_link_down_reason > Certain PHY drivers might want to have a custom handling for some > PHY-specific insight. Something like: > > modified include/linux/phy.h > @@ -636,4 +636,7 @@ struct phy_driver { > struct ethtool_tunable *tuna, > const void *data); > int (*set_loopback)(struct phy_device *dev, bool enable); > + int (*get_link_down_reason)(struct phy_device *dev, > + enum link_down_reason_major *major, > + u32 *minor); > }; > > Return -ENODATA to indicate there's nothing to report. > > - Generic PHY > It would be possible to add a function that e.g. consults the PHY > state machine to figure out what went wrong. Phy as such may have to > be extended to keep track of e.g. why the state machine is PHY_HALTED, > or possibly other problems worthy of reporting. I would suggest changing the order of these two. Put the generic PHY first. If it sees the driver has a OP to get more specific detail, it would make the call into the driver. You don't violate any layering that way, and the locking is done correctly. The PHY being in state HALTED probably means there is a hardware error at the MDIO level. That does not happen very often at all. What you are more interested is state PHY_NOLINK, which indicates there is no link. Maybe because auto-neg didn't complete, if it was turned on, or if auto-neg is off and the link is forced, the peer is using something different, or is not there at all. > - phylink > I'm not sure how to generically plug in the phylink library, because > it is a private property of a MAC driver. There are currently only > three drivers using phylink (mvvp2, mvneta, DSA), but it looks as if > the intention is that phylink is used much more widely. So maybe it > would make sense to have an NDO like get_phylink and use that to call > into the phylink library for some generic handling. We need to consider adding a phylink pointer to the netdev structure soon, in the same way there is a phylib structure. > Speaking specifically, my patch set would only do the first step above, > because neither mlxsw nor mlx5 use the PHY subsystem. Shame. With just mlx, you will get a coverage of ~0. If you did the generic phylib work, you might get coverage of > 50% of the MAC drivers. It then becomes something useful and really used. Andrew
Thu, Mar 28, 2019 at 08:51:34PM CET, andrew@lunn.ch wrote: >On Thu, Mar 28, 2019 at 05:59:50PM +0000, Petr Machata wrote: > >Hi Petr > >> - PHY driver: dev->phydev->drv->get_link_down_reason >> Certain PHY drivers might want to have a custom handling for some >> PHY-specific insight. Something like: >> >> modified include/linux/phy.h >> @@ -636,4 +636,7 @@ struct phy_driver { >> struct ethtool_tunable *tuna, >> const void *data); >> int (*set_loopback)(struct phy_device *dev, bool enable); >> + int (*get_link_down_reason)(struct phy_device *dev, >> + enum link_down_reason_major *major, >> + u32 *minor); >> }; >> >> Return -ENODATA to indicate there's nothing to report. >> >> - Generic PHY >> It would be possible to add a function that e.g. consults the PHY >> state machine to figure out what went wrong. Phy as such may have to >> be extended to keep track of e.g. why the state machine is PHY_HALTED, >> or possibly other problems worthy of reporting. > >I would suggest changing the order of these two. Put the generic PHY >first. If it sees the driver has a OP to get more specific detail, it >would make the call into the driver. You don't violate any layering >that way, and the locking is done correctly. > >The PHY being in state HALTED probably means there is a hardware error >at the MDIO level. That does not happen very often at all. > >What you are more interested is state PHY_NOLINK, which indicates >there is no link. Maybe because auto-neg didn't complete, if it was >turned on, or if auto-neg is off and the link is forced, the peer is >using something different, or is not there at all. > >> - phylink >> I'm not sure how to generically plug in the phylink library, because >> it is a private property of a MAC driver. There are currently only >> three drivers using phylink (mvvp2, mvneta, DSA), but it looks as if >> the intention is that phylink is used much more widely. So maybe it >> would make sense to have an NDO like get_phylink and use that to call >> into the phylink library for some generic handling. > >We need to consider adding a phylink pointer to the netdev structure >soon, in the same way there is a phylib structure. > >> Speaking specifically, my patch set would only do the first step above, >> because neither mlxsw nor mlx5 use the PHY subsystem. > >Shame. With just mlx, you will get a coverage of ~0. If you did the >generic phylib work, you might get coverage of > 50% of the MAC >drivers. It then becomes something useful and really used. Yeah, that's what it is. But maybe someone would pick up the work and do the phylib implementation too. Let's see. > > Andrew
diff --git a/include/net/rtnetlink.h b/include/net/rtnetlink.h index e2091bb2b3a8..cfd9e86ff0ca 100644 --- a/include/net/rtnetlink.h +++ b/include/net/rtnetlink.h @@ -110,6 +110,9 @@ struct rtnl_link_ops { int (*fill_linkxstats)(struct sk_buff *skb, const struct net_device *dev, int *prividx, int attr); + size_t (*link_down_reason_get_size)(const struct net_device *dev); + int (*fill_link_down_reason)(struct sk_buff *skb, + const struct net_device *dev); }; int __rtnl_link_register(struct rtnl_link_ops *ops); diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h index 5b225ff63b48..a47f42e79741 100644 --- a/include/uapi/linux/if_link.h +++ b/include/uapi/linux/if_link.h @@ -167,6 +167,8 @@ enum { IFLA_NEW_IFINDEX, IFLA_MIN_MTU, IFLA_MAX_MTU, + IFLA_LINK_DOWN_REASON_MAJOR, /* enum rtnl_link_down_reason_major */ + IFLA_LINK_DOWN_REASON_MINOR, __IFLA_MAX }; @@ -239,6 +241,20 @@ enum in6_addr_gen_mode { IN6_ADDR_GEN_MODE_RANDOM, }; +enum rtnl_link_down_reason_major { + RTNL_LDR_OTHER, + RTNL_LDR_NO_CABLE, + RTNL_LDR_UNSUPPORTED_CABLE, + RTNL_LDR_AUTONEG_FAILURE, + RTNL_LDR_NO_LINK_PARTNER, + RTNL_LDR_LINK_TRAINING_FAILURE, + RTNL_LDR_LOGICAL_MISMATCH, + RTNL_LDR_REMOTE_FAULT, + RTNL_LDR_BAD_SIGNAL_INTEGRITY, + RTNL_LDR_CALIBRATION_FAILURE, + RTNL_LDR_POWER_BUDGET_EXCEEDED, +}; + /* Bridge section */ enum { diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index a51cab95ba64..206795f13850 100644 --- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -979,6 +979,22 @@ static size_t rtnl_xdp_size(void) return xdp_size; } +static bool rtnl_should_fill_link_down_reason(const struct net_device *dev) +{ + return (dev->flags & IFF_UP) && !netif_oper_up(dev) && + dev->rtnl_link_ops && + dev->rtnl_link_ops->link_down_reason_get_size && + dev->rtnl_link_ops->fill_link_down_reason; +} + +static size_t rtnl_link_down_reason_get_size(const struct net_device *dev) +{ + if (!rtnl_should_fill_link_down_reason(dev)) + return 0; + + return dev->rtnl_link_ops->link_down_reason_get_size(dev); +} + static noinline size_t if_nlmsg_size(const struct net_device *dev, u32 ext_filter_mask) { @@ -1026,6 +1042,7 @@ static noinline size_t if_nlmsg_size(const struct net_device *dev, + nla_total_size(4) /* IFLA_CARRIER_DOWN_COUNT */ + nla_total_size(4) /* IFLA_MIN_MTU */ + nla_total_size(4) /* IFLA_MAX_MTU */ + + rtnl_link_down_reason_get_size(dev) + 0; } @@ -1683,6 +1700,9 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb, nla_put_s32(skb, IFLA_NEW_IFINDEX, new_ifindex) < 0) goto nla_put_failure; + if (rtnl_should_fill_link_down_reason(dev) && + dev->rtnl_link_ops->fill_link_down_reason(skb, dev)) + goto nla_put_failure; rcu_read_lock(); if (rtnl_fill_link_af(skb, dev, ext_filter_mask)) @@ -1742,6 +1762,8 @@ static const struct nla_policy ifla_policy[IFLA_MAX+1] = { [IFLA_CARRIER_DOWN_COUNT] = { .type = NLA_U32 }, [IFLA_MIN_MTU] = { .type = NLA_U32 }, [IFLA_MAX_MTU] = { .type = NLA_U32 }, + [IFLA_LINK_DOWN_REASON_MAJOR] = { .type = NLA_U32 }, + [IFLA_LINK_DOWN_REASON_MINOR] = { .type = NLA_U32 }, }; static const struct nla_policy ifla_info_policy[IFLA_INFO_MAX+1] = {
In general, after a port is put administratively up, certain handshake protocols have to finish successfully, otherwise the port is left in a NO-CARRIER or DORMANT state. When that happens, it would be useful to communicate the reasons to the administrator, so that the problem that prevents the link from being established can be corrected. Introduce two new link attributes: IFLA_LINK_DOWN_REASON_MAJOR and _MINOR. Major reason code serve as broad categories intended to convey a general idea of where the problem is. Minor codes are arbitrary numbers specific for the driver that add detail to the major reasons. Add enum rtnl_link_down_reason_major to define the well-known major reason codes. The party with visibility into details of this process is the driver. Therefore add two new RTNL hooks, link_down_reason_get_size and fill_link_down_reason, to provide the necessary information. Link-down reason is not included if the port is up or administratively down, because those two state are easy to discover through existing interfaces. Signed-off-by: Petr Machata <petrm@mellanox.com> --- include/net/rtnetlink.h | 3 +++ include/uapi/linux/if_link.h | 16 ++++++++++++++++ net/core/rtnetlink.c | 22 ++++++++++++++++++++++ 3 files changed, 41 insertions(+)