Message ID | 20140422141425.127dabd3c63482a6a655469e@redhat.com |
---|---|
State | RFC, archived |
Delegated to: | David Miller |
Headers | show |
On Tue, 2014-04-22 at 14:14 +1000, David Gibson wrote: > I believe I've found a problem with netlink handling which can be > triggered on Cisco enic devices with a large number (30-40) of virtual > functions. I believe this is the cause of a real customer problem > we've seen. > > * When requesting a list of interfaces with RTM_GETLINK, enic devices > (and currently, _only_ enic devices) report IFLA_VF_PORTS > information > > * IFLA_VF_PORTS information has at least 90 bytes ber virtual function > > * Unlike IFLA_VFINFO_LIST, the ports information is always reported, > regardless of the setting of the IFLA_EXT_MASK parameter [...] So I think you should make reporting of IFLA_VF_PORTS dependent on the same flag as IFLA_VFINFO_LIST. Ben.
From: Ben Hutchings <ben@decadent.org.uk> Date: Tue, 22 Apr 2014 19:03:19 +0100 > On Tue, 2014-04-22 at 14:14 +1000, David Gibson wrote: >> I believe I've found a problem with netlink handling which can be >> triggered on Cisco enic devices with a large number (30-40) of virtual >> functions. I believe this is the cause of a real customer problem >> we've seen. >> >> * When requesting a list of interfaces with RTM_GETLINK, enic devices >> (and currently, _only_ enic devices) report IFLA_VF_PORTS >> information >> >> * IFLA_VF_PORTS information has at least 90 bytes ber virtual function >> >> * Unlike IFLA_VFINFO_LIST, the ports information is always reported, >> regardless of the setting of the IFLA_EXT_MASK parameter > [...] > > So I think you should make reporting of IFLA_VF_PORTS dependent on the > same flag as IFLA_VFINFO_LIST. I think that's what we'll have to do. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
David, > -----Original Message----- > From: David Gibson [mailto:dgibson@redhat.com] > Sent: Monday, April 21, 2014 9:14 PM > To: netdev@vger.kernel.org > Cc: Christian Benvenuti (benve); Sujith Sankar (ssujith); Govindarajulu > Varadarajan; Neel Patel (neepatel); Nishank Trivedi > Subject: RFC: rtnetlink problems with Cisco enic and VFs > > I believe I've found a problem with netlink handling which can be triggered > on Cisco enic devices with a large number (30-40) of virtual functions. I > believe this is the cause of a real customer problem we've seen. > > * When requesting a list of interfaces with RTM_GETLINK, enic devices > (and currently, _only_ enic devices) report IFLA_VF_PORTS > information Is the fact that Enic is the only driver implementing ndo_get_vf_port [1] the root cause of the problem and the reason why this happens only with Enic? /Chris [1] This is what makes rtnl_port_size to account for vf_port_size*dev_num_vf(...) and rtnl_port_fill to add IFLA_VF_PORTS. > * IFLA_VF_PORTS information has at least 90 bytes ber virtual function > > * Unlike IFLA_VFINFO_LIST, the ports information is always reported, > regardless of the setting of the IFLA_EXT_MASK parameter > > * When IFLA_EXT_MASK is not specified, the reply packets have maximum > size NLMSG_GOODSIZE (4k - overheads) > > * If there are enough virtual functions the IFLA_VF_PORTS information > can cause a single interface's info to exceed NLMSG_GOODSIZE > > * The number of interfaces necessary to trigger this is reduced > substantially if both IPv4 and IPv6 IFLA_AF_SPEC information is > present (~972 bytes) > > * If the dump function returns -EMSGSIZE on the first message in a > packet, netlink_dump() incorrectly assumes the listing is done, > omitting information for that interface and any later ones. > > * This can cause getifaddrs(3) to go into an infinite loop > > * 'ip link' is not affected, because it supplies IFLA_EXT_MASK which > triggers rtnl_calcit() to recalculate the required packet size to > greater than NLMSG_GOODSIZE. > > > I can see several possible ways to fix this, but they all have possible > problems. I'm hoping someone here can determine which, if any, are real > problems, and therefore what's the right approach to fix this. > > A) Always calculate the RTM_NEWLINK packet size, rather than > assuming NLMSG_GOODSIZE. > Problem: The NLMSG_GOODSIZE limit was introduced to stop broken > user tools with limited buffers encountering problems > (see 115c9b81928360d769a76c632bae62d15206a94a). This > approach might mean that such tools break again. > > B) Don't issue the VF port information when RTEXT_FILTER_VF isn't > set > Problem: Do tools using the port information already set this flag? > > C) Don't include the VF port info when listing interfaces, only > when doing GETLINK on a specific interface. > Problem: As (B), plus it's ugly. > > D) Detect the case when the first interface in a packet doesn't fit > reallocate the packet buffer > Problem: As (A), plus more complicated. > > As an interim band-aid, here's a patch which adds a WARN_ON() in this > situation, which will at least make the problem easier to locate for the next > person to encounter it. > > From: David Gibson <david@gibson.dropbear.id.au> > Subject: [PATCH] rtnetlink: Warn when interface's information won't fit > in our packet > > Without IFLA_EXT_MASK specified, the information reported for a single > interface in response to RTM_GETLINK is expected to fit within a netlink > packet of NLMSG_GOODSIZE. > > If it doesn't, however, things will go badly wrong, When listing all > interfaces, netlink_dump() will incorrectly treat -EMSGSIZE on the first > message in a packet as the end of the listing and omit information for that > interface and all subsequent ones. This can cause getifaddrs(3) to enter an > infinite loop. > > This patch won't fix the problem, but it will WARN_ON() making it easier to > track down what's going wrong. > > > Signed-off-by: David Gibson <david@gibson.dropbear.id.au> > --- > net/core/rtnetlink.c | 16 +++++++++++----- > 1 file changed, 11 insertions(+), 5 deletions(-) > > diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index > d4ff417..5331db2 100644 > --- a/net/core/rtnetlink.c > +++ b/net/core/rtnetlink.c > @@ -1198,6 +1198,7 @@ static int rtnl_dump_ifinfo(struct sk_buff *skb, > struct netlink_callback *cb) struct hlist_head *head; > struct nlattr *tb[IFLA_MAX+1]; > u32 ext_filter_mask = 0; > + int err; > > s_h = cb->args[0]; > s_idx = cb->args[1]; > @@ -1218,11 +1219,16 @@ static int rtnl_dump_ifinfo(struct sk_buff *skb, > struct netlink_callback *cb) hlist_for_each_entry_rcu(dev, head, > index_hlist) { if (idx < s_idx) > goto cont; > - if (rtnl_fill_ifinfo(skb, dev, RTM_NEWLINK, > - NETLINK_CB > (cb->skb).portid, > - cb->nlh->nlmsg_seq, 0, > - NLM_F_MULTI, > - ext_filter_mask) <= 0) > + err = rtnl_fill_ifinfo(skb, dev, RTM_NEWLINK, > + NETLINK_CB > (cb->skb).portid, > + cb->nlh->nlmsg_seq, 0, > + NLM_F_MULTI, > + ext_filter_mask); > + /* If we ran out of room on the first message, > + * we're in trouble */ > + WARN_ON((err == -EMSGSIZE) && (skb->len == 0)); > + > + if (err <= 0) > goto out; > > nl_dump_check_consistent(cb, nlmsg_hdr(skb)); > -- > 1.9.0 > > > > -- > David Gibson <dgibson@redhat.com> -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, 22 Apr 2014 19:14:04 +0000 "Christian Benvenuti (benve)" <benve@cisco.com> wrote: > David, > > > -----Original Message----- > > From: David Gibson [mailto:dgibson@redhat.com] > > Sent: Monday, April 21, 2014 9:14 PM > > To: netdev@vger.kernel.org > > Cc: Christian Benvenuti (benve); Sujith Sankar (ssujith); Govindarajulu > > Varadarajan; Neel Patel (neepatel); Nishank Trivedi > > Subject: RFC: rtnetlink problems with Cisco enic and VFs > > > > I believe I've found a problem with netlink handling which can be triggered > > on Cisco enic devices with a large number (30-40) of virtual functions. I > > believe this is the cause of a real customer problem we've seen. > > > > * When requesting a list of interfaces with RTM_GETLINK, enic devices > > (and currently, _only_ enic devices) report IFLA_VF_PORTS > > information > > Is the fact that Enic is the only driver implementing ndo_get_vf_port [1] > the root cause of the problem and the reason why this happens only with Enic? Yes, or at least it's one of the factors. > > /Chris > > [1] > This is what makes rtnl_port_size to account for vf_port_size*dev_num_vf(...) > and rtnl_port_fill to add IFLA_VF_PORTS.
On Tue, 22 Apr 2014 14:12:00 -0400 (EDT) David Miller <davem@davemloft.net> wrote: > From: Ben Hutchings <ben@decadent.org.uk> > Date: Tue, 22 Apr 2014 19:03:19 +0100 > > > On Tue, 2014-04-22 at 14:14 +1000, David Gibson wrote: > >> I believe I've found a problem with netlink handling which can be > >> triggered on Cisco enic devices with a large number (30-40) of virtual > >> functions. I believe this is the cause of a real customer problem > >> we've seen. > >> > >> * When requesting a list of interfaces with RTM_GETLINK, enic devices > >> (and currently, _only_ enic devices) report IFLA_VF_PORTS > >> information > >> > >> * IFLA_VF_PORTS information has at least 90 bytes ber virtual function > >> > >> * Unlike IFLA_VFINFO_LIST, the ports information is always reported, > >> regardless of the setting of the IFLA_EXT_MASK parameter > > [...] > > > > So I think you should make reporting of IFLA_VF_PORTS dependent on the > > same flag as IFLA_VFINFO_LIST. > > I think that's what we'll have to do. Ok, makes logical sense. But does anyone know what tools make use of the IFLA_VF_PORTS information? Do they set the IFLA_EXT_MASK already?
On Wed, 23 Apr 2014 09:26:06 +1000 David Gibson <dgibson@redhat.com> wrote: > On Tue, 22 Apr 2014 14:12:00 -0400 (EDT) > David Miller <davem@davemloft.net> wrote: > > > From: Ben Hutchings <ben@decadent.org.uk> > > Date: Tue, 22 Apr 2014 19:03:19 +0100 > > > > > On Tue, 2014-04-22 at 14:14 +1000, David Gibson wrote: > > >> I believe I've found a problem with netlink handling which can be > > >> triggered on Cisco enic devices with a large number (30-40) of > > >> virtual functions. I believe this is the cause of a real > > >> customer problem we've seen. > > >> > > >> * When requesting a list of interfaces with RTM_GETLINK, enic > > >> devices (and currently, _only_ enic devices) report IFLA_VF_PORTS > > >> information > > >> > > >> * IFLA_VF_PORTS information has at least 90 bytes ber virtual > > >> function > > >> > > >> * Unlike IFLA_VFINFO_LIST, the ports information is always > > >> reported, regardless of the setting of the IFLA_EXT_MASK > > >> parameter > > > [...] > > > > > > So I think you should make reporting of IFLA_VF_PORTS dependent > > > on the same flag as IFLA_VFINFO_LIST. > > > > I think that's what we'll have to do. > > Ok, makes logical sense. > > But does anyone know what tools make use of the IFLA_VF_PORTS > information? Do they set the IFLA_EXT_MASK already? > So far as I know only the IP route tool 'ip link' sets that. In fact, that's the reason I had to add it some number of years and months ago was because there were tools that didn't expect to get all the additional VF info and those tools were getting borked by all the additional goo sent up for VFs. Beyond that who knows what anyone's been up to with other tools in other places? - Greg -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: David Gibson <dgibson@redhat.com> Date: Wed, 23 Apr 2014 09:26:06 +1000 > On Tue, 22 Apr 2014 14:12:00 -0400 (EDT) > David Miller <davem@davemloft.net> wrote: > >> From: Ben Hutchings <ben@decadent.org.uk> >> Date: Tue, 22 Apr 2014 19:03:19 +0100 >> >> > On Tue, 2014-04-22 at 14:14 +1000, David Gibson wrote: >> >> I believe I've found a problem with netlink handling which can be >> >> triggered on Cisco enic devices with a large number (30-40) of virtual >> >> functions. I believe this is the cause of a real customer problem >> >> we've seen. >> >> >> >> * When requesting a list of interfaces with RTM_GETLINK, enic devices >> >> (and currently, _only_ enic devices) report IFLA_VF_PORTS >> >> information >> >> >> >> * IFLA_VF_PORTS information has at least 90 bytes ber virtual function >> >> >> >> * Unlike IFLA_VFINFO_LIST, the ports information is always reported, >> >> regardless of the setting of the IFLA_EXT_MASK parameter >> > [...] >> > >> > So I think you should make reporting of IFLA_VF_PORTS dependent on the >> > same flag as IFLA_VFINFO_LIST. >> >> I think that's what we'll have to do. > > Ok, makes logical sense. > > But does anyone know what tools make use of the IFLA_VF_PORTS > information? Do they set the IFLA_EXT_MASK already? iproute2 makes use of IFLA_VF_PORTS and sets IFLA_EXT_MASK unconditionally in rtnl_wilddump_request(). libvirt also makes use of IFLA_VF_PORTS, and also unconditionally sets IFLA_EXT_MASK in virNetDevLinkDump. I only did a quick search and can't find any more users. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, 22 Apr 2014 17:04:38 -0700 Greg Rose <gregory.v.rose@intel.com> wrote: > On Wed, 23 Apr 2014 09:26:06 +1000 > David Gibson <dgibson@redhat.com> wrote: > > > On Tue, 22 Apr 2014 14:12:00 -0400 (EDT) > > David Miller <davem@davemloft.net> wrote: > > > > > From: Ben Hutchings <ben@decadent.org.uk> > > > Date: Tue, 22 Apr 2014 19:03:19 +0100 > > > > > > > On Tue, 2014-04-22 at 14:14 +1000, David Gibson wrote: > > > >> I believe I've found a problem with netlink handling which can be > > > >> triggered on Cisco enic devices with a large number (30-40) of > > > >> virtual functions. I believe this is the cause of a real > > > >> customer problem we've seen. > > > >> > > > >> * When requesting a list of interfaces with RTM_GETLINK, enic > > > >> devices (and currently, _only_ enic devices) report IFLA_VF_PORTS > > > >> information > > > >> > > > >> * IFLA_VF_PORTS information has at least 90 bytes ber virtual > > > >> function > > > >> > > > >> * Unlike IFLA_VFINFO_LIST, the ports information is always > > > >> reported, regardless of the setting of the IFLA_EXT_MASK > > > >> parameter > > > > [...] > > > > > > > > So I think you should make reporting of IFLA_VF_PORTS dependent > > > > on the same flag as IFLA_VFINFO_LIST. > > > > > > I think that's what we'll have to do. > > > > Ok, makes logical sense. > > > > But does anyone know what tools make use of the IFLA_VF_PORTS > > information? Do they set the IFLA_EXT_MASK already? > > > > So far as I know only the IP route tool 'ip link' sets that. In fact, > that's the reason I had to add it some number of years and months ago > was because there were tools that didn't expect to get all the > additional VF info and those tools were getting borked by all the > additional goo sent up for VFs. > > Beyond that who knows what anyone's been up to with other tools in > other places? And therein lies the problem. I don't even know what the IFLA_VF_PORTS info is for, but presumably something uses it. If they stop receiving it, they can be expected to break horribly.
From: David Gibson <dgibson@redhat.com> Date: Wed, 23 Apr 2014 11:12:03 +1000 > And therein lies the problem. I don't even know what the IFLA_VF_PORTS > info is for, but presumably something uses it. If they stop receiving > it, they can be expected to break horribly. We did the same thing to VFINFO list. All the users I could find already set the mask unconditionally for all device dumps. It's absolutely, positively, the only reasonable fix for this problem. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> -----Original Message----- > From: netdev-owner@vger.kernel.org [mailto:netdev- > owner@vger.kernel.org] On Behalf Of David Miller > Sent: Tuesday, April 22, 2014 6:17 PM > To: dgibson@redhat.com > Cc: gregory.v.rose@intel.com; ben@decadent.org.uk; > netdev@vger.kernel.org; Christian Benvenuti (benve); Sujith Sankar > (ssujith); govindarajulu90@gmail.com; Neel Patel (neepatel); > nistrive@cisco.com > Subject: Re: RFC: rtnetlink problems with Cisco enic and VFs > > From: David Gibson <dgibson@redhat.com> > Date: Wed, 23 Apr 2014 11:12:03 +1000 > > > And therein lies the problem. I don't even know what the > > IFLA_VF_PORTS info is for, but presumably something uses it. If they > > stop receiving it, they can be expected to break horribly. In the case of Enic, libvirt uses IFLA_VF_PORTS in the context of the port profile (see virtualport section and 802.1Qbh in the libvirt network xml documentation). As Miller said, libvirt and iproute2 are the two known users (with libvirt being the main one) but you never know what else may be using it. > We did the same thing to VFINFO list. I guess you refer to Bugzilla 889319. > All the users I could find already set the mask unconditionally for all device > dumps. > > It's absolutely, positively, the only reasonable fix for this problem. The fix based on IFLA_EXT_MASK seems reasonable to me (IFLA_EXT_MASK is in use in libvirt >= 1.0.3 and iproute2 >=3.4.0 based on a quick check). /Chris -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 23 Apr 2014 02:33:06 +0000 "Christian Benvenuti (benve)" <benve@cisco.com> wrote: > > -----Original Message----- > > From: netdev-owner@vger.kernel.org [mailto:netdev- > > owner@vger.kernel.org] On Behalf Of David Miller > > Sent: Tuesday, April 22, 2014 6:17 PM > > To: dgibson@redhat.com > > Cc: gregory.v.rose@intel.com; ben@decadent.org.uk; > > netdev@vger.kernel.org; Christian Benvenuti (benve); Sujith Sankar > > (ssujith); govindarajulu90@gmail.com; Neel Patel (neepatel); > > nistrive@cisco.com > > Subject: Re: RFC: rtnetlink problems with Cisco enic and VFs > > > > From: David Gibson <dgibson@redhat.com> > > Date: Wed, 23 Apr 2014 11:12:03 +1000 > > > > > And therein lies the problem. I don't even know what the > > > IFLA_VF_PORTS info is for, but presumably something uses it. If they > > > stop receiving it, they can be expected to break horribly. > > In the case of Enic, libvirt uses IFLA_VF_PORTS in the context of the port profile > (see virtualport section and 802.1Qbh in the libvirt network xml documentation). > > As Miller said, libvirt and iproute2 are the two known users (with libvirt being the main one) > but you never know what else may be using it. Ah, yes, I see it in libvirt. I don't think it's used in iproute2 though, at least not in the master branch. $ git remote show origin * remote origin Fetch URL: git://git.kernel.org/pub/scm/linux/kernel/git/shemminger/iproute2.git Push URL: git://git.kernel.org/pub/scm/linux/kernel/git/shemminger/iproute2.git HEAD branch: master Remote branches: iproute-3.5.1 tracked master tracked net-next tracked net-next-3.11 tracked net-next-for-3.13 tracked Local branch configured for 'git pull': master merges with remote master Local ref configured for 'git push': master pushes to master (up to date) $ git rev-parse HEAD ce3436ca05ee2a9f4bf4c5b10eb25638865772cb $ git grep IFLA_VF_PORTS include/linux/if_link.h: IFLA_VF_PORTS, include/linux/if_link.h: * [IFLA_VF_PORTS] Those are declaration and comment only, no actual uses. > > We did the same thing to VFINFO list. > > I guess you refer to Bugzilla 889319. > > > All the users I could find already set the mask unconditionally for all device > > dumps. > > > > It's absolutely, positively, the only reasonable fix for this problem. > > The fix based on IFLA_EXT_MASK seems reasonable to me > (IFLA_EXT_MASK is in use in libvirt >= 1.0.3 and iproute2 >=3.4.0 based on a quick check). Ok. And conveniently for me it looks like the EXT_MASK fix is also backported into the RHEL libvirt. Alright, I'll whip up a patch series that makes the IFLA_VF_PORTS information conditional on the RTEXT_FILTER_VF flag.
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index d4ff417..5331db2 100644 --- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -1198,6 +1198,7 @@ static int rtnl_dump_ifinfo(struct sk_buff *skb, struct netlink_callback *cb) struct hlist_head *head; struct nlattr *tb[IFLA_MAX+1]; u32 ext_filter_mask = 0; + int err; s_h = cb->args[0]; s_idx = cb->args[1]; @@ -1218,11 +1219,16 @@ static int rtnl_dump_ifinfo(struct sk_buff *skb, struct netlink_callback *cb) hlist_for_each_entry_rcu(dev, head, index_hlist) { if (idx < s_idx) goto cont; - if (rtnl_fill_ifinfo(skb, dev, RTM_NEWLINK, - NETLINK_CB (cb->skb).portid, - cb->nlh->nlmsg_seq, 0, - NLM_F_MULTI, - ext_filter_mask) <= 0) + err = rtnl_fill_ifinfo(skb, dev, RTM_NEWLINK, + NETLINK_CB (cb->skb).portid, + cb->nlh->nlmsg_seq, 0, + NLM_F_MULTI, + ext_filter_mask); + /* If we ran out of room on the first message, + * we're in trouble */ + WARN_ON((err == -EMSGSIZE) && (skb->len == 0)); + + if (err <= 0) goto out;