diff mbox

RFC: rtnetlink problems with Cisco enic and VFs

Message ID 20140422141425.127dabd3c63482a6a655469e@redhat.com
State RFC, archived
Delegated to: David Miller
Headers show

Commit Message

David Gibson April 22, 2014, 4:14 a.m. UTC
I believe I've found a problem with netlink handling which can be
triggered on Cisco enic devices with a large number (30-40) of virtual
functions.  I believe this is the cause of a real customer problem
we've seen.

 * When requesting a list of interfaces with RTM_GETLINK, enic devices
   (and currently, _only_ enic devices) report IFLA_VF_PORTS
   information 

 * IFLA_VF_PORTS information has at least 90 bytes ber virtual function

 * Unlike IFLA_VFINFO_LIST, the ports information is always reported,
   regardless of the setting of the IFLA_EXT_MASK parameter

 * When IFLA_EXT_MASK is not specified, the reply packets have maximum
   size NLMSG_GOODSIZE (4k - overheads)

 * If there are enough virtual functions the IFLA_VF_PORTS information
   can cause a single interface's info to exceed NLMSG_GOODSIZE

 * The number of interfaces necessary to trigger this is reduced
   substantially if both IPv4 and IPv6 IFLA_AF_SPEC information is
   present (~972 bytes)

 * If the dump function returns -EMSGSIZE on the first message in a
   packet, netlink_dump() incorrectly assumes the listing is done,
   omitting information for that interface and any later ones.

 * This can cause getifaddrs(3) to go into an infinite loop

 * 'ip link' is not affected, because it supplies IFLA_EXT_MASK which
   triggers rtnl_calcit() to recalculate the required packet size to
   greater than NLMSG_GOODSIZE.


I can see several possible ways to fix this, but they all have
possible problems.  I'm hoping someone here can determine which, if
any, are real problems, and therefore what's the right approach to fix
this.

    A) Always calculate the RTM_NEWLINK packet size, rather than
       assuming NLMSG_GOODSIZE.
    Problem: The NLMSG_GOODSIZE limit was introduced to stop broken
             user tools with limited buffers encountering problems
             (see 115c9b81928360d769a76c632bae62d15206a94a). This
             approach might mean that such tools break again.

    B) Don't issue the VF port information when RTEXT_FILTER_VF isn't
       set
    Problem: Do tools using the port information already set this flag?

    C) Don't include the VF port info when listing interfaces, only
       when doing GETLINK on a specific interface.
    Problem: As (B), plus it's ugly.

    D) Detect the case when the first interface in a packet doesn't fit
       reallocate the packet buffer
    Problem: As (A), plus more complicated.

As an interim band-aid, here's a patch which adds a WARN_ON() in this
situation, which will at least make the problem easier to locate for
the next person to encounter it.

From: David Gibson <david@gibson.dropbear.id.au>
Subject: [PATCH] rtnetlink: Warn when interface's information won't fit
         in our packet

Without IFLA_EXT_MASK specified, the information reported for a single
interface in response to RTM_GETLINK is expected to fit within a netlink
packet of NLMSG_GOODSIZE.

If it doesn't, however, things will go badly wrong,  When listing all
interfaces, netlink_dump() will incorrectly treat -EMSGSIZE on the first
message in a packet as the end of the listing and omit information for
that interface and all subsequent ones.  This can cause getifaddrs(3) to
enter an infinite loop.

This patch won't fix the problem, but it will WARN_ON() making it
easier to track down what's going wrong.


Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 net/core/rtnetlink.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

 
 			nl_dump_check_consistent(cb, nlmsg_hdr(skb));

Comments

Ben Hutchings April 22, 2014, 6:03 p.m. UTC | #1
On Tue, 2014-04-22 at 14:14 +1000, David Gibson wrote:
> I believe I've found a problem with netlink handling which can be
> triggered on Cisco enic devices with a large number (30-40) of virtual
> functions.  I believe this is the cause of a real customer problem
> we've seen.
> 
>  * When requesting a list of interfaces with RTM_GETLINK, enic devices
>    (and currently, _only_ enic devices) report IFLA_VF_PORTS
>    information 
> 
>  * IFLA_VF_PORTS information has at least 90 bytes ber virtual function
> 
>  * Unlike IFLA_VFINFO_LIST, the ports information is always reported,
>    regardless of the setting of the IFLA_EXT_MASK parameter
[...]

So I think you should make reporting of IFLA_VF_PORTS dependent on the
same flag as IFLA_VFINFO_LIST.

Ben.
David Miller April 22, 2014, 6:12 p.m. UTC | #2
From: Ben Hutchings <ben@decadent.org.uk>
Date: Tue, 22 Apr 2014 19:03:19 +0100

> On Tue, 2014-04-22 at 14:14 +1000, David Gibson wrote:
>> I believe I've found a problem with netlink handling which can be
>> triggered on Cisco enic devices with a large number (30-40) of virtual
>> functions.  I believe this is the cause of a real customer problem
>> we've seen.
>> 
>>  * When requesting a list of interfaces with RTM_GETLINK, enic devices
>>    (and currently, _only_ enic devices) report IFLA_VF_PORTS
>>    information 
>> 
>>  * IFLA_VF_PORTS information has at least 90 bytes ber virtual function
>> 
>>  * Unlike IFLA_VFINFO_LIST, the ports information is always reported,
>>    regardless of the setting of the IFLA_EXT_MASK parameter
> [...]
> 
> So I think you should make reporting of IFLA_VF_PORTS dependent on the
> same flag as IFLA_VFINFO_LIST.

I think that's what we'll have to do.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christian Benvenuti (benve) April 22, 2014, 7:14 p.m. UTC | #3
David,

> -----Original Message-----
> From: David Gibson [mailto:dgibson@redhat.com]
> Sent: Monday, April 21, 2014 9:14 PM
> To: netdev@vger.kernel.org
> Cc: Christian Benvenuti (benve); Sujith Sankar (ssujith); Govindarajulu
> Varadarajan; Neel Patel (neepatel); Nishank Trivedi
> Subject: RFC: rtnetlink problems with Cisco enic and VFs
> 
> I believe I've found a problem with netlink handling which can be triggered
> on Cisco enic devices with a large number (30-40) of virtual functions.  I
> believe this is the cause of a real customer problem we've seen.
> 
>  * When requesting a list of interfaces with RTM_GETLINK, enic devices
>    (and currently, _only_ enic devices) report IFLA_VF_PORTS
>    information

Is the fact that Enic is the only driver implementing ndo_get_vf_port [1]
the root cause of the problem and the reason why this happens only with Enic?

/Chris

[1]
This is what makes rtnl_port_size to account for vf_port_size*dev_num_vf(...)
and rtnl_port_fill to add IFLA_VF_PORTS.
 
>  * IFLA_VF_PORTS information has at least 90 bytes ber virtual function
> 
>  * Unlike IFLA_VFINFO_LIST, the ports information is always reported,
>    regardless of the setting of the IFLA_EXT_MASK parameter
> 
>  * When IFLA_EXT_MASK is not specified, the reply packets have maximum
>    size NLMSG_GOODSIZE (4k - overheads)
> 
>  * If there are enough virtual functions the IFLA_VF_PORTS information
>    can cause a single interface's info to exceed NLMSG_GOODSIZE
> 
>  * The number of interfaces necessary to trigger this is reduced
>    substantially if both IPv4 and IPv6 IFLA_AF_SPEC information is
>    present (~972 bytes)
> 
>  * If the dump function returns -EMSGSIZE on the first message in a
>    packet, netlink_dump() incorrectly assumes the listing is done,
>    omitting information for that interface and any later ones.
> 
>  * This can cause getifaddrs(3) to go into an infinite loop
> 
>  * 'ip link' is not affected, because it supplies IFLA_EXT_MASK which
>    triggers rtnl_calcit() to recalculate the required packet size to
>    greater than NLMSG_GOODSIZE.
> 
> 
> I can see several possible ways to fix this, but they all have possible
> problems.  I'm hoping someone here can determine which, if any, are real
> problems, and therefore what's the right approach to fix this.
> 
>     A) Always calculate the RTM_NEWLINK packet size, rather than
>        assuming NLMSG_GOODSIZE.
>     Problem: The NLMSG_GOODSIZE limit was introduced to stop broken
>              user tools with limited buffers encountering problems
>              (see 115c9b81928360d769a76c632bae62d15206a94a). This
>              approach might mean that such tools break again.
> 
>     B) Don't issue the VF port information when RTEXT_FILTER_VF isn't
>        set
>     Problem: Do tools using the port information already set this flag?
> 
>     C) Don't include the VF port info when listing interfaces, only
>        when doing GETLINK on a specific interface.
>     Problem: As (B), plus it's ugly.
> 
>     D) Detect the case when the first interface in a packet doesn't fit
>        reallocate the packet buffer
>     Problem: As (A), plus more complicated.
> 
> As an interim band-aid, here's a patch which adds a WARN_ON() in this
> situation, which will at least make the problem easier to locate for the next
> person to encounter it.
> 
> From: David Gibson <david@gibson.dropbear.id.au>
> Subject: [PATCH] rtnetlink: Warn when interface's information won't fit
>          in our packet
> 
> Without IFLA_EXT_MASK specified, the information reported for a single
> interface in response to RTM_GETLINK is expected to fit within a netlink
> packet of NLMSG_GOODSIZE.
> 
> If it doesn't, however, things will go badly wrong,  When listing all
> interfaces, netlink_dump() will incorrectly treat -EMSGSIZE on the first
> message in a packet as the end of the listing and omit information for that
> interface and all subsequent ones.  This can cause getifaddrs(3) to enter an
> infinite loop.
> 
> This patch won't fix the problem, but it will WARN_ON() making it easier to
> track down what's going wrong.
> 
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>  net/core/rtnetlink.c | 16 +++++++++++-----
>  1 file changed, 11 insertions(+), 5 deletions(-)
> 
> diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index
> d4ff417..5331db2 100644
> --- a/net/core/rtnetlink.c
> +++ b/net/core/rtnetlink.c
> @@ -1198,6 +1198,7 @@ static int rtnl_dump_ifinfo(struct sk_buff *skb,
> struct netlink_callback *cb) struct hlist_head *head;
>  	struct nlattr *tb[IFLA_MAX+1];
>  	u32 ext_filter_mask = 0;
> +	int err;
> 
>  	s_h = cb->args[0];
>  	s_idx = cb->args[1];
> @@ -1218,11 +1219,16 @@ static int rtnl_dump_ifinfo(struct sk_buff *skb,
> struct netlink_callback *cb) hlist_for_each_entry_rcu(dev, head,
> index_hlist) { if (idx < s_idx)
>  				goto cont;
> -			if (rtnl_fill_ifinfo(skb, dev, RTM_NEWLINK,
> -					     NETLINK_CB
> (cb->skb).portid,
> -					     cb->nlh->nlmsg_seq, 0,
> -					     NLM_F_MULTI,
> -					     ext_filter_mask) <= 0)
> +			err = rtnl_fill_ifinfo(skb, dev, RTM_NEWLINK,
> +					       NETLINK_CB
> (cb->skb).portid,
> +					       cb->nlh->nlmsg_seq, 0,
> +					       NLM_F_MULTI,
> +					       ext_filter_mask);
> +			/* If we ran out of room on the first message,
> +			 * we're in trouble */
> +			WARN_ON((err == -EMSGSIZE) && (skb->len == 0));
> +
> +			if (err <= 0)
>  				goto out;
> 
>  			nl_dump_check_consistent(cb, nlmsg_hdr(skb));
> --
> 1.9.0
> 
> 
> 
> --
> David Gibson <dgibson@redhat.com>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Gibson April 22, 2014, 11:24 p.m. UTC | #4
On Tue, 22 Apr 2014 19:14:04 +0000
"Christian Benvenuti (benve)" <benve@cisco.com> wrote:

> David,
> 
> > -----Original Message-----
> > From: David Gibson [mailto:dgibson@redhat.com]
> > Sent: Monday, April 21, 2014 9:14 PM
> > To: netdev@vger.kernel.org
> > Cc: Christian Benvenuti (benve); Sujith Sankar (ssujith); Govindarajulu
> > Varadarajan; Neel Patel (neepatel); Nishank Trivedi
> > Subject: RFC: rtnetlink problems with Cisco enic and VFs
> > 
> > I believe I've found a problem with netlink handling which can be triggered
> > on Cisco enic devices with a large number (30-40) of virtual functions.  I
> > believe this is the cause of a real customer problem we've seen.
> > 
> >  * When requesting a list of interfaces with RTM_GETLINK, enic devices
> >    (and currently, _only_ enic devices) report IFLA_VF_PORTS
> >    information
> 
> Is the fact that Enic is the only driver implementing ndo_get_vf_port [1]
> the root cause of the problem and the reason why this happens only with Enic?

Yes, or at least it's one of the factors.

> 
> /Chris
> 
> [1]
> This is what makes rtnl_port_size to account for vf_port_size*dev_num_vf(...)
> and rtnl_port_fill to add IFLA_VF_PORTS.
David Gibson April 22, 2014, 11:26 p.m. UTC | #5
On Tue, 22 Apr 2014 14:12:00 -0400 (EDT)
David Miller <davem@davemloft.net> wrote:

> From: Ben Hutchings <ben@decadent.org.uk>
> Date: Tue, 22 Apr 2014 19:03:19 +0100
> 
> > On Tue, 2014-04-22 at 14:14 +1000, David Gibson wrote:
> >> I believe I've found a problem with netlink handling which can be
> >> triggered on Cisco enic devices with a large number (30-40) of virtual
> >> functions.  I believe this is the cause of a real customer problem
> >> we've seen.
> >> 
> >>  * When requesting a list of interfaces with RTM_GETLINK, enic devices
> >>    (and currently, _only_ enic devices) report IFLA_VF_PORTS
> >>    information 
> >> 
> >>  * IFLA_VF_PORTS information has at least 90 bytes ber virtual function
> >> 
> >>  * Unlike IFLA_VFINFO_LIST, the ports information is always reported,
> >>    regardless of the setting of the IFLA_EXT_MASK parameter
> > [...]
> > 
> > So I think you should make reporting of IFLA_VF_PORTS dependent on the
> > same flag as IFLA_VFINFO_LIST.
> 
> I think that's what we'll have to do.

Ok, makes logical sense.

But does anyone know what tools make use of the IFLA_VF_PORTS
information?  Do they set the IFLA_EXT_MASK already?
Rose, Gregory V April 23, 2014, 12:04 a.m. UTC | #6
On Wed, 23 Apr 2014 09:26:06 +1000
David Gibson <dgibson@redhat.com> wrote:

> On Tue, 22 Apr 2014 14:12:00 -0400 (EDT)
> David Miller <davem@davemloft.net> wrote:
> 
> > From: Ben Hutchings <ben@decadent.org.uk>
> > Date: Tue, 22 Apr 2014 19:03:19 +0100
> > 
> > > On Tue, 2014-04-22 at 14:14 +1000, David Gibson wrote:
> > >> I believe I've found a problem with netlink handling which can be
> > >> triggered on Cisco enic devices with a large number (30-40) of
> > >> virtual functions.  I believe this is the cause of a real
> > >> customer problem we've seen.
> > >> 
> > >>  * When requesting a list of interfaces with RTM_GETLINK, enic
> > >> devices (and currently, _only_ enic devices) report IFLA_VF_PORTS
> > >>    information 
> > >> 
> > >>  * IFLA_VF_PORTS information has at least 90 bytes ber virtual
> > >> function
> > >> 
> > >>  * Unlike IFLA_VFINFO_LIST, the ports information is always
> > >> reported, regardless of the setting of the IFLA_EXT_MASK
> > >> parameter
> > > [...]
> > > 
> > > So I think you should make reporting of IFLA_VF_PORTS dependent
> > > on the same flag as IFLA_VFINFO_LIST.
> > 
> > I think that's what we'll have to do.
> 
> Ok, makes logical sense.
> 
> But does anyone know what tools make use of the IFLA_VF_PORTS
> information?  Do they set the IFLA_EXT_MASK already?
> 

So far as I know only the IP route tool 'ip link' sets that.  In fact,
that's the reason I had to add it some number of years and months ago
was because there were tools that didn't expect to get all the
additional VF info and those tools were getting borked by all the
additional goo sent up for VFs.

Beyond that who knows what anyone's been up to with other tools in
other places?

- Greg
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller April 23, 2014, 12:59 a.m. UTC | #7
From: David Gibson <dgibson@redhat.com>
Date: Wed, 23 Apr 2014 09:26:06 +1000

> On Tue, 22 Apr 2014 14:12:00 -0400 (EDT)
> David Miller <davem@davemloft.net> wrote:
> 
>> From: Ben Hutchings <ben@decadent.org.uk>
>> Date: Tue, 22 Apr 2014 19:03:19 +0100
>> 
>> > On Tue, 2014-04-22 at 14:14 +1000, David Gibson wrote:
>> >> I believe I've found a problem with netlink handling which can be
>> >> triggered on Cisco enic devices with a large number (30-40) of virtual
>> >> functions.  I believe this is the cause of a real customer problem
>> >> we've seen.
>> >> 
>> >>  * When requesting a list of interfaces with RTM_GETLINK, enic devices
>> >>    (and currently, _only_ enic devices) report IFLA_VF_PORTS
>> >>    information 
>> >> 
>> >>  * IFLA_VF_PORTS information has at least 90 bytes ber virtual function
>> >> 
>> >>  * Unlike IFLA_VFINFO_LIST, the ports information is always reported,
>> >>    regardless of the setting of the IFLA_EXT_MASK parameter
>> > [...]
>> > 
>> > So I think you should make reporting of IFLA_VF_PORTS dependent on the
>> > same flag as IFLA_VFINFO_LIST.
>> 
>> I think that's what we'll have to do.
> 
> Ok, makes logical sense.
> 
> But does anyone know what tools make use of the IFLA_VF_PORTS
> information?  Do they set the IFLA_EXT_MASK already?

iproute2 makes use of IFLA_VF_PORTS and sets IFLA_EXT_MASK
unconditionally in rtnl_wilddump_request().

libvirt also makes use of IFLA_VF_PORTS, and also unconditionally
sets IFLA_EXT_MASK in virNetDevLinkDump.

I only did a quick search and can't find any more users.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Gibson April 23, 2014, 1:12 a.m. UTC | #8
On Tue, 22 Apr 2014 17:04:38 -0700
Greg Rose <gregory.v.rose@intel.com> wrote:

> On Wed, 23 Apr 2014 09:26:06 +1000
> David Gibson <dgibson@redhat.com> wrote:
> 
> > On Tue, 22 Apr 2014 14:12:00 -0400 (EDT)
> > David Miller <davem@davemloft.net> wrote:
> > 
> > > From: Ben Hutchings <ben@decadent.org.uk>
> > > Date: Tue, 22 Apr 2014 19:03:19 +0100
> > > 
> > > > On Tue, 2014-04-22 at 14:14 +1000, David Gibson wrote:
> > > >> I believe I've found a problem with netlink handling which can be
> > > >> triggered on Cisco enic devices with a large number (30-40) of
> > > >> virtual functions.  I believe this is the cause of a real
> > > >> customer problem we've seen.
> > > >> 
> > > >>  * When requesting a list of interfaces with RTM_GETLINK, enic
> > > >> devices (and currently, _only_ enic devices) report IFLA_VF_PORTS
> > > >>    information 
> > > >> 
> > > >>  * IFLA_VF_PORTS information has at least 90 bytes ber virtual
> > > >> function
> > > >> 
> > > >>  * Unlike IFLA_VFINFO_LIST, the ports information is always
> > > >> reported, regardless of the setting of the IFLA_EXT_MASK
> > > >> parameter
> > > > [...]
> > > > 
> > > > So I think you should make reporting of IFLA_VF_PORTS dependent
> > > > on the same flag as IFLA_VFINFO_LIST.
> > > 
> > > I think that's what we'll have to do.
> > 
> > Ok, makes logical sense.
> > 
> > But does anyone know what tools make use of the IFLA_VF_PORTS
> > information?  Do they set the IFLA_EXT_MASK already?
> > 
> 
> So far as I know only the IP route tool 'ip link' sets that.  In fact,
> that's the reason I had to add it some number of years and months ago
> was because there were tools that didn't expect to get all the
> additional VF info and those tools were getting borked by all the
> additional goo sent up for VFs.
> 
> Beyond that who knows what anyone's been up to with other tools in
> other places?

And therein lies the problem.  I don't even know what the IFLA_VF_PORTS
info is for, but presumably something uses it.  If they stop receiving
it, they can be expected to break horribly.
David Miller April 23, 2014, 1:16 a.m. UTC | #9
From: David Gibson <dgibson@redhat.com>
Date: Wed, 23 Apr 2014 11:12:03 +1000

> And therein lies the problem.  I don't even know what the IFLA_VF_PORTS
> info is for, but presumably something uses it.  If they stop receiving
> it, they can be expected to break horribly.

We did the same thing to VFINFO list.

All the users I could find already set the mask unconditionally
for all device dumps.

It's absolutely, positively, the only reasonable fix for this
problem.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christian Benvenuti (benve) April 23, 2014, 2:33 a.m. UTC | #10
> -----Original Message-----
> From: netdev-owner@vger.kernel.org [mailto:netdev-
> owner@vger.kernel.org] On Behalf Of David Miller
> Sent: Tuesday, April 22, 2014 6:17 PM
> To: dgibson@redhat.com
> Cc: gregory.v.rose@intel.com; ben@decadent.org.uk;
> netdev@vger.kernel.org; Christian Benvenuti (benve); Sujith Sankar
> (ssujith); govindarajulu90@gmail.com; Neel Patel (neepatel);
> nistrive@cisco.com
> Subject: Re: RFC: rtnetlink problems with Cisco enic and VFs
> 
> From: David Gibson <dgibson@redhat.com>
> Date: Wed, 23 Apr 2014 11:12:03 +1000
> 
> > And therein lies the problem.  I don't even know what the
> > IFLA_VF_PORTS info is for, but presumably something uses it.  If they
> > stop receiving it, they can be expected to break horribly.

In the case of Enic, libvirt uses IFLA_VF_PORTS in the context of the port profile
(see virtualport section and 802.1Qbh in the libvirt network xml documentation).

As Miller said, libvirt and iproute2 are the two known users (with libvirt being the main one)
but you never know what else may be using it.

> We did the same thing to VFINFO list.

I guess you refer to Bugzilla 889319.

> All the users I could find already set the mask unconditionally for all device
> dumps.
> 
> It's absolutely, positively, the only reasonable fix for this problem.

The fix based on IFLA_EXT_MASK seems reasonable to me
(IFLA_EXT_MASK is in use in libvirt >= 1.0.3 and iproute2 >=3.4.0 based on a quick check).

/Chris

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Gibson April 23, 2014, 4:15 a.m. UTC | #11
On Wed, 23 Apr 2014 02:33:06 +0000
"Christian Benvenuti (benve)" <benve@cisco.com> wrote:

> > -----Original Message-----
> > From: netdev-owner@vger.kernel.org [mailto:netdev-
> > owner@vger.kernel.org] On Behalf Of David Miller
> > Sent: Tuesday, April 22, 2014 6:17 PM
> > To: dgibson@redhat.com
> > Cc: gregory.v.rose@intel.com; ben@decadent.org.uk;
> > netdev@vger.kernel.org; Christian Benvenuti (benve); Sujith Sankar
> > (ssujith); govindarajulu90@gmail.com; Neel Patel (neepatel);
> > nistrive@cisco.com
> > Subject: Re: RFC: rtnetlink problems with Cisco enic and VFs
> > 
> > From: David Gibson <dgibson@redhat.com>
> > Date: Wed, 23 Apr 2014 11:12:03 +1000
> > 
> > > And therein lies the problem.  I don't even know what the
> > > IFLA_VF_PORTS info is for, but presumably something uses it.  If they
> > > stop receiving it, they can be expected to break horribly.
> 
> In the case of Enic, libvirt uses IFLA_VF_PORTS in the context of the port profile
> (see virtualport section and 802.1Qbh in the libvirt network xml documentation).
> 
> As Miller said, libvirt and iproute2 are the two known users (with libvirt being the main one)
> but you never know what else may be using it.

Ah, yes, I see it in libvirt.

I don't think it's used in iproute2 though, at least not in the master
branch.

$ git remote show origin
* remote origin
  Fetch URL:
git://git.kernel.org/pub/scm/linux/kernel/git/shemminger/iproute2.git
Push  URL:
git://git.kernel.org/pub/scm/linux/kernel/git/shemminger/iproute2.git
HEAD branch: master Remote branches: iproute-3.5.1     tracked
    master            tracked
    net-next          tracked
    net-next-3.11     tracked
    net-next-for-3.13 tracked
  Local branch configured for 'git pull':
    master merges with remote master
  Local ref configured for 'git push':
    master pushes to master (up to date)
$ git rev-parse HEAD
ce3436ca05ee2a9f4bf4c5b10eb25638865772cb
$ git grep IFLA_VF_PORTS
include/linux/if_link.h:	IFLA_VF_PORTS,
include/linux/if_link.h: *		[IFLA_VF_PORTS]

Those are declaration and comment only, no actual uses.

> > We did the same thing to VFINFO list.
> 
> I guess you refer to Bugzilla 889319.
> 
> > All the users I could find already set the mask unconditionally for all device
> > dumps.
> > 
> > It's absolutely, positively, the only reasonable fix for this problem.
> 
> The fix based on IFLA_EXT_MASK seems reasonable to me
> (IFLA_EXT_MASK is in use in libvirt >= 1.0.3 and iproute2 >=3.4.0 based on a quick check).

Ok.  And conveniently for me it looks like the EXT_MASK fix is also
backported into the RHEL libvirt.

Alright, I'll whip up a patch series that makes the IFLA_VF_PORTS
information conditional on the RTEXT_FILTER_VF flag.
diff mbox

Patch

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index d4ff417..5331db2 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1198,6 +1198,7 @@  static int rtnl_dump_ifinfo(struct sk_buff *skb,
struct netlink_callback *cb) struct hlist_head *head;
 	struct nlattr *tb[IFLA_MAX+1];
 	u32 ext_filter_mask = 0;
+	int err;
 
 	s_h = cb->args[0];
 	s_idx = cb->args[1];
@@ -1218,11 +1219,16 @@  static int rtnl_dump_ifinfo(struct sk_buff
*skb, struct netlink_callback *cb) hlist_for_each_entry_rcu(dev, head,
index_hlist) { if (idx < s_idx)
 				goto cont;
-			if (rtnl_fill_ifinfo(skb, dev, RTM_NEWLINK,
-					     NETLINK_CB
(cb->skb).portid,
-					     cb->nlh->nlmsg_seq, 0,
-					     NLM_F_MULTI,
-					     ext_filter_mask) <= 0)
+			err = rtnl_fill_ifinfo(skb, dev, RTM_NEWLINK,
+					       NETLINK_CB
(cb->skb).portid,
+					       cb->nlh->nlmsg_seq, 0,
+					       NLM_F_MULTI,
+					       ext_filter_mask);
+			/* If we ran out of room on the first message,
+			 * we're in trouble */
+			WARN_ON((err == -EMSGSIZE) && (skb->len == 0));
+
+			if (err <= 0)
 				goto out;