diff mbox

[RFC,v2,1/4] bridge: enable interfaces to opt out from becoming the root bridge

Message ID 1392433180-16052-2-git-send-email-mcgrof@do-not-panic.com
State RFC, archived
Delegated to: David Miller
Headers show

Commit Message

Luis R. Rodriguez Feb. 15, 2014, 2:59 a.m. UTC
From: "Luis R. Rodriguez" <mcgrof@suse.com>

It doesn't make sense for some interfaces to become a root bridge
at any point in time. One example is virtual backend interfaces
which rely on other entities on the bridge for actual physical
connectivity. They only provide virtual access.

Device drivers that know they should never become part of the
root bridge have been using a trick of setting their MAC address
to a high broadcast MAC address such as FE:FF:FF:FF:FF:FF. Instead
of using these hacks lets the interfaces annotate its intent and
generalizes a solution for multiple drivers, while letting the
drivers use a random MAC address or one prefixed with a proper OUI.
This sort of hack is used by both qemu and xen for their backend
interfaces.

Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: bridge@lists.linux-foundation.org
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
 include/uapi/linux/if.h | 1 +
 net/bridge/br_if.c      | 2 ++
 net/bridge/br_private.h | 1 +
 net/bridge/br_stp_if.c  | 2 ++
 4 files changed, 6 insertions(+)

Comments

Ben Hutchings Feb. 16, 2014, 6:56 p.m. UTC | #1
On Fri, 2014-02-14 at 18:59 -0800, Luis R. Rodriguez wrote:
> From: "Luis R. Rodriguez" <mcgrof@suse.com>
> 
> It doesn't make sense for some interfaces to become a root bridge

I think you mean 'root port'.

> at any point in time. One example is virtual backend interfaces
> which rely on other entities on the bridge for actual physical
> connectivity. They only provide virtual access.
> 
> Device drivers that know they should never become part of the
> root bridge have been using a trick of setting their MAC address
> to a high broadcast MAC address such as FE:FF:FF:FF:FF:FF. Instead
> of using these hacks lets the interfaces annotate its intent and
> generalizes a solution for multiple drivers, while letting the
> drivers use a random MAC address or one prefixed with a proper OUI.
> This sort of hack is used by both qemu and xen for their backend
> interfaces.
> 
> Cc: Stephen Hemminger <stephen@networkplumber.org>
> Cc: bridge@lists.linux-foundation.org
> Cc: netdev@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
> ---
>  include/uapi/linux/if.h | 1 +
>  net/bridge/br_if.c      | 2 ++
>  net/bridge/br_private.h | 1 +
>  net/bridge/br_stp_if.c  | 2 ++
>  4 files changed, 6 insertions(+)
> 
> diff --git a/include/uapi/linux/if.h b/include/uapi/linux/if.h
> index d758163..8d10382 100644
> --- a/include/uapi/linux/if.h
> +++ b/include/uapi/linux/if.h
> @@ -84,6 +84,7 @@
>  #define IFF_LIVE_ADDR_CHANGE 0x100000	/* device supports hardware address
>  					 * change when it's running */
>  #define IFF_MACVLAN 0x200000		/* Macvlan device */
> +#define IFF_BRIDGE_NON_ROOT 0x400000    /* Don't consider for root bridge */
[...]

Does it really make sense to add a flag that says exactly which special
behaviour you want, or would it be better to define the flag as a
passive property, which other drivers/protocols then use as a condition
for special behaviour?

The fact that you also define the IFF_BRIDGE_SKIP_IP flag, and set it on
exactly the same devices, makes me think that they should actually be a
single flag.  I don't know how that flag should be named or described,
though.

Ben.
Stephen Hemminger Feb. 16, 2014, 6:57 p.m. UTC | #2
On Fri, 14 Feb 2014 18:59:37 -0800
"Luis R. Rodriguez" <mcgrof@do-not-panic.com> wrote:

> From: "Luis R. Rodriguez" <mcgrof@suse.com>
> 
> It doesn't make sense for some interfaces to become a root bridge
> at any point in time. One example is virtual backend interfaces
> which rely on other entities on the bridge for actual physical
> connectivity. They only provide virtual access.
> 
> Device drivers that know they should never become part of the
> root bridge have been using a trick of setting their MAC address
> to a high broadcast MAC address such as FE:FF:FF:FF:FF:FF. Instead
> of using these hacks lets the interfaces annotate its intent and
> generalizes a solution for multiple drivers, while letting the
> drivers use a random MAC address or one prefixed with a proper OUI.
> This sort of hack is used by both qemu and xen for their backend
> interfaces.
> 
> Cc: Stephen Hemminger <stephen@networkplumber.org>
> Cc: bridge@lists.linux-foundation.org
> Cc: netdev@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>

This is already supported in a more standard way via the root
block flag.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Zoltan Kiss Feb. 17, 2014, 5:52 p.m. UTC | #3
On 15/02/14 02:59, Luis R. Rodriguez wrote:
> From: "Luis R. Rodriguez" <mcgrof@suse.com>
>
> It doesn't make sense for some interfaces to become a root bridge
> at any point in time. One example is virtual backend interfaces
> which rely on other entities on the bridge for actual physical
> connectivity. They only provide virtual access.

It is possible that a guest bridge together to VIF, either from the same 
Dom0 bridge or from different ones. In that case using STP on VIFs sound 
sensible to me.

Zoli
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Luis R. Rodriguez Feb. 18, 2014, 9:02 p.m. UTC | #4
On Sun, Feb 16, 2014 at 10:57 AM, Stephen Hemminger
<stephen@networkplumber.org> wrote:
> On Fri, 14 Feb 2014 18:59:37 -0800
> "Luis R. Rodriguez" <mcgrof@do-not-panic.com> wrote:
>
>> From: "Luis R. Rodriguez" <mcgrof@suse.com>
>>
>> It doesn't make sense for some interfaces to become a root bridge
>> at any point in time. One example is virtual backend interfaces
>> which rely on other entities on the bridge for actual physical
>> connectivity. They only provide virtual access.
>>
>> Device drivers that know they should never become part of the
>> root bridge have been using a trick of setting their MAC address
>> to a high broadcast MAC address such as FE:FF:FF:FF:FF:FF. Instead
>> of using these hacks lets the interfaces annotate its intent and
>> generalizes a solution for multiple drivers, while letting the
>> drivers use a random MAC address or one prefixed with a proper OUI.
>> This sort of hack is used by both qemu and xen for their backend
>> interfaces.
>>
>> Cc: Stephen Hemminger <stephen@networkplumber.org>
>> Cc: bridge@lists.linux-foundation.org
>> Cc: netdev@vger.kernel.org
>> Cc: linux-kernel@vger.kernel.org
>> Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
>
> This is already supported in a more standard way via the root
> block flag.

Great! For documentation purposes the root_block flag is a sysfs
attribute, added via 3.8 through commit 1007dd1a. The respective
interface flag is IFLA_BRPORT_PROTECT and can be set via the iproute2
bridge utility or through sysfs:

mcgrof@garbanzo ~/linux (git::master)$ find /sys/ -name root_block
/sys/devices/pci0000:00/0000:00:04.0/0000:02:00.0/net/eth0/brport/root_block
/sys/devices/vif-3-0/net/vif3.0/brport/root_block
/sys/devices/virtual/net/vif3.0-emu/brport/root_block

mcgrof@garbanzo ~/devel/iproute2 (git::master)$ cat
/sys/devices/vif-3-0/net/vif3.0/brport/root_block
0
mcgrof@garbanzo ~/devel/iproute2 (git::master)$ sudo bridge link set
dev vif3.0 root_block on
mcgrof@garbanzo ~/devel/iproute2 (git::master)$ cat
/sys/devices/vif-3-0/net/vif3.0/brport/root_block
1

So if we'd want to avoid using the MAC address hack alternative to
skip a root port userspace would need to be updated to simply set this
attribute after adding the device to the bridge. Based on Zoltan's
feedback there seems to be use cases to not enable this always for all
xen-netback interfaces though as such we can just punt this to
userspace for the topologies that require this.

The original motivation for this series was to avoid the IPv6
duplicate address incurred by the MAC address hack for avoiding the
root bridge. Given that Zoltan also noted a use case whereby IPv4 and
IPv6 addresses can be assigned to the backend interfaces we should be
able to avoid the duplicate address situation for IPv6 by using a
proper random MAC address *once* userspace has been updated also to
use IFLA_BRPORT_PROTECT. New userspace can't and won't need to set
this flag for older kernels (older than 3.8) as root_block is not
implemented on those kernels and the MAC address hack would still be
used there. This strategy however does put a requirement on new
kernels to use new userspace as otherwise the MAC address workaround
would not be in place and root_block would not take effect.

  Luis
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ian Campbell Feb. 19, 2014, 9:52 a.m. UTC | #5
On Tue, 2014-02-18 at 13:02 -0800, Luis R. Rodriguez wrote:
> On Sun, Feb 16, 2014 at 10:57 AM, Stephen Hemminger
> <stephen@networkplumber.org> wrote:
> > On Fri, 14 Feb 2014 18:59:37 -0800
> > "Luis R. Rodriguez" <mcgrof@do-not-panic.com> wrote:
> >
> >> From: "Luis R. Rodriguez" <mcgrof@suse.com>
> >>
> >> It doesn't make sense for some interfaces to become a root bridge
> >> at any point in time. One example is virtual backend interfaces
> >> which rely on other entities on the bridge for actual physical
> >> connectivity. They only provide virtual access.
> >>
> >> Device drivers that know they should never become part of the
> >> root bridge have been using a trick of setting their MAC address
> >> to a high broadcast MAC address such as FE:FF:FF:FF:FF:FF. Instead
> >> of using these hacks lets the interfaces annotate its intent and
> >> generalizes a solution for multiple drivers, while letting the
> >> drivers use a random MAC address or one prefixed with a proper OUI.
> >> This sort of hack is used by both qemu and xen for their backend
> >> interfaces.
> >>
> >> Cc: Stephen Hemminger <stephen@networkplumber.org>
> >> Cc: bridge@lists.linux-foundation.org
> >> Cc: netdev@vger.kernel.org
> >> Cc: linux-kernel@vger.kernel.org
> >> Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
> >
> > This is already supported in a more standard way via the root
> > block flag.
> 
> Great! For documentation purposes the root_block flag is a sysfs
> attribute, added via 3.8 through commit 1007dd1a. The respective
> interface flag is IFLA_BRPORT_PROTECT and can be set via the iproute2
> bridge utility or through sysfs:
> 
> mcgrof@garbanzo ~/linux (git::master)$ find /sys/ -name root_block
> /sys/devices/pci0000:00/0000:00:04.0/0000:02:00.0/net/eth0/brport/root_block
> /sys/devices/vif-3-0/net/vif3.0/brport/root_block
> /sys/devices/virtual/net/vif3.0-emu/brport/root_block
> 
> mcgrof@garbanzo ~/devel/iproute2 (git::master)$ cat
> /sys/devices/vif-3-0/net/vif3.0/brport/root_block
> 0
> mcgrof@garbanzo ~/devel/iproute2 (git::master)$ sudo bridge link set
> dev vif3.0 root_block on
> mcgrof@garbanzo ~/devel/iproute2 (git::master)$ cat
> /sys/devices/vif-3-0/net/vif3.0/brport/root_block
> 1
> 
> So if we'd want to avoid using the MAC address hack alternative to
> skip a root port userspace would need to be updated to simply set this
> attribute after adding the device to the bridge. Based on Zoltan's
> feedback there seems to be use cases to not enable this always for all
> xen-netback interfaces though as such we can just punt this to
> userspace for the topologies that require this.
> 
> The original motivation for this series was to avoid the IPv6
> duplicate address incurred by the MAC address hack for avoiding the
> root bridge. Given that Zoltan also noted a use case whereby IPv4 and
> IPv6 addresses can be assigned to the backend interfaces we should be
> able to avoid the duplicate address situation for IPv6 by using a
> proper random MAC address *once* userspace has been updated also to
> use IFLA_BRPORT_PROTECT. New userspace can't and won't need to set
> this flag for older kernels (older than 3.8) as root_block is not
> implemented on those kernels and the MAC address hack would still be
> used there. This strategy however does put a requirement on new
> kernels to use new userspace as otherwise the MAC address workaround
> would not be in place and root_block would not take effect.

Can't we arrange things in the Xen hotplug scripts such that if the
root_block stuff isn't available/doesn't work we fallback to the
existing fe:ff:ff:ff:ff usage?

That would avoid concerns about forward/backwards compat I think. It
wouldn't solve the issue you are targeting on old systems, but it also
doesn't regress them any further.

Ian.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Zoltan Kiss Feb. 19, 2014, 2:35 p.m. UTC | #6
On 19/02/14 09:52, Ian Campbell wrote:
> On Tue, 2014-02-18 at 13:02 -0800, Luis R. Rodriguez wrote:
>> On Sun, Feb 16, 2014 at 10:57 AM, Stephen Hemminger
>> <stephen@networkplumber.org> wrote:
>>> On Fri, 14 Feb 2014 18:59:37 -0800
>>> "Luis R. Rodriguez" <mcgrof@do-not-panic.com> wrote:
>>>
>>>> From: "Luis R. Rodriguez" <mcgrof@suse.com>
>>>>
>>>> It doesn't make sense for some interfaces to become a root bridge
>>>> at any point in time. One example is virtual backend interfaces
>>>> which rely on other entities on the bridge for actual physical
>>>> connectivity. They only provide virtual access.
>>>>
>>>> Device drivers that know they should never become part of the
>>>> root bridge have been using a trick of setting their MAC address
>>>> to a high broadcast MAC address such as FE:FF:FF:FF:FF:FF. Instead
>>>> of using these hacks lets the interfaces annotate its intent and
>>>> generalizes a solution for multiple drivers, while letting the
>>>> drivers use a random MAC address or one prefixed with a proper OUI.
>>>> This sort of hack is used by both qemu and xen for their backend
>>>> interfaces.
>>>>
>>>> Cc: Stephen Hemminger <stephen@networkplumber.org>
>>>> Cc: bridge@lists.linux-foundation.org
>>>> Cc: netdev@vger.kernel.org
>>>> Cc: linux-kernel@vger.kernel.org
>>>> Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
>>>
>>> This is already supported in a more standard way via the root
>>> block flag.
>>
>> Great! For documentation purposes the root_block flag is a sysfs
>> attribute, added via 3.8 through commit 1007dd1a. The respective
>> interface flag is IFLA_BRPORT_PROTECT and can be set via the iproute2
>> bridge utility or through sysfs:
>>
>> mcgrof@garbanzo ~/linux (git::master)$ find /sys/ -name root_block
>> /sys/devices/pci0000:00/0000:00:04.0/0000:02:00.0/net/eth0/brport/root_block
>> /sys/devices/vif-3-0/net/vif3.0/brport/root_block
>> /sys/devices/virtual/net/vif3.0-emu/brport/root_block
>>
>> mcgrof@garbanzo ~/devel/iproute2 (git::master)$ cat
>> /sys/devices/vif-3-0/net/vif3.0/brport/root_block
>> 0
>> mcgrof@garbanzo ~/devel/iproute2 (git::master)$ sudo bridge link set
>> dev vif3.0 root_block on
>> mcgrof@garbanzo ~/devel/iproute2 (git::master)$ cat
>> /sys/devices/vif-3-0/net/vif3.0/brport/root_block
>> 1
>>
>> So if we'd want to avoid using the MAC address hack alternative to
>> skip a root port userspace would need to be updated to simply set this
>> attribute after adding the device to the bridge. Based on Zoltan's
>> feedback there seems to be use cases to not enable this always for all
>> xen-netback interfaces though as such we can just punt this to
>> userspace for the topologies that require this.
>>
>> The original motivation for this series was to avoid the IPv6
>> duplicate address incurred by the MAC address hack for avoiding the
>> root bridge. Given that Zoltan also noted a use case whereby IPv4 and
>> IPv6 addresses can be assigned to the backend interfaces we should be
>> able to avoid the duplicate address situation for IPv6 by using a
>> proper random MAC address *once* userspace has been updated also to
>> use IFLA_BRPORT_PROTECT. New userspace can't and won't need to set
>> this flag for older kernels (older than 3.8) as root_block is not
>> implemented on those kernels and the MAC address hack would still be
>> used there. This strategy however does put a requirement on new
>> kernels to use new userspace as otherwise the MAC address workaround
>> would not be in place and root_block would not take effect.
>
> Can't we arrange things in the Xen hotplug scripts such that if the
> root_block stuff isn't available/doesn't work we fallback to the
> existing fe:ff:ff:ff:ff usage?
>
> That would avoid concerns about forward/backwards compat I think. It
> wouldn't solve the issue you are targeting on old systems, but it also
> doesn't regress them any further.

I agree, I think this problem could be better handled from userspace: if 
it can set root_block then change the default MAC to a random one, if it 
can't, then stay with the default one. Or if someone doesn't care about 
STP but DAD is still important, userspace can have a force_random_mac 
option somewhere to change to a random MAC regardless of root_block 
presence.

Zoli
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Luis R. Rodriguez Feb. 19, 2014, 4:45 p.m. UTC | #7
On Mon, Feb 17, 2014 at 9:52 AM, Zoltan Kiss <zoltan.kiss@citrix.com> wrote:
> On 15/02/14 02:59, Luis R. Rodriguez wrote:
>>
>> From: "Luis R. Rodriguez" <mcgrof@suse.com>
>>
>> It doesn't make sense for some interfaces to become a root bridge
>> at any point in time. One example is virtual backend interfaces
>> which rely on other entities on the bridge for actual physical
>> connectivity. They only provide virtual access.
>
> It is possible that a guest bridge together to VIF, either from the same
> Dom0 bridge or from different ones. In that case using STP on VIFs sound
> sensible to me.

You seem to describe a case whereby it can make sense for xen-netback
interfaces to end up becoming the root port of a bridge. Can you
elaborate a little more on that as it was unclear the use case.

Additionally if such cases exist then under the current upstream
implementation one would simply need to change the MAC address in
order to enable a vif to become the root port.  Stephen noted there is
a way to avoid nominating an interface for a root port through the
root block flag. We should use that instead of the MAC address hacks.
Let's keep in mind that part of the motivation for this series is to
avoid a duplicate IPv6 address left in place by use cases whereby the
MAC address of the backend vif was left static. The use case your are
explaining likely describes the more prevalent use case where address
conflicts can occur, perhaps when administrators for got to change the
backend MAC address. If we embrace a random MAC address we'd avoid
that issue, and but we'd need to update userspace to use the root
block on topologies where desired.

  Luis
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Luis R. Rodriguez Feb. 19, 2014, 5:02 p.m. UTC | #8
On Wed, Feb 19, 2014 at 6:35 AM, Zoltan Kiss <zoltan.kiss@citrix.com> wrote:
> On 19/02/14 09:52, Ian Campbell wrote:
>> Can't we arrange things in the Xen hotplug scripts such that if the
>> root_block stuff isn't available/doesn't work we fallback to the
>> existing fe:ff:ff:ff:ff usage?
>>
>> That would avoid concerns about forward/backwards compat I think. It
>> wouldn't solve the issue you are targeting on old systems, but it also
>> doesn't regress them any further.
>
> I agree, I think this problem could be better handled from userspace: if it
> can set root_block then change the default MAC to a random one, if it can't,
> then stay with the default one. Or if someone doesn't care about STP but DAD
> is still important, userspace can have a force_random_mac option somewhere
> to change to a random MAC regardless of root_block presence.

Folks, what if I repurpose my patch to use the IFF_BRIDGE_NON_ROOT (or
relabel to IFF_ROOT_BLOCK_DEF) flag for a default driver preference
upon initialization so that root block will be used once the device
gets added to a bridge. The purpose would be to avoid drivers from
using the high MAC address hack, streamline to use a random MAC
address thereby avoiding the possible duplicate address situation for
IPv6. In the STP use case for these interfaces we'd just require
userspace to unset the root block. I'd consider the STP use case the
most odd of all. The caveat to this approach is 3.8 would be needed
(or its the root block patches cherry picked) for base kernels older
than 3.8.

Stephen?

  Luis
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Stephen Hemminger Feb. 19, 2014, 5:08 p.m. UTC | #9
On Wed, 19 Feb 2014 09:02:06 -0800
"Luis R. Rodriguez" <mcgrof@do-not-panic.com> wrote:

> Folks, what if I repurpose my patch to use the IFF_BRIDGE_NON_ROOT (or
> relabel to IFF_ROOT_BLOCK_DEF) flag for a default driver preference
> upon initialization so that root block will be used once the device
> gets added to a bridge. The purpose would be to avoid drivers from
> using the high MAC address hack, streamline to use a random MAC
> address thereby avoiding the possible duplicate address situation for
> IPv6. In the STP use case for these interfaces we'd just require
> userspace to unset the root block. I'd consider the STP use case the
> most odd of all. The caveat to this approach is 3.8 would be needed
> (or its the root block patches cherry picked) for base kernels older
> than 3.8.
> 
> Stephen?
> 
>   Luis

Don't add IFF_ flags that adds yet another API hook into bridge.
Please only use the netlink/sysfs flags fields that already exist
for new features.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Luis R. Rodriguez Feb. 19, 2014, 5:59 p.m. UTC | #10
On Wed, Feb 19, 2014 at 9:08 AM, Stephen Hemminger
<stephen@networkplumber.org> wrote:
> On Wed, 19 Feb 2014 09:02:06 -0800
> "Luis R. Rodriguez" <mcgrof@do-not-panic.com> wrote:
>
>> Folks, what if I repurpose my patch to use the IFF_BRIDGE_NON_ROOT (or
>> relabel to IFF_ROOT_BLOCK_DEF) flag for a default driver preference
>> upon initialization so that root block will be used once the device
>> gets added to a bridge. The purpose would be to avoid drivers from
>> using the high MAC address hack, streamline to use a random MAC
>> address thereby avoiding the possible duplicate address situation for
>> IPv6. In the STP use case for these interfaces we'd just require
>> userspace to unset the root block. I'd consider the STP use case the
>> most odd of all. The caveat to this approach is 3.8 would be needed
>> (or its the root block patches cherry picked) for base kernels older
>> than 3.8.
>>
>> Stephen?
>>
>>   Luis
>
> Don't add IFF_ flags that adds yet another API hook into bridge.

The goal was not to add a userspace API, but rather consider a driver
initialization preference.

> Please only use the netlink/sysfs flags fields that already exist
> for new features.

Sure, but what if we know a driver in most cases wants the root block
and we'd want to make it the default, thereby only requiring userspace
for toggling it off.

  Luis
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Zoltan Kiss Feb. 20, 2014, 1:19 p.m. UTC | #11
On 19/02/14 17:02, Luis R. Rodriguez wrote:
> On Wed, Feb 19, 2014 at 6:35 AM, Zoltan Kiss <zoltan.kiss@citrix.com> wrote:
>> On 19/02/14 09:52, Ian Campbell wrote:
>>> Can't we arrange things in the Xen hotplug scripts such that if the
>>> root_block stuff isn't available/doesn't work we fallback to the
>>> existing fe:ff:ff:ff:ff usage?
>>>
>>> That would avoid concerns about forward/backwards compat I think. It
>>> wouldn't solve the issue you are targeting on old systems, but it also
>>> doesn't regress them any further.
>>
>> I agree, I think this problem could be better handled from userspace: if it
>> can set root_block then change the default MAC to a random one, if it can't,
>> then stay with the default one. Or if someone doesn't care about STP but DAD
>> is still important, userspace can have a force_random_mac option somewhere
>> to change to a random MAC regardless of root_block presence.
>
> Folks, what if I repurpose my patch to use the IFF_BRIDGE_NON_ROOT (or
> relabel to IFF_ROOT_BLOCK_DEF) flag for a default driver preference
> upon initialization so that root block will be used once the device
> gets added to a bridge. The purpose would be to avoid drivers from
> using the high MAC address hack, streamline to use a random MAC
> address thereby avoiding the possible duplicate address situation for
> IPv6. In the STP use case for these interfaces we'd just require
> userspace to unset the root block. I'd consider the STP use case the
> most odd of all. The caveat to this approach is 3.8 would be needed
> (or its the root block patches cherry picked) for base kernels older
> than 3.8.

How about this: netback sets the root_block flag and a random MAC by 
default. So the default behaviour won't change, DAD will be happy, and 
userspace don't have to do anything unless it's using netback for STP 
root bridge (I don't think there are too many toolstacks doing that), in 
which case it has to remove the root_block flag instead of setting a 
random MAC.

Zoli
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Zoltan Kiss Feb. 20, 2014, 2:47 p.m. UTC | #12
On 19/02/14 16:45, Luis R. Rodriguez wrote:
> On Mon, Feb 17, 2014 at 9:52 AM, Zoltan Kiss <zoltan.kiss@citrix.com> wrote:
>> On 15/02/14 02:59, Luis R. Rodriguez wrote:
>>>
>>> From: "Luis R. Rodriguez" <mcgrof@suse.com>
>>>
>>> It doesn't make sense for some interfaces to become a root bridge
>>> at any point in time. One example is virtual backend interfaces
>>> which rely on other entities on the bridge for actual physical
>>> connectivity. They only provide virtual access.
>>
>> It is possible that a guest bridge together to VIF, either from the same
>> Dom0 bridge or from different ones. In that case using STP on VIFs sound
>> sensible to me.
>
> You seem to describe a case whereby it can make sense for xen-netback
> interfaces to end up becoming the root port of a bridge. Can you
> elaborate a little more on that as it was unclear the use case.
Well, I might be wrong on that, but the scenario I was thinking: a guest 
(let's say domain 1) can have multiple interfaces on different Dom0 (or 
driver domain) bridges, let's say vif1.0 is plugged into xenbr0 and 
vif1.1 is in xenbr1. If the guest wants to make a bridge of this two, 
then using STP makes sense. I wanted to bring up CloudStack's virtual 
router as an example, but then I realized it's probably doesn't do such 
thing. However I don't think we should hardcode that a netback interface 
can't be RP ever.

>
> Additionally if such cases exist then under the current upstream
> implementation one would simply need to change the MAC address in
> order to enable a vif to become the root port.  Stephen noted there is
> a way to avoid nominating an interface for a root port through the
> root block flag. We should use that instead of the MAC address hacks.
> Let's keep in mind that part of the motivation for this series is to
> avoid a duplicate IPv6 address left in place by use cases whereby the
> MAC address of the backend vif was left static. The use case your are
> explaining likely describes the more prevalent use case where address
> conflicts can occur, perhaps when administrators for got to change the
> backend MAC address. If we embrace a random MAC address we'd avoid
> that issue, and but we'd need to update userspace to use the root
> block on topologies where desired.
If I understand you correctly, this is the same I suggested in my 
another email sent 1.5 hour ago.

Zoli

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Stephen Hemminger Feb. 20, 2014, 5:19 p.m. UTC | #13
On Wed, 19 Feb 2014 09:59:33 -0800
"Luis R. Rodriguez" <mcgrof@do-not-panic.com> wrote:

> On Wed, Feb 19, 2014 at 9:08 AM, Stephen Hemminger
> <stephen@networkplumber.org> wrote:
> > On Wed, 19 Feb 2014 09:02:06 -0800
> > "Luis R. Rodriguez" <mcgrof@do-not-panic.com> wrote:
> >
> >> Folks, what if I repurpose my patch to use the IFF_BRIDGE_NON_ROOT (or
> >> relabel to IFF_ROOT_BLOCK_DEF) flag for a default driver preference
> >> upon initialization so that root block will be used once the device
> >> gets added to a bridge. The purpose would be to avoid drivers from
> >> using the high MAC address hack, streamline to use a random MAC
> >> address thereby avoiding the possible duplicate address situation for
> >> IPv6. In the STP use case for these interfaces we'd just require
> >> userspace to unset the root block. I'd consider the STP use case the
> >> most odd of all. The caveat to this approach is 3.8 would be needed
> >> (or its the root block patches cherry picked) for base kernels older
> >> than 3.8.
> >>
> >> Stephen?
> >>
> >>   Luis
> >
> > Don't add IFF_ flags that adds yet another API hook into bridge.
> 
> The goal was not to add a userspace API, but rather consider a driver
> initialization preference.
> 
> > Please only use the netlink/sysfs flags fields that already exist
> > for new features.
> 
> Sure, but what if we know a driver in most cases wants the root block
> and we'd want to make it the default, thereby only requiring userspace
> for toggling it off.
> 
>   Luis

Something in userspace has to put the device into the bridge.
Fix the port setup in that tool via the netlink or sysfs flags in
the bridge. It should not have to be handled in the bridge looking
at magic flags in the device.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Luis R. Rodriguez Feb. 20, 2014, 8:01 p.m. UTC | #14
On Thu, Feb 20, 2014 at 5:19 AM, Zoltan Kiss <zoltan.kiss@citrix.com> wrote:
> How about this: netback sets the root_block flag and a random MAC by
> default. So the default behaviour won't change, DAD will be happy, and
> userspace don't have to do anything unless it's using netback for STP root
> bridge (I don't think there are too many toolstacks doing that), in which
> case it has to remove the root_block flag instead of setting a random MAC.

:D that's exactly what I ended up proposing too. I mentioned how
xen-netback could do this as well, we'd keep or rename the flag I
added, and then the bridge could would look at it and enable the root
block if the flag is set. Stephen however does not like having the
bridge code look at magic flags for this behavior and would prefer for
us to get the tools to ask for the root block. Let's follow more up on
that thread.

  Luis
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Luis R. Rodriguez Feb. 20, 2014, 8:24 p.m. UTC | #15
On Thu, Feb 20, 2014 at 9:19 AM, Stephen Hemminger
<stephen@networkplumber.org> wrote:
> On Wed, 19 Feb 2014 09:59:33 -0800 "Luis R. Rodriguez" <mcgrof@do-not-panic.com> wrote:
>> On Wed, Feb 19, 2014 at 9:08 AM, Stephen Hemminger <stephen@networkplumber.org> wrote:
>> >
>> > Please only use the netlink/sysfs flags fields that already exist
>> > for new features.
>>
>> Sure, but what if we know a driver in most cases wants the root block
>> and we'd want to make it the default, thereby only requiring userspace
>> for toggling it off.
>
> Something in userspace has to put the device into the bridge.
> Fix the port setup in that tool via the netlink or sysfs flags in
> the bridge. It should not have to be handled in the bridge looking
> at magic flags in the device.

Agreed that's the best strategy and I'll work on sending patches to
brctl to enable the root_block preference. This approach however also
requires a userspace upgrade. I'm trying to see if we can get an
old-nasty-cryptic-hack practice removed from the kernel and we'd try
to prevent future drivers from using it -- without requiring userspace
upgrade. In this case the bad practice is to using a high static MAC
address for mimicking a root block default preference. In order to
remove that *without* requiring a userspace upgrade the dev->priv_flag
approach is the only thing I can think of. If this would go in we'd
replace the high static MAC address with a random MAC address to
prevent IPv6 SLAAC / DAD conflicts. I'd document this flag and
indicate with preference for userspace to be the one tuning these
knobs.

Without this we'd have to keep the high static MAC address on upstream
drivers and let userspace do the random'ization if it confirms the
userspace knob to turn the root block flag is available. Is the
priv_flag approach worth the compromise to remove the root block hack
practice?

  Luis
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Luis R. Rodriguez Feb. 20, 2014, 8:28 p.m. UTC | #16
On Thu, Feb 20, 2014 at 6:47 AM, Zoltan Kiss <zoltan.kiss@citrix.com> wrote:
> On 19/02/14 16:45, Luis R. Rodriguez wrote:
>
>> You seem to describe a case whereby it can make sense for xen-netback
>> interfaces to end up becoming the root port of a bridge. Can you
>> elaborate a little more on that as it was unclear the use case.
>
> Well, I might be wrong on that, but the scenario I was thinking: a guest
> (let's say domain 1) can have multiple interfaces on different Dom0 (or
> driver domain) bridges, let's say vif1.0 is plugged into xenbr0 and vif1.1
> is in xenbr1. If the guest wants to make a bridge of this two, then using
> STP makes sense.

The bridging would happen on the front end in that case no?

>  I wanted to bring up CloudStack's virtual router as an
> example, but then I realized it's probably doesn't do such thing. However I
> don't think we should hardcode that a netback interface can't be RP ever.

My patch did allow for this but the root block flag that Stephen
mentioned can always be lifted.

  Luis
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Zoltan Kiss Feb. 21, 2014, 1:02 p.m. UTC | #17
On 20/02/14 20:01, Luis R. Rodriguez wrote:
> On Thu, Feb 20, 2014 at 5:19 AM, Zoltan Kiss <zoltan.kiss@citrix.com> wrote:
>> How about this: netback sets the root_block flag and a random MAC by
>> default. So the default behaviour won't change, DAD will be happy, and
>> userspace don't have to do anything unless it's using netback for STP root
>> bridge (I don't think there are too many toolstacks doing that), in which
>> case it has to remove the root_block flag instead of setting a random MAC.
>
> :D that's exactly what I ended up proposing too. I mentioned how
> xen-netback could do this as well, we'd keep or rename the flag I
> added, and then the bridge could would look at it and enable the root
> block if the flag is set. Stephen however does not like having the
> bridge code look at magic flags for this behavior and would prefer for
> us to get the tools to ask for the root block. Let's follow more up on
> that thread
We don't need that new flag, just forget about it. Set that root_block 
flag from netback device init, around the time you generate the random 
MAC, or at the earliest possible time. Nothing else has to be done from 
kernel side. If someone wants netback to be a root port, then remove 
root_block from their tools, instead of changing the the MAC address, as 
it happens now.
Another problem with the random addresses, pointed out by Ian earlier, 
that when adding/removing interfaces, the bridge does recalculate it's 
MAC address, and choose the lowest one. In the general usecase I think 
that's normal, but in case of Xen networking, we would like to keep the 
bridge using the physical interface's MAC, because the local port of the 
bridge is used for Dom0 network traffic, therefore changing the bridge 
MAC when a netback device has lower MAC breaks that traffic. I think the 
best is to address this from userspace: if it set the MAC of the bridge 
explicitly, dev_set_mac_address() does dev->addr_assign_type = 
NET_ADDR_SET;, so br_stp_recalculate_bridge_id() will exit before 
changing anything.
And when I say userspace, I mean Xen specific tools which does 
networking configuration, e.g. xapi in XenServer case. Not brctl, it 
doesn't have to know whether this is a xenbrX device or a bridge used 
for another purposes.

Zoli

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Zoltan Kiss Feb. 21, 2014, 1:02 p.m. UTC | #18
On 20/02/14 20:24, Luis R. Rodriguez wrote:
> On Thu, Feb 20, 2014 at 9:19 AM, Stephen Hemminger
> <stephen@networkplumber.org> wrote:
>> On Wed, 19 Feb 2014 09:59:33 -0800 "Luis R. Rodriguez" <mcgrof@do-not-panic.com> wrote:
>>> On Wed, Feb 19, 2014 at 9:08 AM, Stephen Hemminger <stephen@networkplumber.org> wrote:
>>>>
>>>> Please only use the netlink/sysfs flags fields that already exist
>>>> for new features.
>>>
>>> Sure, but what if we know a driver in most cases wants the root block
>>> and we'd want to make it the default, thereby only requiring userspace
>>> for toggling it off.
>>
>> Something in userspace has to put the device into the bridge.
>> Fix the port setup in that tool via the netlink or sysfs flags in
>> the bridge. It should not have to be handled in the bridge looking
>> at magic flags in the device.
>
> Agreed that's the best strategy and I'll work on sending patches to
> brctl to enable the root_block preference. This approach however also
I don't think brctl should deal with any Xen specific stuff. I assume 
there is a misunderstanding in this thread: when I (and possibly other 
Xen folks) talk about "userspace" or "toolstack" here, I mean Xen 
specific tools which use e.g. brctl to set up bridges. Not brctl itself.
> requires a userspace upgrade. I'm trying to see if we can get an
> old-nasty-cryptic-hack practice removed from the kernel and we'd try
> to prevent future drivers from using it -- without requiring userspace
> upgrade. In this case the bad practice is to using a high static MAC
> address for mimicking a root block default preference. In order to
> remove that *without* requiring a userspace upgrade the dev->priv_flag
> approach is the only thing I can think of. If this would go in we'd
> replace the high static MAC address with a random MAC address to
> prevent IPv6 SLAAC / DAD conflicts. I'd document this flag and
> indicate with preference for userspace to be the one tuning these
> knobs.
>
> Without this we'd have to keep the high static MAC address on upstream
> drivers and let userspace do the random'ization if it confirms the
> userspace knob to turn the root block flag is available. Is the
> priv_flag approach worth the compromise to remove the root block hack
> practice?
>
>    Luis
>

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Luis R. Rodriguez Feb. 21, 2014, 3:59 p.m. UTC | #19
On Fri, Feb 21, 2014 at 5:02 AM, Zoltan Kiss <zoltan.kiss@citrix.com> wrote:
> On 20/02/14 20:01, Luis R. Rodriguez wrote:
>>
>> On Thu, Feb 20, 2014 at 5:19 AM, Zoltan Kiss <zoltan.kiss@citrix.com>
>> wrote:
>>>
>>> How about this: netback sets the root_block flag and a random MAC by
>>> default. So the default behaviour won't change, DAD will be happy, and
>>> userspace don't have to do anything unless it's using netback for STP
>>> root
>>> bridge (I don't think there are too many toolstacks doing that), in which
>>> case it has to remove the root_block flag instead of setting a random
>>> MAC.
>>
>>
>> :D that's exactly what I ended up proposing too. I mentioned how
>> xen-netback could do this as well, we'd keep or rename the flag I
>> added, and then the bridge could would look at it and enable the root
>> block if the flag is set. Stephen however does not like having the
>> bridge code look at magic flags for this behavior and would prefer for
>> us to get the tools to ask for the root block. Let's follow more up on
>> that thread
>
> We don't need that new flag, just forget about it.

Unless I'm missing something the root_block flag is a bridge port
primitive. This means we can't set it *until* the interface gets added
to a bridge, and even then, its a knob that would be available only to
the bridge.

> Another problem with the random addresses, pointed out by Ian earlier, that
> when adding/removing interfaces, the bridge does recalculate it's MAC
> address, and choose the lowest one. In the general usecase I think that's
> normal, but in case of Xen networking, we would like to keep the bridge
> using the physical interface's MAC, because the local port of the bridge is
> used for Dom0 network traffic, therefore changing the bridge MAC when a
> netback device has lower MAC breaks that traffic.

This is a good reason then to actually have an interface general
specific knob to annotate to the bridge that we'd prefer to root_block
by default, the alternative as you point out below is to have the xen
/ kvm utils to set the bridge MAC address statically, but that'll
requires a userspace upgrade. I'm looking for a kernel solution that
is backwards compatible with old userspace.

> I think the best is to
> address this from userspace: if it set the MAC of the bridge explicitly,
> dev_set_mac_address() does dev->addr_assign_type = NET_ADDR_SET;, so
> br_stp_recalculate_bridge_id() will exit before changing anything.

That will certainly work for new xen / kvm util userspace.

  Luis
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Luis R. Rodriguez Feb. 21, 2014, 4:01 p.m. UTC | #20
On Fri, Feb 21, 2014 at 5:02 AM, Zoltan Kiss <zoltan.kiss@citrix.com> wrote:
>> Agreed that's the best strategy and I'll work on sending patches to
>> brctl to enable the root_block preference. This approach however also
>
> I don't think brctl should deal with any Xen specific stuff. I assume there
> is a misunderstanding in this thread: when I (and possibly other Xen folks)
> talk about "userspace" or "toolstack" here, I mean Xen specific tools which
> use e.g. brctl to set up bridges. Not brctl itself.

I did mean brctl, but as I looked at the code it doesn't used
rtnl_open() and not sure if Stephen would want that. Additionally even
if it did handle root_block the other issue with this strategy is that
as you noted upon initialization the bridge, without a static MAC
address, could end up setting the backend as the root port, until you
let userspace turn the root_block knob.

  Luis
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Luis R. Rodriguez Feb. 22, 2014, 1:38 a.m. UTC | #21
On Fri, Feb 21, 2014 at 8:01 AM, Luis R. Rodriguez
<mcgrof@do-not-panic.com> wrote:
> On Fri, Feb 21, 2014 at 5:02 AM, Zoltan Kiss <zoltan.kiss@citrix.com> wrote:
>>> Agreed that's the best strategy and I'll work on sending patches to
>>> brctl to enable the root_block preference. This approach however also
>>
>> I don't think brctl should deal with any Xen specific stuff. I assume there
>> is a misunderstanding in this thread: when I (and possibly other Xen folks)
>> talk about "userspace" or "toolstack" here, I mean Xen specific tools which
>> use e.g. brctl to set up bridges. Not brctl itself.
>
> I did mean brctl, but as I looked at the code it doesn't used
> rtnl_open() and not sure if Stephen would want that.

Actually that'd be the incorrect tool to extend, iproute2 would be the
new way with:

ip link add dev xenbr0 type bridge
ip link set dev eth0 master xenbr0
ip link set dev vif1.0 master xenbr0 <root_block>

where root_block would be the new desired argument. This would use the
rtnetlink RTM_SETLINK + IFLA_MASTER, which will in turn kick off the
bridge ndo_add_slave(). Still though it seems this requires the eth0
device to actually exist and as such from what I can tell we can't set
the root_block preference until *after* the addition onto the bridge,
which should mean the bridge could still take the vif1.0 MAC address
momentarily. This is of course only an issue if the link was up during
the additions. This makes me think perhaps nothing is needed then and
scripts could just use the:

bridge link set dev vif1.0 root_block on

I also just noticed that if an entry that was the bridge root port got
a root_block toggle we don't kick off the newly blocked port, I just
verified this. Note that removing the interface from the bridge does
however reset the bridge with a proper new root port:

ip link set dev vif1.0 nomaster

For old userspace with brctl and no iproute2 we're shit out of luck,
this means we can't use root block (xen-netblock was added on
v2.6.39).

Stephen all this can we add the priv_flags flag to help out as
proposed, but I'd make it just toggle the new root_block flag, that'd
enable drivers to use this from initialization. Let me know if you
have other suggestions or things I may have missed.

  Luis
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/include/uapi/linux/if.h b/include/uapi/linux/if.h
index d758163..8d10382 100644
--- a/include/uapi/linux/if.h
+++ b/include/uapi/linux/if.h
@@ -84,6 +84,7 @@ 
 #define IFF_LIVE_ADDR_CHANGE 0x100000	/* device supports hardware address
 					 * change when it's running */
 #define IFF_MACVLAN 0x200000		/* Macvlan device */
+#define IFF_BRIDGE_NON_ROOT 0x400000    /* Don't consider for root bridge */
 
 
 #define IF_GET_IFACE	0x0001		/* for querying only */
diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c
index 4bf02ad..a745415 100644
--- a/net/bridge/br_if.c
+++ b/net/bridge/br_if.c
@@ -228,6 +228,8 @@  static struct net_bridge_port *new_nbp(struct net_bridge *br,
 	br_init_port(p);
 	p->state = BR_STATE_DISABLED;
 	br_stp_port_timer_init(p);
+	if (dev->priv_flags & IFF_BRIDGE_NON_ROOT)
+		p->flags |= BR_DONT_ROOT;
 	br_multicast_add_port(p);
 
 	return p;
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index 045d56e..a89e8ad 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -173,6 +173,7 @@  struct net_bridge_port
 #define BR_ADMIN_COST		0x00000010
 #define BR_LEARNING		0x00000020
 #define BR_FLOOD		0x00000040
+#define BR_DONT_ROOT		0x00000080
 
 #ifdef CONFIG_BRIDGE_IGMP_SNOOPING
 	struct bridge_mcast_query	ip4_query;
diff --git a/net/bridge/br_stp_if.c b/net/bridge/br_stp_if.c
index 656a6f3..12fd848 100644
--- a/net/bridge/br_stp_if.c
+++ b/net/bridge/br_stp_if.c
@@ -228,6 +228,8 @@  bool br_stp_recalculate_bridge_id(struct net_bridge *br)
 		return false;
 
 	list_for_each_entry(p, &br->port_list, list) {
+		if (p->flags & BR_DONT_ROOT)
+			continue;
 		if (addr == br_mac_zero ||
 		    memcmp(p->dev->dev_addr, addr, ETH_ALEN) < 0)
 			addr = p->dev->dev_addr;