diff mbox

more findings/questions on vlans/bonds

Message ID 13724.1240520690@death.nxdomain.ibm.com
State Accepted, archived
Delegated to: David Miller
Headers show

Commit Message

Jay Vosburgh April 23, 2009, 9:04 p.m. UTC
[ Added VLAN maintainer to cc, patch enclosed ]

Or Gerlitz <ogerlitz@voltaire.com> wrote:

>I hope that you can help clarify what's the correct/supported method
>to work with vlans and bonds, with 2.6.29 I see that one can either
>
>	- vlan a bond (bond0.4001 over bond0 over eth0/1.4001)
>	- bond vlans (e.g bond0 over eth0/1.4001)
>
>I played a bit with bonding vlans (2.6.29 active-backup mode) and it
>doesn't seem to work - specifically, I noted that bonding doesn't issue
>fail-over after I changed the current slave link status to down ("ifconfig
>eth0.4001 down"). I suspect that the carrier based link monitoring scheme
>is broken wrt to vlan devices - e.g I found that at least from sysfs
>perspective the vlan device carrier isn't available:
>
>$ cat /sys/class/net/eth0.4001/carrier
>cat: /sys/class/net/eth0.4001/carrier: Invalid argument

	Ok, I just got a minute to fool with this, and it appears to
sort of work for me on a relatively recent cut of net-2.6 (approximately
2.6.30-rc1).  I didn't try 2.6.29 exactly, but the only issues I had
with the version I used are as described below.

	I suspect the "carrier: Invalid argument" response you describe
above is a result of the show_carrier function in net-sysfs.c, which
returns EINVAL if the device is not netif_running.  If the device (vlan
or not) is running, the carrier is returned.

	Anyway, configuring a bond over two VLAN devices (bond0 ->
eth{0,1}.600 -> eth0{0,1}), it configures and comes up fine.

	If I pull the cable for eth0 or eth1, the vlan device's carrier
state updates correctly, and bonding fails over.

	However, if I set eth0.VLAN or eth0 itself administratively
down, no failover takes place.  This is apparently because the VLAN
device doesn't adjust its carrier state for UP or DOWN events from the
underlying real device or for open/close of the VLAN device itself.  It
does update the carrier state for CHANGE events (from linkwatch, for
example).

	So, here's a patch with a description.  This resolves the
bonding problem for me; it might have other undesirable side effects,
though.  It's debatable whether it's really necessary, but it does make
the VLAN device behave more like a real network device in this case:

[PATCH] vlan: update vlan carrier state for admin up/down

	Currently, the VLAN event handler does not adjust the VLAN
device's carrier state when the real device or the VLAN device is set
administratively up or down.

	The following patch adds a transfer of operating state from the
real device to the VLAN device when the real device is administratively
set up or down, and sets the carrier state up or down during init, open
and close of the VLAN device.

	This permits observers above the VLAN device that care about the
carrier state (bonding's link monitor, for example) to receive updates
for administrative changes by more closely mimicing the behavior of real
devices.

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Patrick McHardy April 24, 2009, 3:22 p.m. UTC | #1
Jay Vosburgh wrote:
> [PATCH] vlan: update vlan carrier state for admin up/down
> 
> 	Currently, the VLAN event handler does not adjust the VLAN
> device's carrier state when the real device or the VLAN device is set
> administratively up or down.
> 
> 	The following patch adds a transfer of operating state from the
> real device to the VLAN device when the real device is administratively
> set up or down, and sets the carrier state up or down during init, open
> and close of the VLAN device.
> 
> 	This permits observers above the VLAN device that care about the
> carrier state (bonding's link monitor, for example) to receive updates
> for administrative changes by more closely mimicing the behavior of real
> devices.

Looks good, applied, thanks Jay.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Or Gerlitz April 26, 2009, 8:21 a.m. UTC | #2
Jay Vosburgh wrote:
>> $ cat /sys/class/net/eth0.4001/carrier
>> cat: /sys/class/net/eth0.4001/carrier: Invalid argument

>         I suspect the "carrier: Invalid argument" response you describe
> above is a result of the show_carrier function in net-sysfs.c, which
> returns EINVAL if the device is not netif_running.  If the device (vlan
> or not) is running, the carrier is returned.

Yes, I see now that this is indeed the case with the sysfs code.

> So, here's a patch with a description.  This resolves the
> bonding problem for me

many thanks for doing this patch

Or.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Or Gerlitz April 26, 2009, 2:14 p.m. UTC | #3
Jay Vosburgh wrote:
>         So, here's a patch with a description.  This resolves the
> bonding problem for me; it might have other undesirable side effects,
> though.  It's debatable whether it's really necessary, but it does make
> the VLAN device behave more like a real network device in this case:
> 
> [PATCH] vlan: update vlan carrier state for admin up/down

Jay, I just can't apply the patch on 2.6.30-rc2 it all fails, which
is quite weird since the code (e.g net/8021q/vlan.c) hasn't gone through
many changes recently - any idea what I do wrong?

Or.

> # patch --dry-run -p 1 < /tmp/vlans_bonds.patch
> patching file net/8021q/vlan.c
> Hunk #1 FAILED at 492.
> Hunk #2 FAILED at 508.
> 2 out of 2 hunks FAILED -- saving rejects to file net/8021q/vlan.c.rej
> patching file net/8021q/vlan_dev.c
> Hunk #1 FAILED at 462.
> Hunk #2 FAILED at 472.
> Hunk #3 FAILED at 494.
> Hunk #4 FAILED at 615.
> 4 out of 4 hunks FAILED -- saving rejects to file net/8021q/vlan_dev.c.rej

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
index 2b7390e..d1e1054 100644
--- a/net/8021q/vlan.c
+++ b/net/8021q/vlan.c
@@ -492,6 +492,7 @@  static int vlan_device_event(struct notifier_block *unused, unsigned long event,
 				continue;
 
 			dev_change_flags(vlandev, flgs & ~IFF_UP);
+			vlan_transfer_operstate(dev, vlandev);
 		}
 		break;
 
@@ -507,6 +508,7 @@  static int vlan_device_event(struct notifier_block *unused, unsigned long event,
 				continue;
 
 			dev_change_flags(vlandev, flgs | IFF_UP);
+			vlan_transfer_operstate(dev, vlandev);
 		}
 		break;
 
diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
index 1b34135..2ce6658 100644
--- a/net/8021q/vlan_dev.c
+++ b/net/8021q/vlan_dev.c
@@ -462,6 +462,7 @@  static int vlan_dev_open(struct net_device *dev)
 	if (vlan->flags & VLAN_FLAG_GVRP)
 		vlan_gvrp_request_join(dev);
 
+	netif_carrier_on(dev);
 	return 0;
 
 clear_allmulti:
@@ -471,6 +472,7 @@  del_unicast:
 	if (compare_ether_addr(dev->dev_addr, real_dev->dev_addr))
 		dev_unicast_delete(real_dev, dev->dev_addr, ETH_ALEN);
 out:
+	netif_carrier_off(dev);
 	return err;
 }
 
@@ -492,6 +494,7 @@  static int vlan_dev_stop(struct net_device *dev)
 	if (compare_ether_addr(dev->dev_addr, real_dev->dev_addr))
 		dev_unicast_delete(real_dev, dev->dev_addr, dev->addr_len);
 
+	netif_carrier_off(dev);
 	return 0;
 }
 
@@ -612,6 +615,8 @@  static int vlan_dev_init(struct net_device *dev)
 	struct net_device *real_dev = vlan_dev_info(dev)->real_dev;
 	int subclass = 0;
 
+	netif_carrier_off(dev);
+
 	/* IFF_BROADCAST|IFF_MULTICAST; ??? */
 	dev->flags  = real_dev->flags & ~(IFF_UP | IFF_PROMISC | IFF_ALLMULTI);
 	dev->iflink = real_dev->ifindex;