diff mbox

bond interface arp, vlan and trunk / network question

Message ID 32563.1240449149@death.nxdomain.ibm.com
State RFC, archived
Delegated to: David Miller
Headers show

Commit Message

Jay Vosburgh April 23, 2009, 1:12 a.m. UTC
stefan novak <lms.brubaker@gmail.com> wrote:

>so its a bug in kernel?

	Yes.  I think the following patch should add support for
arp_validate over VLANs, could you test this?  This is still a work in
progress, so it's pretty rough around the edges, but the core
functionality should be there.

	This works by moving the skb_bond_should_drop logic around, and
replaces the current inline code with a hook into bonding to do all of
that, plus additional logic.  The reason for a hook is to get the
skb_bond_should_drop callers from the VLAN input path before the input
device is assigned to the VLAN device.

	-J



---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Eric Dumazet April 23, 2009, 5:58 a.m. UTC | #1
Jay Vosburgh a écrit :
> stefan novak <lms.brubaker@gmail.com> wrote:
> 
>> so its a bug in kernel?
> 
> 	Yes.  I think the following patch should add support for
> arp_validate over VLANs, could you test this?  This is still a work in
> progress, so it's pretty rough around the edges, but the core
> functionality should be there.
> 
> 	This works by moving the skb_bond_should_drop logic around, and
> replaces the current inline code with a hook into bonding to do all of
> that, plus additional logic.  The reason for a hook is to get the
> skb_bond_should_drop callers from the VLAN input path before the input
> device is assigned to the VLAN device.
> 
> 	-J

Hi Jay

I wanted to test your patch and setup such VLAN config, and not yet applied your patch.

# cat /proc/net/bonding/bond1
Ethernet Channel Bonding Driver: v3.5.0 (November 4, 2008)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: eth2
MII Status: up
MII Polling Interval (ms): 0
Up Delay (ms): 0
Down Delay (ms): 0
ARP Polling Interval (ms): 250
ARP IP target/s (n.n.n.n form): 192.168.20.254

Slave Interface: eth1
MII Status: up
Link Failure Count: 8
Permanent HW addr: 00:1e:0b:ec:d3:d2

Slave Interface: eth2
MII Status: up
Link Failure Count: 11
Permanent HW addr: 00:1e:0b:92:78:50

# grep . /sys/class/net/bond1/bonding/*
/sys/class/net/bond1/bonding/active_slave:eth2
/sys/class/net/bond1/bonding/ad_select:stable 0
/sys/class/net/bond1/bonding/arp_interval:250
/sys/class/net/bond1/bonding/arp_ip_target:192.168.20.254
/sys/class/net/bond1/bonding/arp_validate:none 0
/sys/class/net/bond1/bonding/downdelay:0
/sys/class/net/bond1/bonding/fail_over_mac:none 0
/sys/class/net/bond1/bonding/lacp_rate:slow 0
/sys/class/net/bond1/bonding/miimon:0
/sys/class/net/bond1/bonding/mii_status:up
/sys/class/net/bond1/bonding/mode:active-backup 1
/sys/class/net/bond1/bonding/num_grat_arp:1
/sys/class/net/bond1/bonding/num_unsol_na:1
/sys/class/net/bond1/bonding/slaves:eth1 eth2
/sys/class/net/bond1/bonding/updelay:0
/sys/class/net/bond1/bonding/use_carrier:1
/sys/class/net/bond1/bonding/xmit_hash_policy:layer2 0



So active slave is eth2. Still I receive trafic on eth1, according to ifconfig :
# ifconfig eth1|grep packets;sleep 10;ifconfig eth1|grep packets
          RX packets:2098122 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2053085 errors:0 dropped:0 overruns:0 carrier:0
          RX packets:2098162 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2053085 errors:0 dropped:0 overruns:0 carrier:0

exactly 4 packets per second.

# ifconfig eth2|grep packets;sleep 10;ifconfig eth2|grep packets
          RX packets:3695512 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3683799 errors:0 dropped:0 overruns:0 carrier:0
          RX packets:3695554 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3683841 errors:0 dropped:0 overruns:0 carrier:0

I also receive arp answers on eth2 (quite normal)

I wanted to tcpdump on eth1 but got nothing :

# tcpdump -p -n -s 0 -i eth1
tcpdump: WARNING: eth1: no IPv4 address assigned
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth1, link-type EN10MB (Ethernet), capture size 65535 bytes
^C

# tcpdump -p -n -s 0 -i eth2
tcpdump: WARNING: eth2: no IPv4 address assigned
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth2, link-type EN10MB (Ethernet), capture size 65535 bytes
07:54:48.430982 arp who-has 192.168.20.254 tell 192.168.20.110
07:54:48.680980 arp who-has 192.168.20.254 tell 192.168.20.110
07:54:48.930981 arp who-has 192.168.20.254 tell 192.168.20.110
07:54:49.180980 arp who-has 192.168.20.254 tell 192.168.20.110
07:54:49.430980 arp who-has 192.168.20.254 tell 192.168.20.110
07:54:49.680980 arp who-has 192.168.20.254 tell 192.168.20.110

# tcpdump -p -n -s 0 -i bond1
tcpdump: WARNING: bond1: no IPv4 address assigned
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on bond1, link-type EN10MB (Ethernet), capture size 65535 bytes
07:55:28.681491 arp reply 192.168.20.254 is-at 00:00:0c:07:ac:01
07:55:28.931493 arp reply 192.168.20.254 is-at 00:00:0c:07:ac:01
07:55:29.181466 arp reply 192.168.20.254 is-at 00:00:0c:07:ac:01
07:55:29.431496 arp reply 192.168.20.254 is-at 00:00:0c:07:ac:01
07:55:29.681487 arp reply 192.168.20.254 is-at 00:00:0c:07:ac:01
07:55:29.931492 arp reply 192.168.20.254 is-at 00:00:0c:07:ac:01

# tcpdump -p -n -s 0 -i bond1.103
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on bond1.103, link-type EN10MB (Ethernet), capture size 65535 bytes
07:55:58.681521 arp reply 192.168.20.254 is-at 00:00:0c:07:ac:01
07:55:58.931636 arp reply 192.168.20.254 is-at 00:00:0c:07:ac:01
07:55:59.181495 arp reply 192.168.20.254 is-at 00:00:0c:07:ac:01
07:55:59.431496 arp reply 192.168.20.254 is-at 00:00:0c:07:ac:01
07:55:59.681499 arp reply 192.168.20.254 is-at 00:00:0c:07:ac:01
07:55:59.931499 arp reply 192.168.20.254 is-at 00:00:0c:07:ac:01

Configuration script is 

ip link set eth1 up
ip link set eth2 up

ip addr flush dev eth1
ip addr flush dev eth2

ip link set eth1 up
ip link set eth2 up

modprobe bond1

ifconfig bond1 down

# test du arp_check toutes les 250ms, en choissant l'HSRP du vlan 103 comme IP
echo +192.168.20.254 >/sys/class/net/bond1/bonding/arp_ip_target
echo 250 >/sys/class/net/bond1/bonding/arp_interval

# mode actif/passif (active-backup)
echo 1 >/sys/class/net/bond1/bonding/mode

ifconfig bond1 up

ifenslave bond1 eth1 eth2

ip link set eth1 up
ip link set eth2 up

ip link add link bond1 bond1.103 txqueuelen 100 type vlan id 103
ip addr add 192.168.20.110/24 dev bond1.103
ip link set bond1.103 up

ip link add link bond1 bond1.825 txqueuelen 1000 type vlan id 825
ip addr add 10.170.73.123/25 dev bond1.825
ip link set bond1.825 up



Is this setup OK to test your patch ?

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jiri Pirko April 23, 2009, 7:48 a.m. UTC | #2
Thu, Apr 23, 2009 at 03:12:29AM CEST, fubar@us.ibm.com wrote:
>stefan novak <lms.brubaker@gmail.com> wrote:
>
>>so its a bug in kernel?
>
>	Yes.  I think the following patch should add support for
>arp_validate over VLANs, could you test this?  This is still a work in
>progress, so it's pretty rough around the edges, but the core
>functionality should be there.
>
>	This works by moving the skb_bond_should_drop logic around, and
>replaces the current inline code with a hook into bonding to do all of
>that, plus additional logic.  The reason for a hook is to get the
>skb_bond_should_drop callers from the VLAN input path before the input
>device is assigned to the VLAN device.
>
>	-J
>
>diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>index 99610f3..cc367a3 100644
>--- a/drivers/net/bonding/bond_main.c
>+++ b/drivers/net/bonding/bond_main.c
>@@ -1607,6 +1607,7 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
> 	read_lock(&bond->lock);
> 
> 	new_slave->last_arp_rx = jiffies;
>+	bond_fix_slave_validate_flag(bond, new_slave);
> 
> 	if (bond->params.miimon && !bond->params.use_carrier) {
> 		link_reporting = bond_check_dev_link(bond, slave_dev, 1);
>@@ -2538,8 +2539,8 @@ static void bond_arp_send(struct net_device *slave_dev, int arp_op, __be32 dest_
> {
> 	struct sk_buff *skb;
> 
>-	pr_debug("arp %d on slave %s: dst %x src %x vid %d\n", arp_op,
>-	       slave_dev->name, dest_ip, src_ip, vlan_id);
>+	pr_debug("arp %d on slave %s: dst %x src %x vid %d j %lu\n", arp_op,
>+		 slave_dev->name, dest_ip, src_ip, vlan_id, jiffies);
> 	       
> 	skb = arp_create(arp_op, ETH_P_ARP, dest_ip, slave_dev, src_ip,
> 			 NULL, slave_dev->dev_addr, NULL);
>@@ -2679,8 +2680,6 @@ static void bond_validate_arp(struct bonding *bond, struct slave *slave, __be32
> 
> 	targets = bond->params.arp_targets;
> 	for (i = 0; (i < BOND_MAX_ARP_TARGETS) && targets[i]; i++) {
>-		pr_debug("bva: sip %pI4 tip %pI4 t[%d] %pI4 bhti(tip) %d\n",
>-			&sip, &tip, i, &targets[i], bond_has_this_ip(bond, tip));
> 		if (sip == targets[i]) {
> 			if (bond_has_this_ip(bond, tip))
> 				slave->last_arp_rx = jiffies;
>@@ -2689,35 +2688,36 @@ static void bond_validate_arp(struct bonding *bond, struct slave *slave, __be32
> 	}
> }
> 
>-static int bond_arp_rcv(struct sk_buff *skb, struct net_device *dev, struct packet_type *pt, struct net_device *orig_dev)
>+static void bond_handle_arp(struct sk_buff *skb, struct net_device *dev)
> {
> 	struct arphdr *arp;
> 	struct slave *slave;
>+	struct net_device *bond_dev = dev->master;
> 	struct bonding *bond;
> 	unsigned char *arp_ptr;
> 	__be32 sip, tip;
> 
> 	if (dev_net(dev) != &init_net)
>-		goto out;
>+		return;
> 
>-	if (!(dev->priv_flags & IFF_BONDING) || !(dev->flags & IFF_MASTER))
>-		goto out;
>+	bond = netdev_priv(bond_dev);
> 
>-	bond = netdev_priv(dev);
> 	read_lock(&bond->lock);
> 
>-	pr_debug("bond_arp_rcv: bond %s skb->dev %s orig_dev %s\n",
>-		bond->dev->name, skb->dev ? skb->dev->name : "NULL",
>-		orig_dev ? orig_dev->name : "NULL");
>-
>-	slave = bond_get_slave_by_dev(bond, orig_dev);
>+	slave = bond_get_slave_by_dev(bond, dev);
> 	if (!slave || !slave_do_arp_validate(bond, slave))
> 		goto out_unlock;
> 
> 	if (!pskb_may_pull(skb, arp_hdr_len(dev)))
> 		goto out_unlock;
> 
>-	arp = arp_hdr(skb);
>+	/* Can't use arp_hdr; it's not initialized yet. */
>+	arp = (struct arphdr *)skb->data;
>+
>+	pr_debug("arp: hln %d p_t %d hrd %x pro %x pln %d\n",
>+	       arp->ar_hln, skb->pkt_type, ntohs(arp->ar_hrd),
>+	       ntohs(arp->ar_pro), arp->ar_pln);
>+
> 	if (arp->ar_hln != dev->addr_len ||
> 	    skb->pkt_type == PACKET_OTHERHOST ||
> 	    skb->pkt_type == PACKET_LOOPBACK ||
>@@ -2732,29 +2732,67 @@ static int bond_arp_rcv(struct sk_buff *skb, struct net_device *dev, struct pack
> 	arp_ptr += 4 + dev->addr_len;
> 	memcpy(&tip, arp_ptr, 4);
> 
>-	pr_debug("bond_arp_rcv: %s %s/%d av %d sv %d sip %pI4 tip %pI4\n",
>-		bond->dev->name, slave->dev->name, slave->state,
>-		bond->params.arp_validate, slave_do_arp_validate(bond, slave),
>-		&sip, &tip);
>-
>-	/*
>-	 * Backup slaves won't see the ARP reply, but do come through
>-	 * here for each ARP probe (so we swap the sip/tip to validate
>-	 * the probe).  In a "redundant switch, common router" type of
>-	 * configuration, the ARP probe will (hopefully) travel from
>-	 * the active, through one switch, the router, then the other
>-	 * switch before reaching the backup.
>-	 */
>-	if (slave->state == BOND_STATE_ACTIVE)
>+	switch (ntohs(arp->ar_op)) {
>+	case ARPOP_REPLY:
> 		bond_validate_arp(bond, slave, sip, tip);
>-	else
>+		break;
>+	case ARPOP_REQUEST:
> 		bond_validate_arp(bond, slave, tip, sip);
>+		break;
>+	}
> 
> out_unlock:
> 	read_unlock(&bond->lock);
>-out:
>-	dev_kfree_skb(skb);
>-	return NET_RX_SUCCESS;
>+}
>+
>+/*
>+ * Called by core packet processing in netif_receive_skb and in VLAN fast
>+ * path to (a) determine if packet should be dropped, and (b) to perform
>+ * ARP monitor processing (last_rx, validation).
>+ *
>+ * For the VLAN case, called before the skb->dev is reassigned to the
>+ * VLAN.
>+ *
>+ * Returns 1 if frame should nominally be dropped; 0 if it should be kept.
>+ *
>+ * We want to keep:
>+ * - all traffic on active slaves
>+ * - some traffic on inactive slaves: non-broadcast and non-multicast in
>+ * ALB/TLB mode and LACP frames in 802.3ad mode.
>+ *
>+ * ARP frames are handled as normal traffic; the ARP monitor handling
>+ * takes place here, so they need not be kept on inactive slaves.
>+ */
>+static int bond_handle_frame(struct sk_buff *skb)
>+{
>+	struct net_device *dev;
>+	struct net_device *master;
>+
>+	dev = skb->dev;
>+	master = dev->master;
>+	if (!master || !master->priv_flags & IFF_BONDING)
>+		return 0;
>+
>+	if (master->priv_flags & IFF_MASTER_ARPMON) {
>+		dev->last_rx = jiffies;
>+		if ((dev->priv_flags & IFF_SLAVE_NEEDARP) &&
>+		    skb->protocol == __constant_htons(ETH_P_ARP))
>+			bond_handle_arp(skb, dev);
>+	}
>+
>+	if (dev->priv_flags & IFF_SLAVE_INACTIVE) {
>+		if (master->priv_flags & IFF_MASTER_ALB) {
>+			if (skb->pkt_type != PACKET_BROADCAST &&
>+			    skb->pkt_type != PACKET_MULTICAST)
>+				return 0;
>+		}
>+		if (master->priv_flags & IFF_MASTER_8023AD &&
>+		    skb->protocol == __constant_htons(ETH_P_SLOW))
>+			return 0;
>+
>+		return 1;
>+	}
>+	return 0;
> }
> 
> /*
>@@ -3688,6 +3726,7 @@ static void bond_unregister_lacpdu(struct bonding *bond)
> 
> void bond_register_arp(struct bonding *bond)
> {
>+#if 0
> 	struct packet_type *pt = &bond->arp_mon_pt;
> 
> 	if (pt->type)
>@@ -3697,14 +3736,17 @@ void bond_register_arp(struct bonding *bond)
> 	pt->dev = bond->dev;
> 	pt->func = bond_arp_rcv;
> 	dev_add_pack(pt);
>+#endif
> }
> 
> void bond_unregister_arp(struct bonding *bond)
> {
>+#if 0
> 	struct packet_type *pt = &bond->arp_mon_pt;
> 
> 	dev_remove_pack(pt);
> 	pt->type = 0;
>+#endif
> }
> 
> /*---------------------------- Hashing Policies -----------------------------*/
>@@ -5188,6 +5230,8 @@ out_rtnl:
> 	return res;
> }
> 
>+extern int (*bond_handle_frame_hook)(struct sk_buff *skb);
>+
> static int __init bonding_init(void)
> {
> 	int i;
>@@ -5221,6 +5265,8 @@ static int __init bonding_init(void)
> 	register_inetaddr_notifier(&bond_inetaddr_notifier);
> 	bond_register_ipv6_notifier();
> 
>+	bond_handle_frame_hook = bond_handle_frame;
>+
> 	goto out;
> err:
> 	list_for_each_entry(bond, &bond_dev_list, bond_list) {
>@@ -5249,6 +5295,9 @@ static void __exit bonding_exit(void)
> 	rtnl_lock();
> 	bond_free_all();
> 	rtnl_unlock();
>+
>+	bond_handle_frame_hook = NULL;
>+
> }
> 
> module_init(bonding_init);
>diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
>index 18cf478..0e9f0e9 100644
>--- a/drivers/net/bonding/bond_sysfs.c
>+++ b/drivers/net/bonding/bond_sysfs.c
>@@ -484,8 +484,9 @@ static ssize_t bonding_store_arp_validate(struct device *d,
> 					  struct device_attribute *attr,
> 					  const char *buf, size_t count)
> {
>-	int new_value;
>+	int new_value, i;
> 	struct bonding *bond = to_bond(d);
>+	struct slave *slave;
> 
> 	new_value = bond_parse_parm(buf, arp_validate_tbl);
> 	if (new_value < 0) {
>@@ -504,13 +505,12 @@ static ssize_t bonding_store_arp_validate(struct device *d,
> 	       bond->dev->name, arp_validate_tbl[new_value].modename,
> 	       new_value);
> 
>-	if (!bond->params.arp_validate && new_value) {
>-		bond_register_arp(bond);
>-	} else if (bond->params.arp_validate && !new_value) {
>-		bond_unregister_arp(bond);
>-	}
>+	if (bond->params.arp_validate != new_value) {
>+		bond->params.arp_validate = new_value;
> 
>-	bond->params.arp_validate = new_value;
>+		bond_for_each_slave(bond, slave, i)
>+			bond_fix_slave_validate_flag(bond, slave);
>+	}
> 
> 	return count;
> }
>diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h
>index ca849d2..275f08d 100644
>--- a/drivers/net/bonding/bonding.h
>+++ b/drivers/net/bonding/bonding.h
>@@ -283,6 +283,15 @@ static inline unsigned long slave_last_rx(struct bonding *bond,
> 	return slave->dev->last_rx;
> }
> 
>+static inline void bond_fix_slave_validate_flag(struct bonding *bond,
>+					       struct slave *slave)
>+{
>+	if (slave_do_arp_validate(bond, slave))
>+		slave->dev->priv_flags |= IFF_SLAVE_NEEDARP;
>+	else
>+		slave->dev->priv_flags &= ~IFF_SLAVE_NEEDARP;
>+}
>+
> static inline void bond_set_slave_inactive_flags(struct slave *slave)
> {
> 	struct bonding *bond = netdev_priv(slave->dev->master);
>@@ -290,14 +299,16 @@ static inline void bond_set_slave_inactive_flags(struct slave *slave)
> 	    bond->params.mode != BOND_MODE_ALB)
> 		slave->state = BOND_STATE_BACKUP;
> 	slave->dev->priv_flags |= IFF_SLAVE_INACTIVE;
>-	if (slave_do_arp_validate(bond, slave))
>-		slave->dev->priv_flags |= IFF_SLAVE_NEEDARP;
>+	bond_fix_slave_validate_flag(bond, slave);
> }
> 
> static inline void bond_set_slave_active_flags(struct slave *slave)
> {
>+	struct bonding *bond = netdev_priv(slave->dev->master);
>+
> 	slave->state = BOND_STATE_ACTIVE;
>-	slave->dev->priv_flags &= ~(IFF_SLAVE_INACTIVE | IFF_SLAVE_NEEDARP);
>+	slave->dev->priv_flags &= ~IFF_SLAVE_INACTIVE;
>+	bond_fix_slave_validate_flag(bond, slave);
> }
> 
> static inline void bond_set_master_3ad_flags(struct bonding *bond)
>diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>index 2e7783f..3dafd8f 100644
>--- a/include/linux/netdevice.h
>+++ b/include/linux/netdevice.h
>@@ -1874,6 +1874,7 @@ static inline void netif_set_gso_max_size(struct net_device *dev,
> 	dev->gso_max_size = size;
> }
> 
>+#if 0
> /* On bonding slaves other than the currently active slave, suppress
>  * duplicates except for 802.3ad ETH_P_SLOW, alb non-mcast/bcast, and
>  * ARP on active-backup slaves with arp_validate enabled.
>@@ -1906,6 +1907,8 @@ static inline int skb_bond_should_drop(struct sk_buff *skb)
> 	}
> 	return 0;
> }
>+#endif
>+extern int skb_bond_should_drop(struct sk_buff *skb);
> 
> extern struct pernet_operations __net_initdata loopback_net_ops;
> #endif /* __KERNEL__ */
>diff --git a/net/core/dev.c b/net/core/dev.c
>index 91d792d..ac5a37e 100644
>--- a/net/core/dev.c
>+++ b/net/core/dev.c
>@@ -2098,6 +2098,56 @@ static inline struct sk_buff *handle_macvlan(struct sk_buff *skb,
> #define handle_macvlan(skb, pt_prev, ret, orig_dev)	(skb)
> #endif
> 
>+#if defined(CONFIG_BONDING) || defined(CONFIG_BONDING_MODULE)
>+int (*bond_handle_frame_hook)(struct sk_buff *skb);
>+EXPORT_SYMBOL_GPL(bond_handle_frame_hook);
>+
>+/* On bonding slaves other than the currently active slave, suppress
>+ * duplicates except for 802.3ad ETH_P_SLOW, alb non-mcast/bcast, and
>+ * ARP on active-backup slaves with arp_validate enabled.
>+ */
>+int skb_bond_should_drop(struct sk_buff *skb)
>+{
>+	struct net_device *dev = skb->dev;
>+	struct net_device *master = dev->master;
>+
>+	if (master)
>+		return bond_handle_frame_hook(skb);

Maybe this hook can be called from netif_receive_skb() directly. You would safe
at least 2 dereferences, 1 if check. You would also need to add
"skb->dev->master &&" to if in __vlan_hwaccel_rx() and vlan_gro_common().
>+
>+	return 0;
>+
>+#if 0
>+		if (master->priv_flags & IFF_MASTER_ARPMON)
>+			dev->last_rx = jiffies;
>+
>+		if (dev->priv_flags & IFF_SLAVE_INACTIVE) {
>+			if ((dev->priv_flags & IFF_SLAVE_NEEDARP) &&
>+			    skb->protocol == __constant_htons(ETH_P_ARP))
>+				return 0;
>+
>+			if (master->priv_flags & IFF_MASTER_ALB) {
>+				if (skb->pkt_type != PACKET_BROADCAST &&
>+				    skb->pkt_type != PACKET_MULTICAST)
>+					return 0;
>+			}
>+			if (master->priv_flags & IFF_MASTER_8023AD &&
>+			    skb->protocol == __constant_htons(ETH_P_SLOW))
>+				return 0;
>+
>+			return 1;
>+		}
>+	}
>+	return 0;
>+#endif /* 0 */
>+}
>+#else
>+int skb_bond_should_drop(struct sk_buff *skb)
>+{
>+	return 0;
>+}
>+#endif
>+EXPORT_SYMBOL_GPL(skb_bond_should_drop);
>+
> #ifdef CONFIG_NET_CLS_ACT
> /* TODO: Maybe we should just force sch_ingress to be compiled in
>  * when CONFIG_NET_CLS_ACT is? otherwise some useless instructions
>
>
>---
>	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com
>--
>To unsubscribe from this list: send the line "unsubscribe netdev" in
>the body of a message to majordomo@vger.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jay Vosburgh April 23, 2009, 2:59 p.m. UTC | #3
Jiri Pirko <jpirko@redhat.com> wrote:

>Thu, Apr 23, 2009 at 03:12:29AM CEST, fubar@us.ibm.com wrote:
>>stefan novak <lms.brubaker@gmail.com> wrote:
>>
>>>so its a bug in kernel?
>>
>>	Yes.  I think the following patch should add support for
>>arp_validate over VLANs, could you test this?  This is still a work in
>>progress, so it's pretty rough around the edges, but the core
>>functionality should be there.
>>
>>	This works by moving the skb_bond_should_drop logic around, and
>>replaces the current inline code with a hook into bonding to do all of
>>that, plus additional logic.  The reason for a hook is to get the
>>skb_bond_should_drop callers from the VLAN input path before the input
>>device is assigned to the VLAN device.
>>
>>	-J
>>
>>diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>>index 99610f3..cc367a3 100644
>>--- a/drivers/net/bonding/bond_main.c
>>+++ b/drivers/net/bonding/bond_main.c
>>@@ -1607,6 +1607,7 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
>> 	read_lock(&bond->lock);
>> 
>> 	new_slave->last_arp_rx = jiffies;
>>+	bond_fix_slave_validate_flag(bond, new_slave);
>> 
>> 	if (bond->params.miimon && !bond->params.use_carrier) {
>> 		link_reporting = bond_check_dev_link(bond, slave_dev, 1);
>>@@ -2538,8 +2539,8 @@ static void bond_arp_send(struct net_device *slave_dev, int arp_op, __be32 dest_
>> {
>> 	struct sk_buff *skb;
>> 
>>-	pr_debug("arp %d on slave %s: dst %x src %x vid %d\n", arp_op,
>>-	       slave_dev->name, dest_ip, src_ip, vlan_id);
>>+	pr_debug("arp %d on slave %s: dst %x src %x vid %d j %lu\n", arp_op,
>>+		 slave_dev->name, dest_ip, src_ip, vlan_id, jiffies);
>> 	       
>> 	skb = arp_create(arp_op, ETH_P_ARP, dest_ip, slave_dev, src_ip,
>> 			 NULL, slave_dev->dev_addr, NULL);
>>@@ -2679,8 +2680,6 @@ static void bond_validate_arp(struct bonding *bond, struct slave *slave, __be32
>> 
>> 	targets = bond->params.arp_targets;
>> 	for (i = 0; (i < BOND_MAX_ARP_TARGETS) && targets[i]; i++) {
>>-		pr_debug("bva: sip %pI4 tip %pI4 t[%d] %pI4 bhti(tip) %d\n",
>>-			&sip, &tip, i, &targets[i], bond_has_this_ip(bond, tip));
>> 		if (sip == targets[i]) {
>> 			if (bond_has_this_ip(bond, tip))
>> 				slave->last_arp_rx = jiffies;
>>@@ -2689,35 +2688,36 @@ static void bond_validate_arp(struct bonding *bond, struct slave *slave, __be32
>> 	}
>> }
>> 
>>-static int bond_arp_rcv(struct sk_buff *skb, struct net_device *dev, struct packet_type *pt, struct net_device *orig_dev)
>>+static void bond_handle_arp(struct sk_buff *skb, struct net_device *dev)
>> {
>> 	struct arphdr *arp;
>> 	struct slave *slave;
>>+	struct net_device *bond_dev = dev->master;
>> 	struct bonding *bond;
>> 	unsigned char *arp_ptr;
>> 	__be32 sip, tip;
>> 
>> 	if (dev_net(dev) != &init_net)
>>-		goto out;
>>+		return;
>> 
>>-	if (!(dev->priv_flags & IFF_BONDING) || !(dev->flags & IFF_MASTER))
>>-		goto out;
>>+	bond = netdev_priv(bond_dev);
>> 
>>-	bond = netdev_priv(dev);
>> 	read_lock(&bond->lock);
>> 
>>-	pr_debug("bond_arp_rcv: bond %s skb->dev %s orig_dev %s\n",
>>-		bond->dev->name, skb->dev ? skb->dev->name : "NULL",
>>-		orig_dev ? orig_dev->name : "NULL");
>>-
>>-	slave = bond_get_slave_by_dev(bond, orig_dev);
>>+	slave = bond_get_slave_by_dev(bond, dev);
>> 	if (!slave || !slave_do_arp_validate(bond, slave))
>> 		goto out_unlock;
>> 
>> 	if (!pskb_may_pull(skb, arp_hdr_len(dev)))
>> 		goto out_unlock;
>> 
>>-	arp = arp_hdr(skb);
>>+	/* Can't use arp_hdr; it's not initialized yet. */
>>+	arp = (struct arphdr *)skb->data;
>>+
>>+	pr_debug("arp: hln %d p_t %d hrd %x pro %x pln %d\n",
>>+	       arp->ar_hln, skb->pkt_type, ntohs(arp->ar_hrd),
>>+	       ntohs(arp->ar_pro), arp->ar_pln);
>>+
>> 	if (arp->ar_hln != dev->addr_len ||
>> 	    skb->pkt_type == PACKET_OTHERHOST ||
>> 	    skb->pkt_type == PACKET_LOOPBACK ||
>>@@ -2732,29 +2732,67 @@ static int bond_arp_rcv(struct sk_buff *skb, struct net_device *dev, struct pack
>> 	arp_ptr += 4 + dev->addr_len;
>> 	memcpy(&tip, arp_ptr, 4);
>> 
>>-	pr_debug("bond_arp_rcv: %s %s/%d av %d sv %d sip %pI4 tip %pI4\n",
>>-		bond->dev->name, slave->dev->name, slave->state,
>>-		bond->params.arp_validate, slave_do_arp_validate(bond, slave),
>>-		&sip, &tip);
>>-
>>-	/*
>>-	 * Backup slaves won't see the ARP reply, but do come through
>>-	 * here for each ARP probe (so we swap the sip/tip to validate
>>-	 * the probe).  In a "redundant switch, common router" type of
>>-	 * configuration, the ARP probe will (hopefully) travel from
>>-	 * the active, through one switch, the router, then the other
>>-	 * switch before reaching the backup.
>>-	 */
>>-	if (slave->state == BOND_STATE_ACTIVE)
>>+	switch (ntohs(arp->ar_op)) {
>>+	case ARPOP_REPLY:
>> 		bond_validate_arp(bond, slave, sip, tip);
>>-	else
>>+		break;
>>+	case ARPOP_REQUEST:
>> 		bond_validate_arp(bond, slave, tip, sip);
>>+		break;
>>+	}
>> 
>> out_unlock:
>> 	read_unlock(&bond->lock);
>>-out:
>>-	dev_kfree_skb(skb);
>>-	return NET_RX_SUCCESS;
>>+}
>>+
>>+/*
>>+ * Called by core packet processing in netif_receive_skb and in VLAN fast
>>+ * path to (a) determine if packet should be dropped, and (b) to perform
>>+ * ARP monitor processing (last_rx, validation).
>>+ *
>>+ * For the VLAN case, called before the skb->dev is reassigned to the
>>+ * VLAN.
>>+ *
>>+ * Returns 1 if frame should nominally be dropped; 0 if it should be kept.
>>+ *
>>+ * We want to keep:
>>+ * - all traffic on active slaves
>>+ * - some traffic on inactive slaves: non-broadcast and non-multicast in
>>+ * ALB/TLB mode and LACP frames in 802.3ad mode.
>>+ *
>>+ * ARP frames are handled as normal traffic; the ARP monitor handling
>>+ * takes place here, so they need not be kept on inactive slaves.
>>+ */
>>+static int bond_handle_frame(struct sk_buff *skb)
>>+{
>>+	struct net_device *dev;
>>+	struct net_device *master;
>>+
>>+	dev = skb->dev;
>>+	master = dev->master;
>>+	if (!master || !master->priv_flags & IFF_BONDING)
>>+		return 0;
>>+
>>+	if (master->priv_flags & IFF_MASTER_ARPMON) {
>>+		dev->last_rx = jiffies;
>>+		if ((dev->priv_flags & IFF_SLAVE_NEEDARP) &&
>>+		    skb->protocol == __constant_htons(ETH_P_ARP))
>>+			bond_handle_arp(skb, dev);
>>+	}
>>+
>>+	if (dev->priv_flags & IFF_SLAVE_INACTIVE) {
>>+		if (master->priv_flags & IFF_MASTER_ALB) {
>>+			if (skb->pkt_type != PACKET_BROADCAST &&
>>+			    skb->pkt_type != PACKET_MULTICAST)
>>+				return 0;
>>+		}
>>+		if (master->priv_flags & IFF_MASTER_8023AD &&
>>+		    skb->protocol == __constant_htons(ETH_P_SLOW))
>>+			return 0;
>>+
>>+		return 1;
>>+	}
>>+	return 0;
>> }
>> 
>> /*
>>@@ -3688,6 +3726,7 @@ static void bond_unregister_lacpdu(struct bonding *bond)
>> 
>> void bond_register_arp(struct bonding *bond)
>> {
>>+#if 0
>> 	struct packet_type *pt = &bond->arp_mon_pt;
>> 
>> 	if (pt->type)
>>@@ -3697,14 +3736,17 @@ void bond_register_arp(struct bonding *bond)
>> 	pt->dev = bond->dev;
>> 	pt->func = bond_arp_rcv;
>> 	dev_add_pack(pt);
>>+#endif
>> }
>> 
>> void bond_unregister_arp(struct bonding *bond)
>> {
>>+#if 0
>> 	struct packet_type *pt = &bond->arp_mon_pt;
>> 
>> 	dev_remove_pack(pt);
>> 	pt->type = 0;
>>+#endif
>> }
>> 
>> /*---------------------------- Hashing Policies -----------------------------*/
>>@@ -5188,6 +5230,8 @@ out_rtnl:
>> 	return res;
>> }
>> 
>>+extern int (*bond_handle_frame_hook)(struct sk_buff *skb);
>>+
>> static int __init bonding_init(void)
>> {
>> 	int i;
>>@@ -5221,6 +5265,8 @@ static int __init bonding_init(void)
>> 	register_inetaddr_notifier(&bond_inetaddr_notifier);
>> 	bond_register_ipv6_notifier();
>> 
>>+	bond_handle_frame_hook = bond_handle_frame;
>>+
>> 	goto out;
>> err:
>> 	list_for_each_entry(bond, &bond_dev_list, bond_list) {
>>@@ -5249,6 +5295,9 @@ static void __exit bonding_exit(void)
>> 	rtnl_lock();
>> 	bond_free_all();
>> 	rtnl_unlock();
>>+
>>+	bond_handle_frame_hook = NULL;
>>+
>> }
>> 
>> module_init(bonding_init);
>>diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
>>index 18cf478..0e9f0e9 100644
>>--- a/drivers/net/bonding/bond_sysfs.c
>>+++ b/drivers/net/bonding/bond_sysfs.c
>>@@ -484,8 +484,9 @@ static ssize_t bonding_store_arp_validate(struct device *d,
>> 					  struct device_attribute *attr,
>> 					  const char *buf, size_t count)
>> {
>>-	int new_value;
>>+	int new_value, i;
>> 	struct bonding *bond = to_bond(d);
>>+	struct slave *slave;
>> 
>> 	new_value = bond_parse_parm(buf, arp_validate_tbl);
>> 	if (new_value < 0) {
>>@@ -504,13 +505,12 @@ static ssize_t bonding_store_arp_validate(struct device *d,
>> 	       bond->dev->name, arp_validate_tbl[new_value].modename,
>> 	       new_value);
>> 
>>-	if (!bond->params.arp_validate && new_value) {
>>-		bond_register_arp(bond);
>>-	} else if (bond->params.arp_validate && !new_value) {
>>-		bond_unregister_arp(bond);
>>-	}
>>+	if (bond->params.arp_validate != new_value) {
>>+		bond->params.arp_validate = new_value;
>> 
>>-	bond->params.arp_validate = new_value;
>>+		bond_for_each_slave(bond, slave, i)
>>+			bond_fix_slave_validate_flag(bond, slave);
>>+	}
>> 
>> 	return count;
>> }
>>diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h
>>index ca849d2..275f08d 100644
>>--- a/drivers/net/bonding/bonding.h
>>+++ b/drivers/net/bonding/bonding.h
>>@@ -283,6 +283,15 @@ static inline unsigned long slave_last_rx(struct bonding *bond,
>> 	return slave->dev->last_rx;
>> }
>> 
>>+static inline void bond_fix_slave_validate_flag(struct bonding *bond,
>>+					       struct slave *slave)
>>+{
>>+	if (slave_do_arp_validate(bond, slave))
>>+		slave->dev->priv_flags |= IFF_SLAVE_NEEDARP;
>>+	else
>>+		slave->dev->priv_flags &= ~IFF_SLAVE_NEEDARP;
>>+}
>>+
>> static inline void bond_set_slave_inactive_flags(struct slave *slave)
>> {
>> 	struct bonding *bond = netdev_priv(slave->dev->master);
>>@@ -290,14 +299,16 @@ static inline void bond_set_slave_inactive_flags(struct slave *slave)
>> 	    bond->params.mode != BOND_MODE_ALB)
>> 		slave->state = BOND_STATE_BACKUP;
>> 	slave->dev->priv_flags |= IFF_SLAVE_INACTIVE;
>>-	if (slave_do_arp_validate(bond, slave))
>>-		slave->dev->priv_flags |= IFF_SLAVE_NEEDARP;
>>+	bond_fix_slave_validate_flag(bond, slave);
>> }
>> 
>> static inline void bond_set_slave_active_flags(struct slave *slave)
>> {
>>+	struct bonding *bond = netdev_priv(slave->dev->master);
>>+
>> 	slave->state = BOND_STATE_ACTIVE;
>>-	slave->dev->priv_flags &= ~(IFF_SLAVE_INACTIVE | IFF_SLAVE_NEEDARP);
>>+	slave->dev->priv_flags &= ~IFF_SLAVE_INACTIVE;
>>+	bond_fix_slave_validate_flag(bond, slave);
>> }
>> 
>> static inline void bond_set_master_3ad_flags(struct bonding *bond)
>>diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>>index 2e7783f..3dafd8f 100644
>>--- a/include/linux/netdevice.h
>>+++ b/include/linux/netdevice.h
>>@@ -1874,6 +1874,7 @@ static inline void netif_set_gso_max_size(struct net_device *dev,
>> 	dev->gso_max_size = size;
>> }
>> 
>>+#if 0
>> /* On bonding slaves other than the currently active slave, suppress
>>  * duplicates except for 802.3ad ETH_P_SLOW, alb non-mcast/bcast, and
>>  * ARP on active-backup slaves with arp_validate enabled.
>>@@ -1906,6 +1907,8 @@ static inline int skb_bond_should_drop(struct sk_buff *skb)
>> 	}
>> 	return 0;
>> }
>>+#endif
>>+extern int skb_bond_should_drop(struct sk_buff *skb);
>> 
>> extern struct pernet_operations __net_initdata loopback_net_ops;
>> #endif /* __KERNEL__ */
>>diff --git a/net/core/dev.c b/net/core/dev.c
>>index 91d792d..ac5a37e 100644
>>--- a/net/core/dev.c
>>+++ b/net/core/dev.c
>>@@ -2098,6 +2098,56 @@ static inline struct sk_buff *handle_macvlan(struct sk_buff *skb,
>> #define handle_macvlan(skb, pt_prev, ret, orig_dev)	(skb)
>> #endif
>> 
>>+#if defined(CONFIG_BONDING) || defined(CONFIG_BONDING_MODULE)
>>+int (*bond_handle_frame_hook)(struct sk_buff *skb);
>>+EXPORT_SYMBOL_GPL(bond_handle_frame_hook);
>>+
>>+/* On bonding slaves other than the currently active slave, suppress
>>+ * duplicates except for 802.3ad ETH_P_SLOW, alb non-mcast/bcast, and
>>+ * ARP on active-backup slaves with arp_validate enabled.
>>+ */
>>+int skb_bond_should_drop(struct sk_buff *skb)
>>+{
>>+	struct net_device *dev = skb->dev;
>>+	struct net_device *master = dev->master;
>>+
>>+	if (master)
>>+		return bond_handle_frame_hook(skb);
>
>Maybe this hook can be called from netif_receive_skb() directly. You would safe
>at least 2 dereferences, 1 if check. You would also need to add
>"skb->dev->master &&" to if in __vlan_hwaccel_rx() and vlan_gro_common().

	This won't work, because the VLAN code reassigns skb->dev to the
VLAN device before calling netif_receive_skb.

	In the VLAN path, the second call to skb_bond_should_drop (the
first is within the VLAN code itself, __vlan_hwaccel_rx or
vlan_gro_common, the second is netif_receive_skb) won't call the hook,
because the VLAN device has no dev->master.

	This is the whole reason for the hook: to process the ARP before
VLAN reassigns skb->dev.  Once that happens, the actual device the ARP
arrived on is lost.

	-J


>>+
>>+	return 0;
>>+
>>+#if 0
>>+		if (master->priv_flags & IFF_MASTER_ARPMON)
>>+			dev->last_rx = jiffies;
>>+
>>+		if (dev->priv_flags & IFF_SLAVE_INACTIVE) {
>>+			if ((dev->priv_flags & IFF_SLAVE_NEEDARP) &&
>>+			    skb->protocol == __constant_htons(ETH_P_ARP))
>>+				return 0;
>>+
>>+			if (master->priv_flags & IFF_MASTER_ALB) {
>>+				if (skb->pkt_type != PACKET_BROADCAST &&
>>+				    skb->pkt_type != PACKET_MULTICAST)
>>+					return 0;
>>+			}
>>+			if (master->priv_flags & IFF_MASTER_8023AD &&
>>+			    skb->protocol == __constant_htons(ETH_P_SLOW))
>>+				return 0;
>>+
>>+			return 1;
>>+		}
>>+	}
>>+	return 0;
>>+#endif /* 0 */
>>+}
>>+#else
>>+int skb_bond_should_drop(struct sk_buff *skb)
>>+{
>>+	return 0;
>>+}
>>+#endif
>>+EXPORT_SYMBOL_GPL(skb_bond_should_drop);
>>+
>> #ifdef CONFIG_NET_CLS_ACT
>> /* TODO: Maybe we should just force sch_ingress to be compiled in
>>  * when CONFIG_NET_CLS_ACT is? otherwise some useless instructions
>>
>>
>>---
>>	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com
>>--
>>To unsubscribe from this list: send the line "unsubscribe netdev" in
>>the body of a message to majordomo@vger.kernel.org
>>More majordomo info at  http://vger.kernel.org/majordomo-info.html
>--
>To unsubscribe from this list: send the line "unsubscribe netdev" in
>the body of a message to majordomo@vger.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jay Vosburgh April 23, 2009, 3:38 p.m. UTC | #4
Eric Dumazet <dada1@cosmosbay.com> wrote:

>Jay Vosburgh a écrit :
>> stefan novak <lms.brubaker@gmail.com> wrote:
>> 
>>> so its a bug in kernel?
>> 
>> 	Yes.  I think the following patch should add support for
>> arp_validate over VLANs, could you test this?  This is still a work in
>> progress, so it's pretty rough around the edges, but the core
>> functionality should be there.
>> 
>> 	This works by moving the skb_bond_should_drop logic around, and
>> replaces the current inline code with a hook into bonding to do all of
>> that, plus additional logic.  The reason for a hook is to get the
>> skb_bond_should_drop callers from the VLAN input path before the input
>> device is assigned to the VLAN device.
>> 
>> 	-J
>
>Hi Jay
>
>I wanted to test your patch and setup such VLAN config, and not yet applied your patch.
>
># cat /proc/net/bonding/bond1
>Ethernet Channel Bonding Driver: v3.5.0 (November 4, 2008)
>
>Bonding Mode: fault-tolerance (active-backup)
>Primary Slave: None
>Currently Active Slave: eth2
>MII Status: up
>MII Polling Interval (ms): 0
>Up Delay (ms): 0
>Down Delay (ms): 0
>ARP Polling Interval (ms): 250
>ARP IP target/s (n.n.n.n form): 192.168.20.254
>
>Slave Interface: eth1
>MII Status: up
>Link Failure Count: 8
>Permanent HW addr: 00:1e:0b:ec:d3:d2
>
>Slave Interface: eth2
>MII Status: up
>Link Failure Count: 11
>Permanent HW addr: 00:1e:0b:92:78:50
>
># grep . /sys/class/net/bond1/bonding/*
>/sys/class/net/bond1/bonding/active_slave:eth2
>/sys/class/net/bond1/bonding/ad_select:stable 0
>/sys/class/net/bond1/bonding/arp_interval:250
>/sys/class/net/bond1/bonding/arp_ip_target:192.168.20.254
>/sys/class/net/bond1/bonding/arp_validate:none 0

	To test this patch, you'll need to set arp_validate to all.
Well, insuring that nothing breaks without arp_validate is good, too,
but the patch is supposed to make arp_validate function with an
arp_ip_target accessed via a VLAN.

>/sys/class/net/bond1/bonding/downdelay:0
>/sys/class/net/bond1/bonding/fail_over_mac:none 0
>/sys/class/net/bond1/bonding/lacp_rate:slow 0
>/sys/class/net/bond1/bonding/miimon:0
>/sys/class/net/bond1/bonding/mii_status:up
>/sys/class/net/bond1/bonding/mode:active-backup 1
>/sys/class/net/bond1/bonding/num_grat_arp:1
>/sys/class/net/bond1/bonding/num_unsol_na:1
>/sys/class/net/bond1/bonding/slaves:eth1 eth2
>/sys/class/net/bond1/bonding/updelay:0
>/sys/class/net/bond1/bonding/use_carrier:1
>/sys/class/net/bond1/bonding/xmit_hash_policy:layer2 0
>
>
>
>So active slave is eth2. Still I receive trafic on eth1, according to ifconfig :
># ifconfig eth1|grep packets;sleep 10;ifconfig eth1|grep packets
>          RX packets:2098122 errors:0 dropped:0 overruns:0 frame:0
>          TX packets:2053085 errors:0 dropped:0 overruns:0 carrier:0
>          RX packets:2098162 errors:0 dropped:0 overruns:0 frame:0
>          TX packets:2053085 errors:0 dropped:0 overruns:0 carrier:0
>
>exactly 4 packets per second.

	This is expected, these are the broadcast ARP_REQUEST packets
the bond issues as probes, every 250 ms as specified by your
arp_interval.  Any broadcasts on the switch or other traffic flooded to
all ports will come into the slave device, and (for the active-backup
inactive slave case) all of it is thrown away.  There's some trickery in
netif_receive_skb for sockets that bind directly to the slave, but that
doesn't seem to affect tcpdump.

># ifconfig eth2|grep packets;sleep 10;ifconfig eth2|grep packets
>          RX packets:3695512 errors:0 dropped:0 overruns:0 frame:0
>          TX packets:3683799 errors:0 dropped:0 overruns:0 carrier:0
>          RX packets:3695554 errors:0 dropped:0 overruns:0 frame:0
>          TX packets:3683841 errors:0 dropped:0 overruns:0 carrier:0
>
>I also receive arp answers on eth2 (quite normal)
>
>I wanted to tcpdump on eth1 but got nothing :
>
># tcpdump -p -n -s 0 -i eth1
>tcpdump: WARNING: eth1: no IPv4 address assigned
>tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
>listening on eth1, link-type EN10MB (Ethernet), capture size 65535 bytes

	This is normal, as incoming traffic is assigned to the master
before the packet capture gets a look.

># tcpdump -p -n -s 0 -i eth2
>tcpdump: WARNING: eth2: no IPv4 address assigned
>tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
>listening on eth2, link-type EN10MB (Ethernet), capture size 65535 bytes
>07:54:48.430982 arp who-has 192.168.20.254 tell 192.168.20.110
>07:54:48.680980 arp who-has 192.168.20.254 tell 192.168.20.110
>07:54:48.930981 arp who-has 192.168.20.254 tell 192.168.20.110
>07:54:49.180980 arp who-has 192.168.20.254 tell 192.168.20.110
>07:54:49.430980 arp who-has 192.168.20.254 tell 192.168.20.110
>07:54:49.680980 arp who-has 192.168.20.254 tell 192.168.20.110

	This is also normal (seeing only outbound traffic) for the same
reason as above.

># tcpdump -p -n -s 0 -i bond1
>tcpdump: WARNING: bond1: no IPv4 address assigned
>tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
>listening on bond1, link-type EN10MB (Ethernet), capture size 65535 bytes
>07:55:28.681491 arp reply 192.168.20.254 is-at 00:00:0c:07:ac:01
>07:55:28.931493 arp reply 192.168.20.254 is-at 00:00:0c:07:ac:01
>07:55:29.181466 arp reply 192.168.20.254 is-at 00:00:0c:07:ac:01
>07:55:29.431496 arp reply 192.168.20.254 is-at 00:00:0c:07:ac:01
>07:55:29.681487 arp reply 192.168.20.254 is-at 00:00:0c:07:ac:01
>07:55:29.931492 arp reply 192.168.20.254 is-at 00:00:0c:07:ac:01
>
># tcpdump -p -n -s 0 -i bond1.103
>tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
>listening on bond1.103, link-type EN10MB (Ethernet), capture size 65535 bytes
>07:55:58.681521 arp reply 192.168.20.254 is-at 00:00:0c:07:ac:01
>07:55:58.931636 arp reply 192.168.20.254 is-at 00:00:0c:07:ac:01
>07:55:59.181495 arp reply 192.168.20.254 is-at 00:00:0c:07:ac:01
>07:55:59.431496 arp reply 192.168.20.254 is-at 00:00:0c:07:ac:01
>07:55:59.681499 arp reply 192.168.20.254 is-at 00:00:0c:07:ac:01
>07:55:59.931499 arp reply 192.168.20.254 is-at 00:00:0c:07:ac:01
>
>Configuration script is 
>
>ip link set eth1 up
>ip link set eth2 up
>
>ip addr flush dev eth1
>ip addr flush dev eth2
>
>ip link set eth1 up
>ip link set eth2 up
>
>modprobe bond1
>
>ifconfig bond1 down
>
># test du arp_check toutes les 250ms, en choissant l'HSRP du vlan 103 comme IP
>echo +192.168.20.254 >/sys/class/net/bond1/bonding/arp_ip_target
>echo 250 >/sys/class/net/bond1/bonding/arp_interval
>
># mode actif/passif (active-backup)
>echo 1 >/sys/class/net/bond1/bonding/mode
>
>ifconfig bond1 up
>
>ifenslave bond1 eth1 eth2
>
>ip link set eth1 up
>ip link set eth2 up
>
>ip link add link bond1 bond1.103 txqueuelen 100 type vlan id 103
>ip addr add 192.168.20.110/24 dev bond1.103
>ip link set bond1.103 up
>
>ip link add link bond1 bond1.825 txqueuelen 1000 type vlan id 825
>ip addr add 10.170.73.123/25 dev bond1.825
>ip link set bond1.825 up
>
>
>
>Is this setup OK to test your patch ?

	I believe so, provided you enable arp_validate as mentioned
above.

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 99610f3..cc367a3 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1607,6 +1607,7 @@  int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
 	read_lock(&bond->lock);
 
 	new_slave->last_arp_rx = jiffies;
+	bond_fix_slave_validate_flag(bond, new_slave);
 
 	if (bond->params.miimon && !bond->params.use_carrier) {
 		link_reporting = bond_check_dev_link(bond, slave_dev, 1);
@@ -2538,8 +2539,8 @@  static void bond_arp_send(struct net_device *slave_dev, int arp_op, __be32 dest_
 {
 	struct sk_buff *skb;
 
-	pr_debug("arp %d on slave %s: dst %x src %x vid %d\n", arp_op,
-	       slave_dev->name, dest_ip, src_ip, vlan_id);
+	pr_debug("arp %d on slave %s: dst %x src %x vid %d j %lu\n", arp_op,
+		 slave_dev->name, dest_ip, src_ip, vlan_id, jiffies);
 	       
 	skb = arp_create(arp_op, ETH_P_ARP, dest_ip, slave_dev, src_ip,
 			 NULL, slave_dev->dev_addr, NULL);
@@ -2679,8 +2680,6 @@  static void bond_validate_arp(struct bonding *bond, struct slave *slave, __be32
 
 	targets = bond->params.arp_targets;
 	for (i = 0; (i < BOND_MAX_ARP_TARGETS) && targets[i]; i++) {
-		pr_debug("bva: sip %pI4 tip %pI4 t[%d] %pI4 bhti(tip) %d\n",
-			&sip, &tip, i, &targets[i], bond_has_this_ip(bond, tip));
 		if (sip == targets[i]) {
 			if (bond_has_this_ip(bond, tip))
 				slave->last_arp_rx = jiffies;
@@ -2689,35 +2688,36 @@  static void bond_validate_arp(struct bonding *bond, struct slave *slave, __be32
 	}
 }
 
-static int bond_arp_rcv(struct sk_buff *skb, struct net_device *dev, struct packet_type *pt, struct net_device *orig_dev)
+static void bond_handle_arp(struct sk_buff *skb, struct net_device *dev)
 {
 	struct arphdr *arp;
 	struct slave *slave;
+	struct net_device *bond_dev = dev->master;
 	struct bonding *bond;
 	unsigned char *arp_ptr;
 	__be32 sip, tip;
 
 	if (dev_net(dev) != &init_net)
-		goto out;
+		return;
 
-	if (!(dev->priv_flags & IFF_BONDING) || !(dev->flags & IFF_MASTER))
-		goto out;
+	bond = netdev_priv(bond_dev);
 
-	bond = netdev_priv(dev);
 	read_lock(&bond->lock);
 
-	pr_debug("bond_arp_rcv: bond %s skb->dev %s orig_dev %s\n",
-		bond->dev->name, skb->dev ? skb->dev->name : "NULL",
-		orig_dev ? orig_dev->name : "NULL");
-
-	slave = bond_get_slave_by_dev(bond, orig_dev);
+	slave = bond_get_slave_by_dev(bond, dev);
 	if (!slave || !slave_do_arp_validate(bond, slave))
 		goto out_unlock;
 
 	if (!pskb_may_pull(skb, arp_hdr_len(dev)))
 		goto out_unlock;
 
-	arp = arp_hdr(skb);
+	/* Can't use arp_hdr; it's not initialized yet. */
+	arp = (struct arphdr *)skb->data;
+
+	pr_debug("arp: hln %d p_t %d hrd %x pro %x pln %d\n",
+	       arp->ar_hln, skb->pkt_type, ntohs(arp->ar_hrd),
+	       ntohs(arp->ar_pro), arp->ar_pln);
+
 	if (arp->ar_hln != dev->addr_len ||
 	    skb->pkt_type == PACKET_OTHERHOST ||
 	    skb->pkt_type == PACKET_LOOPBACK ||
@@ -2732,29 +2732,67 @@  static int bond_arp_rcv(struct sk_buff *skb, struct net_device *dev, struct pack
 	arp_ptr += 4 + dev->addr_len;
 	memcpy(&tip, arp_ptr, 4);
 
-	pr_debug("bond_arp_rcv: %s %s/%d av %d sv %d sip %pI4 tip %pI4\n",
-		bond->dev->name, slave->dev->name, slave->state,
-		bond->params.arp_validate, slave_do_arp_validate(bond, slave),
-		&sip, &tip);
-
-	/*
-	 * Backup slaves won't see the ARP reply, but do come through
-	 * here for each ARP probe (so we swap the sip/tip to validate
-	 * the probe).  In a "redundant switch, common router" type of
-	 * configuration, the ARP probe will (hopefully) travel from
-	 * the active, through one switch, the router, then the other
-	 * switch before reaching the backup.
-	 */
-	if (slave->state == BOND_STATE_ACTIVE)
+	switch (ntohs(arp->ar_op)) {
+	case ARPOP_REPLY:
 		bond_validate_arp(bond, slave, sip, tip);
-	else
+		break;
+	case ARPOP_REQUEST:
 		bond_validate_arp(bond, slave, tip, sip);
+		break;
+	}
 
 out_unlock:
 	read_unlock(&bond->lock);
-out:
-	dev_kfree_skb(skb);
-	return NET_RX_SUCCESS;
+}
+
+/*
+ * Called by core packet processing in netif_receive_skb and in VLAN fast
+ * path to (a) determine if packet should be dropped, and (b) to perform
+ * ARP monitor processing (last_rx, validation).
+ *
+ * For the VLAN case, called before the skb->dev is reassigned to the
+ * VLAN.
+ *
+ * Returns 1 if frame should nominally be dropped; 0 if it should be kept.
+ *
+ * We want to keep:
+ * - all traffic on active slaves
+ * - some traffic on inactive slaves: non-broadcast and non-multicast in
+ * ALB/TLB mode and LACP frames in 802.3ad mode.
+ *
+ * ARP frames are handled as normal traffic; the ARP monitor handling
+ * takes place here, so they need not be kept on inactive slaves.
+ */
+static int bond_handle_frame(struct sk_buff *skb)
+{
+	struct net_device *dev;
+	struct net_device *master;
+
+	dev = skb->dev;
+	master = dev->master;
+	if (!master || !master->priv_flags & IFF_BONDING)
+		return 0;
+
+	if (master->priv_flags & IFF_MASTER_ARPMON) {
+		dev->last_rx = jiffies;
+		if ((dev->priv_flags & IFF_SLAVE_NEEDARP) &&
+		    skb->protocol == __constant_htons(ETH_P_ARP))
+			bond_handle_arp(skb, dev);
+	}
+
+	if (dev->priv_flags & IFF_SLAVE_INACTIVE) {
+		if (master->priv_flags & IFF_MASTER_ALB) {
+			if (skb->pkt_type != PACKET_BROADCAST &&
+			    skb->pkt_type != PACKET_MULTICAST)
+				return 0;
+		}
+		if (master->priv_flags & IFF_MASTER_8023AD &&
+		    skb->protocol == __constant_htons(ETH_P_SLOW))
+			return 0;
+
+		return 1;
+	}
+	return 0;
 }
 
 /*
@@ -3688,6 +3726,7 @@  static void bond_unregister_lacpdu(struct bonding *bond)
 
 void bond_register_arp(struct bonding *bond)
 {
+#if 0
 	struct packet_type *pt = &bond->arp_mon_pt;
 
 	if (pt->type)
@@ -3697,14 +3736,17 @@  void bond_register_arp(struct bonding *bond)
 	pt->dev = bond->dev;
 	pt->func = bond_arp_rcv;
 	dev_add_pack(pt);
+#endif
 }
 
 void bond_unregister_arp(struct bonding *bond)
 {
+#if 0
 	struct packet_type *pt = &bond->arp_mon_pt;
 
 	dev_remove_pack(pt);
 	pt->type = 0;
+#endif
 }
 
 /*---------------------------- Hashing Policies -----------------------------*/
@@ -5188,6 +5230,8 @@  out_rtnl:
 	return res;
 }
 
+extern int (*bond_handle_frame_hook)(struct sk_buff *skb);
+
 static int __init bonding_init(void)
 {
 	int i;
@@ -5221,6 +5265,8 @@  static int __init bonding_init(void)
 	register_inetaddr_notifier(&bond_inetaddr_notifier);
 	bond_register_ipv6_notifier();
 
+	bond_handle_frame_hook = bond_handle_frame;
+
 	goto out;
 err:
 	list_for_each_entry(bond, &bond_dev_list, bond_list) {
@@ -5249,6 +5295,9 @@  static void __exit bonding_exit(void)
 	rtnl_lock();
 	bond_free_all();
 	rtnl_unlock();
+
+	bond_handle_frame_hook = NULL;
+
 }
 
 module_init(bonding_init);
diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
index 18cf478..0e9f0e9 100644
--- a/drivers/net/bonding/bond_sysfs.c
+++ b/drivers/net/bonding/bond_sysfs.c
@@ -484,8 +484,9 @@  static ssize_t bonding_store_arp_validate(struct device *d,
 					  struct device_attribute *attr,
 					  const char *buf, size_t count)
 {
-	int new_value;
+	int new_value, i;
 	struct bonding *bond = to_bond(d);
+	struct slave *slave;
 
 	new_value = bond_parse_parm(buf, arp_validate_tbl);
 	if (new_value < 0) {
@@ -504,13 +505,12 @@  static ssize_t bonding_store_arp_validate(struct device *d,
 	       bond->dev->name, arp_validate_tbl[new_value].modename,
 	       new_value);
 
-	if (!bond->params.arp_validate && new_value) {
-		bond_register_arp(bond);
-	} else if (bond->params.arp_validate && !new_value) {
-		bond_unregister_arp(bond);
-	}
+	if (bond->params.arp_validate != new_value) {
+		bond->params.arp_validate = new_value;
 
-	bond->params.arp_validate = new_value;
+		bond_for_each_slave(bond, slave, i)
+			bond_fix_slave_validate_flag(bond, slave);
+	}
 
 	return count;
 }
diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h
index ca849d2..275f08d 100644
--- a/drivers/net/bonding/bonding.h
+++ b/drivers/net/bonding/bonding.h
@@ -283,6 +283,15 @@  static inline unsigned long slave_last_rx(struct bonding *bond,
 	return slave->dev->last_rx;
 }
 
+static inline void bond_fix_slave_validate_flag(struct bonding *bond,
+					       struct slave *slave)
+{
+	if (slave_do_arp_validate(bond, slave))
+		slave->dev->priv_flags |= IFF_SLAVE_NEEDARP;
+	else
+		slave->dev->priv_flags &= ~IFF_SLAVE_NEEDARP;
+}
+
 static inline void bond_set_slave_inactive_flags(struct slave *slave)
 {
 	struct bonding *bond = netdev_priv(slave->dev->master);
@@ -290,14 +299,16 @@  static inline void bond_set_slave_inactive_flags(struct slave *slave)
 	    bond->params.mode != BOND_MODE_ALB)
 		slave->state = BOND_STATE_BACKUP;
 	slave->dev->priv_flags |= IFF_SLAVE_INACTIVE;
-	if (slave_do_arp_validate(bond, slave))
-		slave->dev->priv_flags |= IFF_SLAVE_NEEDARP;
+	bond_fix_slave_validate_flag(bond, slave);
 }
 
 static inline void bond_set_slave_active_flags(struct slave *slave)
 {
+	struct bonding *bond = netdev_priv(slave->dev->master);
+
 	slave->state = BOND_STATE_ACTIVE;
-	slave->dev->priv_flags &= ~(IFF_SLAVE_INACTIVE | IFF_SLAVE_NEEDARP);
+	slave->dev->priv_flags &= ~IFF_SLAVE_INACTIVE;
+	bond_fix_slave_validate_flag(bond, slave);
 }
 
 static inline void bond_set_master_3ad_flags(struct bonding *bond)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 2e7783f..3dafd8f 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1874,6 +1874,7 @@  static inline void netif_set_gso_max_size(struct net_device *dev,
 	dev->gso_max_size = size;
 }
 
+#if 0
 /* On bonding slaves other than the currently active slave, suppress
  * duplicates except for 802.3ad ETH_P_SLOW, alb non-mcast/bcast, and
  * ARP on active-backup slaves with arp_validate enabled.
@@ -1906,6 +1907,8 @@  static inline int skb_bond_should_drop(struct sk_buff *skb)
 	}
 	return 0;
 }
+#endif
+extern int skb_bond_should_drop(struct sk_buff *skb);
 
 extern struct pernet_operations __net_initdata loopback_net_ops;
 #endif /* __KERNEL__ */
diff --git a/net/core/dev.c b/net/core/dev.c
index 91d792d..ac5a37e 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2098,6 +2098,56 @@  static inline struct sk_buff *handle_macvlan(struct sk_buff *skb,
 #define handle_macvlan(skb, pt_prev, ret, orig_dev)	(skb)
 #endif
 
+#if defined(CONFIG_BONDING) || defined(CONFIG_BONDING_MODULE)
+int (*bond_handle_frame_hook)(struct sk_buff *skb);
+EXPORT_SYMBOL_GPL(bond_handle_frame_hook);
+
+/* On bonding slaves other than the currently active slave, suppress
+ * duplicates except for 802.3ad ETH_P_SLOW, alb non-mcast/bcast, and
+ * ARP on active-backup slaves with arp_validate enabled.
+ */
+int skb_bond_should_drop(struct sk_buff *skb)
+{
+	struct net_device *dev = skb->dev;
+	struct net_device *master = dev->master;
+
+	if (master)
+		return bond_handle_frame_hook(skb);
+
+	return 0;
+
+#if 0
+		if (master->priv_flags & IFF_MASTER_ARPMON)
+			dev->last_rx = jiffies;
+
+		if (dev->priv_flags & IFF_SLAVE_INACTIVE) {
+			if ((dev->priv_flags & IFF_SLAVE_NEEDARP) &&
+			    skb->protocol == __constant_htons(ETH_P_ARP))
+				return 0;
+
+			if (master->priv_flags & IFF_MASTER_ALB) {
+				if (skb->pkt_type != PACKET_BROADCAST &&
+				    skb->pkt_type != PACKET_MULTICAST)
+					return 0;
+			}
+			if (master->priv_flags & IFF_MASTER_8023AD &&
+			    skb->protocol == __constant_htons(ETH_P_SLOW))
+				return 0;
+
+			return 1;
+		}
+	}
+	return 0;
+#endif /* 0 */
+}
+#else
+int skb_bond_should_drop(struct sk_buff *skb)
+{
+	return 0;
+}
+#endif
+EXPORT_SYMBOL_GPL(skb_bond_should_drop);
+
 #ifdef CONFIG_NET_CLS_ACT
 /* TODO: Maybe we should just force sch_ingress to be compiled in
  * when CONFIG_NET_CLS_ACT is? otherwise some useless instructions