Message ID | 20200816185244.GA31426@incl |
---|---|
State | Accepted |
Delegated to: | David Miller |
Headers | show |
Series | [net] bonding: fix active-backup failover for current ARP slave | expand |
From: Jiri Wiesner <jwiesner@suse.com> Date: Sun, 16 Aug 2020 20:52:44 +0200 > When the ARP monitor is used for link detection, ARP replies are > validated for all slaves (arp_validate=3) and fail_over_mac is set to > active, two slaves of an active-backup bond may get stuck in a state > where both of them are active and pass packets that they receive to > the bond. This state makes IPv6 duplicate address detection fail. The > state is reached thus: > 1. The current active slave goes down because the ARP target > is not reachable. > 2. The current ARP slave is chosen and made active. > 3. A new slave is enslaved. This new slave becomes the current active > slave and can reach the ARP target. > As a result, the current ARP slave stays active after the enslave > action has finished and the log is littered with "PROBE BAD" messages: >> bond0: PROBE: c_arp ens10 && cas ens11 BAD > The workaround is to remove the slave with "going back" status from > the bond and re-enslave it. This issue was encountered when DPDK PMD > interfaces were being enslaved to an active-backup bond. > > I would be possible to fix the issue in bond_enslave() or > bond_change_active_slave() but the ARP monitor was fixed instead to > keep most of the actions changing the current ARP slave in the ARP > monitor code. The current ARP slave is set as inactive and backup > during the commit phase. A new state, BOND_LINK_FAIL, has been > introduced for slaves in the context of the ARP monitor. This allows > administrators to see how slaves are rotated for sending ARP requests > and attempts are made to find a new active slave. > > Fixes: b2220cad583c9 ("bonding: refactor ARP active-backup monitor") > Signed-off-by: Jiri Wiesner <jwiesner@suse.com> Applied and queued up for -stable, thanks Jiri.
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index c853ca67058c..0ee59ea357f5 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -2945,6 +2945,9 @@ static int bond_ab_arp_inspect(struct bonding *bond) if (bond_time_in_interval(bond, last_rx, 1)) { bond_propose_link_state(slave, BOND_LINK_UP); commit++; + } else if (slave->link == BOND_LINK_BACK) { + bond_propose_link_state(slave, BOND_LINK_FAIL); + commit++; } continue; } @@ -3053,6 +3056,19 @@ static void bond_ab_arp_commit(struct bonding *bond) continue; + case BOND_LINK_FAIL: + bond_set_slave_link_state(slave, BOND_LINK_FAIL, + BOND_SLAVE_NOTIFY_NOW); + bond_set_slave_inactive_flags(slave, + BOND_SLAVE_NOTIFY_NOW); + + /* A slave has just been enslaved and has become + * the current active slave. + */ + if (rtnl_dereference(bond->curr_active_slave)) + RCU_INIT_POINTER(bond->current_arp_slave, NULL); + continue; + default: slave_err(bond->dev, slave->dev, "impossible: link_new_state %d on slave\n", @@ -3103,8 +3119,6 @@ static bool bond_ab_arp_probe(struct bonding *bond) return should_notify_rtnl; } - bond_set_slave_inactive_flags(curr_arp_slave, BOND_SLAVE_NOTIFY_LATER); - bond_for_each_slave_rcu(bond, slave, iter) { if (!found && !before && bond_slave_is_up(slave)) before = slave;