Message ID | 20120822174534.GA20260@midget.suse.cz |
---|---|
State | RFC, archived |
Delegated to: | David Miller |
On 08/22/2012 11:45 AM, Jiri Bohac wrote:

> This code is run from bond_activebackup_arp_mon() about
> delta_in_ticks jiffies after the previous ARP probe has been
> sent. If the delayed work gets executed exactly in delta_in_ticks
> jiffies, there is a chance the slave will be brought up. If the
> delayed work runs one jiffy later, the slave will stay down.

<snip>

> Should they perhaps all be increased by, say, delta_in_ticks/2, to make this
> less dependent on the current scheduling latencies?

We have been using a patch that tracks the arpmon requested sleep time vs
the actual sleep time and adds any scheduling latency to the allowed
delta. That way if we sleep too long due to scheduling latency it doesn't
affect the calculation.

Chris
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
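The patch Chris describes is not shown in this thread, so the following is only a minimal userspace sketch in C of the idea as stated: measure how late the monitor actually ran and widen the allowed window by that amount. Every name here (`in_range()`, `sched_latency()`, `rx_recent_compensated()`, `requested_wakeup`) is hypothetical and none of it is taken from the bonding driver; `in_range()` merely mirrors the signed-difference comparison that the kernel's `time_in_range()` performs.

```c
#include <stdbool.h>

/* Stand-in for the kernel's time_in_range(): true if lo <= x <= hi,
 * using signed differences so the result survives jiffies wraparound. */
static bool in_range(unsigned long x, unsigned long lo, unsigned long hi)
{
	return (long)(x - lo) >= 0 && (long)(hi - x) >= 0;
}

/* Hypothetical: number of jiffies by which the monitor overslept.
 * 'requested_wakeup' is the jiffies value at which the delayed work
 * was meant to run; returns 0 if it ran on time or early. */
static unsigned long sched_latency(unsigned long requested_wakeup,
				   unsigned long now)
{
	long late = (long)(now - requested_wakeup);

	return late > 0 ? (unsigned long)late : 0;
}

/* Hypothetical: the backup-slave receive check with the measured
 * scheduling latency added to the allowed delta. */
static bool rx_recent_compensated(unsigned long now,
				  unsigned long requested_wakeup,
				  unsigned long last_rx,
				  unsigned long delta)
{
	unsigned long allowed = delta + sched_latency(requested_wakeup, now);

	return in_range(now, last_rx - delta, last_rx + allowed);
}
```

With delta = 100, a reply seen at jiffy 1000 and a wakeup requested for jiffy 1100, the check passes whether the monitor runs at 1100 or three jiffies late at 1103, while a reply last seen at jiffy 900 is still rejected.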
Chris Friesen <chris.friesen@genband.com> wrote:

>On 08/22/2012 11:45 AM, Jiri Bohac wrote:
>
>> This code is run from bond_activebackup_arp_mon() about
>> delta_in_ticks jiffies after the previous ARP probe has been
>> sent. If the delayed work gets executed exactly in delta_in_ticks
>> jiffies, there is a chance the slave will be brought up. If the
>> delayed work runs one jiffy later, the slave will stay down.

	Presumably the ARP reply is coming back in less than one jiffy,
then, so the slave_last_rx() value is the same jiffy as when the
_inspect was previously called?

><snip>
>
>> Should they perhaps all be increased by, say, delta_in_ticks/2, to make this
>> less dependent on the current scheduling latencies?
>
>We have been using a patch that tracks the arpmon requested sleep time vs
>the actual sleep time and adds any scheduling latency to the allowed
>delta. That way if we sleep too long due to scheduling latency it doesn't
>affect the calculation.

	How much scheduling latency do you see?

	Is that really better than just permitting a bit more slack in
the timing window?

	As to the 2 * delta and 3 * delta calculations, these values
predate my involvement with bonding, so I'm not entirely sure why those
specific values were chosen (there are no log messages from that era
that I'm aware of).

	My presumption has been that this part:

		/*
		 * Active slave is down if:
		 * - more than 2*delta since transmitting OR
		 * - (more than 2*delta since receive AND
		 *    the bond has an IP address)
		 */
		trans_start = dev_trans_start(slave->dev);
		if (bond_is_active_slave(slave) &&
		    (!time_in_range(jiffies,
			trans_start - delta_in_ticks,
			trans_start + 2 * delta_in_ticks) ||
		     !time_in_range(jiffies,
			slave_last_rx(bond, slave) - delta_in_ticks,
			slave_last_rx(bond, slave) + 2 * delta_in_ticks))) {

			slave->new_link = BOND_LINK_DOWN;
			commit++;
		}

was structured this way (allowing 2 * delta) to permit the loss of a
single ARP on an otherwise idle interface without triggering a link
down.

	My guess, though, is that until relatively recently the timing
window was not too tight, and there was effectively some slack in the
calculation, because the slave_last_rx() would be set to some small
number of jiffies after the last execution of the monitor, and so the
"slave_last_rx() + delta_in_ticks" wasn't as narrow a window as it
appears to be now.

	So, without having tested this myself, based on the above, I
don't see that adding some slack would be a problem.

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com
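To make the 2 * delta reasoning above concrete, here is a minimal userspace sketch in C. It is not the driver's code: `in_range()` is a stand-in that mirrors the wraparound-safe signed-difference comparison done by the kernel's `time_in_range()`, and `active_rx_ok()` is a hypothetical helper modelling only the receive half of the active-slave check quoted above. The tick values used in the note below assume, for illustration, that probes go out exactly every `delta` jiffies.

```c
#include <stdbool.h>

/* Stand-in for the kernel's time_in_range(): true if lo <= x <= hi,
 * using signed differences so the result survives jiffies wraparound. */
static bool in_range(unsigned long x, unsigned long lo, unsigned long hi)
{
	return (long)(x - lo) >= 0 && (long)(hi - x) >= 0;
}

/* Hypothetical helper: true while the last receive on the active
 * slave is no more than 2 * delta jiffies old. */
static bool active_rx_ok(unsigned long now, unsigned long last_rx,
			 unsigned long delta)
{
	return in_range(now, last_rx - delta, last_rx + 2 * delta);
}
```

With delta = 100 and the last reply seen at jiffy 1000, the check passes at jiffy 1100 (the next monitor run) and still passes at jiffy 1200 (one ARP reply lost), but fails at jiffy 1300 (two replies lost), at which point the slave would be marked down.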
On 08/22/2012 12:42 PM, Jay Vosburgh wrote:
> Chris Friesen <chris.friesen@genband.com> wrote:
>
>> On 08/22/2012 11:45 AM, Jiri Bohac wrote:
>>
>>> This code is run from bond_activebackup_arp_mon() about
>>> delta_in_ticks jiffies after the previous ARP probe has been
>>> sent. If the delayed work gets executed exactly in delta_in_ticks
>>> jiffies, there is a chance the slave will be brought up. If the
>>> delayed work runs one jiffy later, the slave will stay down.
>
> 	Presumably the ARP reply is coming back in less than one jiffy,
> then, so the slave_last_rx() value is the same jiffy as when the
> _inspect was previously called?
>
>> <snip>
>>
>>> Should they perhaps all be increased by, say, delta_in_ticks/2, to make this
>>> less dependent on the current scheduling latencies?
>>
>> We have been using a patch that tracks the arpmon requested sleep time vs
>> the actual sleep time and adds any scheduling latency to the allowed
>> delta. That way if we sleep too long due to scheduling latency it doesn't
>> affect the calculation.
>
> 	How much scheduling latency do you see?
>
> 	Is that really better than just permitting a bit more slack in
> the timing window?

We hit enough latency that it triggered arpmon to falsely mark multiple
links as lost. This triggered our system maintenance code to go into an
"oh no we can't talk to the outside world" scenario, which does fairly
intrusive things to try and bring connectivity back up. Basically a bad
thing to happen just because of a random scheduler latency spike.

I should note that we added this some time back and are still running
older kernels, so I have no idea what latency on modern kernels is like.

Chris
On Wed, 22 August 2012 20:42:02, Jay Vosburgh wrote:
> Chris Friesen <chris.friesen@genband.com> wrote:
> >On 08/22/2012 11:45 AM, Jiri Bohac wrote:
> >> This code is run from bond_activebackup_arp_mon() about
> >> delta_in_ticks jiffies after the previous ARP probe has been
> >> sent. If the delayed work gets executed exactly in delta_in_ticks
> >> jiffies, there is a chance the slave will be brought up. If the
> >> delayed work runs one jiffy later, the slave will stay down.
>
> 	Presumably the ARP reply is coming back in less than one jiffy,
> then, so the slave_last_rx() value is the same jiffy as when the
> _inspect was previously called?

Yes, that's what happens. Keep in mind that the backup slave validates the
original ARP query, so on a fast network, you get it more or less immediately
(for my case, I can see a delay of ~70us).

Anyway, why do we have to wait until the next ARP send? Couldn't we simply
kick the work queue when we receive a valid packet on a down interface?

Petr Tesarik
SUSE Linux
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -3001,7 +3001,7 @@ static int bond_ab_arp_inspect(struct bo
 		if (slave->link != BOND_LINK_UP) {
 			if (time_in_range(jiffies,
 				slave_last_rx(bond, slave) - delta_in_ticks,
-				slave_last_rx(bond, slave) + delta_in_ticks)) {
+				slave_last_rx(bond, slave) + 2 * delta_in_ticks)) {
 
 				slave->new_link = BOND_LINK_UP;
 				commit++;
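The effect of this one-line change can be checked with a small userspace model in C. This is only a sketch under stated assumptions, not driver code: `in_range()` stands in for the kernel's `time_in_range()`, and `backup_rx_recent()` with its `slack` parameter is a hypothetical helper, where `slack` = 1 models the original upper bound (`last_rx + delta_in_ticks`) and `slack` = 2 models the patched one.

```c
#include <stdbool.h>

/* Stand-in for the kernel's time_in_range(): true if lo <= x <= hi,
 * using signed differences so the result survives jiffies wraparound. */
static bool in_range(unsigned long x, unsigned long lo, unsigned long hi)
{
	return (long)(x - lo) >= 0 && (long)(hi - x) >= 0;
}

/* Hypothetical model of the condition touched by the hunk above.
 * slack == 1: original window, slack == 2: patched window. */
static bool backup_rx_recent(unsigned long now, unsigned long last_rx,
			     unsigned long delta, unsigned int slack)
{
	return in_range(now, last_rx - delta, last_rx + slack * delta);
}
```

Take delta = 100 and a reply received at jiffy 1000, the same jiffy the probe was sent. If the monitor runs exactly on time at jiffy 1100, both windows accept the slave. If it runs one jiffy late at 1101, the original window rejects it and the patched window accepts it, which matches the behaviour described at the start of the thread.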