Message ID | 1318526483-16073-1-git-send-email-fbl@redhat.com |
---|---|
State | Accepted, archived |
Delegated to: | David Miller |
Headers | show |
Flavio Leitner <fbl@redhat.com> wrote: >The port shouldn't be enabled unless its current MUX >state is DISTRIBUTING which is correctly handled by >ad_mux_machine(), otherwise the packet sent can be >lost because the other end may not be ready. > >The issue happens on every port initialization, but >as the ports are expected to move quickly to DISTRIBUTING, >it doesn't cause much problem. However, it does cause >constant packet loss if the other peer has the port >configured to stay in STANDBY (i.e. SYNC set to OFF). This may explain another misbehavior I've been looking into: if the bond's outgoing LACPDUs are lost (never received by the switch), but the switch's incoming LACPDUs are received, bonding puts the port into use, and packets to the switch are dropped by the switch. >Signed-off-by: Flavio Leitner <fbl@redhat.com> >--- > >The comments there suggests it was a workaround for losses >of link events, but I couldn't track the changelog as it >seems to be pretty old. Thus, as all the link notification >stuff has been improved a lot, maybe this is not an issue >anymore. At least, I didn't find any problem while >unplugging/plugging cables here. I believe this code fragment is original to the 802.3ad submission, which would have been around 2003 or so. Did you check the standard for what it says should happen in this case? I'm guessing this is something not specified by the standard, given the comment, but we should check to make sure. -J > drivers/net/bonding/bond_3ad.c | 7 ------- > 1 files changed, 0 insertions(+), 7 deletions(-) > >diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c >index 47b928e..b33c099 100644 >--- a/drivers/net/bonding/bond_3ad.c >+++ b/drivers/net/bonding/bond_3ad.c >@@ -1135,13 +1135,6 @@ static void ad_rx_machine(struct lacpdu *lacpdu, struct port *port) > __record_pdu(lacpdu, port); > port->sm_rx_timer_counter = __ad_timer_to_ticks(AD_CURRENT_WHILE_TIMER, (u16)(port->actor_oper_port_state & AD_STATE_LACP_TIMEOUT)); > port->actor_oper_port_state &= ~AD_STATE_EXPIRED; >- // verify that if the aggregator is enabled, the port is enabled too. >- //(because if the link goes down for a short time, the 802.3ad will not >- // catch it, and the port will continue to be disabled) >- if (port->aggregator >- && port->aggregator->is_active >- && !__port_is_enabled(port)) >- __enable_port(port); > break; > default: //to silence the compiler > break; >-- >1.7.6 > --- -Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, 13 Oct 2011 10:46:31 -0700 Jay Vosburgh <fubar@us.ibm.com> wrote: > Flavio Leitner <fbl@redhat.com> wrote: > > >The port shouldn't be enabled unless its current MUX > >state is DISTRIBUTING which is correctly handled by > >ad_mux_machine(), otherwise the packet sent can be > >lost because the other end may not be ready. > > > >The issue happens on every port initialization, but > >as the ports are expected to move quickly to DISTRIBUTING, > >it doesn't cause much problem. However, it does cause > >constant packet loss if the other peer has the port > >configured to stay in STANDBY (i.e. SYNC set to OFF). > > This may explain another misbehavior I've been looking into: if > the bond's outgoing LACPDUs are lost (never received by the switch), but > the switch's incoming LACPDUs are received, bonding puts the port into > use, and packets to the switch are dropped by the switch. > Yeah, it could explain that as well because on my tests here, all ports were enabled as soon as the aggregator is active. > >Signed-off-by: Flavio Leitner <fbl@redhat.com> > >--- > > > >The comments there suggests it was a workaround for losses > >of link events, but I couldn't track the changelog as it > >seems to be pretty old. Thus, as all the link notification > >stuff has been improved a lot, maybe this is not an issue > >anymore. At least, I didn't find any problem while > >unplugging/plugging cables here. > > I believe this code fragment is original to the 802.3ad > submission, which would have been around 2003 or so. > > Did you check the standard for what it says should happen in > this case? I'm guessing this is something not specified by the > standard, given the comment, but we should check to make sure. > The standard says that the port should receive when its mux state is COLLECTING and transmitting when its mux state is DISTRIBUTING. So, that seems to be violationing the standard because it doesn't consider the mux state at all. It is harmless for almost all cases though, because ports move quickly to DISTRIBUTING or no link at all, which disables the port. fbl -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Flavio Leitner <fbl@redhat.com> Date: Thu, 13 Oct 2011 14:21:23 -0300 > The port shouldn't be enabled unless its current MUX > state is DISTRIBUTING which is correctly handled by > ad_mux_machine(), otherwise the packet sent can be > lost because the other end may not be ready. > > The issue happens on every port initialization, but > as the ports are expected to move quickly to DISTRIBUTING, > it doesn't cause much problem. However, it does cause > constant packet loss if the other peer has the port > configured to stay in STANDBY (i.e. SYNC set to OFF). > > Signed-off-by: Flavio Leitner <fbl@redhat.com> Applied, thanks. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c index 47b928e..b33c099 100644 --- a/drivers/net/bonding/bond_3ad.c +++ b/drivers/net/bonding/bond_3ad.c @@ -1135,13 +1135,6 @@ static void ad_rx_machine(struct lacpdu *lacpdu, struct port *port) __record_pdu(lacpdu, port); port->sm_rx_timer_counter = __ad_timer_to_ticks(AD_CURRENT_WHILE_TIMER, (u16)(port->actor_oper_port_state & AD_STATE_LACP_TIMEOUT)); port->actor_oper_port_state &= ~AD_STATE_EXPIRED; - // verify that if the aggregator is enabled, the port is enabled too. - //(because if the link goes down for a short time, the 802.3ad will not - // catch it, and the port will continue to be disabled) - if (port->aggregator - && port->aggregator->is_active - && !__port_is_enabled(port)) - __enable_port(port); break; default: //to silence the compiler break;
The port shouldn't be enabled unless its current MUX state is DISTRIBUTING which is correctly handled by ad_mux_machine(), otherwise the packet sent can be lost because the other end may not be ready. The issue happens on every port initialization, but as the ports are expected to move quickly to DISTRIBUTING, it doesn't cause much problem. However, it does cause constant packet loss if the other peer has the port configured to stay in STANDBY (i.e. SYNC set to OFF). Signed-off-by: Flavio Leitner <fbl@redhat.com> --- The comments there suggests it was a workaround for losses of link events, but I couldn't track the changelog as it seems to be pretty old. Thus, as all the link notification stuff has been improved a lot, maybe this is not an issue anymore. At least, I didn't find any problem while unplugging/plugging cables here. drivers/net/bonding/bond_3ad.c | 7 ------- 1 files changed, 0 insertions(+), 7 deletions(-)