Message ID | 11276.1239757967@death.nxdomain.ibm.com |
---|---|
State | RFC, archived |
Delegated to: | David Miller |
From: Jay Vosburgh <fubar@us.ibm.com>
Date: Tue, 14 Apr 2009 18:12:47 -0700

> I think I know what's going on. I believe this patch will
> resolve things, but I won't be able to test it until tomorrow. If you
> want to test this, great; if you want to wait, that's fine too.

Jay, thanks for working on this. Let me know when you have a
final version of this fix for me to include.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

On Tue, 2009-04-14 at 18:12 -0700, Jay Vosburgh wrote:
> I think I know what's going on. I believe this patch will
> resolve things, but I won't be able to test it until tomorrow. If you
> want to test this, great; if you want to wait, that's fine too.

I tested this; it works great. All my systems came up fine with this
change applied. Thanks!

> diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c
> index 8dc6fbb..b22467a 100644
> --- a/drivers/net/bonding/bond_alb.c
> +++ b/drivers/net/bonding/bond_alb.c
> @@ -1708,10 +1708,8 @@ void bond_alb_handle_active_change(struct bonding *bond, struct slave *new_slave
>   * Called with RTNL
>   */
>  int bond_alb_set_mac_address(struct net_device *bond_dev, void *addr)
> -	__releases(&bond->curr_slave_lock)
> -	__releases(&bond->lock)
>  	__acquires(&bond->lock)
> -	__acquires(&bond->curr_slave_lock)
> +	__releases(&bond->lock)
>  {
>  	struct bonding *bond = netdev_priv(bond_dev);
>  	struct sockaddr *sa = addr;
> @@ -1747,9 +1745,6 @@ int bond_alb_set_mac_address(struct net_device *bond_dev, void *addr)
>  		}
>  	}
>
> -	write_unlock_bh(&bond->curr_slave_lock);
> -	read_unlock(&bond->lock);
> -
>  	if (swap_slave) {
>  		alb_swap_mac_addr(bond, swap_slave, bond->curr_active_slave);
>  		alb_fasten_mac_swap(bond, swap_slave, bond->curr_active_slave);
> @@ -1757,16 +1752,17 @@ int bond_alb_set_mac_address(struct net_device *bond_dev, void *addr)
>  		alb_set_slave_mac_addr(bond->curr_active_slave, bond_dev->dev_addr,
>  				       bond->alb_info.rlb_enabled);
>
> -		alb_send_learning_packets(bond->curr_active_slave, bond_dev->dev_addr);
> +		read_lock(&bond->lock);
> +		alb_send_learning_packets(bond->curr_active_slave,
> +					  bond_dev->dev_addr);
>  		if (bond->alb_info.rlb_enabled) {
>  			/* inform clients mac address has changed */
> -			rlb_req_update_slave_clients(bond, bond->curr_active_slave);
> +			rlb_req_update_slave_clients(bond,
> +						     bond->curr_active_slave);
>  		}
> +		read_unlock(&bond->lock);
>  	}
>
> -	read_lock(&bond->lock);
> -	write_lock_bh(&bond->curr_slave_lock);
> -
>  	return 0;
>  }
>

On Tue, 2009-04-14 at 18:12 -0700, Jay Vosburgh wrote:
> I think I know what's going on. I believe this patch will
> resolve things, but I won't be able to test it until tomorrow. If you
> want to test this, great; if you want to wait, that's fine too.

Hi Jay; as I mentioned last night this patch is working fine for me so
far.

However, looking at the rest of this function it seems to me that there
are other locking issues, at least based on the documentation in the
header file:

 * Here are the locking policies for the two bonding locks:
 *
 * 1) Get bond->lock when reading/writing slave list.
 * 2) Get bond->curr_slave_lock when reading/writing bond->curr_active_slave.
 *    (It is unnecessary when the write-lock is put with bond->lock.)
 * 3) When we lock with bond->curr_slave_lock, we must lock with bond->lock
 *    beforehand.

For example, don't you need to hold bond->curr_slave_lock at least
around the "if (!bond->curr_active_slave)"? What about around the
"bond_for_each_slave" loop?

Many of the other functions, later, also seem to work with
bond->curr_active_slave and they don't take this lock.

Unless I'm missing something, I think there are still more problems in
the locking in bond_alb_set_mac_address().

Thoughts?

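The three header-file rules quoted in the message above can be made concrete with a small model. What follows is only a sketch under stated assumptions: it is userspace C, single-threaded, and every name prefixed `pol_` is hypothetical (none of it is bonding driver code). The "locks" merely record who holds them so that the ordering in rule 3 can be checked with `assert()`.

```c
#include <assert.h>

/* Hypothetical model of a reader/writer lock: it only records holders. */
struct pol_rwlock {
	int readers;
	int writer;
};

/* Hypothetical stand-in for the two bonding locks and the data they guard. */
struct pol_bond {
	struct pol_rwlock lock;            /* rule 1: guards the slave list */
	struct pol_rwlock curr_slave_lock; /* rule 2: guards curr_active_slave */
	int curr_active_slave;             /* slave id, -1 if none */
};

int pol_held(const struct pol_rwlock *l)
{
	return l->readers > 0 || l->writer;
}

void pol_read_lock(struct pol_rwlock *l)
{
	assert(!l->writer);
	l->readers++;
}

void pol_read_unlock(struct pol_rwlock *l)
{
	assert(l->readers > 0);
	l->readers--;
}

void pol_write_lock(struct pol_rwlock *l)
{
	assert(!pol_held(l));
	l->writer = 1;
}

void pol_write_unlock(struct pol_rwlock *l)
{
	assert(l->writer);
	l->writer = 0;
}

/* Rule 3: curr_slave_lock may only be taken while bond->lock is held. */
void pol_lock_curr_slave(struct pol_bond *bond)
{
	assert(pol_held(&bond->lock));
	pol_read_lock(&bond->curr_slave_lock);
}

/* Rule 2: read curr_active_slave under both locks, released in reverse. */
int pol_get_active(struct pol_bond *bond)
{
	int id;

	pol_read_lock(&bond->lock);
	pol_lock_curr_slave(bond);
	id = bond->curr_active_slave;
	pol_read_unlock(&bond->curr_slave_lock);
	pol_read_unlock(&bond->lock);
	return id;
}

/* Rule 2's exception: a writer on bond->lock excludes every reader that
 * follows rule 3, so curr_slave_lock is not taken here. */
void pol_set_active(struct pol_bond *bond, int id)
{
	pol_write_lock(&bond->lock);
	bond->curr_active_slave = id;
	pol_write_unlock(&bond->lock);
}

/* Returns 1 if reads see the expected values and no lock is left held. */
int pol_check(void)
{
	struct pol_bond bond = { {0, 0}, {0, 0}, -1 };
	int before = pol_get_active(&bond);

	pol_set_active(&bond, 7);
	return before == -1 && pol_get_active(&bond) == 7 &&
	       !pol_held(&bond.lock) && !pol_held(&bond.curr_slave_lock);
}
```

Calling `pol_lock_curr_slave()` without first taking `bond->lock` trips the assertion, which is the violation rule 3 describes.
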
Paul Smith <paul@mad-scientist.net> wrote:

>On Tue, 2009-04-14 at 18:12 -0700, Jay Vosburgh wrote:
>> I think I know what's going on. I believe this patch will
>> resolve things, but I won't be able to test it until tomorrow. If you
>> want to test this, great; if you want to wait, that's fine too.
>
>Hi Jay; as I mentioned last night this patch is working fine for me so
>far.

Thanks for the test report.

>However, looking at the rest of this function it seems to me that there
>are other locking issues, at least based on the documentation in the
>header file:
>
> * Here are the locking policies for the two bonding locks:
> *
> * 1) Get bond->lock when reading/writing slave list.
> * 2) Get bond->curr_slave_lock when reading/writing bond->curr_active_slave.
> *    (It is unnecessary when the write-lock is put with bond->lock.)
> * 3) When we lock with bond->curr_slave_lock, we must lock with bond->lock
> *    beforehand.
>
>For example, don't you need to hold bond->curr_slave_lock at least
>around the "if (!bond->curr_active_slave)"? What about around the
>"bond_for_each_slave" loop?
>
>Many of the other functions, later, also seem to work with
>bond->curr_active_slave and they don't take this lock.
>
>Unless I'm missing something, I think there are still more problems in
>the locking in bond_alb_set_mac_address().

The various MAC manipulating functions are either called under
RTNL (as bond_alb_set_mac_address is) or take pains to acquire RTNL
before doing anything with the MAC. Also, the slave list and
curr_active_slave are mutexed by RTNL, so those inspections should be
safe.

I'm reasonably sure that the curr_slave_lock is superfluous
(which wasn't the case when it was originally introduced), but I
haven't had a chance to validate this. The locking has changed from
what's documented in the header file; RTNL wasn't used for this when
that was written.

-J

---
-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com

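The argument in the reply above is that a single outer mutex (RTNL) already serializes everything that modifies the slave list and curr_active_slave, so code running under it can inspect both without a bonding lock. A minimal userspace sketch of that idea follows. It rests on assumptions: all `rt_` names are hypothetical, the RTNL mutex is modeled as a plain flag, and the MAC comparison loop is only an approximation of what the "bond_for_each_slave" loop mentioned in the thread might do.

```c
#include <assert.h>
#include <string.h>

#define RT_ETH_ALEN   6
#define RT_MAX_SLAVES 4

/* Hypothetical stand-in for the RTNL mutex: a flag, not a real lock. */
static int rt_rtnl_held;

void rt_rtnl_lock(void)
{
	assert(!rt_rtnl_held);
	rt_rtnl_held = 1;
}

void rt_rtnl_unlock(void)
{
	assert(rt_rtnl_held);
	rt_rtnl_held = 0;
}

struct rt_slave {
	unsigned char dev_addr[RT_ETH_ALEN];
};

struct rt_bond {
	struct rt_slave slaves[RT_MAX_SLAVES];
	int slave_cnt;
	int curr_active_slave; /* index into slaves[], -1 if none */
};

/*
 * Caller must hold RTNL. No bonding lock is taken: every writer of the
 * slave list and curr_active_slave is assumed to hold RTNL as well.
 * Returns the index of the slave whose MAC equals addr, or -1.
 */
int rt_find_swap_slave(const struct rt_bond *bond, const unsigned char *addr)
{
	int i;

	assert(rt_rtnl_held);

	if (bond->curr_active_slave < 0)
		return -1;

	for (i = 0; i < bond->slave_cnt; i++) {
		if (!memcmp(bond->slaves[i].dev_addr, addr, RT_ETH_ALEN))
			return i;
	}
	return -1;
}

/* Looks up the second slave's MAC under the modeled RTNL. */
int rt_demo(void)
{
	struct rt_bond bond = {
		.slaves = {
			{ { 0x02, 0, 0, 0, 0, 0x01 } },
			{ { 0x02, 0, 0, 0, 0, 0x02 } },
		},
		.slave_cnt = 2,
		.curr_active_slave = 0,
	};
	const unsigned char want[RT_ETH_ALEN] = { 0x02, 0, 0, 0, 0, 0x02 };
	int idx;

	rt_rtnl_lock();
	idx = rt_find_swap_slave(&bond, want);
	rt_rtnl_unlock();
	return idx;
}
```

The `assert(rt_rtnl_held)` at the top of the lookup plays the role of documenting the calling convention; the safety argument holds only if every writer honors the same convention.
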
On Wed, 2009-04-15 at 11:11 -0700, Jay Vosburgh wrote:
> The various MAC manipulating functions are either called under
> RTNL (as bond_alb_set_mac_address is) or take pains to acquire RTNL
> before doing anything with the MAC. Also, the slave list and
> curr_active_slave are mutexed by RTNL, so those inspections should be
> safe.
>
> I'm reasonably sure that the curr_slave_lock is superfluous
> (which wasn't the case when it was originally introduced), but I
> haven't had a chance to validate this. The locking has changed from
> what's documented in the header file; RTNL wasn't used for this when
> that was written.

OK, sounds good. I'll let you know if I observe any other odd behavior
with the bonding driver.

Thanks for the great support! Cheers!

diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c
index 8dc6fbb..b22467a 100644
--- a/drivers/net/bonding/bond_alb.c
+++ b/drivers/net/bonding/bond_alb.c
@@ -1708,10 +1708,8 @@ void bond_alb_handle_active_change(struct bonding *bond, struct slave *new_slave
  * Called with RTNL
  */
 int bond_alb_set_mac_address(struct net_device *bond_dev, void *addr)
-	__releases(&bond->curr_slave_lock)
-	__releases(&bond->lock)
 	__acquires(&bond->lock)
-	__acquires(&bond->curr_slave_lock)
+	__releases(&bond->lock)
 {
 	struct bonding *bond = netdev_priv(bond_dev);
 	struct sockaddr *sa = addr;
@@ -1747,9 +1745,6 @@ int bond_alb_set_mac_address(struct net_device *bond_dev, void *addr)
 		}
 	}
 
-	write_unlock_bh(&bond->curr_slave_lock);
-	read_unlock(&bond->lock);
-
 	if (swap_slave) {
 		alb_swap_mac_addr(bond, swap_slave, bond->curr_active_slave);
 		alb_fasten_mac_swap(bond, swap_slave, bond->curr_active_slave);
@@ -1757,16 +1752,17 @@ int bond_alb_set_mac_address(struct net_device *bond_dev, void *addr)
 		alb_set_slave_mac_addr(bond->curr_active_slave, bond_dev->dev_addr,
 				       bond->alb_info.rlb_enabled);
 
-		alb_send_learning_packets(bond->curr_active_slave, bond_dev->dev_addr);
+		read_lock(&bond->lock);
+		alb_send_learning_packets(bond->curr_active_slave,
+					  bond_dev->dev_addr);
 		if (bond->alb_info.rlb_enabled) {
 			/* inform clients mac address has changed */
-			rlb_req_update_slave_clients(bond, bond->curr_active_slave);
+			rlb_req_update_slave_clients(bond,
+						     bond->curr_active_slave);
 		}
+		read_unlock(&bond->lock);
 	}
 
-	read_lock(&bond->lock);
-	write_lock_bh(&bond->curr_slave_lock);
-
 	return 0;
 }
 
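
The net effect of the patch on the non-swap branch is that the function is entered with no bonding lock held, takes `bond->lock` for reading only around the two notification calls, and drops it again before returning. The sketch below models just that shape in userspace C so the acquire/release balance can be checked. It is illustrative only: the `mac_` names are hypothetical stand-ins, the lock is a counter, and the stand-in functions do nothing except record whether the lock was held when they ran.

```c
#include <assert.h>

/* Hypothetical model: lock_readers counts holders of bond->lock. */
struct mac_bond {
	int lock_readers;
	int rlb_enabled;
	int calls_with_lock; /* notification calls made while the lock was held */
};

void mac_read_lock(struct mac_bond *b)
{
	b->lock_readers++;
}

void mac_read_unlock(struct mac_bond *b)
{
	assert(b->lock_readers > 0);
	b->lock_readers--;
}

/* Stand-in for alb_send_learning_packets(); only records lock state. */
void mac_send_learning_packets(struct mac_bond *b)
{
	if (b->lock_readers > 0)
		b->calls_with_lock++;
}

/* Stand-in for rlb_req_update_slave_clients(); only records lock state. */
void mac_update_clients(struct mac_bond *b)
{
	if (b->lock_readers > 0)
		b->calls_with_lock++;
}

/*
 * Post-patch shape of the non-swap branch: no bonding lock on entry,
 * read lock held only around the notifications, released before return.
 */
int mac_set_address_no_swap(struct mac_bond *b)
{
	assert(b->lock_readers == 0);

	/* alb_set_slave_mac_addr() would run here, before the lock is taken. */

	mac_read_lock(b);
	mac_send_learning_packets(b);
	if (b->rlb_enabled)
		mac_update_clients(b);
	mac_read_unlock(b);

	return 0;
}

/* Returns the number of locked notification calls, or -1 on imbalance. */
int mac_demo(void)
{
	struct mac_bond b = { 0, 1, 0 };

	if (mac_set_address_no_swap(&b) != 0 || b.lock_readers != 0)
		return -1;
	return b.calls_with_lock;
}
```

With `rlb_enabled` set, both notification calls run under the lock and the counter returns to zero on exit, which is the acquire-then-release pairing the remaining `__acquires`/`__releases` annotations in the patch describe.
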