Message ID | 1375838634.11370.13.camel@cr0 |
---|---|
State | RFC, archived |
Delegated to: | David Miller |
Headers | show |
Calling unlock in dellink is not safe. can you reproduce with 3.10 or 3.11-rc? On Tue, Aug 6, 2013 at 6:23 PM, Cong Wang <amwang@redhat.com> wrote: > Hi, Stephen > > You introduced a soft lockup in vxlan module in > > commit fe5c3561e6f0ac7c9546209f01351113c1b77ec8 > Author: stephen hemminger <stephen@networkplumber.org> > Date: Sat Jul 13 10:18:18 2013 -0700 > > vxlan: add necessary locking on device removal > > The problem is that vxlan_dellink(), which is called with RTNL lock > held, tries to flush the workqueue synchronously, but apparently > igmp_join and igmp_leave work need to hold RTNL lock too, therefore we > have a soft lockup! This is 100% reproducible on my 2.6.32 backport > while running `modprobe -r vxlan`. > > A quick but perhaps ugly fix is just releasing RTNL lock before calling > flush_workqueue(): > > diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c > index 8bf31d9..581d3d5 100644 > --- a/drivers/net/vxlan.c > +++ b/drivers/net/vxlan.c > @@ -1837,7 +1837,9 @@ static void vxlan_dellink(struct net_device *dev, > struct list_head *head) > struct vxlan_net *vn = net_generic(dev_net(dev), vxlan_net_id); > struct vxlan_dev *vxlan = netdev_priv(dev); > > + rtnl_unlock(); > flush_workqueue(vxlan_wq); > + rtnl_lock(); > > spin_lock(&vn->sock_lock); > hlist_del_rcu(&vxlan->hlist); > > However, I think a better way is still what I did, that is, removing > RTNL lock from ip_mc_join_group() and ip_mc_leave_group(). > > What do you think? Any other idea to fix it? > > Thanks. > -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, 2013-08-06 at 19:18 -0700, Stephen Hemminger wrote: > Calling unlock in dellink is not safe. This is my feeling as well. > can you reproduce with 3.10 or 3.11-rc? > For net-next, I always compile modules as builtin, therefore can't test it. But clearly the code is same. The steps to reproduce it are: 1) ping over vxlan0 2) modprobe -r vxlan -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 07 Aug 2013 09:23:54 +0800 Cong Wang <amwang@redhat.com> wrote: > Hi, Stephen > > You introduced a soft lockup in vxlan module in > > commit fe5c3561e6f0ac7c9546209f01351113c1b77ec8 > Author: stephen hemminger <stephen@networkplumber.org> > Date: Sat Jul 13 10:18:18 2013 -0700 > > vxlan: add necessary locking on device removal > > The problem is that vxlan_dellink(), which is called with RTNL lock > held, tries to flush the workqueue synchronously, but apparently > igmp_join and igmp_leave work need to hold RTNL lock too, therefore we > have a soft lockup! This is 100% reproducible on my 2.6.32 backport > while running `modprobe -r vxlan`. > > A quick but perhaps ugly fix is just releasing RTNL lock before calling > flush_workqueue(): > > diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c > index 8bf31d9..581d3d5 100644 > --- a/drivers/net/vxlan.c > +++ b/drivers/net/vxlan.c > @@ -1837,7 +1837,9 @@ static void vxlan_dellink(struct net_device *dev, > struct list_head *head) > struct vxlan_net *vn = net_generic(dev_net(dev), vxlan_net_id); > struct vxlan_dev *vxlan = netdev_priv(dev); > > + rtnl_unlock(); > flush_workqueue(vxlan_wq); > + rtnl_lock(); > > spin_lock(&vn->sock_lock); > hlist_del_rcu(&vxlan->hlist); > > However, I think a better way is still what I did, that is, removing > RTNL lock from ip_mc_join_group() and ip_mc_leave_group(). > > What do you think? Any other idea to fix it? > > Thanks. > Probably the flush_workqueue can just be removed and let the normal refcounting work. The workqueue has a reference to device and socket, therefore the cleanups should work correctly. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c index 8bf31d9..581d3d5 100644 --- a/drivers/net/vxlan.c +++ b/drivers/net/vxlan.c @@ -1837,7 +1837,9 @@ static void vxlan_dellink(struct net_device *dev, struct list_head *head) struct vxlan_net *vn = net_generic(dev_net(dev), vxlan_net_id); struct vxlan_dev *vxlan = netdev_priv(dev); + rtnl_unlock(); flush_workqueue(vxlan_wq); + rtnl_lock(); spin_lock(&vn->sock_lock);