diff mbox

[2.6.28] RDMA/cxgb3: deadlock in iw_cxgb3 can cause hang when configuring interface.

Message ID 20081106230642.28808.66765.stgit@dell3.ogc.int
State Accepted, archived
Delegated to: Jeff Garzik
Headers show

Commit Message

Steve Wise Nov. 6, 2008, 11:06 p.m. UTC
From: Steve Wise <swise@opengridcomputing.com>

When the iw_cxgb3 module's cxgb3_client "add" func gets called by the
cxgb3 module, the iwarp driver ends up calling the ethtool ops get_drvinfo
function in cxgb3 to get the fw version and other info.  Currently the
iwarp driver grabs the rtnl lock around this down call to serialize.
As of 2.6.27 or so, things changed such that the rtnl lock is held around
the call to the netdev driver open function.  Also the cxgb3_client "add"
function doesn't get called if the device is down.  

So, if you load cxgb3, then load iw_cxgb3, then ifconfig up the device,
the iw_cxgb3 add func gets called with the rtnl_lock held.   If you
load cxgb3, ifconfig up the device, then load iw_cxgb3, the add func
gets called without the rtnl_lock held.  The former causes the deadlock,
the latter does not.

In addition, there are iw_cxgb3 sysfs handlers that also can call
down into cxgb3 to gather the fw and hw versions.  These can be called
concurrently on different processors and at any time.  Thus we need to
push this serialization down in the cxgb3 driver get_drvinfo func.

The fix is to remove rtnl lock usage, and use a per-device lock in cxgb3.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
---

 drivers/infiniband/hw/cxgb3/iwch_provider.c |    6 ------
 drivers/net/cxgb3/cxgb3_main.c              |    2 ++
 2 files changed, 2 insertions(+), 6 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Divy Le Ray Nov. 6, 2008, 11:27 p.m. UTC | #1
Steve Wise wrote:
> From: Steve Wise <swise@opengridcomputing.com>
>
> When the iw_cxgb3 module's cxgb3_client "add" func gets called by the
> cxgb3 module, the iwarp driver ends up calling the ethtool ops get_drvinfo
> function in cxgb3 to get the fw version and other info.  Currently the
> iwarp driver grabs the rtnl lock around this down call to serialize.
> As of 2.6.27 or so, things changed such that the rtnl lock is held around
> the call to the netdev driver open function.  Also the cxgb3_client "add"
> function doesn't get called if the device is down.  
>
> So, if you load cxgb3, then load iw_cxgb3, then ifconfig up the device,
> the iw_cxgb3 add func gets called with the rtnl_lock held.   If you
> load cxgb3, ifconfig up the device, then load iw_cxgb3, the add func
> gets called without the rtnl_lock held.  The former causes the deadlock,
> the latter does not.
>
> In addition, there are iw_cxgb3 sysfs handlers that also can call
> down into cxgb3 to gather the fw and hw versions.  These can be called
> concurrently on different processors and at any time.  Thus we need to
> push this serialization down in the cxgb3 driver get_drvinfo func.
>
> The fix is to remove rtnl lock usage, and use a per-device lock in cxgb3.
>
> Signed-off-by: Steve Wise <swise@opengridcomputing.com>
>   

Acked-by: Divy Le Ray <divy@chelsio.com>

> ---
>
>  drivers/infiniband/hw/cxgb3/iwch_provider.c |    6 ------
>  drivers/net/cxgb3/cxgb3_main.c              |    2 ++
>  2 files changed, 2 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c
> index ecff980..160ef48 100644
> --- a/drivers/infiniband/hw/cxgb3/iwch_provider.c
> +++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c
> @@ -1102,9 +1102,7 @@ static u64 fw_vers_string_to_u64(struct iwch_dev *iwch_dev)
>  	char *cp, *next;
>  	unsigned fw_maj, fw_min, fw_mic;
>  
> -	rtnl_lock();
>  	lldev->ethtool_ops->get_drvinfo(lldev, &info);
> -	rtnl_unlock();
>  
>  	next = info.fw_version + 1;
>  	cp = strsep(&next, ".");
> @@ -1192,9 +1190,7 @@ static ssize_t show_fw_ver(struct device *dev, struct device_attribute *attr, ch
>  	struct net_device *lldev = iwch_dev->rdev.t3cdev_p->lldev;
>  
>  	PDBG("%s dev 0x%p\n", __func__, dev);
> -	rtnl_lock();
>  	lldev->ethtool_ops->get_drvinfo(lldev, &info);
> -	rtnl_unlock();
>  	return sprintf(buf, "%s\n", info.fw_version);
>  }
>  
> @@ -1207,9 +1203,7 @@ static ssize_t show_hca(struct device *dev, struct device_attribute *attr,
>  	struct net_device *lldev = iwch_dev->rdev.t3cdev_p->lldev;
>  
>  	PDBG("%s dev 0x%p\n", __func__, dev);
> -	rtnl_lock();
>  	lldev->ethtool_ops->get_drvinfo(lldev, &info);
> -	rtnl_unlock();
>  	return sprintf(buf, "%s\n", info.driver);
>  }
>  
> diff --git a/drivers/net/cxgb3/cxgb3_main.c b/drivers/net/cxgb3/cxgb3_main.c
> index 1ace41a..5e663cc 100644
> --- a/drivers/net/cxgb3/cxgb3_main.c
> +++ b/drivers/net/cxgb3/cxgb3_main.c
> @@ -1307,8 +1307,10 @@ static void get_drvinfo(struct net_device *dev, struct ethtool_drvinfo *info)
>  	u32 fw_vers = 0;
>  	u32 tp_vers = 0;
>  
> +	spin_lock(&adapter->stats_lock);
>  	t3_get_fw_version(adapter, &fw_vers);
>  	t3_get_tp_version(adapter, &tp_vers);
> +	spin_unlock(&adapter->stats_lock);
>  
>  	strcpy(info->driver, DRV_NAME);
>  	strcpy(info->version, DRV_VERSION);
>   

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Roland Dreier Nov. 12, 2008, 6:20 p.m. UTC | #2
Looks good, applied.

However, I think it's a little yucky to call ethtool ops without rtnl,
although it is of course perfectly safe in this case.  It might be nicer
to introduce a new cxgb3 <-> iw_cxgb3 interface that returns the
firmware version, which can also be used to implement the get_drvinfo
ethtool op as well.  That would let you avoid fw_vers_string_to_u64() as
well -- it is a little silly at the moment how cxgb3 converts to a
string and then iw_cxgb3 parses that string.

But that's all much lower priority than just fixing a deadlock.

 - R.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c
index ecff980..160ef48 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_provider.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c
@@ -1102,9 +1102,7 @@  static u64 fw_vers_string_to_u64(struct iwch_dev *iwch_dev)
 	char *cp, *next;
 	unsigned fw_maj, fw_min, fw_mic;
 
-	rtnl_lock();
 	lldev->ethtool_ops->get_drvinfo(lldev, &info);
-	rtnl_unlock();
 
 	next = info.fw_version + 1;
 	cp = strsep(&next, ".");
@@ -1192,9 +1190,7 @@  static ssize_t show_fw_ver(struct device *dev, struct device_attribute *attr, ch
 	struct net_device *lldev = iwch_dev->rdev.t3cdev_p->lldev;
 
 	PDBG("%s dev 0x%p\n", __func__, dev);
-	rtnl_lock();
 	lldev->ethtool_ops->get_drvinfo(lldev, &info);
-	rtnl_unlock();
 	return sprintf(buf, "%s\n", info.fw_version);
 }
 
@@ -1207,9 +1203,7 @@  static ssize_t show_hca(struct device *dev, struct device_attribute *attr,
 	struct net_device *lldev = iwch_dev->rdev.t3cdev_p->lldev;
 
 	PDBG("%s dev 0x%p\n", __func__, dev);
-	rtnl_lock();
 	lldev->ethtool_ops->get_drvinfo(lldev, &info);
-	rtnl_unlock();
 	return sprintf(buf, "%s\n", info.driver);
 }
 
diff --git a/drivers/net/cxgb3/cxgb3_main.c b/drivers/net/cxgb3/cxgb3_main.c
index 1ace41a..5e663cc 100644
--- a/drivers/net/cxgb3/cxgb3_main.c
+++ b/drivers/net/cxgb3/cxgb3_main.c
@@ -1307,8 +1307,10 @@  static void get_drvinfo(struct net_device *dev, struct ethtool_drvinfo *info)
 	u32 fw_vers = 0;
 	u32 tp_vers = 0;
 
+	spin_lock(&adapter->stats_lock);
 	t3_get_fw_version(adapter, &fw_vers);
 	t3_get_tp_version(adapter, &tp_vers);
+	spin_unlock(&adapter->stats_lock);
 
 	strcpy(info->driver, DRV_NAME);
 	strcpy(info->version, DRV_VERSION);