diff mbox

ehea: Add some info messages and fix an issue

Message ID 1290792387-12331-1-git-send-email-leitao@linux.vnet.ibm.com
State Accepted, archived
Delegated to: David Miller
Headers show

Commit Message

Breno Leitao Nov. 26, 2010, 5:26 p.m. UTC
From: Breno Leitao <breno@cafe.(none)>

This patch adds some debug information about ehea not being able to
allocate enough spaces. Also it correctly updates the amount of available
skb.

Signed-off-by: Breno Leitao <leitao@linux.vnet.ibm.com>
---
 drivers/net/ehea/ehea_main.c |   18 ++++++++++++++----
 1 files changed, 14 insertions(+), 4 deletions(-)

Comments

David Miller Nov. 29, 2010, 2:15 a.m. UTC | #1
From: leitao@linux.vnet.ibm.com
Date: Fri, 26 Nov 2010 15:26:27 -0200

> From: Breno Leitao <breno@cafe.(none)>
> 
> This patch adds some debug information about ehea not being able to
> allocate enough spaces. Also it correctly updates the amount of available
> skb.
> 
> Signed-off-by: Breno Leitao <leitao@linux.vnet.ibm.com>

Applied to net-2.6
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Anton Blanchard Jan. 7, 2011, 3:24 a.m. UTC | #2
Hi,

> From: Breno Leitao <breno@cafe.(none)>
> 
> This patch adds some debug information about ehea not being able to
> allocate enough spaces. Also it correctly updates the amount of
> available skb.

I'm seeing issues on a number of machines with the ehea device.
Sometime after boot I see a bunch of:

ehea: Error in ehea_proc_rwqes: LL rq1: skb=NULL
ehea: Error in ehea_proc_rwqes: LL rq1: skb=NULL
ehea: Error in ehea_proc_rwqes: LL rq1: skb=NULL
ehea: Error in ehea_proc_rwqes: LL rq1: skb=NULL

which eventually stop.

-       for (i = 0; i < pr->rq1_skba.len; i++) {
+       for (i = 0; i < nr_rq1a; i++) {

It looks like you are now only initialising half the ring, but still
telling the hardware to use the whole ring. Once you get through the
entire ring once the errors go away.

Anton
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Breno Leitao Jan. 7, 2011, 12:14 p.m. UTC | #3
Hi Anton,

On 01/07/2011 01:24 AM, Anton Blanchard wrote:
> It looks like you are now only initialising half the ring, but still
>  telling the hardware to use the whole ring. Once you get through the
>  entire ring once the errors go away.
You are correct. The problem in the past ehea_init_fill_rq1() wasn't
respecting the nr_rq1a parameter. So, every time ehea_init_fill_rqX()
was trying to allocated the entire skb arrary, and assume that the
entire array was allocated, which wasn't correct.

My patch just allocate the requested number of skbs (described in
nr_rq1a) in skb array, and tell hea that only that amount of skb were
allocated (via doorbell).

On the other side, ehea_proc_rwqes() tries to maximize the array usage,
meaning that if the array is not completely used, it'd try to allocate
an skb "on-the-fly", and this is expected (For example, when you
initialize the system on memory pressure)

That is why I sent another patch that turns this message a netif_info()
instead of a netif_error() (as it was in the past). The commit id for
this patch is 782615aea84e57dc7f2f922cea823df3de635a78

So, although this behaviour is completely expected, this code path
should only being executed on memory pressure. But I am suspecting this
code path is execute more often than I'd expect. Let me investigate this.

Thanks
Breno
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/net/ehea/ehea_main.c b/drivers/net/ehea/ehea_main.c
index 182b2a7..3d0af08 100644
--- a/drivers/net/ehea/ehea_main.c
+++ b/drivers/net/ehea/ehea_main.c
@@ -400,6 +400,7 @@  static void ehea_refill_rq1(struct ehea_port_res *pr, int index, int nr_of_wqes)
 			skb_arr_rq1[index] = netdev_alloc_skb(dev,
 							      EHEA_L_PKT_SIZE);
 			if (!skb_arr_rq1[index]) {
+				ehea_info("Unable to allocate enough skb in the array\n");
 				pr->rq1_skba.os_skbs = fill_wqes - i;
 				break;
 			}
@@ -422,13 +423,20 @@  static void ehea_init_fill_rq1(struct ehea_port_res *pr, int nr_rq1a)
 	struct net_device *dev = pr->port->netdev;
 	int i;
 
-	for (i = 0; i < pr->rq1_skba.len; i++) {
+	if (nr_rq1a > pr->rq1_skba.len) {
+		ehea_error("NR_RQ1A bigger than skb array len\n");
+		return;
+	}
+
+	for (i = 0; i < nr_rq1a; i++) {
 		skb_arr_rq1[i] = netdev_alloc_skb(dev, EHEA_L_PKT_SIZE);
-		if (!skb_arr_rq1[i])
+		if (!skb_arr_rq1[i]) {
+			ehea_info("No enough memory to allocate skb array\n");
 			break;
+		}
 	}
 	/* Ring doorbell */
-	ehea_update_rq1a(pr->qp, nr_rq1a);
+	ehea_update_rq1a(pr->qp, i);
 }
 
 static int ehea_refill_rq_def(struct ehea_port_res *pr,
@@ -735,8 +743,10 @@  static int ehea_proc_rwqes(struct net_device *dev,
 
 					skb = netdev_alloc_skb(dev,
 							       EHEA_L_PKT_SIZE);
-					if (!skb)
+					if (!skb) {
+						ehea_info("Not enough memory to allocate skb\n");
 						break;
+					}
 				}
 				skb_copy_to_linear_data(skb, ((char *)cqe) + 64,
 						 cqe->num_bytes_transfered - 4);