From patchwork Thu Feb 25 13:08:34 2010
X-Patchwork-Submitter: Stanislaw Gruszka
X-Patchwork-Id: 46229
X-Patchwork-Delegate: davem@davemloft.net
Date: Thu, 25 Feb 2010 14:08:34 +0100
From: Stanislaw Gruszka
To: netdev@vger.kernel.org, Vladislav Zolotarov
Cc: David Miller, Eilon Greenstein, David Howells
Subject: [RFC PATCH] bnx2x: fix tx queue locking and memory barriers
Message-ID: <20100225140834.0169e9f2@dhcp-lab-109.englab.brq.redhat.com>
Organization: RedHat
List-ID: netdev@vger.kernel.org

We have made some optimizations in bnx2x_start_xmit() and bnx2x_tx_int() which, in my opinion, can lead to some theoretical race conditions.
I may be quite wrong here, but if so, we also have to fix some other drivers that use the memory barrier/locking scheme from this patch (like tg3 and bnx2). The memory barriers, IMHO, prevent the queue from being stopped permanently in the case where bnx2x_tx_int() on one cpu makes the queue empty while bnx2x_start_xmit() on another cpu still sees it as full and stops it; without the barriers the queue would then stay stopped forever. I'm not quite sure what __netif_tx_lock is for here, but other drivers use it.

---

diff --git a/drivers/net/bnx2x_main.c b/drivers/net/bnx2x_main.c
index 5adf2a0..ca91aa8 100644
--- a/drivers/net/bnx2x_main.c
+++ b/drivers/net/bnx2x_main.c
@@ -893,7 +893,10 @@ static inline u16 bnx2x_tx_avail(struct bnx2x_fastpath *fp)
 	u16 prod;
 	u16 cons;
 
-	barrier(); /* Tell compiler that prod and cons can change */
+	/* prod and cons can change on another cpu; we want to see a
+	   consistent available space and queue (stopped/running) state */
+	smp_mb();
+
 	prod = fp->tx_bd_prod;
 	cons = fp->tx_bd_cons;
 
@@ -957,21 +960,23 @@ static int bnx2x_tx_int(struct bnx2x_fastpath *fp)
 	fp->tx_pkt_cons = sw_cons;
 	fp->tx_bd_cons = bd_cons;
 
+	/* Need to make the tx_bd_cons update visible to start_xmit()
+	 * before checking for netif_tx_queue_stopped(). Without the
+	 * memory barrier, there is a small possibility that start_xmit()
+	 * will miss it and cause the queue to be stopped forever. This
+	 * can happen when we make the queue empty here while, on another
+	 * cpu, start_xmit() still sees it as full and stops it.
+	 */
+	smp_mb();
+
 	/* TBD need a thresh? */
 	if (unlikely(netif_tx_queue_stopped(txq))) {
-
-		/* Need to make the tx_bd_cons update visible to start_xmit()
-		 * before checking for netif_tx_queue_stopped(). Without the
-		 * memory barrier, there is a small possibility that
-		 * start_xmit() will miss it and cause the queue to be stopped
-		 * forever.
-		 */
-		smp_mb();
-
+		__netif_tx_lock(txq, smp_processor_id());
 		if ((netif_tx_queue_stopped(txq)) &&
 		    (bp->state == BNX2X_STATE_OPEN) &&
 		    (bnx2x_tx_avail(fp) >= MAX_SKB_FRAGS + 3))
 			netif_tx_wake_queue(txq);
+		__netif_tx_unlock(txq);
 	}
 
 	return 0;
 }
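
For readers less familiar with the pattern the patch is hardening, here is a minimal userspace sketch of the stop/wake protocol. It is an analogy, not the driver's code: the names (struct ring, tx_avail, xmit_one, complete_some, WAKE_THRESH) are illustrative, C11 atomic_thread_fence(memory_order_seq_cst) plays the role of the kernel's smp_mb(), and a plain atomic flag stands in for netif_tx_queue_stopped()/netif_tx_wake_queue().

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE   256          /* illustrative ring size */
#define WAKE_THRESH (16 + 3)     /* stand-in for MAX_SKB_FRAGS + 3 */

struct ring {
	atomic_uint_fast16_t prod;    /* bumped by the xmit path */
	atomic_uint_fast16_t cons;    /* bumped by the completion path */
	atomic_bool          stopped; /* models netif_tx_queue_stopped() */
};

/* Analogue of bnx2x_tx_avail(): free descriptors left in the ring.
 * The full fence plays the role of smp_mb(): prod, cons and the
 * stopped flag must be observed in a consistent order on both cpus. */
static uint16_t tx_avail(struct ring *r)
{
	atomic_thread_fence(memory_order_seq_cst);
	uint16_t prod = (uint16_t)atomic_load(&r->prod);
	uint16_t cons = (uint16_t)atomic_load(&r->cons);
	return (uint16_t)(RING_SIZE - (uint16_t)(prod - cons) - 1);
}

/* xmit path: publish a descriptor, stop the queue when nearly full. */
static void xmit_one(struct ring *r)
{
	atomic_fetch_add(&r->prod, 1);
	if (tx_avail(r) < WAKE_THRESH) {
		atomic_store(&r->stopped, true);
		/* Re-check after stopping: the completion path may have
		 * freed space in between; this is exactly the window the
		 * patch's smp_mb() pairing is meant to close. */
		if (tx_avail(r) >= WAKE_THRESH)
			atomic_store(&r->stopped, false);
	}
}

/* completion path: retire descriptors, wake the queue if stopped. */
static void complete_some(struct ring *r, uint16_t n)
{
	atomic_fetch_add(&r->cons, n);
	/* Make the cons update visible before testing 'stopped', as the
	 * patch does with smp_mb() before netif_tx_queue_stopped(). */
	atomic_thread_fence(memory_order_seq_cst);
	if (atomic_load(&r->stopped) && tx_avail(r) >= WAKE_THRESH)
		atomic_store(&r->stopped, false);
}
```

Without the fence in the completion path, the store to cons could become visible after the read of the stopped flag; the completion path would then see the queue as still running, skip the wake, and the queue could stay stopped forever, which is the race described above.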