
[4/4] net/mlx4_en: Use atomic counter to decide when queue is full

Message ID: 1340270358-19504-5-git-send-email-yevgenyp@mellanox.co.il
State: Changes Requested, archived
Delegated to: David Miller

Commit Message

Yevgeny Petrilin June 21, 2012, 9:19 a.m. UTC
The transmit and transmit completion flows execute from different contexts,
which are not synchronized. Hence, naively reading the consumer index might
give a stale value by the time it is used. That could lead to a transmit timeout.
Fix that by using an atomic variable to maintain that index.

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
---
 drivers/net/ethernet/mellanox/mlx4/en_tx.c   |   16 ++++++++--------
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h |    1 +
 2 files changed, 9 insertions(+), 8 deletions(-)
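To make the accounting this patch introduces concrete, here is a minimal userspace sketch, not driver code, that models it with C11 atomics. atomic_fetch_add() and atomic_fetch_sub() stand in for the kernel's atomic_add() and atomic_sub(), and struct model_ring with its size, headroom and max_desc fields is a hypothetical stand-in for ring->size, HEADROOM and MAX_DESC_TXBBS. It models only the fullness check; it does not by itself show that the race described above exists.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the patch's in-flight accounting (not mlx4 code).
 * size, headroom and max_desc stand in for ring->size, HEADROOM and
 * MAX_DESC_TXBBS; no real driver values are reproduced here. */
struct model_ring {
	uint32_t size;
	uint32_t headroom;
	uint32_t max_desc;
	atomic_int inflight;	/* TXBBs posted but not yet completed */
};

/* Transmit path: returns false where the driver would stop the queue.
 * Mirrors the patch's check against atomic_read(&ring->inflight). */
static bool model_xmit(struct model_ring *r, int nr_txbb)
{
	if (atomic_load(&r->inflight) >
	    (int)(r->size - r->headroom - r->max_desc))
		return false;
	atomic_fetch_add(&r->inflight, nr_txbb);	/* like atomic_add() */
	return true;
}

/* Completion path: give back the TXBBs reclaimed from the CQ. */
static void model_complete(struct model_ring *r, int txbbs_skipped)
{
	atomic_fetch_sub(&r->inflight, txbbs_skipped);	/* like atomic_sub() */
}
```

With size 64, headroom 4 and max_desc 4, posting 4-TXBB descriptors is refused after 15 posts with 60 TXBBs in flight, so the headroom is never consumed as long as no descriptor exceeds max_desc.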

Comments

David Miller June 23, 2012, 12:23 a.m. UTC | #1
From: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Date: Thu, 21 Jun 2012 12:19:17 +0300

> The transmit and transmit completion flows execute from different contexts,
> which are not synchronized. Hence, naively reading the consumer index might
> give a stale value by the time it is used. That could lead to a transmit timeout.
> Fix that by using an atomic variable to maintain that index.
> 
> Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>

I'm not convinced.  There is only one place that actually changes
the counter.

So it seems more like you have a missing memory barrier somewhere.

Other drivers do not need to use something as expensive as an atomic
variable for this and neither should you.

I'm not applying this patch series; you'll need to resubmit it in
its entirety once you fix this patch.
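The objection rests on each index having a single writer, so what is missing is ordering, not an atomic read-modify-write. A minimal sketch of that idea, assuming a generic single-producer/single-consumer ring rather than the mlx4 structures (in mlx4 the completion side learns about finished descriptors from the completion queue, not from prod): C11 release stores and acquire loads play the role of the kernel's memory barriers, and RING_SIZE, struct spsc_ring, spsc_push() and spsc_pop() are hypothetical names.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE 8u	/* hypothetical; must be a power of two */

/* Hypothetical SPSC ring: each index has exactly one writer, so plain
 * load + store suffices for updates; release/acquire supplies ordering. */
struct spsc_ring {
	int slot[RING_SIZE];
	_Atomic uint32_t prod;	/* advanced only by the producer */
	_Atomic uint32_t cons;	/* advanced only by the consumer */
};

/* Producer side (analogous to the transmit path). */
static bool spsc_push(struct spsc_ring *r, int v)
{
	uint32_t prod = atomic_load_explicit(&r->prod, memory_order_relaxed);
	/* Acquire: the consumer's reads of freed slots happen before we
	 * overwrite them. */
	uint32_t cons = atomic_load_explicit(&r->cons, memory_order_acquire);

	if (prod - cons == RING_SIZE)
		return false;			/* full */
	r->slot[prod & (RING_SIZE - 1)] = v;
	/* Release: the slot write becomes visible before the new prod. */
	atomic_store_explicit(&r->prod, prod + 1, memory_order_release);
	return true;
}

/* Consumer side (analogous to the completion path). */
static bool spsc_pop(struct spsc_ring *r, int *v)
{
	uint32_t cons = atomic_load_explicit(&r->cons, memory_order_relaxed);
	uint32_t prod = atomic_load_explicit(&r->prod, memory_order_acquire);

	if (prod == cons)
		return false;			/* empty */
	*v = r->slot[cons & (RING_SIZE - 1)];
	/* Release: our read of the slot completes before it is reused. */
	atomic_store_explicit(&r->cons, cons + 1, memory_order_release);
	return true;
}
```

Neither index is ever the target of an atomic add or subtract; correctness comes entirely from where the ordering is placed.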
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Laight June 25, 2012, 9 a.m. UTC | #2
> > The transmit and transmit completion flows execute from different
> > contexts, which are not synchronized. Hence, naively reading the
> > consumer index might give a stale value by the time it is used. That
> > could lead to a transmit timeout.
> > Fix that by using an atomic variable to maintain that index.
> > 
> > Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
> 
> I'm not convinced.  There is only one place that actually changes
> the counter.
> 
> So it seems more like you have a missing memory barrier somewhere.

Or just keep the two ring indexes, instead of also keeping the
number of 'active' entries.
Then you don't have a variable that both the tx setup and
tx completion routines update.

	David
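The two-index scheme relies on free-running unsigned indexes whose difference stays correct across wraparound; the original driver already computes (u32)(ring->prod - ring->cons) in its completion path. A minimal sketch of that arithmetic follows; the helper names are hypothetical, and a power-of-two ring size is assumed for the slot mask.

```c
#include <stdbool.h>
#include <stdint.h>

/* prod and cons only ever increase and wrap at 2^32. Their unsigned
 * difference is the number of in-flight entries even across the wrap,
 * as long as fewer than 2^32 entries are outstanding. */
static uint32_t ring_inflight(uint32_t prod, uint32_t cons)
{
	return prod - cons;
}

/* Hypothetical helper: room for one more descriptor of up to max_desc
 * entries while keeping `headroom` entries spare. */
static bool ring_has_room(uint32_t prod, uint32_t cons, uint32_t size,
			  uint32_t headroom, uint32_t max_desc)
{
	return ring_inflight(prod, cons) <= size - headroom - max_desc;
}

/* Slot for a free-running index; size is assumed to be a power of two. */
static uint32_t ring_slot(uint32_t idx, uint32_t size)
{
	return idx & (size - 1);
}
```

For example, prod = 5 and cons = 0xFFFFFFFE yield 7 in-flight entries: the wrap of prod past zero does not disturb the count, and no shared counter is written by both paths.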


Eric Dumazet June 25, 2012, 10:12 a.m. UTC | #3
On Mon, 2012-06-25 at 10:00 +0100, David Laight wrote:
> > > The transmit and transmit completion flows execute from different
> > > contexts, which are not synchronized. Hence, naively reading the
> > > consumer index might give a stale value by the time it is used. That
> > > could lead to a transmit timeout.
> > > Fix that by using an atomic variable to maintain that index.
> > > 
> > > Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
> > 
> > I'm not convinced.  There is only one place that actually changes
> > the counter.
> > 
> > So it seems more like you have a missing memory barrier somewhere.
> 
> Or just keep the two ring indexes - instead of keeping the
> number of 'active' entries as well.
> Then you don't have a variable which the tx setup and
> tx completion routines both update.

This is what David implied: use producer/consumer indexes and
appropriate memory barriers.

start_xmit() and tx completion can be truly lockless and atomic-free in
their fast path.

Many drivers do this correctly.

The tg3 driver is a good example.
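The lockless stop/wake handshake referred to above can be sketched as a hypothetical userspace model loosely following the tg3 pattern: the transmit path stops the queue, issues a full barrier, then rechecks free space; the completion path publishes the new consumer index, issues a full barrier, then checks whether the queue is stopped. Here the atomic_bool `stopped` stands in for the state toggled by netif_tx_stop_queue() and netif_tx_wake_queue(), atomic_thread_fence(memory_order_seq_cst) stands in for smp_mb(), and struct txq_model and its thresholds are assumptions. Ordering for the descriptor contents themselves is omitted.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model, not driver code. */
struct txq_model {
	uint32_t size;		/* ring entries */
	uint32_t max_desc;	/* stop when free entries <= this */
	uint32_t wake_thresh;	/* wake when free entries > this */
	_Atomic uint32_t prod;	/* written only by the xmit path */
	_Atomic uint32_t cons;	/* written only by the completion path */
	atomic_bool stopped;	/* stands in for the netdev queue state */
};

static uint32_t txq_free(struct txq_model *q)
{
	uint32_t prod = atomic_load_explicit(&q->prod, memory_order_relaxed);
	uint32_t cons = atomic_load_explicit(&q->cons, memory_order_relaxed);
	return q->size - (prod - cons);
}

/* xmit path: called after a descriptor of n entries has been posted. */
static void txq_after_post(struct txq_model *q, uint32_t n)
{
	uint32_t prod = atomic_load_explicit(&q->prod, memory_order_relaxed);
	atomic_store_explicit(&q->prod, prod + n, memory_order_relaxed);

	if (txq_free(q) <= q->max_desc) {
		atomic_store_explicit(&q->stopped, true, memory_order_relaxed);
		/* Like smp_mb(): the stop is ordered before re-reading cons,
		 * pairing with the fence in txq_after_reclaim(). */
		atomic_thread_fence(memory_order_seq_cst);
		if (txq_free(q) > q->wake_thresh)
			atomic_store_explicit(&q->stopped, false,
					      memory_order_relaxed);
	}
}

/* Completion path: called after n entries have been reclaimed. */
static void txq_after_reclaim(struct txq_model *q, uint32_t n)
{
	uint32_t cons = atomic_load_explicit(&q->cons, memory_order_relaxed);
	atomic_store_explicit(&q->cons, cons + n, memory_order_relaxed);

	/* Like smp_mb(): publish the new cons before checking `stopped`. */
	atomic_thread_fence(memory_order_seq_cst);
	if (atomic_load_explicit(&q->stopped, memory_order_relaxed) &&
	    txq_free(q) > q->wake_thresh)
		atomic_store_explicit(&q->stopped, false, memory_order_relaxed);
}
```

Because each side stores, fences, then loads what the other side stores, at least one of them observes the other's update, so the queue cannot stay stopped while free space is available. A real driver would typically also recheck under the transmit queue lock before waking; this model leaves that out.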



Yevgeny Petrilin June 25, 2012, 1:06 p.m. UTC | #4
> > The transmit and transmit completion flows execute from different
> > contexts, which are not synchronized. Hence, naively reading the
> > consumer index might give a stale value by the time it is used. That
> > could lead to a transmit timeout.
> > Fix that by using an atomic variable to maintain that index.
> >
> > Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
> 
> I'm not convinced.  There is only one place that actually changes the counter.
> 
> So it seems more like you have a missing memory barrier somewhere.
> 
> Other drivers do not need to use something as expensive as an atomic
> variable for this and neither should you.
> 
> I'm not applying this patch series; you'll need to resubmit it in its
> entirety once you fix this patch.

Thanks,
I'll resubmit the other three patches and continue working on this one.

Thanks,
Yevgeny

Patch

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
index 019d856..f4b4703 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
@@ -165,6 +165,7 @@  int mlx4_en_activate_tx_ring(struct mlx4_en_priv *priv,
 	ring->last_nr_txbb = 1;
 	ring->poll_cnt = 0;
 	ring->blocked = 0;
+	atomic_set(&ring->inflight, 0);
 	memset(ring->tx_info, 0, ring->size * sizeof(struct mlx4_en_tx_info));
 	memset(ring->buf, 0, ring->buf_size);
 
@@ -364,15 +365,13 @@  static void mlx4_en_process_tx_cq(struct net_device *dev, struct mlx4_en_cq *cq)
 	wmb();
 	ring->cons += txbbs_skipped;
 	netdev_tx_completed_queue(ring->tx_queue, packets, bytes);
+	atomic_sub(txbbs_skipped, &ring->inflight);
 
 	/* Wakeup Tx queue if this ring stopped it */
-	if (unlikely(ring->blocked)) {
-		if ((u32) (ring->prod - ring->cons) <=
-		     ring->size - HEADROOM - MAX_DESC_TXBBS) {
-			ring->blocked = 0;
-			netif_tx_wake_queue(ring->tx_queue);
-			priv->port_stats.wake_queue++;
-		}
+	if (unlikely(ring->blocked && txbbs_skipped > 0)) {
+		ring->blocked = 0;
+		netif_tx_wake_queue(ring->tx_queue);
+		priv->port_stats.wake_queue++;
 	}
 }
 
@@ -588,7 +587,7 @@  netdev_tx_t mlx4_en_xmit(struct sk_buff *skb, struct net_device *dev)
 		vlan_tag = vlan_tx_tag_get(skb);
 
 	/* Check available TXBBs And 2K spare for prefetch */
-	if (unlikely(((int)(ring->prod - ring->cons)) >
+	if (unlikely(atomic_read(&ring->inflight) >
 		     ring->size - HEADROOM - MAX_DESC_TXBBS)) {
 		/* every full Tx ring stops queue */
 		netif_tx_stop_queue(ring->tx_queue);
@@ -710,6 +709,7 @@  netdev_tx_t mlx4_en_xmit(struct sk_buff *skb, struct net_device *dev)
 	}
 
 	ring->prod += nr_txbb;
+	atomic_add(nr_txbb, &ring->inflight);
 
 	/* If we used a bounce buffer then copy descriptor back into place */
 	if (bounce)
diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
index 225c20d..6a8a69d 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
@@ -257,6 +257,7 @@  struct mlx4_en_tx_ring {
 	struct mlx4_bf bf;
 	bool bf_enabled;
 	struct netdev_queue *tx_queue;
+	atomic_t inflight;
 };
 
 struct mlx4_en_rx_desc {