[net,v6,4/4] tg3: Fix tx_pending checks for tg3_tso_bug

Message ID: 1409880647-14887-5-git-send-email-bpoirier@suse.de
State: Changes Requested, archived
Delegated to: David Miller

Commit Message

Benjamin Poirier Sept. 5, 2014, 1:30 a.m. UTC
In tg3_set_ringparam(), the tx_pending test that covers the cases where
tg3_tso_bug() is entered has two problems:
1) the check is only done for certain hardware whereas the workaround is now
used more broadly. IOW, the check may not be performed when it is needed.
2) the check is too optimistic.

For example, with a 5761 (SHORT_DMA_BUG), tg3_set_ringparam() skips the
"tx_pending <= (MAX_SKB_FRAGS * 3)" check because TSO_BUG is false. Even if it
did perform the check, a full-sized skb gives frag_cnt_est = 135, whereas the
check only requires tx_pending > MAX_SKB_FRAGS * 3 (= 17 * 3 = 51), so the
check is insufficient either way. This leads to the following situation: with,
e.g., tx_pending = 100, an skb can trigger tg3_tso_bug() and be large enough to
cause tg3_tso_bug() to stop the queue even when it is empty. We then end up
with a netdev watchdog transmit timeout.
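
A back-of-the-envelope sketch of those numbers, assuming a standard 1500-byte
MTU, an MSS of 1448 and roughly 64KB of TCP payload (illustrative values, not
taken from a capture):

/* Sketch only: how frag_cnt_est blows past the old ringparam check. */
unsigned int mss          = 1448;                        /* assumed MSS */
unsigned int payload      = 65160;                       /* ~64KB of payload */
unsigned int gso_segs     = DIV_ROUND_UP(payload, mss);  /* = 45 */
unsigned int frag_cnt_est = gso_segs * 3;                /* = 135 */
/* The old tg3_set_ringparam() check only rejects tx_pending <= 51
 * (MAX_SKB_FRAGS * 3), so tx_pending = 100 is accepted although it is
 * well below frag_cnt_est = 135.
 */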

Given that 1) some of the conditions tested for in tg3_tx_frag_set() apply
regardless of the chipset flags and that 2) it is difficult to estimate ahead
of time the maximum number of frames that a large skb may be split into by
gso, this patch changes tg3_set_ringparam() to ignore the requirements of
tg3_tso_bug(). Those requirements are instead checked in tg3_tso_bug() itself,
and if there are not enough descriptors available in the tx queue, the skb is
linearized.

This patch also removes the current scheme in tg3_tso_bug() where the number
of descriptors required to transmit an skb is estimated. Instead,
skb_gso_segment() is called without NETIF_F_SG, which yields predictable,
linear skbs.

Signed-off-by: Benjamin Poirier <bpoirier@suse.de>

---

Changes v1->v2
* in tg3_set_ringparam(), reduce gso_max_segs further to budget 3 descriptors
  per gso seg instead of only 1 as in v1
* in tg3_tso_bug(), check that this estimation (3 desc/seg) holds, otherwise
  linearize some skbs as needed
* in tg3_start_xmit(), make the queue stop threshold a parameter, for the
  reason explained in the commit description

Changes v2->v3
* use tg3_maybe_stop_txq() instead of repeatedly open coding it
* add the requested tp->tx_dropped++ stat increase in tg3_tso_bug() if
  skb_linearize() fails and we must abort
* in the same code block, add an additional check to stop the queue with the
  default threshold. Otherwise, the netdev_err message at the start of
  __tg3_start_xmit() could be triggered when the next frame is transmitted.
  That is because the previous calls to __tg3_start_xmit() in tg3_tso_bug()
  may have been using a stop_thresh=segs_remaining that is < MAX_SKB_FRAGS +
  1.

Changes v3->v4
* in tg3_set_ringparam(), make sure that wakeup_thresh does not end up being
  >= tx_pending. Identified by Prashant.

Changes v4->v5
* in tg3_set_ringparam(), use TG3_TX_WAKEUP_THRESH() and tp->txq_cnt instead
  of tp->irq_max. Identified by Prashant.

Changes v5->v6
* avoid changing gso_max_segs and making the tx queue wakeup threshold
  dynamic. Instead of stopping the queue when there are not enough descriptors
  available, the skb is linearized.

I reproduced this bug using the same approach explained in patch 1.
The bug reproduces with tx_pending <= 135.
---
 drivers/net/ethernet/broadcom/tg3.c | 59 ++++++++++++++++++++++++-------------
 1 file changed, 38 insertions(+), 21 deletions(-)

Comments

Prashant Sreedharan Sept. 5, 2014, 11:35 p.m. UTC | #1
>  static int tg3_tso_bug(struct tg3 *tp, struct tg3_napi *tnapi,
>  		       struct netdev_queue *txq, struct sk_buff *skb)
>  {
> -	struct sk_buff *segs, *nskb;
> -	u32 frag_cnt_est = skb_shinfo(skb)->gso_segs * 3;
> +	unsigned int segs_remaining = skb_shinfo(skb)->gso_segs;
>  
> -	/* Estimate the number of fragments in the worst case */
> -	tg3_maybe_stop_txq(tnapi, txq, frag_cnt_est, frag_cnt_est);
> -	if (netif_tx_queue_stopped(txq))
> -		return NETDEV_TX_BUSY;
> +	if (unlikely(tg3_tx_avail(tnapi) <= segs_remaining)) {
> +		if (!skb_is_nonlinear(skb) || skb_linearize(skb))
> +			goto tg3_tso_bug_drop;
> +		tg3_start_xmit(skb, tp->dev);

FYI: initially the driver was doing an skb_copy()
(tigon3_dma_hwbug_workaround()) for LSO skbs that met the HW bug conditions,
but users started reporting page allocation failures due to the copying of
large LSO skbs. To avoid this, commit 4caab52eb102f1 ("tg3: Prevent page
allocation failure during TSO workaround") changed the driver logic to do
skb_gso_segment() for LSO skbs that meet the HW bug conditions. With
skb_linearize() we might end up with memory allocation failures for large LSO
skbs again, though at a much lower frequency (i.e. only when the TX queue is
almost full).

Also, some of the chips supported by tg3, like the 5719 and 57766, have DMA
limits of 4k and 2k respectively. If the linearized LSO skb is larger than the
dma_limit, tg3_tx_frag_set() will consume more descriptors, and if the budget
reaches 0 in tg3_tx_frag_set() we end up calling tg3_tso_bug() again and
eventually dropping the skb if descriptors still have not been freed. Instead,
the skb could be dropped as soon as we know there are not enough descriptors
to handle it on these chip versions.
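
A rough sketch of the descriptor math behind this concern (assumed values,
only meant to show the order of magnitude):

/* Sketch only: a linearized ~64KB LSO skb on a chip with a 2k DMA limit
 * (e.g. 57766) is split by tg3_tx_frag_set() into dma_limit-sized chunks.
 */
unsigned int len         = 65536;                         /* linear skb length, assumed */
unsigned int dma_limit   = 2048;                          /* 57766 */
unsigned int desc_needed = DIV_ROUND_UP(len, dma_limit);  /* = 32 descriptors */
/* That can greatly exceed the few descriptors that were still available when
 * tg3_tso_bug() decided to linearize in the first place.
 */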

> +	} else {
> +		struct sk_buff *segs, *nskb;
>  
> -	segs = skb_gso_segment(skb, tp->dev->features &
> -				    ~(NETIF_F_TSO | NETIF_F_TSO6));
> -	if (IS_ERR(segs) || !segs)
> -		goto tg3_tso_bug_end;
> +		segs = skb_gso_segment(skb, tp->dev->features &
> +				       ~(NETIF_F_TSO | NETIF_F_TSO6 |
> +					 NETIF_F_SG));
> +		if (IS_ERR(segs) || !segs)
> +			goto tg3_tso_bug_drop;
>  
> -	do {
> -		nskb = segs;
> -		segs = segs->next;
> -		nskb->next = NULL;
> -		tg3_start_xmit(nskb, tp->dev);
> -	} while (segs);
> +		do {
> +			nskb = segs;
> +			segs = segs->next;
> +			nskb->next = NULL;
> +			if (--segs_remaining)
> +				__tg3_start_xmit(nskb, tp->dev, segs_remaining);
> +			else
> +				tg3_start_xmit(nskb, tp->dev);
> +		} while (segs);
>  
> -tg3_tso_bug_end:
> +		dev_kfree_skb_any(skb);
> +	}
> +
> +	return NETDEV_TX_OK;
> +
> +tg3_tso_bug_drop:
> +	tp->tx_dropped++;
>  	dev_kfree_skb_any(skb);
>  
>  	return NETDEV_TX_OK;
> @@ -7895,6 +7908,12 @@ tg3_tso_bug_end:
>  /* hard_start_xmit for all devices */
>  static netdev_tx_t tg3_start_xmit(struct sk_buff *skb, struct net_device *dev)
>  {
> +	return __tg3_start_xmit(skb, dev, MAX_SKB_FRAGS + 1);
> +}
> +
> +static netdev_tx_t __tg3_start_xmit(struct sk_buff *skb,
> +				    struct net_device *dev, u32 stop_thresh)
> +{
>  	struct tg3 *tp = netdev_priv(dev);
>  	u32 len, entry, base_flags, mss, vlan = 0;
>  	u32 budget;
> @@ -8102,7 +8121,7 @@ static netdev_tx_t tg3_start_xmit(struct sk_buff *skb, struct net_device *dev)
>  	tw32_tx_mbox(tnapi->prodmbox, entry);
>  
>  	tnapi->tx_prod = entry;
> -	tg3_maybe_stop_txq(tnapi, txq, MAX_SKB_FRAGS + 1,
> +	tg3_maybe_stop_txq(tnapi, txq, stop_thresh,
>  			   TG3_TX_WAKEUP_THRESH(tnapi));
>  
>  	mmiowb();
> @@ -12336,9 +12355,7 @@ static int tg3_set_ringparam(struct net_device *dev, struct ethtool_ringparam *e
>  	if ((ering->rx_pending > tp->rx_std_ring_mask) ||
>  	    (ering->rx_jumbo_pending > tp->rx_jmb_ring_mask) ||
>  	    (ering->tx_pending > TG3_TX_RING_SIZE - 1) ||
> -	    (ering->tx_pending <= MAX_SKB_FRAGS + 1) ||
> -	    (tg3_flag(tp, TSO_BUG) &&
> -	     (ering->tx_pending <= (MAX_SKB_FRAGS * 3))))
> +	    (ering->tx_pending <= MAX_SKB_FRAGS + 1))
>  		return -EINVAL;
>  
>  	if (netif_running(dev)) {


Eric Dumazet Sept. 6, 2014, 12:03 a.m. UTC | #2
On Fri, 2014-09-05 at 16:35 -0700, Prashant Sreedharan wrote:

> fyi.. Initially the driver was doing a skb_copy()
> (tigon3_dma_hwbug_workaround()) for LSO skb that met HW bug conditions
> but users started reporting page allocation failures due to copying of
> large LSO skbs. To avoid this Commit 4caab52eb102f1 (tg3: Prevent page
> allocation failure during TSO workaround) changed the driver logic to do
> skb_gso_segment() for LSO skbs that met the HW bug conditions. With
> skb_linearize() we might end up again with memory allocation failures
> for large LSO skbs though at a much less frequent level (ie when TX
> queue is almost full). 

Note that the TCP stack has an skb collapse feature, currently limited to
allocating linear skbs that fit in a whole page.

Instead of this private helper (and a pretty limited one, btw), we could add a
core function that would build skbs with order-0 fragments.

I guess many call sites could then use this new helper instead of
skb_linearize().

Because, as you said, skb_linearize() of one 64KB GSO packet can ask for an
order-5 allocation, and this generally does not work reliably.
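
For context, a rough sketch of where order-5 comes from (assuming 4KB pages;
the exact threshold depends on the kmalloc slab sizes):

/* Sketch only: skb_linearize() of a ~64KB GSO skb kmallocs roughly
 *   65536 bytes of payload + a few hundred bytes of headroom and
 *   struct skb_shared_info,
 * which kmalloc rounds up to the next power of two: 131072 bytes,
 * i.e. 32 contiguous 4KB pages = an order-5 allocation.
 */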


David Miller Sept. 6, 2014, 12:13 a.m. UTC | #3
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Fri, 05 Sep 2014 17:03:30 -0700

> Instead of this private helper (and pretty limited one btw), we could
> add a core function, that would build skbs with order-0 fragments.
> 
> Instead of skb_linearize(), I guess many call sites could instead use
> this new helper.
> 
> Because as you said, skb_linearize() of one 64KB GSO packet can ask
> order-5 allocations, and this generally does not work reliably.

xen-netback could make use of this helper too.

Patch

diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
index 6e6b07c..a9787a1 100644
--- a/drivers/net/ethernet/broadcom/tg3.c
+++ b/drivers/net/ethernet/broadcom/tg3.c
@@ -7830,6 +7830,8 @@ static int tigon3_dma_hwbug_workaround(struct tg3_napi *tnapi,
 }
 
 static netdev_tx_t tg3_start_xmit(struct sk_buff *, struct net_device *);
+static netdev_tx_t __tg3_start_xmit(struct sk_buff *, struct net_device *,
+				    u32);
 
 /* Returns true if the queue has been stopped. Note that it may have been
  * restarted since.
@@ -7866,27 +7868,38 @@ static inline bool tg3_maybe_stop_txq(struct tg3_napi *tnapi,
 static int tg3_tso_bug(struct tg3 *tp, struct tg3_napi *tnapi,
 		       struct netdev_queue *txq, struct sk_buff *skb)
 {
-	struct sk_buff *segs, *nskb;
-	u32 frag_cnt_est = skb_shinfo(skb)->gso_segs * 3;
+	unsigned int segs_remaining = skb_shinfo(skb)->gso_segs;
 
-	/* Estimate the number of fragments in the worst case */
-	tg3_maybe_stop_txq(tnapi, txq, frag_cnt_est, frag_cnt_est);
-	if (netif_tx_queue_stopped(txq))
-		return NETDEV_TX_BUSY;
+	if (unlikely(tg3_tx_avail(tnapi) <= segs_remaining)) {
+		if (!skb_is_nonlinear(skb) || skb_linearize(skb))
+			goto tg3_tso_bug_drop;
+		tg3_start_xmit(skb, tp->dev);
+	} else {
+		struct sk_buff *segs, *nskb;
 
-	segs = skb_gso_segment(skb, tp->dev->features &
-				    ~(NETIF_F_TSO | NETIF_F_TSO6));
-	if (IS_ERR(segs) || !segs)
-		goto tg3_tso_bug_end;
+		segs = skb_gso_segment(skb, tp->dev->features &
+				       ~(NETIF_F_TSO | NETIF_F_TSO6 |
+					 NETIF_F_SG));
+		if (IS_ERR(segs) || !segs)
+			goto tg3_tso_bug_drop;
 
-	do {
-		nskb = segs;
-		segs = segs->next;
-		nskb->next = NULL;
-		tg3_start_xmit(nskb, tp->dev);
-	} while (segs);
+		do {
+			nskb = segs;
+			segs = segs->next;
+			nskb->next = NULL;
+			if (--segs_remaining)
+				__tg3_start_xmit(nskb, tp->dev, segs_remaining);
+			else
+				tg3_start_xmit(nskb, tp->dev);
+		} while (segs);
 
-tg3_tso_bug_end:
+		dev_kfree_skb_any(skb);
+	}
+
+	return NETDEV_TX_OK;
+
+tg3_tso_bug_drop:
+	tp->tx_dropped++;
 	dev_kfree_skb_any(skb);
 
 	return NETDEV_TX_OK;
@@ -7895,6 +7908,12 @@ tg3_tso_bug_end:
 /* hard_start_xmit for all devices */
 static netdev_tx_t tg3_start_xmit(struct sk_buff *skb, struct net_device *dev)
 {
+	return __tg3_start_xmit(skb, dev, MAX_SKB_FRAGS + 1);
+}
+
+static netdev_tx_t __tg3_start_xmit(struct sk_buff *skb,
+				    struct net_device *dev, u32 stop_thresh)
+{
 	struct tg3 *tp = netdev_priv(dev);
 	u32 len, entry, base_flags, mss, vlan = 0;
 	u32 budget;
@@ -8102,7 +8121,7 @@ static netdev_tx_t tg3_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	tw32_tx_mbox(tnapi->prodmbox, entry);
 
 	tnapi->tx_prod = entry;
-	tg3_maybe_stop_txq(tnapi, txq, MAX_SKB_FRAGS + 1,
+	tg3_maybe_stop_txq(tnapi, txq, stop_thresh,
 			   TG3_TX_WAKEUP_THRESH(tnapi));
 
 	mmiowb();
@@ -12336,9 +12355,7 @@ static int tg3_set_ringparam(struct net_device *dev, struct ethtool_ringparam *e
 	if ((ering->rx_pending > tp->rx_std_ring_mask) ||
 	    (ering->rx_jumbo_pending > tp->rx_jmb_ring_mask) ||
 	    (ering->tx_pending > TG3_TX_RING_SIZE - 1) ||
-	    (ering->tx_pending <= MAX_SKB_FRAGS + 1) ||
-	    (tg3_flag(tp, TSO_BUG) &&
-	     (ering->tx_pending <= (MAX_SKB_FRAGS * 3))))
+	    (ering->tx_pending <= MAX_SKB_FRAGS + 1))
 		return -EINVAL;
 
 	if (netif_running(dev)) {