
[v2,3/3] tg3: Fix tx_pending checks for tg3_tso_bug

Message ID 1408658240-6811-3-git-send-email-bpoirier@suse.de
State Changes Requested, archived
Delegated to: David Miller
Headers show

Commit Message

Benjamin Poirier Aug. 21, 2014, 9:57 p.m. UTC
In tg3_set_ringparam(), the tx_pending test to cover the cases where
tg3_tso_bug() is entered has two problems:
1) the check is only done for certain hardware whereas the workaround
is now used more broadly. IOW, the check may not be performed when it
is needed.
2) the check is too optimistic.

For example, with a 5761 (SHORT_DMA_BUG), tg3_set_ringparam() skips over the
"tx_pending <= (MAX_SKB_FRAGS * 3)" check because TSO_BUG is false. Even if it
did perform the check, with a full-sized skb, frag_cnt_est = 135 whereas the
check is for <= MAX_SKB_FRAGS * 3 (= 17 * 3 = 51), so the check is
insufficient. This leads to the following situation: by setting, e.g.,
tx_pending = 100, there can be an skb that triggers tg3_tso_bug() and that is
large enough to cause tg3_tso_bug() to stop the queue even when it is empty.
We then end up with a netdev watchdog transmit timeout.
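
For reference, a standalone sketch of where the 135 figure comes from. The
full 64 KiB GSO payload and MSS = 1460 are my assumptions for illustration,
not values taken from the patch:

	#include <stdio.h>

	#define MAX_SKB_FRAGS 17

	int main(void)
	{
		unsigned int payload = 65535, mss = 1460;
		unsigned int gso_segs = (payload + mss - 1) / mss;	/* 45 */
		unsigned int frag_cnt_est = gso_segs * 3;		/* 135 */

		printf("estimate %u vs. old check limit %u\n",
		       frag_cnt_est, MAX_SKB_FRAGS * 3);	/* 135 vs. 51 */
		return 0;
	}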

Given that 1) some of the conditions tested for in tg3_tx_frag_set() apply
regardless of the chipset flags and that 2) it is difficult to estimate ahead
of time the max possible number of frames that a large skb may be split into
by gso, we instead take the approach of adjusting dev->gso_max_segs according
to the requested tx_pending size.
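
Concretely, using the macros from the patch below (tx_pending = 100 is picked
only as an example):

	#define TG3_TX_DESC_PER_SEG(seg_nb)	((seg_nb) * 3)
	#define TG3_TX_SEG_PER_DESC(desc_nb)	((desc_nb) / 3)

	/* after "ethtool -G <dev> tx 100":
	 *   dev->gso_max_segs = TG3_TX_SEG_PER_DESC(100 - 1) = 33
	 * worst case TG3_TX_DESC_PER_SEG(33) = 99 descriptors, which
	 * always fits in the 100-entry ring
	 */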

This puts us in the exceptional situation that a single skb that triggers
tg3_tso_bug() may require the entire tx ring. Usually the tx queue is woken up
when at least a quarter of it is available (TG3_TX_WAKEUP_THRESH), but that
would be insufficient now. To avoid useless wakeups, the tx queue wake-up
threshold is made dynamic. Likewise, usually the tx queue is stopped as soon
as an skb with max frags may overrun it. Since the skbs submitted from
tg3_tso_bug() use a controlled number of descriptors, the tx queue stop
threshold may be lowered.
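
To put numbers on that, assuming the driver default
TG3_DEF_TX_RING_PENDING = 511:

	#include <stdio.h>

	#define MAX_SKB_FRAGS 17
	#define MAX(a, b) ((a) > (b) ? (a) : (b))

	int main(void)
	{
		unsigned int tx_pending = 511;
		/* default wake-up threshold, as in TG3_TX_WAKEUP_THRESH */
		unsigned int wake = MAX(tx_pending / 4, MAX_SKB_FRAGS + 1);
		/* worst-case tg3_tso_bug() skb under the new gso clamp */
		unsigned int worst = ((tx_pending - 1) / 3) * 3;

		/* prints 127 and 510: waking the queue at 127 free entries
		 * is useless for an skb needing up to 510 descriptors */
		printf("wake threshold %u, worst-case skb %u descriptors\n",
		       wake, worst);
		return 0;
	}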

Signed-off-by: Benjamin Poirier <bpoirier@suse.de>
---
Changes v1->v2
* in tg3_set_ringparam(), reduce gso_max_segs further to budget 3 descriptors
  per gso seg instead of only 1 as in v1
* in tg3_tso_bug(), check that this estimation (3 desc/seg) holds, otherwise
  linearize some skbs as needed (see the condensed sketch after this list)
* in tg3_start_xmit(), make the queue stop threshold a parameter, for the
  reason explained in the commit description
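
A condensed, hand-annotated view of the resulting tg3_tso_bug() loop
(abridged from the diff below; not compilable as-is):

	do {
		/* descriptors needed by this segment */
		unsigned int desc_cnt = skb_shinfo(segs)->nr_frags + 1;

		nskb = segs;
		segs = segs->next;
		nskb->next = NULL;

		/* Not enough room for this segment plus one descriptor
		 * per remaining segment? Linearize it down to a single
		 * descriptor; if that fails, free the rest and bail. */
		if (tg3_tx_avail(tnapi) <= segs_remaining - 1 + desc_cnt &&
		    skb_linearize(nskb)) {
			/* ... free nskb and every remaining seg ... */
			goto tg3_tso_bug_end;
		}
		segs_remaining--;
		if (segs_remaining)
			/* lowered stop threshold while the train drains */
			__tg3_start_xmit(nskb, tp->dev, segs_remaining);
		else
			/* last segment: back to the default threshold */
			tg3_start_xmit(nskb, tp->dev);
	} while (segs);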

I was concerned that this last change, because of the extra call in the
default xmit path, might impact performance, so I performed an rr latency
test, but I did not measure a significant impact. That test used the default
mtu and ring size.

# perf stat -r10 -ad netperf -H 192.168.9.30 -l60 -T 0,0 -t omni -- -d rr

* without patches
	rr values: 7039.63 6865.03 6939.21 6919.31 6931.88 6932.74 6925.1 6953.33 6868.43 6935.65
	sample size: 10
	mean: 6931.031
	standard deviation: 48.10918
	quantiles: 6865.03 6920.757 6932.31 6938.32 7039.63
	6930±50

 Performance counter stats for 'netperf -H 192.168.9.30 -l60 -T 0,0 -t omni -- -d rr' (10 runs):

     480643.024723 task-clock                #    8.001 CPUs utilized            ( +-  0.00% ) [100.00%]
           855,136 context-switches          #    0.002 M/sec                    ( +-  0.23% ) [100.00%]
               521 CPU-migrations            #    0.000 M/sec                    ( +-  6.49% ) [100.00%]
               104 page-faults               #    0.000 M/sec                    ( +-  2.73% )
   298,416,906,437 cycles                    #    0.621 GHz                      ( +-  4.08% ) [15.01%]
   812,072,320,370 stalled-cycles-frontend   #  272.13% frontend cycles idle     ( +-  1.89% ) [25.01%]
   685,633,562,247 stalled-cycles-backend    #  229.76% backend  cycles idle     ( +-  2.50% ) [35.00%]
   117,665,891,888 instructions              #    0.39  insns per cycle
                                             #    6.90  stalled cycles per insn  ( +-  2.22% ) [45.00%]
    26,158,399,505 branches                  #   54.424 M/sec                    ( +-  2.10% ) [50.00%]
       205,688,614 branch-misses             #    0.79% of all branches          ( +-  0.78% ) [50.00%]
    27,882,474,171 L1-dcache-loads           #   58.011 M/sec                    ( +-  1.98% ) [50.00%]
       369,911,372 L1-dcache-load-misses     #    1.33% of all L1-dcache hits    ( +-  0.62% ) [50.00%]
        76,240,847 LLC-loads                 #    0.159 M/sec                    ( +-  1.04% ) [40.00%]
             3,220 LLC-load-misses           #    0.00% of all LL-cache hits     ( +- 19.49% ) [ 5.00%]

      60.074059340 seconds time elapsed                                          ( +-  0.00% )

* with patches
	rr values: 6732.65 6920.1 6909.46 7032.41 6864.43 6897.6 6815.19 6967.83 6849.23 6929.52
	sample size: 10
	mean: 6891.842
	standard deviation: 82.91901
	quantiles: 6732.65 6853.03 6903.53 6927.165 7032.41
	6890±80

 Performance counter stats for 'netperf -H 192.168.9.30 -l60 -T 0,0 -t omni -- -d rr' (10 runs):

     480675.949728 task-clock                #    8.001 CPUs utilized            ( +-  0.01% ) [100.00%]
           850,461 context-switches          #    0.002 M/sec                    ( +-  0.37% ) [100.00%]
               564 CPU-migrations            #    0.000 M/sec                    ( +-  5.67% ) [100.00%]
               417 page-faults               #    0.000 M/sec                    ( +- 76.04% )
   287,019,442,295 cycles                    #    0.597 GHz                      ( +-  7.16% ) [15.01%]
   828,198,830,689 stalled-cycles-frontend   #  288.55% frontend cycles idle     ( +-  3.01% ) [25.01%]
   718,230,307,166 stalled-cycles-backend    #  250.24% backend  cycles idle     ( +-  3.53% ) [35.00%]
   117,976,598,188 instructions              #    0.41  insns per cycle
                                             #    7.02  stalled cycles per insn  ( +-  4.06% ) [45.00%]
    26,715,853,108 branches                  #   55.580 M/sec                    ( +-  3.77% ) [50.00%]
       198,787,673 branch-misses             #    0.74% of all branches          ( +-  0.86% ) [50.00%]
    28,416,922,166 L1-dcache-loads           #   59.119 M/sec                    ( +-  3.54% ) [50.00%]
       367,613,007 L1-dcache-load-misses     #    1.29% of all L1-dcache hits    ( +-  0.47% ) [50.00%]
        75,260,575 LLC-loads                 #    0.157 M/sec                    ( +-  2.24% ) [40.00%]
             5,777 LLC-load-misses           #    0.01% of all LL-cache hits     ( +- 36.03% ) [ 5.00%]

      60.077898757 seconds time elapsed                                          ( +-  0.01% )

I reproduced this bug using the same approach explained in patch 1.
The bug reproduces with tx_pending <= 135.

---
 drivers/net/ethernet/broadcom/tg3.c | 67 +++++++++++++++++++++++++++++--------
 drivers/net/ethernet/broadcom/tg3.h |  1 +
 2 files changed, 54 insertions(+), 14 deletions(-)

Comments

Prashant Sreedharan Aug. 22, 2014, 4:31 a.m. UTC | #1
On Thu, 2014-08-21 at 14:57 -0700, Benjamin Poirier wrote:
> [...]
>  	do {
> +		unsigned int desc_cnt = skb_shinfo(segs)->nr_frags + 1;
> +
>  		nskb = segs;
>  		segs = segs->next;
>  		nskb->next = NULL;
> -		tg3_start_xmit(nskb, tp->dev);
> +
> +		if (tg3_tx_avail(tnapi) <= segs_remaining - 1 + desc_cnt &&
> +		    skb_linearize(nskb)) {
> +			nskb->next = segs;
> +			segs = nskb;
> +			do {
> +				nskb = segs->next;
> +
> +				dev_kfree_skb_any(segs);
> +				segs = nskb;
> +			} while (segs);

If skb_linearize() fails, need to increment the tp->tx_dropped count

> +			goto tg3_tso_bug_end;
> +		}
> +		segs_remaining--;
> +		if (segs_remaining)
> +			__tg3_start_xmit(nskb, tp->dev, segs_remaining);

To clarify, passing segs_remaining will make sure the queue is never
stopped, correct?

> +		else
> +			tg3_start_xmit(nskb, tp->dev);
>  	} while (segs);
>  
> [...]


Prashant Sreedharan Aug. 22, 2014, 11:17 p.m. UTC | #2
Benjamin, thanks for the patch. Broadcom QA will be testing the changes.
Couple of comments below.
>  	segs = skb_gso_segment(skb, tp->dev->features &
>  				    ~(NETIF_F_TSO | NETIF_F_TSO6));
> -	if (IS_ERR(segs) || !segs)
> +	if (IS_ERR_OR_NULL(segs))
>  		goto tg3_tso_bug_end;
>  
>  	do {
> +		unsigned int desc_cnt = skb_shinfo(segs)->nr_frags + 1;
> +
>  		nskb = segs;
>  		segs = segs->next;
>  		nskb->next = NULL;
> -		tg3_start_xmit(nskb, tp->dev);
> +
> +		if (tg3_tx_avail(tnapi) <= segs_remaining - 1 + desc_cnt &&
> +		    skb_linearize(nskb)) {
> +			nskb->next = segs;
> +			segs = nskb;
> +			do {
> +				nskb = segs->next;
> +
> +				dev_kfree_skb_any(segs);
> +				segs = nskb;
> +			} while (segs);

If skb_linearize() fails, need to increment the tp->tx_dropped count

> +			goto tg3_tso_bug_end;
> +		}
> +		segs_remaining--;
> +		if (segs_remaining)
> +			__tg3_start_xmit(nskb, tp->dev, segs_remaining);

To clarify, passing segs_remaining will make sure the queue is never
stopped, correct?

> +		else
> +			tg3_start_xmit(nskb, tp->dev);
>  	} while (segs);
>  


Benjamin Poirier Aug. 26, 2014, 7:25 p.m. UTC | #3
On 2014/08/22 16:17, Prashant Sreedharan wrote:
> Benjamin, thanks for the patch. Broadcom QA will be testing the changes.
> Couple of comments below.
> >  	segs = skb_gso_segment(skb, tp->dev->features &
> >  				    ~(NETIF_F_TSO | NETIF_F_TSO6));
> > -	if (IS_ERR(segs) || !segs)
> > +	if (IS_ERR_OR_NULL(segs))
> >  		goto tg3_tso_bug_end;
> >  
> >  	do {
> > +		unsigned int desc_cnt = skb_shinfo(segs)->nr_frags + 1;
> > +
> >  		nskb = segs;
> >  		segs = segs->next;
> >  		nskb->next = NULL;
> > -		tg3_start_xmit(nskb, tp->dev);
> > +
> > +		if (tg3_tx_avail(tnapi) <= segs_remaining - 1 + desc_cnt &&
> > +		    skb_linearize(nskb)) {
> > +			nskb->next = segs;
> > +			segs = nskb;
> > +			do {
> > +				nskb = segs->next;
> > +
> > +				dev_kfree_skb_any(segs);
> > +				segs = nskb;
> > +			} while (segs);
> 
> If skb_linearize() fails, need to increment the tp->tx_dropped count

Sorry for the delay, while testing this error path I noticed a potential
problem. There should be an additional check here to stop the queue with
the default threshold. Otherwise, the netdev_err message at the start of
__tg3_start_xmit() could be triggered when the next frame is
transmitted. That is because the previous calls to __tg3_start_xmit() in
tg3_tso_bug() may have been using a stop_thresh=segs_remaining that is <
MAX_SKB_FRAGS + 1.
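
For illustration, the extra check might look something like this (my sketch
of the fix described above, untested and not part of the posted patch):

	/* in the skb_linearize() failure path, before freeing the
	 * remaining segments: go back to the default threshold so the
	 * next frame cannot trip the "Tx Ring full when queue awake!"
	 * error after earlier segments ran with a lower stop_thresh */
	if (tg3_tx_avail(tnapi) <= MAX_SKB_FRAGS + 1) {
		netif_tx_stop_queue(txq);
		tnapi->wakeup_thresh = TG3_TX_WAKEUP_THRESH(tnapi);
		smp_mb();
		if (tg3_tx_avail(tnapi) > tnapi->wakeup_thresh)
			netif_tx_wake_queue(txq);
	}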

> 
> > +			goto tg3_tso_bug_end;
> > +		}
> > +		segs_remaining--;
> > +		if (segs_remaining)
> > +			__tg3_start_xmit(nskb, tp->dev, segs_remaining);
> 
> To clarify, passing segs_remaining will make sure the queue is never
> stopped, correct?

It makes sure the queue is not stopped before we are finished submitting
all gso segments.

This is what's alluded to in this part of the commit message:
	This puts us in the exceptional situation that a single skb that
	triggers tg3_tso_bug() may require the entire tx ring. [...]
	Likewise, usually the tx queue is stopped as soon as an skb with
	max frags may overrun it. Since the skbs submitted from
	tg3_tso_bug() use a controlled number of descriptors, the tx
	queue stop threshold may be lowered.
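
As a made-up numeric example (ignoring the linearize fallback): with
segs_remaining = 4 and a segment needing desc_cnt = 3 descriptors, the loop
only submits when tg3_tx_avail() > segs_remaining - 1 + desc_cnt = 6, which
leaves at least one free descriptor per remaining segment:

	#include <assert.h>

	int main(void)
	{
		unsigned int avail = 10;	/* free descriptors (example) */
		unsigned int segs_remaining = 4, desc_cnt = 3;

		/* the submit condition from tg3_tso_bug()... */
		assert(avail > segs_remaining - 1 + desc_cnt);
		/* ...implies one descriptor per remaining segment */
		assert(avail - desc_cnt >= segs_remaining);
		return 0;
	}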

> 
> > +		else
> > +			tg3_start_xmit(nskb, tp->dev);
> >  	} while (segs);
> >  
> 
> 

Patch

diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
index 0cecd6d..c29f2e3 100644
--- a/drivers/net/ethernet/broadcom/tg3.c
+++ b/drivers/net/ethernet/broadcom/tg3.c
@@ -204,6 +204,10 @@  static inline void _tg3_flag_clear(enum TG3_FLAGS flag, unsigned long *bits)
 /* minimum number of free TX descriptors required to wake up TX process */
 #define TG3_TX_WAKEUP_THRESH(tnapi)	max_t(u32, (tnapi)->tx_pending / 4, \
 					      MAX_SKB_FRAGS + 1)
+/* estimate a certain number of descriptors per gso segment */
+#define TG3_TX_DESC_PER_SEG(seg_nb)	((seg_nb) * 3)
+#define TG3_TX_SEG_PER_DESC(desc_nb)	((desc_nb) / 3)
+
 #define TG3_TX_BD_DMA_MAX_2K		2048
 #define TG3_TX_BD_DMA_MAX_4K		4096
 
@@ -6609,10 +6613,10 @@  static void tg3_tx(struct tg3_napi *tnapi)
 	smp_mb();
 
 	if (unlikely(netif_tx_queue_stopped(txq) &&
-		     (tg3_tx_avail(tnapi) > TG3_TX_WAKEUP_THRESH(tnapi)))) {
+		     (tg3_tx_avail(tnapi) > tnapi->wakeup_thresh))) {
 		__netif_tx_lock(txq, smp_processor_id());
 		if (netif_tx_queue_stopped(txq) &&
-		    (tg3_tx_avail(tnapi) > TG3_TX_WAKEUP_THRESH(tnapi)))
+		    (tg3_tx_avail(tnapi) > tnapi->wakeup_thresh))
 			netif_tx_wake_queue(txq);
 		__netif_tx_unlock(txq);
 	}
@@ -7830,6 +7834,8 @@  static int tigon3_dma_hwbug_workaround(struct tg3_napi *tnapi,
 }
 
 static netdev_tx_t tg3_start_xmit(struct sk_buff *, struct net_device *);
+static netdev_tx_t __tg3_start_xmit(struct sk_buff *, struct net_device *,
+				    u32);
 
 /* Use GSO to workaround all TSO packets that meet HW bug conditions
  * indicated in tg3_tx_frag_set()
@@ -7838,11 +7844,13 @@  static int tg3_tso_bug(struct tg3 *tp, struct tg3_napi *tnapi,
 		       struct netdev_queue *txq, struct sk_buff *skb)
 {
 	struct sk_buff *segs, *nskb;
-	u32 frag_cnt_est = skb_shinfo(skb)->gso_segs * 3;
+	unsigned int segs_remaining = skb_shinfo(skb)->gso_segs;
+	u32 desc_cnt_est = TG3_TX_DESC_PER_SEG(segs_remaining);
 
-	/* Estimate the number of fragments in the worst case */
-	if (unlikely(tg3_tx_avail(tnapi) <= frag_cnt_est)) {
+	if (unlikely(tg3_tx_avail(tnapi) <= desc_cnt_est)) {
 		netif_tx_stop_queue(txq);
+		tnapi->wakeup_thresh = desc_cnt_est;
+		BUG_ON(tnapi->wakeup_thresh >= tnapi->tx_pending);
 
 		/* netif_tx_stop_queue() must be done before checking
 		 * checking tx index in tg3_tx_avail() below, because in
@@ -7850,7 +7858,7 @@  static int tg3_tso_bug(struct tg3 *tp, struct tg3_napi *tnapi,
 		 * netif_tx_queue_stopped().
 		 */
 		smp_mb();
-		if (tg3_tx_avail(tnapi) <= frag_cnt_est)
+		if (tg3_tx_avail(tnapi) <= tnapi->wakeup_thresh)
 			return NETDEV_TX_BUSY;
 
 		netif_tx_wake_queue(txq);
@@ -7858,14 +7866,33 @@  static int tg3_tso_bug(struct tg3 *tp, struct tg3_napi *tnapi,
 
 	segs = skb_gso_segment(skb, tp->dev->features &
 				    ~(NETIF_F_TSO | NETIF_F_TSO6));
-	if (IS_ERR(segs) || !segs)
+	if (IS_ERR_OR_NULL(segs))
 		goto tg3_tso_bug_end;
 
 	do {
+		unsigned int desc_cnt = skb_shinfo(segs)->nr_frags + 1;
+
 		nskb = segs;
 		segs = segs->next;
 		nskb->next = NULL;
-		tg3_start_xmit(nskb, tp->dev);
+
+		if (tg3_tx_avail(tnapi) <= segs_remaining - 1 + desc_cnt &&
+		    skb_linearize(nskb)) {
+			nskb->next = segs;
+			segs = nskb;
+			do {
+				nskb = segs->next;
+
+				dev_kfree_skb_any(segs);
+				segs = nskb;
+			} while (segs);
+			goto tg3_tso_bug_end;
+		}
+		segs_remaining--;
+		if (segs_remaining)
+			__tg3_start_xmit(nskb, tp->dev, segs_remaining);
+		else
+			tg3_start_xmit(nskb, tp->dev);
 	} while (segs);
 
 tg3_tso_bug_end:
@@ -7877,6 +7904,12 @@  tg3_tso_bug_end:
 /* hard_start_xmit for all devices */
 static netdev_tx_t tg3_start_xmit(struct sk_buff *skb, struct net_device *dev)
 {
+	return __tg3_start_xmit(skb, dev, MAX_SKB_FRAGS + 1);
+}
+
+static netdev_tx_t __tg3_start_xmit(struct sk_buff *skb,
+				    struct net_device *dev, u32 stop_thresh)
+{
 	struct tg3 *tp = netdev_priv(dev);
 	u32 len, entry, base_flags, mss, vlan = 0;
 	u32 budget;
@@ -7905,12 +7938,17 @@  static netdev_tx_t tg3_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	if (unlikely(budget <= (skb_shinfo(skb)->nr_frags + 1))) {
 		if (!netif_tx_queue_stopped(txq)) {
 			netif_tx_stop_queue(txq);
+			tnapi->wakeup_thresh = TG3_TX_WAKEUP_THRESH(tnapi);
 
 			/* This is a hard error, log it. */
 			netdev_err(dev,
 				   "BUG! Tx Ring full when queue awake!\n");
 		}
-		return NETDEV_TX_BUSY;
+		smp_mb();
+		if (tg3_tx_avail(tnapi) <= tnapi->wakeup_thresh)
+			return NETDEV_TX_BUSY;
+
+		netif_tx_wake_queue(txq);
 	}
 
 	entry = tnapi->tx_prod;
@@ -8087,8 +8125,9 @@  static netdev_tx_t tg3_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	tw32_tx_mbox(tnapi->prodmbox, entry);
 
 	tnapi->tx_prod = entry;
-	if (unlikely(tg3_tx_avail(tnapi) <= (MAX_SKB_FRAGS + 1))) {
+	if (unlikely(tg3_tx_avail(tnapi) <= stop_thresh)) {
 		netif_tx_stop_queue(txq);
+		tnapi->wakeup_thresh = TG3_TX_WAKEUP_THRESH(tnapi);
 
 		/* netif_tx_stop_queue() must be done before checking
 		 * checking tx index in tg3_tx_avail() below, because in
@@ -8096,7 +8135,7 @@  static netdev_tx_t tg3_start_xmit(struct sk_buff *skb, struct net_device *dev)
 		 * netif_tx_queue_stopped().
 		 */
 		smp_mb();
-		if (tg3_tx_avail(tnapi) > TG3_TX_WAKEUP_THRESH(tnapi))
+		if (tg3_tx_avail(tnapi) > tnapi->wakeup_thresh)
 			netif_tx_wake_queue(txq);
 	}
 
@@ -12319,9 +12358,7 @@  static int tg3_set_ringparam(struct net_device *dev, struct ethtool_ringparam *e
 	if ((ering->rx_pending > tp->rx_std_ring_mask) ||
 	    (ering->rx_jumbo_pending > tp->rx_jmb_ring_mask) ||
 	    (ering->tx_pending > TG3_TX_RING_SIZE - 1) ||
-	    (ering->tx_pending <= MAX_SKB_FRAGS + 1) ||
-	    (tg3_flag(tp, TSO_BUG) &&
-	     (ering->tx_pending <= (MAX_SKB_FRAGS * 3))))
+	    (ering->tx_pending <= MAX_SKB_FRAGS + 1))
 		return -EINVAL;
 
 	if (netif_running(dev)) {
@@ -12341,6 +12378,7 @@  static int tg3_set_ringparam(struct net_device *dev, struct ethtool_ringparam *e
 	if (tg3_flag(tp, JUMBO_RING_ENABLE))
 		tp->rx_jumbo_pending = ering->rx_jumbo_pending;
 
+	dev->gso_max_segs = TG3_TX_SEG_PER_DESC(ering->tx_pending - 1);
 	for (i = 0; i < tp->irq_max; i++)
 		tp->napi[i].tx_pending = ering->tx_pending;
 
@@ -17817,6 +17855,7 @@  static int tg3_init_one(struct pci_dev *pdev,
 		else
 			sndmbx += 0xc;
 	}
+	dev->gso_max_segs = TG3_TX_SEG_PER_DESC(TG3_DEF_TX_RING_PENDING - 1);
 
 	tg3_init_coal(tp);
 
diff --git a/drivers/net/ethernet/broadcom/tg3.h b/drivers/net/ethernet/broadcom/tg3.h
index 461acca..6a7e13d 100644
--- a/drivers/net/ethernet/broadcom/tg3.h
+++ b/drivers/net/ethernet/broadcom/tg3.h
@@ -3006,6 +3006,7 @@  struct tg3_napi {
 	u32				tx_pending;
 	u32				last_tx_cons;
 	u32				prodmbox;
+	u32				wakeup_thresh;
 	struct tg3_tx_buffer_desc	*tx_ring;
 	struct tg3_tx_ring_info		*tx_buffers;