From patchwork Sat Feb 10 04:00:39 2018
X-Patchwork-Submitter: "Mauro S. M. Rodrigues"
X-Patchwork-Id: 871637
From: "Mauro S. M. Rodrigues"
To: intel-wired-lan@lists.osuosl.org
Date: Sat, 10 Feb 2018 02:00:39 -0200
Message-Id: <1518235239-16757-1-git-send-email-maurosr@linux.vnet.ibm.com>
Subject: [Intel-wired-lan] [PATCH] i40e: Fix bad state due to failed dcbx autonegotiation

When connected to a DCBX-capable switch, a device can be left in a bad state during early link negotiation, which compromises the probe of all of its interfaces:

[ 11.404108] i40e 0002:01:00.0: capability discovery failed, err OK aq_err I40E_AQ_RC_EMODE

The message above tells us that something failed during capability discovery. According to the datasheet, the error I40E_AQ_RC_EMODE (21) means the device is in a mode in which the operation is not allowed. Digging further into the source code, the failure happens during the I40E_PRTGEN_CNF read done with i40e_aq_debug_read_register within i40e_parse_discover_capabilities, which, again according to the datasheet, is not supposed to return that error. I also verified that any other register read, of I40E_GL_FWSTS for instance, fails as well.

Disabling the DCBX capability, or setting it to dcbx-1.01, OUI= , instead of autonegotiation or ieee-dcbx, OUI= , mitigates the issue.

Further evidence that the device gets into a bad state comes from a tcpdump capture taken during autonegotiation. It shows the switch sharing its DCBX settings with willing bit=0. The device then answers with willing=1 to learn the DCBX configuration:

    1... .... = Willing: Yes

After that there is no further communication from the NIC, which leads me to believe that the device entered the bad state while trying to replicate the switch's DCBX settings.

From a device driver standpoint, it is possible to recover from the bad state by issuing a Global Reset and asking for the device to be probed again afterwards by returning -EPROBE_DEFER. With this patch, the following messages are seen:

[ 400.178850] i40e 0002:01:00.0: Using 64-bit DMA iommu bypass
[ 404.179406] i40e 0002:01:00.0: fw 5.1.40981 api 1.5 nvm 5.03 0x80002469 1.1313.0
[ 404.420382] i40e 0002:01:00.0: capability discovery failed, err OK aq_err I40E_AQ_RC_EMODE
[ 404.420473] i40e 0002:01:00.0: Probe failed due to unexpected device state, trying to fix it by resetting the device.
Since the reset was done, the other ports probe just fine:

[ 404.420610] i40e 0002:01:00.1: Using 64-bit DMA iommu bypass
[ 407.659108] i40e 0002:01:00.1: fw 5.1.40981 api 1.5 nvm 5.03 0x80002469 1.1313.0
[ 407.900214] i40e 0002:01:00.1: MAC address: 0c:c4:7a:b7:ff:d9
[ 407.908532] i40e 0002:01:00.1 enP2p1s0f1: renamed from eth0
[ 407.909071] i40e 0002:01:00.1: PCI-Express: Speed 8.0GT/s Width x8
[ 407.909630] i40e 0002:01:00.1: Features: PF-id[1] VFs: 32 VSIs: 34 QP: 20 RSS FD_ATR FD_SB NTUPLE DCB VxLAN Geneve PTP VEPA

Then the first port is re-probed later:

[ 408.203217] i40e 0002:01:00.0: fw 5.1.40981 api 1.5 nvm 5.03 0x80002469 1.1313.0
[ 408.447187] i40e 0002:01:00.0: MAC address: 0c:c4:7a:b7:ff:d8
[ 408.699988] i40e 0002:01:00.0 enP2p1s0f0: renamed from eth0
[ 408.702453] i40e 0002:01:00.0: PCI-Express: Speed 8.0GT/s Width x8
[ 408.703011] i40e 0002:01:00.0: Features: PF-id[0] VFs: 32 VSIs: 34 QP: 20 RSS FD_ATR FD_SB NTUPLE DCB VxLAN Geneve PTP VEPA

Signed-off-by: Mauro S. M. Rodrigues

Conflicts:
	drivers/net/ethernet/intel/i40e/i40e_main.c
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index e31adbc..c41bb0e 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -13513,8 +13513,18 @@ static int i40e_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	i40e_clear_pxe_mode(hw);
 
 	err = i40e_get_capabilities(pf, i40e_aqc_opc_list_func_capabilities);
-	if (err)
+	if (err) {
+		if (hw->aq.asq_last_status == I40E_AQ_RC_EMODE) {
+			dev_warn(&pdev->dev, "Probe failed due to unexpected device state, trying to fix it by resetting the device.\n");
+			i40e_do_reset(pf, BIT(__I40E_GLOBAL_RESET_REQUESTED),
+				      false);
+			/* In this situation we reset and ask for re-probe
+			 * later.
+			 */
+			err = -EPROBE_DEFER;
+		}
 		goto err_adminq_setup;
+	}
 
 	err = i40e_sw_init(pf);
 	if (err) {