From patchwork Fri Jan 5 11:02:55 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Robert Malz X-Patchwork-Id: 1883122 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=lists.ubuntu.com (client-ip=185.125.189.65; helo=lists.ubuntu.com; envelope-from=kernel-team-bounces@lists.ubuntu.com; receiver=patchwork.ozlabs.org) Received: from lists.ubuntu.com (lists.ubuntu.com [185.125.189.65]) (using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4T6HDl0NnMz1yPM for ; Sat, 6 Jan 2024 08:49:59 +1100 (AEDT) Received: from localhost ([127.0.0.1] helo=lists.ubuntu.com) by lists.ubuntu.com with esmtp (Exim 4.86_2) (envelope-from ) id 1rLs4e-0000Qt-Qk; Fri, 05 Jan 2024 21:49:52 +0000 Received: from smtp-relay-internal-1.internal ([10.131.114.114] helo=smtp-relay-internal-1.canonical.com) by lists.ubuntu.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1rLhyd-0001wq-Ie for kernel-team@lists.ubuntu.com; Fri, 05 Jan 2024 11:02:59 +0000 Received: from mail-lf1-f72.google.com (mail-lf1-f72.google.com [209.85.167.72]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by smtp-relay-internal-1.canonical.com (Postfix) with ESMTPS id 6A0803F154 for ; Fri, 5 Jan 2024 11:02:59 +0000 (UTC) Received: by mail-lf1-f72.google.com with SMTP id 2adb3069b0e04-50e55470b49so1088027e87.0 for ; Fri, 05 Jan 2024 03:02:59 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1704452578; x=1705057378; h=content-transfer-encoding:mime-version:message-id:date:subject:to 
:from:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=b8fbvuBPIFZQbAkniSocjHbPZISddX1/hVgCUN4wep0=; b=uAqBNGehIe/TtR/LyzN/4mo5UiHFXquq5LiwlpF3/FT4m7ykTnT+Esp1ZrvvRwHmD9 mVDJVHv+9vvep+i9Igz5w0/xQLMMs8jh+9+jX7p9WEnPVBVDFjAPSeon6ESTdaboJOAe ZyDI/xiCU+dyR3gYNOo5kOmWWR4Ip9w+OT6Or7opUHfwEKryHxZ45lmSXDoFIwkUJasT FjCuA2D6vm4XULGlvwbq3gwh5yn1kYKSe6+nA7YFWe0LHgaEVDIUN+b/BE51Oevqn644 IS+da1xI6ql7bbTK0fJBLwj4uPuDZBz94ZLfeIfTcLBReg0UKtcieQmgv8pdBaLsSnMD Gxug== X-Gm-Message-State: AOJu0Yz25MwJx+rk5CUGA+zNSpxtbcAgJckQnKnwJ4pGBRgNv02ij/Ml Qs/12vLBmCL1ZMVwTgWWToLMPZmtalEemD9O/il8pKCVS0zsCGz9HHSr7YqdGwdR4s7wXM2hOoc dNXYsUKrfOos6kplLAyFcHaaIwQhtXbZQgNh8TRpjCw0nHxxk0WvGwf7n X-Received: by 2002:a05:6512:21a8:b0:50e:a6f8:aac0 with SMTP id c8-20020a05651221a800b0050ea6f8aac0mr910168lft.20.1704452578360; Fri, 05 Jan 2024 03:02:58 -0800 (PST) X-Google-Smtp-Source: AGHT+IHqFNdwyQdYbofizrr9An38+BUSva8dXn4FWgrRrnqyjL0wb/Q9MqQtFI/fcz7wJC34nHwHzA== X-Received: by 2002:a05:6512:21a8:b0:50e:a6f8:aac0 with SMTP id c8-20020a05651221a800b0050ea6f8aac0mr910165lft.20.1704452577963; Fri, 05 Jan 2024 03:02:57 -0800 (PST) Received: from rmalz.. (89-64-27-150.dynamic.chello.pl. 
[89.64.27.150]) by smtp.gmail.com with ESMTPSA id wh5-20020a170906fd0500b00a294c744fcasm511758ejb.182.2024.01.05.03.02.57 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 05 Jan 2024 03:02:57 -0800 (PST)
From: Robert Malz
To: kernel-team@lists.ubuntu.com
Subject: [PATCH 0/1][SRU][M] Intel E810 transmit hang with bonding enabled
Date: Fri, 5 Jan 2024 12:02:55 +0100
Message-Id: <20240105110256.1455465-1-robert.malz@canonical.com>
X-Mailer: git-send-email 2.34.1
MIME-Version: 1.0
X-Mailman-Approved-At: Fri, 05 Jan 2024 21:49:36 +0000
X-BeenThere: kernel-team@lists.ubuntu.com
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: Kernel team discussions
Errors-To: kernel-team-bounces@lists.ubuntu.com
Sender: "kernel-team"

BugLink: https://bugs.launchpad.net/bugs/2036239

[Impact]
 * The issue causes a transmit hang on E810 ports with bonding enabled.
 * Based on the provided logs, the TX hang can last as long as a couple of
   minutes, but in most scenarios the network recovers after the ice driver
   performs a PF reset (the TX hang handler routine).
 * The issue was originally observed during Tempest tests on a newly created
   OpenStack cluster, so the cluster could not be certified.

[Fix]
 * Initially, Intel engineers proposed a workaround that disables LAG
   initialization [1]. This change was tested in an environment where the
   issue reproduces easily. After multiple iterations, no reproduction was
   observed.
 * Shortly after, Intel proposed a patch [2] that disables LAG initialization
   if the NVM does not expose the proper capabilities.

[Test Plan]
 * To reproduce the issue, a cluster of more than 20 nodes with Ceph-based
   storage was used. The problem sometimes manifested while deploying the
   cluster, or after deployment during the Tempest test run.
 * The issue could appear on a random node, making reproduction hard to achieve.
 * Multiple stress tests on a single host with a similar configuration did not
   reproduce the issue.

[Where problems could occur]
 * All ice drivers with ice_lag_event_handler registered can expose the issue.
   This handler is not implemented in 20.04.
 * CVL4.2 and older NVM images for E810 do not expose SRIOV LAG capabilities
   (CVL4.3 wasn't checked), meaning that at some point an NVM with this
   capability will be released. Although the issue is potentially caused by
   using features without proper FW support [2], we want to take a closer look
   once NVMs with proper support are introduced.

[1] - https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2036239/comments/40
[2] - https://lists.osuosl.org/pipermail/intel-wired-lan/Week-of-Mon-20231211/038588.html

4d50fcdc2476eef94c14c6761073af5667bb43b6

Dave Ertman (1):
  [SRU][M][PATCH 0/1] ice: alter feature support check for SRIOV and LAG

 drivers/net/ethernet/intel/ice/ice.h          |  2 ++
 .../net/ethernet/intel/ice/ice_adminq_cmd.h   |  3 +++
 drivers/net/ethernet/intel/ice/ice_common.c   |  8 ++++++
 drivers/net/ethernet/intel/ice/ice_lag.c      | 25 +++++++++++++++++++
 drivers/net/ethernet/intel/ice/ice_lib.c      |  2 +-
 drivers/net/ethernet/intel/ice/ice_lib.h      |  1 +
 drivers/net/ethernet/intel/ice/ice_type.h     |  2 ++
 7 files changed, 42 insertions(+), 1 deletion(-)
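
For readers unfamiliar with the approach described in [Fix], the idea of
skipping LAG initialization when the NVM does not advertise the capability
can be sketched roughly as below. This is only an illustrative, user-space
sketch: the types and functions (struct fw_caps, struct port,
port_init_features, port_init_lag) are hypothetical and are not the ice
driver's actual code or API.

```c
#include <stdbool.h>

/* Hypothetical stand-in for capabilities read from the NVM/firmware. */
struct fw_caps {
	bool sriov_lag;		/* true if the NVM advertises SRIOV LAG */
};

/* Hypothetical feature bit, not a real ice driver constant. */
#define FEAT_SRIOV_LAG	(1u << 0)

/* Hypothetical per-port state. */
struct port {
	struct fw_caps caps;
	unsigned int features;	/* bitmap of enabled features */
	bool lag_initialized;
};

/* Enable a feature only when the firmware reports support for it. */
static void port_init_features(struct port *p)
{
	if (p->caps.sriov_lag)
		p->features |= FEAT_SRIOV_LAG;
}

/*
 * Initialize LAG only if the feature bit was set from the firmware
 * capabilities. Returns true if LAG was initialized, false if skipped.
 */
static bool port_init_lag(struct port *p)
{
	if (!(p->features & FEAT_SRIOV_LAG))
		return false;	/* no FW support: leave LAG disabled */

	p->lag_initialized = true;
	return true;
}
```

With this gating, a port whose NVM lacks the capability never reaches the LAG
setup path, which is the behaviour the workaround [1] forces unconditionally.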