X-Patchwork-Submitter: Frank Heimes
X-Patchwork-Id: 1673060
From: frank.heimes@canonical.com
To: kernel-team@lists.ubuntu.com
Subject: [SRU][F][PATCH 0/1] net/mlx5: Avoid processing commands before cmdif is ready (LP: 1987287)
Date: Thu, 1 Sep 2022 18:53:36 +0200
Message-Id: <20220901165337.602338-1-frank.heimes@canonical.com>

BugLink: https://bugs.launchpad.net/bugs/1987287

SRU Justification:

[Impact]

 * If the mlx5 driver is reloaded while the recovery flow is in progress and receives new commands before the command interface is up again, it can dereference a NULL pointer while accessing uninitialized command structures.

 * Commands must therefore not be processed until the command interface is up again.

 * This is achieved with a new cmdif state, which is used to avoid processing commands while cmdif is not ready.

[Fix]

 * Backport of upstream commit f7936ddd35d8b849daf0372770c7c9dbe7910fca "net/mlx5: Avoid processing commands before cmdif is ready"

[Test Plan]

 * An Ubuntu Server 18.04 or 20.04 installation on s390x (LPAR or z/VM) is needed, with Mellanox cards (RoCE Express 2.1) assigned, configured and enabled, running a 5.4 kernel (hwe-5.4 on bionic).
 * Now trigger a recovery (presumably this can be done from the Support Element) and reload the driver at the same time.

 * Make sure the mlx5 driver (module mlx5_core) is loaded and in use; otherwise there is nothing to remove/unload.

 * Now remove/unload the module with:
   sudo modprobe -r mlx5_core
   (if mlx5_ib is loaded, it depends on mlx5_core and has to be removed first)
   and load it again with:
   sudo modprobe mlx5_core

 * Due to the lack of RoCE Express 2.1 hardware, the verification needs to be done by IBM.

[Where problems could occur]

 * If there is an issue with the new 'cmdif' state handling, the interface might not be in the correct state, which could:
   - fail to block commands properly, leaving the situation as it was before,
   - block commands permanently, rendering the hardware unusable,
   - or block commands in the wrong situations, causing unexpected issues and broken behavior.

 * The patch was accepted upstream in v5.7-rc7, so it is not new to the kernel; it has been part of groovy (and later) and is therefore already in use by newer Ubuntu releases.

[Other Info]

 * Since the patch has been upstream since v5.7-rc7, it is already included in jammy and kinetic.

 * The upstream patch includes the line:
   Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters")
   so it looks to me as if marking the patch for upstream stable updates was simply forgotten.

 * SRUs for focal's 5.4 kernel automatically land in bionic's hwe-5.4 kernel, too. But since this fix was specifically requested for bionic's hwe-5.4, I wanted to mention it here.

Eran Ben Elisha (1):
  net/mlx5: Avoid processing commands before cmdif is ready

 drivers/net/ethernet/mellanox/mlx5/core/cmd.c  | 10 ++++++++++
 drivers/net/ethernet/mellanox/mlx5/core/main.c |  4 ++++
 include/linux/mlx5/driver.h                    |  9 +++++++++
 3 files changed, 23 insertions(+)

Acked-by: Tim Gardner
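The cmdif state gating described under [Impact] can be illustrated with a small user-space model. This is a minimal sketch under stated assumptions, not the upstream code: enum cmdif_state, struct cmd_iface, cmd_exec(), cmd_iface_up() and cmd_iface_down() are hypothetical names, the -EIO return value for rejected commands is an assumption, and locking and concurrency are ignored. It only shows the idea that every command is checked against the interface state and rejected unless the interface is UP.

```c
#include <errno.h>

/* Hypothetical command-interface states (illustrative names only,
 * not the definitions from include/linux/mlx5/driver.h). */
enum cmdif_state {
	CMDIF_STATE_UNINITIALIZED,
	CMDIF_STATE_UP,
	CMDIF_STATE_DOWN,
};

/* Hypothetical stand-in for the driver's command interface. */
struct cmd_iface {
	enum cmdif_state state;
	int executed;	/* number of commands actually processed */
};

/* Check the interface state before doing anything else: a command that
 * arrives while the interface is not UP is rejected (assumed -EIO)
 * instead of touching command structures that may not be initialized. */
int cmd_exec(struct cmd_iface *cmd, int opcode)
{
	if (cmd->state != CMDIF_STATE_UP)
		return -EIO;

	(void)opcode;	/* a real driver would post the command here */
	cmd->executed++;
	return 0;
}

/* Load/reload path: mark the interface UP only after it is fully set up. */
void cmd_iface_up(struct cmd_iface *cmd)
{
	cmd->state = CMDIF_STATE_UP;
}

/* Teardown/recovery path: mark the interface DOWN before it is torn down. */
void cmd_iface_down(struct cmd_iface *cmd)
{
	cmd->state = CMDIF_STATE_DOWN;
}
```

In this model, a command submitted before cmd_iface_up() or after cmd_iface_down() returns -EIO and leaves executed unchanged; only commands issued while the state is CMDIF_STATE_UP are processed. Judging by the diffstat, the real patch touches the load/teardown path (main.c), the command path (cmd.c) and the shared header (driver.h), but the exact placement of the state transitions should be taken from the commit itself.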