| Message ID | 20220901165337.602338-1-frank.heimes@canonical.com |
|---|---|
| Series | net/mlx5: Avoid processing commands before cmdif is ready (LP: 1987287) |
On 9/1/22 10:53, frank.heimes@canonical.com wrote:
> BugLink: https://bugs.launchpad.net/bugs/1987287
>
> SRU Justification:
>
> [Impact]
>
> * If the mlx5 driver is reloading while the recovery flow is happening,
>   and it receives new commands before the command interface is up
>   again, this can lead to a NULL pointer dereference when trying to
>   access uninitialized command structures.
>
> * So commands must not be processed before the command
>   interface is up again.
>
> * This is accomplished by a new cmdif state that is used to avoid
>   processing commands while cmdif is not ready.
>
> [Fix]
>
> * Backport of f7936ddd35d8b849daf0372770c7c9dbe7910fca
>   "net/mlx5: Avoid processing commands before cmdif is ready"
>
> [Test Plan]
>
> * An Ubuntu Server for s390x 18.04 or 20.04 LPAR or z/VM installation
>   is needed that has Mellanox cards (RoCE Express 2.1) assigned,
>   configured and enabled, and that runs a 5.4 kernel (hwe-5.4 on bionic).
>
> * Now trigger a recovery (presumably this can be done from the Support
>   Element) and reload the driver at the same time.
>
> * Make sure the module/driver mlx5 is loaded and in use
>   (otherwise it can't be removed/unloaded).
>
> * Now remove/unload the module with:
>   sudo modprobe -r mlx5
>   and (re-)load it again with:
>   sudo modprobe mlx5
>
> * Due to the lack of RoCE Express 2.1 hardware,
>   IBM needs to do the verification.
>
> [Where problems could occur]
>
> * In case there is an issue with 'cmdif', it might not have the correct
>   interface state, which:
>   - might lead to commands not being properly blocked,
>     so the situation is similar to before,
>   - or might cause commands to always be blocked,
>     which would render the hardware useless,
>   - or might block commands in the wrong situations,
>     which would cause unexpected issues and broken behavior.
>
> * Since the patch was accepted upstream with v5.7-rc7, it's
>   not new to the kernel; it was already part of groovy (and above)
>   and is therefore already in use by newer Ubuntu releases.
>
> [Other Info]
>
> * Since the patch has been upstream since v5.7-rc7,
>   it's already included in jammy and kinetic.
>
> * Since the upstream patch includes the line:
>   Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters")
>   it looks to me like marking it for upstream stable updates was
>   simply forgotten.
>
> * Such SRUs for focal's 5.4 will automatically land in bionic's
>   hwe-5.4, too. But since this was specifically requested for
>   bionic's hwe-5.4, I wanted to mention it here.
>
> Eran Ben Elisha (1):
>   net/mlx5: Avoid processing commands before cmdif is ready
>
>  drivers/net/ethernet/mellanox/mlx5/core/cmd.c  | 10 ++++++++++
>  drivers/net/ethernet/mellanox/mlx5/core/main.c |  4 ++++
>  include/linux/mlx5/driver.h                    |  9 +++++++++
>  3 files changed, 23 insertions(+)
>
Acked-by: Tim Gardner <tim.gardner@canonical.com>
On 01.09.22 18:53, frank.heimes@canonical.com wrote:
> BugLink: https://bugs.launchpad.net/bugs/1987287
> [...]

Applied to focal:linux/master-next. Thanks.

-Stefan