[SRU,F:linux-bluefield,v1,0/1] UBUNTU: SAUCE: mlxbf-gige: Vitesse PHY stuck in a bad state during reboot test

Message ID 20240523172503.1746-1-asmaa@nvidia.com
Series UBUNTU: SAUCE: mlxbf-gige: Vitesse PHY stuck in a bad state during reboot test

Message

Asmaa Mnebhi May 23, 2024, 5:25 p.m. UTC
BugLink: https://bugs.launchpad.net/bugs/2064163

SRU Justification:

[Impact]

During the QA reboot test, the BF3 Vitesse PHY gets stuck in a bad state, resulting in no IP provisioning. The only way to recover is a power cycle.
We found a software workaround that avoids entering this state in the first place: disable the OOB port in the shutdown function.
Although the PHY issue only occurs on BF3, disabling the OOB port during
shutdown is a fix that should apply to BF2 as well.

[Fix]

* Prevent the PHY from entering this bad state by disabling
  the OOB port during shutdown.

[Test Case]

* Do the reboot test (at least 2000 reboots): run 'reboot' from Linux.
* Check that the oob_net0 interface is up and an IP address is assigned.
* Please note that if the OOB port doesn't get an IP address, try reloading the driver (rmmod/modprobe).
  If that solves the issue, it is a different bug. In the bug at stake, nothing
  recovers the OOB IP address except a power cycle.
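
The per-reboot check on oob_net0 could be automated along the following lines. This is only a minimal sketch, not part of the patch or of the QA tooling: it assumes iproute2's JSON output (`ip -j addr show`) is available on the target, and the helper names `oob_has_ip` and `query_interface` are hypothetical.

```python
import json
import subprocess


def oob_has_ip(ip_json: str, ifname: str = "oob_net0") -> bool:
    """Return True if `ifname` has the UP flag and at least one IPv4 address.

    `ip_json` is the text printed by `ip -j addr show`.
    """
    for link in json.loads(ip_json):
        if link.get("ifname") != ifname:
            continue
        is_up = "UP" in link.get("flags", [])
        has_ipv4 = any(
            addr.get("family") == "inet" and bool(addr.get("local"))
            for addr in link.get("addr_info", [])
        )
        return is_up and has_ipv4
    # Interface not present in the output at all.
    return False


def query_interface(ifname: str = "oob_net0") -> str:
    """Run `ip -j addr show dev <ifname>` and return its JSON output."""
    result = subprocess.run(
        ["ip", "-j", "addr", "show", "dev", ifname],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

After each reboot, `oob_has_ip(query_interface())` returning False would mark that iteration as a candidate failure. The driver reload described above would still be needed to tell this bug apart from a different one.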

[Regression Potential]

* Make sure the Redfish DHCP is still working during the reboot test.
* Make sure the OOB port gets an IP address.

Comments

Thibault Ferrante May 24, 2024, 10:14 a.m. UTC | #1
The bug targets jammy but this patchset is for focal. What should the target be? Both?
I checked that it applies and builds for both.

Acked-by: Thibault Ferrante <thibault.ferrante@canonical.com>

--
Thibault
Tim Gardner May 24, 2024, 1:30 p.m. UTC | #2
Acked-by: Tim Gardner <tim.gardner@canonical.com>

You need to nominate Focal in the bug report.
Bartlomiej Zolnierkiewicz June 3, 2024, 12:36 p.m. UTC | #3
Applied to focal:linux-bluefield/master-next. Thanks.

--
Best regards,
Bartlomiej
