Message ID | 20240429200115.29252-1-asmaa@nvidia.com |
---|---|
Series | UBUNTU: SAUCE: mlxbf-gige: Vitesse PHY stuck in a bad state during reboot test |
Hi Asmaa,

This patch fails to build:

/build/jammy/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c: In function 'mlxbf_gige_shutdown':
/build/jammy/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c:569:33: error: 'MLXBF_GIGE_BLUEFIELD3' undeclared (first use in this function)
  569 |         if (priv->hw_version == MLXBF_GIGE_BLUEFIELD3)
      |                                 ^~~~~~~~~~~~~~~~~~~~~

--
Best regards,
Bartlomiej

On Mon, Apr 29, 2024 at 10:02 PM Asmaa Mnebhi <asmaa@nvidia.com> wrote:
>
> BugLink: https://bugs.launchpad.net/bugs/2062384
>
> SRU Justification:
>
> [Impact]
>
> During the QA reboot test, the BF3 Vitesse PHY gets stuck in a bad state, resulting in no IP provisioning. The only way to recover is to power cycle.
> We might have found a software workaround to avoid getting into this state in the first place: suspend the PHY during graceful shutdown. Suspend the PHY = Power down = set bit 11 to 1 in reg 0 of the PHY. This WA passed 1800 reboots on QA's setup.
>
> [Fix]
>
> * During reboot, the mlxbf_gige_shutdown() function makes a call to phy_stop(). phy_stop() calls phy_suspend().
> * Certain Linux PHY drivers, like the Vitesse PHY, don't support suspend() to power down the PHY during shutdown.
> * Our hardware also does not toggle the hard reset signal of the PHY during reboot.
> * Hence, when the PHY is in a bad state, it stays in its bad state until power cycle.
> * We have found a way to prevent the PHY from entering this bad state by suspending the PHY in the case of reboot.
>
> [Test Case]
>
> * Do the reboot test (at least 2000 reboots): run 'reboot' from Linux.
> * Check that the oob_net0 interface is up and the IP is assigned.
> * Please note that if the OOB doesn't get an IP, try reloading the driver (rmmod/modprobe). If that solves the issue, that would be a different bug. In the bug at stake, nothing recovers the OOB IP except power cycle.
>
> [Regression Potential]
>
> * Make sure the redfish DHCP is still working during the reboot test
> * Make sure the OOB gets an IP
>
> [Other]
>
> These changes were made both in the mlxbf-gige driver and UEFI
>
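
For readers unfamiliar with the register operation quoted above ("set bit 11 to 1 in reg 0 of the PHY"), the following is a minimal userspace sketch of that read-modify-write. It is not the patch under review. The `mock_*` names and the in-memory register array are hypothetical stand-ins for real MDIO access; only `MII_BMCR` (0x00) and `BMCR_PDOWN` (0x0800) mirror the standard definitions in the kernel's `<linux/mii.h>`, which is also the bit that the kernel's generic `genphy_suspend()` helper sets.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Standard MII definitions (same values as the kernel's <linux/mii.h>). */
#define MII_BMCR    0x00    /* Basic mode control register (reg 0) */
#define BMCR_PDOWN  0x0800  /* Bit 11: power down the PHY */

#define MOCK_PHY_NUM_REGS 32

/* Hypothetical stand-in for a PHY reachable over MDIO. */
struct mock_phy {
	uint16_t regs[MOCK_PHY_NUM_REGS];
};

/* Hypothetical accessors; a real driver would go through the MDIO bus. */
static uint16_t mock_mdio_read(const struct mock_phy *phy, unsigned int reg)
{
	assert(reg < MOCK_PHY_NUM_REGS);
	return phy->regs[reg];
}

static void mock_mdio_write(struct mock_phy *phy, unsigned int reg, uint16_t val)
{
	assert(reg < MOCK_PHY_NUM_REGS);
	phy->regs[reg] = val;
}

/* Read-modify-write so that the other BMCR bits are preserved. */
static void mock_phy_power_down(struct mock_phy *phy)
{
	uint16_t bmcr = mock_mdio_read(phy, MII_BMCR);

	mock_mdio_write(phy, MII_BMCR, bmcr | BMCR_PDOWN);
}

static bool mock_phy_is_powered_down(const struct mock_phy *phy)
{
	return (mock_mdio_read(phy, MII_BMCR) & BMCR_PDOWN) != 0;
}
```

Calling `mock_phy_power_down()` on a mock PHY whose BMCR holds 0x1140 leaves 0x1940 in the register: only bit 11 changes.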

Thanks Bart. I am actually abandoning this patch since it turns out the WA doesn’t work.

> -----Original Message-----
> From: Bartlomiej Zolnierkiewicz <bartlomiej.zolnierkiewicz@canonical.com>
> Sent: Tuesday, April 30, 2024 6:12 AM
> To: Asmaa Mnebhi <asmaa@nvidia.com>
> Cc: Ubuntu Kernel Team <kernel-team@lists.ubuntu.com>
> Subject: NAK/Cmnt: [SRU][J:linux-bluefield][PATCH v1 0/1] UBUNTU: SAUCE: mlxbf-gige: Vitesse PHY stuck in a bad state during reboot test
>
> Hi Asmaa,
>
> This patch fails to build:
>
> /build/jammy/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c: In function 'mlxbf_gige_shutdown':
> /build/jammy/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c:569:33: error: 'MLXBF_GIGE_BLUEFIELD3' undeclared (first use in this function)
>   569 |         if (priv->hw_version == MLXBF_GIGE_BLUEFIELD3)
>       |                                 ^~~~~~~~~~~~~~~~~~~~~
>
> --
> Best regards,
> Bartlomiej