[SRU,J:linux-bluefield,v2,0/1] UBUNTU: SAUCE: mlxbf-gige: autonegotiation fails to complete on BF2

Message ID 20240430183627.12961-1-asmaa@nvidia.com

Message

Asmaa Mnebhi April 30, 2024, 6:36 p.m. UTC
BugLink: https://bugs.launchpad.net/bugs/2062384

SRU Justification:

[Impact]

During the reboot test, QA found an intermittent issue where the OOB link is down.
The link is down because the KSZ9031 PHY fails to complete autonegotiation.
Even under "normal" circumstances where autonegotiation completes,
it takes an abnormally long time to do so (on average, at least 8 seconds).

The hardware team and Microchip are involved in debugging this, but the root cause is still unknown.
In the meantime, we need to provide a software workaround since customers are starting to see this issue as well.

[Fix]

* Restart autonegotiation when it fails the first time.
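
The patch itself is not included in this cover letter, so the following is only a minimal user-space sketch of the retry control flow described above, not the driver code. FakePhy, wait_for_aneg, bring_up_link, and the poll counts are all hypothetical names and values chosen for illustration; in the kernel, the restart step would typically go through a phylib helper such as genphy_restart_aneg().

```python
from dataclasses import dataclass


@dataclass
class FakePhy:
    """Hypothetical stand-in for a PHY; not a KSZ9031 register model."""
    # Polls needed for each autonegotiation attempt; None means "never completes".
    polls_until_done: list
    attempt: int = 0
    polls: int = 0

    def restart_aneg(self):
        # Begin a new autonegotiation attempt and reset the poll counter.
        self.attempt += 1
        self.polls = 0

    def aneg_done(self):
        # Report completion once enough polls have elapsed for this attempt.
        if not 1 <= self.attempt <= len(self.polls_until_done):
            return False
        self.polls += 1
        needed = self.polls_until_done[self.attempt - 1]
        return needed is not None and self.polls >= needed


def wait_for_aneg(phy, max_polls):
    # Poll the PHY up to max_polls times; True if autonegotiation completed.
    for _ in range(max_polls):
        if phy.aneg_done():
            return True
    return False


def bring_up_link(phy, max_polls=10, max_restarts=1):
    # Start autonegotiation; if it does not complete in time, restart it
    # (once by default) before giving up. Returns (link_up, restarts_used).
    phy.restart_aneg()
    for restarts in range(max_restarts + 1):
        if wait_for_aneg(phy, max_polls):
            return True, restarts
        if restarts < max_restarts:
            phy.restart_aneg()
    return False, max_restarts
```

For example, bring_up_link(FakePhy([None, 3])) models a first attempt that never completes followed by a restart that succeeds.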

[Test Case]

* On BF2, run the reboot test for 2000 loops.
* Check that the OOB link is up and an IP address is assigned.
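
The check in the second step could be automated along the following lines. This is a sketch under assumptions, not the QA team's actual test: the interface name oob_net0 is assumed, "IP assigned" is interpreted as "has at least one IPv4 address", and the helper names are hypothetical. It parses the JSON output of `ip -j addr show`.

```python
import json
import subprocess


def oob_link_ok(ip_json_text):
    """Return True if the interface reports UP and has at least one IPv4 address."""
    entries = json.loads(ip_json_text)
    if not entries:
        return False
    iface = entries[0]
    has_ipv4 = any(a.get("family") == "inet" for a in iface.get("addr_info", []))
    return iface.get("operstate") == "UP" and has_ipv4


def check_oob(ifname="oob_net0"):
    # "oob_net0" is an assumed interface name; adjust for the system under test.
    out = subprocess.run(
        ["ip", "-j", "addr", "show", "dev", ifname],
        capture_output=True, text=True, check=True,
    ).stdout
    return oob_link_ok(out)
```

Calling check_oob() after each reboot and recording the result per loop would give a pass/fail count over the 2000 iterations.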

[Regression Potential]

* No known regressions.

[Other]
* Note that this issue is specific to BF2 hardware. The same Ethernet code is used for BF3, where we don't see any issues. In fact, the link-up time on BF3 is <= 1s, whereas on BF2 it is > 8s.
* We have been aware of this issue for 2 years and have shared it with the PHY vendor and the hardware team, but no root cause has been identified.

Comments

Bartlomiej Zolnierkiewicz May 6, 2024, 4:04 p.m. UTC | #1
Acked-by: Bartlomiej Zolnierkiewicz <bartlomiej.zolnierkiewicz@canonical.com>

Tim Gardner May 13, 2024, 2:32 p.m. UTC | #2
Acked-by: Tim Gardner <tim.gardner@canonical.com>
Bartlomiej Zolnierkiewicz June 3, 2024, 10:50 a.m. UTC | #3
Applied to jammy:linux-bluefield/master-next. Thanks.

--
Best regards,
Bartlomiej
