Message ID | 20240430183627.12961-1-asmaa@nvidia.com |
---|---|
Series | UBUNTU: SAUCE: mlxbf-gige: autonegotiation fails to complete on BF2 |
Acked-by: Bartlomiej Zolnierkiewicz <bartlomiej.zolnierkiewicz@canonical.com>

On Tue, Apr 30, 2024 at 8:37 PM Asmaa Mnebhi <asmaa@nvidia.com> wrote:
>
> BugLink: https://bugs.launchpad.net/bugs/2062384
>
> SRU Justification:
>
> [Impact]
>
> During the reboot test, QA found an intermittent issue where the OOB link is down.
> The link is down because the KSZ9031 PHY fails to complete autonegotiation.
> Even under "normal" circumstances where autonegotiation completes,
> it takes an abnormally long time to do so (on average, at least 8 seconds).
>
> The hardware team and Microchip are involved in debugging this, but the root cause is still unknown.
> In the meantime, we need to provide a software workaround since customers are starting to see this issue as well.
>
> [Fix]
>
> * Restart autonegotiation when it fails the first time.
>
> [Test Case]
>
> * On BF2, do the reboot test: 2000 loops.
> * Check that the OOB link is up and an IP address is assigned.
>
> [Regression Potential]
>
> * No known regression.
>
> [Other]
> * Note that this issue is BF2 hardware specific. The same ethernet code is used for BF3 and we don't see any issues. In fact, the link up time on BF3 is <= 1s. On BF2, the link up time is > 8s.
> * We have been aware of this issue for 2 years and have shared it with the PHY vendor and the hardware team, but no root cause has been identified.
Acked-by: Tim Gardner <tim.gardner@canonical.com>
Applied to jammy:linux-bluefield/master-next. Thanks.

--
Best regards,
Bartlomiej
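
For readers following the [Fix] item in the SRU justification ("restart autonegotiation when it fails the first time"), here is a minimal user-space sketch of that retry pattern. It is not the applied patch and does not use any kernel API: `struct mock_phy`, `mock_start_aneg`, `mock_aneg_done`, `wait_for_aneg`, and `aneg_with_one_restart` are all hypothetical names, and the poll budget stands in for whatever timeout the driver actually uses.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a PHY; NOT the kernel's struct phy_device. */
struct mock_phy {
    int polls_until_done;     /* polls needed on the first attempt; < 0 means never */
    int polls_after_restart;  /* polls needed after a restart; < 0 means never */
    int polls_seen;           /* polls since the last (re)start */
    int restarts;             /* how many times autonegotiation was restarted */
};

/* Begin (or restart) autonegotiation on the mock PHY. */
static void mock_start_aneg(struct mock_phy *phy)
{
    phy->polls_seen = 0;
}

/* Poll the mock "autonegotiation complete" status once. */
static bool mock_aneg_done(struct mock_phy *phy)
{
    phy->polls_seen++;
    return phy->polls_until_done >= 0 &&
           phy->polls_seen >= phy->polls_until_done;
}

/* Poll up to max_polls times; true if autonegotiation completed. */
static bool wait_for_aneg(struct mock_phy *phy, int max_polls)
{
    for (int i = 0; i < max_polls; i++) {
        if (mock_aneg_done(phy))
            return true;
    }
    return false;
}

/*
 * Start autonegotiation; if the first attempt does not complete within
 * the poll budget, restart it exactly once and wait again.
 */
bool aneg_with_one_restart(struct mock_phy *phy, int max_polls)
{
    mock_start_aneg(phy);
    if (wait_for_aneg(phy, max_polls))
        return true;

    /* First attempt failed: restart once (the workaround idea). */
    phy->restarts++;
    phy->polls_until_done = phy->polls_after_restart;
    mock_start_aneg(phy);
    return wait_for_aneg(phy, max_polls);
}
```

A mock PHY that never completes on the first attempt but completes after a restart makes `aneg_with_one_restart` return true with `restarts == 1`; one that never completes returns false after a single restart, so the retry stays bounded.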