[SRU,J:linux-bluefield,v1,1/1] UBUNTU: SAUCE: mlxbf-gige: autonegotiation fails to complete on BF2

Message ID 20240418152503.30820-2-asmaa@nvidia.com
State New
Series UBUNTU: SAUCE: mlxbf-gige: autonegotiation fails to complete on BF2

Commit Message

Asmaa Mnebhi April 18, 2024, 3:25 p.m. UTC
BugLink: https://bugs.launchpad.net/bugs/2062384

During reboot testing, QA found an intermittent issue where the OOB link stays down.
The link is down because the KSZ9031 PHY fails to complete autonegotiation.
Even under "normal" circumstances where autonegotiation completes,
it takes an abnormally long time to do so (on average, at least 8 seconds).

The hardware team and Microchip are debugging the issue, but the root cause is still unknown.
In the meantime, we need to provide a software workaround since customers are starting to see this issue as well.
On BlueField-2, mlxbf_gige_open() now waits up to 10 seconds for autonegotiation to complete and restarts it if it has not.

Signed-off-by: Asmaa Mnebhi <asmaa@nvidia.com>
Reviewed-by: David Thompson <davthompson@nvidia.com>
---
 .../mellanox/mlxbf_gige/mlxbf_gige_main.c       | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

Comments

Asmaa Mnebhi April 29, 2024, 1:21 p.m. UTC | #1
Hi Tim Gardner <tim.gardner@canonical.com> and Bartlomiej Zolnierkiewicz <bartlomiej.zolnierkiewicz@canonical.com>, could you please review this patch?



Asmaa Mnebhi April 29, 2024, 7:30 p.m. UTC | #2
+ Vladimir Sokolovsky <vlad@nvidia.com>

From: Asmaa Mnebhi <asmaa@nvidia.com>
Sent: Monday, April 29, 2024 9:22 AM
To: kernel-team@lists.ubuntu.com; Tim Gardner <tim.gardner@canonical.com>; Bartlomiej Zolnierkiewicz <bartlomiej.zolnierkiewicz@canonical.com>
Cc: David Thompson <davthompson@nvidia.com>
Subject: RE: [SRU][J:linux-bluefield][PATCH v1 1/1] UBUNTU: SAUCE: mlxbf-gige: autonegotiation fails to complete on BF2


Hi Tim Gardner <tim.gardner@canonical.com> and Bartlomiej Zolnierkiewicz <bartlomiej.zolnierkiewicz@canonical.com>, could you please review this patch?



Patch

diff --git a/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c b/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c
index 56235cef5cd6..e377aaa4a2f4 100644
--- a/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c
+++ b/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c
@@ -132,6 +132,7 @@  static int mlxbf_gige_open(struct net_device *netdev)
 {
 	struct mlxbf_gige *priv = netdev_priv(netdev);
 	struct phy_device *phydev = netdev->phydev;
+	u8 timeout = 10;
 	u64 control;
 	u64 int_en;
 	int err;
@@ -154,6 +155,22 @@  static int mlxbf_gige_open(struct net_device *netdev)
 
 	phy_start(phydev);
 
+	if (priv->hw_version == MLXBF_GIGE_BLUEFIELD2) {
+		/* On BlueField-2 systems, the KSZ9031 PHY hardware could fail
+		 * to complete autonegotiation and so the link remains down.
+		 * Workaround: wait up to 10 s, then restart autonegotiation.
+		 */
+		while (timeout) {
+			if (phy_aneg_done(phydev))
+				break;
+			msleep(1000);
+			timeout--;
+		}
+
+		if (timeout == 0)
+			phy_restart_aneg(phydev);
+	}
+
 	err = mlxbf_gige_tx_init(priv);
 	if (err)
 		goto phy_deinit;
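
The bounded wait in the hunk above can be exercised outside the kernel with a small user-space sketch. This is only an illustration under stated assumptions, not the driver code: `struct mock_phy`, `mock_aneg_done()` and `wait_then_restart_aneg()` are hypothetical names, the sleep is simulated by a counter instead of `msleep(1000)`, and the restart is a counter instead of `phy_restart_aneg()`. One deliberate difference from the patch: the sketch treats only a positive return value as "done", on the assumption that a status query may also report an error as a negative value, whereas the patch breaks out of the loop on any nonzero value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a PHY. Autonegotiation reports "done" once
 * more than polls_until_done polls have been made; a negative value
 * means it never completes.
 */
struct mock_phy {
	int polls_until_done;
	int polls_seen;
	unsigned int slept_ms;	/* simulated time spent sleeping */
	int restarts;		/* simulated autonegotiation restarts */
};

/* Assumed return convention: 0 while pending, >0 when done. */
static int mock_aneg_done(struct mock_phy *phy)
{
	phy->polls_seen++;
	if (phy->polls_until_done < 0)
		return 0;
	return phy->polls_seen > phy->polls_until_done;
}

/* Poll up to max_polls times, one simulated second apart. If
 * autonegotiation never completes, restart it once.
 * Returns true if a restart was issued.
 */
static bool wait_then_restart_aneg(struct mock_phy *phy, unsigned int max_polls)
{
	unsigned int remaining = max_polls;

	assert(phy != NULL);

	while (remaining) {
		if (mock_aneg_done(phy) > 0)
			return false;
		phy->slept_ms += 1000;	/* stands in for msleep(1000) */
		remaining--;
	}

	phy->restarts++;		/* stands in for phy_restart_aneg() */
	return true;
}
```

With `max_polls` set to 10, as in the patch, a PHY that never completes costs the caller the full 10 simulated seconds before the single restart is issued, which matches the worst-case delay the patch adds to mlxbf_gige_open() on BlueField-2.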