Message ID | 20200408214326.934440-1-clemens.gruber@pqgruber.com |
---|---|
State | Superseded |
Delegated to: | David Miller |
Series | net: phy: marvell: Fix pause frame negotiation |
On Wed, 8 Apr 2020 23:43:26 +0200 Clemens Gruber wrote:
> The negotiation of flow control / pause frame modes was broken since
> commit fcf1f59afc67 ("net: phy: marvell: rearrange to use
> genphy_read_lpa()") moved the setting of phydev->duplex below the
> phy_resolve_aneg_pause call. Due to a check of DUPLEX_FULL in that
> function, phydev->pause was no longer set.
>
> Fix it by moving the parsing of the status variable before the blocks
> dealing with the pause frames.
>
> Fixes: fcf1f59afc67 ("net: phy: marvell: rearrange to use genphy_read_lpa()")
> Cc: stable@vger.kernel.org # v5.6+

nit: please don't CC stable on networking patches

> Signed-off-by: Clemens Gruber <clemens.gruber@pqgruber.com>
> ---
>  drivers/net/phy/marvell.c | 44 +++++++++++++++++++--------------------
>  1 file changed, 22 insertions(+), 22 deletions(-)
>
> diff --git a/drivers/net/phy/marvell.c b/drivers/net/phy/marvell.c
> index 4714ca0e0d4b..02cde4c0668c 100644
> --- a/drivers/net/phy/marvell.c
> +++ b/drivers/net/phy/marvell.c
> @@ -1263,6 +1263,28 @@ static int marvell_read_status_page_an(struct phy_device *phydev,
>  	int lpa;
>  	int err;
>
> +	if (!(status & MII_M1011_PHY_STATUS_RESOLVED))
> +		return 0;

If we return early here won't we miss updating the advertising bits?
We will no longer call e.g. fiber_lpa_mod_linkmode_lpa_t().

Perhaps extracting info from status should be moved to a helper so we
can return early without affecting the rest of the flow?

Is my understanding correct? Russell?

> +	if (status & MII_M1011_PHY_STATUS_FULLDUPLEX)
> +		phydev->duplex = DUPLEX_FULL;
> +	else
> +		phydev->duplex = DUPLEX_HALF;
> +
> +	switch (status & MII_M1011_PHY_STATUS_SPD_MASK) {
> +	case MII_M1011_PHY_STATUS_1000:
> +		phydev->speed = SPEED_1000;
> +		break;
> +
> +	case MII_M1011_PHY_STATUS_100:
> +		phydev->speed = SPEED_100;
> +		break;
> +
> +	default:
> +		phydev->speed = SPEED_10;
> +		break;
> +	}
> +
>  	if (!fiber) {
>  		err = genphy_read_lpa(phydev);
>  		if (err < 0)
> @@ -1291,28 +1313,6 @@ static int marvell_read_status_page_an(struct phy_device *phydev,
>  		}
>  	}
>
> -	if (!(status & MII_M1011_PHY_STATUS_RESOLVED))
> -		return 0;
> -
> -	if (status & MII_M1011_PHY_STATUS_FULLDUPLEX)
> -		phydev->duplex = DUPLEX_FULL;
> -	else
> -		phydev->duplex = DUPLEX_HALF;
> -
> -	switch (status & MII_M1011_PHY_STATUS_SPD_MASK) {
> -	case MII_M1011_PHY_STATUS_1000:
> -		phydev->speed = SPEED_1000;
> -		break;
> -
> -	case MII_M1011_PHY_STATUS_100:
> -		phydev->speed = SPEED_100;
> -		break;
> -
> -	default:
> -		phydev->speed = SPEED_10;
> -		break;
> -	}
> -
>  	return 0;
>  }
>
On Fri, Apr 10, 2020 at 05:43:04PM -0700, Jakub Kicinski wrote:
> On Wed, 8 Apr 2020 23:43:26 +0200 Clemens Gruber wrote:
> > The negotiation of flow control / pause frame modes was broken since
> > commit fcf1f59afc67 ("net: phy: marvell: rearrange to use
> > genphy_read_lpa()") moved the setting of phydev->duplex below the
> > phy_resolve_aneg_pause call. Due to a check of DUPLEX_FULL in that
> > function, phydev->pause was no longer set.
> >
> > Fix it by moving the parsing of the status variable before the blocks
> > dealing with the pause frames.
> >
> > Fixes: fcf1f59afc67 ("net: phy: marvell: rearrange to use genphy_read_lpa()")
> > Cc: stable@vger.kernel.org # v5.6+
>
> nit: please don't CC stable on networking patches
>
> > Signed-off-by: Clemens Gruber <clemens.gruber@pqgruber.com>
> > ---
> >  drivers/net/phy/marvell.c | 44 +++++++++++++++++++--------------------
> >  1 file changed, 22 insertions(+), 22 deletions(-)
> >
> > diff --git a/drivers/net/phy/marvell.c b/drivers/net/phy/marvell.c
> > index 4714ca0e0d4b..02cde4c0668c 100644
> > --- a/drivers/net/phy/marvell.c
> > +++ b/drivers/net/phy/marvell.c
> > @@ -1263,6 +1263,28 @@ static int marvell_read_status_page_an(struct phy_device *phydev,
> >  	int lpa;
> >  	int err;
> >
> > +	if (!(status & MII_M1011_PHY_STATUS_RESOLVED))
> > +		return 0;
>
> If we return early here won't we miss updating the advertising bits?
> We will no longer call e.g. fiber_lpa_mod_linkmode_lpa_t().
>
> Perhaps extracting info from status should be moved to a helper so we
> can return early without affecting the rest of the flow?
>
> Is my understanding correct? Russell?

You are correct - and yes, there is also a problem here.

It is not clear whether the resolved bit is set before or after the
link status reports that link is up - however, the resolved bit
indicates whether the speed and duplex are valid.

What I've done elsewhere is if the resolved bit is not set, then we
force phydev->link to be false, so we don't attempt to process a
link-up status until we can read the link parameters. I think that's
what needs to happen here, i.o.w.:

	if (!(status & MII_M1011_PHY_STATUS_RESOLVED)) {
		phydev->link = 0;
		return 0;
	}

especially as we're not reading the LPA.
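The guard suggested above, together with the earlier idea of moving the status parsing into its own helper, could be sketched roughly as follows. This is a standalone illustration, not driver code: `struct link_state` and `decode_status()` are hypothetical names, and the bit values are written from memory of drivers/net/phy/marvell.c, so they should be treated as assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit layout of the Marvell copper/fiber specific status register. */
#define MII_M1011_PHY_STATUS_1000	0x8000
#define MII_M1011_PHY_STATUS_100	0x4000
#define MII_M1011_PHY_STATUS_SPD_MASK	0xc000
#define MII_M1011_PHY_STATUS_FULLDUPLEX	0x2000
#define MII_M1011_PHY_STATUS_RESOLVED	0x0800

/* Hypothetical stand-in for the struct phy_device fields touched here. */
struct link_state {
	bool link;
	bool full_duplex;
	int speed;	/* Mbit/s */
};

/*
 * Hypothetical helper: decode speed and duplex from the status word.
 * If the resolved bit is clear, speed and duplex are not valid yet, so
 * the link is forced down and nothing else is updated.
 * Returns true when speed and duplex were decoded.
 */
bool decode_status(struct link_state *st, uint16_t status)
{
	if (!(status & MII_M1011_PHY_STATUS_RESOLVED)) {
		st->link = false;
		return false;
	}

	st->full_duplex = !!(status & MII_M1011_PHY_STATUS_FULLDUPLEX);

	switch (status & MII_M1011_PHY_STATUS_SPD_MASK) {
	case MII_M1011_PHY_STATUS_1000:
		st->speed = 1000;
		break;
	case MII_M1011_PHY_STATUS_100:
		st->speed = 100;
		break;
	default:
		st->speed = 10;
		break;
	}

	return true;
}
```

A caller would read the link partner advertisement and resolve pause only when the helper returns true, which keeps the early return from skipping the rest of the flow by accident.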
On Sat, Apr 11, 2020 at 10:17:05AM +0100, Russell King - ARM Linux admin wrote:
> On Fri, Apr 10, 2020 at 05:43:04PM -0700, Jakub Kicinski wrote:
> > On Wed, 8 Apr 2020 23:43:26 +0200 Clemens Gruber wrote:
> > > The negotiation of flow control / pause frame modes was broken since
> > > commit fcf1f59afc67 ("net: phy: marvell: rearrange to use
> > > genphy_read_lpa()") moved the setting of phydev->duplex below the
> > > phy_resolve_aneg_pause call. Due to a check of DUPLEX_FULL in that
> > > function, phydev->pause was no longer set.
> > >
> > > Fix it by moving the parsing of the status variable before the blocks
> > > dealing with the pause frames.
> > >
> > > Fixes: fcf1f59afc67 ("net: phy: marvell: rearrange to use genphy_read_lpa()")
> > > Cc: stable@vger.kernel.org # v5.6+
> >
> > nit: please don't CC stable on networking patches
> >
> > > Signed-off-by: Clemens Gruber <clemens.gruber@pqgruber.com>
> > > ---
> > >  drivers/net/phy/marvell.c | 44 +++++++++++++++++++--------------------
> > >  1 file changed, 22 insertions(+), 22 deletions(-)
> > >
> > > diff --git a/drivers/net/phy/marvell.c b/drivers/net/phy/marvell.c
> > > index 4714ca0e0d4b..02cde4c0668c 100644
> > > --- a/drivers/net/phy/marvell.c
> > > +++ b/drivers/net/phy/marvell.c
> > > @@ -1263,6 +1263,28 @@ static int marvell_read_status_page_an(struct phy_device *phydev,
> > >  	int lpa;
> > >  	int err;
> > >
> > > +	if (!(status & MII_M1011_PHY_STATUS_RESOLVED))
> > > +		return 0;
> >
> > If we return early here won't we miss updating the advertising bits?
> > We will no longer call e.g. fiber_lpa_mod_linkmode_lpa_t().
> >
> > Perhaps extracting info from status should be moved to a helper so we
> > can return early without affecting the rest of the flow?
> >
> > Is my understanding correct? Russell?
>
> You are correct - and yes, there is also a problem here.
>
> It is not clear whether the resolved bit is set before or after the
> link status reports that link is up - however, the resolved bit
> indicates whether the speed and duplex are valid.

I assumed that in the fiber case, the link status register won't be 1
until autonegotiation is complete. There is a part in the 88E1510
datasheet on page 57 [2.6.2], which says so but it's in the Fiber/Copper
Auto-Selection chapter and I am not sure if that's true in general. (?)

(For copper, we call genphy_update_link, which sets phydev->link to 0 if
autoneg is enabled && !completed. And according to the datasheet,
the resolved bit is set when autonegotiation is completed || disabled)

TL/DR:
It's probably a good idea to force link to 0 to be sure, as you
suggested below. I will send a v2 with that change.

Moving the extraction of info to a helper is probably better left to a
separate patch?

> What I've done elsewhere is if the resolved bit is not set, then we
> force phydev->link to be false, so we don't attempt to process a
> link-up status until we can read the link parameters. I think that's
> what needs to happen here, i.o.w.:
>
> 	if (!(status & MII_M1011_PHY_STATUS_RESOLVED)) {
> 		phydev->link = 0;
> 		return 0;
> 	}
>
> especially as we're not reading the LPA.

Thanks,
Clemens
On Sat, Apr 11, 2020 at 03:24:01PM +0200, Clemens Gruber wrote:
> On Sat, Apr 11, 2020 at 10:17:05AM +0100, Russell King - ARM Linux admin wrote:
> > On Fri, Apr 10, 2020 at 05:43:04PM -0700, Jakub Kicinski wrote:
> > > On Wed, 8 Apr 2020 23:43:26 +0200 Clemens Gruber wrote:
> > > > The negotiation of flow control / pause frame modes was broken since
> > > > commit fcf1f59afc67 ("net: phy: marvell: rearrange to use
> > > > genphy_read_lpa()") moved the setting of phydev->duplex below the
> > > > phy_resolve_aneg_pause call. Due to a check of DUPLEX_FULL in that
> > > > function, phydev->pause was no longer set.
> > > >
> > > > Fix it by moving the parsing of the status variable before the blocks
> > > > dealing with the pause frames.
> > > >
> > > > Fixes: fcf1f59afc67 ("net: phy: marvell: rearrange to use genphy_read_lpa()")
> > > > Cc: stable@vger.kernel.org # v5.6+
> > >
> > > nit: please don't CC stable on networking patches
> > >
> > > > Signed-off-by: Clemens Gruber <clemens.gruber@pqgruber.com>
> > > > ---
> > > >  drivers/net/phy/marvell.c | 44 +++++++++++++++++++--------------------
> > > >  1 file changed, 22 insertions(+), 22 deletions(-)
> > > >
> > > > diff --git a/drivers/net/phy/marvell.c b/drivers/net/phy/marvell.c
> > > > index 4714ca0e0d4b..02cde4c0668c 100644
> > > > --- a/drivers/net/phy/marvell.c
> > > > +++ b/drivers/net/phy/marvell.c
> > > > @@ -1263,6 +1263,28 @@ static int marvell_read_status_page_an(struct phy_device *phydev,
> > > >  	int lpa;
> > > >  	int err;
> > > >
> > > > +	if (!(status & MII_M1011_PHY_STATUS_RESOLVED))
> > > > +		return 0;
> > >
> > > If we return early here won't we miss updating the advertising bits?
> > > We will no longer call e.g. fiber_lpa_mod_linkmode_lpa_t().
> > >
> > > Perhaps extracting info from status should be moved to a helper so we
> > > can return early without affecting the rest of the flow?
> > >
> > > Is my understanding correct? Russell?
> >
> > You are correct - and yes, there is also a problem here.
> >
> > It is not clear whether the resolved bit is set before or after the
> > link status reports that link is up - however, the resolved bit
> > indicates whether the speed and duplex are valid.
>
> I assumed that in the fiber case, the link status register won't be 1
> until autonegotiation is complete. There is a part in the 88E1510
> datasheet on page 57 [2.6.2], which says so but it's in the Fiber/Copper
> Auto-Selection chapter and I am not sure if that's true in general. (?)

The fiber code is IMHO very suspect; the decoding of the pause status
seems to be completely broken. However, I'm not sure whether anyone
actually uses that or not, so I've been trying not to touch it.

> (For copper, we call genphy_update_link, which sets phydev->link to 0 if
> autoneg is enabled && !completed. And according to the datasheet,
> the resolved bit is set when autonegotiation is completed || disabled)

The resolved bit indicates whether the resolution data is valid, which
will be set when autoneg is complete or autoneg is disabled. However,
the timing of the bit compared to the link status is not defined in the
datasheet - and that's the problem. If the link status bits report that
the link is up but the resolved bit is indicating that the resolution
is not valid, what do we do? Report potential garbage but link up to
the higher layers, or pretend that the link is down?

> TL/DR:
> It's probably a good idea to force link to 0 to be sure, as you
> suggested below. I will send a v2 with that change.
>
> Moving the extraction of info to a helper is probably better left to a
> separate patch?

I'm not sure what you're suggesting.
On Sat, Apr 11, 2020 at 02:43:44PM +0100, Russell King - ARM Linux admin wrote:
> On Sat, Apr 11, 2020 at 03:24:01PM +0200, Clemens Gruber wrote:
> > On Sat, Apr 11, 2020 at 10:17:05AM +0100, Russell King - ARM Linux admin wrote:
> > > On Fri, Apr 10, 2020 at 05:43:04PM -0700, Jakub Kicinski wrote:
> > > > On Wed, 8 Apr 2020 23:43:26 +0200 Clemens Gruber wrote:
> > > > > The negotiation of flow control / pause frame modes was broken since
> > > > > commit fcf1f59afc67 ("net: phy: marvell: rearrange to use
> > > > > genphy_read_lpa()") moved the setting of phydev->duplex below the
> > > > > phy_resolve_aneg_pause call. Due to a check of DUPLEX_FULL in that
> > > > > function, phydev->pause was no longer set.
> > > > >
> > > > > Fix it by moving the parsing of the status variable before the blocks
> > > > > dealing with the pause frames.
> > > > >
> > > > > Fixes: fcf1f59afc67 ("net: phy: marvell: rearrange to use genphy_read_lpa()")
> > > > > Cc: stable@vger.kernel.org # v5.6+
> > > >
> > > > nit: please don't CC stable on networking patches
> > > >
> > > > > Signed-off-by: Clemens Gruber <clemens.gruber@pqgruber.com>
> > > > > ---
> > > > >  drivers/net/phy/marvell.c | 44 +++++++++++++++++++--------------------
> > > > >  1 file changed, 22 insertions(+), 22 deletions(-)
> > > > >
> > > > > diff --git a/drivers/net/phy/marvell.c b/drivers/net/phy/marvell.c
> > > > > index 4714ca0e0d4b..02cde4c0668c 100644
> > > > > --- a/drivers/net/phy/marvell.c
> > > > > +++ b/drivers/net/phy/marvell.c
> > > > > @@ -1263,6 +1263,28 @@ static int marvell_read_status_page_an(struct phy_device *phydev,
> > > > >  	int lpa;
> > > > >  	int err;
> > > > >
> > > > > +	if (!(status & MII_M1011_PHY_STATUS_RESOLVED))
> > > > > +		return 0;
> > > >
> > > > If we return early here won't we miss updating the advertising bits?
> > > > We will no longer call e.g. fiber_lpa_mod_linkmode_lpa_t().
> > > >
> > > > Perhaps extracting info from status should be moved to a helper so we
> > > > can return early without affecting the rest of the flow?
> > > >
> > > > Is my understanding correct? Russell?
> > >
> > > You are correct - and yes, there is also a problem here.
> > >
> > > It is not clear whether the resolved bit is set before or after the
> > > link status reports that link is up - however, the resolved bit
> > > indicates whether the speed and duplex are valid.
> >
> > I assumed that in the fiber case, the link status register won't be 1
> > until autonegotiation is complete. There is a part in the 88E1510
> > datasheet on page 57 [2.6.2], which says so but it's in the Fiber/Copper
> > Auto-Selection chapter and I am not sure if that's true in general. (?)
>
> The fiber code is IMHO very suspect; the decoding of the pause status
> seems to be completely broken. However, I'm not sure whether anyone
> actually uses that or not, so I've been trying not to touch it.
>
> > (For copper, we call genphy_update_link, which sets phydev->link to 0 if
> > autoneg is enabled && !completed. And according to the datasheet,
> > the resolved bit is set when autonegotiation is completed || disabled)
>
> The resolved bit indicates whether the resolution data is valid, which
> will be set when autoneg is complete or autoneg is disabled. However,
> the timing of the bit compared to the link status is not defined in the
> datasheet - and that's the problem. If the link status bits report that
> the link is up but the resolved bit is indicating that the resolution
> is not valid, what do we do? Report potential garbage but link up to
> the higher layers, or pretend that the link is down?

I see, thanks for the clarification. Pretending that the link is down
seems to be the right choice.

>
> > TL/DR:
> > It's probably a good idea to force link to 0 to be sure, as you
> > suggested below. I will send a v2 with that change.
> >
> > Moving the extraction of info to a helper is probably better left to a
> > separate patch?
>
> I'm not sure what you're suggesting.

I was referring to Jakub's suggestion to create a new helper function for
the parsing of the status register.

Clemens
On Sat, Apr 11, 2020 at 02:43:44PM +0100, Russell King - ARM Linux admin wrote:
> The fiber code is IMHO very suspect; the decoding of the pause status
> seems to be completely broken. However, I'm not sure whether anyone
> actually uses that or not, so I've been trying not to touch it.

If the following table for the link partner advertisement is correct..
PAUSE  ASYM_PAUSE  MEANING
0      0               Link partner has no pause frame support
0      1           <-  Link partner can TX pause frames
1      0           <-> Link partner can RX and TX pauses
1      1           ->  Link partner can RX pause frames

..then I think both pause and asym_pause have to be assigned
independently, like this:
phydev->pause = !!(lpa & LPA_1000XPAUSE);
phydev->asym_pause = !!(lpa & LPA_1000XPAUSE_ASYM);

(Using the defines from uapi mii.h instead of the redundant/combined
LPA_PAUSE_FIBER etc. which can then be removed from marvell.c)

Currently, if LPA_1000XPAUSE_ASYM is set we do pause=1 and asym_pause=1
no matter if LPA_1000XPAUSE is set. This could lead us to mistake a link
partner who can only send for one who can only receive pause frames.
^ Was this the problem you meant?

I saw that for the copper case and in other drivers, we first set the
ETHTOOL_LINK_MODE_(Asym_)Pause_BIT bit in lp_advertising and then set
phydev->(asym_)pause depending on the ETHTOOL_LINK_MODE_... bit.
Do you agree that we should also set the ETHTOOL_ bits in the fiber
case?

Does anybody have access to a Marvell PHY with 1000base-X Ethernet?
(I only have a 88E1510 + 1000Base-T at the home office)

Thanks,
Clemens
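To make the difference between the two decodings concrete, here is a small standalone sketch. It only models the behavior as described in the message above and is not the driver code: `struct pause_state`, `decode_pause_independent()` and `decode_pause_combined()` are hypothetical names, and the two `LPA_1000X*` values are assumed to match include/uapi/linux/mii.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match the 1000BASE-X defines in include/uapi/linux/mii.h. */
#define LPA_1000XPAUSE		0x0080
#define LPA_1000XPAUSE_ASYM	0x0100

/* Hypothetical stand-in for phydev->pause and phydev->asym_pause. */
struct pause_state {
	bool pause;
	bool asym_pause;
};

/* Proposed decoding: each bit maps to its own member. */
struct pause_state decode_pause_independent(uint16_t lpa)
{
	struct pause_state p = {
		.pause = !!(lpa & LPA_1000XPAUSE),
		.asym_pause = !!(lpa & LPA_1000XPAUSE_ASYM),
	};

	return p;
}

/*
 * Model of the behavior described as current: once the asymmetric bit
 * is set, both members are set regardless of the symmetric bit.
 */
struct pause_state decode_pause_combined(uint16_t lpa)
{
	struct pause_state p = { false, false };

	if (lpa & LPA_1000XPAUSE_ASYM) {
		p.pause = true;
		p.asym_pause = true;
	} else if (lpa & LPA_1000XPAUSE) {
		p.pause = true;
	}

	return p;
}
```

The two functions differ only for a link partner that advertises the asymmetric bit alone (PAUSE=0, ASYM_PAUSE=1): the independent decoding reports pause=0, asym_pause=1, while the combined model reports pause=1, asym_pause=1, which is the mix-up the message describes.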
On Sun, Apr 12, 2020 at 07:03:36PM +0200, Clemens Gruber wrote:
> On Sat, Apr 11, 2020 at 02:43:44PM +0100, Russell King - ARM Linux admin wrote:
> > The fiber code is IMHO very suspect; the decoding of the pause status
> > seems to be completely broken. However, I'm not sure whether anyone
> > actually uses that or not, so I've been trying not to touch it.
>
> If the following table for the link partner advertisement is correct..
> PAUSE  ASYM_PAUSE  MEANING
> 0      0               Link partner has no pause frame support
> 0      1           <-  Link partner can TX pause frames
> 1      0           <-> Link partner can RX and TX pauses
> 1      1           ->  Link partner can RX pause frames
>
> ..then I think both pause and asym_pause have to be assigned
> independently, like this:
> phydev->pause = !!(lpa & LPA_1000XPAUSE);
> phydev->asym_pause = !!(lpa & LPA_1000XPAUSE_ASYM);

Yes, that's how it should be, because the pause and asym pause bits
correspond exactly with the phydev members.

> (Using the defines from uapi mii.h instead of the redundant/combined
> LPA_PAUSE_FIBER etc. which can then be removed from marvell.c)
>
> Currently, if LPA_1000XPAUSE_ASYM is set we do pause=1 and asym_pause=1
> no matter if LPA_1000XPAUSE is set. This could lead us to mistake a link
> partner who can only send for one who can only receive pause frames.
> ^ Was this the problem you meant?

Exactly, but given that I've no way to actually test anything with
regard to 1G Marvell PHYs using 1000BASE-X, I have to assume that
whoever contributed this code tested it and it worked for them.

So, it should not be changed just because it looks wrong - there may
be some subtle issues in the hardware that we don't know about that
makes this code "do the best it can". We need someone who can
actually do some tests to solve this.

> Does anybody have access to a Marvell PHY with 1000base-X Ethernet?
> (I only have a 88E1510 + 1000Base-T at the home office)

Yes, that's what we need... this isn't the first time I've mentioned
the problem, and so far no one has stepped forward.
diff --git a/drivers/net/phy/marvell.c b/drivers/net/phy/marvell.c
index 4714ca0e0d4b..02cde4c0668c 100644
--- a/drivers/net/phy/marvell.c
+++ b/drivers/net/phy/marvell.c
@@ -1263,6 +1263,28 @@ static int marvell_read_status_page_an(struct phy_device *phydev,
 	int lpa;
 	int err;
 
+	if (!(status & MII_M1011_PHY_STATUS_RESOLVED))
+		return 0;
+
+	if (status & MII_M1011_PHY_STATUS_FULLDUPLEX)
+		phydev->duplex = DUPLEX_FULL;
+	else
+		phydev->duplex = DUPLEX_HALF;
+
+	switch (status & MII_M1011_PHY_STATUS_SPD_MASK) {
+	case MII_M1011_PHY_STATUS_1000:
+		phydev->speed = SPEED_1000;
+		break;
+
+	case MII_M1011_PHY_STATUS_100:
+		phydev->speed = SPEED_100;
+		break;
+
+	default:
+		phydev->speed = SPEED_10;
+		break;
+	}
+
 	if (!fiber) {
 		err = genphy_read_lpa(phydev);
 		if (err < 0)
@@ -1291,28 +1313,6 @@ static int marvell_read_status_page_an(struct phy_device *phydev,
 		}
 	}
 
-	if (!(status & MII_M1011_PHY_STATUS_RESOLVED))
-		return 0;
-
-	if (status & MII_M1011_PHY_STATUS_FULLDUPLEX)
-		phydev->duplex = DUPLEX_FULL;
-	else
-		phydev->duplex = DUPLEX_HALF;
-
-	switch (status & MII_M1011_PHY_STATUS_SPD_MASK) {
-	case MII_M1011_PHY_STATUS_1000:
-		phydev->speed = SPEED_1000;
-		break;
-
-	case MII_M1011_PHY_STATUS_100:
-		phydev->speed = SPEED_100;
-		break;
-
-	default:
-		phydev->speed = SPEED_10;
-		break;
-	}
-
 	return 0;
 }
 
The negotiation of flow control / pause frame modes was broken since
commit fcf1f59afc67 ("net: phy: marvell: rearrange to use
genphy_read_lpa()") moved the setting of phydev->duplex below the
phy_resolve_aneg_pause call. Due to a check of DUPLEX_FULL in that
function, phydev->pause was no longer set.

Fix it by moving the parsing of the status variable before the blocks
dealing with the pause frames.

Fixes: fcf1f59afc67 ("net: phy: marvell: rearrange to use genphy_read_lpa()")
Cc: stable@vger.kernel.org # v5.6+
Signed-off-by: Clemens Gruber <clemens.gruber@pqgruber.com>
---
 drivers/net/phy/marvell.c | 44 +++++++++++++++++++--------------------
 1 file changed, 22 insertions(+), 22 deletions(-)
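
The ordering dependency described in the commit message can be reproduced with a small standalone model. Everything here is a simplification made for illustration: `struct phy_model`, `resolve_pause()` and the two `read_status_*` functions are hypothetical, the duplex check only mirrors what the commit message says about phy_resolve_aneg_pause, and starting from DUPLEX_UNKNOWN on each status read is an assumption.

```c
#include <stdbool.h>

/* Values assumed to match include/uapi/linux/ethtool.h. */
#define DUPLEX_HALF	0x00
#define DUPLEX_FULL	0x01
#define DUPLEX_UNKNOWN	0xff

/* Hypothetical, heavily reduced stand-in for struct phy_device. */
struct phy_model {
	int duplex;
	bool lp_pause;	/* link partner advertised Pause */
	bool pause;	/* resolved result */
};

/* Model of the duplex check: pause is only taken over on full duplex. */
void resolve_pause(struct phy_model *phy)
{
	if (phy->duplex == DUPLEX_FULL)
		phy->pause = phy->lp_pause;
}

/* Order before the fix: pause is resolved while duplex is still unknown. */
bool read_status_before_fix(bool lp_pause, bool full_duplex)
{
	struct phy_model phy = { DUPLEX_UNKNOWN, lp_pause, false };

	resolve_pause(&phy);
	phy.duplex = full_duplex ? DUPLEX_FULL : DUPLEX_HALF;

	return phy.pause;
}

/* Order after the fix: duplex is decoded first, then pause is resolved. */
bool read_status_after_fix(bool lp_pause, bool full_duplex)
{
	struct phy_model phy = { DUPLEX_UNKNOWN, lp_pause, false };

	phy.duplex = full_duplex ? DUPLEX_FULL : DUPLEX_HALF;
	resolve_pause(&phy);

	return phy.pause;
}
```

With a full-duplex link and a partner advertising Pause, the first ordering leaves pause unset while the second sets it, which matches the symptom the patch addresses.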