[v2,net-next,5/6] net: dsa: felix: delete .phylink_mac_an_restart code

Message ID 20200704124507.3336497-6-olteanv@gmail.com
State Changes Requested
Delegated to: David Miller
Series PHYLINK integration improvements for Felix DSA driver

Commit Message

Vladimir Oltean July 4, 2020, 12:45 p.m. UTC
From: Vladimir Oltean <vladimir.oltean@nxp.com>

The Cisco SGMII and USXGMII standards specify control information
exchange to be "achieved by using the Auto-Negotiation functionality
defined in Clause 37 of the IEEE Specification 802.3z".

The differences to clause 37 auto-negotiation are specified by the
respective standards. In the case of SGMII, the differences are spelled
out as being:

- A reduction of the link timer value, from 10 ms to 1.6 ms.
- A customization of the tx_config_reg[15:0], mostly to allow
  propagation of speed information.

A similar situation is going on for USXGMII as well: "USXGMII Auto-neg
mechanism is based on Clause 37 (Figure 37-6) plus additional management
control to select USXGMII mode".

The point is, both Cisco standards make explicit reference that they
require an auto-negotiation state machine implemented as per "Figure
37-6-Auto-Negotiation state diagram" from IEEE 802.3. In the SGMII spec,
it is very clearly pointed out that both the MAC PCS (Figure 3 MAC
Functional Block) and the PHY PCS (Figure 2 PHY Functional Block)
contain an auto-negotiation block defined by "Auto-Negotiation Figure
37-6".

Since both ends of the SGMII/USXGMII link implement the same state
machine (just carry different tx_config_reg payloads, which they convey
to their link partner via /C/ ordered sets), naturally the ability to
restart auto-negotiation is symmetrical. The state machine in IEEE 802.3
Figure 37-6 specifies the signal that triggers an auto-negotiation
restart as being "mr_restart_an=TRUE".

Furthermore, clause "37.2.5.1.9 State diagram variable to management
register mapping", through its "Table 37-8-PCS state diagram variable to
management register mapping", requires a PCS compliant to clause 37 to
expose the mr_restart_an signal to management through MDIO register "0.9
Auto-Negotiation restart", aka BMCR_ANRESTART in Linux terms.

The Felix PCS for SGMII and USXGMII is compliant to clause 37, so it
exposes BMCR_ANRESTART to the operating system. When this bit is
asserted, the following happens:

1. STATUS[Auto_Negotiation_Complete] goes from 1->0.
2. The PCS starts sending AN sequences instead of packets or IDLEs.
3. The PCS waits to receive AN sequences from PHY and matches them.
4. Once it has received matching AN sequences and a PHY acknowledge,
   STATUS[Auto_Negotiation_Complete] goes from 0->1.
5. Normal packet transmission restarts.

Otherwise stated, the MAC PCS has the ability to re-trigger a switch of
the lane from data mode into configuration mode, then control
information exchange takes place, then the lane is switched back into
data mode. These 5 steps are collectively described as "restart AN state
machine" by the PCS documentation.
This is all as per IEEE 802.3 Clause 37 AN state machine, which SGMII
and USXGMII do not touch at this fundamental level.

Now, it is true that the Cisco SGMII and USXGMII specs mention that the
control information exchange has a unidirectional meaning. That is, the
PHY restarts the clause 37 auto-negotiation upon any change in MDI
auto-negotiation parameters.

PHYLINK takes this fact a bit further, and since the fact that for
SGMII/USXGMII, the MAC PCS conveys no new information to the PHY PCS
(beyond acknowledging the received config word), does not have any use
for permitting the MAC PCS to trigger a restart of the clause 37
auto-negotiation.

The only SERDES protocols for which PHYLINK allows that are 1000Base-X
and 2500Base-X. For those, the control information exchange _is_
bidirectional (local PCS specifies its duplex and flow control
abilities) since the link partner is at the other side of the media.

For any other SERDES protocols, the .phylink_mac_an_restart callback is
dead code. This is probably OK, I can't come up with a situation where
it might be useful for the MAC PCS to clear its cache of link state and
ask for a new tx_config_reg.

So remove this code.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
Changes in v2:
Update commit message to be more clear.

 drivers/net/dsa/ocelot/felix.c         | 10 -------
 drivers/net/dsa/ocelot/felix.h         |  1 -
 drivers/net/dsa/ocelot/felix_vsc9959.c | 37 --------------------------
 3 files changed, 48 deletions(-)

Comments

Russell King (Oracle) July 4, 2020, 2:56 p.m. UTC | #1
On Sat, Jul 04, 2020 at 03:45:06PM +0300, Vladimir Oltean wrote:
> From: Vladimir Oltean <vladimir.oltean@nxp.com>
> 
> The Cisco SGMII and USXGMII standards specify control information
> exchange to be "achieved by using the Auto-Negotiation functionality
> defined in Clause 37 of the IEEE Specification 802.3z".
> 
> The differences to clause 37 auto-negotiation are specified by the
> respective standards. In the case of SGMII, the differences are spelled
> out as being:
> 
> - A reduction of the link timer value, from 10 ms to 1.6 ms.
> - A customization of the tx_config_reg[15:0], mostly to allow
>   propagation of speed information.
> 
> A similar situation is going on for USXGMII as well: "USXGMII Auto-neg
> mechanism is based on Clause 37 (Figure 37-6) plus additional management
> control to select USXGMII mode".
> 
> The point is, both Cisco standards make explicit reference that they
> require an auto-negotiation state machine implemented as per "Figure
> 37-6-Auto-Negotiation state diagram" from IEEE 802.3. In the SGMII spec,
> it is very clearly pointed out that both the MAC PCS (Figure 3 MAC
> Functional Block) and the PHY PCS (Figure 2 PHY Functional Block)
> contain an auto-negotiation block defined by "Auto-Negotiation Figure
> 37-6".
> 
> Since both ends of the SGMII/USXGMII link implement the same state
> machine (just carry different tx_config_reg payloads, which they convey
> to their link partner via /C/ ordered sets), naturally the ability to
> restart auto-negotiation is symmetrical. The state machine in IEEE 802.3
> Figure 37-6 specifies the signal that triggers an auto-negotiation
> restart as being "mr_restart_an=TRUE".
> 
> Furthermore, clause "37.2.5.1.9 State diagram variable to management
> register mapping", through its "Table 37-8-PCS state diagram variable to
> management register mapping", requires a PCS compliant to clause 37 to
> expose the mr_restart_an signal to management through MDIO register "0.9
> Auto-Negotiation restart", aka BMCR_ANRESTART in Linux terms.
> 
> The Felix PCS for SGMII and USXGMII is compliant to clause 37, so it
> exposes BMCR_ANRESTART to the operating system. When this bit is
> asserted, the following happens:
> 
> 1. STATUS[Auto_Negotiation_Complete] goes from 1->0.
> 2. The PCS starts sending AN sequences instead of packets or IDLEs.
> 3. The PCS waits to receive AN sequences from PHY and matches them.
> 4. Once it has received matching AN sequences and a PHY acknowledge,
>    STATUS[Auto_Negotiation_Complete] goes from 0->1.
> 5. Normal packet transmission restarts.
> 
> Otherwise stated, the MAC PCS has the ability to re-trigger a switch of
> the lane from data mode into configuration mode, then control
> information exchange takes place, then the lane is switched back into
> data mode. These 5 steps are collectively described as "restart AN state
> machine" by the PCS documentation.
> This is all as per IEEE 802.3 Clause 37 AN state machine, which SGMII
> and USXGMII do not touch at this fundamental level.
> 
> Now, it is true that the Cisco SGMII and USXGMII specs mention that the
> control information exchange has a unidirectional meaning. That is, the
> PHY restarts the clause 37 auto-negotiation upon any change in MDI
> auto-negotiation parameters.
> 
> PHYLINK takes this fact a bit further, and since the fact that for
> SGMII/USXGMII, the MAC PCS conveys no new information to the PHY PCS
> (beyond acknowledging the received config word), does not have any use
> for permitting the MAC PCS to trigger a restart of the clause 37
> auto-negotiation.
> 
> The only SERDES protocols for which PHYLINK allows that are 1000Base-X
> and 2500Base-X. For those, the control information exchange _is_
> bidirectional (local PCS specifies its duplex and flow control
> abilities) since the link partner is at the other side of the media.
> 
> For any other SERDES protocols, the .phylink_mac_an_restart callback is
> dead code. This is probably OK, I can't come up with a situation where
> it might be useful for the MAC PCS to clear its cache of link state and
> ask for a new tx_config_reg.
> 
> So remove this code.

NAK for this description.  You know why.

Vladimir Oltean July 4, 2020, 3:50 p.m. UTC | #2
On Sat, Jul 04, 2020 at 03:56:14PM +0100, Russell King - ARM Linux admin wrote:

[snip]

> 
> NAK for this description.  You know why.
> 
> -- 
> RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
> FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

Sorry, I cannot work with "too busy" (your feedback from v1) and "you
know why". If there's anything incorrect in the description of the
patch, please point it out and I will change it.

There seems to be a disconnect between what I thought this phylink
callback does (and hence the reason why the code I'm deleting exists)
and what it really does. That disconnect is explained in enough detail
that even somebody who isn't intimately familiar with phylink and/or
clause 37 AN can understand. Then a justification of why deleting this
code is, at least given what we know now, the right thing to do.

I am really not trying to make any more waves than necessary, so please
help me to formulate the description in a way that is acceptable for
merging into the mainline Linux kernel.

-Vladimir

Russell King (Oracle) July 4, 2020, 6:14 p.m. UTC | #3
On Sat, Jul 04, 2020 at 06:50:48PM +0300, Vladimir Oltean wrote:
> On Sat, Jul 04, 2020 at 03:56:14PM +0100, Russell King - ARM Linux admin wrote:
> 
> [snip]
> 
> > 
> > NAK for this description.  You know why.
> > 
> > -- 
> > RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
> > FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
> 
> Sorry, I cannot work with "too busy" (your feedback from v1) and "you
> know why". If there's anything incorrect in the description of the
> patch, please point it out and I will change it.

Let's recap.

I have explained to you on numerous instances that:

- as part of the ethtool program, there is the facility to restart
  negotiation, which users expect to cause the media to renegotiate.

- when dealing with a link that involves a conventional copper PHY,
  irrespective of how that PHY is connected, this has always resulted
  in the copper PHY being requested to restart negotiation on the
  media side.

- in order to provide this same capability for fibre links where
  negotiation is supported, phylink provides the ability to pass that
  request to the PCS, since that is the media facing hardware block
  responsible for on-media negotiation.

- for SGMII, there is no advertisement from the MAC per se, it is only
  an acknowledgement that the MAC has received the configuration word
  from the PHY.

It is also true that phylink uses this when there may be a change to
the PCS advertisement - again, to support the ability for the user to
change the media-side advertisement.  There is no media side
advertisement in SGMII at the MAC PCS.

There has been code in phylink that avoids calling the an_restart
methods since day one, as a result of the above.

Let's now look at the first version of your commit message:
| In hardware, the AN_RESTART field for these SerDes protocols (SGMII,
| USXGMII) clears the resolved configuration from the PCS's
| auto-negotiation state machine.
| 
| But PHYLINK has a different interpretation of "AN restart". It assumes
| that this Linux system is capable of re-triggering an auto-negotiation
| sequence, something which is only possible with 1000Base-X and
| 2500Base-X, where the auto-negotiation is symmetrical. In SGMII and
| USXGMII, there's an AN master and an AN slave, and it isn't so much of
| "auto-negotiation" as it is "PHY passing the resolved link state on to
| the MAC".
| 
| So, in PHYLINK's interpretation of "AN restart", it doesn't make sense
| to do anything for SGMII and USXGMII. In fact, PHYLINK won't even call
| us for any other SerDes protocol than 1000Base-X and 2500Base-X. But we
| are not supporting those. So just remove this code.

This comes over as blaming phylink for an interpretation of "AN
restart" that does not conform to your ideas.  While it is true that
phylink has a "different interpretation", that interpretation comes
from the interface that this callback is implementing, which is for
the user-level interface.  So, the "blame" that comes over in this
commit message is completely unjustified.

You also capitalised "PHYLINK" throughout this message for some reason,
which comes over as a stressed word (capitals are generally interpreted
as stress or shouting.)  Then there's "this Linux system" which sounds
a bit spiteful.

None of those things belong in a commit message, so I objected to it,
explicitly asking you to (quote) "So, please, lay off your phylink
bashing in your commit messages."

The replacement that you sent was worse - it continues this theme,
taking it further:

| The point is, both Cisco standards make explicit reference that they
| require an auto-negotiation state machine implemented as per "Figure
| 37-6-Auto-Negotiation state diagram" from IEEE 802.3. In the SGMII spec,
| it is very clearly pointed out that both the MAC PCS (Figure 3 MAC
| Functional Block) and the PHY PCS (Figure 2 PHY Functional Block)
| contain an auto-negotiation block defined by "Auto-Negotiation Figure
| 37-6".

Specifically, "The point is, ..." and "very clearly pointed out" are
completely unnecessary in a commit message, it gives a lecturing tone
to this text.  The lecturing tone continues throughout the entire text.

| PHYLINK takes this fact a bit further, and since the fact that for
| SGMII/USXGMII, the MAC PCS conveys no new information to the PHY PCS
| (beyond acknowledging the received config word), does not have any use
| for permitting the MAC PCS to trigger a restart of the clause 37
| auto-negotiation.

Again, it is not phylink that "takes this fact a bit further".  Phylink
is implementing the needs of userspace via this callback, which is to
cause autonegotiation to restart on the media.

| The only SERDES protocols for which PHYLINK allows that are 1000Base-X
| and 2500Base-X. For those, the control information exchange _is_
| bidirectional (local PCS specifies its duplex and flow control
| abilities) since the link partner is at the other side of the media.

This avoids the point that I have been making for a long time now
about what phylink is doing here.

Let me re-cap: phylink implements what is required to support the
network driver in implementing what the user expects from the APIs
exposed by the kernel. One of the APIs is to restart negotiation, which
is generally accepted to mean the on-media negotiation, rather than
whatever internal negotiation happens within their "network interface".

Hence, it is appropriate that phylink restricts this to situations
where it is known that the media link is terminated on hardware that
phylink is responsible for.

At the moment, the known cases are:
- at the phylib PHY when dealing with conventional twisted pair cabling.
- at the phylib PHY where one is involved in a fibre link.
- at the PCS, where one is involved in a fibre link (which means
  1000base-X or 2500base-X.)

Since SGMII and USXGMII are designed for use between a PHY and the
host system (hence internal to the network interface), rather than over
some user accessible media, there is little point universally making
that call in response to a user request to restart the media
negotiation.
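
As an illustration only, the restriction described above can be modelled
as a small decision function. This is a simplified sketch and not
phylink's actual implementation: the enum, struct and function names
below are hypothetical, and it assumes that a phylib PHY, when present,
always takes priority.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-ins used for illustration only. */
enum link_mode { MODE_SGMII, MODE_USXGMII, MODE_1000BASEX, MODE_2500BASEX };

enum restart_target { RESTART_NONE, RESTART_AT_PHY, RESTART_AT_PCS };

struct port {
	enum link_mode mode;
	bool has_phylib_phy;	/* a PHY driven through phylib is attached */
};

/* True when the PCS itself terminates the media link (fibre). */
static bool pcs_faces_media(enum link_mode mode)
{
	return mode == MODE_1000BASEX || mode == MODE_2500BASEX;
}

/* Decide where a user request to restart media negotiation should go. */
static enum restart_target media_restart_target(const struct port *p)
{
	if (p->has_phylib_phy)
		return RESTART_AT_PHY;	/* twisted pair, or fibre via a PHY */
	if (pcs_faces_media(p->mode))
		return RESTART_AT_PCS;	/* 1000base-X or 2500base-X */
	return RESTART_NONE;		/* SGMII/USXGMII without a PHY */
}
```

With this model, an SGMII port without an accessible PHY yields
RESTART_NONE, which is the case where nothing on the media side can be
asked to renegotiate.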

There are two final points to make:

- if we discover a requirement where we need to restart SGMII or
  USXGMII at the MAC PCS end (thank you for showing me that it is
  possible) then, yes, we will have to revisit how we deal with this.
  Yes, we may wish the callback to restart SGMII and USXGMII at that
  point.  However, we do not want to do that if the user requests a
  media side renegotiation.  As I have already explained, restarting
  negotiation on the media side at the PHY will cause a fresh exchange
  - not once, but twice - on the SGMII and USXGMII side anyway, which
  will refresh the configuration.

  The exception to that is if we have a buggy SGMII or USXGMII
  implementation - and, again, when we have such a scenario, that is
  the time to adapt.

- changing the behaviour now that we have several users without good
  reason is inviting regressions - there is the possibility for a state
  machine error if both ends of the link are hit for a renegotiation.
  Yes, I'm being cautious there, but there is always risk to change,
  and if there is no benefit from making that change then it stands to
  reason that there is no net benefit from making that change.

So, to sum up, your commit message _only_ needs to describe the change
you are making.  You should not lecture in a commit message, and you
should use neutral language.

If there is something lacking in the understanding of the callback,
the right place to fix that is in the documentation within the kernel,
not buried in some commit message for some obscure driver that no one
is going to even look at while developing their own driver.  Even so,
such documentation should clearly but briefly explain what is going on.

I have just spent the last 1h40 composing this message - I've put a lot
of thought into it. I obviously do not have the capacity to do that all
the time.

Vladimir Oltean July 4, 2020, 8:29 p.m. UTC | #4
On Sat, Jul 04, 2020 at 07:14:01PM +0100, Russell King - ARM Linux admin wrote:
> On Sat, Jul 04, 2020 at 06:50:48PM +0300, Vladimir Oltean wrote:
> > On Sat, Jul 04, 2020 at 03:56:14PM +0100, Russell King - ARM Linux admin wrote:
> > 
> > [snip]
> > 
> > > 
> > > NAK for this description.  You know why.
> > > 
> > > -- 
> > > RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
> > > FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
> > 
> > Sorry, I cannot work with "too busy" (your feedback from v1) and "you
> > know why". If there's anything incorrect in the description of the
> > patch, please point it out and I will change it.
> 
> Let's recap.
> 
> I have explained to you on numerous instances that:
> 
> - as part of the ethtool program, there is the facility to restart
>   negotiation, which users expect to cause the media to renegotiate.
> 
> - when dealing with a link that involves a conventional copper PHY,
>   irrespective of how that PHY is connected, this has always resulted
>   in the copper PHY being requested to restart negotiation on the
>   media side.
> 
> - in order to provide this same capability for fibre links where
>   negotiation is supported, phylink provides the ability to pass that
>   request to the PCS, since that is the media facing hardware block
>   responsible for on-media negotiation.
> 
> - for SGMII, there is no advertisement from the MAC per se, it is only
>   an acknowledgement that the MAC has received the configuration word
>   from the PHY.
> 
> It is also true that phylink uses this when there may be a change to
> the PCS advertisement - again, to support the ability for the user to
> change the media-side advertisement.  There is no media side
> advertisement in SGMII at the MAC PCS.
> 

No comment there, my revised commit message confirms indeed that there's
nothing to advertise from MAC side in SGMII/USXGMII mode. The initial
commit message leaves that piece of info up in the air.

> There has been code in phylink that avoids calling the an_restart
> methods since day one, as a result of the above.
> 
> Let's now look at the first version of your commit message:
> | In hardware, the AN_RESTART field for these SerDes protocols (SGMII,
> | USXGMII) clears the resolved configuration from the PCS's
> | auto-negotiation state machine.
> | 
> | But PHYLINK has a different interpretation of "AN restart". It assumes
> | that this Linux system is capable of re-triggering an auto-negotiation
> | sequence, something which is only possible with 1000Base-X and
> | 2500Base-X, where the auto-negotiation is symmetrical. In SGMII and
> | USXGMII, there's an AN master and an AN slave, and it isn't so much of
> | "auto-negotiation" as it is "PHY passing the resolved link state on to
> | the MAC".
> | 
> | So, in PHYLINK's interpretation of "AN restart", it doesn't make sense
> | to do anything for SGMII and USXGMII. In fact, PHYLINK won't even call
> | us for any other SerDes protocol than 1000Base-X and 2500Base-X. But we
> | are not supporting those. So just remove this code.
> 
> This comes over as blaming phylink for an interpretation of "AN
> restart" that does not conform to your ideas.  While it is true that
> phylink has a "different interpretation", that interpretation comes
> from the interface that this callback is implementing, which is for
> the user-level interface.  So, the "blame" that comes over in this
> commit message is completely unjustified.
> 

I also apologized for being imprecise, but you are NACK'ing v2 for v1's
wording here. But, there are also better reasons below.

> You also capitalised "PHYLINK" throughout this message for some reason,
> which comes over as a stressed word (capitals are generally interpreted
> as stress or shouting.)  Then there's "this Linux system" which sounds
> a bit spiteful.
> 

I've been writing PHYLINK in capitals for a long time now, nothing to
do with shouting, just with the fact that I also spell "Ethernet PHY"
with capitals. I'll change to "Phylink" or "phylink", depending on the
word's location within the phrase, and I'll also ask the people I know
to use this notation.

> None of those things belong in a commit message, so I objected to it,
> explicitly asking you to (quote) "So, please, lay off your phylink
> bashing in your commit messages."
> 
> The replacement that you sent was worse - it continues this theme,
> taking it further:
> 
> | The point is, both Cisco standards make explicit reference that they
> | require an auto-negotiation state machine implemented as per "Figure
> | 37-6-Auto-Negotiation state diagram" from IEEE 802.3. In the SGMII spec,
> | it is very clearly pointed out that both the MAC PCS (Figure 3 MAC
> | Functional Block) and the PHY PCS (Figure 2 PHY Functional Block)
> | contain an auto-negotiation block defined by "Auto-Negotiation Figure
> | 37-6".
> 
> Specifically, "The point is, ..." and "very clearly pointed out" are
> completely unnecessary in a commit message, it gives a lecturing tone
> to this text.  The lecturing tone continues throughout the entire text.
> 

I ramble quite a lot in commit messages, it's nothing personal.
Sometimes, among the rambling I say something useful too. I'll try to
value maintainers' time more by using more succinct phrases.

> | PHYLINK takes this fact a bit further, and since the fact that for
> | SGMII/USXGMII, the MAC PCS conveys no new information to the PHY PCS
> | (beyond acknowledging the received config word), does not have any use
> | for permitting the MAC PCS to trigger a restart of the clause 37
> | auto-negotiation.
> 
> Again, it is not phylink that "takes this fact a bit further".  Phylink
> is implementing the needs of userspace via this callback, which is to
> cause autonegotiation to restart on the media.
> 
> | The only SERDES protocols for which PHYLINK allows that are 1000Base-X
> | and 2500Base-X. For those, the control information exchange _is_
> | bidirectional (local PCS specifies its duplex and flow control
> | abilities) since the link partner is at the other side of the media.
> 
> This avoids the point that I have been making for a long time now
> about what phylink is doing here.
> 
> Let me re-cap: phylink implements what is required to support the
> network driver in implementing what the user expects from the APIs
> exposed by the kernel. One of the APIs is to restart negotiation, which
> is generally accepted to mean the on-media negotiation, rather than
> whatever internal negotiation happens within their "network interface".
> 
> Hence, it is appropriate that phylink restricts this to situations
> where it is known that the media link is terminated on hardware that
> phylink is responsible for.
> 

It doesn't avoid that point. ACK, ethtool -r exists, and .mac_an_restart
can be used to implement it under some circumstances. More below.

> At the moment, the known cases are:
> - at the phylib PHY when dealing with conventional twisted pair cabling.
> - at the phylib PHY where one is involved in a fibre link.
> - at the PCS, where one is involved in a fibre link (which means
>   1000base-X or 2500base-X.)
> 
> Since SGMII and USXGMII are designed for use between a PHY and the
> host system (hence internal to the network interface), rather than over
> some user accessible media, there is little point universally making
> that call in response to a user request to restart the media
> negotiation.
> 
> There are two final points to make:
> 
> - if we discover a requirement where we need to restart SGMII or
>   USXGMII at the MAC PCS end (thank you for showing me that it is
>   possible) then, yes, we will have to revisit how we deal with this.
>   Yes, we may wish the callback to restart SGMII and USXGMII at that
>   point.  However, we do not want to do that if the user requests a
>   media side renegotiation.  As I have already explained, restarting
>   negotiation on the media side at the PHY will cause a fresh exchange
>   - not once, but twice - on the SGMII and USXGMII side anyway, which
>   will refresh the configuration.
> 
>   The exception to that is if we have a buggy SGMII or USXGMII
>   implementation - and, again, when we have such a scenario, that is
>   the time to adapt.
> 
> - changing the behaviour now that we have several users without good
>   reason is inviting regressions - there is the possibility for a state
>   machine error if both ends of the link are hit for a renegotiation.
>   Yes, I'm being cautious there, but there is always risk to change,
>   and if there is no benefit from making that change then it stands to
>   reason that there is no net benefit from making that change.
> 

Yes, I am definitely not suggesting a phylink API change or
reinterpretation of existing API at this point. That would be confusing,
and "confusing" is what I want to avoid, perhaps by using more words
than necessary. I will happily accept that I am the only one who
misunderstood the API on this particular aspect.

> So, to sum up, your commit message _only_ needs to describe the change
> you are making.  You should not lecture in a commit message, and you
> should use neutral language.
> 

Ok, I will try to keep it shorter in v3 and lose the lecturing tone.

> If there is something lacking in the understanding of the callback,
> the right place to fix that is in the documentation within the kernel,
> not buried in some commit message for some obscure driver that no one
> is going to even look at while developing their own driver.  Even so,
> such documentation should clearly but briefly explain what is going on.
> 

There were a number of other points you've made in this text, all of
which boil down to one idea: that restarting SGMII AN from MAC side does
not trigger an MDI-side auto-negotiation process, so it cannot be used
for implementing the behavior expected by the user for "ethtool -r".

I will try to address those points centrally, here, by asking 2
questions.

1. In various topics you have brought up a certain copper SFP module
   from Mikrotik which embeds an inaccessible Atheros SGMII PHY. Mind
   you, I have never interacted with that SFP, but, I have a question
   out of sheer curiosity. How does ethtool -r currently work for such a
   system?

   [ I am not going to use this argument to lean this particular
   discussion in either direction (read: even if my hunch is right and
   restarting AN on the MAC PCS _could_ be the only way to implement
   ethtool -r there, I still don't care enough about that one-off case
   to change the phylink API, for the time being), but I _would_ like
   to know ]

2. There are some 1000Base-T PHYs, such as VSC8234 (which I know from
   first-hand experience, in fact there's even a comment in
   felix_vsc9959.c about it), which restart their MDI-side AN when they
   detect a transition of the system side from data mode to
   configuration mode [ initiated by the MAC ].
   Is this behavior implied by any standard (probably IEEE)? That I
   didn't check. Is this behavior also at least consistent with the
   non-SFP SGMII Atheros PHYs I have? I didn't check that either.
   Anyway, food for thought.

> I have just spent the last 1h40 composing this message - I've put a lot
> of thought into it. I obviously do not have the capacity to do that all
> the time.
> 

Thank you, it shows that you've put some more time and thought into this
reply. Maybe some balance would work a lot better overall?

> -- 
> RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
> FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

-Vladimir

Russell King (Oracle) July 4, 2020, 9:55 p.m. UTC | #5
On Sat, Jul 04, 2020 at 11:29:18PM +0300, Vladimir Oltean wrote:
> I will try to address those points centrally, here, by asking 2
> questions.
> 
> 1. In various topics you have brought up a certain copper SFP module
>    from Mikrotik which embeds an inaccessible Atheros SGMII PHY. Mind
>    you, I have never interacted with that SFP, but, I have a question
>    out of sheer curiosity. How does ethtool -r currently work for such a
>    system?

It does not, but we should probably error out if we're in SGMII mode
and we have no PHY, so userspace knows that the request could not be
satisfied.

>    [ I am not going to use this argument to lean this particular
>    discussion in either direction (read: even if my hunch is right and
>    restarting AN on the MAC PCS _could_ be the only way to implement
>    ethtool -r there, I still don't care enough about that one-off case
>    to change the phylink API, for the time being), but I _would_ like
>    to know ]

Even if we did, it will not cause the media side of the Atheros PHY to
renegotiate - the Atheros PHY makes no mention that restarting the
SGMII exchange has any effect on the media side.  I've just tried it
(again) this time with the module plugged in the LX2160A rather than
a Marvell platform - here's the PCS register dump:

00: 0x1140 0x002d 0x0083 0xe400 0x4001 0xd801 0x0006 0x0000
08: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
10: 0x0000 0x0001 0x0d40 0x0003 0x0003 0xdab6 0x0000 0x0000
18: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000

If I write 0x1340 to the BMCR, nothing happens on the media side - it
remains linked with the switch the RJ45 cable is plugged in to, which
means nothing happened on the media side.
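
For reference, the values above can be decoded with a few lines of C.
This sketch assumes the dump follows the clause 22 register layout
(register 0 is the BMCR, register 1 the BMSR); the bit values match
those in linux/mii.h, while the two helper functions are made up for
illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Basic mode control register bits (same values as linux/mii.h). */
#define BMCR_SPEED1000		0x0040
#define BMCR_FULLDPLX		0x0100
#define BMCR_ANRESTART		0x0200
#define BMCR_ANENABLE		0x1000

/* Basic mode status register bits (same values as linux/mii.h). */
#define BMSR_LSTATUS		0x0004
#define BMSR_ANEGCOMPLETE	0x0020

/* Hypothetical helper: the BMCR value to write to request an AN restart. */
static uint16_t bmcr_request_an_restart(uint16_t bmcr)
{
	return bmcr | BMCR_ANRESTART;
}

/* Hypothetical helper: has the PCS reported AN completion? */
static bool bmsr_an_complete(uint16_t bmsr)
{
	return (bmsr & BMSR_ANEGCOMPLETE) != 0;
}
```

Under those assumptions, 0x1140 is BMCR_ANENABLE | BMCR_FULLDPLX |
BMCR_SPEED1000, and 0x1340 is the same value with BMCR_ANRESTART set.
The BMSR value 0x002d has both BMSR_LSTATUS and BMSR_ANEGCOMPLETE set.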

> 2. There are some 1000Base-T PHYs, such as VSC8234 (which I know from
>    first-hand experience, in fact there's even a comment in
>    felix_vsc9959.c about it), which restart their MDI-side AN when they
>    detect a transition of the system side from data mode to
>    configuration mode [ initiated by the MAC ].
>    Is this behavior implied by any standard (probably IEEE)? That I
>    didn't check. Is this behavior also at least consistent with the
>    non-SFP SGMII Atheros PHYs I have? I didn't check that either.
>    Anyway, food for thought.

The first thing to straighten out in your comment above is that SGMII
is a Cisco modification of IEEE 802.3 1000BASE-X.  SGMII has not been
incorporated into IEEE 802.3.

I'm going to start with a description of the two - which will aid the
later explanation, so please bear with me.

1000BASE-X deals with gigabit ethernet over Fibre media - and what you
have there (from a practical point, rather than the ISO levels that
are shown in IEEE 802.3) is:

MAC <-> PCS <-> Serdes <-> Optical media <-> Serdes <-> PCS <-> MAC

Each PCS transmits the abilities of its respective end, and receives
the abilities from the remote end. The abilities consist only of
duplex and pause modes, since speed is fixed at gigabit.  Negotiation
can be restarted by either end.

Cisco SGMII took this, and decided to make several modifications to
support a PHY instead of optical media, the most important being:

1) addition of symbol replication for 100M and 10M over the gigabit
   path without changing the bitrate of the path.
2) changed the contents of the configuration word to allow the PHY
   to inform the MAC of the current speed and duplex.

So we end up with:

MAC <-> PCS <-> Serdes <-> PHY <-> Media ...

Given that this is the case, IEEE 802.3 does not cover this setup -
having a PHY attached is beyond the 1000BASE-X specification that
it covers.  So there is nothing in there to mandate that a SGMII PHY
should restart its media side negotiation due to the SGMII side
restarting.

If it were required, it would be in the Cisco SGMII specification. As
it doesn't even make explicit mention of restarting the SGMII exchange
from the MAC end, I really doubt that it would make any comment about
a restart of the SGMII exchange restarting the media side.

So, we're down to the vagaries of the various PHY manufacturers.

As I've shown, Atheros AR803x do not restart their media side on SGMII
side "negotiation" events.  I've just tested with the Marvell 88E1111,
which is probably the most popular PHY for copper gigabit SFPs out
there, and that does not restart the media side either.

As for VSC8234, the comment you refer to is:

	Some PHYs like VSC8234 don't like it when AN restarts on
	their system side and they restart line side AN too, going
	into an endless link up/down loop.

However, without knowing in detail what is happening on the SGMII link,
it would be difficult to really know what is going on. It could be that
the implementation in the PHY is fine but has this additional vendor
feature, but the host side always triggers a second exchange of the
configuration word each time that the PHY notifies the MAC PCS of an
updated configuration word.  It could also be a misfeature of the PHY
itself.

It is possible to detect which mode the VSC8234 PHY is in when
connected to the Lynx PCS by looking at the link-partner advertisement
register (register 5) when an AN exchange has completed. Bit 0 is a good
indicator whether the PHY is operating in SGMII mode (1) or 1000BASE-X
mode (0).
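
As a rough illustration of that check, here is a sketch that decodes a
link-partner word the way a host in SGMII mode would. It assumes the Cisco
SGMII layout of the PHY's configuration word and borrows the LPA_SGMII_*
names from include/uapi/linux/mii.h; struct sgmii_lpa and decode_lpa() are
made-up names for this example. Bit 0 is only treated as a hint, as
described above:

```c
#include <stdbool.h>
#include <stdint.h>

/* SGMII view of register 5, names as in include/uapi/linux/mii.h. */
#define LPA_SGMII		0x0001	/* bit 0 set: word looks like SGMII */
#define LPA_SGMII_SPD_MASK	0x0c00	/* speed field, bits 11:10 */
#define LPA_SGMII_10		0x0000
#define LPA_SGMII_100		0x0400
#define LPA_SGMII_1000		0x0800
#define LPA_SGMII_FULL_DUPLEX	0x1000
#define LPA_SGMII_LINK		0x8000

/* Hypothetical container for the decoded fields. */
struct sgmii_lpa {
	bool sgmii;		/* bit 0: 1 suggests SGMII, 0 suggests 1000BASE-X */
	bool link;
	bool full_duplex;
	int speed;		/* Mbps, or -1 if the speed field is reserved */
};

static struct sgmii_lpa decode_lpa(uint16_t lpa)
{
	struct sgmii_lpa r = { 0 };

	r.sgmii = lpa & LPA_SGMII;
	r.link = lpa & LPA_SGMII_LINK;
	r.full_duplex = lpa & LPA_SGMII_FULL_DUPLEX;

	switch (lpa & LPA_SGMII_SPD_MASK) {
	case LPA_SGMII_10:
		r.speed = 10;
		break;
	case LPA_SGMII_100:
		r.speed = 100;
		break;
	case LPA_SGMII_1000:
		r.speed = 1000;
		break;
	default:
		r.speed = -1;
		break;
	}
	return r;
}
```

Feeding it register 5 from the dump earlier in this mail (0xd801) yields
SGMII mode, link up, full duplex, 1000 Mbps. The link, duplex and speed
fields are only meaningful when bit 0 indicates SGMII.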

There is another possibility, however: the VSC8234 may not be in SGMII
mode, but in 1000BASE-X.  I'm aware of some copper SFPs that
use 1000BASE-X rather than SGMII, where the advertisement from the MAC
PCS to the SFP affects the media side duplex and pause advertisement,
and so any change on the host side causes the media side to restart.
It is, however, unlikely that a PHY configured in 1000BASE-X will be
able to complete negotiation with a host in SGMII mode - the duplex
bits will both be zero leading to an invalid resolution.
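
To make the "invalid resolution" point concrete, here is a small sketch of
clause 37 style duplex resolution as a 1000BASE-X end would perform it. The
bit names follow include/uapi/linux/mii.h; resolve_1000basex_duplex() and
the enum are hypothetical names for this example. It assumes the host in
SGMII mode transmits 0x4001, which is what register 4 shows in the dump
earlier in this mail:

```c
#include <stdint.h>

/* 1000BASE-X advertisement bits, names as in include/uapi/linux/mii.h. */
#define ADVERTISE_1000XFULL	0x0020	/* bit 5: full duplex */
#define ADVERTISE_1000XHALF	0x0040	/* bit 6: half duplex */

enum duplex_result {
	DUPLEX_RESULT_INVALID,	/* no duplex mode common to both ends */
	DUPLEX_RESULT_HALF,
	DUPLEX_RESULT_FULL,
};

/* Hypothetical helper: pick the best duplex mode both ends advertise. */
static enum duplex_result resolve_1000basex_duplex(uint16_t local_adv,
						   uint16_t partner_adv)
{
	uint16_t common = local_adv & partner_adv;

	if (common & ADVERTISE_1000XFULL)
		return DUPLEX_RESULT_FULL;
	if (common & ADVERTISE_1000XHALF)
		return DUPLEX_RESULT_HALF;
	return DUPLEX_RESULT_INVALID;
}
```

A PHY advertising both duplex modes (0x0060) against a partner word of
0x4001 ends up with DUPLEX_RESULT_INVALID, because bits 5 and 6 are both
clear in the word sent by the SGMII host.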

Patch

diff --git a/drivers/net/dsa/ocelot/felix.c b/drivers/net/dsa/ocelot/felix.c
index 4684339012c5..57c400a67f16 100644
--- a/drivers/net/dsa/ocelot/felix.c
+++ b/drivers/net/dsa/ocelot/felix.c
@@ -296,15 +296,6 @@  static void felix_phylink_mac_config(struct dsa_switch *ds, int port,
 						  state->speed);
 }
 
-static void felix_phylink_mac_an_restart(struct dsa_switch *ds, int port)
-{
-	struct ocelot *ocelot = ds->priv;
-	struct felix *felix = ocelot_to_felix(ocelot);
-
-	if (felix->info->pcs_an_restart)
-		felix->info->pcs_an_restart(ocelot, port);
-}
-
 static void felix_phylink_mac_link_down(struct dsa_switch *ds, int port,
 					unsigned int link_an_mode,
 					phy_interface_t interface)
@@ -810,7 +801,6 @@  static const struct dsa_switch_ops felix_switch_ops = {
 	.phylink_validate	= felix_phylink_validate,
 	.phylink_mac_link_state	= felix_phylink_mac_pcs_get_state,
 	.phylink_mac_config	= felix_phylink_mac_config,
-	.phylink_mac_an_restart	= felix_phylink_mac_an_restart,
 	.phylink_mac_link_down	= felix_phylink_mac_link_down,
 	.phylink_mac_link_up	= felix_phylink_mac_link_up,
 	.port_enable		= felix_port_enable,
diff --git a/drivers/net/dsa/ocelot/felix.h b/drivers/net/dsa/ocelot/felix.h
index a891736ca006..4a4cebcf04a7 100644
--- a/drivers/net/dsa/ocelot/felix.h
+++ b/drivers/net/dsa/ocelot/felix.h
@@ -31,7 +31,6 @@  struct felix_info {
 	void	(*pcs_init)(struct ocelot *ocelot, int port,
 			    unsigned int link_an_mode,
 			    const struct phylink_link_state *state);
-	void	(*pcs_an_restart)(struct ocelot *ocelot, int port);
 	void	(*pcs_link_state)(struct ocelot *ocelot, int port,
 				  struct phylink_link_state *state);
 	int	(*prevalidate_phy_mode)(struct ocelot *ocelot, int port,
diff --git a/drivers/net/dsa/ocelot/felix_vsc9959.c b/drivers/net/dsa/ocelot/felix_vsc9959.c
index 94e946b26f90..65f83386bad1 100644
--- a/drivers/net/dsa/ocelot/felix_vsc9959.c
+++ b/drivers/net/dsa/ocelot/felix_vsc9959.c
@@ -728,42 +728,6 @@  static int vsc9959_reset(struct ocelot *ocelot)
 	return 0;
 }
 
-static void vsc9959_pcs_an_restart_sgmii(struct phy_device *pcs)
-{
-	phy_set_bits(pcs, MII_BMCR, BMCR_ANRESTART);
-}
-
-static void vsc9959_pcs_an_restart_usxgmii(struct phy_device *pcs)
-{
-	phy_write_mmd(pcs, MDIO_MMD_VEND2, MII_BMCR,
-		      USXGMII_BMCR_RESET |
-		      USXGMII_BMCR_AN_EN |
-		      USXGMII_BMCR_RST_AN);
-}
-
-static void vsc9959_pcs_an_restart(struct ocelot *ocelot, int port)
-{
-	struct felix *felix = ocelot_to_felix(ocelot);
-	struct phy_device *pcs = felix->pcs[port];
-
-	if (!pcs)
-		return;
-
-	switch (pcs->interface) {
-	case PHY_INTERFACE_MODE_SGMII:
-	case PHY_INTERFACE_MODE_QSGMII:
-		vsc9959_pcs_an_restart_sgmii(pcs);
-		break;
-	case PHY_INTERFACE_MODE_USXGMII:
-		vsc9959_pcs_an_restart_usxgmii(pcs);
-		break;
-	default:
-		dev_err(ocelot->dev, "Invalid PCS interface type %s\n",
-			phy_modes(pcs->interface));
-		break;
-	}
-}
-
 /* We enable SGMII AN only when the PHY has managed = "in-band-status" in the
  * device tree. If we are in MLO_AN_PHY mode, we program directly state->speed
  * into the PCS, which is retrieved out-of-band over MDIO. This also has the
@@ -1411,7 +1375,6 @@  struct felix_info felix_info_vsc9959 = {
 	.mdio_bus_alloc		= vsc9959_mdio_bus_alloc,
 	.mdio_bus_free		= vsc9959_mdio_bus_free,
 	.pcs_init		= vsc9959_pcs_init,
-	.pcs_an_restart		= vsc9959_pcs_an_restart,
 	.pcs_link_state		= vsc9959_pcs_link_state,
 	.prevalidate_phy_mode	= vsc9959_prevalidate_phy_mode,
 	.port_setup_tc          = vsc9959_port_setup_tc,