mbox series

[net-next,0/8] Extend `phy-mode` to string array

Message ID 20211117225050.18395-1-kabel@kernel.org
Headers show
Series Extend `phy-mode` to string array | expand

Message

Marek Behún Nov. 17, 2021, 10:50 p.m. UTC
Hello,

this series extends the `phy-connection-type` / `phy-mode` property to
be an array of strings, instead of just one string, and makes the
corresponding changes to code. It then uses this changes to make
marvell10g PHY driver choose the best MACTYPE according to which
phy-modes are supported by the MAC, the PHY and the board.

Conventionaly the `phy-mode` means "this is the mode I want the PHY to
operate in". But we now have some PHYs that may need to change the PHY
mode during operation (marvell10g PHY driver), and so we need to know
all the supported modes. Russell King is working on MAC and PHY drivers
to inform phylink on which PHY interface modes they support, but it is
not enough, because even if a MAC/PHY driver fills all the modes
supported by the driver, still each individual board may have only some
of these modes actually wired.

This series
- changes the type of the `phy-mode` property to be an array of PHY
  interface strings,
- updated documentation of of_get_phy_mode() and related to inform that
  only first mode is returned by these functions (since this function
  is needed to still support conventional usage of the `phy-mode`
  property),
- adds fwnode_get_phy_modes() function which reads the `phy-mode` array
  and fills bitmap with mentioned modes,
- adds code to phylink to intersect the supported interfaces bitmap
  supplied by MAC driver, with interface modes defined in device-tree
  (and keeps backwards compatibility with conventional usage of the
   phy-mode property, for more information read the commit message of
   patch 4/8),
- passes supported interfaces to PHY driver so that it may configure
  a PHY to a specific mode given these interfaces,
- uses this information in marvell10g driver.

Changes since RFC:
- update also description of the `phy-connection-type` property

Marek Behún (7):
  dt-bindings: ethernet-controller: support multiple PHY connection
    types
  net: Update documentation for *_get_phy_mode() functions
  device property: add helper function for getting phy mode bitmap
  net: phylink: update supported_interfaces with modes from fwnode
  net: phylink: pass supported PHY interface modes to phylib
  net: phy: marvell10g: Use generic macro for supported interfaces
  net: phy: marvell10g: Use tabs instead of spaces for indentation

Russell King (1):
  net: phy: marvell10g: select host interface configuration

 .../bindings/net/ethernet-controller.yaml     |  94 ++++++------
 drivers/base/property.c                       |  48 +++++-
 drivers/net/phy/marvell10g.c                  | 140 ++++++++++++++++--
 drivers/net/phy/phylink.c                     |  91 ++++++++++++
 include/linux/phy.h                           |  10 ++
 include/linux/property.h                      |   3 +
 net/core/of_net.c                             |   9 +-
 7 files changed, 325 insertions(+), 70 deletions(-)

Comments

Russell King (Oracle) Nov. 18, 2021, 9:05 a.m. UTC | #1
On Wed, Nov 17, 2021 at 11:50:46PM +0100, Marek Behún wrote:
> Now that the 'phy-mode' property can be a string array containing more
> PHY modes (all that are supported by the board), update the bitmap of
> interfaces supported by the MAC with this property.
> 
> Normally this would be a simple intersection (of interfaces supported by
> the current implementation of the driver and interfaces supported by the
> board), but we need to keep being backwards compatible with older DTs,
> which may only define one mode, since, as Russell King says,
>   conventionally phy-mode has meant "this is the mode we want to operate
>   the PHY interface in" which was fine when PHYs didn't change their
>   mode depending on the media speed
> 
> An example is DT defining
>   phy-mode = "sgmii";
> but the board supporting also 1000base-x and 2500base-x.
> 
> Add the following logic to keep this backwards compatiblity:
> - if more PHY modes are defined, do a simple intersection
> - if one PHY mode is defined:
>   - if it is sgmii, 1000base-x or 2500base-x, add all three and then do
>     the intersection
>   - if it is 10gbase-r or usxgmii, add both, and also 5gbase-r,
>     2500base-x, 1000base-x and sgmii, and then do the intersection
> 
> This is simple enough and should work for all boards.
> 
> Nonetheless it is possible (although extremely unlikely, in my opinion)
> that a board will be found that (for example) defines
>   phy-mode = "sgmii";
> and the MAC drivers supports sgmii, 1000base-x and 2500base-x, but the
> board DOESN'T support 2500base-x, because of electrical reasons (since
> the frequency is 2.5x of sgmii).
> Our code will in this case incorrectly infer also support for
> 2500base-x. To avoid this, the board maintainer should either change DTS
> to
>   phy-mode = "sgmii", "1000base-x";
> and update device tree on all boards, or, if that is impossible, add a
> fix into the function we are introducing in this commit.
> 
> Signed-off-by: Marek Behún <kabel@kernel.org>
> ---
>  drivers/net/phy/phylink.c | 63 +++++++++++++++++++++++++++++++++++++++
>  include/linux/phy.h       |  6 ++++
>  2 files changed, 69 insertions(+)
> 
> diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
> index f7156b6868e7..6d7c216a5dea 100644
> --- a/drivers/net/phy/phylink.c
> +++ b/drivers/net/phy/phylink.c
> @@ -563,6 +563,67 @@ static int phylink_parse_fixedlink(struct phylink *pl,
>  	return 0;
>  }
>  
> +static void phylink_update_phy_modes(struct phylink *pl,
> +				     struct fwnode_handle *fwnode)
> +{
> +	unsigned long *supported = pl->config->supported_interfaces;
> +	DECLARE_PHY_INTERFACE_MASK(modes);
> +
> +	if (fwnode_get_phy_modes(fwnode, modes) < 0)
> +		return;
> +
> +	if (phy_interface_empty(modes))
> +		return;
> +
> +	/* If supported is empty, just copy modes defined in fwnode. */
> +	if (phy_interface_empty(supported))
> +		return phy_interface_copy(supported, modes);

Doesn't this mean we always end up with the supported_interfaces field
filled in, even for drivers that haven't yet been converted? It will
have the effect of locking the driver to the interface mode in "modes"
where only one interface mode is mentioned in DT.

At the moment, I think the only drivers that would be affected would be
some DSA drivers, stmmac and macb as they haven't yet been converted.
Russell King (Oracle) Nov. 18, 2021, 9:32 a.m. UTC | #2
On Wed, Nov 17, 2021 at 11:50:47PM +0100, Marek Behún wrote:
> Pass the supported PHY interface types to phylib so that PHY drivers
> can select an appropriate host configuration mode for their interface
> according to the host capabilities.
> 
> Signed-off-by: Marek Behún <kabel@kernel.org>

Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Russell King (Oracle) Nov. 18, 2021, 9:33 a.m. UTC | #3
On Wed, Nov 17, 2021 at 11:50:48PM +0100, Marek Behún wrote:
> Now that phy.h defines macro DECLARE_PHY_INTERFACE_MASK(), use it
> instead of DECLARE_BITMAP().
> 
> Signed-off-by: Marek Behún <kabel@kernel.org>

Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Russell King (Oracle) Nov. 18, 2021, 9:34 a.m. UTC | #4
On Wed, Nov 17, 2021 at 11:50:49PM +0100, Marek Behún wrote:
> Some register definitions were defined with spaces used for indentation.
> Change them to tabs.
> 
> Signed-off-by: Marek Behún <kabel@kernel.org>

Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Russell King (Oracle) Nov. 18, 2021, 9:41 a.m. UTC | #5
On Wed, Nov 17, 2021 at 11:50:50PM +0100, Marek Behún wrote:
> +static int mv3310_select_mactype(unsigned long *interfaces)
> +{
> +	if (test_bit(PHY_INTERFACE_MODE_USXGMII, interfaces))
> +		return MV_V2_33X0_PORT_CTRL_MACTYPE_USXGMII;
> +	else if (test_bit(PHY_INTERFACE_MODE_SGMII, interfaces) &&
> +		 test_bit(PHY_INTERFACE_MODE_10GBASER, interfaces))
> +		return MV_V2_33X0_PORT_CTRL_MACTYPE_10GBASER;
> +	else if (test_bit(PHY_INTERFACE_MODE_SGMII, interfaces) &&
> +		 test_bit(PHY_INTERFACE_MODE_RXAUI, interfaces))
> +		return MV_V2_33X0_PORT_CTRL_MACTYPE_RXAUI;
> +	else if (test_bit(PHY_INTERFACE_MODE_SGMII, interfaces) &&
> +		 test_bit(PHY_INTERFACE_MODE_XAUI, interfaces))
> +		return MV_V2_3310_PORT_CTRL_MACTYPE_XAUI;
> +	else if (test_bit(PHY_INTERFACE_MODE_10GBASER, interfaces))
> +		return MV_V2_33X0_PORT_CTRL_MACTYPE_10GBASER_RATE_MATCH;
> +	else if (test_bit(PHY_INTERFACE_MODE_RXAUI, interfaces))
> +		return MV_V2_33X0_PORT_CTRL_MACTYPE_RXAUI_RATE_MATCH;
> +	else if (test_bit(PHY_INTERFACE_MODE_XAUI, interfaces))
> +		return MV_V2_3310_PORT_CTRL_MACTYPE_XAUI_RATE_MATCH;
> +	else if (test_bit(PHY_INTERFACE_MODE_SGMII, interfaces))
> +		return MV_V2_33X0_PORT_CTRL_MACTYPE_10GBASER;

Hi,

There are differences in the MACTYPE register between the 88X3310
and 88X3340. For example, the 88X3340 has no support for XAUI.
This is documented in the data sheet, and in the definitions of
these values - note that MV_V2_3310_PORT_CTRL_MACTYPE_XAUI and
MV_V2_3310_PORT_CTRL_MACTYPE_XAUI_RATE_MATCH are only applicable
to the 88X3310 (they don't use MV_V2_33X0_*).
Vladimir Oltean Nov. 18, 2021, 12:03 p.m. UTC | #6
Hello,

On Wed, Nov 17, 2021 at 11:50:50PM +0100, Marek Behún wrote:
> From: Russell King <rmk+kernel@armlinux.org.uk>
> 
> Select the host interface configuration according to the capabilities of
> the host.
> 
> This allows the kernel to:
> - support SFP modules with 88X33X0 or 88E21X0 inside them
> - switch interface modes when the PHY is used with the mvpp2 MAC
>   (e.g. on MacchiatoBIN)
> 
> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
> [ rebase, updated, also added support for 88E21X0 ]
> Signed-off-by: Marek Behún <kabel@kernel.org>
> ---
>  drivers/net/phy/marvell10g.c | 120 +++++++++++++++++++++++++++++++++--
>  1 file changed, 115 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/net/phy/marvell10g.c b/drivers/net/phy/marvell10g.c
> index 0cb9b4ef09c7..94bea1bade6f 100644
> --- a/drivers/net/phy/marvell10g.c
> +++ b/drivers/net/phy/marvell10g.c
> @@ -96,6 +96,11 @@ enum {
>  	MV_PCS_PORT_INFO_NPORTS_MASK	= 0x0380,
>  	MV_PCS_PORT_INFO_NPORTS_SHIFT	= 7,
>  
> +	/* SerDes reinitialization 88E21X0 */
> +	MV_AN_21X0_SERDES_CTRL2	= 0x800f,
> +	MV_AN_21X0_SERDES_CTRL2_AUTO_INIT_DIS	= BIT(13),
> +	MV_AN_21X0_SERDES_CTRL2_RUN_INIT	= BIT(15),
> +
>  	/* These registers appear at 0x800X and 0xa00X - the 0xa00X control
>  	 * registers appear to set themselves to the 0x800X when AN is
>  	 * restarted, but status registers appear readable from either.
> @@ -140,6 +145,8 @@ struct mv3310_chip {
>  	bool (*has_downshift)(struct phy_device *phydev);
>  	void (*init_supported_interfaces)(unsigned long *mask);
>  	int (*get_mactype)(struct phy_device *phydev);
> +	int (*set_mactype)(struct phy_device *phydev, int mactype);
> +	int (*select_mactype)(unsigned long *interfaces);
>  	int (*init_interface)(struct phy_device *phydev, int mactype);
>  
>  #ifdef CONFIG_HWMON
> @@ -593,6 +600,49 @@ static int mv2110_get_mactype(struct phy_device *phydev)
>  	return mactype & MV_PMA_21X0_PORT_CTRL_MACTYPE_MASK;
>  }
>  
> +static int mv2110_set_mactype(struct phy_device *phydev, int mactype)
> +{
> +	int err, val;
> +
> +	mactype &= MV_PMA_21X0_PORT_CTRL_MACTYPE_MASK;
> +	err = phy_modify_mmd(phydev, MDIO_MMD_PMAPMD, MV_PMA_21X0_PORT_CTRL,
> +			     MV_PMA_21X0_PORT_CTRL_SWRST |
> +			     MV_PMA_21X0_PORT_CTRL_MACTYPE_MASK,
> +			     MV_PMA_21X0_PORT_CTRL_SWRST | mactype);
> +	if (err)
> +		return err;
> +
> +	err = phy_set_bits_mmd(phydev, MDIO_MMD_AN, MV_AN_21X0_SERDES_CTRL2,
> +			       MV_AN_21X0_SERDES_CTRL2_AUTO_INIT_DIS |
> +			       MV_AN_21X0_SERDES_CTRL2_RUN_INIT);
> +	if (err)
> +		return err;
> +
> +	err = phy_read_mmd_poll_timeout(phydev, MDIO_MMD_AN,
> +					MV_AN_21X0_SERDES_CTRL2, val,
> +					!(val &
> +					  MV_AN_21X0_SERDES_CTRL2_RUN_INIT),
> +					5000, 100000, true);
> +	if (err)
> +		return err;
> +
> +	return phy_clear_bits_mmd(phydev, MDIO_MMD_AN, MV_AN_21X0_SERDES_CTRL2,
> +				  MV_AN_21X0_SERDES_CTRL2_AUTO_INIT_DIS);
> +}
> +
> +static int mv2110_select_mactype(unsigned long *interfaces)
> +{
> +	if (test_bit(PHY_INTERFACE_MODE_USXGMII, interfaces))
> +		return MV_PMA_21X0_PORT_CTRL_MACTYPE_USXGMII;
> +	else if (test_bit(PHY_INTERFACE_MODE_SGMII, interfaces) &&
> +		 !test_bit(PHY_INTERFACE_MODE_10GBASER, interfaces))
> +		return MV_PMA_21X0_PORT_CTRL_MACTYPE_5GBASER;
> +	else if (test_bit(PHY_INTERFACE_MODE_10GBASER, interfaces))
> +		return MV_PMA_21X0_PORT_CTRL_MACTYPE_10GBASER_RATE_MATCH;
> +	else
> +		return -1;
> +}
> +
>  static int mv3310_get_mactype(struct phy_device *phydev)
>  {
>  	int mactype;
> @@ -604,6 +654,46 @@ static int mv3310_get_mactype(struct phy_device *phydev)
>  	return mactype & MV_V2_33X0_PORT_CTRL_MACTYPE_MASK;
>  }
>  
> +static int mv3310_set_mactype(struct phy_device *phydev, int mactype)
> +{
> +	int ret;
> +
> +	mactype &= MV_V2_33X0_PORT_CTRL_MACTYPE_MASK;
> +	ret = phy_modify_mmd_changed(phydev, MDIO_MMD_VEND2, MV_V2_PORT_CTRL,
> +				     MV_V2_33X0_PORT_CTRL_MACTYPE_MASK,
> +				     mactype);
> +	if (ret <= 0)
> +		return ret;
> +
> +	return phy_set_bits_mmd(phydev, MDIO_MMD_VEND2, MV_V2_PORT_CTRL,
> +				MV_V2_33X0_PORT_CTRL_SWRST);
> +}
> +
> +static int mv3310_select_mactype(unsigned long *interfaces)
> +{
> +	if (test_bit(PHY_INTERFACE_MODE_USXGMII, interfaces))
> +		return MV_V2_33X0_PORT_CTRL_MACTYPE_USXGMII;
> +	else if (test_bit(PHY_INTERFACE_MODE_SGMII, interfaces) &&
> +		 test_bit(PHY_INTERFACE_MODE_10GBASER, interfaces))
> +		return MV_V2_33X0_PORT_CTRL_MACTYPE_10GBASER;
> +	else if (test_bit(PHY_INTERFACE_MODE_SGMII, interfaces) &&
> +		 test_bit(PHY_INTERFACE_MODE_RXAUI, interfaces))
> +		return MV_V2_33X0_PORT_CTRL_MACTYPE_RXAUI;
> +	else if (test_bit(PHY_INTERFACE_MODE_SGMII, interfaces) &&
> +		 test_bit(PHY_INTERFACE_MODE_XAUI, interfaces))
> +		return MV_V2_3310_PORT_CTRL_MACTYPE_XAUI;
> +	else if (test_bit(PHY_INTERFACE_MODE_10GBASER, interfaces))
> +		return MV_V2_33X0_PORT_CTRL_MACTYPE_10GBASER_RATE_MATCH;
> +	else if (test_bit(PHY_INTERFACE_MODE_RXAUI, interfaces))
> +		return MV_V2_33X0_PORT_CTRL_MACTYPE_RXAUI_RATE_MATCH;
> +	else if (test_bit(PHY_INTERFACE_MODE_XAUI, interfaces))
> +		return MV_V2_3310_PORT_CTRL_MACTYPE_XAUI_RATE_MATCH;
> +	else if (test_bit(PHY_INTERFACE_MODE_SGMII, interfaces))
> +		return MV_V2_33X0_PORT_CTRL_MACTYPE_10GBASER;
> +	else
> +		return -1;
> +}
> +

I would like to understand this heuristic better. Both its purpose and
its implementation.

It says:
(a) If the intersection between interface modes supported by the MAC and
    the PHY contains USXGMII, then use USXGMII as a MACTYPE
(b) Otherwise, if the intersection contains both 10GBaseR and SGMII, then
    use 10GBaseR as MACTYPE
(...)
(c) Otherwise, if the intersection contains just 10GBaseR (no SGMII), then
    use 10GBaseR with rate matching as MACTYPE
(...)
(d) Otherwise, if the intersection contains just SGMII (no 10GBaseR), then
    use 10GBaseR as MACTYPE (no rate matching).

First of all, what is MACTYPE exactly? And what is the purpose of
changing it? What would happen if this configuration remained fixed, as
it were?

I see there is no MACTYPE definition for SGMII. Why is that? How does
the PHY choose to use SGMII?

Why is item (d) correct - use 10GBaseR as MACTYPE if the intersection
only supports SGMII?

A breakdown per link speed might be helpful in understanding the
configurations being performed here.

>  static int mv2110_init_interface(struct phy_device *phydev, int mactype)
>  {
>  	struct mv3310_priv *priv = dev_get_drvdata(&phydev->mdio.dev);
> @@ -674,10 +764,16 @@ static int mv3310_config_init(struct phy_device *phydev)
>  {
>  	struct mv3310_priv *priv = dev_get_drvdata(&phydev->mdio.dev);
>  	const struct mv3310_chip *chip = to_mv3310_chip(phydev);
> +	DECLARE_PHY_INTERFACE_MASK(interfaces);
>  	int err, mactype;
>  
> -	/* Check that the PHY interface type is compatible */
> -	if (!test_bit(phydev->interface, priv->supported_interfaces))
> +	/* In case host didn't provide supported interfaces */
> +	__set_bit(phydev->interface, phydev->host_interfaces);

Shouldn't phylib populate phydev->host_interfaces with
phydev->interface, rather than requiring PHY drivers to do it?

> +
> +	/* Check that there is at least one compatible PHY interface type */
> +	phy_interface_and(interfaces, phydev->host_interfaces,
> +			  priv->supported_interfaces);
> +	if (phy_interface_empty(interfaces))
>  		return -ENODEV;
>  
>  	phydev->mdix_ctrl = ETH_TP_MDI_AUTO;
> @@ -687,9 +783,15 @@ static int mv3310_config_init(struct phy_device *phydev)
>  	if (err)
>  		return err;
>  
> -	mactype = chip->get_mactype(phydev);
> -	if (mactype < 0)
> -		return mactype;
> +	mactype = chip->select_mactype(interfaces);
> +	if (mactype < 0) {
> +		mactype = chip->get_mactype(phydev);
> +	} else {
> +		phydev_info(phydev, "Changing MACTYPE to %i\n", mactype);
> +		err = chip->set_mactype(phydev, mactype);
> +		if (err)
> +			return err;
> +	}
>  
>  	err = chip->init_interface(phydev, mactype);
>  	if (err) {
> @@ -1049,6 +1151,8 @@ static const struct mv3310_chip mv3310_type = {
>  	.has_downshift = mv3310_has_downshift,
>  	.init_supported_interfaces = mv3310_init_supported_interfaces,
>  	.get_mactype = mv3310_get_mactype,
> +	.set_mactype = mv3310_set_mactype,
> +	.select_mactype = mv3310_select_mactype,
>  	.init_interface = mv3310_init_interface,
>  
>  #ifdef CONFIG_HWMON
> @@ -1060,6 +1164,8 @@ static const struct mv3310_chip mv3340_type = {
>  	.has_downshift = mv3310_has_downshift,
>  	.init_supported_interfaces = mv3340_init_supported_interfaces,
>  	.get_mactype = mv3310_get_mactype,
> +	.set_mactype = mv3310_set_mactype,
> +	.select_mactype = mv3310_select_mactype,
>  	.init_interface = mv3340_init_interface,
>  
>  #ifdef CONFIG_HWMON
> @@ -1070,6 +1176,8 @@ static const struct mv3310_chip mv3340_type = {
>  static const struct mv3310_chip mv2110_type = {
>  	.init_supported_interfaces = mv2110_init_supported_interfaces,
>  	.get_mactype = mv2110_get_mactype,
> +	.set_mactype = mv2110_set_mactype,
> +	.select_mactype = mv2110_select_mactype,
>  	.init_interface = mv2110_init_interface,
>  
>  #ifdef CONFIG_HWMON
> @@ -1080,6 +1188,8 @@ static const struct mv3310_chip mv2110_type = {
>  static const struct mv3310_chip mv2111_type = {
>  	.init_supported_interfaces = mv2111_init_supported_interfaces,
>  	.get_mactype = mv2110_get_mactype,
> +	.set_mactype = mv2110_set_mactype,
> +	.select_mactype = mv2110_select_mactype,
>  	.init_interface = mv2110_init_interface,
>  
>  #ifdef CONFIG_HWMON
> -- 
> 2.32.0
>
Russell King (Oracle) Nov. 18, 2021, 1:22 p.m. UTC | #7
On Thu, Nov 18, 2021 at 02:03:34PM +0200, Vladimir Oltean wrote:
> On Wed, Nov 17, 2021 at 11:50:50PM +0100, Marek Behún wrote:
> > +static int mv3310_select_mactype(unsigned long *interfaces)
> > +{
> > +	if (test_bit(PHY_INTERFACE_MODE_USXGMII, interfaces))
> > +		return MV_V2_33X0_PORT_CTRL_MACTYPE_USXGMII;
> > +	else if (test_bit(PHY_INTERFACE_MODE_SGMII, interfaces) &&
> > +		 test_bit(PHY_INTERFACE_MODE_10GBASER, interfaces))
> > +		return MV_V2_33X0_PORT_CTRL_MACTYPE_10GBASER;
> > +	else if (test_bit(PHY_INTERFACE_MODE_SGMII, interfaces) &&
> > +		 test_bit(PHY_INTERFACE_MODE_RXAUI, interfaces))
> > +		return MV_V2_33X0_PORT_CTRL_MACTYPE_RXAUI;
> > +	else if (test_bit(PHY_INTERFACE_MODE_SGMII, interfaces) &&
> > +		 test_bit(PHY_INTERFACE_MODE_XAUI, interfaces))
> > +		return MV_V2_3310_PORT_CTRL_MACTYPE_XAUI;
> > +	else if (test_bit(PHY_INTERFACE_MODE_10GBASER, interfaces))
> > +		return MV_V2_33X0_PORT_CTRL_MACTYPE_10GBASER_RATE_MATCH;
> > +	else if (test_bit(PHY_INTERFACE_MODE_RXAUI, interfaces))
> > +		return MV_V2_33X0_PORT_CTRL_MACTYPE_RXAUI_RATE_MATCH;
> > +	else if (test_bit(PHY_INTERFACE_MODE_XAUI, interfaces))
> > +		return MV_V2_3310_PORT_CTRL_MACTYPE_XAUI_RATE_MATCH;
> > +	else if (test_bit(PHY_INTERFACE_MODE_SGMII, interfaces))
> > +		return MV_V2_33X0_PORT_CTRL_MACTYPE_10GBASER;
> > +	else
> > +		return -1;
> > +}
> > +
> 
> I would like to understand this heuristic better. Both its purpose and
> its implementation.
> 
> It says:
> (a) If the intersection between interface modes supported by the MAC and
>     the PHY contains USXGMII, then use USXGMII as a MACTYPE
> (b) Otherwise, if the intersection contains both 10GBaseR and SGMII, then
>     use 10GBaseR as MACTYPE
> (...)
> (c) Otherwise, if the intersection contains just 10GBaseR (no SGMII), then
>     use 10GBaseR with rate matching as MACTYPE
> (...)
> (d) Otherwise, if the intersection contains just SGMII (no 10GBaseR), then
>     use 10GBaseR as MACTYPE (no rate matching).

What is likely confusing you is a misinterpretation of the constant.
MV_V2_33X0_PORT_CTRL_MACTYPE_10GBASER actually means the PHY will
choose between 10GBASE-R, 5GBASE-R, 2500BASE-X, and SGMII depending
on the speed negotiated by the media. In this setting, the PHY
dictates which interface mode will be used.

I could have named "MV_V2_33X0_PORT_CTRL_MACTYPE_10GBASER" as
"MV_V2_33X0_PORT_CTRL_MACTYPE_10GBASER_5GBASER_2500BASEX_SGMII_AUTONEG_ON".
Similar with "MV_V2_33X0_PORT_CTRL_MACTYPE_10GBASER_NO_SGMII_AN", which
would be
"MV_V2_33X0_PORT_CTRL_MACTYPE_10GBASER_5GBASER_2500BASEX_SGMII_AUTONEG_OFF".
And "MV_V2_3310_PORT_CTRL_MACTYPE_XAUI" would be
"MV_V2_3310_PORT_CTRL_MACTYPE_XAUI_5GBASER_2500BASEX_SGMII_AUTONEG_ON".

Clearly using such long identifiers would have been rediculous,
especially the second one at 74 characters.

> First of all, what is MACTYPE exactly? And what is the purpose of
> changing it? What would happen if this configuration remained fixed, as
> it were?

The PHY defines the MAC interface mode depending on the MACTYPE
setting selected and the results of the media side negotiation.

I think the above answers your remaining questions.
Marek Behún Nov. 18, 2021, 1:46 p.m. UTC | #8
On Thu, 18 Nov 2021 09:41:27 +0000
"Russell King (Oracle)" <linux@armlinux.org.uk> wrote:

> On Wed, Nov 17, 2021 at 11:50:50PM +0100, Marek Behún wrote:
> > +static int mv3310_select_mactype(unsigned long *interfaces)
> > +{
> > +	if (test_bit(PHY_INTERFACE_MODE_USXGMII, interfaces))
> > +		return MV_V2_33X0_PORT_CTRL_MACTYPE_USXGMII;
> > +	else if (test_bit(PHY_INTERFACE_MODE_SGMII, interfaces) &&
> > +		 test_bit(PHY_INTERFACE_MODE_10GBASER, interfaces))
> > +		return MV_V2_33X0_PORT_CTRL_MACTYPE_10GBASER;
> > +	else if (test_bit(PHY_INTERFACE_MODE_SGMII, interfaces) &&
> > +		 test_bit(PHY_INTERFACE_MODE_RXAUI, interfaces))
> > +		return MV_V2_33X0_PORT_CTRL_MACTYPE_RXAUI;
> > +	else if (test_bit(PHY_INTERFACE_MODE_SGMII, interfaces) &&
> > +		 test_bit(PHY_INTERFACE_MODE_XAUI, interfaces))
> > +		return MV_V2_3310_PORT_CTRL_MACTYPE_XAUI;
> > +	else if (test_bit(PHY_INTERFACE_MODE_10GBASER, interfaces))
> > +		return MV_V2_33X0_PORT_CTRL_MACTYPE_10GBASER_RATE_MATCH;
> > +	else if (test_bit(PHY_INTERFACE_MODE_RXAUI, interfaces))
> > +		return MV_V2_33X0_PORT_CTRL_MACTYPE_RXAUI_RATE_MATCH;
> > +	else if (test_bit(PHY_INTERFACE_MODE_XAUI, interfaces))
> > +		return MV_V2_3310_PORT_CTRL_MACTYPE_XAUI_RATE_MATCH;
> > +	else if (test_bit(PHY_INTERFACE_MODE_SGMII, interfaces))
> > +		return MV_V2_33X0_PORT_CTRL_MACTYPE_10GBASER;  
> 
> Hi,
> 
> There are differences in the MACTYPE register between the 88X3310
> and 88X3340. For example, the 88X3340 has no support for XAUI.
> This is documented in the data sheet, and in the definitions of
> these values - note that MV_V2_3310_PORT_CTRL_MACTYPE_XAUI and
> MV_V2_3310_PORT_CTRL_MACTYPE_XAUI_RATE_MATCH are only applicable
> to the 88X3310 (they don't use MV_V2_33X0_*).

Yes, but 88X3340 does not support XAUI, only RXAUI, it is defined in
supported_interfaces. So PHY_INTERFACE_MODE_XAUI will never be set in
interfaces, and thus this function can be used for both 88X3310 and
88X3340.

Marek
Marek Behún Nov. 18, 2021, 1:56 p.m. UTC | #9
On Thu, 18 Nov 2021 14:03:34 +0200
Vladimir Oltean <olteanv@gmail.com> wrote:

> Hello,
> 
> > +static int mv3310_select_mactype(unsigned long *interfaces)
> > +{
> > +	if (test_bit(PHY_INTERFACE_MODE_USXGMII, interfaces))
> > +		return MV_V2_33X0_PORT_CTRL_MACTYPE_USXGMII;
> > +	else if (test_bit(PHY_INTERFACE_MODE_SGMII, interfaces) &&
> > +		 test_bit(PHY_INTERFACE_MODE_10GBASER, interfaces))
> > +		return MV_V2_33X0_PORT_CTRL_MACTYPE_10GBASER;
> > +	else if (test_bit(PHY_INTERFACE_MODE_SGMII, interfaces) &&
> > +		 test_bit(PHY_INTERFACE_MODE_RXAUI, interfaces))
> > +		return MV_V2_33X0_PORT_CTRL_MACTYPE_RXAUI;
> > +	else if (test_bit(PHY_INTERFACE_MODE_SGMII, interfaces) &&
> > +		 test_bit(PHY_INTERFACE_MODE_XAUI, interfaces))
> > +		return MV_V2_3310_PORT_CTRL_MACTYPE_XAUI;
> > +	else if (test_bit(PHY_INTERFACE_MODE_10GBASER, interfaces))
> > +		return MV_V2_33X0_PORT_CTRL_MACTYPE_10GBASER_RATE_MATCH;
> > +	else if (test_bit(PHY_INTERFACE_MODE_RXAUI, interfaces))
> > +		return MV_V2_33X0_PORT_CTRL_MACTYPE_RXAUI_RATE_MATCH;
> > +	else if (test_bit(PHY_INTERFACE_MODE_XAUI, interfaces))
> > +		return MV_V2_3310_PORT_CTRL_MACTYPE_XAUI_RATE_MATCH;
> > +	else if (test_bit(PHY_INTERFACE_MODE_SGMII, interfaces))
> > +		return MV_V2_33X0_PORT_CTRL_MACTYPE_10GBASER;
> > +	else
> > +		return -1;
> > +}
> > +  
> 
> I would like to understand this heuristic better. Both its purpose and
> its implementation.
> 
> It says:
> (a) If the intersection between interface modes supported by the MAC and
>     the PHY contains USXGMII, then use USXGMII as a MACTYPE
> (b) Otherwise, if the intersection contains both 10GBaseR and SGMII, then
>     use 10GBaseR as MACTYPE
> (...)
> (c) Otherwise, if the intersection contains just 10GBaseR (no SGMII), then
>     use 10GBaseR with rate matching as MACTYPE
> (...)
> (d) Otherwise, if the intersection contains just SGMII (no 10GBaseR), then
>     use 10GBaseR as MACTYPE (no rate matching).
> 
> First of all, what is MACTYPE exactly? And what is the purpose of
> changing it? What would happen if this configuration remained fixed, as
> it were?
> 
> I see there is no MACTYPE definition for SGMII. Why is that? How does
> the PHY choose to use SGMII?
> 
> Why is item (d) correct - use 10GBaseR as MACTYPE if the intersection
> only supports SGMII?
> 
> A breakdown per link speed might be helpful in understanding the
> configurations being performed here.

Russell already explained this. I will put a better explanation into
the commit message in v2.

> >  static int mv2110_init_interface(struct phy_device *phydev, int mactype)
> >  {
> >  	struct mv3310_priv *priv = dev_get_drvdata(&phydev->mdio.dev);
> > @@ -674,10 +764,16 @@ static int mv3310_config_init(struct phy_device *phydev)
> >  {
> >  	struct mv3310_priv *priv = dev_get_drvdata(&phydev->mdio.dev);
> >  	const struct mv3310_chip *chip = to_mv3310_chip(phydev);
> > +	DECLARE_PHY_INTERFACE_MASK(interfaces);
> >  	int err, mactype;
> >  
> > -	/* Check that the PHY interface type is compatible */
> > -	if (!test_bit(phydev->interface, priv->supported_interfaces))
> > +	/* In case host didn't provide supported interfaces */
> > +	__set_bit(phydev->interface, phydev->host_interfaces);  
> 
> Shouldn't phylib populate phydev->host_interfaces with
> phydev->interface, rather than requiring PHY drivers to do it?

I will look into this.

Marek
Vladimir Oltean Nov. 18, 2021, 2:20 p.m. UTC | #10
On Thu, Nov 18, 2021 at 01:22:18PM +0000, Russell King (Oracle) wrote:
> On Thu, Nov 18, 2021 at 02:03:34PM +0200, Vladimir Oltean wrote:
> > On Wed, Nov 17, 2021 at 11:50:50PM +0100, Marek Behún wrote:
> > > +static int mv3310_select_mactype(unsigned long *interfaces)
> > > +{
> > > +	if (test_bit(PHY_INTERFACE_MODE_USXGMII, interfaces))
> > > +		return MV_V2_33X0_PORT_CTRL_MACTYPE_USXGMII;
> > > +	else if (test_bit(PHY_INTERFACE_MODE_SGMII, interfaces) &&
> > > +		 test_bit(PHY_INTERFACE_MODE_10GBASER, interfaces))
> > > +		return MV_V2_33X0_PORT_CTRL_MACTYPE_10GBASER;
> > > +	else if (test_bit(PHY_INTERFACE_MODE_SGMII, interfaces) &&
> > > +		 test_bit(PHY_INTERFACE_MODE_RXAUI, interfaces))
> > > +		return MV_V2_33X0_PORT_CTRL_MACTYPE_RXAUI;
> > > +	else if (test_bit(PHY_INTERFACE_MODE_SGMII, interfaces) &&
> > > +		 test_bit(PHY_INTERFACE_MODE_XAUI, interfaces))
> > > +		return MV_V2_3310_PORT_CTRL_MACTYPE_XAUI;
> > > +	else if (test_bit(PHY_INTERFACE_MODE_10GBASER, interfaces))
> > > +		return MV_V2_33X0_PORT_CTRL_MACTYPE_10GBASER_RATE_MATCH;
> > > +	else if (test_bit(PHY_INTERFACE_MODE_RXAUI, interfaces))
> > > +		return MV_V2_33X0_PORT_CTRL_MACTYPE_RXAUI_RATE_MATCH;
> > > +	else if (test_bit(PHY_INTERFACE_MODE_XAUI, interfaces))
> > > +		return MV_V2_3310_PORT_CTRL_MACTYPE_XAUI_RATE_MATCH;
> > > +	else if (test_bit(PHY_INTERFACE_MODE_SGMII, interfaces))
> > > +		return MV_V2_33X0_PORT_CTRL_MACTYPE_10GBASER;
> > > +	else
> > > +		return -1;
> > > +}
> > > +
> > 
> > I would like to understand this heuristic better. Both its purpose and
> > its implementation.
> > 
> > It says:
> > (a) If the intersection between interface modes supported by the MAC and
> >     the PHY contains USXGMII, then use USXGMII as a MACTYPE
> > (b) Otherwise, if the intersection contains both 10GBaseR and SGMII, then
> >     use 10GBaseR as MACTYPE
> > (...)
> > (c) Otherwise, if the intersection contains just 10GBaseR (no SGMII), then
> >     use 10GBaseR with rate matching as MACTYPE
> > (...)
> > (d) Otherwise, if the intersection contains just SGMII (no 10GBaseR), then
> >     use 10GBaseR as MACTYPE (no rate matching).
> 
> What is likely confusing you is a misinterpretation of the constant.
> MV_V2_33X0_PORT_CTRL_MACTYPE_10GBASER actually means the PHY will
> choose between 10GBASE-R, 5GBASE-R, 2500BASE-X, and SGMII depending
> on the speed negotiated by the media. In this setting, the PHY
> dictates which interface mode will be used.
> 
> I could have named "MV_V2_33X0_PORT_CTRL_MACTYPE_10GBASER" as
> "MV_V2_33X0_PORT_CTRL_MACTYPE_10GBASER_5GBASER_2500BASEX_SGMII_AUTONEG_ON".
> Similar with "MV_V2_33X0_PORT_CTRL_MACTYPE_10GBASER_NO_SGMII_AN", which
> would be
> "MV_V2_33X0_PORT_CTRL_MACTYPE_10GBASER_5GBASER_2500BASEX_SGMII_AUTONEG_OFF".
> And "MV_V2_3310_PORT_CTRL_MACTYPE_XAUI" would be
> "MV_V2_3310_PORT_CTRL_MACTYPE_XAUI_5GBASER_2500BASEX_SGMII_AUTONEG_ON".
> 
> Clearly using such long identifiers would have been rediculous,
> especially the second one at 74 characters.

True, but at least there could be a comment above each definition.
There's no size limit to that.

> > First of all, what is MACTYPE exactly? And what is the purpose of
> > changing it? What would happen if this configuration remained fixed, as
> > it were?
> 
> The PHY defines the MAC interface mode depending on the MACTYPE
> setting selected and the results of the media side negotiation.
> 
> I think the above answers your remaining questions.

Ok, so going back to case (d). You said that the full name would be
MV_V2_33X0_PORT_CTRL_MACTYPE_10GBASER_5GBASER_2500BASEX_SGMII_AUTONEG_ON.
This means that when the only interface mode supported by the host would
be SGMII, the PHY's MACTYPE is still configured to use 2500basex,
5gbaser, 10gbaser for the higher link speeds. Clearly this won't work.
But on the other hand, the phylink validate method will remove
2500baseT, 5000baseT, 1000baseT from the advertising mask of the PHY, so
the system will never end up operating at those speeds, so it should be fine.



The reason why I'm looking at these patches is to see whether they would
bring something useful to Aquantia PHYs. These come with firmware on a
flash that is customized by Aquantia themselves based on the specifications
of a single board. These PHYs have an ability which is very similar to
what I'm seeing here, which is to select, for each negotiated link speed
on the media side, the SERDES protocol to use on the system side. This
is pre-programmed by the firmware, but could be fixed up by the
operating system if done carefully.

The way Layerscape boards use Aquantia PHYs is to always select the
"rate matching" option and keep the SERDES protocol fixed, just
configure the mini MAC inside the PHY to emit PAUSE frames towards the
system to keep the data rate under control. We would be using these PHYs
with the generic C45 driver, which would be mostly enough except for
lack of PHY interrupts, because the firmware already configures
everything.

But on the other hand it gets a bit tiring, especially for PHYs on riser
cards, to have to change firmware in order to test a different SERDES
protocol, so we were experimenting with some changes in the PHY driver
that would basically keep the firmware image fixed, and just fix up the
configuration it made, and do things like "use 2500base-x for the
2500base-T speed, and sgmii for 1000base-T/100base-TX". The ability for
a PHY to work on a board where its firmware image wasn't specifically
designed for it comes in handy sometimes.

I was reluctant to submit any changes to the Aquantia PHY driver because
we don't really have any guarantees that we wouldn't break any system
that uses it. I do know that there are users who do expect SERDES
protocol changes currently, because I do see that aqr107_read_status()
checks for a possibly modified phydev->interface. But without the
ability of knowing what SERDES protocols does the system-side support,
it is pretty difficult to modify the SERDES protocol used for a certain
link speed without breaking something.

I see that this patch set basically introduces the phydev->host_interfaces
bitmap which is an attempt to find the answer to that question. But when
will we know enough about phydev->host_interfaces in order to safely
make decisions in the PHY driver based on it? phylink sets it, phylib does not.
And many Aquantia systems use the generic PHY driver, as mentioned.
Additionally, there are old device trees at play here, which only define
the initial SERDES protocol. Would we be changing the behavior for those,
in that we would be configuring the PHY to keep the SERDES protocol
fixed whereas it would have dynamically changed before?

Another question is what to do if there are multiple ways of
establishing a system-side link. For example 1000 Mbps can be achieved
either through SGMII, or USXGMII with symbol replication, or 2500base-x
with flow control, or 10GBaseR with flow control. And I want to test
them all. What would I need to do to change the SERDES protocol from one
value to the other? Changing the phy-mode array in the device tree would
be one option, but that may not always be possible.
Russell King (Oracle) Nov. 18, 2021, 2:25 p.m. UTC | #11
On Thu, Nov 18, 2021 at 02:46:28PM +0100, Marek Behún wrote:
> On Thu, 18 Nov 2021 09:41:27 +0000
> "Russell King (Oracle)" <linux@armlinux.org.uk> wrote:
> 
> > On Wed, Nov 17, 2021 at 11:50:50PM +0100, Marek Behún wrote:
> > > +static int mv3310_select_mactype(unsigned long *interfaces)
> > > +{
> > > +	if (test_bit(PHY_INTERFACE_MODE_USXGMII, interfaces))
> > > +		return MV_V2_33X0_PORT_CTRL_MACTYPE_USXGMII;
> > > +	else if (test_bit(PHY_INTERFACE_MODE_SGMII, interfaces) &&
> > > +		 test_bit(PHY_INTERFACE_MODE_10GBASER, interfaces))
> > > +		return MV_V2_33X0_PORT_CTRL_MACTYPE_10GBASER;
> > > +	else if (test_bit(PHY_INTERFACE_MODE_SGMII, interfaces) &&
> > > +		 test_bit(PHY_INTERFACE_MODE_RXAUI, interfaces))
> > > +		return MV_V2_33X0_PORT_CTRL_MACTYPE_RXAUI;
> > > +	else if (test_bit(PHY_INTERFACE_MODE_SGMII, interfaces) &&
> > > +		 test_bit(PHY_INTERFACE_MODE_XAUI, interfaces))
> > > +		return MV_V2_3310_PORT_CTRL_MACTYPE_XAUI;
> > > +	else if (test_bit(PHY_INTERFACE_MODE_10GBASER, interfaces))
> > > +		return MV_V2_33X0_PORT_CTRL_MACTYPE_10GBASER_RATE_MATCH;
> > > +	else if (test_bit(PHY_INTERFACE_MODE_RXAUI, interfaces))
> > > +		return MV_V2_33X0_PORT_CTRL_MACTYPE_RXAUI_RATE_MATCH;
> > > +	else if (test_bit(PHY_INTERFACE_MODE_XAUI, interfaces))
> > > +		return MV_V2_3310_PORT_CTRL_MACTYPE_XAUI_RATE_MATCH;
> > > +	else if (test_bit(PHY_INTERFACE_MODE_SGMII, interfaces))
> > > +		return MV_V2_33X0_PORT_CTRL_MACTYPE_10GBASER;  
> > 
> > Hi,
> > 
> > There are differences in the MACTYPE register between the 88X3310
> > and 88X3340. For example, the 88X3340 has no support for XAUI.
> > This is documented in the data sheet, and in the definitions of
> > these values - note that MV_V2_3310_PORT_CTRL_MACTYPE_XAUI and
> > MV_V2_3310_PORT_CTRL_MACTYPE_XAUI_RATE_MATCH are only applicable
> > to the 88X3310 (they don't use MV_V2_33X0_*).
> 
> Yes, but 88X3340 does not support XAUI, only RXAUI, it is defined in
> supported_interfaces. So PHY_INTERFACE_MODE_XAUI will never be set in
> interfaces, and thus this function can be used for both 88X3310 and
> 88X3340.

Okay, that's fine then. The only setting we will omit from this for
the 88X3340 then is MV_V2_3340_PORT_CTRL_MACTYPE_RXAUI_NO_SGMII_AN.
Russell King (Oracle) Nov. 18, 2021, 2:47 p.m. UTC | #12
On Thu, Nov 18, 2021 at 04:20:39PM +0200, Vladimir Oltean wrote:
> On Thu, Nov 18, 2021 at 01:22:18PM +0000, Russell King (Oracle) wrote:
> > On Thu, Nov 18, 2021 at 02:03:34PM +0200, Vladimir Oltean wrote:
> > > On Wed, Nov 17, 2021 at 11:50:50PM +0100, Marek Behún wrote:
> > > > +static int mv3310_select_mactype(unsigned long *interfaces)
> > > > +{
> > > > +	if (test_bit(PHY_INTERFACE_MODE_USXGMII, interfaces))
> > > > +		return MV_V2_33X0_PORT_CTRL_MACTYPE_USXGMII;
> > > > +	else if (test_bit(PHY_INTERFACE_MODE_SGMII, interfaces) &&
> > > > +		 test_bit(PHY_INTERFACE_MODE_10GBASER, interfaces))
> > > > +		return MV_V2_33X0_PORT_CTRL_MACTYPE_10GBASER;
> > > > +	else if (test_bit(PHY_INTERFACE_MODE_SGMII, interfaces) &&
> > > > +		 test_bit(PHY_INTERFACE_MODE_RXAUI, interfaces))
> > > > +		return MV_V2_33X0_PORT_CTRL_MACTYPE_RXAUI;
> > > > +	else if (test_bit(PHY_INTERFACE_MODE_SGMII, interfaces) &&
> > > > +		 test_bit(PHY_INTERFACE_MODE_XAUI, interfaces))
> > > > +		return MV_V2_3310_PORT_CTRL_MACTYPE_XAUI;
> > > > +	else if (test_bit(PHY_INTERFACE_MODE_10GBASER, interfaces))
> > > > +		return MV_V2_33X0_PORT_CTRL_MACTYPE_10GBASER_RATE_MATCH;
> > > > +	else if (test_bit(PHY_INTERFACE_MODE_RXAUI, interfaces))
> > > > +		return MV_V2_33X0_PORT_CTRL_MACTYPE_RXAUI_RATE_MATCH;
> > > > +	else if (test_bit(PHY_INTERFACE_MODE_XAUI, interfaces))
> > > > +		return MV_V2_3310_PORT_CTRL_MACTYPE_XAUI_RATE_MATCH;
> > > > +	else if (test_bit(PHY_INTERFACE_MODE_SGMII, interfaces))
> > > > +		return MV_V2_33X0_PORT_CTRL_MACTYPE_10GBASER;
> > > > +	else
> > > > +		return -1;
> > > > +}
> > > > +
> > > 
> > > I would like to understand this heuristic better. Both its purpose and
> > > its implementation.
> > > 
> > > It says:
> > > (a) If the intersection between interface modes supported by the MAC and
> > >     the PHY contains USXGMII, then use USXGMII as a MACTYPE
> > > (b) Otherwise, if the intersection contains both 10GBaseR and SGMII, then
> > >     use 10GBaseR as MACTYPE
> > > (...)
> > > (c) Otherwise, if the intersection contains just 10GBaseR (no SGMII), then
> > >     use 10GBaseR with rate matching as MACTYPE
> > > (...)
> > > (d) Otherwise, if the intersection contains just SGMII (no 10GBaseR), then
> > >     use 10GBaseR as MACTYPE (no rate matching).
> > 
> > What is likely confusing you is a misinterpretation of the constant.
> > MV_V2_33X0_PORT_CTRL_MACTYPE_10GBASER actually means the PHY will
> > choose between 10GBASE-R, 5GBASE-R, 2500BASE-X, and SGMII depending
> > on the speed negotiated by the media. In this setting, the PHY
> > dictates which interface mode will be used.
> > 
> > I could have named "MV_V2_33X0_PORT_CTRL_MACTYPE_10GBASER" as
> > "MV_V2_33X0_PORT_CTRL_MACTYPE_10GBASER_5GBASER_2500BASEX_SGMII_AUTONEG_ON".
> > Similar with "MV_V2_33X0_PORT_CTRL_MACTYPE_10GBASER_NO_SGMII_AN", which
> > would be
> > "MV_V2_33X0_PORT_CTRL_MACTYPE_10GBASER_5GBASER_2500BASEX_SGMII_AUTONEG_OFF".
> > And "MV_V2_3310_PORT_CTRL_MACTYPE_XAUI" would be
> > "MV_V2_3310_PORT_CTRL_MACTYPE_XAUI_5GBASER_2500BASEX_SGMII_AUTONEG_ON".
> > 
> > Clearly using such long identifiers would have been rediculous,
> > especially the second one at 74 characters.
> 
> True, but at least there could be a comment above each definition.
> There's no size limit to that.
> 
> > > First of all, what is MACTYPE exactly? And what is the purpose of
> > > changing it? What would happen if this configuration remained fixed, as
> > > it were?
> > 
> > The PHY defines the MAC interface mode depending on the MACTYPE
> > setting selected and the results of the media side negotiation.
> > 
> > I think the above answers your remaining questions.
> 
> Ok, so going back to case (d). You said that the full name would be
> MV_V2_33X0_PORT_CTRL_MACTYPE_10GBASER_5GBASER_2500BASEX_SGMII_AUTONEG_ON.
> This means that when the only interface mode supported by the host would
> be SGMII, the PHY's MACTYPE is still configured to use 2500basex,
> 5gbaser, 10gbaser for the higher link speeds. Clearly this won't work.
> But on the other hand, the phylink validate method will remove
> 2500baseT, 5000baseT, 1000baseT from the advertising mask of the PHY, so
> the system will never end up operating at those speeds, so it should be fine.

I think you mean 10000baseT rather than 1000baseT. With that correction,
you are then correct - with the media restricted to 1G or slower speeds,
and the phy in MACTYPE mode 4 (aka 10GBASE-R as the fastest interface
mode) it will permanently be talking SGMII to the host.

> The reason why I'm looking at these patches is to see whether they would
> bring something useful to Aquantia PHYs. These come with firmware on a
> flash that is customized by Aquantia themselves based on the specifications
> of a single board. These PHYs have an ability which is very similar to
> what I'm seeing here, which is to select, for each negotiated link speed
> on the media side, the SERDES protocol to use on the system side. This
> is pre-programmed by the firmware, but could be fixed up by the
> operating system if done carefully.
> 
> The way Layerscape boards use Aquantia PHYs is to always select the
> "rate matching" option and keep the SERDES protocol fixed, just
> configure the mini MAC inside the PHY to emit PAUSE frames towards the
> system to keep the data rate under control. We would be using these PHYs
> with the generic C45 driver, which would be mostly enough except for
> lack of PHY interrupts, because the firmware already configures
> everything.
> 
> But on the other hand it gets a bit tiring, especially for PHYs on riser
> cards, to have to change firmware in order to test a different SERDES
> protocol, so we were experimenting with some changes in the PHY driver
> that would basically keep the firmware image fixed, and just fix up the
> configuration it made, and do things like "use 2500base-x for the
> 2500base-T speed, and sgmii for 1000base-T/100base-TX". The ability for
> a PHY to work on a board where its firmware image wasn't specifically
> designed for it comes in handy sometimes.

You're going to get into problems with this on Layerscape, because
reconfiguring the Serdes etc is something I've tried to highlight
as being necessary to NXP since SolidRun started using LX2160A. I
think there's some slow progress towards that, but it's so slow that
I've basically given up caring about it on the Honeycomb/Clearfog CX
boards now.

All the SFP cages on my Honeycomb have been configured for the most
useful mode to me - 1000BASE-X/SGMII, and I've given up caring about
USXGMII/10GBASE-R on those ports.

> I see that this patch set basically introduces the phydev->host_interfaces
> bitmap which is an attempt to find the answer to that question. But when
> will we know enough about phydev->host_interfaces in order to safely
> make decisions in the PHY driver based on it? phylink sets it, phylib
> does not.

It won't be something phylib could set because phylib doesn't know
the capabilities of its user - it's information that would need to be
provided to phylib.

> And many Aquantia systems use the generic PHY driver, as mentioned.
> Additionally, there are old device trees at play here, which only define
> the initial SERDES protocol. Would we be changing the behavior for those,
> in that we would be configuring the PHY to keep the SERDES protocol
> fixed whereas it would have dynamically changed before?

We have the same situation on Macchiatobin. The 88X3310 there defaults
to MACTYPE mode 4, and we've supported this for years with DT describing
the interface as 10GBASE-R - because we haven't actually cared very much
what DT says up to this point for the 88X3310. As I said in my previous
reply, the 88X3310 effectively dictates what the PHY interface mode will
be, and that is communicated back through phylib to whoever is using
phylib.

> Another question is what to do if there are multiple ways of
> establishing a system-side link. For example 1000 Mbps can be achieved
> either through SGMII, or USXGMII with symbol replication, or 2500base-x
> with flow control, or 10GBaseR with flow control. And I want to test
> them all. What would I need to do to change the SERDES protocol from one
> value to the other? Changing the phy-mode array in the device tree would
> be one option, but that may not always be possible.

First point to make here is that rate adaption at the PHY is really
not well supported in Linux, and there is no way to know via phylib if
a PHY is capable or not of rate adaption.

Today, if you have a 10GBASE-R link between a PHY doing rate adaption
and the "MAC", then what you will get from phylib is:

	phydev->interface = PHY_INTERFACE_MODE_10GBASER;
	phydev->speed = SPEED_1000;	// result of media negotiation
	phydev->duplex = DUPLEX_FULL;	// result of media negotiation
	phydev->pause = ...;		// result of media negotiation
	phydev->asym_pause = ...;	// result of media negotiation

which will, for the majority of implementations, result in the MAC being
forced to a 1G speed, possibly with or without pause enabled.

Due to this, if phylink is being used, the parameters given to
mac_link_up/pcs_link_up will be the result of the media negotiation, not
what is required on the actual link.

You mention "10GBaseR with flow control" but there is another
possibility that exists in real hardware out there. "10GBaseR without
flow control" and in that case, the MAC needs to pace its transmission
for the media speed (which is a good reason why mac_link_up should be
given the result of the media negotiation so it can do transmission
pacing.)

I have a follow-up to the response I gave to Sean Anderson on rate-
adapting PHYs that I need to finish and send, and it would be better
to have any discussion on this topic after I've sent that reply and
follow-up to that reply.
Vladimir Oltean Nov. 18, 2021, 3:09 p.m. UTC | #13
On Thu, Nov 18, 2021 at 02:47:44PM +0000, Russell King (Oracle) wrote:
> On Thu, Nov 18, 2021 at 04:20:39PM +0200, Vladimir Oltean wrote:
> > On Thu, Nov 18, 2021 at 01:22:18PM +0000, Russell King (Oracle) wrote:
> > > On Thu, Nov 18, 2021 at 02:03:34PM +0200, Vladimir Oltean wrote:
> > > > On Wed, Nov 17, 2021 at 11:50:50PM +0100, Marek Behún wrote:
> > > > > +static int mv3310_select_mactype(unsigned long *interfaces)
> > > > > +{
> > > > > +	if (test_bit(PHY_INTERFACE_MODE_USXGMII, interfaces))
> > > > > +		return MV_V2_33X0_PORT_CTRL_MACTYPE_USXGMII;
> > > > > +	else if (test_bit(PHY_INTERFACE_MODE_SGMII, interfaces) &&
> > > > > +		 test_bit(PHY_INTERFACE_MODE_10GBASER, interfaces))
> > > > > +		return MV_V2_33X0_PORT_CTRL_MACTYPE_10GBASER;
> > > > > +	else if (test_bit(PHY_INTERFACE_MODE_SGMII, interfaces) &&
> > > > > +		 test_bit(PHY_INTERFACE_MODE_RXAUI, interfaces))
> > > > > +		return MV_V2_33X0_PORT_CTRL_MACTYPE_RXAUI;
> > > > > +	else if (test_bit(PHY_INTERFACE_MODE_SGMII, interfaces) &&
> > > > > +		 test_bit(PHY_INTERFACE_MODE_XAUI, interfaces))
> > > > > +		return MV_V2_3310_PORT_CTRL_MACTYPE_XAUI;
> > > > > +	else if (test_bit(PHY_INTERFACE_MODE_10GBASER, interfaces))
> > > > > +		return MV_V2_33X0_PORT_CTRL_MACTYPE_10GBASER_RATE_MATCH;
> > > > > +	else if (test_bit(PHY_INTERFACE_MODE_RXAUI, interfaces))
> > > > > +		return MV_V2_33X0_PORT_CTRL_MACTYPE_RXAUI_RATE_MATCH;
> > > > > +	else if (test_bit(PHY_INTERFACE_MODE_XAUI, interfaces))
> > > > > +		return MV_V2_3310_PORT_CTRL_MACTYPE_XAUI_RATE_MATCH;
> > > > > +	else if (test_bit(PHY_INTERFACE_MODE_SGMII, interfaces))
> > > > > +		return MV_V2_33X0_PORT_CTRL_MACTYPE_10GBASER;
> > > > > +	else
> > > > > +		return -1;
> > > > > +}
> > > > > +
> > > > 
> > > > I would like to understand this heuristic better. Both its purpose and
> > > > its implementation.
> > > > 
> > > > It says:
> > > > (a) If the intersection between interface modes supported by the MAC and
> > > >     the PHY contains USXGMII, then use USXGMII as a MACTYPE
> > > > (b) Otherwise, if the intersection contains both 10GBaseR and SGMII, then
> > > >     use 10GBaseR as MACTYPE
> > > > (...)
> > > > (c) Otherwise, if the intersection contains just 10GBaseR (no SGMII), then
> > > >     use 10GBaseR with rate matching as MACTYPE
> > > > (...)
> > > > (d) Otherwise, if the intersection contains just SGMII (no 10GBaseR), then
> > > >     use 10GBaseR as MACTYPE (no rate matching).
> > > 
> > > What is likely confusing you is a misinterpretation of the constant.
> > > MV_V2_33X0_PORT_CTRL_MACTYPE_10GBASER actually means the PHY will
> > > choose between 10GBASE-R, 5GBASE-R, 2500BASE-X, and SGMII depending
> > > on the speed negotiated by the media. In this setting, the PHY
> > > dictates which interface mode will be used.
> > > 
> > > I could have named "MV_V2_33X0_PORT_CTRL_MACTYPE_10GBASER" as
> > > "MV_V2_33X0_PORT_CTRL_MACTYPE_10GBASER_5GBASER_2500BASEX_SGMII_AUTONEG_ON".
> > > Similar with "MV_V2_33X0_PORT_CTRL_MACTYPE_10GBASER_NO_SGMII_AN", which
> > > would be
> > > "MV_V2_33X0_PORT_CTRL_MACTYPE_10GBASER_5GBASER_2500BASEX_SGMII_AUTONEG_OFF".
> > > And "MV_V2_3310_PORT_CTRL_MACTYPE_XAUI" would be
> > > "MV_V2_3310_PORT_CTRL_MACTYPE_XAUI_5GBASER_2500BASEX_SGMII_AUTONEG_ON".
> > > 
> > > Clearly using such long identifiers would have been rediculous,
> > > especially the second one at 74 characters.
> > 
> > True, but at least there could be a comment above each definition.
> > There's no size limit to that.
> > 
> > > > First of all, what is MACTYPE exactly? And what is the purpose of
> > > > changing it? What would happen if this configuration remained fixed, as
> > > > it were?
> > > 
> > > The PHY defines the MAC interface mode depending on the MACTYPE
> > > setting selected and the results of the media side negotiation.
> > > 
> > > I think the above answers your remaining questions.
> > 
> > Ok, so going back to case (d). You said that the full name would be
> > MV_V2_33X0_PORT_CTRL_MACTYPE_10GBASER_5GBASER_2500BASEX_SGMII_AUTONEG_ON.
> > This means that when the only interface mode supported by the host would
> > be SGMII, the PHY's MACTYPE is still configured to use 2500basex,
> > 5gbaser, 10gbaser for the higher link speeds. Clearly this won't work.
> > But on the other hand, the phylink validate method will remove
> > 2500baseT, 5000baseT, 1000baseT from the advertising mask of the PHY, so
> > the system will never end up operating at those speeds, so it should be fine.
> 
> I think you mean 10000baseT rather than 1000baseT. With that correction,
> you are then correct - with the media restricted to 1G or slower speeds,
> and the phy in MACTYPE mode 4 (aka 10GBASE-R as the fastest interface
> mode) it will permanently be talking SGMII to the host.

Yes, I missed a zero.

> > The reason why I'm looking at these patches is to see whether they would
> > bring something useful to Aquantia PHYs. These come with firmware on a
> > flash that is customized by Aquantia themselves based on the specifications
> > of a single board. These PHYs have an ability which is very similar to
> > what I'm seeing here, which is to select, for each negotiated link speed
> > on the media side, the SERDES protocol to use on the system side. This
> > is pre-programmed by the firmware, but could be fixed up by the
> > operating system if done carefully.
> > 
> > The way Layerscape boards use Aquantia PHYs is to always select the
> > "rate matching" option and keep the SERDES protocol fixed, just
> > configure the mini MAC inside the PHY to emit PAUSE frames towards the
> > system to keep the data rate under control. We would be using these PHYs
> > with the generic C45 driver, which would be mostly enough except for
> > lack of PHY interrupts, because the firmware already configures
> > everything.
> > 
> > But on the other hand it gets a bit tiring, especially for PHYs on riser
> > cards, to have to change firmware in order to test a different SERDES
> > protocol, so we were experimenting with some changes in the PHY driver
> > that would basically keep the firmware image fixed, and just fix up the
> > configuration it made, and do things like "use 2500base-x for the
> > 2500base-T speed, and sgmii for 1000base-T/100base-TX". The ability for
> > a PHY to work on a board where its firmware image wasn't specifically
> > designed for it comes in handy sometimes.
> 
> You're going to get into problems with this on Layerscape, because
> reconfiguring the Serdes etc is something I've tried to highlight
> as being necessary to NXP since SolidRun started using LX2160A. I
> think there's some slow progress towards that, but it's so slow that
> I've basically given up caring about it on the Honeycomb/Clearfog CX
> boards now.
> 
> All the SFP cages on my Honeycomb have been configured for the most
> useful mode to me - 1000BASE-X/SGMII, and I've given up caring about
> USXGMII/10GBASE-R on those ports.

Speaking of that, do you know of any SFP modules that would use USXGMII?
It doesn't appear to be listed in the spec sheet when looking for that.

> > I see that this patch set basically introduces the phydev->host_interfaces
> > bitmap which is an attempt to find the answer to that question. But when
> > will we know enough about phydev->host_interfaces in order to safely
> > make decisions in the PHY driver based on it? phylink sets it, phylib
> > does not.
> 
> It won't be something phylib could set because phylib doesn't know
> the capabilities of its user - it's information that would need to be
> provided to phylib.

So you're saying it would be in phylib's best interest to not set it at
all, not even to a single bit corresponding to phydev->interface. So PHY
drivers could work out this way whether they should operate in backwards
compatibility mode or they could change MACTYPE at will.

> > And many Aquantia systems use the generic PHY driver, as mentioned.
> > Additionally, there are old device trees at play here, which only define
> > the initial SERDES protocol. Would we be changing the behavior for those,
> > in that we would be configuring the PHY to keep the SERDES protocol
> > fixed whereas it would have dynamically changed before?
> 
> We have the same situation on Macchiatobin. The 88X3310 there defaults
> to MACTYPE mode 4, and we've supported this for years with DT describing
> the interface as 10GBASE-R - because we haven't actually cared very much
> what DT says up to this point for the 88X3310. As I said in my previous
> reply, the 88X3310 effectively dictates what the PHY interface mode will
> be, and that is communicated back through phylib to whoever is using
> phylib.

So what is the full backwards compatibility strategy with old DT blobs?
Is it in this patch set? I didn't notice it.

> > Another question is what to do if there are multiple ways of
> > establishing a system-side link. For example 1000 Mbps can be achieved
> > either through SGMII, or USXGMII with symbol replication, or 2500base-x
> > with flow control, or 10GBaseR with flow control. And I want to test
> > them all. What would I need to do to change the SERDES protocol from one
> > value to the other? Changing the phy-mode array in the device tree would
> > be one option, but that may not always be possible.
> 
> First point to make here is that rate adaption at the PHY is really
> not well supported in Linux, and there is no way to know via phylib if
> a PHY is capable or not of rate adaption.
> 
> Today, if you have a 10GBASE-R link between a PHY doing rate adaption
> and the "MAC", then what you will get from phylib is:
> 
> 	phydev->interface = PHY_INTERFACE_MODE_10GBASER;
> 	phydev->speed = SPEED_1000;	// result of media negotiation
> 	phydev->duplex = DUPLEX_FULL;	// result of media negotiation
> 	phydev->pause = ...;		// result of media negotiation
> 	phydev->asym_pause = ...;	// result of media negotiation
> 
> which will, for the majority of implementations, result in the MAC being
> forced to a 1G speed, possibly with or without pause enabled.
> 
> Due to this, if phylink is being used, the parameters given to
> mac_link_up/pcs_link_up will be the result of the media negotiation, not
> what is required on the actual link.
> 
> You mention "10GBaseR with flow control" but there is another
> possibility that exists in real hardware out there. "10GBaseR without
> flow control" and in that case, the MAC needs to pace its transmission
> for the media speed (which is a good reason why mac_link_up should be
> given the result of the media negotiation so it can do transmission
> pacing.)
> 
> I have a follow-up to the response I gave to Sean Anderson on rate-
> adapting PHYs that I need to finish and send, and it would be better
> to have any discussion on this topic after I've sent that reply and
> follow-up to that reply.

Ok, how would the MAC pace itself to send at a lower data rate, if the
SERDES protocol is 10G and the PHY doesn't send it PAUSE frames back?
At least Layerscape systems can't do this AFAIK.

I can watch for updates on this on the other thread.
Marek Behún Nov. 18, 2021, 3:20 p.m. UTC | #14
On Thu, 18 Nov 2021 17:09:51 +0200
Vladimir Oltean <olteanv@gmail.com> wrote:

> Speaking of that, do you know of any SFP modules that would use USXGMII?
> It doesn't appear to be listed in the spec sheet when looking for that.

RollBall 10G SFP modules, which this series prepares support for,
contain 88X3310 PHY and thus can work in USXGMII. By defualt they use
MACTYPE=6 (10gbase-r with rate matching), but when this series and
the series that adds support for RollBall SFPs are merged, then the SFP
module will work in USXGMII if USXGMII is supported by the host MAC.

Marek
Russell King (Oracle) Nov. 18, 2021, 3:59 p.m. UTC | #15
On Thu, Nov 18, 2021 at 05:09:51PM +0200, Vladimir Oltean wrote:
> On Thu, Nov 18, 2021 at 02:47:44PM +0000, Russell King (Oracle) wrote:
> > You're going to get into problems with this on Layerscape, because
> > reconfiguring the Serdes etc is something I've tried to highlight
> > as being necessary to NXP since SolidRun started using LX2160A. I
> > think there's some slow progress towards that, but it's so slow that
> > I've basically given up caring about it on the Honeycomb/Clearfog CX
> > boards now.
> > 
> > All the SFP cages on my Honeycomb have been configured for the most
> > useful mode to me - 1000BASE-X/SGMII, and I've given up caring about
> > USXGMII/10GBASE-R on those ports.
> 
> Speaking of that, do you know of any SFP modules that would use USXGMII?
> It doesn't appear to be listed in the spec sheet when looking for that.

I only know of one possibility, which is the DM7052 which uses a
Broadcom BCM84881 PHY. I believe the PHY is capable of USXGMII (there's
various references to it on the 'net) to some extent. By that, I mean
it probably does not provide the 16-bit configuration word in USXGMII
mode - just like it doesn't provide it in SGMII mode.

On the DM7052 module, the PHY is not in USXGMII mode, it will switch
interface modes according to the media speed just like the Marvell
88X3310 does. So, it's vital to use the PHY specific driver for this
PHY, and not the generic driver.

Since the datasheet for the PHY is unavailable, it is not known how to
switch it to USXGMII mode, so it may not be possible to do so except
by modifying the pin strapping on the PHY. It may be possible to do it
through vendor registers.

> > > I see that this patch set basically introduces the phydev->host_interfaces
> > > bitmap which is an attempt to find the answer to that question. But when
> > > will we know enough about phydev->host_interfaces in order to safely
> > > make decisions in the PHY driver based on it? phylink sets it, phylib
> > > does not.
> > 
> > It won't be something phylib could set because phylib doesn't know
> > the capabilities of its user - it's information that would need to be
> > provided to phylib.
> 
> So you're saying it would be in phylib's best interest to not set it at
> all, not even to a single bit corresponding to phydev->interface. So PHY
> drivers could work out this way whether they should operate in backwards
> compatibility mode or they could change MACTYPE at will.

That's what I'm thinking - if we end up with a single bit set in the
host interface, does that mean "the host only supports a single
interface type" or does it mean "DT only specified one interface type
and we need to operate in backwards compatibility mode".

If we provide an empty host_interfaces bitmap, then we can easily
detect that it's not set and fallback to compatibility mode - in the
case of 88x3310, that would basically mean not attempting to set
MACTYPE.

> > > And many Aquantia systems use the generic PHY driver, as mentioned.
> > > Additionally, there are old device trees at play here, which only define
> > > the initial SERDES protocol. Would we be changing the behavior for those,
> > > in that we would be configuring the PHY to keep the SERDES protocol
> > > fixed whereas it would have dynamically changed before?
> > 
> > We have the same situation on Macchiatobin. The 88X3310 there defaults
> > to MACTYPE mode 4, and we've supported this for years with DT describing
> > the interface as 10GBASE-R - because we haven't actually cared very much
> > what DT says up to this point for the 88X3310. As I said in my previous
> > reply, the 88X3310 effectively dictates what the PHY interface mode will
> > be, and that is communicated back through phylib to whoever is using
> > phylib.
> 
> So what is the full backwards compatibility strategy with old DT blobs?
> Is it in this patch set? I didn't notice it.

Marek has attempted to create a backwards compatibility in phylink for
this. See phylink_update_phy_modes().

> > > Another question is what to do if there are multiple ways of
> > > establishing a system-side link. For example 1000 Mbps can be achieved
> > > either through SGMII, or USXGMII with symbol replication, or 2500base-x
> > > with flow control, or 10GBaseR with flow control. And I want to test
> > > them all. What would I need to do to change the SERDES protocol from one
> > > value to the other? Changing the phy-mode array in the device tree would
> > > be one option, but that may not always be possible.
> > 
> > First point to make here is that rate adaption at the PHY is really
> > not well supported in Linux, and there is no way to know via phylib if
> > a PHY is capable or not of rate adaption.
> > 
> > Today, if you have a 10GBASE-R link between a PHY doing rate adaption
> > and the "MAC", then what you will get from phylib is:
> > 
> > 	phydev->interface = PHY_INTERFACE_MODE_10GBASER;
> > 	phydev->speed = SPEED_1000;	// result of media negotiation
> > 	phydev->duplex = DUPLEX_FULL;	// result of media negotiation
> > 	phydev->pause = ...;		// result of media negotiation
> > 	phydev->asym_pause = ...;	// result of media negotiation
> > 
> > which will, for the majority of implementations, result in the MAC being
> > forced to a 1G speed, possibly with or without pause enabled.
> > 
> > Due to this, if phylink is being used, the parameters given to
> > mac_link_up/pcs_link_up will be the result of the media negotiation, not
> > what is required on the actual link.
> > 
> > You mention "10GBaseR with flow control" but there is another
> > possibility that exists in real hardware out there. "10GBaseR without
> > flow control" and in that case, the MAC needs to pace its transmission
> > for the media speed (which is a good reason why mac_link_up should be
> > given the result of the media negotiation so it can do transmission
> > pacing.)
> > 
> > I have a follow-up to the response I gave to Sean Anderson on rate-
> > adapting PHYs that I need to finish and send, and it would be better
> > to have any discussion on this topic after I've sent that reply and
> > follow-up to that reply.
> 
> Ok, how would the MAC pace itself to send at a lower data rate, if the
> SERDES protocol is 10G and the PHY doesn't send it PAUSE frames back?
> At least Layerscape systems can't do this AFAIK.

I'm afraid that is an exercise for the reader/MAC driver author since
it's dependent on the hardware.

One would hope that no one would create a system where the PHY needs to
use rate adaption and requires the MAC to pace itself, but the MAC has
no way to achieve that. If such a hardware combination exists, I don't
see how it could work reliably.

However, bear in mind that the 88X3310 on Macchiatobin boards is the
one without MACSEC, so if it is configured to only operate at 10GBASE-R
with rate adaption, then it will not be generating pause frames and it
will expect the MAC to pace. This is about the only platform I have
which I could experiment with a PHY performing rate adaption. However,
I'm not currently sure how useful that would be - it would be nothing
more than an experimentation exercise, and would require MAC pacing
to be implemented in the mvpp2 driver.
Andrew Lunn Nov. 18, 2021, 4:27 p.m. UTC | #16
> +static int __init phylink_init(void)
> +{
> +	__set_bit(PHY_INTERFACE_MODE_USXGMII, phylink_sfp_interfaces);
> +	__set_bit(PHY_INTERFACE_MODE_10GBASER, phylink_sfp_interfaces);
> +	__set_bit(PHY_INTERFACE_MODE_10GKR, phylink_sfp_interfaces);
> +	__set_bit(PHY_INTERFACE_MODE_5GBASER, phylink_sfp_interfaces);
> +	__set_bit(PHY_INTERFACE_MODE_2500BASEX, phylink_sfp_interfaces);
> +	__set_bit(PHY_INTERFACE_MODE_SGMII, phylink_sfp_interfaces);
> +	__set_bit(PHY_INTERFACE_MODE_1000BASEX, phylink_sfp_interfaces);

Do we need to include PHY_INTERFACE_MODE_100BASEX here for 100BaseFX?
Not sure we actually have any systems using it.

   Andrew
Andrew Lunn Nov. 18, 2021, 4:27 p.m. UTC | #17
On Wed, Nov 17, 2021 at 11:50:48PM +0100, Marek Behún wrote:
> Now that phy.h defines macro DECLARE_PHY_INTERFACE_MASK(), use it
> instead of DECLARE_BITMAP().
> 
> Signed-off-by: Marek Behún <kabel@kernel.org>

Reviewed-by: Andrew Lunn <andrew@lunn.ch>

    Andrew
Andrew Lunn Nov. 18, 2021, 4:28 p.m. UTC | #18
On Wed, Nov 17, 2021 at 11:50:44PM +0100, Marek Behún wrote:
> Now that the `phy-mode` DT property can be an array of strings instead
> of just one string, update the documentation for of_get_phy_mode(),
> fwnode_get_phy_mode() and device_get_phy_mode() saying that if multiple
> strings are present, the first one is returned.
> 
> Conventionally the property was used to represent the mode we want the
> PHY to operate in, but we extended this to mean the list of all
> supported modes by that PHY on that particular board.
> 
> Signed-off-by: Marek Behún <kabel@kernel.org>

Reviewed-by: Andrew Lunn <andrew@lunn.ch>

    Andrew
Russell King (Oracle) Nov. 18, 2021, 4:28 p.m. UTC | #19
On Thu, Nov 18, 2021 at 05:27:00PM +0100, Andrew Lunn wrote:
> > +static int __init phylink_init(void)
> > +{
> > +	__set_bit(PHY_INTERFACE_MODE_USXGMII, phylink_sfp_interfaces);
> > +	__set_bit(PHY_INTERFACE_MODE_10GBASER, phylink_sfp_interfaces);
> > +	__set_bit(PHY_INTERFACE_MODE_10GKR, phylink_sfp_interfaces);
> > +	__set_bit(PHY_INTERFACE_MODE_5GBASER, phylink_sfp_interfaces);
> > +	__set_bit(PHY_INTERFACE_MODE_2500BASEX, phylink_sfp_interfaces);
> > +	__set_bit(PHY_INTERFACE_MODE_SGMII, phylink_sfp_interfaces);
> > +	__set_bit(PHY_INTERFACE_MODE_1000BASEX, phylink_sfp_interfaces);
> 
> Do we need to include PHY_INTERFACE_MODE_100BASEX here for 100BaseFX?
> Not sure we actually have any systems using it.

Most likely. I think we can now drop PHY_INTERFACE_MODE_10GKR from
this too.
Andrew Lunn Nov. 18, 2021, 4:31 p.m. UTC | #20
On Wed, Nov 17, 2021 at 11:50:45PM +0100, Marek Behún wrote:
> Now that the 'phy-mode' property can be a string array containing more
> PHY modes, add helper function fwnode_get_phy_modes() that reads this
> property and fills in PHY interfaces bitmap.
> 
> Signed-off-by: Marek Behún <kabel@kernel.org>

Reviewed-by: Andrew Lunn <andrew@lunn.ch>

    Andrew
Andrew Lunn Nov. 18, 2021, 4:33 p.m. UTC | #21
> > +	/* If supported is empty, just copy modes defined in fwnode. */
> > +	if (phy_interface_empty(supported))
> > +		return phy_interface_copy(supported, modes);
> 
> Doesn't this mean we always end up with the supported_interfaces field
> filled in, even for drivers that haven't yet been converted? It will
> have the effect of locking the driver to the interface mode in "modes"
> where only one interface mode is mentioned in DT.
> 
> At the moment, I think the only drivers that would be affected would be
> some DSA drivers, stmmac and macb as they haven't yet been converted.

Hi Russell

What do you think the best way forward is? Got those converted before
merging this?

	Andrew
Russell King (Oracle) Nov. 18, 2021, 5:08 p.m. UTC | #22
On Thu, Nov 18, 2021 at 05:33:33PM +0100, Andrew Lunn wrote:
> > > +	/* If supported is empty, just copy modes defined in fwnode. */
> > > +	if (phy_interface_empty(supported))
> > > +		return phy_interface_copy(supported, modes);
> > 
> > Doesn't this mean we always end up with the supported_interfaces field
> > filled in, even for drivers that haven't yet been converted? It will
> > have the effect of locking the driver to the interface mode in "modes"
> > where only one interface mode is mentioned in DT.
> > 
> > At the moment, I think the only drivers that would be affected would be
> > some DSA drivers, stmmac and macb as they haven't yet been converted.
> 
> Hi Russell
> 
> What do you think the best way forward is? Got those converted before
> merging this?

The situation is as follows:

- For macb, I have a bunch of patches ready to go against net-next (in
  git log order):
  net: macb: use phylink_generic_validate()
  net: macb: clean up macb_validate()
  net: macb: remove interface checks in macb_validate()
  net: macb: populate supported_interfaces member

  but I think Sean and myself need to finish discussing PCS capabilities
  before I send those.

- For mv88e6xxx DSA, I have some patches - I need to check with Marek
  whether he has any further changes for those as he's been looking over
  them and checking with the Marvell specs before I can send them.

- For mt7530 DSA, I also have some patches, but no way to test them.

All of the above patches are in my net-queue branch:
  http://git.armlinux.org.uk/cgit/linux-arm.git/log/?h=net-queue

- For stmmac, I pinged Jose Abreu about it. stmmac's validate function
  does not check the interface mode at all if there is no XPCS attached.
  Jose says that which interface modes are supported is up to the
  integrator to decide, and it seems there is nothing in the current
  driver to work out which interface modes can be supported.

  If there is an XPCS attached, it defers interface mode checks to
  xpcs_validate(), which uses the interface mode to walk an array to
  find a list of linkmodes that the PCS supports.

  So, I think it's going to take a bit of unpicking to work out what to
  do here, and may depend on what we come up with with Sean for PCS.

That leaves quite a number of DSA drivers untouched.

So, I think we need to realise that even though we have all the users
in the kernel, making changes to phylink's API is always going to be
difficult, and we always need to maintain compatibility for old ways
of doing stuff - at least until every user can be converted. However,
that brings with it its own problem - there is then no motivation for
people to adapt to the new way. This can be seen as we have the
compatibility for calling mac_config() whenever the link comes up
16 months after the PCS stuff was introduced. One of the problem
drivers for this is mtk_eth_soc:

http://git.armlinux.org.uk/cgit/linux-arm.git/commit/?h=net-queue&id=278da006a936c0743c7fc261c23e7de992ac78e6

René van Dorst said in response to me posting that patch in July 2020:
> I know, you have pointed that out before. But I don't know how to fix
> mtk_gmac0_rgmii_adjust(). This function changes the PLL of the MAC.
> But without documentation I am not sure what all the bits are used
> for.

and my attempts to discuss a way forward didn't get a reply. So,
mtk_eth_soc has become an example of a problematical driver that makes
ongoing phylink changes difficult, and I fear stmmac may become another
example.

I'm quite certain that as we try to develop phylink, such as adding
extra facilities like specifying the interface modes, we're going to
end up running into these kinds of problems that we can't solve, and
we are going to have to keep compatibility for the old ways of doing
stuff going for years to come - which is just going to get more and
more painful.
Marek Behún Nov. 18, 2021, 5:33 p.m. UTC | #23
On Thu, 18 Nov 2021 17:08:10 +0000
"Russell King (Oracle)" <linux@armlinux.org.uk> wrote:

> I'm quite certain that as we try to develop phylink, such as adding
> extra facilities like specifying the interface modes, we're going to
> end up running into these kinds of problems that we can't solve, and
> we are going to have to keep compatibility for the old ways of doing
> stuff going for years to come - which is just going to get more and
> more painful.

One way to move this forward would be to check compatible of the
corresponding MAC in this new function. If it belongs to an unconverted
driver, we can ensure the old behaviour. This would require to add a
comment to the driver saying "if you convert this, please drop the
exception in phylink_update_phy_modes()".

Yes, it is an ugly solution, but it would work and this function
already is meant to be solving backward compatibility problems by such
mechanisms. The comment says:

+	 * Nonetheless it is possible that a device may support only one mode,
+	 * for example 1000base-x, due to strapping pins or some other reasons.
+	 * If a specific device supports only the mode mentioned in DT, the
+	 * exception should be made here with of_machine_is_compatible().

BTW, I now realize that of_machine_is_compatible() is not the best
solution. I will change the comment in v2.

Marek
Russell King (Oracle) Nov. 18, 2021, 5:39 p.m. UTC | #24
On Thu, Nov 18, 2021 at 06:33:01PM +0100, Marek Behún wrote:
> On Thu, 18 Nov 2021 17:08:10 +0000
> "Russell King (Oracle)" <linux@armlinux.org.uk> wrote:
> 
> > I'm quite certain that as we try to develop phylink, such as adding
> > extra facilities like specifying the interface modes, we're going to
> > end up running into these kinds of problems that we can't solve, and
> > we are going to have to keep compatibility for the old ways of doing
> > stuff going for years to come - which is just going to get more and
> > more painful.
> 
> One way to move this forward would be to check compatible of the
> corresponding MAC in this new function. If it belongs to an unconverted
> driver, we can ensure the old behaviour. This would require to add a
> comment to the driver saying "if you convert this, please drop the
> exception in phylink_update_phy_modes()".

That could work when drivers pass the fwnode as the "device" node.
Some drivers don't do that, they pass a sub-node instead - such as
mvpp2. However, mvpp2 is of course up to date with all phylink
developments.

However, the same issue exists with DSA - the fwnode passed to
phylink doesn't have a compatible. I suppose we could walk up the
fwnode tree until we do find a node with a compatible property,
that may work.
Sean Anderson Nov. 18, 2021, 8:38 p.m. UTC | #25
On 11/17/21 5:50 PM, Marek Behún wrote:
> Now that the 'phy-mode' property can be a string array containing more
> PHY modes (all that are supported by the board), update the bitmap of
> interfaces supported by the MAC with this property.
> 
> Normally this would be a simple intersection (of interfaces supported by
> the current implementation of the driver and interfaces supported by the
> board), but we need to keep being backwards compatible with older DTs,
> which may only define one mode, since, as Russell King says,
>    conventionally phy-mode has meant "this is the mode we want to operate
>    the PHY interface in" which was fine when PHYs didn't change their
>    mode depending on the media speed
> 
> An example is DT defining
>    phy-mode = "sgmii";
> but the board supporting also 1000base-x and 2500base-x.
> 
> Add the following logic to keep this backwards compatiblity:
> - if more PHY modes are defined, do a simple intersection
> - if one PHY mode is defined:
>    - if it is sgmii, 1000base-x or 2500base-x, add all three and then do
>      the intersection
>    - if it is 10gbase-r or usxgmii, add both, and also 5gbase-r,
>      2500base-x, 1000base-x and sgmii, and then do the intersection
> 
> This is simple enough and should work for all boards.
> 
> Nonetheless it is possible (although extremely unlikely, in my opinion)
> that a board will be found that (for example) defines
>    phy-mode = "sgmii";
> and the MAC drivers supports sgmii, 1000base-x and 2500base-x, but the
> board DOESN'T support 2500base-x, because of electrical reasons (since
> the frequency is 2.5x of sgmii).
> Our code will in this case incorrectly infer also support for
> 2500base-x. To avoid this, the board maintainer should either change DTS
> to
>    phy-mode = "sgmii", "1000base-x";
> and update device tree on all boards, or, if that is impossible, add a
> fix into the function we are introducing in this commit.

Can you touch on the 5G/10G stuff as well in the message?

> Signed-off-by: Marek Behún <kabel@kernel.org>
> ---
>   drivers/net/phy/phylink.c | 63 +++++++++++++++++++++++++++++++++++++++
>   include/linux/phy.h       |  6 ++++
>   2 files changed, 69 insertions(+)
> 
> diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
> index f7156b6868e7..6d7c216a5dea 100644
> --- a/drivers/net/phy/phylink.c
> +++ b/drivers/net/phy/phylink.c
> @@ -563,6 +563,67 @@ static int phylink_parse_fixedlink(struct phylink *pl,
>   	return 0;
>   }
>   
> +static void phylink_update_phy_modes(struct phylink *pl,
> +				     struct fwnode_handle *fwnode)
> +{
> +	unsigned long *supported = pl->config->supported_interfaces;
> +	DECLARE_PHY_INTERFACE_MASK(modes);
> +
> +	if (fwnode_get_phy_modes(fwnode, modes) < 0)
> +		return;
> +
> +	if (phy_interface_empty(modes))
> +		return;
> +
> +	/* If supported is empty, just copy modes defined in fwnode. */
> +	if (phy_interface_empty(supported))
> +		return phy_interface_copy(supported, modes);
> +
> +	/* We want the intersection of given supported modes with those defined
> +	 * in DT.
> +	 *
> +	 * Some older device-trees mention only one of `sgmii`, `1000base-x` or
> +	 * `2500base-x`, while supporting all three. Other mention `10gbase-r`
> +	 * or `usxgmii`, while supporting both, and also `sgmii`, `1000base-x`,
> +	 * `2500base-x` and `5gbase-r`.
> +	 * For backwards compatibility with these older DTs, make it so that if
> +	 * one of these modes is mentioned in DT and MAC supports more of them,
> +	 * keep all that are supported according to the logic above.
> +	 *
> +	 * Nonetheless it is possible that a device may support only one mode,
> +	 * for example 1000base-x, due to strapping pins or some other reasons.
> +	 * If a specific device supports only the mode mentioned in DT, the
> +	 * exception should be made here with of_machine_is_compatible().
> +	 */
> +	if (bitmap_weight(modes, PHY_INTERFACE_MODE_MAX) == 1) {
> +		bool lower = false;
> +
> +		if (test_bit(PHY_INTERFACE_MODE_10GBASER, modes) ||
> +		    test_bit(PHY_INTERFACE_MODE_USXGMII, modes)) { > +			if (test_bit(PHY_INTERFACE_MODE_5GBASER, supported))
> +				__set_bit(PHY_INTERFACE_MODE_5GBASER, modes);
> +			if (test_bit(PHY_INTERFACE_MODE_10GBASER, supported))
> +				__set_bit(PHY_INTERFACE_MODE_10GBASER, modes);
> +			if (test_bit(PHY_INTERFACE_MODE_USXGMII, supported))
> +				__set_bit(PHY_INTERFACE_MODE_USXGMII, modes);	
How about

	DECLARE_PHY_INTERFACE_MASK(upper_modes);

	__set_bit(upper_modes, PHY_INTERFACE_MODE_5GBASER);
	__set_bit(upper_modes, PHY_INTERFACE_MODE_10GBASER);
	__set_bit(upper_modes, PHY_INTERFACE_MODE_USXGMII);
	phy_interface_and(upper_modes, supported, upper_modes);
	phy_interface_or(modes, modes, upper_modes);

same linecount but less duplication

> +			lower = true;
> +		}
> +
> +		if (lower || (test_bit(PHY_INTERFACE_MODE_SGMII, modes) ||
> +			      test_bit(PHY_INTERFACE_MODE_1000BASEX, modes) ||
> +			      test_bit(PHY_INTERFACE_MODE_2500BASEX, modes))) {
> +			if (test_bit(PHY_INTERFACE_MODE_SGMII, supported))
> +				__set_bit(PHY_INTERFACE_MODE_SGMII, modes);
> +			if (test_bit(PHY_INTERFACE_MODE_1000BASEX, supported))
> +				__set_bit(PHY_INTERFACE_MODE_1000BASEX, modes);
> +			if (test_bit(PHY_INTERFACE_MODE_2500BASEX, supported))
> +				__set_bit(PHY_INTERFACE_MODE_2500BASEX, modes);

ditto

--Sean

> +		}
> +	}
> +
> +	phy_interface_and(supported, supported, modes);
> +}
> +
>   static int phylink_parse_mode(struct phylink *pl, struct fwnode_handle *fwnode)
>   {
>   	struct fwnode_handle *dn;
> @@ -1156,6 +1217,8 @@ struct phylink *phylink_create(struct phylink_config *config,
>   	__set_bit(PHYLINK_DISABLE_STOPPED, &pl->phylink_disable_state);
>   	timer_setup(&pl->link_poll, phylink_fixed_poll, 0);
>   
> +	phylink_update_phy_modes(pl, fwnode);
> +
>   	bitmap_fill(pl->supported, __ETHTOOL_LINK_MODE_MASK_NBITS);
>   	linkmode_copy(pl->link_config.advertising, pl->supported);
>   	phylink_validate(pl, pl->supported, &pl->link_config);
> diff --git a/include/linux/phy.h b/include/linux/phy.h
> index 1e57cdd95da3..83ae15ab1676 100644
> --- a/include/linux/phy.h
> +++ b/include/linux/phy.h
> @@ -169,6 +169,12 @@ static inline bool phy_interface_empty(const unsigned long *intf)
>   	return bitmap_empty(intf, PHY_INTERFACE_MODE_MAX);
>   }
>   
> +static inline void phy_interface_copy(unsigned long *dst,
> +				      const unsigned long *src)
> +{
> +	bitmap_copy(dst, src, PHY_INTERFACE_MODE_MAX);
> +}
> +
>   static inline void phy_interface_and(unsigned long *dst, const unsigned long *a,
>   				     const unsigned long *b)
>   {
>