[v3] net: dsa: microchip: call phy_remove_link_mode during probe

Message ID 20200720090416.GA7307@laureti-dev
State Superseded
Delegated to: David Miller

Commit Message

Helmut Grohne July 20, 2020, 9:04 a.m. UTC
When doing "ip link set dev ... up" for a ksz9477 backed link,
ksz9477_phy_setup is called and it calls phy_remove_link_mode to remove
1000baseT HDX. During phy_remove_link_mode, phy_advertise_supported is
called. Doing so reverts any previous change to advertised link modes
e.g. using a udevd .link file.

phy_remove_link_mode is not meant to be used while opening a link and
should be called during phy probe when the link is not yet available to
userspace.

Therefore move the phy_remove_link_mode calls into
ksz9477_switch_register. It indirectly calls dsa_register_switch, which
creates the relevant struct phy_devices and we update the link modes
right after that. At that time dev->features is already initialized by
ksz9477_switch_detect.

Remove phy_setup from ksz_dev_ops as no users remain.

Link: https://lore.kernel.org/netdev/20200715192722.GD1256692@lunn.ch/
Fixes: 42fc6a4c613019 ("net: dsa: microchip: prepare PHY for proper advertisement")
Signed-off-by: Helmut Grohne <helmut.grohne@intenta.de>
---
 drivers/net/dsa/microchip/ksz9477.c    | 39 +++++++++++++-------------
 drivers/net/dsa/microchip/ksz_common.c |  2 --
 drivers/net/dsa/microchip/ksz_common.h |  2 --
 3 files changed, 20 insertions(+), 23 deletions(-)

On Fri, Jul 17, 2020 at 03:18:14PM +0200, Andrew Lunn wrote:
> I'm not questioning the ordering. I'm questioning which phydev
> structure is being manipulated.
...
> Is slave_dev->phydev == &dev->ports[i].phydev ?

You are spot on. I mistakenly assumed this to be the case, but it really
is not. Thank you. Your detailed explanations are much appreciated. This
slipped through my testing, because I only checked whether the 1Gbit HDX
mode was correctly removed. It seems like something else does that already,
so I didn't notice that I was operating on the wrong phy_device.

The dev->ports[i].phydev is not actually exposed beyond the driver. The
driver sets the phydev.speed in a few places and even reads it back in
one place. It also sets phydev.duplex, but never reads it back. It
queries phydev.link, which is statically 0 due to using devm_kzalloc.

I think the use of this ksz_port.phydev is very misleading, but I'm
unsure how to fix this. It is not clear to me whether all those updates
should be performed on the connected phydev instead or whether this is
just internal state tracking.

That leaves the question of when to remove the link modes. ksz9477_setup
is called before dsa_register_switch. At the time it runs, the port's
slave device is a NULL pointer. The actual phydev is not linked up and
we cannot access it. The phydev only gets initialized during
dsa_register_switch when dsa_slave_phy_connect is called. Since there is
no suitable hook there, we cannot change the link modes before
phylink_connect_phy is called.

As far as I understand your previous mails, it is not necessary to
remove the link modes before phylink_connect_phy. So the next place
after dsa_register_switch seems to be inside ksz9477_switch_register (as
dsa_register_switch is the final call in ksz_switch_register). I'm
unsure whether this poses a race condition with user space, but on the
system under test, the race is reliably won if there is any.

Beyond resolving the phydev mess, I wish that we could agree on the
order of verbs and nouns in symbols in some way. dsa_register_switch vs.
ksz_switch_register became confusing at times. I see where either is
coming from, but I think that consistency would be better here.

Helmut

changes since v2:
 * Operate on the correct phydev. Thanks to Andrew Lunn.
changes since v1:
 * Don't change phy_remove_link_mode. Instead, call it at the right
   time. Thanks to Andrew Lunn for the detailed explanation.
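
The advertisement reset described in the commit message can be reproduced
in a small user-space model. This is only a sketch: the struct, the bit
values and the model_* functions are hypothetical stand-ins, and the real
phy_advertise_supported additionally preserves the pause bits, which the
model leaves out.

```c
#include <stdint.h>

/* Hypothetical bit values; the kernel uses ETHTOOL_LINK_MODE_*_BIT. */
enum {
	MODE_100_FULL  = 1u << 0,
	MODE_1000_HALF = 1u << 1,
	MODE_1000_FULL = 1u << 2,
};

/* Simplified stand-in for the relevant part of struct phy_device. */
struct model_phy {
	uint32_t supported;   /* what the hardware can do */
	uint32_t advertising; /* what autonegotiation offers; user-tunable */
};

/*
 * Models phy_remove_link_mode: clear the mode from the supported set,
 * then re-derive the advertisement from it. Any restriction the user
 * applied to the advertisement is lost in the second step.
 */
void model_remove_link_mode(struct model_phy *phy, uint32_t mode)
{
	phy->supported &= ~mode;
	phy->advertising = phy->supported;
}

/* Models user space restricting the advertisement, e.g. via a .link file. */
void model_user_advertise(struct model_phy *phy, uint32_t modes)
{
	phy->advertising = modes & phy->supported;
}
```

Calling model_remove_link_mode once before model_user_advertise keeps the
user's choice intact. Calling it again afterwards, as happened on every
"ip link set dev ... up", widens the advertisement back to everything
still supported.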

Comments

Andrew Lunn July 20, 2020, 8:43 p.m. UTC | #1
Ignoring the part about how to clean up this internal phydev for the
moment.

>  int ksz9477_switch_register(struct ksz_device *dev)
>  {
> -	return ksz_switch_register(dev, &ksz9477_dev_ops);
> +	int ret, i;
> +	struct phy_device *phydev;
> +
> +	ret = ksz_switch_register(dev, &ksz9477_dev_ops);
> +	if (ret)
> +		return ret;
> +
> +	for (i = 0; i < dev->phy_port_cnt; ++i) {
> +		phydev = dsa_to_port(dev->ds, i)->slave->phydev;

There is no guarantee this phydev actually exists, as far as I
remember. It will only be allocated for user ports. If a port is not
used, i.e. not listed in DT, it won't have a phydev. So you should add
a test:

		if (!dsa_is_user_port(dev->ds, i))
			continue;

Otherwise, this now seems correct.

Andrew
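
The guard requested in this review can be illustrated with a user-space
model of the loop. It is a sketch under assumptions: the sketch_* types
and the function are hypothetical, and the only property carried over
from the discussion is that a port which is not a user port has no PHY
object to dereference.

```c
#include <stddef.h>

enum sketch_port_type {
	SKETCH_PORT_UNUSED, /* not listed in the device tree */
	SKETCH_PORT_CPU,
	SKETCH_PORT_USER,
};

struct sketch_phy {
	int half_1000_removed;
};

struct sketch_port {
	enum sketch_port_type type;
	struct sketch_phy *phy; /* NULL unless the port is a user port */
};

/*
 * Walk the PHY-capable ports and update only those that are user
 * ports. Returns the number of PHYs that were updated.
 */
int sketch_fixup_phys(struct sketch_port *ports, int phy_port_cnt)
{
	int i, touched = 0;

	for (i = 0; i < phy_port_cnt; ++i) {
		/* Without this check, ports[i].phy may be NULL below. */
		if (ports[i].type != SKETCH_PORT_USER)
			continue;

		ports[i].phy->half_1000_removed = 1;
		++touched;
	}
	return touched;
}
```

With three ports of which the middle one is unused, the function updates
two PHYs and never touches the NULL pointer of the unused port.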
Andrew Lunn July 20, 2020, 9:04 p.m. UTC | #2
> The dev->ports[i].phydev is not actually exposed beyond the driver. The
> driver sets the phydev.speed in a few places and even reads it back in
> one place. It also sets phydev.duplex, but never reads it back. It
> queries phydev.link, which is statically 0 due to using devm_kzalloc.
> 
> I think the use of this ksz_port.phydev is very misleading, but I'm
> unsure how to fix this. It is not clear to me whether all those updates
> should be performed on the connected phydev instead or whether this is
> just internal state tracking.

I took a quick look at the code.

For PHY addresses < dev->phy_port_cnt it passes all reads/writes
through to the hardware. So the Linux MDIO/PHY subsystem will be able
to fully drive these PHYs, and the ksz9477 internal phydev is
unneeded.

Where it gets interesting is addr >= dev->phy_port_cnt. Reads of the
PHY registers return hard coded values, or the link speed from the
local phydev. Writes to these registers are just ignored.

If you compare this to other DSA drivers/DSA switches, reads/writes for
addresses where there is no internal PHY get passed out to an
external MDIO bus, where an external PHY can be connected. The Linux
MDIO/PHY subsystem will discover these external PHYs and create phydev
instances for them. If there is no external PHY, for example the MAC is
connected to another MAC, no PHY will be detected, and fixed-link is
used in its place.

Do these switches have an external MDIO bus?
How are external PHYs usually managed?

At a minimum, the internal phydev can be replaced with just a speed,
rather than a full phydev, which will reduce confusion. But it would
be nice to go further and remove all the addr >= dev->phy_port_cnt
handling. But we need to understand the implications of that.

	Andrew
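
The address split described in this message can be sketched as a small
user-space model. Everything here is an assumption made for illustration:
the sketch_* names, the port count, the register count and the placeholder
value are hypothetical and do not reflect the driver's actual register
contents.

```c
#include <stdint.h>

#define SKETCH_PHY_PORT_CNT 2      /* hypothetical number of internal PHYs */
#define SKETCH_NUM_REGS     4      /* hypothetical register count per PHY */
#define SKETCH_FIXED_VALUE  0x1234 /* placeholder for a hard-coded reply */

struct sketch_switch {
	/* Stands in for the registers of the internal PHYs. */
	uint16_t hw_regs[SKETCH_PHY_PORT_CNT][SKETCH_NUM_REGS];
};

/* Reads below the PHY port count reach the hardware, others are emulated. */
uint16_t sketch_r_phy(const struct sketch_switch *sw, int addr, int reg)
{
	if (addr < SKETCH_PHY_PORT_CNT)
		return sw->hw_regs[addr][reg];

	return SKETCH_FIXED_VALUE;
}

/* Writes below the PHY port count reach the hardware, others are dropped. */
void sketch_w_phy(struct sketch_switch *sw, int addr, int reg, uint16_t val)
{
	if (addr >= SKETCH_PHY_PORT_CNT)
		return;

	sw->hw_regs[addr][reg] = val;
}
```

A write to address 0 can be read back, while a write to address 2 has no
effect and a later read still returns the placeholder value.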
Helmut Grohne July 21, 2020, 7:38 a.m. UTC | #3
Hi Andrew,

Your persistence on this matter is much appreciated.

On Mon, Jul 20, 2020 at 11:04:49PM +0200, Andrew Lunn wrote:
> > The dev->ports[i].phydev is not actually exposed beyond the driver. The
> > driver sets the phydev.speed in a few places and even reads it back in
> > one place. It also sets phydev.duplex, but never reads it back. It
> > queries phydev.link, which is statically 0 due to using devm_kzalloc.
> > 
> > I think the use of this ksz_port.phydev is very misleading, but I'm
> > unsure how to fix this. It is not clear to me whether all those updates
> > should be performed on the connected phydev instead or whether this is
> > just internal state tracking.
> 
> I took a quick look at the code.
> 
> For PHY addresses < dev->phy_port_cnt it passes all reads/writes
> through to the hardware. So the Linux MDIO/PHY subsystem will be able
> to fully drive these PHYs, and the ksz9477 internal phydev is
> unneeded.

I do not fully concur here yet. For instance, ksz8795_port_setup and
ksz9477_port_setup branch on the port being a CPU port and evaluate the
phydev.link for non-CPU ports. Given that phydev.link is never assigned,
the branch where dev->live_ports is assigned is dead. Following
live_ports through the code reveals that it is only ever written to, but
no logic ever depends on its value. I'm not yet sure whether all of that
should simply be removed with no replacement or whether it was meant to
be extended some time later.
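
The dead branch can be shown with a user-space model in which calloc
stands in for devm_kzalloc. The sketch_* names are hypothetical and the
function only mirrors the control flow described above, not the driver's
actual port setup code.

```c
#include <stdlib.h>

#define SKETCH_PORT_CNT 4 /* hypothetical port count */

struct sketch_port_state {
	int link; /* zero after allocation and never assigned afterwards */
};

struct sketch_dev {
	unsigned int live_ports;
	struct sketch_port_state ports[SKETCH_PORT_CNT];
};

/* Zero-initialising allocation; returns NULL on failure. */
struct sketch_dev *sketch_alloc(void)
{
	return calloc(1, sizeof(struct sketch_dev));
}

void sketch_port_setup(struct sketch_dev *dev, int port, int cpu_port)
{
	if (cpu_port)
		return;

	/* Never taken: nothing in this model ever sets link to non-zero. */
	if (dev->ports[port].link)
		dev->live_ports |= 1u << port;
}
```

Running sketch_port_setup over all ports of a freshly allocated device
leaves live_ports at zero, which matches the observation that the value
is written in code but never changes in practice.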

> Where it gets interesting is addr >= dev->phy_port_cnt. Reads of the
> PHY registers return hard coded values, or the link speed from the
> local phydev. Writes to these registers are just ignored.

This makes some sense to me. It may become clearer below.

> If you compare this to other DSA drivers/DSA switches, reads/writes for
> addresses where there is no internal PHY get passed out to an
> external MDIO bus, where an external PHY can be connected. The Linux
> MDIO/PHY subsystem will discover these external PHYs and create phydev
> instances for them. If there is no external PHY, for example the MAC is
> connected to another MAC, no PHY will be detected, and fixed-link is
> used in its place.

These switches all have internal PHYs for addresses < phy_port_cnt.
The MACs are located beyond this index. Few devices have multiple MACs
and only one MAC can be connected to the CPU at a time, because the tail
tagging scheme can only be enabled on one MAC port at a time. The driver
requires tail tagging on CPU ports (although this is not required by the
hardware).

> Do these switches have an external MDIO bus?

One has a choice of how one wishes to communicate with these switches.
Depending on configuration straps, they can do SPI or I²C or MDIO,
though the register space on the MDIO bus is too limited to do anything
useful, so the driver does not support MDIO. You can reach all of the
internal PHYs through the chosen bus. If you connect an external PHY to
a MAC, the KSZ is not involved in a management connection such as MDIO.

> How are external PHYs usually managed?

I honestly don't know. I only deal with internal PHYs. The typical use
case for the MAC ports is to establish fixed-links to other MACs (such
as the CPU or other switches).

> At a minimum, the internal phydev can be replaced with just a speed,
> rather than a full phydev, which will reduce confusion. But it would
> be nice to go further and remove all the addr >= dev->phy_port_cnt
> handling. But we need to understand the implications of that.

addr >= dev->phy_port_cnt identifies a MAC. While the KSZ may have a
data connection to the other side, it does not have a management
connection (e.g. MDIO). The driver presently assumes that all MAC
connections are fixed-links, which is the case when you connect it to
the CPU. A significant fraction of KSZ switches only have one MAC or
have multiple MACs of which you only use one in a particular product
(e.g. because one only supports SGMII and the other only supports
RGMII). So the common case here is that addr >= dev->phy_port_cnt
uniquely identifies the fixed-link CPU port.

This also means that very likely the addr >= dev->phy_port_cnt handling
is not going away.

It also kinda routes us back to another thread of mine. In the followup
to https://lore.kernel.org/netdev/20200714120827.GA7939@laureti-dev/,
you also identified the assumption that any MAC port is the CPU port of
this driver and asked me to build on it. It is unclear whether that
should be lifted. If it isn't, I think it is fairly safe to assume that
any MAC is connected using a fixed-link and that there is no need for
any external PHY management.

Helmut

Patch

diff --git a/drivers/net/dsa/microchip/ksz9477.c b/drivers/net/dsa/microchip/ksz9477.c
index 8d15c3016024..368964b09aae 100644
--- a/drivers/net/dsa/microchip/ksz9477.c
+++ b/drivers/net/dsa/microchip/ksz9477.c
@@ -974,23 +974,6 @@  static void ksz9477_port_mirror_del(struct dsa_switch *ds, int port,
 			     PORT_MIRROR_SNIFFER, false);
 }
 
-static void ksz9477_phy_setup(struct ksz_device *dev, int port,
-			      struct phy_device *phy)
-{
-	/* Only apply to port with PHY. */
-	if (port >= dev->phy_port_cnt)
-		return;
-
-	/* The MAC actually cannot run in 1000 half-duplex mode. */
-	phy_remove_link_mode(phy,
-			     ETHTOOL_LINK_MODE_1000baseT_Half_BIT);
-
-	/* PHY does not support gigabit. */
-	if (!(dev->features & GBIT_SUPPORT))
-		phy_remove_link_mode(phy,
-				     ETHTOOL_LINK_MODE_1000baseT_Full_BIT);
-}
-
 static bool ksz9477_get_gbit(struct ksz_device *dev, u8 data)
 {
 	bool gbit;
@@ -1603,7 +1586,6 @@  static const struct ksz_dev_ops ksz9477_dev_ops = {
 	.get_port_addr = ksz9477_get_port_addr,
 	.cfg_port_member = ksz9477_cfg_port_member,
 	.flush_dyn_mac_table = ksz9477_flush_dyn_mac_table,
-	.phy_setup = ksz9477_phy_setup,
 	.port_setup = ksz9477_port_setup,
 	.r_mib_cnt = ksz9477_r_mib_cnt,
 	.r_mib_pkt = ksz9477_r_mib_pkt,
@@ -1617,7 +1599,26 @@  static const struct ksz_dev_ops ksz9477_dev_ops = {
 
 int ksz9477_switch_register(struct ksz_device *dev)
 {
-	return ksz_switch_register(dev, &ksz9477_dev_ops);
+	int ret, i;
+	struct phy_device *phydev;
+
+	ret = ksz_switch_register(dev, &ksz9477_dev_ops);
+	if (ret)
+		return ret;
+
+	for (i = 0; i < dev->phy_port_cnt; ++i) {
+		phydev = dsa_to_port(dev->ds, i)->slave->phydev;
+
+		/* The MAC actually cannot run in 1000 half-duplex mode. */
+		phy_remove_link_mode(phydev,
+				     ETHTOOL_LINK_MODE_1000baseT_Half_BIT);
+
+		/* PHY does not support gigabit. */
+		if (!(dev->features & GBIT_SUPPORT))
+			phy_remove_link_mode(phydev,
+					     ETHTOOL_LINK_MODE_1000baseT_Full_BIT);
+	}
+	return ret;
 }
 EXPORT_SYMBOL(ksz9477_switch_register);
 
diff --git a/drivers/net/dsa/microchip/ksz_common.c b/drivers/net/dsa/microchip/ksz_common.c
index fd1d6676ae4f..7b6c0dce7536 100644
--- a/drivers/net/dsa/microchip/ksz_common.c
+++ b/drivers/net/dsa/microchip/ksz_common.c
@@ -358,8 +358,6 @@  int ksz_enable_port(struct dsa_switch *ds, int port, struct phy_device *phy)
 
 	/* setup slave port */
 	dev->dev_ops->port_setup(dev, port, false);
-	if (dev->dev_ops->phy_setup)
-		dev->dev_ops->phy_setup(dev, port, phy);
 
 	/* port_stp_state_set() will be called after to enable the port so
 	 * there is no need to do anything.
diff --git a/drivers/net/dsa/microchip/ksz_common.h b/drivers/net/dsa/microchip/ksz_common.h
index f2c9bb68fd33..7d11dd32ec0d 100644
--- a/drivers/net/dsa/microchip/ksz_common.h
+++ b/drivers/net/dsa/microchip/ksz_common.h
@@ -119,8 +119,6 @@  struct ksz_dev_ops {
 	u32 (*get_port_addr)(int port, int offset);
 	void (*cfg_port_member)(struct ksz_device *dev, int port, u8 member);
 	void (*flush_dyn_mac_table)(struct ksz_device *dev, int port);
-	void (*phy_setup)(struct ksz_device *dev, int port,
-			  struct phy_device *phy);
 	void (*port_cleanup)(struct ksz_device *dev, int port);
 	void (*port_setup)(struct ksz_device *dev, int port, bool cpu_port);
 	void (*r_phy)(struct ksz_device *dev, u16 phy, u16 reg, u16 *val);