mbox series

[net-next,v1,0/4] net: phy: mxl-gpy: broken interrupt fixes

Message ID 20221202151204.3318592-1-michael@walle.cc
Headers show
Series net: phy: mxl-gpy: broken interrupt fixes | expand

Message

Michael Walle Dec. 2, 2022, 3:12 p.m. UTC
The GPY215 has a broken interrupt pin. This patch series tries to
workaround that and because in general that is not possible, disables the
interrupts by default and falls back to polling mode. There is an opt-in
via the devicetree.

net vs net-next: I'm not sure. No one seems to have noticed it so far.
My board I care about has no support for older kernel. Apart from that,
the first patch might be for net. The last one would need a new device
tree property, so it might only apply for net-next? Also it will disable
interrupts by default now.
Let me know what you think. I can send the first patch independently with a
Fixes tag and resend the last ones after the merge window. (The last one
depends on the first).

Btw. I just noticed that this series won't apply cleanly, because it
references patch context changed by
https://lore.kernel.org/netdev/20221202144900.3298204-1-michael@walle.cc/
:(

Michael Walle (4):
  net: phy: mxl-gpy: add MDINT workaround
  dt-bindings: vendor-prefixes: add MaxLinear
  dt-bindings: net: phy: add MaxLinear GPY2xx bindings
  net: phy: mxl-gpy: disable interrupts on GPY215 by default

 .../bindings/net/maxlinear,gpy2xx.yaml        | 47 ++++++++++
 .../devicetree/bindings/vendor-prefixes.yaml  |  2 +
 drivers/net/phy/mxl-gpy.c                     | 88 +++++++++++++++++++
 3 files changed, 137 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/net/maxlinear,gpy2xx.yaml

Comments

Andrew Lunn Dec. 2, 2022, 6:23 p.m. UTC | #1
On Fri, Dec 02, 2022 at 04:12:01PM +0100, Michael Walle wrote:
> At least the GPY215B and GPY215C has a bug where it is still driving the
> interrupt line (MDINT) even after the interrupt status register is read
> and its bits are cleared. This will cause an interrupt storm.
> 
> Although the MDINT is multiplexed with a GPIO pin and theoretically we
> could switch the pinmux to GPIO input mode, this isn't possible because
> the access to this register will stall exactly as long as the interrupt
> line is asserted. We exploit this very fact and just read a random
> internal register in our interrupt handler. This way, it will be delayed
> until the external interrupt line is released and an interrupt storm is
> avoided.
> 
> The internal register access via the mailbox was deduced by looking at
> the downstream PHY API because the datasheet doesn't mention any of
> this.
> 
> Signed-off-by: Michael Walle <michael@walle.cc>
> ---
>  drivers/net/phy/mxl-gpy.c | 83 +++++++++++++++++++++++++++++++++++++++
>  1 file changed, 83 insertions(+)
> 
> diff --git a/drivers/net/phy/mxl-gpy.c b/drivers/net/phy/mxl-gpy.c
> index 0ff7ef076072..20e610dda891 100644
> --- a/drivers/net/phy/mxl-gpy.c
> +++ b/drivers/net/phy/mxl-gpy.c
> @@ -9,6 +9,7 @@
>  #include <linux/module.h>
>  #include <linux/bitfield.h>
>  #include <linux/hwmon.h>
> +#include <linux/mutex.h>
>  #include <linux/phy.h>
>  #include <linux/polynomial.h>
>  #include <linux/netdevice.h>
> @@ -81,6 +82,14 @@
>  #define VSPEC1_TEMP_STA	0x0E
>  #define VSPEC1_TEMP_STA_DATA	GENMASK(9, 0)
>  
> +/* Mailbox */
> +#define VSPEC1_MBOX_DATA	0x5
> +#define VSPEC1_MBOX_ADDRLO	0x6
> +#define VSPEC1_MBOX_CMD		0x7
> +#define VSPEC1_MBOX_CMD_ADDRHI	GENMASK(7, 0)
> +#define VSPEC1_MBOX_CMD_RD	(0 << 8)
> +#define VSPEC1_MBOX_CMD_READY	BIT(15)
> +
>  /* WoL */
>  #define VPSPEC2_WOL_CTL		0x0E06
>  #define VPSPEC2_WOL_AD01	0x0E08
> @@ -88,7 +97,15 @@
>  #define VPSPEC2_WOL_AD45	0x0E0A
>  #define WOL_EN			BIT(0)
>  
> +/* Internal registers, access via mbox */
> +#define REG_GPIO0_OUT		0xd3ce00
> +
>  struct gpy_priv {
> +	struct phy_device *phydev;
> +
> +	/* serialize mailbox acesses */
> +	struct mutex mbox_lock;
> +

>  static int gpy_probe(struct phy_device *phydev)
>  {
>  	struct device *dev = &phydev->mdio.dev;
> @@ -228,7 +286,9 @@ static int gpy_probe(struct phy_device *phydev)
>  	priv = devm_kzalloc(dev, sizeof(*priv), GFP_KERNEL);
>  	if (!priv)
>  		return -ENOMEM;
> +	priv->phydev = phydev;

I don't think you use this anywhere. Maybe in one of the following
patches?

	Andrew
Andrew Lunn Dec. 2, 2022, 6:42 p.m. UTC | #2
On Fri, Dec 02, 2022 at 04:12:04PM +0100, Michael Walle wrote:
> The interrupts on the GPY215B and GPY215C are broken and the only viable
> fix is to disable them altogether. There is still the possibilty to
> opt-in via the device tree.
> 
> Signed-off-by: Michael Walle <michael@walle.cc>
> ---
>  drivers/net/phy/mxl-gpy.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/drivers/net/phy/mxl-gpy.c b/drivers/net/phy/mxl-gpy.c
> index 20e610dda891..edb8cd8313b0 100644
> --- a/drivers/net/phy/mxl-gpy.c
> +++ b/drivers/net/phy/mxl-gpy.c
> @@ -12,6 +12,7 @@
>  #include <linux/mutex.h>
>  #include <linux/phy.h>
>  #include <linux/polynomial.h>
> +#include <linux/property.h>
>  #include <linux/netdevice.h>
>  
>  /* PHY ID */
> @@ -290,6 +291,10 @@ static int gpy_probe(struct phy_device *phydev)
>  	phydev->priv = priv;
>  	mutex_init(&priv->mbox_lock);
>  
> +	if (gpy_has_broken_mdint(phydev) &&
> +	    !device_property_present(dev, "maxlinear,use-broken-interrupts"))
> +		phydev->irq = PHY_POLL;
> +

I'm not sure of ordering here. It could be phydev->irq is set after
probe. The IRQ is requested as part of phy_connect_direct(), which is
much later.

I think a better place for this test is in gpy_config_intr(), return
-EOPNOTSUPP. phy_enable_interrupts() failing should then cause
phy_request_interrupt() to use polling.

	Andrew
Michael Walle Dec. 2, 2022, 10:53 p.m. UTC | #3
Am 2022-12-02 19:23, schrieb Andrew Lunn:
> On Fri, Dec 02, 2022 at 04:12:01PM +0100, Michael Walle wrote:
>> At least the GPY215B and GPY215C has a bug where it is still driving 
>> the
>> interrupt line (MDINT) even after the interrupt status register is 
>> read
>> and its bits are cleared. This will cause an interrupt storm.
>> 
>> Although the MDINT is multiplexed with a GPIO pin and theoretically we
>> could switch the pinmux to GPIO input mode, this isn't possible 
>> because
>> the access to this register will stall exactly as long as the 
>> interrupt
>> line is asserted. We exploit this very fact and just read a random
>> internal register in our interrupt handler. This way, it will be 
>> delayed
>> until the external interrupt line is released and an interrupt storm 
>> is
>> avoided.
>> 
>> The internal register access via the mailbox was deduced by looking at
>> the downstream PHY API because the datasheet doesn't mention any of
>> this.
>> 
>> Signed-off-by: Michael Walle <michael@walle.cc>
>> ---
>>  drivers/net/phy/mxl-gpy.c | 83 
>> +++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 83 insertions(+)
>> 
>> diff --git a/drivers/net/phy/mxl-gpy.c b/drivers/net/phy/mxl-gpy.c
>> index 0ff7ef076072..20e610dda891 100644
>> --- a/drivers/net/phy/mxl-gpy.c
>> +++ b/drivers/net/phy/mxl-gpy.c
>> @@ -9,6 +9,7 @@
>>  #include <linux/module.h>
>>  #include <linux/bitfield.h>
>>  #include <linux/hwmon.h>
>> +#include <linux/mutex.h>
>>  #include <linux/phy.h>
>>  #include <linux/polynomial.h>
>>  #include <linux/netdevice.h>
>> @@ -81,6 +82,14 @@
>>  #define VSPEC1_TEMP_STA	0x0E
>>  #define VSPEC1_TEMP_STA_DATA	GENMASK(9, 0)
>> 
>> +/* Mailbox */
>> +#define VSPEC1_MBOX_DATA	0x5
>> +#define VSPEC1_MBOX_ADDRLO	0x6
>> +#define VSPEC1_MBOX_CMD		0x7
>> +#define VSPEC1_MBOX_CMD_ADDRHI	GENMASK(7, 0)
>> +#define VSPEC1_MBOX_CMD_RD	(0 << 8)
>> +#define VSPEC1_MBOX_CMD_READY	BIT(15)
>> +
>>  /* WoL */
>>  #define VPSPEC2_WOL_CTL		0x0E06
>>  #define VPSPEC2_WOL_AD01	0x0E08
>> @@ -88,7 +97,15 @@
>>  #define VPSPEC2_WOL_AD45	0x0E0A
>>  #define WOL_EN			BIT(0)
>> 
>> +/* Internal registers, access via mbox */
>> +#define REG_GPIO0_OUT		0xd3ce00
>> +
>>  struct gpy_priv {
>> +	struct phy_device *phydev;
>> +
>> +	/* serialize mailbox acesses */
>> +	struct mutex mbox_lock;
>> +
> 
>>  static int gpy_probe(struct phy_device *phydev)
>>  {
>>  	struct device *dev = &phydev->mdio.dev;
>> @@ -228,7 +286,9 @@ static int gpy_probe(struct phy_device *phydev)
>>  	priv = devm_kzalloc(dev, sizeof(*priv), GFP_KERNEL);
>>  	if (!priv)
>>  		return -ENOMEM;
>> +	priv->phydev = phydev;
> 
> I don't think you use this anywhere. Maybe in one of the following
> patches?

Arg. Yes, it's an leftover from when I was using a workqueue to
reenable the interrupts again.

Any opinion whether this patch should be net or net-next?

-michael
Michael Walle Dec. 2, 2022, 11:09 p.m. UTC | #4
Am 2022-12-02 19:42, schrieb Andrew Lunn:
> On Fri, Dec 02, 2022 at 04:12:04PM +0100, Michael Walle wrote:
>> The interrupts on the GPY215B and GPY215C are broken and the only 
>> viable
>> fix is to disable them altogether. There is still the possibilty to
>> opt-in via the device tree.
>> 
>> Signed-off-by: Michael Walle <michael@walle.cc>
>> ---
>>  drivers/net/phy/mxl-gpy.c | 5 +++++
>>  1 file changed, 5 insertions(+)
>> 
>> diff --git a/drivers/net/phy/mxl-gpy.c b/drivers/net/phy/mxl-gpy.c
>> index 20e610dda891..edb8cd8313b0 100644
>> --- a/drivers/net/phy/mxl-gpy.c
>> +++ b/drivers/net/phy/mxl-gpy.c
>> @@ -12,6 +12,7 @@
>>  #include <linux/mutex.h>
>>  #include <linux/phy.h>
>>  #include <linux/polynomial.h>
>> +#include <linux/property.h>
>>  #include <linux/netdevice.h>
>> 
>>  /* PHY ID */
>> @@ -290,6 +291,10 @@ static int gpy_probe(struct phy_device *phydev)
>>  	phydev->priv = priv;
>>  	mutex_init(&priv->mbox_lock);
>> 
>> +	if (gpy_has_broken_mdint(phydev) &&
>> +	    !device_property_present(dev, 
>> "maxlinear,use-broken-interrupts"))
>> +		phydev->irq = PHY_POLL;
>> +
> 
> I'm not sure of ordering here. It could be phydev->irq is set after
> probe. The IRQ is requested as part of phy_connect_direct(), which is
> much later.

I've did it that way, because phy_probe() also sets phydev->irq = 
PHY_POLL
in some cases and the phy driver .probe() is called right after it.

> I think a better place for this test is in gpy_config_intr(), return
> -EOPNOTSUPP. phy_enable_interrupts() failing should then cause
> phy_request_interrupt() to use polling.

Which will then print a warning, which might be misleading.
Or we disable the warning if -EOPNOTSUPP is returned?

-michael
Andrew Lunn Dec. 3, 2022, 8:36 p.m. UTC | #5
> > > @@ -290,6 +291,10 @@ static int gpy_probe(struct phy_device *phydev)
> > >  	phydev->priv = priv;
> > >  	mutex_init(&priv->mbox_lock);
> > > 
> > > +	if (gpy_has_broken_mdint(phydev) &&
> > > +	    !device_property_present(dev,
> > > "maxlinear,use-broken-interrupts"))
> > > +		phydev->irq = PHY_POLL;
> > > +
> > 
> > I'm not sure of ordering here. It could be phydev->irq is set after
> > probe. The IRQ is requested as part of phy_connect_direct(), which is
> > much later.
> 
> I've did it that way, because phy_probe() also sets phydev->irq = PHY_POLL
> in some cases and the phy driver .probe() is called right after it.

Yes, it is a valid point to do this check, but on its own i don't
think it is sufficient.

> > I think a better place for this test is in gpy_config_intr(), return
> > -EOPNOTSUPP. phy_enable_interrupts() failing should then cause
> > phy_request_interrupt() to use polling.
> 
> Which will then print a warning, which might be misleading.
> Or we disable the warning if -EOPNOTSUPP is returned?

Disabling the warning is the right thing to do.

	  Andrew
Andrew Lunn Dec. 3, 2022, 8:41 p.m. UTC | #6
On Fri, Dec 02, 2022 at 04:12:00PM +0100, Michael Walle wrote:
> The GPY215 has a broken interrupt pin. This patch series tries to
> workaround that and because in general that is not possible, disables the
> interrupts by default and falls back to polling mode. There is an opt-in
> via the devicetree.
> 
> net vs net-next: I'm not sure. No one seems to have noticed it so far.

I guess you could fail to notice 2ms of interrupt storm, if it only
happens on link down. But we should probably fix it. So please post
the first patch to net with a Fixes: tag. The rest to net-next.

    Andrew
Michael Walle Dec. 16, 2022, 9:46 a.m. UTC | #7
Am 2022-12-03 21:36, schrieb Andrew Lunn:
>> > > @@ -290,6 +291,10 @@ static int gpy_probe(struct phy_device *phydev)
>> > >  	phydev->priv = priv;
>> > >  	mutex_init(&priv->mbox_lock);
>> > >
>> > > +	if (gpy_has_broken_mdint(phydev) &&
>> > > +	    !device_property_present(dev,
>> > > "maxlinear,use-broken-interrupts"))
>> > > +		phydev->irq = PHY_POLL;
>> > > +
>> >
>> > I'm not sure of ordering here. It could be phydev->irq is set after
>> > probe. The IRQ is requested as part of phy_connect_direct(), which is
>> > much later.
>> 
>> I've did it that way, because phy_probe() also sets phydev->irq = 
>> PHY_POLL
>> in some cases and the phy driver .probe() is called right after it.
> 
> Yes, it is a valid point to do this check, but on its own i don't
> think it is sufficient.

Care to elaborate a bit? E.g. what is the difference to the case
the phy would have an interrupt described but no .config_intr()
op.

>> > I think a better place for this test is in gpy_config_intr(), return
>> > -EOPNOTSUPP. phy_enable_interrupts() failing should then cause
>> > phy_request_interrupt() to use polling.
>> 
>> Which will then print a warning, which might be misleading.
>> Or we disable the warning if -EOPNOTSUPP is returned?
> 
> Disabling the warning is the right thing to do.

There is more to this. .config_intr() is also used in
phy_init_hw() and phy_drv_supports_irq(). The latter would
still return true in our case. I'm not sure that is correct.

After trying your suggestion, I'm still in favor of somehow
tell the phy core to force polling mode during probe() of the
driver. The same way it's done if there is no .config_intr().

It's not like we'd need change the mode after probe during
runtime. Also with your proposed changed the attachment print
is wrong/misleading as it still prints the original irq instead
of PHY_POLL.

-michael
Andrew Lunn Dec. 20, 2022, 1:33 p.m. UTC | #8
> > Yes, it is a valid point to do this check, but on its own i don't
> > think it is sufficient.
> 
> Care to elaborate a bit? E.g. what is the difference to the case
> the phy would have an interrupt described but no .config_intr()
> op.
> 
> > > > I think a better place for this test is in gpy_config_intr(), return
> > > > -EOPNOTSUPP. phy_enable_interrupts() failing should then cause
> > > > phy_request_interrupt() to use polling.
> > > 
> > > Which will then print a warning, which might be misleading.
> > > Or we disable the warning if -EOPNOTSUPP is returned?
> > 
> > Disabling the warning is the right thing to do.
> 
> There is more to this. .config_intr() is also used in
> phy_init_hw() and phy_drv_supports_irq(). The latter would
> still return true in our case. I'm not sure that is correct.
> 
> After trying your suggestion, I'm still in favor of somehow
> tell the phy core to force polling mode during probe() of the
> driver.

The problem is that the MAC can set the interrupt number after the PHY
probe has been called. e.g.

https://elixir.bootlin.com/linux/latest/source/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c#L524

The interrupt needs to be set by the time the PHY is connected to the
MAC, which is often in the MAC open method, much later than the PHY
probe.

     Andrew