Message ID | 20171210164752.GO10595@n2100.armlinux.org.uk |
---|---|
State | RFC, archived |
Delegated to: | David Miller |
Headers | show |
Series | [BUG] micrel phy suspend/resume | expand |
Hi Russell, On 12/10/2017 08:47 AM, Russell King - ARM Linux wrote: > Guys, > > I've just tripped over a bug with the Micrel PHY driver, but it > really isn't specific to the Micrel PHY driver. > > When we suspend, we suspend the PHY and then the MAC driver (eg, > on the ZII board): > > [ 198.822751] 400d0000.ethernet-1:00: bus : mdio_bus_suspend+0x0/0x34 > [ 198.822859] __mdiobus_read: 117 400d0000.ethernet-1 00 00 => 3100 > [ 198.822878] __mdiobus_write: 117 400d0000.ethernet-1 00 00 <= 3900 > ... > [ 198.826235] 400d0000.ethernet: bus : platform_pm_freeze+0x0/0x5c > [ 198.826354] __mdiobus_read: 117 400d0000.ethernet-1 00 1f => 9198 > [ 198.826374] __mdiobus_write: 117 400d0000.ethernet-1 00 1f <= 9198 > [ 198.826503] __mdiobus_write: 117 400d0000.ethernet-1 00 1b <= 0000 > [ 198.826699] __mdiobus_read: 117 400d0000.ethernet-1 00 1b => 0000 > > When we resume, the order is reversed: > > [ 198.848300] 400d0000.ethernet: bus : platform_pm_thaw+0x0/0x54 > [ 198.849024] __mdiobus_read: 117 400d0000.ethernet-1 00 1b => 0000 > [ 198.849120] __mdiobus_read: 117 400d0000.ethernet-1 00 1f => 9198 > [ 198.849141] __mdiobus_write: 117 400d0000.ethernet-1 00 1f <= 9198 > [ 198.849243] __mdiobus_write: 117 400d0000.ethernet-1 00 1b <= 0500 > [ 198.849401] __mdiobus_read: 117 400d0000.ethernet-1 00 00 => 3900 > [ 198.849419] __mdiobus_write: 117 400d0000.ethernet-1 00 00 <= 3100 > [ 198.849637] __mdiobus_read: 61 400d0000.ethernet-1 00 01 => 7849 > ... > [ 198.852677] 400d0000.ethernet-1:00: bus : mdio_bus_resume+0x0/0x34 > [ 198.852778] __mdiobus_read: 117 400d0000.ethernet-1 00 00 => 3100 > [ 198.852797] __mdiobus_write: 117 400d0000.ethernet-1 00 00 <= 3100 > > Now, the MAC driver calls phy_stop() and phy_start() from within its > own suspend/resume methods, and at this is done while the PHY is, > as far as the kernel PM code is concerned, suspended. > > However, phylib works around that by resuming the PHY itself when > phy_start() is called in this situation. That looks good, but it > really isn't in this case. Given the above sequence, we will be in > PHY_HALTED state. > > So, when phy_start() is called, the first thing it does is check the > state, and finds PHY_HALTED. It then tries to enable the PHY > interrupts. This cause the Micrel driver to write to the PHY to > enable interrupts - it reads 0x1b to ack any pre-existing interrupt, > modifies 0x1f to set the interrupt pin level, and then supposedly > writes 0x1b to enable interrupts. > > However, at this point, the PHY is still powered down, as we can see > in the following read of the BMCR - containing 0x3900. The write to > enable interrupts is ignored, and so interrupts remain disabled. > > The resume process continues, and the system resumes, but interrupts > on the PHY remain disabled, and the phylib state machine never > advances. You can do anything you like with cables etc, as far as > phylib is concerned, the link steadfastly remains "down". > > That's a bit of a problem if your platform is running root-NFS through > that network interface. > > It looks like some variants of the Micrel phy code work around this by > including in their resume path code to enable PHY interrupts > independently of phylib. > > phy_resume() can't be called inside the locked region in phy_start(), > where we enable interrupts, because genphy_resume() also wants to take > phydev->lock - and it wants to do that to safely read-modify-write the > BMCR register. This, I feel, comes back to an abuse of the phy state > machine lock to also protect atomic bus operations. > > If we move over to using the bus lock to protect bus atomic operations > (as I believe we should) then phy_resume() can be called while holding > phydev->lock from within bits of the code that are protecting themselves > from concurrency with the phylib state machine. It also means that we > can get rid of some of these boolean variables, and most importantly > in this particular case, call phy_resume() before we enable interrupts. > > With that arrangement, things look a lot better: > > [ 74.545584] 400d0000.ethernet: bus : platform_pm_thaw+0x0/0x54 > [ 74.546010] __mdiobus_read: 117 400d0000.ethernet-1 00 00 => 3900 > [ 74.546029] __mdiobus_write: 117 400d0000.ethernet-1 00 00 <= 3100 > [ 74.546264] __mdiobus_read: 117 400d0000.ethernet-1 00 1b => 0000 > [ 74.546354] __mdiobus_read: 117 400d0000.ethernet-1 00 1f => 8100 > [ 74.546373] __mdiobus_write: 117 400d0000.ethernet-1 00 1f <= 8100 > [ 74.546651] __mdiobus_write: 117 400d0000.ethernet-1 00 1b <= 0500 > > The patch for this is slightly larger than it needs to be because I've > converted more places that do read-modify-write to the new xxx_modify() > accessor, and it also ensures that phy_resume() is consistently called > with the phy state machine lock held. It also means we can get rid of > the interrupt enable hack for some micrel PHYs, which I suspect comes > from this same root cause. Since this is a bug fix, I would really rather have the ability to target the "net" tree with a simpler fix that is contained within drivers/net/phy/phy.c and which does not require the use of phy_modify(). Thanks for doing all the detective work!
On Sun, Dec 10, 2017 at 03:50:28PM -0800, Florian Fainelli wrote: > Hi Russell, > > On 12/10/2017 08:47 AM, Russell King - ARM Linux wrote: > > Guys, > > > > I've just tripped over a bug with the Micrel PHY driver, but it > > really isn't specific to the Micrel PHY driver. > > > > When we suspend, we suspend the PHY and then the MAC driver (eg, > > on the ZII board): > > > > [ 198.822751] 400d0000.ethernet-1:00: bus : mdio_bus_suspend+0x0/0x34 > > [ 198.822859] __mdiobus_read: 117 400d0000.ethernet-1 00 00 => 3100 > > [ 198.822878] __mdiobus_write: 117 400d0000.ethernet-1 00 00 <= 3900 > > ... > > [ 198.826235] 400d0000.ethernet: bus : platform_pm_freeze+0x0/0x5c > > [ 198.826354] __mdiobus_read: 117 400d0000.ethernet-1 00 1f => 9198 > > [ 198.826374] __mdiobus_write: 117 400d0000.ethernet-1 00 1f <= 9198 > > [ 198.826503] __mdiobus_write: 117 400d0000.ethernet-1 00 1b <= 0000 > > [ 198.826699] __mdiobus_read: 117 400d0000.ethernet-1 00 1b => 0000 > > > > When we resume, the order is reversed: > > > > [ 198.848300] 400d0000.ethernet: bus : platform_pm_thaw+0x0/0x54 > > [ 198.849024] __mdiobus_read: 117 400d0000.ethernet-1 00 1b => 0000 > > [ 198.849120] __mdiobus_read: 117 400d0000.ethernet-1 00 1f => 9198 > > [ 198.849141] __mdiobus_write: 117 400d0000.ethernet-1 00 1f <= 9198 > > [ 198.849243] __mdiobus_write: 117 400d0000.ethernet-1 00 1b <= 0500 > > [ 198.849401] __mdiobus_read: 117 400d0000.ethernet-1 00 00 => 3900 > > [ 198.849419] __mdiobus_write: 117 400d0000.ethernet-1 00 00 <= 3100 > > [ 198.849637] __mdiobus_read: 61 400d0000.ethernet-1 00 01 => 7849 > > ... > > [ 198.852677] 400d0000.ethernet-1:00: bus : mdio_bus_resume+0x0/0x34 > > [ 198.852778] __mdiobus_read: 117 400d0000.ethernet-1 00 00 => 3100 > > [ 198.852797] __mdiobus_write: 117 400d0000.ethernet-1 00 00 <= 3100 > > > > Now, the MAC driver calls phy_stop() and phy_start() from within its > > own suspend/resume methods, and at this is done while the PHY is, > > as far as the kernel PM code is concerned, suspended. > > > > However, phylib works around that by resuming the PHY itself when > > phy_start() is called in this situation. That looks good, but it > > really isn't in this case. Given the above sequence, we will be in > > PHY_HALTED state. > > > > So, when phy_start() is called, the first thing it does is check the > > state, and finds PHY_HALTED. It then tries to enable the PHY > > interrupts. This cause the Micrel driver to write to the PHY to > > enable interrupts - it reads 0x1b to ack any pre-existing interrupt, > > modifies 0x1f to set the interrupt pin level, and then supposedly > > writes 0x1b to enable interrupts. > > > > However, at this point, the PHY is still powered down, as we can see > > in the following read of the BMCR - containing 0x3900. The write to > > enable interrupts is ignored, and so interrupts remain disabled. > > > > The resume process continues, and the system resumes, but interrupts > > on the PHY remain disabled, and the phylib state machine never > > advances. You can do anything you like with cables etc, as far as > > phylib is concerned, the link steadfastly remains "down". > > > > That's a bit of a problem if your platform is running root-NFS through > > that network interface. > > > > It looks like some variants of the Micrel phy code work around this by > > including in their resume path code to enable PHY interrupts > > independently of phylib. > > > > phy_resume() can't be called inside the locked region in phy_start(), > > where we enable interrupts, because genphy_resume() also wants to take > > phydev->lock - and it wants to do that to safely read-modify-write the > > BMCR register. This, I feel, comes back to an abuse of the phy state > > machine lock to also protect atomic bus operations. > > > > If we move over to using the bus lock to protect bus atomic operations > > (as I believe we should) then phy_resume() can be called while holding > > phydev->lock from within bits of the code that are protecting themselves > > from concurrency with the phylib state machine. It also means that we > > can get rid of some of these boolean variables, and most importantly > > in this particular case, call phy_resume() before we enable interrupts. > > > > With that arrangement, things look a lot better: > > > > [ 74.545584] 400d0000.ethernet: bus : platform_pm_thaw+0x0/0x54 > > [ 74.546010] __mdiobus_read: 117 400d0000.ethernet-1 00 00 => 3900 > > [ 74.546029] __mdiobus_write: 117 400d0000.ethernet-1 00 00 <= 3100 > > [ 74.546264] __mdiobus_read: 117 400d0000.ethernet-1 00 1b => 0000 > > [ 74.546354] __mdiobus_read: 117 400d0000.ethernet-1 00 1f => 8100 > > [ 74.546373] __mdiobus_write: 117 400d0000.ethernet-1 00 1f <= 8100 > > [ 74.546651] __mdiobus_write: 117 400d0000.ethernet-1 00 1b <= 0500 > > > > The patch for this is slightly larger than it needs to be because I've > > converted more places that do read-modify-write to the new xxx_modify() > > accessor, and it also ensures that phy_resume() is consistently called > > with the phy state machine lock held. It also means we can get rid of > > the interrupt enable hack for some micrel PHYs, which I suspect comes > > from this same root cause. > > Since this is a bug fix, I would really rather have the ability to > target the "net" tree with a simpler fix that is contained within > drivers/net/phy/phy.c and which does not require the use of phy_modify(). Not really that easy. The simple solution would be to open-code the phy_modify() inside genphy_resume() in phy_device.c, and move the phy_resume() call in phy.c. However, that leaves at803x taking phydev->lock in its resume path, which will immediately deadlock. We couldn't just delete that lock without ensuring that phy_resume() is always called with phydev->lock held to preserve the existing locking. I wouldn't want to simply delete that lock without some testing of that driver in case there's a good reason for it being there. So, to preserve that would mean making sure the two calls to phy_resume() in phy_device.c are also under the lock, replacing the at803x.c read-modify-write sequences with open-coded versions, or convincing ourselves that the locks in at803x_suspend() and at803x_resume() are not required. My feeling is that the safest and minimal change for -rc is to add the phy_modify() helper, switch both genphy_resume() and at803x_resume() to it and delete the locking there, and ensure that phy_resume() is always called with phydev->lock taken for a consistent locking strategy there. We can prove the phy_modify() helper works elsewhere, which means the risk is reduced to the at803x_resume() conversion. The unfortunate issue that gives us is that the calling convention of "suspend" vs "resume" is different - resume ends up with phydev->lock always held but suspend doesn't, and that will be confusing for phy driver developers, and really doesn't help from the review point of view.
diff --git a/drivers/net/phy/at803x.c b/drivers/net/phy/at803x.c index 4c75cfcdeaec..a20d1f5e4dbf 100644 --- a/drivers/net/phy/at803x.c +++ b/drivers/net/phy/at803x.c @@ -258,38 +258,22 @@ static int at803x_suspend(struct phy_device *phydev) int value; int wol_enabled; - mutex_lock(&phydev->lock); - value = phy_read(phydev, AT803X_INTR_ENABLE); wol_enabled = value & AT803X_INTR_ENABLE_WOL; - value = phy_read(phydev, MII_BMCR); - if (wol_enabled) - value |= BMCR_ISOLATE; + value = BMCR_ISOLATE; else - value |= BMCR_PDOWN; - - phy_write(phydev, MII_BMCR, value); + value = BMCR_PDOWN; - mutex_unlock(&phydev->lock); + phy_modify(phydev, MII_BMCR, value, value); return 0; } static int at803x_resume(struct phy_device *phydev) { - int value; - - mutex_lock(&phydev->lock); - - value = phy_read(phydev, MII_BMCR); - value &= ~(BMCR_PDOWN | BMCR_ISOLATE); - phy_write(phydev, MII_BMCR, value); - - mutex_unlock(&phydev->lock); - - return 0; + return phy_modify(phydev, MII_BMCR, BMCR_PDOWN | BMCR_ISOLATE, 0); } static int at803x_probe(struct phy_device *phydev) diff --git a/drivers/net/phy/mdio_bus.c b/drivers/net/phy/mdio_bus.c index 1bf3adcdcbac..d5adb00eafdf 100644 --- a/drivers/net/phy/mdio_bus.c +++ b/drivers/net/phy/mdio_bus.c @@ -651,6 +651,38 @@ int mdiobus_write(struct mii_bus *bus, int addr, u32 regnum, u16 val) EXPORT_SYMBOL(mdiobus_write); /** + * mdiobus_modify - Convenience function for writing a given MII mgmt register + * @bus: the mii_bus struct + * @addr: the phy address + * @regnum: register number to write + * @mask: bits to mask off from the register @regnum + * @val: new value of bits set in mask to write to @regnum + * + * NOTE: MUST NOT be called from interrupt context, + * because the bus read/write functions may wait for an interrupt + * to conclude the operation. + */ +int mdiobus_modify(struct mii_bus *bus, int addr, u32 regnum, u16 mask, u16 val) +{ + int retval; + int err; + + BUG_ON(in_interrupt()); + + mutex_lock(&bus->mdio_lock); + err = retval = __mdiobus_read(bus, addr, regnum); + if (err >= 0) { + retval &= ~mask; + retval |= val & mask; + err = __mdiobus_write(bus, addr, regnum, retval); + } + mutex_unlock(&bus->mdio_lock); + + return err; +} +EXPORT_SYMBOL(mdiobus_modify); + +/** * mdio_bus_match - determine if given MDIO driver supports the given * MDIO device * @dev: target MDIO device diff --git a/drivers/net/phy/micrel.c b/drivers/net/phy/micrel.c index fdb43dd9b5cd..9e9438caa2b7 100644 --- a/drivers/net/phy/micrel.c +++ b/drivers/net/phy/micrel.c @@ -711,22 +711,9 @@ static int kszphy_suspend(struct phy_device *phydev) static int kszphy_resume(struct phy_device *phydev) { - int ret; - genphy_resume(phydev); - ret = kszphy_config_reset(phydev); - if (ret) - return ret; - - /* Enable PHY Interrupts */ - if (phy_interrupt_is_valid(phydev)) { - phydev->interrupts = PHY_INTERRUPT_ENABLED; - if (phydev->drv->config_intr) - phydev->drv->config_intr(phydev); - } - - return 0; + return kszphy_config_reset(phydev); } static int kszphy_probe(struct phy_device *phydev) diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c index f2a83ab00e71..874740957606 100644 --- a/drivers/net/phy/phy.c +++ b/drivers/net/phy/phy.c @@ -844,7 +844,6 @@ EXPORT_SYMBOL(phy_stop); */ void phy_start(struct phy_device *phydev) { - bool do_resume = false; int err = 0; mutex_lock(&phydev->lock); @@ -857,6 +856,9 @@ void phy_start(struct phy_device *phydev) phydev->state = PHY_UP; break; case PHY_HALTED: + /* if phy was suspended, bring the physical link up again */ + phy_resume(phydev); + /* make sure interrupts are re-enabled for the PHY */ if (phydev->irq != PHY_POLL) { err = phy_enable_interrupts(phydev); @@ -865,17 +867,12 @@ void phy_start(struct phy_device *phydev) } phydev->state = PHY_RESUMING; - do_resume = true; break; default: break; } mutex_unlock(&phydev->lock); - /* if phy was suspended, bring the physical link up again */ - if (do_resume) - phy_resume(phydev); - phy_trigger_machine(phydev, true); } EXPORT_SYMBOL(phy_start); diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c index 67f25ac29025..d0d98e6f6406 100644 --- a/drivers/net/phy/phy_device.c +++ b/drivers/net/phy/phy_device.c @@ -135,7 +135,9 @@ static int mdio_bus_phy_resume(struct device *dev) if (!mdio_bus_phy_may_suspend(phydev)) goto no_resume; + mutex_lock(&phydev->lock); ret = phy_resume(phydev); + mutex_unlock(&phydev->lock); if (ret < 0) return ret; @@ -1026,7 +1028,9 @@ int phy_attach_direct(struct net_device *dev, struct phy_device *phydev, if (err) goto error; + mutex_lock(&phydev->lock); phy_resume(phydev); + mutex_unlock(&phydev->lock); phy_led_triggers_register(phydev); return err; @@ -1157,6 +1161,8 @@ int phy_resume(struct phy_device *phydev) struct phy_driver *phydrv = to_phy_driver(phydev->mdio.dev.driver); int ret = 0; + WARN_ON(!mutex_is_locked(&phydev->lock)); + if (phydev->drv && phydrv->resume) ret = phydrv->resume(phydev); @@ -1322,9 +1328,8 @@ static int genphy_config_eee_advert(struct phy_device *phydev) */ int genphy_setup_forced(struct phy_device *phydev) { - int ctl = phy_read(phydev, MII_BMCR); + u16 ctl = 0; - ctl &= BMCR_LOOPBACK | BMCR_ISOLATE | BMCR_PDOWN; phydev->pause = 0; phydev->asym_pause = 0; @@ -1336,7 +1341,10 @@ int genphy_setup_forced(struct phy_device *phydev) if (DUPLEX_FULL == phydev->duplex) ctl |= BMCR_FULLDPLX; - return phy_write(phydev, MII_BMCR, ctl); + return phy_modify(phydev, MII_BMCR, + BMCR_LOOPBACK | BMCR_ISOLATE | BMCR_PDOWN | + BMCR_SPEED1000 | BMCR_SPEED100 | BMCR_FULLDPLX, + ctl); } EXPORT_SYMBOL(genphy_setup_forced); @@ -1346,17 +1354,10 @@ EXPORT_SYMBOL(genphy_setup_forced); */ int genphy_restart_aneg(struct phy_device *phydev) { - int ctl = phy_read(phydev, MII_BMCR); - - if (ctl < 0) - return ctl; - - ctl |= BMCR_ANENABLE | BMCR_ANRESTART; - /* Don't isolate the PHY if we're negotiating */ - ctl &= ~BMCR_ISOLATE; - - return phy_write(phydev, MII_BMCR, ctl); + return phy_modify(phydev, MII_BMCR, + BMCR_ANENABLE | BMCR_ANRESTART | BMCR_ISOLATE, + BMCR_ANENABLE | BMCR_ANRESTART); } EXPORT_SYMBOL(genphy_restart_aneg); @@ -1622,48 +1623,20 @@ EXPORT_SYMBOL(genphy_config_init); int genphy_suspend(struct phy_device *phydev) { - int value; - - mutex_lock(&phydev->lock); - - value = phy_read(phydev, MII_BMCR); - phy_write(phydev, MII_BMCR, value | BMCR_PDOWN); - - mutex_unlock(&phydev->lock); - - return 0; + return phy_modify(phydev, MII_BMCR, BMCR_PDOWN, BMCR_PDOWN); } EXPORT_SYMBOL(genphy_suspend); int genphy_resume(struct phy_device *phydev) { - int value; - - mutex_lock(&phydev->lock); - - value = phy_read(phydev, MII_BMCR); - phy_write(phydev, MII_BMCR, value & ~BMCR_PDOWN); - - mutex_unlock(&phydev->lock); - - return 0; + return phy_modify(phydev, MII_BMCR, BMCR_PDOWN, 0); } EXPORT_SYMBOL(genphy_resume); int genphy_loopback(struct phy_device *phydev, bool enable) { - int value; - - value = phy_read(phydev, MII_BMCR); - if (value < 0) - return value; - - if (enable) - value |= BMCR_LOOPBACK; - else - value &= ~BMCR_LOOPBACK; - - return phy_write(phydev, MII_BMCR, value); + return phy_modify(phydev, MII_BMCR, BMCR_LOOPBACK, + enable ? BMCR_LOOPBACK : 0); } EXPORT_SYMBOL(genphy_loopback); diff --git a/include/linux/mdio.h b/include/linux/mdio.h index 4be30adc033b..8e69211685be 100644 --- a/include/linux/mdio.h +++ b/include/linux/mdio.h @@ -264,6 +264,7 @@ int mdiobus_read(struct mii_bus *bus, int addr, u32 regnum); int mdiobus_read_nested(struct mii_bus *bus, int addr, u32 regnum); int mdiobus_write(struct mii_bus *bus, int addr, u32 regnum, u16 val); int mdiobus_write_nested(struct mii_bus *bus, int addr, u32 regnum, u16 val); +int mdiobus_modify(struct mii_bus *bus, int addr, u32 regnum, u16 mask, u16 val); int mdiobus_register_device(struct mdio_device *mdiodev); int mdiobus_unregister_device(struct mdio_device *mdiodev); diff --git a/include/linux/phy.h b/include/linux/phy.h index 03a259f128f9..afe5d963a98c 100644 --- a/include/linux/phy.h +++ b/include/linux/phy.h @@ -766,6 +766,24 @@ static inline int __phy_write(struct phy_device *phydev, u32 regnum, u16 val) } /** + * phy_modify - Convenience function for modifying a given PHY register + * @phydev: the phy_device struct + * @regnum: register number to write + * @mask: bits to mask off from the register @regnum + * @val: new value of bits set in mask to write to @regnum + * + * NOTE: MUST NOT be called from interrupt context, + * because the bus read/write functions may wait for an interrupt + * to conclude the operation. + */ +static inline int phy_modify(struct phy_device *phydev, u32 regnum, u16 mask, + u16 val) +{ + return mdiobus_modify(phydev->mdio.bus, phydev->mdio.addr, regnum, + mask, val); +} + +/** * phy_interrupt_is_valid - Convenience function for testing a given PHY irq * @phydev: the phy_device struct *