Message ID | 1559844377-17188-1-git-send-email-hancock@sedsystems.ca |
---|---|
State | Superseded |
Delegated to: | David Miller |
Headers | show |
Series | [net-next] net: sfp: Stop SFP polling and interrupt handling during shutdown | expand |
On Thu, Jun 06, 2019 at 12:06:17PM -0600, Robert Hancock wrote: > SFP device polling can cause problems during the shutdown process if the > parent devices of the network controller have been shut down already. > This problem was seen on the iMX6 platform with PCIe devices, where > accessing the device after the bus is shut down causes a hang. > > Stop all delayed work in the SFP driver during the shutdown process, and > set a flag which causes any further state checks or state machine events > (possibly triggered by previous GPIO IRQs) to be skipped. > > Signed-off-by: Robert Hancock <hancock@sedsystems.ca> > --- > > This is an updated version of a previous patch "net: sfp: Stop SFP polling > during shutdown" with the addition of stopping handling of GPIO-triggered > interrupts as well, as pointed out by Russell King. > > drivers/net/phy/sfp.c | 27 +++++++++++++++++++++++++++ > 1 file changed, 27 insertions(+) > > diff --git a/drivers/net/phy/sfp.c b/drivers/net/phy/sfp.c > index 554acc8..5fdf573 100644 > --- a/drivers/net/phy/sfp.c > +++ b/drivers/net/phy/sfp.c > @@ -191,6 +191,7 @@ struct sfp { > struct delayed_work poll; > struct delayed_work timeout; > struct mutex sm_mutex; > + bool shutdown; > unsigned char sm_mod_state; > unsigned char sm_dev_state; > unsigned short sm_state; > @@ -1466,6 +1467,11 @@ static void sfp_sm_mod_remove(struct sfp *sfp) > static void sfp_sm_event(struct sfp *sfp, unsigned int event) > { > mutex_lock(&sfp->sm_mutex); > + if (unlikely(sfp->shutdown)) { > + /* Do not handle any more state machine events. */ > + mutex_unlock(&sfp->sm_mutex); > + return; > + } > > dev_dbg(sfp->dev, "SM: enter %s:%s:%s event %s\n", > mod_state_to_str(sfp->sm_mod_state), > @@ -1704,6 +1710,13 @@ static void sfp_check_state(struct sfp *sfp) > { > unsigned int state, i, changed; > > + mutex_lock(&sfp->sm_mutex); > + if (unlikely(sfp->shutdown)) { > + /* No more state checks */ > + mutex_unlock(&sfp->sm_mutex); > + return; > + } > + I don't think you need to add the mutex locking - just check for sfp->shutdown and be done with it... > +static void sfp_shutdown(struct platform_device *pdev) > +{ > + struct sfp *sfp = platform_get_drvdata(pdev); > + > + mutex_lock(&sfp->sm_mutex); > + sfp->shutdown = true; > + mutex_unlock(&sfp->sm_mutex); > + > + cancel_delayed_work_sync(&sfp->poll); > + cancel_delayed_work_sync(&sfp->timeout); Since the work cancellation will ensure that the works are not running at the point they return, and should they then run again, they'll hit the sfp->shutdown condition. > +} > + > static struct platform_driver sfp_driver = { > .probe = sfp_probe, > .remove = sfp_remove, > + .shutdown = sfp_shutdown, > .driver = { > .name = "sfp", > .of_match_table = sfp_of_match, > -- > 1.8.3.1 > >
On 2019-06-06 12:09 p.m., Russell King - ARM Linux admin wrote: >> @@ -1466,6 +1467,11 @@ static void sfp_sm_mod_remove(struct sfp *sfp) >> static void sfp_sm_event(struct sfp *sfp, unsigned int event) >> { >> mutex_lock(&sfp->sm_mutex); >> + if (unlikely(sfp->shutdown)) { >> + /* Do not handle any more state machine events. */ >> + mutex_unlock(&sfp->sm_mutex); >> + return; >> + } >> >> dev_dbg(sfp->dev, "SM: enter %s:%s:%s event %s\n", >> mod_state_to_str(sfp->sm_mod_state), >> @@ -1704,6 +1710,13 @@ static void sfp_check_state(struct sfp *sfp) >> { >> unsigned int state, i, changed; >> >> + mutex_lock(&sfp->sm_mutex); >> + if (unlikely(sfp->shutdown)) { >> + /* No more state checks */ >> + mutex_unlock(&sfp->sm_mutex); >> + return; >> + } >> + > > I don't think you need to add the mutex locking - just check for > sfp->shutdown and be done with it... The idea there was to deal with the case where GPIO interrupts were previously raised before shutdown and not yet handled by the threaded interrupt handler by the time shutdown is called. After shutdown on the SFP completes, the bus the GPIO stuff is on could potentially be shut down at any moment, so we really don't want to be digging into the GPIO states after that. Locking the mutex there ensures that we don't read a stale value for the shutdown flag in the interrupt handler, since AFAIK there's no other synchronization around that value. It may also be helpful that the lock is now held for the subsequent code in sfp_check_state that's comparing the previous and new states - it seems like you could otherwise run into trouble if that function was being concurrently called from the polling thread and the interrupt handler (for example if you had an SFP where some GPIOs supported interrupts and some didn't). > >> +static void sfp_shutdown(struct platform_device *pdev) >> +{ >> + struct sfp *sfp = platform_get_drvdata(pdev); >> + >> + mutex_lock(&sfp->sm_mutex); >> + sfp->shutdown = true; >> + mutex_unlock(&sfp->sm_mutex); >> + >> + cancel_delayed_work_sync(&sfp->poll); >> + cancel_delayed_work_sync(&sfp->timeout); > > Since the work cancellation will ensure that the works are not running > at the point they return, and should they then run again, they'll hit > the sfp->shutdown condition. > >> +} >> + >> static struct platform_driver sfp_driver = { >> .probe = sfp_probe, >> .remove = sfp_remove, >> + .shutdown = sfp_shutdown, >> .driver = { >> .name = "sfp", >> .of_match_table = sfp_of_match, >> -- >> 1.8.3.1 >> >> >
On 6/6/2019 1:57 PM, Robert Hancock wrote: > On 2019-06-06 12:09 p.m., Russell King - ARM Linux admin wrote: >>> @@ -1466,6 +1467,11 @@ static void sfp_sm_mod_remove(struct sfp *sfp) >>> static void sfp_sm_event(struct sfp *sfp, unsigned int event) >>> { >>> mutex_lock(&sfp->sm_mutex); >>> + if (unlikely(sfp->shutdown)) { >>> + /* Do not handle any more state machine events. */ >>> + mutex_unlock(&sfp->sm_mutex); >>> + return; >>> + } >>> >>> dev_dbg(sfp->dev, "SM: enter %s:%s:%s event %s\n", >>> mod_state_to_str(sfp->sm_mod_state), >>> @@ -1704,6 +1710,13 @@ static void sfp_check_state(struct sfp *sfp) >>> { >>> unsigned int state, i, changed; >>> >>> + mutex_lock(&sfp->sm_mutex); >>> + if (unlikely(sfp->shutdown)) { >>> + /* No more state checks */ >>> + mutex_unlock(&sfp->sm_mutex); >>> + return; >>> + } >>> + >> >> I don't think you need to add the mutex locking - just check for >> sfp->shutdown and be done with it... > > The idea there was to deal with the case where GPIO interrupts were > previously raised before shutdown and not yet handled by the threaded > interrupt handler by the time shutdown is called. After shutdown on the > SFP completes, the bus the GPIO stuff is on could potentially be shut > down at any moment, so we really don't want to be digging into the GPIO > states after that. Locking the mutex there ensures that we don't read a > stale value for the shutdown flag in the interrupt handler, since AFAIK > there's no other synchronization around that value. > > It may also be helpful that the lock is now held for the subsequent code > in sfp_check_state that's comparing the previous and new states - it > seems like you could otherwise run into trouble if that function was > being concurrently called from the polling thread and the interrupt > handler (for example if you had an SFP where some GPIOs supported > interrupts and some didn't). Would not it be sufficient to call disable_irq() or devm_free_irq() (to match the devm_request_threaded_irq call) in order to achieve what you want here?
On Thu, Jun 06, 2019 at 02:57:22PM -0600, Robert Hancock wrote: > It may also be helpful that the lock is now held for the subsequent code > in sfp_check_state that's comparing the previous and new states - it > seems like you could otherwise run into trouble if that function was > being concurrently called from the polling thread and the interrupt > handler (for example if you had an SFP where some GPIOs supported > interrupts and some didn't). That's a good point, one that we should address separately. Rather than re-using the existing mutex (which would be difficult to hold across calling the state machine due to locking inversion) how about this: drivers/net/phy/sfp.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/drivers/net/phy/sfp.c b/drivers/net/phy/sfp.c index 8a21294d1ce8..5ff427dcbb31 100644 --- a/drivers/net/phy/sfp.c +++ b/drivers/net/phy/sfp.c @@ -190,6 +190,7 @@ struct sfp { struct delayed_work poll; struct delayed_work timeout; struct mutex sm_mutex; + struct mutex st_mutex; unsigned char sm_mod_state; unsigned char sm_dev_state; unsigned short sm_state; @@ -1976,6 +1977,7 @@ static void sfp_check_state(struct sfp *sfp) { unsigned int state, i, changed; + mutex_lock(&sfp->st_mutex); state = sfp_get_state(sfp); changed = state ^ sfp->state; changed &= SFP_F_PRESENT | SFP_F_LOS | SFP_F_TX_FAULT; @@ -2001,6 +2003,7 @@ static void sfp_check_state(struct sfp *sfp) sfp_sm_event(sfp, state & SFP_F_LOS ? SFP_E_LOS_HIGH : SFP_E_LOS_LOW); rtnl_unlock(); + mutex_unlock(&sfp->st_mutex); } static irqreturn_t sfp_irq(int irq, void *data) @@ -2031,6 +2034,7 @@ static struct sfp *sfp_alloc(struct device *dev) sfp->dev = dev; mutex_init(&sfp->sm_mutex); + mutex_init(&sfp->st_mutex); INIT_DELAYED_WORK(&sfp->poll, sfp_poll); INIT_DELAYED_WORK(&sfp->timeout, sfp_timeout);
On Thu, Jun 06, 2019 at 02:57:22PM -0600, Robert Hancock wrote: > The idea there was to deal with the case where GPIO interrupts were > previously raised before shutdown and not yet handled by the threaded > interrupt handler by the time shutdown is called. After shutdown on the > SFP completes, the bus the GPIO stuff is on could potentially be shut > down at any moment, so we really don't want to be digging into the GPIO > states after that. Locking the mutex there ensures that we don't read a > stale value for the shutdown flag in the interrupt handler, since AFAIK > there's no other synchronization around that value. There are two cases: 1) The interrupt is raised just as sfp_shutdown() is called but before the mutex is taken. We will get the full processing in this case. 2) The interrupt is raised during the mutex-locked bit of sfp_shutdown() or after the mutex in sfp_shutdown() is released. We will get the abbreviated processing. This means that the mutex doesn't provide any protection against full interrupt processing if it occurs just prior to or during the initial execution of sfp_shutdown(). All that we need to ensure is that the state of sfp->shutdown is visible by the time sfp_shutdown() returns, and that none of the interrupt and worker functions are executing. We have the worker functions covered by the synchronous cancelling of them, but not the interrupts, and as Florian points out, it's probably better to disable the interrupts, and again, that can be done synchronously to ensure that the handlers are not running. If the workers and interrupt handlers are synchronously disabled, we can be sure by the end of sfp_shutdown() that none of those paths are executing, so the next time something attempts to trigger them, they will see sfp->shutdown is set. I'm not convinced that we even need sfp->shutdown if we have cancelled the workers and disabled the interrupts.
On 2019-06-07 4:42 a.m., Russell King - ARM Linux admin wrote: > On Thu, Jun 06, 2019 at 02:57:22PM -0600, Robert Hancock wrote: >> The idea there was to deal with the case where GPIO interrupts were >> previously raised before shutdown and not yet handled by the threaded >> interrupt handler by the time shutdown is called. After shutdown on the >> SFP completes, the bus the GPIO stuff is on could potentially be shut >> down at any moment, so we really don't want to be digging into the GPIO >> states after that. Locking the mutex there ensures that we don't read a >> stale value for the shutdown flag in the interrupt handler, since AFAIK >> there's no other synchronization around that value. > > There are two cases: > > 1) The interrupt is raised just as sfp_shutdown() is called but before > the mutex is taken. We will get the full processing in this case. > > 2) The interrupt is raised during the mutex-locked bit of sfp_shutdown() > or after the mutex in sfp_shutdown() is released. We will get the > abbreviated processing. > > This means that the mutex doesn't provide any protection against full > interrupt processing if it occurs just prior to or during the initial > execution of sfp_shutdown(). > > All that we need to ensure is that the state of sfp->shutdown is > visible by the time sfp_shutdown() returns, and that none of the > interrupt and worker functions are executing. We have the worker > functions covered by the synchronous cancelling of them, but not the > interrupts, and as Florian points out, it's probably better to disable > the interrupts, and again, that can be done synchronously to ensure > that the handlers are not running. > > If the workers and interrupt handlers are synchronously disabled, we > can be sure by the end of sfp_shutdown() that none of those paths are > executing, so the next time something attempts to trigger them, they > will see sfp->shutdown is set. > > I'm not convinced that we even need sfp->shutdown if we have cancelled > the workers and disabled the interrupts. I think you're right that if we just free the interrupts we avoid the need for that flag, as that should ensure that the interrupt handlers are no longer running or pending. Will resubmit with that approach and also add a patch to add another mutex to the state checking as you suggested.
diff --git a/drivers/net/phy/sfp.c b/drivers/net/phy/sfp.c index 554acc8..5fdf573 100644 --- a/drivers/net/phy/sfp.c +++ b/drivers/net/phy/sfp.c @@ -191,6 +191,7 @@ struct sfp { struct delayed_work poll; struct delayed_work timeout; struct mutex sm_mutex; + bool shutdown; unsigned char sm_mod_state; unsigned char sm_dev_state; unsigned short sm_state; @@ -1466,6 +1467,11 @@ static void sfp_sm_mod_remove(struct sfp *sfp) static void sfp_sm_event(struct sfp *sfp, unsigned int event) { mutex_lock(&sfp->sm_mutex); + if (unlikely(sfp->shutdown)) { + /* Do not handle any more state machine events. */ + mutex_unlock(&sfp->sm_mutex); + return; + } dev_dbg(sfp->dev, "SM: enter %s:%s:%s event %s\n", mod_state_to_str(sfp->sm_mod_state), @@ -1704,6 +1710,13 @@ static void sfp_check_state(struct sfp *sfp) { unsigned int state, i, changed; + mutex_lock(&sfp->sm_mutex); + if (unlikely(sfp->shutdown)) { + /* No more state checks */ + mutex_unlock(&sfp->sm_mutex); + return; + } + state = sfp_get_state(sfp); changed = state ^ sfp->state; changed &= SFP_F_PRESENT | SFP_F_LOS | SFP_F_TX_FAULT; @@ -1715,6 +1728,7 @@ static void sfp_check_state(struct sfp *sfp) state |= sfp->state & (SFP_F_TX_DISABLE | SFP_F_RATE_SELECT); sfp->state = state; + mutex_unlock(&sfp->sm_mutex); rtnl_lock(); if (changed & SFP_F_PRESENT) @@ -1928,9 +1942,22 @@ static int sfp_remove(struct platform_device *pdev) return 0; } +static void sfp_shutdown(struct platform_device *pdev) +{ + struct sfp *sfp = platform_get_drvdata(pdev); + + mutex_lock(&sfp->sm_mutex); + sfp->shutdown = true; + mutex_unlock(&sfp->sm_mutex); + + cancel_delayed_work_sync(&sfp->poll); + cancel_delayed_work_sync(&sfp->timeout); +} + static struct platform_driver sfp_driver = { .probe = sfp_probe, .remove = sfp_remove, + .shutdown = sfp_shutdown, .driver = { .name = "sfp", .of_match_table = sfp_of_match,
SFP device polling can cause problems during the shutdown process if the parent devices of the network controller have been shut down already. This problem was seen on the iMX6 platform with PCIe devices, where accessing the device after the bus is shut down causes a hang. Stop all delayed work in the SFP driver during the shutdown process, and set a flag which causes any further state checks or state machine events (possibly triggered by previous GPIO IRQs) to be skipped. Signed-off-by: Robert Hancock <hancock@sedsystems.ca> --- This is an updated version of a previous patch "net: sfp: Stop SFP polling during shutdown" with the addition of stopping handling of GPIO-triggered interrupts as well, as pointed out by Russell King. drivers/net/phy/sfp.c | 27 +++++++++++++++++++++++++++ 1 file changed, 27 insertions(+)