Message ID | 28b287eb3939b1941bd46b2ed9a6981a577568c4.1553050609.git.sbobroff@linux.ibm.com (mailing list archive) |
---|---|
State | Changes Requested |
Headers | show |
Series | | expand |
Context | Check | Description |
---|---|---|
snowpatch_ozlabs/apply_patch | success | Successfully applied on branch next (de3c83c2fd2b87cf68214eda76dfa66989d78cb6) |
snowpatch_ozlabs/checkpatch | success | total: 0 errors, 0 warnings, 0 checks, 10 lines checked |
On 20/03/2019 13:58, Sam Bobroff wrote: > The EEH_DEV_NO_HANDLER flag is used by the EEH system to prevent the > use of driver callbacks in drivers that have been bound part way > through the recovery process. This is necessary to prevent later stage > handlers from being called when the earlier stage handlers haven't, > which can be confusing for drivers. The flag is used from eeh_pe_report()->eeh_pe_report_edev which is called many times from eeh_handle_normal_event() (and you clear the flag here unconditionally) and once from eeh_handle_special_event() - so this is actually the only case now when the flag matters. Is my understanding correct? Also is not clearing the flag correct in that case? I do not quite understand eeh_handle_normal_event vs. eeh_handle_special_event business though. > > However, the flag is set for all devices that are added after boot > time and only cleared at the end of the EEH recovery process. This > results in hot plugged devices erroneously having the flag set during > the first recovery after they are added (causing their driver's > handlers to be incorrectly ignored). > > To remedy this, clear the flag at the beginning of recovery > processing. The flag is still cleared at the end of recovery > processing, although it is no longer really necessary. Then may be remove that redundant clearing? > > Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com> > --- > arch/powerpc/kernel/eeh_driver.c | 4 ++++ > 1 file changed, 4 insertions(+) > > diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c > index 6f3ee30565dd..4c34b9901f15 100644 > --- a/arch/powerpc/kernel/eeh_driver.c > +++ b/arch/powerpc/kernel/eeh_driver.c > @@ -819,6 +819,10 @@ void eeh_handle_normal_event(struct eeh_pe *pe) > result = PCI_ERS_RESULT_DISCONNECT; > } > > + eeh_for_each_pe(pe, tmp_pe) > + eeh_pe_for_each_dev(tmp_pe, edev, tmp) > + edev->mode &= ~EEH_DEV_NO_HANDLER; > + > /* Walk the various device drivers attached to this slot through > * a reset sequence, giving each an opportunity to do what it needs > * to accomplish the reset. Each child gets a report of the >
On Wed, Mar 20, 2019 at 05:02:57PM +1100, Alexey Kardashevskiy wrote: > > > On 20/03/2019 13:58, Sam Bobroff wrote: > > The EEH_DEV_NO_HANDLER flag is used by the EEH system to prevent the > > use of driver callbacks in drivers that have been bound part way > > through the recovery process. This is necessary to prevent later stage > > handlers from being called when the earlier stage handlers haven't, > > which can be confusing for drivers. > > The flag is used from eeh_pe_report()->eeh_pe_report_edev which is > called many times from eeh_handle_normal_event() (and you clear the flag > here unconditionally) and once from eeh_handle_special_event() - so this > is actually the only case now when the flag matters. Is my understanding > correct? Also is not clearing the flag correct in that case? I do not > quite understand eeh_handle_normal_event vs. eeh_handle_special_event > business though. I'm not sure I fully understand your question, but here's the situation: * EEH is detected on a PCI device that has no driver bound but there is a driver that COULD bind. * eeh_handle_normal_event() follows the "EEH: Reset with hotplug activity" path because the device doesn't (currently) have a driver that supports EEH. * eeh_reset_device() removes the device (pci_hp_remove_devices()). * eeh_reset_device() re-discovers the device with pci_hp_add_devices(). * As part of re-discovery the PCI subsystem will bind the available driver. * eeh_handle_normal_event() calls eeh_report_resume() (via eeh_pe_report()). If the (newly bound) driver has a resume() handler, then eeh_report_resume() will call it and AFAIK this will cause a problem for some drivers because their error_detected() handler wasn't called first. The fix for this is to have eeh_add_device_late() set EEH_DEV_NO_HANDLER so that we can detect that the device has been added DURING recovery, and avoid calling it's handlers later. I see what you mean about the eeh_handle_special_event() case, where EEH_DEV_NO_HANDLER isn't cleared before calling eeh_pe_report(), and I think it's a bug! I'll fix it in the next version. (Cleaning up that flag is on my list. I don't think it's a very good solution.) > > > > However, the flag is set for all devices that are added after boot > > time and only cleared at the end of the EEH recovery process. This > > results in hot plugged devices erroneously having the flag set during > > the first recovery after they are added (causing their driver's > > handlers to be incorrectly ignored). > > > > To remedy this, clear the flag at the beginning of recovery > > processing. The flag is still cleared at the end of recovery > > processing, although it is no longer really necessary. > > Then may be remove that redundant clearing? I don't really mind either way; clearing it when we are finished with recovery seems "cleaner" to me but it doesn't have any function. (In any case, I think I will eventually want to remove it.) > > > > Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com> > > --- > > arch/powerpc/kernel/eeh_driver.c | 4 ++++ > > 1 file changed, 4 insertions(+) > > > > diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c > > index 6f3ee30565dd..4c34b9901f15 100644 > > --- a/arch/powerpc/kernel/eeh_driver.c > > +++ b/arch/powerpc/kernel/eeh_driver.c > > @@ -819,6 +819,10 @@ void eeh_handle_normal_event(struct eeh_pe *pe) > > result = PCI_ERS_RESULT_DISCONNECT; > > } > > > > + eeh_for_each_pe(pe, tmp_pe) > > + eeh_pe_for_each_dev(tmp_pe, edev, tmp) > > + edev->mode &= ~EEH_DEV_NO_HANDLER; > > + > > /* Walk the various device drivers attached to this slot through > > * a reset sequence, giving each an opportunity to do what it needs > > * to accomplish the reset. Each child gets a report of the > > > > -- > Alexey >
diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c index 6f3ee30565dd..4c34b9901f15 100644 --- a/arch/powerpc/kernel/eeh_driver.c +++ b/arch/powerpc/kernel/eeh_driver.c @@ -819,6 +819,10 @@ void eeh_handle_normal_event(struct eeh_pe *pe) result = PCI_ERS_RESULT_DISCONNECT; } + eeh_for_each_pe(pe, tmp_pe) + eeh_pe_for_each_dev(tmp_pe, edev, tmp) + edev->mode &= ~EEH_DEV_NO_HANDLER; + /* Walk the various device drivers attached to this slot through * a reset sequence, giving each an opportunity to do what it needs * to accomplish the reset. Each child gets a report of the
The EEH_DEV_NO_HANDLER flag is used by the EEH system to prevent the use of driver callbacks in drivers that have been bound part way through the recovery process. This is necessary to prevent later stage handlers from being called when the earlier stage handlers haven't, which can be confusing for drivers. However, the flag is set for all devices that are added after boot time and only cleared at the end of the EEH recovery process. This results in hot plugged devices erroneously having the flag set during the first recovery after they are added (causing their driver's handlers to be incorrectly ignored). To remedy this, clear the flag at the beginning of recovery processing. The flag is still cleared at the end of recovery processing, although it is no longer really necessary. Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com> --- arch/powerpc/kernel/eeh_driver.c | 4 ++++ 1 file changed, 4 insertions(+)