diff mbox series

[PATCHv2,3/5] PCI/ERR: Retain status from error notification

Message ID 20210104230300.1277180-4-kbusch@kernel.org
State New
Headers show
Series aer handling fixups | expand

Commit Message

Keith Busch Jan. 4, 2021, 11:02 p.m. UTC
Overwriting the frozen detected status with the result of the link reset
loses the NEED_RESET result that drivers are depending on for error
handling to report the .slot_reset() callback. Retain this status so
that subsequent error handling has the correct flow.

Reported-by: Hinko Kocevar <hinko.kocevar@ess.eu>
Acked-by: Sean V Kelley <sean.v.kelley@intel.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
---
 drivers/pci/pcie/err.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

Comments

Dan Williams March 3, 2021, 5:34 a.m. UTC | #1
[ Add Sathya ]

On Mon, 2021-01-04 at 15:02 -0800, Keith Busch wrote:
> Overwriting the frozen detected status with the result of the link reset
> loses the NEED_RESET result that drivers are depending on for error
> handling to report the .slot_reset() callback. Retain this status so
> that subsequent error handling has the correct flow.
> 
> Reported-by: Hinko Kocevar <hinko.kocevar@ess.eu>
> Acked-by: Sean V Kelley <sean.v.kelley@intel.com>
> Signed-off-by: Keith Busch <kbusch@kernel.org>

Just want to report that this fix might be a candidate for -stable.
Recent DPC error recovery testing on v5.10 shows that losing this
status prevents NVME from restarting the queue after error recovery
with a failing signature like:

[  424.685179] pcie_do_recovery: pcieport 0000:97:02.0: AER: broadcast error_detected message
[  424.685185] nvme nvme0: frozen state error detected, reset controller
[  425.078620] pcie_do_recovery: pcieport 0000:97:02.0: AER: broadcast resume message
[  425.078674] pcieport 0000:97:02.0: AER: device recovery successful

...and then later:

[  751.146277] sysrq: Show Blocked State
[  751.146431] task:touch           state:D stack:    0 pid:11969 ppid: 11413 flags:0x00000000
[  751.146434] Call Trace:
[  751.146443]  __schedule+0x31b/0x890
[  751.146445]  schedule+0x3c/0xa0
[  751.146449]  blk_queue_enter+0x151/0x220
[  751.146454]  ? finish_wait+0x80/0x80
[  751.146455]  submit_bio_noacct+0x39a/0x410
[  751.146457]  submit_bio+0x42/0x150
[  751.146460]  ? bio_add_page+0x62/0x90
[  751.146461]  ? guard_bio_eod+0x25/0x70
[  751.146465]  submit_bh_wbc+0x16d/0x190
[  751.146469]  ext4_read_bh+0x47/0xa0
[  751.146472]  ext4_read_inode_bitmap+0x3cd/0x5f0

...because NVME was only told to disable, never to reset and resume the
block queue.

I have not tested it, but by code inspection I eventually found this
upstream fix.
Kuppuswamy, Sathyanarayanan March 3, 2021, 5:46 a.m. UTC | #2
On 3/2/21 9:34 PM, Williams, Dan J wrote:
> [ Add Sathya ]
>
> On Mon, 2021-01-04 at 15:02 -0800, Keith Busch wrote:
>> Overwriting the frozen detected status with the result of the link reset
>> loses the NEED_RESET result that drivers are depending on for error
>> handling to report the .slot_reset() callback. Retain this status so
>> that subsequent error handling has the correct flow.
>>
>> Reported-by: Hinko Kocevar <hinko.kocevar@ess.eu>
>> Acked-by: Sean V Kelley <sean.v.kelley@intel.com>
>> Signed-off-by: Keith Busch <kbusch@kernel.org>
> Just want to report that this fix might be a candidate for -stable.
Agree.

I think it can be merged in both stable and mainline kernels.

Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> Recent DPC error recovery testing on v5.10 shows that losing this
> status prevents NVME from restarting the queue after error recovery
> with a failing signature like:
>
> [  424.685179] pcie_do_recovery: pcieport 0000:97:02.0: AER: broadcast error_detected message
> [  424.685185] nvme nvme0: frozen state error detected, reset controller
> [  425.078620] pcie_do_recovery: pcieport 0000:97:02.0: AER: broadcast resume message
> [  425.078674] pcieport 0000:97:02.0: AER: device recovery successful
>
> ...and then later:
>
> [  751.146277] sysrq: Show Blocked State
> [  751.146431] task:touch           state:D stack:    0 pid:11969 ppid: 11413 flags:0x00000000
> [  751.146434] Call Trace:
> [  751.146443]  __schedule+0x31b/0x890
> [  751.146445]  schedule+0x3c/0xa0
> [  751.146449]  blk_queue_enter+0x151/0x220
> [  751.146454]  ? finish_wait+0x80/0x80
> [  751.146455]  submit_bio_noacct+0x39a/0x410
> [  751.146457]  submit_bio+0x42/0x150
> [  751.146460]  ? bio_add_page+0x62/0x90
> [  751.146461]  ? guard_bio_eod+0x25/0x70
> [  751.146465]  submit_bh_wbc+0x16d/0x190
> [  751.146469]  ext4_read_bh+0x47/0xa0
> [  751.146472]  ext4_read_inode_bitmap+0x3cd/0x5f0
>
> ...because NVME was only told to disable, never to reset and resume the
> block queue.
>
> I have not tested it, but by code inspection I eventually found this
> upstream fix.
Keith Busch March 4, 2021, 8:01 p.m. UTC | #3
On Tue, Mar 02, 2021 at 09:46:40PM -0800, Kuppuswamy, Sathyanarayanan wrote:
> 
> On 3/2/21 9:34 PM, Williams, Dan J wrote:
> > [ Add Sathya ]
> > 
> > On Mon, 2021-01-04 at 15:02 -0800, Keith Busch wrote:
> > > Overwriting the frozen detected status with the result of the link reset
> > > loses the NEED_RESET result that drivers are depending on for error
> > > handling to report the .slot_reset() callback. Retain this status so
> > > that subsequent error handling has the correct flow.
> > > 
> > > Reported-by: Hinko Kocevar <hinko.kocevar@ess.eu>
> > > Acked-by: Sean V Kelley <sean.v.kelley@intel.com>
> > > Signed-off-by: Keith Busch <kbusch@kernel.org>
> > Just want to report that this fix might be a candidate for -stable.
> Agree.
> 
> I think it can be merged in both stable and mainline kernels.
> 
> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>

Just FYI, this patch is practically a revert of this one:

  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit?id=6d2c89441571ea534d6240f7724f518936c44f8d

so please let me know if that is still a problem for you.
Dan Williams March 4, 2021, 10:11 p.m. UTC | #4
On Thu, Mar 4, 2021 at 12:03 PM Keith Busch <kbusch@kernel.org> wrote:
>
> On Tue, Mar 02, 2021 at 09:46:40PM -0800, Kuppuswamy, Sathyanarayanan wrote:
> >
> > On 3/2/21 9:34 PM, Williams, Dan J wrote:
> > > [ Add Sathya ]
> > >
> > > On Mon, 2021-01-04 at 15:02 -0800, Keith Busch wrote:
> > > > Overwriting the frozen detected status with the result of the link reset
> > > > loses the NEED_RESET result that drivers are depending on for error
> > > > handling to report the .slot_reset() callback. Retain this status so
> > > > that subsequent error handling has the correct flow.
> > > >
> > > > Reported-by: Hinko Kocevar <hinko.kocevar@ess.eu>
> > > > Acked-by: Sean V Kelley <sean.v.kelley@intel.com>
> > > > Signed-off-by: Keith Busch <kbusch@kernel.org>
> > > Just want to report that this fix might be a candidate for -stable.
> > Agree.
> >
> > I think it can be merged in both stable and mainline kernels.
> >
> > Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
>
> Just FYI, this patch is practically a revert of this one:
>
>   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit?id=6d2c89441571ea534d6240f7724f518936c44f8d
>
> so please let me know if that is still a problem for you.

For what it's worth I think "6d2c89441571 PCI/ERR: Update error status
after reset_link()" is not justified. The link shouldn't recover if
the attached device is not prepared to handle DPC events. As far as I
can see it's just a cosmetic fix, right? Sathya, was there some end
user visible need to make the DPC recovery report "success" in this
case?
Dan Williams March 4, 2021, 10:59 p.m. UTC | #5
On Thu, Mar 4, 2021 at 2:38 PM Kuppuswamy, Sathyanarayanan
<sathyanarayanan.kuppuswamy@intel.com> wrote:
>
>
> On 3/4/21 2:11 PM, Dan Williams wrote:
>
> On Thu, Mar 4, 2021 at 12:03 PM Keith Busch <kbusch@kernel.org> wrote:
>
> On Tue, Mar 02, 2021 at 09:46:40PM -0800, Kuppuswamy, Sathyanarayanan wrote:
>
> On 3/2/21 9:34 PM, Williams, Dan J wrote:
>
> [ Add Sathya ]
>
> On Mon, 2021-01-04 at 15:02 -0800, Keith Busch wrote:
>
> Overwriting the frozen detected status with the result of the link reset
> loses the NEED_RESET result that drivers are depending on for error
> handling to report the .slot_reset() callback. Retain this status so
> that subsequent error handling has the correct flow.
>
> Reported-by: Hinko Kocevar <hinko.kocevar@ess.eu>
> Acked-by: Sean V Kelley <sean.v.kelley@intel.com>
> Signed-off-by: Keith Busch <kbusch@kernel.org>
>
> Just want to report that this fix might be a candidate for -stable.
>
> Agree.
>
> I think it can be merged in both stable and mainline kernels.
>
> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
>
> Just FYI, this patch is practically a revert of this one:
>
>   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit?id=6d2c89441571ea534d6240f7724f518936c44f8d
>
> so please let me know if that is still a problem for you.
>
> For what it's worth I think "6d2c89441571 PCI/ERR: Update error status
> after reset_link()" is not justified. The link shouldn't recover if
> the attached device is not prepared to handle DPC events.
>
> I added that fix to address the recovery issue seen in a Dell server
> platform (for EDR test case). If I understand the history correctly,
> In EDR case, AER and DPC is owned by firmware, hence we get
> PCI_ERS_RESULT_NO_AER_DRIVER when executing error_detected() callbacks.
> So If we continue the pcie_do_recovery() with PCI_ERS_RESULT_NO_AER_DRIVER
> as error status, then even if we successfully reset the link we will report
> the recovery status as failure.

But that's the right response if there is no handler. If there is a
device attached that was not prepared for the link to go down then it
does not matter if the link comes back ok that device will be
thoroughly confused and should be disconnected.
Kuppuswamy, Sathyanarayanan March 4, 2021, 11:19 p.m. UTC | #6
On 3/4/21 2:59 PM, Dan Williams wrote:
> On Thu, Mar 4, 2021 at 2:38 PM Kuppuswamy, Sathyanarayanan
> <sathyanarayanan.kuppuswamy@intel.com> wrote:
>>
>> On 3/4/21 2:11 PM, Dan Williams wrote:
>>
>> On Thu, Mar 4, 2021 at 12:03 PM Keith Busch <kbusch@kernel.org> wrote:
>>
>> On Tue, Mar 02, 2021 at 09:46:40PM -0800, Kuppuswamy, Sathyanarayanan wrote:
>>
>> On 3/2/21 9:34 PM, Williams, Dan J wrote:
>>
>> [ Add Sathya ]
>>
>> On Mon, 2021-01-04 at 15:02 -0800, Keith Busch wrote:
>>
>> Overwriting the frozen detected status with the result of the link reset
>> loses the NEED_RESET result that drivers are depending on for error
>> handling to report the .slot_reset() callback. Retain this status so
>> that subsequent error handling has the correct flow.
>>
>> Reported-by: Hinko Kocevar <hinko.kocevar@ess.eu>
>> Acked-by: Sean V Kelley <sean.v.kelley@intel.com>
>> Signed-off-by: Keith Busch <kbusch@kernel.org>
>>
>> Just want to report that this fix might be a candidate for -stable.
>>
>> Agree.
>>
>> I think it can be merged in both stable and mainline kernels.
>>
>> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
>>
>> Just FYI, this patch is practically a revert of this one:
>>
>>    https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit?id=6d2c89441571ea534d6240f7724f518936c44f8d
>>
>> so please let me know if that is still a problem for you.
>>
>> For what it's worth I think "6d2c89441571 PCI/ERR: Update error status
>> after reset_link()" is not justified. The link shouldn't recover if
>> the attached device is not prepared to handle DPC events.
>>
>> I added that fix to address the recovery issue seen in a Dell server
>> platform (for EDR test case). If I understand the history correctly,
>> In EDR case, AER and DPC is owned by firmware, hence we get
>> PCI_ERS_RESULT_NO_AER_DRIVER when executing error_detected() callbacks.
>> So If we continue the pcie_do_recovery() with PCI_ERS_RESULT_NO_AER_DRIVER
>> as error status, then even if we successfully reset the link we will report
>> the recovery status as failure.
> But that's the right response if there is no handler.
If the handler is not available due to AER being owned by firmware,
then it needs to be fixed. In EDR mode, even if DPC/AER is owned
by firmware , OS need to own the recovery part. So I think it
needs further investigation to understand why it reports,
PCI_ERS_RESULT_NO_AER_DRIVER
> If there is a
> device attached that was not prepared for the link to go down then it
> does not matter if the link comes back ok that device will be
> thoroughly confused and should be disconnected.
Dan Williams March 5, 2021, 12:23 a.m. UTC | #7
On Thu, Mar 4, 2021 at 3:19 PM Kuppuswamy, Sathyanarayanan
<sathyanarayanan.kuppuswamy@intel.com> wrote:
>
>
> On 3/4/21 2:59 PM, Dan Williams wrote:
> > On Thu, Mar 4, 2021 at 2:38 PM Kuppuswamy, Sathyanarayanan
> > <sathyanarayanan.kuppuswamy@intel.com> wrote:
> >>
> >> On 3/4/21 2:11 PM, Dan Williams wrote:
> >>
> >> On Thu, Mar 4, 2021 at 12:03 PM Keith Busch <kbusch@kernel.org> wrote:
> >>
> >> On Tue, Mar 02, 2021 at 09:46:40PM -0800, Kuppuswamy, Sathyanarayanan wrote:
> >>
> >> On 3/2/21 9:34 PM, Williams, Dan J wrote:
> >>
> >> [ Add Sathya ]
> >>
> >> On Mon, 2021-01-04 at 15:02 -0800, Keith Busch wrote:
> >>
> >> Overwriting the frozen detected status with the result of the link reset
> >> loses the NEED_RESET result that drivers are depending on for error
> >> handling to report the .slot_reset() callback. Retain this status so
> >> that subsequent error handling has the correct flow.
> >>
> >> Reported-by: Hinko Kocevar <hinko.kocevar@ess.eu>
> >> Acked-by: Sean V Kelley <sean.v.kelley@intel.com>
> >> Signed-off-by: Keith Busch <kbusch@kernel.org>
> >>
> >> Just want to report that this fix might be a candidate for -stable.
> >>
> >> Agree.
> >>
> >> I think it can be merged in both stable and mainline kernels.
> >>
> >> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> >>
> >> Just FYI, this patch is practically a revert of this one:
> >>
> >>    https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit?id=6d2c89441571ea534d6240f7724f518936c44f8d
> >>
> >> so please let me know if that is still a problem for you.
> >>
> >> For what it's worth I think "6d2c89441571 PCI/ERR: Update error status
> >> after reset_link()" is not justified. The link shouldn't recover if
> >> the attached device is not prepared to handle DPC events.
> >>
> >> I added that fix to address the recovery issue seen in a Dell server
> >> platform (for EDR test case). If I understand the history correctly,
> >> In EDR case, AER and DPC is owned by firmware, hence we get
> >> PCI_ERS_RESULT_NO_AER_DRIVER when executing error_detected() callbacks.
> >> So If we continue the pcie_do_recovery() with PCI_ERS_RESULT_NO_AER_DRIVER
> >> as error status, then even if we successfully reset the link we will report
> >> the recovery status as failure.
> > But that's the right response if there is no handler.
> If the handler is not available due to AER being owned by firmware,
> then it needs to be fixed. In EDR mode, even if DPC/AER is owned
> by firmware , OS need to own the recovery part. So I think it
> needs further investigation to understand why it reports,
> PCI_ERS_RESULT_NO_AER_DRIVER

As far as I can see the only way to get PCI_ERS_RESULT_NO_AER_DRIVER
is when there actually is no handler, or the device io state has set
to failed. I notice the hotplug handler sets the device io state to
failed while processing link down. If the device is actually missing a
handler definition then disconnect seems the right response.
Keith Busch March 5, 2021, 12:54 a.m. UTC | #8
On Thu, Mar 04, 2021 at 04:23:01PM -0800, Dan Williams wrote:
> On Thu, Mar 4, 2021 at 3:19 PM Kuppuswamy, Sathyanarayanan
> <sathyanarayanan.kuppuswamy@intel.com> wrote:
>
> > If the handler is not available due to AER being owned by firmware,
> > then it needs to be fixed. In EDR mode, even if DPC/AER is owned
> > by firmware , OS need to own the recovery part. So I think it
> > needs further investigation to understand why it reports,
> > PCI_ERS_RESULT_NO_AER_DRIVER
> 
> As far as I can see the only way to get PCI_ERS_RESULT_NO_AER_DRIVER
> is when there actually is no handler, or the device io state has set
> to failed. I notice the hotplug handler sets the device io state to
> failed while processing link down. If the device is actually missing a
> handler definition then disconnect seems the right response.

Hypothetically speaking, if a DPC event occurs at the Root Port above a
deep PCIe topology, it only takes one device with no handler to spoil
the recovery for the well behaved drivers lower in the hierarchy. We can
probably handle that better if this scenario actually happens. Not sure
if does happen, but the switchtec driver looks sus.
diff mbox series

Patch

diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
index a84f0bf4c1e2..b576aa890c76 100644
--- a/drivers/pci/pcie/err.c
+++ b/drivers/pci/pcie/err.c
@@ -198,8 +198,7 @@  pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
 	pci_dbg(bridge, "broadcast error_detected message\n");
 	if (state == pci_channel_io_frozen) {
 		pci_walk_bridge(bridge, report_frozen_detected, &status);
-		status = reset_subordinates(bridge);
-		if (status != PCI_ERS_RESULT_RECOVERED) {
+		if (reset_subordinates(bridge) != PCI_ERS_RESULT_RECOVERED) {
 			pci_warn(bridge, "subordinate device reset failed\n");
 			goto failed;
 		}