[0/2] pcie hotplug and error fixes

Message ID 20240612181024.3577119-1-kbusch@meta.com

Message

Keith Busch June 12, 2024, 6:10 p.m. UTC
From: Keith Busch <kbusch@kernel.org>

I am working with larger pcie topologies again, and we're seeing some
failures when dealing with certain overlapping pcie events.

The topology is essentially this:

  [Root Port] <-> [UpStream Port] <-> [DownStream Port] <-> [End Device]

An error between the DSP and ED triggers DPC. There's only in-band
presence detection, so it also triggers hotplug. Before the error
handling is completed, though, another error seen by the RP triggers its
own DPC handling.

The concurrent event handling reveals some interesting races, and this
small patchset tries to address these in the least invasive way.

Keith Busch (2):
  PCI: pciehp: fix concurrent sub-tree removal deadlock
  PCI: err: ensure stable topology during handling

 drivers/pci/hotplug/pciehp_pci.c | 12 +++++++++---
 drivers/pci/pci.h                |  1 +
 drivers/pci/pcie/err.c           |  8 +++++++-
 drivers/pci/probe.c              | 24 ++++++++++++++++++++++++
 include/linux/pci.h              |  2 ++
 5 files changed, 43 insertions(+), 4 deletions(-)

Comments

Keith Busch June 12, 2024, 6:11 p.m. UTC | #1
On Wed, Jun 12, 2024 at 11:10:22AM -0700, Keith Busch wrote:
> From: Keith Busch <kbusch@kernel.org>
> 
> I am working with larger pcie topologies again, and we're seeing some
> failures when dealing with certain overlapping pcie events.
> 
> The topology is essentially this:
> 
>   [Root Port] <-> [UpStream Port] <-> [DownStream Port] <-> [End Device]
> 
> An error between the DSP and ED triggers DPC. There's only inband
> presence detection so it also triggers hotplug. Before the error
> handling is completed, though, another error seen by the RP triggers its
> own DPC handling.
> 
> The concurrent event handling reveals some interesting races, and this
> small patchset tries to address these in the least invasive way.
> 
> Keith Busch (2):
>   PCI: pciehp: fix concurrent sub-tree removal deadlock
>   PCI: err: ensure stable topology during handling

Oops, sorry for the noise here. I accidentally pointed git send-email at
the wrong directory and resent v1, which was already flagged as breaking.
I'll resend the correct version shortly.