diff mbox series

[2/2] PCI/portdrv: Don't disable pci device during shutdown

Message ID 1599818977-25425-2-git-send-email-chenhc@lemote.com
State New
Headers show
Series [1/2] PCI/portdrv: Remove the .remove() method in pcie_portdriver | expand

Commit Message

Huacai Chen Sept. 11, 2020, 10:09 a.m. UTC
Don't call pci_disable_device() in pcie_port_device_remove() during the
portdrv's shutdown. This can avoid some poweroff/reboot failures.

The poweroff/reboot failures can easily reproduce on Loongson platforms.
I think this is not a Loongson-specific problem, instead, is a problem
related to some specific PCI hosts. On some x86 platforms, radeon/amdgpu
devices can cause the same problem, and commit faefba95c9e8ca3a523831c2e
("drm/amdgpu: just suspend the hw on pci shutdown") can resolve it.

Radeon driver is more difficult than amdgpu due to its confusing symbol
names, and I have maintained an out-of-tree patch for a long time [1].
Recently, we found more and more devices can cause the same problem, and
it is very difficult to modify all problematic drivers as radeon/amdgpu
does. So, I think modify the PCIe port driver is a simple and effective
way.

[1] https://github.com/chenhuacai/linux/commit/6612f9c1fc290d42a14618ce9a7d03014d8ebb1a

Signed-off-by: Huacai Chen <chenhc@lemote.com>
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
---
 drivers/pci/pcie/portdrv_core.c | 1 -
 1 file changed, 1 deletion(-)

Comments

Bjorn Helgaas Sept. 13, 2020, 4:04 p.m. UTC | #1
On Fri, Sep 11, 2020 at 06:09:37PM +0800, Huacai Chen wrote:
> Don't call pci_disable_device() in pcie_port_device_remove() during the
> portdrv's shutdown. This can avoid some poweroff/reboot failures.
> 
> The poweroff/reboot failures can easily reproduce on Loongson platforms.
> I think this is not a Loongson-specific problem, instead, is a problem
> related to some specific PCI hosts. On some x86 platforms, radeon/amdgpu
> devices can cause the same problem, and commit faefba95c9e8ca3a523831c2e
> ("drm/amdgpu: just suspend the hw on pci shutdown") can resolve it.
> 
> Radeon driver is more difficult than amdgpu due to its confusing symbol
> names, and I have maintained an out-of-tree patch for a long time [1].
> Recently, we found more and more devices can cause the same problem, and
> it is very difficult to modify all problematic drivers as radeon/amdgpu
> does. So, I think modify the PCIe port driver is a simple and effective
> way.

This needs to explain in more detail what the failure is and how this
patch fixes it.

The main thing pci_disable_device() does is turn off bus mastering, so
I assume this has to do with DMA during shutdown or reboot.  This
change is in portdrv, so it affects PCIe Root Ports and Switch Ports,
which of course are type 1 (bridge) devices.  Clearing
PCI_COMMAND_MASTER on bridges prevents them from forwarding memory or
I/O requests in the upstream direction, i.e., it prevents DMA from
devices below the bridge.

But if the problem is DMA, the same problem may occur with Root
Complex Integrated Endpoints or conventional PCI devices, since
portdrv may not be involved in those topologies.

> [1] https://github.com/chenhuacai/linux/commit/6612f9c1fc290d42a14618ce9a7d03014d8ebb1a
> 
> Signed-off-by: Huacai Chen <chenhc@lemote.com>
> Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
> ---
>  drivers/pci/pcie/portdrv_core.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/drivers/pci/pcie/portdrv_core.c b/drivers/pci/pcie/portdrv_core.c
> index 50a9522..1991aca 100644
> --- a/drivers/pci/pcie/portdrv_core.c
> +++ b/drivers/pci/pcie/portdrv_core.c
> @@ -491,7 +491,6 @@ void pcie_port_device_remove(struct pci_dev *dev)
>  {
>  	device_for_each_child(&dev->dev, NULL, remove_iter);
>  	pci_free_irq_vectors(dev);
> -	pci_disable_device(dev);
>  }
>  
>  /**
> -- 
> 2.7.0
>
Deucher, Alexander Sept. 14, 2020, 8:43 p.m. UTC | #2
[AMD Public Use]

> -----Original Message-----
> From: Bjorn Helgaas <helgaas@kernel.org>
> Sent: Sunday, September 13, 2020 12:05 PM
> To: Huacai Chen <chenhc@lemote.com>
> Cc: Bjorn Helgaas <bhelgaas@google.com>; Deucher, Alexander
> <Alexander.Deucher@amd.com>; linux-pci@vger.kernel.org; Huacai Chen
> <chenhuacai@gmail.com>; Jiaxun Yang <jiaxun.yang@flygoat.com>; Tiezhu
> Yang <yangtiezhu@loongson.cn>
> Subject: Re: [PATCH 2/2] PCI/portdrv: Don't disable pci device during
> shutdown
> 
> On Fri, Sep 11, 2020 at 06:09:37PM +0800, Huacai Chen wrote:
> > Don't call pci_disable_device() in pcie_port_device_remove() during
> > the portdrv's shutdown. This can avoid some poweroff/reboot failures.
> >
> > The poweroff/reboot failures can easily reproduce on Loongson platforms.
> > I think this is not a Loongson-specific problem, instead, is a problem
> > related to some specific PCI hosts. On some x86 platforms,
> > radeon/amdgpu devices can cause the same problem, and commit
> > faefba95c9e8ca3a523831c2e
> > ("drm/amdgpu: just suspend the hw on pci shutdown") can resolve it.
> >
> > Radeon driver is more difficult than amdgpu due to its confusing
> > symbol names, and I have maintained an out-of-tree patch for a long time
> [1].
> > Recently, we found more and more devices can cause the same problem,
> > and it is very difficult to modify all problematic drivers as
> > radeon/amdgpu does. So, I think modify the PCIe port driver is a
> > simple and effective way.
> 
> This needs to explain in more detail what the failure is and how this patch
> fixes it.
> 
> The main thing pci_disable_device() does is turn off bus mastering, so I
> assume this has to do with DMA during shutdown or reboot.  This change is in
> portdrv, so it affects PCIe Root Ports and Switch Ports, which of course are
> type 1 (bridge) devices.  Clearing PCI_COMMAND_MASTER on bridges
> prevents them from forwarding memory or I/O requests in the upstream
> direction, i.e., it prevents DMA from devices below the bridge.
> 
> But if the problem is DMA, the same problem may occur with Root Complex
> Integrated Endpoints or conventional PCI devices, since portdrv may not be
> involved in those topologies.

I'm not sure I understand what the point of this patch is.  Isn't the whole point of the shutdown callback to cleanly tear down whatever tasks are happening on the device?  It could be DMA or stuff happening on the device itself (e.g., microcontrollers on the devices, etc.).  Most of the complications in GPU drivers are due to the lifetime issues between drm and other subsystems.

Alex

> 
> > [1]
> >
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgith
> >
> ub.com%2Fchenhuacai%2Flinux%2Fcommit%2F6612f9c1fc290d42a14618ce9a
> 7d030
> >
> 14d8ebb1a&amp;data=02%7C01%7Calexander.deucher%40amd.com%7C1cc
> 5cca01b5
> >
> 74485a0aa08d857feb88e%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C
> 0%7C637
> >
> 356098841869775&amp;sdata=HJmniTV2jJLEiOh3UCFpqPuGeq38y6crax2ahZa
> 5Eqc%
> > 3D&amp;reserved=0
> >
> > Signed-off-by: Huacai Chen <chenhc@lemote.com>
> > Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
> > ---
> >  drivers/pci/pcie/portdrv_core.c | 1 -
> >  1 file changed, 1 deletion(-)
> >
> > diff --git a/drivers/pci/pcie/portdrv_core.c
> > b/drivers/pci/pcie/portdrv_core.c index 50a9522..1991aca 100644
> > --- a/drivers/pci/pcie/portdrv_core.c
> > +++ b/drivers/pci/pcie/portdrv_core.c
> > @@ -491,7 +491,6 @@ void pcie_port_device_remove(struct pci_dev
> *dev)
> > {
> >  	device_for_each_child(&dev->dev, NULL, remove_iter);
> >  	pci_free_irq_vectors(dev);
> > -	pci_disable_device(dev);
> >  }
> >
> >  /**
> > --
> > 2.7.0
> >
diff mbox series

Patch

diff --git a/drivers/pci/pcie/portdrv_core.c b/drivers/pci/pcie/portdrv_core.c
index 50a9522..1991aca 100644
--- a/drivers/pci/pcie/portdrv_core.c
+++ b/drivers/pci/pcie/portdrv_core.c
@@ -491,7 +491,6 @@  void pcie_port_device_remove(struct pci_dev *dev)
 {
 	device_for_each_child(&dev->dev, NULL, remove_iter);
 	pci_free_irq_vectors(dev);
-	pci_disable_device(dev);
 }
 
 /**