diff mbox series

[v9,10/13] PCI: Give pci_intx() its own devres callback

Message ID 20240613115032.29098-11-pstanner@redhat.com
State New
Headers show
Series Make PCI's devres API more consistent | expand

Commit Message

Philipp Stanner June 13, 2024, 11:50 a.m. UTC
pci_intx() is one of the functions that have "hybrid mode" (i.e.,
sometimes managed, sometimes not). Providing a separate pcim_intx()
function with its own device resource and cleanup callback allows for
removing further large parts of the legacy PCI devres implementation.

As in the region-request-functions, pci_intx() has to call into its
managed counterpart for backwards compatibility.

As pci_intx() is an outdated function, pcim_intx() shall not be made
visible to drivers via a public API.

Implement pcim_intx() with its own device resource.
Make pci_intx() call pcim_intx() in the managed case.
Remove the now surplus function find_pci_dr().

Signed-off-by: Philipp Stanner <pstanner@redhat.com>
---
 drivers/pci/devres.c | 76 ++++++++++++++++++++++++++++++++++++--------
 drivers/pci/pci.c    | 21 ++++++------
 drivers/pci/pci.h    | 13 ++++----
 3 files changed, 80 insertions(+), 30 deletions(-)

Comments

Bjorn Helgaas June 13, 2024, 9:06 p.m. UTC | #1
On Thu, Jun 13, 2024 at 01:50:23PM +0200, Philipp Stanner wrote:
> pci_intx() is one of the functions that have "hybrid mode" (i.e.,
> sometimes managed, sometimes not). Providing a separate pcim_intx()
> function with its own device resource and cleanup callback allows for
> removing further large parts of the legacy PCI devres implementation.
> 
> As in the region-request-functions, pci_intx() has to call into its
> managed counterpart for backwards compatibility.
> 
> As pci_intx() is an outdated function, pcim_intx() shall not be made
> visible to drivers via a public API.

What makes pci_intx() outdated?  If it's outdated, we should mention
why and what the 30+ callers (including a couple in drivers/pci/)
should use instead.

Bjorn
Philipp Stanner June 14, 2024, 8:09 a.m. UTC | #2
On Thu, 2024-06-13 at 16:06 -0500, Bjorn Helgaas wrote:
> On Thu, Jun 13, 2024 at 01:50:23PM +0200, Philipp Stanner wrote:
> > pci_intx() is one of the functions that have "hybrid mode" (i.e.,
> > sometimes managed, sometimes not). Providing a separate pcim_intx()
> > function with its own device resource and cleanup callback allows
> > for
> > removing further large parts of the legacy PCI devres
> > implementation.
> > 
> > As in the region-request-functions, pci_intx() has to call into its
> > managed counterpart for backwards compatibility.
> > 
> > As pci_intx() is an outdated function, pcim_intx() shall not be
> > made
> > visible to drivers via a public API.
> 
> What makes pci_intx() outdated?  If it's outdated, we should mention
> why and what the 30+ callers (including a couple in drivers/pci/)
> should use instead.

That is 100% based on Andy Shevchenko's (+CC) statement back from
January 2024 a.D. [1]

Apparently INTx is "old IRQ management" and should be done through
pci_alloc_irq_vectors() nowadays.


[1] https://lore.kernel.org/all/ZabyY3csP0y-p7lb@surfacebook.localdomain/


P.


> 
> Bjorn
>
Bjorn Helgaas June 14, 2024, 4:14 p.m. UTC | #3
On Fri, Jun 14, 2024 at 10:09:46AM +0200, Philipp Stanner wrote:
> On Thu, 2024-06-13 at 16:06 -0500, Bjorn Helgaas wrote:
> > On Thu, Jun 13, 2024 at 01:50:23PM +0200, Philipp Stanner wrote:
> > > pci_intx() is one of the functions that have "hybrid mode" (i.e.,
> > > sometimes managed, sometimes not). Providing a separate pcim_intx()
> > > function with its own device resource and cleanup callback allows
> > > for
> > > removing further large parts of the legacy PCI devres
> > > implementation.
> > > 
> > > As in the region-request-functions, pci_intx() has to call into its
> > > managed counterpart for backwards compatibility.
> > > 
> > > As pci_intx() is an outdated function, pcim_intx() shall not be
> > > made
> > > visible to drivers via a public API.
> > 
> > What makes pci_intx() outdated?  If it's outdated, we should mention
> > why and what the 30+ callers (including a couple in drivers/pci/)
> > should use instead.
> 
> That is 100% based on Andy Shevchenko's (+CC) statement back from
> January 2024 a.D. [1]
> 
> Apparently INTx is "old IRQ management" and should be done through
> pci_alloc_irq_vectors() nowadays.

Do we have pcim_ support for pci_alloc_irq_vectors()?

> [1] https://lore.kernel.org/all/ZabyY3csP0y-p7lb@surfacebook.localdomain/
Philipp Stanner June 17, 2024, 8:21 a.m. UTC | #4
On Fri, 2024-06-14 at 11:14 -0500, Bjorn Helgaas wrote:
> On Fri, Jun 14, 2024 at 10:09:46AM +0200, Philipp Stanner wrote:
> > On Thu, 2024-06-13 at 16:06 -0500, Bjorn Helgaas wrote:
> > > On Thu, Jun 13, 2024 at 01:50:23PM +0200, Philipp Stanner wrote:
> > > > pci_intx() is one of the functions that have "hybrid mode"
> > > > (i.e.,
> > > > sometimes managed, sometimes not). Providing a separate
> > > > pcim_intx()
> > > > function with its own device resource and cleanup callback
> > > > allows
> > > > for
> > > > removing further large parts of the legacy PCI devres
> > > > implementation.
> > > > 
> > > > As in the region-request-functions, pci_intx() has to call into
> > > > its
> > > > managed counterpart for backwards compatibility.
> > > > 
> > > > As pci_intx() is an outdated function, pcim_intx() shall not be
> > > > made
> > > > visible to drivers via a public API.
> > > 
> > > What makes pci_intx() outdated?  If it's outdated, we should
> > > mention
> > > why and what the 30+ callers (including a couple in drivers/pci/)
> > > should use instead.
> > 
> > That is 100% based on Andy Shevchenko's (+CC) statement back from
> > January 2024 a.D. [1]
> > 
> > Apparently INTx is "old IRQ management" and should be done through
> > pci_alloc_irq_vectors() nowadays.
> 
> Do we have pcim_ support for pci_alloc_irq_vectors()?

Nope.

All PCI devres functions that exist are now in pci/devres.c, except for
the hybrid functions (pci_intx() and everything calling
__pci_request_region()) in pci.c


P.

> 
> > [1]
> > https://lore.kernel.org/all/ZabyY3csP0y-p7lb@surfacebook.localdomain/
>
Bjorn Helgaas June 17, 2024, 4:46 p.m. UTC | #5
On Mon, Jun 17, 2024 at 10:21:10AM +0200, Philipp Stanner wrote:
> On Fri, 2024-06-14 at 11:14 -0500, Bjorn Helgaas wrote:
> > On Fri, Jun 14, 2024 at 10:09:46AM +0200, Philipp Stanner wrote:
> ...

> > > Apparently INTx is "old IRQ management" and should be done through
> > > pci_alloc_irq_vectors() nowadays.
> > 
> > Do we have pcim_ support for pci_alloc_irq_vectors()?
> 
> Nope.

Should we?  Or is IRQ support not amenable to devm?

Happened to see this new driver:
https://lore.kernel.org/all/20240617100359.2550541-3-Basavaraj.Natikar@amd.com/
that uses devm and the only PCI-related part of .remove() is cleaning
up the IRQs.
Philipp Stanner June 18, 2024, 7:56 a.m. UTC | #6
On Mon, 2024-06-17 at 11:46 -0500, Bjorn Helgaas wrote:
> On Mon, Jun 17, 2024 at 10:21:10AM +0200, Philipp Stanner wrote:
> > On Fri, 2024-06-14 at 11:14 -0500, Bjorn Helgaas wrote:
> > > On Fri, Jun 14, 2024 at 10:09:46AM +0200, Philipp Stanner wrote:
> > ...
> 
> > > > Apparently INTx is "old IRQ management" and should be done
> > > > through
> > > > pci_alloc_irq_vectors() nowadays.
> > > 
> > > Do we have pcim_ support for pci_alloc_irq_vectors()?
> > 
> > Nope.
> 
> Should we?  Or is IRQ support not amenable to devm?

I don't see why it wouldn't work, AFAIU you just register a callback
that deregisters the interrupts again.

This series here, though, stems from me trying to clean up drivers in
DRM. That's when I discovered that regions / IO-mappings (which I need)
are broken.

Adding further stuff to pci/devres.c is no problem at all and
independent from this series; one just has to add the code and call the
appropriate devm_ functions.

> 
> Happened to see this new driver:
> https://lore.kernel.org/all/20240617100359.2550541-3-Basavaraj.Natikar@amd.com/
> that uses devm and the only PCI-related part of .remove() is cleaning
> up the IRQs.
> 

OK. They also use pcim_iomap_table() and stuff. I think we should
inform about the deprecation.

I don't have a user for IRQ at hand for my DRM work right now. I'd try
to upstream new infrastructure we need there as I did for vboxvideo.


Grüße,
P.
Kalra, Ashish July 8, 2024, 9:46 p.m. UTC | #7
With this patch applied, we are observing unloading and then reloading issues with the AMD Crypto (CCP) driver:

with DEVRES logging enabled, we observe the following logs:

[  218.093588] ccp 0000:a2:00.1: DEVRES REL 00000000c18c52fb 0xffff8d09dc1972c0 devm_kzalloc_release (152 bytes)
[  218.105527] ccp 0000:a2:00.1: DEVRES REL 000000003091fb95 0xffff8d09d3aad000 devm_kzalloc_release (3072 bytes)
[  218.117500] ccp 0000:a2:00.1: DEVRES REL 0000000049e4adfe 0xffff8d09d588f000 pcim_intx_restore (4 bytes)
[  218.129519] ccp 0000:a2:00.1: DEVRES ADD 000000001a2ac6ad 0xffff8cfa867b7cc0 pcim_intx_restore (4 bytes)
[  218.140434] ccp 0000:a2:00.1: DEVRES REL 00000000627ecaf7 0xffff8d09d588f680 pcim_msi_release (16 bytes)
[  218.151665] ccp 0000:a2:00.1: DEVRES REL 0000000058b2252a 0xffff8d09dc199680 msi_device_data_release (80 bytes)
[  218.163625] ccp 0000:a2:00.1: DEVRES REL 00000000435cc85e 0xffff8d09d588ff80 devm_attr_group_remove (8 bytes)
[  218.175224] ccp 0000:a2:00.1: DEVRES REL 00000000cb6fcd9b 0xffff8d09eb583660 pcim_addr_resource_release (40 bytes)
[  218.187319] ccp 0000:a2:00.1: DEVRES REL 00000000d64a8b84 0xffff8d09eb583180 pcim_iomap_release (48 bytes)
[  218.198615] ccp 0000:a2:00.1: DEVRES REL 0000000099ac6b28 0xffff8d09eb5830c0 pcim_addr_resource_release (40 bytes)
[  218.210730] ccp 0000:a2:00.1: DEVRES REL 00000000bdd27f88 0xffff8d09d3ac2700 pcim_release (0 bytes)
[  218.221489] ccp 0000:a2:00.1: DEVRES REL 00000000e763315c 0xffff8d09d3ac2240 devm_kzalloc_release (20 bytes)
[  218.233008] ccp 0000:a2:00.1: DEVRES REL 00000000ae90f983 0xffff8d09dc25a800 devm_kzalloc_release (184 bytes)
[  218.245251] ccp 0000:23:00.1: DEVRES REL 00000000a2ec0085 0xffff8cfa86bee700 fw_name_devm_release (16 bytes)
[  218.256748] ccp 0000:23:00.1: DEVRES REL 0000000021bccd98 0xffff8cfaa528d5c0 devm_pages_release (16 bytes)
[  218.268044] ccp 0000:23:00.1: DEVRES REL 000000003ef7cbc7 0xffff8cfaa1b5ec00 devm_kzalloc_release (104 bytes)
[  218.279631] ccp 0000:23:00.1: DEVRES REL 00000000619322e1 0xffff8cfaa1b5e480 devm_kzalloc_release (152 bytes)
[  218.300438] ccp 0000:23:00.1: DEVRES REL 00000000c261523b 0xffff8cfaad88b000 devm_kzalloc_release (3072 bytes)
[  218.331000] ccp 0000:23:00.1: DEVRES REL 00000000fbd19618 0xffff8cfaa528d140 pcim_intx_restore (4 bytes)
[  218.361330] ccp 0000:23:00.1: DEVRES ADD 0000000057f8e767 0xffff8cfa867b7740 pcim_intx_restore (4 bytes)
[  218.391226] ccp 0000:23:00.1: DEVRES REL 0000000058c9dce1 0xffff8cfaa528d880 pcim_msi_release (16 bytes)
[  218.421340] ccp 0000:23:00.1: DEVRES REL 00000000c8ab08a7 0xffff8cfa9e617300 msi_device_data_release (80 bytes)
[  218.452357] ccp 0000:23:00.1: DEVRES REL 00000000cf5baccb 0xffff8cfaa528d8c0 devm_attr_group_remove (8 bytes)
[  218.483011] ccp 0000:23:00.1: DEVRES REL 00000000b8cbbadd 0xffff8cfa9c596060 pcim_addr_resource_release (40 bytes)
[  218.514343] ccp 0000:23:00.1: DEVRES REL 00000000920f9607 0xffff8cfa9c596c60 pcim_iomap_release (48 bytes)
[  218.544659] ccp 0000:23:00.1: DEVRES REL 00000000d401a708 0xffff8cfa9c596840 pcim_addr_resource_release (40 bytes)
[  218.575774] ccp 0000:23:00.1: DEVRES REL 00000000865d2fa2 0xffff8cfaa528d940 pcim_release (0 bytes)
[  218.605758] ccp 0000:23:00.1: DEVRES REL 00000000f5b79222 0xffff8cfaa528d080 devm_kzalloc_release (20 bytes)
[  218.636260] ccp 0000:23:00.1: DEVRES REL 0000000037ef240a 0xffff8cfa9eeb3f00 devm_kzalloc_release (184 bytes)

and the CCP driver reload issue during driver probe:

[  226.552684] pci 0000:23:00.1: Resources present before probing
[  226.568846] pci 0000:a2:00.1: Resources present before probing

From the above DEVRES logging, it looks like pcim_intx_restore associated resource is being released but then
being re-added during detach/unload, which causes really_probe() to fail at probe time, as dev->devres_head is
not empty due to this added resource:
...
[  218.331000] ccp 0000:23:00.1: DEVRES REL 00000000fbd19618 0xffff8cfaa528d140 pcim_intx_restore (4 bytes)
[  218.361330] ccp 0000:23:00.1: DEVRES ADD 0000000057f8e767 0xffff8cfa867b7740 pcim_intx_restore (4 bytes)
...

Going more deep into this: 

This is the initial pcim_intx_resoure associated resource being added during first (CCP) driver load:

[   40.418933]  pcim_intx+0x3a/0x120
[   40.418936]  pci_intx+0x8b/0xa0
[   40.418939]  __pci_enable_msix_range+0x369/0x530
[   40.418943]  pci_enable_msix_range+0x18/0x20
[   40.418946]  sp_pci_probe+0x106/0x310 [ccp]
[   40.418965] ipmi device interface
[   40.418960]  ? srso_alias_return_thunk+0x5/0xfbef5
[   40.418969]  local_pci_probe+0x4f/0xb0
[   40.418973]  work_for_cpu_fn+0x1e/0x30
[   40.418976]  process_one_work+0x183/0x350
[   40.418980]  worker_thread+0x2df/0x3f0
[   40.418982]  ? __pfx_worker_thread+0x10/0x10
[   40.418985]  kthread+0xd0/0x100
[   40.418987]  ? __pfx_kthread+0x10/0x10
[   40.418990]  ret_from_fork+0x40/0x60
[   40.418993]  ? __pfx_kthread+0x10/0x10
[   40.418996]  ret_from_fork_asm+0x1a/0x30
[   40.419001]  </TASK>
..
..
[   40.419012] ccp 0000:23:00.1: DEVRES ADD 00000000fbd19618 0xffff8cfaa528d140 pcim_intx_restore (4 bytes)

Now, at driver unload: 
devres_release_all() -> remove_nodes() -> release_nodes() ...

remove_nodes() moves normal devres entries to the todo list, as can be seen with the following log:
...
[  218.245241] moving node 00000000fbd19618 0xffff8cfaa528d140 from devres to todo list
...

So, now this pcim_intx_resource associated resource is no longer part of dev->devres_head list and has been
moved to the todo list.

Later, when release_nodes() is invoked, it calls the associated release() callback associated with this devres:
...
[  218.331000] ccp 0000:23:00.1: DEVRES REL 00000000fbd19618 0xffff8cfaa528d140 pcim_intx_restore (4 bytes)
...

The call flow for that is:
pcim_intx_restore() -> pci_intx() -> pcim_intx() ...

Now, pcim_intx() calls get_or_create_intx_devres() which tries to find it's associated devres using devres_find(), but 
that fails to find the devres, as the devres is no longer on dev->devres_head and has been moved to todo list.

Therefore, get_or_create_intx_devres() adds a new devres at driver unload/detach time:
...
[  218.361330] ccp 0000:23:00.1: DEVRES ADD 0000000057f8e767 0xffff8cfa867b7740 pcim_intx_restore (4 bytes)
...

But, then this is an issue as pcim_intx() is supposed to restore the original PCI INTx state on driver detach, but it now
operating on a newly added devres and not the original devres (added at driver probe) which contains the original PCI INTx
state, so it will be restoring an incorrect PCI INTx state ?

Additionally, this newly added devres causes driver reload/probe failure as really_probe() now finds resources present
before probing.

Not sure, if this issue has been observed with other PCI device drivers.

Thanks,
Ashish
Philipp Stanner July 9, 2024, 7:21 a.m. UTC | #8
@Bjorn, @Krzysztof

On Mon, 2024-07-08 at 21:46 +0000, Ashish Kalra wrote:
> With this patch applied, we are observing unloading and then
> reloading issues with the AMD Crypto (CCP) driver:

Thank you very much for digging into this, Ashish

Could you give me some pointers how one could test CCP by himself?

> 
> with DEVRES logging enabled, we observe the following logs:
> 
> [  218.093588] ccp 0000:a2:00.1: DEVRES REL 00000000c18c52fb
> 0xffff8d09dc1972c0 devm_kzalloc_release (152 bytes)
> [  218.105527] ccp 0000:a2:00.1: DEVRES REL 000000003091fb95
> 0xffff8d09d3aad000 devm_kzalloc_release (3072 bytes)
> [  218.117500] ccp 0000:a2:00.1: DEVRES REL 0000000049e4adfe
> 0xffff8d09d588f000 pcim_intx_restore (4 bytes)
> [  218.129519] ccp 0000:a2:00.1: DEVRES ADD 000000001a2ac6ad
> 0xffff8cfa867b7cc0 pcim_intx_restore (4 bytes)
> [  218.140434] ccp 0000:a2:00.1: DEVRES REL 00000000627ecaf7
> 0xffff8d09d588f680 pcim_msi_release (16 bytes)
> [  218.151665] ccp 0000:a2:00.1: DEVRES REL 0000000058b2252a
> 0xffff8d09dc199680 msi_device_data_release (80 bytes)
> [  218.163625] ccp 0000:a2:00.1: DEVRES REL 00000000435cc85e
> 0xffff8d09d588ff80 devm_attr_group_remove (8 bytes)
> [  218.175224] ccp 0000:a2:00.1: DEVRES REL 00000000cb6fcd9b
> 0xffff8d09eb583660 pcim_addr_resource_release (40 bytes)
> [  218.187319] ccp 0000:a2:00.1: DEVRES REL 00000000d64a8b84
> 0xffff8d09eb583180 pcim_iomap_release (48 bytes)
> [  218.198615] ccp 0000:a2:00.1: DEVRES REL 0000000099ac6b28
> 0xffff8d09eb5830c0 pcim_addr_resource_release (40 bytes)
> [  218.210730] ccp 0000:a2:00.1: DEVRES REL 00000000bdd27f88
> 0xffff8d09d3ac2700 pcim_release (0 bytes)
> [  218.221489] ccp 0000:a2:00.1: DEVRES REL 00000000e763315c
> 0xffff8d09d3ac2240 devm_kzalloc_release (20 bytes)
> [  218.233008] ccp 0000:a2:00.1: DEVRES REL 00000000ae90f983
> 0xffff8d09dc25a800 devm_kzalloc_release (184 bytes)
> [  218.245251] ccp 0000:23:00.1: DEVRES REL 00000000a2ec0085
> 0xffff8cfa86bee700 fw_name_devm_release (16 bytes)
> [  218.256748] ccp 0000:23:00.1: DEVRES REL 0000000021bccd98
> 0xffff8cfaa528d5c0 devm_pages_release (16 bytes)
> [  218.268044] ccp 0000:23:00.1: DEVRES REL 000000003ef7cbc7
> 0xffff8cfaa1b5ec00 devm_kzalloc_release (104 bytes)
> [  218.279631] ccp 0000:23:00.1: DEVRES REL 00000000619322e1
> 0xffff8cfaa1b5e480 devm_kzalloc_release (152 bytes)
> [  218.300438] ccp 0000:23:00.1: DEVRES REL 00000000c261523b
> 0xffff8cfaad88b000 devm_kzalloc_release (3072 bytes)
> [  218.331000] ccp 0000:23:00.1: DEVRES REL 00000000fbd19618
> 0xffff8cfaa528d140 pcim_intx_restore (4 bytes)
> [  218.361330] ccp 0000:23:00.1: DEVRES ADD 0000000057f8e767
> 0xffff8cfa867b7740 pcim_intx_restore (4 bytes)
> [  218.391226] ccp 0000:23:00.1: DEVRES REL 0000000058c9dce1
> 0xffff8cfaa528d880 pcim_msi_release (16 bytes)
> [  218.421340] ccp 0000:23:00.1: DEVRES REL 00000000c8ab08a7
> 0xffff8cfa9e617300 msi_device_data_release (80 bytes)
> [  218.452357] ccp 0000:23:00.1: DEVRES REL 00000000cf5baccb
> 0xffff8cfaa528d8c0 devm_attr_group_remove (8 bytes)
> [  218.483011] ccp 0000:23:00.1: DEVRES REL 00000000b8cbbadd
> 0xffff8cfa9c596060 pcim_addr_resource_release (40 bytes)
> [  218.514343] ccp 0000:23:00.1: DEVRES REL 00000000920f9607
> 0xffff8cfa9c596c60 pcim_iomap_release (48 bytes)
> [  218.544659] ccp 0000:23:00.1: DEVRES REL 00000000d401a708
> 0xffff8cfa9c596840 pcim_addr_resource_release (40 bytes)
> [  218.575774] ccp 0000:23:00.1: DEVRES REL 00000000865d2fa2
> 0xffff8cfaa528d940 pcim_release (0 bytes)
> [  218.605758] ccp 0000:23:00.1: DEVRES REL 00000000f5b79222
> 0xffff8cfaa528d080 devm_kzalloc_release (20 bytes)
> [  218.636260] ccp 0000:23:00.1: DEVRES REL 0000000037ef240a
> 0xffff8cfa9eeb3f00 devm_kzalloc_release (184 bytes)
> 
> and the CCP driver reload issue during driver probe:
> 
> [  226.552684] pci 0000:23:00.1: Resources present before probing
> [  226.568846] pci 0000:a2:00.1: Resources present before probing
> 
> From the above DEVRES logging, it looks like pcim_intx_restore
> associated resource is being released but then
> being re-added during detach/unload, which causes really_probe() to
> fail at probe time, as dev->devres_head is
> not empty due to this added resource:
> ...
> [  218.331000] ccp 0000:23:00.1: DEVRES REL 00000000fbd19618
> 0xffff8cfaa528d140 pcim_intx_restore (4 bytes)
> [  218.361330] ccp 0000:23:00.1: DEVRES ADD 0000000057f8e767
> 0xffff8cfa867b7740 pcim_intx_restore (4 bytes)
> ...
> 
> Going more deep into this: 
> 
> This is the initial pcim_intx_resoure associated resource being added
> during first (CCP) driver load:
> 
> [   40.418933]  pcim_intx+0x3a/0x120
> [   40.418936]  pci_intx+0x8b/0xa0
> [   40.418939]  __pci_enable_msix_range+0x369/0x530
> [   40.418943]  pci_enable_msix_range+0x18/0x20
> [   40.418946]  sp_pci_probe+0x106/0x310 [ccp]
> [   40.418965] ipmi device interface
> [   40.418960]  ? srso_alias_return_thunk+0x5/0xfbef5
> [   40.418969]  local_pci_probe+0x4f/0xb0
> [   40.418973]  work_for_cpu_fn+0x1e/0x30
> [   40.418976]  process_one_work+0x183/0x350
> [   40.418980]  worker_thread+0x2df/0x3f0
> [   40.418982]  ? __pfx_worker_thread+0x10/0x10
> [   40.418985]  kthread+0xd0/0x100
> [   40.418987]  ? __pfx_kthread+0x10/0x10
> [   40.418990]  ret_from_fork+0x40/0x60
> [   40.418993]  ? __pfx_kthread+0x10/0x10
> [   40.418996]  ret_from_fork_asm+0x1a/0x30
> [   40.419001]  </TASK>
> ..
> ..
> [   40.419012] ccp 0000:23:00.1: DEVRES ADD 00000000fbd19618
> 0xffff8cfaa528d140 pcim_intx_restore (4 bytes)
> 
> Now, at driver unload: 
> devres_release_all() -> remove_nodes() -> release_nodes() ...
> 
> remove_nodes() moves normal devres entries to the todo list, as can
> be seen with the following log:
> ...
> [  218.245241] moving node 00000000fbd19618 0xffff8cfaa528d140 from
> devres to todo list
> ...
> 
> So, now this pcim_intx_resource associated resource is no longer part
> of dev->devres_head list and has been
> moved to the todo list.
> 
> Later, when release_nodes() is invoked, it calls the associated
> release() callback associated with this devres:
> ...
> [  218.331000] ccp 0000:23:00.1: DEVRES REL 00000000fbd19618
> 0xffff8cfaa528d140 pcim_intx_restore (4 bytes)
> ...
> 
> The call flow for that is:
> pcim_intx_restore() -> pci_intx() -> pcim_intx() ...
> 
> Now, pcim_intx() calls get_or_create_intx_devres() which tries to
> find it's associated devres using devres_find(), but 
> that fails to find the devres, as the devres is no longer on dev-
> >devres_head and has been moved to todo list.
> 
> Therefore, get_or_create_intx_devres() adds a new devres at driver
> unload/detach time:
> ...
> [  218.361330] ccp 0000:23:00.1: DEVRES ADD 0000000057f8e767
> 0xffff8cfa867b7740 pcim_intx_restore (4 bytes)
> ...

You're absolutely right, that seems to be the issue precisely. In fact,
this problem of PCI hybrid functions calling themselves again even
forced me to implement a "pure unmanaged" version of
__pci_request_region(). So it's a pity that I didn't think of that
problem for pci_intx().

> 
> But, then this is an issue as pcim_intx() is supposed to restore the
> original PCI INTx state on driver detach, but it now
> operating on a newly added devres and not the original devres (added
> at driver probe) which contains the original PCI INTx
> state, so it will be restoring an incorrect PCI INTx state ?

I think this is just UB and we don't have to think about whether it's
the correct state or not – it must only be restored once, so it's
broken in any case.

> 
> Additionally, this newly added devres causes driver reload/probe
> failure as really_probe() now finds resources present
> before probing.

Yes, that has to be separated.

@Bjorn:
So I think the solution will be not to call into pci_intx() from
pcim_intx_restore() at all anymore.

Similar as we do with __pci_request_region() <-> __pcim_request_region().

Let me dig into that..

I guess you'll prefer me to send a fixup commit to squash into the
pcim_intx() commit?

I'm quite busy today, but will definitely deliver that quite soon.

> 
> Not sure, if this issue has been observed with other PCI device
> drivers.

Everyone using pci_intx() AND pcim_enable_device() will have this
issue.

The only thing I'm wondering about is where your code in
drivers/crypto/ccp/ calls into pci_intx()?


Regards,
P.

> 
> Thanks,
> Ashish
>
Kalra, Ashish July 9, 2024, 8:12 a.m. UTC | #9
Hello Philipp,

On 7/9/2024 2:21 AM, Philipp Stanner wrote:
> @Bjorn, @Krzysztof
>
> On Mon, 2024-07-08 at 21:46 +0000, Ashish Kalra wrote:
>> With this patch applied, we are observing unloading and then
>> reloading issues with the AMD Crypto (CCP) driver:
> Thank you very much for digging into this, Ashish
>
> Could you give me some pointers how one could test CCP by himself?
>
>> with DEVRES logging enabled, we observe the following logs:
>>
>> [  218.093588] ccp 0000:a2:00.1: DEVRES REL 00000000c18c52fb
>> 0xffff8d09dc1972c0 devm_kzalloc_release (152 bytes)
>> [  218.105527] ccp 0000:a2:00.1: DEVRES REL 000000003091fb95
>> 0xffff8d09d3aad000 devm_kzalloc_release (3072 bytes)
>> [  218.117500] ccp 0000:a2:00.1: DEVRES REL 0000000049e4adfe
>> 0xffff8d09d588f000 pcim_intx_restore (4 bytes)
>> [  218.129519] ccp 0000:a2:00.1: DEVRES ADD 000000001a2ac6ad
>> 0xffff8cfa867b7cc0 pcim_intx_restore (4 bytes)
>> [  218.140434] ccp 0000:a2:00.1: DEVRES REL 00000000627ecaf7
>> 0xffff8d09d588f680 pcim_msi_release (16 bytes)
>> [  218.151665] ccp 0000:a2:00.1: DEVRES REL 0000000058b2252a
>> 0xffff8d09dc199680 msi_device_data_release (80 bytes)
>> [  218.163625] ccp 0000:a2:00.1: DEVRES REL 00000000435cc85e
>> 0xffff8d09d588ff80 devm_attr_group_remove (8 bytes)
>> [  218.175224] ccp 0000:a2:00.1: DEVRES REL 00000000cb6fcd9b
>> 0xffff8d09eb583660 pcim_addr_resource_release (40 bytes)
>> [  218.187319] ccp 0000:a2:00.1: DEVRES REL 00000000d64a8b84
>> 0xffff8d09eb583180 pcim_iomap_release (48 bytes)
>> [  218.198615] ccp 0000:a2:00.1: DEVRES REL 0000000099ac6b28
>> 0xffff8d09eb5830c0 pcim_addr_resource_release (40 bytes)
>> [  218.210730] ccp 0000:a2:00.1: DEVRES REL 00000000bdd27f88
>> 0xffff8d09d3ac2700 pcim_release (0 bytes)
>> [  218.221489] ccp 0000:a2:00.1: DEVRES REL 00000000e763315c
>> 0xffff8d09d3ac2240 devm_kzalloc_release (20 bytes)
>> [  218.233008] ccp 0000:a2:00.1: DEVRES REL 00000000ae90f983
>> 0xffff8d09dc25a800 devm_kzalloc_release (184 bytes)
>> [  218.245251] ccp 0000:23:00.1: DEVRES REL 00000000a2ec0085
>> 0xffff8cfa86bee700 fw_name_devm_release (16 bytes)
>> [  218.256748] ccp 0000:23:00.1: DEVRES REL 0000000021bccd98
>> 0xffff8cfaa528d5c0 devm_pages_release (16 bytes)
>> [  218.268044] ccp 0000:23:00.1: DEVRES REL 000000003ef7cbc7
>> 0xffff8cfaa1b5ec00 devm_kzalloc_release (104 bytes)
>> [  218.279631] ccp 0000:23:00.1: DEVRES REL 00000000619322e1
>> 0xffff8cfaa1b5e480 devm_kzalloc_release (152 bytes)
>> [  218.300438] ccp 0000:23:00.1: DEVRES REL 00000000c261523b
>> 0xffff8cfaad88b000 devm_kzalloc_release (3072 bytes)
>> [  218.331000] ccp 0000:23:00.1: DEVRES REL 00000000fbd19618
>> 0xffff8cfaa528d140 pcim_intx_restore (4 bytes)
>> [  218.361330] ccp 0000:23:00.1: DEVRES ADD 0000000057f8e767
>> 0xffff8cfa867b7740 pcim_intx_restore (4 bytes)
>> [  218.391226] ccp 0000:23:00.1: DEVRES REL 0000000058c9dce1
>> 0xffff8cfaa528d880 pcim_msi_release (16 bytes)
>> [  218.421340] ccp 0000:23:00.1: DEVRES REL 00000000c8ab08a7
>> 0xffff8cfa9e617300 msi_device_data_release (80 bytes)
>> [  218.452357] ccp 0000:23:00.1: DEVRES REL 00000000cf5baccb
>> 0xffff8cfaa528d8c0 devm_attr_group_remove (8 bytes)
>> [  218.483011] ccp 0000:23:00.1: DEVRES REL 00000000b8cbbadd
>> 0xffff8cfa9c596060 pcim_addr_resource_release (40 bytes)
>> [  218.514343] ccp 0000:23:00.1: DEVRES REL 00000000920f9607
>> 0xffff8cfa9c596c60 pcim_iomap_release (48 bytes)
>> [  218.544659] ccp 0000:23:00.1: DEVRES REL 00000000d401a708
>> 0xffff8cfa9c596840 pcim_addr_resource_release (40 bytes)
>> [  218.575774] ccp 0000:23:00.1: DEVRES REL 00000000865d2fa2
>> 0xffff8cfaa528d940 pcim_release (0 bytes)
>> [  218.605758] ccp 0000:23:00.1: DEVRES REL 00000000f5b79222
>> 0xffff8cfaa528d080 devm_kzalloc_release (20 bytes)
>> [  218.636260] ccp 0000:23:00.1: DEVRES REL 0000000037ef240a
>> 0xffff8cfa9eeb3f00 devm_kzalloc_release (184 bytes)
>>
>> and the CCP driver reload issue during driver probe:
>>
>> [  226.552684] pci 0000:23:00.1: Resources present before probing
>> [  226.568846] pci 0000:a2:00.1: Resources present before probing
>>
>> From the above DEVRES logging, it looks like pcim_intx_restore
>> associated resource is being released but then
>> being re-added during detach/unload, which causes really_probe() to
>> fail at probe time, as dev->devres_head is
>> not empty due to this added resource:
>> ...
>> [  218.331000] ccp 0000:23:00.1: DEVRES REL 00000000fbd19618
>> 0xffff8cfaa528d140 pcim_intx_restore (4 bytes)
>> [  218.361330] ccp 0000:23:00.1: DEVRES ADD 0000000057f8e767
>> 0xffff8cfa867b7740 pcim_intx_restore (4 bytes)
>> ...
>>
>> Going more deep into this: 
>>
>> This is the initial pcim_intx_resoure associated resource being added
>> during first (CCP) driver load:
>>
>> [   40.418933]  pcim_intx+0x3a/0x120
>> [   40.418936]  pci_intx+0x8b/0xa0
>> [   40.418939]  __pci_enable_msix_range+0x369/0x530
>> [   40.418943]  pci_enable_msix_range+0x18/0x20
>> [   40.418946]  sp_pci_probe+0x106/0x310 [ccp]
>> [   40.418965] ipmi device interface
>> [   40.418960]  ? srso_alias_return_thunk+0x5/0xfbef5
>> [   40.418969]  local_pci_probe+0x4f/0xb0
>> [   40.418973]  work_for_cpu_fn+0x1e/0x30
>> [   40.418976]  process_one_work+0x183/0x350
>> [   40.418980]  worker_thread+0x2df/0x3f0
>> [   40.418982]  ? __pfx_worker_thread+0x10/0x10
>> [   40.418985]  kthread+0xd0/0x100
>> [   40.418987]  ? __pfx_kthread+0x10/0x10
>> [   40.418990]  ret_from_fork+0x40/0x60
>> [   40.418993]  ? __pfx_kthread+0x10/0x10
>> [   40.418996]  ret_from_fork_asm+0x1a/0x30
>> [   40.419001]  </TASK>
>> ..
>> ..
>> [   40.419012] ccp 0000:23:00.1: DEVRES ADD 00000000fbd19618
>> 0xffff8cfaa528d140 pcim_intx_restore (4 bytes)
>>
>> Now, at driver unload: 
>> devres_release_all() -> remove_nodes() -> release_nodes() ...
>>
>> remove_nodes() moves normal devres entries to the todo list, as can
>> be seen with the following log:
>> ...
>> [  218.245241] moving node 00000000fbd19618 0xffff8cfaa528d140 from
>> devres to todo list
>> ...
>>
>> So, now this pcim_intx_resource associated resource is no longer part
>> of dev->devres_head list and has been
>> moved to the todo list.
>>
>> Later, when release_nodes() is invoked, it calls the associated
>> release() callback associated with this devres:
>> ...
>> [  218.331000] ccp 0000:23:00.1: DEVRES REL 00000000fbd19618
>> 0xffff8cfaa528d140 pcim_intx_restore (4 bytes)
>> ...
>>
>> The call flow for that is:
>> pcim_intx_restore() -> pci_intx() -> pcim_intx() ...
>>
>> Now, pcim_intx() calls get_or_create_intx_devres() which tries to
>> find it's associated devres using devres_find(), but 
>> that fails to find the devres, as the devres is no longer on dev-
>>> devres_head and has been moved to todo list.
>> Therefore, get_or_create_intx_devres() adds a new devres at driver
>> unload/detach time:
>> ...
>> [  218.361330] ccp 0000:23:00.1: DEVRES ADD 0000000057f8e767
>> 0xffff8cfa867b7740 pcim_intx_restore (4 bytes)
>> ...
> You're absolutely right, that seems to be the issue precisely. In fact,
> this problem of PCI hybrid functions calling themselves again even
> forced me to implement a "pure unmanaged" version of
> __pci_request_region(). So it's a pity that I didn't think of that
> problem for pci_intx().
>
>> But, then this is an issue as pcim_intx() is supposed to restore the
>> original PCI INTx state on driver detach, but it now
>> operating on a newly added devres and not the original devres (added
>> at driver probe) which contains the original PCI INTx
>> state, so it will be restoring an incorrect PCI INTx state ?
> I think this is just UB and we don't have to think about whether it's
> the correct state or not – it must only be restored once, so it's
> broken in any case.
>
>> Additionally, this newly added devres causes driver reload/probe
>> failure as really_probe() now finds resources present
>> before probing.
> Yes, that has to be separated.
>
> @Bjorn:
> So I think the solution will be not to call into pci_intx() from
> pcim_intx_restore() at all anymore.
>
> Similar as we do with __pci_request_region() <-> __pcim_request_region().
>
> Let me dig into that..
>
> I guess you'll prefer me to send a fixup commit to squash into the
> pcim_intx() commit?
>
> I'm quite busy today, but will definitely deliver that quite soon.
>
>> Not sure, if this issue has been observed with other PCI device
>> drivers.
> Everyone using pci_intx() AND pcim_enable_device() will have this
> issue.
>
> The only thing I'm wondering about is where your code in
> drivers/crypto/ccp/ calls into pci_intx()?
>
Actually, the CCP driver does not explicitly call into pci_intx(), the flow to pci_intx() from CCP driver probe is as follows:

[   40.418933]  pcim_intx+0x3a/0x120
[   40.418936]  pci_intx+0x8b/0xa0
[   40.418939]  __pci_enable_msix_range+0x369/0x530
[   40.418943]  pci_enable_msix_range+0x18/0x20
[   40.418946]  sp_pci_probe+0x106/0x310 [ccp]
[   40.418965] ipmi device interface
[   40.418960]  ? srso_alias_return_thunk+0x5/0xfbef5
[   40.418969]  local_pci_probe+0x4f/0xb0

And obviously, pci_intx()->pcim_intx() get invoked due to pcim_intx_restore() callback being invoked during sp_pci_exit() code path, as below:

[  218.128978]  pcim_intx+0x3a/0x120
[  218.128986]  ? srso_alias_return_thunk+0x5/0xfbef5
[  218.128999]  pci_intx+0x8b/0xa0
[  218.129010]  pcim_intx_restore+0x1b/0x30
[  218.129019]  release_nodes+0x65/0x90
[  218.129031]  devres_release_all+0x9f/0xe0
[  218.129043]  device_unbind_cleanup+0x12/0x80
[  218.129055]  device_release_driver_internal+0x20c/0x250
[  218.129065]  ? srso_alias_return_thunk+0x5/0xfbef5
[  218.129078]  driver_detach+0x4f/0xa0
[  218.129091]  bus_remove_driver+0x87/0x100
[  218.129101]  driver_unregister+0x35/0x60
[  218.129113]  pci_unregister_driver+0x44/0xa0
[  218.129123]  sp_pci_exit+0x19/0x20 [ccp]
[  218.129137]  sp_mod_exit+0x12/0x18 [ccp]

...

Basically, CCP driver calls pci_enable_msix_range() and looking at this function:

pci_enable_msix_range() -> __pci_enable_msix_range() -> msix_capability_init().

And, msix_capability_init() -> pci_intx_for_msi() -> pci_intx().

So it looks like drivers using MSI-X/MSI (using API such as pci_enable_msix_range()) and pcim_enable_device() will get into this issue without explicitly calling into pci_intx().

Thanks, Ashish

> Regards,
> P.
>
>> Thanks,
>> Ashish
>>
Philipp Stanner July 9, 2024, 8:56 a.m. UTC | #10
From c24bd5b66e798a341caf183fb7cdbdf235502d90 Mon Sep 17 00:00:00 2001
From: Philipp Stanner <pstanner@redhat.com>
Date: Tue, 9 Jul 2024 09:45:48 +0200
Subject: [PATCH] PCI: Fix pcim_intx() recursive calls

pci_intx() calls into pcim_intx() in managed mode, i.e., when
pcim_enable_device() had been called. This recursive call causes a bug
by re-registering the device resource in the release callback.

This is the same phenomenon that made it necessary to implement some
functionality a second time, see __pcim_request_region().

Implement __pcim_intx() to bypass the hybrid nature of pci_intx() on
driver detach.

Fixes: https://lore.kernel.org/all/20240708214656.4721-1-Ashish.Kalra@amd.com/
Reported-by: Ashish Kalra <Ashish.Kalra@amd.com>
Signed-off-by: Philipp Stanner <pstanner@redhat.com>
---
Hi Ashish,
I hacked down this fix that should be applyable on top.
Could you maybe have a first quick look whether this fixes the issue?
---
 drivers/pci/devres.c | 33 +++++++++++++++++++++------------
 1 file changed, 21 insertions(+), 12 deletions(-)

diff --git a/drivers/pci/devres.c b/drivers/pci/devres.c
index 2f0379a4e58f..dcef049b72fe 100644
--- a/drivers/pci/devres.c
+++ b/drivers/pci/devres.c
@@ -408,12 +408,31 @@ static inline bool mask_contains_bar(int mask, int bar)
 	return mask & BIT(bar);
 }
 
+/*
+ * This is a copy of pci_intx() used to bypass the problem of occuring
+ * recursive function calls due to the hybrid nature of pci_intx().
+ */
+static void __pcim_intx(struct pci_dev *pdev, int enable)
+{
+	u16 pci_command, new;
+
+	pci_read_config_word(pdev, PCI_COMMAND, &pci_command);
+
+	if (enable)
+		new = pci_command & ~PCI_COMMAND_INTX_DISABLE;
+	else
+		new = pci_command | PCI_COMMAND_INTX_DISABLE;
+
+	if (new != pci_command)
+		pci_write_config_word(pdev, PCI_COMMAND, new);
+}
+
 static void pcim_intx_restore(struct device *dev, void *data)
 {
 	struct pci_dev *pdev = to_pci_dev(dev);
 	struct pcim_intx_devres *res = data;
 
-	pci_intx(pdev, res->orig_intx);
+	__pcim_intx(pdev, res->orig_intx);
 }
 
 static struct pcim_intx_devres *get_or_create_intx_devres(struct device *dev)
@@ -443,7 +462,6 @@ static struct pcim_intx_devres *get_or_create_intx_devres(struct device *dev)
  */
 int pcim_intx(struct pci_dev *pdev, int enable)
 {
-	u16 pci_command, new;
 	struct pcim_intx_devres *res;
 
 	res = get_or_create_intx_devres(&pdev->dev);
@@ -451,16 +469,7 @@ int pcim_intx(struct pci_dev *pdev, int enable)
 		return -ENOMEM;
 
 	res->orig_intx = !enable;
-
-	pci_read_config_word(pdev, PCI_COMMAND, &pci_command);
-
-	if (enable)
-		new = pci_command & ~PCI_COMMAND_INTX_DISABLE;
-	else
-		new = pci_command | PCI_COMMAND_INTX_DISABLE;
-
-	if (new != pci_command)
-		pci_write_config_word(pdev, PCI_COMMAND, new);
+	__pcim_intx(pdev, enable);
 
 	return 0;
 }
Kalra, Ashish July 9, 2024, 6:46 p.m. UTC | #11
Hello Philipp,

I have reviewed and tested this patch, looks to be working fine and fixes the issue.

Thanks, Ashish

On 7/9/2024 3:56 AM, Philipp Stanner wrote:
> From c24bd5b66e798a341caf183fb7cdbdf235502d90 Mon Sep 17 00:00:00 2001
> From: Philipp Stanner <pstanner@redhat.com>
> Date: Tue, 9 Jul 2024 09:45:48 +0200
> Subject: [PATCH] PCI: Fix pcim_intx() recursive calls
>
> pci_intx() calls into pcim_intx() in managed mode, i.e., when
> pcim_enable_device() had been called. This recursive call causes a bug
> by re-registering the device resource in the release callback.
>
> This is the same phenomenon that made it necessary to implement some
> functionality a second time, see __pcim_request_region().
>
> Implement __pcim_intx() to bypass the hybrid nature of pci_intx() on
> driver detach.
>
> Fixes: https://lore.kernel.org/all/20240708214656.4721-1-Ashish.Kalra@amd.com/
> Reported-by: Ashish Kalra <Ashish.Kalra@amd.com>
> Signed-off-by: Philipp Stanner <pstanner@redhat.com>
> ---
> Hi Ashish,
> I hacked down this fix that should be applyable on top.
> Could you maybe have a first quick look whether this fixes the issue?
> ---
>  drivers/pci/devres.c | 33 +++++++++++++++++++++------------
>  1 file changed, 21 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/pci/devres.c b/drivers/pci/devres.c
> index 2f0379a4e58f..dcef049b72fe 100644
> --- a/drivers/pci/devres.c
> +++ b/drivers/pci/devres.c
> @@ -408,12 +408,31 @@ static inline bool mask_contains_bar(int mask, int bar)
>  	return mask & BIT(bar);
>  }
>  
> +/*
> + * This is a copy of pci_intx() used to bypass the problem of occuring
> + * recursive function calls due to the hybrid nature of pci_intx().
> + */
> +static void __pcim_intx(struct pci_dev *pdev, int enable)
> +{
> +	u16 pci_command, new;
> +
> +	pci_read_config_word(pdev, PCI_COMMAND, &pci_command);
> +
> +	if (enable)
> +		new = pci_command & ~PCI_COMMAND_INTX_DISABLE;
> +	else
> +		new = pci_command | PCI_COMMAND_INTX_DISABLE;
> +
> +	if (new != pci_command)
> +		pci_write_config_word(pdev, PCI_COMMAND, new);
> +}
> +
>  static void pcim_intx_restore(struct device *dev, void *data)
>  {
>  	struct pci_dev *pdev = to_pci_dev(dev);
>  	struct pcim_intx_devres *res = data;
>  
> -	pci_intx(pdev, res->orig_intx);
> +	__pcim_intx(pdev, res->orig_intx);
>  }
>  
>  static struct pcim_intx_devres *get_or_create_intx_devres(struct device *dev)
> @@ -443,7 +462,6 @@ static struct pcim_intx_devres *get_or_create_intx_devres(struct device *dev)
>   */
>  int pcim_intx(struct pci_dev *pdev, int enable)
>  {
> -	u16 pci_command, new;
>  	struct pcim_intx_devres *res;
>  
>  	res = get_or_create_intx_devres(&pdev->dev);
> @@ -451,16 +469,7 @@ int pcim_intx(struct pci_dev *pdev, int enable)
>  		return -ENOMEM;
>  
>  	res->orig_intx = !enable;
> -
> -	pci_read_config_word(pdev, PCI_COMMAND, &pci_command);
> -
> -	if (enable)
> -		new = pci_command & ~PCI_COMMAND_INTX_DISABLE;
> -	else
> -		new = pci_command | PCI_COMMAND_INTX_DISABLE;
> -
> -	if (new != pci_command)
> -		pci_write_config_word(pdev, PCI_COMMAND, new);
> +	__pcim_intx(pdev, enable);
>  
>  	return 0;
>  }
Tested-by: Ashish Kalra <Ashish.Kalra@amd.com>
Krzysztof Wilczyński July 10, 2024, 4:08 a.m. UTC | #12
Hello Ashish and Philipp,

> I have reviewed and tested this patch, looks to be working fine and fixes the issue.

Great news!

Ashish, thank you for taking the time to report the problem and then also
testing the fix.  Much appreciated.

Philipp, I will take this patch and squash into the series you have sent
earlier, since the changes are not yet part of the mainline.

	Krzysztof
Krzysztof Wilczyński July 10, 2024, 4:43 a.m. UTC | #13
[...]
> pci_intx() calls into pcim_intx() in managed mode, i.e., when
> pcim_enable_device() had been called. This recursive call causes a bug
> by re-registering the device resource in the release callback.
> 
> This is the same phenomenon that made it necessary to implement some
> functionality a second time, see __pcim_request_region().
> 
> Implement __pcim_intx() to bypass the hybrid nature of pci_intx() on
> driver detach.

Squashed against devres branch, thank you!  See:

  https://git.kernel.org/pub/scm/linux/kernel/git/pci/pci.git/commit/?h=devres&id=37c9f6c55cfd63a9e38a98b5aa1d7da75845c2b2

To credit Ashish, I kept the Fixes: and Tested-by: tags.

	Krzysztof
diff mbox series

Patch

diff --git a/drivers/pci/devres.c b/drivers/pci/devres.c
index e8de93e95eb6..7b72c952a9e5 100644
--- a/drivers/pci/devres.c
+++ b/drivers/pci/devres.c
@@ -42,6 +42,11 @@  struct pcim_iomap_devres {
 	void __iomem *table[PCI_STD_NUM_BARS];
 };
 
+/* Used to restore the old intx state on driver detach. */
+struct pcim_intx_devres {
+	int orig_intx;
+};
+
 enum pcim_addr_devres_type {
 	/* Default initializer. */
 	PCIM_ADDR_DEVRES_TYPE_INVALID,
@@ -397,32 +402,75 @@  int pcim_set_mwi(struct pci_dev *pdev)
 }
 EXPORT_SYMBOL(pcim_set_mwi);
 
+
 static inline bool mask_contains_bar(int mask, int bar)
 {
 	return mask & BIT(bar);
 }
 
-static void pcim_release(struct device *gendev, void *res)
+static void pcim_intx_restore(struct device *dev, void *data)
 {
-	struct pci_dev *dev = to_pci_dev(gendev);
-	struct pci_devres *this = res;
+	struct pci_dev *pdev = to_pci_dev(dev);
+	struct pcim_intx_devres *res = data;
 
-	if (this->restore_intx)
-		pci_intx(dev, this->orig_intx);
+	pci_intx(pdev, res->orig_intx);
+}
 
-	if (pci_is_enabled(dev) && !dev->pinned)
-		pci_disable_device(dev);
+static struct pcim_intx_devres *get_or_create_intx_devres(struct device *dev)
+{
+	struct pcim_intx_devres *res;
+
+	res = devres_find(dev, pcim_intx_restore, NULL, NULL);
+	if (res)
+		return res;
+
+	res = devres_alloc(pcim_intx_restore, sizeof(*res), GFP_KERNEL);
+	if (res)
+		devres_add(dev, res);
+
+	return res;
 }
 
-/*
- * TODO: After the last four callers in pci.c are ported, find_pci_dr()
- * needs to be made static again.
+/**
+ * pcim_intx - managed pci_intx()
+ * @pdev: the PCI device to operate on
+ * @enable: boolean: whether to enable or disable PCI INTx
+ *
+ * Returns: 0 on success, -ENOMEM on error.
+ *
+ * Enables/disables PCI INTx for device @pdev.
+ * Restores the original state on driver detach.
  */
-struct pci_devres *find_pci_dr(struct pci_dev *pdev)
+int pcim_intx(struct pci_dev *pdev, int enable)
 {
-	if (pci_is_managed(pdev))
-		return devres_find(&pdev->dev, pcim_release, NULL, NULL);
-	return NULL;
+	u16 pci_command, new;
+	struct pcim_intx_devres *res;
+
+	res = get_or_create_intx_devres(&pdev->dev);
+	if (!res)
+		return -ENOMEM;
+
+	res->orig_intx = !enable;
+
+	pci_read_config_word(pdev, PCI_COMMAND, &pci_command);
+
+	if (enable)
+		new = pci_command & ~PCI_COMMAND_INTX_DISABLE;
+	else
+		new = pci_command | PCI_COMMAND_INTX_DISABLE;
+
+	if (new != pci_command)
+		pci_write_config_word(pdev, PCI_COMMAND, new);
+
+	return 0;
+}
+
+static void pcim_release(struct device *gendev, void *res)
+{
+	struct pci_dev *dev = to_pci_dev(gendev);
+
+	if (pci_is_enabled(dev) && !dev->pinned)
+		pci_disable_device(dev);
 }
 
 static struct pci_devres *get_pci_dr(struct pci_dev *pdev)
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index db2cc48f3d63..1b4832a60047 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4443,6 +4443,16 @@  void pci_intx(struct pci_dev *pdev, int enable)
 {
 	u16 pci_command, new;
 
+	/*
+	 * This is done for backwards compatibility, because the old PCI devres
+	 * API had a mode in which this function became managed if the dev had
+	 * been enabled with pcim_enable_device() instead of pci_enable_device().
+	 */
+	if (pci_is_managed(pdev)) {
+		WARN_ON_ONCE(pcim_intx(pdev, enable) != 0);
+		return;
+	}
+
 	pci_read_config_word(pdev, PCI_COMMAND, &pci_command);
 
 	if (enable)
@@ -4450,17 +4460,8 @@  void pci_intx(struct pci_dev *pdev, int enable)
 	else
 		new = pci_command | PCI_COMMAND_INTX_DISABLE;
 
-	if (new != pci_command) {
-		struct pci_devres *dr;
-
+	if (new != pci_command)
 		pci_write_config_word(pdev, PCI_COMMAND, new);
-
-		dr = find_pci_dr(pdev);
-		if (dr && !dr->restore_intx) {
-			dr->restore_intx = 1;
-			dr->orig_intx = !enable;
-		}
-	}
 }
 EXPORT_SYMBOL_GPL(pci_intx);
 
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index c355bb6a698d..9e87528f1157 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -816,16 +816,17 @@  static inline pci_power_t mid_pci_get_power_state(struct pci_dev *pdev)
  * there's no need to track it separately.  pci_devres is initialized
  * when a device is enabled using managed PCI device enable interface.
  *
- * TODO: Struct pci_devres and find_pci_dr() only need to be here because
- * they're used in pci.c.  Port or move these functions to devres.c and
- * then remove them from here.
+ * TODO: Struct pci_devres only needs to be here because they're used in pci.c.
+ * Port or move these functions to devres.c and then remove them from here.
  */
 struct pci_devres {
-	unsigned int orig_intx:1;
-	unsigned int restore_intx:1;
+	/*
+	 * TODO:
+	 * This struct is now surplus. Remove it by refactoring pci/devres.c
+	 */
 };
 
-struct pci_devres *find_pci_dr(struct pci_dev *pdev);
+int pcim_intx(struct pci_dev *dev, int enable);
 
 int pcim_request_region(struct pci_dev *pdev, int bar, const char *name);
 int pcim_request_region_exclusive(struct pci_dev *pdev, int bar, const char *name);