[SRU,J/L,0/1] Fix suspend hang on Lenovo workstation

Message ID 20230906053224.113652-1-aaron.ma@canonical.com

Message

Aaron Ma Sept. 6, 2023, 5:32 a.m. UTC
BugLink: https://bugs.launchpad.net/bugs/2034479

[Impact]
The system hangs after resuming from S3.
Error logs:
kernel: igb 0000:02:00.0: disabling already-disabled device
kernel: WARNING: CPU: 0 PID: 277 at drivers/pci/pci.c:2248 pci_disable_device+0xc4/0xf0
kernel: RIP: 0010:pci_disable_device+0xc4/0xf0
kernel: Call Trace:
kernel: <TASK>
kernel: igb_io_error_detected+0x3e/0x60
kernel: report_error_detected+0xd6/0x1c0
kernel: ? __pfx_report_normal_detected+0x10/0x10
kernel: report_normal_detected+0x16/0x30
kernel: pci_walk_bus+0x74/0xa0
kernel: pcie_do_recovery+0xb9/0x340
kernel: ? __pfx_aer_root_reset+0x10/0x10
kernel: aer_process_err_devices+0x168/0x220
kernel: aer_isr+0x1b5/0x1e0
kernel: ? __pfx_irq_thread_fn+0x10/0x10
kernel: irq_thread_fn+0x21/0x70
kernel: irq_thread+0xf8/0x1c0
kernel: ? __pfx_irq_thread_dtor+0x10/0x10
kernel: ? __pfx_irq_thread+0x10/0x10
kernel: kthread+0xef/0x120
kernel: ? __pfx_kthread+0x10/0x10
kernel: ret_from_fork+0x29/0x50
kernel: </TASK>
kernel: ---[ end trace 0000000000000000 ]---

[Fix]
The driver's PCI error_detected callback (igb_io_error_detected) is called
before the driver has resumed. Avoid this race condition to fix the issue.
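
For illustration only, the race can be modelled in user space. The sketch
below is not the upstream patch and not kernel code: every toy_* name, both
enums and the "guarded" flag are hypothetical. It assumes the fix is an early
return when the reported channel state is normal, which matches the
report_normal_detected frame in the trace above but is not spelled out in
this cover letter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for the kernel's PCI channel states and
 * error-recovery results. This is a user-space toy, not kernel code. */
enum toy_channel_state { TOY_IO_NORMAL, TOY_IO_FROZEN, TOY_IO_PERM_FAILURE };
enum toy_ers_result { TOY_CAN_RECOVER, TOY_NEED_RESET, TOY_DISCONNECT };

struct toy_device {
	bool enabled;        /* false while the driver has not resumed yet */
	int bogus_disables;  /* counts "disabling already-disabled device" */
};

/* Models the warning in the log: disabling a device that is already
 * disabled is recorded as a bogus disable instead of being a no-op. */
static void toy_disable_device(struct toy_device *dev)
{
	assert(dev != NULL);
	if (!dev->enabled) {
		dev->bogus_disables++;
		fprintf(stderr, "toy: disabling already-disabled device\n");
		return;
	}
	dev->enabled = false;
}

/* Models the error_detected callback. With guarded == true, a report
 * with a normal channel state returns early and leaves the device alone
 * (the assumed shape of the fix). */
static enum toy_ers_result toy_error_detected(struct toy_device *dev,
					      enum toy_channel_state state,
					      bool guarded)
{
	assert(dev != NULL);
	if (guarded && state == TOY_IO_NORMAL)
		return TOY_CAN_RECOVER;
	if (state == TOY_IO_PERM_FAILURE)
		return TOY_DISCONNECT;
	toy_disable_device(dev);
	return TOY_NEED_RESET;
}
```

Calling toy_error_detected() on a device whose enabled flag is still false,
with TOY_IO_NORMAL and guarded == false, increments bogus_disables. The same
call with guarded == true leaves the device untouched.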

[Test]
Tested on hardware; the system suspends and resumes correctly.

[Where problems could occur]
It may break the igb driver.

The commit is included in 6.5-rc1 and was merged into oem-6.1 through a
stable update; this SRU targets Jammy and Lunar.

Ying Hsu (1):
  igb: Fix igb_down hung on surprise removal

 drivers/net/ethernet/intel/igb/igb_main.c | 5 +++++
 1 file changed, 5 insertions(+)

Comments

Tim Gardner Sept. 6, 2023, 2:29 p.m. UTC | #1
On 9/5/23 11:32 PM, Aaron Ma wrote:
Acked-by: Tim Gardner <tim.gardner@canonical.com>
Roxana Nicolescu Sept. 8, 2023, 8:29 a.m. UTC | #2
On 06-09-2023 07:32, Aaron Ma wrote:
Acked-by: Roxana Nicolescu <roxana.nicolescu@canonical.com>
Roxana Nicolescu Sept. 20, 2023, 9:02 a.m. UTC | #3
On 06/09/2023 07:32, Aaron Ma wrote:
Applied to lunar,jammy:master-next. Thanks!