Message ID | 20230620033659.136024-1-acelan.kao@canonical.com |
---|---|
Series | A deadlock issue in scsi rescan task while resuming from S3 |
On 20.06.23 05:36, AceLan Kao wrote:
> From: "Chia-Lin Kao (AceLan)" <acelan.kao@canonical.com>
>
> BugLink: https://launchpad.net/bugs/2018566
                   ^ bugs.launchpad.net
>
> [Impact]
> During the S3 stress test, the system sometimes hangs when resuming. This
> is due to the SCSI rescan task being unable to acquire the mutex lock
> during the resumption from S3. The mutex lock has already been acquired by
> EH, which is waiting for the device to be ready for a rescan. Unfortunately,
> the mutex lock is never released by either party, leading to a deadlock.
>
> [Fix]
> Kaiheng submitted a patch to fix this issue which defers the rescan if the
> disk is still suspended so the resume process of the disk device can proceed.
> https://patchwork.ozlabs.org/project/linux-ide/patch/20230502150435.423770-2-kai.heng.feng@canonical.com/
>
> Since the patch has not been accepted upstream yet, submit it to the OEM kernel for now.

This is no longer true. The submitted patch is upstream as of v6.4-rc7.
Updating old justifications might help to convince others to look at this
more favorably.

>
> A similar patch has been included in v6.4-rc7; backport this to
> generic Ubuntu kernels.
> 6aa0365a3c85 ata: libata-scsi: Avoid deadlock on rescan after device resume
>
> [Test]
> Verified on the machines by me and ODM.
>
> [Where problems could occur]
> It only defers the rescan task, and should not have any impact on current systems.
>
> Damien Le Moal (1):
>   ata: libata-scsi: Avoid deadlock on rescan after device resume
>
>  drivers/ata/libata-core.c |  3 ++-
>  drivers/ata/libata-eh.c   |  2 +-
>  drivers/ata/libata-scsi.c | 22 +++++++++++++++++++++-
>  include/linux/libata.h    |  2 +-
>  4 files changed, 25 insertions(+), 4 deletions(-)
>

Anyhow, the submitted patch appears to be identical to upstream.

Acked-by: Stefan Bader <stefan.bader@canonical.com>

On 6/19/23 9:36 PM, AceLan Kao wrote:
> From: "Chia-Lin Kao (AceLan)" <acelan.kao@canonical.com>
>
> BugLink: https://launchpad.net/bugs/2018566
>
> [Impact]
> During the S3 stress test, the system sometimes hangs when resuming. This
> is due to the SCSI rescan task being unable to acquire the mutex lock
> during the resumption from S3. The mutex lock has already been acquired by
> EH, which is waiting for the device to be ready for a rescan. Unfortunately,
> the mutex lock is never released by either party, leading to a deadlock.
>
> [Fix]
> Kaiheng submitted a patch to fix this issue which defers the rescan if the
> disk is still suspended so the resume process of the disk device can proceed.
> https://patchwork.ozlabs.org/project/linux-ide/patch/20230502150435.423770-2-kai.heng.feng@canonical.com/
>
> Since the patch has not been accepted upstream yet, submit it to the OEM kernel for now.
>
> A similar patch has been included in v6.4-rc7; backport this to
> generic Ubuntu kernels.
> 6aa0365a3c85 ata: libata-scsi: Avoid deadlock on rescan after device resume
>
> [Test]
> Verified on the machines by me and ODM.
>
> [Where problems could occur]
> It only defers the rescan task, and should not have any impact on current systems.
>
> Damien Le Moal (1):
>   ata: libata-scsi: Avoid deadlock on rescan after device resume
>
>  drivers/ata/libata-core.c |  3 ++-
>  drivers/ata/libata-eh.c   |  2 +-
>  drivers/ata/libata-scsi.c | 22 +++++++++++++++++++++-
>  include/linux/libata.h    |  2 +-
>  4 files changed, 25 insertions(+), 4 deletions(-)
>

Acked-by: Tim Gardner <tim.gardner@canonical.com>

On 23/06/20 09:53AM, Stefan Bader wrote:
> On 20.06.23 05:36, AceLan Kao wrote:
> > From: "Chia-Lin Kao (AceLan)" <acelan.kao@canonical.com>
> >
> > BugLink: https://launchpad.net/bugs/2018566
>                    ^ bugs.launchpad.net
> >
> > [Impact]
> > During the S3 stress test, the system sometimes hangs when resuming. This
> > is due to the SCSI rescan task being unable to acquire the mutex lock
> > during the resumption from S3. The mutex lock has already been acquired by
> > EH, which is waiting for the device to be ready for a rescan. Unfortunately,
> > the mutex lock is never released by either party, leading to a deadlock.
> >
> > [Fix]
> > Kaiheng submitted a patch to fix this issue which defers the rescan if the
> > disk is still suspended so the resume process of the disk device can proceed.
> > https://patchwork.ozlabs.org/project/linux-ide/patch/20230502150435.423770-2-kai.heng.feng@canonical.com/
> >
> > Since the patch has not been accepted upstream yet, submit it to the OEM kernel for now.
>
> This is no longer true. The submitted patch is upstream as of v6.4-rc7.
> Updating old justifications might help to convince others to look at this
> more favorably.

As Stefan mentioned, this is already in: v6.4, v6.4-rc7

> >
> > A similar patch has been included in v6.4-rc7; backport this to
> > generic Ubuntu kernels.
> > 6aa0365a3c85 ata: libata-scsi: Avoid deadlock on rescan after device resume
> >
> > [Test]
> > Verified on the machines by me and ODM.
> >
> > [Where problems could occur]
> > It only defers the rescan task, and should not have any impact on current systems.
> >
> > Damien Le Moal (1):
> >   ata: libata-scsi: Avoid deadlock on rescan after device resume
> >
> >  drivers/ata/libata-core.c |  3 ++-
> >  drivers/ata/libata-eh.c   |  2 +-
> >  drivers/ata/libata-scsi.c | 22 +++++++++++++++++++++-
> >  include/linux/libata.h    |  2 +-
> >  4 files changed, 25 insertions(+), 4 deletions(-)
> >
>
> Anyhow, the submitted patch appears to be identical to upstream.
>
> Acked-by: Stefan Bader <stefan.bader@canonical.com>
>

Acked-by: Andrei Gherzan <andrei.gherzan@canonical.com>

On 20/06/2023 05:36, AceLan Kao wrote:
> From: "Chia-Lin Kao (AceLan)" <acelan.kao@canonical.com>
>
> BugLink: https://launchpad.net/bugs/2018566
>
> [Impact]
> During the S3 stress test, the system sometimes hangs when resuming. This
> is due to the SCSI rescan task being unable to acquire the mutex lock
> during the resumption from S3. The mutex lock has already been acquired by
> EH, which is waiting for the device to be ready for a rescan. Unfortunately,
> the mutex lock is never released by either party, leading to a deadlock.
>
> [Fix]
> Kaiheng submitted a patch to fix this issue which defers the rescan if the
> disk is still suspended so the resume process of the disk device can proceed.
> https://patchwork.ozlabs.org/project/linux-ide/patch/20230502150435.423770-2-kai.heng.feng@canonical.com/
>
> Since the patch has not been accepted upstream yet, submit it to the OEM kernel for now.
>
> A similar patch has been included in v6.4-rc7; backport this to
> generic Ubuntu kernels.
> 6aa0365a3c85 ata: libata-scsi: Avoid deadlock on rescan after device resume
>
> [Test]
> Verified on the machines by me and ODM.
>
> [Where problems could occur]
> It only defers the rescan task, and should not have any impact on current systems.
>
> Damien Le Moal (1):
>   ata: libata-scsi: Avoid deadlock on rescan after device resume
>
>  drivers/ata/libata-core.c |  3 ++-
>  drivers/ata/libata-eh.c   |  2 +-
>  drivers/ata/libata-scsi.c | 22 +++++++++++++++++++++-
>  include/linux/libata.h    |  2 +-
>  4 files changed, 25 insertions(+), 4 deletions(-)
>

Applied to lunar/jammy:master-next. I adjusted the buglink accordingly.

Thanks.
Roxana

From: "Chia-Lin Kao (AceLan)" <acelan.kao@canonical.com>

BugLink: https://launchpad.net/bugs/2018566

[Impact]
During the S3 stress test, the system sometimes hangs when resuming. This
is due to the SCSI rescan task being unable to acquire the mutex lock
during the resumption from S3. The mutex lock has already been acquired by
EH, which is waiting for the device to be ready for a rescan. Unfortunately,
the mutex lock is never released by either party, leading to a deadlock.

[Fix]
Kaiheng submitted a patch to fix this issue which defers the rescan if the
disk is still suspended so the resume process of the disk device can proceed.
https://patchwork.ozlabs.org/project/linux-ide/patch/20230502150435.423770-2-kai.heng.feng@canonical.com/

Since the patch has not been accepted upstream yet, submit it to the OEM kernel for now.

A similar patch has been included in v6.4-rc7; backport this to
generic Ubuntu kernels.
6aa0365a3c85 ata: libata-scsi: Avoid deadlock on rescan after device resume

[Test]
Verified on the machines by me and ODM.

[Where problems could occur]
It only defers the rescan task, and should not have any impact on current systems.

Damien Le Moal (1):
  ata: libata-scsi: Avoid deadlock on rescan after device resume

 drivers/ata/libata-core.c |  3 ++-
 drivers/ata/libata-eh.c   |  2 +-
 drivers/ata/libata-scsi.c | 22 +++++++++++++++++++++-
 include/linux/libata.h    |  2 +-
 4 files changed, 25 insertions(+), 4 deletions(-)
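
For readers who want a feel for the approach described under [Fix], the idea of deferring the rescan while the disk is still suspended can be illustrated with a small user-space sketch. This is not the kernel patch and does not use any libata or SCSI API: `SimDisk`, `SimPort`, `rescan_task` and `run_resume` are hypothetical names invented for this illustration, and the "scheduler" is just a loop over ticks. It only shows the general pattern of giving up and retrying later instead of blocking while the resume path still needs to make progress.

```python
from dataclasses import dataclass, field


@dataclass
class SimDisk:
    """Hypothetical stand-in for a disk whose resume has not finished yet."""
    is_suspended: bool = True
    rescans_done: int = 0


@dataclass
class SimPort:
    """Hypothetical stand-in for a port that owns a deferrable rescan task."""
    disk: SimDisk = field(default_factory=SimDisk)
    deferrals: int = 0

    def rescan_task(self) -> bool:
        """Run one rescan attempt; return True if a retry must be scheduled."""
        if self.disk.is_suspended:
            # Give up instead of blocking, so the resume path can proceed.
            self.deferrals += 1
            return True
        self.disk.rescans_done += 1
        return False


def run_resume(port: SimPort, resume_done_at_tick: int, max_ticks: int = 100) -> int:
    """Drive a toy scheduler; return the tick at which the rescan finally ran."""
    for tick in range(max_ticks):
        if tick == resume_done_at_tick:
            # The (simulated) resume path completes at this tick.
            port.disk.is_suspended = False
        if not port.rescan_task():
            return tick
    raise RuntimeError("rescan never ran within max_ticks")
```

In this toy model, a rescan attempted before the simulated resume completes is simply counted as a deferral and retried on the next tick, so neither side ever waits on the other.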