diff mbox series

[v4,09/23] scsi: sd: Do not issue commands to suspended disks on shutdown

Message ID 20230920135439.929695-10-dlemoal@kernel.org
State New
Headers show
Series Fix libata suspend/resume handling and code cleanup | expand

Commit Message

Damien Le Moal Sept. 20, 2023, 1:54 p.m. UTC
If an error occurs when resuming a host adapter before the devices
attached to the adapter are resumed, the adapter low level driver may
remove the scsi host, resulting in a call to sd_remove() for the
disks of the host. This in turn results in a call to sd_shutdown() which
will issue a synchronize cache command and a start stop unit command to
spindown the disk. sd_shutdown() issues the commands only if the device
is not already suspended but does not check the power state for
system-wide suspend/resume. That is, the commands may be issued with the
device in a suspended state, which causes PM resume to hang, forcing a
reset of the machine to recover.

Fix this by not calling sd_shutdown() in sd_remove() if the device
is not running.

Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/scsi/sd.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Bart Van Assche Sept. 21, 2023, 9:36 p.m. UTC | #1
On 9/20/23 06:54, Damien Le Moal wrote:
> If an error occurs when resuming a host adapter before the devices
> attached to the adapter are resumed, the adapter low level driver may
> remove the scsi host, resulting in a call to sd_remove() for the
> disks of the host. This in turn results in a call to sd_shutdown() which
> will issue a synchronize cache command and a start stop unit command to
> spindown the disk. sd_shutdown() issues the commands only if the device
> is not already suspended but does not check the power state for
> system-wide suspend/resume. That is, the commands may be issued with the
> device in a suspended state, which causes PM resume to hang, forcing a
> reset of the machine to recover.
> 
> Fix this by not calling sd_shutdown() in sd_remove() if the device
> is not running.

Hi Damien,

I'd like to look into an alternative fix (after this patch series went 
in) but I couldn't identify the call chain in the ATA resume code that 
results in removal of the SCSI host. Can you please show me the call 
chain that results in SCSI host removal if resuming fails?

Thanks,

Bart.
Damien Le Moal Sept. 21, 2023, 10:06 p.m. UTC | #2
On 2023/09/21 14:36, Bart Van Assche wrote:
> On 9/20/23 06:54, Damien Le Moal wrote:
>> If an error occurs when resuming a host adapter before the devices
>> attached to the adapter are resumed, the adapter low level driver may
>> remove the scsi host, resulting in a call to sd_remove() for the
>> disks of the host. This in turn results in a call to sd_shutdown() which
>> will issue a synchronize cache command and a start stop unit command to
>> spindown the disk. sd_shutdown() issues the commands only if the device
>> is not already suspended but does not check the power state for
>> system-wide suspend/resume. That is, the commands may be issued with the
>> device in a suspended state, which causes PM resume to hang, forcing a
>> reset of the machine to recover.
>>
>> Fix this by not calling sd_shutdown() in sd_remove() if the device
>> is not running.
> 
> Hi Damien,
> 
> I'd like to look into an alternative fix (after this patch series went 
> in) but I couldn't identify the call chain in the ATA resume code that 
> results in removal of the SCSI host. Can you please show me the call 
> chain that results in SCSI host removal if resuming fails?

See the pm80xx driver for which I recently fixed a resume issue. That is how I
found this problem with device removal: resuming the pm800xx HBA was failing and
the driver then called scsi_remove_host() to drop the ports and that led to
trying to removed sd devices that were still suspended.

> 
> Thanks,
> 
> Bart.
>
diff mbox series

Patch

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 1d106c8ad5af..d86306d42445 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -3727,7 +3727,8 @@  static int sd_remove(struct device *dev)
 
 	device_del(&sdkp->disk_dev);
 	del_gendisk(sdkp->disk);
-	sd_shutdown(dev);
+	if (sdkp->device->sdev_state == SDEV_RUNNING)
+		sd_shutdown(dev);
 
 	put_disk(sdkp->disk);
 	return 0;