Message ID | 20230921180758.955317-10-dlemoal@kernel.org |
---|---|
State | New |
Headers | show |
Series | Fix libata suspend/resume handling and code cleanup | expand |
On 9/21/23 11:07, Damien Le Moal wrote: > diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c > index 1d106c8ad5af..d86306d42445 100644 > --- a/drivers/scsi/sd.c > +++ b/drivers/scsi/sd.c > @@ -3727,7 +3727,8 @@ static int sd_remove(struct device *dev) > > device_del(&sdkp->disk_dev); > del_gendisk(sdkp->disk); > - sd_shutdown(dev); > + if (sdkp->device->sdev_state == SDEV_RUNNING) > + sd_shutdown(dev); > > put_disk(sdkp->disk); > return 0; Doesn't this patch involve a behavior change? I'm concerned the above patch will always cause the sd_shutdown() call to be skipped if scsi_remove_host() is called, whether or not the scsi_remove_host() call is triggered by a resume failure. Thanks, Bart.
On 2023/09/22 11:14, Bart Van Assche wrote: > On 9/21/23 11:07, Damien Le Moal wrote: >> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c >> index 1d106c8ad5af..d86306d42445 100644 >> --- a/drivers/scsi/sd.c >> +++ b/drivers/scsi/sd.c >> @@ -3727,7 +3727,8 @@ static int sd_remove(struct device *dev) >> >> device_del(&sdkp->disk_dev); >> del_gendisk(sdkp->disk); >> - sd_shutdown(dev); >> + if (sdkp->device->sdev_state == SDEV_RUNNING) >> + sd_shutdown(dev); >> >> put_disk(sdkp->disk); >> return 0; > > Doesn't this patch involve a behavior change? I'm concerned the above patch > will always cause the sd_shutdown() call to be skipped if scsi_remove_host() > is called, whether or not the scsi_remove_host() call is triggered by a > resume failure. I guess the point is: what are the possible states of the scsi device when sd_remove() is called ? From what I have seen so far, it is RUNNING. But I am not 100% sure this true all the time. But given that sd_shutdown() issues commands, I would guess that it makes sense that it is always running. Looking at the code, scsi_remove_host() calls scsi_forget_host() which calls __scsi_remove_device() for any device that is not in the SDEV_DEL state. __scsi_remove_device() then sets the state to SDEV_CANCEL. So it appears that the state should always be CANCEL and not running. However, my tests showed it to be running. I am not fully understanding how sd_remove() end up being called... I will run tests again without this patch for libata suspend/resume. This patch was added because of the hang I saw with the pm80xx driver (using libsas/libata) failing to resume. That resume failure is fixed and libata does not need this patch I think. But I will double check and drop it for now. I think we should investigate this further though, to make sure that we can always safely call sd_shutdown(). __scsi_remove_device() has this comment: /* * If blocked, we go straight to DEL and restart the queue so * any commands issued during driver shutdown (like sync * cache) are errored immediately. */ Which kind of give a hint that we should probably not blindy always try to call sd_shutdown().
On 9/22/23 12:10, Damien Le Moal wrote: > Looking at the code, scsi_remove_host() calls scsi_forget_host() which calls > __scsi_remove_device() for any device that is not in the SDEV_DEL state. > __scsi_remove_device() then sets the state to SDEV_CANCEL. So it appears that > the state should always be CANCEL and not running. However, my tests showed it > to be running. I am not fully understanding how sd_remove() end up being called... I think this is how sd_sync_cache() gets called from inside scsi_remove_host(): scsi_remove_host() -> scsi_forget_host() -> __scsi_remove_device() -> device_del(&sdev->sdev_gendev) -> bus_remove_device() -> device_release_driver() -> __device_release_driver() -> sd_remove() -> sd_shutdown() -> sd_sync_cache() In other words, it is guaranteed that scsi_device_set_state(sdev, SDEV_CANCEL) has been called before sd_remove() if it is called by scsi_remove_host(). > I think we should investigate this further though, to make sure that we can > always safely call sd_shutdown(). __scsi_remove_device() has this comment: > > /* > * If blocked, we go straight to DEL and restart the queue so > * any commands issued during driver shutdown (like sync > * cache) are errored immediately. > */ > > Which kind of give a hint that we should probably not blindy always try to call > sd_shutdown(). Does that comment perhaps refer to the SDEV_BLOCK / SDEV_CREATED_BLOCK states? Anyway, I'm wondering whether there are better ways to prevent that it is attempted to queue SCSI commands if a SCSI device is suspended, e.g. by checking the suspended state from inside scsi_device_state_check() or scsi_dispatch_cmd(). Thanks, Bart.
On 2023/09/22 13:08, Bart Van Assche wrote: > On 9/22/23 12:10, Damien Le Moal wrote: >> Looking at the code, scsi_remove_host() calls scsi_forget_host() which calls >> __scsi_remove_device() for any device that is not in the SDEV_DEL state. >> __scsi_remove_device() then sets the state to SDEV_CANCEL. So it appears that >> the state should always be CANCEL and not running. However, my tests showed it >> to be running. I am not fully understanding how sd_remove() end up being called... > > I think this is how sd_sync_cache() gets called from inside > scsi_remove_host(): > > scsi_remove_host() > -> scsi_forget_host() > -> __scsi_remove_device() > -> device_del(&sdev->sdev_gendev) > -> bus_remove_device() > -> device_release_driver() > -> __device_release_driver() > -> sd_remove() > -> sd_shutdown() > -> sd_sync_cache() > > In other words, it is guaranteed that scsi_device_set_state(sdev, > SDEV_CANCEL) has been called before sd_remove() if it is called by > scsi_remove_host(). > >> I think we should investigate this further though, to make sure that we can >> always safely call sd_shutdown(). __scsi_remove_device() has this comment: >> >> /* >> * If blocked, we go straight to DEL and restart the queue so >> * any commands issued during driver shutdown (like sync >> * cache) are errored immediately. >> */ >> >> Which kind of give a hint that we should probably not blindy always try to call >> sd_shutdown(). > > Does that comment perhaps refer to the SDEV_BLOCK / SDEV_CREATED_BLOCK > states? Anyway, I'm wondering whether there are better ways to prevent > that it is attempted to queue SCSI commands if a SCSI device is > suspended, e.g. by checking the suspended state from inside > scsi_device_state_check() or scsi_dispatch_cmd(). Using information in the device ->power structure is not reliable without holding the device lock(), so we should not do that. But we can add a "suspended" scsi_device flag that we maintain on execution of sd_suspend_system() and sd_resume_system(). Many drivers do that... Thoughts ?
On 9/22/23 13:36, Damien Le Moal wrote: > On 2023/09/22 13:08, Bart Van Assche wrote: >> Does that comment perhaps refer to the SDEV_BLOCK / SDEV_CREATED_BLOCK >> states? Anyway, I'm wondering whether there are better ways to prevent >> that it is attempted to queue SCSI commands if a SCSI device is >> suspended, e.g. by checking the suspended state from inside >> scsi_device_state_check() or scsi_dispatch_cmd(). > > Using information in the device ->power structure is not reliable without > holding the device lock(), so we should not do that. But we can add a > "suspended" scsi_device flag that we maintain on execution of > sd_suspend_system() and sd_resume_system(). Many drivers do that... > Thoughts ? That sounds good to me. Thanks, Bart.
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c index 1d106c8ad5af..d86306d42445 100644 --- a/drivers/scsi/sd.c +++ b/drivers/scsi/sd.c @@ -3727,7 +3727,8 @@ static int sd_remove(struct device *dev) device_del(&sdkp->disk_dev); del_gendisk(sdkp->disk); - sd_shutdown(dev); + if (sdkp->device->sdev_state == SDEV_RUNNING) + sd_shutdown(dev); put_disk(sdkp->disk); return 0;