Message ID | 20240107180258.360886-2-phill@thesusis.net |
---|---|
State | New |
Headers | show |
Series | Let sleeping disks lie | expand |
On 1/8/24 03:02, Phillip Susi wrote: > When a disk is in SLEEP mode it can not respond to any > any commands. Several commands are simply a NOOP to a disk > that is in standby mode, but when a disk is in SLEEP mode, > they frequencly cause the disk to spin up for no reason. > To avoid this, complete these commands in libata without > waking the disk. These commands are: As commented in patch 3/3, please use full 72-char lines for commit messages. > > CHECK POWER MODE > FLUSH CACHE > SLEEP > STANDBY IMMEDIATE > IDENTIFY > > If we know the disk is sleeping, we don't need to wake it up > to find out if it is in standby, so just pretend it is in sleep and standby are different power states. So saying that a disk that is sleeping is in standby does not make sense. And if you wake up a drive from sleep mode, it will *not* be in standby (need to re-check, but I think that holds true even with PUIS enabled). > standby. While asleep, there's no dirty pages in the cache, > so there's no need to flush it. There's no point in waking > a disk from sleep just to put it back to sleep. We also have > a cache of the IDENTIFY information so just return that > instead of waking the disk. The problem here is that ATA_DFLAG_SLEEPING is a horrible hack to not endup with lots of timeout failures if the user execute "hdparm -Y". Executing such passthrough command with a disk being used by an FS (for instance) is complete nonsense and should not be done. So I would rather see this handled correctly, through the kernel pm runtime suspend/resume: 1) Define a libata device sysfs attribute that allows going to sleep instead of the default standby when the disk is runtime suspended. If sleep is used, set ATA_DFLAG_SLEEPING. 2) With that, any command issued to the disk will trigger runtime resume. If ATA_DFLAG_SLEEPING is set, then the drive can be woken up with a link reset from EH, going through ata_port_runtime_resume(), exactly like with the default standby state for suspend. ATA_DFLAG_SLEEPING being set or not will indicate if a simple verify command can spinup the disk or if a link abort is needed (like done now in ata_qc_issue() which is really a nasty place to do that). Now, the annoying thing is the drive being randomly woken-up due to commands being issued, like the ones you mention. This is indeed bad, and seeing news like this: https://www.phoronix.com/news/Linux-PM-Regulatory-Bugs I think we really should do better... But I do not think the kernel is necessarilly the right place to fix this, at least in the case of commands issued from userspace by things like smartd or udevd. Patching there is needed to avoid uselessly waking up disks in runtime suspend. systemd already has power policies etc, so there is integration with the kernel side power management. Your issues come from using a tool (hdparm) that has no integration at all with the OS daemons. For FSes issued commands like flush, these are generally not random at all. If you see them appearing randomly, then there is a problem with the FS and patching the FS may be needed. Beside flush, there are other things to consider here. Ex: FSes using zoned block devices (SMR disks) doing garbage collection while idle. We cannot prevent this from happening, which is why I seriously dislike the idea of faking any command for a sleeping disk. > --- > drivers/ata/libata-core.c | 20 ++++++++++++++++++++ > 1 file changed, 20 insertions(+) > > diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c > index 09ed67772fae..6c5269de4bf2 100644 > --- a/drivers/ata/libata-core.c > +++ b/drivers/ata/libata-core.c > @@ -5040,6 +5040,26 @@ void ata_qc_issue(struct ata_queued_cmd *qc) > > /* if device is sleeping, schedule reset and abort the link */ > if (unlikely(qc->dev->flags & ATA_DFLAG_SLEEPING)) { > + switch (qc->tf.command) > + { > + case ATA_CMD_CHK_POWER: > + case ATA_CMD_SLEEP: > + case ATA_CMD_FLUSH: > + case ATA_CMD_FLUSH_EXT: > + case ATA_CMD_STANDBYNOW1: > + if (qc->tf.command == ATA_CMD_ID_ATA) > + { > + /* only fake the reply for IDENTIFY if it is from userspace */ > + if (ata_tag_internal(qc->tag)) > + break; > + sg_copy_from_buffer(qc->sg, 1, qc->dev->id, 2 * ATA_ID_WORDS); > + } > + /* fake reply to avoid waking drive */ > + qc->flags |= ATA_QCFLAG_RTF_FILLED; > + qc->result_tf.nsect = 0; > + ata_qc_complete(qc); > + return; > + } > link->eh_info.action |= ATA_EH_RESET; > ata_ehi_push_desc(&link->eh_info, "waking up from sleep"); > ata_link_abort(link);
On 1/7/24 9:02 PM, Phillip Susi wrote: > When a disk is in SLEEP mode it can not respond to any > any commands. Several commands are simply a NOOP to a disk > that is in standby mode, but when a disk is in SLEEP mode, > they frequencly cause the disk to spin up for no reason. Frequently. > To avoid this, complete these commands in libata without > waking the disk. These commands are: > > CHECK POWER MODE > FLUSH CACHE > SLEEP > STANDBY IMMEDIATE > IDENTIFY > > If we know the disk is sleeping, we don't need to wake it up > to find out if it is in standby, so just pretend it is in > standby. While asleep, there's no dirty pages in the cache, > so there's no need to flush it. There's no point in waking > a disk from sleep just to put it back to sleep. We also have > a cache of the IDENTIFY information so just return that > instead of waking the disk. [...] Did you run your patches thru scripts/checkpatch.pl? Looking at this patch, I think you didn't... :-) > diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c > index 09ed67772fae..6c5269de4bf2 100644 > --- a/drivers/ata/libata-core.c > +++ b/drivers/ata/libata-core.c > @@ -5040,6 +5040,26 @@ void ata_qc_issue(struct ata_queued_cmd *qc) > > /* if device is sleeping, schedule reset and abort the link */ > if (unlikely(qc->dev->flags & ATA_DFLAG_SLEEPING)) { > + switch (qc->tf.command) > + { switch (qc->tf.command) { > + case ATA_CMD_CHK_POWER: > + case ATA_CMD_SLEEP: > + case ATA_CMD_FLUSH: > + case ATA_CMD_FLUSH_EXT: > + case ATA_CMD_STANDBYNOW1: > + if (qc->tf.command == ATA_CMD_ID_ATA) This can't happen, you forgot: case ATA_CMD_ID_ATA: > + { if (qc->tf.command == ATA_CMD_ID_ATA) { > + /* only fake the reply for IDENTIFY if it is from userspace */ > + if (ata_tag_internal(qc->tag)) > + break; > + sg_copy_from_buffer(qc->sg, 1, qc->dev->id, 2 * ATA_ID_WORDS); > + } > + /* fake reply to avoid waking drive */ > + qc->flags |= ATA_QCFLAG_RTF_FILLED; > + qc->result_tf.nsect = 0; > + ata_qc_complete(qc); > + return; > + } > link->eh_info.action |= ATA_EH_RESET; > ata_ehi_push_desc(&link->eh_info, "waking up from sleep"); > ata_link_abort(link); MBR, Sergey
Damien Le Moal <dlemoal@kernel.org> writes: > sleep and standby are different power states. So saying that a disk that is > sleeping is in standby does not make sense. And if you wake up a drive from There is no way to answer CHECK POWER MODE and say the drive is in sleep. It can only be either active, or standby, so standby is the closest fit. At least it gets smartd and udisks2 to leave the drive alone. > The problem here is that ATA_DFLAG_SLEEPING is a horrible hack to not endup with > lots of timeout failures if the user execute "hdparm -Y". Executing such > passthrough command with a disk being used by an FS (for instance) is complete > nonsense and should not be done. I'm not sure what you propose be done instead. Regardless, this is how it has always been done, so I don't think there is any changing it now. You also have the legacy standby timer that is exposed to users through udisks2/gnome-disk-utility that still has to be supported. > So I would rather see this handled correctly, through the kernel pm runtime > suspend/resume: I'd eventually like to as well, but it should also work in kernels that aren't built with runtime pm enabled. > Now, the annoying thing is the drive being randomly woken-up due to commands > being issued, like the ones you mention. This is indeed bad, and seeing news > like this: > > https://www.phoronix.com/news/Linux-PM-Regulatory-Bugs > > I think we really should do better... That sounds like broken firmware to me. When you ask the firmware to power off the system, it should really be powered off. > For FSes issued commands like flush, these are generally not random at all. If > you see them appearing randomly, then there is a problem with the FS and > patching the FS may be needed. Beside flush, there are other things to > consider I'm not sure the filesystem maintainers will see it that way. They generally issue barriers as part of a commit at regular intervals, and that gets turned into FLUSH CACHE. Also the kernel issues one during system suspend, and I think that happens even if no filesystem is mounted. I think systemd also issues a sync() during shutdown, which would wake up a sleeping disk only to shut down. I don't think it is up to all of these other sources to be patched to avoid this. libata knows the disk is in sleep mode, so that is the place to handle it. > here. Ex: FSes using zoned block devices (SMR disks) doing garbage collection > while idle. We cannot prevent this from happening, which is why I seriously > dislike the idea of faking any command for a sleeping disk. I don't see the connection. If the FS issues IO in the background, the disk will wake up, just like it does when in standby. The faking of the commands simply replicates the behavior you get from standby in sleep mode. With this patch, as far as anyone can tell, a sleeping disk is exactly the same as one in standby.
Sergey Shtylyov <s.shtylyov@omp.ru> writes: > Did you run your patches thru scripts/checkpatch.pl? Looking > at this patch, I think you didn't... :-) Will do. > This can't happen, you forgot: > > case ATA_CMD_ID_ATA: Woops.. that must have gotten lost somehow when I was cutting and pasting to avoid the fallthrough.
On 1/8/24 22:27, Phillip Susi wrote: > Damien Le Moal <dlemoal@kernel.org> writes: > >> sleep and standby are different power states. So saying that a disk that is >> sleeping is in standby does not make sense. And if you wake up a drive from > > There is no way to answer CHECK POWER MODE and say the drive is in > sleep. It can only be either active, or standby, so standby is the > closest fit. At least it gets smartd and udisks2 to leave the drive > alone. I was not commenting about what to do about your problem, but about the fact that your sentence was very hard to understand because it was not technically accurate. >> The problem here is that ATA_DFLAG_SLEEPING is a horrible hack to not endup with >> lots of timeout failures if the user execute "hdparm -Y". Executing such >> passthrough command with a disk being used by an FS (for instance) is complete >> nonsense and should not be done. > > I'm not sure what you propose be done instead. Regardless, this is how > it has always been done, so I don't think there is any changing it now. I never proposed to change that in any way. That is fine and can stay as it is. What I do NOT want to do is expand upon this to try to solve issues. The reason for that, which I already stated, is that hdparm issue passthrough commands. And if the user wants to use passthrough commands, then most of the time, he/she will have to deal with the consequences of not using kernel-provided management methods. I did propose to allow for runtime suspend to to use sleep state instead of standby. That would be fairly easy to do and replace manual "hdparm -Y" with a well integrated control of the disk power state up to the block layer. You never commented back about this. > You also have the legacy standby timer that is exposed to users through > udisks2/gnome-disk-utility that still has to be supported. What is this legacy standby timer ? What control path does it trigger ? Do udisks2/gnome-disk-utility use that timer to issue commands like "hdparm -Y" ? Or does that timer tigh into the regular runtime suspend ? >> So I would rather see this handled correctly, through the kernel pm runtime >> suspend/resume: > > I'd eventually like to as well, but it should also work in kernels that > aren't built with runtime pm enabled. No. As said many times now, I am not going to do anything about the hdparm -Y hacking. If a user want better power management features, he/she should enable power management in their kernels. >> For FSes issued commands like flush, these are generally not random at all. If >> you see them appearing randomly, then there is a problem with the FS and >> patching the FS may be needed. Beside flush, there are other things to >> consider > > I'm not sure the filesystem maintainers will see it that way. They > generally issue barriers as part of a commit at regular intervals, and > that gets turned into FLUSH CACHE. Also the kernel issues one during > system suspend, and I think that happens even if no filesystem is > mounted. I think systemd also issues a sync() during shutdown, which > would wake up a sleeping disk only to shut down. No. The scsi layer issues a FLUSH CACHE whenever a disk is removed, goes to sleep or the system shutdown. And there is no need to do that if the disk is already in standby. If you see that happening, then we need to fix that. If the device is in sleep state from "hdparm -Y", then only libata knows that the device is sleeping with the ATA_DFLAG_SLEEPING flag. That is the fundamental problem here: pm-core, scsi and the block layer do not know that the block device is sleeping (and so already had its write cache flushed). Your patches are not solving this root cause issue. They are only hidding it by faking the commands. This is a hack, wich likely will need more hacks in the future for different cases. See my point ? I do not want to go down such route. Let's fix things properly. > I don't think it is up to all of these other sources to be patched to > avoid this. libata knows the disk is in sleep mode, so that is the > place to handle it. Not that simple. See above.
Damien Le Moal <dlemoal@kernel.org> writes: > I did propose to allow for runtime suspend to to use sleep state instead of > standby. That would be fairly easy to do and replace manual "hdparm -Y" with a > well integrated control of the disk power state up to the block layer. > You never commented back about this. That would be nice. I assume that would involve changing how libata-scsi.c translates SYNCHRONIZE CACHE from the scsi layer? > What is this legacy standby timer ? What control path does it trigger ? Do > udisks2/gnome-disk-utility use that timer to issue commands like "hdparm -Y" ? > Or does that timer tigh into the regular runtime suspend ? The ATA disk internal auto standby timer, i.e. hdparm -S. > No. As said many times now, I am not going to do anything about the hdparm -Y > hacking. If a user want better power management features, he/she should enable > power management in their kernels. So you are saying that we need to patch the kernel to make runtime pm work better, then patch smartd and udisks2 to check for runtime pm before issuing their SMART commands, and patch udsisks2 to enable runtime pm rather than using the legacy ATA standby timer? > No. The scsi layer issues a FLUSH CACHE whenever a disk is removed, goes to > sleep or the system shutdown. And there is no need to do that if the disk is > already in standby. If you see that happening, then we need to fix that. I'm almost certain that I have seen this happen, and I don't currently see any code in sd.c that would would prevent it from issuing a FLUSH CACHE to a disk that is runtime suspended when the system suspends or shuts down. The block layer also would need patched to avoid turning a barrier into a FLUSH CACHE if the disk is runtime suspended, and also the sync() path. Is that even sensible to do? It is true that for all block devices, their caches do not need flushed while they are runtime suspended? It seems like it may be, but I'm not certain.
Phillip Susi <phill@thesusis.net> writes: > The block layer also would need patched to avoid turning a barrier into > a FLUSH CACHE if the disk is runtime suspended, and also the sync() > path. Is that even sensible to do? It is true that for all block > devices, their caches do not need flushed while they are runtime > suspended? It seems like it may be, but I'm not certain. I was trying to do this. I think the right place is in blkdev_issue_flush(), but apparently bdev->bd_device is not the same struct device that gets suspended. I can't seem to work out where the right struct device is to pass to pm_runtime_suspended() and skip the flush operation.
Phillip Susi <phill@thesusis.net> writes: > I was trying to do this. I think the right place is in > blkdev_issue_flush(), but apparently bdev->bd_device is not the same > struct device that gets suspended. I can't seem to work out where the > right struct device is to pass to pm_runtime_suspended() and skip the > flush operation. I don't know what I was thinking yesterday. It can't rely on pm_runtime_suspended() because it would continue to flush and reset the suspend timer before it ever gets suspended. I wonder if it could use the performance counters? Whenever a flush is done, and also when suspending, store the value of the write counter, and only if it has changed, issue the flush, otherwise skip it?
On 1/20/24 05:43, Phillip Susi wrote: > Phillip Susi <phill@thesusis.net> writes: > >> The block layer also would need patched to avoid turning a barrier into >> a FLUSH CACHE if the disk is runtime suspended, and also the sync() >> path. Is that even sensible to do? It is true that for all block >> devices, their caches do not need flushed while they are runtime >> suspended? It seems like it may be, but I'm not certain. > > I was trying to do this. I think the right place is in > blkdev_issue_flush(), but apparently bdev->bd_device is not the same > struct device that gets suspended. I can't seem to work out where the > right struct device is to pass to pm_runtime_suspended() and skip the > flush operation. Flush issuing is a lot more complicated than just blkdev_issue_flush(). There is a whole file dedicated to handling flushes (block/blk-flush.c). But that is beside the point, which is that trying to not execute flush is simply completely wrong. Please stop trying. For your case, which is a drive put to sleep with hdparm -Y, only libata is aware that the drive is sleeping. That first needs to change if you want the kernel overall to be able to do anything. As I proposed already, using runtime PM with sleep mode instead of standby would be a good start. Regarding the flushes and other commands you see that are waking up the drive for no *apparent* good reasons, please identify what application/component is issuing these commands and look there to see why the commands are being issued and if that is done with awareness for the device power state. Then we can patch properly instead of trying to "hack".
On 1/21/24 03:08, Phillip Susi wrote: > Phillip Susi <phill@thesusis.net> writes: > >> I was trying to do this. I think the right place is in >> blkdev_issue_flush(), but apparently bdev->bd_device is not the same >> struct device that gets suspended. I can't seem to work out where the >> right struct device is to pass to pm_runtime_suspended() and skip the >> flush operation. > > I don't know what I was thinking yesterday. It can't rely on > pm_runtime_suspended() because it would continue to flush and reset the > suspend timer before it ever gets suspended. I wonder if it could use > the performance counters? Whenever a flush is done, and also when > suspending, store the value of the write counter, and only if it has > changed, issue the flush, otherwise skip it? See my reply to your previous comment. What you are trying to do is not the correct approach.
Damien Le Moal <dlemoal@kernel.org> writes: > Flush issuing is a lot more complicated than just blkdev_issue_flush(). There is > a whole file dedicated to handling flushes (block/blk-flush.c). > > But that is beside the point, which is that trying to not execute flush is > simply completely wrong. Please stop trying. I tried before to have libata ignore the useless flush and you said to stop the flush from happening in the first place. Now you say that's wrong? > For your case, which is a drive put to sleep with hdparm -Y, only libata is > aware that the drive is sleeping. That first needs to change if you want the > kernel overall to be able to do anything. As I proposed already, using runtime > PM with sleep mode instead of standby would be a good start. No, I'm working on runtime pm now, as you suggested. If you try using runtime pm with disks, you quickly see that it does not work. > Regarding the flushes and other commands you see that are waking up the drive > for no *apparent* good reasons, please identify what application/component is > issuing these commands and look there to see why the commands are being issued > and if that is done with awareness for the device power state. Filesystems flush every few seconds. So does anyone calling sync(), which the kernel does when you suspend to ram. Either the filesystems need to keep track of whether a flush is needed and skip it, or if they all call the same place ( blkdev_issue_flush ), then it only needs done once in that place. The core logic needs to be "if nothing has been written, then nothing needs to be flushed". Right now filesystems just flush periodically or when their sync method is called to make sure that anything that has been written is stable. In userspace you have smartd and udisks2 to worry about. The information they need to know is whether the disk has otherwise been in recent use and so they should not poll the SMART data. I don't see anything exported in the power/ directory that would give an indication of the current *remaining* time until autosuspend, or how long it has been since the last access. Either one would be useful for userspace to decide that nobody else seems to be using the disk, so I'll leave it alone for now so it can go to sleep.
On 1/25/24 01:04, Phillip Susi wrote: > Damien Le Moal <dlemoal@kernel.org> writes: > >> Flush issuing is a lot more complicated than just blkdev_issue_flush(). There is >> a whole file dedicated to handling flushes (block/blk-flush.c). >> >> But that is beside the point, which is that trying to not execute flush is >> simply completely wrong. Please stop trying. > > I tried before to have libata ignore the useless flush and you said to > stop the flush from happening in the first place. Now you say that's wrong? No, not wrong. But you are looking at the wrong layer. Do not try to prevent flushes from being issued at the block layer level but *above* it. That is, FSes, applications etc. >> For your case, which is a drive put to sleep with hdparm -Y, only libata is >> aware that the drive is sleeping. That first needs to change if you want the >> kernel overall to be able to do anything. As I proposed already, using runtime >> PM with sleep mode instead of standby would be a good start. > > No, I'm working on runtime pm now, as you suggested. If you try using > runtime pm with disks, you quickly see that it does not work. What does not work ? Everything is fine with my testing: the drive is always woken up whenever someone (FS, applications etc) issue a block IO (including flush) to the block layer. That is the expected behavior. If you want to have the disk keep sleeping, the device users must stop poking at the drive. > >> Regarding the flushes and other commands you see that are waking up the drive >> for no *apparent* good reasons, please identify what application/component is >> issuing these commands and look there to see why the commands are being issued >> and if that is done with awareness for the device power state. > > Filesystems flush every few seconds. So does anyone calling sync(), > which the kernel does when you suspend to ram. Filesystems issue flush only if they have dirty data to flush, normally. I have not looked at ext4/xfs code in a while, but if the FS has not committed any transaction in the last cycle, there should be no flush issued. If there is, then someone is reading or writing files (for reading, if you have atime enabled, that will cause a metadata change and so a transaction commit). > Either the filesystems need to keep track of whether a flush is needed > and skip it, or if they all call the same place ( blkdev_issue_flush ), > then it only needs done once in that place. See above. That is why I am telling you to run blktrace or any tracer to figure out the command sequence and who is doing what. There is absolutely no clarity to what you are describing and so we cannot plan for a solution. > The core logic needs to be "if nothing has been written, then nothing > needs to be flushed". Right now filesystems just flush periodically or > when their sync method is called to make sure that anything that has > been written is stable. I do not think that the periodic flush happens if nothing has been written. For "sync", someone issues that, not the FS on its own. So trace it to find out who is doing that. > In userspace you have smartd and udisks2 to worry about. The > information they need to know is whether the disk has otherwise been in > recent use and so they should not poll the SMART data. I don't see > anything exported in the power/ directory that would give an indication > of the current *remaining* time until autosuspend, or how long it has > been since the last access. Either one would be useful for userspace to > decide that nobody else seems to be using the disk, so I'll leave it > alone for now so it can go to sleep.
Damien Le Moal <dlemoal@kernel.org> writes: > What does not work ? Everything is fine with my testing: the drive is always > woken up whenever someone (FS, applications etc) issue a block IO (including > flush) to the block layer. That is the expected behavior. If you want to have > the disk keep sleeping, the device users must stop poking at the drive. It seems that I have put my foot in my mouth. When I first started working on these patches way back when, I did see flushes without any writes in the blktrace keeping the drive awake. I assumed that was still a problem that I needed to tackle. I should have tested it first. It seems this problem has been fixed already. ext4 does still seem to issue a flush with no writes in the sync path though, causing a drive to spin up for no reason, then right back down when you suspend-to-ram. I guess I'll ask about this on the ext4 list. Circling back to the PuiS patch, did I understand correctly that you are fine with that as long as it integrates with runtime pm? I had tried at one point to add support for REQUEST SENSE to libata so that sd can issue that to find out if the disk is powered up or not, and set the runtime_pm status of the disk accordingly. Does that make sense to you?
On 2/2/24 05:01, Phillip Susi wrote: > Damien Le Moal <dlemoal@kernel.org> writes: > >> What does not work ? Everything is fine with my testing: the drive is always >> woken up whenever someone (FS, applications etc) issue a block IO (including >> flush) to the block layer. That is the expected behavior. If you want to have >> the disk keep sleeping, the device users must stop poking at the drive. > > It seems that I have put my foot in my mouth. When I first started > working on these patches way back when, I did see flushes without any > writes in the blktrace keeping the drive awake. I assumed that was > still a problem that I needed to tackle. I should have tested it > first. It seems this problem has been fixed already. > > ext4 does still seem to issue a flush with no writes in the sync path > though, causing a drive to spin up for no reason, then right back down > when you suspend-to-ram. I guess I'll ask about this on the ext4 list. > > Circling back to the PuiS patch, did I understand correctly that you are > fine with that as long as it integrates with runtime pm? Yes, but only for drives that report full identify data when PUIS is enabled. For drives that report incomplete identify data, we have no choice but to wake them up. And yes, we need integration with runtime pm to set the initial power state of the drive to standby (instead of "on") for both the ata device and its scsi device. > I had tried at one point to add support for REQUEST SENSE to libata so > that sd can issue that to find out if the disk is powered up or not, and > set the runtime_pm status of the disk accordingly. Does that make sense > to you? I need to check that. I think there may be a better/easier way to get the current power state of a drive. Will get back to you on that.
Damien Le Moal <dlemoal@kernel.org> writes: > Yes, but only for drives that report full identify data when PUIS is enabled. > For drives that report incomplete identify data, we have no choice but to wake > them up. And yes, we need integration with runtime pm to set the > initial power Why was that again? I think you said something about needing to set the speed correctly so you at least need to know if this drive requires a lower speed than the other in the PATA master/slave pair? Wouldn't that only require the speed information, not all identify data? > state of the drive to standby (instead of "on") for both the ata device and its > scsi device. You mean if the whole device hierarchy were changed so that instead of the scsi_host being a child of the port with the links and devices hanging off to the side, the scsi_host would be the child of the ata device? > I need to check that. I think there may be a better/easier way to get the > current power state of a drive. Will get back to you on that. That would be good. At one point that was the way I found, and also I think it was in the SAT-3 spec that is how REQUEST SENSE should be implemented for ATA disks.
On 2/3/24 04:53, Phillip Susi wrote: > Damien Le Moal <dlemoal@kernel.org> writes: > >> Yes, but only for drives that report full identify data when PUIS is enabled. >> For drives that report incomplete identify data, we have no choice but to wake >> them up. And yes, we need integration with runtime pm to set the >> initial power > > Why was that again? I think you said something about needing to set the > speed correctly so you at least need to know if this drive requires a > lower speed than the other in the PATA master/slave pair? Wouldn't that > only require the speed information, not all identify data? See ata_dev_revalidate() and ata_dev_configure() and all the drive features that are checked using the identify data. We need to preserve that to ensure that nothing changed on the drive so that its representation in libata is kept in sync with the drive config. That is why drive starting with PUIS and not giving full identify data *must* be woken up, which is the current libata behavior. >> state of the drive to standby (instead of "on") for both the ata device and its >> scsi device. > > You mean if the whole device hierarchy were changed so that instead of > the scsi_host being a child of the port with the links and devices > hanging off to the side, the scsi_host would be the child of the ata device? You do not need to change the hierarchy of devices. An ata_dev is already a child of its scsi_dev. So if you want to set the ata device to runtime suspend state, you have to have the parent in the same state too. runtime suspend work top-to-bottom in the device chain. You cannot have random device in the middle of the chain going to suspend without the devices above it also being suspended. Also, the user does not use ata devices directly. They use the scsi device representing the ata device. You must thus have that in sync with the ata device state. >> I need to check that. I think there may be a better/easier way to get the >> current power state of a drive. Will get back to you on that. > > That would be good. At one point that was the way I found, and also I > think it was in the SAT-3 spec that is how REQUEST SENSE should be > implemented for ATA disks. TEST UNIT READY is the command to use. I need to check the specs again about how it reports the device state though. I think it is through sense data. The scsi disk driver issues that command already to check that the drive is spun-up. See sd_revalidate_disk() calling sd_spinup_disk(). So that will also need to change to be aware of the initial power state to not spinup the drive if not wanted. That will need to be done extremely carefully though to only affect ata devices with PUIS. Likely some additional scsi device flag will be needed for this. I have not looked into that yet.
Damien Le Moal <dlemoal@kernel.org> writes: > See ata_dev_revalidate() and ata_dev_configure() and all the drive features that > are checked using the identify data. We need to preserve that to ensure that > nothing changed on the drive so that its representation in libata is kept in > sync with the drive config. That is why drive starting with PUIS and not giving > full identify data *must* be woken up, which is the current libata behavior. Yes, the information is needed to revalidate, but why must this revalidation be done during system resume, rather than postponed until later? > You do not need to change the hierarchy of devices. An ata_dev is already a > child of its scsi_dev. So if you want to set the ata device to runtime > suspend How is an ata_dev a child of its own scsi_dev? Or did you mean to say the reverse: the scsi_dev is a child of the ata_dev? But that isn't the case either. In sysfs, you have: ata_port - scsi_host - scsi_target - scsi_lun - ata_link - ata_dev The link and dev hang off to the side, not in the ancestry of the scsi disk. > state, you have to have the parent in the same state too. runtime suspend work > top-to-bottom in the device chain. You cannot have random device in the middle > of the chain going to suspend without the devices above it also being suspended. > > Also, the user does not use ata devices directly. They use the scsi device > representing the ata device. You must thus have that in sync with the ata device > state. Right... so when the system is resumed, first the ata_port is resumed, then it has to be on for the scsi host, target, and lun to resume. At that point sd could check if the disk is actually spinning, and if not, force it to suspend, which then allows the target to suspend, wich then allows the host to suspend, and finally the ata_port.
diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c index 09ed67772fae..6c5269de4bf2 100644 --- a/drivers/ata/libata-core.c +++ b/drivers/ata/libata-core.c @@ -5040,6 +5040,26 @@ void ata_qc_issue(struct ata_queued_cmd *qc) /* if device is sleeping, schedule reset and abort the link */ if (unlikely(qc->dev->flags & ATA_DFLAG_SLEEPING)) { + switch (qc->tf.command) + { + case ATA_CMD_CHK_POWER: + case ATA_CMD_SLEEP: + case ATA_CMD_FLUSH: + case ATA_CMD_FLUSH_EXT: + case ATA_CMD_STANDBYNOW1: + if (qc->tf.command == ATA_CMD_ID_ATA) + { + /* only fake the reply for IDENTIFY if it is from userspace */ + if (ata_tag_internal(qc->tag)) + break; + sg_copy_from_buffer(qc->sg, 1, qc->dev->id, 2 * ATA_ID_WORDS); + } + /* fake reply to avoid waking drive */ + qc->flags |= ATA_QCFLAG_RTF_FILLED; + qc->result_tf.nsect = 0; + ata_qc_complete(qc); + return; + } link->eh_info.action |= ATA_EH_RESET; ata_ehi_push_desc(&link->eh_info, "waking up from sleep"); ata_link_abort(link);