diff mbox series

ata: libata: Clear DID_TIME_OUT at the start of EH instead of at the end

Message ID 20240906142720.1977225-2-cassel@kernel.org
State New
Headers show
Series ata: libata: Clear DID_TIME_OUT at the start of EH instead of at the end | expand

Commit Message

Niklas Cassel Sept. 6, 2024, 2:27 p.m. UTC
When ata_qc_complete() schedules a command for EH using
ata_qc_schedule_eh(), blk_abort_request() will be called, which leads to
req->q->mq_ops->timeout() / scsi_timeout() being called.

scsi_timeout(), if the LLDD has no abort handler (libata has no abort
handler), will set host byte to DID_TIME_OUT, and then call
scsi_eh_scmd_add() to add the command to EH.

Thus, when commands first enter libata's EH strategy_handler, all the
commands that have been added to EH will have DID_TIME_OUT set.

libata has its own flag (AC_ERR_TIMEOUT), that it sets for commands that
have not received a completion at the time of entering EH.

Thus, we don't really care about DID_TIME_OUT at all, and currently clear
the host byte at the end of EH, in ata_scsi_qc_complete(), before
scsi_eh_finish_cmd() is called.

ata_scsi_qc_complete() will be called both for commands that are completed
normally (without going via EH), and for commands that went via EH.

It seems more appropriate to clear DID_TIME_OUT at the start of EH instead
of at the end of EH. That way, someone dumping the host byte at the middle
of EH will not see DID_TIME_OUT as being set. No functional change
intended.

This has the additional advantage that we will not needlessly clear the
host byte for commands that did not go via EH.

Signed-off-by: Niklas Cassel <cassel@kernel.org>
---
 drivers/ata/libata-eh.c   | 9 +++++++++
 drivers/ata/libata-scsi.c | 3 ---
 2 files changed, 9 insertions(+), 3 deletions(-)

Comments

Igor Pylypiv Sept. 6, 2024, 6:39 p.m. UTC | #1
On Fri, Sep 06, 2024 at 04:27:21PM +0200, Niklas Cassel wrote:
> When ata_qc_complete() schedules a command for EH using
> ata_qc_schedule_eh(), blk_abort_request() will be called, which leads to
> req->q->mq_ops->timeout() / scsi_timeout() being called.
> 
> scsi_timeout(), if the LLDD has no abort handler (libata has no abort
> handler), will set host byte to DID_TIME_OUT, and then call
> scsi_eh_scmd_add() to add the command to EH.
> 
> Thus, when commands first enter libata's EH strategy_handler, all the
> commands that have been added to EH will have DID_TIME_OUT set.
> 
> libata has its own flag (AC_ERR_TIMEOUT), that it sets for commands that
> have not received a completion at the time of entering EH.
> 
> Thus, we don't really care about DID_TIME_OUT at all, and currently clear
> the host byte at the end of EH, in ata_scsi_qc_complete(), before
> scsi_eh_finish_cmd() is called.
> 
> ata_scsi_qc_complete() will be called both for commands that are completed
> normally (without going via EH), and for commands that went via EH.
> 
> It seems more appropriate to clear DID_TIME_OUT at the start of EH instead
> of at the end of EH. That way, someone dumping the host byte at the middle
> of EH will not see DID_TIME_OUT as being set. No functional change
> intended.
> 
> This has the additional advantage that we will not needlessly clear the
> host byte for commands that did not go via EH.
> 
> Signed-off-by: Niklas Cassel <cassel@kernel.org>

Thank you for coming up with this generic solution, Niklas!

I tested the patch with sg_sat_read_gplog. Without the fix we can see that
DID_TIME_OUT is being reported to user space. The issue is gone with the fix
applied.

Without the fix:                                                             
----------------                                                                 
$ sg_sat_read_gplog -a 30 -p 9 --dma -vv /dev/sda                               
open /dev/sda with flags=0x802                                                  
Building ATA READ LOG DMA EXT command; la=0x1e, pn=0x9, this_count=1            
    ATA pass-through(16) cdb: [85 0d 0e 00 00 00 01 00 1e 00 09 00 00 00 47 00] 
ATA pass-through(16): transport error: Host_status=0x03 [DID_TIME_OUT]          
Driver_status=0x08 [DRIVER_SENSE]                        ^^^^^^^^^^^^           
                                                                                
ATA pass-through(16):                                                           
    Fixed format, current; Sense key: Illegal Request                           
    ASC=24, ASCQ=00 (hex)                                                       
      Info fld=0x4530001 [72548353]                                             
     Raw sense data (in hex), sb_len=64                                         
            f0 00 05 04 53 00 01 0a  80 1e 09 00 24 00 00 00                    
            00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00                    
            00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00                    
            00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00                    
                                                                                
ATA PASS-THROUGH (16) failed                                                    
                                                                                
$ echo $?                                                                       
99                                                                              
                                                                                
With the fix:                                                                  
-------------                                                                  
$ sg_sat_read_gplog -a 30 -p 9 --dma -vv /dev/sda                               
open /dev/sda with flags=0x802                                                  
Building ATA READ LOG DMA EXT command; la=0x1e, pn=0x9, this_count=1            
    ATA pass-through(16) cdb: [85 0d 0e 00 00 00 01 00 1e 00 09 00 00 00 47 00] 
ATA pass through:                                                               
Fixed format, current; Sense key: Illegal Request                               
ASC=24, ASCQ=00 (hex)                                                           
  Info fld=0x4530001 [72548353]                                                 
                                                                                
$ echo $?                                                                       
5   


Tested-by: Igor Pylypiv <ipylypiv@google.com>

Thanks,
Igor

> ---
>  drivers/ata/libata-eh.c   | 9 +++++++++
>  drivers/ata/libata-scsi.c | 3 ---
>  2 files changed, 9 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
> index 7de97ee8e78b..450e9bd96c97 100644
> --- a/drivers/ata/libata-eh.c
> +++ b/drivers/ata/libata-eh.c
> @@ -630,6 +630,15 @@ void ata_scsi_cmd_error_handler(struct Scsi_Host *host, struct ata_port *ap,
>  	list_for_each_entry_safe(scmd, tmp, eh_work_q, eh_entry) {
>  		struct ata_queued_cmd *qc;
>  
> +		/*
> +		 * If the scmd was added to EH, via ata_qc_schedule_eh() ->
> +		 * scsi_timeout() -> scsi_eh_scmd_add(), scsi_timeout() will
> +		 * have set DID_TIME_OUT (since libata does not have an abort
> +		 * handler). Thus to clear the DID_TIME_OUT, we clear the host
> +		 * byte (but keep the SCSI ML and status byte).
> +		 */
> +		scmd->result &= 0x0000ffff;
> +
>  		ata_qc_for_each_raw(ap, qc, i) {
>  			if (qc->flags & ATA_QCFLAG_ACTIVE &&
>  			    qc->scsicmd == scmd)
> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
> index 3a442f564b0d..6a90062c8b55 100644
> --- a/drivers/ata/libata-scsi.c
> +++ b/drivers/ata/libata-scsi.c
> @@ -1680,9 +1680,6 @@ static void ata_scsi_qc_complete(struct ata_queued_cmd *qc)
>  			set_status_byte(qc->scsicmd, SAM_STAT_CHECK_CONDITION);
>  	} else if (is_error && !have_sense) {
>  		ata_gen_ata_sense(qc);
> -	} else {
> -		/* Keep the SCSI ML and status byte, clear host byte. */
> -		cmd->result &= 0x0000ffff;
>  	}
>  
>  	ata_qc_done(qc);
> -- 
> 2.46.0
>
Niklas Cassel Sept. 6, 2024, 8:31 p.m. UTC | #2
On 6 September 2024 20:39:03 CEST, Igor Pylypiv <ipylypiv@google.com> wrote:
>On Fri, Sep 06, 2024 at 04:27:21PM +0200, Niklas Cassel wrote:
>> When ata_qc_complete() schedules a command for EH using
>> ata_qc_schedule_eh(), blk_abort_request() will be called, which leads to
>> req->q->mq_ops->timeout() / scsi_timeout() being called.
>> 
>> scsi_timeout(), if the LLDD has no abort handler (libata has no abort
>> handler), will set host byte to DID_TIME_OUT, and then call
>> scsi_eh_scmd_add() to add the command to EH.
>> 
>> Thus, when commands first enter libata's EH strategy_handler, all the
>> commands that have been added to EH will have DID_TIME_OUT set.
>> 
>> libata has its own flag (AC_ERR_TIMEOUT), that it sets for commands that
>> have not received a completion at the time of entering EH.
>> 
>> Thus, we don't really care about DID_TIME_OUT at all, and currently clear
>> the host byte at the end of EH, in ata_scsi_qc_complete(), before
>> scsi_eh_finish_cmd() is called.
>> 
>> ata_scsi_qc_complete() will be called both for commands that are completed
>> normally (without going via EH), and for commands that went via EH.
>> 
>> It seems more appropriate to clear DID_TIME_OUT at the start of EH instead
>> of at the end of EH. That way, someone dumping the host byte at the middle
>> of EH will not see DID_TIME_OUT as being set. No functional change
>> intended.
>> 
>> This has the additional advantage that we will not needlessly clear the
>> host byte for commands that did not go via EH.
>> 
>> Signed-off-by: Niklas Cassel <cassel@kernel.org>
>
>Thank you for coming up with this generic solution, Niklas!
>
>I tested the patch with sg_sat_read_gplog. Without the fix we can see that
>DID_TIME_OUT is being reported to user space. The issue is gone with the fix
>applied.
>
>Without the fix:                                                             
>----------------                                                                 
>$ sg_sat_read_gplog -a 30 -p 9 --dma -vv /dev/sda                               
>open /dev/sda with flags=0x802                                                  
>Building ATA READ LOG DMA EXT command; la=0x1e, pn=0x9, this_count=1            
>    ATA pass-through(16) cdb: [85 0d 0e 00 00 00 01 00 1e 00 09 00 00 00 47 00] 
>ATA pass-through(16): transport error: Host_status=0x03 [DID_TIME_OUT]          
>Driver_status=0x08 [DRIVER_SENSE]                        ^^^^^^^^^^^^           
>                                                                                
>ATA pass-through(16):                                                           
>    Fixed format, current; Sense key: Illegal Request                           
>    ASC=24, ASCQ=00 (hex)                                                       
>      Info fld=0x4530001 [72548353]                                             
>     Raw sense data (in hex), sb_len=64                                         
>            f0 00 05 04 53 00 01 0a  80 1e 09 00 24 00 00 00                    
>            00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00                    
>            00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00                    
>            00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00                    
>                                                                                
>ATA PASS-THROUGH (16) failed                                                    
>                                                                                
>$ echo $?                                                                       
>99                                                                              
>                                                                                
>With the fix:                                                                  
>-------------                                                                  
>$ sg_sat_read_gplog -a 30 -p 9 --dma -vv /dev/sda                               
>open /dev/sda with flags=0x802                                                  
>Building ATA READ LOG DMA EXT command; la=0x1e, pn=0x9, this_count=1            
>    ATA pass-through(16) cdb: [85 0d 0e 00 00 00 01 00 1e 00 09 00 00 00 47 00] 
>ATA pass through:                                                               
>Fixed format, current; Sense key: Illegal Request                               
>ASC=24, ASCQ=00 (hex)                                                           
>  Info fld=0x4530001 [72548353]                                                 
>                                                                                
>$ echo $?                                                                       
>5   
>
>
>Tested-by: Igor Pylypiv <ipylypiv@google.com>

Thank you for testing!

Since you point out that this is actually solving a real issue (for ATA PT commands), we should add a Fixes-tag, and improve the commit message to mention that it fixes ATA PT commands.

Will send a V2 on Monday, with your Reported-by tag.

Have a nice weekend!

Kind regards,
Niklas
diff mbox series

Patch

diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
index 7de97ee8e78b..450e9bd96c97 100644
--- a/drivers/ata/libata-eh.c
+++ b/drivers/ata/libata-eh.c
@@ -630,6 +630,15 @@  void ata_scsi_cmd_error_handler(struct Scsi_Host *host, struct ata_port *ap,
 	list_for_each_entry_safe(scmd, tmp, eh_work_q, eh_entry) {
 		struct ata_queued_cmd *qc;
 
+		/*
+		 * If the scmd was added to EH, via ata_qc_schedule_eh() ->
+		 * scsi_timeout() -> scsi_eh_scmd_add(), scsi_timeout() will
+		 * have set DID_TIME_OUT (since libata does not have an abort
+		 * handler). Thus to clear the DID_TIME_OUT, we clear the host
+		 * byte (but keep the SCSI ML and status byte).
+		 */
+		scmd->result &= 0x0000ffff;
+
 		ata_qc_for_each_raw(ap, qc, i) {
 			if (qc->flags & ATA_QCFLAG_ACTIVE &&
 			    qc->scsicmd == scmd)
diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index 3a442f564b0d..6a90062c8b55 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -1680,9 +1680,6 @@  static void ata_scsi_qc_complete(struct ata_queued_cmd *qc)
 			set_status_byte(qc->scsicmd, SAM_STAT_CHECK_CONDITION);
 	} else if (is_error && !have_sense) {
 		ata_gen_ata_sense(qc);
-	} else {
-		/* Keep the SCSI ML and status byte, clear host byte. */
-		cmd->result &= 0x0000ffff;
 	}
 
 	ata_qc_done(qc);