Message ID | 20241023105540.1070012-2-cassel@kernel.org |
---|---|
State | New |
Headers | show |
Series | ata: libata: Set DID_TIME_OUT for commands that actually timed out | expand |
On Wed, Oct 23, 2024 at 12:55:41PM +0200, Niklas Cassel wrote: > When ata_qc_complete() schedules a command for EH using > ata_qc_schedule_eh(), blk_abort_request() will be called, which leads to > req->q->mq_ops->timeout() / scsi_timeout() being called. > > scsi_timeout(), if the LLDD has no abort handler (libata has no abort > handler), will set host byte to DID_TIME_OUT, and then call > scsi_eh_scmd_add() to add the command to EH. > > Thus, when commands first enter libata's EH strategy_handler, all the > commands that have been added to EH will have DID_TIME_OUT set. > > Commit e5dd410acb34 ("ata: libata: Clear DID_TIME_OUT for ATA PT commands > with sense data") clears this bogus DID_TIME_OUT flag for all commands > that reached libata's EH strategy_handler. > > libata has its own flag (AC_ERR_TIMEOUT), that it sets for commands that > have not received a completion at the time of entering EH. > > ata_eh_worth_retry() has no special handling for AC_ERR_TIMEOUT, so by > default timed out commands will get flag ATA_QCFLAG_RETRY set and will be > retried after the port has been reset (ata_eh_link_autopsy() always > triggers a port reset if any command has AC_ERR_TIMEOUT set). > > For commands that have ATA_QCFLAG_RETRY set, but also has an error flag > set (e.g. AC_ERR_TIMEOUT), ata_eh_finish() will not increment > scmd->allowed, so the command will at most be retried (scmd->allowed > number of times, which by default is set to 3). > > However, scsi_eh_flush_done_q() will only retry commands for which > scsi_noretry_cmd() returns false. > > For commands that has DID_TIME_OUT set, if the command is either > has FAILFAST or if the command is a passthrough command, scsi_noretry_cmd() "if the command has either FAILFAST flag set or if ..." I could fix up this sentence when applying. > will return true. Thus, such commands will never be retried. > > Thus, make sure that libata sets SCSI's DID_TIME_OUT flag for commands that > actually timed out (libata's AC_ERR_TIMEOUT flag), such that timed out > commands will once again not be retried if they are also a FAILFAST or > passthrough command. > > Cc: stable@vger.kernel.org > Fixes: e5dd410acb34 ("ata: libata: Clear DID_TIME_OUT for ATA PT commands with sense data") > Reported-by: Lai, Yi <yi1.lai@linux.intel.com> > Closes: https://lore.kernel.org/linux-ide/ZxYz871I3Blsi30F@ly-workstation/ > Signed-off-by: Niklas Cassel <cassel@kernel.org> > --- > drivers/ata/libata-eh.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c > index fa41ea57a978..3b303d4ae37a 100644 > --- a/drivers/ata/libata-eh.c > +++ b/drivers/ata/libata-eh.c > @@ -651,6 +651,7 @@ void ata_scsi_cmd_error_handler(struct Scsi_Host *host, struct ata_port *ap, > /* the scmd has an associated qc */ > if (!(qc->flags & ATA_QCFLAG_EH)) { > /* which hasn't failed yet, timeout */ > + set_host_byte(scmd, DID_TIME_OUT); > qc->err_mask |= AC_ERR_TIMEOUT; > qc->flags |= ATA_QCFLAG_EH; > nr_timedout++; > -- > 2.47.0 >
On 10/23/24 19:55, Niklas Cassel wrote: > When ata_qc_complete() schedules a command for EH using > ata_qc_schedule_eh(), blk_abort_request() will be called, which leads to > req->q->mq_ops->timeout() / scsi_timeout() being called. > > scsi_timeout(), if the LLDD has no abort handler (libata has no abort > handler), will set host byte to DID_TIME_OUT, and then call > scsi_eh_scmd_add() to add the command to EH. > > Thus, when commands first enter libata's EH strategy_handler, all the > commands that have been added to EH will have DID_TIME_OUT set. > > Commit e5dd410acb34 ("ata: libata: Clear DID_TIME_OUT for ATA PT commands > with sense data") clears this bogus DID_TIME_OUT flag for all commands > that reached libata's EH strategy_handler. > > libata has its own flag (AC_ERR_TIMEOUT), that it sets for commands that > have not received a completion at the time of entering EH. > > ata_eh_worth_retry() has no special handling for AC_ERR_TIMEOUT, so by > default timed out commands will get flag ATA_QCFLAG_RETRY set and will be > retried after the port has been reset (ata_eh_link_autopsy() always > triggers a port reset if any command has AC_ERR_TIMEOUT set). > > For commands that have ATA_QCFLAG_RETRY set, but also has an error flag > set (e.g. AC_ERR_TIMEOUT), ata_eh_finish() will not increment > scmd->allowed, so the command will at most be retried (scmd->allowed > number of times, which by default is set to 3). > > However, scsi_eh_flush_done_q() will only retry commands for which > scsi_noretry_cmd() returns false. > > For commands that has DID_TIME_OUT set, if the command is either > has FAILFAST or if the command is a passthrough command, scsi_noretry_cmd() > will return true. Thus, such commands will never be retried. > > Thus, make sure that libata sets SCSI's DID_TIME_OUT flag for commands that > actually timed out (libata's AC_ERR_TIMEOUT flag), such that timed out > commands will once again not be retried if they are also a FAILFAST or > passthrough command. > > Cc: stable@vger.kernel.org > Fixes: e5dd410acb34 ("ata: libata: Clear DID_TIME_OUT for ATA PT commands with sense data") > Reported-by: Lai, Yi <yi1.lai@linux.intel.com> > Closes: https://lore.kernel.org/linux-ide/ZxYz871I3Blsi30F@ly-workstation/ > Signed-off-by: Niklas Cassel <cassel@kernel.org> Looks good. Reviewed-by: Damien Le Moal <dlemoal@kernel.org> > --- > drivers/ata/libata-eh.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c > index fa41ea57a978..3b303d4ae37a 100644 > --- a/drivers/ata/libata-eh.c > +++ b/drivers/ata/libata-eh.c > @@ -651,6 +651,7 @@ void ata_scsi_cmd_error_handler(struct Scsi_Host *host, struct ata_port *ap, > /* the scmd has an associated qc */ > if (!(qc->flags & ATA_QCFLAG_EH)) { > /* which hasn't failed yet, timeout */ > + set_host_byte(scmd, DID_TIME_OUT); > qc->err_mask |= AC_ERR_TIMEOUT; > qc->flags |= ATA_QCFLAG_EH; > nr_timedout++;
On Wed, 23 Oct 2024 12:55:41 +0200, Niklas Cassel wrote: > When ata_qc_complete() schedules a command for EH using > ata_qc_schedule_eh(), blk_abort_request() will be called, which leads to > req->q->mq_ops->timeout() / scsi_timeout() being called. > > scsi_timeout(), if the LLDD has no abort handler (libata has no abort > handler), will set host byte to DID_TIME_OUT, and then call > scsi_eh_scmd_add() to add the command to EH. > > [...] Applied to libata/linux.git (for-6.12-fixes), thanks! [1/1] ata: libata: Set DID_TIME_OUT for commands that actually timed out https://git.kernel.org/libata/linux/c/8e59a2a5 Kind regards, Niklas
diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c index fa41ea57a978..3b303d4ae37a 100644 --- a/drivers/ata/libata-eh.c +++ b/drivers/ata/libata-eh.c @@ -651,6 +651,7 @@ void ata_scsi_cmd_error_handler(struct Scsi_Host *host, struct ata_port *ap, /* the scmd has an associated qc */ if (!(qc->flags & ATA_QCFLAG_EH)) { /* which hasn't failed yet, timeout */ + set_host_byte(scmd, DID_TIME_OUT); qc->err_mask |= AC_ERR_TIMEOUT; qc->flags |= ATA_QCFLAG_EH; nr_timedout++;
When ata_qc_complete() schedules a command for EH using ata_qc_schedule_eh(), blk_abort_request() will be called, which leads to req->q->mq_ops->timeout() / scsi_timeout() being called. scsi_timeout(), if the LLDD has no abort handler (libata has no abort handler), will set host byte to DID_TIME_OUT, and then call scsi_eh_scmd_add() to add the command to EH. Thus, when commands first enter libata's EH strategy_handler, all the commands that have been added to EH will have DID_TIME_OUT set. Commit e5dd410acb34 ("ata: libata: Clear DID_TIME_OUT for ATA PT commands with sense data") clears this bogus DID_TIME_OUT flag for all commands that reached libata's EH strategy_handler. libata has its own flag (AC_ERR_TIMEOUT), that it sets for commands that have not received a completion at the time of entering EH. ata_eh_worth_retry() has no special handling for AC_ERR_TIMEOUT, so by default timed out commands will get flag ATA_QCFLAG_RETRY set and will be retried after the port has been reset (ata_eh_link_autopsy() always triggers a port reset if any command has AC_ERR_TIMEOUT set). For commands that have ATA_QCFLAG_RETRY set, but also has an error flag set (e.g. AC_ERR_TIMEOUT), ata_eh_finish() will not increment scmd->allowed, so the command will at most be retried (scmd->allowed number of times, which by default is set to 3). However, scsi_eh_flush_done_q() will only retry commands for which scsi_noretry_cmd() returns false. For commands that has DID_TIME_OUT set, if the command is either has FAILFAST or if the command is a passthrough command, scsi_noretry_cmd() will return true. Thus, such commands will never be retried. Thus, make sure that libata sets SCSI's DID_TIME_OUT flag for commands that actually timed out (libata's AC_ERR_TIMEOUT flag), such that timed out commands will once again not be retried if they are also a FAILFAST or passthrough command. Cc: stable@vger.kernel.org Fixes: e5dd410acb34 ("ata: libata: Clear DID_TIME_OUT for ATA PT commands with sense data") Reported-by: Lai, Yi <yi1.lai@linux.intel.com> Closes: https://lore.kernel.org/linux-ide/ZxYz871I3Blsi30F@ly-workstation/ Signed-off-by: Niklas Cassel <cassel@kernel.org> --- drivers/ata/libata-eh.c | 1 + 1 file changed, 1 insertion(+)