Message ID | 20240909084745.2029818-2-cassel@kernel.org |
---|---|
State | New |
Headers | show |
Series | [v2] ata: libata: Clear DID_TIME_OUT for ATA PT commands with sense data | expand |
On 9/9/24 17:47, Niklas Cassel wrote: > When ata_qc_complete() schedules a command for EH using > ata_qc_schedule_eh(), blk_abort_request() will be called, which leads to > req->q->mq_ops->timeout() / scsi_timeout() being called. > > scsi_timeout(), if the LLDD has no abort handler (libata has no abort > handler), will set host byte to DID_TIME_OUT, and then call > scsi_eh_scmd_add() to add the command to EH. > > Thus, when commands first enter libata's EH strategy_handler, all the > commands that have been added to EH will have DID_TIME_OUT set. > > libata has its own flag (AC_ERR_TIMEOUT), that it sets for commands that > have not received a completion at the time of entering EH. > > Thus, libata doesn't really care about DID_TIME_OUT at all, and currently > clears the host byte at the end of EH, in ata_scsi_qc_complete(), before > scsi_eh_finish_cmd() is called. > > However, this clearing in ata_scsi_qc_complete() is currently only done > for commands that are not ATA passthrough commands. > > Since the host byte is visible in the completion that we return to user > space for ATA passthrough commands, for ATA passthrough commands that got > completed via EH (commands with sense data), the user will incorrectly see: > ATA pass-through(16): transport error: Host_status=0x03 [DID_TIME_OUT] > > Fix this by moving the clearing of the host byte (which is currently only > done for commands that are not ATA passthrough commands) from > ata_scsi_qc_complete() to the start of EH (regardless if the command is > ATA passthrough or not). > > This will make sure that we: > -Correctly clear DID_TIME_OUT for both ATA passthrough commands and > commands that are not ATA passthrough commands. > -Do not needlessly clear the host byte for commands that did not go via EH. > ata_scsi_qc_complete() is called both for commands that are completed > normally (without going via EH), and for commands that went via EH, > however, only commands that went via EH will have DID_TIME_OUT set. > > Fixes: 24aeebbf8ea9 ("scsi: ata: libata: Change ata_eh_request_sense() to not set CHECK_CONDITION") > Reported-by: Igor Pylypiv <ipylypiv@google.com> > Closes: https://lore.kernel.org/linux-ide/ZttIN8He8TOZ7Lct@google.com/ > Tested-by: Igor Pylypiv <ipylypiv@google.com> > Signed-off-by: Niklas Cassel <cassel@kernel.org> > --- > Changes since v1: > -Picked up tags from Igor. > -Added Fixes tag. > -Improved the commit message to clearly state that this is currently a > real bug for ATA PT commands with sense data. > > drivers/ata/libata-eh.c | 9 +++++++++ > drivers/ata/libata-scsi.c | 3 --- > 2 files changed, 9 insertions(+), 3 deletions(-) > > diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c > index 7de97ee8e78b..450e9bd96c97 100644 > --- a/drivers/ata/libata-eh.c > +++ b/drivers/ata/libata-eh.c > @@ -630,6 +630,15 @@ void ata_scsi_cmd_error_handler(struct Scsi_Host *host, struct ata_port *ap, > list_for_each_entry_safe(scmd, tmp, eh_work_q, eh_entry) { > struct ata_queued_cmd *qc; > > + /* > + * If the scmd was added to EH, via ata_qc_schedule_eh() -> > + * scsi_timeout() -> scsi_eh_scmd_add(), scsi_timeout() will > + * have set DID_TIME_OUT (since libata does not have an abort > + * handler). Thus to clear the DID_TIME_OUT, we clear the host > + * byte (but keep the SCSI ML and status byte). > + */ > + scmd->result &= 0x0000ffff; I know it was like that, but why not: set_host_byte(scmd, 0); or set_host_byte(scmd, DID_OK); ? set_host_byte() uses the mask 0xff00ffff, since the upper 8 bits seem to be ignored: bits [0..7] are the status byte, [16..23] are the host byte and bits [8..15] are the message byte but that is unused. > + > ata_qc_for_each_raw(ap, qc, i) { > if (qc->flags & ATA_QCFLAG_ACTIVE && > qc->scsicmd == scmd) > diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c > index 3a442f564b0d..6a90062c8b55 100644 > --- a/drivers/ata/libata-scsi.c > +++ b/drivers/ata/libata-scsi.c > @@ -1680,9 +1680,6 @@ static void ata_scsi_qc_complete(struct ata_queued_cmd *qc) > set_status_byte(qc->scsicmd, SAM_STAT_CHECK_CONDITION); > } else if (is_error && !have_sense) { > ata_gen_ata_sense(qc); > - } else { > - /* Keep the SCSI ML and status byte, clear host byte. */ > - cmd->result &= 0x0000ffff; > } > > ata_qc_done(qc);
On Mon, Sep 09, 2024 at 09:52:53PM +0900, Damien Le Moal wrote: > On 9/9/24 17:47, Niklas Cassel wrote: > > When ata_qc_complete() schedules a command for EH using > > ata_qc_schedule_eh(), blk_abort_request() will be called, which leads to > > req->q->mq_ops->timeout() / scsi_timeout() being called. > > > > scsi_timeout(), if the LLDD has no abort handler (libata has no abort > > handler), will set host byte to DID_TIME_OUT, and then call > > scsi_eh_scmd_add() to add the command to EH. > > > > Thus, when commands first enter libata's EH strategy_handler, all the > > commands that have been added to EH will have DID_TIME_OUT set. > > > > libata has its own flag (AC_ERR_TIMEOUT), that it sets for commands that > > have not received a completion at the time of entering EH. > > > > Thus, libata doesn't really care about DID_TIME_OUT at all, and currently > > clears the host byte at the end of EH, in ata_scsi_qc_complete(), before > > scsi_eh_finish_cmd() is called. > > > > However, this clearing in ata_scsi_qc_complete() is currently only done > > for commands that are not ATA passthrough commands. > > > > Since the host byte is visible in the completion that we return to user > > space for ATA passthrough commands, for ATA passthrough commands that got > > completed via EH (commands with sense data), the user will incorrectly see: > > ATA pass-through(16): transport error: Host_status=0x03 [DID_TIME_OUT] > > > > Fix this by moving the clearing of the host byte (which is currently only > > done for commands that are not ATA passthrough commands) from > > ata_scsi_qc_complete() to the start of EH (regardless if the command is > > ATA passthrough or not). > > > > This will make sure that we: > > -Correctly clear DID_TIME_OUT for both ATA passthrough commands and > > commands that are not ATA passthrough commands. > > -Do not needlessly clear the host byte for commands that did not go via EH. > > ata_scsi_qc_complete() is called both for commands that are completed > > normally (without going via EH), and for commands that went via EH, > > however, only commands that went via EH will have DID_TIME_OUT set. > > > > Fixes: 24aeebbf8ea9 ("scsi: ata: libata: Change ata_eh_request_sense() to not set CHECK_CONDITION") > > Reported-by: Igor Pylypiv <ipylypiv@google.com> > > Closes: https://lore.kernel.org/linux-ide/ZttIN8He8TOZ7Lct@google.com/ > > Tested-by: Igor Pylypiv <ipylypiv@google.com> > > Signed-off-by: Niklas Cassel <cassel@kernel.org> > > --- > > Changes since v1: > > -Picked up tags from Igor. > > -Added Fixes tag. > > -Improved the commit message to clearly state that this is currently a > > real bug for ATA PT commands with sense data. > > > > drivers/ata/libata-eh.c | 9 +++++++++ > > drivers/ata/libata-scsi.c | 3 --- > > 2 files changed, 9 insertions(+), 3 deletions(-) > > > > diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c > > index 7de97ee8e78b..450e9bd96c97 100644 > > --- a/drivers/ata/libata-eh.c > > +++ b/drivers/ata/libata-eh.c > > @@ -630,6 +630,15 @@ void ata_scsi_cmd_error_handler(struct Scsi_Host *host, struct ata_port *ap, > > list_for_each_entry_safe(scmd, tmp, eh_work_q, eh_entry) { > > struct ata_queued_cmd *qc; > > > > + /* > > + * If the scmd was added to EH, via ata_qc_schedule_eh() -> > > + * scsi_timeout() -> scsi_eh_scmd_add(), scsi_timeout() will > > + * have set DID_TIME_OUT (since libata does not have an abort > > + * handler). Thus to clear the DID_TIME_OUT, we clear the host > > + * byte (but keep the SCSI ML and status byte). > > + */ > > + scmd->result &= 0x0000ffff; > > I know it was like that, but why not: > > set_host_byte(scmd, 0); > or > set_host_byte(scmd, DID_OK); > > ? No particular reason. Since we basically just moving the code, it made sense to keep it similar to the original code, but I can submit a v3 that instead does: set_host_byte(scmd, DID_OK); if you prefer that. Strictly speaking, that would probably require us to drop Igor's Tested-by though (even if the generated code for an optimizing compiler ought to generate the same code). > > set_host_byte() uses the mask 0xff00ffff, since the upper 8 bits seem to be > ignored: bits [0..7] are the status byte, [16..23] are the host byte and bits > [8..15] are the message byte but that is unused. Nit: 8..16 is the SCSI midlayer byte, not message byte, see 36ebf1e2aa14 ("scsi: core: Add error codes for internal SCSI midlayer use") Kind regards, Niklas
On 9/9/24 22:08, Niklas Cassel wrote: > On Mon, Sep 09, 2024 at 09:52:53PM +0900, Damien Le Moal wrote: >> On 9/9/24 17:47, Niklas Cassel wrote: >>> When ata_qc_complete() schedules a command for EH using >>> ata_qc_schedule_eh(), blk_abort_request() will be called, which leads to >>> req->q->mq_ops->timeout() / scsi_timeout() being called. >>> >>> scsi_timeout(), if the LLDD has no abort handler (libata has no abort >>> handler), will set host byte to DID_TIME_OUT, and then call >>> scsi_eh_scmd_add() to add the command to EH. >>> >>> Thus, when commands first enter libata's EH strategy_handler, all the >>> commands that have been added to EH will have DID_TIME_OUT set. >>> >>> libata has its own flag (AC_ERR_TIMEOUT), that it sets for commands that >>> have not received a completion at the time of entering EH. >>> >>> Thus, libata doesn't really care about DID_TIME_OUT at all, and currently >>> clears the host byte at the end of EH, in ata_scsi_qc_complete(), before >>> scsi_eh_finish_cmd() is called. >>> >>> However, this clearing in ata_scsi_qc_complete() is currently only done >>> for commands that are not ATA passthrough commands. >>> >>> Since the host byte is visible in the completion that we return to user >>> space for ATA passthrough commands, for ATA passthrough commands that got >>> completed via EH (commands with sense data), the user will incorrectly see: >>> ATA pass-through(16): transport error: Host_status=0x03 [DID_TIME_OUT] >>> >>> Fix this by moving the clearing of the host byte (which is currently only >>> done for commands that are not ATA passthrough commands) from >>> ata_scsi_qc_complete() to the start of EH (regardless if the command is >>> ATA passthrough or not). >>> >>> This will make sure that we: >>> -Correctly clear DID_TIME_OUT for both ATA passthrough commands and >>> commands that are not ATA passthrough commands. >>> -Do not needlessly clear the host byte for commands that did not go via EH. >>> ata_scsi_qc_complete() is called both for commands that are completed >>> normally (without going via EH), and for commands that went via EH, >>> however, only commands that went via EH will have DID_TIME_OUT set. >>> >>> Fixes: 24aeebbf8ea9 ("scsi: ata: libata: Change ata_eh_request_sense() to not set CHECK_CONDITION") >>> Reported-by: Igor Pylypiv <ipylypiv@google.com> >>> Closes: https://lore.kernel.org/linux-ide/ZttIN8He8TOZ7Lct@google.com/ >>> Tested-by: Igor Pylypiv <ipylypiv@google.com> >>> Signed-off-by: Niklas Cassel <cassel@kernel.org> >>> --- >>> Changes since v1: >>> -Picked up tags from Igor. >>> -Added Fixes tag. >>> -Improved the commit message to clearly state that this is currently a >>> real bug for ATA PT commands with sense data. >>> >>> drivers/ata/libata-eh.c | 9 +++++++++ >>> drivers/ata/libata-scsi.c | 3 --- >>> 2 files changed, 9 insertions(+), 3 deletions(-) >>> >>> diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c >>> index 7de97ee8e78b..450e9bd96c97 100644 >>> --- a/drivers/ata/libata-eh.c >>> +++ b/drivers/ata/libata-eh.c >>> @@ -630,6 +630,15 @@ void ata_scsi_cmd_error_handler(struct Scsi_Host *host, struct ata_port *ap, >>> list_for_each_entry_safe(scmd, tmp, eh_work_q, eh_entry) { >>> struct ata_queued_cmd *qc; >>> >>> + /* >>> + * If the scmd was added to EH, via ata_qc_schedule_eh() -> >>> + * scsi_timeout() -> scsi_eh_scmd_add(), scsi_timeout() will >>> + * have set DID_TIME_OUT (since libata does not have an abort >>> + * handler). Thus to clear the DID_TIME_OUT, we clear the host >>> + * byte (but keep the SCSI ML and status byte). >>> + */ >>> + scmd->result &= 0x0000ffff; >> >> I know it was like that, but why not: >> >> set_host_byte(scmd, 0); >> or >> set_host_byte(scmd, DID_OK); >> >> ? > > No particular reason. Since we basically just moving the code, > it made sense to keep it similar to the original code, but I > can submit a v3 that instead does: > set_host_byte(scmd, DID_OK); > > if you prefer that. I do prefer it. The magic 0x0000ffff mask is not exactly clear (the comment helps though)... Side note: patching scsi to define macros for status, ML and host byte masks and shifts would be very nice :) > > Strictly speaking, that would probably require us to drop Igor's > Tested-by though (even if the generated code for an optimizing > compiler ought to generate the same code). > > >> >> set_host_byte() uses the mask 0xff00ffff, since the upper 8 bits seem to be >> ignored: bits [0..7] are the status byte, [16..23] are the host byte and bits >> [8..15] are the message byte but that is unused. > > Nit: 8..16 is the SCSI midlayer byte, not message byte, see > 36ebf1e2aa14 ("scsi: core: Add error codes for internal SCSI midlayer use") > > > Kind regards, > Niklas
diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c index 7de97ee8e78b..450e9bd96c97 100644 --- a/drivers/ata/libata-eh.c +++ b/drivers/ata/libata-eh.c @@ -630,6 +630,15 @@ void ata_scsi_cmd_error_handler(struct Scsi_Host *host, struct ata_port *ap, list_for_each_entry_safe(scmd, tmp, eh_work_q, eh_entry) { struct ata_queued_cmd *qc; + /* + * If the scmd was added to EH, via ata_qc_schedule_eh() -> + * scsi_timeout() -> scsi_eh_scmd_add(), scsi_timeout() will + * have set DID_TIME_OUT (since libata does not have an abort + * handler). Thus to clear the DID_TIME_OUT, we clear the host + * byte (but keep the SCSI ML and status byte). + */ + scmd->result &= 0x0000ffff; + ata_qc_for_each_raw(ap, qc, i) { if (qc->flags & ATA_QCFLAG_ACTIVE && qc->scsicmd == scmd) diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c index 3a442f564b0d..6a90062c8b55 100644 --- a/drivers/ata/libata-scsi.c +++ b/drivers/ata/libata-scsi.c @@ -1680,9 +1680,6 @@ static void ata_scsi_qc_complete(struct ata_queued_cmd *qc) set_status_byte(qc->scsicmd, SAM_STAT_CHECK_CONDITION); } else if (is_error && !have_sense) { ata_gen_ata_sense(qc); - } else { - /* Keep the SCSI ML and status byte, clear host byte. */ - cmd->result &= 0x0000ffff; } ata_qc_done(qc);