diff mbox series

ata: libata-eh: Clear scsicmd->result when setting SAM_STAT_CHECK_CONDITION

Message ID 20240904223727.1149294-1-ipylypiv@google.com
State New
Headers show
Series ata: libata-eh: Clear scsicmd->result when setting SAM_STAT_CHECK_CONDITION | expand

Commit Message

Igor Pylypiv Sept. 4, 2024, 10:37 p.m. UTC
commit 24aeebbf8ea9 ("scsi: ata: libata: Change ata_eh_request_sense() to
not set CHECK_CONDITION") changed the way how SAM_STAT_CHECK_CONDITION is
set. Assignment "scmd->result = SAM_STAT_CHECK_CONDITION;" was replaced by
set_status_byte() which does not clear the scsicmd->result.

By not clearing the scsicmd->result we end up in a state where both
the DID_TIME_OUT host byte and the SAM_STAT_CHECK_CONDITION status
bytes are set.

The DID_TIME_OUT host byte is getting set a part of error handling:

ata_qc_complete()
    ata_qc_schedule_eh()
        blk_abort_request()
            WRITE_ONCE(req->deadline, jiffies);

blk_mq_timeout_work()
    blk_mq_check_expired()
        blk_mq_rq_timed_out()
	    req->q->mq_ops->timeout() / scsi_timeout()
                set_host_byte(scmd, DID_TIME_OUT);

Having the host byte set to DID_TIME_OUT for a command that didn't timeout
is confusing. Let's bring the old behavior back by setting scmd->result to
SAM_STAT_CHECK_CONDITION.

Fixes: 24aeebbf8ea9 ("scsi: ata: libata: Change ata_eh_request_sense() to not set CHECK_CONDITION")
Signed-off-by: Igor Pylypiv <ipylypiv@google.com>
---
 drivers/ata/libata-eh.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Igor Pylypiv Sept. 4, 2024, 11:27 p.m. UTC | #1
On Wed, Sep 04, 2024 at 10:37:27PM +0000, Igor Pylypiv wrote:
> commit 24aeebbf8ea9 ("scsi: ata: libata: Change ata_eh_request_sense() to
> not set CHECK_CONDITION") changed the way how SAM_STAT_CHECK_CONDITION is
> set. Assignment "scmd->result = SAM_STAT_CHECK_CONDITION;" was replaced by
> set_status_byte() which does not clear the scsicmd->result.
> 
> By not clearing the scsicmd->result we end up in a state where both
> the DID_TIME_OUT host byte and the SAM_STAT_CHECK_CONDITION status
> bytes are set.
> 
> The DID_TIME_OUT host byte is getting set a part of error handling:
> 
> ata_qc_complete()
>     ata_qc_schedule_eh()
>         blk_abort_request()
>             WRITE_ONCE(req->deadline, jiffies);
> 
> blk_mq_timeout_work()
>     blk_mq_check_expired()
>         blk_mq_rq_timed_out()
> 	    req->q->mq_ops->timeout() / scsi_timeout()
>                 set_host_byte(scmd, DID_TIME_OUT);
> 
> Having the host byte set to DID_TIME_OUT for a command that didn't timeout
> is confusing. Let's bring the old behavior back by setting scmd->result to
> SAM_STAT_CHECK_CONDITION.
> 
> Fixes: 24aeebbf8ea9 ("scsi: ata: libata: Change ata_eh_request_sense() to not set CHECK_CONDITION")
> Signed-off-by: Igor Pylypiv <ipylypiv@google.com>
> ---
>  drivers/ata/libata-eh.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
> index 214b935c2ced..4927b40e782f 100644
> --- a/drivers/ata/libata-eh.c
> +++ b/drivers/ata/libata-eh.c
> @@ -1605,7 +1605,7 @@ static unsigned int ata_eh_analyze_tf(struct ata_queued_cmd *qc)
>  		 */
>  		if (!(qc->flags & ATA_QCFLAG_SENSE_VALID) &&
>  		    (stat & ATA_SENSE) && ata_eh_request_sense(qc))
> -			set_status_byte(qc->scsicmd, SAM_STAT_CHECK_CONDITION);
> +			qc->scsicmd->result = SAM_STAT_CHECK_CONDITION;

Just realized that this needs a comment "/* Clear the host byte. */" to
prevent someone from changing it back to set_status_byte() in the future.
I'll add the comment in v2.

>  		if (err & ATA_ICRC)
>  			qc->err_mask |= AC_ERR_ATA_BUS;
>  		if (err & (ATA_UNC | ATA_AMNF))
> -- 
> 2.46.0.469.g59c65b2a67-goog
>
Niklas Cassel Sept. 5, 2024, 8:33 a.m. UTC | #2
Hello Igor,

On Wed, Sep 04, 2024 at 10:37:27PM +0000, Igor Pylypiv wrote:
> commit 24aeebbf8ea9 ("scsi: ata: libata: Change ata_eh_request_sense() to
> not set CHECK_CONDITION") changed the way how SAM_STAT_CHECK_CONDITION is
> set. Assignment "scmd->result = SAM_STAT_CHECK_CONDITION;" was replaced by
> set_status_byte() which does not clear the scsicmd->result.

"which does not clear the scsicmd->result"

scsicmd->result is a combination of:
-SCSI status byte
-SCSI ML byte
-host byte

Please be more specific of which byte(s) that you want to clear,
both here and in other places in the commit message.


> 
> By not clearing the scsicmd->result we end up in a state where both
> the DID_TIME_OUT host byte and the SAM_STAT_CHECK_CONDITION status
> bytes are set.
> 


> The DID_TIME_OUT host byte is getting set a part of error handling:
> 
> ata_qc_complete()
>     ata_qc_schedule_eh()
>         blk_abort_request()
>             WRITE_ONCE(req->deadline, jiffies);
> 
> blk_mq_timeout_work()
>     blk_mq_check_expired()
>         blk_mq_rq_timed_out()
> 	    req->q->mq_ops->timeout() / scsi_timeout()
>                 set_host_byte(scmd, DID_TIME_OUT);

I would have reorder your commit log and have this first in the commit log.


> 
> Having the host byte set to DID_TIME_OUT for a command that didn't timeout
> is confusing. Let's bring the old behavior back by setting scmd->result to
> SAM_STAT_CHECK_CONDITION.

I think you are missing something very important in the commit log here:
What is the user visible change before and after your change?

Is there a difference is the error message in dmesg? If not, that should
be mentioned as well.


> 
> Fixes: 24aeebbf8ea9 ("scsi: ata: libata: Change ata_eh_request_sense() to not set CHECK_CONDITION")
> Signed-off-by: Igor Pylypiv <ipylypiv@google.com>
> ---
>  drivers/ata/libata-eh.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
> index 214b935c2ced..4927b40e782f 100644
> --- a/drivers/ata/libata-eh.c
> +++ b/drivers/ata/libata-eh.c
> @@ -1605,7 +1605,7 @@ static unsigned int ata_eh_analyze_tf(struct ata_queued_cmd *qc)
>  		 */
>  		if (!(qc->flags & ATA_QCFLAG_SENSE_VALID) &&
>  		    (stat & ATA_SENSE) && ata_eh_request_sense(qc))
> -			set_status_byte(qc->scsicmd, SAM_STAT_CHECK_CONDITION);
> +			qc->scsicmd->result = SAM_STAT_CHECK_CONDITION;

ata_eh_analyze_tf() will only be called on commands that are owned by EH
(has ATA_QCFLAG_EH set).

Thus this command will end up in:
ata_eh_finish() -> ata_eh_qc_complete() -> __ata_eh_qc_complete() ->
-> __ata_qc_complete() -> qc->complete_fn().

complete_fn will be (except in special cases): ata_scsi_qc_complete()
If you look at ata_scsi_qc_complete(), it already clears the host byte:
https://github.com/torvalds/linux/blob/v6.11-rc6/drivers/ata/libata-scsi.c#L1695-L1696

So could you please be more specific of what problem this change is fixing?
Is it just that you think that it makes sense to clear the host byte earlier
in the call chain?

There are many different paths a QC can take via EH, e.g. policy 0xD NCQ
commands will not fetch sense data via ata_eh_request_sense(), so clearing
the host byte in ata_scsi_qc_complete() should be fine, otherwise we need
to do a similar change to yours in all the different code paths that sets
sense data ...which might actually be something that makes sense, but then
I would expect a patch series that changes all the locations where we set
sense data, not just in ata_eh_analyze_tf(), and then drops the clearing in
ata_scsi_qc_complete() (which was introduced in commit 7574a8377c7a ("ata:
libata-scsi: do not overwrite SCSI ML and status bytes")).


See this patch to see all the places where we set sense data:
https://git.kernel.org/pub/scm/linux/kernel/git/libata/linux.git/commit/?h=for-6.12&id=9526dec226f0779d72f798e7a18375bf8d414775




Side note:
You might also be interested to know that a command that was sent via EH will
be finished using scsi_eh_finish_cmd() (called by __ata_eh_qc_complete()),
and will thus end up in scsi_eh_flush_done_q(). A command that has sense data
will have scmd->retries == scmd->allowed (set by ata_eh_qc_complete()), so you
will end up in this code path:
https://github.com/torvalds/linux/blob/v6.11-rc6/drivers/scsi/scsi_error.c#L2213-L2227

Which means that SCSI EH will set DID_TIME_OUT for any command that does
not have (SCSI ML byte || SCSI status byte || host byte) set.

A command with sense data will have most often have CHECK_CONDITION set, but
there is also CDL policy 0xD commands (which will have the SCSI ML byte set).
So this special flag SCMD_FORCE_EH_SUCCESS is for commands that were completed
successfully without any SK/ASC/ASCQ in the same interrupt as other policy 0xD
commands which did have SK/ASC/ASCQ set.
For details, see 3d848ca1ebc8 ("scsi: core: Allow libata to complete successful
commands via EH").


Kind regards,
Niklas
Niklas Cassel Sept. 5, 2024, 9:25 a.m. UTC | #3
On Thu, Sep 05, 2024 at 10:33:38AM +0200, Niklas Cassel wrote:
> There are many different paths a QC can take via EH, e.g. policy 0xD NCQ
> commands will not fetch sense data via ata_eh_request_sense(), so clearing
> the host byte in ata_scsi_qc_complete() should be fine, otherwise we need
> to do a similar change to yours in all the different code paths that sets
> sense data ...which might actually be something that makes sense, but then
> I would expect a patch series that changes all the locations where we set
> sense data, not just in ata_eh_analyze_tf(), and then drops the clearing in
> ata_scsi_qc_complete() (which was introduced in commit 7574a8377c7a ("ata:
> libata-scsi: do not overwrite SCSI ML and status bytes")).

I tried to implement the suggestion above, it looks like this:

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index e4023fc288ac..ff4945a8f647 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -4824,6 +4824,14 @@ void ata_qc_free(struct ata_queued_cmd *qc)
 		qc->tag = ATA_TAG_POISON;
 }
 
+void ata_qc_set_sense_valid_flag(struct ata_queued_cmd *qc)
+{
+	qc->flags |= ATA_QCFLAG_SENSE_VALID;
+
+	/* Keep the SCSI ML and status byte, clear host byte. */
+	qc->scsicmd->result &= 0x0000ffff;
+}
+
 void __ata_qc_complete(struct ata_queued_cmd *qc)
 {
 	struct ata_port *ap;
diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
index 7de97ee8e78b..df83251601dc 100644
--- a/drivers/ata/libata-eh.c
+++ b/drivers/ata/libata-eh.c
@@ -1482,7 +1482,8 @@ static bool ata_eh_request_sense(struct ata_queued_cmd *qc)
 			scsi_build_sense_buffer(dev->flags & ATA_DFLAG_D_SENSE,
 						cmd->sense_buffer, tf.lbah,
 						tf.lbam, tf.lbal);
-			qc->flags |= ATA_QCFLAG_SENSE_VALID;
+			ata_qc_set_sense_valid_flag(qc);
+
 			return true;
 		}
 	} else {
@@ -1657,7 +1658,7 @@ static unsigned int ata_eh_analyze_tf(struct ata_queued_cmd *qc)
 						qc->scsicmd->sense_buffer,
 						qc->result_tf.error >> 4);
 			if (!tmp)
-				qc->flags |= ATA_QCFLAG_SENSE_VALID;
+				ata_qc_set_sense_valid_flag(qc);
 			else
 				qc->err_mask |= tmp;
 		}
@@ -2049,7 +2050,7 @@ static void ata_eh_get_success_sense(struct ata_link *link)
 
 		/* This success command had sense data, but we failed to get. */
 		ata_scsi_set_sense(dev, qc->scsicmd, ABORTED_COMMAND, 0, 0);
-		qc->flags |= ATA_QCFLAG_SENSE_VALID;
+		ata_qc_set_sense_valid_flag(qc);
 	}
 	ata_eh_done(link, dev, ATA_EH_GET_SUCCESS_SENSE);
 }
diff --git a/drivers/ata/libata-sata.c b/drivers/ata/libata-sata.c
index ea6126c139af..c3bbe8877376 100644
--- a/drivers/ata/libata-sata.c
+++ b/drivers/ata/libata-sata.c
@@ -1452,7 +1452,7 @@ int ata_eh_read_sense_success_ncq_log(struct ata_link *link)
 		scsi_build_sense_buffer(dev->flags & ATA_DFLAG_D_SENSE,
 					qc->scsicmd->sense_buffer, sk,
 					asc, ascq);
-		qc->flags |= ATA_QCFLAG_SENSE_VALID;
+		ata_qc_set_sense_valid_flag(qc);
 
 		/*
 		 * No point in checking the return value, since the command has
@@ -1539,7 +1539,7 @@ void ata_eh_analyze_ncq_error(struct ata_link *link)
 					   ascq);
 			ata_scsi_set_sense_information(dev, qc->scsicmd,
 						       &qc->result_tf);
-			qc->flags |= ATA_QCFLAG_SENSE_VALID;
+			ata_qc_set_sense_valid_flag(qc);
 		}
 	}
 
diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index 3a442f564b0d..6a90062c8b55 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -1680,9 +1680,6 @@ static void ata_scsi_qc_complete(struct ata_queued_cmd *qc)
 			set_status_byte(qc->scsicmd, SAM_STAT_CHECK_CONDITION);
 	} else if (is_error && !have_sense) {
 		ata_gen_ata_sense(qc);
-	} else {
-		/* Keep the SCSI ML and status byte, clear host byte. */
-		cmd->result &= 0x0000ffff;
 	}
 
 	ata_qc_done(qc);
diff --git a/drivers/ata/libata.h b/drivers/ata/libata.h
index ab4bd44ba17c..85d19d6edcea 100644
--- a/drivers/ata/libata.h
+++ b/drivers/ata/libata.h
@@ -70,6 +70,7 @@ extern int ata_down_xfermask_limit(struct ata_device *dev, unsigned int sel);
 extern unsigned int ata_dev_set_feature(struct ata_device *dev,
 					u8 subcmd, u8 action);
 extern void ata_qc_free(struct ata_queued_cmd *qc);
+extern void ata_qc_set_sense_valid_flag(struct ata_queued_cmd *qc);
 extern void ata_qc_issue(struct ata_queued_cmd *qc);
 extern void __ata_qc_complete(struct ata_queued_cmd *qc);
 extern int atapi_check_dma(struct ata_queued_cmd *qc);



I guess the obvious advantage that I can see is that we would do the
right thing regardless of qc->complete_fn.

qc->complete_fn can be any of:
ata_qc_complete_internal()
ata_scsi_qc_complete()
atapi_qc_complete()
ata_scsi_report_zones_complete()

Instead of only doing the right thing if:
qc->complete_fn == ata_scsi_qc_complete().

Thoughts?


Kind regards,
Niklas
Niklas Cassel Sept. 5, 2024, 12:40 p.m. UTC | #4
On Thu, Sep 05, 2024 at 11:25:55AM +0200, Niklas Cassel wrote:
> On Thu, Sep 05, 2024 at 10:33:38AM +0200, Niklas Cassel wrote:
> > There are many different paths a QC can take via EH, e.g. policy 0xD NCQ
> > commands will not fetch sense data via ata_eh_request_sense(), so clearing
> > the host byte in ata_scsi_qc_complete() should be fine, otherwise we need
> > to do a similar change to yours in all the different code paths that sets
> > sense data ...which might actually be something that makes sense, but then
> > I would expect a patch series that changes all the locations where we set
> > sense data, not just in ata_eh_analyze_tf(), and then drops the clearing in
> > ata_scsi_qc_complete() (which was introduced in commit 7574a8377c7a ("ata:
> > libata-scsi: do not overwrite SCSI ML and status bytes")).
> 
> I tried to implement the suggestion above, it looks like this:
> 
> diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
> index e4023fc288ac..ff4945a8f647 100644
> --- a/drivers/ata/libata-core.c
> +++ b/drivers/ata/libata-core.c
> @@ -4824,6 +4824,14 @@ void ata_qc_free(struct ata_queued_cmd *qc)
>  		qc->tag = ATA_TAG_POISON;
>  }
>  
> +void ata_qc_set_sense_valid_flag(struct ata_queued_cmd *qc)
> +{
> +	qc->flags |= ATA_QCFLAG_SENSE_VALID;
> +
> +	/* Keep the SCSI ML and status byte, clear host byte. */
> +	qc->scsicmd->result &= 0x0000ffff;

This would have to be:
	/* Keep the SCSI ML and status byte, clear host byte. */
	if (qc->scsicmd)
		qc->scsicmd->result &= 0x0000ffff;

Since for ata_qc_complete_internal(), qc->scsicmd will be NULL.


> +}
> +

(snip)

> I guess the obvious advantage that I can see is that we would do the
> right thing regardless of qc->complete_fn.
> 
> qc->complete_fn can be any of:
> ata_qc_complete_internal()
> ata_scsi_qc_complete()
> atapi_qc_complete()
> ata_scsi_report_zones_complete()
> 
> Instead of only doing the right thing if:
> qc->complete_fn == ata_scsi_qc_complete().
> 
> Thoughts?

ata_scsi_report_zones_complete() calls ata_scsi_qc_complete().
And like I said above, qc->scsicmd is NULL for ata_qc_complete_internal(),
so I guess the one qc->complete_fn that is currently not doing the right
thing is atapi_qc_complete().

However, looking at atapi_qc_complete(), it actually does:

	if (unlikely(err_mask || qc->flags & ATA_QCFLAG_SENSE_VALID)) {
		...
		qc->scsicmd->result = SAM_STAT_CHECK_CONDITION;
		ata_qc_done(qc);
		return;
	}

	...
	cmd->result = SAM_STAT_GOOD;
	ata_qc_done(qc);

So atapi_qc_complete() will always overwrite the host byte.

So, AFAICT, there is no problematic qc->complete_fn function in the existing
libata code, so I don't think there is any urgency to change the code.

Anyway, I think I came up with an even nicer patch to clear the driver byte.

Change ata_scsi_set_sense():
-to set SENSE_DATA_VALID
-to clear driver byte (if scsicmd)

For the cases that calls:
-scsi_build_sense_buffer()
(because they don't want to set the SAM_STAT_CHECK_CONDITION)
or
-scsi_build_sense()
without using ata_scsi_set_sense():
create a new function/functions (e.g. ata_build_sense_keep_status())
-sets SENSE_DATA_VALID
-clears driver byte (if scsicmd)

Will send a PATCH/RFC series today or tomorrow.


Kind regards,
Niklas
Niklas Cassel Sept. 5, 2024, 7:28 p.m. UTC | #5
On Thu, Sep 05, 2024 at 02:40:28PM +0200, Niklas Cassel wrote:
> Anyway, I think I came up with an even nicer patch to clear the driver byte.
> 
> Change ata_scsi_set_sense():
> -to set SENSE_DATA_VALID
> -to clear driver byte (if scsicmd)
> 
> For the cases that calls:
> -scsi_build_sense_buffer()
> (because they don't want to set the SAM_STAT_CHECK_CONDITION)
> or
> -scsi_build_sense()
> without using ata_scsi_set_sense():
> create a new function/functions (e.g. ata_build_sense_keep_status())
> -sets SENSE_DATA_VALID
> -clears driver byte (if scsicmd)
> 
> Will send a PATCH/RFC series today or tomorrow.

Thinking even more about this:

ata_qc_schedule_eh() will have called scsi_timeout(), which sets
DID_TIME_OUT, since libata does not have an abort handler.

Thus, the command when first entering EH will have DID_TIME_OUT set.
Right now, we clear it as the final thing in ata_scsi_qc_complete().
You are suggesting that we clear it when storing sense data.

However:
There can always be commands that was sent via EH, that will not
have added sense data, e.g. for CDL policy 0xD commands that were
completed in the same IRQ (where some commands might have sense
data, but not all), same with NCQ error, one command will be the
one that triggered the NCQ error, rest will not have sense data.
Thus we will always need to clear the DID_TIME_OUT bit for commands
that were sent via EH...

I think what we could do, is to clear it when first entering EH,
instead of at the end of EH. This is probably the best solution,
because then we can remove the:
cmd->result &= 0x0000ffff;
from ata_scsi_qc_complete(), which is executed both for commands
that went via EH and command that did not go via EH.

And instead clear it in the beginning of EH, so DID_TIME_OUT will
only get cleared for commands that went via EH. (Because only
commands that went via EH will have DID_TIME_OUT set in the first
place.)



So my proposal is simply this:

diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
index 7de97ee8e78b..450e9bd96c97 100644
--- a/drivers/ata/libata-eh.c
+++ b/drivers/ata/libata-eh.c
@@ -630,6 +630,15 @@ void ata_scsi_cmd_error_handler(struct Scsi_Host *host, struct ata_port *ap,
        list_for_each_entry_safe(scmd, tmp, eh_work_q, eh_entry) {
                struct ata_queued_cmd *qc;
 
+               /*
+                * If the scmd was added to EH, via ata_qc_schedule_eh() ->
+                * scsi_timeout() -> scsi_eh_scmd_add(), scsi_timeout() will
+                * have set DID_TIME_OUT (since libata does not have an abort
+                * handler). Thus to clear the DID_TIME_OUT, we clear the host
+                * byte (but keep the SCSI ML and status byte).
+                */
+               scmd->result &= 0x0000ffff;
+
                ata_qc_for_each_raw(ap, qc, i) {
                        if (qc->flags & ATA_QCFLAG_ACTIVE &&
                            qc->scsicmd == scmd)
diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index 3a442f564b0d..6a90062c8b55 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -1680,9 +1680,6 @@ static void ata_scsi_qc_complete(struct ata_queued_cmd *qc)
                        set_status_byte(qc->scsicmd, SAM_STAT_CHECK_CONDITION);
        } else if (is_error && !have_sense) {
                ata_gen_ata_sense(qc);
-       } else {
-               /* Keep the SCSI ML and status byte, clear host byte. */
-               cmd->result &= 0x0000ffff;
        }
 
        ata_qc_done(qc);



Testing is appreciated :)

Thoughts?


Kind regards,
Niklas
Igor Pylypiv Sept. 6, 2024, 6:21 p.m. UTC | #6
On Thu, Sep 05, 2024 at 10:33:38AM +0200, Niklas Cassel wrote:

Hi Niklas,

Thank you so much for a thorough review and coming up with a better patch!

> Hello Igor,
> 
> On Wed, Sep 04, 2024 at 10:37:27PM +0000, Igor Pylypiv wrote:
> > commit 24aeebbf8ea9 ("scsi: ata: libata: Change ata_eh_request_sense() to
> > not set CHECK_CONDITION") changed the way how SAM_STAT_CHECK_CONDITION is
> > set. Assignment "scmd->result = SAM_STAT_CHECK_CONDITION;" was replaced by
> > set_status_byte() which does not clear the scsicmd->result.
> 
> "which does not clear the scsicmd->result"
> 
> scsicmd->result is a combination of:
> -SCSI status byte
> -SCSI ML byte
> -host byte
> 
> Please be more specific of which byte(s) that you want to clear,
> both here and in other places in the commit message.
> 
> 
> > 
> > By not clearing the scsicmd->result we end up in a state where both
> > the DID_TIME_OUT host byte and the SAM_STAT_CHECK_CONDITION status
> > bytes are set.
> > 
> 
> 
> > The DID_TIME_OUT host byte is getting set a part of error handling:
> > 
> > ata_qc_complete()
> >     ata_qc_schedule_eh()
> >         blk_abort_request()
> >             WRITE_ONCE(req->deadline, jiffies);
> > 
> > blk_mq_timeout_work()
> >     blk_mq_check_expired()
> >         blk_mq_rq_timed_out()
> > 	    req->q->mq_ops->timeout() / scsi_timeout()
> >                 set_host_byte(scmd, DID_TIME_OUT);
> 
> I would have reorder your commit log and have this first in the commit log.
> 
> 
> > 
> > Having the host byte set to DID_TIME_OUT for a command that didn't timeout
> > is confusing. Let's bring the old behavior back by setting scmd->result to
> > SAM_STAT_CHECK_CONDITION.
> 
> I think you are missing something very important in the commit log here:
> What is the user visible change before and after your change?
>

For ATA PT commands the 'result' value is being returned to user space so
not clearing the DID_TIME_OUT is user visible.

> Is there a difference is the error message in dmesg? If not, that should
> be mentioned as well.
> 

I haven't seen any dmesg errors reported for ATA PT hence I don't think there
is any difference in dmesg.

> 
> > 
> > Fixes: 24aeebbf8ea9 ("scsi: ata: libata: Change ata_eh_request_sense() to not set CHECK_CONDITION")
> > Signed-off-by: Igor Pylypiv <ipylypiv@google.com>
> > ---
> >  drivers/ata/libata-eh.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
> > index 214b935c2ced..4927b40e782f 100644
> > --- a/drivers/ata/libata-eh.c
> > +++ b/drivers/ata/libata-eh.c
> > @@ -1605,7 +1605,7 @@ static unsigned int ata_eh_analyze_tf(struct ata_queued_cmd *qc)
> >  		 */
> >  		if (!(qc->flags & ATA_QCFLAG_SENSE_VALID) &&
> >  		    (stat & ATA_SENSE) && ata_eh_request_sense(qc))
> > -			set_status_byte(qc->scsicmd, SAM_STAT_CHECK_CONDITION);
> > +			qc->scsicmd->result = SAM_STAT_CHECK_CONDITION;
> 
> ata_eh_analyze_tf() will only be called on commands that are owned by EH
> (has ATA_QCFLAG_EH set).
> 
> Thus this command will end up in:
> ata_eh_finish() -> ata_eh_qc_complete() -> __ata_eh_qc_complete() ->
> -> __ata_qc_complete() -> qc->complete_fn().
> 
> complete_fn will be (except in special cases): ata_scsi_qc_complete()
> If you look at ata_scsi_qc_complete(), it already clears the host byte:
> https://github.com/torvalds/linux/blob/v6.11-rc6/drivers/ata/libata-scsi.c#L1695-L1696
>
> So could you please be more specific of what problem this change is fixing?
> Is it just that you think that it makes sense to clear the host byte earlier
> in the call chain?

I should have mentioned that the issue is specific to ATA PT commands.
The problem is that ata_scsi_qc_complete() is not clearing the host byte
for ATA PT commands causing the DID_TIME_OUT to be returned.

> 
> There are many different paths a QC can take via EH, e.g. policy 0xD NCQ
> commands will not fetch sense data via ata_eh_request_sense(), so clearing
> the host byte in ata_scsi_qc_complete() should be fine, otherwise we need
> to do a similar change to yours in all the different code paths that sets
> sense data ...which might actually be something that makes sense, but then
> I would expect a patch series that changes all the locations where we set
> sense data, not just in ata_eh_analyze_tf(), and then drops the clearing in
> ata_scsi_qc_complete() (which was introduced in commit 7574a8377c7a ("ata:
> libata-scsi: do not overwrite SCSI ML and status bytes")).
> 
> 
> See this patch to see all the places where we set sense data:
> https://git.kernel.org/pub/scm/linux/kernel/git/libata/linux.git/commit/?h=for-6.12&id=9526dec226f0779d72f798e7a18375bf8d414775
> 
> 
> 
> 
> Side note:
> You might also be interested to know that a command that was sent via EH will
> be finished using scsi_eh_finish_cmd() (called by __ata_eh_qc_complete()),
> and will thus end up in scsi_eh_flush_done_q(). A command that has sense data
> will have scmd->retries == scmd->allowed (set by ata_eh_qc_complete()), so you
> will end up in this code path:
> https://github.com/torvalds/linux/blob/v6.11-rc6/drivers/scsi/scsi_error.c#L2213-L2227
> 
> Which means that SCSI EH will set DID_TIME_OUT for any command that does
> not have (SCSI ML byte || SCSI status byte || host byte) set.
> 
> A command with sense data will have most often have CHECK_CONDITION set, but
> there is also CDL policy 0xD commands (which will have the SCSI ML byte set).
> So this special flag SCMD_FORCE_EH_SUCCESS is for commands that were completed
> successfully without any SK/ASC/ASCQ in the same interrupt as other policy 0xD
> commands which did have SK/ASC/ASCQ set.
> For details, see 3d848ca1ebc8 ("scsi: core: Allow libata to complete successful
> commands via EH").
> 
> 
> Kind regards,
> Niklas
diff mbox series

Patch

diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
index 214b935c2ced..4927b40e782f 100644
--- a/drivers/ata/libata-eh.c
+++ b/drivers/ata/libata-eh.c
@@ -1605,7 +1605,7 @@  static unsigned int ata_eh_analyze_tf(struct ata_queued_cmd *qc)
 		 */
 		if (!(qc->flags & ATA_QCFLAG_SENSE_VALID) &&
 		    (stat & ATA_SENSE) && ata_eh_request_sense(qc))
-			set_status_byte(qc->scsicmd, SAM_STAT_CHECK_CONDITION);
+			qc->scsicmd->result = SAM_STAT_CHECK_CONDITION;
 		if (err & ATA_ICRC)
 			qc->err_mask |= AC_ERR_ATA_BUS;
 		if (err & (ATA_UNC | ATA_AMNF))