diff mbox series

[v3,8/8] hw/ide/ahci: fix broken SError handling

Message ID 20230609140844.202795-9-nks@flawful.org
State New
Headers show
Series misc AHCI cleanups | expand

Commit Message

Niklas Cassel June 9, 2023, 2:08 p.m. UTC
From: Niklas Cassel <niklas.cassel@wdc.com>

When encountering an NCQ error, you should not write the NCQ tag to the
SError register. This is completely wrong.

The SError register has a clear definition, where each bit represents a
different error, see PxSERR definition in AHCI 1.3.1.

If we write a random value (like the NCQ tag) in SError, e.g. Linux will
read SError, and will trigger arbitrary error handling depending on the
NCQ tag that happened to be executing.

In case of success, ncq_cb() will call ncq_finish().
In case of error, ncq_cb() will call ncq_err() (which will clear
ncq_tfs->used), and then call ncq_finish(), thus using ncq_tfs->used is
sufficient to tell if finished should get set or not.

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
---
 hw/ide/ahci.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

Comments

Philippe Mathieu-Daudé July 19, 2023, 11:59 a.m. UTC | #1
On 9/6/23 16:08, Niklas Cassel wrote:
> From: Niklas Cassel <niklas.cassel@wdc.com>
> 
> When encountering an NCQ error, you should not write the NCQ tag to the
> SError register. This is completely wrong.
> 
> The SError register has a clear definition, where each bit represents a
> different error, see PxSERR definition in AHCI 1.3.1.
> 
> If we write a random value (like the NCQ tag) in SError, e.g. Linux will
> read SError, and will trigger arbitrary error handling depending on the
> NCQ tag that happened to be executing.
> 
> In case of success, ncq_cb() will call ncq_finish().
> In case of error, ncq_cb() will call ncq_err() (which will clear
> ncq_tfs->used), and then call ncq_finish(), thus using ncq_tfs->used is
> sufficient to tell if finished should get set or not.
> 

Cc: qemu-stable@nongnu.org
Fixes: f6ad2e32f8 ("ahci: add ahci emulation")
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>

> Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
> ---
>   hw/ide/ahci.c | 3 +--
>   1 file changed, 1 insertion(+), 2 deletions(-)
Michael Tokarev July 19, 2023, 12:06 p.m. UTC | #2
19.07.2023 14:59, Philippe Mathieu-Daudé wrote:
> On 9/6/23 16:08, Niklas Cassel wrote:
>> From: Niklas Cassel <niklas.cassel@wdc.com>
>>
>> When encountering an NCQ error, you should not write the NCQ tag to the
>> SError register. This is completely wrong.
>>
>> The SError register has a clear definition, where each bit represents a
>> different error, see PxSERR definition in AHCI 1.3.1.
>>
>> If we write a random value (like the NCQ tag) in SError, e.g. Linux will
>> read SError, and will trigger arbitrary error handling depending on the
>> NCQ tag that happened to be executing.
>>
>> In case of success, ncq_cb() will call ncq_finish().
>> In case of error, ncq_cb() will call ncq_err() (which will clear
>> ncq_tfs->used), and then call ncq_finish(), thus using ncq_tfs->used is
>> sufficient to tell if finished should get set or not.
>>
> 
> Cc: qemu-stable@nongnu.org
> Fixes: f6ad2e32f8 ("ahci: add ahci emulation")

If going this route, it feels like 7/8 "hw/ide/ahci: fix ahci_write_fis_sdb()"
should be there (-stable) too.

/mjt
Philippe Mathieu-Daudé July 19, 2023, 12:11 p.m. UTC | #3
On 19/7/23 14:06, Michael Tokarev wrote:
> 19.07.2023 14:59, Philippe Mathieu-Daudé wrote:
>> On 9/6/23 16:08, Niklas Cassel wrote:
>>> From: Niklas Cassel <niklas.cassel@wdc.com>
>>>
>>> When encountering an NCQ error, you should not write the NCQ tag to the
>>> SError register. This is completely wrong.
>>>
>>> The SError register has a clear definition, where each bit represents a
>>> different error, see PxSERR definition in AHCI 1.3.1.
>>>
>>> If we write a random value (like the NCQ tag) in SError, e.g. Linux will
>>> read SError, and will trigger arbitrary error handling depending on the
>>> NCQ tag that happened to be executing.
>>>
>>> In case of success, ncq_cb() will call ncq_finish().
>>> In case of error, ncq_cb() will call ncq_err() (which will clear
>>> ncq_tfs->used), and then call ncq_finish(), thus using ncq_tfs->used is
>>> sufficient to tell if finished should get set or not.
>>>
>>
>> Cc: qemu-stable@nongnu.org
>> Fixes: f6ad2e32f8 ("ahci: add ahci emulation")
> 
> If going this route, it feels like 7/8 "hw/ide/ahci: fix 
> ahci_write_fis_sdb()"
> should be there (-stable) too.

Certainly; I realized this deserve to be in stable at this final
patch :/
diff mbox series

Patch

diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
index ef6c9fc378..d0a774bc17 100644
--- a/hw/ide/ahci.c
+++ b/hw/ide/ahci.c
@@ -1012,7 +1012,6 @@  static void ncq_err(NCQTransferState *ncq_tfs)
 
     ide_state->error = ABRT_ERR;
     ide_state->status = READY_STAT | ERR_STAT;
-    ncq_tfs->drive->port_regs.scr_err |= (1 << ncq_tfs->tag);
     qemu_sglist_destroy(&ncq_tfs->sglist);
     ncq_tfs->used = 0;
 }
@@ -1022,7 +1021,7 @@  static void ncq_finish(NCQTransferState *ncq_tfs)
     /* If we didn't error out, set our finished bit. Errored commands
      * do not get a bit set for the SDB FIS ACT register, nor do they
      * clear the outstanding bit in scr_act (PxSACT). */
-    if (!(ncq_tfs->drive->port_regs.scr_err & (1 << ncq_tfs->tag))) {
+    if (ncq_tfs->used) {
         ncq_tfs->drive->finished |= (1 << ncq_tfs->tag);
     }