diff mbox series

ide: Explicitly poll for BHs on cancel

Message ID 20220105111337.10366-1-hreitz@redhat.com
State New
Headers show
Series ide: Explicitly poll for BHs on cancel | expand

Commit Message

Hanna Czenczek Jan. 5, 2022, 11:13 a.m. UTC
When we still have an AIOCB registered for DMA operations, we try to
settle the respective operation by draining the BlockBackend associated
with the IDE device.

However, this assumes that every DMA operation is associated with some
I/O operation on the BlockBackend, and so settling the latter will
settle the former.  That is not the case; for example, the guest is free
to issue a zero-length TRIM operation that will not result in any I/O
operation forwarded to the BlockBackend.  In such a case, blk_drain()
will be a no-op if no other operations are in flight.

It is clear that if blk_drain() is a no-op, the value of
s->bus->dma->aiocb will not change between checking it in the `if`
condition and asserting that it is NULL after blk_drain().

To settle the DMA operation, we will thus need to explicitly invoke
aio_poll() ourselves, which will run any outstanding BHs (like
ide_trim_bh_cb()), until s->bus->dma->aiocb is NULL.  To stop this from
being an infinite loop, assert that we made progress with every
aio_poll() call (i.e., invoked some BH).

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2029980
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
---
Perhaps for a lack of being aware of all the kinds of tests we have, I
found it impossible to write a reproducer in any of our current test
frameworks: From how I understand the issue, to reproduce it, you need
to issue a TRIM request and immediately cancel it, before
ide_trim_bh_cb() (scheduled as a BH) can run.

I wanted to do this via qtest, but that does not work, because every
port I/O operation is done via a qtest command, and QEMU will happily
poll the main context between each qtest command, which means that you
cannot cancel an ongoing IDE request before a BH scheduled by it is run.

Therefore, I wrote an x86 boot sector that sets up a no-op TRIM request
(i.e. one where all TRIM ranges have length 0) and immediately cancels
it by setting SRST.  It is attached to the BZ linked above, and can be
tested as follows:

$ TEST_BIN=test.bin
$ (sleep 1; echo 'info registers'; echo quit) \
    | ./qemu-system-x86_64 \
        -drive if=ide,file=$TEST_BIN,format=raw \
        -monitor stdio \
    | grep EIP= \
    | sed -e 's/ EFL.*//'

The result should be:
EIP=00007c72

Not:
qemu-system-x86_64: ../hw/ide/core.c:734: ide_cancel_dma_sync: Assertion
`s->bus->dma->aiocb == NULL' failed.
---
 hw/ide/core.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

Comments

Hanna Czenczek Jan. 5, 2022, 1:53 p.m. UTC | #1
On 05.01.22 12:13, Hanna Reitz wrote:

[...]

> Perhaps for a lack of being aware of all the kinds of tests we have, I
> found it impossible to write a reproducer in any of our current test
> frameworks: From how I understand the issue, to reproduce it, you need
> to issue a TRIM request and immediately cancel it, before
> ide_trim_bh_cb() (scheduled as a BH) can run.
>
> I wanted to do this via qtest, but that does not work, because every
> port I/O operation is done via a qtest command, and QEMU will happily
> poll the main context between each qtest command, which means that you
> cannot cancel an ongoing IDE request before a BH scheduled by it is run.
>
> Therefore, I wrote an x86 boot sector that sets up a no-op TRIM request
> (i.e. one where all TRIM ranges have length 0) and immediately cancels
> it by setting SRST.

I just realized we could, if we really wanted to, add this to the 
iotests’ sample_images directory, and then run it from an iotest...

Hanna
Philippe Mathieu-Daudé Jan. 6, 2022, 12:11 a.m. UTC | #2
Cc'ing Mark for macio which seems to have the same issue.

On 5/1/22 12:13, Hanna Reitz wrote:
> When we still have an AIOCB registered for DMA operations, we try to
> settle the respective operation by draining the BlockBackend associated
> with the IDE device.
> 
> However, this assumes that every DMA operation is associated with some
> I/O operation on the BlockBackend, and so settling the latter will
> settle the former.  That is not the case; for example, the guest is free
> to issue a zero-length TRIM operation that will not result in any I/O
> operation forwarded to the BlockBackend.  In such a case, blk_drain()
> will be a no-op if no other operations are in flight.
> 
> It is clear that if blk_drain() is a no-op, the value of
> s->bus->dma->aiocb will not change between checking it in the `if`
> condition and asserting that it is NULL after blk_drain().
> 
> To settle the DMA operation, we will thus need to explicitly invoke
> aio_poll() ourselves, which will run any outstanding BHs (like
> ide_trim_bh_cb()), until s->bus->dma->aiocb is NULL.  To stop this from
> being an infinite loop, assert that we made progress with every
> aio_poll() call (i.e., invoked some BH).
> 
> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2029980
> Signed-off-by: Hanna Reitz <hreitz@redhat.com>

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>

> ---
> Perhaps for a lack of being aware of all the kinds of tests we have, I
> found it impossible to write a reproducer in any of our current test
> frameworks: From how I understand the issue, to reproduce it, you need
> to issue a TRIM request and immediately cancel it, before
> ide_trim_bh_cb() (scheduled as a BH) can run.
> 
> I wanted to do this via qtest, but that does not work, because every
> port I/O operation is done via a qtest command, and QEMU will happily
> poll the main context between each qtest command, which means that you
> cannot cancel an ongoing IDE request before a BH scheduled by it is run.
> 
> Therefore, I wrote an x86 boot sector that sets up a no-op TRIM request
> (i.e. one where all TRIM ranges have length 0) and immediately cancels
> it by setting SRST.  It is attached to the BZ linked above, and can be
> tested as follows:
> 
> $ TEST_BIN=test.bin
> $ (sleep 1; echo 'info registers'; echo quit) \
>      | ./qemu-system-x86_64 \
>          -drive if=ide,file=$TEST_BIN,format=raw \
>          -monitor stdio \
>      | grep EIP= \
>      | sed -e 's/ EFL.*//'
> 
> The result should be:
> EIP=00007c72
> 
> Not:
> qemu-system-x86_64: ../hw/ide/core.c:734: ide_cancel_dma_sync: Assertion
> `s->bus->dma->aiocb == NULL' failed.
> ---
>   hw/ide/core.c | 12 +++++++++++-
>   1 file changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/ide/core.c b/hw/ide/core.c
> index e28f8aad61..c7f7a1016c 100644
> --- a/hw/ide/core.c
> +++ b/hw/ide/core.c
> @@ -731,7 +731,17 @@ void ide_cancel_dma_sync(IDEState *s)
>       if (s->bus->dma->aiocb) {
>           trace_ide_cancel_dma_sync_remaining();
>           blk_drain(s->blk);
> -        assert(s->bus->dma->aiocb == NULL);
> +
> +        /*
> +         * Wait for potentially still-scheduled BHs, like ide_trim_bh_cb()
> +         * (blk_drain() will only poll if there are in-flight requests on the
> +         * BlockBackend, which there may not necessarily be, e.g. when the
> +         * guest has issued a zero-length TRIM request)
> +         */
> +        while (s->bus->dma->aiocb) {
> +            bool progress = aio_poll(qemu_get_aio_context(), true);
> +            assert(progress);
> +        }
>       }
>   }
>
Mark Cave-Ayland Jan. 7, 2022, 8:05 a.m. UTC | #3
On 06/01/2022 00:11, Philippe Mathieu-Daudé wrote:

> Cc'ing Mark for macio which seems to have the same issue.

Thanks. Presumably this is all done in the IDE layer so there is no need for any 
changes to the macio device itself?

Note that macio really is just a standard IDE interface but IIRC the reason that 
macio reimplements its own IDE callbacks is because the IDE code doesn't have the 
necessary hooks in place to enable the io->processing (DBDMA) and m->active (macio 
IDE) variables to be set correctly during DMA.

It would be nice one day to be able to fix this so that the existing IDE code could 
be used, which would allow most of this duplicate code to be removed.


ATB,

Mark.

> On 5/1/22 12:13, Hanna Reitz wrote:
>> When we still have an AIOCB registered for DMA operations, we try to
>> settle the respective operation by draining the BlockBackend associated
>> with the IDE device.
>>
>> However, this assumes that every DMA operation is associated with some
>> I/O operation on the BlockBackend, and so settling the latter will
>> settle the former.  That is not the case; for example, the guest is free
>> to issue a zero-length TRIM operation that will not result in any I/O
>> operation forwarded to the BlockBackend.  In such a case, blk_drain()
>> will be a no-op if no other operations are in flight.
>>
>> It is clear that if blk_drain() is a no-op, the value of
>> s->bus->dma->aiocb will not change between checking it in the `if`
>> condition and asserting that it is NULL after blk_drain().
>>
>> To settle the DMA operation, we will thus need to explicitly invoke
>> aio_poll() ourselves, which will run any outstanding BHs (like
>> ide_trim_bh_cb()), until s->bus->dma->aiocb is NULL.  To stop this from
>> being an infinite loop, assert that we made progress with every
>> aio_poll() call (i.e., invoked some BH).
>>
>> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2029980
>> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
> 
> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
> 
>> ---
>> Perhaps for a lack of being aware of all the kinds of tests we have, I
>> found it impossible to write a reproducer in any of our current test
>> frameworks: From how I understand the issue, to reproduce it, you need
>> to issue a TRIM request and immediately cancel it, before
>> ide_trim_bh_cb() (scheduled as a BH) can run.
>>
>> I wanted to do this via qtest, but that does not work, because every
>> port I/O operation is done via a qtest command, and QEMU will happily
>> poll the main context between each qtest command, which means that you
>> cannot cancel an ongoing IDE request before a BH scheduled by it is run.
>>
>> Therefore, I wrote an x86 boot sector that sets up a no-op TRIM request
>> (i.e. one where all TRIM ranges have length 0) and immediately cancels
>> it by setting SRST.  It is attached to the BZ linked above, and can be
>> tested as follows:
>>
>> $ TEST_BIN=test.bin
>> $ (sleep 1; echo 'info registers'; echo quit) \
>>      | ./qemu-system-x86_64 \
>>          -drive if=ide,file=$TEST_BIN,format=raw \
>>          -monitor stdio \
>>      | grep EIP= \
>>      | sed -e 's/ EFL.*//'
>>
>> The result should be:
>> EIP=00007c72
>>
>> Not:
>> qemu-system-x86_64: ../hw/ide/core.c:734: ide_cancel_dma_sync: Assertion
>> `s->bus->dma->aiocb == NULL' failed.
>> ---
>>   hw/ide/core.c | 12 +++++++++++-
>>   1 file changed, 11 insertions(+), 1 deletion(-)
>>
>> diff --git a/hw/ide/core.c b/hw/ide/core.c
>> index e28f8aad61..c7f7a1016c 100644
>> --- a/hw/ide/core.c
>> +++ b/hw/ide/core.c
>> @@ -731,7 +731,17 @@ void ide_cancel_dma_sync(IDEState *s)
>>       if (s->bus->dma->aiocb) {
>>           trace_ide_cancel_dma_sync_remaining();
>>           blk_drain(s->blk);
>> -        assert(s->bus->dma->aiocb == NULL);
>> +
>> +        /*
>> +         * Wait for potentially still-scheduled BHs, like ide_trim_bh_cb()
>> +         * (blk_drain() will only poll if there are in-flight requests on the
>> +         * BlockBackend, which there may not necessarily be, e.g. when the
>> +         * guest has issued a zero-length TRIM request)
>> +         */
>> +        while (s->bus->dma->aiocb) {
>> +            bool progress = aio_poll(qemu_get_aio_context(), true);
>> +            assert(progress);
>> +        }
>>       }
>>   }
>
Paolo Bonzini Jan. 19, 2022, 11:11 a.m. UTC | #4
On 1/5/22 12:13, Hanna Reitz wrote:
> -        assert(s->bus->dma->aiocb == NULL);
> +
> +        /*
> +         * Wait for potentially still-scheduled BHs, like ide_trim_bh_cb()
> +         * (blk_drain() will only poll if there are in-flight requests on the
> +         * BlockBackend, which there may not necessarily be, e.g. when the
> +         * guest has issued a zero-length TRIM request)
> +         */
> +        while (s->bus->dma->aiocb) {
> +            bool progress = aio_poll(qemu_get_aio_context(), true);
> +            assert(progress);
> +        }


I think the right way to do this is to do  blk_inc_in_flight before 
scheduling the bottom half and blk_dec_in_flight in the BH callback. 
See virtio_blk_dma_restart_cb for an example.

Paolo
Hanna Czenczek Jan. 19, 2022, 11:29 a.m. UTC | #5
On 19.01.22 12:11, Paolo Bonzini wrote:
> On 1/5/22 12:13, Hanna Reitz wrote:
>> - assert(s->bus->dma->aiocb == NULL);
>> +
>> +        /*
>> +         * Wait for potentially still-scheduled BHs, like 
>> ide_trim_bh_cb()
>> +         * (blk_drain() will only poll if there are in-flight 
>> requests on the
>> +         * BlockBackend, which there may not necessarily be, e.g. 
>> when the
>> +         * guest has issued a zero-length TRIM request)
>> +         */
>> +        while (s->bus->dma->aiocb) {
>> +            bool progress = aio_poll(qemu_get_aio_context(), true);
>> +            assert(progress);
>> +        }
>
>
> I think the right way to do this is to do  blk_inc_in_flight before 
> scheduling the bottom half and blk_dec_in_flight in the BH callback. 
> See virtio_blk_dma_restart_cb for an example.

Oh, yes, that sounds better.  Thanks!

Hanna
diff mbox series

Patch

diff --git a/hw/ide/core.c b/hw/ide/core.c
index e28f8aad61..c7f7a1016c 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -731,7 +731,17 @@  void ide_cancel_dma_sync(IDEState *s)
     if (s->bus->dma->aiocb) {
         trace_ide_cancel_dma_sync_remaining();
         blk_drain(s->blk);
-        assert(s->bus->dma->aiocb == NULL);
+
+        /*
+         * Wait for potentially still-scheduled BHs, like ide_trim_bh_cb()
+         * (blk_drain() will only poll if there are in-flight requests on the
+         * BlockBackend, which there may not necessarily be, e.g. when the
+         * guest has issued a zero-length TRIM request)
+         */
+        while (s->bus->dma->aiocb) {
+            bool progress = aio_poll(qemu_get_aio_context(), true);
+            assert(progress);
+        }
     }
 }