diff mbox series

[v2] block/reqlist: allow adding overlapping requests

Message ID 20240712140716.517911-1-f.ebner@proxmox.com
State New
Headers show
Series [v2] block/reqlist: allow adding overlapping requests | expand

Commit Message

Fiona Ebner July 12, 2024, 2:07 p.m. UTC
Allow overlapping request by removing the assert that made it
impossible. There are only two callers:

1. block_copy_task_create()

It already asserts the very same condition before calling
reqlist_init_req().

2. cbw_snapshot_read_lock()

There is no need to have read requests be non-overlapping in
copy-before-write when used for snapshot-access. In fact, there was no
protection against two callers of cbw_snapshot_read_lock() calling
reqlist_init_req() with overlapping ranges and this could lead to an
assertion failure [1].

In particular, with the reproducer script below [0], two
cbw_co_snapshot_block_status() callers could race, with the second
calling reqlist_init_req() before the first one finishes and removes
its conflicting request.

[0]:

> #!/bin/bash -e
> dd if=/dev/urandom of=/tmp/disk.raw bs=1M count=1024
> ./qemu-img create /tmp/fleecing.raw -f raw 1G
> (
> ./qemu-system-x86_64 --qmp stdio \
> --blockdev raw,node-name=node0,file.driver=file,file.filename=/tmp/disk.raw \
> --blockdev raw,node-name=node1,file.driver=file,file.filename=/tmp/fleecing.raw \
> <<EOF
> {"execute": "qmp_capabilities"}
> {"execute": "blockdev-add", "arguments": { "driver": "copy-before-write", "file": "node0", "target": "node1", "node-name": "node3" } }
> {"execute": "blockdev-add", "arguments": { "driver": "snapshot-access", "file": "node3", "node-name": "snap0" } }
> {"execute": "nbd-server-start", "arguments": {"addr": { "type": "unix", "data": { "path": "/tmp/nbd.socket" } } } }
> {"execute": "block-export-add", "arguments": {"id": "exp0", "node-name": "snap0", "type": "nbd", "name": "exp0"}}
> EOF
> ) &
> sleep 5
> while true; do
> ./qemu-nbd -d /dev/nbd0
> ./qemu-nbd -c /dev/nbd0 nbd:unix:/tmp/nbd.socket:exportname=exp0 -f raw -r
> nbdinfo --map 'nbd+unix:///exp0?socket=/tmp/nbd.socket'
> done

[1]:

> #5  0x000071e5f0088eb2 in __GI___assert_fail (...) at ./assert/assert.c:101
> #6  0x0000615285438017 in reqlist_init_req (...) at ../block/reqlist.c:23
> #7  0x00006152853e2d98 in cbw_snapshot_read_lock (...) at ../block/copy-before-write.c:237
> #8  0x00006152853e3068 in cbw_co_snapshot_block_status (...) at ../block/copy-before-write.c:304
> #9  0x00006152853f4d22 in bdrv_co_snapshot_block_status (...) at ../block/io.c:3726
> #10 0x000061528543a63e in snapshot_access_co_block_status (...) at ../block/snapshot-access.c:48
> #11 0x00006152853f1a0a in bdrv_co_do_block_status (...) at ../block/io.c:2474
> #12 0x00006152853f2016 in bdrv_co_common_block_status_above (...) at ../block/io.c:2652
> #13 0x00006152853f22cf in bdrv_co_block_status_above (...) at ../block/io.c:2732
> #14 0x00006152853d9a86 in blk_co_block_status_above (...) at ../block/block-backend.c:1473
> #15 0x000061528538da6c in blockstatus_to_extents (...) at ../nbd/server.c:2374
> #16 0x000061528538deb1 in nbd_co_send_block_status (...) at ../nbd/server.c:2481
> #17 0x000061528538f424 in nbd_handle_request (...) at ../nbd/server.c:2978
> #18 0x000061528538f906 in nbd_trip (...) at ../nbd/server.c:3121
> #19 0x00006152855a7caf in coroutine_trampoline (...) at ../util/coroutine-ucontext.c:175

Cc: qemu-stable@nongnu.org
Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---

Changes in v2:
* different approach, allowing overlapping requests for
  copy-before-write rather than waiting for them. block-copy already
  asserts there are no conflicts before adding a request.

 block/copy-before-write.c | 3 ++-
 block/reqlist.c           | 2 --
 2 files changed, 2 insertions(+), 3 deletions(-)

Comments

Vladimir Sementsov-Ogievskiy July 12, 2024, 3:13 p.m. UTC | #1
On 12.07.24 17:07, Fiona Ebner wrote:
> Allow overlapping request by removing the assert that made it
> impossible. There are only two callers:
> 
> 1. block_copy_task_create()
> 
> It already asserts the very same condition before calling
> reqlist_init_req().
> 
> 2. cbw_snapshot_read_lock()
> 
> There is no need to have read requests be non-overlapping in
> copy-before-write when used for snapshot-access. In fact, there was no
> protection against two callers of cbw_snapshot_read_lock() calling
> reqlist_init_req() with overlapping ranges and this could lead to an
> assertion failure [1].
> 
> In particular, with the reproducer script below [0], two
> cbw_co_snapshot_block_status() callers could race, with the second
> calling reqlist_init_req() before the first one finishes and removes
> its conflicting request.
> 
> [0]:
> 
>> #!/bin/bash -e
>> dd if=/dev/urandom of=/tmp/disk.raw bs=1M count=1024
>> ./qemu-img create /tmp/fleecing.raw -f raw 1G
>> (
>> ./qemu-system-x86_64 --qmp stdio \
>> --blockdev raw,node-name=node0,file.driver=file,file.filename=/tmp/disk.raw \
>> --blockdev raw,node-name=node1,file.driver=file,file.filename=/tmp/fleecing.raw \
>> <<EOF
>> {"execute": "qmp_capabilities"}
>> {"execute": "blockdev-add", "arguments": { "driver": "copy-before-write", "file": "node0", "target": "node1", "node-name": "node3" } }
>> {"execute": "blockdev-add", "arguments": { "driver": "snapshot-access", "file": "node3", "node-name": "snap0" } }
>> {"execute": "nbd-server-start", "arguments": {"addr": { "type": "unix", "data": { "path": "/tmp/nbd.socket" } } } }
>> {"execute": "block-export-add", "arguments": {"id": "exp0", "node-name": "snap0", "type": "nbd", "name": "exp0"}}
>> EOF
>> ) &
>> sleep 5
>> while true; do
>> ./qemu-nbd -d /dev/nbd0
>> ./qemu-nbd -c /dev/nbd0 nbd:unix:/tmp/nbd.socket:exportname=exp0 -f raw -r
>> nbdinfo --map 'nbd+unix:///exp0?socket=/tmp/nbd.socket'
>> done
> 
> [1]:
> 
>> #5  0x000071e5f0088eb2 in __GI___assert_fail (...) at ./assert/assert.c:101
>> #6  0x0000615285438017 in reqlist_init_req (...) at ../block/reqlist.c:23
>> #7  0x00006152853e2d98 in cbw_snapshot_read_lock (...) at ../block/copy-before-write.c:237
>> #8  0x00006152853e3068 in cbw_co_snapshot_block_status (...) at ../block/copy-before-write.c:304
>> #9  0x00006152853f4d22 in bdrv_co_snapshot_block_status (...) at ../block/io.c:3726
>> #10 0x000061528543a63e in snapshot_access_co_block_status (...) at ../block/snapshot-access.c:48
>> #11 0x00006152853f1a0a in bdrv_co_do_block_status (...) at ../block/io.c:2474
>> #12 0x00006152853f2016 in bdrv_co_common_block_status_above (...) at ../block/io.c:2652
>> #13 0x00006152853f22cf in bdrv_co_block_status_above (...) at ../block/io.c:2732
>> #14 0x00006152853d9a86 in blk_co_block_status_above (...) at ../block/block-backend.c:1473
>> #15 0x000061528538da6c in blockstatus_to_extents (...) at ../nbd/server.c:2374
>> #16 0x000061528538deb1 in nbd_co_send_block_status (...) at ../nbd/server.c:2481
>> #17 0x000061528538f424 in nbd_handle_request (...) at ../nbd/server.c:2978
>> #18 0x000061528538f906 in nbd_trip (...) at ../nbd/server.c:3121
>> #19 0x00006152855a7caf in coroutine_trampoline (...) at ../util/coroutine-ucontext.c:175
> 
> Cc: qemu-stable@nongnu.org
> Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
> Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
> ---
> 
> Changes in v2:
> * different approach, allowing overlapping requests for
>    copy-before-write rather than waiting for them. block-copy already
>    asserts there are no conflicts before adding a request.
> 
>   block/copy-before-write.c | 3 ++-
>   block/reqlist.c           | 2 --
>   2 files changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/block/copy-before-write.c b/block/copy-before-write.c
> index 853e01a1eb..28f6a096cd 100644
> --- a/block/copy-before-write.c
> +++ b/block/copy-before-write.c
> @@ -66,7 +66,8 @@ typedef struct BDRVCopyBeforeWriteState {
>   
>       /*
>        * @frozen_read_reqs: current read requests for fleecing user in bs->file
> -     * node. These areas must not be rewritten by guest.
> +     * node. These areas must not be rewritten by guest. There can be multiple
> +     * overlapping read requests.
>        */
>       BlockReqList frozen_read_reqs;
>   
> diff --git a/block/reqlist.c b/block/reqlist.c
> index 08cb57cfa4..098e807378 100644
> --- a/block/reqlist.c
> +++ b/block/reqlist.c
> @@ -20,8 +20,6 @@
>   void reqlist_init_req(BlockReqList *reqs, BlockReq *req, int64_t offset,
>                         int64_t bytes)
>   {
> -    assert(!reqlist_find_conflict(reqs, offset, bytes));
> -
>       *req = (BlockReq) {
>           .offset = offset,
>           .bytes = bytes,


Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>

Thanks, applied to my block branch
Michael Tokarev Aug. 11, 2024, 5:55 p.m. UTC | #2
12.07.2024 17:07, Fiona Ebner wrote:
> Allow overlapping request by removing the assert that made it
> impossible. There are only two callers:
> 
> 1. block_copy_task_create()
> 
> It already asserts the very same condition before calling
> reqlist_init_req().
> 
> 2. cbw_snapshot_read_lock()
> 
> There is no need to have read requests be non-overlapping in
> copy-before-write when used for snapshot-access. In fact, there was no
> protection against two callers of cbw_snapshot_read_lock() calling
> reqlist_init_req() with overlapping ranges and this could lead to an
> assertion failure [1].
> 
> In particular, with the reproducer script below [0], two
> cbw_co_snapshot_block_status() callers could race, with the second
> calling reqlist_init_req() before the first one finishes and removes
> its conflicting request.
> 
> [0]:
> 
>> #!/bin/bash -e
>> dd if=/dev/urandom of=/tmp/disk.raw bs=1M count=1024
>> ./qemu-img create /tmp/fleecing.raw -f raw 1G
>> (
>> ./qemu-system-x86_64 --qmp stdio \
>> --blockdev raw,node-name=node0,file.driver=file,file.filename=/tmp/disk.raw \
>> --blockdev raw,node-name=node1,file.driver=file,file.filename=/tmp/fleecing.raw \
>> <<EOF
>> {"execute": "qmp_capabilities"}
>> {"execute": "blockdev-add", "arguments": { "driver": "copy-before-write", "file": "node0", "target": "node1", "node-name": "node3" } }
>> {"execute": "blockdev-add", "arguments": { "driver": "snapshot-access", "file": "node3", "node-name": "snap0" } }
>> {"execute": "nbd-server-start", "arguments": {"addr": { "type": "unix", "data": { "path": "/tmp/nbd.socket" } } } }
>> {"execute": "block-export-add", "arguments": {"id": "exp0", "node-name": "snap0", "type": "nbd", "name": "exp0"}}
>> EOF
>> ) &
>> sleep 5
>> while true; do
>> ./qemu-nbd -d /dev/nbd0
>> ./qemu-nbd -c /dev/nbd0 nbd:unix:/tmp/nbd.socket:exportname=exp0 -f raw -r
>> nbdinfo --map 'nbd+unix:///exp0?socket=/tmp/nbd.socket'
>> done
> 
> [1]:
> 
>> #5  0x000071e5f0088eb2 in __GI___assert_fail (...) at ./assert/assert.c:101
>> #6  0x0000615285438017 in reqlist_init_req (...) at ../block/reqlist.c:23
>> #7  0x00006152853e2d98 in cbw_snapshot_read_lock (...) at ../block/copy-before-write.c:237
>> #8  0x00006152853e3068 in cbw_co_snapshot_block_status (...) at ../block/copy-before-write.c:304
>> #9  0x00006152853f4d22 in bdrv_co_snapshot_block_status (...) at ../block/io.c:3726
>> #10 0x000061528543a63e in snapshot_access_co_block_status (...) at ../block/snapshot-access.c:48
>> #11 0x00006152853f1a0a in bdrv_co_do_block_status (...) at ../block/io.c:2474
>> #12 0x00006152853f2016 in bdrv_co_common_block_status_above (...) at ../block/io.c:2652
>> #13 0x00006152853f22cf in bdrv_co_block_status_above (...) at ../block/io.c:2732
>> #14 0x00006152853d9a86 in blk_co_block_status_above (...) at ../block/block-backend.c:1473
>> #15 0x000061528538da6c in blockstatus_to_extents (...) at ../nbd/server.c:2374
>> #16 0x000061528538deb1 in nbd_co_send_block_status (...) at ../nbd/server.c:2481
>> #17 0x000061528538f424 in nbd_handle_request (...) at ../nbd/server.c:2978
>> #18 0x000061528538f906 in nbd_trip (...) at ../nbd/server.c:3121
>> #19 0x00006152855a7caf in coroutine_trampoline (...) at ../util/coroutine-ucontext.c:175
> 
> Cc: qemu-stable@nongnu.org
> Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
> Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>

Hi!

Has this been forgotten or is it not needed for 9.1?

Thanks,

/mjt

> 
> Changes in v2:
> * different approach, allowing overlapping requests for
>    copy-before-write rather than waiting for them. block-copy already
>    asserts there are no conflicts before adding a request.
> 
>   block/copy-before-write.c | 3 ++-
>   block/reqlist.c           | 2 --
>   2 files changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/block/copy-before-write.c b/block/copy-before-write.c
> index 853e01a1eb..28f6a096cd 100644
> --- a/block/copy-before-write.c
> +++ b/block/copy-before-write.c
> @@ -66,7 +66,8 @@ typedef struct BDRVCopyBeforeWriteState {
>   
>       /*
>        * @frozen_read_reqs: current read requests for fleecing user in bs->file
> -     * node. These areas must not be rewritten by guest.
> +     * node. These areas must not be rewritten by guest. There can be multiple
> +     * overlapping read requests.
>        */
>       BlockReqList frozen_read_reqs;
>   
> diff --git a/block/reqlist.c b/block/reqlist.c
> index 08cb57cfa4..098e807378 100644
> --- a/block/reqlist.c
> +++ b/block/reqlist.c
> @@ -20,8 +20,6 @@
>   void reqlist_init_req(BlockReqList *reqs, BlockReq *req, int64_t offset,
>                         int64_t bytes)
>   {
> -    assert(!reqlist_find_conflict(reqs, offset, bytes));
> -
>       *req = (BlockReq) {
>           .offset = offset,
>           .bytes = bytes,
Vladimir Sementsov-Ogievskiy Aug. 28, 2024, 1:34 p.m. UTC | #3
On 11.08.24 20:55, Michael Tokarev wrote:
> 12.07.2024 17:07, Fiona Ebner wrote:
>> Allow overlapping request by removing the assert that made it
>> impossible. There are only two callers:
>>
>> 1. block_copy_task_create()
>>
>> It already asserts the very same condition before calling
>> reqlist_init_req().
>>
>> 2. cbw_snapshot_read_lock()
>>
>> There is no need to have read requests be non-overlapping in
>> copy-before-write when used for snapshot-access. In fact, there was no
>> protection against two callers of cbw_snapshot_read_lock() calling
>> reqlist_init_req() with overlapping ranges and this could lead to an
>> assertion failure [1].
>>
>> In particular, with the reproducer script below [0], two
>> cbw_co_snapshot_block_status() callers could race, with the second
>> calling reqlist_init_req() before the first one finishes and removes
>> its conflicting request.
>>
>> [0]:
>>
>>> #!/bin/bash -e
>>> dd if=/dev/urandom of=/tmp/disk.raw bs=1M count=1024
>>> ./qemu-img create /tmp/fleecing.raw -f raw 1G
>>> (
>>> ./qemu-system-x86_64 --qmp stdio \
>>> --blockdev raw,node-name=node0,file.driver=file,file.filename=/tmp/disk.raw \
>>> --blockdev raw,node-name=node1,file.driver=file,file.filename=/tmp/fleecing.raw \
>>> <<EOF
>>> {"execute": "qmp_capabilities"}
>>> {"execute": "blockdev-add", "arguments": { "driver": "copy-before-write", "file": "node0", "target": "node1", "node-name": "node3" } }
>>> {"execute": "blockdev-add", "arguments": { "driver": "snapshot-access", "file": "node3", "node-name": "snap0" } }
>>> {"execute": "nbd-server-start", "arguments": {"addr": { "type": "unix", "data": { "path": "/tmp/nbd.socket" } } } }
>>> {"execute": "block-export-add", "arguments": {"id": "exp0", "node-name": "snap0", "type": "nbd", "name": "exp0"}}
>>> EOF
>>> ) &
>>> sleep 5
>>> while true; do
>>> ./qemu-nbd -d /dev/nbd0
>>> ./qemu-nbd -c /dev/nbd0 nbd:unix:/tmp/nbd.socket:exportname=exp0 -f raw -r
>>> nbdinfo --map 'nbd+unix:///exp0?socket=/tmp/nbd.socket'
>>> done
>>
>> [1]:
>>
>>> #5  0x000071e5f0088eb2 in __GI___assert_fail (...) at ./assert/assert.c:101
>>> #6  0x0000615285438017 in reqlist_init_req (...) at ../block/reqlist.c:23
>>> #7  0x00006152853e2d98 in cbw_snapshot_read_lock (...) at ../block/copy-before-write.c:237
>>> #8  0x00006152853e3068 in cbw_co_snapshot_block_status (...) at ../block/copy-before-write.c:304
>>> #9  0x00006152853f4d22 in bdrv_co_snapshot_block_status (...) at ../block/io.c:3726
>>> #10 0x000061528543a63e in snapshot_access_co_block_status (...) at ../block/snapshot-access.c:48
>>> #11 0x00006152853f1a0a in bdrv_co_do_block_status (...) at ../block/io.c:2474
>>> #12 0x00006152853f2016 in bdrv_co_common_block_status_above (...) at ../block/io.c:2652
>>> #13 0x00006152853f22cf in bdrv_co_block_status_above (...) at ../block/io.c:2732
>>> #14 0x00006152853d9a86 in blk_co_block_status_above (...) at ../block/block-backend.c:1473
>>> #15 0x000061528538da6c in blockstatus_to_extents (...) at ../nbd/server.c:2374
>>> #16 0x000061528538deb1 in nbd_co_send_block_status (...) at ../nbd/server.c:2481
>>> #17 0x000061528538f424 in nbd_handle_request (...) at ../nbd/server.c:2978
>>> #18 0x000061528538f906 in nbd_trip (...) at ../nbd/server.c:3121
>>> #19 0x00006152855a7caf in coroutine_trampoline (...) at ../util/coroutine-ucontext.c:175
>>
>> Cc: qemu-stable@nongnu.org
>> Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
>> Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
> 
> Hi!
> 
> Has this been forgotten or is it not needed for 9.1?
> 

My apologies, this is forgotten. I think rc4 is too late, I'll send Pull request as soon as 9.2 window open.
diff mbox series

Patch

diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index 853e01a1eb..28f6a096cd 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -66,7 +66,8 @@  typedef struct BDRVCopyBeforeWriteState {
 
     /*
      * @frozen_read_reqs: current read requests for fleecing user in bs->file
-     * node. These areas must not be rewritten by guest.
+     * node. These areas must not be rewritten by guest. There can be multiple
+     * overlapping read requests.
      */
     BlockReqList frozen_read_reqs;
 
diff --git a/block/reqlist.c b/block/reqlist.c
index 08cb57cfa4..098e807378 100644
--- a/block/reqlist.c
+++ b/block/reqlist.c
@@ -20,8 +20,6 @@ 
 void reqlist_init_req(BlockReqList *reqs, BlockReq *req, int64_t offset,
                       int64_t bytes)
 {
-    assert(!reqlist_find_conflict(reqs, offset, bytes));
-
     *req = (BlockReq) {
         .offset = offset,
         .bytes = bytes,