diff mbox series

[v2,for-6.2] file-posix: Fix alignment after reopen changing O_DIRECT

Message ID 20211116101431.105252-1-hreitz@redhat.com
State New
Headers show
Series [v2,for-6.2] file-posix: Fix alignment after reopen changing O_DIRECT | expand

Commit Message

Hanna Czenczek Nov. 16, 2021, 10:14 a.m. UTC
From: Kevin Wolf <kwolf@redhat.com>

At the end of a reopen, we already call bdrv_refresh_limits(), which
should update bs->request_alignment according to the new file
descriptor. However, raw_probe_alignment() relies on s->needs_alignment
and just uses 1 if it isn't set. We neglected to update this field, so
starting with cache=writeback and then reopening with cache=none means
that we get an incorrect bs->request_alignment == 1 and unaligned
requests fail instead of being automatically aligned.

Fix this by recalculating s->needs_alignment in raw_refresh_limits()
before calling raw_probe_alignment().

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20211104113109.56336-1-kwolf@redhat.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20211115145409.176785-13-kwolf@redhat.com>
[hreitz: Fix iotest 142 for block sizes greater than 512 by operating on
         a file with a size of 1 MB]
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
---
v1:
https://lists.nongnu.org/archive/html/qemu-block/2021-11/msg00129.html

Though Kevin modified this patch according to Stefano's suggestion
(letting raw_needs_alignment() return a bool instead of an int) in his
block branch, and so the version in the pull request was slighly
different:
https://lists.nongnu.org/archive/html/qemu-block/2021-11/msg00356.html

That is also the reason why there are already two Message-Id tags in
this patch.

(Kevin is on PTO, which is why I'm taking over this patch.)

v2:
Don't continue operating on a qcow2 file with an effectively random
size, but create a new 1 MB file to run our O_DIRECT reopen read tests
on.  Otherwise, we'll get into permission trouble because qemu-io does
not take the RESIZE permission by default, which it would need to
auto-align the file size to the sector size.

Note that the qcow2 file is not even aligned to 512 byte sectors before
the test case (its size is 196616), but the error message doesn't appear
then because qemu internally calculates file sizes in multiples of 512
bytes (BDRV_SECTOR_SIZE), so a misalignment can only become visible when
the physical sector size exceeds 512 bytes.
(That's "OK" because qemu only counts sizes in multiples of 512, so any
resize below that granularity is not visible as a resize to qemu, and so
does not need the RESIZE permission.)

Another way we could solve this problem is by having qemu-io take the
RESIZE permission, but I believe it would need to retain it not only
through the reopen, but until the image size is aligned to the sector
size; that is, we would need to resize the image anyway to be able to
drop the permission and not keep it constantly.  So it's simpler to just
create an aligned image from the start.
---
 block/file-posix.c         | 20 ++++++++++++++++----
 tests/qemu-iotests/142     | 29 +++++++++++++++++++++++++++++
 tests/qemu-iotests/142.out | 18 ++++++++++++++++++
 3 files changed, 63 insertions(+), 4 deletions(-)

Comments

Eric Blake Nov. 16, 2021, 2:30 p.m. UTC | #1
On Tue, Nov 16, 2021 at 11:14:31AM +0100, Hanna Reitz wrote:
> From: Kevin Wolf <kwolf@redhat.com>
> 
> At the end of a reopen, we already call bdrv_refresh_limits(), which
> should update bs->request_alignment according to the new file
> descriptor. However, raw_probe_alignment() relies on s->needs_alignment
> and just uses 1 if it isn't set. We neglected to update this field, so
> starting with cache=writeback and then reopening with cache=none means
> that we get an incorrect bs->request_alignment == 1 and unaligned
> requests fail instead of being automatically aligned.
> 

> v2:
> Don't continue operating on a qcow2 file with an effectively random
> size, but create a new 1 MB file to run our O_DIRECT reopen read tests
> on.  Otherwise, we'll get into permission trouble because qemu-io does
> not take the RESIZE permission by default, which it would need to
> auto-align the file size to the sector size.
> 
> Note that the qcow2 file is not even aligned to 512 byte sectors before
> the test case (its size is 196616), but the error message doesn't appear
> then because qemu internally calculates file sizes in multiples of 512
> bytes (BDRV_SECTOR_SIZE), so a misalignment can only become visible when
> the physical sector size exceeds 512 bytes.
> (That's "OK" because qemu only counts sizes in multiples of 512, so any
> resize below that granularity is not visible as a resize to qemu, and so
> does not need the RESIZE permission.)
> 
> Another way we could solve this problem is by having qemu-io take the
> RESIZE permission, but I believe it would need to retain it not only
> through the reopen, but until the image size is aligned to the sector
> size; that is, we would need to resize the image anyway to be able to
> drop the permission and not keep it constantly.  So it's simpler to just
> create an aligned image from the start.

I concur that your patch is simpler, and since we are in rc phase,
simpler is better.

> +++ b/tests/qemu-iotests/142
> @@ -350,6 +350,35 @@ info block backing-file"
>  
>  echo "$hmp_cmds" | run_qemu -drive "$files","$ids" | grep "Cache"
>  
> +echo
> +echo "--- Alignment after changing O_DIRECT ---"
> +echo
> +
> +# Directly test the protocol level: Can unaligned requests succeed even if
> +# O_DIRECT was only enabled through a reopen and vice versa?
> +
> +# Ensure image size is a multiple of the sector size (required for O_DIRECT)
> +$QEMU_IMG create -f file "$TEST_IMG" 1M | _filter_img_create
> +
> +# And write some data (not strictly necessary, but it feels better to actually
> +# have something to be read)
> +$QEMU_IO -f file -c 'write 0 4096' "$TEST_IMG" | _filter_qemu_io

Looks nice.

> +
> +$QEMU_IO --cache=writeback -f file $TEST_IMG <<EOF | _filter_qemu_io
> +read 42 42
> +reopen -o cache.direct=on
> +read 42 42

This particular region of the disk is a sub-sector, but completely
contained within one sector.  Is it also worth testing the case of an
unaligned read that crosses a 4096-byte boundary?  But we could do
that on top, your patch is already an improvement and catches the
particular problem you set out to solve.

Reviewed-by: Eric Blake <eblake@redhat.com>
diff mbox series

Patch

diff --git a/block/file-posix.c b/block/file-posix.c
index 7a27c83060..b283093e5b 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -167,6 +167,7 @@  typedef struct BDRVRawState {
     int page_cache_inconsistent; /* errno from fdatasync failure */
     bool has_fallocate;
     bool needs_alignment;
+    bool force_alignment;
     bool drop_cache;
     bool check_cache_dropped;
     struct {
@@ -351,6 +352,17 @@  static bool dio_byte_aligned(int fd)
     return false;
 }
 
+static bool raw_needs_alignment(BlockDriverState *bs)
+{
+    BDRVRawState *s = bs->opaque;
+
+    if ((bs->open_flags & BDRV_O_NOCACHE) != 0 && !dio_byte_aligned(s->fd)) {
+        return true;
+    }
+
+    return s->force_alignment;
+}
+
 /* Check if read is allowed with given memory buffer and length.
  *
  * This function is used to check O_DIRECT memory buffer and request alignment.
@@ -728,9 +740,6 @@  static int raw_open_common(BlockDriverState *bs, QDict *options,
 
     s->has_discard = true;
     s->has_write_zeroes = true;
-    if ((bs->open_flags & BDRV_O_NOCACHE) != 0 && !dio_byte_aligned(s->fd)) {
-        s->needs_alignment = true;
-    }
 
     if (fstat(s->fd, &st) < 0) {
         ret = -errno;
@@ -784,9 +793,10 @@  static int raw_open_common(BlockDriverState *bs, QDict *options,
          * so QEMU makes sure all IO operations on the device are aligned
          * to sector size, or else FreeBSD will reject them with EINVAL.
          */
-        s->needs_alignment = true;
+        s->force_alignment = true;
     }
 #endif
+    s->needs_alignment = raw_needs_alignment(bs);
 
 #ifdef CONFIG_XFS
     if (platform_test_xfs_fd(s->fd)) {
@@ -1251,7 +1261,9 @@  static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
     BDRVRawState *s = bs->opaque;
     struct stat st;
 
+    s->needs_alignment = raw_needs_alignment(bs);
     raw_probe_alignment(bs, s->fd, errp);
+
     bs->bl.min_mem_alignment = s->buf_align;
     bs->bl.opt_mem_alignment = MAX(s->buf_align, qemu_real_host_page_size);
 
diff --git a/tests/qemu-iotests/142 b/tests/qemu-iotests/142
index 69fd10ef51..86d65a2d1a 100755
--- a/tests/qemu-iotests/142
+++ b/tests/qemu-iotests/142
@@ -350,6 +350,35 @@  info block backing-file"
 
 echo "$hmp_cmds" | run_qemu -drive "$files","$ids" | grep "Cache"
 
+echo
+echo "--- Alignment after changing O_DIRECT ---"
+echo
+
+# Directly test the protocol level: Can unaligned requests succeed even if
+# O_DIRECT was only enabled through a reopen and vice versa?
+
+# Ensure image size is a multiple of the sector size (required for O_DIRECT)
+$QEMU_IMG create -f file "$TEST_IMG" 1M | _filter_img_create
+
+# And write some data (not strictly necessary, but it feels better to actually
+# have something to be read)
+$QEMU_IO -f file -c 'write 0 4096' "$TEST_IMG" | _filter_qemu_io
+
+$QEMU_IO --cache=writeback -f file $TEST_IMG <<EOF | _filter_qemu_io
+read 42 42
+reopen -o cache.direct=on
+read 42 42
+reopen -o cache.direct=off
+read 42 42
+EOF
+$QEMU_IO --cache=none -f file $TEST_IMG <<EOF | _filter_qemu_io
+read 42 42
+reopen -o cache.direct=off
+read 42 42
+reopen -o cache.direct=on
+read 42 42
+EOF
+
 # success, all done
 echo "*** done"
 rm -f $seq.full
diff --git a/tests/qemu-iotests/142.out b/tests/qemu-iotests/142.out
index a92b948edd..e20cfc8eb8 100644
--- a/tests/qemu-iotests/142.out
+++ b/tests/qemu-iotests/142.out
@@ -747,4 +747,22 @@  cache.no-flush=on on backing-file
     Cache mode:       writeback
     Cache mode:       writeback, direct
     Cache mode:       writeback, ignore flushes
+
+--- Alignment after changing O_DIRECT ---
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=file size=1048576
+wrote 4096/4096 bytes at offset 0
+4 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 42/42 bytes at offset 42
+42 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 42/42 bytes at offset 42
+42 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 42/42 bytes at offset 42
+42 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 42/42 bytes at offset 42
+42 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 42/42 bytes at offset 42
+42 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 42/42 bytes at offset 42
+42 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 *** done