Message ID | 20231020215622.789260-8-andrey.drobyshev@virtuozzo.com |
---|---|
State | New |
Headers | show |
Series | qcow2: make subclusters discardable | expand |
On 20.10.23 23:56, Andrey Drobyshev wrote: > Add _verify_du_delta() checker which is used to check that real disk > usage delta meets the expectations. For now we use it for checking that > subcluster-based discard/unmap operations lead to actual disk usage > decrease (i.e. PUNCH_HOLE operation is performed). I’m not too happy about checking the disk usage because that relies on the underlying filesystem actually accepting and executing the unmap. Why is it not enough to check the L2 bitmap? …Coming back later (I had to fix the missing `ret = ` I mentioned in patch 2, or this test would hang, so I couldn’t run it at first), I note that checking the disk usage in fact doesn’t work on tmpfs. I usually run the iotests in tmpfs, so that’s not great. > Also add separate test case for discarding particular subcluster within > one cluster. > > Signed-off-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com> > --- > tests/qemu-iotests/271 | 25 ++++++++++++++++++++++++- > tests/qemu-iotests/271.out | 2 ++ > 2 files changed, 26 insertions(+), 1 deletion(-) > > diff --git a/tests/qemu-iotests/271 b/tests/qemu-iotests/271 > index c7c2cadda0..5fcb209f5f 100755 > --- a/tests/qemu-iotests/271 > +++ b/tests/qemu-iotests/271 > @@ -81,6 +81,15 @@ _verify_l2_bitmap() > fi > } > > +# Check disk usage delta after a discard/unmap operation > +# _verify_du_delta $before $after $expected_delta > +_verify_du_delta() > +{ > + if [ $(($1 - $2)) -ne $3 ]; then > + printf "ERROR: unexpected delta: $1 - $2 = $(($1 - $2)) != $3\n" > + fi > +} > + > # This should be called as _run_test c=XXX sc=XXX off=XXX len=XXX cmd=XXX > # c: cluster number (0 if unset) > # sc: subcluster number inside cluster @c (0 if unset) > @@ -198,9 +207,12 @@ for use_backing_file in yes no; do > alloc="$(seq 0 31)"; zero="" > _run_test sc=0 len=64k > > - ### Zero and unmap half of cluster #0 (this won't unmap it) > + ### Zero and unmap half of cluster #0 (this will unmap it) I think “it” refers to the cluster, and it is not unmapped. This test case does not use a discard, but write -z instead, so it worked before. (The L2 bitmap shown in the output doesn’t change, so functionally, this patch series didn’t change this case.) > alloc="$(seq 16 31)"; zero="$(seq 0 15)" > + before=$(disk_usage "$TEST_IMG") > _run_test sc=0 len=32k cmd=unmap > + after=$(disk_usage "$TEST_IMG") > + _verify_du_delta $before $after 32768 > > ### Zero and unmap cluster #0 > alloc=""; zero="$(seq 0 31)" For this following case shown truncated here, why don’t we try “_run_test sc=16 len=32k cmd=unmap” instead of “sc=0 len=64k”? I.e. unmap only the second half, which, thanks to patch 3, should still unmap the whole cluster, because the first half is already unmapped. > @@ -447,7 +459,10 @@ for use_backing_file in yes no; do > > # Subcluster-aligned request from clusters #12 to #14 > alloc="$(seq 0 15)"; zero="$(seq 16 31)" > + before=$(disk_usage "$TEST_IMG") > _run_test c=12 sc=16 len=128k cmd=unmap > + after=$(disk_usage "$TEST_IMG") > + _verify_du_delta $before $after $((128 * 1024)) > alloc=""; zero="$(seq 0 31)" > _verify_l2_bitmap 13 > alloc="$(seq 16 31)"; zero="$(seq 0 15)" > @@ -528,6 +543,14 @@ for use_backing_file in yes no; do > else > _make_test_img -o extended_l2=on 1M > fi > + # Write cluster #0 and discard its subclusters #0-#3 > + $QEMU_IO -c 'write -q 0 64k' "$TEST_IMG" > + before=$(disk_usage "$TEST_IMG") > + $QEMU_IO -c 'discard -q 0 8k' "$TEST_IMG" > + after=$(disk_usage "$TEST_IMG") > + _verify_du_delta $before $after 8192 > + alloc="$(seq 4 31)"; zero="$(seq 0 3)" > + _verify_l2_bitmap 0 > # Write clusters #0-#2 and then discard them > $QEMU_IO -c 'write -q 0 128k' "$TEST_IMG" > $QEMU_IO -c 'discard -q 0 128k' "$TEST_IMG" Similarly to above, I think it would be good if we combined this following case with the one you added, i.e. to write 128k from the beginning, drop the write here, and change the discard to be “discard -q 8k 120k”, i.e. skip the subclusters we have already discarded, to see that this is still combined to discard the whole first cluster. ...Ah, see, and when I try this, the following assertion fails: qemu-io: ../block/qcow2-cache.c:156: qcow2_cache_destroy: Assertion `c->entries[i].ref == 0' failed. ./common.rc: line 220: 128894 Aborted (core dumped) ( VALGRIND_QEMU="${VALGRIND_QEMU_IO}" _qemu_proc_exec "${VALGRIND_LOGFILE}" "$QEMU_IO_PROG" $QEMU_IO_ARGS "$@" ) Looks like an L2 table is leaked somewhere. That’s why SCRI should be a g_auto()-able type. Hanna > diff --git a/tests/qemu-iotests/271.out b/tests/qemu-iotests/271.out > index 5be780de76..0da8d72cde 100644 > --- a/tests/qemu-iotests/271.out > +++ b/tests/qemu-iotests/271.out > @@ -426,6 +426,7 @@ L2 entry #29: 0x0000000000000000 0000ffff00000000 > ### Discarding clusters with non-zero bitmaps (backing file: yes) ### > > Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576 backing_file=TEST_DIR/t.IMGFMT.base backing_fmt=raw > +L2 entry #0: 0x8000000000050000 0000000ffffffff0 > L2 entry #0: 0x0000000000000000 ffffffff00000000 > L2 entry #1: 0x0000000000000000 ffffffff00000000 > Image resized. > @@ -436,6 +437,7 @@ L2 entry #1: 0x0000000000000000 ffffffff00000000 > ### Discarding clusters with non-zero bitmaps (backing file: no) ### > > Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576 > +L2 entry #0: 0x8000000000050000 0000000ffffffff0 > L2 entry #0: 0x0000000000000000 ffffffff00000000 > L2 entry #1: 0x0000000000000000 ffffffff00000000 > Image resized.
On 03.11.23 16:51, Hanna Czenczek wrote: > On 20.10.23 23:56, Andrey Drobyshev wrote: [...] >> @@ -528,6 +543,14 @@ for use_backing_file in yes no; do >> else >> _make_test_img -o extended_l2=on 1M >> fi >> + # Write cluster #0 and discard its subclusters #0-#3 >> + $QEMU_IO -c 'write -q 0 64k' "$TEST_IMG" >> + before=$(disk_usage "$TEST_IMG") >> + $QEMU_IO -c 'discard -q 0 8k' "$TEST_IMG" >> + after=$(disk_usage "$TEST_IMG") >> + _verify_du_delta $before $after 8192 >> + alloc="$(seq 4 31)"; zero="$(seq 0 3)" >> + _verify_l2_bitmap 0 >> # Write clusters #0-#2 and then discard them >> $QEMU_IO -c 'write -q 0 128k' "$TEST_IMG" >> $QEMU_IO -c 'discard -q 0 128k' "$TEST_IMG" > > Similarly to above, I think it would be good if we combined this > following case with the one you added, i.e. to write 128k from the > beginning, drop the write here, and change the discard to be “discard > -q 8k 120k”, i.e. skip the subclusters we have already discarded, to > see that this is still combined to discard the whole first cluster. > > ...Ah, see, and when I try this, the following assertion fails: > > qemu-io: ../block/qcow2-cache.c:156: qcow2_cache_destroy: Assertion > `c->entries[i].ref == 0' failed. > ./common.rc: line 220: 128894 Aborted (core dumped) ( > VALGRIND_QEMU="${VALGRIND_QEMU_IO}" _qemu_proc_exec > "${VALGRIND_LOGFILE}" "$QEMU_IO_PROG" $QEMU_IO_ARGS "$@" ) > > Looks like an L2 table is leaked somewhere. That’s why SCRI should be > a g_auto()-able type. Forgot to add: This single test case here is the only place where we test the added functionality. I think there should be more cases. It doesn’t really make sense now that 271 has so many cases for writing zeroes, but so few for discarding, now that discarding works on subclusters. Most of them should at least be considered whether we can run them for discard as well. I didn’t want to push for such an extensive set of tests, but, well, now it turned out I overlooked a bug in patch 4, and only found it because I thought “this place might also make a nice test case for this series”. Hanna
On 11/3/23 17:51, Hanna Czenczek wrote: > On 20.10.23 23:56, Andrey Drobyshev wrote: >> Add _verify_du_delta() checker which is used to check that real disk >> usage delta meets the expectations. For now we use it for checking that >> subcluster-based discard/unmap operations lead to actual disk usage >> decrease (i.e. PUNCH_HOLE operation is performed). > > I’m not too happy about checking the disk usage because that relies on > the underlying filesystem actually accepting and executing the unmap. > Why is it not enough to check the L2 bitmap? > > …Coming back later (I had to fix the missing `ret = ` I mentioned in > patch 2, or this test would hang, so I couldn’t run it at first), I note > that checking the disk usage in fact doesn’t work on tmpfs. I usually > run the iotests in tmpfs, so that’s not great. > My original idea was to make sure that the PUNCH_HOLE operation did indeed take place, i.e. there was an actual discard. For instance, currently the discard operation initiated by qemu-io is called with the QCOW2_DISCARD_REQUEST discard type, but if some other type is passed by mistake, qcow2_queue_discard() won't be called, and though the subclusters will be marked unallocated in L2 the data will still be there. Not quite what we expect from discard operation. BTW checking the disk usage on tmpfs works on my machine: > # cd /tmp; df -Th /tmp > Filesystem Type Size Used Avail Use% Mounted on > tmpfs tmpfs 32G 2.5M 32G 1% /tmp > # BUILD=/root/src/qemu/master/build > # $BUILD/qemu-img create -f qcow2 -o extended_l2=on img.qcow2 1M > Formatting 'img.qcow2', fmt=qcow2 cluster_size=65536 extended_l2=on compression_type=zlib size=1048576 lazy_refcounts=off refcount_bits=16 > # $BUILD/qemu-io -c 'write -q 0 128k' img.qcow2 > # du --block-size=1 img.qcow2 > 397312 img.qcow2 > # $BUILD/qemu-io -f qcow2 -c 'discard -q 0 8k' img.qcow2 > # du --block-size=1 img.qcow2 > 389120 img.qcow2 > # $BUILD/qemu-io -f qcow2 -c 'discard -q 8k 120k' img.qcow2 > # du --block-size=1 img.qcow2 > 266240 img.qcow2 I'm wondering what this might depend on and can't we overcome this? >> Also add separate test case for discarding particular subcluster within >> one cluster. >> >> Signed-off-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com> >> --- >> tests/qemu-iotests/271 | 25 ++++++++++++++++++++++++- >> tests/qemu-iotests/271.out | 2 ++ >> 2 files changed, 26 insertions(+), 1 deletion(-) >> >> diff --git a/tests/qemu-iotests/271 b/tests/qemu-iotests/271 >> index c7c2cadda0..5fcb209f5f 100755 >> --- a/tests/qemu-iotests/271 >> +++ b/tests/qemu-iotests/271 >> @@ -81,6 +81,15 @@ _verify_l2_bitmap() >> fi >> } >> +# Check disk usage delta after a discard/unmap operation >> +# _verify_du_delta $before $after $expected_delta >> +_verify_du_delta() >> +{ >> + if [ $(($1 - $2)) -ne $3 ]; then >> + printf "ERROR: unexpected delta: $1 - $2 = $(($1 - $2)) != $3\n" >> + fi >> +} >> + >> # This should be called as _run_test c=XXX sc=XXX off=XXX len=XXX >> cmd=XXX >> # c: cluster number (0 if unset) >> # sc: subcluster number inside cluster @c (0 if unset) >> @@ -198,9 +207,12 @@ for use_backing_file in yes no; do >> alloc="$(seq 0 31)"; zero="" >> _run_test sc=0 len=64k >> - ### Zero and unmap half of cluster #0 (this won't unmap it) >> + ### Zero and unmap half of cluster #0 (this will unmap it) > > I think “it” refers to the cluster, and it is not unmapped. This test > case does not use a discard, but write -z instead, so it worked before. > (The L2 bitmap shown in the output doesn’t change, so functionally, this > patch series didn’t change this case.) > From the _run_test() implementation: > # cmd: the command to pass to qemu-io, must be one of > # write -> write > # zero -> write -z > # unmap -> write -z -u <------------- > # compress -> write -c > # discard -> discard > _run_test() So it actually uses 'write -z -u', and we end up with an actual unmap. I agree that the l2 bitmap doesn't change, that's why I specifically added disk usage check to catch the changed functionality. >> alloc="$(seq 16 31)"; zero="$(seq 0 15)" >> + before=$(disk_usage "$TEST_IMG") >> _run_test sc=0 len=32k cmd=unmap >> + after=$(disk_usage "$TEST_IMG") >> + _verify_du_delta $before $after 32768 >> ### Zero and unmap cluster #0 >> alloc=""; zero="$(seq 0 31)" > > For this following case shown truncated here, why don’t we try > “_run_test sc=16 len=32k cmd=unmap” instead of “sc=0 len=64k”? I.e. > unmap only the second half, which, thanks to patch 3, should still unmap > the whole cluster, because the first half is already unmapped. > Agreed. And the interesting part is that here we'd be calling 'write -u -z', thus following the zero_l2_subclusters() -> discard_l2_subclusters() -> discard_in_l2_slice() path... >> @@ -447,7 +459,10 @@ for use_backing_file in yes no; do >> # Subcluster-aligned request from clusters #12 to #14 >> alloc="$(seq 0 15)"; zero="$(seq 16 31)" >> + before=$(disk_usage "$TEST_IMG") >> _run_test c=12 sc=16 len=128k cmd=unmap >> + after=$(disk_usage "$TEST_IMG") >> + _verify_du_delta $before $after $((128 * 1024)) >> alloc=""; zero="$(seq 0 31)" >> _verify_l2_bitmap 13 >> alloc="$(seq 16 31)"; zero="$(seq 0 15)" >> @@ -528,6 +543,14 @@ for use_backing_file in yes no; do >> else >> _make_test_img -o extended_l2=on 1M >> fi >> + # Write cluster #0 and discard its subclusters #0-#3 >> + $QEMU_IO -c 'write -q 0 64k' "$TEST_IMG" >> + before=$(disk_usage "$TEST_IMG") >> + $QEMU_IO -c 'discard -q 0 8k' "$TEST_IMG" >> + after=$(disk_usage "$TEST_IMG") >> + _verify_du_delta $before $after 8192 >> + alloc="$(seq 4 31)"; zero="$(seq 0 3)" >> + _verify_l2_bitmap 0 >> # Write clusters #0-#2 and then discard them >> $QEMU_IO -c 'write -q 0 128k' "$TEST_IMG" >> $QEMU_IO -c 'discard -q 0 128k' "$TEST_IMG" > > Similarly to above, I think it would be good if we combined this > following case with the one you added, i.e. to write 128k from the > beginning, drop the write here, and change the discard to be “discard -q > 8k 120k”, i.e. skip the subclusters we have already discarded, to see > that this is still combined to discard the whole first cluster. > ...and here we'd be calling discard directly, thus following the discard_l2_subclusters() -> discard_in_l2_slice() path. > ...Ah, see, and when I try this, the following assertion fails: > > qemu-io: ../block/qcow2-cache.c:156: qcow2_cache_destroy: Assertion > `c->entries[i].ref == 0' failed. > ./common.rc: line 220: 128894 Aborted (core dumped) ( > VALGRIND_QEMU="${VALGRIND_QEMU_IO}" _qemu_proc_exec > "${VALGRIND_LOGFILE}" "$QEMU_IO_PROG" $QEMU_IO_ARGS "$@" ) > Yes, I should've added qcow2_cache_put() when going discard_l2_subclusters() -> discard_in_l2_slice(), same as I did on the zero_l2_subclusters() -> zero_in_l2_slice() path. Thanks for catching. > Looks like an L2 table is leaked somewhere. That’s why SCRI should be a > g_auto()-able type. > In this case this indeed makes sense, since when we extend the operation from the subclusters range to the entire cluster, SCRI is no longer needed. The only question with this approach is the zero_l2_subclusters() -> discard_l2_subclusters() path. > Hanna > > [...]
On 11/3/23 17:59, Hanna Czenczek wrote: > On 03.11.23 16:51, Hanna Czenczek wrote: >> On 20.10.23 23:56, Andrey Drobyshev wrote: > > [...] > >>> @@ -528,6 +543,14 @@ for use_backing_file in yes no; do >>> else >>> _make_test_img -o extended_l2=on 1M >>> fi >>> + # Write cluster #0 and discard its subclusters #0-#3 >>> + $QEMU_IO -c 'write -q 0 64k' "$TEST_IMG" >>> + before=$(disk_usage "$TEST_IMG") >>> + $QEMU_IO -c 'discard -q 0 8k' "$TEST_IMG" >>> + after=$(disk_usage "$TEST_IMG") >>> + _verify_du_delta $before $after 8192 >>> + alloc="$(seq 4 31)"; zero="$(seq 0 3)" >>> + _verify_l2_bitmap 0 >>> # Write clusters #0-#2 and then discard them >>> $QEMU_IO -c 'write -q 0 128k' "$TEST_IMG" >>> $QEMU_IO -c 'discard -q 0 128k' "$TEST_IMG" >> >> Similarly to above, I think it would be good if we combined this >> following case with the one you added, i.e. to write 128k from the >> beginning, drop the write here, and change the discard to be “discard >> -q 8k 120k”, i.e. skip the subclusters we have already discarded, to >> see that this is still combined to discard the whole first cluster. >> >> ...Ah, see, and when I try this, the following assertion fails: >> >> qemu-io: ../block/qcow2-cache.c:156: qcow2_cache_destroy: Assertion >> `c->entries[i].ref == 0' failed. >> ./common.rc: line 220: 128894 Aborted (core dumped) ( >> VALGRIND_QEMU="${VALGRIND_QEMU_IO}" _qemu_proc_exec >> "${VALGRIND_LOGFILE}" "$QEMU_IO_PROG" $QEMU_IO_ARGS "$@" ) >> >> Looks like an L2 table is leaked somewhere. That’s why SCRI should be >> a g_auto()-able type. > > Forgot to add: This single test case here is the only place where we > test the added functionality. I think there should be more cases. It > doesn’t really make sense now that 271 has so many cases for writing > zeroes, but so few for discarding, now that discarding works on > subclusters. Most of them should at least be considered whether we can > run them for discard as well. > > I didn’t want to push for such an extensive set of tests, but, well, now > it turned out I overlooked a bug in patch 4, and only found it because I > thought “this place might also make a nice test case for this series”. > > Hanna > I agree that more coverage should be added. Based on the previous email, I see the following cases: 1. Direct 'discard' on the subclusters range (discard_l2_subclusters()). 2. Direct 'discard' on the subclusters range, complementary to an unallocated range (i.e. discard_l2_subclusters() -> discard_in_l2_slice()). 3. 'write -u -z' on the subclusters range (zero_l2_subclusters() -> discard_l2_subclusters()). 4. 'write -u -z' on the subclusters range, complementary to an unallocated range (zero_l2_subclusters() -> discard_l2_subclusters() -> discard_in_l2_slice()). Would also be nice to test the zero_l2_subclusters() -> zero_in_l2_slice() path, but we'd have to somehow check the refcount table for that since the L2 bitmap doesn't change. Please let me know if you can think of anything else.
diff --git a/tests/qemu-iotests/271 b/tests/qemu-iotests/271 index c7c2cadda0..5fcb209f5f 100755 --- a/tests/qemu-iotests/271 +++ b/tests/qemu-iotests/271 @@ -81,6 +81,15 @@ _verify_l2_bitmap() fi } +# Check disk usage delta after a discard/unmap operation +# _verify_du_delta $before $after $expected_delta +_verify_du_delta() +{ + if [ $(($1 - $2)) -ne $3 ]; then + printf "ERROR: unexpected delta: $1 - $2 = $(($1 - $2)) != $3\n" + fi +} + # This should be called as _run_test c=XXX sc=XXX off=XXX len=XXX cmd=XXX # c: cluster number (0 if unset) # sc: subcluster number inside cluster @c (0 if unset) @@ -198,9 +207,12 @@ for use_backing_file in yes no; do alloc="$(seq 0 31)"; zero="" _run_test sc=0 len=64k - ### Zero and unmap half of cluster #0 (this won't unmap it) + ### Zero and unmap half of cluster #0 (this will unmap it) alloc="$(seq 16 31)"; zero="$(seq 0 15)" + before=$(disk_usage "$TEST_IMG") _run_test sc=0 len=32k cmd=unmap + after=$(disk_usage "$TEST_IMG") + _verify_du_delta $before $after 32768 ### Zero and unmap cluster #0 alloc=""; zero="$(seq 0 31)" @@ -447,7 +459,10 @@ for use_backing_file in yes no; do # Subcluster-aligned request from clusters #12 to #14 alloc="$(seq 0 15)"; zero="$(seq 16 31)" + before=$(disk_usage "$TEST_IMG") _run_test c=12 sc=16 len=128k cmd=unmap + after=$(disk_usage "$TEST_IMG") + _verify_du_delta $before $after $((128 * 1024)) alloc=""; zero="$(seq 0 31)" _verify_l2_bitmap 13 alloc="$(seq 16 31)"; zero="$(seq 0 15)" @@ -528,6 +543,14 @@ for use_backing_file in yes no; do else _make_test_img -o extended_l2=on 1M fi + # Write cluster #0 and discard its subclusters #0-#3 + $QEMU_IO -c 'write -q 0 64k' "$TEST_IMG" + before=$(disk_usage "$TEST_IMG") + $QEMU_IO -c 'discard -q 0 8k' "$TEST_IMG" + after=$(disk_usage "$TEST_IMG") + _verify_du_delta $before $after 8192 + alloc="$(seq 4 31)"; zero="$(seq 0 3)" + _verify_l2_bitmap 0 # Write clusters #0-#2 and then discard them $QEMU_IO -c 'write -q 0 128k' "$TEST_IMG" $QEMU_IO -c 'discard -q 0 128k' "$TEST_IMG" diff --git a/tests/qemu-iotests/271.out b/tests/qemu-iotests/271.out index 5be780de76..0da8d72cde 100644 --- a/tests/qemu-iotests/271.out +++ b/tests/qemu-iotests/271.out @@ -426,6 +426,7 @@ L2 entry #29: 0x0000000000000000 0000ffff00000000 ### Discarding clusters with non-zero bitmaps (backing file: yes) ### Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576 backing_file=TEST_DIR/t.IMGFMT.base backing_fmt=raw +L2 entry #0: 0x8000000000050000 0000000ffffffff0 L2 entry #0: 0x0000000000000000 ffffffff00000000 L2 entry #1: 0x0000000000000000 ffffffff00000000 Image resized. @@ -436,6 +437,7 @@ L2 entry #1: 0x0000000000000000 ffffffff00000000 ### Discarding clusters with non-zero bitmaps (backing file: no) ### Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576 +L2 entry #0: 0x8000000000050000 0000000ffffffff0 L2 entry #0: 0x0000000000000000 ffffffff00000000 L2 entry #1: 0x0000000000000000 ffffffff00000000 Image resized.
Add _verify_du_delta() checker which is used to check that real disk usage delta meets the expectations. For now we use it for checking that subcluster-based discard/unmap operations lead to actual disk usage decrease (i.e. PUNCH_HOLE operation is performed). Also add separate test case for discarding particular subcluster within one cluster. Signed-off-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com> --- tests/qemu-iotests/271 | 25 ++++++++++++++++++++++++- tests/qemu-iotests/271.out | 2 ++ 2 files changed, 26 insertions(+), 1 deletion(-)