diff mbox series

qemu-img: Allow target be aligned to sector size

Message ID 20210819101200.64235-1-hreitz@redhat.com
State New
Headers show
Series qemu-img: Allow target be aligned to sector size | expand

Commit Message

Hanna Czenczek Aug. 19, 2021, 10:12 a.m. UTC
We cannot write to images opened with O_DIRECT unless we allow them to
be resized so they are aligned to the sector size: Since 9c60a5d1978,
bdrv_node_refresh_perm() ensures that for nodes whose length is not
aligned to the request alignment and where someone has taken a WRITE
permission, the RESIZE permission is taken, too).

Let qemu-img convert pass the BDRV_O_RESIZE flag (which causes
blk_new_open() to take the RESIZE permission) when using cache=none for
the target, so that when writing to it, it can be aligned to the target
sector size.

Without this patch, an error is returned:

$ qemu-img convert -f raw -O raw -t none foo.img /mnt/tmp/foo.img
qemu-img: Could not open '/mnt/tmp/foo.img': Cannot get 'write'
permission without 'resize': Image size is not a multiple of request
alignment

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1994266
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
---
As I have written in the BZ linked above, I am not sure what behavior we
want.  It can be argued that the current behavior is perfectly OK
because we want the target to have the same size as the source, so if
this cannot be done, we should just print the above error (which I think
explains the problem well enough that users can figure out they need to
resize the source image).

OTOH, it is difficult for me to imagine a case where the user would
prefer the above error to just having qemu-img align the target image's
length.
---
 qemu-img.c | 8 ++++++++
 1 file changed, 8 insertions(+)

Comments

Jose R. Ziviani Aug. 19, 2021, 2:31 p.m. UTC | #1
Hello Hanna,

On Thu, Aug 19, 2021 at 12:12:00PM +0200, Hanna Reitz wrote:
> We cannot write to images opened with O_DIRECT unless we allow them to
> be resized so they are aligned to the sector size: Since 9c60a5d1978,
> bdrv_node_refresh_perm() ensures that for nodes whose length is not
> aligned to the request alignment and where someone has taken a WRITE
> permission, the RESIZE permission is taken, too).
> 
> Let qemu-img convert pass the BDRV_O_RESIZE flag (which causes
> blk_new_open() to take the RESIZE permission) when using cache=none for
> the target, so that when writing to it, it can be aligned to the target
> sector size.
> 
> Without this patch, an error is returned:
> 
> $ qemu-img convert -f raw -O raw -t none foo.img /mnt/tmp/foo.img
> qemu-img: Could not open '/mnt/tmp/foo.img': Cannot get 'write'
> permission without 'resize': Image size is not a multiple of request
> alignment
> 
> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1994266
> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
> ---
> As I have written in the BZ linked above, I am not sure what behavior we
> want.  It can be argued that the current behavior is perfectly OK
> because we want the target to have the same size as the source, so if
> this cannot be done, we should just print the above error (which I think
> explains the problem well enough that users can figure out they need to
> resize the source image).
> 
> OTOH, it is difficult for me to imagine a case where the user would
> prefer the above error to just having qemu-img align the target image's
> length.

This is timely convenient because I'm currently investigating an issue detected
by a libvirt-tck test:

https://gitlab.com/libvirt/libvirt-tck/-/blob/master/scripts/qemu/200-qcow2-single-backing-file.t

It fails with the same message:
qemu-system-x86_64: -device virtio-blk-pci,bus=pci.0,addr=0x3,drive=libvirt-1-format,id=virtio-disk1,write-cache=on: Cannot get 'write' permission without 'resize': Image size is not a multiple of request alignment

Libvirt-tck basically wants to make sure that libvirt won't pass a 'backing'
argument if we force a QCOW2 image to be interpreted as a RAW image. But, the
actual size of a (not preallocated) QCOW2 is usually unaligned so we ended up
failing at that requirement.

I crafted a reproducer (tck-main is a QCOW2 image):
 $ ./qemu-system-x86_64 \
  -machine pc-i440fx-6.0,accel=kvm -m 1024 -display none -no-user-config -nodefaults \
  -kernel vmlinuz -initrd initrd \
  -blockdev driver=file,filename=/var/cache/libvirt-tck/storage-fs/tck/tck-main,node-name=a,cache.direct=on \
  -blockdev node-name=name,driver=raw,file=a \
  -device virtio-blk-pci,drive=name

And if I remove 'cache.direct=on' OR if I remove the device virtio-blk-pci I
don't hit the failure.

In order to fix the libvirt-tck test case, I think that creating a preallocated
QCOW2 image is the best approach, considering their test case goal. But, if you
don't mind, could you explain why cache.direct=on doesn't set resize permission
with virtio-blk-pci?

Thank you!

> ---
>  qemu-img.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/qemu-img.c b/qemu-img.c
> index 908fd0cce5..d4b29bf73e 100644
> --- a/qemu-img.c
> +++ b/qemu-img.c
> @@ -2628,6 +2628,14 @@ static int img_convert(int argc, char **argv)
>          goto out;
>      }
>  
> +    if (flags & BDRV_O_NOCACHE) {
> +        /*
> +         * If we open the target with O_DIRECT, it may be necessary to
> +         * extend its size to align to the physical sector size.
> +         */
> +        flags |= BDRV_O_RESIZE;
> +    }
> +
>      if (skip_create) {
>          s.target = img_open(tgt_image_opts, out_filename, out_fmt,
>                              flags, writethrough, s.quiet, false);
> -- 
> 2.31.1
> 
>
Hanna Czenczek Aug. 19, 2021, 3:14 p.m. UTC | #2
On 19.08.21 16:31, Jose R. Ziviani wrote:
> Hello Hanna,
>
> On Thu, Aug 19, 2021 at 12:12:00PM +0200, Hanna Reitz wrote:
>> We cannot write to images opened with O_DIRECT unless we allow them to
>> be resized so they are aligned to the sector size: Since 9c60a5d1978,
>> bdrv_node_refresh_perm() ensures that for nodes whose length is not
>> aligned to the request alignment and where someone has taken a WRITE
>> permission, the RESIZE permission is taken, too).
>>
>> Let qemu-img convert pass the BDRV_O_RESIZE flag (which causes
>> blk_new_open() to take the RESIZE permission) when using cache=none for
>> the target, so that when writing to it, it can be aligned to the target
>> sector size.
>>
>> Without this patch, an error is returned:
>>
>> $ qemu-img convert -f raw -O raw -t none foo.img /mnt/tmp/foo.img
>> qemu-img: Could not open '/mnt/tmp/foo.img': Cannot get 'write'
>> permission without 'resize': Image size is not a multiple of request
>> alignment
>>
>> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1994266
>> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
>> ---
>> As I have written in the BZ linked above, I am not sure what behavior we
>> want.  It can be argued that the current behavior is perfectly OK
>> because we want the target to have the same size as the source, so if
>> this cannot be done, we should just print the above error (which I think
>> explains the problem well enough that users can figure out they need to
>> resize the source image).
>>
>> OTOH, it is difficult for me to imagine a case where the user would
>> prefer the above error to just having qemu-img align the target image's
>> length.
> This is timely convenient because I'm currently investigating an issue detected
> by a libvirt-tck test:
>
> https://gitlab.com/libvirt/libvirt-tck/-/blob/master/scripts/qemu/200-qcow2-single-backing-file.t
>
> It fails with the same message:
> qemu-system-x86_64: -device virtio-blk-pci,bus=pci.0,addr=0x3,drive=libvirt-1-format,id=virtio-disk1,write-cache=on: Cannot get 'write' permission without 'resize': Image size is not a multiple of request alignment
>
> Libvirt-tck basically wants to make sure that libvirt won't pass a 'backing'
> argument if we force a QCOW2 image to be interpreted as a RAW image. But, the
> actual size of a (not preallocated) QCOW2 is usually unaligned so we ended up
> failing at that requirement.
>
> I crafted a reproducer (tck-main is a QCOW2 image):
>   $ ./qemu-system-x86_64 \
>    -machine pc-i440fx-6.0,accel=kvm -m 1024 -display none -no-user-config -nodefaults \
>    -kernel vmlinuz -initrd initrd \
>    -blockdev driver=file,filename=/var/cache/libvirt-tck/storage-fs/tck/tck-main,node-name=a,cache.direct=on \
>    -blockdev node-name=name,driver=raw,file=a \
>    -device virtio-blk-pci,drive=name
>
> And if I remove 'cache.direct=on' OR if I remove the device virtio-blk-pci I
> don't hit the failure.
>
> In order to fix the libvirt-tck test case, I think that creating a preallocated
> QCOW2 image is the best approach, considering their test case goal. But, if you
> don't mind, could you explain why cache.direct=on doesn't set resize permission
> with virtio-blk-pci?

Well, users only take the permissions they need.  Because the device 
doesn’t actively want to resize the disk, it doesn’t take the permission 
(because other simultaneous users might not share that permission, and 
so taking more permissions than you need may cause problems).

So the decision whether to take the permission or not is a tradeoff.  I 
mean, there’s always a work-around: The error message tells you that the 
image is not aligned to the request alignment, so you can align the 
image size manually.  The question is whether taking the permissions may 
be problematic, and whether the user can be expected to align the image 
size.

(And we also need to keep in mind that this case is extremely rare. I 
don’t think that users in practice will ever have images that are not 
4k-aligned.  What the test is doing is of course something that would 
never happen in a practical set-up, in fact, I believe attaching a qcow2 
image as a raw image to a VM is something that would usually be 
considered dangerous from a security perspective.[1])

For qemu-img convert (i.e. for this patch), I decided that it is 
extremely unlikely there are concurrent users for the target, so I can’t 
imagine taking more permissions would cause problems.  The only downside 
I could see is that the target image would be larger than the source 
image, but I think that is still the outcome that users want.

On the other side, applying the work-around may be problematic for 
users: The source image of qemu-img convert shouldn’t have to be 
writable.  So resizing it (just so it can be converted to be stored on a 
medium with 4k sector size) may actually be impossible for the user.

Now, for the virtio-blk case, that is different.  If you’re willing to 
apply the work-around, then you don’t have to do so on an “innocent 
bystander” image.  You have to resize the very image you want to use, 
i.e. one that must be writable anyway.  So resizing the image outside of 
qemu to match the alignment is feasible.

OTOH, because this is a live and complete image, it’s entirely possible 
that there are concurrent users that would block virtio-blk from taking 
the RESIZE permission, so I don’t think we should take it lightly.

So I think for the virtio-blk case the weight of pro and contra is very 
different than for qemu-img.  I’m not sure we should take the RESIZE 
permission in that case, especially considering that the example is not 
a real-world one.

I think if we really want to improve something for the virtio-blk case, 
it would be to have the error message tell what the request alignment 
is, and to add an error hint like

“Try resizing the image using `qemu-img resize -f {bs->drv->format_name} 
+{ROUND_UP(length, aligment) - length}`.”

Hanna

[1] Just to have it mentioned: Attaching a qcow2 image as a qcow2 image 
should work perfectly fine, because the qcow2 driver takes the RESIZE 
permission.
Jose R. Ziviani Aug. 19, 2021, 6:58 p.m. UTC | #3
On Thu, Aug 19, 2021 at 05:14:30PM +0200, Hanna Reitz wrote:
> On 19.08.21 16:31, Jose R. Ziviani wrote:
> > Hello Hanna,
> > 
> > On Thu, Aug 19, 2021 at 12:12:00PM +0200, Hanna Reitz wrote:
> > > We cannot write to images opened with O_DIRECT unless we allow them to
> > > be resized so they are aligned to the sector size: Since 9c60a5d1978,
> > > bdrv_node_refresh_perm() ensures that for nodes whose length is not
> > > aligned to the request alignment and where someone has taken a WRITE
> > > permission, the RESIZE permission is taken, too).
> > > 
> > > Let qemu-img convert pass the BDRV_O_RESIZE flag (which causes
> > > blk_new_open() to take the RESIZE permission) when using cache=none for
> > > the target, so that when writing to it, it can be aligned to the target
> > > sector size.
> > > 
> > > Without this patch, an error is returned:
> > > 
> > > $ qemu-img convert -f raw -O raw -t none foo.img /mnt/tmp/foo.img
> > > qemu-img: Could not open '/mnt/tmp/foo.img': Cannot get 'write'
> > > permission without 'resize': Image size is not a multiple of request
> > > alignment
> > > 
> > > Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1994266
> > > Signed-off-by: Hanna Reitz <hreitz@redhat.com>
> > > ---
> > > As I have written in the BZ linked above, I am not sure what behavior we
> > > want.  It can be argued that the current behavior is perfectly OK
> > > because we want the target to have the same size as the source, so if
> > > this cannot be done, we should just print the above error (which I think
> > > explains the problem well enough that users can figure out they need to
> > > resize the source image).
> > > 
> > > OTOH, it is difficult for me to imagine a case where the user would
> > > prefer the above error to just having qemu-img align the target image's
> > > length.
> > This is timely convenient because I'm currently investigating an issue detected
> > by a libvirt-tck test:
> > 
> > https://gitlab.com/libvirt/libvirt-tck/-/blob/master/scripts/qemu/200-qcow2-single-backing-file.t
> > 
> > It fails with the same message:
> > qemu-system-x86_64: -device virtio-blk-pci,bus=pci.0,addr=0x3,drive=libvirt-1-format,id=virtio-disk1,write-cache=on: Cannot get 'write' permission without 'resize': Image size is not a multiple of request alignment
> > 
> > Libvirt-tck basically wants to make sure that libvirt won't pass a 'backing'
> > argument if we force a QCOW2 image to be interpreted as a RAW image. But, the
> > actual size of a (not preallocated) QCOW2 is usually unaligned so we ended up
> > failing at that requirement.
> > 
> > I crafted a reproducer (tck-main is a QCOW2 image):
> >   $ ./qemu-system-x86_64 \
> >    -machine pc-i440fx-6.0,accel=kvm -m 1024 -display none -no-user-config -nodefaults \
> >    -kernel vmlinuz -initrd initrd \
> >    -blockdev driver=file,filename=/var/cache/libvirt-tck/storage-fs/tck/tck-main,node-name=a,cache.direct=on \
> >    -blockdev node-name=name,driver=raw,file=a \
> >    -device virtio-blk-pci,drive=name
> > 
> > And if I remove 'cache.direct=on' OR if I remove the device virtio-blk-pci I
> > don't hit the failure.
> > 
> > In order to fix the libvirt-tck test case, I think that creating a preallocated
> > QCOW2 image is the best approach, considering their test case goal. But, if you
> > don't mind, could you explain why cache.direct=on doesn't set resize permission
> > with virtio-blk-pci?
> 
> Well, users only take the permissions they need.  Because the device doesn’t
> actively want to resize the disk, it doesn’t take the permission (because
> other simultaneous users might not share that permission, and so taking more
> permissions than you need may cause problems).
> 
> So the decision whether to take the permission or not is a tradeoff.  I
> mean, there’s always a work-around: The error message tells you that the
> image is not aligned to the request alignment, so you can align the image
> size manually.  The question is whether taking the permissions may be
> problematic, and whether the user can be expected to align the image size.
> 
> (And we also need to keep in mind that this case is extremely rare. I don’t
> think that users in practice will ever have images that are not 4k-aligned. 
> What the test is doing is of course something that would never happen in a
> practical set-up, in fact, I believe attaching a qcow2 image as a raw image
> to a VM is something that would usually be considered dangerous from a
> security perspective.[1])
> 
> For qemu-img convert (i.e. for this patch), I decided that it is extremely
> unlikely there are concurrent users for the target, so I can’t imagine
> taking more permissions would cause problems.  The only downside I could see
> is that the target image would be larger than the source image, but I think
> that is still the outcome that users want.
> 
> On the other side, applying the work-around may be problematic for users:
> The source image of qemu-img convert shouldn’t have to be writable.  So
> resizing it (just so it can be converted to be stored on a medium with 4k
> sector size) may actually be impossible for the user.
> 
> Now, for the virtio-blk case, that is different.  If you’re willing to apply
> the work-around, then you don’t have to do so on an “innocent bystander”
> image.  You have to resize the very image you want to use, i.e. one that
> must be writable anyway.  So resizing the image outside of qemu to match the
> alignment is feasible.
> 
> OTOH, because this is a live and complete image, it’s entirely possible that
> there are concurrent users that would block virtio-blk from taking the
> RESIZE permission, so I don’t think we should take it lightly.
> 
> So I think for the virtio-blk case the weight of pro and contra is very
> different than for qemu-img.  I’m not sure we should take the RESIZE
> permission in that case, especially considering that the example is not a
> real-world one.
> 
> I think if we really want to improve something for the virtio-blk case, it
> would be to have the error message tell what the request alignment is, and
> to add an error hint like
> 
> “Try resizing the image using `qemu-img resize -f {bs->drv->format_name}
> +{ROUND_UP(length, aligment) - length}`.”
> 
> Hanna
> 
> [1] Just to have it mentioned: Attaching a qcow2 image as a qcow2 image
> should work perfectly fine, because the qcow2 driver takes the RESIZE
> permission.
> 

Yep, it works perfectly fine. I only get the alignment error because (I think)
the RAW driver sees a QCOW2 containing only a few kB of metadata and assumes
it's the whole disk size. Even after some debugging I was not understanding
much about the requirements that apply to that RESIZE permission, which is
clear now.

Thank you SO much for such detailed explanation!

Jose
Hanna Czenczek Sept. 7, 2021, 9:58 a.m. UTC | #4
Ping – any thoughts on this?

Hanna

On 19.08.21 12:12, Hanna Reitz wrote:
> We cannot write to images opened with O_DIRECT unless we allow them to
> be resized so they are aligned to the sector size: Since 9c60a5d1978,
> bdrv_node_refresh_perm() ensures that for nodes whose length is not
> aligned to the request alignment and where someone has taken a WRITE
> permission, the RESIZE permission is taken, too).
>
> Let qemu-img convert pass the BDRV_O_RESIZE flag (which causes
> blk_new_open() to take the RESIZE permission) when using cache=none for
> the target, so that when writing to it, it can be aligned to the target
> sector size.
>
> Without this patch, an error is returned:
>
> $ qemu-img convert -f raw -O raw -t none foo.img /mnt/tmp/foo.img
> qemu-img: Could not open '/mnt/tmp/foo.img': Cannot get 'write'
> permission without 'resize': Image size is not a multiple of request
> alignment
>
> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1994266
> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
> ---
> As I have written in the BZ linked above, I am not sure what behavior we
> want.  It can be argued that the current behavior is perfectly OK
> because we want the target to have the same size as the source, so if
> this cannot be done, we should just print the above error (which I think
> explains the problem well enough that users can figure out they need to
> resize the source image).
>
> OTOH, it is difficult for me to imagine a case where the user would
> prefer the above error to just having qemu-img align the target image's
> length.
> ---
>   qemu-img.c | 8 ++++++++
>   1 file changed, 8 insertions(+)
>
> diff --git a/qemu-img.c b/qemu-img.c
> index 908fd0cce5..d4b29bf73e 100644
> --- a/qemu-img.c
> +++ b/qemu-img.c
> @@ -2628,6 +2628,14 @@ static int img_convert(int argc, char **argv)
>           goto out;
>       }
>   
> +    if (flags & BDRV_O_NOCACHE) {
> +        /*
> +         * If we open the target with O_DIRECT, it may be necessary to
> +         * extend its size to align to the physical sector size.
> +         */
> +        flags |= BDRV_O_RESIZE;
> +    }
> +
>       if (skip_create) {
>           s.target = img_open(tgt_image_opts, out_filename, out_fmt,
>                               flags, writethrough, s.quiet, false);
Vladimir Sementsov-Ogievskiy Sept. 7, 2021, 11:29 a.m. UTC | #5
19.08.2021 13:12, Hanna Reitz wrote:
> We cannot write to images opened with O_DIRECT unless we allow them to
> be resized so they are aligned to the sector size: Since 9c60a5d1978,
> bdrv_node_refresh_perm() ensures that for nodes whose length is not
> aligned to the request alignment and where someone has taken a WRITE
> permission, the RESIZE permission is taken, too).
> 
> Let qemu-img convert pass the BDRV_O_RESIZE flag (which causes
> blk_new_open() to take the RESIZE permission) when using cache=none for
> the target, so that when writing to it, it can be aligned to the target
> sector size.
> 
> Without this patch, an error is returned:
> 
> $ qemu-img convert -f raw -O raw -t none foo.img /mnt/tmp/foo.img
> qemu-img: Could not open '/mnt/tmp/foo.img': Cannot get 'write'
> permission without 'resize': Image size is not a multiple of request
> alignment
> 
> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1994266
> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
> ---
> As I have written in the BZ linked above, I am not sure what behavior we
> want.  It can be argued that the current behavior is perfectly OK
> because we want the target to have the same size as the source, so if
> this cannot be done, we should just print the above error (which I think
> explains the problem well enough that users can figure out they need to
> resize the source image).
> 
> OTOH, it is difficult for me to imagine a case where the user would
> prefer the above error to just having qemu-img align the target image's
> length.
> ---
>   qemu-img.c | 8 ++++++++
>   1 file changed, 8 insertions(+)
> 
> diff --git a/qemu-img.c b/qemu-img.c
> index 908fd0cce5..d4b29bf73e 100644
> --- a/qemu-img.c
> +++ b/qemu-img.c
> @@ -2628,6 +2628,14 @@ static int img_convert(int argc, char **argv)
>           goto out;
>       }
>   
> +    if (flags & BDRV_O_NOCACHE) {
> +        /*
> +         * If we open the target with O_DIRECT, it may be necessary to
> +         * extend its size to align to the physical sector size.
> +         */
> +        flags |= BDRV_O_RESIZE;
> +    }
> +
>       if (skip_create) {
>           s.target = img_open(tgt_image_opts, out_filename, out_fmt,
>                               flags, writethrough, s.quiet, false);
> 

Hmm. Strange. I am experimenting now with master branch without that patch and can't reproduce:

[root@kvm master]# dd if=/dev/zero of=a bs=1M count=2
2+0 records in
2+0 records out
2097152 bytes (2.1 MB, 2.0 MiB) copied, 0.00153639 s, 1.4 GB/s
[root@kvm master]# dd if=/dev/zero of=b bs=1M count=1
1+0 records in
1+0 records out
1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.000939314 s, 1.1 GB/s
[root@kvm master]# ls -lthr a b
-rw-r--r--. 1 root root 2.0M Sep  7 14:28 a
-rw-r--r--. 1 root root 1.0M Sep  7 14:28 b
[root@kvm master]# ./build/qemu-img convert -f raw -O raw -t none a b
[root@kvm master]# ls -lthr a b
-rw-r--r--. 1 root root 2.0M Sep  7 14:28 a
-rw-r--r--. 1 root root 2.0M Sep  7 14:28 b
[root@kvm master]# dd if=/dev/zero of=b bs=1 count=2097147
2097147+0 records in
2097147+0 records out
2097147 bytes (2.1 MB, 2.0 MiB) copied, 2.76548 s, 758 kB/s
[root@kvm master]# ls -ltr a b
-rw-r--r--. 1 root root 2097152 Sep  7 14:28 a
-rw-r--r--. 1 root root 2097147 Sep  7 14:29 b
[root@kvm master]# ./build/qemu-img convert -f raw -O raw -t none a b
[root@kvm master]# ls -ltr a b
-rw-r--r--. 1 root root 2097152 Sep  7 14:28 a
-rw-r--r--. 1 root root 2097152 Sep  7 14:29 b

It just works, and do resize target without any additional BDRV_O_RESIZE...
Hanna Czenczek Sept. 7, 2021, 12:48 p.m. UTC | #6
On 07.09.21 13:29, Vladimir Sementsov-Ogievskiy wrote:
> 19.08.2021 13:12, Hanna Reitz wrote:
>> We cannot write to images opened with O_DIRECT unless we allow them to
>> be resized so they are aligned to the sector size: Since 9c60a5d1978,
>> bdrv_node_refresh_perm() ensures that for nodes whose length is not
>> aligned to the request alignment and where someone has taken a WRITE
>> permission, the RESIZE permission is taken, too).
>>
>> Let qemu-img convert pass the BDRV_O_RESIZE flag (which causes
>> blk_new_open() to take the RESIZE permission) when using cache=none for
>> the target, so that when writing to it, it can be aligned to the target
>> sector size.
>>
>> Without this patch, an error is returned:
>>
>> $ qemu-img convert -f raw -O raw -t none foo.img /mnt/tmp/foo.img
>> qemu-img: Could not open '/mnt/tmp/foo.img': Cannot get 'write'
>> permission without 'resize': Image size is not a multiple of request
>> alignment
>>
>> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1994266
>> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
>> ---
>> As I have written in the BZ linked above, I am not sure what behavior we
>> want.  It can be argued that the current behavior is perfectly OK
>> because we want the target to have the same size as the source, so if
>> this cannot be done, we should just print the above error (which I think
>> explains the problem well enough that users can figure out they need to
>> resize the source image).
>>
>> OTOH, it is difficult for me to imagine a case where the user would
>> prefer the above error to just having qemu-img align the target image's
>> length.
>> ---
>>   qemu-img.c | 8 ++++++++
>>   1 file changed, 8 insertions(+)
>>
>> diff --git a/qemu-img.c b/qemu-img.c
>> index 908fd0cce5..d4b29bf73e 100644
>> --- a/qemu-img.c
>> +++ b/qemu-img.c
>> @@ -2628,6 +2628,14 @@ static int img_convert(int argc, char **argv)
>>           goto out;
>>       }
>>   +    if (flags & BDRV_O_NOCACHE) {
>> +        /*
>> +         * If we open the target with O_DIRECT, it may be necessary to
>> +         * extend its size to align to the physical sector size.
>> +         */
>> +        flags |= BDRV_O_RESIZE;
>> +    }
>> +
>>       if (skip_create) {
>>           s.target = img_open(tgt_image_opts, out_filename, out_fmt,
>>                               flags, writethrough, s.quiet, false);
>>
>
> Hmm. Strange. I am experimenting now with master branch without that 
> patch and can't reproduce:
>
> [root@kvm master]# dd if=/dev/zero of=a bs=1M count=2
> 2+0 records in
> 2+0 records out
> 2097152 bytes (2.1 MB, 2.0 MiB) copied, 0.00153639 s, 1.4 GB/s
> [root@kvm master]# dd if=/dev/zero of=b bs=1M count=1
> 1+0 records in
> 1+0 records out
> 1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.000939314 s, 1.1 GB/s
> [root@kvm master]# ls -lthr a b
> -rw-r--r--. 1 root root 2.0M Sep  7 14:28 a
> -rw-r--r--. 1 root root 1.0M Sep  7 14:28 b
> [root@kvm master]# ./build/qemu-img convert -f raw -O raw -t none a b
> [root@kvm master]# ls -lthr a b
> -rw-r--r--. 1 root root 2.0M Sep  7 14:28 a
> -rw-r--r--. 1 root root 2.0M Sep  7 14:28 b
> [root@kvm master]# dd if=/dev/zero of=b bs=1 count=2097147
> 2097147+0 records in
> 2097147+0 records out
> 2097147 bytes (2.1 MB, 2.0 MiB) copied, 2.76548 s, 758 kB/s
> [root@kvm master]# ls -ltr a b
> -rw-r--r--. 1 root root 2097152 Sep  7 14:28 a
> -rw-r--r--. 1 root root 2097147 Sep  7 14:29 b
> [root@kvm master]# ./build/qemu-img convert -f raw -O raw -t none a b
> [root@kvm master]# ls -ltr a b
> -rw-r--r--. 1 root root 2097152 Sep  7 14:28 a
> -rw-r--r--. 1 root root 2097152 Sep  7 14:29 b
>
> It just works, and do resize target without any additional 
> BDRV_O_RESIZE...

bdrv_getlength() invokes bdrv_nb_sectors(), so we always align sizes to 
512 anyway.  On volumes with a logical sector size of 512, this error 
cannot be reproduced.

You can use loop devices to get volumes with other sector sizes, like so:

$ cd /tmp
$ truncate fs.img -s 128M
$ sudo losetup -f --show --sector-size 4096 fs.img
/dev/loop0
$ sudo mkfs.ext4 /dev/loop0
mke2fs 1.46.4 (18-Aug-2021)
Discarding device blocks: done
Creating filesystem with 32768 4k blocks and 32768 inodes

Allocating group tables: done
Writing inode tables: done
Creating journal (4096 blocks): done
Writing superblocks and filesystem accounting information: done

$ sudo mount /dev/loop0 /mnt/tmp
$ qemu-img create -f raw source.img $((2 * 1048576 + 512))
Formatting 'source.img', fmt=raw size=2097664
$ sudo qemu-img convert -f raw -O raw -t none source.img /mnt/tmp/target.img
qemu-img: Could not open '/mnt/tmp/target.img': Cannot get 'write' 
permission without 'resize': Image size is not a multiple of request 
alignment
Vladimir Sementsov-Ogievskiy Sept. 7, 2021, 1:44 p.m. UTC | #7
07.09.2021 15:48, Hanna Reitz wrote:
> On 07.09.21 13:29, Vladimir Sementsov-Ogievskiy wrote:
>> 19.08.2021 13:12, Hanna Reitz wrote:
>>> We cannot write to images opened with O_DIRECT unless we allow them to
>>> be resized so they are aligned to the sector size: Since 9c60a5d1978,
>>> bdrv_node_refresh_perm() ensures that for nodes whose length is not
>>> aligned to the request alignment and where someone has taken a WRITE
>>> permission, the RESIZE permission is taken, too).
>>>
>>> Let qemu-img convert pass the BDRV_O_RESIZE flag (which causes
>>> blk_new_open() to take the RESIZE permission) when using cache=none for
>>> the target, so that when writing to it, it can be aligned to the target
>>> sector size.
>>>
>>> Without this patch, an error is returned:
>>>
>>> $ qemu-img convert -f raw -O raw -t none foo.img /mnt/tmp/foo.img
>>> qemu-img: Could not open '/mnt/tmp/foo.img': Cannot get 'write'
>>> permission without 'resize': Image size is not a multiple of request
>>> alignment
>>>
>>> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1994266
>>> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
>>> ---
>>> As I have written in the BZ linked above, I am not sure what behavior we
>>> want.  It can be argued that the current behavior is perfectly OK
>>> because we want the target to have the same size as the source, so if
>>> this cannot be done, we should just print the above error (which I think
>>> explains the problem well enough that users can figure out they need to
>>> resize the source image).
>>>
>>> OTOH, it is difficult for me to imagine a case where the user would
>>> prefer the above error to just having qemu-img align the target image's
>>> length.
>>> ---
>>>   qemu-img.c | 8 ++++++++
>>>   1 file changed, 8 insertions(+)
>>>
>>> diff --git a/qemu-img.c b/qemu-img.c
>>> index 908fd0cce5..d4b29bf73e 100644
>>> --- a/qemu-img.c
>>> +++ b/qemu-img.c
>>> @@ -2628,6 +2628,14 @@ static int img_convert(int argc, char **argv)
>>>           goto out;
>>>       }
>>>   +    if (flags & BDRV_O_NOCACHE) {
>>> +        /*
>>> +         * If we open the target with O_DIRECT, it may be necessary to
>>> +         * extend its size to align to the physical sector size.
>>> +         */
>>> +        flags |= BDRV_O_RESIZE;
>>> +    }
>>> +
>>>       if (skip_create) {
>>>           s.target = img_open(tgt_image_opts, out_filename, out_fmt,
>>>                               flags, writethrough, s.quiet, false);
>>>
>>
>> Hmm. Strange. I am experimenting now with master branch without that patch and can't reproduce:
>>
>> [root@kvm master]# dd if=/dev/zero of=a bs=1M count=2
>> 2+0 records in
>> 2+0 records out
>> 2097152 bytes (2.1 MB, 2.0 MiB) copied, 0.00153639 s, 1.4 GB/s
>> [root@kvm master]# dd if=/dev/zero of=b bs=1M count=1
>> 1+0 records in
>> 1+0 records out
>> 1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.000939314 s, 1.1 GB/s
>> [root@kvm master]# ls -lthr a b
>> -rw-r--r--. 1 root root 2.0M Sep  7 14:28 a
>> -rw-r--r--. 1 root root 1.0M Sep  7 14:28 b
>> [root@kvm master]# ./build/qemu-img convert -f raw -O raw -t none a b
>> [root@kvm master]# ls -lthr a b
>> -rw-r--r--. 1 root root 2.0M Sep  7 14:28 a
>> -rw-r--r--. 1 root root 2.0M Sep  7 14:28 b
>> [root@kvm master]# dd if=/dev/zero of=b bs=1 count=2097147
>> 2097147+0 records in
>> 2097147+0 records out
>> 2097147 bytes (2.1 MB, 2.0 MiB) copied, 2.76548 s, 758 kB/s
>> [root@kvm master]# ls -ltr a b
>> -rw-r--r--. 1 root root 2097152 Sep  7 14:28 a
>> -rw-r--r--. 1 root root 2097147 Sep  7 14:29 b
>> [root@kvm master]# ./build/qemu-img convert -f raw -O raw -t none a b
>> [root@kvm master]# ls -ltr a b
>> -rw-r--r--. 1 root root 2097152 Sep  7 14:28 a
>> -rw-r--r--. 1 root root 2097152 Sep  7 14:29 b
>>
>> It just works, and do resize target without any additional BDRV_O_RESIZE...
> 
> bdrv_getlength() invokes bdrv_nb_sectors(), so we always align sizes to 512 anyway.  On volumes with a logical sector size of 512, this error cannot be reproduced.
> 
> You can use loop devices to get volumes with other sector sizes, like so:
> 
> $ cd /tmp
> $ truncate fs.img -s 128M
> $ sudo losetup -f --show --sector-size 4096 fs.img
> /dev/loop0
> $ sudo mkfs.ext4 /dev/loop0
> mke2fs 1.46.4 (18-Aug-2021)
> Discarding device blocks: done
> Creating filesystem with 32768 4k blocks and 32768 inodes
> 
> Allocating group tables: done
> Writing inode tables: done
> Creating journal (4096 blocks): done
> Writing superblocks and filesystem accounting information: done
> 
> $ sudo mount /dev/loop0 /mnt/tmp
> $ qemu-img create -f raw source.img $((2 * 1048576 + 512))
> Formatting 'source.img', fmt=raw size=2097664
> $ sudo qemu-img convert -f raw -O raw -t none source.img /mnt/tmp/target.img
> qemu-img: Could not open '/mnt/tmp/target.img': Cannot get 'write' permission without 'resize': Image size is not a multiple of request alignment
> 


Does it mean, that when logical sector size is 512, qemu-img do resize target without any 'resize' permission?

It looks very strange: in both cases we need to resize target. With 512-bytes sectors it just works (look it resizes 1M->2M which is a lot larger than 512b). And with 4096-bytes sectors it needs additional BDRV_O_RESIZE..
Hanna Czenczek Sept. 7, 2021, 2 p.m. UTC | #8
On 07.09.21 15:44, Vladimir Sementsov-Ogievskiy wrote:
> 07.09.2021 15:48, Hanna Reitz wrote:
>> On 07.09.21 13:29, Vladimir Sementsov-Ogievskiy wrote:
>>> 19.08.2021 13:12, Hanna Reitz wrote:
>>>> We cannot write to images opened with O_DIRECT unless we allow them to
>>>> be resized so they are aligned to the sector size: Since 9c60a5d1978,
>>>> bdrv_node_refresh_perm() ensures that for nodes whose length is not
>>>> aligned to the request alignment and where someone has taken a WRITE
>>>> permission, the RESIZE permission is taken, too).
>>>>
>>>> Let qemu-img convert pass the BDRV_O_RESIZE flag (which causes
>>>> blk_new_open() to take the RESIZE permission) when using cache=none 
>>>> for
>>>> the target, so that when writing to it, it can be aligned to the 
>>>> target
>>>> sector size.
>>>>
>>>> Without this patch, an error is returned:
>>>>
>>>> $ qemu-img convert -f raw -O raw -t none foo.img /mnt/tmp/foo.img
>>>> qemu-img: Could not open '/mnt/tmp/foo.img': Cannot get 'write'
>>>> permission without 'resize': Image size is not a multiple of request
>>>> alignment
>>>>
>>>> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1994266
>>>> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
>>>> ---
>>>> As I have written in the BZ linked above, I am not sure what 
>>>> behavior we
>>>> want.  It can be argued that the current behavior is perfectly OK
>>>> because we want the target to have the same size as the source, so if
>>>> this cannot be done, we should just print the above error (which I 
>>>> think
>>>> explains the problem well enough that users can figure out they 
>>>> need to
>>>> resize the source image).
>>>>
>>>> OTOH, it is difficult for me to imagine a case where the user would
>>>> prefer the above error to just having qemu-img align the target 
>>>> image's
>>>> length.
>>>> ---
>>>>   qemu-img.c | 8 ++++++++
>>>>   1 file changed, 8 insertions(+)
>>>>
>>>> diff --git a/qemu-img.c b/qemu-img.c
>>>> index 908fd0cce5..d4b29bf73e 100644
>>>> --- a/qemu-img.c
>>>> +++ b/qemu-img.c
>>>> @@ -2628,6 +2628,14 @@ static int img_convert(int argc, char **argv)
>>>>           goto out;
>>>>       }
>>>>   +    if (flags & BDRV_O_NOCACHE) {
>>>> +        /*
>>>> +         * If we open the target with O_DIRECT, it may be 
>>>> necessary to
>>>> +         * extend its size to align to the physical sector size.
>>>> +         */
>>>> +        flags |= BDRV_O_RESIZE;
>>>> +    }
>>>> +
>>>>       if (skip_create) {
>>>>           s.target = img_open(tgt_image_opts, out_filename, out_fmt,
>>>>                               flags, writethrough, s.quiet, false);
>>>>
>>>
>>> Hmm. Strange. I am experimenting now with master branch without that 
>>> patch and can't reproduce:
>>>
>>> [root@kvm master]# dd if=/dev/zero of=a bs=1M count=2
>>> 2+0 records in
>>> 2+0 records out
>>> 2097152 bytes (2.1 MB, 2.0 MiB) copied, 0.00153639 s, 1.4 GB/s
>>> [root@kvm master]# dd if=/dev/zero of=b bs=1M count=1
>>> 1+0 records in
>>> 1+0 records out
>>> 1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.000939314 s, 1.1 GB/s
>>> [root@kvm master]# ls -lthr a b
>>> -rw-r--r--. 1 root root 2.0M Sep  7 14:28 a
>>> -rw-r--r--. 1 root root 1.0M Sep  7 14:28 b
>>> [root@kvm master]# ./build/qemu-img convert -f raw -O raw -t none a b
>>> [root@kvm master]# ls -lthr a b
>>> -rw-r--r--. 1 root root 2.0M Sep  7 14:28 a
>>> -rw-r--r--. 1 root root 2.0M Sep  7 14:28 b
>>> [root@kvm master]# dd if=/dev/zero of=b bs=1 count=2097147
>>> 2097147+0 records in
>>> 2097147+0 records out
>>> 2097147 bytes (2.1 MB, 2.0 MiB) copied, 2.76548 s, 758 kB/s
>>> [root@kvm master]# ls -ltr a b
>>> -rw-r--r--. 1 root root 2097152 Sep  7 14:28 a
>>> -rw-r--r--. 1 root root 2097147 Sep  7 14:29 b
>>> [root@kvm master]# ./build/qemu-img convert -f raw -O raw -t none a b
>>> [root@kvm master]# ls -ltr a b
>>> -rw-r--r--. 1 root root 2097152 Sep  7 14:28 a
>>> -rw-r--r--. 1 root root 2097152 Sep  7 14:29 b
>>>
>>> It just works, and do resize target without any additional 
>>> BDRV_O_RESIZE...
>>
>> bdrv_getlength() invokes bdrv_nb_sectors(), so we always align sizes 
>> to 512 anyway.  On volumes with a logical sector size of 512, this 
>> error cannot be reproduced.
>>
>> You can use loop devices to get volumes with other sector sizes, like 
>> so:
>>
>> $ cd /tmp
>> $ truncate fs.img -s 128M
>> $ sudo losetup -f --show --sector-size 4096 fs.img
>> /dev/loop0
>> $ sudo mkfs.ext4 /dev/loop0
>> mke2fs 1.46.4 (18-Aug-2021)
>> Discarding device blocks: done
>> Creating filesystem with 32768 4k blocks and 32768 inodes
>>
>> Allocating group tables: done
>> Writing inode tables: done
>> Creating journal (4096 blocks): done
>> Writing superblocks and filesystem accounting information: done
>>
>> $ sudo mount /dev/loop0 /mnt/tmp
>> $ qemu-img create -f raw source.img $((2 * 1048576 + 512))
>> Formatting 'source.img', fmt=raw size=2097664
>> $ sudo qemu-img convert -f raw -O raw -t none source.img 
>> /mnt/tmp/target.img
>> qemu-img: Could not open '/mnt/tmp/target.img': Cannot get 'write' 
>> permission without 'resize': Image size is not a multiple of request 
>> alignment
>>
>
>
> Does it mean, that when logical sector size is 512, qemu-img do resize 
> target without any 'resize' permission?

Well, it creates the target with the size of the source, and the size of 
the source will always be reported to be aligned to 512.  If we didn’t 
have bdrv_nb_sectors() but used a byte granularity for a BDS’s length 
internally, then you could see this error with 512-byte-sectored 
volumes, too.

So from qemu’s point of view, it doesn’t need a `resize` permission 
because it doesn’t need resize the target, because it’s always created 
with a 512-byte-aligned size anyway.

Hanna

> It looks very strange: in both cases we need to resize target. With 
> 512-bytes sectors it just works (look it resizes 1M->2M which is a lot 
> larger than 512b). And with 4096-bytes sectors it needs additional 
> BDRV_O_RESIZE..
Vladimir Sementsov-Ogievskiy Sept. 7, 2021, 2:18 p.m. UTC | #9
07.09.2021 17:00, Hanna Reitz wrote:
> On 07.09.21 15:44, Vladimir Sementsov-Ogievskiy wrote:
>> 07.09.2021 15:48, Hanna Reitz wrote:
>>> On 07.09.21 13:29, Vladimir Sementsov-Ogievskiy wrote:
>>>> 19.08.2021 13:12, Hanna Reitz wrote:
>>>>> We cannot write to images opened with O_DIRECT unless we allow them to
>>>>> be resized so they are aligned to the sector size: Since 9c60a5d1978,
>>>>> bdrv_node_refresh_perm() ensures that for nodes whose length is not
>>>>> aligned to the request alignment and where someone has taken a WRITE
>>>>> permission, the RESIZE permission is taken, too).
>>>>>
>>>>> Let qemu-img convert pass the BDRV_O_RESIZE flag (which causes
>>>>> blk_new_open() to take the RESIZE permission) when using cache=none for
>>>>> the target, so that when writing to it, it can be aligned to the target
>>>>> sector size.
>>>>>
>>>>> Without this patch, an error is returned:
>>>>>
>>>>> $ qemu-img convert -f raw -O raw -t none foo.img /mnt/tmp/foo.img
>>>>> qemu-img: Could not open '/mnt/tmp/foo.img': Cannot get 'write'
>>>>> permission without 'resize': Image size is not a multiple of request
>>>>> alignment
>>>>>
>>>>> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1994266
>>>>> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
>>>>> ---
>>>>> As I have written in the BZ linked above, I am not sure what behavior we
>>>>> want.  It can be argued that the current behavior is perfectly OK
>>>>> because we want the target to have the same size as the source, so if
>>>>> this cannot be done, we should just print the above error (which I think
>>>>> explains the problem well enough that users can figure out they need to
>>>>> resize the source image).
>>>>>
>>>>> OTOH, it is difficult for me to imagine a case where the user would
>>>>> prefer the above error to just having qemu-img align the target image's
>>>>> length.
>>>>> ---
>>>>>   qemu-img.c | 8 ++++++++
>>>>>   1 file changed, 8 insertions(+)
>>>>>
>>>>> diff --git a/qemu-img.c b/qemu-img.c
>>>>> index 908fd0cce5..d4b29bf73e 100644
>>>>> --- a/qemu-img.c
>>>>> +++ b/qemu-img.c
>>>>> @@ -2628,6 +2628,14 @@ static int img_convert(int argc, char **argv)
>>>>>           goto out;
>>>>>       }
>>>>>   +    if (flags & BDRV_O_NOCACHE) {
>>>>> +        /*
>>>>> +         * If we open the target with O_DIRECT, it may be necessary to
>>>>> +         * extend its size to align to the physical sector size.
>>>>> +         */
>>>>> +        flags |= BDRV_O_RESIZE;
>>>>> +    }
>>>>> +
>>>>>       if (skip_create) {
>>>>>           s.target = img_open(tgt_image_opts, out_filename, out_fmt,
>>>>>                               flags, writethrough, s.quiet, false);
>>>>>
>>>>
>>>> Hmm. Strange. I am experimenting now with master branch without that patch and can't reproduce:
>>>>
>>>> [root@kvm master]# dd if=/dev/zero of=a bs=1M count=2
>>>> 2+0 records in
>>>> 2+0 records out
>>>> 2097152 bytes (2.1 MB, 2.0 MiB) copied, 0.00153639 s, 1.4 GB/s
>>>> [root@kvm master]# dd if=/dev/zero of=b bs=1M count=1
>>>> 1+0 records in
>>>> 1+0 records out
>>>> 1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.000939314 s, 1.1 GB/s
>>>> [root@kvm master]# ls -lthr a b
>>>> -rw-r--r--. 1 root root 2.0M Sep  7 14:28 a
>>>> -rw-r--r--. 1 root root 1.0M Sep  7 14:28 b
>>>> [root@kvm master]# ./build/qemu-img convert -f raw -O raw -t none a b
>>>> [root@kvm master]# ls -lthr a b
>>>> -rw-r--r--. 1 root root 2.0M Sep  7 14:28 a
>>>> -rw-r--r--. 1 root root 2.0M Sep  7 14:28 b
>>>> [root@kvm master]# dd if=/dev/zero of=b bs=1 count=2097147
>>>> 2097147+0 records in
>>>> 2097147+0 records out
>>>> 2097147 bytes (2.1 MB, 2.0 MiB) copied, 2.76548 s, 758 kB/s
>>>> [root@kvm master]# ls -ltr a b
>>>> -rw-r--r--. 1 root root 2097152 Sep  7 14:28 a
>>>> -rw-r--r--. 1 root root 2097147 Sep  7 14:29 b
>>>> [root@kvm master]# ./build/qemu-img convert -f raw -O raw -t none a b
>>>> [root@kvm master]# ls -ltr a b
>>>> -rw-r--r--. 1 root root 2097152 Sep  7 14:28 a
>>>> -rw-r--r--. 1 root root 2097152 Sep  7 14:29 b
>>>>
>>>> It just works, and do resize target without any additional BDRV_O_RESIZE...
>>>
>>> bdrv_getlength() invokes bdrv_nb_sectors(), so we always align sizes to 512 anyway.  On volumes with a logical sector size of 512, this error cannot be reproduced.
>>>
>>> You can use loop devices to get volumes with other sector sizes, like so:
>>>
>>> $ cd /tmp
>>> $ truncate fs.img -s 128M
>>> $ sudo losetup -f --show --sector-size 4096 fs.img
>>> /dev/loop0
>>> $ sudo mkfs.ext4 /dev/loop0
>>> mke2fs 1.46.4 (18-Aug-2021)
>>> Discarding device blocks: done
>>> Creating filesystem with 32768 4k blocks and 32768 inodes
>>>
>>> Allocating group tables: done
>>> Writing inode tables: done
>>> Creating journal (4096 blocks): done
>>> Writing superblocks and filesystem accounting information: done
>>>
>>> $ sudo mount /dev/loop0 /mnt/tmp
>>> $ qemu-img create -f raw source.img $((2 * 1048576 + 512))
>>> Formatting 'source.img', fmt=raw size=2097664
>>> $ sudo qemu-img convert -f raw -O raw -t none source.img /mnt/tmp/target.img
>>> qemu-img: Could not open '/mnt/tmp/target.img': Cannot get 'write' permission without 'resize': Image size is not a multiple of request alignment
>>>
>>
>>
>> Does it mean, that when logical sector size is 512, qemu-img do resize target without any 'resize' permission?
> 
> Well, it creates the target with the size of the source, and the size of the source will always be reported to be aligned to 512.  If we didn’t have bdrv_nb_sectors() but used a byte granularity for a BDS’s length internally, then you could see this error with 512-byte-sectored volumes, too.
> 
> So from qemu’s point of view, it doesn’t need a `resize` permission because it doesn’t need resize the target, because it’s always created with a 512-byte-aligned size anyway.
> 

Ah. Sorry. Somehow I thought we are saying about unaligned precreated target. But it's unrelated here. The problem is about unaligned source, so we have to create unaligned target (of the same size) and then we have problems with it.

Then I don't see any problems here. Maybe we could add BDRV_O_RESIZE only when we have unaligned file size, but nothing bad in adding it always for BDRV_O_NOCACHE mode.

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Hanna Czenczek Sept. 14, 2021, 9:24 a.m. UTC | #10
On 19.08.21 12:12, Hanna Reitz wrote:
> We cannot write to images opened with O_DIRECT unless we allow them to
> be resized so they are aligned to the sector size: Since 9c60a5d1978,
> bdrv_node_refresh_perm() ensures that for nodes whose length is not
> aligned to the request alignment and where someone has taken a WRITE
> permission, the RESIZE permission is taken, too).
>
> Let qemu-img convert pass the BDRV_O_RESIZE flag (which causes
> blk_new_open() to take the RESIZE permission) when using cache=none for
> the target, so that when writing to it, it can be aligned to the target
> sector size.
>
> Without this patch, an error is returned:
>
> $ qemu-img convert -f raw -O raw -t none foo.img /mnt/tmp/foo.img
> qemu-img: Could not open '/mnt/tmp/foo.img': Cannot get 'write'
> permission without 'resize': Image size is not a multiple of request
> alignment
>
> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1994266
> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
> ---
> As I have written in the BZ linked above, I am not sure what behavior we
> want.  It can be argued that the current behavior is perfectly OK
> because we want the target to have the same size as the source, so if
> this cannot be done, we should just print the above error (which I think
> explains the problem well enough that users can figure out they need to
> resize the source image).
>
> OTOH, it is difficult for me to imagine a case where the user would
> prefer the above error to just having qemu-img align the target image's
> length.
> ---
>   qemu-img.c | 8 ++++++++
>   1 file changed, 8 insertions(+)

Thanks for the review, applied to my block branch:

https://git.xanclic.moe/XanClic/qemu/commits/branch/block

Hanna
diff mbox series

Patch

diff --git a/qemu-img.c b/qemu-img.c
index 908fd0cce5..d4b29bf73e 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -2628,6 +2628,14 @@  static int img_convert(int argc, char **argv)
         goto out;
     }
 
+    if (flags & BDRV_O_NOCACHE) {
+        /*
+         * If we open the target with O_DIRECT, it may be necessary to
+         * extend its size to align to the physical sector size.
+         */
+        flags |= BDRV_O_RESIZE;
+    }
+
     if (skip_create) {
         s.target = img_open(tgt_image_opts, out_filename, out_fmt,
                             flags, writethrough, s.quiet, false);