Message ID | 1385387840-17307-6-git-send-email-pl@kamp.de |
---|---|
State | New |
On 25/11/2013 14:57, Peter Lieven wrote:
> Signed-off-by: Peter Lieven <pl@kamp.de>

Ok, given this patch I think the cluster_size is the right one to use
here, and the way you used the optimal unmap granularity also makes
sense; you could also use MAX(optimal unmap granularity, optimal
transfer length granularity).

However, there is no need to write one cluster at a time. What matters,
I think, is to align the *end* of the transfer, so that the next
transfer can start aligned.

> +            if (align && cluster_sectors > 0) {
> +                int64_t next_aligned_sector = (sector_num + cluster_sectors);

So this should be "+ n", not "+ cluster_sectors".

Perhaps it could be conditional on "n > cluster_sectors" (small requests
happen when you have a sparse region, and breaking them doesn't help).

Finally, I believe there is no need for a separate "-a" knob.

The patch looks fine to me with these small changes, though.

Also, a couple of ideas for separate patches. Perhaps the default value
of "-S" could be cluster_size if specified? This would avoid making raw
images too fragmented, and compounding filesystem-level fragmentation
with qcow2-level fragmentation. And 4K is too small a default in my
opinion; it could be easily changed to 64K, though 4K was of course an
improvement compared to 512 before commit a22f123 (qemu-img: Require
larger zero areas for sparse handling, 2011-08-26).

Paolo

> +                next_aligned_sector -= next_aligned_sector % cluster_sectors;
> +                if (sector_num + n > next_aligned_sector) {
> +                    n = next_aligned_sector - sector_num;
> +                }
> +            }
> +
>              if (n > bs_offset + bs_sectors - sector_num) {
>                  n = bs_offset + bs_sectors - sector_num;
>              }
> diff --git a/qemu-img.texi b/qemu-img.texi
> index 87f9d0f..9b1720f 100644
> --- a/qemu-img.texi
> +++ b/qemu-img.texi
> @@ -179,11 +179,14 @@ Error on reading data
>
>  @end table
>
> -@item convert [-c] [-p] [-n] [-f @var{fmt}] [-t @var{cache}] [-O @var{output_fmt}] [-o @var{options}] [-s @var{snapshot_name}] [-S @var{sparse_size}] [-m @var{iobuf_size}] @var{filename} [@var{filename2} [...]] @var{output_filename}
> +@item convert [-c] [-p] [-n] [-a] [-f @var{fmt}] [-t @var{cache}] [-O @var{output_fmt}] [-o @var{options}] [-s @var{snapshot_name}] [-S @var{sparse_size}] [-m @var{iobuf_size}] @var{filename} [@var{filename2} [...]] @var{output_filename}
>
>  Convert the disk image @var{filename} or a snapshot @var{snapshot_name} to disk image @var{output_filename}
>  using format @var{output_fmt}. It can be optionally compressed (@code{-c}
>  option) or use any format specific options like encryption (@code{-o} option).
> +If the @code{-a} option is specified write requests will be aligned
> +to the cluster size of the output image if possible. This is the default
> +for compressed images.
>
>  Only the formats @code{qcow} and @code{qcow2} support compression. The
>  compression is read-only. It means that if a compressed sector is

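The suggestion to align the end of the transfer can be sketched in C roughly as follows. This is only an illustration under stated assumptions, not code from qemu-img.c: `align_request_end` is a hypothetical helper, and the guard `aligned_end > sector_num` is this sketch's own stand-in for the "n > cluster_sectors" condition, chosen so that a short request is never shrunk to zero or a negative length.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of qemu-img): shorten a request of n
 * sectors starting at sector_num so that it ends on a cluster boundary.
 * Requests that do not reach past a boundary are returned unchanged.
 */
static int64_t align_request_end(int64_t sector_num, int64_t n,
                                 int64_t cluster_sectors)
{
    assert(sector_num >= 0 && n >= 0);

    if (cluster_sectors <= 0) {
        return n;   /* cluster size unknown: nothing to align to */
    }

    /* Round the end of the transfer ("+ n") down to a cluster boundary. */
    int64_t end = sector_num + n;
    int64_t aligned_end = end - end % cluster_sectors;

    /* Shorten only if that boundary lies after the start of the request. */
    if (aligned_end > sector_num) {
        n = aligned_end - sector_num;
    }
    return n;
}
```

With sector_num = 100, n = 300 and cluster_sectors = 128, the logic in the posted patch would cut the request to 28 sectors (ending at sector 128), while this variant keeps 284 sectors and ends at sector 384, so the next request still starts aligned.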
On 25.11.2013 16:11, Paolo Bonzini wrote:
> On 25/11/2013 14:57, Peter Lieven wrote:
>> Signed-off-by: Peter Lieven <pl@kamp.de>
> Ok, given this patch I think the cluster_size is the right one to use
> here, and the way you used the optimal unmap granularity also makes
> sense; you could also use MAX(optimal unmap granularity, optimal
> transfer length granularity).
>
> However, there is no need to write one cluster at a time. What matters,
> I think, is to align the *end* of the transfer, so that the next
> transfer can start aligned.
>
>> +            if (align && cluster_sectors > 0) {
>> +                int64_t next_aligned_sector = (sector_num + cluster_sectors);
> So this should be "+ n", not "+ cluster_sectors".
>
> Perhaps it could be conditional on "n > cluster_sectors" (small requests
> happen when you have a sparse region, and breaking them doesn't help).
>
> Finally, I believe there is no need for a separate "-a" knob.
>
> The patch looks fine to me with these small changes, though.
>
> Also, a couple of ideas for separate patches. Perhaps the default value
> of "-S" could be cluster_size if specified? This would avoid making raw
> images too fragmented, and compounding filesystem-level fragmentation
> with qcow2-level fragmentation. And 4K is too small a default in my
> opinion; it could be easily changed to 64K, though 4K was of course an
> improvement compared to 512 before commit a22f123 (qemu-img: Require
> larger zero areas for sparse handling, 2011-08-26).

I would vote for 64K or 256K; we have already been using the former for
some time. However, it turned out that (much) bigger values decrease
performance. Setting it to cluster_size can be dangerous. As described,
in my case it's 15MB, and I think for vhd it's 1MB. This can be a lot of
zeros that have to be written.

Peter

On 25/11/2013 16:32, Peter Lieven wrote:
>> Also, a couple of ideas for separate patches. Perhaps the default value
>> of "-S" could be cluster_size if specified? This would avoid making raw
>> images too fragmented, and compounding filesystem-level fragmentation
>> with qcow2-level fragmentation. And 4K is too small a default in my
>> opinion; it could be easily changed to 64K, though 4K was of course an
>> improvement compared to 512 before commit a22f123 (qemu-img: Require
>> larger zero areas for sparse handling, 2011-08-26).
> I would vote for 64K or 256K; we have already been using the former for
> some time. However, it turned out that (much) bigger values decrease
> performance. Setting it to cluster_size can be dangerous. As described,
> in my case it's 15MB, and I think for vhd it's 1MB. This can be a lot of
> zeros that have to be written.

What about max(4096, min(bdi->cluster_size, 1048576))?

Paolo

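Spelled out in C, the clamp proposed here might look like the sketch below. `default_sparse_size` and the two macro names are hypothetical and exist only for this illustration; the bounds 4096 and 1048576 are the ones from the message, and the value is in bytes as with `bdi->cluster_size`.

```c
#include <assert.h>
#include <stdint.h>

#define MIN_SPARSE_BYTES 4096      /* lower bound from the proposal */
#define MAX_SPARSE_BYTES 1048576   /* upper bound from the proposal (1 MiB) */

/*
 * Hypothetical helper (not part of qemu-img): derive a default sparse
 * size from the output image's cluster size, clamped to the range
 * [MIN_SPARSE_BYTES, MAX_SPARSE_BYTES].
 */
static int64_t default_sparse_size(int64_t cluster_size)
{
    assert(MIN_SPARSE_BYTES <= MAX_SPARSE_BYTES);

    /* min(cluster_size, 1048576) */
    int64_t v = cluster_size < MAX_SPARSE_BYTES ? cluster_size
                                                : MAX_SPARSE_BYTES;
    /* max(4096, ...) */
    return v > MIN_SPARSE_BYTES ? v : MIN_SPARSE_BYTES;
}
```

Under this clamp a 64K cluster size is used as-is, a 15MB cluster size is capped at 1MB, and an unknown (zero) cluster size falls back to 4K.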
On 25.11.2013 16:50, Paolo Bonzini wrote:
> On 25/11/2013 16:32, Peter Lieven wrote:
>>> Also, a couple of ideas for separate patches. Perhaps the default value
>>> of "-S" could be cluster_size if specified? This would avoid making raw
>>> images too fragmented, and compounding filesystem-level fragmentation
>>> with qcow2-level fragmentation. And 4K is too small a default in my
>>> opinion; it could be easily changed to 64K, though 4K was of course an
>>> improvement compared to 512 before commit a22f123 (qemu-img: Require
>>> larger zero areas for sparse handling, 2011-08-26).
>> I would vote for 64K or 256K; we have already been using the former for
>> some time. However, it turned out that (much) bigger values decrease
>> performance. Setting it to cluster_size can be dangerous. As described,
>> in my case it's 15MB, and I think for vhd it's 1MB. This can be a lot of
>> zeros that have to be written.
> What about max(4096, min(bdi->cluster_size, 1048576))?

Changing sparse_size from 65536 to 1048576 gives about a 5% performance
decrease...

lieven@lieven-pc:~/git/qemu$ time ./qemu-img convert -pp -m 15728640 -S 1048576 /tmp/VC-Ubuntu-LTS-12.04.2-64bit.qcow2 iscsi://172.21.200.45/iqn.2001-05.com.equallogic:0-8a0906-9d95c510a-344001d54795289f-2012-r2-1-7-0/0
40980480 of 40980480 sectors converted.

real    0m29.263s
user    0m7.544s
sys     0m1.636s

lieven@lieven-pc:~/git/qemu$ time ./qemu-img convert -pp -m 15728640 -S 4096 /tmp/VC-Ubuntu-LTS-12.04.2-64bit.qcow2 iscsi://172.21.200.45/iqn.2001-05.com.equallogic:0-8a0906-9d95c510a-344001d54795289f-2012-r2-1-7-0/0
40980480 of 40980480 sectors converted.

real    0m28.169s
user    0m7.792s
sys     0m1.516s

lieven@lieven-pc:~/git/qemu$ time ./qemu-img convert -pp -m 15728640 -S 65536 /tmp/VC-Ubuntu-LTS-12.04.2-64bit.qcow2 iscsi://172.21.200.45/iqn.2001-05.com.equallogic:0-8a0906-9d95c510a-344001d54795289f-2012-r2-1-7-0/0
40980480 of 40980480 sectors converted.

real    0m27.643s
user    0m7.644s
sys     0m1.520s

I wouldn't go over 64k until we fully understand what impact it has.

Peter

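For reference, the "about 5%" figure can be checked from the wall-clock ("real") times quoted above. The helper below is a hypothetical illustration, not part of the benchmark; each configuration was timed once in this message, so run-to-run noise is not captured.

```c
#include <assert.h>

/*
 * Hypothetical helper: relative slowdown of `seconds` compared with
 * `baseline_seconds`, expressed in percent.
 */
static double slowdown_percent(double baseline_seconds, double seconds)
{
    assert(baseline_seconds > 0.0);
    return (seconds - baseline_seconds) / baseline_seconds * 100.0;
}
```

Taking the 64K run (27.643 s) as the baseline, `slowdown_percent(27.643, 29.263)` comes out at roughly 5.9% for the 1M run and `slowdown_percent(27.643, 28.169)` at roughly 1.9% for the 4K run.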
On 25/11/2013 16:55, Peter Lieven wrote:
> Changing sparse_size from 65536 to 1048576 gives about a 5% performance
> decrease...
>
> lieven@lieven-pc:~/git/qemu$ time ./qemu-img convert -pp -m 15728640 -S
> 1048576 /tmp/VC-Ubuntu-LTS-12.04.2-64bit.qcow2
> iscsi://172.21.200.45/iqn.2001-05.com.equallogic:0-8a0906-9d95c510a-344001d54795289f-2012-r2-1-7-0/0
>
> 40980480 of 40980480 sectors converted.
>
> real    0m29.263s
> user    0m7.544s
> sys     0m1.636s
> lieven@lieven-pc:~/git/qemu$ time ./qemu-img convert -pp -m 15728640 -S
> 4096 /tmp/VC-Ubuntu-LTS-12.04.2-64bit.qcow2
> iscsi://172.21.200.45/iqn.2001-05.com.equallogic:0-8a0906-9d95c510a-344001d54795289f-2012-r2-1-7-0/0
>
> 40980480 of 40980480 sectors converted.
>
> real    0m28.169s
> user    0m7.792s
> sys     0m1.516s
> lieven@lieven-pc:~/git/qemu$ time ./qemu-img convert -pp -m 15728640 -S
> 65536 /tmp/VC-Ubuntu-LTS-12.04.2-64bit.qcow2
> iscsi://172.21.200.45/iqn.2001-05.com.equallogic:0-8a0906-9d95c510a-344001d54795289f-2012-r2-1-7-0/0
>
> 40980480 of 40980480 sectors converted.
>
> real    0m27.643s
> user    0m7.644s
> sys     0m1.520s
>
> I wouldn't go over 64k until we fully understand what impact it has.

I agree.

Paolo

On 25.11.2013 16:11, Paolo Bonzini wrote:
> On 25/11/2013 14:57, Peter Lieven wrote:
>> Signed-off-by: Peter Lieven <pl@kamp.de>
> Ok, given this patch I think the cluster_size is the right one to use
> here, and the way you used the optimal unmap granularity also makes
> sense; you could also use MAX(optimal unmap granularity, optimal
> transfer length granularity).
>
> However, there is no need to write one cluster at a time. What matters,
> I think, is to align the *end* of the transfer, so that the next
> transfer can start aligned.
>
>> +            if (align && cluster_sectors > 0) {
>> +                int64_t next_aligned_sector = (sector_num + cluster_sectors);
> So this should be "+ n", not "+ cluster_sectors".
>
> Perhaps it could be conditional on "n > cluster_sectors" (small requests
> happen when you have a sparse region, and breaking them doesn't help).

Would you also agree to n >= cluster_sectors? In my case, and especially
if n is bound by iobuf_size, the case n > cluster_sectors will be hard to
meet.

Peter

>
> Finally, I believe there is no need for a separate "-a" knob.
>
> The patch looks fine to me with these small changes, though.
>
> Also, a couple of ideas for separate patches. Perhaps the default value
> of "-S" could be cluster_size if specified? This would avoid making raw
> images too fragmented, and compounding filesystem-level fragmentation
> with qcow2-level fragmentation. And 4K is too small a default in my
> opinion; it could be easily changed to 64K, though 4K was of course an
> improvement compared to 512 before commit a22f123 (qemu-img: Require
> larger zero areas for sparse handling, 2011-08-26).

On 25/11/2013 17:11, Peter Lieven wrote:
> On 25.11.2013 16:11, Paolo Bonzini wrote:
>> On 25/11/2013 14:57, Peter Lieven wrote:
>>> Signed-off-by: Peter Lieven <pl@kamp.de>
>> Ok, given this patch I think the cluster_size is the right one to use
>> here, and the way you used the optimal unmap granularity also makes
>> sense; you could also use MAX(optimal unmap granularity, optimal
>> transfer length granularity).
>>
>> However, there is no need to write one cluster at a time. What matters,
>> I think, is to align the *end* of the transfer, so that the next
>> transfer can start aligned.
>>
>>> +            if (align && cluster_sectors > 0) {
>>> +                int64_t next_aligned_sector = (sector_num +
>>> cluster_sectors);
>> So this should be "+ n", not "+ cluster_sectors".
>>
>> Perhaps it could be conditional on "n > cluster_sectors" (small requests
>> happen when you have a sparse region, and breaking them doesn't help).
>
> Would you also agree to n >= cluster_sectors? In my case, and especially
> if n is bound by iobuf_size, the case n > cluster_sectors will be hard to
> meet.

Of course. In fact, ">" alone is wrong ("n > cluster_sectors || n ==
iobuf_size" could be right, but perhaps it's a useless complication).

Paolo

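The three candidate conditions discussed in this exchange can be compared side by side in a small C sketch. All function names are hypothetical, and `iobuf_sectors` is assumed here to be the I/O buffer size (`-m`) expressed in 512-byte sectors.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* First suggestion: align only requests strictly larger than a cluster. */
static bool align_if_greater(int64_t n, int64_t cluster_sectors)
{
    return n > cluster_sectors;
}

/* Peter's variant: also align requests of exactly one cluster. */
static bool align_if_greater_or_equal(int64_t n, int64_t cluster_sectors)
{
    return n >= cluster_sectors;
}

/* Paolo's alternative: ">" plus the case where n fills the I/O buffer. */
static bool align_if_greater_or_full_buffer(int64_t n,
                                            int64_t cluster_sectors,
                                            int64_t iobuf_sectors)
{
    return n > cluster_sectors || n == iobuf_sectors;
}
```

Assuming the cluster size and the I/O buffer are both 15728640 bytes (30720 sectors), as in the setup described in this thread, n can never exceed 30720, so the plain ">" test never fires, while the other two fire for a full-buffer request.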
diff --git a/qemu-img-cmds.hx b/qemu-img-cmds.hx
index e0b8ab4..266cdf3 100644
--- a/qemu-img-cmds.hx
+++ b/qemu-img-cmds.hx
@@ -34,9 +34,9 @@ STEXI
 ETEXI
 
 DEF("convert", img_convert,
-    "convert [-c] [-p] [-q] [-n] [-f fmt] [-t cache] [-O output_fmt] [-o options] [-s snapshot_name] [-S sparse_size] [-m iobuf_size] filename [filename2 [...]] output_filename")
+    "convert [-c] [-p] [-q] [-n] [-a] [-f fmt] [-t cache] [-O output_fmt] [-o options] [-s snapshot_name] [-S sparse_size] [-m iobuf_size] filename [filename2 [...]] output_filename")
 STEXI
-@item convert [-c] [-p] [-q] [-n] [-f @var{fmt}] [-t @var{cache}] [-O @var{output_fmt}] [-o @var{options}] [-s @var{snapshot_name}] [-S @var{sparse_size}] [-m @var{iobuf_size}] @var{filename} [@var{filename2} [...]] @var{output_filename}
+@item convert [-c] [-p] [-q] [-n] [-a] [-f @var{fmt}] [-t @var{cache}] [-O @var{output_fmt}] [-o @var{options}] [-s @var{snapshot_name}] [-S @var{sparse_size}] [-m @var{iobuf_size}] @var{filename} [@var{filename2} [...]] @var{output_filename}
 ETEXI
 
 DEF("info", img_info,
diff --git a/qemu-img.c b/qemu-img.c
index 0ce5d14..9fa8fd4 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -109,6 +109,7 @@ static void help(void)
            "  '--output' takes the format in which the output must be done (human or json)\n"
            "  '-n' skips the target volume creation (useful if the volume is created\n"
            "       prior to running qemu-img)\n"
+           "  '-a' align write requests to cluster size if possible\n"
            "\n"
            "Parameters to check subcommand:\n"
            "  '-r' tries to repair any inconsistencies that are found during the check.\n"
@@ -1125,8 +1126,7 @@ out3:
 
 static int img_convert(int argc, char **argv)
 {
-    int c, n, n1, bs_n, bs_i, compress, cluster_size,
-        cluster_sectors, skip_create;
+    int c, n, n1, bs_n, bs_i, compress, cluster_sectors, skip_create;
     int64_t ret = 0;
     int progress = 0, flags;
     const char *fmt, *out_fmt, *cache, *out_baseimg, *out_filename;
@@ -1144,7 +1144,7 @@ static int img_convert(int argc, char **argv)
     char *options = NULL;
     const char *snapshot_name = NULL;
     int min_sparse = 8; /* Need at least 4k of zeros for sparse detection */
-    bool quiet = false;
+    bool quiet = false, align = false;
     Error *local_err = NULL;
 
     fmt = NULL;
@@ -1154,7 +1154,7 @@ static int img_convert(int argc, char **argv)
     compress = 0;
     skip_create = 0;
     for(;;) {
-        c = getopt(argc, argv, "f:O:B:s:hce6o:pS:m:t:qn");
+        c = getopt(argc, argv, "f:O:B:s:hcae6o:pS:m:t:qn");
         if (c == -1) {
             break;
         }
@@ -1175,6 +1175,9 @@ static int img_convert(int argc, char **argv)
         case 'c':
             compress = 1;
             break;
+        case 'a':
+            align = true;
+            break;
         case 'e':
             error_report("option -e is deprecated, please use \'-o "
                   "encryption\' instead!");
@@ -1402,19 +1405,21 @@ static int img_convert(int argc, char **argv)
         }
     }
 
+    cluster_sectors = 0;
+    ret = bdrv_get_info(out_bs, &bdi);
+    if (ret < 0 && compress) {
+        error_report("could not get block driver info");
+        goto out;
+    } else {
+        cluster_sectors = bdi.cluster_size / BDRV_SECTOR_SIZE;
+    }
+
     if (compress) {
-        ret = bdrv_get_info(out_bs, &bdi);
-        if (ret < 0) {
-            error_report("could not get block driver info");
-            goto out;
-        }
-        cluster_size = bdi.cluster_size;
-        if (cluster_size <= 0 || cluster_size > bufsectors * BDRV_SECTOR_SIZE) {
+        if (cluster_sectors <= 0 || cluster_sectors > bufsectors) {
             error_report("invalid cluster size");
             ret = -1;
             goto out;
         }
-        cluster_sectors = cluster_size >> 9;
         sector_num = 0;
 
         nb_sectors = total_sectors;
@@ -1552,6 +1557,14 @@ static int img_convert(int argc, char **argv)
                 n = nb_sectors;
             }
 
+            if (align && cluster_sectors > 0) {
+                int64_t next_aligned_sector = (sector_num + cluster_sectors);
+                next_aligned_sector -= next_aligned_sector % cluster_sectors;
+                if (sector_num + n > next_aligned_sector) {
+                    n = next_aligned_sector - sector_num;
+                }
+            }
+
             if (n > bs_offset + bs_sectors - sector_num) {
                 n = bs_offset + bs_sectors - sector_num;
             }
diff --git a/qemu-img.texi b/qemu-img.texi
index 87f9d0f..9b1720f 100644
--- a/qemu-img.texi
+++ b/qemu-img.texi
@@ -179,11 +179,14 @@ Error on reading data
 
 @end table
 
-@item convert [-c] [-p] [-n] [-f @var{fmt}] [-t @var{cache}] [-O @var{output_fmt}] [-o @var{options}] [-s @var{snapshot_name}] [-S @var{sparse_size}] [-m @var{iobuf_size}] @var{filename} [@var{filename2} [...]] @var{output_filename}
+@item convert [-c] [-p] [-n] [-a] [-f @var{fmt}] [-t @var{cache}] [-O @var{output_fmt}] [-o @var{options}] [-s @var{snapshot_name}] [-S @var{sparse_size}] [-m @var{iobuf_size}] @var{filename} [@var{filename2} [...]] @var{output_filename}
 
 Convert the disk image @var{filename} or a snapshot @var{snapshot_name} to disk image @var{output_filename}
 using format @var{output_fmt}. It can be optionally compressed (@code{-c}
 option) or use any format specific options like encryption (@code{-o} option).
+If the @code{-a} option is specified write requests will be aligned
+to the cluster size of the output image if possible. This is the default
+for compressed images.
 
 Only the formats @code{qcow} and @code{qcow2} support compression. The
 compression is read-only. It means that if a compressed sector is

Signed-off-by: Peter Lieven <pl@kamp.de>
---
 qemu-img-cmds.hx |    4 ++--
 qemu-img.c       |   37 +++++++++++++++++++++++++------------
 qemu-img.texi    |    5 ++++-
 3 files changed, 31 insertions(+), 15 deletions(-)