Message ID: 1388074792-29946-1-git-send-email-pl@kamp.de
State: New
On 27 Dec 2013 00:19, Peter Lieven wrote:
> while evaluating compressed qcow2 images as a good basis for
> virtual machine templates I found out that there are a lot
> of partly redundant (compressed clusters have common physical
> sectors) and relatively short reads.
>
> This doesn't hurt if the image resides on a local
> filesystem where we can benefit from the local page cache,
> but it adds a lot of penalty when accessing remote images
> on NFS or similar exports.
>
> This patch effectively implements a readahead of 2 * cluster_size,
> which is 2 * 64kB by default, resulting in 128kB readahead. This
> is the common setting for Linux, for instance.
>
> For example, this leads to the following times when converting
> a compressed qcow2 image to a local tmpfs partition.
>
> Old:
> time ./qemu-img convert nfs://10.0.0.1/export/VC-Ubuntu-LTS-12.04.2-64bit.qcow2 /tmp/test.raw
> real 0m24.681s
> user 0m8.597s
> sys 0m4.084s
>
> New:
> time ./qemu-img convert nfs://10.0.0.1/export/VC-Ubuntu-LTS-12.04.2-64bit.qcow2 /tmp/test.raw
> real 0m16.121s
> user 0m7.932s
> sys 0m2.244s
>
> Signed-off-by: Peter Lieven <pl@kamp.de>
> ---
> block/qcow2-cluster.c | 27 +++++++++++++++++++++++++--
> block/qcow2.h | 1 +
> 2 files changed, 26 insertions(+), 2 deletions(-)

I like this idea, but here's a question. This penalty is actually common to all protocol drivers: curl, gluster, whatever. Readahead is not only good for compression processing but also quite helpful for boot: the BIOS and GRUB may send sequential 1-sector I/Os, synchronously, and thus suffer from the high latency of network communication. So I think if we want to do this, we will want to share it with other format and protocol combinations.

Fam
On 27.12.2013 04:23, Fam Zheng wrote:
> On 27 Dec 2013 00:19, Peter Lieven wrote:
>> [...]
>
> I like this idea, but here's a question. This penalty is actually common to all protocol drivers: curl, gluster, whatever. Readahead is not only good for compression processing but also quite helpful for boot: the BIOS and GRUB may send sequential 1-sector I/Os, synchronously, and thus suffer from the high latency of network communication. So I think if we want to do this, we will want to share it with other format and protocol combinations.

I had the same idea in mind. Not only high latency, but also high I/O load on the storage, as reading sectors one by one produces high IOPS.

But we have to be very careful:
- It's likely that the OS already does a readahead, so we should not duplicate that complexity in qemu in that case.
- We definitely destroy zero-copy functionality.

My idea would be to do a readahead only if we observe a read smaller than n bytes, and then round up to that size. Maybe we should only put this logic in place when there is a 1-sector read, and then read e.g. 4K. In any case this has to be an opt-in feature.

If I have some time I will collect some histograms of transfer size versus timing when booting popular OSs.

Peter
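The opt-in heuristic Peter sketches above (expand only small reads, round up to e.g. 4K, leave everything else untouched) could look roughly like the following. This is a hypothetical illustration, not QEMU API; the function name and constants are invented, and only the "1 sector read -> read e.g. 4K" example from the mail is modeled:

```c
#include <assert.h>
#include <stdbool.h>

#define SECTOR_SIZE   512   /* guest sector size */
#define READAHEAD_LEN 4096  /* "read e.g. 4K" from the mail */

/* Return the byte length actually submitted to the backend for a
 * guest read of 'requested' bytes. Readahead is strictly opt-in;
 * large reads are passed through so the zero-copy path is kept. */
static int effective_read_len(int requested, bool readahead_enabled)
{
    if (!readahead_enabled) {
        return requested;       /* opt-in: default behaviour unchanged */
    }
    if (requested <= SECTOR_SIZE) {
        return READAHEAD_LEN;   /* tiny (e.g. 1-sector) read: round up */
    }
    return requested;           /* large reads stay untouched */
}
```

This keeps the two concerns from the mail visible in code: the feature does nothing unless enabled, and it never inflates reads that are already large enough to benefit from zero copy.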
On 28.12.2013 16:35, Peter Lieven wrote:
> On 27.12.2013 04:23, Fam Zheng wrote:
>> On 27 Dec 2013 00:19, Peter Lieven wrote:
>>> [...]
>>
>> [...]
>
> [...]
>
> If I have some time I will collect some histograms of transfer size versus timing when booting popular OSs.

What I forgot here: in the qcow2 compressed cluster case this was very low-hanging fruit, as the buffers etc. are already there. In this case it is obvious that we benefit from readahead, but maybe qemu-img could enable it itself in this special case if we really build it into the BlockDriver.

Peter
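For context on the patch below: the offset/size decoding that qcow2_decompress_cluster() performs on a compressed-cluster L2 entry (which the patch reuses unchanged) can be modeled in a self-contained way. The shift/mask derivation matches the qcow2 layout for the default 64k clusters (cluster_bits = 16); the helper name is invented for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* qcow2 compressed-cluster descriptor layout for cluster_bits = 16:
 * the low 54 bits hold the host byte offset, the next 8 bits hold
 * the compressed sector count minus one. */
enum { CLUSTER_BITS = 16, CSIZE_SHIFT = 62 - (CLUSTER_BITS - 8) /* 54 */ };
static const uint64_t CSIZE_MASK = (1u << (CLUSTER_BITS - 8)) - 1;      /* 255 */
static const uint64_t CLUSTER_OFFSET_MASK = ((uint64_t)1 << CSIZE_SHIFT) - 1;

static void decode_compressed(uint64_t l2_entry, uint64_t *coffset, int *csize)
{
    int nb_csectors, sector_offset;

    *coffset = l2_entry & CLUSTER_OFFSET_MASK;            /* host byte offset */
    nb_csectors = ((l2_entry >> CSIZE_SHIFT) & CSIZE_MASK) + 1;
    sector_offset = *coffset & 511;                       /* offset in first sector */
    *csize = nb_csectors * 512 - sector_offset;           /* compressed byte count */
}
```

The patch's `max_read` extends only the number of sectors read from `coffset >> 9`; the decoding itself stays as above.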
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 11f9c50..367f089 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1321,7 +1321,7 @@ static int decompress_buffer(uint8_t *out_buf, int out_buf_size,
 int qcow2_decompress_cluster(BlockDriverState *bs, uint64_t cluster_offset)
 {
     BDRVQcowState *s = bs->opaque;
-    int ret, csize, nb_csectors, sector_offset;
+    int ret, csize, nb_csectors, sector_offset, max_read;
     uint64_t coffset;
 
     coffset = cluster_offset & s->cluster_offset_mask;
@@ -1329,9 +1329,32 @@ int qcow2_decompress_cluster(BlockDriverState *bs, uint64_t cluster_offset)
         nb_csectors = ((cluster_offset >> s->csize_shift) & s->csize_mask) + 1;
         sector_offset = coffset & 511;
         csize = nb_csectors * 512 - sector_offset;
+        max_read = MIN((bs->file->total_sectors - (coffset >> 9)), 2 * s->cluster_sectors);
         BLKDBG_EVENT(bs->file, BLKDBG_READ_COMPRESSED);
-        ret = bdrv_read(bs->file, coffset >> 9, s->cluster_data, nb_csectors);
+        if (s->cluster_cache_offset != -1 && coffset > s->cluster_cache_offset &&
+            (coffset >> 9) < (s->cluster_cache_offset >> 9) + s->cluster_data_sectors) {
+            int cached_sectors = s->cluster_data_sectors - ((coffset >> 9) -
+                                 (s->cluster_cache_offset >> 9));
+            memmove(s->cluster_data,
+                    s->cluster_data + (s->cluster_data_sectors - cached_sectors) * 512,
+                    cached_sectors * 512);
+            s->cluster_data_sectors = cached_sectors;
+            if (nb_csectors > cached_sectors) {
+                /* some sectors are missing; read them and fill up to max_read sectors */
+                ret = bdrv_read(bs->file, (coffset >> 9) + cached_sectors,
+                                s->cluster_data + cached_sectors * 512,
+                                max_read);
+                s->cluster_data_sectors = cached_sectors + max_read;
+            } else {
+                /* all relevant sectors are in the cache */
+                ret = 0;
+            }
+        } else {
+            ret = bdrv_read(bs->file, coffset >> 9, s->cluster_data, max_read);
+            s->cluster_data_sectors = max_read;
+        }
         if (ret < 0) {
+            s->cluster_data_sectors = 0;
             return ret;
         }
         if (decompress_buffer(s->cluster_cache, s->cluster_size,
diff --git a/block/qcow2.h b/block/qcow2.h
index 922e190..5edad26
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -185,6 +185,7 @@ typedef struct BDRVQcowState {
 
     uint8_t *cluster_cache;
     uint8_t *cluster_data;
+    int cluster_data_sectors;
     uint64_t cluster_cache_offset;
 
     QLIST_HEAD(QCowClusterAlloc, QCowL2Meta) cluster_allocs;
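The cache-reuse branch of the patch shifts the still-valid tail of the previous read to the front of s->cluster_data via memmove(), so only the missing sectors would have to be re-read. A self-contained model of that sector arithmetic (plain parameters instead of BDRVQcowState, sector numbers instead of `coffset >> 9`; purely illustrative, not QEMU code) might look like:

```c
#include <assert.h>
#include <string.h>

/* If the new compressed cluster starts inside the previously read window
 * [prev_start, prev_start + prev_sectors), move the overlapping tail to
 * the front of buf and report how many sectors are already available.
 * Returns 0 when there is no forward overlap and a full read is needed. */
static int reuse_cached_sectors(int prev_start, int prev_sectors,
                                int new_start,
                                unsigned char *buf, int sector_size)
{
    if (new_start <= prev_start || new_start >= prev_start + prev_sectors) {
        return 0;                           /* no overlap: full read needed */
    }
    int cached = prev_sectors - (new_start - prev_start);
    memmove(buf, buf + (prev_sectors - cached) * sector_size,
            cached * sector_size);          /* slide tail to buffer start */
    return cached;                          /* sectors already in the buffer */
}
```

With a previous 128-sector read starting at sector 100 and a new cluster starting at sector 110, 118 sectors are reused and only the remainder would need a fresh bdrv_read(); this mirrors the `cached_sectors` computation in the hunk above.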
while evaluating compressed qcow2 images as a good basis for virtual machine templates I found out that there are a lot of partly redundant (compressed clusters have common physical sectors) and relatively short reads.

This doesn't hurt if the image resides on a local filesystem where we can benefit from the local page cache, but it adds a lot of penalty when accessing remote images on NFS or similar exports.

This patch effectively implements a readahead of 2 * cluster_size, which is 2 * 64kB by default, resulting in 128kB readahead. This is the common setting for Linux, for instance.

For example, this leads to the following times when converting a compressed qcow2 image to a local tmpfs partition.

Old:
time ./qemu-img convert nfs://10.0.0.1/export/VC-Ubuntu-LTS-12.04.2-64bit.qcow2 /tmp/test.raw
real 0m24.681s
user 0m8.597s
sys 0m4.084s

New:
time ./qemu-img convert nfs://10.0.0.1/export/VC-Ubuntu-LTS-12.04.2-64bit.qcow2 /tmp/test.raw
real 0m16.121s
user 0m7.932s
sys 0m2.244s

Signed-off-by: Peter Lieven <pl@kamp.de>
---
 block/qcow2-cluster.c | 27 +++++++++++++++++++++++++--
 block/qcow2.h         |  1 +
 2 files changed, 26 insertions(+), 2 deletions(-)