Message ID | 9b9c36f8c567da1f9561828380e8aa42c08b7efd.1430388393.git.berto@igalia.com
---|---
State | New
On 04/30/2015 04:11 AM, Alberto Garcia wrote:
> The qcow2 L2/refcount cache contains one separate table for each cache
> entry. Doing one allocation per table adds unnecessary overhead and it
> also requires us to store the address of each table separately.
>
> Since the size of the cache is constant during its lifetime, it's
> better to have an array that contains all the tables using one single
> allocation.
>
> In my tests measuring freshly created caches with sizes 128MB (L2) and
> 32MB (refcount) this uses around 10MB of RAM less.
>
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>  block/qcow2-cache.c | 48 +++++++++++++++++++++---------------------------
>  1 file changed, 21 insertions(+), 27 deletions(-)
>
>  typedef struct Qcow2CachedTable {
> -    void* table;
>      int64_t offset;
>      bool dirty;
>      int cache_hits;
> @@ -40,39 +39,34 @@ struct Qcow2Cache {
>      struct Qcow2Cache* depends;
>      int size;
>      bool depends_on_flush;
> +    void *table_array;
> +    int table_size;

Should this be size_t? [1]

> };
>
> +static inline void *table_addr(Qcow2Cache *c, int table)
> +{
> +    return c->table_array + table * c->table_size;

Addition on void* is not portable.

> +}
> +
>  Qcow2Cache *qcow2_cache_create(BlockDriverState *bs, int num_tables)
>  {
>      BDRVQcowState *s = bs->opaque;
>      Qcow2Cache *c;
> -    int i;
>
>      c = g_new0(Qcow2Cache, 1);
>      c->size = num_tables;
> +    c->table_size = s->cluster_size;

[1] Oh, maybe not, since BDRVQcowState declares cluster_size as int.

>      c->entries = g_try_new0(Qcow2CachedTable, num_tables);
> -    if (!c->entries) {
> -        goto fail;
> -    }
> +    c->table_array = qemu_try_blockalign(bs->file, num_tables * c->table_size);

Are we sure this won't overflow?
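For reference, the portable fix for the void* arithmetic is to cast through a character pointer first. A minimal sketch, with the struct trimmed to the two fields the helper touches (the real Qcow2Cache has more members):

    #include <stdint.h>
    #include <stddef.h>

    typedef struct Qcow2Cache {
        void *table_array;   /* single allocation holding all tables */
        int table_size;      /* bytes per cached table (cluster size) */
        /* ... remaining fields from the patch elided ... */
    } Qcow2Cache;

    /* Arithmetic on void* is a GCC extension, so cast through uint8_t*.
     * The (size_t) cast also keeps the multiplication out of int range. */
    static inline void *table_addr(Qcow2Cache *c, int table)
    {
        return (uint8_t *)c->table_array + (size_t)table * c->table_size;
    }

QEMU builds with GCC and Clang, where void* arithmetic works as an extension, but the cast costs nothing and keeps stricter compilers happy.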
On Thu, Apr 30, 2015 at 01:11:40PM +0300, Alberto Garcia wrote:
>  Qcow2Cache *qcow2_cache_create(BlockDriverState *bs, int num_tables)
>  {
>      BDRVQcowState *s = bs->opaque;
>      Qcow2Cache *c;
> -    int i;
>
>      c = g_new0(Qcow2Cache, 1);
>      c->size = num_tables;
> +    c->table_size = s->cluster_size;

This assumes c->table_size meets bs' memory alignment requirements. The
following would be safer:

  c->table_size = QEMU_ALIGN_UP(c->table_size, bdrv_opt_mem_align(bs->file));
Am 01.05.2015 um 16:23 hat Stefan Hajnoczi geschrieben:
> On Thu, Apr 30, 2015 at 01:11:40PM +0300, Alberto Garcia wrote:
> >  Qcow2Cache *qcow2_cache_create(BlockDriverState *bs, int num_tables)
> >  {
> >      BDRVQcowState *s = bs->opaque;
> >      Qcow2Cache *c;
> > -    int i;
> >
> >      c = g_new0(Qcow2Cache, 1);
> >      c->size = num_tables;
> > +    c->table_size = s->cluster_size;
>
> This assumes c->table_size meets bs' memory alignment requirements. The
> following would be safer:
>
>   c->table_size = QEMU_ALIGN_UP(c->table_size, bdrv_opt_mem_align(bs->file));

You can't just access more than one cluster. You might be caching data
and later write it back when it's stale.

If you can't do I/O in chunks as small as the cluster size (which is
rather unlikely), relying on bdrv_pwrite(), like Berto's patch does, is
the correct solution.

Kevin
On Mon, May 04, 2015 at 12:58:13PM +0200, Kevin Wolf wrote:
> Am 01.05.2015 um 16:23 hat Stefan Hajnoczi geschrieben:
> > On Thu, Apr 30, 2015 at 01:11:40PM +0300, Alberto Garcia wrote:
> > >  Qcow2Cache *qcow2_cache_create(BlockDriverState *bs, int num_tables)
> > >  {
> > >      BDRVQcowState *s = bs->opaque;
> > >      Qcow2Cache *c;
> > > -    int i;
> > >
> > >      c = g_new0(Qcow2Cache, 1);
> > >      c->size = num_tables;
> > > +    c->table_size = s->cluster_size;
> >
> > This assumes c->table_size meets bs' memory alignment requirements. The
> > following would be safer:
> >
> >   c->table_size = QEMU_ALIGN_UP(c->table_size, bdrv_opt_mem_align(bs->file));
>
> You can't just access more than one cluster. You might be caching data
> and later write it back when it's stale.

I don't mean I/O alignment, just memory alignment (i.e. the start
address of the data buffer in memory).

Stefan
Am 05.05.2015 um 12:28 hat Stefan Hajnoczi geschrieben:
> On Mon, May 04, 2015 at 12:58:13PM +0200, Kevin Wolf wrote:
> > Am 01.05.2015 um 16:23 hat Stefan Hajnoczi geschrieben:
> > > On Thu, Apr 30, 2015 at 01:11:40PM +0300, Alberto Garcia wrote:
> > > >  Qcow2Cache *qcow2_cache_create(BlockDriverState *bs, int num_tables)
> > > >  {
> > > >      BDRVQcowState *s = bs->opaque;
> > > >      Qcow2Cache *c;
> > > > -    int i;
> > > >
> > > >      c = g_new0(Qcow2Cache, 1);
> > > >      c->size = num_tables;
> > > > +    c->table_size = s->cluster_size;
> > >
> > > This assumes c->table_size meets bs' memory alignment requirements. The
> > > following would be safer:
> > >
> > >   c->table_size = QEMU_ALIGN_UP(c->table_size, bdrv_opt_mem_align(bs->file));
> >
> > You can't just access more than one cluster. You might be caching data
> > and later write it back when it's stale.
>
> I don't mean I/O alignment, just memory alignment (i.e. the start
> address of the data buffer in memory).

The start address is already taken care of by qemu_blockalign(). With
rounding c->table_size, you'd align the length, and that would be wrong.

Though looking at the code again I see now that c->table_size isn't
consistently used. The I/O requests still use s->cluster_size. We
should either use it everywhere or not introduce it at all.

Kevin
On Thu 30 Apr 2015 05:08:05 PM CEST, Eric Blake <eblake@redhat.com> wrote:

>>  typedef struct Qcow2CachedTable {
>> -    void* table;
>>      int64_t offset;
>>      bool dirty;
>>      int cache_hits;
>> @@ -40,39 +39,34 @@ struct Qcow2Cache {
>>      struct Qcow2Cache* depends;
>>      int size;
>>      bool depends_on_flush;
>> +    void *table_array;
>> +    int table_size;
>
> Should this be size_t? [1]

The maximum supported table size is 2MB (MAX_CLUSTER_BITS == 21).

>>      c->entries = g_try_new0(Qcow2CachedTable, num_tables);
>> -    if (!c->entries) {
>> -        goto fail;
>> -    }
>> +    c->table_array = qemu_try_blockalign(bs->file, num_tables * c->table_size);
>
> Are we sure this won't overflow?

That's a good catch. I ran some numbers and I doubt that scenario would
happen in practice, but I think it's possible, so I'll fix it.

Berto
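One possible shape for that fix (a sketch only, not necessarily what the next revision will do; alloc_table_array is a hypothetical helper name, and the QEMU block-layer declarations are assumed): check the product in size_t before multiplying, so the int-by-int multiplication can never overflow.

    #include <stdint.h>
    #include <stddef.h>
    /* Assumes BlockDriverState and qemu_try_blockalign() from the QEMU
     * tree ("block/block_int.h"). */

    /* table_size is always > 0 (cluster size is at least 512 bytes) and
     * capped at 2MB by MAX_CLUSTER_BITS == 21, so the guard mostly
     * matters for very large num_tables or for 32-bit size_t. */
    static void *alloc_table_array(BlockDriverState *bs, int num_tables,
                                   int table_size)
    {
        if ((size_t)num_tables > SIZE_MAX / (size_t)table_size) {
            return NULL;           /* product would overflow size_t */
        }
        return qemu_try_blockalign(bs->file,
                                   (size_t)num_tables * (size_t)table_size);
    }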
On Tue 05 May 2015 01:20:19 PM CEST, Kevin Wolf wrote:

> Though looking at the code again I see now that c->table_size isn't
> consistently used. The I/O requests still use s->cluster_size. We
> should either use it everywhere or not introduce it at all.

c->table_size is necessary in order to calculate the offset of a
particular table, because s->cluster_size is not always available
(e.g. qcow2_cache_entry_mark_dirty()). We could make it a requirement
of the API, though.

Alternatively we could have c->bs instead of c->table_size. That would
spare us from passing the BlockDriverState pointer to all qcow2_cache_
functions. The assumption would be that the BlockDriverState pointer
stays the same for the whole lifetime of the cache, but is that always
guaranteed?

Berto
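A rough sketch of that alternative, under the assumption that the pointer really is stable (qcow2_cache_table_size is a hypothetical name, and the surrounding QEMU definitions such as Qcow2CachedTable and BDRVQcowState are assumed):

    #include <stdint.h>
    #include <stddef.h>

    /* The cache remembers its BlockDriverState, so the qcow2_cache_*
     * entry points no longer need a bs argument and the table size can
     * always be derived from it. */
    struct Qcow2Cache {
        BlockDriverState *bs;   /* assumed constant for the cache's lifetime */
        Qcow2CachedTable *entries;
        void *table_array;
        int size;
        /* ... */
    };

    static inline int qcow2_cache_table_size(Qcow2Cache *c)
    {
        BDRVQcowState *s = c->bs->opaque;
        return s->cluster_size;
    }

    static inline void *table_addr(Qcow2Cache *c, int table)
    {
        return (uint8_t *)c->table_array +
               (size_t)table * qcow2_cache_table_size(c);
    }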
On Tue, May 05, 2015 at 01:20:19PM +0200, Kevin Wolf wrote:
> Am 05.05.2015 um 12:28 hat Stefan Hajnoczi geschrieben:
> > On Mon, May 04, 2015 at 12:58:13PM +0200, Kevin Wolf wrote:
> > > Am 01.05.2015 um 16:23 hat Stefan Hajnoczi geschrieben:
> > > > On Thu, Apr 30, 2015 at 01:11:40PM +0300, Alberto Garcia wrote:
> > > > >  Qcow2Cache *qcow2_cache_create(BlockDriverState *bs, int num_tables)
> > > > >  {
> > > > >      BDRVQcowState *s = bs->opaque;
> > > > >      Qcow2Cache *c;
> > > > > -    int i;
> > > > >
> > > > >      c = g_new0(Qcow2Cache, 1);
> > > > >      c->size = num_tables;
> > > > > +    c->table_size = s->cluster_size;
> > > >
> > > > This assumes c->table_size meets bs' memory alignment requirements. The
> > > > following would be safer:
> > > >
> > > >   c->table_size = QEMU_ALIGN_UP(c->table_size, bdrv_opt_mem_align(bs->file));
> > >
> > > You can't just access more than one cluster. You might be caching data
> > > and later write it back when it's stale.
> >
> > I don't mean I/O alignment, just memory alignment (i.e. the start
> > address of the data buffer in memory).
>
> The start address is already taken care of by qemu_blockalign(). With
> rounding c->table_size, you'd align the length, and that would be wrong.

It wasn't clear to me that c->table_array + n * c->table_size is aligned
for all n, but I agree with you now: bdrv_qiov_is_aligned() checks both
address and size against memory alignment. This means that if the I/O
is memory aligned for table_size, then all multiples of table_size are
also aligned.

Stefan
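The argument is easy to check mechanically. A standalone sketch (plain C, not QEMU code) of the invariant Stefan describes:

    #include <assert.h>
    #include <stdint.h>
    #include <stddef.h>

    /* If the base address and the table size are both multiples of the
     * required alignment, every table in the array starts on an aligned
     * address too: (base + n * table_size) % align == 0 for all n. */
    static void check_tables_stay_aligned(uintptr_t base, size_t table_size,
                                          size_t align, int num_tables)
    {
        assert(base % align == 0);
        assert(table_size % align == 0);
        for (int n = 0; n < num_tables; n++) {
            assert((base + (size_t)n * table_size) % align == 0);
        }
    }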
diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c
index b115549..586880b 100644
--- a/block/qcow2-cache.c
+++ b/block/qcow2-cache.c
@@ -28,7 +28,6 @@
 #include "trace.h"
 
 typedef struct Qcow2CachedTable {
-    void* table;
     int64_t offset;
     bool dirty;
     int cache_hits;
@@ -40,39 +39,34 @@ struct Qcow2Cache {
     struct Qcow2Cache* depends;
     int size;
     bool depends_on_flush;
+    void *table_array;
+    int table_size;
 };
 
+static inline void *table_addr(Qcow2Cache *c, int table)
+{
+    return c->table_array + table * c->table_size;
+}
+
 Qcow2Cache *qcow2_cache_create(BlockDriverState *bs, int num_tables)
 {
     BDRVQcowState *s = bs->opaque;
     Qcow2Cache *c;
-    int i;
 
     c = g_new0(Qcow2Cache, 1);
     c->size = num_tables;
+    c->table_size = s->cluster_size;
     c->entries = g_try_new0(Qcow2CachedTable, num_tables);
-    if (!c->entries) {
-        goto fail;
-    }
+    c->table_array = qemu_try_blockalign(bs->file, num_tables * c->table_size);
 
-    for (i = 0; i < c->size; i++) {
-        c->entries[i].table = qemu_try_blockalign(bs->file, s->cluster_size);
-        if (c->entries[i].table == NULL) {
-            goto fail;
-        }
+    if (!c->entries || !c->table_array) {
+        qemu_vfree(c->table_array);
+        g_free(c->entries);
+        g_free(c);
+        c = NULL;
     }
 
     return c;
-
-fail:
-    if (c->entries) {
-        for (i = 0; i < c->size; i++) {
-            qemu_vfree(c->entries[i].table);
-        }
-    }
-    g_free(c->entries);
-    g_free(c);
-    return NULL;
 }
 
 int qcow2_cache_destroy(BlockDriverState* bs, Qcow2Cache *c)
@@ -81,9 +75,9 @@ int qcow2_cache_destroy(BlockDriverState* bs, Qcow2Cache *c)
 
     for (i = 0; i < c->size; i++) {
         assert(c->entries[i].ref == 0);
-        qemu_vfree(c->entries[i].table);
     }
 
+    qemu_vfree(c->table_array);
     g_free(c->entries);
     g_free(c);
 
@@ -151,8 +145,8 @@ static int qcow2_cache_entry_flush(BlockDriverState *bs, Qcow2Cache *c, int i)
         BLKDBG_EVENT(bs->file, BLKDBG_L2_UPDATE);
     }
 
-    ret = bdrv_pwrite(bs->file, c->entries[i].offset, c->entries[i].table,
-                      s->cluster_size);
+    ret = bdrv_pwrite(bs->file, c->entries[i].offset, table_addr(c, i),
+                      s->cluster_size);
     if (ret < 0) {
         return ret;
     }
@@ -304,7 +298,7 @@ static int qcow2_cache_do_get(BlockDriverState *bs, Qcow2Cache *c,
         BLKDBG_EVENT(bs->file, BLKDBG_L2_LOAD);
     }
 
-    ret = bdrv_pread(bs->file, offset, c->entries[i].table, s->cluster_size);
+    ret = bdrv_pread(bs->file, offset, table_addr(c, i), s->cluster_size);
     if (ret < 0) {
         return ret;
     }
@@ -319,7 +313,7 @@ static int qcow2_cache_do_get(BlockDriverState *bs, Qcow2Cache *c,
 found:
     c->entries[i].cache_hits++;
     c->entries[i].ref++;
-    *table = c->entries[i].table;
+    *table = table_addr(c, i);
 
     trace_qcow2_cache_get_done(qemu_coroutine_self(),
                                c == s->l2_table_cache, i);
@@ -344,7 +338,7 @@ int qcow2_cache_put(BlockDriverState *bs, Qcow2Cache *c, void **table)
     int i;
 
     for (i = 0; i < c->size; i++) {
-        if (c->entries[i].table == *table) {
+        if (table_addr(c, i) == *table) {
             goto found;
         }
     }
@@ -363,7 +357,7 @@ void qcow2_cache_entry_mark_dirty(Qcow2Cache *c, void *table)
     int i;
 
     for (i = 0; i < c->size; i++) {
-        if (c->entries[i].table == table) {
+        if (table_addr(c, i) == table) {
             goto found;
         }
     }
The qcow2 L2/refcount cache contains one separate table for each cache
entry. Doing one allocation per table adds unnecessary overhead and it
also requires us to store the address of each table separately.

Since the size of the cache is constant during its lifetime, it's
better to have an array that contains all the tables using one single
allocation.

In my tests measuring freshly created caches with sizes 128MB (L2) and
32MB (refcount) this uses around 10MB of RAM less.

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2-cache.c | 48 +++++++++++++++++++++---------------------------
 1 file changed, 21 insertions(+), 27 deletions(-)