[1/6] qcow2: use one single memory block for the L2/refcount cache tables

Message ID 9b9c36f8c567da1f9561828380e8aa42c08b7efd.1430388393.git.berto@igalia.com
State New

Commit Message

Alberto Garcia April 30, 2015, 10:11 a.m. UTC
The qcow2 L2/refcount cache contains one separate table for each cache
entry. Doing one allocation per table adds unnecessary overhead and it
also requires us to store the address of each table separately.

Since the size of the cache is constant during its lifetime, it's
better to have an array that contains all the tables using one single
allocation.

In my tests, measuring freshly created caches of 128MB (L2) and 32MB
(refcount), this saves around 10MB of RAM.

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2-cache.c | 48 +++++++++++++++++++++---------------------------
 1 file changed, 21 insertions(+), 27 deletions(-)

Comments

Eric Blake April 30, 2015, 3:08 p.m. UTC | #1
On 04/30/2015 04:11 AM, Alberto Garcia wrote:
> The qcow2 L2/refcount cache contains one separate table for each cache
> entry. Doing one allocation per table adds unnecessary overhead and it
> also requires us to store the address of each table separately.
> 
> Since the size of the cache is constant during its lifetime, it's
> better to have an array that contains all the tables using one single
> allocation.
> 
> In my tests measuring freshly created caches with sizes 128MB (L2) and
> 32MB (refcount) this uses around 10MB of RAM less.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>  block/qcow2-cache.c | 48 +++++++++++++++++++++---------------------------
>  1 file changed, 21 insertions(+), 27 deletions(-)
> 

>  
>  typedef struct Qcow2CachedTable {
> -    void*   table;
>      int64_t offset;
>      bool    dirty;
>      int     cache_hits;
> @@ -40,39 +39,34 @@ struct Qcow2Cache {
>      struct Qcow2Cache*      depends;
>      int                     size;
>      bool                    depends_on_flush;
> +    void                   *table_array;
> +    int                     table_size;

Should this be size_t? [1]

>  };
>  
> +static inline void *table_addr(Qcow2Cache *c, int table)
> +{
> +    return c->table_array + table * c->table_size;

Addition on void* is not portable.

> +}
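A portable variant of the helper would cast to uint8_t * before the arithmetic (a sketch with a hypothetical stub `Cache` struct standing in for Qcow2Cache, not the committed code):

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Minimal stand-in for Qcow2Cache (hypothetical; just the two fields
 * the helper touches). */
typedef struct {
    void *table_array;
    int   table_size;
} Cache;

/* Cast to uint8_t * before the arithmetic: pointer arithmetic on
 * void * is a GCC extension, not standard C. */
static inline void *table_addr(Cache *c, int table)
{
    return (uint8_t *) c->table_array + (size_t) table * c->table_size;
}
```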
> +
>  Qcow2Cache *qcow2_cache_create(BlockDriverState *bs, int num_tables)
>  {
>      BDRVQcowState *s = bs->opaque;
>      Qcow2Cache *c;
> -    int i;
>  
>      c = g_new0(Qcow2Cache, 1);
>      c->size = num_tables;
> +    c->table_size = s->cluster_size;

[1] Oh, maybe not, since BDRVQcowState declares cluster_size as int.

>      c->entries = g_try_new0(Qcow2CachedTable, num_tables);
> -    if (!c->entries) {
> -        goto fail;
> -    }
> +    c->table_array = qemu_try_blockalign(bs->file, num_tables * c->table_size);

Are we sure this won't overflow?
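Both factors are int, so the product can wrap before it is widened to the allocator's size type. One defensive shape for the check (a sketch with a hypothetical helper name, not the actual fix):

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Reject the multiplication instead of letting it overflow int:
 * returns -1 on overflow or non-positive inputs, 0 on success. */
static int cache_alloc_len(int num_tables, int table_size, size_t *len)
{
    if (num_tables <= 0 || table_size <= 0 ||
        num_tables > INT_MAX / table_size) {
        return -1;
    }
    *len = (size_t) num_tables * (size_t) table_size;
    return 0;
}
```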
Stefan Hajnoczi May 1, 2015, 2:23 p.m. UTC | #2
On Thu, Apr 30, 2015 at 01:11:40PM +0300, Alberto Garcia wrote:
>  Qcow2Cache *qcow2_cache_create(BlockDriverState *bs, int num_tables)
>  {
>      BDRVQcowState *s = bs->opaque;
>      Qcow2Cache *c;
> -    int i;
>  
>      c = g_new0(Qcow2Cache, 1);
>      c->size = num_tables;
> +    c->table_size = s->cluster_size;

This assumes c->table_size meets bs' memory alignment requirements.  The
following would be safer:

c->table_size = QEMU_ALIGN_UP(c->table_size, bdrv_opt_mem_align(bs->file));
Kevin Wolf May 4, 2015, 10:58 a.m. UTC | #3
Am 01.05.2015 um 16:23 hat Stefan Hajnoczi geschrieben:
> On Thu, Apr 30, 2015 at 01:11:40PM +0300, Alberto Garcia wrote:
> >  Qcow2Cache *qcow2_cache_create(BlockDriverState *bs, int num_tables)
> >  {
> >      BDRVQcowState *s = bs->opaque;
> >      Qcow2Cache *c;
> > -    int i;
> >  
> >      c = g_new0(Qcow2Cache, 1);
> >      c->size = num_tables;
> > +    c->table_size = s->cluster_size;
> 
> This assumes c->table_size meets bs' memory alignment requirements.  The
> following would be safer:
> 
> c->table_size = QEMU_ALIGN_UP(c->table_size, bdrv_opt_mem_align(bs->file));

You can't just access more than one cluster: you might cache the extra
data and later write it back when it's stale.

If you can't do I/O in chunks as small as the cluster size (which is
rather unlikely), relying on bdrv_pwrite(), like Berto's patch does, is
the correct solution.

Kevin
Stefan Hajnoczi May 5, 2015, 10:28 a.m. UTC | #4
On Mon, May 04, 2015 at 12:58:13PM +0200, Kevin Wolf wrote:
> Am 01.05.2015 um 16:23 hat Stefan Hajnoczi geschrieben:
> > On Thu, Apr 30, 2015 at 01:11:40PM +0300, Alberto Garcia wrote:
> > >  Qcow2Cache *qcow2_cache_create(BlockDriverState *bs, int num_tables)
> > >  {
> > >      BDRVQcowState *s = bs->opaque;
> > >      Qcow2Cache *c;
> > > -    int i;
> > >  
> > >      c = g_new0(Qcow2Cache, 1);
> > >      c->size = num_tables;
> > > +    c->table_size = s->cluster_size;
> > 
> > This assumes c->table_size meets bs' memory alignment requirements.  The
> > following would be safer:
> > 
> > c->table_size = QEMU_ALIGN_UP(c->table_size, bdrv_opt_mem_align(bs->file));
> 
> You can't just access more than one cluster. You might be caching data
> and later write it back when it's stale.

I don't mean I/O alignment, just memory alignment (i.e. the start
address of the data buffer in memory).

Stefan
Kevin Wolf May 5, 2015, 11:20 a.m. UTC | #5
Am 05.05.2015 um 12:28 hat Stefan Hajnoczi geschrieben:
> On Mon, May 04, 2015 at 12:58:13PM +0200, Kevin Wolf wrote:
> > Am 01.05.2015 um 16:23 hat Stefan Hajnoczi geschrieben:
> > > On Thu, Apr 30, 2015 at 01:11:40PM +0300, Alberto Garcia wrote:
> > > >  Qcow2Cache *qcow2_cache_create(BlockDriverState *bs, int num_tables)
> > > >  {
> > > >      BDRVQcowState *s = bs->opaque;
> > > >      Qcow2Cache *c;
> > > > -    int i;
> > > >  
> > > >      c = g_new0(Qcow2Cache, 1);
> > > >      c->size = num_tables;
> > > > +    c->table_size = s->cluster_size;
> > > 
> > > This assumes c->table_size meets bs' memory alignment requirements.  The
> > > following would be safer:
> > > 
> > > c->table_size = QEMU_ALIGN_UP(c->table_size, bdrv_opt_mem_align(bs->file));
> > 
> > You can't just access more than one cluster. You might be caching data
> > and later write it back when it's stale.
> 
> I don't mean I/O alignment, just memory alignment (i.e. the start
> address of the data buffer in memory).

The start address is already taken care of by qemu_blockalign(). With
rounding c->table_size, you'd align the length, and that would be wrong.

Though looking at the code again I see now that c->table_size isn't
consistently used. The I/O requests still use s->cluster_size. We should
either use it everywhere or not introduce it at all.

Kevin
Alberto Garcia May 5, 2015, 1 p.m. UTC | #6
On Thu 30 Apr 2015 05:08:05 PM CEST, Eric Blake <eblake@redhat.com> wrote:

>>  typedef struct Qcow2CachedTable {
>> -    void*   table;
>>      int64_t offset;
>>      bool    dirty;
>>      int     cache_hits;
>> @@ -40,39 +39,34 @@ struct Qcow2Cache {
>>      struct Qcow2Cache*      depends;
>>      int                     size;
>>      bool                    depends_on_flush;
>> +    void                   *table_array;
>> +    int                     table_size;
>
> Should this be size_t? [1]

The maximum supported table size is 2MB (MAX_CLUSTER_BITS == 21).

>>      c->entries = g_try_new0(Qcow2CachedTable, num_tables);
>> -    if (!c->entries) {
>> -        goto fail;
>> -    }
>> +    c->table_array = qemu_try_blockalign(bs->file, num_tables * c->table_size);
>
> Are we sure this won't overflow?

That's a good catch. I ran some numbers and I doubt that scenario would
happen in practice, but it's possible, so I'll fix it.

Berto
Alberto Garcia May 5, 2015, 3:09 p.m. UTC | #7
On Tue 05 May 2015 01:20:19 PM CEST, Kevin Wolf wrote:

> Though looking at the code again I see now that c->table_size isn't
> consistently used. The I/O requests still use s->cluster_size. We
> should either use it everywhere or not introduce it at all.

c->table_size is necessary in order to calculate the offset of a
particular table, because s->cluster_size is not always available
(e.g. qcow2_cache_entry_mark_dirty()). We could make it a requirement of
the API, though.

Alternatively we could have c->bs instead of c->table_size. That would
spare us from passing the BlockDriverState pointer to all qcow2_cache_
functions.

The assumption would be that the BlockDriverState pointer stays the same
for the whole lifetime of the cache, but that's always guaranteed,
right?
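That alternative can be sketched with hypothetical stub types standing in for the QEMU structures: the cache keeps the bs back-pointer and derives the table size on demand instead of storing it.

```c
#include <assert.h>

/* Hypothetical stubs for the QEMU types involved; only the fields
 * this sketch needs. */
typedef struct { int cluster_size; } BDRVQcowStateStub;
typedef struct { BDRVQcowStateStub *opaque; } BlockDriverStateStub;

/* Alternative layout: keep a back-pointer to bs for the lifetime of
 * the cache, so the qcow2_cache_* functions would no longer need a
 * BlockDriverState parameter. */
typedef struct {
    BlockDriverStateStub *bs;
    int size;                 /* number of tables */
} CacheWithBs;

/* Table size is derived from bs when needed, not duplicated. */
static int cache_table_size(const CacheWithBs *c)
{
    return c->bs->opaque->cluster_size;
}
```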

Berto
Stefan Hajnoczi May 6, 2015, 2:57 p.m. UTC | #8
On Tue, May 05, 2015 at 01:20:19PM +0200, Kevin Wolf wrote:
> Am 05.05.2015 um 12:28 hat Stefan Hajnoczi geschrieben:
> > On Mon, May 04, 2015 at 12:58:13PM +0200, Kevin Wolf wrote:
> > > Am 01.05.2015 um 16:23 hat Stefan Hajnoczi geschrieben:
> > > > On Thu, Apr 30, 2015 at 01:11:40PM +0300, Alberto Garcia wrote:
> > > > >  Qcow2Cache *qcow2_cache_create(BlockDriverState *bs, int num_tables)
> > > > >  {
> > > > >      BDRVQcowState *s = bs->opaque;
> > > > >      Qcow2Cache *c;
> > > > > -    int i;
> > > > >  
> > > > >      c = g_new0(Qcow2Cache, 1);
> > > > >      c->size = num_tables;
> > > > > +    c->table_size = s->cluster_size;
> > > > 
> > > > This assumes c->table_size meets bs' memory alignment requirements.  The
> > > > following would be safer:
> > > > 
> > > > c->table_size = QEMU_ALIGN_UP(c->table_size, bdrv_opt_mem_align(bs->file));
> > > 
> > > You can't just access more than one cluster. You might be caching data
> > > and later write it back when it's stale.
> > 
> > I don't mean I/O alignment, just memory alignment (i.e. the start
> > address of the data buffer in memory).
> 
> The start address is already taken care of by qemu_blockalign(). With
> rounding c->table_size, you'd align the length, and that would be wrong.

It wasn't clear to me that c->table_array + n * c->table_size is
aligned for all n, but I agree with you now:

bdrv_qiov_is_aligned() checks both address and size against memory
alignment.  This means that if the I/O is memory aligned for table_size,
then all multiples of table_size are also aligned.

Stefan
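The closing argument boils down to simple arithmetic: if the base address is aligned and the entry size is a multiple of the alignment, every entry offset is aligned. A toy check (not QEMU code) of that invariant:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* If base % align == 0 and table_size % align == 0, then
 * (base + n * table_size) % align == 0 for every n, because a sum of
 * multiples of align is itself a multiple of align. */
static int entry_is_aligned(uintptr_t base, size_t table_size,
                            size_t align, size_t n)
{
    return (base + n * table_size) % align == 0;
}
```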

Patch

diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c
index b115549..586880b 100644
--- a/block/qcow2-cache.c
+++ b/block/qcow2-cache.c
@@ -28,7 +28,6 @@ 
 #include "trace.h"
 
 typedef struct Qcow2CachedTable {
-    void*   table;
     int64_t offset;
     bool    dirty;
     int     cache_hits;
@@ -40,39 +39,34 @@  struct Qcow2Cache {
     struct Qcow2Cache*      depends;
     int                     size;
     bool                    depends_on_flush;
+    void                   *table_array;
+    int                     table_size;
 };
 
+static inline void *table_addr(Qcow2Cache *c, int table)
+{
+    return c->table_array + table * c->table_size;
+}
+
 Qcow2Cache *qcow2_cache_create(BlockDriverState *bs, int num_tables)
 {
     BDRVQcowState *s = bs->opaque;
     Qcow2Cache *c;
-    int i;
 
     c = g_new0(Qcow2Cache, 1);
     c->size = num_tables;
+    c->table_size = s->cluster_size;
     c->entries = g_try_new0(Qcow2CachedTable, num_tables);
-    if (!c->entries) {
-        goto fail;
-    }
+    c->table_array = qemu_try_blockalign(bs->file, num_tables * c->table_size);
 
-    for (i = 0; i < c->size; i++) {
-        c->entries[i].table = qemu_try_blockalign(bs->file, s->cluster_size);
-        if (c->entries[i].table == NULL) {
-            goto fail;
-        }
+    if (!c->entries || !c->table_array) {
+        qemu_vfree(c->table_array);
+        g_free(c->entries);
+        g_free(c);
+        c = NULL;
     }
 
     return c;
-
-fail:
-    if (c->entries) {
-        for (i = 0; i < c->size; i++) {
-            qemu_vfree(c->entries[i].table);
-        }
-    }
-    g_free(c->entries);
-    g_free(c);
-    return NULL;
 }
 
 int qcow2_cache_destroy(BlockDriverState* bs, Qcow2Cache *c)
@@ -81,9 +75,9 @@  int qcow2_cache_destroy(BlockDriverState* bs, Qcow2Cache *c)
 
     for (i = 0; i < c->size; i++) {
         assert(c->entries[i].ref == 0);
-        qemu_vfree(c->entries[i].table);
     }
 
+    qemu_vfree(c->table_array);
     g_free(c->entries);
     g_free(c);
 
@@ -151,8 +145,8 @@  static int qcow2_cache_entry_flush(BlockDriverState *bs, Qcow2Cache *c, int i)
         BLKDBG_EVENT(bs->file, BLKDBG_L2_UPDATE);
     }
 
-    ret = bdrv_pwrite(bs->file, c->entries[i].offset, c->entries[i].table,
-        s->cluster_size);
+    ret = bdrv_pwrite(bs->file, c->entries[i].offset, table_addr(c, i),
+                      s->cluster_size);
     if (ret < 0) {
         return ret;
     }
@@ -304,7 +298,7 @@  static int qcow2_cache_do_get(BlockDriverState *bs, Qcow2Cache *c,
             BLKDBG_EVENT(bs->file, BLKDBG_L2_LOAD);
         }
 
-        ret = bdrv_pread(bs->file, offset, c->entries[i].table, s->cluster_size);
+        ret = bdrv_pread(bs->file, offset, table_addr(c, i), s->cluster_size);
         if (ret < 0) {
             return ret;
         }
@@ -319,7 +313,7 @@  static int qcow2_cache_do_get(BlockDriverState *bs, Qcow2Cache *c,
 found:
     c->entries[i].cache_hits++;
     c->entries[i].ref++;
-    *table = c->entries[i].table;
+    *table = table_addr(c, i);
 
     trace_qcow2_cache_get_done(qemu_coroutine_self(),
                                c == s->l2_table_cache, i);
@@ -344,7 +338,7 @@  int qcow2_cache_put(BlockDriverState *bs, Qcow2Cache *c, void **table)
     int i;
 
     for (i = 0; i < c->size; i++) {
-        if (c->entries[i].table == *table) {
+        if (table_addr(c, i) == *table) {
             goto found;
         }
     }
@@ -363,7 +357,7 @@  void qcow2_cache_entry_mark_dirty(Qcow2Cache *c, void *table)
     int i;
 
     for (i = 0; i < c->size; i++) {
-        if (c->entries[i].table == table) {
+        if (table_addr(c, i) == table) {
             goto found;
         }
     }