diff mbox

[2/3] qcow2: add option to clean unused cache entries after some time

Message ID 8007efe81120cd72f7c4145b8bbc3f4bc558e62d.1432719752.git.berto@igalia.com
State New
Headers show

Commit Message

Alberto Garcia May 27, 2015, 9:46 a.m. UTC
This adds a new 'cache-clean-interval' option that cleans all qcow2
cache entries that haven't been used in a certain interval, given in
seconds.

This allows setting a large L2 cache size so it can handle scenarios
with lots of I/O and at the same time use little memory during periods
of inactivity.

This feature currently relies on MADV_DONTNEED to free that memory, so
it is not useful in systems that don't follow that behavior.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Cc: Max Reitz <mreitz@redhat.com>
---
 block/qcow2-cache.c  | 35 ++++++++++++++++++++++++++++
 block/qcow2.c        | 66 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 block/qcow2.h        |  4 ++++
 qapi/block-core.json | 13 +++++++++--
 4 files changed, 116 insertions(+), 2 deletions(-)

Comments

Eric Blake May 27, 2015, 12:27 p.m. UTC | #1
On 05/27/2015 03:46 AM, Alberto Garcia wrote:
> This adds a new 'cache-clean-interval' option that cleans all qcow2
> cache entries that haven't been used in a certain interval, given in
> seconds.
> 
> This allows setting a large L2 cache size so it can handle scenarios
> with lots of I/O and at the same time use little memory during periods
> of inactivity.
> 
> This feature currently relies on MADV_DONTNEED to free that memory, so
> it is not useful in systems that don't follow that behavior.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> Cc: Max Reitz <mreitz@redhat.com>
> ---
>  block/qcow2-cache.c  | 35 ++++++++++++++++++++++++++++
>  block/qcow2.c        | 66 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>  block/qcow2.h        |  4 ++++
>  qapi/block-core.json | 13 +++++++++--
>  4 files changed, 116 insertions(+), 2 deletions(-)
> 

> +static void cache_clean_timer_init(BlockDriverState *bs, AioContext *context)
> +{
> +    BDRVQcowState *s = bs->opaque;
> +    if (s->cache_clean_interval > 0) {
> +        s->cache_clean_timer = aio_timer_new(context, QEMU_CLOCK_VIRTUAL,
> +                                             SCALE_MS, cache_clean_timer_cb,
> +                                             bs);
> +        timer_mod(s->cache_clean_timer, qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
> +                  (int64_t) s->cache_clean_interval * 1000);
> +    }
> +}
> +

This function sets up a timer for non-zero interval, but does nothing if
interval is zero. [1]

> @@ -839,6 +888,16 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
>          goto fail;
>      }
>  
> +    cache_clean_interval =
> +        qemu_opt_get_number(opts, QCOW2_OPT_CACHE_CLEAN_INTERVAL, 0);
> +    if (cache_clean_interval > UINT_MAX) {
> +        error_setg(errp, "Cache clean interval too big");
> +        ret = -EINVAL;
> +        goto fail;
> +    }

If you type the qapi code as 'uint32' rather than 'int', you could skip
the error checking here because the parser would have already clamped
things.  But I can live with this as-is.

> +    s->cache_clean_interval = cache_clean_interval;
> +    cache_clean_timer_init(bs, bdrv_get_aio_context(bs));

[1] But here, you are unconditionally calling init, whether the new
value is 0 or nonzero.  Can a block reopen ever cause an existing BDS to
change its interval, in which case I could create a BDS originally with
a timer, then reopen it without a timer, and init() would have to remove
the existing timer?  If I'm reading this patch correctly, right now the
interval is a write-once deal (no way to change it after the fact), so
your code is okay; but should a separate patch be added to allow
adjusting the interval, via a reopen operation?

Reviewed-by: Eric Blake <eblake@redhat.com>
Alberto Garcia May 27, 2015, 2:34 p.m. UTC | #2
On Wed 27 May 2015 02:27:35 PM CEST, Eric Blake <eblake@redhat.com> wrote:

>> +static void cache_clean_timer_init(BlockDriverState *bs, AioContext *context)
>
> This function sets up a timer for non-zero interval, but does nothing
> if interval is zero. [1]

Right, zero means there's no interval. I could clarify that in the
documentation (I did it in the ImageInfoSpecificQCow2 part, though).

>> +    if (cache_clean_interval > UINT_MAX) {
>> +        error_setg(errp, "Cache clean interval too big");
>> +        ret = -EINVAL;
>> +        goto fail;
>> +    }
>
> If you type the qapi code as 'uint32' rather than 'int', you could
> skip the error checking here because the parser would have already
> clamped things.  But I can live with this as-is.

Hmmm... the UINT_MAX limit is just a way to prevent an overflow because
of an abnormal value, not because the limit itself make sense (it's ~136
years, way beyond any reasonable value for that setting).

I'm not sure if it's a good idea to expose that in the API, it gives the
idea that there's a reason why it is a 32-bit integer, but there's none.

I would actually be ok with having a smaller limit (I don't know, 1
year?), but there's also no easy way to choose one I guess, so I decided
to let the user choose as long as it doesn't break anything.

>> +    s->cache_clean_interval = cache_clean_interval;
>> +    cache_clean_timer_init(bs, bdrv_get_aio_context(bs));
>
> [1] But here, you are unconditionally calling init, whether the new
> value is 0 or nonzero.

Instead of having the same check in all places where we call
cache_clean_timer_init() (there's two at the moment) I think it's enough
with checking the value in the function where it is actually used (which
is anyway a good idea).

> Can a block reopen ever cause an existing BDS to change its interval,
> in which case I could create a BDS originally with a timer, then
> reopen it without a timer, and init() would have to remove the
> existing timer?  If I'm reading this patch correctly, right now the
> interval is a write-once deal (no way to change it after the fact), so
> your code is okay; but should a separate patch be added to allow
> adjusting the interval, via a reopen operation?

I have no objections against changing the interval, I just didn't
consider that case.

If we want to support it it should be as simple as

 timer_del();   s->interval = new_interval;   timer_init();

Berto
Max Reitz May 28, 2015, 2:47 p.m. UTC | #3
On 27.05.2015 14:27, Eric Blake wrote:
> On 05/27/2015 03:46 AM, Alberto Garcia wrote:
>> This adds a new 'cache-clean-interval' option that cleans all qcow2
>> cache entries that haven't been used in a certain interval, given in
>> seconds.
>>
>> This allows setting a large L2 cache size so it can handle scenarios
>> with lots of I/O and at the same time use little memory during periods
>> of inactivity.
>>
>> This feature currently relies on MADV_DONTNEED to free that memory, so
>> it is not useful in systems that don't follow that behavior.
>>
>> Signed-off-by: Alberto Garcia <berto@igalia.com>
>> Cc: Max Reitz <mreitz@redhat.com>
>> ---
>>   block/qcow2-cache.c  | 35 ++++++++++++++++++++++++++++
>>   block/qcow2.c        | 66 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>>   block/qcow2.h        |  4 ++++
>>   qapi/block-core.json | 13 +++++++++--
>>   4 files changed, 116 insertions(+), 2 deletions(-)
>>
>> +static void cache_clean_timer_init(BlockDriverState *bs, AioContext *context)
>> +{
>> +    BDRVQcowState *s = bs->opaque;
>> +    if (s->cache_clean_interval > 0) {
>> +        s->cache_clean_timer = aio_timer_new(context, QEMU_CLOCK_VIRTUAL,
>> +                                             SCALE_MS, cache_clean_timer_cb,
>> +                                             bs);
>> +        timer_mod(s->cache_clean_timer, qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
>> +                  (int64_t) s->cache_clean_interval * 1000);
>> +    }
>> +}
>> +
> This function sets up a timer for non-zero interval, but does nothing if
> interval is zero. [1]
>
>> @@ -839,6 +888,16 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
>>           goto fail;
>>       }
>>   
>> +    cache_clean_interval =
>> +        qemu_opt_get_number(opts, QCOW2_OPT_CACHE_CLEAN_INTERVAL, 0);
>> +    if (cache_clean_interval > UINT_MAX) {
>> +        error_setg(errp, "Cache clean interval too big");
>> +        ret = -EINVAL;
>> +        goto fail;
>> +    }
> If you type the qapi code as 'uint32' rather than 'int', you could skip
> the error checking here because the parser would have already clamped
> things.  But I can live with this as-is.

Well, for blockdev-add, yes, but I don't think that applies when the 
option has been passed on the command line.

Max

>> +    s->cache_clean_interval = cache_clean_interval;
>> +    cache_clean_timer_init(bs, bdrv_get_aio_context(bs));
> [1] But here, you are unconditionally calling init, whether the new
> value is 0 or nonzero.  Can a block reopen ever cause an existing BDS to
> change its interval, in which case I could create a BDS originally with
> a timer, then reopen it without a timer, and init() would have to remove
> the existing timer?  If I'm reading this patch correctly, right now the
> interval is a write-once deal (no way to change it after the fact), so
> your code is okay; but should a separate patch be added to allow
> adjusting the interval, via a reopen operation?
>
> Reviewed-by: Eric Blake <eblake@redhat.com>
>
Max Reitz May 28, 2015, 2:56 p.m. UTC | #4
On 27.05.2015 11:46, Alberto Garcia wrote:
> This adds a new 'cache-clean-interval' option that cleans all qcow2
> cache entries that haven't been used in a certain interval, given in
> seconds.
>
> This allows setting a large L2 cache size so it can handle scenarios
> with lots of I/O and at the same time use little memory during periods
> of inactivity.
>
> This feature currently relies on MADV_DONTNEED to free that memory, so
> it is not useful in systems that don't follow that behavior.
>
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> Cc: Max Reitz <mreitz@redhat.com>
> ---
>   block/qcow2-cache.c  | 35 ++++++++++++++++++++++++++++
>   block/qcow2.c        | 66 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>   block/qcow2.h        |  4 ++++
>   qapi/block-core.json | 13 +++++++++--
>   4 files changed, 116 insertions(+), 2 deletions(-)
>
> diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c
> index ed14a92..a215f5b 100644
> --- a/block/qcow2-cache.c
> +++ b/block/qcow2-cache.c
> @@ -43,6 +43,7 @@ struct Qcow2Cache {
>       bool                    depends_on_flush;
>       void                   *table_array;
>       uint64_t                lru_counter;
> +    uint64_t                cache_clean_lru_counter;
>   };
>   
>   static inline void *qcow2_cache_get_table_addr(BlockDriverState *bs,
> @@ -78,6 +79,40 @@ static void qcow2_cache_table_release(BlockDriverState *bs, Qcow2Cache *c,
>   #endif
>   }
>   
> +static inline bool can_clean_entry(Qcow2Cache *c, int i)
> +{
> +    Qcow2CachedTable *t = &c->entries[i];
> +    return t->ref == 0 && !t->dirty && t->offset != 0 &&
> +        t->lru_counter <= c->cache_clean_lru_counter;
> +}
> +
> +void qcow2_cache_clean_unused(BlockDriverState *bs, Qcow2Cache *c)
> +{
> +    int i = 0;
> +    while (i < c->size) {
> +        int to_clean = 0;
> +
> +        /* Skip the entries that we don't need to clean */
> +        while (i < c->size && !can_clean_entry(c, i)) {
> +            i++;
> +        }
> +
> +        /* And count how many we can clean in a row */
> +        while (i < c->size && can_clean_entry(c, i)) {
> +            c->entries[i].offset = 0;
> +            c->entries[i].lru_counter = 0;
> +            i++;
> +            to_clean++;
> +        }
> +
> +        if (to_clean > 0) {
> +            qcow2_cache_table_release(bs, c, i - to_clean, to_clean);
> +        }
> +    }
> +
> +    c->cache_clean_lru_counter = c->lru_counter;
> +}
> +
>   Qcow2Cache *qcow2_cache_create(BlockDriverState *bs, int num_tables)
>   {
>       BDRVQcowState *s = bs->opaque;
> diff --git a/block/qcow2.c b/block/qcow2.c
> index f7b4cc6..abafcdf 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -468,6 +468,11 @@ static QemuOptsList qcow2_runtime_opts = {
>               .type = QEMU_OPT_SIZE,
>               .help = "Maximum refcount block cache size",
>           },
> +        {
> +            .name = QCOW2_OPT_CACHE_CLEAN_INTERVAL,
> +            .type = QEMU_OPT_NUMBER,
> +            .help = "Clean unused cache entries after this time (in seconds)",
> +        },
>           { /* end of list */ }
>       },
>   };
> @@ -483,6 +488,49 @@ static const char *overlap_bool_option_names[QCOW2_OL_MAX_BITNR] = {
>       [QCOW2_OL_INACTIVE_L2_BITNR]    = QCOW2_OPT_OVERLAP_INACTIVE_L2,
>   };
>   
> +static void cache_clean_timer_cb(void *opaque)
> +{
> +    BlockDriverState *bs = opaque;
> +    BDRVQcowState *s = bs->opaque;
> +    qcow2_cache_clean_unused(bs, s->l2_table_cache);
> +    qcow2_cache_clean_unused(bs, s->refcount_block_cache);
> +    timer_mod(s->cache_clean_timer, qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
> +              (int64_t) s->cache_clean_interval * 1000);
> +}
> +
> +static void cache_clean_timer_init(BlockDriverState *bs, AioContext *context)
> +{
> +    BDRVQcowState *s = bs->opaque;
> +    if (s->cache_clean_interval > 0) {
> +        s->cache_clean_timer = aio_timer_new(context, QEMU_CLOCK_VIRTUAL,
> +                                             SCALE_MS, cache_clean_timer_cb,
> +                                             bs);
> +        timer_mod(s->cache_clean_timer, qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
> +                  (int64_t) s->cache_clean_interval * 1000);
> +    }
> +}
> +
> +static void cache_clean_timer_del(BlockDriverState *bs)
> +{
> +    BDRVQcowState *s = bs->opaque;
> +    if (s->cache_clean_timer) {
> +        timer_del(s->cache_clean_timer);
> +        timer_free(s->cache_clean_timer);
> +        s->cache_clean_timer = NULL;
> +    }
> +}
> +
> +static void qcow2_detach_aio_context(BlockDriverState *bs)
> +{
> +    cache_clean_timer_del(bs);
> +}
> +
> +static void qcow2_attach_aio_context(BlockDriverState *bs,
> +                                     AioContext *new_context)
> +{
> +    cache_clean_timer_init(bs, new_context);
> +}
> +
>   static void read_cache_sizes(QemuOpts *opts, uint64_t *l2_cache_size,
>                                uint64_t *refcount_cache_size, Error **errp)
>   {
> @@ -552,6 +600,7 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
>       const char *opt_overlap_check, *opt_overlap_check_template;
>       int overlap_check_template = 0;
>       uint64_t l2_cache_size, refcount_cache_size;
> +    uint64_t cache_clean_interval;
>   
>       ret = bdrv_pread(bs->file, 0, &header, sizeof(header));
>       if (ret < 0) {
> @@ -839,6 +888,16 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
>           goto fail;
>       }
>   
> +    cache_clean_interval =
> +        qemu_opt_get_number(opts, QCOW2_OPT_CACHE_CLEAN_INTERVAL, 0);
> +    if (cache_clean_interval > UINT_MAX) {
> +        error_setg(errp, "Cache clean interval too big");
> +        ret = -EINVAL;
> +        goto fail;
> +    }
> +    s->cache_clean_interval = cache_clean_interval;
> +    cache_clean_timer_init(bs, bdrv_get_aio_context(bs));
> +
>       s->cluster_cache = g_malloc(s->cluster_size);
>       /* one more sector for decompressed data alignment */
>       s->cluster_data = qemu_try_blockalign(bs->file, QCOW_MAX_CRYPT_CLUSTERS
> @@ -1004,6 +1063,7 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
>       qemu_vfree(s->l1_table);
>       /* else pre-write overlap checks in cache_destroy may crash */
>       s->l1_table = NULL;
> +    cache_clean_timer_del(bs);
>       if (s->l2_table_cache) {
>           qcow2_cache_destroy(bs, s->l2_table_cache);
>       }
> @@ -1458,6 +1518,7 @@ static void qcow2_close(BlockDriverState *bs)
>           }
>       }
>   
> +    cache_clean_timer_del(bs);
>       qcow2_cache_destroy(bs, s->l2_table_cache);
>       qcow2_cache_destroy(bs, s->refcount_block_cache);
>   
> @@ -2552,6 +2613,8 @@ static ImageInfoSpecific *qcow2_get_specific_info(BlockDriverState *bs)
>           };
>       }
>   
> +    spec_info->qcow2->cache_clean_interval = s->cache_clean_interval;
> +
>       return spec_info;
>   }
>   
> @@ -2970,6 +3033,9 @@ BlockDriver bdrv_qcow2 = {
>       .create_opts         = &qcow2_create_opts,
>       .bdrv_check          = qcow2_check,
>       .bdrv_amend_options  = qcow2_amend_options,
> +
> +    .bdrv_detach_aio_context  = qcow2_detach_aio_context,
> +    .bdrv_attach_aio_context  = qcow2_attach_aio_context,
>   };
>   
>   static void bdrv_qcow2_init(void)
> diff --git a/block/qcow2.h b/block/qcow2.h
> index 0076512..2c45eb2 100644
> --- a/block/qcow2.h
> +++ b/block/qcow2.h
> @@ -93,6 +93,7 @@
>   #define QCOW2_OPT_CACHE_SIZE "cache-size"
>   #define QCOW2_OPT_L2_CACHE_SIZE "l2-cache-size"
>   #define QCOW2_OPT_REFCOUNT_CACHE_SIZE "refcount-cache-size"
> +#define QCOW2_OPT_CACHE_CLEAN_INTERVAL "cache-clean-interval"
>   
>   typedef struct QCowHeader {
>       uint32_t magic;
> @@ -236,6 +237,8 @@ typedef struct BDRVQcowState {
>   
>       Qcow2Cache* l2_table_cache;
>       Qcow2Cache* refcount_block_cache;
> +    QEMUTimer *cache_clean_timer;
> +    unsigned cache_clean_interval;
>   
>       uint8_t *cluster_cache;
>       uint8_t *cluster_data;
> @@ -581,6 +584,7 @@ int qcow2_cache_set_dependency(BlockDriverState *bs, Qcow2Cache *c,
>       Qcow2Cache *dependency);
>   void qcow2_cache_depends_on_flush(Qcow2Cache *c);
>   
> +void qcow2_cache_clean_unused(BlockDriverState *bs, Qcow2Cache *c);
>   int qcow2_cache_empty(BlockDriverState *bs, Qcow2Cache *c);
>   
>   int qcow2_cache_get(BlockDriverState *bs, Qcow2Cache *c, uint64_t offset,
> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index 863ffea..3a9b590 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -41,6 +41,10 @@
>   # @corrupt: #optional true if the image has been marked corrupt; only valid for
>   #           compat >= 1.1 (since 2.2)
>   #
> +# @cache-clean-interval: interval in seconds after which unused L2 and
> +#                        refcount cache entries are removed. If 0 then
> +#                        this feature is not enabled (since 2.4)
> +#
>   # @refcount-bits: width of a refcount entry in bits (since 2.3)
>   #
>   # Since: 1.7
> @@ -50,7 +54,8 @@
>         'compat': 'str',
>         '*lazy-refcounts': 'bool',
>         '*corrupt': 'bool',
> -      'refcount-bits': 'int'
> +      'refcount-bits': 'int',
> +      'cache-clean-interval': 'int'
>     } }

I'm not too happy about making this part of ImageInfoSpecificQCow2. Two 
reasons for this: First, it's eventually part of ImageInfo, which is 
defined as "Information about a QEMU image file", but this option cannot 
be set in the image file itself but is only a run-time option.

Second, my original goal for ImageInfoSpecific was to provide more 
information through qemu-img info, and this value will look pretty 
strange there.

I don't know how to resolve this quandary, though. Since qemu cannot 
change this interval by itself, I think not providing an interface for 
reading it is fine. On the other hand, if Eric finds such an interface 
absolutely mandatory, I can't think of a better place to return it than 
here.

The only solution which would satisfy both requirements would be another 
structure which contains "online" flags, and thus is not evaluated by 
qemu-img info, but only by QMP commands. But that's ugly.

Max

>   ##
> @@ -1538,6 +1543,9 @@
>   # @refcount-cache-size:   #optional the maximum size of the refcount block cache
>   #                         in bytes (since 2.2)
>   #
> +# @cache-clean-interval:  #optional clean unused entries in the L2 and refcount
> +#                         caches. The interval is in seconds (since 2.4)
> +#
>   # Since: 1.7
>   ##
>   { 'struct': 'BlockdevOptionsQcow2',
> @@ -1549,7 +1557,8 @@
>               '*overlap-check': 'Qcow2OverlapChecks',
>               '*cache-size': 'int',
>               '*l2-cache-size': 'int',
> -            '*refcount-cache-size': 'int' } }
> +            '*refcount-cache-size': 'int',
> +            '*cache-clean-interval': 'int' } }
>   
>   
>   ##
Alberto Garcia May 28, 2015, 3:02 p.m. UTC | #5
On Thu 28 May 2015 04:56:34 PM CEST, Max Reitz wrote:
>>         'compat': 'str',
>>         '*lazy-refcounts': 'bool',
>>         '*corrupt': 'bool',
>> -      'refcount-bits': 'int'
>> +      'refcount-bits': 'int',
>> +      'cache-clean-interval': 'int'
>>     } }
>
> I'm not too happy about making this part of ImageInfoSpecificQCow2.
> Two reasons for this: First, it's eventually part of ImageInfo, which
> is defined as "Information about a QEMU image file", but this option
> cannot be set in the image file itself but is only a run-time option.

That's a valid point. Now that I think of it, do we actually have a way
to retrieve the sizes of the L2 and refcount caches? I think cache-size,
l2-cache-size, and refcount-cache-size are already write-only values.

Berto
Max Reitz May 28, 2015, 3:04 p.m. UTC | #6
On 28.05.2015 17:02, Alberto Garcia wrote:
> On Thu 28 May 2015 04:56:34 PM CEST, Max Reitz wrote:
>>>          'compat': 'str',
>>>          '*lazy-refcounts': 'bool',
>>>          '*corrupt': 'bool',
>>> -      'refcount-bits': 'int'
>>> +      'refcount-bits': 'int',
>>> +      'cache-clean-interval': 'int'
>>>      } }
>> I'm not too happy about making this part of ImageInfoSpecificQCow2.
>> Two reasons for this: First, it's eventually part of ImageInfo, which
>> is defined as "Information about a QEMU image file", but this option
>> cannot be set in the image file itself but is only a run-time option.
> That's a valid point. Now that I think of it, do we actually have a way
> to retrieve the sizes of the L2 and refcount caches? I think cache-size,
> l2-cache-size, and refcount-cache-size are already write-only values.

No, in fact we don't. Well, except for if you manage to retrieve the 
JSON filename for the qcow2 BDS, it should be part of that. :-P

Max
Max Reitz May 28, 2015, 3:06 p.m. UTC | #7
On 28.05.2015 17:04, Max Reitz wrote:
> On 28.05.2015 17:02, Alberto Garcia wrote:
>> On Thu 28 May 2015 04:56:34 PM CEST, Max Reitz wrote:
>>>>          'compat': 'str',
>>>>          '*lazy-refcounts': 'bool',
>>>>          '*corrupt': 'bool',
>>>> -      'refcount-bits': 'int'
>>>> +      'refcount-bits': 'int',
>>>> +      'cache-clean-interval': 'int'
>>>>      } }
>>> I'm not too happy about making this part of ImageInfoSpecificQCow2.
>>> Two reasons for this: First, it's eventually part of ImageInfo, which
>>> is defined as "Information about a QEMU image file", but this option
>>> cannot be set in the image file itself but is only a run-time option.
>> That's a valid point. Now that I think of it, do we actually have a way
>> to retrieve the sizes of the L2 and refcount caches? I think cache-size,
>> l2-cache-size, and refcount-cache-size are already write-only values.
>
> No, in fact we don't. Well, except for if you manage to retrieve the 
> JSON filename for the qcow2 BDS, it should be part of that. :-P

And I just noticed, passing that option will result in qemu returning 
the JSON filename when queried:

$ qemu-img info "json:{'l2-cache-size':65536,'file.filename':'test.qcow2'}"
image: json:{"driver": "qcow2", "l2-cache-size": 65536, "file": 
{"driver": "file", "filename": "test.qcow2"}}
[…]

Max
Eric Blake May 28, 2015, 3:13 p.m. UTC | #8
On 05/28/2015 08:56 AM, Max Reitz wrote:
> On 27.05.2015 11:46, Alberto Garcia wrote:
>> This adds a new 'cache-clean-interval' option that cleans all qcow2
>> cache entries that haven't been used in a certain interval, given in
>> seconds.
>>
>> This allows setting a large L2 cache size so it can handle scenarios
>> with lots of I/O and at the same time use little memory during periods
>> of inactivity.
>>
>> This feature currently relies on MADV_DONTNEED to free that memory, so
>> it is not useful in systems that don't follow that behavior.
>>

>> +++ b/qapi/block-core.json
>> @@ -41,6 +41,10 @@
>>   # @corrupt: #optional true if the image has been marked corrupt;
>> only valid for
>>   #           compat >= 1.1 (since 2.2)
>>   #
>> +# @cache-clean-interval: interval in seconds after which unused L2 and
>> +#                        refcount cache entries are removed. If 0 then
>> +#                        this feature is not enabled (since 2.4)
>> +#
>>   # @refcount-bits: width of a refcount entry in bits (since 2.3)
>>   #
>>   # Since: 1.7
>> @@ -50,7 +54,8 @@
>>         'compat': 'str',
>>         '*lazy-refcounts': 'bool',
>>         '*corrupt': 'bool',
>> -      'refcount-bits': 'int'
>> +      'refcount-bits': 'int',
>> +      'cache-clean-interval': 'int'
>>     } }
> 
> I'm not too happy about making this part of ImageInfoSpecificQCow2. Two
> reasons for this: First, it's eventually part of ImageInfo, which is
> defined as "Information about a QEMU image file", but this option cannot
> be set in the image file itself but is only a run-time option.
> 
> Second, my original goal for ImageInfoSpecific was to provide more
> information through qemu-img info, and this value will look pretty
> strange there.
> 
> I don't know how to resolve this quandary, though. Since qemu cannot
> change this interval by itself, I think not providing an interface for
> reading it is fine. On the other hand, if Eric finds such an interface
> absolutely mandatory, I can't think of a better place to return it than
> here.

Can we mark the parameter optional, and only provide it when it is
non-zero?  That way, qemu-img (which cannot set an interval) will not
report it, and the only time it will appear is if it was set as part of
opening the block device under qemu.

> 
> The only solution which would satisfy both requirements would be another
> structure which contains "online" flags, and thus is not evaluated by
> qemu-img info, but only by QMP commands. But that's ugly.
> 

Yeah, I'm not sure such duplication helps.  I'd still like it reported
somewhere, though, as it is nice to query that a requested setting is
actually working.
Max Reitz May 28, 2015, 3:14 p.m. UTC | #9
On 28.05.2015 17:13, Eric Blake wrote:
> On 05/28/2015 08:56 AM, Max Reitz wrote:
>> On 27.05.2015 11:46, Alberto Garcia wrote:
>>> This adds a new 'cache-clean-interval' option that cleans all qcow2
>>> cache entries that haven't been used in a certain interval, given in
>>> seconds.
>>>
>>> This allows setting a large L2 cache size so it can handle scenarios
>>> with lots of I/O and at the same time use little memory during periods
>>> of inactivity.
>>>
>>> This feature currently relies on MADV_DONTNEED to free that memory, so
>>> it is not useful in systems that don't follow that behavior.
>>>
>>> +++ b/qapi/block-core.json
>>> @@ -41,6 +41,10 @@
>>>    # @corrupt: #optional true if the image has been marked corrupt;
>>> only valid for
>>>    #           compat >= 1.1 (since 2.2)
>>>    #
>>> +# @cache-clean-interval: interval in seconds after which unused L2 and
>>> +#                        refcount cache entries are removed. If 0 then
>>> +#                        this feature is not enabled (since 2.4)
>>> +#
>>>    # @refcount-bits: width of a refcount entry in bits (since 2.3)
>>>    #
>>>    # Since: 1.7
>>> @@ -50,7 +54,8 @@
>>>          'compat': 'str',
>>>          '*lazy-refcounts': 'bool',
>>>          '*corrupt': 'bool',
>>> -      'refcount-bits': 'int'
>>> +      'refcount-bits': 'int',
>>> +      'cache-clean-interval': 'int'
>>>      } }
>> I'm not too happy about making this part of ImageInfoSpecificQCow2. Two
>> reasons for this: First, it's eventually part of ImageInfo, which is
>> defined as "Information about a QEMU image file", but this option cannot
>> be set in the image file itself but is only a run-time option.
>>
>> Second, my original goal for ImageInfoSpecific was to provide more
>> information through qemu-img info, and this value will look pretty
>> strange there.
>>
>> I don't know how to resolve this quandary, though. Since qemu cannot
>> change this interval by itself, I think not providing an interface for
>> reading it is fine. On the other hand, if Eric finds such an interface
>> absolutely mandatory, I can't think of a better place to return it than
>> here.
> Can we mark the parameter optional, and only provide it when it is
> non-zero?  That way, qemu-img (which cannot set an interval) will not
> report it, and the only time it will appear is if it was set as part of
> opening the block device under qemu.

That sounds good.

Max

>> The only solution which would satisfy both requirements would be another
>> structure which contains "online" flags, and thus is not evaluated by
>> qemu-img info, but only by QMP commands. But that's ugly.
>>
> Yeah, I'm not sure such duplication helps.  I'd still like it reported
> somewhere, though, as it is nice to query that a requested setting is
> actually working.
>
Alberto Garcia May 28, 2015, 3:19 p.m. UTC | #10
On Thu 28 May 2015 05:14:12 PM CEST, Max Reitz wrote:
>>>>          'compat': 'str',
>>>>          '*lazy-refcounts': 'bool',
>>>>          '*corrupt': 'bool',
>>>> -      'refcount-bits': 'int'
>>>> +      'refcount-bits': 'int',
>>>> +      'cache-clean-interval': 'int'
>>>>      } }
>>> I'm not too happy about making this part of ImageInfoSpecificQCow2.
>>> Two reasons for this: First, it's eventually part of ImageInfo,
>>> which is defined as "Information about a QEMU image file", but this
>>> option cannot be set in the image file itself but is only a run-time
>>> option.
>>>
>> Can we mark the parameter optional, and only provide it when it is
>> non-zero?  That way, qemu-img (which cannot set an interval) will not
>> report it, and the only time it will appear is if it was set as part
>> of opening the block device under qemu.
>
> That sounds good.

But what do we do with the other parameters (refcount-cache-size,
l2-cache-size)? We cannot have the same solution there because they
don't belong to the image file either, and they're never going go be
zero.

Berto
Max Reitz May 28, 2015, 3:23 p.m. UTC | #11
On 28.05.2015 17:19, Alberto Garcia wrote:
> On Thu 28 May 2015 05:14:12 PM CEST, Max Reitz wrote:
>>>>>           'compat': 'str',
>>>>>           '*lazy-refcounts': 'bool',
>>>>>           '*corrupt': 'bool',
>>>>> -      'refcount-bits': 'int'
>>>>> +      'refcount-bits': 'int',
>>>>> +      'cache-clean-interval': 'int'
>>>>>       } }
>>>> I'm not too happy about making this part of ImageInfoSpecificQCow2.
>>>> Two reasons for this: First, it's eventually part of ImageInfo,
>>>> which is defined as "Information about a QEMU image file", but this
>>>> option cannot be set in the image file itself but is only a run-time
>>>> option.
>>>>
>>> Can we mark the parameter optional, and only provide it when it is
>>> non-zero?  That way, qemu-img (which cannot set an interval) will not
>>> report it, and the only time it will appear is if it was set as part
>>> of opening the block device under qemu.
>> That sounds good.
> But what do we do with the other parameters (refcount-cache-size,
> l2-cache-size)? We cannot have the same solution there because they
> don't belong to the image file either, and they're never going go be
> zero.

Pssht, don't mention it, or Eric will notice. :-)

Well, one solution would be to remember whether they have been set 
explicitly or not. But that gets ugly really quickly. Maybe Kevin's 
series helps there, but I don't know.

Of course, the simplest solution is to worry about cache-clean-interval 
for now and worry about the cache sizes later… But that basically means 
"We'll never worry about them unless someone complains", so…

Max
Alberto Garcia May 28, 2015, 3:30 p.m. UTC | #12
On Thu 28 May 2015 05:23:40 PM CEST, Max Reitz wrote:

>>>> Can we mark the parameter optional, and only provide it when it is
>>>> non-zero?  That way, qemu-img (which cannot set an interval) will
>>>> not report it, and the only time it will appear is if it was set as
>>>> part of opening the block device under qemu.

>>> That sounds good.

>> But what do we do with the other parameters (refcount-cache-size,
>> l2-cache-size)? We cannot have the same solution there because they
>> don't belong to the image file either, and they're never going go be
>> zero.

> Of course, the simplest solution is to worry about cache-clean-interval 
> for now and worry about the cache sizes later… But that basically means 
> "We'll never worry about them unless someone complains", so…

My worry is that the solution works in this case because the default
value happens to be 0 and means "disabled", so it makes sense, but it
doesn't work in general with other parameters, so we would be adding
something that we know it's ad-hoc.

Berto
Kevin Wolf May 28, 2015, 4:44 p.m. UTC | #13
Am 28.05.2015 um 17:13 hat Eric Blake geschrieben:
> On 05/28/2015 08:56 AM, Max Reitz wrote:
> > On 27.05.2015 11:46, Alberto Garcia wrote:
> >> This adds a new 'cache-clean-interval' option that cleans all qcow2
> >> cache entries that haven't been used in a certain interval, given in
> >> seconds.
> >>
> >> This allows setting a large L2 cache size so it can handle scenarios
> >> with lots of I/O and at the same time use little memory during periods
> >> of inactivity.
> >>
> >> This feature currently relies on MADV_DONTNEED to free that memory, so
> >> it is not useful in systems that don't follow that behavior.
> >>
> 
> >> +++ b/qapi/block-core.json
> >> @@ -41,6 +41,10 @@
> >>   # @corrupt: #optional true if the image has been marked corrupt;
> >> only valid for
> >>   #           compat >= 1.1 (since 2.2)
> >>   #
> >> +# @cache-clean-interval: interval in seconds after which unused L2 and
> >> +#                        refcount cache entries are removed. If 0 then
> >> +#                        this feature is not enabled (since 2.4)
> >> +#
> >>   # @refcount-bits: width of a refcount entry in bits (since 2.3)
> >>   #
> >>   # Since: 1.7
> >> @@ -50,7 +54,8 @@
> >>         'compat': 'str',
> >>         '*lazy-refcounts': 'bool',
> >>         '*corrupt': 'bool',
> >> -      'refcount-bits': 'int'
> >> +      'refcount-bits': 'int',
> >> +      'cache-clean-interval': 'int'
> >>     } }
> > 
> > I'm not too happy about making this part of ImageInfoSpecificQCow2. Two
> > reasons for this: First, it's eventually part of ImageInfo, which is
> > defined as "Information about a QEMU image file", but this option cannot
> > be set in the image file itself but is only a run-time option.
> > 
> > Second, my original goal for ImageInfoSpecific was to provide more
> > information through qemu-img info, and this value will look pretty
> > strange there.
> > 
> > I don't know how to resolve this quandary, though. Since qemu cannot
> > change this interval by itself, I think not providing an interface for
> > reading it is fine. On the other hand, if Eric finds such an interface
> > absolutely mandatory, I can't think of a better place to return it than
> > here.
> 
> Can we mark the parameter optional, and only provide it when it is
> non-zero?  That way, qemu-img (which cannot set an interval) will not
> report it, and the only time it will appear is if it was set as part of
> opening the block device under qemu.

No, that wouldn't be right. It's not information tied to the image file.

> > The only solution which would satisfy both requirements would be another
> > structure which contains "online" flags, and thus is not evaluated by
> > qemu-img info, but only by QMP commands. But that's ugly.
> > 
> 
> Yeah, I'm not sure such duplication helps.  I'd still like it reported
> somewhere, though, as it is nice to query that a requested setting is
> actually working.

This isn't duplicated information. You can have ImageInfoSpecificQCow2
show lazy_refcounts=off because that is what the image file contains,
but the real value in effect could be lazy_refcounts=on.

Options stored in the image and runtime options are two different pieces
of information, even if the former specify the defaults for the latter.
So I think it should be possible to query both of them.

Kevin
Alberto Garcia May 28, 2015, 5:03 p.m. UTC | #14
On Thu 28 May 2015 06:44:55 PM CEST, Kevin Wolf wrote:
>> Yeah, I'm not sure such duplication helps.  I'd still like it
>> reported somewhere, though, as it is nice to query that a requested
>> setting is actually working.
>
> This isn't duplicated information. You can have ImageInfoSpecificQCow2
> show lazy_refcounts=off because that is what the image file contains,
> but the real value in effect could be lazy_refcounts=on.

That's right, ImageInfoSpecificQCow2 returns the value from the header
(s->compatible_features), not the runtime value (s->use_lazy_refcounts).

I think I'll resend the patch without the ImageInfo change. Querying the
runtime values is a task on its own.

Berto
Eric Blake May 28, 2015, 7:41 p.m. UTC | #15
On 05/28/2015 09:23 AM, Max Reitz wrote:

>>>> Can we mark the parameter optional, and only provide it when it is
>>>> non-zero?  That way, qemu-img (which cannot set an interval) will not
>>>> report it, and the only time it will appear is if it was set as part
>>>> of opening the block device under qemu.
>>> That sounds good.
>> But what do we do with the other parameters (refcount-cache-size,
>> l2-cache-size)? We cannot have the same solution there because they
>> don't belong to the image file either, and they're never going go be
>> zero.
> 
> Pssht, don't mention it, or Eric will notice. :-)
> 
> Well, one solution would be to remember whether they have been set
> explicitly or not. But that gets ugly really quickly. Maybe Kevin's
> series helps there, but I don't know.
> 
> Of course, the simplest solution is to worry about cache-clean-interval
> for now and worry about the cache sizes later… But that basically means
> "We'll never worry about them unless someone complains", so…

Hmm, now that we have three pieces of data, I'm starting to lean more
towards ImageInfoSpecific being the wrong place for this after all; it
would still be nice to advertise all three, but where?  Is
BlockdevCacheInfo more appropriate (as a sub-struct of BlockDeviceInfo)?
Kevin Wolf May 29, 2015, 8:32 a.m. UTC | #16
Am 28.05.2015 um 21:41 hat Eric Blake geschrieben:
> On 05/28/2015 09:23 AM, Max Reitz wrote:
> 
> >>>> Can we mark the parameter optional, and only provide it when it is
> >>>> non-zero?  That way, qemu-img (which cannot set an interval) will not
> >>>> report it, and the only time it will appear is if it was set as part
> >>>> of opening the block device under qemu.
> >>> That sounds good.
> >> But what do we do with the other parameters (refcount-cache-size,
> >> l2-cache-size)? We cannot have the same solution there because they
> >> don't belong to the image file either, and they're never going go be
> >> zero.
> > 
> > Pssht, don't mention it, or Eric will notice. :-)
> > 
> > Well, one solution would be to remember whether they have been set
> > explicitly or not. But that gets ugly really quickly. Maybe Kevin's
> > series helps there, but I don't know.
> > 
> > Of course, the simplest solution is to worry about cache-clean-interval
> > for now and worry about the cache sizes later… But that basically means
> > "We'll never worry about them unless someone complains", so…
> 
> Hmm, now that we have three pieces of data, I'm starting to lean more
> towards ImageInfoSpecific being the wrong place for this after all; it
> would still be nice to advertise all three, but where?  Is
> BlockdevCacheInfo more appropriate (as a sub-struct of BlockDeviceInfo)?

BlockDeviceInfo contains runtime information, so that looks like the
right place indeed. As these options aren't present for all devices, I
don't think we should extend BlockdevCacheInfo, but rather introduce a
BlockDeviceInfoSpecific that works like ImageInfoSpecific, just for
runtime information.

I agree with Berto that that's out of scope for this series, though.

Kevin
diff mbox

Patch

diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c
index ed14a92..a215f5b 100644
--- a/block/qcow2-cache.c
+++ b/block/qcow2-cache.c
@@ -43,6 +43,7 @@  struct Qcow2Cache {
     bool                    depends_on_flush;
     void                   *table_array;
     uint64_t                lru_counter;
+    uint64_t                cache_clean_lru_counter;
 };
 
 static inline void *qcow2_cache_get_table_addr(BlockDriverState *bs,
@@ -78,6 +79,40 @@  static void qcow2_cache_table_release(BlockDriverState *bs, Qcow2Cache *c,
 #endif
 }
 
+static inline bool can_clean_entry(Qcow2Cache *c, int i)
+{
+    Qcow2CachedTable *t = &c->entries[i];
+    return t->ref == 0 && !t->dirty && t->offset != 0 &&
+        t->lru_counter <= c->cache_clean_lru_counter;
+}
+
+void qcow2_cache_clean_unused(BlockDriverState *bs, Qcow2Cache *c)
+{
+    int i = 0;
+    while (i < c->size) {
+        int to_clean = 0;
+
+        /* Skip the entries that we don't need to clean */
+        while (i < c->size && !can_clean_entry(c, i)) {
+            i++;
+        }
+
+        /* And count how many we can clean in a row */
+        while (i < c->size && can_clean_entry(c, i)) {
+            c->entries[i].offset = 0;
+            c->entries[i].lru_counter = 0;
+            i++;
+            to_clean++;
+        }
+
+        if (to_clean > 0) {
+            qcow2_cache_table_release(bs, c, i - to_clean, to_clean);
+        }
+    }
+
+    c->cache_clean_lru_counter = c->lru_counter;
+}
+
 Qcow2Cache *qcow2_cache_create(BlockDriverState *bs, int num_tables)
 {
     BDRVQcowState *s = bs->opaque;
diff --git a/block/qcow2.c b/block/qcow2.c
index f7b4cc6..abafcdf 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -468,6 +468,11 @@  static QemuOptsList qcow2_runtime_opts = {
             .type = QEMU_OPT_SIZE,
             .help = "Maximum refcount block cache size",
         },
+        {
+            .name = QCOW2_OPT_CACHE_CLEAN_INTERVAL,
+            .type = QEMU_OPT_NUMBER,
+            .help = "Clean unused cache entries after this time (in seconds)",
+        },
         { /* end of list */ }
     },
 };
@@ -483,6 +488,49 @@  static const char *overlap_bool_option_names[QCOW2_OL_MAX_BITNR] = {
     [QCOW2_OL_INACTIVE_L2_BITNR]    = QCOW2_OPT_OVERLAP_INACTIVE_L2,
 };
 
+static void cache_clean_timer_cb(void *opaque)
+{
+    BlockDriverState *bs = opaque;
+    BDRVQcowState *s = bs->opaque;
+    qcow2_cache_clean_unused(bs, s->l2_table_cache);
+    qcow2_cache_clean_unused(bs, s->refcount_block_cache);
+    timer_mod(s->cache_clean_timer, qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
+              (int64_t) s->cache_clean_interval * 1000);
+}
+
+static void cache_clean_timer_init(BlockDriverState *bs, AioContext *context)
+{
+    BDRVQcowState *s = bs->opaque;
+    if (s->cache_clean_interval > 0) {
+        s->cache_clean_timer = aio_timer_new(context, QEMU_CLOCK_VIRTUAL,
+                                             SCALE_MS, cache_clean_timer_cb,
+                                             bs);
+        timer_mod(s->cache_clean_timer, qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
+                  (int64_t) s->cache_clean_interval * 1000);
+    }
+}
+
+static void cache_clean_timer_del(BlockDriverState *bs)
+{
+    BDRVQcowState *s = bs->opaque;
+    if (s->cache_clean_timer) {
+        timer_del(s->cache_clean_timer);
+        timer_free(s->cache_clean_timer);
+        s->cache_clean_timer = NULL;
+    }
+}
+
+static void qcow2_detach_aio_context(BlockDriverState *bs)
+{
+    cache_clean_timer_del(bs);
+}
+
+static void qcow2_attach_aio_context(BlockDriverState *bs,
+                                     AioContext *new_context)
+{
+    cache_clean_timer_init(bs, new_context);
+}
+
 static void read_cache_sizes(QemuOpts *opts, uint64_t *l2_cache_size,
                              uint64_t *refcount_cache_size, Error **errp)
 {
@@ -552,6 +600,7 @@  static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
     const char *opt_overlap_check, *opt_overlap_check_template;
     int overlap_check_template = 0;
     uint64_t l2_cache_size, refcount_cache_size;
+    uint64_t cache_clean_interval;
 
     ret = bdrv_pread(bs->file, 0, &header, sizeof(header));
     if (ret < 0) {
@@ -839,6 +888,16 @@  static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
         goto fail;
     }
 
+    cache_clean_interval =
+        qemu_opt_get_number(opts, QCOW2_OPT_CACHE_CLEAN_INTERVAL, 0);
+    if (cache_clean_interval > UINT_MAX) {
+        error_setg(errp, "Cache clean interval too big");
+        ret = -EINVAL;
+        goto fail;
+    }
+    s->cache_clean_interval = cache_clean_interval;
+    cache_clean_timer_init(bs, bdrv_get_aio_context(bs));
+
     s->cluster_cache = g_malloc(s->cluster_size);
     /* one more sector for decompressed data alignment */
     s->cluster_data = qemu_try_blockalign(bs->file, QCOW_MAX_CRYPT_CLUSTERS
@@ -1004,6 +1063,7 @@  static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
     qemu_vfree(s->l1_table);
     /* else pre-write overlap checks in cache_destroy may crash */
     s->l1_table = NULL;
+    cache_clean_timer_del(bs);
     if (s->l2_table_cache) {
         qcow2_cache_destroy(bs, s->l2_table_cache);
     }
@@ -1458,6 +1518,7 @@  static void qcow2_close(BlockDriverState *bs)
         }
     }
 
+    cache_clean_timer_del(bs);
     qcow2_cache_destroy(bs, s->l2_table_cache);
     qcow2_cache_destroy(bs, s->refcount_block_cache);
 
@@ -2552,6 +2613,8 @@  static ImageInfoSpecific *qcow2_get_specific_info(BlockDriverState *bs)
         };
     }
 
+    spec_info->qcow2->cache_clean_interval = s->cache_clean_interval;
+
     return spec_info;
 }
 
@@ -2970,6 +3033,9 @@  BlockDriver bdrv_qcow2 = {
     .create_opts         = &qcow2_create_opts,
     .bdrv_check          = qcow2_check,
     .bdrv_amend_options  = qcow2_amend_options,
+
+    .bdrv_detach_aio_context  = qcow2_detach_aio_context,
+    .bdrv_attach_aio_context  = qcow2_attach_aio_context,
 };
 
 static void bdrv_qcow2_init(void)
diff --git a/block/qcow2.h b/block/qcow2.h
index 0076512..2c45eb2 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -93,6 +93,7 @@ 
 #define QCOW2_OPT_CACHE_SIZE "cache-size"
 #define QCOW2_OPT_L2_CACHE_SIZE "l2-cache-size"
 #define QCOW2_OPT_REFCOUNT_CACHE_SIZE "refcount-cache-size"
+#define QCOW2_OPT_CACHE_CLEAN_INTERVAL "cache-clean-interval"
 
 typedef struct QCowHeader {
     uint32_t magic;
@@ -236,6 +237,8 @@  typedef struct BDRVQcowState {
 
     Qcow2Cache* l2_table_cache;
     Qcow2Cache* refcount_block_cache;
+    QEMUTimer *cache_clean_timer;
+    unsigned cache_clean_interval;
 
     uint8_t *cluster_cache;
     uint8_t *cluster_data;
@@ -581,6 +584,7 @@  int qcow2_cache_set_dependency(BlockDriverState *bs, Qcow2Cache *c,
     Qcow2Cache *dependency);
 void qcow2_cache_depends_on_flush(Qcow2Cache *c);
 
+void qcow2_cache_clean_unused(BlockDriverState *bs, Qcow2Cache *c);
 int qcow2_cache_empty(BlockDriverState *bs, Qcow2Cache *c);
 
 int qcow2_cache_get(BlockDriverState *bs, Qcow2Cache *c, uint64_t offset,
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 863ffea..3a9b590 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -41,6 +41,10 @@ 
 # @corrupt: #optional true if the image has been marked corrupt; only valid for
 #           compat >= 1.1 (since 2.2)
 #
+# @cache-clean-interval: interval in seconds after which unused L2 and
+#                        refcount cache entries are removed. If 0 then
+#                        this feature is not enabled (since 2.4)
+#
 # @refcount-bits: width of a refcount entry in bits (since 2.3)
 #
 # Since: 1.7
@@ -50,7 +54,8 @@ 
       'compat': 'str',
       '*lazy-refcounts': 'bool',
       '*corrupt': 'bool',
-      'refcount-bits': 'int'
+      'refcount-bits': 'int',
+      'cache-clean-interval': 'int'
   } }
 
 ##
@@ -1538,6 +1543,9 @@ 
 # @refcount-cache-size:   #optional the maximum size of the refcount block cache
 #                         in bytes (since 2.2)
 #
+# @cache-clean-interval:  #optional clean unused entries in the L2 and refcount
+#                         caches. The interval is in seconds (since 2.4)
+#
 # Since: 1.7
 ##
 { 'struct': 'BlockdevOptionsQcow2',
@@ -1549,7 +1557,8 @@ 
             '*overlap-check': 'Qcow2OverlapChecks',
             '*cache-size': 'int',
             '*l2-cache-size': 'int',
-            '*refcount-cache-size': 'int' } }
+            '*refcount-cache-size': 'int',
+            '*cache-clean-interval': 'int' } }
 
 
 ##