
[5/7] qcow2: use a hash to look for entries in the L2 cache

Message ID 38be38957aff5457fd811db71706a6f645df5c7e.1430919406.git.berto@igalia.com
State New

Commit Message

Alberto Garcia May 6, 2015, 1:39 p.m. UTC
The current cache algorithm always traverses the array starting from
the beginning, so the average number of comparisons needed to perform
a lookup is proportional to the size of the array.

By using a hash of the offset as the starting point, lookups are
faster and independent of the array size.

The hash is computed using the cluster number of the table, multiplied
by 4 to make it perform better when there are collisions.

In my tests, using a cache with 2048 entries, this reduces the average
number of comparisons per lookup from 430 to 2.5.

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2-cache.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)
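
For illustration, here is a minimal standalone sketch of the starting-point
computation described above. The function name, the 64 KiB cluster size and
the example offset are assumptions made for the sketch, not values taken
from the patch:

#include <stdint.h>
#include <stdio.h>

/* Illustrative only, not qcow2 code: start the search at
 * (cluster number of the table * 4) modulo the cache size. */
static int lookup_index(uint64_t offset, uint64_t cluster_size, int cache_size)
{
    return (int)(offset / cluster_size * 4 % cache_size);
}

int main(void)
{
    /* An L2 table at offset 0x50000 with 64 KiB clusters and a
     * 2048-entry cache starts its search at index 20. */
    printf("%d\n", lookup_index(0x50000, 65536, 2048));
    return 0;
}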

Comments

Stefan Hajnoczi May 7, 2015, 10:18 a.m. UTC | #1
On Wed, May 06, 2015 at 04:39:29PM +0300, Alberto Garcia wrote:
> The current cache algorithm always traverses the array starting from
> the beginning, so the average number of comparisons needed to perform
> a lookup is proportional to the size of the array.
> 
> By using a hash of the offset as the starting point, lookups are
> faster and independent of the array size.
> 
> The hash is computed using the cluster number of the table, multiplied
> by 4 to make it perform better when there are collisions.
> 
> In my tests, using a cache with 2048 entries, this reduces the average
> number of comparisons per lookup from 430 to 2.5.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>  block/qcow2-cache.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Max Reitz May 8, 2015, 3:46 p.m. UTC | #2
On 06.05.2015 15:39, Alberto Garcia wrote:
> The current cache algorithm always traverses the array starting from
> the beginning, so the average number of comparisons needed to perform
> a lookup is proportional to the size of the array.
>
> By using a hash of the offset as the starting point, lookups are
> faster and independent of the array size.
>
> The hash is computed using the cluster number of the table, multiplied
> by 4 to make it perform better when there are collisions.
>
> In my tests, using a cache with 2048 entries, this reduces the average
> number of comparisons per lookup from 430 to 2.5.
>
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>   block/qcow2-cache.c | 9 +++++++--
>   1 file changed, 7 insertions(+), 2 deletions(-)

So it's a direct-mapped cache, unless the hashed slot is already in
use, in which case it becomes a fully associative cache...

Let's assume the cache is full. Now this hash algorithm (direct-mapped
cache) basically becomes futile, because the LRU algorithm (fully
associative cache) takes over, and any entry (the LRU entry) is
replaced, independently of the hash. Thus, the entry will most probably
not be placed at its hashed position. So once the cache is full, I guess
this algorithm doesn't help a lot, does it?
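
To make that concrete, here is a toy model of the scenario (a sketch with
assumed parameters, an 8-slot cache and 64 KiB clusters, not the actual
qcow2 code). It fills the cache, touches every table except one, and then
inserts a new table; the new table hashes to slot 0 but ends up in slot 4,
the LRU slot:

#include <stdint.h>
#include <stdio.h>

#define SLOTS        8
#define CLUSTER_SIZE 65536ULL

/* Toy cache: same "cluster number * 4" starting point, pure LRU eviction. */
static struct { uint64_t offset, lru; } cache[SLOTS];
static uint64_t lru_clock;

static int cache_get(uint64_t offset)
{
    int start = (int)(offset / CLUSTER_SIZE * 4 % SLOTS);
    int i = start, victim = -1;
    uint64_t min_lru = UINT64_MAX;

    do {
        if (cache[i].offset == offset) {    /* already cached */
            cache[i].lru = ++lru_clock;
            return i;
        }
        if (cache[i].lru < min_lru) {       /* remember the LRU slot */
            min_lru = cache[i].lru;
            victim = i;
        }
        i = (i + 1) % SLOTS;
    } while (i != start);

    cache[victim].offset = offset;          /* miss: evict the LRU entry */
    cache[victim].lru = ++lru_clock;
    return victim;
}

int main(void)
{
    uint64_t c, off;
    int hashed, placed;

    for (c = 1; c <= SLOTS; c++) {          /* fill the cache ... */
        cache_get(c * CLUSTER_SIZE);
    }
    for (c = 2; c <= SLOTS; c++) {          /* ... and touch all but one */
        cache_get(c * CLUSTER_SIZE);
    }

    off = 10 * CLUSTER_SIZE;
    hashed = (int)(off / CLUSTER_SIZE * 4 % SLOTS);
    placed = cache_get(off);
    printf("hashed slot %d, placed in slot %d\n", hashed, placed);
    return 0;
}

Compiled standalone, this prints "hashed slot 0, placed in slot 4": once
every slot is in use, placement is decided entirely by the LRU order.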

(I'm sorry if you had this discussion before...)

Max
Alberto Garcia May 8, 2015, 4:48 p.m. UTC | #3
On Fri 08 May 2015 05:46:57 PM CEST, Max Reitz wrote:

> Let's assume the cache is full. Now this hash algorithm (direct mapped
> cache) basically becomes futile, because the LRU algorithm (fully
> associative cache) takes over

That's right, although in that scenario I guess there's no good
algorithm.

Anyway, to summarize what I wrote in a different e-mail: I just thought
that always starting the lookup from the beginning was the worst
alternative and that it was very easy to improve it a bit, but I don't
think it has a big impact on performance. Otherwise I would have
evaluated a different data structure.

Berto

Patch

diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c
index dffb887..3137ba8 100644
--- a/block/qcow2-cache.c
+++ b/block/qcow2-cache.c
@@ -248,6 +248,7 @@ static int qcow2_cache_do_get(BlockDriverState *bs, Qcow2Cache *c,
     BDRVQcowState *s = bs->opaque;
     int i;
     int ret;
+    int lookup_index;
     uint64_t min_lru_counter = UINT64_MAX;
     int min_lru_index = -1;
 
@@ -255,7 +256,8 @@ static int qcow2_cache_do_get(BlockDriverState *bs, Qcow2Cache *c,
                           offset, read_from_disk);
 
     /* Check if the table is already cached */
-    for (i = 0; i < c->size; i++) {
+    i = lookup_index = (offset / s->cluster_size * 4) % c->size;
+    do {
         const Qcow2CachedTable *t = &c->entries[i];
         if (t->offset == offset) {
             goto found;
@@ -264,7 +266,10 @@ static int qcow2_cache_do_get(BlockDriverState *bs, Qcow2Cache *c,
             min_lru_counter = t->lru_counter;
             min_lru_index = i;
         }
-    }
+        if (++i == c->size) {
+            i = 0;
+        }
+    } while (i != lookup_index);
 
     if (min_lru_index == -1) {
         /* This can't happen in current synchronous code, but leave the check