Message ID: 38be38957aff5457fd811db71706a6f645df5c7e.1430919406.git.berto@igalia.com
State: New
On Wed, May 06, 2015 at 04:39:29PM +0300, Alberto Garcia wrote:
> The current cache algorithm traverses the array starting always from
> the beginning, so the average number of comparisons needed to perform
> a lookup is proportional to the size of the array.
>
> By using a hash of the offset as the starting point, lookups are
> faster and independent from the array size.
>
> The hash is computed using the cluster number of the table, multiplied
> by 4 to make it perform better when there are collisions.
>
> In my tests, using a cache with 2048 entries, this reduces the average
> number of comparisons per lookup from 430 to 2.5.
>
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>  block/qcow2-cache.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
On 06.05.2015 15:39, Alberto Garcia wrote:
> The current cache algorithm traverses the array starting always from
> the beginning, so the average number of comparisons needed to perform
> a lookup is proportional to the size of the array.
>
> By using a hash of the offset as the starting point, lookups are
> faster and independent from the array size.
>
> The hash is computed using the cluster number of the table, multiplied
> by 4 to make it perform better when there are collisions.
>
> In my tests, using a cache with 2048 entries, this reduces the average
> number of comparisons per lookup from 430 to 2.5.
>
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>  block/qcow2-cache.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)

So it's a direct-mapped cache, unless the entry has been used, in which case it becomes a fully associative cache...

Let's assume the cache is full. Now this hash algorithm (direct-mapped cache) basically becomes futile, because the LRU algorithm (fully associative cache) takes over, and any entry (the LRU entry) is replaced, independently of the hash. Thus, the entry will most probably not be placed at its hashed position. So once the cache is full, I guess this algorithm doesn't help a lot, does it?

(I'm sorry if you had this discussion before...)

Max
On Fri 08 May 2015 05:46:57 PM CEST, Max Reitz wrote:
> Let's assume the cache is full. Now this hash algorithm (direct-mapped
> cache) basically becomes futile, because the LRU algorithm (fully
> associative cache) takes over

That's right, although in that scenario I guess there's no good algorithm.

Anyway, and summarizing what I wrote in a different e-mail: I just thought that starting the lookup always from the beginning was the worst alternative, and that it was very easy to improve it a bit. But I don't think it has a big impact on the performance; otherwise I would have evaluated a different data structure.

Berto
diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c
index dffb887..3137ba8 100644
--- a/block/qcow2-cache.c
+++ b/block/qcow2-cache.c
@@ -248,6 +248,7 @@ static int qcow2_cache_do_get(BlockDriverState *bs, Qcow2Cache *c,
     BDRVQcowState *s = bs->opaque;
     int i;
     int ret;
+    int lookup_index;
     uint64_t min_lru_counter = UINT64_MAX;
     int min_lru_index = -1;
@@ -255,7 +256,8 @@ static int qcow2_cache_do_get(BlockDriverState *bs, Qcow2Cache *c,
             offset, read_from_disk);

     /* Check if the table is already cached */
-    for (i = 0; i < c->size; i++) {
+    i = lookup_index = (offset / s->cluster_size * 4) % c->size;
+    do {
         const Qcow2CachedTable *t = &c->entries[i];
         if (t->offset == offset) {
             goto found;
@@ -264,7 +266,10 @@ static int qcow2_cache_do_get(BlockDriverState *bs, Qcow2Cache *c,
             min_lru_counter = t->lru_counter;
             min_lru_index = i;
         }
-    }
+        if (++i == c->size) {
+            i = 0;
+        }
+    } while (i != lookup_index);

     if (min_lru_index == -1) {
         /* This can't happen in current synchronous code, but leave the check
The current cache algorithm traverses the array starting always from the beginning, so the average number of comparisons needed to perform a lookup is proportional to the size of the array.

By using a hash of the offset as the starting point, lookups are faster and independent from the array size.

The hash is computed using the cluster number of the table, multiplied by 4 to make it perform better when there are collisions.

In my tests, using a cache with 2048 entries, this reduces the average number of comparisons per lookup from 430 to 2.5.

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2-cache.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)