Message ID: 4CEAC792.7040307@ispras.ru
State: New
Hi,

Tested - and it indeed improves startup speed.

On Mon, Nov 22, 2010 at 10:42:10PM +0300, Kirill Batuzov wrote:
> Move the last found TB to the head of the list so it will be found more
> quickly next time it will be looked for.
>
> Signed-off-by: Kirill Batuzov <batuzovk@ispras.ru>
> Signed-off-by: Pavel Yushchenko <pau@ispras.ru>
> ---
> Hello. This patch gives a significant boost to a rather rich ARM-based
> system of ours (rich for an embedded one - featuring an X server, many
> daemons and applications), literally halving its boot-to-desktop time!
> (The average number of traversed 'tb_phys_hash' entries in the main loop
> of the 'tb_find_slow' function dropped from 20 to 1.5.) We were also able
> to shorten boot-to-login time by about 25% using Debian on versatilepb
> (no X server, only the basic system). Kernel boot time seems unaffected.
> No problems were encountered during our experiments.
>
> We are looking forward to comments about this change and to help with
> testing. Thanks in advance!
>
>  cpu-exec.c |    5 +++++
>  1 files changed, 5 insertions(+), 0 deletions(-)
>
> diff --git a/cpu-exec.c b/cpu-exec.c
> index 5d6dd51..55c4526 100644
> --- a/cpu-exec.c
> +++ b/cpu-exec.c
> @@ -161,6 +161,11 @@ static TranslationBlock *tb_find_slow(target_ulong pc,
>      tb = tb_gen_code(env, pc, cs_base, flags, 0);
>
>  found:
> +    if (*ptb1) {
> +        *ptb1 = tb->phys_hash_next;
> +        tb->phys_hash_next = tb_phys_hash[h];
> +        tb_phys_hash[h] = tb;
> +    }
>      /* we add the TB in the virtual pc hash table */
>      env->tb_jmp_cache[tb_jmp_cache_hash_func(pc)] = tb;
>      return tb;
Dear Kirill,

On Tue, Nov 23, 2010 at 02:42, Kirill Batuzov <batuzovk@ispras.ru> wrote:
> Move the last found TB to the head of the list so it will be found more
> quickly next time it will be looked for.
...
> found:
> +    if (*ptb1) {
> +        *ptb1 = tb->phys_hash_next;
> +        tb->phys_hash_next = tb_phys_hash[h];
> +        tb_phys_hash[h] = tb;
> +    }
>      /* we add the TB in the virtual pc hash table */
>      env->tb_jmp_cache[tb_jmp_cache_hash_func(pc)] = tb;
>      return tb;

Thank you - you are indirectly teaching me how to do this. I have wanted to do the same thing for a long time, but I never thoroughly understood the way TBs are managed.

May I suggest something?

a. The "if (*ptb1)" could be improved with the likely() or unlikely() macros (I forget the name of the underlying GCC builtin; I am just writing them down the way the Linux kernel names them). I would guess the hit rate of the last executed TB is statistically somewhere above 50%, by the locality principle - which, IIRC, says that roughly 10% of the code is actually run and accounts for 90% of the overall run time (CMIIW). So likely() should improve the code ordering.

b. Or better, we need some kind of LRU ordering of the TB list. On each TB hit, the "hit count" of that TB is incremented. At regular intervals the entire list is scanned and reordered so that the most frequently called TBs are at the head.

All in all, I think this is due to the simple TB "clean-up" mechanism used so far: whenever the cache is full, it is simply dumped wholesale. While I agree that is the simplest approach, I remember Julian Seward's tests showing runtime acceleration when the TB cache size was increased up to a certain point. To me this demonstrates, to some degree, that more efficient management of the TB cache is needed.

Just my 2 cents of an idea... and so far this is all I can suggest to help QEMU development.
diff --git a/cpu-exec.c b/cpu-exec.c
index 5d6dd51..55c4526 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -161,6 +161,11 @@ static TranslationBlock *tb_find_slow(target_ulong pc,
     tb = tb_gen_code(env, pc, cs_base, flags, 0);

 found:
+    if (*ptb1) {
+        *ptb1 = tb->phys_hash_next;
+        tb->phys_hash_next = tb_phys_hash[h];
+        tb_phys_hash[h] = tb;
+    }
     /* we add the TB in the virtual pc hash table */
     env->tb_jmp_cache[tb_jmp_cache_hash_func(pc)] = tb;
     return tb;