Message ID | 1293658044-10244-1-git-send-email-aurelien@aurel32.net |
---|---|
State | New |
On Wed, Dec 29, 2010 at 9:27 PM, Aurelien Jarno <aurelien@aurel32.net> wrote:
> Most of emulated CPU have instructions aligned on 16 or 32 bits, while
> on others GCC tries to align the target jump location. This means that
> 1/2 or 3/4 of tb_phys_hash entries are never used.
>
> Update the hash function tb_phys_hash_func() to ignore the two lowest
> bits of the address. This brings a 6% speed-up when booting a MIPS
> image.

Nice! The beginning of functions may be aligned to 16 bytes. Would it
change the performance figures if one or two more bits were ignored?
On Thu, Dec 30, 2010 at 05:55:38PM +0000, Blue Swirl wrote:
> On Wed, Dec 29, 2010 at 9:27 PM, Aurelien Jarno <aurelien@aurel32.net> wrote:
> > Most of emulated CPU have instructions aligned on 16 or 32 bits, while
> > on others GCC tries to align the target jump location. This means that
> > 1/2 or 3/4 of tb_phys_hash entries are never used.
> >
> > Update the hash function tb_phys_hash_func() to ignore the two lowest
> > bits of the address. This brings a 6% speed-up when booting a MIPS
> > image.
>
> Nice! The beginning of functions may be aligned to 16 bytes. Would it
> change the performance figures if one or two more bits were ignored?
>

It makes a noticeable difference on how the TBs are dispatched in the
hash table, but only by a few percent (slightly more on ppc). I am not
able to measure any speed improvement; it is all in the noise.

My guess is that compilers align functions to 16 bytes, but not jump
targets in loops, which are far more numerous than function starts.
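One rough way to picture the guess above is to vary the number of ignored bits and look at the longest chain that forms in a single bucket. The sketch below is not from the thread: `hash_shift` and `max_chain` are hypothetical helper names, the table size of 1024 is illustrative, and the evenly spaced synthetic addresses are an assumption rather than measured data.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative table size only; assumed to be a power of two. */
#define HASH_SIZE 1024u

/* Hypothetical variant of the hash with a configurable number of
 * ignored low bits (the patch itself hard-codes a shift of 2). */
static unsigned int hash_shift(uint64_t pc, unsigned int shift)
{
    return (pc >> shift) & (HASH_SIZE - 1);
}

/* Longest chain in any bucket when `count` addresses, starting at 0
 * and spaced `stride` bytes apart, are hashed with the given shift. */
static unsigned int max_chain(uint64_t stride, size_t count,
                              unsigned int shift)
{
    unsigned int load[HASH_SIZE] = {0};
    unsigned int worst = 0;

    for (size_t i = 0; i < count; i++) {
        unsigned int h = hash_shift(i * stride, shift);
        if (++load[h] > worst) {
            worst = load[h];
        }
    }
    return worst;
}
```

With 1024 synthetic addresses, `max_chain(4, 1024, 2)` is 1 while `max_chain(4, 1024, 4)` is 4: ignoring four bits makes neighbouring 4-byte-aligned targets share a bucket. The reverse holds for 16-byte-aligned addresses, where `max_chain(16, 1024, 4)` is 1 and `max_chain(16, 1024, 2)` is 4. Which effect dominates on a real guest depends on the mix of block start addresses, which this toy model does not capture.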
On Fri, Dec 31, 2010 at 08:46:02PM +0100, Aurelien Jarno wrote:
> On Thu, Dec 30, 2010 at 05:55:38PM +0000, Blue Swirl wrote:
> > On Wed, Dec 29, 2010 at 9:27 PM, Aurelien Jarno <aurelien@aurel32.net> wrote:
> > > Most of emulated CPU have instructions aligned on 16 or 32 bits, while
> > > on others GCC tries to align the target jump location. This means that
> > > 1/2 or 3/4 of tb_phys_hash entries are never used.
> > >
> > > Update the hash function tb_phys_hash_func() to ignore the two lowest
> > > bits of the address. This brings a 6% speed-up when booting a MIPS
> > > image.
> >
> > Nice! The beginning of functions may be aligned to 16 bytes. Would it
> > change the performance figures if one or two more bits were ignored?
> >
>
> It makes a noticeable difference on how the TBs are dispatched in the
> hash table, but only by a few percents (slightly more on ppc). I am not

Here I meant how TBs are dispatched after my patch has been applied.
diff --git a/exec-all.h b/exec-all.h
index 6821b17..a4b75bd 100644
--- a/exec-all.h
+++ b/exec-all.h
@@ -177,7 +177,7 @@ static inline unsigned int tb_jmp_cache_hash_func(target_ulong pc)
 
 static inline unsigned int tb_phys_hash_func(tb_page_addr_t pc)
 {
-    return pc & (CODE_GEN_PHYS_HASH_SIZE - 1);
+    return (pc >> 2) & (CODE_GEN_PHYS_HASH_SIZE - 1);
 }
 
 TranslationBlock *tb_alloc(target_ulong pc);
Most emulated CPUs have instructions aligned on 16 or 32 bits, while
on others GCC tries to align the target jump location. This means that
1/2 or 3/4 of tb_phys_hash entries are never used.

Update the hash function tb_phys_hash_func() to ignore the two lowest
bits of the address. This brings a 6% speed-up when booting a MIPS
image.

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 exec-all.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)
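The "1/2 or 3/4 of entries are never used" figure can be checked with a small standalone sketch. This is not QEMU code: `HASH_SIZE`, `hash_old`, `hash_new` and `buckets_used` are names assumed for illustration, the table size is arbitrary, and the addresses are synthetic, evenly spaced values rather than real translation block addresses.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative table size only; assumed to be a power of two. */
#define HASH_BITS 10
#define HASH_SIZE (1u << HASH_BITS)

/* Same shape as the hash before the patch: low address bits used as is. */
static unsigned int hash_old(uint64_t pc)
{
    return pc & (HASH_SIZE - 1);
}

/* Same shape as the hash after the patch: two lowest bits dropped first. */
static unsigned int hash_new(uint64_t pc)
{
    return (pc >> 2) & (HASH_SIZE - 1);
}

/* Count the distinct buckets hit by `count` addresses that start at 0
 * and are spaced `stride` bytes apart. */
static unsigned int buckets_used(unsigned int (*hash)(uint64_t),
                                 uint64_t stride, size_t count)
{
    unsigned char seen[HASH_SIZE] = {0};
    unsigned int used = 0;

    for (size_t i = 0; i < count; i++) {
        unsigned int h = hash(i * stride);
        if (!seen[h]) {
            seen[h] = 1;
            used++;
        }
    }
    return used;
}
```

For 4096 addresses with a 4-byte stride, `buckets_used(hash_old, 4, 4096)` reaches only 256 of the 1024 buckets (3/4 unused), while `buckets_used(hash_new, 4, 4096)` reaches all 1024. With a 2-byte stride the old hash reaches 512 buckets (1/2 unused) and the new one again reaches all 1024.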