Speed up genattrtab

Hi,

okay, let's try again, after many years.  Maybe this time :)

This speeds up genattrtab enough that it is no longer a time sink during
bootstrap.
Over the years I worked on many approaches to this.  My first one was to 
throttle down the optimization, then I completely removed the 
optimization, then I implemented different kinds of optimizations, then I 
combined them with the throttled down ones, and now I'm back to more or 
less only throttling the optimizations.

Obviously switching off all optimizations creates the fastest combination 
of genattrtab+compiling insn-attrtab.c.  But it has some effect on the 
overall speed of the compiler.  Not too bad as I rectified this a bit, but 
maybe too much to be acceptable to everyone.  But for all my development
over the last years I have used a genattrtab modified that way, and it's
really nice :)

Now, after much benchmarking last week I'm proposing the below patch.  It
does not get rid of all optimizations, but throttles them significantly
(plus it reorders the computation so that lowering the limits doesn't have
too much effect for small attributes).

Numbers follow.  First the architecture, then the version of genattrtab, 
then four numbers:
 gen_u == seconds to run an optimized (!) genattrtab
 st1_u == seconds to compile generated insn-attrtab.c with an optimized
          cc1
 big_u == seconds to compile an artificial piece of code that generates
          large functions, many loops, scheduling opportunities
 kde_u == seconds to compile kdecore.cc, a one-file variant of an older 
          version of libkdecore.

The genattrtab versions are: clean == as in SVN; try3 == no call to
optimize_attrs, otherwise the same as the proposed patch; try == the
proposed variant.  These measurements were taken on genattrtab versions
that didn't contain the latest changes to support enum attributes, but
those have no speed effect (I've checked for some combinations).

arch     name   gen_u  st1_u  big_u  kde_u
alpha    clean  0      0.75   43.21  32.52
alpha    try3   0      0.26   44.19  32.85
alpha    try    0      0.35   43.26  32.50
arm      clean  6      19.66  49.25  37.89
arm      try3   0      1.78   50.04  38.00
arm      try    2      2.35   49.85  38.10
crisv32  clean  0      0.21   36.17  27.33
crisv32  try3   0      0.15   36.49  27.53
crisv32  try    0      0.23   36.21  27.43
hppa     clean  0      1.11   46.77  31.97
hppa     try3   0      0.58   46.85  32.00
hppa     try    0      0.64   46.97  31.84
i386     clean  38     34.25  33.51  29.93
i386     try3   1      1.99   34.26  30.64
i386     try    6      2.31   33.78  30.12
ia64     clean  1      1.88   66.55  49.81
ia64     try3   0      0.71   67.08  50.33
ia64     try    0      0.95   66.62  49.81
mips     clean  74     17.08  51.23  n/a
mips     try3   0      1.74   52.11  n/a
mips     try    4      2.29   50.77  n/a
powerpc  clean  56     48.59  49.74  34.15
powerpc  try3   0      2.59   50.60  34.97
powerpc  try    5      2.17   49.38  34.71
s390x    clean  0      1.82   47.26  32.83
s390x    try3   0      0.62   47.63  33.75
s390x    try    0      0.78   47.41  33.64
sh       clean  0      1.46   50.99  38.09
sh       try3   0      0.68   51.05  38.30
sh       try    0      0.91   50.79  38.13
sparc    clean  0      1.11   44.21  32.98
sparc    try3   0      0.57   44.45  33.22
sparc    try    0      0.57   43.27  32.93
x86_64   clean  52     43.81  28.78  28.72
x86_64   try3   1      2.16   29.22  29.40
x86_64   try    6      2.98   28.96  28.96

(mips wasn't able to compile kdecore.cc.)  These are all cross compilers
to $arch-linux, all running on the same host machine (an x86_64-linux
Core i7 machine).  As said, I'm proposing "try", so compare the first and
third rows for each architecture.  It will hugely help i386, mips, powerpc
and x86_64, and arm a bit; the others aren't a problem right now anyway.
The speed difference of the compiler is acceptable I think: it is either
in the noise or sometimes even a speedup (probably cache effects, because
the .text size of insn-attrtab is _much_ smaller).
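To put a number on "acceptable", here is a small sketch that computes the
relative change in big_u between "clean" and "try" for the five
architectures that benefit most.  The figures are copied from the table
above; the names big_u_pairs and pct_change are only illustrative and not
part of the patch.

```python
# big_u in seconds as (clean, try), copied from the table above.
big_u_pairs = {
    "arm":     (49.25, 49.85),
    "i386":    (33.51, 33.78),
    "mips":    (51.23, 50.77),
    "powerpc": (49.74, 49.38),
    "x86_64":  (28.78, 28.96),
}


def pct_change(clean, tried):
    """Relative change in percent; a negative value means "try" is faster."""
    return (tried - clean) / clean * 100.0


def big_u_changes():
    """Map each architecture to its big_u change in percent."""
    return {arch: pct_change(c, t) for arch, (c, t) in big_u_pairs.items()}


if __name__ == "__main__":
    for arch, change in big_u_changes().items():
        print(f"{arch:8s} {change:+.2f}%")
```

A positive value is a slowdown of the resulting compiler on the big_u test,
a negative one a speedup.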

So, if included, we go from 95 seconds to 9 seconds for x86_64 with an
optimized cc1 (gen_u plus st1_u).  The difference will be even larger for
stage2 (using a possibly unoptimized cc1).
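As a quick cross-check of those totals, the sketch below adds gen_u and
st1_u per architecture for "clean" and "try".  The numbers are copied from
the table above; the names timings, total and summary are made up for this
illustration.

```python
# (gen_u, st1_u) in seconds, copied from the table above.
timings = {
    "arm":     {"clean": (6, 19.66),  "try": (2, 2.35)},
    "i386":    {"clean": (38, 34.25), "try": (6, 2.31)},
    "mips":    {"clean": (74, 17.08), "try": (4, 2.29)},
    "powerpc": {"clean": (56, 48.59), "try": (5, 2.17)},
    "x86_64":  {"clean": (52, 43.81), "try": (6, 2.98)},
}


def total(arch, variant):
    """Time to run genattrtab plus time to compile insn-attrtab.c."""
    gen_u, st1_u = timings[arch][variant]
    return gen_u + st1_u


def summary():
    """Map each architecture to its (clean, try) totals in seconds."""
    return {
        arch: (round(total(arch, "clean"), 2), round(total(arch, "try"), 2))
        for arch in timings
    }


if __name__ == "__main__":
    for arch, (before, after) in summary().items():
        print(f"{arch:8s} {before:7.2f} s -> {after:6.2f} s")
```

The x86_64 entry is where the "95 seconds to 9 seconds" figure comes from.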

As said, I've been bootstrapping with variants of this for years, but of
course I'm currently regstrapping this on x86_64-linux.  Okay for trunk?

Ciao,
Michael.
