Message ID | Pine.LNX.4.64.1006151807190.15724@wotan.suse.de |
---|---|
State | New |
Headers | show |
Michael Matz wrote: > This speeds up genattrtab to be no time issue during bootstrap anymore. > The speed difference > of the compiler is acceptable I think, actually even speeding up the > compiler sometimes (probably cache effects, because the .text size of > insn-attrtab is _much_ smaller) or being in the noise. I certainly see how this benefits us as GCC developers. But, it does look like it has a negative effect on overall compile-time, even though in some cases it helps a bit. The effect seems negative, in the aggregate, though I didn't try to dump your table into a spreadsheet and actually get a number out. I don't think we should help ourselves at the expense of the user's experience. I happened to be talking to someone else this morning about this same issue. Would it help just to organize the Makefile in such a way that this file is being built first, so that on a parallel machine it's not the last thing being built? This person observed that in many cases make seems to wait to *start* building insn-attrtab until very late in the build process, meaning that all the other cores are idle while it's building. Thanks,
> Michael Matz wrote: > > > This speeds up genattrtab to be no time issue during bootstrap anymore. > > > The speed difference > > of the compiler is acceptable I think, actually even speeding up the > > compiler sometimes (probably cache effects, because the .text size of > > insn-attrtab is _much_ smaller) or being in the noise. > > I certainly see how this benefits us as GCC developers. But, it does > look like it has a negative effect on overall compile-time, even though > in some cases it helps a bit. The effect seems negative, in the > aggregate, though I didn't try to dump your table into a spreadsheet and > actually get a number out. I don't think we should help ourselves at > the expense of the user's experience. > > I happened to be talking to someone else this morning about this same > issue. Would it help just to organize the Makefile in such a way that > this file is being built first, so that on a parallel machine it's not > the last thing being built? This person observed that in many cases > make seems to wait to *start* building insn-attrtab until very late in > the build process, meaning that all the other cores are idle while it's > building. In WHOPR compilation we now partition the program based on source files (this is something I plan to revisit later), so we get partition that corresponds to insn-attrtab. WPA driver sorts units by size and feeds them to parallel make and insn-attrtab is still last on finishing build by quite long shot. This ignore time spent by genattrtab itself. So just reordering makefiles is not going to solve the problem. I also think insn-attrtab starts late because genattrtab takes long time to generate it. insn-attrtab on i386 also has one dominating function. With WHOPR this can be solved by better partitioning algorithm along with function splitting (I have prototypes of both the second I plan to contribute soon after some bugfixing). Honza > > Thanks, > > -- > Mark Mitchell > CodeSourcery > mark@codesourcery.com > (650) 331-3385 x713
On 15/06/2010 18:19, Mark Mitchell wrote: > Michael Matz wrote: > >> This speeds up genattrtab to be no time issue during bootstrap anymore. > >> The speed difference >> of the compiler is acceptable I think, actually even speeding up the >> compiler sometimes (probably cache effects, because the .text size of >> insn-attrtab is _much_ smaller) or being in the noise. > > I certainly see how this benefits us as GCC developers. 90 seconds out of a whole build, or did I misread something? cheers, DaveK
Jan Hubicka wrote: > In WHOPR compilation we now partition the program based on source files (this > is something I plan to revisit later), so we get partition that corresponds to > insn-attrtab. WPA driver sorts units by size and feeds them to parallel make > and insn-attrtab is still last on finishing build by quite long shot. This > ignore time spent by genattrtab itself. > > So just reordering makefiles is not going to solve the problem. Well, it might solve the problem for developers not using WHOPR. :-) I certainly don't claim it's a general solution; at some point, if you have enouh cores, the single core compiling insn-attrtab will dominate anyhow. But if some easy Makefile hackery would help some developers, I'd still argue we should do it. And I still have a hard time with a change that's going to make the compiler slower in order to save a minute or two during compiler builds, especially given that test cycles are so much longer than build cycles.
* Mark Mitchell wrote on Tue, Jun 15, 2010 at 08:39:47PM CEST: > But if some easy Makefile hackery would help some developers, > I'd still argue we should do it. IIRC, this hackery has already been done sometime last year or so, with reordering of objects and the order-only dependencies. Unless the build has changed a lot (haven't checked), it is simply not possible to improve in a generally useful way without changing GNU make to allow for more fine-tuning. It is typically helpful to not increase N in -jN too much over the number of available cores. Cheers, Ralf
On Tue, Jun 15, 2010 at 11:39:47AM -0700, Mark Mitchell wrote: > Jan Hubicka wrote: > > > In WHOPR compilation we now partition the program based on source files (this > > is something I plan to revisit later), so we get partition that corresponds to > > insn-attrtab. WPA driver sorts units by size and feeds them to parallel make > > and insn-attrtab is still last on finishing build by quite long shot. This > > ignore time spent by genattrtab itself. > > > > So just reordering makefiles is not going to solve the problem. > > Well, it might solve the problem for developers not using WHOPR. :-) > > I certainly don't claim it's a general solution; at some point, if you > have enouh cores, the single core compiling insn-attrtab will dominate > anyhow. But if some easy Makefile hackery would help some developers, > I'd still argue we should do it. And I still have a hard time with a > change that's going to make the compiler slower in order to save a > minute or two during compiler builds, especially given that test cycles > are so much longer than build cycles. I believe on x86_64/i686 the most time is spent in compiling internal_dfa_insn_code, primarily because there are so many different schedulings. The insn is a big switch on recog_memoized, where most of the cases first compare ix86_schedule var to some enum. I guess it would be certainly faster to compile to instead split the big function into separate function for each schedule and make internal_dfa_insn_code a function pointer, would need to be benchmarked how it would actually perform at runtime. Jakub
On 06/15/2010 07:57 PM, Dave Korn wrote: > On 15/06/2010 18:19, Mark Mitchell wrote: >> Michael Matz wrote: >> >>> This speeds up genattrtab to be no time issue during bootstrap anymore. >> >>> The speed difference >>> of the compiler is acceptable I think, actually even speeding up the >>> compiler sometimes (probably cache effects, because the .text size of >>> insn-attrtab is _much_ smaller) or being in the noise. >> >> I certainly see how this benefits us as GCC developers. > > 90 seconds out of a whole build, or did I misread something? 90 seconds out of "touch something.md; make cc1" is significant. It's the most sensitive part for day-to-day backend development. Here is Michael's data reformatted: big_u kde_u clean try clean try alpha 43.21 43.26 0.12% 32.52 32.50 -0.06% arm 49.25 49.85 1.22% 37.89 38.10 0.55% crisv32 36.17 36.21 0.11% 27.33 27.43 0.37% hppa 46.77 46.97 0.43% 31.97 31.84 -0.41% i386 33.51 33.78 0.81% 29.93 30.12 0.63% ia64 66.55 66.62 0.11% 49.81 49.81 0.00% mips 51.23 50.77 -0.90% powerpc 49.74 49.38 -0.72% 34.15 34.71 1.64% s390x 47.26 47.41 0.32% 32.83 33.64 2.47% sh 50.99 50.79 -0.39% 38.09 38.13 0.11% sparc 44.21 43.27 -2.13% 32.98 32.93 -0.15% x86_64 28.78 28.96 0.63% 28.72 28.96 0.84% It's a tougher call than it seems... Unfortunately, of the four likely most used platforms (i386, x86_64, mips and arm) three are degraded consistently, and the fourth lacks one of the two data points. That's indeed not too good. :-/ Compiling genattrtab at -O1 saves half of the time on x86_64, and cfgcleanup goes from 20 to 0.7 seconds. So it should be worthwhile investigating _which_ cfgcleanup pass is spending so much time. Compiling insn-recog.c at -O1 instead saves 30% of the compilation time. I used to like Michael's patches a lot but nowadays parallelization of the build is much better thanks to more cores in the systems. Since genattrtab can run in parallel with most of the rest of the build, what we need to save time on is "touch x.md; make -j6 cc1" (with the insn-attrtab.c move-if-changed rule disabled of course). Maybe -O1 is enough together with Jakub's old patch to split insn-attrtab.c into four or eight files. Moreover, the "split" idea could be applied to insn-recog.c too. Michael, can you get numbers for your testcases when you compile "clean" insn-attrtab.c and/or insn-recog.c with -O1? Paolo
On 06/15/2010 10:57 AM, Dave Korn wrote: > On 15/06/2010 18:19, Mark Mitchell wrote: >> Michael Matz wrote: >> >>> This speeds up genattrtab to be no time issue during bootstrap anymore. >> >>> The speed difference >>> of the compiler is acceptable I think, actually even speeding up the >>> compiler sometimes (probably cache effects, because the .text size of >>> insn-attrtab is _much_ smaller) or being in the noise. >> >> I certainly see how this benefits us as GCC developers. > > 90 seconds out of a whole build, or did I misread something? > For me, it is more like 20 minutes. As Mark pointed out, the Makefile for some reason starts running genattrtab close to last in stages 2 and 3. So in my case (12 CPU mips64-linux), genattrtab takes about 50% of the wall-clock time to build a pass. David Daney
On Tue, Jun 15, 2010 at 8:39 PM, Mark Mitchell <mark@codesourcery.com> wrote: > Jan Hubicka wrote: > >> In WHOPR compilation we now partition the program based on source files (this >> is something I plan to revisit later), so we get partition that corresponds to >> insn-attrtab. WPA driver sorts units by size and feeds them to parallel make >> and insn-attrtab is still last on finishing build by quite long shot. This >> ignore time spent by genattrtab itself. >> >> So just reordering makefiles is not going to solve the problem. > > Well, it might solve the problem for developers not using WHOPR. :-) > > I certainly don't claim it's a general solution; at some point, if you > have enouh cores, the single core compiling insn-attrtab will dominate > anyhow. But if some easy Makefile hackery would help some developers, > I'd still argue we should do it. And I still have a hard time with a > change that's going to make the compiler slower in order to save a > minute or two during compiler builds, especially given that test cycles > are so much longer than build cycles. I think the makefile ordering is already good - it is just that genattrtab takes so much time that compiling insn-attrtab starts very late. Michael addresses this by speeding up genattrtab. Richard. > -- > Mark Mitchell > CodeSourcery > mark@codesourcery.com > (650) 331-3385 x713 >
On Wed, Jun 16, 2010 at 09:15:39AM +0200, Richard Guenther wrote: > On Tue, Jun 15, 2010 at 8:39 PM, Mark Mitchell <mark@codesourcery.com> wrote: > > I certainly don't claim it's a general solution; at some point, if you > > have enouh cores, the single core compiling insn-attrtab will dominate > > anyhow. But if some easy Makefile hackery would help some developers, > > I'd still argue we should do it. And I still have a hard time with a > > change that's going to make the compiler slower in order to save a > > minute or two during compiler builds, especially given that test cycles > > are so much longer than build cycles. > > I think the makefile ordering is already good - it is just that genattrtab > takes so much time that compiling insn-attrtab starts very late. Michael > addresses this by speeding up genattrtab. Pardon the dumb question here, but IIUC, we build insn-attrtab at each stage. Why not just build it once for all stages? Do we gain anything (besides less complicated Makefiles) by building it at every stage? I realize that this is not quite as nice as cutting its build time by 90%+, but moving it earlier might hide the compilation latency just as effectively. -Nathan
Hi, On Wed, 16 Jun 2010, Nathan Froyd wrote: > On Wed, Jun 16, 2010 at 09:15:39AM +0200, Richard Guenther wrote: > > On Tue, Jun 15, 2010 at 8:39 PM, Mark Mitchell <mark@codesourcery.com> wrote: > > > I certainly don't claim it's a general solution; at some point, if you > > > have enouh cores, the single core compiling insn-attrtab will dominate > > > anyhow. But if some easy Makefile hackery would help some developers, > > > I'd still argue we should do it. And I still have a hard time with a > > > change that's going to make the compiler slower in order to save a > > > minute or two during compiler builds, especially given that test cycles > > > are so much longer than build cycles. > > > > I think the makefile ordering is already good - it is just that genattrtab > > takes so much time that compiling insn-attrtab starts very late. Michael > > addresses this by speeding up genattrtab. > > Pardon the dumb question here, but IIUC, we build insn-attrtab at each > stage. Why not just build it once for all stages? Because that's the definition of bootstrapping. Either we bootstrap, or we don't. It doesn't make sense to bootstrap only half the sources. Ciao, Michael.
On Wed, Jun 16, 2010 at 03:23:49PM +0200, Michael Matz wrote: > On Wed, 16 Jun 2010, Nathan Froyd wrote: > > On Wed, Jun 16, 2010 at 09:15:39AM +0200, Richard Guenther wrote: > > > I think the makefile ordering is already good - it is just that genattrtab > > > takes so much time that compiling insn-attrtab starts very late. Michael > > > addresses this by speeding up genattrtab. > > > > Pardon the dumb question here, but IIUC, we build insn-attrtab at each > > stage. Why not just build it once for all stages? > > Because that's the definition of bootstrapping. Either we bootstrap, or > we don't. It doesn't make sense to bootstrap only half the sources. I'm not asking why we compile insn-attrtab at every stage, but why we run genattrtab at every stage. I understand that it's nice to compile genattrtab at each stage for a little sanity checking, but the compiler used to compile genattrtab should have no effect on the output of genattrtab. Or were you already responding to "why generate insn-attrtab at every stage"? -Nathan
On Jun 16, 2010, at 6:37 AM, Nathan Froyd <froydnj@codesourcery.com> wrote: > On Wed, Jun 16, 2010 at 03:23:49PM +0200, Michael Matz wrote: >> On Wed, 16 Jun 2010, Nathan Froyd wrote: >>> On Wed, Jun 16, 2010 at 09:15:39AM +0200, Richard Guenther wrote: >>>> I think the makefile ordering is already good - it is just that genattrtab >>>> takes so much time that compiling insn-attrtab starts very late. Michael >>>> addresses this by speeding up genattrtab. >>> >>> Pardon the dumb question here, but IIUC, we build insn-attrtab at each >>> stage. Why not just build it once for all stages? >> >> Because that's the definition of bootstrapping. Either we bootstrap, or >> we don't. It doesn't make sense to bootstrap only half the sources. > > I'm not asking why we compile insn-attrtab at every stage, but why we > run genattrtab at every stage. I understand that it's nice to compile > genattrtab at each stage for a little sanity checking, but the compiler > used to compile genattrtab should have no effect on the output of > genattrtab. That is true, unless there are bugs. We compile with gcc , and then test to ensure there are no bugs. In this fashion, 100% of the source that is compiled, is compiled with gcc, the very definition of bootstrap. People that want more speed at the expense of testing, don't bootstrap.
Hi, On Wed, 16 Jun 2010, Nathan Froyd wrote: > > > Pardon the dumb question here, but IIUC, we build insn-attrtab at > > > each stage. Why not just build it once for all stages? > > > > Because that's the definition of bootstrapping. Either we bootstrap, > > or we don't. It doesn't make sense to bootstrap only half the > > sources. > > I'm not asking why we compile insn-attrtab at every stage, but why we > run genattrtab at every stage. I understand that it's nice to compile > genattrtab at each stage for a little sanity checking, but the compiler > used to compile genattrtab should have no effect on the output of > genattrtab. If we were going that route, then we wouldn't need bootstrapping at all, because the compiler used to compile $random.c should have no effect on the output of the final cc1. > Or were you already responding to "why generate insn-attrtab at every > stage"? Implicitely, yes. Ciao, Michael.
> Hi, > > On Wed, 16 Jun 2010, Nathan Froyd wrote: > > > > > Pardon the dumb question here, but IIUC, we build insn-attrtab at > > > > each stage. Why not just build it once for all stages? > > > > > > Because that's the definition of bootstrapping. Either we bootstrap, > > > or we don't. It doesn't make sense to bootstrap only half the > > > sources. > > > > I'm not asking why we compile insn-attrtab at every stage, but why we > > run genattrtab at every stage. I understand that it's nice to compile > > genattrtab at each stage for a little sanity checking, but the compiler > > used to compile genattrtab should have no effect on the output of > > genattrtab. > > If we were going that route, then we wouldn't need bootstrapping at all, > because the compiler used to compile $random.c should have no effect on > the output of the final cc1. > > > Or were you already responding to "why generate insn-attrtab at every > > stage"? > > Implicitely, yes. With a lot of cores, we might actually execute getnattrtab and compilation of stage1 insn-attrtab.c at same time and then just sanity check that genattrtab did not change results :) Honza > > > Ciao, > Michael.
On 06/16/2010 05:51 PM, Jan Hubicka wrote: > With a lot of cores, we might actually execute getnattrtab and compilation > of stage1 insn-attrtab.c at same time and then just sanity check that genattrtab > did not change results :) That's a different story. It doesn't break the idea of bootstrapping, and can be applied to all insn-* files. Paolo
Hi, On Wed, 16 Jun 2010, Paolo Bonzini wrote: > On 06/16/2010 05:51 PM, Jan Hubicka wrote: > > With a lot of cores, we might actually execute getnattrtab and compilation > > of stage1 insn-attrtab.c at same time and then just sanity check that > > genattrtab > > did not change results :) > > That's a different story. It doesn't break the idea of bootstrapping, > and can be applied to all insn-* files. But it also doesn't give any gains. You still would have to run genattrtab (to get something to compare with), and you still would have to compile insn-attrtab.c (to check that cc1 produced the same code as last stage). Ciao, Michael.
> Hi, > > On Wed, 16 Jun 2010, Paolo Bonzini wrote: > > > On 06/16/2010 05:51 PM, Jan Hubicka wrote: > > > With a lot of cores, we might actually execute getnattrtab and compilation > > > of stage1 insn-attrtab.c at same time and then just sanity check that > > > genattrtab > > > did not change results :) > > > > That's a different story. It doesn't break the idea of bootstrapping, > > and can be applied to all insn-* files. > > But it also doesn't give any gains. You still would have to run > genattrtab (to get something to compare with), and you still would have to > compile insn-attrtab.c (to check that cc1 produced the same code as last > stage). This will let you to start insn-attrtab compilation as first thing during build, so you gain parallelizm in between genattrtab and insn-attrtab compilation for stage>1. Obviously as the insn-attrtab compilation time is still a bottleneck in whole build (as can be seen in whopr link), we won't solve this completely, but we will get genattrtab runtime less relevant to the whole build time. Honza > > > Ciao, > Michael.
On 06/17/2010 01:39 PM, Michael Matz wrote: > On Wed, 16 Jun 2010, Paolo Bonzini wrote: >> On 06/16/2010 05:51 PM, Jan Hubicka wrote: >>> With a lot of cores, we might actually execute getnattrtab and compilation >>> of stage1 insn-attrtab.c at same time and then just sanity check that >>> genattrtab >>> did not change results :) >> >> That's a different story. It doesn't break the idea of bootstrapping, >> and can be applied to all insn-* files. > > But it also doesn't give any gains. You still would have to run > genattrtab (to get something to compare with), and you still would have to > compile insn-attrtab.c (to check that cc1 produced the same code as last > stage). You can parallelize those two, so you would still get a 50% gain on a beefy-enough machine (pre Jakub's patch). However, the insn_dfa_code split makes the serialization of genattrtab and insn-attrtab.o compilation much less troublesome. Paolo
Hi, On Thu, 17 Jun 2010, Jan Hubicka wrote: > > > That's a different story. It doesn't break the idea of > > > bootstrapping, and can be applied to all insn-* files. > > > > But it also doesn't give any gains. You still would have to run > > genattrtab (to get something to compare with), and you still would > > have to compile insn-attrtab.c (to check that cc1 produced the same > > code as last stage). > > This will let you to start insn-attrtab compilation as first thing > during build, And then restart if the genattrtab result turns out to be different? Or just stop compilation? Ciao, Michael.
> Hi, > > On Thu, 17 Jun 2010, Jan Hubicka wrote: > > > > > That's a different story. It doesn't break the idea of > > > > bootstrapping, and can be applied to all insn-* files. > > > > > > But it also doesn't give any gains. You still would have to run > > > genattrtab (to get something to compare with), and you still would > > > have to compile insn-attrtab.c (to check that cc1 produced the same > > > code as last stage). > > > > This will let you to start insn-attrtab compilation as first thing > > during build, > > And then restart if the genattrtab result turns out to be different? Or > just stop compilation? Stop compilation. They should be the same, right? Honza
Index: genattrtab.c =================================================================== --- genattrtab.c (revision 160784) +++ genattrtab.c (working copy) @@ -241,6 +241,7 @@ static const char *length_str; static const char *delay_type_str; static const char *delay_1_0_str; static const char *num_delay_slots_str; +static const char *insn_code_str; /* Simplify an expression. Only call the routine if there is something to simplify. */ @@ -1621,6 +1622,58 @@ write_length_unit_log (void) printf ("EXPORTED_CONST int length_unit_log = %u;\n", length_unit_log); } +/* Compute approximate cost of the expression. Used to decide whether + expression is cheap enough for inline. */ +static int +attr_rtx_cost (rtx x) +{ + int cost = 1; + enum rtx_code code; + if (!x) + return 0; + code = GET_CODE (x); + switch (code) + { + case MATCH_OPERAND: + if (XSTR (x, 1)[0]) + return 10; + else + return 1; + + case EQ_ATTR_ALT: + return 1; + + case EQ_ATTR: + /* Alternatives don't result into function call. */ + if (!strcmp_check (XSTR (x, 0), alternative_name) + || !strcmp_check (XSTR (x, 0), insn_code_str)) + return 1; + else + return 5; + default: + { + int i, j; + const char *fmt = GET_RTX_FORMAT (code); + for (i = GET_RTX_LENGTH (code) - 1; i >= 0; i--) + { + switch (fmt[i]) + { + case 'V': + case 'E': + for (j = 0; j < XVECLEN (x, i); j++) + cost += attr_rtx_cost (XVECEXP (x, i, j)); + break; + case 'e': + cost += attr_rtx_cost (XEXP (x, i)); + break; + } + } + } + break; + } + return cost; +} + /* Take a COND expression and see if any of the conditions in it can be simplified. If any are known true or known false for the particular insn code, the COND can be further simplified. @@ -1744,6 +1797,111 @@ simplify_cond (rtx exp, int insn_code, i return ret; } +/* Taken a COND expression EXP and a value AV, and returns a possibly + optimized variant of EXP. This is similar to simplify_cond, but + instead of optimizing for just one instruction we optimize for the + set of instructions found in AV. */ + +static rtx +simplify_cond_insn_list (rtx exp, struct attr_value *av) +{ + int i; + /* We store the desired contents here, + then build a new expression if they don't match EXP. */ + rtx defval = XEXP (exp, 1); + rtx new_defval = XEXP (exp, 1); + int len = XVECLEN (exp, 0); + rtx *tests; + int allsame = 1; + rtx ret; + + if (av->has_asm_insn) + return exp; + + tests = XNEWVEC (rtx, len); + /* This lets us free all storage allocated below, if appropriate. */ + obstack_finish (rtl_obstack); + + memcpy (tests, XVEC (exp, 0)->elem, len * sizeof (rtx)); + + /* See if default value needs simplification. */ + if (GET_CODE (defval) == COND) + new_defval = simplify_cond_insn_list (defval, av); + + for (i = 0; i < len; i += 2) + { + rtx newtest, newval; + struct insn_ent *ie; + int oldcost; + + oldcost = attr_rtx_cost (XVECEXP (exp, 0, i)); + tests[i] = false_rtx; + for (ie = av->first_insn; ie != 0; ie = ie->next) + { + rtx orig_test = XVECEXP (exp, 0, i); + gcc_assert (ie->def->insn_code >= 0); + newtest = simplify_test_exp_in_temp (orig_test, ie->def->insn_code, + ie->def->insn_index); + /* We couldn't simplify the expression for this insn code, + so it makes no sense to try further, as syntactically we + can't shorten the test anymore. Further, also the + simplifications done up to now can be removed. */ + if (newtest == orig_test) + { + tests[i] = orig_test; + break; + } + if (newtest != false_rtx) + { + newtest = attr_rtx (AND, + attr_eq (insn_code_str, + attr_numeral (ie->def->insn_code)), + newtest); + tests[i] = insert_right_side (IOR, tests[i], newtest, -2, -2); + } + } + + if (attr_rtx_cost (tests[i]) > oldcost) + tests[i] = XVECEXP (exp, 0, i); + + /* Simplify this test. */ + newtest = simplify_test_exp_in_temp (tests[i], -2, -2); + tests[i] = newtest; + + newval = tests[i + 1]; + /* See if this value may need simplification. */ + if (GET_CODE (newval) == COND) + newval = simplify_cond_insn_list (newval, av); + + tests[i + 1] = newval; + } + + /* See if we changed anything. */ + if (len != XVECLEN (exp, 0) || new_defval != XEXP (exp, 1)) + allsame = 0; + else + for (i = 0; i < len; i++) + if (! attr_equal_p (tests[i], XVECEXP (exp, 0, i))) + { + allsame = 0; + break; + } + + if (allsame) + ret = exp; + else + { + rtx newexp = rtx_alloc (COND); + + XVEC (newexp, 0) = rtvec_alloc (len); + memcpy (XVEC (newexp, 0)->elem, tests, len * sizeof (rtx)); + XEXP (newexp, 1) = new_defval; + ret = newexp; + } + free (tests); + return ret; +} + /* Remove an insn entry from an attribute value. */ static void @@ -2216,57 +2374,6 @@ simplify_or_tree (rtx exp, rtx *pterm, i return exp; } -/* Compute approximate cost of the expression. Used to decide whether - expression is cheap enough for inline. */ -static int -attr_rtx_cost (rtx x) -{ - int cost = 0; - enum rtx_code code; - if (!x) - return 0; - code = GET_CODE (x); - switch (code) - { - case MATCH_OPERAND: - if (XSTR (x, 1)[0]) - return 10; - else - return 0; - - case EQ_ATTR_ALT: - return 0; - - case EQ_ATTR: - /* Alternatives don't result into function call. */ - if (!strcmp_check (XSTR (x, 0), alternative_name)) - return 0; - else - return 5; - default: - { - int i, j; - const char *fmt = GET_RTX_FORMAT (code); - for (i = GET_RTX_LENGTH (code) - 1; i >= 0; i--) - { - switch (fmt[i]) - { - case 'V': - case 'E': - for (j = 0; j < XVECLEN (x, i); j++) - cost += attr_rtx_cost (XVECEXP (x, i, j)); - break; - case 'e': - cost += attr_rtx_cost (XEXP (x, i)); - break; - } - } - } - break; - } - return cost; -} - /* Simplify test expression and use temporary obstack in order to avoid memory bloat. Use ATTR_IND_SIMPLIFIED to avoid unnecessary simplifications and avoid unnecessary copying if possible. */ @@ -2599,6 +2706,25 @@ simplify_test_exp (rtx exp, int insn_cod return SIMPLIFY_TEST_EXP (newexp, insn_code, insn_index); } + /* Similarly, + convert (ior (and (y) (x)) + (and (z) (x))) + to (and (ior (y) (z)) + (x)) + Note that we want the common term to stay at the end. + */ + + else if (GET_CODE (left) == AND && GET_CODE (right) == AND + && attr_equal_p (XEXP (left, 1), XEXP (right, 1))) + { + newexp = attr_rtx (IOR, XEXP (left, 0), XEXP (right, 0)); + + left = newexp; + right = XEXP (right, 1); + newexp = attr_rtx (AND, left, right); + return SIMPLIFY_TEST_EXP (newexp, insn_code, insn_index); + } + /* See if all or all but one of the insn's alternatives are specified in this tree. Optimize if so. */ @@ -2702,6 +2828,13 @@ simplify_test_exp (rtx exp, int insn_cod break; } + if (XSTR (exp, 0) == insn_code_str) + { + if (insn_code >= 0) + newexp = atoi (XSTR (exp, 1)) == insn_code ? true_rtx : false_rtx; + break; + } + /* Look at the value for this insn code in the specified attribute. We normally can replace this comparison with the condition that would give this insn the values being tested for. */ @@ -2734,7 +2867,7 @@ simplify_test_exp (rtx exp, int insn_cod x = evaluate_eq_attr (exp, attr, av->value, insn_code, insn_index); x = SIMPLIFY_TEST_EXP (x, insn_code, insn_index); - if (attr_rtx_cost(x) < 20) + if (attr_rtx_cost(x) < 7) return x; } } @@ -2754,6 +2887,131 @@ simplify_test_exp (rtx exp, int insn_cod return newexp; } +/* Return 1 if any EQ_ATTR subexpression of P refers to ATTR, + otherwise return 0. */ + +static int +tests_attr_p (rtx p, struct attr_desc *attr) +{ + const char *fmt; + int i, ie, j, je; + + if (GET_CODE (p) == EQ_ATTR) + { + if (XSTR (p, 0) != attr->name) + return 0; + return 1; + } + + fmt = GET_RTX_FORMAT (GET_CODE (p)); + ie = GET_RTX_LENGTH (GET_CODE (p)); + for (i = 0; i < ie; i++) + { + switch (*fmt++) + { + case 'e': + if (tests_attr_p (XEXP (p, i), attr)) + return 1; + break; + + case 'E': + je = XVECLEN (p, i); + for (j = 0; j < je; ++j) + if (tests_attr_p (XVECEXP (p, i, j), attr)) + return 1; + break; + } + } + + return 0; +} + +/* Calculate a topological sorting of all attributes so that + all attributes only depend on attributes in front of it. + Place the result in *RET (which is a pointer to an array of + attr_desc pointers), and return the size of that array. */ + +static int +get_attr_order (struct attr_desc ***ret) +{ + int i, j; + int num = 0; + struct attr_desc *attr; + struct attr_desc **all, **sorted; + char *handled; + for (i = 0; i < MAX_ATTRS_INDEX; i++) + for (attr = attrs[i]; attr; attr = attr->next) + num++; + all = XNEWVEC (struct attr_desc *, num); + sorted = XNEWVEC (struct attr_desc *, num); + handled = XCNEWVEC (char, num); + num = 0; + for (i = 0; i < MAX_ATTRS_INDEX; i++) + for (attr = attrs[i]; attr; attr = attr->next) + all[num++] = attr; + + j = 0; + for (i = 0; i < num; i++) + if (all[i]->is_const) + handled[i] = 1, sorted[j++] = all[i]; + + /* We have only few attributes hence we can live with the inner + loop being O(n^2), unlike the normal fast variants of topological + sorting. */ + while (j < num) + { + for (i = 0; i < num; i++) + if (!handled[i]) + { + /* Let's see if I depends on anything interesting. */ + int k; + for (k = 0; k < num; k++) + if (!handled[k]) + { + struct attr_value *av; + for (av = all[i]->first_value; av; av = av->next) + if (av->num_insns != 0) + if (tests_attr_p (av->value, all[k])) + break; + + if (av) + /* Something in I depends on K. */ + break; + } + if (k == num) + { + /* Nothing in I depended on anything intersting, so + it's done. */ + handled[i] = 1; + sorted[j++] = all[i]; + } + } + } + + for (j = 0; j < num; j++) + { + struct attr_desc *attr2; + struct attr_value *av; + + attr = sorted[j]; + fprintf (stderr, "%s depends on: ", attr->name); + for (i = 0; i < MAX_ATTRS_INDEX; ++i) + for (attr2 = attrs[i]; attr2; attr2 = attr2->next) + if (!attr2->is_const) + for (av = attr->first_value; av; av = av->next) + if (av->num_insns != 0) + if (tests_attr_p (av->value, attr2)) + { + fprintf (stderr, "%s, ", attr2->name); + break; + } + fprintf (stderr, "\n"); + } + free (all); + *ret = sorted; + return num; +} + /* Optimize the attribute lists by seeing if we can determine conditional values from the known values of other attributes. This will save subroutine calls during the compilation. */ @@ -2768,6 +3026,8 @@ optimize_attrs (void) int i; struct attr_value_list *ivbuf; struct attr_value_list *iv; + struct attr_desc **topsort; + int topnum; /* For each insn code, make a list of all the insn_ent's for it, for all values for all attributes. */ @@ -2783,18 +3043,22 @@ optimize_attrs (void) iv = ivbuf = XNEWVEC (struct attr_value_list, num_insn_ents); - for (i = 0; i < MAX_ATTRS_INDEX; i++) - for (attr = attrs[i]; attr; attr = attr->next) - for (av = attr->first_value; av; av = av->next) - for (ie = av->first_insn; ie; ie = ie->next) - { - iv->attr = attr; - iv->av = av; - iv->ie = ie; - iv->next = insn_code_values[ie->def->insn_code]; - insn_code_values[ie->def->insn_code] = iv; - iv++; - } + /* Create the chain of insn*attr values such that we see dependend + attributes after their dependencies. As we use a stack via the + next pointers start from the end of the topological order. */ + topnum = get_attr_order (&topsort); + for (i = topnum - 1; i >= 0; i--) + for (av = topsort[i]->first_value; av; av = av->next) + for (ie = av->first_insn; ie; ie = ie->next) + { + iv->attr = topsort[i]; + iv->av = av; + iv->ie = ie; + iv->next = insn_code_values[ie->def->insn_code]; + insn_code_values[ie->def->insn_code] = iv; + iv++; + } + free (topsort); /* Sanity check on num_insn_ents. */ gcc_assert (iv == ivbuf + num_insn_ents); @@ -2829,7 +3093,15 @@ optimize_attrs (void) } rtl_obstack = old; - if (newexp != av->value) + /* If we created a new value for this instruction, and it's + cheaper than the old value, and overall cheap, use that + one as specific value for the current instruction. + The last test is to avoid exploding the get_attr_ function + sizes for no much gain. */ + if (newexp != av->value + && attr_rtx_cost (newexp) < attr_rtx_cost (av->value) + && attr_rtx_cost (newexp) < 26 + ) { newexp = attr_copy_rtx (newexp); remove_insn_ent (av, ie); @@ -3332,6 +3604,12 @@ write_test_expr (rtx exp, int flags) break; } + if (XSTR (exp, 0) == insn_code_str) + { + printf ("insn_code == %s", XSTR (exp, 1)); + break; + } + attr = find_attr (&XSTR (exp, 0), 0); gcc_assert (attr); @@ -3618,10 +3896,80 @@ walk_attr_value (rtx exp) } } -/* Write out a function to obtain the attribute for a given INSN. */ +/* If a subexpression of P refers to ATTR, write out C code to retrieve + the value of that attribute storing it in a local variable. */ + +static int +write_expr_attr_cache (rtx p, struct attr_desc *attr, int indent) +{ + const char *fmt; + int i, ie, j, je; + + if (GET_CODE (p) == EQ_ATTR) + { + if (XSTR (p, 0) != attr->name) + return 0; + + write_indent (indent); + if (attr->enum_name) + printf (" enum %s ", attr->enum_name); + else if (!attr->is_numeric) + printf (" enum attr_%s ", attr->name); + else + printf (" int "); + + printf ("attr_%s ATTRIBUTE_UNUSED = get_attr_%s (insn);\n", + attr->name, attr->name); + return 1; + } + + fmt = GET_RTX_FORMAT (GET_CODE (p)); + ie = GET_RTX_LENGTH (GET_CODE (p)); + for (i = 0; i < ie; i++) + { + switch (*fmt++) + { + case 'e': + if (write_expr_attr_cache (XEXP (p, i), attr, indent)) + return 1; + break; + + case 'E': + je = XVECLEN (p, i); + for (j = 0; j < je; ++j) + if (write_expr_attr_cache (XVECEXP (p, i, j), attr, indent)) + return 1; + break; + } + } + + return 0; +} + +/* Given a VALUE (possibly having EQ_ATTR tests in subexpression) + write out C code to retrieve the value of all attributes used + in tests embedded therein. */ static void -write_attr_get (struct attr_desc *attr) +write_cache_used_attr_for_value (rtx value, int indent) +{ + struct attr_desc *attr2; + int i; + + for (i = 0; i < MAX_ATTRS_INDEX; ++i) + for (attr2 = attrs[i]; attr2; attr2 = attr2->next) + if (!attr2->is_const) + write_expr_attr_cache (value, attr2, indent); +} + +/* Write out C code to calculate the value of ATTR per instruction. + PREFIX and SUFFIX are used to delimit the 'return' statement delivering + the value to our caller. Either it will form a real return statement, + or an accumulator update. */ + +static void +write_attr_switch (struct attr_desc *attr, int indent, const char *prefix, + const char *suffix) { struct attr_value *av, *common_av; @@ -3629,6 +3977,32 @@ write_attr_get (struct attr_desc *attr) switch we will generate. */ common_av = find_most_used (attr); + write_indent (indent); + printf ("{\n"); + write_indent (indent); + printf (" int insn_code = recog_memoized (insn);\n"); + write_indent (indent); + printf (" switch (insn_code)\n"); + write_indent (indent); + printf (" {\n"); + + for (av = attr->first_value; av; av = av->next) + if (av != common_av) + write_attr_case (attr, av, 1, prefix, suffix, indent + 4, true_rtx); + + write_attr_case (attr, common_av, 0, prefix, suffix, indent + 4, true_rtx); + + write_indent (indent); + printf (" }\n"); + write_indent (indent); + printf ("}\n\n"); +} + +/* Write out a function to obtain the attribute for a given INSN. */ + +static void +write_attr_get (struct attr_desc *attr) +{ /* Write out start of function, then all values with explicit `case' lines, then a `default', then the value with the most uses. */ if (attr->enum_name) @@ -3646,6 +4020,7 @@ write_attr_get (struct attr_desc *attr) printf ("get_attr_%s (rtx insn ATTRIBUTE_UNUSED)\n", attr->name); else { + struct attr_value *av; printf ("get_attr_%s (void)\n", attr->name); printf ("{\n"); @@ -3656,22 +4031,13 @@ write_attr_get (struct attr_desc *attr) av->first_insn->def->insn_index); else if (av->num_insns != 0) write_attr_set (attr, 2, av->value, "return", ";", - true_rtx, -2, 0); + true_rtx, -2, -2); printf ("}\n\n"); return; } - printf ("{\n"); - printf (" switch (recog_memoized (insn))\n"); - printf (" {\n"); - - for (av = attr->first_value; av; av = av->next) - if (av != common_av) - write_attr_case (attr, av, 1, "return", ";", 4, true_rtx); - - write_attr_case (attr, common_av, 0, "return", ";", 4, true_rtx); - printf (" }\n}\n\n"); + write_attr_switch (attr, 0, "return", ";"); } /* Given an AND tree of known true terms (because we are inside an `if' with @@ -3727,6 +4093,10 @@ write_attr_set (struct attr_desc *attr, rtx testexp; rtx inner_true; + /* Reset our_known_true after some time to not accumulate + too much cruft (slowing down genattrtab). */ + if ((i & 31) == 0) + our_known_true = known_true; testexp = eliminate_known_true (our_known_true, XVECEXP (value, 0, i), insn_code, insn_index); @@ -3755,7 +4125,7 @@ write_attr_set (struct attr_desc *attr, write_indent (indent); printf ("%sif ", first_if ? "" : "else "); first_if = 0; - write_test_expr (testexp, 0); + write_test_expr (testexp, 2); printf ("\n"); write_indent (indent + 2); printf ("{\n"); @@ -3820,6 +4190,7 @@ write_attr_case (struct attr_desc *attr, int write_case_lines, const char *prefix, const char *suffix, int indent, rtx known_true) { + rtx opt_val; if (av->num_insns == 0) return; @@ -3843,9 +4214,33 @@ write_attr_case (struct attr_desc *attr, printf ("default:\n"); } + indent += 2; + write_indent (indent); + printf ("{\n"); + + opt_val = av->value; + /* If we have multiple but only few instructions associated with + this value we can possibly optimize this. */ + if (GET_CODE (opt_val) == COND && av->num_insns > 1 && av->num_insns < 10) + opt_val = simplify_cond_insn_list (opt_val, av); + while (GET_CODE (opt_val) == COND) + { + rtx newexp2; + if (av->num_insns == 1) + newexp2 = simplify_cond (opt_val, av->first_insn->def->insn_code, + av->first_insn->def->insn_index); + else + newexp2 = simplify_cond (opt_val, -2, -2); + if (newexp2 == opt_val) + break; + opt_val = newexp2; + } + /* See what we have to do to output this value. */ must_extract = must_constrain = address_used = 0; - walk_attr_value (av->value); + walk_attr_value (opt_val); + + write_cache_used_attr_for_value (opt_val, indent); if (must_constrain) { @@ -3859,11 +4254,11 @@ write_attr_case (struct attr_desc *attr, } if (av->num_insns == 1) - write_attr_set (attr, indent + 2, av->value, prefix, suffix, + write_attr_set (attr, indent + 2, opt_val, prefix, suffix, known_true, av->first_insn->def->insn_code, av->first_insn->def->insn_index); else - write_attr_set (attr, indent + 2, av->value, prefix, suffix, + write_attr_set (attr, indent + 2, opt_val, prefix, suffix, known_true, -2, 0); if (strncmp (prefix, "return", 6)) @@ -3871,6 +4266,8 @@ write_attr_case (struct attr_desc *attr, write_indent (indent + 2); printf ("break;\n"); } + write_indent (indent); + printf ("}\n"); printf ("\n"); } @@ -3993,7 +4390,6 @@ write_eligible_delay (const char *kind) char str[50]; const char *pstr; struct attr_desc *attr; - struct attr_value *av, *common_av; int i; /* Compute the maximum number of delay slots required. We use the delay @@ -4026,19 +4422,11 @@ write_eligible_delay (const char *kind) { attr = find_attr (&delay_type_str, 0); gcc_assert (attr); - common_av = find_most_used (attr); - - printf (" insn = delay_insn;\n"); - printf (" switch (recog_memoized (insn))\n"); - printf (" {\n"); sprintf (str, " * %d;\n break;", max_slots); - for (av = attr->first_value; av; av = av->next) - if (av != common_av) - write_attr_case (attr, av, 1, "slot +=", str, 4, true_rtx); - write_attr_case (attr, common_av, 0, "slot +=", str, 4, true_rtx); - printf (" }\n\n"); + printf (" insn = delay_insn;\n"); + write_attr_switch (attr, 2, "slot +=", str); /* Ensure matched. Otherwise, shouldn't have been called. */ printf (" gcc_assert (slot >= %d);\n\n", max_slots); @@ -4047,20 +4435,10 @@ write_eligible_delay (const char *kind) /* If just one type of delay slot, write simple switch. */ if (num_delays == 1 && max_slots == 1) { - printf (" insn = candidate_insn;\n"); - printf (" switch (recog_memoized (insn))\n"); - printf (" {\n"); - attr = find_attr (&delay_1_0_str, 0); gcc_assert (attr); - common_av = find_most_used (attr); - - for (av = attr->first_value; av; av = av->next) - if (av != common_av) - write_attr_case (attr, av, 1, "return", ";", 4, true_rtx); - - write_attr_case (attr, common_av, 0, "return", ";", 4, true_rtx); - printf (" }\n"); + printf (" insn = candidate_insn;\n"); + write_attr_switch (attr, 2, "return", ";"); } else @@ -4076,21 +4454,12 @@ write_eligible_delay (const char *kind) { printf (" case %d:\n", (i / 3) + (num_delays == 1 ? 0 : delay->num * max_slots)); - printf (" switch (recog_memoized (insn))\n"); - printf ("\t{\n"); sprintf (str, "*%s_%d_%d", kind, delay->num, i / 3); pstr = str; attr = find_attr (&pstr, 0); gcc_assert (attr); - common_av = find_most_used (attr); - - for (av = attr->first_value; av; av = av->next) - if (av != common_av) - write_attr_case (attr, av, 1, "return", ";", 8, true_rtx); - - write_attr_case (attr, common_av, 0, "return", ";", 8, true_rtx); - printf (" }\n"); + write_attr_switch (attr, 6, "return", ";"); } printf (" default:\n"); @@ -4133,7 +4502,7 @@ find_attr (const char **name_p, int crea /* Before we resort to using `strcmp', see if the string address matches anywhere. In most cases, it should have been canonicalized to do so. */ - if (name == alternative_name) + if (name == alternative_name || name == insn_code_str) return NULL; index = name[0] & (MAX_ATTRS_INDEX - 1); @@ -4458,6 +4827,7 @@ main (int argc, char **argv) delay_type_str = DEF_ATTR_STRING ("*delay_type"); delay_1_0_str = DEF_ATTR_STRING ("*delay_1_0"); num_delay_slots_str = DEF_ATTR_STRING ("*num_delay_slots"); + insn_code_str = DEF_ATTR_STRING ("insn_code"); printf ("/* Generated automatically by the program `genattrtab'\n\ from the machine description file `md'. */\n\n"); @@ -4573,7 +4943,7 @@ from the machine description file `md'. /* Construct extra attributes for `length'. */ make_length_attrs (); - /* Perform any possible optimizations to speed up compilation. */ + /* Perform some optimizations to speed up compilation. */ optimize_attrs (); /* Now write out all the `gen_attr_...' routines. Do these before the