Message ID | 4C1C4084.4060200@codesourcery.com |
---|---|
State | New |
Headers | show |
Ping? http://gcc.gnu.org/ml/gcc-patches/2010-06/msg01920.html -Sandra > 2010-06-18 Sandra Loosemore <sandra@codesourcery.com> > > PR middle-end/42505 > > gcc/ > * tree-ssa-loop-ivopts.c (determine_set_costs): Delete obsolete > comments about cost model. > (try_add_cand_for): Add second strategy for choosing initial set > based on original IVs, controlled by ORIGINALP argument. > (get_initial_solution): Add ORIGINALP argument. > (find_optimal_iv_set_1): New function, split from find_optimal_iv_set. > (find_optimal_iv_set): Try two different strategies for choosing > the IV set, and return the one with lower cost. > > * cfgloopanal.c (init_set_costs): Use call_used_regs rather than > fixed_regs to count number of registers available for loop variables. > > gcc/testsuite/ > * gcc.target/arm/pr42505.c: New test case. >
Hi, > Ping? > > http://gcc.gnu.org/ml/gcc-patches/2010-06/msg01920.html what are the effects on the compile time? The tree-ssa-loop-ivopts.c part of the patch is OK, assuming that they are negligible. >> 2010-06-18 Sandra Loosemore <sandra@codesourcery.com> >> >> PR middle-end/42505 >> >> gcc/ >> * tree-ssa-loop-ivopts.c (determine_set_costs): Delete obsolete >> comments about cost model. >> (try_add_cand_for): Add second strategy for choosing initial set >> based on original IVs, controlled by ORIGINALP argument. >> (get_initial_solution): Add ORIGINALP argument. >> (find_optimal_iv_set_1): New function, split from find_optimal_iv_set. >> (find_optimal_iv_set): Try two different strategies for choosing >> the IV set, and return the one with lower cost. As for this change: >> * cfgloopanal.c (init_set_costs): Use call_used_regs rather than >> fixed_regs to count number of registers available for loop variables. Should not we make call_used_regs unavailable only if there is a function call in the loop? Zdenek
Zdenek Dvorak wrote: > Hi, > >> Ping? >> >> http://gcc.gnu.org/ml/gcc-patches/2010-06/msg01920.html > > what are the effects on the compile time? The tree-ssa-loop-ivopts.c part of the > patch is OK, assuming that they are negligible. I looked at the CSiBE compilation times on ARM, and yes, the changes are negligible. If anything, it seems to be a little faster with my patch (smaller code, less work for subsequent passes). > As for this change: > >>> * cfgloopanal.c (init_set_costs): Use call_used_regs rather than >>> fixed_regs to count number of registers available for loop variables. > > Should not we make call_used_regs unavailable only if there is a function call in the > loop? I actually tried that before submitting the patch, and saw that empirically the results were better using call_used_regs all the time. My theory is that, at least on ARM, a couple additional scratch registers are required for loading constants and performing some of the address computations that ivopts thinks are free. It is better to be conservative in ivopts and underestimate register availability rather than risk introducing additional spills, which would totally overwhelm any savings from ivopts. I realize that it's possible to tweak the costs model in other ways, but after a week or two poking at it I felt like it was a black hole. The minimal change I proposed fixes a fairly obvious bug in the current code in a conservatively correct way, and actually helped both code size and speed, unlike other things I tried. ;-) Perhaps someone else would like to take a stab at it and/or can demonstrate that the more complicated version that tests for a function call in the loop wins on some other target, even if it produces worse code on ARM than my simple fix? -Sandra
Hi, >> As for this change: >> >>>> * cfgloopanal.c (init_set_costs): Use call_used_regs rather than >>>> fixed_regs to count number of registers available for loop variables. >> >> Should not we make call_used_regs unavailable only if there is a function call in the >> loop? > > I actually tried that before submitting the patch, and saw that > empirically the results were better using call_used_regs all the time. > My theory is that, at least on ARM, a couple additional scratch registers > are required for loading constants and performing some of the address > computations that ivopts thinks are free. It is better to be > conservative in ivopts and underestimate register availability rather > than risk introducing additional spills, which would totally overwhelm > any savings from ivopts. > > I realize that it's possible to tweak the costs model in other ways, but > after a week or two poking at it I felt like it was a black hole. The > minimal change I proposed fixes a fairly obvious bug in the current code > in a conservatively correct way, and actually helped both code size and > speed, unlike other things I tried. ;-) Perhaps someone else would like > to take a stab at it and/or can demonstrate that the more complicated > version that tests for a function call in the loop wins on some other > target, even if it produces worse code on ARM than my simple fix? my concern is that this change is not obviously correct, and did not get enough testing to warrant it purely on its heuristic value. So, I cannot approve it unless someone picks it up and makes a more detailed analysis of the problem, Zdenek
Zdenek Dvorak wrote: >>> As for this change: >>> >>>>> * cfgloopanal.c (init_set_costs): Use call_used_regs rather than >>>>> fixed_regs to count number of registers available for loop variables. >>> Should not we make call_used_regs unavailable only if there is a function call in the >>> loop? >> I actually tried that before submitting the patch, and saw that >> empirically the results were better using call_used_regs all the time. >> My theory is that, at least on ARM, a couple additional scratch registers >> are required for loading constants and performing some of the address >> computations that ivopts thinks are free. It is better to be >> conservative in ivopts and underestimate register availability rather >> than risk introducing additional spills, which would totally overwhelm >> any savings from ivopts. >> >> I realize that it's possible to tweak the costs model in other ways, but >> after a week or two poking at it I felt like it was a black hole. The >> minimal change I proposed fixes a fairly obvious bug in the current code >> in a conservatively correct way, and actually helped both code size and >> speed, unlike other things I tried. ;-) Perhaps someone else would like >> to take a stab at it and/or can demonstrate that the more complicated >> version that tests for a function call in the loop wins on some other >> target, even if it produces worse code on ARM than my simple fix? > > my concern is that this change is not obviously correct, and did not get > enough testing to warrant it purely on its heuristic value. So, I cannot > approve it unless someone picks it up and makes a more detailed analysis of the > problem, Hrmmmm. Well, the current code is obviously incorrect, so I don't think it's right to do nothing, either. I guess I'll check in the other uncontroversial part of the patch to tree-ssa-loop-ivopts.c first to get that out of the way, and then try resurrecting the more complicated hack for init_set_costs. -Sandra
Hi, >> my concern is that this change is not obviously correct, and did not get >> enough testing to warrant it purely on its heuristic value. So, I cannot >> approve it unless someone picks it up and makes a more detailed analysis of the >> problem, > > Hrmmmm. Well, the current code is obviously incorrect, that is not so obvious to me :-) It is rather plausible that most performance-critical loops contain no calls, which would make it more correct than your proposed change. In particular, your test results only indicate that we underestimate the register pressure, but do not give much insight into the reasons of the underestimation, Zdenek
On Mon, Jul 5, 2010 at 9:43 AM, Sandra Loosemore <sandra@codesourcery.com> wrote: > Zdenek Dvorak wrote: > >>>> As for this change: >>>> >>>>>> * cfgloopanal.c (init_set_costs): Use call_used_regs rather than >>>>>> fixed_regs to count number of registers available for loop >>>>>> variables. >>>> >>>> Should not we make call_used_regs unavailable only if there is a >>>> function call in the >>>> loop? >>> >>> I actually tried that before submitting the patch, and saw that >>> empirically the results were better using call_used_regs all the time. My >>> theory is that, at least on ARM, a couple additional scratch registers are >>> required for loading constants and performing some of the address >>> computations that ivopts thinks are free. It is better to be conservative >>> in ivopts and underestimate register availability rather than risk >>> introducing additional spills, which would totally overwhelm any savings >>> from ivopts. >>> >>> I realize that it's possible to tweak the costs model in other ways, but >>> after a week or two poking at it I felt like it was a black hole. The >>> minimal change I proposed fixes a fairly obvious bug in the current code in >>> a conservatively correct way, and actually helped both code size and speed, >>> unlike other things I tried. ;-) Perhaps someone else would like to take a >>> stab at it and/or can demonstrate that the more complicated version that >>> tests for a function call in the loop wins on some other target, even if it >>> produces worse code on ARM than my simple fix? >> >> my concern is that this change is not obviously correct, and did not get >> enough testing to warrant it purely on its heuristic value. So, I cannot >> approve it unless someone picks it up and makes a more detailed analysis >> of the >> problem, > > Hrmmmm. Well, the current code is obviously incorrect, so I don't think > it's right to do nothing, either. I guess I'll check in the other > uncontroversial part of the patch to tree-ssa-loop-ivopts.c first to get > that out of the way, and then try resurrecting the more complicated hack for > init_set_costs. > Your checkin: http://gcc.gnu.org/ml/gcc-cvs/2010-07/msg00198.html caused: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44838
Index: gcc/tree-ssa-loop-ivopts.c =================================================================== --- gcc/tree-ssa-loop-ivopts.c (revision 160755) +++ gcc/tree-ssa-loop-ivopts.c (working copy) @@ -4444,26 +4444,6 @@ determine_set_costs (struct ivopts_data struct loop *loop = data->current_loop; bitmap_iterator bi; - /* We use the following model (definitely improvable, especially the - cost function -- TODO): - - We estimate the number of registers available (using MD data), name it A. - - We estimate the number of registers used by the loop, name it U. This - number is obtained as the number of loop phi nodes (not counting virtual - registers and bivs) + the number of variables from outside of the loop. - - We set a reserve R (free regs that are used for temporary computations, - etc.). For now the reserve is a constant 3. - - Let I be the number of induction variables. - - -- if U + I + R <= A, the cost is I * SMALL_COST (just not to encourage - make a lot of ivs without a reason). - -- if A - R < U + I <= A, the cost is I * PRES_COST - -- if U + I > A, the cost is I * PRES_COST and - number of uses * SPILL_COST * (U + I - A) / (U + I) is added. */ - if (dump_file && (dump_flags & TDF_DETAILS)) { fprintf (dump_file, "Global costs:\n"); @@ -5087,11 +5067,13 @@ iv_ca_prune (struct ivopts_data *data, s } /* Tries to extend the sets IVS in the best possible way in order - to express the USE. */ + to express the USE. If ORIGINALP is true, prefer candidates from + the original set of IVs, otherwise favor important candidates not + based on any memory object. */ static bool try_add_cand_for (struct ivopts_data *data, struct iv_ca *ivs, - struct iv_use *use) + struct iv_use *use, bool originalp) { comp_cost best_cost, act_cost; unsigned i; @@ -5110,7 +5092,8 @@ try_add_cand_for (struct ivopts_data *da iv_ca_set_no_cp (data, ivs, use); } - /* First try important candidates not based on any memory object. Only if + /* If ORIGINALP is true, try to find the original IV for the use. Otherwise + first try important candidates not based on any memory object. Only if this fails, try the specific ones. Rationale -- in loops with many variables the best choice often is to use just one generic biv. If we added here many ivs specific to the uses, the optimization algorithm later @@ -5122,7 +5105,10 @@ try_add_cand_for (struct ivopts_data *da { cand = iv_cand (data, i); - if (cand->iv->base_object != NULL_TREE) + if (originalp && cand->pos !=IP_ORIGINAL) + continue; + + if (!originalp && cand->iv->base_object != NULL_TREE) continue; if (iv_ca_cand_used_p (ivs, cand)) @@ -5158,8 +5144,13 @@ try_add_cand_for (struct ivopts_data *da continue; /* Already tried this. */ - if (cand->important && cand->iv->base_object == NULL_TREE) - continue; + if (cand->important) + { + if (originalp && cand->pos == IP_ORIGINAL) + continue; + if (!originalp && cand->iv->base_object == NULL_TREE) + continue; + } if (iv_ca_cand_used_p (ivs, cand)) continue; @@ -5193,13 +5184,13 @@ try_add_cand_for (struct ivopts_data *da /* Finds an initial assignment of candidates to uses. */ static struct iv_ca * -get_initial_solution (struct ivopts_data *data) +get_initial_solution (struct ivopts_data *data, bool originalp) { struct iv_ca *ivs = iv_ca_new (data); unsigned i; for (i = 0; i < n_iv_uses (data); i++) - if (!try_add_cand_for (data, ivs, iv_use (data, i))) + if (!try_add_cand_for (data, ivs, iv_use (data, i), originalp)) { iv_ca_free (&ivs); return NULL; @@ -5271,14 +5262,12 @@ try_improve_iv_set (struct ivopts_data * solution and remove the unused ivs while this improves the cost. */ static struct iv_ca * -find_optimal_iv_set (struct ivopts_data *data) +find_optimal_iv_set_1 (struct ivopts_data *data, bool originalp) { - unsigned i; struct iv_ca *set; - struct iv_use *use; /* Get the initial solution. */ - set = get_initial_solution (data); + set = get_initial_solution (data, originalp); if (!set) { if (dump_file && (dump_flags & TDF_DETAILS)) @@ -5301,11 +5290,46 @@ find_optimal_iv_set (struct ivopts_data } } + return set; +} + +static struct iv_ca * +find_optimal_iv_set (struct ivopts_data *data) +{ + unsigned i; + struct iv_ca *set, *origset; + struct iv_use *use; + comp_cost cost, origcost; + + /* Determine the cost based on a strategy that starts with original IVs, + and try again using a strategy that prefers candidates not based + on any IVs. */ + origset = find_optimal_iv_set_1 (data, true); + set = find_optimal_iv_set_1 (data, false); + + if (!origset && !set) + return NULL; + + origcost = origset ? iv_ca_cost (origset) : infinite_cost; + cost = set ? iv_ca_cost (set) : infinite_cost; + if (dump_file && (dump_flags & TDF_DETAILS)) { - comp_cost cost = iv_ca_cost (set); - fprintf (dump_file, "Final cost %d (complexity %d)\n\n", cost.cost, cost.complexity); + fprintf (dump_file, "Original cost %d (complexity %d)\n\n", + origcost.cost, origcost.complexity); + fprintf (dump_file, "Final cost %d (complexity %d)\n\n", + cost.cost, cost.complexity); + } + + /* Choose the one with the best cost. */ + if (compare_costs (origcost, cost) <= 0) + { + if (set) + iv_ca_free (&set); + set = origset; } + else if (origset) + iv_ca_free (&origset); for (i = 0; i < n_iv_uses (data); i++) { Index: gcc/cfgloopanal.c =================================================================== --- gcc/cfgloopanal.c (revision 160755) +++ gcc/cfgloopanal.c (working copy) @@ -344,7 +344,7 @@ init_set_costs (void) target_avail_regs = 0; for (i = 0; i < FIRST_PSEUDO_REGISTER; i++) if (TEST_HARD_REG_BIT (reg_class_contents[GENERAL_REGS], i) - && !fixed_regs[i]) + && !call_used_regs[i]) target_avail_regs++; target_res_regs = 3; Index: gcc/testsuite/gcc.target/arm/pr42505.c =================================================================== --- gcc/testsuite/gcc.target/arm/pr42505.c (revision 0) +++ gcc/testsuite/gcc.target/arm/pr42505.c (revision 0) @@ -0,0 +1,23 @@ +/* { dg-options "-mthumb -Os -march=armv5te" } */ +/* { dg-require-effective-target arm_thumb1_ok } */ +/* { dg-final { scan-assembler-not "str\[\\t \]*r.,\[\\t \]*.sp," } } */ + +struct A { + int f1; + int f2; +}; + +int func(int c); + +/* This function should not need to spill anything to the stack. */ +int test(struct A* src, struct A* dst, int count) +{ + while (count--) { + if (!func(src->f2)) { + return 0; + } + *dst++ = *src++; + } + + return 1; +}