Message ID | CA+4CFy76kH8XGTHTdd1dCe2GatkAh=wjXCkkCin1_-Q3guH_3Q@mail.gmail.com |
---|---|
State | New |
Headers | show |
On Sat, Apr 26, 2014 at 5:35 AM, Wei Mi wrote: > Index: ira-lives.c > =================================================================== > --- ira-lives.c (revision 209253) > +++ ira-lives.c (working copy) > @@ -1025,7 +1025,11 @@ process_single_reg_class_operands (bool > { > ira_object_t obj = ira_object_id_map[px]; > a = OBJECT_ALLOCNO (obj); > - if (a != operand_a) > + /* If a is much hotter in some other region, don't add reg class > + cl into its conflict hardreg set. Let lra_split to do splitting > + here for operand_a. */ > + if (a != operand_a > + && (LRA_SPLIT_FREQ_RATIO * freq >= a->freq)) > { > /* We could increase costs of A instead of making it > conflicting with the hard register. But it works worse AFAICT this path is not LRA specific, so your patch may break ports still relying on reload. Ciao! Steven
Thanks. I will change > + if (a != operand_a > + && (LRA_SPLIT_FREQ_RATIO * freq >= a->freq)) to > + if (a != operand_a > + && (!ira_use_lra_p || LRA_SPLIT_FREQ_RATIO * freq >= a->freq)) Regards, Wei. On Mon, Apr 28, 2014 at 12:57 AM, Steven Bosscher <stevenb.gcc@gmail.com> wrote: > On Sat, Apr 26, 2014 at 5:35 AM, Wei Mi wrote: >> Index: ira-lives.c >> =================================================================== >> --- ira-lives.c (revision 209253) >> +++ ira-lives.c (working copy) >> @@ -1025,7 +1025,11 @@ process_single_reg_class_operands (bool >> { >> ira_object_t obj = ira_object_id_map[px]; >> a = OBJECT_ALLOCNO (obj); >> - if (a != operand_a) >> + /* If a is much hotter in some other region, don't add reg class >> + cl into its conflict hardreg set. Let lra_split to do splitting >> + here for operand_a. */ >> + if (a != operand_a >> + && (LRA_SPLIT_FREQ_RATIO * freq >= a->freq)) >> { >> /* We could increase costs of A instead of making it >> conflicting with the hard register. But it works worse > > AFAICT this path is not LRA specific, so your patch may break ports > still relying on reload. > > Ciao! > Steven
On 2014-04-25, 11:35 PM, Wei Mi wrote: > Hi, > > This patch is to address the missing optimization reported in > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60738 > > Now in process_single_reg_class_operands, any allocno A conflicting > with a single reg class operand B is marked to never use the reg class > in IRA. This is non-optimal when A is in a hot region while B is in a > cold region. The patch allows A to use the register in the single reg > class if only the hotness difference between A and B is large enough. > The patch also extends lra_split to make sure A is splitted in the > code region for B instead of being spilled. > > bootstrap and regression test are ok for x86_64-linux-gnu. Is it ok for trunk? > > Thanks for working on this Wei. The patch is ok for the trunk with the change you proposed to resolve Steven's concern and with resolving the changelog entries pitfalls below: It is better to add the name of the parameter to make easier a future search when the parameter was introduced (e.g. "params.h (LRA_SPLIT_FREQ_RATIO): New param."). Adding '#include "params.h"' is not in the changelog too. And you should remove unfinished comment (see below). For the future, it would be nice if such patches are benchmarked too (e.g. on some SPEC) in order to be sure that it works not worse in most cases (it is very hard to predict effect of such changes). > ChangeLog: > > 2014-04-25 Wei Mi <wmi@google.com> > > PR rtl-optimization/60738 > * params.h: New param. > * params.def: Ditto. > * lra-constraints.c (need_for_split_p): Let more > cases to do lra-split. > * ira-lives.c (process_single_reg_class_operands): > Avoid to add single reg class into conflict hardreg set in > some cases. ... > Index: lra-constraints.c > =================================================================== > --- lra-constraints.c (revision 209253) > +++ lra-constraints.c (working copy) > @@ -129,6 +129,7 @@ > #include "ira.h" > #include "rtl-error.h" > #include "lra-int.h" > +#include "params.h" > > /* Value of LRA_CURR_RELOAD_NUM at the beginning of BB of the current > insn. Remember that LRA_CURR_RELOAD_NUM is the number of emitted > @@ -4632,8 +4633,13 @@ static bitmap_head ebb_global_regs; > static inline bool > need_for_split_p (HARD_REG_SET potential_reload_hard_regs, int regno) > { > + int freq; > + rtx last_use_insn; > int hard_regno = regno < FIRST_PSEUDO_REGISTER ? regno : reg_renumber[regno]; > > + last_use_insn = skip_usage_debug_insns (usage_insns[regno].insns); > + freq = REG_FREQ_FROM_BB (BLOCK_FOR_INSN (last_use_insn)); > + > lra_assert (hard_regno >= 0); > return ((TEST_HARD_REG_BIT (potential_reload_hard_regs, hard_regno) > /* Don't split eliminable hard registers, otherwise we can > @@ -4653,25 +4659,27 @@ need_for_split_p (HARD_REG_SET potential > && (regno >= FIRST_PSEUDO_REGISTER > || ! TEST_HARD_REG_BIT (call_used_reg_set, regno) > || usage_insns[regno].calls_num == calls_num) > - /* We need at least 2 reloads to make pseudo splitting > - profitable. We should provide hard regno splitting in > - any case to solve 1st insn scheduling problem when > - moving hard register definition up might result in > - impossibility to find hard register for reload pseudo of > - small register class. */ > - && (usage_insns[regno].reloads_num > - + (regno < FIRST_PSEUDO_REGISTER ? 0 : 3) < reloads_num) > - && (regno < FIRST_PSEUDO_REGISTER > - /* For short living pseudos, spilling + inheritance can > - be considered a substitution for splitting. > - Therefore we do not splitting for local pseudos. It > - decreases also aggressiveness of splitting. The > - minimal number of references is chosen taking into > - account that for 2 references splitting has no sense > - as we can just spill the pseudo. */ > - || (regno >= FIRST_PSEUDO_REGISTER > - && lra_reg_info[regno].nrefs > 3 > - && bitmap_bit_p (&ebb_global_regs, regno)))) > + /* If ^ I guess it is a leftover from the final patch editing. It should be removed too. > + && ((LRA_SPLIT_FREQ_RATIO * freq < lra_reg_info[regno].freq) > + /* We need at least 2 reloads to make pseudo splitting > + profitable. We should provide hard regno splitting in > + any case to solve 1st insn scheduling problem when > + moving hard register definition up might result in > + impossibility to find hard register for reload pseudo of > + small register class. */ > + || ((usage_insns[regno].reloads_num > + + (regno < FIRST_PSEUDO_REGISTER ? 0 : 3) < reloads_num) > + && (regno < FIRST_PSEUDO_REGISTER > + /* For short living pseudos, spilling + inheritance can > + be considered a substitution for splitting. > + Therefore we do not splitting for local pseudos. It > + decreases also aggressiveness of splitting. The > + minimal number of references is chosen taking into > + account that for 2 references splitting has no sense > + as we can just spill the pseudo. */ > + || (regno >= FIRST_PSEUDO_REGISTER > + && lra_reg_info[regno].nrefs > 3 > + && bitmap_bit_p (&ebb_global_regs, regno)))))) > || (regno >= FIRST_PSEUDO_REGISTER && need_for_call_save_p (regno))); > }
Index: params.def =================================================================== --- params.def (revision 209253) +++ params.def (working copy) @@ -826,6 +826,11 @@ DEFPARAM (PARAM_LRA_MAX_CONSIDERED_RELOA "The max number of reload pseudos which are considered during spilling a non-reload pseudo", 500, 0, 0) +DEFPARAM (PARAM_LRA_SPLIT_FREQ_RATIO, + "lra-split-freq-ratio", + "The ratio used to check when lra split is preferred than spilled", + 9, 0, 0) + /* Switch initialization conversion will refuse to create arrays that are bigger than this parameter times the number of switch branches. */ Index: ira-lives.c =================================================================== --- ira-lives.c (revision 209253) +++ ira-lives.c (working copy) @@ -1025,7 +1025,11 @@ process_single_reg_class_operands (bool { ira_object_t obj = ira_object_id_map[px]; a = OBJECT_ALLOCNO (obj); - if (a != operand_a) + /* If a is much hotter in some other region, don't add reg class + cl into its conflict hardreg set. Let lra_split to do splitting + here for operand_a. */ + if (a != operand_a + && (LRA_SPLIT_FREQ_RATIO * freq >= a->freq)) { /* We could increase costs of A instead of making it conflicting with the hard register. But it works worse Index: params.h =================================================================== --- params.h (revision 209253) +++ params.h (working copy) @@ -198,6 +198,8 @@ extern void init_param_values (int *para PARAM_VALUE (PARAM_IRA_LOOP_RESERVED_REGS) #define LRA_MAX_CONSIDERED_RELOAD_PSEUDOS \ PARAM_VALUE (PARAM_LRA_MAX_CONSIDERED_RELOAD_PSEUDOS) +#define LRA_SPLIT_FREQ_RATIO \ + PARAM_VALUE (PARAM_LRA_SPLIT_FREQ_RATIO) #define SWITCH_CONVERSION_BRANCH_RATIO \ PARAM_VALUE (PARAM_SWITCH_CONVERSION_BRANCH_RATIO) #define LOOP_INVARIANT_MAX_BBS_IN_LOOP \ Index: lra-constraints.c =================================================================== --- lra-constraints.c (revision 209253) +++ lra-constraints.c (working copy) @@ -129,6 +129,7 @@ #include "ira.h" #include "rtl-error.h" #include "lra-int.h" +#include "params.h" /* Value of LRA_CURR_RELOAD_NUM at the beginning of BB of the current insn. Remember that LRA_CURR_RELOAD_NUM is the number of emitted @@ -4632,8 +4633,13 @@ static bitmap_head ebb_global_regs; static inline bool need_for_split_p (HARD_REG_SET potential_reload_hard_regs, int regno) { + int freq; + rtx last_use_insn; int hard_regno = regno < FIRST_PSEUDO_REGISTER ? regno : reg_renumber[regno]; + last_use_insn = skip_usage_debug_insns (usage_insns[regno].insns); + freq = REG_FREQ_FROM_BB (BLOCK_FOR_INSN (last_use_insn)); + lra_assert (hard_regno >= 0); return ((TEST_HARD_REG_BIT (potential_reload_hard_regs, hard_regno) /* Don't split eliminable hard registers, otherwise we can @@ -4653,25 +4659,27 @@ need_for_split_p (HARD_REG_SET potential && (regno >= FIRST_PSEUDO_REGISTER || ! TEST_HARD_REG_BIT (call_used_reg_set, regno) || usage_insns[regno].calls_num == calls_num) - /* We need at least 2 reloads to make pseudo splitting - profitable. We should provide hard regno splitting in - any case to solve 1st insn scheduling problem when - moving hard register definition up might result in - impossibility to find hard register for reload pseudo of - small register class. */ - && (usage_insns[regno].reloads_num - + (regno < FIRST_PSEUDO_REGISTER ? 0 : 3) < reloads_num) - && (regno < FIRST_PSEUDO_REGISTER - /* For short living pseudos, spilling + inheritance can - be considered a substitution for splitting. - Therefore we do not splitting for local pseudos. It - decreases also aggressiveness of splitting. The - minimal number of references is chosen taking into - account that for 2 references splitting has no sense - as we can just spill the pseudo. */ - || (regno >= FIRST_PSEUDO_REGISTER - && lra_reg_info[regno].nrefs > 3 - && bitmap_bit_p (&ebb_global_regs, regno)))) + /* If + && ((LRA_SPLIT_FREQ_RATIO * freq < lra_reg_info[regno].freq) + /* We need at least 2 reloads to make pseudo splitting + profitable. We should provide hard regno splitting in + any case to solve 1st insn scheduling problem when + moving hard register definition up might result in + impossibility to find hard register for reload pseudo of + small register class. */ + || ((usage_insns[regno].reloads_num + + (regno < FIRST_PSEUDO_REGISTER ? 0 : 3) < reloads_num) + && (regno < FIRST_PSEUDO_REGISTER + /* For short living pseudos, spilling + inheritance can + be considered a substitution for splitting. + Therefore we do not splitting for local pseudos. It + decreases also aggressiveness of splitting. The + minimal number of references is chosen taking into + account that for 2 references splitting has no sense + as we can just spill the pseudo. */ + || (regno >= FIRST_PSEUDO_REGISTER + && lra_reg_info[regno].nrefs > 3 + && bitmap_bit_p (&ebb_global_regs, regno)))))) || (regno >= FIRST_PSEUDO_REGISTER && need_for_call_save_p (regno))); } Index: testsuite/gcc.target/i386/pr60738-1.c =================================================================== --- testsuite/gcc.target/i386/pr60738-1.c (revision 0) +++ testsuite/gcc.target/i386/pr60738-1.c (revision 0) @@ -0,0 +1,21 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ +/* { dg-final { scan-assembler "mov\[^\\n]*ecx," } } */ + +/* This test is to ensure r1 is lra-splitted in coldpath and leave + aside ecx for r2. */ + +int a, b, c, d, e, f, cond1, cond2; +void foo() { + int r1, r2, r3; + r1 = b; + r2 = d; + if (__builtin_expect(cond1 > 3, 0)) { + if (__builtin_expect(cond2 > 3, 0)) { + e = e * 5; + c = a << r1; + } + } + c = c << r2; + f = r1 + r2; +} Index: testsuite/gcc.target/i386/pr60738-2.c =================================================================== --- testsuite/gcc.target/i386/pr60738-2.c (revision 0) +++ testsuite/gcc.target/i386/pr60738-2.c (revision 0) @@ -0,0 +1,34 @@ +/* { dg-do compile { target { ia32 } } */ +/* { dg-options "-O2" } */ +/* { dg-final { scan-assembler "mov\[^\\n]*A\[^\\n]*, .ecx" } } */ +/* { dg-final { scan-assembler "mov\[^\\n]ecx, B\[^\\n]*" } } */ + +/* This test is to ensure no spill is generated for r1 + on hotpath because r1 can only use cl register. */ + +int a, b, c, d, e, cond1, cond2, A[20], B[20]; +void foo() { + int r1, r2, r3, r4, r5, r6, r7, r8; + r2 = A[2]; + r3 = A[3]; + r4 = A[4]; + r5 = A[5]; + r6 = A[6]; + r7 = A[7]; + r8 = A[8]; + + if (__builtin_expect(cond1 > 3, 0)) { + if (__builtin_expect(cond2 > 3, 0)) { + r1 = b; + c = a << r1; + } + } + + B[2] = r2; + B[3] = r3; + B[4] = r4; + B[5] = r5; + B[6] = r6; + B[7] = r7; + B[8] = r8; +}