Message ID | 56FAAD47.8090300@redhat.com |
---|---|
State | New |
Headers | show |
On 29 March 2016 at 18:28, Vladimir Makarov <vmakarov@redhat.com> wrote: > The following patch improves the code in 2 out of 3 cases in > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68695 > > The patch uses more accurate costs for the RA cost improvement > optimization after colouring. > > The patch was tested and bootstrapped on x86-64. It is hard to create a > test to check the correct code generation. Therefore there is no test. As > the patch changes heuristics, a generated code in some cases will be > different but at least x86-64 tests expecting a specific code are not broken > by the patch. > > Committed as rev. 234527 > Hi, I've noticed that after this patch, 2 tests regress (PASS -> FAIL) on arm: gcc.dg/ira-shrinkwrap-prep-2.c scan-rtl-dump pro_and_epilogue "Performing shrink-wrapping" gcc.dg/pr10474.c scan-rtl-dump pro_and_epilogue "Performing shrink-wrapping" Christophe
On 03/30/2016 05:23 PM, Christophe Lyon wrote: > On 29 March 2016 at 18:28, Vladimir Makarov <vmakarov@redhat.com> wrote: >> The following patch improves the code in 2 out of 3 cases in >> >> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68695 >> >> The patch uses more accurate costs for the RA cost improvement >> optimization after colouring. >> >> The patch was tested and bootstrapped on x86-64. It is hard to create a >> test to check the correct code generation. Therefore there is no test. As >> the patch changes heuristics, a generated code in some cases will be >> different but at least x86-64 tests expecting a specific code are not broken >> by the patch. >> >> Committed as rev. 234527 >> > Hi, > > I've noticed that after this patch, 2 tests regress (PASS -> FAIL) on arm: > gcc.dg/ira-shrinkwrap-prep-2.c scan-rtl-dump pro_and_epilogue > "Performing shrink-wrapping" > gcc.dg/pr10474.c scan-rtl-dump pro_and_epilogue "Performing shrink-wrapping" > > Thanks, Christophe. I am having them too on the today trunk. I'll investigate this more.
On 03/30/2016 05:23 PM, Christophe Lyon wrote: > On 29 March 2016 at 18:28, Vladimir Makarov <vmakarov@redhat.com> wrote: >> The following patch improves the code in 2 out of 3 cases in >> >> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68695 >> >> The patch uses more accurate costs for the RA cost improvement >> optimization after colouring. >> >> The patch was tested and bootstrapped on x86-64. It is hard to create a >> test to check the correct code generation. Therefore there is no test. As >> the patch changes heuristics, a generated code in some cases will be >> different but at least x86-64 tests expecting a specific code are not broken >> by the patch. >> >> Committed as rev. 234527 >> > Hi, > > I've noticed that after this patch, 2 tests regress (PASS -> FAIL) on arm: > gcc.dg/ira-shrinkwrap-prep-2.c scan-rtl-dump pro_and_epilogue > "Performing shrink-wrapping" > gcc.dg/pr10474.c scan-rtl-dump pro_and_epilogue "Performing shrink-wrapping" > I've checked the generated code. RA with the patch generates a better code for the both tests. So shrink wrap optimization failed. The final code has 1 insn less for the both tests when the patch is applied. I guess it is wrong to write quality tests based on expected code generated before any optimization. It has sense if we provide the same input. LLVM testsuite is mostly such tests as they have a readable IR. GCC unfortunately has no serialized and readable IR. On the other hand LLVM lacks integrated testing. So I'd mark these tests as XFAIL or removed arm from DEJAGNU target in the tests.
On Fri, Apr 01, 2016 at 04:26:41PM -0400, Vladimir Makarov wrote: > >I've noticed that after this patch, 2 tests regress (PASS -> FAIL) on arm: > > gcc.dg/ira-shrinkwrap-prep-2.c scan-rtl-dump pro_and_epilogue > >"Performing shrink-wrapping" > > gcc.dg/pr10474.c scan-rtl-dump pro_and_epilogue "Performing shrink-wrapping" > > > > I've checked the generated code. RA with the patch generates a better code > for the both tests. So shrink wrap optimization failed. The final code has 1 > insn less for the both tests when the patch is applied. > > I guess it is wrong to write quality tests based on expected code generated > before any optimization. It has sense if we provide the same input. LLVM > testsuite is mostly such tests as they have a readable IR. GCC > unfortunately has no serialized and readable IR. On the other hand LLVM > lacks integrated testing. > > So I'd mark these tests as XFAIL or removed arm from DEJAGNU target in the > tests. FYI, those 2 tests also now FAIL on ppc64{,le}-linux in addition to armv7hl-linux-gnueabi. Jakub
Hi all, On 01/04/16 21:43, Jakub Jelinek wrote: > On Fri, Apr 01, 2016 at 04:26:41PM -0400, Vladimir Makarov wrote: >>> I've noticed that after this patch, 2 tests regress (PASS -> FAIL) on arm: >>> gcc.dg/ira-shrinkwrap-prep-2.c scan-rtl-dump pro_and_epilogue >>> "Performing shrink-wrapping" >>> gcc.dg/pr10474.c scan-rtl-dump pro_and_epilogue "Performing shrink-wrapping" >>> >> I've checked the generated code. RA with the patch generates a better code >> for the both tests. So shrink wrap optimization failed. The final code has 1 >> insn less for the both tests when the patch is applied. >> >> I guess it is wrong to write quality tests based on expected code generated >> before any optimization. It has sense if we provide the same input. LLVM >> testsuite is mostly such tests as they have a readable IR. GCC >> unfortunately has no serialized and readable IR. On the other hand LLVM >> lacks integrated testing. >> >> So I'd mark these tests as XFAIL or removed arm from DEJAGNU target in the >> tests. > FYI, those 2 tests also now FAIL on ppc64{,le}-linux in addition to > armv7hl-linux-gnueabi. So for the test gcc.dg/pr10474.c on arm with -marm -O3 before this patch we perform shrink-wrapping: cmp r0, #0 bxeq lr push {r4, lr} mov r4, r0 ... And after the patch we don't: push {r4, lr} subs r4, r0, #0 popeq {r4, pc} ... The assembly after the "..." is identical. So the resulting code is indeed shorter, though there is an extra stack push and pop on the early return path. A similar effect appears on gcc.dg/ira-shrinkwrap-prep-2.c. I think both codegen decisions are valid though one could argue that the new codegen is more appropriate for -Os rather than -O3. If you agree then this is indeed a regression. Though if so, it looks like a shrink-wrapping deficiency exposed by this patch, rather than caused by it. Jakub, do you happen to have the before and after codegen for these tests on ppc64? I wonder if the effect is more clearcut there. Thanks, Kyrill > Jakub >
On Tue, Apr 05, 2016 at 10:48:58AM +0100, Kyrill Tkachov wrote: > So for the test gcc.dg/pr10474.c on arm with -marm -O3 before this patch we > perform shrink-wrapping: > cmp r0, #0 > bxeq lr > push {r4, lr} > mov r4, r0 > ... > > And after the patch we don't: > push {r4, lr} > subs r4, r0, #0 > popeq {r4, pc} > ... > > The assembly after the "..." is identical. > > So the resulting code is indeed shorter, though there is an > extra stack push and pop on the early return path. > A similar effect appears on gcc.dg/ira-shrinkwrap-prep-2.c. The "new" code is better if there is no shrink-wrapping. We can probably teach prepare_shrink_wrap to do the extra register move if that will allow us to wrap more. > Though if so, it looks like a shrink-wrapping deficiency exposed by > this patch, rather than caused by it. Yes, and mostly a testcase problem even. > Jakub, do you happen to have the before and after codegen for these tests > on ppc64? I wonder if the effect is more clearcut there. RTL before shrink-wrapping would be useful, too. Segher
On 05/04/16 23:35, Segher Boessenkool wrote: > On Tue, Apr 05, 2016 at 10:48:58AM +0100, Kyrill Tkachov wrote: >> So for the test gcc.dg/pr10474.c on arm with -marm -O3 before this patch we >> perform shrink-wrapping: >> cmp r0, #0 >> bxeq lr >> push {r4, lr} >> mov r4, r0 >> ... >> >> And after the patch we don't: >> push {r4, lr} >> subs r4, r0, #0 >> popeq {r4, pc} >> ... >> >> The assembly after the "..." is identical. >> >> So the resulting code is indeed shorter, though there is an >> extra stack push and pop on the early return path. >> A similar effect appears on gcc.dg/ira-shrinkwrap-prep-2.c. > The "new" code is better if there is no shrink-wrapping. We can probably > teach prepare_shrink_wrap to do the extra register move if that will allow > us to wrap more. > >> Though if so, it looks like a shrink-wrapping deficiency exposed by >> this patch, rather than caused by it. > Yes, and mostly a testcase problem even. > >> Jakub, do you happen to have the before and after codegen for these tests >> on ppc64? I wonder if the effect is more clearcut there. > RTL before shrink-wrapping would be useful, too. So what shall we do for these tests for GCC 6? Add an XFAIL for arm and powerpc? Kyrill > > Segher
On 04/15/2016 05:06 AM, Kyrill Tkachov wrote: > > On 05/04/16 23:35, Segher Boessenkool wrote: >> On Tue, Apr 05, 2016 at 10:48:58AM +0100, Kyrill Tkachov wrote: >>> So for the test gcc.dg/pr10474.c on arm with -marm -O3 before this >>> patch we >>> perform shrink-wrapping: >>> cmp r0, #0 >>> bxeq lr >>> push {r4, lr} >>> mov r4, r0 >>> ... >>> >>> And after the patch we don't: >>> push {r4, lr} >>> subs r4, r0, #0 >>> popeq {r4, pc} >>> ... >>> >>> The assembly after the "..." is identical. >>> >>> So the resulting code is indeed shorter, though there is an >>> extra stack push and pop on the early return path. >>> A similar effect appears on gcc.dg/ira-shrinkwrap-prep-2.c. >> The "new" code is better if there is no shrink-wrapping. We can probably >> teach prepare_shrink_wrap to do the extra register move if that will >> allow >> us to wrap more. >> >>> Though if so, it looks like a shrink-wrapping deficiency exposed by >>> this patch, rather than caused by it. >> Yes, and mostly a testcase problem even. >> >>> Jakub, do you happen to have the before and after codegen for these >>> tests >>> on ppc64? I wonder if the effect is more clearcut there. >> RTL before shrink-wrapping would be useful, too. > > So what shall we do for these tests for GCC 6? > Add an XFAIL for arm and powerpc? We could just punt gcc-6 and focus on what we want for gcc-7 as this isn't a release critical issue. jeff
On 15/04/16 17:18, Jeff Law wrote: > On 04/15/2016 05:06 AM, Kyrill Tkachov wrote: >> >> On 05/04/16 23:35, Segher Boessenkool wrote: >>> On Tue, Apr 05, 2016 at 10:48:58AM +0100, Kyrill Tkachov wrote: >>>> So for the test gcc.dg/pr10474.c on arm with -marm -O3 before this >>>> patch we >>>> perform shrink-wrapping: >>>> cmp r0, #0 >>>> bxeq lr >>>> push {r4, lr} >>>> mov r4, r0 >>>> ... >>>> >>>> And after the patch we don't: >>>> push {r4, lr} >>>> subs r4, r0, #0 >>>> popeq {r4, pc} >>>> ... >>>> >>>> The assembly after the "..." is identical. >>>> >>>> So the resulting code is indeed shorter, though there is an >>>> extra stack push and pop on the early return path. >>>> A similar effect appears on gcc.dg/ira-shrinkwrap-prep-2.c. >>> The "new" code is better if there is no shrink-wrapping. We can probably >>> teach prepare_shrink_wrap to do the extra register move if that will >>> allow >>> us to wrap more. >>> >>>> Though if so, it looks like a shrink-wrapping deficiency exposed by >>>> this patch, rather than caused by it. >>> Yes, and mostly a testcase problem even. >>> >>>> Jakub, do you happen to have the before and after codegen for these >>>> tests >>>> on ppc64? I wonder if the effect is more clearcut there. >>> RTL before shrink-wrapping would be useful, too. >> >> So what shall we do for these tests for GCC 6? >> Add an XFAIL for arm and powerpc? > We could just punt gcc-6 and focus on what we want for gcc-7 as this isn't a release critical issue. > This was resolved with: https://gcc.gnu.org/ml/gcc-patches/2016-04/msg00724.html Sorry, I should have replied to this thread... Kyrill > jeff
On 04/15/2016 10:20 AM, Kyrill Tkachov wrote: > > This was resolved with: > https://gcc.gnu.org/ml/gcc-patches/2016-04/msg00724.html > > Sorry, I should have replied to this thread... No worries. I probably should have checked the testcase before replying to the older email thread. jeff
Index: ChangeLog =================================================================== --- ChangeLog (revision 234526) +++ ChangeLog (working copy) @@ -1,3 +1,9 @@ +2016-03-29 Vladimir Makarov <vmakarov@redhat.com> + + PR rtl-optimization/68695 + * ira-color.c (allocno_copy_cost_saving): New. + (improve_allocation): Use it. + 2016-03-29 Richard Henderson <rth@redhat.com> PR middle-end/70355 Index: ira-color.c =================================================================== --- ira-color.c (revision 234526) +++ ira-color.c (working copy) @@ -2728,6 +2728,37 @@ allocno_cost_compare_func (const void *v return ALLOCNO_NUM (p1) - ALLOCNO_NUM (p2); } +/* Return savings on removed copies when ALLOCNO is assigned to + HARD_REGNO. */ +static int +allocno_copy_cost_saving (ira_allocno_t allocno, int hard_regno) +{ + int cost = 0; + enum reg_class rclass; + ira_copy_t cp, next_cp; + + rclass = REGNO_REG_CLASS (hard_regno); + for (cp = ALLOCNO_COPIES (allocno); cp != NULL; cp = next_cp) + { + if (cp->first == allocno) + { + next_cp = cp->next_first_allocno_copy; + if (ALLOCNO_HARD_REGNO (cp->second) != hard_regno) + continue; + } + else if (cp->second == allocno) + { + next_cp = cp->next_second_allocno_copy; + if (ALLOCNO_HARD_REGNO (cp->first) != hard_regno) + continue; + } + else + gcc_unreachable (); + cost += cp->freq * ira_register_move_cost[ALLOCNO_MODE (allocno)][rclass][rclass]; + } + return cost; +} + /* We used Chaitin-Briggs coloring to assign as many pseudos as possible to hard registers. Let us try to improve allocation with cost point of view. This function improves the allocation by @@ -2768,9 +2799,7 @@ improve_allocation (void) continue; check++; aclass = ALLOCNO_CLASS (a); - allocno_costs = ALLOCNO_UPDATED_HARD_REG_COSTS (a); - if (allocno_costs == NULL) - allocno_costs = ALLOCNO_HARD_REG_COSTS (a); + allocno_costs = ALLOCNO_HARD_REG_COSTS (a); if ((hregno = ALLOCNO_HARD_REGNO (a)) < 0) base_cost = ALLOCNO_UPDATED_MEMORY_COST (a); else if (allocno_costs == NULL) @@ -2779,7 +2808,8 @@ improve_allocation (void) case). */ continue; else - base_cost = allocno_costs[ira_class_hard_reg_index[aclass][hregno]]; + base_cost = (allocno_costs[ira_class_hard_reg_index[aclass][hregno]] + - allocno_copy_cost_saving (a, hregno)); try_p = false; get_conflict_and_start_profitable_regs (a, false, conflicting_regs, @@ -2797,6 +2827,7 @@ improve_allocation (void) k = allocno_costs == NULL ? 0 : j; costs[hregno] = (allocno_costs == NULL ? ALLOCNO_UPDATED_CLASS_COST (a) : allocno_costs[k]); + costs[hregno] -= allocno_copy_cost_saving (a, hregno); costs[hregno] -= base_cost; if (costs[hregno] < 0) try_p = true; @@ -2835,14 +2866,13 @@ improve_allocation (void) k = (ira_class_hard_reg_index [ALLOCNO_CLASS (conflict_a)][conflict_hregno]); ira_assert (k >= 0); - if ((allocno_costs = ALLOCNO_UPDATED_HARD_REG_COSTS (conflict_a)) + if ((allocno_costs = ALLOCNO_HARD_REG_COSTS (conflict_a)) != NULL) spill_cost -= allocno_costs[k]; - else if ((allocno_costs = ALLOCNO_HARD_REG_COSTS (conflict_a)) - != NULL) - spill_cost -= allocno_costs[k]; else spill_cost -= ALLOCNO_UPDATED_CLASS_COST (conflict_a); + spill_cost + += allocno_copy_cost_saving (conflict_a, conflict_hregno); conflict_nregs = hard_regno_nregs[conflict_hregno][ALLOCNO_MODE (conflict_a)]; for (r = conflict_hregno;