Message ID | 584AB9AA.6030800@arm.com |
---|---|
State | New |

On 12/09/2016 03:03 PM, Andre Vieira (lists) wrote:
> This patch fixes the issue reported in PR78255 by making postreload
> aware it should not be performing CSE on functions if NO_FUNCTION_CSE is
> defined to true.
>
> Bootstrap and full regression on arm-none-linux-gnueabihf and
> aarch64-unknown-linux-gnu.
>
> Also checked this fixed the reported issue on arm-none-eabi.
>
> Is this OK for trunk?

Hmm, it probably doesn't hurt, but looking at the PR I think the
originally reported problem suggests you need a different fix: a
separate register class to be used for indirect sibling calls. I
remember seeing similar issues on other targets.

Bernd

On 09/12/16 15:02, Bernd Schmidt wrote:
> On 12/09/2016 03:03 PM, Andre Vieira (lists) wrote:
>> This patch fixes the issue reported in PR78255 by making postreload
>> aware it should not be performing CSE on functions if NO_FUNCTION_CSE is
>> defined to true.
>>
>> Bootstrap and full regression on arm-none-linux-gnueabihf and
>> aarch64-unknown-linux-gnu.
>>
>> Also checked this fixed the reported issue on arm-none-eabi.
>>
>> Is this OK for trunk?
>
> Hmm, it probably doesn't hurt, but looking at the PR I think the
> originally reported problem suggests you need a different fix: a
> separate register class to be used for indirect sibling calls. I
> remember seeing similar issues on other targets.
>
>
> Bernd

I agree that even though this "fixes" the PR issue, this change is
fixing more than just that.

As for your suggestion to use a separate register class for indirect
sibling calls: we already do; we use CALLER_SAVE_REGS. However, 'r3' is
also allowed by that scheme, as it should be. If we don't use 'r3' to
either pass an argument or align the stack, then it is perfectly valid
to use it for indirect sibling calls. The problem is that at the time we
decide whether it is safe to use 'r3', we expect the assigned registers
not to change, and postreload does change them when it shouldn't. Hence
why I am now telling it not to do that.

Now it could be that there are other cases in which the register
allocation would change after reload and before the pro and epilogue
pass. Maybe we shouldn't be making the decision quite so early. This is
a bit of a can of worms though...

Regardless, the other testcases I add in this patch show a sub-optimal
transformation done by postreload, turning direct calls into indirect
calls, for targets which have specifically pointed out that no CSE
should be done on functions through 'NO_FUNCTION_CSE'.

Maybe it would make more sense to split this up into two PRs, though by
fixing postreload I wouldn't be able to reproduce the failure mentioned
in PR78255.

Would you prefer I create a new PR for the problem this is actually
fixing and refile this PATCH under that PR?

Cheers,
Andre

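For readers following along, the direct-versus-indirect distinction discussed above can be pictured at the source level. The following is only a rough sketch, not part of the patch: the local definition of 'bar' and the names 'foo_direct' and 'foo_indirect' are made up for illustration, and the real transformation happens on RTL after reload, not in C.

```c
/* Hypothetical stand-in for the external 'bar' used in the testcases; it
   only reports whether it was handed its own address.  */
static int
bar (void *p)
{
  return p == (void *) bar;
}

/* Source-level shape of the new testcases: the callee is named directly,
   so a target that defines NO_FUNCTION_CSE would be expected to keep a
   direct branch to 'bar'.  */
int
foo_direct (void)
{
  return bar ((void *) bar);
}

/* Roughly what the unwanted rewrite amounts to: the address of 'bar' is
   already held in a register as the argument, and the call is made
   through that register instead.  */
int
foo_indirect (void)
{
  int (*fn) (void *) = bar;
  return fn ((void *) fn);
}
```

Both functions compute the same result; the difference is only in how the call is issued, which is what the scan-assembler checks in the new tests look at.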
On 12/09/2016 04:34 PM, Andre Vieira (lists) wrote:

> Regardless, the other testcases I add in this patch show a sub-optimal
> transformation done by postreload, turning direct calls into indirect
> calls, for targets which have specifically pointed out that no CSE
> should be done on functions through 'NO_FUNCTION_CSE'.

What I'm wondering about is whether the patch wouldn't also prevent the
opposite transformation. Is there a reason not to do that one? Can the
problem be modeled by tweaking costs?

> Would you prefer I create a new PR for the problem this is actually
> fixing and refile this PATCH under that PR?

Well, as long as you're working on fixing it I see no reason to clutter
the bug database for the function cse issue, but do keep the existing PR
open if there also ought to be register class changes.

Bernd

On 12/09/2016 08:02 AM, Bernd Schmidt wrote:
> On 12/09/2016 03:03 PM, Andre Vieira (lists) wrote:
>> This patch fixes the issue reported in PR78255 by making postreload
>> aware it should not be performing CSE on functions if NO_FUNCTION_CSE is
>> defined to true.
>>
>> Bootstrap and full regression on arm-none-linux-gnueabihf and
>> aarch64-unknown-linux-gnu.
>>
>> Also checked this fixed the reported issue on arm-none-eabi.
>>
>> Is this OK for trunk?
>
> Hmm, it probably doesn't hurt, but looking at the PR I think the
> originally reported problem suggests you need a different fix: a
> separate register class to be used for indirect sibling calls. I
> remember seeing similar issues on other targets.

I think we actually split the call patterns into direct and indirect
variants on the PA when we stumbled on this in cse.c.

Jeff

On Fri, Dec 9, 2016 at 3:58 PM, Bernd Schmidt <bschmidt@redhat.com> wrote:
> On 12/09/2016 04:34 PM, Andre Vieira (lists) wrote:
>
>> Regardless, the other testcases I add in this patch show a sub-optimal
>> transformation done by postreload, turning direct calls into indirect
>> calls, for targets which have specifically pointed out that no CSE
>> should be done on functions through 'NO_FUNCTION_CSE'.
>
>
> What I'm wondering about is whether the patch wouldn't also prevent the
> opposite transformation. Is there a reason not to do that one? Can the
> problem be modeled by tweaking costs?

I really don't think we should have a solution that relies on costs
for correctness.

regards
Ramana

On 09/12/16 16:02, Ramana Radhakrishnan wrote:
> On Fri, Dec 9, 2016 at 3:58 PM, Bernd Schmidt <bschmidt@redhat.com> wrote:
>> On 12/09/2016 04:34 PM, Andre Vieira (lists) wrote:
>>
>>> Regardless, the other testcases I add in this patch show a sub-optimal
>>> transformation done by postreload, turning direct calls into indirect
>>> calls, for targets which have specifically pointed out that no CSE
>>> should be done on functions through 'NO_FUNCTION_CSE'.
>>
>>
>> What I'm wondering about is whether the patch wouldn't also prevent the
>> opposite transformation. Is there a reason not to do that one? Can the
>> problem be modeled by tweaking costs?
>
> I really don't think we should have a solution that relies on costs
> for correctness .
>
> regards
> Ramana
>

Regardless, 'reload_cse_simplify' would never perform the opposite
transformation. It checks whether it can replace anything within the
first argument INSN, with the second argument TESTREG. As the name
implies this will always be a register. I double checked, the function
is only called in 'reload_cse_regs' and 'testreg' is created using
'gen_rtx_REG'.

Cheers,
Andre

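The one-directional nature of the rewrite described above, together with the guard the patch adds, can be modelled with a toy. This is a minimal sketch under loud assumptions: 'toy_insn', 'toy_simplify' and 'TOY_NO_FUNCTION_CSE' are invented for illustration and do not correspond to GCC's real rtx_insn, reload_cse_simplify or target macro machinery, and the return value here has its own meaning defined in the comment.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a target that defines NO_FUNCTION_CSE.  */
#define TOY_NO_FUNCTION_CSE 1

/* Hypothetical, heavily simplified instruction model.  */
struct toy_insn
{
  bool is_call;         /* models CALL_P (insn) */
  bool operand_is_reg;  /* true once an operand has been replaced by a register */
};

/* Toy version of the simplification step.  The only rewrite it knows is
   "replace an operand with a register"; there is no path that turns a
   register operand back into a symbol.  Returns true if the insn was
   rewritten (a meaning chosen for this sketch only).  */
bool
toy_simplify (struct toy_insn *insn)
{
  /* Shape of the guard added by the patch: leave calls alone when the
     target asked for no CSE on functions.  */
  if (TOY_NO_FUNCTION_CSE && insn->is_call)
    return false;

  /* Already uses a register: nothing to do.  */
  if (insn->operand_is_reg)
    return false;

  insn->operand_is_reg = true;
  return true;
}
```

In this model a call insn is never touched, while a non-call insn is rewritten once at most, and always in the direction of using the register.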
On 12/09/2016 05:16 PM, Andre Vieira (lists) wrote:

> Regardless, 'reload_cse_simplify' would never perform the opposite
> transformation. It checks whether it can replace anything within the
> first argument INSN, with the second argument TESTREG. As the name
> implies this will always be a register. I double checked, the function
> is only called in 'reload_cse_regs' and 'testreg' is created using
> 'gen_rtx_REG'.

Ok, let's go ahead with it.

Bernd

Hi Andre,

On 9 December 2016 at 17:16, Andre Vieira (lists) <Andre.SimoesDiasVieira@arm.com> wrote:
> On 09/12/16 16:02, Ramana Radhakrishnan wrote:
>> On Fri, Dec 9, 2016 at 3:58 PM, Bernd Schmidt <bschmidt@redhat.com> wrote:
>>> On 12/09/2016 04:34 PM, Andre Vieira (lists) wrote:
>>>
>>>> Regardless, the other testcases I add in this patch show a sub-optimal
>>>> transformation done by postreload, turning direct calls into indirect
>>>> calls, for targets which have specifically pointed out that no CSE
>>>> should be done on functions through 'NO_FUNCTION_CSE'.
>>>
>>>
>>> What I'm wondering about is whether the patch wouldn't also prevent the
>>> opposite transformation. Is there a reason not to do that one? Can the
>>> problem be modeled by tweaking costs?
>>
>> I really don't think we should have a solution that relies on costs
>> for correctness .
>>
>> regards
>> Ramana
>>
>
> Regardless, 'reload_cse_simplify' would never perform the opposite
> transformation. It checks whether it can replace anything within the
> first argument INSN, with the second argument TESTREG. As the name
> implies this will always be a register. I double checked, the function
> is only called in 'reload_cse_regs' and 'testreg' is created using
> 'gen_rtx_REG'.
>

The new test (gcc.target/arm/pr78255-2.c scan-assembler b\\s+bar)
added at r243494 fails on old arm architectures, such as:
* arm-none-linux-gnueabi, forcing -march=armv5t in runtestflags
* arm-none-eabi with default cpu/fpu/mode

Christophe

> Cheers,
> Andre

On 09/12/16 16:31, Bernd Schmidt wrote:
> On 12/09/2016 05:16 PM, Andre Vieira (lists) wrote:
>
>> Regardless, 'reload_cse_simplify' would never perform the opposite
>> transformation. It checks whether it can replace anything within the
>> first argument INSN, with the second argument TESTREG. As the name
>> implies this will always be a register. I double checked, the function
>> is only called in 'reload_cse_regs' and 'testreg' is created using
>> 'gen_rtx_REG'.
>
> Ok, let's go ahead with it.
>
>
> Bernd
>

Hello,

Is it OK to backport this (including the testcase fix) to gcc-6-branch?

Patches apply cleanly, and full bootstrap and regression tests pass for
aarch64- and arm-none-linux-gnueabihf. Regression tested for
arm-none-eabi.

Cheers,
Andre

On 01/06/2017 03:53 AM, Andre Vieira (lists) wrote:
> On 09/12/16 16:31, Bernd Schmidt wrote:
>> On 12/09/2016 05:16 PM, Andre Vieira (lists) wrote:
>>
>>> Regardless, 'reload_cse_simplify' would never perform the opposite
>>> transformation. It checks whether it can replace anything within the
>>> first argument INSN, with the second argument TESTREG. As the name
>>> implies this will always be a register. I double checked, the function
>>> is only called in 'reload_cse_regs' and 'testreg' is created using
>>> 'gen_rtx_REG'.
>>
>> Ok, let's go ahead with it.
>>
>>
>> Bernd
>>
> Hello,
>
> Is it OK to backport this (including the testcase fix) to gcc-6-branch?
>
> Patches apply cleanly and full bootstrap and regression tests for
> aarch64- and arm-none-linux-gnueabihf. Regression tested for arm-none-eabi.

Yes, that should be fine to backport to the active release branches.

jeff

On 06/01/17 15:47, Jeff Law wrote:
> On 01/06/2017 03:53 AM, Andre Vieira (lists) wrote:
>> On 09/12/16 16:31, Bernd Schmidt wrote:
>>> On 12/09/2016 05:16 PM, Andre Vieira (lists) wrote:
>>>
>>>> Regardless, 'reload_cse_simplify' would never perform the opposite
>>>> transformation. It checks whether it can replace anything within the
>>>> first argument INSN, with the second argument TESTREG. As the name
>>>> implies this will always be a register. I double checked, the function
>>>> is only called in 'reload_cse_regs' and 'testreg' is created using
>>>> 'gen_rtx_REG'.
>>>
>>> Ok, let's go ahead with it.
>>>
>>>
>>> Bernd
>>>
>> Hello,
>>
>> Is it OK to backport this (including the testcase fix) to gcc-6-branch?
>>
>> Patches apply cleanly and full bootstrap and regression tests for
>> aarch64- and arm-none-linux-gnueabihf. Regression tested for
>> arm-none-eabi.
> Yes, that should be fine to backport to the active release branches.
>
> jeff

OK, I have committed the backports to gcc-5 and gcc-6 branches.

Cheers,
Andre

diff --git a/gcc/postreload.c b/gcc/postreload.c
index 539ad33b6c3eb1b968677419a7420badc3a52f01..8325d121c403786fdb7804956724a81d134252a2 100644
--- a/gcc/postreload.c
+++ b/gcc/postreload.c
@@ -90,6 +90,11 @@ reload_cse_simplify (rtx_insn *insn, rtx testreg)
   basic_block insn_bb = BLOCK_FOR_INSN (insn);
   unsigned insn_bb_succs = EDGE_COUNT (insn_bb->succs);
 
+  /* If NO_FUNCTION_CSE has been set by the target, then we should not try
+     to cse function calls.  */
+  if (NO_FUNCTION_CSE && CALL_P (insn))
+    return false;
+
   if (GET_CODE (body) == SET)
     {
       int count = 0;
diff --git a/gcc/testsuite/gcc.target/aarch64/pr78255.c b/gcc/testsuite/gcc.target/aarch64/pr78255.c
new file mode 100644
index 0000000000000000000000000000000000000000..b078cf3e1c1c7717c9e227721a367f9846f0c7fe
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr78255.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mcmodel=tiny" } */
+
+extern int bar (void *);
+
+int
+foo (void)
+{
+  return bar ((void *)bar);
+}
+
+/* { dg-final { scan-assembler "b\\s+bar" } } */
diff --git a/gcc/testsuite/gcc.target/arm/pr78255-1.c b/gcc/testsuite/gcc.target/arm/pr78255-1.c
new file mode 100644
index 0000000000000000000000000000000000000000..4901acea51466c0bac92d9cb90e52b00b450d88a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr78255-1.c
@@ -0,0 +1,57 @@
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+#include <string.h>
+
+struct table_s
+  {
+    void (*fun0)
+      ( void );
+    void (*fun1)
+      ( void );
+    void (*fun2)
+      ( void );
+    void (*fun3)
+      ( void );
+    void (*fun4)
+      ( void );
+    void (*fun5)
+      ( void );
+    void (*fun6)
+      ( void );
+    void (*fun7)
+      ( void );
+  } table;
+
+void callback0(){__asm("mov r0, r0 \n\t");}
+void callback1(){__asm("mov r0, r0 \n\t");}
+void callback2(){__asm("mov r0, r0 \n\t");}
+void callback3(){__asm("mov r0, r0 \n\t");}
+void callback4(){__asm("mov r0, r0 \n\t");}
+
+void test (void) {
+  memset(&table, 0, sizeof table);
+
+  asm volatile ("" : : : "r3");
+
+  table.fun0 = callback0;
+  table.fun1 = callback1;
+  table.fun2 = callback2;
+  table.fun3 = callback3;
+  table.fun4 = callback4;
+  table.fun0();
+}
+
+void foo (void)
+{
+  __builtin_abort ();
+}
+
+int main (void)
+{
+  unsigned long p = (unsigned long) &foo;
+  asm volatile ("mov r3, %0" : : "r" (p));
+  test ();
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/arm/pr78255-2.c b/gcc/testsuite/gcc.target/arm/pr78255-2.c
new file mode 100644
index 0000000000000000000000000000000000000000..9e64ef3939465b088e35a01d4bb23fd50d43006d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr78255-2.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+extern int bar (void *);
+
+int
+foo (void)
+{
+  return bar ((void*)bar);
+}
+
+/* { dg-final { scan-assembler "b\\s+bar" } } */