Message ID | 197abd1f-081c-3206-4dd5-45f0b098612a@linux.ibm.com |
---|---|
State | New |
Headers | show |
Series | rs6000: Don't use optimize_function_for_speed_p too early [PR108184] | expand |
On Wed, Jan 04, 2023 at 05:20:14PM +0800, Kewen.Lin wrote: > As Honza pointed out in [1], the current uses of function > optimize_function_for_speed_p in rs6000_option_override_internal > are too early, since the query results from the functions > optimize_function_for_{speed,size}_p could be changed later due > to profile feedback and some function attributes handlings etc. > > This patch is to move optimize_function_for_speed_p to all the > use places of the corresponding flags, which follows the existing > practices. Maybe we can cache it somewhere at an appropriate > timing, but that's another thing. > @@ -25604,7 +25602,9 @@ rs6000_call_aix (rtx value, rtx func_desc, rtx tlsarg, rtx cookie) > > /* Can we optimize saving the TOC in the prologue or > do we need to do it at every call? */ > - if (TARGET_SAVE_TOC_INDIRECT && !cfun->calls_alloca) > + if (TARGET_SAVE_TOC_INDIRECT > + && !cfun->calls_alloca > + && optimize_function_for_speed_p (cfun)) > cfun->machine->save_toc_in_prologue = true; Is this correct? If so, it really needs a separate testcase. The rest looks good. Thanks! Segher
Hi Segher, Thanks for the comments. on 2023/1/4 18:46, Segher Boessenkool wrote: > On Wed, Jan 04, 2023 at 05:20:14PM +0800, Kewen.Lin wrote: >> As Honza pointed out in [1], the current uses of function >> optimize_function_for_speed_p in rs6000_option_override_internal >> are too early, since the query results from the functions >> optimize_function_for_{speed,size}_p could be changed later due >> to profile feedback and some function attributes handlings etc. >> >> This patch is to move optimize_function_for_speed_p to all the >> use places of the corresponding flags, which follows the existing >> practices. Maybe we can cache it somewhere at an appropriate >> timing, but that's another thing. > >> @@ -25604,7 +25602,9 @@ rs6000_call_aix (rtx value, rtx func_desc, rtx tlsarg, rtx cookie) >> >> /* Can we optimize saving the TOC in the prologue or >> do we need to do it at every call? */ >> - if (TARGET_SAVE_TOC_INDIRECT && !cfun->calls_alloca) >> + if (TARGET_SAVE_TOC_INDIRECT >> + && !cfun->calls_alloca >> + && optimize_function_for_speed_p (cfun)) >> cfun->machine->save_toc_in_prologue = true; > > Is this correct? If so, it really needs a separate testcase. > Yes, it just moves the condition from: --- a/gcc/config/rs6000/rs6000.cc +++ b/gcc/config/rs6000/rs6000.cc @@ -3978,8 +3978,7 @@ rs6000_option_override_internal (bool global_init_p) /* If we can shrink-wrap the TOC register save separately, then use -msave-toc-indirect unless explicitly disabled. */ if ((rs6000_isa_flags_explicit & OPTION_MASK_SAVE_TOC_INDIRECT) == 0 - && flag_shrink_wrap_separate - && optimize_function_for_speed_p (cfun)) + && flag_shrink_wrap_separate) rs6000_isa_flags |= OPTION_MASK_SAVE_TOC_INDIRECT; here. I tried to find one test case before, but failed to find one which is not fragile to test. And I thought the associated test case has demonstrated why the use of optimize_function_for_{speed,size}_p is too early in function rs6000_option_override_internal, so I gave up then. Do you worry about that we could revert it unexpectedly in future and no sensitive test case is on it? BR, Kewen
Hi! On Wed, Jan 04, 2023 at 08:15:03PM +0800, Kewen.Lin wrote: > on 2023/1/4 18:46, Segher Boessenkool wrote: > >> @@ -25604,7 +25602,9 @@ rs6000_call_aix (rtx value, rtx func_desc, rtx tlsarg, rtx cookie) > >> > >> /* Can we optimize saving the TOC in the prologue or > >> do we need to do it at every call? */ > >> - if (TARGET_SAVE_TOC_INDIRECT && !cfun->calls_alloca) > >> + if (TARGET_SAVE_TOC_INDIRECT > >> + && !cfun->calls_alloca > >> + && optimize_function_for_speed_p (cfun)) > >> cfun->machine->save_toc_in_prologue = true; > > > > Is this correct? If so, it really needs a separate testcase. > > Yes, it just moves the condition from: > > --- a/gcc/config/rs6000/rs6000.cc > +++ b/gcc/config/rs6000/rs6000.cc > @@ -3978,8 +3978,7 @@ rs6000_option_override_internal (bool global_init_p) > /* If we can shrink-wrap the TOC register save separately, then use > -msave-toc-indirect unless explicitly disabled. */ > if ((rs6000_isa_flags_explicit & OPTION_MASK_SAVE_TOC_INDIRECT) == 0 > - && flag_shrink_wrap_separate > - && optimize_function_for_speed_p (cfun)) > + && flag_shrink_wrap_separate) > rs6000_isa_flags |= OPTION_MASK_SAVE_TOC_INDIRECT; > > here. That "just" reinforces that this really needs a testcase! It is all action at a distance, none of this is trivial (if it was there would not be a bug here in the first place, of course). > I tried to find one test case before, but failed to find one which is not fragile > to test. And I thought the associated test case has demonstrated why the use of > optimize_function_for_{speed,size}_p is too early in function > rs6000_option_override_internal, so I gave up then. Do you worry about that we > could revert it unexpectedly in future and no sensitive test case is on it? I worry that it might contradict what some other code does. I also worry that it just is not a sensible thing to do. I do not worry that your patch is not an improvement. But the resulting code more clearly (than the original) is problematic. Where is r2 saved to the frame if save_toc_in_prologue is false? Segher
on 2023/1/4 22:02, Segher Boessenkool wrote: > Hi! > > On Wed, Jan 04, 2023 at 08:15:03PM +0800, Kewen.Lin wrote: >> on 2023/1/4 18:46, Segher Boessenkool wrote: >>>> @@ -25604,7 +25602,9 @@ rs6000_call_aix (rtx value, rtx func_desc, rtx tlsarg, rtx cookie) >>>> >>>> /* Can we optimize saving the TOC in the prologue or >>>> do we need to do it at every call? */ >>>> - if (TARGET_SAVE_TOC_INDIRECT && !cfun->calls_alloca) >>>> + if (TARGET_SAVE_TOC_INDIRECT >>>> + && !cfun->calls_alloca >>>> + && optimize_function_for_speed_p (cfun)) >>>> cfun->machine->save_toc_in_prologue = true; >>> >>> Is this correct? If so, it really needs a separate testcase. >> >> Yes, it just moves the condition from: >> >> --- a/gcc/config/rs6000/rs6000.cc >> +++ b/gcc/config/rs6000/rs6000.cc >> @@ -3978,8 +3978,7 @@ rs6000_option_override_internal (bool global_init_p) >> /* If we can shrink-wrap the TOC register save separately, then use >> -msave-toc-indirect unless explicitly disabled. */ >> if ((rs6000_isa_flags_explicit & OPTION_MASK_SAVE_TOC_INDIRECT) == 0 >> - && flag_shrink_wrap_separate >> - && optimize_function_for_speed_p (cfun)) >> + && flag_shrink_wrap_separate) >> rs6000_isa_flags |= OPTION_MASK_SAVE_TOC_INDIRECT; >> >> here. > > That "just" reinforces that this really needs a testcase! It is all > action at a distance, none of this is trivial (if it was there would > not be a bug here in the first place, of course). OK, I'll make a test case for it. :) > >> I tried to find one test case before, but failed to find one which is not fragile >> to test. And I thought the associated test case has demonstrated why the use of >> optimize_function_for_{speed,size}_p is too early in function >> rs6000_option_override_internal, so I gave up then. Do you worry about that we >> could revert it unexpectedly in future and no sensitive test case is on it? > > I worry that it might contradict what some other code does. I also > worry that it just is not a sensible thing to do. > > I do not worry that your patch is not an improvement. But the resulting > code more clearly (than the original) is problematic. Where is r2 saved > to the frame if save_toc_in_prologue is false? If save_toc_in_prologue is false, the r2 saving to frame would occur at each indirect call. Currently separate shrink-wrapping will check save_toc_in_prologue to decide whether to consider saving toc as one component, I think that's why we enable save-toc-indirect implicitly (going to set save_toc_in_prologue) if it's not specified explicitly and doing separate shrink-wrapping. BR, Kewen
diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md index b1fcc69bb60..11f1779e7bf 100644 --- a/gcc/config/rs6000/predicates.md +++ b/gcc/config/rs6000/predicates.md @@ -1878,7 +1878,9 @@ (define_predicate "fusion_gpr_mem_load" /* Handle sign/zero extend. */ if (GET_CODE (op) == ZERO_EXTEND - || (TARGET_P8_FUSION_SIGN && GET_CODE (op) == SIGN_EXTEND)) + || (TARGET_P8_FUSION_SIGN + && GET_CODE (op) == SIGN_EXTEND + && optimize_function_for_speed_p (cfun))) { op = XEXP (op, 0); mode = GET_MODE (op); diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc index eb7ad5e954f..bbf829f45d9 100644 --- a/gcc/config/rs6000/rs6000.cc +++ b/gcc/config/rs6000/rs6000.cc @@ -3978,8 +3978,7 @@ rs6000_option_override_internal (bool global_init_p) /* If we can shrink-wrap the TOC register save separately, then use -msave-toc-indirect unless explicitly disabled. */ if ((rs6000_isa_flags_explicit & OPTION_MASK_SAVE_TOC_INDIRECT) == 0 - && flag_shrink_wrap_separate - && optimize_function_for_speed_p (cfun)) + && flag_shrink_wrap_separate) rs6000_isa_flags |= OPTION_MASK_SAVE_TOC_INDIRECT; /* Enable power8 fusion if we are tuning for power8, even if we aren't @@ -4013,7 +4012,6 @@ rs6000_option_override_internal (bool global_init_p) zero extending load, and an explicit sign extension. */ if (TARGET_P8_FUSION && !(rs6000_isa_flags_explicit & OPTION_MASK_P8_FUSION_SIGN) - && optimize_function_for_speed_p (cfun) && optimize >= 3) rs6000_isa_flags |= OPTION_MASK_P8_FUSION_SIGN; @@ -25604,7 +25602,9 @@ rs6000_call_aix (rtx value, rtx func_desc, rtx tlsarg, rtx cookie) /* Can we optimize saving the TOC in the prologue or do we need to do it at every call? */ - if (TARGET_SAVE_TOC_INDIRECT && !cfun->calls_alloca) + if (TARGET_SAVE_TOC_INDIRECT + && !cfun->calls_alloca + && optimize_function_for_speed_p (cfun)) cfun->machine->save_toc_in_prologue = true; else { @@ -27471,7 +27471,9 @@ fusion_gpr_load_p (rtx addis_reg, /* register set via addis. */ /* Allow sign/zero extension. */ if (GET_CODE (mem) == ZERO_EXTEND - || (GET_CODE (mem) == SIGN_EXTEND && TARGET_P8_FUSION_SIGN)) + || (GET_CODE (mem) == SIGN_EXTEND + && TARGET_P8_FUSION_SIGN + && optimize_function_for_speed_p (cfun))) mem = XEXP (mem, 0); if (!MEM_P (mem)) @@ -27535,7 +27537,9 @@ expand_fusion_gpr_load (rtx *operands) enum rtx_code extend = UNKNOWN; if (GET_CODE (orig_mem) == ZERO_EXTEND - || (TARGET_P8_FUSION_SIGN && GET_CODE (orig_mem) == SIGN_EXTEND)) + || (TARGET_P8_FUSION_SIGN + && GET_CODE (orig_mem) == SIGN_EXTEND + && optimize_function_for_speed_p (cfun))) { extend = GET_CODE (orig_mem); orig_mem = XEXP (orig_mem, 0); diff --git a/gcc/testsuite/gcc.target/powerpc/pr108184.c b/gcc/testsuite/gcc.target/powerpc/pr108184.c new file mode 100644 index 00000000000..8f1e91d9258 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/pr108184.c @@ -0,0 +1,15 @@ +/* Only possible to fuse sign extended loads with the addis when + optimize >= 3 and Power8 fusion takes effects. */ +/* { dg-options "-mdejagnu-tune=power8 -O3" } */ + +/* Verify it doesn't optimize this function for speed as it's cold, + by checking it doesn't try to fuse sign extended loads with addis, + which results in a zero extended load and a sign extension. */ + +__attribute__ ((cold)) int +fusion_short (short *p) +{ + return p[0x12345]; +} + +/* { dg-final { scan-assembler-not {\mextsh\M} } } */