Message ID | 58c24c28-0af8-4f0a-b884-a482c84e707e@arm.com |
---|---|
State | New |
Headers | show |
Series | arm: Fix missed CE optimization for armv8.1-m.main [PR 116444] | expand |
> On 25 Sep 2024, at 7:38 PM, Andre Vieira (lists) <andre.simoesdiasvieira@arm.com> wrote: > > External email: Use caution opening links or attachments > > > Hi, > > This patch restores missed optimizations for armv8.1-m.main targets that > were missed when the generation of csinc, csinv and csneg were enabled > or the same with patch series containing: > > commit c2bb84be4a6e581bbf45891457ee632a07416982 > Author: Sudi Das <sudi.das@arm.com> > Date: Fri Sep 18 15:47:46 2020 +0100 > > [PATCH 2/5][Arm] New pattern for CSINV instructions > > The original patch series makes use of the "noce" machinery to transfor > RTL into patterns that later match the Armv8.1-M Mainline, by getting > the target hook TARGET_HAVE_CONDITIONAL_EXECUTION, to return FALSE for > such targets prior to reload_completed. The same machinery however was > transforming other RTL patterns which were later on causing the "ce" > pass post reload_completed to no longer optimize conditional execution > opportunities, which was causing the regression observed in PR > target/116444. > > This patch implements the target hook > TARGET_NOCE_CONVERSION_PROFITABLE_P to only allow "noce" to generate > patterns that match CSINV, CSINC and CSNEG. Thus ensuring that the > early "ce" passes do not ruin things for later ones. > > gcc/ChangeLog: > > * config/arm/arm-protos.h (arm_noce_conversion_profitable_p): New > declaration. > * config/arm/arm.cc (arm_is_v81m_cond_insn): New helper function used > in ... > (arm_noce_conversion_profitable_p): ... here. New function to implement > ... > (TARGET_NOCE_PROFITABLE_P): ... this target hook. New define. > > > Regression tested on arm-none-eabi, also specifically tested > thumb-icvt-2.c for -mcpu=cortex-m55. > > OK for trunk and backport to gcc-14 (after a week and testing ofc)? > diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h > index 50cae2b513a262880e4f6ee577341d7a5d2c39c6..b694589cab4b45730b257baacfdbd9db00584cdc 100644 > --- a/gcc/config/arm/arm-protos.h > +++ b/gcc/config/arm/arm-protos.h > @@ -210,6 +210,7 @@ extern bool arm_pad_reg_upward (machine_mode, tree, int); > #endif > extern int arm_apply_result_size (void); > extern opt_machine_mode arm_get_mask_mode (machine_mode mode); > +extern bool arm_noce_conversion_profitable_p (rtx_insn *,struct noce_if_info *); > #endif /* RTX_CODE */ > diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc > index de34e9867e67f5c4730e92a4bc6f347d83847e58..fe9b1ad5bf8560858d3996c0659d65c9887307d9 100644 > --- a/gcc/config/arm/arm.cc > +++ b/gcc/config/arm/arm.cc > @@ -815,6 +815,9 @@ static const scoped_attribute_specs *const arm_attribute_table[] = > #undef TARGET_MODES_TIEABLE_P > #define TARGET_MODES_TIEABLE_P arm_modes_tieable_p > +#undef TARGET_NOCE_CONVERSION_PROFITABLE_P > +#define TARGET_NOCE_CONVERSION_PROFITABLE_P arm_noce_conversion_profitable_p > + > #undef TARGET_CAN_CHANGE_MODE_CLASS > #define TARGET_CAN_CHANGE_MODE_CLASS arm_can_change_mode_class > @@ -36074,6 +36077,90 @@ arm_get_mask_mode (machine_mode mode) > return default_get_mask_mode (mode); > } > +/* Helper function to determine whether SEQ represents a sequence of > + instructions representing the Armv8.1-M Mainline conditional arithmetic > + instructions: csinc, csneg and csinv. The cinc instruction is generated > + using a different mechanism. */ > + > +static bool > +arm_is_v81m_cond_insn (rtx_insn *seq) > +{ > + rtx_insn *curr_insn = seq; > + rtx set; > + /* The pattern may start with a simple set with register operands. Skip > + through any of those. */ > + while (curr_insn) > + { > + set = single_set (curr_insn); > + if (!set > + || !REG_P (SET_DEST (set))) > + return false; > + > + if (!REG_P (SET_SRC (set))) > + break; > + curr_insn = NEXT_INSN (curr_insn); Too late at night for me - but don’t you want to skip DEBUG_INSNS in some way here ? Ramana > + } > + > + if (!set) > + return false; > + > + /* The next instruction should be one of: > + NEG: for csneg, > + PLUS: for csinc, > + NOT: for csinv. */ > + if (GET_CODE (SET_SRC (set)) != NEG > + && GET_CODE (SET_SRC (set)) != PLUS > + && GET_CODE (SET_SRC (set)) != NOT) > + return false; > + > + curr_insn = NEXT_INSN (curr_insn); > + if (!curr_insn) > + return false; > + > + /* The next instruction should be a COMPARE. */ > + set = single_set (curr_insn); > + if (!set > + || !REG_P (SET_DEST (set)) > + || GET_CODE (SET_SRC (set)) != COMPARE) > + return false; > + > + curr_insn = NEXT_INSN (curr_insn); > + if (!curr_insn) > + return false; > + > + /* And the last instruction should be an IF_THEN_ELSE. */ > + set = single_set (curr_insn); > + if (!set > + || !REG_P (SET_DEST (set)) > + || GET_CODE (SET_SRC (set)) != IF_THEN_ELSE) > + return false; > + > + return !NEXT_INSN (curr_insn); > +} > + > +/* For Armv8.1-M Mainline we have both conditional execution through IT blocks, > + as well as conditional arithmetic instructions controlled by > + TARGET_COND_ARITH. To generate the latter we rely on a special part of the > + "ce" pass that generates code for targets that don't support conditional > + execution of general instructions known as "noce". These transformations > + happen before 'reload_completed'. However, "noce" also triggers for some > + unwanted patterns [PR 116444] that prevent "ce" optimisations after reload. > + To make sure we can get both we use the TARGET_NOCE_CONVERSION_PROFITABLE_P > + hook to only allow "noce" to generate the patterns that are profitable. */ > + > +bool > +arm_noce_conversion_profitable_p (rtx_insn *seq, struct noce_if_info *if_info) > +{ > + if (!TARGET_COND_ARITH > + || reload_completed) > + return true; > + > + if (arm_is_v81m_cond_insn (seq)) > + return true; > + > + return false; > +} > + > /* Output assembly to read the thread pointer from the appropriate TPIDR > register into DEST. If PRED_P also emit the %? that can be used to > output the predication code. */ > > Kind Regards, > Andre Vieira > <pr116444.patch>
On 26/09/2024 18:56, Ramana Radhakrishnan wrote: > >> +/* Helper function to determine whether SEQ represents a sequence of >> + instructions representing the Armv8.1-M Mainline conditional arithmetic >> + instructions: csinc, csneg and csinv. The cinc instruction is generated >> + using a different mechanism. */ >> + >> +static bool >> +arm_is_v81m_cond_insn (rtx_insn *seq) >> +{ >> + rtx_insn *curr_insn = seq; >> + rtx set; >> + /* The pattern may start with a simple set with register operands. Skip >> + through any of those. */ >> + while (curr_insn) >> + { >> + set = single_set (curr_insn); >> + if (!set >> + || !REG_P (SET_DEST (set))) >> + return false; >> + >> + if (!REG_P (SET_SRC (set))) >> + break; >> + curr_insn = NEXT_INSN (curr_insn); > > Too late at night for me - but don’t you want to skip DEBUG_INSNS in some way here ? It's a good point, but this sequence is created by noce as a potential replacement for the incoming one and no debug insns are inserted here. Compiling gcc/gcc/testsuite/gcc.target/arm/csinv-1.c with -g3 for an Armv8.1-M Mainline target still generates the csinv. Either way, I could add code to skip if we don't have a NONDEBUG_INSN_P, but that means we should also do so after every NEXT_INSN after the while loop and at the end. It does feel 'more robust', but I also fear it might be a bit overkill here? Kind regards, Andre
On Fri, Sep 27, 2024 at 2:11 PM Andre Vieira (lists) <andre.simoesdiasvieira@arm.com> wrote: > > > > On 26/09/2024 18:56, Ramana Radhakrishnan wrote: > > > > >> +/* Helper function to determine whether SEQ represents a sequence of > >> + instructions representing the Armv8.1-M Mainline conditional arithmetic > >> + instructions: csinc, csneg and csinv. The cinc instruction is generated > >> + using a different mechanism. */ > >> + > >> +static bool > >> +arm_is_v81m_cond_insn (rtx_insn *seq) > >> +{ > >> + rtx_insn *curr_insn = seq; > >> + rtx set; > >> + /* The pattern may start with a simple set with register operands. Skip > >> + through any of those. */ > >> + while (curr_insn) > >> + { > >> + set = single_set (curr_insn); > >> + if (!set > >> + || !REG_P (SET_DEST (set))) > >> + return false; > >> + > >> + if (!REG_P (SET_SRC (set))) > >> + break; > >> + curr_insn = NEXT_INSN (curr_insn); > > > > Too late at night for me - but don’t you want to skip DEBUG_INSNS in some way here ? > > > It's a good point, but this sequence is created by noce as a potential > replacement for the incoming one and no debug insns are inserted here. True and fair. > > Compiling gcc/gcc/testsuite/gcc.target/arm/csinv-1.c with -g3 for an > Armv8.1-M Mainline target still generates the csinv. > > Either way, I could add code to skip if we don't have a NONDEBUG_INSN_P, > but that means we should also do so after every NEXT_INSN after the > while loop and at the end. It does feel 'more robust', but I also fear > it might be a bit overkill here? Feels overkill as you say. Please watch out for any regressions Ok to commit and later backport if no regressions as this is sufficiently isolated to only Armv8.1M configurations. regards Ramana > > Kind regards, > Andre > >
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h index 50cae2b513a262880e4f6ee577341d7a5d2c39c6..b694589cab4b45730b257baacfdbd9db00584cdc 100644 --- a/gcc/config/arm/arm-protos.h +++ b/gcc/config/arm/arm-protos.h @@ -210,6 +210,7 @@ extern bool arm_pad_reg_upward (machine_mode, tree, int); #endif extern int arm_apply_result_size (void); extern opt_machine_mode arm_get_mask_mode (machine_mode mode); +extern bool arm_noce_conversion_profitable_p (rtx_insn *,struct noce_if_info *); #endif /* RTX_CODE */ diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc index de34e9867e67f5c4730e92a4bc6f347d83847e58..fe9b1ad5bf8560858d3996c0659d65c9887307d9 100644 --- a/gcc/config/arm/arm.cc +++ b/gcc/config/arm/arm.cc @@ -815,6 +815,9 @@ static const scoped_attribute_specs *const arm_attribute_table[] = #undef TARGET_MODES_TIEABLE_P #define TARGET_MODES_TIEABLE_P arm_modes_tieable_p +#undef TARGET_NOCE_CONVERSION_PROFITABLE_P +#define TARGET_NOCE_CONVERSION_PROFITABLE_P arm_noce_conversion_profitable_p + #undef TARGET_CAN_CHANGE_MODE_CLASS #define TARGET_CAN_CHANGE_MODE_CLASS arm_can_change_mode_class @@ -36074,6 +36077,90 @@ arm_get_mask_mode (machine_mode mode) return default_get_mask_mode (mode); } +/* Helper function to determine whether SEQ represents a sequence of + instructions representing the Armv8.1-M Mainline conditional arithmetic + instructions: csinc, csneg and csinv. The cinc instruction is generated + using a different mechanism. */ + +static bool +arm_is_v81m_cond_insn (rtx_insn *seq) +{ + rtx_insn *curr_insn = seq; + rtx set; + /* The pattern may start with a simple set with register operands. Skip + through any of those. */ + while (curr_insn) + { + set = single_set (curr_insn); + if (!set + || !REG_P (SET_DEST (set))) + return false; + + if (!REG_P (SET_SRC (set))) + break; + curr_insn = NEXT_INSN (curr_insn); + } + + if (!set) + return false; + + /* The next instruction should be one of: + NEG: for csneg, + PLUS: for csinc, + NOT: for csinv. */ + if (GET_CODE (SET_SRC (set)) != NEG + && GET_CODE (SET_SRC (set)) != PLUS + && GET_CODE (SET_SRC (set)) != NOT) + return false; + + curr_insn = NEXT_INSN (curr_insn); + if (!curr_insn) + return false; + + /* The next instruction should be a COMPARE. */ + set = single_set (curr_insn); + if (!set + || !REG_P (SET_DEST (set)) + || GET_CODE (SET_SRC (set)) != COMPARE) + return false; + + curr_insn = NEXT_INSN (curr_insn); + if (!curr_insn) + return false; + + /* And the last instruction should be an IF_THEN_ELSE. */ + set = single_set (curr_insn); + if (!set + || !REG_P (SET_DEST (set)) + || GET_CODE (SET_SRC (set)) != IF_THEN_ELSE) + return false; + + return !NEXT_INSN (curr_insn); +} + +/* For Armv8.1-M Mainline we have both conditional execution through IT blocks, + as well as conditional arithmetic instructions controlled by + TARGET_COND_ARITH. To generate the latter we rely on a special part of the + "ce" pass that generates code for targets that don't support conditional + execution of general instructions known as "noce". These transformations + happen before 'reload_completed'. However, "noce" also triggers for some + unwanted patterns [PR 116444] that prevent "ce" optimisations after reload. + To make sure we can get both we use the TARGET_NOCE_CONVERSION_PROFITABLE_P + hook to only allow "noce" to generate the patterns that are profitable. */ + +bool +arm_noce_conversion_profitable_p (rtx_insn *seq, struct noce_if_info *if_info) +{ + if (!TARGET_COND_ARITH + || reload_completed) + return true; + + if (arm_is_v81m_cond_insn (seq)) + return true; + + return false; +} + /* Output assembly to read the thread pointer from the appropriate TPIDR register into DEST. If PRED_P also emit the %? that can be used to output the predication code. */