Message ID | AM5PR0802MB26102AB89F81EA35890A192083A80@AM5PR0802MB2610.eurprd08.prod.outlook.com |
---|---|
State | New |
Headers | show |
On Tue, Oct 25, 2016 at 10:08 AM, Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote: > SHA1H instructions may be scheduled after a SHA1C instruction > that uses the same input register. However SHA1C updates its input, > so if SHA1H is scheduled after it, it requires an extra move. > Increase the priority of SHA1H to ensure it gets scheduled > earlier, avoiding the move. > > Is this something the generic scheduler could do automatically for > instructions with RMW operands? I was thinking that but there are many of those on x86 so it might not make sense at all. Also the SIMD instruction FMLA has the similar problem most likely though I don't know if it make sense to do a similar thing for that one. Maybe it makes sense to mark the instructions in the .md file with some attribute and then just check for that attribute here instead of special casing SHA1C. Something like (not a full patch and don't claim this is correct): (define_attr "rmw" "yes,no" (const_string "no")) ... "sha1<sha1_op>\\t%q0, %s2, %3.4s" [(set_attr "type" "crypto_sha1_slow")] [(set_attr "rmw" "yes")] ... static int aarch64_sched_adjust_priority (rtx_insn *insn, int priority) { if (get_attr_rmw (insn) == yes) return MAX(priority - 10, 0); return priority; } Thanks, Andrew > > Passes bootstrap & regress. OK for commit? > > ChangeLog: > 2016-10-25 Wilco Dijkstra <wdijkstr@arm.com> > > * config/aarch64/aarch64.c (aarch64_sched_adjust_priority) > New function. > (TARGET_SCHED_ADJUST_PRIORITY): Define target hook. > -- > diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c > index 9b2f9cb19343828dc39e9950ebbefe941521942a..2b25bd1bdd6f4e7737f8e04c3b3684cdff6c4b80 100644 > --- a/gcc/config/aarch64/aarch64.c > +++ b/gcc/config/aarch64/aarch64.c > @@ -13668,6 +13668,26 @@ aarch64_sched_fusion_priority (rtx_insn *insn, int max_pri, > return; > } > > +/* Implement the TARGET_SCHED_ADJUST_PRIORITY hook. > + Adjust priority of sha1h instructions so they are scheduled before > + other SHA1 instructions. */ > + > +static int > +aarch64_sched_adjust_priority (rtx_insn *insn, int priority) > +{ > + rtx x = PATTERN (insn); > + > + if (GET_CODE (x) == SET) > + { > + x = SET_SRC (x); > + > + if (GET_CODE (x) == UNSPEC && XINT (x, 1) == UNSPEC_SHA1H) > + return priority + 10; > + } > + > + return priority; > +} > + > /* Given OPERANDS of consecutive load/store, check if we can merge > them into ldp/stp. LOAD is true if they are load instructions. > MODE is the mode of memory operands. */ > @@ -14431,6 +14451,9 @@ aarch64_optab_supported_p (int op, machine_mode mode1, machine_mode, > #undef TARGET_CAN_USE_DOLOOP_P > #define TARGET_CAN_USE_DOLOOP_P can_use_doloop_if_innermost > > +#undef TARGET_SCHED_ADJUST_PRIORITY > +#define TARGET_SCHED_ADJUST_PRIORITY aarch64_sched_adjust_priority > + > #undef TARGET_SCHED_MACRO_FUSION_P > #define TARGET_SCHED_MACRO_FUSION_P aarch64_macro_fusion_p > >
Andrew Pinski wrote: > On Tue, Oct 25, 2016 at 10:08 AM, Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote: > > SHA1H instructions may be scheduled after a SHA1C instruction > > that uses the same input register. However SHA1C updates its input, > > so if SHA1H is scheduled after it, it requires an extra move. > > Increase the priority of SHA1H to ensure it gets scheduled > > earlier, avoiding the move. > > > > Is this something the generic scheduler could do automatically for > > instructions with RMW operands? > > I was thinking that but there are many of those on x86 so it might not > make sense at all. > Also the SIMD instruction FMLA has the similar problem most likely > though I don't know if it make sense to do a similar thing for that > one. Indeed it doesn't always make sense. I've never seen the issue with FMLA for example, so maybe a more generic mechanism isn't required. > Maybe it makes sense to mark the instructions in the .md file with > some attribute and then just check for that attribute here instead of > special casing SHA1C. That should be possible if adding attributes is not expensive in terms of memory/compile time. Wilco
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 9b2f9cb19343828dc39e9950ebbefe941521942a..2b25bd1bdd6f4e7737f8e04c3b3684cdff6c4b80 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -13668,6 +13668,26 @@ aarch64_sched_fusion_priority (rtx_insn *insn, int max_pri, return; } +/* Implement the TARGET_SCHED_ADJUST_PRIORITY hook. + Adjust priority of sha1h instructions so they are scheduled before + other SHA1 instructions. */ + +static int +aarch64_sched_adjust_priority (rtx_insn *insn, int priority) +{ + rtx x = PATTERN (insn); + + if (GET_CODE (x) == SET) + { + x = SET_SRC (x); + + if (GET_CODE (x) == UNSPEC && XINT (x, 1) == UNSPEC_SHA1H) + return priority + 10; + } + + return priority; +} + /* Given OPERANDS of consecutive load/store, check if we can merge them into ldp/stp. LOAD is true if they are load instructions. MODE is the mode of memory operands. */ @@ -14431,6 +14451,9 @@ aarch64_optab_supported_p (int op, machine_mode mode1, machine_mode, #undef TARGET_CAN_USE_DOLOOP_P #define TARGET_CAN_USE_DOLOOP_P can_use_doloop_if_innermost +#undef TARGET_SCHED_ADJUST_PRIORITY +#define TARGET_SCHED_ADJUST_PRIORITY aarch64_sched_adjust_priority + #undef TARGET_SCHED_MACRO_FUSION_P #define TARGET_SCHED_MACRO_FUSION_P aarch64_macro_fusion_p