diff mbox

[AArch64] Improve SHA1 scheduling

Message ID AM5PR0802MB26102AB89F81EA35890A192083A80@AM5PR0802MB2610.eurprd08.prod.outlook.com
State New
Headers show

Commit Message

Wilco Dijkstra Oct. 25, 2016, 5:08 p.m. UTC
SHA1H instructions may be scheduled after a SHA1C instruction
that uses the same input register.  However SHA1C updates its input,
so if SHA1H is scheduled after it, it requires an extra move.
Increase the priority of SHA1H to ensure it gets scheduled
earlier, avoiding the move.

Is this something the generic scheduler could do automatically for
instructions with RMW operands?

Passes bootstrap & regress. OK for commit?

ChangeLog:
2016-10-25  Wilco Dijkstra  <wdijkstr@arm.com>

	* config/aarch64/aarch64.c (aarch64_sched_adjust_priority)
	New function.
	(TARGET_SCHED_ADJUST_PRIORITY): Define target hook.
--

Comments

Andrew Pinski Nov. 2, 2016, 5:13 p.m. UTC | #1
On Tue, Oct 25, 2016 at 10:08 AM, Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote:
> SHA1H instructions may be scheduled after a SHA1C instruction
> that uses the same input register.  However SHA1C updates its input,
> so if SHA1H is scheduled after it, it requires an extra move.
> Increase the priority of SHA1H to ensure it gets scheduled
> earlier, avoiding the move.
>
> Is this something the generic scheduler could do automatically for
> instructions with RMW operands?

I was thinking that but there are many of those on x86 so it might not
make sense at all.
Also the SIMD instruction FMLA has the similar problem most likely
though I don't know if it make sense to do a similar thing for that
one.
Maybe it makes sense to mark the instructions in the .md file with
some attribute and then just check for that attribute here instead of
special casing SHA1C.
Something like (not a full patch and don't claim this is correct):
(define_attr "rmw" "yes,no" (const_string "no"))

...
  "sha1<sha1_op>\\t%q0, %s2, %3.4s"
  [(set_attr "type" "crypto_sha1_slow")]
  [(set_attr "rmw" "yes")]


...

static int
aarch64_sched_adjust_priority (rtx_insn *insn, int priority)
{

  if (get_attr_rmw (insn) == yes)
    return MAX(priority - 10, 0);

  return priority;
}




Thanks,
Andrew

>
> Passes bootstrap & regress. OK for commit?
>
> ChangeLog:
> 2016-10-25  Wilco Dijkstra  <wdijkstr@arm.com>
>
>         * config/aarch64/aarch64.c (aarch64_sched_adjust_priority)
>         New function.
>         (TARGET_SCHED_ADJUST_PRIORITY): Define target hook.
> --
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 9b2f9cb19343828dc39e9950ebbefe941521942a..2b25bd1bdd6f4e7737f8e04c3b3684cdff6c4b80 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -13668,6 +13668,26 @@ aarch64_sched_fusion_priority (rtx_insn *insn, int max_pri,
>    return;
>  }
>
> +/* Implement the TARGET_SCHED_ADJUST_PRIORITY hook.
> +   Adjust priority of sha1h instructions so they are scheduled before
> +   other SHA1 instructions.  */
> +
> +static int
> +aarch64_sched_adjust_priority (rtx_insn *insn, int priority)
> +{
> +  rtx x = PATTERN (insn);
> +
> +  if (GET_CODE (x) == SET)
> +    {
> +      x = SET_SRC (x);
> +
> +      if (GET_CODE (x) == UNSPEC && XINT (x, 1) == UNSPEC_SHA1H)
> +       return priority + 10;
> +    }
> +
> +  return priority;
> +}
> +
>  /* Given OPERANDS of consecutive load/store, check if we can merge
>     them into ldp/stp.  LOAD is true if they are load instructions.
>     MODE is the mode of memory operands.  */
> @@ -14431,6 +14451,9 @@ aarch64_optab_supported_p (int op, machine_mode mode1, machine_mode,
>  #undef TARGET_CAN_USE_DOLOOP_P
>  #define TARGET_CAN_USE_DOLOOP_P can_use_doloop_if_innermost
>
> +#undef TARGET_SCHED_ADJUST_PRIORITY
> +#define TARGET_SCHED_ADJUST_PRIORITY aarch64_sched_adjust_priority
> +
>  #undef TARGET_SCHED_MACRO_FUSION_P
>  #define TARGET_SCHED_MACRO_FUSION_P aarch64_macro_fusion_p
>
>
Wilco Dijkstra Nov. 3, 2016, 12:14 p.m. UTC | #2
Andrew Pinski wrote:
> On Tue, Oct 25, 2016 at 10:08 AM, Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote:
> > SHA1H instructions may be scheduled after a SHA1C instruction
> > that uses the same input register.  However SHA1C updates its input,
> > so if SHA1H is scheduled after it, it requires an extra move.
> > Increase the priority of SHA1H to ensure it gets scheduled
> > earlier, avoiding the move.
> >
> > Is this something the generic scheduler could do automatically for
> > instructions with RMW operands?
>
> I was thinking that but there are many of those on x86 so it might not
> make sense at all.
> Also the SIMD instruction FMLA has the similar problem most likely
> though I don't know if it make sense to do a similar thing for that
> one.

Indeed it doesn't always make sense. I've never seen the issue with
FMLA for example, so maybe a more generic mechanism isn't required.

> Maybe it makes sense to mark the instructions in the .md file with
> some attribute and then just check for that attribute here instead of
> special casing SHA1C.

That should be possible if adding attributes is not expensive in terms of
memory/compile time.

Wilco
diff mbox

Patch

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 9b2f9cb19343828dc39e9950ebbefe941521942a..2b25bd1bdd6f4e7737f8e04c3b3684cdff6c4b80 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -13668,6 +13668,26 @@  aarch64_sched_fusion_priority (rtx_insn *insn, int max_pri,
   return;
 }
 
+/* Implement the TARGET_SCHED_ADJUST_PRIORITY hook.
+   Adjust priority of sha1h instructions so they are scheduled before
+   other SHA1 instructions.  */
+
+static int
+aarch64_sched_adjust_priority (rtx_insn *insn, int priority)
+{
+  rtx x = PATTERN (insn);
+
+  if (GET_CODE (x) == SET)
+    {
+      x = SET_SRC (x);
+
+      if (GET_CODE (x) == UNSPEC && XINT (x, 1) == UNSPEC_SHA1H)
+	return priority + 10;
+    }
+
+  return priority;
+}
+
 /* Given OPERANDS of consecutive load/store, check if we can merge
    them into ldp/stp.  LOAD is true if they are load instructions.
    MODE is the mode of memory operands.  */
@@ -14431,6 +14451,9 @@  aarch64_optab_supported_p (int op, machine_mode mode1, machine_mode,
 #undef TARGET_CAN_USE_DOLOOP_P
 #define TARGET_CAN_USE_DOLOOP_P can_use_doloop_if_innermost
 
+#undef TARGET_SCHED_ADJUST_PRIORITY
+#define TARGET_SCHED_ADJUST_PRIORITY aarch64_sched_adjust_priority
+
 #undef TARGET_SCHED_MACRO_FUSION_P
 #define TARGET_SCHED_MACRO_FUSION_P aarch64_macro_fusion_p