[AArch64] Improve SHA1 scheduling

Message ID	AM5PR0802MB26102AB89F81EA35890A192083A80@AM5PR0802MB2610.eurprd08.prod.outlook.com
State	New
Headers	show Return-Path: <gcc-patches-return-439537-incoming=patchwork.ozlabs.org@gcc.gnu.org> DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:cc:subject:date:message-id:mime-version:content-type :content-transfer-encoding; q=dns; s=default; b=UIFUzz0mXO2hr5J3 t8LZoxVyGQElTIUWl8rE2kKdlgZ93kiy8FgrDU6YTSMJH/Ox1VrKL3QI4QRsdn/J AoMjOPnYQLAqK7dbugOHR8Jgt0ooEde06SVSLE9AzLIZPglaESIi2Xlgw+nbOeIA H9dsSn0kAdOkO0vUelNQGwnPMI4= Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk Sender: gcc-patches-owner@gcc.gnu.org From: Wilco Dijkstra <Wilco.Dijkstra@arm.com> To: GCC Patches <gcc-patches@gcc.gnu.org> CC: nd <nd@arm.com> Subject: [PATCH][AArch64] Improve SHA1 scheduling Date: Tue, 25 Oct 2016 17:08:33 +0000 Message-ID: <AM5PR0802MB26102AB89F81EA35890A192083A80@AM5PR0802MB2610.eurprd08.prod.outlook.com> x-ms-office365-filtering-correlation-id: e4add604-8bdc-4c2b-7f48-08d3fcf98d23 x-microsoft-exchange-diagnostics: 1; AM5PR0802MB2609; 7:SCZXz4fJfLArfDdoAwZSN8B/cU8VJBwBspdNzFdKNs2Dt8bSwYWaV08BNK3f82eP7XyjXJbchI3RUrtuJfr3LRx6pVB9uSqZfXetx5G8ZyxNZHcOG2n/o257FVgc7ze62DcbnSYCQnZV7BO1iRIIMx3EzciUspqWgDqj7EY2KXXF7FhhPxKsrrNfznvvE+g8tgHVAjVc/SZWBd6+Xjc2hzVpgWtC2AoI3HYme/hUjyMcWmIeW5aDHCeiaOz4j9hJ1/RhFuuvKDratg89yIsjD+1PWftc4tC0DFaG4PxdrfZR8S25tTsNu0iopph6GVPgTGrxLFCmx89w5N6zwit1ocsbmelLHKmjeGjPlaTRDiM= nodisclaimer: True spamdiagnosticoutput: 1:99 spamdiagnosticmetadata: NSPM MIME-Version: 1.0 Content-Type: text/plain; charset=WINDOWS-1252 Content-Transfer-Encoding: quoted-printable

Message ID

AM5PR0802MB26102AB89F81EA35890A192083A80@AM5PR0802MB2610.eurprd08.prod.outlook.com

State

New

Headers

DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id
	:list-unsubscribe:list-archive:list-post:list-help:sender:from
	:to:cc:subject:date:message-id:mime-version:content-type
	:content-transfer-encoding; q=dns; s=default; b=UIFUzz0mXO2hr5J3
	t8LZoxVyGQElTIUWl8rE2kKdlgZ93kiy8FgrDU6YTSMJH/Ox1VrKL3QI4QRsdn/J
	AoMjOPnYQLAqK7dbugOHR8Jgt0ooEde06SVSLE9AzLIZPglaESIi2Xlgw+nbOeIA
	H9dsSn0kAdOkO0vUelNQGwnPMI4=
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
Sender: gcc-patches-owner@gcc.gnu.org
From: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
To: GCC Patches <gcc-patches@gcc.gnu.org>
CC: nd <nd@arm.com>
Subject: [PATCH][AArch64] Improve SHA1 scheduling
Date: Tue, 25 Oct 2016 17:08:33 +0000
Message-ID: <AM5PR0802MB26102AB89F81EA35890A192083A80@AM5PR0802MB2610.eurprd08.prod.outlook.com>
nodisclaimer: True
spamdiagnosticoutput: 1:99
spamdiagnosticmetadata: NSPM
MIME-Version: 1.0
X-MS-Exchange-CrossTenant-originalarrivaltime: 25 Oct 2016 17:08:33.3251
	(UTC)
X-MS-Exchange-CrossTenant-fromentityheader: Hosted
X-MS-Exchange-CrossTenant-id: f34e5979-57d9-4aaa-ad4d-b122a662184d
X-MS-Exchange-Transport-CrossTenantHeadersStamped: AM5PR0802MB2609
X-MC-Unique: eoLNaMAjMqe1O-pMv_resQ-1
Content-Type: text/plain; charset=WINDOWS-1252
Content-Transfer-Encoding: quoted-printable

Commit Message

Wilco Dijkstra Oct. 25, 2016, 5:08 p.m. UTC

SHA1H instructions may be scheduled after a SHA1C instruction
that uses the same input register.  However SHA1C updates its input,
so if SHA1H is scheduled after it, it requires an extra move.
Increase the priority of SHA1H to ensure it gets scheduled
earlier, avoiding the move.

Is this something the generic scheduler could do automatically for
instructions with RMW operands?

Passes bootstrap & regress. OK for commit?

ChangeLog:
2016-10-25  Wilco Dijkstra  <wdijkstr@arm.com>

	* config/aarch64/aarch64.c (aarch64_sched_adjust_priority)
	New function.
	(TARGET_SCHED_ADJUST_PRIORITY): Define target hook.
--

Comments

Andrew Pinski Nov. 2, 2016, 5:13 p.m. UTC | #1

On Tue, Oct 25, 2016 at 10:08 AM, Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote:
> SHA1H instructions may be scheduled after a SHA1C instruction
> that uses the same input register.  However SHA1C updates its input,
> so if SHA1H is scheduled after it, it requires an extra move.
> Increase the priority of SHA1H to ensure it gets scheduled
> earlier, avoiding the move.
>
> Is this something the generic scheduler could do automatically for
> instructions with RMW operands?

I was thinking that but there are many of those on x86 so it might not
make sense at all.
Also the SIMD instruction FMLA has the similar problem most likely
though I don't know if it make sense to do a similar thing for that
one.
Maybe it makes sense to mark the instructions in the .md file with
some attribute and then just check for that attribute here instead of
special casing SHA1C.
Something like (not a full patch and don't claim this is correct):
(define_attr "rmw" "yes,no" (const_string "no"))

...
  "sha1<sha1_op>\\t%q0, %s2, %3.4s"
  [(set_attr "type" "crypto_sha1_slow")]
  [(set_attr "rmw" "yes")]


...

static int
aarch64_sched_adjust_priority (rtx_insn *insn, int priority)
{

  if (get_attr_rmw (insn) == yes)
    return MAX(priority - 10, 0);

  return priority;
}




Thanks,
Andrew

>
> Passes bootstrap & regress. OK for commit?
>
> ChangeLog:
> 2016-10-25  Wilco Dijkstra  <wdijkstr@arm.com>
>
>         * config/aarch64/aarch64.c (aarch64_sched_adjust_priority)
>         New function.
>         (TARGET_SCHED_ADJUST_PRIORITY): Define target hook.
> --
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 9b2f9cb19343828dc39e9950ebbefe941521942a..2b25bd1bdd6f4e7737f8e04c3b3684cdff6c4b80 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -13668,6 +13668,26 @@ aarch64_sched_fusion_priority (rtx_insn *insn, int max_pri,
>    return;
>  }
>
> +/* Implement the TARGET_SCHED_ADJUST_PRIORITY hook.
> +   Adjust priority of sha1h instructions so they are scheduled before
> +   other SHA1 instructions.  */
> +
> +static int
> +aarch64_sched_adjust_priority (rtx_insn *insn, int priority)
> +{
> +  rtx x = PATTERN (insn);
> +
> +  if (GET_CODE (x) == SET)
> +    {
> +      x = SET_SRC (x);
> +
> +      if (GET_CODE (x) == UNSPEC && XINT (x, 1) == UNSPEC_SHA1H)
> +       return priority + 10;
> +    }
> +
> +  return priority;
> +}
> +
>  /* Given OPERANDS of consecutive load/store, check if we can merge
>     them into ldp/stp.  LOAD is true if they are load instructions.
>     MODE is the mode of memory operands.  */
> @@ -14431,6 +14451,9 @@ aarch64_optab_supported_p (int op, machine_mode mode1, machine_mode,
>  #undef TARGET_CAN_USE_DOLOOP_P
>  #define TARGET_CAN_USE_DOLOOP_P can_use_doloop_if_innermost
>
> +#undef TARGET_SCHED_ADJUST_PRIORITY
> +#define TARGET_SCHED_ADJUST_PRIORITY aarch64_sched_adjust_priority
> +
>  #undef TARGET_SCHED_MACRO_FUSION_P
>  #define TARGET_SCHED_MACRO_FUSION_P aarch64_macro_fusion_p
>
>

Wilco Dijkstra Nov. 3, 2016, 12:14 p.m. UTC | #2

Andrew Pinski wrote:
> On Tue, Oct 25, 2016 at 10:08 AM, Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote:
> > SHA1H instructions may be scheduled after a SHA1C instruction
> > that uses the same input register.  However SHA1C updates its input,
> > so if SHA1H is scheduled after it, it requires an extra move.
> > Increase the priority of SHA1H to ensure it gets scheduled
> > earlier, avoiding the move.
> >
> > Is this something the generic scheduler could do automatically for
> > instructions with RMW operands?
>
> I was thinking that but there are many of those on x86 so it might not
> make sense at all.
> Also the SIMD instruction FMLA has the similar problem most likely
> though I don't know if it make sense to do a similar thing for that
> one.

Indeed it doesn't always make sense. I've never seen the issue with
FMLA for example, so maybe a more generic mechanism isn't required.

> Maybe it makes sense to mark the instructions in the .md file with
> some attribute and then just check for that attribute here instead of
> special casing SHA1C.

That should be possible if adding attributes is not expensive in terms of
memory/compile time.

Wilco

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 9b2f9cb19343828dc39e9950ebbefe941521942a..2b25bd1bdd6f4e7737f8e04c3b3684cdff6c4b80 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -13668,6 +13668,26 @@  aarch64_sched_fusion_priority (rtx_insn *insn, int max_pri,
   return;
 }
 
+/* Implement the TARGET_SCHED_ADJUST_PRIORITY hook.
+   Adjust priority of sha1h instructions so they are scheduled before
+   other SHA1 instructions.  */
+
+static int
+aarch64_sched_adjust_priority (rtx_insn *insn, int priority)
+{
+  rtx x = PATTERN (insn);
+
+  if (GET_CODE (x) == SET)
+    {
+      x = SET_SRC (x);
+
+      if (GET_CODE (x) == UNSPEC && XINT (x, 1) == UNSPEC_SHA1H)
+	return priority + 10;
+    }
+
+  return priority;
+}
+
 /* Given OPERANDS of consecutive load/store, check if we can merge
    them into ldp/stp.  LOAD is true if they are load instructions.
    MODE is the mode of memory operands.  */
@@ -14431,6 +14451,9 @@  aarch64_optab_supported_p (int op, machine_mode mode1, machine_mode,
 #undef TARGET_CAN_USE_DOLOOP_P
 #define TARGET_CAN_USE_DOLOOP_P can_use_doloop_if_innermost
 
+#undef TARGET_SCHED_ADJUST_PRIORITY
+#define TARGET_SCHED_ADJUST_PRIORITY aarch64_sched_adjust_priority
+
 #undef TARGET_SCHED_MACRO_FUSION_P
 #define TARGET_SCHED_MACRO_FUSION_P aarch64_macro_fusion_p