[AArch64] Model Cortex-A53 load forwarding

Message ID	AM5PR0802MB261042E0DC03D761899AB888831B0@AM5PR0802MB2610.eurprd08.prod.outlook.com
State	New
Headers	show Return-Path: <gcc-patches-return-451801-incoming=patchwork.ozlabs.org@gcc.gnu.org> DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:cc:subject:date:message-id:references:in-reply-to :content-type:content-transfer-encoding:mime-version; q=dns; s= default; b=qiTshUWvKdjUsdBoxQD4ovgt5t5WaBqpqsAopDBFlr3m5Oi82p5ID 2QfOIBUv9JTjGPO3uytLZHr04fn7I7R1zJnMzJQEOibIC2mLcFm6U/rwc9oZM8Qz GUdsELq7elwyp1FDQOAumZmgxdcDhe1GH2XC9350DeNBJwKCRFiAew= Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk Sender: gcc-patches-owner@gcc.gnu.org From: Wilco Dijkstra <Wilco.Dijkstra@arm.com> To: GCC Patches <gcc-patches@gcc.gnu.org>, James Greenhalgh <James.Greenhalgh@arm.com> CC: nd <nd@arm.com> Subject: Re: [PATCH][AArch64] Model Cortex-A53 load forwarding Date: Thu, 20 Apr 2017 15:41:58 +0000 Message-ID: <AM5PR0802MB261042E0DC03D761899AB888831B0@AM5PR0802MB2610.eurprd08.prod.outlook.com> References: <AM5PR0802MB2610F4DB06ED48948BE3BEA9830A0@AM5PR0802MB2610.eurprd08.prod.outlook.com> In-Reply-To: <AM5PR0802MB2610F4DB06ED48948BE3BEA9830A0@AM5PR0802MB2610.eurprd08.prod.outlook.com> nodisclaimer: True spamdiagnosticoutput: 1:99 spamdiagnosticmetadata: NSPM Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0


Commit Message

Wilco Dijkstra April 20, 2017, 3:41 p.m. UTC

ping

From: Wilco Dijkstra
Sent: 05 April 2017 13:29
To: GCC Patches
Cc: nd; James Greenhalgh
Subject: [PATCH][AArch64] Model Cortex-A53 load forwarding

Code scheduling for Cortex-A53 isn't as good as it could be.  It turns out
code runs faster overall if a load is placed closer to the load or store
that uses its result as an address.  To achieve this effect, this patch adds
bypasses with a latency of 3 from cortex_a53_load1 to cortex_a53_load* and
cortex_a53_store*, guarded so they apply only if the Pmode result of an
earlier load is used in an address calculation.  This significantly improved
benchmark scores in a proprietary benchmark suite.
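
As a hedged illustration of the kind of dependency described above (it is
not taken from the benchmark suite), consider a linked-list walk in C.  The
names `node` and `sum_list` are hypothetical.  Each iteration loads
`p->next`, and that pointer-sized result becomes the base address of the
loads in the next iteration, which is the load-to-load-address case.

```c
#include <stddef.h>

/* Hypothetical list node, used only for illustration. */
struct node {
    struct node *next;
    long value;
};

/* Sum the values in a singly linked list. */
long sum_list(const struct node *p)
{
    long total = 0;
    while (p != NULL) {
        total += p->value;  /* load whose address is the pointer p */
        p = p->next;        /* pointer-sized load; its result is the
                               address used by the next iteration */
    }
    return total;
}
```

How a compiler actually schedules this loop depends on the target options
and on the rest of the pipeline model; the sketch only shows where a loaded
pointer feeds a later address.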

Passes AArch64 bootstrap and regress. OK for stage 1?

ChangeLog:
2017-04-05  Wilco Dijkstra  <wdijkstr@arm.com>

        * config/arm/aarch-common.c (arm_early_load_addr_dep_ptr):
        New function.
        (arm_early_store_addr_dep_ptr): Likewise.
        * config/arm/aarch-common-protos.h
        (arm_early_load_addr_dep_ptr): Add prototype.
        (arm_early_store_addr_dep_ptr): Likewise.
        * config/arm/cortex-a53.md: Add new bypasses.
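
The condition checked by the two predicates named in the ChangeLog can be
pictured with a simplified model.  This is only a sketch: `toy_mode`,
`toy_insn` and `toy_addr_dep_ptr` are hypothetical stand-ins and do not
exist in GCC, and the model ignores details such as subregs and multi-word
registers that the real RTL helpers handle.  It assumes `TOY_DI` plays the
role of Pmode.

```c
#include <stdbool.h>

/* Toy stand-ins for GCC's RTL; none of these types exist in GCC. */
enum toy_mode { TOY_SI, TOY_DI };   /* assume TOY_DI plays the role of Pmode */

struct toy_insn {
    bool is_load;             /* true if the insn reads memory into dest_reg */
    int dest_reg;             /* register written, or -1 if none */
    enum toy_mode dest_mode;  /* mode of the written register */
    int addr_regs[2];         /* registers used in the memory address,
                                 -1 if unused */
};

/* Mirrors the shape of the new predicates: the producer must be a load,
   its result must be pointer-sized, and the consumer's address must
   mention that result. */
bool toy_addr_dep_ptr(const struct toy_insn *producer,
                      const struct toy_insn *consumer)
{
    if (!producer->is_load || producer->dest_reg < 0)
        return false;
    if (producer->dest_mode != TOY_DI)
        return false;
    for (int i = 0; i < 2; i++)
        if (consumer->addr_regs[i] == producer->dest_reg)
            return true;
    return false;
}
```

In this model the bypass would apply only when the function returns true;
a load of a narrower value, or a non-load producer, is rejected.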
---

diff --git a/gcc/config/arm/aarch-common-protos.h b/gcc/config/arm/aarch-common-protos.h
index 8e9fb7a895b0a4aaf1585eb3368443899b061c9b..5298172e6b6930a110388a40a7533ff208a87095 100644
--- a/gcc/config/arm/aarch-common-protos.h
+++ b/gcc/config/arm/aarch-common-protos.h
@@ -30,7 +30,9 @@  extern bool aarch_rev16_p (rtx);
 extern bool aarch_rev16_shleft_mask_imm_p (rtx, machine_mode);
 extern bool aarch_rev16_shright_mask_imm_p (rtx, machine_mode);
 extern int arm_early_load_addr_dep (rtx, rtx);
+extern int arm_early_load_addr_dep_ptr (rtx, rtx);
 extern int arm_early_store_addr_dep (rtx, rtx);
+extern int arm_early_store_addr_dep_ptr (rtx, rtx);
 extern int arm_mac_accumulator_is_mul_result (rtx, rtx);
 extern int arm_mac_accumulator_is_result (rtx, rtx);
 extern int arm_no_early_alu_shift_dep (rtx, rtx);
diff --git a/gcc/config/arm/aarch-common.c b/gcc/config/arm/aarch-common.c
index dd37be0291a633f606d95ec8acacc598435828b3..74b80b272550028919c4274387944867ffed43d1 100644
--- a/gcc/config/arm/aarch-common.c
+++ b/gcc/config/arm/aarch-common.c
@@ -241,6 +241,24 @@  arm_early_load_addr_dep (rtx producer, rtx consumer)
   return reg_overlap_mentioned_p (value, addr);
 }
 
+/* Return nonzero if the CONSUMER instruction (a load) uses the Pmode
+   result of PRODUCER (also a load) to calculate its address.  */
+
+int
+arm_early_load_addr_dep_ptr (rtx producer, rtx consumer)
+{
+  rtx value = arm_find_sub_rtx_with_code (PATTERN (producer), SET, false);
+  rtx addr = arm_find_sub_rtx_with_code (PATTERN (consumer), SET, false);
+
+  if (!value || !addr || !MEM_P (SET_SRC (value)))
+    return 0;
+
+  value = SET_DEST (value);
+  addr = SET_SRC (addr);
+
+  return GET_MODE (value) == Pmode && reg_overlap_mentioned_p (value, addr);
+}
+
 /* Return nonzero if the CONSUMER instruction (an ALU op) does not
    have an early register shift value or amount dependency on the
    result of PRODUCER.  */
@@ -336,6 +354,24 @@  arm_early_store_addr_dep (rtx producer, rtx consumer)
   return !arm_no_early_store_addr_dep (producer, consumer);
 }
 
+/* Return nonzero if the CONSUMER instruction (a store) uses the Pmode
+   result of PRODUCER (a load) to calculate its address.  */
+
+int
+arm_early_store_addr_dep_ptr (rtx producer, rtx consumer)
+{
+  rtx value = arm_find_sub_rtx_with_code (PATTERN (producer), SET, false);
+  rtx addr = arm_find_sub_rtx_with_code (PATTERN (consumer), SET, false);
+
+  if (!value || !addr || !MEM_P (SET_SRC (value)))
+    return 0;
+
+  value = SET_DEST (value);
+  addr = SET_DEST (addr);
+
+  return GET_MODE (value) == Pmode && reg_overlap_mentioned_p (value, addr);
+}
+
 /* Return non-zero iff the consumer (a multiply-accumulate or a
    multiple-subtract instruction) has an accumulator dependency on the
    result of the producer and no other dependency on that result.  It
diff --git a/gcc/config/arm/cortex-a53.md b/gcc/config/arm/cortex-a53.md
index b367ad403a4a641da34521c17669027b87092737..f8225f33c7a06485147b30fe2633309ac252d0c7 100644
--- a/gcc/config/arm/cortex-a53.md
+++ b/gcc/config/arm/cortex-a53.md
@@ -246,6 +246,16 @@ 
                  "cortex_a53_store*"
                  "arm_no_early_store_addr_dep")
 
+;; Model a bypass for load to load/store address.
+
+(define_bypass 3 "cortex_a53_load1"
+                "cortex_a53_load*"
+                "arm_early_load_addr_dep_ptr")
+
+(define_bypass 3 "cortex_a53_load1"
+                "cortex_a53_store*"
+                "arm_early_store_addr_dep_ptr")
+
 ;; Model a GP->FP register move as similar to stores.
 
 (define_bypass 0 "cortex_a53_alu*,cortex_a53_shift*"