diff mbox

[AArch64] Increase code alignment

Message ID DB3PR08MB0089D8CEBDA026094271E69483590@DB3PR08MB0089.eurprd08.prod.outlook.com
State New
Headers show

Commit Message

Wilco Dijkstra June 3, 2016, 10:51 a.m. UTC
Increase loop alignment on Cortex cores to 8 and set function alignment to 16.  This makes things consistent across big.LITTLE cores, improves performance of benchmarks with tight loops and reduces performance variations due to small changes in code layout. It looks like almost all AArch64 cores agree on alignment of 16 for functions, and 8 for loops and branches, so we should change -mcpu=generic as well if there is no disagreement - feedback welcome.

OK for commit?

ChangeLog:

2016-06-03  Wilco Dijkstra  <wdijkstr@arm.com>

	* gcc/config/aarch64/aarch64.c (cortexa53_tunings):
	Increase loop alignment to 8.  Set function alignment to 16.
	(cortexa35_tunings): Likewise.
	(cortexa57_tunings): Increase loop alignment to 8.
	(cortexa72_tunings): Likewise.

---

Comments

Evandro Menezes June 3, 2016, 10:22 p.m. UTC | #1
On 06/03/16 05:51, Wilco Dijkstra wrote:
> It looks almost all AArch64 cores agree on alignment of 16 for function, and 8 for loops and branches, so we should change -mcpu=generic as well if there is no disagreement - feedback welcome.

I'll see what sets of values Exynos M1 would be most comfortable with, 
but I also wonder if the -falign-labels shouldn't also be a parameter in 
tune_params.

Thoughts?
Andrew Pinski June 3, 2016, 10:33 p.m. UTC | #2
On Fri, Jun 3, 2016 at 3:51 AM, Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote:
> Increase loop alignment on Cortex cores to 8 and set function alignment to 16.  This makes things consistent across big.LITTLE cores, improves performance of benchmarks with tight loops and reduces performance variations due to small changes in code layout. It looks like almost all AArch64 cores agree on alignment of 16 for functions, and 8 for loops and branches, so we should change -mcpu=generic as well if there is no disagreement - feedback welcome.

This actually might be better for ThunderX than the current set of
values for ThunderX.  I have not tried 16 alignment for functions to see
if it is better, but it should not hurt ThunderX that much as we have a
128 byte cache line anyways.

Thanks,
Andrew


>
> OK for commit?
>
> ChangeLog:
>
> 2016-06-03  Wilco Dijkstra  <wdijkstr@arm.com>
>
>         * gcc/config/aarch64/aarch64.c (cortexa53_tunings):
>         Increase loop alignment to 8.  Set function alignment to 16.
>         (cortexa35_tunings): Likewise.
>         (cortexa57_tunings): Increase loop alignment to 8.
>         (cortexa72_tunings): Likewise.
>
> ---
>
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 12e5017a6d4b0ab15dcf932014980fdbd1a598ee..6ea10a187a1f895a399515b8cd0da0be63be827a 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -424,9 +424,9 @@ static const struct tune_params cortexa35_tunings =
>    1, /* issue_rate  */
>    (AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
>     | AARCH64_FUSE_MOVK_MOVK | AARCH64_FUSE_ADRP_LDR), /* fusible_ops  */
> -  8,   /* function_align.  */
> +  16,  /* function_align.  */
>    8,   /* jump_align.  */
> -  4,   /* loop_align.  */
> +  8,   /* loop_align.  */
>    2,   /* int_reassoc_width.  */
>    4,   /* fp_reassoc_width.  */
>    1,   /* vec_reassoc_width.  */
> @@ -449,9 +449,9 @@ static const struct tune_params cortexa53_tunings =
>    2, /* issue_rate  */
>    (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
>     | AARCH64_FUSE_MOVK_MOVK | AARCH64_FUSE_ADRP_LDR), /* fusible_ops  */
> -  8,   /* function_align.  */
> +  16,  /* function_align.  */
>    8,   /* jump_align.  */
> -  4,   /* loop_align.  */
> +  8,   /* loop_align.  */
>    2,   /* int_reassoc_width.  */
>    4,   /* fp_reassoc_width.  */
>    1,   /* vec_reassoc_width.  */
> @@ -476,7 +476,7 @@ static const struct tune_params cortexa57_tunings =
>     | AARCH64_FUSE_MOVK_MOVK), /* fusible_ops  */
>    16,  /* function_align.  */
>    8,   /* jump_align.  */
> -  4,   /* loop_align.  */
> +  8,   /* loop_align.  */
>    2,   /* int_reassoc_width.  */
>    4,   /* fp_reassoc_width.  */
>    1,   /* vec_reassoc_width.  */
> @@ -502,7 +502,7 @@ static const struct tune_params cortexa72_tunings =
>     | AARCH64_FUSE_MOVK_MOVK), /* fusible_ops  */
>    16,  /* function_align.  */
>    8,   /* jump_align.  */
> -  4,   /* loop_align.  */
> +  8,   /* loop_align.  */
>    2,   /* int_reassoc_width.  */
>    4,   /* fp_reassoc_width.  */
>    1,   /* vec_reassoc_width.  */
>
Evandro Menezes June 4, 2016, 12:13 a.m. UTC | #3
On 06/03/16 17:22, Evandro Menezes wrote:
> On 06/03/16 05:51, Wilco Dijkstra wrote:
>> It looks like almost all AArch64 cores agree on alignment of 16 for 
>> functions, and 8 for loops and branches, so we should change 
>> -mcpu=generic as well if there is no disagreement - feedback welcome.
>
> I'll see what sets of values Exynos M1 would be most comfortable with, 
> but I also wonder if the -falign-labels shouldn't also be a parameter 
> in tune_params.
>
> Thoughts?
>

FWIW, here are the values for the alignment of functions, branches and 
loops that fare better on Exynos M1 when -mcpu=generic, in order of 
preference:

 1. 4-4-4
 2. 16-4-16
 3. 8-4-4

I also controlled the code size and, whenever the branch alignment was 8 
or 16 bytes, it would grow quickly, with no proportional improvement to 
performance on Exynos M1.

HTH
diff mbox

Patch

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 12e5017a6d4b0ab15dcf932014980fdbd1a598ee..6ea10a187a1f895a399515b8cd0da0be63be827a 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -424,9 +424,9 @@  static const struct tune_params cortexa35_tunings =
   1, /* issue_rate  */
   (AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
    | AARCH64_FUSE_MOVK_MOVK | AARCH64_FUSE_ADRP_LDR), /* fusible_ops  */
-  8,	/* function_align.  */
+  16,	/* function_align.  */
   8,	/* jump_align.  */
-  4,	/* loop_align.  */
+  8,	/* loop_align.  */
   2,	/* int_reassoc_width.  */
   4,	/* fp_reassoc_width.  */
   1,	/* vec_reassoc_width.  */
@@ -449,9 +449,9 @@  static const struct tune_params cortexa53_tunings =
   2, /* issue_rate  */
   (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
    | AARCH64_FUSE_MOVK_MOVK | AARCH64_FUSE_ADRP_LDR), /* fusible_ops  */
-  8,	/* function_align.  */
+  16,	/* function_align.  */
   8,	/* jump_align.  */
-  4,	/* loop_align.  */
+  8,	/* loop_align.  */
   2,	/* int_reassoc_width.  */
   4,	/* fp_reassoc_width.  */
   1,	/* vec_reassoc_width.  */
@@ -476,7 +476,7 @@  static const struct tune_params cortexa57_tunings =
    | AARCH64_FUSE_MOVK_MOVK), /* fusible_ops  */
   16,	/* function_align.  */
   8,	/* jump_align.  */
-  4,	/* loop_align.  */
+  8,	/* loop_align.  */
   2,	/* int_reassoc_width.  */
   4,	/* fp_reassoc_width.  */
   1,	/* vec_reassoc_width.  */
@@ -502,7 +502,7 @@  static const struct tune_params cortexa72_tunings =
    | AARCH64_FUSE_MOVK_MOVK), /* fusible_ops  */
   16,	/* function_align.  */
   8,	/* jump_align.  */
-  4,	/* loop_align.  */
+  8,	/* loop_align.  */
   2,	/* int_reassoc_width.  */
   4,	/* fp_reassoc_width.  */
   1,	/* vec_reassoc_width.  */