From patchwork Wed Jun  1 15:49:00 2011
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Julian Brown <julian@codesourcery.com>
X-Patchwork-Id: 98212
Date: Wed, 1 Jun 2011 16:49:00 +0100
From: Julian Brown <julian@codesourcery.com>
To: gcc-patches@gcc.gnu.org, paul@codesourcery.com, rearnsha@arm.com,
	Ramana Radhakrishnan <ramana.radhakrishnan@linaro.org>
Subject: [PATCH, ARM] Cortex-A5 tuning [1/2] - branch costs
Message-ID: <20110601164900.48a018f7@rex.config>

This patch overrides the branch cost for Cortex-A5 cores, building on
the previous patch:

  http://gcc.gnu.org/ml/gcc-patches/2011-06/msg00045.html

(It also depends on:

  http://gcc.gnu.org/ml/gcc-patches/2011-06/msg00044.html

to apply correctly.)

The rationale is as follows: branches are pretty much the only
instructions which can be dual-issued on Cortex-A5. This makes them
relatively cheap: in particular, cheaper than long sequences of
conditionally-executed instructions. Setting the cost to zero was
experimentally determined to work better than one (or several other
values).
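
To make the selection logic concrete, here is a minimal standalone
sketch of the new cost hook. It is a model, not the GCC source: the
names model_optimize, model_default_branch_cost and
model_cortex_a5_branch_cost are made up for illustration, and the
default cost only reproduces the one return statement visible in the
patch context (any other cases in the real arm_default_branch_cost are
left out).

```c
#include <stdbool.h>

/* Hypothetical stand-in for GCC's global optimization level.  */
static int model_optimize = 2;

/* Simplified stand-in for arm_default_branch_cost.  Only the final
   return statement visible in the patch context is modelled; this is
   an assumption, not the full function.  */
static int
model_default_branch_cost (bool speed_p, bool predictable_p)
{
  (void) speed_p;
  (void) predictable_p;
  return (model_optimize > 0) ? 2 : 0;
}

/* Mirrors the shape of arm_cortex_a5_branch_cost: branches are treated
   as free when optimizing for speed, otherwise the default applies.  */
static int
model_cortex_a5_branch_cost (bool speed_p, bool predictable_p)
{
  return speed_p ? 0 : model_default_branch_cost (speed_p, predictable_p);
}
```

In this model, a call with speed_p set always yields 0, while a call
with speed_p clear (optimizing for size) falls through to the default,
so size-optimized code is not affected by the new tuning.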

Together with the follow-up patch to tweak the value of
max_insns_skipped (for the arm_final_prescan_insn function), we obtain
(on a popular embedded benchmark, geometric mean improvement):

  * 2.75% improvement in ARM mode (~0.9% with just this patch).

  * 0.91% improvement in Thumb-2 mode.

Caveat: based on only a single test run, although previous benchmarking
(on a 4.5-based branch IIRC) showed similar improvements.
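
For readers unfamiliar with max_insns_skipped: it bounds how many
instructions arm_final_prescan_insn will conditionally execute in place
of a branch around them. A rough sketch of that decision follows. The
function name and the exact comparison are assumptions for
illustration; the real check in arm.c may draw the boundary
differently.

```c
#include <stdbool.h>

/* Hypothetical model, not GCC code: decide whether a forward branch
   over INSNS_SKIPPED instructions is replaced by conditional execution
   of those instructions.  The exact comparison GCC uses may differ.  */
static bool
model_prefer_conditional_execution (int insns_skipped, int max_insns_skipped)
{
  return insns_skipped <= max_insns_skipped;
}
```

With a (made-up) threshold of 2, a two-instruction body would be
conditionally executed in this model, while a five-instruction body
would keep the branch. Lowering the threshold favours branches, which
is the direction the rationale above argues for on Cortex-A5.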

Testing still in progress. OK to apply?

Thanks,

Julian

ChangeLog

    gcc/
    * config/arm/arm-cores.def (cortex-a5): Use cortex_a5 tuning.
    * config/arm/arm.c (arm_cortex_a5_branch_cost): New.
    (arm_cortex_a5_tune): New.

commit c027c802ea85090f54df7432709f12be33226266
Author: Julian Brown <julian@henry7.codesourcery.com>
Date:   Fri May 27 11:05:49 2011 -0700

    Branch cost for Cortex-A5.

diff --git a/gcc/config/arm/arm-cores.def b/gcc/config/arm/arm-cores.def
index b315df7..4ff2324 100644
--- a/gcc/config/arm/arm-cores.def
+++ b/gcc/config/arm/arm-cores.def
@@ -124,7 +124,7 @@ ARM_CORE("mpcorenovfp",	  mpcorenovfp,	6K,				 FL_LDSCHED, 9e)
 ARM_CORE("mpcore",	  mpcore,	6K,				 FL_LDSCHED | FL_VFPV2, 9e)
 ARM_CORE("arm1156t2-s",	  arm1156t2s,	6T2,				 FL_LDSCHED, v6t2)
 ARM_CORE("arm1156t2f-s",  arm1156t2fs,  6T2,				 FL_LDSCHED | FL_VFPV2, v6t2)
-ARM_CORE("cortex-a5",	  cortexa5,	7A,				 FL_LDSCHED, cortex)
+ARM_CORE("cortex-a5",	  cortexa5,	7A,				 FL_LDSCHED, cortex_a5)
 ARM_CORE("cortex-a8",	  cortexa8,	7A,				 FL_LDSCHED, cortex)
 ARM_CORE("cortex-a9",	  cortexa9,	7A,				 FL_LDSCHED, cortex_a9)
 ARM_CORE("cortex-a15",	  cortexa15,	7A,				 FL_LDSCHED, cortex)
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index c7eb5b0..cd3f104 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -256,6 +256,7 @@ static void arm_conditional_register_usage (void);
 static reg_class_t arm_preferred_rename_class (reg_class_t rclass);
 static unsigned int arm_autovectorize_vector_sizes (void);
 static int arm_default_branch_cost (bool, bool);
+static int arm_cortex_a5_branch_cost (bool, bool);
 
 
 /* Table of machine attributes.  */
@@ -912,6 +913,16 @@ const struct tune_params arm_cortex_tune =
   arm_default_branch_cost
 };
 
+const struct tune_params arm_cortex_a5_tune =
+{
+  arm_9e_rtx_costs,
+  NULL,
+  1,						/* Constant limit.  */
+  ARM_PREFETCH_NOT_BENEFICIAL,
+  false,					/* Prefer constant pool.  */
+  arm_cortex_a5_branch_cost
+};
+
 const struct tune_params arm_cortex_a9_tune =
 {
   arm_9e_rtx_costs,
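@@ -8098,6 +8109,14 @@ arm_default_branch_cost (bool speed_p, bool predictable_p ATTRIBUTE_UNUSED)
     return (optimize > 0) ? 2 : 0;
 }
 
+/* Branches can be dual-issued on Cortex-A5, so treat them as free when
+   optimizing for speed; otherwise fall back to the default cost.  */
+static int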
+arm_cortex_a5_branch_cost (bool speed_p, bool predictable_p)
+{
+  return speed_p ? 0 : arm_default_branch_cost (speed_p, predictable_p);
+}
+
 static int fp_consts_inited = 0;
 
 /* Only zero is valid for VFP.  Other values are also valid for FPA.  */