From patchwork Wed Jun 1 15:49:00 2011
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Julian Brown
X-Patchwork-Id: 98212
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Received: from sourceware.org (server1.sourceware.org [209.132.180.131])
	by ozlabs.org (Postfix) with SMTP id 68906B6F82;
	Thu, 2 Jun 2011 01:49:25 +1000 (EST)
Received: (qmail 19630 invoked by alias); 1 Jun 2011 15:49:23 -0000
Received: (qmail 19617 invoked by uid 22791); 1 Jun 2011 15:49:20 -0000
X-SWARE-Spam-Status: No, hits=-1.7 required=5.0
	tests=AWL, BAYES_00, T_RP_MATCHES_RCVD
X-Spam-Check-By: sourceware.org
Received: from mail.codesourcery.com (HELO mail.codesourcery.com)
	(38.113.113.100) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP;
	Wed, 01 Jun 2011 15:49:06 +0000
Received: (qmail 28799 invoked from network); 1 Jun 2011 15:49:05 -0000
Received: from unknown (HELO rex.config) (julian@127.0.0.2)
	by mail.codesourcery.com with ESMTPA; 1 Jun 2011 15:49:05 -0000
Date: Wed, 1 Jun 2011 16:49:00 +0100
From: Julian Brown
To: gcc-patches@gcc.gnu.org, paul@codesourcery.com, rearnsha@arm.com,
	Ramana Radhakrishnan
Subject: [PATCH, ARM] Cortex-A5 tuning [1/2] - branch costs
Message-ID: <20110601164900.48a018f7@rex.config>
Mime-Version: 1.0
X-IsSubscribed: yes
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
Sender: gcc-patches-owner@gcc.gnu.org
Delivered-To: mailing list gcc-patches@gcc.gnu.org

This patch overrides the branch cost for Cortex-A5 cores, building on
the previous patch:

  http://gcc.gnu.org/ml/gcc-patches/2011-06/msg00045.html

(And also depending on:

  http://gcc.gnu.org/ml/gcc-patches/2011-06/msg00044.html

to apply correctly.)

The rationale is as follows: branches are pretty much the only
instructions which can be dual-issued on Cortex-A5.
This makes them relatively cheap: in particular, cheaper than long
sequences of conditionally-executed instructions. Setting the cost to
zero was experimentally determined to work better than one (or several
other values).

Together with the follow-up patch to tweak the value of
max_insns_skipped (for the arm_final_prescan_insn function), we obtain
(on a popular embedded benchmark, geometric mean improvement):

 * 2.75% improvement in ARM mode (~0.9% with just this patch).
 * 0.91% improvement in Thumb-2 mode.

Caveat: based on only a single test run, although previous benchmarking
(on a 4.5-based branch IIRC) showed similar improvements.

Testing still in progress. OK to apply?

Thanks,

Julian

ChangeLog

    gcc/
    * config/arm/arm-cores.def (cortex-a5): Use cortex_a5 tuning.
    * config/arm/arm.c (arm_cortex_a5_branch_cost): New.
    (arm_cortex_a5_tune): New.

commit c027c802ea85090f54df7432709f12be33226266
Author: Julian Brown
Date:   Fri May 27 11:05:49 2011 -0700

    Branch cost for Cortex-A5.

diff --git a/gcc/config/arm/arm-cores.def b/gcc/config/arm/arm-cores.def
index b315df7..4ff2324 100644
--- a/gcc/config/arm/arm-cores.def
+++ b/gcc/config/arm/arm-cores.def
@@ -124,7 +124,7 @@ ARM_CORE("mpcorenovfp", mpcorenovfp, 6K, FL_LDSCHED, 9e)
 ARM_CORE("mpcore", mpcore, 6K, FL_LDSCHED | FL_VFPV2, 9e)
 ARM_CORE("arm1156t2-s", arm1156t2s, 6T2, FL_LDSCHED, v6t2)
 ARM_CORE("arm1156t2f-s", arm1156t2fs, 6T2, FL_LDSCHED | FL_VFPV2, v6t2)
-ARM_CORE("cortex-a5", cortexa5, 7A, FL_LDSCHED, cortex)
+ARM_CORE("cortex-a5", cortexa5, 7A, FL_LDSCHED, cortex_a5)
 ARM_CORE("cortex-a8", cortexa8, 7A, FL_LDSCHED, cortex)
 ARM_CORE("cortex-a9", cortexa9, 7A, FL_LDSCHED, cortex_a9)
 ARM_CORE("cortex-a15", cortexa15, 7A, FL_LDSCHED, cortex)
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index c7eb5b0..cd3f104 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -256,6 +256,7 @@ static void arm_conditional_register_usage (void);
 static reg_class_t arm_preferred_rename_class (reg_class_t rclass);
 static unsigned int arm_autovectorize_vector_sizes (void);
 static int arm_default_branch_cost (bool, bool);
+static int arm_cortex_a5_branch_cost (bool, bool);
 
 
 /* Table of machine attributes.  */
@@ -912,6 +913,16 @@ const struct tune_params arm_cortex_tune =
   arm_default_branch_cost
 };
 
+const struct tune_params arm_cortex_a5_tune =
+{
+  arm_9e_rtx_costs,
+  NULL,
+  1,						/* Constant limit.  */
+  ARM_PREFETCH_NOT_BENEFICIAL,
+  false,					/* Prefer constant pool.  */
+  arm_cortex_a5_branch_cost
+};
+
 const struct tune_params arm_cortex_a9_tune =
 {
   arm_9e_rtx_costs,
@@ -8098,6 +8109,12 @@ arm_default_branch_cost (bool speed_p, bool predictable_p ATTRIBUTE_UNUSED)
   return (optimize > 0) ? 2 : 0;
 }
 
+static int
+arm_cortex_a5_branch_cost (bool speed_p, bool predictable_p)
+{
+  return speed_p ? 0 : arm_default_branch_cost (speed_p, predictable_p);
+}
+
 static int fp_consts_inited = 0;
 
 /* Only zero is valid for VFP.  Other values are also valid for FPA.  */
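
For readers without a GCC tree to hand, the effect of the new hook can be
modelled outside the compiler. The following is only a rough stand-alone
sketch, not GCC source: tune_params_model, cortex_model, cortex_a5_model
and model_branch_cost are hypothetical names made up for illustration, and
the file-scope `optimize` variable merely stands in for GCC's global
optimization level (assumed here to start at 2). The two cost functions
mirror the bodies in the diff above.

```c
/* Stand-alone model of the per-core branch-cost hook; NOT GCC source.  */
#include <assert.h>
#include <stdbool.h>

/* Stand-in for GCC's global optimization level (assumption: -O2).  */
static int optimize = 2;

/* Hypothetical, cut-down tune_params: only the branch-cost hook.  */
struct tune_params_model
{
  int (*branch_cost) (bool speed_p, bool predictable_p);
};

/* Mirrors arm_default_branch_cost from the patch context.  */
static int
default_branch_cost (bool speed_p, bool predictable_p)
{
  (void) speed_p;
  (void) predictable_p;
  return (optimize > 0) ? 2 : 0;
}

/* Mirrors arm_cortex_a5_branch_cost: branches are free when optimizing
   for speed, otherwise fall back to the default cost.  */
static int
cortex_a5_branch_cost (bool speed_p, bool predictable_p)
{
  return speed_p ? 0 : default_branch_cost (speed_p, predictable_p);
}

/* Illustrative tuning tables for a generic Cortex core and Cortex-A5.  */
const struct tune_params_model cortex_model = { default_branch_cost };
const struct tune_params_model cortex_a5_model = { cortex_a5_branch_cost };

/* Query the hook through the selected tuning table (illustrative).  */
int
model_branch_cost (const struct tune_params_model *tune,
		   bool speed_p, bool predictable_p)
{
  assert (tune != 0 && tune->branch_cost != 0);
  return tune->branch_cost (speed_p, predictable_p);
}
```

With `optimize` at 2, model_branch_cost (&cortex_a5_model, true, false)
yields 0, while the same query against cortex_model, or against
cortex_a5_model with speed_p false, yields 2. So only code optimized for
speed on Cortex-A5 sees the cheaper branches; size-optimized code keeps the
default cost.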