From patchwork Thu Aug  5 11:09:39 2010
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Bernd Schmidt <bernds@codesourcery.com>
X-Patchwork-Id: 60951
Return-Path: 
 <gcc-patches-return-269877-incoming=patchwork.ozlabs.org@gcc.gnu.org>
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Received: from sourceware.org (server1.sourceware.org [209.132.180.131])
	by ozlabs.org (Postfix) with SMTP id E29A5B6F0E
	for <incoming@patchwork.ozlabs.org>;
	Thu,  5 Aug 2010 21:10:01 +1000 (EST)
Received: (qmail 15620 invoked by alias); 5 Aug 2010 11:09:59 -0000
Received: (qmail 15612 invoked by uid 22791); 5 Aug 2010 11:09:58 -0000
X-SWARE-Spam-Status: No, hits=-1.8 required=5.0	tests=AWL, BAYES_00,
	T_RP_MATCHES_RCVD
X-Spam-Check-By: sourceware.org
Received: from mail.codesourcery.com (HELO mail.codesourcery.com)
	(38.113.113.100) by sourceware.org (qpsmtpd/0.43rc1) with
	ESMTP; Thu, 05 Aug 2010 11:09:54 +0000
Received: (qmail 25332 invoked from network); 5 Aug 2010 11:09:51 -0000
Received: from unknown (HELO ?84.152.192.116?) (bernds@127.0.0.2) by
	mail.codesourcery.com with ESMTPA; 5 Aug 2010 11:09:51 -0000
Message-ID: <4C5A9BF3.2090109@codesourcery.com>
Date: Thu, 05 Aug 2010 13:09:39 +0200
From: Bernd Schmidt <bernds@codesourcery.com>
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US;
	rv:1.9.2.7) Gecko/20100724 Thunderbird/3.1.1
To: Phil Blundell <pb@reciva.com>
CC: Richard Earnshaw <rearnsha@arm.com>,
	GCC Patches <gcc-patches@gcc.gnu.org>
Subject: Re: Fix ARM ldm/stm peephole2 loop
References: <4C5A0F50.4000904@codesourcery.com>	
	<1280994403.25655.11.camel@e102346-lin.cambridge.arm.com>
	<1281000102.10932.30.camel@lenovo.internal.reciva.com>
In-Reply-To: <1281000102.10932.30.camel@lenovo.internal.reciva.com>
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Unsubscribe: 
 <mailto:gcc-patches-unsubscribe-incoming=patchwork.ozlabs.org@gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Delivered-To: mailing list gcc-patches@gcc.gnu.org

On 08/05/2010 11:21 AM, Phil Blundell wrote:
> It does seem a little bit fragile to require the conditions in the two
> places to match in order to avoid loops, though.  Maybe there should be
> a comment at the appropriate place in arm_gen_xx_multiple to say that it
> needs to stay in sync with the code in multiple_operation_profitable_p,
> or maybe those two functions could be reworked to actually use
> multiple_operation_profitable_p() rather than duplicating its logic.

Like this?
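
To spell the idea out away from arm.c, here is a rough standalone sketch
of the cost model from the comment and of a single predicate that both
generators would consult.  The function names and the bool parameters
are made up for illustration: the real code reads arm_tune_xscale and
optimize_size as globals, and multiple_operation_profitable_p has
further checks that are left out here.  It is not part of the patch.

```c
#include <stdbool.h>

/* Cycles for one XScale ldm of NREGS registers: 2 + NREGS, with the
   pipeline blocked until completion (figures from the arm.c comment).  */
int
ldm_cycles (int nregs)
{
  return 2 + nregs;
}

/* Best case for NREGS separate ldr instructions, at 1 cycle each.  */
int
ldr_cycles_best (int nregs)
{
  return nregs;
}

/* Worst case for NREGS separate ldr instructions, at 3 cycles each.  */
int
ldr_cycles_worst (int nregs)
{
  return 3 * nregs;
}

/* Hypothetical, simplified stand-in for multiple_operation_profitable_p:
   only the XScale test is modelled.  TUNE_XSCALE and OPT_SIZE stand in
   for the arm_tune_xscale and optimize_size globals.  */
bool
profitable_p (bool tune_xscale, bool opt_size, int nops)
{
  if (nops <= 2 && tune_xscale && !opt_size)
    return false;
  return true;
}
```

With XScale tuning and not optimizing for size, profitable_p (true,
false, 2) is false, so the generators fall back to separate ldr/str
instructions; a count of 3 gives true.  Since the peephole and the
generators would ask the same question, they cannot disagree.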


Bernd

	* config/arm/arm.c (multiple_operation_profitable_p): Move XScale
	test here from arm_gen_load_multiple_1.
	(arm_gen_load_multiple_1, arm_gen_store_multiple_1): Use
	multiple_operation_profitable_p.

Index: config/arm/arm.c
===================================================================
--- config/arm/arm.c	(revision 162821)
+++ config/arm/arm.c	(working copy)
@@ -9186,6 +9193,36 @@ multiple_operation_profitable_p (bool is
   if (nops == 2 && arm_ld_sched && add_offset != 0)
     return false;
 
+  /* XScale has load-store double instructions, but they have stricter
+     alignment requirements than load-store multiple, so we cannot
+     use them.
+
+     On XScale, ldm requires 2 + NREGS cycles to complete and blocks
+     the pipeline until completion.
+
+	NREGS		CYCLES
+	  1		  3
+	  2		  4
+	  3		  5
+	  4		  6
+
+     An ldr instruction takes 1-3 cycles, but does not block the
+     pipeline.
+
+	NREGS		CYCLES
+	  1		 1-3
+	  2		 2-6
+	  3		 3-9
+	  4		 4-12
+
+     In the best case ldr always wins.  However, the more ldr instructions
+     we issue, the less likely we are to be able to schedule them well.
+     Using ldr instructions also increases code size.
+
+     As a compromise, we use ldr for counts of 1 or 2 regs, and ldm
+     for counts of 3 or 4 regs.  */
+  if (nops <= 2 && arm_tune_xscale && !optimize_size)
+    return false;
   return true;
 }
 
@@ -9538,35 +9575,7 @@ arm_gen_load_multiple_1 (int count, int 
   int i = 0, j;
   rtx result;
 
-  /* XScale has load-store double instructions, but they have stricter
-     alignment requirements than load-store multiple, so we cannot
-     use them.
-
-     For XScale ldm requires 2 + NREGS cycles to complete and blocks
-     the pipeline until completion.
-
-	NREGS		CYCLES
-	  1		  3
-	  2		  4
-	  3		  5
-	  4		  6
-
-     An ldr instruction takes 1-3 cycles, but does not block the
-     pipeline.
-
-	NREGS		CYCLES
-	  1		 1-3
-	  2		 2-6
-	  3		 3-9
-	  4		 4-12
-
-     Best case ldr will always win.  However, the more ldr instructions
-     we issue, the less likely we are to be able to schedule them well.
-     Using ldr instructions also increases code size.
-
-     As a compromise, we use ldr for counts of 1 or 2 regs, and ldm
-     for counts of 3 or 4 regs.  */
-  if (arm_tune_xscale && count <= 2 && ! optimize_size)
+  if (!multiple_operation_profitable_p (false, count, 0))
     {
       rtx seq;
 
@@ -9618,9 +9627,7 @@ arm_gen_store_multiple_1 (int count, int
   if (GET_CODE (basereg) == PLUS)
     basereg = XEXP (basereg, 0);
 
-  /* See arm_gen_load_multiple_1 for discussion of
-     the pros/cons of ldm/stm usage for XScale.  */
-  if (arm_tune_xscale && count <= 2 && ! optimize_size)
+  if (!multiple_operation_profitable_p (false, count, 0))
     {
       rtx seq;