From patchwork Thu Aug 5 11:09:39 2010
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Bernd Schmidt
X-Patchwork-Id: 60951
Return-Path:
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Received: from sourceware.org (server1.sourceware.org [209.132.180.131])
    by ozlabs.org (Postfix) with SMTP id E29A5B6F0E
    for ; Thu, 5 Aug 2010 21:10:01 +1000 (EST)
Received: (qmail 15620 invoked by alias); 5 Aug 2010 11:09:59 -0000
Received: (qmail 15612 invoked by uid 22791); 5 Aug 2010 11:09:58 -0000
X-SWARE-Spam-Status: No, hits=-1.8 required=5.0 tests=AWL, BAYES_00, T_RP_MATCHES_RCVD
X-Spam-Check-By: sourceware.org
Received: from mail.codesourcery.com (HELO mail.codesourcery.com) (38.113.113.100)
    by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Thu, 05 Aug 2010 11:09:54 +0000
Received: (qmail 25332 invoked from network); 5 Aug 2010 11:09:51 -0000
Received: from unknown (HELO ?84.152.192.116?)
    (bernds@127.0.0.2) by mail.codesourcery.com with ESMTPA; 5 Aug 2010 11:09:51 -0000
Message-ID: <4C5A9BF3.2090109@codesourcery.com>
Date: Thu, 05 Aug 2010 13:09:39 +0200
From: Bernd Schmidt
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.7) Gecko/20100724 Thunderbird/3.1.1
MIME-Version: 1.0
To: Phil Blundell
CC: Richard Earnshaw, GCC Patches
Subject: Re: Fix ARM ldm/stm peephole2 loop
References: <4C5A0F50.4000904@codesourcery.com>
    <1280994403.25655.11.camel@e102346-lin.cambridge.arm.com>
    <1281000102.10932.30.camel@lenovo.internal.reciva.com>
In-Reply-To: <1281000102.10932.30.camel@lenovo.internal.reciva.com>
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id:
List-Unsubscribe:
List-Archive:
List-Post:
List-Help:
Sender: gcc-patches-owner@gcc.gnu.org
Delivered-To: mailing list gcc-patches@gcc.gnu.org

On 08/05/2010 11:21 AM, Phil Blundell wrote:
> It does seem a little bit fragile to require the conditions in the two
> places to match in order to avoid loops, though. Maybe there should be
> a comment at the appropriate place in arm_gen_xx_multiple to say that it
> needs to stay in sync with the code in multiple_operation_profitable_p,
> or maybe those two functions could be reworked to actually use
> multiple_operation_profitable_p() rather than duplicating its logic.

Like this?


Bernd

    * config/arm/arm.c (multiple_operation_profitable_p): Move xscale test
    here from arm_gen_load_multiple_1.
    (arm_gen_load_multiple_1, arm_gen_store_multiple_1): Use
    multiple_operation_profitable_p.

Index: config/arm/arm.c
===================================================================
--- config/arm/arm.c (revision 162821)
+++ config/arm/arm.c (working copy)
@@ -9186,6 +9193,36 @@ multiple_operation_profitable_p (bool is
   if (nops == 2 && arm_ld_sched && add_offset != 0)
     return false;
 
+  /* XScale has load-store double instructions, but they have stricter
+     alignment requirements than load-store multiple, so we cannot
+     use them.
+
+     For XScale ldm requires 2 + NREGS cycles to complete and blocks
+     the pipeline until completion.
+
+        NREGS           CYCLES
+          1               3
+          2               4
+          3               5
+          4               6
+
+     An ldr instruction takes 1-3 cycles, but does not block the
+     pipeline.
+
+        NREGS           CYCLES
+          1              1-3
+          2              2-6
+          3              3-9
+          4              4-12
+
+     Best case ldr will always win. However, the more ldr instructions
+     we issue, the less likely we are to be able to schedule them well.
+     Using ldr instructions also increases code size.
+
+     As a compromise, we use ldr for counts of 1 or 2 regs, and ldm
+     for counts of 3 or 4 regs. */
+  if (nops <= 2 && arm_tune_xscale && !optimize_size)
+    return false;
   return true;
 }
 
@@ -9538,35 +9575,7 @@ arm_gen_load_multiple_1 (int count, int
   int i = 0, j;
   rtx result;
 
-  /* XScale has load-store double instructions, but they have stricter
-     alignment requirements than load-store multiple, so we cannot
-     use them.
-
-     For XScale ldm requires 2 + NREGS cycles to complete and blocks
-     the pipeline until completion.
-
-        NREGS           CYCLES
-          1               3
-          2               4
-          3               5
-          4               6
-
-     An ldr instruction takes 1-3 cycles, but does not block the
-     pipeline.
-
-        NREGS           CYCLES
-          1              1-3
-          2              2-6
-          3              3-9
-          4              4-12
-
-     Best case ldr will always win. However, the more ldr instructions
-     we issue, the less likely we are to be able to schedule them well.
-     Using ldr instructions also increases code size.
-
-     As a compromise, we use ldr for counts of 1 or 2 regs, and ldm
-     for counts of 3 or 4 regs. */
-  if (arm_tune_xscale && count <= 2 && ! optimize_size)
+  if (!multiple_operation_profitable_p (false, count, 0))
     {
       rtx seq;
 
@@ -9618,9 +9627,7 @@ arm_gen_store_multiple_1 (int count, int
   if (GET_CODE (basereg) == PLUS)
     basereg = XEXP (basereg, 0);
 
-  /* See arm_gen_load_multiple_1 for discussion of
-     the pros/cons of ldm/stm usage for XScale. */
-  if (arm_tune_xscale && count <= 2 && ! optimize_size)
+  if (!multiple_operation_profitable_p (false, count, 0))
     {
       rtx seq;
 
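
For reference, the cycle arithmetic in the XScale comment and the resulting
cutoff can be sketched as standalone C. This is only an illustration and not
part of the patch. The helper names (ldm_cycles, ldr_best_cycles,
ldr_worst_cycles, xscale_multiple_profitable_p) are hypothetical, and the
tune_xscale and optimize_size parameters are assumed stand-ins for GCC's
arm_tune_xscale and optimize_size globals. The cycle figures are copied from
the comment, not measured.

```c
#include <assert.h>
#include <stdbool.h>

/* Per the comment in the patch: on XScale, an ldm of NREGS registers
   takes 2 + NREGS cycles and blocks the pipeline until it completes.  */
static int
ldm_cycles (int nregs)
{
  assert (nregs >= 1);
  return 2 + nregs;
}

/* Per the same comment: each ldr takes 1-3 cycles and does not block
   the pipeline, so NREGS separate loads cost between NREGS cycles...  */
static int
ldr_best_cycles (int nregs)
{
  assert (nregs >= 1);
  return nregs;
}

/* ...and 3 * NREGS cycles.  */
static int
ldr_worst_cycles (int nregs)
{
  assert (nregs >= 1);
  return 3 * nregs;
}

/* Hypothetical stand-in for the XScale test that the patch moves into
   multiple_operation_profitable_p.  The two flags are plain parameters
   here (an assumption made for illustration) rather than GCC globals.
   Returns false when separate ldr/str instructions are preferred.  */
static bool
xscale_multiple_profitable_p (int nops, bool tune_xscale, bool optimize_size)
{
  if (nops <= 2 && tune_xscale && !optimize_size)
    return false;
  return true;
}
```

With these definitions, xscale_multiple_profitable_p (2, true, false) is
false, so two separate loads would be used, while a count of 3 gives true.
Optimizing for size, or tuning for a core other than XScale, also gives true.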