From patchwork Fri Mar 25 15:09:05 2011 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alan Modra X-Patchwork-Id: 88387 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) by ozlabs.org (Postfix) with SMTP id E2253B6F9F for ; Sat, 26 Mar 2011 02:09:31 +1100 (EST) Received: (qmail 21703 invoked by alias); 25 Mar 2011 15:09:24 -0000 Received: (qmail 21691 invoked by uid 22791); 25 Mar 2011 15:09:22 -0000 X-SWARE-Spam-Status: No, hits=-2.4 required=5.0 tests=AWL, BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, FREEMAIL_FROM, RCVD_IN_DNSWL_LOW, T_TO_NO_BRKTS_FREEMAIL X-Spam-Check-By: sourceware.org Received: from mail-iw0-f175.google.com (HELO mail-iw0-f175.google.com) (209.85.214.175) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Fri, 25 Mar 2011 15:09:14 +0000 Received: by iwn10 with SMTP id 10so1514764iwn.20 for ; Fri, 25 Mar 2011 08:09:14 -0700 (PDT) Received: by 10.43.56.140 with SMTP id wc12mr1377391icb.237.1301065753879; Fri, 25 Mar 2011 08:09:13 -0700 (PDT) Received: from bubble.grove.modra.org ([115.187.252.19]) by mx.google.com with ESMTPS id xi12sm677902icb.18.2011.03.25.08.09.10 (version=TLSv1/SSLv3 cipher=OTHER); Fri, 25 Mar 2011 08:09:13 -0700 (PDT) Received: by bubble.grove.modra.org (Postfix, from userid 1000) id A452116DE62E; Sat, 26 Mar 2011 01:39:05 +1030 (CST) Date: Sat, 26 Mar 2011 01:39:05 +1030 From: Alan Modra To: gcc-patches@gcc.gnu.org Cc: David Edelsohn Subject: PowerPC64 reload failure with misaligned fp load Message-ID: <20110325150905.GM13754@bubble.grove.modra.org> Mail-Followup-To: gcc-patches@gcc.gnu.org, David Edelsohn MIME-Version: 1.0 Content-Disposition: inline User-Agent: Mutt/1.5.20 (2009-06-14) X-IsSubscribed: yes Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Compiler errors like the following have plagued powerpc64 gcc for a long time. error: insn does not satisfy its constraints: (insn 43020 13791 13792 1245 _thread.c:8306 (set (reg:DF 12 12) (mem:DF (plus:DI (reg:DI 3 3 [orig:4092 D.11951 ] [4092]) (const_int 15 [0xf])) [0 S8 A64])) 388 {*movdf_hardfloat64} They tend to appear when there is high register pressure and misaligned doubles. Often the reason the memory is misaligned is due to user error, for example, compiling generated code full of macros highly optimized for some other architecture. So people fix their error and this sort of compiler bug doesn't get much attention. I spent some time in gdb yesterday looking at what goes wrong in reload for this insn. First time in calculate_needs_all_insns we see (insn 11358 11553 11359 1256 _thread.c:8306 (set (reg/v:DF 12 12 [orig:4065 ___F64V1 ] [4065]) (mem:DF (plus:DI (reg:DI 15 15 [orig:4062 D.10743 ] [4062]) (const_int 15 [0xf])) [16 S8 A64])) 388 {*movdf_hardfloat64} (nil)) Reload correctly chooses the r,Y alternative of movdf_hardfloat64 and also correctly notices the address in the mem doesn't match the Y constraint so reloads the address. All good so far. However, some spills are needed and r12 gets chosen as a spill reg. Next time in calculate_needs_all_insns we see (insn 11358 11553 11359 1256 _thread.c:8306 (set (reg/v:DF 4065 [ ___F64V1 ]) (mem:DF (plus:DI (reg:DI 15 15 [orig:4062 D.10743 ] [4062]) (const_int 15 [0xf])) [16 S8 A64])) 388 {*movdf_hardfloat64} (nil)) Notice we are back to a pseudo for the set dest. reg_renumber[4065] is -1 and reg_equiv_mem[4065] is a stack slot. (mem/c:DF (plus:DI (reg/f:DI 1 1) (const_int 112 [0x70])) [19 %sfp+112 S8 A64]) No surprises there. The problem is that the stack slot perfectly matches the Y constraint, and we see reload choosing the Y,r alternative of movdf_hardfloat64 as this happens to give a better "losers" value than the r,Y alternative. This is a bit weird considering the original instruction was a load and now we're matching a store, but reasonable when you realize that what reload is looking at now is really a mem->mem move. But of course now (mem:DF (plus:DI (reg:DI 15 15 [orig:4062 D.10743 ] [4062]) (const_int 15 [0xf])) [16 S8 A64]) needs to be reloaded to a reg, and this is attempted with a general reg. Reload assumes that this reload is OK, but really hasn't made any progress at all! How to fix this? Well, stack slots need to stay matching the Y constraint since we obviously need to load and store fp values in stack slots. I don't think adding '?' to the first constraint is a solution as then the problem would reappear on fp stores to misaligned mem. The only solution I see is a secondary reload to fix up invalid offset addresses. Much of the following patch is based on Michael Meissner's support for vector reloads. The predicates.md change teaches the predicate used by the "Y" constraint to check cmodel medium addresses in case such addresses should ever be generated with invalid offsets. Bootstrapped and regression tested powerpc64-linux. OK mainline? * config/rs6000/predicates.md (word_offset_memref_op): Handle cmodel medium addresses. * config/rs6000/rs6000.c (rs6000_secondary_reload): Handle misaligned 64-bit gpr loads and stores. (rs6000_secondary_reload_ppc64): New function. * config/rs6000/rs6000-protos.h: Declare it. * config/rs6000/rs6000.md (reload_di_store, reload_di_load): New. Index: gcc/config/rs6000/predicates.md =================================================================== --- gcc/config/rs6000/predicates.md (revision 171446) +++ gcc/config/rs6000/predicates.md (working copy) @@ -435,9 +435,12 @@ (define_predicate "word_offset_memref_op op = XEXP (op, 0); else if (GET_CODE (op) == PRE_MODIFY) op = XEXP (op, 1); + else if (GET_CODE (op) == LO_SUM + && GET_CODE (XEXP (op, 0)) == REG + && GET_CODE (XEXP (op, 1)) == CONST) + op = XEXP (XEXP (op, 1), 0); return (GET_CODE (op) != PLUS - || ! REG_P (XEXP (op, 0)) || GET_CODE (XEXP (op, 1)) != CONST_INT || INTVAL (XEXP (op, 1)) % 4 == 0); }) Index: gcc/config/rs6000/rs6000-protos.h =================================================================== --- gcc/config/rs6000/rs6000-protos.h (revision 171446) +++ gcc/config/rs6000/rs6000-protos.h (working copy) @@ -79,6 +79,7 @@ extern bool (*rs6000_cannot_change_mode_ enum machine_mode, enum reg_class); extern void rs6000_secondary_reload_inner (rtx, rtx, rtx, bool); +extern void rs6000_secondary_reload_ppc64 (rtx, rtx, rtx, bool); extern int paired_emit_vector_cond_expr (rtx, rtx, rtx, rtx, rtx, rtx); extern void paired_expand_vector_move (rtx operands[]); Index: gcc/config/rs6000/rs6000.c =================================================================== --- gcc/config/rs6000/rs6000.c (revision 171446) +++ gcc/config/rs6000/rs6000.c (working copy) @@ -14816,7 +14815,10 @@ rs6000_reload_register_type (enum reg_cl needed for the immediate register. For VSX and Altivec, we may need a register to convert sp+offset into - reg+sp. */ + reg+sp. + + For misaligned 64-bit gpr loads and stores we need a register to + convert an offset address to indirect. */ static reg_class_t rs6000_secondary_reload (bool in_p, @@ -14919,6 +14921,34 @@ rs6000_secondary_reload (bool in_p, else default_p = true; } + else if (TARGET_POWERPC64 + && rs6000_reload_register_type (rclass) == GPR_REGISTER_TYPE + && MEM_P (x) + && GET_MODE_SIZE (GET_MODE (x)) >= UNITS_PER_WORD) + { + rtx addr = XEXP (x, 0); + + if (GET_CODE (addr) == PRE_MODIFY) + addr = XEXP (addr, 1); + else if (GET_CODE (addr) == LO_SUM + && GET_CODE (XEXP (addr, 0)) == REG + && GET_CODE (XEXP (addr, 1)) == CONST) + addr = XEXP (XEXP (addr, 1), 0); + + if (GET_CODE (addr) == PLUS + && GET_CODE (XEXP (addr, 1)) == CONST_INT + && (INTVAL (XEXP (addr, 1)) & 3) != 0) + { + if (in_p) + sri->icode = CODE_FOR_reload_di_load; + else + sri->icode = CODE_FOR_reload_di_store; + sri->extra_cost = 2; + ret = NO_REGS; + } + else + default_p = true; + } else default_p = true; @@ -15207,6 +15237,56 @@ rs6000_secondary_reload_inner (rtx reg, return; } +/* Convert reloads involving 64-bit gprs and misaligned offset + addressing to use indirect addressing. */ + +void +rs6000_secondary_reload_ppc64 (rtx reg, rtx mem, rtx scratch, bool store_p) +{ + int regno = true_regnum (reg); + enum reg_class rclass; + rtx addr; + rtx scratch_or_premodify = scratch; + + if (TARGET_DEBUG_ADDR) + { + fprintf (stderr, "\nrs6000_secondary_reload_ppc64, type = %s\n", + store_p ? "store" : "load"); + fprintf (stderr, "reg:\n"); + debug_rtx (reg); + fprintf (stderr, "mem:\n"); + debug_rtx (mem); + fprintf (stderr, "scratch:\n"); + debug_rtx (scratch); + } + + gcc_assert (regno >= 0 && regno < FIRST_PSEUDO_REGISTER); + gcc_assert (GET_CODE (mem) == MEM); + rclass = REGNO_REG_CLASS (regno); + gcc_assert (rclass == GENERAL_REGS || rclass == BASE_REGS); + addr = XEXP (mem, 0); + + if (GET_CODE (addr) == PRE_MODIFY) + { + scratch_or_premodify = XEXP (addr, 0); + gcc_assert (REG_P (scratch_or_premodify)); + addr = XEXP (addr, 1); + } + gcc_assert (GET_CODE (addr) == PLUS || GET_CODE (addr) == LO_SUM); + + rs6000_emit_move (scratch_or_premodify, addr, Pmode); + + mem = change_address (mem, VOIDmode, scratch_or_premodify); + + /* Now create the move. */ + if (store_p) + emit_insn (gen_rtx_SET (VOIDmode, mem, reg)); + else + emit_insn (gen_rtx_SET (VOIDmode, reg, mem)); + + return; +} + /* Target hook to return the cover classes for Integrated Register Allocator. Cover classes is a set of non-intersected register classes covering all hard registers used for register allocation purpose. Any move between two Index: gcc/config/rs6000/rs6000.md =================================================================== --- gcc/config/rs6000/rs6000.md (revision 171446) +++ gcc/config/rs6000/rs6000.md (working copy) @@ -9645,6 +9645,27 @@ (define_insn "*movdf_softfloat32" [(set_attr "type" "two,load,store,*,*,*") (set_attr "length" "8,8,8,8,12,16")]) +;; Reload patterns to support gpr load/store with misaligned mem. +(define_expand "reload_di_store" + [(parallel [(match_operand 0 "memory_operand" "=m") + (match_operand 1 "gpc_reg_operand" "r") + (match_operand:DI 2 "register_operand" "=&b")])] + "TARGET_POWERPC64" +{ + rs6000_secondary_reload_ppc64 (operands[1], operands[0], operands[2], true); + DONE; +}) + +(define_expand "reload_di_load" + [(parallel [(match_operand 0 "gpc_reg_operand" "=r") + (match_operand 1 "memory_operand" "m") + (match_operand:DI 2 "register_operand" "=b")])] + "TARGET_POWERPC64" +{ + rs6000_secondary_reload_ppc64 (operands[0], operands[1], operands[2], false); + DONE; +}) + ; ld/std require word-aligned displacements -> 'Y' constraint. ; List Y->r and r->Y before r->r for reload. (define_insn "*movdf_hardfloat64_mfpgpr"