From patchwork Wed Mar 19 19:33:50 2014
X-Patchwork-Submitter: Bill Schmidt
X-Patchwork-Id: 331858
Message-ID: <1395257630.17148.22.camel@gnopaine>
Subject: [4.8, PATCH 20/26] Backport Power8 and LE support: LRA
From: Bill Schmidt
To: gcc-patches@gcc.gnu.org
Cc: dje.gcc@gmail.com
Date: Wed, 19 Mar 2014 14:33:50 -0500

Hi,

This patch (diff-lra) backports the changes to enable -mlra for the
PowerPC back end.

Thanks,
Bill


2014-03-19  Bill Schmidt

	Backport from mainline
	2014-02-04  Michael Meissner

	* config/rs6000/rs6000.opt (-mlra): Add switch to enable the LRA
	register allocator.
	* config/rs6000/rs6000.c (TARGET_LRA_P): Add support for -mlra to
	enable the LRA register allocator.  Back port the changes from the
	trunk to enable LRA.
	(rs6000_legitimate_offset_address_p): Likewise.
	(legitimate_lo_sum_address_p): Likewise.
	(use_toc_relative_ref): Likewise.
	(rs6000_legitimate_address_p): Likewise.
	(rs6000_emit_move): Likewise.
	(rs6000_secondary_memory_needed_mode): Likewise.
	(rs6000_alloc_sdmode_stack_slot): Likewise.
	(rs6000_lra_p): Likewise.
	* config/rs6000/sync.md (load_lockedti): Copy TI/PTI variables by
	64-bit parts to force the register allocator to allocate even/odd
	register pairs for the quad word atomic instructions.
	(store_conditionalti): Likewise.

Index: gcc-4_8-test/gcc/config/rs6000/rs6000.c
===================================================================
--- gcc-4_8-test.orig/gcc/config/rs6000/rs6000.c
+++ gcc-4_8-test/gcc/config/rs6000/rs6000.c
@@ -1,5 +1,5 @@
 /* Subroutines used for code generation on IBM RS/6000.
-   Copyright (C) 1991-2013 Free Software Foundation, Inc.
+   Copyright (C) 1991-2014 Free Software Foundation, Inc.
    Contributed by Richard Kenner (kenner@vlsi1.ultra.nyu.edu)
 
    This file is part of GCC.
@@ -56,6 +56,7 @@
 #include "intl.h"
 #include "params.h"
 #include "tm-constrs.h"
+#include "ira.h"
 #include "opts.h"
 #include "tree-vectorizer.h"
 #include "dumpfile.h"
@@ -1563,6 +1564,9 @@ static const struct attribute_spec rs600
 #undef TARGET_MODE_DEPENDENT_ADDRESS_P
 #define TARGET_MODE_DEPENDENT_ADDRESS_P rs6000_mode_dependent_address_p
 
+#undef TARGET_LRA_P
+#define TARGET_LRA_P rs6000_lra_p
+
 #undef TARGET_CAN_ELIMINATE
 #define TARGET_CAN_ELIMINATE rs6000_can_eliminate
 
@@ -6242,7 +6246,7 @@ rs6000_legitimate_offset_address_p (enum
     return false;
   if (!reg_offset_addressing_ok_p (mode))
     return virtual_stack_registers_memory_p (x);
-  if (legitimate_constant_pool_address_p (x, mode, strict))
+  if (legitimate_constant_pool_address_p (x, mode, strict || lra_in_progress))
     return true;
   if (GET_CODE (XEXP (x, 1)) != CONST_INT)
     return false;
@@ -6383,9 +6387,21 @@ legitimate_lo_sum_address_p (enum machin
 
   if (TARGET_ELF || TARGET_MACHO)
     {
+      bool large_toc_ok;
+
       if (DEFAULT_ABI == ABI_V4 && flag_pic)
         return false;
-      if (TARGET_TOC)
+      /* LRA don't use LEGITIMIZE_RELOAD_ADDRESS as it usually calls
+         push_reload from reload pass code.  LEGITIMIZE_RELOAD_ADDRESS
+         recognizes some LO_SUM addresses as valid although this
+         function says opposite.  In most cases, LRA through different
+         transformations can generate correct code for address reloads.
+         It can not manage only some LO_SUM cases.  So we need to add
+         code analogous to one in rs6000_legitimize_reload_address for
+         LOW_SUM here saying that some addresses are still valid.  */
+      large_toc_ok = (lra_in_progress && TARGET_CMODEL != CMODEL_SMALL
+                      && small_toc_ref (x, VOIDmode));
+      if (TARGET_TOC && ! large_toc_ok)
         return false;
       if (GET_MODE_NUNITS (mode) != 1)
         return false;
@@ -6395,7 +6411,7 @@ legitimate_lo_sum_address_p (enum machin
               && (mode == DFmode || mode == DDmode)))
         return false;
 
-      return CONSTANT_P (x);
+      return CONSTANT_P (x) || large_toc_ok;
     }
 
   return false;
@@ -7106,7 +7122,6 @@ use_toc_relative_ref (rtx sym)
            && ASM_OUTPUT_SPECIAL_POOL_ENTRY_P (get_pool_constant (sym),
                                                get_pool_mode (sym)))
           || (TARGET_CMODEL == CMODEL_MEDIUM
-              && !CONSTANT_POOL_ADDRESS_P (sym)
               && SYMBOL_REF_LOCAL_P (sym)));
 }
 
@@ -7394,7 +7409,8 @@ rs6000_legitimate_address_p (enum machin
   if (reg_offset_p && legitimate_small_data_p (mode, x))
     return 1;
   if (reg_offset_p
-      && legitimate_constant_pool_address_p (x, mode, reg_ok_strict))
+      && legitimate_constant_pool_address_p (x, mode,
+                                             reg_ok_strict || lra_in_progress))
     return 1;
   /* For TImode, if we have load/store quad and TImode in VSX registers, only
      allow register indirect addresses.  This will allow the values to go in
@@ -7680,6 +7696,7 @@ rs6000_conditional_register_usage (void)
         fixed_regs[i] = call_used_regs[i] = call_really_used_regs[i] = 1;
     }
 }
+
 
 /* Try to output insns to set TARGET equal to the constant C if it can be
    done in less than N insns.  Do all computations in MODE.
@@ -8112,6 +8129,68 @@ rs6000_emit_move (rtx dest, rtx source,
     cfun->machine->sdmode_stack_slot =
       eliminate_regs (cfun->machine->sdmode_stack_slot, VOIDmode, NULL_RTX);
 
+
+  if (lra_in_progress
+      && mode == SDmode
+      && REG_P (operands[0]) && REGNO (operands[0]) >= FIRST_PSEUDO_REGISTER
+      && reg_preferred_class (REGNO (operands[0])) == NO_REGS
+      && (REG_P (operands[1])
+          || (GET_CODE (operands[1]) == SUBREG
+              && REG_P (SUBREG_REG (operands[1])))))
+    {
+      int regno = REGNO (GET_CODE (operands[1]) == SUBREG
+                         ? SUBREG_REG (operands[1]) : operands[1]);
+      enum reg_class cl;
+
+      if (regno >= FIRST_PSEUDO_REGISTER)
+        {
+          cl = reg_preferred_class (regno);
+          gcc_assert (cl != NO_REGS);
+          regno = ira_class_hard_regs[cl][0];
+        }
+      if (FP_REGNO_P (regno))
+        {
+          if (GET_MODE (operands[0]) != DDmode)
+            operands[0] = gen_rtx_SUBREG (DDmode, operands[0], 0);
+          emit_insn (gen_movsd_store (operands[0], operands[1]));
+        }
+      else if (INT_REGNO_P (regno))
+        emit_insn (gen_movsd_hardfloat (operands[0], operands[1]));
+      else
+        gcc_unreachable();
+      return;
+    }
+  if (lra_in_progress
+      && mode == SDmode
+      && (REG_P (operands[0])
+          || (GET_CODE (operands[0]) == SUBREG
+              && REG_P (SUBREG_REG (operands[0]))))
+      && REG_P (operands[1]) && REGNO (operands[1]) >= FIRST_PSEUDO_REGISTER
+      && reg_preferred_class (REGNO (operands[1])) == NO_REGS)
+    {
+      int regno = REGNO (GET_CODE (operands[0]) == SUBREG
+                         ? SUBREG_REG (operands[0]) : operands[0]);
+      enum reg_class cl;
+
+      if (regno >= FIRST_PSEUDO_REGISTER)
+        {
+          cl = reg_preferred_class (regno);
+          gcc_assert (cl != NO_REGS);
+          regno = ira_class_hard_regs[cl][0];
+        }
+      if (FP_REGNO_P (regno))
+        {
+          if (GET_MODE (operands[1]) != DDmode)
+            operands[1] = gen_rtx_SUBREG (DDmode, operands[1], 0);
+          emit_insn (gen_movsd_load (operands[0], operands[1]));
+        }
+      else if (INT_REGNO_P (regno))
+        emit_insn (gen_movsd_hardfloat (operands[0], operands[1]));
+      else
+        gcc_unreachable();
+      return;
+    }
+
   if (reload_in_progress
       && mode == SDmode
       && cfun->machine->sdmode_stack_slot != NULL_RTX
@@ -15501,6 +15580,17 @@ rs6000_secondary_memory_needed_rtx (enum
   return ret;
 }
 
+/* Return the mode to be used for memory when a secondary memory
+   location is needed.  For SDmode values we need to use DDmode, in
+   all other cases we can use the same mode.  */
+enum machine_mode
+rs6000_secondary_memory_needed_mode (enum machine_mode mode)
+{
+  if (mode == SDmode)
+    return DDmode;
+  return mode;
+}
+
 static tree
 rs6000_check_sdmode (tree *tp, int *walk_subtrees, void *data ATTRIBUTE_UNUSED)
 {
@@ -16394,6 +16484,10 @@ rs6000_alloc_sdmode_stack_slot (void)
   gimple_stmt_iterator gsi;
 
   gcc_assert (cfun->machine->sdmode_stack_slot == NULL_RTX);
+  /* We use a different approach for dealing with the secondary
+     memory in LRA.  */
+  if (ira_use_lra_p)
+    return;
 
   if (TARGET_NO_SDMODE_STACK)
     return;
@@ -16615,7 +16709,7 @@ rs6000_secondary_reload_class (enum reg_
   /* Constants, memory, and FP registers can go into FP registers.  */
   if ((regno == -1 || FP_REGNO_P (regno))
       && (rclass == FLOAT_REGS || rclass == NON_SPECIAL_REGS))
-    return (mode != SDmode) ? NO_REGS : GENERAL_REGS;
+    return (mode != SDmode || lra_in_progress) ? NO_REGS : GENERAL_REGS;
 
   /* Memory, and FP/altivec registers can go into fp/altivec registers under
      VSX.  However, for scalar variables, use the traditional floating point
@@ -30418,6 +30512,13 @@ rs6000_libcall_value (enum machine_mode
 }
 
 
+/* Return true if we use LRA instead of reload pass.  */
+static bool
+rs6000_lra_p (void)
+{
+  return rs6000_lra_flag;
+}
+
 /* Given FROM and TO register numbers, say whether this elimination is allowed.
    Frame pointer elimination is automatically handled.
 
Index: gcc-4_8-test/gcc/config/rs6000/rs6000.opt
===================================================================
--- gcc-4_8-test.orig/gcc/config/rs6000/rs6000.opt
+++ gcc-4_8-test/gcc/config/rs6000/rs6000.opt
@@ -1,6 +1,6 @@
 ; Options for the rs6000 port of the compiler
 ;
-; Copyright (C) 2005-2013 Free Software Foundation, Inc.
+; Copyright (C) 2005-2014 Free Software Foundation, Inc.
 ; Contributed by Aldy Hernandez .
 ;
 ; This file is part of GCC.
@@ -454,6 +454,10 @@ mlong-double-
 Target RejectNegative Joined UInteger Var(rs6000_long_double_type_size) Save
 -mlong-double- Specify size of long double (64 or 128 bits)
 
+mlra
+Target Report Var(rs6000_lra_flag) Init(0) Save
+Use LRA instead of reload
+
 msched-costly-dep=
 Target RejectNegative Joined Var(rs6000_sched_costly_dep_str)
 Determine which dependences between insns are considered costly
Index: gcc-4_8-test/gcc/config/rs6000/sync.md
===================================================================
--- gcc-4_8-test.orig/gcc/config/rs6000/sync.md
+++ gcc-4_8-test/gcc/config/rs6000/sync.md
@@ -1,5 +1,5 @@
 ;; Machine description for PowerPC synchronization instructions.
-;; Copyright (C) 2005-2013 Free Software Foundation, Inc.
+;; Copyright (C) 2005-2014 Free Software Foundation, Inc.
 ;; Contributed by Geoffrey Keating.
 
 ;; This file is part of GCC.
@@ -205,9 +205,12 @@
   [(set_attr "type" "load_l")])
 
 ;; Use PTImode to get even/odd register pairs.
+
 ;; Use a temporary register to force getting an even register for the
-;; lqarx/stqcrx. instructions.  Normal optimizations will eliminate this extra
-;; copy on big endian systems.
+;; lqarx/stqcrx. instructions.  Under AT 7.0, we need use an explicit copy,
+;; even in big endian mode, unless we are using the LRA register allocator.  In
+;; GCC 4.9, the register allocator is smart enough to assign a even/odd
+;; register pair.
 
 ;; On little endian systems where non-atomic quad word load/store instructions
 ;; are not used, the address can be register+offset, so make sure the address
@@ -230,12 +233,26 @@
     }
 
   emit_insn (gen_load_lockedpti (pti, op1));
-  if (WORDS_BIG_ENDIAN)
+  if (WORDS_BIG_ENDIAN && rs6000_lra_flag)
     emit_move_insn (op0, gen_lowpart (TImode, pti));
   else
     {
-      emit_move_insn (gen_lowpart (DImode, op0), gen_highpart (DImode, pti));
-      emit_move_insn (gen_highpart (DImode, op0), gen_lowpart (DImode, pti));
+      rtx op0_lo = gen_lowpart (DImode, op0);
+      rtx op0_hi = gen_highpart (DImode, op0);
+      rtx pti_lo = gen_lowpart (DImode, pti);
+      rtx pti_hi = gen_highpart (DImode, pti);
+
+      emit_insn (gen_rtx_CLOBBER (VOIDmode, op0));
+      if (WORDS_BIG_ENDIAN)
+        {
+          emit_move_insn (op0_hi, pti_hi);
+          emit_move_insn (op0_lo, pti_lo);
+        }
+      else
+        {
+          emit_move_insn (op0_hi, pti_lo);
+          emit_move_insn (op0_lo, pti_hi);
+        }
     }
   DONE;
 })
@@ -260,8 +277,9 @@
   [(set_attr "type" "store_c")])
 
 ;; Use a temporary register to force getting an even register for the
-;; lqarx/stqcrx. instructions.  Normal optimizations will eliminate this extra
-;; copy on big endian systems.
+;; lqarx/stqcrx. instructions.  Under AT 7.0, we need use an explicit copy,
+;; even in big endian mode.  In GCC 4.9, the register allocator is smart enough
+;; to assign a even/odd register pair.
 
 ;; On little endian systems where non-atomic quad word load/store instructions
 ;; are not used, the address can be register+offset, so make sure the address
@@ -290,12 +308,26 @@
   pti_mem = change_address (op1, PTImode, addr);
   pti_reg = gen_reg_rtx (PTImode);
 
-  if (WORDS_BIG_ENDIAN)
+  if (WORDS_BIG_ENDIAN && rs6000_lra_flag)
     emit_move_insn (pti_reg, gen_lowpart (PTImode, op2));
   else
     {
-      emit_move_insn (gen_lowpart (DImode, pti_reg), gen_highpart (DImode, op2));
-      emit_move_insn (gen_highpart (DImode, pti_reg), gen_lowpart (DImode, op2));
+      rtx op2_lo = gen_lowpart (DImode, op2);
+      rtx op2_hi = gen_highpart (DImode, op2);
+      rtx pti_lo = gen_lowpart (DImode, pti_reg);
+      rtx pti_hi = gen_highpart (DImode, pti_reg);
+
+      emit_insn (gen_rtx_CLOBBER (VOIDmode, op0));
+      if (WORDS_BIG_ENDIAN)
+        {
+          emit_move_insn (pti_hi, op2_hi);
+          emit_move_insn (pti_lo, op2_lo);
+        }
+      else
+        {
+          emit_move_insn (pti_hi, op2_lo);
+          emit_move_insn (pti_lo, op2_hi);
+        }
     }
 
   emit_insn (gen_store_conditionalpti (op0, pti_mem, pti_reg));
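
Note on the sync.md hunks: the half-by-half copy may be easier to follow as plain C. The sketch below is illustrative only and is not GCC code. struct quad, copy_by_halves and the words_big_endian parameter are made-up names; the function just mirrors the order of the four emit_move_insn calls in the load_lockedti expander (hi to hi and lo to lo on big endian, crossed on little endian).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a 128-bit TI/PTI value viewed as two
   64-bit (DImode-sized) halves.  Not a GCC type.  */
struct quad
{
  uint64_t hi;
  uint64_t lo;
};

/* Copy SRC one 64-bit half at a time, in the same order as the
   expander's else-branch: straight across when words_big_endian is
   true, swapped otherwise.  words_big_endian stands in for
   WORDS_BIG_ENDIAN and is an assumption of this sketch.  */
static struct quad
copy_by_halves (struct quad src, bool words_big_endian)
{
  struct quad dst;

  if (words_big_endian)
    {
      dst.hi = src.hi;
      dst.lo = src.lo;
    }
  else
    {
      dst.hi = src.lo;
      dst.lo = src.hi;
    }
  return dst;
}
```

In the patch itself the point of splitting the copy is to keep the register allocator from tying the TImode value to the PTImode temporary, so that the temporary lands in an even/odd register pair; the sketch only models the data movement, not the allocation.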