From patchwork Mon Jul 15 21:43:10 2013
X-Patchwork-Submitter: Michael Meissner
X-Patchwork-Id: 259266
Date: Mon, 15 Jul 2013 17:43:10 -0400
From: Michael Meissner
To: David Edelsohn
Cc: Michael Meissner, GCC Patches, Pat Haugen, Peter Bergner
Subject: Re: [PATCH, rs6000] power8 patches, patch #4 (revised), new power8 builtins
Message-ID: <20130715214310.GA24693@ibm-tiger.the-meissners.org>
References: <20130520204053.GA21090@ibm-tiger.the-meissners.org>
 <20130521234717.GA27879@ibm-tiger.the-meissners.org>
 <20130604184853.GA12768@ibm-tiger.the-meissners.org>
 <20130605161332.GB5774@ibm-tiger.the-meissners.org>

On Thu, Jun 06, 2013 at 11:57:01AM -0400, David Edelsohn wrote:
> But I view this as a preliminary step. The logical instructions need
> an iterator and TImode needs to be cleaned up on 32 bit.
>
> Thanks, David

Here is my proposed cleanup of the logical support. It adds DI expanders,
which on 32-bit split the insn immediately, just like the current behavior in
32-bit. It defines 128-bit logical operations for both 32-bit and 64-bit
modes. If VSX is available, it uses the VSX register set, but allows fallback
to GPRs. The same applies when only Altivec is available (which was not
handled in the last patch). TImode prefers GPRs, while the vector types
prefer VSX/Altivec.

I've bootstrapped it and run make check with no regressions. I also ran the
10 SPEC tests (gcc, hmmer, povray, milc, omnetpp, h264ref, cactusADM,
libquantum, perlbench, and gromacs) that use long long in some fashion, and
there were no significant differences in 32-bit mode when built with the same
compiler version (I'm using subversion id 200823 as the base for the moment).

Are these patches ok to install?

2013-07-15  Michael Meissner

        * config/rs6000/vector.md (xor3): Move 128-bit boolean expanders to
        rs6000.md.
        (ior3): Likewise.
        (and3): Likewise.
        (one_cmpl2): Likewise.
        (nor3): Likewise.
        (andc3): Likewise.
        (eqv3): Likewise.
        (nand3): Likewise.
        (orc3): Likewise.

        * config/rs6000/vsx.md (VSX_L2): Delete, no longer used.
        (vsx_and3_32bit): Move 128-bit logical insns to rs6000.md, and
        allow TImode operations in 32-bit.
        (vsx_and3_64bit): Likewise.
        (vsx_ior3_32bit): Likewise.
        (vsx_ior3_64bit): Likewise.
        (vsx_xor3_32bit): Likewise.
        (vsx_xor3_64bit): Likewise.
        (vsx_one_cmpl2_32bit): Likewise.
        (vsx_one_cmpl2_64bit): Likewise.
        (vsx_nor3_32bit): Likewise.
        (vsx_nor3_64bit): Likewise.
        (vsx_andc3_32bit): Likewise.
        (vsx_andc3_64bit): Likewise.
        (vsx_eqv3_32bit): Likewise.
        (vsx_eqv3_64bit): Likewise.
        (vsx_nand3_32bit): Likewise.
        (vsx_nand3_64bit): Likewise.
        (vsx_orc3_32bit): Likewise.
        (vsx_orc3_64bit): Likewise.

        * config/rs6000/altivec.md (altivec_and): Move 128-bit logical
        insns to rs6000.md, and allow TImode operations in 32-bit.
        (altivec_ior3): Likewise.
        (altivec_xor3): Likewise.
        (altivec_one_cmpl2): Likewise.
        (altivec_nor3): Likewise.
        (altivec_andc3): Likewise.

        * config/rs6000/rs6000.md (BOOL_128): New mode iterators and mode
        attributes for moving the 128-bit logical operations into
        rs6000.md.
        (BOOL_REGS_OUTPUT): Likewise.
        (BOOL_REGS_OP1): Likewise.
        (BOOL_REGS_OP2): Likewise.
        (BOOL_REGS_UNARY): Likewise.
        (BOOL_REGS_AND_CR0): Likewise.
        (one_cmpl2): Add support for DI logical operations on 32-bit,
        splitting the operations to 32-bit.
        (anddi3): Likewise.
        (iordi3): Likewise.
        (xordi3): Likewise.
        (and3, 128-bit types): Rewrite 2013-06-06 logical operator changes
        to combine the 32/64-bit code, allow logical operations on TI mode
        in 32-bit, and to use similar match_operator patterns like scalar
        mode uses.  Combine the Altivec and VSX code for logical
        operations, and move it here.
        (ior3, 128-bit types): Likewise.
        (xor3, 128-bit types): Likewise.
        (one_cmpl3, 128-bit types): Likewise.
        (nor3, 128-bit types): Likewise.
        (andc3, 128-bit types): Likewise.
        (eqv3, 128-bit types): Likewise.
        (nand3, 128-bit types): Likewise.
        (orc3, 128-bit types): Likewise.
        (and3_internal): Likewise.
        (bool3_internal): Likewise.
        (boolc3_internal1): Likewise.
        (boolc3_internal2): Likewise.
        (boolcc3_internal1): Likewise.
        (boolcc3_internal2): Likewise.
        (eqv3_internal1): Likewise.
        (eqv3_internal2): Likewise.
        (one_cmpl13_internal): Likewise.

        * config/rs6000/rs6000-protos.h (rs6000_split_logical): New
        declaration.

        * config/rs6000/rs6000.c (rs6000_split_logical_inner): Add support
        to split multi-word logical operations.
        (rs6000_split_logical_di): Likewise.
        (rs6000_split_logical): Likewise.

Index: gcc/config/rs6000/vector.md
===================================================================
--- gcc/config/rs6000/vector.md (.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000) (revision 200823)
+++ gcc/config/rs6000/vector.md (.../gcc/config/rs6000) (working copy)
@@ -710,87 +710,6 @@ (define_expand "cr6_test_for_lt_reverse"
   "")
 
 
-;; Vector logical instructions
-;; Do not support TImode logical instructions on 32-bit at present, because the
-;; compiler will see that we have a TImode and when it wanted DImode, and
-;; convert the DImode to TImode, store it on the stack, and load it in a VSX
-;; register.
-(define_expand "xor3"
-  [(set (match_operand:VEC_L 0 "vlogical_operand" "")
-        (xor:VEC_L (match_operand:VEC_L 1 "vlogical_operand" "")
-                   (match_operand:VEC_L 2 "vlogical_operand" "")))]
-  "VECTOR_MEM_ALTIVEC_OR_VSX_P (mode)
-   && (mode != TImode || TARGET_POWERPC64)"
-  "")
-
-(define_expand "ior3"
-  [(set (match_operand:VEC_L 0 "vlogical_operand" "")
-        (ior:VEC_L (match_operand:VEC_L 1 "vlogical_operand" "")
-                   (match_operand:VEC_L 2 "vlogical_operand" "")))]
-  "VECTOR_MEM_ALTIVEC_OR_VSX_P (mode)
-   && (mode != TImode || TARGET_POWERPC64)"
-  "")
-
-(define_expand "and3"
-  [(parallel [(set (match_operand:VEC_L 0 "vlogical_operand" "")
-                   (and:VEC_L (match_operand:VEC_L 1 "vlogical_operand" "")
-                              (match_operand:VEC_L 2 "vlogical_operand" "")))
-              (clobber (match_scratch:CC 3 ""))])]
-  "VECTOR_MEM_ALTIVEC_OR_VSX_P (mode)
-   && (mode != TImode || TARGET_POWERPC64)"
-  "")
-
-(define_expand "one_cmpl2"
-  [(set (match_operand:VEC_L 0 "vlogical_operand" "")
-        (not:VEC_L (match_operand:VEC_L 1 "vlogical_operand" "")))]
-  "VECTOR_MEM_ALTIVEC_OR_VSX_P (mode)
-   && (mode != TImode || TARGET_POWERPC64)"
-  "")
-
-(define_expand "nor3"
-  [(set (match_operand:VEC_L 0 "vlogical_operand" "")
-        (and:VEC_L (not:VEC_L (match_operand:VEC_L 1 "vlogical_operand" ""))
-                   (not:VEC_L (match_operand:VEC_L 2 "vlogical_operand" ""))))]
-  "VECTOR_MEM_ALTIVEC_OR_VSX_P (mode)
-   && (mode != TImode || TARGET_POWERPC64)"
-  "")
-
-(define_expand "andc3"
-  [(set (match_operand:VEC_L 0 "vlogical_operand" "")
-        (and:VEC_L (not:VEC_L (match_operand:VEC_L 2 "vlogical_operand" ""))
-                   (match_operand:VEC_L 1 "vlogical_operand" "")))]
-  "VECTOR_MEM_ALTIVEC_OR_VSX_P (mode)
-   && (mode != TImode || TARGET_POWERPC64)"
-  "")
-
-;; Power8 vector logical instructions.
-(define_expand "eqv3"
-  [(set (match_operand:VEC_L 0 "register_operand" "")
-        (not:VEC_L
-         (xor:VEC_L (match_operand:VEC_L 1 "register_operand" "")
-                    (match_operand:VEC_L 2 "register_operand" ""))))]
-  "TARGET_P8_VECTOR && VECTOR_MEM_VSX_P (mode)
-   && (mode != TImode || TARGET_POWERPC64)")
-
-;; Rewrite nand into canonical form
-(define_expand "nand3"
-  [(set (match_operand:VEC_L 0 "register_operand" "")
-        (ior:VEC_L
-         (not:VEC_L (match_operand:VEC_L 1 "register_operand" ""))
-         (not:VEC_L (match_operand:VEC_L 2 "register_operand" ""))))]
-  "TARGET_P8_VECTOR && VECTOR_MEM_VSX_P (mode)
-   && (mode != TImode || TARGET_POWERPC64)")
-
-;; The canonical form is to have the negated elment first, so we need to
-;; reverse arguments.
-(define_expand "orc3"
-  [(set (match_operand:VEC_L 0 "register_operand" "")
-        (ior:VEC_L
-         (not:VEC_L (match_operand:VEC_L 1 "register_operand" ""))
-         (match_operand:VEC_L 2 "register_operand" "")))]
-  "TARGET_P8_VECTOR && VECTOR_MEM_VSX_P (mode)
-   && (mode != TImode || TARGET_POWERPC64)")
-
 ;; Vector count leading zeros
 (define_expand "clz2"
   [(set (match_operand:VEC_I 0 "register_operand" "")
Index: gcc/config/rs6000/rs6000-protos.h
===================================================================
--- gcc/config/rs6000/rs6000-protos.h (.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000) (revision 200823)
+++ gcc/config/rs6000/rs6000-protos.h (.../gcc/config/rs6000) (working copy)
@@ -138,6 +138,7 @@ extern rtx rs6000_address_for_fpconvert
 extern rtx rs6000_address_for_altivec (rtx);
 extern rtx rs6000_allocate_stack_temp (enum machine_mode, bool, bool);
 extern int rs6000_loop_align (rtx);
+extern void rs6000_split_logical (rtx [], enum rtx_code, bool, bool, bool, rtx);
 #endif /* RTX_CODE */
 
 #ifdef TREE_CODE
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c (.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000) (revision 200823)
+++ gcc/config/rs6000/rs6000.c (.../gcc/config/rs6000) (working copy)
@@ -29780,6 +29780,280 @@ rs6000_set_up_by_prologue (struct hard_r
   add_to_hard_reg_set (&set->set, Pmode, RS6000_PIC_OFFSET_TABLE_REGNUM);
 }
 
+
+/* Helper function for rs6000_split_logical to emit a logical instruction after
+   splitting the operation to single GPR registers.
+
+   DEST is the destination register.
+   OP1 and OP2 are the input source registers.
+   CODE is the base operation (AND, IOR, XOR, NOT).
+   MODE is the machine mode.
+   If COMPLEMENT_FINAL_P is true, wrap the whole operation with NOT.
+   If COMPLEMENT_OP1_P is true, wrap operand1 with NOT.
+   If COMPLEMENT_OP2_P is true, wrap operand2 with NOT.
+   CLOBBER_REG is either NULL or a scratch register of type CC to allow
+   formation of the AND instructions.  */
+
+static void
+rs6000_split_logical_inner (rtx dest,
+                            rtx op1,
+                            rtx op2,
+                            enum rtx_code code,
+                            enum machine_mode mode,
+                            bool complement_final_p,
+                            bool complement_op1_p,
+                            bool complement_op2_p,
+                            rtx clobber_reg)
+{
+  rtx bool_rtx;
+  rtx set_rtx;
+
+  /* Optimize AND of 0/0xffffffff and IOR/XOR of 0.  */
+  if (op2 && GET_CODE (op2) == CONST_INT
+      && (mode == SImode || (mode == DImode && TARGET_POWERPC64))
+      && !complement_final_p && !complement_op1_p && !complement_op2_p)
+    {
+      HOST_WIDE_INT mask = GET_MODE_MASK (mode);
+      HOST_WIDE_INT value = INTVAL (op2) & mask;
+
+      /* Optimize AND of 0 to just set 0.  Optimize AND of -1 to be a move.  */
+      if (code == AND)
+        {
+          if (value == 0)
+            {
+              emit_insn (gen_rtx_SET (VOIDmode, dest, const0_rtx));
+              return;
+            }
+
+          else if (value == mask)
+            {
+              if (!rtx_equal_p (dest, op1))
+                emit_insn (gen_rtx_SET (VOIDmode, dest, op1));
+              return;
+            }
+        }
+
+      /* Optimize IOR/XOR of 0 to be a simple move.  Split large operations
+         into separate ORI/ORIS or XORI/XORIS instructions.  */
+      else if (code == IOR || code == XOR)
+        {
+          if (value == 0)
+            {
+              if (!rtx_equal_p (dest, op1))
+                emit_insn (gen_rtx_SET (VOIDmode, dest, op1));
+              return;
+            }
+        }
+    }
+
+  if (complement_op1_p)
+    op1 = gen_rtx_NOT (mode, op1);
+
+  if (complement_op2_p)
+    op2 = gen_rtx_NOT (mode, op2);
+
+  bool_rtx = ((code == NOT)
+              ? gen_rtx_NOT (mode, op1)
+              : gen_rtx_fmt_ee (code, mode, op1, op2));
+
+  if (complement_final_p)
+    bool_rtx = gen_rtx_NOT (mode, bool_rtx);
+
+  set_rtx = gen_rtx_SET (VOIDmode, dest, bool_rtx);
+
+  /* Is this AND with an explicit clobber?  */
+  if (clobber_reg)
+    {
+      rtx clobber = gen_rtx_CLOBBER (VOIDmode, clobber_reg);
+      set_rtx = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, set_rtx, clobber));
+    }
+
+  emit_insn (set_rtx);
+  return;
+}
+
+/* Split a DImode AND/IOR/XOR with a constant on a 32-bit system.  These
+   operations are split immediately during RTL generation to allow for more
+   optimizations of the AND/IOR/XOR.
+
+   OPERANDS is an array containing the destination and two input operands.
+   CODE is the base operation (AND, IOR, XOR, NOT).
+   MODE is the machine mode.
+   If COMPLEMENT_FINAL_P is true, wrap the whole operation with NOT.
+   If COMPLEMENT_OP1_P is true, wrap operand1 with NOT.
+   If COMPLEMENT_OP2_P is true, wrap operand2 with NOT.
+   CLOBBER_REG is either NULL or a scratch register of type CC to allow
+   formation of the AND instructions.  */
+
+static void
+rs6000_split_logical_di (rtx operands[3],
+                         enum rtx_code code,
+                         bool complement_final_p,
+                         bool complement_op1_p,
+                         bool complement_op2_p,
+                         rtx clobber_reg)
+{
+  const HOST_WIDE_INT lower_32bits = HOST_WIDE_INT_C(0xffffffff);
+  const HOST_WIDE_INT upper_32bits = ~ lower_32bits;
+  const HOST_WIDE_INT sign_bit = HOST_WIDE_INT_C(0x80000000);
+  enum hi_lo { hi = 0, lo = 1 };
+  rtx op0_hi_lo[2], op1_hi_lo[2], op2_hi_lo[2];
+  size_t i;
+
+  op0_hi_lo[hi] = gen_highpart (SImode, operands[0]);
+  op1_hi_lo[hi] = gen_highpart (SImode, operands[1]);
+  op0_hi_lo[lo] = gen_lowpart (SImode, operands[0]);
+  op1_hi_lo[lo] = gen_lowpart (SImode, operands[1]);
+
+  if (code == NOT)
+    op2_hi_lo[hi] = op2_hi_lo[lo] = NULL_RTX;
+  else
+    {
+      if (GET_CODE (operands[2]) != CONST_INT)
+        {
+          op2_hi_lo[hi] = gen_highpart_mode (SImode, DImode, operands[2]);
+          op2_hi_lo[lo] = gen_lowpart (SImode, operands[2]);
+        }
+      else
+        {
+          HOST_WIDE_INT value = INTVAL (operands[2]);
+          HOST_WIDE_INT value_hi_lo[2];
+
+          gcc_assert (!complement_final_p);
+          gcc_assert (!complement_op1_p);
+          gcc_assert (!complement_op2_p);
+
+          value_hi_lo[hi] = value >> 32;
+          value_hi_lo[lo] = value & lower_32bits;
+
+          for (i = 0; i < 2; i++)
+            {
+              HOST_WIDE_INT sub_value = value_hi_lo[i];
+
+              if (sub_value & sign_bit)
+                sub_value |= upper_32bits;
+
+              op2_hi_lo[i] = GEN_INT (sub_value);
+
+              /* If this is an AND instruction, check to see if we need to load
+                 the value in a register.  */
+              if (code == AND && sub_value != -1 && sub_value != 0
+                  && !and_operand (op2_hi_lo[i], SImode))
+                op2_hi_lo[i] = force_reg (SImode, op2_hi_lo[i]);
+            }
+        }
+    }
+
+  for (i = 0; i < 2; i++)
+    {
+      /* Split large IOR/XOR operations.  */
+      if ((code == IOR || code == XOR)
+          && GET_CODE (op2_hi_lo[i]) == CONST_INT
+          && !complement_final_p
+          && !complement_op1_p
+          && !complement_op2_p
+          && clobber_reg == NULL_RTX
+          && !logical_const_operand (op2_hi_lo[i], SImode))
+        {
+          HOST_WIDE_INT value = INTVAL (op2_hi_lo[i]);
+          HOST_WIDE_INT hi_16bits = value & HOST_WIDE_INT_C(0xffff0000);
+          HOST_WIDE_INT lo_16bits = value & HOST_WIDE_INT_C(0x0000ffff);
+          rtx tmp = gen_reg_rtx (SImode);
+
+          /* Make sure the constant is sign extended.  */
+          if ((hi_16bits & sign_bit) != 0)
+            hi_16bits |= upper_32bits;
+
+          rs6000_split_logical_inner (tmp, op1_hi_lo[i], GEN_INT (hi_16bits),
+                                      code, SImode, false, false, false,
+                                      NULL_RTX);
+
+          rs6000_split_logical_inner (op0_hi_lo[i], tmp, GEN_INT (lo_16bits),
+                                      code, SImode, false, false, false,
+                                      NULL_RTX);
+        }
+      else
+        rs6000_split_logical_inner (op0_hi_lo[i], op1_hi_lo[i], op2_hi_lo[i],
+                                    code, SImode, complement_final_p,
+                                    complement_op1_p, complement_op2_p,
+                                    clobber_reg);
+    }
+
+  return;
+}
+
+/* Split the insns that make up boolean operations operating on multiple GPR
+   registers.  The boolean MD patterns ensure that the inputs either are
+   exactly the same as the output registers, or there is no overlap.
+
+   OPERANDS is an array containing the destination and two input operands.
+   CODE is the base operation (AND, IOR, XOR, NOT).
+   MODE is the machine mode.
+   If COMPLEMENT_FINAL_P is true, wrap the whole operation with NOT.
+   If COMPLEMENT_OP1_P is true, wrap operand1 with NOT.
+   If COMPLEMENT_OP2_P is true, wrap operand2 with NOT.
+   CLOBBER_REG is either NULL or a scratch register of type CC to allow
+   formation of the AND instructions.  */
+
+void
+rs6000_split_logical (rtx operands[3],
+                      enum rtx_code code,
+                      bool complement_final_p,
+                      bool complement_op1_p,
+                      bool complement_op2_p,
+                      rtx clobber_reg)
+{
+  enum machine_mode mode = GET_MODE (operands[0]);
+  enum machine_mode sub_mode;
+  rtx op0, op1, op2;
+  int sub_size, regno0, regno1, nregs, i;
+
+  /* If this is DImode, use the specialized version that can run before
+     register allocation.  */
+  if (mode == DImode && !TARGET_POWERPC64)
+    {
+      rs6000_split_logical_di (operands, code, complement_final_p,
+                               complement_op1_p, complement_op2_p,
+                               clobber_reg);
+      return;
+    }
+
+  op0 = operands[0];
+  op1 = operands[1];
+  op2 = (code == NOT) ? NULL_RTX : operands[2];
+  sub_mode = (TARGET_POWERPC64) ? DImode : SImode;
+  sub_size = GET_MODE_SIZE (sub_mode);
+  regno0 = REGNO (op0);
+  regno1 = REGNO (op1);
+
+  gcc_assert (reload_completed);
+  gcc_assert (IN_RANGE (regno0, FIRST_GPR_REGNO, LAST_GPR_REGNO));
+  gcc_assert (IN_RANGE (regno1, FIRST_GPR_REGNO, LAST_GPR_REGNO));
+
+  nregs = rs6000_hard_regno_nregs[(int)mode][regno0];
+  gcc_assert (nregs > 1);
+
+  if (op2 && REG_P (op2))
+    gcc_assert (IN_RANGE (REGNO (op2), FIRST_GPR_REGNO, LAST_GPR_REGNO));
+
+  for (i = 0; i < nregs; i++)
+    {
+      int offset = i * sub_size;
+      rtx sub_op0 = simplify_subreg (sub_mode, op0, mode, offset);
+      rtx sub_op1 = simplify_subreg (sub_mode, op1, mode, offset);
+      rtx sub_op2 = ((code == NOT)
+                     ? NULL_RTX
+                     : simplify_subreg (sub_mode, op2, mode, offset));
+
+      rs6000_split_logical_inner (sub_op0, sub_op1, sub_op2, code, sub_mode,
+                                  complement_final_p, complement_op1_p,
+                                  complement_op2_p, clobber_reg);
+    }
+
+  return;
+}
+
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include "gt-rs6000.h"
Index: gcc/config/rs6000/vsx.md
===================================================================
--- gcc/config/rs6000/vsx.md (.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000) (revision 200823)
+++ gcc/config/rs6000/vsx.md (.../gcc/config/rs6000) (working copy)
@@ -36,10 +36,6 @@ (define_mode_iterator VSX_F [V4SF V2DF])
 ;; Iterator for logical types supported by VSX
 (define_mode_iterator VSX_L [V16QI V8HI V4SI V2DI V4SF V2DF TI])
 
-;; Like VSX_L, but don't support TImode for doing logical instructions in
-;; 32-bit
-(define_mode_iterator VSX_L2 [V16QI V8HI V4SI V2DI V4SF V2DF])
-
 ;; Iterator for memory move.  Handle TImode specially to allow
 ;; it to use gprs as well as vsx registers.
 (define_mode_iterator VSX_M [V16QI V8HI V4SI V2DI V4SF V2DF])
@@ -1047,370 +1043,6 @@ (define_insn "*vsx_float_fix_2"
    (set_attr "fp_type" "")])
 
 
-;; Logical operations.  Do not support TImode logical instructions on 32-bit at
-;; present, because the compiler will see that we have a TImode and when it
-;; wanted DImode, and convert the DImode to TImode, store it on the stack, and
-;; load it in a VSX register or generate extra logical instructions in GPR
-;; registers.
-
-;; When we are splitting the operations to GPRs, we use three alternatives, two
-;; where the first/second inputs and output are in the same register, and the
-;; third where the output specifies an early clobber so that we don't have to
-;; worry about overlapping registers.
- -(define_insn "*vsx_and3_32bit" - [(set (match_operand:VSX_L2 0 "vlogical_operand" "=wa") - (and:VSX_L2 (match_operand:VSX_L2 1 "vlogical_operand" "%wa") - (match_operand:VSX_L2 2 "vlogical_operand" "wa"))) - (clobber (match_scratch:CC 3 "X"))] - "!TARGET_POWERPC64 && VECTOR_MEM_VSX_P (mode)" - "xxland %x0,%x1,%x2" - [(set_attr "type" "vecsimple") - (set_attr "length" "4")]) - -(define_insn_and_split "*vsx_and3_64bit" - [(set (match_operand:VSX_L 0 "vlogical_operand" "=wa,?r,?r,&?r") - (and:VSX_L - (match_operand:VSX_L 1 "vlogical_operand" "%wa,0,r,r") - (match_operand:VSX_L 2 "vlogical_operand" "wa,r,0,r"))) - (clobber (match_scratch:CC 3 "X,X,X,X"))] - "TARGET_POWERPC64 && VECTOR_MEM_VSX_P (mode)" - "@ - xxland %x0,%x1,%x2 - # - # - #" - "reload_completed && TARGET_POWERPC64 && VECTOR_MEM_VSX_P (mode) - && int_reg_operand (operands[0], mode)" - [(parallel [(set (match_dup 4) (and:DI (match_dup 5) (match_dup 6))) - (clobber (match_dup 3))]) - (parallel [(set (match_dup 7) (and:DI (match_dup 8) (match_dup 9))) - (clobber (match_dup 3))])] -{ - operands[4] = simplify_subreg (DImode, operands[0], mode, 0); - operands[5] = simplify_subreg (DImode, operands[1], mode, 0); - operands[6] = simplify_subreg (DImode, operands[2], mode, 0); - operands[7] = simplify_subreg (DImode, operands[0], mode, 8); - operands[8] = simplify_subreg (DImode, operands[1], mode, 8); - operands[9] = simplify_subreg (DImode, operands[2], mode, 8); -} - [(set_attr "type" "vecsimple,two,two,two") - (set_attr "length" "4,8,8,8")]) - -(define_insn "*vsx_ior3_32bit" - [(set (match_operand:VSX_L2 0 "vlogical_operand" "=wa") - (ior:VSX_L2 (match_operand:VSX_L2 1 "vlogical_operand" "%wa") - (match_operand:VSX_L2 2 "vlogical_operand" "wa")))] - "!TARGET_POWERPC64 && VECTOR_MEM_VSX_P (mode)" - "xxlor %x0,%x1,%x2" - [(set_attr "type" "vecsimple") - (set_attr "length" "4")]) - -(define_insn_and_split "*vsx_ior3_64bit" - [(set (match_operand:VSX_L 0 "vlogical_operand" "=wa,?r,?r,&?r,?r,&?r") - (ior:VSX_L 
- (match_operand:VSX_L 1 "vlogical_operand" "%wa,0,r,r,0,r") - (match_operand:VSX_L 2 "vsx_reg_or_cint_operand" "wa,r,0,r,n,n")))] - "TARGET_POWERPC64 && VECTOR_MEM_VSX_P (mode)" - "@ - xxlor %x0,%x1,%x2 - # - # - # - # - #" - "reload_completed && TARGET_POWERPC64 && VECTOR_MEM_VSX_P (mode) - && int_reg_operand (operands[0], mode)" - [(const_int 0)] -{ - operands[3] = simplify_subreg (DImode, operands[0], mode, 0); - operands[4] = simplify_subreg (DImode, operands[1], mode, 0); - operands[5] = simplify_subreg (DImode, operands[2], mode, 0); - operands[6] = simplify_subreg (DImode, operands[0], mode, 8); - operands[7] = simplify_subreg (DImode, operands[1], mode, 8); - operands[8] = simplify_subreg (DImode, operands[2], mode, 8); - - if (operands[5] == constm1_rtx) - emit_move_insn (operands[3], constm1_rtx); - - else if (operands[5] == const0_rtx) - { - if (!rtx_equal_p (operands[3], operands[4])) - emit_move_insn (operands[3], operands[4]); - } - else - emit_insn (gen_iordi3 (operands[3], operands[4], operands[5])); - - if (operands[8] == constm1_rtx) - emit_move_insn (operands[8], constm1_rtx); - - else if (operands[8] == const0_rtx) - { - if (!rtx_equal_p (operands[6], operands[7])) - emit_move_insn (operands[6], operands[7]); - } - else - emit_insn (gen_iordi3 (operands[6], operands[7], operands[8])); - DONE; -} - [(set_attr "type" "vecsimple,two,two,two,three,three") - (set_attr "length" "4,8,8,8,16,16")]) - -(define_insn "*vsx_xor3_32bit" - [(set (match_operand:VSX_L2 0 "vlogical_operand" "=wa") - (xor:VSX_L2 (match_operand:VSX_L2 1 "vlogical_operand" "%wa") - (match_operand:VSX_L2 2 "vlogical_operand" "wa")))] - "VECTOR_MEM_VSX_P (mode) && !TARGET_POWERPC64" - "xxlxor %x0,%x1,%x2" - [(set_attr "type" "vecsimple") - (set_attr "length" "4")]) - -(define_insn_and_split "*vsx_xor3_64bit" - [(set (match_operand:VSX_L 0 "vlogical_operand" "=wa,?r,?r,&?r,?r,&?r") - (xor:VSX_L - (match_operand:VSX_L 1 "vlogical_operand" "%wa,0,r,r,0,r") - (match_operand:VSX_L 2 
"vsx_reg_or_cint_operand" "wa,r,0,r,n,n")))] - "TARGET_POWERPC64 && VECTOR_MEM_VSX_P (mode)" - "@ - xxlxor %x0,%x1,%x2 - # - # - # - # - #" - "reload_completed && TARGET_POWERPC64 && VECTOR_MEM_VSX_P (mode) - && int_reg_operand (operands[0], mode)" - [(set (match_dup 3) (xor:DI (match_dup 4) (match_dup 5))) - (set (match_dup 6) (xor:DI (match_dup 7) (match_dup 8)))] -{ - operands[3] = simplify_subreg (DImode, operands[0], mode, 0); - operands[4] = simplify_subreg (DImode, operands[1], mode, 0); - operands[5] = simplify_subreg (DImode, operands[2], mode, 0); - operands[6] = simplify_subreg (DImode, operands[0], mode, 8); - operands[7] = simplify_subreg (DImode, operands[1], mode, 8); - operands[8] = simplify_subreg (DImode, operands[2], mode, 8); -} - [(set_attr "type" "vecsimple,two,two,two,three,three") - (set_attr "length" "4,8,8,8,16,16")]) - -(define_insn "*vsx_one_cmpl2_32bit" - [(set (match_operand:VSX_L2 0 "vlogical_operand" "=wa") - (not:VSX_L2 (match_operand:VSX_L2 1 "vlogical_operand" "wa")))] - "!TARGET_POWERPC64 && VECTOR_MEM_VSX_P (mode)" - "xxlnor %x0,%x1,%x1" - [(set_attr "type" "vecsimple") - (set_attr "length" "4")]) - -(define_insn_and_split "*vsx_one_cmpl2_64bit" - [(set (match_operand:VSX_L 0 "vlogical_operand" "=wa,?r,&?r") - (not:VSX_L (match_operand:VSX_L 1 "vlogical_operand" "wa,0,r")))] - "TARGET_POWERPC64 && VECTOR_MEM_VSX_P (mode)" - "@ - xxlnor %x0,%x1,%x1 - # - #" - "reload_completed && TARGET_POWERPC64 && VECTOR_MEM_VSX_P (mode) - && int_reg_operand (operands[0], mode)" - [(set (match_dup 2) (not:DI (match_dup 3))) - (set (match_dup 4) (not:DI (match_dup 5)))] -{ - operands[2] = simplify_subreg (DImode, operands[0], mode, 0); - operands[3] = simplify_subreg (DImode, operands[1], mode, 0); - operands[4] = simplify_subreg (DImode, operands[0], mode, 8); - operands[5] = simplify_subreg (DImode, operands[1], mode, 8); -} - [(set_attr "type" "vecsimple,two,two") - (set_attr "length" "4,8,8")]) - -(define_insn "*vsx_nor3_32bit" - [(set 
(match_operand:VSX_L2 0 "vlogical_operand" "=wa") - (and:VSX_L2 - (not:VSX_L2 (match_operand:VSX_L 1 "vlogical_operand" "%wa")) - (not:VSX_L2 (match_operand:VSX_L 2 "vlogical_operand" "wa"))))] - "!TARGET_POWERPC64 && VECTOR_MEM_VSX_P (mode)" - "xxlnor %x0,%x1,%x2" - [(set_attr "type" "vecsimple") - (set_attr "length" "4")]) - -(define_insn_and_split "*vsx_nor3_64bit" - [(set (match_operand:VSX_L 0 "vlogical_operand" "=wa,?r,?r,&?r") - (and:VSX_L - (not:VSX_L (match_operand:VSX_L 1 "vlogical_operand" "%wa,0,r,r")) - (not:VSX_L (match_operand:VSX_L 2 "vlogical_operand" "wa,r,0,r"))))] - "TARGET_POWERPC64 && VECTOR_MEM_VSX_P (mode)" - "@ - xxlnor %x0,%x1,%x2 - # - # - #" - "reload_completed && TARGET_POWERPC64 && VECTOR_MEM_VSX_P (mode) - && int_reg_operand (operands[0], mode)" - [(set (match_dup 3) (and:DI (not:DI (match_dup 4)) (not:DI (match_dup 5)))) - (set (match_dup 6) (and:DI (not:DI (match_dup 7)) (not:DI (match_dup 8))))] -{ - operands[3] = simplify_subreg (DImode, operands[0], mode, 0); - operands[4] = simplify_subreg (DImode, operands[1], mode, 0); - operands[5] = simplify_subreg (DImode, operands[2], mode, 0); - operands[6] = simplify_subreg (DImode, operands[0], mode, 8); - operands[7] = simplify_subreg (DImode, operands[1], mode, 8); - operands[8] = simplify_subreg (DImode, operands[2], mode, 8); -} - [(set_attr "type" "vecsimple,two,two,two") - (set_attr "length" "4,8,8,8")]) - -(define_insn "*vsx_andc3_32bit" - [(set (match_operand:VSX_L2 0 "vlogical_operand" "=wa") - (and:VSX_L2 - (not:VSX_L2 - (match_operand:VSX_L2 2 "vlogical_operand" "wa")) - (match_operand:VSX_L2 1 "vlogical_operand" "wa")))] - "!TARGET_POWERPC64 && VECTOR_MEM_VSX_P (mode)" - "xxlandc %x0,%x1,%x2" - [(set_attr "type" "vecsimple") - (set_attr "length" "4")]) - -(define_insn_and_split "*vsx_andc3_64bit" - [(set (match_operand:VSX_L 0 "vlogical_operand" "=wa,?r,?r,?r") - (and:VSX_L - (not:VSX_L - (match_operand:VSX_L 2 "vlogical_operand" "wa,0,r,r")) - (match_operand:VSX_L 1 
"vlogical_operand" "wa,r,0,r")))] - "TARGET_POWERPC64 && VECTOR_MEM_VSX_P (mode)" - "@ - xxlandc %x0,%x1,%x2 - # - # - #" - "reload_completed && TARGET_POWERPC64 && VECTOR_MEM_VSX_P (mode) - && int_reg_operand (operands[0], mode)" - [(set (match_dup 3) (and:DI (not:DI (match_dup 4)) (match_dup 5))) - (set (match_dup 6) (and:DI (not:DI (match_dup 7)) (match_dup 8)))] -{ - operands[3] = simplify_subreg (DImode, operands[0], mode, 0); - operands[4] = simplify_subreg (DImode, operands[1], mode, 0); - operands[5] = simplify_subreg (DImode, operands[2], mode, 0); - operands[6] = simplify_subreg (DImode, operands[0], mode, 8); - operands[7] = simplify_subreg (DImode, operands[1], mode, 8); - operands[8] = simplify_subreg (DImode, operands[2], mode, 8); -} - [(set_attr "type" "vecsimple,two,two,two") - (set_attr "length" "4,8,8,8")]) - -;; Power8 vector logical instructions. -(define_insn "*vsx_eqv3_32bit" - [(set (match_operand:VSX_L2 0 "vlogical_operand" "=wa") - (not:VSX_L2 - (xor:VSX_L2 (match_operand:VSX_L2 1 "vlogical_operand" "wa") - (match_operand:VSX_L2 2 "vlogical_operand" "wa"))))] - "!TARGET_POWERPC64 && TARGET_P8_VECTOR && VECTOR_MEM_VSX_P (mode)" - "xxleqv %x0,%x1,%x2" - [(set_attr "type" "vecsimple") - (set_attr "length" "4")]) - -(define_insn_and_split "*vsx_eqv3_64bit" - [(set (match_operand:VSX_L 0 "vlogical_operand" "=wa,?r,?r,?r") - (not:VSX_L - (xor:VSX_L (match_operand:VSX_L 1 "vlogical_operand" "wa,0,r,r") - (match_operand:VSX_L 2 "vlogical_operand" "wa,r,0,r"))))] - "TARGET_POWERPC64 && TARGET_P8_VECTOR && VECTOR_MEM_VSX_P (mode)" - "@ - xxleqv %x0,%x1,%x2 - # - # - #" - "reload_completed && TARGET_POWERPC64 && TARGET_P8_VECTOR - && VECTOR_MEM_VSX_P (mode) - && int_reg_operand (operands[0], mode)" - [(set (match_dup 3) (not:DI (xor:DI (match_dup 4) (match_dup 5)))) - (set (match_dup 6) (not:DI (xor:DI (match_dup 7) (match_dup 8))))] -{ - operands[3] = simplify_subreg (DImode, operands[0], mode, 0); - operands[4] = simplify_subreg (DImode, 
operands[1], mode, 0); - operands[5] = simplify_subreg (DImode, operands[2], mode, 0); - operands[6] = simplify_subreg (DImode, operands[0], mode, 8); - operands[7] = simplify_subreg (DImode, operands[1], mode, 8); - operands[8] = simplify_subreg (DImode, operands[2], mode, 8); -} - [(set_attr "type" "vecsimple,two,two,two") - (set_attr "length" "4,8,8,8")]) - -;; Rewrite nand into canonical form -(define_insn "*vsx_nand3_32bit" - [(set (match_operand:VSX_L2 0 "vlogical_operand" "=wa") - (ior:VSX_L2 - (not:VSX_L2 (match_operand:VSX_L2 1 "vlogical_operand" "wa")) - (not:VSX_L2 (match_operand:VSX_L2 2 "vlogical_operand" "wa"))))] - "!TARGET_POWERPC64 && TARGET_P8_VECTOR && VECTOR_MEM_VSX_P (mode)" - "xxlnand %x0,%x1,%x2" - [(set_attr "type" "vecsimple") - (set_attr "length" "4")]) - -(define_insn_and_split "*vsx_nand3_64bit" - [(set (match_operand:VSX_L 0 "register_operand" "=wa,?r,?r,?r") - (ior:VSX_L - (not:VSX_L (match_operand:VSX_L 1 "register_operand" "wa,0,r,r")) - (not:VSX_L (match_operand:VSX_L 2 "register_operand" "wa,r,0,r"))))] - "TARGET_POWERPC64 && TARGET_P8_VECTOR && VECTOR_MEM_VSX_P (mode)" - "@ - xxlnand %x0,%x1,%x2 - # - # - #" - "reload_completed && TARGET_POWERPC64 && TARGET_P8_VECTOR - && VECTOR_MEM_VSX_P (mode) - && int_reg_operand (operands[0], mode)" - [(set (match_dup 3) (ior:DI (not:DI (match_dup 4)) (not:DI (match_dup 5)))) - (set (match_dup 6) (ior:DI (not:DI (match_dup 7)) (not:DI (match_dup 8))))] -{ - operands[3] = simplify_subreg (DImode, operands[0], mode, 0); - operands[4] = simplify_subreg (DImode, operands[1], mode, 0); - operands[5] = simplify_subreg (DImode, operands[2], mode, 0); - operands[6] = simplify_subreg (DImode, operands[0], mode, 8); - operands[7] = simplify_subreg (DImode, operands[1], mode, 8); - operands[8] = simplify_subreg (DImode, operands[2], mode, 8); -} - [(set_attr "type" "vecsimple,two,two,two") - (set_attr "length" "4,8,8,8")]) - -;; Rewrite or complement into canonical form, by reversing the arguments 
-(define_insn "*vsx_orc3_32bit" - [(set (match_operand:VSX_L2 0 "vlogical_operand" "=wa") - (ior:VSX_L2 - (not:VSX_L2 (match_operand:VSX_L2 1 "vlogical_operand" "wa")) - (match_operand:VSX_L2 2 "vlogical_operand" "wa")))] - "!TARGET_POWERPC64 && TARGET_P8_VECTOR && VECTOR_MEM_VSX_P (mode)" - "xxlorc %x0,%x2,%x1" - [(set_attr "type" "vecsimple") - (set_attr "length" "4")]) - -(define_insn_and_split "*vsx_orc3_64bit" - [(set (match_operand:VSX_L 0 "vlogical_operand" "=wa,?r,?r,?r") - (ior:VSX_L - (not:VSX_L (match_operand:VSX_L 1 "vlogical_operand" "wa,0,r,r")) - (match_operand:VSX_L 2 "vlogical_operand" "wa,r,0,r")))] - "TARGET_POWERPC64 && TARGET_P8_VECTOR && VECTOR_MEM_VSX_P (mode)" - "@ - xxlorc %x0,%x2,%x1 - # - # - #" - "reload_completed && TARGET_POWERPC64 && TARGET_P8_VECTOR - && VECTOR_MEM_VSX_P (mode) - && int_reg_operand (operands[0], mode)" - [(set (match_dup 3) (ior:DI (not:DI (match_dup 4)) (match_dup 5))) - (set (match_dup 6) (ior:DI (not:DI (match_dup 7)) (match_dup 8)))] -{ - operands[3] = simplify_subreg (DImode, operands[0], mode, 0); - operands[4] = simplify_subreg (DImode, operands[1], mode, 0); - operands[5] = simplify_subreg (DImode, operands[2], mode, 0); - operands[6] = simplify_subreg (DImode, operands[0], mode, 8); - operands[7] = simplify_subreg (DImode, operands[1], mode, 8); - operands[8] = simplify_subreg (DImode, operands[2], mode, 8); -} - [(set_attr "type" "vecsimple,two,two,two") - (set_attr "length" "4,8,8,8")]) - - ;; Permute operations ;; Build a V2DF/V2DI vector from two scalars Index: gcc/config/rs6000/altivec.md =================================================================== --- gcc/config/rs6000/altivec.md (.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000) (revision 200823) +++ gcc/config/rs6000/altivec.md (.../gcc/config/rs6000) (working copy) @@ -1040,59 +1040,7 @@ (define_insn "vec_widen_smult_odd_v8hi" [(set_attr "type" "veccomplex")]) -;; logical ops. 
Have the logical ops follow the memory ops in -;; terms of whether to prefer VSX or Altivec - -;; AND has a clobber to be consistant with VSX, which adds splitters for using -;; the GPR registers. -(define_insn "*altivec_and3" - [(set (match_operand:VM 0 "register_operand" "=v") - (and:VM (match_operand:VM 1 "register_operand" "v") - (match_operand:VM 2 "register_operand" "v"))) - (clobber (match_scratch:CC 3 "=X"))] - "VECTOR_MEM_ALTIVEC_P (mode)" - "vand %0,%1,%2" - [(set_attr "type" "vecsimple")]) - -(define_insn "*altivec_ior3" - [(set (match_operand:VM 0 "register_operand" "=v") - (ior:VM (match_operand:VM 1 "register_operand" "v") - (match_operand:VM 2 "register_operand" "v")))] - "VECTOR_MEM_ALTIVEC_P (mode)" - "vor %0,%1,%2" - [(set_attr "type" "vecsimple")]) - -(define_insn "*altivec_xor3" - [(set (match_operand:VM 0 "register_operand" "=v") - (xor:VM (match_operand:VM 1 "register_operand" "v") - (match_operand:VM 2 "register_operand" "v")))] - "VECTOR_MEM_ALTIVEC_P (mode)" - "vxor %0,%1,%2" - [(set_attr "type" "vecsimple")]) - -(define_insn "*altivec_one_cmpl2" - [(set (match_operand:VM 0 "register_operand" "=v") - (not:VM (match_operand:VM 1 "register_operand" "v")))] - "VECTOR_MEM_ALTIVEC_P (mode)" - "vnor %0,%1,%1" - [(set_attr "type" "vecsimple")]) - -(define_insn "*altivec_nor3" - [(set (match_operand:VM 0 "register_operand" "=v") - (and:VM (not:VM (match_operand:VM 1 "register_operand" "v")) - (not:VM (match_operand:VM 2 "register_operand" "v"))))] - "VECTOR_MEM_ALTIVEC_P (mode)" - "vnor %0,%1,%2" - [(set_attr "type" "vecsimple")]) - -(define_insn "*altivec_andc3" - [(set (match_operand:VM 0 "register_operand" "=v") - (and:VM (not:VM (match_operand:VM 2 "register_operand" "v")) - (match_operand:VM 1 "register_operand" "v")))] - "VECTOR_MEM_ALTIVEC_P (mode)" - "vandc %0,%1,%2" - [(set_attr "type" "vecsimple")]) - +;; Vector pack/unpack (define_insn "altivec_vpkpx" [(set (match_operand:V8HI 0 "register_operand" "=v") (unspec:V8HI [(match_operand:V4SI 
1 "register_operand" "v") Index: gcc/config/rs6000/rs6000.md =================================================================== --- gcc/config/rs6000/rs6000.md (.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000) (revision 200823) +++ gcc/config/rs6000/rs6000.md (.../gcc/config/rs6000) (working copy) @@ -388,6 +388,77 @@ (define_mode_attr E500_CONVERT [(SF "!TA (define_mode_attr TARGET_FLOAT [(SF "TARGET_SINGLE_FLOAT") (DF "TARGET_DOUBLE_FLOAT")]) + +;; Mode iterator for logical operations on 128-bit types +(define_mode_iterator BOOL_128 [TI + PTI + (V16QI "TARGET_ALTIVEC") + (V8HI "TARGET_ALTIVEC") + (V4SI "TARGET_ALTIVEC") + (V4SF "TARGET_ALTIVEC") + (V2DI "TARGET_ALTIVEC") + (V2DF "TARGET_ALTIVEC")]) + +;; For the GPRs we use 3 constraints for register outputs, two that are the +;; same as the output register, and a third where the output register is an +;; early clobber, so we don't have to deal with register overlaps. For the +;; vector types, we prefer to use the vector registers. For TI mode, allow +;; either. 
+ +;; Mode attribute for boolean operation register constraints for output +(define_mode_attr BOOL_REGS_OUTPUT [(TI "&r,r,r,wa,v") + (PTI "&r,r,r") + (V16QI "wa,v,&?r,?r,?r") + (V8HI "wa,v,&?r,?r,?r") + (V4SI "wa,v,&?r,?r,?r") + (V4SF "wa,v,&?r,?r,?r") + (V2DI "wa,v,&?r,?r,?r") + (V2DF "wa,v,&?r,?r,?r")]) + +;; Mode attribute for boolean operation register constraints for operand1 +(define_mode_attr BOOL_REGS_OP1 [(TI "r,0,r,wa,v") + (PTI "r,0,r") + (V16QI "wa,v,r,0,r") + (V8HI "wa,v,r,0,r") + (V4SI "wa,v,r,0,r") + (V4SF "wa,v,r,0,r") + (V2DI "wa,v,r,0,r") + (V2DF "wa,v,r,0,r")]) + +;; Mode attribute for boolean operation register constraints for operand2 +(define_mode_attr BOOL_REGS_OP2 [(TI "r,r,0,wa,v") + (PTI "r,r,0") + (V16QI "wa,v,r,r,0") + (V8HI "wa,v,r,r,0") + (V4SI "wa,v,r,r,0") + (V4SF "wa,v,r,r,0") + (V2DI "wa,v,r,r,0") + (V2DF "wa,v,r,r,0")]) + +;; Mode attribute for boolean operation register constraints for operand1 +;; for one_cmpl. To simplify things, we repeat the constraint where 0 +;; is used for operand1 or operand2 +(define_mode_attr BOOL_REGS_UNARY [(TI "r,0,0,wa,v") + (PTI "r,0,0") + (V16QI "wa,v,r,0,0") + (V8HI "wa,v,r,0,0") + (V4SI "wa,v,r,0,0") + (V4SF "wa,v,r,0,0") + (V2DI "wa,v,r,0,0") + (V2DF "wa,v,r,0,0")]) + +;; Mode attribute for the clobber of CC0 for AND expansion. +;; For the 128-bit types, we never do AND immediate, but we need to +;; get the correct number of X's for the number of operands. +(define_mode_attr BOOL_REGS_AND_CR0 [(TI "X,X,X,X,X") + (PTI "X,X,X") + (V16QI "X,X,X,X,X") + (V8HI "X,X,X,X,X") + (V4SI "X,X,X,X,X") + (V4SF "X,X,X,X,X") + (V2DI "X,X,X,X,X") + (V2DF "X,X,X,X,X")]) + ;; Start with fixed-point load and store insns. Here we put only the more ;; complex forms. Basic data transfer is done later. 
@@ -1837,7 +1908,19 @@ (define_split FAIL; }) -(define_insn "one_cmpl2" +(define_expand "one_cmpl2" + [(set (match_operand:SDI 0 "gpc_reg_operand" "") + (not:SDI (match_operand:SDI 1 "gpc_reg_operand" "")))] + "" +{ + if (mode == DImode && !TARGET_POWERPC64) + { + rs6000_split_logical (operands, NOT, false, false, false, NULL_RTX); + DONE; + } +}) + +(define_insn "*one_cmpl2" [(set (match_operand:GPR 0 "gpc_reg_operand" "=r") (not:GPR (match_operand:GPR 1 "gpc_reg_operand" "r")))] "" @@ -7959,10 +8042,19 @@ (define_expand "anddi3" [(parallel [(set (match_operand:DI 0 "gpc_reg_operand" "") (and:DI (match_operand:DI 1 "gpc_reg_operand" "") - (match_operand:DI 2 "and64_2_operand" ""))) + (match_operand:DI 2 "reg_or_cint_operand" ""))) (clobber (match_scratch:CC 3 ""))])] - "TARGET_POWERPC64" - "") + "" +{ + if (!TARGET_POWERPC64) + { + rtx cc = gen_rtx_SCRATCH (CCmode); + rs6000_split_logical (operands, AND, false, false, false, cc); + DONE; + } + else if (!and64_2_operand (operands[2], DImode)) + operands[2] = force_reg (DImode, operands[2]); +}) (define_insn "anddi3_mc" [(set (match_operand:DI 0 "gpc_reg_operand" "=r,r,r,r,r,r") @@ -8143,11 +8235,17 @@ (define_split (define_expand "iordi3" [(set (match_operand:DI 0 "gpc_reg_operand" "") (ior:DI (match_operand:DI 1 "gpc_reg_operand" "") - (match_operand:DI 2 "reg_or_logical_cint_operand" "")))] - "TARGET_POWERPC64" - " + (match_operand:DI 2 "reg_or_cint_operand" "")))] + "" { - if (non_logical_cint_operand (operands[2], DImode)) + if (!TARGET_POWERPC64) + { + rs6000_split_logical (operands, IOR, false, false, false, NULL_RTX); + DONE; + } + else if (!reg_or_logical_cint_operand (operands[2], DImode)) + operands[2] = force_reg (DImode, operands[2]); + else if (non_logical_cint_operand (operands[2], DImode)) { HOST_WIDE_INT value; rtx tmp = ((!can_create_pseudo_p () @@ -8161,15 +8259,21 @@ (define_expand "iordi3" emit_insn (gen_iordi3 (operands[0], tmp, GEN_INT (value & 0xffff))); DONE; } -}") +}) (define_expand 
"xordi3" [(set (match_operand:DI 0 "gpc_reg_operand" "") (xor:DI (match_operand:DI 1 "gpc_reg_operand" "") - (match_operand:DI 2 "reg_or_logical_cint_operand" "")))] - "TARGET_POWERPC64" - " + (match_operand:DI 2 "reg_or_cint_operand" "")))] + "" { + if (!TARGET_POWERPC64) + { + rs6000_split_logical (operands, XOR, false, false, false, NULL_RTX); + DONE; + } + else if (!reg_or_logical_cint_operand (operands[2], DImode)) + operands[2] = force_reg (DImode, operands[2]); if (non_logical_cint_operand (operands[2], DImode)) { HOST_WIDE_INT value; @@ -8184,7 +8288,7 @@ (define_expand "xordi3" emit_insn (gen_xordi3 (operands[0], tmp, GEN_INT (value & 0xffff))); DONE; } -}") +}) (define_insn "*booldi3_internal1" [(set (match_operand:DI 0 "gpc_reg_operand" "=r,r,r") @@ -8422,6 +8526,372 @@ (define_insn "*eqv3" (set_attr "length" "4")]) +;; 128-bit logical operations expanders + +(define_expand "and3" + [(parallel [(set (match_operand:BOOL_128 0 "vlogical_operand" "") + (and:BOOL_128 + (match_operand:BOOL_128 1 "vlogical_operand" "") + (match_operand:BOOL_128 2 "vlogical_operand" ""))) + (clobber (match_scratch:CC 3 ""))])] + "" + "") + +(define_expand "ior3" + [(set (match_operand:BOOL_128 0 "vlogical_operand" "") + (ior:BOOL_128 (match_operand:BOOL_128 1 "vlogical_operand" "") + (match_operand:BOOL_128 2 "vlogical_operand" "")))] + "" + "") + +(define_expand "xor3" + [(set (match_operand:BOOL_128 0 "vlogical_operand" "") + (xor:BOOL_128 (match_operand:BOOL_128 1 "vlogical_operand" "") + (match_operand:BOOL_128 2 "vlogical_operand" "")))] + "" + "") + +(define_expand "one_cmpl2" + [(set (match_operand:BOOL_128 0 "vlogical_operand" "") + (not:BOOL_128 (match_operand:BOOL_128 1 "vlogical_operand" "")))] + "" + "") + +(define_expand "nor3" + [(set (match_operand:BOOL_128 0 "vlogical_operand" "") + (and:BOOL_128 + (not:BOOL_128 (match_operand:BOOL_128 1 "vlogical_operand" "")) + (not:BOOL_128 (match_operand:BOOL_128 2 "vlogical_operand" ""))))] + "" + "") + +(define_expand 
"andc3" + [(set (match_operand:BOOL_128 0 "vlogical_operand" "") + (and:BOOL_128 + (not:BOOL_128 (match_operand:BOOL_128 2 "vlogical_operand" "")) + (match_operand:BOOL_128 1 "vlogical_operand" "")))] + "" + "") + +;; Power8 vector logical instructions. +(define_expand "eqv3" + [(set (match_operand:BOOL_128 0 "vlogical_operand" "") + (not:BOOL_128 + (xor:BOOL_128 (match_operand:BOOL_128 1 "vlogical_operand" "") + (match_operand:BOOL_128 2 "vlogical_operand" ""))))] + "mode == TImode || mode == PTImode || TARGET_P8_VECTOR" + "") + +;; Rewrite nand into canonical form +(define_expand "nand3" + [(set (match_operand:BOOL_128 0 "vlogical_operand" "") + (ior:BOOL_128 + (not:BOOL_128 (match_operand:BOOL_128 1 "vlogical_operand" "")) + (not:BOOL_128 (match_operand:BOOL_128 2 "vlogical_operand" ""))))] + "mode == TImode || mode == PTImode || TARGET_P8_VECTOR" + "") + +;; The canonical form is to have the negated element first, so we need to +;; reverse arguments. +(define_expand "orc3" + [(set (match_operand:BOOL_128 0 "vlogical_operand" "") + (ior:BOOL_128 + (not:BOOL_128 (match_operand:BOOL_128 2 "vlogical_operand" "")) + (match_operand:BOOL_128 1 "vlogical_operand" "")))] + "mode == TImode || mode == PTImode || TARGET_P8_VECTOR" + "") + +;; 128-bit logical operations insns and split operations +(define_insn_and_split "*and3_internal" + [(set (match_operand:BOOL_128 0 "vlogical_operand" "=") + (and:BOOL_128 + (match_operand:BOOL_128 1 "vlogical_operand" "%") + (match_operand:BOOL_128 2 "vlogical_operand" ""))) + (clobber (match_scratch:CC 3 ""))] + "" +{ + if (TARGET_VSX && vsx_register_operand (operands[0], mode)) + return "xxland %x0,%x1,%x2"; + + if (TARGET_ALTIVEC && altivec_register_operand (operands[0], mode)) + return "vand %0,%1,%2"; + + return "#"; +} + "reload_completed && int_reg_operand (operands[0], mode)" + [(const_int 0)] +{ + rs6000_split_logical (operands, AND, false, false, false, operands[3]); + DONE; +} + [(set (attr "type") + (if_then_else +
(match_test "vsx_register_operand (operands[0], mode)") + (const_string "vecsimple") + (const_string "integer"))) + (set (attr "length") + (if_then_else + (match_test "vsx_register_operand (operands[0], mode)") + (const_string "4") + (if_then_else + (match_test "TARGET_POWERPC64") + (const_string "8") + (const_string "16"))))]) + +;; 128-bit IOR/XOR +(define_insn_and_split "*bool3_internal" + [(set (match_operand:BOOL_128 0 "vlogical_operand" "=") + (match_operator:BOOL_128 3 "boolean_or_operator" + [(match_operand:BOOL_128 1 "vlogical_operand" "%") + (match_operand:BOOL_128 2 "vlogical_operand" "")]))] + "" +{ + if (TARGET_VSX && vsx_register_operand (operands[0], mode)) + return "xxl%q3 %x0,%x1,%x2"; + + if (TARGET_ALTIVEC && altivec_register_operand (operands[0], mode)) + return "v%q3 %0,%1,%2"; + + return "#"; +} + "reload_completed && int_reg_operand (operands[0], mode)" + [(const_int 0)] +{ + rs6000_split_logical (operands, GET_CODE (operands[3]), false, false, false, + NULL_RTX); + DONE; +} + [(set (attr "type") + (if_then_else + (match_test "vsx_register_operand (operands[0], mode)") + (const_string "vecsimple") + (const_string "integer"))) + (set (attr "length") + (if_then_else + (match_test "vsx_register_operand (operands[0], mode)") + (const_string "4") + (if_then_else + (match_test "TARGET_POWERPC64") + (const_string "8") + (const_string "16"))))]) + +;; 128-bit ANDC/ORC +(define_insn_and_split "*boolc3_internal1" + [(set (match_operand:BOOL_128 0 "vlogical_operand" "=") + (match_operator:BOOL_128 3 "boolean_operator" + [(not:BOOL_128 + (match_operand:BOOL_128 2 "vlogical_operand" "")) + (match_operand:BOOL_128 1 "vlogical_operand" "")]))] + "TARGET_P8_VECTOR || (GET_CODE (operands[3]) == AND)" +{ + if (TARGET_VSX && vsx_register_operand (operands[0], mode)) + return "xxl%q3 %x0,%x1,%x2"; + + if (TARGET_ALTIVEC && altivec_register_operand (operands[0], mode)) + return "v%q3 %0,%1,%2"; + + return "#"; +} + "(TARGET_P8_VECTOR || (GET_CODE (operands[3]) == 
AND)) + && reload_completed && int_reg_operand (operands[0], mode)" + [(const_int 0)] +{ + rs6000_split_logical (operands, GET_CODE (operands[3]), false, true, false, + NULL_RTX); + DONE; +} + [(set (attr "type") + (if_then_else + (match_test "vsx_register_operand (operands[0], mode)") + (const_string "vecsimple") + (const_string "integer"))) + (set (attr "length") + (if_then_else + (match_test "vsx_register_operand (operands[0], mode)") + (const_string "4") + (if_then_else + (match_test "TARGET_POWERPC64") + (const_string "8") + (const_string "16"))))]) + +(define_insn_and_split "*boolc3_internal2" + [(set (match_operand:TI2 0 "int_reg_operand" "=&r,r,r") + (match_operator:TI2 3 "boolean_operator" + [(not:TI2 + (match_operand:TI2 1 "int_reg_operand" "r,0,r")) + (match_operand:TI2 2 "int_reg_operand" "r,r,0")]))] + "!TARGET_P8_VECTOR && (GET_CODE (operands[3]) != AND)" + "#" + "reload_completed && !TARGET_P8_VECTOR && (GET_CODE (operands[3]) != AND)" + [(const_int 0)] +{ + rs6000_split_logical (operands, GET_CODE (operands[3]), false, true, false, + NULL_RTX); + DONE; +} + [(set_attr "type" "integer") + (set (attr "length") + (if_then_else + (match_test "TARGET_POWERPC64") + (const_string "8") + (const_string "16")))]) + +;; 128-bit NAND/NOR +(define_insn_and_split "*boolcc3_internal1" + [(set (match_operand:BOOL_128 0 "vlogical_operand" "=") + (match_operator:BOOL_128 3 "boolean_operator" + [(not:BOOL_128 + (match_operand:BOOL_128 1 "vlogical_operand" "")) + (not:BOOL_128 + (match_operand:BOOL_128 2 "vlogical_operand" ""))]))] + "TARGET_P8_VECTOR || (GET_CODE (operands[3]) == AND)" +{ + if (TARGET_VSX && vsx_register_operand (operands[0], mode)) + return "xxl%q3 %x0,%x1,%x2"; + + if (TARGET_ALTIVEC && altivec_register_operand (operands[0], mode)) + return "v%q3 %0,%1,%2"; + + return "#"; +} + "(TARGET_P8_VECTOR || (GET_CODE (operands[3]) == AND)) + && reload_completed && int_reg_operand (operands[0], mode)" + [(const_int 0)] +{ + rs6000_split_logical (operands, 
GET_CODE (operands[3]), false, true, true, + NULL_RTX); + DONE; +} + [(set (attr "type") + (if_then_else + (match_test "vsx_register_operand (operands[0], mode)") + (const_string "vecsimple") + (const_string "integer"))) + (set (attr "length") + (if_then_else + (match_test "vsx_register_operand (operands[0], mode)") + (const_string "4") + (if_then_else + (match_test "TARGET_POWERPC64") + (const_string "8") + (const_string "16"))))]) + +(define_insn_and_split "*boolcc3_internal2" + [(set (match_operand:TI2 0 "int_reg_operand" "=&r,r,r") + (match_operator:TI2 3 "boolean_operator" + [(not:TI2 + (match_operand:TI2 1 "int_reg_operand" "r,0,r")) + (not:TI2 + (match_operand:TI2 2 "int_reg_operand" "r,r,0"))]))] + "!TARGET_P8_VECTOR && (GET_CODE (operands[3]) != AND)" + "#" + "reload_completed && !TARGET_P8_VECTOR && (GET_CODE (operands[3]) != AND)" + [(const_int 0)] +{ + rs6000_split_logical (operands, GET_CODE (operands[3]), false, true, true, + NULL_RTX); + DONE; +} + [(set_attr "type" "integer") + (set (attr "length") + (if_then_else + (match_test "TARGET_POWERPC64") + (const_string "8") + (const_string "16")))]) + + +;; 128-bit EQV +(define_insn_and_split "*eqv3_internal1" + [(set (match_operand:BOOL_128 0 "vlogical_operand" "=") + (not:BOOL_128 + (xor:BOOL_128 + (match_operand:BOOL_128 1 "vlogical_operand" "") + (match_operand:BOOL_128 2 "vlogical_operand" ""))))] + "TARGET_P8_VECTOR" +{ + if (vsx_register_operand (operands[0], mode)) + return "xxleqv %x0,%x1,%x2"; + + return "#"; +} + "TARGET_P8_VECTOR && reload_completed + && int_reg_operand (operands[0], mode)" + [(const_int 0)] +{ + rs6000_split_logical (operands, XOR, true, false, false, NULL_RTX); + DONE; +} + [(set (attr "type") + (if_then_else + (match_test "vsx_register_operand (operands[0], mode)") + (const_string "vecsimple") + (const_string "integer"))) + (set (attr "length") + (if_then_else + (match_test "vsx_register_operand (operands[0], mode)") + (const_string "4") + (if_then_else + (match_test 
"TARGET_POWERPC64") + (const_string "8") + (const_string "16"))))]) + +(define_insn_and_split "*eqv3_internal2" + [(set (match_operand:TI2 0 "int_reg_operand" "=&r,r,r") + (not:TI2 + (xor:TI2 + (match_operand:TI2 1 "int_reg_operand" "r,0,r") + (match_operand:TI2 2 "int_reg_operand" "r,r,0"))))] + "!TARGET_P8_VECTOR" + "#" + "reload_completed && !TARGET_P8_VECTOR" + [(const_int 0)] +{ + rs6000_split_logical (operands, XOR, true, false, false, NULL_RTX); + DONE; +} + [(set_attr "type" "integer") + (set (attr "length") + (if_then_else + (match_test "TARGET_POWERPC64") + (const_string "8") + (const_string "16")))]) + +;; 128-bit one's complement +(define_insn_and_split "*one_cmpl3_internal" + [(set (match_operand:BOOL_128 0 "vlogical_operand" "=") + (not:BOOL_128 + (match_operand:BOOL_128 1 "vlogical_operand" "")))] + "" +{ + if (TARGET_VSX && vsx_register_operand (operands[0], mode)) + return "xxlnor %x0,%x1,%x1"; + + if (TARGET_ALTIVEC && altivec_register_operand (operands[0], mode)) + return "vnor %0,%1,%1"; + + return "#"; +} + "reload_completed && int_reg_operand (operands[0], mode)" + [(const_int 0)] +{ + rs6000_split_logical (operands, NOT, false, false, false, NULL_RTX); + DONE; +} + [(set (attr "type") + (if_then_else + (match_test "vsx_register_operand (operands[0], mode)") + (const_string "vecsimple") + (const_string "integer"))) + (set (attr "length") + (if_then_else + (match_test "vsx_register_operand (operands[0], mode)") + (const_string "4") + (if_then_else + (match_test "TARGET_POWERPC64") + (const_string "8") + (const_string "16"))))]) + + ;; Now define ways of moving data around. ;; Set up a register with a value from the GOT table