From: Michael Meissner
Date: Thu, 4 Aug 2016 00:33:44 -0400
To: gcc-patches@gcc.gnu.org, Segher Boessenkool, David Edelsohn, Bill Schmidt
Subject: [PATCH], Improve vector int/long initialization on PowerPC
This is a set of 3 patches to improve initializing vectors on the PowerPC.

The first patch changes the initialization of vector int when the initializer
elements are not constant.  Previously, the compiler would build the vector in
a temporary on the stack and then load it.  Now, on 64-bit power8 and newer
systems, it builds the parts in the GPRs and uses direct move to transfer them
into the vector register.

Before the switch from the old RELOAD register allocator to the newer LRA
register allocator, this patch had a problem with one of the Fortran tests
(cray_pointers_2) on power8.  It works on power7 because the optimization is
not done there, and on power9 because power9 has d-form vector addressing.
The problem was due to TImode not being allowed in vector registers, so I
added a check (TARGET_VSX_TIMODE) that disables the optimization on such
systems.  Since LRA allows TImode in vector registers, most users will see the
benefit of this optimization.

The second patch is cosmetic: it moves the code that determines the true
register number of a REG or SUBREG out of rs6000_adjust_vec_address and into a
helper function (regno_or_subregno).  Previous versions of this patch had
additional callers of regno_or_subregno, but those uses have been removed for
now.  However, as I work on further optimizing vector initialization, set, and
extract, I may end up using the helper function again.

The third patch improves the formation of vector long on ISA 3.0 systems by
using the new MTVSRDD instruction, which builds a vector from two 64-bit GPRs.
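To make the GPR-based construction in the first patch concrete, here is a
minimal scalar sketch in plain C.  It is a hypothetical model, not GCC code:
the function name pack_v4si_halves and the elt_order_big flag (standing in for
VECTOR_ELT_ORDER_BIG) are assumptions for illustration.  It mirrors only the
integer arithmetic of the patch (zero-extend each element to 64 bits, then
shift and OR pairs into two doublewords); moving di_hi and di_lo into the
vector register, which the patch does with gen_highpart/gen_lowpart and direct
move, is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model (not GCC code) of the V4SImode special case: compute
   the two 64-bit halves of a vector int using only integer operations, as
   the patch does in GPRs.  ELT_ORDER_BIG stands in for GCC's
   VECTOR_ELT_ORDER_BIG; when it is false, elements are taken in reverse.  */
void
pack_v4si_halves (const int32_t vals[4], bool elt_order_big,
                  uint64_t *di_hi, uint64_t *di_lo)
{
  uint64_t elements[4];

  assert (vals && di_hi && di_lo);

  for (int i = 0; i < 4; i++)
    {
      int32_t v = vals[elt_order_big ? i : 3 - i];
      /* Zero-extend, like convert_move (..., true) in the patch.  A
         sign-extended negative value would set the upper 32 bits and
         clobber the neighbouring element when the pair is ORed together.  */
      elements[i] = (uint64_t) (uint32_t) v;
    }

  /* Shift the first element of each pair into the upper word, then OR in
     the second (the gen_ashldi3 / gen_iordi3 sequence).  */
  *di_hi = (elements[0] << 32) | elements[1];
  *di_lo = (elements[2] << 32) | elements[3];
}
```

For example, with the values used in vec-init-1.c ({ -1, 2, 0, -3 }) and
big-endian element order, the sketch produces 0xFFFFFFFF00000002 for the high
doubleword and 0x00000000FFFFFFFD for the low one.  The zero extension is the
important design point: without it, the OR would smear the sign bits of a
negative element across its neighbour.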
I built SPEC 2006 with these patches on a little-endian power8 system, and at
least 18 of the benchmarks had vector initializations replaced.  Most
benchmarks only used the initialization in a few places, but gamess, dealII,
h264ref, and wrf each had over 100 initializations changed.

I have tested these patches on a big-endian power7 system (both 32-bit and
64-bit targets), a big-endian power8 system (64-bit targets only), and a
little-endian power8 system (64-bit targets only).  There were no regressions
on any of the systems.  Can I install these patches on the trunk?

[gcc]
2016-08-03  Michael Meissner

	* config/rs6000/rs6000.c (rs6000_expand_vector_init): On 64-bit
	systems with direct move and TImode registers allowed in VSX,
	initialize a V4SImode vector in the GPRs, rather than creating a
	temporary vector on the stack, doing 4 stores to that temporary
	vector, and then doing a vector load (which causes a pipeline
	bubble between the stores and the load).
	(regno_or_subregno): New helper function to get the register
	number of a REG or SUBREG rtx.
	(rs6000_adjust_vec_address): Use regno_or_subregno.

	* config/rs6000/vsx.md (vsx_concat_): Add support for the ISA 3.0
	mtvsrdd instruction if we are moving two GPRs to create one vector
	register.

[gcc/testsuite]
2016-08-03  Michael Meissner

	* gcc.target/powerpc/vec-init-1.c: New tests for vector init.
	* gcc.target/powerpc/vec-init-2.c: Likewise.
	* gcc.target/powerpc/vec-init-3.c: Likewise.

Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000)	(revision 239098)
+++ gcc/config/rs6000/rs6000.c	(.../gcc/config/rs6000)	(working copy)
@@ -6736,6 +6736,38 @@ rs6000_expand_vector_init (rtx target, r
       return;
     }
 
+  /* Special case initializing vector int if we are on 64-bit systems with
+     direct move.  This code tickles a bug in reload for Fortran's
+     cray_pointers_2 test unless -mvsx-timode is enabled.  */
+  if (mode == V4SImode && TARGET_DIRECT_MOVE_64BIT && TARGET_VSX_TIMODE)
+    {
+      rtx di_hi, di_lo, elements[4], tmp;
+      size_t i;
+
+      for (i = 0; i < 4; i++)
+        {
+          rtx element_si = XVECEXP (vals, 0, VECTOR_ELT_ORDER_BIG ? i : 3 - i);
+          element_si = copy_to_mode_reg (SImode, element_si);
+          elements[i] = gen_reg_rtx (DImode);
+          convert_move (elements[i], element_si, true);
+        }
+
+      di_hi = gen_reg_rtx (DImode);
+      tmp = gen_reg_rtx (DImode);
+      emit_insn (gen_ashldi3 (tmp, elements[0], GEN_INT (32)));
+      emit_insn (gen_iordi3 (di_hi, tmp, elements[1]));
+
+      di_lo = gen_reg_rtx (DImode);
+      tmp = gen_reg_rtx (DImode);
+      emit_insn (gen_ashldi3 (tmp, elements[2], GEN_INT (32)));
+      emit_insn (gen_iordi3 (di_lo, tmp, elements[3]));
+
+      emit_insn (gen_rtx_CLOBBER (VOIDmode, target));
+      emit_move_insn (gen_highpart (DImode, target), di_hi);
+      emit_move_insn (gen_lowpart (DImode, target), di_lo);
+      return;
+    }
+
   /* With single precision floating point on VSX, know that internally single
      precision is actually represented as a double, and either make 2 V2DF
      vectors, and convert these vectors to single precision, or do one
@@ -7021,6 +7053,18 @@ rs6000_expand_vector_extract (rtx target
   emit_move_insn (target, adjust_address_nv (mem, inner_mode, 0));
 }
 
+/* Helper function to return the register number of a RTX.  */
+static inline int
+regno_or_subregno (rtx op)
+{
+  if (REG_P (op))
+    return REGNO (op);
+  else if (SUBREG_P (op))
+    return subreg_regno (op);
+  else
+    gcc_unreachable ();
+}
+
 /* Adjust a memory address (MEM) of a vector type to point to a scalar field
    within the vector (ELEMENT) with a mode (SCALAR_MODE).  Use a base register
    temporary (BASE_TMP) to fixup the address.  Return the new memory address
@@ -7136,14 +7180,7 @@ rs6000_adjust_vec_address (rtx scalar_re
         {
           rtx op1 = XEXP (new_addr, 1);
           addr_mask_type addr_mask;
-          int scalar_regno;
-
-          if (REG_P (scalar_reg))
-            scalar_regno = REGNO (scalar_reg);
-          else if (SUBREG_P (scalar_reg))
-            scalar_regno = subreg_regno (scalar_reg);
-          else
-            gcc_unreachable ();
+          int scalar_regno = regno_or_subregno (scalar_reg);
 
           gcc_assert (scalar_regno < FIRST_PSEUDO_REGISTER);
           if (INT_REGNO_P (scalar_regno))
Index: gcc/config/rs6000/vsx.md
===================================================================
--- gcc/config/rs6000/vsx.md	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000)	(revision 239098)
+++ gcc/config/rs6000/vsx.md	(.../gcc/config/rs6000)	(working copy)
@@ -1899,18 +1899,28 @@ (define_insn "*vsx_float_fix_v2df2"
 
 ;; Build a V2DF/V2DI vector from two scalars
 (define_insn "vsx_concat_"
-  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=,?")
+  [(set (match_operand:VSX_D 0 "gpc_reg_operand" "=,we")
         (vec_concat:VSX_D
-         (match_operand: 1 "vsx_register_operand" ",")
-         (match_operand: 2 "vsx_register_operand" ",")))]
+         (match_operand: 1 "gpc_reg_operand" ",r")
+         (match_operand: 2 "gpc_reg_operand" ",r")))
+   (clobber (match_scratch:DI 3 "=X,X"))]
   "VECTOR_MEM_VSX_P (mode)"
 {
-  if (BYTES_BIG_ENDIAN)
-    return "xxpermdi %x0,%x1,%x2,0";
+  if (which_alternative == 0)
+    return (BYTES_BIG_ENDIAN
+            ? "xxpermdi %x0,%x1,%x2,0"
+            : "xxpermdi %x0,%x2,%x1,0");
+
+  else if (which_alternative == 1)
+    return (BYTES_BIG_ENDIAN
+            ? "mtvsrdd %x0,%1,%2"
+            : "mtvsrdd %x0,%2,%1");
+
   else
-    return "xxpermdi %x0,%x2,%x1,0";
+    gcc_unreachable ();
 }
-  [(set_attr "type" "vecperm")])
+  [(set_attr "type" "vecperm,mftgpr")
+   (set_attr "length" "4")])
 
 ;; Special purpose concat using xxpermdi to glue two single precision values
 ;; together, relying on the fact that internally scalar floats are represented
Index: gcc/testsuite/gcc.target/powerpc/vec-init-1.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/vec-init-1.c	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc)	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/vec-init-1.c	(.../gcc/testsuite/gcc.target/powerpc)	(revision 239099)
@@ -0,0 +1,36 @@
+/* { dg-do run { target { powerpc*-*-linux* } } } */
+/* { dg-require-effective-target vsx_hw } */
+/* { dg-options "-O2 -mvsx" } */
+
+#include 
+#include 
+#include 
+
+extern void check (vector int a) __attribute__((__noinline__));
+extern vector int pack (int a, int b, int c, int d) __attribute__((__noinline__));
+
+void
+check (vector int a)
+{
+  static const int expected[] = { -1, 2, 0, -3 };
+  size_t i;
+
+  for (i = 0; i < 4; i++)
+    if (vec_extract (a, i) != expected[i])
+      abort ();
+}
+
+vector int
+pack (int a, int b, int c, int d)
+{
+  return (vector int) { a, b, c, d };
+}
+
+vector int sv = (vector int) { -1, 2, 0, -3 };
+
+int main (void)
+{
+  check (sv);
+  check (pack (-1, 2, 0, -3));
+  return 0;
+}
Index: gcc/testsuite/gcc.target/powerpc/vec-init-2.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/vec-init-2.c	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc)	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/vec-init-2.c	(.../gcc/testsuite/gcc.target/powerpc)	(revision 239099)
@@ -0,0 +1,36 @@
+/* { dg-do run { target { powerpc*-*-linux* && lp64 } } } */
+/* { dg-require-effective-target vsx_hw } */
+/* { dg-options "-O2 -mvsx" } */
+
+#include 
+#include 
+#include 
+
+extern void check (vector long a) __attribute__((__noinline__));
+extern vector long pack (long a, long b) __attribute__((__noinline__));
+
+void
+check (vector long a)
+{
+  static const long expected[] = { 2L, -3L };
+  size_t i;
+
+  for (i = 0; i < 2; i++)
+    if (vec_extract (a, i) != expected[i])
+      abort ();
+}
+
+vector long
+pack (long a, long b)
+{
+  return (vector long) { a, b };
+}
+
+vector long sv = (vector long) { 2L, -3L };
+
+int main (void)
+{
+  check (sv);
+  check (pack (2L, -3L));
+  return 0;
+}
Index: gcc/testsuite/gcc.target/powerpc/vec-init-3.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/vec-init-3.c	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc)	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/vec-init-3.c	(.../gcc/testsuite/gcc.target/powerpc)	(revision 239099)
@@ -0,0 +1,12 @@
+/* { dg-do compile { target { powerpc64*-*-* && lp64 } } } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-options "-mcpu=power9 -O2 -mupper-regs-di" } */
+
+vector long
+merge (long a, long b)
+{
+  return (vector long) { a, b };
+}
+
+/* { dg-final { scan-assembler "mtvsrdd" } } */