From patchwork Thu Aug  4 04:33:44 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Michael Meissner <meissner@linux.vnet.ibm.com>
X-Patchwork-Id: 655666
Date: Thu, 4 Aug 2016 00:33:44 -0400
From: Michael Meissner <meissner@linux.vnet.ibm.com>
To: gcc-patches@gcc.gnu.org, Segher Boessenkool <segher@kernel.crashing.org>,
	David Edelsohn <dje.gcc@gmail.com>,
	Bill Schmidt <wschmidt@linux.vnet.ibm.com>
Subject: [PATCH], Improve vector int/long initialization on PowerPC
Mail-Followup-To: Michael Meissner <meissner@linux.vnet.ibm.com>,
	gcc-patches@gcc.gnu.org,
	Segher Boessenkool <segher@kernel.crashing.org>,
	David Edelsohn <dje.gcc@gmail.com>,
	Bill Schmidt <wschmidt@linux.vnet.ibm.com>
Message-Id: <20160804043344.GA8391@ibm-tiger.the-meissners.org>

This is a set of 3 patches to improve vector initialization on the PowerPC.

The first patch changes how a vector int is initialized when the initializer
elements are not constant.  Previously, the compiler would build the vector in a
temporary on the stack and then load it.  Now, on 64-bit power8 and newer
systems, it builds the two 64-bit halves in the GPRs and moves them into the
vector register.

Before the switch from the old RELOAD register allocator to the newer LRA
register allocator, this patch caused a failure in one of the Fortran tests
(cray_pointers_2) on power8.  It works on power7 because the optimization is
not done there, and on power9 because power9 has d-form vector addressing.
The failure was due to TImode not being allowed in vector registers, so I
added a test to disable the optimization in that configuration.  Since LRA
allows TImode in vector registers, most users will see the benefit of this
optimization.

The second patch is cosmetic: it moves the code that determines the true
register number of a REG or SUBREG out of the rs6000_adjust_vec_address
function and into a new helper function, regno_or_subregno.  Previous versions
of this patch had additional callers of regno_or_subregno, but those uses have
been removed for now.  However, as I work on further optimizing vector
initialization, set, and extract, I may wind up using the helper function in
more places.

The third patch improves the formation of vector long on ISA 3.0 systems by
using the new MTVSRDD instruction, which builds a vector from two 64-bit GPRs.

I built SPEC 2006 with these patches on a little endian power8 system, and at
least 18 of the benchmarks had vector initializations replaced.  Most
benchmarks only used the initialization in a few places, but gamess, dealII,
h264ref, and wrf each had over 100 initializations changed.

I have tested these patches on a big endian power7 system (both 32-bit and
64-bit targets), a big endian power8 system (64-bit targets only), and a
little endian power8 system (64-bit targets only).  There were no regressions
on any of the systems.  Can I install these patches to the trunk?

[gcc]
2016-08-03  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* config/rs6000/rs6000.c (rs6000_expand_vector_init): On 64-bit
	systems with direct move and TImode registers allowed in VSX,
	initialize a V4SImode vector in the GPRs, rather than creating a
	temporary vector on the stack, doing 4 stores to that temporary
	vector, and then doing a vector load (which causes a pipeline
	bubble between the stores and the load).
	(regno_or_subregno): New helper function to get the register
	number of a REG or SUBREG rtx.
	(rs6000_adjust_vec_address): Use regno_or_subregno.
	* config/rs6000/vsx.md (vsx_concat_<mode>): Add support for the
	ISA 3.0 mtvsrdd instruction when moving two GPRs to create one
	vector register.

[gcc/testsuite]
2016-08-03  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* gcc.target/powerpc/vec-init-1.c: New tests for vector init.
	* gcc.target/powerpc/vec-init-2.c: Likewise.
	* gcc.target/powerpc/vec-init-3.c: Likewise.

Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000)	(revision 239098)
+++ gcc/config/rs6000/rs6000.c	(.../gcc/config/rs6000)	(working copy)
@@ -6736,6 +6736,38 @@ rs6000_expand_vector_init (rtx target, r
       return;
     }
 
+  /* Special case initializing vector int if we are on 64-bit systems with
+     direct move.  This optimization tickles a bug in reload for Fortran's
+     cray_pointers_2 test unless -mvsx-timode is enabled.  */
+  if (mode == V4SImode && TARGET_DIRECT_MOVE_64BIT && TARGET_VSX_TIMODE)
+    {
+      rtx di_hi, di_lo, elements[4], tmp;
+      size_t i;
+
+      for (i = 0; i < 4; i++)
+	{
+	  rtx element_si = XVECEXP (vals, 0, VECTOR_ELT_ORDER_BIG ? i : 3 - i);
+	  element_si = copy_to_mode_reg (SImode, element_si);
+	  elements[i] = gen_reg_rtx (DImode);
+	  convert_move (elements[i], element_si, true);
+	}
+
+      di_hi = gen_reg_rtx (DImode);
+      tmp = gen_reg_rtx (DImode);
+      emit_insn (gen_ashldi3 (tmp, elements[0], GEN_INT (32)));
+      emit_insn (gen_iordi3 (di_hi, tmp, elements[1]));
+
+      di_lo = gen_reg_rtx (DImode);
+      tmp = gen_reg_rtx (DImode);
+      emit_insn (gen_ashldi3 (tmp, elements[2], GEN_INT (32)));
+      emit_insn (gen_iordi3 (di_lo, tmp, elements[3]));
+
+      emit_insn (gen_rtx_CLOBBER (VOIDmode, target));
+      emit_move_insn (gen_highpart (DImode, target), di_hi);
+      emit_move_insn (gen_lowpart (DImode, target), di_lo);
+      return;
+    }
+
   /* With single precision floating point on VSX, know that internally single
      precision is actually represented as a double, and either make 2 V2DF
      vectors, and convert these vectors to single precision, or do one
@@ -7021,6 +7053,18 @@ rs6000_expand_vector_extract (rtx target
   emit_move_insn (target, adjust_address_nv (mem, inner_mode, 0));
 }
 
+/* Helper function to return the register number of a REG or SUBREG.  */
+static inline int
+regno_or_subregno (rtx op)
+{
+  if (REG_P (op))
+    return REGNO (op);
+  else if (SUBREG_P (op))
+    return subreg_regno (op);
+  else
+    gcc_unreachable ();
+}
+
 /* Adjust a memory address (MEM) of a vector type to point to a scalar field
    within the vector (ELEMENT) with a mode (SCALAR_MODE).  Use a base register
    temporary (BASE_TMP) to fixup the address.  Return the new memory address
@@ -7136,14 +7180,7 @@ rs6000_adjust_vec_address (rtx scalar_re
     {
       rtx op1 = XEXP (new_addr, 1);
       addr_mask_type addr_mask;
-      int scalar_regno;
-
-      if (REG_P (scalar_reg))
-	scalar_regno = REGNO (scalar_reg);
-      else if (SUBREG_P (scalar_reg))
-	scalar_regno = subreg_regno (scalar_reg);
-      else
-	gcc_unreachable ();
+      int scalar_regno = regno_or_subregno (scalar_reg);
 
       gcc_assert (scalar_regno < FIRST_PSEUDO_REGISTER);
       if (INT_REGNO_P (scalar_regno))
Index: gcc/config/rs6000/vsx.md
===================================================================
--- gcc/config/rs6000/vsx.md	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000)	(revision 239098)
+++ gcc/config/rs6000/vsx.md	(.../gcc/config/rs6000)	(working copy)
@@ -1899,18 +1899,28 @@ (define_insn "*vsx_float_fix_v2df2"
 
 ;; Build a V2DF/V2DI vector from two scalars
 (define_insn "vsx_concat_<mode>"
-  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=<VSr>,?<VSa>")
+  [(set (match_operand:VSX_D 0 "gpc_reg_operand" "=<VSa>,we")
 	(vec_concat:VSX_D
-	 (match_operand:<VS_scalar> 1 "vsx_register_operand" "<VS_64reg>,<VSa>")
-	 (match_operand:<VS_scalar> 2 "vsx_register_operand" "<VS_64reg>,<VSa>")))]
+	 (match_operand:<VS_scalar> 1 "gpc_reg_operand" "<VS_64reg>,b")
+	 (match_operand:<VS_scalar> 2 "gpc_reg_operand" "<VS_64reg>,b")))
+   (clobber (match_scratch:DI 3 "=X,X"))]
   "VECTOR_MEM_VSX_P (<MODE>mode)"
 {
-  if (BYTES_BIG_ENDIAN)
-    return "xxpermdi %x0,%x1,%x2,0";
+  if (which_alternative == 0)
+    return (BYTES_BIG_ENDIAN
+	    ? "xxpermdi %x0,%x1,%x2,0"
+	    : "xxpermdi %x0,%x2,%x1,0");
+
+  else if (which_alternative == 1)
+    return (BYTES_BIG_ENDIAN
+	    ? "mtvsrdd %x0,%1,%2"
+	    : "mtvsrdd %x0,%2,%1");
+
   else
-    return "xxpermdi %x0,%x2,%x1,0";
+    gcc_unreachable ();
 }
-  [(set_attr "type" "vecperm")])
+  [(set_attr "type" "vecperm,mffgpr")
+   (set_attr "length" "4")])
 
 ;; Special purpose concat using xxpermdi to glue two single precision values
 ;; together, relying on the fact that internally scalar floats are represented

Index: gcc/testsuite/gcc.target/powerpc/vec-init-1.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/vec-init-1.c	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc)	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/vec-init-1.c	(.../gcc/testsuite/gcc.target/powerpc)	(revision 239099)
@@ -0,0 +1,36 @@
+/* { dg-do run { target { powerpc*-*-linux* } } } */
+/* { dg-require-effective-target vsx_hw } */
+/* { dg-options "-O2 -mvsx" } */
+
+#include <stdlib.h>
+#include <stddef.h>
+#include <altivec.h>
+
+extern void check (vector int a)                    __attribute__((__noinline__));
+extern vector int pack (int a, int b, int c, int d) __attribute__((__noinline__));
+
+void
+check (vector int a)
+{
+  static const int expected[] = { -1, 2, 0, -3 };
+  size_t i;
+
+  for (i = 0; i < 4; i++)
+    if (vec_extract (a, i) != expected[i])
+      abort ();
+}
+
+vector int
+pack (int a, int b, int c, int d)
+{
+  return (vector int) { a, b, c, d };
+}
+
+vector int sv = (vector int) { -1, 2, 0, -3 };
+
+int main (void)
+{
+  check (sv);
+  check (pack (-1, 2, 0, -3));
+  return 0;
+}
Index: gcc/testsuite/gcc.target/powerpc/vec-init-2.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/vec-init-2.c	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc)	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/vec-init-2.c	(.../gcc/testsuite/gcc.target/powerpc)	(revision 239099)
@@ -0,0 +1,36 @@
+/* { dg-do run { target { powerpc*-*-linux* && lp64 } } } */
+/* { dg-require-effective-target vsx_hw } */
+/* { dg-options "-O2 -mvsx" } */
+
+#include <stdlib.h>
+#include <stddef.h>
+#include <altivec.h>
+
+extern void check (vector long a)        __attribute__((__noinline__));
+extern vector long pack (long a, long b) __attribute__((__noinline__));
+
+void
+check (vector long a)
+{
+  static const long expected[] = { 2L, -3L };
+  size_t i;
+
+  for (i = 0; i < 2; i++)
+    if (vec_extract (a, i) != expected[i])
+      abort ();
+}
+
+vector long
+pack (long a, long b)
+{
+  return (vector long) { a, b };
+}
+
+vector long sv = (vector long) { 2L, -3L };
+
+int main (void)
+{
+  check (sv);
+  check (pack (2L, -3L));
+  return 0;
+}
Index: gcc/testsuite/gcc.target/powerpc/vec-init-3.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/vec-init-3.c	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc)	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/vec-init-3.c	(.../gcc/testsuite/gcc.target/powerpc)	(revision 239099)
@@ -0,0 +1,12 @@
+/* { dg-do compile { target { powerpc64*-*-* && lp64 } } } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-options "-mcpu=power9 -O2 -mupper-regs-di" } */
+
+vector long
+merge (long a, long b)
+{
+  return (vector long) { a, b };
+}
+
+/* { dg-final { scan-assembler "mtvsrdd" } } */