From patchwork Wed Oct  9 23:11:33 2013
X-Patchwork-Submitter: Bill Schmidt
X-Patchwork-Id: 282088
Message-ID: <1381360293.6275.31.camel@gnopaine>
Subject: [PATCH, rs6000] Fix variable permute control vectors for little endian
From: Bill Schmidt
To: gcc-patches@gcc.gnu.org
Cc: dje.gcc@gmail.com
Date: Wed, 09 Oct 2013 18:11:33 -0500

Hi,

This is a follow-up to the recent patch that fixed constant permute
control vectors for little endian.  When the control vector is
constant, we can adjust the constant and use a vperm without
increasing code size.
When the control vector is unknown, however, we have to generate two
additional instructions to subtract each element of the control vector
from 31 (equivalently, from -1, since only 5 bits are pertinent).
This patch adds the additional code generation.

There are two main paths to the affected permutes: via the known
pattern vec_perm, and via an altivec builtin.  The builtin path causes
a little difficulty because there's no way to dispatch a builtin to
two different insns for BE and LE.  I solved this by adding two new
unspecs for the builtins (UNSPEC_VPERM_X and UNSPEC_VPERM_UNS_X).  The
insns for the builtins are changed from a define_insn to a
define_insn_and_split.  We create the _X forms at expand time and
later split them into the correct sequences for BE and LE, using the
"real" UNSPEC_VPERM and UNSPEC_VPERM_UNS to generate the vperm
instruction.  For the path via the known pattern, I added a new
routine in rs6000.c in similar fashion to the solution for the
constant control vector case.

When the permute control vector is a rotate vector loaded by lvsl or
lvsr, we can generate the desired control vector more cheaply by
simply changing to use the opposite instruction.  We are already doing
that when expanding an unaligned load.  The changes in vector.md avoid
undoing that effort by circumventing the subtract-from-splat (going
straight to the UNSPEC_VPERM).

I bootstrapped and tested this for big endian on
powerpc64-unknown-linux-gnu with no new regressions.  I did the same
for little endian on powerpc64le-unknown-linux-gnu.  Here the results
were slightly mixed: the changes fix 32 test failures, but expose an
unrelated bug in 9 others when -mvsx is permitted on LE (not currently
allowed).  The bug is a missing permute for a vector load in the
unaligned vector load logic that will be fixed in a subsequent patch.

Is this okay for trunk?
Thanks,
Bill


2013-10-09  Bill Schmidt

	* config/rs6000/vector.md (vec_realign_load_<mode>): Generate vperm
	directly to circumvent subtract from splat{31} workaround.
	* config/rs6000/rs6000-protos.h (altivec_expand_vec_perm_le): New
	prototype.
	* config/rs6000/rs6000.c (altivec_expand_vec_perm_le): New.
	* config/rs6000/altivec.md (define_c_enum "unspec"): Add
	UNSPEC_VPERM_X and UNSPEC_VPERM_UNS_X.
	(altivec_vperm_<mode>): Convert to define_insn_and_split to
	separate big and little endian logic.
	(*altivec_vperm_<mode>_internal): New define_insn.
	(altivec_vperm_<mode>_uns): Convert to define_insn_and_split to
	separate big and little endian logic.
	(*altivec_vperm_<mode>_uns_internal): New define_insn.
	(vec_permv16qi): Add little endian logic.


Index: gcc/config/rs6000/vector.md
===================================================================
--- gcc/config/rs6000/vector.md	(revision 203246)
+++ gcc/config/rs6000/vector.md	(working copy)
@@ -950,8 +950,15 @@
     emit_insn (gen_altivec_vperm_<mode> (operands[0], operands[1],
 					 operands[2], operands[3]));
   else
-    emit_insn (gen_altivec_vperm_<mode> (operands[0], operands[2],
-					 operands[1], operands[3]));
+    {
+      /* Avoid the "subtract from splat31" workaround for vperm since
+	 we have changed lvsr to lvsl instead.  */
+      rtx unspec = gen_rtx_UNSPEC (<MODE>mode,
+				   gen_rtvec (3, operands[2],
+					      operands[1], operands[3]),
+				   UNSPEC_VPERM);
+      emit_move_insn (operands[0], unspec);
+    }
   DONE;
 })

Index: gcc/config/rs6000/rs6000-protos.h
===================================================================
--- gcc/config/rs6000/rs6000-protos.h	(revision 203246)
+++ gcc/config/rs6000/rs6000-protos.h	(working copy)
@@ -56,6 +56,7 @@ extern void paired_expand_vector_init (rtx, rtx);
 extern void rs6000_expand_vector_set (rtx, rtx, int);
 extern void rs6000_expand_vector_extract (rtx, rtx, int);
 extern bool altivec_expand_vec_perm_const (rtx op[4]);
+extern void altivec_expand_vec_perm_le (rtx op[4]);
 extern bool rs6000_expand_vec_perm_const (rtx op[4]);
 extern void rs6000_expand_extract_even (rtx, rtx, rtx);
 extern void rs6000_expand_interleave (rtx, rtx, rtx, bool);

Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 203247)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -28608,6 +28608,54 @@ altivec_expand_vec_perm_const_le (rtx operands[4])
   emit_move_insn (target, unspec);
 }
 
+/* Similarly to altivec_expand_vec_perm_const_le, we must adjust the
+   permute control vector.  But here it's not a constant, so we must
+   generate a vector splat/subtract to do the adjustment.  */
+
+void
+altivec_expand_vec_perm_le (rtx operands[4])
+{
+  rtx splat, unspec;
+  rtx target = operands[0];
+  rtx op0 = operands[1];
+  rtx op1 = operands[2];
+  rtx sel = operands[3];
+  rtx tmp = target;
+
+  /* Get everything in regs so the pattern matches.  */
+  if (!REG_P (op0))
+    op0 = force_reg (V16QImode, op0);
+  if (!REG_P (op1))
+    op1 = force_reg (V16QImode, op1);
+  if (!REG_P (sel))
+    sel = force_reg (V16QImode, sel);
+  if (!REG_P (target))
+    tmp = gen_reg_rtx (V16QImode);
+
+  /* SEL = splat(31) - SEL.  */
+  /* We want to subtract from 31, but we can't vspltisb 31 since
+     it's out of range.  -1 works as well because only the low-order
+     five bits of the permute control vector elements are used.  */
+  splat = gen_rtx_VEC_DUPLICATE (V16QImode,
+				 gen_rtx_CONST_INT (QImode, -1));
+  emit_move_insn (tmp, splat);
+  sel = gen_rtx_MINUS (V16QImode, tmp, sel);
+  emit_move_insn (tmp, sel);
+
+  /* Permute with operands reversed and adjusted selector.  */
+  unspec = gen_rtx_UNSPEC (V16QImode, gen_rtvec (3, op1, op0, tmp),
+			   UNSPEC_VPERM);
+
+  /* Copy into target, possibly by way of a register.  */
+  if (!REG_P (target))
+    {
+      emit_move_insn (tmp, unspec);
+      unspec = tmp;
+    }
+
+  emit_move_insn (target, unspec);
+}
+
 /* Expand an Altivec constant permutation.  Return true if we match
    an efficient implementation; false to fall back to VPERM.  */

Index: gcc/config/rs6000/altivec.md
===================================================================
--- gcc/config/rs6000/altivec.md	(revision 203244)
+++ gcc/config/rs6000/altivec.md	(working copy)
@@ -59,6 +59,8 @@
    UNSPEC_VSUMSWS
    UNSPEC_VPERM
    UNSPEC_VPERM_UNS
+   UNSPEC_VPERM_X
+   UNSPEC_VPERM_UNS_X
    UNSPEC_VRFIN
    UNSPEC_VCFUX
    UNSPEC_VCFSX
@@ -1279,21 +1281,91 @@
   "vrfiz %0,%1"
   [(set_attr "type" "vecfloat")])
 
-(define_insn "altivec_vperm_<mode>"
+(define_insn_and_split "altivec_vperm_<mode>"
   [(set (match_operand:VM 0 "register_operand" "=v")
 	(unspec:VM [(match_operand:VM 1 "register_operand" "v")
 		    (match_operand:VM 2 "register_operand" "v")
 		    (match_operand:V16QI 3 "register_operand" "v")]
+		   UNSPEC_VPERM_X))]
+  "TARGET_ALTIVEC"
+  "#"
+  "!reload_in_progress && !reload_completed"
+  [(set (match_dup 0) (match_dup 4))]
+{
+  if (BYTES_BIG_ENDIAN)
+    operands[4] = gen_rtx_UNSPEC (<MODE>mode,
+				  gen_rtvec (3, operands[1],
+					     operands[2], operands[3]),
+				  UNSPEC_VPERM);
+  else
+    {
+      /* We want to subtract from 31, but we can't vspltisb 31 since
+	 it's out of range.  -1 works as well because only the low-order
+	 five bits of the permute control vector elements are used.  */
+      rtx splat = gen_rtx_VEC_DUPLICATE (V16QImode,
+					 gen_rtx_CONST_INT (QImode, -1));
+      rtx tmp = gen_reg_rtx (V16QImode);
+      emit_move_insn (tmp, splat);
+      rtx sel = gen_rtx_MINUS (V16QImode, tmp, operands[3]);
+      emit_move_insn (tmp, sel);
+      operands[4] = gen_rtx_UNSPEC (<MODE>mode,
+				    gen_rtvec (3, operands[2],
+					       operands[1], tmp),
+				    UNSPEC_VPERM);
+    }
+}
+  [(set_attr "type" "vecperm")])
+
+(define_insn "*altivec_vperm_<mode>_internal"
+  [(set (match_operand:VM 0 "register_operand" "=v")
+	(unspec:VM [(match_operand:VM 1 "register_operand" "v")
+		    (match_operand:VM 2 "register_operand" "v")
+		    (match_operand:V16QI 3 "register_operand" "+v")]
 		   UNSPEC_VPERM))]
   "TARGET_ALTIVEC"
   "vperm %0,%1,%2,%3"
   [(set_attr "type" "vecperm")])
 
-(define_insn "altivec_vperm_<mode>_uns"
+(define_insn_and_split "altivec_vperm_<mode>_uns"
   [(set (match_operand:VM 0 "register_operand" "=v")
 	(unspec:VM [(match_operand:VM 1 "register_operand" "v")
 		    (match_operand:VM 2 "register_operand" "v")
 		    (match_operand:V16QI 3 "register_operand" "v")]
+		   UNSPEC_VPERM_UNS_X))]
+  "TARGET_ALTIVEC"
+  "#"
+  "!reload_in_progress && !reload_completed"
+  [(set (match_dup 0) (match_dup 4))]
+{
+  if (BYTES_BIG_ENDIAN)
+    operands[4] = gen_rtx_UNSPEC (<MODE>mode,
+				  gen_rtvec (3, operands[1],
+					     operands[2], operands[3]),
+				  UNSPEC_VPERM_UNS);
+  else
+    {
+      /* We want to subtract from 31, but we can't vspltisb 31 since
+	 it's out of range.  -1 works as well because only the low-order
+	 five bits of the permute control vector elements are used.  */
+      rtx splat = gen_rtx_VEC_DUPLICATE (V16QImode,
+					 gen_rtx_CONST_INT (QImode, -1));
+      rtx tmp = gen_reg_rtx (V16QImode);
+      emit_move_insn (tmp, splat);
+      rtx sel = gen_rtx_MINUS (V16QImode, tmp, operands[3]);
+      emit_move_insn (tmp, sel);
+      operands[4] = gen_rtx_UNSPEC (<MODE>mode,
+				    gen_rtvec (3, operands[2],
+					       operands[1], tmp),
+				    UNSPEC_VPERM_UNS);
+    }
+}
+  [(set_attr "type" "vecperm")])
+
+(define_insn "*altivec_vperm_<mode>_uns_internal"
+  [(set (match_operand:VM 0 "register_operand" "=v")
+	(unspec:VM [(match_operand:VM 1 "register_operand" "v")
+		    (match_operand:VM 2 "register_operand" "v")
+		    (match_operand:V16QI 3 "register_operand" "+v")]
 		   UNSPEC_VPERM_UNS))]
   "TARGET_ALTIVEC"
   "vperm %0,%1,%2,%3"
@@ -1306,7 +1378,12 @@
 	   (match_operand:V16QI 3 "register_operand" "")]
 	  UNSPEC_VPERM))]
   "TARGET_ALTIVEC"
-  "")
+{
+  if (!BYTES_BIG_ENDIAN) {
+    altivec_expand_vec_perm_le (operands);
+    DONE;
+  }
+})
 
 (define_expand "vec_perm_constv16qi"
   [(match_operand:V16QI 0 "register_operand" "")