From patchwork Mon Aug  7 13:18:30 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Michael Meissner <meissner@linux.vnet.ibm.com>
X-Patchwork-Id: 798634
Return-Path: 
 <gcc-patches-return-459938-incoming=patchwork.ozlabs.org@gcc.gnu.org>
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Authentication-Results: ozlabs.org;
	spf=pass (mailfrom) smtp.mailfrom=gcc.gnu.org
	(client-ip=209.132.180.131; helo=sourceware.org;
	envelope-from=gcc-patches-return-459938-incoming=patchwork.ozlabs.org@gcc.gnu.org;
	receiver=<UNKNOWN>)
Authentication-Results: ozlabs.org; dkim=pass (1024-bit key;
	unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org
	header.b="HaOipedA"; dkim-atps=neutral
Received: from sourceware.org (server1.sourceware.org [209.132.180.131])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256
	bits)) (No client certificate requested)
	by ozlabs.org (Postfix) with ESMTPS id 3xQymB5rl0z9s06
	for <incoming@patchwork.ozlabs.org>;
	Mon,  7 Aug 2017 23:19:06 +1000 (AEST)
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id
	:list-unsubscribe:list-archive:list-post:list-help:sender:date
	:from:to:cc:subject:references:mime-version:content-type
	:in-reply-to:message-id; q=dns; s=default; b=YOgEuJdjJ+tOU7oe3AS
	yowLOc7WvEOtIOL5R0rVGmLhcryceylMAQd1PFNjfhQyehkp1AjTyOMiZAxt91M7
	jv5ggPsh9zeMXkJaXJ4IolQldFBUH8tqo6wvm6EHSojGGoBjsK/6JKORRfQDty4e
	32WF0wWccoB6Za1RTAE+ydfE=
DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id
	:list-unsubscribe:list-archive:list-post:list-help:sender:date
	:from:to:cc:subject:references:mime-version:content-type
	:in-reply-to:message-id; s=default; bh=h7tBQmOq5wZUQCMzJmrQRyIXs
	SI=; b=HaOipedAB0TUwqFQNrgyBm66dr6jPVMMMPp/F7FEoT7voPH8Lo8rCDLrM
	3+RgrEpGT6qKPZxOn9KyFjOzl1MJ/YP2CShrBUqs7rj1R5weXEBomrOm1S3XERvz
	3RmH2Zh4xDB4JnQXSixXMqV4YUIY6/bnL1w2l+nXnGGlcQo1HU=
Received: (qmail 66579 invoked by alias); 7 Aug 2017 13:18:47 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Unsubscribe: 
 <mailto:gcc-patches-unsubscribe-incoming=patchwork.ozlabs.org@gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Delivered-To: mailing list gcc-patches@gcc.gnu.org
Received: (qmail 65173 invoked by uid 89); 7 Aug 2017 13:18:46 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-9.9 required=5.0 tests=AWL, BAYES_00,
	GIT_PATCH_2, GIT_PATCH_3, KAM_ASCII_DIVIDERS,
	KAM_LAZY_DOMAIN_SECURITY,
	RCVD_IN_DNSWL_LOW autolearn=ham version=3.3.2 spammy=ii, ele
X-HELO: mx0a-001b2d01.pphosted.com
Received: from mx0a-001b2d01.pphosted.com (HELO mx0a-001b2d01.pphosted.com)
	(148.163.156.1) by sourceware.org
	(qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP;
	Mon, 07 Aug 2017 13:18:38 +0000
Received: from pps.filterd (m0098394.ppops.net [127.0.0.1])	by
	mx0a-001b2d01.pphosted.com (8.16.0.21/8.16.0.21) with SMTP id
	v77DIK2p066129	for <gcc-patches@gcc.gnu.org>;
	Mon, 7 Aug 2017 09:18:36 -0400
Received: from e17.ny.us.ibm.com (e17.ny.us.ibm.com [129.33.205.207])	by
	mx0a-001b2d01.pphosted.com with ESMTP id
	2c6kfcgcx0-1	(version=TLSv1.2 cipher=AES256-SHA bits=256
	verify=NOT)	for <gcc-patches@gcc.gnu.org>;
	Mon, 07 Aug 2017 09:18:36 -0400
Received: from localhost	by e17.ny.us.ibm.com with IBM ESMTP SMTP Gateway:
	Authorized Use Only! Violators will be prosecuted	for
	<gcc-patches@gcc.gnu.org> from
	<meissner@ibm-tiger.the-meissners.org>;
	Mon, 7 Aug 2017 09:18:34 -0400
Received: from b01cxnp23033.gho.pok.ibm.com (9.57.198.28)	by
	e17.ny.us.ibm.com (146.89.104.204) with IBM ESMTP SMTP
	Gateway: Authorized Use Only! Violators will be prosecuted;
	Mon, 7 Aug 2017 09:18:32 -0400
Received: from b01ledav005.gho.pok.ibm.com (b01ledav005.gho.pok.ibm.com
	[9.57.199.110])	by b01cxnp23033.gho.pok.ibm.com
	(8.14.9/8.14.9/NCO v10.0) with ESMTP id v77DIVjk36896970;
	Mon, 7 Aug 2017 13:18:31 GMT
Received: from b01ledav005.gho.pok.ibm.com (unknown [127.0.0.1])	by IMSVA
	(Postfix) with ESMTP id A14F2AE03B;
	Mon,  7 Aug 2017 09:18:47 -0400 (EDT)
Received: from ibm-tiger.the-meissners.org (unknown [9.32.77.111])	by
	b01ledav005.gho.pok.ibm.com (Postfix) with ESMTP id
	76FABAE03C; Mon,  7 Aug 2017 09:18:47 -0400 (EDT)
Received: by ibm-tiger.the-meissners.org (Postfix, from userid 500)	id
	152EB45DA2; Mon,  7 Aug 2017 09:18:30 -0400 (EDT)
Date: Mon, 7 Aug 2017 09:18:30 -0400
From: Michael Meissner <meissner@linux.vnet.ibm.com>
To: Segher Boessenkool <segher@kernel.crashing.org>
Cc: Michael Meissner <meissner@linux.vnet.ibm.com>,
	GCC Patches <gcc-patches@gcc.gnu.org>,
	David Edelsohn <dje.gcc@gmail.com>,
	Bill Schmidt <wschmidt@linux.vnet.ibm.com>
Subject: Re: [PATCH], PR target/81593,
	Optimize PowerPC vector sets coming from a vector extracts
Mail-Followup-To: Michael Meissner <meissner@linux.vnet.ibm.com>,
	Segher Boessenkool <segher@kernel.crashing.org>,
	GCC Patches <gcc-patches@gcc.gnu.org>,
	David Edelsohn <dje.gcc@gmail.com>,
	Bill Schmidt <wschmidt@linux.vnet.ibm.com>
References: <20170727232113.GA8723@ibm-tiger.the-meissners.org>
	<20170728210848.GC13471@gate.crashing.org>
	<20170802142855.GA11603@ibm-tiger.the-meissners.org>
	<20170803150141.GV13471@gate.crashing.org>
MIME-Version: 1.0
Content-Disposition: inline
In-Reply-To: <20170803150141.GV13471@gate.crashing.org>
User-Agent: Mutt/1.5.20 (2009-12-10)
X-TM-AS-GCONF: 00
x-cbid: 17080713-0040-0000-0000-0000038BDAB2
X-IBM-SpamModules-Scores: 
X-IBM-SpamModules-Versions: BY=3.00007501; HX=3.00000241; KW=3.00000007;
	PH=3.00000004; SC=3.00000217; SDB=6.00898859; UDB=6.00449851;
	IPR=6.00679065; BA=6.00005515; NDR=6.00000001; ZLA=6.00000005;
	ZF=6.00000009; ZB=6.00000000; ZP=6.00000000; ZH=6.00000000;
	ZU=6.00000002; MB=3.00016574; XFM=3.00000015;
	UTC=2017-08-07 13:18:33
X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused
x-cbparentid: 17080713-0041-0000-0000-0000078006F7
Message-Id: <20170807131830.GA753@ibm-tiger.the-meissners.org>
X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:, ,
	definitions=2017-08-07_10:, , signatures=0
X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0
	spamscore=0 suspectscore=0 malwarescore=0 phishscore=0
	adultscore=0 bulkscore=0 classifier=spam adjust=0 reason=mlx
	scancount=1 engine=8.0.1-1706020000
	definitions=main-1708070224
X-IsSubscribed: yes

On Thu, Aug 03, 2017 at 10:01:41AM -0500, Segher Boessenkool wrote:
> Hi Mike,
> 
> On Wed, Aug 02, 2017 at 10:28:55AM -0400, Michael Meissner wrote:
> > On Fri, Jul 28, 2017 at 04:08:50PM -0500, Segher Boessenkool wrote:
> > > I think calling this with the rtx elementN args makes this only more
> > > complicated (the function comment doesn't say what they are or what
> > > NULL means, btw).
> 
> You didn't handle the first part of this as far as I see?  It's the
> big complicating issue here.
> 
> > +   If ELEMENT1 is null, use the top 64-bit double word of ARG1.  If it is
> > +   non-NULL, it is a 0 or 1 constant that gives the vector element number to
> > +   use for extracting the 64-bit double word from ARG1.
> > +
> > +   If ELEMENT2 is null, use the top 64-bit double word of ARG2.  If it is
> > +   non-NULL, it is a 0 or 1 constant that gives the vector element number to
> > +   use for extracting the 64-bit double word from ARG2.
> > +
> > +   The element number is based on the user element ordering, set by the
> > +   endianess and by the -maltivec={le,be} options.  */
> 
> ("endianness", two n's).
> 
> I don't like using NULL as a magic value at all; it does not simplify
> this interface, it complicates it instead.
> 
> Can you move the "which half is high" decision to the callers?

I rewrote the patch to eliminate the rs6000_output_xxpermdi function and to do
the calculation of the XXPERMDI mask in each of the vsx_concat_<mode>_{1,2,3}
insns.  To be sure I got things right, I wrote a new executable test that
tests various methods of creating and inserting into 2-element vectors with
doubleword elements.  I ran it in big endian, little endian with
-maltivec=be, and little endian modes, and the results match those of
previous compilers.

I have done bootstrap builds and make check runs on a big endian power7 system
and a little endian power8 system, and a non-bootstrap build and check on a
power9 prototype (I have script issues that prevent a bootstrap build on
power9, which I need to look into).  There are no regressions, and the new
tests were run on each of the systems.  Can I check this into the trunk?

I would also like to backport it to all open branches (particularly GCC 7, and
GCC 6 if possible).  Note that the patch will need slight tweaks for the older
branches: GCC 7 still supports -mupper-regs-{df,di}, so I will have to adjust
the constraints to accommodate this, and GCC 6 does not allow DImode in the
traditional Altivec registers.

[gcc]
2017-08-07  Michael Meissner  <meissner@linux.vnet.ibm.com>

	PR target/81593
	* config/rs6000/vsx.md (vsx_concat_<mode>, VSX_D): Clean up
	constraints since the -mupper-regs-* switches have been
	eliminated.
	(vsx_concat_<mode>_1): New combiner insns to recognize inserting
	into a vector from a double word element that was extracted from
	another vector, and eliminate extra XXPERMDI instructions.
	(vsx_concat_<mode>_2): Likewise.
	(vsx_concat_<mode>_3): Likewise.
	(vsx_set_<mode>, VSX_D): Rewrite vector set in terms of vector
	concat to allow optimizing inserts from previous extracts.

[gcc/testsuite]
2017-08-07  Michael Meissner  <meissner@linux.vnet.ibm.com>

	PR target/81593
	* gcc.target/powerpc/vec-setup.h: New tests to test various
	combinations of setting up vectors of 2 double word elements.
	* gcc.target/powerpc/vec-setup-long.c: Likewise.
	* gcc.target/powerpc/vec-setup-double.c: Likewise.
	* gcc.target/powerpc/vec-setup-be-long.c: Likewise.
	* gcc.target/powerpc/vec-setup-be-double.c: Likewise.
	* gcc.target/powerpc/vsx-extract-6.c: New tests for optimizing
	vector inserts from vector extracts.
	* gcc.target/powerpc/vsx-extract-7.c: Likewise.

Index: gcc/config/rs6000/vsx.md
===================================================================
--- gcc/config/rs6000/vsx.md	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000)	(revision 250858)
+++ gcc/config/rs6000/vsx.md	(.../gcc/config/rs6000)	(working copy)
@@ -2364,10 +2364,10 @@ (define_insn "*vsx_float_fix_v2df2"
 
 ;; Build a V2DF/V2DI vector from two scalars
 (define_insn "vsx_concat_<mode>"
-  [(set (match_operand:VSX_D 0 "gpc_reg_operand" "=<VSa>,we")
+  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa,we")
 	(vec_concat:VSX_D
-	 (match_operand:<VS_scalar> 1 "gpc_reg_operand" "<VS_64reg>,b")
-	 (match_operand:<VS_scalar> 2 "gpc_reg_operand" "<VS_64reg>,b")))]
+	 (match_operand:<VS_scalar> 1 "gpc_reg_operand" "wa,b")
+	 (match_operand:<VS_scalar> 2 "gpc_reg_operand" "wa,b")))]
   "VECTOR_MEM_VSX_P (<MODE>mode)"
 {
   if (which_alternative == 0)
@@ -2385,6 +2385,80 @@ (define_insn "vsx_concat_<mode>"
 }
   [(set_attr "type" "vecperm")])
 
+;; Combiner patterns to allow creating XXPERMDI's to access either double
+;; word element in a vector register.
+(define_insn "*vsx_concat_<mode>_1"
+  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa")
+	(vec_concat:VSX_D
+	 (vec_select:<VS_scalar>
+	  (match_operand:VSX_D 1 "gpc_reg_operand" "wa")
+	  (parallel [(match_operand:QI 2 "const_0_to_1_operand" "n")]))
+	 (match_operand:<VS_scalar> 3 "gpc_reg_operand" "wa")))]
+  "VECTOR_MEM_VSX_P (<MODE>mode)"
+{
+  HOST_WIDE_INT dword = INTVAL (operands[2]);
+  if (BYTES_BIG_ENDIAN)
+    {
+      operands[4] = GEN_INT (2*dword);
+      return "xxpermdi %x0,%x1,%x3,%4";
+    }
+  else
+    {
+      operands[4] = GEN_INT (!dword);
+      return "xxpermdi %x0,%x3,%x1,%4";
+    }
+}
+  [(set_attr "type" "vecperm")])
+
+(define_insn "*vsx_concat_<mode>_2"
+  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa")
+	(vec_concat:VSX_D
+	 (match_operand:<VS_scalar> 1 "gpc_reg_operand" "wa")
+	 (vec_select:<VS_scalar>
+	  (match_operand:VSX_D 2 "gpc_reg_operand" "wa")
+	  (parallel [(match_operand:QI 3 "const_0_to_1_operand" "n")]))))]
+  "VECTOR_MEM_VSX_P (<MODE>mode)"
+{
+  HOST_WIDE_INT dword = INTVAL (operands[3]);
+  if (BYTES_BIG_ENDIAN)
+    {
+      operands[4] = GEN_INT (dword);
+      return "xxpermdi %x0,%x1,%x2,%4";
+    }
+  else
+    {
+      operands[4] = GEN_INT (2 * !dword);
+      return "xxpermdi %x0,%x2,%x1,%4";
+    }
+}
+  [(set_attr "type" "vecperm")])
+
+(define_insn "*vsx_concat_<mode>_3"
+  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa")
+	(vec_concat:VSX_D
+	 (vec_select:<VS_scalar>
+	  (match_operand:VSX_D 1 "gpc_reg_operand" "wa")
+	  (parallel [(match_operand:QI 2 "const_0_to_1_operand" "n")]))
+	 (vec_select:<VS_scalar>
+	  (match_operand:VSX_D 3 "gpc_reg_operand" "wa")
+	  (parallel [(match_operand:QI 4 "const_0_to_1_operand" "n")]))))]
+  "VECTOR_MEM_VSX_P (<MODE>mode)"
+{
+  HOST_WIDE_INT dword1 = INTVAL (operands[2]);
+  HOST_WIDE_INT dword2 = INTVAL (operands[4]);
+  if (BYTES_BIG_ENDIAN)
+    {
+      operands[5] = GEN_INT ((2 * dword1) + dword2);
+      return "xxpermdi %x0,%x1,%x3,%5";
+    }
+  else
+    {
+      operands[5] = GEN_INT ((2 * !dword2) + !dword1);
+      return "xxpermdi %x0,%x3,%x1,%5";
+    }
+}
+  [(set_attr "type" "vecperm")])
+
 ;; Special purpose concat using xxpermdi to glue two single precision values
 ;; together, relying on the fact that internally scalar floats are represented
 ;; as doubles.  This is used to initialize a V4SF vector with 4 floats
@@ -2585,25 +2659,35 @@ (define_expand "vsx_set_v1ti"
   DONE;
 })
 
-;; Set the element of a V2DI/VD2F mode
-(define_insn "vsx_set_<mode>"
-  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wd,?<VSa>")
-	(unspec:VSX_D
-	 [(match_operand:VSX_D 1 "vsx_register_operand" "wd,<VSa>")
-	  (match_operand:<VS_scalar> 2 "vsx_register_operand" "<VS_64reg>,<VSa>")
-	  (match_operand:QI 3 "u5bit_cint_operand" "i,i")]
-	 UNSPEC_VSX_SET))]
+;; Rewrite V2DF/V2DI set in terms of VEC_CONCAT
+(define_expand "vsx_set_<mode>"
+  [(use (match_operand:VSX_D 0 "vsx_register_operand"))
+   (use (match_operand:VSX_D 1 "vsx_register_operand"))
+   (use (match_operand:<VS_scalar> 2 "gpc_reg_operand"))
+   (use (match_operand:QI 3 "const_0_to_1_operand"))]
   "VECTOR_MEM_VSX_P (<MODE>mode)"
 {
-  int idx_first = BYTES_BIG_ENDIAN ? 0 : 1;
-  if (INTVAL (operands[3]) == idx_first)
-    return \"xxpermdi %x0,%x2,%x1,1\";
-  else if (INTVAL (operands[3]) == 1 - idx_first)
-    return \"xxpermdi %x0,%x1,%x2,0\";
+  rtx dest = operands[0];
+  rtx vec_reg = operands[1];
+  rtx value = operands[2];
+  rtx ele = operands[3];
+  rtx tmp = gen_reg_rtx (<VS_scalar>mode);
+
+  if (ele == const0_rtx)
+    {
+      emit_insn (gen_vsx_extract_<mode> (tmp, vec_reg, const1_rtx));
+      emit_insn (gen_vsx_concat_<mode> (dest, value, tmp));
+      DONE;
+    }
+  else if (ele == const1_rtx)
+    {
+      emit_insn (gen_vsx_extract_<mode> (tmp, vec_reg, const0_rtx));
+      emit_insn (gen_vsx_concat_<mode> (dest, tmp, value));
+      DONE;
+    }
   else
     gcc_unreachable ();
-}
-  [(set_attr "type" "vecperm")])
+})
 
 ;; Extract a DF/DI element from V2DF/V2DI
 ;; Optimize cases were we can do a simple or direct move.
Index: gcc/testsuite/gcc.target/powerpc/vec-setup-be-long.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/vec-setup-be-long.c	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc)	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/vec-setup-be-long.c	(.../gcc/testsuite/gcc.target/powerpc)	(revision 250878)
@@ -0,0 +1,11 @@
+/* { dg-do run { target { powerpc64le*-*-linux* } } } */
+/* { dg-require-effective-target vsx_hw } */
+/* { dg-options "-O2 -mvsx -maltivec=be" } */
+
+/* Test various ways of creating vectors with 2 double words and accessing the
+   elements.  This test uses the long (on 64-bit systems) or long long datatype
+   (on 32-bit systems).
+
+   This test explicitly tests -maltivec=be to make sure things are correct.  */
+
+#include "vec-setup.h"
Index: gcc/testsuite/gcc.target/powerpc/vec-setup.h
===================================================================
--- gcc/testsuite/gcc.target/powerpc/vec-setup.h	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc)	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/vec-setup.h	(.../gcc/testsuite/gcc.target/powerpc)	(revision 250878)
@@ -0,0 +1,366 @@
+#include <altivec.h>
+
+/* Test various ways of creating vectors with 2 double words and accessing the
+   elements.  This include file supports:
+
+	 testing double
+	 testing long on 64-bit systems
+	 testing long long on 32-bit systems.
+
+   The endian support is:
+
+	big endian
+	little endian with little endian element ordering
+	little endian with big endian element ordering.  */
+
+#ifdef DEBUG
+#include <stdio.h>
+#define DEBUG0(STR)		fputs (STR, stdout)
+#define DEBUG2(STR,A,B)		printf (STR, A, B)
+
+static int errors = 0;
+
+#else
+#include <stdlib.h>
+#define DEBUG0(STR)
+#define DEBUG2(STR,A,B)
+#endif
+
+#if defined(DO_DOUBLE)
+#define TYPE	double
+#define STYPE	"double"
+#define ZERO	0.0
+#define ONE	1.0
+#define TWO	2.0
+#define THREE	3.0
+#define FOUR	4.0
+#define FIVE	5.0
+#define SIX	6.0
+#define FMT	"g"
+
+#elif defined(_ARCH_PPC64)
+#define TYPE	long
+#define STYPE	"long"
+#define ZERO	0L
+#define ONE	1L
+#define TWO	2L
+#define THREE	3L
+#define FOUR	4L
+#define FIVE	5L
+#define SIX	6L
+#define FMT	"ld"
+
+#else
+#define TYPE	long long
+#define STYPE	"long long"
+#define ZERO	0LL
+#define ONE	1LL
+#define TWO	2LL
+#define THREE	3LL
+#define FOUR	4LL
+#define FIVE	5LL
+#define SIX	6LL
+#define FMT	"lld"
+#endif
+
+/* Macros to order the left/right values correctly.  Note, -maltivec=be does
+   not change the order for static initializations, so we have to handle it
+   specially.  */
+
+#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
+#define INIT_ORDER(A, B)	(TYPE) A, (TYPE) B
+#define ELEMENT_ORDER(A, B)	(TYPE) A, (TYPE) B
+#define ENDIAN			"-mbig"
+
+#elif __VEC_ELEMENT_REG_ORDER__ == __ORDER_BIG_ENDIAN__
+#define NO_ARRAY
+#define INIT_ORDER(A, B)	(TYPE) B, (TYPE) A
+#define ELEMENT_ORDER(A, B)	(TYPE) A, (TYPE) B
+#define ENDIAN			"-mlittle -maltivec=be"
+
+#else
+#define INIT_ORDER(A, B)	(TYPE) B, (TYPE) A
+#define ELEMENT_ORDER(A, B)	(TYPE) B, (TYPE) A
+#define ENDIAN			"-mlittle"
+#endif
+
+static volatile TYPE		five	= FIVE;
+static volatile TYPE		six	= SIX;
+static volatile vector TYPE	s_v12 = { ONE,   TWO };
+static volatile vector TYPE	g_v34 = { THREE, FOUR };
+
+
+__attribute__((__noinline__))
+static void
+vector_check (vector TYPE v, TYPE expect_hi, TYPE expect_lo)
+{
+  TYPE actual_hi, actual_lo;
+#ifdef DEBUG
+  const char *pass_fail;
+#endif
+
+  __asm__ ("xxlor %x0,%x1,%x1"		: "=&wa" (actual_hi) : "wa" (v));
+  __asm__ ("xxpermdi %x0,%x1,%x1,3"	: "=&wa" (actual_lo) : "wa" (v));
+
+#ifdef DEBUG
+  if ((actual_hi == expect_hi) && (actual_lo == expect_lo))
+    pass_fail = ", pass";
+  else
+    {
+      pass_fail = ", fail";
+      errors++;
+    }
+
+  printf ("Expected %" FMT ", %" FMT ", got %" FMT ", %" FMT "%s\n",
+	  expect_hi, expect_lo,
+	  actual_hi, actual_lo,
+	  pass_fail);
+#else
+  if ((actual_hi != expect_hi) || (actual_lo != expect_lo))
+    abort ();
+#endif
+}
+
+__attribute__((__noinline__))
+static vector TYPE
+combine (TYPE op0, TYPE op1)
+{
+  return (vector TYPE) { op0, op1 };
+}
+
+__attribute__((__noinline__))
+static vector TYPE
+combine_insert (TYPE op0, TYPE op1)
+{
+  vector TYPE ret = (vector TYPE) { ZERO, ZERO };
+  ret = vec_insert (op0, ret, 0);
+  ret = vec_insert (op1, ret, 1);
+  return ret;
+}
+
+__attribute__((__noinline__))
+static vector TYPE
+concat_extract_00 (vector TYPE a, vector TYPE b)
+{
+  return (vector TYPE) { vec_extract (a, 0), vec_extract (b, 0) };
+}
+
+__attribute__((__noinline__))
+static vector TYPE
+concat_extract_01 (vector TYPE a, vector TYPE b)
+{
+  return (vector TYPE) { vec_extract (a, 0), vec_extract (b, 1) };
+}
+
+__attribute__((__noinline__))
+static vector TYPE
+concat_extract_10 (vector TYPE a, vector TYPE b)
+{
+  return (vector TYPE) { vec_extract (a, 1), vec_extract (b, 0) };
+}
+
+__attribute__((__noinline__))
+static vector TYPE
+concat_extract_11 (vector TYPE a, vector TYPE b)
+{
+  return (vector TYPE) { vec_extract (a, 1), vec_extract (b, 1) };
+}
+
+__attribute__((__noinline__))
+static vector TYPE
+concat_extract2_0s (vector TYPE a, TYPE b)
+{
+  return (vector TYPE) { vec_extract (a, 0), b };
+}
+
+__attribute__((__noinline__))
+static vector TYPE
+concat_extract2_1s (vector TYPE a, TYPE b)
+{
+  return (vector TYPE) { vec_extract (a, 1), b };
+}
+
+__attribute__((__noinline__))
+static vector TYPE
+concat_extract2_s0 (TYPE a, vector TYPE b)
+{
+  return (vector TYPE) { a, vec_extract (b, 0) };
+}
+
+__attribute__((__noinline__))
+static vector TYPE
+concat_extract2_s1 (TYPE a, vector TYPE b)
+{
+  return (vector TYPE) { a, vec_extract (b, 1) };
+}
+
+__attribute__((__noinline__))
+static vector TYPE
+concat_extract_nn (vector TYPE a, vector TYPE b, size_t i, size_t j)
+{
+  return (vector TYPE) { vec_extract (a, i), vec_extract (b, j) };
+}
+
+#ifndef NO_ARRAY
+__attribute__((__noinline__))
+static vector TYPE
+array_0 (vector TYPE v, TYPE a)
+{
+  v[0] = a;
+  return v;
+}
+
+__attribute__((__noinline__))
+static vector TYPE
+array_1 (vector TYPE v, TYPE a)
+{
+  v[1] = a;
+  return v;
+}
+
+__attribute__((__noinline__))
+static vector TYPE
+array_01 (vector TYPE v, TYPE a, TYPE b)
+{
+  v[0] = a;
+  v[1] = b;
+  return v;
+}
+
+__attribute__((__noinline__))
+static vector TYPE
+array_01b (TYPE a, TYPE b)
+{
+  vector TYPE v = (vector TYPE) { 0, 0 };
+  v[0] = a;
+  v[1] = b;
+  return v;
+}
+#endif
+
+int
+main (void)
+{
+  vector TYPE a = (vector TYPE) { ONE,   TWO  };
+  vector TYPE b = (vector TYPE) { THREE, FOUR };
+  size_t i, j;
+
+#ifndef NO_ARRAY
+  vector TYPE z = (vector TYPE) { ZERO,  ZERO };
+#endif
+
+  DEBUG2 ("Endian: %s, type: %s\n", ENDIAN, STYPE);
+  DEBUG0 ("\nStatic/global initialization\n");
+  vector_check (s_v12, INIT_ORDER (1, 2));
+  vector_check (g_v34, INIT_ORDER (3, 4));
+
+  DEBUG0 ("\nVector via constant runtime initialization\n");
+  vector_check (a, INIT_ORDER (1, 2));
+  vector_check (b, INIT_ORDER (3, 4));
+
+  DEBUG0 ("\nCombine scalars using vector initialization\n");
+  vector_check (combine (1, 2), INIT_ORDER (1, 2));
+  vector_check (combine (3, 4), INIT_ORDER (3, 4));
+
+  DEBUG0 ("\nSetup with vec_insert\n");
+  a = combine_insert (1, 2);
+  b = combine_insert (3, 4);
+  vector_check (a, ELEMENT_ORDER (1, 2));
+  vector_check (b, ELEMENT_ORDER (3, 4));
+
+#ifndef NO_ARRAY
+  DEBUG0 ("\nTesting array syntax\n");
+  vector_check (array_0   (a, FIVE),      ELEMENT_ORDER (5, 2));
+  vector_check (array_1   (b, SIX),       ELEMENT_ORDER (3, 6));
+  vector_check (array_01  (z, FIVE, SIX), ELEMENT_ORDER (5, 6));
+  vector_check (array_01b (FIVE, SIX),    ELEMENT_ORDER (5, 6));
+
+  vector_check (array_0   (a, five),      ELEMENT_ORDER (5, 2));
+  vector_check (array_1   (b, six),       ELEMENT_ORDER (3, 6));
+  vector_check (array_01  (z, five, six), ELEMENT_ORDER (5, 6));
+  vector_check (array_01b (five, six),    ELEMENT_ORDER (5, 6));
+#else
+  DEBUG0 ("\nSkipping array syntax on -maltivec=be\n");
+#endif
+
+  DEBUG0 ("\nTesting concat and extract\n");
+  vector_check (concat_extract_00 (a, b), INIT_ORDER (1, 3));
+  vector_check (concat_extract_01 (a, b), INIT_ORDER (1, 4));
+  vector_check (concat_extract_10 (a, b), INIT_ORDER (2, 3));
+  vector_check (concat_extract_11 (a, b), INIT_ORDER (2, 4));
+
+  DEBUG0 ("\nTesting concat and extract #2\n");
+  vector_check (concat_extract2_0s (a, FIVE), INIT_ORDER (1, 5));
+  vector_check (concat_extract2_1s (a, FIVE), INIT_ORDER (2, 5));
+  vector_check (concat_extract2_s0 (SIX, a),  INIT_ORDER (6, 1));
+  vector_check (concat_extract2_s1 (SIX, a),  INIT_ORDER (6, 2));
+
+  DEBUG0 ("\nTesting variable concat and extract\n");
+  for (i = 0; i < 2; i++)
+    {
+      for (j = 0; j < 2; j++)
+	{
+	  static struct {
+	    TYPE hi;
+	    TYPE lo;
+	  } hilo[2][2] =
+	      { { { ONE, THREE }, { ONE, FOUR } },
+		{ { TWO, THREE }, { TWO, FOUR } } };
+
+	  vector_check (concat_extract_nn (a, b, i, j),
+			INIT_ORDER (hilo[i][j].hi, hilo[i][j].lo));
+	}
+    }
+
+  DEBUG0 ("\nTesting separate function\n");
+  vector_check (combine (vec_extract (a, 0), vec_extract (b, 0)),
+		INIT_ORDER (1, 3));
+
+  vector_check (combine (vec_extract (a, 0), vec_extract (b, 1)),
+		INIT_ORDER (1, 4));
+
+  vector_check (combine (vec_extract (a, 1), vec_extract (b, 0)),
+		INIT_ORDER (2, 3));
+
+  vector_check (combine (vec_extract (a, 1), vec_extract (b, 1)),
+		INIT_ORDER (2, 4));
+
+  vector_check (combine_insert (vec_extract (a, 0), vec_extract (b, 0)),
+		ELEMENT_ORDER (1, 3));
+
+  vector_check (combine_insert (vec_extract (a, 0), vec_extract (b, 1)),
+		ELEMENT_ORDER (1, 4));
+
+  vector_check (combine_insert (vec_extract (a, 1), vec_extract (b, 0)),
+		ELEMENT_ORDER (2, 3));
+
+  vector_check (combine_insert (vec_extract (a, 1), vec_extract (b, 1)),
+		ELEMENT_ORDER (2, 4));
+
+
+#if defined(DO_DOUBLE)
+  DEBUG0 ("\nTesting explicit 2df concat\n");
+  vector_check (__builtin_vsx_concat_2df (FIVE, SIX), INIT_ORDER (5, 6));
+  vector_check (__builtin_vsx_concat_2df (five, six), INIT_ORDER (5, 6));
+
+#elif defined(_ARCH_PPC64)
+  DEBUG0 ("\nTesting explicit 2di concat\n");
+  vector_check (__builtin_vsx_concat_2di (FIVE, SIX), INIT_ORDER (5, 6));
+  vector_check (__builtin_vsx_concat_2di (five, six), INIT_ORDER (5, 6));
+
+#else
+  DEBUG0 ("\nSkip explicit 2di concat on 32-bit\n");
+#endif
+
+#ifdef DEBUG
+  if (errors)
+    printf ("\n%d error%s found\n", errors, (errors == 1) ? "" : "s");
+  else
+    printf ("\nNo errors were found.\n");
+
+  return errors;
+
+#else
+  return 0;
+#endif
+}
Index: gcc/testsuite/gcc.target/powerpc/vec-setup-be-double.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/vec-setup-be-double.c	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc)	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/vec-setup-be-double.c	(.../gcc/testsuite/gcc.target/powerpc)	(revision 250878)
@@ -0,0 +1,12 @@
+/* { dg-do run { target { powerpc64le*-*-linux* } } } */
+/* { dg-require-effective-target vsx_hw } */
+/* { dg-options "-O2 -mvsx -maltivec=be" } */
+
+/* Test various ways of creating vectors with 2 double words and accessing the
+   elements.  This test uses the double datatype.
+
+   This test explicitly tests -maltivec=be to make sure things are correct.  */
+
+#define DO_DOUBLE
+
+#include "vec-setup.h"
Index: gcc/testsuite/gcc.target/powerpc/vec-setup-double.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/vec-setup-double.c	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc)	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/vec-setup-double.c	(.../gcc/testsuite/gcc.target/powerpc)	(revision 250878)
@@ -0,0 +1,11 @@
+/* { dg-do run { target { powerpc*-*-linux* } } } */
+/* { dg-require-effective-target vsx_hw } */
+/* { dg-options "-O2 -mvsx" } */
+
+/* Test various ways of creating vectors with 2 double words and accessing the
+   elements.  This test uses the double datatype and the default endian
+   order.  */
+
+#define DO_DOUBLE
+
+#include "vec-setup.h"
Index: gcc/testsuite/gcc.target/powerpc/vec-setup-long.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/vec-setup-long.c	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc)	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/vec-setup-long.c	(.../gcc/testsuite/gcc.target/powerpc)	(revision 250878)
@@ -0,0 +1,9 @@
+/* { dg-do run { target { powerpc*-*-linux* } } } */
+/* { dg-require-effective-target vsx_hw } */
+/* { dg-options "-O2 -mvsx" } */
+
+/* Test various ways of creating vectors with 2 double words and accessing the
+   elements.  This test uses the long (on 64-bit systems) or long long datatype
+   (on 32-bit systems).  The default endian order is used.  */
+
+#include "vec-setup.h"
Index: gcc/testsuite/gcc.target/powerpc/vsx-extract-6.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/vsx-extract-6.c	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc)	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/vsx-extract-6.c	(.../gcc/testsuite/gcc.target/powerpc)	(revision 250858)
@@ -0,0 +1,25 @@
+/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-O2 -mvsx" } */
+
+vector unsigned long
+test_vpasted (vector unsigned long high, vector unsigned long low)
+{
+  vector unsigned long res;
+  res[1] = high[1];
+  res[0] = low[0];
+  return res;
+}
+
+/* { dg-final { scan-assembler-times {\mxxpermdi\M} 1    } } */
+/* { dg-final { scan-assembler-not   {\mvspltisw\M}      } } */
+/* { dg-final { scan-assembler-not   {\mxxlor\M}         } } */
+/* { dg-final { scan-assembler-not   {\mxxlxor\M}        } } */
+/* { dg-final { scan-assembler-not   {\mxxspltib\M}      } } */
+/* { dg-final { scan-assembler-not   {\mlxvx?\M}         } } */
+/* { dg-final { scan-assembler-not   {\mlxv[dw][24]x\M}  } } */
+/* { dg-final { scan-assembler-not   {\mlvx\M}           } } */
+/* { dg-final { scan-assembler-not   {\mstxvx?\M}        } } */
+/* { dg-final { scan-assembler-not   {\mstxv[dw][24]x\M} } } */
+/* { dg-final { scan-assembler-not   {\mstvx\M}          } } */
Index: gcc/testsuite/gcc.target/powerpc/vsx-extract-7.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/vsx-extract-7.c	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc)	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/vsx-extract-7.c	(.../gcc/testsuite/gcc.target/powerpc)	(revision 250858)
@@ -0,0 +1,25 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-O2 -mvsx" } */
+
+vector double
+test_vpasted (vector double high, vector double low)
+{
+  vector double res;
+  res[1] = high[1];
+  res[0] = low[0];
+  return res;
+}
+
+/* { dg-final { scan-assembler-times {\mxxpermdi\M} 1    } } */
+/* { dg-final { scan-assembler-not   {\mvspltisw\M}      } } */
+/* { dg-final { scan-assembler-not   {\mxxlor\M}         } } */
+/* { dg-final { scan-assembler-not   {\mxxlxor\M}        } } */
+/* { dg-final { scan-assembler-not   {\mxxspltib\M}      } } */
+/* { dg-final { scan-assembler-not   {\mlxvx?\M}         } } */
+/* { dg-final { scan-assembler-not   {\mlxv[dw][24]x\M}  } } */
+/* { dg-final { scan-assembler-not   {\mlvx\M}           } } */
+/* { dg-final { scan-assembler-not   {\mstxvx?\M}        } } */
+/* { dg-final { scan-assembler-not   {\mstxv[dw][24]x\M} } } */
+/* { dg-final { scan-assembler-not   {\mstvx\M}          } } */