From patchwork Mon Aug  4 13:32:26 2014
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Alan Lawrence <alan.lawrence@arm.com>
X-Patchwork-Id: 376312
Return-Path: 
 <gcc-patches-return-374049-incoming=patchwork.ozlabs.org@gcc.gnu.org>
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Received: from sourceware.org (server1.sourceware.org [209.132.180.131])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256
	bits)) (No client certificate requested)
	by ozlabs.org (Postfix) with ESMTPS id 755EB14007B
	for <incoming@patchwork.ozlabs.org>;
	Mon,  4 Aug 2014 23:32:40 +1000 (EST)
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id
	:list-unsubscribe:list-archive:list-post:list-help:sender
	:message-id:date:from:mime-version:to:subject:content-type; q=
	dns; s=default; b=cIfCkuQDnLi50Fv7JfX+ZMKXLkzr84a0Dj/XRoi8qhq/nd
	LrrkPhsQEYW5I7ix78ZbIfgYb3XxAqmYMyBJakfVJ6CuNBUxYIkcN0yRbiL6IvjS
	A+xsgXINkgxIXOkUjbkx94YmcrQN3/2fSdrifRR8J/gNWorLqo0PxFxC8YT5w=
DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id
	:list-unsubscribe:list-archive:list-post:list-help:sender
	:message-id:date:from:mime-version:to:subject:content-type; s=
	default; bh=VH7ZR74+Y3vzdujm04nIC965BlA=; b=v2GsZdKPo0Ss3HSobJrC
	gSOM21ZSq+r5TxcLSTZWw+ITM/SQnfisTwa1HmAyi/5ZWUwWAWJjZHONkF4gquii
	TYsxxvGldMhajYE+btDYh0k96RBGhbprZhmEAj8PamPgtycWUmYOJBd7YzTUoZWX
	Q8hy6oSiKD4IJI0dO1nb7/Y=
Received: (qmail 8369 invoked by alias); 4 Aug 2014 13:32:33 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Unsubscribe: 
 <mailto:gcc-patches-unsubscribe-incoming=patchwork.ozlabs.org@gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Delivered-To: mailing list gcc-patches@gcc.gnu.org
Received: (qmail 8354 invoked by uid 89); 4 Aug 2014 13:32:32 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=3.0 required=5.0 tests=AWL, BAYES_00, SPF_PASS,
	ZIP_ATTACHED autolearn=no version=3.3.2
X-HELO: service87.mimecast.com
Received: from service87.mimecast.com (HELO service87.mimecast.com)
	(91.220.42.44) by sourceware.org
	(qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP;
	Mon, 04 Aug 2014 13:32:31 +0000
Received: from cam-owa1.Emea.Arm.com (fw-tnat.cambridge.arm.com
	[217.140.96.21]) by service87.mimecast.com;
	Mon, 04 Aug 2014 14:32:28 +0100
Received: from [10.1.209.51] ([10.1.255.212]) by cam-owa1.Emea.Arm.com with
	Microsoft SMTPSVC(6.0.3790.3959); Mon, 4 Aug 2014 14:32:27 +0100
Message-ID: <53DF8B6A.6040504@arm.com>
Date: Mon, 04 Aug 2014 14:32:26 +0100
From: Alan Lawrence <alan.lawrence@arm.com>
User-Agent: Thunderbird 2.0.0.24 (X11/20101213)
MIME-Version: 1.0
To: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
Subject: [PATCH AArch64] Prefer dup to zip for vec_perm_const;
	enable dup for bigendian; add testcase.
X-MC-Unique: 114080414322800301
X-IsSubscribed: yes

At the moment, for two-element vectors, __builtin_shuffle (vector, (mask) {C,
C}) with identical constants C is expanded to a zip (with the same vector as
both arguments) rather than a dup. A dup is more obvious and easier to read, so
prefer it.
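
To illustrate why both expanders can match the same selector, here is a minimal
stand-alone model of the index tests. It is a sketch, not the GCC code: the
names is_dup and is_self_zip are hypothetical, and it assumes a one-operand
permutation whose indices have already been reduced to the range 0..nelt-1.

```c
#include <stdbool.h>

/* Hypothetical model, not GCC code: true if every selector index names
   the same lane, i.e. the permutation broadcasts one element (a dup).  */
static bool
is_dup (const unsigned *perm, unsigned nelt)
{
  for (unsigned i = 1; i < nelt; i++)
    if (perm[i] != perm[0])
      return false;
  return true;
}

/* Hypothetical model, not GCC code: true if perm is zip1 or zip2 of a
   vector with itself.  A two-operand zip interleaves lanes
   { base, base + nelt, base + 1, base + 1 + nelt, ... }; with both
   operands equal this collapses to { base, base, base + 1, ... },
   where base is 0 for zip1 and nelt / 2 for zip2.  */
static bool
is_self_zip (const unsigned *perm, unsigned nelt)
{
  for (unsigned high = 0; high < 2; high++)
    {
      unsigned base = high ? nelt / 2 : 0;
      bool ok = true;
      for (unsigned i = 0; i < nelt; i++)
        if (perm[i] != base + i / 2)
          ok = false;
      if (ok)
        return true;
    }
  return false;
}
```

In this model the selector {1, 1} on a two-element vector satisfies both
predicates, so whichever check runs first decides the instruction. For four
elements, {0, 0, 1, 1} is only a zip and {2, 2, 2, 2} is only a dup.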

For big-endian, aarch64_evpc_dup currently always bails out (returns false);
however, tests demonstrate that it works correctly, so enable it.

Finally, add a testcase that checks execution results. This gives confidence
that evpc_dup is OK for big-endian (and yes, a different element index is
output than for little-endian). Note that existing tests for zip are not
affected, as their two arguments are always different.
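
As a rough sketch of why the printed lane differs between endiannesses: the
helper below is hypothetical (asm_lane is not a GCC function) and assumes that
the dup pattern mirrors the lane number as nelt - 1 - lane on big-endian
targets and leaves it unchanged on little-endian ones.

```c
#include <stdbool.h>

/* Hypothetical helper, not GCC code: map the lane number used by the
   permutation selector to the lane number printed in the assembly.
   Assumption: big-endian mirrors the lane within the vector.  */
static unsigned
asm_lane (unsigned nelt, unsigned lane, bool big_endian)
{
  return big_endian ? nelt - 1 - lane : lane;
}
```

Under that assumption, the selector {1, 1} in the new test would print lane 1
on little-endian and lane 0 on big-endian, which is why the scan-assembler
pattern accepts either index.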

gcc/ChangeLog:
	* config/aarch64/aarch64.c (aarch64_evpc_dup): Enable for big-endian.
	(aarch64_expand_vec_perm_const): Check for dup before zip.

gcc/testsuite/ChangeLog:
	* gcc.target/aarch64/vdup_n_2.c: New test.

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 4c65bb1dbc190165eee9dd2d9b54779ac4a362fa..153b1c3d282cbfb4872d2b267e763c9ec0ddeb90 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -9157,10 +9157,6 @@ aarch64_evpc_dup (struct expand_vec_perm_d *d)
   unsigned int i, elt, nelt = d->nelt;
   rtx lane;
 
-  /* TODO: This may not be big-endian safe.  */
-  if (BYTES_BIG_ENDIAN)
-    return false;
-
   elt = d->perm[0];
   for (i = 1; i < nelt; i++)
     {
@@ -9174,7 +9170,7 @@ aarch64_evpc_dup (struct expand_vec_perm_d *d)
      use d->op0 and need not do any extra arithmetic to get the
      correct lane number.  */
   in0 = d->op0;
-  lane = GEN_INT (elt);
+  lane = GEN_INT (elt); /* The pattern corrects for big-endian.  */
 
   switch (vmode)
     {
@@ -9255,14 +9251,14 @@ aarch64_expand_vec_perm_const_1 (struct expand_vec_perm_d *d)
 	return true;
       else if (aarch64_evpc_ext (d))
 	return true;
+      else if (aarch64_evpc_dup (d))
+	return true;
       else if (aarch64_evpc_zip (d))
 	return true;
       else if (aarch64_evpc_uzp (d))
 	return true;
       else if (aarch64_evpc_trn (d))
 	return true;
-      else if (aarch64_evpc_dup (d))
-	return true;
       return aarch64_evpc_tbl (d);
     }
   return false;
diff --git a/gcc/testsuite/gcc.target/aarch64/vdup_n_2.c b/gcc/testsuite/gcc.target/aarch64/vdup_n_2.c
new file mode 100644
index 0000000000000000000000000000000000000000..660fb0faeabcc632ae3edb1fb8fa9b96d57a4923
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vdup_n_2.c
@@ -0,0 +1,28 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -fno-inline --save-temps" } */
+
+extern void abort (void);
+
+typedef float float32x2_t __attribute__ ((__vector_size__ ((8))));
+typedef unsigned int uint32x2_t __attribute__ ((__vector_size__ ((8))));
+
+float32x2_t
+test_dup_1 (float32x2_t in)
+{
+  return __builtin_shuffle (in, (uint32x2_t) {1, 1});
+}
+
+int
+main (int argc, char **argv)
+{
+  float32x2_t test = {2.718, 3.141};
+  float32x2_t res = test_dup_1 (test);
+  if (res[0] != test[1] || res[1] != test[1])
+    abort ();
+  return 0;
+}
+
+/* { dg-final { scan-assembler-times "\[ \t\]*dup\[ \t\]+v\[0-9\]+\.2s, ?v\[0-9\]+\.s\\\[\[01\]\\\]" 1 } } */
+/* { dg-final { scan-assembler-not "zip" } } */
+/* { dg-final { cleanup-saved-temps } } */
+