From patchwork Fri Apr 21 08:39:29 2017
X-Patchwork-Submitter: Kyrill Tkachov
X-Patchwork-Id: 753205
Message-ID: <58F9C541.2090900@foss.arm.com>
Date: Fri, 21 Apr 2017 09:39:29 +0100
From: Kyrill Tkachov
To: GCC Patches
CC: Marcus Shawcroft, Richard Earnshaw, James Greenhalgh
Subject: [PATCH][AArch64] Add combine pattern for storing lane zero of a vector

Hi all,

Consider the code:

typedef long long v2di __attribute__ ((vector_size (16)));

void
store_laned (v2di x, long long *y)
{
  y[0] = x[1];
  y[3] = x[0];
}

AArch64 GCC will generate:

store_laned:
        umov    x1, v0.d[0]
        st1     {v0.d}[1], [x0]
        str     x1, [x0, 24]
        ret

It moves the zero lane into a core register and does a scalar store, when instead it could have used a scalar FP store that supports the required addressing mode:

store_laned:
        st1     {v0.d}[1], [x0]
        str     d0, [x0, 24]
        ret

Combine already tries to match this pattern:

Trying 10 -> 11:
Failed to match this instruction:
(set (mem:DI (plus:DI (reg/v/f:DI 76 [ y ])
            (const_int 24 [0x18])) [1 MEM[(long long int *)y_4(D) + 24B]+0 S8 A64])
    (vec_select:DI (reg/v:V2DI 75 [ x ])
        (parallel [
                (const_int 0 [0])
            ])))

but we don't match it in the backend. It is not hard to add, so this patch does that for all the relevant vector modes.
With this patch we generate the second sequence above, and in SPEC2006 we eliminate some address computation instructions, because we can use the more expressive STR instead of ST1, or we eliminate moves to the integer registers, because we can just store the D-reg directly.

Bootstrapped and tested on aarch64-none-linux-gnu.

Ok for trunk?

Thanks,
Kyrill

2017-04-21  Kyrylo Tkachov

    * config/aarch64/aarch64-simd.md (aarch64_store_lane0<mode>): New
    pattern.

2017-04-21  Kyrylo Tkachov

    * gcc.target/aarch64/store_lane0_str_1.c: New test.

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index f36665e27bb66e0f1fb42443ce7b506bd2bf6914..bf13f0753a856a13ae92ceeb44291df3dc379a13 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -153,6 +153,19 @@ (define_insn "*aarch64_simd_mov<mode>"
    (set_attr "length" "4,4,4,8,8,8,4")]
 )
 
+;; When storing lane zero we can use the normal STR and its more permissive
+;; addressing modes.
+
+(define_insn "aarch64_store_lane0<mode>"
+  [(set (match_operand:<VEL> 0 "memory_operand" "=m")
+	(vec_select:<VEL> (match_operand:VALL_F16 1 "register_operand" "w")
+			(parallel [(match_operand 2 "const_int_operand" "n")])))]
+  "TARGET_SIMD
+   && ENDIAN_LANE_N (<MODE>mode, INTVAL (operands[2])) == 0"
+  "str\\t%<Vetype>1, %0"
+  [(set_attr "type" "neon_store1_1reg<q>")]
+)
+
 (define_insn "load_pair<mode>"
   [(set (match_operand:VD 0 "register_operand" "=w")
 	(match_operand:VD 1 "aarch64_mem_pair_operand" "Ump"))
diff --git a/gcc/testsuite/gcc.target/aarch64/store_lane0_str_1.c b/gcc/testsuite/gcc.target/aarch64/store_lane0_str_1.c
new file mode 100644
index 0000000000000000000000000000000000000000..4464fec2c1f24c212be4fc6c94b509843fd0058e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/store_lane0_str_1.c
@@ -0,0 +1,54 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+typedef int v2si __attribute__ ((vector_size (8)));
+typedef float v2sf __attribute__ ((vector_size (8)));
+typedef short v4hi __attribute__ ((vector_size (8)));
+typedef __fp16 v4hf __attribute__ ((vector_size (8)));
+typedef char v8qi __attribute__ ((vector_size (8)));
+
+typedef int v4si __attribute__ ((vector_size (16)));
+typedef float v4sf __attribute__ ((vector_size (16)));
+typedef short v8hi __attribute__ ((vector_size (16)));
+typedef __fp16 v8hf __attribute__ ((vector_size (16)));
+typedef char v16qi __attribute__ ((vector_size (16)));
+typedef long long v2di __attribute__ ((vector_size (16)));
+typedef double v2df __attribute__ ((vector_size (16)));
+
+#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
+#define LANE(N) (N - 1)
+#else
+#define LANE(N) 0
+#endif
+
+#define FUNC(T, E, N)				\
+void						\
+store_lane_##T (T x, E *y)			\
+{						\
+  y[0] = x[N - 1 - LANE (N)];			\
+  y[3] = x[LANE (N)];				\
+}
+
+FUNC (v2si, int, 2)
+FUNC (v2sf, float, 2)
+FUNC (v4hi, short, 4)
+FUNC (v4hf, __fp16, 4)
+FUNC (v8qi, char, 8)
+
+FUNC (v4si, int, 4)
+FUNC (v4sf, float, 4)
+FUNC (v8hi, short, 8)
+FUNC (v8hf, __fp16, 8)
+FUNC (v16qi, char, 16)
+FUNC (v2di, long long, 2)
+FUNC (v2df, double, 2)
+
+/* When storing lane zero of a vector we can use the scalar STR instruction
+   that supports more addressing modes.  */
+
+/* { dg-final { scan-assembler-times "str\ts\[0-9\]+" 4 } } */
+/* { dg-final { scan-assembler-times "str\tb\[0-9\]+" 2 } } */
+/* { dg-final { scan-assembler-times "str\th\[0-9\]+" 4 } } */
+/* { dg-final { scan-assembler-times "str\td\[0-9\]+" 2 } } */
+/* { dg-final { scan-assembler-not "umov" } } */
+/* { dg-final { scan-assembler-not "dup" } } */