From patchwork Thu Sep 12 21:31:20 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Pengxuan Zheng <quic_pzheng@quicinc.com>
X-Patchwork-Id: 1984895
From: Pengxuan Zheng <quic_pzheng@quicinc.com>
To: <gcc-patches@gcc.gnu.org>
CC: Pengxuan Zheng <quic_pzheng@quicinc.com>
Subject: [PATCH v2 2/2] aarch64: Improve part-variable vector initialization
 with SVE INDEX instruction [PR113328]
Date: Thu, 12 Sep 2024 14:31:20 -0700
Message-ID: <20240912213120.17158-1-quic_pzheng@quicinc.com>
X-Mailer: git-send-email 2.17.1

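SVE's INDEX instruction can still be used to construct a vector even when not
all of its elements are constants. For example, { 0, x, 2, 3 } can be
constructed by first using "INDEX #0, #1" to generate { 0, 1, 2, 3 } and then
setting the non-constant elements separately. This applies when at least half
of the elements are constants and those constants form a linear sequence whose
base and step are both in the range [-16, 15].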

	PR target/113328

gcc/ChangeLog:

	* config/aarch64/aarch64.cc (aarch64_expand_vector_init_fallback):
	Improve part-variable vector generation with SVE's INDEX if TARGET_SVE
	is available.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/sve/acle/general/dupq_1.c: Update test to use
	check-function-bodies.
	* gcc.target/aarch64/sve/acle/general/dupq_2.c: Likewise.
	* gcc.target/aarch64/sve/acle/general/dupq_3.c: Likewise.
	* gcc.target/aarch64/sve/acle/general/dupq_4.c: Likewise.
	* gcc.target/aarch64/sve/vec_init_4.c: New test.
	* gcc.target/aarch64/sve/vec_init_5.c: New test.

Signed-off-by: Pengxuan Zheng <quic_pzheng@quicinc.com>
---
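Note for reviewers: the base/step derivation described above can be sketched
as a standalone C function. This is only an illustration under stated
assumptions, not the code in the patch: `struct elt`, `in_index_imm_range`
and `find_index_seq` are hypothetical names that stand in for the RTL
handling, and the sketch assumes element values are small enough that the
arithmetic does not overflow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one vector element: either a known
   constant or a variable whose VALUE is ignored.  */
struct elt
{
  bool is_const;
  long long value;
};

/* The patch accepts only bases and steps in [-16, 15].  */
static bool
in_index_imm_range (long long v)
{
  return v >= -16 && v <= 15;
}

/* Return true and set *BASE and *STEP if at most half of the N elements
   are variable and every constant element I equals *BASE + I * *STEP,
   with both values in immediate range.  */
bool
find_index_seq (const struct elt *elts, int n, long long *base,
		long long *step)
{
  assert (elts != NULL && n > 0);

  /* Count variables and locate the first two constant elements.  */
  int n_var = 0, first = -1, second = -1;
  for (int i = 0; i < n; i++)
    {
      if (!elts[i].is_const)
	n_var++;
      else if (first < 0)
	first = i;
      else if (second < 0)
	second = i;
    }
  if (n_var > n / 2 || second < 0)
    return false;

  /* The step must be an integer: the value difference has to be a
     multiple of the index distance.  */
  long long diff = elts[second].value - elts[first].value;
  int dist = second - first;
  if (diff % dist != 0)
    return false;

  long long s = diff / dist;
  long long b = elts[first].value - s * first;
  if (!in_index_imm_range (s) || !in_index_imm_range (b))
    return false;

  /* Every constant element must lie on the line B + I * S.  */
  for (int i = 0; i < n; i++)
    if (elts[i].is_const && elts[i].value != b + s * i)
      return false;

  *base = b;
  *step = s;
  return true;
}
```

Applied to the new tests, { 3, x, y, -9 } gives base 3 and step -4, and
{ x, 6, 8, y, 12, 14, 16, z } gives base 4 and step 2, matching the INDEX
operands expected in vec_init_4.c. For { 1, x, y, 3 } the difference 2 is not
a multiple of the index distance 3, so no INDEX is used (vec_init_5.c).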
 gcc/config/aarch64/aarch64.cc                 | 81 ++++++++++++++++++-
 .../aarch64/sve/acle/general/dupq_1.c         | 18 ++++-
 .../aarch64/sve/acle/general/dupq_2.c         | 18 ++++-
 .../aarch64/sve/acle/general/dupq_3.c         | 18 ++++-
 .../aarch64/sve/acle/general/dupq_4.c         | 18 ++++-
 .../gcc.target/aarch64/sve/vec_init_4.c       | 47 +++++++++++
 .../gcc.target/aarch64/sve/vec_init_5.c       | 12 +++
 7 files changed, 199 insertions(+), 13 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/vec_init_4.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/vec_init_5.c

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 6b3ca57d0eb..7305a5c6375 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -23942,12 +23942,91 @@ aarch64_expand_vector_init_fallback (rtx target, rtx vals)
   if (n_var != n_elts)
     {
       rtx copy = copy_rtx (vals);
+      bool is_index_seq = false;
+
+      /* If at least half of the elements of the vector are constants and all
+	 these constant elements form a linear sequence of the form { B, B + S,
+	 B + 2 * S, B + 3 * S, ... }, we can generate the vector with SVE's
+	 INDEX instruction if SVE is available and then set the elements which
+	 are not constant separately.  More precisely, each constant element I
+	 must equal B + I * S, where B and S are valid immediate operands for
+	 an SVE INDEX instruction.
+
+	 For example, { X, 1, 2, 3 } satisfies these conditions: we can first
+	 generate a vector of all constants (i.e., { 0, 1, 2, 3 }) and then
+	 set the first element of the vector to X.  */
+
+      if (TARGET_SVE && GET_MODE_CLASS (mode) == MODE_VECTOR_INT
+	  && n_var <= n_elts / 2)
+	{
+	  int const_idx = -1;
+	  HOST_WIDE_INT const_val = 0;
+	  int base = 16;
+	  int step = 16;
+
+	  for (int i = 0; i < n_elts; ++i)
+	    {
+	      rtx x = XVECEXP (vals, 0, i);
+
+	      if (!CONST_INT_P (x))
+		continue;
+
+	      if (const_idx == -1)
+		{
+		  const_idx = i;
+		  const_val = INTVAL (x);
+		}
+	      else
+		{
+		  if ((INTVAL (x) - const_val) % (i - const_idx) == 0)
+		    {
+		      HOST_WIDE_INT s
+			  = (INTVAL (x) - const_val) / (i - const_idx);
+		      if (s >= -16 && s <= 15)
+			{
+			  HOST_WIDE_INT b = const_val - s * const_idx;
+			  if (b >= -16 && b <= 15)
+			    {
+			      base = b;
+			      step = s;
+			    }
+			}
+		    }
+		  break;
+		}
+	    }
+
+	  if (base != 16
+	      && (!CONST_INT_P (v0)
+		  || INTVAL (v0) == base))
+	    {
+	      if (!CONST_INT_P (v0))
+		XVECEXP (copy, 0, 0) = GEN_INT (base);
+
+	      is_index_seq = true;
+	      for (int i = 1; i < n_elts; ++i)
+		{
+		  rtx x = XVECEXP (copy, 0, i);
+
+		  if (CONST_INT_P (x))
+		    {
+		      if (INTVAL (x) != base + i * step)
+			{
+			  is_index_seq = false;
+			  break;
+			}
+		    }
+		  else
+		    XVECEXP (copy, 0, i) = GEN_INT (base + i * step);
+		}
+	    }
+	}
 
       /* Load constant part of vector.  We really don't care what goes into the
 	 parts we will overwrite, but we're more likely to be able to load the
 	 constant efficiently if it has fewer, larger, repeating parts
 	 (see aarch64_simd_valid_immediate).  */
-      for (int i = 0; i < n_elts; i++)
+      for (int i = 0; !is_index_seq && i < n_elts; i++)
 	{
 	  rtx x = XVECEXP (vals, 0, i);
 	  if (CONST_INT_P (x) || CONST_DOUBLE_P (x))
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_1.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_1.c
index 0940bedd0dd..80eb1efdc66 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_1.c
@@ -1,15 +1,27 @@
 /* { dg-do compile } */
 /* { dg-options "-O2" } */
 /* { dg-require-effective-target aarch64_little_endian } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
 
 #include <arm_sve.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/*
+** dupq:
+**	index	z0\.s, #0, #1
+**	ins	v0\.s\[0\], w0
+**	dup	z0\.q, z0\.q\[0\]
+**	ret
+*/
 svint32_t
 dupq (int x)
 {
   return svdupq_s32 (x, 1, 2, 3);
 }
 
-/* { dg-final { scan-assembler {\tindex\tz[0-9]+\.s, #0, #1} } } */
-/* { dg-final { scan-assembler {\tins\tv[0-9]+\.s\[0\], w0\n} } } */
-/* { dg-final { scan-assembler {\tdup\tz[0-9]+\.q, z[0-9]+\.q\[0\]\n} } } */
+#ifdef __cplusplus
+}
+#endif
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_2.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_2.c
index 218a6601337..afcad0a691e 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_2.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_2.c
@@ -1,15 +1,27 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -mbig-endian" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
 
 /* To avoid needing big-endian header files.  */
 #pragma GCC aarch64 "arm_sve.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/*
+** dupq:
+**	index	z0\.s, #3, #-1
+**	ins	v0\.s\[0\], w0
+**	dup	z0\.q, z0\.q\[0\]
+**	ret
+*/
 svint32_t
 dupq (int x)
 {
   return svdupq_s32 (x, 1, 2, 3);
 }
 
-/* { dg-final { scan-assembler {\tindex\tz[0-9]+\.s, #3, #-1} } } */
-/* { dg-final { scan-assembler {\tins\tv[0-9]+\.s\[0\], w0\n} } } */
-/* { dg-final { scan-assembler {\tdup\tz[0-9]+\.q, z[0-9]+\.q\[0\]\n} } } */
+#ifdef __cplusplus
+}
+#endif
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_3.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_3.c
index 245d43b75b5..f912f4b905c 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_3.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_3.c
@@ -1,15 +1,27 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -mlittle-endian" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
 
 /* To avoid needing big-endian header files.  */
 #pragma GCC aarch64 "arm_sve.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/*
+** dupq:
+**	index	z0\.s, #0, #1
+**	ins	v0\.s\[2\], w0
+**	dup	z0\.q, z0\.q\[0\]
+**	ret
+*/
 svint32_t
 dupq (int x)
 {
   return svdupq_s32 (0, 1, x, 3);
 }
 
-/* { dg-final { scan-assembler {\tindex\tz[0-9]+\.s, #0, #1} } } */
-/* { dg-final { scan-assembler {\tins\tv[0-9]+\.s\[2\], w0\n} } } */
-/* { dg-final { scan-assembler {\tdup\tz[0-9]+\.q, z[0-9]+\.q\[0\]\n} } } */
+#ifdef __cplusplus
+}
+#endif
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_4.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_4.c
index cbee6f27b62..0cfdb23101b 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_4.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_4.c
@@ -1,15 +1,27 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -mbig-endian" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
 
 /* To avoid needing big-endian header files.  */
 #pragma GCC aarch64 "arm_sve.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/*
+** dupq:
+**	index	z0\.s, #3, #-1
+**	ins	v0\.s\[2\], w0
+**	dup	z0\.q, z0\.q\[0\]
+**	ret
+*/
 svint32_t
 dupq (int x)
 {
   return svdupq_s32 (0, 1, x, 3);
 }
 
-/* { dg-final { scan-assembler {\tindex\tz[0-9]+\.s, #3, #-1} } } */
-/* { dg-final { scan-assembler {\tins\tv[0-9]+\.s\[2\], w0\n} } } */
-/* { dg-final { scan-assembler {\tdup\tz[0-9]+\.q, z[0-9]+\.q\[0\]\n} } } */
+#ifdef __cplusplus
+}
+#endif
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/vec_init_4.c b/gcc/testsuite/gcc.target/aarch64/sve/vec_init_4.c
new file mode 100644
index 00000000000..898168dc8ac
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/vec_init_4.c
@@ -0,0 +1,47 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+typedef short v8hi __attribute__((vector_size(16)));
+typedef int v4si __attribute__ ((vector_size (16)));
+typedef long v2di __attribute__((vector_size(16)));
+
+/*
+** f:
+**	index	z0\.s, #0, #1
+**	ins	v0\.s\[1\], w0
+**	ret
+*/
+v4si
+f (int x)
+{
+  return (v4si){ 0, x, 2, 3 };
+}
+
+/*
+** f1:
+**	index	z0\.s, #3, #-4
+**	ins	v0\.s\[1\], w0
+**	ins	v0\.s\[2\], w1
+**	ret
+*/
+v4si
+f1 (int x, int y)
+{
+  return (v4si){ 3, x, y, -9 };
+}
+
+/*
+** f2:
+**	index	z0\.h, #4, #2
+**	ins	v0\.h\[0\], w0
+**	ins	v0\.h\[3\], w1
+**	ins	v0\.h\[7\], w2
+**	ret
+*/
+v8hi
+f2 (short x, short y, short z)
+{
+  return (v8hi){ x, 6, 8, y, 12, 14, 16, z };
+}
+
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/vec_init_5.c b/gcc/testsuite/gcc.target/aarch64/sve/vec_init_5.c
new file mode 100644
index 00000000000..e4a71736f5f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/vec_init_5.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+typedef int v4si __attribute__ ((vector_size (16)));
+
+v4si
+f (int x, int y)
+{
+  return (v4si){ 1, x, y, 3 };
+}
+
+/* { dg-final { scan-assembler-not {index} } } */