From patchwork Thu Sep 12 21:31:20 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Pengxuan Zheng
X-Patchwork-Id: 1984895
From: Pengxuan Zheng
CC: Pengxuan Zheng
Subject: [PATCH v2 2/2] aarch64: Improve part-variable vector initialization with SVE INDEX instruction [PR113328]
Date: Thu, 12 Sep 2024 14:31:20 -0700
Message-ID: <20240912213120.17158-1-quic_pzheng@quicinc.com>
X-Mailer: git-send-email 2.17.1
List-Id: Gcc-patches mailing list

We can still use SVE's INDEX instruction to construct
vectors even if not all elements are constants.  For example, { 0, x, 2, 3 }
can be constructed by first using "INDEX #0, #1" to generate { 0, 1, 2, 3 },
and then setting the non-constant elements separately.

	PR target/113328

gcc/ChangeLog:

	* config/aarch64/aarch64.cc (aarch64_expand_vector_init_fallback):
	Improve part-variable vector generation with SVE's INDEX if TARGET_SVE
	is available.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/sve/acle/general/dupq_1.c: Update test to use
	check-function-bodies.
	* gcc.target/aarch64/sve/acle/general/dupq_2.c: Likewise.
	* gcc.target/aarch64/sve/acle/general/dupq_3.c: Likewise.
	* gcc.target/aarch64/sve/acle/general/dupq_4.c: Likewise.
	* gcc.target/aarch64/sve/vec_init_4.c: New test.
	* gcc.target/aarch64/sve/vec_init_5.c: New test.

Signed-off-by: Pengxuan Zheng
---
 gcc/config/aarch64/aarch64.cc                 | 81 ++++++++++++++++++-
 .../aarch64/sve/acle/general/dupq_1.c         | 18 ++++-
 .../aarch64/sve/acle/general/dupq_2.c         | 18 ++++-
 .../aarch64/sve/acle/general/dupq_3.c         | 18 ++++-
 .../aarch64/sve/acle/general/dupq_4.c         | 18 ++++-
 .../gcc.target/aarch64/sve/vec_init_4.c       | 47 +++++++++++
 .../gcc.target/aarch64/sve/vec_init_5.c       | 12 +++
 7 files changed, 199 insertions(+), 13 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/vec_init_4.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/vec_init_5.c

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 6b3ca57d0eb..7305a5c6375 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -23942,12 +23942,91 @@ aarch64_expand_vector_init_fallback (rtx target, rtx vals)
   if (n_var != n_elts)
     {
       rtx copy = copy_rtx (vals);
+      bool is_index_seq = false;
+
+      /* If at least half of the elements of the vector are constants and all
+         these constant elements form a linear sequence of the form { B, B + S,
+         B + 2 * S, B + 3 * S, ... }, we can generate the vector with SVE's
+         INDEX instruction if SVE is available and then set the elements which
+         are not constant separately.  More precisely, each constant element I
+         has to be B + I * S where B and S must be valid immediate operands for
+         an SVE INDEX instruction.
+
+         For example, { X, 1, 2, 3} is a vector satisfying these conditions and
+         we can generate a vector of all constants (i.e., { 0, 1, 2, 3 }) first
+         and then set the first element of the vector to X.  */
+
+      if (TARGET_SVE && GET_MODE_CLASS (mode) == MODE_VECTOR_INT
+          && n_var <= n_elts / 2)
+        {
+          int const_idx = -1;
+          HOST_WIDE_INT const_val = 0;
+          int base = 16;
+          int step = 16;
+
+          for (int i = 0; i < n_elts; ++i)
+            {
+              rtx x = XVECEXP (vals, 0, i);
+
+              if (!CONST_INT_P (x))
+                continue;
+
+              if (const_idx == -1)
+                {
+                  const_idx = i;
+                  const_val = INTVAL (x);
+                }
+              else
+                {
+                  if ((INTVAL (x) - const_val) % (i - const_idx) == 0)
+                    {
+                      HOST_WIDE_INT s
+                          = (INTVAL (x) - const_val) / (i - const_idx);
+                      if (s >= -16 && s <= 15)
+                        {
+                          int b = const_val - s * const_idx;
+                          if (b >= -16 && b <= 15)
+                            {
+                              base = b;
+                              step = s;
+                            }
+                        }
+                    }
+                  break;
+                }
+            }
+
+          if (base != 16
+              && (!CONST_INT_P (v0)
+                  || (CONST_INT_P (v0) && INTVAL (v0) == base)))
+            {
+              if (!CONST_INT_P (v0))
+                XVECEXP (copy, 0, 0) = GEN_INT (base);
+
+              is_index_seq = true;
+              for (int i = 1; i < n_elts; ++i)
+                {
+                  rtx x = XVECEXP (copy, 0, i);
+
+                  if (CONST_INT_P (x))
+                    {
+                      if (INTVAL (x) != base + i * step)
+                        {
+                          is_index_seq = false;
+                          break;
+                        }
+                    }
+                  else
+                    XVECEXP (copy, 0, i) = GEN_INT (base + i * step);
+                }
+            }
+        }
 
       /* Load constant part of vector.  We really don't care what goes into the
          parts we will overwrite, but we're more likely to be able to load the
          constant efficiently if it has fewer, larger, repeating parts
          (see aarch64_simd_valid_immediate).  */
-      for (int i = 0; i < n_elts; i++)
+      for (int i = 0; !is_index_seq && i < n_elts; i++)
         {
           rtx x = XVECEXP (vals, 0, i);
           if (CONST_INT_P (x) || CONST_DOUBLE_P (x))
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_1.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_1.c
index 0940bedd0dd..80eb1efdc66 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_1.c
@@ -1,15 +1,27 @@
 /* { dg-do compile } */
 /* { dg-options "-O2" } */
 /* { dg-require-effective-target aarch64_little_endian } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
 
 #include
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/*
+** dupq:
+**	index	z0\.s, #0, #1
+**	ins	v0\.s\[0\], w0
+**	dup	z0\.q, z0\.q\[0\]
+**	ret
+*/
 svint32_t
 dupq (int x)
 {
   return svdupq_s32 (x, 1, 2, 3);
 }
 
-/* { dg-final { scan-assembler {\tindex\tz[0-9]+\.s, #0, #1} } } */
-/* { dg-final { scan-assembler {\tins\tv[0-9]+\.s\[0\], w0\n} } } */
-/* { dg-final { scan-assembler {\tdup\tz[0-9]+\.q, z[0-9]+\.q\[0\]\n} } } */
+#ifdef __cplusplus
+}
+#endif
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_2.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_2.c
index 218a6601337..afcad0a691e 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_2.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_2.c
@@ -1,15 +1,27 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -mbig-endian" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
 
 /* To avoid needing big-endian header files.  */
 #pragma GCC aarch64 "arm_sve.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/*
+** dupq:
+**	index	z0\.s, #3, #-1
+**	ins	v0\.s\[0\], w0
+**	dup	z0\.q, z0\.q\[0\]
+**	ret
+*/
 svint32_t
 dupq (int x)
 {
   return svdupq_s32 (x, 1, 2, 3);
 }
 
-/* { dg-final { scan-assembler {\tindex\tz[0-9]+\.s, #3, #-1} } } */
-/* { dg-final { scan-assembler {\tins\tv[0-9]+\.s\[0\], w0\n} } } */
-/* { dg-final { scan-assembler {\tdup\tz[0-9]+\.q, z[0-9]+\.q\[0\]\n} } } */
+#ifdef __cplusplus
+}
+#endif
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_3.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_3.c
index 245d43b75b5..f912f4b905c 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_3.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_3.c
@@ -1,15 +1,27 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -mlittle-endian" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
 
 /* To avoid needing big-endian header files.  */
 #pragma GCC aarch64 "arm_sve.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/*
+** dupq:
+**	index	z0\.s, #0, #1
+**	ins	v0\.s\[2\], w0
+**	dup	z0\.q, z0\.q\[0\]
+**	ret
+*/
 svint32_t
 dupq (int x)
 {
   return svdupq_s32 (0, 1, x, 3);
 }
 
-/* { dg-final { scan-assembler {\tindex\tz[0-9]+\.s, #0, #1} } } */
-/* { dg-final { scan-assembler {\tins\tv[0-9]+\.s\[2\], w0\n} } } */
-/* { dg-final { scan-assembler {\tdup\tz[0-9]+\.q, z[0-9]+\.q\[0\]\n} } } */
+#ifdef __cplusplus
+}
+#endif
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_4.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_4.c
index cbee6f27b62..0cfdb23101b 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_4.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_4.c
@@ -1,15 +1,27 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -mbig-endian" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
 
 /* To avoid needing big-endian header files.  */
 #pragma GCC aarch64 "arm_sve.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/*
+** dupq:
+**	index	z0\.s, #3, #-1
+**	ins	v0\.s\[2\], w0
+**	dup	z0\.q, z0\.q\[0\]
+**	ret
+*/
 svint32_t
 dupq (int x)
 {
   return svdupq_s32 (0, 1, x, 3);
 }
 
-/* { dg-final { scan-assembler {\tindex\tz[0-9]+\.s, #3, #-1} } } */
-/* { dg-final { scan-assembler {\tins\tv[0-9]+\.s\[2\], w0\n} } } */
-/* { dg-final { scan-assembler {\tdup\tz[0-9]+\.q, z[0-9]+\.q\[0\]\n} } } */
+#ifdef __cplusplus
+}
+#endif
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/vec_init_4.c b/gcc/testsuite/gcc.target/aarch64/sve/vec_init_4.c
new file mode 100644
index 00000000000..898168dc8ac
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/vec_init_4.c
@@ -0,0 +1,47 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+typedef short v8hi __attribute__((vector_size(16)));
+typedef int v4si __attribute__ ((vector_size (16)));
+typedef long v2di __attribute__((vector_size(16)));
+
+/*
+** f:
+**	index	z0\.s, #0, #1
+**	ins	v0\.s\[1\], w0
+**	ret
+*/
+v4si
+f (int x)
+{
+  return (v4si){ 0, x, 2, 3 };
+}
+
+/*
+** f1:
+**	index	z0\.s, #3, #-4
+**	ins	v0\.s\[1\], w0
+**	ins	v0\.s\[2\], w1
+**	ret
+*/
+v4si
+f1 (int x, int y)
+{
+  return (v4si){ 3, x, y, -9 };
+}
+
+/*
+** f2:
+**	index	z0\.h, #4, #2
+**	ins	v0\.h\[0\], w0
+**	ins	v0\.h\[3\], w1
+**	ins	v0\.h\[7\], w2
+**	ret
+*/
+v8hi
+f2 (short x, short y, short z)
+{
+  return (v8hi){ x, 6, 8, y, 12, 14, 16, z };
+}
+
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/vec_init_5.c b/gcc/testsuite/gcc.target/aarch64/sve/vec_init_5.c
new file mode 100644
index 00000000000..e4a71736f5f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/vec_init_5.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+typedef int v4si __attribute__ ((vector_size (16)));
+
+v4si
+f (int x, int y)
+{
+  return (v4si){ 1, x, y, 3 };
+}
+
+/* { dg-final { scan-assembler-not {index} } } */
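
For readers following the base/step derivation described in the commit message and in the new comment, here is a standalone sketch of the idea in plain C. It is not the code from the patch and does not use GCC's RTL API. The names struct elt, struct index_seq, fits_index_imm and find_index_seq are hypothetical and exist only for illustration. It assumes the same rules as the patch: at least half of the elements are constants, the first two constants determine the step, and both base and step must fit the [-16, 15] immediate range that the patch checks for.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one vector element: a known constant or a
   run-time variable.  This is not GCC's rtx representation.  */
struct elt
{
  bool is_const;
  long value;   /* Only meaningful when is_const is true.  */
};

/* Hypothetical result: whether a usable INDEX base/step pair was found.  */
struct index_seq
{
  bool valid;
  long base;
  long step;
};

/* Range check matching the [-16, 15] bounds used in the patch.  */
static bool
fits_index_imm (long v)
{
  return v >= -16 && v <= 15;
}

/* Derive base and step from the first two constants, then check that every
   constant element I equals base + I * step.  */
struct index_seq
find_index_seq (const struct elt *elts, size_t n_elts)
{
  const struct index_seq none = { false, 0, 0 };
  size_t n_var = 0;
  long first = -1, second = -1;

  assert (elts != NULL && n_elts > 0);
  for (size_t i = 0; i < n_elts; ++i)
    {
      if (!elts[i].is_const)
        ++n_var;
      else if (first < 0)
        first = (long) i;
      else if (second < 0)
        second = (long) i;
    }

  /* At least half of the elements must be constants, and two constants
     are needed to pin down a step.  */
  if (n_var > n_elts / 2 || second < 0)
    return none;

  long delta = elts[second].value - elts[first].value;
  long dist = second - first;
  if (delta % dist != 0)
    return none;

  long step = delta / dist;
  long base = elts[first].value - step * first;
  if (!fits_index_imm (step) || !fits_index_imm (base))
    return none;

  for (size_t i = 0; i < n_elts; ++i)
    if (elts[i].is_const && elts[i].value != base + (long) i * step)
      return none;

  return (struct index_seq) { true, base, step };
}
```

Under these assumptions, { 3, x, y, -9 } gives base 3 and step -4, and { x, 6, 8, y, 12, 14, 16, z } gives base 4 and step 2, which agrees with the INDEX operands expected in vec_init_4.c. For { 1, x, y, 3 } the difference 2 is not divisible by the index distance 3, so no sequence is found, consistent with vec_init_5.c expecting no INDEX instruction.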