From patchwork Mon Oct 14 13:10:09 2024
X-Patchwork-Submitter: "Pengxuan Zheng (QUIC)"
X-Patchwork-Id: 1996930
X-Original-To: gcc-patches@gcc.gnu.org
List-Id: Gcc-patches mailing list
From: Pengxuan Zheng
CC: Pengxuan Zheng
Subject: [PATCH v3] aarch64: Improve scalar mode popcount expansion by using SVE [PR113860]
Date: Mon, 14 Oct 2024 06:10:09 -0700
Message-ID: <20241014131009.18722-1-quic_pzheng@quicinc.com>

This is similar to the recent improvements to the Advanced SIMD popcount
expansion by using SVE.  We can utilize SVE to generate more efficient code for
scalar mode popcount too.

Changes since v1:
* v2: Add a new VNx1BI mode and a new test case for V1DI.
* v3: Abandon VNx1BI changes and add a new variant of aarch64_ptrue_reg.

        PR target/113860

gcc/ChangeLog:

        * config/aarch64/aarch64-protos.h (aarch64_ptrue_reg): New function.
        * config/aarch64/aarch64-simd.md (popcount<mode>2): Update pattern to
        also support V1DI mode.
        * config/aarch64/aarch64.cc (aarch64_ptrue_reg): New function.
        * config/aarch64/aarch64.md (popcount<mode>2): Add TARGET_SVE support.
        * config/aarch64/iterators.md (VDQHSD_V1DI): New mode iterator.
        (SVE_VDQ_I): Add V1DI.
        (bitsize): Likewise.
        (VPRED): Likewise.
        (VEC_POP_MODE): New mode attribute.
        (vec_pop_mode): Likewise.

gcc/testsuite/ChangeLog:

        * gcc.target/aarch64/popcnt-sve.c: Update test.
        * gcc.target/aarch64/popcnt11.c: New test.
        * gcc.target/aarch64/popcnt12.c: New test.

Signed-off-by: Pengxuan Zheng
---
 gcc/config/aarch64/aarch64-protos.h           |  1 +
 gcc/config/aarch64/aarch64-simd.md            | 15 ++++-
 gcc/config/aarch64/aarch64.cc                 | 21 +++++++
 gcc/config/aarch64/aarch64.md                 |  9 +++
 gcc/config/aarch64/iterators.md               | 16 ++++-
 gcc/testsuite/gcc.target/aarch64/popcnt-sve.c | 10 ++--
 gcc/testsuite/gcc.target/aarch64/popcnt11.c   | 58 +++++++++++++++++++
 gcc/testsuite/gcc.target/aarch64/popcnt12.c   | 18 ++++++
 8 files changed, 137 insertions(+), 11 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/popcnt11.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/popcnt12.c

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index d03c1fe798b..064bbf430ff 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -922,6 +922,7 @@ rtx aarch64_expand_sve_dupq (rtx, machine_mode, rtx);
 void aarch64_expand_mov_immediate (rtx, rtx);
 rtx aarch64_stack_protect_canary_mem (machine_mode, rtx, aarch64_salt_type);
 rtx aarch64_ptrue_reg (machine_mode);
+rtx aarch64_ptrue_reg (machine_mode, unsigned int);
 rtx aarch64_pfalse_reg (machine_mode);
 bool aarch64_sve_same_pred_for_ptest_p (rtx *, rtx *);
 void aarch64_emit_sve_pred_move (rtx, rtx, rtx);
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index bf272bc0b4e..476c69672d0 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -3516,19 +3516,28 @@ (define_insn "popcount<mode>2"
 )
 
 (define_expand "popcount<mode>2"
-  [(set (match_operand:VDQHSD 0 "register_operand")
-        (popcount:VDQHSD (match_operand:VDQHSD 1 "register_operand")))]
+  [(set (match_operand:VDQHSD_V1DI 0 "register_operand")
+        (popcount:VDQHSD_V1DI
+          (match_operand:VDQHSD_V1DI 1 "register_operand")))]
   "TARGET_SIMD"
   {
     if (TARGET_SVE)
       {
-        rtx p = aarch64_ptrue_reg (<VPRED>mode);
+        rtx p = aarch64_ptrue_reg (<VPRED>mode, <bitsize> == 64 ? 8 : 16);
         emit_insn (gen_aarch64_pred_popcount<mode> (operands[0],
                                                     p,
                                                     operands[1]));
         DONE;
       }
 
+    if (<MODE>mode == V1DImode)
+      {
+        rtx out = gen_reg_rtx (DImode);
+        emit_insn (gen_popcountdi2 (out, gen_lowpart (DImode, operands[1])));
+        emit_move_insn (operands[0], gen_lowpart (<MODE>mode, out));
+        DONE;
+      }
+
     /* Generate a byte popcount.  */
     machine_mode mode = <bitsize> == 64 ? V8QImode : V16QImode;
     machine_mode mode2 = <bitsize> == 64 ? V2SImode : V4SImode;
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 102680a0efc..f82c8102701 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -3621,6 +3621,27 @@ aarch64_ptrue_reg (machine_mode mode)
   return gen_lowpart (mode, reg);
 }
 
+/* Return an all-true (restricted to the leading VL bits) predicate register of
+   mode MODE.  */
+
+rtx
+aarch64_ptrue_reg (machine_mode mode, unsigned int vl)
+{
+  gcc_assert (aarch64_sve_pred_mode_p (mode));
+
+  rtx_vector_builder builder (VNx16BImode, vl, 2);
+
+  for (int i = 0; i < vl; i++)
+    builder.quick_push (CONST1_RTX (BImode));
+
+  for (int i = 0; i < vl; i++)
+    builder.quick_push (CONST0_RTX (BImode));
+
+  rtx const_vec = builder.build ();
+  rtx reg = force_reg (VNx16BImode, const_vec);
+  return gen_lowpart (mode, reg);
+}
+
 /* Return an all-false predicate register of mode MODE.  */
 
 rtx
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index c54b29cd64b..ef52770f1cb 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -5345,6 +5345,15 @@ (define_expand "popcount<mode>2"
         (popcount:ALLI (match_operand:ALLI 1 "register_operand")))]
   "TARGET_CSSC ? GET_MODE_BITSIZE (<MODE>mode) >= 32 : TARGET_SIMD"
 {
+  if (!TARGET_CSSC && TARGET_SVE && <MODE>mode != QImode)
+    {
+      rtx tmp = gen_reg_rtx (<VEC_POP_MODE>mode);
+      rtx op1 = gen_lowpart (<VEC_POP_MODE>mode, operands[1]);
+      emit_insn (gen_popcount<vec_pop_mode>2 (tmp, op1));
+      emit_move_insn (operands[0], gen_lowpart (<MODE>mode, tmp));
+      DONE;
+    }
+
   if (!TARGET_CSSC)
     {
       rtx v = gen_reg_rtx (V8QImode);
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index efba78375c2..fb315c3654b 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -290,6 +290,8 @@ (define_mode_iterator VDQHS [V4HI V8HI V2SI V4SI])
 ;; Advanced SIMD modes for H, S and D types.
 (define_mode_iterator VDQHSD [V4HI V8HI V2SI V4SI V2DI])
 
+(define_mode_iterator VDQHSD_V1DI [VDQHSD V1DI])
+
 ;; Advanced SIMD and scalar integer modes for H and S.
 (define_mode_iterator VSDQ_HSI [V4HI V8HI V2SI V4SI HI SI])
 
@@ -559,7 +561,7 @@ (define_mode_iterator SVE_ALL_STRUCT [SVE_ALL SVE_STRUCT])
 (define_mode_iterator SVE_I_SIMD_DI [SVE_I V2DI])
 
 ;; All SVE and Advanced SIMD integer vector modes.
-(define_mode_iterator SVE_VDQ_I [SVE_I VDQ_I])
+(define_mode_iterator SVE_VDQ_I [SVE_I VDQ_I V1DI])
 
 ;; SVE integer vector modes whose elements are 16 bits or wider.
 (define_mode_iterator SVE_HSDI [VNx8HI VNx4HI VNx2HI
@@ -1235,7 +1237,7 @@ (define_mode_attr nunits [(V8QI "8") (V16QI "16")
 (define_mode_attr bitsize [(V8QI "64") (V16QI "128")
                            (V4HI "64") (V8HI "128")
                            (V2SI "64") (V4SI "128")
-                           (V2DI "128")])
+                           (V1DI "64") (V2DI "128")])
 
 ;; Map a floating point or integer mode to the appropriate register name prefix
 (define_mode_attr s [(HF "h") (SF "s") (DF "d") (SI "s") (DI "d")])
@@ -2289,7 +2291,7 @@ (define_mode_attr VPRED [(VNx16QI "VNx16BI") (VNx8QI "VNx8BI")
                          (VNx8DI "VNx2BI") (VNx8DF "VNx2BI")
                          (V8QI "VNx8BI") (V16QI "VNx16BI")
                          (V4HI "VNx4BI") (V8HI "VNx8BI") (V2SI "VNx2BI")
-                         (V4SI "VNx4BI") (V2DI "VNx2BI")])
+                         (V4SI "VNx4BI") (V2DI "VNx2BI") (V1DI "VNx2BI")])
 
 ;; ...and again in lower case.
 (define_mode_attr vpred [(VNx16QI "vnx16bi") (VNx8QI "vnx8bi")
@@ -2323,6 +2325,14 @@ (define_mode_attr VDOUBLE [(VNx16QI "VNx32QI")
                            (VNx4SI "VNx8SI") (VNx4SF "VNx8SF")
                            (VNx2DI "VNx4DI") (VNx2DF "VNx4DF")])
 
+;; The Advanced SIMD modes of popcount corresponding to scalar modes.
+(define_mode_attr VEC_POP_MODE [(QI "V8QI") (HI "V4HI")
+                                (SI "V2SI") (DI "V1DI")])
+
+;; ...and again in lower case.
+(define_mode_attr vec_pop_mode [(QI "v8qi") (HI "v4hi")
+                                (SI "v2si") (DI "v1di")])
+
 ;; On AArch64 the By element instruction doesn't have a 2S variant.
 ;; However because the instruction always selects a pair of values
 ;; The normal 3SAME instruction can be used here instead.
diff --git a/gcc/testsuite/gcc.target/aarch64/popcnt-sve.c b/gcc/testsuite/gcc.target/aarch64/popcnt-sve.c
index 8e349efe390..c3b4c69b4b4 100644
--- a/gcc/testsuite/gcc.target/aarch64/popcnt-sve.c
+++ b/gcc/testsuite/gcc.target/aarch64/popcnt-sve.c
@@ -4,7 +4,7 @@
 
 /*
 ** f_v4hi:
-**	ptrue	(p[0-7]).b, all
+**	ptrue	(p[0-7]).b, vl8
 **	ldr	d([0-9]+), \[x0\]
 **	cnt	z\2.h, \1/m, z\2.h
 **	str	d\2, \[x1\]
@@ -21,7 +21,7 @@ f_v4hi (unsigned short *__restrict b, unsigned short *__restrict d)
 
 /*
 ** f_v8hi:
-**	ptrue	(p[0-7]).b, all
+**	ptrue	(p[0-7]).b, vl16
 **	ldr	q([0-9]+), \[x0\]
 **	cnt	z\2.h, \1/m, z\2.h
 **	str	q\2, \[x1\]
@@ -42,7 +42,7 @@ f_v8hi (unsigned short *__restrict b, unsigned short *__restrict d)
 
 /*
 ** f_v2si:
-**	ptrue	(p[0-7]).b, all
+**	ptrue	(p[0-7]).b, vl8
 **	ldr	d([0-9]+), \[x0\]
 **	cnt	z\2.s, \1/m, z\2.s
 **	str	d\2, \[x1\]
@@ -57,7 +57,7 @@ f_v2si (unsigned int *__restrict b, unsigned int *__restrict d)
 
 /*
 ** f_v4si:
-**	ptrue	(p[0-7]).b, all
+**	ptrue	(p[0-7]).b, vl16
 **	ldr	q([0-9]+), \[x0\]
 **	cnt	z\2.s, \1/m, z\2.s
 **	str	q\2, \[x1\]
@@ -74,7 +74,7 @@ f_v4si (unsigned int *__restrict b, unsigned int *__restrict d)
 
 /*
 ** f_v2di:
-**	ptrue	(p[0-7]).b, all
+**	ptrue	(p[0-7]).b, vl16
 **	ldr	q([0-9]+), \[x0\]
 **	cnt	z\2.d, \1/m, z\2.d
 **	str	q\2, \[x1\]
diff --git a/gcc/testsuite/gcc.target/aarch64/popcnt11.c b/gcc/testsuite/gcc.target/aarch64/popcnt11.c
new file mode 100644
index 00000000000..e7e67de3572
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/popcnt11.c
@@ -0,0 +1,58 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=armv8.2-a+sve" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+/*
+** f_qi:
+**	ldr	b([0-9]+), \[x0\]
+**	cnt	v\1.8b, v\1.8b
+**	smov	w0, v\1.b\[0\]
+**	ret
+*/
+unsigned
+f_qi (unsigned char *a)
+{
+  return __builtin_popcountg (a[0]);
+}
+
+/*
+** f_hi:
+**	ldr	h([0-9]+), \[x0\]
+**	ptrue	(p[0-7]).b, vl8
+**	cnt	z\1.h, \2/m, z\1.h
+**	smov	w0, v\1.h\[0\]
+**	ret
+*/
+unsigned
+f_hi (unsigned short *a)
+{
+  return __builtin_popcountg (a[0]);
+}
+
+/*
+** f_si:
+**	ldr	s([0-9]+), \[x0\]
+**	ptrue	(p[0-7]).b, vl8
+**	cnt	z\1.s, \2/m, z\1.s
+**	umov	x0, v\1.d\[0\]
+**	ret
+*/
+unsigned
+f_si (unsigned int *a)
+{
+  return __builtin_popcountg (a[0]);
+}
+
+/*
+** f_di:
+**	ldr	d([0-9]+), \[x0\]
+**	ptrue	(p[0-7])\.b, vl8
+**	cnt	z\1\.d, \2/m, z\1\.d
+**	fmov	x0, d\1
+**	ret
+*/
+unsigned
+f_di (unsigned long *a)
+{
+  return __builtin_popcountg (a[0]);
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/popcnt12.c b/gcc/testsuite/gcc.target/aarch64/popcnt12.c
new file mode 100644
index 00000000000..f086cae55a2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/popcnt12.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fgimple" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+/*
+** foo:
+**	cnt	v0.8b, v0.8b
+**	addv	b0, v0.8b
+**	ret
+*/
+__Uint64x1_t __GIMPLE
+foo (__Uint64x1_t x)
+{
+  __Uint64x1_t z;
+
+  z = .POPCOUNT (x);
+  return z;
+}
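
Illustration only, not part of the patch: the new aarch64_ptrue_reg overload pushes VL ones followed by VL zeros into a builder created with VL patterns and 2 elements per pattern. The standalone C sketch below models which byte lanes of the resulting predicate would be active. It assumes the usual GCC vector-constant encoding, in which lane i belongs to pattern i % npatterns and every lane past a pattern's second element repeats that second element. The names model_ptrue_lane and MAX_VL are hypothetical and do not exist in GCC.

```c
#include <assert.h>

#define MAX_VL 16

/* Hypothetical model, not a GCC function: return 1 if byte lane LANE is
   active in a predicate encoded as VL interleaved patterns of two elements
   each, where the encoded elements are VL ones followed by VL zeros.  */
int
model_ptrue_lane (unsigned int vl, unsigned int lane)
{
  assert (vl > 0 && vl <= MAX_VL);

  int encoded[2 * MAX_VL];
  for (unsigned int i = 0; i < vl; i++)
    encoded[i] = 1;		/* First element of each pattern.  */
  for (unsigned int i = 0; i < vl; i++)
    encoded[vl + i] = 0;	/* Second element of each pattern.  */

  unsigned int pattern = lane % vl;	/* Which interleaved pattern.  */
  unsigned int index = lane / vl;	/* Position within that pattern.  */
  if (index > 1)
    index = 1;			/* Assumed: later lanes repeat the second element.  */
  return encoded[index * vl + pattern];
}
```

Under that assumption, vl = 8 activates byte lanes 0-7 (one 64-bit vector) and vl = 16 activates lanes 0-15 (one 128-bit vector), with every higher lane inactive regardless of the runtime vector length. This is consistent with the `ptrue pN.b, vl8` and `vl16` forms expected by the updated popcnt-sve.c.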