From patchwork Tue Sep 3 18:54:32 2024
X-Patchwork-Submitter: "Pengxuan Zheng (QUIC)"
X-Patchwork-Id: 1980267
From: Pengxuan Zheng
CC: Pengxuan Zheng
Subject: [PATCH] aarch64: Improve scalar mode popcount expansion by using SVE [PR113860]
Date: Tue, 3 Sep 2024 11:54:32 -0700
Message-ID: <20240903185432.23565-1-quic_pzheng@quicinc.com>
X-Mailer: git-send-email 2.17.1
List-Id: Gcc-patches mailing list

This is similar to the recent improvements to the Advanced SIMD
popcount expansion by using SVE. We can utilize SVE to generate more
efficient code for scalar mode popcount too.

	PR target/113860

gcc/ChangeLog:

	* config/aarch64/aarch64-simd.md (popcount<mode>2): Update pattern to
	also support V1DI mode.
	* config/aarch64/aarch64.md (popcount<mode>2): Add TARGET_SVE support.
	* config/aarch64/iterators.md (VDQHSD_V1DI): New mode iterator.
	(SVE_VDQ_I): Add V1DI.
	(bitsize): Likewise.
	(VPRED): Likewise.
	(VEC_POP_MODE): New mode attribute.
	(vec_pop_mode): Likewise.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/popcnt11.c: New test.

Signed-off-by: Pengxuan Zheng
---
 gcc/config/aarch64/aarch64-simd.md          |  5 +-
 gcc/config/aarch64/aarch64.md               |  9 ++++
 gcc/config/aarch64/iterators.md             | 16 ++++--
 gcc/testsuite/gcc.target/aarch64/popcnt11.c | 58 +++++++++++++++++++++
 4 files changed, 83 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/popcnt11.c

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 23c03a96371..649aeaf19ed 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -3515,8 +3515,9 @@ (define_insn "popcount<mode>2<vczle><vczbe>"
 )
 
 (define_expand "popcount<mode>2"
-  [(set (match_operand:VDQHSD 0 "register_operand")
-	(popcount:VDQHSD (match_operand:VDQHSD 1 "register_operand")))]
+  [(set (match_operand:VDQHSD_V1DI 0 "register_operand")
+	(popcount:VDQHSD_V1DI
+	  (match_operand:VDQHSD_V1DI 1 "register_operand")))]
   "TARGET_SIMD"
   {
     if (TARGET_SVE)
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index c54b29cd64b..ef52770f1cb 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -5345,6 +5345,15 @@ (define_expand "popcount<mode>2"
 	(popcount:ALLI (match_operand:ALLI 1 "register_operand")))]
   "TARGET_CSSC ? GET_MODE_BITSIZE (<MODE>mode) >= 32 : TARGET_SIMD"
 {
+  if (!TARGET_CSSC && TARGET_SVE && <MODE>mode != QImode)
+    {
+      rtx tmp = gen_reg_rtx (<VEC_POP_MODE>mode);
+      rtx op1 = gen_lowpart (<VEC_POP_MODE>mode, operands[1]);
+      emit_insn (gen_popcount<vec_pop_mode>2 (tmp, op1));
+      emit_move_insn (operands[0], gen_lowpart (<MODE>mode, tmp));
+      DONE;
+    }
+
   if (!TARGET_CSSC)
     {
       rtx v = gen_reg_rtx (V8QImode);
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index 20a318e023b..84387a8119e 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -290,6 +290,8 @@ (define_mode_iterator VDQHS [V4HI V8HI V2SI V4SI])
 ;; Advanced SIMD modes for H, S and D types.
 (define_mode_iterator VDQHSD [V4HI V8HI V2SI V4SI V2DI])
 
+(define_mode_iterator VDQHSD_V1DI [VDQHSD V1DI])
+
 ;; Advanced SIMD and scalar integer modes for H and S.
 (define_mode_iterator VSDQ_HSI [V4HI V8HI V2SI V4SI HI SI])
 
@@ -560,7 +562,7 @@ (define_mode_iterator SVE_I [VNx16QI VNx8QI VNx4QI VNx2QI
 (define_mode_iterator SVE_I_SIMD_DI [SVE_I V2DI])
 
 ;; All SVE and Advanced SIMD integer vector modes.
-(define_mode_iterator SVE_VDQ_I [SVE_I VDQ_I])
+(define_mode_iterator SVE_VDQ_I [SVE_I VDQ_I V1DI])
 
 ;; SVE integer vector modes whose elements are 16 bits or wider.
 (define_mode_iterator SVE_HSDI [VNx8HI VNx4HI VNx2HI
@@ -1230,7 +1232,7 @@ (define_mode_attr nunits [(V8QI "8") (V16QI "16")
 (define_mode_attr bitsize [(V8QI "64") (V16QI "128")
 			   (V4HI "64") (V8HI "128")
 			   (V2SI "64") (V4SI "128")
-			   (V2DI "128")])
+			   (V1DI "64") (V2DI "128")])
 
 ;; Map a floating point or integer mode to the appropriate register name prefix
 (define_mode_attr s [(HF "h") (SF "s") (DF "d") (SI "s") (DI "d")])
@@ -2284,7 +2286,7 @@ (define_mode_attr VPRED [(VNx16QI "VNx16BI") (VNx8QI "VNx8BI")
 			 (VNx8DI "VNx2BI") (VNx8DF "VNx2BI")
 			 (V8QI "VNx8BI") (V16QI "VNx16BI")
 			 (V4HI "VNx4BI") (V8HI "VNx8BI") (V2SI "VNx2BI")
-			 (V4SI "VNx4BI") (V2DI "VNx2BI")])
+			 (V4SI "VNx4BI") (V2DI "VNx2BI") (V1DI "VNx2BI")])
 
 ;; ...and again in lower case.
 (define_mode_attr vpred [(VNx16QI "vnx16bi") (VNx8QI "vnx8bi")
@@ -2318,6 +2320,14 @@ (define_mode_attr VDOUBLE [(VNx16QI "VNx32QI")
 			   (VNx4SI "VNx8SI") (VNx4SF "VNx8SF")
 			   (VNx2DI "VNx4DI") (VNx2DF "VNx4DF")])
 
+;; The Advanced SIMD modes of popcount corresponding to scalar modes.
+(define_mode_attr VEC_POP_MODE [(QI "V8QI") (HI "V4HI")
+				(SI "V2SI") (DI "V1DI")])
+
+;; ...and again in lower case.
+(define_mode_attr vec_pop_mode [(QI "v8qi") (HI "v4hi")
+				(SI "v2si") (DI "v1di")])
+
 ;; On AArch64 the By element instruction doesn't have a 2S variant.
 ;; However because the instruction always selects a pair of values
 ;; The normal 3SAME instruction can be used here instead.
diff --git a/gcc/testsuite/gcc.target/aarch64/popcnt11.c b/gcc/testsuite/gcc.target/aarch64/popcnt11.c
new file mode 100644
index 00000000000..595b2f9eb93
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/popcnt11.c
@@ -0,0 +1,58 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=armv8.2-a+sve" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+/*
+** f_qi:
+**	ldr	b([0-9]+), \[x0\]
+**	cnt	v\1.8b, v\1.8b
+**	smov	w0, v\1.b\[0\]
+**	ret
+*/
+unsigned
+f_qi (unsigned char *a)
+{
+  return __builtin_popcountg (a[0]);
+}
+
+/*
+** f_hi:
+**	ldr	h([0-9]+), \[x0\]
+**	ptrue	(p[0-7]).b, all
+**	cnt	z\1.h, \2/m, z\1.h
+**	smov	w0, v\1.h\[0\]
+**	ret
+*/
+unsigned
+f_hi (unsigned short *a)
+{
+  return __builtin_popcountg (a[0]);
+}
+
+/*
+** f_si:
+**	ldr	s([0-9]+), \[x0\]
+**	ptrue	(p[0-7]).b, all
+**	cnt	z\1.s, \2/m, z\1.s
+**	umov	x0, v\1.d\[0\]
+**	ret
+*/
+unsigned
+f_si (unsigned int *a)
+{
+  return __builtin_popcountg (a[0]);
+}
+
+/*
+** f_di:
+**	ldr	d([0-9]+), \[x0\]
+**	ptrue	(p[0-7])\.b, all
+**	cnt	z\1\.d, \2/m, z\1\.d
+**	fmov	x0, d\1
+**	ret
+*/
+unsigned
+f_di (unsigned long *a)
+{
+  return __builtin_popcountg (a[0]);
+}
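
Note for reviewers: the new branch in aarch64.md views the scalar operand as the low part of a 64-bit vector whose element width matches the scalar mode, runs the vector popcount, and reads the low part back. The sketch below is only an illustrative model of that round trip for HImode in plain C. It is not GCC code, and the names v4hi_model, popcount16, popcount_v4hi and popcount_hi_via_vector are hypothetical. The other lanes are set to zero here for determinism; since each lane is counted independently, their contents should not affect lane 0.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a 64-bit vector register seen as four 16-bit
   lanes, i.e. the V4HI mode that VEC_POP_MODE pairs with HImode.  */
typedef struct { uint16_t lane[4]; } v4hi_model;

/* Portable bit count for one 16-bit lane: clear the lowest set bit
   until nothing is left.  */
static unsigned
popcount16 (uint16_t x)
{
  unsigned n = 0;
  for (; x != 0; x &= (uint16_t) (x - 1))
    n++;
  return n;
}

/* Model of a lane-wise vector popcount: each lane is counted on its own.  */
static v4hi_model
popcount_v4hi (v4hi_model v)
{
  for (int i = 0; i < 4; i++)
    v.lane[i] = (uint16_t) popcount16 (v.lane[i]);
  return v;
}

/* Model of the scalar expansion: put the scalar in lane 0, count every
   lane, then read lane 0 back as the scalar result.  */
unsigned
popcount_hi_via_vector (uint16_t x)
{
  v4hi_model v = { { x, 0, 0, 0 } };
  v = popcount_v4hi (v);
  return v.lane[0];
}
```

Calling popcount_hi_via_vector with a 16-bit value should give the same count as a direct scalar popcount of that value. The SImode and DImode cases follow the same shape with two 32-bit lanes and one 64-bit lane respectively.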