From patchwork Thu Aug 1 00:17:07 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Pengxuan Zheng (QUIC)"
X-Patchwork-Id: 1967364
X-Original-To: gcc-patches@gcc.gnu.org
From: Pengxuan Zheng
To:
CC: Pengxuan Zheng
Subject: [PATCH v2] aarch64: Improve Advanced SIMD popcount expansion by using SVE [PR113860]
Date: Wed, 31 Jul 2024 17:17:07 -0700
Message-ID: <20240801001707.7301-2-quic_pzheng@quicinc.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20240801001707.7301-1-quic_pzheng@quicinc.com>
References: <20240801001707.7301-1-quic_pzheng@quicinc.com>
List-Id: Gcc-patches mailing list
Errors-To:
 gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org

This patch improves the Advanced SIMD popcount expansion by using SVE if
available.

For example, GCC currently generates the following code sequence for V2DI:
  cnt     v31.16b, v31.16b
  uaddlp  v31.8h, v31.16b
  uaddlp  v31.4s, v31.8h
  uaddlp  v31.2d, v31.4s

However, by using SVE, we can generate the following sequence instead:
  ptrue   p7.b, all
  cnt     z31.d, p7/m, z31.d

Similar improvements can be made for V4HI, V8HI, V2SI and V4SI too.

The scalar popcount expansion can also be improved similarly by using SVE and
those changes will be included in a separate patch.

	PR target/113860

gcc/ChangeLog:

	* config/aarch64/aarch64-simd.md (popcount<mode>2): Add TARGET_SVE
	support.
	* config/aarch64/aarch64-sve.md (@aarch64_pred_<optab><mode>): Use new
	iterator SVE_VDQ_I.
	* config/aarch64/iterators.md (SVE_VDQ_I): New mode iterator.
	(VPRED): Add V8QI, V16QI, V4HI, V8HI and V2SI.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/popcnt-sve.c: New test.

Signed-off-by: Pengxuan Zheng
---
 gcc/config/aarch64/aarch64-simd.md            |  9 ++
 gcc/config/aarch64/aarch64-sve.md             | 13 +--
 gcc/config/aarch64/iterators.md               |  5 ++
 gcc/testsuite/gcc.target/aarch64/popcnt-sve.c | 88 +++++++++++++++++++
 4 files changed, 109 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/popcnt-sve.c

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index bbeee221f37..895d6e5eab5 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -3508,6 +3508,15 @@ (define_expand "popcount<mode>2"
         (popcount:VDQHSD (match_operand:VDQHSD 1 "register_operand")))]
   "TARGET_SIMD"
   {
+    if (TARGET_SVE)
+      {
+        rtx p = aarch64_ptrue_reg (<VPRED>mode);
+        emit_insn (gen_aarch64_pred_popcount<mode> (operands[0],
+                                                    p,
+                                                    operands[1]));
+        DONE;
+      }
+
     /* Generate a byte popcount.  */
     machine_mode mode = <bitsize> == 64 ? V8QImode : V16QImode;
     rtx tmp = gen_reg_rtx (mode);
diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md
index 5331e7121d5..eb3705ae515 100644
--- a/gcc/config/aarch64/aarch64-sve.md
+++ b/gcc/config/aarch64/aarch64-sve.md
@@ -3104,16 +3104,16 @@ (define_expand "<optab><mode>2"
 
 ;; Integer unary arithmetic predicated with a PTRUE.
 (define_insn "@aarch64_pred_<optab><mode>"
-  [(set (match_operand:SVE_I 0 "register_operand")
-        (unspec:SVE_I
+  [(set (match_operand:SVE_VDQ_I 0 "register_operand")
+        (unspec:SVE_VDQ_I
           [(match_operand:<VPRED> 1 "register_operand")
-           (SVE_INT_UNARY:SVE_I
-             (match_operand:SVE_I 2 "register_operand"))]
+           (SVE_INT_UNARY:SVE_VDQ_I
+             (match_operand:SVE_VDQ_I 2 "register_operand"))]
           UNSPEC_PRED_X))]
   "TARGET_SVE"
   {@ [ cons: =0 , 1   , 2 ; attrs: movprfx ]
-     [ w        , Upl , 0 ; *              ] <sve_int_op>\t%0.<Vetype>, %1/m, %2.<Vetype>
-     [ ?&w      , Upl , w ; yes            ] movprfx\t%0, %2\;<sve_int_op>\t%0.<Vetype>, %1/m, %2.<Vetype>
+     [ w        , Upl , 0 ; *              ] <sve_int_op>\t%Z0.<Vetype>, %1/m, %Z2.<Vetype>
+     [ ?&w      , Upl , w ; yes            ] movprfx\t%Z0, %Z2\;<sve_int_op>\t%Z0.<Vetype>, %1/m, %Z2.<Vetype>
   }
 )
 
@@ -3168,6 +3168,7 @@ (define_insn "*cond_<optab><mode>_any"
   }
 )
 
+
 ;; -------------------------------------------------------------------------
 ;; ---- [INT] General unary arithmetic corresponding to unspecs
 ;; -------------------------------------------------------------------------
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index f527b2cfeb8..ee3d1fb98fd 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -559,6 +559,9 @@ (define_mode_iterator SVE_I [VNx16QI VNx8QI VNx4QI VNx2QI
 ;; element modes
 (define_mode_iterator SVE_I_SIMD_DI [SVE_I V2DI])
 
+;; All SVE and Advanced SIMD integer vector modes.
+(define_mode_iterator SVE_VDQ_I [SVE_I VDQ_I])
+
 ;; SVE integer vector modes whose elements are 16 bits or wider.
 (define_mode_iterator SVE_HSDI [VNx8HI VNx4HI VNx2HI
                                 VNx4SI VNx2SI
@@ -2278,6 +2281,8 @@ (define_mode_attr VPRED [(VNx16QI "VNx16BI") (VNx8QI "VNx8BI")
                          (VNx32BF "VNx8BI")
                          (VNx16SI "VNx4BI") (VNx16SF "VNx4BI")
                          (VNx8DI "VNx2BI") (VNx8DF "VNx2BI")
+                         (V8QI "VNx8BI") (V16QI "VNx16BI")
+                         (V4HI "VNx4BI") (V8HI "VNx8BI") (V2SI "VNx2BI")
                          (V4SI "VNx4BI") (V2DI "VNx2BI")])
 
 ;; ...and again in lower case.
diff --git a/gcc/testsuite/gcc.target/aarch64/popcnt-sve.c b/gcc/testsuite/gcc.target/aarch64/popcnt-sve.c
new file mode 100644
index 00000000000..8e349efe390
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/popcnt-sve.c
@@ -0,0 +1,88 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=armv8.2-a+sve -fno-vect-cost-model -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+/*
+** f_v4hi:
+**	ptrue	(p[0-7]).b, all
+**	ldr	d([0-9]+), \[x0\]
+**	cnt	z\2.h, \1/m, z\2.h
+**	str	d\2, \[x1\]
+**	ret
+*/
+void
+f_v4hi (unsigned short *__restrict b, unsigned short *__restrict d)
+{
+  d[0] = __builtin_popcount (b[0]);
+  d[1] = __builtin_popcount (b[1]);
+  d[2] = __builtin_popcount (b[2]);
+  d[3] = __builtin_popcount (b[3]);
+}
+
+/*
+** f_v8hi:
+**	ptrue	(p[0-7]).b, all
+**	ldr	q([0-9]+), \[x0\]
+**	cnt	z\2.h, \1/m, z\2.h
+**	str	q\2, \[x1\]
+**	ret
+*/
+void
+f_v8hi (unsigned short *__restrict b, unsigned short *__restrict d)
+{
+  d[0] = __builtin_popcount (b[0]);
+  d[1] = __builtin_popcount (b[1]);
+  d[2] = __builtin_popcount (b[2]);
+  d[3] = __builtin_popcount (b[3]);
+  d[4] = __builtin_popcount (b[4]);
+  d[5] = __builtin_popcount (b[5]);
+  d[6] = __builtin_popcount (b[6]);
+  d[7] = __builtin_popcount (b[7]);
+}
+
+/*
+** f_v2si:
+**	ptrue	(p[0-7]).b, all
+**	ldr	d([0-9]+), \[x0\]
+**	cnt	z\2.s, \1/m, z\2.s
+**	str	d\2, \[x1\]
+**	ret
+*/
+void
+f_v2si (unsigned int *__restrict b, unsigned int *__restrict d)
+{
+  d[0] = __builtin_popcount (b[0]);
+  d[1] = __builtin_popcount (b[1]);
+}
+
+/*
+** f_v4si:
+**	ptrue	(p[0-7]).b, all
+**	ldr	q([0-9]+), \[x0\]
+**	cnt	z\2.s, \1/m, z\2.s
+**	str	q\2, \[x1\]
+**	ret
+*/
+void
+f_v4si (unsigned int *__restrict b, unsigned int *__restrict d)
+{
+  d[0] = __builtin_popcount (b[0]);
+  d[1] = __builtin_popcount (b[1]);
+  d[2] = __builtin_popcount (b[2]);
+  d[3] = __builtin_popcount (b[3]);
+}
+
+/*
+** f_v2di:
+**	ptrue	(p[0-7]).b, all
+**	ldr	q([0-9]+), \[x0\]
+**	cnt	z\2.d, \1/m, z\2.d
+**	str	q\2, \[x1\]
+**	ret
+*/
+void
+f_v2di (unsigned long *__restrict b, unsigned long *__restrict d)
+{
+  d[0] = __builtin_popcountll (b[0]);
+  d[1] = __builtin_popcountll (b[1]);
+}
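
Illustration only, not part of the patch: the commit message contrasts a
four-instruction Advanced SIMD sequence (CNT on bytes followed by three
UADDLP steps) with a single predicated SVE CNT on 64-bit elements.  The
scalar C sketch below models what each sequence computes for one 64-bit
lane, to show why the two are interchangeable.  The function names
(popcount8, popcount64_by_pairwise_widening, popcount64_direct) are
hypothetical, and the loops only stand in for the instructions; they do not
reproduce GCC's expansion code.

```c
#include <stdint.h>

/* Hypothetical helper: bit count of one byte, modelling a single byte
   lane of the Advanced SIMD "cnt vN.16b" instruction.  */
static uint8_t
popcount8 (uint8_t x)
{
  uint8_t n = 0;
  for (; x != 0; x &= (uint8_t) (x - 1))
    n++;
  return n;
}

/* Model of "cnt" followed by three "uaddlp" steps on one 64-bit lane:
   count bits per byte, then add adjacent pairs while widening
   8 -> 16 -> 32 -> 64 bits.  */
uint64_t
popcount64_by_pairwise_widening (uint64_t x)
{
  uint8_t b[8];
  uint16_t h[4];
  uint32_t s[2];

  for (int i = 0; i < 8; i++)
    b[i] = popcount8 ((uint8_t) (x >> (8 * i)));   /* cnt    .16b       */
  for (int i = 0; i < 4; i++)
    h[i] = (uint16_t) (b[2 * i] + b[2 * i + 1]);   /* uaddlp .8h, .16b  */
  for (int i = 0; i < 2; i++)
    s[i] = (uint32_t) h[2 * i] + h[2 * i + 1];     /* uaddlp .4s, .8h   */
  return (uint64_t) s[0] + s[1];                   /* uaddlp .2d, .4s   */
}

/* Model of the SVE "cnt zN.d" instruction on one active 64-bit element:
   the whole element is counted in a single step.  */
uint64_t
popcount64_direct (uint64_t x)
{
  uint64_t n = 0;
  for (; x != 0; x &= x - 1)
    n++;
  return n;
}
```

Both functions return the same value for every input, which is the property
the new expansion relies on when it substitutes the SVE instruction for the
Advanced SIMD sequence.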