From patchwork Fri Oct 18 07:30:34 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jennifer Schmitz
X-Patchwork-Id: 1998937
From: Jennifer Schmitz
To: "gcc-patches@gcc.gnu.org"
CC: Richard Sandiford, Kyrylo Tkachov
Subject: [PATCH] SVE intrinsics: Fold svsra with op1 all zeros to svlsr/svasr.
Date: Fri, 18 Oct 2024 07:30:34 +0000
Message-ID:
Accept-Language: en-US
Content-Language: en-US
X-MS-Has-Attach: yes
MIME-Version:
1.0
X-OriginatorOrg: Nvidia.com
X-MS-Exchange-CrossTenant-AuthAs: Internal
X-MS-Exchange-CrossTenant-AuthSource: CH0PR12MB5252.namprd12.prod.outlook.com
X-MS-Exchange-CrossTenant-Network-Message-Id: b7703dd4-c05f-482c-598c-08dcef46c11c
X-MS-Exchange-CrossTenant-originalarrivaltime: 18 Oct 2024 07:30:34.5881 (UTC)
X-MS-Exchange-CrossTenant-fromentityheader: Hosted
X-MS-Exchange-CrossTenant-id: 43083d15-7273-40c1-b7db-39efd9ccc17a
X-MS-Exchange-CrossTenant-mailboxtype: HOSTED
X-MS-Exchange-CrossTenant-userprincipalname: 9OqyNVvut64iQ1eV7dFAz7qDqiOZwUHUFII2biKF96x9Gb7I63BssTOm7lpY7VN+EU3oDaV2FZBP9FXnnfpyaw==
X-MS-Exchange-Transport-CrossTenantHeadersStamped: CH2PR12MB4278
X-Spam-Status: No, score=-9.7 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH,
 DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FORGED_SPF_HELO,
 GIT_PATCH_0, KAM_SHORT, SPF_HELO_PASS, SPF_NONE, TXREP autolearn=ham
 autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.30
Precedence: list
List-Id: Gcc-patches mailing list
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org

A common idiom in intrinsics loops is to have accumulator intrinsics
in an unrolled loop with an accumulator initialized to zero at the beginning.
Propagating the initial zero accumulator into the first iteration
of the loop and simplifying the first accumulate instruction is a
desirable transformation that we should teach GCC.
Therefore, this patch folds svsra to svlsr/svasr if op1 is all zeros,
producing the lower latency instructions LSR/ASR instead of USRA/SSRA.
We implemented this optimization in svsra_impl::fold.
Because svlsr/svasr are predicated intrinsics, we added a ptrue
predicate. Additionally, the width of the shift amount (imm3) was
adjusted to fit the function type.
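
For readers less familiar with the intrinsics, the equivalence behind the
fold can be sketched with a scalar, per-lane model. This is only an
illustration and not code from the patch: the model_* names are
hypothetical, 32-bit lanes are assumed, and the shift amount is assumed to
lie in [1, 32], as the intrinsics require.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-lane models (not GCC code), assuming 32-bit lanes and a
// shift amount in [1, 32].

// Logical shift right; a shift by the full lane width yields 0.
inline std::uint32_t model_lsr (std::uint32_t op, unsigned shift)
{
  return shift >= 32 ? 0u : (op >> shift);
}

// Arithmetic shift right, written without relying on the behaviour of >>
// for negative values; a shift by 32 gives the same result as a shift by 31.
inline std::int32_t model_asr (std::int32_t op, unsigned shift)
{
  unsigned s = shift >= 32 ? 31 : shift;
  return op < 0 ? ~(~op >> s) : (op >> s);
}

// Unsigned shift right and accumulate: op1 + (op2 >> shift), wrapping.
inline std::uint32_t model_usra (std::uint32_t op1, std::uint32_t op2,
                                 unsigned shift)
{
  return op1 + model_lsr (op2, shift);
}

// Signed shift right and accumulate; the addition is done on unsigned
// values so that it wraps instead of overflowing.
inline std::int32_t model_ssra (std::int32_t op1, std::int32_t op2,
                                unsigned shift)
{
  std::uint32_t sum = static_cast<std::uint32_t> (op1)
                      + static_cast<std::uint32_t> (model_asr (op2, shift));
  return static_cast<std::int32_t> (sum);
}

// With a zero accumulator, the accumulate step is a no-op and only the
// shift remains, which is the property the fold relies on.
inline void check_fold_model ()
{
  assert (model_usra (0, 0x80000000u, 2) == model_lsr (0x80000000u, 2));
  assert (model_ssra (0, -8, 2) == model_asr (-8, 2));
}
```

Calling check_fold_model () from any test driver exercises both the
unsigned and the signed case.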
In order to create the ptrue predicate, a new helper function
build_ptrue was added. We also refactored gimple_folder::fold_to_ptrue
to use the new helper function.

Tests were added to check the produced assembly for use of LSR/ASR.

The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
OK for mainline?

Signed-off-by: Jennifer Schmitz

gcc/
	* config/aarch64/aarch64-sve-builtins-sve2.cc
	(svsra_impl::fold): Fold svsra to svlsr/svasr if op1 is all zeros.
	* config/aarch64/aarch64-sve-builtins.cc (build_ptrue): New
	function that returns a ptrue tree.
	(gimple_folder::fold_to_ptrue): Refactor to use build_ptrue.
	* config/aarch64/aarch64-sve-builtins.h: Declare build_ptrue.

gcc/testsuite/
	* gcc.target/aarch64/sve2/acle/asm/sra_s32.c: New test.
	* gcc.target/aarch64/sve2/acle/asm/sra_s64.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/sra_u32.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/sra_u64.c: Likewise.
---
 .../aarch64/aarch64-sve-builtins-sve2.cc      | 29 +++++++++++++++++++
 gcc/config/aarch64/aarch64-sve-builtins.cc    | 28 +++++++++++-------
 gcc/config/aarch64/aarch64-sve-builtins.h     |  1 +
 .../aarch64/sve2/acle/asm/sra_s32.c           |  9 ++++++
 .../aarch64/sve2/acle/asm/sra_s64.c           |  9 ++++++
 .../aarch64/sve2/acle/asm/sra_u32.c           |  9 ++++++
 .../aarch64/sve2/acle/asm/sra_u64.c           |  9 ++++++
 7 files changed, 83 insertions(+), 11 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-sve-builtins-sve2.cc b/gcc/config/aarch64/aarch64-sve-builtins-sve2.cc
index 6a20a613f83..0990918cc45 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-sve2.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins-sve2.cc
@@ -417,6 +417,35 @@ public:
 
 class svsra_impl : public function_base
 {
+public:
+  gimple *
+  fold (gimple_folder &f) const override
+  {
+    /* Fold to svlsr/svasr if op1 is all zeros.  */
+    tree op1 = gimple_call_arg (f.call, 0);
+    if (!integer_zerop (op1))
+      return NULL;
+    function_instance instance ("svlsr", functions::svlsr,
+                                shapes::binary_uint_opt_n, MODE_n,
+                                f.type_suffix_ids, GROUP_none, PRED_x);
+    if (!f.type_suffix (0).unsigned_p)
+      {
+        instance.base_name = "svasr";
+        instance.base = functions::svasr;
+      }
+    gcall *call = f.redirect_call (instance);
+    unsigned int element_bytes = f.type_suffix (0).element_bytes;
+    /* Add a ptrue as predicate, because unlike svsra, svlsr/svasr are
+       predicated intrinsics.  */
+    gimple_call_set_arg (call, 0, build_ptrue (element_bytes));
+    /* For svsra, the shift amount (imm3) is uint64_t for all function types,
+       but for svlsr/svasr, imm3 has the same width as the function type.  */
+    tree imm3 = gimple_call_arg (f.call, 2);
+    tree imm3_prec = wide_int_to_tree (scalar_types[f.type_suffix (0).vector_type],
+                                       wi::to_wide (imm3, element_bytes));
+    gimple_call_set_arg (call, 2, imm3_prec);
+    return call;
+  }
 public:
   rtx
   expand (function_expander &e) const override
diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc b/gcc/config/aarch64/aarch64-sve-builtins.cc
index e7c703c987e..945e9818f4e 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins.cc
@@ -3456,6 +3456,21 @@ is_ptrue (tree v, unsigned int step)
           && vector_cst_all_same (v, step));
 }
 
+/* Return a ptrue tree (type svbool_t) where the element width
+   is given by ELEMENT_BYTES.
+   For example, for ELEMENT_BYTES = 2, we get { 1, 0, 1, 0, ... }.  */
+tree
+build_ptrue (unsigned int element_bytes)
+{
+  tree bool_type = scalar_types[VECTOR_TYPE_svbool_t];
+  tree svbool_type = acle_vector_types[0][VECTOR_TYPE_svbool_t];
+  tree_vector_builder builder (svbool_type, element_bytes, 1);
+  builder.quick_push (build_all_ones_cst (bool_type));
+  for (unsigned int i = 1; i < element_bytes; ++i)
+    builder.quick_push (build_zero_cst (bool_type));
+  return builder.build ();
+}
+
 gimple_folder::gimple_folder (const function_instance &instance, tree fndecl,
                               gimple_stmt_iterator *gsi_in, gcall *call_in)
   : function_call_info (gimple_location (call_in), instance, fndecl),
@@ -3572,17 +3587,8 @@ gimple_folder::fold_to_cstu (poly_uint64 val)
 gimple *
 gimple_folder::fold_to_ptrue ()
 {
-  tree svbool_type = TREE_TYPE (lhs);
-  tree bool_type = TREE_TYPE (svbool_type);
-  unsigned int element_bytes = type_suffix (0).element_bytes;
-
-  /* The return type is svbool_t for all type suffixes, thus for b8 we
-     want { 1, 1, 1, 1, ... }, for b16 we want { 1, 0, 1, 0, ... }, etc.  */
-  tree_vector_builder builder (svbool_type, element_bytes, 1);
-  builder.quick_push (build_all_ones_cst (bool_type));
-  for (unsigned int i = 1; i < element_bytes; ++i)
-    builder.quick_push (build_zero_cst (bool_type));
-  return gimple_build_assign (lhs, builder.build ());
+  return gimple_build_assign (lhs,
+                              build_ptrue (type_suffix (0).element_bytes));
 }
 
 /* Fold the call to a PFALSE.  */
diff --git a/gcc/config/aarch64/aarch64-sve-builtins.h b/gcc/config/aarch64/aarch64-sve-builtins.h
index 645e56badbe..c5524b9664f 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins.h
+++ b/gcc/config/aarch64/aarch64-sve-builtins.h
@@ -829,6 +829,7 @@ extern tree acle_svprfop;
 bool vector_cst_all_same (tree, unsigned int);
 bool is_ptrue (tree, unsigned int);
 const function_instance *lookup_fndecl (tree);
+tree build_ptrue (unsigned int);
 
 /* Try to find a mode with the given mode_suffix_info fields.  Return
    the mode on success or MODE_none on failure.  */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/sra_s32.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/sra_s32.c
index ac992dc7b1c..86cf4bd8137 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/sra_s32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/sra_s32.c
@@ -91,3 +91,12 @@ TEST_UNIFORM_Z (sra_32_s32_tied2, svint32_t,
 TEST_UNIFORM_Z (sra_32_s32_untied, svint32_t,
                 z0 = svsra_n_s32 (z1, z2, 32),
                 z0 = svsra (z1, z2, 32))
+
+/*
+** sra_2_s32_zeroop1:
+**	asr	z0\.s, z1\.s, #2
+**	ret
+*/
+TEST_UNIFORM_Z (sra_2_s32_zeroop1, svint32_t,
+                z0 = svsra_n_s32 (svdup_s32 (0), z1, 2),
+                z0 = svsra (svdup_s32 (0), z1, 2))
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/sra_s64.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/sra_s64.c
index 9ea5657ab88..7b39798ba1d 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/sra_s64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/sra_s64.c
@@ -91,3 +91,12 @@ TEST_UNIFORM_Z (sra_64_s64_tied2, svint64_t,
 TEST_UNIFORM_Z (sra_64_s64_untied, svint64_t,
                 z0 = svsra_n_s64 (z1, z2, 64),
                 z0 = svsra (z1, z2, 64))
+
+/*
+** sra_2_s64_zeroop1:
+**	asr	z0\.d, z1\.d, #2
+**	ret
+*/
+TEST_UNIFORM_Z (sra_2_s64_zeroop1, svint64_t,
+                z0 = svsra_n_s64 (svdup_s64 (0), z1, 2),
+                z0 = svsra (svdup_s64 (0), z1, 2))
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/sra_u32.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/sra_u32.c
index 090245153f7..001e09ca78d 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/sra_u32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/sra_u32.c
@@ -91,3 +91,12 @@ TEST_UNIFORM_Z (sra_32_u32_tied2, svuint32_t,
 TEST_UNIFORM_Z (sra_32_u32_untied, svuint32_t,
                 z0 = svsra_n_u32 (z1, z2, 32),
                 z0 = svsra (z1, z2, 32))
+
+/*
+** sra_2_u32_zeroop1:
+**	lsr	z0\.s, z1\.s, #2
+**	ret
+*/
+TEST_UNIFORM_Z (sra_2_u32_zeroop1, svuint32_t,
+                z0 = svsra_n_u32 (svdup_u32 (0), z1, 2),
+                z0 = svsra (svdup_u32 (0), z1, 2))
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/sra_u64.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/sra_u64.c
index ff21c368b72..780cf7a7ff6 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/sra_u64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/sra_u64.c
@@ -91,3 +91,12 @@ TEST_UNIFORM_Z (sra_64_u64_tied2, svuint64_t,
 TEST_UNIFORM_Z (sra_64_u64_untied, svuint64_t,
                 z0 = svsra_n_u64 (z1, z2, 64),
                 z0 = svsra (z1, z2, 64))
+
+/*
+** sra_2_u64_zeroop1:
+**	lsr	z0\.d, z1\.d, #2
+**	ret
+*/
+TEST_UNIFORM_Z (sra_2_u64_zeroop1, svuint64_t,
+                z0 = svsra_n_u64 (svdup_u64 (0), z1, 2),
+                z0 = svsra (svdup_u64 (0), z1, 2))
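
As a reading aid for the build_ptrue comment ("for ELEMENT_BYTES = 2, we
get { 1, 0, 1, 0, ... }"), the predicate layout can be modelled outside
GCC as follows. This is a sketch under assumptions, not part of the patch:
model_ptrue_bits is a hypothetical name, and a std::vector<bool> of a
caller-chosen length stands in for the scalable svbool_t constant, which
holds one predicate bit per vector byte.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical model (not GCC code) of the first NUM_BITS bits of the
// svbool_t constant described in the build_ptrue comment: the bit for the
// first byte of every ELEMENT_BYTES-wide element is set, all others are
// clear.
inline std::vector<bool>
model_ptrue_bits (unsigned int element_bytes, std::size_t num_bits)
{
  std::vector<bool> bits (num_bits, false);
  if (element_bytes == 0)
    return bits;  // Not a valid element width; leave every bit clear.
  for (std::size_t i = 0; i < num_bits; i += element_bytes)
    bits[i] = true;
  return bits;
}
```

For example, model_ptrue_bits (1, n) sets every bit (the b8 case), while
model_ptrue_bits (4, n) sets every fourth bit (the .s case that the s32 and
u32 tests above exercise).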