From patchwork Tue Oct 15 07:40:46 2024
X-Patchwork-Submitter: Jennifer Schmitz
X-Patchwork-Id: 1997229
From: Jennifer Schmitz
To: "gcc-patches@gcc.gnu.org"
CC: Richard Sandiford, Kyrylo Tkachov
Subject: [PATCH] SVE intrinsics: Fold division and multiplication by -1 to neg.
Date: Tue, 15 Oct 2024 07:40:46 +0000
Message-ID: <3839A82A-5059-4B74-A37E-8DD89765A3F4@nvidia.com>

Because a neg instruction has lower latency and higher throughput than sdiv
and mul, svdiv and svmul by -1 can be folded to svneg. For svdiv, this is
already implemented at the RTL level; for svmul, the optimization was still
missing. This patch implements the fold to svneg for both operations using
the gimple_folder. For svdiv, the transform is applied if the divisor is -1;
svmul is folded if either of the operands is -1. A case distinction on the
predication is needed because svneg_m takes 3 arguments (argument 0 holds the
values for the inactive lanes), whereas svneg_x and svneg_z take only 2.
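
For illustration, the source-level pattern the fold targets looks like the
sketch below (not part of the patch; the function names are made up, but the
intrinsics are the same ACLE calls exercised by the new tests). With the fold
in place, each call is expected to compile to a single predicated neg rather
than an sdiv/mul against a splatted -1:

#include <arm_sve.h>

/* Sketch only: both calls should now fold to svneg and emit
   "neg z0.s, p0/m, z0.s" for the _x predication shown here.  */
svint32_t
negate_via_mul (svbool_t pg, svint32_t x)
{
  return svmul_n_s32_x (pg, x, -1);
}

svint32_t
negate_via_div (svbool_t pg, svint32_t x)
{
  return svdiv_n_s32_x (pg, x, -1);
}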
Tests were added or adjusted to check the produced assembly and runtime tests were added to check correctness. The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression. OK for mainline? Signed-off-by: Jennifer Schmitz gcc/ * config/aarch64/aarch64-sve-builtins-base.cc (svdiv_impl::fold): Fold division by -1 to svneg. (svmul_impl::fold): Fold multiplication by -1 to svneg. gcc/testsuite/ * gcc.target/aarch64/sve/acle/asm/div_s32.c: New test. * gcc.target/aarch64/sve/acle/asm/mul_s16.c: Adjust expected outcome. * gcc.target/aarch64/sve/acle/asm/mul_s32.c: New test. * gcc.target/aarch64/sve/acle/asm/mul_s64.c: Adjust expected outcome. * gcc.target/aarch64/sve/acle/asm/mul_s8.c: Likewise. * gcc.target/aarch64/sve/div_const_run.c: New test. * gcc.target/aarch64/sve/mul_const_run.c: Likewise. --- .../aarch64/aarch64-sve-builtins-base.cc | 73 ++++++++++++++++--- .../gcc.target/aarch64/sve/acle/asm/div_s32.c | 59 +++++++++++++++ .../gcc.target/aarch64/sve/acle/asm/mul_s16.c | 5 +- .../gcc.target/aarch64/sve/acle/asm/mul_s32.c | 48 +++++++++++- .../gcc.target/aarch64/sve/acle/asm/mul_s64.c | 5 +- .../gcc.target/aarch64/sve/acle/asm/mul_s8.c | 7 +- .../gcc.target/aarch64/sve/div_const_run.c | 10 ++- .../gcc.target/aarch64/sve/mul_const_run.c | 10 ++- 8 files changed, 189 insertions(+), 28 deletions(-) diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc index e7eba20f07a..2312b124c29 100644 --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc @@ -768,6 +768,27 @@ public: if (integer_zerop (op1) || integer_zerop (op2)) return f.fold_active_lanes_to (build_zero_cst (TREE_TYPE (f.lhs))); + /* If the divisor is all integer -1, fold to svneg. */ + tree pg = gimple_call_arg (f.call, 0); + if (!f.type_suffix (0).unsigned_p && integer_minus_onep (op2)) + { + function_instance instance ("svneg", functions::svneg, + shapes::unary, MODE_none, + f.type_suffix_ids, GROUP_none, f.pred); + gcall *call = f.redirect_call (instance); + unsigned offset_index = 0; + if (f.pred == PRED_m) + { + offset_index = 1; + gimple_call_set_arg (call, 0, op1); + } + else + gimple_set_num_ops (call, 5); + gimple_call_set_arg (call, offset_index, pg); + gimple_call_set_arg (call, offset_index + 1, op1); + return call; + } + /* If the divisor is a uniform power of 2, fold to a shift instruction. */ tree op2_cst = uniform_integer_cst_p (op2); @@ -2033,12 +2054,37 @@ public: if (integer_zerop (op1) || integer_zerop (op2)) return f.fold_active_lanes_to (build_zero_cst (TREE_TYPE (f.lhs))); + /* If one of the operands is all integer -1, fold to svneg. */ + tree pg = gimple_call_arg (f.call, 0); + tree negated_op = NULL; + if (integer_minus_onep (op2)) + negated_op = op1; + else if (integer_minus_onep (op1)) + negated_op = op2; + if (!f.type_suffix (0).unsigned_p && negated_op) + { + function_instance instance ("svneg", functions::svneg, + shapes::unary, MODE_none, + f.type_suffix_ids, GROUP_none, f.pred); + gcall *call = f.redirect_call (instance); + unsigned offset_index = 0; + if (f.pred == PRED_m) + { + offset_index = 1; + gimple_call_set_arg (call, 0, op1); + } + else + gimple_set_num_ops (call, 5); + gimple_call_set_arg (call, offset_index, pg); + gimple_call_set_arg (call, offset_index + 1, negated_op); + return call; + } + /* If one of the operands is a uniform power of 2, fold to a left shift by immediate. 
*/ - tree pg = gimple_call_arg (f.call, 0); tree op1_cst = uniform_integer_cst_p (op1); tree op2_cst = uniform_integer_cst_p (op2); - tree shift_op1, shift_op2; + tree shift_op1, shift_op2 = NULL; if (op1_cst && integer_pow2p (op1_cst) && (f.pred != PRED_m || is_ptrue (pg, f.type_suffix (0).element_bytes))) @@ -2054,15 +2100,20 @@ public: else return NULL; - shift_op2 = wide_int_to_tree (unsigned_type_for (TREE_TYPE (shift_op2)), - tree_log2 (shift_op2)); - function_instance instance ("svlsl", functions::svlsl, - shapes::binary_uint_opt_n, MODE_n, - f.type_suffix_ids, GROUP_none, f.pred); - gcall *call = f.redirect_call (instance); - gimple_call_set_arg (call, 1, shift_op1); - gimple_call_set_arg (call, 2, shift_op2); - return call; + if (shift_op2) + { + shift_op2 = wide_int_to_tree (unsigned_type_for (TREE_TYPE (shift_op2)), + tree_log2 (shift_op2)); + function_instance instance ("svlsl", functions::svlsl, + shapes::binary_uint_opt_n, MODE_n, + f.type_suffix_ids, GROUP_none, f.pred); + gcall *call = f.redirect_call (instance); + gimple_call_set_arg (call, 1, shift_op1); + gimple_call_set_arg (call, 2, shift_op2); + return call; + } + + return NULL; } }; diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_s32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_s32.c index 521f8bb4758..2c836db777e 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_s32.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_s32.c @@ -55,6 +55,15 @@ TEST_UNIFORM_ZX (div_w0_s32_m_untied, svint32_t, int32_t, z0 = svdiv_n_s32_m (p0, z1, x0), z0 = svdiv_m (p0, z1, x0)) +/* +** div_m1_s32_m_tied1: +** neg z0\.s, p0/m, z0\.s +** ret +*/ +TEST_UNIFORM_Z (div_m1_s32_m_tied1, svint32_t, + z0 = svdiv_n_s32_m (p0, z0, -1), + z0 = svdiv_m (p0, z0, -1)) + /* ** div_1_s32_m_tied1: ** ret @@ -63,6 +72,16 @@ TEST_UNIFORM_Z (div_1_s32_m_tied1, svint32_t, z0 = svdiv_n_s32_m (p0, z0, 1), z0 = svdiv_m (p0, z0, 1)) +/* +** div_m1_s32_m_untied: +** movprfx z0, z1 +** neg z0\.s, p0/m, z1\.s +** ret +*/ +TEST_UNIFORM_Z (div_m1_s32_m_untied, svint32_t, + z0 = svdiv_n_s32_m (p0, z1, -1), + z0 = svdiv_m (p0, z1, -1)) + /* ** div_1_s32_m_untied: ** mov z0\.d, z1\.d @@ -214,6 +233,17 @@ TEST_UNIFORM_ZX (div_w0_s32_z_untied, svint32_t, int32_t, z0 = svdiv_n_s32_z (p0, z1, x0), z0 = svdiv_z (p0, z1, x0)) +/* +** div_m1_s32_z_tied1: +** mov (z[0-9]+)\.d, z0\.d +** movprfx z0\.s, p0/z, \1\.s +** neg z0\.s, p0/m, \1\.s +** ret +*/ +TEST_UNIFORM_Z (div_m1_s32_z_tied1, svint32_t, + z0 = svdiv_n_s32_z (p0, z0, -1), + z0 = svdiv_z (p0, z0, -1)) + /* ** div_1_s32_z_tied1: ** mov (z[0-9]+)\.b, #0 @@ -224,6 +254,16 @@ TEST_UNIFORM_Z (div_1_s32_z_tied1, svint32_t, z0 = svdiv_n_s32_z (p0, z0, 1), z0 = svdiv_z (p0, z0, 1)) +/* +** div_m1_s32_z_untied: +** movprfx z0\.s, p0/z, z1\.s +** neg z0\.s, p0/m, z1\.s +** ret +*/ +TEST_UNIFORM_Z (div_m1_s32_z_untied, svint32_t, + z0 = svdiv_n_s32_z (p0, z1, -1), + z0 = svdiv_z (p0, z1, -1)) + /* ** div_1_s32_z_untied: ** mov (z[0-9]+)\.b, #0 @@ -381,6 +421,15 @@ TEST_UNIFORM_ZX (div_w0_s32_x_untied, svint32_t, int32_t, z0 = svdiv_n_s32_x (p0, z1, x0), z0 = svdiv_x (p0, z1, x0)) +/* +** div_m1_s32_x_tied1: +** neg z0\.s, p0/m, z0\.s +** ret +*/ +TEST_UNIFORM_Z (div_m1_s32_x_tied1, svint32_t, + z0 = svdiv_n_s32_x (p0, z0, -1), + z0 = svdiv_x (p0, z0, -1)) + /* ** div_1_s32_x_tied1: ** ret @@ -389,6 +438,16 @@ TEST_UNIFORM_Z (div_1_s32_x_tied1, svint32_t, z0 = svdiv_n_s32_x (p0, z0, 1), z0 = svdiv_x (p0, z0, 1)) +/* +** div_m1_s32_x_untied: +** movprfx z0, z1 +** neg z0\.s, p0/m, z1\.s 
+** ret +*/ +TEST_UNIFORM_Z (div_m1_s32_x_untied, svint32_t, + z0 = svdiv_n_s32_x (p0, z1, -1), + z0 = svdiv_x (p0, z1, -1)) + /* ** div_1_s32_x_untied: ** mov z0\.d, z1\.d diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_s16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_s16.c index 381f3356025..2d780966e78 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_s16.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_s16.c @@ -183,8 +183,7 @@ TEST_UNIFORM_Z (mul_3_s16_m_untied, svint16_t, /* ** mul_m1_s16_m: -** mov (z[0-9]+)\.b, #-1 -** mul z0\.h, p0/m, z0\.h, \1\.h +** neg z0\.h, p0/m, z0\.h ** ret */ TEST_UNIFORM_Z (mul_m1_s16_m, svint16_t, @@ -597,7 +596,7 @@ TEST_UNIFORM_Z (mul_255_s16_x, svint16_t, /* ** mul_m1_s16_x: -** mul z0\.h, z0\.h, #-1 +** neg z0\.h, p0/m, z0\.h ** ret */ TEST_UNIFORM_Z (mul_m1_s16_x, svint16_t, diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_s32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_s32.c index 13009d88619..1d605dbdd8d 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_s32.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_s32.c @@ -183,14 +183,25 @@ TEST_UNIFORM_Z (mul_3_s32_m_untied, svint32_t, /* ** mul_m1_s32_m: -** mov (z[0-9]+)\.b, #-1 -** mul z0\.s, p0/m, z0\.s, \1\.s +** neg z0\.s, p0/m, z0\.s ** ret */ TEST_UNIFORM_Z (mul_m1_s32_m, svint32_t, z0 = svmul_n_s32_m (p0, z0, -1), z0 = svmul_m (p0, z0, -1)) +/* +** mul_m1r_s32_m: +** mov (z[0-9]+)\.b, #-1 +** mov (z[0-9]+)\.d, z0\.d +** movprfx z0, \1 +** neg z0\.s, p0/m, \2\.s +** ret +*/ +TEST_UNIFORM_Z (mul_m1r_s32_m, svint32_t, + z0 = svmul_s32_m (p0, svdup_s32 (-1), z0), + z0 = svmul_m (p0, svdup_s32 (-1), z0)) + /* ** mul_s32_z_tied1: ** movprfx z0\.s, p0/z, z0\.s @@ -597,13 +608,44 @@ TEST_UNIFORM_Z (mul_255_s32_x, svint32_t, /* ** mul_m1_s32_x: -** mul z0\.s, z0\.s, #-1 +** neg z0\.s, p0/m, z0\.s ** ret */ TEST_UNIFORM_Z (mul_m1_s32_x, svint32_t, z0 = svmul_n_s32_x (p0, z0, -1), z0 = svmul_x (p0, z0, -1)) +/* +** mul_m1r_s32_x: +** neg z0\.s, p0/m, z0\.s +** ret +*/ +TEST_UNIFORM_Z (mul_m1r_s32_x, svint32_t, + z0 = svmul_s32_x (p0, svdup_s32 (-1), z0), + z0 = svmul_x (p0, svdup_s32 (-1), z0)) + +/* +** mul_m1_s32_z: +** mov (z[0-9]+)\.d, z0\.d +** movprfx z0\.s, p0/z, \1\.s +** neg z0\.s, p0/m, \1\.s +** ret +*/ +TEST_UNIFORM_Z (mul_m1_s32_z, svint32_t, + z0 = svmul_n_s32_z (p0, z0, -1), + z0 = svmul_z (p0, z0, -1)) + +/* +** mul_m1r_s32_z: +** mov (z[0-9]+)\.d, z0\.d +** movprfx z0\.s, p0/z, \1\.s +** neg z0\.s, p0/m, \1\.s +** ret +*/ +TEST_UNIFORM_Z (mul_m1r_s32_z, svint32_t, + z0 = svmul_s32_z (p0, svdup_s32 (-1), z0), + z0 = svmul_z (p0, svdup_s32 (-1), z0)) + /* ** mul_m127_s32_x: ** mul z0\.s, z0\.s, #-127 diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_s64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_s64.c index 530d9fc84a5..c05d184f2fe 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_s64.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_s64.c @@ -192,8 +192,7 @@ TEST_UNIFORM_Z (mul_3_s64_m_untied, svint64_t, /* ** mul_m1_s64_m: -** mov (z[0-9]+)\.b, #-1 -** mul z0\.d, p0/m, z0\.d, \1\.d +** neg z0\.d, p0/m, z0\.d ** ret */ TEST_UNIFORM_Z (mul_m1_s64_m, svint64_t, @@ -625,7 +624,7 @@ TEST_UNIFORM_Z (mul_255_s64_x, svint64_t, /* ** mul_m1_s64_x: -** mul z0\.d, z0\.d, #-1 +** neg z0\.d, p0/m, z0\.d ** ret */ TEST_UNIFORM_Z (mul_m1_s64_x, svint64_t, diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_s8.c 
b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_s8.c index 0c90a8bb832..efc952e3a52 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_s8.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_s8.c @@ -183,8 +183,7 @@ TEST_UNIFORM_Z (mul_3_s8_m_untied, svint8_t, /* ** mul_m1_s8_m: -** mov (z[0-9]+)\.b, #-1 -** mul z0\.b, p0/m, z0\.b, \1\.b +** neg z0\.b, p0/m, z0\.b ** ret */ TEST_UNIFORM_Z (mul_m1_s8_m, svint8_t, @@ -587,7 +586,7 @@ TEST_UNIFORM_Z (mul_128_s8_x, svint8_t, /* ** mul_255_s8_x: -** mul z0\.b, z0\.b, #-1 +** neg z0\.b, p0/m, z0\.b ** ret */ TEST_UNIFORM_Z (mul_255_s8_x, svint8_t, @@ -596,7 +595,7 @@ TEST_UNIFORM_Z (mul_255_s8_x, svint8_t, /* ** mul_m1_s8_x: -** mul z0\.b, z0\.b, #-1 +** neg z0\.b, p0/m, z0\.b ** ret */ TEST_UNIFORM_Z (mul_m1_s8_x, svint8_t, diff --git a/gcc/testsuite/gcc.target/aarch64/sve/div_const_run.c b/gcc/testsuite/gcc.target/aarch64/sve/div_const_run.c index c96bb2763dc..60cf8345d6a 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/div_const_run.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/div_const_run.c @@ -42,7 +42,9 @@ typedef svuint64_t svuint64_ __attribute__((arm_sve_vector_bits(128))); TEST_TYPES_1 (uint64, u64) #define TEST_VALUES_S_1(B, OP1, OP2) \ - F (int##B, s##B, x, OP1, OP2) + F (int##B, s##B, x, OP1, OP2) \ + F (int##B, s##B, z, OP1, OP2) \ + F (int##B, s##B, m, OP1, OP2) #define TEST_VALUES_S \ TEST_VALUES_S_1 (32, INT32_MIN, INT32_MIN) \ @@ -60,7 +62,11 @@ typedef svuint64_t svuint64_ __attribute__((arm_sve_vector_bits(128))); TEST_VALUES_S_1 (32, INT32_MAX, -5) \ TEST_VALUES_S_1 (64, INT64_MAX, -5) \ TEST_VALUES_S_1 (32, INT32_MIN, -4) \ - TEST_VALUES_S_1 (64, INT64_MIN, -4) + TEST_VALUES_S_1 (64, INT64_MIN, -4) \ + TEST_VALUES_S_1 (32, INT32_MAX, -1) \ + TEST_VALUES_S_1 (32, -7, -1) \ + TEST_VALUES_S_1 (64, INT64_MIN, -1) \ + TEST_VALUES_S_1 (64, 16, -1) #define TEST_VALUES_U_1(B, OP1, OP2) \ F (uint##B, u##B, x, OP1, OP2) diff --git a/gcc/testsuite/gcc.target/aarch64/sve/mul_const_run.c b/gcc/testsuite/gcc.target/aarch64/sve/mul_const_run.c index c369d5be167..eb897d622fc 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/mul_const_run.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/mul_const_run.c @@ -44,7 +44,9 @@ typedef svuint64_t svuint64_ __attribute__((arm_sve_vector_bits(128))); TEST_TYPES_1 (uint64, u64) #define TEST_VALUES_S_1(B, OP1, OP2) \ - F (int##B, s##B, x, OP1, OP2) + F (int##B, s##B, x, OP1, OP2) \ + F (int##B, s##B, m, OP1, OP2) \ + F (int##B, s##B, z, OP1, OP2) #define TEST_VALUES_S \ TEST_VALUES_S_1 (32, INT32_MIN, INT32_MIN) \ @@ -70,7 +72,11 @@ typedef svuint64_t svuint64_ __attribute__((arm_sve_vector_bits(128))); TEST_VALUES_S_1 (32, INT32_MAX, -5) \ TEST_VALUES_S_1 (64, INT64_MAX, -5) \ TEST_VALUES_S_1 (32, INT32_MIN, -4) \ - TEST_VALUES_S_1 (64, INT64_MIN, -4) + TEST_VALUES_S_1 (64, INT64_MIN, -4) \ + TEST_VALUES_S_1 (32, INT32_MAX, -1) \ + TEST_VALUES_S_1 (32, -7, -1) \ + TEST_VALUES_S_1 (64, INT64_MIN, -1) \ + TEST_VALUES_S_1 (64, 16, -1) #define TEST_VALUES_U_1(B, OP1, OP2) \ F (uint##B, u##B, x, OP1, OP2)
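
For readers without the dg harness, a rough standalone analogue of the new
div_const_run.c/mul_const_run.c coverage could look like the sketch below
(assumptions: an SVE-enabled compiler and target, and a vector length of at
most 2048 bits so the fixed-size buffers suffice; everything here is
illustrative rather than part of the patch):

#include <arm_sve.h>
#include <stdint.h>
#include <stdlib.h>

/* Sketch of a runtime check: svmul and svdiv by -1 must match plain
   negation on every active lane.  */
int
main (void)
{
  svbool_t pg = svptrue_b32 ();
  int n = svcntw ();	/* number of 32-bit lanes */
  int32_t in[64], out_mul[64], out_div[64];

  for (int i = 0; i < n; i++)
    in[i] = i - n / 2;

  svint32_t x = svld1_s32 (pg, in);
  svst1_s32 (pg, out_mul, svmul_n_s32_x (pg, x, -1));
  svst1_s32 (pg, out_div, svdiv_n_s32_x (pg, x, -1));

  for (int i = 0; i < n; i++)
    if (out_mul[i] != -in[i] || out_div[i] != -in[i])
      abort ();
  return 0;
}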