From patchwork Mon Nov 11 15:11:19 2024
X-Patchwork-Submitter: Jennifer Schmitz
X-Patchwork-Id: 2009855
From: Jennifer Schmitz
To: gcc-patches@gcc.gnu.org
Cc: Richard Sandiford, Kyrylo Tkachov
Subject: [PATCH] SVE intrinsics: Fold svmul and svdiv by -1 to svneg for unsigned types
Date: Mon, 11 Nov 2024 15:11:19 +0000
Message-ID: <8B42CDB5-96A6-43BD-ACF7-C4D04F5C98E5@nvidia.com>

As follow-up to
https://gcc.gnu.org/pipermail/gcc-patches/2024-October/665472.html,
this patch implements folding of svmul and svdiv by -1 to svneg for
unsigned SVE vector types.  The key idea is to reuse the existing code
that does this fold for signed types and feed it as a callback to a
helper function that adds the necessary type conversions.

For example, for the test case

svuint64_t foo (svuint64_t x, svbool_t pg)
{
  return svmul_n_u64_x (pg, x, -1);
}

the following gimple sequence is emitted (-O2 -mcpu=grace):

svuint64_t foo (svuint64_t x, svbool_t pg)
{
  svuint64_t D.12921;
  svint64_t D.12920;
  svuint64_t D.12919;

  D.12920 = VIEW_CONVERT_EXPR<svint64_t>(x);
  D.12921 = svneg_s64_x (pg, D.12920);
  D.12919 = VIEW_CONVERT_EXPR<svuint64_t>(D.12921);
  goto ;
  :
  return D.12919;
}

In general, the new helper gimple_folder::convert_and_fold
- takes a target type and a function pointer,
- converts all non-boolean vector types to the target type,
- replaces the converted arguments in the function call,
- calls the callback function,
- adds the necessary view converts to the gimple sequence,
- and returns the new call.

Because all arguments are converted to the same target type, the helper
function is only suitable for folding calls whose arguments are all of
the same type.  If necessary, this could be extended to convert the
arguments to different types differentially.

The patch was bootstrapped and tested on aarch64-linux-gnu, no regression.
OK for mainline?

Signed-off-by: Jennifer Schmitz

gcc/ChangeLog:

	* config/aarch64/aarch64-sve-builtins-base.cc (svmul_impl::fold):
	Wrap code for folding to svneg in lambda function and pass to
	gimple_folder::convert_and_fold to enable the transform for
	unsigned types.
	(svdiv_impl::fold): Likewise.
	* config/aarch64/aarch64-sve-builtins.cc
	(gimple_folder::convert_and_fold): New function that converts
	operands to target type before calling callback function, adding
	the necessary conversion statements.
	* config/aarch64/aarch64-sve-builtins.h
	(gimple_folder::convert_and_fold): Declare function.
	(signed_type_suffix_index): Return type_suffix_index of signed
	vector type for given width.
	(function_instance::signed_type): Return signed vector type for
	given width.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/sve/acle/asm/div_u32.c: Adjust expected outcome.
	* gcc.target/aarch64/sve/acle/asm/div_u64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/mul_u8.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/mul_u16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/mul_u32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/mul_u64.c: New test and adjust
	expected outcome.
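As an additional illustration (not part of the patch): assuming an aarch64
compiler with SVE support, the merging (_m) variant of the example above is
now folded in the same way and compiles at -O2 to a single predicated
negation, "neg z0.d, p0/m, z0.d", matching the adjusted mul_m1_u64_m test
in this patch.  The function name below is made up for the example.

#include <arm_sve.h>

/* Multiplication of an unsigned vector by -1 using the merging (_m)
   predication form; with this patch it is folded to svneg just like
   the _x example above.  */
svuint64_t
mul_m1_m_example (svbool_t pg, svuint64_t x)
{
  return svmul_n_u64_m (pg, x, -1);
}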
---
 .../aarch64/aarch64-sve-builtins-base.cc      | 99 ++++++++++++-------
 gcc/config/aarch64/aarch64-sve-builtins.cc    | 40 ++++++++
 gcc/config/aarch64/aarch64-sve-builtins.h     | 30 ++++++
 .../gcc.target/aarch64/sve/acle/asm/div_u32.c |  9 ++
 .../gcc.target/aarch64/sve/acle/asm/div_u64.c |  9 ++
 .../gcc.target/aarch64/sve/acle/asm/mul_u16.c |  5 +-
 .../gcc.target/aarch64/sve/acle/asm/mul_u32.c |  5 +-
 .../gcc.target/aarch64/sve/acle/asm/mul_u64.c | 26 ++++-
 .../gcc.target/aarch64/sve/acle/asm/mul_u8.c  |  7 +-
 9 files changed, 180 insertions(+), 50 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
index 1c9f515a52c..6df14a8f4c4 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
@@ -769,24 +769,33 @@ public:
     return f.fold_active_lanes_to (build_zero_cst (TREE_TYPE (f.lhs)));
 
     /* If the divisor is all integer -1, fold to svneg.  */
-    tree pg = gimple_call_arg (f.call, 0);
-    if (!f.type_suffix (0).unsigned_p && integer_minus_onep (op2))
+    if (integer_minus_onep (op2))
       {
-        function_instance instance ("svneg", functions::svneg,
-                                    shapes::unary, MODE_none,
-                                    f.type_suffix_ids, GROUP_none, f.pred);
-        gcall *call = f.redirect_call (instance);
-        unsigned offset_index = 0;
-        if (f.pred == PRED_m)
+        auto div_by_m1 = [](gimple_folder &f) -> gcall *
           {
-            offset_index = 1;
-            gimple_call_set_arg (call, 0, op1);
-          }
-        else
-          gimple_set_num_ops (call, 5);
-        gimple_call_set_arg (call, offset_index, pg);
-        gimple_call_set_arg (call, offset_index + 1, op1);
-        return call;
+            tree pg = gimple_call_arg (f.call, 0);
+            tree op1 = gimple_call_arg (f.call, 1);
+            type_suffix_pair signed_tsp =
+              {signed_type_suffix_index (f.type_suffix (0).element_bits),
+               f.type_suffix_ids[1]};
+            function_instance instance ("svneg", functions::svneg,
+                                        shapes::unary, MODE_none,
+                                        signed_tsp, GROUP_none, f.pred);
+            gcall *call = f.redirect_call (instance);
+            unsigned offset = 0;
+            if (f.pred == PRED_m)
+              {
+                offset = 1;
+                gimple_call_set_arg (call, 0, op1);
+              }
+            else
+              gimple_set_num_ops (call, 5);
+            gimple_call_set_arg (call, offset, pg);
+            gimple_call_set_arg (call, offset + 1, op1);
+            return call;
+          };
+        tree ty = f.signed_type (f.type_suffix (0).element_bits);
+        return f.convert_and_fold (ty, div_by_m1);
       }
 
     /* If the divisor is a uniform power of 2, fold to a shift
@@ -2082,33 +2091,49 @@ public:
     return f.fold_active_lanes_to (build_zero_cst (TREE_TYPE (f.lhs)));
 
     /* If one of the operands is all integer -1, fold to svneg.  */
-    tree pg = gimple_call_arg (f.call, 0);
-    tree negated_op = NULL;
-    if (integer_minus_onep (op2))
-      negated_op = op1;
-    else if (integer_minus_onep (op1))
-      negated_op = op2;
-    if (!f.type_suffix (0).unsigned_p && negated_op)
+    if (integer_minus_onep (op1) || integer_minus_onep (op2))
       {
-        function_instance instance ("svneg", functions::svneg,
-                                    shapes::unary, MODE_none,
-                                    f.type_suffix_ids, GROUP_none, f.pred);
-        gcall *call = f.redirect_call (instance);
-        unsigned offset_index = 0;
-        if (f.pred == PRED_m)
+        auto mul_by_m1 = [](gimple_folder &f) -> gcall *
           {
-            offset_index = 1;
-            gimple_call_set_arg (call, 0, op1);
-          }
-        else
-          gimple_set_num_ops (call, 5);
-        gimple_call_set_arg (call, offset_index, pg);
-        gimple_call_set_arg (call, offset_index + 1, negated_op);
-        return call;
+            tree pg = gimple_call_arg (f.call, 0);
+            tree op1 = gimple_call_arg (f.call, 1);
+            tree op2 = gimple_call_arg (f.call, 2);
+            tree negated_op = op1;
+            bool negate_op1 = true;
+            if (integer_minus_onep (op1))
+              {
+                negated_op = op2;
+                negate_op1 = false;
+              }
+            type_suffix_pair signed_tsp =
+              {signed_type_suffix_index (f.type_suffix (0).element_bits),
+               f.type_suffix_ids[1]};
+            function_instance instance ("svneg", functions::svneg,
+                                        shapes::unary, MODE_none,
+                                        signed_tsp, GROUP_none, f.pred);
+            gcall *call = f.redirect_call (instance);
+            unsigned offset = 0;
+            if (f.pred == PRED_m)
+              {
+                offset = 1;
+                tree ty = f.signed_type (f.type_suffix (0).element_bits);
+                tree inactive = negate_op1 ? gimple_call_arg (f.call, 1)
+                                           : build_minus_one_cst (ty);
+                gimple_call_set_arg (call, 0, inactive);
+              }
+            else
+              gimple_set_num_ops (call, 5);
+            gimple_call_set_arg (call, offset, pg);
+            gimple_call_set_arg (call, offset + 1, negated_op);
+            return call;
+          };
+        tree ty = f.signed_type (f.type_suffix (0).element_bits);
+        return f.convert_and_fold (ty, mul_by_m1);
       }
 
     /* If one of the operands is a uniform power of 2, fold to a left shift
        by immediate.  */
+    tree pg = gimple_call_arg (f.call, 0);
     tree op1_cst = uniform_integer_cst_p (op1);
     tree op2_cst = uniform_integer_cst_p (op2);
     tree shift_op1, shift_op2 = NULL;
diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc b/gcc/config/aarch64/aarch64-sve-builtins.cc
index 44b7f6edae5..6523a782dac 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins.cc
@@ -3576,6 +3576,46 @@ gimple_folder::redirect_pred_x ()
   return redirect_call (instance);
 }
 
+/* Convert all non-boolean vector-type operands to TYPE, fold the call
+   using callback F, and convert the result back to the original type.
+   Add the necessary conversion statements.  Return the updated call.  */
+gimple *
+gimple_folder::convert_and_fold (tree type, gcall *(*fp) (gimple_folder &))
+{
+  gcc_assert (VECTOR_TYPE_P (type)
+              && TYPE_MODE (type) != VNx16BImode);
+  tree old_type = TREE_TYPE (lhs);
+  if (useless_type_conversion_p (type, old_type))
+    return fp (*this);
+
+  unsigned int num_args = gimple_call_num_args (call);
+  gimple_seq stmts = NULL;
+  tree op, op_type, t1, t2, t3;
+  gimple *g;
+  gcall *new_call;
+  for (unsigned int i = 0; i < num_args; ++i)
+    {
+      op = gimple_call_arg (call, i);
+      op_type = TREE_TYPE (op);
+      if (VECTOR_TYPE_P (op_type)
+          && TREE_CODE (op) != VECTOR_CST
+          && TYPE_MODE (op_type) != VNx16BImode)
+        {
+          t1 = gimple_build (&stmts, VIEW_CONVERT_EXPR, type, op);
+          gimple_call_set_arg (call, i, t1);
+        }
+    }
+
+  new_call = fp (*this);
+  gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
+  t2 = create_tmp_var (gimple_call_return_type (new_call));
+  gimple_call_set_lhs (new_call, t2);
+  t3 = build1 (VIEW_CONVERT_EXPR, old_type, t2);
+  g = gimple_build_assign (lhs, VIEW_CONVERT_EXPR, t3);
+  gsi_insert_after (gsi, g, GSI_SAME_STMT);
+  return new_call;
+}
+
 /* Fold the call to constant VAL.  */
 gimple *
 gimple_folder::fold_to_cstu (poly_uint64 val)
diff --git a/gcc/config/aarch64/aarch64-sve-builtins.h b/gcc/config/aarch64/aarch64-sve-builtins.h
index d5cc6e0a40d..b48fed0df6b 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins.h
+++ b/gcc/config/aarch64/aarch64-sve-builtins.h
@@ -406,6 +406,7 @@ public:
   tree scalar_type (unsigned int) const;
   tree vector_type (unsigned int) const;
   tree tuple_type (unsigned int) const;
+  tree signed_type (unsigned int) const;
   unsigned int elements_per_vq (unsigned int) const;
   machine_mode vector_mode (unsigned int) const;
   machine_mode tuple_mode (unsigned int) const;
@@ -630,6 +631,7 @@ public:
 
   gcall *redirect_call (const function_instance &);
   gimple *redirect_pred_x ();
+  gimple *convert_and_fold (tree, gcall *(*) (gimple_folder &));
 
   gimple *fold_to_cstu (poly_uint64);
   gimple *fold_to_pfalse ();
@@ -860,6 +862,20 @@ find_type_suffix (type_class_index tclass, unsigned int element_bits)
   gcc_unreachable ();
 }
 
+/* Return the type suffix of the signed type of width ELEMENT_BITS.  */
+inline type_suffix_index
+signed_type_suffix_index (unsigned int element_bits)
+{
+  switch (element_bits)
+    {
+    case 8: return TYPE_SUFFIX_s8;
+    case 16: return TYPE_SUFFIX_s16;
+    case 32: return TYPE_SUFFIX_s32;
+    case 64: return TYPE_SUFFIX_s64;
+    }
+  gcc_unreachable ();
+}
+
 /* Return the single field in tuple type TYPE.  */
 inline tree
 tuple_type_field (tree type)
@@ -1025,6 +1041,20 @@ function_instance::tuple_type (unsigned int i) const
   return acle_vector_types[num_vectors - 1][type_suffix (i).vector_type];
 }
 
+/* Return the signed vector type of width ELEMENT_BITS.  */
+inline tree
+function_instance::signed_type (unsigned int element_bits) const
+{
+  switch (element_bits)
+    {
+    case 8: return acle_vector_types[0][VECTOR_TYPE_svint8_t];
+    case 16: return acle_vector_types[0][VECTOR_TYPE_svint16_t];
+    case 32: return acle_vector_types[0][VECTOR_TYPE_svint32_t];
+    case 64: return acle_vector_types[0][VECTOR_TYPE_svint64_t];
+    }
+  gcc_unreachable ();
+}
+
 /* Return the number of elements of type suffix I that fit within a
    128-bit block.  */
 inline unsigned int
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_u32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_u32.c
index 1e8d6104845..fcadc05d75b 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_u32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_u32.c
@@ -72,6 +72,15 @@ TEST_UNIFORM_Z (div_1_u32_m_untied, svuint32_t,
 		z0 = svdiv_n_u32_m (p0, z1, 1),
 		z0 = svdiv_m (p0, z1, 1))
 
+/*
+** div_m1_u32_m_tied1:
+**	neg	z0\.s, p0/m, z0\.s
+**	ret
+*/
+TEST_UNIFORM_Z (div_m1_u32_m_tied1, svuint32_t,
+		z0 = svdiv_n_u32_m (p0, z0, -1),
+		z0 = svdiv_m (p0, z0, -1))
+
 /*
 ** div_2_u32_m_tied1:
 **	lsr	z0\.s, p0/m, z0\.s, #1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_u64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_u64.c
index ab049affb63..b95d5af13d8 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_u64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_u64.c
@@ -72,6 +72,15 @@ TEST_UNIFORM_Z (div_1_u64_m_untied, svuint64_t,
 		z0 = svdiv_n_u64_m (p0, z1, 1),
 		z0 = svdiv_m (p0, z1, 1))
 
+/*
+** div_m1_u64_m_tied1:
+**	neg	z0\.d, p0/m, z0\.d
+**	ret
+*/
+TEST_UNIFORM_Z (div_m1_u64_m_tied1, svuint64_t,
+		z0 = svdiv_n_u64_m (p0, z0, -1),
+		z0 = svdiv_m (p0, z0, -1))
+
 /*
 ** div_2_u64_m_tied1:
 **	lsr	z0\.d, p0/m, z0\.d, #1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u16.c
index bdf6fcb98d6..e228dc5995d 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u16.c
@@ -174,8 +174,7 @@ TEST_UNIFORM_Z (mul_3_u16_m_untied, svuint16_t,
 
 /*
 ** mul_m1_u16_m:
-**	mov	(z[0-9]+)\.b, #-1
-**	mul	z0\.h, p0/m, z0\.h, \1\.h
+**	neg	z0\.h, p0/m, z0\.h
 **	ret
 */
 TEST_UNIFORM_Z (mul_m1_u16_m, svuint16_t,
@@ -569,7 +568,7 @@ TEST_UNIFORM_Z (mul_255_u16_x, svuint16_t,
 
 /*
 ** mul_m1_u16_x:
-**	mul	z0\.h, z0\.h, #-1
+**	neg	z0\.h, p0/m, z0\.h
 **	ret
 */
 TEST_UNIFORM_Z (mul_m1_u16_x, svuint16_t,
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u32.c
index a61e85fa12d..e8f52c9d785 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u32.c
@@ -174,8 +174,7 @@ TEST_UNIFORM_Z (mul_3_u32_m_untied, svuint32_t,
 
 /*
 ** mul_m1_u32_m:
-**	mov	(z[0-9]+)\.b, #-1
-**	mul	z0\.s, p0/m, z0\.s, \1\.s
+**	neg	z0\.s, p0/m, z0\.s
 **	ret
 */
 TEST_UNIFORM_Z (mul_m1_u32_m, svuint32_t,
@@ -569,7 +568,7 @@ TEST_UNIFORM_Z (mul_255_u32_x, svuint32_t,
 
 /*
 ** mul_m1_u32_x:
-**	mul	z0\.s, z0\.s, #-1
+**	neg	z0\.s, p0/m, z0\.s
 **	ret
 */
 TEST_UNIFORM_Z (mul_m1_u32_x, svuint32_t,
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u64.c
index eee1f8a0c99..2ccdc3642c5 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u64.c
@@ -183,14 +183,25 @@ TEST_UNIFORM_Z (mul_3_u64_m_untied, svuint64_t,
 
 /*
 ** mul_m1_u64_m:
-**	mov	(z[0-9]+)\.b, #-1
-**	mul	z0\.d, p0/m, z0\.d, \1\.d
+**	neg	z0\.d, p0/m, z0\.d
 **	ret
 */
 TEST_UNIFORM_Z (mul_m1_u64_m, svuint64_t,
 		z0 = svmul_n_u64_m (p0, z0, -1),
 		z0 = svmul_m (p0, z0, -1))
 
+/*
+** mul_m1r_u64_m:
+**	mov	(z[0-9]+)\.b, #-1
+**	mov	(z[0-9]+\.d), z0\.d
+**	movprfx	z0, \1
+**	neg	z0\.d, p0/m, \2
+**	ret
+*/
+TEST_UNIFORM_Z (mul_m1r_u64_m, svuint64_t,
+		z0 = svmul_u64_m (p0, svdup_u64 (-1), z0),
+		z0 = svmul_m (p0, svdup_u64 (-1), z0))
+
 /*
 ** mul_u64_z_tied1:
 **	movprfx	z0\.d, p0/z, z0\.d
@@ -597,13 +608,22 @@ TEST_UNIFORM_Z (mul_255_u64_x, svuint64_t,
 
 /*
 ** mul_m1_u64_x:
-**	mul	z0\.d, z0\.d, #-1
+**	neg	z0\.d, p0/m, z0\.d
 **	ret
 */
 TEST_UNIFORM_Z (mul_m1_u64_x, svuint64_t,
 		z0 = svmul_n_u64_x (p0, z0, -1),
 		z0 = svmul_x (p0, z0, -1))
 
+/*
+** mul_m1r_u64_x:
+**	neg	z0\.d, p0/m, z0\.d
+**	ret
+*/
+TEST_UNIFORM_Z (mul_m1r_u64_x, svuint64_t,
+		z0 = svmul_u64_x (p0, svdup_u64 (-1), z0),
+		z0 = svmul_x (p0, svdup_u64 (-1), z0))
+
 /*
 ** mul_m127_u64_x:
 **	mul	z0\.d, z0\.d, #-127
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u8.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u8.c
index 06ee1b3e7c8..8e53a4821f0 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u8.c
@@ -174,8 +174,7 @@ TEST_UNIFORM_Z (mul_3_u8_m_untied, svuint8_t,
 
 /*
 ** mul_m1_u8_m:
-**	mov	(z[0-9]+)\.b, #-1
-**	mul	z0\.b, p0/m, z0\.b, \1\.b
+**	neg	z0\.b, p0/m, z0\.b
 **	ret
 */
 TEST_UNIFORM_Z (mul_m1_u8_m, svuint8_t,
@@ -559,7 +558,7 @@ TEST_UNIFORM_Z (mul_128_u8_x, svuint8_t,
 
 /*
 ** mul_255_u8_x:
-**	mul	z0\.b, z0\.b, #-1
+**	neg	z0\.b, p0/m, z0\.b
 **	ret
 */
 TEST_UNIFORM_Z (mul_255_u8_x, svuint8_t,
@@ -568,7 +567,7 @@ TEST_UNIFORM_Z (mul_255_u8_x, svuint8_t,
 
 /*
 ** mul_m1_u8_x:
-**	mul	z0\.b, z0\.b, #-1
+**	neg	z0\.b, p0/m, z0\.b
 **	ret
 */
 TEST_UNIFORM_Z (mul_m1_u8_x, svuint8_t,