From patchwork Mon Oct 14 10:56:14 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Tamar Christina
X-Patchwork-Id: 1996809
Date: Mon, 14 Oct 2024 11:56:14 +0100
From: Tamar Christina
To: gcc-patches@gcc.gnu.org
Cc: nd@arm.com, Richard.Earnshaw@arm.com, ktkachov@gcc.gnu.org, richard.sandiford@arm.com
Subject: [PATCH 3/4]AArch64: enable zero-extends using TBLs for Adv. SIMD
Content-Disposition: inline
X-BeenThere: gcc-patches@gcc.gnu.org
List-Id: Gcc-patches mailing list
Errors-To: gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org

Hi All,

In this patch series I'm adding support for zero extending using permutes
instead of requiring multi-step decomposition.

This codegen has the benefit of needing fewer instructions and having much
higher throughput than uxtl.  We previously replaced pairs of uxtl/uxtl2s with
ZIPs to increase throughput, but latency was still an issue due to the
dependency chain created.

To fix this we can now use TBLs.  The indexes are hoisted out of the loop as
well and can be shared amongst zero extends of the same type.  The additional
benefit of this is that if the values are being permuted after extension, the
permutes will be simplified and merged, leading to better overall code.

e.g. an LD4 can be replaced with LDR since the permutes being performed for the
extensions can be merged with the load permutes.  The way LOAD_LANES is
currently implemented means this can't be done in GCC yet, but I'm aiming for
this in GCC 15.

I've additionally only added support for non-VLA.  The problem with VLA is that
the index registers are hard or impossible to make.

On Adv. SIMD we use -1 to indicate an out-of-range index so we can transform the
two-register TBL into a one-register one.  However on VLA, for e.g. a byte
array, 255 would be a valid entry, e.g. at a VL of 2048 bits, where a vector
holds 256 bytes.  That means this is already not a transformation we can make.

Secondly the actual mask looks something like {x,x,x,n,n+1, x,x,x, n+2, n+3}
and while I think I can represent this in vect_perm_builder, I couldn't think of
any really efficient VLA way to build such masks.  It would require a lot of
setup code.

Lastly I don't think this transformation would make much sense for SVE, as SVE
has loads and converts that can do multi-step types.  For instance the loop
below is already pretty good for SVE (though it's missed that the load can do
more than one step, presumably because a single extend is merged only in RTL).
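As an illustration only (not part of the patch), the TBL-based zero extension
described above can be modelled in scalar C.  The sketch relies on one
architectural fact, that the Adv. SIMD TBL instruction writes zero for any
out-of-range index.  The function names (tbl16, make_zext_index,
zext_u8_to_u64) are hypothetical, and the index layout is an assumption about
a little-endian unsigned char to 64-bit extension rather than the exact
constants GCC emits.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of the one-register Adv. SIMD TBL on 16-byte vectors:
   any index outside 0..15 selects zero.  */
void
tbl16 (uint8_t out[16], const uint8_t table[16], const uint8_t idx[16])
{
  for (int i = 0; i < 16; i++)
    out[i] = idx[i] < 16 ? table[idx[i]] : 0;
}

/* Hypothetical helper: build the index vector that places source bytes
   2*k and 2*k+1 in the low byte of each 64-bit lane (little-endian lane
   layout assumed).  Every other index is 0xff (-1), i.e. out of range.  */
void
make_zext_index (uint8_t idx[16], int k)
{
  assert (k >= 0 && k < 8);
  memset (idx, 0xff, 16);
  idx[0] = (uint8_t) (2 * k);
  idx[8] = (uint8_t) (2 * k + 1);
}

/* Zero extend source bytes 2*k and 2*k+1 to two 64-bit lanes with a
   single modelled TBL.  */
void
zext_u8_to_u64 (uint64_t lanes[2], const uint8_t src[16], int k)
{
  uint8_t idx[16], bytes[16];

  make_zext_index (idx, k);
  tbl16 (bytes, src, idx);

  /* Assemble the lanes byte by byte so the result does not depend on the
     endianness of the host running this model.  */
  for (int lane = 0; lane < 2; lane++)
    {
      lanes[lane] = 0;
      for (int b = 0; b < 8; b++)
	lanes[lane] |= (uint64_t) bytes[8 * lane + b] << (8 * b);
    }
}
```

Each call models one TBL; widening a full 16-byte vector to 64-bit elements
takes eight of them (k = 0..7).  The eight index vectors depend only on k, not
on the data, which is why they can be set up once outside the loop.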

While I tried hard, for these reasons I don't support VLA, which I hope is ok.

Concretely on AArch64 this changes:

void test4(unsigned char *x, long long *y, int n) {
    for(int i = 0; i < n; i++) {
        y[i] = x[i];
    }
}

from generating:

.L4:
        ldr     q30, [x4], 16
        add     x3, x3, 128
        zip1    v1.16b, v30.16b, v31.16b
        zip2    v30.16b, v30.16b, v31.16b
        zip1    v2.8h, v1.8h, v31.8h
        zip1    v0.8h, v30.8h, v31.8h
        zip2    v1.8h, v1.8h, v31.8h
        zip2    v30.8h, v30.8h, v31.8h
        zip1    v26.4s, v2.4s, v31.4s
        zip1    v29.4s, v0.4s, v31.4s
        zip1    v28.4s, v1.4s, v31.4s
        zip1    v27.4s, v30.4s, v31.4s
        zip2    v2.4s, v2.4s, v31.4s
        zip2    v0.4s, v0.4s, v31.4s
        zip2    v1.4s, v1.4s, v31.4s
        zip2    v30.4s, v30.4s, v31.4s
        stp     q26, q2, [x3, -128]
        stp     q28, q1, [x3, -96]
        stp     q29, q0, [x3, -64]
        stp     q27, q30, [x3, -32]
        cmp     x4, x5
        bne     .L4

and instead we get:

.L4:
        add     x3, x3, 128
        ldr     q23, [x4], 16
        tbl     v5.16b, {v23.16b}, v31.16b
        tbl     v4.16b, {v23.16b}, v30.16b
        tbl     v3.16b, {v23.16b}, v29.16b
        tbl     v2.16b, {v23.16b}, v28.16b
        tbl     v1.16b, {v23.16b}, v27.16b
        tbl     v0.16b, {v23.16b}, v26.16b
        tbl     v22.16b, {v23.16b}, v25.16b
        tbl     v23.16b, {v23.16b}, v24.16b
        stp     q5, q4, [x3, -128]
        stp     q3, q2, [x3, -96]
        stp     q1, q0, [x3, -64]
        stp     q22, q23, [x3, -32]
        cmp     x4, x5
        bne     .L4

Which results in up to 40% performance uplift on certain workloads.

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* config/aarch64/aarch64.cc (aarch64_use_permute_for_promotion): New.
	(TARGET_VECTORIZE_USE_PERMUTE_FOR_PROMOTION): Use it.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/vect-tbl-zero-extend_1.c: New test.

---

--
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 102680a0efca1ce928e6945033c01cfb68a65152..b90577f4fc8157b3e02936256c8af8b2b7fac144 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -28404,6 +28404,29 @@ aarch64_empty_mask_is_expensive (unsigned)
   return false;
 }
 
+/* Implement TARGET_VECTORIZE_USE_PERMUTE_FOR_PROMOTION.  Assume that
+   predicated operations when available are beneficial when doing more than
+   one step conversion.  */
+
+static bool
+aarch64_use_permute_for_promotion (const_tree in_type, const_tree out_type)
+{
+  /* AArch64's vect_perm_constant doesn't currently support two 64 bit shuffles
+     into a 128 bit vector type.  So for now reject it.  */
+  if (maybe_ne (GET_MODE_BITSIZE (TYPE_MODE (in_type)),
+                GET_MODE_BITSIZE (TYPE_MODE (out_type))))
+    return false;
+
+  auto bitsize_in = element_precision (in_type);
+  auto bitsize_out = element_precision (out_type);
+
+  /* We don't want to use the permutes for a single widening step because we're
+     picking there between two zip and tbl sequences with the same throughput
+     and latencies.  However the zip doesn't require a mask and uses less
+     registers so we prefer that.  */
+  return (bitsize_out / bitsize_in) > 2;
+}
+
 /* Return 1 if pseudo register should be created and used to hold
    GOT address for PIC code.  */
 
@@ -31113,6 +31136,9 @@ aarch64_libgcc_floating_mode_supported_p
 #undef TARGET_VECTORIZE_CONDITIONAL_OPERATION_IS_EXPENSIVE
 #define TARGET_VECTORIZE_CONDITIONAL_OPERATION_IS_EXPENSIVE \
   aarch64_conditional_operation_is_expensive
+#undef TARGET_VECTORIZE_USE_PERMUTE_FOR_PROMOTION
+#define TARGET_VECTORIZE_USE_PERMUTE_FOR_PROMOTION \
+  aarch64_use_permute_for_promotion
 #undef TARGET_VECTORIZE_EMPTY_MASK_IS_EXPENSIVE
 #define TARGET_VECTORIZE_EMPTY_MASK_IS_EXPENSIVE \
   aarch64_empty_mask_is_expensive
diff --git a/gcc/testsuite/gcc.target/aarch64/vect-tbl-zero-extend_1.c b/gcc/testsuite/gcc.target/aarch64/vect-tbl-zero-extend_1.c
new file mode 100644
index 0000000000000000000000000000000000000000..3c088ced63543c203d1cc020de5d67807b48b3fb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vect-tbl-zero-extend_1.c
@@ -0,0 +1,52 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O3 -std=c99 -march=armv8-a" } */
+
+void test1(unsigned short *x, double *y, int n) {
+    for(int i = 0; i < (n & -8); i++) {
+        unsigned short a = x[i*4+0];
+        unsigned short b = x[i*4+1];
+        unsigned short c = x[i*4+2];
+        unsigned short d = x[i*4+3];
+        y[i] = (double)a + (double)b + (double)c + (double)d;
+    }
+}
+
+void test2(unsigned char *x, double *y, int n) {
+    for(int i = 0; i < (n & -8); i++) {
+        unsigned short a = x[i*4+0];
+        unsigned short b = x[i*4+1];
+        unsigned short c = x[i*4+2];
+        unsigned short d = x[i*4+3];
+        y[i] = (double)a + (double)b + (double)c + (double)d;
+    }
+}
+
+void test3(unsigned short *x, double *y, int n) {
+    for(int i = 0; i < (n & -8); i++) {
+        unsigned int a = x[i];
+        y[i] = (double)a;
+    }
+}
+
+void test4(unsigned short *x, long long *y, int n) {
+    for(int i = 0; i < (n & -8); i++) {
+        y[i] = x[i];
+    }
+}
+
+void test5(unsigned int *x, long long *y, int n) {
+    for(int i = 0; i < (n & -8); i++) {
+        y[i] = x[i];
+    }
+}
+
+void test6(unsigned char *x, long long *y, int n) {
+    for(int i = 0; i < (n & -8); i++) {
+        y[i] = x[i];
+    }
+}
+
+/* { dg-final { scan-assembler-times {\tzip1} 1 } } */
+/* { dg-final { scan-assembler-times {\tzip2} 1 } } */
+/* { dg-final { scan-assembler-times {\ttbl} 64 } } */
+/* { dg-final { scan-assembler-times {\.LC[0-9]+:} 12 } } */
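
As a reading aid only (not part of the patch), the decision made by
aarch64_use_permute_for_promotion can be restated on plain integers.  This is
a sketch under stated assumptions: the function name and its four parameters
are hypothetical, and the poly-int comparison (maybe_ne) is replaced by an
ordinary inequality, which only models fixed-length vectors.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-alone restatement of the hook's check.  Sizes are in
   bits; the vector sizes stand in for GET_MODE_BITSIZE of each vector type
   and the element sizes for element_precision.  */
bool
use_permute_for_promotion (unsigned in_vec_bits, unsigned out_vec_bits,
			   unsigned in_elem_bits, unsigned out_elem_bits)
{
  assert (in_elem_bits != 0);

  /* Reject conversions whose input and output vectors differ in size,
     e.g. a 64-bit input vector feeding a 128-bit output vector.  */
  if (in_vec_bits != out_vec_bits)
    return false;

  /* A single widening step (ratio of 2) stays on the ZIP sequence; only
     multi-step extensions use the permute.  */
  return out_elem_bits / in_elem_bits > 2;
}
```

Under this model an unsigned char to 64-bit extension on 128-bit vectors
(ratio 8) selects the permute, while an unsigned int to 64-bit extension
(ratio 2) does not.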