From patchwork Mon Oct 14 10:55:40 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Tamar Christina
X-Patchwork-Id: 1996807
Date: Mon, 14 Oct 2024 11:55:40 +0100
From: Tamar Christina
To: gcc-patches@gcc.gnu.org
Cc: nd@arm.com, rguenther@suse.de
Subject: [PATCH 1/4]middle-end: support multi-step zero-extends using VEC_PERM_EXPR
Content-Disposition: inline
List-Id: Gcc-patches mailing list

Hi All,

This patch series adds support for a target to do a direct conversion for zero
extends using permutes.

To do this it uses a target hook use_permute_for_promotion which must be
implemented by targets.  This hook is used to indicate:

 1. can a target do this for the given modes.
 2. is it profitable for the target to do it.
 3. can the target convert between various vector modes with a VIEW_CONVERT.
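For readers who want to see the idea in scalar terms, here is a minimal
illustrative sketch.  It is not part of the patch.  It assumes a little-endian
target, 16-byte vectors and an unsigned char to 64-bit extension, and it models
a two-input byte permute in which indices 0..15 pick from the data vector and
index 16 picks the first byte of an all-zero vector.  The names perm2,
build_selector and zext_u8_to_u64 are hypothetical and exist only for this
example.

```c
#include <assert.h>
#include <stdint.h>

enum {
  NBYTES = 16,                  /* bytes per vector (assumed 128-bit) */
  LANE_BYTES = 8,               /* bytes per 64-bit output lane */
  LANES = NBYTES / LANE_BYTES,  /* output lanes per vector */
  OUT_VECS = NBYTES / LANES     /* output vectors per input vector */
};

/* Two-input byte permute: selector values 0..15 read from A,
   values 16..31 read from B.  */
static void
perm2 (const uint8_t a[NBYTES], const uint8_t b[NBYTES],
       const uint8_t sel[NBYTES], uint8_t out[NBYTES])
{
  for (int i = 0; i < NBYTES; i++)
    {
      assert (sel[i] < 2 * NBYTES);
      out[i] = sel[i] < NBYTES ? a[sel[i]] : b[sel[i] - NBYTES];
    }
}

/* Build the selector for output vector K: flood fill with the index of
   the first zero byte, then place one input byte at the low byte of
   each 64-bit lane (little-endian layout assumed).  */
static void
build_selector (int k, uint8_t sel[NBYTES])
{
  for (int i = 0; i < NBYTES; i++)
    sel[i] = NBYTES;
  for (int l = 0; l < LANES; l++)
    sel[l * LANE_BYTES] = (uint8_t) (k * LANES + l);
}

/* Zero extend 16 bytes to 16 64-bit values using one permute per
   output vector and no intermediate element widths.  */
static void
zext_u8_to_u64 (const uint8_t in[NBYTES], uint64_t out[NBYTES])
{
  const uint8_t zero[NBYTES] = { 0 };

  for (int k = 0; k < OUT_VECS; k++)
    {
      uint8_t sel[NBYTES], bytes[NBYTES];
      build_selector (k, sel);
      perm2 (in, zero, sel, bytes);

      /* Reinterpret the permuted bytes as 64-bit lanes.  The lanes are
         assembled explicitly as little-endian so the model does not
         depend on the host byte order.  */
      for (int l = 0; l < LANES; l++)
        {
          uint64_t v = 0;
          for (int b = LANE_BYTES - 1; b >= 0; b--)
            v = (v << 8) | bytes[l * LANE_BYTES + b];
          out[k * LANES + l] = v;
        }
    }
}
```

Each output vector depends only on the input vector and a constant selector,
so the eight permutes are independent of one another.  A chain of pairwise
widening steps would instead feed each level from the previous one.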

Using permutations has a big benefit for multi-step zero extensions because they
both reduce the number of needed instructions and increase throughput, as the
dependency chain is removed.

Concretely on AArch64 this changes:

void test4(unsigned char *x, long long *y, int n) {
    for(int i = 0; i < n; i++) {
        y[i] = x[i];
    }
}

from generating:

.L4:
        ldr     q30, [x4], 16
        add     x3, x3, 128
        zip1    v1.16b, v30.16b, v31.16b
        zip2    v30.16b, v30.16b, v31.16b
        zip1    v2.8h, v1.8h, v31.8h
        zip1    v0.8h, v30.8h, v31.8h
        zip2    v1.8h, v1.8h, v31.8h
        zip2    v30.8h, v30.8h, v31.8h
        zip1    v26.4s, v2.4s, v31.4s
        zip1    v29.4s, v0.4s, v31.4s
        zip1    v28.4s, v1.4s, v31.4s
        zip1    v27.4s, v30.4s, v31.4s
        zip2    v2.4s, v2.4s, v31.4s
        zip2    v0.4s, v0.4s, v31.4s
        zip2    v1.4s, v1.4s, v31.4s
        zip2    v30.4s, v30.4s, v31.4s
        stp     q26, q2, [x3, -128]
        stp     q28, q1, [x3, -96]
        stp     q29, q0, [x3, -64]
        stp     q27, q30, [x3, -32]
        cmp     x4, x5
        bne     .L4

and instead we get:

.L4:
        add     x3, x3, 128
        ldr     q23, [x4], 16
        tbl     v5.16b, {v23.16b}, v31.16b
        tbl     v4.16b, {v23.16b}, v30.16b
        tbl     v3.16b, {v23.16b}, v29.16b
        tbl     v2.16b, {v23.16b}, v28.16b
        tbl     v1.16b, {v23.16b}, v27.16b
        tbl     v0.16b, {v23.16b}, v26.16b
        tbl     v22.16b, {v23.16b}, v25.16b
        tbl     v23.16b, {v23.16b}, v24.16b
        stp     q5, q4, [x3, -128]
        stp     q3, q2, [x3, -96]
        stp     q1, q0, [x3, -64]
        stp     q22, q23, [x3, -32]
        cmp     x4, x5
        bne     .L4

Tests are added in the AArch64 patch introducing the hook.  The testsuite also
already had about 800 runtime tests that get affected by this.

Bootstrapped Regtested on aarch64-none-linux-gnu, arm-none-linux-gnueabihf,
x86_64-pc-linux-gnu -m32, -m64 and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* target.def (use_permute_for_promotion): New.
	* doc/tm.texi.in: Document it.
	* doc/tm.texi: Regenerate.
	* targhooks.cc (default_use_permute_for_promotion): New.
	* targhooks.h (default_use_permute_for_promotion): New.
	* tree-vect-stmts.cc (vect_create_vectorized_promotion_stmts): Support
	direct conversion with permute.
	(vectorizable_conversion): Likewise.
	(supportable_widening_operation): Likewise.
	(vect_gen_perm_mask_any): Allow vector permutes where input registers
	are half the width of the result per the GCC 14 relaxation of
	VEC_PERM_EXPR.

---

--
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 4deb3d2c283a2964972b94f434370a6f57ea816a..e8192590ac14005bf7cb5f731c16ee7eacb78143 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -6480,6 +6480,15 @@ type @code{internal_fn}) should be considered expensive when the mask is
 all zeros.  GCC can then try to branch around the instruction instead.
 @end deftypefn
 
+@deftypefn {Target Hook} bool TARGET_VECTORIZE_USE_PERMUTE_FOR_PROMOTION (const_tree @var{in_type}, const_tree @var{out_type})
+This hook returns true if the operation promoting @var{in_type} to
+@var{out_type} should be done as a vector permute.  If @var{out_type} is
+a signed type the operation will be done as the related unsigned type and
+converted to @var{out_type}.  If the target supports the needed permute,
+is able to convert unsigned(@var{out_type}) to @var{out_type} and it is
+beneficial to do so, the hook should return true, else false should be returned.
+@end deftypefn
+
 @deftypefn {Target Hook} {class vector_costs *} TARGET_VECTORIZE_CREATE_COSTS (vec_info *@var{vinfo}, bool @var{costing_for_scalar})
 This hook should initialize target-specific data structures in preparation
 for modeling the costs of vectorizing a loop or basic block.  The default
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 9f147ccb95cc6d4e79cdf5b265666ad502492145..c007bc707372dd374e8effc52d29b76f5bc283a1 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -4303,6 +4303,8 @@ address; but often a machine-dependent strategy can generate better code.
 
 @hook TARGET_VECTORIZE_EMPTY_MASK_IS_EXPENSIVE
 
+@hook TARGET_VECTORIZE_USE_PERMUTE_FOR_PROMOTION
+
 @hook TARGET_VECTORIZE_CREATE_COSTS
 
 @hook TARGET_VECTORIZE_BUILTIN_GATHER
diff --git a/gcc/target.def b/gcc/target.def
index b31550108883c5c3f5ffc7e46a1e8a7b839ebe83..58545d5ef4248da5850edec8f4db9f2636973598 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -2056,6 +2056,20 @@ all zeros.  GCC can then try to branch around the instruction instead.",
  (unsigned ifn),
  default_empty_mask_is_expensive)
 
+/* Function to say whether a target supports and prefers to use permutes for
+   zero extensions or truncates.  */
+DEFHOOK
+(use_permute_for_promotion,
+ "This hook returns true if the operation promoting @var{in_type} to\n\
+@var{out_type} should be done as a vector permute.  If @var{out_type} is\n\
+a signed type the operation will be done as the related unsigned type and\n\
+converted to @var{out_type}.  If the target supports the needed permute,\n\
+is able to convert unsigned(@var{out_type}) to @var{out_type} and it is\n\
+beneficial to do so, the hook should return true, else false should be returned.",
+ bool,
+ (const_tree in_type, const_tree out_type),
+ default_use_permute_for_promotion)
+
 /* Target builtin that implements vector gather operation.  */
 DEFHOOK
 (builtin_gather,
diff --git a/gcc/targhooks.h b/gcc/targhooks.h
index 2704d6008f14d2aa65671f002af886d3b802effa..723f8f4fda7808b6899f10f8b3fafad74d3c536f 100644
--- a/gcc/targhooks.h
+++ b/gcc/targhooks.h
@@ -124,6 +124,7 @@ extern opt_machine_mode default_vectorize_related_mode (machine_mode,
 extern opt_machine_mode default_get_mask_mode (machine_mode);
 extern bool default_empty_mask_is_expensive (unsigned);
 extern bool default_conditional_operation_is_expensive (unsigned);
+extern bool default_use_permute_for_promotion (const_tree, const_tree);
 extern vector_costs *default_vectorize_create_costs (vec_info *, bool);
 
 /* OpenACC hooks.  */
diff --git a/gcc/targhooks.cc b/gcc/targhooks.cc
index dc040df9fcd1182b62d83088ee7fb3a248c99f51..a487eab794fe9f1089ecb58fdfc881fdb19d28f3 100644
--- a/gcc/targhooks.cc
+++ b/gcc/targhooks.cc
@@ -1615,6 +1615,14 @@ default_conditional_operation_is_expensive (unsigned ifn)
   return ifn == IFN_MASK_STORE;
 }
 
+/* By default no targets prefer permutes over multi step extension.  */
+
+bool
+default_use_permute_for_promotion (const_tree, const_tree)
+{
+  return false;
+}
+
 /* By default consider masked stores to be expensive.  */
 
 bool
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 4f6905f15417f90c6f36e1711a7a25071f0f507c..f2939655e4ec34111baa8894eaf769d29b1c5b82 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -5129,6 +5129,111 @@ vect_create_vectorized_promotion_stmts (vec_info *vinfo,
   gimple *new_stmt1, *new_stmt2;
   vec<tree> vec_tmp = vNULL;
 
+  /* If we're using a VEC_PERM_EXPR then we're widening to the final type in
+     one go.  */
+  if (ch1 == VEC_PERM_EXPR
+      && op_type == unary_op)
+    {
+      vec_tmp.create (vec_oprnds0->length () * 2);
+      bool failed_p = false;
+
+      /* Extending with a vec-perm requires 2 instructions per step.  */
+      FOR_EACH_VEC_ELT (*vec_oprnds0, i, vop0)
+        {
+          tree vectype_in = TREE_TYPE (vop0);
+          tree vectype_out = TREE_TYPE (vec_dest);
+          machine_mode mode_in = TYPE_MODE (vectype_in);
+          machine_mode mode_out = TYPE_MODE (vectype_out);
+          unsigned bitsize_in = element_precision (vectype_in);
+          unsigned tot_in, tot_out;
+          unsigned HOST_WIDE_INT count;
+
+          /* We can't really support VLA here as the indexes depend on the VL.
+             VLA should really use widening instructions like widening
+             loads.  */
+          if (!GET_MODE_BITSIZE (mode_in).is_constant (&tot_in)
+              || !GET_MODE_BITSIZE (mode_out).is_constant (&tot_out)
+              || !TYPE_VECTOR_SUBPARTS (vectype_in).is_constant (&count)
+              || !TYPE_UNSIGNED (vectype_in)
+              || !targetm.vectorize.use_permute_for_promotion (vectype_in,
+                                                               vectype_out))
+            {
+              failed_p = true;
+              break;
+            }
+
+          unsigned steps = tot_out / bitsize_in;
+          tree zero = build_zero_cst (vectype_in);
+
+          unsigned chunk_size
+            = exact_div (TYPE_VECTOR_SUBPARTS (vectype_in),
+                         TYPE_VECTOR_SUBPARTS (vectype_out)).to_constant ();
+          unsigned step_size = chunk_size * (tot_out / tot_in);
+          unsigned nunits = tot_out / bitsize_in;
+
+          vec_perm_builder sel (steps, 1, 1);
+          sel.quick_grow (steps);
+
+          /* Flood fill with the out of range value first.  */
+          for (unsigned long i = 0; i < steps; ++i)
+            sel[i] = count;
+
+          tree var;
+          tree elem_in = TREE_TYPE (vectype_in);
+          machine_mode elem_mode_in = TYPE_MODE (elem_in);
+          unsigned long idx = 0;
+          tree vc_in = get_related_vectype_for_scalar_type (elem_mode_in,
+                                                            elem_in, nunits);
+
+          for (unsigned long j = 0; j < chunk_size; j++)
+            {
+              if (WORDS_BIG_ENDIAN)
+                for (int i = steps - 1; i >= 0; i -= step_size, idx++)
+                  sel[i] = idx;
+              else
+                for (int i = 0; i < (int)steps; i += step_size, idx++)
+                  sel[i] = idx;
+
+              vec_perm_indices indices (sel, 2, steps);
+
+              tree perm_mask = vect_gen_perm_mask_checked (vc_in, indices);
+              auto vec_oprnd = make_ssa_name (vc_in);
+              auto new_stmt = gimple_build_assign (vec_oprnd, VEC_PERM_EXPR,
+                                                   vop0, zero, perm_mask);
+              vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi);
+
+              tree intvect_out = unsigned_type_for (vectype_out);
+              var = make_ssa_name (intvect_out);
+              new_stmt = gimple_build_assign (var, build1 (VIEW_CONVERT_EXPR,
+                                                           intvect_out,
+                                                           vec_oprnd));
+              vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi);
+
+              gcc_assert (ch2.is_tree_code ());
+
+              var = make_ssa_name (vectype_out);
+              if (ch2 == VIEW_CONVERT_EXPR)
+                  new_stmt = gimple_build_assign (var,
+                                                  build1 (VIEW_CONVERT_EXPR,
+                                                          vectype_out,
+                                                          vec_oprnd));
+              else
+                  new_stmt = gimple_build_assign (var, (tree_code)ch2,
+                                                  vec_oprnd);
+
+              vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi);
+              vec_tmp.safe_push (var);
+            }
+        }
+
+      if (!failed_p)
+        {
+          vec_oprnds0->release ();
+          *vec_oprnds0 = vec_tmp;
+          return;
+        }
+    }
+
   vec_tmp.create (vec_oprnds0->length () * 2);
   FOR_EACH_VEC_ELT (*vec_oprnds0, i, vop0)
     {
@@ -5495,6 +5600,20 @@ vectorizable_conversion (vec_info *vinfo,
               || GET_MODE_SIZE (lhs_mode) <= GET_MODE_SIZE (rhs_mode))
             goto unsupported;
 
+          /* Check to see if the target can use a permute to perform the zero
+             extension.  */
+          intermediate_type = unsigned_type_for (vectype_out);
+          if (TYPE_UNSIGNED (vectype_in)
+              && VECTOR_TYPE_P (intermediate_type)
+              && TYPE_VECTOR_SUBPARTS (intermediate_type).is_constant ()
+              && targetm.vectorize.use_permute_for_promotion (vectype_in,
+                                                              intermediate_type))
+            {
+              code1 = VEC_PERM_EXPR;
+              code2 = FLOAT_EXPR;
+              break;
+            }
+
           fltsz = GET_MODE_SIZE (lhs_mode);
           FOR_EACH_2XWIDER_MODE (rhs_mode_iter, rhs_mode)
             {
@@ -9804,7 +9923,8 @@ vect_gen_perm_mask_any (tree vectype, const vec_perm_indices &sel)
   tree mask_type;
 
   poly_uint64 nunits = sel.length ();
-  gcc_assert (known_eq (nunits, TYPE_VECTOR_SUBPARTS (vectype)));
+  gcc_assert (known_eq (nunits, TYPE_VECTOR_SUBPARTS (vectype))
+              || known_eq (nunits, TYPE_VECTOR_SUBPARTS (vectype) * 2));
 
   mask_type = build_vector_type (ssizetype, nunits);
   return vec_perm_indices_to_tree (mask_type, sel);
@@ -14397,8 +14517,20 @@ supportable_widening_operation (vec_info *vinfo,
       break;
 
     CASE_CONVERT:
-      c1 = VEC_UNPACK_LO_EXPR;
-      c2 = VEC_UNPACK_HI_EXPR;
+      {
+        tree cvt_type = unsigned_type_for (vectype_out);
+        if (TYPE_UNSIGNED (vectype_in)
+            && VECTOR_TYPE_P (cvt_type)
+            && TYPE_VECTOR_SUBPARTS (cvt_type).is_constant ()
+            && targetm.vectorize.use_permute_for_promotion (vectype_in, cvt_type))
+          {
+            *code1 = VEC_PERM_EXPR;
+            *code2 = VIEW_CONVERT_EXPR;
+            return true;
+          }
+        c1 = VEC_UNPACK_LO_EXPR;
+        c2 = VEC_UNPACK_HI_EXPR;
+      }
       break;
 
     case FLOAT_EXPR:

From patchwork Mon Oct 14 10:55:55 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Tamar Christina
X-Patchwork-Id: 1996808
b=u1qkjj4J7LvozDFRwmqfsrFYeF78wSV72CvBKbPF4hIwzDHvuv1BJhU2f9AU67Zv+CeRKgPWwW06MjOffsaD3IwUAJaarrEG+tOyf/GmmFHDZcL4w0LxXEca3Cp2lUYttQkPDSrlpB8w+zMiyAkfjU9vxCYKE2CZwgtRD6e30EHZIhLMtVIouEodagwoLOB2zsazPB87r1iNz61ivG9X0Xzc7W6cT/ij+XpMHbFh5ySYIskI/RI/wr5XJmY4V3PBP5fqnyY5XvMeHJSQwW+4Onh8rAQqm0nNbCHZRc65gY8xFCqy1XNgUFwMf8wj/39ifupXAyOMYFi9yPgAhw5pSw== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=arm.com; dmarc=pass action=none header.from=arm.com; dkim=pass header.d=arm.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=arm.com; s=selector1; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=Pb7qQw/0Z6o+znTOK8vclzW4+oWA9xFGUiNKfIOZCWU=; b=L1Vtun+coFiPL352bLeCHvVerkuu5aFcE6WhnchYjCXpHpXwd+4FWgYlsnHvWVYPxluQRyVVpf1/Utl+eJ+bCYKMR3Drjofuyg9KxtPKpXYiJnIfC3AI4TRhqmfPa/zfZqEhnSWtWzhRmaNa/zra4tJE5Ifpg2X4wOlFDqfLy3A= Authentication-Results-Original: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=arm.com; Received: from VI1PR08MB5325.eurprd08.prod.outlook.com (2603:10a6:803:13e::17) by DU0PR08MB8709.eurprd08.prod.outlook.com (2603:10a6:10:403::22) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.8048.25; Mon, 14 Oct 2024 10:55:58 +0000 Received: from VI1PR08MB5325.eurprd08.prod.outlook.com ([fe80::89dc:c731:362b:7c69]) by VI1PR08MB5325.eurprd08.prod.outlook.com ([fe80::89dc:c731:362b:7c69%7]) with mapi id 15.20.8048.020; Mon, 14 Oct 2024 10:55:58 +0000 Date: Mon, 14 Oct 2024 11:55:55 +0100 From: Tamar Christina To: gcc-patches@gcc.gnu.org Cc: nd@arm.com, rguenther@suse.de Subject: [PATCH 2/4]middle-end: Fix VEC_PERM_EXPR lowering since relaxation of vector sizes Message-ID: Content-Disposition: inline In-Reply-To: X-ClientProxiedBy: LO4P123CA0598.GBRP123.PROD.OUTLOOK.COM (2603:10a6:600:295::22) To VI1PR08MB5325.eurprd08.prod.outlook.com (2603:10a6:803:13e::17) MIME-Version: 1.0 X-MS-TrafficTypeDiagnostic: 
VI1PR08MB5325:EE_|DU0PR08MB8709:EE_|AMS0EPF0000019C:EE_|DB4PR08MB7933:EE_ X-MS-Office365-Filtering-Correlation-Id: 5768f85c-64a0-4670-31e2-08dcec3ecf6b x-checkrecipientrouted: true NoDisclaimer: true X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam-Untrusted: BCL:0;ARA:13230040|366016|376014|1800799024; X-Microsoft-Antispam-Message-Info-Original: 05BlYopokkKW57Nk+iWKmwgcm+yDE8ePEcCQf7//3+lIxFVZijzd2dIczsc8xYovMCgDYzByIgS7pUDJEqHOIKkwz8gwXPhwBMhPB5dYVgvhrDlHmheWux4WueWg9FJfjQ/Oz617O7edn1LylzOW3CRDGY7pNY+KsqGBNtQqyL/mX4AaayNxePL3LvGVP9ULY9B+0/i93VyD+du1L+os15JnI1t18c4dbSa/yqQeOZs28a45+8yIPdtRI8hfL18a6cZRduu78SW4Trga2kLK40cBdGdvdWiYOrT/31ptUKWM1FbzuvJ6e6Vh8C4nQdrZNgNWG7J/9DcJTuxR6i068kLc8poY2HOPC4IWqTPjOYhmnD8szqnZn1M1+6OWvXZv23xSi11k6Ms42B8OdohwNN3mmGyORKYHnX50t6JbGYiSV81Mn6PviPi7fHPykZJH1MwO2vFWN05VhF1p6xoSv/y+fIwAv4x0hwkFUzkU9aFMRRNuuzOXL7Mz205HN/QTvq7PMvO+menpXwjBQUdWrTtPov9YKKxSWBP5z+s3AuoPTsgx/7hAwQhG+gyg9X4cBiGiZMKgH1zl1lr47uC/MEGeLZn9zwYBRDZscqz8vH5l43ioG8yomDzEj4+JnJAGJ8xvzpGyegk6PNr7AgeV0xo+jI+qPeLeWP/HfNR50tM3HrPLeAlD6btjQ/1hCJ1PSY3xyAn8IUZFA847qiJheYB30JeRn0KrIb//XOMIWlgaSj3THECiPC8mkjgfF0NhRQnkKfwJiNnwatJhWCxVfJqvG0pd5qJV5itQmcvJ+I7gZLg98rw9rAyPfh2PYfrfQhsqi1kzVsxQfEIoJXI/Cl0Rr0jAPnHq40xC/tvEBH/8BgHJKtciEjxMd7qLDw5YfAdYI+czWphxO6pvfpvIG87rfxRsr//BGyJXAydyJU0keu8E4NOYLukulWJ0KqT8gaBqjKh8QReiyxvk0rMgorJsHN0y4fvYPClsr87g80U7mRpeE+qrTHyA/KlxpW0Ui4WaGTFtI2/6DUuePUG6Z2YpGlCL1P7VTO0Vbs4gmpE/WmvlEoRFmfy8LNE5WvP8/Gq3HizPuqSeCilHtjxmcqlFGuqnMUpRQU0zIGp/SKATdu6Zz7HPxMNQfKfZ6gyi2qNtPP9YgfAgDKXYYgdQlhA3yFgMtR2fuBuNeVtjINMg92bLZSQV4AJtpC/h5OQhxmY0QhiLjXdZ6ljAWGnjm/+byWGzpqJ93+HZdoiGzp3B20KEs7rgI0i4v4/eyL9vrM/h+IeignF5tWisoszttw== X-Forefront-Antispam-Report-Untrusted: CIP:255.255.255.255; CTRY:; LANG:en; SCL:1; SRV:; IPV:NLI; SFV:NSPM; H:VI1PR08MB5325.eurprd08.prod.outlook.com; PTR:; CAT:NONE; SFS:(13230040)(366016)(376014)(1800799024); DIR:OUT; SFP:1101; X-MS-Exchange-Transport-CrossTenantHeadersStamped: DU0PR08MB8709 
Original-Authentication-Results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=arm.com; X-EOPAttributedMessage: 0 X-MS-Exchange-SkipListedInternetSender: ip=[2603:10a6:803:13e::17]; domain=VI1PR08MB5325.eurprd08.prod.outlook.com X-MS-Exchange-Transport-CrossTenantHeadersStripped: AMS0EPF0000019C.eurprd05.prod.outlook.com X-MS-PublicTrafficType: Email X-MS-Office365-Filtering-Correlation-Id-Prvs: dade1739-71c5-4026-fdc1-08dcec3ec8a0 X-Microsoft-Antispam: BCL:0; ARA:13230040|35042699022|82310400026|36860700013|376014|1800799024; X-Microsoft-Antispam-Message-Info: =?utf-8?q?7+Fzs6UB71ySfIdkqcngH2P2k9pxmXX?= =?utf-8?q?H/oZTgLB1v2rAKxq7m3ZrM61rmG5tsD+TpaBYjN68ShwRNZ/UbeNLies4AVpXQDo0?= =?utf-8?q?GN0yHz/xBv5feLIIS7j+EffJCNx6WtQNmMgB8jkr2lWANUeF2D4zQpP8FHpN12A1J?= =?utf-8?q?vSO2ITgXxkC7/ftkzXyFNXqt3tU+038p9aMS5hJnETFceYZToblIo5Vfxwk+Rj0P+?= =?utf-8?q?DsApfX+CIYEPb7sKytdfbaDDDwJOcUQe7ufCK5DVGt+RdUpiHD9j1x9DroYnaUX7T?= =?utf-8?q?Vc3x2MiP7iGqgYPyirkplbMOauyNsXWogJVgLcPM4pd/1NfjFW/X1VFpuHhoimCl5?= =?utf-8?q?alDYL79q0pxcNb8dEqOyKJLc2XtuALFdbHw1rnSPH+PfmpoHb/X6tb6T8nrQ4NtB/?= =?utf-8?q?aD7UOZkvyBDe6Vjn7Mepp8mfHUaRe8RwUWl0b7g1YOQQrQO9+69UYovlZMCMW3FIB?= =?utf-8?q?FDPnfXs/0eVXjxYiXU+HhrBs5NESMJjGkIFXvA51xMjK/+9DQmAbqe1gJ3vR4sPkA?= =?utf-8?q?d/PZz2tfbI26KuaUCnKcXPTiu1NiMQ97Q/+hJhdjbdz5dmHo2Dzpk6CY0VuhM6BKX?= =?utf-8?q?SZYT7xItUuwkatHywhSyD8zm5eIadoL9jWCUfGrnMHI51Cq2O81Lx4jdv5vgrHp1g?= =?utf-8?q?4Bk/BgVzOeA9xpBgvExmIv24qlWzT0DiByJ9ribs54mgg+WuyRh5ozxXkaadZTCF2?= =?utf-8?q?TMVwx+lbAmgTFiDAJHPNn9xKdtfaciHkZrKlfzJC3pyRTgrTqNTL62Hl6rNubhMDl?= =?utf-8?q?MlYGPaLC1sxMgjcTKsXI3FZTEfztTmeLU0pzMlu/QCjpLkYe7P8xSx/kgE1hf8Bgt?= =?utf-8?q?+ABbTBXZ/LQcUgyixHGLe+k3FRqipTDxPraZej+HCrAaCZC2ZHSKPAaJkbz/rRZW1?= =?utf-8?q?VClhsGc7X6+plf7VIcl2Oc83StsUKzAyAdTl4Nxxe8cQE/va8AGGbPBMsuAFL6y2C?= =?utf-8?q?4Js1D34t55GnbIiQwuLS6IFpqRu9gYlqPU3NNIfdVkA/7PLgM9OTIHqr3GmYH1TIE?= =?utf-8?q?dJLZT/VxUDgK054g/2/z38Hpf6wPxj6bbNiCA2hzsDcrXVfHnWul9FWgLHLisxjzR?= 
=?utf-8?q?qNIQUUJ3y09nM1o7KbYY1JImgtORjNckLS5Im3FcEyKkxtI2FEHQdB9cRsTOp6rXW?= =?utf-8?q?PHfVfdwGytiCpj/mCKLREjJoVNLYAY6dFg40Jf8tW5I/uIqu0mYMH3KT+hWeX2w8r?= =?utf-8?q?zodWqBqrfgcu3JmZ+h4iAOhF/RiblHx9klRCvWzHdF3L60fMWfZr65btpZip41cjT?= =?utf-8?q?Qk0QcjY1aj+z3TTZg7uScePWG+IJ/baQhj4+5hfcr9rLODoi+y+b3y9s=3D?= X-Forefront-Antispam-Report: CIP:63.35.35.123; CTRY:IE; LANG:en; SCL:1; SRV:; IPV:CAL; SFV:NSPM; H:64aa7808-outbound-1.mta.getcheckrecipient.com; PTR:ec2-63-35-35-123.eu-west-1.compute.amazonaws.com; CAT:NONE; SFS:(13230040)(35042699022)(82310400026)(36860700013)(376014)(1800799024); DIR:OUT; SFP:1101; X-OriginatorOrg: arm.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 14 Oct 2024 10:56:09.1023 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: 5768f85c-64a0-4670-31e2-08dcec3ecf6b X-MS-Exchange-CrossTenant-Id: f34e5979-57d9-4aaa-ad4d-b122a662184d X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=f34e5979-57d9-4aaa-ad4d-b122a662184d; Ip=[63.35.35.123]; Helo=[64aa7808-outbound-1.mta.getcheckrecipient.com] X-MS-Exchange-CrossTenant-AuthSource: AMS0EPF0000019C.eurprd05.prod.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: DB4PR08MB7933 X-Spam-Status: No, score=-12.8 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, SPF_HELO_NONE, SPF_NONE, TXREP, UNPARSEABLE_RELAY autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org Hi All, In GCC 14 VEC_PERM_EXPR was relaxed to be able to permute to a 2x larger vector than the size of the input vectors. 
However various passes and transformations were not updated to account for
this. I have patches in these areas that I will be upstreaming alongside the
individual patches that expose them.

The issue fixed here is that veclower tries to lower based on the size of the
input vectors rather than the size of the output. As a consequence it creates
an invalid vector of half the size.

Luckily we ICE because the resulting nunits doesn't match the vector size.

Tests in the AArch64 patch test for this behaviour.

Bootstrapped and regtested on aarch64-none-linux-gnu,
arm-none-linux-gnueabihf, x86_64-pc-linux-gnu -m32, -m64 with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* tree-vect-generic.cc (lower_vec_perm): Use output vector size instead
	of input vector when determining output nunits.

---

--
diff --git a/gcc/tree-vect-generic.cc b/gcc/tree-vect-generic.cc
index 3041fb8fcf235ba86f37ef73aa089330a2fd0b77..f86f7eabb255fde50b30fa3b85db367df930f321 100644
--- a/gcc/tree-vect-generic.cc
+++ b/gcc/tree-vect-generic.cc
@@ -1500,6 +1500,7 @@ lower_vec_perm (gimple_stmt_iterator *gsi)
   tree mask = gimple_assign_rhs3 (stmt);
   tree vec0 = gimple_assign_rhs1 (stmt);
   tree vec1 = gimple_assign_rhs2 (stmt);
+  tree res_vect_type = TREE_TYPE (gimple_assign_lhs (stmt));
   tree vect_type = TREE_TYPE (vec0);
   tree mask_type = TREE_TYPE (mask);
   tree vect_elt_type = TREE_TYPE (vect_type);
@@ -1512,7 +1513,7 @@ lower_vec_perm (gimple_stmt_iterator *gsi)
   location_t loc = gimple_location (gsi_stmt (*gsi));
   unsigned i;
 
-  if (!TYPE_VECTOR_SUBPARTS (vect_type).is_constant (&elements))
+  if (!TYPE_VECTOR_SUBPARTS (res_vect_type).is_constant (&elements))
     return;
 
   if (TREE_CODE (mask) == SSA_NAME)
@@ -1672,9 +1673,9 @@ lower_vec_perm (gimple_stmt_iterator *gsi)
     }
 
   if (constant_p)
-    constr = build_vector_from_ctor (vect_type, v);
+    constr = build_vector_from_ctor (res_vect_type, v);
   else
-    constr = build_constructor (vect_type, v);
+    constr = build_constructor (res_vect_type, v);
   gimple_assign_set_rhs_from_tree (gsi, constr);
   update_stmt (gsi_stmt (*gsi));
 }

From patchwork Mon Oct 14 10:56:14 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Tamar Christina X-Patchwork-Id: 1996809 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=arm.com header.i=@arm.com header.a=rsa-sha256 header.s=selector1 header.b=kEsO3Fg5; dkim=pass (1024-bit key) header.d=arm.com header.i=@arm.com header.a=rsa-sha256 header.s=selector1 header.b=kEsO3Fg5; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org; envelope-from=gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4XRvLq507hz1xvK for ; Mon, 14 Oct 2024 21:57:07 +1100 (AEDT) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id D73233856DCA for ; Mon, 14 Oct 2024 10:57:05 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from EUR03-DBA-obe.outbound.protection.outlook.com (mail-dbaeur03on20624.outbound.protection.outlook.com [IPv6:2a01:111:f403:260d::624]) by sourceware.org (Postfix) with ESMTPS id 94CAF385AC31; Mon, 14 Oct 2024 10:56:31 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 94CAF385AC31 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: sourceware.org; spf=pass
smtp.mailfrom=arm.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 94CAF385AC31 Authentication-Results: server2.sourceware.org; arc=pass smtp.remote-ip=2a01:111:f403:260d::624 ARC-Seal: i=3; a=rsa-sha256; d=sourceware.org; s=key; t=1728903403; cv=pass; b=bYUDgGdMoNTjYVptHHdxXwFgxzgelvFNK+ghgsvKdghwcinK3Q8EW5AMNnupjhdNBVdeupE3rrFyHoQaOhy4xWND/u98M29LDfa3cFKydQq6PbYz2wErMjtSo8EJyodSpQPhETBMHyDSXzS5nolLtak3qcH2GaeiMOlWx/SkyZs= ARC-Message-Signature: i=3; a=rsa-sha256; d=sourceware.org; s=key; t=1728903403; c=relaxed/simple; bh=3CPJSzfVEGlDT9VxHSyrML0cam+RtJMrPKWWskcA36c=; h=DKIM-Signature:DKIM-Signature:Date:From:To:Subject:Message-ID: MIME-Version; b=aIf8vZhlreO4Ew2eeVwyw9uKVXW5cY9g3NFKg0vMJU4mGA5BeU4DNqt7mX7HFzALjAp2DoMBBUUhmgONQ99OBAn5sQCNIdbuxUVnHiQvHimcx59gr4l50uoFxm7CdWXCxEwLLmSIDXvpnhK76fqfupioPGCQKcJd23fLnG5ISq0= ARC-Authentication-Results: i=3; server2.sourceware.org ARC-Seal: i=2; a=rsa-sha256; s=arcselector10001; d=microsoft.com; cv=pass; b=bAD69XZEb5m9L1TnGrddZSeyjpBn8avO6OdUWWOMx/hSHvilQPq6pAWWMpOVXS/bSzEJSqb8sCegTIFkg1spNlpjtGSVfTEb33m/so+j66BlbfOY9vkbuerCidZN5ldV4kBsywLxc+7acYtRWnEHpixGx3XMvJiT3yg5nyy8QUYwqLUC53JPhw9d3MUg7oFNffUHFHgFQ1Qk5kccJqR1v82SnrYEQSz5YMn5q3Q4jmHuvmQxsOY2VcMz9td3HPUIGwkgHVsQpbIonnMUeuEDg2dL0w2kVp9nVafghYke2r4rikZPUo0uPjqktYSEKHRSysbQ4z7jE/ibMgOjnBC0ig== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector10001; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=lOb9WOz332xgWwU16/zHRIqsl3eRzJmKP5S2YbLaLIA=; b=Gb16daDis456AMrQb7D5uIqGbA4yU3hcbNrK42Q+QwFqqg1CaD5vN5pulc2O2x6c6UdfDJexT+84CzoVVMtHwK1Y3eQsvO5XS+OZcyskulxJBS25hlrMHTSNtYt+Al57jFPlGlaR0qlIfEP7077Pm1a7fW2K2BzBIMbrlbtC8r1NlLJnXu+zTB9u9FS66PmevjQpx5ZmU//V7DqRVNQZBWjPpzou1wJfaitrDC6MDUC0UQZWvSAZ4c6SdBd7lNcbCCpOYgkvd7bRnDbiaJUV540UfssgI8kRpsvGHVhfCn/7KAkg3TcMLTAPp2YRCWCH0TBgSc4wiYxA4QKY6+cV3g== 
ARC-Authentication-Results: i=2; mx.microsoft.com 1; spf=pass (sender ip is 63.35.35.123) smtp.rcpttodomain=gcc.gnu.org smtp.mailfrom=arm.com; dmarc=pass (p=none sp=none pct=100) action=none header.from=arm.com; dkim=pass (signature was verified) header.d=arm.com; arc=pass (0 oda=1 ltdi=1 spf=[1,1,smtp.mailfrom=arm.com] dkim=[1,1,header.d=arm.com] dmarc=[1,1,header.from=arm.com]) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=arm.com; s=selector1; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=lOb9WOz332xgWwU16/zHRIqsl3eRzJmKP5S2YbLaLIA=; b=kEsO3Fg5jEoOzxP6mIW/o3C/pgHYZ+R3dFdG+2r7MBSEh6SE4CTEKVxP+cgCVqhtdg9jTeshgVZtBoatA4ZT4StRHpBb+EXXLLqgujkyZ26OcfiHYO1pAukmV7n/InUBcpzr+/PIKO4Iz/x9K2QPPj9pR3K9OLMCimXmKsJGm0E= Received: from DB9PR06CA0018.eurprd06.prod.outlook.com (2603:10a6:10:1db::23) by AM9PR08MB6004.eurprd08.prod.outlook.com (2603:10a6:20b:285::5) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.8048.27; Mon, 14 Oct 2024 10:56:26 +0000 Received: from DB5PEPF00014B96.eurprd02.prod.outlook.com (2603:10a6:10:1db:cafe::98) by DB9PR06CA0018.outlook.office365.com (2603:10a6:10:1db::23) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.8048.26 via Frontend Transport; Mon, 14 Oct 2024 10:56:26 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 63.35.35.123) smtp.mailfrom=arm.com; dkim=pass (signature was verified) header.d=arm.com;dmarc=pass action=none header.from=arm.com; Received-SPF: Pass (protection.outlook.com: domain of arm.com designates 63.35.35.123 as permitted sender) receiver=protection.outlook.com; client-ip=63.35.35.123; helo=64aa7808-outbound-1.mta.getcheckrecipient.com; pr=C Received: from 64aa7808-outbound-1.mta.getcheckrecipient.com (63.35.35.123) by DB5PEPF00014B96.mail.protection.outlook.com (10.167.8.234) with Microsoft SMTP Server (version=TLS1_3, cipher=TLS_AES_256_GCM_SHA384) id 
15.20.8048.13 via Frontend Transport; Mon, 14 Oct 2024 10:56:26 +0000 Received: ("Tessian outbound 40ef283ec771:v473"); Mon, 14 Oct 2024 10:56:25 +0000 X-CheckRecipientChecked: true X-CR-MTA-CID: 948f4651486f92a1 X-TessianGatewayMetadata: zywqHDfYfYNSEmhVrscKhqwSfVYJu7I4Ve6oYcV7X8AFeoGRmBy+PoSHQdmTGaAlQoFeQ3TD0kx4f5AxEn7c8TEKi+/0jZsv557YfjyBJ3PyvJeP+WZYUbo1D2B1ZBIe9NqD6ZULk4rHX5NNg++eEA== X-CR-MTA-TID: 64aa7808 Received: from L82550e7cdc6c.1 by 64aa7808-outbound-1.mta.getcheckrecipient.com id 0C98B3E6-8F44-47F9-839C-57EADC252F1E.1; Mon, 14 Oct 2024 10:56:18 +0000 Received: from EUR05-DB8-obe.outbound.protection.outlook.com by 64aa7808-outbound-1.mta.getcheckrecipient.com with ESMTPS id L82550e7cdc6c.1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384); Mon, 14 Oct 2024 10:56:18 +0000 ARC-Seal: i=1; a=rsa-sha256; s=arcselector10001; d=microsoft.com; cv=none; b=REpPUzKthAAgHKhBOn3FVJctP2y9e6uRdfhQ/xh2OEpI5FNv6bKhWfe/TJReR7V5ocPitcmEV2q84jrtgFG2MwUUk2OKkJxpUh7A0xKxiIMyTAchZtxKt9m6n/bzMRCbUd8+0t7K40uTGZJMhmqJZ43/2iQxzv/JRN7dFvC9e1nt9Sr5a4cyeCf01gJVLJ4XZ+GYT1oTPNbtZvYR5Eog1r+AhZDGuzBIbRwx1S7gQY+xkzYtvgo3T+M3ia547imzF/f79CEdGMgBSJ1Ngk8OHedLk8TC9HNOds4rPLVIphG/flejTieOwuQXbHrT1reiGO2jC6Ei13jwWMK775yYbg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector10001; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=lOb9WOz332xgWwU16/zHRIqsl3eRzJmKP5S2YbLaLIA=; b=oxSD1GL1PWeeFt3zJ9smY9PlpcK2Q0NpBj5RLff15eLkVAmFENFXzgO8JcGuwzjvRMT32vqjqGcAqS0KvzN1ACTIrK/XPRs/LD1Y3ttQEBwaVAk0SVu5YcUSXl2ukMdRmnOWVsqJgBgrVPOttbgk1rZ3k7W7WD0q1wvPrIyz8XzUyy5r0Ak8N/y2rYqPraAJHdBt0xDm/ZelVNPEWHwIP9UzV7soHX7PIUJjpKJBhY4YxPsUvMT8VIZ3zlVFgvXL/zoinwX0XfU7mvHBg1L+e3qzzBO7OC/HFKtxkBdg9I0PSu2eI9UVu3jcz6OvuyRojwgHseN6Hb/gWr4HbFudUQ== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=arm.com; dmarc=pass action=none 
header.from=arm.com; dkim=pass header.d=arm.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=arm.com; s=selector1; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=lOb9WOz332xgWwU16/zHRIqsl3eRzJmKP5S2YbLaLIA=; b=kEsO3Fg5jEoOzxP6mIW/o3C/pgHYZ+R3dFdG+2r7MBSEh6SE4CTEKVxP+cgCVqhtdg9jTeshgVZtBoatA4ZT4StRHpBb+EXXLLqgujkyZ26OcfiHYO1pAukmV7n/InUBcpzr+/PIKO4Iz/x9K2QPPj9pR3K9OLMCimXmKsJGm0E= Authentication-Results-Original: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=arm.com; Received: from VI1PR08MB5325.eurprd08.prod.outlook.com (2603:10a6:803:13e::17) by DU0PR08MB8709.eurprd08.prod.outlook.com (2603:10a6:10:403::22) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.8048.25; Mon, 14 Oct 2024 10:56:16 +0000 Received: from VI1PR08MB5325.eurprd08.prod.outlook.com ([fe80::89dc:c731:362b:7c69]) by VI1PR08MB5325.eurprd08.prod.outlook.com ([fe80::89dc:c731:362b:7c69%7]) with mapi id 15.20.8048.020; Mon, 14 Oct 2024 10:56:16 +0000 Date: Mon, 14 Oct 2024 11:56:14 +0100 From: Tamar Christina To: gcc-patches@gcc.gnu.org Cc: nd@arm.com, Richard.Earnshaw@arm.com, ktkachov@gcc.gnu.org, richard.sandiford@arm.com Subject: [PATCH 3/4]AArch64: enable zero-extends using TBLs for Adv. 
SIMD Message-ID: Content-Disposition: inline In-Reply-To: X-ClientProxiedBy: LO4P123CA0586.GBRP123.PROD.OUTLOOK.COM (2603:10a6:600:295::7) To VI1PR08MB5325.eurprd08.prod.outlook.com (2603:10a6:803:13e::17) MIME-Version: 1.0 X-MS-TrafficTypeDiagnostic: VI1PR08MB5325:EE_|DU0PR08MB8709:EE_|DB5PEPF00014B96:EE_|AM9PR08MB6004:EE_ X-MS-Office365-Filtering-Correlation-Id: 1f0e4342-be09-4d2e-0ce5-08dcec3ed977 x-checkrecipientrouted: true NoDisclaimer: true X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam-Untrusted: BCL:0;ARA:13230040|366016|376014|1800799024; X-Microsoft-Antispam-Message-Info-Original: uhTzwNTfKvIDN6n60wdYL1YycXkeIC6WP3WU3jukAfyE34rW8OfG3LeCRTg1tVGMJoh5jqvuF9OaSkjPuhz44DKAoWGRPY5MAAGQ3P/hM9SG2EkbaH5mcE65sabPhL/xCuW+afPCwYzgdXXlAz7Emi1YQdfxJW8CoMx4Na9U8zW7CU4BsjHzwrLQ0ZCULoyIOlY8SquPwcpKn0oDaI6OkHVHstYgkqhtdDY+HGtefBv9JnYCrHSb14+DIU+v7JRW4eJbSW+fMYwtkDPZ9YDBIgLDOilexlyr8TM0MtDXH9d3z3Nx7n4U0Wdt5yvgh0RzmjOeXE/9gmA3u176V7KlZ+gBltb3T2Xpp1gN6Xc30lP64c1xo3uHQmcP02MORJjwYguyK8vVbro9+9zVQnMwLWov4zfO7jiGhv98O933w8YauRwL4+Q9VR0c3xptJcaAoXdixaIjKTVAUCsTHPzBKJvhZeSma0QVajMLgCAEqbPT8d1qqkzw+MzJevvBlSfhGaVmA3ETMV/PDpsF+cURsZqX7R1BpY8QkrYwNBU3qycZJ60J6jsBWWuQ6D6u8r3GesiWDkOkvOcrhO3bvJOnUAYalDhrTHNJ0db28CsCYHNKWfaAoquxoL2/dNbai4JLAhdTzbBhJDYjYcZXk7qBGpG4u/czxPK8z8/zzQ6q3EbyH3pOV7R37Swa9joEFLvqvzs4+FFMPDIYqiYz/uUuqzSTQJyfb81gqGmFsaz3Cmai4g9INi2pdNKGMcfFdTfX8ZNWKs/vbmuWKgu5exvgYrIVYdRn62UukVmvVa71gOaL1dI/MtyL1+3eJdbYIR25egrYlxLR4kR2YZym53HANTacxgU7rF5KeOekWislpX/54f7vAmgrSlFZH8zT39H7mUp4dDZYU68FnbH2ov+JH1SzM23poLSYIwETZglpN46Qx8eIYuS8JYmlRi8NpPLWiycCJS6U/LKeDn3pLqVXeohyIv6JI+8R3f23czp6fz3wQjDGZ/SStJIErV++CnkJj07buV+hTz3OTI4rXcZti6gtEnaRR1BFJccvmXX0l7ZawHoKWg1u0UR5LQK/KzJIaqfzXEe4zNlqTq8xaGEepHiaJryWM5HvPIWpNiwFd99i6qBTKvs4tWSNePOZRMnAxqCRRqsyMdbedvWxyg9v5Nir/WGkMU7d1YEsQnWiSg7RVAZkTkaFBx5XE2+khJdqMDTmS1qJCBw0tLBj3T5zr8iuf6ejRevQQ1HQdVIYhVM= X-Forefront-Antispam-Report-Untrusted: CIP:255.255.255.255; CTRY:; LANG:en; SCL:1; SRV:; IPV:NLI; 
SFV:NSPM; H:VI1PR08MB5325.eurprd08.prod.outlook.com; PTR:; CAT:NONE; SFS:(13230040)(366016)(376014)(1800799024); DIR:OUT; SFP:1101; X-MS-Exchange-Transport-CrossTenantHeadersStamped: DU0PR08MB8709 Original-Authentication-Results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=arm.com; X-EOPAttributedMessage: 0 X-MS-Exchange-SkipListedInternetSender: ip=[2603:10a6:803:13e::17]; domain=VI1PR08MB5325.eurprd08.prod.outlook.com X-MS-Exchange-Transport-CrossTenantHeadersStripped: DB5PEPF00014B96.eurprd02.prod.outlook.com X-MS-PublicTrafficType: Email X-MS-Office365-Filtering-Correlation-Id-Prvs: cf278b3f-c472-494a-3a1f-08dcec3ed365 X-Microsoft-Antispam: BCL:0; ARA:13230040|35042699022|82310400026|376014|36860700013|1800799024; X-Microsoft-Antispam-Message-Info: =?utf-8?q?GE4xFpcKnZ4nyIuAbubBjORRVQr5Jzr?= =?utf-8?q?v46AQ7JpePtwOtPLL5Oru8c91hPIaL4tCNJ4zUhR+zXRXxUwqtfQ5bRP8wBzn6plg?= =?utf-8?q?5DgVhwY5/h1RMeZcIgRpGY4eiWHvh6CLSlUFsABX9vbO8MGmtoWlOLV3f5GTcJHho?= =?utf-8?q?NypoNGOBhNJP+tJUMyjQX9azTMZQqKr/43k/3HuV7d8cSCZOauTg8qciCzdGnkjnL?= =?utf-8?q?NXICKlahEBw4CcCAyTOaLTqY0hH2qW5gTZ62Cuex6SEwsHc55fLvgzppuQ59fQfzQ?= =?utf-8?q?TEv/lJzjNr7kQrPHeuthlECvxkBjVWkYrDPcqmYtEJiR+JNszwVsreB+WmKYGsB/+?= =?utf-8?q?3pfw7AuVFidXp/W0Bb+xK9ddaoe/3qpi/Vl/p5ppBNTH5u3N4pXFH/sJSPBIcH03B?= =?utf-8?q?IQaM//+dOtH3BlTLOmfyX3VbiXR7z/bJkYRFnC4EWvycnuPByK9mtnRzLv1Jv2b4x?= =?utf-8?q?Qa2YLK0TbZPMY4L7vbWapMfVMgf72EJeyq5FFDT1ZuNSOn7OFqeYA5EF6WHnoswlU?= =?utf-8?q?BgkEh+sbk5apFA7Rb+k3MTsndTp8mMnLcS4Rp1NuVsdX8dmXBanskyttLbH/n9Ai7?= =?utf-8?q?KJeONUmDDOYFA9GRSkj6sDhyxLKCa5JJXlfwNpgbzEbzT2InZkUybeoFKsc5tGwYV?= =?utf-8?q?/f00SLF2n6rPueQfFOsY94c6HHJeofK036k5GA3S7+fEhZUkz8Lrsgps6I/wv9Nms?= =?utf-8?q?au79NWWAZ6Q6EshTs8az6b9pahpZhTMGNx521GO1HP8+WEExXhCYTUMFzIIVe61if?= =?utf-8?q?b+3DZH3m1UZX1rVdn6DitcdnV+V+3wlsShM0U9StMv6Ke05cjs+4xJOSPunk5Z/k8?= =?utf-8?q?0uqzGhXEqwAUyjUqXmK6GcU3Ts7OVpBgovLhpjZkn92a+4HuuIKAhoAkkAuQ29pkt?= 
=?utf-8?q?Gwk5XbywdfmqCG6S6XXqMKjl/rpRfkYR7KMYVjhjrq1K3LaKm0g2pY11gteFcniD8?= =?utf-8?q?WAJbkGXYVG/yDIzNszKVzXzhPt7DOMhDykH35EWIuKpZodllv+p7dvKUXB0Ag9AII?= =?utf-8?q?aUaI8avcLxPbpHpZMfpcAS1EevLbHRBKOOQwhGa0qmyZejZLqBWRTMgo6hk2pVZ3v?= =?utf-8?q?JisgaphRircKgZf/3FLr4+z5Y/sApBwQLMnjdJFzCs2aT2ulhj5VS+zj4cT1kTLGv?= =?utf-8?q?ImnevwACBz3eP0xhWnxsxUiMKVgqcZp4yxXc1g0rbFzfIlgwCJLnfgO778ErG+Kgy?= =?utf-8?q?IXqDJiPx762qtD1BNh8PVebe27e3iilOFpgwZlGvHMU0Z2g0rD0FKt9PfIGmor/bb?= =?utf-8?q?lp2n+Ejrelmki?= X-Forefront-Antispam-Report: CIP:63.35.35.123; CTRY:IE; LANG:en; SCL:1; SRV:; IPV:CAL; SFV:NSPM; H:64aa7808-outbound-1.mta.getcheckrecipient.com; PTR:ec2-63-35-35-123.eu-west-1.compute.amazonaws.com; CAT:NONE; SFS:(13230040)(35042699022)(82310400026)(376014)(36860700013)(1800799024); DIR:OUT; SFP:1101; X-OriginatorOrg: arm.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 14 Oct 2024 10:56:26.0353 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: 1f0e4342-be09-4d2e-0ce5-08dcec3ed977 X-MS-Exchange-CrossTenant-Id: f34e5979-57d9-4aaa-ad4d-b122a662184d X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=f34e5979-57d9-4aaa-ad4d-b122a662184d; Ip=[63.35.35.123]; Helo=[64aa7808-outbound-1.mta.getcheckrecipient.com] X-MS-Exchange-CrossTenant-AuthSource: DB5PEPF00014B96.eurprd02.prod.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: AM9PR08MB6004 X-Spam-Status: No, score=-12.2 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_LOTSOFHASH, KAM_SHORT, SCC_5_SHORT_WORD_LINES, SPF_HELO_NONE, SPF_NONE, TXREP, UNPARSEABLE_RELAY autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: 
List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org

Hi All,

In this patch series I'm adding support for zero extending using permutes
instead of requiring multi-step decomposition.

This codegen has the benefit of needing fewer instructions and having much
higher throughput than uxtl. We previously replaced pairs of uxtl/uxtl2s with
ZIPs to increase throughput, but latency was still an issue due to the
dependency chain created.

To fix this we can now use TBLs. The indexes are lifted out of the loop as
well and can be shared amongst zero extends of the same type. The additional
benefit of this is that if the values are being permuted after extensions they
will be simplified and merged leading to better overall code.

e.g. an LD4 can be replaced with LDR since the permutes being performed for
the extensions can be merged with the load permutes. The way LOAD_LANES is
currently implemented means this can't be done in GCC yet, but I'm aiming for
this in GCC 15.

I've additionally only added support for non-VLA. The problem with VLA is that
the index registers are hard or impossible to make.

On Adv. SIMD we use -1 to indicate an out-of-range index so we can transform
the two-register TBL into a one-register one. However on VLA, for e.g. a byte
array, 255 would be a valid entry, e.g. at VL > 2048, which means that's
already not a transformation we can make.

Secondly the actual mask looks something like {x,x,x,n,n+1, x,x,x, n+2, n+3}
and while I think I can represent this in vect_perm_builder, I couldn't think
of any really efficient VLA way to build such masks. It would require a lot of
setup code.

Lastly I don't think this transformation would make much sense for SVE, as SVE
has loads and converts that can do multi-step types. For instance the loop
below is already pretty good for SVE (though it's missed that the load can do
more than one step, presumably because a single extend is merged only in RTL).
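
As a rough illustration of the TBL idea described above (a sketch only, not
part of the patch): the scalar C model below treats any index outside 0..15
as selecting zero, which is how a one-register Adv. SIMD TBL behaves, and uses
0xff (-1) in the index vector for every byte that must become zero. The names
tbl_model, make_zext_mask and zext_u8_to_u64 are hypothetical and chosen for
this example; the byte layout assumes little-endian lanes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of a one-register TBL: an index outside 0..15 yields 0.  */
static void tbl_model (uint8_t out[16], const uint8_t table[16],
                       const uint8_t idx[16])
{
  for (int i = 0; i < 16; i++)
    out[i] = idx[i] < 16 ? table[idx[i]] : 0;
}

/* Build the index vector that zero-extends source bytes FIRST and FIRST + 1
   into two little-endian 64-bit lanes.  0xff marks bytes that become zero.  */
static void make_zext_mask (uint8_t idx[16], unsigned first)
{
  assert (first + 1 < 16);
  memset (idx, 0xff, 16);
  idx[0] = (uint8_t) first;        /* Low byte of lane 0.  */
  idx[8] = (uint8_t) (first + 1);  /* Low byte of lane 1.  */
}

/* Zero-extend 16 bytes to 16 64-bit values with eight independent lookups,
   one per output vector, so no lookup depends on another's result.  */
static void zext_u8_to_u64 (uint64_t out[16], const uint8_t in[16])
{
  for (unsigned v = 0; v < 8; v++)
    {
      uint8_t idx[16], bytes[16];
      make_zext_mask (idx, 2 * v);
      tbl_model (bytes, in, idx);
      /* Assemble each lane explicitly as a little-endian value so the
         result does not depend on host endianness.  */
      for (unsigned lane = 0; lane < 2; lane++)
        {
          uint64_t x = 0;
          for (int b = 7; b >= 0; b--)
            x = (x << 8) | bytes[lane * 8 + b];
          out[2 * v + lane] = x;
        }
    }
}
```

In this model the eight index vectors are loop-invariant, which corresponds to
the masks being kept in registers outside the loop in the generated code.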
While I tried hard, for these reasons I don't support VLA, which I hope is OK.

Concretely on AArch64 this changes:

void test4(unsigned char *x, long long *y, int n) {
    for(int i = 0; i < n; i++) {
        y[i] = x[i];
    }
}

from generating:

.L4:
        ldr     q30, [x4], 16
        add     x3, x3, 128
        zip1    v1.16b, v30.16b, v31.16b
        zip2    v30.16b, v30.16b, v31.16b
        zip1    v2.8h, v1.8h, v31.8h
        zip1    v0.8h, v30.8h, v31.8h
        zip2    v1.8h, v1.8h, v31.8h
        zip2    v30.8h, v30.8h, v31.8h
        zip1    v26.4s, v2.4s, v31.4s
        zip1    v29.4s, v0.4s, v31.4s
        zip1    v28.4s, v1.4s, v31.4s
        zip1    v27.4s, v30.4s, v31.4s
        zip2    v2.4s, v2.4s, v31.4s
        zip2    v0.4s, v0.4s, v31.4s
        zip2    v1.4s, v1.4s, v31.4s
        zip2    v30.4s, v30.4s, v31.4s
        stp     q26, q2, [x3, -128]
        stp     q28, q1, [x3, -96]
        stp     q29, q0, [x3, -64]
        stp     q27, q30, [x3, -32]
        cmp     x4, x5
        bne     .L4

and instead we get:

.L4:
        add     x3, x3, 128
        ldr     q23, [x4], 16
        tbl     v5.16b, {v23.16b}, v31.16b
        tbl     v4.16b, {v23.16b}, v30.16b
        tbl     v3.16b, {v23.16b}, v29.16b
        tbl     v2.16b, {v23.16b}, v28.16b
        tbl     v1.16b, {v23.16b}, v27.16b
        tbl     v0.16b, {v23.16b}, v26.16b
        tbl     v22.16b, {v23.16b}, v25.16b
        tbl     v23.16b, {v23.16b}, v24.16b
        stp     q5, q4, [x3, -128]
        stp     q3, q2, [x3, -96]
        stp     q1, q0, [x3, -64]
        stp     q22, q23, [x3, -32]
        cmp     x4, x5
        bne     .L4

Which results in up to 40% performance uplift on certain workloads.

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* config/aarch64/aarch64.cc (aarch64_use_permute_for_promotion): New.
	(TARGET_VECTORIZE_USE_PERMUTE_FOR_PROMOTION): Use it.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/vect-tbl-zero-extend_1.c: New test.

---

--
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 102680a0efca1ce928e6945033c01cfb68a65152..b90577f4fc8157b3e02936256c8af8b2b7fac144 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -28404,6 +28404,29 @@ aarch64_empty_mask_is_expensive (unsigned)
   return false;
 }
 
+/* Implement TARGET_VECTORIZE_USE_PERMUTE_FOR_PROMOTION.  Assume that
+   predicated operations when available are beneficial when doing more than
+   one step conversion.  */
+
+static bool
+aarch64_use_permute_for_promotion (const_tree in_type, const_tree out_type)
+{
+  /* AArch64's vect_perm_constant doesn't currently support two 64 bit shuffles
+     into a 128 bit vector type.  So for now reject it.  */
+  if (maybe_ne (GET_MODE_BITSIZE (TYPE_MODE (in_type)),
+                GET_MODE_BITSIZE (TYPE_MODE (out_type))))
+    return false;
+
+  auto bitsize_in = element_precision (in_type);
+  auto bitsize_out = element_precision (out_type);
+
+  /* We don't want to use the permutes for a single widening step because we're
+     picking there between two zip and tbl sequences with the same throughput
+     and latencies.  However the zip doesn't require a mask and uses less
+     registers so we prefer that.  */
+  return (bitsize_out / bitsize_in) > 2;
+}
+
 /* Return 1 if pseudo register should be created and used to hold
    GOT address for PIC code.  */
 
@@ -31113,6 +31136,9 @@ aarch64_libgcc_floating_mode_supported_p
 #undef TARGET_VECTORIZE_CONDITIONAL_OPERATION_IS_EXPENSIVE
 #define TARGET_VECTORIZE_CONDITIONAL_OPERATION_IS_EXPENSIVE \
   aarch64_conditional_operation_is_expensive
+#undef TARGET_VECTORIZE_USE_PERMUTE_FOR_PROMOTION
+#define TARGET_VECTORIZE_USE_PERMUTE_FOR_PROMOTION \
+  aarch64_use_permute_for_promotion
 #undef TARGET_VECTORIZE_EMPTY_MASK_IS_EXPENSIVE
 #define TARGET_VECTORIZE_EMPTY_MASK_IS_EXPENSIVE \
   aarch64_empty_mask_is_expensive
diff --git a/gcc/testsuite/gcc.target/aarch64/vect-tbl-zero-extend_1.c b/gcc/testsuite/gcc.target/aarch64/vect-tbl-zero-extend_1.c
new file mode 100644
index 0000000000000000000000000000000000000000..3c088ced63543c203d1cc020de5d67807b48b3fb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vect-tbl-zero-extend_1.c
@@ -0,0 +1,52 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O3 -std=c99 -march=armv8-a" } */
+
+void test1(unsigned short *x, double *y, int n) {
+    for(int i = 0; i < (n & -8); i++) {
+        unsigned short a = x[i*4+0];
+        unsigned short b = x[i*4+1];
+        unsigned short c = x[i*4+2];
+        unsigned short d = x[i*4+3];
+        y[i] = (double)a + (double)b + (double)c + (double)d;
+    }
+}
+
+void test2(unsigned char *x, double *y, int n) {
+    for(int i = 0; i < (n & -8); i++) {
+        unsigned short a = x[i*4+0];
+        unsigned short b = x[i*4+1];
+        unsigned short c = x[i*4+2];
+        unsigned short d = x[i*4+3];
+        y[i] = (double)a + (double)b + (double)c + (double)d;
+    }
+}
+
+void test3(unsigned short *x, double *y, int n) {
+    for(int i = 0; i < (n & -8); i++) {
+        unsigned int a = x[i];
+        y[i] = (double)a;
+    }
+}
+
+void test4(unsigned short *x, long long *y, int n) {
+    for(int i = 0; i < (n & -8); i++) {
+        y[i] = x[i];
+    }
+}
+
+void test5(unsigned int *x, long long *y, int n) {
+    for(int i = 0; i < (n & -8); i++) {
+        y[i] = x[i];
+    }
+}
+
+void test6(unsigned char *x, long long *y, int n) {
+    for(int i = 0; i < (n & -8); i++) {
+        y[i] = x[i];
+    }
+}
+
+/* { dg-final { scan-assembler-times {\tzip1} 1 } } */
+/* { dg-final { scan-assembler-times {\tzip2} 1 } } */
+/* { dg-final { scan-assembler-times {\ttbl} 64 } } */
+/* { dg-final { scan-assembler-times {\.LC[0-9]+:} 12 } } */

From patchwork Mon Oct 14 10:56:36 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Tamar Christina X-Patchwork-Id: 1996810 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=arm.com header.i=@arm.com header.a=rsa-sha256 header.s=selector1 header.b=LC+AC2Dt; dkim=pass (1024-bit key) header.d=arm.com header.i=@arm.com header.a=rsa-sha256 header.s=selector1 header.b=LC+AC2Dt; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org;
Date: Mon, 14 Oct 2024 11:56:36 +0100
From: Tamar Christina
To: gcc-patches@gcc.gnu.org
Cc: nd@arm.com, rguenther@suse.de
Subject: [PATCH 4/4]middle-end: create the longest possible zero extend chain after overwidening

Hi All,

Consider loops such as:

void test9(unsigned char *x, long long *y, int n, unsigned char k) {
    for(int i = 0; i < n; i++) {
        y[i] = k + x[i];
    }
}

where today we generate:

.L5:
        ldr     q29, [x5], 16
        add     x4, x4, 128
        uaddl   v1.8h, v29.8b, v30.8b
        uaddl2  v29.8h, v29.16b, v30.16b
        zip1    v2.8h, v1.8h, v31.8h
        zip1    v0.8h, v29.8h, v31.8h
        zip2    v1.8h, v1.8h, v31.8h
        zip2    v29.8h, v29.8h, v31.8h
        sxtl    v25.2d, v2.2s
        sxtl    v28.2d, v0.2s
        sxtl    v27.2d, v1.2s
        sxtl    v26.2d, v29.2s
        sxtl2   v2.2d, v2.4s
        sxtl2   v0.2d, v0.4s
        sxtl2   v1.2d, v1.4s
        sxtl2   v29.2d, v29.4s
        stp     q25, q2, [x4, -128]
        stp     q27, q1, [x4, -96]
        stp     q28, q0, [x4, -64]
        stp     q26, q29, [x4, -32]
        cmp     x5, x6
        bne     .L5

Note how the zero extend from short to long is transformed into a sign extend
halfway through the chain. There are two problems with this:

  1. sign extends are slower than zero extends on many uArches.
  2. it prevents vectorizable_conversion from attempting to do a single step
     promotion.

These sign extends happen due to the various range reduction optimizations and
patterns we have, such as multiplication widening.

My first attempt to fix this was to update the patterns so that they do not add
the intermediate sign extend when the original source is a zero extend.
However this behavior happens in many other places, and as new patterns get
added the problem can be re-introduced.

Instead I have added a new pattern vect_recog_zero_extend_chain_pattern that
attempts to simplify and extend an existing zero extend over multiple
conversion statements.

As an example, T3 a = (T3)(signed T2)(unsigned T1)x where bitsize T3 > T2 > T1
gets transformed into T3 a = (T3)(signed T2)(unsigned T2)x.

The final cast to signed is kept so the types in the tree still match. It will
be correctly elided later on.

This representation is optimal because vectorizable_conversion is already able
to decompose a long promotion into multiple steps if the target does not
support it in a single step. More importantly it allows us to do proper
costing and to support conversions such as (double)x, where
bitsize(x) < bitsize(int), in an efficient manner.

To do this I have used Ranger's on-demand analysis to check whether an
extension can be removed and folded into the zero extend. The reason for this
is that the vectorizer introduces several patterns that are not in the IL, but
also lots of widening IFNs for which handling in a switch wouldn't be very
future proof.

I did try to do it without Ranger, but Ranger had three benefits:

  1. It simplified the handling of the IL changes the vectorizer introduces,
     and makes it future proof.
  2. Ranger allows the transformation in cases where it knows that the top
     bits of the value are zero, which we wouldn't be able to tell by looking
     purely at statements.
  3. Ranger simplified the handling of corner cases. Without it the handling
     was quite complex and I wasn't very confident in its correctness.

So I think Ranger is the right way to go here. With these changes the above
now generates:

.L5:
        add     x4, x4, 128
        ldr     q26, [x5], 16
        uaddl   v2.8h, v26.8b, v31.8b
        uaddl2  v26.8h, v26.16b, v31.16b
        tbl     v4.16b, {v2.16b}, v30.16b
        tbl     v3.16b, {v2.16b}, v29.16b
        tbl     v24.16b, {v2.16b}, v28.16b
        tbl     v1.16b, {v26.16b}, v30.16b
        tbl     v0.16b, {v26.16b}, v29.16b
        tbl     v25.16b, {v26.16b}, v28.16b
        tbl     v2.16b, {v2.16b}, v27.16b
        tbl     v26.16b, {v26.16b}, v27.16b
        stp     q4, q3, [x4, -128]
        stp     q1, q0, [x4, -64]
        stp     q24, q2, [x4, -96]
        stp     q25, q26, [x4, -32]
        cmp     x5, x6
        bne     .L5

I have also seen similar improvements in codegen on Arm and x86_64, especially
with AVX512.

Bootstrapped and regtested on aarch64-none-linux-gnu,
arm-none-linux-gnueabihf, x86_64-pc-linux-gnu -m32, -m64 with no issues.

Hopefully Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* tree-vect-patterns.cc (vect_recog_zero_extend_chain_pattern): New.

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/bb-slp-pattern-1.c: Update tests.
	* gcc.dg/vect/slp-widen-mult-half.c: Likewise.
	* gcc.dg/vect/vect-over-widen-10.c: Likewise.
	* gcc.dg/vect/vect-over-widen-12.c: Likewise.
	* gcc.dg/vect/vect-over-widen-14.c: Likewise.
	* gcc.dg/vect/vect-over-widen-16.c: Likewise.
	* gcc.dg/vect/vect-over-widen-6.c: Likewise.
	* gcc.dg/vect/vect-over-widen-8.c: Likewise.
	* gcc.dg/vect/vect-widen-mult-u16.c: Likewise.
	* gcc.dg/vect/vect-widen-mult-u8-s16-s32.c: Likewise.
	* lib/target-supports.exp
	(check_effective_target_vect_widen_mult_hi_to_si_pattern,
	check_effective_target_vect_widen_mult_si_to_di_pattern): Enable AArch64.
	* gcc.target/aarch64/vect-tbl-zero-extend_2.c: New test.

---

--
diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pattern-1.c b/gcc/testsuite/gcc.dg/vect/bb-slp-pattern-1.c
index 5ae99225273ca5f915f60ecba3a5aaedebe46e96..627de78af4e48581575beda97bf2a0708ac091cb 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-pattern-1.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-pattern-1.c
@@ -52,4 +52,4 @@ int main (void)

 /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "slp2" { target { vect_widen_mult_hi_to_si || vect_unpack } } } } */
 /* { dg-final { scan-tree-dump-times "vect_recog_widen_mult_pattern: detected" 8 "slp2" { target vect_widen_mult_hi_to_si_pattern } } } */
-/* { dg-final { scan-tree-dump-times "pattern recognized" 8 "slp2" { target vect_widen_mult_hi_to_si_pattern } } } */
+/* { dg-final { scan-tree-dump-times "widen_mult pattern recognized" 8 "slp2" { target vect_widen_mult_hi_to_si_pattern } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/slp-widen-mult-half.c b/gcc/testsuite/gcc.dg/vect/slp-widen-mult-half.c
index b69ade338862cda4f44f5206d195eef1cb5e8d36..aecc085a51c93e0e7bed122df0a77a0a099ad6ef 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-widen-mult-half.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-widen-mult-half.c
@@ -52,5 +52,5 @@ int main (void)
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_widen_mult_hi_to_si } } } */
 /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target vect_widen_mult_hi_to_si } } } */
 /* { dg-final { scan-tree-dump-times "vect_recog_widen_mult_pattern: detected" 2 "vect" { target vect_widen_mult_hi_to_si_pattern } } } */
-/* { dg-final { scan-tree-dump-times "pattern recognized" 2 "vect" { target vect_widen_mult_hi_to_si_pattern } } } */
+/* { dg-final { scan-tree-dump-times "pattern recognized" 4 "vect" { target vect_widen_mult_hi_to_si_pattern } } } */

diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-10.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-10.c
index f0140e4ef6d70cd61aa7dbb3ba39b1da142a79b2..bd798fae7e8136975d488206cfef9e39fac2bfea 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-over-widen-10.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-10.c
@@ -11,7 +11,7 @@

 #include "vect-over-widen-9.c"

-/* { dg-final { scan-tree-dump {Splitting statement} "vect" } } */
+/* { dg-final { scan-tree-dump {Splitting pattern statement} "vect" } } */
 /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+ } "vect" } } */
 /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 1} "vect" } } */
 /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 2} "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-12.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-12.c
index ddb3bd8c0d378f0138c8cc7f9c6ea3300744b8a8..8c0544e35c29de60e76759f4ed13206278c72925 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-over-widen-12.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-12.c
@@ -11,7 +11,7 @@

 #include "vect-over-widen-11.c"

-/* { dg-final { scan-tree-dump {Splitting statement} "vect" } } */
+/* { dg-final { scan-tree-dump {Splitting pattern statement} "vect" } } */
 /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+ } "vect" } } */
 /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 1} "vect" } } */
 /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 2} "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-14.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-14.c
index dfa09f5d2cafe329e6d57b5cc681786cc2c7d215..1fe0305c1c4f61d05864ef97789726a1dc6ec8b1 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-over-widen-14.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-14.c
@@ -11,7 +11,7 @@

 #include "vect-over-widen-13.c"

-/* { dg-final { scan-tree-dump {Splitting statement} "vect" } } */
+/* { dg-final { scan-tree-dump {Splitting pattern statement} "vect" } } */
 /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+} "vect" } } */
 /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 1} "vect" } } */
 /* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* = \(unsigned char\)} "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-16.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-16.c
index 4584c586da1e6f13e8c8de4c1291cea0141ebab5..4ecdadf7a035a4f83b1767a063a1b0f47bdd543d 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-over-widen-16.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-16.c
@@ -11,7 +11,7 @@

 #include "vect-over-widen-15.c"

-/* { dg-final { scan-tree-dump {Splitting statement} "vect" } } */
+/* { dg-final { scan-tree-dump {Splitting pattern statement} "vect" } } */
 /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+} "vect" } } */
 /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 1} "vect" } } */
 /* { dg-final { scan-tree-dump-not {vect_recog_cast_forwprop_pattern: detected} "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-6.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-6.c
index bda92c965e080dd3f48ec42b6bea16e79d9416cd..6b8c3dfa2c89ce04d7673607ef2d2f14a14eb32f 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-over-widen-6.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-6.c
@@ -9,7 +9,7 @@

 #include "vect-over-widen-5.c"

-/* { dg-final { scan-tree-dump {Splitting statement} "vect" } } */
+/* { dg-final { scan-tree-dump {Splitting pattern statement} "vect" } } */
 /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+ } "vect" } } */
 /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 1} "vect" } } */
 /* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* \(unsigned char\)} "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-8.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-8.c
index 553c0712a79a1d19195dbdab7cbd6fa330685bea..1cf725ff4b7f151097192db1a0b65173c4c83b19 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-over-widen-8.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-8.c
@@ -12,7 +12,7 @@

 #include "vect-over-widen-7.c"

-/* { dg-final { scan-tree-dump {Splitting statement} "vect" } } */
+/* { dg-final { scan-tree-dump {Splitting pattern statement} "vect" } } */
 /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+ } "vect" } } */
 /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 2} "vect" } } */
 /* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* \(unsigned char\)} "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-widen-mult-u16.c b/gcc/testsuite/gcc.dg/vect/vect-widen-mult-u16.c
index 258d253f401459d448d1ae86f56b0c97815d5b61..b5018f855a72534b4d64d2dc2b7ab2ac0deb674b 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-widen-mult-u16.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-widen-mult-u16.c
@@ -47,5 +47,5 @@ int main (void)

 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_widen_mult_hi_to_si || vect_unpack } } } } */
 /* { dg-final { scan-tree-dump-times "vect_recog_widen_mult_pattern: detected" 1 "vect" { target vect_widen_mult_hi_to_si_pattern } } } */
-/* { dg-final { scan-tree-dump-times "pattern recognized" 1 "vect" { target vect_widen_mult_hi_to_si_pattern } } } */
+/* { dg-final { scan-tree-dump-times "widen_mult pattern recognized" 1 "vect" { target vect_widen_mult_hi_to_si_pattern } } } */

diff --git a/gcc/testsuite/gcc.dg/vect/vect-widen-mult-u8-s16-s32.c b/gcc/testsuite/gcc.dg/vect/vect-widen-mult-u8-s16-s32.c
index 3baafca7b548124ae5c48fdf3c2f07c319155967..ab523ca77652e1f1533889fda9c0eb31c987ffe9 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-widen-mult-u8-s16-s32.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-widen-mult-u8-s16-s32.c
@@ -47,5 +47,5 @@ int main (void)

 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_widen_mult_hi_to_si || vect_unpack } } } } */
 /* { dg-final { scan-tree-dump-times "vect_recog_widen_mult_pattern: detected" 1 "vect" { target vect_widen_mult_hi_to_si_pattern } } } */
-/* { dg-final { scan-tree-dump-times "pattern recognized" 1 "vect" { target vect_widen_mult_hi_to_si_pattern } } } */
+/* { dg-final { scan-tree-dump-times "widen_mult pattern recognized" 1 "vect" { target vect_widen_mult_hi_to_si_pattern } } } */

diff --git a/gcc/testsuite/gcc.target/aarch64/vect-tbl-zero-extend_2.c b/gcc/testsuite/gcc.target/aarch64/vect-tbl-zero-extend_2.c
new file mode 100644
index 0000000000000000000000000000000000000000..1577eacd9dbbb52274d9f86c77406555b7726482
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vect-tbl-zero-extend_2.c
@@ -0,0 +1,33 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O3 -std=c99 -march=armv8-a" } */
+
+void test6(unsigned char *x, double *y, int n) {
+    for(int i = 0; i < (n & -8); i++) {
+        y[i] += x[i];
+    }
+}
+
+void test7(unsigned char *x, double *y, int n, unsigned char k) {
+    for(int i = 0; i < (n & -8); i++) {
+        y[i] += k * x[i];
+    }
+}
+
+void test8(unsigned char *x, double *y, int n, unsigned char k) {
+    for(int i = 0; i < (n & -8); i++) {
+        y[i] = k + x[i];
+    }
+}
+
+void test9(unsigned char *x, long long *y, int n, unsigned char k) {
+    for(int i = 0; i < (n & -8); i++) {
+        y[i] = k + x[i];
+    }
+}
+
+/* { dg-final { scan-assembler-times {\tuxtl} 1 } } */
+/* { dg-final { scan-assembler-not {\tuxtl2} } } */
+/* { dg-final { scan-assembler-not {\tzip1} } } */
+/* { dg-final { scan-assembler-not {\tzip2} } } */
+/* { dg-final { scan-assembler-times {\ttbl} 44 } } */
+/* { dg-final { scan-assembler-times {\.LC[0-9]+:} 12 } } */
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index d113a08dff7b2a8ab5bdfe24386d271bff255afc..feae1b8fcf8cd7ab56a8c76c0cd3034c0a828724 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -8240,6 +8240,7 @@ proc check_effective_target_vect_widen_mult_hi_to_si_pattern { } {
     return [check_cached_effective_target_indexed vect_widen_mult_hi_to_si_pattern {
       expr { [istarget powerpc*-*-*]
              || [istarget ia64-*-*]
+             || [istarget aarch64*-*-*]
              || [istarget loongarch*-*-*]
              || [istarget i?86-*-*] || [istarget x86_64-*-*]
              || ([is-effective-target arm_neon]
@@ -8259,6 +8260,7 @@ proc check_effective_target_vect_widen_mult_si_to_di_pattern { } {
       expr { [istarget ia64-*-*]
              || [istarget i?86-*-*] || [istarget x86_64-*-*]
              || [istarget loongarch*-*-*]
+             || [istarget aarch64*-*-*]
              || ([istarget s390*-*-*]
                  && [check_effective_target_s390_vx]) }}]
 }
diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index 9bf8526ac995c6c2678b25f5df4316aec41333e0..74c7269a3ab15cba1ee2ef0556d25afda851f7f0 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -5524,6 +5524,122 @@ vect_recog_mixed_size_cond_pattern (vec_info *vinfo,
   return pattern_stmt;
 }

+/* Function vect_recog_zero_extend_chain_pattern
+
+   Try to find the following pattern:
+
+     type x_t;
+     TYPE a_T, b_T, c_T;
+   loop:
+     S1 a_T = (b_T)(c_T)x_t;
+
+   where type 'TYPE' is an integral type which has different size
+   from 'type' and c_T is a zero extend or a sign extend on a value whose top
+   bit is known to be zero. a_T can be signed or unsigned.
+
+   Input:
+
+   * STMT_VINFO: The stmt from which the pattern search begins.
+
+   Output:
+
+   * TYPE_OUT: The type of the output of this pattern.
+
+   * Return value: A new stmt that will be used to replace the pattern.
+     This replaces multiple chained extensions with the longest possible
+     chain of zero extends and a final convert to the required sign.
+ + S1 a_T = (a_T)(unsigned a_T)x_t; */ + +static gimple * +vect_recog_zero_extend_chain_pattern (vec_info *vinfo, + stmt_vec_info stmt_vinfo, tree *type_out) +{ + gimple *last_stmt = STMT_VINFO_STMT (vect_stmt_to_vectorize (stmt_vinfo)); + + if (!is_gimple_assign (last_stmt)) + return NULL; + + tree_code code = gimple_assign_rhs_code (last_stmt); + tree lhs = gimple_assign_lhs (last_stmt); + tree rhs = gimple_assign_rhs1 (last_stmt); + tree lhs_type = TREE_TYPE (lhs); + tree rhs_type = TREE_TYPE (rhs); + + if ((code != FLOAT_EXPR && code != NOP_EXPR) + || TYPE_UNSIGNED (lhs_type) + || TREE_CODE (rhs_type) != INTEGER_TYPE + || TREE_CODE (rhs) != SSA_NAME + || STMT_VINFO_DEF_TYPE (stmt_vinfo) != vect_internal_def) + return NULL; + + /* Check to see if it's safe to extend the zero extend to the new type. + In general this is safe if the rhs1 type is unsigned or if we know that + the top bits are zero, this can happen due to all the widening operations + we have. For instance a widening addition will have top bits zero. */ + if (!TYPE_UNSIGNED (rhs_type)) + { + wide_int wcst = get_nonzero_bits (rhs); + if (wi::neg_p (wcst) || wi::clz (wcst) == 0) + return NULL; + } + + tree cvt_type = unsigned_type_for (lhs_type); + + tree cvt_vectype = get_vectype_for_scalar_type (vinfo, cvt_type); + if (!cvt_vectype || !VECTOR_TYPE_P (cvt_vectype)) + return NULL; + + tree out_vectype = get_vectype_for_scalar_type (vinfo, lhs_type); + if (!out_vectype || !VECTOR_TYPE_P (out_vectype)) + return NULL; + + stmt_vec_info irhs; + + gimple_ranger ranger; + + /* Dig through any existing conversions to see if we can extend the zero + extend chain across multiple converts. */ + while ((irhs = vect_get_internal_def (vinfo, rhs))) + { + gimple *g_irhs = STMT_VINFO_STMT (irhs); + if (!is_gimple_assign (g_irhs) + || gimple_assign_rhs_code (g_irhs) != NOP_EXPR) + break; + + /* See if we can consume the next conversion as well. 
To do this it's + best to use Ranger as it can see through the intermediate IL that the + vectorizer creates throughout pattern matching. */ + int_range_max r; + ranger.range_of_stmt (r, g_irhs); + wide_int nz = r.get_nonzero_bits (); + if (wi::neg_p (nz) || wi::clz (nz) == 0) + break; + + rhs = gimple_assign_rhs1 (g_irhs); + } + + /* If the result is a no-op, or we've jumped over a truncate of sort, or if + nothing would change materially just leave it alone. */ + if (TYPE_PRECISION (lhs_type) <= TYPE_PRECISION (TREE_TYPE (rhs)) + || (code == FLOAT_EXPR && rhs == gimple_assign_rhs1 (last_stmt))) + return NULL; + + vect_pattern_detected ("vect_recog_zero_extend_chain_pattern", last_stmt); + + tree cast_var = vect_recog_temp_ssa_var (cvt_type, NULL); + gimple *pattern_stmt = NULL; + pattern_stmt = gimple_build_assign (cast_var, NOP_EXPR, rhs); + append_pattern_def_seq (vinfo, stmt_vinfo, pattern_stmt, cvt_vectype); + + tree cvt_var = vect_recog_temp_ssa_var (lhs_type, NULL); + pattern_stmt = gimple_build_assign (cvt_var, code, cast_var); + + *type_out = out_vectype; + + return pattern_stmt; +} + /* Helper function of vect_recog_bool_pattern. Called recursively, return true if bool VAR can and should be optimized that way. Assume it shouldn't @@ -7509,6 +7625,7 @@ static vect_recog_func vect_vect_recog_func_ptrs[] = { { vect_recog_widen_minus_pattern, "widen_minus" }, { vect_recog_widen_abd_pattern, "widen_abd" }, /* These must come after the double widening ones. */ + { vect_recog_zero_extend_chain_pattern, "zero_extend_chain" }, }; /* Mark statements that are involved in a pattern. */
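
For readers following the safety argument in vect_recog_zero_extend_chain_pattern: the pattern only treats a signed conversion as a zero extend when the nonzero-bits mask shows the top bit is clear. Below is a minimal standalone sketch of that reasoning on 8-bit values. It is not GCC code; the helper names (top_bit_known_zero_8, sign_extend_8_to_32, zero_extend_8_to_32, extensions_agree_under_mask) are hypothetical and only model the idea behind the "wi::neg_p (nz) || wi::clz (nz) == 0" test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the patch's bail-out test: the mask has a 1 in
   every bit position that may be nonzero.  The top bit is known to be
   zero only when the mask leaves bit 7 clear.  */
static bool
top_bit_known_zero_8 (uint8_t nonzero_mask)
{
  return (nonzero_mask & 0x80u) == 0;
}

/* Sign-extend an 8-bit pattern to 32 bits using plain unsigned
   arithmetic, so the result does not depend on signed conversions.  */
static uint32_t
sign_extend_8_to_32 (uint8_t v)
{
  return (v & 0x80u) ? (0xFFFFFF00u | v) : (uint32_t) v;
}

/* Zero-extend an 8-bit pattern to 32 bits.  */
static uint32_t
zero_extend_8_to_32 (uint8_t v)
{
  return (uint32_t) v;
}

/* Check every 8-bit value permitted by the mask and report whether sign
   and zero extension always give the same 32-bit result.  */
static bool
extensions_agree_under_mask (uint8_t nonzero_mask)
{
  for (unsigned v = 0; v < 256; v++)
    {
      if ((v & ~(unsigned) nonzero_mask) != 0)
        continue; /* Value sets a bit the mask says is always zero.  */
      if (sign_extend_8_to_32 ((uint8_t) v)
          != zero_extend_8_to_32 ((uint8_t) v))
        return false;
    }
  return true;
}
```

With a mask of 0x7F the two extensions agree for every permitted value, which is the case the pattern accepts. With a mask of 0xFF they differ (for example on 0x80), which corresponds to the case where the pattern returns NULL or stops walking the conversion chain.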