From patchwork Mon Jul 22 11:39:36 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Saurabh Jha X-Patchwork-Id: 1963200 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=arm.com header.i=@arm.com header.a=rsa-sha256 header.s=selector1 header.b=FFs7I3Yy; dkim=pass (1024-bit key) header.d=arm.com header.i=@arm.com header.a=rsa-sha256 header.s=selector1 header.b=FFs7I3Yy; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org; envelope-from=gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4WSJHp4XTtz1yXp for ; Mon, 22 Jul 2024 21:40:37 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 9E8A63858416 for ; Mon, 22 Jul 2024 11:40:35 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from EUR03-DBA-obe.outbound.protection.outlook.com (mail-dbaeur03on20601.outbound.protection.outlook.com [IPv6:2a01:111:f403:260d::601]) by sourceware.org (Postfix) with ESMTPS id 9EE943858C31 for ; Mon, 22 Jul 2024 11:40:10 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 9EE943858C31 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=arm.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 9EE943858C31 Authentication-Results: server2.sourceware.org; arc=pass smtp.remote-ip=2a01:111:f403:260d::601 ARC-Seal: i=3; a=rsa-sha256; d=sourceware.org; s=key; t=1721648413; cv=pass; b=kvaGVX6OXAzaGdQqkyeOsuPQWuMiSiMo6+Ne4qGJMHyiU5guyIW8OdlVGNv9v0PfhWZtLSvcfR0kY/BwklKp6XQUUcbUmc6ePXuFmVU+D6c3TmBSSOcP65MAvGg6ZGDFzmLft5vedtl7QMsDFRQV2qdysLZD7/Zn9sXo6PeB1Ow= ARC-Message-Signature: i=3; a=rsa-sha256; d=sourceware.org; s=key; t=1721648413; c=relaxed/simple; bh=wo81SKTp0Wiy8Amp3t6HWhL4imhKIlv9OuEvFLcegww=; h=DKIM-Signature:DKIM-Signature:From:To:Subject:Date:Message-ID: MIME-Version; b=jNQ0KuL2sQD4X/N0GCq3EXHsYiqMI5HQ9WMjIvc7Je9gLeLzwLqgiR8yfwE3GAiLuCsfdli4/9WYpSr1Vr/Wxbpk8OTjjw2Az3po11uWKWyPudx8t1RfGPM0p7nBxilpM9lu+xGkPVTtDLCoWqDKKpsSMYESwzz3yF6mPqQwdvc= ARC-Authentication-Results: i=3; server2.sourceware.org ARC-Seal: i=2; a=rsa-sha256; s=arcselector10001; d=microsoft.com; cv=pass; b=JSc7NNZWbYTssc65DS/q1JKADLmdbkqqfIGBW7Ha63VK38uDc1f2/BPqLvf7/FEiv4uzrzVwuWBiVo1YeeW0qX4jylA4ReJ9mPPJMd9KSiL3WLXncC6jFw2xPj7JmOyYdD0W4peeNm8tgcQLzc9yMhq1lndla4pE5rd8ghzpRXtajcjazleyBdxCjDWltlQM8aCTjDVX5gopvYUKxk1vUKHr8OK7/372L0Gb4gqJr+aYT8vCq56RWtMpDT0nTcmjU1LsG0y3ClNJFg9HD/peR+cWnA3m1tHgu4OfClE3ajbzu3H+ohsoXfwcGNkuenw9r/ygJq9Xr4UXfDGXQfOclA== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector10001; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=u76LsXdp9X6IwAiKVliczWq+eEamY1Kgg9WaxOrjDrE=; b=iZVIBUOuRn3YDSR3Xbi8tgFuGNdhthya2CaU/5cdaWuYIv7afJvGBlvUb2g49W40jDOC7slslliPOvUJp3QsIrjYDTOBJpEn2kyU1UZvwHRCUb8REBzccDUm6VG5kDvhnhQUdb17mnjXge5osAWKH97xkytrEBEslBiLQpwW2n1TCFbpd1drmH+Yfcp374IUYVESgBF+ElEgnCRXsEzN/yndltTIoSNA/aU/Guu9LOjSKIoGL9V4pPyvrQ5WLS+9WBzFOn9ofPWe1FqFs/PAeKUB6o1RBRkR9Fi8ZGRKV8PgHcflvqZqVFOEQDTR1q5f9b1LcN4EoztIOhQwObFFtg== ARC-Authentication-Results: i=2; mx.microsoft.com 1; spf=pass (sender ip is 63.35.35.123) smtp.rcpttodomain=gcc.gnu.org smtp.mailfrom=arm.com; dmarc=pass (p=none sp=none pct=100) action=none header.from=arm.com; dkim=pass (signature was verified) header.d=arm.com; arc=pass (0 oda=1 ltdi=1 spf=[1,1,smtp.mailfrom=arm.com] dmarc=[1,1,header.from=arm.com]) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=arm.com; s=selector1; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=u76LsXdp9X6IwAiKVliczWq+eEamY1Kgg9WaxOrjDrE=; b=FFs7I3YyzGkx70G0zWCzCFgrrTQsTUafElGtOW3GUWUzboJESvRsamhsWAdl1Yn0lhLCYcmzWhv225wxXeyxzUoC4CQY2fNFLIAAevMkuhZwcRGPLTeoGVV3u2qWfBCQPgje8rwJbqvbIes/aEtFtnyChpbTPiPHLIPo4N6ocK8= Received: from AM6P193CA0064.EURP193.PROD.OUTLOOK.COM (2603:10a6:209:8e::41) by GV2PR08MB9373.eurprd08.prod.outlook.com (2603:10a6:150:da::18) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.7762.31; Mon, 22 Jul 2024 11:40:05 +0000 Received: from AMS1EPF0000004B.eurprd04.prod.outlook.com (2603:10a6:209:8e:cafe::7a) by AM6P193CA0064.outlook.office365.com (2603:10a6:209:8e::41) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.7784.18 via Frontend Transport; Mon, 22 Jul 2024 11:40:05 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 63.35.35.123) smtp.mailfrom=arm.com; dkim=pass (signature was verified) header.d=arm.com;dmarc=pass action=none header.from=arm.com; Received-SPF: Pass (protection.outlook.com: domain of arm.com designates 63.35.35.123 as permitted sender) receiver=protection.outlook.com; client-ip=63.35.35.123; helo=64aa7808-outbound-1.mta.getcheckrecipient.com; pr=C Received: from 64aa7808-outbound-1.mta.getcheckrecipient.com (63.35.35.123) by AMS1EPF0000004B.mail.protection.outlook.com (10.167.16.136) with Microsoft SMTP Server (version=TLS1_3, cipher=TLS_AES_256_GCM_SHA384) id 15.20.7784.11 via Frontend Transport; Mon, 22 Jul 2024 11:40:05 +0000 Received: ("Tessian outbound cd0b9b5d6f11:v365"); Mon, 22 Jul 2024 11:40:05 +0000 X-CheckRecipientChecked: true X-CR-MTA-CID: 67e886b2828cf872 X-CR-MTA-TID: 64aa7808 Received: from L103e71372833.1 by 64aa7808-outbound-1.mta.getcheckrecipient.com id BDD32D75-F44B-4170-8285-3C2798D1B428.1; Mon, 22 Jul 2024 11:39:53 +0000 Received: from EUR05-AM6-obe.outbound.protection.outlook.com by 64aa7808-outbound-1.mta.getcheckrecipient.com with ESMTPS id L103e71372833.1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384); Mon, 22 Jul 2024 11:39:53 +0000 ARC-Seal: i=1; a=rsa-sha256; s=arcselector10001; d=microsoft.com; cv=none; b=Wd4nrldTJg5d7PpZ9sPGcymwVg/vYa7L34jHS1hGkqFj643bp2a0CEYGoHqgVScm1pbMDWEiLDNh18yZgw6hqC5/bHy9+mn6fy7r0fdjW9pZ8QvkEspdEWbLhOsJkPysaqqEHa1RvZmMJnQTUFHNf/rEAzRHy4E0/aEQXhw+Dq4BH/Au+TJwbXcQ537AhvIoIK96WXbnwKPFlC9FrvuW1cJXLLrb06fu3Ox1KaqyU8on5JzwnG8UTZkLhmIiTzDcu9pR8H8PEOeZvw72XlJHj70oD+wyXYHDvP45HrSXsCS2Vg7I1EgABG/tZ0st7CrZTJ1TTF82LjAUM2POQ5QYjQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector10001; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=u76LsXdp9X6IwAiKVliczWq+eEamY1Kgg9WaxOrjDrE=; b=EHAnwwltNXTWUB4Aj69MzaS2r5yz4/UHujvO3wXUp1IP+WYZd++Q98QLxe+MXV6VdJn+lcWAiucQrNRYkOuAp9lrd22BSc3yGJzxrzCfZdbsCwmuuxox9xVY0hz4I3BbJ2uk5Ak70YWz/zfOTgIstTQTXuobx6el+nTt61w6/P5+ktIotNiWhMENyMxB3+4dPMV/SUAMJorAUMUWxyEVgjQG4KgnU6AUBmqImR5ESNSJ8Is78HC49eidObA/0IYRTLxBlxsJuJATx68v09FqCNDc+WOJ68O86KWcZkWXZUkYEDnwfEcW3nv6RkmSTu2J6vdCH/h+GuLdUwoBbg3gIg== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 40.67.248.234) smtp.rcpttodomain=gcc.gnu.org smtp.mailfrom=arm.com; dmarc=pass (p=none sp=none pct=100) action=none header.from=arm.com; dkim=none (message not signed); arc=none (0) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=arm.com; s=selector1; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=u76LsXdp9X6IwAiKVliczWq+eEamY1Kgg9WaxOrjDrE=; b=FFs7I3YyzGkx70G0zWCzCFgrrTQsTUafElGtOW3GUWUzboJESvRsamhsWAdl1Yn0lhLCYcmzWhv225wxXeyxzUoC4CQY2fNFLIAAevMkuhZwcRGPLTeoGVV3u2qWfBCQPgje8rwJbqvbIes/aEtFtnyChpbTPiPHLIPo4N6ocK8= Received: from AM0PR06CA0076.eurprd06.prod.outlook.com (2603:10a6:208:fa::17) by AS8PR08MB9768.eurprd08.prod.outlook.com (2603:10a6:20b:613::21) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.7762.28; Mon, 22 Jul 2024 11:39:51 +0000 Received: from AM3PEPF0000A790.eurprd04.prod.outlook.com (2603:10a6:208:fa:cafe::db) by AM0PR06CA0076.outlook.office365.com (2603:10a6:208:fa::17) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.7784.16 via Frontend Transport; Mon, 22 Jul 2024 11:39:51 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 40.67.248.234) smtp.mailfrom=arm.com; dkim=none (message not signed) header.d=none;dmarc=pass action=none header.from=arm.com; Received-SPF: Pass (protection.outlook.com: domain of arm.com designates 40.67.248.234 as permitted sender) receiver=protection.outlook.com; client-ip=40.67.248.234; helo=nebula.arm.com; pr=C Received: from nebula.arm.com (40.67.248.234) by AM3PEPF0000A790.mail.protection.outlook.com (10.167.16.119) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.20.7784.11 via Frontend Transport; Mon, 22 Jul 2024 11:39:51 +0000 Received: from AZ-NEU-EX03.Arm.com (10.251.24.31) by AZ-NEU-EX04.Arm.com (10.251.24.32) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.39; Mon, 22 Jul 2024 11:39:40 +0000 Received: from e130340.cambridge.arm.com (10.2.80.47) by mail.arm.com (10.251.24.31) with Microsoft SMTP Server id 15.1.2507.39 via Frontend Transport; Mon, 22 Jul 2024 11:39:40 +0000 From: To: CC: , , , Saurabh Jha Subject: [PATCH] aarch64: Add ACLE intrinsics for AdvSIMD faminmax Date: Mon, 22 Jul 2024 12:39:36 +0100 Message-ID: <20240722113937.959073-1-saurabh.jha@arm.com> X-Mailer: git-send-email 2.43.2 MIME-Version: 1.0 X-EOPAttributedMessage: 1 X-MS-TrafficTypeDiagnostic: AM3PEPF0000A790:EE_|AS8PR08MB9768:EE_|AMS1EPF0000004B:EE_|GV2PR08MB9373:EE_ X-MS-Office365-Filtering-Correlation-Id: 21a47375-d4bb-4d4a-6ca4-08dcaa430832 x-checkrecipientrouted: true NoDisclaimer: true X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam-Untrusted: BCL:0; ARA:13230040|82310400026|1800799024|36860700013|376014; X-Microsoft-Antispam-Message-Info-Original: Pvo+/g/vMvilUSDOZ5D/ZWaOO2qcoWG6Xuz82A9TilVZuH/ISLgJ45MEDpKWoxi/E0L/zAPO0qfCIljd2yVrJfi+oqVu4l8aZnpkx9Qxx04yZVxARlbuRD15l0JAqOSxxbD+lL1O6zPoNlzw8oHzFyHZCFQSEjLIL5rbRlr5d/ED6Vah7NbGM4cR81frxd9gcIeBxRKsxK36wwTQeWkSpzR1MYyJkeX82h3LpcNk/R6G8QVsYg78cU/Om8AHEXJSYBa8ZcsgHQ69y7S3xOKNB3+5aPktF8Af2ZfgWmvwYB6fe/doJtZHk2ZLSwgjxsftFwt3tZPXuZovXELSaA2epPU8Qzdm7SzpgO7tz3xvHdUeuQrlZu9255JE4yc4I+iCA1hMTdKmGozTHPTx+Y26IrbQsH6En7v4cgeU5dhtKGzfLa86GtJ3PLTM+fYO4SaQAE02jpSTTreYhlZLnYJ2HdqEb2VwlWuRuzjHbLAbPo673YY5Ae+yVTQiNujv60ELGBPFampZoAPwdDgdDKOoCM6pnUpyERLtx3JhAIwmiSG1U3elRasNrK5yDRTdVoDCukXBtiJmfqXI4F/VNhw9k+oy2IaPiFRCFJreAuvmpG/xXnzOjUp+MSO4NN+R+ziKXjrAqxwSjcVpkhYRQ9NCe11XoEZJ+jn9lrGRyFoqCMvA2TO+j+PtM6bxGK8aQXOK1yYFs0hcvnpSdiBk2juCIb4uefhX3rd1hwFgkctsxKF3SYKMH6b3KdBm8ubQGVkSEs+puVM1NtmRQw5+bCUJ1NJDVQ0yIR8T74lUHwNbEJJ/sxQE7yDvawLSoIc5cWoJD6+vQmHR1vP4mZdd0ZaYLqB4OkwI3si24lpoKlDAl20LaIzorioq6A8lXlej4O2aYbgc0xR0UxQJ8MTT53jrAAwNdeysxtFM9Zhadz0I7Kn8naRz19cUA6FfJlZL16oCR6LfeZjZot7w8pjFvpflx4cke7gaHkWRc2qGzclMGFYGxGwe0Tl/8C1KpXInphObGB+iqVs2yyj+0WLCqVcoCASGWW4z9Z7+5/t/2Zx4XEDKTx4Goj6f7VLYmCD+BDHgr00xjEWxKiUN6CVk6IKR6ZsR9ViF4+Ux9mRD5v7rI0wEbLFMD9iKxZG3ubdxoe1DTR4htdqARA2eOFfm2FNCYbdG8hmKRW9GcE+MACWXluh4MC1Bng7NtMpMzz4kIea4G/7qMan7HYGbgejGFEiRO/LDkxhseHBuPN+V8WpxR+Dv+YBI8mZKG6/4ReOI/mdiM3PKVvrZSMs9oD/7/TNPbxyiBINZme9wQHgsIxbOPwO3KcxuSdd4yD5hk1hOwM5GYhiGi+PsngnEIkAMnyzgS9A4YfSCYcIwae3k98xTFxyhfMXN8bJWIgYOgVaEAhEf9oVcfCl5gYEoIjwMZfGqyw== X-Forefront-Antispam-Report-Untrusted: CIP:40.67.248.234; CTRY:IE; LANG:en; SCL:1; SRV:; IPV:NLI; SFV:NSPM; H:nebula.arm.com; PTR:InfoDomainNonexistent; CAT:NONE; SFS:(13230040)(82310400026)(1800799024)(36860700013)(376014); DIR:OUT; SFP:1101; X-MS-Exchange-Transport-CrossTenantHeadersStamped: AS8PR08MB9768 X-MS-Exchange-SkipListedInternetSender: ip=[2603:10a6:208:fa::17]; domain=AM0PR06CA0076.eurprd06.prod.outlook.com X-MS-Exchange-Transport-CrossTenantHeadersStripped: AMS1EPF0000004B.eurprd04.prod.outlook.com X-MS-PublicTrafficType: Email X-MS-Office365-Filtering-Correlation-Id-Prvs: 916028a8-89c2-47bf-5c79-08dcaa42ffda X-Microsoft-Antispam: BCL:0; ARA:13230040|36860700013|1800799024|376014|82310400026|35042699022|34020700016; X-Microsoft-Antispam-Message-Info: =?utf-8?q?KhKjqqWysrF4B97It4kZcGLzgUBjvDb?= =?utf-8?q?0TvNCX6HowEosivMOe3GVIOAW1Qsb70HA3X6wLcGPWuXN6ofR2CXQxC0kBrXavaWD?= =?utf-8?q?sPMg98afSCuETdeLPVnGvxZwfNbh2bkMCXS/KiCDUuoY+mULbRcBZHpeffZV8bSpW?= =?utf-8?q?rSeI9Wso/I7lxlqPp8b2HviB9INa5WyoTYEyvZauqj7uRffMahg6LzpBHhnBcF8OP?= =?utf-8?q?fyd+A1bvfs3Rp+EPyXX/OjmSjuqxBDa71g/lf5woCpOOIz9OmHYMabHq/t9Djcs4M?= =?utf-8?q?RFrGQoYpy/GWn0VNhqWwcnhXSxlQ5bTl+ngBJ3NI3A32ilFN5m05vkAh2ZQL1+rIg?= =?utf-8?q?n11jz9ZzSrYreG26PmV4jWX9QirUbKjcO4SCh104+xbvCydFDpy5/ReFsGqyKeD4z?= =?utf-8?q?LZrXa61cJSSrS7QOyX525RCjLng6Xs9Ll0HMYXTwJWiFr4ZutJawRzh5NgPjNs31I?= =?utf-8?q?4MC8Hd4sv4nG8Cdhw+LsUXJKW5LlstiLMbxcmGwdIbSUoCvxAiy7HfzqJ+0K9q6Ir?= =?utf-8?q?qQr+2uce2r4Nvl/o8GdKoORTXy0hwHIXDQF9NZsFXpLHANcchzuLim1QuzA9OurIc?= =?utf-8?q?Awku2WZzmYKwQlMCY+XwoTJf83vXFc5gs4pZSZ/9+FvE3B1HIbilAn6tWCUqGf4ds?= =?utf-8?q?+9+s6wZSabVJGw19gKFuF3TRfF7Bja55raJZVtHXyukIHcgcIT8r0uCG9oOepC1ef?= =?utf-8?q?kSs6rvM+iuCK104kDf1hNME6HrpumUwXfjnksmje2hNXO8F4KvxaUG2S7wzLlGqBn?= =?utf-8?q?YnQKj13D98ZM0pEVquVg3RtFHoAFoaGxxuvDP6MMVXlmaAMEX3Gu3z424b1lH588c?= =?utf-8?q?9HUc2XTfA75LMc0uW7p8XGZpOpElmkvGeRuI9YBJxRMo6VAiLrBjToeObTYbQFy5U?= =?utf-8?q?uW93VLNX0I39Pj0U0IRneQeskncivat1KdHy1ohEncWPUtd3dTTuX+pMWgB/bGV9X?= =?utf-8?q?A3nSmxMfuyrnYi+oq66O+FQdUf1MAKtLtsEb4pWLMUwcjrhCyPWKAMH1uaQPdeoMM?= =?utf-8?q?Ts3pRLYQeady7cFELTQNnEKdsH13SE2lIv4bzqrGHLqiILQnkIep0qxjiQ4DX1qr7?= =?utf-8?q?/ypGZbeIAn5pQSBILUIsd/QNUYXcgoCUajR3ILhQ6TZM46iyTg+4lUt2I69hguXEz?= =?utf-8?q?ARt7+m98A3ZP66zKKd7fM5VtMNfrfQ8odrhRoKgoYkNgbi+54bRsol6ZdlAJmzKhJ?= =?utf-8?q?SgVJyZq7B+yZjPyH2xRF6YL1eHWcIYCU06O1hiOexorXUWTVdp9gw7ak8NMzKS+y5?= =?utf-8?q?7fIxdE7twHjd7+ltwJKg8Yz9p1rprTCapCZjEA9YRsXj8hmfmGVbIaPpEpNMgKyI6?= =?utf-8?q?o13qQJURN27G0sVrhn1LK3/k5Tftr2d/nw=3D=3D?= X-Forefront-Antispam-Report: CIP:63.35.35.123; CTRY:IE; LANG:en; SCL:1; SRV:; IPV:CAL; SFV:NSPM; H:64aa7808-outbound-1.mta.getcheckrecipient.com; PTR:ec2-63-35-35-123.eu-west-1.compute.amazonaws.com; CAT:NONE; SFS:(13230040)(36860700013)(1800799024)(376014)(82310400026)(35042699022)(34020700016); DIR:OUT; SFP:1101; X-OriginatorOrg: arm.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 22 Jul 2024 11:40:05.6195 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: 21a47375-d4bb-4d4a-6ca4-08dcaa430832 X-MS-Exchange-CrossTenant-Id: f34e5979-57d9-4aaa-ad4d-b122a662184d X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=f34e5979-57d9-4aaa-ad4d-b122a662184d; Ip=[63.35.35.123]; Helo=[64aa7808-outbound-1.mta.getcheckrecipient.com] X-MS-Exchange-CrossTenant-AuthSource: AMS1EPF0000004B.eurprd04.prod.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: GV2PR08MB9373 X-Spam-Status: No, score=-12.4 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_NONE, TXREP, UNPARSEABLE_RELAY autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org The AArch64 FEAT_FAMINMAX extension is optional in Armv9.2 and mandatory in Armv9.5. It introduces instructions for computing the floating point absolute maximum and minimum of the two vectors element-wise. This patch introduces intrinsics for AdvSIMD faminmax extension in the form of the following builtin-functions: * vamax_f16 * vamaxq_f16 * vamax_f32 * vamaxq_f32 * vamaxq_f64 * vamin_f16 * vaminq_f16 * vamin_f32 * vaminq_f32 * vaminq_f64 gcc/ChangeLog: * config/aarch64/aarch64-builtins.cc (enum aarch64_builtins): New enum values for faminmax builtins. (aarch64_init_faminmax_builtins): New function to declare new builtins. (handle_arm_neon_h): Modified to call aarch64_init_faminmax_builtins. (aarch64_general_check_builtin_call): Modified to check whether +faminmax flag is being used and printing error message if not used. (aarch64_expand_builtin_faminmax): New function to emit instructions of this extension. (aarch64_general_expand_builtin): Modified to call aarch64_expand_builtin_faminmax. * config/aarch64/aarch64-option-extensions.def (AARCH64_OPT_EXTENSION): Introduce new flag for this extension. * config/aarch64/aarch64-simd.md (aarch64_): Introduce instruction pattern for this extension. * config/aarch64/aarch64.h (TARGET_FAMINMAX): Introduce new flag for this extension. * config/aarch64/iterators.md: Introduce new iterators for this extension. * config/arm/types.md: Introduce neon_fp_aminmax attributes. * doc/invoke.texi: Document extension in AArch64 Options. gcc/testsuite/ChangeLog: * gcc.target/aarch64/simd/faminmax.c: New tests for this extension. --- Hi, Regression tested for aarch64-none-linux-gnu and found no regressions. This patch is dependent on the patch series "Extend aarch64_feature_flags to 128 bits" which is under review. This patch should be commited only after that patch series is commited. I am raising this patch now for early feedback. Ok for master? I don't have commit access so can someone please commit on my behalf? Regards, Saurabh --- gcc/config/aarch64/aarch64-builtins.cc | 150 ++++++++++++++++-- .../aarch64/aarch64-option-extensions.def | 2 + gcc/config/aarch64/aarch64-simd.md | 11 ++ gcc/config/aarch64/aarch64.h | 4 + gcc/config/aarch64/iterators.md | 10 ++ gcc/config/arm/types.md | 6 + gcc/doc/invoke.texi | 2 + .../gcc.target/aarch64/simd/faminmax.c | 40 +++++ 8 files changed, 216 insertions(+), 9 deletions(-) create mode 100644 gcc/testsuite/gcc.target/aarch64/simd/faminmax.c diff --git a/gcc/config/aarch64/aarch64-builtins.cc b/gcc/config/aarch64/aarch64-builtins.cc index 30669f8aa18..b3d8cf22eeb 100644 --- a/gcc/config/aarch64/aarch64-builtins.cc +++ b/gcc/config/aarch64/aarch64-builtins.cc @@ -829,6 +829,17 @@ enum aarch64_builtins AARCH64_RBIT, AARCH64_RBITL, AARCH64_RBITLL, + /* FAMINMAX builtins. */ + AARCH64_FAMINMAX_BUILTIN_FAMAX4H, + AARCH64_FAMINMAX_BUILTIN_FAMAX8H, + AARCH64_FAMINMAX_BUILTIN_FAMAX2S, + AARCH64_FAMINMAX_BUILTIN_FAMAX4S, + AARCH64_FAMINMAX_BUILTIN_FAMAX2D, + AARCH64_FAMINMAX_BUILTIN_FAMIN4H, + AARCH64_FAMINMAX_BUILTIN_FAMIN8H, + AARCH64_FAMINMAX_BUILTIN_FAMIN2S, + AARCH64_FAMINMAX_BUILTIN_FAMIN4S, + AARCH64_FAMINMAX_BUILTIN_FAMIN2D, /* System register builtins. */ AARCH64_RSR, AARCH64_RSRP, @@ -1547,6 +1558,66 @@ aarch64_init_simd_builtin_functions (bool called_from_pragma) } } +/* Initialize the absolute maximum/minimum (FAMINMAX) builtins. */ + +typedef struct +{ + const char *name; + unsigned int code; + tree eltype; + machine_mode mode; +} faminmax_builtins_data; + +static void +aarch64_init_faminmax_builtins () +{ + faminmax_builtins_data data[] = { + /* Absolute maximum. */ + {"vamax_f16", AARCH64_FAMINMAX_BUILTIN_FAMAX4H, + aarch64_simd_types[Float16x4_t].eltype, + aarch64_simd_types[Float16x4_t].mode}, + {"vamaxq_f16", AARCH64_FAMINMAX_BUILTIN_FAMAX8H, + aarch64_simd_types[Float16x8_t].eltype, + aarch64_simd_types[Float16x8_t].mode}, + {"vamax_f32", AARCH64_FAMINMAX_BUILTIN_FAMAX2S, + aarch64_simd_types[Float32x2_t].eltype, + aarch64_simd_types[Float32x2_t].mode}, + {"vamaxq_f32", AARCH64_FAMINMAX_BUILTIN_FAMAX4S, + aarch64_simd_types[Float32x4_t].eltype, + aarch64_simd_types[Float32x4_t].mode}, + {"vamaxq_f64", AARCH64_FAMINMAX_BUILTIN_FAMAX2D, + aarch64_simd_types[Float64x2_t].eltype, + aarch64_simd_types[Float64x2_t].mode}, + /* Absolute maximum. */ + {"vamin_f16", AARCH64_FAMINMAX_BUILTIN_FAMIN4H, + aarch64_simd_types[Float16x4_t].eltype, + aarch64_simd_types[Float16x4_t].mode}, + {"vaminq_f16", AARCH64_FAMINMAX_BUILTIN_FAMIN8H, + aarch64_simd_types[Float16x8_t].eltype, + aarch64_simd_types[Float16x8_t].mode}, + {"vamin_f32", AARCH64_FAMINMAX_BUILTIN_FAMIN2S, + aarch64_simd_types[Float32x2_t].eltype, + aarch64_simd_types[Float32x2_t].mode}, + {"vaminq_f32", AARCH64_FAMINMAX_BUILTIN_FAMIN4S, + aarch64_simd_types[Float32x4_t].eltype, + aarch64_simd_types[Float32x4_t].mode}, + {"vaminq_f64", AARCH64_FAMINMAX_BUILTIN_FAMIN2D, + aarch64_simd_types[Float64x2_t].eltype, + aarch64_simd_types[Float64x2_t].mode}, + }; + + for (size_t i = 0; i < ARRAY_SIZE (data); ++i) + { + tree type + = build_vector_type (data[i].eltype, GET_MODE_NUNITS (data[i].mode)); + tree fntype = build_function_type_list (type, type, type, NULL_TREE); + unsigned int code = data[i].code; + const char *name = data[i].name; + aarch64_builtin_decls[code] + = aarch64_general_simulate_builtin (name, fntype, code); + } +} + /* Register the tuple type that contains NUM_VECTORS of the AdvSIMD type indexed by TYPE_INDEX. */ static void @@ -1640,6 +1711,7 @@ handle_arm_neon_h (void) aarch64_init_simd_builtin_functions (true); aarch64_init_simd_intrinsics (); + aarch64_init_faminmax_builtins (); } static void @@ -2197,15 +2269,34 @@ aarch64_general_check_builtin_call (location_t location, vec, case AARCH64_WSR64: case AARCH64_WSRF: case AARCH64_WSRF64: - tree addr = STRIP_NOPS (args[0]); - if (TREE_CODE (TREE_TYPE (addr)) != POINTER_TYPE - || TREE_CODE (addr) != ADDR_EXPR - || TREE_CODE (TREE_OPERAND (addr, 0)) != STRING_CST) - { - error_at (location, "first argument to %qD must be a string literal", - fndecl); - return false; - } + { + tree addr = STRIP_NOPS (args[0]); + if (TREE_CODE (TREE_TYPE (addr)) != POINTER_TYPE + || TREE_CODE (addr) != ADDR_EXPR + || TREE_CODE (TREE_OPERAND (addr, 0)) != STRING_CST) + { + error_at (location, + "first argument to %qD must be a string literal", + fndecl); + return false; + } + } + case AARCH64_FAMINMAX_BUILTIN_FAMAX8H: + case AARCH64_FAMINMAX_BUILTIN_FAMAX2S: + case AARCH64_FAMINMAX_BUILTIN_FAMAX4S: + case AARCH64_FAMINMAX_BUILTIN_FAMAX2D: + case AARCH64_FAMINMAX_BUILTIN_FAMIN4H: + case AARCH64_FAMINMAX_BUILTIN_FAMIN8H: + case AARCH64_FAMINMAX_BUILTIN_FAMIN2S: + case AARCH64_FAMINMAX_BUILTIN_FAMIN4S: + case AARCH64_FAMINMAX_BUILTIN_FAMIN2D: + { + if (!TARGET_FAMINMAX) + { + error_at (location, "need +faminmax flag to call %qD", fndecl); + return false; + } + } } /* Default behavior. */ return true; @@ -3071,6 +3162,44 @@ aarch64_expand_builtin_data_intrinsic (unsigned int fcode, tree exp, rtx target) return ops[0].value; } +static rtx +aarch64_expand_builtin_faminmax (unsigned int fcode, tree exp, rtx target) +{ + machine_mode mode = TYPE_MODE (TREE_TYPE (exp)); + rtx op0 = force_reg (mode, expand_normal (CALL_EXPR_ARG (exp, 0))); + rtx op1 = force_reg (mode, expand_normal (CALL_EXPR_ARG (exp, 1))); + + enum insn_code icode; + if (fcode == AARCH64_FAMINMAX_BUILTIN_FAMAX4H) + icode = CODE_FOR_aarch64_famaxv4hf; + else if (fcode == AARCH64_FAMINMAX_BUILTIN_FAMAX8H) + icode = CODE_FOR_aarch64_famaxv8hf; + else if (fcode == AARCH64_FAMINMAX_BUILTIN_FAMAX2S) + icode = CODE_FOR_aarch64_famaxv2sf; + else if (fcode == AARCH64_FAMINMAX_BUILTIN_FAMAX4S) + icode = CODE_FOR_aarch64_famaxv4sf; + else if (fcode == AARCH64_FAMINMAX_BUILTIN_FAMAX2D) + icode = CODE_FOR_aarch64_famaxv2df; + else if (fcode == AARCH64_FAMINMAX_BUILTIN_FAMIN4H) + icode = CODE_FOR_aarch64_faminv4hf; + else if (fcode == AARCH64_FAMINMAX_BUILTIN_FAMIN8H) + icode = CODE_FOR_aarch64_faminv8hf; + else if (fcode == AARCH64_FAMINMAX_BUILTIN_FAMIN2S) + icode = CODE_FOR_aarch64_faminv2sf; + else if (fcode == AARCH64_FAMINMAX_BUILTIN_FAMIN4S) + icode = CODE_FOR_aarch64_faminv4sf; + else if (fcode == AARCH64_FAMINMAX_BUILTIN_FAMIN2D) + icode = CODE_FOR_aarch64_faminv2df; + else + gcc_unreachable (); + + rtx pat = GEN_FCN (icode) (target, op0, op1); + + emit_insn (pat); + + return target; +} + /* Expand an expression EXP as fpsr or fpcr setter (depending on UNSPEC) using MODE. */ static void @@ -3250,6 +3379,9 @@ aarch64_general_expand_builtin (unsigned int fcode, tree exp, rtx target, if (fcode >= AARCH64_REV16 && fcode <= AARCH64_RBITLL) return aarch64_expand_builtin_data_intrinsic (fcode, exp, target); + if (fcode >= AARCH64_FAMINMAX_BUILTIN_FAMAX4H + && fcode <= AARCH64_FAMINMAX_BUILTIN_FAMIN2D) + return aarch64_expand_builtin_faminmax (fcode, exp, target); gcc_unreachable (); } diff --git a/gcc/config/aarch64/aarch64-option-extensions.def b/gcc/config/aarch64/aarch64-option-extensions.def index 42ec0eec31e..e95bd70893a 100644 --- a/gcc/config/aarch64/aarch64-option-extensions.def +++ b/gcc/config/aarch64/aarch64-option-extensions.def @@ -232,6 +232,8 @@ AARCH64_OPT_EXTENSION("the", THE, (), (), (), "the") AARCH64_OPT_EXTENSION("gcs", GCS, (), (), (), "gcs") +AARCH64_OPT_EXTENSION("faminmax", FAMINMAX, (SIMD), (), (), "faminmax") + #undef AARCH64_OPT_FMV_EXTENSION #undef AARCH64_OPT_EXTENSION #undef AARCH64_FMV_FEATURE diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index bbeee221f37..c769a482312 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -9881,3 +9881,14 @@ "shl\\t%d0, %d1, #16" [(set_attr "type" "neon_shift_imm")] ) + +;; faminmax instruction patterns +(define_insn "aarch64_" + [(set (match_operand:VHSDF 0 "register_operand" "=w") + (unspec:VHSDF [(match_operand:VHSDF 1 "register_operand" "w") + (match_operand:VHSDF 2 "register_operand" "w")] + FAMINMAX_UNS))] + "TARGET_FAMINMAX" + "\t%0., %1., %2." + [(set_attr "type" "neon_fp_aminmax")] +) diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h index 8056c337957..c6773f64745 100644 --- a/gcc/config/aarch64/aarch64.h +++ b/gcc/config/aarch64/aarch64.h @@ -456,6 +456,10 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE ATTRIBUTE_UNUSED enabled through +gcs. */ #define TARGET_GCS AARCH64_HAVE_ISA (GCS) +/* Floating Point Absolute Maximum/Minimum extension instructions are + enabled through +faminmax. */ +#define TARGET_FAMINMAX AARCH64_HAVE_ISA (FAMINMAX) + /* Prefer different predicate registers for the output of a predicated operation over re-using an existing input predicate. */ #define TARGET_SVE_PRED_CLOBBER (TARGET_SVE \ diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md index 95fe8f070f4..297e1b8e9d9 100644 --- a/gcc/config/aarch64/iterators.md +++ b/gcc/config/aarch64/iterators.md @@ -1111,6 +1111,10 @@ UNSPEC_SME_WRITE UNSPEC_SME_WRITE_HOR UNSPEC_SME_WRITE_VER + + ;; Used in faminmax patterns + UNSPEC_FAMAX + UNSPEC_FAMIN ]) ;; ------------------------------------------------------------------ @@ -4457,3 +4461,9 @@ (UNSPECV_SET_FPCR "fpcr")]) (define_int_attr bits_etype [(8 "b") (16 "h") (32 "s") (64 "d")]) + +;; Iterators and attributes for faminmax + +(define_int_iterator FAMINMAX_UNS [UNSPEC_FAMAX UNSPEC_FAMIN]) + +(define_int_attr faminmax [(UNSPEC_FAMAX "famax") (UNSPEC_FAMIN "famin")]) diff --git a/gcc/config/arm/types.md b/gcc/config/arm/types.md index 9527bdb9e87..d8de9dbc9d1 100644 --- a/gcc/config/arm/types.md +++ b/gcc/config/arm/types.md @@ -492,6 +492,8 @@ ; neon_fp_reduc_minmax_s_q ; neon_fp_reduc_minmax_d ; neon_fp_reduc_minmax_d_q +; neon_fp_aminmax +; neon_fp_aminmax_q ; neon_fp_cvt_narrow_s_q ; neon_fp_cvt_narrow_d_q ; neon_fp_cvt_widen_h @@ -1044,6 +1046,8 @@ neon_fp_reduc_minmax_d,\ neon_fp_reduc_minmax_d_q,\ \ + neon_fp_aminmax,\ + neon_fp_aminmax_q,\ neon_fp_cvt_narrow_s_q,\ neon_fp_cvt_narrow_d_q,\ neon_fp_cvt_widen_h,\ @@ -1264,6 +1268,8 @@ neon_fp_reduc_add_d_q, neon_fp_reduc_minmax_s, neon_fp_reduc_minmax_s_q, neon_fp_reduc_minmax_d,\ neon_fp_reduc_minmax_d_q,\ + neon_fp_aminmax, neon_fp_aminmax_q,\ + neon_fp_aminmax, neon_fp_aminmax_q,\ neon_fp_cvt_narrow_s_q, neon_fp_cvt_narrow_d_q,\ neon_fp_cvt_widen_h, neon_fp_cvt_widen_s, neon_fp_to_int_s,\ neon_fp_to_int_s_q, neon_int_to_fp_s, neon_int_to_fp_s_q,\ diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 4850c7379bf..d48516f4f60 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -21777,6 +21777,8 @@ Enable support for Armv9.4-a Guarded Control Stack extension. Enable support for Armv8.9-a/9.4-a translation hardening extension. @item rcpc3 Enable the RCpc3 (Release Consistency) extension. +@item faminmax +Enable the Floating Point Absolute Maximum/Minimum extension. @end table diff --git a/gcc/testsuite/gcc.target/aarch64/simd/faminmax.c b/gcc/testsuite/gcc.target/aarch64/simd/faminmax.c new file mode 100644 index 00000000000..52eafce1b5e --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/simd/faminmax.c @@ -0,0 +1,40 @@ +/* { dg-do assemble} */ +/* { dg-additional-options "-march=armv9-a+faminmax" } */ + +#include "arm_neon.h" + +float16x4_t +test_vamax_f16 (float16x4_t a, float16x4_t b) +{ + return vamax_f16 (a, b); +} + +float16x8_t +test_vamaxq_f16 (float16x8_t a, float16x8_t b) +{ + return vamaxq_f16 (a, b); +} + +float32x2_t +test_vamax_f32 (float32x2_t a, float32x2_t b) +{ + return vamax_f32 (a, b); +} + +float32x4_t +test_vamaxq_f32 (float32x4_t a, float32x4_t b) +{ + return vamaxq_f32 (a, b); +} + +float64x2_t +test_vamaxq_f64 (float64x2_t a, float64x2_t b) +{ + return vamaxq_f64 (a, b); +} + +/* { dg-final { scan-assembler-times {\tfamax\tv[0-9]+.4h, v[0-9]+.4h, v[0-9]+.4h} 1 } } */ +/* { dg-final { scan-assembler-times {\tfamax\tv[0-9]+.8h, v[0-9]+.8h, v[0-9]+.8h} 1 } } */ +/* { dg-final { scan-assembler-times {\tfamax\tv[0-9]+.2s, v[0-9]+.2s, v[0-9]+.2s} 1 } } */ +/* { dg-final { scan-assembler-times {\tfamax\tv[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s} 1 } } */ +/* { dg-final { scan-assembler-times {\tfamax\tv[0-9]+.2d, v[0-9]+.2d, v[0-9]+.2d} 1 } } */