From patchwork Fri Aug 30 11:16:26 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Saurabh Jha
X-Patchwork-Id: 1978939
CC: Saurabh Jha
Subject: [PATCH v7 2/2] aarch64: Add codegen support for AdvSIMD faminmax
Date: Fri, 30 Aug 2024 12:16:26 +0100
Message-ID: <20240830111626.70300-3-saurabh.jha@arm.com>
X-Mailer: git-send-email 2.43.2
In-Reply-To: <20240830111626.70300-1-saurabh.jha@arm.com>
References: <20240830111626.70300-1-saurabh.jha@arm.com>
List-Id: Gcc-patches mailing list

The AArch64 FEAT_FAMINMAX extension is optional from Armv9.2-a and
mandatory from Armv9.5-a. It introduces instructions for computing the
floating point absolute maximum and minimum of the two vectors
element-wise.

This patch adds code generation support for famax and famin in terms of
existing RTL operators.

famax/famin is equivalent to first taking abs of the operands and then
taking smax/smin on the results of abs.
	famax/famin (a, b) = smax/smin (abs (a), abs (b))

This fusion of operators is only possible when the
-march=armv9-a+faminmax flag is passed. We also need to pass the
-ffast-math flag; if we don't, then a statement like

	c[i] = __builtin_fmaxf16 (a[i], b[i]);

is RTL expanded to UNSPEC_FMAXNM instead of smax (likewise for smin).

This code generation is only available at -O2 or -O3, as that is when
auto-vectorization is enabled.

gcc/ChangeLog:

	* config/aarch64/aarch64-simd.md
	(*aarch64_faminmax_fused): Instruction pattern for faminmax
	codegen.
	* config/aarch64/iterators.md: Attribute for faminmax codegen.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/simd/faminmax-codegen-no-flag.c: New test.
	* gcc.target/aarch64/simd/faminmax-codegen.c: New test.
---
 gcc/config/aarch64/aarch64-simd.md            |  10 +
 gcc/config/aarch64/iterators.md               |   3 +
 .../aarch64/simd/faminmax-codegen-no-flag.c   | 217 ++++++++++++++++++
 .../aarch64/simd/faminmax-codegen.c           | 197 ++++++++++++++++
 4 files changed, 427 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/simd/faminmax-codegen-no-flag.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/simd/faminmax-codegen.c

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 7542c81ed91..8973cade488 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -9921,3 +9921,13 @@
   "<faminmax_uns_op>\t%0.<Vtype>, %1.<Vtype>, %2.<Vtype>"
   [(set_attr "type" "neon_fp_aminmax")]
 )
+
+(define_insn "*aarch64_faminmax_fused"
+  [(set (match_operand:VHSDF 0 "register_operand" "=w")
+        (FMAXMIN:VHSDF
+          (abs:VHSDF (match_operand:VHSDF 1 "register_operand" "w"))
+          (abs:VHSDF (match_operand:VHSDF 2 "register_operand" "w"))))]
+  "TARGET_FAMINMAX"
+  "<faminmax_op>\t%0.<Vtype>, %1.<Vtype>, %2.<Vtype>"
+  [(set_attr "type" "neon_fp_aminmax")]
+)
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index 17ac5e073aa..c2fcd18306e 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -4472,3 +4472,6 @@
 (define_int_attr faminmax_uns_op
   [(UNSPEC_FAMAX "famax")
    (UNSPEC_FAMIN "famin")])
+
+(define_code_attr faminmax_op
+  [(smax "famax") (smin "famin")])
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/faminmax-codegen-no-flag.c b/gcc/testsuite/gcc.target/aarch64/simd/faminmax-codegen-no-flag.c
new file mode 100644
index 00000000000..d77f5a5d19f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/simd/faminmax-codegen-no-flag.c
@@ -0,0 +1,217 @@
+/* { dg-do assemble} */
+/* { dg-additional-options "-O3 -ffast-math -march=armv9-a" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "arm_neon.h"
+
+#pragma GCC target "+nosve"
+
+/*
+** test_vamax_f16:
+**	fabs	v1.4h, v1.4h
+**	fabs	v0.4h, v0.4h
+**	fmaxnm	v0.4h, v0.4h, v1.4h
+**	ret
+*/
+float16x4_t
+test_vamax_f16 (float16x4_t a, float16x4_t b)
+{
+  int i;
+  float16x4_t c;
+
+  for (i = 0; i < 4; ++i) {
+    a[i] = __builtin_fabsf16 (a[i]);
+    b[i] = __builtin_fabsf16 (b[i]);
+    c[i] = __builtin_fmaxf16 (a[i], b[i]);
+  }
+  return c;
+}
+
+/*
+** test_vamaxq_f16:
+**	fabs	v1.8h, v1.8h
+**	fabs	v0.8h, v0.8h
+**	fmaxnm	v0.8h, v0.8h, v1.8h
+**	ret
+*/
+float16x8_t
+test_vamaxq_f16 (float16x8_t a, float16x8_t b)
+{
+  int i;
+  float16x8_t c;
+
+  for (i = 0; i < 8; ++i) {
+    a[i] = __builtin_fabsf16 (a[i]);
+    b[i] = __builtin_fabsf16 (b[i]);
+    c[i] = __builtin_fmaxf16 (a[i], b[i]);
+  }
+  return c;
+}
+
+/*
+** test_vamax_f32:
+**	fabs	v1.2s, v1.2s
+**	fabs	v0.2s, v0.2s
+**	fmaxnm	v0.2s, v0.2s, v1.2s
+**	ret
+*/
+float32x2_t
+test_vamax_f32 (float32x2_t a, float32x2_t b)
+{
+  int i;
+  float32x2_t c;
+
+  for (i = 0; i < 2; ++i) {
+    a[i] = __builtin_fabsf32 (a[i]);
+    b[i] = __builtin_fabsf32 (b[i]);
+    c[i] = __builtin_fmaxf32 (a[i], b[i]);
+  }
+  return c;
+}
+
+/*
+** test_vamaxq_f32:
+**	fabs	v1.4s, v1.4s
+**	fabs	v0.4s, v0.4s
+**	fmaxnm	v0.4s, v0.4s, v1.4s
+**	ret
+*/
+float32x4_t
+test_vamaxq_f32 (float32x4_t a, float32x4_t b)
+{
+  int i;
+  float32x4_t c;
+
+  for (i = 0; i < 4; ++i) {
+    a[i] = __builtin_fabsf32 (a[i]);
+    b[i] = __builtin_fabsf32 (b[i]);
+    c[i] = __builtin_fmaxf32 (a[i], b[i]);
+  }
+  return c;
+}
+
+/*
+** test_vamaxq_f64:
+**	fabs	v1.2d, v1.2d
+**	fabs	v0.2d, v0.2d
+**	fmaxnm	v0.2d, v0.2d, v1.2d
+**	ret
+*/
+float64x2_t
+test_vamaxq_f64 (float64x2_t a, float64x2_t b)
+{
+  int i;
+  float64x2_t c;
+
+  for (i = 0; i < 2; ++i) {
+    a[i] = __builtin_fabsf64 (a[i]);
+    b[i] = __builtin_fabsf64 (b[i]);
+    c[i] = __builtin_fmaxf64 (a[i], b[i]);
+  }
+  return c;
+}
+
+/*
+** test_vamin_f16:
+**	fabs	v1.4h, v1.4h
+**	fabs	v0.4h, v0.4h
+**	fminnm	v0.4h, v0.4h, v1.4h
+**	ret
+*/
+float16x4_t
+test_vamin_f16 (float16x4_t a, float16x4_t b)
+{
+  int i;
+  float16x4_t c;
+
+  for (i = 0; i < 4; ++i) {
+    a[i] = __builtin_fabsf16 (a[i]);
+    b[i] = __builtin_fabsf16 (b[i]);
+    c[i] = __builtin_fminf16 (a[i], b[i]);
+  }
+  return c;
+}
+
+/*
+** test_vaminq_f16:
+**	fabs	v1.8h, v1.8h
+**	fabs	v0.8h, v0.8h
+**	fminnm	v0.8h, v0.8h, v1.8h
+**	ret
+*/
+float16x8_t
+test_vaminq_f16 (float16x8_t a, float16x8_t b)
+{
+  int i;
+  float16x8_t c;
+
+  for (i = 0; i < 8; ++i) {
+    a[i] = __builtin_fabsf16 (a[i]);
+    b[i] = __builtin_fabsf16 (b[i]);
+    c[i] = __builtin_fminf16 (a[i], b[i]);
+  }
+  return c;
+}
+
+/*
+** test_vamin_f32:
+**	fabs	v1.2s, v1.2s
+**	fabs	v0.2s, v0.2s
+**	fminnm	v0.2s, v0.2s, v1.2s
+**	ret
+*/
+float32x2_t
+test_vamin_f32 (float32x2_t a, float32x2_t b)
+{
+  int i;
+  float32x2_t c;
+
+  for (i = 0; i < 2; ++i) {
+    a[i] = __builtin_fabsf32 (a[i]);
+    b[i] = __builtin_fabsf32 (b[i]);
+    c[i] = __builtin_fminf32 (a[i], b[i]);
+  }
+  return c;
+}
+
+/*
+** test_vaminq_f32:
+**	fabs	v1.4s, v1.4s
+**	fabs	v0.4s, v0.4s
+**	fminnm	v0.4s, v0.4s, v1.4s
+**	ret
+*/
+float32x4_t
+test_vaminq_f32 (float32x4_t a, float32x4_t b)
+{
+  int i;
+  float32x4_t c;
+
+  for (i = 0; i < 4; ++i) {
+    a[i] = __builtin_fabsf32 (a[i]);
+    b[i] = __builtin_fabsf32 (b[i]);
+    c[i] = __builtin_fminf32 (a[i], b[i]);
+  }
+  return c;
+}
+
+/*
+** test_vaminq_f64:
+**	fabs	v1.2d, v1.2d
+**	fabs	v0.2d, v0.2d
+**	fminnm	v0.2d, v0.2d, v1.2d
+**	ret
+*/
+float64x2_t
+test_vaminq_f64 (float64x2_t a, float64x2_t b)
+{
+  int i;
+  float64x2_t c;
+
+  for (i = 0; i < 2; ++i) {
+    a[i] = __builtin_fabsf64 (a[i]);
+    b[i] = __builtin_fabsf64 (b[i]);
+    c[i] = __builtin_fminf64 (a[i], b[i]);
+  }
+  return c;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/faminmax-codegen.c b/gcc/testsuite/gcc.target/aarch64/simd/faminmax-codegen.c
new file mode 100644
index 00000000000..971386c0bf0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/simd/faminmax-codegen.c
@@ -0,0 +1,197 @@
+/* { dg-do assemble} */
+/* { dg-additional-options "-O2 -ffast-math -march=armv9-a+faminmax" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "arm_neon.h"
+
+#pragma GCC target "+nosve"
+
+/*
+** test_vamax_f16:
+**	famax	v0.4h, v1.4h, v0.4h
+**	ret
+*/
+float16x4_t
+test_vamax_f16 (float16x4_t a, float16x4_t b)
+{
+  int i;
+  float16x4_t c;
+
+  for (i = 0; i < 4; ++i) {
+    a[i] = __builtin_fabsf16 (a[i]);
+    b[i] = __builtin_fabsf16 (b[i]);
+    c[i] = __builtin_fmaxf16 (a[i], b[i]);
+  }
+  return c;
+}
+
+/*
+** test_vamaxq_f16:
+**	famax	v0.8h, v1.8h, v0.8h
+**	ret
+*/
+float16x8_t
+test_vamaxq_f16 (float16x8_t a, float16x8_t b)
+{
+  int i;
+  float16x8_t c;
+
+  for (i = 0; i < 8; ++i) {
+    a[i] = __builtin_fabsf16 (a[i]);
+    b[i] = __builtin_fabsf16 (b[i]);
+    c[i] = __builtin_fmaxf16 (a[i], b[i]);
+  }
+  return c;
+}
+
+/*
+** test_vamax_f32:
+**	famax	v0.2s, v1.2s, v0.2s
+**	ret
+*/
+float32x2_t
+test_vamax_f32 (float32x2_t a, float32x2_t b)
+{
+  int i;
+  float32x2_t c;
+
+  for (i = 0; i < 2; ++i) {
+    a[i] = __builtin_fabsf32 (a[i]);
+    b[i] = __builtin_fabsf32 (b[i]);
+    c[i] = __builtin_fmaxf32 (a[i], b[i]);
+  }
+  return c;
+}
+
+/*
+** test_vamaxq_f32:
+**	famax	v0.4s, v1.4s, v0.4s
+**	ret
+*/
+float32x4_t
+test_vamaxq_f32 (float32x4_t a, float32x4_t b)
+{
+  int i;
+  float32x4_t c;
+
+  for (i = 0; i < 4; ++i) {
+    a[i] = __builtin_fabsf32 (a[i]);
+    b[i] = __builtin_fabsf32 (b[i]);
+    c[i] = __builtin_fmaxf32 (a[i], b[i]);
+  }
+  return c;
+}
+
+/*
+** test_vamaxq_f64:
+**	famax	v0.2d, v1.2d, v0.2d
+**	ret
+*/
+float64x2_t
+test_vamaxq_f64 (float64x2_t a, float64x2_t b)
+{
+  int i;
+  float64x2_t c;
+
+  for (i = 0; i < 2; ++i) {
+    a[i] = __builtin_fabsf64 (a[i]);
+    b[i] = __builtin_fabsf64 (b[i]);
+    c[i] = __builtin_fmaxf64 (a[i], b[i]);
+  }
+  return c;
+}
+
+/*
+** test_vamin_f16:
+**	famin	v0.4h, v1.4h, v0.4h
+**	ret
+*/
+float16x4_t
+test_vamin_f16 (float16x4_t a, float16x4_t b)
+{
+  int i;
+  float16x4_t c;
+
+  for (i = 0; i < 4; ++i) {
+    a[i] = __builtin_fabsf16 (a[i]);
+    b[i] = __builtin_fabsf16 (b[i]);
+    c[i] = __builtin_fminf16 (a[i], b[i]);
+  }
+  return c;
+}
+
+/*
+** test_vaminq_f16:
+**	famin	v0.8h, v1.8h, v0.8h
+**	ret
+*/
+float16x8_t
+test_vaminq_f16 (float16x8_t a, float16x8_t b)
+{
+  int i;
+  float16x8_t c;
+
+  for (i = 0; i < 8; ++i) {
+    a[i] = __builtin_fabsf16 (a[i]);
+    b[i] = __builtin_fabsf16 (b[i]);
+    c[i] = __builtin_fminf16 (a[i], b[i]);
+  }
+  return c;
+}
+
+/*
+** test_vamin_f32:
+**	famin	v0.2s, v1.2s, v0.2s
+**	ret
+*/
+float32x2_t
+test_vamin_f32 (float32x2_t a, float32x2_t b)
+{
+  int i;
+  float32x2_t c;
+
+  for (i = 0; i < 2; ++i) {
+    a[i] = __builtin_fabsf32 (a[i]);
+    b[i] = __builtin_fabsf32 (b[i]);
+    c[i] = __builtin_fminf32 (a[i], b[i]);
+  }
+  return c;
+}
+
+/*
+** test_vaminq_f32:
+**	famin	v0.4s, v1.4s, v0.4s
+**	ret
+*/
+float32x4_t
+test_vaminq_f32 (float32x4_t a, float32x4_t b)
+{
+  int i;
+  float32x4_t c;
+
+  for (i = 0; i < 4; ++i) {
+    a[i] = __builtin_fabsf32 (a[i]);
+    b[i] = __builtin_fabsf32 (b[i]);
+    c[i] = __builtin_fminf32 (a[i], b[i]);
+  }
+  return c;
+}
+
+/*
+** test_vaminq_f64:
+**	famin	v0.2d, v1.2d, v0.2d
+**	ret
+*/
+float64x2_t
+test_vaminq_f64 (float64x2_t a, float64x2_t b)
+{
+  int i;
+  float64x2_t c;
+
+  for (i = 0; i < 2; ++i) {
+    a[i] = __builtin_fabsf64 (a[i]);
+    b[i] = __builtin_fabsf64 (b[i]);
+    c[i] = __builtin_fminf64 (a[i], b[i]);
+  }
+  return c;
+}