From: Saurabh Jha
Date: Tue, 3 Sep 2024 16:32:59 +0100
Subject: [PATCH v8 2/2] aarch64: Add codegen support for AdvSIMD faminmax

The AArch64 FEAT_FAMINMAX extension is optional from Armv9.2-A and
mandatory from Armv9.5-A. It introduces instructions that compute the
element-wise floating-point absolute maximum and absolute minimum of
two vectors.

This patch adds code generation support for famax and famin in terms of
existing RTL operators.

famax/famin is equivalent to first taking the absolute value of each
operand and then applying smax/smin to the results:

  famax/famin (a, b) = smax/smin (abs (a), abs (b))

This fusion of operators is only possible when -march=armv9-a+faminmax
is passed. -ffast-math is also required; without it, a statement like

  c[i] = __builtin_fmaxf16 (a[i], b[i]);

is expanded to RTL as UNSPEC_FMAXNM instead of smax (and likewise for
smin).

This code generation is only available at -O2 or -O3, as those are the
levels at which auto-vectorization is enabled.

gcc/ChangeLog:

	* config/aarch64/aarch64-simd.md
	(*aarch64_faminmax_fused): Instruction pattern for faminmax codegen.
	* config/aarch64/iterators.md (faminmax_op): New code attribute
	for faminmax codegen.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/simd/faminmax-codegen-no-flag.c: New test.
	* gcc.target/aarch64/simd/faminmax-codegen.c: New test.
---
 gcc/config/aarch64/aarch64-simd.md          |  10 +
 gcc/config/aarch64/iterators.md             |   3 +
 .../aarch64/simd/faminmax-codegen-no-flag.c | 217 ++++++++++++++++++
 .../aarch64/simd/faminmax-codegen.c         | 197 ++++++++++++++++
 4 files changed, 427 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/simd/faminmax-codegen-no-flag.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/simd/faminmax-codegen.c

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 7542c81ed91..8973cade488 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -9921,3 +9921,13 @@
   "<faminmax_uns_op>\t%0.<Vtype>, %1.<Vtype>, %2.<Vtype>"
   [(set_attr "type" "neon_fp_aminmax")]
 )
+
+(define_insn "*aarch64_faminmax_fused"
+  [(set (match_operand:VHSDF 0 "register_operand" "=w")
+	(FMAXMIN:VHSDF
+	  (abs:VHSDF (match_operand:VHSDF 1 "register_operand" "w"))
+	  (abs:VHSDF (match_operand:VHSDF 2 "register_operand" "w"))))]
+  "TARGET_FAMINMAX"
+  "<faminmax_op>\t%0.<Vtype>, %1.<Vtype>, %2.<Vtype>"
+  [(set_attr "type" "neon_fp_aminmax")]
+)
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index 17ac5e073aa..c2fcd18306e 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -4472,3 +4472,6 @@
 
 (define_int_attr faminmax_uns_op
   [(UNSPEC_FAMAX "famax") (UNSPEC_FAMIN "famin")])
+
+(define_code_attr faminmax_op
+  [(smax "famax") (smin "famin")])
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/faminmax-codegen-no-flag.c b/gcc/testsuite/gcc.target/aarch64/simd/faminmax-codegen-no-flag.c
new file mode 100644
index 00000000000..d77f5a5d19f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/simd/faminmax-codegen-no-flag.c
@@ -0,0 +1,217 @@
+/* { dg-do assemble} */
+/* { dg-additional-options "-O3 -ffast-math -march=armv9-a" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "arm_neon.h"
+
+#pragma GCC target "+nosve"
+
+/*
+** test_vamax_f16:
+**	fabs	v1.4h, v1.4h
+**	fabs	v0.4h, v0.4h
+**	fmaxnm	v0.4h, v0.4h, v1.4h
+**	ret
+*/
+float16x4_t
+test_vamax_f16 (float16x4_t a, float16x4_t b)
+{
+  int i;
+  float16x4_t c;
+
+  for (i = 0; i < 4; ++i) {
+    a[i] = __builtin_fabsf16 (a[i]);
+    b[i] = __builtin_fabsf16 (b[i]);
+    c[i] = __builtin_fmaxf16 (a[i], b[i]);
+  }
+  return c;
+}
+
+/*
+** test_vamaxq_f16:
+**	fabs	v1.8h, v1.8h
+**	fabs	v0.8h, v0.8h
+**	fmaxnm	v0.8h, v0.8h, v1.8h
+**	ret
+*/
+float16x8_t
+test_vamaxq_f16 (float16x8_t a, float16x8_t b)
+{
+  int i;
+  float16x8_t c;
+
+  for (i = 0; i < 8; ++i) {
+    a[i] = __builtin_fabsf16 (a[i]);
+    b[i] = __builtin_fabsf16 (b[i]);
+    c[i] = __builtin_fmaxf16 (a[i], b[i]);
+  }
+  return c;
+}
+
+/*
+** test_vamax_f32:
+**	fabs	v1.2s, v1.2s
+**	fabs	v0.2s, v0.2s
+**	fmaxnm	v0.2s, v0.2s, v1.2s
+**	ret
+*/
+float32x2_t
+test_vamax_f32 (float32x2_t a, float32x2_t b)
+{
+  int i;
+  float32x2_t c;
+
+  for (i = 0; i < 2; ++i) {
+    a[i] = __builtin_fabsf32 (a[i]);
+    b[i] = __builtin_fabsf32 (b[i]);
+    c[i] = __builtin_fmaxf32 (a[i], b[i]);
+  }
+  return c;
+}
+
+/*
+** test_vamaxq_f32:
+**	fabs	v1.4s, v1.4s
+**	fabs	v0.4s, v0.4s
+**	fmaxnm	v0.4s, v0.4s, v1.4s
+**	ret
+*/
+float32x4_t
+test_vamaxq_f32 (float32x4_t a, float32x4_t b)
+{
+  int i;
+  float32x4_t c;
+
+  for (i = 0; i < 4; ++i) {
+    a[i] = __builtin_fabsf32 (a[i]);
+    b[i] = __builtin_fabsf32 (b[i]);
+    c[i] = __builtin_fmaxf32 (a[i], b[i]);
+  }
+  return c;
+}
+
+/*
+** test_vamaxq_f64:
+**	fabs	v1.2d, v1.2d
+**	fabs	v0.2d, v0.2d
+**	fmaxnm	v0.2d, v0.2d, v1.2d
+**	ret
+*/
+float64x2_t
+test_vamaxq_f64 (float64x2_t a, float64x2_t b)
+{
+  int i;
+  float64x2_t c;
+
+  for (i = 0; i < 2; ++i) {
+    a[i] = __builtin_fabsf64 (a[i]);
+    b[i] = __builtin_fabsf64 (b[i]);
+    c[i] = __builtin_fmaxf64 (a[i], b[i]);
+  }
+  return c;
+}
+
+/*
+** test_vamin_f16:
+**	fabs	v1.4h, v1.4h
+**	fabs	v0.4h, v0.4h
+**	fminnm	v0.4h, v0.4h, v1.4h
+**	ret
+*/
+float16x4_t
+test_vamin_f16 (float16x4_t a, float16x4_t b)
+{
+  int i;
+  float16x4_t c;
+
+  for (i = 0; i < 4; ++i) {
+    a[i] = __builtin_fabsf16 (a[i]);
+    b[i] = __builtin_fabsf16 (b[i]);
+    c[i] = __builtin_fminf16 (a[i], b[i]);
+  }
+  return c;
+}
+
+/*
+** test_vaminq_f16:
+**	fabs	v1.8h, v1.8h
+**	fabs	v0.8h, v0.8h
+**	fminnm	v0.8h, v0.8h, v1.8h
+**	ret
+*/
+float16x8_t
+test_vaminq_f16 (float16x8_t a, float16x8_t b)
+{
+  int i;
+  float16x8_t c;
+
+  for (i = 0; i < 8; ++i) {
+    a[i] = __builtin_fabsf16 (a[i]);
+    b[i] = __builtin_fabsf16 (b[i]);
+    c[i] = __builtin_fminf16 (a[i], b[i]);
+  }
+  return c;
+}
+
+/*
+** test_vamin_f32:
+**	fabs	v1.2s, v1.2s
+**	fabs	v0.2s, v0.2s
+**	fminnm	v0.2s, v0.2s, v1.2s
+**	ret
+*/
+float32x2_t
+test_vamin_f32 (float32x2_t a, float32x2_t b)
+{
+  int i;
+  float32x2_t c;
+
+  for (i = 0; i < 2; ++i) {
+    a[i] = __builtin_fabsf32 (a[i]);
+    b[i] = __builtin_fabsf32 (b[i]);
+    c[i] = __builtin_fminf32 (a[i], b[i]);
+  }
+  return c;
+}
+
+/*
+** test_vaminq_f32:
+**	fabs	v1.4s, v1.4s
+**	fabs	v0.4s, v0.4s
+**	fminnm	v0.4s, v0.4s, v1.4s
+**	ret
+*/
+float32x4_t
+test_vaminq_f32 (float32x4_t a, float32x4_t b)
+{
+  int i;
+  float32x4_t c;
+
+  for (i = 0; i < 4; ++i) {
+    a[i] = __builtin_fabsf32 (a[i]);
+    b[i] = __builtin_fabsf32 (b[i]);
+    c[i] = __builtin_fminf32 (a[i], b[i]);
+  }
+  return c;
+}
+
+/*
+** test_vaminq_f64:
+**	fabs	v1.2d, v1.2d
+**	fabs	v0.2d, v0.2d
+**	fminnm	v0.2d, v0.2d, v1.2d
+**	ret
+*/
+float64x2_t
+test_vaminq_f64 (float64x2_t a, float64x2_t b)
+{
+  int i;
+  float64x2_t c;
+
+  for (i = 0; i < 2; ++i) {
+    a[i] = __builtin_fabsf64 (a[i]);
+    b[i] = __builtin_fabsf64 (b[i]);
+    c[i] = __builtin_fminf64 (a[i], b[i]);
+  }
+  return c;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/faminmax-codegen.c b/gcc/testsuite/gcc.target/aarch64/simd/faminmax-codegen.c
new file mode 100644
index 00000000000..971386c0bf0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/simd/faminmax-codegen.c
@@ -0,0 +1,197 @@
+/* { dg-do assemble} */
+/* { dg-additional-options "-O2 -ffast-math -march=armv9-a+faminmax" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "arm_neon.h"
+
+#pragma GCC target "+nosve"
+
+/*
+** test_vamax_f16:
+**	famax	v0.4h, v1.4h, v0.4h
+**	ret
+*/
+float16x4_t
+test_vamax_f16 (float16x4_t a, float16x4_t b)
+{
+  int i;
+  float16x4_t c;
+
+  for (i = 0; i < 4; ++i) {
+    a[i] = __builtin_fabsf16 (a[i]);
+    b[i] = __builtin_fabsf16 (b[i]);
+    c[i] = __builtin_fmaxf16 (a[i], b[i]);
+  }
+  return c;
+}
+
+/*
+** test_vamaxq_f16:
+**	famax	v0.8h, v1.8h, v0.8h
+**	ret
+*/
+float16x8_t
+test_vamaxq_f16 (float16x8_t a, float16x8_t b)
+{
+  int i;
+  float16x8_t c;
+
+  for (i = 0; i < 8; ++i) {
+    a[i] = __builtin_fabsf16 (a[i]);
+    b[i] = __builtin_fabsf16 (b[i]);
+    c[i] = __builtin_fmaxf16 (a[i], b[i]);
+  }
+  return c;
+}
+
+/*
+** test_vamax_f32:
+**	famax	v0.2s, v1.2s, v0.2s
+**	ret
+*/
+float32x2_t
+test_vamax_f32 (float32x2_t a, float32x2_t b)
+{
+  int i;
+  float32x2_t c;
+
+  for (i = 0; i < 2; ++i) {
+    a[i] = __builtin_fabsf32 (a[i]);
+    b[i] = __builtin_fabsf32 (b[i]);
+    c[i] = __builtin_fmaxf32 (a[i], b[i]);
+  }
+  return c;
+}
+
+/*
+** test_vamaxq_f32:
+**	famax	v0.4s, v1.4s, v0.4s
+**	ret
+*/
+float32x4_t
+test_vamaxq_f32 (float32x4_t a, float32x4_t b)
+{
+  int i;
+  float32x4_t c;
+
+  for (i = 0; i < 4; ++i) {
+    a[i] = __builtin_fabsf32 (a[i]);
+    b[i] = __builtin_fabsf32 (b[i]);
+    c[i] = __builtin_fmaxf32 (a[i], b[i]);
+  }
+  return c;
+}
+
+/*
+** test_vamaxq_f64:
+**	famax	v0.2d, v1.2d, v0.2d
+**	ret
+*/
+float64x2_t
+test_vamaxq_f64 (float64x2_t a, float64x2_t b)
+{
+  int i;
+  float64x2_t c;
+
+  for (i = 0; i < 2; ++i) {
+    a[i] = __builtin_fabsf64 (a[i]);
+    b[i] = __builtin_fabsf64 (b[i]);
+    c[i] = __builtin_fmaxf64 (a[i], b[i]);
+  }
+  return c;
+}
+
+/*
+** test_vamin_f16:
+**	famin	v0.4h, v1.4h, v0.4h
+**	ret
+*/
+float16x4_t
+test_vamin_f16 (float16x4_t a, float16x4_t b)
+{
+  int i;
+  float16x4_t c;
+
+  for (i = 0; i < 4; ++i) {
+    a[i] = __builtin_fabsf16 (a[i]);
+    b[i] = __builtin_fabsf16 (b[i]);
+    c[i] = __builtin_fminf16 (a[i], b[i]);
+  }
+  return c;
+}
+
+/*
+** test_vaminq_f16:
+**	famin	v0.8h, v1.8h, v0.8h
+**	ret
+*/
+float16x8_t
+test_vaminq_f16 (float16x8_t a, float16x8_t b)
+{
+  int i;
+  float16x8_t c;
+
+  for (i = 0; i < 8; ++i) {
+    a[i] = __builtin_fabsf16 (a[i]);
+    b[i] = __builtin_fabsf16 (b[i]);
+    c[i] = __builtin_fminf16 (a[i], b[i]);
+  }
+  return c;
+}
+
+/*
+** test_vamin_f32:
+**	famin	v0.2s, v1.2s, v0.2s
+**	ret
+*/
+float32x2_t
+test_vamin_f32 (float32x2_t a, float32x2_t b)
+{
+  int i;
+  float32x2_t c;
+
+  for (i = 0; i < 2; ++i) {
+    a[i] = __builtin_fabsf32 (a[i]);
+    b[i] = __builtin_fabsf32 (b[i]);
+    c[i] = __builtin_fminf32 (a[i], b[i]);
+  }
+  return c;
+}
+
+/*
+** test_vaminq_f32:
+**	famin	v0.4s, v1.4s, v0.4s
+**	ret
+*/
+float32x4_t
+test_vaminq_f32 (float32x4_t a, float32x4_t b)
+{
+  int i;
+  float32x4_t c;
+
+  for (i = 0; i < 4; ++i) {
+    a[i] = __builtin_fabsf32 (a[i]);
+    b[i] = __builtin_fabsf32 (b[i]);
+    c[i] = __builtin_fminf32 (a[i], b[i]);
+  }
+  return c;
+}
+
+/*
+** test_vaminq_f64:
+**	famin	v0.2d, v1.2d, v0.2d
+**	ret
+*/
+float64x2_t
+test_vaminq_f64 (float64x2_t a, float64x2_t b)
+{
+  int i;
+  float64x2_t c;
+
+  for (i = 0; i < 2; ++i) {
+    a[i] = __builtin_fabsf64 (a[i]);
+    b[i] = __builtin_fabsf64 (b[i]);
+    c[i] = __builtin_fminf64 (a[i], b[i]);
+  }
+  return c;
+}