From patchwork Tue Aug 20 15:44:58 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Saurabh Jha X-Patchwork-Id: 1974533 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=arm.com header.i=@arm.com header.a=rsa-sha256 header.s=selector1 header.b=od4pqltL; dkim=pass (1024-bit key) header.d=arm.com header.i=@arm.com header.a=rsa-sha256 header.s=selector1 header.b=od4pqltL; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org; envelope-from=gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4WpDMV4Lf5z1yXf for ; Wed, 21 Aug 2024 01:45:58 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 6F74B3842582 for ; Tue, 20 Aug 2024 15:45:56 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from EUR05-VI1-obe.outbound.protection.outlook.com (mail-vi1eur05on2061c.outbound.protection.outlook.com [IPv6:2a01:111:f403:2613::61c]) by sourceware.org (Postfix) with ESMTPS id 111B33844777 for ; Tue, 20 Aug 2024 15:45:27 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 111B33844777 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=arm.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 111B33844777 Authentication-Results: server2.sourceware.org; arc=pass smtp.remote-ip=2a01:111:f403:2613::61c ARC-Seal: i=3; a=rsa-sha256; d=sourceware.org; s=key; t=1724168729; cv=pass; b=bgTD8WM4pLI+7wnvoH3pCzLNFjOWggNB0OOft5lf1q2s5oWFfYV/aRAkEbg3pW+U5G4RKZpr2dQgaKurrsmTTrlQUFcS6P/5NbAPN95SaP7eb53irLaraLPtUCHYl/yHF74W0FYMUeoqs+fr0CNWn0U5jBqoEx2dKVvi5FdOecc= ARC-Message-Signature: i=3; a=rsa-sha256; d=sourceware.org; s=key; t=1724168729; c=relaxed/simple; bh=pk1AcMCIEHxug8jiJqm36a0a4HTAAV4evM+bmDH+pt8=; h=DKIM-Signature:DKIM-Signature:From:To:Subject:Date:Message-ID: MIME-Version; b=aHfHEztRJn8/4Yimcc02Fk8TNkaZqvQPNzcW2lGJIUNjpavO1BiTX9yIkqIfdHEUmHJTs1oLgNf0mAXmedo8KhD62v76pak4syGpBnCWHWg+SoRuwD9zO5VVI9D6XDpYcDApjyaLRldWb2oZIIuWQ6TzWAxFoGgYbAPJFbep8Ec= ARC-Authentication-Results: i=3; server2.sourceware.org ARC-Seal: i=2; a=rsa-sha256; s=arcselector10001; d=microsoft.com; cv=pass; b=f6eVcVyfp1m4xrUGRyvGiVmcWN7h8o+gO8RTUbD0dqmNKWnJ2JcSb+5mvx89Qxxk9ol+KouP2lEnwbgqBOmLOGnH8uLa20E39vPVpXHI0cNlbG137y1N1aTYXVgjDF9tmsIUDau0P5Jk1KqD7M3+hNi4HANbZISK4wrkfkNdT+DYNk9ZWQRqmpL2RAOJvvajFaKjwuxEHy2ca6SRezDwseeUL5AVBIBqAHGUgidckTfSAsVXrJ44iYbI0zqed//7OhB0YbesOfI72/57etAPg2CNHbRJP0aSvGN5Dt7HBC4DGmeipiAsw2pB67AEiME+wAEDFsH0PzXEV9mNIxwChA== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector10001; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=z2UnDa99NjNh6iPMKrtolxH1FbYP2yx0a8RCgRJhYag=; b=EZKHJpoKZ53Dzje38gxXgkWJ4kcTYynZswfbpzQ2sgVH8LoWJrtBh+HbSrHPM6kuJg1DPN68/eh+nTLzUC9ZH9i2hb4gGQBY5d6Rttj9ll1YtkYaV460TAVUpodkXE55RNdlh2Ze2XqG4YhNDdEswpRLBrTbDxMAK/fOzML5kdJ4SAOD86JjtwgeDsFp5d2zM8sggoOjY3hzGt1QStJpslnoyQB0gZCF6bKb7lQW+1xelF2AAEiDzzw5wn9nI4HrWjHfecKwYXHHOzkkmmbKgKo4PvoDrCvXS78Z5wPb01fyzv70LBdB0G2MTuTtp3gm/gisG9HC48wSydL8E086zg== ARC-Authentication-Results: i=2; mx.microsoft.com 1; spf=pass (sender ip is 63.35.35.123) smtp.rcpttodomain=gcc.gnu.org smtp.mailfrom=arm.com; dmarc=pass (p=none sp=none pct=100) action=none header.from=arm.com; dkim=pass (signature was verified) header.d=arm.com; arc=pass (0 oda=1 ltdi=1 spf=[1,1,smtp.mailfrom=arm.com] dmarc=[1,1,header.from=arm.com]) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=arm.com; s=selector1; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=z2UnDa99NjNh6iPMKrtolxH1FbYP2yx0a8RCgRJhYag=; b=od4pqltLY/4atz9rryxRt3Ef6uGMAEJFqq2VDgZC4EZPuxmi56LIIz4kwl8L8kvOQE9vr/XsA+biQzGjLTGTfhNkWEsLMAVS4+vanXAQ5ZEdyJKQaJMKngLEO+qDjddBvdkCGxrdbqLMVqOSjWQlGKsdfcFBy9Nrt3elPGHucGY= Received: from AS4P251CA0003.EURP251.PROD.OUTLOOK.COM (2603:10a6:20b:5d2::9) by GV2PR08MB9277.eurprd08.prod.outlook.com (2603:10a6:150:d5::13) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.7897.13; Tue, 20 Aug 2024 15:45:21 +0000 Received: from AM3PEPF0000A79C.eurprd04.prod.outlook.com (2603:10a6:20b:5d2:cafe::15) by AS4P251CA0003.outlook.office365.com (2603:10a6:20b:5d2::9) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.7875.25 via Frontend Transport; Tue, 20 Aug 2024 15:45:21 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 63.35.35.123) smtp.mailfrom=arm.com; dkim=pass (signature was verified) header.d=arm.com;dmarc=pass action=none header.from=arm.com; Received-SPF: Pass (protection.outlook.com: domain of arm.com designates 63.35.35.123 as permitted sender) receiver=protection.outlook.com; client-ip=63.35.35.123; helo=64aa7808-outbound-1.mta.getcheckrecipient.com; pr=C Received: from 64aa7808-outbound-1.mta.getcheckrecipient.com (63.35.35.123) by AM3PEPF0000A79C.mail.protection.outlook.com (10.167.16.107) with Microsoft SMTP Server (version=TLS1_3, cipher=TLS_AES_256_GCM_SHA384) id 15.20.7897.11 via Frontend Transport; Tue, 20 Aug 2024 15:45:20 +0000 Received: ("Tessian outbound 3d5aa05142a5:v403"); Tue, 20 Aug 2024 15:45:19 +0000 X-CheckRecipientChecked: true X-CR-MTA-CID: 0a6bd2ef77965004 X-CR-MTA-TID: 64aa7808 Received: from L0f561544f51c.2 by 64aa7808-outbound-1.mta.getcheckrecipient.com id 01F8F853-7507-4F5B-821E-9AD81752AAA4.1; Tue, 20 Aug 2024 15:45:08 +0000 Received: from EUR05-AM6-obe.outbound.protection.outlook.com by 64aa7808-outbound-1.mta.getcheckrecipient.com with ESMTPS id L0f561544f51c.2 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384); Tue, 20 Aug 2024 15:45:08 +0000 ARC-Seal: i=1; a=rsa-sha256; s=arcselector10001; d=microsoft.com; cv=none; b=YT2A5ReAWbXmzekVA2ioE4/baQ/FAJcXgjl1Kbim8aJh6Pygi44MM34sGkyoRMWJWT6KO7e8XbGtD4BDdPn3gzPCqVpYCY/x0WWqW0yJIY954WF3f6oHOtmA1i+VZZiu+HfdbgVNoZ3QMANXd8q+3XvIAKfFa8LNEsf29xHw0QtESjcJMgyRJyitEKJCg52aRSgSPFSHAvMyyAICezVgY1W+9SjZeiuvomuNNcctHmO00PtI/6+xqJ3Mlezb2Vb3VXnLOtSpsvgyVtmUUUNo7WO5Gg5rr17SzHZ8xIwR7VPuaT3J7sq2WN+uInshKL6yqRjdPVKOUsF9fEohA04Jsw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector10001; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=z2UnDa99NjNh6iPMKrtolxH1FbYP2yx0a8RCgRJhYag=; b=I2DDNZIDZE3eOpDvFWFEq2dDAFJqgo5wSxIPkb8omaDX/ANYL1E9dI9hEk8urk9zLH/+WoEbkpIbmX8khdgCNPDw07la7DlEemMaPpUnnJNvHI6f+eqTTJU3q9ZhejhqWvPyQT+Cs9CK5mI2E1voydB7EA2GxEUpMl6MgNmJwRrClRdHn6otSK+apYkLhiVLRscaSgtoHnFQYSWTLDrd/eD5U6d6AeIc5Q0OSPQb5hGrxnOde4R1jY9kbH/axtmiBEoBrXXhJ3zyPgxB04JqV3Vu5BEOpRmpqtndmA6miqhFVNNrhxoAjKLxDt6eCqIcrxomE+DYmt7twJN/0XM9Zw== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 40.67.248.234) smtp.rcpttodomain=gcc.gnu.org smtp.mailfrom=arm.com; dmarc=pass (p=none sp=none pct=100) action=none header.from=arm.com; dkim=none (message not signed); arc=none (0) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=arm.com; s=selector1; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=z2UnDa99NjNh6iPMKrtolxH1FbYP2yx0a8RCgRJhYag=; b=od4pqltLY/4atz9rryxRt3Ef6uGMAEJFqq2VDgZC4EZPuxmi56LIIz4kwl8L8kvOQE9vr/XsA+biQzGjLTGTfhNkWEsLMAVS4+vanXAQ5ZEdyJKQaJMKngLEO+qDjddBvdkCGxrdbqLMVqOSjWQlGKsdfcFBy9Nrt3elPGHucGY= Received: from DUZPR01CA0326.eurprd01.prod.exchangelabs.com (2603:10a6:10:4ba::25) by DB9PR08MB7628.eurprd08.prod.outlook.com (2603:10a6:10:30c::14) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.7897.12; Tue, 20 Aug 2024 15:45:05 +0000 Received: from DU6PEPF0000A7E4.eurprd02.prod.outlook.com (2603:10a6:10:4ba:cafe::9) by DUZPR01CA0326.outlook.office365.com (2603:10a6:10:4ba::25) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.7875.25 via Frontend Transport; Tue, 20 Aug 2024 15:45:05 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 40.67.248.234) smtp.mailfrom=arm.com; dkim=none (message not signed) header.d=none;dmarc=pass action=none header.from=arm.com; Received-SPF: Pass (protection.outlook.com: domain of arm.com designates 40.67.248.234 as permitted sender) receiver=protection.outlook.com; client-ip=40.67.248.234; helo=nebula.arm.com; pr=C Received: from nebula.arm.com (40.67.248.234) by DU6PEPF0000A7E4.mail.protection.outlook.com (10.167.8.43) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.20.7897.11 via Frontend Transport; Tue, 20 Aug 2024 15:45:05 +0000 Received: from AZ-NEU-EX03.Arm.com (10.251.24.31) by AZ-NEU-EX04.Arm.com (10.251.24.32) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.39; Tue, 20 Aug 2024 15:45:03 +0000 Received: from e130340.cambridge.arm.com (10.2.80.47) by mail.arm.com (10.251.24.31) with Microsoft SMTP Server id 15.1.2507.39 via Frontend Transport; Tue, 20 Aug 2024 15:45:03 +0000 From: To: CC: , , Saurabh Jha Subject: [PATCH v4 1/2] aarch64: Add AdvSIMD faminmax intrinsics Date: Tue, 20 Aug 2024 16:44:58 +0100 Message-ID: <20240820154459.2881216-2-saurabh.jha@arm.com> X-Mailer: git-send-email 2.43.2 In-Reply-To: <20240820154459.2881216-1-saurabh.jha@arm.com> References: <20240820154459.2881216-1-saurabh.jha@arm.com> MIME-Version: 1.0 X-EOPAttributedMessage: 1 X-MS-TrafficTypeDiagnostic: DU6PEPF0000A7E4:EE_|DB9PR08MB7628:EE_|AM3PEPF0000A79C:EE_|GV2PR08MB9277:EE_ X-MS-Office365-Filtering-Correlation-Id: b1481f33-f683-4b74-5a85-08dcc12f18a5 x-checkrecipientrouted: true NoDisclaimer: true X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam-Untrusted: BCL:0; ARA:13230040|1800799024|82310400026|376014|36860700013; X-Microsoft-Antispam-Message-Info-Original: qz1cD6VzPWMx6cD1soZi0PGY4GUcxoO1C3b3//WW3J7fShwVVLR3I1k7o9LbbLr0Pp9uG7/NExT17SChueCEL3SSYUvC+k0qygV+zP0091sEDPdpsC/3uwZdpLhyd/xnRAcByVmxyZ922OOj3IueuLGXTqi3R9LftwIQd98sHi6sH7o8OsYl/SoPRm094loH2FEeh9mvwoe8stc4T85UDGhGXFaQFHbNVxPZoKS3qbw70eM7Mww8gMB2KUNf9/Ylmc7FYDHYFx3yktCJyfKQrVpdoY+TK02Jr5K4d9n4PcDUPwSTD1xKaY0Z9PYpc4RBRskljWSK9cYWi9QYkC+z0O5NZwlC1O2wrHQn4BrMuW/uGQ5kQJDnL0UEXstOxeLDGPqI7qyF5fRBEzaoroxLTK4sKSrr8zi8DPA1zVUgedV7ErTmNwdDb5NpZS0uVCvwBqGFQ8dOBcfMY3y/Jpzmcj4W4Bt+6evGxTReOCiI00S/MMwk3KFwLEyRZ1QzrSQJZG01NzuSKiMvOXo4GVRoIfsD0UUj0OkRiJtloJTcMnhlMMaSkBH2C4WnF02vuFA5Ay8VJhs+G5S0cogeKDAlR2FIFyRK/Vvvrcp57L25RwI8XL8LnNTw04tH+bpXiOUKMwlSJkIQHNri+BQfZuMKT3M5SC7gdRvpTLRffa8SnKOHL/6/bvW9f5rhekNjJ+AoK4D+yhbzbjGfQzPH89c88qtKNbX90SNei7VNQsvVxt61ysPR1B8PoMiLVNtASBdKdDY77JGgECAsuGGJwdV65V9dSJ0gVA9YFi6wFsukShYsljQiYV+FDU9VdiNetRBIQbJRX0PX3T1cdvxkH5Xd3gK2/pzOOMFK00jmzVruk/bcyOzmqoVdRuqmpWUewLun32UFpLfZSFM+TS+TMyyhJyd58oZYJA6IGpDX2mBt1izl8NE+dGjX4lks+LBbxAizp6oeLYU40Gb00BMVjgcKRlr74XG5bxr9sgh6tlYmdMfdjYX1apyN4u23TlypdQwTWqYnXz1kAFZX4wkMGLSwwwpwWQpQikzbto6BRG12Sey/TsvJ3yn1jplk2idvKi/TCxeBNosOxxmSutaFBPRetr5cX9zqOpfilIDNKRqawqdXW8eh8cyZn4zOTxlucUkB7Ng0cx/N13Sb2Mhh3sL6hvx2TetBRzR4SFjzzS46vUf4LFi45O4GsnOxsBGkxYU7/D2oJOY8qGseDKfAIEA76IDFn7nFdNdH0dCJsj2oRVKqfklNnuNinUWY46bgIm2UtQwOZZi64hROYHIoGmLWK4uibvtvRzEk6meBdL3B8nlSa1L0BE8Dq6hBo2jtUfgLc1CHQKAqoDUBrVQzlBpGzQDTe39OIu6RvVpPKDx8+ApnozCk6G4vVZbLb3afXAs6iscRyH5/vQb8oU8j9OltAA== X-Forefront-Antispam-Report-Untrusted: CIP:40.67.248.234; CTRY:IE; LANG:en; SCL:1; SRV:; IPV:NLI; SFV:NSPM; H:nebula.arm.com; PTR:InfoDomainNonexistent; CAT:NONE; SFS:(13230040)(1800799024)(82310400026)(376014)(36860700013); DIR:OUT; SFP:1101; X-MS-Exchange-Transport-CrossTenantHeadersStamped: DB9PR08MB7628 X-MS-Exchange-SkipListedInternetSender: ip=[2603:10a6:10:4ba::25]; domain=DUZPR01CA0326.eurprd01.prod.exchangelabs.com X-MS-Exchange-Transport-CrossTenantHeadersStripped: AM3PEPF0000A79C.eurprd04.prod.outlook.com X-MS-PublicTrafficType: Email X-MS-Office365-Filtering-Correlation-Id-Prvs: 56c28966-7694-4ff9-7063-08dcc12f0fcb X-Microsoft-Antispam: BCL:0; ARA:13230040|82310400026|1800799024|36860700013|35042699022|376014; X-Microsoft-Antispam-Message-Info: =?utf-8?q?S50AtaU/uXQaZUkiywYZa6dhWCH/F98?= =?utf-8?q?Z2UiPjLEdrQBQ4atC2hZW9P3BTDj7dWlatIYou62z49AdOl6yXoKRMFlygmI51dSM?= =?utf-8?q?vXyisADM0sk/aeRtHSP3gQ9T01rmtzk21c8iSBWGA08XKXhQ3Rhjzd1mXLBnY/9Ad?= =?utf-8?q?j4r5Jn4Fu4nAHpySalIRX+OK0ja5oV99Qx300DzCOEVxpnYp/0qUQrjfnTllyCYq6?= =?utf-8?q?c7gqwTbqghyqPWFFDYQxHLPuQuIl2ILkmnq5KLSqXepC0d31SXLIKzEpuIwiOn2/j?= =?utf-8?q?rsWtBkmot43uu0gqXQ4w1ZxLPIhIHfze3dbvdERbF2okg16UOZcmHdtutiamDrTAA?= =?utf-8?q?7AtqPX9nlkCUCGWZ2WB1HrRCox2d3azg3EhZeH6boZdYvYKVFhYxewLPZLeBN5+8o?= =?utf-8?q?VoKwve7wcO7se4BJY6Qeky6KyAaoAyhzE62CMDdKxY6htmhgbkXvdfIrtorftmm7M?= =?utf-8?q?VdWhoRiWjZagHXXzeuhOIynsZ5UOz/wlGrIY+5tm8KgJ1AEPEjmizcrk9o2mn4c/B?= =?utf-8?q?g0NrqtZcWJ+nVRVGxgeB+zNyC+XBwThv2clPp1JLJEHp/P9/lP3QUCGippGJ1E+oD?= =?utf-8?q?f5/oM8EzkI3/8CqQ05LP9eFwjPM49RqOJ0yPA8a8wXUeFVafL67sVCq511RWTXmEL?= =?utf-8?q?1Dky7y9j8dHvRqY9K5WLAAWZOTOeY4zNHB9IGf931rH4ucY4Kclp3EQkPTT8rlaAa?= =?utf-8?q?y42/7ZB+JE86cTXbruvlrcAM7agX3kjSNE2Nifk/FLlIYlof2y1L6xn9vDlmuHBq+?= =?utf-8?q?ZiLA3w+m8hQyW+nUFJKLZBiTGoNJ0VM9Ng/X2HuJfKH4NjyJVlXnEU8KcNmqx+X2M?= =?utf-8?q?o72lGHY57r6MVDjZlW/7Xyf4SV6aMc7Ve7KxEvVZDmPYdNy9XQQ9SpRdFs7H1BJCH?= =?utf-8?q?psvTW/1aMduoJxThd+DzjcFjSq0xDkjlcHFOrSSeTV6+5UyDS8R5WCWbfbiFiBQTw?= =?utf-8?q?e7+McC7sKGYmrKczebQIeIsiBayNX0uwyRu4hsNWDEyPxqEYT0OFmmCbtvy7g1j7E?= =?utf-8?q?WPEL6jj1wrElKsF19bH922cQrooEHwVbwFKLGvb+K1zvXj6JukGovsAfEwaOwxOtZ?= =?utf-8?q?JPXPTbNMFVk2/fde57BGF36ssDtoKDwCF7eeke2rs2WNcRuwH7/aG3XZXgNFweuKk?= =?utf-8?q?W97AxvN4k0qJbF/HvpJYS9dt6KhsUI+30hNkPEs5P0HrdcH2UvyNSr/FLZDyzL0pC?= =?utf-8?q?NrrdlYg+EtOludVGlOz5Ld1IrwDv3yiLJReBQTOtLgCtioGvYR+p+zGlhSng/kwUX?= =?utf-8?q?V6s0pfNBVWJZtusqIXoLo46OrarsuLG5zl9E1OSBUHyvSOLwZqezB59GUKd7mvL5O?= =?utf-8?q?BO0i2DEopXR+OSr28HLwG7lZi2lJl4GLbQ=3D=3D?= X-Forefront-Antispam-Report: CIP:63.35.35.123; CTRY:IE; LANG:en; SCL:1; SRV:; IPV:CAL; SFV:NSPM; H:64aa7808-outbound-1.mta.getcheckrecipient.com; PTR:ec2-63-35-35-123.eu-west-1.compute.amazonaws.com; CAT:NONE; SFS:(13230040)(82310400026)(1800799024)(36860700013)(35042699022)(376014); DIR:OUT; SFP:1101; X-OriginatorOrg: arm.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 20 Aug 2024 15:45:20.0098 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: b1481f33-f683-4b74-5a85-08dcc12f18a5 X-MS-Exchange-CrossTenant-Id: f34e5979-57d9-4aaa-ad4d-b122a662184d X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=f34e5979-57d9-4aaa-ad4d-b122a662184d; Ip=[63.35.35.123]; Helo=[64aa7808-outbound-1.mta.getcheckrecipient.com] X-MS-Exchange-CrossTenant-AuthSource: AM3PEPF0000A79C.eurprd04.prod.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: GV2PR08MB9277 X-Spam-Status: No, score=-11.9 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FORGED_SPF_HELO, GIT_PATCH_0, KAM_SHORT, SPF_HELO_PASS, SPF_NONE, TXREP, T_SCC_BODY_TEXT_LINE, UNPARSEABLE_RELAY autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org The AArch64 FEAT_FAMINMAX extension is optional from Armv9.2-a and mandatory from Armv9.5-a. It introduces instructions for computing the floating point absolute maximum and minimum of the two vectors element-wise. This patch introduces AdvSIMD faminmax intrinsics. The intrinsics of this extension are implemented as the following builtin functions: * vamax_f16 * vamaxq_f16 * vamax_f32 * vamaxq_f32 * vamaxq_f64 * vamin_f16 * vaminq_f16 * vamin_f32 * vaminq_f32 * vaminq_f64 gcc/ChangeLog: * config/aarch64/aarch64-builtins.cc (enum aarch64_builtins): New enum values for faminmax builtins. (aarch64_init_faminmax_builtins): New function to declare new builtins. (handle_arm_neon_h): Modify to call aarch64_init_faminmax_builtins. (aarch64_general_check_builtin_call): Modify to check whether +faminmax flag is being used and printing error message if not being used. (aarch64_expand_builtin_faminmax): New function to emit instructions of this extension. (aarch64_general_expand_builtin): Modify to call aarch64_expand_builtin_faminmax. * config/aarch64/aarch64-option-extensions.def (AARCH64_OPT_EXTENSION): Introduce new flag for this extension. * config/aarch64/aarch64-simd.md (aarch64_): Instruction pattern for faminmax intrinsics. * config/aarch64/aarch64.h (TARGET_FAMINMAX): Introduce new flag for this extension. * config/aarch64/iterators.md: Introduce new iterators for faminmax intrinsics. * config/arm/types.md: Introduce neon_fp_aminmax attributes. * doc/invoke.texi: Document extension in AArch64 Options. gcc/testsuite/ChangeLog: * gcc.target/aarch64/simd/faminmax-builtins-no-flag.c: New test. * gcc.target/aarch64/simd/faminmax-builtins.c: New test. --- gcc/config/aarch64/aarch64-builtins.cc | 126 ++++++++++++++++++ .../aarch64/aarch64-option-extensions.def | 2 + gcc/config/aarch64/aarch64-simd.md | 11 ++ gcc/config/aarch64/aarch64.h | 4 + gcc/config/aarch64/iterators.md | 9 ++ gcc/config/arm/types.md | 6 + gcc/doc/invoke.texi | 2 + .../aarch64/simd/faminmax-builtins-no-flag.c | 10 ++ .../aarch64/simd/faminmax-builtins.c | 115 ++++++++++++++++ 9 files changed, 285 insertions(+) create mode 100644 gcc/testsuite/gcc.target/aarch64/simd/faminmax-builtins-no-flag.c create mode 100644 gcc/testsuite/gcc.target/aarch64/simd/faminmax-builtins.c diff --git a/gcc/config/aarch64/aarch64-builtins.cc b/gcc/config/aarch64/aarch64-builtins.cc index eb878b933fe..95ec8b6cccc 100644 --- a/gcc/config/aarch64/aarch64-builtins.cc +++ b/gcc/config/aarch64/aarch64-builtins.cc @@ -829,6 +829,17 @@ enum aarch64_builtins AARCH64_RBIT, AARCH64_RBITL, AARCH64_RBITLL, + /* FAMINMAX builtins. */ + AARCH64_FAMINMAX_BUILTIN_FAMAX4H, + AARCH64_FAMINMAX_BUILTIN_FAMAX8H, + AARCH64_FAMINMAX_BUILTIN_FAMAX2S, + AARCH64_FAMINMAX_BUILTIN_FAMAX4S, + AARCH64_FAMINMAX_BUILTIN_FAMAX2D, + AARCH64_FAMINMAX_BUILTIN_FAMIN4H, + AARCH64_FAMINMAX_BUILTIN_FAMIN8H, + AARCH64_FAMINMAX_BUILTIN_FAMIN2S, + AARCH64_FAMINMAX_BUILTIN_FAMIN4S, + AARCH64_FAMINMAX_BUILTIN_FAMIN2D, /* System register builtins. */ AARCH64_RSR, AARCH64_RSRP, @@ -1547,6 +1558,66 @@ aarch64_init_simd_builtin_functions (bool called_from_pragma) } } +/* Initialize the absolute maximum/minimum (FAMINMAX) builtins. */ + +typedef struct +{ + const char *name; + unsigned int code; + tree eltype; + machine_mode mode; +} faminmax_builtins_data; + +static void +aarch64_init_faminmax_builtins () +{ + faminmax_builtins_data data[] = { + /* Absolute maximum. */ + {"vamax_f16", AARCH64_FAMINMAX_BUILTIN_FAMAX4H, + aarch64_simd_types[Float16x4_t].eltype, + aarch64_simd_types[Float16x4_t].mode}, + {"vamaxq_f16", AARCH64_FAMINMAX_BUILTIN_FAMAX8H, + aarch64_simd_types[Float16x8_t].eltype, + aarch64_simd_types[Float16x8_t].mode}, + {"vamax_f32", AARCH64_FAMINMAX_BUILTIN_FAMAX2S, + aarch64_simd_types[Float32x2_t].eltype, + aarch64_simd_types[Float32x2_t].mode}, + {"vamaxq_f32", AARCH64_FAMINMAX_BUILTIN_FAMAX4S, + aarch64_simd_types[Float32x4_t].eltype, + aarch64_simd_types[Float32x4_t].mode}, + {"vamaxq_f64", AARCH64_FAMINMAX_BUILTIN_FAMAX2D, + aarch64_simd_types[Float64x2_t].eltype, + aarch64_simd_types[Float64x2_t].mode}, + /* Absolute minimum. */ + {"vamin_f16", AARCH64_FAMINMAX_BUILTIN_FAMIN4H, + aarch64_simd_types[Float16x4_t].eltype, + aarch64_simd_types[Float16x4_t].mode}, + {"vaminq_f16", AARCH64_FAMINMAX_BUILTIN_FAMIN8H, + aarch64_simd_types[Float16x8_t].eltype, + aarch64_simd_types[Float16x8_t].mode}, + {"vamin_f32", AARCH64_FAMINMAX_BUILTIN_FAMIN2S, + aarch64_simd_types[Float32x2_t].eltype, + aarch64_simd_types[Float32x2_t].mode}, + {"vaminq_f32", AARCH64_FAMINMAX_BUILTIN_FAMIN4S, + aarch64_simd_types[Float32x4_t].eltype, + aarch64_simd_types[Float32x4_t].mode}, + {"vaminq_f64", AARCH64_FAMINMAX_BUILTIN_FAMIN2D, + aarch64_simd_types[Float64x2_t].eltype, + aarch64_simd_types[Float64x2_t].mode}, + }; + + for (size_t i = 0; i < ARRAY_SIZE (data); ++i) + { + tree type + = build_vector_type (data[i].eltype, GET_MODE_NUNITS (data[i].mode)); + tree fntype = build_function_type_list (type, type, type, NULL_TREE); + unsigned int code = data[i].code; + const char *name = data[i].name; + aarch64_builtin_decls[code] + = aarch64_general_simulate_builtin (name, fntype, code); + } +} + /* Register the tuple type that contains NUM_VECTORS of the AdvSIMD type indexed by TYPE_INDEX. */ static void @@ -1640,6 +1711,7 @@ handle_arm_neon_h (void) aarch64_init_simd_builtin_functions (true); aarch64_init_simd_intrinsics (); + aarch64_init_faminmax_builtins (); } static void @@ -2317,6 +2389,19 @@ aarch64_general_check_builtin_call (location_t location, vec, return aarch64_check_required_extensions (location, decl, AARCH64_FL_LS64); + case AARCH64_FAMINMAX_BUILTIN_FAMAX4H: + case AARCH64_FAMINMAX_BUILTIN_FAMAX8H: + case AARCH64_FAMINMAX_BUILTIN_FAMAX2S: + case AARCH64_FAMINMAX_BUILTIN_FAMAX4S: + case AARCH64_FAMINMAX_BUILTIN_FAMAX2D: + case AARCH64_FAMINMAX_BUILTIN_FAMIN4H: + case AARCH64_FAMINMAX_BUILTIN_FAMIN8H: + case AARCH64_FAMINMAX_BUILTIN_FAMIN2S: + case AARCH64_FAMINMAX_BUILTIN_FAMIN4S: + case AARCH64_FAMINMAX_BUILTIN_FAMIN2D: + return aarch64_check_required_extensions (location, decl, + AARCH64_FL_FAMINMAX); + default: break; } @@ -3189,6 +3274,44 @@ aarch64_expand_builtin_data_intrinsic (unsigned int fcode, tree exp, rtx target) return ops[0].value; } +static rtx +aarch64_expand_builtin_faminmax (unsigned int fcode, tree exp, rtx target) +{ + machine_mode mode = TYPE_MODE (TREE_TYPE (exp)); + rtx op0 = force_reg (mode, expand_normal (CALL_EXPR_ARG (exp, 0))); + rtx op1 = force_reg (mode, expand_normal (CALL_EXPR_ARG (exp, 1))); + + enum insn_code icode; + if (fcode == AARCH64_FAMINMAX_BUILTIN_FAMAX4H) + icode = CODE_FOR_aarch64_famaxv4hf; + else if (fcode == AARCH64_FAMINMAX_BUILTIN_FAMAX8H) + icode = CODE_FOR_aarch64_famaxv8hf; + else if (fcode == AARCH64_FAMINMAX_BUILTIN_FAMAX2S) + icode = CODE_FOR_aarch64_famaxv2sf; + else if (fcode == AARCH64_FAMINMAX_BUILTIN_FAMAX4S) + icode = CODE_FOR_aarch64_famaxv4sf; + else if (fcode == AARCH64_FAMINMAX_BUILTIN_FAMAX2D) + icode = CODE_FOR_aarch64_famaxv2df; + else if (fcode == AARCH64_FAMINMAX_BUILTIN_FAMIN4H) + icode = CODE_FOR_aarch64_faminv4hf; + else if (fcode == AARCH64_FAMINMAX_BUILTIN_FAMIN8H) + icode = CODE_FOR_aarch64_faminv8hf; + else if (fcode == AARCH64_FAMINMAX_BUILTIN_FAMIN2S) + icode = CODE_FOR_aarch64_faminv2sf; + else if (fcode == AARCH64_FAMINMAX_BUILTIN_FAMIN4S) + icode = CODE_FOR_aarch64_faminv4sf; + else if (fcode == AARCH64_FAMINMAX_BUILTIN_FAMIN2D) + icode = CODE_FOR_aarch64_faminv2df; + else + gcc_unreachable (); + + rtx pat = GEN_FCN (icode) (target, op0, op1); + + emit_insn (pat); + + return target; +} + /* Expand an expression EXP as fpsr or fpcr setter (depending on UNSPEC) using MODE. */ static void @@ -3368,6 +3491,9 @@ aarch64_general_expand_builtin (unsigned int fcode, tree exp, rtx target, if (fcode >= AARCH64_REV16 && fcode <= AARCH64_RBITLL) return aarch64_expand_builtin_data_intrinsic (fcode, exp, target); + if (fcode >= AARCH64_FAMINMAX_BUILTIN_FAMAX4H + && fcode <= AARCH64_FAMINMAX_BUILTIN_FAMIN2D) + return aarch64_expand_builtin_faminmax (fcode, exp, target); gcc_unreachable (); } diff --git a/gcc/config/aarch64/aarch64-option-extensions.def b/gcc/config/aarch64/aarch64-option-extensions.def index 6998627f377..8279f5a76ea 100644 --- a/gcc/config/aarch64/aarch64-option-extensions.def +++ b/gcc/config/aarch64/aarch64-option-extensions.def @@ -234,6 +234,8 @@ AARCH64_OPT_EXTENSION("gcs", GCS, (), (), (), "gcs") AARCH64_OPT_EXTENSION("fp8", FP8, (SIMD), (), (), "fp8") +AARCH64_OPT_EXTENSION("faminmax", FAMINMAX, (SIMD), (), (), "faminmax") + #undef AARCH64_OPT_FMV_EXTENSION #undef AARCH64_OPT_EXTENSION #undef AARCH64_FMV_FEATURE diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index 23c03a96371..488e27c36a9 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -9910,3 +9910,14 @@ "shl\\t%d0, %d1, #16" [(set_attr "type" "neon_shift_imm")] ) + +;; faminmax +(define_insn "aarch64_" + [(set (match_operand:VHSDF 0 "register_operand" "=w") + (unspec:VHSDF [(match_operand:VHSDF 1 "register_operand" "w") + (match_operand:VHSDF 2 "register_operand" "w")] + FAMINMAX_UNS))] + "TARGET_FAMINMAX" + "\t%0., %1., %2." + [(set_attr "type" "neon_fp_aminmax")] +) diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h index 2dfb999bea5..de14f57071a 100644 --- a/gcc/config/aarch64/aarch64.h +++ b/gcc/config/aarch64/aarch64.h @@ -457,6 +457,10 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE ATTRIBUTE_UNUSED enabled through +gcs. */ #define TARGET_GCS AARCH64_HAVE_ISA (GCS) +/* Floating Point Absolute Maximum/Minimum extension instructions are + enabled through +faminmax. */ +#define TARGET_FAMINMAX AARCH64_HAVE_ISA (FAMINMAX) + /* Prefer different predicate registers for the output of a predicated operation over re-using an existing input predicate. */ #define TARGET_SVE_PRED_CLOBBER (TARGET_SVE \ diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md index 20a318e023b..17ac5e073aa 100644 --- a/gcc/config/aarch64/iterators.md +++ b/gcc/config/aarch64/iterators.md @@ -1057,6 +1057,8 @@ UNSPEC_BFCVTN2 ; Used in aarch64-simd.md. UNSPEC_BFCVT ; Used in aarch64-simd.md. UNSPEC_FCVTXN ; Used in aarch64-simd.md. + UNSPEC_FAMAX ; Used in aarch64-simd.md. + UNSPEC_FAMIN ; Used in aarch64-simd.md. ;; All used in aarch64-sve2.md UNSPEC_FCVTN @@ -4463,3 +4465,10 @@ (UNSPECV_SET_FPCR "fpcr")]) (define_int_attr bits_etype [(8 "b") (16 "h") (32 "s") (64 "d")]) + +;; Iterators and attributes for faminmax + +(define_int_iterator FAMINMAX_UNS [UNSPEC_FAMAX UNSPEC_FAMIN]) + +(define_int_attr faminmax_uns_op + [(UNSPEC_FAMAX "famax") (UNSPEC_FAMIN "famin")]) diff --git a/gcc/config/arm/types.md b/gcc/config/arm/types.md index 9527bdb9e87..d8de9dbc9d1 100644 --- a/gcc/config/arm/types.md +++ b/gcc/config/arm/types.md @@ -492,6 +492,8 @@ ; neon_fp_reduc_minmax_s_q ; neon_fp_reduc_minmax_d ; neon_fp_reduc_minmax_d_q +; neon_fp_aminmax +; neon_fp_aminmax_q ; neon_fp_cvt_narrow_s_q ; neon_fp_cvt_narrow_d_q ; neon_fp_cvt_widen_h @@ -1044,6 +1046,8 @@ neon_fp_reduc_minmax_d,\ neon_fp_reduc_minmax_d_q,\ \ + neon_fp_aminmax,\ + neon_fp_aminmax_q,\ neon_fp_cvt_narrow_s_q,\ neon_fp_cvt_narrow_d_q,\ neon_fp_cvt_widen_h,\ @@ -1264,6 +1268,8 @@ neon_fp_reduc_add_d_q, neon_fp_reduc_minmax_s, neon_fp_reduc_minmax_s_q, neon_fp_reduc_minmax_d,\ neon_fp_reduc_minmax_d_q,\ + neon_fp_aminmax, neon_fp_aminmax_q,\ + neon_fp_aminmax, neon_fp_aminmax_q,\ neon_fp_cvt_narrow_s_q, neon_fp_cvt_narrow_d_q,\ neon_fp_cvt_widen_h, neon_fp_cvt_widen_s, neon_fp_to_int_s,\ neon_fp_to_int_s_q, neon_int_to_fp_s, neon_int_to_fp_s_q,\ diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 32b772d2a8a..2c509f62d98 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -21865,6 +21865,8 @@ Enable support for Armv8.9-a/9.4-a translation hardening extension. Enable the RCpc3 (Release Consistency) extension. @item fp8 Enable the fp8 (8-bit floating point) extension. +@item faminmax +Enable the Floating Point Absolute Maximum/Minimum extension. @end table diff --git a/gcc/testsuite/gcc.target/aarch64/simd/faminmax-builtins-no-flag.c b/gcc/testsuite/gcc.target/aarch64/simd/faminmax-builtins-no-flag.c new file mode 100644 index 00000000000..63ed1508c23 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/simd/faminmax-builtins-no-flag.c @@ -0,0 +1,10 @@ +/* { dg-do assemble} */ +/* { dg-additional-options "-march=armv9-a" } */ + +#include "arm_neon.h" + +void +test (float32x4_t a, float32x4_t b) +{ + vamaxq_f32 (a, b); /* { dg-error {ACLE function 'vamaxq_f32' requires ISA extension 'faminmax'} } */ +} diff --git a/gcc/testsuite/gcc.target/aarch64/simd/faminmax-builtins.c b/gcc/testsuite/gcc.target/aarch64/simd/faminmax-builtins.c new file mode 100644 index 00000000000..7e4f3eba81a --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/simd/faminmax-builtins.c @@ -0,0 +1,115 @@ +/* { dg-do assemble} */ +/* { dg-additional-options "-O3 -march=armv9-a+faminmax" } */ +/* { dg-final { check-function-bodies "**" "" } } */ + +#include "arm_neon.h" + +/* +** test_vamax_f16: +** famax v0.4h, v0.4h, v1.4h +** ret +*/ +float16x4_t +test_vamax_f16 (float16x4_t a, float16x4_t b) +{ + return vamax_f16 (a, b); +} + +/* +** test_vamaxq_f16: +** famax v0.8h, v0.8h, v1.8h +** ret +*/ +float16x8_t +test_vamaxq_f16 (float16x8_t a, float16x8_t b) +{ + return vamaxq_f16 (a, b); +} + +/* +** test_vamax_f32: +** famax v0.2s, v0.2s, v1.2s +** ret +*/ +float32x2_t +test_vamax_f32 (float32x2_t a, float32x2_t b) +{ + return vamax_f32 (a, b); +} + +/* +** test_vamaxq_f32: +** famax v0.4s, v0.4s, v1.4s +** ret +*/ +float32x4_t +test_vamaxq_f32 (float32x4_t a, float32x4_t b) +{ + return vamaxq_f32 (a, b); +} + +/* +** test_vamaxq_f64: +** famax v0.2d, v0.2d, v1.2d +** ret +*/ +float64x2_t +test_vamaxq_f64 (float64x2_t a, float64x2_t b) +{ + return vamaxq_f64 (a, b); +} + +/* +** test_vamin_f16: +** famin v0.4h, v0.4h, v1.4h +** ret +*/ +float16x4_t +test_vamin_f16 (float16x4_t a, float16x4_t b) +{ + return vamin_f16 (a, b); +} + +/* +** test_vaminq_f16: +** famin v0.8h, v0.8h, v1.8h +** ret +*/ +float16x8_t +test_vaminq_f16 (float16x8_t a, float16x8_t b) +{ + return vaminq_f16 (a, b); +} + +/* +** test_vamin_f32: +** famin v0.2s, v0.2s, v1.2s +** ret +*/ +float32x2_t +test_vamin_f32 (float32x2_t a, float32x2_t b) +{ + return vamin_f32 (a, b); +} + +/* +** test_vaminq_f32: +** famin v0.4s, v0.4s, v1.4s +** ret +*/ +float32x4_t +test_vaminq_f32 (float32x4_t a, float32x4_t b) +{ + return vaminq_f32 (a, b); +} + +/* +** test_vaminq_f64: +** famin v0.2d, v0.2d, v1.2d +** ret +*/ +float64x2_t +test_vaminq_f64 (float64x2_t a, float64x2_t b) +{ + return vaminq_f64 (a, b); +}