From patchwork Wed Nov 6 10:03:57 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Saurabh Jha X-Patchwork-Id: 2007383 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=arm.com header.i=@arm.com header.a=rsa-sha256 header.s=selector1 header.b=FJqQb/5i; dkim=pass (1024-bit key) header.d=arm.com header.i=@arm.com header.a=rsa-sha256 header.s=selector1 header.b=FJqQb/5i; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org; envelope-from=gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4Xk18G54V2z1xyM for ; Wed, 6 Nov 2024 21:06:54 +1100 (AEDT) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id C52423858C60 for ; Wed, 6 Nov 2024 10:06:52 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from EUR05-AM6-obe.outbound.protection.outlook.com (mail-am6eur05on2062f.outbound.protection.outlook.com [IPv6:2a01:111:f403:2612::62f]) by sourceware.org (Postfix) with ESMTPS id 38DD7385840F for ; Wed, 6 Nov 2024 10:05:34 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 38DD7385840F Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=arm.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 38DD7385840F Authentication-Results: server2.sourceware.org; arc=pass smtp.remote-ip=2a01:111:f403:2612::62f ARC-Seal: i=3; a=rsa-sha256; d=sourceware.org; s=key; t=1730887538; cv=pass; b=rIxKfgOyjR62y1xrFHZf9W7O+qRde2wXzoZ3G97ep4ePxtQnO9uZg++3xfBOy/TKiZ1VnXTnkRpizKqcC8KnHwYwyCAh8sUAVaMBODDJNWK252ki4FC4cogd2pbIdLg72T9XhtZyZphsfIZjYHR1cEScDx5eN1EECnI5EkLa7AA= ARC-Message-Signature: i=3; a=rsa-sha256; d=sourceware.org; s=key; t=1730887538; c=relaxed/simple; bh=rBMyxZsi3AYFYA1CYHKcjSnHKotGPhJZJhuD5z8TL7E=; h=DKIM-Signature:DKIM-Signature:From:To:Subject:Date:Message-ID: MIME-Version; b=xkixs/hg3efYhUhcBbluc0KnnoWEvIPV7WOOo01MrytEEsYMDARL0dZkewpMVPtO7H0XnW/MuHaMkMNQxj2EE36wBXR9KdPXe/OMowVYBDhJscbct0RFvctqQafFNgU3VJcISyzl1hF51hTcoBKg8Nls+WOSaBDnxoRZNt07g1I= ARC-Authentication-Results: i=3; server2.sourceware.org ARC-Seal: i=2; a=rsa-sha256; s=arcselector10001; d=microsoft.com; cv=pass; b=Rb7aFSE7yPGG1BqMlnOmHHT0CgjnCpBKfoXc/LYTWIGWndXTMgzm5XtVCrU4TO9oOTD5rgPS08PTdax2O/FU7h+zk3omG53KxiV5a1GsuP1phIsd7IGXJhXi+C+j9byAls3kp3av+70FCWR7M6lIl0X4hJQ6VHCNDPyxfZeVCQEihuqyL+Od+8oczwdxEsEpUOkhnd3wWD22QLoDGpJkmj5EfVsBDJmsLUYMRAe+64xMw95VyDLzQpyN8J73jJTlgQMy6rhHmXtIPYHZE40W3Pt4LPyKNpWYzTfQudQd7eMzZ6A7zarKODjdq3HVyOZQvY1/oFJtZgxuh4Jp0YXrWg== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector10001; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=HxXDujIHJSQnV/X9n3L9Zx/VxKD0ts9NXSWmo/s7vLo=; b=FlNV2f85YRDvj1QQ8h+EVXkx9zoey8eu05h5ISMuzntgibmjjtjASJG3jgPYz+Xm/x6zQi0VLHxZFYV+4q4fbMaEgQ/v8afSk/pLw7266eflrok1ArZqx2kooZ+jQ0VtxfOVQxsiXI2lNASfCPZ5dreRS6LsPYLTzLb/n374SKpLKfUohEz/NUaNI8Aso0dDOE+dwPOvUBsDZy3KMFypjAbDSOJjj3ma0bWPLRcMnsqgjDJPKwoDsLmt0mabt+y8VApzTQoSq6SnbTSe+K9oDK3CLY/S90jLe57Sl4WaaO71vY0kdvsqRK28tM9evitVbgwFyEQRcHSeTOCGOywLyw== ARC-Authentication-Results: i=2; mx.microsoft.com 1; spf=pass (sender ip is 63.35.35.123) smtp.rcpttodomain=gcc.gnu.org smtp.mailfrom=arm.com; dmarc=pass (p=none sp=none pct=100) action=none header.from=arm.com; dkim=pass (signature was verified) header.d=arm.com; arc=pass (0 oda=1 ltdi=1 spf=[1,1,smtp.mailfrom=arm.com] dmarc=[1,1,header.from=arm.com]) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=arm.com; s=selector1; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=HxXDujIHJSQnV/X9n3L9Zx/VxKD0ts9NXSWmo/s7vLo=; b=FJqQb/5ifm3sVb+I3jZRq0NnMjbt3xLWlARVIDLpAVI0lUW7M9CCv8CkFvXNKjWft8dzblX9xKGuKUGvgNBFkNf3e/ry9FpnKDVLdu5Yox7yh3gZaR0hocHIdmiITjiTqBU6/xaBZTbc5xfRasAOnZCfLVmXZtJ7rdG+VHZ43f0= Received: from DUZPR01CA0231.eurprd01.prod.exchangelabs.com (2603:10a6:10:4b4::12) by AS8PR08MB7791.eurprd08.prod.outlook.com (2603:10a6:20b:52d::6) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.8114.30; Wed, 6 Nov 2024 10:05:30 +0000 Received: from DU6PEPF0000B620.eurprd02.prod.outlook.com (2603:10a6:10:4b4:cafe::fd) by DUZPR01CA0231.outlook.office365.com (2603:10a6:10:4b4::12) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.8137.19 via Frontend Transport; Wed, 6 Nov 2024 10:05:30 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 63.35.35.123) smtp.mailfrom=arm.com; dkim=pass (signature was verified) header.d=arm.com;dmarc=pass action=none header.from=arm.com; Received-SPF: Pass (protection.outlook.com: domain of arm.com designates 63.35.35.123 as permitted sender) receiver=protection.outlook.com; client-ip=63.35.35.123; helo=64aa7808-outbound-1.mta.getcheckrecipient.com; pr=C Received: from 64aa7808-outbound-1.mta.getcheckrecipient.com (63.35.35.123) by DU6PEPF0000B620.mail.protection.outlook.com (10.167.8.136) with Microsoft SMTP Server (version=TLS1_3, cipher=TLS_AES_256_GCM_SHA384) id 15.20.8137.17 via Frontend Transport; Wed, 6 Nov 2024 10:05:30 +0000 Received: ("Tessian outbound 0fe1877cedb7:v490"); Wed, 06 Nov 2024 10:05:30 +0000 X-CheckRecipientChecked: true X-CR-MTA-CID: 51be4e511a062eb8 X-TessianGatewayMetadata: 0EbSTijtfAHYmB2i7xtrpPlAYhrY4gG4swYRIPxxwuQmu+pZTGr1e7YQkv809gnFQNNabaqR5XyXWh0xZKF4mhHscfcb23PJ5iVfCWGo8WsmO2tPcZO1wlq7OeTCT70kqXl8R2+vjxAiFz4hzm2vyBWhkousDYWPb/Y/UQztwXM= X-CR-MTA-TID: 64aa7808 Received: from L4378f6d6b65f.2 by 64aa7808-outbound-1.mta.getcheckrecipient.com id 44CAFEB2-28AB-45FD-BDC7-5A70FC448F7D.1; Wed, 06 Nov 2024 10:04:17 +0000 Received: from EUR03-VI1-obe.outbound.protection.outlook.com by 64aa7808-outbound-1.mta.getcheckrecipient.com with ESMTPS id L4378f6d6b65f.2 (version=TLSv1.3 cipher=TLS_AES_256_GCM_SHA384); Wed, 06 Nov 2024 10:04:17 +0000 ARC-Seal: i=1; a=rsa-sha256; s=arcselector10001; d=microsoft.com; cv=none; b=jU2iUbD7IGU9KNiSDPGt8E27M+TzlOLYKbMtLYdteBzO4728/PjTJ8I2ZiXhyJiTLhLc3AiQNadJxYYQEpx1VLZDGMbEXxMfcO9c6KPYAAduwLcAvy4KZpJtir76R96yIgV+9yhi0QPvG9N1lcc8WoQCYGkfd/q25sl8bJeSZuM+TNI9WlW0oj8pHFpOfyUtIkGGYzJ128YHY6MOZxdWwUtKltDusbpjnZMbBXcSk9I/SM/17jsXQRlRO8mYGOxiI2FVk2M4c29o30qGWV4VCfM6JGNHppmTl+Ec3Rgf4EVjAMWcXDHkIdFXCWNvssBEtvYaMQQR3Sml930HP4dp7A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector10001; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=HxXDujIHJSQnV/X9n3L9Zx/VxKD0ts9NXSWmo/s7vLo=; b=AN4R9O92bj/2SMIUareCpZ+K53HOLYshEBPAwp6yEP/cYYDhoHI62pFGa3TzCHffgX8H9n13+ad0r0UNbQsWacqO4YynQQ5/N1Z0dw7g0h/EpVc1r0+LsmJwesGFONRDrF6DYcN4QRxB7TkZpGsAdfgRDrEZZ2L69XBFl1LB0rPKSW6aItCEloZ1FCZ5UMIKwz83+C5OBjSLkvy7spNKjBEens82UrgdlP6S0cZ16HdH5dK4K47NudzSXa/DJBD4zIaZqsXXncilSuKwqJE7rLXKBCB109cDtHaO17SF1fCNz6ijz4KYm+zK/7NcoRLyQISWlbmtqeZcMJfQg3FgjA== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 40.67.248.234) smtp.rcpttodomain=gcc.gnu.org smtp.mailfrom=arm.com; dmarc=pass (p=none sp=none pct=100) action=none header.from=arm.com; dkim=none (message not signed); arc=none (0) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=arm.com; s=selector1; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=HxXDujIHJSQnV/X9n3L9Zx/VxKD0ts9NXSWmo/s7vLo=; b=FJqQb/5ifm3sVb+I3jZRq0NnMjbt3xLWlARVIDLpAVI0lUW7M9CCv8CkFvXNKjWft8dzblX9xKGuKUGvgNBFkNf3e/ry9FpnKDVLdu5Yox7yh3gZaR0hocHIdmiITjiTqBU6/xaBZTbc5xfRasAOnZCfLVmXZtJ7rdG+VHZ43f0= Received: from AS9PR06CA0373.eurprd06.prod.outlook.com (2603:10a6:20b:460::8) by AM9PR08MB6659.eurprd08.prod.outlook.com (2603:10a6:20b:30a::20) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.8114.31; Wed, 6 Nov 2024 10:04:14 +0000 Received: from AMS0EPF000001A9.eurprd05.prod.outlook.com (2603:10a6:20b:460:cafe::89) by AS9PR06CA0373.outlook.office365.com (2603:10a6:20b:460::8) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.8137.18 via Frontend Transport; Wed, 6 Nov 2024 10:04:14 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 40.67.248.234) smtp.mailfrom=arm.com; dkim=none (message not signed) header.d=none;dmarc=pass action=none header.from=arm.com; Received-SPF: Pass (protection.outlook.com: domain of arm.com designates 40.67.248.234 as permitted sender) receiver=protection.outlook.com; client-ip=40.67.248.234; helo=nebula.arm.com; pr=C Received: from nebula.arm.com (40.67.248.234) by AMS0EPF000001A9.mail.protection.outlook.com (10.167.16.149) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.20.8137.17 via Frontend Transport; Wed, 6 Nov 2024 10:04:14 +0000 Received: from AZ-NEU-EX05.Arm.com (10.240.25.133) by AZ-NEU-EX03.Arm.com (10.251.24.31) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.39; Wed, 6 Nov 2024 10:04:12 +0000 Received: from AZ-NEU-EX03.Arm.com (10.251.24.31) by AZ-NEU-EX05.Arm.com (10.240.25.133) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.39; Wed, 6 Nov 2024 10:04:10 +0000 Received: from e130340.cambridge.arm.com (10.2.80.47) by mail.arm.com (10.251.24.31) with Microsoft SMTP Server id 15.1.2507.39 via Frontend Transport; Wed, 6 Nov 2024 10:04:10 +0000 From: To: CC: , , Saurabh Jha Subject: [PATCH 2/3] aarch64: Add support for fp8dot2 and fp8dot4 Date: Wed, 6 Nov 2024 10:03:57 +0000 Message-ID: <20241106100358.3622028-3-saurabh.jha@arm.com> X-Mailer: git-send-email 2.46.1 In-Reply-To: <20241106100358.3622028-1-saurabh.jha@arm.com> References: <20241106100358.3622028-1-saurabh.jha@arm.com> MIME-Version: 1.0 X-EOPAttributedMessage: 1 X-MS-TrafficTypeDiagnostic: AMS0EPF000001A9:EE_|AM9PR08MB6659:EE_|DU6PEPF0000B620:EE_|AS8PR08MB7791:EE_ X-MS-Office365-Filtering-Correlation-Id: a9dd909d-2b48-4023-f4fe-08dcfe4a8bb3 x-checkrecipientrouted: true NoDisclaimer: true X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam-Untrusted: BCL:0; ARA:13230040|376014|36860700013|1800799024|82310400026; X-Microsoft-Antispam-Message-Info-Original: tw27Nf6GPLfUmF+m4Osyj8DY/I6NFvcVPDH6Qi/9VWYVKEHzCRHpZKquHb6O1L0FW/P9dVPfaEOTeSQO56oUczZG24u9v/ME4Avzr4hwVJpciN2r6OklNLgDiXU3pt2UQt/i8n3YOs3Tc6XEfH0/1ZX34MxkKt8K6S4GNCqWr+jOKDGrJNzuPR3xpyLu9oPO5zpQylexJ/PBFKLgrQhjcmpDeisU+j4oaKFLZJRkXcfo6PZXz0l0HBCiD86hjiJ+5+xzhmzYvvJKHGP5Gax+hoOCZVjl7vLXH2KYLUzUK8MzuF2ZyAspMo3hCcGge+yrs0Fa8qT30dY1V6eYQEZQbQeOOmc8mJmpwzyBuWl+gzPPWx7Ytm7J2I05jMGyzL2Efoqo/jHgK8lcEv3FsGDkhUS8TwMzB34Ntgfon/gZ42szEaAlHd+PR0PWuMtmQ1NWG8Gu4wRLvEE7wLI5VT6cITiqKlhE1F7LjRIf4yTX0rj8eOhSF8c8tpY6AzXvF8rL0ZDiXNyjAhgp1kUFFtOtnZHADAuUR+TXZQ4esPiI8Weufqou5RXpeAyTJwS4kZBRfSwmlR3D5w+hmwclv9Fi5+Pqcti6A5QsGeUo6A/OUi2lpuo/jautcD9eRkYY4POy5kKmhEFhAlZauuyijK+ADyGLwdwpk5cdChDaKTxeAYqFuRx+9utf3jnsAb0aSBcJy6BrD4myImGmtmhuHsY5c8sVEsjkGvEj0LC8Mjy/fQQmoy2jyWMbTH1sLuBgJ1BufQWQ8+0m6z93AtljFPL61zAVM37v23F/tDqxCHmGqEzwbYkd+c7b3HPIdBWfubSx63CaT5LlWK2uuP9C54ac7DZxk00HDtPatGFXQ1mUJctyZg4Up+A92TjPhOh89w8D5G6DukBse+rz5nS9AfslF8qjJIjOCrSvnVjLSulfYhQMT37nrC+67T0JJ+/zLQmP2tkt2FkFqKOsOpu47vwIZ9Ji5AI4b0Yj2k9jcWqJjri/nCOH7liBg5hHeq2QaWN7ci4yKoe2XsbeRlV2VIG+quXa02f4MHfboeYnm0OWjM3AZHFXpvLG0ORKY41HXyFn1/rt8LVQ6oAmGSnxSUtBqVa7W28ToN2idO7byDw9nTA6QRMOLvesoqd1JYn/UKtLIsVc+4bNW79miEid0A2Y7t9cX1lNguIC95LT6RnbUQzM/NWYe8uxSZSVKxcNCLTCtmVX8PUEmfpz/fv6Y07QDMBlawmYBtvrJtoVlPFupsKtYQr+OnwVhHzMtsTbNIk6cW5yIvomG0TVenK9C8Zhl4H/+xlLqhXWQfI3cOZMLY7jhG8dRadkz7JC3CdSPXLJJXVKqcIaRKDaFezbbbEY68/1hJQhJizfFk2lm9lBC5E= X-Forefront-Antispam-Report-Untrusted: CIP:40.67.248.234; CTRY:IE; LANG:en; SCL:1; SRV:; IPV:NLI; SFV:NSPM; H:nebula.arm.com; PTR:InfoDomainNonexistent; CAT:NONE; SFS:(13230040)(376014)(36860700013)(1800799024)(82310400026); DIR:OUT; SFP:1101; X-MS-Exchange-Transport-CrossTenantHeadersStamped: AM9PR08MB6659 X-MS-Exchange-SkipListedInternetSender: ip=[2603:10a6:20b:460::8]; domain=AS9PR06CA0373.eurprd06.prod.outlook.com X-MS-Exchange-Transport-CrossTenantHeadersStripped: DU6PEPF0000B620.eurprd02.prod.outlook.com X-MS-PublicTrafficType: Email X-MS-Office365-Filtering-Correlation-Id-Prvs: cb301db0-2460-4dac-0634-08dcfe4a5e56 X-Microsoft-Antispam: BCL:0; ARA:13230040|35042699022|1800799024|14060799003|36860700013|82310400026|376014; X-Microsoft-Antispam-Message-Info: =?utf-8?q?BCp696V7vVX/o7gWfEZoQDWFWhCox0T?= =?utf-8?q?POw+oF/Y74K7K86/R42RznBN3CcoxnDOg2qK+pnmbUVyq4yfBoHUujriy+o6LFCCm?= =?utf-8?q?stAkjJnXRJWN1kRG+g1GKBpKNn/yOHL5c2UOH5z6hc5kaHN4D5L9eUUWcj1e7Rthh?= =?utf-8?q?7Th+pPUM4UDp2an2QNI9fKizJM6hFmeAF3pykyNiTBiqJ7i/GyHp88keuT4AiklGH?= =?utf-8?q?gSOunvE0D2aJvXeQQ3alSth4XsLpUVqMSP5a+nNuAPxkD/APgkLvS7pSWabG73MAH?= =?utf-8?q?VQFHpqGQFH9vlwVt8AjXumfQ55lrV5IqqSV5uadHQr+FgXx2ibmifcz1DRAeQ2aKw?= =?utf-8?q?l8j9RngbnbJgBdp5w6aFqQLUJiAkjXNmjUcPhBxCLuvM1APgUpaGFPNrCo+aOs+GL?= =?utf-8?q?eGSg7nai8hWDbvXN+c4LadhZQGk6LK4hIDCy3Ppmvl44BySN1T6tD629U0h+pkxxa?= =?utf-8?q?RCJOe+wkHwiCx2w/Rs7zu69CcOT3+QCWZDmiJWhAA89a3fPFJL2z2AX+ipAT9I0Eb?= =?utf-8?q?A+2uLea2gqfmcI7vsYLGhD6t5z3mRs6OC/6n6p6KlQGQ6FS2q3Ahx+PImL0OC6pYf?= =?utf-8?q?ZxHxxWc7op/hj72IKsO2JTDE+87h2fD2qXYFMF3pKwt260OOGJlaJmf93T0vcR3PW?= =?utf-8?q?F3if5bmxG5rwSO70iuLczWEWnW3eUwdNmId2iMmv8Z3AEG3AoaHHj6CQ63GT1YIvb?= =?utf-8?q?0LhQmGR3Tsy6s5RSujylGpM0iUWxh7ohTF7c/uzTLtv9pfsfGdPmTgG+9mZRIKzQ4?= =?utf-8?q?DrrW4kKiugoNNs43t5cM5ueDpXy/G3rajYly2rj1neps1s1yMwbG4XikpMdAD7P0b?= =?utf-8?q?eBWnR6a2F2ZYHt+/ko2iQumGIT9VBhhdDgLKBfsVClDoqWPG4nmuNcYs53CFSTYgo?= =?utf-8?q?XVBzEy+cb3/PfouTtpiOWSt8aUDhfHIJ+JR6v0XrAqH2HHJ3c/xiTLxAuKDbtnK3C?= =?utf-8?q?ff58j6pBp8hiuoBqxQJRvJ2kg2XHXdO5sgttwdq8oclxvRqM3qGVvyhK4pSZufcdq?= =?utf-8?q?3REIsizQtSr22rO9v6V4SSnMX9vAjQnm9DZcFyWjzW2HkW+ifY0y1D5wChBE2bTqn?= =?utf-8?q?lfqwgRy/KgS7g0GmfD7fh3d2Ln+vy/sb4qlOJNGqT8lHxOtKHnoP56Qf1yX/4uqQv?= =?utf-8?q?K7zx9o63+tAbZyca+fGm4Yeh8Rc5ws8LUEagPfIJN9BmhTO2uBkQCRPhLUz69xFPg?= =?utf-8?q?nOuB5QzeS+hnFraHD4s/kvmfa6lXCZXpcnpu0O1N26wErUSojAismVmn2TeGThb0f?= =?utf-8?q?tIGUpBjzaSws2ficAtiChmfdwqw1firOkgs73qYYJYS6g3CLryHaUXMc=3D?= X-Forefront-Antispam-Report: CIP:63.35.35.123; CTRY:IE; LANG:en; SCL:1; SRV:; IPV:CAL; SFV:NSPM; H:64aa7808-outbound-1.mta.getcheckrecipient.com; PTR:ec2-63-35-35-123.eu-west-1.compute.amazonaws.com; CAT:NONE; SFS:(13230040)(35042699022)(1800799024)(14060799003)(36860700013)(82310400026)(376014); DIR:OUT; SFP:1101; X-OriginatorOrg: arm.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 06 Nov 2024 10:05:30.4473 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: a9dd909d-2b48-4023-f4fe-08dcfe4a8bb3 X-MS-Exchange-CrossTenant-Id: f34e5979-57d9-4aaa-ad4d-b122a662184d X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=f34e5979-57d9-4aaa-ad4d-b122a662184d; Ip=[63.35.35.123]; Helo=[64aa7808-outbound-1.mta.getcheckrecipient.com] X-MS-Exchange-CrossTenant-AuthSource: DU6PEPF0000B620.eurprd02.prod.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: AS8PR08MB7791 X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org The AArch64 FEAT_FP8DOT2 and FEAT_FP8DOT4 extension introduces instructions for dot product of vectors. This patch introduces the following intrinsics: 1. vdot{q}_{fp16|fp32}_mf8_fpm. 2. vdot{q}_lane{q}_{fp16|fp32}_mf8_fpm. It introduces two flags: fp8dot2 and fp8dot4. We had to add space for another type in aarch64_pragma_builtins_data struct. The macros were updated to reflect that. We added a new aarch64_builtin_signature variant, ternary_fpm_lane, and added support it in declaration of types and expansion to RTL. We added a new namespace, function_checker, to implement range checks for functions defined using the new pragma approach. The old intrinsic range checks should remain unaffected. All the new AdvSIMD intrinsics we define that need lane checks should be using the function in this namespace to implement the checks. gcc/ChangeLog: * config/aarch64/aarch64-builtins.cc (ENTRY): Change to handle extra type. (enum class): Added new variant. (struct aarch64_pragma_builtins_data): Add support for another type. (aarch64_fntype): Handle new signature. (require_integer_constant): New function to check whether the operand is an integer constant. (require_immediate_range): New function to validate index ranges. (check_simd_lane_bounds): New function to validate index operands. (aarch64_expand_pragma_builtin): Handle new signature. * config/aarch64/aarch64-c.cc (aarch64_update_cpp_builtins): New flags. * config/aarch64/aarch64-option-extensions.def (AARCH64_OPT_EXTENSION): New flags. * config/aarch64/aarch64-simd-pragma-builtins.def (ENTRY_BINARY): Change to handle extra type. (ENTRY_BINARY_FPM): Change to handle extra type. (ENTRY_TERNARY_FPM_LANE): Macro to declare fpm ternary with lane intrinsics. (ENTRY_VDOT_FPM): Change to handle extra type. (ENTRY_UNARY_FPM): Change to handle extra type. * config/aarch64/aarch64-simd.md: New instruction pattern for fp8dot2 and fp8dot4 instructions. * config/aarch64/aarch64.h (TARGET_FP8DOT2): New flag for fp8dot2 instructions. (TARGET_FP8DOT4): New flag for fp8dot4 instructions. * config/aarch64/iterators.md: New attributes and iterators. * doc/invoke.texi: New flag for fp8dot2 and fp8dot4 instructions. gcc/testsuite/ChangeLog: * gcc.target/aarch64/simd/vdot2_fpmdot.c: New test. * gcc.target/aarch64/simd/vdot4_fpmdot.c: New test. --- Is there a better to validate indices? --- gcc/config/aarch64/aarch64-builtins.cc | 138 +++++++++++++++++- gcc/config/aarch64/aarch64-c.cc | 4 + .../aarch64/aarch64-option-extensions.def | 4 + .../aarch64/aarch64-simd-pragma-builtins.def | 39 ++++- gcc/config/aarch64/aarch64-simd.md | 58 ++++++++ gcc/config/aarch64/aarch64.h | 6 + gcc/config/aarch64/iterators.md | 20 ++- gcc/doc/invoke.texi | 4 + .../gcc.target/aarch64/simd/vdot2_fpmdot.c | 77 ++++++++++ .../gcc.target/aarch64/simd/vdot4_fpmdot.c | 77 ++++++++++ 10 files changed, 415 insertions(+), 12 deletions(-) create mode 100644 gcc/testsuite/gcc.target/aarch64/simd/vdot2_fpmdot.c create mode 100644 gcc/testsuite/gcc.target/aarch64/simd/vdot4_fpmdot.c diff --git a/gcc/config/aarch64/aarch64-builtins.cc b/gcc/config/aarch64/aarch64-builtins.cc index df19bff71d0..ba3bffaa4f9 100644 --- a/gcc/config/aarch64/aarch64-builtins.cc +++ b/gcc/config/aarch64/aarch64-builtins.cc @@ -780,7 +780,7 @@ typedef struct AARCH64_SIMD_BUILTIN_##T##_##N##A, #undef ENTRY -#define ENTRY(N, S, M0, M1, M2, M3, U) \ +#define ENTRY(N, S, M0, M1, M2, M3, M4, U) \ AARCH64_##N, enum aarch64_builtins @@ -1593,6 +1593,7 @@ enum class aarch64_builtin_signatures binary, binary_fpm, ternary_fpm, + ternary_fpm_lane, unary_fpm, }; @@ -1643,10 +1644,10 @@ namespace simd_types { } #undef ENTRY -#define ENTRY(N, S, T0, T1, T2, T3, U) \ +#define ENTRY(N, S, T0, T1, T2, T3, T4, U) \ {#N, aarch64_builtin_signatures::S, simd_types::T0, simd_types::T1, \ - simd_types::T2, simd_types::T3, U, \ - aarch64_required_extensions::REQUIRED_EXTENSIONS}, + simd_types::T2, simd_types::T3, simd_types::T4, U, \ + aarch64_required_extensions::REQUIRED_EXTENSIONS}, /* Initialize pragma builtins. */ @@ -1654,7 +1655,7 @@ struct aarch64_pragma_builtins_data { const char *name; aarch64_builtin_signatures signature; - simd_type types[4]; + simd_type types[5]; int unspec; aarch64_required_extensions required_extensions; }; @@ -1667,6 +1668,7 @@ static tree aarch64_fntype (const aarch64_pragma_builtins_data &builtin_data) { tree type0, type1, type2, type3; + tree immtype = aarch64_simd_builtin_type (SImode, qualifier_lane_index); switch (builtin_data.signature) { @@ -1701,6 +1703,18 @@ aarch64_fntype (const aarch64_pragma_builtins_data &builtin_data) return build_function_type_list (type0, type1, type2, type3, uint64_type_node, NULL_TREE); + case aarch64_builtin_signatures::ternary_fpm_lane: + type0 = aarch64_simd_builtin_type (builtin_data.types[0].mode, + builtin_data.types[0].qualifiers); + type1 = aarch64_simd_builtin_type (builtin_data.types[1].mode, + builtin_data.types[1].qualifiers); + type2 = aarch64_simd_builtin_type (builtin_data.types[2].mode, + builtin_data.types[2].qualifiers); + type3 = aarch64_simd_builtin_type (builtin_data.types[3].mode, + builtin_data.types[3].qualifiers); + return build_function_type_list (type0, type1, type2, type3, immtype, + uint64_type_node, NULL_TREE); + case aarch64_builtin_signatures::unary_fpm: type0 = aarch64_simd_builtin_type (builtin_data.types[0].mode, builtin_data.types[0].qualifiers); @@ -2519,6 +2533,80 @@ aarch64_general_required_extensions (unsigned int code) return ext::streaming_compatible (0); } +namespace function_checker { + +void +require_integer_constant (location_t location, tree arg) +{ + if (TREE_CODE (arg) != INTEGER_CST) + { + error_at (location, "Constant-type integer argument expected"); + return; + } +} + +void +require_immediate_range (location_t location, tree arg, HOST_WIDE_INT min, + HOST_WIDE_INT max) +{ + if (wi::to_widest (arg) < min || wi::to_widest (arg) > max) + { + error_at (location, "lane out of range %wd - %wd", min, max); + return; + } +} + +/* Validates indexing into a vector using the index's size and the instruction, + where instruction is represented by the unspec. + This only works for intrinsics declared using pragmas in + aarch64-simd-pragma-builtins.def. */ + +void +check_simd_lane_bounds (location_t location, const aarch64_pragma_builtins_data + *builtin_data, tree *args) +{ + if (builtin_data == NULL) + // Don't check for functions that are not declared in + // aarch64-simd-pragma-builtins.def. + return; + + switch (builtin_data->signature) + { + case aarch64_builtin_signatures::ternary_fpm_lane: + { + auto index_arg = args[3]; + require_integer_constant (location, index_arg); + + auto vector_to_index_mode = builtin_data->types[3].mode; + int vector_to_index_mode_size + = GET_MODE_NUNITS (vector_to_index_mode).to_constant (); + + switch (builtin_data->unspec) + { + case UNSPEC_VDOT2: + require_immediate_range (location, index_arg, 0, + vector_to_index_mode_size / 2 - 1); + break; + + case UNSPEC_VDOT4: + require_immediate_range (location, index_arg, 0, + vector_to_index_mode_size / 4 - 1); + break; + + default: + gcc_unreachable (); + } + } + + default: + // Other signatures don't have lanes and this check doesn't apply to + // them. + return; + } +} + +}; + bool aarch64_general_check_builtin_call (location_t location, vec, unsigned int code, tree fndecl, @@ -2530,6 +2618,9 @@ aarch64_general_check_builtin_call (location_t location, vec, if (!aarch64_check_required_extensions (location, decl, required_extensions)) return false; + auto builtin_data = aarch64_get_pragma_builtin (code); + function_checker::check_simd_lane_bounds (location, builtin_data, args); + switch (code) { case AARCH64_RSR: @@ -3425,7 +3516,8 @@ aarch64_expand_pragma_builtin (tree exp, rtx target, const aarch64_pragma_builtins_data *builtin_data) { auto unspec = builtin_data->unspec; - expand_operand ops[4]; + expand_operand ops[5]; + insn_code icode; switch (builtin_data->signature) { @@ -3445,6 +3537,40 @@ aarch64_expand_pragma_builtin (tree exp, rtx target, break; } + case aarch64_builtin_signatures::ternary_fpm_lane: + { + auto input1 = expand_normal (CALL_EXPR_ARG (exp, 0)); + auto input2 = expand_normal (CALL_EXPR_ARG (exp, 1)); + auto input3 = expand_normal (CALL_EXPR_ARG (exp, 2)); + auto index = expand_normal (CALL_EXPR_ARG (exp, 3)); + auto fpm_input = expand_normal (CALL_EXPR_ARG (exp, 4)); + + if (!CONST_INT_P (index)) + { + error_at (EXPR_LOCATION (exp), + "argument should have been a constant"); + break; + } + + auto fpmr = gen_rtx_REG (DImode, FPM_REGNUM); + emit_move_insn (fpmr, fpm_input); + + create_output_operand (&ops[0], target, builtin_data->types[0].mode); + create_input_operand (&ops[1], input1, builtin_data->types[1].mode); + create_input_operand (&ops[2], input2, builtin_data->types[2].mode); + create_input_operand (&ops[3], input3, builtin_data->types[3].mode); + create_input_operand (&ops[4], index, SImode); + + icode = code_for_aarch64 (unspec, + builtin_data->types[0].mode, + builtin_data->types[1].mode, + builtin_data->types[2].mode, + builtin_data->types[3].mode, + SImode); + expand_insn (icode, 5, ops); + break; + } + case aarch64_builtin_signatures::binary_fpm: { auto input1 = expand_normal (CALL_EXPR_ARG (exp, 0)); diff --git a/gcc/config/aarch64/aarch64-c.cc b/gcc/config/aarch64/aarch64-c.cc index 68f9180520a..3e30ba5afd9 100644 --- a/gcc/config/aarch64/aarch64-c.cc +++ b/gcc/config/aarch64/aarch64-c.cc @@ -259,6 +259,10 @@ aarch64_update_cpp_builtins (cpp_reader *pfile) aarch64_def_or_undef (TARGET_FP8, "__ARM_FEATURE_FP8", pfile); + aarch64_def_or_undef (TARGET_FP8DOT2, "__ARM_FEATURE_FP8DOT2", pfile); + + aarch64_def_or_undef (TARGET_FP8DOT4, "__ARM_FEATURE_FP8DOT4", pfile); + aarch64_def_or_undef (TARGET_LS64, "__ARM_FEATURE_LS64", pfile); aarch64_def_or_undef (TARGET_RCPC, "__ARM_FEATURE_RCPC", pfile); diff --git a/gcc/config/aarch64/aarch64-option-extensions.def b/gcc/config/aarch64/aarch64-option-extensions.def index 8279f5a76ea..fd4d29e5df6 100644 --- a/gcc/config/aarch64/aarch64-option-extensions.def +++ b/gcc/config/aarch64/aarch64-option-extensions.def @@ -234,6 +234,10 @@ AARCH64_OPT_EXTENSION("gcs", GCS, (), (), (), "gcs") AARCH64_OPT_EXTENSION("fp8", FP8, (SIMD), (), (), "fp8") +AARCH64_OPT_EXTENSION("fp8dot2", FP8DOT2, (SIMD), (), (), "fp8dot2") + +AARCH64_OPT_EXTENSION("fp8dot4", FP8DOT4, (SIMD), (), (), "fp8dot4") + AARCH64_OPT_EXTENSION("faminmax", FAMINMAX, (SIMD), (), (), "faminmax") #undef AARCH64_OPT_FMV_EXTENSION diff --git a/gcc/config/aarch64/aarch64-simd-pragma-builtins.def b/gcc/config/aarch64/aarch64-simd-pragma-builtins.def index cb5b546c541..9dea2939b47 100644 --- a/gcc/config/aarch64/aarch64-simd-pragma-builtins.def +++ b/gcc/config/aarch64/aarch64-simd-pragma-builtins.def @@ -21,17 +21,36 @@ #undef ENTRY_BINARY #define ENTRY_BINARY(N, S, T0, T1, T2, U) \ - ENTRY (N, S, T0, T1, T2, none, U) + ENTRY (N, S, T0, T1, T2, none, none, U) #undef ENTRY_BINARY_FPM #define ENTRY_BINARY_FPM(N, S, T0, T1, T2, U) \ - ENTRY (N, S, T0, T1, T2, none, U) + ENTRY (N, S, T0, T1, T2, none, none, U) #define ENTRY_TERNARY_FPM(N, S, T0, T1, T2, T3, U) \ - ENTRY (N, S, T0, T1, T2, T3, U) - + ENTRY (N, S, T0, T1, T2, T3, none, U) + +#undef ENTRY_TERNARY_FPM_LANE +#define ENTRY_TERNARY_FPM_LANE(N, S, T0, T1, T2, T3, U) \ + ENTRY (N, S, T0, T1, T2, T3, none, U) + +#undef ENTRY_VDOT_FPM +#define ENTRY_VDOT_FPM(T, U) \ + ENTRY_TERNARY_FPM (vdot_##T##_mf8_fpm, ternary_fpm, T, T, f8, f8, U) \ + ENTRY_TERNARY_FPM (vdotq_##T##_mf8_fpm, ternary_fpm, T##q, T##q, f8q, f8q, \ + U) \ + ENTRY_TERNARY_FPM_LANE (vdot_lane_##T##_mf8_fpm, ternary_fpm_lane, T, T, \ + f8, f8, U) \ + ENTRY_TERNARY_FPM_LANE (vdot_laneq_##T##_mf8_fpm, ternary_fpm_lane, T, T, \ + f8, f8q, U) \ + ENTRY_TERNARY_FPM_LANE (vdotq_lane_##T##_mf8_fpm, ternary_fpm_lane, T##q, \ + T##q, f8q, f8, U) \ + ENTRY_TERNARY_FPM_LANE (vdotq_laneq_##T##_mf8_fpm, ternary_fpm_lane, T##q, \ + T##q, f8q, f8q, U) + +#undef ENTRY_UNARY_FPM #define ENTRY_UNARY_FPM(N, S, T0, T1, U) \ - ENTRY (N, S, T0, T1, none, none, U) + ENTRY (N, S, T0, T1, none, none, none, U) #undef ENTRY_VHSDF #define ENTRY_VHSDF(NAME, SIGNATURE, UNSPEC) \ @@ -92,3 +111,13 @@ ENTRY_TERNARY_FPM (vcvt_high_mf8_f32_fpm, ternary_fpm, f8q, f8, f32q, f32q, \ #define REQUIRED_EXTENSIONS nonstreaming_only (AARCH64_FL_FP8) ENTRY_VHSDF_VHSDI (vscale, binary, UNSPEC_FSCALE) #undef REQUIRED_EXTENSIONS + +// fpm dot product +#define REQUIRED_EXTENSIONS nonstreaming_only (AARCH64_FL_FP8DOT2) +ENTRY_VDOT_FPM (f16, UNSPEC_VDOT2) +#undef REQUIRED_EXTENSIONS + +// fpm dot4 product +#define REQUIRED_EXTENSIONS nonstreaming_only (AARCH64_FL_FP8DOT4) +ENTRY_VDOT_FPM (f32, UNSPEC_VDOT4) +#undef REQUIRED_EXTENSIONS diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index 87bbfb0e586..ea1ef4963d2 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -10067,3 +10067,61 @@ "TARGET_FP8" "\t%0., %1., %2." ) + +;; fpm vdot2 instructions. +(define_insn + "@aarch64_" + [(set (match_operand:VHF 0 "register_operand" "=w") + (unspec:VHF + [(match_operand:VHF 1 "register_operand" "w") + (match_operand:VB 2 "register_operand" "w") + (match_operand:VB 3 "register_operand" "w") + (reg:DI FPM_REGNUM)] + FPM_VDOT2_UNS))] + "TARGET_FP8DOT2" + "\t%1., %2., %3." +) + +;; fpm vdot2 instructions with lane. +(define_insn + "@aarch64_" + [(set (match_operand:VHF 0 "register_operand" "=w") + (unspec:VHF + [(match_operand:VHF 1 "register_operand" "w") + (match_operand:VB 2 "register_operand" "w") + (match_operand:VB2 3 "register_operand" "w") + (match_operand:SI_ONLY 4 "const_int_operand" "n") + (reg:DI FPM_REGNUM)] + FPM_VDOT2_UNS))] + "TARGET_FP8DOT2" + "\t%1., %2., %3.[%4]" +) + +;; fpm vdot4 instructions. +(define_insn + "@aarch64_" + [(set (match_operand:VDQSF 0 "register_operand" "=w") + (unspec:VDQSF + [(match_operand:VDQSF 1 "register_operand" "w") + (match_operand:VB 2 "register_operand" "w") + (match_operand:VB 3 "register_operand" "w") + (reg:DI FPM_REGNUM)] + FPM_VDOT4_UNS))] + "TARGET_FP8DOT4" + "\t%1., %2., %3." +) + +;; fpm vdot4 instructions with lane. +(define_insn + "@aarch64_" + [(set (match_operand:VDQSF 0 "register_operand" "=w") + (unspec:VDQSF + [(match_operand:VDQSF 1 "register_operand" "w") + (match_operand:VB 2 "register_operand" "w") + (match_operand:VB2 3 "register_operand" "w") + (match_operand:SI_ONLY 4 "const_int_operand" "n") + (reg:DI FPM_REGNUM)] + FPM_VDOT4_UNS))] + "TARGET_FP8DOT4" + "\t%1., %2., %3.[%4]" +) diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h index 593319fd472..bbe56afcb62 100644 --- a/gcc/config/aarch64/aarch64.h +++ b/gcc/config/aarch64/aarch64.h @@ -483,6 +483,12 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE ATTRIBUTE_UNUSED /* fp8 instructions are enabled through +fp8. */ #define TARGET_FP8 AARCH64_HAVE_ISA (FP8) +/* fp8 dot product instructions are enabled through +fp8dot2. */ +#define TARGET_FP8DOT2 AARCH64_HAVE_ISA (FP8DOT2) + +/* fp8 dot product instructions are enabled through +fp8dot4. */ +#define TARGET_FP8DOT4 AARCH64_HAVE_ISA (FP8DOT4) + /* Standard register usage. */ /* 31 64-bit general purpose registers R0-R30: diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md index e3026c36e1c..45b9e74c231 100644 --- a/gcc/config/aarch64/iterators.md +++ b/gcc/config/aarch64/iterators.md @@ -163,6 +163,10 @@ ;; Advanced SIMD Float modes. (define_mode_iterator VDQF [V2SF V4SF V2DF]) + +(define_mode_iterator VHF [(V4HF "TARGET_SIMD_F16INST") + (V8HF "TARGET_SIMD_F16INST")]) + (define_mode_iterator VHSDF [(V4HF "TARGET_SIMD_F16INST") (V8HF "TARGET_SIMD_F16INST") V2SF V4SF V2DF]) @@ -321,6 +325,7 @@ ;; All byte modes. (define_mode_iterator VB [V8QI V16QI]) +(define_mode_iterator VB2 [VB]) ;; 1 and 2 lane DI and DF modes. (define_mode_iterator V12DIF [V1DI V1DF V2DI V2DF]) @@ -766,6 +771,8 @@ UNSPEC_VCVT2_HIGH_F16 ; Used in aarch64-simd.md. UNSPEC_VCVT2_LOW_BF16 ; Used in aarch64-simd.md. UNSPEC_VCVT2_LOW_F16 ; Used in aarch64-simd.md. + UNSPEC_VDOT2 ; Used in aarch64-simd.md. + UNSPEC_VDOT4 ; Used in aarch64-simd.md. UNSPEC_TBL ; Used in vector permute patterns. UNSPEC_TBX ; Used in vector permute patterns. UNSPEC_CONCAT ; Used in vector permute patterns. @@ -2427,6 +2434,11 @@ (VNx8HF ".h") (VNx16HF "") (VNx32HF "") (VNx8HI ".h") (VNx16HI "") (VNx32HI "")]) + +;; Lane index suffix for fp8 vdot operations depends on the output mode +(define_mode_attr Vdotlanetype [(V4HF "2b") (V8HF "2b") + (V2SF "4b") (V4SF "4b")]) + ;; The number of bytes controlled by a predicate (define_mode_attr data_bytes [(VNx16BI "1") (VNx8BI "2") (VNx4BI "4") (VNx2BI "8")]) @@ -4597,6 +4609,10 @@ (define_int_iterator FPM_TERNARY_VCVT_UNS [UNSPEC_VCVT_HIGH_F32]) +(define_int_iterator FPM_VDOT2_UNS [UNSPEC_VDOT2]) + +(define_int_iterator FPM_VDOT4_UNS [UNSPEC_VDOT4]) + (define_int_attr fpm_uns_op [(UNSPEC_FSCALE "fscale") (UNSPEC_VCVT_F16 "fcvtn") @@ -4614,7 +4630,9 @@ (UNSPEC_VCVT2_HIGH_BF16 "bf2cvtl2") (UNSPEC_VCVT2_HIGH_F16 "f2cvtl2") (UNSPEC_VCVT2_LOW_BF16 "bf2cvtl") - (UNSPEC_VCVT2_LOW_F16 "f2cvtl")]) + (UNSPEC_VCVT2_LOW_F16 "f2cvtl") + (UNSPEC_VDOT2 "fdot") + (UNSPEC_VDOT4 "fdot")]) (define_int_attr fpm_uns_name [(UNSPEC_VCVT_F16 "vcvt_mf8_f16_fpm") diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 7146163d66d..332c664b30f 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -21805,6 +21805,10 @@ Enable support for Armv8.9-a/9.4-a translation hardening extension. Enable the RCpc3 (Release Consistency) extension. @item fp8 Enable the fp8 (8-bit floating point) extension. +@item fp8dot2 +Enable the fp8dot2 (8-bit floating point dot product) extension. +@item fp8dot4 +Enable the fp8dot4 (8-bit floating point dot product) extension. @item faminmax Enable the Floating Point Absolute Maximum/Minimum extension. diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vdot2_fpmdot.c b/gcc/testsuite/gcc.target/aarch64/simd/vdot2_fpmdot.c new file mode 100644 index 00000000000..3e888a67ec7 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/simd/vdot2_fpmdot.c @@ -0,0 +1,77 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-O3 -march=armv9-a+fp8dot2" } */ +/* { dg-final { check-function-bodies "**" "" } } */ + +#include "arm_neon.h" + +/* +** test_vdot_f16_fpm: +** msr fpmr, x0 +** fdot v0.4h, v1.8b, v2.8b +** ret +*/ +float16x4_t +test_vdot_f16_fpm (float16x4_t a, mfloat8x8_t b, mfloat8x8_t c, fpm_t d) +{ + return vdot_f16_mf8_fpm (a, b, c, d); +} + +/* +** test_vdotq_f16_fpm: +** msr fpmr, x0 +** fdot v0.8h, v1.16b, v2.16b +** ret +*/ +float16x8_t +test_vdotq_f16_fpm (float16x8_t a, mfloat8x16_t b, mfloat8x16_t c, fpm_t d) +{ + return vdotq_f16_mf8_fpm (a, b, c, d); +} + +/* +** test_vdot_lane_f16_fpm: +** msr fpmr, x0 +** fdot v0.4h, v1.8b, v2.2b\[1\] +** ret +*/ +float16x4_t +test_vdot_lane_f16_fpm (float16x4_t a, mfloat8x8_t b, mfloat8x8_t c, fpm_t d) +{ + return vdot_lane_f16_mf8_fpm (a, b, c, 1, d); +} + +/* +** test_vdot_laneq_f16_fpm: +** msr fpmr, x0 +** fdot v0.4h, v1.8b, v2.2b\[1\] +** ret +*/ +float16x4_t +test_vdot_laneq_f16_fpm (float16x4_t a, mfloat8x8_t b, mfloat8x16_t c, fpm_t d) +{ + return vdot_laneq_f16_mf8_fpm (a, b, c, 1, d); +} + +/* +** test_vdotq_lane_f16_fpm: +** msr fpmr, x0 +** fdot v0.8h, v1.16b, v2.2b\[1\] +** ret +*/ +float16x8_t +test_vdotq_lane_f16_fpm (float16x8_t a, mfloat8x16_t b, mfloat8x8_t c, fpm_t d) +{ + return vdotq_lane_f16_mf8_fpm (a, b, c, 1, d); +} + +/* +** test_vdotq_laneq_f16_fpm: +** msr fpmr, x0 +** fdot v0.8h, v1.16b, v2.2b\[1\] +** ret +*/ +float16x8_t +test_vdotq_laneq_f16_fpm (float16x8_t a, mfloat8x16_t b, mfloat8x16_t c, fpm_t d) +{ + return vdotq_laneq_f16_mf8_fpm (a, b, c, 1, d); +} diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vdot4_fpmdot.c b/gcc/testsuite/gcc.target/aarch64/simd/vdot4_fpmdot.c new file mode 100644 index 00000000000..f03dd0a0d36 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/simd/vdot4_fpmdot.c @@ -0,0 +1,77 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-O3 -march=armv9-a+fp8dot4" } */ +/* { dg-final { check-function-bodies "**" "" } } */ + +#include "arm_neon.h" + +/* +** test_vdot_f32_fpm: +** msr fpmr, x0 +** fdot v0.2s, v1.8b, v2.8b +** ret +*/ +float32x2_t +test_vdot_f32_fpm (float32x2_t a, mfloat8x8_t b, mfloat8x8_t c, fpm_t d) +{ + return vdot_f32_mf8_fpm (a, b, c, d); +} + +/* +** test_vdotq_f32_fpm: +** msr fpmr, x0 +** fdot v0.4s, v1.16b, v2.16b +** ret +*/ +float32x4_t +test_vdotq_f32_fpm (float32x4_t a, mfloat8x16_t b, mfloat8x16_t c, fpm_t d) +{ + return vdotq_f32_mf8_fpm (a, b, c, d); +} + +/* +** test_vdot_lane_f32_fpm: +** msr fpmr, x0 +** fdot v0.2s, v1.8b, v2.4b\[1\] +** ret +*/ +float32x2_t +test_vdot_lane_f32_fpm (float32x2_t a, mfloat8x8_t b, mfloat8x8_t c, fpm_t d) +{ + return vdot_lane_f32_mf8_fpm (a, b, c, 1, d); +} + +/* +** test_vdot_laneq_f32_fpm: +** msr fpmr, x0 +** fdot v0.2s, v1.8b, v2.4b\[1\] +** ret +*/ +float32x2_t +test_vdot_laneq_f32_fpm (float32x2_t a, mfloat8x8_t b, mfloat8x16_t c, fpm_t d) +{ + return vdot_laneq_f32_mf8_fpm (a, b, c, 1, d); +} + +/* +** test_vdotq_lane_f32_fpm: +** msr fpmr, x0 +** fdot v0.4s, v1.16b, v2.4b\[1\] +** ret +*/ +float32x4_t +test_vdotq_lane_f32_fpm (float32x4_t a, mfloat8x16_t b, mfloat8x8_t c, fpm_t d) +{ + return vdotq_lane_f32_mf8_fpm (a, b, c, 1, d); +} + +/* +** test_vdotq_laneq_f32_fpm: +** msr fpmr, x0 +** fdot v0.4s, v1.16b, v2.4b\[1\] +** ret +*/ +float32x4_t +test_vdotq_laneq_f32_fpm (float32x4_t a, mfloat8x16_t b, mfloat8x16_t c, fpm_t d) +{ + return vdotq_laneq_f32_mf8_fpm (a, b, c, 1, d); +}