From patchwork Mon Jul 8 16:15:52 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Christophe Lyon
X-Patchwork-Id: 1958046
From: Christophe Lyon
Subject: [PATCH 2/2] arm: [MVE intrinsics] Improve vdupq_n implementation
Date: Mon, 8 Jul 2024 18:15:52 +0200
Message-ID: <20240708161552.772361-2-christophe.lyon@arm.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20240708161552.772361-1-christophe.lyon@arm.com>
References: <20240708161552.772361-1-christophe.lyon@arm.com>
List-Id: Gcc-patches mailing list

This patch makes the non-predicated vdupq_n MVE intrinsics
use vec_duplicate rather than an unspec. This enables the compiler to
generate better code sequences (for instance using vmov when possible).

The patch renames the existing mve_vdup pattern into @mve_vdupq_n, and
removes the now useless @mve_q_n_f and @mve_q_n_ ones.

As a side-effect, it needs to update the mve_unpredicated_insn
attributes in @mve_q_m_n_ and @mve_q_m_n_f.

Using vec_duplicate means the compiler is now able to use vmov in the
tests with an immediate argument in vdupq_n_[su]{8,16,32}.c:
	vmov.i8 q0,#0x1

However, this is only possible when the immediate has a suitable value
(MVE encoding constraints, see imm_for_neon_mov_operand predicate).

Provided we adjust the cost computations in arm_rtx_costs_internal(),
when the immediate does not meet the vmov constraints, we now generate:
	mov r0, #imm
	vdup.xx q0,r0

or
	ldr r0, .L4
	vdup.32 q0,r0
in the f32 case (with 1.1 as immediate).

Without the cost adjustment, we would generate:
	vldr.64 d0, .L4
	vldr.64 d1, .L4+8
and an associated literal pool entry.

Regarding the testsuite updates:
--------------------------------
* The signed versions of vdupq_* tests lack a version with an
  immediate argument. This patch adds them, similar to what we already
  have for vdupq_n_u*.c tests.

* Code generation for different immediate values is checked with the
  new tests this patch introduces. Note there's no need for s8/u8
  tests because 8-bit immediates always comply with
  imm_for_neon_mov_operand.

* We can remove xfail from vcmp*f tests since we now generate:
	movw r3, #15462
	vcmp.f16 eq, q0, r3
  instead of the previous:
	vldr.64 d6, .L5
	vldr.64 d7, .L5+8
	vcmp.f16 eq, q0, q3

* The 4 crypto-vsha1*_u32 tests need an update since we now generate
  one more vdup instruction, from:
	vldr d16, .L3
	vldr d17, .L3+8
	vldr d18, .L3+16
	vldr d19, .L3+24
	vldr d6, .L3+32
	vldr d7, .L3+40
	[...]
  .L3:
	.word -559038737
	.word -559038737
	.word -559038737
	.word -559038737
  to:
	ldr r3, .L3+32
	vldr d18, .L3
	vldr d19, .L3+8
	vdup.32 q8, r3
	vldr d6, .L3+16
	vldr d7, .L3+24
	[...]
  .L3+32:
	.word -559038737

Finally, this patch fixes a bug: the mode iterator for vdupq_n is now
MVE_VLD_ST instead of MVE_vecs, because V2DI and V2DF (thus vdup.64)
are not supported by MVE.

Tested on arm-linux-gnueabihf with no regression.

2024-07-02  Jolen Li
	    Christophe Lyon

	gcc/
	* config/arm/arm-mve-builtins-base.cc (vdupq_impl): New class.
	(vdupq): Use new implementation.
	* config/arm/arm.cc (arm_rtx_costs_internal): Handle HFmode
	for COST_DOUBLE. Update costing for CONST_VECTOR.
	* config/arm/arm_mve_builtins.def: Merge vdupq_n_f, vdupq_n_s
	and vdupq_n_u into vdupq_n.
	* config/arm/mve.md (mve_vdup): Rename into ...
	(@mve_vdup_n): ... this.
	(@mve_q_n_f): Delete.
	(@mve_q_n_): Delete.
	(@mve_q_m_n_): Update mve_unpredicated_insn attribute.
	(@mve_q_m_n_f): Likewise.

	gcc/testsuite/
	* gcc.target/arm/mve/intrinsics/vdupq_n_u8.c (foo1): Update
	expected code.
	* gcc.target/arm/mve/intrinsics/vdupq_n_u16.c (foo1): Likewise.
	* gcc.target/arm/mve/intrinsics/vdupq_n_u32.c (foo1): Likewise.
	* gcc.target/arm/mve/intrinsics/vdupq_n_s8.c: Add test with
	immediate argument.
	* gcc.target/arm/mve/intrinsics/vdupq_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vdupq_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vdupq_n_f16.c (foo1): Update
	expected code.
	* gcc.target/arm/mve/intrinsics/vdupq_n_f32.c (foo1): Likewise.
	* gcc.target/arm/mve/intrinsics/vdupq_m_n_s16.c: Add test with
	immediate argument.
	* gcc.target/arm/mve/intrinsics/vdupq_m_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vdupq_m_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vdupq_x_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vdupq_x_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vdupq_x_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vdupq_n_f32-2.c: New test.
	* gcc.target/arm/mve/intrinsics/vdupq_n_s16-2.c: New test.
	* gcc.target/arm/mve/intrinsics/vdupq_n_s32-2.c: New test.
	* gcc.target/arm/mve/intrinsics/vdupq_n_u16-2.c: New test.
	* gcc.target/arm/mve/intrinsics/vdupq_n_u32-2.c: New test.
	* gcc.target/arm/mve/intrinsics/vcmpeqq_n_f16.c: Remove xfail.
	* gcc.target/arm/mve/intrinsics/vcmpeqq_n_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpgeq_n_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpgeq_n_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpgtq_n_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpgtq_n_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpleq_n_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpleq_n_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpltq_n_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpltq_n_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpneq_n_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpneq_n_f32.c: Likewise.
	* gcc.target/arm/crypto-vsha1cq_u32.c: Update
	scan-assembler-times.
	* gcc.target/arm/crypto-vsha1h_u32.c: Likewise.
	* gcc.target/arm/crypto-vsha1mq_u32.c: Likewise.
	* gcc.target/arm/crypto-vsha1pq_u32.c: Likewise.
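
For illustration only, the immediate constraint mentioned above can be
sketched in plain C for the 16-bit element case. This is a simplified
model written for this note, assuming the usual modified-immediate
forms (one zero byte, the inverted form, or two identical bytes); the
helper name splat16_fits_vmov_imm is hypothetical and the code does not
reproduce imm_for_neon_mov_operand or any other GCC function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not GCC code: returns true when a 16-bit element
   value can be replicated across a vector by one immediate move.
   Simplified model covering 16-bit lanes only.  */
static bool
splat16_fits_vmov_imm (uint16_t v)
{
  uint8_t lo = v & 0xff;
  uint8_t hi = v >> 8;

  /* 0x00XY or 0xXY00: an 8-bit payload with the other byte zero.  */
  if (lo == 0 || hi == 0)
    return true;
  /* Inverted form: the bitwise complement has one zero byte.  */
  if (lo == 0xff || hi == 0xff)
    return true;
  /* Both bytes identical: the value is an 8-bit splat.  */
  return lo == hi;
}

/* Small self-check with hand-picked values.  */
static void
self_check (void)
{
  assert (splat16_fits_vmov_imm (0x0001));   /* low byte only */
  assert (splat16_fits_vmov_imm (0x4200));   /* high byte only */
  assert (splat16_fits_vmov_imm (0xabab));   /* identical bytes */
  assert (splat16_fits_vmov_imm (0xff12));   /* complement is 0x00ed */
  assert (!splat16_fits_vmov_imm (0x1234));  /* no single-immediate form */
}
```

Under this model, a value such as 0x1234 falls into the "mov + vdup"
case described above, while 0x0001 can be moved as an immediate.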
---
 gcc/config/arm/arm-mve-builtins-base.cc | 55 ++++++++++++++++++-
 gcc/config/arm/arm.cc                   | 19 +++++--
 gcc/config/arm/arm_mve_builtins.def     |  4 +-
 gcc/config/arm/mve.md                   | 41 +++-----------
 .../gcc.target/arm/crypto-vsha1cq_u32.c |  2 +-
 .../gcc.target/arm/crypto-vsha1h_u32.c  |  2 +-
 .../gcc.target/arm/crypto-vsha1mq_u32.c |  2 +-
 .../gcc.target/arm/crypto-vsha1pq_u32.c |  2 +-
 .../arm/mve/intrinsics/vcmpeqq_n_f16.c  |  4 +-
 .../arm/mve/intrinsics/vcmpeqq_n_f32.c  |  4 +-
 .../arm/mve/intrinsics/vcmpgeq_n_f16.c  |  4 +-
 .../arm/mve/intrinsics/vcmpgeq_n_f32.c  |  4 +-
 .../arm/mve/intrinsics/vcmpgtq_n_f16.c  |  2 +-
 .../arm/mve/intrinsics/vcmpgtq_n_f32.c  |  4 +-
 .../arm/mve/intrinsics/vcmpleq_n_f16.c  |  4 +-
 .../arm/mve/intrinsics/vcmpleq_n_f32.c  |  4 +-
 .../arm/mve/intrinsics/vcmpltq_n_f16.c  |  4 +-
 .../arm/mve/intrinsics/vcmpltq_n_f32.c  |  4 +-
 .../arm/mve/intrinsics/vcmpneq_n_f16.c  |  4 +-
 .../arm/mve/intrinsics/vcmpneq_n_f32.c  |  4 +-
 .../arm/mve/intrinsics/vdupq_m_n_s16.c  | 18 +++++-
 .../arm/mve/intrinsics/vdupq_m_n_s32.c  | 18 +++++-
 .../arm/mve/intrinsics/vdupq_m_n_s8.c   | 18 +++++-
 .../arm/mve/intrinsics/vdupq_n_f16.c    |  3 +-
 .../arm/mve/intrinsics/vdupq_n_f32-2.c  | 29 ++++++++++
 .../arm/mve/intrinsics/vdupq_n_f32.c    |  5 +-
 .../arm/mve/intrinsics/vdupq_n_s16-2.c  | 30 ++++++++++
 .../arm/mve/intrinsics/vdupq_n_s16.c    | 14 ++++-
 .../arm/mve/intrinsics/vdupq_n_s32-2.c  | 30 ++++++++++
 .../arm/mve/intrinsics/vdupq_n_s32.c    | 14 ++++-
 .../arm/mve/intrinsics/vdupq_n_s8.c     | 14 ++++-
 .../arm/mve/intrinsics/vdupq_n_u16-2.c  | 30 ++++++++++
 .../arm/mve/intrinsics/vdupq_n_u16.c    |  4 +-
 .../arm/mve/intrinsics/vdupq_n_u32-2.c  | 30 ++++++++++
 .../arm/mve/intrinsics/vdupq_n_u32.c    |  4 +-
 .../arm/mve/intrinsics/vdupq_n_u8.c     |  4 +-
 .../arm/mve/intrinsics/vdupq_x_n_s16.c  | 18 +++++-
 .../arm/mve/intrinsics/vdupq_x_n_s32.c  | 18 +++++-
 .../arm/mve/intrinsics/vdupq_x_n_s8.c   | 18 +++++-
 39 files changed, 405 insertions(+), 87 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_f32-2.c
 create mode 100644 gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_s16-2.c
 create mode 100644 gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_s32-2.c
 create mode 100644 gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_u16-2.c
 create mode 100644 gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_u32-2.c

diff --git a/gcc/config/arm/arm-mve-builtins-base.cc b/gcc/config/arm/arm-mve-builtins-base.cc
index e0ae593a6c0..be0f9c26c83 100644
--- a/gcc/config/arm/arm-mve-builtins-base.cc
+++ b/gcc/config/arm/arm-mve-builtins-base.cc
@@ -39,6 +39,59 @@ using namespace arm_mve;
 
 namespace {
 
+/* Implements vdup_* intrinsics.  */
+class vdupq_impl : public quiet
+{
+public:
+  CONSTEXPR vdupq_impl (int unspec_for_m_n_sint,
+                        int unspec_for_m_n_uint,
+                        int unspec_for_m_n_fp)
+    : m_unspec_for_m_n_sint (unspec_for_m_n_sint),
+      m_unspec_for_m_n_uint (unspec_for_m_n_uint),
+      m_unspec_for_m_n_fp (unspec_for_m_n_fp)
+  {}
+  int m_unspec_for_m_n_sint;
+  int m_unspec_for_m_n_uint;
+  int m_unspec_for_m_n_fp;
+
+  rtx expand (function_expander &e) const override
+  {
+    gcc_assert (e.mode_suffix_id == MODE_n);
+
+    insn_code code;
+    machine_mode mode = e.vector_mode (0);
+
+    switch (e.pred)
+      {
+      case PRED_none:
+        /* No predicate, _n suffix.  */
+        code = code_for_mve_vdupq_n (mode);
+        return e.use_exact_insn (code);
+
+      case PRED_m:
+      case PRED_x:
+        /* "m" or "x" predicate, _n suffix.  */
+        if (e.type_suffix (0).integer_p)
+          if (e.type_suffix (0).unsigned_p)
+            code = code_for_mve_q_m_n (m_unspec_for_m_n_uint,
+                                       m_unspec_for_m_n_uint, mode);
+          else
+            code = code_for_mve_q_m_n (m_unspec_for_m_n_sint,
+                                       m_unspec_for_m_n_sint, mode);
+        else
+          code = code_for_mve_q_m_n_f (m_unspec_for_m_n_fp, mode);
+
+        if (e.pred == PRED_m)
+          return e.use_cond_insn (code, 0);
+        else
+          return e.use_pred_x_insn (code);
+
+      default:
+        gcc_unreachable ();
+      }
+  }
+};
+
 /* Implements vreinterpretq_* intrinsics.  */
 class vreinterpretq_impl : public quiet
 {
@@ -339,7 +392,7 @@ FUNCTION (vcmpltq, unspec_based_mve_function_exact_insn_vcmp, (LT, UNKNOWN, LT,
 FUNCTION (vcmpcsq, unspec_based_mve_function_exact_insn_vcmp, (UNKNOWN, GEU, UNKNOWN, UNKNOWN, VCMPCSQ_M_U, UNKNOWN, UNKNOWN, VCMPCSQ_M_N_U, UNKNOWN))
 FUNCTION (vcmphiq, unspec_based_mve_function_exact_insn_vcmp, (UNKNOWN, GTU, UNKNOWN, UNKNOWN, VCMPHIQ_M_U, UNKNOWN, UNKNOWN, VCMPHIQ_M_N_U, UNKNOWN))
 FUNCTION_WITHOUT_M_N (vcreateq, VCREATEQ)
-FUNCTION_ONLY_N (vdupq, VDUPQ)
+FUNCTION (vdupq, vdupq_impl, (VDUPQ_M_N_S, VDUPQ_M_N_U, VDUPQ_M_N_F))
 FUNCTION_WITH_RTX_M (veorq, XOR, VEORQ)
 FUNCTION (vfmaq, unspec_mve_function_exact_insn, (-1, -1, VFMAQ_F, -1, -1, VFMAQ_N_F, -1, -1, VFMAQ_M_F, -1, -1, VFMAQ_M_N_F))
 FUNCTION (vfmasq, unspec_mve_function_exact_insn, (-1, -1, -1, -1, -1, VFMASQ_N_F, -1, -1, -1, -1, -1, VFMASQ_M_N_F))
diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
index 7d67d2cfee9..04439be5d3a 100644
--- a/gcc/config/arm/arm.cc
+++ b/gcc/config/arm/arm.cc
@@ -11910,7 +11910,7 @@ arm_rtx_costs_internal (rtx x, enum rtx_code code, enum rtx_code outer_code,
 
     case CONST_DOUBLE:
       if (TARGET_HARD_FLOAT && GET_MODE_CLASS (mode) == MODE_FLOAT
-          && (mode == SFmode || !TARGET_VFP_SINGLE))
+          && (mode == SFmode || mode == HFmode || !TARGET_VFP_SINGLE))
         {
           if (vfp3_const_double_rtx (x))
             {
@@ -11935,14 +11935,25 @@ arm_rtx_costs_internal (rtx x, enum rtx_code code, enum rtx_code outer_code,
       return true;
 
     case CONST_VECTOR:
-      /* Fixme.  */
       if (((TARGET_NEON && TARGET_HARD_FLOAT
             && (VALID_NEON_DREG_MODE (mode) || VALID_NEON_QREG_MODE (mode)))
            || TARGET_HAVE_MVE)
           && simd_immediate_valid_for_move (x, mode, NULL, NULL))
-        *cost = COSTS_N_INSNS (1);
+        *cost = extra_cost->vect.movi;
       else
-        *cost = COSTS_N_INSNS (4);
+        {
+          if (TARGET_NEON && TARGET_HARD_FLOAT
+              && (VALID_NEON_DREG_MODE (mode)))
+            /* 64-bit vector requires one load on Neon.  */
+            *cost = extra_cost->ldst.loadd;
+          else if ((TARGET_NEON && TARGET_HARD_FLOAT
+                    && (VALID_NEON_QREG_MODE (mode)))
+                   || TARGET_HAVE_MVE)
+            /* 128-bit vector requires two loads on Neon/MVE.  */
+            *cost = extra_cost->ldst.loadd * 4;
+          else
+            *cost = COSTS_N_INSNS (4);
+        }
       return true;
 
     case HIGH:
diff --git a/gcc/config/arm/arm_mve_builtins.def b/gcc/config/arm/arm_mve_builtins.def
index f141aab816c..dd99a90b952 100644
--- a/gcc/config/arm/arm_mve_builtins.def
+++ b/gcc/config/arm/arm_mve_builtins.def
@@ -27,7 +27,7 @@ VAR2 (UNOP_NONE_NONE, vrndmq_f, v8hf, v4sf)
 VAR2 (UNOP_NONE_NONE, vrndaq_f, v8hf, v4sf)
 VAR2 (UNOP_NONE_NONE, vrev64q_f, v8hf, v4sf)
 VAR2 (UNOP_NONE_NONE, vnegq_f, v8hf, v4sf)
-VAR2 (UNOP_NONE_NONE, vdupq_n_f, v8hf, v4sf)
+VAR5 (UNOP_NONE_NONE, vdupq_n, v8hf, v4sf, v16qi, v8hi, v4si)
 VAR2 (UNOP_NONE_NONE, vabsq_f, v8hf, v4sf)
 VAR1 (UNOP_NONE_NONE, vrev32q_f, v8hf)
 VAR1 (UNOP_NONE_NONE, vcvttq_f32_f16, v4sf)
@@ -39,7 +39,6 @@ VAR3 (UNOP_SNONE_SNONE, vqnegq_s, v16qi, v8hi, v4si)
 VAR3 (UNOP_SNONE_SNONE, vqabsq_s, v16qi, v8hi, v4si)
 VAR3 (UNOP_SNONE_SNONE, vnegq_s, v16qi, v8hi, v4si)
 VAR3 (UNOP_SNONE_SNONE, vmvnq_s, v16qi, v8hi, v4si)
-VAR3 (UNOP_SNONE_SNONE, vdupq_n_s, v16qi, v8hi, v4si)
 VAR3 (UNOP_SNONE_SNONE, vclzq_s, v16qi, v8hi, v4si)
 VAR3 (UNOP_SNONE_SNONE, vclsq_s, v16qi, v8hi, v4si)
 VAR3 (UNOP_SNONE_SNONE, vaddvq_s, v16qi, v8hi, v4si)
@@ -57,7 +56,6 @@ VAR1 (UNOP_SNONE_SNONE, vrev16q_s, v16qi)
 VAR1 (UNOP_SNONE_SNONE, vaddlvq_s, v4si)
 VAR3 (UNOP_UNONE_UNONE, vrev64q_u, v16qi, v8hi, v4si)
 VAR3 (UNOP_UNONE_UNONE, vmvnq_u, v16qi, v8hi, v4si)
-VAR3 (UNOP_UNONE_UNONE, vdupq_n_u, v16qi, v8hi, v4si)
 VAR3 (UNOP_UNONE_UNONE, vclzq_u, v16qi, v8hi, v4si)
 VAR3 (UNOP_UNONE_UNONE, vaddvq_u, v16qi, v8hi, v4si)
 VAR2 (UNOP_UNONE_UNONE, vrev32q_u, v16qi, v8hi)
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index afe5fba698c..9fcc1242206 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -94,13 +94,16 @@ (define_insn "mve_mov"
    (set_attr "thumb2_pool_range" "*,*,*,*,1018,*,*,*")
    (set_attr "neg_pool_range" "*,*,*,*,996,*,*,*")])
 
-(define_insn "mve_vdup"
+;;
+;; [vdupq_n_u, vdupq_n_s, vdupq_n_f]
+;;
+(define_insn "@mve_vdupq_n"
   [(set (match_operand:MVE_VLD_ST 0 "s_register_operand" "=w")
         (vec_duplicate:MVE_VLD_ST
           (match_operand: 1 "s_register_operand" "r")))]
   "TARGET_HAVE_MVE || TARGET_HAVE_MVE_FLOAT"
   "vdup.\t%q0, %1"
- [(set (attr "mve_unpredicated_insn") (symbol_ref "CODE_FOR_mve_vdup"))
+ [(set (attr "mve_unpredicated_insn") (symbol_ref "CODE_FOR_mve_vdupq_n"))
   (set_attr "length" "4")
   (set_attr "type" "mve_move")])
 
@@ -188,21 +191,6 @@ (define_insn "mve_vq_f"
   (set_attr "type" "mve_move")
 ])
 
-;;
-;; [vdupq_n_f])
-;;
-(define_insn "@mve_q_n_f"
-  [
-   (set (match_operand:MVE_0 0 "s_register_operand" "=w")
-        (unspec:MVE_0 [(match_operand: 1 "s_register_operand" "r")]
-         MVE_FP_N_VDUPQ_ONLY))
-  ]
-  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
-  ".%#\t%q0, %1"
- [(set (attr "mve_unpredicated_insn") (symbol_ref "CODE_FOR_mve_q_n_f"))
-  (set_attr "type" "mve_move")
-])
-
 ;;
 ;; [vrev32q_f])
 ;;
@@ -328,21 +316,6 @@ (define_expand "mve_vmvnq_s"
   "TARGET_HAVE_MVE"
 )
 
-;;
-;; [vdupq_n_u, vdupq_n_s])
-;;
-(define_insn "@mve_q_n_"
-  [
-   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
-        (unspec:MVE_2 [(match_operand: 1 "s_register_operand" "r")]
-         VDUPQ_N))
-  ]
-  "TARGET_HAVE_MVE"
-  ".%#\t%q0, %1"
- [(set (attr "mve_unpredicated_insn") (symbol_ref "CODE_FOR_mve_q_n_"))
-  (set_attr "type" "mve_move")
-])
-
 ;;
 ;; [vclzq_u, vclzq_s])
 ;;
@@ -1903,7 +1876,7 @@ (define_insn "@mve_q_m_n_"
   ]
   "TARGET_HAVE_MVE"
   "vpst\;t.%#\t%q0, %2"
- [(set (attr "mve_unpredicated_insn") (symbol_ref "CODE_FOR_mve_q_n_"))
+ [(set (attr "mve_unpredicated_insn") (symbol_ref "CODE_FOR_mve_q_n"))
   (set_attr "type" "mve_move")
   (set_attr "length""8")])
 
@@ -2317,7 +2290,7 @@ (define_insn "@mve_q_m_n_f"
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
   "vpst\;t.%#\t%q0, %2"
- [(set (attr "mve_unpredicated_insn") (symbol_ref "CODE_FOR_mve_q_n_f"))
+ [(set (attr "mve_unpredicated_insn") (symbol_ref "CODE_FOR_mve_q_n"))
   (set_attr "type" "mve_move")
   (set_attr "length""8")])
 
diff --git a/gcc/testsuite/gcc.target/arm/crypto-vsha1cq_u32.c b/gcc/testsuite/gcc.target/arm/crypto-vsha1cq_u32.c
index 0cadd19c4dc..c7835a7968e 100644
--- a/gcc/testsuite/gcc.target/arm/crypto-vsha1cq_u32.c
+++ b/gcc/testsuite/gcc.target/arm/crypto-vsha1cq_u32.c
@@ -31,5 +31,5 @@ uint32_t foo (void)
 TEST_SHA1C_VEC_SELECT (GET_LANE)
 
 /* { dg-final { scan-assembler-times {sha1c.32\tq[0-9]+, q[0-9]+} 5 } } */
-/* { dg-final { scan-assembler-times {vdup.32\tq[0-9]+, r[0-9]+} 4 } } */
+/* { dg-final { scan-assembler-times {vdup.32\tq[0-9]+, r[0-9]+} 5 } } */
 /* { dg-final { scan-assembler-times {vmov.32\tr[0-9]+, d[0-9]+\[[0-9]+\]+} 3 } } */
diff --git a/gcc/testsuite/gcc.target/arm/crypto-vsha1h_u32.c b/gcc/testsuite/gcc.target/arm/crypto-vsha1h_u32.c
index 33af705c59e..496a6237623 100644
--- a/gcc/testsuite/gcc.target/arm/crypto-vsha1h_u32.c
+++ b/gcc/testsuite/gcc.target/arm/crypto-vsha1h_u32.c
@@ -27,5 +27,5 @@ uint32_t foo (void)
 TEST_SHA1H_VEC_SELECT (GET_LANE)
 
 /* { dg-final { scan-assembler-times {sha1h.32\tq[0-9]+, q[0-9]+} 5 } } */
-/* { dg-final { scan-assembler-times {vdup.32\tq[0-9]+, r[0-9]+} 4 } } */
+/* { dg-final { scan-assembler-times {vdup.32\tq[0-9]+, r[0-9]+} 5 } } */
 /* { dg-final { scan-assembler-times {vmov.32\tr[0-9]+, d[0-9]+\[[0-9]+\]+} 3 } } */
diff --git a/gcc/testsuite/gcc.target/arm/crypto-vsha1mq_u32.c b/gcc/testsuite/gcc.target/arm/crypto-vsha1mq_u32.c
index bdd1c4f3315..6526697698e 100644
--- a/gcc/testsuite/gcc.target/arm/crypto-vsha1mq_u32.c
+++ b/gcc/testsuite/gcc.target/arm/crypto-vsha1mq_u32.c
@@ -31,5 +31,5 @@ uint32_t foo (void)
 TEST_SHA1M_VEC_SELECT (GET_LANE)
 
 /* { dg-final { scan-assembler-times {sha1m.32\tq[0-9]+, q[0-9]+} 5 } } */
-/* { dg-final { scan-assembler-times {vdup.32\tq[0-9]+, r[0-9]+} 4 } } */
+/* { dg-final { scan-assembler-times {vdup.32\tq[0-9]+, r[0-9]+} 5 } } */
 /* { dg-final { scan-assembler-times {vmov.32\tr[0-9]+, d[0-9]+\[[0-9]+\]+} 3 } } */
diff --git a/gcc/testsuite/gcc.target/arm/crypto-vsha1pq_u32.c b/gcc/testsuite/gcc.target/arm/crypto-vsha1pq_u32.c
index d48a07c6fa4..3f857e74a3b 100644
--- a/gcc/testsuite/gcc.target/arm/crypto-vsha1pq_u32.c
+++ b/gcc/testsuite/gcc.target/arm/crypto-vsha1pq_u32.c
@@ -31,5 +31,5 @@ uint32_t foo (void)
 TEST_SHA1P_VEC_SELECT (GET_LANE)
 
 /* { dg-final { scan-assembler-times {sha1p.32\tq[0-9]+, q[0-9]+} 5 } } */
-/* { dg-final { scan-assembler-times {vdup.32\tq[0-9]+, r[0-9]+} 4 } } */
+/* { dg-final { scan-assembler-times {vdup.32\tq[0-9]+, r[0-9]+} 5 } } */
 /* { dg-final { scan-assembler-times {vmov.32\tr[0-9]+, d[0-9]+\[[0-9]+\]+} 3 } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_f16.c
index 2f84d751c53..335e511b17b 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_f16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_f16.c
@@ -39,7 +39,7 @@ foo1 (float16x8_t a, float16_t b)
 }
 
 /*
-**foo2: { xfail *-*-* }
+**foo2:
 ** ...
 ** vcmp.f16 eq, q[0-9]+, (?:ip|fp|r[0-9]+)(?: @.*|)
 ** ...
@@ -56,4 +56,4 @@ foo2 (float16x8_t a)
 }
 #endif
 
-/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_f32.c
index 6cfe7338fce..e5c16be65e3 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_f32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_f32.c
@@ -39,7 +39,7 @@ foo1 (float32x4_t a, float32_t b)
 }
 
 /*
-**foo2: { xfail *-*-* }
+**foo2:
 ** ...
 ** vcmp.f32 eq, q[0-9]+, (?:ip|fp|r[0-9]+)(?: @.*|)
 ** ...
@@ -56,4 +56,4 @@ foo2 (float32x4_t a)
 }
 #endif
 
-/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_n_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_n_f16.c
index 978bd7d4b52..47d54863a85 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_n_f16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_n_f16.c
@@ -39,7 +39,7 @@ foo1 (float16x8_t a, float16_t b)
 }
 
 /*
-**foo2: { xfail *-*-* }
+**foo2:
 ** ...
 ** vcmp.f16 ge, q[0-9]+, (?:ip|fp|r[0-9]+)(?: @.*|)
 ** ...
@@ -56,4 +56,4 @@ foo2 (float16x8_t a)
 }
 #endif
 
-/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_n_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_n_f32.c
index 66b6d8b0056..1b775eaf8a0 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_n_f32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_n_f32.c
@@ -39,7 +39,7 @@ foo1 (float32x4_t a, float32_t b)
 }
 
 /*
-**foo2: { xfail *-*-* }
+**foo2:
 ** ...
 ** vcmp.f32 ge, q[0-9]+, (?:ip|fp|r[0-9]+)(?: @.*|)
 ** ...
@@ -56,4 +56,4 @@ foo2 (float32x4_t a)
 }
 #endif
 
-/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_n_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_n_f16.c
index 9c5f1f2f5c8..89d8e2b9109 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_n_f16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_n_f16.c
@@ -39,7 +39,7 @@ foo1 (float16x8_t a, float16_t b)
 }
 
 /*
-**foo2: { xfail *-*-* }
+**foo2:
 ** ...
 ** vcmp.f16 gt, q[0-9]+, (?:ip|fp|r[0-9]+)(?: @.*|)
 ** ...
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_n_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_n_f32.c index 2723aa7f98f..a5510e852d5 100644 --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_n_f32.c +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_n_f32.c @@ -39,7 +39,7 @@ foo1 (float32x4_t a, float32_t b) } /* -**foo2: { xfail *-*-* } +**foo2: ** ... ** vcmp.f32 gt, q[0-9]+, (?:ip|fp|r[0-9]+)(?: @.*|) ** ... @@ -56,4 +56,4 @@ foo2 (float32x4_t a) } #endif -/* { dg-final { scan-assembler-not "__ARM_undef" } } */ \ No newline at end of file +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_n_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_n_f16.c index 1d1f4bf0e58..c94b3119d59 100644 --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_n_f16.c +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_n_f16.c @@ -39,7 +39,7 @@ foo1 (float16x8_t a, float16_t b) } /* -**foo2: { xfail *-*-* } +**foo2: ** ... ** vcmp.f16 le, q[0-9]+, (?:ip|fp|r[0-9]+)(?: @.*|) ** ... @@ -56,4 +56,4 @@ foo2 (float16x8_t a) } #endif -/* { dg-final { scan-assembler-not "__ARM_undef" } } */ \ No newline at end of file +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_n_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_n_f32.c index bf77a808064..80e2cfa1079 100644 --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_n_f32.c +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_n_f32.c @@ -39,7 +39,7 @@ foo1 (float32x4_t a, float32_t b) } /* -**foo2: { xfail *-*-* } +**foo2: ** ... ** vcmp.f32 le, q[0-9]+, (?:ip|fp|r[0-9]+)(?: @.*|) ** ... 
@@ -56,4 +56,4 @@ foo2 (float32x4_t a) } #endif -/* { dg-final { scan-assembler-not "__ARM_undef" } } */ \ No newline at end of file +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_n_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_n_f16.c index f9f091cd9b3..c3a106455cb 100644 --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_n_f16.c +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_n_f16.c @@ -39,7 +39,7 @@ foo1 (float16x8_t a, float16_t b) } /* -**foo2: { xfail *-*-* } +**foo2: ** ... ** vcmp.f16 lt, q[0-9]+, (?:ip|fp|r[0-9]+)(?: @.*|) ** ... @@ -56,4 +56,4 @@ foo2 (float16x8_t a) } #endif -/* { dg-final { scan-assembler-not "__ARM_undef" } } */ \ No newline at end of file +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_n_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_n_f32.c index d22ea1aca30..b485f75e769 100644 --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_n_f32.c +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_n_f32.c @@ -39,7 +39,7 @@ foo1 (float32x4_t a, float32_t b) } /* -**foo2: { xfail *-*-* } +**foo2: ** ... ** vcmp.f32 lt, q[0-9]+, (?:ip|fp|r[0-9]+)(?: @.*|) ** ... @@ -56,4 +56,4 @@ foo2 (float32x4_t a) } #endif -/* { dg-final { scan-assembler-not "__ARM_undef" } } */ \ No newline at end of file +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_f16.c index 83beca964d6..1156caafda1 100644 --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_f16.c +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_f16.c @@ -39,7 +39,7 @@ foo1 (float16x8_t a, float16_t b) } /* -**foo2: { xfail *-*-* } +**foo2: ** ... ** vcmp.f16 ne, q[0-9]+, (?:ip|fp|r[0-9]+)(?: @.*|) ** ... 
@@ -56,4 +56,4 @@ foo2 (float16x8_t a) } #endif -/* { dg-final { scan-assembler-not "__ARM_undef" } } */ \ No newline at end of file +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_f32.c index abe1abfed2a..c3ffbd1335f 100644 --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_f32.c +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_f32.c @@ -39,7 +39,7 @@ foo1 (float32x4_t a, float32_t b) } /* -**foo2: { xfail *-*-* } +**foo2: ** ... ** vcmp.f32 ne, q[0-9]+, (?:ip|fp|r[0-9]+)(?: @.*|) ** ... @@ -56,4 +56,4 @@ foo2 (float32x4_t a) } #endif -/* { dg-final { scan-assembler-not "__ARM_undef" } } */ \ No newline at end of file +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_m_n_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_m_n_s16.c index bf05c73fc1d..dbbf8540681 100644 --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_m_n_s16.c +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_m_n_s16.c @@ -42,8 +42,24 @@ foo1 (int16x8_t inactive, int16_t a, mve_pred16_t p) return vdupq_m (inactive, a, p); } +/* +**foo2: +** ... +** vmsr p0, (?:ip|fp|r[0-9]+)(?: @.*|) +** ... +** vpst(?: @.*|) +** ... +** vdupt.16 q[0-9]+, (?:ip|fp|r[0-9]+)(?: @.*|) +** ... 
+*/ +int16x8_t +foo2 (int16x8_t inactive, mve_pred16_t p) +{ + return vdupq_m (inactive, 1, p); +} + #ifdef __cplusplus } #endif -/* { dg-final { scan-assembler-not "__ARM_undef" } } */ \ No newline at end of file +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_m_n_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_m_n_s32.c index 71789bb620e..613b5d30fb3 100644 --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_m_n_s32.c +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_m_n_s32.c @@ -42,8 +42,24 @@ foo1 (int32x4_t inactive, int32_t a, mve_pred16_t p) return vdupq_m (inactive, a, p); } +/* +**foo2: +** ... +** vmsr p0, (?:ip|fp|r[0-9]+)(?: @.*|) +** ... +** vpst(?: @.*|) +** ... +** vdupt.32 q[0-9]+, (?:ip|fp|r[0-9]+)(?: @.*|) +** ... +*/ +int32x4_t +foo2 (int32x4_t inactive, mve_pred16_t p) +{ + return vdupq_m (inactive, 1, p); +} + #ifdef __cplusplus } #endif -/* { dg-final { scan-assembler-not "__ARM_undef" } } */ \ No newline at end of file +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_m_n_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_m_n_s8.c index 48c4fbd1f82..a1ff48e94e5 100644 --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_m_n_s8.c +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_m_n_s8.c @@ -42,8 +42,24 @@ foo1 (int8x16_t inactive, int8_t a, mve_pred16_t p) return vdupq_m (inactive, a, p); } +/* +**foo2: +** ... +** vmsr p0, (?:ip|fp|r[0-9]+)(?: @.*|) +** ... +** vpst(?: @.*|) +** ... +** vdupt.8 q[0-9]+, (?:ip|fp|r[0-9]+)(?: @.*|) +** ... 
+*/ +int8x16_t +foo2 (int8x16_t inactive, mve_pred16_t p) +{ + return vdupq_m (inactive, 1, p); +} + #ifdef __cplusplus } #endif -/* { dg-final { scan-assembler-not "__ARM_undef" } } */ \ No newline at end of file +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_f16.c index 44112190fb8..f9aae2fd120 100644 --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_f16.c +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_f16.c @@ -24,6 +24,7 @@ foo (float16_t a) /* **foo1: ** ... +** movw r[0-9]+, #15462 ** vdup.16 q[0-9]+, (?:ip|fp|r[0-9]+)(?: @.*|) ** ... */ @@ -37,4 +38,4 @@ foo1 () } #endif -/* { dg-final { scan-assembler-not "__ARM_undef" } } */ \ No newline at end of file +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_f32-2.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_f32-2.c new file mode 100644 index 00000000000..a4b0022cdfc --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_f32-2.c @@ -0,0 +1,29 @@ +/* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */ +/* { dg-add-options arm_v8_1m_mve_fp } */ +/* { dg-additional-options "-O2" } */ +/* { dg-final { check-function-bodies "**" "" } } */ + +#include "arm_mve.h" + +#ifdef __cplusplus +extern "C" { +#endif + + /* Test with a constant that fits in vmov. */ +/* +**foo1: +** ... +** vmov.f32 q[0-9]+, #0.0 .* +** ... 
+*/ +float32x4_t +foo1 () +{ + return vdupq_n_f32 (0); +} + +#ifdef __cplusplus +} +#endif + +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_f32.c index 059e3e42dd0..afa78129291 100644 --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_f32.c +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_f32.c @@ -24,7 +24,8 @@ foo (float32_t a) /* **foo1: ** ... -** vdup.32 q[0-9]+, (?:ip|fp|r[0-9]+)(?: @.*|) +** ldr r[0-9]+, .L.* +** vdup.32 q0, r[0-9]+ ** ... */ float32x4_t @@ -37,4 +38,4 @@ foo1 () } #endif -/* { dg-final { scan-assembler-not "__ARM_undef" } } */ \ No newline at end of file +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_s16-2.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_s16-2.c new file mode 100644 index 00000000000..69536ef2bf6 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_s16-2.c @@ -0,0 +1,30 @@ +/* { dg-require-effective-target arm_v8_1m_mve_ok } */ +/* { dg-add-options arm_v8_1m_mve } */ +/* { dg-additional-options "-O2" } */ +/* { dg-final { check-function-bodies "**" "" } } */ + +#include "arm_mve.h" + +#ifdef __cplusplus +extern "C" { +#endif + + /* Test with a constant that does not fit in vmov. */ +/* +**foo1: +** ... +** mov r[0-9]+, #1000 +** vdup.16 q0, r[0-9]+ +** ... +*/ +int16x8_t +foo1 () +{ + return vdupq_n_s16 (1000); +} + +#ifdef __cplusplus +} +#endif + +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_s16.c index d8ba299cb15..f2746075a3b 100644 --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_s16.c +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_s16.c @@ -21,8 +21,20 @@ foo (int16_t a) return vdupq_n_s16 (a); } +/* +**foo1: +** ... 
+** vmov.i16 q[0-9]+, (#0x1) (?:@.*|) +** ... +*/ +int16x8_t +foo1 () +{ + return vdupq_n_s16 (1); +} + #ifdef __cplusplus } #endif -/* { dg-final { scan-assembler-not "__ARM_undef" } } */ \ No newline at end of file +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_s32-2.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_s32-2.c new file mode 100644 index 00000000000..da8adeeeb7b --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_s32-2.c @@ -0,0 +1,30 @@ +/* { dg-require-effective-target arm_v8_1m_mve_ok } */ +/* { dg-add-options arm_v8_1m_mve } */ +/* { dg-additional-options "-O2" } */ +/* { dg-final { check-function-bodies "**" "" } } */ + +#include "arm_mve.h" + +#ifdef __cplusplus +extern "C" { +#endif + + /* Test with a constant that does not fit in vmov. */ +/* +**foo1: +** ... +** mov r[0-9]+, #1000 +** vdup.32 q0, r[0-9]+ +** ... +*/ +int32x4_t +foo1 () +{ + return vdupq_n_s32 (1000); +} + +#ifdef __cplusplus +} +#endif + +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_s32.c index a81c6d1e220..7f75eca2ad2 100644 --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_s32.c +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_s32.c @@ -21,8 +21,20 @@ foo (int32_t a) return vdupq_n_s32 (a); } +/* +**foo1: +** ... +** vmov.i32 q[0-9]+, (#0x1) (?:@.*|) +** ... 
+*/ +int32x4_t +foo1 () +{ + return vdupq_n_s32 (1); +} + #ifdef __cplusplus } #endif -/* { dg-final { scan-assembler-not "__ARM_undef" } } */ \ No newline at end of file +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_s8.c index b0bac4fce89..454ff5abac2 100644 --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_s8.c +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_s8.c @@ -21,8 +21,20 @@ foo (int8_t a) return vdupq_n_s8 (a); } +/* +**foo1: +** ... +** vmov.i8 q[0-9]+, (#0x1) (?:@.*|) +** ... +*/ +int8x16_t +foo1 () +{ + return vdupq_n_s8 (1); +} + #ifdef __cplusplus } #endif -/* { dg-final { scan-assembler-not "__ARM_undef" } } */ \ No newline at end of file +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_u16-2.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_u16-2.c new file mode 100644 index 00000000000..510e7e7387f --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_u16-2.c @@ -0,0 +1,30 @@ +/* { dg-require-effective-target arm_v8_1m_mve_ok } */ +/* { dg-add-options arm_v8_1m_mve } */ +/* { dg-additional-options "-O2" } */ +/* { dg-final { check-function-bodies "**" "" } } */ + +#include "arm_mve.h" + +#ifdef __cplusplus +extern "C" { +#endif + + /* Test with a constant that does not fit in vmov. */ +/* +**foo1: +** ... +** mov r[0-9]+, #1000 +** vdup.16 q0, r[0-9]+ +** ... 
+*/ +uint16x8_t +foo1 () +{ + return vdupq_n_u16 (1000); +} + +#ifdef __cplusplus +} +#endif + +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_u16.c index 55e0a601110..4accb6480dd 100644 --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_u16.c +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_u16.c @@ -24,7 +24,7 @@ foo (uint16_t a) /* **foo1: ** ... -** vdup.16 q[0-9]+, (?:ip|fp|r[0-9]+)(?: @.*|) +** vmov.i16 q[0-9]+, (#0x1) (?:@.*|) ** ... */ uint16x8_t @@ -37,4 +37,4 @@ foo1 () } #endif -/* { dg-final { scan-assembler-not "__ARM_undef" } } */ \ No newline at end of file +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_u32-2.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_u32-2.c new file mode 100644 index 00000000000..c97cea186ef --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_u32-2.c @@ -0,0 +1,30 @@ +/* { dg-require-effective-target arm_v8_1m_mve_ok } */ +/* { dg-add-options arm_v8_1m_mve } */ +/* { dg-additional-options "-O2" } */ +/* { dg-final { check-function-bodies "**" "" } } */ + +#include "arm_mve.h" + +#ifdef __cplusplus +extern "C" { +#endif + + /* Test with a constant that does not fit in vmov. */ +/* +**foo1: +** ... +** mov r[0-9]+, #1000 +** vdup.32 q0, r[0-9]+ +** ... +*/ +uint32x4_t +foo1 () +{ + return vdupq_n_u32 (1000); +} + +#ifdef __cplusplus +} +#endif + +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_u32.c index bf73bc17fc7..d08a94c7a16 100644 --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_u32.c +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_u32.c @@ -24,7 +24,7 @@ foo (uint32_t a) /* **foo1: ** ... 
-** vdup.32 q[0-9]+, (?:ip|fp|r[0-9]+)(?: @.*|) +** vmov.i32 q[0-9]+, (#0x1) (?:@.*|) ** ... */ uint32x4_t @@ -37,4 +37,4 @@ foo1 () } #endif -/* { dg-final { scan-assembler-not "__ARM_undef" } } */ \ No newline at end of file +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_u8.c index 48cbdb2a1da..f1fcd4acaa3 100644 --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_u8.c +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_u8.c @@ -24,7 +24,7 @@ foo (uint8_t a) /* **foo1: ** ... -** vdup.8 q[0-9]+, (?:ip|fp|r[0-9]+)(?: @.*|) +** vmov.i8 q[0-9]+, (#0x1) (?:@.*|) ** ... */ uint8x16_t @@ -37,4 +37,4 @@ foo1 () } #endif -/* { dg-final { scan-assembler-not "__ARM_undef" } } */ \ No newline at end of file +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_x_n_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_x_n_s16.c index 6756502ab21..9dcfe4e0376 100644 --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_x_n_s16.c +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_x_n_s16.c @@ -25,8 +25,24 @@ foo (int16_t a, mve_pred16_t p) return vdupq_x_n_s16 (a, p); } +/* +**foo1: +** ... +** vmsr p0, (?:ip|fp|r[0-9]+)(?: @.*|) +** ... +** vpst(?: @.*|) +** ... +** vdupt.16 q[0-9]+, (?:ip|fp|r[0-9]+)(?: @.*|) +** ... 
+*/ +int16x8_t +foo1 (mve_pred16_t p) +{ + return vdupq_x_n_s16 (1, p); +} + #ifdef __cplusplus } #endif -/* { dg-final { scan-assembler-not "__ARM_undef" } } */ \ No newline at end of file +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_x_n_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_x_n_s32.c index b04afb3834b..eacdb2e454f 100644 --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_x_n_s32.c +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_x_n_s32.c @@ -25,8 +25,24 @@ foo (int32_t a, mve_pred16_t p) return vdupq_x_n_s32 (a, p); } +/* +**foo1: +** ... +** vmsr p0, (?:ip|fp|r[0-9]+)(?: @.*|) +** ... +** vpst(?: @.*|) +** ... +** vdupt.32 q[0-9]+, (?:ip|fp|r[0-9]+)(?: @.*|) +** ... +*/ +int32x4_t +foo1 (mve_pred16_t p) +{ + return vdupq_x_n_s32 (1, p); +} + #ifdef __cplusplus } #endif -/* { dg-final { scan-assembler-not "__ARM_undef" } } */ \ No newline at end of file +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_x_n_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_x_n_s8.c index b23facd5e94..8951f7475f5 100644 --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_x_n_s8.c +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_x_n_s8.c @@ -25,8 +25,24 @@ foo (int8_t a, mve_pred16_t p) return vdupq_x_n_s8 (a, p); } +/* +**foo1: +** ... +** vmsr p0, (?:ip|fp|r[0-9]+)(?: @.*|) +** ... +** vpst(?: @.*|) +** ... +** vdupt.8 q[0-9]+, (?:ip|fp|r[0-9]+)(?: @.*|) +** ... +*/ +int8x16_t +foo1 (mve_pred16_t p) +{ + return vdupq_x_n_s8 (1, p); +} + #ifdef __cplusplus } #endif -/* { dg-final { scan-assembler-not "__ARM_undef" } } */ \ No newline at end of file +/* { dg-final { scan-assembler-not "__ARM_undef" } } */