From patchwork Mon Sep 16 09:38:19 2024
X-Patchwork-Submitter: Christophe Lyon
X-Patchwork-Id: 1986064
X-Original-To: gcc-patches@gcc.gnu.org
From: Christophe Lyon
CC: Alfie Richards
Subject: [PATCH 5/5] arm: [MVE intrinsics] Rework MVE vld/vst intrinsics
Date: Mon, 16 Sep 2024 11:38:19 +0200
Message-ID: <20240916093819.12740-6-christophe.lyon@arm.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20240916093819.12740-1-christophe.lyon@arm.com>
References: <20240916093819.12740-1-christophe.lyon@arm.com>
MIME-Version: 1.0
List-Id: Gcc-patches mailing list

From: Alfie Richards

Implement the MVE vld and vst intrinsics using the MVE builtins
framework.

The main part of the patch is to reimplement the vstr/vldr patterns
such that we now have much fewer of them:
- non-truncating stores
- predicated non-truncating stores
- truncating stores
- predicated truncating stores
- non-extending loads
- predicated non-extending loads
- extending loads
- predicated extending loads

This enables us to update the implementation of vld1/vst1 and use the
new vldr/vstr builtins.  The patch also adds support for the
predicated vld1/vst1 versions.

2024-09-11  Alfie Richards
	    Christophe Lyon

gcc/
	* config/arm/arm-mve-builtins-base.cc (vld1q_impl): Add support
	for predicated version.
	(vst1q_impl): Likewise.
	(vstrq_impl): New class.
	(vldrq_impl): New class.
	(vldrbq): New.
	(vldrhq): New.
	(vldrwq): New.
	(vstrbq): New.
	(vstrhq): New.
	(vstrwq): New.
	* config/arm/arm-mve-builtins-base.def (vld1q): Add predicated
	version.
	(vldrbq): New.
	(vldrhq): New.
	(vldrwq): New.
	(vst1q): Add predicated version.
	(vstrbq): New.
	(vstrhq): New.
	(vstrwq): New.
	(vrev32q): Update types to float_16.
	* config/arm/arm-mve-builtins-base.h (vldrbq): New.
	(vldrhq): New.
	(vldrwq): New.
	(vstrbq): New.
	(vstrhq): New.
	(vstrwq): New.
	* config/arm/arm-mve-builtins-functions.h (memory_vector_mode):
	Remove conversion of floating point vectors to integer.
	* config/arm/arm-mve-builtins.cc (TYPES_float16): Change to...
	(TYPES_float_16): ...this.
	(TYPES_float_32): New.
	(float16): Change to...
	(float_16): ...this.
	(float_32): New.
	(preds_z_or_none): New.
	(function_resolver::check_gp_argument): Add support for _z
	predicate.
	* config/arm/arm_mve.h (vstrbq): Remove.
	(vstrbq_p): Likewise.
	(vstrhq): Likewise.
	(vstrhq_p): Likewise.
	(vstrwq): Likewise.
	(vstrwq_p): Likewise.
	(vst1q_p): Likewise.
	(vld1q_z): Likewise.
	(vldrbq_s8): Likewise.
	(vldrbq_u8): Likewise.
	(vldrbq_s16): Likewise.
	(vldrbq_u16): Likewise.
	(vldrbq_s32): Likewise.
	(vldrbq_u32): Likewise.
	(vstrbq_p_s8): Likewise.
	(vstrbq_p_s32): Likewise.
	(vstrbq_p_s16): Likewise.
	(vstrbq_p_u8): Likewise.
	(vstrbq_p_u32): Likewise.
	(vstrbq_p_u16): Likewise.
	(vldrbq_z_s16): Likewise.
	(vldrbq_z_u8): Likewise.
	(vldrbq_z_s8): Likewise.
	(vldrbq_z_s32): Likewise.
	(vldrbq_z_u16): Likewise.
	(vldrbq_z_u32): Likewise.
	(vldrhq_s32): Likewise.
	(vldrhq_s16): Likewise.
	(vldrhq_u32): Likewise.
	(vldrhq_u16): Likewise.
	(vldrhq_z_s32): Likewise.
	(vldrhq_z_s16): Likewise.
	(vldrhq_z_u32): Likewise.
	(vldrhq_z_u16): Likewise.
	(vldrwq_s32): Likewise.
	(vldrwq_u32): Likewise.
	(vldrwq_z_s32): Likewise.
	(vldrwq_z_u32): Likewise.
	(vldrhq_f16): Likewise.
	(vldrhq_z_f16): Likewise.
	(vldrwq_f32): Likewise.
	(vldrwq_z_f32): Likewise.
	(vstrhq_f16): Likewise.
	(vstrhq_s32): Likewise.
	(vstrhq_s16): Likewise.
	(vstrhq_u32): Likewise.
	(vstrhq_u16): Likewise.
	(vstrhq_p_f16): Likewise.
	(vstrhq_p_s32): Likewise.
	(vstrhq_p_s16): Likewise.
	(vstrhq_p_u32): Likewise.
	(vstrhq_p_u16): Likewise.
	(vstrwq_f32): Likewise.
	(vstrwq_s32): Likewise.
	(vstrwq_u32): Likewise.
	(vstrwq_p_f32): Likewise.
	(vstrwq_p_s32): Likewise.
	(vstrwq_p_u32): Likewise.
	(vst1q_p_u8): Likewise.
	(vst1q_p_s8): Likewise.
	(vld1q_z_u8): Likewise.
	(vld1q_z_s8): Likewise.
	(vst1q_p_u16): Likewise.
	(vst1q_p_s16): Likewise.
	(vld1q_z_u16): Likewise.
	(vld1q_z_s16): Likewise.
	(vst1q_p_u32): Likewise.
	(vst1q_p_s32): Likewise.
	(vld1q_z_u32): Likewise.
	(vld1q_z_s32): Likewise.
	(vld1q_z_f16): Likewise.
	(vst1q_p_f16): Likewise.
	(vld1q_z_f32): Likewise.
	(vst1q_p_f32): Likewise.
	(__arm_vstrbq_s8): Likewise.
	(__arm_vstrbq_s32): Likewise.
	(__arm_vstrbq_s16): Likewise.
	(__arm_vstrbq_u8): Likewise.
	(__arm_vstrbq_u32): Likewise.
	(__arm_vstrbq_u16): Likewise.
	(__arm_vldrbq_s8): Likewise.
	(__arm_vldrbq_u8): Likewise.
	(__arm_vldrbq_s16): Likewise.
	(__arm_vldrbq_u16): Likewise.
	(__arm_vldrbq_s32): Likewise.
	(__arm_vldrbq_u32): Likewise.
	(__arm_vstrbq_p_s8): Likewise.
	(__arm_vstrbq_p_s32): Likewise.
	(__arm_vstrbq_p_s16): Likewise.
	(__arm_vstrbq_p_u8): Likewise.
	(__arm_vstrbq_p_u32): Likewise.
	(__arm_vstrbq_p_u16): Likewise.
	(__arm_vldrbq_z_s8): Likewise.
	(__arm_vldrbq_z_s32): Likewise.
	(__arm_vldrbq_z_s16): Likewise.
	(__arm_vldrbq_z_u8): Likewise.
	(__arm_vldrbq_z_u32): Likewise.
	(__arm_vldrbq_z_u16): Likewise.
	(__arm_vldrhq_s32): Likewise.
	(__arm_vldrhq_s16): Likewise.
	(__arm_vldrhq_u32): Likewise.
	(__arm_vldrhq_u16): Likewise.
	(__arm_vldrhq_z_s32): Likewise.
	(__arm_vldrhq_z_s16): Likewise.
	(__arm_vldrhq_z_u32): Likewise.
	(__arm_vldrhq_z_u16): Likewise.
	(__arm_vldrwq_s32): Likewise.
	(__arm_vldrwq_u32): Likewise.
	(__arm_vldrwq_z_s32): Likewise.
	(__arm_vldrwq_z_u32): Likewise.
	(__arm_vstrhq_s32): Likewise.
	(__arm_vstrhq_s16): Likewise.
	(__arm_vstrhq_u32): Likewise.
	(__arm_vstrhq_u16): Likewise.
	(__arm_vstrhq_p_s32): Likewise.
	(__arm_vstrhq_p_s16): Likewise.
	(__arm_vstrhq_p_u32): Likewise.
	(__arm_vstrhq_p_u16): Likewise.
	(__arm_vstrwq_s32): Likewise.
	(__arm_vstrwq_u32): Likewise.
	(__arm_vstrwq_p_s32): Likewise.
	(__arm_vstrwq_p_u32): Likewise.
	(__arm_vst1q_p_u8): Likewise.
	(__arm_vst1q_p_s8): Likewise.
	(__arm_vld1q_z_u8): Likewise.
	(__arm_vld1q_z_s8): Likewise.
	(__arm_vst1q_p_u16): Likewise.
	(__arm_vst1q_p_s16): Likewise.
	(__arm_vld1q_z_u16): Likewise.
	(__arm_vld1q_z_s16): Likewise.
	(__arm_vst1q_p_u32): Likewise.
	(__arm_vst1q_p_s32): Likewise.
	(__arm_vld1q_z_u32): Likewise.
	(__arm_vld1q_z_s32): Likewise.
	(__arm_vldrwq_f32): Likewise.
	(__arm_vldrwq_z_f32): Likewise.
	(__arm_vldrhq_z_f16): Likewise.
	(__arm_vldrhq_f16): Likewise.
	(__arm_vstrwq_p_f32): Likewise.
	(__arm_vstrwq_f32): Likewise.
	(__arm_vstrhq_f16): Likewise.
	(__arm_vstrhq_p_f16): Likewise.
	(__arm_vld1q_z_f16): Likewise.
	(__arm_vst1q_p_f16): Likewise.
	(__arm_vld1q_z_f32): Likewise.
	(__arm_vst2q_f32): Likewise.
	(__arm_vst1q_p_f32): Likewise.
	(__arm_vstrbq): Likewise.
	(__arm_vstrbq_p): Likewise.
	(__arm_vstrhq): Likewise.
	(__arm_vstrhq_p): Likewise.
	(__arm_vstrwq): Likewise.
	(__arm_vstrwq_p): Likewise.
	(__arm_vst1q_p): Likewise.
	(__arm_vld1q_z): Likewise.
	* config/arm/arm_mve_builtins.def: (vstrbq_s): Delete.
	(vstrbq_u): Likewise.
	(vldrbq_s): Likewise.
	(vldrbq_u): Likewise.
	(vstrbq_p_s): Likewise.
	(vstrbq_p_u): Likewise.
	(vldrbq_z_s): Likewise.
	(vldrbq_z_u): Likewise.
	(vld1q_u): Likewise.
	(vld1q_s): Likewise.
	(vldrhq_z_u): Likewise.
	(vldrhq_u): Likewise.
	(vldrhq_z_s): Likewise.
	(vldrhq_s): Likewise.
	(vld1q_f): Likewise.
	(vldrhq_f): Likewise.
	(vldrhq_z_f): Likewise.
	(vldrwq_f): Likewise.
	(vldrwq_s): Likewise.
	(vldrwq_u): Likewise.
	(vldrwq_z_f): Likewise.
	(vldrwq_z_s): Likewise.
	(vldrwq_z_u): Likewise.
	(vst1q_u): Likewise.
	(vst1q_s): Likewise.
	(vstrhq_p_u): Likewise.
	(vstrhq_u): Likewise.
	(vstrhq_p_s): Likewise.
	(vstrhq_s): Likewise.
	(vst1q_f): Likewise.
	(vstrhq_f): Likewise.
	(vstrhq_p_f): Likewise.
	(vstrwq_f): Likewise.
	(vstrwq_s): Likewise.
	(vstrwq_u): Likewise.
	(vstrwq_p_f): Likewise.
	(vstrwq_p_s): Likewise.
	(vstrwq_p_u): Likewise.
	* config/arm/iterators.md (MVE_w_narrow_TYPE): New iterator.
	(MVE_w_narrow_type): New iterator.
	(MVE_wide_n_TYPE): New attribute.
	(MVE_wide_n_type): New attribute.
	(MVE_wide_n_sz_elem): New attribute.
	(MVE_wide_n_VPRED): New attribute.
	(MVE_elem_ch): New attribute.
	(supf): Remove VSTRBQ_S, VSTRBQ_U, VLDRBQ_S, VLDRBQ_U, VLD1Q_S,
	VLD1Q_U, VLDRHQ_S, VLDRHQ_U, VLDRWQ_S, VLDRWQ_U, VST1Q_S,
	VST1Q_U, VSTRHQ_S, VSTRHQ_U, VSTRWQ_S, VSTRWQ_U.
	(VSTRBQ, VLDRBQ, VLD1Q, VLDRHQ, VLDRWQ, VST1Q, VSTRHQ, VSTRWQ):
	Delete.
	* config/arm/mve.md (mve_vstrbq_): Remove.
	(mve_vldrbq_): Likewise.
	(mve_vstrbq_p_): Likewise.
	(mve_vldrbq_z_): Likewise.
	(mve_vldrhq_fv8hf): Likewise.
	(mve_vldrhq_): Likewise.
	(mve_vldrhq_z_fv8hf): Likewise.
	(mve_vldrhq_z_): Likewise.
	(mve_vldrwq_fv4sf): Likewise.
	(mve_vldrwq_v4si): Likewise.
	(mve_vldrwq_z_fv4sf): Likewise.
	(mve_vldrwq_z_v4si): Likewise.
	(@mve_vld1q_f): Likewise.
	(@mve_vld1q_): Likewise.
	(mve_vstrhq_fv8hf): Likewise.
	(mve_vstrhq_p_fv8hf): Likewise.
	(mve_vstrhq_p_): Likewise.
	(mve_vstrhq_): Likewise.
	(mve_vstrwq_fv4sf): Likewise.
	(mve_vstrwq_p_fv4sf): Likewise.
	(mve_vstrwq_p_v4si): Likewise.
	(mve_vstrwq_v4si): Likewise.
	(@mve_vst1q_f): Likewise.
	(@mve_vst1q_): Likewise.
	(@mve_vstrq_): New.
	(@mve_vstrq_p_): New.
	(@mve_vstrq_truncate_): New.
	(@mve_vstrq_p_truncate_): New.
	(@mve_vldrq_): New.
	(@mve_vldrq_z_): New.
	(@mve_vldrq_extend_): New.
	(@mve_vldrq_z_extend_): New.
	* config/arm/unspecs.md: (VSTRBQ_S): Remove.
	(VSTRBQ_U): Likewise.
	(VLDRBQ_S): Likewise.
	(VLDRBQ_U): Likewise.
	(VLD1Q_F): Likewise.
	(VLD1Q_S): Likewise.
	(VLD1Q_U): Likewise.
	(VLDRHQ_F): Likewise.
	(VLDRHQ_U): Likewise.
	(VLDRHQ_S): Likewise.
	(VLDRWQ_F): Likewise.
	(VLDRWQ_S): Likewise.
	(VLDRWQ_U): Likewise.
	(VSTRHQ_F): Likewise.
	(VST1Q_S): Likewise.
	(VST1Q_U): Likewise.
	(VSTRHQ_U): Likewise.
	(VSTRWQ_S): Likewise.
	(VSTRWQ_U): Likewise.
	(VSTRWQ_F): Likewise.
	(VST1Q_F): Likewise.
	(VLDRQ): New.
	(VLDRQ_Z): Likewise.
	(VLDRQ_EXT): Likewise.
	(VLDRQ_EXT_Z): Likewise.
	(VSTRQ): Likewise.
	(VSTRQ_P): Likewise.
	(VSTRQ_TRUNC): Likewise.
	(VSTRQ_TRUNC_P): Likewise.
---
 gcc/config/arm/arm-mve-builtins-base.cc     | 135 ++-
 gcc/config/arm/arm-mve-builtins-base.def    |  20 +-
 gcc/config/arm/arm-mve-builtins-base.h      |   6 +
 gcc/config/arm/arm-mve-builtins-functions.h |  13 -
 gcc/config/arm/arm-mve-builtins.cc          |  15 +-
 gcc/config/arm/arm_mve.h                    | 978 +-------------------
 gcc/config/arm/arm_mve_builtins.def         |  38 -
 gcc/config/arm/iterators.md                 |  37 +-
 gcc/config/arm/mve.md                       | 662 ++++---------
 gcc/config/arm/unspecs.md                   |  29 +-
 10 files changed, 379 insertions(+), 1554 deletions(-)

diff --git a/gcc/config/arm/arm-mve-builtins-base.cc b/gcc/config/arm/arm-mve-builtins-base.cc
index e0ae593a6c0..9ca1bc4258a 100644
--- a/gcc/config/arm/arm-mve-builtins-base.cc
+++ b/gcc/config/arm/arm-mve-builtins-base.cc
@@ -96,16 +96,18 @@ public:
   expand (function_expander &e) const override
   {
     insn_code icode;
-    if (e.type_suffix (0).float_p)
-      icode = code_for_mve_vld1q_f(e.vector_mode (0));
-    else
+    switch (e.pred)
       {
-	if (e.type_suffix (0).unsigned_p)
-	  icode = code_for_mve_vld1q(VLD1Q_U,
-				     e.vector_mode (0));
-	else
-	  icode = code_for_mve_vld1q(VLD1Q_S,
-				     e.vector_mode (0));
+      case PRED_none:
+	icode = code_for_mve_vldrq (e.vector_mode (0));
+	break;
+
+      case PRED_z:
+	icode = code_for_mve_vldrq_z (e.vector_mode (0));
+	break;
+
+      default:
+	gcc_unreachable ();
       }
     return e.use_contiguous_load_insn (icode);
   }
@@ -124,21 +126,112 @@ public:
   expand (function_expander &e) const override
   {
     insn_code icode;
-    if (e.type_suffix (0).float_p)
-      icode = code_for_mve_vst1q_f(e.vector_mode (0));
-    else
+    switch (e.pred)
+      {
+      case PRED_none:
+	icode = code_for_mve_vstrq (e.vector_mode (0));
+	break;
+
+      case PRED_p:
+	icode = code_for_mve_vstrq_p (e.vector_mode (0));
+	break;
+
+      default:
+	gcc_unreachable ();
+      }
+    return e.use_contiguous_store_insn (icode);
+  }
+};
+
+/* Builds the vstrq* intrinsics.  */
+class vstrq_impl : public store_truncating
+{
+public:
+  using store_truncating::store_truncating;
+
+  unsigned int call_properties (const function_instance &) const override
+  {
+    return CP_WRITE_MEMORY;
+  }
+
+  rtx expand (function_expander &e) const override
+  {
+    insn_code icode;
+    switch (e.pred)
       {
-	if (e.type_suffix (0).unsigned_p)
-	  icode = code_for_mve_vst1q(VST1Q_U,
-				     e.vector_mode (0));
+      case PRED_none:
+	if (e.vector_mode (0) == e.memory_vector_mode ())
+	  /* Non-truncating store case.  */
+	  icode = code_for_mve_vstrq (e.vector_mode (0));
 	else
-	  icode = code_for_mve_vst1q(VST1Q_S,
-				     e.vector_mode (0));
+	  /* Truncating store case.
+	     (there is only one possible truncation for each memory mode so
+	     only one mode argument is needed).  */
+	  icode = code_for_mve_vstrq_truncate (e.memory_vector_mode ());
+	break;
+
+      case PRED_p:
+	if (e.vector_mode (0) == e.memory_vector_mode ())
+	  icode = code_for_mve_vstrq_p (e.vector_mode (0));
+	else
+	  icode = code_for_mve_vstrq_p_truncate (e.memory_vector_mode ());
+	break;
+
+      default:
+	gcc_unreachable ();
       }
     return e.use_contiguous_store_insn (icode);
   }
 };
 
+/* Builds the vldrq* intrinsics.  */
+class vldrq_impl : public load_extending
+{
+public:
+  using load_extending::load_extending;
+
+  unsigned int call_properties (const function_instance &) const override
+  {
+    return CP_READ_MEMORY;
+  }
+
+  rtx expand (function_expander &e) const override
+  {
+    insn_code icode;
+    switch (e.pred)
+      {
+      case PRED_none:
+	if (e.vector_mode (0) == e.memory_vector_mode ())
+	  /* Non-extending load case.  */
+	  icode = code_for_mve_vldrq (e.vector_mode (0));
+	else
+	  /* Extending load case.
+	     (there is only one extension for each memory mode so only one
+	     type argument is needed).  */
+	  icode = code_for_mve_vldrq_extend (e.memory_vector_mode (),
+					     e.type_suffix (0).unsigned_p
+					     ? ZERO_EXTEND
+					     : SIGN_EXTEND);
+	break;
+
+      case PRED_z:
+	if (e.vector_mode (0) == e.memory_vector_mode ())
+	  icode = code_for_mve_vldrq_z (e.vector_mode (0));
+	else
+	  icode = code_for_mve_vldrq_z_extend (e.memory_vector_mode (),
+					       e.type_suffix (0).unsigned_p
+					       ? ZERO_EXTEND
+					       : SIGN_EXTEND);
+	break;
+
+      default:
+	gcc_unreachable ();
+      }
+
+    return e.use_contiguous_load_insn (icode);
+  }
+};
+
 } /* end anonymous namespace */
 
 namespace arm_mve {
@@ -347,6 +440,11 @@ FUNCTION (vfmsq, unspec_mve_function_exact_insn, (-1, -1, VFMSQ_F, -1, -1, -1, -
 FUNCTION_WITH_M_N_NO_F (vhaddq, VHADDQ)
 FUNCTION_WITH_M_N_NO_F (vhsubq, VHSUBQ)
 FUNCTION (vld1q, vld1_impl,)
+FUNCTION (vldrbq, vldrq_impl, (TYPE_SUFFIX_s8, TYPE_SUFFIX_u8))
+FUNCTION (vldrhq, vldrq_impl,
+	  (TYPE_SUFFIX_s16, TYPE_SUFFIX_u16, TYPE_SUFFIX_f16))
+FUNCTION (vldrwq, vldrq_impl,
+	  (TYPE_SUFFIX_s32, TYPE_SUFFIX_u32, TYPE_SUFFIX_f32))
 FUNCTION_PRED_P_S (vmaxavq, VMAXAVQ)
 FUNCTION_WITHOUT_N_NO_U_F (vmaxaq, VMAXAQ)
 FUNCTION_ONLY_F (vmaxnmaq, VMAXNMAQ)
@@ -463,6 +561,9 @@ FUNCTION_ONLY_N_NO_F (vshrq, VSHRQ)
 FUNCTION_ONLY_N_NO_F (vsliq, VSLIQ)
 FUNCTION_ONLY_N_NO_F (vsriq, VSRIQ)
 FUNCTION (vst1q, vst1_impl,)
+FUNCTION (vstrbq, vstrq_impl, (QImode, opt_scalar_mode ()))
+FUNCTION (vstrhq, vstrq_impl, (HImode, HFmode))
+FUNCTION (vstrwq, vstrq_impl, (SImode, SFmode))
 FUNCTION_WITH_RTX_M_N (vsubq, MINUS, VSUBQ)
 FUNCTION (vuninitializedq, vuninitializedq_impl,)
 
diff --git a/gcc/config/arm/arm-mve-builtins-base.def b/gcc/config/arm/arm-mve-builtins-base.def
index 90d031eebec..513827f0e40 100644
--- a/gcc/config/arm/arm-mve-builtins-base.def
+++ b/gcc/config/arm/arm-mve-builtins-base.def
@@ -47,7 +47,10 @@ DEF_MVE_FUNCTION (vhaddq, binary_opt_n, all_integer, mx_or_none)
 DEF_MVE_FUNCTION (vhcaddq_rot90, binary, all_signed, mx_or_none)
 DEF_MVE_FUNCTION (vhcaddq_rot270, binary, all_signed, mx_or_none)
 DEF_MVE_FUNCTION (vhsubq, binary_opt_n, all_integer, mx_or_none)
-DEF_MVE_FUNCTION (vld1q, load, all_integer, none)
+DEF_MVE_FUNCTION (vld1q, load, all_integer, z_or_none)
+DEF_MVE_FUNCTION (vldrbq, load_ext, all_integer, z_or_none)
+DEF_MVE_FUNCTION (vldrhq, load_ext, integer_16_32, z_or_none)
+DEF_MVE_FUNCTION (vldrwq, load_ext, integer_32, z_or_none)
 DEF_MVE_FUNCTION (vmaxaq, binary_maxamina, all_signed, m_or_none)
 DEF_MVE_FUNCTION (vmaxavq, binary_maxavminav, all_signed, p_or_none)
 DEF_MVE_FUNCTION (vmaxq, binary, all_integer, mx_or_none)
@@ -151,7 +154,10 @@ DEF_MVE_FUNCTION (vshrntq, binary_rshift_narrow, integer_16_32, m_or_none)
 DEF_MVE_FUNCTION (vshrq, binary_rshift, all_integer, mx_or_none)
 DEF_MVE_FUNCTION (vsliq, ternary_lshift, all_integer, m_or_none)
 DEF_MVE_FUNCTION (vsriq, ternary_rshift, all_integer, m_or_none)
-DEF_MVE_FUNCTION (vst1q, store, all_integer, none)
+DEF_MVE_FUNCTION (vst1q, store, all_integer, p_or_none)
+DEF_MVE_FUNCTION (vstrbq, store, all_integer, p_or_none)
+DEF_MVE_FUNCTION (vstrhq, store, integer_16_32, p_or_none)
+DEF_MVE_FUNCTION (vstrwq, store, integer_32, p_or_none)
 DEF_MVE_FUNCTION (vsubq, binary_opt_n, all_integer, mx_or_none)
 DEF_MVE_FUNCTION (vuninitializedq, inherent, all_integer_with_64, none)
 #undef REQUIRES_FLOAT
@@ -184,7 +190,9 @@ DEF_MVE_FUNCTION (veorq, binary, all_float, mx_or_none)
 DEF_MVE_FUNCTION (vfmaq, ternary_opt_n, all_float, m_or_none)
 DEF_MVE_FUNCTION (vfmasq, ternary_n, all_float, m_or_none)
 DEF_MVE_FUNCTION (vfmsq, ternary, all_float, m_or_none)
-DEF_MVE_FUNCTION (vld1q, load, all_float, none)
+DEF_MVE_FUNCTION (vld1q, load, all_float, z_or_none)
+DEF_MVE_FUNCTION (vldrhq, load_ext, float_16, z_or_none)
+DEF_MVE_FUNCTION (vldrwq, load_ext, float_32, z_or_none)
 DEF_MVE_FUNCTION (vmaxnmaq, binary, all_float, m_or_none)
 DEF_MVE_FUNCTION (vmaxnmavq, binary_maxvminv, all_float, p_or_none)
 DEF_MVE_FUNCTION (vmaxnmq, binary, all_float, mx_or_none)
@@ -198,7 +206,7 @@ DEF_MVE_FUNCTION (vnegq, unary, all_float, mx_or_none)
 DEF_MVE_FUNCTION (vorrq, binary_orrq, all_float, mx_or_none)
 DEF_MVE_FUNCTION (vpselq, vpsel, all_float, none)
 DEF_MVE_FUNCTION (vreinterpretq, unary_convert, reinterpret_float, none)
-DEF_MVE_FUNCTION (vrev32q, unary, float16, mx_or_none)
+DEF_MVE_FUNCTION (vrev32q, unary, float_16, mx_or_none)
 DEF_MVE_FUNCTION (vrev64q, unary, all_float, mx_or_none)
 DEF_MVE_FUNCTION (vrndaq, unary, all_float, mx_or_none)
 DEF_MVE_FUNCTION (vrndmq, unary, all_float, mx_or_none)
@@ -206,7 +214,9 @@ DEF_MVE_FUNCTION (vrndnq, unary, all_float, mx_or_none)
 DEF_MVE_FUNCTION (vrndpq, unary, all_float, mx_or_none)
 DEF_MVE_FUNCTION (vrndq, unary, all_float, mx_or_none)
 DEF_MVE_FUNCTION (vrndxq, unary, all_float, mx_or_none)
-DEF_MVE_FUNCTION (vst1q, store, all_float, none)
+DEF_MVE_FUNCTION (vst1q, store, all_float, p_or_none)
+DEF_MVE_FUNCTION (vstrhq, store, float_16, p_or_none)
+DEF_MVE_FUNCTION (vstrwq, store, float_32, p_or_none)
 DEF_MVE_FUNCTION (vsubq, binary_opt_n, all_float, mx_or_none)
 DEF_MVE_FUNCTION (vuninitializedq, inherent, all_float, none)
 #undef REQUIRES_FLOAT
diff --git a/gcc/config/arm/arm-mve-builtins-base.h b/gcc/config/arm/arm-mve-builtins-base.h
index c9b52a81c5e..1e267ce0238 100644
--- a/gcc/config/arm/arm-mve-builtins-base.h
+++ b/gcc/config/arm/arm-mve-builtins-base.h
@@ -64,6 +64,9 @@ extern const function_base *const vhcaddq_rot270;
 extern const function_base *const vhcaddq_rot90;
 extern const function_base *const vhsubq;
 extern const function_base *const vld1q;
+extern const function_base *const vldrbq;
+extern const function_base *const vldrhq;
+extern const function_base *const vldrwq;
 extern const function_base *const vmaxaq;
 extern const function_base *const vmaxavq;
 extern const function_base *const vmaxnmaq;
@@ -180,6 +183,9 @@ extern const function_base *const vshrq;
 extern const function_base *const vsliq;
 extern const function_base *const vsriq;
 extern const function_base *const vst1q;
+extern const function_base *const vstrbq;
+extern const function_base *const vstrhq;
+extern const function_base *const vstrwq;
 extern const function_base *const vsubq;
 extern const function_base *const vuninitializedq;
 
diff --git a/gcc/config/arm/arm-mve-builtins-functions.h b/gcc/config/arm/arm-mve-builtins-functions.h
index e47bc69936e..c78b370e958 100644
--- a/gcc/config/arm/arm-mve-builtins-functions.h
+++ b/gcc/config/arm/arm-mve-builtins-functions.h
@@ -1005,19 +1005,6 @@ public:
   memory_vector_mode (const function_instance &fi) const override
   {
     machine_mode mode = fi.vector_mode (0);
-    /* Vectors of floating-point are managed in memory as vectors of
-       integers.  */
-    switch (mode)
-      {
-      case E_V4SFmode:
-	mode = E_V4SImode;
-	break;
-      case E_V8HFmode:
-	mode = E_V8HImode;
-	break;
-      default:
-	break;
-      }
 
     if (m_vectors_per_tuple != 1)
       mode = targetm.array_mode (mode, m_vectors_per_tuple).require ();
diff --git a/gcc/config/arm/arm-mve-builtins.cc b/gcc/config/arm/arm-mve-builtins.cc
index f519fded000..109e391d768 100644
--- a/gcc/config/arm/arm-mve-builtins.cc
+++ b/gcc/config/arm/arm-mve-builtins.cc
@@ -149,8 +149,10 @@
    class ("b", "f", etc.) and a numerical bit count.  */
 
 /* _f16.  */
-#define TYPES_float16(S, D) \
-  S (f16)
+#define TYPES_float_16(S, D) S (f16)
+
+/* _f32.  */
+#define TYPES_float_32(S, D) S (f32)
 
 /* _f16 _f32.  */
 #define TYPES_all_float(S, D) \
@@ -273,7 +275,8 @@ static const type_suffix_pair types_none[] = {
 
 DEF_MVE_TYPES_ARRAY (all_integer);
 DEF_MVE_TYPES_ARRAY (all_integer_with_64);
-DEF_MVE_TYPES_ARRAY (float16);
+DEF_MVE_TYPES_ARRAY (float_16);
+DEF_MVE_TYPES_ARRAY (float_32);
 DEF_MVE_TYPES_ARRAY (all_float);
 DEF_MVE_TYPES_ARRAY (all_signed);
 DEF_MVE_TYPES_ARRAY (all_unsigned);
@@ -308,6 +311,11 @@ static const predication_index preds_p_or_none[] = {
   PRED_p, PRED_none, NUM_PREDS
 };
 
+/* Used by functions that have the z predicated form, in addition to
+   an unpredicated form.  */
+static const predication_index preds_z_or_none[]
+  = {PRED_z, PRED_none, NUM_PREDS};
+
 /* A list of all MVE ACLE functions.
*/ static CONSTEXPR const function_group_info function_groups[] = { #define DEF_MVE_FUNCTION(NAME, SHAPE, TYPES, PREDS) \ @@ -1601,6 +1609,7 @@ function_resolver::check_gp_argument (unsigned int nops, case PRED_p: case PRED_x: + case PRED_z: /* Add final predicate. */ nargs = nops + 1; break; diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h index ae1b5438797..659d8802e4a 100644 --- a/gcc/config/arm/arm_mve.h +++ b/gcc/config/arm/arm_mve.h @@ -49,10 +49,8 @@ #define vbicq_m(__inactive, __a, __b, __p) __arm_vbicq_m(__inactive, __a, __b, __p) #define vornq_m(__inactive, __a, __b, __p) __arm_vornq_m(__inactive, __a, __b, __p) #define vstrbq_scatter_offset(__base, __offset, __value) __arm_vstrbq_scatter_offset(__base, __offset, __value) -#define vstrbq(__addr, __value) __arm_vstrbq(__addr, __value) #define vstrwq_scatter_base(__addr, __offset, __value) __arm_vstrwq_scatter_base(__addr, __offset, __value) #define vldrbq_gather_offset(__base, __offset) __arm_vldrbq_gather_offset(__base, __offset) -#define vstrbq_p(__addr, __value, __p) __arm_vstrbq_p(__addr, __value, __p) #define vstrbq_scatter_offset_p(__base, __offset, __value, __p) __arm_vstrbq_scatter_offset_p(__base, __offset, __value, __p) #define vstrwq_scatter_base_p(__addr, __offset, __value, __p) __arm_vstrwq_scatter_base_p(__addr, __offset, __value, __p) #define vldrbq_gather_offset_z(__base, __offset, __p) __arm_vldrbq_gather_offset_z(__base, __offset, __p) @@ -72,10 +70,6 @@ #define vstrhq_scatter_offset_p(__base, __offset, __value, __p) __arm_vstrhq_scatter_offset_p(__base, __offset, __value, __p) #define vstrhq_scatter_shifted_offset(__base, __offset, __value) __arm_vstrhq_scatter_shifted_offset(__base, __offset, __value) #define vstrhq_scatter_shifted_offset_p(__base, __offset, __value, __p) __arm_vstrhq_scatter_shifted_offset_p(__base, __offset, __value, __p) -#define vstrhq(__addr, __value) __arm_vstrhq(__addr, __value) -#define vstrhq_p(__addr, __value, __p) __arm_vstrhq_p(__addr, 
__value, __p) -#define vstrwq(__addr, __value) __arm_vstrwq(__addr, __value) -#define vstrwq_p(__addr, __value, __p) __arm_vstrwq_p(__addr, __value, __p) #define vstrdq_scatter_base_p(__addr, __offset, __value, __p) __arm_vstrdq_scatter_base_p(__addr, __offset, __value, __p) #define vstrdq_scatter_base(__addr, __offset, __value) __arm_vstrdq_scatter_base(__addr, __offset, __value) #define vstrdq_scatter_offset_p(__base, __offset, __value, __p) __arm_vstrdq_scatter_offset_p(__base, __offset, __value, __p) @@ -129,9 +123,7 @@ #define vsbciq_m(__inactive, __a, __b, __carry_out, __p) __arm_vsbciq_m(__inactive, __a, __b, __carry_out, __p) #define vsbcq(__a, __b, __carry) __arm_vsbcq(__a, __b, __carry) #define vsbcq_m(__inactive, __a, __b, __carry, __p) __arm_vsbcq_m(__inactive, __a, __b, __carry, __p) -#define vst1q_p(__addr, __value, __p) __arm_vst1q_p(__addr, __value, __p) #define vst2q(__addr, __value) __arm_vst2q(__addr, __value) -#define vld1q_z(__base, __p) __arm_vld1q_z(__base, __p) #define vld2q(__addr) __arm_vld2q(__addr) #define vld4q(__addr) __arm_vld4q(__addr) #define vsetq_lane(__a, __b, __idx) __arm_vsetq_lane(__a, __b, __idx) @@ -304,24 +296,12 @@ #define vstrwq_scatter_base_u32(__addr, __offset, __value) __arm_vstrwq_scatter_base_u32(__addr, __offset, __value) #define vldrbq_gather_offset_u8(__base, __offset) __arm_vldrbq_gather_offset_u8(__base, __offset) #define vldrbq_gather_offset_s8(__base, __offset) __arm_vldrbq_gather_offset_s8(__base, __offset) -#define vldrbq_s8(__base) __arm_vldrbq_s8(__base) -#define vldrbq_u8(__base) __arm_vldrbq_u8(__base) #define vldrbq_gather_offset_u16(__base, __offset) __arm_vldrbq_gather_offset_u16(__base, __offset) #define vldrbq_gather_offset_s16(__base, __offset) __arm_vldrbq_gather_offset_s16(__base, __offset) -#define vldrbq_s16(__base) __arm_vldrbq_s16(__base) -#define vldrbq_u16(__base) __arm_vldrbq_u16(__base) #define vldrbq_gather_offset_u32(__base, __offset) __arm_vldrbq_gather_offset_u32(__base, __offset) 
#define vldrbq_gather_offset_s32(__base, __offset) __arm_vldrbq_gather_offset_s32(__base, __offset) -#define vldrbq_s32(__base) __arm_vldrbq_s32(__base) -#define vldrbq_u32(__base) __arm_vldrbq_u32(__base) #define vldrwq_gather_base_s32(__addr, __offset) __arm_vldrwq_gather_base_s32(__addr, __offset) #define vldrwq_gather_base_u32(__addr, __offset) __arm_vldrwq_gather_base_u32(__addr, __offset) -#define vstrbq_p_s8( __addr, __value, __p) __arm_vstrbq_p_s8( __addr, __value, __p) -#define vstrbq_p_s32( __addr, __value, __p) __arm_vstrbq_p_s32( __addr, __value, __p) -#define vstrbq_p_s16( __addr, __value, __p) __arm_vstrbq_p_s16( __addr, __value, __p) -#define vstrbq_p_u8( __addr, __value, __p) __arm_vstrbq_p_u8( __addr, __value, __p) -#define vstrbq_p_u32( __addr, __value, __p) __arm_vstrbq_p_u32( __addr, __value, __p) -#define vstrbq_p_u16( __addr, __value, __p) __arm_vstrbq_p_u16( __addr, __value, __p) #define vstrbq_scatter_offset_p_s8( __base, __offset, __value, __p) __arm_vstrbq_scatter_offset_p_s8( __base, __offset, __value, __p) #define vstrbq_scatter_offset_p_s32( __base, __offset, __value, __p) __arm_vstrbq_scatter_offset_p_s32( __base, __offset, __value, __p) #define vstrbq_scatter_offset_p_s16( __base, __offset, __value, __p) __arm_vstrbq_scatter_offset_p_s16( __base, __offset, __value, __p) @@ -336,12 +316,6 @@ #define vldrbq_gather_offset_z_u16(__base, __offset, __p) __arm_vldrbq_gather_offset_z_u16(__base, __offset, __p) #define vldrbq_gather_offset_z_u32(__base, __offset, __p) __arm_vldrbq_gather_offset_z_u32(__base, __offset, __p) #define vldrbq_gather_offset_z_s8(__base, __offset, __p) __arm_vldrbq_gather_offset_z_s8(__base, __offset, __p) -#define vldrbq_z_s16(__base, __p) __arm_vldrbq_z_s16(__base, __p) -#define vldrbq_z_u8(__base, __p) __arm_vldrbq_z_u8(__base, __p) -#define vldrbq_z_s8(__base, __p) __arm_vldrbq_z_s8(__base, __p) -#define vldrbq_z_s32(__base, __p) __arm_vldrbq_z_s32(__base, __p) -#define vldrbq_z_u16(__base, __p) 
__arm_vldrbq_z_u16(__base, __p) -#define vldrbq_z_u32(__base, __p) __arm_vldrbq_z_u32(__base, __p) #define vldrwq_gather_base_z_u32(__addr, __offset, __p) __arm_vldrwq_gather_base_z_u32(__addr, __offset, __p) #define vldrwq_gather_base_z_s32(__addr, __offset, __p) __arm_vldrwq_gather_base_z_s32(__addr, __offset, __p) #define vldrhq_gather_offset_s32(__base, __offset) __arm_vldrhq_gather_offset_s32(__base, __offset) @@ -360,22 +334,6 @@ #define vldrhq_gather_shifted_offset_z_s16(__base, __offset, __p) __arm_vldrhq_gather_shifted_offset_z_s16(__base, __offset, __p) #define vldrhq_gather_shifted_offset_z_u32(__base, __offset, __p) __arm_vldrhq_gather_shifted_offset_z_u32(__base, __offset, __p) #define vldrhq_gather_shifted_offset_z_u16(__base, __offset, __p) __arm_vldrhq_gather_shifted_offset_z_u16(__base, __offset, __p) -#define vldrhq_s32(__base) __arm_vldrhq_s32(__base) -#define vldrhq_s16(__base) __arm_vldrhq_s16(__base) -#define vldrhq_u32(__base) __arm_vldrhq_u32(__base) -#define vldrhq_u16(__base) __arm_vldrhq_u16(__base) -#define vldrhq_z_s32(__base, __p) __arm_vldrhq_z_s32(__base, __p) -#define vldrhq_z_s16(__base, __p) __arm_vldrhq_z_s16(__base, __p) -#define vldrhq_z_u32(__base, __p) __arm_vldrhq_z_u32(__base, __p) -#define vldrhq_z_u16(__base, __p) __arm_vldrhq_z_u16(__base, __p) -#define vldrwq_s32(__base) __arm_vldrwq_s32(__base) -#define vldrwq_u32(__base) __arm_vldrwq_u32(__base) -#define vldrwq_z_s32(__base, __p) __arm_vldrwq_z_s32(__base, __p) -#define vldrwq_z_u32(__base, __p) __arm_vldrwq_z_u32(__base, __p) -#define vldrhq_f16(__base) __arm_vldrhq_f16(__base) -#define vldrhq_z_f16(__base, __p) __arm_vldrhq_z_f16(__base, __p) -#define vldrwq_f32(__base) __arm_vldrwq_f32(__base) -#define vldrwq_z_f32(__base, __p) __arm_vldrwq_z_f32(__base, __p) #define vldrdq_gather_base_s64(__addr, __offset) __arm_vldrdq_gather_base_s64(__addr, __offset) #define vldrdq_gather_base_u64(__addr, __offset) __arm_vldrdq_gather_base_u64(__addr, __offset) #define 
vldrdq_gather_base_z_s64(__addr, __offset, __p) __arm_vldrdq_gather_base_z_s64(__addr, __offset, __p) @@ -406,7 +364,6 @@ #define vldrwq_gather_shifted_offset_z_f32(__base, __offset, __p) __arm_vldrwq_gather_shifted_offset_z_f32(__base, __offset, __p) #define vldrwq_gather_shifted_offset_z_s32(__base, __offset, __p) __arm_vldrwq_gather_shifted_offset_z_s32(__base, __offset, __p) #define vldrwq_gather_shifted_offset_z_u32(__base, __offset, __p) __arm_vldrwq_gather_shifted_offset_z_u32(__base, __offset, __p) -#define vstrhq_f16(__addr, __value) __arm_vstrhq_f16(__addr, __value) #define vstrhq_scatter_offset_s32( __base, __offset, __value) __arm_vstrhq_scatter_offset_s32( __base, __offset, __value) #define vstrhq_scatter_offset_s16( __base, __offset, __value) __arm_vstrhq_scatter_offset_s16( __base, __offset, __value) #define vstrhq_scatter_offset_u32( __base, __offset, __value) __arm_vstrhq_scatter_offset_u32( __base, __offset, __value) @@ -423,21 +380,6 @@ #define vstrhq_scatter_shifted_offset_p_s16( __base, __offset, __value, __p) __arm_vstrhq_scatter_shifted_offset_p_s16( __base, __offset, __value, __p) #define vstrhq_scatter_shifted_offset_p_u32( __base, __offset, __value, __p) __arm_vstrhq_scatter_shifted_offset_p_u32( __base, __offset, __value, __p) #define vstrhq_scatter_shifted_offset_p_u16( __base, __offset, __value, __p) __arm_vstrhq_scatter_shifted_offset_p_u16( __base, __offset, __value, __p) -#define vstrhq_s32(__addr, __value) __arm_vstrhq_s32(__addr, __value) -#define vstrhq_s16(__addr, __value) __arm_vstrhq_s16(__addr, __value) -#define vstrhq_u32(__addr, __value) __arm_vstrhq_u32(__addr, __value) -#define vstrhq_u16(__addr, __value) __arm_vstrhq_u16(__addr, __value) -#define vstrhq_p_f16(__addr, __value, __p) __arm_vstrhq_p_f16(__addr, __value, __p) -#define vstrhq_p_s32(__addr, __value, __p) __arm_vstrhq_p_s32(__addr, __value, __p) -#define vstrhq_p_s16(__addr, __value, __p) __arm_vstrhq_p_s16(__addr, __value, __p) -#define vstrhq_p_u32(__addr, 
__value, __p) __arm_vstrhq_p_u32(__addr, __value, __p) -#define vstrhq_p_u16(__addr, __value, __p) __arm_vstrhq_p_u16(__addr, __value, __p) -#define vstrwq_f32(__addr, __value) __arm_vstrwq_f32(__addr, __value) -#define vstrwq_s32(__addr, __value) __arm_vstrwq_s32(__addr, __value) -#define vstrwq_u32(__addr, __value) __arm_vstrwq_u32(__addr, __value) -#define vstrwq_p_f32(__addr, __value, __p) __arm_vstrwq_p_f32(__addr, __value, __p) -#define vstrwq_p_s32(__addr, __value, __p) __arm_vstrwq_p_s32(__addr, __value, __p) -#define vstrwq_p_u32(__addr, __value, __p) __arm_vstrwq_p_u32(__addr, __value, __p) #define vstrdq_scatter_base_p_s64(__addr, __offset, __value, __p) __arm_vstrdq_scatter_base_p_s64(__addr, __offset, __value, __p) #define vstrdq_scatter_base_p_u64(__addr, __offset, __value, __p) __arm_vstrdq_scatter_base_p_u64(__addr, __offset, __value, __p) #define vstrdq_scatter_base_s64(__addr, __offset, __value) __arm_vstrdq_scatter_base_s64(__addr, __offset, __value) @@ -636,46 +578,30 @@ #define vsbcq_u32(__a, __b, __carry) __arm_vsbcq_u32(__a, __b, __carry) #define vsbcq_m_s32(__inactive, __a, __b, __carry, __p) __arm_vsbcq_m_s32(__inactive, __a, __b, __carry, __p) #define vsbcq_m_u32(__inactive, __a, __b, __carry, __p) __arm_vsbcq_m_u32(__inactive, __a, __b, __carry, __p) -#define vst1q_p_u8(__addr, __value, __p) __arm_vst1q_p_u8(__addr, __value, __p) -#define vst1q_p_s8(__addr, __value, __p) __arm_vst1q_p_s8(__addr, __value, __p) #define vst2q_s8(__addr, __value) __arm_vst2q_s8(__addr, __value) #define vst2q_u8(__addr, __value) __arm_vst2q_u8(__addr, __value) -#define vld1q_z_u8(__base, __p) __arm_vld1q_z_u8(__base, __p) -#define vld1q_z_s8(__base, __p) __arm_vld1q_z_s8(__base, __p) #define vld2q_s8(__addr) __arm_vld2q_s8(__addr) #define vld2q_u8(__addr) __arm_vld2q_u8(__addr) #define vld4q_s8(__addr) __arm_vld4q_s8(__addr) #define vld4q_u8(__addr) __arm_vld4q_u8(__addr) -#define vst1q_p_u16(__addr, __value, __p) __arm_vst1q_p_u16(__addr, __value, __p) 
-#define vst1q_p_s16(__addr, __value, __p) __arm_vst1q_p_s16(__addr, __value, __p) #define vst2q_s16(__addr, __value) __arm_vst2q_s16(__addr, __value) #define vst2q_u16(__addr, __value) __arm_vst2q_u16(__addr, __value) -#define vld1q_z_u16(__base, __p) __arm_vld1q_z_u16(__base, __p) -#define vld1q_z_s16(__base, __p) __arm_vld1q_z_s16(__base, __p) #define vld2q_s16(__addr) __arm_vld2q_s16(__addr) #define vld2q_u16(__addr) __arm_vld2q_u16(__addr) #define vld4q_s16(__addr) __arm_vld4q_s16(__addr) #define vld4q_u16(__addr) __arm_vld4q_u16(__addr) -#define vst1q_p_u32(__addr, __value, __p) __arm_vst1q_p_u32(__addr, __value, __p) -#define vst1q_p_s32(__addr, __value, __p) __arm_vst1q_p_s32(__addr, __value, __p) #define vst2q_s32(__addr, __value) __arm_vst2q_s32(__addr, __value) #define vst2q_u32(__addr, __value) __arm_vst2q_u32(__addr, __value) -#define vld1q_z_u32(__base, __p) __arm_vld1q_z_u32(__base, __p) -#define vld1q_z_s32(__base, __p) __arm_vld1q_z_s32(__base, __p) #define vld2q_s32(__addr) __arm_vld2q_s32(__addr) #define vld2q_u32(__addr) __arm_vld2q_u32(__addr) #define vld4q_s32(__addr) __arm_vld4q_s32(__addr) #define vld4q_u32(__addr) __arm_vld4q_u32(__addr) #define vld4q_f16(__addr) __arm_vld4q_f16(__addr) #define vld2q_f16(__addr) __arm_vld2q_f16(__addr) -#define vld1q_z_f16(__base, __p) __arm_vld1q_z_f16(__base, __p) #define vst2q_f16(__addr, __value) __arm_vst2q_f16(__addr, __value) -#define vst1q_p_f16(__addr, __value, __p) __arm_vst1q_p_f16(__addr, __value, __p) #define vld4q_f32(__addr) __arm_vld4q_f32(__addr) #define vld2q_f32(__addr) __arm_vld2q_f32(__addr) -#define vld1q_z_f32(__base, __p) __arm_vld1q_z_f32(__base, __p) #define vst2q_f32(__addr, __value) __arm_vst2q_f32(__addr, __value) -#define vst1q_p_f32(__addr, __value, __p) __arm_vst1q_p_f32(__addr, __value, __p) #define vsetq_lane_f16(__a, __b, __idx) __arm_vsetq_lane_f16(__a, __b, __idx) #define vsetq_lane_f32(__a, __b, __idx) __arm_vsetq_lane_f32(__a, __b, __idx) #define vsetq_lane_s16(__a, 
__b, __idx) __arm_vsetq_lane_s16(__a, __b, __idx) @@ -1169,48 +1095,6 @@ __arm_vstrbq_scatter_offset_u16 (uint8_t * __base, uint16x8_t __offset, uint16x8 __builtin_mve_vstrbq_scatter_offset_uv8hi ((__builtin_neon_qi *) __base, __offset, __value); } -__extension__ extern __inline void -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vstrbq_s8 (int8_t * __addr, int8x16_t __value) -{ - __builtin_mve_vstrbq_sv16qi ((__builtin_neon_qi *) __addr, __value); -} - -__extension__ extern __inline void -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vstrbq_s32 (int8_t * __addr, int32x4_t __value) -{ - __builtin_mve_vstrbq_sv4si ((__builtin_neon_qi *) __addr, __value); -} - -__extension__ extern __inline void -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vstrbq_s16 (int8_t * __addr, int16x8_t __value) -{ - __builtin_mve_vstrbq_sv8hi ((__builtin_neon_qi *) __addr, __value); -} - -__extension__ extern __inline void -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vstrbq_u8 (uint8_t * __addr, uint8x16_t __value) -{ - __builtin_mve_vstrbq_uv16qi ((__builtin_neon_qi *) __addr, __value); -} - -__extension__ extern __inline void -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vstrbq_u32 (uint8_t * __addr, uint32x4_t __value) -{ - __builtin_mve_vstrbq_uv4si ((__builtin_neon_qi *) __addr, __value); -} - -__extension__ extern __inline void -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vstrbq_u16 (uint8_t * __addr, uint16x8_t __value) -{ - __builtin_mve_vstrbq_uv8hi ((__builtin_neon_qi *) __addr, __value); -} - __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) __arm_vstrwq_scatter_base_s32 (uint32x4_t __addr, const int __offset, int32x4_t __value) @@ -1239,20 +1123,6 @@ __arm_vldrbq_gather_offset_s8 (int8_t const * __base, uint8x16_t __offset) return 
__builtin_mve_vldrbq_gather_offset_sv16qi ((__builtin_neon_qi *) __base, __offset); } -__extension__ extern __inline int8x16_t -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vldrbq_s8 (int8_t const * __base) -{ - return __builtin_mve_vldrbq_sv16qi ((__builtin_neon_qi *) __base); -} - -__extension__ extern __inline uint8x16_t -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vldrbq_u8 (uint8_t const * __base) -{ - return __builtin_mve_vldrbq_uv16qi ((__builtin_neon_qi *) __base); -} - __extension__ extern __inline uint16x8_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) __arm_vldrbq_gather_offset_u16 (uint8_t const * __base, uint16x8_t __offset) @@ -1267,20 +1137,6 @@ __arm_vldrbq_gather_offset_s16 (int8_t const * __base, uint16x8_t __offset) return __builtin_mve_vldrbq_gather_offset_sv8hi ((__builtin_neon_qi *) __base, __offset); } -__extension__ extern __inline int16x8_t -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vldrbq_s16 (int8_t const * __base) -{ - return __builtin_mve_vldrbq_sv8hi ((__builtin_neon_qi *) __base); -} - -__extension__ extern __inline uint16x8_t -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vldrbq_u16 (uint8_t const * __base) -{ - return __builtin_mve_vldrbq_uv8hi ((__builtin_neon_qi *) __base); -} - __extension__ extern __inline uint32x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) __arm_vldrbq_gather_offset_u32 (uint8_t const * __base, uint32x4_t __offset) @@ -1295,20 +1151,6 @@ __arm_vldrbq_gather_offset_s32 (int8_t const * __base, uint32x4_t __offset) return __builtin_mve_vldrbq_gather_offset_sv4si ((__builtin_neon_qi *) __base, __offset); } -__extension__ extern __inline int32x4_t -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vldrbq_s32 (int8_t const * __base) -{ - return __builtin_mve_vldrbq_sv4si ((__builtin_neon_qi *) __base); -} - -__extension__ 
extern __inline uint32x4_t -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vldrbq_u32 (uint8_t const * __base) -{ - return __builtin_mve_vldrbq_uv4si ((__builtin_neon_qi *) __base); -} - __extension__ extern __inline int32x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) __arm_vldrwq_gather_base_s32 (uint32x4_t __addr, const int __offset) @@ -1323,48 +1165,6 @@ __arm_vldrwq_gather_base_u32 (uint32x4_t __addr, const int __offset) return __builtin_mve_vldrwq_gather_base_uv4si (__addr, __offset); } -__extension__ extern __inline void -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vstrbq_p_s8 (int8_t * __addr, int8x16_t __value, mve_pred16_t __p) -{ - __builtin_mve_vstrbq_p_sv16qi ((__builtin_neon_qi *) __addr, __value, __p); -} - -__extension__ extern __inline void -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vstrbq_p_s32 (int8_t * __addr, int32x4_t __value, mve_pred16_t __p) -{ - __builtin_mve_vstrbq_p_sv4si ((__builtin_neon_qi *) __addr, __value, __p); -} - -__extension__ extern __inline void -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vstrbq_p_s16 (int8_t * __addr, int16x8_t __value, mve_pred16_t __p) -{ - __builtin_mve_vstrbq_p_sv8hi ((__builtin_neon_qi *) __addr, __value, __p); -} - -__extension__ extern __inline void -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vstrbq_p_u8 (uint8_t * __addr, uint8x16_t __value, mve_pred16_t __p) -{ - __builtin_mve_vstrbq_p_uv16qi ((__builtin_neon_qi *) __addr, __value, __p); -} - -__extension__ extern __inline void -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vstrbq_p_u32 (uint8_t * __addr, uint32x4_t __value, mve_pred16_t __p) -{ - __builtin_mve_vstrbq_p_uv4si ((__builtin_neon_qi *) __addr, __value, __p); -} - -__extension__ extern __inline void -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) 
-__arm_vstrbq_p_u16 (uint8_t * __addr, uint16x8_t __value, mve_pred16_t __p) -{ - __builtin_mve_vstrbq_p_uv8hi ((__builtin_neon_qi *) __addr, __value, __p); -} - __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) __arm_vstrbq_scatter_offset_p_s8 (int8_t * __base, uint8x16_t __offset, int8x16_t __value, mve_pred16_t __p) @@ -1463,48 +1263,6 @@ __arm_vldrbq_gather_offset_z_u16 (uint8_t const * __base, uint16x8_t __offset, m return __builtin_mve_vldrbq_gather_offset_z_uv8hi ((__builtin_neon_qi *) __base, __offset, __p); } -__extension__ extern __inline int8x16_t -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vldrbq_z_s8 (int8_t const * __base, mve_pred16_t __p) -{ - return __builtin_mve_vldrbq_z_sv16qi ((__builtin_neon_qi *) __base, __p); -} - -__extension__ extern __inline int32x4_t -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vldrbq_z_s32 (int8_t const * __base, mve_pred16_t __p) -{ - return __builtin_mve_vldrbq_z_sv4si ((__builtin_neon_qi *) __base, __p); -} - -__extension__ extern __inline int16x8_t -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vldrbq_z_s16 (int8_t const * __base, mve_pred16_t __p) -{ - return __builtin_mve_vldrbq_z_sv8hi ((__builtin_neon_qi *) __base, __p); -} - -__extension__ extern __inline uint8x16_t -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vldrbq_z_u8 (uint8_t const * __base, mve_pred16_t __p) -{ - return __builtin_mve_vldrbq_z_uv16qi ((__builtin_neon_qi *) __base, __p); -} - -__extension__ extern __inline uint32x4_t -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vldrbq_z_u32 (uint8_t const * __base, mve_pred16_t __p) -{ - return __builtin_mve_vldrbq_z_uv4si ((__builtin_neon_qi *) __base, __p); -} - -__extension__ extern __inline uint16x8_t -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vldrbq_z_u16 (uint8_t 
const * __base, mve_pred16_t __p) -{ - return __builtin_mve_vldrbq_z_uv8hi ((__builtin_neon_qi *) __base, __p); -} - __extension__ extern __inline int32x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) __arm_vldrwq_gather_base_z_s32 (uint32x4_t __addr, const int __offset, mve_pred16_t __p) @@ -1631,91 +1389,6 @@ __arm_vldrhq_gather_shifted_offset_z_u16 (uint16_t const * __base, uint16x8_t __ return __builtin_mve_vldrhq_gather_shifted_offset_z_uv8hi ((__builtin_neon_hi *) __base, __offset, __p); } -__extension__ extern __inline int32x4_t -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vldrhq_s32 (int16_t const * __base) -{ - return __builtin_mve_vldrhq_sv4si ((__builtin_neon_hi *) __base); -} - -__extension__ extern __inline int16x8_t -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vldrhq_s16 (int16_t const * __base) -{ - return __builtin_mve_vldrhq_sv8hi ((__builtin_neon_hi *) __base); -} - -__extension__ extern __inline uint32x4_t -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vldrhq_u32 (uint16_t const * __base) -{ - return __builtin_mve_vldrhq_uv4si ((__builtin_neon_hi *) __base); -} - -__extension__ extern __inline uint16x8_t -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vldrhq_u16 (uint16_t const * __base) -{ - return __builtin_mve_vldrhq_uv8hi ((__builtin_neon_hi *) __base); -} - -__extension__ extern __inline int32x4_t -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vldrhq_z_s32 (int16_t const * __base, mve_pred16_t __p) -{ - return __builtin_mve_vldrhq_z_sv4si ((__builtin_neon_hi *) __base, __p); -} - -__extension__ extern __inline int16x8_t -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vldrhq_z_s16 (int16_t const * __base, mve_pred16_t __p) -{ - return __builtin_mve_vldrhq_z_sv8hi ((__builtin_neon_hi *) __base, __p); -} - -__extension__ extern __inline uint32x4_t 
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vldrhq_z_u32 (uint16_t const * __base, mve_pred16_t __p) -{ - return __builtin_mve_vldrhq_z_uv4si ((__builtin_neon_hi *) __base, __p); -} - -__extension__ extern __inline uint16x8_t -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vldrhq_z_u16 (uint16_t const * __base, mve_pred16_t __p) -{ - return __builtin_mve_vldrhq_z_uv8hi ((__builtin_neon_hi *) __base, __p); -} - -__extension__ extern __inline int32x4_t -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vldrwq_s32 (int32_t const * __base) -{ - return __builtin_mve_vldrwq_sv4si ((__builtin_neon_si *) __base); -} - -__extension__ extern __inline uint32x4_t -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vldrwq_u32 (uint32_t const * __base) -{ - return __builtin_mve_vldrwq_uv4si ((__builtin_neon_si *) __base); -} - - -__extension__ extern __inline int32x4_t -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vldrwq_z_s32 (int32_t const * __base, mve_pred16_t __p) -{ - return __builtin_mve_vldrwq_z_sv4si ((__builtin_neon_si *) __base, __p); -} - -__extension__ extern __inline uint32x4_t -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vldrwq_z_u32 (uint32_t const * __base, mve_pred16_t __p) -{ - return __builtin_mve_vldrwq_z_uv4si ((__builtin_neon_si *) __base, __p); -} - __extension__ extern __inline int64x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) __arm_vldrdq_gather_base_s64 (uint64x2_t __addr, const int __offset) @@ -1969,90 +1642,6 @@ __arm_vstrhq_scatter_shifted_offset_p_u16 (uint16_t * __base, uint16x8_t __offse __builtin_mve_vstrhq_scatter_shifted_offset_p_uv8hi ((__builtin_neon_hi *) __base, __offset, __value, __p); } -__extension__ extern __inline void -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vstrhq_s32 (int16_t * __addr, int32x4_t 
__value) -{ - __builtin_mve_vstrhq_sv4si ((__builtin_neon_hi *) __addr, __value); -} - -__extension__ extern __inline void -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vstrhq_s16 (int16_t * __addr, int16x8_t __value) -{ - __builtin_mve_vstrhq_sv8hi ((__builtin_neon_hi *) __addr, __value); -} - -__extension__ extern __inline void -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vstrhq_u32 (uint16_t * __addr, uint32x4_t __value) -{ - __builtin_mve_vstrhq_uv4si ((__builtin_neon_hi *) __addr, __value); -} - -__extension__ extern __inline void -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vstrhq_u16 (uint16_t * __addr, uint16x8_t __value) -{ - __builtin_mve_vstrhq_uv8hi ((__builtin_neon_hi *) __addr, __value); -} - -__extension__ extern __inline void -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vstrhq_p_s32 (int16_t * __addr, int32x4_t __value, mve_pred16_t __p) -{ - __builtin_mve_vstrhq_p_sv4si ((__builtin_neon_hi *) __addr, __value, __p); -} - -__extension__ extern __inline void -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vstrhq_p_s16 (int16_t * __addr, int16x8_t __value, mve_pred16_t __p) -{ - __builtin_mve_vstrhq_p_sv8hi ((__builtin_neon_hi *) __addr, __value, __p); -} - -__extension__ extern __inline void -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vstrhq_p_u32 (uint16_t * __addr, uint32x4_t __value, mve_pred16_t __p) -{ - __builtin_mve_vstrhq_p_uv4si ((__builtin_neon_hi *) __addr, __value, __p); -} - -__extension__ extern __inline void -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vstrhq_p_u16 (uint16_t * __addr, uint16x8_t __value, mve_pred16_t __p) -{ - __builtin_mve_vstrhq_p_uv8hi ((__builtin_neon_hi *) __addr, __value, __p); -} - -__extension__ extern __inline void -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vstrwq_s32 
(int32_t * __addr, int32x4_t __value) -{ - __builtin_mve_vstrwq_sv4si ((__builtin_neon_si *) __addr, __value); -} - -__extension__ extern __inline void -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vstrwq_u32 (uint32_t * __addr, uint32x4_t __value) -{ - __builtin_mve_vstrwq_uv4si ((__builtin_neon_si *) __addr, __value); -} - -__extension__ extern __inline void -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vstrwq_p_s32 (int32_t * __addr, int32x4_t __value, mve_pred16_t __p) -{ - __builtin_mve_vstrwq_p_sv4si ((__builtin_neon_si *) __addr, __value, __p); -} - -__extension__ extern __inline void -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vstrwq_p_u32 (uint32_t * __addr, uint32x4_t __value, mve_pred16_t __p) -{ - __builtin_mve_vstrwq_p_uv4si ((__builtin_neon_si *) __addr, __value, __p); -} - __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) __arm_vstrdq_scatter_base_p_s64 (uint64x2_t __addr, const int __offset, int64x2_t __value, mve_pred16_t __p) @@ -3190,20 +2779,6 @@ __arm_vsbcq_m_u32 (uint32x4_t __inactive, uint32x4_t __a, uint32x4_t __b, unsign return __res; } -__extension__ extern __inline void -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vst1q_p_u8 (uint8_t * __addr, uint8x16_t __value, mve_pred16_t __p) -{ - return __arm_vstrbq_p_u8 (__addr, __value, __p); -} - -__extension__ extern __inline void -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vst1q_p_s8 (int8_t * __addr, int8x16_t __value, mve_pred16_t __p) -{ - return __arm_vstrbq_p_s8 (__addr, __value, __p); -} - __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) __arm_vst2q_s8 (int8_t * __addr, int8x16x2_t __value) @@ -3222,20 +2797,6 @@ __arm_vst2q_u8 (uint8_t * __addr, uint8x16x2_t __value) __builtin_mve_vst2qv16qi ((__builtin_neon_qi *) __addr, 
__rv.__o); } -__extension__ extern __inline uint8x16_t -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vld1q_z_u8 (uint8_t const *__base, mve_pred16_t __p) -{ - return __arm_vldrbq_z_u8 ( __base, __p); -} - -__extension__ extern __inline int8x16_t -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vld1q_z_s8 (int8_t const *__base, mve_pred16_t __p) -{ - return __arm_vldrbq_z_s8 ( __base, __p); -} - __extension__ extern __inline int8x16x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) __arm_vld2q_s8 (int8_t const * __addr) @@ -3272,20 +2833,6 @@ __arm_vld4q_u8 (uint8_t const * __addr) return __rv.__i; } -__extension__ extern __inline void -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vst1q_p_u16 (uint16_t * __addr, uint16x8_t __value, mve_pred16_t __p) -{ - return __arm_vstrhq_p_u16 (__addr, __value, __p); -} - -__extension__ extern __inline void -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vst1q_p_s16 (int16_t * __addr, int16x8_t __value, mve_pred16_t __p) -{ - return __arm_vstrhq_p_s16 (__addr, __value, __p); -} - __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) __arm_vst2q_s16 (int16_t * __addr, int16x8x2_t __value) @@ -3304,20 +2851,6 @@ __arm_vst2q_u16 (uint16_t * __addr, uint16x8x2_t __value) __builtin_mve_vst2qv8hi ((__builtin_neon_hi *) __addr, __rv.__o); } -__extension__ extern __inline uint16x8_t -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vld1q_z_u16 (uint16_t const *__base, mve_pred16_t __p) -{ - return __arm_vldrhq_z_u16 ( __base, __p); -} - -__extension__ extern __inline int16x8_t -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vld1q_z_s16 (int16_t const *__base, mve_pred16_t __p) -{ - return __arm_vldrhq_z_s16 ( __base, __p); -} - __extension__ extern __inline int16x8x2_t __attribute__ 
((__always_inline__, __gnu_inline__, __artificial__)) __arm_vld2q_s16 (int16_t const * __addr) @@ -3354,20 +2887,6 @@ __arm_vld4q_u16 (uint16_t const * __addr) return __rv.__i; } -__extension__ extern __inline void -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vst1q_p_u32 (uint32_t * __addr, uint32x4_t __value, mve_pred16_t __p) -{ - return __arm_vstrwq_p_u32 (__addr, __value, __p); -} - -__extension__ extern __inline void -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vst1q_p_s32 (int32_t * __addr, int32x4_t __value, mve_pred16_t __p) -{ - return __arm_vstrwq_p_s32 (__addr, __value, __p); -} - __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) __arm_vst2q_s32 (int32_t * __addr, int32x4x2_t __value) @@ -3386,20 +2905,6 @@ __arm_vst2q_u32 (uint32_t * __addr, uint32x4x2_t __value) __builtin_mve_vst2qv4si ((__builtin_neon_si *) __addr, __rv.__o); } -__extension__ extern __inline uint32x4_t -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vld1q_z_u32 (uint32_t const *__base, mve_pred16_t __p) -{ - return __arm_vldrwq_z_u32 ( __base, __p); -} - -__extension__ extern __inline int32x4_t -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vld1q_z_s32 (int32_t const *__base, mve_pred16_t __p) -{ - return __arm_vldrwq_z_s32 ( __base, __p); -} - __extension__ extern __inline int32x4x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) __arm_vld2q_s32 (int32_t const * __addr) @@ -4319,34 +3824,6 @@ __arm_vornq_m_f16 (float16x8_t __inactive, float16x8_t __a, float16x8_t __b, mve return __builtin_mve_vornq_m_fv8hf (__inactive, __a, __b, __p); } -__extension__ extern __inline float32x4_t -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vldrwq_f32 (float32_t const * __base) -{ - return __builtin_mve_vldrwq_fv4sf((__builtin_neon_si *) __base); -} - -__extension__ extern 
__inline float32x4_t -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vldrwq_z_f32 (float32_t const * __base, mve_pred16_t __p) -{ - return __builtin_mve_vldrwq_z_fv4sf((__builtin_neon_si *) __base, __p); -} - -__extension__ extern __inline float16x8_t -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vldrhq_z_f16 (float16_t const * __base, mve_pred16_t __p) -{ - return __builtin_mve_vldrhq_z_fv8hf((__builtin_neon_hi *) __base, __p); -} - -__extension__ extern __inline float16x8_t -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vldrhq_f16 (float16_t const * __base) -{ - return __builtin_mve_vldrhq_fv8hf((__builtin_neon_hi *) __base); -} - __extension__ extern __inline float16x8_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) __arm_vldrhq_gather_offset_f16 (float16_t const * __base, uint16x8_t __offset) @@ -4417,34 +3894,6 @@ __arm_vldrwq_gather_shifted_offset_z_f32 (float32_t const * __base, uint32x4_t _ return __builtin_mve_vldrwq_gather_shifted_offset_z_fv4sf ((__builtin_neon_si *) __base, __offset, __p); } -__extension__ extern __inline void -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vstrwq_p_f32 (float32_t * __addr, float32x4_t __value, mve_pred16_t __p) -{ - __builtin_mve_vstrwq_p_fv4sf ((__builtin_neon_si *) __addr, __value, __p); -} - -__extension__ extern __inline void -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vstrwq_f32 (float32_t * __addr, float32x4_t __value) -{ - __builtin_mve_vstrwq_fv4sf ((__builtin_neon_si *) __addr, __value); -} - -__extension__ extern __inline void -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vstrhq_f16 (float16_t * __addr, float16x8_t __value) -{ - __builtin_mve_vstrhq_fv8hf ((__builtin_neon_hi *) __addr, __value); -} - -__extension__ extern __inline void -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) 
-__arm_vstrhq_p_f16 (float16_t * __addr, float16x8_t __value, mve_pred16_t __p) -{ - __builtin_mve_vstrhq_p_fv8hf ((__builtin_neon_hi *) __addr, __value, __p); -} - __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) __arm_vstrhq_scatter_offset_f16 (float16_t * __base, uint16x8_t __offset, float16x8_t __value) @@ -4833,13 +4282,6 @@ __arm_vld2q_f16 (float16_t const * __addr) return __rv.__i; } -__extension__ extern __inline float16x8_t -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vld1q_z_f16 (float16_t const *__base, mve_pred16_t __p) -{ - return __arm_vldrhq_z_f16 (__base, __p); -} - __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) __arm_vst2q_f16 (float16_t * __addr, float16x8x2_t __value) @@ -4849,13 +4291,6 @@ __arm_vst2q_f16 (float16_t * __addr, float16x8x2_t __value) __builtin_mve_vst2qv8hf (__addr, __rv.__o); } -__extension__ extern __inline void -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vst1q_p_f16 (float16_t * __addr, float16x8_t __value, mve_pred16_t __p) -{ - return __arm_vstrhq_p_f16 (__addr, __value, __p); -} - __extension__ extern __inline float32x4x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) __arm_vld4q_f32 (float32_t const * __addr) @@ -4874,27 +4309,13 @@ __arm_vld2q_f32 (float32_t const * __addr) return __rv.__i; } -__extension__ extern __inline float32x4_t -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vld1q_z_f32 (float32_t const *__base, mve_pred16_t __p) -{ - return __arm_vldrwq_z_f32 (__base, __p); -} - -__extension__ extern __inline void -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vst2q_f32 (float32_t * __addr, float32x4x2_t __value) -{ - union { float32x4x2_t __i; __builtin_neon_oi __o; } __rv; - __rv.__i = __value; - __builtin_mve_vst2qv4sf (__addr, __rv.__o); -} - __extension__ extern 
__inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vst1q_p_f32 (float32_t * __addr, float32x4_t __value, mve_pred16_t __p) +__arm_vst2q_f32 (float32_t * __addr, float32x4x2_t __value) { - return __arm_vstrwq_p_f32 (__addr, __value, __p); + union { float32x4x2_t __i; __builtin_neon_oi __o; } __rv; + __rv.__i = __value; + __builtin_mve_vst2qv4sf (__addr, __rv.__o); } __extension__ extern __inline float16x8_t @@ -5283,48 +4704,6 @@ __arm_vstrbq_scatter_offset (uint8_t * __base, uint16x8_t __offset, uint16x8_t _ __arm_vstrbq_scatter_offset_u16 (__base, __offset, __value); } -__extension__ extern __inline void -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vstrbq (int8_t * __addr, int8x16_t __value) -{ - __arm_vstrbq_s8 (__addr, __value); -} - -__extension__ extern __inline void -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vstrbq (int8_t * __addr, int32x4_t __value) -{ - __arm_vstrbq_s32 (__addr, __value); -} - -__extension__ extern __inline void -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vstrbq (int8_t * __addr, int16x8_t __value) -{ - __arm_vstrbq_s16 (__addr, __value); -} - -__extension__ extern __inline void -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vstrbq (uint8_t * __addr, uint8x16_t __value) -{ - __arm_vstrbq_u8 (__addr, __value); -} - -__extension__ extern __inline void -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vstrbq (uint8_t * __addr, uint32x4_t __value) -{ - __arm_vstrbq_u32 (__addr, __value); -} - -__extension__ extern __inline void -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vstrbq (uint8_t * __addr, uint16x8_t __value) -{ - __arm_vstrbq_u16 (__addr, __value); -} - __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) __arm_vstrwq_scatter_base (uint32x4_t __addr, const int __offset, 
int32x4_t __value) @@ -5381,48 +4760,6 @@ __arm_vldrbq_gather_offset (int8_t const * __base, uint32x4_t __offset) return __arm_vldrbq_gather_offset_s32 (__base, __offset); } -__extension__ extern __inline void -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vstrbq_p (int8_t * __addr, int8x16_t __value, mve_pred16_t __p) -{ - __arm_vstrbq_p_s8 (__addr, __value, __p); -} - -__extension__ extern __inline void -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vstrbq_p (int8_t * __addr, int32x4_t __value, mve_pred16_t __p) -{ - __arm_vstrbq_p_s32 (__addr, __value, __p); -} - -__extension__ extern __inline void -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vstrbq_p (int8_t * __addr, int16x8_t __value, mve_pred16_t __p) -{ - __arm_vstrbq_p_s16 (__addr, __value, __p); -} - -__extension__ extern __inline void -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vstrbq_p (uint8_t * __addr, uint8x16_t __value, mve_pred16_t __p) -{ - __arm_vstrbq_p_u8 (__addr, __value, __p); -} - -__extension__ extern __inline void -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vstrbq_p (uint8_t * __addr, uint32x4_t __value, mve_pred16_t __p) -{ - __arm_vstrbq_p_u32 (__addr, __value, __p); -} - -__extension__ extern __inline void -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vstrbq_p (uint8_t * __addr, uint16x8_t __value, mve_pred16_t __p) -{ - __arm_vstrbq_p_u16 (__addr, __value, __p); -} - __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) __arm_vstrbq_scatter_offset_p (int8_t * __base, uint8x16_t __offset, int8x16_t __value, mve_pred16_t __p) @@ -5857,90 +5194,6 @@ __arm_vstrhq_scatter_shifted_offset_p (uint16_t * __base, uint16x8_t __offset, u __arm_vstrhq_scatter_shifted_offset_p_u16 (__base, __offset, __value, __p); } -__extension__ extern __inline void -__attribute__ 
((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vstrhq (int16_t * __addr, int32x4_t __value) -{ - __arm_vstrhq_s32 (__addr, __value); -} - -__extension__ extern __inline void -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vstrhq (int16_t * __addr, int16x8_t __value) -{ - __arm_vstrhq_s16 (__addr, __value); -} - -__extension__ extern __inline void -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vstrhq (uint16_t * __addr, uint32x4_t __value) -{ - __arm_vstrhq_u32 (__addr, __value); -} - -__extension__ extern __inline void -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vstrhq (uint16_t * __addr, uint16x8_t __value) -{ - __arm_vstrhq_u16 (__addr, __value); -} - -__extension__ extern __inline void -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vstrhq_p (int16_t * __addr, int32x4_t __value, mve_pred16_t __p) -{ - __arm_vstrhq_p_s32 (__addr, __value, __p); -} - -__extension__ extern __inline void -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vstrhq_p (int16_t * __addr, int16x8_t __value, mve_pred16_t __p) -{ - __arm_vstrhq_p_s16 (__addr, __value, __p); -} - -__extension__ extern __inline void -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vstrhq_p (uint16_t * __addr, uint32x4_t __value, mve_pred16_t __p) -{ - __arm_vstrhq_p_u32 (__addr, __value, __p); -} - -__extension__ extern __inline void -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vstrhq_p (uint16_t * __addr, uint16x8_t __value, mve_pred16_t __p) -{ - __arm_vstrhq_p_u16 (__addr, __value, __p); -} - -__extension__ extern __inline void -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vstrwq (int32_t * __addr, int32x4_t __value) -{ - __arm_vstrwq_s32 (__addr, __value); -} - -__extension__ extern __inline void -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) 
-__arm_vstrwq (uint32_t * __addr, uint32x4_t __value) -{ - __arm_vstrwq_u32 (__addr, __value); -} - -__extension__ extern __inline void -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vstrwq_p (int32_t * __addr, int32x4_t __value, mve_pred16_t __p) -{ - __arm_vstrwq_p_s32 (__addr, __value, __p); -} - -__extension__ extern __inline void -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vstrwq_p (uint32_t * __addr, uint32x4_t __value, mve_pred16_t __p) -{ - __arm_vstrwq_p_u32 (__addr, __value, __p); -} - __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) __arm_vstrdq_scatter_base_p (uint64x2_t __addr, const int __offset, int64x2_t __value, mve_pred16_t __p) @@ -6837,20 +6090,6 @@ __arm_vsbcq_m (uint32x4_t __inactive, uint32x4_t __a, uint32x4_t __b, unsigned * return __arm_vsbcq_m_u32 (__inactive, __a, __b, __carry, __p); } -__extension__ extern __inline void -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vst1q_p (uint8_t * __addr, uint8x16_t __value, mve_pred16_t __p) -{ - __arm_vst1q_p_u8 (__addr, __value, __p); -} - -__extension__ extern __inline void -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vst1q_p (int8_t * __addr, int8x16_t __value, mve_pred16_t __p) -{ - __arm_vst1q_p_s8 (__addr, __value, __p); -} - __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) __arm_vst2q (int8_t * __addr, int8x16x2_t __value) @@ -6865,20 +6104,6 @@ __arm_vst2q (uint8_t * __addr, uint8x16x2_t __value) __arm_vst2q_u8 (__addr, __value); } -__extension__ extern __inline uint8x16_t -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vld1q_z (uint8_t const *__base, mve_pred16_t __p) -{ - return __arm_vld1q_z_u8 (__base, __p); -} - -__extension__ extern __inline int8x16_t -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) 
-__arm_vld1q_z (int8_t const *__base, mve_pred16_t __p) -{ - return __arm_vld1q_z_s8 (__base, __p); -} - __extension__ extern __inline int8x16x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) __arm_vld2q (int8_t const * __addr) @@ -6907,20 +6132,6 @@ __arm_vld4q (uint8_t const * __addr) return __arm_vld4q_u8 (__addr); } -__extension__ extern __inline void -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vst1q_p (uint16_t * __addr, uint16x8_t __value, mve_pred16_t __p) -{ - __arm_vst1q_p_u16 (__addr, __value, __p); -} - -__extension__ extern __inline void -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vst1q_p (int16_t * __addr, int16x8_t __value, mve_pred16_t __p) -{ - __arm_vst1q_p_s16 (__addr, __value, __p); -} - __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) __arm_vst2q (int16_t * __addr, int16x8x2_t __value) @@ -6935,20 +6146,6 @@ __arm_vst2q (uint16_t * __addr, uint16x8x2_t __value) __arm_vst2q_u16 (__addr, __value); } -__extension__ extern __inline uint16x8_t -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vld1q_z (uint16_t const *__base, mve_pred16_t __p) -{ - return __arm_vld1q_z_u16 (__base, __p); -} - -__extension__ extern __inline int16x8_t -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vld1q_z (int16_t const *__base, mve_pred16_t __p) -{ - return __arm_vld1q_z_s16 (__base, __p); -} - __extension__ extern __inline int16x8x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) __arm_vld2q (int16_t const * __addr) @@ -6977,20 +6174,6 @@ __arm_vld4q (uint16_t const * __addr) return __arm_vld4q_u16 (__addr); } -__extension__ extern __inline void -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vst1q_p (uint32_t * __addr, uint32x4_t __value, mve_pred16_t __p) -{ - __arm_vst1q_p_u32 (__addr, __value, __p); -} - -__extension__ 
extern __inline void -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vst1q_p (int32_t * __addr, int32x4_t __value, mve_pred16_t __p) -{ - __arm_vst1q_p_s32 (__addr, __value, __p); -} - __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) __arm_vst2q (int32_t * __addr, int32x4x2_t __value) @@ -7005,20 +6188,6 @@ __arm_vst2q (uint32_t * __addr, uint32x4x2_t __value) __arm_vst2q_u32 (__addr, __value); } -__extension__ extern __inline uint32x4_t -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vld1q_z (uint32_t const *__base, mve_pred16_t __p) -{ - return __arm_vld1q_z_u32 (__base, __p); -} - -__extension__ extern __inline int32x4_t -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vld1q_z (int32_t const *__base, mve_pred16_t __p) -{ - return __arm_vld1q_z_s32 (__base, __p); -} - __extension__ extern __inline int32x4x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) __arm_vld2q (int32_t const * __addr) @@ -7651,34 +6820,6 @@ __arm_vldrwq_gather_shifted_offset_z (float32_t const * __base, uint32x4_t __off return __arm_vldrwq_gather_shifted_offset_z_f32 (__base, __offset, __p); } -__extension__ extern __inline void -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vstrwq_p (float32_t * __addr, float32x4_t __value, mve_pred16_t __p) -{ - __arm_vstrwq_p_f32 (__addr, __value, __p); -} - -__extension__ extern __inline void -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vstrwq (float32_t * __addr, float32x4_t __value) -{ - __arm_vstrwq_f32 (__addr, __value); -} - -__extension__ extern __inline void -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vstrhq (float16_t * __addr, float16x8_t __value) -{ - __arm_vstrhq_f16 (__addr, __value); -} - -__extension__ extern __inline void -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) 
-__arm_vstrhq_p (float16_t * __addr, float16x8_t __value, mve_pred16_t __p) -{ - __arm_vstrhq_p_f16 (__addr, __value, __p); -} - __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) __arm_vstrhq_scatter_offset (float16_t * __base, uint16x8_t __offset, float16x8_t __value) @@ -7861,13 +7002,6 @@ __arm_vld2q (float16_t const * __addr) return __arm_vld2q_f16 (__addr); } -__extension__ extern __inline float16x8_t -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vld1q_z (float16_t const *__base, mve_pred16_t __p) -{ - return __arm_vld1q_z_f16 (__base, __p); -} - __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) __arm_vst2q (float16_t * __addr, float16x8x2_t __value) @@ -7875,13 +7009,6 @@ __arm_vst2q (float16_t * __addr, float16x8x2_t __value) __arm_vst2q_f16 (__addr, __value); } -__extension__ extern __inline void -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vst1q_p (float16_t * __addr, float16x8_t __value, mve_pred16_t __p) -{ - __arm_vst1q_p_f16 (__addr, __value, __p); -} - __extension__ extern __inline float32x4x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) __arm_vld4q (float32_t const * __addr) @@ -7896,13 +7023,6 @@ __arm_vld2q (float32_t const * __addr) return __arm_vld2q_f32 (__addr); } -__extension__ extern __inline float32x4_t -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vld1q_z (float32_t const *__base, mve_pred16_t __p) -{ - return __arm_vld1q_z_f32 (__base, __p); -} - __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) __arm_vst2q (float32_t * __addr, float32x4x2_t __value) @@ -7910,13 +7030,6 @@ __arm_vst2q (float32_t * __addr, float32x4x2_t __value) __arm_vst2q_f32 (__addr, __value); } -__extension__ extern __inline void -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) 
-__arm_vst1q_p (float32_t * __addr, float32x4_t __value, mve_pred16_t __p) -{ - __arm_vst1q_p_f32 (__addr, __value, __p); -} - __extension__ extern __inline float16x8_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) __arm_vsetq_lane (float16_t __a, float16x8_t __b, const int __idx) @@ -8428,17 +7541,6 @@ extern void *__ARM_undef; int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vornq_m_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce(__p2, float16x8_t), p3), \ int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vornq_m_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce(__p2, float32x4_t), p3));}) -#define __arm_vld1q_z(p0,p1) ( \ - _Generic( (int (*)[__ARM_mve_typeid(p0)])0, \ - int (*)[__ARM_mve_type_int8_t_ptr]: __arm_vld1q_z_s8 (__ARM_mve_coerce_s8_ptr(p0, int8_t *), p1), \ - int (*)[__ARM_mve_type_int16_t_ptr]: __arm_vld1q_z_s16 (__ARM_mve_coerce_s16_ptr(p0, int16_t *), p1), \ - int (*)[__ARM_mve_type_int32_t_ptr]: __arm_vld1q_z_s32 (__ARM_mve_coerce_s32_ptr(p0, int32_t *), p1), \ - int (*)[__ARM_mve_type_uint8_t_ptr]: __arm_vld1q_z_u8 (__ARM_mve_coerce_u8_ptr(p0, uint8_t *), p1), \ - int (*)[__ARM_mve_type_uint16_t_ptr]: __arm_vld1q_z_u16 (__ARM_mve_coerce_u16_ptr(p0, uint16_t *), p1), \ - int (*)[__ARM_mve_type_uint32_t_ptr]: __arm_vld1q_z_u32 (__ARM_mve_coerce_u32_ptr(p0, uint32_t *), p1), \ - int (*)[__ARM_mve_type_float16_t_ptr]: __arm_vld1q_z_f16 (__ARM_mve_coerce_f16_ptr(p0, float16_t *), p1), \ - int (*)[__ARM_mve_type_float32_t_ptr]: __arm_vld1q_z_f32 (__ARM_mve_coerce_f32_ptr(p0, float32_t *), p1))) - #define __arm_vld2q(p0) ( \ _Generic( (int (*)[__ARM_mve_typeid(p0)])0, \ int (*)[__ARM_mve_type_int8_t_ptr]: __arm_vld2q_s8 (__ARM_mve_coerce_s8_ptr(p0, int8_t *)), \ @@ -8517,17 +7619,6 @@ extern void *__ARM_undef; int 
(*)[__ARM_mve_type_uint32_t_ptr]: __arm_vldrwq_gather_shifted_offset_z_u32 (__ARM_mve_coerce_u32_ptr(p0, uint32_t *), p1, p2), \ int (*)[__ARM_mve_type_float32_t_ptr]: __arm_vldrwq_gather_shifted_offset_z_f32 (__ARM_mve_coerce_f32_ptr(p0, float32_t *), p1, p2))) -#define __arm_vst1q_p(p0,p1,p2) ({ __typeof(p1) __p1 = (p1); \ - _Generic( (int (*)[__ARM_mve_typeid(p0)][__ARM_mve_typeid(__p1)])0, \ - int (*)[__ARM_mve_type_int8_t_ptr][__ARM_mve_type_int8x16_t]: __arm_vst1q_p_s8 (__ARM_mve_coerce_s8_ptr(p0, int8_t *), __ARM_mve_coerce(__p1, int8x16_t), p2), \ - int (*)[__ARM_mve_type_int16_t_ptr][__ARM_mve_type_int16x8_t]: __arm_vst1q_p_s16 (__ARM_mve_coerce_s16_ptr(p0, int16_t *), __ARM_mve_coerce(__p1, int16x8_t), p2), \ - int (*)[__ARM_mve_type_int32_t_ptr][__ARM_mve_type_int32x4_t]: __arm_vst1q_p_s32 (__ARM_mve_coerce_s32_ptr(p0, int32_t *), __ARM_mve_coerce(__p1, int32x4_t), p2), \ - int (*)[__ARM_mve_type_uint8_t_ptr][__ARM_mve_type_uint8x16_t]: __arm_vst1q_p_u8 (__ARM_mve_coerce_u8_ptr(p0, uint8_t *), __ARM_mve_coerce(__p1, uint8x16_t), p2), \ - int (*)[__ARM_mve_type_uint16_t_ptr][__ARM_mve_type_uint16x8_t]: __arm_vst1q_p_u16 (__ARM_mve_coerce_u16_ptr(p0, uint16_t *), __ARM_mve_coerce(__p1, uint16x8_t), p2), \ - int (*)[__ARM_mve_type_uint32_t_ptr][__ARM_mve_type_uint32x4_t]: __arm_vst1q_p_u32 (__ARM_mve_coerce_u32_ptr(p0, uint32_t *), __ARM_mve_coerce(__p1, uint32x4_t), p2), \ - int (*)[__ARM_mve_type_float16_t_ptr][__ARM_mve_type_float16x8_t]: __arm_vst1q_p_f16 (__ARM_mve_coerce_f16_ptr(p0, float16_t *), __ARM_mve_coerce(__p1, float16x8_t), p2), \ - int (*)[__ARM_mve_type_float32_t_ptr][__ARM_mve_type_float32x4_t]: __arm_vst1q_p_f32 (__ARM_mve_coerce_f32_ptr(p0, float32_t *), __ARM_mve_coerce(__p1, float32x4_t), p2));}) - #define __arm_vst2q(p0,p1) ({ __typeof(p1) __p1 = (p1); \ _Generic( (int (*)[__ARM_mve_typeid(p0)][__ARM_mve_typeid(__p1)])0, \ int (*)[__ARM_mve_type_int8_t_ptr][__ARM_mve_type_int8x16x2_t]: __arm_vst2q_s8 (__ARM_mve_coerce_s8_ptr(p0, 
int8_t *), __ARM_mve_coerce(__p1, int8x16x2_t)), \ @@ -8933,15 +8024,6 @@ extern void *__ARM_undef; int (*)[__ARM_mve_type_int32_t_ptr]: __arm_vldrwq_gather_shifted_offset_z_s32 (__ARM_mve_coerce_s32_ptr(__p0, int32_t *), p1, p2), \ int (*)[__ARM_mve_type_uint32_t_ptr]: __arm_vldrwq_gather_shifted_offset_z_u32 (__ARM_mve_coerce_u32_ptr(__p0, uint32_t *), p1, p2));}) -#define __arm_vst1q_p(p0,p1,p2) ({ __typeof(p1) __p1 = (p1); \ - _Generic( (int (*)[__ARM_mve_typeid(p0)][__ARM_mve_typeid(__p1)])0, \ - int (*)[__ARM_mve_type_int8_t_ptr][__ARM_mve_type_int8x16_t]: __arm_vst1q_p_s8 (__ARM_mve_coerce_s8_ptr(p0, int8_t *), __ARM_mve_coerce(__p1, int8x16_t), p2), \ - int (*)[__ARM_mve_type_int16_t_ptr][__ARM_mve_type_int16x8_t]: __arm_vst1q_p_s16 (__ARM_mve_coerce_s16_ptr(p0, int16_t *), __ARM_mve_coerce(__p1, int16x8_t), p2), \ - int (*)[__ARM_mve_type_int32_t_ptr][__ARM_mve_type_int32x4_t]: __arm_vst1q_p_s32 (__ARM_mve_coerce_s32_ptr(p0, int32_t *), __ARM_mve_coerce(__p1, int32x4_t), p2), \ - int (*)[__ARM_mve_type_uint8_t_ptr][__ARM_mve_type_uint8x16_t]: __arm_vst1q_p_u8 (__ARM_mve_coerce_u8_ptr(p0, uint8_t *), __ARM_mve_coerce(__p1, uint8x16_t), p2), \ - int (*)[__ARM_mve_type_uint16_t_ptr][__ARM_mve_type_uint16x8_t]: __arm_vst1q_p_u16 (__ARM_mve_coerce_u16_ptr(p0, uint16_t *), __ARM_mve_coerce(__p1, uint16x8_t), p2), \ - int (*)[__ARM_mve_type_uint32_t_ptr][__ARM_mve_type_uint32x4_t]: __arm_vst1q_p_u32 (__ARM_mve_coerce_u32_ptr(p0, uint32_t *), __ARM_mve_coerce(__p1, uint32x4_t), p2));}) - #define __arm_vst2q(p0,p1) ({ __typeof(p1) __p1 = (p1); \ _Generic( (int (*)[__ARM_mve_typeid(p0)][__ARM_mve_typeid(__p1)])0, \ int (*)[__ARM_mve_type_int8_t_ptr][__ARM_mve_type_int8x16x2_t]: __arm_vst2q_s8 (__ARM_mve_coerce_s8_ptr(p0, int8_t *), __ARM_mve_coerce(__p1, int8x16x2_t)), \ @@ -8951,20 +8033,6 @@ extern void *__ARM_undef; int (*)[__ARM_mve_type_uint16_t_ptr][__ARM_mve_type_uint16x8x2_t]: __arm_vst2q_u16 (__ARM_mve_coerce_u16_ptr(p0, uint16_t *), __ARM_mve_coerce(__p1, 
uint16x8x2_t)), \ int (*)[__ARM_mve_type_uint32_t_ptr][__ARM_mve_type_uint32x4x2_t]: __arm_vst2q_u32 (__ARM_mve_coerce_u32_ptr(p0, uint32_t *), __ARM_mve_coerce(__p1, uint32x4x2_t)));}) -#define __arm_vstrhq(p0,p1) ({ __typeof(p1) __p1 = (p1); \ - _Generic( (int (*)[__ARM_mve_typeid(p0)][__ARM_mve_typeid(__p1)])0, \ - int (*)[__ARM_mve_type_int16_t_ptr][__ARM_mve_type_int16x8_t]: __arm_vstrhq_s16 (__ARM_mve_coerce_s16_ptr(p0, int16_t *), __ARM_mve_coerce(__p1, int16x8_t)), \ - int (*)[__ARM_mve_type_int16_t_ptr][__ARM_mve_type_int32x4_t]: __arm_vstrhq_s32 (__ARM_mve_coerce_s16_ptr(p0, int16_t *), __ARM_mve_coerce(__p1, int32x4_t)), \ - int (*)[__ARM_mve_type_uint16_t_ptr][__ARM_mve_type_uint16x8_t]: __arm_vstrhq_u16 (__ARM_mve_coerce_u16_ptr(p0, uint16_t *), __ARM_mve_coerce(__p1, uint16x8_t)), \ - int (*)[__ARM_mve_type_uint16_t_ptr][__ARM_mve_type_uint32x4_t]: __arm_vstrhq_u32 (__ARM_mve_coerce_u16_ptr(p0, uint16_t *), __ARM_mve_coerce(__p1, uint32x4_t)));}) - -#define __arm_vstrhq_p(p0,p1,p2) ({ __typeof(p1) __p1 = (p1); \ - _Generic( (int (*)[__ARM_mve_typeid(p0)][__ARM_mve_typeid(__p1)])0, \ - int (*)[__ARM_mve_type_int16_t_ptr][__ARM_mve_type_int16x8_t]: __arm_vstrhq_p_s16 (__ARM_mve_coerce_s16_ptr(p0, int16_t *), __ARM_mve_coerce(__p1, int16x8_t), p2), \ - int (*)[__ARM_mve_type_int16_t_ptr][__ARM_mve_type_int32x4_t]: __arm_vstrhq_p_s32 (__ARM_mve_coerce_s16_ptr(p0, int16_t *), __ARM_mve_coerce(__p1, int32x4_t), p2), \ - int (*)[__ARM_mve_type_uint16_t_ptr][__ARM_mve_type_uint16x8_t]: __arm_vstrhq_p_u16 (__ARM_mve_coerce_u16_ptr(p0, uint16_t *), __ARM_mve_coerce(__p1, uint16x8_t), p2), \ - int (*)[__ARM_mve_type_uint16_t_ptr][__ARM_mve_type_uint32x4_t]: __arm_vstrhq_p_u32 (__ARM_mve_coerce_u16_ptr(p0, uint16_t *), __ARM_mve_coerce(__p1, uint32x4_t), p2));}) - #define __arm_vstrhq_scatter_offset_p(p0,p1,p2,p3) ({ __typeof(p1) __p1 = (p1); \ __typeof(p2) __p2 = (p2); \ _Generic( (int (*)[__ARM_mve_typeid(p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, 
\ @@ -8997,17 +8065,6 @@ extern void *__ARM_undef; int (*)[__ARM_mve_type_uint16_t_ptr][__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vstrhq_scatter_shifted_offset_u16 (__ARM_mve_coerce_u16_ptr(p0, uint16_t *), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t)), \ int (*)[__ARM_mve_type_uint16_t_ptr][__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vstrhq_scatter_shifted_offset_u32 (__ARM_mve_coerce_u16_ptr(p0, uint16_t *), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t)));}) - -#define __arm_vstrwq(p0,p1) ({ __typeof(p1) __p1 = (p1); \ - _Generic( (int (*)[__ARM_mve_typeid(p0)][__ARM_mve_typeid(__p1)])0, \ - int (*)[__ARM_mve_type_int32_t_ptr][__ARM_mve_type_int32x4_t]: __arm_vstrwq_s32 (__ARM_mve_coerce_s32_ptr(p0, int32_t *), __ARM_mve_coerce(__p1, int32x4_t)), \ - int (*)[__ARM_mve_type_uint32_t_ptr][__ARM_mve_type_uint32x4_t]: __arm_vstrwq_u32 (__ARM_mve_coerce_u32_ptr(p0, uint32_t *), __ARM_mve_coerce(__p1, uint32x4_t)));}) - -#define __arm_vstrwq_p(p0,p1,p2) ({ __typeof(p1) __p1 = (p1); \ - _Generic( (int (*)[__ARM_mve_typeid(p0)][__ARM_mve_typeid(__p1)])0, \ - int (*)[__ARM_mve_type_int32_t_ptr][__ARM_mve_type_int32x4_t]: __arm_vstrwq_p_s32 (__ARM_mve_coerce_s32_ptr(p0, int32_t *), __ARM_mve_coerce(__p1, int32x4_t), p2), \ - int (*)[__ARM_mve_type_uint32_t_ptr][__ARM_mve_type_uint32x4_t]: __arm_vstrwq_p_u32 (__ARM_mve_coerce_u32_ptr(p0, uint32_t *), __ARM_mve_coerce(__p1, uint32x4_t), p2));}) - #define __arm_vstrdq_scatter_base_p(p0,p1,p2,p3) ({ __typeof(p2) __p2 = (p2); \ _Generic( (int (*)[__ARM_mve_typeid(__p2)])0, \ int (*)[__ARM_mve_type_int64x2_t]: __arm_vstrdq_scatter_base_p_s64 (p0, p1, __ARM_mve_coerce(__p2, int64x2_t), p3), \ @@ -9105,14 +8162,6 @@ extern void *__ARM_undef; int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vbicq_x_u16 (__ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3), \ int 
(*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vbicq_x_u32 (__ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3));}) -#define __arm_vld1q_z(p0,p1) ( _Generic( (int (*)[__ARM_mve_typeid(p0)])0, \ - int (*)[__ARM_mve_type_int8_t_ptr]: __arm_vld1q_z_s8 (__ARM_mve_coerce_s8_ptr(p0, int8_t *), p1), \ - int (*)[__ARM_mve_type_int16_t_ptr]: __arm_vld1q_z_s16 (__ARM_mve_coerce_s16_ptr(p0, int16_t *), p1), \ - int (*)[__ARM_mve_type_int32_t_ptr]: __arm_vld1q_z_s32 (__ARM_mve_coerce_s32_ptr(p0, int32_t *), p1), \ - int (*)[__ARM_mve_type_uint8_t_ptr]: __arm_vld1q_z_u8 (__ARM_mve_coerce_u8_ptr(p0, uint8_t *), p1), \ - int (*)[__ARM_mve_type_uint16_t_ptr]: __arm_vld1q_z_u16 (__ARM_mve_coerce_u16_ptr(p0, uint16_t *), p1), \ - int (*)[__ARM_mve_type_uint32_t_ptr]: __arm_vld1q_z_u32 (__ARM_mve_coerce_u32_ptr(p0, uint32_t *), p1))) - #define __arm_vld2q(p0) ( _Generic( (int (*)[__ARM_mve_typeid(p0)])0, \ int (*)[__ARM_mve_type_int8_t_ptr]: __arm_vld2q_s8 (__ARM_mve_coerce_s8_ptr(p0, int8_t *)), \ int (*)[__ARM_mve_type_int16_t_ptr]: __arm_vld2q_s16 (__ARM_mve_coerce_s16_ptr(p0, int16_t *)), \ @@ -9428,25 +8477,6 @@ extern void *__ARM_undef; int (*)[__ARM_mve_type_uint16x8_t]: __arm_vshlcq_m_u16 (__ARM_mve_coerce(__p0, uint16x8_t), p1, p2, p3), \ int (*)[__ARM_mve_type_uint32x4_t]: __arm_vshlcq_m_u32 (__ARM_mve_coerce(__p0, uint32x4_t), p1, p2, p3));}) -#define __arm_vstrbq(p0,p1) ({ __typeof(p1) __p1 = (p1); \ - _Generic( (int (*)[__ARM_mve_typeid(p0)][__ARM_mve_typeid(__p1)])0, \ - int (*)[__ARM_mve_type_int8_t_ptr][__ARM_mve_type_int8x16_t]: __arm_vstrbq_s8 (__ARM_mve_coerce_s8_ptr(p0, int8_t *), __ARM_mve_coerce(__p1, int8x16_t)), \ - int (*)[__ARM_mve_type_int8_t_ptr][__ARM_mve_type_int16x8_t]: __arm_vstrbq_s16 (__ARM_mve_coerce_s8_ptr(p0, int8_t *), __ARM_mve_coerce(__p1, int16x8_t)), \ - int (*)[__ARM_mve_type_int8_t_ptr][__ARM_mve_type_int32x4_t]: __arm_vstrbq_s32 (__ARM_mve_coerce_s8_ptr(p0, int8_t *), __ARM_mve_coerce(__p1, 
int32x4_t)), \ - int (*)[__ARM_mve_type_uint8_t_ptr][__ARM_mve_type_uint8x16_t]: __arm_vstrbq_u8 (__ARM_mve_coerce_u8_ptr(p0, uint8_t *), __ARM_mve_coerce(__p1, uint8x16_t)), \ - int (*)[__ARM_mve_type_uint8_t_ptr][__ARM_mve_type_uint16x8_t]: __arm_vstrbq_u16 (__ARM_mve_coerce_u8_ptr(p0, uint8_t *), __ARM_mve_coerce(__p1, uint16x8_t)), \ - int (*)[__ARM_mve_type_uint8_t_ptr][__ARM_mve_type_uint32x4_t]: __arm_vstrbq_u32 (__ARM_mve_coerce_u8_ptr(p0, uint8_t *), __ARM_mve_coerce(__p1, uint32x4_t)));}) - -#define __arm_vstrbq_p(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \ - __typeof(p1) __p1 = (p1); \ - _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \ - int (*)[__ARM_mve_type_int8_t_ptr][__ARM_mve_type_int8x16_t]: __arm_vstrbq_p_s8 (__ARM_mve_coerce_s8_ptr(__p0, int8_t *), __ARM_mve_coerce(__p1, int8x16_t), p2), \ - int (*)[__ARM_mve_type_int8_t_ptr][__ARM_mve_type_int16x8_t]: __arm_vstrbq_p_s16 (__ARM_mve_coerce_s8_ptr(__p0, int8_t *), __ARM_mve_coerce(__p1, int16x8_t), p2), \ - int (*)[__ARM_mve_type_int8_t_ptr][__ARM_mve_type_int32x4_t]: __arm_vstrbq_p_s32 (__ARM_mve_coerce_s8_ptr(__p0, int8_t *), __ARM_mve_coerce(__p1, int32x4_t), p2), \ - int (*)[__ARM_mve_type_uint8_t_ptr][__ARM_mve_type_uint8x16_t]: __arm_vstrbq_p_u8 (__ARM_mve_coerce_u8_ptr(__p0, uint8_t *), __ARM_mve_coerce(__p1, uint8x16_t), p2), \ - int (*)[__ARM_mve_type_uint8_t_ptr][__ARM_mve_type_uint16x8_t]: __arm_vstrbq_p_u16 (__ARM_mve_coerce_u8_ptr(__p0, uint8_t *), __ARM_mve_coerce(__p1, uint16x8_t), p2), \ - int (*)[__ARM_mve_type_uint8_t_ptr][__ARM_mve_type_uint32x4_t]: __arm_vstrbq_p_u32 (__ARM_mve_coerce_u8_ptr(__p0, uint8_t *), __ARM_mve_coerce(__p1, uint32x4_t), p2));}) - #define __arm_vstrdq_scatter_base(p0,p1,p2) ({ __typeof(p2) __p2 = (p2); \ _Generic( (int (*)[__ARM_mve_typeid(__p2)])0, \ int (*)[__ARM_mve_type_int64x2_t]: __arm_vstrdq_scatter_base_s64 (p0, p1, __ARM_mve_coerce(__p2, int64x2_t)), \ diff --git a/gcc/config/arm/arm_mve_builtins.def 
b/gcc/config/arm/arm_mve_builtins.def
index f141aab816c..08ae37170b3 100644
--- a/gcc/config/arm/arm_mve_builtins.def
+++ b/gcc/config/arm/arm_mve_builtins.def
@@ -669,20 +669,14 @@ VAR2 (QUADOP_NONE_NONE_NONE_NONE_PRED, vandq_m_f, v8hf, v4sf)
 VAR2 (QUADOP_NONE_NONE_NONE_NONE_PRED, vaddq_m_n_f, v8hf, v4sf)
 VAR2 (QUADOP_NONE_NONE_NONE_NONE_PRED, vaddq_m_f, v8hf, v4sf)
 VAR2 (QUADOP_NONE_NONE_NONE_NONE_PRED, vabdq_m_f, v8hf, v4sf)
-VAR3 (STRS, vstrbq_s, v16qi, v8hi, v4si)
-VAR3 (STRU, vstrbq_u, v16qi, v8hi, v4si)
 VAR3 (STRSS, vstrbq_scatter_offset_s, v16qi, v8hi, v4si)
 VAR3 (STRSU, vstrbq_scatter_offset_u, v16qi, v8hi, v4si)
 VAR1 (STRSBS, vstrwq_scatter_base_s, v4si)
 VAR1 (STRSBU, vstrwq_scatter_base_u, v4si)
 VAR3 (LDRGU, vldrbq_gather_offset_u, v16qi, v8hi, v4si)
 VAR3 (LDRGS, vldrbq_gather_offset_s, v16qi, v8hi, v4si)
-VAR3 (LDRS, vldrbq_s, v16qi, v8hi, v4si)
-VAR3 (LDRU, vldrbq_u, v16qi, v8hi, v4si)
 VAR1 (LDRGBS, vldrwq_gather_base_s, v4si)
 VAR1 (LDRGBU, vldrwq_gather_base_u, v4si)
-VAR3 (STRS_P, vstrbq_p_s, v16qi, v8hi, v4si)
-VAR3 (STRU_P, vstrbq_p_u, v16qi, v8hi, v4si)
 VAR3 (STRSS_P, vstrbq_scatter_offset_p_s, v16qi, v8hi, v4si)
 VAR3 (STRSU_P, vstrbq_scatter_offset_p_u, v16qi, v8hi, v4si)
 VAR1 (STRSBS_P, vstrwq_scatter_base_p_s, v4si)
@@ -691,15 +685,6 @@ VAR1 (LDRGBS_Z, vldrwq_gather_base_z_s, v4si)
 VAR1 (LDRGBU_Z, vldrwq_gather_base_z_u, v4si)
 VAR3 (LDRGS_Z, vldrbq_gather_offset_z_s, v16qi, v8hi, v4si)
 VAR3 (LDRGU_Z, vldrbq_gather_offset_z_u, v16qi, v8hi, v4si)
-VAR3 (LDRS_Z, vldrbq_z_s, v16qi, v8hi, v4si)
-VAR3 (LDRU_Z, vldrbq_z_u, v16qi, v8hi, v4si)
-VAR3 (LDRU, vld1q_u, v16qi, v8hi, v4si)
-VAR3 (LDRS, vld1q_s, v16qi, v8hi, v4si)
-VAR2 (LDRU_Z, vldrhq_z_u, v8hi, v4si)
-VAR2 (LDRU, vldrhq_u, v8hi, v4si)
-VAR2 (LDRS_Z, vldrhq_z_s, v8hi, v4si)
-VAR2 (LDRS, vldrhq_s, v8hi, v4si)
-VAR2 (LDRS, vld1q_f, v8hf, v4sf)
 VAR2 (LDRGU_Z, vldrhq_gather_shifted_offset_z_u, v8hi, v4si)
 VAR2 (LDRGU_Z, vldrhq_gather_offset_z_u, v8hi, v4si)
 VAR2 (LDRGU, vldrhq_gather_shifted_offset_u, v8hi, v4si)
@@ -708,14 +693,6 @@ VAR2 (LDRGS_Z, vldrhq_gather_shifted_offset_z_s, v8hi, v4si)
 VAR2 (LDRGS_Z, vldrhq_gather_offset_z_s, v8hi, v4si)
 VAR2 (LDRGS, vldrhq_gather_shifted_offset_s, v8hi, v4si)
 VAR2 (LDRGS, vldrhq_gather_offset_s, v8hi, v4si)
-VAR1 (LDRS, vldrhq_f, v8hf)
-VAR1 (LDRS_Z, vldrhq_z_f, v8hf)
-VAR1 (LDRS, vldrwq_f, v4sf)
-VAR1 (LDRS, vldrwq_s, v4si)
-VAR1 (LDRU, vldrwq_u, v4si)
-VAR1 (LDRS_Z, vldrwq_z_f, v4sf)
-VAR1 (LDRS_Z, vldrwq_z_s, v4si)
-VAR1 (LDRU_Z, vldrwq_z_u, v4si)
 VAR1 (LDRGBS, vldrdq_gather_base_s, v2di)
 VAR1 (LDRGBS, vldrwq_gather_base_f, v4sf)
 VAR1 (LDRGBS_Z, vldrdq_gather_base_z_s, v2di)
@@ -746,13 +723,6 @@ VAR1 (LDRGU_Z, vldrdq_gather_offset_z_u, v2di)
 VAR1 (LDRGU_Z, vldrdq_gather_shifted_offset_z_u, v2di)
 VAR1 (LDRGU_Z, vldrwq_gather_offset_z_u, v4si)
 VAR1 (LDRGU_Z, vldrwq_gather_shifted_offset_z_u, v4si)
-VAR3 (STRU, vst1q_u, v16qi, v8hi, v4si)
-VAR3 (STRS, vst1q_s, v16qi, v8hi, v4si)
-VAR2 (STRU_P, vstrhq_p_u, v8hi, v4si)
-VAR2 (STRU, vstrhq_u, v8hi, v4si)
-VAR2 (STRS_P, vstrhq_p_s, v8hi, v4si)
-VAR2 (STRS, vstrhq_s, v8hi, v4si)
-VAR2 (STRS, vst1q_f, v8hf, v4sf)
 VAR2 (STRSU_P, vstrhq_scatter_shifted_offset_p_u, v8hi, v4si)
 VAR2 (STRSU_P, vstrhq_scatter_offset_p_u, v8hi, v4si)
 VAR2 (STRSU, vstrhq_scatter_shifted_offset_u, v8hi, v4si)
@@ -761,14 +731,6 @@ VAR2 (STRSS_P, vstrhq_scatter_shifted_offset_p_s, v8hi, v4si)
 VAR2 (STRSS_P, vstrhq_scatter_offset_p_s, v8hi, v4si)
 VAR2 (STRSS, vstrhq_scatter_shifted_offset_s, v8hi, v4si)
 VAR2 (STRSS, vstrhq_scatter_offset_s, v8hi, v4si)
-VAR1 (STRS, vstrhq_f, v8hf)
-VAR1 (STRS_P, vstrhq_p_f, v8hf)
-VAR1 (STRS, vstrwq_f, v4sf)
-VAR1 (STRS, vstrwq_s, v4si)
-VAR1 (STRU, vstrwq_u, v4si)
-VAR1 (STRS_P, vstrwq_p_f, v4sf)
-VAR1 (STRS_P, vstrwq_p_s, v4si)
-VAR1 (STRU_P, vstrwq_p_u, v4si)
 VAR1 (STRSBS, vstrdq_scatter_base_s, v2di)
 VAR1 (STRSBS, vstrwq_scatter_base_f, v4sf)
 VAR1 (STRSBS_P, vstrdq_scatter_base_p_s, v2di)
diff --git a/gcc/config/arm/iterators.md
b/gcc/config/arm/iterators.md
index b9ff01cb104..d67e0be1788 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -283,6 +283,14 @@ (define_mode_iterator MVE_7_HI [HI V16BI V8BI V4BI V2QI])
 (define_mode_iterator MVE_V8HF [V8HF])
 (define_mode_iterator MVE_V16QI [V16QI])
+;; Types for MVE truncating stores and widening loads
+(define_mode_iterator MVE_w_narrow_TYPE [V8QI V4QI V4HI])
+(define_mode_attr MVE_w_narrow_type [(V8QI "v8qi") (V4QI "v4qi") (V4HI "v4hi")])
+(define_mode_attr MVE_wide_n_TYPE [(V8QI "V8HI") (V4QI "V4SI") (V4HI "V4SI")])
+(define_mode_attr MVE_wide_n_type [(V8QI "v8hi") (V4QI "v4si") (V4HI "v4si")])
+(define_mode_attr MVE_wide_n_sz_elem [(V8QI "16") (V4QI "32") (V4HI "32")])
+(define_mode_attr MVE_wide_n_VPRED [(V8QI "V8BI") (V4QI "V4BI") (V4HI "V4BI")])
+
 ;;----------------------------------------------------------------------------
 ;; Code iterators
 ;;----------------------------------------------------------------------------
@@ -1769,6 +1777,10 @@ (define_mode_attr V_elem_ch [(V8QI "b") (V16QI "b")
 				 (V2SF "s") (V4SF "s")
 				 (V2SF "s") (V4SF "s")])
+(define_mode_attr MVE_elem_ch [(V4QI "b") (V8QI "b") (V16QI "b")
+			       (V4HI "h") (V8HI "h") (V8HF "h")
+			       (V4SI "w") (V4SF "w")])
+
 (define_mode_attr VH_elem_ch [(V4HI "s") (V8HI "s")
			      (V4HF "s") (V8HF "s")
			      (HF "s")])
@@ -2472,19 +2484,16 @@ (define_int_attr supf [(VCVTQ_TO_F_S "s") (VCVTQ_TO_F_U "u") (VREV16Q_S "s")
 		       (VMLALDAVAXQ_P_S "s")
 		       (VMLALDAVAQ_P_S "s") (VMLALDAVAQ_P_U "u")
 		       (VSTRWQSB_S "s") (VSTRWQSB_U "u") (VSTRBQSO_S "s")
-		       (VSTRBQSO_U "u") (VSTRBQ_S "s") (VSTRBQ_U "u")
-		       (VLDRBQGO_S "s") (VLDRBQGO_U "u") (VLDRBQ_S "s")
-		       (VLDRBQ_U "u") (VLDRWQGB_S "s") (VLDRWQGB_U "u")
-		       (VLD1Q_S "s") (VLD1Q_U "u") (VLDRHQGO_S "s")
+		       (VSTRBQSO_U "u")
+		       (VLDRBQGO_S "s") (VLDRBQGO_U "u") (VLDRWQGB_S "s")
+		       (VLDRWQGB_U "u") (VLDRHQGO_S "s")
 		       (VLDRHQGO_U "u") (VLDRHQGSO_S "s") (VLDRHQGSO_U "u")
-		       (VLDRHQ_S "s") (VLDRHQ_U "u") (VLDRWQ_S "s")
-		       (VLDRWQ_U "u") (VLDRDQGB_S "s") (VLDRDQGB_U "u")
+		       (VLDRDQGB_S "s") (VLDRDQGB_U "u")
 		       (VLDRDQGO_S "s") (VLDRDQGO_U "u") (VLDRDQGSO_S "s")
 		       (VLDRDQGSO_U "u") (VLDRWQGO_S "s") (VLDRWQGO_U "u")
-		       (VLDRWQGSO_S "s") (VLDRWQGSO_U "u") (VST1Q_S "s")
-		       (VST1Q_U "u") (VSTRHQSO_S "s") (VSTRHQSO_U "u")
-		       (VSTRHQSSO_S "s") (VSTRHQSSO_U "u") (VSTRHQ_S "s")
-		       (VSTRHQ_U "u") (VSTRWQ_S "s") (VSTRWQ_U "u")
+		       (VLDRWQGSO_S "s") (VLDRWQGSO_U "u")
+		       (VSTRHQSO_S "s") (VSTRHQSO_U "u")
+		       (VSTRHQSSO_S "s") (VSTRHQSSO_U "u")
 		       (VSTRDQSB_S "s") (VSTRDQSB_U "u") (VSTRDQSO_S "s")
 		       (VSTRDQSO_U "u") (VSTRDQSSO_S "s") (VSTRDQSSO_U "u")
 		       (VSTRWQSO_U "u") (VSTRWQSO_S "s") (VSTRWQSSO_U "u")
@@ -2899,25 +2908,17 @@ (define_int_iterator VSHRNBQ_M_N [VSHRNBQ_M_N_S VSHRNBQ_M_N_U])
 (define_int_iterator VSHRNTQ_M_N [VSHRNTQ_M_N_S VSHRNTQ_M_N_U])
 (define_int_iterator VSTRWSBQ [VSTRWQSB_S VSTRWQSB_U])
 (define_int_iterator VSTRBSOQ [VSTRBQSO_S VSTRBQSO_U])
-(define_int_iterator VSTRBQ [VSTRBQ_S VSTRBQ_U])
 (define_int_iterator VLDRBGOQ [VLDRBQGO_S VLDRBQGO_U])
-(define_int_iterator VLDRBQ [VLDRBQ_S VLDRBQ_U])
 (define_int_iterator VLDRWGBQ [VLDRWQGB_S VLDRWQGB_U])
-(define_int_iterator VLD1Q [VLD1Q_S VLD1Q_U])
 (define_int_iterator VLDRHGOQ [VLDRHQGO_S VLDRHQGO_U])
 (define_int_iterator VLDRHGSOQ [VLDRHQGSO_S VLDRHQGSO_U])
-(define_int_iterator VLDRHQ [VLDRHQ_S VLDRHQ_U])
-(define_int_iterator VLDRWQ [VLDRWQ_S VLDRWQ_U])
 (define_int_iterator VLDRDGBQ [VLDRDQGB_S VLDRDQGB_U])
 (define_int_iterator VLDRDGOQ [VLDRDQGO_S VLDRDQGO_U])
 (define_int_iterator VLDRDGSOQ [VLDRDQGSO_S VLDRDQGSO_U])
 (define_int_iterator VLDRWGOQ [VLDRWQGO_S VLDRWQGO_U])
 (define_int_iterator VLDRWGSOQ [VLDRWQGSO_S VLDRWQGSO_U])
-(define_int_iterator VST1Q [VST1Q_S VST1Q_U])
 (define_int_iterator VSTRHSOQ [VSTRHQSO_S VSTRHQSO_U])
 (define_int_iterator VSTRHSSOQ [VSTRHQSSO_S VSTRHQSSO_U])
-(define_int_iterator VSTRHQ [VSTRHQ_S VSTRHQ_U])
-(define_int_iterator VSTRWQ [VSTRWQ_S VSTRWQ_U])
 (define_int_iterator VSTRDSBQ [VSTRDQSB_S VSTRDQSB_U])
 (define_int_iterator VSTRDSOQ [VSTRDQSO_S VSTRDQSO_U])
(define_int_iterator VSTRDSSOQ [VSTRDQSSO_S VSTRDQSSO_U]) diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md index 706a45c7d66..17fa4d0182e 100644 --- a/gcc/config/arm/mve.md +++ b/gcc/config/arm/mve.md @@ -3354,26 +3354,6 @@ (define_insn "mve_vornq_m_f" (set_attr "type" "mve_move") (set_attr "length""8")]) -;; -;; [vstrbq_s vstrbq_u] -;; -(define_insn "mve_vstrbq_" - [(set (match_operand: 0 "mve_memory_operand" "=Ux") - (unspec: [(match_operand:MVE_2 1 "s_register_operand" "w")] - VSTRBQ)) - ] - "TARGET_HAVE_MVE" -{ - rtx ops[2]; - int regno = REGNO (operands[1]); - ops[1] = gen_rtx_REG (TImode, regno); - ops[0] = operands[0]; - output_asm_insn("vstrb.\t%q1, %E0",ops); - return ""; -} - [(set (attr "mve_unpredicated_insn") (symbol_ref "CODE_FOR_mve_vstrbq_")) - (set_attr "length" "4")]) - ;; ;; [vstrbq_scatter_offset_s vstrbq_scatter_offset_u] ;; @@ -3450,29 +3430,6 @@ (define_insn "mve_vldrbq_gather_offset_" [(set (attr "mve_unpredicated_insn") (symbol_ref "CODE_FOR_mve_vldrbq_gather_offset_")) (set_attr "length" "4")]) -;; -;; [vldrbq_s vldrbq_u] -;; -(define_insn "mve_vldrbq_" - [(set (match_operand:MVE_2 0 "s_register_operand" "=w") - (unspec:MVE_2 [(match_operand: 1 "mve_memory_operand" "Ux")] - VLDRBQ)) - ] - "TARGET_HAVE_MVE" -{ - rtx ops[2]; - int regno = REGNO (operands[0]); - ops[0] = gen_rtx_REG (TImode, regno); - ops[1] = operands[1]; - if ( == 8) - output_asm_insn ("vldrb.\t%q0, %E1",ops); - else - output_asm_insn ("vldrb.\t%q0, %E1",ops); - return ""; -} - [(set (attr "mve_unpredicated_insn") (symbol_ref "CODE_FOR_mve_vldrbq_")) - (set_attr "length" "4")]) - ;; ;; [vldrwq_gather_base_s vldrwq_gather_base_u] ;; @@ -3551,25 +3508,6 @@ (define_insn "mve_vstrwq_scatter_base_p_v4si" [(set (attr "mve_unpredicated_insn") (symbol_ref "CODE_FOR_mve_vstrwq_scatter_base_v4si")) (set_attr "length" "8")]) -(define_insn "mve_vstrbq_p_" - [(set (match_operand: 0 "mve_memory_operand" "=Ux") - (unspec: - [(match_operand:MVE_2 1 "s_register_operand" "w") - 
(match_operand: 2 "vpr_register_operand" "Up") - (match_dup 0)] - VSTRBQ))] - "TARGET_HAVE_MVE" -{ - rtx ops[2]; - int regno = REGNO (operands[1]); - ops[1] = gen_rtx_REG (TImode, regno); - ops[0] = operands[0]; - output_asm_insn ("vpst\;vstrbt.\t%q1, %E0",ops); - return ""; -} - [(set (attr "mve_unpredicated_insn") (symbol_ref "CODE_FOR_mve_vstrbq_")) - (set_attr "length" "8")]) - ;; ;; [vldrbq_gather_offset_z_s vldrbq_gather_offset_z_u] ;; @@ -3596,30 +3534,6 @@ (define_insn "mve_vldrbq_gather_offset_z_" [(set (attr "mve_unpredicated_insn") (symbol_ref "CODE_FOR_mve_vldrbq_gather_offset_")) (set_attr "length" "8")]) -;; -;; [vldrbq_z_s vldrbq_z_u] -;; -(define_insn "mve_vldrbq_z_" - [(set (match_operand:MVE_2 0 "s_register_operand" "=w") - (unspec:MVE_2 [(match_operand: 1 "mve_memory_operand" "Ux") - (match_operand: 2 "vpr_register_operand" "Up")] - VLDRBQ)) - ] - "TARGET_HAVE_MVE" -{ - rtx ops[2]; - int regno = REGNO (operands[0]); - ops[0] = gen_rtx_REG (TImode, regno); - ops[1] = operands[1]; - if ( == 8) - output_asm_insn ("vpst\;vldrbt.\t%q0, %E1",ops); - else - output_asm_insn ("vpst\;vldrbt.\t%q0, %E1",ops); - return ""; -} - [(set (attr "mve_unpredicated_insn") (symbol_ref "CODE_FOR_mve_vldrbq_")) - (set_attr "length" "8")]) - ;; ;; [vldrwq_gather_base_z_s vldrwq_gather_base_z_u] ;; @@ -3642,26 +3556,6 @@ (define_insn "mve_vldrwq_gather_base_z_v4si" [(set (attr "mve_unpredicated_insn") (symbol_ref "CODE_FOR_mve_vldrwq_gather_base_v4si")) (set_attr "length" "8")]) -;; -;; [vldrhq_f] -;; -(define_insn "mve_vldrhq_fv8hf" - [(set (match_operand:V8HF 0 "s_register_operand" "=w") - (unspec:V8HF [(match_operand:V8HI 1 "mve_memory_operand" "Ux")] - VLDRHQ_F)) - ] - "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT" -{ - rtx ops[2]; - int regno = REGNO (operands[0]); - ops[0] = gen_rtx_REG (TImode, regno); - ops[1] = operands[1]; - output_asm_insn ("vldrh.16\t%q0, %E1",ops); - return ""; -} - [(set (attr "mve_unpredicated_insn") (symbol_ref "CODE_FOR_mve_vldrhq_fv8hf")) - 
(set_attr "length" "4")]) - ;; ;; [vldrhq_gather_offset_s vldrhq_gather_offset_u] ;; @@ -3762,176 +3656,6 @@ (define_insn "mve_vldrhq_gather_shifted_offset_z_" [(set (attr "mve_unpredicated_insn") (symbol_ref "CODE_FOR_mve_vldrhq_gather_shifted_offset_")) (set_attr "length" "8")]) -;; -;; [vldrhq_s, vldrhq_u] -;; -(define_insn "mve_vldrhq_" - [(set (match_operand:MVE_5 0 "s_register_operand" "=w") - (unspec:MVE_5 [(match_operand: 1 "mve_memory_operand" "Ux")] - VLDRHQ)) - ] - "TARGET_HAVE_MVE" -{ - rtx ops[2]; - int regno = REGNO (operands[0]); - ops[0] = gen_rtx_REG (TImode, regno); - ops[1] = operands[1]; - if ( == 16) - output_asm_insn ("vldrh.16\t%q0, %E1",ops); - else - output_asm_insn ("vldrh.\t%q0, %E1",ops); - return ""; -} - [(set (attr "mve_unpredicated_insn") (symbol_ref "CODE_FOR_mve_vldrhq_")) - (set_attr "length" "4")]) - -;; -;; [vldrhq_z_f] -;; -(define_insn "mve_vldrhq_z_fv8hf" - [(set (match_operand:V8HF 0 "s_register_operand" "=w") - (unspec:V8HF [(match_operand:V8HI 1 "mve_memory_operand" "Ux") - (match_operand: 2 "vpr_register_operand" "Up")] - VLDRHQ_F)) - ] - "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT" -{ - rtx ops[2]; - int regno = REGNO (operands[0]); - ops[0] = gen_rtx_REG (TImode, regno); - ops[1] = operands[1]; - output_asm_insn ("vpst\;vldrht.16\t%q0, %E1",ops); - return ""; -} - [(set (attr "mve_unpredicated_insn") (symbol_ref "CODE_FOR_mve_vldrhq_fv8hf")) - (set_attr "length" "8")]) - -;; -;; [vldrhq_z_s vldrhq_z_u] -;; -(define_insn "mve_vldrhq_z_" - [(set (match_operand:MVE_5 0 "s_register_operand" "=w") - (unspec:MVE_5 [(match_operand: 1 "mve_memory_operand" "Ux") - (match_operand: 2 "vpr_register_operand" "Up")] - VLDRHQ)) - ] - "TARGET_HAVE_MVE" -{ - rtx ops[2]; - int regno = REGNO (operands[0]); - ops[0] = gen_rtx_REG (TImode, regno); - ops[1] = operands[1]; - if ( == 16) - output_asm_insn ("vpst\;vldrht.16\t%q0, %E1",ops); - else - output_asm_insn ("vpst\;vldrht.\t%q0, %E1",ops); - return ""; -} - [(set (attr 
"mve_unpredicated_insn") (symbol_ref "CODE_FOR_mve_vldrhq_")) - (set_attr "length" "8")]) - -;; -;; [vldrwq_f] -;; -(define_insn "mve_vldrwq_fv4sf" - [(set (match_operand:V4SF 0 "s_register_operand" "=w") - (unspec:V4SF [(match_operand:V4SI 1 "mve_memory_operand" "Ux")] - VLDRWQ_F)) - ] - "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT" -{ - rtx ops[2]; - int regno = REGNO (operands[0]); - ops[0] = gen_rtx_REG (TImode, regno); - ops[1] = operands[1]; - output_asm_insn ("vldrw.32\t%q0, %E1",ops); - return ""; -} - [(set (attr "mve_unpredicated_insn") (symbol_ref "CODE_FOR_mve_vldrwq_fv4sf")) - (set_attr "length" "4")]) - -;; -;; [vldrwq_s vldrwq_u] -;; -(define_insn "mve_vldrwq_v4si" - [(set (match_operand:V4SI 0 "s_register_operand" "=w") - (unspec:V4SI [(match_operand:V4SI 1 "mve_memory_operand" "Ux")] - VLDRWQ)) - ] - "TARGET_HAVE_MVE" -{ - rtx ops[2]; - int regno = REGNO (operands[0]); - ops[0] = gen_rtx_REG (TImode, regno); - ops[1] = operands[1]; - output_asm_insn ("vldrw.32\t%q0, %E1",ops); - return ""; -} - [(set (attr "mve_unpredicated_insn") (symbol_ref "CODE_FOR_mve_vldrwq_v4si")) - (set_attr "length" "4")]) - -;; -;; [vldrwq_z_f] -;; -(define_insn "mve_vldrwq_z_fv4sf" - [(set (match_operand:V4SF 0 "s_register_operand" "=w") - (unspec:V4SF [(match_operand:V4SI 1 "mve_memory_operand" "Ux") - (match_operand:V4BI 2 "vpr_register_operand" "Up")] - VLDRWQ_F)) - ] - "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT" -{ - rtx ops[2]; - int regno = REGNO (operands[0]); - ops[0] = gen_rtx_REG (TImode, regno); - ops[1] = operands[1]; - output_asm_insn ("vpst\;vldrwt.32\t%q0, %E1",ops); - return ""; -} - [(set (attr "mve_unpredicated_insn") (symbol_ref "CODE_FOR_mve_vldrwq_fv4sf")) - (set_attr "length" "8")]) - -;; -;; [vldrwq_z_s vldrwq_z_u] -;; -(define_insn "mve_vldrwq_z_v4si" - [(set (match_operand:V4SI 0 "s_register_operand" "=w") - (unspec:V4SI [(match_operand:V4SI 1 "mve_memory_operand" "Ux") - (match_operand:V4BI 2 "vpr_register_operand" "Up")] - VLDRWQ)) - ] - 
"TARGET_HAVE_MVE" -{ - rtx ops[2]; - int regno = REGNO (operands[0]); - ops[0] = gen_rtx_REG (TImode, regno); - ops[1] = operands[1]; - output_asm_insn ("vpst\;vldrwt.32\t%q0, %E1",ops); - return ""; -} - [(set (attr "mve_unpredicated_insn") (symbol_ref "CODE_FOR_mve_vldrwq_v4si")) - (set_attr "length" "8")]) - -(define_expand "@mve_vld1q_f" - [(match_operand:MVE_0 0 "s_register_operand") - (unspec:MVE_0 [(match_operand: 1 "mve_memory_operand")] VLD1Q_F) - ] - "TARGET_HAVE_MVE || TARGET_HAVE_MVE_FLOAT" -{ - emit_insn (gen_mve_vldrq_f(operands[0],operands[1])); - DONE; -}) - -(define_expand "@mve_vld1q_" - [(match_operand:MVE_2 0 "s_register_operand") - (unspec:MVE_2 [(match_operand:MVE_2 1 "mve_memory_operand")] VLD1Q) - ] - "TARGET_HAVE_MVE" -{ - emit_insn (gen_mve_vldrq_(operands[0],operands[1])); - DONE; -}) - ;; ;; [vldrdq_gather_base_s vldrdq_gather_base_u] ;; @@ -4368,71 +4092,6 @@ (define_insn "mve_vldrwq_gather_shifted_offset_z_v4si" [(set (attr "mve_unpredicated_insn") (symbol_ref "CODE_FOR_mve_vldrwq_gather_shifted_offset_v4si")) (set_attr "length" "8")]) -;; -;; [vstrhq_f] -;; -(define_insn "mve_vstrhq_fv8hf" - [(set (match_operand:V8HI 0 "mve_memory_operand" "=Ux") - (unspec:V8HI [(match_operand:V8HF 1 "s_register_operand" "w")] - VSTRHQ_F)) - ] - "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT" -{ - rtx ops[2]; - int regno = REGNO (operands[1]); - ops[1] = gen_rtx_REG (TImode, regno); - ops[0] = operands[0]; - output_asm_insn ("vstrh.16\t%q1, %E0",ops); - return ""; -} - [(set (attr "mve_unpredicated_insn") (symbol_ref "CODE_FOR_mve_vstrhq_fv8hf")) - (set_attr "length" "4")]) - -;; -;; [vstrhq_p_f] -;; -(define_insn "mve_vstrhq_p_fv8hf" - [(set (match_operand:V8HI 0 "mve_memory_operand" "=Ux") - (unspec:V8HI - [(match_operand:V8HF 1 "s_register_operand" "w") - (match_operand:V8BI 2 "vpr_register_operand" "Up") - (match_dup 0)] - VSTRHQ_F))] - "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT" -{ - rtx ops[2]; - int regno = REGNO (operands[1]); - ops[1] = gen_rtx_REG 
(TImode, regno); - ops[0] = operands[0]; - output_asm_insn ("vpst\;vstrht.16\t%q1, %E0",ops); - return ""; -} - [(set (attr "mve_unpredicated_insn") (symbol_ref "CODE_FOR_mve_vstrhq_fv8hf")) - (set_attr "length" "8")]) - -;; -;; [vstrhq_p_s vstrhq_p_u] -;; -(define_insn "mve_vstrhq_p_" - [(set (match_operand: 0 "mve_memory_operand" "=Ux") - (unspec: - [(match_operand:MVE_5 1 "s_register_operand" "w") - (match_operand: 2 "vpr_register_operand" "Up") - (match_dup 0)] - VSTRHQ)) - ] - "TARGET_HAVE_MVE" -{ - rtx ops[2]; - int regno = REGNO (operands[1]); - ops[1] = gen_rtx_REG (TImode, regno); - ops[0] = operands[0]; - output_asm_insn ("vpst\;vstrht.\t%q1, %E0",ops); - return ""; -} - [(set (attr "mve_unpredicated_insn") (symbol_ref "CODE_FOR_mve_vstrhq_")) - (set_attr "length" "8")]) - ;; ;; [vstrhq_scatter_offset_p_s vstrhq_scatter_offset_p_u] ;; @@ -4558,130 +4217,6 @@ (define_insn "mve_vstrhq_scatter_shifted_offset__insn" [(set (attr "mve_unpredicated_insn") (symbol_ref "CODE_FOR_mve_vstrhq_scatter_shifted_offset__insn")) (set_attr "length" "4")]) -;; -;; [vstrhq_s, vstrhq_u] -;; -(define_insn "mve_vstrhq_" - [(set (match_operand: 0 "mve_memory_operand" "=Ux") - (unspec: [(match_operand:MVE_5 1 "s_register_operand" "w")] - VSTRHQ)) - ] - "TARGET_HAVE_MVE" -{ - rtx ops[2]; - int regno = REGNO (operands[1]); - ops[1] = gen_rtx_REG (TImode, regno); - ops[0] = operands[0]; - output_asm_insn ("vstrh.\t%q1, %E0",ops); - return ""; -} - [(set (attr "mve_unpredicated_insn") (symbol_ref "CODE_FOR_mve_vstrhq_")) - (set_attr "length" "4")]) - -;; -;; [vstrwq_f] -;; -(define_insn "mve_vstrwq_fv4sf" - [(set (match_operand:V4SI 0 "mve_memory_operand" "=Ux") - (unspec:V4SI [(match_operand:V4SF 1 "s_register_operand" "w")] - VSTRWQ_F)) - ] - "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT" -{ - rtx ops[2]; - int regno = REGNO (operands[1]); - ops[1] = gen_rtx_REG (TImode, regno); - ops[0] = operands[0]; - output_asm_insn ("vstrw.32\t%q1, %E0",ops); - return ""; -} - [(set (attr 
"mve_unpredicated_insn") (symbol_ref "CODE_FOR_mve_vstrwq_fv4sf")) - (set_attr "length" "4")]) - -;; -;; [vstrwq_p_f] -;; -(define_insn "mve_vstrwq_p_fv4sf" - [(set (match_operand:V4SI 0 "mve_memory_operand" "=Ux") - (unspec:V4SI - [(match_operand:V4SF 1 "s_register_operand" "w") - (match_operand:V4BI 2 "vpr_register_operand" "Up") - (match_dup 0)] - VSTRWQ_F))] - "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT" -{ - rtx ops[2]; - int regno = REGNO (operands[1]); - ops[1] = gen_rtx_REG (TImode, regno); - ops[0] = operands[0]; - output_asm_insn ("vpst\;vstrwt.32\t%q1, %E0",ops); - return ""; -} - [(set (attr "mve_unpredicated_insn") (symbol_ref "CODE_FOR_mve_vstrwq_fv4sf")) - (set_attr "length" "8")]) - -;; -;; [vstrwq_p_s vstrwq_p_u] -;; -(define_insn "mve_vstrwq_p_v4si" - [(set (match_operand:V4SI 0 "mve_memory_operand" "=Ux") - (unspec:V4SI - [(match_operand:V4SI 1 "s_register_operand" "w") - (match_operand:V4BI 2 "vpr_register_operand" "Up") - (match_dup 0)] - VSTRWQ))] - "TARGET_HAVE_MVE" -{ - rtx ops[2]; - int regno = REGNO (operands[1]); - ops[1] = gen_rtx_REG (TImode, regno); - ops[0] = operands[0]; - output_asm_insn ("vpst\;vstrwt.32\t%q1, %E0",ops); - return ""; -} - [(set (attr "mve_unpredicated_insn") (symbol_ref "CODE_FOR_mve_vstrwq_v4si")) - (set_attr "length" "8")]) - -;; -;; [vstrwq_s vstrwq_u] -;; -(define_insn "mve_vstrwq_v4si" - [(set (match_operand:V4SI 0 "mve_memory_operand" "=Ux") - (unspec:V4SI [(match_operand:V4SI 1 "s_register_operand" "w")] - VSTRWQ)) - ] - "TARGET_HAVE_MVE" -{ - rtx ops[2]; - int regno = REGNO (operands[1]); - ops[1] = gen_rtx_REG (TImode, regno); - ops[0] = operands[0]; - output_asm_insn ("vstrw.32\t%q1, %E0",ops); - return ""; -} - [(set (attr "mve_unpredicated_insn") (symbol_ref "CODE_FOR_mve_vstrwq_v4si")) - (set_attr "length" "4")]) - -(define_expand "@mve_vst1q_f" - [(match_operand: 0 "mve_memory_operand") - (unspec: [(match_operand:MVE_0 1 "s_register_operand")] VST1Q_F) - ] - "TARGET_HAVE_MVE || TARGET_HAVE_MVE_FLOAT" -{ 
- emit_insn (gen_mve_vstrq_f(operands[0],operands[1])); - DONE; -}) - -(define_expand "@mve_vst1q_" - [(match_operand:MVE_2 0 "mve_memory_operand") - (unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand")] VST1Q) - ] - "TARGET_HAVE_MVE" -{ - emit_insn (gen_mve_vstrq_(operands[0],operands[1])); - DONE; -}) - ;; ;; [vstrdq_scatter_base_p_s vstrdq_scatter_base_p_u] ;; @@ -6931,6 +6466,7 @@ (define_expand "@arm_mve_reinterpret" } ) + ;; Originally expanded by 'predicated_doloop_end'. ;; In the rare situation where the branch is too far, we do also need to ;; revert FPSCR.LTPSIZE back to 0x100 after the last iteration. @@ -6980,3 +6516,199 @@ (define_insn "dlstp_insn" "TARGET_HAVE_MVE" "dlstp.\t%|lr, %0" [(set_attr "type" "mve_misc")]) + + +;; Vector stores +;; [vstrbq_s8, vstrhq_s16, vstrwq_s32, +;; vstrbq_u8, vstrhq_u16, vstrwq_u32, +;; vst1q ] +(define_insn "@mve_vstrq_" + [(set (match_operand:MVE_VLD_ST 0 "mve_memory_operand" "=Ux") + (unspec:MVE_VLD_ST + [(match_operand:MVE_VLD_ST 1 "s_register_operand" "w")] + VSTRQ)) + ] + "(TARGET_HAVE_MVE && VALID_MVE_SI_MODE (mode)) + || (TARGET_HAVE_MVE_FLOAT && VALID_MVE_SF_MODE (mode))" +{ + rtx ops[2]; + int regno = REGNO (operands[1]); + ops[1] = gen_rtx_REG (TImode, regno); + ops[0] = operands[0]; + output_asm_insn ("vstr.\t%q1, %E0",ops); + return ""; +} + [(set (attr "mve_unpredicated_insn") (symbol_ref "CODE_FOR_mve_vstrq_")) + (set_attr "length" "4")]) + +;; Predicated vector stores +;; [vstrbq_p_s8, vstrhq_p_s16, vstrwq_p_s32, +;; vstrbq_p_u8, vstrhq_p_u16, vstrwq_p_u32, +;; vst1q_p ] +(define_insn "@mve_vstrq_p_" + [(set (match_operand:MVE_VLD_ST 0 "mve_memory_operand" "=Ux") + (unspec:MVE_VLD_ST [ + (match_operand:MVE_VLD_ST 1 "s_register_operand" "w") + (match_operand: 2 "vpr_register_operand" "Up") + (match_dup 0) + ] VSTRQ_P)) + ] + "(TARGET_HAVE_MVE && VALID_MVE_SI_MODE (mode)) + || (TARGET_HAVE_MVE_FLOAT && VALID_MVE_SF_MODE (mode))" +{ + rtx ops[2]; + int regno = REGNO (operands[1]); + ops[1] = 
gen_rtx_REG (TImode, regno);
+  ops[0] = operands[0];
+  output_asm_insn ("vpst\;vstrt.\t%q1, %E0",ops);
+  return "";
+}
+ [(set (attr "mve_unpredicated_insn") (symbol_ref "CODE_FOR_mve_vstrq_"))
+  (set_attr "type" "mve_move")
+  (set_attr "length" "8")])
+
+;; Truncating vector stores
+;; [vstrbq_s16, vstrbq_s32, vstrhq_s32,
+;;  vstrbq_u16, vstrbq_u32, vstrhq_u32]
+(define_insn "@mve_vstrq_truncate_"
+  [(set (match_operand:MVE_w_narrow_TYPE 0 "mve_memory_operand" "=Ux")
+	(unspec:MVE_w_narrow_TYPE
+	  [(truncate:MVE_w_narrow_TYPE
+	    (match_operand: 1 "s_register_operand" "w"))]
+	  VSTRQ_TRUNC
+	))]
+  "TARGET_HAVE_MVE"
+{
+  rtx ops[2];
+  int regno = REGNO (operands[1]);
+  ops[1] = gen_rtx_REG (TImode, regno);
+  ops[0] = operands[0];
+  output_asm_insn ("vstr.\t%q1, %E0",ops);
+  return "";
+}
+ [(set (attr "mve_unpredicated_insn")
+       (symbol_ref "CODE_FOR_mve_vstrq_truncate_"))
+  (set_attr "length" "4")])
+
+;; Predicated truncating vector stores
+;; [vstrbq_p_s16, vstrbq_p_s32, vstrhq_p_s32,
+;;  vstrbq_p_u16, vstrbq_p_u32, vstrhq_p_u32]
+(define_insn "@mve_vstrq_p_truncate_"
+  [(set (match_operand:MVE_w_narrow_TYPE 0 "mve_memory_operand" "=Ux")
+	(unspec:MVE_w_narrow_TYPE [
+	  (truncate:MVE_w_narrow_TYPE
+	    (match_operand: 1 "s_register_operand" "w"))
+	  (match_operand: 2 "vpr_register_operand" "Up")
+	  (match_dup 0)
+	] VSTRQ_TRUNC_P))]
+  "TARGET_HAVE_MVE"
+{
+  rtx ops[2];
+  int regno = REGNO (operands[1]);
+  ops[1] = gen_rtx_REG (TImode, regno);
+  ops[0] = operands[0];
+  output_asm_insn (
+    "vpst\;vstrt.\t%q1, %E0",
+    ops
+  );
+  return "";
+}
+ [(set (attr "mve_unpredicated_insn")
+       (symbol_ref "CODE_FOR_mve_vstrq_truncate_"))
+  (set_attr "type" "mve_move")
+  (set_attr "length" "8")])
+
+;; Vector Loads
+;; [vldrbq_s8, vldrhq_s16, vldrwq_s32,
+;;  vldrbq_u8, vldrhq_u16, vldrwq_u32,
+;;  vld1q ]
+(define_insn "@mve_vldrq_"
+  [(set (match_operand:MVE_VLD_ST 0 "s_register_operand" "=w")
+	(unspec:MVE_VLD_ST
+	  [(match_operand:MVE_VLD_ST 1 "mve_memory_operand" "Ux")]
+	  VLDRQ))]
+  "(TARGET_HAVE_MVE && VALID_MVE_SI_MODE (mode))
+   || (TARGET_HAVE_MVE_FLOAT && VALID_MVE_SF_MODE (mode))"
+{
+  rtx ops[2];
+  int regno = REGNO (operands[0]);
+  ops[0] = gen_rtx_REG (TImode, regno);
+  ops[1] = operands[1];
+  output_asm_insn ("vldr.\t%q0, %E1",ops);
+  return "";
+}
+ [(set (attr "mve_unpredicated_insn") (symbol_ref "CODE_FOR_mve_vldrq_"))
+  (set_attr "length" "4")])
+
+;; Predicated vector load
+;; [vldrbq_z_s8, vldrhq_z_s16, vldrwq_z_s32,
+;;  vldrbq_z_u8, vldrhq_z_u16, vldrwq_z_u32,
+;;  vld1q_z ]
+(define_insn "@mve_vldrq_z_"
+  [(set (match_operand:MVE_VLD_ST 0 "s_register_operand" "=w")
+	(unspec:MVE_VLD_ST [
+	  (match_operand:MVE_VLD_ST 1 "mve_memory_operand" "Ux")
+	  (match_operand: 2 "vpr_register_operand" "Up")
+	] VLDRQ_Z))]
+  "(TARGET_HAVE_MVE && VALID_MVE_SI_MODE (mode))
+   || (TARGET_HAVE_MVE_FLOAT && VALID_MVE_SF_MODE (mode))"
+{
+  rtx ops[2];
+  int regno = REGNO (operands[0]);
+  ops[0] = gen_rtx_REG (TImode, regno);
+  ops[1] = operands[1];
+  output_asm_insn ("vpst\;vldrt.\t%q0, %E1",ops);
+  return "";
+}
+ [(set (attr "mve_unpredicated_insn") (symbol_ref "CODE_FOR_mve_vldrq_"))
+  (set_attr "type" "mve_move")
+  (set_attr "length" "8")])
+
+;; Extending vector loads
+;; [vldrbq_s16, vldrbq_s32, vldrhq_s32,
+;;  vldrbq_u16, vldrbq_u32, vldrhq_u32]
+(define_insn "@mve_vldrq_extend_"
+  [(set (match_operand: 0 "s_register_operand" "=w")
+	(unspec:
+	  [(SE:
+	    (match_operand:MVE_w_narrow_TYPE 1 "mve_memory_operand" "Ux"))]
+	  VLDRQ_EXT))]
+  "TARGET_HAVE_MVE"
+{
+  rtx ops[2];
+  int regno = REGNO (operands[0]);
+  ops[0] = gen_rtx_REG (TImode, regno);
+  ops[1] = operands[1];
+  output_asm_insn ("vldr.\t%q0, %E1",ops);
+  return "";
+}
+ [(set (attr "mve_unpredicated_insn")
+       (symbol_ref "CODE_FOR_mve_vldrq_extend_"))
+  (set_attr "length" "4")])
+
+;; Predicated extending vector loads
+;; [vldrbq_z_s16, vldrbq_z_s32, vldrhq_z_s32,
+;;  vldrbq_z_u16, vldrbq_z_u32, vldrhq_z_u32]
+(define_insn "@mve_vldrq_z_extend_"
+  [(set (match_operand: 0 "s_register_operand" "=w")
+	(unspec: [
+	  (SE:
+	    (match_operand:MVE_w_narrow_TYPE 1 "mve_memory_operand" "Ux"))
+	  (match_operand: 2 "vpr_register_operand" "Up")
+	] VLDRQ_EXT_Z))]
+  "TARGET_HAVE_MVE"
+{
+  rtx ops[2];
+  int regno = REGNO (operands[0]);
+  ops[0] = gen_rtx_REG (TImode, regno);
+  ops[1] = operands[1];
+  output_asm_insn (
+    "vpst\;vldrt.\t%q0, %E1",
+    ops
+  );
+  return "";
+}
+ [(set (attr "mve_unpredicated_insn")
+       (symbol_ref "CODE_FOR_mve_vldrq_extend_"))
+  (set_attr "type" "mve_move")
+  (set_attr "length" "8")])
diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md
index f5f4d154364..01963d54cd4 100644
--- a/gcc/config/arm/unspecs.md
+++ b/gcc/config/arm/unspecs.md
@@ -1150,27 +1150,18 @@ (define_c_enum "unspec" [
   VSTRWQSB_U
   VSTRBQSO_S
   VSTRBQSO_U
-  VSTRBQ_S
-  VSTRBQ_U
+  VLDRQ
+  VLDRQ_Z
+  VLDRQ_EXT
+  VLDRQ_EXT_Z
   VLDRBQGO_S
   VLDRBQGO_U
-  VLDRBQ_S
-  VLDRBQ_U
   VLDRWQGB_S
   VLDRWQGB_U
-  VLD1Q_F
-  VLD1Q_S
-  VLD1Q_U
-  VLDRHQ_F
   VLDRHQGO_S
   VLDRHQGO_U
   VLDRHQGSO_S
   VLDRHQGSO_U
-  VLDRHQ_S
-  VLDRHQ_U
-  VLDRWQ_F
-  VLDRWQ_S
-  VLDRWQ_U
   VLDRDQGB_S
   VLDRDQGB_U
   VLDRDQGO_S
@@ -1186,15 +1177,11 @@ (define_c_enum "unspec" [
   VLDRWQGSO_F
   VLDRWQGSO_S
   VLDRWQGSO_U
-  VSTRHQ_F
-  VST1Q_S
-  VST1Q_U
+  VSTRQ
+  VSTRQ_P
+  VSTRQ_TRUNC
+  VSTRQ_TRUNC_P
   VSTRHQSO_S
-  VSTRHQ_U
-  VSTRWQ_S
-  VSTRWQ_U
-  VSTRWQ_F
-  VST1Q_F
   VSTRDQSB_S
   VSTRDQSB_U
   VSTRDQSO_S