From patchwork Wed Nov 15 17:06:57 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Tamar Christina
X-Patchwork-Id: 1864437
Date: Wed, 15 Nov 2023 17:06:57 +0000
From: Tamar Christina
To: gcc-patches@gcc.gnu.org
Cc: nd@arm.com, Richard.Earnshaw@arm.com, Marcus.Shawcroft@arm.com,
	Kyrylo.Tkachov@arm.com, richard.sandiford@arm.com
Subject: [PATCH 1/6]AArch64: Refactor costs models to different files.
Message-ID:
Content-Disposition: inline

Hi All,

This patch series attempts to move the generic cost model in AArch64 to a new
and modern generic standard.  The current standard is quite old and generates
very suboptimal code out of the box for users of GCC.  The goal is for the new
cost model to be beneficial on newer/current Arm microarchitectures while not
being too negative for older ones.  It does not change any core-specific
optimizations.  The final changes reflect both performance and size
optimizations.

This first patch just re-organizes the cost structures into their own files.
The aarch64.cc file has gotten very big and it is hard to follow.  No
functional changes are expected from this change.  Note that since all the
structures have private (static) visibility I've put them in header files
instead.

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Ok for master?
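For illustration, each of the new tuning_models headers is expected to look
roughly like the sketch below.  The include-guard name and the placeholder
comments are assumptions made for this example; the table body is simply the
generic_addrcost_table definition that the diff below removes from aarch64.cc,
and the remaining generic_* tables and generic_tunings would follow it in the
same way.

/* Rough sketch of gcc/config/aarch64/tuning_models/generic.h.  Everything in
   it is a file-local (static) definition, so the header is only meant to be
   included from aarch64.cc after the cost and tuning types such as
   cpu_addrcost_table have been declared.  */

#ifndef GCC_AARCH64_H_GENERIC
#define GCC_AARCH64_H_GENERIC

static const struct cpu_addrcost_table generic_addrcost_table =
{
  {
    1, /* hi */
    0, /* si */
    0, /* di */
    1, /* ti */
  },
  0, /* pre_modify */
  0, /* post_modify */
  0, /* post_modify_ld3_st3 */
  0, /* post_modify_ld4_st4 */
  0, /* register_offset */
  0, /* register_sextend */
  0, /* register_zextend */
  0 /* imm_offset */
};

/* ... generic_regmove_cost, generic_advsimd_vector_cost,
   generic_sve_vector_cost, generic_vector_cost, generic_branch_cost and
   generic_tunings continue here, moved verbatim from aarch64.cc ... */

#endif /* GCC_AARCH64_H_GENERIC.  */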
Thanks,
Tamar

gcc/ChangeLog:

	PR target/111370
	* config/aarch64/aarch64.cc (generic_addrcost_table,
	exynosm1_addrcost_table, xgene1_addrcost_table,
	thunderx2t99_addrcost_table, thunderx3t110_addrcost_table,
	tsv110_addrcost_table, qdf24xx_addrcost_table,
	a64fx_addrcost_table, neoversev1_addrcost_table,
	neoversen2_addrcost_table, neoversev2_addrcost_table,
	generic_regmove_cost, cortexa57_regmove_cost,
	cortexa53_regmove_cost, exynosm1_regmove_cost,
	thunderx_regmove_cost, xgene1_regmove_cost,
	qdf24xx_regmove_cost, thunderx2t99_regmove_cost,
	thunderx3t110_regmove_cost, tsv110_regmove_cost,
	a64fx_regmove_cost, neoversen2_regmove_cost,
	neoversev1_regmove_cost, neoversev2_regmove_cost,
	generic_vector_cost, a64fx_vector_cost, qdf24xx_vector_cost,
	thunderx_vector_cost, tsv110_vector_cost, cortexa57_vector_cost,
	exynosm1_vector_cost, xgene1_vector_cost,
	thunderx2t99_vector_cost, thunderx3t110_vector_cost,
	ampere1_vector_cost, generic_branch_cost, generic_tunings,
	cortexa35_tunings, cortexa53_tunings, cortexa57_tunings,
	cortexa72_tunings, cortexa73_tunings, exynosm1_tunings,
	thunderxt88_tunings, thunderx_tunings, tsv110_tunings,
	xgene1_tunings, emag_tunings, qdf24xx_tunings, saphira_tunings,
	thunderx2t99_tunings, thunderx3t110_tunings, neoversen1_tunings,
	ampere1_tunings, ampere1a_tunings, neoversev1_vector_cost,
	neoversev1_tunings, neoverse512tvb_vector_cost,
	neoverse512tvb_tunings, neoversen2_vector_cost,
	neoversen2_tunings, neoversev2_vector_cost, neoversev2_tunings,
	a64fx_tunings): Split into their own files.
	* config/aarch64/tuning_models/a64fx.h: New file.
	* config/aarch64/tuning_models/ampere1.h: New file.
	* config/aarch64/tuning_models/ampere1a.h: New file.
	* config/aarch64/tuning_models/cortexa35.h: New file.
	* config/aarch64/tuning_models/cortexa53.h: New file.
	* config/aarch64/tuning_models/cortexa57.h: New file.
	* config/aarch64/tuning_models/cortexa72.h: New file.
	* config/aarch64/tuning_models/cortexa73.h: New file.
	* config/aarch64/tuning_models/emag.h: New file.
	* config/aarch64/tuning_models/exynosm1.h: New file.
	* config/aarch64/tuning_models/generic.h: New file.
	* config/aarch64/tuning_models/neoverse512tvb.h: New file.
	* config/aarch64/tuning_models/neoversen1.h: New file.
	* config/aarch64/tuning_models/neoversen2.h: New file.
	* config/aarch64/tuning_models/neoversev1.h: New file.
	* config/aarch64/tuning_models/neoversev2.h: New file.
	* config/aarch64/tuning_models/qdf24xx.h: New file.
	* config/aarch64/tuning_models/saphira.h: New file.
	* config/aarch64/tuning_models/thunderx.h: New file.
	* config/aarch64/tuning_models/thunderx2t99.h: New file.
	* config/aarch64/tuning_models/thunderx3t110.h: New file.
	* config/aarch64/tuning_models/thunderxt88.h: New file.
	* config/aarch64/tuning_models/tsv110.h: New file.
	* config/aarch64/tuning_models/xgene1.h: New file.

--- inline copy of patch --

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 9fbfc548a891f5d11940c6fd3c49a14bfbdec886..07b1cde39209f5c7740e336b499e9aed31e4c515 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -354,2405 +354,30 @@ static const struct aarch64_flag_desc aarch64_tuning_flags[] =
 };
 
 /* Tuning parameters.
*/ - -static const struct cpu_addrcost_table generic_addrcost_table = -{ - { - 1, /* hi */ - 0, /* si */ - 0, /* di */ - 1, /* ti */ - }, - 0, /* pre_modify */ - 0, /* post_modify */ - 0, /* post_modify_ld3_st3 */ - 0, /* post_modify_ld4_st4 */ - 0, /* register_offset */ - 0, /* register_sextend */ - 0, /* register_zextend */ - 0 /* imm_offset */ -}; - -static const struct cpu_addrcost_table exynosm1_addrcost_table = -{ - { - 0, /* hi */ - 0, /* si */ - 0, /* di */ - 2, /* ti */ - }, - 0, /* pre_modify */ - 0, /* post_modify */ - 0, /* post_modify_ld3_st3 */ - 0, /* post_modify_ld4_st4 */ - 1, /* register_offset */ - 1, /* register_sextend */ - 2, /* register_zextend */ - 0, /* imm_offset */ -}; - -static const struct cpu_addrcost_table xgene1_addrcost_table = -{ - { - 1, /* hi */ - 0, /* si */ - 0, /* di */ - 1, /* ti */ - }, - 1, /* pre_modify */ - 1, /* post_modify */ - 1, /* post_modify_ld3_st3 */ - 1, /* post_modify_ld4_st4 */ - 0, /* register_offset */ - 1, /* register_sextend */ - 1, /* register_zextend */ - 0, /* imm_offset */ -}; - -static const struct cpu_addrcost_table thunderx2t99_addrcost_table = -{ - { - 1, /* hi */ - 1, /* si */ - 1, /* di */ - 2, /* ti */ - }, - 0, /* pre_modify */ - 0, /* post_modify */ - 0, /* post_modify_ld3_st3 */ - 0, /* post_modify_ld4_st4 */ - 2, /* register_offset */ - 3, /* register_sextend */ - 3, /* register_zextend */ - 0, /* imm_offset */ -}; - -static const struct cpu_addrcost_table thunderx3t110_addrcost_table = -{ - { - 1, /* hi */ - 1, /* si */ - 1, /* di */ - 2, /* ti */ - }, - 0, /* pre_modify */ - 0, /* post_modify */ - 0, /* post_modify_ld3_st3 */ - 0, /* post_modify_ld4_st4 */ - 2, /* register_offset */ - 3, /* register_sextend */ - 3, /* register_zextend */ - 0, /* imm_offset */ -}; - -static const struct cpu_addrcost_table tsv110_addrcost_table = -{ - { - 1, /* hi */ - 0, /* si */ - 0, /* di */ - 1, /* ti */ - }, - 0, /* pre_modify */ - 0, /* post_modify */ - 0, /* post_modify_ld3_st3 */ - 0, /* post_modify_ld4_st4 */ - 0, /* register_offset */ - 1, /* register_sextend */ - 1, /* register_zextend */ - 0, /* imm_offset */ -}; - -static const struct cpu_addrcost_table qdf24xx_addrcost_table = -{ - { - 1, /* hi */ - 1, /* si */ - 1, /* di */ - 2, /* ti */ - }, - 1, /* pre_modify */ - 1, /* post_modify */ - 1, /* post_modify_ld3_st3 */ - 1, /* post_modify_ld4_st4 */ - 3, /* register_offset */ - 3, /* register_sextend */ - 3, /* register_zextend */ - 2, /* imm_offset */ -}; - -static const struct cpu_addrcost_table a64fx_addrcost_table = -{ - { - 1, /* hi */ - 1, /* si */ - 1, /* di */ - 2, /* ti */ - }, - 0, /* pre_modify */ - 0, /* post_modify */ - 0, /* post_modify_ld3_st3 */ - 0, /* post_modify_ld4_st4 */ - 2, /* register_offset */ - 3, /* register_sextend */ - 3, /* register_zextend */ - 0, /* imm_offset */ -}; - -static const struct cpu_addrcost_table neoversev1_addrcost_table = -{ - { - 1, /* hi */ - 0, /* si */ - 0, /* di */ - 1, /* ti */ - }, - 0, /* pre_modify */ - 0, /* post_modify */ - 3, /* post_modify_ld3_st3 */ - 3, /* post_modify_ld4_st4 */ - 0, /* register_offset */ - 0, /* register_sextend */ - 0, /* register_zextend */ - 0 /* imm_offset */ -}; - -static const struct cpu_addrcost_table neoversen2_addrcost_table = -{ - { - 1, /* hi */ - 0, /* si */ - 0, /* di */ - 1, /* ti */ - }, - 0, /* pre_modify */ - 0, /* post_modify */ - 2, /* post_modify_ld3_st3 */ - 2, /* post_modify_ld4_st4 */ - 0, /* register_offset */ - 0, /* register_sextend */ - 0, /* register_zextend */ - 0 /* imm_offset */ -}; - -static const struct 
cpu_addrcost_table neoversev2_addrcost_table = -{ - { - 1, /* hi */ - 0, /* si */ - 0, /* di */ - 1, /* ti */ - }, - 0, /* pre_modify */ - 0, /* post_modify */ - 2, /* post_modify_ld3_st3 */ - 2, /* post_modify_ld4_st4 */ - 0, /* register_offset */ - 0, /* register_sextend */ - 0, /* register_zextend */ - 0 /* imm_offset */ -}; - -static const struct cpu_regmove_cost generic_regmove_cost = -{ - 1, /* GP2GP */ - /* Avoid the use of slow int<->fp moves for spilling by setting - their cost higher than memmov_cost. */ - 5, /* GP2FP */ - 5, /* FP2GP */ - 2 /* FP2FP */ -}; - -static const struct cpu_regmove_cost cortexa57_regmove_cost = -{ - 1, /* GP2GP */ - /* Avoid the use of slow int<->fp moves for spilling by setting - their cost higher than memmov_cost. */ - 5, /* GP2FP */ - 5, /* FP2GP */ - 2 /* FP2FP */ -}; - -static const struct cpu_regmove_cost cortexa53_regmove_cost = -{ - 1, /* GP2GP */ - /* Avoid the use of slow int<->fp moves for spilling by setting - their cost higher than memmov_cost. */ - 5, /* GP2FP */ - 5, /* FP2GP */ - 2 /* FP2FP */ -}; - -static const struct cpu_regmove_cost exynosm1_regmove_cost = -{ - 1, /* GP2GP */ - /* Avoid the use of slow int<->fp moves for spilling by setting - their cost higher than memmov_cost (actual, 4 and 9). */ - 9, /* GP2FP */ - 9, /* FP2GP */ - 1 /* FP2FP */ -}; - -static const struct cpu_regmove_cost thunderx_regmove_cost = -{ - 2, /* GP2GP */ - 2, /* GP2FP */ - 6, /* FP2GP */ - 4 /* FP2FP */ -}; - -static const struct cpu_regmove_cost xgene1_regmove_cost = -{ - 1, /* GP2GP */ - /* Avoid the use of slow int<->fp moves for spilling by setting - their cost higher than memmov_cost. */ - 8, /* GP2FP */ - 8, /* FP2GP */ - 2 /* FP2FP */ -}; - -static const struct cpu_regmove_cost qdf24xx_regmove_cost = -{ - 2, /* GP2GP */ - /* Avoid the use of int<->fp moves for spilling. */ - 6, /* GP2FP */ - 6, /* FP2GP */ - 4 /* FP2FP */ -}; - -static const struct cpu_regmove_cost thunderx2t99_regmove_cost = -{ - 1, /* GP2GP */ - /* Avoid the use of int<->fp moves for spilling. */ - 5, /* GP2FP */ - 6, /* FP2GP */ - 3, /* FP2FP */ -}; - -static const struct cpu_regmove_cost thunderx3t110_regmove_cost = -{ - 1, /* GP2GP */ - /* Avoid the use of int<->fp moves for spilling. */ - 4, /* GP2FP */ - 5, /* FP2GP */ - 4 /* FP2FP */ -}; - -static const struct cpu_regmove_cost tsv110_regmove_cost = -{ - 1, /* GP2GP */ - /* Avoid the use of slow int<->fp moves for spilling by setting - their cost higher than memmov_cost. */ - 2, /* GP2FP */ - 3, /* FP2GP */ - 2 /* FP2FP */ -}; - -static const struct cpu_regmove_cost a64fx_regmove_cost = -{ - 1, /* GP2GP */ - /* Avoid the use of slow int<->fp moves for spilling by setting - their cost higher than memmov_cost. */ - 5, /* GP2FP */ - 7, /* FP2GP */ - 2 /* FP2FP */ -}; - -static const struct cpu_regmove_cost neoversen2_regmove_cost = -{ - 1, /* GP2GP */ - /* Spilling to int<->fp instead of memory is recommended so set - realistic costs compared to memmov_cost. */ - 3, /* GP2FP */ - 2, /* FP2GP */ - 2 /* FP2FP */ -}; - -static const struct cpu_regmove_cost neoversev1_regmove_cost = -{ - 1, /* GP2GP */ - /* Spilling to int<->fp instead of memory is recommended so set - realistic costs compared to memmov_cost. */ - 3, /* GP2FP */ - 2, /* FP2GP */ - 2 /* FP2FP */ -}; - -static const struct cpu_regmove_cost neoversev2_regmove_cost = -{ - 1, /* GP2GP */ - /* Spilling to int<->fp instead of memory is recommended so set - realistic costs compared to memmov_cost. 
*/ - 3, /* GP2FP */ - 2, /* FP2GP */ - 2 /* FP2FP */ -}; - -/* Generic costs for Advanced SIMD vector operations. */ -static const advsimd_vec_cost generic_advsimd_vector_cost = -{ - 1, /* int_stmt_cost */ - 1, /* fp_stmt_cost */ - 0, /* ld2_st2_permute_cost */ - 0, /* ld3_st3_permute_cost */ - 0, /* ld4_st4_permute_cost */ - 2, /* permute_cost */ - 2, /* reduc_i8_cost */ - 2, /* reduc_i16_cost */ - 2, /* reduc_i32_cost */ - 2, /* reduc_i64_cost */ - 2, /* reduc_f16_cost */ - 2, /* reduc_f32_cost */ - 2, /* reduc_f64_cost */ - 2, /* store_elt_extra_cost */ - 2, /* vec_to_scalar_cost */ - 1, /* scalar_to_vec_cost */ - 1, /* align_load_cost */ - 1, /* unalign_load_cost */ - 1, /* unalign_store_cost */ - 1 /* store_cost */ -}; - -/* Generic costs for SVE vector operations. */ -static const sve_vec_cost generic_sve_vector_cost = -{ - { - 1, /* int_stmt_cost */ - 1, /* fp_stmt_cost */ - 0, /* ld2_st2_permute_cost */ - 0, /* ld3_st3_permute_cost */ - 0, /* ld4_st4_permute_cost */ - 2, /* permute_cost */ - 2, /* reduc_i8_cost */ - 2, /* reduc_i16_cost */ - 2, /* reduc_i32_cost */ - 2, /* reduc_i64_cost */ - 2, /* reduc_f16_cost */ - 2, /* reduc_f32_cost */ - 2, /* reduc_f64_cost */ - 2, /* store_elt_extra_cost */ - 2, /* vec_to_scalar_cost */ - 1, /* scalar_to_vec_cost */ - 1, /* align_load_cost */ - 1, /* unalign_load_cost */ - 1, /* unalign_store_cost */ - 1 /* store_cost */ - }, - 2, /* clast_cost */ - 2, /* fadda_f16_cost */ - 2, /* fadda_f32_cost */ - 2, /* fadda_f64_cost */ - 4, /* gather_load_x32_cost */ - 2, /* gather_load_x64_cost */ - 1 /* scatter_store_elt_cost */ -}; - -/* Generic costs for vector insn classes. */ -static const struct cpu_vector_cost generic_vector_cost = -{ - 1, /* scalar_int_stmt_cost */ - 1, /* scalar_fp_stmt_cost */ - 1, /* scalar_load_cost */ - 1, /* scalar_store_cost */ - 3, /* cond_taken_branch_cost */ - 1, /* cond_not_taken_branch_cost */ - &generic_advsimd_vector_cost, /* advsimd */ - &generic_sve_vector_cost, /* sve */ - nullptr /* issue_info */ -}; - -static const advsimd_vec_cost a64fx_advsimd_vector_cost = -{ - 2, /* int_stmt_cost */ - 5, /* fp_stmt_cost */ - 0, /* ld2_st2_permute_cost */ - 0, /* ld3_st3_permute_cost */ - 0, /* ld4_st4_permute_cost */ - 3, /* permute_cost */ - 13, /* reduc_i8_cost */ - 13, /* reduc_i16_cost */ - 13, /* reduc_i32_cost */ - 13, /* reduc_i64_cost */ - 13, /* reduc_f16_cost */ - 13, /* reduc_f32_cost */ - 13, /* reduc_f64_cost */ - 13, /* store_elt_extra_cost */ - 13, /* vec_to_scalar_cost */ - 4, /* scalar_to_vec_cost */ - 6, /* align_load_cost */ - 6, /* unalign_load_cost */ - 1, /* unalign_store_cost */ - 1 /* store_cost */ -}; - -static const sve_vec_cost a64fx_sve_vector_cost = -{ - { - 2, /* int_stmt_cost */ - 5, /* fp_stmt_cost */ - 0, /* ld2_st2_permute_cost */ - 0, /* ld3_st3_permute_cost */ - 0, /* ld4_st4_permute_cost */ - 3, /* permute_cost */ - 13, /* reduc_i8_cost */ - 13, /* reduc_i16_cost */ - 13, /* reduc_i32_cost */ - 13, /* reduc_i64_cost */ - 13, /* reduc_f16_cost */ - 13, /* reduc_f32_cost */ - 13, /* reduc_f64_cost */ - 13, /* store_elt_extra_cost */ - 13, /* vec_to_scalar_cost */ - 4, /* scalar_to_vec_cost */ - 6, /* align_load_cost */ - 6, /* unalign_load_cost */ - 1, /* unalign_store_cost */ - 1 /* store_cost */ - }, - 13, /* clast_cost */ - 13, /* fadda_f16_cost */ - 13, /* fadda_f32_cost */ - 13, /* fadda_f64_cost */ - 64, /* gather_load_x32_cost */ - 32, /* gather_load_x64_cost */ - 1 /* scatter_store_elt_cost */ -}; - -static const struct cpu_vector_cost a64fx_vector_cost = -{ - 1, /* 
scalar_int_stmt_cost */ - 5, /* scalar_fp_stmt_cost */ - 4, /* scalar_load_cost */ - 1, /* scalar_store_cost */ - 3, /* cond_taken_branch_cost */ - 1, /* cond_not_taken_branch_cost */ - &a64fx_advsimd_vector_cost, /* advsimd */ - &a64fx_sve_vector_cost, /* sve */ - nullptr /* issue_info */ -}; - -static const advsimd_vec_cost qdf24xx_advsimd_vector_cost = -{ - 1, /* int_stmt_cost */ - 3, /* fp_stmt_cost */ - 0, /* ld2_st2_permute_cost */ - 0, /* ld3_st3_permute_cost */ - 0, /* ld4_st4_permute_cost */ - 2, /* permute_cost */ - 1, /* reduc_i8_cost */ - 1, /* reduc_i16_cost */ - 1, /* reduc_i32_cost */ - 1, /* reduc_i64_cost */ - 1, /* reduc_f16_cost */ - 1, /* reduc_f32_cost */ - 1, /* reduc_f64_cost */ - 1, /* store_elt_extra_cost */ - 1, /* vec_to_scalar_cost */ - 1, /* scalar_to_vec_cost */ - 1, /* align_load_cost */ - 1, /* unalign_load_cost */ - 1, /* unalign_store_cost */ - 1 /* store_cost */ -}; - -/* QDF24XX costs for vector insn classes. */ -static const struct cpu_vector_cost qdf24xx_vector_cost = -{ - 1, /* scalar_int_stmt_cost */ - 1, /* scalar_fp_stmt_cost */ - 1, /* scalar_load_cost */ - 1, /* scalar_store_cost */ - 3, /* cond_taken_branch_cost */ - 1, /* cond_not_taken_branch_cost */ - &qdf24xx_advsimd_vector_cost, /* advsimd */ - nullptr, /* sve */ - nullptr /* issue_info */ -}; - - -static const advsimd_vec_cost thunderx_advsimd_vector_cost = -{ - 4, /* int_stmt_cost */ - 1, /* fp_stmt_cost */ - 0, /* ld2_st2_permute_cost */ - 0, /* ld3_st3_permute_cost */ - 0, /* ld4_st4_permute_cost */ - 4, /* permute_cost */ - 2, /* reduc_i8_cost */ - 2, /* reduc_i16_cost */ - 2, /* reduc_i32_cost */ - 2, /* reduc_i64_cost */ - 2, /* reduc_f16_cost */ - 2, /* reduc_f32_cost */ - 2, /* reduc_f64_cost */ - 2, /* store_elt_extra_cost */ - 2, /* vec_to_scalar_cost */ - 2, /* scalar_to_vec_cost */ - 3, /* align_load_cost */ - 5, /* unalign_load_cost */ - 5, /* unalign_store_cost */ - 1 /* store_cost */ -}; - -/* ThunderX costs for vector insn classes. 
*/ -static const struct cpu_vector_cost thunderx_vector_cost = -{ - 1, /* scalar_int_stmt_cost */ - 1, /* scalar_fp_stmt_cost */ - 3, /* scalar_load_cost */ - 1, /* scalar_store_cost */ - 3, /* cond_taken_branch_cost */ - 3, /* cond_not_taken_branch_cost */ - &thunderx_advsimd_vector_cost, /* advsimd */ - nullptr, /* sve */ - nullptr /* issue_info */ -}; - -static const advsimd_vec_cost tsv110_advsimd_vector_cost = -{ - 2, /* int_stmt_cost */ - 2, /* fp_stmt_cost */ - 0, /* ld2_st2_permute_cost */ - 0, /* ld3_st3_permute_cost */ - 0, /* ld4_st4_permute_cost */ - 2, /* permute_cost */ - 3, /* reduc_i8_cost */ - 3, /* reduc_i16_cost */ - 3, /* reduc_i32_cost */ - 3, /* reduc_i64_cost */ - 3, /* reduc_f16_cost */ - 3, /* reduc_f32_cost */ - 3, /* reduc_f64_cost */ - 3, /* store_elt_extra_cost */ - 3, /* vec_to_scalar_cost */ - 2, /* scalar_to_vec_cost */ - 5, /* align_load_cost */ - 5, /* unalign_load_cost */ - 1, /* unalign_store_cost */ - 1 /* store_cost */ -}; - -static const struct cpu_vector_cost tsv110_vector_cost = -{ - 1, /* scalar_int_stmt_cost */ - 1, /* scalar_fp_stmt_cost */ - 5, /* scalar_load_cost */ - 1, /* scalar_store_cost */ - 1, /* cond_taken_branch_cost */ - 1, /* cond_not_taken_branch_cost */ - &tsv110_advsimd_vector_cost, /* advsimd */ - nullptr, /* sve */ - nullptr /* issue_info */ -}; - -static const advsimd_vec_cost cortexa57_advsimd_vector_cost = -{ - 2, /* int_stmt_cost */ - 2, /* fp_stmt_cost */ - 0, /* ld2_st2_permute_cost */ - 0, /* ld3_st3_permute_cost */ - 0, /* ld4_st4_permute_cost */ - 3, /* permute_cost */ - 8, /* reduc_i8_cost */ - 8, /* reduc_i16_cost */ - 8, /* reduc_i32_cost */ - 8, /* reduc_i64_cost */ - 8, /* reduc_f16_cost */ - 8, /* reduc_f32_cost */ - 8, /* reduc_f64_cost */ - 8, /* store_elt_extra_cost */ - 8, /* vec_to_scalar_cost */ - 8, /* scalar_to_vec_cost */ - 4, /* align_load_cost */ - 4, /* unalign_load_cost */ - 1, /* unalign_store_cost */ - 1 /* store_cost */ -}; - -/* Cortex-A57 costs for vector insn classes. 
*/ -static const struct cpu_vector_cost cortexa57_vector_cost = -{ - 1, /* scalar_int_stmt_cost */ - 1, /* scalar_fp_stmt_cost */ - 4, /* scalar_load_cost */ - 1, /* scalar_store_cost */ - 1, /* cond_taken_branch_cost */ - 1, /* cond_not_taken_branch_cost */ - &cortexa57_advsimd_vector_cost, /* advsimd */ - nullptr, /* sve */ - nullptr /* issue_info */ -}; - -static const advsimd_vec_cost exynosm1_advsimd_vector_cost = -{ - 3, /* int_stmt_cost */ - 3, /* fp_stmt_cost */ - 0, /* ld2_st2_permute_cost */ - 0, /* ld3_st3_permute_cost */ - 0, /* ld4_st4_permute_cost */ - 3, /* permute_cost */ - 3, /* reduc_i8_cost */ - 3, /* reduc_i16_cost */ - 3, /* reduc_i32_cost */ - 3, /* reduc_i64_cost */ - 3, /* reduc_f16_cost */ - 3, /* reduc_f32_cost */ - 3, /* reduc_f64_cost */ - 3, /* store_elt_extra_cost */ - 3, /* vec_to_scalar_cost */ - 3, /* scalar_to_vec_cost */ - 5, /* align_load_cost */ - 5, /* unalign_load_cost */ - 1, /* unalign_store_cost */ - 1 /* store_cost */ -}; - -static const struct cpu_vector_cost exynosm1_vector_cost = -{ - 1, /* scalar_int_stmt_cost */ - 1, /* scalar_fp_stmt_cost */ - 5, /* scalar_load_cost */ - 1, /* scalar_store_cost */ - 1, /* cond_taken_branch_cost */ - 1, /* cond_not_taken_branch_cost */ - &exynosm1_advsimd_vector_cost, /* advsimd */ - nullptr, /* sve */ - nullptr /* issue_info */ -}; - -static const advsimd_vec_cost xgene1_advsimd_vector_cost = -{ - 2, /* int_stmt_cost */ - 2, /* fp_stmt_cost */ - 0, /* ld2_st2_permute_cost */ - 0, /* ld3_st3_permute_cost */ - 0, /* ld4_st4_permute_cost */ - 2, /* permute_cost */ - 4, /* reduc_i8_cost */ - 4, /* reduc_i16_cost */ - 4, /* reduc_i32_cost */ - 4, /* reduc_i64_cost */ - 4, /* reduc_f16_cost */ - 4, /* reduc_f32_cost */ - 4, /* reduc_f64_cost */ - 4, /* store_elt_extra_cost */ - 4, /* vec_to_scalar_cost */ - 4, /* scalar_to_vec_cost */ - 10, /* align_load_cost */ - 10, /* unalign_load_cost */ - 2, /* unalign_store_cost */ - 2 /* store_cost */ -}; - -/* Generic costs for vector insn classes. */ -static const struct cpu_vector_cost xgene1_vector_cost = -{ - 1, /* scalar_int_stmt_cost */ - 1, /* scalar_fp_stmt_cost */ - 5, /* scalar_load_cost */ - 1, /* scalar_store_cost */ - 2, /* cond_taken_branch_cost */ - 1, /* cond_not_taken_branch_cost */ - &xgene1_advsimd_vector_cost, /* advsimd */ - nullptr, /* sve */ - nullptr /* issue_info */ -}; - -static const advsimd_vec_cost thunderx2t99_advsimd_vector_cost = -{ - 4, /* int_stmt_cost */ - 5, /* fp_stmt_cost */ - 0, /* ld2_st2_permute_cost */ - 0, /* ld3_st3_permute_cost */ - 0, /* ld4_st4_permute_cost */ - 10, /* permute_cost */ - 6, /* reduc_i8_cost */ - 6, /* reduc_i16_cost */ - 6, /* reduc_i32_cost */ - 6, /* reduc_i64_cost */ - 6, /* reduc_f16_cost */ - 6, /* reduc_f32_cost */ - 6, /* reduc_f64_cost */ - 6, /* store_elt_extra_cost */ - 6, /* vec_to_scalar_cost */ - 5, /* scalar_to_vec_cost */ - 4, /* align_load_cost */ - 4, /* unalign_load_cost */ - 1, /* unalign_store_cost */ - 1 /* store_cost */ -}; - -/* Costs for vector insn classes for Vulcan. 
*/ -static const struct cpu_vector_cost thunderx2t99_vector_cost = -{ - 1, /* scalar_int_stmt_cost */ - 6, /* scalar_fp_stmt_cost */ - 4, /* scalar_load_cost */ - 1, /* scalar_store_cost */ - 2, /* cond_taken_branch_cost */ - 1, /* cond_not_taken_branch_cost */ - &thunderx2t99_advsimd_vector_cost, /* advsimd */ - nullptr, /* sve */ - nullptr /* issue_info */ -}; - -static const advsimd_vec_cost thunderx3t110_advsimd_vector_cost = -{ - 5, /* int_stmt_cost */ - 5, /* fp_stmt_cost */ - 0, /* ld2_st2_permute_cost */ - 0, /* ld3_st3_permute_cost */ - 0, /* ld4_st4_permute_cost */ - 10, /* permute_cost */ - 5, /* reduc_i8_cost */ - 5, /* reduc_i16_cost */ - 5, /* reduc_i32_cost */ - 5, /* reduc_i64_cost */ - 5, /* reduc_f16_cost */ - 5, /* reduc_f32_cost */ - 5, /* reduc_f64_cost */ - 5, /* store_elt_extra_cost */ - 5, /* vec_to_scalar_cost */ - 5, /* scalar_to_vec_cost */ - 4, /* align_load_cost */ - 4, /* unalign_load_cost */ - 4, /* unalign_store_cost */ - 4 /* store_cost */ -}; - -static const struct cpu_vector_cost thunderx3t110_vector_cost = -{ - 1, /* scalar_int_stmt_cost */ - 5, /* scalar_fp_stmt_cost */ - 4, /* scalar_load_cost */ - 1, /* scalar_store_cost */ - 2, /* cond_taken_branch_cost */ - 1, /* cond_not_taken_branch_cost */ - &thunderx3t110_advsimd_vector_cost, /* advsimd */ - nullptr, /* sve */ - nullptr /* issue_info */ -}; - -static const advsimd_vec_cost ampere1_advsimd_vector_cost = -{ - 1, /* int_stmt_cost */ - 3, /* fp_stmt_cost */ - 0, /* ld2_st2_permute_cost */ - 0, /* ld3_st3_permute_cost */ - 0, /* ld4_st4_permute_cost */ - 2, /* permute_cost */ - 12, /* reduc_i8_cost */ - 9, /* reduc_i16_cost */ - 6, /* reduc_i32_cost */ - 5, /* reduc_i64_cost */ - 9, /* reduc_f16_cost */ - 6, /* reduc_f32_cost */ - 5, /* reduc_f64_cost */ - 8, /* store_elt_extra_cost */ - 6, /* vec_to_scalar_cost */ - 7, /* scalar_to_vec_cost */ - 4, /* align_load_cost */ - 4, /* unalign_load_cost */ - 1, /* unalign_store_cost */ - 1 /* store_cost */ -}; - -/* Ampere-1 costs for vector insn classes. */ -static const struct cpu_vector_cost ampere1_vector_cost = -{ - 1, /* scalar_int_stmt_cost */ - 3, /* scalar_fp_stmt_cost */ - 4, /* scalar_load_cost */ - 1, /* scalar_store_cost */ - 1, /* cond_taken_branch_cost */ - 1, /* cond_not_taken_branch_cost */ - &ampere1_advsimd_vector_cost, /* advsimd */ - nullptr, /* sve */ - nullptr /* issue_info */ -}; - -/* Generic costs for branch instructions. */ -static const struct cpu_branch_cost generic_branch_cost = -{ - 1, /* Predictable. */ - 3 /* Unpredictable. */ -}; - -/* Generic approximation modes. */ -static const cpu_approx_modes generic_approx_modes = -{ - AARCH64_APPROX_NONE, /* division */ - AARCH64_APPROX_NONE, /* sqrt */ - AARCH64_APPROX_NONE /* recip_sqrt */ -}; - -/* Approximation modes for Exynos M1. */ -static const cpu_approx_modes exynosm1_approx_modes = -{ - AARCH64_APPROX_NONE, /* division */ - AARCH64_APPROX_ALL, /* sqrt */ - AARCH64_APPROX_ALL /* recip_sqrt */ -}; - -/* Approximation modes for X-Gene 1. */ -static const cpu_approx_modes xgene1_approx_modes = -{ - AARCH64_APPROX_NONE, /* division */ - AARCH64_APPROX_NONE, /* sqrt */ - AARCH64_APPROX_ALL /* recip_sqrt */ -}; - -/* Generic prefetch settings (which disable prefetch).
*/ -static const cpu_prefetch_tune generic_prefetch_tune = -{ - 0, /* num_slots */ - -1, /* l1_cache_size */ - -1, /* l1_cache_line_size */ - -1, /* l2_cache_size */ - true, /* prefetch_dynamic_strides */ - -1, /* minimum_stride */ - -1 /* default_opt_level */ -}; - -static const cpu_prefetch_tune exynosm1_prefetch_tune = -{ - 0, /* num_slots */ - -1, /* l1_cache_size */ - 64, /* l1_cache_line_size */ - -1, /* l2_cache_size */ - true, /* prefetch_dynamic_strides */ - -1, /* minimum_stride */ - -1 /* default_opt_level */ -}; - -static const cpu_prefetch_tune qdf24xx_prefetch_tune = -{ - 4, /* num_slots */ - 32, /* l1_cache_size */ - 64, /* l1_cache_line_size */ - 512, /* l2_cache_size */ - false, /* prefetch_dynamic_strides */ - 2048, /* minimum_stride */ - 3 /* default_opt_level */ -}; - -static const cpu_prefetch_tune thunderxt88_prefetch_tune = -{ - 8, /* num_slots */ - 32, /* l1_cache_size */ - 128, /* l1_cache_line_size */ - 16*1024, /* l2_cache_size */ - true, /* prefetch_dynamic_strides */ - -1, /* minimum_stride */ - 3 /* default_opt_level */ -}; - -static const cpu_prefetch_tune thunderx_prefetch_tune = -{ - 8, /* num_slots */ - 32, /* l1_cache_size */ - 128, /* l1_cache_line_size */ - -1, /* l2_cache_size */ - true, /* prefetch_dynamic_strides */ - -1, /* minimum_stride */ - -1 /* default_opt_level */ -}; - -static const cpu_prefetch_tune thunderx2t99_prefetch_tune = -{ - 8, /* num_slots */ - 32, /* l1_cache_size */ - 64, /* l1_cache_line_size */ - 256, /* l2_cache_size */ - true, /* prefetch_dynamic_strides */ - -1, /* minimum_stride */ - -1 /* default_opt_level */ -}; - -static const cpu_prefetch_tune thunderx3t110_prefetch_tune = -{ - 8, /* num_slots */ - 32, /* l1_cache_size */ - 64, /* l1_cache_line_size */ - 256, /* l2_cache_size */ - true, /* prefetch_dynamic_strides */ - -1, /* minimum_stride */ - -1 /* default_opt_level */ -}; - -static const cpu_prefetch_tune tsv110_prefetch_tune = -{ - 0, /* num_slots */ - 64, /* l1_cache_size */ - 64, /* l1_cache_line_size */ - 512, /* l2_cache_size */ - true, /* prefetch_dynamic_strides */ - -1, /* minimum_stride */ - -1 /* default_opt_level */ -}; - -static const cpu_prefetch_tune xgene1_prefetch_tune = -{ - 8, /* num_slots */ - 32, /* l1_cache_size */ - 64, /* l1_cache_line_size */ - 256, /* l2_cache_size */ - true, /* prefetch_dynamic_strides */ - -1, /* minimum_stride */ - -1 /* default_opt_level */ -}; - -static const cpu_prefetch_tune a64fx_prefetch_tune = -{ - 8, /* num_slots */ - 64, /* l1_cache_size */ - 256, /* l1_cache_line_size */ - 32768, /* l2_cache_size */ - true, /* prefetch_dynamic_strides */ - -1, /* minimum_stride */ - -1 /* default_opt_level */ -}; - -static const cpu_prefetch_tune ampere1_prefetch_tune = -{ - 0, /* num_slots */ - 64, /* l1_cache_size */ - 64, /* l1_cache_line_size */ - 2048, /* l2_cache_size */ - true, /* prefetch_dynamic_strides */ - -1, /* minimum_stride */ - -1 /* default_opt_level */ -}; - -static const struct tune_params generic_tunings = -{ - &cortexa57_extra_costs, - &generic_addrcost_table, - &generic_regmove_cost, - &generic_vector_cost, - &generic_branch_cost, - &generic_approx_modes, - SVE_NOT_IMPLEMENTED, /* sve_width */ - { 4, /* load_int. */ - 4, /* store_int. */ - 4, /* load_fp. */ - 4, /* store_fp. */ - 4, /* load_pred. */ - 4 /* store_pred. */ - }, /* memmov_cost. */ - 2, /* issue_rate */ - (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_CMP_BRANCH), /* fusible_ops */ - "16:12", /* function_align. */ - "4", /* jump_align. */ - "8", /* loop_align. */ - 2, /* int_reassoc_width. 
*/ - 4, /* fp_reassoc_width. */ - 1, /* fma_reassoc_width. */ - 1, /* vec_reassoc_width. */ - 2, /* min_div_recip_mul_sf. */ - 2, /* min_div_recip_mul_df. */ - 0, /* max_case_values. */ - tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ - /* Enabling AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS significantly benefits - Neoverse V1. It does not have a noticeable effect on A64FX and should - have at most a very minor effect on SVE2 cores. */ - (AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS), /* tune_flags. */ - &generic_prefetch_tune, - AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ - AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. */ -}; - -static const struct tune_params cortexa35_tunings = -{ - &cortexa53_extra_costs, - &generic_addrcost_table, - &cortexa53_regmove_cost, - &generic_vector_cost, - &generic_branch_cost, - &generic_approx_modes, - SVE_NOT_IMPLEMENTED, /* sve_width */ - { 4, /* load_int. */ - 4, /* store_int. */ - 4, /* load_fp. */ - 4, /* store_fp. */ - 4, /* load_pred. */ - 4 /* store_pred. */ - }, /* memmov_cost. */ - 1, /* issue_rate */ - (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD - | AARCH64_FUSE_MOVK_MOVK | AARCH64_FUSE_ADRP_LDR), /* fusible_ops */ - "16", /* function_align. */ - "4", /* jump_align. */ - "8", /* loop_align. */ - 2, /* int_reassoc_width. */ - 4, /* fp_reassoc_width. */ - 1, /* fma_reassoc_width. */ - 1, /* vec_reassoc_width. */ - 2, /* min_div_recip_mul_sf. */ - 2, /* min_div_recip_mul_df. */ - 0, /* max_case_values. */ - tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ - (AARCH64_EXTRA_TUNE_NONE), /* tune_flags. */ - &generic_prefetch_tune, - AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ - AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. */ -}; - -static const struct tune_params cortexa53_tunings = -{ - &cortexa53_extra_costs, - &generic_addrcost_table, - &cortexa53_regmove_cost, - &generic_vector_cost, - &generic_branch_cost, - &generic_approx_modes, - SVE_NOT_IMPLEMENTED, /* sve_width */ - { 4, /* load_int. */ - 4, /* store_int. */ - 4, /* load_fp. */ - 4, /* store_fp. */ - 4, /* load_pred. */ - 4 /* store_pred. */ - }, /* memmov_cost. */ - 2, /* issue_rate */ - (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD - | AARCH64_FUSE_MOVK_MOVK | AARCH64_FUSE_ADRP_LDR), /* fusible_ops */ - "16", /* function_align. */ - "4", /* jump_align. */ - "8", /* loop_align. */ - 2, /* int_reassoc_width. */ - 4, /* fp_reassoc_width. */ - 1, /* fma_reassoc_width. */ - 1, /* vec_reassoc_width. */ - 2, /* min_div_recip_mul_sf. */ - 2, /* min_div_recip_mul_df. */ - 0, /* max_case_values. */ - tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ - (AARCH64_EXTRA_TUNE_NONE), /* tune_flags. */ - &generic_prefetch_tune, - AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ - AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. */ -}; - -static const struct tune_params cortexa57_tunings = -{ - &cortexa57_extra_costs, - &generic_addrcost_table, - &cortexa57_regmove_cost, - &cortexa57_vector_cost, - &generic_branch_cost, - &generic_approx_modes, - SVE_NOT_IMPLEMENTED, /* sve_width */ - { 4, /* load_int. */ - 4, /* store_int. */ - 4, /* load_fp. */ - 4, /* store_fp. */ - 4, /* load_pred. */ - 4 /* store_pred. */ - }, /* memmov_cost. */ - 3, /* issue_rate */ - (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD - | AARCH64_FUSE_MOVK_MOVK), /* fusible_ops */ - "16", /* function_align. */ - "4", /* jump_align. */ - "8", /* loop_align. */ - 2, /* int_reassoc_width. 
*/ - 4, /* fp_reassoc_width. */ - 1, /* fma_reassoc_width. */ - 1, /* vec_reassoc_width. */ - 2, /* min_div_recip_mul_sf. */ - 2, /* min_div_recip_mul_df. */ - 0, /* max_case_values. */ - tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ - (AARCH64_EXTRA_TUNE_RENAME_FMA_REGS), /* tune_flags. */ - &generic_prefetch_tune, - AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ - AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. */ -}; - -static const struct tune_params cortexa72_tunings = -{ - &cortexa57_extra_costs, - &generic_addrcost_table, - &cortexa57_regmove_cost, - &cortexa57_vector_cost, - &generic_branch_cost, - &generic_approx_modes, - SVE_NOT_IMPLEMENTED, /* sve_width */ - { 4, /* load_int. */ - 4, /* store_int. */ - 4, /* load_fp. */ - 4, /* store_fp. */ - 4, /* load_pred. */ - 4 /* store_pred. */ - }, /* memmov_cost. */ - 3, /* issue_rate */ - (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD - | AARCH64_FUSE_MOVK_MOVK), /* fusible_ops */ - "16", /* function_align. */ - "4", /* jump_align. */ - "8", /* loop_align. */ - 2, /* int_reassoc_width. */ - 4, /* fp_reassoc_width. */ - 1, /* fma_reassoc_width. */ - 1, /* vec_reassoc_width. */ - 2, /* min_div_recip_mul_sf. */ - 2, /* min_div_recip_mul_df. */ - 0, /* max_case_values. */ - tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ - (AARCH64_EXTRA_TUNE_NONE), /* tune_flags. */ - &generic_prefetch_tune, - AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ - AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. */ -}; - -static const struct tune_params cortexa73_tunings = -{ - &cortexa57_extra_costs, - &generic_addrcost_table, - &cortexa57_regmove_cost, - &cortexa57_vector_cost, - &generic_branch_cost, - &generic_approx_modes, - SVE_NOT_IMPLEMENTED, /* sve_width */ - { 4, /* load_int. */ - 4, /* store_int. */ - 4, /* load_fp. */ - 4, /* store_fp. */ - 4, /* load_pred. */ - 4 /* store_pred. */ - }, /* memmov_cost. */ - 2, /* issue_rate. */ - (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD - | AARCH64_FUSE_MOVK_MOVK | AARCH64_FUSE_ADRP_LDR), /* fusible_ops */ - "16", /* function_align. */ - "4", /* jump_align. */ - "8", /* loop_align. */ - 2, /* int_reassoc_width. */ - 4, /* fp_reassoc_width. */ - 1, /* fma_reassoc_width. */ - 1, /* vec_reassoc_width. */ - 2, /* min_div_recip_mul_sf. */ - 2, /* min_div_recip_mul_df. */ - 0, /* max_case_values. */ - tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ - (AARCH64_EXTRA_TUNE_NONE), /* tune_flags. */ - &generic_prefetch_tune, - AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ - AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. */ -}; - -static const struct tune_params exynosm1_tunings = -{ - &exynosm1_extra_costs, - &exynosm1_addrcost_table, - &exynosm1_regmove_cost, - &exynosm1_vector_cost, - &generic_branch_cost, - &exynosm1_approx_modes, - SVE_NOT_IMPLEMENTED, /* sve_width */ - { 4, /* load_int. */ - 4, /* store_int. */ - 4, /* load_fp. */ - 4, /* store_fp. */ - 4, /* load_pred. */ - 4 /* store_pred. */ - }, /* memmov_cost. */ - 3, /* issue_rate */ - (AARCH64_FUSE_AES_AESMC), /* fusible_ops */ - "4", /* function_align. */ - "4", /* jump_align. */ - "4", /* loop_align. */ - 2, /* int_reassoc_width. */ - 4, /* fp_reassoc_width. */ - 1, /* fma_reassoc_width. */ - 1, /* vec_reassoc_width. */ - 2, /* min_div_recip_mul_sf. */ - 2, /* min_div_recip_mul_df. */ - 48, /* max_case_values. */ - tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ - (AARCH64_EXTRA_TUNE_NONE), /* tune_flags. 
*/ - &exynosm1_prefetch_tune, - AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ - AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. */ -}; - -static const struct tune_params thunderxt88_tunings = -{ - &thunderx_extra_costs, - &generic_addrcost_table, - &thunderx_regmove_cost, - &thunderx_vector_cost, - &generic_branch_cost, - &generic_approx_modes, - SVE_NOT_IMPLEMENTED, /* sve_width */ - { 6, /* load_int. */ - 6, /* store_int. */ - 6, /* load_fp. */ - 6, /* store_fp. */ - 6, /* load_pred. */ - 6 /* store_pred. */ - }, /* memmov_cost. */ - 2, /* issue_rate */ - AARCH64_FUSE_ALU_BRANCH, /* fusible_ops */ - "8", /* function_align. */ - "8", /* jump_align. */ - "8", /* loop_align. */ - 2, /* int_reassoc_width. */ - 4, /* fp_reassoc_width. */ - 1, /* fma_reassoc_width. */ - 1, /* vec_reassoc_width. */ - 2, /* min_div_recip_mul_sf. */ - 2, /* min_div_recip_mul_df. */ - 0, /* max_case_values. */ - tune_params::AUTOPREFETCHER_OFF, /* autoprefetcher_model. */ - (AARCH64_EXTRA_TUNE_NONE), /* tune_flags. */ - &thunderxt88_prefetch_tune, - AARCH64_LDP_STP_POLICY_ALIGNED, /* ldp_policy_model. */ - AARCH64_LDP_STP_POLICY_ALIGNED /* stp_policy_model. */ -}; - -static const struct tune_params thunderx_tunings = -{ - &thunderx_extra_costs, - &generic_addrcost_table, - &thunderx_regmove_cost, - &thunderx_vector_cost, - &generic_branch_cost, - &generic_approx_modes, - SVE_NOT_IMPLEMENTED, /* sve_width */ - { 6, /* load_int. */ - 6, /* store_int. */ - 6, /* load_fp. */ - 6, /* store_fp. */ - 6, /* load_pred. */ - 6 /* store_pred. */ - }, /* memmov_cost. */ - 2, /* issue_rate */ - AARCH64_FUSE_ALU_BRANCH, /* fusible_ops */ - "8", /* function_align. */ - "8", /* jump_align. */ - "8", /* loop_align. */ - 2, /* int_reassoc_width. */ - 4, /* fp_reassoc_width. */ - 1, /* fma_reassoc_width. */ - 1, /* vec_reassoc_width. */ - 2, /* min_div_recip_mul_sf. */ - 2, /* min_div_recip_mul_df. */ - 0, /* max_case_values. */ - tune_params::AUTOPREFETCHER_OFF, /* autoprefetcher_model. */ - (AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND), /* tune_flags. */ - &thunderx_prefetch_tune, - AARCH64_LDP_STP_POLICY_ALIGNED, /* ldp_policy_model. */ - AARCH64_LDP_STP_POLICY_ALIGNED /* stp_policy_model. */ -}; - -static const struct tune_params tsv110_tunings = -{ - &tsv110_extra_costs, - &tsv110_addrcost_table, - &tsv110_regmove_cost, - &tsv110_vector_cost, - &generic_branch_cost, - &generic_approx_modes, - SVE_NOT_IMPLEMENTED, /* sve_width */ - { 4, /* load_int. */ - 4, /* store_int. */ - 4, /* load_fp. */ - 4, /* store_fp. */ - 4, /* load_pred. */ - 4 /* store_pred. */ - }, /* memmov_cost. */ - 4, /* issue_rate */ - (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_ALU_BRANCH - | AARCH64_FUSE_ALU_CBZ), /* fusible_ops */ - "16", /* function_align. */ - "4", /* jump_align. */ - "8", /* loop_align. */ - 2, /* int_reassoc_width. */ - 4, /* fp_reassoc_width. */ - 1, /* fma_reassoc_width. */ - 1, /* vec_reassoc_width. */ - 2, /* min_div_recip_mul_sf. */ - 2, /* min_div_recip_mul_df. */ - 0, /* max_case_values. */ - tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ - (AARCH64_EXTRA_TUNE_NONE), /* tune_flags. */ - &tsv110_prefetch_tune, - AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ - AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. */ -}; - -static const struct tune_params xgene1_tunings = -{ - &xgene1_extra_costs, - &xgene1_addrcost_table, - &xgene1_regmove_cost, - &xgene1_vector_cost, - &generic_branch_cost, - &xgene1_approx_modes, - SVE_NOT_IMPLEMENTED, /* sve_width */ - { 6, /* load_int. */ - 6, /* store_int. 
*/ - 6, /* load_fp. */ - 6, /* store_fp. */ - 6, /* load_pred. */ - 6 /* store_pred. */ - }, /* memmov_cost. */ - 4, /* issue_rate */ - AARCH64_FUSE_NOTHING, /* fusible_ops */ - "16", /* function_align. */ - "16", /* jump_align. */ - "16", /* loop_align. */ - 2, /* int_reassoc_width. */ - 4, /* fp_reassoc_width. */ - 1, /* fma_reassoc_width. */ - 1, /* vec_reassoc_width. */ - 2, /* min_div_recip_mul_sf. */ - 2, /* min_div_recip_mul_df. */ - 17, /* max_case_values. */ - tune_params::AUTOPREFETCHER_OFF, /* autoprefetcher_model. */ - (AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS), /* tune_flags. */ - &xgene1_prefetch_tune, - AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ - AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. */ -}; - -static const struct tune_params emag_tunings = -{ - &xgene1_extra_costs, - &xgene1_addrcost_table, - &xgene1_regmove_cost, - &xgene1_vector_cost, - &generic_branch_cost, - &xgene1_approx_modes, - SVE_NOT_IMPLEMENTED, - { 6, /* load_int. */ - 6, /* store_int. */ - 6, /* load_fp. */ - 6, /* store_fp. */ - 6, /* load_pred. */ - 6 /* store_pred. */ - }, /* memmov_cost. */ - 4, /* issue_rate */ - AARCH64_FUSE_NOTHING, /* fusible_ops */ - "16", /* function_align. */ - "16", /* jump_align. */ - "16", /* loop_align. */ - 2, /* int_reassoc_width. */ - 4, /* fp_reassoc_width. */ - 1, /* fma_reassoc_width. */ - 1, /* vec_reassoc_width. */ - 2, /* min_div_recip_mul_sf. */ - 2, /* min_div_recip_mul_df. */ - 17, /* max_case_values. */ - tune_params::AUTOPREFETCHER_OFF, /* autoprefetcher_model. */ - (AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS), /* tune_flags. */ - &xgene1_prefetch_tune, - AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ - AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. */ -}; - -static const struct tune_params qdf24xx_tunings = -{ - &qdf24xx_extra_costs, - &qdf24xx_addrcost_table, - &qdf24xx_regmove_cost, - &qdf24xx_vector_cost, - &generic_branch_cost, - &generic_approx_modes, - SVE_NOT_IMPLEMENTED, /* sve_width */ - { 4, /* load_int. */ - 4, /* store_int. */ - 4, /* load_fp. */ - 4, /* store_fp. */ - 4, /* load_pred. */ - 4 /* store_pred. */ - }, /* memmov_cost. */ - 4, /* issue_rate */ - (AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD - | AARCH64_FUSE_MOVK_MOVK), /* fuseable_ops */ - "16", /* function_align. */ - "8", /* jump_align. */ - "16", /* loop_align. */ - 2, /* int_reassoc_width. */ - 4, /* fp_reassoc_width. */ - 1, /* fma_reassoc_width. */ - 1, /* vec_reassoc_width. */ - 2, /* min_div_recip_mul_sf. */ - 2, /* min_div_recip_mul_df. */ - 0, /* max_case_values. */ - tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ - AARCH64_EXTRA_TUNE_RENAME_LOAD_REGS, /* tune_flags. */ - &qdf24xx_prefetch_tune, - AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ - AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. */ -}; - -/* Tuning structure for the Qualcomm Saphira core. Default to falkor values - for now. */ -static const struct tune_params saphira_tunings = -{ - &generic_extra_costs, - &generic_addrcost_table, - &generic_regmove_cost, - &generic_vector_cost, - &generic_branch_cost, - &generic_approx_modes, - SVE_NOT_IMPLEMENTED, /* sve_width */ - { 4, /* load_int. */ - 4, /* store_int. */ - 4, /* load_fp. */ - 4, /* store_fp. */ - 4, /* load_pred. */ - 4 /* store_pred. */ - }, /* memmov_cost. */ - 4, /* issue_rate */ - (AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD - | AARCH64_FUSE_MOVK_MOVK), /* fuseable_ops */ - "16", /* function_align. */ - "8", /* jump_align. */ - "16", /* loop_align. */ - 2, /* int_reassoc_width. */ - 4, /* fp_reassoc_width. 
*/ - 1, /* fma_reassoc_width. */ - 1, /* vec_reassoc_width. */ - 2, /* min_div_recip_mul_sf. */ - 2, /* min_div_recip_mul_df. */ - 0, /* max_case_values. */ - tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ - (AARCH64_EXTRA_TUNE_NONE), /* tune_flags. */ - &generic_prefetch_tune, - AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ - AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. */ -}; - -static const struct tune_params thunderx2t99_tunings = -{ - &thunderx2t99_extra_costs, - &thunderx2t99_addrcost_table, - &thunderx2t99_regmove_cost, - &thunderx2t99_vector_cost, - &generic_branch_cost, - &generic_approx_modes, - SVE_NOT_IMPLEMENTED, /* sve_width */ - { 4, /* load_int. */ - 4, /* store_int. */ - 4, /* load_fp. */ - 4, /* store_fp. */ - 4, /* load_pred. */ - 4 /* store_pred. */ - }, /* memmov_cost. */ - 4, /* issue_rate. */ - (AARCH64_FUSE_ALU_BRANCH | AARCH64_FUSE_AES_AESMC - | AARCH64_FUSE_ALU_CBZ), /* fusible_ops */ - "16", /* function_align. */ - "8", /* jump_align. */ - "16", /* loop_align. */ - 3, /* int_reassoc_width. */ - 2, /* fp_reassoc_width. */ - 1, /* fma_reassoc_width. */ - 2, /* vec_reassoc_width. */ - 2, /* min_div_recip_mul_sf. */ - 2, /* min_div_recip_mul_df. */ - 0, /* max_case_values. */ - tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ - (AARCH64_EXTRA_TUNE_NONE), /* tune_flags. */ - &thunderx2t99_prefetch_tune, - AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ - AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. */ -}; - -static const struct tune_params thunderx3t110_tunings = -{ - &thunderx3t110_extra_costs, - &thunderx3t110_addrcost_table, - &thunderx3t110_regmove_cost, - &thunderx3t110_vector_cost, - &generic_branch_cost, - &generic_approx_modes, - SVE_NOT_IMPLEMENTED, /* sve_width */ - { 4, /* load_int. */ - 4, /* store_int. */ - 4, /* load_fp. */ - 4, /* store_fp. */ - 4, /* load_pred. */ - 4 /* store_pred. */ - }, /* memmov_cost. */ - 6, /* issue_rate. */ - (AARCH64_FUSE_ALU_BRANCH | AARCH64_FUSE_AES_AESMC - | AARCH64_FUSE_ALU_CBZ), /* fusible_ops */ - "16", /* function_align. */ - "8", /* jump_align. */ - "16", /* loop_align. */ - 3, /* int_reassoc_width. */ - 2, /* fp_reassoc_width. */ - 1, /* fma_reassoc_width. */ - 2, /* vec_reassoc_width. */ - 2, /* min_div_recip_mul_sf. */ - 2, /* min_div_recip_mul_df. */ - 0, /* max_case_values. */ - tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ - (AARCH64_EXTRA_TUNE_NONE), /* tune_flags. */ - &thunderx3t110_prefetch_tune, - AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ - AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. */ -}; - -static const struct tune_params neoversen1_tunings = -{ - &cortexa76_extra_costs, - &generic_addrcost_table, - &generic_regmove_cost, - &cortexa57_vector_cost, - &generic_branch_cost, - &generic_approx_modes, - SVE_NOT_IMPLEMENTED, /* sve_width */ - { 4, /* load_int. */ - 2, /* store_int. */ - 5, /* load_fp. */ - 2, /* store_fp. */ - 4, /* load_pred. */ - 4 /* store_pred. */ - }, /* memmov_cost. */ - 3, /* issue_rate */ - (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_CMP_BRANCH), /* fusible_ops */ - "32:16", /* function_align. */ - "4", /* jump_align. */ - "32:16", /* loop_align. */ - 2, /* int_reassoc_width. */ - 4, /* fp_reassoc_width. */ - 1, /* fma_reassoc_width. */ - 2, /* vec_reassoc_width. */ - 2, /* min_div_recip_mul_sf. */ - 2, /* min_div_recip_mul_df. */ - 0, /* max_case_values. */ - tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ - (AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND), /* tune_flags. 
*/ - &generic_prefetch_tune, - AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ - AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. */ -}; - -static const struct tune_params ampere1_tunings = -{ - &ampere1_extra_costs, - &generic_addrcost_table, - &generic_regmove_cost, - &ampere1_vector_cost, - &generic_branch_cost, - &generic_approx_modes, - SVE_NOT_IMPLEMENTED, /* sve_width */ - { 4, /* load_int. */ - 4, /* store_int. */ - 4, /* load_fp. */ - 4, /* store_fp. */ - 4, /* load_pred. */ - 4 /* store_pred. */ - }, /* memmov_cost. */ - 4, /* issue_rate */ - (AARCH64_FUSE_ADRP_ADD | AARCH64_FUSE_AES_AESMC | - AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_MOVK_MOVK | - AARCH64_FUSE_ALU_BRANCH /* adds, ands, bics, ccmp, ccmn */ | - AARCH64_FUSE_CMP_BRANCH), - /* fusible_ops */ - "32", /* function_align. */ - "4", /* jump_align. */ - "32:16", /* loop_align. */ - 2, /* int_reassoc_width. */ - 4, /* fp_reassoc_width. */ - 4, /* fma_reassoc_width. */ - 2, /* vec_reassoc_width. */ - 2, /* min_div_recip_mul_sf. */ - 2, /* min_div_recip_mul_df. */ - 0, /* max_case_values. */ - tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ - (AARCH64_EXTRA_TUNE_NONE), /* tune_flags. */ - &ampere1_prefetch_tune, - AARCH64_LDP_STP_POLICY_ALIGNED, /* ldp_policy_model. */ - AARCH64_LDP_STP_POLICY_ALIGNED /* stp_policy_model. */ -}; - -static const struct tune_params ampere1a_tunings = -{ - &ampere1a_extra_costs, - &generic_addrcost_table, - &generic_regmove_cost, - &ampere1_vector_cost, - &generic_branch_cost, - &generic_approx_modes, - SVE_NOT_IMPLEMENTED, /* sve_width */ - { 4, /* load_int. */ - 4, /* store_int. */ - 4, /* load_fp. */ - 4, /* store_fp. */ - 4, /* load_pred. */ - 4 /* store_pred. */ - }, /* memmov_cost. */ - 4, /* issue_rate */ - (AARCH64_FUSE_ADRP_ADD | AARCH64_FUSE_AES_AESMC | - AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_MOVK_MOVK | - AARCH64_FUSE_ALU_BRANCH /* adds, ands, bics, ccmp, ccmn */ | - AARCH64_FUSE_CMP_BRANCH | AARCH64_FUSE_ALU_CBZ | - AARCH64_FUSE_ADDSUB_2REG_CONST1), - /* fusible_ops */ - "32", /* function_align. */ - "4", /* jump_align. */ - "32:16", /* loop_align. */ - 2, /* int_reassoc_width. */ - 4, /* fp_reassoc_width. */ - 1, /* fma_reassoc_width. */ - 2, /* vec_reassoc_width. */ - 2, /* min_div_recip_mul_sf. */ - 2, /* min_div_recip_mul_df. */ - 0, /* max_case_values. */ - tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ - (AARCH64_EXTRA_TUNE_NONE), /* tune_flags. */ - &ampere1_prefetch_tune, - AARCH64_LDP_STP_POLICY_ALIGNED, /* ldp_policy_model. */ - AARCH64_LDP_STP_POLICY_ALIGNED /* stp_policy_model. */ -}; - -static const advsimd_vec_cost neoversev1_advsimd_vector_cost = -{ - 2, /* int_stmt_cost */ - 2, /* fp_stmt_cost */ - 4, /* ld2_st2_permute_cost */ - 4, /* ld3_st3_permute_cost */ - 5, /* ld4_st4_permute_cost */ - 3, /* permute_cost */ - 4, /* reduc_i8_cost */ - 4, /* reduc_i16_cost */ - 2, /* reduc_i32_cost */ - 2, /* reduc_i64_cost */ - 6, /* reduc_f16_cost */ - 3, /* reduc_f32_cost */ - 2, /* reduc_f64_cost */ - 2, /* store_elt_extra_cost */ - /* This value is just inherited from the Cortex-A57 table. */ - 8, /* vec_to_scalar_cost */ - /* This depends very much on what the scalar value is and - where it comes from. E.g. some constants take two dependent - instructions or a load, while others might be moved from a GPR. - 4 seems to be a reasonable compromise in practice.
*/ - 4, /* scalar_to_vec_cost */ - 4, /* align_load_cost */ - 4, /* unalign_load_cost */ - /* Although stores have a latency of 2 and compete for the - vector pipes, in practice it's better not to model that. */ - 1, /* unalign_store_cost */ - 1 /* store_cost */ -}; - -static const sve_vec_cost neoversev1_sve_vector_cost = -{ - { - 2, /* int_stmt_cost */ - 2, /* fp_stmt_cost */ - 4, /* ld2_st2_permute_cost */ - 7, /* ld3_st3_permute_cost */ - 8, /* ld4_st4_permute_cost */ - 3, /* permute_cost */ - /* Theoretically, a reduction involving 31 scalar ADDs could - complete in ~9 cycles and would have a cost of 31. [SU]ADDV - completes in 14 cycles, so give it a cost of 31 + 5. */ - 36, /* reduc_i8_cost */ - /* Likewise for 15 scalar ADDs (~5 cycles) vs. 12: 15 + 7. */ - 22, /* reduc_i16_cost */ - /* Likewise for 7 scalar ADDs (~3 cycles) vs. 10: 7 + 7. */ - 14, /* reduc_i32_cost */ - /* Likewise for 3 scalar ADDs (~2 cycles) vs. 10: 3 + 8. */ - 11, /* reduc_i64_cost */ - /* Theoretically, a reduction involving 15 scalar FADDs could - complete in ~9 cycles and would have a cost of 30. FADDV - completes in 13 cycles, so give it a cost of 30 + 4. */ - 34, /* reduc_f16_cost */ - /* Likewise for 7 scalar FADDs (~6 cycles) vs. 11: 14 + 5. */ - 19, /* reduc_f32_cost */ - /* Likewise for 3 scalar FADDs (~4 cycles) vs. 9: 6 + 5. */ - 11, /* reduc_f64_cost */ - 2, /* store_elt_extra_cost */ - /* This value is just inherited from the Cortex-A57 table. */ - 8, /* vec_to_scalar_cost */ - /* See the comment above the Advanced SIMD versions. */ - 4, /* scalar_to_vec_cost */ - 4, /* align_load_cost */ - 4, /* unalign_load_cost */ - /* Although stores have a latency of 2 and compete for the - vector pipes, in practice it's better not to model that. */ - 1, /* unalign_store_cost */ - 1 /* store_cost */ - }, - 3, /* clast_cost */ - 19, /* fadda_f16_cost */ - 11, /* fadda_f32_cost */ - 8, /* fadda_f64_cost */ - 32, /* gather_load_x32_cost */ - 16, /* gather_load_x64_cost */ - 3 /* scatter_store_elt_cost */ -}; - -static const aarch64_scalar_vec_issue_info neoversev1_scalar_issue_info = -{ - 3, /* loads_stores_per_cycle */ - 2, /* stores_per_cycle */ - 4, /* general_ops_per_cycle */ - 0, /* fp_simd_load_general_ops */ - 1 /* fp_simd_store_general_ops */ -}; - -static const aarch64_advsimd_vec_issue_info neoversev1_advsimd_issue_info = -{ - { - 3, /* loads_stores_per_cycle */ - 2, /* stores_per_cycle */ - 4, /* general_ops_per_cycle */ - 0, /* fp_simd_load_general_ops */ - 1 /* fp_simd_store_general_ops */ - }, - 2, /* ld2_st2_general_ops */ - 2, /* ld3_st3_general_ops */ - 3 /* ld4_st4_general_ops */ -}; - -static const aarch64_sve_vec_issue_info neoversev1_sve_issue_info = -{ - { - { - 2, /* loads_per_cycle */ - 2, /* stores_per_cycle */ - 2, /* general_ops_per_cycle */ - 0, /* fp_simd_load_general_ops */ - 1 /* fp_simd_store_general_ops */ - }, - 2, /* ld2_st2_general_ops */ - 2, /* ld3_st3_general_ops */ - 3 /* ld4_st4_general_ops */ - }, - 1, /* pred_ops_per_cycle */ - 2, /* while_pred_ops */ - 2, /* int_cmp_pred_ops */ - 1, /* fp_cmp_pred_ops */ - 1, /* gather_scatter_pair_general_ops */ - 1 /* gather_scatter_pair_pred_ops */ -}; - -static const aarch64_vec_issue_info neoversev1_vec_issue_info = -{ - &neoversev1_scalar_issue_info, - &neoversev1_advsimd_issue_info, - &neoversev1_sve_issue_info -}; - -/* Neoverse V1 costs for vector insn classes. 
*/ -static const struct cpu_vector_cost neoversev1_vector_cost = -{ - 1, /* scalar_int_stmt_cost */ - 2, /* scalar_fp_stmt_cost */ - 4, /* scalar_load_cost */ - 1, /* scalar_store_cost */ - 1, /* cond_taken_branch_cost */ - 1, /* cond_not_taken_branch_cost */ - &neoversev1_advsimd_vector_cost, /* advsimd */ - &neoversev1_sve_vector_cost, /* sve */ - &neoversev1_vec_issue_info /* issue_info */ -}; - -static const struct tune_params neoversev1_tunings = -{ - &cortexa76_extra_costs, - &neoversev1_addrcost_table, - &neoversev1_regmove_cost, - &neoversev1_vector_cost, - &generic_branch_cost, - &generic_approx_modes, - SVE_256, /* sve_width */ - { 4, /* load_int. */ - 2, /* store_int. */ - 6, /* load_fp. */ - 2, /* store_fp. */ - 6, /* load_pred. */ - 1 /* store_pred. */ - }, /* memmov_cost. */ - 3, /* issue_rate */ - (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_CMP_BRANCH), /* fusible_ops */ - "32:16", /* function_align. */ - "4", /* jump_align. */ - "32:16", /* loop_align. */ - 2, /* int_reassoc_width. */ - 4, /* fp_reassoc_width. */ - 4, /* fma_reassoc_width. */ - 2, /* vec_reassoc_width. */ - 2, /* min_div_recip_mul_sf. */ - 2, /* min_div_recip_mul_df. */ - 0, /* max_case_values. */ - tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ - (AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS - | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS - | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT - | AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND), /* tune_flags. */ - &generic_prefetch_tune, - AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ - AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. */ -}; - -static const sve_vec_cost neoverse512tvb_sve_vector_cost = -{ - { - 2, /* int_stmt_cost */ - 2, /* fp_stmt_cost */ - 4, /* ld2_st2_permute_cost */ - 5, /* ld3_st3_permute_cost */ - 5, /* ld4_st4_permute_cost */ - 3, /* permute_cost */ - /* Theoretically, a reduction involving 15 scalar ADDs could - complete in ~5 cycles and would have a cost of 15. Assume that - [SU]ADDV completes in 11 cycles and so give it a cost of 15 + 6. */ - 21, /* reduc_i8_cost */ - /* Likewise for 7 scalar ADDs (~3 cycles) vs. 9: 7 + 6. */ - 13, /* reduc_i16_cost */ - /* Likewise for 3 scalar ADDs (~2 cycles) vs. 8: 3 + 6. */ - 9, /* reduc_i32_cost */ - /* Likewise for 1 scalar ADD (1 cycle) vs. 8: 1 + 7. */ - 8, /* reduc_i64_cost */ - /* Theoretically, a reduction involving 7 scalar FADDs could - complete in ~6 cycles and would have a cost of 14. Assume that - FADDV completes in 8 cycles and so give it a cost of 14 + 2. */ - 16, /* reduc_f16_cost */ - /* Likewise for 3 scalar FADDs (~4 cycles) vs. 6: 6 + 2. */ - 8, /* reduc_f32_cost */ - /* Likewise for 1 scalar FADD (2 cycles) vs. 4: 2 + 2. */ - 4, /* reduc_f64_cost */ - 2, /* store_elt_extra_cost */ - /* This value is just inherited from the Cortex-A57 table. */ - 8, /* vec_to_scalar_cost */ - /* This depends very much on what the scalar value is and - where it comes from. E.g. some constants take two dependent - instructions or a load, while others might be moved from a GPR. - 4 seems to be a reasonable compromise in practice. */ - 4, /* scalar_to_vec_cost */ - 4, /* align_load_cost */ - 4, /* unalign_load_cost */ - /* Although stores generally have a latency of 2 and compete for the - vector pipes, in practice it's better not to model that. 
*/ - 1, /* unalign_store_cost */ - 1 /* store_cost */ - }, - 3, /* clast_cost */ - 10, /* fadda_f16_cost */ - 6, /* fadda_f32_cost */ - 4, /* fadda_f64_cost */ - /* A strided Advanced SIMD x64 load would take two parallel FP loads - (6 cycles) plus an insertion (2 cycles). Assume a 64-bit SVE gather - is 1 cycle more. The Advanced SIMD version is costed as 2 scalar loads - (cost 8) and a vec_construct (cost 2). Add a full vector operation - (cost 2) to that, to avoid the difference being lost in rounding. - - There is no easy comparison between a strided Advanced SIMD x32 load - and an SVE 32-bit gather, but cost an SVE 32-bit gather as 1 vector - operation more than a 64-bit gather. */ - 14, /* gather_load_x32_cost */ - 12, /* gather_load_x64_cost */ - 3 /* scatter_store_elt_cost */ -}; - -static const aarch64_sve_vec_issue_info neoverse512tvb_sve_issue_info = -{ - { - { - 3, /* loads_per_cycle */ - 2, /* stores_per_cycle */ - 4, /* general_ops_per_cycle */ - 0, /* fp_simd_load_general_ops */ - 1 /* fp_simd_store_general_ops */ - }, - 2, /* ld2_st2_general_ops */ - 2, /* ld3_st3_general_ops */ - 3 /* ld4_st4_general_ops */ - }, - 2, /* pred_ops_per_cycle */ - 2, /* while_pred_ops */ - 2, /* int_cmp_pred_ops */ - 1, /* fp_cmp_pred_ops */ - 1, /* gather_scatter_pair_general_ops */ - 1 /* gather_scatter_pair_pred_ops */ -}; - -static const aarch64_vec_issue_info neoverse512tvb_vec_issue_info = -{ - &neoversev1_scalar_issue_info, - &neoversev1_advsimd_issue_info, - &neoverse512tvb_sve_issue_info -}; - -static const struct cpu_vector_cost neoverse512tvb_vector_cost = -{ - 1, /* scalar_int_stmt_cost */ - 2, /* scalar_fp_stmt_cost */ - 4, /* scalar_load_cost */ - 1, /* scalar_store_cost */ - 1, /* cond_taken_branch_cost */ - 1, /* cond_not_taken_branch_cost */ - &neoversev1_advsimd_vector_cost, /* advsimd */ - &neoverse512tvb_sve_vector_cost, /* sve */ - &neoverse512tvb_vec_issue_info /* issue_info */ -}; - -static const struct tune_params neoverse512tvb_tunings = -{ - &cortexa76_extra_costs, - &neoversev1_addrcost_table, - &neoversev1_regmove_cost, - &neoverse512tvb_vector_cost, - &generic_branch_cost, - &generic_approx_modes, - SVE_128 | SVE_256, /* sve_width */ - { 4, /* load_int. */ - 2, /* store_int. */ - 6, /* load_fp. */ - 2, /* store_fp. */ - 6, /* load_pred. */ - 1 /* store_pred. */ - }, /* memmov_cost. */ - 3, /* issue_rate */ - (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_CMP_BRANCH), /* fusible_ops */ - "32:16", /* function_align. */ - "4", /* jump_align. */ - "32:16", /* loop_align. */ - 2, /* int_reassoc_width. */ - 4, /* fp_reassoc_width. */ - 4, /* fma_reassoc_width. */ - 2, /* vec_reassoc_width. */ - 2, /* min_div_recip_mul_sf. */ - 2, /* min_div_recip_mul_df. */ - 0, /* max_case_values. */ - tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ - (AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS - | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS - | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT), /* tune_flags. */ - &generic_prefetch_tune, - AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ - AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. 
*/ -}; - -static const advsimd_vec_cost neoversen2_advsimd_vector_cost = -{ - 2, /* int_stmt_cost */ - 2, /* fp_stmt_cost */ - 2, /* ld2_st2_permute_cost */ - 2, /* ld3_st3_permute_cost */ - 3, /* ld4_st4_permute_cost */ - 3, /* permute_cost */ - 4, /* reduc_i8_cost */ - 4, /* reduc_i16_cost */ - 2, /* reduc_i32_cost */ - 2, /* reduc_i64_cost */ - 6, /* reduc_f16_cost */ - 4, /* reduc_f32_cost */ - 2, /* reduc_f64_cost */ - 2, /* store_elt_extra_cost */ - /* This value is just inherited from the Cortex-A57 table. */ - 8, /* vec_to_scalar_cost */ - /* This depends very much on what the scalar value is and - where it comes from. E.g. some constants take two dependent - instructions or a load, while others might be moved from a GPR. - 4 seems to be a reasonable compromise in practice. */ - 4, /* scalar_to_vec_cost */ - 4, /* align_load_cost */ - 4, /* unalign_load_cost */ - /* Although stores have a latency of 2 and compete for the - vector pipes, in practice it's better not to model that. */ - 1, /* unalign_store_cost */ - 1 /* store_cost */ -}; - -static const sve_vec_cost neoversen2_sve_vector_cost = -{ - { - 2, /* int_stmt_cost */ - 2, /* fp_stmt_cost */ - 3, /* ld2_st2_permute_cost */ - 4, /* ld3_st3_permute_cost */ - 4, /* ld4_st4_permute_cost */ - 3, /* permute_cost */ - /* Theoretically, a reduction involving 15 scalar ADDs could - complete in ~5 cycles and would have a cost of 15. [SU]ADDV - completes in 11 cycles, so give it a cost of 15 + 6. */ - 21, /* reduc_i8_cost */ - /* Likewise for 7 scalar ADDs (~3 cycles) vs. 9: 7 + 6. */ - 13, /* reduc_i16_cost */ - /* Likewise for 3 scalar ADDs (~2 cycles) vs. 8: 3 + 6. */ - 9, /* reduc_i32_cost */ - /* Likewise for 1 scalar ADD (~1 cycles) vs. 2: 1 + 1. */ - 2, /* reduc_i64_cost */ - /* Theoretically, a reduction involving 7 scalar FADDs could - complete in ~8 cycles and would have a cost of 14. FADDV - completes in 6 cycles, so give it a cost of 14 - 2. */ - 12, /* reduc_f16_cost */ - /* Likewise for 3 scalar FADDs (~4 cycles) vs. 4: 6 - 0. */ - 6, /* reduc_f32_cost */ - /* Likewise for 1 scalar FADD (~2 cycles) vs. 2: 2 - 0. */ - 2, /* reduc_f64_cost */ - 2, /* store_elt_extra_cost */ - /* This value is just inherited from the Cortex-A57 table. */ - 8, /* vec_to_scalar_cost */ - /* See the comment above the Advanced SIMD versions. */ - 4, /* scalar_to_vec_cost */ - 4, /* align_load_cost */ - 4, /* unalign_load_cost */ - /* Although stores have a latency of 2 and compete for the - vector pipes, in practice it's better not to model that. */ - 1, /* unalign_store_cost */ - 1 /* store_cost */ - }, - 3, /* clast_cost */ - 10, /* fadda_f16_cost */ - 6, /* fadda_f32_cost */ - 4, /* fadda_f64_cost */ - /* A strided Advanced SIMD x64 load would take two parallel FP loads - (8 cycles) plus an insertion (2 cycles). Assume a 64-bit SVE gather - is 1 cycle more. The Advanced SIMD version is costed as 2 scalar loads - (cost 8) and a vec_construct (cost 2). Add a full vector operation - (cost 2) to that, to avoid the difference being lost in rounding. - - There is no easy comparison between a strided Advanced SIMD x32 load - and an SVE 32-bit gather, but cost an SVE 32-bit gather as 1 vector - operation more than a 64-bit gather. 
*/ - 14, /* gather_load_x32_cost */ - 12, /* gather_load_x64_cost */ - 3 /* scatter_store_elt_cost */ -}; - -static const aarch64_scalar_vec_issue_info neoversen2_scalar_issue_info = -{ - 3, /* loads_stores_per_cycle */ - 2, /* stores_per_cycle */ - 4, /* general_ops_per_cycle */ - 0, /* fp_simd_load_general_ops */ - 1 /* fp_simd_store_general_ops */ -}; - -static const aarch64_advsimd_vec_issue_info neoversen2_advsimd_issue_info = -{ - { - 3, /* loads_stores_per_cycle */ - 2, /* stores_per_cycle */ - 2, /* general_ops_per_cycle */ - 0, /* fp_simd_load_general_ops */ - 1 /* fp_simd_store_general_ops */ - }, - 2, /* ld2_st2_general_ops */ - 2, /* ld3_st3_general_ops */ - 3 /* ld4_st4_general_ops */ -}; - -static const aarch64_sve_vec_issue_info neoversen2_sve_issue_info = -{ - { - { - 3, /* loads_per_cycle */ - 2, /* stores_per_cycle */ - 2, /* general_ops_per_cycle */ - 0, /* fp_simd_load_general_ops */ - 1 /* fp_simd_store_general_ops */ - }, - 2, /* ld2_st2_general_ops */ - 3, /* ld3_st3_general_ops */ - 3 /* ld4_st4_general_ops */ - }, - 2, /* pred_ops_per_cycle */ - 2, /* while_pred_ops */ - 2, /* int_cmp_pred_ops */ - 1, /* fp_cmp_pred_ops */ - 1, /* gather_scatter_pair_general_ops */ - 1 /* gather_scatter_pair_pred_ops */ -}; - -static const aarch64_vec_issue_info neoversen2_vec_issue_info = -{ - &neoversen2_scalar_issue_info, - &neoversen2_advsimd_issue_info, - &neoversen2_sve_issue_info -}; - -/* Neoverse N2 costs for vector insn classes. */ -static const struct cpu_vector_cost neoversen2_vector_cost = -{ - 1, /* scalar_int_stmt_cost */ - 2, /* scalar_fp_stmt_cost */ - 4, /* scalar_load_cost */ - 1, /* scalar_store_cost */ - 1, /* cond_taken_branch_cost */ - 1, /* cond_not_taken_branch_cost */ - &neoversen2_advsimd_vector_cost, /* advsimd */ - &neoversen2_sve_vector_cost, /* sve */ - &neoversen2_vec_issue_info /* issue_info */ -}; - -static const struct tune_params neoversen2_tunings = -{ - &cortexa76_extra_costs, - &neoversen2_addrcost_table, - &neoversen2_regmove_cost, - &neoversen2_vector_cost, - &generic_branch_cost, - &generic_approx_modes, - SVE_128, /* sve_width */ - { 4, /* load_int. */ - 1, /* store_int. */ - 6, /* load_fp. */ - 2, /* store_fp. */ - 6, /* load_pred. */ - 1 /* store_pred. */ - }, /* memmov_cost. */ - 3, /* issue_rate */ - (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_CMP_BRANCH), /* fusible_ops */ - "32:16", /* function_align. */ - "4", /* jump_align. */ - "32:16", /* loop_align. */ - 2, /* int_reassoc_width. */ - 4, /* fp_reassoc_width. */ - 1, /* fma_reassoc_width. */ - 2, /* vec_reassoc_width. */ - 2, /* min_div_recip_mul_sf. */ - 2, /* min_div_recip_mul_df. */ - 0, /* max_case_values. */ - tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ - (AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND - | AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS - | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS - | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT), /* tune_flags. */ - &generic_prefetch_tune, - AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ - AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. 
*/ -}; - -static const advsimd_vec_cost neoversev2_advsimd_vector_cost = -{ - 2, /* int_stmt_cost */ - 2, /* fp_stmt_cost */ - 2, /* ld2_st2_permute_cost */ - 2, /* ld3_st3_permute_cost */ - 3, /* ld4_st4_permute_cost */ - 3, /* permute_cost */ - 4, /* reduc_i8_cost */ - 4, /* reduc_i16_cost */ - 2, /* reduc_i32_cost */ - 2, /* reduc_i64_cost */ - 6, /* reduc_f16_cost */ - 3, /* reduc_f32_cost */ - 2, /* reduc_f64_cost */ - 2, /* store_elt_extra_cost */ - /* This value is just inherited from the Cortex-A57 table. */ - 8, /* vec_to_scalar_cost */ - /* This depends very much on what the scalar value is and - where it comes from. E.g. some constants take two dependent - instructions or a load, while others might be moved from a GPR. - 4 seems to be a reasonable compromise in practice. */ - 4, /* scalar_to_vec_cost */ - 4, /* align_load_cost */ - 4, /* unalign_load_cost */ - /* Although stores have a latency of 2 and compete for the - vector pipes, in practice it's better not to model that. */ - 1, /* unalign_store_cost */ - 1 /* store_cost */ -}; - -static const sve_vec_cost neoversev2_sve_vector_cost = -{ - { - 2, /* int_stmt_cost */ - 2, /* fp_stmt_cost */ - 3, /* ld2_st2_permute_cost */ - 3, /* ld3_st3_permute_cost */ - 4, /* ld4_st4_permute_cost */ - 3, /* permute_cost */ - /* Theoretically, a reduction involving 15 scalar ADDs could - complete in ~3 cycles and would have a cost of 15. [SU]ADDV - completes in 11 cycles, so give it a cost of 15 + 8. */ - 21, /* reduc_i8_cost */ - /* Likewise for 7 scalar ADDs (~2 cycles) vs. 9: 7 + 7. */ - 14, /* reduc_i16_cost */ - /* Likewise for 3 scalar ADDs (~2 cycles) vs. 8: 3 + 4. */ - 7, /* reduc_i32_cost */ - /* Likewise for 1 scalar ADD (~1 cycles) vs. 2: 1 + 1. */ - 2, /* reduc_i64_cost */ - /* Theoretically, a reduction involving 7 scalar FADDs could - complete in ~6 cycles and would have a cost of 14. FADDV - completes in 8 cycles, so give it a cost of 14 + 2. */ - 16, /* reduc_f16_cost */ - /* Likewise for 3 scalar FADDs (~4 cycles) vs. 6: 6 + 2. */ - 8, /* reduc_f32_cost */ - /* Likewise for 1 scalar FADD (~2 cycles) vs. 4: 2 + 2. */ - 4, /* reduc_f64_cost */ - 2, /* store_elt_extra_cost */ - /* This value is just inherited from the Cortex-A57 table. */ - 8, /* vec_to_scalar_cost */ - /* See the comment above the Advanced SIMD versions. */ - 4, /* scalar_to_vec_cost */ - 4, /* align_load_cost */ - 4, /* unalign_load_cost */ - /* Although stores have a latency of 2 and compete for the - vector pipes, in practice it's better not to model that. */ - 1, /* unalign_store_cost */ - 1 /* store_cost */ - }, - 3, /* clast_cost */ - 10, /* fadda_f16_cost */ - 6, /* fadda_f32_cost */ - 4, /* fadda_f64_cost */ - /* A strided Advanced SIMD x64 load would take two parallel FP loads - (8 cycles) plus an insertion (2 cycles). Assume a 64-bit SVE gather - is 1 cycle more. The Advanced SIMD version is costed as 2 scalar loads - (cost 8) and a vec_construct (cost 2). Add a full vector operation - (cost 2) to that, to avoid the difference being lost in rounding. - - There is no easy comparison between a strided Advanced SIMD x32 load - and an SVE 32-bit gather, but cost an SVE 32-bit gather as 1 vector - operation more than a 64-bit gather. 
*/ - 14, /* gather_load_x32_cost */ - 12, /* gather_load_x64_cost */ - 3 /* scatter_store_elt_cost */ -}; - -static const aarch64_scalar_vec_issue_info neoversev2_scalar_issue_info = -{ - 3, /* loads_stores_per_cycle */ - 2, /* stores_per_cycle */ - 6, /* general_ops_per_cycle */ - 0, /* fp_simd_load_general_ops */ - 1 /* fp_simd_store_general_ops */ -}; - -static const aarch64_advsimd_vec_issue_info neoversev2_advsimd_issue_info = -{ - { - 3, /* loads_stores_per_cycle */ - 2, /* stores_per_cycle */ - 4, /* general_ops_per_cycle */ - 0, /* fp_simd_load_general_ops */ - 1 /* fp_simd_store_general_ops */ - }, - 2, /* ld2_st2_general_ops */ - 2, /* ld3_st3_general_ops */ - 3 /* ld4_st4_general_ops */ -}; - -static const aarch64_sve_vec_issue_info neoversev2_sve_issue_info = -{ - { - { - 3, /* loads_per_cycle */ - 2, /* stores_per_cycle */ - 4, /* general_ops_per_cycle */ - 0, /* fp_simd_load_general_ops */ - 1 /* fp_simd_store_general_ops */ - }, - 2, /* ld2_st2_general_ops */ - 3, /* ld3_st3_general_ops */ - 3 /* ld4_st4_general_ops */ - }, - 2, /* pred_ops_per_cycle */ - 2, /* while_pred_ops */ - 2, /* int_cmp_pred_ops */ - 1, /* fp_cmp_pred_ops */ - 1, /* gather_scatter_pair_general_ops */ - 1 /* gather_scatter_pair_pred_ops */ -}; - -static const aarch64_vec_issue_info neoversev2_vec_issue_info = -{ - &neoversev2_scalar_issue_info, - &neoversev2_advsimd_issue_info, - &neoversev2_sve_issue_info -}; - -/* Demeter costs for vector insn classes. */ -static const struct cpu_vector_cost neoversev2_vector_cost = -{ - 1, /* scalar_int_stmt_cost */ - 2, /* scalar_fp_stmt_cost */ - 4, /* scalar_load_cost */ - 1, /* scalar_store_cost */ - 1, /* cond_taken_branch_cost */ - 1, /* cond_not_taken_branch_cost */ - &neoversev2_advsimd_vector_cost, /* advsimd */ - &neoversev2_sve_vector_cost, /* sve */ - &neoversev2_vec_issue_info /* issue_info */ -}; - -static const struct tune_params neoversev2_tunings = -{ - &cortexa76_extra_costs, - &neoversev2_addrcost_table, - &neoversev2_regmove_cost, - &neoversev2_vector_cost, - &generic_branch_cost, - &generic_approx_modes, - SVE_128, /* sve_width */ - { 4, /* load_int. */ - 2, /* store_int. */ - 6, /* load_fp. */ - 1, /* store_fp. */ - 6, /* load_pred. */ - 2 /* store_pred. */ - }, /* memmov_cost. */ - 5, /* issue_rate */ - (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_CMP_BRANCH), /* fusible_ops */ - "32:16", /* function_align. */ - "4", /* jump_align. */ - "32:16", /* loop_align. */ - 3, /* int_reassoc_width. */ - 6, /* fp_reassoc_width. */ - 4, /* fma_reassoc_width. */ - 3, /* vec_reassoc_width. */ - 2, /* min_div_recip_mul_sf. */ - 2, /* min_div_recip_mul_df. */ - 0, /* max_case_values. */ - tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ - (AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND - | AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS - | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS - | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT), /* tune_flags. */ - &generic_prefetch_tune, - AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ - AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. */ -}; - -static const struct tune_params a64fx_tunings = -{ - &a64fx_extra_costs, - &a64fx_addrcost_table, - &a64fx_regmove_cost, - &a64fx_vector_cost, - &generic_branch_cost, - &generic_approx_modes, - SVE_512, /* sve_width */ - { 4, /* load_int. */ - 4, /* store_int. */ - 4, /* load_fp. */ - 4, /* store_fp. */ - 4, /* load_pred. */ - 4 /* store_pred. */ - }, /* memmov_cost. 
*/ - 7, /* issue_rate */ - (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_CMP_BRANCH), /* fusible_ops */ - "32", /* function_align. */ - "16", /* jump_align. */ - "32", /* loop_align. */ - 4, /* int_reassoc_width. */ - 2, /* fp_reassoc_width. */ - 1, /* fma_reassoc_width. */ - 2, /* vec_reassoc_width. */ - 2, /* min_div_recip_mul_sf. */ - 2, /* min_div_recip_mul_df. */ - 0, /* max_case_values. */ - tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ - (AARCH64_EXTRA_TUNE_NONE), /* tune_flags. */ - &a64fx_prefetch_tune, - AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ - AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. */ -}; +#include "tuning_models/generic.h" +#include "tuning_models/cortexa35.h" +#include "tuning_models/cortexa53.h" +#include "tuning_models/cortexa57.h" +#include "tuning_models/cortexa72.h" +#include "tuning_models/cortexa73.h" +#include "tuning_models/exynosm1.h" +#include "tuning_models/thunderxt88.h" +#include "tuning_models/thunderx.h" +#include "tuning_models/tsv110.h" +#include "tuning_models/xgene1.h" +#include "tuning_models/emag.h" +#include "tuning_models/qdf24xx.h" +#include "tuning_models/saphira.h" +#include "tuning_models/thunderx2t99.h" +#include "tuning_models/thunderx3t110.h" +#include "tuning_models/neoversen1.h" +#include "tuning_models/ampere1.h" +#include "tuning_models/ampere1a.h" +#include "tuning_models/neoversev1.h" +#include "tuning_models/neoverse512tvb.h" +#include "tuning_models/neoversen2.h" +#include "tuning_models/neoversev2.h" +#include "tuning_models/a64fx.h" /* Support for fine-grained override of the tuning structures. */ struct aarch64_tuning_override_function diff --git a/gcc/config/aarch64/tuning_models/a64fx.h b/gcc/config/aarch64/tuning_models/a64fx.h new file mode 100644 index 0000000000000000000000000000000000000000..7b06c27eba1e4de01738bdfdc077460f9135fb41 --- /dev/null +++ b/gcc/config/aarch64/tuning_models/a64fx.h @@ -0,0 +1,169 @@ +/* Tuning model description for AArch64 architecture. + Copyright (C) 2009-2023 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. + + You should have received a copy of the GNU General Public License + along with GCC; see the file COPYING3. If not see + . */ + +#ifndef GCC_AARCH64_H_A64FX +#define GCC_AARCH64_H_A64FX + +#include "generic.h" + +static const struct cpu_addrcost_table a64fx_addrcost_table = +{ + { + 1, /* hi */ + 1, /* si */ + 1, /* di */ + 2, /* ti */ + }, + 0, /* pre_modify */ + 0, /* post_modify */ + 0, /* post_modify_ld3_st3 */ + 0, /* post_modify_ld4_st4 */ + 2, /* register_offset */ + 3, /* register_sextend */ + 3, /* register_zextend */ + 0, /* imm_offset */ +}; + +static const struct cpu_regmove_cost a64fx_regmove_cost = +{ + 1, /* GP2GP */ + /* Avoid the use of slow int<->fp moves for spilling by setting + their cost higher than memmov_cost. 
*/ + 5, /* GP2FP */ + 7, /* FP2GP */ + 2 /* FP2FP */ +}; + +static const advsimd_vec_cost a64fx_advsimd_vector_cost = +{ + 2, /* int_stmt_cost */ + 5, /* fp_stmt_cost */ + 0, /* ld2_st2_permute_cost */ + 0, /* ld3_st3_permute_cost */ + 0, /* ld4_st4_permute_cost */ + 3, /* permute_cost */ + 13, /* reduc_i8_cost */ + 13, /* reduc_i16_cost */ + 13, /* reduc_i32_cost */ + 13, /* reduc_i64_cost */ + 13, /* reduc_f16_cost */ + 13, /* reduc_f32_cost */ + 13, /* reduc_f64_cost */ + 13, /* store_elt_extra_cost */ + 13, /* vec_to_scalar_cost */ + 4, /* scalar_to_vec_cost */ + 6, /* align_load_cost */ + 6, /* unalign_load_cost */ + 1, /* unalign_store_cost */ + 1 /* store_cost */ +}; + +static const sve_vec_cost a64fx_sve_vector_cost = +{ + { + 2, /* int_stmt_cost */ + 5, /* fp_stmt_cost */ + 0, /* ld2_st2_permute_cost */ + 0, /* ld3_st3_permute_cost */ + 0, /* ld4_st4_permute_cost */ + 3, /* permute_cost */ + 13, /* reduc_i8_cost */ + 13, /* reduc_i16_cost */ + 13, /* reduc_i32_cost */ + 13, /* reduc_i64_cost */ + 13, /* reduc_f16_cost */ + 13, /* reduc_f32_cost */ + 13, /* reduc_f64_cost */ + 13, /* store_elt_extra_cost */ + 13, /* vec_to_scalar_cost */ + 4, /* scalar_to_vec_cost */ + 6, /* align_load_cost */ + 6, /* unalign_load_cost */ + 1, /* unalign_store_cost */ + 1 /* store_cost */ + }, + 13, /* clast_cost */ + 13, /* fadda_f16_cost */ + 13, /* fadda_f32_cost */ + 13, /* fadda_f64_cost */ + 64, /* gather_load_x32_cost */ + 32, /* gather_load_x64_cost */ + 1 /* scatter_store_elt_cost */ +}; + +static const struct cpu_vector_cost a64fx_vector_cost = +{ + 1, /* scalar_int_stmt_cost */ + 5, /* scalar_fp_stmt_cost */ + 4, /* scalar_load_cost */ + 1, /* scalar_store_cost */ + 3, /* cond_taken_branch_cost */ + 1, /* cond_not_taken_branch_cost */ + &a64fx_advsimd_vector_cost, /* advsimd */ + &a64fx_sve_vector_cost, /* sve */ + nullptr /* issue_info */ +}; + +static const cpu_prefetch_tune a64fx_prefetch_tune = +{ + 8, /* num_slots */ + 64, /* l1_cache_size */ + 256, /* l1_cache_line_size */ + 32768, /* l2_cache_size */ + true, /* prefetch_dynamic_strides */ + -1, /* minimum_stride */ + -1 /* default_opt_level */ +}; + +static const struct tune_params a64fx_tunings = +{ + &a64fx_extra_costs, + &a64fx_addrcost_table, + &a64fx_regmove_cost, + &a64fx_vector_cost, + &generic_branch_cost, + &generic_approx_modes, + SVE_512, /* sve_width */ + { 4, /* load_int. */ + 4, /* store_int. */ + 4, /* load_fp. */ + 4, /* store_fp. */ + 4, /* load_pred. */ + 4 /* store_pred. */ + }, /* memmov_cost. */ + 7, /* issue_rate */ + (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_CMP_BRANCH), /* fusible_ops */ + "32", /* function_align. */ + "16", /* jump_align. */ + "32", /* loop_align. */ + 4, /* int_reassoc_width. */ + 2, /* fp_reassoc_width. */ + 1, /* fma_reassoc_width. */ + 2, /* vec_reassoc_width. */ + 2, /* min_div_recip_mul_sf. */ + 2, /* min_div_recip_mul_df. */ + 0, /* max_case_values. */ + tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ + (AARCH64_EXTRA_TUNE_NONE), /* tune_flags. */ + &a64fx_prefetch_tune, + AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ + AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. */ +}; + +#endif /* GCC_AARCH64_H_A64FX. */ diff --git a/gcc/config/aarch64/tuning_models/ampere1.h b/gcc/config/aarch64/tuning_models/ampere1.h new file mode 100644 index 0000000000000000000000000000000000000000..8d2a1c696103259f23cf73df26cef9d4fa05ac73 --- /dev/null +++ b/gcc/config/aarch64/tuning_models/ampere1.h @@ -0,0 +1,113 @@ +/* Tuning model description for AArch64 architecture. 
+ Copyright (C) 2009-2023 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. + + You should have received a copy of the GNU General Public License + along with GCC; see the file COPYING3. If not see + <http://www.gnu.org/licenses/>. */ + +#ifndef GCC_AARCH64_H_AMPERE1 +#define GCC_AARCH64_H_AMPERE1 + +#include "generic.h" + +static const advsimd_vec_cost ampere1_advsimd_vector_cost = +{ + 1, /* int_stmt_cost */ + 3, /* fp_stmt_cost */ + 0, /* ld2_st2_permute_cost */ + 0, /* ld3_st3_permute_cost */ + 0, /* ld4_st4_permute_cost */ + 2, /* permute_cost */ + 12, /* reduc_i8_cost */ + 9, /* reduc_i16_cost */ + 6, /* reduc_i32_cost */ + 5, /* reduc_i64_cost */ + 9, /* reduc_f16_cost */ + 6, /* reduc_f32_cost */ + 5, /* reduc_f64_cost */ + 8, /* store_elt_extra_cost */ + 6, /* vec_to_scalar_cost */ + 7, /* scalar_to_vec_cost */ + 4, /* align_load_cost */ + 4, /* unalign_load_cost */ + 1, /* unalign_store_cost */ + 1 /* store_cost */ +}; + +/* Ampere-1 costs for vector insn classes. */ +static const struct cpu_vector_cost ampere1_vector_cost = +{ + 1, /* scalar_int_stmt_cost */ + 3, /* scalar_fp_stmt_cost */ + 4, /* scalar_load_cost */ + 1, /* scalar_store_cost */ + 1, /* cond_taken_branch_cost */ + 1, /* cond_not_taken_branch_cost */ + &ampere1_advsimd_vector_cost, /* advsimd */ + nullptr, /* sve */ + nullptr /* issue_info */ +}; + +static const cpu_prefetch_tune ampere1_prefetch_tune = +{ + 0, /* num_slots */ + 64, /* l1_cache_size */ + 64, /* l1_cache_line_size */ + 2048, /* l2_cache_size */ + true, /* prefetch_dynamic_strides */ + -1, /* minimum_stride */ + -1 /* default_opt_level */ +}; + +static const struct tune_params ampere1_tunings = +{ + &ampere1_extra_costs, + &generic_addrcost_table, + &generic_regmove_cost, + &ampere1_vector_cost, + &generic_branch_cost, + &generic_approx_modes, + SVE_NOT_IMPLEMENTED, /* sve_width */ + { 4, /* load_int. */ + 4, /* store_int. */ + 4, /* load_fp. */ + 4, /* store_fp. */ + 4, /* load_pred. */ + 4 /* store_pred. */ + }, /* memmov_cost. */ + 4, /* issue_rate */ + (AARCH64_FUSE_ADRP_ADD | AARCH64_FUSE_AES_AESMC | + AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_MOVK_MOVK | + AARCH64_FUSE_ALU_BRANCH /* adds, ands, bics, ccmp, ccmn */ | + AARCH64_FUSE_CMP_BRANCH), + /* fusible_ops */ + "32", /* function_align. */ + "4", /* jump_align. */ + "32:16", /* loop_align. */ + 2, /* int_reassoc_width. */ + 4, /* fp_reassoc_width. */ + 4, /* fma_reassoc_width. */ + 2, /* vec_reassoc_width. */ + 2, /* min_div_recip_mul_sf. */ + 2, /* min_div_recip_mul_df. */ + 0, /* max_case_values. */ + tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ + (AARCH64_EXTRA_TUNE_NONE), /* tune_flags. */ + &ampere1_prefetch_tune, + AARCH64_LDP_STP_POLICY_ALIGNED, /* ldp_policy_model. */ + AARCH64_LDP_STP_POLICY_ALIGNED /* stp_policy_model. */ +}; + +#endif /* GCC_AARCH64_H_AMPERE1.
*/ diff --git a/gcc/config/aarch64/tuning_models/ampere1a.h b/gcc/config/aarch64/tuning_models/ampere1a.h new file mode 100644 index 0000000000000000000000000000000000000000..c419ffb3c1a936a01690ad157c6c71dc645273c8 --- /dev/null +++ b/gcc/config/aarch64/tuning_models/ampere1a.h @@ -0,0 +1,65 @@ +/* Tuning model description for AArch64 architecture. + Copyright (C) 2009-2023 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. + + You should have received a copy of the GNU General Public License + along with GCC; see the file COPYING3. If not see + <http://www.gnu.org/licenses/>. */ + +#ifndef GCC_AARCH64_H_AMPERE1A +#define GCC_AARCH64_H_AMPERE1A + +#include "generic.h" + +static const struct tune_params ampere1a_tunings = +{ + &ampere1a_extra_costs, + &generic_addrcost_table, + &generic_regmove_cost, + &ampere1_vector_cost, + &generic_branch_cost, + &generic_approx_modes, + SVE_NOT_IMPLEMENTED, /* sve_width */ + { 4, /* load_int. */ + 4, /* store_int. */ + 4, /* load_fp. */ + 4, /* store_fp. */ + 4, /* load_pred. */ + 4 /* store_pred. */ + }, /* memmov_cost. */ + 4, /* issue_rate */ + (AARCH64_FUSE_ADRP_ADD | AARCH64_FUSE_AES_AESMC | + AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_MOVK_MOVK | + AARCH64_FUSE_ALU_BRANCH /* adds, ands, bics, ccmp, ccmn */ | + AARCH64_FUSE_CMP_BRANCH | AARCH64_FUSE_ALU_CBZ | + AARCH64_FUSE_ADDSUB_2REG_CONST1), + /* fusible_ops */ + "32", /* function_align. */ + "4", /* jump_align. */ + "32:16", /* loop_align. */ + 2, /* int_reassoc_width. */ + 4, /* fp_reassoc_width. */ + 1, /* fma_reassoc_width. */ + 2, /* vec_reassoc_width. */ + 2, /* min_div_recip_mul_sf. */ + 2, /* min_div_recip_mul_df. */ + 0, /* max_case_values. */ + tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ + (AARCH64_EXTRA_TUNE_NONE), /* tune_flags. */ + &ampere1_prefetch_tune, + AARCH64_LDP_STP_POLICY_ALIGNED, /* ldp_policy_model. */ + AARCH64_LDP_STP_POLICY_ALIGNED /* stp_policy_model. */ +}; + +#endif /* GCC_AARCH64_H_AMPERE1A. */ diff --git a/gcc/config/aarch64/tuning_models/cortexa35.h b/gcc/config/aarch64/tuning_models/cortexa35.h new file mode 100644 index 0000000000000000000000000000000000000000..5534335348db96cc57fc9eccd7ff79a624cb528a --- /dev/null +++ b/gcc/config/aarch64/tuning_models/cortexa35.h @@ -0,0 +1,62 @@ +/* Tuning model description for AArch64 architecture. + Copyright (C) 2009-2023 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. + + You should have received a copy of the GNU General Public License + along with GCC; see the file COPYING3. If not see + <http://www.gnu.org/licenses/>.
*/ + +#ifndef GCC_AARCH64_H_CORTEXA35 +#define GCC_AARCH64_H_CORTEXA35 + +#include "generic.h" +#include "cortexa53.h" + +static const struct tune_params cortexa35_tunings = +{ + &cortexa53_extra_costs, + &generic_addrcost_table, + &cortexa53_regmove_cost, + &generic_vector_cost, + &generic_branch_cost, + &generic_approx_modes, + SVE_NOT_IMPLEMENTED, /* sve_width */ + { 4, /* load_int. */ + 4, /* store_int. */ + 4, /* load_fp. */ + 4, /* store_fp. */ + 4, /* load_pred. */ + 4 /* store_pred. */ + }, /* memmov_cost. */ + 1, /* issue_rate */ + (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD + | AARCH64_FUSE_MOVK_MOVK | AARCH64_FUSE_ADRP_LDR), /* fusible_ops */ + "16", /* function_align. */ + "4", /* jump_align. */ + "8", /* loop_align. */ + 2, /* int_reassoc_width. */ + 4, /* fp_reassoc_width. */ + 1, /* fma_reassoc_width. */ + 1, /* vec_reassoc_width. */ + 2, /* min_div_recip_mul_sf. */ + 2, /* min_div_recip_mul_df. */ + 0, /* max_case_values. */ + tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ + (AARCH64_EXTRA_TUNE_NONE), /* tune_flags. */ + &generic_prefetch_tune, + AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ + AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. */ +}; + +#endif /* GCC_AARCH64_H_CORTEXA35. */ diff --git a/gcc/config/aarch64/tuning_models/cortexa53.h b/gcc/config/aarch64/tuning_models/cortexa53.h new file mode 100644 index 0000000000000000000000000000000000000000..9dfdccc5968e7f062af5c78f153bfe3838263b0a --- /dev/null +++ b/gcc/config/aarch64/tuning_models/cortexa53.h @@ -0,0 +1,71 @@ +/* Tuning model description for AArch64 architecture. + Copyright (C) 2009-2023 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. + + You should have received a copy of the GNU General Public License + along with GCC; see the file COPYING3. If not see + . */ + +#ifndef GCC_AARCH64_H_CORTEXA53 +#define GCC_AARCH64_H_CORTEXA53 + +#include "generic.h" + +static const struct cpu_regmove_cost cortexa53_regmove_cost = +{ + 1, /* GP2GP */ + /* Avoid the use of slow int<->fp moves for spilling by setting + their cost higher than memmov_cost. */ + 5, /* GP2FP */ + 5, /* FP2GP */ + 2 /* FP2FP */ +}; + +static const struct tune_params cortexa53_tunings = +{ + &cortexa53_extra_costs, + &generic_addrcost_table, + &cortexa53_regmove_cost, + &generic_vector_cost, + &generic_branch_cost, + &generic_approx_modes, + SVE_NOT_IMPLEMENTED, /* sve_width */ + { 4, /* load_int. */ + 4, /* store_int. */ + 4, /* load_fp. */ + 4, /* store_fp. */ + 4, /* load_pred. */ + 4 /* store_pred. */ + }, /* memmov_cost. */ + 2, /* issue_rate */ + (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD + | AARCH64_FUSE_MOVK_MOVK | AARCH64_FUSE_ADRP_LDR), /* fusible_ops */ + "16", /* function_align. */ + "4", /* jump_align. */ + "8", /* loop_align. */ + 2, /* int_reassoc_width. */ + 4, /* fp_reassoc_width. */ + 1, /* fma_reassoc_width. */ + 1, /* vec_reassoc_width. */ + 2, /* min_div_recip_mul_sf. */ + 2, /* min_div_recip_mul_df. */ + 0, /* max_case_values. 
*/ + tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ + (AARCH64_EXTRA_TUNE_NONE), /* tune_flags. */ + &generic_prefetch_tune, + AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ + AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. */ +}; + +#endif /* GCC_AARCH64_H_CORTEXA53. */ diff --git a/gcc/config/aarch64/tuning_models/cortexa57.h b/gcc/config/aarch64/tuning_models/cortexa57.h new file mode 100644 index 0000000000000000000000000000000000000000..9c4789d57833a5879dda8e2fe454ac5f56cb0601 --- /dev/null +++ b/gcc/config/aarch64/tuning_models/cortexa57.h @@ -0,0 +1,109 @@ +/* Tuning model description for AArch64 architecture. + Copyright (C) 2009-2023 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. + + You should have received a copy of the GNU General Public License + along with GCC; see the file COPYING3. If not see + . */ + +#ifndef GCC_AARCH64_H_CORTEXA57 +#define GCC_AARCH64_H_CORTEXA57 + +#include "generic.h" + +static const struct cpu_regmove_cost cortexa57_regmove_cost = +{ + 1, /* GP2GP */ + /* Avoid the use of slow int<->fp moves for spilling by setting + their cost higher than memmov_cost. */ + 5, /* GP2FP */ + 5, /* FP2GP */ + 2 /* FP2FP */ +}; + +static const advsimd_vec_cost cortexa57_advsimd_vector_cost = +{ + 2, /* int_stmt_cost */ + 2, /* fp_stmt_cost */ + 0, /* ld2_st2_permute_cost */ + 0, /* ld3_st3_permute_cost */ + 0, /* ld4_st4_permute_cost */ + 3, /* permute_cost */ + 8, /* reduc_i8_cost */ + 8, /* reduc_i16_cost */ + 8, /* reduc_i32_cost */ + 8, /* reduc_i64_cost */ + 8, /* reduc_f16_cost */ + 8, /* reduc_f32_cost */ + 8, /* reduc_f64_cost */ + 8, /* store_elt_extra_cost */ + 8, /* vec_to_scalar_cost */ + 8, /* scalar_to_vec_cost */ + 4, /* align_load_cost */ + 4, /* unalign_load_cost */ + 1, /* unalign_store_cost */ + 1 /* store_cost */ +}; + +/* Cortex-A57 costs for vector insn classes. */ +static const struct cpu_vector_cost cortexa57_vector_cost = +{ + 1, /* scalar_int_stmt_cost */ + 1, /* scalar_fp_stmt_cost */ + 4, /* scalar_load_cost */ + 1, /* scalar_store_cost */ + 1, /* cond_taken_branch_cost */ + 1, /* cond_not_taken_branch_cost */ + &cortexa57_advsimd_vector_cost, /* advsimd */ + nullptr, /* sve */ + nullptr /* issue_info */ +}; + +static const struct tune_params cortexa57_tunings = +{ + &cortexa57_extra_costs, + &generic_addrcost_table, + &cortexa57_regmove_cost, + &cortexa57_vector_cost, + &generic_branch_cost, + &generic_approx_modes, + SVE_NOT_IMPLEMENTED, /* sve_width */ + { 4, /* load_int. */ + 4, /* store_int. */ + 4, /* load_fp. */ + 4, /* store_fp. */ + 4, /* load_pred. */ + 4 /* store_pred. */ + }, /* memmov_cost. */ + 3, /* issue_rate */ + (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD + | AARCH64_FUSE_MOVK_MOVK), /* fusible_ops */ + "16", /* function_align. */ + "4", /* jump_align. */ + "8", /* loop_align. */ + 2, /* int_reassoc_width. */ + 4, /* fp_reassoc_width. */ + 1, /* fma_reassoc_width. */ + 1, /* vec_reassoc_width. */ + 2, /* min_div_recip_mul_sf. */ + 2, /* min_div_recip_mul_df. */ + 0, /* max_case_values. 
*/ + tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ + (AARCH64_EXTRA_TUNE_RENAME_FMA_REGS), /* tune_flags. */ + &generic_prefetch_tune, + AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ + AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. */ +}; + +#endif /* GCC_AARCH64_H_CORTEXA57. */ diff --git a/gcc/config/aarch64/tuning_models/cortexa72.h b/gcc/config/aarch64/tuning_models/cortexa72.h new file mode 100644 index 0000000000000000000000000000000000000000..968171c9b2e898d7479dbcb462e33fe3905e183d --- /dev/null +++ b/gcc/config/aarch64/tuning_models/cortexa72.h @@ -0,0 +1,61 @@ +/* Tuning model description for AArch64 architecture. + Copyright (C) 2009-2023 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. + + You should have received a copy of the GNU General Public License + along with GCC; see the file COPYING3. If not see + . */ + +#ifndef GCC_AARCH64_H_CORTEXA72 +#define GCC_AARCH64_H_CORTEXA72 + +#include "generic.h" + +static const struct tune_params cortexa72_tunings = +{ + &cortexa57_extra_costs, + &generic_addrcost_table, + &cortexa57_regmove_cost, + &cortexa57_vector_cost, + &generic_branch_cost, + &generic_approx_modes, + SVE_NOT_IMPLEMENTED, /* sve_width */ + { 4, /* load_int. */ + 4, /* store_int. */ + 4, /* load_fp. */ + 4, /* store_fp. */ + 4, /* load_pred. */ + 4 /* store_pred. */ + }, /* memmov_cost. */ + 3, /* issue_rate */ + (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD + | AARCH64_FUSE_MOVK_MOVK), /* fusible_ops */ + "16", /* function_align. */ + "4", /* jump_align. */ + "8", /* loop_align. */ + 2, /* int_reassoc_width. */ + 4, /* fp_reassoc_width. */ + 1, /* fma_reassoc_width. */ + 1, /* vec_reassoc_width. */ + 2, /* min_div_recip_mul_sf. */ + 2, /* min_div_recip_mul_df. */ + 0, /* max_case_values. */ + tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ + (AARCH64_EXTRA_TUNE_NONE), /* tune_flags. */ + &generic_prefetch_tune, + AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ + AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. */ +}; + +#endif /* GCC_AARCH64_H_CORTEXA72. */ diff --git a/gcc/config/aarch64/tuning_models/cortexa73.h b/gcc/config/aarch64/tuning_models/cortexa73.h new file mode 100644 index 0000000000000000000000000000000000000000..8d1a504ddac39604dd193ce0f434fd2f5145c129 --- /dev/null +++ b/gcc/config/aarch64/tuning_models/cortexa73.h @@ -0,0 +1,62 @@ +/* Tuning model description for AArch64 architecture. + Copyright (C) 2009-2023 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. 
+ + You should have received a copy of the GNU General Public License + along with GCC; see the file COPYING3. If not see + . */ + +#ifndef GCC_AARCH64_H_CORTEXA73 +#define GCC_AARCH64_H_CORTEXA73 + +#include "generic.h" + +static const struct tune_params cortexa73_tunings = +{ + &cortexa57_extra_costs, + &generic_addrcost_table, + &cortexa57_regmove_cost, + &cortexa57_vector_cost, + &generic_branch_cost, + &generic_approx_modes, + SVE_NOT_IMPLEMENTED, /* sve_width */ + { 4, /* load_int. */ + 4, /* store_int. */ + 4, /* load_fp. */ + 4, /* store_fp. */ + 4, /* load_pred. */ + 4 /* store_pred. */ + }, /* memmov_cost. */ + 2, /* issue_rate. */ + (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD + | AARCH64_FUSE_MOVK_MOVK | AARCH64_FUSE_ADRP_LDR), /* fusible_ops */ + "16", /* function_align. */ + "4", /* jump_align. */ + "8", /* loop_align. */ + 2, /* int_reassoc_width. */ + 4, /* fp_reassoc_width. */ + 1, /* fma_reassoc_width. */ + 1, /* vec_reassoc_width. */ + 2, /* min_div_recip_mul_sf. */ + 2, /* min_div_recip_mul_df. */ + 0, /* max_case_values. */ + tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ + (AARCH64_EXTRA_TUNE_NONE), /* tune_flags. */ + &generic_prefetch_tune, + AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ + AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. */ +}; + + +#endif /* GCC_AARCH64_H_CORTEXA73. */ diff --git a/gcc/config/aarch64/tuning_models/emag.h b/gcc/config/aarch64/tuning_models/emag.h new file mode 100644 index 0000000000000000000000000000000000000000..3f3402c3fc2a94704eeaf9223ecb0ca1c057cace --- /dev/null +++ b/gcc/config/aarch64/tuning_models/emag.h @@ -0,0 +1,60 @@ +/* Tuning model description for AArch64 architecture. + Copyright (C) 2009-2023 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. + + You should have received a copy of the GNU General Public License + along with GCC; see the file COPYING3. If not see + . */ + +#ifndef GCC_AARCH64_H_EMAG +#define GCC_AARCH64_H_EMAG + +#include "generic.h" + +static const struct tune_params emag_tunings = +{ + &xgene1_extra_costs, + &xgene1_addrcost_table, + &xgene1_regmove_cost, + &xgene1_vector_cost, + &generic_branch_cost, + &xgene1_approx_modes, + SVE_NOT_IMPLEMENTED, + { 6, /* load_int. */ + 6, /* store_int. */ + 6, /* load_fp. */ + 6, /* store_fp. */ + 6, /* load_pred. */ + 6 /* store_pred. */ + }, /* memmov_cost. */ + 4, /* issue_rate */ + AARCH64_FUSE_NOTHING, /* fusible_ops */ + "16", /* function_align. */ + "16", /* jump_align. */ + "16", /* loop_align. */ + 2, /* int_reassoc_width. */ + 4, /* fp_reassoc_width. */ + 1, /* fma_reassoc_width. */ + 1, /* vec_reassoc_width. */ + 2, /* min_div_recip_mul_sf. */ + 2, /* min_div_recip_mul_df. */ + 17, /* max_case_values. */ + tune_params::AUTOPREFETCHER_OFF, /* autoprefetcher_model. */ + (AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS), /* tune_flags. */ + &xgene1_prefetch_tune, + AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ + AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. */ +}; + +#endif /* GCC_AARCH64_H_EMAG. 
*/ diff --git a/gcc/config/aarch64/tuning_models/exynosm1.h b/gcc/config/aarch64/tuning_models/exynosm1.h new file mode 100644 index 0000000000000000000000000000000000000000..a42ea4df97f3f048c41481c304fd3684a69d743b --- /dev/null +++ b/gcc/config/aarch64/tuning_models/exynosm1.h @@ -0,0 +1,144 @@ +/* Tuning model description for AArch64 architecture. + Copyright (C) 2009-2023 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. + + You should have received a copy of the GNU General Public License + along with GCC; see the file COPYING3. If not see + . */ + +#ifndef GCC_AARCH64_H_EXYNOSM1 +#define GCC_AARCH64_H_EXYNOSM1 + +#include "generic.h" + +static const struct cpu_addrcost_table exynosm1_addrcost_table = +{ + { + 0, /* hi */ + 0, /* si */ + 0, /* di */ + 2, /* ti */ + }, + 0, /* pre_modify */ + 0, /* post_modify */ + 0, /* post_modify_ld3_st3 */ + 0, /* post_modify_ld4_st4 */ + 1, /* register_offset */ + 1, /* register_sextend */ + 2, /* register_zextend */ + 0, /* imm_offset */ +}; + +static const struct cpu_regmove_cost exynosm1_regmove_cost = +{ + 1, /* GP2GP */ + /* Avoid the use of slow int<->fp moves for spilling by setting + their cost higher than memmov_cost (actual, 4 and 9). */ + 9, /* GP2FP */ + 9, /* FP2GP */ + 1 /* FP2FP */ +}; + +static const advsimd_vec_cost exynosm1_advsimd_vector_cost = +{ + 3, /* int_stmt_cost */ + 3, /* fp_stmt_cost */ + 0, /* ld2_st2_permute_cost */ + 0, /* ld3_st3_permute_cost */ + 0, /* ld4_st4_permute_cost */ + 3, /* permute_cost */ + 3, /* reduc_i8_cost */ + 3, /* reduc_i16_cost */ + 3, /* reduc_i32_cost */ + 3, /* reduc_i64_cost */ + 3, /* reduc_f16_cost */ + 3, /* reduc_f32_cost */ + 3, /* reduc_f64_cost */ + 3, /* store_elt_extra_cost */ + 3, /* vec_to_scalar_cost */ + 3, /* scalar_to_vec_cost */ + 5, /* align_load_cost */ + 5, /* unalign_load_cost */ + 1, /* unalign_store_cost */ + 1 /* store_cost */ +}; + +static const struct cpu_vector_cost exynosm1_vector_cost = +{ + 1, /* scalar_int_stmt_cost */ + 1, /* scalar_fp_stmt_cost */ + 5, /* scalar_load_cost */ + 1, /* scalar_store_cost */ + 1, /* cond_taken_branch_cost */ + 1, /* cond_not_taken_branch_cost */ + &exynosm1_advsimd_vector_cost, /* advsimd */ + nullptr, /* sve */ + nullptr /* issue_info */ +}; + +/* Approximation modes for Exynos M1. */ +static const cpu_approx_modes exynosm1_approx_modes = +{ + AARCH64_APPROX_NONE, /* division */ + AARCH64_APPROX_ALL, /* sqrt */ + AARCH64_APPROX_ALL /* recip_sqrt */ +}; + +static const cpu_prefetch_tune exynosm1_prefetch_tune = +{ + 0, /* num_slots */ + -1, /* l1_cache_size */ + 64, /* l1_cache_line_size */ + -1, /* l2_cache_size */ + true, /* prefetch_dynamic_strides */ + -1, /* minimum_stride */ + -1 /* default_opt_level */ +}; + +static const struct tune_params exynosm1_tunings = +{ + &exynosm1_extra_costs, + &exynosm1_addrcost_table, + &exynosm1_regmove_cost, + &exynosm1_vector_cost, + &generic_branch_cost, + &exynosm1_approx_modes, + SVE_NOT_IMPLEMENTED, /* sve_width */ + { 4, /* load_int. */ + 4, /* store_int. */ + 4, /* load_fp. */ + 4, /* store_fp. */ + 4, /* load_pred. 
*/ + 4 /* store_pred. */ + }, /* memmov_cost. */ + 3, /* issue_rate */ + (AARCH64_FUSE_AES_AESMC), /* fusible_ops */ + "4", /* function_align. */ + "4", /* jump_align. */ + "4", /* loop_align. */ + 2, /* int_reassoc_width. */ + 4, /* fp_reassoc_width. */ + 1, /* fma_reassoc_width. */ + 1, /* vec_reassoc_width. */ + 2, /* min_div_recip_mul_sf. */ + 2, /* min_div_recip_mul_df. */ + 48, /* max_case_values. */ + tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ + (AARCH64_EXTRA_TUNE_NONE), /* tune_flags. */ + &exynosm1_prefetch_tune, + AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ + AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. */ +}; + +#endif /* GCC_AARCH64_H_EXYNOSM1. */ diff --git a/gcc/config/aarch64/tuning_models/generic.h b/gcc/config/aarch64/tuning_models/generic.h new file mode 100644 index 0000000000000000000000000000000000000000..deb2c1cffe255bddcb5be571b12086442782da60 --- /dev/null +++ b/gcc/config/aarch64/tuning_models/generic.h @@ -0,0 +1,190 @@ +/* Tuning model description for AArch64 architecture. + Copyright (C) 2009-2023 Free Software Foundation, Inc. + Contributed by ARM Ltd. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. + + You should have received a copy of the GNU General Public License + along with GCC; see the file COPYING3. If not see + . */ + +#ifndef GCC_AARCH64_H_GENERIC +#define GCC_AARCH64_H_GENERIC + +static const struct cpu_addrcost_table generic_addrcost_table = +{ + { + 1, /* hi */ + 0, /* si */ + 0, /* di */ + 1, /* ti */ + }, + 0, /* pre_modify */ + 0, /* post_modify */ + 0, /* post_modify_ld3_st3 */ + 0, /* post_modify_ld4_st4 */ + 0, /* register_offset */ + 0, /* register_sextend */ + 0, /* register_zextend */ + 0 /* imm_offset */ +}; + +static const struct cpu_regmove_cost generic_regmove_cost = +{ + 1, /* GP2GP */ + /* Avoid the use of slow int<->fp moves for spilling by setting + their cost higher than memmov_cost. */ + 5, /* GP2FP */ + 5, /* FP2GP */ + 2 /* FP2FP */ +}; + +/* Generic costs for Advanced SIMD vector operations. */ +static const advsimd_vec_cost generic_advsimd_vector_cost = +{ + 1, /* int_stmt_cost */ + 1, /* fp_stmt_cost */ + 0, /* ld2_st2_permute_cost */ + 0, /* ld3_st3_permute_cost */ + 0, /* ld4_st4_permute_cost */ + 2, /* permute_cost */ + 2, /* reduc_i8_cost */ + 2, /* reduc_i16_cost */ + 2, /* reduc_i32_cost */ + 2, /* reduc_i64_cost */ + 2, /* reduc_f16_cost */ + 2, /* reduc_f32_cost */ + 2, /* reduc_f64_cost */ + 2, /* store_elt_extra_cost */ + 2, /* vec_to_scalar_cost */ + 1, /* scalar_to_vec_cost */ + 1, /* align_load_cost */ + 1, /* unalign_load_cost */ + 1, /* unalign_store_cost */ + 1 /* store_cost */ +}; + +/* Generic costs for SVE vector operations. 
*/ +static const sve_vec_cost generic_sve_vector_cost = +{ + { + 1, /* int_stmt_cost */ + 1, /* fp_stmt_cost */ + 0, /* ld2_st2_permute_cost */ + 0, /* ld3_st3_permute_cost */ + 0, /* ld4_st4_permute_cost */ + 2, /* permute_cost */ + 2, /* reduc_i8_cost */ + 2, /* reduc_i16_cost */ + 2, /* reduc_i32_cost */ + 2, /* reduc_i64_cost */ + 2, /* reduc_f16_cost */ + 2, /* reduc_f32_cost */ + 2, /* reduc_f64_cost */ + 2, /* store_elt_extra_cost */ + 2, /* vec_to_scalar_cost */ + 1, /* scalar_to_vec_cost */ + 1, /* align_load_cost */ + 1, /* unalign_load_cost */ + 1, /* unalign_store_cost */ + 1 /* store_cost */ + }, + 2, /* clast_cost */ + 2, /* fadda_f16_cost */ + 2, /* fadda_f32_cost */ + 2, /* fadda_f64_cost */ + 4, /* gather_load_x32_cost */ + 2, /* gather_load_x64_cost */ + 1 /* scatter_store_elt_cost */ +}; + +/* Generic costs for vector insn classes. */ +static const struct cpu_vector_cost generic_vector_cost = +{ + 1, /* scalar_int_stmt_cost */ + 1, /* scalar_fp_stmt_cost */ + 1, /* scalar_load_cost */ + 1, /* scalar_store_cost */ + 3, /* cond_taken_branch_cost */ + 1, /* cond_not_taken_branch_cost */ + &generic_advsimd_vector_cost, /* advsimd */ + &generic_sve_vector_cost, /* sve */ + nullptr /* issue_info */ +}; + +/* Generic costs for branch instructions. */ +static const struct cpu_branch_cost generic_branch_cost = +{ + 1, /* Predictable. */ + 3 /* Unpredictable. */ +}; + +/* Generic approximation modes. */ +static const cpu_approx_modes generic_approx_modes = +{ + AARCH64_APPROX_NONE, /* division */ + AARCH64_APPROX_NONE, /* sqrt */ + AARCH64_APPROX_NONE /* recip_sqrt */ +}; + +/* Generic prefetch settings (which disable prefetch). */ +static const cpu_prefetch_tune generic_prefetch_tune = +{ + 0, /* num_slots */ + -1, /* l1_cache_size */ + -1, /* l1_cache_line_size */ + -1, /* l2_cache_size */ + true, /* prefetch_dynamic_strides */ + -1, /* minimum_stride */ + -1 /* default_opt_level */ +}; + +static const struct tune_params generic_tunings = +{ + &cortexa57_extra_costs, + &generic_addrcost_table, + &generic_regmove_cost, + &generic_vector_cost, + &generic_branch_cost, + &generic_approx_modes, + SVE_NOT_IMPLEMENTED, /* sve_width */ + { 4, /* load_int. */ + 4, /* store_int. */ + 4, /* load_fp. */ + 4, /* store_fp. */ + 4, /* load_pred. */ + 4 /* store_pred. */ + }, /* memmov_cost. */ + 2, /* issue_rate */ + (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_CMP_BRANCH), /* fusible_ops */ + "16:12", /* function_align. */ + "4", /* jump_align. */ + "8", /* loop_align. */ + 2, /* int_reassoc_width. */ + 4, /* fp_reassoc_width. */ + 1, /* fma_reassoc_width. */ + 1, /* vec_reassoc_width. */ + 2, /* min_div_recip_mul_sf. */ + 2, /* min_div_recip_mul_df. */ + 0, /* max_case_values. */ + tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ + /* Enabling AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS significantly benefits + Neoverse V1. It does not have a noticeable effect on A64FX and should + have at most a very minor effect on SVE2 cores. */ + (AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS), /* tune_flags. */ + &generic_prefetch_tune, + AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ + AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. */ +}; + +#endif /* GCC_AARCH64_H_GENERIC. 
*/ diff --git a/gcc/config/aarch64/tuning_models/neoverse512tvb.h b/gcc/config/aarch64/tuning_models/neoverse512tvb.h new file mode 100644 index 0000000000000000000000000000000000000000..50d7b23712cc6a8be8f35246657ec5d86d6d4191 --- /dev/null +++ b/gcc/config/aarch64/tuning_models/neoverse512tvb.h @@ -0,0 +1,164 @@ +/* Tuning model description for AArch64 architecture. + Copyright (C) 2009-2023 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. + + You should have received a copy of the GNU General Public License + along with GCC; see the file COPYING3. If not see + . */ + +#ifndef GCC_AARCH64_H_NEOVERSE512TVB +#define GCC_AARCH64_H_NEOVERSE512TVB + +#include "generic.h" + +static const sve_vec_cost neoverse512tvb_sve_vector_cost = +{ + { + 2, /* int_stmt_cost */ + 2, /* fp_stmt_cost */ + 4, /* ld2_st2_permute_cost */ + 5, /* ld3_st3_permute_cost */ + 5, /* ld4_st4_permute_cost */ + 3, /* permute_cost */ + /* Theoretically, a reduction involving 15 scalar ADDs could + complete in ~5 cycles and would have a cost of 15. Assume that + [SU]ADDV completes in 11 cycles and so give it a cost of 15 + 6. */ + 21, /* reduc_i8_cost */ + /* Likewise for 7 scalar ADDs (~3 cycles) vs. 9: 7 + 6. */ + 13, /* reduc_i16_cost */ + /* Likewise for 3 scalar ADDs (~2 cycles) vs. 8: 3 + 6. */ + 9, /* reduc_i32_cost */ + /* Likewise for 1 scalar ADD (1 cycle) vs. 8: 1 + 7. */ + 8, /* reduc_i64_cost */ + /* Theoretically, a reduction involving 7 scalar FADDs could + complete in ~6 cycles and would have a cost of 14. Assume that + FADDV completes in 8 cycles and so give it a cost of 14 + 2. */ + 16, /* reduc_f16_cost */ + /* Likewise for 3 scalar FADDs (~4 cycles) vs. 6: 6 + 2. */ + 8, /* reduc_f32_cost */ + /* Likewise for 1 scalar FADD (2 cycles) vs. 4: 2 + 2. */ + 4, /* reduc_f64_cost */ + 2, /* store_elt_extra_cost */ + /* This value is just inherited from the Cortex-A57 table. */ + 8, /* vec_to_scalar_cost */ + /* This depends very much on what the scalar value is and + where it comes from. E.g. some constants take two dependent + instructions or a load, while others might be moved from a GPR. + 4 seems to be a reasonable compromise in practice. */ + 4, /* scalar_to_vec_cost */ + 4, /* align_load_cost */ + 4, /* unalign_load_cost */ + /* Although stores generally have a latency of 2 and compete for the + vector pipes, in practice it's better not to model that. */ + 1, /* unalign_store_cost */ + 1 /* store_cost */ + }, + 3, /* clast_cost */ + 10, /* fadda_f16_cost */ + 6, /* fadda_f32_cost */ + 4, /* fadda_f64_cost */ + /* A strided Advanced SIMD x64 load would take two parallel FP loads + (6 cycles) plus an insertion (2 cycles). Assume a 64-bit SVE gather + is 1 cycle more. The Advanced SIMD version is costed as 2 scalar loads + (cost 8) and a vec_construct (cost 2). Add a full vector operation + (cost 2) to that, to avoid the difference being lost in rounding. + + There is no easy comparison between a strided Advanced SIMD x32 load + and an SVE 32-bit gather, but cost an SVE 32-bit gather as 1 vector + operation more than a 64-bit gather. 
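+
+   As a worked sum: the strided Advanced SIMD x64 version above is
+   2 scalar loads (8) + vec_construct (2) + 1 vector operation (2) = 12,
+   and the SVE 32-bit gather is costed one vector operation higher,
+   12 + 2 = 14.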
*/ + 14, /* gather_load_x32_cost */ + 12, /* gather_load_x64_cost */ + 3 /* scatter_store_elt_cost */ +}; + +static const aarch64_sve_vec_issue_info neoverse512tvb_sve_issue_info = +{ + { + { + 3, /* loads_per_cycle */ + 2, /* stores_per_cycle */ + 4, /* general_ops_per_cycle */ + 0, /* fp_simd_load_general_ops */ + 1 /* fp_simd_store_general_ops */ + }, + 2, /* ld2_st2_general_ops */ + 2, /* ld3_st3_general_ops */ + 3 /* ld4_st4_general_ops */ + }, + 2, /* pred_ops_per_cycle */ + 2, /* while_pred_ops */ + 2, /* int_cmp_pred_ops */ + 1, /* fp_cmp_pred_ops */ + 1, /* gather_scatter_pair_general_ops */ + 1 /* gather_scatter_pair_pred_ops */ +}; + +static const aarch64_vec_issue_info neoverse512tvb_vec_issue_info = +{ + &neoversev1_scalar_issue_info, + &neoversev1_advsimd_issue_info, + &neoverse512tvb_sve_issue_info +}; + +static const struct cpu_vector_cost neoverse512tvb_vector_cost = +{ + 1, /* scalar_int_stmt_cost */ + 2, /* scalar_fp_stmt_cost */ + 4, /* scalar_load_cost */ + 1, /* scalar_store_cost */ + 1, /* cond_taken_branch_cost */ + 1, /* cond_not_taken_branch_cost */ + &neoversev1_advsimd_vector_cost, /* advsimd */ + &neoverse512tvb_sve_vector_cost, /* sve */ + &neoverse512tvb_vec_issue_info /* issue_info */ +}; + +static const struct tune_params neoverse512tvb_tunings = +{ + &cortexa76_extra_costs, + &neoversev1_addrcost_table, + &neoversev1_regmove_cost, + &neoverse512tvb_vector_cost, + &generic_branch_cost, + &generic_approx_modes, + SVE_128 | SVE_256, /* sve_width */ + { 4, /* load_int. */ + 2, /* store_int. */ + 6, /* load_fp. */ + 2, /* store_fp. */ + 6, /* load_pred. */ + 1 /* store_pred. */ + }, /* memmov_cost. */ + 3, /* issue_rate */ + (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_CMP_BRANCH), /* fusible_ops */ + "32:16", /* function_align. */ + "4", /* jump_align. */ + "32:16", /* loop_align. */ + 2, /* int_reassoc_width. */ + 4, /* fp_reassoc_width. */ + 4, /* fma_reassoc_width. */ + 2, /* vec_reassoc_width. */ + 2, /* min_div_recip_mul_sf. */ + 2, /* min_div_recip_mul_df. */ + 0, /* max_case_values. */ + tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ + (AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS + | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS + | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT), /* tune_flags. */ + &generic_prefetch_tune, + AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ + AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. */ +}; + +#endif /* GCC_AARCH64_H_NEOVERSE512TVB. */ diff --git a/gcc/config/aarch64/tuning_models/neoversen1.h b/gcc/config/aarch64/tuning_models/neoversen1.h new file mode 100644 index 0000000000000000000000000000000000000000..132166d3d06430b725e4448937332cc159c11cda --- /dev/null +++ b/gcc/config/aarch64/tuning_models/neoversen1.h @@ -0,0 +1,60 @@ +/* Tuning model description for AArch64 architecture. + Copyright (C) 2009-2023 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. + + You should have received a copy of the GNU General Public License + along with GCC; see the file COPYING3. If not see + . 
*/ + +#ifndef GCC_AARCH64_H_NEOVERSEN1 +#define GCC_AARCH64_H_NEOVERSEN1 + +#include "generic.h" + +static const struct tune_params neoversen1_tunings = +{ + &cortexa76_extra_costs, + &generic_addrcost_table, + &generic_regmove_cost, + &cortexa57_vector_cost, + &generic_branch_cost, + &generic_approx_modes, + SVE_NOT_IMPLEMENTED, /* sve_width */ + { 4, /* load_int. */ + 2, /* store_int. */ + 5, /* load_fp. */ + 2, /* store_fp. */ + 4, /* load_pred. */ + 4 /* store_pred. */ + }, /* memmov_cost. */ + 3, /* issue_rate */ + (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_CMP_BRANCH), /* fusible_ops */ + "32:16", /* function_align. */ + "4", /* jump_align. */ + "32:16", /* loop_align. */ + 2, /* int_reassoc_width. */ + 4, /* fp_reassoc_width. */ + 1, /* fma_reassoc_width. */ + 2, /* vec_reassoc_width. */ + 2, /* min_div_recip_mul_sf. */ + 2, /* min_div_recip_mul_df. */ + 0, /* max_case_values. */ + tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ + (AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND), /* tune_flags. */ + &generic_prefetch_tune, + AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ + AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. */ +}; + +#endif /* GCC_AARCH64_H_NEOVERSEN1. */ diff --git a/gcc/config/aarch64/tuning_models/neoversen2.h b/gcc/config/aarch64/tuning_models/neoversen2.h new file mode 100644 index 0000000000000000000000000000000000000000..395a6d82b8403e586bf179cade055543cf9b9eb0 --- /dev/null +++ b/gcc/config/aarch64/tuning_models/neoversen2.h @@ -0,0 +1,245 @@ +/* Tuning model description for AArch64 architecture. + Copyright (C) 2009-2023 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. + + You should have received a copy of the GNU General Public License + along with GCC; see the file COPYING3. If not see + . */ + +#ifndef GCC_AARCH64_H_NEOVERSEN2 +#define GCC_AARCH64_H_NEOVERSEN2 + +#include "generic.h" + +static const struct cpu_addrcost_table neoversen2_addrcost_table = +{ + { + 1, /* hi */ + 0, /* si */ + 0, /* di */ + 1, /* ti */ + }, + 0, /* pre_modify */ + 0, /* post_modify */ + 2, /* post_modify_ld3_st3 */ + 2, /* post_modify_ld4_st4 */ + 0, /* register_offset */ + 0, /* register_sextend */ + 0, /* register_zextend */ + 0 /* imm_offset */ +}; + +static const struct cpu_regmove_cost neoversen2_regmove_cost = +{ + 1, /* GP2GP */ + /* Spilling to int<->fp instead of memory is recommended so set + realistic costs compared to memmov_cost. */ + 3, /* GP2FP */ + 2, /* FP2GP */ + 2 /* FP2FP */ +}; + +static const advsimd_vec_cost neoversen2_advsimd_vector_cost = +{ + 2, /* int_stmt_cost */ + 2, /* fp_stmt_cost */ + 2, /* ld2_st2_permute_cost */ + 2, /* ld3_st3_permute_cost */ + 3, /* ld4_st4_permute_cost */ + 3, /* permute_cost */ + 4, /* reduc_i8_cost */ + 4, /* reduc_i16_cost */ + 2, /* reduc_i32_cost */ + 2, /* reduc_i64_cost */ + 6, /* reduc_f16_cost */ + 4, /* reduc_f32_cost */ + 2, /* reduc_f64_cost */ + 2, /* store_elt_extra_cost */ + /* This value is just inherited from the Cortex-A57 table. 
*/ + 8, /* vec_to_scalar_cost */ + /* This depends very much on what the scalar value is and + where it comes from. E.g. some constants take two dependent + instructions or a load, while others might be moved from a GPR. + 4 seems to be a reasonable compromise in practice. */ + 4, /* scalar_to_vec_cost */ + 4, /* align_load_cost */ + 4, /* unalign_load_cost */ + /* Although stores have a latency of 2 and compete for the + vector pipes, in practice it's better not to model that. */ + 1, /* unalign_store_cost */ + 1 /* store_cost */ +}; + +static const sve_vec_cost neoversen2_sve_vector_cost = +{ + { + 2, /* int_stmt_cost */ + 2, /* fp_stmt_cost */ + 3, /* ld2_st2_permute_cost */ + 4, /* ld3_st3_permute_cost */ + 4, /* ld4_st4_permute_cost */ + 3, /* permute_cost */ + /* Theoretically, a reduction involving 15 scalar ADDs could + complete in ~5 cycles and would have a cost of 15. [SU]ADDV + completes in 11 cycles, so give it a cost of 15 + 6. */ + 21, /* reduc_i8_cost */ + /* Likewise for 7 scalar ADDs (~3 cycles) vs. 9: 7 + 6. */ + 13, /* reduc_i16_cost */ + /* Likewise for 3 scalar ADDs (~2 cycles) vs. 8: 3 + 6. */ + 9, /* reduc_i32_cost */ + /* Likewise for 1 scalar ADD (~1 cycles) vs. 2: 1 + 1. */ + 2, /* reduc_i64_cost */ + /* Theoretically, a reduction involving 7 scalar FADDs could + complete in ~8 cycles and would have a cost of 14. FADDV + completes in 6 cycles, so give it a cost of 14 - 2. */ + 12, /* reduc_f16_cost */ + /* Likewise for 3 scalar FADDs (~4 cycles) vs. 4: 6 - 0. */ + 6, /* reduc_f32_cost */ + /* Likewise for 1 scalar FADD (~2 cycles) vs. 2: 2 - 0. */ + 2, /* reduc_f64_cost */ + 2, /* store_elt_extra_cost */ + /* This value is just inherited from the Cortex-A57 table. */ + 8, /* vec_to_scalar_cost */ + /* See the comment above the Advanced SIMD versions. */ + 4, /* scalar_to_vec_cost */ + 4, /* align_load_cost */ + 4, /* unalign_load_cost */ + /* Although stores have a latency of 2 and compete for the + vector pipes, in practice it's better not to model that. */ + 1, /* unalign_store_cost */ + 1 /* store_cost */ + }, + 3, /* clast_cost */ + 10, /* fadda_f16_cost */ + 6, /* fadda_f32_cost */ + 4, /* fadda_f64_cost */ + /* A strided Advanced SIMD x64 load would take two parallel FP loads + (8 cycles) plus an insertion (2 cycles). Assume a 64-bit SVE gather + is 1 cycle more. The Advanced SIMD version is costed as 2 scalar loads + (cost 8) and a vec_construct (cost 2). Add a full vector operation + (cost 2) to that, to avoid the difference being lost in rounding. + + There is no easy comparison between a strided Advanced SIMD x32 load + and an SVE 32-bit gather, but cost an SVE 32-bit gather as 1 vector + operation more than a 64-bit gather. 
*/ + 14, /* gather_load_x32_cost */ + 12, /* gather_load_x64_cost */ + 3 /* scatter_store_elt_cost */ +}; + +static const aarch64_scalar_vec_issue_info neoversen2_scalar_issue_info = +{ + 3, /* loads_stores_per_cycle */ + 2, /* stores_per_cycle */ + 4, /* general_ops_per_cycle */ + 0, /* fp_simd_load_general_ops */ + 1 /* fp_simd_store_general_ops */ +}; + +static const aarch64_advsimd_vec_issue_info neoversen2_advsimd_issue_info = +{ + { + 3, /* loads_stores_per_cycle */ + 2, /* stores_per_cycle */ + 2, /* general_ops_per_cycle */ + 0, /* fp_simd_load_general_ops */ + 1 /* fp_simd_store_general_ops */ + }, + 2, /* ld2_st2_general_ops */ + 2, /* ld3_st3_general_ops */ + 3 /* ld4_st4_general_ops */ +}; + +static const aarch64_sve_vec_issue_info neoversen2_sve_issue_info = +{ + { + { + 3, /* loads_per_cycle */ + 2, /* stores_per_cycle */ + 2, /* general_ops_per_cycle */ + 0, /* fp_simd_load_general_ops */ + 1 /* fp_simd_store_general_ops */ + }, + 2, /* ld2_st2_general_ops */ + 3, /* ld3_st3_general_ops */ + 3 /* ld4_st4_general_ops */ + }, + 2, /* pred_ops_per_cycle */ + 2, /* while_pred_ops */ + 2, /* int_cmp_pred_ops */ + 1, /* fp_cmp_pred_ops */ + 1, /* gather_scatter_pair_general_ops */ + 1 /* gather_scatter_pair_pred_ops */ +}; + +static const aarch64_vec_issue_info neoversen2_vec_issue_info = +{ + &neoversen2_scalar_issue_info, + &neoversen2_advsimd_issue_info, + &neoversen2_sve_issue_info +}; + +/* Neoverse N2 costs for vector insn classes. */ +static const struct cpu_vector_cost neoversen2_vector_cost = +{ + 1, /* scalar_int_stmt_cost */ + 2, /* scalar_fp_stmt_cost */ + 4, /* scalar_load_cost */ + 1, /* scalar_store_cost */ + 1, /* cond_taken_branch_cost */ + 1, /* cond_not_taken_branch_cost */ + &neoversen2_advsimd_vector_cost, /* advsimd */ + &neoversen2_sve_vector_cost, /* sve */ + &neoversen2_vec_issue_info /* issue_info */ +}; + +static const struct tune_params neoversen2_tunings = +{ + &cortexa76_extra_costs, + &neoversen2_addrcost_table, + &neoversen2_regmove_cost, + &neoversen2_vector_cost, + &generic_branch_cost, + &generic_approx_modes, + SVE_128, /* sve_width */ + { 4, /* load_int. */ + 1, /* store_int. */ + 6, /* load_fp. */ + 2, /* store_fp. */ + 6, /* load_pred. */ + 1 /* store_pred. */ + }, /* memmov_cost. */ + 3, /* issue_rate */ + (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_CMP_BRANCH), /* fusible_ops */ + "32:16", /* function_align. */ + "4", /* jump_align. */ + "32:16", /* loop_align. */ + 2, /* int_reassoc_width. */ + 4, /* fp_reassoc_width. */ + 1, /* fma_reassoc_width. */ + 2, /* vec_reassoc_width. */ + 2, /* min_div_recip_mul_sf. */ + 2, /* min_div_recip_mul_df. */ + 0, /* max_case_values. */ + tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ + (AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND + | AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS + | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS + | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT), /* tune_flags. */ + &generic_prefetch_tune, + AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ + AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. */ +}; + +#endif /* GCC_AARCH64_H_NEOVERSEN2. */ diff --git a/gcc/config/aarch64/tuning_models/neoversev1.h b/gcc/config/aarch64/tuning_models/neoversev1.h new file mode 100644 index 0000000000000000000000000000000000000000..584a5000e06f598dcdd3bcc533dc6dbc642223ca --- /dev/null +++ b/gcc/config/aarch64/tuning_models/neoversev1.h @@ -0,0 +1,237 @@ +/* Tuning model description for AArch64 architecture. + Copyright (C) 2009-2023 Free Software Foundation, Inc. + + This file is part of GCC. 
+ + GCC is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. + + You should have received a copy of the GNU General Public License + along with GCC; see the file COPYING3. If not see + . */ + +#ifndef GCC_AARCH64_H_NEOVERSEV1 +#define GCC_AARCH64_H_NEOVERSEV1 + +#include "generic.h" + +static const struct cpu_addrcost_table neoversev1_addrcost_table = +{ + { + 1, /* hi */ + 0, /* si */ + 0, /* di */ + 1, /* ti */ + }, + 0, /* pre_modify */ + 0, /* post_modify */ + 3, /* post_modify_ld3_st3 */ + 3, /* post_modify_ld4_st4 */ + 0, /* register_offset */ + 0, /* register_sextend */ + 0, /* register_zextend */ + 0 /* imm_offset */ +}; + +static const struct cpu_regmove_cost neoversev1_regmove_cost = +{ + 1, /* GP2GP */ + /* Spilling to int<->fp instead of memory is recommended so set + realistic costs compared to memmov_cost. */ + 3, /* GP2FP */ + 2, /* FP2GP */ + 2 /* FP2FP */ +}; + +static const advsimd_vec_cost neoversev1_advsimd_vector_cost = +{ + 2, /* int_stmt_cost */ + 2, /* fp_stmt_cost */ + 4, /* ld2_st2_permute_cost */ + 4, /* ld3_st3_permute_cost */ + 5, /* ld4_st4_permute_cost */ + 3, /* permute_cost */ + 4, /* reduc_i8_cost */ + 4, /* reduc_i16_cost */ + 2, /* reduc_i32_cost */ + 2, /* reduc_i64_cost */ + 6, /* reduc_f16_cost */ + 3, /* reduc_f32_cost */ + 2, /* reduc_f64_cost */ + 2, /* store_elt_extra_cost */ + /* This value is just inherited from the Cortex-A57 table. */ + 8, /* vec_to_scalar_cost */ + /* This depends very much on what the scalar value is and + where it comes from. E.g. some constants take two dependent + instructions or a load, while others might be moved from a GPR. + 4 seems to be a reasonable compromise in practice. */ + 4, /* scalar_to_vec_cost */ + 4, /* align_load_cost */ + 4, /* unalign_load_cost */ + /* Although stores have a latency of 2 and compete for the + vector pipes, in practice it's better not to model that. */ + 1, /* unalign_store_cost */ + 1 /* store_cost */ +}; + +static const sve_vec_cost neoversev1_sve_vector_cost = +{ + { + 2, /* int_stmt_cost */ + 2, /* fp_stmt_cost */ + 4, /* ld2_st2_permute_cost */ + 7, /* ld3_st3_permute_cost */ + 8, /* ld4_st4_permute_cost */ + 3, /* permute_cost */ + /* Theoretically, a reduction involving 31 scalar ADDs could + complete in ~9 cycles and would have a cost of 31. [SU]ADDV + completes in 14 cycles, so give it a cost of 31 + 5. */ + 36, /* reduc_i8_cost */ + /* Likewise for 15 scalar ADDs (~5 cycles) vs. 12: 15 + 7. */ + 22, /* reduc_i16_cost */ + /* Likewise for 7 scalar ADDs (~3 cycles) vs. 10: 7 + 7. */ + 14, /* reduc_i32_cost */ + /* Likewise for 3 scalar ADDs (~2 cycles) vs. 10: 3 + 8. */ + 11, /* reduc_i64_cost */ + /* Theoretically, a reduction involving 15 scalar FADDs could + complete in ~9 cycles and would have a cost of 30. FADDV + completes in 13 cycles, so give it a cost of 30 + 4. */ + 34, /* reduc_f16_cost */ + /* Likewise for 7 scalar FADDs (~6 cycles) vs. 11: 14 + 5. */ + 19, /* reduc_f32_cost */ + /* Likewise for 3 scalar FADDs (~4 cycles) vs. 9: 6 + 5. */ + 11, /* reduc_f64_cost */ + 2, /* store_elt_extra_cost */ + /* This value is just inherited from the Cortex-A57 table. 
*/ + 8, /* vec_to_scalar_cost */ + /* See the comment above the Advanced SIMD versions. */ + 4, /* scalar_to_vec_cost */ + 4, /* align_load_cost */ + 4, /* unalign_load_cost */ + /* Although stores have a latency of 2 and compete for the + vector pipes, in practice it's better not to model that. */ + 1, /* unalign_store_cost */ + 1 /* store_cost */ + }, + 3, /* clast_cost */ + 19, /* fadda_f16_cost */ + 11, /* fadda_f32_cost */ + 8, /* fadda_f64_cost */ + 32, /* gather_load_x32_cost */ + 16, /* gather_load_x64_cost */ + 3 /* scatter_store_elt_cost */ +}; + +static const aarch64_scalar_vec_issue_info neoversev1_scalar_issue_info = +{ + 3, /* loads_stores_per_cycle */ + 2, /* stores_per_cycle */ + 4, /* general_ops_per_cycle */ + 0, /* fp_simd_load_general_ops */ + 1 /* fp_simd_store_general_ops */ +}; + +static const aarch64_advsimd_vec_issue_info neoversev1_advsimd_issue_info = +{ + { + 3, /* loads_stores_per_cycle */ + 2, /* stores_per_cycle */ + 4, /* general_ops_per_cycle */ + 0, /* fp_simd_load_general_ops */ + 1 /* fp_simd_store_general_ops */ + }, + 2, /* ld2_st2_general_ops */ + 2, /* ld3_st3_general_ops */ + 3 /* ld4_st4_general_ops */ +}; + +static const aarch64_sve_vec_issue_info neoversev1_sve_issue_info = +{ + { + { + 2, /* loads_per_cycle */ + 2, /* stores_per_cycle */ + 2, /* general_ops_per_cycle */ + 0, /* fp_simd_load_general_ops */ + 1 /* fp_simd_store_general_ops */ + }, + 2, /* ld2_st2_general_ops */ + 2, /* ld3_st3_general_ops */ + 3 /* ld4_st4_general_ops */ + }, + 1, /* pred_ops_per_cycle */ + 2, /* while_pred_ops */ + 2, /* int_cmp_pred_ops */ + 1, /* fp_cmp_pred_ops */ + 1, /* gather_scatter_pair_general_ops */ + 1 /* gather_scatter_pair_pred_ops */ +}; + +static const aarch64_vec_issue_info neoversev1_vec_issue_info = +{ + &neoversev1_scalar_issue_info, + &neoversev1_advsimd_issue_info, + &neoversev1_sve_issue_info +}; + +/* Neoverse V1 costs for vector insn classes. */ +static const struct cpu_vector_cost neoversev1_vector_cost = +{ + 1, /* scalar_int_stmt_cost */ + 2, /* scalar_fp_stmt_cost */ + 4, /* scalar_load_cost */ + 1, /* scalar_store_cost */ + 1, /* cond_taken_branch_cost */ + 1, /* cond_not_taken_branch_cost */ + &neoversev1_advsimd_vector_cost, /* advsimd */ + &neoversev1_sve_vector_cost, /* sve */ + &neoversev1_vec_issue_info /* issue_info */ +}; + +static const struct tune_params neoversev1_tunings = +{ + &cortexa76_extra_costs, + &neoversev1_addrcost_table, + &neoversev1_regmove_cost, + &neoversev1_vector_cost, + &generic_branch_cost, + &generic_approx_modes, + SVE_256, /* sve_width */ + { 4, /* load_int. */ + 2, /* store_int. */ + 6, /* load_fp. */ + 2, /* store_fp. */ + 6, /* load_pred. */ + 1 /* store_pred. */ + }, /* memmov_cost. */ + 3, /* issue_rate */ + (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_CMP_BRANCH), /* fusible_ops */ + "32:16", /* function_align. */ + "4", /* jump_align. */ + "32:16", /* loop_align. */ + 2, /* int_reassoc_width. */ + 4, /* fp_reassoc_width. */ + 4, /* fma_reassoc_width. */ + 2, /* vec_reassoc_width. */ + 2, /* min_div_recip_mul_sf. */ + 2, /* min_div_recip_mul_df. */ + 0, /* max_case_values. */ + tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ + (AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS + | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS + | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT + | AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND), /* tune_flags. */ + &generic_prefetch_tune, + AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ + AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. 
*/ +}; + + +#endif /* GCC_AARCH64_H_NEOVERSEV1. */ diff --git a/gcc/config/aarch64/tuning_models/neoversev2.h b/gcc/config/aarch64/tuning_models/neoversev2.h new file mode 100644 index 0000000000000000000000000000000000000000..28d4244ef4c99ecdffb7408e39dc21bc191223de --- /dev/null +++ b/gcc/config/aarch64/tuning_models/neoversev2.h @@ -0,0 +1,245 @@ +/* Tuning model description for AArch64 architecture. + Copyright (C) 2009-2023 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. + + You should have received a copy of the GNU General Public License + along with GCC; see the file COPYING3. If not see + . */ + +#ifndef GCC_AARCH64_H_NEOVERSEV2 +#define GCC_AARCH64_H_NEOVERSEV2 + +#include "generic.h" + +static const struct cpu_addrcost_table neoversev2_addrcost_table = +{ + { + 1, /* hi */ + 0, /* si */ + 0, /* di */ + 1, /* ti */ + }, + 0, /* pre_modify */ + 0, /* post_modify */ + 2, /* post_modify_ld3_st3 */ + 2, /* post_modify_ld4_st4 */ + 0, /* register_offset */ + 0, /* register_sextend */ + 0, /* register_zextend */ + 0 /* imm_offset */ +}; + +static const struct cpu_regmove_cost neoversev2_regmove_cost = +{ + 1, /* GP2GP */ + /* Spilling to int<->fp instead of memory is recommended so set + realistic costs compared to memmov_cost. */ + 3, /* GP2FP */ + 2, /* FP2GP */ + 2 /* FP2FP */ +}; + +static const advsimd_vec_cost neoversev2_advsimd_vector_cost = +{ + 2, /* int_stmt_cost */ + 2, /* fp_stmt_cost */ + 2, /* ld2_st2_permute_cost */ + 2, /* ld3_st3_permute_cost */ + 3, /* ld4_st4_permute_cost */ + 3, /* permute_cost */ + 4, /* reduc_i8_cost */ + 4, /* reduc_i16_cost */ + 2, /* reduc_i32_cost */ + 2, /* reduc_i64_cost */ + 6, /* reduc_f16_cost */ + 3, /* reduc_f32_cost */ + 2, /* reduc_f64_cost */ + 2, /* store_elt_extra_cost */ + /* This value is just inherited from the Cortex-A57 table. */ + 8, /* vec_to_scalar_cost */ + /* This depends very much on what the scalar value is and + where it comes from. E.g. some constants take two dependent + instructions or a load, while others might be moved from a GPR. + 4 seems to be a reasonable compromise in practice. */ + 4, /* scalar_to_vec_cost */ + 4, /* align_load_cost */ + 4, /* unalign_load_cost */ + /* Although stores have a latency of 2 and compete for the + vector pipes, in practice it's better not to model that. */ + 1, /* unalign_store_cost */ + 1 /* store_cost */ +}; + +static const sve_vec_cost neoversev2_sve_vector_cost = +{ + { + 2, /* int_stmt_cost */ + 2, /* fp_stmt_cost */ + 3, /* ld2_st2_permute_cost */ + 3, /* ld3_st3_permute_cost */ + 4, /* ld4_st4_permute_cost */ + 3, /* permute_cost */ + /* Theoretically, a reduction involving 15 scalar ADDs could + complete in ~3 cycles and would have a cost of 15. [SU]ADDV + completes in 11 cycles, so give it a cost of 15 + 8. */ + 21, /* reduc_i8_cost */ + /* Likewise for 7 scalar ADDs (~2 cycles) vs. 9: 7 + 7. */ + 14, /* reduc_i16_cost */ + /* Likewise for 3 scalar ADDs (~2 cycles) vs. 8: 3 + 4. */ + 7, /* reduc_i32_cost */ + /* Likewise for 1 scalar ADD (~1 cycles) vs. 2: 1 + 1. 
*/ + 2, /* reduc_i64_cost */ + /* Theoretically, a reduction involving 7 scalar FADDs could + complete in ~6 cycles and would have a cost of 14. FADDV + completes in 8 cycles, so give it a cost of 14 + 2. */ + 16, /* reduc_f16_cost */ + /* Likewise for 3 scalar FADDs (~4 cycles) vs. 6: 6 + 2. */ + 8, /* reduc_f32_cost */ + /* Likewise for 1 scalar FADD (~2 cycles) vs. 4: 2 + 2. */ + 4, /* reduc_f64_cost */ + 2, /* store_elt_extra_cost */ + /* This value is just inherited from the Cortex-A57 table. */ + 8, /* vec_to_scalar_cost */ + /* See the comment above the Advanced SIMD versions. */ + 4, /* scalar_to_vec_cost */ + 4, /* align_load_cost */ + 4, /* unalign_load_cost */ + /* Although stores have a latency of 2 and compete for the + vector pipes, in practice it's better not to model that. */ + 1, /* unalign_store_cost */ + 1 /* store_cost */ + }, + 3, /* clast_cost */ + 10, /* fadda_f16_cost */ + 6, /* fadda_f32_cost */ + 4, /* fadda_f64_cost */ + /* A strided Advanced SIMD x64 load would take two parallel FP loads + (8 cycles) plus an insertion (2 cycles). Assume a 64-bit SVE gather + is 1 cycle more. The Advanced SIMD version is costed as 2 scalar loads + (cost 8) and a vec_construct (cost 2). Add a full vector operation + (cost 2) to that, to avoid the difference being lost in rounding. + + There is no easy comparison between a strided Advanced SIMD x32 load + and an SVE 32-bit gather, but cost an SVE 32-bit gather as 1 vector + operation more than a 64-bit gather. */ + 14, /* gather_load_x32_cost */ + 12, /* gather_load_x64_cost */ + 3 /* scatter_store_elt_cost */ +}; + +static const aarch64_scalar_vec_issue_info neoversev2_scalar_issue_info = +{ + 3, /* loads_stores_per_cycle */ + 2, /* stores_per_cycle */ + 6, /* general_ops_per_cycle */ + 0, /* fp_simd_load_general_ops */ + 1 /* fp_simd_store_general_ops */ +}; + +static const aarch64_advsimd_vec_issue_info neoversev2_advsimd_issue_info = +{ + { + 3, /* loads_stores_per_cycle */ + 2, /* stores_per_cycle */ + 4, /* general_ops_per_cycle */ + 0, /* fp_simd_load_general_ops */ + 1 /* fp_simd_store_general_ops */ + }, + 2, /* ld2_st2_general_ops */ + 2, /* ld3_st3_general_ops */ + 3 /* ld4_st4_general_ops */ +}; + +static const aarch64_sve_vec_issue_info neoversev2_sve_issue_info = +{ + { + { + 3, /* loads_per_cycle */ + 2, /* stores_per_cycle */ + 4, /* general_ops_per_cycle */ + 0, /* fp_simd_load_general_ops */ + 1 /* fp_simd_store_general_ops */ + }, + 2, /* ld2_st2_general_ops */ + 3, /* ld3_st3_general_ops */ + 3 /* ld4_st4_general_ops */ + }, + 2, /* pred_ops_per_cycle */ + 2, /* while_pred_ops */ + 2, /* int_cmp_pred_ops */ + 1, /* fp_cmp_pred_ops */ + 1, /* gather_scatter_pair_general_ops */ + 1 /* gather_scatter_pair_pred_ops */ +}; + +static const aarch64_vec_issue_info neoversev2_vec_issue_info = +{ + &neoversev2_scalar_issue_info, + &neoversev2_advsimd_issue_info, + &neoversev2_sve_issue_info +}; + +/* Demeter costs for vector insn classes. 
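+   (Demeter was the development codename for Neoverse V2, hence the
+   neoversev2_* names below.)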
*/ +static const struct cpu_vector_cost neoversev2_vector_cost = +{ + 1, /* scalar_int_stmt_cost */ + 2, /* scalar_fp_stmt_cost */ + 4, /* scalar_load_cost */ + 1, /* scalar_store_cost */ + 1, /* cond_taken_branch_cost */ + 1, /* cond_not_taken_branch_cost */ + &neoversev2_advsimd_vector_cost, /* advsimd */ + &neoversev2_sve_vector_cost, /* sve */ + &neoversev2_vec_issue_info /* issue_info */ +}; + +static const struct tune_params neoversev2_tunings = +{ + &cortexa76_extra_costs, + &neoversev2_addrcost_table, + &neoversev2_regmove_cost, + &neoversev2_vector_cost, + &generic_branch_cost, + &generic_approx_modes, + SVE_128, /* sve_width */ + { 4, /* load_int. */ + 2, /* store_int. */ + 6, /* load_fp. */ + 1, /* store_fp. */ + 6, /* load_pred. */ + 2 /* store_pred. */ + }, /* memmov_cost. */ + 5, /* issue_rate */ + (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_CMP_BRANCH), /* fusible_ops */ + "32:16", /* function_align. */ + "4", /* jump_align. */ + "32:16", /* loop_align. */ + 3, /* int_reassoc_width. */ + 6, /* fp_reassoc_width. */ + 4, /* fma_reassoc_width. */ + 3, /* vec_reassoc_width. */ + 2, /* min_div_recip_mul_sf. */ + 2, /* min_div_recip_mul_df. */ + 0, /* max_case_values. */ + tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ + (AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND + | AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS + | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS + | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT), /* tune_flags. */ + &generic_prefetch_tune, + AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ + AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. */ +}; + +#endif /* GCC_AARCH64_H_NEOVERSEV2. */ diff --git a/gcc/config/aarch64/tuning_models/qdf24xx.h b/gcc/config/aarch64/tuning_models/qdf24xx.h new file mode 100644 index 0000000000000000000000000000000000000000..29c9b9f5843acc15450a2492b141c02ee48a3f13 --- /dev/null +++ b/gcc/config/aarch64/tuning_models/qdf24xx.h @@ -0,0 +1,137 @@ +/* Tuning model description for AArch64 architecture. + Copyright (C) 2009-2023 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. + + You should have received a copy of the GNU General Public License + along with GCC; see the file COPYING3. If not see + . */ + +#ifndef GCC_AARCH64_H_QDF24XX +#define GCC_AARCH64_H_QDF24XX + +#include "generic.h" + +static const struct cpu_addrcost_table qdf24xx_addrcost_table = +{ + { + 1, /* hi */ + 1, /* si */ + 1, /* di */ + 2, /* ti */ + }, + 1, /* pre_modify */ + 1, /* post_modify */ + 1, /* post_modify_ld3_st3 */ + 1, /* post_modify_ld4_st4 */ + 3, /* register_offset */ + 3, /* register_sextend */ + 3, /* register_zextend */ + 2, /* imm_offset */ +}; + +static const struct cpu_regmove_cost qdf24xx_regmove_cost = +{ + 2, /* GP2GP */ + /* Avoid the use of int<->fp moves for spilling. 
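+   (As in the generic table, their costs are set higher than
+   memmov_cost.)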
*/ + 6, /* GP2FP */ + 6, /* FP2GP */ + 4 /* FP2FP */ +}; + +static const advsimd_vec_cost qdf24xx_advsimd_vector_cost = +{ + 1, /* int_stmt_cost */ + 3, /* fp_stmt_cost */ + 0, /* ld2_st2_permute_cost */ + 0, /* ld3_st3_permute_cost */ + 0, /* ld4_st4_permute_cost */ + 2, /* permute_cost */ + 1, /* reduc_i8_cost */ + 1, /* reduc_i16_cost */ + 1, /* reduc_i32_cost */ + 1, /* reduc_i64_cost */ + 1, /* reduc_f16_cost */ + 1, /* reduc_f32_cost */ + 1, /* reduc_f64_cost */ + 1, /* store_elt_extra_cost */ + 1, /* vec_to_scalar_cost */ + 1, /* scalar_to_vec_cost */ + 1, /* align_load_cost */ + 1, /* unalign_load_cost */ + 1, /* unalign_store_cost */ + 1 /* store_cost */ +}; + +/* QDF24XX costs for vector insn classes. */ +static const struct cpu_vector_cost qdf24xx_vector_cost = +{ + 1, /* scalar_int_stmt_cost */ + 1, /* scalar_fp_stmt_cost */ + 1, /* scalar_load_cost */ + 1, /* scalar_store_cost */ + 3, /* cond_taken_branch_cost */ + 1, /* cond_not_taken_branch_cost */ + &qdf24xx_advsimd_vector_cost, /* advsimd */ + nullptr, /* sve */ + nullptr /* issue_info */ +}; + +static const cpu_prefetch_tune qdf24xx_prefetch_tune = +{ + 4, /* num_slots */ + 32, /* l1_cache_size */ + 64, /* l1_cache_line_size */ + 512, /* l2_cache_size */ + false, /* prefetch_dynamic_strides */ + 2048, /* minimum_stride */ + 3 /* default_opt_level */ +}; + +static const struct tune_params qdf24xx_tunings = +{ + &qdf24xx_extra_costs, + &qdf24xx_addrcost_table, + &qdf24xx_regmove_cost, + &qdf24xx_vector_cost, + &generic_branch_cost, + &generic_approx_modes, + SVE_NOT_IMPLEMENTED, /* sve_width */ + { 4, /* load_int. */ + 4, /* store_int. */ + 4, /* load_fp. */ + 4, /* store_fp. */ + 4, /* load_pred. */ + 4 /* store_pred. */ + }, /* memmov_cost. */ + 4, /* issue_rate */ + (AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD + | AARCH64_FUSE_MOVK_MOVK), /* fuseable_ops */ + "16", /* function_align. */ + "8", /* jump_align. */ + "16", /* loop_align. */ + 2, /* int_reassoc_width. */ + 4, /* fp_reassoc_width. */ + 1, /* fma_reassoc_width. */ + 1, /* vec_reassoc_width. */ + 2, /* min_div_recip_mul_sf. */ + 2, /* min_div_recip_mul_df. */ + 0, /* max_case_values. */ + tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ + AARCH64_EXTRA_TUNE_RENAME_LOAD_REGS, /* tune_flags. */ + &qdf24xx_prefetch_tune, + AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ + AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. */ +}; + +#endif /* GCC_AARCH64_H_QDF24XX. */ diff --git a/gcc/config/aarch64/tuning_models/saphira.h b/gcc/config/aarch64/tuning_models/saphira.h new file mode 100644 index 0000000000000000000000000000000000000000..e584d316bb7c3c2d232cf7623a92100ad261f07d --- /dev/null +++ b/gcc/config/aarch64/tuning_models/saphira.h @@ -0,0 +1,63 @@ +/* Tuning model description for AArch64 architecture. + Copyright (C) 2009-2023 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. + + You should have received a copy of the GNU General Public License + along with GCC; see the file COPYING3. If not see + . 
*/ + +#ifndef GCC_AARCH64_H_SAPHIRA +#define GCC_AARCH64_H_SAPHIRA + +#include "generic.h" + +/* Tuning structure for the Qualcomm Saphira core. Default to falkor values + for now. */ +static const struct tune_params saphira_tunings = +{ + &generic_extra_costs, + &generic_addrcost_table, + &generic_regmove_cost, + &generic_vector_cost, + &generic_branch_cost, + &generic_approx_modes, + SVE_NOT_IMPLEMENTED, /* sve_width */ + { 4, /* load_int. */ + 4, /* store_int. */ + 4, /* load_fp. */ + 4, /* store_fp. */ + 4, /* load_pred. */ + 4 /* store_pred. */ + }, /* memmov_cost. */ + 4, /* issue_rate */ + (AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD + | AARCH64_FUSE_MOVK_MOVK), /* fuseable_ops */ + "16", /* function_align. */ + "8", /* jump_align. */ + "16", /* loop_align. */ + 2, /* int_reassoc_width. */ + 4, /* fp_reassoc_width. */ + 1, /* fma_reassoc_width. */ + 1, /* vec_reassoc_width. */ + 2, /* min_div_recip_mul_sf. */ + 2, /* min_div_recip_mul_df. */ + 0, /* max_case_values. */ + tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ + (AARCH64_EXTRA_TUNE_NONE), /* tune_flags. */ + &generic_prefetch_tune, + AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ + AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. */ +}; + +#endif /* GCC_AARCH64_H_SAPHIRA. */ diff --git a/gcc/config/aarch64/tuning_models/thunderx.h b/gcc/config/aarch64/tuning_models/thunderx.h new file mode 100644 index 0000000000000000000000000000000000000000..dd4b9d539fc5cf2bd20d84e91d6b72fa7237f99f --- /dev/null +++ b/gcc/config/aarch64/tuning_models/thunderx.h @@ -0,0 +1,117 @@ +/* Tuning model description for AArch64 architecture. + Copyright (C) 2009-2023 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. + + You should have received a copy of the GNU General Public License + along with GCC; see the file COPYING3. If not see + . */ + +#ifndef GCC_AARCH64_H_THUNDERX +#define GCC_AARCH64_H_THUNDERX + +#include "generic.h" + +static const struct cpu_regmove_cost thunderx_regmove_cost = +{ + 2, /* GP2GP */ + 2, /* GP2FP */ + 6, /* FP2GP */ + 4 /* FP2FP */ +}; + +static const advsimd_vec_cost thunderx_advsimd_vector_cost = +{ + 4, /* int_stmt_cost */ + 1, /* fp_stmt_cost */ + 0, /* ld2_st2_permute_cost */ + 0, /* ld3_st3_permute_cost */ + 0, /* ld4_st4_permute_cost */ + 4, /* permute_cost */ + 2, /* reduc_i8_cost */ + 2, /* reduc_i16_cost */ + 2, /* reduc_i32_cost */ + 2, /* reduc_i64_cost */ + 2, /* reduc_f16_cost */ + 2, /* reduc_f32_cost */ + 2, /* reduc_f64_cost */ + 2, /* store_elt_extra_cost */ + 2, /* vec_to_scalar_cost */ + 2, /* scalar_to_vec_cost */ + 3, /* align_load_cost */ + 5, /* unalign_load_cost */ + 5, /* unalign_store_cost */ + 1 /* store_cost */ +}; + +/* ThunderX costs for vector insn classes. 
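+   (These costs are also used by the ThunderX T88 tunings, which
+   include this header.)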
*/ +static const struct cpu_vector_cost thunderx_vector_cost = +{ + 1, /* scalar_int_stmt_cost */ + 1, /* scalar_fp_stmt_cost */ + 3, /* scalar_load_cost */ + 1, /* scalar_store_cost */ + 3, /* cond_taken_branch_cost */ + 3, /* cond_not_taken_branch_cost */ + &thunderx_advsimd_vector_cost, /* advsimd */ + nullptr, /* sve */ + nullptr /* issue_info */ +}; + +static const cpu_prefetch_tune thunderx_prefetch_tune = +{ + 8, /* num_slots */ + 32, /* l1_cache_size */ + 128, /* l1_cache_line_size */ + -1, /* l2_cache_size */ + true, /* prefetch_dynamic_strides */ + -1, /* minimum_stride */ + -1 /* default_opt_level */ +}; + +static const struct tune_params thunderx_tunings = +{ + &thunderx_extra_costs, + &generic_addrcost_table, + &thunderx_regmove_cost, + &thunderx_vector_cost, + &generic_branch_cost, + &generic_approx_modes, + SVE_NOT_IMPLEMENTED, /* sve_width */ + { 6, /* load_int. */ + 6, /* store_int. */ + 6, /* load_fp. */ + 6, /* store_fp. */ + 6, /* load_pred. */ + 6 /* store_pred. */ + }, /* memmov_cost. */ + 2, /* issue_rate */ + AARCH64_FUSE_ALU_BRANCH, /* fusible_ops */ + "8", /* function_align. */ + "8", /* jump_align. */ + "8", /* loop_align. */ + 2, /* int_reassoc_width. */ + 4, /* fp_reassoc_width. */ + 1, /* fma_reassoc_width. */ + 1, /* vec_reassoc_width. */ + 2, /* min_div_recip_mul_sf. */ + 2, /* min_div_recip_mul_df. */ + 0, /* max_case_values. */ + tune_params::AUTOPREFETCHER_OFF, /* autoprefetcher_model. */ + (AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND), /* tune_flags. */ + &thunderx_prefetch_tune, + AARCH64_LDP_STP_POLICY_ALIGNED, /* ldp_policy_model. */ + AARCH64_LDP_STP_POLICY_ALIGNED /* stp_policy_model. */ +}; + +#endif /* GCC_AARCH64_H_THUNDERX. */ diff --git a/gcc/config/aarch64/tuning_models/thunderx2t99.h b/gcc/config/aarch64/tuning_models/thunderx2t99.h new file mode 100644 index 0000000000000000000000000000000000000000..0a376e0bab37b0b5bc1ea23de0e96a9245846fd7 --- /dev/null +++ b/gcc/config/aarch64/tuning_models/thunderx2t99.h @@ -0,0 +1,137 @@ +/* Tuning model description for AArch64 architecture. + Copyright (C) 2009-2023 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. + + You should have received a copy of the GNU General Public License + along with GCC; see the file COPYING3. If not see + . */ + +#ifndef GCC_AARCH64_H_THUNDERX2T99 +#define GCC_AARCH64_H_THUNDERX2T99 + +#include "generic.h" + +static const struct cpu_addrcost_table thunderx2t99_addrcost_table = +{ + { + 1, /* hi */ + 1, /* si */ + 1, /* di */ + 2, /* ti */ + }, + 0, /* pre_modify */ + 0, /* post_modify */ + 0, /* post_modify_ld3_st3 */ + 0, /* post_modify_ld4_st4 */ + 2, /* register_offset */ + 3, /* register_sextend */ + 3, /* register_zextend */ + 0, /* imm_offset */ +}; + +static const struct cpu_regmove_cost thunderx2t99_regmove_cost = +{ + 1, /* GP2GP */ + /* Avoid the use of int<->fp moves for spilling. 
*/ + 5, /* GP2FP */ + 6, /* FP2GP */ + 3, /* FP2FP */ +}; + +static const advsimd_vec_cost thunderx2t99_advsimd_vector_cost = +{ + 4, /* int_stmt_cost */ + 5, /* fp_stmt_cost */ + 0, /* ld2_st2_permute_cost */ + 0, /* ld3_st3_permute_cost */ + 0, /* ld4_st4_permute_cost */ + 10, /* permute_cost */ + 6, /* reduc_i8_cost */ + 6, /* reduc_i16_cost */ + 6, /* reduc_i32_cost */ + 6, /* reduc_i64_cost */ + 6, /* reduc_f16_cost */ + 6, /* reduc_f32_cost */ + 6, /* reduc_f64_cost */ + 6, /* store_elt_extra_cost */ + 6, /* vec_to_scalar_cost */ + 5, /* scalar_to_vec_cost */ + 4, /* align_load_cost */ + 4, /* unalign_load_cost */ + 1, /* unalign_store_cost */ + 1 /* store_cost */ +}; + +/* Costs for vector insn classes for Vulcan. */ +static const struct cpu_vector_cost thunderx2t99_vector_cost = +{ + 1, /* scalar_int_stmt_cost */ + 6, /* scalar_fp_stmt_cost */ + 4, /* scalar_load_cost */ + 1, /* scalar_store_cost */ + 2, /* cond_taken_branch_cost */ + 1, /* cond_not_taken_branch_cost */ + &thunderx2t99_advsimd_vector_cost, /* advsimd */ + nullptr, /* sve */ + nullptr /* issue_info */ +}; + +static const cpu_prefetch_tune thunderx2t99_prefetch_tune = +{ + 8, /* num_slots */ + 32, /* l1_cache_size */ + 64, /* l1_cache_line_size */ + 256, /* l2_cache_size */ + true, /* prefetch_dynamic_strides */ + -1, /* minimum_stride */ + -1 /* default_opt_level */ +}; + +static const struct tune_params thunderx2t99_tunings = +{ + &thunderx2t99_extra_costs, + &thunderx2t99_addrcost_table, + &thunderx2t99_regmove_cost, + &thunderx2t99_vector_cost, + &generic_branch_cost, + &generic_approx_modes, + SVE_NOT_IMPLEMENTED, /* sve_width */ + { 4, /* load_int. */ + 4, /* store_int. */ + 4, /* load_fp. */ + 4, /* store_fp. */ + 4, /* load_pred. */ + 4 /* store_pred. */ + }, /* memmov_cost. */ + 4, /* issue_rate. */ + (AARCH64_FUSE_ALU_BRANCH | AARCH64_FUSE_AES_AESMC + | AARCH64_FUSE_ALU_CBZ), /* fusible_ops */ + "16", /* function_align. */ + "8", /* jump_align. */ + "16", /* loop_align. */ + 3, /* int_reassoc_width. */ + 2, /* fp_reassoc_width. */ + 1, /* fma_reassoc_width. */ + 2, /* vec_reassoc_width. */ + 2, /* min_div_recip_mul_sf. */ + 2, /* min_div_recip_mul_df. */ + 0, /* max_case_values. */ + tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ + (AARCH64_EXTRA_TUNE_NONE), /* tune_flags. */ + &thunderx2t99_prefetch_tune, + AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ + AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. */ +}; + +#endif /* GCC_AARCH64_H_THUNDERX2T99. */ diff --git a/gcc/config/aarch64/tuning_models/thunderx3t110.h b/gcc/config/aarch64/tuning_models/thunderx3t110.h new file mode 100644 index 0000000000000000000000000000000000000000..65203b4af132e12e4994013fbab228bd3873b756 --- /dev/null +++ b/gcc/config/aarch64/tuning_models/thunderx3t110.h @@ -0,0 +1,136 @@ +/* Tuning model description for AArch64 architecture. + Copyright (C) 2009-2023 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. + + You should have received a copy of the GNU General Public License + along with GCC; see the file COPYING3. If not see + . 
*/ + +#ifndef GCC_AARCH64_H_THUNDERX3T110 +#define GCC_AARCH64_H_THUNDERX3T110 + +#include "generic.h" + +static const struct cpu_addrcost_table thunderx3t110_addrcost_table = +{ + { + 1, /* hi */ + 1, /* si */ + 1, /* di */ + 2, /* ti */ + }, + 0, /* pre_modify */ + 0, /* post_modify */ + 0, /* post_modify_ld3_st3 */ + 0, /* post_modify_ld4_st4 */ + 2, /* register_offset */ + 3, /* register_sextend */ + 3, /* register_zextend */ + 0, /* imm_offset */ +}; + +static const struct cpu_regmove_cost thunderx3t110_regmove_cost = +{ + 1, /* GP2GP */ + /* Avoid the use of int<->fp moves for spilling. */ + 4, /* GP2FP */ + 5, /* FP2GP */ + 4 /* FP2FP */ +}; + +static const advsimd_vec_cost thunderx3t110_advsimd_vector_cost = +{ + 5, /* int_stmt_cost */ + 5, /* fp_stmt_cost */ + 0, /* ld2_st2_permute_cost */ + 0, /* ld3_st3_permute_cost */ + 0, /* ld4_st4_permute_cost */ + 10, /* permute_cost */ + 5, /* reduc_i8_cost */ + 5, /* reduc_i16_cost */ + 5, /* reduc_i32_cost */ + 5, /* reduc_i64_cost */ + 5, /* reduc_f16_cost */ + 5, /* reduc_f32_cost */ + 5, /* reduc_f64_cost */ + 5, /* store_elt_extra_cost */ + 5, /* vec_to_scalar_cost */ + 5, /* scalar_to_vec_cost */ + 4, /* align_load_cost */ + 4, /* unalign_load_cost */ + 4, /* unalign_store_cost */ + 4 /* store_cost */ +}; + +static const struct cpu_vector_cost thunderx3t110_vector_cost = +{ + 1, /* scalar_int_stmt_cost */ + 5, /* scalar_fp_stmt_cost */ + 4, /* scalar_load_cost */ + 1, /* scalar_store_cost */ + 2, /* cond_taken_branch_cost */ + 1, /* cond_not_taken_branch_cost */ + &thunderx3t110_advsimd_vector_cost, /* advsimd */ + nullptr, /* sve */ + nullptr /* issue_info */ +}; + +static const cpu_prefetch_tune thunderx3t110_prefetch_tune = +{ + 8, /* num_slots */ + 32, /* l1_cache_size */ + 64, /* l1_cache_line_size */ + 256, /* l2_cache_size */ + true, /* prefetch_dynamic_strides */ + -1, /* minimum_stride */ + -1 /* default_opt_level */ +}; + +static const struct tune_params thunderx3t110_tunings = +{ + &thunderx3t110_extra_costs, + &thunderx3t110_addrcost_table, + &thunderx3t110_regmove_cost, + &thunderx3t110_vector_cost, + &generic_branch_cost, + &generic_approx_modes, + SVE_NOT_IMPLEMENTED, /* sve_width */ + { 4, /* load_int. */ + 4, /* store_int. */ + 4, /* load_fp. */ + 4, /* store_fp. */ + 4, /* load_pred. */ + 4 /* store_pred. */ + }, /* memmov_cost. */ + 6, /* issue_rate. */ + (AARCH64_FUSE_ALU_BRANCH | AARCH64_FUSE_AES_AESMC + | AARCH64_FUSE_ALU_CBZ), /* fusible_ops */ + "16", /* function_align. */ + "8", /* jump_align. */ + "16", /* loop_align. */ + 3, /* int_reassoc_width. */ + 2, /* fp_reassoc_width. */ + 1, /* fma_reassoc_width. */ + 2, /* vec_reassoc_width. */ + 2, /* min_div_recip_mul_sf. */ + 2, /* min_div_recip_mul_df. */ + 0, /* max_case_values. */ + tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ + (AARCH64_EXTRA_TUNE_NONE), /* tune_flags. */ + &thunderx3t110_prefetch_tune, + AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ + AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. */ +}; + +#endif /* GCC_AARCH64_H_THUNDERX3T110. */ diff --git a/gcc/config/aarch64/tuning_models/thunderxt88.h b/gcc/config/aarch64/tuning_models/thunderxt88.h new file mode 100644 index 0000000000000000000000000000000000000000..dcc74d31484ee6b99d37920dbfe7b1d59377d074 --- /dev/null +++ b/gcc/config/aarch64/tuning_models/thunderxt88.h @@ -0,0 +1,72 @@ +/* Tuning model description for AArch64 architecture. + Copyright (C) 2009-2023 Free Software Foundation, Inc. + + This file is part of GCC. 
+ + GCC is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. + + You should have received a copy of the GNU General Public License + along with GCC; see the file COPYING3. If not see + . */ + +#ifndef GCC_AARCH64_H_THUNDERXT88 +#define GCC_AARCH64_H_THUNDERXT88 + +#include "generic.h" +#include "thunderx.h" + +static const cpu_prefetch_tune thunderxt88_prefetch_tune = +{ + 8, /* num_slots */ + 32, /* l1_cache_size */ + 128, /* l1_cache_line_size */ + 16*1024, /* l2_cache_size */ + true, /* prefetch_dynamic_strides */ + -1, /* minimum_stride */ + 3 /* default_opt_level */ +}; + +static const struct tune_params thunderxt88_tunings = +{ + &thunderx_extra_costs, + &generic_addrcost_table, + &thunderx_regmove_cost, + &thunderx_vector_cost, + &generic_branch_cost, + &generic_approx_modes, + SVE_NOT_IMPLEMENTED, /* sve_width */ + { 6, /* load_int. */ + 6, /* store_int. */ + 6, /* load_fp. */ + 6, /* store_fp. */ + 6, /* load_pred. */ + 6 /* store_pred. */ + }, /* memmov_cost. */ + 2, /* issue_rate */ + AARCH64_FUSE_ALU_BRANCH, /* fusible_ops */ + "8", /* function_align. */ + "8", /* jump_align. */ + "8", /* loop_align. */ + 2, /* int_reassoc_width. */ + 4, /* fp_reassoc_width. */ + 1, /* fma_reassoc_width. */ + 1, /* vec_reassoc_width. */ + 2, /* min_div_recip_mul_sf. */ + 2, /* min_div_recip_mul_df. */ + 0, /* max_case_values. */ + tune_params::AUTOPREFETCHER_OFF, /* autoprefetcher_model. */ + (AARCH64_EXTRA_TUNE_NONE), /* tune_flags. */ + &thunderxt88_prefetch_tune, + AARCH64_LDP_STP_POLICY_ALIGNED, /* ldp_policy_model. */ + AARCH64_LDP_STP_POLICY_ALIGNED /* stp_policy_model. */ +}; + +#endif /* GCC_AARCH64_H_THUNDERXT88. */ diff --git a/gcc/config/aarch64/tuning_models/tsv110.h b/gcc/config/aarch64/tuning_models/tsv110.h new file mode 100644 index 0000000000000000000000000000000000000000..42aeafce652fff34e3277194993dd4aa1f0383a1 --- /dev/null +++ b/gcc/config/aarch64/tuning_models/tsv110.h @@ -0,0 +1,137 @@ +/* Tuning model description for AArch64 architecture. + Copyright (C) 2009-2023 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. + + You should have received a copy of the GNU General Public License + along with GCC; see the file COPYING3. If not see + . 
*/ + +#ifndef GCC_AARCH64_H_TSV110 +#define GCC_AARCH64_H_TSV110 + +#include "generic.h" + +static const struct cpu_addrcost_table tsv110_addrcost_table = +{ + { + 1, /* hi */ + 0, /* si */ + 0, /* di */ + 1, /* ti */ + }, + 0, /* pre_modify */ + 0, /* post_modify */ + 0, /* post_modify_ld3_st3 */ + 0, /* post_modify_ld4_st4 */ + 0, /* register_offset */ + 1, /* register_sextend */ + 1, /* register_zextend */ + 0, /* imm_offset */ +}; + +static const struct cpu_regmove_cost tsv110_regmove_cost = +{ + 1, /* GP2GP */ + /* Avoid the use of slow int<->fp moves for spilling by setting + their cost higher than memmov_cost. */ + 2, /* GP2FP */ + 3, /* FP2GP */ + 2 /* FP2FP */ +}; + +static const advsimd_vec_cost tsv110_advsimd_vector_cost = +{ + 2, /* int_stmt_cost */ + 2, /* fp_stmt_cost */ + 0, /* ld2_st2_permute_cost */ + 0, /* ld3_st3_permute_cost */ + 0, /* ld4_st4_permute_cost */ + 2, /* permute_cost */ + 3, /* reduc_i8_cost */ + 3, /* reduc_i16_cost */ + 3, /* reduc_i32_cost */ + 3, /* reduc_i64_cost */ + 3, /* reduc_f16_cost */ + 3, /* reduc_f32_cost */ + 3, /* reduc_f64_cost */ + 3, /* store_elt_extra_cost */ + 3, /* vec_to_scalar_cost */ + 2, /* scalar_to_vec_cost */ + 5, /* align_load_cost */ + 5, /* unalign_load_cost */ + 1, /* unalign_store_cost */ + 1 /* store_cost */ +}; + +static const struct cpu_vector_cost tsv110_vector_cost = +{ + 1, /* scalar_int_stmt_cost */ + 1, /* scalar_fp_stmt_cost */ + 5, /* scalar_load_cost */ + 1, /* scalar_store_cost */ + 1, /* cond_taken_branch_cost */ + 1, /* cond_not_taken_branch_cost */ + &tsv110_advsimd_vector_cost, /* advsimd */ + nullptr, /* sve */ + nullptr /* issue_info */ +}; + +static const cpu_prefetch_tune tsv110_prefetch_tune = +{ + 0, /* num_slots */ + 64, /* l1_cache_size */ + 64, /* l1_cache_line_size */ + 512, /* l2_cache_size */ + true, /* prefetch_dynamic_strides */ + -1, /* minimum_stride */ + -1 /* default_opt_level */ +}; + +static const struct tune_params tsv110_tunings = +{ + &tsv110_extra_costs, + &tsv110_addrcost_table, + &tsv110_regmove_cost, + &tsv110_vector_cost, + &generic_branch_cost, + &generic_approx_modes, + SVE_NOT_IMPLEMENTED, /* sve_width */ + { 4, /* load_int. */ + 4, /* store_int. */ + 4, /* load_fp. */ + 4, /* store_fp. */ + 4, /* load_pred. */ + 4 /* store_pred. */ + }, /* memmov_cost. */ + 4, /* issue_rate */ + (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_ALU_BRANCH + | AARCH64_FUSE_ALU_CBZ), /* fusible_ops */ + "16", /* function_align. */ + "4", /* jump_align. */ + "8", /* loop_align. */ + 2, /* int_reassoc_width. */ + 4, /* fp_reassoc_width. */ + 1, /* fma_reassoc_width. */ + 1, /* vec_reassoc_width. */ + 2, /* min_div_recip_mul_sf. */ + 2, /* min_div_recip_mul_df. */ + 0, /* max_case_values. */ + tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ + (AARCH64_EXTRA_TUNE_NONE), /* tune_flags. */ + &tsv110_prefetch_tune, + AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ + AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. */ +}; + +#endif /* GCC_AARCH64_H_TSV110. */ diff --git a/gcc/config/aarch64/tuning_models/xgene1.h b/gcc/config/aarch64/tuning_models/xgene1.h new file mode 100644 index 0000000000000000000000000000000000000000..53a3eb0ddeb80a9735cc988e242a70e87dc90655 --- /dev/null +++ b/gcc/config/aarch64/tuning_models/xgene1.h @@ -0,0 +1,145 @@ +/* Tuning model description for AArch64 architecture. + Copyright (C) 2009-2023 Free Software Foundation, Inc. + + This file is part of GCC. 
+ + GCC is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. + + You should have received a copy of the GNU General Public License + along with GCC; see the file COPYING3. If not see + . */ + +#ifndef GCC_AARCH64_H_XGENE1 +#define GCC_AARCH64_H_XGENE1 + +#include "generic.h" + +static const struct cpu_addrcost_table xgene1_addrcost_table = +{ + { + 1, /* hi */ + 0, /* si */ + 0, /* di */ + 1, /* ti */ + }, + 1, /* pre_modify */ + 1, /* post_modify */ + 1, /* post_modify_ld3_st3 */ + 1, /* post_modify_ld4_st4 */ + 0, /* register_offset */ + 1, /* register_sextend */ + 1, /* register_zextend */ + 0, /* imm_offset */ +}; + +static const struct cpu_regmove_cost xgene1_regmove_cost = +{ + 1, /* GP2GP */ + /* Avoid the use of slow int<->fp moves for spilling by setting + their cost higher than memmov_cost. */ + 8, /* GP2FP */ + 8, /* FP2GP */ + 2 /* FP2FP */ +}; + +static const advsimd_vec_cost xgene1_advsimd_vector_cost = +{ + 2, /* int_stmt_cost */ + 2, /* fp_stmt_cost */ + 0, /* ld2_st2_permute_cost */ + 0, /* ld3_st3_permute_cost */ + 0, /* ld4_st4_permute_cost */ + 2, /* permute_cost */ + 4, /* reduc_i8_cost */ + 4, /* reduc_i16_cost */ + 4, /* reduc_i32_cost */ + 4, /* reduc_i64_cost */ + 4, /* reduc_f16_cost */ + 4, /* reduc_f32_cost */ + 4, /* reduc_f64_cost */ + 4, /* store_elt_extra_cost */ + 4, /* vec_to_scalar_cost */ + 4, /* scalar_to_vec_cost */ + 10, /* align_load_cost */ + 10, /* unalign_load_cost */ + 2, /* unalign_store_cost */ + 2 /* store_cost */ +}; + +/* Generic costs for vector insn classes. */ +static const struct cpu_vector_cost xgene1_vector_cost = +{ + 1, /* scalar_int_stmt_cost */ + 1, /* scalar_fp_stmt_cost */ + 5, /* scalar_load_cost */ + 1, /* scalar_store_cost */ + 2, /* cond_taken_branch_cost */ + 1, /* cond_not_taken_branch_cost */ + &xgene1_advsimd_vector_cost, /* advsimd */ + nullptr, /* sve */ + nullptr /* issue_info */ +}; + +/* Approximation modes for X-Gene 1. */ +static const cpu_approx_modes xgene1_approx_modes = +{ + AARCH64_APPROX_NONE, /* division */ + AARCH64_APPROX_NONE, /* sqrt */ + AARCH64_APPROX_ALL /* recip_sqrt */ +}; + +static const cpu_prefetch_tune xgene1_prefetch_tune = +{ + 8, /* num_slots */ + 32, /* l1_cache_size */ + 64, /* l1_cache_line_size */ + 256, /* l2_cache_size */ + true, /* prefetch_dynamic_strides */ + -1, /* minimum_stride */ + -1 /* default_opt_level */ +}; + +static const struct tune_params xgene1_tunings = +{ + &xgene1_extra_costs, + &xgene1_addrcost_table, + &xgene1_regmove_cost, + &xgene1_vector_cost, + &generic_branch_cost, + &xgene1_approx_modes, + SVE_NOT_IMPLEMENTED, /* sve_width */ + { 6, /* load_int. */ + 6, /* store_int. */ + 6, /* load_fp. */ + 6, /* store_fp. */ + 6, /* load_pred. */ + 6 /* store_pred. */ + }, /* memmov_cost. */ + 4, /* issue_rate */ + AARCH64_FUSE_NOTHING, /* fusible_ops */ + "16", /* function_align. */ + "16", /* jump_align. */ + "16", /* loop_align. */ + 2, /* int_reassoc_width. */ + 4, /* fp_reassoc_width. */ + 1, /* fma_reassoc_width. */ + 1, /* vec_reassoc_width. */ + 2, /* min_div_recip_mul_sf. */ + 2, /* min_div_recip_mul_df. */ + 17, /* max_case_values. 
*/ + tune_params::AUTOPREFETCHER_OFF, /* autoprefetcher_model. */ + (AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS), /* tune_flags. */ + &xgene1_prefetch_tune, + AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ + AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. */ +}; + +#endif /* GCC_AARCH64_H_XGENE1. */ --- a/gcc/config/aarch64/aarch64.cc +++ b/gcc/config/aarch64/aarch64.cc @@ -354,2405 +354,30 @@ static const struct aarch64_flag_desc aarch64_tuning_flags[] = }; /* Tuning parameters. */ - -static const struct cpu_addrcost_table generic_addrcost_table = -{ - { - 1, /* hi */ - 0, /* si */ - 0, /* di */ - 1, /* ti */ - }, - 0, /* pre_modify */ - 0, /* post_modify */ - 0, /* post_modify_ld3_st3 */ - 0, /* post_modify_ld4_st4 */ - 0, /* register_offset */ - 0, /* register_sextend */ - 0, /* register_zextend */ - 0 /* imm_offset */ -}; - -static const struct cpu_addrcost_table exynosm1_addrcost_table = -{ - { - 0, /* hi */ - 0, /* si */ - 0, /* di */ - 2, /* ti */ - }, - 0, /* pre_modify */ - 0, /* post_modify */ - 0, /* post_modify_ld3_st3 */ - 0, /* post_modify_ld4_st4 */ - 1, /* register_offset */ - 1, /* register_sextend */ - 2, /* register_zextend */ - 0, /* imm_offset */ -}; - -static const struct cpu_addrcost_table xgene1_addrcost_table = -{ - { - 1, /* hi */ - 0, /* si */ - 0, /* di */ - 1, /* ti */ - }, - 1, /* pre_modify */ - 1, /* post_modify */ - 1, /* post_modify_ld3_st3 */ - 1, /* post_modify_ld4_st4 */ - 0, /* register_offset */ - 1, /* register_sextend */ - 1, /* register_zextend */ - 0, /* imm_offset */ -}; - -static const struct cpu_addrcost_table thunderx2t99_addrcost_table = -{ - { - 1, /* hi */ - 1, /* si */ - 1, /* di */ - 2, /* ti */ - }, - 0, /* pre_modify */ - 0, /* post_modify */ - 0, /* post_modify_ld3_st3 */ - 0, /* post_modify_ld4_st4 */ - 2, /* register_offset */ - 3, /* register_sextend */ - 3, /* register_zextend */ - 0, /* imm_offset */ -}; - -static const struct cpu_addrcost_table thunderx3t110_addrcost_table = -{ - { - 1, /* hi */ - 1, /* si */ - 1, /* di */ - 2, /* ti */ - }, - 0, /* pre_modify */ - 0, /* post_modify */ - 0, /* post_modify_ld3_st3 */ - 0, /* post_modify_ld4_st4 */ - 2, /* register_offset */ - 3, /* register_sextend */ - 3, /* register_zextend */ - 0, /* imm_offset */ -}; - -static const struct cpu_addrcost_table tsv110_addrcost_table = -{ - { - 1, /* hi */ - 0, /* si */ - 0, /* di */ - 1, /* ti */ - }, - 0, /* pre_modify */ - 0, /* post_modify */ - 0, /* post_modify_ld3_st3 */ - 0, /* post_modify_ld4_st4 */ - 0, /* register_offset */ - 1, /* register_sextend */ - 1, /* register_zextend */ - 0, /* imm_offset */ -}; - -static const struct cpu_addrcost_table qdf24xx_addrcost_table = -{ - { - 1, /* hi */ - 1, /* si */ - 1, /* di */ - 2, /* ti */ - }, - 1, /* pre_modify */ - 1, /* post_modify */ - 1, /* post_modify_ld3_st3 */ - 1, /* post_modify_ld4_st4 */ - 3, /* register_offset */ - 3, /* register_sextend */ - 3, /* register_zextend */ - 2, /* imm_offset */ -}; - -static const struct cpu_addrcost_table a64fx_addrcost_table = -{ - { - 1, /* hi */ - 1, /* si */ - 1, /* di */ - 2, /* ti */ - }, - 0, /* pre_modify */ - 0, /* post_modify */ - 0, /* post_modify_ld3_st3 */ - 0, /* post_modify_ld4_st4 */ - 2, /* register_offset */ - 3, /* register_sextend */ - 3, /* register_zextend */ - 0, /* imm_offset */ -}; - -static const struct cpu_addrcost_table neoversev1_addrcost_table = -{ - { - 1, /* hi */ - 0, /* si */ - 0, /* di */ - 1, /* ti */ - }, - 0, /* pre_modify */ - 0, /* post_modify */ - 3, /* post_modify_ld3_st3 */ - 3, /* post_modify_ld4_st4 */ - 
0, /* register_offset */ - 0, /* register_sextend */ - 0, /* register_zextend */ - 0 /* imm_offset */ -}; - -static const struct cpu_addrcost_table neoversen2_addrcost_table = -{ - { - 1, /* hi */ - 0, /* si */ - 0, /* di */ - 1, /* ti */ - }, - 0, /* pre_modify */ - 0, /* post_modify */ - 2, /* post_modify_ld3_st3 */ - 2, /* post_modify_ld4_st4 */ - 0, /* register_offset */ - 0, /* register_sextend */ - 0, /* register_zextend */ - 0 /* imm_offset */ -}; - -static const struct cpu_addrcost_table neoversev2_addrcost_table = -{ - { - 1, /* hi */ - 0, /* si */ - 0, /* di */ - 1, /* ti */ - }, - 0, /* pre_modify */ - 0, /* post_modify */ - 2, /* post_modify_ld3_st3 */ - 2, /* post_modify_ld4_st4 */ - 0, /* register_offset */ - 0, /* register_sextend */ - 0, /* register_zextend */ - 0 /* imm_offset */ -}; - -static const struct cpu_regmove_cost generic_regmove_cost = -{ - 1, /* GP2GP */ - /* Avoid the use of slow int<->fp moves for spilling by setting - their cost higher than memmov_cost. */ - 5, /* GP2FP */ - 5, /* FP2GP */ - 2 /* FP2FP */ -}; - -static const struct cpu_regmove_cost cortexa57_regmove_cost = -{ - 1, /* GP2GP */ - /* Avoid the use of slow int<->fp moves for spilling by setting - their cost higher than memmov_cost. */ - 5, /* GP2FP */ - 5, /* FP2GP */ - 2 /* FP2FP */ -}; - -static const struct cpu_regmove_cost cortexa53_regmove_cost = -{ - 1, /* GP2GP */ - /* Avoid the use of slow int<->fp moves for spilling by setting - their cost higher than memmov_cost. */ - 5, /* GP2FP */ - 5, /* FP2GP */ - 2 /* FP2FP */ -}; - -static const struct cpu_regmove_cost exynosm1_regmove_cost = -{ - 1, /* GP2GP */ - /* Avoid the use of slow int<->fp moves for spilling by setting - their cost higher than memmov_cost (actual, 4 and 9). */ - 9, /* GP2FP */ - 9, /* FP2GP */ - 1 /* FP2FP */ -}; - -static const struct cpu_regmove_cost thunderx_regmove_cost = -{ - 2, /* GP2GP */ - 2, /* GP2FP */ - 6, /* FP2GP */ - 4 /* FP2FP */ -}; - -static const struct cpu_regmove_cost xgene1_regmove_cost = -{ - 1, /* GP2GP */ - /* Avoid the use of slow int<->fp moves for spilling by setting - their cost higher than memmov_cost. */ - 8, /* GP2FP */ - 8, /* FP2GP */ - 2 /* FP2FP */ -}; - -static const struct cpu_regmove_cost qdf24xx_regmove_cost = -{ - 2, /* GP2GP */ - /* Avoid the use of int<->fp moves for spilling. */ - 6, /* GP2FP */ - 6, /* FP2GP */ - 4 /* FP2FP */ -}; - -static const struct cpu_regmove_cost thunderx2t99_regmove_cost = -{ - 1, /* GP2GP */ - /* Avoid the use of int<->fp moves for spilling. */ - 5, /* GP2FP */ - 6, /* FP2GP */ - 3, /* FP2FP */ -}; - -static const struct cpu_regmove_cost thunderx3t110_regmove_cost = -{ - 1, /* GP2GP */ - /* Avoid the use of int<->fp moves for spilling. */ - 4, /* GP2FP */ - 5, /* FP2GP */ - 4 /* FP2FP */ -}; - -static const struct cpu_regmove_cost tsv110_regmove_cost = -{ - 1, /* GP2GP */ - /* Avoid the use of slow int<->fp moves for spilling by setting - their cost higher than memmov_cost. */ - 2, /* GP2FP */ - 3, /* FP2GP */ - 2 /* FP2FP */ -}; - -static const struct cpu_regmove_cost a64fx_regmove_cost = -{ - 1, /* GP2GP */ - /* Avoid the use of slow int<->fp moves for spilling by setting - their cost higher than memmov_cost. */ - 5, /* GP2FP */ - 7, /* FP2GP */ - 2 /* FP2FP */ -}; - -static const struct cpu_regmove_cost neoversen2_regmove_cost = -{ - 1, /* GP2GP */ - /* Spilling to int<->fp instead of memory is recommended so set - realistic costs compared to memmov_cost. 
*/ - 3, /* GP2FP */ - 2, /* FP2GP */ - 2 /* FP2FP */ -}; - -static const struct cpu_regmove_cost neoversev1_regmove_cost = -{ - 1, /* GP2GP */ - /* Spilling to int<->fp instead of memory is recommended so set - realistic costs compared to memmov_cost. */ - 3, /* GP2FP */ - 2, /* FP2GP */ - 2 /* FP2FP */ -}; - -static const struct cpu_regmove_cost neoversev2_regmove_cost = -{ - 1, /* GP2GP */ - /* Spilling to int<->fp instead of memory is recommended so set - realistic costs compared to memmov_cost. */ - 3, /* GP2FP */ - 2, /* FP2GP */ - 2 /* FP2FP */ -}; - -/* Generic costs for Advanced SIMD vector operations. */ -static const advsimd_vec_cost generic_advsimd_vector_cost = -{ - 1, /* int_stmt_cost */ - 1, /* fp_stmt_cost */ - 0, /* ld2_st2_permute_cost */ - 0, /* ld3_st3_permute_cost */ - 0, /* ld4_st4_permute_cost */ - 2, /* permute_cost */ - 2, /* reduc_i8_cost */ - 2, /* reduc_i16_cost */ - 2, /* reduc_i32_cost */ - 2, /* reduc_i64_cost */ - 2, /* reduc_f16_cost */ - 2, /* reduc_f32_cost */ - 2, /* reduc_f64_cost */ - 2, /* store_elt_extra_cost */ - 2, /* vec_to_scalar_cost */ - 1, /* scalar_to_vec_cost */ - 1, /* align_load_cost */ - 1, /* unalign_load_cost */ - 1, /* unalign_store_cost */ - 1 /* store_cost */ -}; - -/* Generic costs for SVE vector operations. */ -static const sve_vec_cost generic_sve_vector_cost = -{ - { - 1, /* int_stmt_cost */ - 1, /* fp_stmt_cost */ - 0, /* ld2_st2_permute_cost */ - 0, /* ld3_st3_permute_cost */ - 0, /* ld4_st4_permute_cost */ - 2, /* permute_cost */ - 2, /* reduc_i8_cost */ - 2, /* reduc_i16_cost */ - 2, /* reduc_i32_cost */ - 2, /* reduc_i64_cost */ - 2, /* reduc_f16_cost */ - 2, /* reduc_f32_cost */ - 2, /* reduc_f64_cost */ - 2, /* store_elt_extra_cost */ - 2, /* vec_to_scalar_cost */ - 1, /* scalar_to_vec_cost */ - 1, /* align_load_cost */ - 1, /* unalign_load_cost */ - 1, /* unalign_store_cost */ - 1 /* store_cost */ - }, - 2, /* clast_cost */ - 2, /* fadda_f16_cost */ - 2, /* fadda_f32_cost */ - 2, /* fadda_f64_cost */ - 4, /* gather_load_x32_cost */ - 2, /* gather_load_x64_cost */ - 1 /* scatter_store_elt_cost */ -}; - -/* Generic costs for vector insn classes. 
*/ -static const struct cpu_vector_cost generic_vector_cost = -{ - 1, /* scalar_int_stmt_cost */ - 1, /* scalar_fp_stmt_cost */ - 1, /* scalar_load_cost */ - 1, /* scalar_store_cost */ - 3, /* cond_taken_branch_cost */ - 1, /* cond_not_taken_branch_cost */ - &generic_advsimd_vector_cost, /* advsimd */ - &generic_sve_vector_cost, /* sve */ - nullptr /* issue_info */ -}; - -static const advsimd_vec_cost a64fx_advsimd_vector_cost = -{ - 2, /* int_stmt_cost */ - 5, /* fp_stmt_cost */ - 0, /* ld2_st2_permute_cost */ - 0, /* ld3_st3_permute_cost */ - 0, /* ld4_st4_permute_cost */ - 3, /* permute_cost */ - 13, /* reduc_i8_cost */ - 13, /* reduc_i16_cost */ - 13, /* reduc_i32_cost */ - 13, /* reduc_i64_cost */ - 13, /* reduc_f16_cost */ - 13, /* reduc_f32_cost */ - 13, /* reduc_f64_cost */ - 13, /* store_elt_extra_cost */ - 13, /* vec_to_scalar_cost */ - 4, /* scalar_to_vec_cost */ - 6, /* align_load_cost */ - 6, /* unalign_load_cost */ - 1, /* unalign_store_cost */ - 1 /* store_cost */ -}; - -static const sve_vec_cost a64fx_sve_vector_cost = -{ - { - 2, /* int_stmt_cost */ - 5, /* fp_stmt_cost */ - 0, /* ld2_st2_permute_cost */ - 0, /* ld3_st3_permute_cost */ - 0, /* ld4_st4_permute_cost */ - 3, /* permute_cost */ - 13, /* reduc_i8_cost */ - 13, /* reduc_i16_cost */ - 13, /* reduc_i32_cost */ - 13, /* reduc_i64_cost */ - 13, /* reduc_f16_cost */ - 13, /* reduc_f32_cost */ - 13, /* reduc_f64_cost */ - 13, /* store_elt_extra_cost */ - 13, /* vec_to_scalar_cost */ - 4, /* scalar_to_vec_cost */ - 6, /* align_load_cost */ - 6, /* unalign_load_cost */ - 1, /* unalign_store_cost */ - 1 /* store_cost */ - }, - 13, /* clast_cost */ - 13, /* fadda_f16_cost */ - 13, /* fadda_f32_cost */ - 13, /* fadda_f64_cost */ - 64, /* gather_load_x32_cost */ - 32, /* gather_load_x64_cost */ - 1 /* scatter_store_elt_cost */ -}; - -static const struct cpu_vector_cost a64fx_vector_cost = -{ - 1, /* scalar_int_stmt_cost */ - 5, /* scalar_fp_stmt_cost */ - 4, /* scalar_load_cost */ - 1, /* scalar_store_cost */ - 3, /* cond_taken_branch_cost */ - 1, /* cond_not_taken_branch_cost */ - &a64fx_advsimd_vector_cost, /* advsimd */ - &a64fx_sve_vector_cost, /* sve */ - nullptr /* issue_info */ -}; - -static const advsimd_vec_cost qdf24xx_advsimd_vector_cost = -{ - 1, /* int_stmt_cost */ - 3, /* fp_stmt_cost */ - 0, /* ld2_st2_permute_cost */ - 0, /* ld3_st3_permute_cost */ - 0, /* ld4_st4_permute_cost */ - 2, /* permute_cost */ - 1, /* reduc_i8_cost */ - 1, /* reduc_i16_cost */ - 1, /* reduc_i32_cost */ - 1, /* reduc_i64_cost */ - 1, /* reduc_f16_cost */ - 1, /* reduc_f32_cost */ - 1, /* reduc_f64_cost */ - 1, /* store_elt_extra_cost */ - 1, /* vec_to_scalar_cost */ - 1, /* scalar_to_vec_cost */ - 1, /* align_load_cost */ - 1, /* unalign_load_cost */ - 1, /* unalign_store_cost */ - 1 /* store_cost */ -}; - -/* QDF24XX costs for vector insn classes. 
*/ -static const struct cpu_vector_cost qdf24xx_vector_cost = -{ - 1, /* scalar_int_stmt_cost */ - 1, /* scalar_fp_stmt_cost */ - 1, /* scalar_load_cost */ - 1, /* scalar_store_cost */ - 3, /* cond_taken_branch_cost */ - 1, /* cond_not_taken_branch_cost */ - &qdf24xx_advsimd_vector_cost, /* advsimd */ - nullptr, /* sve */ - nullptr /* issue_info */ -}; - - -static const advsimd_vec_cost thunderx_advsimd_vector_cost = -{ - 4, /* int_stmt_cost */ - 1, /* fp_stmt_cost */ - 0, /* ld2_st2_permute_cost */ - 0, /* ld3_st3_permute_cost */ - 0, /* ld4_st4_permute_cost */ - 4, /* permute_cost */ - 2, /* reduc_i8_cost */ - 2, /* reduc_i16_cost */ - 2, /* reduc_i32_cost */ - 2, /* reduc_i64_cost */ - 2, /* reduc_f16_cost */ - 2, /* reduc_f32_cost */ - 2, /* reduc_f64_cost */ - 2, /* store_elt_extra_cost */ - 2, /* vec_to_scalar_cost */ - 2, /* scalar_to_vec_cost */ - 3, /* align_load_cost */ - 5, /* unalign_load_cost */ - 5, /* unalign_store_cost */ - 1 /* store_cost */ -}; - -/* ThunderX costs for vector insn classes. */ -static const struct cpu_vector_cost thunderx_vector_cost = -{ - 1, /* scalar_int_stmt_cost */ - 1, /* scalar_fp_stmt_cost */ - 3, /* scalar_load_cost */ - 1, /* scalar_store_cost */ - 3, /* cond_taken_branch_cost */ - 3, /* cond_not_taken_branch_cost */ - &thunderx_advsimd_vector_cost, /* advsimd */ - nullptr, /* sve */ - nullptr /* issue_info */ -}; - -static const advsimd_vec_cost tsv110_advsimd_vector_cost = -{ - 2, /* int_stmt_cost */ - 2, /* fp_stmt_cost */ - 0, /* ld2_st2_permute_cost */ - 0, /* ld3_st3_permute_cost */ - 0, /* ld4_st4_permute_cost */ - 2, /* permute_cost */ - 3, /* reduc_i8_cost */ - 3, /* reduc_i16_cost */ - 3, /* reduc_i32_cost */ - 3, /* reduc_i64_cost */ - 3, /* reduc_f16_cost */ - 3, /* reduc_f32_cost */ - 3, /* reduc_f64_cost */ - 3, /* store_elt_extra_cost */ - 3, /* vec_to_scalar_cost */ - 2, /* scalar_to_vec_cost */ - 5, /* align_load_cost */ - 5, /* unalign_load_cost */ - 1, /* unalign_store_cost */ - 1 /* store_cost */ -}; - -static const struct cpu_vector_cost tsv110_vector_cost = -{ - 1, /* scalar_int_stmt_cost */ - 1, /* scalar_fp_stmt_cost */ - 5, /* scalar_load_cost */ - 1, /* scalar_store_cost */ - 1, /* cond_taken_branch_cost */ - 1, /* cond_not_taken_branch_cost */ - &tsv110_advsimd_vector_cost, /* advsimd */ - nullptr, /* sve */ - nullptr /* issue_info */ -}; - -static const advsimd_vec_cost cortexa57_advsimd_vector_cost = -{ - 2, /* int_stmt_cost */ - 2, /* fp_stmt_cost */ - 0, /* ld2_st2_permute_cost */ - 0, /* ld3_st3_permute_cost */ - 0, /* ld4_st4_permute_cost */ - 3, /* permute_cost */ - 8, /* reduc_i8_cost */ - 8, /* reduc_i16_cost */ - 8, /* reduc_i32_cost */ - 8, /* reduc_i64_cost */ - 8, /* reduc_f16_cost */ - 8, /* reduc_f32_cost */ - 8, /* reduc_f64_cost */ - 8, /* store_elt_extra_cost */ - 8, /* vec_to_scalar_cost */ - 8, /* scalar_to_vec_cost */ - 4, /* align_load_cost */ - 4, /* unalign_load_cost */ - 1, /* unalign_store_cost */ - 1 /* store_cost */ -}; - -/* Cortex-A57 costs for vector insn classes. 
*/ -static const struct cpu_vector_cost cortexa57_vector_cost = -{ - 1, /* scalar_int_stmt_cost */ - 1, /* scalar_fp_stmt_cost */ - 4, /* scalar_load_cost */ - 1, /* scalar_store_cost */ - 1, /* cond_taken_branch_cost */ - 1, /* cond_not_taken_branch_cost */ - &cortexa57_advsimd_vector_cost, /* advsimd */ - nullptr, /* sve */ - nullptr /* issue_info */ -}; - -static const advsimd_vec_cost exynosm1_advsimd_vector_cost = -{ - 3, /* int_stmt_cost */ - 3, /* fp_stmt_cost */ - 0, /* ld2_st2_permute_cost */ - 0, /* ld3_st3_permute_cost */ - 0, /* ld4_st4_permute_cost */ - 3, /* permute_cost */ - 3, /* reduc_i8_cost */ - 3, /* reduc_i16_cost */ - 3, /* reduc_i32_cost */ - 3, /* reduc_i64_cost */ - 3, /* reduc_f16_cost */ - 3, /* reduc_f32_cost */ - 3, /* reduc_f64_cost */ - 3, /* store_elt_extra_cost */ - 3, /* vec_to_scalar_cost */ - 3, /* scalar_to_vec_cost */ - 5, /* align_load_cost */ - 5, /* unalign_load_cost */ - 1, /* unalign_store_cost */ - 1 /* store_cost */ -}; - -static const struct cpu_vector_cost exynosm1_vector_cost = -{ - 1, /* scalar_int_stmt_cost */ - 1, /* scalar_fp_stmt_cost */ - 5, /* scalar_load_cost */ - 1, /* scalar_store_cost */ - 1, /* cond_taken_branch_cost */ - 1, /* cond_not_taken_branch_cost */ - &exynosm1_advsimd_vector_cost, /* advsimd */ - nullptr, /* sve */ - nullptr /* issue_info */ -}; - -static const advsimd_vec_cost xgene1_advsimd_vector_cost = -{ - 2, /* int_stmt_cost */ - 2, /* fp_stmt_cost */ - 0, /* ld2_st2_permute_cost */ - 0, /* ld3_st3_permute_cost */ - 0, /* ld4_st4_permute_cost */ - 2, /* permute_cost */ - 4, /* reduc_i8_cost */ - 4, /* reduc_i16_cost */ - 4, /* reduc_i32_cost */ - 4, /* reduc_i64_cost */ - 4, /* reduc_f16_cost */ - 4, /* reduc_f32_cost */ - 4, /* reduc_f64_cost */ - 4, /* store_elt_extra_cost */ - 4, /* vec_to_scalar_cost */ - 4, /* scalar_to_vec_cost */ - 10, /* align_load_cost */ - 10, /* unalign_load_cost */ - 2, /* unalign_store_cost */ - 2 /* store_cost */ -}; - -/* Generic costs for vector insn classes. */ -static const struct cpu_vector_cost xgene1_vector_cost = -{ - 1, /* scalar_int_stmt_cost */ - 1, /* scalar_fp_stmt_cost */ - 5, /* scalar_load_cost */ - 1, /* scalar_store_cost */ - 2, /* cond_taken_branch_cost */ - 1, /* cond_not_taken_branch_cost */ - &xgene1_advsimd_vector_cost, /* advsimd */ - nullptr, /* sve */ - nullptr /* issue_info */ -}; - -static const advsimd_vec_cost thunderx2t99_advsimd_vector_cost = -{ - 4, /* int_stmt_cost */ - 5, /* fp_stmt_cost */ - 0, /* ld2_st2_permute_cost */ - 0, /* ld3_st3_permute_cost */ - 0, /* ld4_st4_permute_cost */ - 10, /* permute_cost */ - 6, /* reduc_i8_cost */ - 6, /* reduc_i16_cost */ - 6, /* reduc_i32_cost */ - 6, /* reduc_i64_cost */ - 6, /* reduc_f16_cost */ - 6, /* reduc_f32_cost */ - 6, /* reduc_f64_cost */ - 6, /* store_elt_extra_cost */ - 6, /* vec_to_scalar_cost */ - 5, /* scalar_to_vec_cost */ - 4, /* align_load_cost */ - 4, /* unalign_load_cost */ - 1, /* unalign_store_cost */ - 1 /* store_cost */ -}; - -/* Costs for vector insn classes for Vulcan. 
*/ -static const struct cpu_vector_cost thunderx2t99_vector_cost = -{ - 1, /* scalar_int_stmt_cost */ - 6, /* scalar_fp_stmt_cost */ - 4, /* scalar_load_cost */ - 1, /* scalar_store_cost */ - 2, /* cond_taken_branch_cost */ - 1, /* cond_not_taken_branch_cost */ - &thunderx2t99_advsimd_vector_cost, /* advsimd */ - nullptr, /* sve */ - nullptr /* issue_info */ -}; - -static const advsimd_vec_cost thunderx3t110_advsimd_vector_cost = -{ - 5, /* int_stmt_cost */ - 5, /* fp_stmt_cost */ - 0, /* ld2_st2_permute_cost */ - 0, /* ld3_st3_permute_cost */ - 0, /* ld4_st4_permute_cost */ - 10, /* permute_cost */ - 5, /* reduc_i8_cost */ - 5, /* reduc_i16_cost */ - 5, /* reduc_i32_cost */ - 5, /* reduc_i64_cost */ - 5, /* reduc_f16_cost */ - 5, /* reduc_f32_cost */ - 5, /* reduc_f64_cost */ - 5, /* store_elt_extra_cost */ - 5, /* vec_to_scalar_cost */ - 5, /* scalar_to_vec_cost */ - 4, /* align_load_cost */ - 4, /* unalign_load_cost */ - 4, /* unalign_store_cost */ - 4 /* store_cost */ -}; - -static const struct cpu_vector_cost thunderx3t110_vector_cost = -{ - 1, /* scalar_int_stmt_cost */ - 5, /* scalar_fp_stmt_cost */ - 4, /* scalar_load_cost */ - 1, /* scalar_store_cost */ - 2, /* cond_taken_branch_cost */ - 1, /* cond_not_taken_branch_cost */ - &thunderx3t110_advsimd_vector_cost, /* advsimd */ - nullptr, /* sve */ - nullptr /* issue_info */ -}; - -static const advsimd_vec_cost ampere1_advsimd_vector_cost = -{ - 1, /* int_stmt_cost */ - 3, /* fp_stmt_cost */ - 0, /* ld2_st2_permute_cost */ - 0, /* ld3_st3_permute_cost */ - 0, /* ld4_st4_permute_cost */ - 2, /* permute_cost */ - 12, /* reduc_i8_cost */ - 9, /* reduc_i16_cost */ - 6, /* reduc_i32_cost */ - 5, /* reduc_i64_cost */ - 9, /* reduc_f16_cost */ - 6, /* reduc_f32_cost */ - 5, /* reduc_f64_cost */ - 8, /* store_elt_extra_cost */ - 6, /* vec_to_scalar_cost */ - 7, /* scalar_to_vec_cost */ - 4, /* align_load_cost */ - 4, /* unalign_load_cost */ - 1, /* unalign_store_cost */ - 1 /* store_cost */ -}; - -/* Ampere-1 costs for vector insn classes. */ -static const struct cpu_vector_cost ampere1_vector_cost = -{ - 1, /* scalar_int_stmt_cost */ - 3, /* scalar_fp_stmt_cost */ - 4, /* scalar_load_cost */ - 1, /* scalar_store_cost */ - 1, /* cond_taken_branch_cost */ - 1, /* cond_not_taken_branch_cost */ - &ampere1_advsimd_vector_cost, /* advsimd */ - nullptr, /* sve */ - nullptr /* issue_info */ -}; - -/* Generic costs for branch instructions. */ -static const struct cpu_branch_cost generic_branch_cost = -{ - 1, /* Predictable. */ - 3 /* Unpredictable. */ -}; - -/* Generic approximation modes. */ -static const cpu_approx_modes generic_approx_modes = -{ - AARCH64_APPROX_NONE, /* division */ - AARCH64_APPROX_NONE, /* sqrt */ - AARCH64_APPROX_NONE /* recip_sqrt */ -}; - -/* Approximation modes for Exynos M1. */ -static const cpu_approx_modes exynosm1_approx_modes = -{ - AARCH64_APPROX_NONE, /* division */ - AARCH64_APPROX_ALL, /* sqrt */ - AARCH64_APPROX_ALL /* recip_sqrt */ -}; - -/* Approximation modes for X-Gene 1. */ -static const cpu_approx_modes xgene1_approx_modes = -{ - AARCH64_APPROX_NONE, /* division */ - AARCH64_APPROX_NONE, /* sqrt */ - AARCH64_APPROX_ALL /* recip_sqrt */ -}; - -/* Generic prefetch settings (which disable prefetch). 
*/ -static const cpu_prefetch_tune generic_prefetch_tune = -{ - 0, /* num_slots */ - -1, /* l1_cache_size */ - -1, /* l1_cache_line_size */ - -1, /* l2_cache_size */ - true, /* prefetch_dynamic_strides */ - -1, /* minimum_stride */ - -1 /* default_opt_level */ -}; - -static const cpu_prefetch_tune exynosm1_prefetch_tune = -{ - 0, /* num_slots */ - -1, /* l1_cache_size */ - 64, /* l1_cache_line_size */ - -1, /* l2_cache_size */ - true, /* prefetch_dynamic_strides */ - -1, /* minimum_stride */ - -1 /* default_opt_level */ -}; - -static const cpu_prefetch_tune qdf24xx_prefetch_tune = -{ - 4, /* num_slots */ - 32, /* l1_cache_size */ - 64, /* l1_cache_line_size */ - 512, /* l2_cache_size */ - false, /* prefetch_dynamic_strides */ - 2048, /* minimum_stride */ - 3 /* default_opt_level */ -}; - -static const cpu_prefetch_tune thunderxt88_prefetch_tune = -{ - 8, /* num_slots */ - 32, /* l1_cache_size */ - 128, /* l1_cache_line_size */ - 16*1024, /* l2_cache_size */ - true, /* prefetch_dynamic_strides */ - -1, /* minimum_stride */ - 3 /* default_opt_level */ -}; - -static const cpu_prefetch_tune thunderx_prefetch_tune = -{ - 8, /* num_slots */ - 32, /* l1_cache_size */ - 128, /* l1_cache_line_size */ - -1, /* l2_cache_size */ - true, /* prefetch_dynamic_strides */ - -1, /* minimum_stride */ - -1 /* default_opt_level */ -}; - -static const cpu_prefetch_tune thunderx2t99_prefetch_tune = -{ - 8, /* num_slots */ - 32, /* l1_cache_size */ - 64, /* l1_cache_line_size */ - 256, /* l2_cache_size */ - true, /* prefetch_dynamic_strides */ - -1, /* minimum_stride */ - -1 /* default_opt_level */ -}; - -static const cpu_prefetch_tune thunderx3t110_prefetch_tune = -{ - 8, /* num_slots */ - 32, /* l1_cache_size */ - 64, /* l1_cache_line_size */ - 256, /* l2_cache_size */ - true, /* prefetch_dynamic_strides */ - -1, /* minimum_stride */ - -1 /* default_opt_level */ -}; - -static const cpu_prefetch_tune tsv110_prefetch_tune = -{ - 0, /* num_slots */ - 64, /* l1_cache_size */ - 64, /* l1_cache_line_size */ - 512, /* l2_cache_size */ - true, /* prefetch_dynamic_strides */ - -1, /* minimum_stride */ - -1 /* default_opt_level */ -}; - -static const cpu_prefetch_tune xgene1_prefetch_tune = -{ - 8, /* num_slots */ - 32, /* l1_cache_size */ - 64, /* l1_cache_line_size */ - 256, /* l2_cache_size */ - true, /* prefetch_dynamic_strides */ - -1, /* minimum_stride */ - -1 /* default_opt_level */ -}; - -static const cpu_prefetch_tune a64fx_prefetch_tune = -{ - 8, /* num_slots */ - 64, /* l1_cache_size */ - 256, /* l1_cache_line_size */ - 32768, /* l2_cache_size */ - true, /* prefetch_dynamic_strides */ - -1, /* minimum_stride */ - -1 /* default_opt_level */ -}; - -static const cpu_prefetch_tune ampere1_prefetch_tune = -{ - 0, /* num_slots */ - 64, /* l1_cache_size */ - 64, /* l1_cache_line_size */ - 2048, /* l2_cache_size */ - true, /* prefetch_dynamic_strides */ - -1, /* minimum_stride */ - -1 /* default_opt_level */ -}; - -static const struct tune_params generic_tunings = -{ - &cortexa57_extra_costs, - &generic_addrcost_table, - &generic_regmove_cost, - &generic_vector_cost, - &generic_branch_cost, - &generic_approx_modes, - SVE_NOT_IMPLEMENTED, /* sve_width */ - { 4, /* load_int. */ - 4, /* store_int. */ - 4, /* load_fp. */ - 4, /* store_fp. */ - 4, /* load_pred. */ - 4 /* store_pred. */ - }, /* memmov_cost. */ - 2, /* issue_rate */ - (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_CMP_BRANCH), /* fusible_ops */ - "16:12", /* function_align. */ - "4", /* jump_align. */ - "8", /* loop_align. */ - 2, /* int_reassoc_width. 
*/ - 4, /* fp_reassoc_width. */ - 1, /* fma_reassoc_width. */ - 1, /* vec_reassoc_width. */ - 2, /* min_div_recip_mul_sf. */ - 2, /* min_div_recip_mul_df. */ - 0, /* max_case_values. */ - tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ - /* Enabling AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS significantly benefits - Neoverse V1. It does not have a noticeable effect on A64FX and should - have at most a very minor effect on SVE2 cores. */ - (AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS), /* tune_flags. */ - &generic_prefetch_tune, - AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ - AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. */ -}; - -static const struct tune_params cortexa35_tunings = -{ - &cortexa53_extra_costs, - &generic_addrcost_table, - &cortexa53_regmove_cost, - &generic_vector_cost, - &generic_branch_cost, - &generic_approx_modes, - SVE_NOT_IMPLEMENTED, /* sve_width */ - { 4, /* load_int. */ - 4, /* store_int. */ - 4, /* load_fp. */ - 4, /* store_fp. */ - 4, /* load_pred. */ - 4 /* store_pred. */ - }, /* memmov_cost. */ - 1, /* issue_rate */ - (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD - | AARCH64_FUSE_MOVK_MOVK | AARCH64_FUSE_ADRP_LDR), /* fusible_ops */ - "16", /* function_align. */ - "4", /* jump_align. */ - "8", /* loop_align. */ - 2, /* int_reassoc_width. */ - 4, /* fp_reassoc_width. */ - 1, /* fma_reassoc_width. */ - 1, /* vec_reassoc_width. */ - 2, /* min_div_recip_mul_sf. */ - 2, /* min_div_recip_mul_df. */ - 0, /* max_case_values. */ - tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ - (AARCH64_EXTRA_TUNE_NONE), /* tune_flags. */ - &generic_prefetch_tune, - AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ - AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. */ -}; - -static const struct tune_params cortexa53_tunings = -{ - &cortexa53_extra_costs, - &generic_addrcost_table, - &cortexa53_regmove_cost, - &generic_vector_cost, - &generic_branch_cost, - &generic_approx_modes, - SVE_NOT_IMPLEMENTED, /* sve_width */ - { 4, /* load_int. */ - 4, /* store_int. */ - 4, /* load_fp. */ - 4, /* store_fp. */ - 4, /* load_pred. */ - 4 /* store_pred. */ - }, /* memmov_cost. */ - 2, /* issue_rate */ - (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD - | AARCH64_FUSE_MOVK_MOVK | AARCH64_FUSE_ADRP_LDR), /* fusible_ops */ - "16", /* function_align. */ - "4", /* jump_align. */ - "8", /* loop_align. */ - 2, /* int_reassoc_width. */ - 4, /* fp_reassoc_width. */ - 1, /* fma_reassoc_width. */ - 1, /* vec_reassoc_width. */ - 2, /* min_div_recip_mul_sf. */ - 2, /* min_div_recip_mul_df. */ - 0, /* max_case_values. */ - tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ - (AARCH64_EXTRA_TUNE_NONE), /* tune_flags. */ - &generic_prefetch_tune, - AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ - AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. */ -}; - -static const struct tune_params cortexa57_tunings = -{ - &cortexa57_extra_costs, - &generic_addrcost_table, - &cortexa57_regmove_cost, - &cortexa57_vector_cost, - &generic_branch_cost, - &generic_approx_modes, - SVE_NOT_IMPLEMENTED, /* sve_width */ - { 4, /* load_int. */ - 4, /* store_int. */ - 4, /* load_fp. */ - 4, /* store_fp. */ - 4, /* load_pred. */ - 4 /* store_pred. */ - }, /* memmov_cost. */ - 3, /* issue_rate */ - (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD - | AARCH64_FUSE_MOVK_MOVK), /* fusible_ops */ - "16", /* function_align. */ - "4", /* jump_align. */ - "8", /* loop_align. */ - 2, /* int_reassoc_width. 
*/ - 4, /* fp_reassoc_width. */ - 1, /* fma_reassoc_width. */ - 1, /* vec_reassoc_width. */ - 2, /* min_div_recip_mul_sf. */ - 2, /* min_div_recip_mul_df. */ - 0, /* max_case_values. */ - tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ - (AARCH64_EXTRA_TUNE_RENAME_FMA_REGS), /* tune_flags. */ - &generic_prefetch_tune, - AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ - AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. */ -}; - -static const struct tune_params cortexa72_tunings = -{ - &cortexa57_extra_costs, - &generic_addrcost_table, - &cortexa57_regmove_cost, - &cortexa57_vector_cost, - &generic_branch_cost, - &generic_approx_modes, - SVE_NOT_IMPLEMENTED, /* sve_width */ - { 4, /* load_int. */ - 4, /* store_int. */ - 4, /* load_fp. */ - 4, /* store_fp. */ - 4, /* load_pred. */ - 4 /* store_pred. */ - }, /* memmov_cost. */ - 3, /* issue_rate */ - (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD - | AARCH64_FUSE_MOVK_MOVK), /* fusible_ops */ - "16", /* function_align. */ - "4", /* jump_align. */ - "8", /* loop_align. */ - 2, /* int_reassoc_width. */ - 4, /* fp_reassoc_width. */ - 1, /* fma_reassoc_width. */ - 1, /* vec_reassoc_width. */ - 2, /* min_div_recip_mul_sf. */ - 2, /* min_div_recip_mul_df. */ - 0, /* max_case_values. */ - tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ - (AARCH64_EXTRA_TUNE_NONE), /* tune_flags. */ - &generic_prefetch_tune, - AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ - AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. */ -}; - -static const struct tune_params cortexa73_tunings = -{ - &cortexa57_extra_costs, - &generic_addrcost_table, - &cortexa57_regmove_cost, - &cortexa57_vector_cost, - &generic_branch_cost, - &generic_approx_modes, - SVE_NOT_IMPLEMENTED, /* sve_width */ - { 4, /* load_int. */ - 4, /* store_int. */ - 4, /* load_fp. */ - 4, /* store_fp. */ - 4, /* load_pred. */ - 4 /* store_pred. */ - }, /* memmov_cost. */ - 2, /* issue_rate. */ - (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD - | AARCH64_FUSE_MOVK_MOVK | AARCH64_FUSE_ADRP_LDR), /* fusible_ops */ - "16", /* function_align. */ - "4", /* jump_align. */ - "8", /* loop_align. */ - 2, /* int_reassoc_width. */ - 4, /* fp_reassoc_width. */ - 1, /* fma_reassoc_width. */ - 1, /* vec_reassoc_width. */ - 2, /* min_div_recip_mul_sf. */ - 2, /* min_div_recip_mul_df. */ - 0, /* max_case_values. */ - tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ - (AARCH64_EXTRA_TUNE_NONE), /* tune_flags. */ - &generic_prefetch_tune, - AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ - AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. */ -}; - -static const struct tune_params exynosm1_tunings = -{ - &exynosm1_extra_costs, - &exynosm1_addrcost_table, - &exynosm1_regmove_cost, - &exynosm1_vector_cost, - &generic_branch_cost, - &exynosm1_approx_modes, - SVE_NOT_IMPLEMENTED, /* sve_width */ - { 4, /* load_int. */ - 4, /* store_int. */ - 4, /* load_fp. */ - 4, /* store_fp. */ - 4, /* load_pred. */ - 4 /* store_pred. */ - }, /* memmov_cost. */ - 3, /* issue_rate */ - (AARCH64_FUSE_AES_AESMC), /* fusible_ops */ - "4", /* function_align. */ - "4", /* jump_align. */ - "4", /* loop_align. */ - 2, /* int_reassoc_width. */ - 4, /* fp_reassoc_width. */ - 1, /* fma_reassoc_width. */ - 1, /* vec_reassoc_width. */ - 2, /* min_div_recip_mul_sf. */ - 2, /* min_div_recip_mul_df. */ - 48, /* max_case_values. */ - tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ - (AARCH64_EXTRA_TUNE_NONE), /* tune_flags. 
*/ - &exynosm1_prefetch_tune, - AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ - AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. */ -}; - -static const struct tune_params thunderxt88_tunings = -{ - &thunderx_extra_costs, - &generic_addrcost_table, - &thunderx_regmove_cost, - &thunderx_vector_cost, - &generic_branch_cost, - &generic_approx_modes, - SVE_NOT_IMPLEMENTED, /* sve_width */ - { 6, /* load_int. */ - 6, /* store_int. */ - 6, /* load_fp. */ - 6, /* store_fp. */ - 6, /* load_pred. */ - 6 /* store_pred. */ - }, /* memmov_cost. */ - 2, /* issue_rate */ - AARCH64_FUSE_ALU_BRANCH, /* fusible_ops */ - "8", /* function_align. */ - "8", /* jump_align. */ - "8", /* loop_align. */ - 2, /* int_reassoc_width. */ - 4, /* fp_reassoc_width. */ - 1, /* fma_reassoc_width. */ - 1, /* vec_reassoc_width. */ - 2, /* min_div_recip_mul_sf. */ - 2, /* min_div_recip_mul_df. */ - 0, /* max_case_values. */ - tune_params::AUTOPREFETCHER_OFF, /* autoprefetcher_model. */ - (AARCH64_EXTRA_TUNE_NONE), /* tune_flags. */ - &thunderxt88_prefetch_tune, - AARCH64_LDP_STP_POLICY_ALIGNED, /* ldp_policy_model. */ - AARCH64_LDP_STP_POLICY_ALIGNED /* stp_policy_model. */ -}; - -static const struct tune_params thunderx_tunings = -{ - &thunderx_extra_costs, - &generic_addrcost_table, - &thunderx_regmove_cost, - &thunderx_vector_cost, - &generic_branch_cost, - &generic_approx_modes, - SVE_NOT_IMPLEMENTED, /* sve_width */ - { 6, /* load_int. */ - 6, /* store_int. */ - 6, /* load_fp. */ - 6, /* store_fp. */ - 6, /* load_pred. */ - 6 /* store_pred. */ - }, /* memmov_cost. */ - 2, /* issue_rate */ - AARCH64_FUSE_ALU_BRANCH, /* fusible_ops */ - "8", /* function_align. */ - "8", /* jump_align. */ - "8", /* loop_align. */ - 2, /* int_reassoc_width. */ - 4, /* fp_reassoc_width. */ - 1, /* fma_reassoc_width. */ - 1, /* vec_reassoc_width. */ - 2, /* min_div_recip_mul_sf. */ - 2, /* min_div_recip_mul_df. */ - 0, /* max_case_values. */ - tune_params::AUTOPREFETCHER_OFF, /* autoprefetcher_model. */ - (AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND), /* tune_flags. */ - &thunderx_prefetch_tune, - AARCH64_LDP_STP_POLICY_ALIGNED, /* ldp_policy_model. */ - AARCH64_LDP_STP_POLICY_ALIGNED /* stp_policy_model. */ -}; - -static const struct tune_params tsv110_tunings = -{ - &tsv110_extra_costs, - &tsv110_addrcost_table, - &tsv110_regmove_cost, - &tsv110_vector_cost, - &generic_branch_cost, - &generic_approx_modes, - SVE_NOT_IMPLEMENTED, /* sve_width */ - { 4, /* load_int. */ - 4, /* store_int. */ - 4, /* load_fp. */ - 4, /* store_fp. */ - 4, /* load_pred. */ - 4 /* store_pred. */ - }, /* memmov_cost. */ - 4, /* issue_rate */ - (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_ALU_BRANCH - | AARCH64_FUSE_ALU_CBZ), /* fusible_ops */ - "16", /* function_align. */ - "4", /* jump_align. */ - "8", /* loop_align. */ - 2, /* int_reassoc_width. */ - 4, /* fp_reassoc_width. */ - 1, /* fma_reassoc_width. */ - 1, /* vec_reassoc_width. */ - 2, /* min_div_recip_mul_sf. */ - 2, /* min_div_recip_mul_df. */ - 0, /* max_case_values. */ - tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ - (AARCH64_EXTRA_TUNE_NONE), /* tune_flags. */ - &tsv110_prefetch_tune, - AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ - AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. */ -}; - -static const struct tune_params xgene1_tunings = -{ - &xgene1_extra_costs, - &xgene1_addrcost_table, - &xgene1_regmove_cost, - &xgene1_vector_cost, - &generic_branch_cost, - &xgene1_approx_modes, - SVE_NOT_IMPLEMENTED, /* sve_width */ - { 6, /* load_int. */ - 6, /* store_int. 
*/ - 6, /* load_fp. */ - 6, /* store_fp. */ - 6, /* load_pred. */ - 6 /* store_pred. */ - }, /* memmov_cost. */ - 4, /* issue_rate */ - AARCH64_FUSE_NOTHING, /* fusible_ops */ - "16", /* function_align. */ - "16", /* jump_align. */ - "16", /* loop_align. */ - 2, /* int_reassoc_width. */ - 4, /* fp_reassoc_width. */ - 1, /* fma_reassoc_width. */ - 1, /* vec_reassoc_width. */ - 2, /* min_div_recip_mul_sf. */ - 2, /* min_div_recip_mul_df. */ - 17, /* max_case_values. */ - tune_params::AUTOPREFETCHER_OFF, /* autoprefetcher_model. */ - (AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS), /* tune_flags. */ - &xgene1_prefetch_tune, - AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ - AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. */ -}; - -static const struct tune_params emag_tunings = -{ - &xgene1_extra_costs, - &xgene1_addrcost_table, - &xgene1_regmove_cost, - &xgene1_vector_cost, - &generic_branch_cost, - &xgene1_approx_modes, - SVE_NOT_IMPLEMENTED, - { 6, /* load_int. */ - 6, /* store_int. */ - 6, /* load_fp. */ - 6, /* store_fp. */ - 6, /* load_pred. */ - 6 /* store_pred. */ - }, /* memmov_cost. */ - 4, /* issue_rate */ - AARCH64_FUSE_NOTHING, /* fusible_ops */ - "16", /* function_align. */ - "16", /* jump_align. */ - "16", /* loop_align. */ - 2, /* int_reassoc_width. */ - 4, /* fp_reassoc_width. */ - 1, /* fma_reassoc_width. */ - 1, /* vec_reassoc_width. */ - 2, /* min_div_recip_mul_sf. */ - 2, /* min_div_recip_mul_df. */ - 17, /* max_case_values. */ - tune_params::AUTOPREFETCHER_OFF, /* autoprefetcher_model. */ - (AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS), /* tune_flags. */ - &xgene1_prefetch_tune, - AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ - AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. */ -}; - -static const struct tune_params qdf24xx_tunings = -{ - &qdf24xx_extra_costs, - &qdf24xx_addrcost_table, - &qdf24xx_regmove_cost, - &qdf24xx_vector_cost, - &generic_branch_cost, - &generic_approx_modes, - SVE_NOT_IMPLEMENTED, /* sve_width */ - { 4, /* load_int. */ - 4, /* store_int. */ - 4, /* load_fp. */ - 4, /* store_fp. */ - 4, /* load_pred. */ - 4 /* store_pred. */ - }, /* memmov_cost. */ - 4, /* issue_rate */ - (AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD - | AARCH64_FUSE_MOVK_MOVK), /* fuseable_ops */ - "16", /* function_align. */ - "8", /* jump_align. */ - "16", /* loop_align. */ - 2, /* int_reassoc_width. */ - 4, /* fp_reassoc_width. */ - 1, /* fma_reassoc_width. */ - 1, /* vec_reassoc_width. */ - 2, /* min_div_recip_mul_sf. */ - 2, /* min_div_recip_mul_df. */ - 0, /* max_case_values. */ - tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ - AARCH64_EXTRA_TUNE_RENAME_LOAD_REGS, /* tune_flags. */ - &qdf24xx_prefetch_tune, - AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ - AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. */ -}; - -/* Tuning structure for the Qualcomm Saphira core. Default to falkor values - for now. */ -static const struct tune_params saphira_tunings = -{ - &generic_extra_costs, - &generic_addrcost_table, - &generic_regmove_cost, - &generic_vector_cost, - &generic_branch_cost, - &generic_approx_modes, - SVE_NOT_IMPLEMENTED, /* sve_width */ - { 4, /* load_int. */ - 4, /* store_int. */ - 4, /* load_fp. */ - 4, /* store_fp. */ - 4, /* load_pred. */ - 4 /* store_pred. */ - }, /* memmov_cost. */ - 4, /* issue_rate */ - (AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD - | AARCH64_FUSE_MOVK_MOVK), /* fuseable_ops */ - "16", /* function_align. */ - "8", /* jump_align. */ - "16", /* loop_align. */ - 2, /* int_reassoc_width. */ - 4, /* fp_reassoc_width. 
*/ - 1, /* fma_reassoc_width. */ - 1, /* vec_reassoc_width. */ - 2, /* min_div_recip_mul_sf. */ - 2, /* min_div_recip_mul_df. */ - 0, /* max_case_values. */ - tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ - (AARCH64_EXTRA_TUNE_NONE), /* tune_flags. */ - &generic_prefetch_tune, - AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ - AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. */ -}; - -static const struct tune_params thunderx2t99_tunings = -{ - &thunderx2t99_extra_costs, - &thunderx2t99_addrcost_table, - &thunderx2t99_regmove_cost, - &thunderx2t99_vector_cost, - &generic_branch_cost, - &generic_approx_modes, - SVE_NOT_IMPLEMENTED, /* sve_width */ - { 4, /* load_int. */ - 4, /* store_int. */ - 4, /* load_fp. */ - 4, /* store_fp. */ - 4, /* load_pred. */ - 4 /* store_pred. */ - }, /* memmov_cost. */ - 4, /* issue_rate. */ - (AARCH64_FUSE_ALU_BRANCH | AARCH64_FUSE_AES_AESMC - | AARCH64_FUSE_ALU_CBZ), /* fusible_ops */ - "16", /* function_align. */ - "8", /* jump_align. */ - "16", /* loop_align. */ - 3, /* int_reassoc_width. */ - 2, /* fp_reassoc_width. */ - 1, /* fma_reassoc_width. */ - 2, /* vec_reassoc_width. */ - 2, /* min_div_recip_mul_sf. */ - 2, /* min_div_recip_mul_df. */ - 0, /* max_case_values. */ - tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ - (AARCH64_EXTRA_TUNE_NONE), /* tune_flags. */ - &thunderx2t99_prefetch_tune, - AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ - AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. */ -}; - -static const struct tune_params thunderx3t110_tunings = -{ - &thunderx3t110_extra_costs, - &thunderx3t110_addrcost_table, - &thunderx3t110_regmove_cost, - &thunderx3t110_vector_cost, - &generic_branch_cost, - &generic_approx_modes, - SVE_NOT_IMPLEMENTED, /* sve_width */ - { 4, /* load_int. */ - 4, /* store_int. */ - 4, /* load_fp. */ - 4, /* store_fp. */ - 4, /* load_pred. */ - 4 /* store_pred. */ - }, /* memmov_cost. */ - 6, /* issue_rate. */ - (AARCH64_FUSE_ALU_BRANCH | AARCH64_FUSE_AES_AESMC - | AARCH64_FUSE_ALU_CBZ), /* fusible_ops */ - "16", /* function_align. */ - "8", /* jump_align. */ - "16", /* loop_align. */ - 3, /* int_reassoc_width. */ - 2, /* fp_reassoc_width. */ - 1, /* fma_reassoc_width. */ - 2, /* vec_reassoc_width. */ - 2, /* min_div_recip_mul_sf. */ - 2, /* min_div_recip_mul_df. */ - 0, /* max_case_values. */ - tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ - (AARCH64_EXTRA_TUNE_NONE), /* tune_flags. */ - &thunderx3t110_prefetch_tune, - AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ - AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. */ -}; - -static const struct tune_params neoversen1_tunings = -{ - &cortexa76_extra_costs, - &generic_addrcost_table, - &generic_regmove_cost, - &cortexa57_vector_cost, - &generic_branch_cost, - &generic_approx_modes, - SVE_NOT_IMPLEMENTED, /* sve_width */ - { 4, /* load_int. */ - 2, /* store_int. */ - 5, /* load_fp. */ - 2, /* store_fp. */ - 4, /* load_pred. */ - 4 /* store_pred. */ - }, /* memmov_cost. */ - 3, /* issue_rate */ - (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_CMP_BRANCH), /* fusible_ops */ - "32:16", /* function_align. */ - "4", /* jump_align. */ - "32:16", /* loop_align. */ - 2, /* int_reassoc_width. */ - 4, /* fp_reassoc_width. */ - 1, /* fma_reassoc_width. */ - 2, /* vec_reassoc_width. */ - 2, /* min_div_recip_mul_sf. */ - 2, /* min_div_recip_mul_df. */ - 0, /* max_case_values. */ - tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ - (AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND), /* tune_flags. 
*/ - &generic_prefetch_tune, - AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ - AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. */ -}; - -static const struct tune_params ampere1_tunings = -{ - &ampere1_extra_costs, - &generic_addrcost_table, - &generic_regmove_cost, - &ampere1_vector_cost, - &generic_branch_cost, - &generic_approx_modes, - SVE_NOT_IMPLEMENTED, /* sve_width */ - { 4, /* load_int. */ - 4, /* store_int. */ - 4, /* load_fp. */ - 4, /* store_fp. */ - 4, /* load_pred. */ - 4 /* store_pred. */ - }, /* memmov_cost. */ - 4, /* issue_rate */ - (AARCH64_FUSE_ADRP_ADD | AARCH64_FUSE_AES_AESMC | - AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_MOVK_MOVK | - AARCH64_FUSE_ALU_BRANCH /* adds, ands, bics, ccmp, ccmn */ | - AARCH64_FUSE_CMP_BRANCH), - /* fusible_ops */ - "32", /* function_align. */ - "4", /* jump_align. */ - "32:16", /* loop_align. */ - 2, /* int_reassoc_width. */ - 4, /* fp_reassoc_width. */ - 4, /* fma_reassoc_width. */ - 2, /* vec_reassoc_width. */ - 2, /* min_div_recip_mul_sf. */ - 2, /* min_div_recip_mul_df. */ - 0, /* max_case_values. */ - tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ - (AARCH64_EXTRA_TUNE_NONE), /* tune_flags. */ - &ampere1_prefetch_tune, - AARCH64_LDP_STP_POLICY_ALIGNED, /* ldp_policy_model. */ - AARCH64_LDP_STP_POLICY_ALIGNED /* stp_policy_model. */ -}; - -static const struct tune_params ampere1a_tunings = -{ - &ampere1a_extra_costs, - &generic_addrcost_table, - &generic_regmove_cost, - &ampere1_vector_cost, - &generic_branch_cost, - &generic_approx_modes, - SVE_NOT_IMPLEMENTED, /* sve_width */ - { 4, /* load_int. */ - 4, /* store_int. */ - 4, /* load_fp. */ - 4, /* store_fp. */ - 4, /* load_pred. */ - 4 /* store_pred. */ - }, /* memmov_cost. */ - 4, /* issue_rate */ - (AARCH64_FUSE_ADRP_ADD | AARCH64_FUSE_AES_AESMC | - AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_MOVK_MOVK | - AARCH64_FUSE_ALU_BRANCH /* adds, ands, bics, ccmp, ccmn */ | - AARCH64_FUSE_CMP_BRANCH | AARCH64_FUSE_ALU_CBZ | - AARCH64_FUSE_ADDSUB_2REG_CONST1), - /* fusible_ops */ - "32", /* function_align. */ - "4", /* jump_align. */ - "32:16", /* loop_align. */ - 2, /* int_reassoc_width. */ - 4, /* fp_reassoc_width. */ - 1, /* fma_reassoc_width. */ - 2, /* vec_reassoc_width. */ - 2, /* min_div_recip_mul_sf. */ - 2, /* min_div_recip_mul_df. */ - 0, /* max_case_values. */ - tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ - (AARCH64_EXTRA_TUNE_NONE), /* tune_flags. */ - &ampere1_prefetch_tune, - AARCH64_LDP_STP_POLICY_ALIGNED, /* ldp_policy_model. */ - AARCH64_LDP_STP_POLICY_ALIGNED /* stp_policy_model. */ -}; - -static const advsimd_vec_cost neoversev1_advsimd_vector_cost = -{ - 2, /* int_stmt_cost */ - 2, /* fp_stmt_cost */ - 4, /* ld2_st2_permute_cost */ - 4, /* ld3_st3_permute_cost */ - 5, /* ld4_st4_permute_cost */ - 3, /* permute_cost */ - 4, /* reduc_i8_cost */ - 4, /* reduc_i16_cost */ - 2, /* reduc_i32_cost */ - 2, /* reduc_i64_cost */ - 6, /* reduc_f16_cost */ - 3, /* reduc_f32_cost */ - 2, /* reduc_f64_cost */ - 2, /* store_elt_extra_cost */ - /* This value is just inherited from the Cortex-A57 table. */ - 8, /* vec_to_scalar_cost */ - /* This depends very much on what the scalar value is and - where it comes from. E.g. some constants take two dependent - instructions or a load, while others might be moved from a GPR. - 4 seems to be a reasonable compromise in practice. 
*/ - 4, /* scalar_to_vec_cost */ - 4, /* align_load_cost */ - 4, /* unalign_load_cost */ - /* Although stores have a latency of 2 and compete for the - vector pipes, in practice it's better not to model that. */ - 1, /* unalign_store_cost */ - 1 /* store_cost */ -}; - -static const sve_vec_cost neoversev1_sve_vector_cost = -{ - { - 2, /* int_stmt_cost */ - 2, /* fp_stmt_cost */ - 4, /* ld2_st2_permute_cost */ - 7, /* ld3_st3_permute_cost */ - 8, /* ld4_st4_permute_cost */ - 3, /* permute_cost */ - /* Theoretically, a reduction involving 31 scalar ADDs could - complete in ~9 cycles and would have a cost of 31. [SU]ADDV - completes in 14 cycles, so give it a cost of 31 + 5. */ - 36, /* reduc_i8_cost */ - /* Likewise for 15 scalar ADDs (~5 cycles) vs. 12: 15 + 7. */ - 22, /* reduc_i16_cost */ - /* Likewise for 7 scalar ADDs (~3 cycles) vs. 10: 7 + 7. */ - 14, /* reduc_i32_cost */ - /* Likewise for 3 scalar ADDs (~2 cycles) vs. 10: 3 + 8. */ - 11, /* reduc_i64_cost */ - /* Theoretically, a reduction involving 15 scalar FADDs could - complete in ~9 cycles and would have a cost of 30. FADDV - completes in 13 cycles, so give it a cost of 30 + 4. */ - 34, /* reduc_f16_cost */ - /* Likewise for 7 scalar FADDs (~6 cycles) vs. 11: 14 + 5. */ - 19, /* reduc_f32_cost */ - /* Likewise for 3 scalar FADDs (~4 cycles) vs. 9: 6 + 5. */ - 11, /* reduc_f64_cost */ - 2, /* store_elt_extra_cost */ - /* This value is just inherited from the Cortex-A57 table. */ - 8, /* vec_to_scalar_cost */ - /* See the comment above the Advanced SIMD versions. */ - 4, /* scalar_to_vec_cost */ - 4, /* align_load_cost */ - 4, /* unalign_load_cost */ - /* Although stores have a latency of 2 and compete for the - vector pipes, in practice it's better not to model that. */ - 1, /* unalign_store_cost */ - 1 /* store_cost */ - }, - 3, /* clast_cost */ - 19, /* fadda_f16_cost */ - 11, /* fadda_f32_cost */ - 8, /* fadda_f64_cost */ - 32, /* gather_load_x32_cost */ - 16, /* gather_load_x64_cost */ - 3 /* scatter_store_elt_cost */ -}; - -static const aarch64_scalar_vec_issue_info neoversev1_scalar_issue_info = -{ - 3, /* loads_stores_per_cycle */ - 2, /* stores_per_cycle */ - 4, /* general_ops_per_cycle */ - 0, /* fp_simd_load_general_ops */ - 1 /* fp_simd_store_general_ops */ -}; - -static const aarch64_advsimd_vec_issue_info neoversev1_advsimd_issue_info = -{ - { - 3, /* loads_stores_per_cycle */ - 2, /* stores_per_cycle */ - 4, /* general_ops_per_cycle */ - 0, /* fp_simd_load_general_ops */ - 1 /* fp_simd_store_general_ops */ - }, - 2, /* ld2_st2_general_ops */ - 2, /* ld3_st3_general_ops */ - 3 /* ld4_st4_general_ops */ -}; - -static const aarch64_sve_vec_issue_info neoversev1_sve_issue_info = -{ - { - { - 2, /* loads_per_cycle */ - 2, /* stores_per_cycle */ - 2, /* general_ops_per_cycle */ - 0, /* fp_simd_load_general_ops */ - 1 /* fp_simd_store_general_ops */ - }, - 2, /* ld2_st2_general_ops */ - 2, /* ld3_st3_general_ops */ - 3 /* ld4_st4_general_ops */ - }, - 1, /* pred_ops_per_cycle */ - 2, /* while_pred_ops */ - 2, /* int_cmp_pred_ops */ - 1, /* fp_cmp_pred_ops */ - 1, /* gather_scatter_pair_general_ops */ - 1 /* gather_scatter_pair_pred_ops */ -}; - -static const aarch64_vec_issue_info neoversev1_vec_issue_info = -{ - &neoversev1_scalar_issue_info, - &neoversev1_advsimd_issue_info, - &neoversev1_sve_issue_info -}; - -/* Neoverse V1 costs for vector insn classes. 
*/ -static const struct cpu_vector_cost neoversev1_vector_cost = -{ - 1, /* scalar_int_stmt_cost */ - 2, /* scalar_fp_stmt_cost */ - 4, /* scalar_load_cost */ - 1, /* scalar_store_cost */ - 1, /* cond_taken_branch_cost */ - 1, /* cond_not_taken_branch_cost */ - &neoversev1_advsimd_vector_cost, /* advsimd */ - &neoversev1_sve_vector_cost, /* sve */ - &neoversev1_vec_issue_info /* issue_info */ -}; - -static const struct tune_params neoversev1_tunings = -{ - &cortexa76_extra_costs, - &neoversev1_addrcost_table, - &neoversev1_regmove_cost, - &neoversev1_vector_cost, - &generic_branch_cost, - &generic_approx_modes, - SVE_256, /* sve_width */ - { 4, /* load_int. */ - 2, /* store_int. */ - 6, /* load_fp. */ - 2, /* store_fp. */ - 6, /* load_pred. */ - 1 /* store_pred. */ - }, /* memmov_cost. */ - 3, /* issue_rate */ - (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_CMP_BRANCH), /* fusible_ops */ - "32:16", /* function_align. */ - "4", /* jump_align. */ - "32:16", /* loop_align. */ - 2, /* int_reassoc_width. */ - 4, /* fp_reassoc_width. */ - 4, /* fma_reassoc_width. */ - 2, /* vec_reassoc_width. */ - 2, /* min_div_recip_mul_sf. */ - 2, /* min_div_recip_mul_df. */ - 0, /* max_case_values. */ - tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ - (AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS - | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS - | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT - | AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND), /* tune_flags. */ - &generic_prefetch_tune, - AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ - AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. */ -}; - -static const sve_vec_cost neoverse512tvb_sve_vector_cost = -{ - { - 2, /* int_stmt_cost */ - 2, /* fp_stmt_cost */ - 4, /* ld2_st2_permute_cost */ - 5, /* ld3_st3_permute_cost */ - 5, /* ld4_st4_permute_cost */ - 3, /* permute_cost */ - /* Theoretically, a reduction involving 15 scalar ADDs could - complete in ~5 cycles and would have a cost of 15. Assume that - [SU]ADDV completes in 11 cycles and so give it a cost of 15 + 6. */ - 21, /* reduc_i8_cost */ - /* Likewise for 7 scalar ADDs (~3 cycles) vs. 9: 7 + 6. */ - 13, /* reduc_i16_cost */ - /* Likewise for 3 scalar ADDs (~2 cycles) vs. 8: 3 + 6. */ - 9, /* reduc_i32_cost */ - /* Likewise for 1 scalar ADD (1 cycle) vs. 8: 1 + 7. */ - 8, /* reduc_i64_cost */ - /* Theoretically, a reduction involving 7 scalar FADDs could - complete in ~6 cycles and would have a cost of 14. Assume that - FADDV completes in 8 cycles and so give it a cost of 14 + 2. */ - 16, /* reduc_f16_cost */ - /* Likewise for 3 scalar FADDs (~4 cycles) vs. 6: 6 + 2. */ - 8, /* reduc_f32_cost */ - /* Likewise for 1 scalar FADD (2 cycles) vs. 4: 2 + 2. */ - 4, /* reduc_f64_cost */ - 2, /* store_elt_extra_cost */ - /* This value is just inherited from the Cortex-A57 table. */ - 8, /* vec_to_scalar_cost */ - /* This depends very much on what the scalar value is and - where it comes from. E.g. some constants take two dependent - instructions or a load, while others might be moved from a GPR. - 4 seems to be a reasonable compromise in practice. */ - 4, /* scalar_to_vec_cost */ - 4, /* align_load_cost */ - 4, /* unalign_load_cost */ - /* Although stores generally have a latency of 2 and compete for the - vector pipes, in practice it's better not to model that. 
*/ - 1, /* unalign_store_cost */ - 1 /* store_cost */ - }, - 3, /* clast_cost */ - 10, /* fadda_f16_cost */ - 6, /* fadda_f32_cost */ - 4, /* fadda_f64_cost */ - /* A strided Advanced SIMD x64 load would take two parallel FP loads - (6 cycles) plus an insertion (2 cycles). Assume a 64-bit SVE gather - is 1 cycle more. The Advanced SIMD version is costed as 2 scalar loads - (cost 8) and a vec_construct (cost 2). Add a full vector operation - (cost 2) to that, to avoid the difference being lost in rounding. - - There is no easy comparison between a strided Advanced SIMD x32 load - and an SVE 32-bit gather, but cost an SVE 32-bit gather as 1 vector - operation more than a 64-bit gather. */ - 14, /* gather_load_x32_cost */ - 12, /* gather_load_x64_cost */ - 3 /* scatter_store_elt_cost */ -}; - -static const aarch64_sve_vec_issue_info neoverse512tvb_sve_issue_info = -{ - { - { - 3, /* loads_per_cycle */ - 2, /* stores_per_cycle */ - 4, /* general_ops_per_cycle */ - 0, /* fp_simd_load_general_ops */ - 1 /* fp_simd_store_general_ops */ - }, - 2, /* ld2_st2_general_ops */ - 2, /* ld3_st3_general_ops */ - 3 /* ld4_st4_general_ops */ - }, - 2, /* pred_ops_per_cycle */ - 2, /* while_pred_ops */ - 2, /* int_cmp_pred_ops */ - 1, /* fp_cmp_pred_ops */ - 1, /* gather_scatter_pair_general_ops */ - 1 /* gather_scatter_pair_pred_ops */ -}; - -static const aarch64_vec_issue_info neoverse512tvb_vec_issue_info = -{ - &neoversev1_scalar_issue_info, - &neoversev1_advsimd_issue_info, - &neoverse512tvb_sve_issue_info -}; - -static const struct cpu_vector_cost neoverse512tvb_vector_cost = -{ - 1, /* scalar_int_stmt_cost */ - 2, /* scalar_fp_stmt_cost */ - 4, /* scalar_load_cost */ - 1, /* scalar_store_cost */ - 1, /* cond_taken_branch_cost */ - 1, /* cond_not_taken_branch_cost */ - &neoversev1_advsimd_vector_cost, /* advsimd */ - &neoverse512tvb_sve_vector_cost, /* sve */ - &neoverse512tvb_vec_issue_info /* issue_info */ -}; - -static const struct tune_params neoverse512tvb_tunings = -{ - &cortexa76_extra_costs, - &neoversev1_addrcost_table, - &neoversev1_regmove_cost, - &neoverse512tvb_vector_cost, - &generic_branch_cost, - &generic_approx_modes, - SVE_128 | SVE_256, /* sve_width */ - { 4, /* load_int. */ - 2, /* store_int. */ - 6, /* load_fp. */ - 2, /* store_fp. */ - 6, /* load_pred. */ - 1 /* store_pred. */ - }, /* memmov_cost. */ - 3, /* issue_rate */ - (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_CMP_BRANCH), /* fusible_ops */ - "32:16", /* function_align. */ - "4", /* jump_align. */ - "32:16", /* loop_align. */ - 2, /* int_reassoc_width. */ - 4, /* fp_reassoc_width. */ - 4, /* fma_reassoc_width. */ - 2, /* vec_reassoc_width. */ - 2, /* min_div_recip_mul_sf. */ - 2, /* min_div_recip_mul_df. */ - 0, /* max_case_values. */ - tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ - (AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS - | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS - | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT), /* tune_flags. */ - &generic_prefetch_tune, - AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ - AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. 
*/ -}; - -static const advsimd_vec_cost neoversen2_advsimd_vector_cost = -{ - 2, /* int_stmt_cost */ - 2, /* fp_stmt_cost */ - 2, /* ld2_st2_permute_cost */ - 2, /* ld3_st3_permute_cost */ - 3, /* ld4_st4_permute_cost */ - 3, /* permute_cost */ - 4, /* reduc_i8_cost */ - 4, /* reduc_i16_cost */ - 2, /* reduc_i32_cost */ - 2, /* reduc_i64_cost */ - 6, /* reduc_f16_cost */ - 4, /* reduc_f32_cost */ - 2, /* reduc_f64_cost */ - 2, /* store_elt_extra_cost */ - /* This value is just inherited from the Cortex-A57 table. */ - 8, /* vec_to_scalar_cost */ - /* This depends very much on what the scalar value is and - where it comes from. E.g. some constants take two dependent - instructions or a load, while others might be moved from a GPR. - 4 seems to be a reasonable compromise in practice. */ - 4, /* scalar_to_vec_cost */ - 4, /* align_load_cost */ - 4, /* unalign_load_cost */ - /* Although stores have a latency of 2 and compete for the - vector pipes, in practice it's better not to model that. */ - 1, /* unalign_store_cost */ - 1 /* store_cost */ -}; - -static const sve_vec_cost neoversen2_sve_vector_cost = -{ - { - 2, /* int_stmt_cost */ - 2, /* fp_stmt_cost */ - 3, /* ld2_st2_permute_cost */ - 4, /* ld3_st3_permute_cost */ - 4, /* ld4_st4_permute_cost */ - 3, /* permute_cost */ - /* Theoretically, a reduction involving 15 scalar ADDs could - complete in ~5 cycles and would have a cost of 15. [SU]ADDV - completes in 11 cycles, so give it a cost of 15 + 6. */ - 21, /* reduc_i8_cost */ - /* Likewise for 7 scalar ADDs (~3 cycles) vs. 9: 7 + 6. */ - 13, /* reduc_i16_cost */ - /* Likewise for 3 scalar ADDs (~2 cycles) vs. 8: 3 + 6. */ - 9, /* reduc_i32_cost */ - /* Likewise for 1 scalar ADD (~1 cycles) vs. 2: 1 + 1. */ - 2, /* reduc_i64_cost */ - /* Theoretically, a reduction involving 7 scalar FADDs could - complete in ~8 cycles and would have a cost of 14. FADDV - completes in 6 cycles, so give it a cost of 14 - 2. */ - 12, /* reduc_f16_cost */ - /* Likewise for 3 scalar FADDs (~4 cycles) vs. 4: 6 - 0. */ - 6, /* reduc_f32_cost */ - /* Likewise for 1 scalar FADD (~2 cycles) vs. 2: 2 - 0. */ - 2, /* reduc_f64_cost */ - 2, /* store_elt_extra_cost */ - /* This value is just inherited from the Cortex-A57 table. */ - 8, /* vec_to_scalar_cost */ - /* See the comment above the Advanced SIMD versions. */ - 4, /* scalar_to_vec_cost */ - 4, /* align_load_cost */ - 4, /* unalign_load_cost */ - /* Although stores have a latency of 2 and compete for the - vector pipes, in practice it's better not to model that. */ - 1, /* unalign_store_cost */ - 1 /* store_cost */ - }, - 3, /* clast_cost */ - 10, /* fadda_f16_cost */ - 6, /* fadda_f32_cost */ - 4, /* fadda_f64_cost */ - /* A strided Advanced SIMD x64 load would take two parallel FP loads - (8 cycles) plus an insertion (2 cycles). Assume a 64-bit SVE gather - is 1 cycle more. The Advanced SIMD version is costed as 2 scalar loads - (cost 8) and a vec_construct (cost 2). Add a full vector operation - (cost 2) to that, to avoid the difference being lost in rounding. - - There is no easy comparison between a strided Advanced SIMD x32 load - and an SVE 32-bit gather, but cost an SVE 32-bit gather as 1 vector - operation more than a 64-bit gather. 
*/ - 14, /* gather_load_x32_cost */ - 12, /* gather_load_x64_cost */ - 3 /* scatter_store_elt_cost */ -}; - -static const aarch64_scalar_vec_issue_info neoversen2_scalar_issue_info = -{ - 3, /* loads_stores_per_cycle */ - 2, /* stores_per_cycle */ - 4, /* general_ops_per_cycle */ - 0, /* fp_simd_load_general_ops */ - 1 /* fp_simd_store_general_ops */ -}; - -static const aarch64_advsimd_vec_issue_info neoversen2_advsimd_issue_info = -{ - { - 3, /* loads_stores_per_cycle */ - 2, /* stores_per_cycle */ - 2, /* general_ops_per_cycle */ - 0, /* fp_simd_load_general_ops */ - 1 /* fp_simd_store_general_ops */ - }, - 2, /* ld2_st2_general_ops */ - 2, /* ld3_st3_general_ops */ - 3 /* ld4_st4_general_ops */ -}; - -static const aarch64_sve_vec_issue_info neoversen2_sve_issue_info = -{ - { - { - 3, /* loads_per_cycle */ - 2, /* stores_per_cycle */ - 2, /* general_ops_per_cycle */ - 0, /* fp_simd_load_general_ops */ - 1 /* fp_simd_store_general_ops */ - }, - 2, /* ld2_st2_general_ops */ - 3, /* ld3_st3_general_ops */ - 3 /* ld4_st4_general_ops */ - }, - 2, /* pred_ops_per_cycle */ - 2, /* while_pred_ops */ - 2, /* int_cmp_pred_ops */ - 1, /* fp_cmp_pred_ops */ - 1, /* gather_scatter_pair_general_ops */ - 1 /* gather_scatter_pair_pred_ops */ -}; - -static const aarch64_vec_issue_info neoversen2_vec_issue_info = -{ - &neoversen2_scalar_issue_info, - &neoversen2_advsimd_issue_info, - &neoversen2_sve_issue_info -}; - -/* Neoverse N2 costs for vector insn classes. */ -static const struct cpu_vector_cost neoversen2_vector_cost = -{ - 1, /* scalar_int_stmt_cost */ - 2, /* scalar_fp_stmt_cost */ - 4, /* scalar_load_cost */ - 1, /* scalar_store_cost */ - 1, /* cond_taken_branch_cost */ - 1, /* cond_not_taken_branch_cost */ - &neoversen2_advsimd_vector_cost, /* advsimd */ - &neoversen2_sve_vector_cost, /* sve */ - &neoversen2_vec_issue_info /* issue_info */ -}; - -static const struct tune_params neoversen2_tunings = -{ - &cortexa76_extra_costs, - &neoversen2_addrcost_table, - &neoversen2_regmove_cost, - &neoversen2_vector_cost, - &generic_branch_cost, - &generic_approx_modes, - SVE_128, /* sve_width */ - { 4, /* load_int. */ - 1, /* store_int. */ - 6, /* load_fp. */ - 2, /* store_fp. */ - 6, /* load_pred. */ - 1 /* store_pred. */ - }, /* memmov_cost. */ - 3, /* issue_rate */ - (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_CMP_BRANCH), /* fusible_ops */ - "32:16", /* function_align. */ - "4", /* jump_align. */ - "32:16", /* loop_align. */ - 2, /* int_reassoc_width. */ - 4, /* fp_reassoc_width. */ - 1, /* fma_reassoc_width. */ - 2, /* vec_reassoc_width. */ - 2, /* min_div_recip_mul_sf. */ - 2, /* min_div_recip_mul_df. */ - 0, /* max_case_values. */ - tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ - (AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND - | AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS - | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS - | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT), /* tune_flags. */ - &generic_prefetch_tune, - AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ - AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. 
*/ -}; - -static const advsimd_vec_cost neoversev2_advsimd_vector_cost = -{ - 2, /* int_stmt_cost */ - 2, /* fp_stmt_cost */ - 2, /* ld2_st2_permute_cost */ - 2, /* ld3_st3_permute_cost */ - 3, /* ld4_st4_permute_cost */ - 3, /* permute_cost */ - 4, /* reduc_i8_cost */ - 4, /* reduc_i16_cost */ - 2, /* reduc_i32_cost */ - 2, /* reduc_i64_cost */ - 6, /* reduc_f16_cost */ - 3, /* reduc_f32_cost */ - 2, /* reduc_f64_cost */ - 2, /* store_elt_extra_cost */ - /* This value is just inherited from the Cortex-A57 table. */ - 8, /* vec_to_scalar_cost */ - /* This depends very much on what the scalar value is and - where it comes from. E.g. some constants take two dependent - instructions or a load, while others might be moved from a GPR. - 4 seems to be a reasonable compromise in practice. */ - 4, /* scalar_to_vec_cost */ - 4, /* align_load_cost */ - 4, /* unalign_load_cost */ - /* Although stores have a latency of 2 and compete for the - vector pipes, in practice it's better not to model that. */ - 1, /* unalign_store_cost */ - 1 /* store_cost */ -}; - -static const sve_vec_cost neoversev2_sve_vector_cost = -{ - { - 2, /* int_stmt_cost */ - 2, /* fp_stmt_cost */ - 3, /* ld2_st2_permute_cost */ - 3, /* ld3_st3_permute_cost */ - 4, /* ld4_st4_permute_cost */ - 3, /* permute_cost */ - /* Theoretically, a reduction involving 15 scalar ADDs could - complete in ~3 cycles and would have a cost of 15. [SU]ADDV - completes in 11 cycles, so give it a cost of 15 + 8. */ - 21, /* reduc_i8_cost */ - /* Likewise for 7 scalar ADDs (~2 cycles) vs. 9: 7 + 7. */ - 14, /* reduc_i16_cost */ - /* Likewise for 3 scalar ADDs (~2 cycles) vs. 8: 3 + 4. */ - 7, /* reduc_i32_cost */ - /* Likewise for 1 scalar ADD (~1 cycles) vs. 2: 1 + 1. */ - 2, /* reduc_i64_cost */ - /* Theoretically, a reduction involving 7 scalar FADDs could - complete in ~6 cycles and would have a cost of 14. FADDV - completes in 8 cycles, so give it a cost of 14 + 2. */ - 16, /* reduc_f16_cost */ - /* Likewise for 3 scalar FADDs (~4 cycles) vs. 6: 6 + 2. */ - 8, /* reduc_f32_cost */ - /* Likewise for 1 scalar FADD (~2 cycles) vs. 4: 2 + 2. */ - 4, /* reduc_f64_cost */ - 2, /* store_elt_extra_cost */ - /* This value is just inherited from the Cortex-A57 table. */ - 8, /* vec_to_scalar_cost */ - /* See the comment above the Advanced SIMD versions. */ - 4, /* scalar_to_vec_cost */ - 4, /* align_load_cost */ - 4, /* unalign_load_cost */ - /* Although stores have a latency of 2 and compete for the - vector pipes, in practice it's better not to model that. */ - 1, /* unalign_store_cost */ - 1 /* store_cost */ - }, - 3, /* clast_cost */ - 10, /* fadda_f16_cost */ - 6, /* fadda_f32_cost */ - 4, /* fadda_f64_cost */ - /* A strided Advanced SIMD x64 load would take two parallel FP loads - (8 cycles) plus an insertion (2 cycles). Assume a 64-bit SVE gather - is 1 cycle more. The Advanced SIMD version is costed as 2 scalar loads - (cost 8) and a vec_construct (cost 2). Add a full vector operation - (cost 2) to that, to avoid the difference being lost in rounding. - - There is no easy comparison between a strided Advanced SIMD x32 load - and an SVE 32-bit gather, but cost an SVE 32-bit gather as 1 vector - operation more than a 64-bit gather. 
*/ - 14, /* gather_load_x32_cost */ - 12, /* gather_load_x64_cost */ - 3 /* scatter_store_elt_cost */ -}; - -static const aarch64_scalar_vec_issue_info neoversev2_scalar_issue_info = -{ - 3, /* loads_stores_per_cycle */ - 2, /* stores_per_cycle */ - 6, /* general_ops_per_cycle */ - 0, /* fp_simd_load_general_ops */ - 1 /* fp_simd_store_general_ops */ -}; - -static const aarch64_advsimd_vec_issue_info neoversev2_advsimd_issue_info = -{ - { - 3, /* loads_stores_per_cycle */ - 2, /* stores_per_cycle */ - 4, /* general_ops_per_cycle */ - 0, /* fp_simd_load_general_ops */ - 1 /* fp_simd_store_general_ops */ - }, - 2, /* ld2_st2_general_ops */ - 2, /* ld3_st3_general_ops */ - 3 /* ld4_st4_general_ops */ -}; - -static const aarch64_sve_vec_issue_info neoversev2_sve_issue_info = -{ - { - { - 3, /* loads_per_cycle */ - 2, /* stores_per_cycle */ - 4, /* general_ops_per_cycle */ - 0, /* fp_simd_load_general_ops */ - 1 /* fp_simd_store_general_ops */ - }, - 2, /* ld2_st2_general_ops */ - 3, /* ld3_st3_general_ops */ - 3 /* ld4_st4_general_ops */ - }, - 2, /* pred_ops_per_cycle */ - 2, /* while_pred_ops */ - 2, /* int_cmp_pred_ops */ - 1, /* fp_cmp_pred_ops */ - 1, /* gather_scatter_pair_general_ops */ - 1 /* gather_scatter_pair_pred_ops */ -}; - -static const aarch64_vec_issue_info neoversev2_vec_issue_info = -{ - &neoversev2_scalar_issue_info, - &neoversev2_advsimd_issue_info, - &neoversev2_sve_issue_info -}; - -/* Demeter costs for vector insn classes. */ -static const struct cpu_vector_cost neoversev2_vector_cost = -{ - 1, /* scalar_int_stmt_cost */ - 2, /* scalar_fp_stmt_cost */ - 4, /* scalar_load_cost */ - 1, /* scalar_store_cost */ - 1, /* cond_taken_branch_cost */ - 1, /* cond_not_taken_branch_cost */ - &neoversev2_advsimd_vector_cost, /* advsimd */ - &neoversev2_sve_vector_cost, /* sve */ - &neoversev2_vec_issue_info /* issue_info */ -}; - -static const struct tune_params neoversev2_tunings = -{ - &cortexa76_extra_costs, - &neoversev2_addrcost_table, - &neoversev2_regmove_cost, - &neoversev2_vector_cost, - &generic_branch_cost, - &generic_approx_modes, - SVE_128, /* sve_width */ - { 4, /* load_int. */ - 2, /* store_int. */ - 6, /* load_fp. */ - 1, /* store_fp. */ - 6, /* load_pred. */ - 2 /* store_pred. */ - }, /* memmov_cost. */ - 5, /* issue_rate */ - (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_CMP_BRANCH), /* fusible_ops */ - "32:16", /* function_align. */ - "4", /* jump_align. */ - "32:16", /* loop_align. */ - 3, /* int_reassoc_width. */ - 6, /* fp_reassoc_width. */ - 4, /* fma_reassoc_width. */ - 3, /* vec_reassoc_width. */ - 2, /* min_div_recip_mul_sf. */ - 2, /* min_div_recip_mul_df. */ - 0, /* max_case_values. */ - tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ - (AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND - | AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS - | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS - | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT), /* tune_flags. */ - &generic_prefetch_tune, - AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ - AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. */ -}; - -static const struct tune_params a64fx_tunings = -{ - &a64fx_extra_costs, - &a64fx_addrcost_table, - &a64fx_regmove_cost, - &a64fx_vector_cost, - &generic_branch_cost, - &generic_approx_modes, - SVE_512, /* sve_width */ - { 4, /* load_int. */ - 4, /* store_int. */ - 4, /* load_fp. */ - 4, /* store_fp. */ - 4, /* load_pred. */ - 4 /* store_pred. */ - }, /* memmov_cost. 
*/ - 7, /* issue_rate */ - (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_CMP_BRANCH), /* fusible_ops */ - "32", /* function_align. */ - "16", /* jump_align. */ - "32", /* loop_align. */ - 4, /* int_reassoc_width. */ - 2, /* fp_reassoc_width. */ - 1, /* fma_reassoc_width. */ - 2, /* vec_reassoc_width. */ - 2, /* min_div_recip_mul_sf. */ - 2, /* min_div_recip_mul_df. */ - 0, /* max_case_values. */ - tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ - (AARCH64_EXTRA_TUNE_NONE), /* tune_flags. */ - &a64fx_prefetch_tune, - AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ - AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. */ -}; +#include "tuning_models/generic.h" +#include "tuning_models/cortexa35.h" +#include "tuning_models/cortexa53.h" +#include "tuning_models/cortexa57.h" +#include "tuning_models/cortexa72.h" +#include "tuning_models/cortexa73.h" +#include "tuning_models/exynosm1.h" +#include "tuning_models/thunderxt88.h" +#include "tuning_models/thunderx.h" +#include "tuning_models/tsv110.h" +#include "tuning_models/xgene1.h" +#include "tuning_models/emag.h" +#include "tuning_models/qdf24xx.h" +#include "tuning_models/saphira.h" +#include "tuning_models/thunderx2t99.h" +#include "tuning_models/thunderx3t110.h" +#include "tuning_models/neoversen1.h" +#include "tuning_models/ampere1.h" +#include "tuning_models/ampere1a.h" +#include "tuning_models/neoversev1.h" +#include "tuning_models/neoverse512tvb.h" +#include "tuning_models/neoversen2.h" +#include "tuning_models/neoversev2.h" +#include "tuning_models/a64fx.h" /* Support for fine-grained override of the tuning structures. */ struct aarch64_tuning_override_function diff --git a/gcc/config/aarch64/tuning_models/a64fx.h b/gcc/config/aarch64/tuning_models/a64fx.h new file mode 100644 index 0000000000000000000000000000000000000000..7b06c27eba1e4de01738bdfdc077460f9135fb41 --- /dev/null +++ b/gcc/config/aarch64/tuning_models/a64fx.h @@ -0,0 +1,169 @@ +/* Tuning model description for AArch64 architecture. + Copyright (C) 2009-2023 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. + + You should have received a copy of the GNU General Public License + along with GCC; see the file COPYING3. If not see + . */ + +#ifndef GCC_AARCH64_H_A64FX +#define GCC_AARCH64_H_A64FX + +#include "generic.h" + +static const struct cpu_addrcost_table a64fx_addrcost_table = +{ + { + 1, /* hi */ + 1, /* si */ + 1, /* di */ + 2, /* ti */ + }, + 0, /* pre_modify */ + 0, /* post_modify */ + 0, /* post_modify_ld3_st3 */ + 0, /* post_modify_ld4_st4 */ + 2, /* register_offset */ + 3, /* register_sextend */ + 3, /* register_zextend */ + 0, /* imm_offset */ +}; + +static const struct cpu_regmove_cost a64fx_regmove_cost = +{ + 1, /* GP2GP */ + /* Avoid the use of slow int<->fp moves for spilling by setting + their cost higher than memmov_cost. 
*/ + 5, /* GP2FP */ + 7, /* FP2GP */ + 2 /* FP2FP */ +}; + +static const advsimd_vec_cost a64fx_advsimd_vector_cost = +{ + 2, /* int_stmt_cost */ + 5, /* fp_stmt_cost */ + 0, /* ld2_st2_permute_cost */ + 0, /* ld3_st3_permute_cost */ + 0, /* ld4_st4_permute_cost */ + 3, /* permute_cost */ + 13, /* reduc_i8_cost */ + 13, /* reduc_i16_cost */ + 13, /* reduc_i32_cost */ + 13, /* reduc_i64_cost */ + 13, /* reduc_f16_cost */ + 13, /* reduc_f32_cost */ + 13, /* reduc_f64_cost */ + 13, /* store_elt_extra_cost */ + 13, /* vec_to_scalar_cost */ + 4, /* scalar_to_vec_cost */ + 6, /* align_load_cost */ + 6, /* unalign_load_cost */ + 1, /* unalign_store_cost */ + 1 /* store_cost */ +}; + +static const sve_vec_cost a64fx_sve_vector_cost = +{ + { + 2, /* int_stmt_cost */ + 5, /* fp_stmt_cost */ + 0, /* ld2_st2_permute_cost */ + 0, /* ld3_st3_permute_cost */ + 0, /* ld4_st4_permute_cost */ + 3, /* permute_cost */ + 13, /* reduc_i8_cost */ + 13, /* reduc_i16_cost */ + 13, /* reduc_i32_cost */ + 13, /* reduc_i64_cost */ + 13, /* reduc_f16_cost */ + 13, /* reduc_f32_cost */ + 13, /* reduc_f64_cost */ + 13, /* store_elt_extra_cost */ + 13, /* vec_to_scalar_cost */ + 4, /* scalar_to_vec_cost */ + 6, /* align_load_cost */ + 6, /* unalign_load_cost */ + 1, /* unalign_store_cost */ + 1 /* store_cost */ + }, + 13, /* clast_cost */ + 13, /* fadda_f16_cost */ + 13, /* fadda_f32_cost */ + 13, /* fadda_f64_cost */ + 64, /* gather_load_x32_cost */ + 32, /* gather_load_x64_cost */ + 1 /* scatter_store_elt_cost */ +}; + +static const struct cpu_vector_cost a64fx_vector_cost = +{ + 1, /* scalar_int_stmt_cost */ + 5, /* scalar_fp_stmt_cost */ + 4, /* scalar_load_cost */ + 1, /* scalar_store_cost */ + 3, /* cond_taken_branch_cost */ + 1, /* cond_not_taken_branch_cost */ + &a64fx_advsimd_vector_cost, /* advsimd */ + &a64fx_sve_vector_cost, /* sve */ + nullptr /* issue_info */ +}; + +static const cpu_prefetch_tune a64fx_prefetch_tune = +{ + 8, /* num_slots */ + 64, /* l1_cache_size */ + 256, /* l1_cache_line_size */ + 32768, /* l2_cache_size */ + true, /* prefetch_dynamic_strides */ + -1, /* minimum_stride */ + -1 /* default_opt_level */ +}; + +static const struct tune_params a64fx_tunings = +{ + &a64fx_extra_costs, + &a64fx_addrcost_table, + &a64fx_regmove_cost, + &a64fx_vector_cost, + &generic_branch_cost, + &generic_approx_modes, + SVE_512, /* sve_width */ + { 4, /* load_int. */ + 4, /* store_int. */ + 4, /* load_fp. */ + 4, /* store_fp. */ + 4, /* load_pred. */ + 4 /* store_pred. */ + }, /* memmov_cost. */ + 7, /* issue_rate */ + (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_CMP_BRANCH), /* fusible_ops */ + "32", /* function_align. */ + "16", /* jump_align. */ + "32", /* loop_align. */ + 4, /* int_reassoc_width. */ + 2, /* fp_reassoc_width. */ + 1, /* fma_reassoc_width. */ + 2, /* vec_reassoc_width. */ + 2, /* min_div_recip_mul_sf. */ + 2, /* min_div_recip_mul_df. */ + 0, /* max_case_values. */ + tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ + (AARCH64_EXTRA_TUNE_NONE), /* tune_flags. */ + &a64fx_prefetch_tune, + AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ + AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. */ +}; + +#endif /* GCC_AARCH64_H_A64FX. */ diff --git a/gcc/config/aarch64/tuning_models/ampere1.h b/gcc/config/aarch64/tuning_models/ampere1.h new file mode 100644 index 0000000000000000000000000000000000000000..8d2a1c696103259f23cf73df26cef9d4fa05ac73 --- /dev/null +++ b/gcc/config/aarch64/tuning_models/ampere1.h @@ -0,0 +1,113 @@ +/* Tuning model description for AArch64 architecture. 
+ Copyright (C) 2009-2023 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. + + You should have received a copy of the GNU General Public License + along with GCC; see the file COPYING3. If not see + . */ + +#ifndef GCC_AARCH64_H_AMPERE1 +#define GCC_AARCH64_H_AMPERE1 + +#include "generic.h" + +static const advsimd_vec_cost ampere1_advsimd_vector_cost = +{ + 1, /* int_stmt_cost */ + 3, /* fp_stmt_cost */ + 0, /* ld2_st2_permute_cost */ + 0, /* ld3_st3_permute_cost */ + 0, /* ld4_st4_permute_cost */ + 2, /* permute_cost */ + 12, /* reduc_i8_cost */ + 9, /* reduc_i16_cost */ + 6, /* reduc_i32_cost */ + 5, /* reduc_i64_cost */ + 9, /* reduc_f16_cost */ + 6, /* reduc_f32_cost */ + 5, /* reduc_f64_cost */ + 8, /* store_elt_extra_cost */ + 6, /* vec_to_scalar_cost */ + 7, /* scalar_to_vec_cost */ + 4, /* align_load_cost */ + 4, /* unalign_load_cost */ + 1, /* unalign_store_cost */ + 1 /* store_cost */ +}; + +/* Ampere-1 costs for vector insn classes. */ +static const struct cpu_vector_cost ampere1_vector_cost = +{ + 1, /* scalar_int_stmt_cost */ + 3, /* scalar_fp_stmt_cost */ + 4, /* scalar_load_cost */ + 1, /* scalar_store_cost */ + 1, /* cond_taken_branch_cost */ + 1, /* cond_not_taken_branch_cost */ + &ampere1_advsimd_vector_cost, /* advsimd */ + nullptr, /* sve */ + nullptr /* issue_info */ +}; + +static const cpu_prefetch_tune ampere1_prefetch_tune = +{ + 0, /* num_slots */ + 64, /* l1_cache_size */ + 64, /* l1_cache_line_size */ + 2048, /* l2_cache_size */ + true, /* prefetch_dynamic_strides */ + -1, /* minimum_stride */ + -1 /* default_opt_level */ +}; + +static const struct tune_params ampere1_tunings = +{ + &ampere1_extra_costs, + &generic_addrcost_table, + &generic_regmove_cost, + &ampere1_vector_cost, + &generic_branch_cost, + &generic_approx_modes, + SVE_NOT_IMPLEMENTED, /* sve_width */ + { 4, /* load_int. */ + 4, /* store_int. */ + 4, /* load_fp. */ + 4, /* store_fp. */ + 4, /* load_pred. */ + 4 /* store_pred. */ + }, /* memmov_cost. */ + 4, /* issue_rate */ + (AARCH64_FUSE_ADRP_ADD | AARCH64_FUSE_AES_AESMC | + AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_MOVK_MOVK | + AARCH64_FUSE_ALU_BRANCH /* adds, ands, bics, ccmp, ccmn */ | + AARCH64_FUSE_CMP_BRANCH), + /* fusible_ops */ + "32", /* function_align. */ + "4", /* jump_align. */ + "32:16", /* loop_align. */ + 2, /* int_reassoc_width. */ + 4, /* fp_reassoc_width. */ + 4, /* fma_reassoc_width. */ + 2, /* vec_reassoc_width. */ + 2, /* min_div_recip_mul_sf. */ + 2, /* min_div_recip_mul_df. */ + 0, /* max_case_values. */ + tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ + (AARCH64_EXTRA_TUNE_NONE), /* tune_flags. */ + &ampere1_prefetch_tune, + AARCH64_LDP_STP_POLICY_ALIGNED, /* ldp_policy_model. */ + AARCH64_LDP_STP_POLICY_ALIGNED /* stp_policy_model. */ +}; + +#endif /* GCC_AARCH64_H_AMPERE1. 
*/ diff --git a/gcc/config/aarch64/tuning_models/ampere1a.h b/gcc/config/aarch64/tuning_models/ampere1a.h new file mode 100644 index 0000000000000000000000000000000000000000..c419ffb3c1a936a01690ad157c6c71dc645273c8 --- /dev/null +++ b/gcc/config/aarch64/tuning_models/ampere1a.h @@ -0,0 +1,65 @@ +/* Tuning model description for AArch64 architecture. + Copyright (C) 2009-2023 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. + + You should have received a copy of the GNU General Public License + along with GCC; see the file COPYING3. If not see + . */ + +#ifndef GCC_AARCH64_H_AMPERE1A +#define GCC_AARCH64_H_AMPERE1A + +#include "generic.h" + +static const struct tune_params ampere1a_tunings = +{ + &ampere1a_extra_costs, + &generic_addrcost_table, + &generic_regmove_cost, + &ampere1_vector_cost, + &generic_branch_cost, + &generic_approx_modes, + SVE_NOT_IMPLEMENTED, /* sve_width */ + { 4, /* load_int. */ + 4, /* store_int. */ + 4, /* load_fp. */ + 4, /* store_fp. */ + 4, /* load_pred. */ + 4 /* store_pred. */ + }, /* memmov_cost. */ + 4, /* issue_rate */ + (AARCH64_FUSE_ADRP_ADD | AARCH64_FUSE_AES_AESMC | + AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_MOVK_MOVK | + AARCH64_FUSE_ALU_BRANCH /* adds, ands, bics, ccmp, ccmn */ | + AARCH64_FUSE_CMP_BRANCH | AARCH64_FUSE_ALU_CBZ | + AARCH64_FUSE_ADDSUB_2REG_CONST1), + /* fusible_ops */ + "32", /* function_align. */ + "4", /* jump_align. */ + "32:16", /* loop_align. */ + 2, /* int_reassoc_width. */ + 4, /* fp_reassoc_width. */ + 1, /* fma_reassoc_width. */ + 2, /* vec_reassoc_width. */ + 2, /* min_div_recip_mul_sf. */ + 2, /* min_div_recip_mul_df. */ + 0, /* max_case_values. */ + tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ + (AARCH64_EXTRA_TUNE_NONE), /* tune_flags. */ + &ampere1_prefetch_tune, + AARCH64_LDP_STP_POLICY_ALIGNED, /* ldp_policy_model. */ + AARCH64_LDP_STP_POLICY_ALIGNED /* stp_policy_model. */ +}; + +#endif /* GCC_AARCH64_H_AMPERE1A. */ diff --git a/gcc/config/aarch64/tuning_models/cortexa35.h b/gcc/config/aarch64/tuning_models/cortexa35.h new file mode 100644 index 0000000000000000000000000000000000000000..5534335348db96cc57fc9eccd7ff79a624cb528a --- /dev/null +++ b/gcc/config/aarch64/tuning_models/cortexa35.h @@ -0,0 +1,62 @@ +/* Tuning model description for AArch64 architecture. + Copyright (C) 2009-2023 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. + + You should have received a copy of the GNU General Public License + along with GCC; see the file COPYING3. If not see + . 
*/ + +#ifndef GCC_AARCH64_H_CORTEXA35 +#define GCC_AARCH64_H_CORTEXA35 + +#include "generic.h" +#include "cortexa53.h" + +static const struct tune_params cortexa35_tunings = +{ + &cortexa53_extra_costs, + &generic_addrcost_table, + &cortexa53_regmove_cost, + &generic_vector_cost, + &generic_branch_cost, + &generic_approx_modes, + SVE_NOT_IMPLEMENTED, /* sve_width */ + { 4, /* load_int. */ + 4, /* store_int. */ + 4, /* load_fp. */ + 4, /* store_fp. */ + 4, /* load_pred. */ + 4 /* store_pred. */ + }, /* memmov_cost. */ + 1, /* issue_rate */ + (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD + | AARCH64_FUSE_MOVK_MOVK | AARCH64_FUSE_ADRP_LDR), /* fusible_ops */ + "16", /* function_align. */ + "4", /* jump_align. */ + "8", /* loop_align. */ + 2, /* int_reassoc_width. */ + 4, /* fp_reassoc_width. */ + 1, /* fma_reassoc_width. */ + 1, /* vec_reassoc_width. */ + 2, /* min_div_recip_mul_sf. */ + 2, /* min_div_recip_mul_df. */ + 0, /* max_case_values. */ + tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ + (AARCH64_EXTRA_TUNE_NONE), /* tune_flags. */ + &generic_prefetch_tune, + AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ + AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. */ +}; + +#endif /* GCC_AARCH64_H_CORTEXA35. */ diff --git a/gcc/config/aarch64/tuning_models/cortexa53.h b/gcc/config/aarch64/tuning_models/cortexa53.h new file mode 100644 index 0000000000000000000000000000000000000000..9dfdccc5968e7f062af5c78f153bfe3838263b0a --- /dev/null +++ b/gcc/config/aarch64/tuning_models/cortexa53.h @@ -0,0 +1,71 @@ +/* Tuning model description for AArch64 architecture. + Copyright (C) 2009-2023 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. + + You should have received a copy of the GNU General Public License + along with GCC; see the file COPYING3. If not see + . */ + +#ifndef GCC_AARCH64_H_CORTEXA53 +#define GCC_AARCH64_H_CORTEXA53 + +#include "generic.h" + +static const struct cpu_regmove_cost cortexa53_regmove_cost = +{ + 1, /* GP2GP */ + /* Avoid the use of slow int<->fp moves for spilling by setting + their cost higher than memmov_cost. */ + 5, /* GP2FP */ + 5, /* FP2GP */ + 2 /* FP2FP */ +}; + +static const struct tune_params cortexa53_tunings = +{ + &cortexa53_extra_costs, + &generic_addrcost_table, + &cortexa53_regmove_cost, + &generic_vector_cost, + &generic_branch_cost, + &generic_approx_modes, + SVE_NOT_IMPLEMENTED, /* sve_width */ + { 4, /* load_int. */ + 4, /* store_int. */ + 4, /* load_fp. */ + 4, /* store_fp. */ + 4, /* load_pred. */ + 4 /* store_pred. */ + }, /* memmov_cost. */ + 2, /* issue_rate */ + (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD + | AARCH64_FUSE_MOVK_MOVK | AARCH64_FUSE_ADRP_LDR), /* fusible_ops */ + "16", /* function_align. */ + "4", /* jump_align. */ + "8", /* loop_align. */ + 2, /* int_reassoc_width. */ + 4, /* fp_reassoc_width. */ + 1, /* fma_reassoc_width. */ + 1, /* vec_reassoc_width. */ + 2, /* min_div_recip_mul_sf. */ + 2, /* min_div_recip_mul_df. */ + 0, /* max_case_values. 
*/ + tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ + (AARCH64_EXTRA_TUNE_NONE), /* tune_flags. */ + &generic_prefetch_tune, + AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ + AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. */ +}; + +#endif /* GCC_AARCH64_H_CORTEXA53. */ diff --git a/gcc/config/aarch64/tuning_models/cortexa57.h b/gcc/config/aarch64/tuning_models/cortexa57.h new file mode 100644 index 0000000000000000000000000000000000000000..9c4789d57833a5879dda8e2fe454ac5f56cb0601 --- /dev/null +++ b/gcc/config/aarch64/tuning_models/cortexa57.h @@ -0,0 +1,109 @@ +/* Tuning model description for AArch64 architecture. + Copyright (C) 2009-2023 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. + + You should have received a copy of the GNU General Public License + along with GCC; see the file COPYING3. If not see + . */ + +#ifndef GCC_AARCH64_H_CORTEXA57 +#define GCC_AARCH64_H_CORTEXA57 + +#include "generic.h" + +static const struct cpu_regmove_cost cortexa57_regmove_cost = +{ + 1, /* GP2GP */ + /* Avoid the use of slow int<->fp moves for spilling by setting + their cost higher than memmov_cost. */ + 5, /* GP2FP */ + 5, /* FP2GP */ + 2 /* FP2FP */ +}; + +static const advsimd_vec_cost cortexa57_advsimd_vector_cost = +{ + 2, /* int_stmt_cost */ + 2, /* fp_stmt_cost */ + 0, /* ld2_st2_permute_cost */ + 0, /* ld3_st3_permute_cost */ + 0, /* ld4_st4_permute_cost */ + 3, /* permute_cost */ + 8, /* reduc_i8_cost */ + 8, /* reduc_i16_cost */ + 8, /* reduc_i32_cost */ + 8, /* reduc_i64_cost */ + 8, /* reduc_f16_cost */ + 8, /* reduc_f32_cost */ + 8, /* reduc_f64_cost */ + 8, /* store_elt_extra_cost */ + 8, /* vec_to_scalar_cost */ + 8, /* scalar_to_vec_cost */ + 4, /* align_load_cost */ + 4, /* unalign_load_cost */ + 1, /* unalign_store_cost */ + 1 /* store_cost */ +}; + +/* Cortex-A57 costs for vector insn classes. */ +static const struct cpu_vector_cost cortexa57_vector_cost = +{ + 1, /* scalar_int_stmt_cost */ + 1, /* scalar_fp_stmt_cost */ + 4, /* scalar_load_cost */ + 1, /* scalar_store_cost */ + 1, /* cond_taken_branch_cost */ + 1, /* cond_not_taken_branch_cost */ + &cortexa57_advsimd_vector_cost, /* advsimd */ + nullptr, /* sve */ + nullptr /* issue_info */ +}; + +static const struct tune_params cortexa57_tunings = +{ + &cortexa57_extra_costs, + &generic_addrcost_table, + &cortexa57_regmove_cost, + &cortexa57_vector_cost, + &generic_branch_cost, + &generic_approx_modes, + SVE_NOT_IMPLEMENTED, /* sve_width */ + { 4, /* load_int. */ + 4, /* store_int. */ + 4, /* load_fp. */ + 4, /* store_fp. */ + 4, /* load_pred. */ + 4 /* store_pred. */ + }, /* memmov_cost. */ + 3, /* issue_rate */ + (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD + | AARCH64_FUSE_MOVK_MOVK), /* fusible_ops */ + "16", /* function_align. */ + "4", /* jump_align. */ + "8", /* loop_align. */ + 2, /* int_reassoc_width. */ + 4, /* fp_reassoc_width. */ + 1, /* fma_reassoc_width. */ + 1, /* vec_reassoc_width. */ + 2, /* min_div_recip_mul_sf. */ + 2, /* min_div_recip_mul_df. */ + 0, /* max_case_values. 
*/ + tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ + (AARCH64_EXTRA_TUNE_RENAME_FMA_REGS), /* tune_flags. */ + &generic_prefetch_tune, + AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ + AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. */ +}; + +#endif /* GCC_AARCH64_H_CORTEXA57. */ diff --git a/gcc/config/aarch64/tuning_models/cortexa72.h b/gcc/config/aarch64/tuning_models/cortexa72.h new file mode 100644 index 0000000000000000000000000000000000000000..968171c9b2e898d7479dbcb462e33fe3905e183d --- /dev/null +++ b/gcc/config/aarch64/tuning_models/cortexa72.h @@ -0,0 +1,61 @@ +/* Tuning model description for AArch64 architecture. + Copyright (C) 2009-2023 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. + + You should have received a copy of the GNU General Public License + along with GCC; see the file COPYING3. If not see + . */ + +#ifndef GCC_AARCH64_H_CORTEXA72 +#define GCC_AARCH64_H_CORTEXA72 + +#include "generic.h" + +static const struct tune_params cortexa72_tunings = +{ + &cortexa57_extra_costs, + &generic_addrcost_table, + &cortexa57_regmove_cost, + &cortexa57_vector_cost, + &generic_branch_cost, + &generic_approx_modes, + SVE_NOT_IMPLEMENTED, /* sve_width */ + { 4, /* load_int. */ + 4, /* store_int. */ + 4, /* load_fp. */ + 4, /* store_fp. */ + 4, /* load_pred. */ + 4 /* store_pred. */ + }, /* memmov_cost. */ + 3, /* issue_rate */ + (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD + | AARCH64_FUSE_MOVK_MOVK), /* fusible_ops */ + "16", /* function_align. */ + "4", /* jump_align. */ + "8", /* loop_align. */ + 2, /* int_reassoc_width. */ + 4, /* fp_reassoc_width. */ + 1, /* fma_reassoc_width. */ + 1, /* vec_reassoc_width. */ + 2, /* min_div_recip_mul_sf. */ + 2, /* min_div_recip_mul_df. */ + 0, /* max_case_values. */ + tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ + (AARCH64_EXTRA_TUNE_NONE), /* tune_flags. */ + &generic_prefetch_tune, + AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ + AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. */ +}; + +#endif /* GCC_AARCH64_H_CORTEXA72. */ diff --git a/gcc/config/aarch64/tuning_models/cortexa73.h b/gcc/config/aarch64/tuning_models/cortexa73.h new file mode 100644 index 0000000000000000000000000000000000000000..8d1a504ddac39604dd193ce0f434fd2f5145c129 --- /dev/null +++ b/gcc/config/aarch64/tuning_models/cortexa73.h @@ -0,0 +1,62 @@ +/* Tuning model description for AArch64 architecture. + Copyright (C) 2009-2023 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. 
+ + You should have received a copy of the GNU General Public License + along with GCC; see the file COPYING3. If not see + . */ + +#ifndef GCC_AARCH64_H_CORTEXA73 +#define GCC_AARCH64_H_CORTEXA73 + +#include "generic.h" + +static const struct tune_params cortexa73_tunings = +{ + &cortexa57_extra_costs, + &generic_addrcost_table, + &cortexa57_regmove_cost, + &cortexa57_vector_cost, + &generic_branch_cost, + &generic_approx_modes, + SVE_NOT_IMPLEMENTED, /* sve_width */ + { 4, /* load_int. */ + 4, /* store_int. */ + 4, /* load_fp. */ + 4, /* store_fp. */ + 4, /* load_pred. */ + 4 /* store_pred. */ + }, /* memmov_cost. */ + 2, /* issue_rate. */ + (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD + | AARCH64_FUSE_MOVK_MOVK | AARCH64_FUSE_ADRP_LDR), /* fusible_ops */ + "16", /* function_align. */ + "4", /* jump_align. */ + "8", /* loop_align. */ + 2, /* int_reassoc_width. */ + 4, /* fp_reassoc_width. */ + 1, /* fma_reassoc_width. */ + 1, /* vec_reassoc_width. */ + 2, /* min_div_recip_mul_sf. */ + 2, /* min_div_recip_mul_df. */ + 0, /* max_case_values. */ + tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ + (AARCH64_EXTRA_TUNE_NONE), /* tune_flags. */ + &generic_prefetch_tune, + AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ + AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. */ +}; + + +#endif /* GCC_AARCH64_H_CORTEXA73. */ diff --git a/gcc/config/aarch64/tuning_models/emag.h b/gcc/config/aarch64/tuning_models/emag.h new file mode 100644 index 0000000000000000000000000000000000000000..3f3402c3fc2a94704eeaf9223ecb0ca1c057cace --- /dev/null +++ b/gcc/config/aarch64/tuning_models/emag.h @@ -0,0 +1,60 @@ +/* Tuning model description for AArch64 architecture. + Copyright (C) 2009-2023 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. + + You should have received a copy of the GNU General Public License + along with GCC; see the file COPYING3. If not see + . */ + +#ifndef GCC_AARCH64_H_EMAG +#define GCC_AARCH64_H_EMAG + +#include "generic.h" + +static const struct tune_params emag_tunings = +{ + &xgene1_extra_costs, + &xgene1_addrcost_table, + &xgene1_regmove_cost, + &xgene1_vector_cost, + &generic_branch_cost, + &xgene1_approx_modes, + SVE_NOT_IMPLEMENTED, + { 6, /* load_int. */ + 6, /* store_int. */ + 6, /* load_fp. */ + 6, /* store_fp. */ + 6, /* load_pred. */ + 6 /* store_pred. */ + }, /* memmov_cost. */ + 4, /* issue_rate */ + AARCH64_FUSE_NOTHING, /* fusible_ops */ + "16", /* function_align. */ + "16", /* jump_align. */ + "16", /* loop_align. */ + 2, /* int_reassoc_width. */ + 4, /* fp_reassoc_width. */ + 1, /* fma_reassoc_width. */ + 1, /* vec_reassoc_width. */ + 2, /* min_div_recip_mul_sf. */ + 2, /* min_div_recip_mul_df. */ + 17, /* max_case_values. */ + tune_params::AUTOPREFETCHER_OFF, /* autoprefetcher_model. */ + (AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS), /* tune_flags. */ + &xgene1_prefetch_tune, + AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ + AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. */ +}; + +#endif /* GCC_AARCH64_H_EMAG. 
*/ diff --git a/gcc/config/aarch64/tuning_models/exynosm1.h b/gcc/config/aarch64/tuning_models/exynosm1.h new file mode 100644 index 0000000000000000000000000000000000000000..a42ea4df97f3f048c41481c304fd3684a69d743b --- /dev/null +++ b/gcc/config/aarch64/tuning_models/exynosm1.h @@ -0,0 +1,144 @@ +/* Tuning model description for AArch64 architecture. + Copyright (C) 2009-2023 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. + + You should have received a copy of the GNU General Public License + along with GCC; see the file COPYING3. If not see + . */ + +#ifndef GCC_AARCH64_H_EXYNOSM1 +#define GCC_AARCH64_H_EXYNOSM1 + +#include "generic.h" + +static const struct cpu_addrcost_table exynosm1_addrcost_table = +{ + { + 0, /* hi */ + 0, /* si */ + 0, /* di */ + 2, /* ti */ + }, + 0, /* pre_modify */ + 0, /* post_modify */ + 0, /* post_modify_ld3_st3 */ + 0, /* post_modify_ld4_st4 */ + 1, /* register_offset */ + 1, /* register_sextend */ + 2, /* register_zextend */ + 0, /* imm_offset */ +}; + +static const struct cpu_regmove_cost exynosm1_regmove_cost = +{ + 1, /* GP2GP */ + /* Avoid the use of slow int<->fp moves for spilling by setting + their cost higher than memmov_cost (actual, 4 and 9). */ + 9, /* GP2FP */ + 9, /* FP2GP */ + 1 /* FP2FP */ +}; + +static const advsimd_vec_cost exynosm1_advsimd_vector_cost = +{ + 3, /* int_stmt_cost */ + 3, /* fp_stmt_cost */ + 0, /* ld2_st2_permute_cost */ + 0, /* ld3_st3_permute_cost */ + 0, /* ld4_st4_permute_cost */ + 3, /* permute_cost */ + 3, /* reduc_i8_cost */ + 3, /* reduc_i16_cost */ + 3, /* reduc_i32_cost */ + 3, /* reduc_i64_cost */ + 3, /* reduc_f16_cost */ + 3, /* reduc_f32_cost */ + 3, /* reduc_f64_cost */ + 3, /* store_elt_extra_cost */ + 3, /* vec_to_scalar_cost */ + 3, /* scalar_to_vec_cost */ + 5, /* align_load_cost */ + 5, /* unalign_load_cost */ + 1, /* unalign_store_cost */ + 1 /* store_cost */ +}; + +static const struct cpu_vector_cost exynosm1_vector_cost = +{ + 1, /* scalar_int_stmt_cost */ + 1, /* scalar_fp_stmt_cost */ + 5, /* scalar_load_cost */ + 1, /* scalar_store_cost */ + 1, /* cond_taken_branch_cost */ + 1, /* cond_not_taken_branch_cost */ + &exynosm1_advsimd_vector_cost, /* advsimd */ + nullptr, /* sve */ + nullptr /* issue_info */ +}; + +/* Approximation modes for Exynos M1. */ +static const cpu_approx_modes exynosm1_approx_modes = +{ + AARCH64_APPROX_NONE, /* division */ + AARCH64_APPROX_ALL, /* sqrt */ + AARCH64_APPROX_ALL /* recip_sqrt */ +}; + +static const cpu_prefetch_tune exynosm1_prefetch_tune = +{ + 0, /* num_slots */ + -1, /* l1_cache_size */ + 64, /* l1_cache_line_size */ + -1, /* l2_cache_size */ + true, /* prefetch_dynamic_strides */ + -1, /* minimum_stride */ + -1 /* default_opt_level */ +}; + +static const struct tune_params exynosm1_tunings = +{ + &exynosm1_extra_costs, + &exynosm1_addrcost_table, + &exynosm1_regmove_cost, + &exynosm1_vector_cost, + &generic_branch_cost, + &exynosm1_approx_modes, + SVE_NOT_IMPLEMENTED, /* sve_width */ + { 4, /* load_int. */ + 4, /* store_int. */ + 4, /* load_fp. */ + 4, /* store_fp. */ + 4, /* load_pred. 
*/ + 4 /* store_pred. */ + }, /* memmov_cost. */ + 3, /* issue_rate */ + (AARCH64_FUSE_AES_AESMC), /* fusible_ops */ + "4", /* function_align. */ + "4", /* jump_align. */ + "4", /* loop_align. */ + 2, /* int_reassoc_width. */ + 4, /* fp_reassoc_width. */ + 1, /* fma_reassoc_width. */ + 1, /* vec_reassoc_width. */ + 2, /* min_div_recip_mul_sf. */ + 2, /* min_div_recip_mul_df. */ + 48, /* max_case_values. */ + tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ + (AARCH64_EXTRA_TUNE_NONE), /* tune_flags. */ + &exynosm1_prefetch_tune, + AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ + AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. */ +}; + +#endif /* GCC_AARCH64_H_EXYNOSM1. */ diff --git a/gcc/config/aarch64/tuning_models/generic.h b/gcc/config/aarch64/tuning_models/generic.h new file mode 100644 index 0000000000000000000000000000000000000000..deb2c1cffe255bddcb5be571b12086442782da60 --- /dev/null +++ b/gcc/config/aarch64/tuning_models/generic.h @@ -0,0 +1,190 @@ +/* Tuning model description for AArch64 architecture. + Copyright (C) 2009-2023 Free Software Foundation, Inc. + Contributed by ARM Ltd. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. + + You should have received a copy of the GNU General Public License + along with GCC; see the file COPYING3. If not see + . */ + +#ifndef GCC_AARCH64_H_GENERIC +#define GCC_AARCH64_H_GENERIC + +static const struct cpu_addrcost_table generic_addrcost_table = +{ + { + 1, /* hi */ + 0, /* si */ + 0, /* di */ + 1, /* ti */ + }, + 0, /* pre_modify */ + 0, /* post_modify */ + 0, /* post_modify_ld3_st3 */ + 0, /* post_modify_ld4_st4 */ + 0, /* register_offset */ + 0, /* register_sextend */ + 0, /* register_zextend */ + 0 /* imm_offset */ +}; + +static const struct cpu_regmove_cost generic_regmove_cost = +{ + 1, /* GP2GP */ + /* Avoid the use of slow int<->fp moves for spilling by setting + their cost higher than memmov_cost. */ + 5, /* GP2FP */ + 5, /* FP2GP */ + 2 /* FP2FP */ +}; + +/* Generic costs for Advanced SIMD vector operations. */ +static const advsimd_vec_cost generic_advsimd_vector_cost = +{ + 1, /* int_stmt_cost */ + 1, /* fp_stmt_cost */ + 0, /* ld2_st2_permute_cost */ + 0, /* ld3_st3_permute_cost */ + 0, /* ld4_st4_permute_cost */ + 2, /* permute_cost */ + 2, /* reduc_i8_cost */ + 2, /* reduc_i16_cost */ + 2, /* reduc_i32_cost */ + 2, /* reduc_i64_cost */ + 2, /* reduc_f16_cost */ + 2, /* reduc_f32_cost */ + 2, /* reduc_f64_cost */ + 2, /* store_elt_extra_cost */ + 2, /* vec_to_scalar_cost */ + 1, /* scalar_to_vec_cost */ + 1, /* align_load_cost */ + 1, /* unalign_load_cost */ + 1, /* unalign_store_cost */ + 1 /* store_cost */ +}; + +/* Generic costs for SVE vector operations. 
*/ +static const sve_vec_cost generic_sve_vector_cost = +{ + { + 1, /* int_stmt_cost */ + 1, /* fp_stmt_cost */ + 0, /* ld2_st2_permute_cost */ + 0, /* ld3_st3_permute_cost */ + 0, /* ld4_st4_permute_cost */ + 2, /* permute_cost */ + 2, /* reduc_i8_cost */ + 2, /* reduc_i16_cost */ + 2, /* reduc_i32_cost */ + 2, /* reduc_i64_cost */ + 2, /* reduc_f16_cost */ + 2, /* reduc_f32_cost */ + 2, /* reduc_f64_cost */ + 2, /* store_elt_extra_cost */ + 2, /* vec_to_scalar_cost */ + 1, /* scalar_to_vec_cost */ + 1, /* align_load_cost */ + 1, /* unalign_load_cost */ + 1, /* unalign_store_cost */ + 1 /* store_cost */ + }, + 2, /* clast_cost */ + 2, /* fadda_f16_cost */ + 2, /* fadda_f32_cost */ + 2, /* fadda_f64_cost */ + 4, /* gather_load_x32_cost */ + 2, /* gather_load_x64_cost */ + 1 /* scatter_store_elt_cost */ +}; + +/* Generic costs for vector insn classes. */ +static const struct cpu_vector_cost generic_vector_cost = +{ + 1, /* scalar_int_stmt_cost */ + 1, /* scalar_fp_stmt_cost */ + 1, /* scalar_load_cost */ + 1, /* scalar_store_cost */ + 3, /* cond_taken_branch_cost */ + 1, /* cond_not_taken_branch_cost */ + &generic_advsimd_vector_cost, /* advsimd */ + &generic_sve_vector_cost, /* sve */ + nullptr /* issue_info */ +}; + +/* Generic costs for branch instructions. */ +static const struct cpu_branch_cost generic_branch_cost = +{ + 1, /* Predictable. */ + 3 /* Unpredictable. */ +}; + +/* Generic approximation modes. */ +static const cpu_approx_modes generic_approx_modes = +{ + AARCH64_APPROX_NONE, /* division */ + AARCH64_APPROX_NONE, /* sqrt */ + AARCH64_APPROX_NONE /* recip_sqrt */ +}; + +/* Generic prefetch settings (which disable prefetch). */ +static const cpu_prefetch_tune generic_prefetch_tune = +{ + 0, /* num_slots */ + -1, /* l1_cache_size */ + -1, /* l1_cache_line_size */ + -1, /* l2_cache_size */ + true, /* prefetch_dynamic_strides */ + -1, /* minimum_stride */ + -1 /* default_opt_level */ +}; + +static const struct tune_params generic_tunings = +{ + &cortexa57_extra_costs, + &generic_addrcost_table, + &generic_regmove_cost, + &generic_vector_cost, + &generic_branch_cost, + &generic_approx_modes, + SVE_NOT_IMPLEMENTED, /* sve_width */ + { 4, /* load_int. */ + 4, /* store_int. */ + 4, /* load_fp. */ + 4, /* store_fp. */ + 4, /* load_pred. */ + 4 /* store_pred. */ + }, /* memmov_cost. */ + 2, /* issue_rate */ + (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_CMP_BRANCH), /* fusible_ops */ + "16:12", /* function_align. */ + "4", /* jump_align. */ + "8", /* loop_align. */ + 2, /* int_reassoc_width. */ + 4, /* fp_reassoc_width. */ + 1, /* fma_reassoc_width. */ + 1, /* vec_reassoc_width. */ + 2, /* min_div_recip_mul_sf. */ + 2, /* min_div_recip_mul_df. */ + 0, /* max_case_values. */ + tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ + /* Enabling AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS significantly benefits + Neoverse V1. It does not have a noticeable effect on A64FX and should + have at most a very minor effect on SVE2 cores. */ + (AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS), /* tune_flags. */ + &generic_prefetch_tune, + AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ + AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. */ +}; + +#endif /* GCC_AARCH64_H_GENERIC. 
*/ diff --git a/gcc/config/aarch64/tuning_models/neoverse512tvb.h b/gcc/config/aarch64/tuning_models/neoverse512tvb.h new file mode 100644 index 0000000000000000000000000000000000000000..50d7b23712cc6a8be8f35246657ec5d86d6d4191 --- /dev/null +++ b/gcc/config/aarch64/tuning_models/neoverse512tvb.h @@ -0,0 +1,164 @@ +/* Tuning model description for AArch64 architecture. + Copyright (C) 2009-2023 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. + + You should have received a copy of the GNU General Public License + along with GCC; see the file COPYING3. If not see + . */ + +#ifndef GCC_AARCH64_H_NEOVERSE512TVB +#define GCC_AARCH64_H_NEOVERSE512TVB + +#include "generic.h" + +static const sve_vec_cost neoverse512tvb_sve_vector_cost = +{ + { + 2, /* int_stmt_cost */ + 2, /* fp_stmt_cost */ + 4, /* ld2_st2_permute_cost */ + 5, /* ld3_st3_permute_cost */ + 5, /* ld4_st4_permute_cost */ + 3, /* permute_cost */ + /* Theoretically, a reduction involving 15 scalar ADDs could + complete in ~5 cycles and would have a cost of 15. Assume that + [SU]ADDV completes in 11 cycles and so give it a cost of 15 + 6. */ + 21, /* reduc_i8_cost */ + /* Likewise for 7 scalar ADDs (~3 cycles) vs. 9: 7 + 6. */ + 13, /* reduc_i16_cost */ + /* Likewise for 3 scalar ADDs (~2 cycles) vs. 8: 3 + 6. */ + 9, /* reduc_i32_cost */ + /* Likewise for 1 scalar ADD (1 cycle) vs. 8: 1 + 7. */ + 8, /* reduc_i64_cost */ + /* Theoretically, a reduction involving 7 scalar FADDs could + complete in ~6 cycles and would have a cost of 14. Assume that + FADDV completes in 8 cycles and so give it a cost of 14 + 2. */ + 16, /* reduc_f16_cost */ + /* Likewise for 3 scalar FADDs (~4 cycles) vs. 6: 6 + 2. */ + 8, /* reduc_f32_cost */ + /* Likewise for 1 scalar FADD (2 cycles) vs. 4: 2 + 2. */ + 4, /* reduc_f64_cost */ + 2, /* store_elt_extra_cost */ + /* This value is just inherited from the Cortex-A57 table. */ + 8, /* vec_to_scalar_cost */ + /* This depends very much on what the scalar value is and + where it comes from. E.g. some constants take two dependent + instructions or a load, while others might be moved from a GPR. + 4 seems to be a reasonable compromise in practice. */ + 4, /* scalar_to_vec_cost */ + 4, /* align_load_cost */ + 4, /* unalign_load_cost */ + /* Although stores generally have a latency of 2 and compete for the + vector pipes, in practice it's better not to model that. */ + 1, /* unalign_store_cost */ + 1 /* store_cost */ + }, + 3, /* clast_cost */ + 10, /* fadda_f16_cost */ + 6, /* fadda_f32_cost */ + 4, /* fadda_f64_cost */ + /* A strided Advanced SIMD x64 load would take two parallel FP loads + (6 cycles) plus an insertion (2 cycles). Assume a 64-bit SVE gather + is 1 cycle more. The Advanced SIMD version is costed as 2 scalar loads + (cost 8) and a vec_construct (cost 2). Add a full vector operation + (cost 2) to that, to avoid the difference being lost in rounding. + + There is no easy comparison between a strided Advanced SIMD x32 load + and an SVE 32-bit gather, but cost an SVE 32-bit gather as 1 vector + operation more than a 64-bit gather. 
*/ + 14, /* gather_load_x32_cost */ + 12, /* gather_load_x64_cost */ + 3 /* scatter_store_elt_cost */ +}; + +static const aarch64_sve_vec_issue_info neoverse512tvb_sve_issue_info = +{ + { + { + 3, /* loads_per_cycle */ + 2, /* stores_per_cycle */ + 4, /* general_ops_per_cycle */ + 0, /* fp_simd_load_general_ops */ + 1 /* fp_simd_store_general_ops */ + }, + 2, /* ld2_st2_general_ops */ + 2, /* ld3_st3_general_ops */ + 3 /* ld4_st4_general_ops */ + }, + 2, /* pred_ops_per_cycle */ + 2, /* while_pred_ops */ + 2, /* int_cmp_pred_ops */ + 1, /* fp_cmp_pred_ops */ + 1, /* gather_scatter_pair_general_ops */ + 1 /* gather_scatter_pair_pred_ops */ +}; + +static const aarch64_vec_issue_info neoverse512tvb_vec_issue_info = +{ + &neoversev1_scalar_issue_info, + &neoversev1_advsimd_issue_info, + &neoverse512tvb_sve_issue_info +}; + +static const struct cpu_vector_cost neoverse512tvb_vector_cost = +{ + 1, /* scalar_int_stmt_cost */ + 2, /* scalar_fp_stmt_cost */ + 4, /* scalar_load_cost */ + 1, /* scalar_store_cost */ + 1, /* cond_taken_branch_cost */ + 1, /* cond_not_taken_branch_cost */ + &neoversev1_advsimd_vector_cost, /* advsimd */ + &neoverse512tvb_sve_vector_cost, /* sve */ + &neoverse512tvb_vec_issue_info /* issue_info */ +}; + +static const struct tune_params neoverse512tvb_tunings = +{ + &cortexa76_extra_costs, + &neoversev1_addrcost_table, + &neoversev1_regmove_cost, + &neoverse512tvb_vector_cost, + &generic_branch_cost, + &generic_approx_modes, + SVE_128 | SVE_256, /* sve_width */ + { 4, /* load_int. */ + 2, /* store_int. */ + 6, /* load_fp. */ + 2, /* store_fp. */ + 6, /* load_pred. */ + 1 /* store_pred. */ + }, /* memmov_cost. */ + 3, /* issue_rate */ + (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_CMP_BRANCH), /* fusible_ops */ + "32:16", /* function_align. */ + "4", /* jump_align. */ + "32:16", /* loop_align. */ + 2, /* int_reassoc_width. */ + 4, /* fp_reassoc_width. */ + 4, /* fma_reassoc_width. */ + 2, /* vec_reassoc_width. */ + 2, /* min_div_recip_mul_sf. */ + 2, /* min_div_recip_mul_df. */ + 0, /* max_case_values. */ + tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ + (AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS + | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS + | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT), /* tune_flags. */ + &generic_prefetch_tune, + AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ + AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. */ +}; + +#endif /* GCC_AARCH64_H_NEOVERSE512TVB. */ diff --git a/gcc/config/aarch64/tuning_models/neoversen1.h b/gcc/config/aarch64/tuning_models/neoversen1.h new file mode 100644 index 0000000000000000000000000000000000000000..132166d3d06430b725e4448937332cc159c11cda --- /dev/null +++ b/gcc/config/aarch64/tuning_models/neoversen1.h @@ -0,0 +1,60 @@ +/* Tuning model description for AArch64 architecture. + Copyright (C) 2009-2023 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. + + You should have received a copy of the GNU General Public License + along with GCC; see the file COPYING3. If not see + . 
*/ + +#ifndef GCC_AARCH64_H_NEOVERSEN1 +#define GCC_AARCH64_H_NEOVERSEN1 + +#include "generic.h" + +static const struct tune_params neoversen1_tunings = +{ + &cortexa76_extra_costs, + &generic_addrcost_table, + &generic_regmove_cost, + &cortexa57_vector_cost, + &generic_branch_cost, + &generic_approx_modes, + SVE_NOT_IMPLEMENTED, /* sve_width */ + { 4, /* load_int. */ + 2, /* store_int. */ + 5, /* load_fp. */ + 2, /* store_fp. */ + 4, /* load_pred. */ + 4 /* store_pred. */ + }, /* memmov_cost. */ + 3, /* issue_rate */ + (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_CMP_BRANCH), /* fusible_ops */ + "32:16", /* function_align. */ + "4", /* jump_align. */ + "32:16", /* loop_align. */ + 2, /* int_reassoc_width. */ + 4, /* fp_reassoc_width. */ + 1, /* fma_reassoc_width. */ + 2, /* vec_reassoc_width. */ + 2, /* min_div_recip_mul_sf. */ + 2, /* min_div_recip_mul_df. */ + 0, /* max_case_values. */ + tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ + (AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND), /* tune_flags. */ + &generic_prefetch_tune, + AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ + AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. */ +}; + +#endif /* GCC_AARCH64_H_NEOVERSEN1. */ diff --git a/gcc/config/aarch64/tuning_models/neoversen2.h b/gcc/config/aarch64/tuning_models/neoversen2.h new file mode 100644 index 0000000000000000000000000000000000000000..395a6d82b8403e586bf179cade055543cf9b9eb0 --- /dev/null +++ b/gcc/config/aarch64/tuning_models/neoversen2.h @@ -0,0 +1,245 @@ +/* Tuning model description for AArch64 architecture. + Copyright (C) 2009-2023 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. + + You should have received a copy of the GNU General Public License + along with GCC; see the file COPYING3. If not see + . */ + +#ifndef GCC_AARCH64_H_NEOVERSEN2 +#define GCC_AARCH64_H_NEOVERSEN2 + +#include "generic.h" + +static const struct cpu_addrcost_table neoversen2_addrcost_table = +{ + { + 1, /* hi */ + 0, /* si */ + 0, /* di */ + 1, /* ti */ + }, + 0, /* pre_modify */ + 0, /* post_modify */ + 2, /* post_modify_ld3_st3 */ + 2, /* post_modify_ld4_st4 */ + 0, /* register_offset */ + 0, /* register_sextend */ + 0, /* register_zextend */ + 0 /* imm_offset */ +}; + +static const struct cpu_regmove_cost neoversen2_regmove_cost = +{ + 1, /* GP2GP */ + /* Spilling to int<->fp instead of memory is recommended so set + realistic costs compared to memmov_cost. */ + 3, /* GP2FP */ + 2, /* FP2GP */ + 2 /* FP2FP */ +}; + +static const advsimd_vec_cost neoversen2_advsimd_vector_cost = +{ + 2, /* int_stmt_cost */ + 2, /* fp_stmt_cost */ + 2, /* ld2_st2_permute_cost */ + 2, /* ld3_st3_permute_cost */ + 3, /* ld4_st4_permute_cost */ + 3, /* permute_cost */ + 4, /* reduc_i8_cost */ + 4, /* reduc_i16_cost */ + 2, /* reduc_i32_cost */ + 2, /* reduc_i64_cost */ + 6, /* reduc_f16_cost */ + 4, /* reduc_f32_cost */ + 2, /* reduc_f64_cost */ + 2, /* store_elt_extra_cost */ + /* This value is just inherited from the Cortex-A57 table. 
*/ + 8, /* vec_to_scalar_cost */ + /* This depends very much on what the scalar value is and + where it comes from. E.g. some constants take two dependent + instructions or a load, while others might be moved from a GPR. + 4 seems to be a reasonable compromise in practice. */ + 4, /* scalar_to_vec_cost */ + 4, /* align_load_cost */ + 4, /* unalign_load_cost */ + /* Although stores have a latency of 2 and compete for the + vector pipes, in practice it's better not to model that. */ + 1, /* unalign_store_cost */ + 1 /* store_cost */ +}; + +static const sve_vec_cost neoversen2_sve_vector_cost = +{ + { + 2, /* int_stmt_cost */ + 2, /* fp_stmt_cost */ + 3, /* ld2_st2_permute_cost */ + 4, /* ld3_st3_permute_cost */ + 4, /* ld4_st4_permute_cost */ + 3, /* permute_cost */ + /* Theoretically, a reduction involving 15 scalar ADDs could + complete in ~5 cycles and would have a cost of 15. [SU]ADDV + completes in 11 cycles, so give it a cost of 15 + 6. */ + 21, /* reduc_i8_cost */ + /* Likewise for 7 scalar ADDs (~3 cycles) vs. 9: 7 + 6. */ + 13, /* reduc_i16_cost */ + /* Likewise for 3 scalar ADDs (~2 cycles) vs. 8: 3 + 6. */ + 9, /* reduc_i32_cost */ + /* Likewise for 1 scalar ADD (~1 cycles) vs. 2: 1 + 1. */ + 2, /* reduc_i64_cost */ + /* Theoretically, a reduction involving 7 scalar FADDs could + complete in ~8 cycles and would have a cost of 14. FADDV + completes in 6 cycles, so give it a cost of 14 - 2. */ + 12, /* reduc_f16_cost */ + /* Likewise for 3 scalar FADDs (~4 cycles) vs. 4: 6 - 0. */ + 6, /* reduc_f32_cost */ + /* Likewise for 1 scalar FADD (~2 cycles) vs. 2: 2 - 0. */ + 2, /* reduc_f64_cost */ + 2, /* store_elt_extra_cost */ + /* This value is just inherited from the Cortex-A57 table. */ + 8, /* vec_to_scalar_cost */ + /* See the comment above the Advanced SIMD versions. */ + 4, /* scalar_to_vec_cost */ + 4, /* align_load_cost */ + 4, /* unalign_load_cost */ + /* Although stores have a latency of 2 and compete for the + vector pipes, in practice it's better not to model that. */ + 1, /* unalign_store_cost */ + 1 /* store_cost */ + }, + 3, /* clast_cost */ + 10, /* fadda_f16_cost */ + 6, /* fadda_f32_cost */ + 4, /* fadda_f64_cost */ + /* A strided Advanced SIMD x64 load would take two parallel FP loads + (8 cycles) plus an insertion (2 cycles). Assume a 64-bit SVE gather + is 1 cycle more. The Advanced SIMD version is costed as 2 scalar loads + (cost 8) and a vec_construct (cost 2). Add a full vector operation + (cost 2) to that, to avoid the difference being lost in rounding. + + There is no easy comparison between a strided Advanced SIMD x32 load + and an SVE 32-bit gather, but cost an SVE 32-bit gather as 1 vector + operation more than a 64-bit gather. 
*/ + 14, /* gather_load_x32_cost */ + 12, /* gather_load_x64_cost */ + 3 /* scatter_store_elt_cost */ +}; + +static const aarch64_scalar_vec_issue_info neoversen2_scalar_issue_info = +{ + 3, /* loads_stores_per_cycle */ + 2, /* stores_per_cycle */ + 4, /* general_ops_per_cycle */ + 0, /* fp_simd_load_general_ops */ + 1 /* fp_simd_store_general_ops */ +}; + +static const aarch64_advsimd_vec_issue_info neoversen2_advsimd_issue_info = +{ + { + 3, /* loads_stores_per_cycle */ + 2, /* stores_per_cycle */ + 2, /* general_ops_per_cycle */ + 0, /* fp_simd_load_general_ops */ + 1 /* fp_simd_store_general_ops */ + }, + 2, /* ld2_st2_general_ops */ + 2, /* ld3_st3_general_ops */ + 3 /* ld4_st4_general_ops */ +}; + +static const aarch64_sve_vec_issue_info neoversen2_sve_issue_info = +{ + { + { + 3, /* loads_per_cycle */ + 2, /* stores_per_cycle */ + 2, /* general_ops_per_cycle */ + 0, /* fp_simd_load_general_ops */ + 1 /* fp_simd_store_general_ops */ + }, + 2, /* ld2_st2_general_ops */ + 3, /* ld3_st3_general_ops */ + 3 /* ld4_st4_general_ops */ + }, + 2, /* pred_ops_per_cycle */ + 2, /* while_pred_ops */ + 2, /* int_cmp_pred_ops */ + 1, /* fp_cmp_pred_ops */ + 1, /* gather_scatter_pair_general_ops */ + 1 /* gather_scatter_pair_pred_ops */ +}; + +static const aarch64_vec_issue_info neoversen2_vec_issue_info = +{ + &neoversen2_scalar_issue_info, + &neoversen2_advsimd_issue_info, + &neoversen2_sve_issue_info +}; + +/* Neoverse N2 costs for vector insn classes. */ +static const struct cpu_vector_cost neoversen2_vector_cost = +{ + 1, /* scalar_int_stmt_cost */ + 2, /* scalar_fp_stmt_cost */ + 4, /* scalar_load_cost */ + 1, /* scalar_store_cost */ + 1, /* cond_taken_branch_cost */ + 1, /* cond_not_taken_branch_cost */ + &neoversen2_advsimd_vector_cost, /* advsimd */ + &neoversen2_sve_vector_cost, /* sve */ + &neoversen2_vec_issue_info /* issue_info */ +}; + +static const struct tune_params neoversen2_tunings = +{ + &cortexa76_extra_costs, + &neoversen2_addrcost_table, + &neoversen2_regmove_cost, + &neoversen2_vector_cost, + &generic_branch_cost, + &generic_approx_modes, + SVE_128, /* sve_width */ + { 4, /* load_int. */ + 1, /* store_int. */ + 6, /* load_fp. */ + 2, /* store_fp. */ + 6, /* load_pred. */ + 1 /* store_pred. */ + }, /* memmov_cost. */ + 3, /* issue_rate */ + (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_CMP_BRANCH), /* fusible_ops */ + "32:16", /* function_align. */ + "4", /* jump_align. */ + "32:16", /* loop_align. */ + 2, /* int_reassoc_width. */ + 4, /* fp_reassoc_width. */ + 1, /* fma_reassoc_width. */ + 2, /* vec_reassoc_width. */ + 2, /* min_div_recip_mul_sf. */ + 2, /* min_div_recip_mul_df. */ + 0, /* max_case_values. */ + tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ + (AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND + | AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS + | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS + | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT), /* tune_flags. */ + &generic_prefetch_tune, + AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ + AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. */ +}; + +#endif /* GCC_AARCH64_H_NEOVERSEN2. */ diff --git a/gcc/config/aarch64/tuning_models/neoversev1.h b/gcc/config/aarch64/tuning_models/neoversev1.h new file mode 100644 index 0000000000000000000000000000000000000000..584a5000e06f598dcdd3bcc533dc6dbc642223ca --- /dev/null +++ b/gcc/config/aarch64/tuning_models/neoversev1.h @@ -0,0 +1,237 @@ +/* Tuning model description for AArch64 architecture. + Copyright (C) 2009-2023 Free Software Foundation, Inc. + + This file is part of GCC. 
+ + GCC is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. + + You should have received a copy of the GNU General Public License + along with GCC; see the file COPYING3. If not see + . */ + +#ifndef GCC_AARCH64_H_NEOVERSEV1 +#define GCC_AARCH64_H_NEOVERSEV1 + +#include "generic.h" + +static const struct cpu_addrcost_table neoversev1_addrcost_table = +{ + { + 1, /* hi */ + 0, /* si */ + 0, /* di */ + 1, /* ti */ + }, + 0, /* pre_modify */ + 0, /* post_modify */ + 3, /* post_modify_ld3_st3 */ + 3, /* post_modify_ld4_st4 */ + 0, /* register_offset */ + 0, /* register_sextend */ + 0, /* register_zextend */ + 0 /* imm_offset */ +}; + +static const struct cpu_regmove_cost neoversev1_regmove_cost = +{ + 1, /* GP2GP */ + /* Spilling to int<->fp instead of memory is recommended so set + realistic costs compared to memmov_cost. */ + 3, /* GP2FP */ + 2, /* FP2GP */ + 2 /* FP2FP */ +}; + +static const advsimd_vec_cost neoversev1_advsimd_vector_cost = +{ + 2, /* int_stmt_cost */ + 2, /* fp_stmt_cost */ + 4, /* ld2_st2_permute_cost */ + 4, /* ld3_st3_permute_cost */ + 5, /* ld4_st4_permute_cost */ + 3, /* permute_cost */ + 4, /* reduc_i8_cost */ + 4, /* reduc_i16_cost */ + 2, /* reduc_i32_cost */ + 2, /* reduc_i64_cost */ + 6, /* reduc_f16_cost */ + 3, /* reduc_f32_cost */ + 2, /* reduc_f64_cost */ + 2, /* store_elt_extra_cost */ + /* This value is just inherited from the Cortex-A57 table. */ + 8, /* vec_to_scalar_cost */ + /* This depends very much on what the scalar value is and + where it comes from. E.g. some constants take two dependent + instructions or a load, while others might be moved from a GPR. + 4 seems to be a reasonable compromise in practice. */ + 4, /* scalar_to_vec_cost */ + 4, /* align_load_cost */ + 4, /* unalign_load_cost */ + /* Although stores have a latency of 2 and compete for the + vector pipes, in practice it's better not to model that. */ + 1, /* unalign_store_cost */ + 1 /* store_cost */ +}; + +static const sve_vec_cost neoversev1_sve_vector_cost = +{ + { + 2, /* int_stmt_cost */ + 2, /* fp_stmt_cost */ + 4, /* ld2_st2_permute_cost */ + 7, /* ld3_st3_permute_cost */ + 8, /* ld4_st4_permute_cost */ + 3, /* permute_cost */ + /* Theoretically, a reduction involving 31 scalar ADDs could + complete in ~9 cycles and would have a cost of 31. [SU]ADDV + completes in 14 cycles, so give it a cost of 31 + 5. */ + 36, /* reduc_i8_cost */ + /* Likewise for 15 scalar ADDs (~5 cycles) vs. 12: 15 + 7. */ + 22, /* reduc_i16_cost */ + /* Likewise for 7 scalar ADDs (~3 cycles) vs. 10: 7 + 7. */ + 14, /* reduc_i32_cost */ + /* Likewise for 3 scalar ADDs (~2 cycles) vs. 10: 3 + 8. */ + 11, /* reduc_i64_cost */ + /* Theoretically, a reduction involving 15 scalar FADDs could + complete in ~9 cycles and would have a cost of 30. FADDV + completes in 13 cycles, so give it a cost of 30 + 4. */ + 34, /* reduc_f16_cost */ + /* Likewise for 7 scalar FADDs (~6 cycles) vs. 11: 14 + 5. */ + 19, /* reduc_f32_cost */ + /* Likewise for 3 scalar FADDs (~4 cycles) vs. 9: 6 + 5. */ + 11, /* reduc_f64_cost */ + 2, /* store_elt_extra_cost */ + /* This value is just inherited from the Cortex-A57 table. 
*/ + 8, /* vec_to_scalar_cost */ + /* See the comment above the Advanced SIMD versions. */ + 4, /* scalar_to_vec_cost */ + 4, /* align_load_cost */ + 4, /* unalign_load_cost */ + /* Although stores have a latency of 2 and compete for the + vector pipes, in practice it's better not to model that. */ + 1, /* unalign_store_cost */ + 1 /* store_cost */ + }, + 3, /* clast_cost */ + 19, /* fadda_f16_cost */ + 11, /* fadda_f32_cost */ + 8, /* fadda_f64_cost */ + 32, /* gather_load_x32_cost */ + 16, /* gather_load_x64_cost */ + 3 /* scatter_store_elt_cost */ +}; + +static const aarch64_scalar_vec_issue_info neoversev1_scalar_issue_info = +{ + 3, /* loads_stores_per_cycle */ + 2, /* stores_per_cycle */ + 4, /* general_ops_per_cycle */ + 0, /* fp_simd_load_general_ops */ + 1 /* fp_simd_store_general_ops */ +}; + +static const aarch64_advsimd_vec_issue_info neoversev1_advsimd_issue_info = +{ + { + 3, /* loads_stores_per_cycle */ + 2, /* stores_per_cycle */ + 4, /* general_ops_per_cycle */ + 0, /* fp_simd_load_general_ops */ + 1 /* fp_simd_store_general_ops */ + }, + 2, /* ld2_st2_general_ops */ + 2, /* ld3_st3_general_ops */ + 3 /* ld4_st4_general_ops */ +}; + +static const aarch64_sve_vec_issue_info neoversev1_sve_issue_info = +{ + { + { + 2, /* loads_per_cycle */ + 2, /* stores_per_cycle */ + 2, /* general_ops_per_cycle */ + 0, /* fp_simd_load_general_ops */ + 1 /* fp_simd_store_general_ops */ + }, + 2, /* ld2_st2_general_ops */ + 2, /* ld3_st3_general_ops */ + 3 /* ld4_st4_general_ops */ + }, + 1, /* pred_ops_per_cycle */ + 2, /* while_pred_ops */ + 2, /* int_cmp_pred_ops */ + 1, /* fp_cmp_pred_ops */ + 1, /* gather_scatter_pair_general_ops */ + 1 /* gather_scatter_pair_pred_ops */ +}; + +static const aarch64_vec_issue_info neoversev1_vec_issue_info = +{ + &neoversev1_scalar_issue_info, + &neoversev1_advsimd_issue_info, + &neoversev1_sve_issue_info +}; + +/* Neoverse V1 costs for vector insn classes. */ +static const struct cpu_vector_cost neoversev1_vector_cost = +{ + 1, /* scalar_int_stmt_cost */ + 2, /* scalar_fp_stmt_cost */ + 4, /* scalar_load_cost */ + 1, /* scalar_store_cost */ + 1, /* cond_taken_branch_cost */ + 1, /* cond_not_taken_branch_cost */ + &neoversev1_advsimd_vector_cost, /* advsimd */ + &neoversev1_sve_vector_cost, /* sve */ + &neoversev1_vec_issue_info /* issue_info */ +}; + +static const struct tune_params neoversev1_tunings = +{ + &cortexa76_extra_costs, + &neoversev1_addrcost_table, + &neoversev1_regmove_cost, + &neoversev1_vector_cost, + &generic_branch_cost, + &generic_approx_modes, + SVE_256, /* sve_width */ + { 4, /* load_int. */ + 2, /* store_int. */ + 6, /* load_fp. */ + 2, /* store_fp. */ + 6, /* load_pred. */ + 1 /* store_pred. */ + }, /* memmov_cost. */ + 3, /* issue_rate */ + (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_CMP_BRANCH), /* fusible_ops */ + "32:16", /* function_align. */ + "4", /* jump_align. */ + "32:16", /* loop_align. */ + 2, /* int_reassoc_width. */ + 4, /* fp_reassoc_width. */ + 4, /* fma_reassoc_width. */ + 2, /* vec_reassoc_width. */ + 2, /* min_div_recip_mul_sf. */ + 2, /* min_div_recip_mul_df. */ + 0, /* max_case_values. */ + tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ + (AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS + | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS + | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT + | AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND), /* tune_flags. */ + &generic_prefetch_tune, + AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ + AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. 
*/ +}; + + +#endif /* GCC_AARCH64_H_NEOVERSEV1. */ diff --git a/gcc/config/aarch64/tuning_models/neoversev2.h b/gcc/config/aarch64/tuning_models/neoversev2.h new file mode 100644 index 0000000000000000000000000000000000000000..28d4244ef4c99ecdffb7408e39dc21bc191223de --- /dev/null +++ b/gcc/config/aarch64/tuning_models/neoversev2.h @@ -0,0 +1,245 @@ +/* Tuning model description for AArch64 architecture. + Copyright (C) 2009-2023 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. + + You should have received a copy of the GNU General Public License + along with GCC; see the file COPYING3. If not see + . */ + +#ifndef GCC_AARCH64_H_NEOVERSEV2 +#define GCC_AARCH64_H_NEOVERSEV2 + +#include "generic.h" + +static const struct cpu_addrcost_table neoversev2_addrcost_table = +{ + { + 1, /* hi */ + 0, /* si */ + 0, /* di */ + 1, /* ti */ + }, + 0, /* pre_modify */ + 0, /* post_modify */ + 2, /* post_modify_ld3_st3 */ + 2, /* post_modify_ld4_st4 */ + 0, /* register_offset */ + 0, /* register_sextend */ + 0, /* register_zextend */ + 0 /* imm_offset */ +}; + +static const struct cpu_regmove_cost neoversev2_regmove_cost = +{ + 1, /* GP2GP */ + /* Spilling to int<->fp instead of memory is recommended so set + realistic costs compared to memmov_cost. */ + 3, /* GP2FP */ + 2, /* FP2GP */ + 2 /* FP2FP */ +}; + +static const advsimd_vec_cost neoversev2_advsimd_vector_cost = +{ + 2, /* int_stmt_cost */ + 2, /* fp_stmt_cost */ + 2, /* ld2_st2_permute_cost */ + 2, /* ld3_st3_permute_cost */ + 3, /* ld4_st4_permute_cost */ + 3, /* permute_cost */ + 4, /* reduc_i8_cost */ + 4, /* reduc_i16_cost */ + 2, /* reduc_i32_cost */ + 2, /* reduc_i64_cost */ + 6, /* reduc_f16_cost */ + 3, /* reduc_f32_cost */ + 2, /* reduc_f64_cost */ + 2, /* store_elt_extra_cost */ + /* This value is just inherited from the Cortex-A57 table. */ + 8, /* vec_to_scalar_cost */ + /* This depends very much on what the scalar value is and + where it comes from. E.g. some constants take two dependent + instructions or a load, while others might be moved from a GPR. + 4 seems to be a reasonable compromise in practice. */ + 4, /* scalar_to_vec_cost */ + 4, /* align_load_cost */ + 4, /* unalign_load_cost */ + /* Although stores have a latency of 2 and compete for the + vector pipes, in practice it's better not to model that. */ + 1, /* unalign_store_cost */ + 1 /* store_cost */ +}; + +static const sve_vec_cost neoversev2_sve_vector_cost = +{ + { + 2, /* int_stmt_cost */ + 2, /* fp_stmt_cost */ + 3, /* ld2_st2_permute_cost */ + 3, /* ld3_st3_permute_cost */ + 4, /* ld4_st4_permute_cost */ + 3, /* permute_cost */ + /* Theoretically, a reduction involving 15 scalar ADDs could + complete in ~3 cycles and would have a cost of 15. [SU]ADDV + completes in 11 cycles, so give it a cost of 15 + 8. */ + 21, /* reduc_i8_cost */ + /* Likewise for 7 scalar ADDs (~2 cycles) vs. 9: 7 + 7. */ + 14, /* reduc_i16_cost */ + /* Likewise for 3 scalar ADDs (~2 cycles) vs. 8: 3 + 4. */ + 7, /* reduc_i32_cost */ + /* Likewise for 1 scalar ADD (~1 cycles) vs. 2: 1 + 1. 
*/ + 2, /* reduc_i64_cost */ + /* Theoretically, a reduction involving 7 scalar FADDs could + complete in ~6 cycles and would have a cost of 14. FADDV + completes in 8 cycles, so give it a cost of 14 + 2. */ + 16, /* reduc_f16_cost */ + /* Likewise for 3 scalar FADDs (~4 cycles) vs. 6: 6 + 2. */ + 8, /* reduc_f32_cost */ + /* Likewise for 1 scalar FADD (~2 cycles) vs. 4: 2 + 2. */ + 4, /* reduc_f64_cost */ + 2, /* store_elt_extra_cost */ + /* This value is just inherited from the Cortex-A57 table. */ + 8, /* vec_to_scalar_cost */ + /* See the comment above the Advanced SIMD versions. */ + 4, /* scalar_to_vec_cost */ + 4, /* align_load_cost */ + 4, /* unalign_load_cost */ + /* Although stores have a latency of 2 and compete for the + vector pipes, in practice it's better not to model that. */ + 1, /* unalign_store_cost */ + 1 /* store_cost */ + }, + 3, /* clast_cost */ + 10, /* fadda_f16_cost */ + 6, /* fadda_f32_cost */ + 4, /* fadda_f64_cost */ + /* A strided Advanced SIMD x64 load would take two parallel FP loads + (8 cycles) plus an insertion (2 cycles). Assume a 64-bit SVE gather + is 1 cycle more. The Advanced SIMD version is costed as 2 scalar loads + (cost 8) and a vec_construct (cost 2). Add a full vector operation + (cost 2) to that, to avoid the difference being lost in rounding. + + There is no easy comparison between a strided Advanced SIMD x32 load + and an SVE 32-bit gather, but cost an SVE 32-bit gather as 1 vector + operation more than a 64-bit gather. */ + 14, /* gather_load_x32_cost */ + 12, /* gather_load_x64_cost */ + 3 /* scatter_store_elt_cost */ +}; + +static const aarch64_scalar_vec_issue_info neoversev2_scalar_issue_info = +{ + 3, /* loads_stores_per_cycle */ + 2, /* stores_per_cycle */ + 6, /* general_ops_per_cycle */ + 0, /* fp_simd_load_general_ops */ + 1 /* fp_simd_store_general_ops */ +}; + +static const aarch64_advsimd_vec_issue_info neoversev2_advsimd_issue_info = +{ + { + 3, /* loads_stores_per_cycle */ + 2, /* stores_per_cycle */ + 4, /* general_ops_per_cycle */ + 0, /* fp_simd_load_general_ops */ + 1 /* fp_simd_store_general_ops */ + }, + 2, /* ld2_st2_general_ops */ + 2, /* ld3_st3_general_ops */ + 3 /* ld4_st4_general_ops */ +}; + +static const aarch64_sve_vec_issue_info neoversev2_sve_issue_info = +{ + { + { + 3, /* loads_per_cycle */ + 2, /* stores_per_cycle */ + 4, /* general_ops_per_cycle */ + 0, /* fp_simd_load_general_ops */ + 1 /* fp_simd_store_general_ops */ + }, + 2, /* ld2_st2_general_ops */ + 3, /* ld3_st3_general_ops */ + 3 /* ld4_st4_general_ops */ + }, + 2, /* pred_ops_per_cycle */ + 2, /* while_pred_ops */ + 2, /* int_cmp_pred_ops */ + 1, /* fp_cmp_pred_ops */ + 1, /* gather_scatter_pair_general_ops */ + 1 /* gather_scatter_pair_pred_ops */ +}; + +static const aarch64_vec_issue_info neoversev2_vec_issue_info = +{ + &neoversev2_scalar_issue_info, + &neoversev2_advsimd_issue_info, + &neoversev2_sve_issue_info +}; + +/* Demeter costs for vector insn classes. 
*/ +static const struct cpu_vector_cost neoversev2_vector_cost = +{ + 1, /* scalar_int_stmt_cost */ + 2, /* scalar_fp_stmt_cost */ + 4, /* scalar_load_cost */ + 1, /* scalar_store_cost */ + 1, /* cond_taken_branch_cost */ + 1, /* cond_not_taken_branch_cost */ + &neoversev2_advsimd_vector_cost, /* advsimd */ + &neoversev2_sve_vector_cost, /* sve */ + &neoversev2_vec_issue_info /* issue_info */ +}; + +static const struct tune_params neoversev2_tunings = +{ + &cortexa76_extra_costs, + &neoversev2_addrcost_table, + &neoversev2_regmove_cost, + &neoversev2_vector_cost, + &generic_branch_cost, + &generic_approx_modes, + SVE_128, /* sve_width */ + { 4, /* load_int. */ + 2, /* store_int. */ + 6, /* load_fp. */ + 1, /* store_fp. */ + 6, /* load_pred. */ + 2 /* store_pred. */ + }, /* memmov_cost. */ + 5, /* issue_rate */ + (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_CMP_BRANCH), /* fusible_ops */ + "32:16", /* function_align. */ + "4", /* jump_align. */ + "32:16", /* loop_align. */ + 3, /* int_reassoc_width. */ + 6, /* fp_reassoc_width. */ + 4, /* fma_reassoc_width. */ + 3, /* vec_reassoc_width. */ + 2, /* min_div_recip_mul_sf. */ + 2, /* min_div_recip_mul_df. */ + 0, /* max_case_values. */ + tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ + (AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND + | AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS + | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS + | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT), /* tune_flags. */ + &generic_prefetch_tune, + AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ + AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. */ +}; + +#endif /* GCC_AARCH64_H_NEOVERSEV2. */ diff --git a/gcc/config/aarch64/tuning_models/qdf24xx.h b/gcc/config/aarch64/tuning_models/qdf24xx.h new file mode 100644 index 0000000000000000000000000000000000000000..29c9b9f5843acc15450a2492b141c02ee48a3f13 --- /dev/null +++ b/gcc/config/aarch64/tuning_models/qdf24xx.h @@ -0,0 +1,137 @@ +/* Tuning model description for AArch64 architecture. + Copyright (C) 2009-2023 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. + + You should have received a copy of the GNU General Public License + along with GCC; see the file COPYING3. If not see + . */ + +#ifndef GCC_AARCH64_H_QDF24XX +#define GCC_AARCH64_H_QDF24XX + +#include "generic.h" + +static const struct cpu_addrcost_table qdf24xx_addrcost_table = +{ + { + 1, /* hi */ + 1, /* si */ + 1, /* di */ + 2, /* ti */ + }, + 1, /* pre_modify */ + 1, /* post_modify */ + 1, /* post_modify_ld3_st3 */ + 1, /* post_modify_ld4_st4 */ + 3, /* register_offset */ + 3, /* register_sextend */ + 3, /* register_zextend */ + 2, /* imm_offset */ +}; + +static const struct cpu_regmove_cost qdf24xx_regmove_cost = +{ + 2, /* GP2GP */ + /* Avoid the use of int<->fp moves for spilling. 
*/ + 6, /* GP2FP */ + 6, /* FP2GP */ + 4 /* FP2FP */ +}; + +static const advsimd_vec_cost qdf24xx_advsimd_vector_cost = +{ + 1, /* int_stmt_cost */ + 3, /* fp_stmt_cost */ + 0, /* ld2_st2_permute_cost */ + 0, /* ld3_st3_permute_cost */ + 0, /* ld4_st4_permute_cost */ + 2, /* permute_cost */ + 1, /* reduc_i8_cost */ + 1, /* reduc_i16_cost */ + 1, /* reduc_i32_cost */ + 1, /* reduc_i64_cost */ + 1, /* reduc_f16_cost */ + 1, /* reduc_f32_cost */ + 1, /* reduc_f64_cost */ + 1, /* store_elt_extra_cost */ + 1, /* vec_to_scalar_cost */ + 1, /* scalar_to_vec_cost */ + 1, /* align_load_cost */ + 1, /* unalign_load_cost */ + 1, /* unalign_store_cost */ + 1 /* store_cost */ +}; + +/* QDF24XX costs for vector insn classes. */ +static const struct cpu_vector_cost qdf24xx_vector_cost = +{ + 1, /* scalar_int_stmt_cost */ + 1, /* scalar_fp_stmt_cost */ + 1, /* scalar_load_cost */ + 1, /* scalar_store_cost */ + 3, /* cond_taken_branch_cost */ + 1, /* cond_not_taken_branch_cost */ + &qdf24xx_advsimd_vector_cost, /* advsimd */ + nullptr, /* sve */ + nullptr /* issue_info */ +}; + +static const cpu_prefetch_tune qdf24xx_prefetch_tune = +{ + 4, /* num_slots */ + 32, /* l1_cache_size */ + 64, /* l1_cache_line_size */ + 512, /* l2_cache_size */ + false, /* prefetch_dynamic_strides */ + 2048, /* minimum_stride */ + 3 /* default_opt_level */ +}; + +static const struct tune_params qdf24xx_tunings = +{ + &qdf24xx_extra_costs, + &qdf24xx_addrcost_table, + &qdf24xx_regmove_cost, + &qdf24xx_vector_cost, + &generic_branch_cost, + &generic_approx_modes, + SVE_NOT_IMPLEMENTED, /* sve_width */ + { 4, /* load_int. */ + 4, /* store_int. */ + 4, /* load_fp. */ + 4, /* store_fp. */ + 4, /* load_pred. */ + 4 /* store_pred. */ + }, /* memmov_cost. */ + 4, /* issue_rate */ + (AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD + | AARCH64_FUSE_MOVK_MOVK), /* fuseable_ops */ + "16", /* function_align. */ + "8", /* jump_align. */ + "16", /* loop_align. */ + 2, /* int_reassoc_width. */ + 4, /* fp_reassoc_width. */ + 1, /* fma_reassoc_width. */ + 1, /* vec_reassoc_width. */ + 2, /* min_div_recip_mul_sf. */ + 2, /* min_div_recip_mul_df. */ + 0, /* max_case_values. */ + tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ + AARCH64_EXTRA_TUNE_RENAME_LOAD_REGS, /* tune_flags. */ + &qdf24xx_prefetch_tune, + AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ + AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. */ +}; + +#endif /* GCC_AARCH64_H_QDF24XX. */ diff --git a/gcc/config/aarch64/tuning_models/saphira.h b/gcc/config/aarch64/tuning_models/saphira.h new file mode 100644 index 0000000000000000000000000000000000000000..e584d316bb7c3c2d232cf7623a92100ad261f07d --- /dev/null +++ b/gcc/config/aarch64/tuning_models/saphira.h @@ -0,0 +1,63 @@ +/* Tuning model description for AArch64 architecture. + Copyright (C) 2009-2023 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. + + You should have received a copy of the GNU General Public License + along with GCC; see the file COPYING3. If not see + . 
*/ + +#ifndef GCC_AARCH64_H_SAPHIRA +#define GCC_AARCH64_H_SAPHIRA + +#include "generic.h" + +/* Tuning structure for the Qualcomm Saphira core. Default to falkor values + for now. */ +static const struct tune_params saphira_tunings = +{ + &generic_extra_costs, + &generic_addrcost_table, + &generic_regmove_cost, + &generic_vector_cost, + &generic_branch_cost, + &generic_approx_modes, + SVE_NOT_IMPLEMENTED, /* sve_width */ + { 4, /* load_int. */ + 4, /* store_int. */ + 4, /* load_fp. */ + 4, /* store_fp. */ + 4, /* load_pred. */ + 4 /* store_pred. */ + }, /* memmov_cost. */ + 4, /* issue_rate */ + (AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD + | AARCH64_FUSE_MOVK_MOVK), /* fuseable_ops */ + "16", /* function_align. */ + "8", /* jump_align. */ + "16", /* loop_align. */ + 2, /* int_reassoc_width. */ + 4, /* fp_reassoc_width. */ + 1, /* fma_reassoc_width. */ + 1, /* vec_reassoc_width. */ + 2, /* min_div_recip_mul_sf. */ + 2, /* min_div_recip_mul_df. */ + 0, /* max_case_values. */ + tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ + (AARCH64_EXTRA_TUNE_NONE), /* tune_flags. */ + &generic_prefetch_tune, + AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ + AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. */ +}; + +#endif /* GCC_AARCH64_H_SAPHIRA. */ diff --git a/gcc/config/aarch64/tuning_models/thunderx.h b/gcc/config/aarch64/tuning_models/thunderx.h new file mode 100644 index 0000000000000000000000000000000000000000..dd4b9d539fc5cf2bd20d84e91d6b72fa7237f99f --- /dev/null +++ b/gcc/config/aarch64/tuning_models/thunderx.h @@ -0,0 +1,117 @@ +/* Tuning model description for AArch64 architecture. + Copyright (C) 2009-2023 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. + + You should have received a copy of the GNU General Public License + along with GCC; see the file COPYING3. If not see + . */ + +#ifndef GCC_AARCH64_H_THUNDERX +#define GCC_AARCH64_H_THUNDERX + +#include "generic.h" + +static const struct cpu_regmove_cost thunderx_regmove_cost = +{ + 2, /* GP2GP */ + 2, /* GP2FP */ + 6, /* FP2GP */ + 4 /* FP2FP */ +}; + +static const advsimd_vec_cost thunderx_advsimd_vector_cost = +{ + 4, /* int_stmt_cost */ + 1, /* fp_stmt_cost */ + 0, /* ld2_st2_permute_cost */ + 0, /* ld3_st3_permute_cost */ + 0, /* ld4_st4_permute_cost */ + 4, /* permute_cost */ + 2, /* reduc_i8_cost */ + 2, /* reduc_i16_cost */ + 2, /* reduc_i32_cost */ + 2, /* reduc_i64_cost */ + 2, /* reduc_f16_cost */ + 2, /* reduc_f32_cost */ + 2, /* reduc_f64_cost */ + 2, /* store_elt_extra_cost */ + 2, /* vec_to_scalar_cost */ + 2, /* scalar_to_vec_cost */ + 3, /* align_load_cost */ + 5, /* unalign_load_cost */ + 5, /* unalign_store_cost */ + 1 /* store_cost */ +}; + +/* ThunderX costs for vector insn classes. 
*/ +static const struct cpu_vector_cost thunderx_vector_cost = +{ + 1, /* scalar_int_stmt_cost */ + 1, /* scalar_fp_stmt_cost */ + 3, /* scalar_load_cost */ + 1, /* scalar_store_cost */ + 3, /* cond_taken_branch_cost */ + 3, /* cond_not_taken_branch_cost */ + &thunderx_advsimd_vector_cost, /* advsimd */ + nullptr, /* sve */ + nullptr /* issue_info */ +}; + +static const cpu_prefetch_tune thunderx_prefetch_tune = +{ + 8, /* num_slots */ + 32, /* l1_cache_size */ + 128, /* l1_cache_line_size */ + -1, /* l2_cache_size */ + true, /* prefetch_dynamic_strides */ + -1, /* minimum_stride */ + -1 /* default_opt_level */ +}; + +static const struct tune_params thunderx_tunings = +{ + &thunderx_extra_costs, + &generic_addrcost_table, + &thunderx_regmove_cost, + &thunderx_vector_cost, + &generic_branch_cost, + &generic_approx_modes, + SVE_NOT_IMPLEMENTED, /* sve_width */ + { 6, /* load_int. */ + 6, /* store_int. */ + 6, /* load_fp. */ + 6, /* store_fp. */ + 6, /* load_pred. */ + 6 /* store_pred. */ + }, /* memmov_cost. */ + 2, /* issue_rate */ + AARCH64_FUSE_ALU_BRANCH, /* fusible_ops */ + "8", /* function_align. */ + "8", /* jump_align. */ + "8", /* loop_align. */ + 2, /* int_reassoc_width. */ + 4, /* fp_reassoc_width. */ + 1, /* fma_reassoc_width. */ + 1, /* vec_reassoc_width. */ + 2, /* min_div_recip_mul_sf. */ + 2, /* min_div_recip_mul_df. */ + 0, /* max_case_values. */ + tune_params::AUTOPREFETCHER_OFF, /* autoprefetcher_model. */ + (AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND), /* tune_flags. */ + &thunderx_prefetch_tune, + AARCH64_LDP_STP_POLICY_ALIGNED, /* ldp_policy_model. */ + AARCH64_LDP_STP_POLICY_ALIGNED /* stp_policy_model. */ +}; + +#endif /* GCC_AARCH64_H_THUNDERX. */ diff --git a/gcc/config/aarch64/tuning_models/thunderx2t99.h b/gcc/config/aarch64/tuning_models/thunderx2t99.h new file mode 100644 index 0000000000000000000000000000000000000000..0a376e0bab37b0b5bc1ea23de0e96a9245846fd7 --- /dev/null +++ b/gcc/config/aarch64/tuning_models/thunderx2t99.h @@ -0,0 +1,137 @@ +/* Tuning model description for AArch64 architecture. + Copyright (C) 2009-2023 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. + + You should have received a copy of the GNU General Public License + along with GCC; see the file COPYING3. If not see + . */ + +#ifndef GCC_AARCH64_H_THUNDERX2T99 +#define GCC_AARCH64_H_THUNDERX2T99 + +#include "generic.h" + +static const struct cpu_addrcost_table thunderx2t99_addrcost_table = +{ + { + 1, /* hi */ + 1, /* si */ + 1, /* di */ + 2, /* ti */ + }, + 0, /* pre_modify */ + 0, /* post_modify */ + 0, /* post_modify_ld3_st3 */ + 0, /* post_modify_ld4_st4 */ + 2, /* register_offset */ + 3, /* register_sextend */ + 3, /* register_zextend */ + 0, /* imm_offset */ +}; + +static const struct cpu_regmove_cost thunderx2t99_regmove_cost = +{ + 1, /* GP2GP */ + /* Avoid the use of int<->fp moves for spilling. 
*/ + 5, /* GP2FP */ + 6, /* FP2GP */ + 3, /* FP2FP */ +}; + +static const advsimd_vec_cost thunderx2t99_advsimd_vector_cost = +{ + 4, /* int_stmt_cost */ + 5, /* fp_stmt_cost */ + 0, /* ld2_st2_permute_cost */ + 0, /* ld3_st3_permute_cost */ + 0, /* ld4_st4_permute_cost */ + 10, /* permute_cost */ + 6, /* reduc_i8_cost */ + 6, /* reduc_i16_cost */ + 6, /* reduc_i32_cost */ + 6, /* reduc_i64_cost */ + 6, /* reduc_f16_cost */ + 6, /* reduc_f32_cost */ + 6, /* reduc_f64_cost */ + 6, /* store_elt_extra_cost */ + 6, /* vec_to_scalar_cost */ + 5, /* scalar_to_vec_cost */ + 4, /* align_load_cost */ + 4, /* unalign_load_cost */ + 1, /* unalign_store_cost */ + 1 /* store_cost */ +}; + +/* Costs for vector insn classes for Vulcan. */ +static const struct cpu_vector_cost thunderx2t99_vector_cost = +{ + 1, /* scalar_int_stmt_cost */ + 6, /* scalar_fp_stmt_cost */ + 4, /* scalar_load_cost */ + 1, /* scalar_store_cost */ + 2, /* cond_taken_branch_cost */ + 1, /* cond_not_taken_branch_cost */ + &thunderx2t99_advsimd_vector_cost, /* advsimd */ + nullptr, /* sve */ + nullptr /* issue_info */ +}; + +static const cpu_prefetch_tune thunderx2t99_prefetch_tune = +{ + 8, /* num_slots */ + 32, /* l1_cache_size */ + 64, /* l1_cache_line_size */ + 256, /* l2_cache_size */ + true, /* prefetch_dynamic_strides */ + -1, /* minimum_stride */ + -1 /* default_opt_level */ +}; + +static const struct tune_params thunderx2t99_tunings = +{ + &thunderx2t99_extra_costs, + &thunderx2t99_addrcost_table, + &thunderx2t99_regmove_cost, + &thunderx2t99_vector_cost, + &generic_branch_cost, + &generic_approx_modes, + SVE_NOT_IMPLEMENTED, /* sve_width */ + { 4, /* load_int. */ + 4, /* store_int. */ + 4, /* load_fp. */ + 4, /* store_fp. */ + 4, /* load_pred. */ + 4 /* store_pred. */ + }, /* memmov_cost. */ + 4, /* issue_rate. */ + (AARCH64_FUSE_ALU_BRANCH | AARCH64_FUSE_AES_AESMC + | AARCH64_FUSE_ALU_CBZ), /* fusible_ops */ + "16", /* function_align. */ + "8", /* jump_align. */ + "16", /* loop_align. */ + 3, /* int_reassoc_width. */ + 2, /* fp_reassoc_width. */ + 1, /* fma_reassoc_width. */ + 2, /* vec_reassoc_width. */ + 2, /* min_div_recip_mul_sf. */ + 2, /* min_div_recip_mul_df. */ + 0, /* max_case_values. */ + tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ + (AARCH64_EXTRA_TUNE_NONE), /* tune_flags. */ + &thunderx2t99_prefetch_tune, + AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ + AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. */ +}; + +#endif /* GCC_AARCH64_H_THUNDERX2T99. */ diff --git a/gcc/config/aarch64/tuning_models/thunderx3t110.h b/gcc/config/aarch64/tuning_models/thunderx3t110.h new file mode 100644 index 0000000000000000000000000000000000000000..65203b4af132e12e4994013fbab228bd3873b756 --- /dev/null +++ b/gcc/config/aarch64/tuning_models/thunderx3t110.h @@ -0,0 +1,136 @@ +/* Tuning model description for AArch64 architecture. + Copyright (C) 2009-2023 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. + + You should have received a copy of the GNU General Public License + along with GCC; see the file COPYING3. If not see + . 
*/ + +#ifndef GCC_AARCH64_H_THUNDERX3T110 +#define GCC_AARCH64_H_THUNDERX3T110 + +#include "generic.h" + +static const struct cpu_addrcost_table thunderx3t110_addrcost_table = +{ + { + 1, /* hi */ + 1, /* si */ + 1, /* di */ + 2, /* ti */ + }, + 0, /* pre_modify */ + 0, /* post_modify */ + 0, /* post_modify_ld3_st3 */ + 0, /* post_modify_ld4_st4 */ + 2, /* register_offset */ + 3, /* register_sextend */ + 3, /* register_zextend */ + 0, /* imm_offset */ +}; + +static const struct cpu_regmove_cost thunderx3t110_regmove_cost = +{ + 1, /* GP2GP */ + /* Avoid the use of int<->fp moves for spilling. */ + 4, /* GP2FP */ + 5, /* FP2GP */ + 4 /* FP2FP */ +}; + +static const advsimd_vec_cost thunderx3t110_advsimd_vector_cost = +{ + 5, /* int_stmt_cost */ + 5, /* fp_stmt_cost */ + 0, /* ld2_st2_permute_cost */ + 0, /* ld3_st3_permute_cost */ + 0, /* ld4_st4_permute_cost */ + 10, /* permute_cost */ + 5, /* reduc_i8_cost */ + 5, /* reduc_i16_cost */ + 5, /* reduc_i32_cost */ + 5, /* reduc_i64_cost */ + 5, /* reduc_f16_cost */ + 5, /* reduc_f32_cost */ + 5, /* reduc_f64_cost */ + 5, /* store_elt_extra_cost */ + 5, /* vec_to_scalar_cost */ + 5, /* scalar_to_vec_cost */ + 4, /* align_load_cost */ + 4, /* unalign_load_cost */ + 4, /* unalign_store_cost */ + 4 /* store_cost */ +}; + +static const struct cpu_vector_cost thunderx3t110_vector_cost = +{ + 1, /* scalar_int_stmt_cost */ + 5, /* scalar_fp_stmt_cost */ + 4, /* scalar_load_cost */ + 1, /* scalar_store_cost */ + 2, /* cond_taken_branch_cost */ + 1, /* cond_not_taken_branch_cost */ + &thunderx3t110_advsimd_vector_cost, /* advsimd */ + nullptr, /* sve */ + nullptr /* issue_info */ +}; + +static const cpu_prefetch_tune thunderx3t110_prefetch_tune = +{ + 8, /* num_slots */ + 32, /* l1_cache_size */ + 64, /* l1_cache_line_size */ + 256, /* l2_cache_size */ + true, /* prefetch_dynamic_strides */ + -1, /* minimum_stride */ + -1 /* default_opt_level */ +}; + +static const struct tune_params thunderx3t110_tunings = +{ + &thunderx3t110_extra_costs, + &thunderx3t110_addrcost_table, + &thunderx3t110_regmove_cost, + &thunderx3t110_vector_cost, + &generic_branch_cost, + &generic_approx_modes, + SVE_NOT_IMPLEMENTED, /* sve_width */ + { 4, /* load_int. */ + 4, /* store_int. */ + 4, /* load_fp. */ + 4, /* store_fp. */ + 4, /* load_pred. */ + 4 /* store_pred. */ + }, /* memmov_cost. */ + 6, /* issue_rate. */ + (AARCH64_FUSE_ALU_BRANCH | AARCH64_FUSE_AES_AESMC + | AARCH64_FUSE_ALU_CBZ), /* fusible_ops */ + "16", /* function_align. */ + "8", /* jump_align. */ + "16", /* loop_align. */ + 3, /* int_reassoc_width. */ + 2, /* fp_reassoc_width. */ + 1, /* fma_reassoc_width. */ + 2, /* vec_reassoc_width. */ + 2, /* min_div_recip_mul_sf. */ + 2, /* min_div_recip_mul_df. */ + 0, /* max_case_values. */ + tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ + (AARCH64_EXTRA_TUNE_NONE), /* tune_flags. */ + &thunderx3t110_prefetch_tune, + AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ + AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. */ +}; + +#endif /* GCC_AARCH64_H_THUNDERX3T110. */ diff --git a/gcc/config/aarch64/tuning_models/thunderxt88.h b/gcc/config/aarch64/tuning_models/thunderxt88.h new file mode 100644 index 0000000000000000000000000000000000000000..dcc74d31484ee6b99d37920dbfe7b1d59377d074 --- /dev/null +++ b/gcc/config/aarch64/tuning_models/thunderxt88.h @@ -0,0 +1,72 @@ +/* Tuning model description for AArch64 architecture. + Copyright (C) 2009-2023 Free Software Foundation, Inc. + + This file is part of GCC. 
+ + GCC is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. + + You should have received a copy of the GNU General Public License + along with GCC; see the file COPYING3. If not see + . */ + +#ifndef GCC_AARCH64_H_THUNDERXT88 +#define GCC_AARCH64_H_THUNDERXT88 + +#include "generic.h" +#include "thunderx.h" + +static const cpu_prefetch_tune thunderxt88_prefetch_tune = +{ + 8, /* num_slots */ + 32, /* l1_cache_size */ + 128, /* l1_cache_line_size */ + 16*1024, /* l2_cache_size */ + true, /* prefetch_dynamic_strides */ + -1, /* minimum_stride */ + 3 /* default_opt_level */ +}; + +static const struct tune_params thunderxt88_tunings = +{ + &thunderx_extra_costs, + &generic_addrcost_table, + &thunderx_regmove_cost, + &thunderx_vector_cost, + &generic_branch_cost, + &generic_approx_modes, + SVE_NOT_IMPLEMENTED, /* sve_width */ + { 6, /* load_int. */ + 6, /* store_int. */ + 6, /* load_fp. */ + 6, /* store_fp. */ + 6, /* load_pred. */ + 6 /* store_pred. */ + }, /* memmov_cost. */ + 2, /* issue_rate */ + AARCH64_FUSE_ALU_BRANCH, /* fusible_ops */ + "8", /* function_align. */ + "8", /* jump_align. */ + "8", /* loop_align. */ + 2, /* int_reassoc_width. */ + 4, /* fp_reassoc_width. */ + 1, /* fma_reassoc_width. */ + 1, /* vec_reassoc_width. */ + 2, /* min_div_recip_mul_sf. */ + 2, /* min_div_recip_mul_df. */ + 0, /* max_case_values. */ + tune_params::AUTOPREFETCHER_OFF, /* autoprefetcher_model. */ + (AARCH64_EXTRA_TUNE_NONE), /* tune_flags. */ + &thunderxt88_prefetch_tune, + AARCH64_LDP_STP_POLICY_ALIGNED, /* ldp_policy_model. */ + AARCH64_LDP_STP_POLICY_ALIGNED /* stp_policy_model. */ +}; + +#endif /* GCC_AARCH64_H_THUNDERXT88. */ diff --git a/gcc/config/aarch64/tuning_models/tsv110.h b/gcc/config/aarch64/tuning_models/tsv110.h new file mode 100644 index 0000000000000000000000000000000000000000..42aeafce652fff34e3277194993dd4aa1f0383a1 --- /dev/null +++ b/gcc/config/aarch64/tuning_models/tsv110.h @@ -0,0 +1,137 @@ +/* Tuning model description for AArch64 architecture. + Copyright (C) 2009-2023 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. + + You should have received a copy of the GNU General Public License + along with GCC; see the file COPYING3. If not see + . 
*/ + +#ifndef GCC_AARCH64_H_TSV110 +#define GCC_AARCH64_H_TSV110 + +#include "generic.h" + +static const struct cpu_addrcost_table tsv110_addrcost_table = +{ + { + 1, /* hi */ + 0, /* si */ + 0, /* di */ + 1, /* ti */ + }, + 0, /* pre_modify */ + 0, /* post_modify */ + 0, /* post_modify_ld3_st3 */ + 0, /* post_modify_ld4_st4 */ + 0, /* register_offset */ + 1, /* register_sextend */ + 1, /* register_zextend */ + 0, /* imm_offset */ +}; + +static const struct cpu_regmove_cost tsv110_regmove_cost = +{ + 1, /* GP2GP */ + /* Avoid the use of slow int<->fp moves for spilling by setting + their cost higher than memmov_cost. */ + 2, /* GP2FP */ + 3, /* FP2GP */ + 2 /* FP2FP */ +}; + +static const advsimd_vec_cost tsv110_advsimd_vector_cost = +{ + 2, /* int_stmt_cost */ + 2, /* fp_stmt_cost */ + 0, /* ld2_st2_permute_cost */ + 0, /* ld3_st3_permute_cost */ + 0, /* ld4_st4_permute_cost */ + 2, /* permute_cost */ + 3, /* reduc_i8_cost */ + 3, /* reduc_i16_cost */ + 3, /* reduc_i32_cost */ + 3, /* reduc_i64_cost */ + 3, /* reduc_f16_cost */ + 3, /* reduc_f32_cost */ + 3, /* reduc_f64_cost */ + 3, /* store_elt_extra_cost */ + 3, /* vec_to_scalar_cost */ + 2, /* scalar_to_vec_cost */ + 5, /* align_load_cost */ + 5, /* unalign_load_cost */ + 1, /* unalign_store_cost */ + 1 /* store_cost */ +}; + +static const struct cpu_vector_cost tsv110_vector_cost = +{ + 1, /* scalar_int_stmt_cost */ + 1, /* scalar_fp_stmt_cost */ + 5, /* scalar_load_cost */ + 1, /* scalar_store_cost */ + 1, /* cond_taken_branch_cost */ + 1, /* cond_not_taken_branch_cost */ + &tsv110_advsimd_vector_cost, /* advsimd */ + nullptr, /* sve */ + nullptr /* issue_info */ +}; + +static const cpu_prefetch_tune tsv110_prefetch_tune = +{ + 0, /* num_slots */ + 64, /* l1_cache_size */ + 64, /* l1_cache_line_size */ + 512, /* l2_cache_size */ + true, /* prefetch_dynamic_strides */ + -1, /* minimum_stride */ + -1 /* default_opt_level */ +}; + +static const struct tune_params tsv110_tunings = +{ + &tsv110_extra_costs, + &tsv110_addrcost_table, + &tsv110_regmove_cost, + &tsv110_vector_cost, + &generic_branch_cost, + &generic_approx_modes, + SVE_NOT_IMPLEMENTED, /* sve_width */ + { 4, /* load_int. */ + 4, /* store_int. */ + 4, /* load_fp. */ + 4, /* store_fp. */ + 4, /* load_pred. */ + 4 /* store_pred. */ + }, /* memmov_cost. */ + 4, /* issue_rate */ + (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_ALU_BRANCH + | AARCH64_FUSE_ALU_CBZ), /* fusible_ops */ + "16", /* function_align. */ + "4", /* jump_align. */ + "8", /* loop_align. */ + 2, /* int_reassoc_width. */ + 4, /* fp_reassoc_width. */ + 1, /* fma_reassoc_width. */ + 1, /* vec_reassoc_width. */ + 2, /* min_div_recip_mul_sf. */ + 2, /* min_div_recip_mul_df. */ + 0, /* max_case_values. */ + tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ + (AARCH64_EXTRA_TUNE_NONE), /* tune_flags. */ + &tsv110_prefetch_tune, + AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ + AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. */ +}; + +#endif /* GCC_AARCH64_H_TSV110. */ diff --git a/gcc/config/aarch64/tuning_models/xgene1.h b/gcc/config/aarch64/tuning_models/xgene1.h new file mode 100644 index 0000000000000000000000000000000000000000..53a3eb0ddeb80a9735cc988e242a70e87dc90655 --- /dev/null +++ b/gcc/config/aarch64/tuning_models/xgene1.h @@ -0,0 +1,145 @@ +/* Tuning model description for AArch64 architecture. + Copyright (C) 2009-2023 Free Software Foundation, Inc. + + This file is part of GCC. 
+ + GCC is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. + + You should have received a copy of the GNU General Public License + along with GCC; see the file COPYING3. If not see + . */ + +#ifndef GCC_AARCH64_H_XGENE1 +#define GCC_AARCH64_H_XGENE1 + +#include "generic.h" + +static const struct cpu_addrcost_table xgene1_addrcost_table = +{ + { + 1, /* hi */ + 0, /* si */ + 0, /* di */ + 1, /* ti */ + }, + 1, /* pre_modify */ + 1, /* post_modify */ + 1, /* post_modify_ld3_st3 */ + 1, /* post_modify_ld4_st4 */ + 0, /* register_offset */ + 1, /* register_sextend */ + 1, /* register_zextend */ + 0, /* imm_offset */ +}; + +static const struct cpu_regmove_cost xgene1_regmove_cost = +{ + 1, /* GP2GP */ + /* Avoid the use of slow int<->fp moves for spilling by setting + their cost higher than memmov_cost. */ + 8, /* GP2FP */ + 8, /* FP2GP */ + 2 /* FP2FP */ +}; + +static const advsimd_vec_cost xgene1_advsimd_vector_cost = +{ + 2, /* int_stmt_cost */ + 2, /* fp_stmt_cost */ + 0, /* ld2_st2_permute_cost */ + 0, /* ld3_st3_permute_cost */ + 0, /* ld4_st4_permute_cost */ + 2, /* permute_cost */ + 4, /* reduc_i8_cost */ + 4, /* reduc_i16_cost */ + 4, /* reduc_i32_cost */ + 4, /* reduc_i64_cost */ + 4, /* reduc_f16_cost */ + 4, /* reduc_f32_cost */ + 4, /* reduc_f64_cost */ + 4, /* store_elt_extra_cost */ + 4, /* vec_to_scalar_cost */ + 4, /* scalar_to_vec_cost */ + 10, /* align_load_cost */ + 10, /* unalign_load_cost */ + 2, /* unalign_store_cost */ + 2 /* store_cost */ +}; + +/* Generic costs for vector insn classes. */ +static const struct cpu_vector_cost xgene1_vector_cost = +{ + 1, /* scalar_int_stmt_cost */ + 1, /* scalar_fp_stmt_cost */ + 5, /* scalar_load_cost */ + 1, /* scalar_store_cost */ + 2, /* cond_taken_branch_cost */ + 1, /* cond_not_taken_branch_cost */ + &xgene1_advsimd_vector_cost, /* advsimd */ + nullptr, /* sve */ + nullptr /* issue_info */ +}; + +/* Approximation modes for X-Gene 1. */ +static const cpu_approx_modes xgene1_approx_modes = +{ + AARCH64_APPROX_NONE, /* division */ + AARCH64_APPROX_NONE, /* sqrt */ + AARCH64_APPROX_ALL /* recip_sqrt */ +}; + +static const cpu_prefetch_tune xgene1_prefetch_tune = +{ + 8, /* num_slots */ + 32, /* l1_cache_size */ + 64, /* l1_cache_line_size */ + 256, /* l2_cache_size */ + true, /* prefetch_dynamic_strides */ + -1, /* minimum_stride */ + -1 /* default_opt_level */ +}; + +static const struct tune_params xgene1_tunings = +{ + &xgene1_extra_costs, + &xgene1_addrcost_table, + &xgene1_regmove_cost, + &xgene1_vector_cost, + &generic_branch_cost, + &xgene1_approx_modes, + SVE_NOT_IMPLEMENTED, /* sve_width */ + { 6, /* load_int. */ + 6, /* store_int. */ + 6, /* load_fp. */ + 6, /* store_fp. */ + 6, /* load_pred. */ + 6 /* store_pred. */ + }, /* memmov_cost. */ + 4, /* issue_rate */ + AARCH64_FUSE_NOTHING, /* fusible_ops */ + "16", /* function_align. */ + "16", /* jump_align. */ + "16", /* loop_align. */ + 2, /* int_reassoc_width. */ + 4, /* fp_reassoc_width. */ + 1, /* fma_reassoc_width. */ + 1, /* vec_reassoc_width. */ + 2, /* min_div_recip_mul_sf. */ + 2, /* min_div_recip_mul_df. */ + 17, /* max_case_values. 
*/ + tune_params::AUTOPREFETCHER_OFF, /* autoprefetcher_model. */ + (AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS), /* tune_flags. */ + &xgene1_prefetch_tune, + AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ + AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. */ +}; + +#endif /* GCC_AARCH64_H_XGENE1. */

From patchwork Wed Nov 15 17:07:23 2023
Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Tamar Christina
X-Patchwork-Id: 1864435
Date: Wed, 15 Nov 2023 17:07:23 +0000
From: Tamar Christina
To: gcc-patches@gcc.gnu.org
Cc: nd@arm.com, Richard.Earnshaw@arm.com, Marcus.Shawcroft@arm.com, Kyrylo.Tkachov@arm.com, richard.sandiford@arm.com
Subject: [PATCH 2/6]AArch64: Remove special handling of generic cpu.
Hi All,

In anticipation of adding new generic tuning values, this removes the hardcoding of the "generic" CPU and instead just specifies it as a normal CPU. No change in behavior is expected.

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

    PR target/111370
    * config/aarch64/aarch64-cores.def: Add generic.
    * config/aarch64/aarch64-opts.h (enum aarch64_processor): Remove generic.
    * config/aarch64/aarch64-tune.md: Regenerate.
    * config/aarch64/aarch64.cc (all_cores): Remove generic.
    * config/aarch64/aarch64.h (enum target_cpus): Remove TARGET_CPU_generic.

--- inline copy of patch --
diff --git a/gcc/config/aarch64/aarch64-cores.def b/gcc/config/aarch64/aarch64-cores.def index eae40b29df6f8ae353d168b6f73845846d1da94b..3e363bd0e8bbc10cb5b28d6183647736318e6d40 100644 --- a/gcc/config/aarch64/aarch64-cores.def +++ b/gcc/config/aarch64/aarch64-cores.def @@ -189,4 +189,7 @@ AARCH64_CORE("neoverse-n2", neoversen2, cortexa57, V9A, (I8MM, BF16, SVE2_BITPER AARCH64_CORE("neoverse-v2", neoversev2, cortexa57, V9A, (I8MM, BF16, SVE2_BITPERM, RNG, MEMTAG, PROFILE), neoversev2, 0x41, 0xd4f, -1) AARCH64_CORE("demeter", demeter, cortexa57, V9A, (I8MM, BF16, SVE2_BITPERM, RNG, MEMTAG, PROFILE), neoversev2, 0x41, 0xd4f, -1) +/* Generic Architecture Processors.
*/ +AARCH64_CORE("generic", generic, cortexa53, V8A, (), generic, 0x0, 0x0, -1) + #undef AARCH64_CORE diff --git a/gcc/config/aarch64/aarch64-opts.h b/gcc/config/aarch64/aarch64-opts.h index 831e28ab52a4271ef5467965039a32d078755d42..01151e93d17979f499523cabb74a449170483a70 100644 --- a/gcc/config/aarch64/aarch64-opts.h +++ b/gcc/config/aarch64/aarch64-opts.h @@ -32,8 +32,6 @@ enum aarch64_processor #define AARCH64_CORE(NAME, INTERNAL_IDENT, SCHED, ARCH, FLAGS, COSTS, IMP, PART, VARIANT) \ INTERNAL_IDENT, #include "aarch64-cores.def" - /* Used to indicate that no processor has been specified. */ - generic, /* Used to mark the end of the processor table. */ aarch64_none }; diff --git a/gcc/config/aarch64/aarch64-tune.md b/gcc/config/aarch64/aarch64-tune.md index c969277d617ad5fd070a915bfedb83323eb71e6c..cd5d79ea9c221874578a4d5804e4f618e671ebcd 100644 --- a/gcc/config/aarch64/aarch64-tune.md +++ b/gcc/config/aarch64/aarch64-tune.md @@ -1,5 +1,5 @@ ;; -*- buffer-read-only: t -*- ;; Generated automatically by gentune.sh from aarch64-cores.def (define_attr "tune" - "cortexa34,cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,octeontx,octeontxt81,octeontxt83,thunderxt81,thunderxt83,ampere1,ampere1a,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,cortexa76ae,cortexa77,cortexa78,cortexa78ae,cortexa78c,cortexa65,cortexa65ae,cortexx1,cortexx1c,neoversen1,ares,neoversee1,octeontx2,octeontx2t98,octeontx2t96,octeontx2t93,octeontx2f95,octeontx2f95n,octeontx2f95mm,a64fx,tsv110,thunderx3t110,neoversev1,zeus,neoverse512tvb,saphira,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55,cortexr82,cortexa510,cortexa520,cortexa710,cortexa715,cortexa720,cortexx2,cortexx3,cortexx4,neoversen2,neoversev2,demeter" + "cortexa34,cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,octeontx,octeontxt81,octeontxt83,thunderxt81,thunderxt83,ampere1,ampere1a,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,cortexa76ae,cortexa77,cortexa78,cortexa78ae,cortexa78c,cortexa65,cortexa65ae,cortexx1,cortexx1c,neoversen1,ares,neoversee1,octeontx2,octeontx2t98,octeontx2t96,octeontx2t93,octeontx2f95,octeontx2f95n,octeontx2f95mm,a64fx,tsv110,thunderx3t110,neoversev1,zeus,neoverse512tvb,saphira,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55,cortexr82,cortexa510,cortexa520,cortexa710,cortexa715,cortexa720,cortexx2,cortexx3,cortexx4,neoversen2,neoversev2,demeter,generic" (const (symbol_ref "((enum attr_tune) aarch64_tune)"))) diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h index d74e9116fc56cfa85558cc0810f76479e7280f69..b178bb5b62dbdcb1f5edbad4155416d6093a11f3 100644 --- a/gcc/config/aarch64/aarch64.h +++ b/gcc/config/aarch64/aarch64.h @@ -720,7 +720,6 @@ enum target_cpus #define AARCH64_CORE(NAME, INTERNAL_IDENT, SCHED, ARCH, FLAGS, COSTS, IMP, PART, VARIANT) \ TARGET_CPU_##INTERNAL_IDENT, #include "aarch64-cores.def" - TARGET_CPU_generic }; /* If there is no CPU defined at configure, use generic as default. 
*/ diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc index 07b1cde39209f5c7740e336b499e9aed31e4c515..086448632700bc97b0d4c75d85cef63f820e9944 100644 --- a/gcc/config/aarch64/aarch64.cc +++ b/gcc/config/aarch64/aarch64.cc @@ -427,8 +427,6 @@ static const struct processor all_cores[] = {NAME, IDENT, SCHED, AARCH64_ARCH_##ARCH, \ feature_deps::cpu_##IDENT, &COSTS##_tunings}, #include "aarch64-cores.def" - {"generic", generic, cortexa53, AARCH64_ARCH_V8A, - feature_deps::V8A ().enable, &generic_tunings}, {NULL, aarch64_none, aarch64_none, aarch64_no_arch, 0, NULL} };
From patchwork Wed Nov 15 17:07:48 2023
Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Tamar Christina
X-Patchwork-Id: 1864436
Date: Wed, 15 Nov 2023 17:07:48 +0000
From: Tamar Christina
To: gcc-patches@gcc.gnu.org
Cc: nd@arm.com, Richard.Earnshaw@arm.com, Marcus.Shawcroft@arm.com, Kyrylo.Tkachov@arm.com, richard.sandiford@arm.com
Subject: [PATCH 3/6]AArch64: Add new generic-armv8-a CPU and make it the default.
Hi All,

This patch adds a new generic scheduling model "generic-armv8-a" and makes it the default for all Armv8 architectures. -mcpu=generic and -mtune=generic are kept around for those that really want the deprecated cost model.

On SPECCPU 2017 this shows: generic: SPECINT 1.0% improvement in geomean, SPECFP -0.6%. The SPECFP regression is due to fotonik3d_r, where we vectorize an FP calculation that only ever needs one lane of the result. I believe this is a generic costing bug, but at the moment we can't change the costs of FP and INT independently, so I will defer updating that cost to stage 3 after Richard's other costing updates land.

generic SVE: SPECINT 1.1% improvement in geomean, SPECFP 0.7% improvement.
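As a rough usage sketch (inferred from the description above rather than spelled out in it): after this series the new model is what a plain -march build or an unconfigured default picks up, while the old model stays reachable by name, roughly:

  gcc -O2 -march=armv8.2-a foo.c        # tunes with the new generic-armv8-a model
  gcc -O2 -mcpu=generic-armv8-a foo.c   # selects the new model explicitly
  gcc -O2 -mtune=generic foo.c          # keeps the deprecated generic cost model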
Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

    PR target/111370
    * config/aarch64/aarch64-arches.def (armv8-r, armv8-a, armv8.1-a, armv8.2-a, armv8.3-a, armv8.4-a, armv8.5-a, armv8.6-a, armv8.7-a, armv8.8-a): Update to generic_armv8_a.
    * config/aarch64/aarch64-cores.def (generic-armv8-a): New.
    * config/aarch64/aarch64-tune.md: Regenerate.
    * config/aarch64/aarch64.cc: Include generic_armv8_a.h.
    * config/aarch64/aarch64.h (TARGET_CPU_DEFAULT): Change to TARGET_CPU_generic_armv8_a.
    * config/aarch64/tuning_models/generic_armv8_a.h: New file.

--- inline copy of patch --
diff --git a/gcc/config/aarch64/aarch64-arches.def b/gcc/config/aarch64/aarch64-arches.def index 7ae92aa8e984e0a77efd5c5a5061c4c6f86e0118..f89e4ea1f48acc2875c9a834d93d94c94163cddc 100644 --- a/gcc/config/aarch64/aarch64-arches.def +++ b/gcc/config/aarch64/aarch64-arches.def @@ -30,19 +30,19 @@ Due to the assumptions about the positions of these fields in config.gcc, NAME should be kept as the first argument. */ -AARCH64_ARCH("armv8-a", generic, V8A, 8, (SIMD)) -AARCH64_ARCH("armv8.1-a", generic, V8_1A, 8, (V8A, LSE, CRC, RDMA)) -AARCH64_ARCH("armv8.2-a", generic, V8_2A, 8, (V8_1A)) -AARCH64_ARCH("armv8.3-a", generic, V8_3A, 8, (V8_2A, PAUTH, RCPC)) -AARCH64_ARCH("armv8.4-a", generic, V8_4A, 8, (V8_3A, F16FML, DOTPROD, FLAGM)) -AARCH64_ARCH("armv8.5-a", generic, V8_5A, 8, (V8_4A, SB, SSBS, PREDRES)) -AARCH64_ARCH("armv8.6-a", generic, V8_6A, 8, (V8_5A, I8MM, BF16)) -AARCH64_ARCH("armv8.7-a", generic, V8_7A, 8, (V8_6A, LS64)) -AARCH64_ARCH("armv8.8-a", generic, V8_8A, 8, (V8_7A, MOPS)) -AARCH64_ARCH("armv8-r", generic, V8R , 8, (V8_4A)) -AARCH64_ARCH("armv9-a", generic, V9A , 9, (V8_5A, SVE2)) -AARCH64_ARCH("armv9.1-a", generic, V9_1A, 9, (V8_6A, V9A)) -AARCH64_ARCH("armv9.2-a", generic, V9_2A, 9, (V8_7A, V9_1A)) -AARCH64_ARCH("armv9.3-a", generic, V9_3A, 9, (V8_8A, V9_2A)) +AARCH64_ARCH("armv8-a", generic_armv8_a, V8A, 8, (SIMD)) +AARCH64_ARCH("armv8.1-a", generic_armv8_a, V8_1A, 8, (V8A, LSE, CRC, RDMA)) +AARCH64_ARCH("armv8.2-a", generic_armv8_a, V8_2A, 8, (V8_1A)) +AARCH64_ARCH("armv8.3-a", generic_armv8_a, V8_3A, 8, (V8_2A, PAUTH, RCPC)) +AARCH64_ARCH("armv8.4-a", generic_armv8_a, V8_4A, 8, (V8_3A, F16FML, DOTPROD, FLAGM)) +AARCH64_ARCH("armv8.5-a", generic_armv8_a, V8_5A, 8, (V8_4A, SB, SSBS, PREDRES)) +AARCH64_ARCH("armv8.6-a", generic_armv8_a, V8_6A, 8, (V8_5A, I8MM, BF16)) +AARCH64_ARCH("armv8.7-a", generic_armv8_a, V8_7A, 8, (V8_6A, LS64)) +AARCH64_ARCH("armv8.8-a", generic_armv8_a, V8_8A, 8, (V8_7A, MOPS)) +AARCH64_ARCH("armv8-r", generic_armv8_a, V8R , 8, (V8_4A)) +AARCH64_ARCH("armv9-a", generic, V9A , 9, (V8_5A, SVE2)) +AARCH64_ARCH("armv9.1-a", generic, V9_1A, 9, (V8_6A, V9A)) +AARCH64_ARCH("armv9.2-a", generic, V9_2A, 9, (V8_7A, V9_1A)) +AARCH64_ARCH("armv9.3-a", generic, V9_3A, 9, (V8_8A, V9_2A)) #undef AARCH64_ARCH diff --git a/gcc/config/aarch64/aarch64-cores.def b/gcc/config/aarch64/aarch64-cores.def index 3e363bd0e8bbc10cb5b28d6183647736318e6d40..30f4dd04ed71823bc34c0c405d49963b6b2d1375 100644 --- a/gcc/config/aarch64/aarch64-cores.def +++ b/gcc/config/aarch64/aarch64-cores.def @@ -191,5 +191,6 @@
AARCH64_CORE("demeter", demeter, cortexa57, V9A, (I8MM, BF16, SVE2_BITPERM, RNG, /* Generic Architecture Processors. */ AARCH64_CORE("generic", generic, cortexa53, V8A, (), generic, 0x0, 0x0, -1) +AARCH64_CORE("generic-armv8-a", generic_armv8_a, cortexa53, V8A, (), generic_armv8_a, 0x0, 0x0, -1) #undef AARCH64_CORE diff --git a/gcc/config/aarch64/aarch64-tune.md b/gcc/config/aarch64/aarch64-tune.md index cd5d79ea9c221874578a4d5804e4f618e671ebcd..0a32056f255de455f47a0b7395dfef0af84c6b5e 100644 --- a/gcc/config/aarch64/aarch64-tune.md +++ b/gcc/config/aarch64/aarch64-tune.md @@ -1,5 +1,5 @@ ;; -*- buffer-read-only: t -*- ;; Generated automatically by gentune.sh from aarch64-cores.def (define_attr "tune" - "cortexa34,cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,octeontx,octeontxt81,octeontxt83,thunderxt81,thunderxt83,ampere1,ampere1a,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,cortexa76ae,cortexa77,cortexa78,cortexa78ae,cortexa78c,cortexa65,cortexa65ae,cortexx1,cortexx1c,neoversen1,ares,neoversee1,octeontx2,octeontx2t98,octeontx2t96,octeontx2t93,octeontx2f95,octeontx2f95n,octeontx2f95mm,a64fx,tsv110,thunderx3t110,neoversev1,zeus,neoverse512tvb,saphira,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55,cortexr82,cortexa510,cortexa520,cortexa710,cortexa715,cortexa720,cortexx2,cortexx3,cortexx4,neoversen2,neoversev2,demeter,generic" + "cortexa34,cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,octeontx,octeontxt81,octeontxt83,thunderxt81,thunderxt83,ampere1,ampere1a,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,cortexa76ae,cortexa77,cortexa78,cortexa78ae,cortexa78c,cortexa65,cortexa65ae,cortexx1,cortexx1c,neoversen1,ares,neoversee1,octeontx2,octeontx2t98,octeontx2t96,octeontx2t93,octeontx2f95,octeontx2f95n,octeontx2f95mm,a64fx,tsv110,thunderx3t110,neoversev1,zeus,neoverse512tvb,saphira,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55,cortexr82,cortexa510,cortexa520,cortexa710,cortexa715,cortexa720,cortexx2,cortexx3,cortexx4,neoversen2,neoversev2,demeter,generic,generic_armv8_a" (const (symbol_ref "((enum attr_tune) aarch64_tune)"))) diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h index 145bf536c28fdef84246e16d8351f4b4e357d27c..1ac298926ce1606a87bcdcaf691f182ca416d600 100644 --- a/gcc/config/aarch64/aarch64.h +++ b/gcc/config/aarch64/aarch64.h @@ -724,7 +724,7 @@ enum target_cpus /* If there is no CPU defined at configure, use generic as default. */ #ifndef TARGET_CPU_DEFAULT -# define TARGET_CPU_DEFAULT TARGET_CPU_generic +# define TARGET_CPU_DEFAULT TARGET_CPU_generic_armv8_a #endif /* If inserting NOP before a mult-accumulate insn remember to adjust the diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc index 9d59431d933021d71c5c202f0a61f807a2d2b0f1..1f5645e4886acd30ee5a437f60ffb53ee7b09436 100644 --- a/gcc/config/aarch64/aarch64.cc +++ b/gcc/config/aarch64/aarch64.cc @@ -355,6 +355,7 @@ static const struct aarch64_flag_desc aarch64_tuning_flags[] = /* Tuning parameters. 
*/ #include "tuning_models/generic.h" +#include "tuning_models/generic_armv8_a.h" #include "tuning_models/cortexa35.h" #include "tuning_models/cortexa53.h" #include "tuning_models/cortexa57.h" diff --git a/gcc/config/aarch64/tuning_models/generic_armv8_a.h b/gcc/config/aarch64/tuning_models/generic_armv8_a.h new file mode 100644 index 0000000000000000000000000000000000000000..82abe172834756696a3905dbf92464f73a1ea3da --- /dev/null +++ b/gcc/config/aarch64/tuning_models/generic_armv8_a.h @@ -0,0 +1,191 @@ +/* Tuning model description for AArch64 architecture. + Copyright (C) 2023 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. + + You should have received a copy of the GNU General Public License + along with GCC; see the file COPYING3. If not see + . */ + +#ifndef GCC_AARCH64_H_GENERIC_ARMV8_A +#define GCC_AARCH64_H_GENERIC_ARMV8_A + +#include "generic.h" + +static const struct cpu_addrcost_table generic_armv8_a_addrcost_table = +{ + { + 1, /* hi */ + 0, /* si */ + 0, /* di */ + 1, /* ti */ + }, + 0, /* pre_modify */ + 0, /* post_modify */ + 0, /* post_modify_ld3_st3 */ + 0, /* post_modify_ld4_st4 */ + 0, /* register_offset */ + 0, /* register_sextend */ + 0, /* register_zextend */ + 0 /* imm_offset */ +}; + +static const struct cpu_regmove_cost generic_armv8_a_regmove_cost = +{ + 1, /* GP2GP */ + /* Avoid the use of slow int<->fp moves for spilling by setting + their cost higher than memmov_cost. */ + 5, /* GP2FP */ + 5, /* FP2GP */ + 2 /* FP2FP */ +}; + +/* Generic costs for Advanced SIMD vector operations. */ +static const advsimd_vec_cost generic_armv8_a_advsimd_vector_cost = +{ + 1, /* int_stmt_cost */ + 1, /* fp_stmt_cost */ + 0, /* ld2_st2_permute_cost */ + 0, /* ld3_st3_permute_cost */ + 0, /* ld4_st4_permute_cost */ + 2, /* permute_cost */ + 2, /* reduc_i8_cost */ + 2, /* reduc_i16_cost */ + 2, /* reduc_i32_cost */ + 2, /* reduc_i64_cost */ + 2, /* reduc_f16_cost */ + 2, /* reduc_f32_cost */ + 2, /* reduc_f64_cost */ + 2, /* store_elt_extra_cost */ + 2, /* vec_to_scalar_cost */ + 1, /* scalar_to_vec_cost */ + 1, /* align_load_cost */ + 1, /* unalign_load_cost */ + 1, /* unalign_store_cost */ + 1 /* store_cost */ +}; + +/* Generic costs for SVE vector operations. */ +static const sve_vec_cost generic_armv8_a_sve_vector_cost = +{ + { + 1, /* int_stmt_cost */ + 1, /* fp_stmt_cost */ + 0, /* ld2_st2_permute_cost */ + 0, /* ld3_st3_permute_cost */ + 0, /* ld4_st4_permute_cost */ + 2, /* permute_cost */ + 2, /* reduc_i8_cost */ + 2, /* reduc_i16_cost */ + 2, /* reduc_i32_cost */ + 2, /* reduc_i64_cost */ + 2, /* reduc_f16_cost */ + 2, /* reduc_f32_cost */ + 2, /* reduc_f64_cost */ + 2, /* store_elt_extra_cost */ + 2, /* vec_to_scalar_cost */ + 1, /* scalar_to_vec_cost */ + 1, /* align_load_cost */ + 1, /* unalign_load_cost */ + 1, /* unalign_store_cost */ + 1 /* store_cost */ + }, + 2, /* clast_cost */ + 2, /* fadda_f16_cost */ + 2, /* fadda_f32_cost */ + 2, /* fadda_f64_cost */ + 4, /* gather_load_x32_cost */ + 2, /* gather_load_x64_cost */ + 1 /* scatter_store_elt_cost */ +}; + +/* Generic costs for vector insn classes. 
*/ +static const struct cpu_vector_cost generic_armv8_a_vector_cost = +{ + 1, /* scalar_int_stmt_cost */ + 1, /* scalar_fp_stmt_cost */ + 1, /* scalar_load_cost */ + 1, /* scalar_store_cost */ + 3, /* cond_taken_branch_cost */ + 1, /* cond_not_taken_branch_cost */ + &generic_armv8_a_advsimd_vector_cost, /* advsimd */ + &generic_armv8_a_sve_vector_cost, /* sve */ + nullptr /* issue_info */ +}; + +/* Generic costs for branch instructions. */ +static const struct cpu_branch_cost generic_armv8_a_branch_cost = +{ + 1, /* Predictable. */ + 3 /* Unpredictable. */ +}; + +/* Generic approximation modes. */ +static const cpu_approx_modes generic_armv8_a_approx_modes = +{ + AARCH64_APPROX_NONE, /* division */ + AARCH64_APPROX_NONE, /* sqrt */ + AARCH64_APPROX_NONE /* recip_sqrt */ +}; + +/* Generic prefetch settings (which disable prefetch). */ +static const cpu_prefetch_tune generic_armv8_a_prefetch_tune = +{ + 0, /* num_slots */ + -1, /* l1_cache_size */ + -1, /* l1_cache_line_size */ + -1, /* l2_cache_size */ + true, /* prefetch_dynamic_strides */ + -1, /* minimum_stride */ + -1 /* default_opt_level */ +}; + +static const struct tune_params generic_armv8_a_tunings = +{ + &cortexa76_extra_costs, + &generic_armv8_a_addrcost_table, + &generic_armv8_a_regmove_cost, + &generic_armv8_a_vector_cost, + &generic_armv8_a_branch_cost, + &generic_armv8_a_approx_modes, + SVE_NOT_IMPLEMENTED, /* sve_width */ + { 4, /* load_int. */ + 2, /* store_int. */ + 5, /* load_fp. */ + 2, /* store_fp. */ + 4, /* load_pred. */ + 4 /* store_pred. */ + }, /* memmov_cost. */ + 3, /* issue_rate */ + (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_CMP_BRANCH), /* fusible_ops */ + "32:16", /* function_align. */ + "4", /* jump_align. */ + "32:16", /* loop_align. */ + 2, /* int_reassoc_width. */ + 4, /* fp_reassoc_width. */ + 1, /* fma_reassoc_width. */ + 2, /* vec_reassoc_width. */ + 2, /* min_div_recip_mul_sf. */ + 2, /* min_div_recip_mul_df. */ + 0, /* max_case_values. */ + tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ + (AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND + | AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS + | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS + | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT), /* tune_flags. */ + &generic_prefetch_tune, + AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ + AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. */ +}; + +#endif /* GCC_AARCH64_H_GENERIC_ARMV8_A. 
*/ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cond_asrd_1.c b/gcc/testsuite/gcc.target/aarch64/sve/cond_asrd_1.c index aac06bd8093bed9e50928ee23f9a075888f14543..96e9935360100e25a4c01cceabc7aa840f520a3e 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/cond_asrd_1.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/cond_asrd_1.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O2 -ftree-vectorize -moverride=sve_width=256" } */ +/* { dg-options "-O2 -ftree-vectorize -moverride=sve_width=256 --param=aarch64-autovec-preference=2" } */ #include diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cond_cnot_4.c b/gcc/testsuite/gcc.target/aarch64/sve/cond_cnot_4.c index f6278916e1afeb3f0cb8fdbff4e98782ad0a726e..6f969a829425960b414508a7e354a1f39426a0e4 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/cond_cnot_4.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/cond_cnot_4.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O2 -ftree-vectorize -moverride=sve_width=256" } */ +/* { dg-options "-O2 -ftree-vectorize -moverride=sve_width=256 --param=aarch64-autovec-preference=2" } */ #include diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cond_unary_5.c b/gcc/testsuite/gcc.target/aarch64/sve/cond_unary_5.c index 03a6636f2d20b12f7e950a5bd6e43216139370fa..e6ec5157cd6dcc6b6dc24c5384432289b6dcdfba 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/cond_unary_5.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/cond_unary_5.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O2 -ftree-vectorize -moverride=sve_width=256" } */ +/* { dg-options "-O2 -ftree-vectorize -moverride=sve_width=256 --param=aarch64-autovec-preference=2" } */ #include diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cond_uxt_5.c b/gcc/testsuite/gcc.target/aarch64/sve/cond_uxt_5.c index 9a2bd8f152ff32e8da1c4e2a73a31a249e5991c7..7ed35921b6f914441dc463c4030fcc4663a6813c 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/cond_uxt_5.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/cond_uxt_5.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O2 -ftree-vectorize -moverride=sve_width=256" } */ +/* { dg-options "-O2 -ftree-vectorize -moverride=sve_width=256 --param=aarch64-autovec-preference=2" } */ #include diff --git a/gcc/testsuite/gcc.target/aarch64/target_attr_13.c b/gcc/testsuite/gcc.target/aarch64/target_attr_13.c index d5bee3a7b900bf9348c9cbfd67f487c381b13bf6..4bdb167944cda1861dd0462d905149646be69693 100644 --- a/gcc/testsuite/gcc.target/aarch64/target_attr_13.c +++ b/gcc/testsuite/gcc.target/aarch64/target_attr_13.c @@ -1,5 +1,5 @@ /* { dg-do assemble } */ -/* { dg-options "-O2 -march=armv8-a+crc+crypto -mcpu=generic" } */ +/* { dg-options "-O2 -mcpu=generic+crypto" } */ #include "arm_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/target_attr_15.c b/gcc/testsuite/gcc.target/aarch64/target_attr_15.c index 069a0010865334324a100bab358bb53369f122fb..e6f31ba72ee77d1129f3cfbe2d90216d6c355c57 100644 --- a/gcc/testsuite/gcc.target/aarch64/target_attr_15.c +++ b/gcc/testsuite/gcc.target/aarch64/target_attr_15.c @@ -1,5 +1,5 @@ /* { dg-do assemble } */ -/* { dg-options "-march=armv8-a+crypto -mcpu=generic -save-temps" } */ +/* { dg-options "-mcpu=generic+crypto -save-temps" } */ /* Check that "+nothing" clears the ISA flags. 
*/
From patchwork Wed Nov 15 17:08:10 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Tamar Christina
X-Patchwork-Id: 1864439
Date: Wed, 15 Nov 2023 17:08:10 +0000
From: Tamar Christina
To: gcc-patches@gcc.gnu.org
Cc: nd@arm.com, Richard.Earnshaw@arm.com, Marcus.Shawcroft@arm.com, Kyrylo.Tkachov@arm.com, richard.sandiford@arm.com
Subject: [PATCH 4/6]AArch64: Add new generic-armv9-a CPU and make it the default for Armv9
MIME-Version: 1.0
Hi All,

This patch adds a new generic scheduling model "generic-armv9-a" and makes it
the default for all Armv9 architectures.  -mcpu=generic and -mtune=generic are
kept around for those that really want the deprecated cost model.

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	PR target/111370
	* config/aarch64/aarch64-arches.def (armv9-a, armv9.1-a, armv9.2-a,
	armv9.3-a): Update to generic-armv9-a.
	* config/aarch64/aarch64-cores.def (generic-armv9-a): New.
	* config/aarch64/aarch64-tune.md: Regenerate.
	* config/aarch64/aarch64.cc: Include generic_armv9_a.h.
	* config/aarch64/tuning_models/generic_armv9_a.h: New file.
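To make the intended behaviour concrete, here is a small illustrative example
in the style of the existing aarch64 testsuite.  It is not part of the patch
and the file contents are made up; it only shows which options are expected to
pick up the new model once the hunks below are applied:

/* Illustrative only, not added by this patch.  */
/* { dg-do compile } */
/* { dg-options "-O2 -march=armv9-a" } */
/* With this series, the options above should tune with the new
   generic-armv9-a model.  "-mcpu=generic-armv9-a" (or
   "-mtune=generic-armv9-a") selects it explicitly, while
   "-mtune=generic" still selects the older generic model.  */

int
sum (int *a, int n)
{
  int s = 0;
  for (int i = 0; i < n; i++)
    s += a[i];
  return s;
}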
--- inline copy of patch -- diff --git a/gcc/config/aarch64/aarch64-arches.def b/gcc/config/aarch64/aarch64-arches.def index f89e4ea1f48acc2875c9a834d93d94c94163cddc..6b9a19c490ba0b35082077e877b19906138f039b 100644 --- diff --git a/gcc/config/aarch64/aarch64-arches.def b/gcc/config/aarch64/aarch64-arches.def index f89e4ea1f48acc2875c9a834d93d94c94163cddc..6b9a19c490ba0b35082077e877b19906138f039b 100644 --- a/gcc/config/aarch64/aarch64-arches.def +++ b/gcc/config/aarch64/aarch64-arches.def @@ -40,9 +40,9 @@ AARCH64_ARCH("armv8.6-a", generic_armv8_a, V8_6A, 8, (V8_5A, I8MM, BF AARCH64_ARCH("armv8.7-a", generic_armv8_a, V8_7A, 8, (V8_6A, LS64)) AARCH64_ARCH("armv8.8-a", generic_armv8_a, V8_8A, 8, (V8_7A, MOPS)) AARCH64_ARCH("armv8-r", generic_armv8_a, V8R , 8, (V8_4A)) -AARCH64_ARCH("armv9-a", generic, V9A , 9, (V8_5A, SVE2)) -AARCH64_ARCH("armv9.1-a", generic, V9_1A, 9, (V8_6A, V9A)) -AARCH64_ARCH("armv9.2-a", generic, V9_2A, 9, (V8_7A, V9_1A)) -AARCH64_ARCH("armv9.3-a", generic, V9_3A, 9, (V8_8A, V9_2A)) +AARCH64_ARCH("armv9-a", generic_armv9_a, V9A , 9, (V8_5A, SVE2)) +AARCH64_ARCH("armv9.1-a", generic_armv9_a, V9_1A, 9, (V8_6A, V9A)) +AARCH64_ARCH("armv9.2-a", generic_armv9_a, V9_2A, 9, (V8_7A, V9_1A)) +AARCH64_ARCH("armv9.3-a", generic_armv9_a, V9_3A, 9, (V8_8A, V9_2A)) #undef AARCH64_ARCH diff --git a/gcc/config/aarch64/aarch64-cores.def b/gcc/config/aarch64/aarch64-cores.def index 30f4dd04ed71823bc34c0c405d49963b6b2d1375..16752b77f4baf8d1aa8a5406826aa29e367120c5 100644 --- a/gcc/config/aarch64/aarch64-cores.def +++ b/gcc/config/aarch64/aarch64-cores.def @@ -191,6 +191,7 @@ AARCH64_CORE("demeter", demeter, cortexa57, V9A, (I8MM, BF16, SVE2_BITPERM, RNG, /* Generic Architecture Processors. */ AARCH64_CORE("generic", generic, cortexa53, V8A, (), generic, 0x0, 0x0, -1) -AARCH64_CORE("generic-armv8-a", generic_armv8_a, cortexa53, V8A, (), generic_armv8_a, 0x0, 0x0, -1) +AARCH64_CORE("generic-armv8-a", generic_armv8_a, cortexa53, V8A, (), generic_armv8_a, 0x0, 0x0, -1) +AARCH64_CORE("generic-armv9-a", generic_armv9_a, cortexa53, V9A, (), generic_armv9_a, 0x0, 0x0, -1) #undef AARCH64_CORE diff --git a/gcc/config/aarch64/aarch64-tune.md b/gcc/config/aarch64/aarch64-tune.md index 0a32056f255de455f47a0b7395dfef0af84c6b5e..61bb85211252970f0a0526929d6b88353bdd930f 100644 --- a/gcc/config/aarch64/aarch64-tune.md +++ b/gcc/config/aarch64/aarch64-tune.md @@ -1,5 +1,5 @@ ;; -*- buffer-read-only: t -*- ;; Generated automatically by gentune.sh from aarch64-cores.def (define_attr "tune" - "cortexa34,cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,octeontx,octeontxt81,octeontxt83,thunderxt81,thunderxt83,ampere1,ampere1a,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,cortexa76ae,cortexa77,cortexa78,cortexa78ae,cortexa78c,cortexa65,cortexa65ae,cortexx1,cortexx1c,neoversen1,ares,neoversee1,octeontx2,octeontx2t98,octeontx2t96,octeontx2t93,octeontx2f95,octeontx2f95n,octeontx2f95mm,a64fx,tsv110,thunderx3t110,neoversev1,zeus,neoverse512tvb,saphira,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55,cortexr82,cortexa510,cortexa520,cortexa710,cortexa715,cortexa720,cortexx2,cortexx3,cortexx4,neoversen2,neoversev2,demeter,generic,generic_armv8_a" + 
"cortexa34,cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,octeontx,octeontxt81,octeontxt83,thunderxt81,thunderxt83,ampere1,ampere1a,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,cortexa76ae,cortexa77,cortexa78,cortexa78ae,cortexa78c,cortexa65,cortexa65ae,cortexx1,cortexx1c,neoversen1,ares,neoversee1,octeontx2,octeontx2t98,octeontx2t96,octeontx2t93,octeontx2f95,octeontx2f95n,octeontx2f95mm,a64fx,tsv110,thunderx3t110,neoversev1,zeus,neoverse512tvb,saphira,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55,cortexr82,cortexa510,cortexa520,cortexa710,cortexa715,cortexa720,cortexx2,cortexx3,cortexx4,neoversen2,neoversev2,demeter,generic,generic_armv8_a,generic_armv9_a" (const (symbol_ref "((enum attr_tune) aarch64_tune)"))) diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc index 08635e0df9cfa02286f3950383a32f6f93d1b4e0..5bed5f84cef242ec01f8510c76a450f81a985521 100644 --- a/gcc/config/aarch64/aarch64.cc +++ b/gcc/config/aarch64/aarch64.cc @@ -356,6 +356,7 @@ static const struct aarch64_flag_desc aarch64_tuning_flags[] = /* Tuning parameters. */ #include "tuning_models/generic.h" #include "tuning_models/generic_armv8_a.h" +#include "tuning_models/generic_armv9_a.h" #include "tuning_models/cortexa35.h" #include "tuning_models/cortexa53.h" #include "tuning_models/cortexa57.h" diff --git a/gcc/config/aarch64/tuning_models/generic_armv9_a.h b/gcc/config/aarch64/tuning_models/generic_armv9_a.h new file mode 100644 index 0000000000000000000000000000000000000000..c017468592a9dba74ddd432247aaf51a70bb34b5 --- /dev/null +++ b/gcc/config/aarch64/tuning_models/generic_armv9_a.h @@ -0,0 +1,245 @@ +/* Tuning model description for AArch64 architecture. + Copyright (C) 2009-2023 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. + + You should have received a copy of the GNU General Public License + along with GCC; see the file COPYING3. If not see + . */ + +#ifndef GCC_AARCH64_H_GENERIC_ARMV9_A +#define GCC_AARCH64_H_GENERIC_ARMV9_A + +#include "generic.h" +#include "generic_armv8_a.h" + +static const struct cpu_addrcost_table generic_armv9_a_addrcost_table = +{ + { + 1, /* hi */ + 0, /* si */ + 0, /* di */ + 1, /* ti */ + }, + 0, /* pre_modify */ + 0, /* post_modify */ + 2, /* post_modify_ld3_st3 */ + 2, /* post_modify_ld4_st4 */ + 0, /* register_offset */ + 0, /* register_sextend */ + 0, /* register_zextend */ + 0 /* imm_offset */ +}; + +static const struct cpu_regmove_cost generic_armv9_a_regmove_cost = +{ + 1, /* GP2GP */ + /* Spilling to int<->fp instead of memory is recommended so set + realistic costs compared to memmov_cost. 
*/ + 3, /* GP2FP */ + 2, /* FP2GP */ + 2 /* FP2FP */ +}; + +static const advsimd_vec_cost generic_armv9_a_advsimd_vector_cost = +{ + 2, /* int_stmt_cost */ + 2, /* fp_stmt_cost */ + 2, /* ld2_st2_permute_cost */ + 2, /* ld3_st3_permute_cost */ + 3, /* ld4_st4_permute_cost */ + 3, /* permute_cost */ + 4, /* reduc_i8_cost */ + 4, /* reduc_i16_cost */ + 2, /* reduc_i32_cost */ + 2, /* reduc_i64_cost */ + 6, /* reduc_f16_cost */ + 4, /* reduc_f32_cost */ + 2, /* reduc_f64_cost */ + 2, /* store_elt_extra_cost */ + /* This value is just inherited from the Cortex-A57 table. */ + 8, /* vec_to_scalar_cost */ + /* This depends very much on what the scalar value is and + where it comes from. E.g. some constants take two dependent + instructions or a load, while others might be moved from a GPR. + 4 seems to be a reasonable compromise in practice. */ + 4, /* scalar_to_vec_cost */ + 4, /* align_load_cost */ + 4, /* unalign_load_cost */ + /* Although stores have a latency of 2 and compete for the + vector pipes, in practice it's better not to model that. */ + 1, /* unalign_store_cost */ + 1 /* store_cost */ +}; + +static const sve_vec_cost generic_armv9_a_sve_vector_cost = +{ + { + 2, /* int_stmt_cost */ + 2, /* fp_stmt_cost */ + 3, /* ld2_st2_permute_cost */ + 4, /* ld3_st3_permute_cost */ + 4, /* ld4_st4_permute_cost */ + 3, /* permute_cost */ + /* Theoretically, a reduction involving 15 scalar ADDs could + complete in ~5 cycles and would have a cost of 15. [SU]ADDV + completes in 11 cycles, so give it a cost of 15 + 6. */ + 21, /* reduc_i8_cost */ + /* Likewise for 7 scalar ADDs (~3 cycles) vs. 9: 7 + 6. */ + 13, /* reduc_i16_cost */ + /* Likewise for 3 scalar ADDs (~2 cycles) vs. 8: 3 + 6. */ + 9, /* reduc_i32_cost */ + /* Likewise for 1 scalar ADD (~1 cycles) vs. 2: 1 + 1. */ + 2, /* reduc_i64_cost */ + /* Theoretically, a reduction involving 7 scalar FADDs could + complete in ~8 cycles and would have a cost of 14. FADDV + completes in 6 cycles, so give it a cost of 14 - 2. */ + 12, /* reduc_f16_cost */ + /* Likewise for 3 scalar FADDs (~4 cycles) vs. 4: 6 - 0. */ + 6, /* reduc_f32_cost */ + /* Likewise for 1 scalar FADD (~2 cycles) vs. 2: 2 - 0. */ + 2, /* reduc_f64_cost */ + 2, /* store_elt_extra_cost */ + /* This value is just inherited from the Cortex-A57 table. */ + 8, /* vec_to_scalar_cost */ + /* See the comment above the Advanced SIMD versions. */ + 4, /* scalar_to_vec_cost */ + 4, /* align_load_cost */ + 4, /* unalign_load_cost */ + /* Although stores have a latency of 2 and compete for the + vector pipes, in practice it's better not to model that. */ + 1, /* unalign_store_cost */ + 1 /* store_cost */ + }, + 3, /* clast_cost */ + 10, /* fadda_f16_cost */ + 6, /* fadda_f32_cost */ + 4, /* fadda_f64_cost */ + /* A strided Advanced SIMD x64 load would take two parallel FP loads + (8 cycles) plus an insertion (2 cycles). Assume a 64-bit SVE gather + is 1 cycle more. The Advanced SIMD version is costed as 2 scalar loads + (cost 8) and a vec_construct (cost 2). Add a full vector operation + (cost 2) to that, to avoid the difference being lost in rounding. + + There is no easy comparison between a strided Advanced SIMD x32 load + and an SVE 32-bit gather, but cost an SVE 32-bit gather as 1 vector + operation more than a 64-bit gather. 
*/ + 14, /* gather_load_x32_cost */ + 12, /* gather_load_x64_cost */ + 3 /* scatter_store_elt_cost */ +}; + +static const aarch64_scalar_vec_issue_info generic_armv9_a_scalar_issue_info = +{ + 3, /* loads_stores_per_cycle */ + 2, /* stores_per_cycle */ + 4, /* general_ops_per_cycle */ + 0, /* fp_simd_load_general_ops */ + 1 /* fp_simd_store_general_ops */ +}; + +static const aarch64_advsimd_vec_issue_info generic_armv9_a_advsimd_issue_info = +{ + { + 3, /* loads_stores_per_cycle */ + 2, /* stores_per_cycle */ + 2, /* general_ops_per_cycle */ + 0, /* fp_simd_load_general_ops */ + 1 /* fp_simd_store_general_ops */ + }, + 2, /* ld2_st2_general_ops */ + 2, /* ld3_st3_general_ops */ + 3 /* ld4_st4_general_ops */ +}; + +static const aarch64_sve_vec_issue_info generic_armv9_a_sve_issue_info = +{ + { + { + 3, /* loads_per_cycle */ + 2, /* stores_per_cycle */ + 2, /* general_ops_per_cycle */ + 0, /* fp_simd_load_general_ops */ + 1 /* fp_simd_store_general_ops */ + }, + 2, /* ld2_st2_general_ops */ + 3, /* ld3_st3_general_ops */ + 3 /* ld4_st4_general_ops */ + }, + 2, /* pred_ops_per_cycle */ + 2, /* while_pred_ops */ + 2, /* int_cmp_pred_ops */ + 1, /* fp_cmp_pred_ops */ + 1, /* gather_scatter_pair_general_ops */ + 1 /* gather_scatter_pair_pred_ops */ +}; + +static const aarch64_vec_issue_info generic_armv9_a_vec_issue_info = +{ + &generic_armv9_a_scalar_issue_info, + &generic_armv9_a_advsimd_issue_info, + &generic_armv9_a_sve_issue_info +}; + +/* Neoverse N2 costs for vector insn classes. */ +static const struct cpu_vector_cost generic_armv9_a_vector_cost = +{ + 1, /* scalar_int_stmt_cost */ + 2, /* scalar_fp_stmt_cost */ + 4, /* scalar_load_cost */ + 1, /* scalar_store_cost */ + 1, /* cond_taken_branch_cost */ + 1, /* cond_not_taken_branch_cost */ + &generic_armv9_a_advsimd_vector_cost, /* advsimd */ + &generic_armv9_a_sve_vector_cost, /* sve */ + &generic_armv9_a_vec_issue_info /* issue_info */ +}; + +static const struct tune_params generic_armv9_a_tunings = +{ + &cortexa76_extra_costs, + &generic_armv9_a_addrcost_table, + &generic_armv9_a_regmove_cost, + &generic_armv9_a_vector_cost, + &generic_branch_cost, + &generic_approx_modes, + SVE_SCALABLE, /* sve_width */ + { 4, /* load_int. */ + 1, /* store_int. */ + 6, /* load_fp. */ + 2, /* store_fp. */ + 6, /* load_pred. */ + 1 /* store_pred. */ + }, /* memmov_cost. */ + 3, /* issue_rate */ + (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_CMP_BRANCH), /* fusible_ops */ + "32:16", /* function_align. */ + "4", /* jump_align. */ + "32:16", /* loop_align. */ + 2, /* int_reassoc_width. */ + 4, /* fp_reassoc_width. */ + 1, /* fma_reassoc_width. */ + 2, /* vec_reassoc_width. */ + 2, /* min_div_recip_mul_sf. */ + 2, /* min_div_recip_mul_df. */ + 0, /* max_case_values. */ + tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ + (AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND + | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS + | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT), /* tune_flags. */ + &generic_prefetch_tune, + AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ + AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. */ +}; + +#endif /* GCC_AARCH64_H_GENERIC_ARMV9_A. 
*/
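As a side note on how the new model is meant to be reused: the tunings are
picked up through the usual costs field of AARCH64_CORE, so a core entry can
point at generic_armv9_a in the same way the new generic-armv9-a entry above
does.  A hypothetical sketch (made-up name and feature flags, not part of the
patch):

/* Hypothetical example entry for aarch64-cores.def, shown only for
   illustration; the costs field names the new tuning model.  */
AARCH64_CORE("example-v9-core", example_v9_core, cortexa53, V9A, (SVE2_BITPERM), generic_armv9_a, 0x0, 0x0, -1)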
From patchwork Wed Nov 15 17:08:34 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Tamar Christina
X-Patchwork-Id: 1864438
Date: Wed, 15 Nov 2023 17:08:34 +0000
From: Tamar Christina
To: gcc-patches@gcc.gnu.org
Cc: nd@arm.com, Richard.Earnshaw@arm.com, Marcus.Shawcroft@arm.com, Kyrylo.Tkachov@arm.com, richard.sandiford@arm.com
Subject: [PATCH 6/6]AArch64: only emit mismatch error when features would be disabled.
Hi All,

At the moment we emit a warning whenever you specify both -march and -mcpu
and their architectures differ.  The idea originally was that the user may
not be aware of this mismatch.  However this has a few problems:

1. Architecture revisions are not an observable part of the architecture;
   extensions are.  Starting with GCC 14 we have therefore relaxed the rules
   so that any extension can be enabled at any architecture level.  It is
   therefore incorrect, or at least not useful, to keep the check on
   architecture.

2. It's problematic in Makefiles and other build systems where you want to
   enable CPU-specific builds for certain files, e.g. you may be building
   with -march=armv8-a by default but compile some files with
   -mcpu=neoverse-n1.  Since there's no easy way to remove the earlier
   option, we end up warning, and there's no way to disable just this
   warning.  For build systems compiling with -Werror this makes compiling
   with GCC needlessly hard.

3. It doesn't actually warn for cases that may lead to issues: e.g.
   -march=armv8.2-a+sve -mcpu=neoverse-n1 does not warn that SVE would be
   disabled.

For this reason I have one of two proposals:

1. Just remove this warning altogether.

2. Rework the warning based on extensions and only warn when features would
   be disabled by the presence of the -mcpu.  This is the approach this
   patch has taken.
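Conceptually the reworked check boils down to a feature-mask superset test:
warn only when the -march feature set contains bits that the -mcpu feature
set lacks.  The following is a minimal standalone sketch of that idea, not
GCC's actual implementation; the feature_flags type and the FEAT_* bits are
made up for illustration and do not correspond to GCC's real
aarch64_feature_flags values:

#include <cstdint>
#include <cstdio>

typedef uint64_t feature_flags;

/* Hypothetical feature bits, for illustration only.  */
static const feature_flags FEAT_CRC     = 1ull << 0;
static const feature_flags FEAT_SVE     = 1ull << 1;
static const feature_flags FEAT_DOTPROD = 1ull << 2;
static const feature_flags FEAT_RCPC    = 1ull << 3;

int
main (void)
{
  /* -march=armv8.2-a+sve: the architecture requests SVE.  */
  feature_flags arch_flags = FEAT_CRC | FEAT_SVE;
  /* -mcpu=neoverse-n1: the CPU has CRC, DotProd and RCpc but no SVE.  */
  feature_flags cpu_flags = FEAT_CRC | FEAT_DOTPROD | FEAT_RCPC;

  /* Bits set for the arch but clear for the CPU, i.e. features that
     honouring -mcpu would disable.  Non-zero means the warning fires.  */
  feature_flags disabled = ~cpu_flags & arch_flags;

  if (disabled != 0)
    printf ("warn: arch features %#llx would be disabled by -mcpu\n",
            (unsigned long long) disabled);
  else
    printf ("no warning: cpu features are a superset of the arch features\n");
  return 0;
}

With these made-up bits the armv8.2-a+sve / neoverse-n1 combination reports
SVE as the feature being disabled, while dropping FEAT_SVE from arch_flags
(the armv8.2-a+dotprod / neoverse-n1 case) stays silent, which matches the
behaviour shown in the examples below.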
As examples:

> aarch64-none-linux-gnu-gcc -march=armv8.2-a+sve -mcpu=neoverse-n1
cc1: warning: switch ‘-mcpu=neoverse-n1’ conflicts with ‘-march=armv8.2-a+sve’
switch and resulted in options +crc+sve+norcpc+nodotprod being added
        .arch armv8.2-a+crc+sve

> aarch64-none-linux-gnu-gcc -march=armv8.2-a -mcpu=neoverse-n1
> aarch64-none-linux-gnu-gcc -march=armv8.2-a+dotprod -mcpu=neoverse-n1
> aarch64-none-linux-gnu-gcc -march=armv8.2-a+dotprod -mcpu=neoverse-n2

The one remaining issue here is that if both -march and -mcpu are specified
we pick the -march.  This is not particularly obvious, and for the use case
to be more useful I think it makes sense to pick the CPU's arch instead.  I
did not make that change in this patch as it changes semantics.

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Note that I can't write a test for this because dg-warning expects warnings
to be at a particular line and doesn't support warnings at the "global"
level.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* config/aarch64/aarch64.cc (aarch64_override_options): Rework warnings.

--- inline copy of patch --
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index caf80d66b3a744cc93899645aa5f9374983cd3db..3afd222ad3bdcfb922cc010dcc0b138db29caf7f 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -16388,12 +16388,22 @@ aarch64_override_options (void)
   if (cpu && arch)
     {
       /* If both -mcpu and -march are specified, warn if they are not
-	 architecturally compatible and prefer the -march ISA flags.  */
-      if (arch->arch != cpu->arch)
-	{
-	  warning (0, "switch %<-mcpu=%s%> conflicts with %<-march=%s%> switch",
+	 feature compatible.  Feature compatible means that the inclusion of the
+	 cpu features would not end up disabling an architecture feature.  In
+	 other words the cpu features need to be a superset of the arch
+	 features, and if so prefer the -march ISA flags.  */
+      auto full_arch_flags = arch->flags | arch_isa;
+      auto full_cpu_flags = cpu->flags | cpu_isa;
+      if (~full_cpu_flags & full_arch_flags)
+	{
+	  std::string ext_diff
+	    = aarch64_get_extension_string_for_isa_flags (full_arch_flags,
+							  full_cpu_flags);
+	  warning (0, "switch %<-mcpu=%s%> conflicts with %<-march=%s%> switch "
+		      "and resulted in options %s being added",
 		   aarch64_cpu_string,
-		   aarch64_arch_string);
+		   aarch64_arch_string,
+		   ext_diff.c_str ());
 	}
 
       selected_arch = arch->arch;