Date: Fri, 26 Jul 2024 10:21:59 +0100
From: Tamar Christina
To: gcc-patches@gcc.gnu.org
Cc: nd@arm.com, Richard.Earnshaw@arm.com, Marcus.Shawcroft@arm.com, ktkachov@gcc.gnu.org, richard.sandiford@arm.com
Subject: [PATCH 8/8]AArch64: take gather/scatter decode overhead into account

Hi All,

Gathers and scatters are not usually beneficial when the loop count is small.
This is because there is not only a cost to executing them within the loop, but
also some cost to entering a loop that contains them.

As such, this patch models this overhead.
For generic tuning, however, we still prefer gathers/scatters when the loop
costs work out.

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.
This improves performance of Exchange in SPECCPU 2017 by 3% with SVE enabled.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* config/aarch64/aarch64-protos.h (struct sve_vec_cost): Add
	gather_load_x32_init_cost and gather_load_x64_init_cost.
	* config/aarch64/aarch64.cc (aarch64_vector_costs): Add
	m_sve_gather_scatter_x32 and m_sve_gather_scatter_x64.
	(aarch64_vector_costs::add_stmt_cost): Use them.
	(aarch64_vector_costs::finish_cost): Likewise.
	* config/aarch64/tuning_models/a64fx.h: Update.
	* config/aarch64/tuning_models/cortexx925.h: Update.
	* config/aarch64/tuning_models/generic.h: Update.
	* config/aarch64/tuning_models/generic_armv8_a.h: Update.
	* config/aarch64/tuning_models/generic_armv9_a.h: Update.
	* config/aarch64/tuning_models/neoverse512tvb.h: Update.
	* config/aarch64/tuning_models/neoversen2.h: Update.
	* config/aarch64/tuning_models/neoversen3.h: Update.
	* config/aarch64/tuning_models/neoversev1.h: Update.
	* config/aarch64/tuning_models/neoversev2.h: Update.
	* config/aarch64/tuning_models/neoversev3.h: Update.
	* config/aarch64/tuning_models/neoversev3ae.h: Update.

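To illustrate why a one-time entry cost matters mostly for low trip counts, here is a minimal standalone sketch. It is not GCC's vectorizer logic: `LoopCosts`, `scalar_total`, `vector_total` and `break_even` are hypothetical names, and the numbers fed to them are illustrative inputs only (14 and 42 are borrowed from the tuning tables in this patch, the scalar cost of 4 and the vectorization factor of 4 are made up).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified loop cost model (NOT GCC's vectorizer code).
struct LoopCosts {
  std::uint64_t scalar_iter_cost;  // cost of one scalar iteration
  std::uint64_t vector_iter_cost;  // cost of one vector iteration
  std::uint64_t vector_init_cost;  // one-time cost paid on entering the vector loop
  std::uint64_t vf;                // elements handled per vector iteration
};

// Total modelled cost of processing n elements with the scalar loop.
inline std::uint64_t scalar_total(const LoopCosts &c, std::uint64_t n) {
  return c.scalar_iter_cost * n;
}

// Total modelled cost of processing n elements with the vector loop,
// including the one-time entry cost.
inline std::uint64_t vector_total(const LoopCosts &c, std::uint64_t n) {
  assert(c.vf > 0);
  std::uint64_t iters = (n + c.vf - 1) / c.vf;  // round up: last iteration may be partial
  return c.vector_init_cost + c.vector_iter_cost * iters;
}

// Smallest element count in [1, limit] at which the vector loop is strictly
// cheaper than the scalar loop, or 0 if there is none in that range.
inline std::uint64_t break_even(const LoopCosts &c, std::uint64_t limit) {
  for (std::uint64_t n = 1; n <= limit; ++n)
    if (vector_total(c, n) < scalar_total(c, n))
      return n;
  return 0;
}
```

With these illustrative inputs, `break_even` moves from 4 elements (init cost 0) to 88 elements (init cost 42): the per-iteration costs are unchanged, but short loops no longer look profitable to vectorize with gathers.
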
---

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 42639e9efcf1e0f9362f759ae63a31b8eeb0d581..16eb8edab4d9fdfc6e3672c56ef5c9f6962d0c0b 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -262,6 +262,8 @@ struct sve_vec_cost : simd_vec_cost
                           unsigned int fadda_f64_cost,
                           unsigned int gather_load_x32_cost,
                           unsigned int gather_load_x64_cost,
+                          unsigned int gather_load_x32_init_cost,
+                          unsigned int gather_load_x64_init_cost,
                           unsigned int scatter_store_elt_cost)
     : simd_vec_cost (base),
       clast_cost (clast_cost),
@@ -270,6 +272,8 @@ struct sve_vec_cost : simd_vec_cost
       fadda_f64_cost (fadda_f64_cost),
       gather_load_x32_cost (gather_load_x32_cost),
       gather_load_x64_cost (gather_load_x64_cost),
+      gather_load_x32_init_cost (gather_load_x32_init_cost),
+      gather_load_x64_init_cost (gather_load_x64_init_cost),
       scatter_store_elt_cost (scatter_store_elt_cost)
   {}
 
@@ -289,6 +293,12 @@ struct sve_vec_cost : simd_vec_cost
   const int gather_load_x32_cost;
   const int gather_load_x64_cost;
 
+  /* Additional loop initialization cost of using a gather load instruction.  The x32
+     value is for loads of 32-bit elements and the x64 value is for loads of
+     64-bit elements.  */
+  const int gather_load_x32_init_cost;
+  const int gather_load_x64_init_cost;
+
   /* The per-element cost of a scatter store.  */
   const int scatter_store_elt_cost;
 };
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index eafa377cb095f49408d8a926fb49ce13e2155ba2..1e14c3c0d24b449d404724e436ba57e1996ec062 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -16227,6 +16227,12 @@ private:
      supported by Advanced SIMD and SVE2.  */
   bool m_has_avg = false;
 
+  /* This loop uses an SVE 32-bit element gather or scatter operation.  */
+  bool m_sve_gather_scatter_x32 = false;
+
+  /* This loop uses an SVE 64-bit element gather or scatter operation.  */
+  bool m_sve_gather_scatter_x64 = false;
+
   /* True if the vector body contains a store to a decl and if the
      function is known to have a vld1 from the same decl.
 
@@ -17291,6 +17297,17 @@ aarch64_vector_costs::add_stmt_cost (int count, vect_cost_for_stmt kind,
         stmt_cost = aarch64_detect_vector_stmt_subtype (m_vinfo, kind,
                                                         stmt_info, vectype,
                                                         where, stmt_cost);
+
+      /* Check if we've seen an SVE gather/scatter operation and which size.  */
+      if (kind == scalar_load
+          && aarch64_sve_mode_p (TYPE_MODE (vectype))
+          && STMT_VINFO_MEMORY_ACCESS_TYPE (stmt_info) == VMAT_GATHER_SCATTER)
+        {
+          if (GET_MODE_UNIT_BITSIZE (TYPE_MODE (vectype)) == 64)
+            m_sve_gather_scatter_x64 = true;
+          else
+            m_sve_gather_scatter_x32 = true;
+        }
     }
 
   /* Do any SVE-specific adjustments to the cost.  */
@@ -17676,6 +17693,18 @@ aarch64_vector_costs::finish_cost (const vector_costs *uncast_scalar_costs)
       m_costs[vect_body] = adjust_body_cost (loop_vinfo, scalar_costs,
                                              m_costs[vect_body]);
       m_suggested_unroll_factor = determine_suggested_unroll_factor ();
+
+      /* For gather and scatters there's an additional overhead for the first
+         iteration.  For low count loops they're not beneficial so model the
+         overhead as loop prologue costs.  */
+      if (m_sve_gather_scatter_x32 || m_sve_gather_scatter_x64)
+        {
+          const sve_vec_cost *sve_costs = aarch64_tune_params.vec_costs->sve;
+          if (m_sve_gather_scatter_x32)
+            m_costs[vect_prologue] += sve_costs->gather_load_x32_init_cost;
+          else
+            m_costs[vect_prologue] += sve_costs->gather_load_x64_init_cost;
+        }
     }
 
   /* Apply the heuristic described above m_stp_sequence_cost.  Prefer
diff --git a/gcc/config/aarch64/tuning_models/a64fx.h b/gcc/config/aarch64/tuning_models/a64fx.h
index 6091289d4c3c66f01d7e4dbf97a85c1f8c40bb0b..378a1b3889ee265859786c1ff6525fce2305b615 100644
--- a/gcc/config/aarch64/tuning_models/a64fx.h
+++ b/gcc/config/aarch64/tuning_models/a64fx.h
@@ -104,6 +104,8 @@ static const sve_vec_cost a64fx_sve_vector_cost =
   13, /* fadda_f64_cost  */
   64, /* gather_load_x32_cost  */
   32, /* gather_load_x64_cost  */
+  0, /* gather_load_x32_init_cost  */
+  0, /* gather_load_x64_init_cost  */
   1 /* scatter_store_elt_cost  */
 };
 
diff --git a/gcc/config/aarch64/tuning_models/cortexx925.h b/gcc/config/aarch64/tuning_models/cortexx925.h
index fb95e87526985b02410d54a5a3ec8539c1b0ba6d..c4206018a3ff707f89ff3300700ec7dc2a5bc6b0 100644
--- a/gcc/config/aarch64/tuning_models/cortexx925.h
+++ b/gcc/config/aarch64/tuning_models/cortexx925.h
@@ -135,6 +135,8 @@ static const sve_vec_cost cortexx925_sve_vector_cost =
      operation more than a 64-bit gather.  */
   14, /* gather_load_x32_cost  */
   12, /* gather_load_x64_cost  */
+  42, /* gather_load_x32_init_cost  */
+  24, /* gather_load_x64_init_cost  */
   1 /* scatter_store_elt_cost  */
 };
 
diff --git a/gcc/config/aarch64/tuning_models/generic.h b/gcc/config/aarch64/tuning_models/generic.h
index 2b1f68b3052117814161a32f426422736ad6462b..101969bdbb9ccf7eafbd9a1cd6e25f0b584fb261 100644
--- a/gcc/config/aarch64/tuning_models/generic.h
+++ b/gcc/config/aarch64/tuning_models/generic.h
@@ -105,6 +105,8 @@ static const sve_vec_cost generic_sve_vector_cost =
   2, /* fadda_f64_cost  */
   4, /* gather_load_x32_cost  */
   2, /* gather_load_x64_cost  */
+  12, /* gather_load_x32_init_cost  */
+  4, /* gather_load_x64_init_cost  */
   1 /* scatter_store_elt_cost  */
 };
 
diff --git a/gcc/config/aarch64/tuning_models/generic_armv8_a.h b/gcc/config/aarch64/tuning_models/generic_armv8_a.h
index b38b9a8c5cad7d12aa38afdb610a14a25e755010..b5088afe068aa4be7f9dd614cfdd2a51fa96e524 100644
--- a/gcc/config/aarch64/tuning_models/generic_armv8_a.h
+++ b/gcc/config/aarch64/tuning_models/generic_armv8_a.h
@@ -106,6 +106,8 @@ static const sve_vec_cost generic_armv8_a_sve_vector_cost =
   2, /* fadda_f64_cost  */
   4, /* gather_load_x32_cost  */
   2, /* gather_load_x64_cost  */
+  12, /* gather_load_x32_init_cost  */
+  4, /* gather_load_x64_init_cost  */
   1 /* scatter_store_elt_cost  */
 };
 
diff --git a/gcc/config/aarch64/tuning_models/generic_armv9_a.h b/gcc/config/aarch64/tuning_models/generic_armv9_a.h
index b39a0c73db910888168790888d24ddf4406bf1ee..fd72de542862909ccb9a9260a16bb01935d97f36 100644
--- a/gcc/config/aarch64/tuning_models/generic_armv9_a.h
+++ b/gcc/config/aarch64/tuning_models/generic_armv9_a.h
@@ -136,6 +136,8 @@ static const sve_vec_cost generic_armv9_a_sve_vector_cost =
      operation more than a 64-bit gather.  */
   14, /* gather_load_x32_cost  */
   12, /* gather_load_x64_cost  */
+  42, /* gather_load_x32_init_cost  */
+  24, /* gather_load_x64_init_cost  */
   3 /* scatter_store_elt_cost  */
 };
 
diff --git a/gcc/config/aarch64/tuning_models/neoverse512tvb.h b/gcc/config/aarch64/tuning_models/neoverse512tvb.h
index 825c6a64990b72cda3641737957dc94d75db1509..d2a0b647791de8fca6d7684849d2ab1e9104b045 100644
--- a/gcc/config/aarch64/tuning_models/neoverse512tvb.h
+++ b/gcc/config/aarch64/tuning_models/neoverse512tvb.h
@@ -79,6 +79,8 @@ static const sve_vec_cost neoverse512tvb_sve_vector_cost =
      operation more than a 64-bit gather.  */
   14, /* gather_load_x32_cost  */
   12, /* gather_load_x64_cost  */
+  42, /* gather_load_x32_init_cost  */
+  24, /* gather_load_x64_init_cost  */
   3 /* scatter_store_elt_cost  */
 };
 
diff --git a/gcc/config/aarch64/tuning_models/neoversen2.h b/gcc/config/aarch64/tuning_models/neoversen2.h
index 3430eb9c06819e00ab38966bb960bd6525ff2b5c..00d2c12e739ffd371dd4720826894e980d577ca7 100644
--- a/gcc/config/aarch64/tuning_models/neoversen2.h
+++ b/gcc/config/aarch64/tuning_models/neoversen2.h
@@ -135,6 +135,8 @@ static const sve_vec_cost neoversen2_sve_vector_cost =
      operation more than a 64-bit gather.  */
   14, /* gather_load_x32_cost  */
   12, /* gather_load_x64_cost  */
+  42, /* gather_load_x32_init_cost  */
+  24, /* gather_load_x64_init_cost  */
   3 /* scatter_store_elt_cost  */
 };
 
diff --git a/gcc/config/aarch64/tuning_models/neoversen3.h b/gcc/config/aarch64/tuning_models/neoversen3.h
index 7438e39a4bbe43de624b63fdd20d3fde9dfb6fc9..fc4333ffdeaef0115ac162e2da9d8d548bacf576 100644
--- a/gcc/config/aarch64/tuning_models/neoversen3.h
+++ b/gcc/config/aarch64/tuning_models/neoversen3.h
@@ -135,6 +135,8 @@ static const sve_vec_cost neoversen3_sve_vector_cost =
      operation more than a 64-bit gather.  */
   14, /* gather_load_x32_cost  */
   12, /* gather_load_x64_cost  */
+  42, /* gather_load_x32_init_cost  */
+  24, /* gather_load_x64_init_cost  */
   1 /* scatter_store_elt_cost  */
 };
 
diff --git a/gcc/config/aarch64/tuning_models/neoversev1.h b/gcc/config/aarch64/tuning_models/neoversev1.h
index 0fc41ce6a41b3135fa06d2bda1f517fdf4f8dbcf..705ed025730f6683109a4796c6eefa55b437cec9 100644
--- a/gcc/config/aarch64/tuning_models/neoversev1.h
+++ b/gcc/config/aarch64/tuning_models/neoversev1.h
@@ -126,6 +126,8 @@ static const sve_vec_cost neoversev1_sve_vector_cost =
   8, /* fadda_f64_cost  */
   32, /* gather_load_x32_cost  */
   16, /* gather_load_x64_cost  */
+  96, /* gather_load_x32_init_cost  */
+  32, /* gather_load_x64_init_cost  */
   3 /* scatter_store_elt_cost  */
 };
 
diff --git a/gcc/config/aarch64/tuning_models/neoversev2.h b/gcc/config/aarch64/tuning_models/neoversev2.h
index cca459e32c1384f57f8345d86b42b7814ae44115..680feeb9e4ee7bf21d5a258d83e522e079fdc156 100644
--- a/gcc/config/aarch64/tuning_models/neoversev2.h
+++ b/gcc/config/aarch64/tuning_models/neoversev2.h
@@ -135,6 +135,8 @@ static const sve_vec_cost neoversev2_sve_vector_cost =
      operation more than a 64-bit gather.  */
   14, /* gather_load_x32_cost  */
   12, /* gather_load_x64_cost  */
+  42, /* gather_load_x32_init_cost  */
+  24, /* gather_load_x64_init_cost  */
   3 /* scatter_store_elt_cost  */
 };
 
diff --git a/gcc/config/aarch64/tuning_models/neoversev3.h b/gcc/config/aarch64/tuning_models/neoversev3.h
index 3daa3d2365c817d03c6c0d5e66fe832620d8fb2c..812c6ad304e8d4c503dcd444437bf6528d6f3176 100644
--- a/gcc/config/aarch64/tuning_models/neoversev3.h
+++ b/gcc/config/aarch64/tuning_models/neoversev3.h
@@ -135,6 +135,8 @@ static const sve_vec_cost neoversev3_sve_vector_cost =
      operation more than a 64-bit gather.  */
   14, /* gather_load_x32_cost  */
   12, /* gather_load_x64_cost  */
+  42, /* gather_load_x32_init_cost  */
+  24, /* gather_load_x64_init_cost  */
   1 /* scatter_store_elt_cost  */
 };
 
diff --git a/gcc/config/aarch64/tuning_models/neoversev3ae.h b/gcc/config/aarch64/tuning_models/neoversev3ae.h
index 29c6f22e941b26ee333c87b9fac22aea86625e97..280b5abb27d3c9f404d5f96f14d0cba1e13b9bd1 100644
--- a/gcc/config/aarch64/tuning_models/neoversev3ae.h
+++ b/gcc/config/aarch64/tuning_models/neoversev3ae.h
@@ -135,6 +135,8 @@ static const sve_vec_cost neoversev3ae_sve_vector_cost =
      operation more than a 64-bit gather.  */
   14, /* gather_load_x32_cost  */
   12, /* gather_load_x64_cost  */
+  42, /* gather_load_x32_init_cost  */
+  24, /* gather_load_x64_init_cost  */
   1 /* scatter_store_elt_cost  */
 };
 
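
For readers who want to reason about the flag handling in add_stmt_cost and finish_cost outside of GCC, the following is a minimal standalone sketch of how the diff reads. `SveInitCosts` and `GatherScatterTracker` are hypothetical stand-ins for the two new `sve_vec_cost` fields and the two new `m_sve_gather_scatter_*` members; none of the GCC types or macros are used, and the sketch assumes the element width is already known in bits.

```cpp
#include <cassert>

// Stand-in for the two fields this patch adds to sve_vec_cost.
struct SveInitCosts {
  int gather_load_x32_init_cost;
  int gather_load_x64_init_cost;
};

// Hypothetical tracker mirroring m_sve_gather_scatter_x32 / _x64.
class GatherScatterTracker {
public:
  // Record one SVE gather/scatter access with the given element width.
  // As in the diff, anything that is not 64-bit is counted as x32.
  void note_access(unsigned elt_bits) {
    assert(elt_bits != 0 && elt_bits % 8 == 0);  // widths are whole bytes
    if (elt_bits == 64)
      seen_x64_ = true;
    else
      seen_x32_ = true;
  }

  // Extra prologue cost, charged once per loop no matter how many
  // gathers/scatters were recorded.  As in the diff, the x32 cost is
  // used whenever an x32 access was seen, even if x64 was seen too.
  int prologue_cost(const SveInitCosts &costs) const {
    if (!seen_x32_ && !seen_x64_)
      return 0;
    return seen_x32_ ? costs.gather_load_x32_init_cost
                     : costs.gather_load_x64_init_cost;
  }

private:
  bool seen_x32_ = false;
  bool seen_x64_ = false;
};
```

Two properties are worth noting when reviewing: the init cost is added once per loop rather than once per gather, and a loop mixing 32-bit and 64-bit gathers is charged only the x32 value.
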