From patchwork Fri Nov 10 17:33:14 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Wilco Dijkstra
X-Patchwork-Id: 1862476
From: Wilco Dijkstra
To: 'GNU C Library'
CC: Szabolcs Nagy
Subject: [PATCH 1/3] AArch64: Cleanup emag memset
Date: Fri, 10 Nov 2023 17:33:14 +0000
List-Id: Libc-alpha mailing list

Cleanup emag memset - merge the memset_base64.S file, remove the ZVA code.

OK for commit?

Reviewed-by: Adhemerval Zanella

diff --git a/sysdeps/aarch64/multiarch/ifunc-impl-list.c b/sysdeps/aarch64/multiarch/ifunc-impl-list.c
index 836e8317a5d3b652134d199cf685499983b1a3fc..3596d3c8d3403b4ea07d80d9a8877e2908a9883e 100644
--- a/sysdeps/aarch64/multiarch/ifunc-impl-list.c
+++ b/sysdeps/aarch64/multiarch/ifunc-impl-list.c
@@ -57,7 +57,7 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
               /* Enable this on non-falkor processors too so that other cores
                  can do a comparative analysis with __memset_generic.  */
               IFUNC_IMPL_ADD (array, i, memset, (zva_size == 64), __memset_falkor)
-              IFUNC_IMPL_ADD (array, i, memset, (zva_size == 64), __memset_emag)
+              IFUNC_IMPL_ADD (array, i, memset, 1, __memset_emag)
               IFUNC_IMPL_ADD (array, i, memset, 1, __memset_kunpeng)
 #if HAVE_AARCH64_SVE_ASM
               IFUNC_IMPL_ADD (array, i, memset, sve && zva_size == 256, __memset_a64fx)
diff --git a/sysdeps/aarch64/multiarch/memset.c b/sysdeps/aarch64/multiarch/memset.c
index 23fc66e15879847557b0e4f6941f03bc7ac5cab9..9193b197ddc3a647768184a6a639d6635cfea96e 100644
--- a/sysdeps/aarch64/multiarch/memset.c
+++ b/sysdeps/aarch64/multiarch/memset.c
@@ -56,7 +56,7 @@ select_memset_ifunc (void)
   if ((IS_FALKOR (midr) || IS_PHECDA (midr)) && zva_size == 64)
     return __memset_falkor;
 
-  if (IS_EMAG (midr) && zva_size == 64)
+  if (IS_EMAG (midr))
     return __memset_emag;
 
   return __memset_generic;
diff --git a/sysdeps/aarch64/multiarch/memset_base64.S b/sysdeps/aarch64/multiarch/memset_base64.S
deleted file mode 100644
index 0e8f709fa58478d6e9d62020c576bb9be108866c..0000000000000000000000000000000000000000
--- a/sysdeps/aarch64/multiarch/memset_base64.S
+++ /dev/null
@@ -1,185 +0,0 @@
-/* Copyright (C) 2018-2023 Free Software Foundation, Inc.
-
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library.  If not, see
-   .  */
-
-#include
-#include "memset-reg.h"
-
-#ifndef MEMSET
-# define MEMSET __memset_base64
-#endif
-
-/* To disable DC ZVA, set this threshold to 0. */
-#ifndef DC_ZVA_THRESHOLD
-# define DC_ZVA_THRESHOLD 512
-#endif
-
-/* Assumptions:
- *
- * ARMv8-a, AArch64, unaligned accesses
- *
- */
-
-ENTRY (MEMSET)
-
-	PTR_ARG (0)
-	SIZE_ARG (2)
-
-	bfi	valw, valw, 8, 8
-	bfi	valw, valw, 16, 16
-	bfi	val, val, 32, 32
-
-	add	dstend, dstin, count
-
-	cmp	count, 96
-	b.hi	L(set_long)
-	cmp	count, 16
-	b.hs	L(set_medium)
-
-	/* Set 0..15 bytes.  */
-	tbz	count, 3, 1f
-	str	val, [dstin]
-	str	val, [dstend, -8]
-	ret
-
-	.p2align 3
-1:	tbz	count, 2, 2f
-	str	valw, [dstin]
-	str	valw, [dstend, -4]
-	ret
-2:	cbz	count, 3f
-	strb	valw, [dstin]
-	tbz	count, 1, 3f
-	strh	valw, [dstend, -2]
-3:	ret
-
-	.p2align 3
-	/* Set 16..96 bytes.  */
-L(set_medium):
-	stp	val, val, [dstin]
-	tbnz	count, 6, L(set96)
-	stp	val, val, [dstend, -16]
-	tbz	count, 5, 1f
-	stp	val, val, [dstin, 16]
-	stp	val, val, [dstend, -32]
-1:	ret
-
-	.p2align 4
-	/* Set 64..96 bytes.  Write 64 bytes from the start and
-	   32 bytes from the end.  */
-L(set96):
-	stp	val, val, [dstin, 16]
-	stp	val, val, [dstin, 32]
-	stp	val, val, [dstin, 48]
-	stp	val, val, [dstend, -32]
-	stp	val, val, [dstend, -16]
-	ret
-
-	.p2align 4
-L(set_long):
-	stp	val, val, [dstin]
-	bic	dst, dstin, 15
-#if DC_ZVA_THRESHOLD
-	cmp	count, DC_ZVA_THRESHOLD
-	ccmp	val, 0, 0, cs
-	b.eq	L(zva_64)
-#endif
-	/* Small-size or non-zero memset does not use DC ZVA. */
-	sub	count, dstend, dst
-
-	/*
-	 * Adjust count and bias for loop. By subtracting extra 1 from count,
-	 * it is easy to use tbz instruction to check whether loop tailing
-	 * count is less than 33 bytes, so as to bypass 2 unnecessary stps.
-	 */
-	sub	count, count, 64+16+1
-
-#if DC_ZVA_THRESHOLD
-	/* Align loop on 16-byte boundary, this might be friendly to i-cache. */
-	nop
-#endif
-
-1:	stp	val, val, [dst, 16]
-	stp	val, val, [dst, 32]
-	stp	val, val, [dst, 48]
-	stp	val, val, [dst, 64]!
-	subs	count, count, 64
-	b.hs	1b
-
-	tbz	count, 5, 1f	/* Remaining count is less than 33 bytes? */
-	stp	val, val, [dst, 16]
-	stp	val, val, [dst, 32]
-1:	stp	val, val, [dstend, -32]
-	stp	val, val, [dstend, -16]
-	ret
-
-#if DC_ZVA_THRESHOLD
-	.p2align 3
-L(zva_64):
-	stp	val, val, [dst, 16]
-	stp	val, val, [dst, 32]
-	stp	val, val, [dst, 48]
-	bic	dst, dst, 63
-
-	/*
-	 * Previous memory writes might cross cache line boundary, and cause
-	 * cache line partially dirty. Zeroing this kind of cache line using
-	 * DC ZVA will incur extra cost, for it requires loading untouched
-	 * part of the line from memory before zeoring.
-	 *
-	 * So, write the first 64 byte aligned block using stp to force
-	 * fully dirty cache line.
-	 */
-	stp	val, val, [dst, 64]
-	stp	val, val, [dst, 80]
-	stp	val, val, [dst, 96]
-	stp	val, val, [dst, 112]
-
-	sub	count, dstend, dst
-	/*
-	 * Adjust count and bias for loop. By subtracting extra 1 from count,
-	 * it is easy to use tbz instruction to check whether loop tailing
-	 * count is less than 33 bytes, so as to bypass 2 unnecessary stps.
-	 */
-	sub	count, count, 128+64+64+1
-	add	dst, dst, 128
-	nop
-
-	/* DC ZVA sets 64 bytes each time. */
-1:	dc	zva, dst
-	add	dst, dst, 64
-	subs	count, count, 64
-	b.hs	1b
-
-	/*
-	 * Write the last 64 byte aligned block using stp to force fully
-	 * dirty cache line.
-	 */
-	stp	val, val, [dst, 0]
-	stp	val, val, [dst, 16]
-	stp	val, val, [dst, 32]
-	stp	val, val, [dst, 48]
-
-	tbz	count, 5, 1f	/* Remaining count is less than 33 bytes? */
-	stp	val, val, [dst, 64]
-	stp	val, val, [dst, 80]
-1:	stp	val, val, [dstend, -32]
-	stp	val, val, [dstend, -16]
-	ret
-#endif
-
-END (MEMSET)
diff --git a/sysdeps/aarch64/multiarch/memset_emag.S b/sysdeps/aarch64/multiarch/memset_emag.S
index 6fecad4fae699f9967da94ddc88867afd5c59414..bbfa815925899149e2313a9317380fa9fd089abd 100644
--- a/sysdeps/aarch64/multiarch/memset_emag.S
+++ b/sysdeps/aarch64/multiarch/memset_emag.S
@@ -18,17 +18,95 @@
    .  */
 
 #include
+#include "memset-reg.h"
 
-#define MEMSET __memset_emag
-
-/*
- * Using DC ZVA to zero memory does not produce better performance if
- * memory size is not very large, especially when there are multiple
- * processes/threads contending memory/cache. Here we set threshold to
- * zero to disable using DC ZVA, which is good for multi-process/thread
- * workloads.
+/* Assumptions:
+ *
+ * ARMv8-a, AArch64, unaligned accesses
+ *
  */
 
-#define DC_ZVA_THRESHOLD 0
+ENTRY (__memset_emag)
+
+	PTR_ARG (0)
+	SIZE_ARG (2)
+
+	bfi	valw, valw, 8, 8
+	bfi	valw, valw, 16, 16
+	bfi	val, val, 32, 32
+
+	add	dstend, dstin, count
+
+	cmp	count, 96
+	b.hi	L(set_long)
+	cmp	count, 16
+	b.hs	L(set_medium)
+
+	/* Set 0..15 bytes.  */
+	tbz	count, 3, 1f
+	str	val, [dstin]
+	str	val, [dstend, -8]
+	ret
+
+	.p2align 3
+1:	tbz	count, 2, 2f
+	str	valw, [dstin]
+	str	valw, [dstend, -4]
+	ret
+2:	cbz	count, 3f
+	strb	valw, [dstin]
+	tbz	count, 1, 3f
+	strh	valw, [dstend, -2]
+3:	ret
+
+	.p2align 3
+	/* Set 16..96 bytes.  */
+L(set_medium):
+	stp	val, val, [dstin]
+	tbnz	count, 6, L(set96)
+	stp	val, val, [dstend, -16]
+	tbz	count, 5, 1f
+	stp	val, val, [dstin, 16]
+	stp	val, val, [dstend, -32]
+1:	ret
+
+	.p2align 4
+	/* Set 64..96 bytes.  Write 64 bytes from the start and
+	   32 bytes from the end.  */
+L(set96):
+	stp	val, val, [dstin, 16]
+	stp	val, val, [dstin, 32]
+	stp	val, val, [dstin, 48]
+	stp	val, val, [dstend, -32]
+	stp	val, val, [dstend, -16]
+	ret
+
+	.p2align 4
+L(set_long):
+	stp	val, val, [dstin]
+	bic	dst, dstin, 15
+	/* Small-size or non-zero memset does not use DC ZVA. */
+	sub	count, dstend, dst
+
+	/*
+	 * Adjust count and bias for loop. By subtracting extra 1 from count,
+	 * it is easy to use tbz instruction to check whether loop tailing
+	 * count is less than 33 bytes, so as to bypass 2 unnecessary stps.
+	 */
+	sub	count, count, 64+16+1
+
+1:	stp	val, val, [dst, 16]
+	stp	val, val, [dst, 32]
+	stp	val, val, [dst, 48]
+	stp	val, val, [dst, 64]!
+	subs	count, count, 64
+	b.hs	1b
+
+	tbz	count, 5, 1f	/* Remaining count is less than 33 bytes? */
+	stp	val, val, [dst, 16]
+	stp	val, val, [dst, 32]
+1:	stp	val, val, [dstend, -32]
+	stp	val, val, [dstend, -16]
+	ret
 
-#include "./memset_base64.S"
+END (__memset_emag)

From patchwork Fri Nov 10 17:35:19 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Wilco Dijkstra
X-Patchwork-Id: 1862477
From: Wilco Dijkstra
To: 'GNU C Library'
CC: Szabolcs Nagy
Subject: [PATCH 2/3] AArch64: Add memset_zva64
Date: Fri, 10 Nov 2023 17:35:19 +0000
List-Id: Libc-alpha mailing list

Add a specialized memset for the common ZVA size of 64.
Since the code is identical to __memset_falkor, remove the latter.

OK for commit?

Reviewed-by: Adhemerval Zanella

diff --git a/sysdeps/aarch64/memset.S b/sysdeps/aarch64/memset.S
index bf3cf85c8a95fd8c03ae13c4173fe507040ee8cd..bbfb7184c3e4277f59178ccf4f9b92814dd7a48d 100644
--- a/sysdeps/aarch64/memset.S
+++ b/sysdeps/aarch64/memset.S
@@ -101,19 +101,19 @@ L(tail64):
 	ret
 
 L(try_zva):
-#ifdef ZVA_MACRO
-	zva_macro
-#else
+#ifndef ZVA64_ONLY
 	.p2align 3
 	mrs tmp1, dczid_el0
 	tbnz tmp1w, 4, L(no_zva)
 	and tmp1w, tmp1w, 15
 	cmp tmp1w, 4 /* ZVA size is 64 bytes. */
 	b.ne L(zva_128)
-
+	nop
+#endif
 	/* Write the first and last 64 byte aligned block using stp rather
 	   than using DC ZVA. This is faster on some cores.
 	 */
+	.p2align 4
 L(zva_64):
 	str q0, [dst, 16]
 	stp q0, q0, [dst, 32]
@@ -123,7 +123,6 @@ L(zva_64):
 	sub count, dstend, dst /* Count is now 128 too large. */
 	sub count, count, 128+64+64 /* Adjust count and bias for loop. */
 	add dst, dst, 128
-	nop
 1:	dc zva, dst
 	add dst, dst, 64
 	subs count, count, 64
@@ -134,6 +133,7 @@ L(zva_64):
 	stp q0, q0, [dstend, -32]
 	ret
 
+#ifndef ZVA64_ONLY
 	.p2align 3
 L(zva_128):
 	cmp tmp1w, 5 /* ZVA size is 128 bytes. */
diff --git a/sysdeps/aarch64/multiarch/Makefile b/sysdeps/aarch64/multiarch/Makefile
index a1a4de3cd93c48db6e47eebc9c111186efca53fb..171ca5e4cf9a87fc7df5896f21c2e5b94ea218ba 100644
--- a/sysdeps/aarch64/multiarch/Makefile
+++ b/sysdeps/aarch64/multiarch/Makefile
@@ -12,10 +12,10 @@ sysdep_routines += \
   memmove_mops \
   memset_a64fx \
   memset_emag \
-  memset_falkor \
   memset_generic \
   memset_kunpeng \
   memset_mops \
+  memset_zva64 \
   strlen_asimd \
   strlen_generic \
 # sysdep_routines
diff --git a/sysdeps/aarch64/multiarch/ifunc-impl-list.c b/sysdeps/aarch64/multiarch/ifunc-impl-list.c
index 3596d3c8d3403b4ea07d80d9a8877e2908a9883e..fdd9ea92463123df213dec27f6f0598f8ce54d6e 100644
--- a/sysdeps/aarch64/multiarch/ifunc-impl-list.c
+++ b/sysdeps/aarch64/multiarch/ifunc-impl-list.c
@@ -54,9 +54,7 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
 	      IFUNC_IMPL_ADD (array, i, memmove, mops, __memmove_mops)
 	      IFUNC_IMPL_ADD (array, i, memmove, 1, __memmove_generic))
   IFUNC_IMPL (i, name, memset,
-	      /* Enable this on non-falkor processors too so that other cores
-		 can do a comparative analysis with __memset_generic. */
-	      IFUNC_IMPL_ADD (array, i, memset, (zva_size == 64), __memset_falkor)
+	      IFUNC_IMPL_ADD (array, i, memset, (zva_size == 64), __memset_zva64)
 	      IFUNC_IMPL_ADD (array, i, memset, 1, __memset_emag)
 	      IFUNC_IMPL_ADD (array, i, memset, 1, __memset_kunpeng)
 #if HAVE_AARCH64_SVE_ASM
diff --git a/sysdeps/aarch64/multiarch/memset.c b/sysdeps/aarch64/multiarch/memset.c
index 9193b197ddc3a647768184a6a639d6635cfea96e..6deb6865e5154f129922dca673cf069f72f46d79 100644
--- a/sysdeps/aarch64/multiarch/memset.c
+++ b/sysdeps/aarch64/multiarch/memset.c
@@ -28,7 +28,7 @@
 
 extern __typeof (__redirect_memset) __libc_memset;
 
-extern __typeof (__redirect_memset) __memset_falkor attribute_hidden;
+extern __typeof (__redirect_memset) __memset_zva64 attribute_hidden;
 extern __typeof (__redirect_memset) __memset_emag attribute_hidden;
 extern __typeof (__redirect_memset) __memset_kunpeng attribute_hidden;
 extern __typeof (__redirect_memset) __memset_a64fx attribute_hidden;
@@ -47,18 +47,17 @@ select_memset_ifunc (void)
     {
       if (IS_A64FX (midr) && zva_size == 256)
 	return __memset_a64fx;
-      return __memset_generic;
     }
 
   if (IS_KUNPENG920 (midr))
     return __memset_kunpeng;
 
-  if ((IS_FALKOR (midr) || IS_PHECDA (midr)) && zva_size == 64)
-    return __memset_falkor;
-
   if (IS_EMAG (midr))
     return __memset_emag;
 
+  if (zva_size == 64)
+    return __memset_zva64;
+
   return __memset_generic;
 }
 
diff --git a/sysdeps/aarch64/multiarch/memset_falkor.S b/sysdeps/aarch64/multiarch/memset_falkor.S
deleted file mode 100644
index c6946a8072ce60099f9c3da0cf4ca54785e6a520..0000000000000000000000000000000000000000
--- a/sysdeps/aarch64/multiarch/memset_falkor.S
+++ /dev/null
@@ -1,54 +0,0 @@
-/* Memset for falkor.
-   Copyright (C) 2017-2023 Free Software Foundation, Inc.
-
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library. If not, see
-   . */
-
-#include
-#include
-
-/* Reading dczid_el0 is expensive on falkor so move it into the ifunc
-   resolver and assume ZVA size of 64 bytes. The IFUNC resolver takes care to
-   use this function only when ZVA is enabled. */
-
-#if IS_IN (libc)
-.macro zva_macro
-	.p2align 4
-	/* Write the first and last 64 byte aligned block using stp rather
-	   than using DC ZVA. This is faster on some cores. */
-	str q0, [dst, 16]
-	stp q0, q0, [dst, 32]
-	bic dst, dst, 63
-	stp q0, q0, [dst, 64]
-	stp q0, q0, [dst, 96]
-	sub count, dstend, dst /* Count is now 128 too large. */
-	sub count, count, 128+64+64 /* Adjust count and bias for loop. */
-	add dst, dst, 128
-1:	dc zva, dst
-	add dst, dst, 64
-	subs count, count, 64
-	b.hi 1b
-	stp q0, q0, [dst, 0]
-	stp q0, q0, [dst, 32]
-	stp q0, q0, [dstend, -64]
-	stp q0, q0, [dstend, -32]
-	ret
-.endm
-
-# define ZVA_MACRO zva_macro
-# define MEMSET __memset_falkor
-# include
-#endif
diff --git a/sysdeps/aarch64/multiarch/memset_zva64.S b/sysdeps/aarch64/multiarch/memset_zva64.S
new file mode 100644
index 0000000000000000000000000000000000000000..13f45fd3d882c756f18a1679d758e2eb688f9c3d
--- /dev/null
+++ b/sysdeps/aarch64/multiarch/memset_zva64.S
@@ -0,0 +1,27 @@
+/* Optimized memset for zva size = 64.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library. If not, see
+   . */
+
+#include
+
+#define ZVA64_ONLY 1
+#define MEMSET __memset_zva64
+#undef libc_hidden_builtin_def
+#define libc_hidden_builtin_def(X)
+
+#include "../memset.S"

From patchwork Fri Nov 10 17:37:01 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Wilco Dijkstra
X-Patchwork-Id: 1862478
From: Wilco Dijkstra
To: 'GNU C Library'
CC: Szabolcs Nagy
Subject: [PATCH 3/3] AArch64: Remove Falkor memcpy
Date: Fri, 10 Nov 2023 17:37:01 +0000
List-Id: Libc-alpha mailing list

The latest implementations of memcpy are actually faster than the
Falkor implementations [1], so remove the falkor/phecda ifuncs for memcpy
and the now unused IS_FALKOR/IS_PHECDA defines.

Passes regress on AArch64. OK for commit?

[1] https://sourceware.org/pipermail/libc-alpha/2022-December/144227.html

Reviewed-by: Adhemerval Zanella

diff --git a/manual/tunables.texi b/manual/tunables.texi
index 776fd93fd99741ad4ee99e6553e819538c851e29..d9669ba92df2ac02264009c15626abe11e7b12d8 100644
--- a/manual/tunables.texi
+++ b/manual/tunables.texi
@@ -532,7 +532,7 @@ This tunable is specific to powerpc, powerpc64 and powerpc64le.
 
 @deftp Tunable glibc.cpu.name
 The @code{glibc.cpu.name=xxx} tunable allows the user to tell @theglibc{} to
 assume that the CPU is @code{xxx} where xxx may have one of these values:
-@code{generic}, @code{falkor}, @code{thunderxt88}, @code{thunderx2t99},
+@code{generic}, @code{thunderxt88}, @code{thunderx2t99},
 @code{thunderx2t99p1}, @code{ares}, @code{emag}, @code{kunpeng},
 @code{a64fx}.
 
diff --git a/sysdeps/aarch64/multiarch/Makefile b/sysdeps/aarch64/multiarch/Makefile
index 171ca5e4cf9a87fc7df5896f21c2e5b94ea218ba..e4720b746859f51502e070ba0c2f308072a49740 100644
--- a/sysdeps/aarch64/multiarch/Makefile
+++ b/sysdeps/aarch64/multiarch/Makefile
@@ -3,7 +3,6 @@ sysdep_routines += \
   memchr_generic \
   memchr_nosimd \
   memcpy_a64fx \
-  memcpy_falkor \
   memcpy_generic \
   memcpy_mops \
   memcpy_sve \
diff --git a/sysdeps/aarch64/multiarch/ifunc-impl-list.c b/sysdeps/aarch64/multiarch/ifunc-impl-list.c
index fdd9ea92463123df213dec27f6f0598f8ce54d6e..73038ac8102b1ef8a58a51ca19638720f26e6a66 100644
--- a/sysdeps/aarch64/multiarch/ifunc-impl-list.c
+++ b/sysdeps/aarch64/multiarch/ifunc-impl-list.c
@@ -36,7 +36,6 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
   IFUNC_IMPL (i, name, memcpy,
 	      IFUNC_IMPL_ADD (array, i, memcpy, 1, __memcpy_thunderx)
 	      IFUNC_IMPL_ADD (array, i, memcpy, !bti, __memcpy_thunderx2)
-	      IFUNC_IMPL_ADD (array, i, memcpy, 1, __memcpy_falkor)
 #if HAVE_AARCH64_SVE_ASM
 	      IFUNC_IMPL_ADD (array, i, memcpy, sve, __memcpy_a64fx)
 	      IFUNC_IMPL_ADD (array, i, memcpy, sve, __memcpy_sve)
@@ -46,7 +45,6 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
   IFUNC_IMPL (i, name, memmove,
 	      IFUNC_IMPL_ADD (array, i, memmove, 1, __memmove_thunderx)
 	      IFUNC_IMPL_ADD (array, i, memmove, !bti, __memmove_thunderx2)
-	      IFUNC_IMPL_ADD (array, i, memmove, 1, __memmove_falkor)
 #if HAVE_AARCH64_SVE_ASM
 	      IFUNC_IMPL_ADD (array, i, memmove, sve, __memmove_a64fx)
 	      IFUNC_IMPL_ADD (array, i, memmove, sve, __memmove_sve)
diff --git a/sysdeps/aarch64/multiarch/memcpy.c b/sysdeps/aarch64/multiarch/memcpy.c
index 9aace954cbfd1eb3e2b35e570e4eb31bbb3c6cfe..6471fe82e32e91086ea862a4e1a488129e4af456 100644
--- a/sysdeps/aarch64/multiarch/memcpy.c
+++ b/sysdeps/aarch64/multiarch/memcpy.c
@@ -31,7 +31,6 @@ extern __typeof (__redirect_memcpy) __libc_memcpy;
 extern __typeof (__redirect_memcpy) __memcpy_generic attribute_hidden;
 extern __typeof (__redirect_memcpy) __memcpy_thunderx attribute_hidden;
 extern __typeof (__redirect_memcpy) __memcpy_thunderx2 attribute_hidden;
-extern __typeof (__redirect_memcpy) __memcpy_falkor attribute_hidden;
 extern __typeof (__redirect_memcpy) __memcpy_a64fx attribute_hidden;
 extern __typeof (__redirect_memcpy) __memcpy_sve attribute_hidden;
 extern __typeof (__redirect_memcpy) __memcpy_mops attribute_hidden;
@@ -57,9 +56,6 @@ select_memcpy_ifunc (void)
   if (IS_THUNDERX2 (midr) || IS_THUNDERX2PA (midr))
     return __memcpy_thunderx2;
 
-  if (IS_FALKOR (midr) || IS_PHECDA (midr))
-    return __memcpy_falkor;
-
   return __memcpy_generic;
 }
 
diff --git a/sysdeps/aarch64/multiarch/memcpy_falkor.S b/sysdeps/aarch64/multiarch/memcpy_falkor.S
deleted file mode 100644
index 67c4ab34eba40c37c6aae08be6cb5e11e2a82d17..0000000000000000000000000000000000000000
--- a/sysdeps/aarch64/multiarch/memcpy_falkor.S
+++ /dev/null
@@ -1,313 +0,0 @@
-/* Optimized memcpy for Qualcomm Falkor processor.
-   Copyright (C) 2017-2023 Free Software Foundation, Inc.
- - This file is part of the GNU C Library. - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library. If not, see - . */ - -#include - -/* Assumptions: - - ARMv8-a, AArch64, falkor, unaligned accesses. */ - -#define dstin x0 -#define src x1 -#define count x2 -#define dst x3 -#define srcend x4 -#define dstend x5 -#define tmp1 x14 -#define A_x x6 -#define B_x x7 -#define A_w w6 -#define B_w w7 - -#define A_q q0 -#define B_q q1 -#define C_q q2 -#define D_q q3 -#define E_q q4 -#define F_q q5 -#define G_q q6 -#define H_q q7 -#define Q_q q6 -#define S_q q22 - -/* Copies are split into 3 main cases: - - 1. Small copies of up to 32 bytes - 2. Medium copies of 33..128 bytes which are fully unrolled - 3. Large copies of more than 128 bytes. - - Large copies align the source to a quad word and use an unrolled loop - processing 64 bytes per iteration. - - FALKOR-SPECIFIC DESIGN: - - The smallest copies (32 bytes or less) focus on optimal pipeline usage, - which is why the redundant copies of 0-3 bytes have been replaced with - conditionals, since the former would unnecessarily break across multiple - issue groups. The medium copy group has been enlarged to 128 bytes since - bumping up the small copies up to 32 bytes allows us to do that without - cost and also allows us to reduce the size of the prep code before loop64. - - The copy loop uses only one register q0. 
This is to ensure that all loads
-   hit a single hardware prefetcher which can get correctly trained to prefetch
-   a single stream.
-
-   The non-temporal stores help optimize cache utilization. */
-
-#if IS_IN (libc)
-ENTRY (__memcpy_falkor)
-
-	PTR_ARG (0)
-	PTR_ARG (1)
-	SIZE_ARG (2)
-
-	cmp	count, 32
-	add	srcend, src, count
-	add	dstend, dstin, count
-	b.ls	L(copy32)
-	cmp	count, 128
-	b.hi	L(copy_long)
-
-	/* Medium copies: 33..128 bytes. */
-L(copy128):
-	sub	tmp1, count, 1
-	ldr	A_q, [src]
-	ldr	B_q, [src, 16]
-	ldr	C_q, [srcend, -32]
-	ldr	D_q, [srcend, -16]
-	tbz	tmp1, 6, 1f
-	ldr	E_q, [src, 32]
-	ldr	F_q, [src, 48]
-	ldr	G_q, [srcend, -64]
-	ldr	H_q, [srcend, -48]
-	str	G_q, [dstend, -64]
-	str	H_q, [dstend, -48]
-	str	E_q, [dstin, 32]
-	str	F_q, [dstin, 48]
-1:
-	str	A_q, [dstin]
-	str	B_q, [dstin, 16]
-	str	C_q, [dstend, -32]
-	str	D_q, [dstend, -16]
-	ret
-
-	.p2align 4
-	/* Small copies: 0..32 bytes. */
-L(copy32):
-	/* 16-32 */
-	cmp	count, 16
-	b.lo	1f
-	ldr	A_q, [src]
-	ldr	B_q, [srcend, -16]
-	str	A_q, [dstin]
-	str	B_q, [dstend, -16]
-	ret
-	.p2align 4
-1:
-	/* 8-15 */
-	tbz	count, 3, 1f
-	ldr	A_x, [src]
-	ldr	B_x, [srcend, -8]
-	str	A_x, [dstin]
-	str	B_x, [dstend, -8]
-	ret
-	.p2align 4
-1:
-	/* 4-7 */
-	tbz	count, 2, 1f
-	ldr	A_w, [src]
-	ldr	B_w, [srcend, -4]
-	str	A_w, [dstin]
-	str	B_w, [dstend, -4]
-	ret
-	.p2align 4
-1:
-	/* 2-3 */
-	tbz	count, 1, 1f
-	ldrh	A_w, [src]
-	ldrh	B_w, [srcend, -2]
-	strh	A_w, [dstin]
-	strh	B_w, [dstend, -2]
-	ret
-	.p2align 4
-1:
-	/* 0-1 */
-	tbz	count, 0, 1f
-	ldrb	A_w, [src]
-	strb	A_w, [dstin]
-1:
-	ret
-
-	/* Align SRC to 16 bytes and copy; that way at least one of the
-	   accesses is aligned throughout the copy sequence.
-
-	   The count is off by 0 to 15 bytes, but this is OK because we trim
-	   off the last 64 bytes to copy off from the end. Due to this the
-	   loop never runs out of bounds. */
-
-	.p2align 4
-	nop	/* Align loop64 below.
*/
-L(copy_long):
-	ldr	A_q, [src]
-	sub	count, count, 64 + 16
-	and	tmp1, src, 15
-	str	A_q, [dstin]
-	bic	src, src, 15
-	sub	dst, dstin, tmp1
-	add	count, count, tmp1
-
-L(loop64):
-	ldr	A_q, [src, 16]!
-	str	A_q, [dst, 16]
-	ldr	A_q, [src, 16]!
-	subs	count, count, 64
-	str	A_q, [dst, 32]
-	ldr	A_q, [src, 16]!
-	str	A_q, [dst, 48]
-	ldr	A_q, [src, 16]!
-	str	A_q, [dst, 64]!
-	b.hi	L(loop64)
-
-	/* Write the last full set of 64 bytes. The remainder is at most 64
-	   bytes, so it is safe to always copy 64 bytes from the end even if
-	   there is just 1 byte left. */
-	ldr	E_q, [srcend, -64]
-	str	E_q, [dstend, -64]
-	ldr	D_q, [srcend, -48]
-	str	D_q, [dstend, -48]
-	ldr	C_q, [srcend, -32]
-	str	C_q, [dstend, -32]
-	ldr	B_q, [srcend, -16]
-	str	B_q, [dstend, -16]
-	ret
-
-END (__memcpy_falkor)
-
-
-/* RATIONALE:
-
-   The move has 4 distinct parts:
-   * Small moves of 32 bytes and under.
-   * Medium sized moves of 33-128 bytes (fully unrolled).
-   * Large moves where the source address is higher than the destination
-     (forward copies)
-   * Large moves where the destination address is higher than the source
-     (copy backward, or move).
-
-   We use only two registers q6 and q22 for the moves and move 32 bytes at a
-   time to correctly train the hardware prefetcher for better throughput.
-
-   For small and medium cases memcpy is used. */
-
-ENTRY (__memmove_falkor)
-
-	PTR_ARG (0)
-	PTR_ARG (1)
-	SIZE_ARG (2)
-
-	cmp	count, 32
-	add	srcend, src, count
-	add	dstend, dstin, count
-	b.ls	L(copy32)
-	cmp	count, 128
-	b.ls	L(copy128)
-	sub	tmp1, dstin, src
-	ccmp	tmp1, count, 2, hi
-	b.lo	L(move_long)
-
-	/* CASE: Copy Forwards
-
-	   Align src to 16 byte alignment so that we don't cross cache line
-	   boundaries on both loads and stores. There are at least 128 bytes
-	   to copy, so copy 16 bytes unaligned and then align. The loop
-	   copies 32 bytes per iteration and prefetches one iteration ahead.
*/
-
-	ldr	S_q, [src]
-	and	tmp1, src, 15
-	bic	src, src, 15
-	sub	dst, dstin, tmp1
-	add	count, count, tmp1	/* Count is now 16 too large. */
-	ldr	Q_q, [src, 16]!
-	str	S_q, [dstin]
-	ldr	S_q, [src, 16]!
-	sub	count, count, 32 + 32 + 16	/* Test and readjust count. */
-
-	.p2align 4
-1:
-	subs	count, count, 32
-	str	Q_q, [dst, 16]
-	ldr	Q_q, [src, 16]!
-	str	S_q, [dst, 32]!
-	ldr	S_q, [src, 16]!
-	b.hi	1b
-
-	/* Copy 32 bytes from the end before writing the data prefetched in the
-	   last loop iteration. */
-2:
-	ldr	B_q, [srcend, -32]
-	ldr	C_q, [srcend, -16]
-	str	Q_q, [dst, 16]
-	str	S_q, [dst, 32]
-	str	B_q, [dstend, -32]
-	str	C_q, [dstend, -16]
-	ret
-
-	/* CASE: Copy Backwards
-
-	   Align srcend to 16 byte alignment so that we don't cross cache line
-	   boundaries on both loads and stores. There are at least 128 bytes
-	   to copy, so copy 16 bytes unaligned and then align. The loop
-	   copies 32 bytes per iteration and prefetches one iteration ahead. */
-
-	.p2align 4
-	nop
-	nop
-L(move_long):
-	cbz	tmp1, 3f	/* Return early if src == dstin */
-	ldr	S_q, [srcend, -16]
-	and	tmp1, srcend, 15
-	sub	srcend, srcend, tmp1
-	ldr	Q_q, [srcend, -16]!
-	str	S_q, [dstend, -16]
-	sub	count, count, tmp1
-	ldr	S_q, [srcend, -16]!
-	sub	dstend, dstend, tmp1
-	sub	count, count, 32 + 32
-
-1:
-	subs	count, count, 32
-	str	Q_q, [dstend, -16]
-	ldr	Q_q, [srcend, -16]!
-	str	S_q, [dstend, -32]!
-	ldr	S_q, [srcend, -16]!
-	b.hi	1b
-
-	/* Copy 32 bytes from the start before writing the data prefetched in the
-	   last loop iteration.
*/
-
-	ldr	B_q, [src, 16]
-	ldr	C_q, [src]
-	str	Q_q, [dstend, -16]
-	str	S_q, [dstend, -32]
-	str	B_q, [dstin, 16]
-	str	C_q, [dstin]
-3:	ret
-
-END (__memmove_falkor)
-#endif
diff --git a/sysdeps/aarch64/multiarch/memmove.c b/sysdeps/aarch64/multiarch/memmove.c
index fd346e7b73a86a076ba8e1cdd7fd588098333f48..7602a5d57d1384fa06167aea58b8b16b94f49e4f 100644
--- a/sysdeps/aarch64/multiarch/memmove.c
+++ b/sysdeps/aarch64/multiarch/memmove.c
@@ -31,7 +31,6 @@ extern __typeof (__redirect_memmove) __libc_memmove;
 extern __typeof (__redirect_memmove) __memmove_generic attribute_hidden;
 extern __typeof (__redirect_memmove) __memmove_thunderx attribute_hidden;
 extern __typeof (__redirect_memmove) __memmove_thunderx2 attribute_hidden;
-extern __typeof (__redirect_memmove) __memmove_falkor attribute_hidden;
 extern __typeof (__redirect_memmove) __memmove_a64fx attribute_hidden;
 extern __typeof (__redirect_memmove) __memmove_sve attribute_hidden;
 extern __typeof (__redirect_memmove) __memmove_mops attribute_hidden;
@@ -57,9 +56,6 @@ select_memmove_ifunc (void)
   if (IS_THUNDERX2 (midr) || IS_THUNDERX2PA (midr))
     return __memmove_thunderx2;

-  if (IS_FALKOR (midr) || IS_PHECDA (midr))
-    return __memmove_falkor;
-
   return __memmove_generic;
 }

diff --git a/sysdeps/unix/sysv/linux/aarch64/cpu-features.h b/sysdeps/unix/sysv/linux/aarch64/cpu-features.h
index 40b709677d86f040c653315199f62677425abc58..2cf745cd1920552149da9b497f3ff4d7572480b8 100644
--- a/sysdeps/unix/sysv/linux/aarch64/cpu-features.h
+++ b/sysdeps/unix/sysv/linux/aarch64/cpu-features.h
@@ -47,11 +47,6 @@
 #define IS_THUNDERX2(midr) (MIDR_IMPLEMENTOR(midr) == 'C' \
 			    && MIDR_PARTNUM(midr) == 0xaf)

-#define IS_FALKOR(midr) (MIDR_IMPLEMENTOR(midr) == 'Q' \
-			 && MIDR_PARTNUM(midr) == 0xc00)
-
-#define IS_PHECDA(midr) (MIDR_IMPLEMENTOR(midr) == 'h' \
-			 && MIDR_PARTNUM(midr) == 0x000)
 #define IS_NEOVERSE_N1(midr) (MIDR_IMPLEMENTOR(midr) == 'A' \
 			      && MIDR_PARTNUM(midr) == 0xd0c)
 #define IS_NEOVERSE_N2(midr) (MIDR_IMPLEMENTOR(midr)
== 'A' \
diff --git a/sysdeps/unix/sysv/linux/aarch64/cpu-features.c b/sysdeps/unix/sysv/linux/aarch64/cpu-features.c
index 233d5b2407e2b792805b7fa661852f59fca0cb71..a11a86efab64118fd2622840228d6bbb4d0b860c 100644
--- a/sysdeps/unix/sysv/linux/aarch64/cpu-features.c
+++ b/sysdeps/unix/sysv/linux/aarch64/cpu-features.c
@@ -37,11 +37,9 @@ struct cpu_list
 };

 static struct cpu_list cpu_list[] = {
-      {"falkor", 0x510FC000},
       {"thunderxt88", 0x430F0A10},
       {"thunderx2t99", 0x431F0AF0},
       {"thunderx2t99p1", 0x420F5160},
-      {"phecda", 0x680F0000},
       {"ares", 0x411FD0C0},
       {"emag", 0x503F0001},
       {"kunpeng920", 0x481FD010},