From patchwork Thu Jan 12 15:57:35 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Wilco Dijkstra
X-Patchwork-Id: 1725246
To: 'GNU C Library'
CC: Szabolcs Nagy
Subject: [PATCH] AArch64: Optimize memrchr
Date: Thu, 12 Jan 2023 15:57:35 +0000
List-Id: Libc-alpha mailing list
X-Patchwork-Original-From: Wilco Dijkstra via Libc-alpha
From: Wilco Dijkstra
Reply-To: Wilco Dijkstra

Optimize the main loop - large strings are 43% faster on modern CPUs.
Passes regress.

Reviewed-by: Szabolcs Nagy

diff --git a/sysdeps/aarch64/memrchr.S b/sysdeps/aarch64/memrchr.S
index 9d2d29a396d46d6c2e74e3ca637091e2f3d68d5e..621fc65109736646b74900db8d15c6f8a7c68895 100644
--- a/sysdeps/aarch64/memrchr.S
+++ b/sysdeps/aarch64/memrchr.S
@@ -26,7 +26,6 @@
  * MTE compatible.
  */
 
-/* Arguments and results.  */
 #define srcin		x0
 #define chrin		w1
 #define cntin		x2
@@ -77,31 +76,34 @@ ENTRY (__memrchr)
 	csel	result, result, xzr, hi
 	ret
 
+	nop
 L(start_loop):
-	sub	tmp, end, src
-	subs	cntrem, cntin, tmp
+	subs	cntrem, src, srcin
 	b.ls	L(nomatch)
 
 	/* Make sure that it won't overread by a 16-byte chunk */
-	add	tmp, cntrem, 15
-	tbnz	tmp, 4, L(loop32_2)
+	sub	cntrem, cntrem, 1
+	tbz	cntrem, 4, L(loop32_2)
+	add	src, src, 16
 
-	.p2align 4
+	.p2align 5
 L(loop32):
-	ldr	qdata, [src, -16]!
+	ldr	qdata, [src, -32]!
 	cmeq	vhas_chr.16b, vdata.16b, vrepchr.16b
 	umaxp	vend.16b, vhas_chr.16b, vhas_chr.16b		/* 128->64 */
 	fmov	synd, dend
 	cbnz	synd, L(end)
 
 L(loop32_2):
-	ldr	qdata, [src, -16]!
+	ldr	qdata, [src, -16]
 	subs	cntrem, cntrem, 32
 	cmeq	vhas_chr.16b, vdata.16b, vrepchr.16b
-	b.ls	L(end)
+	b.lo	L(end_2)
 	umaxp	vend.16b, vhas_chr.16b, vhas_chr.16b		/* 128->64 */
 	fmov	synd, dend
 	cbz	synd, L(loop32)
+L(end_2):
+	sub	src, src, 16
 L(end):
 	shrn	vend.8b, vhas_chr.8h, 4		/* 128->64 */
 	fmov	synd, dend