From patchwork Thu Jan 12 15:51:32 2023
To: 'GNU C Library'
CC: Szabolcs Nagy
Subject: [PATCH] AArch64: Improve strlen_asimd
Date: Thu, 12 Jan 2023 15:51:32 +0000
From: Wilco Dijkstra

Use shrn for the mask, merge tst+bne into cbnz, and tweak code alignment
(drop two .p2align directives and add a nop before L(loop_entry)).
Performance improves slightly as a result.  Passes regress.
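To make the shrn change concrete, here is a scalar C model of the two ways the loop exit can turn the cmeq result into a 64-bit syndrome. It assumes little-endian lane order, so the __AARCH64EB__ path is not modelled. The helper names cmeq_zero, synd_shrn, synd_bic_umaxp and first_nul are hypothetical. This is a sketch for reasoning about the bit layout, not the glibc code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* cmeq maskv.16b, datav.16b, 0: 0xff for every NUL byte, 0x00 otherwise.  */
static void cmeq_zero(const uint8_t in[16], uint8_t out[16])
{
  for (int i = 0; i < 16; i++)
    out[i] = in[i] == 0 ? 0xff : 0x00;
}

/* New: shrn maskv.8b, maskv.8h, 4; fmov synd, maskd (little-endian model).
   Halfword lane i is m[2i] | m[2i+1] << 8.  Shifting right by 4 and keeping
   the low byte leaves the high nibble of m[2i] and the low nibble of m[2i+1],
   i.e. one nibble per input byte.  */
static uint64_t synd_shrn(const uint8_t m[16])
{
  uint64_t synd = 0;
  for (int i = 0; i < 8; i++)
    {
      uint16_t h = (uint16_t) (m[2 * i] | (m[2 * i + 1] << 8));
      uint8_t b = (uint8_t) (h >> 4);
      synd |= (uint64_t) b << (8 * i);
    }
  return synd;
}

/* Old: bic maskv.8h, 0x0f, lsl 8; umaxp maskv.16b, maskv.16b, maskv.16b;
   fmov synd, maskd (little-endian model, low 8 result bytes only).  */
static uint64_t synd_bic_umaxp(const uint8_t m[16])
{
  uint64_t synd = 0;
  for (int i = 0; i < 8; i++)
    {
      uint8_t lo = m[2 * i];
      uint8_t hi = m[2 * i + 1] & 0xf0; /* bic clears the odd byte's low nibble.  */
      uint8_t b = lo > hi ? lo : hi;    /* umaxp: pairwise unsigned maximum.  */
      synd |= (uint64_t) b << (8 * i);
    }
  return synd;
}

/* rbit + clz is a count of trailing zeros; lsr 2 turns the bit index into
   a byte index.  Precondition: synd != 0.  */
static int first_nul(uint64_t synd)
{
  int bit = 0;
  while ((synd & 1) == 0)
    {
      synd >>= 1;
      bit++;
    }
  return bit >> 2;
}
```

In this model shrn maps byte j of the chunk to bits 4j..4j+3 of the syndrome, so the rbit/clz result shifted right by 2 (the `lsr 2` in the diff) is the byte offset. The old bic+umaxp pair yields the same first-NUL offset using one more instruction, although a NUL in an even byte also sets the nibble of the following odd byte.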
Reviewed-by: Szabolcs Nagy

diff --git a/sysdeps/aarch64/multiarch/strlen_asimd.S b/sysdeps/aarch64/multiarch/strlen_asimd.S
index ca6ab96ecf2de45def79539facd8e0b86f4edc95..490439491d19c3f14b0228f42248bc8aa6e9e8bd 100644
--- a/sysdeps/aarch64/multiarch/strlen_asimd.S
+++ b/sysdeps/aarch64/multiarch/strlen_asimd.S
@@ -48,6 +48,7 @@
 #define tmp		x2
 #define tmpw		w2
 #define synd		x3
+#define syndw		w3
 #define shift		x4
 
 /* For the first 32 bytes, NUL detection works on the principle that
@@ -87,7 +88,6 @@
 
 ENTRY (__strlen_asimd)
 	PTR_ARG (0)
-
 	and	tmp1, srcin, MIN_PAGE_SIZE - 1
 	cmp	tmp1, MIN_PAGE_SIZE - 32
 	b.hi	L(page_cross)
@@ -123,7 +123,6 @@ ENTRY (__strlen_asimd)
 	add	len, len, tmp1, lsr 3
 	ret
 
-	.p2align 3
 	/* Look for a NUL byte at offset 16..31 in the string. */
 L(bytes16_31):
 	ldp	data1, data2, [srcin, 16]
@@ -151,6 +150,7 @@ L(bytes16_31):
 	add	len, len, tmp1, lsr 3
 	ret
 
+	nop
 L(loop_entry):
 	bic	src, srcin, 31
 
@@ -166,18 +166,12 @@ L(loop):
 	/* Low 32 bits of synd are non-zero if a NUL was found in datav1. */
 	cmeq	maskv.16b, datav1.16b, 0
 	sub	len, src, srcin
-	tst	synd, 0xffffffff
-	b.ne	1f
+	cbnz	syndw, 1f
 	cmeq	maskv.16b, datav2.16b, 0
 	add	len, len, 16
 1:
 	/* Generate a bitmask and compute correct byte offset. */
-#ifdef __AARCH64EB__
-	bic	maskv.8h, 0xf0
-#else
-	bic	maskv.8h, 0x0f, lsl 8
-#endif
-	umaxp	maskv.16b, maskv.16b, maskv.16b
+	shrn	maskv.8b, maskv.8h, 4
 	fmov	synd, maskd
 #ifndef __AARCH64EB__
 	rbit	synd, synd
@@ -186,8 +180,6 @@ L(loop):
 	add	len, len, tmp, lsr 2
 	ret
 
-	.p2align 4
-
 L(page_cross):
 	bic	src, srcin, 31
 	mov	tmpw, 0x0c03
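The cbnz change relies on syndw (w3) being the low 32-bit view of synd (x3). A minimal C sketch of the two branch conditions, using hypothetical helper names, shows that they agree. Both take the branch exactly when the low 32 bits are non-zero, which the loop comment describes as a NUL having been found in datav1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Old sequence: tst synd, 0xffffffff; b.ne 1f
   The branch is taken when any of the low 32 bits of x3 is set.  */
static int tst_bne_taken(uint64_t synd)
{
  return (synd & 0xffffffffu) != 0;
}

/* New sequence: cbnz syndw, 1f
   syndw is w3, the low 32 bits of x3, so this tests the same bits in a
   single instruction that does not write the condition flags.  */
static int cbnz_w_taken(uint64_t synd)
{
  return (uint32_t) synd != 0;
}
```

The upper 32 bits of synd are ignored by both forms, so a value such as 0xffffffff00000000 falls through to the datav2 path in either version.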