From patchwork Wed Mar 23 21:57:36 2022
X-Patchwork-Submitter: Noah Goldstein
X-Patchwork-Id: 1608838
From: Noah Goldstein
To: libc-alpha@sourceware.org
Subject: [PATCH v1 17/23] x86: Optimize str{n}casecmp TOLOWER logic in strcmp.S
Date: Wed, 23 Mar 2022 16:57:36 -0500
Message-Id: <20220323215734.3927131-17-goldstein.w.n@gmail.com>
In-Reply-To: <20220323215734.3927131-1-goldstein.w.n@gmail.com>
References: <20220323215734.3927131-1-goldstein.w.n@gmail.com>

Use a slightly faster method of doing TOLOWER that saves an
instruction.

Also replace the hard-coded 5-byte NOP with .p2align 4. On builds
with CET enabled, the fixed-size NOP misaligned the entry to
strcasecmp.

geometric_mean(N=40) of all benchmarks New / Original: .894

All string/memory tests pass.
Reviewed-by: H.J. Lu
---
Geometric Mean N=40 runs; All functions page aligned
length, align1, align2, max_char, New Time / Old Time
     1,      1,      1,      127,                0.903
     2,      2,      2,      127,                0.905
     3,      3,      3,      127,                0.877
     4,      4,      4,      127,                0.888
     5,      5,      5,      127,                0.901
     6,      6,      6,      127,                0.954
     7,      7,      7,      127,                0.932
     8,      0,      0,      127,                0.918
     9,      1,      1,      127,                0.914
    10,      2,      2,      127,                0.877
    11,      3,      3,      127,                0.909
    12,      4,      4,      127,                0.876
    13,      5,      5,      127,                0.886
    14,      6,      6,      127,                0.914
    15,      7,      7,      127,                0.939
     4,      0,      0,      127,                0.963
     4,      0,      0,      254,                0.943
     8,      0,      0,      254,                0.927
    16,      0,      0,      127,                0.876
    16,      0,      0,      254,                0.865
    32,      0,      0,      127,                0.865
    32,      0,      0,      254,                0.862
    64,      0,      0,      127,                0.863
    64,      0,      0,      254,                0.896
   128,      0,      0,      127,                0.885
   128,      0,      0,      254,                0.882
   256,      0,      0,      127,                0.87
   256,      0,      0,      254,                0.869
   512,      0,      0,      127,                0.832
   512,      0,      0,      254,                0.848
  1024,      0,      0,      127,                0.835
  1024,      0,      0,      254,                0.843
    16,      1,      2,      127,                0.914
    16,      2,      1,      254,                0.949
    32,      2,      4,      127,                0.955
    32,      4,      2,      254,                1.004
    64,      3,      6,      127,                0.844
    64,      6,      3,      254,                0.905
   128,      4,      0,      127,                0.889
   128,      0,      4,      254,                0.845
   256,      5,      2,      127,                0.929
   256,      2,      5,      254,                0.907
   512,      6,      4,      127,                0.837
   512,      4,      6,      254,                0.862
  1024,      7,      6,      127,                0.895
  1024,      6,      7,      254,                0.89

 sysdeps/x86_64/strcmp.S | 64 +++++++++++++++++++----------------------
 1 file changed, 29 insertions(+), 35 deletions(-)

diff --git a/sysdeps/x86_64/strcmp.S b/sysdeps/x86_64/strcmp.S
index e2ab59c555..99d8b36f1d 100644
--- a/sysdeps/x86_64/strcmp.S
+++ b/sysdeps/x86_64/strcmp.S
@@ -75,9 +75,8 @@ ENTRY2 (__strcasecmp)
 	movq __libc_tsd_LOCALE@gottpoff(%rip),%rax
 	mov %fs:(%rax),%RDX_LP
 
-	// XXX 5 byte should be before the function
-	/* 5-byte NOP.  */
-	.byte 0x0f,0x1f,0x44,0x00,0x00
+	/* Either 1 or 5 bytes (depending on whether CET is enabled).  */
+	.p2align 4
 END2 (__strcasecmp)
 # ifndef NO_NOLOCALE_ALIAS
 weak_alias (__strcasecmp, strcasecmp)
@@ -94,9 +93,8 @@ ENTRY2 (__strncasecmp)
 	movq __libc_tsd_LOCALE@gottpoff(%rip),%rax
 	mov %fs:(%rax),%RCX_LP
 
-	// XXX 5 byte should be before the function
-	/* 5-byte NOP.  */
-	.byte 0x0f,0x1f,0x44,0x00,0x00
+	/* Either 1 or 5 bytes (depending on whether CET is enabled).  */
+	.p2align 4
 END2 (__strncasecmp)
 # ifndef NO_NOLOCALE_ALIAS
 weak_alias (__strncasecmp, strncasecmp)
@@ -146,22 +144,22 @@ ENTRY (STRCMP)
 #if defined USE_AS_STRCASECMP_L || defined USE_AS_STRNCASECMP_L
 	.section .rodata.cst16,"aM",@progbits,16
 	.align 16
-.Lbelowupper:
-	.quad 0x4040404040404040
-	.quad 0x4040404040404040
-.Ltopupper:
-	.quad 0x5b5b5b5b5b5b5b5b
-	.quad 0x5b5b5b5b5b5b5b5b
-.Ltouppermask:
+.Llcase_min:
+	.quad 0x3f3f3f3f3f3f3f3f
+	.quad 0x3f3f3f3f3f3f3f3f
+.Llcase_max:
+	.quad 0x9999999999999999
+	.quad 0x9999999999999999
+.Lcase_add:
 	.quad 0x2020202020202020
 	.quad 0x2020202020202020
 	.previous
-	movdqa .Lbelowupper(%rip), %xmm5
-# define UCLOW_reg %xmm5
-	movdqa .Ltopupper(%rip), %xmm6
-# define UCHIGH_reg %xmm6
-	movdqa .Ltouppermask(%rip), %xmm7
-# define LCQWORD_reg %xmm7
+	movdqa .Llcase_min(%rip), %xmm5
+# define LCASE_MIN_reg %xmm5
+	movdqa .Llcase_max(%rip), %xmm6
+# define LCASE_MAX_reg %xmm6
+	movdqa .Lcase_add(%rip), %xmm7
+# define CASE_ADD_reg %xmm7
 #endif
 	cmp $0x30, %ecx
 	ja LABEL(crosscache)	/* rsi: 16-byte load will cross cache line */
@@ -172,22 +170,18 @@ ENTRY (STRCMP)
 	movhpd 8(%rdi), %xmm1
 	movhpd 8(%rsi), %xmm2
 #if defined USE_AS_STRCASECMP_L || defined USE_AS_STRNCASECMP_L
-# define TOLOWER(reg1, reg2) \
-	movdqa reg1, %xmm8; \
-	movdqa UCHIGH_reg, %xmm9; \
-	movdqa reg2, %xmm10; \
-	movdqa UCHIGH_reg, %xmm11; \
-	pcmpgtb UCLOW_reg, %xmm8; \
-	pcmpgtb reg1, %xmm9; \
-	pcmpgtb UCLOW_reg, %xmm10; \
-	pcmpgtb reg2, %xmm11; \
-	pand %xmm9, %xmm8; \
-	pand %xmm11, %xmm10; \
-	pand LCQWORD_reg, %xmm8; \
-	pand LCQWORD_reg, %xmm10; \
-	por %xmm8, reg1; \
-	por %xmm10, reg2
-	TOLOWER (%xmm1, %xmm2)
+# define TOLOWER(reg1, reg2) \
+	movdqa LCASE_MIN_reg, %xmm8; \
+	movdqa LCASE_MIN_reg, %xmm9; \
+	paddb reg1, %xmm8; \
+	paddb reg2, %xmm9; \
+	pcmpgtb LCASE_MAX_reg, %xmm8; \
+	pcmpgtb LCASE_MAX_reg, %xmm9; \
+	pandn CASE_ADD_reg, %xmm8; \
+	pandn CASE_ADD_reg, %xmm9; \
+	paddb %xmm8, reg1; \
+	paddb %xmm9, reg2
+	TOLOWER (%xmm1, %xmm2)
 #else
 # define TOLOWER(reg1, reg2)
 #endif
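
Note (not part of the patch): the new TOLOWER sequence can be checked with a scalar model of a single byte lane. The sketch below is only an illustration under stated assumptions: the function names (as_signed, tolower_old, tolower_new, tolower_ref, check_tolower_models) are hypothetical, ASCII is assumed, and each function models one 8-bit lane of the 16-byte SSE2 operations rather than the vector code itself. The constants 0x3f, 0x99 and 0x20 are the ones stored at .Llcase_min, .Llcase_max and .Lcase_add; 0x40 and 0x5b are the old .Lbelowupper and .Ltopupper values.

```c
#include <assert.h>
#include <stdint.h>

/* Reinterpret a byte as a signed 8-bit value, which is how pcmpgtb
   compares its lanes.  */
int as_signed(uint8_t b) { return b < 128 ? (int)b : (int)b - 256; }

/* Model of the old sequence: two signed range compares, AND the two
   masks with 0x20, then OR the result into the input.  */
uint8_t tolower_old(uint8_t c)
{
    uint8_t above = as_signed(c) > as_signed(0x40) ? 0xff : 0x00; /* c > '@' */
    uint8_t below = as_signed(0x5b) > as_signed(c) ? 0xff : 0x00; /* '[' > c */
    return (uint8_t)(c | (above & below & 0x20));
}

/* Model of the new sequence: bias, one signed compare, pandn, add.  */
uint8_t tolower_new(uint8_t c)
{
    uint8_t biased = (uint8_t)(c + 0x3f);       /* paddb wraps modulo 256 */
    /* 'A'..'Z' land on 0x80..0x99, i.e. -128..-103 as signed bytes.
       Those are the only values that are NOT greater than 0x99 (-103).  */
    uint8_t not_upper = as_signed(biased) > as_signed(0x99) ? 0xff : 0x00;
    uint8_t add = (uint8_t)(~not_upper & 0x20); /* pandn with CASE_ADD */
    return (uint8_t)(c + add);                  /* paddb into the input */
}

/* Reference: plain ASCII lowering.  */
uint8_t tolower_ref(uint8_t c)
{
    return (c >= 'A' && c <= 'Z') ? (uint8_t)(c + 0x20) : c;
}

/* Compare both models with the reference over all 256 byte values.  */
void check_tolower_models(void)
{
    for (int i = 0; i < 256; i++) {
        uint8_t c = (uint8_t)i;
        assert(tolower_old(c) == tolower_ref(c));
        assert(tolower_new(c) == tolower_ref(c));
    }
}
```

Calling check_tolower_models() from a test program exercises every byte value. In this model the bias by 0x3f folds the two range checks of the old code into a single signed compare, which is why the new macro needs only one pcmpgtb per input register.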