From patchwork Mon Mar 28 15:19:46 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Adhemerval Zanella X-Patchwork-Id: 602510 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 3qYd030RG8z9sDD for ; Tue, 29 Mar 2016 02:20:50 +1100 (AEDT) Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; secure) header.d=sourceware.org header.i=@sourceware.org header.b=BiD9gmmz; dkim-atps=neutral DomainKey-Signature: a=rsa-sha1; c=nofws; d=sourceware.org; h=list-id :list-unsubscribe:list-subscribe:list-archive:list-post :list-help:sender:from:to:cc:subject:date:message-id:in-reply-to :references; q=dns; s=default; b=K0ZKQKYUZO5gtkwZTVV1N2d7156x9nc WD95V5FoVc5Aej84mftaTr8QLgD8fPJ/LVoHTviaY6BhnLrlc0xkEotd1spLWSVV N7/180SCtsTs77JJsYKjdnQrYDLwWZLugPcy8rT1QgyDzSMz5RgZxRWg9feWgd4W wgLNUGD+u6qs= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=sourceware.org; h=list-id :list-unsubscribe:list-subscribe:list-archive:list-post :list-help:sender:from:to:cc:subject:date:message-id:in-reply-to :references; s=default; bh=8J3DgX5bEKHFaK+abhN/kG0d7EE=; b=BiD9g mmzr1RFNdEtrYsozPTEt6E+xV/qz7+2qRc3O3R0HPbeyGFlwe2R3/CZgEqBn5TRt qXemt1mfW6yMDnVjtkb35zQOfsow4MkSqvSnio2OaOZkOH+qY9s/D89HVzUzatdO mBG+V9MEVjzM3boLS/jIqf2MMOv7XZ4A7U3N28= Received: (qmail 16887 invoked by alias); 28 Mar 2016 15:20:12 -0000 Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: libc-alpha-owner@sourceware.org Delivered-To: mailing list libc-alpha@sourceware.org Received: (qmail 16688 invoked by uid 89); 28 Mar 2016 15:20:11 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-2.6 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_LOW, SPF_PASS autolearn=ham version=3.3.2 spammy=memsets, strcspnc, repeatedly, REJECT X-HELO: mail-qk0-f176.google.com X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=rZ7P3rzntkD8cin5juz/RIpcVrHfVE00LMRtDy7m5vw=; b=dzGHt8UkI8IsuFD9Cxl+u6lZDzDfKvWWhoLm6tqIdCf0QMkhYRILOBE1ctgR+WTPQi 3fYcHcM0LrBWCYEeNxgfDGg+I3dg7KgaMlUstM8+VSJEbS7dV7+X1Mt+OQmDDrTpZASi qOsqGuywp5LgzvppfODek0lXXKQJBk6RaCCar/HkQyc9IK2CDg4SWIUJ31Mf1V8E5dL0 e+EiXIz9NwwHbbYeJprJZrumc52EtI+2ilgnhqCGt2qszaJv1d0FhRgVROoP3EGN5eMp afTFRP5SNRGPO1Jv5BWB9q0maEBQOtE7aPiStNu9Rgx0H2WI7b2qZAKZCT0uXzPGPokI W9nQ== X-Gm-Message-State: AD7BkJJDiZurNwyuezVEMpC6vjRKEKckpyGP3MiGQ/fwMej3lV3U+mlc1xQwgUDqHYb3Wj7z X-Received: by 10.129.31.131 with SMTP id f125mr13148550ywf.267.1459178398307; Mon, 28 Mar 2016 08:19:58 -0700 (PDT) From: Adhemerval Zanella To: libc-alpha@sourceware.org Cc: Wilco Dijkstra Subject: [PATCH 1/4] Improve generic strcspn performance Date: Mon, 28 Mar 2016 12:19:46 -0300 Message-Id: <1459178389-14133-2-git-send-email-adhemerval.zanella@linaro.org> In-Reply-To: <1459178389-14133-1-git-send-email-adhemerval.zanella@linaro.org> References: <1459178389-14133-1-git-send-email-adhemerval.zanella@linaro.org> From: Wilco Dijkstra Improve strcspn performance using a much faster algorithm. It is kept simple so it works well on most targets. It is generally at least 10 times faster than the existing implementation on bench-strcspn on a few AArch64 implementations, and for some tests 100 times as fast (repeatedly calling strchr on a small string is extremely slow...). In fact the string/bits/string2.h inlines make no longer sense, as GCC already uses strlen if reject is an empty string, strchrnul is 5 times as fast as __strcspn_c1, while __strcspn_c2 and __strcspn_c3 are slower than the strcspn main loop for large strings (though reject length 2-4 could be special cased in the future to gain even more performance). Tested on x86_64, i686, and aarch64. * string/strcspn.c (strcspn): Rewrite function. * string/bits/string2.h (strcspn): Use __builtin_strcspn. --- ChangeLog | 6 ++++++ string/bits/string2.h | 41 ++++++----------------------------------- string/strcspn.c | 44 ++++++++++++++++++++++++++++++++++++-------- 3 files changed, 48 insertions(+), 43 deletions(-) diff --git a/string/bits/string2.h b/string/bits/string2.h index 8200ef1..1b87686 100644 --- a/string/bits/string2.h +++ b/string/bits/string2.h @@ -905,43 +905,14 @@ __stpcpy_small (char *__dest, /* Return the length of the initial segment of S which consists entirely of characters not in REJECT. */ -#if !defined _HAVE_STRING_ARCH_strcspn || defined _FORCE_INLINES -# ifndef _HAVE_STRING_ARCH_strcspn -# if __GNUC_PREREQ (3, 2) -# define strcspn(s, reject) \ - __extension__ \ - ({ char __r0, __r1, __r2; \ - (__builtin_constant_p (reject) && __string2_1bptr_p (reject) \ - ? ((__builtin_constant_p (s) && __string2_1bptr_p (s)) \ - ? __builtin_strcspn (s, reject) \ - : ((__r0 = ((const char *) (reject))[0], __r0 == '\0') \ - ? strlen (s) \ - : ((__r1 = ((const char *) (reject))[1], __r1 == '\0') \ - ? __strcspn_c1 (s, __r0) \ - : ((__r2 = ((const char *) (reject))[2], __r2 == '\0') \ - ? __strcspn_c2 (s, __r0, __r1) \ - : (((const char *) (reject))[3] == '\0' \ - ? __strcspn_c3 (s, __r0, __r1, __r2) \ - : __builtin_strcspn (s, reject)))))) \ - : __builtin_strcspn (s, reject)); }) -# else -# define strcspn(s, reject) \ - __extension__ \ - ({ char __r0, __r1, __r2; \ - (__builtin_constant_p (reject) && __string2_1bptr_p (reject) \ - ? ((__r0 = ((const char *) (reject))[0], __r0 == '\0') \ - ? strlen (s) \ - : ((__r1 = ((const char *) (reject))[1], __r1 == '\0') \ - ? __strcspn_c1 (s, __r0) \ - : ((__r2 = ((const char *) (reject))[2], __r2 == '\0') \ - ? __strcspn_c2 (s, __r0, __r1) \ - : (((const char *) (reject))[3] == '\0' \ - ? __strcspn_c3 (s, __r0, __r1, __r2) \ - : strcspn (s, reject))))) \ - : strcspn (s, reject)); }) -# endif +#ifndef _HAVE_STRING_ARCH_strcspn +# if __GNUC_PREREQ (3, 2) +# define strcspn(s, reject) __builtin_strcspn (s, reject) # endif + /* The inline functions are not used from GLIBC 2.24 and forward, however + they are required to provide the symbols through string-inlines.c + (if inlining is not possible for compatibility reasons). */ __STRING_INLINE size_t __strcspn_c1 (const char *__s, int __reject); __STRING_INLINE size_t __strcspn_c1 (const char *__s, int __reject) diff --git a/string/strcspn.c b/string/strcspn.c index 8888919..89ba4ca 100644 --- a/string/strcspn.c +++ b/string/strcspn.c @@ -26,16 +26,44 @@ /* Return the length of the maximum initial segment of S which contains no characters from REJECT. */ size_t -STRCSPN (const char *s, const char *reject) +STRCSPN (const char *str, const char *reject) { - size_t count = 0; + if (reject[0] == '\0' || reject[1] == '\0') + return __strchrnul (str, reject [0]) - str; - while (*s != '\0') - if (strchr (reject, *s++) == NULL) - ++count; - else - return count; + /* Use multiple small memsets to enable inlining on most targets. */ + unsigned char table[256]; + unsigned char *p = memset (table, 0, 64); + memset (p + 64, 0, 64); + memset (p + 128, 0, 64); + memset (p + 192, 0, 64); - return count; + unsigned char *s = (unsigned char*) reject; + unsigned char tmp; + do + p[tmp = *s++] = 1; + while (tmp); + + s = (unsigned char*) str; + if (p[s[0]]) return 0; + if (p[s[1]]) return 1; + if (p[s[2]]) return 2; + if (p[s[3]]) return 3; + + s = (unsigned char *) ((size_t)s & ~3); + + unsigned int c0, c1, c2, c3; + do + { + s += 4; + c0 = p[s[0]]; + c1 = p[s[1]]; + c2 = p[s[2]]; + c3 = p[s[3]]; + } + while ((c0 | c1 | c2 | c3) == 0); + + size_t count = s - (unsigned char *) str; + return (c0 | c1) != 0 ? count - c0 + 1 : count - c2 + 3; } libc_hidden_builtin_def (strcspn)