From patchwork Fri May 29 19:34:53 2015 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: =?utf-8?b?T25kxZllaiBCw61sa2E=?= X-Patchwork-Id: 478033 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 5C210140F91 for ; Sat, 30 May 2015 05:35:24 +1000 (AEST) Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=sourceware.org header.i=@sourceware.org header.b=N4g5OB0l; dkim-atps=neutral DomainKey-Signature: a=rsa-sha1; c=nofws; d=sourceware.org; h=list-id :list-unsubscribe:list-subscribe:list-archive:list-post :list-help:sender:date:from:to:subject:message-id:references :mime-version:content-type:in-reply-to; q=dns; s=default; b=Y62W 3pcVKxUFXxumU/d0qePeSS18d11D1u5hsIUoy3K2T0WjpQw05wrrUmBzN4GcPTSb BOzdjNtxoEcBBQuAlfiDvlGzzAlM2hFTXWOYmerOmeNzsDJOKb2taFAGsxezNHAW tcAv/6Yj/2ERHZ/+UFanqOaY4aVTD6yF0x+isCo= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=sourceware.org; h=list-id :list-unsubscribe:list-subscribe:list-archive:list-post :list-help:sender:date:from:to:subject:message-id:references :mime-version:content-type:in-reply-to; s=default; bh=Dx5DX5iw12 bQeg0y7FHTpfCHoPg=; b=N4g5OB0l+JJXbE7KGK0GRy2Gw6GcAHmnLUkqHPF4Fd +y23K9YPUaeBADoRXgN//wRY2XPgNg0K9nieM0VQ76/Q7pqCM1Xz1MWbAK28xQ5G Qk1tXr99nvVbx8uDrZubJRPjt3xoENKMb9u6EGRT0B1ThPhREgE5LePWulNd8rcl c= Received: (qmail 117031 invoked by alias); 29 May 2015 19:35:18 -0000 Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: libc-alpha-owner@sourceware.org Delivered-To: mailing list libc-alpha@sourceware.org Received: (qmail 117017 invoked by uid 89); 29 May 2015 19:35:18 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=0.7 required=5.0 tests=AWL, BAYES_50, FREEMAIL_FROM, SPF_NEUTRAL autolearn=no version=3.3.2 X-HELO: popelka.ms.mff.cuni.cz Date: Fri, 29 May 2015 21:34:53 +0200 From: =?utf-8?B?T25kxZllaiBCw61sa2E=?= To: libc-alpha@sourceware.org Subject: [PATCH 2/*] generic strstr, strcasestr, memmem Message-ID: <20150529193453.GA28852@domone> References: <20150529190952.GA23952@domone> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20150529190952.GA23952@domone> User-Agent: Mutt/1.5.20 (2009-06-14) I added a new version of string skeleton also for improving strstr hot path. This uses same trick as x64, which is looking for leading digraphs, switching to two-way algorithm when you reach superlinear number of comparisons. This superseedes my previous strstr... patches which were written without fast way of finding digraphs. I didn't tuned buy-or-rent cost of switchings yet, that could be upto discussion. Also previous strcasestr ascii trick is included. I didn't tuned a strstr expressions yet. They could be simplified but thats for separate patch, also I could add strstr with ascii trick. Other speedup would be that I could add a iterator interface. It would speed up patterns like while (s = strchr(s + 1, 'c') ... that would become strchr_iterator i; strchr_start (s, c, &i); while (s = strchr_next(&i)) { } Implementation would be saving mask created by strchr, then strchr_next inline would first look up for mask and would resort to call at most once per 32 bytes when it needs to move for next mask. Same trick would be used for faster iteration over digraphs and for strstr itself. I wonder if it would be possible to make gcc recognize these patterns and do transformation automatically. It would be similar analysis like optimizing out quadratic strcat loop that I mentioned before. * benchtests/bench-strcasestr.c: Remove simple_strcasestr. * string/test-strcasestr.c: Likewise. * sysdeps/generic/string_vector_search.h: New file. * string/strstr.c (strstr): Use fast digraph search. * string/strcasestr.c (STRCASESTR): Likewise. * string/memmem.c (__memmem): Likewise. diff --git a/benchtests/bench-strcasestr.c b/benchtests/bench-strcasestr.c index 33531a4..6c6309f 100644 --- a/benchtests/bench-strcasestr.c +++ b/benchtests/bench-strcasestr.c @@ -21,10 +21,6 @@ #include "bench-string.h" -#define STRCASESTR simple_strcasestr -#define NO_ALIAS -#define __strncasecmp strncasecmp -#include "../string/strcasestr.c" static char * @@ -53,7 +49,6 @@ stupid_strcasestr (const char *s1, const char *s2) typedef char *(*proto_t) (const char *, const char *); IMPL (stupid_strcasestr, 0) -IMPL (simple_strcasestr, 0) IMPL (strcasestr, 1) diff --git a/string/memmem.c b/string/memmem.c index 8a81f65..a13e16f 100644 --- a/string/memmem.c +++ b/string/memmem.c @@ -35,6 +35,45 @@ #undef memmem +#include + +struct cmask +{ + unsigned long int n0, n1; + size_t needle_size; +}; +#define CUSTOM_CMASK +#define CMASK_PARAM struct cmask cmask + +#define LOOKAHEAD 1 +#define EXPRESSION_NOCARRY(x, cmask) \ + ( bytes_equal_nocarry (BYTE (0), cmask.n0) \ + & bytes_equal_nocarry (BYTE (1), cmask.n1)) + +/* We calculate mask, not look for first byte and carry could corrupt + following bytes. Only possible optimization would be use + contains_zero (x) as it must end there. */ +#define EXPRESSION(x, cmask) EXPRESSION_NOCARRY(x, cmask) + +static int +check (char *s, char *n, unsigned long *rent, struct cmask cmask) +{ + /* First two characters were already checked by vector loop. */ + size_t i = 2; + + while (i < cmask.needle_size && s[i] == n[i]) + i++; + + if (i == cmask.needle_size) + return 1; + + rent += i + 10; + return 0; +} + +#define BOUND(p) ((uintptr_t) p >= (uintptr_t) end) +#include + /* Return the first occurrence of NEEDLE in HAYSTACK. Return HAYSTACK if NEEDLE_LEN is 0, otherwise NULL if NEEDLE is not found in HAYSTACK. */ @@ -46,6 +85,7 @@ __memmem (const void *haystack_start, size_t haystack_len, not an array of 'char' values. See ISO C 99 section 6.2.6.1. */ const unsigned char *haystack = (const unsigned char *) haystack_start; const unsigned char *needle = (const unsigned char *) needle_start; + char *ret; if (needle_len == 0) /* The first occurrence of the empty string is deemed to occur at @@ -57,6 +97,31 @@ __memmem (const void *haystack_start, size_t haystack_len, if (__glibc_unlikely (haystack_len < needle_len)) return NULL; + unsigned char *n = (unsigned char *) needle; + if (__BYTE_ORDER == __LITTLE_ENDIAN && needle_len >= 2) + { + struct cmask cmask; + cmask.n0 = n[0] * ones; + cmask.n1 = n[1] * ones; + cmask.needle_size = needle_len; + char *end_search = (char *) (((uintptr_t) haystack) + haystack_len + - needle_len + 1); + if ((uintptr_t) end_search < (uintptr_t) haystack) + end_search = (char *) UINTPTR_MAX; + if (vector_search ((char *) haystack, (char *) needle, + cmask, end_search, &ret)) + return ((uintptr_t) ret < (uintptr_t) end_search) ? ret : NULL; + else + { + haystack_len -= (char *) ret - (char *) haystack; + haystack = (const unsigned char *) ret; + + if (ret == NULL || haystack_len < needle_len) + return NULL; + } + } + + /* Use optimizations in memchr when possible, to reduce the search size of haystack using a linear algorithm with a smaller coefficient. However, avoid memchr for long needles, since we diff --git a/string/strcasestr.c b/string/strcasestr.c index 400fab8..b512134 100644 --- a/string/strcasestr.c +++ b/string/strcasestr.c @@ -57,6 +57,61 @@ #define STRCASESTR __strcasestr #endif +/* A initial fast vectorized loop looking for first digraph. */ + +#include + +struct cmask +{ + vector_int l0, u0, l1, u1; +}; +#define CUSTOM_CMASK +#define CMASK_PARAM struct cmask cmask + + /* We use trick that for when character is in ascii and character + 0 <= c <= 127 could be caselessly equal only one of characters + tolower (c), toupper (c) or a character x in range 128 <= x <= 255 + As these are exactly characters have highest bit set to 1 we adjust + a expression from strstr and filter lower bits by anding with high_bits. + */ + +#define LOOKAHEAD 1 +#define BYTE_EXPRESSION(x, l, u) \ + ( bytes_equal_nocarry (x, l) \ + | bytes_equal_nocarry (x, u) \ + | x) + +#define EXPRESSION_NOCARRY(x, cmask) (( \ + ( BYTE_EXPRESSION (BYTE(0), cmask.l0, cmask.u0) \ + & BYTE_EXPRESSION (BYTE(1), cmask.l1, cmask.u1) \ + ) | contains_zero_nocarry (BYTE (1))) & high_bits) + +/* We calculate mask, not look for first byte and carry could corrupt + following bytes. Only possible optimization would be use + contains_zero (x) as it must end there. */ +#define EXPRESSION(x, cmask) EXPRESSION_NOCARRY (x, cmask) + +static int +check (char *s, char *n, unsigned long *rent, struct cmask cmask) +{ + /* We used heuristic. Need to recheck. */ + size_t i = 0; + if (!s[1]) + return 1; + + while (n[i] && tolower (s[i]) == tolower (n[i])) + i++; + + if (!n[i]) + return 1; + + rent += i + 10; + return 0; +} + +#include + +#include "../locale/localeinfo.h" /* Find the first occurrence of NEEDLE in HAYSTACK, using case-insensitive comparison. This function gives unspecified @@ -66,10 +121,33 @@ STRCASESTR (const char *haystack_start, const char *needle_start) { const char *haystack = haystack_start; const char *needle = needle_start; + char *ret; size_t needle_len; /* Length of NEEDLE. */ size_t haystack_len; /* Known minimum length of HAYSTACK. */ bool ok = true; /* True if NEEDLE is prefix of HAYSTACK. */ + __locale_t loc = _NL_CURRENT_LOCALE; + struct __locale_data *ctype = loc->__locales[LC_CTYPE]; + int nonascii = ctype->values[_NL_ITEM_INDEX (_NL_CTYPE_NONASCII_CASE)].word; + + unsigned char *n = (unsigned char *) needle; + if (__BYTE_ORDER == __LITTLE_ENDIAN && !nonascii && haystack[0] != '\0' && + n[0] != '\0' && n[0] < 128 && n[1] != '\0' && n[1] < 128) + { + struct cmask cmask; + cmask.l0 = tolower (n[0]) * ones; + cmask.u0 = toupper (n[0]) * ones; + cmask.l1 = tolower (n[1]) * ones; + cmask.u1 = toupper (n[1]) * ones; + + if (vector_search ((char *) haystack, (char *) needle, cmask,\ + NULL, &ret)) + return (ret[1] != '\0') ? ret : NULL; + else + haystack = ret; + } + + /* Determine length of NEEDLE, and in the process, make sure HAYSTACK is at least as long (no point processing all of a long NEEDLE if HAYSTACK is too short). */ diff --git a/string/strstr.c b/string/strstr.c index 045e878..9429cc6 100644 --- a/string/strstr.c +++ b/string/strstr.c @@ -45,6 +45,48 @@ #define STRSTR strstr #endif +#include + +struct cmask +{ + vector_int n0, n1; +}; +#define CUSTOM_CMASK +#define CMASK_PARAM struct cmask cmask + +#define LOOKAHEAD 1 +#define EXPRESSION_NOCARRY(x, cmask) (\ + ( bytes_equal_nocarry (BYTE (0), cmask.n0) \ + & bytes_equal_nocarry (BYTE (1), cmask.n1)) \ + | contains_zero_nocarry (BYTE (1))) + +/* We calculate mask, not look for first byte and carry could corrupt + following bytes. Only possible optimization would be use + contains_zero (x) as it must end there. */ +#define EXPRESSION(x, cmask) EXPRESSION_NOCARRY(x, cmask) + +static int +check (char *s, char *n, unsigned long *rent, struct cmask cmask) +{ + /* First two characters were already checked by vector loop. */ + size_t i = 2; + if (!s[1]) + return 1; + + while (n[i] && s[i] == n[i]) + i++; + + if (!n[i]) + return 1; + + rent += i + 10; + return 0; +} + +#include + + + /* Return the first occurrence of NEEDLE in HAYSTACK. Return HAYSTACK if NEEDLE is empty, otherwise NULL if NEEDLE is not found in HAYSTACK. */ @@ -53,10 +95,28 @@ STRSTR (const char *haystack_start, const char *needle_start) { const char *haystack = haystack_start; const char *needle = needle_start; + char *ret; size_t needle_len; /* Length of NEEDLE. */ size_t haystack_len; /* Known minimum length of HAYSTACK. */ bool ok = true; /* True if NEEDLE is prefix of HAYSTACK. */ + unsigned char *n = (unsigned char *) needle; + if (__BYTE_ORDER == __LITTLE_ENDIAN + && haystack[0] != '\0' && n[0] != '\0' && n[1] != '\0') + { + struct cmask cmask; + cmask.n0 = n[0] * ones; + cmask.n1 = n[1] * ones; + + if (vector_search ((char *) haystack, (char *) needle, cmask,\ + NULL, &ret)) + return ret[1] != '\0' ? ret : NULL; + else + haystack = ret; + } + + + /* Determine length of NEEDLE, and in the process, make sure HAYSTACK is at least as long (no point processing all of a long NEEDLE if HAYSTACK is too short). */ diff --git a/string/test-strcasestr.c b/string/test-strcasestr.c index 489dc84..3c01881 100644 --- a/string/test-strcasestr.c +++ b/string/test-strcasestr.c @@ -25,7 +25,6 @@ #define STRCASESTR simple_strcasestr #define NO_ALIAS #define __strncasecmp strncasecmp -#include "strcasestr.c" static char * @@ -54,7 +53,6 @@ stupid_strcasestr (const char *s1, const char *s2) typedef char *(*proto_t) (const char *, const char *); IMPL (stupid_strcasestr, 0) -IMPL (simple_strcasestr, 0) IMPL (strcasestr, 1) diff --git a/sysdeps/generic/string_vector_search.h b/sysdeps/generic/string_vector_search.h new file mode 100644 index 0000000..5ad714b --- /dev/null +++ b/sysdeps/generic/string_vector_search.h @@ -0,0 +1,85 @@ +/* Algorithm to select a faster string search implementation. + Copyright (C) 2015 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +/* + For string search algorithms there are two ways how you could look + at performance. + + From practical standpoint a naive algorithm is baseline. Most theoretical + algorithms (BW, KMP, ...) are slower because they need to do expensive random access. This makes prohibitive constant cost per character while naive + algorithm is linear on random inputs as most of time you don't match string. + Also potential gains are nonexistent, user supplies needle so you could be + happy when its at least 8 characters long and most haystacks are less than 64 bytes large. + + On other hand we need to ensure a linear worst-case of our algorithm. + + We do that with bit of accounting. We count number of comparisons and when + it exceeds a size of haystack times some constant we switch to two-way + algorithm with guaranteed linear time. + + Main performance boost is gained by vectorizing naive algorithm checks. + We will check in parallel if first digraph matches. That should be quite rare, in english most frequent digraph is th with frequency around 1%. + + Then after each digraph found we face decision if to keep looking for + digraphs or switch to two-way algorithm. These are covered as common + problem in online algorithm setting: buy-or-rent problem. + A precomputation in two-way algorithm with needle size n takes + around same time as 20 character comparisons so in worst case a two-way + algorithm would be twenty times slower than naive for haystacks of size n+1. + A solution would be pay higher rent rate until it accumulates to buy cost. + Then we would in worst case be twice slower than selecting optimal + implementation from start. + + That would work except it needs strlen (needle) which is unnecessary + in practice. To show that haystack cannot match needle it suffices to know + that there is no leading triplet from needle in haystack. As unless there + was a planted needle in haystack and false positive is unlikely we likely + don't have to inspect more than three or four characters from needle. Also + correct accounting takes time so we approximate cost based on number of + comparisons and vector searchs. + + */ + +#include + +static int +vector_search (char *haystack, char *needle, CMASK_PARAM, + char *end, char **ret) +{ + char *s = haystack; + unsigned long rent = 0; + while (rent < 256 + (s - haystack)) + { + s = string_skeleton (s, cmask, end); + if (s == NULL || BOUND (s)) + { + *ret = s; + return 0; + } + if (check (s, needle, &rent, cmask)) + { + *ret = s; + return 1; + } + else + s++; + } + /* Superlinear behaviour detected, switch to two-way algorithm. */ + *ret = s; + return 0; +}