From patchwork Tue Mar 29 14:08:41 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Adhemerval Zanella Netto X-Patchwork-Id: 602941 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 3qZCLp3xmcz9s9Z for ; Wed, 30 Mar 2016 01:09:06 +1100 (AEDT) Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; secure) header.d=sourceware.org header.i=@sourceware.org header.b=hveopCKM; dkim-atps=neutral DomainKey-Signature: a=rsa-sha1; c=nofws; d=sourceware.org; h=list-id :list-unsubscribe:list-subscribe:list-archive:list-post :list-help:sender:subject:to:references:cc:from:message-id:date :mime-version:in-reply-to:content-type :content-transfer-encoding; q=dns; s=default; b=MoRC8gYTmkmCgtva VSP6kJO8h0vzQZJGOwul4OMAV6abwTomaZjW8Quf6UklsM5r1VR1W5yQOzD4YM3W N0FNayQKVZm+frh5IuCAd82nRxaEGP73TzOUgqE4o0gJbZXwtyZbtqfbYm2seyGo Eme5kk+eh51OluK0cXg5l5WImOE= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=sourceware.org; h=list-id :list-unsubscribe:list-subscribe:list-archive:list-post :list-help:sender:subject:to:references:cc:from:message-id:date :mime-version:in-reply-to:content-type :content-transfer-encoding; s=default; bh=X9asvsO/9VjyXgn4hLdxlE zBf3w=; b=hveopCKM+ZocGFxKojmFiGScGRJMPnCdfI7OPdUzP404qt3xHFEunj zg/TgJplQEnTB1gjhbGJxgOfTBgRtbk5Sbsuy8RS7srem5Z/L/rnnLMKkSAEhJy8 TjcblwZh1Q33ZcHrpziMb4wi1VwMQDpQ6jNDx/dBuXGvWoWGYiySQ= Received: (qmail 43813 invoked by alias); 29 Mar 2016 14:08:58 -0000 Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: libc-alpha-owner@sourceware.org Delivered-To: mailing list libc-alpha@sourceware.org Received: (qmail 43794 invoked by uid 89); 29 Mar 2016 14:08:57 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-1.9 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_NONE, SPF_PASS autolearn=ham version=3.3.2 spammy=HX-HELO:sk:mail-yw, H*r:sk:mail-yw, H*RU:sk:mail-yw, Hx-spam-relays-external:sk:mail-yw X-HELO: mail-yw0-f179.google.com X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:subject:to:references:cc:from:message-id:date :user-agent:mime-version:in-reply-to:content-transfer-encoding; bh=Mv7bcxqlg1aFsZwS90BArp5DRAiJnJcJkPubPsID62o=; b=h/VfcWrvHtPZgWBEt6au1obALlmgS7Nc0pblpsWjz7FdE5uicSsivmN6V101QsK5JT 6QmKph5lmXvHQhhRE+03XYioN90Jizic1CEAwqdXsgXB282vnA65KUZlyMRVNud2T08I iZ2DImVCUSkDHdoY4Urv1DrNZcH53DqFY26p5eIYtYZ+FH/xImKyUI3UaEYyvbslXMx3 AA83ViRrrzDc9OFLq+eA1GYt+zA8erioT7+grso/V8Sh/ffZzXS1Gd9AgQ+unzMAtrJF awg4NMbRSQACsej5EmnnvxL+LgXlhuvzZ8pFtQj8ETAS7f4YGa9TlyPvpHAH0vNq/4Mo NDJw== X-Gm-Message-State: AD7BkJJNNho3HCHjHSwZCYnAR3kHbTUc6+eMg1zJZgce6D/QftPJCCCfQHXLJlmPkS5mTPsM X-Received: by 10.129.89.197 with SMTP id n188mr1113656ywb.270.1459260525305; Tue, 29 Mar 2016 07:08:45 -0700 (PDT) Subject: Re: [PATCH 2/4] Improve generic strspn performance To: Wilco Dijkstra , "libc-alpha@sourceware.org" References: <1459178389-14133-1-git-send-email-adhemerval.zanella@linaro.org> <1459178389-14133-2-git-send-email-adhemerval.zanella@linaro.org> Cc: nd From: Adhemerval Zanella Message-ID: <56FA8C69.8010709@linaro.org> Date: Tue, 29 Mar 2016 11:08:41 -0300 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.6.0 MIME-Version: 1.0 In-Reply-To: On 29-03-2016 10:02, Wilco Dijkstra wrote: > Adhemerval Zanella wrote: > >> + if (accept[0] == '\0') >> + return 0; >> + if (accept[1] == '\0') >> + { > > GCC doesn't get the static branch prediction correct for the 2nd if, > so it would be useful to add __glibc_unlikely given that single-character > accepts are rare. > >> + s = (unsigned char *) ((size_t)(s) & ~3); >> + unsigned int c0, c1, c2, c3; >> + do { >> + s += 4; >> + c0 = p[s[0]]; >> + c1 = p[s[1]]; >> + c2 = p[s[2]]; >> + c3 = p[s[3]]; >> + } while ((c0 && c1 && c2 && c3) == 1); > > That should use '&' rather than '&&' and '!= 0' similar to how I did it in strcspn. > This will use 3 AND(S) instructions and a single branch. > >> + >> + size_t count = s - (unsigned char *) str; >> + return (c0 && c1) == 0 ? count - !c0 + 1 : count - !c2 + 3; > > Again, c0 & c1 is better and allows CSE with the while expression above. > Also -!c0 +1 is equivalent to c0, -!c2 + 3 is equivalent to c2 + 2 - this is simpler > and faster. > > Otherwise it looks good, and thanks for doing this one too! > > Cheers, > Wilco > Thanks for the review. I think this version fixes the points you noted: diff --git a/string/strspn.c b/string/strspn.c index f0635c1..15d7fa7 100644 --- a/string/strspn.c +++ b/string/strspn.c @@ -25,23 +25,49 @@ /* Return the length of the maximum initial segment of S which contains only characters in ACCEPT. */ size_t -STRSPN (const char *s, const char *accept) +STRSPN (const char *str, const char *accept) { - const char *p; - const char *a; - size_t count = 0; - - for (p = s; *p != '\0'; ++p) - { - for (a = accept; *a != '\0'; ++a) - if (*p == *a) - break; - if (*a == '\0') - return count; - else - ++count; + if (accept[0] == '\0') + return 0; + if (__glibc_unlikely (accept[1] == '\0')) + { + const char *a = str; + for (; *str == *accept; str++); + return str - a; } - return count; + /* Use multiple small memsets to enable inlining on most targets. */ + unsigned char table[256]; + unsigned char *p = memset (table, 0, 64); + memset (p + 64, 0, 64); + memset (p + 128, 0, 64); + memset (p + 192, 0, 64); + + unsigned char *s = (unsigned char*) accept; + /* Different from strcspn it does not add the NULL on the table + so can avoid check if str[i] is NULL, since table['\0'] will + be 0 and thus stopping the loop check. */ + do + p[*s++] = 1; + while (*s); + + s = (unsigned char*) str; + if (!p[s[0]]) return 0; + if (!p[s[1]]) return 1; + if (!p[s[2]]) return 2; + if (!p[s[3]]) return 3; + + s = (unsigned char *) ((size_t)(s) & ~3); + unsigned int c0, c1, c2, c3; + do { + s += 4; + c0 = p[s[0]]; + c1 = p[s[1]]; + c2 = p[s[2]]; + c3 = p[s[3]]; + } while ((c0 & c1 & c2 & c3) != 0); + + size_t count = s - (unsigned char *) str; + return (c0 & c1) == 0 ? count + c0 : count + c2 + 2; } libc_hidden_builtin_def (strspn)