From patchwork Fri Aug 10 19:25:55 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Siddhesh Poyarekar
X-Patchwork-Id: 956450
From: Siddhesh Poyarekar
To: libc-alpha@sourceware.org
Cc: Wilco.Dijkstra@arm.com, carlos@redhat.com
Subject: [PATCH] New benchmark strlen-walk
Date: Sat, 11 Aug 2018 00:55:55 +0530
Message-Id: <20180810192555.11657-1-siddhesh@sourceware.org>

Hi,

This is a second take at a strlen benchmark and it takes a different
approach from the previous linked list idea.  A comment in the test
provides the rationale for the benchmark; to summarize, it focuses on
testing strlen with small to medium sized inputs with different sizes
mixed in, and on walking backwards to try and trick the prefetcher.

The numbers are kinda stable; I'm not super happy with them but they're
close enough to make out a general performance characteristic.

	* benchtests/bench-strlen-walk.c: New benchmark.
	* benchtests/Makefile (string-benchset): Add it.
CC: Wilco.Dijkstra@arm.com
CC: carlos@redhat.com
---
 benchtests/Makefile            |   2 +-
 benchtests/bench-strlen-walk.c | 217 +++++++++++++++++++++++++++++++++
 2 files changed, 218 insertions(+), 1 deletion(-)
 create mode 100644 benchtests/bench-strlen-walk.c

diff --git a/benchtests/Makefile b/benchtests/Makefile
index bcd6a9c26d..31cacef373 100644
--- a/benchtests/Makefile
+++ b/benchtests/Makefile
@@ -43,7 +43,7 @@ string-benchset := bcopy bzero memccpy memchr memcmp memcpy memmem memmove \
 		   strncasecmp strncat strncmp strncpy strnlen strpbrk strrchr \
 		   strspn strstr strcpy_chk stpcpy_chk memrchr strsep strtok \
 		   strcoll memcpy-large memcpy-random memmove-large memset-large \
-		   memcpy-walk memset-walk memmove-walk
+		   memcpy-walk memset-walk memmove-walk strlen-walk
 
 # Build and run locale-dependent benchmarks only if we're building natively.
 ifeq (no,$(cross-compiling))
diff --git a/benchtests/bench-strlen-walk.c b/benchtests/bench-strlen-walk.c
new file mode 100644
index 0000000000..1ac0ae6fdf
--- /dev/null
+++ b/benchtests/bench-strlen-walk.c
@@ -0,0 +1,217 @@
+/* Measure STRLEN functions - walk through a list of elements and measure
+   string lengths.
+   Copyright (C) 2018 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* RATIONALE
+   ---------
+
+   The following assumptions are made in this test about strlen usage in the
+   wild:
+
+   - Target strings are small or medium in size, rarely (if ever) very large.
+   - In well written code, the target string is not in cache since strlen is
+     among the earliest operations on it.
+
+   This test measures the latency of strlen with a mix of sizes up to a maximum
+   length, for a set of maximum lengths.  The distribution of lengths in each
+   set is logarithmic, with the longest length having one occurrence, its half
+   having 2, quarter having 4 and so on.  The lengths are further randomized by
+   adding a jitter of up to 8 bytes.  This makes the small string sets
+   completely arbitrary.
+
+   Further, the buffer and its index are reallocated at every run to ensure
+   that there is no cross talk between implementations.  Finally, the list of
+   strings is walked through backwards to try and trick the hardware
+   prefetcher.  There is an issue with this access too though, which is that
+   the most recently touched string ends up getting measured first when we go
+   from front to back, so that introduces some cache side effect.  */
+
+#define TEST_MAIN
+#define MIN_PAGE_SIZE (getpagesize () * 4096)
+#ifndef WIDE
+# define TEST_NAME "strlen"
+#else
+# define TEST_NAME "wcslen"
+#endif
+#include "bench-string.h"
+
+#ifndef WIDE
+# define STRLEN strlen
+# define CHAR char
+# define MAX_CHAR CHAR_MAX
+#else
+# include <wchar.h>
+# define STRLEN wcslen
+# define CHAR wchar_t
+# define MAX_CHAR WCHAR_MAX
+#endif
+
+#include "json-lib.h"
+
+typedef size_t (*proto_t) (const CHAR *);
+
+size_t
+simple_STRLEN (const CHAR *s)
+{
+  const CHAR *p;
+
+  for (p = s; *p; ++p);
+  return p - s;
+}
+
+#ifndef WIDE
+size_t
+builtin_strlen (const CHAR *p)
+{
+  return __builtin_strlen (p);
+}
+IMPL (builtin_strlen, 0)
+#endif
+
+IMPL (simple_STRLEN, 0)
+IMPL (STRLEN, 1)
+
+static unsigned char **str_index;
+
+static void
+do_one_test (json_ctx_t *json_ctx, impl_t *impl, size_t last_str_index)
+{
+  timing_t start, stop, cur;
+
+  TIMING_NOW (start);
+  for (int i = last_str_index - 1; i >= 0; i--)
+    CALL (impl, (char *) str_index[i]);
+  TIMING_NOW (stop);
+
+  TIMING_DIFF (cur, start, stop);
+
+  json_element_double (json_ctx, (double) cur / (double) last_str_index);
+}
+
+/* Split the buffer into strings and populate an str_index.  Return
+   the size of the str_index so that it can be iterated backwards.  */
+static size_t
+setup_strings (size_t maxlen)
+{
+  unsigned char *p = buf1;
+  size_t orig_maxlen = maxlen, i = 0;
+  int cur_cnt, cnt;
+  size_t logn = 0, m = maxlen;
+
+  while ((m >>= 1) > 0)
+    logn++;
+
+  /* Size of the index is buf_size*(2*M-1)/(M*(log2(M)+1)), where M is maxlen
+     and we have a logarithmic distribution of string sizes, i.e. 1 of maxlen,
+     2 of maxlen/2, 4 of maxlen/4 and so on.  Round up to avoid overflow.  */
+  size_t index_size = (2 * buf1_size - buf1_size / orig_maxlen) / (logn + 1);
+  index_size *= sizeof (unsigned char *);
+
+  str_index = malloc (index_size);
+
+  if (str_index == NULL)
+    error (1, ENOMEM, "Out of memory\n");
+
+  srand (42);
+  cur_cnt = cnt = 1;
+
+  size_t len = maxlen + rand () % 8;
+
+  while (p < buf1 + buf1_size - len - 1)
+    {
+      str_index[i++] = p;
+      memset (p, 'a', len);
+      p[len] = '\0';
+      p += len + 1;
+
+      cnt--;
+      if (cnt == 0)
+        {
+          cur_cnt = cur_cnt << 1;
+          cnt = cur_cnt;
+          maxlen >>= 1;
+          if (maxlen == 0)
+            maxlen = orig_maxlen;
+        }
+      len = maxlen + rand () % 8;
+    }
+
+  return i;
+}
+
+
+static void
+do_test (json_ctx_t *json_ctx, size_t len)
+{
+  json_element_object_begin (json_ctx);
+  json_attr_uint (json_ctx, "length", len);
+  json_array_begin (json_ctx, "timings");
+
+  /* Rebuild everything for each implementation so that we don't have cache
+     side effects across implementations.  */
+  FOR_EACH_IMPL (impl, 0)
+    {
+      size_t i = setup_strings (len);
+      do_one_test (json_ctx, impl, i);
+      alloc_bufs ();
+      free (str_index);
+    }
+
+  json_array_end (json_ctx);
+  json_element_object_end (json_ctx);
+}
+
+int
+test_main (void)
+{
+  json_ctx_t json_ctx;
+  size_t i;
+
+  test_init ();
+
+  json_init (&json_ctx, 0, stdout);
+
+  json_document_begin (&json_ctx);
+  json_attr_string (&json_ctx, "timing_type", TIMING_TYPE);
+
+  json_attr_object_begin (&json_ctx, "functions");
+  json_attr_object_begin (&json_ctx, TEST_NAME);
+  json_attr_string (&json_ctx, "bench-variant", "random");
+
+  json_array_begin (&json_ctx, "ifuncs");
+  FOR_EACH_IMPL (impl, 0)
+    json_element_string (&json_ctx, impl->name);
+  json_array_end (&json_ctx);
+
+  json_array_begin (&json_ctx, "results");
+
+  /* The maximum sizes to test.  These are arbitrary.  */
+  const size_t size_ranges[] = {64, 128, 512, 2048, 8192};
+
+  for (i = 0; i < sizeof (size_ranges) / sizeof (size_t); i++)
+    do_test (&json_ctx, size_ranges[i]);
+
+  json_array_end (&json_ctx);
+  json_attr_object_end (&json_ctx);
+  json_attr_object_end (&json_ctx);
+  json_document_end (&json_ctx);
+
+  return ret;
+}
+
+#include <support/test-driver.c>