diff mbox

[PING] Wordexp benchtest: good, bad and unreliable.

Message ID 20150618163408.GA16797@domone
State New
Headers show

Commit Message

Ondřej Bílka June 18, 2015, 4:34 p.m. UTC
On Tue, Jun 16, 2015 at 03:24:03PM -0400, Carlos O'Donell wrote:
> On 06/16/2015 01:32 PM, Ondřej Bílka wrote:
> >> As I mentioned before for wordexp performance depends on too many
> >> parameters. So here is microbenchmark that shows some improvements and
> >> some regressions. There is lot of noise. I ran these three times and on
> >> one * and ffffffffoo pattern become twice slower for no reason.
> 
> There is a reason, but we just don't know it yet.
>
My guess that difference was caused by stat variance but I couldn't
verify that. 

> >> So how do you decide with conflicted data like this?
> 
> You can't make an "automatic" or "automated" decision, but as a baseline
> the statistics do help. With noisy benchmarks you can still look at
> potential trends in the noise. We don't want the bad patterns to get 5x
> slower.
> 

Thats for most of functions simply unavoidable, what I submitted is classical example that you cannot avoid some paterns being 5 times slower as you are shifting worst case.

Here with my wordexp change patterns that didn't used strchr(ifs,x)
would get five times slower when user uses big IFS.

On other hand patterns that checked strchr(ifs,x) will become five times
faster with this change.

These tradeoffs are actually quite routine, for example most of string
routines tend to be slow when inputs are always at page boundary. You
cannot avoid that as alternatives move worst case to much more probable
scenarios like empirical probability ~1/8 that 16 bytes cross 64 byte boundary.

I could improve performance by optimizing that case aggresively. However
it would likely be net loss as cross-page is most of time cold path and
I try to optimize it to size.

> Similarly we will have conflicting workloads modeled as benchmarks, and
> the hard part is that as an expert we have to make a difficult choice.
> We have to understand the workload and decide "Yes, we ignore this performance
> decrease because we expect fewer people will suffer." However, in order
> to do that we need comments about exactly what workload we're trying to model.
>
Thats problem that you need to collect lot of domain knowledge to be
usable. Problem is that I didn't find yet application that uses it.
 
> For most of our microbenchmarks the workloads are "raw performance of function
> call" and that's very simplistic, it's also very generic.
> 
Thats quite problem when there are many variants that need to be tried
what should you priorize, as this migth be improved or not depending on
average ifs size and average size of pattern.

> This group of tests should have a comment about what's it's testing.
> 
> What workload are we modeling?
>
None in particular, just trying to measure performance change.

> >> +
> >> +
> >> +  do_test ("*", 0);
> >> +  setenv("IFS","                                                  \
> >> +                                               ", 1);
> >> +
> >> +  do_test ("foo", 0);
> >> +  do_test ("ffoo", 0);
> >> +  do_test ("ffffoo", 0);
> >> +  do_test ("ffffffffoo", 0);
> >> +  do_test ("ffffffffffffffffoo", 0);
> >> +  do_test ("ffffffffffffffffffffffffffffffffoo", 0);
> >> +do_test ("ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffoo", 0);
> >> +  do_test ("*", 0);
> >> +
> 
> Similarly with the IFS change. This is not a normal IFS change for an average workload.
> 
> Usually one changes IFS to be a series of characters, or just tab, or just newline etc.
> 
These are worst case for my change. There is constant overhead per call
to parse IFS, then my patch saves some cycles per character depending on
which path we take.

On paths of f...ffoo form you save cost of conversion
 strchr ("\n|&;<>(){}", ch)
per character.

I read code more and I could also measure saving of
strchr(ifs, ch) 
per character for patterns ?fff...fffo.

However measuring exact performance is bit difficult again due to noise
as wordexp calls glob which takes ~70000 cycles. Thats lot compared with
savings in wordexp itself, difference between longest pattern was 4500
which this change reduced to 500 cycles. I am not completely sure about
that numbers as I got that estimate by subtracting performance of glob
without ifs.

old one:

                       	wordexp
Pattern foo flags 0:	1131.86
Pattern ffoo flags 0:	629.5
Pattern ffffoo flags 0:	709.562
Pattern ffffffffoo flags 0:	866.359
Pattern ffffffffffffffffoo flags 0:	1115.53
Pattern ffffffffffffffffffffffffffffffffoo flags 0:	1591.81
Pattern ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffoo flags 0:	2608.45
Pattern ?foo flags 0:	74733.9
Pattern ?ffoo flags 0:	71044.9
Pattern ?ffffoo flags 0:	71165.1
Pattern ?ffffffffoo flags 0:	71009
Pattern ?ffffffffffffffffoo flags 0:	71283.4
Pattern ?ffffffffffffffffffffffffffffffffoo flags 0:	71748.2
Pattern ?ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffoo flags 0:	73446.5
Pattern * flags 0:	135284
Hundred byte IFS
Pattern foo flags 0:	1067.41
Pattern ffoo flags 0:	1103.17
Pattern ffffoo flags 0:	1164.02
Pattern ffffffffoo flags 0:	1449.36
Pattern ffffffffffffffffoo flags 0:	2375.23
Pattern ffffffffffffffffffffffffffffffffoo flags 0:	2170.95
Pattern ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffoo flags 0:	3120.31
Pattern ?foo flags 0:	72397
Pattern ?ffoo flags 0:	71573.4
Pattern ?ffffoo flags 0:	71768.1
Pattern ?ffffffffoo flags 0:	71925.9
Pattern ?ffffffffffffffffoo flags 0:	72591.7
Pattern ?ffffffffffffffffffffffffffffffffoo flags 0:	73172.4
Pattern ?ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffoo flags 0:	78188.5
Pattern * flags 0:	134651


new one:

                       	wordexp
Pattern foo flags 0:	1056.98
Pattern ffoo flags 0:	680.562
Pattern ffffoo flags 0:	690.219
Pattern ffffffffoo flags 0:	842.516
Pattern ffffffffffffffffoo flags 0:	1016.67
Pattern ffffffffffffffffffffffffffffffffoo flags 0:	1411.91
Pattern ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffoo flags 0:	2174.77
Pattern ?foo flags 0:	81948.1
Pattern ?ffoo flags 0:	71048.2
Pattern ?ffffoo flags 0:	71345
Pattern ?ffffffffoo flags 0:	71098.2
Pattern ?ffffffffffffffffoo flags 0:	71314.9
Pattern ?ffffffffffffffffffffffffffffffffoo flags 0:	71669
Pattern ?ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffoo flags 0:	72204.1
Pattern * flags 0:	137048
Hundred byte IFS
Pattern foo flags 0:	1568.56
Pattern ffoo flags 0:	1581.34
Pattern ffffoo flags 0:	1627.89
Pattern ffffffffoo flags 0:	1924.27
Pattern ffffffffffffffffoo flags 0:	2109.28
Pattern ffffffffffffffffffffffffffffffffoo flags 0:	2456.67
Pattern ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffoo flags 0:	3204.42
Pattern ?foo flags 0:	74253.8
Pattern ?ffoo flags 0:	72561.5
Pattern ?ffffoo flags 0:	72630.3
Pattern ?ffffffffoo flags 0:	72794.3
Pattern ?ffffffffffffffffoo flags 0:	72951
Pattern ?ffffffffffffffffffffffffffffffffoo flags 0:	73194.2
Pattern ?ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffoo flags 0:	74068.2
Pattern * flags 0:	137750

Comments

Carlos O'Donell June 18, 2015, 5:39 p.m. UTC | #1
On 06/18/2015 12:34 PM, Ondřej Bílka wrote:
>> Similarly we will have conflicting workloads modeled as benchmarks, and
>> the hard part is that as an expert we have to make a difficult choice.
>> We have to understand the workload and decide "Yes, we ignore this performance
>> decrease because we expect fewer people will suffer." However, in order
>> to do that we need comments about exactly what workload we're trying to model.
>>
> Thats problem that you need to collect lot of domain knowledge to be
> usable. Problem is that I didn't find yet application that uses it.

mailx uses wordexp to expand folder paths.

>>
> None in particular, just trying to measure performance change.

What about looking at feeding paths into wordexp and using it like mailx
does to do light-weight globbing?

See mailx-12.5/fio.c, where globname uses wordexp to do the expansions,
and also expand as used in the rest of mailx.

Cheers,
Carlos.
diff mbox

Patch

diff --git a/benchtests/Makefile b/benchtests/Makefile
index 8e615e5..fb737da 100644
--- a/benchtests/Makefile
+++ b/benchtests/Makefile
@@ -35,7 +35,7 @@  string-bench := bcopy bzero memccpy memchr memcmp memcpy memmem memmove \
 		strcat strchr strchrnul strcmp strcpy strcspn strlen \
 		strncasecmp strncat strncmp strncpy strnlen strpbrk strrchr \
 		strspn strstr strcpy_chk stpcpy_chk memrchr strsep strtok \
-		strcoll
+		strcoll wordexp
 string-bench-all := $(string-bench)
 
 # We have to generate locales
diff --git a/benchtests/bench-wordexp.c b/benchtests/bench-wordexp.c
new file mode 100644
index 0000000..7696f73
--- /dev/null
+++ b/benchtests/bench-wordexp.c
@@ -0,0 +1,118 @@ 
+/* Measure wordexp functions.
+   Copyright (C) 2015 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#define TEST_MAIN
+#define TEST_NAME "wordexp"
+#include "bench-string.h"
+
+#include <wordexp.h>
+
+typedef int (*proto_t) (const char *, wordexp_t *, int);
+
+IMPL (wordexp, 1)
+
+
+static void
+do_one_test (impl_t *impl, const char *s1, int flag)
+{
+  size_t i, iters = INNER_LOOP_ITERS;
+  timing_t start, stop, cur;
+
+  TIMING_NOW (start);
+  for (i = 0; i < iters; ++i)
+    {
+      wordexp_t out;
+      CALL (impl, s1, &out, flag);
+    }
+  TIMING_NOW (stop);
+
+  TIMING_DIFF (cur, start, stop);
+
+  TIMING_PRINT_MEAN ((double) cur, (double) iters);
+}
+
+
+static void
+do_test (const char *s1, int flag)
+{
+
+  printf ("Pattern %s flags %i:", s1, flag);
+
+  FOR_EACH_IMPL (impl, 0)
+    do_one_test (impl, s1, flag);
+
+  putchar ('\n');
+}
+
+static int
+test_main (void)
+{
+  test_init ();
+
+  printf ("%23s", "");
+  FOR_EACH_IMPL (impl, 0)
+    printf ("\t%s", impl->name);
+  putchar ('\n');
+
+  do_test ("foo", 0);
+  do_test ("ffoo", 0);
+  do_test ("ffffoo", 0);
+  do_test ("ffffffffoo", 0);
+  do_test ("ffffffffffffffffoo", 0);
+  do_test ("ffffffffffffffffffffffffffffffffoo", 0);
+do_test ("ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffoo", 0);
+
+  do_test ("?foo", 0);
+  do_test ("?ffoo", 0);
+  do_test ("?ffffoo", 0);
+  do_test ("?ffffffffoo", 0);
+  do_test ("?ffffffffffffffffoo", 0);
+  do_test ("?ffffffffffffffffffffffffffffffffoo", 0);
+do_test ("?ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffoo", 0);
+
+
+  do_test ("*", 0);
+
+  printf ("Hundred byte IFS\n");
+
+
+  setenv("IFS","                                                  \
+                                               ", 1);
+
+  do_test ("foo", 0);
+  do_test ("ffoo", 0);
+  do_test ("ffffoo", 0);
+  do_test ("ffffffffoo", 0);
+  do_test ("ffffffffffffffffoo", 0);
+  do_test ("ffffffffffffffffffffffffffffffffoo", 0);
+do_test ("ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffoo", 0);
+
+  do_test ("?foo", 0);
+  do_test ("?ffoo", 0);
+  do_test ("?ffffoo", 0);
+  do_test ("?ffffffffoo", 0);
+  do_test ("?ffffffffffffffffoo", 0);
+  do_test ("?ffffffffffffffffffffffffffffffffoo", 0);
+do_test ("?ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffoo", 0);
+
+  do_test ("*", 0);
+
+  return ret;
+}
+
+#include "../test-skeleton.c"