Message ID | 20150512210946.GB24057@domone |
---|---|
State | New |
Headers | show |
On 05/12/2015 11:09 PM, Ondřej Bílka wrote: > Main problem of current benchtest was very biased input. I replaced that > with randomly generated ones that are closer to average case. But strstr is not run on random strings. Is there some easy way to generated traces of actual strstr arguments from a running system, perhaps using Systemtap?
On Thu, May 14, 2015 at 11:33:58AM +0200, Florian Weimer wrote: > On 05/12/2015 11:09 PM, Ondřej Bílka wrote: > > Main problem of current benchtest was very biased input. I replaced that > > with randomly generated ones that are closer to average case. > > But strstr is not run on random strings. > > Is there some easy way to generated traces of actual strstr arguments > from a running system, perhaps using Systemtap? > Use my profiler here. http://kam.mff.cuni.cz/~ondra/benchmark_string/strstr_profile250813.tar.bz2 But it won't help you much with this case. Haystacks are really short, 61% of calls end with match in first 16 bytes. See: http://kam.mff.cuni.cz/~ondra/benchmark_string/profile/results/result.html strstr function or http://kam.mff.cuni.cz/~ondra/benchmark_string/i7_ivy_bridge/strstr_profile/results_gcc/result.html While its extremely valuable information and you should focus to optimize header it would probably report little change as both implementations just do initial strchr. Longer strings where you could see improvement don't exist in practice. So you are left with generating some data.
On Thu, May 14, 2015 at 11:33:58AM +0200, Florian Weimer wrote: > On 05/12/2015 11:09 PM, Ondřej Bílka wrote: > > Main problem of current benchtest was very biased input. I replaced that > > with randomly generated ones that are closer to average case. > > But strstr is not run on random strings. > > Is there some easy way to generated traces of actual strstr arguments > from a running system, perhaps using Systemtap? > To be able to rerun these I wrote a dryrun, I send a previous version of it some time ago. As systemtap problem is that it offers only problems with zero benefit as you need just write strings to disk which is easier just in pure c without unnecessary wrapping of that, I put current version here: http://kam.mff.cuni.cz/~ondra/dryrun.tar.bz2 You should record data with LD_PRELOAD=path/record_strstr.so bash These could be replayed with LD_PRELOADed suitable alternative implementation with LD_PRELOAD=new_strstr.so ./bench_strstr Then you can inspect data with ./show_strstr and finally summaries from these are created by ./summary #for brief one or ./summary_strstr -u # to show only unique binaries or ./summary_strstr -a # to also show repeated recordings of same binary For me ./summary_strstr -u shows following and I wont submit raw data as it contains at least list of my mail headers. replaying mc average size 4.62 needle size 1.01 comparisons 6.62 digraphs 0.02 trigraphs 0.00 calls 2440 succeed 99.7% latencies -18.7 -18.6 s1 aligned to 4 bytes 99.9% aligned to 8 bytes 94.5% aligned to 16 bytes 94.5% s2 aligned to 4 bytes 99.8% aligned to 8 bytes 99.8% aligned to 16 bytes 99.7% needle found in n bytes: n <= 0: 0.2% n <= 1: 17.0% n <= 2: 66.4% n <= 3: 69.1% n <= 4: 71.0% n <= 8: 81.4% n <= 16: 95.6% n <= 32: 99.8% n <= 64: 100.0% needle size: n <= 0: 0.0% n <= 1: 99.8% n <= 2: 99.8% n <= 3: 99.8% n <= 4: 99.9% n <= 8: 100.0% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0% replaying mutt average size 20.60 needle size 2.01 comparisons 20.90 digraphs 0.08 trigraphs 0.00 calls 3505 succeed 8.4% latencies -97.4 -88.6 s1 aligned to 4 bytes 99.7% aligned to 8 bytes 99.5% aligned to 16 bytes 99.5% s2 aligned to 4 bytes 1.4% aligned to 8 bytes 0.7% aligned to 16 bytes 0.1% needle found in n bytes: n <= 0: 8.0% n <= 1: 8.3% n <= 2: 8.3% n <= 3: 8.9% n <= 4: 9.0% n <= 8: 12.2% n <= 16: 68.9% n <= 32: 84.1% n <= 64: 94.8% needle size: n <= 0: 0.0% n <= 1: 0.0% n <= 2: 99.3% n <= 3: 99.9% n <= 4: 99.9% n <= 8: 100.0% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0% replaying /usr/bin/python average size 53.38 needle size 8.55 comparisons 54.57 digraphs 0.10 trigraphs 0.00 calls 60 succeed 0.0% latencies -73.0 -68.5 s1 aligned to 4 bytes 100.0% aligned to 8 bytes 100.0% aligned to 16 bytes 100.0% s2 aligned to 4 bytes 26.7% aligned to 8 bytes 26.7% aligned to 16 bytes 26.7% needle found in n bytes: n <= 0: 1.7% n <= 1: 1.7% n <= 2: 1.7% n <= 3: 1.7% n <= 4: 1.7% n <= 8: 1.7% n <= 16: 1.7% n <= 32: 21.7% n <= 64: 60.0% needle size: n <= 0: 1.7% n <= 1: 1.7% n <= 2: 1.7% n <= 3: 1.7% n <= 4: 26.7% n <= 8: 26.7% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0% replaying /opt/google/chrome/chrome average size 6.95 needle size 1.35 comparisons 8.63 digraphs 0.05 trigraphs 0.00 calls 1990 succeed 79.8% latencies -44.7 -46.4 s1 aligned to 4 bytes 96.7% aligned to 8 bytes 95.5% aligned to 16 bytes 95.0% s2 aligned to 4 bytes 90.5% aligned to 8 bytes 89.6% aligned to 16 bytes 75.4% needle found in n bytes: n <= 0: 0.2% n <= 1: 0.2% n <= 2: 0.3% n <= 3: 0.4% n <= 4: 46.0% n <= 8: 78.3% n <= 16: 96.6% n <= 32: 99.7% n <= 64: 99.9% needle size: n <= 0: 0.1% n <= 1: 90.6% n <= 2: 95.0% n <= 3: 95.7% n <= 4: 95.7% n <= 8: 99.0% n <= 16: 99.8% n <= 32: 100.0% n <= 64: 100.0% replaying /usr/lib/kde4/libexec/kdm_greet average size 11.38 needle size 8.99 comparisons 11.41 digraphs 0.00 trigraphs 0.00 calls 23221 succeed 0.2% latencies -14.4 -13.0 s1 aligned to 4 bytes 100.0% aligned to 8 bytes 100.0% aligned to 16 bytes 100.0% s2 aligned to 4 bytes 0.1% aligned to 8 bytes 0.0% aligned to 16 bytes 0.0% needle found in n bytes: n <= 0: 0.0% n <= 1: 0.0% n <= 2: 0.0% n <= 3: 0.0% n <= 4: 0.0% n <= 8: 0.0% n <= 16: 99.8% n <= 32: 99.8% n <= 64: 99.8% needle size: n <= 0: 0.0% n <= 1: 0.0% n <= 2: 0.0% n <= 3: 0.0% n <= 4: 0.2% n <= 8: 0.3% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0% replaying su average size 40.09 needle size 9.60 comparisons 45.20 digraphs 0.29 trigraphs 0.00 calls 91 succeed 0.0% latencies -2.1 -0.4 s1 aligned to 4 bytes 100.0% aligned to 8 bytes 51.6% aligned to 16 bytes 51.6% s2 aligned to 4 bytes 1.1% aligned to 8 bytes 1.1% aligned to 16 bytes 1.1% needle found in n bytes: n <= 0: 1.1% n <= 1: 1.1% n <= 2: 1.1% n <= 3: 1.1% n <= 4: 1.1% n <= 8: 4.4% n <= 16: 26.4% n <= 32: 28.6% n <= 64: 100.0% needle size: n <= 0: 1.1% n <= 1: 1.1% n <= 2: 1.1% n <= 3: 1.1% n <= 4: 1.1% n <= 8: 1.1% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0% replaying make average size 32.87 needle size 1.20 comparisons 34.73 digraphs 0.01 trigraphs 0.01 calls 277 succeed 91.7% latencies -12.5 -10.5 s1 aligned to 4 bytes 31.0% aligned to 8 bytes 17.3% aligned to 16 bytes 13.4% s2 aligned to 4 bytes 100.0% aligned to 8 bytes 100.0% aligned to 16 bytes 98.6% needle found in n bytes: n <= 0: 0.4% n <= 1: 4.3% n <= 2: 4.7% n <= 3: 6.9% n <= 4: 7.6% n <= 8: 9.7% n <= 16: 14.8% n <= 32: 35.0% n <= 64: 99.6% needle size: n <= 0: 0.4% n <= 1: 95.3% n <= 2: 95.7% n <= 3: 97.1% n <= 4: 97.1% n <= 8: 100.0% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0% replaying gdb average size 14.69 needle size 6.77 comparisons 15.46 digraphs 0.06 trigraphs 0.01 calls 17206 succeed 5.0% latencies 8.4 10.2 s1 aligned to 4 bytes 52.6% aligned to 8 bytes 45.0% aligned to 16 bytes 41.3% s2 aligned to 4 bytes 9.6% aligned to 8 bytes 9.1% aligned to 16 bytes 9.1% needle found in n bytes: n <= 0: 0.2% n <= 1: 1.0% n <= 2: 4.3% n <= 3: 5.3% n <= 4: 7.6% n <= 8: 32.9% n <= 16: 81.9% n <= 32: 92.1% n <= 64: 96.7% needle size: n <= 0: 0.0% n <= 1: 0.0% n <= 2: 27.9% n <= 3: 62.5% n <= 4: 64.8% n <= 8: 65.1% n <= 16: 92.7% n <= 32: 100.0% n <= 64: 100.0% replaying /usr/bin/urxvt average size 10.04 needle size 9.01 comparisons 10.05 digraphs 0.00 trigraphs 0.00 calls 11674 succeed 0.0% latencies -5.5 -0.7 s1 aligned to 4 bytes 100.0% aligned to 8 bytes 100.0% aligned to 16 bytes 100.0% s2 aligned to 4 bytes 0.6% aligned to 8 bytes 0.3% aligned to 16 bytes 0.2% needle found in n bytes: n <= 0: 0.0% n <= 1: 0.0% n <= 2: 0.0% n <= 3: 0.0% n <= 4: 0.0% n <= 8: 0.0% n <= 16: 99.8% n <= 32: 100.0% n <= 64: 100.0% needle size: n <= 0: 0.0% n <= 1: 0.0% n <= 2: 0.0% n <= 3: 0.0% n <= 4: 0.0% n <= 8: 0.3% n <= 16: 99.9% n <= 32: 100.0% n <= 64: 100.0% replaying man average size 7.91 needle size 29.43 comparisons 10.99 digraphs 0.24 trigraphs 0.24 calls 74 succeed 33.8% latencies 0.0 0.0 s1 aligned to 4 bytes 100.0% aligned to 8 bytes 100.0% aligned to 16 bytes 100.0% s2 aligned to 4 bytes 63.5% aligned to 8 bytes 63.5% aligned to 16 bytes 62.2% needle found in n bytes: n <= 0: 47.3% n <= 1: 47.3% n <= 2: 47.3% n <= 3: 51.4% n <= 4: 51.4% n <= 8: 51.4% n <= 16: 83.8% n <= 32: 100.0% n <= 64: 100.0% needle size: n <= 0: 1.4% n <= 1: 1.4% n <= 2: 1.4% n <= 3: 2.7% n <= 4: 5.4% n <= 8: 6.8% n <= 16: 39.2% n <= 32: 55.4% n <= 64: 100.0% replaying troff average size 11.87 needle size 9.00 comparisons 11.87 digraphs 0.00 trigraphs 0.00 calls 5498 succeed 0.0% latencies 0.0 0.0 s1 aligned to 4 bytes 100.0% aligned to 8 bytes 100.0% aligned to 16 bytes 100.0% s2 aligned to 4 bytes 0.0% aligned to 8 bytes 0.0% aligned to 16 bytes 0.0% needle found in n bytes: n <= 0: 0.0% n <= 1: 0.0% n <= 2: 0.0% n <= 3: 0.0% n <= 4: 0.0% n <= 8: 0.0% n <= 16: 81.5% n <= 32: 100.0% n <= 64: 100.0% needle size: n <= 0: 0.0% n <= 1: 0.0% n <= 2: 0.0% n <= 3: 0.0% n <= 4: 0.0% n <= 8: 0.0% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0%
diff --git a/benchtests/bench-strstr.c b/benchtests/bench-strstr.c index 74f3ee8..0412c1b 100644 --- a/benchtests/bench-strstr.c +++ b/benchtests/bench-strstr.c @@ -81,31 +81,15 @@ do_test (size_t align1, size_t align2, size_t len1, size_t len2, char *s1 = (char *) (buf1 + align1); char *s2 = (char *) (buf2 + align2); - static const char d[] = "1234567890abcdef"; -#define dl (sizeof (d) - 1) - char *ss2 = s2; - for (size_t l = len2; l > 0; l = l > dl ? l - dl : 0) + for (size_t l = 0; l < len2; l++) { - size_t t = l > dl ? dl : l; - ss2 = mempcpy (ss2, d, t); + s2[l] = (rand () % 128) + 1; } s2[len2] = '\0'; - if (fail) + for (size_t l = 0; l < len1; l++) { - char *ss1 = s1; - for (size_t l = len1; l > 0; l = l > dl ? l - dl : 0) - { - size_t t = l > dl ? dl : l; - memcpy (ss1, d, t); - ++ss1[len2 > 7 ? 7 : len2 - 1]; - ss1 += t; - } - } - else - { - memset (s1, '0', len1); - memcpy (s1 + len1 - len2, s2, len2); + s1[l] = (rand () % 128) + 1; } s1[len1] = '\0';