Message ID: 003601d0c546$b559d3d0$200d7b70$@com
State: New
On Thu, Jul 23, 2015 at 01:54:27PM +0100, Wilco Dijkstra wrote:
> Add a benchmark for isinf/isnan/isnormal/isfinite/fpclassify. The test uses
> 2 arrays with 1024 doubles, one with 99% finite FP numbers (10% zeroes, 10%
> negative) and 1% inf/NaN, the other with 50% inf and 50% NaN.
>
> This version removes various tests that caused confusion and only leaves
> the existing GLIBC definitions and inlines for comparison with the GCC
> builtins. I changed the tests to not inline inside the loop and to use a
> branch on the boolean result. The 64-bit immediates used by the GLIBC
> inlines seem very expensive on some microarchitectures, so this shows even
> more clearly that using the built-ins results in a significant performance
> gain (see x64 results below).

That's better, but still not OK.

First, we need to make something explicit. You argue both that you use the
benchmark merely to show that inlining provides a speedup, and that it
justifies your claims about the individual implementations. You cannot have
it both ways.

If you just want to show that inlining is better than not inlining, then
write a simple benchmark that compares the current non-inline implementations
with the builtin inlines. I would be fine with that, together with a comment
that it must not be used to justify claims about the current inlines. That is
the simpler route if you really only want to show that the builtins beat the
non-inline versions.

If you also want to make claims about the builtins versus the current
inlines, then you need a benchmark that can accurately measure both, to see
what is correct and what is wrong. As I said, that is quite hard, because it
depends on how GCC optimizes each implementation.

There are also still unresolved issues from the previous patches.

First, you still test on x64 without taking EXTRACT_WORDS64 from
math_private.h. Without that you are not measuring the current inlines.

Second, for isinf you omitted the current isinf inline, even though it is
faster than both isinf_ns and the builtin that you do check.

Third, the remainder test is wrong. It is inlined, but kernel_standard isn't.
As you wanted to use it to measure the performance of the non-inline
function, it obviously doesn't measure that. When you fix it, the benchmark
clearly shows that the current inlines are better on x64:

"remainder_test1": {
 "normal": {
  "duration": 4.23772e+06,
  "iterations": 500,
  "mean": 8475
 }
},
"remainder_test2": {
 "normal": {
  "duration": 4.45968e+06,
  "iterations": 500,
  "mean": 8919
 }
}
> Ondřej Bílka wrote:
> On Thu, Jul 23, 2015 at 01:54:27PM +0100, Wilco Dijkstra wrote:
> > Add a benchmark for isinf/isnan/isnormal/isfinite/fpclassify. The test
> > uses 2 arrays with 1024 doubles, one with 99% finite FP numbers (10%
> > zeroes, 10% negative) and 1% inf/NaN, the other with 50% inf and 50% NaN.
> >
> > This version removes various tests that caused confusion and only leaves
> > the existing GLIBC definitions and inlines for comparison with the GCC
> > builtins. I changed the tests to not inline inside the loop and to use a
> > branch on the boolean result. The 64-bit immediates used by the GLIBC
> > inlines seem very expensive on some microarchitectures, so this shows
> > even more clearly that using the built-ins results in a significant
> > performance gain (see x64 results below).
>
> That's better, but still not OK.
>
> First, we need to make something explicit. You argue both that you use the
> benchmark merely to show that inlining provides a speedup, and that it
> justifies your claims. You cannot have it both ways.

Of course I can. This is not at all an either/or issue, nor a contentious
statement. The existing inlines are pretty inefficient, so it shouldn't
surprise anyone that they can easily be beaten. And all the evidence shows
that my patch improves on the performance of the existing inlines.

My goal is to use the fastest possible implementation in all cases. What
would be the point of adding fast inlines to math.h while keeping the
inefficient internal inlines? That makes no sense whatsoever, unless your
goal is to keep GLIBC slow...

> If you also want to make claims about the builtins versus the current
> inlines, then you need a benchmark that can accurately measure both, to
> see what is correct and what is wrong. As I said, that is quite hard,
> because it depends on how GCC optimizes each implementation.

I am confident that my benchmark does a great job.
I looked at the assembly code of all the functions and concluded that GCC 6
compiled the code exactly as I expected, so the benchmark is measuring
precisely what I intended it to measure.

> There are also still unresolved issues from the previous patches.
>
> First, you still test on x64 without taking EXTRACT_WORDS64 from
> math_private.h. Without that you are not measuring the current inlines.

I ran my benchmark with the movq instruction and it doesn't change the
conclusion in any way (store-load forwarding is pretty quick). So this is a
non-issue - it is simply a bug in the x64 GCC that should be fixed, and yet
another reason why it is better to use a builtin that generates more optimal
code without needing inline assembler.

> Second, for isinf you omitted the current isinf inline, even though it is
> faster than both isinf_ns and the builtin that you do check.

There is no isinf inline. I just kept the 3 math_private.h inlines.

> Third, the remainder test is wrong. It is inlined, but kernel_standard
> isn't. As you wanted to use it to measure the performance of the
> non-inline function, it obviously doesn't measure that.

kernel_standard should not be inlined, as the real implementation isn't.

> When you fix it, the benchmark clearly shows that the current inlines are
> better on x64:

No, that's not what it shows. In all cases the new inlines are faster.

> "remainder_test1": {
>  "normal": {
>   "duration": 4.23772e+06,
>   "iterations": 500,
>   "mean": 8475
>  }
> },
> "remainder_test2": {
>  "normal": {
>   "duration": 4.45968e+06,
>   "iterations": 500,
>   "mean": 8919
>  }
> }

Those results are way too fast, so something must be wrong with your changes.

To conclude, none of your issues are actual issues with my benchmark.

Wilco
On Fri, Jul 24, 2015 at 01:04:40PM +0100, Wilco Dijkstra wrote:
> > Ondřej Bílka wrote:
> > On Thu, Jul 23, 2015 at 01:54:27PM +0100, Wilco Dijkstra wrote:
> > First, we need to make something explicit. You argue both that you use
> > the benchmark merely to show that inlining provides a speedup, and that
> > it justifies your claims. You cannot have it both ways.
>
> Of course I can. This is not at all an either/or issue, nor a contentious
> statement. The existing inlines are pretty inefficient, so it shouldn't
> surprise anyone that they can easily be beaten. And all the evidence shows
> that my patch improves on the performance of the existing inlines.

As you chose the more difficult route, your benchmark is simply wrong. It
claims that the finite builtin is faster than the inline, but when I look at
the actual benchtests I reach the opposite conclusion. I ran the benchtests
three times on Haswell and consistently found that replacing finite with
isfinite in the pow function causes a regression.

I also checked Core 2 and Ivy Bridge: Core 2 shows no difference, and on Ivy
Bridge the inlines are also faster than the builtin.

Current timing:

{"timing_type": "hp_timing",
 "functions": {
  "pow": {
   "": {
    "duration": 3.50115e+10,
    "iterations": 2.44713e+08,
    "max": 882.983,
    "min": 47.615,
    "mean": 143.072
   },
   "240bits": {
    "duration": 3.5122e+10,
    "iterations": 1.3e+06,
    "max": 33208,
    "min": 21679.5,
    "mean": 27016.9
   },
   "768bits": {
    "duration": 4.24365e+10,
    "iterations": 101000,
    "max": 442505,
    "min": 205110,
    "mean": 420163
   }
  },

With the builtin:

{"timing_type": "hp_timing",
 "functions": {
  "pow": {
   "": {
    "duration": 3.49986e+10,
    "iterations": 2.4381e+08,
    "max": 868.965,
    "min": 47.612,
    "mean": 143.549
   },
   "240bits": {
    "duration": 3.5529e+10,
    "iterations": 1.4e+06,
    "max": 30331.3,
    "min": 20436,
    "mean": 25377.9
   },
   "768bits": {
    "duration": 3.98189e+10,
    "iterations": 101000,
    "max": 417778,
    "min": 194699,
    "mean": 394246
   }
  },

> My goal is to use the fastest possible implementation in all cases.
> What would be the point of adding fast inlines to math.h while keeping the
> inefficient internal inlines? That makes no sense whatsoever, unless your
> goal is to keep GLIBC slow...

Then you are forced to take the more difficult route. It isn't only about the
inefficient current inlines, but about any other inlines that get introduced.
As the measured performance varies wildly with how you benchmark, you will
end up selecting the implementation that looks best on your benchmark, which
may have nothing in common with practical performance.

> > If you also want to make claims about the builtins versus the current
> > inlines, then you need a benchmark that can accurately measure both, to
> > see what is correct and what is wrong. As I said, that is quite hard,
> > because it depends on how GCC optimizes each implementation.
>
> I am confident that my benchmark does a great job. I looked at the
> assembly code of all the functions and concluded that GCC 6 compiled the
> code exactly as I expected, so the benchmark is measuring precisely what I
> intended it to measure.

And what do you measure?

> > There are also still unresolved issues from the previous patches.
> >
> > First, you still test on x64 without taking EXTRACT_WORDS64 from
> > math_private.h. Without that you are not measuring the current inlines.
>
> I ran my benchmark with the movq instruction and it doesn't change the
> conclusion in any way (store-load forwarding is pretty quick).
>
> So this is a non-issue - it is simply a bug in the x64 GCC that should be
> fixed, and yet another reason why it is better to use a builtin that
> generates more optimal code without needing inline assembler.

The problem is that this could be CPU-specific: on some CPUs it could matter.
It is also simply a matter of measuring the correct function; showing that it
doesn't matter in one case does not imply that it wouldn't matter for a
different workload.

> > Second, for isinf you omitted the current isinf inline, even though it
> > is faster than both isinf_ns and the builtin that you do check.
>
> There is no isinf inline.
> I just kept the 3 math_private.h inlines.

Thanks for the clarification.

> > Third, the remainder test is wrong. It is inlined, but kernel_standard
> > isn't. As you wanted to use it to measure the performance of the
> > non-inline function, it obviously doesn't measure that.
>
> kernel_standard should not be inlined, as the real implementation isn't.

Now that I have looked, you test a different function than remainder: the
real implementation calls __ieee754_remainder and handles overflow
differently. This could also make a difference.

> > When you fix it, the benchmark clearly shows that the current inlines
> > are better on x64.
>
> No, that's not what it shows. In all cases the new inlines are faster.
>
> > "remainder_test1": {
> >  "normal": {
> >   "duration": 4.23772e+06,
> >   "iterations": 500,
> >   "mean": 8475
> >  }
> > },
> > "remainder_test2": {
> >  "normal": {
> >   "duration": 4.45968e+06,
> >   "iterations": 500,
> >   "mean": 8919
> >  }
> > }
>
> Those results are way too fast, so something must be wrong with your
> changes.

I got a timeout for that test, so I decreased the size. That doesn't change
the outcome: with the fixed test, the current inlines are faster. It also
does not change the fact that you have a bug there and should fix it, and
that this isn't the real remainder: you need to call the ieee754 remainder,
which is a different workload, since you need to do a second check there and
out-of-order execution could help you.

> To conclude, none of your issues are actual issues with my benchmark.

No, there are still plenty of issues. You need to be more careful about what
you actually benchmark.
> Ondřej Bílka wrote:
> On Fri, Jul 24, 2015 at 01:04:40PM +0100, Wilco Dijkstra wrote:
> > > Ondřej Bílka wrote:
> > > On Thu, Jul 23, 2015 at 01:54:27PM +0100, Wilco Dijkstra wrote:
>
> Your benchmark claims that the finite builtin is faster than the inline,
> but when I look at the actual benchtests I reach the opposite conclusion.
> I ran the benchtests three times on Haswell and consistently found that
> replacing finite with isfinite in the pow function causes a regression.
>
> I also checked Core 2 and Ivy Bridge: Core 2 shows no difference, and on
> Ivy Bridge the inlines are also faster than the builtin.

There is no regression in your results. On x64 I get a consistent 4.6%
speedup on pow, 8.4% on exp2 and 9.2% on atan. The average speedup of my
patch is 1% across all math functions. See the attached results.

> > My goal is to use the fastest possible implementation in all cases.
> > What would be the point of adding fast inlines to math.h while keeping
> > the inefficient internal inlines? That makes no sense whatsoever, unless
> > your goal is to keep GLIBC slow...
>
> Then you are forced to take the more difficult route. It isn't only about
> the inefficient current inlines, but about any other inlines that get
> introduced.

The only inlines that matter are the inefficient ones, because there are no
other inlines.

> As the measured performance varies wildly with how you benchmark, you will
> end up selecting the implementation that looks best on your benchmark,
> which may have nothing in common with practical performance.

Of course it does. A micro benchmark is the best way to evaluate which code
sequence is best. And since the math functions show significant speedups with
my patch, it proves that the results of my micro benchmark are 100% accurate.

> The problem is that this could be CPU-specific: on some CPUs it could
> matter. It is also simply a matter of measuring the correct function;
> showing that it doesn't matter in one case does not imply that it wouldn't
> matter for a different workload.

Speculation...
> Now that I have looked, you test a different function than remainder: the
> real implementation calls __ieee754_remainder and handles overflow
> differently. This could also make a difference.

__ieee754_remainder is not exported, so you can't call it.

> It also does not change the fact that you have a bug there and should fix
> it, and that this isn't the real remainder: you need to call the ieee754
> remainder, which is a different workload, since you need to do a second
> check there and out-of-order execution could help you.

It doesn't make any difference, as the remainder call has a constant
overhead.

> > To conclude, none of your issues are actual issues with my benchmark.
>
> No, there are still plenty of issues. You need to be more careful about
> what you actually benchmark.

So far you have not pointed out a single concrete issue.

Wilco
diff --git a/benchtests/Makefile b/benchtests/Makefile
index 8e615e5..91970f8 100644
--- a/benchtests/Makefile
+++ b/benchtests/Makefile
@@ -36,6 +36,7 @@ string-bench := bcopy bzero memccpy memchr memcmp memcpy memmem memmove \
 		strncasecmp strncat strncmp strncpy strnlen strpbrk strrchr \
 		strspn strstr strcpy_chk stpcpy_chk memrchr strsep strtok \
 		strcoll
+
 string-bench-all := $(string-bench)
 
 # We have to generate locales
@@ -50,7 +51,10 @@ stdlib-bench := strtod
 
 stdio-common-bench := sprintf
 
-benchset := $(string-bench-all) $(stdlib-bench) $(stdio-common-bench)
+math-benchset := math-inlines
+
+benchset := $(string-bench-all) $(stdlib-bench) $(stdio-common-bench) \
+	    $(math-benchset)
 
 CFLAGS-bench-ffs.c += -fno-builtin
 CFLAGS-bench-ffsll.c += -fno-builtin
@@ -58,6 +62,7 @@ CFLAGS-bench-ffsll.c += -fno-builtin
 bench-malloc := malloc-thread
 
 $(addprefix $(objpfx)bench-,$(bench-math)): $(libm)
+$(addprefix $(objpfx)bench-,$(math-benchset)): $(libm)
 $(addprefix $(objpfx)bench-,$(bench-pthread)): $(shared-thread-library)
 
 $(objpfx)bench-malloc-thread: $(shared-thread-library)
diff --git a/benchtests/bench-math-inlines.c b/benchtests/bench-math-inlines.c
new file mode 100644
index 0000000..cc4f008
--- /dev/null
+++ b/benchtests/bench-math-inlines.c
@@ -0,0 +1,284 @@
+/* Measure math inline functions.
+   Copyright (C) 2015 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#define SIZE 1024
+#define TEST_MAIN
+#define TEST_NAME "math-inlines"
+#define TEST_FUNCTION test_main ()
+#include "bench-timing.h"
+#include "json-lib.h"
+#include "bench-util.h"
+
+#include <stdlib.h>
+#include <math.h>
+#include <stdint.h>
+
+#define BOOLTEST(func) \
+int __attribute__((noinline)) \
+func ## _f (double d, int i) \
+{ \
+  if (func (d)) \
+    return (int) d + i; \
+  else \
+    return 5; \
+} \
+int \
+func ## _t (volatile double *p, size_t n, size_t iters) \
+{ \
+  int i, j; \
+  int res = 0; \
+  for (j = 0; j < iters; j++) \
+    for (i = 0; i < n; i++) \
+      if (func ## _f (p[i] * 2.0, i)) \
+	res += 5; \
+  return res; \
+}
+
+#define VALUETEST(func) \
+int __attribute__((noinline)) \
+func ## _f (double d) \
+{ \
+  return func (d); \
+} \
+int \
+func ## _t (volatile double *p, size_t n, size_t iters) \
+{ \
+  int i, j; \
+  int res = 0; \
+  for (j = 0; j < iters; j++) \
+    for (i = 0; i < n; i++) \
+      res += func ## _f (p[i] * 2.0); \
+  return res; \
+}
+
+typedef union
+{
+  double value;
+  uint64_t word;
+} ieee_double_shape_type;
+
+#define EXTRACT_WORDS64(i,d)   \
+do {                           \
+  ieee_double_shape_type gh_u; \
+  gh_u.value = (d);            \
+  (i) = gh_u.word;             \
+} while (0)
+
+/* Inlines similar to existing math_private.h versions.  */
+
+extern __always_inline int
+__isnan_inl (double d)
+{
+  uint64_t di;
+  EXTRACT_WORDS64 (di, d);
+  return (di & 0x7fffffffffffffffull) > 0x7ff0000000000000ull;
+}
+
+extern __always_inline int
+__isinf_ns (double d)
+{
+  uint64_t di;
+  EXTRACT_WORDS64 (di, d);
+  return (di & 0x7fffffffffffffffull) == 0x7ff0000000000000ull;
+}
+
+extern __always_inline int
+__finite_inl (double d)
+{
+  uint64_t di;
+  EXTRACT_WORDS64 (di, d);
+  return (di & 0x7fffffffffffffffull) < 0x7ff0000000000000ull;
+}
+
+#define __isnormal_inl(X) (__fpclassify (X) == FP_NORMAL)
+
+/* Inlines for the builtin functions.  */
+
+#define __isnan_builtin(X) __builtin_isnan (X)
+#define __isinf_ns_builtin(X) __builtin_isinf (X)
+#define __isinf_builtin(X) __builtin_isinf_sign (X)
+#define __isfinite_builtin(X) __builtin_isfinite (X)
+#define __isnormal_builtin(X) __builtin_isnormal (X)
+#define __fpclassify_builtin(X) __builtin_fpclassify (FP_NAN, FP_INFINITE, \
+  FP_NORMAL, FP_SUBNORMAL, FP_ZERO, (X))
+
+double __attribute ((noinline))
+kernel_standard (double x, double y, int z)
+{
+  return x * y + z;
+}
+
+volatile double rem1 = 2.5;
+
+extern __always_inline int
+remainder_test1 (double x)
+{
+  double y = rem1;
+  if (((__builtin_expect (y == 0.0, 0) && !__isnan_inl (x))
+       || (__builtin_expect (__isinf_ns (x), 0) && !__isnan_inl (y))))
+    return kernel_standard (x, y, 10);
+
+  return remainder (x, y);
+}
+
+extern __always_inline int
+remainder_test2 (double x)
+{
+  double y = rem1;
+  if (((__builtin_expect (y == 0.0, 0) && !__builtin_isnan (x))
+       || (__builtin_expect (__builtin_isinf (x), 0) && !__builtin_isnan (y))))
+    return kernel_standard (x, y, 10);
+
+  return remainder (x, y);
+}
+
+/* Create test functions for each possibility.  */
+
+BOOLTEST (__isnan)
+BOOLTEST (__isnan_inl)
+BOOLTEST (__isnan_builtin)
+BOOLTEST (isnan)
+
+BOOLTEST (__isinf)
+BOOLTEST (__isinf_builtin)
+BOOLTEST (__isinf_ns)
+BOOLTEST (__isinf_ns_builtin)
+BOOLTEST (isinf)
+
+BOOLTEST (__finite)
+BOOLTEST (__finite_inl)
+BOOLTEST (__isfinite_builtin)
+BOOLTEST (isfinite)
+
+BOOLTEST (__isnormal_inl)
+BOOLTEST (__isnormal_builtin)
+BOOLTEST (isnormal)
+
+VALUETEST (__fpclassify)
+VALUETEST (__fpclassify_builtin)
+VALUETEST (fpclassify)
+
+BOOLTEST (remainder_test1)
+BOOLTEST (remainder_test2)
+
+typedef int (*proto_t) (volatile double *p, size_t n, size_t iters);
+
+typedef struct
+{
+  const char *name;
+  proto_t fn;
+} impl_t;
+
+#define IMPL(name) { #name, name ## _t }
+
+impl_t test_list[] =
+{
+  IMPL (__isnan),
+  IMPL (__isnan_inl),
+  IMPL (__isnan_builtin),
+  IMPL (isnan),
+
+  IMPL (__isinf),
+  IMPL (__isinf_ns),
+  IMPL (__isinf_ns_builtin),
+  IMPL (__isinf_builtin),
+  IMPL (isinf),
+
+  IMPL (__finite),
+  IMPL (__finite_inl),
+  IMPL (__isfinite_builtin),
+  IMPL (isfinite),
+
+  IMPL (__isnormal_inl),
+  IMPL (__isnormal_builtin),
+  IMPL (isnormal),
+
+  IMPL (__fpclassify),
+  IMPL (__fpclassify_builtin),
+  IMPL (fpclassify),
+
+  IMPL (remainder_test1),
+  IMPL (remainder_test2)
+};
+
+static void
+do_one_test (json_ctx_t *json_ctx, proto_t test_fn, volatile double *arr,
+	     size_t len, const char *testname)
+{
+  size_t iters = 500;
+  timing_t start, stop, cur;
+
+  json_attr_object_begin (json_ctx, testname);
+
+  TIMING_NOW (start);
+  test_fn (arr, len, iters);
+  TIMING_NOW (stop);
+  TIMING_DIFF (cur, start, stop);
+
+  json_attr_double (json_ctx, "duration", cur);
+  json_attr_double (json_ctx, "iterations", iters);
+  json_attr_double (json_ctx, "mean", cur / iters);
+  json_attr_object_end (json_ctx);
+}
+
+static volatile double arr1[SIZE];
+static volatile double arr2[SIZE];
+
+int
+test_main (void)
+{
+  json_ctx_t json_ctx;
+  size_t i;
+
+  bench_start ();
+
+  json_init (&json_ctx, 2, stdout);
+  json_attr_object_begin (&json_ctx, "math-inlines");
+
+  /* Create 2 test arrays, one with 10% zeroes, 10% negative values,
+     79% positive values and 1% infinity/NaN.  The other contains
+     50% inf, 50% NaN.  */
+
+  for (i = 0; i < SIZE; i++)
+    {
+      int x = rand () & 255;
+      arr1[i] = (x < 25) ? 0.0 : ((x < 50) ? -1 : 100);
+      if (x == 255) arr1[i] = __builtin_inf ();
+      if (x == 254) arr1[i] = __builtin_nan ("0");
+      arr2[i] = (x < 128) ? __builtin_inf () : __builtin_nan ("0");
+    }
+
+  for (i = 0; i < sizeof (test_list) / sizeof (test_list[0]); i++)
+    {
+      json_attr_object_begin (&json_ctx, test_list[i].name);
+      do_one_test (&json_ctx, test_list[i].fn, arr2, SIZE, "inf/nan");
+      json_attr_object_end (&json_ctx);
+    }
+
+  for (i = 0; i < sizeof (test_list) / sizeof (test_list[0]); i++)
+    {
+      json_attr_object_begin (&json_ctx, test_list[i].name);
+      do_one_test (&json_ctx, test_list[i].fn, arr1, SIZE, "normal");
+      json_attr_object_end (&json_ctx);
+    }
+
+  json_attr_object_end (&json_ctx);
+  return 0;
+}
+
+#include "bench-util.c"
+#include "../test-skeleton.c"
diff --git a/benchtests/bench-skeleton.c b/benchtests/bench-skeleton.c
index e357f0c..bc820df 100644
--- a/benchtests/bench-skeleton.c
+++ b/benchtests/bench-skeleton.c
@@ -24,21 +24,9 @@
 #include <inttypes.h>
 #include "bench-timing.h"
 #include "json-lib.h"
+#include "bench-util.h"
 
-volatile unsigned int dontoptimize = 0;
-
-void
-startup (void)
-{
-  /* This loop should cause CPU to switch to maximal freqency.
-     This makes subsequent measurement more accurate.  We need a side effect
-     to prevent the loop being deleted by compiler.
-     This should be enough to cause CPU to speed up and it is simpler than
-     running loop for constant time.  This is used when user does not have root
-     access to set a constant freqency.  */
-  for (int k = 0; k < 10000000; k++)
-    dontoptimize += 23 * dontoptimize + 2;
-}
+#include "bench-util.c"
 
 #define TIMESPEC_AFTER(a, b) \
   (((a).tv_sec == (b).tv_sec) ? \
@@ -56,7 +44,7 @@ main (int argc, char **argv)
   if (argc == 2 && !strcmp (argv[1], "-d"))
     detailed = true;
 
-  startup();
+  bench_start ();
 
   memset (&runtime, 0, sizeof (runtime));
 
diff --git a/benchtests/bench-util.c b/benchtests/bench-util.c
new file mode 100644
index 0000000..c4149ae
--- /dev/null
+++ b/benchtests/bench-util.c
@@ -0,0 +1,34 @@
+/* Benchmark utility functions.
+   Copyright (C) 2015 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+
+static volatile unsigned int dontoptimize = 0;
+
+void
+bench_start (void)
+{
+  /* This loop should cause CPU to switch to maximal freqency.
+     This makes subsequent measurement more accurate.  We need a side effect
+     to prevent the loop being deleted by compiler.
+     This should be enough to cause CPU to speed up and it is simpler than
+     running loop for constant time.  This is used when user does not have root
+     access to set a constant freqency.  */
+
+  for (int k = 0; k < START_ITER; k++)
+    dontoptimize += 23 * dontoptimize + 2;
+}
diff --git a/benchtests/bench-util.h b/benchtests/bench-util.h
new file mode 100644
index 0000000..930cecc
--- /dev/null
+++ b/benchtests/bench-util.h
@@ -0,0 +1,28 @@
+/* Benchmark utility functions.
+   Copyright (C) 2015 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+
+#ifndef START_ITER
+# define START_ITER (100000000)
+#endif
+
+/* bench_start reduces the random variations due to frequency scaling by
+   executing a small loop with many memory accesses.  START_ITER controls
+   the number of iterations.  */
+
+void bench_start (void);