Message ID | DB6PR0801MB2053B473F89233B992CEE1B383B60@DB6PR0801MB2053.eurprd08.prod.outlook.com |
---|---|
State | New |
Headers | show |
On Fri, 4 Aug 2017, Wilco Dijkstra wrote: > This patch simplifies pow (C, x) into exp (x * C1), where C1 = log (C). I don't think you can do that for non-positive C. > Do this only for fast-math as accuracy is reduced. This is much faster > since pow is more complex than exp - with a current GLIBC the speedup > is more than 7 times for this transformation. Is it bound to be so on future glibc revisions and non-glibc platforms? Alexander
On Fri, Aug 4, 2017 at 2:26 PM, Alexander Monakov <amonakov@ispras.ru> wrote: > On Fri, 4 Aug 2017, Wilco Dijkstra wrote: >> This patch simplifies pow (C, x) into exp (x * C1), where C1 = log (C). > > I don't think you can do that for non-positive C. Hmm, the question is also how this interacts with other folders like sqrt (pow (x, y)) -> pow (|x|, y * 0,5)? Also we seem to miss pow (2, x) -> exp2 (x) and pow (10, x) -> pow10/exp10, those may be a better fit than exp (log (2/10) * x)? OTOH for fast-math canonicalization getting rid of exp2/10 and pow10 might be beneficial. >> Do this only for fast-math as accuracy is reduced. This is much faster >> since pow is more complex than exp - with a current GLIBC the speedup >> is more than 7 times for this transformation. > > Is it bound to be so on future glibc revisions and non-glibc platforms? And how is accuracy affected? I think the transform is only reasonable for log (C) being close to e, 2 or 10 (using exp, exp2 or exp10). Can you provide an idea on whether there's a systematic error (with glibc) and how that behaves over the parameter space? Oh, and what value of C does the benchmark that triggered this have? Richard. > Alexander
Richard Biener wrote: > On Fri, Aug 4, 2017 at 2:26 PM, Alexander Monakov <amonakov@ispras.ru> wrote: > > On Fri, 4 Aug 2017, Wilco Dijkstra wrote: > >> This patch simplifies pow (C, x) into exp (x * C1), where C1 = log (C). > > > > I don't think you can do that for non-positive C. True, that can be easily disallowed. > Hmm, the question is also how this interacts with other folders like > sqrt (pow (x, y)) -> pow (|x|, y * 0,5)? Also we seem to miss We fold sqrt (pow (C, x)) into pow (C, x * 0.5) first, then fold that to exp. > pow (2, x) -> exp2 (x) and pow (10, x) -> pow10/exp10, those may > be a better fit than exp (log (2/10) * x)? OTOH for fast-math > canonicalization getting rid of exp2/10 and pow10 might be beneficial. exp10 is non-standard and doesn't have a first-class implementation in GLIBC. Although pow (10, x) is frequently used in Fortran, I can't get exp10 emitted by match.pd... >> Do this only for fast-math as accuracy is reduced. This is much faster >> since pow is more complex than exp - with a current GLIBC the speedup >> is more than 7 times for this transformation. > > Is it bound to be so on future glibc revisions and non-glibc platforms? Yes, pow is basically log followed by exp, so exp will always be cheaper than pow. How much will obviously vary depending on the implementation. Szabolc's highly optimized expf has 3x throughput of the optimized powf. > And how is accuracy affected? I think the transform is only reasonable > for log (C) being close to e, 2 or 10 (using exp, exp2 or exp10). Can you > provide an idea on whether there's a systematic error (with glibc) and > how that behaves over the parameter space? Accuracy depends again on the library implementation. If log (C) is accurate (ie. far less than 0.5 ULP), and exp (x) accurate to 0.5 ULP then you get perfect answers over the full range. The exp function has the largest steps close to inf - a 1 ULP change in input changes the output by 1024 ULP (128 ULP for expf). So a 0.5ULP input error would give ~512 ULP error if the final result is close to inf. In practice the output doesn't get anywhere near inf, so ULP errors are far smaller. > Oh, and what value of C does the benchmark that triggered this have? 10 appears quite common. I extracted a runtime log of all powf calls in SPEC (see https://sourceware.org/ml/libc-alpha/2017-06/msg00718.html) and noticed a lot of repetition in some inputs. Further investigation showed many uses of pow have a constant first operand, so an obvious target for optimization. Wilco
On Fri, 4 Aug 2017, Richard Biener wrote: > >> Do this only for fast-math as accuracy is reduced. This is much faster > >> since pow is more complex than exp - with a current GLIBC the speedup > >> is more than 7 times for this transformation. > > > > Is it bound to be so on future glibc revisions and non-glibc platforms? > > And how is accuracy affected? I think the transform is only reasonable For pow to be accurate when the result has large (positive or negative) exponent, it needs to compute the log and intermediate multiplication to a precision around (number of mantissa bits + number of exponent bits). This is inevitably slower than when you omit the extra intermediate precision (and if you omit that precision, the error can be around MAX_EXP ulps).
diff --git a/gcc/match.pd b/gcc/match.pd index e98db52af84946cf579c6434e06d450713a47162..96486aa1f512fe32d85a1de95c46523263ea1b6d 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -3548,6 +3548,14 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) (logs (pows @0 @1)) (mult @1 (logs @0)))) + /* pow(C,x) -> exp(log(C)*x). */ + (for pows (POW) + exps (EXP) + logs (LOG) + (simplify + (pows REAL_CST@0 @1) + (exps (mult (logs @0) @1)))) + (for sqrts (SQRT) cbrts (CBRT) pows (POW)