Simplify pow with constant

Message ID	DB6PR0801MB2053B473F89233B992CEE1B383B60@DB6PR0801MB2053.eurprd08.prod.outlook.com
State	New

Headers

Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
Sender: gcc-patches-owner@gcc.gnu.org
From: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
To: GCC Patches <gcc-patches@gcc.gnu.org>
CC: nd <nd@arm.com>
Subject: [PATCH] Simplify pow with constant
Date: Fri, 4 Aug 2017 11:23:06 +0000
Message-ID: <DB6PR0801MB2053B473F89233B992CEE1B383B60@DB6PR0801MB2053.eurprd08.prod.outlook.com>

Commit Message

Wilco Dijkstra Aug. 4, 2017, 11:23 a.m. UTC

This patch simplifies pow (C, x) into exp (x * C1), where C1 = log (C).
Do this only for fast-math as accuracy is reduced.  This is much faster
since pow is more complex than exp - with a current GLIBC the speedup
is more than 7 times for this transformation.
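
As a rough illustration of the rewrite described above, the following Python sketch compares the library pow with the exp/log form for a positive constant base. It is not GCC code: the helper name `pow_via_exp` is hypothetical, and Python's `math` functions merely stand in for the C library calls.

```python
import math

def pow_via_exp(c, x):
    """Evaluate c**x as exp(x * log(c)); assumes c > 0 (illustrative helper)."""
    c1 = math.log(c)          # C1 = log (C); a constant when C is a constant
    return math.exp(x * c1)

if __name__ == "__main__":
    # Compare the two forms for a few exponents with the constant base 10.
    for x in (0.5, 2.5, -3.0):
        print(x, math.pow(10.0, x), pow_via_exp(10.0, x))
```

The two columns should agree to within a few units in the last place for moderate exponents; the exact difference depends on the underlying math library.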

ChangeLog:
2017-08-04  Wilco Dijkstra  <wdijkstr@arm.com>

	* match.pd: Add pow (C, x) simplification.

--

Comments

Alexander Monakov Aug. 4, 2017, 12:26 p.m. UTC | #1

On Fri, 4 Aug 2017, Wilco Dijkstra wrote:
> This patch simplifies pow (C, x) into exp (x * C1), where C1 = log (C).

I don't think you can do that for non-positive C.

> Do this only for fast-math as accuracy is reduced.  This is much faster
> since pow is more complex than exp - with a current GLIBC the speedup
> is more than 7 times for this transformation.

Is it bound to be so on future glibc revisions and non-glibc platforms?

Alexander

Richard Biener Aug. 4, 2017, 12:44 p.m. UTC | #2

On Fri, Aug 4, 2017 at 2:26 PM, Alexander Monakov <amonakov@ispras.ru> wrote:
> On Fri, 4 Aug 2017, Wilco Dijkstra wrote:
>> This patch simplifies pow (C, x) into exp (x * C1), where C1 = log (C).
>
> I don't think you can do that for non-positive C.

Hmm, the question is also how this interacts with other folders like
sqrt (pow (x, y)) -> pow (|x|, y * 0.5)?  Also we seem to miss
pow (2, x) -> exp2 (x) and pow (10, x) -> pow10/exp10, those may
be a better fit than exp (log (2/10) * x)?  OTOH for fast-math
canonicalization getting rid of exp2/10 and pow10 might be beneficial.

>> Do this only for fast-math as accuracy is reduced.  This is much faster
>> since pow is more complex than exp - with a current GLIBC the speedup
>> is more than 7 times for this transformation.
>
> Is it bound to be so on future glibc revisions and non-glibc platforms?

And how is accuracy affected?  I think the transform is only reasonable
for log (C) being close to e, 2 or 10 (using exp, exp2 or exp10).  Can you
provide an idea on whether there's a systematic error (with glibc) and
how that behaves over the parameter space?

Oh, and what value of C does the benchmark that triggered this have?

Richard.

> Alexander

Wilco Dijkstra Aug. 4, 2017, 3:28 p.m. UTC | #3

Richard Biener wrote:
> On Fri, Aug 4, 2017 at 2:26 PM, Alexander Monakov <amonakov@ispras.ru> wrote:
> > On Fri, 4 Aug 2017, Wilco Dijkstra wrote:
> >> This patch simplifies pow (C, x) into exp (x * C1), where C1 = log (C).
> >
> > I don't think you can do that for non-positive C.

True, that can be easily disallowed.
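
The restriction to positive C can be seen from a small Python sketch (an illustration only, with `math` standing in for the C library, and `try_log` a hypothetical helper). pow is well defined for a negative base with an integer exponent and for a zero base with a positive exponent, whereas log of the same base fails, so the rewritten expression has no usable value there.

```python
import math

def try_log(c):
    """Return log(c), or None when c is outside log's domain (hypothetical helper)."""
    try:
        return math.log(c)
    except ValueError:
        return None

# pow accepts these bases, but log does not, so exp (x * log (C)) cannot replace it.
print(math.pow(-8.0, 3.0), try_log(-8.0))
print(math.pow(0.0, 2.0), try_log(0.0))
```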

> Hmm, the question is also how this interacts with other folders like
> sqrt (pow (x, y)) -> pow (|x|, y * 0.5)?  Also we seem to miss

We fold sqrt (pow (C, x)) into pow (C, x * 0.5) first, then fold that to exp.

> pow (2, x) -> exp2 (x) and pow (10, x) -> pow10/exp10, those may
> be a better fit than exp (log (2/10) * x)?  OTOH for fast-math
> canonicalization getting rid of exp2/10 and pow10 might be beneficial.

exp10 is non-standard and doesn't have a first-class implementation in GLIBC.
Although pow (10, x) is frequently used in Fortran, I can't get exp10 emitted
by match.pd...

>> Do this only for fast-math as accuracy is reduced.  This is much faster
>> since pow is more complex than exp - with a current GLIBC the speedup
>> is more than 7 times for this transformation.
>
> Is it bound to be so on future glibc revisions and non-glibc platforms?

Yes, pow is basically log followed by exp, so exp will always be cheaper than
pow. How much will obviously vary depending on the implementation.
Szabolcs's highly optimized expf has 3x the throughput of the optimized powf.

> And how is accuracy affected?  I think the transform is only reasonable
> for log (C) being close to e, 2 or 10 (using exp, exp2 or exp10).  Can you
> provide an idea on whether there's a systematic error (with glibc) and
> how that behaves over the parameter space?

Accuracy again depends on the library implementation. If log (C) is accurate
(i.e. far less than 0.5 ULP error) and exp (x) is accurate to 0.5 ULP, then you
get perfect answers over the full range.

The exp function has the largest steps close to inf - a 1 ULP change in input
changes the output by 1024 ULP (128 ULP for expf). So a 0.5 ULP input error
would give ~512 ULP error if the final result is close to inf. In practice the output
doesn't get anywhere near inf, so ULP errors are far smaller.
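
The sensitivity of exp near overflow can be checked numerically. The sketch below is an illustration in Python (3.9 or later, for `math.nextafter` and `math.ulp`), not part of the patch; the function name is hypothetical. It moves the double-precision input by one ULP and reports how many ULPs the output moves.

```python
import math

def output_ulps_per_input_ulp(x):
    """ULPs by which exp(x) moves when x is bumped to the next double."""
    y0 = math.exp(x)
    y1 = math.exp(math.nextafter(x, math.inf))
    return (y1 - y0) / math.ulp(y0)

# Small argument: the output moves by about as much as the input.
print(output_ulps_per_input_ulp(1.0))
# Argument close to the double overflow threshold (about 709.78).
print(output_ulps_per_input_ulp(709.0))
```

For the large argument the ratio should land in the 512 to 1024 range, consistent with the figures quoted above; the exact value depends on where the result sits within its binade and on the library's exp.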

> Oh, and what value of C does the benchmark that triggered this have?

10 appears quite common. I extracted a runtime log of all powf calls in SPEC
(see https://sourceware.org/ml/libc-alpha/2017-06/msg00718.html) and noticed
a lot of repetition in some inputs. Further investigation showed many uses of
pow have a constant first operand, so this is an obvious target for optimization.

Wilco

Joseph Myers Aug. 4, 2017, 10:38 p.m. UTC | #4

On Fri, 4 Aug 2017, Richard Biener wrote:

> >> Do this only for fast-math as accuracy is reduced.  This is much faster
> >> since pow is more complex than exp - with a current GLIBC the speedup
> >> is more than 7 times for this transformation.
> >
> > Is it bound to be so on future glibc revisions and non-glibc platforms?
> 
> And how is accuracy affected?  I think the transform is only reasonable

For pow to be accurate when the result has large (positive or negative) 
exponent, it needs to compute the log and intermediate multiplication to a 
precision around (number of mantissa bits + number of exponent bits).  
This is inevitably slower than when you omit the extra intermediate 
precision (and if you omit that precision, the error can be around MAX_EXP 
ulps).
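
One way to get a feel for the size of this effect is to measure how far exp (x * log (C)), evaluated without extra intermediate precision, lands from pow (C, x) as the result's exponent grows. The Python sketch below (3.9 or later, for `math.ulp`) is only an illustration: it treats the platform's pow as the reference, which assumes pow itself is accurate to about 1 ULP, and `ulp_difference` is a hypothetical helper name.

```python
import math

def ulp_difference(c, x):
    """Distance in ULPs between exp(x * log(c)) and pow(c, x); assumes c > 0."""
    reference = math.pow(c, x)
    rewritten = math.exp(x * math.log(c))
    return abs(rewritten - reference) / math.ulp(reference)

# The gap tends to grow with the magnitude of the result's exponent.
for x in (1.0, 30.0, 300.0):
    print(x, ulp_difference(10.0, x))
```

The printed values vary with the math library, so no particular output is claimed here; the point is only that the difference is bounded by the rounding error in x * log (C) multiplied by the sensitivity of exp at that argument.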

diff --git a/gcc/match.pd b/gcc/match.pd
index e98db52af84946cf579c6434e06d450713a47162..96486aa1f512fe32d85a1de95c46523263ea1b6d 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3548,6 +3548,14 @@  DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
    (logs (pows @0 @1))
    (mult @1 (logs @0))))
 
+ /* pow(C,x) -> exp(log(C)*x).  */
+ (for pows (POW)
+      exps (EXP)
+      logs (LOG)
+  (simplify
+   (pows REAL_CST@0 @1)
+   (exps (mult (logs @0) @1))))
+
  (for sqrts (SQRT)
       cbrts (CBRT)
       pows (POW)