From patchwork Fri Apr  5 17:49:40 2013
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Bill Schmidt <wschmidt@linux.vnet.ibm.com>
X-Patchwork-Id: 234232
Return-Path: 
 <gcc-patches-return-339343-incoming=patchwork.ozlabs.org@gcc.gnu.org>
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Received: from sourceware.org (server1.sourceware.org [209.132.180.131])
	(using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
	(Client CN "localhost", Issuer "www.qmailtoaster.com" (not verified))
	by ozlabs.org (Postfix) with ESMTPS id E2B112C00EA
	for <incoming@patchwork.ozlabs.org>;
	Sat,  6 Apr 2013 04:50:03 +1100 (EST)
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id
	:list-unsubscribe:list-archive:list-post:list-help:sender
	:message-id:subject:from:to:cc:date:content-type
	:content-transfer-encoding:mime-version; q=dns; s=default; b=mDj
	kbrv5RVocV970t85Q6aOANOXckIePEgWUkpsDuYEWi00X6ORQDjoPOIo3na5fM/l
	Dt2I/ual3i+CWZlpeLJXg8xA/jnE2aIXXitthtilX9qcr57VMEh0nQWvDX1seyeI
	JKEKOPQATaKwCBpagq2OPb5j96gKmc/E/76mFx84=
DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id
	:list-unsubscribe:list-archive:list-post:list-help:sender
	:message-id:subject:from:to:cc:date:content-type
	:content-transfer-encoding:mime-version; s=default; bh=pkEK5/w+J
	ogUlqV0N6AOqdzdlOs=; b=Rta8qk9deveC8t9P5Ri2a926+YoYXCKg04UAP/fUQ
	0G4tQYJcOTGqJGwecoaKk9LK7FHoBziUQNe4WfSZ0rW4PsEyWRYQqzowC5NKKk6c
	2nf/Lzy7CSFYjkI5HLXFXUmSw+L+/cwls8JZ0lsri2QVvpl2X6aC0PpArHXthvut
	8w=
Received: (qmail 11283 invoked by alias); 5 Apr 2013 17:49:56 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <mailto:gcc-patches-unsubscribe-##L=##H@gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Delivered-To: mailing list gcc-patches@gcc.gnu.org
Received: (qmail 11258 invoked by uid 89); 5 Apr 2013 17:49:56 -0000
X-Spam-SWARE-Status: No, score=-6.0 required=5.0 tests=AWL, BAYES_00,
	KHOP_RCVD_UNTRUST, RCVD_IN_HOSTKARMA_W, RCVD_IN_HOSTKARMA_WL,
	RP_MATCHES_RCVD autolearn=ham version=3.3.1
Received: from e23smtp04.au.ibm.com (HELO e23smtp04.au.ibm.com)
	(202.81.31.146) by sourceware.org
	(qpsmtpd/0.84/v0.84-167-ge50287c) with ESMTP;
	Fri, 05 Apr 2013 17:49:52 +0000
Received: from /spool/local	by e23smtp04.au.ibm.com with IBM ESMTP SMTP
	Gateway: Authorized Use Only! Violators will be
	prosecuted	for <gcc-patches@gcc.gnu.org> from
	<wschmidt@linux.vnet.ibm.com>; Sat, 6 Apr 2013 03:38:42 +1000
Received: from d23dlp02.au.ibm.com (202.81.31.213)	by e23smtp04.au.ibm.com
	(202.81.31.210) with IBM ESMTP SMTP Gateway: Authorized Use
	Only! Violators will be prosecuted; Sat, 6 Apr 2013 03:38:40 +1000
Received: from d23relay05.au.ibm.com (d23relay05.au.ibm.com
	[9.190.235.152])	by d23dlp02.au.ibm.com (Postfix) with ESMTP
	id 1C0BC2BB0050; Sat,  6 Apr 2013 04:49:44 +1100 (EST)
Received: from d23av04.au.ibm.com (d23av04.au.ibm.com [9.190.235.139])	by
	d23relay05.au.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id
	r35HaLwg59441284; Sat, 6 Apr 2013 04:36:22 +1100
Received: from d23av04.au.ibm.com (loopback [127.0.0.1])	by
	d23av04.au.ibm.com (8.14.4/8.13.1/NCO v10.0 AVout) with ESMTP
	id r35HngpJ009598; Sat, 6 Apr 2013 04:49:42 +1100
Received: from [9.80.16.12] ([9.80.16.12])	by d23av04.au.ibm.com
	(8.14.4/8.13.1/NCO v10.0 AVin) with ESMTP id r35HndQZ009559;
	Sat, 6 Apr 2013 04:49:40 +1100
Message-ID: <1365184180.3460.26.camel@gnopaine>
Subject: [PATCH, PowerPC] Fix PR 56843
From: Bill Schmidt <wschmidt@linux.vnet.ibm.com>
To: gcc-patches@gcc.gnu.org
Cc: dje@gcc.gnu.org, bergner@gcc.gnu.org
Date: Fri, 05 Apr 2013 12:49:40 -0500
Mime-Version: 1.0
X-Content-Scanned: Fidelis XPS MAILER
x-cbid: 13040517-9264-0000-0000-00000376F4F5

This patch improves code generation for Newton-Raphson reciprocal
estimates for divide and square root on PowerPC
(http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56843).

For the divide case, we formerly had specialized routines for two- and
three-pass estimates.  Rather than add new routines for one- and
four-pass estimates, I removed those and rewrote the algorithm to be
general for any number of passes.  This unfortunately makes the patch
hard to read.  It will probably be easiest to review by applying it to a
tree and looking at the whole rs6000_emit_swdiv function.

Bootstrapped and tested on powerpc64-unknown-linux-gnu with no new
regressions.  Ok for trunk?

Thanks,
Bill


gcc:

2013-04-05  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

	PR target/56843
	* config/rs6000/rs6000.c (rs6000_emit_swdiv_high_precision): Remove.
	(rs6000_emit_swdiv_low_precision): Remove.
	(rs6000_emit_swdiv): Rewrite to handle between one and four
	iterations of Newton-Raphson generally; modify required number of
	iterations for some cases.
	* config/rs6000/rs6000.h (RS6000_RECIP_HIGH_PRECISION_P): Remove.

gcc/testsuite:

2013-04-05  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

	PR target/56843
	* gcc.target/powerpc/recip-1.c: Modify expected output.
	* gcc.target/powerpc/recip-3.c: Likewise.
	* gcc.target/powerpc/recip-4.c: Likewise.
	* gcc.target/powerpc/recip-5.c: Add expected output for iterations.

Index: gcc/testsuite/gcc.target/powerpc/recip-1.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/recip-1.c	(revision 197486)
+++ gcc/testsuite/gcc.target/powerpc/recip-1.c	(working copy)
@@ -3,8 +3,8 @@
 /* { dg-options "-O2 -mrecip -ffast-math -mcpu=power6" } */
 /* { dg-final { scan-assembler-times "frsqrte" 2 } } */
 /* { dg-final { scan-assembler-times "fmsub" 2 } } */
-/* { dg-final { scan-assembler-times "fmul" 8 } } */
-/* { dg-final { scan-assembler-times "fnmsub" 4 } } */
+/* { dg-final { scan-assembler-times "fmul" 6 } } */
+/* { dg-final { scan-assembler-times "fnmsub" 3 } } */
 
 double
 rsqrt_d (double a)
Index: gcc/testsuite/gcc.target/powerpc/recip-3.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/recip-3.c	(revision 197486)
+++ gcc/testsuite/gcc.target/powerpc/recip-3.c	(working copy)
@@ -7,8 +7,8 @@
 /* { dg-final { scan-assembler-times "xsnmsub.dp\|fnmsub\ " 2 } } */
 /* { dg-final { scan-assembler-times "frsqrtes" 1 } } */
 /* { dg-final { scan-assembler-times "fmsubs" 1 } } */
-/* { dg-final { scan-assembler-times "fmuls" 4 } } */
-/* { dg-final { scan-assembler-times "fnmsubs" 2 } } */
+/* { dg-final { scan-assembler-times "fmuls" 2 } } */
+/* { dg-final { scan-assembler-times "fnmsubs" 1 } } */
 
 double
 rsqrt_d (double a)
Index: gcc/testsuite/gcc.target/powerpc/recip-4.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/recip-4.c	(revision 197486)
+++ gcc/testsuite/gcc.target/powerpc/recip-4.c	(working copy)
@@ -7,8 +7,8 @@
 /* { dg-final { scan-assembler-times "xvnmsub.dp" 2 } } */
 /* { dg-final { scan-assembler-times "xvrsqrtesp" 1 } } */
 /* { dg-final { scan-assembler-times "xvmsub.sp" 1 } } */
-/* { dg-final { scan-assembler-times "xvmulsp" 4 } } */
-/* { dg-final { scan-assembler-times "xvnmsub.sp" 2 } } */
+/* { dg-final { scan-assembler-times "xvmulsp" 2 } } */
+/* { dg-final { scan-assembler-times "xvnmsub.sp" 1 } } */
 
 #define SIZE 1024
 
Index: gcc/testsuite/gcc.target/powerpc/recip-5.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/recip-5.c	(revision 197486)
+++ gcc/testsuite/gcc.target/powerpc/recip-5.c	(working copy)
@@ -6,6 +6,14 @@
 /* { dg-final { scan-assembler-times "xvresp" 5 } } */
 /* { dg-final { scan-assembler-times "xsredp" 2 } } */
 /* { dg-final { scan-assembler-times "fres" 2 } } */
+/* { dg-final { scan-assembler-times "fmuls" 2 } } */
+/* { dg-final { scan-assembler-times "fnmsubs" 2 } } */
+/* { dg-final { scan-assembler-times "xsmuldp" 2 } } */
+/* { dg-final { scan-assembler-times "xsnmsub.dp" 4 } } */
+/* { dg-final { scan-assembler-times "xvmulsp" 7 } } */
+/* { dg-final { scan-assembler-times "xvnmsub.sp" 5 } } */
+/* { dg-final { scan-assembler-times "xvmuldp" 6 } } */
+/* { dg-final { scan-assembler-times "xvnmsub.dp" 8 } } */
 
 #include <altivec.h>
 
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 197486)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -26913,54 +26913,26 @@ rs6000_emit_nmsub (rtx dst, rtx m1, rtx m2, rtx a)
   emit_insn (gen_rtx_SET (VOIDmode, dst, r));
 }
 
-/* Newton-Raphson approximation of floating point divide with just 2 passes
-   (either single precision floating point, or newer machines with higher
-   accuracy estimates).  Support both scalar and vector divide.  Assumes no
-   trapping math and finite arguments.  */
+/* Newton-Raphson approximation of floating point divide DST = N/D.  If NOTE_P,
+   add a reg_note saying that this was a division.  Support both scalar and
+   vector divide.  Assumes no trapping math and finite arguments.  */
 
-static void
-rs6000_emit_swdiv_high_precision (rtx dst, rtx n, rtx d)
+void
+rs6000_emit_swdiv (rtx dst, rtx n, rtx d, bool note_p)
 {
   enum machine_mode mode = GET_MODE (dst);
-  rtx x0, e0, e1, y1, u0, v0;
-  enum insn_code code = optab_handler (smul_optab, mode);
-  gen_2arg_fn_t gen_mul = (gen_2arg_fn_t) GEN_FCN (code);
-  rtx one = rs6000_load_constant_and_splat (mode, dconst1);
+  rtx one, x0, e0, x1, xprev, eprev, xnext, enext, u, v;
+  int i;
 
-  gcc_assert (code != CODE_FOR_nothing);
+  /* Low precision estimates guarantee 5 bits of accuracy.  High
+     precision estimates guarantee 14 bits of accuracy.  SFmode
+     requires 23 bits of accuracy.  DFmode requires 52 bits of
+     accuracy.  Each pass at least doubles the accuracy, leading
+     to the following.  */
+  int passes = (TARGET_RECIP_PRECISION) ? 1 : 3;
+  if (mode == DFmode || mode == V2DFmode)
+    passes++;
 
-  /* x0 = 1./d estimate */
-  x0 = gen_reg_rtx (mode);
-  emit_insn (gen_rtx_SET (VOIDmode, x0,
-			  gen_rtx_UNSPEC (mode, gen_rtvec (1, d),
-					  UNSPEC_FRES)));
-
-  e0 = gen_reg_rtx (mode);
-  rs6000_emit_nmsub (e0, d, x0, one);		/* e0 = 1. - (d * x0) */
-
-  e1 = gen_reg_rtx (mode);
-  rs6000_emit_madd (e1, e0, e0, e0);		/* e1 = (e0 * e0) + e0 */
-
-  y1 = gen_reg_rtx (mode);
-  rs6000_emit_madd (y1, e1, x0, x0);		/* y1 = (e1 * x0) + x0 */
-
-  u0 = gen_reg_rtx (mode);
-  emit_insn (gen_mul (u0, n, y1));		/* u0 = n * y1 */
-
-  v0 = gen_reg_rtx (mode);
-  rs6000_emit_nmsub (v0, d, u0, n);		/* v0 = n - (d * u0) */
-
-  rs6000_emit_madd (dst, v0, y1, u0);		/* dst = (v0 * y1) + u0 */
-}
-
-/* Newton-Raphson approximation of floating point divide that has a low
-   precision estimate.  Assumes no trapping math and finite arguments.  */
-
-static void
-rs6000_emit_swdiv_low_precision (rtx dst, rtx n, rtx d)
-{
-  enum machine_mode mode = GET_MODE (dst);
-  rtx x0, e0, e1, e2, y1, y2, y3, u0, v0, one;
   enum insn_code code = optab_handler (smul_optab, mode);
   gen_2arg_fn_t gen_mul = (gen_2arg_fn_t) GEN_FCN (code);
 
@@ -26974,47 +26946,45 @@ rs6000_emit_nmsub (rtx dst, rtx m1, rtx m2, rtx a)
 			  gen_rtx_UNSPEC (mode, gen_rtvec (1, d),
 					  UNSPEC_FRES)));
 
-  e0 = gen_reg_rtx (mode);
-  rs6000_emit_nmsub (e0, d, x0, one);		/* e0 = 1. - d * x0 */
+  /* Each iteration but the last calculates x_(i+1) = x_i * (2 - d * x_i).  */
+  if (passes > 1) {
 
-  y1 = gen_reg_rtx (mode);
-  rs6000_emit_madd (y1, e0, x0, x0);		/* y1 = x0 + e0 * x0 */
+    /* e0 = 1. - d * x0  */
+    e0 = gen_reg_rtx (mode);
+    rs6000_emit_nmsub (e0, d, x0, one);
 
-  e1 = gen_reg_rtx (mode);
-  emit_insn (gen_mul (e1, e0, e0));		/* e1 = e0 * e0 */
+    /* x1 = x0 + e0 * x0  */
+    x1 = gen_reg_rtx (mode);
+    rs6000_emit_madd (x1, e0, x0, x0);
 
-  y2 = gen_reg_rtx (mode);
-  rs6000_emit_madd (y2, e1, y1, y1);		/* y2 = y1 + e1 * y1 */
+    for (i = 0, xprev = x1, eprev = e0; i < passes - 2;
+	 ++i, xprev = xnext, eprev = enext) {
+      
+      /* enext = eprev * eprev  */
+      enext = gen_reg_rtx (mode);
+      emit_insn (gen_mul (enext, eprev, eprev));
 
-  e2 = gen_reg_rtx (mode);
-  emit_insn (gen_mul (e2, e1, e1));		/* e2 = e1 * e1 */
+      /* xnext = xprev + enext * xprev  */
+      xnext = gen_reg_rtx (mode);
+      rs6000_emit_madd (xnext, enext, xprev, xprev);
+    }
 
-  y3 = gen_reg_rtx (mode);
-  rs6000_emit_madd (y3, e2, y2, y2);		/* y3 = y2 + e2 * y2 */
+  } else
+    xprev = x0;
 
-  u0 = gen_reg_rtx (mode);
-  emit_insn (gen_mul (u0, n, y3));		/* u0 = n * y3 */
+  /* The last iteration calculates x_(i+1) = n * x_i * (2 - d * x_i).  */
 
-  v0 = gen_reg_rtx (mode);
-  rs6000_emit_nmsub (v0, d, u0, n);		/* v0 = n - d * u0 */
+  /* u = n * xprev  */
+  u = gen_reg_rtx (mode);
+  emit_insn (gen_mul (u, n, xprev));
 
-  rs6000_emit_madd (dst, v0, y3, u0);		/* dst = u0 + v0 * y3 */
-}
+  /* v = n - (d * u)  */
+  v = gen_reg_rtx (mode);
+  rs6000_emit_nmsub (v, d, u, n);
 
-/* Newton-Raphson approximation of floating point divide DST = N/D.  If NOTE_P,
-   add a reg_note saying that this was a division.  Support both scalar and
-   vector divide.  Assumes no trapping math and finite arguments.  */
+  /* dst = (v * xprev) + u  */
+  rs6000_emit_madd (dst, v, xprev, u);
 
-void
-rs6000_emit_swdiv (rtx dst, rtx n, rtx d, bool note_p)
-{
-  enum machine_mode mode = GET_MODE (dst);
-
-  if (RS6000_RECIP_HIGH_PRECISION_P (mode))
-    rs6000_emit_swdiv_high_precision (dst, n, d);
-  else
-    rs6000_emit_swdiv_low_precision (dst, n, d);
-
   if (note_p)
     add_reg_note (get_last_insn (), REG_EQUAL, gen_rtx_DIV (mode, n, d));
 }
@@ -27028,7 +26998,16 @@ rs6000_emit_swrsqrt (rtx dst, rtx src)
   enum machine_mode mode = GET_MODE (src);
   rtx x0 = gen_reg_rtx (mode);
   rtx y = gen_reg_rtx (mode);
-  int passes = (TARGET_RECIP_PRECISION) ? 2 : 3;
+
+  /* Low precision estimates guarantee 5 bits of accuracy.  High
+     precision estimates guarantee 14 bits of accuracy.  SFmode
+     requires 23 bits of accuracy.  DFmode requires 52 bits of
+     accuracy.  Each pass at least doubles the accuracy, leading
+     to the following.  */
+  int passes = (TARGET_RECIP_PRECISION) ? 1 : 3;
+  if (mode == DFmode || mode == V2DFmode)
+    passes++;
+
   REAL_VALUE_TYPE dconst3_2;
   int i;
   rtx halfthree;
Index: gcc/config/rs6000/rs6000.h
===================================================================
--- gcc/config/rs6000/rs6000.h	(revision 197486)
+++ gcc/config/rs6000/rs6000.h	(working copy)
@@ -601,9 +601,6 @@ extern unsigned char rs6000_recip_bits[];
 #define RS6000_RECIP_AUTO_RSQRTE_P(MODE) \
   (rs6000_recip_bits[(int)(MODE)] & RS6000_RECIP_MASK_AUTO_RSQRTE)
 
-#define RS6000_RECIP_HIGH_PRECISION_P(MODE) \
-  ((MODE) == SFmode || (MODE) == V4SFmode || TARGET_RECIP_PRECISION)
-
 /* The default CPU for TARGET_OPTION_OVERRIDE.  */
 #define OPTION_TARGET_CPU_DEFAULT TARGET_CPU_DEFAULT