From patchwork Fri Apr 5 17:49:40 2013
X-Patchwork-Submitter: Bill Schmidt
X-Patchwork-Id: 234232
Message-ID: <1365184180.3460.26.camel@gnopaine>
Subject: [PATCH, PowerPC] Fix PR 56843
From: Bill Schmidt
To: gcc-patches@gcc.gnu.org
Cc: dje@gcc.gnu.org, bergner@gcc.gnu.org
Date: Fri, 05 Apr 2013 12:49:40 -0500

This patch improves code generation for Newton-Raphson reciprocal estimates for divide and square root on PowerPC (http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56843). For the divide case, we formerly had specialized routines for two- and three-pass estimates. Rather than add new routines for one- and four-pass estimates, I removed those and rewrote the algorithm to be general for any number of passes.
The rewrite unfortunately makes the patch hard to read. It will probably be easiest to review by applying it to a tree and looking at the whole rs6000_emit_swdiv function.

Bootstrapped and tested on powerpc64-unknown-linux-gnu with no new regressions. Ok for trunk?

Thanks,
Bill


gcc:

2013-04-05  Bill Schmidt

	PR target/56843
	* config/rs6000/rs6000.c (rs6000_emit_swdiv_high_precision): Remove.
	(rs6000_emit_swdiv_low_precision): Remove.
	(rs6000_emit_swdiv): Rewrite to handle between one and four
	iterations of Newton-Raphson generally; modify required number of
	iterations for some cases.
	(rs6000_emit_swrsqrt): Modify required number of iterations for
	some cases.
	* config/rs6000/rs6000.h (RS6000_RECIP_HIGH_PRECISION_P): Remove.

gcc/testsuite:

2013-04-05  Bill Schmidt

	PR target/56843
	* gcc.target/powerpc/recip-1.c: Modify expected output.
	* gcc.target/powerpc/recip-3.c: Likewise.
	* gcc.target/powerpc/recip-4.c: Likewise.
	* gcc.target/powerpc/recip-5.c: Add expected output for iterations.


Index: gcc/testsuite/gcc.target/powerpc/recip-1.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/recip-1.c	(revision 197486)
+++ gcc/testsuite/gcc.target/powerpc/recip-1.c	(working copy)
@@ -3,8 +3,8 @@
 /* { dg-options "-O2 -mrecip -ffast-math -mcpu=power6" } */
 /* { dg-final { scan-assembler-times "frsqrte" 2 } } */
 /* { dg-final { scan-assembler-times "fmsub" 2 } } */
-/* { dg-final { scan-assembler-times "fmul" 8 } } */
-/* { dg-final { scan-assembler-times "fnmsub" 4 } } */
+/* { dg-final { scan-assembler-times "fmul" 6 } } */
+/* { dg-final { scan-assembler-times "fnmsub" 3 } } */

 double
 rsqrt_d (double a)
Index: gcc/testsuite/gcc.target/powerpc/recip-3.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/recip-3.c	(revision 197486)
+++ gcc/testsuite/gcc.target/powerpc/recip-3.c	(working copy)
@@ -7,8 +7,8 @@
 /* { dg-final { scan-assembler-times "xsnmsub.dp\|fnmsub\ " 2 } } */
 /* { dg-final { scan-assembler-times "frsqrtes" 1 } } */
 /* { dg-final { scan-assembler-times "fmsubs" 1 } } */
-/* { dg-final { scan-assembler-times "fmuls" 4 } } */
-/* { dg-final { scan-assembler-times "fnmsubs" 2 } } */
+/* { dg-final { scan-assembler-times "fmuls" 2 } } */
+/* { dg-final { scan-assembler-times "fnmsubs" 1 } } */

 double
 rsqrt_d (double a)
Index: gcc/testsuite/gcc.target/powerpc/recip-4.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/recip-4.c	(revision 197486)
+++ gcc/testsuite/gcc.target/powerpc/recip-4.c	(working copy)
@@ -7,8 +7,8 @@
 /* { dg-final { scan-assembler-times "xvnmsub.dp" 2 } } */
 /* { dg-final { scan-assembler-times "xvrsqrtesp" 1 } } */
 /* { dg-final { scan-assembler-times "xvmsub.sp" 1 } } */
-/* { dg-final { scan-assembler-times "xvmulsp" 4 } } */
-/* { dg-final { scan-assembler-times "xvnmsub.sp" 2 } } */
+/* { dg-final { scan-assembler-times "xvmulsp" 2 } } */
+/* { dg-final { scan-assembler-times "xvnmsub.sp" 1 } } */

 #define SIZE 1024

Index: gcc/testsuite/gcc.target/powerpc/recip-5.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/recip-5.c	(revision 197486)
+++ gcc/testsuite/gcc.target/powerpc/recip-5.c	(working copy)
@@ -6,6 +6,14 @@
 /* { dg-final { scan-assembler-times "xvresp" 5 } } */
 /* { dg-final { scan-assembler-times "xsredp" 2 } } */
 /* { dg-final { scan-assembler-times "fres" 2 } } */
+/* { dg-final { scan-assembler-times "fmuls" 2 } } */
+/* { dg-final { scan-assembler-times "fnmsubs" 2 } } */
+/* { dg-final { scan-assembler-times "xsmuldp" 2 } } */
+/* { dg-final { scan-assembler-times "xsnmsub.dp" 4 } } */
+/* { dg-final { scan-assembler-times "xvmulsp" 7 } } */
+/* { dg-final { scan-assembler-times "xvnmsub.sp" 5 } } */
+/* { dg-final { scan-assembler-times "xvmuldp" 6 } } */
+/* { dg-final { scan-assembler-times "xvnmsub.dp" 8 } } */

 #include

Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 197486)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -26913,54 +26913,26 @@ rs6000_emit_nmsub (rtx dst, rtx m1, rtx m2, rtx a)
   emit_insn (gen_rtx_SET (VOIDmode, dst, r));
 }

-/* Newton-Raphson approximation of floating point divide with just 2 passes
-   (either single precision floating point, or newer machines with higher
-   accuracy estimates).  Support both scalar and vector divide.  Assumes no
-   trapping math and finite arguments.  */
+/* Newton-Raphson approximation of floating point divide DST = N/D.  If NOTE_P,
+   add a reg_note saying that this was a division.  Support both scalar and
+   vector divide.  Assumes no trapping math and finite arguments.  */

-static void
-rs6000_emit_swdiv_high_precision (rtx dst, rtx n, rtx d)
+void
+rs6000_emit_swdiv (rtx dst, rtx n, rtx d, bool note_p)
 {
   enum machine_mode mode = GET_MODE (dst);
-  rtx x0, e0, e1, y1, u0, v0;
-  enum insn_code code = optab_handler (smul_optab, mode);
-  gen_2arg_fn_t gen_mul = (gen_2arg_fn_t) GEN_FCN (code);
-  rtx one = rs6000_load_constant_and_splat (mode, dconst1);
+  rtx one, x0, e0, x1, xprev, eprev, xnext, enext, u, v;
+  int i;

-  gcc_assert (code != CODE_FOR_nothing);
+  /* Low precision estimates guarantee 5 bits of accuracy.  High
+     precision estimates guarantee 14 bits of accuracy.  SFmode
+     requires 23 bits of accuracy.  DFmode requires 52 bits of
+     accuracy.  Each pass at least doubles the accuracy, leading
+     to the following.  */
+  int passes = (TARGET_RECIP_PRECISION) ? 1 : 3;
+  if (mode == DFmode || mode == V2DFmode)
+    passes++;

-  /* x0 = 1./d estimate */
-  x0 = gen_reg_rtx (mode);
-  emit_insn (gen_rtx_SET (VOIDmode, x0,
-                          gen_rtx_UNSPEC (mode, gen_rtvec (1, d),
-                                          UNSPEC_FRES)));
-
-  e0 = gen_reg_rtx (mode);
-  rs6000_emit_nmsub (e0, d, x0, one);	/* e0 = 1. - (d * x0) */
-
-  e1 = gen_reg_rtx (mode);
-  rs6000_emit_madd (e1, e0, e0, e0);	/* e1 = (e0 * e0) + e0 */
-
-  y1 = gen_reg_rtx (mode);
-  rs6000_emit_madd (y1, e1, x0, x0);	/* y1 = (e1 * x0) + x0 */
-
-  u0 = gen_reg_rtx (mode);
-  emit_insn (gen_mul (u0, n, y1));	/* u0 = n * y1 */
-
-  v0 = gen_reg_rtx (mode);
-  rs6000_emit_nmsub (v0, d, u0, n);	/* v0 = n - (d * u0) */
-
-  rs6000_emit_madd (dst, v0, y1, u0);	/* dst = (v0 * y1) + u0 */
-}
-
-/* Newton-Raphson approximation of floating point divide that has a low
-   precision estimate.  Assumes no trapping math and finite arguments.  */
-
-static void
-rs6000_emit_swdiv_low_precision (rtx dst, rtx n, rtx d)
-{
-  enum machine_mode mode = GET_MODE (dst);
-  rtx x0, e0, e1, e2, y1, y2, y3, u0, v0, one;
   enum insn_code code = optab_handler (smul_optab, mode);
   gen_2arg_fn_t gen_mul = (gen_2arg_fn_t) GEN_FCN (code);

@@ -26974,47 +26946,45 @@ rs6000_emit_nmsub (rtx dst, rtx m1, rtx m2, rtx a)
                           gen_rtx_UNSPEC (mode, gen_rtvec (1, d),
                                           UNSPEC_FRES)));

-  e0 = gen_reg_rtx (mode);
-  rs6000_emit_nmsub (e0, d, x0, one);	/* e0 = 1. - d * x0 */
+  /* Each iteration but the last calculates x_(i+1) = x_i * (2 - d * x_i).  */
+  if (passes > 1) {

-  y1 = gen_reg_rtx (mode);
-  rs6000_emit_madd (y1, e0, x0, x0);	/* y1 = x0 + e0 * x0 */
+    /* e0 = 1. - d * x0  */
+    e0 = gen_reg_rtx (mode);
+    rs6000_emit_nmsub (e0, d, x0, one);

-  e1 = gen_reg_rtx (mode);
-  emit_insn (gen_mul (e1, e0, e0));	/* e1 = e0 * e0 */
+    /* x1 = x0 + e0 * x0  */
+    x1 = gen_reg_rtx (mode);
+    rs6000_emit_madd (x1, e0, x0, x0);

-  y2 = gen_reg_rtx (mode);
-  rs6000_emit_madd (y2, e1, y1, y1);	/* y2 = y1 + e1 * y1 */
+    for (i = 0, xprev = x1, eprev = e0; i < passes - 2;
+         ++i, xprev = xnext, eprev = enext) {
+
+      /* enext = eprev * eprev  */
+      enext = gen_reg_rtx (mode);
+      emit_insn (gen_mul (enext, eprev, eprev));

-  e2 = gen_reg_rtx (mode);
-  emit_insn (gen_mul (e2, e1, e1));	/* e2 = e1 * e1 */
+      /* xnext = xprev + enext * xprev  */
+      xnext = gen_reg_rtx (mode);
+      rs6000_emit_madd (xnext, enext, xprev, xprev);
+    }

-  y3 = gen_reg_rtx (mode);
-  rs6000_emit_madd (y3, e2, y2, y2);	/* y3 = y2 + e2 * y2 */
+  } else
+    xprev = x0;

-  u0 = gen_reg_rtx (mode);
-  emit_insn (gen_mul (u0, n, y3));	/* u0 = n * y3 */
+  /* The last iteration calculates x_(i+1) = n * x_i * (2 - d * x_i).  */

-  v0 = gen_reg_rtx (mode);
-  rs6000_emit_nmsub (v0, d, u0, n);	/* v0 = n - d * u0 */
+  /* u = n * xprev  */
+  u = gen_reg_rtx (mode);
+  emit_insn (gen_mul (u, n, xprev));

-  rs6000_emit_madd (dst, v0, y3, u0);	/* dst = u0 + v0 * y3 */
-}
+  /* v = n - (d * u)  */
+  v = gen_reg_rtx (mode);
+  rs6000_emit_nmsub (v, d, u, n);

-/* Newton-Raphson approximation of floating point divide DST = N/D.  If NOTE_P,
-   add a reg_note saying that this was a division.  Support both scalar and
-   vector divide.  Assumes no trapping math and finite arguments.  */
+  /* dst = (v * xprev) + u  */
+  rs6000_emit_madd (dst, v, xprev, u);

-void
-rs6000_emit_swdiv (rtx dst, rtx n, rtx d, bool note_p)
-{
-  enum machine_mode mode = GET_MODE (dst);
-
-  if (RS6000_RECIP_HIGH_PRECISION_P (mode))
-    rs6000_emit_swdiv_high_precision (dst, n, d);
-  else
-    rs6000_emit_swdiv_low_precision (dst, n, d);
-
   if (note_p)
     add_reg_note (get_last_insn (), REG_EQUAL, gen_rtx_DIV (mode, n, d));
 }
@@ -27028,7 +26998,16 @@ rs6000_emit_swrsqrt (rtx dst, rtx src)
   enum machine_mode mode = GET_MODE (src);
   rtx x0 = gen_reg_rtx (mode);
   rtx y = gen_reg_rtx (mode);
-  int passes = (TARGET_RECIP_PRECISION) ? 2 : 3;
+
+  /* Low precision estimates guarantee 5 bits of accuracy.  High
+     precision estimates guarantee 14 bits of accuracy.  SFmode
+     requires 23 bits of accuracy.  DFmode requires 52 bits of
+     accuracy.  Each pass at least doubles the accuracy, leading
+     to the following.  */
+  int passes = (TARGET_RECIP_PRECISION) ? 1 : 3;
+  if (mode == DFmode || mode == V2DFmode)
+    passes++;
+
   REAL_VALUE_TYPE dconst3_2;
   int i;
   rtx halfthree;
Index: gcc/config/rs6000/rs6000.h
===================================================================
--- gcc/config/rs6000/rs6000.h	(revision 197486)
+++ gcc/config/rs6000/rs6000.h	(working copy)
@@ -601,9 +601,6 @@ extern unsigned char rs6000_recip_bits[];
 #define RS6000_RECIP_AUTO_RSQRTE_P(MODE) \
   (rs6000_recip_bits[(int)(MODE)] & RS6000_RECIP_MASK_AUTO_RSQRTE)

-#define RS6000_RECIP_HIGH_PRECISION_P(MODE) \
-  ((MODE) == SFmode || (MODE) == V4SFmode || TARGET_RECIP_PRECISION)
-
 /* The default CPU for TARGET_OPTION_OVERRIDE.  */
 #define OPTION_TARGET_CPU_DEFAULT TARGET_CPU_DEFAULT