From patchwork Fri Oct 21 07:29:11 2011
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Uros Bizjak <ubizjak@gmail.com>
X-Patchwork-Id: 120953
Date: Fri, 21 Oct 2011 09:29:11 +0200
Message-ID: 
 <CAFULd4ZJOW=A1dEFZdsx-iXjFez4id+8Ok_3SaM50i-FeZs=bQ@mail.gmail.com>
Subject: [PATCH, i386]: Improve recip sequences a bit
From: Uros Bizjak <ubizjak@gmail.com>
To: gcc-patches@gcc.gnu.org

Hello!

While eyeballing the following testcase:

float a[256], b[256], c[256];

void foo(void)
{
  int i;

  for (i=0; i<256; ++i)
    c[i] = a[i] / b[i];
}

-O2 -ftree-vectorize -ffast-math

I noticed that for some reason CSE doesn't eliminate the second memory read of b, resulting in:

.L2:
	vrcpps	b(%rax), %ymm0
	vmulps	b(%rax), %ymm0, %ymm1
	vmulps	%ymm1, %ymm0, %ymm1
	vaddps	%ymm0, %ymm0, %ymm0
	vsubps	%ymm1, %ymm0, %ymm1
	vmulps	a(%rax), %ymm1, %ymm1
	vmovaps	%ymm1, c(%rax)
	addq	$32, %rax
	cmpq	$1024, %rax
	jne	.L2

The attached patch forces the memory operand into a register, producing:

.L2:
	vmovaps	b(%rax), %ymm1
	vrcpps	%ymm1, %ymm0
	vmulps	%ymm1, %ymm0, %ymm1
	vmulps	%ymm1, %ymm0, %ymm1
	vaddps	%ymm0, %ymm0, %ymm0
	vsubps	%ymm1, %ymm0, %ymm1
	vmulps	a(%rax), %ymm1, %ymm1
	vmovaps	%ymm1, c(%rax)
	addq	$32, %rax
	cmpq	$1024, %rax
	jne	.L2

The same cure applies to the rsqrt sequences, so the patch handles them as well.
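
For readers unfamiliar with the sequence, here is a rough scalar sketch in C of the arithmetic the loop performs: one Newton-Raphson step on top of a low-precision reciprocal estimate. This is not the GCC code. The names approx_rcp and swdiv are hypothetical, and approx_rcp only imitates the reduced precision of the hardware estimate by clearing low mantissa bits (an assumption made so the sketch runs anywhere). Note that b is consumed twice, once by the estimate and once by the first multiply, which is why it pays to have it in a register.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the hardware reciprocal estimate:
   compute 1/b, then clear the low 12 mantissa bits so that only
   about 11 bits of precision remain.  */
static float approx_rcp(float b)
{
    float r = 1.0f / b;
    uint32_t bits;

    memcpy(&bits, &r, sizeof bits);
    bits &= 0xFFFFF000u;
    memcpy(&r, &bits, sizeof r);
    return r;
}

/* a / b = a * ((rcp(b) + rcp(b)) - (b * rcp(b) * rcp(b)))  */
static float swdiv(float a, float b)
{
    float x0 = approx_rcp(b);   /* first use of b: the estimate */
    float e0 = x0 * b;          /* second use of b */
    float e1 = x0 * e0;
    float x1 = (x0 + x0) - e1;  /* refined reciprocal */

    return a * x1;
}
```

With these definitions, swdiv(1.0f, 3.0f) should agree with 1.0f / 3.0f to within a few ulps, while approx_rcp(3.0f) alone is only good to about three decimal digits.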

2011-10-21  Uros Bizjak  <ubizjak@gmail.com>

	* config/i386/i386.c (ix86_emit_swdivsf): Force b into register.
	(ix86_emit_swsqrtsf): Force a into register.

The patch was tested on x86_64-pc-linux-gnu and committed to mainline SVN.

Uros.

Index: config/i386/i386.c
===================================================================
--- config/i386/i386.c	(revision 180255)
+++ config/i386/i386.c	(working copy)
@@ -33682,6 +33682,8 @@ void ix86_emit_swdivsf (rtx res, rtx a, rtx b, enu
 
   /* a / b = a * ((rcp(b) + rcp(b)) - (b * rcp(b) * rcp (b))) */
 
+  b = force_reg (mode, b);
+
   /* x0 = rcp(b) estimate */
   emit_insn (gen_rtx_SET (VOIDmode, x0,
 			  gen_rtx_UNSPEC (mode, gen_rtvec (1, b),
@@ -33737,6 +33739,8 @@ void ix86_emit_swsqrtsf (rtx res, rtx a, enum mach
   /* sqrt(a)  = -0.5 * a * rsqrtss(a) * (a * rsqrtss(a) * rsqrtss(a) - 3.0)
      rsqrt(a) = -0.5     * rsqrtss(a) * (a * rsqrtss(a) * rsqrtss(a) - 3.0) */
 
+  a = force_reg (mode, a);
+
   /* x0 = rsqrt(a) estimate */
   emit_insn (gen_rtx_SET (VOIDmode, x0,
 			  gen_rtx_UNSPEC (mode, gen_rtvec (1, a),