Date: Fri, 21 Oct 2011 09:29:11 +0200
From: Uros Bizjak
To: gcc-patches@gcc.gnu.org
Subject: [PATCH, i386]: Improve recip sequences a bit

Hello!
While eyeballing the following testcase:

float a[256], b[256], c[256];

void foo (void)
{
  int i;

  for (i = 0; i < 256; ++i)
    c[i] = a[i] / b[i];
}

compiled with -O2 -ftree-vectorize -ffast-math, I noticed that for some
reason CSE doesn't eliminate the second read of b(%rax) from memory,
resulting in:

.L2:
        vrcpps  b(%rax), %ymm0
        vmulps  b(%rax), %ymm0, %ymm1
        vmulps  %ymm1, %ymm0, %ymm1
        vaddps  %ymm0, %ymm0, %ymm0
        vsubps  %ymm1, %ymm0, %ymm1
        vmulps  a(%rax), %ymm1, %ymm1
        vmovaps %ymm1, c(%rax)
        addq    $32, %rax
        cmpq    $1024, %rax
        jne     .L2

The attached patch forces the memory operand into a register, so b is
loaded only once, producing:

.L2:
        vmovaps b(%rax), %ymm1
        vrcpps  %ymm1, %ymm0
        vmulps  %ymm1, %ymm0, %ymm1
        vmulps  %ymm1, %ymm0, %ymm1
        vaddps  %ymm0, %ymm0, %ymm0
        vsubps  %ymm1, %ymm0, %ymm1
        vmulps  a(%rax), %ymm1, %ymm1
        vmovaps %ymm1, c(%rax)
        addq    $32, %rax
        cmpq    $1024, %rax
        jne     .L2

The same cure applies to the rsqrt sequences, so ix86_emit_swsqrtsf now
forces its operand into a register as well.

2011-10-21  Uros Bizjak

        * config/i386/i386.c (ix86_emit_swdivsf): Force b into register.
        (ix86_emit_swsqrtsf): Force a into register.

The patch was tested on x86_64-pc-linux-gnu and committed to mainline SVN.

Uros.

Index: config/i386/i386.c
===================================================================
--- config/i386/i386.c  (revision 180255)
+++ config/i386/i386.c  (working copy)
@@ -33682,6 +33682,8 @@ void ix86_emit_swdivsf (rtx res, rtx a, rtx b, enu
 
   /* a / b = a * ((rcp(b) + rcp(b)) - (b * rcp(b) * rcp (b))) */
 
+  b = force_reg (mode, b);
+
   /* x0 = rcp(b) estimate */
   emit_insn (gen_rtx_SET (VOIDmode, x0,
                           gen_rtx_UNSPEC (mode, gen_rtvec (1, b),
@@ -33737,6 +33739,8 @@ void ix86_emit_swsqrtsf (rtx res, rtx a, enum mach
   /* sqrt(a) = -0.5 * a * rsqrtss(a) * (a * rsqrtss(a) * rsqrtss(a) - 3.0)
      rsqrt(a) = -0.5 * rsqrtss(a) * (a * rsqrtss(a) * rsqrtss(a) - 3.0) */
 
+  a = force_reg (mode, a);
+
   /* x0 = rsqrt(a) estimate */
   emit_insn (gen_rtx_SET (VOIDmode, x0,
                           gen_rtx_UNSPEC (mode, gen_rtvec (1, a),