From patchwork Sun Sep 26 10:33:01 2010
X-Patchwork-Submitter: Uros Bizjak
X-Patchwork-Id: 65780
Date: Sun, 26 Sep 2010 12:33:01 +0200
Subject: [PATCH, i386]: Optimize DFmode signbit() for SSE math
From: Uros Bizjak
To: gcc-patches@gcc.gnu.org

Hello!

The attached patch optimizes the DFmode signbit() function for SSE math.

On 32-bit targets, generic code compiles

int test (double a)
{
  return signbit (a + 1.0);
}

to (-O2 -mfpmath=sse -msse2):

	movlpd	.LC0, %xmm0
	addsd	8(%ebp), %xmm0
	movsd	%xmm0, -8(%ebp)
	movl	-4(%ebp), %eax

This creates a store-forwarding (partial memory) stall: the 8-byte
movsd store is immediately followed by a 4-byte movl load of only its
upper half.

On 64-bit targets, generic code creates:

	addsd	.LC0(%rip), %xmm0
	movq	%xmm0, -8(%rsp)
	movq	-8(%rsp), %rax
	shrq	$63, %rax

shrq has high latency on AMD processors, and movq+shrq is always
slower on Intel (movq also introduces a bypass delay on i7, since it
operates in the "int" domain, whereas movmskpd operates in the
"float" domain).

The attached patch instead generates:

	movlpd	.LC0, %xmm0
	addsd	4(%esp), %xmm0
	movmskpd	%xmm0, %eax
	andl	$1, %eax

and

	addsd	.LC0(%rip), %xmm0
	movmskpd	%xmm0, %eax
	andl	$1, %eax

2010-09-24  Uros Bizjak

	* config/i386/i386.md (movmsk_df): New insn.
	(signbitdf): Split out of signbit<mode>2.  Generate movmsk_df
	sequence for TARGET_SSE_MATH.

The patch was bootstrapped and regression tested on
x86_64-pc-linux-gnu and committed to mainline SVN.

Uros.
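[Editorial aside, not part of the patch: the transformation relies on
signbit() of a double being exactly bit 63 of its IEEE-754 binary64
representation, which is the bit movmskpd packs into bit 0 of its
result for the low SSE lane.  A minimal standalone C check of that
equivalence, assuming IEEE-754 doubles:

#include <math.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Extract the IEEE-754 sign bit (bit 63) directly; this is the bit
   that movmskpd copies into bit 0 of its result for the low lane.  */
static int
signbit_via_bits (double a)
{
  uint64_t bits;
  memcpy (&bits, &a, sizeof bits);	/* type-pun without aliasing trouble */
  return (int) (bits >> 63);
}

int
main (void)
{
  static const double vals[] = { 1.0, -1.0, 0.0, -0.0, INFINITY, -INFINITY };
  unsigned int i;

  /* signbit() only promises a nonzero value for negative inputs,
     so normalize it with != 0 before comparing against bit 63.  */
  for (i = 0; i < sizeof vals / sizeof vals[0]; i++)
    printf ("%+g: signbit=%d bit63=%d\n", vals[i],
	    signbit (vals[i]) != 0, signbit_via_bits (vals[i]));
  return 0;
}
]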
Index: config/i386/i386.md
===================================================================
--- config/i386/i386.md	(revision 164628)
+++ config/i386/i386.md	(working copy)
@@ -14980,18 +14980,65 @@
   DONE;
 })
 
-(define_expand "signbit<mode>2"
+(define_expand "signbitxf2"
   [(use (match_operand:SI 0 "register_operand" ""))
-   (use (match_operand:X87MODEF 1 "register_operand" ""))]
+   (use (match_operand:XF 1 "register_operand" ""))]
+  "TARGET_USE_FANCY_MATH_387"
+{
+  rtx scratch = gen_reg_rtx (HImode);
+
+  emit_insn (gen_fxamxf2_i387 (scratch, operands[1]));
+  emit_insn (gen_andsi3 (operands[0],
+	     gen_lowpart (SImode, scratch), GEN_INT (0x200)));
+  DONE;
+})
+
+(define_insn "movmsk_df"
+  [(set (match_operand:SI 0 "register_operand" "=r")
+	(unspec:SI
+	  [(match_operand:DF 1 "register_operand" "x")]
+	  UNSPEC_MOVMSK))]
+  "SSE_FLOAT_MODE_P (DFmode) && TARGET_SSE_MATH"
+  "%vmovmskpd\t{%1, %0|%0, %1}"
+  [(set_attr "type" "ssemov")
+   (set_attr "prefix" "maybe_vex")
+   (set_attr "mode" "DF")])
+
+;; Use movmskpd in SSE mode to avoid store forwarding stall
+;; for 32bit targets and movq+shrq sequence for 64bit targets.
+(define_expand "signbitdf2"
+  [(use (match_operand:SI 0 "register_operand" ""))
+   (use (match_operand:DF 1 "register_operand" ""))]
   "TARGET_USE_FANCY_MATH_387
-   && !(SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH)"
+   || (SSE_FLOAT_MODE_P (DFmode) && TARGET_SSE_MATH)"
 {
-  rtx mask = GEN_INT (0x0200);
+  if (SSE_FLOAT_MODE_P (DFmode) && TARGET_SSE_MATH)
+    {
+      emit_insn (gen_movmsk_df (operands[0], operands[1]));
+      emit_insn (gen_andsi3 (operands[0], operands[0], const1_rtx));
+    }
+  else
+    {
+      rtx scratch = gen_reg_rtx (HImode);
+
+      emit_insn (gen_fxamdf2_i387 (scratch, operands[1]));
+      emit_insn (gen_andsi3 (operands[0],
+		 gen_lowpart (SImode, scratch), GEN_INT (0x200)));
+    }
+  DONE;
+})
+
+(define_expand "signbitsf2"
+  [(use (match_operand:SI 0 "register_operand" ""))
+   (use (match_operand:SF 1 "register_operand" ""))]
+  "TARGET_USE_FANCY_MATH_387
+   && !(SSE_FLOAT_MODE_P (SFmode) && TARGET_SSE_MATH)"
+{
   rtx scratch = gen_reg_rtx (HImode);
 
-  emit_insn (gen_fxam<mode>2_i387 (scratch, operands[1]));
-  emit_insn (gen_andsi3 (operands[0], gen_lowpart (SImode, scratch), mask));
+  emit_insn (gen_fxamsf2_i387 (scratch, operands[1]));
+  emit_insn (gen_andsi3 (operands[0],
+	     gen_lowpart (SImode, scratch), GEN_INT (0x200)));
   DONE;
 })
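[Editorial aside, not part of the patch: for reference, the
movmskpd-based sequence the new signbitdf2 expander emits corresponds
to the following SSE2 intrinsics sketch; _mm_set_sd places the scalar
in the low lane and _mm_movemask_pd packs the sign bits of both
double lanes into bits 1:0, so masking with 1 recovers signbit():

#include <emmintrin.h>

/* Intrinsics equivalent of the movmskpd+andl pair generated by the
   patched compiler for DFmode signbit() under SSE math.  */
static int
signbit_sse2 (double a)
{
  return _mm_movemask_pd (_mm_set_sd (a)) & 1;
}

Compiled with -O2 -msse2, this should reduce to essentially the same
movmskpd/andl sequence shown in the message above.]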