From patchwork Mon Nov 19 23:25:44 2012
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Michael Meissner <meissner@linux.vnet.ibm.com>
X-Patchwork-Id: 200216
Return-Path: 
 <gcc-patches-return-332305-incoming=patchwork.ozlabs.org@gcc.gnu.org>
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Received: from sourceware.org (server1.sourceware.org [209.132.180.131])
	by ozlabs.org (Postfix) with SMTP id 38DC92C0098
	for <incoming@patchwork.ozlabs.org>;
	Tue, 20 Nov 2012 10:26:44 +1100 (EST)
Comment: DKIM? See http://www.dkim.org
DKIM-Signature: v=1; a=rsa-sha1; c=relaxed/relaxed;
	d=gcc.gnu.org; s=default; x=1353972404; h=Comment:
	DomainKey-Signature:Received:Received:Received:Received:Received:
	Received:Received:Received:Received:Received:Date:From:To:
	Subject:Message-ID:Mail-Followup-To:MIME-Version:Content-Type:
	Content-Disposition:User-Agent:Mailing-List:Precedence:List-Id:
	List-Unsubscribe:List-Archive:List-Post:List-Help:Sender:
	Delivered-To; bh=7jT8MORd1ZQBObirt4Mxd97/sD0=; b=Z6tYqKrA4bixLy4
	tK5XayKbYcvFoR6LI9vZQbEnm2qtnY+Lc511q8l3OjZhNekbbHP3w5eWfiksR/6a
	lNx2WtBSWkL/9Y8EYp3Yuseo7CuSbwA3F6FMZG/Y90OpcoydpbaD3hbPSZo6OhQ1
	QUF9BwsU3JYcCT1GqzUKIxHV2FoU=
Comment: DomainKeys? See http://antispam.yahoo.com/domainkeys
DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=default; d=gcc.gnu.org;
	h=Received:Received:X-SWARE-Spam-Status:X-Spam-Check-By:Received:Received:Received:Received:Received:Received:Received:Received:Date:From:To:Subject:Message-ID:Mail-Followup-To:MIME-Version:Content-Type:Content-Disposition:User-Agent:X-Content-Scanned:x-cbid:X-IsSubscribed:Mailing-List:Precedence:List-Id:List-Unsubscribe:List-Archive:List-Post:List-Help:Sender:Delivered-To;
	b=P2StACiUfrafRZl/cLS9sM/7GVDHG/UC8CHVqkF7p7yeelvb0yxjTAaGAsxfjy
	t+yVyiIGMKIbhRdGo1nXZ3iHKv6pvZsfpqL5qedglKXJhUANCfjqDzAs1IVSCYrT
	PT5vbnKUsvBFiWztgge+BcklJPrcdzJVtuWMrfbCDoRrg=;
Received: (qmail 14294 invoked by alias); 19 Nov 2012 23:26:33 -0000
Received: (qmail 14268 invoked by uid 22791); 19 Nov 2012 23:26:33 -0000
X-SWARE-Spam-Status: No, hits=-3.8 required=5.0	tests=AWL, BAYES_00,
	KHOP_RCVD_UNTRUST, RCVD_IN_DNSWL_HI, RCVD_IN_HOSTKARMA_W,
	TW_XL
X-Spam-Check-By: sourceware.org
Received: from e32.co.us.ibm.com (HELO e32.co.us.ibm.com) (32.97.110.150) by
	sourceware.org (qpsmtpd/0.43rc1) with ESMTP;
	Mon, 19 Nov 2012 23:26:26 +0000
Received: from /spool/local	by e32.co.us.ibm.com with IBM ESMTP SMTP
	Gateway: Authorized Use Only! Violators will be
	prosecuted	for <gcc-patches@gcc.gnu.org> from
	<meissner@ibm-tiger.the-meissners.org>;
	Mon, 19 Nov 2012 16:26:25 -0700
Received: from d03dlp03.boulder.ibm.com (9.17.202.179)	by e32.co.us.ibm.com
	(192.168.1.132) with IBM ESMTP SMTP Gateway: Authorized Use
	Only! Violators will be prosecuted; Mon, 19 Nov 2012 16:25:48 -0700
Received: from d03relay04.boulder.ibm.com (d03relay04.boulder.ibm.com
	[9.17.195.106])	by d03dlp03.boulder.ibm.com (Postfix) with
	ESMTP id CAB3419D8036	for <gcc-patches@gcc.gnu.org>;
	Mon, 19 Nov 2012 16:25:47 -0700 (MST)
Received: from d03av01.boulder.ibm.com (d03av01.boulder.ibm.com
	[9.17.195.167])	by d03relay04.boulder.ibm.com
	(8.13.8/8.13.8/NCO v10.0) with ESMTP id qAJNPkQK380806	for
	<gcc-patches@gcc.gnu.org>; Mon, 19 Nov 2012 16:25:47 -0700
Received: from d03av01.boulder.ibm.com (loopback [127.0.0.1])	by
	d03av01.boulder.ibm.com (8.14.4/8.13.1/NCO v10.0 AVout) with
	ESMTP id qAJNPk18004652	for <gcc-patches@gcc.gnu.org>;
	Mon, 19 Nov 2012 16:25:46 -0700
Received: from ibm-tiger.the-meissners.org ([9.33.37.85])	by
	d03av01.boulder.ibm.com (8.14.4/8.13.1/NCO v10.0 AVin) with
	ESMTP id qAJNPk14004540; Mon, 19 Nov 2012 16:25:46 -0700
Received: by ibm-tiger.the-meissners.org (Postfix, from userid 500)	id
	A42C7425CB; Mon, 19 Nov 2012 18:25:44 -0500 (EST)
Date: Mon, 19 Nov 2012 18:25:44 -0500
From: Michael Meissner <meissner@linux.vnet.ibm.com>
To: gcc-patches@gcc.gnu.org, dje.gcc@gmail.com
Subject: [Patch] Slightly improve powerpc floating point handling
Message-ID: <20121119232544.GA24478@ibm-tiger.the-meissners.org>
Mail-Followup-To: Michael Meissner <meissner@linux.vnet.ibm.com>,
	gcc-patches@gcc.gnu.org, dje.gcc@gmail.com
MIME-Version: 1.0
Content-Disposition: inline
User-Agent: Mutt/1.5.20 (2009-12-10)
X-Content-Scanned: Fidelis XPS MAILER
x-cbid: 12111923-5406-0000-0000-0000022FE37A
X-IsSubscribed: yes
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Unsubscribe: 
 <mailto:gcc-patches-unsubscribe-incoming=patchwork.ozlabs.org@gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Delivered-To: mailing list gcc-patches@gcc.gnu.org

I am working on support for a future processor, and I noticed that when I did
the initial power7 work in 2009, I ordered the DF moves so that the VSX moves
came before the traditional floating point moves.

If reload needs to reload a floating point register, it will match first on the
VSX instructions and generate LXSDX or STXSDX instead of the traditional
LFD/LFDX and STFD/STFDX instructions.  Because the LXSDX/STXSDX instructions
only support REG+REG addressing, reload needs to generate the stack offset in
a GPR and use that register as the index.  Note that for normal loads/stores,
the register allocator will see if there are other options, and eventually
match against the traditional floating point load and store.  Reload, however,
seems to stop as soon as it finds an appropriate instruction.

The following patch reorders the movdf patterns so that first the traditional
floating point registers are considered, then the VSX registers, and finally
the general purpose registers.  I have bootstrapped the compiler with these
changes, and had no regressions in the testsuite.
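
As a rough illustration of the effect described above, here is a toy Python
model.  It is not GCC's reload code: the Alt record, the first_match and
emit_stack_access helpers, the register names, and the reduced alternative
lists are all made up for this sketch, and it assumes the "take the first
alternative that fits" behavior that reload appears to show.

```python
from typing import List, NamedTuple


class Alt(NamedTuple):
    """One simplified alternative of a move pattern (hypothetical model)."""
    mnemonic: str
    kind: str            # "load", "store", or "move"
    reg_offset_ok: bool  # True if the memory operand may be REG+OFFSET


# Reduced versions of the FP-side alternatives, in pattern order.
OLD_ORDER = [
    Alt("xxlor", "move", False),
    Alt("lxsdx", "load", False),
    Alt("stxsdx", "store", False),
    Alt("stfd", "store", True),
    Alt("lfd", "load", True),
    Alt("fmr", "move", False),
]
NEW_ORDER = [
    Alt("stfd", "store", True),
    Alt("lfd", "load", True),
    Alt("fmr", "move", False),
    Alt("lxsdx", "load", False),
    Alt("stxsdx", "store", False),
    Alt("xxlor", "move", False),
]


def first_match(alternatives: List[Alt], kind: str) -> Alt:
    """Return the first alternative of the requested kind (toy 'reload')."""
    for alt in alternatives:
        if alt.kind == kind:
            return alt
    raise ValueError(f"no alternative for {kind!r}")


def emit_stack_access(alternatives: List[Alt], kind: str, offset: int) -> List[str]:
    """Emit pseudo-assembly for a stack slot access at r1+offset."""
    alt = first_match(alternatives, kind)
    if alt.reg_offset_ok:
        # Traditional FP load/store: the offset fits in the instruction.
        return [f"{alt.mnemonic} f1,{offset}(r1)"]
    # REG+REG only: the offset must first be put into a GPR.
    return [f"li r9,{offset}", f"{alt.mnemonic} vs1,r1,r9"]


print(emit_stack_access(OLD_ORDER, "load", 16))
print(emit_stack_access(NEW_ORDER, "load", 16))
```

In this model the old ordering yields two instructions for a spill load (an
LI to form the offset, then LXSDX), while the new ordering yields a single LFD
with the offset encoded in the instruction.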

I also ran the SPEC 2006 benchmark suite with and without these patches (using
subversion id 193503 as the base).  There were no slowdowns outside of the
range that I consider to be noise level (2%).  The 447.dealII benchmark sped
up by 14% (456.hmmer and 471.omnetpp sped up by 2%).

Are these patches ok to apply?

2012-11-19  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* config/rs6000/rs6000.md (movdf_hardfloat32): Reorder move
	constraints so that the traditional floating point loads, stores,
	and moves are done first, then the VSX loads, stores, and moves,
	and finally the GPR loads, stores, and moves so that reload
	chooses FPRs over GPRs, and uses the traditional load/store
	instructions which provide an offset.
	(movdf_hardfloat64): Likewise.

Index: gcc/config/rs6000/rs6000.md
===================================================================
--- gcc/config/rs6000/rs6000.md	(revision 193635)
+++ gcc/config/rs6000/rs6000.md	(working copy)
@@ -8019,46 +8019,30 @@ (define_split
 ;; less efficient than loading the constant into an FP register, since
 ;; it will probably be used there.
 (define_insn "*movdf_hardfloat32"
-  [(set (match_operand:DF 0 "nonimmediate_operand" "=Y,r,!r,ws,?wa,ws,?wa,Z,?Z,m,d,d,wa,!r,!r,!r")
-	(match_operand:DF 1 "input_operand" "r,Y,r,ws,wa,Z,Z,ws,wa,d,m,d,j,G,H,F"))]
+  [(set (match_operand:DF 0 "nonimmediate_operand" "=m,d,d,ws,?wa,Z,?Z,ws,?wa,wa,Y,r,!r,!r,!r,!r")
+	(match_operand:DF 1 "input_operand" "d,m,d,Z,Z,ws,wa,ws,wa,j,r,Y,r,G,H,F"))]
   "! TARGET_POWERPC64 && TARGET_HARD_FLOAT && TARGET_FPRS && TARGET_DOUBLE_FLOAT 
    && (gpc_reg_operand (operands[0], DFmode)
        || gpc_reg_operand (operands[1], DFmode))"
-  "*
-{
-  switch (which_alternative)
-    {
-    default:
-      gcc_unreachable ();
-    case 0:
-    case 1:
-    case 2:
-      return \"#\";
-    case 3:
-    case 4:
-      return \"xxlor %x0,%x1,%x1\";
-    case 5:
-    case 6:
-      return \"lxsd%U1x %x0,%y1\";
-    case 7:
-    case 8:
-      return \"stxsd%U0x %x1,%y0\";
-    case 9:
-      return \"stfd%U0%X0 %1,%0\";
-    case 10:
-      return \"lfd%U1%X1 %0,%1\";
-    case 11:
-      return \"fmr %0,%1\";
-    case 12:
-      return \"xxlxor %x0,%x0,%x0\";
-    case 13:
-    case 14:
-    case 15:
-      return \"#\";
-    }
-}"
-  [(set_attr "type" "store,load,two,fp,fp,fpload,fpload,fpstore,fpstore,fpstore,fpload,fp,vecsimple,*,*,*")
-   (set_attr "length" "8,8,8,4,4,4,4,4,4,4,4,4,4,8,12,16")])
+  "@
+   stfd%U0%X0 %1,%0
+   lfd%U1%X1 %0,%1
+   fmr %0,%1
+   lxsd%U1x %x0,%y1
+   lxsd%U1x %x0,%y1
+   stxsd%U0x %x1,%y0
+   stxsd%U0x %x1,%y0
+   xxlor %x0,%x1,%x1
+   xxlor %x0,%x1,%x1
+   xxlxor %x0,%x0,%x0
+   #
+   #
+   #
+   #
+   #
+   #"
+  [(set_attr "type" "fpstore,fpload,fp,fpload,fpload,fpstore,fpstore,vecsimple,vecsimple,vecsimple,store,load,two,fp,fp,*")
+   (set_attr "length" "4,4,4,4,4,4,4,4,4,4,8,8,8,8,12,16")])
 
 (define_insn "*movdf_softfloat32"
   [(set (match_operand:DF 0 "nonimmediate_operand" "=Y,r,r,r,r,r")
@@ -8131,25 +8115,25 @@ (define_insn "*movdf_hardfloat64_mfpgpr"
 ; ld/std require word-aligned displacements -> 'Y' constraint.
 ; List Y->r and r->Y before r->r for reload.
 (define_insn "*movdf_hardfloat64"
-  [(set (match_operand:DF 0 "nonimmediate_operand" "=Y,r,!r,ws,?wa,ws,?wa,Z,?Z,m,d,d,wa,*c*l,!r,*h,!r,!r,!r")
-	(match_operand:DF 1 "input_operand" "r,Y,r,ws,wa,Z,Z,ws,wa,d,m,d,j,r,h,0,G,H,F"))]
+  [(set (match_operand:DF 0 "nonimmediate_operand" "=m,d,d,Y,r,!r,ws,?wa,Z,?Z,ws,?wa,wa,*c*l,!r,*h,!r,!r,!r")
+	(match_operand:DF 1 "input_operand" "d,m,d,r,Y,r,Z,Z,ws,wa,ws,wa,j,r,h,0,G,H,F"))]
   "TARGET_POWERPC64 && !TARGET_MFPGPR && TARGET_HARD_FLOAT && TARGET_FPRS 
    && TARGET_DOUBLE_FLOAT
    && (gpc_reg_operand (operands[0], DFmode)
        || gpc_reg_operand (operands[1], DFmode))"
   "@
+   stfd%U0%X0 %1,%0
+   lfd%U1%X1 %0,%1
+   fmr %0,%1
    std%U0%X0 %1,%0
    ld%U1%X1 %0,%1
    mr %0,%1
-   xxlor %x0,%x1,%x1
-   xxlor %x0,%x1,%x1
    lxsd%U1x %x0,%y1
    lxsd%U1x %x0,%y1
    stxsd%U0x %x1,%y0
    stxsd%U0x %x1,%y0
-   stfd%U0%X0 %1,%0
-   lfd%U1%X1 %0,%1
-   fmr %0,%1
+   xxlor %x0,%x1,%x1
+   xxlor %x0,%x1,%x1
    xxlxor %x0,%x0,%x0
    mt%0 %1
    mf%1 %0
@@ -8157,7 +8141,7 @@ (define_insn "*movdf_hardfloat64"
    #
    #
    #"
-  [(set_attr "type" "store,load,*,fp,fp,fpload,fpload,fpstore,fpstore,fpstore,fpload,fp,vecsimple,mtjmpr,mfjmpr,*,*,*,*")
+  [(set_attr "type" "fpstore,fpload,fp,store,load,*,fpload,fpload,fpstore,fpstore,vecsimple,vecsimple,vecsimple,mtjmpr,mfjmpr,*,*,*,*")
    (set_attr "length" "4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,8,12,16")])
 
 (define_insn "*movdf_softfloat64"