From patchwork Mon Feb 15 12:36:53 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Alan Modra <amodra@gmail.com>
X-Patchwork-Id: 582870
Return-Path: 
 <gcc-patches-return-421431-incoming=patchwork.ozlabs.org@gcc.gnu.org>
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Received: from sourceware.org (server1.sourceware.org [209.132.180.131])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256
	bits)) (No client certificate requested)
	by ozlabs.org (Postfix) with ESMTPS id 5F7C11402EC
	for <incoming@patchwork.ozlabs.org>;
	Mon, 15 Feb 2016 23:37:12 +1100 (AEDT)
Authentication-Results: ozlabs.org; dkim=pass (1024-bit key;
	unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org
	header.b=ga1qzaWp; dkim-atps=neutral
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id
	:list-unsubscribe:list-archive:list-post:list-help:sender:date
	:from:to:cc:subject:message-id:references:mime-version
	:content-type:in-reply-to; q=dns; s=default; b=s6vvoj/BvSoT6nR0W
	kAfoPj90YzF5uOTxXrwxUyVVfevmS4csIFswnuXjhynnqdWTZ5TRW0pTeP0YlwNK
	reoX0pm46Pyh2G/qzag88XymqSqBPIsnp2OZuHRlX8HPd54IPY7mazH3gnAqYIKG
	dKgPKPijt43oJGq3I9HwvSzi/s=
DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id
	:list-unsubscribe:list-archive:list-post:list-help:sender:date
	:from:to:cc:subject:message-id:references:mime-version
	:content-type:in-reply-to; s=default; bh=tH4DyqqVpfUJQX2FLn7z3or
	BebY=; b=ga1qzaWphEamy6BjNEk2jHizRyox2CmFW9VEjwKa0fVTkq8yR1XzZPy
	Za3Vp0tybRxXI1Zq1UUxqcjYeKhrrIX9rmWwRASxo8cUwLLos0Fqx4i59q1OnL93
	WA9qYNnyNqtfqFbW8ymLp7Nq2AKTsKBmDImCoF/s40UHk8xGVCJs=
Received: (qmail 2852 invoked by alias); 15 Feb 2016 12:37:05 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Unsubscribe: 
 <mailto:gcc-patches-unsubscribe-incoming=patchwork.ozlabs.org@gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Delivered-To: mailing list gcc-patches@gcc.gnu.org
Received: (qmail 2790 invoked by uid 89); 15 Feb 2016 12:37:04 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-2.4 required=5.0 tests=AWL, BAYES_00,
	FREEMAIL_FROM, RCVD_IN_DNSWL_LOW,
	SPF_PASS autolearn=ham version=3.3.2 spammy=sk:!target,
	sk:TARGET, sk:!TARGET, Similarly
X-HELO: mail-pf0-f171.google.com
Received: from mail-pf0-f171.google.com (HELO mail-pf0-f171.google.com)
	(209.85.192.171) by sourceware.org
	(qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-GCM-SHA256
	encrypted) ESMTPS; Mon, 15 Feb 2016 12:37:01 +0000
Received: by mail-pf0-f171.google.com with SMTP id e127so87162870pfe.3 for
	<gcc-patches@gcc.gnu.org>; Mon, 15 Feb 2016 04:37:01 -0800 (PST)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net;
	s=20130820;
	h=x-gm-message-state:date:from:to:cc:subject:message-id:references
	:mime-version:content-type:content-disposition:in-reply-to
	:user-agent; bh=EqD9TKvNbuwHxvfTbq3rehBHweiyY12gfD4R9oaAjxE=;
	b=LFN4lcWgqg6eXUoDIJzX/7Sv8Qvaivof7RXE47b6b2B2SBjXmZbl/xaq05xOp87Z4P
	p9V60jfEB/x29TTZ5VYrfbNomzvQwINJd+fnzLZO7QI11dTRLnMR0DHUycSlDMyC3YAj
	sL8OMXu82UNEkkQgqf1lG7okwlFl0ROd0tHZ3hJ+Q0pI6/pfUf52FeWn3F79boK4zL8g
	GFLfd9yi8IvsHNwhl/i3h71z76KuCuZrPIVcKBWU266nZ14OI58qTUwQ5g+rMGed3Mhq
	kmqZV3tTTXWum9tCn+DO4vW8VVpflOReHI16N7mYWfdfJSf70CFY+t9pJE8NAlGx4fIE
	IRLA==
X-Gm-Message-State: 
 AG10YOTcgQdhdmZZN1WKv4zWBSw8U2Iggg51iB97yhPf35kxOuPDTYREkFEeWoUlWZP2EQ==
X-Received: by 10.98.1.197 with SMTP id 188mr23132622pfb.8.1455539819549;
	Mon, 15 Feb 2016 04:36:59 -0800 (PST)
Received: from bubble.grove.modra.org (CPE-58-160-146-233.sa.bigpond.net.au.
	[58.160.146.233]) by smtp.gmail.com with ESMTPSA id
	u64sm38415507pfa.86.2016.02.15.04.36.58 (version=TLS1_2
	cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
	Mon, 15 Feb 2016 04:36:58 -0800 (PST)
Received: by bubble.grove.modra.org (Postfix, from userid 1000)	id
	08BF1EA0157; Mon, 15 Feb 2016 23:06:53 +1030 (ACDT)
Date: Mon, 15 Feb 2016 23:06:53 +1030
From: Alan Modra <amodra@gmail.com>
To: Ulrich Weigand <uweigand@de.ibm.com>
Cc: Michael Meissner <meissner@linux.vnet.ibm.com>,
	David Edelsohn <dje.gcc@gmail.com>,
	GCC Patches <gcc-patches@gcc.gnu.org>,
	Michael R Meissner <mrmeissn@us.ibm.com>
Subject: Re: [RS6000] reload_vsx_from_gprsf splitter
Message-ID: <20160215123653.GG10888@bubble.grove.modra.org>
References: <20160211215302.GB22837@ibm-tiger.the-meissners.org>
	<20160212135722.AD6DD6B9A@oc7340732750.ibm.com>
MIME-Version: 1.0
Content-Disposition: inline
In-Reply-To: <20160212135722.AD6DD6B9A@oc7340732750.ibm.com>
User-Agent: Mutt/1.5.23 (2014-03-12)
X-IsSubscribed: yes

On Fri, Feb 12, 2016 at 02:57:22PM +0100, Ulrich Weigand wrote:
> > On Fri, Feb 12, 2016 at 08:54:19AM +1030, Alan Modra wrote:
> > > Another concern I had about this, besides using %L in asm output (what
> > > forces TFmode to use just fprs?), is what happens when we're using
> > > IEEE 128-bit floats?  In that case it looks like we'd get just one reg.
> > 
> > Good point that it breaks if the default long double (TFmode) type is IEEE
> > 128-bit floating point.  We would need to have two patterns, one that uses
> > TFmode and one that uses IFmode.  I wrote the power8 direct move stuff before
> > going down the road of IEEE 128-bit floating point.
> 
> Right.  It's a bit unfortunate that we can't just use IFmode unconditionally,
> but it seems rs6000_scalar_mode_supported_p (IFmode) may return false, and
> then we probably shouldn't be using it.

Actually, we can use IFmode unconditionally.  scalar_mode_supported_p
is relevant only up to and including expand.  Nothing prevents the
backend from using IFmode.

> Another option might be to use TDmode to allocate a scratch register pair.

That won't work, at least if we want to extract the two component regs
with simplify_gen_subreg, due to rs6000_cannot_change_mode_class.  In
my original patch I just extracted the regs by using gen_rtx_REG but I
changed that, based on your criticism of using gen_rtx_REG in
reload_vsx_from_gprsf, and because rs6000.md avoids gen_rtx_REG using
operand regnos in other places.  That particular change is of course
entirely cosmetic.  I also changed reload_vsx_from_gprsf to avoid mode
punning regs, instead duplicating insn patterns as done elsewhere in
the vsx support.  I don't believe we will see subregs of vsx or fp
regs after reload, but I'm quite willing to concede the point for a
stage4 fix.
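
For readers following along, here is a rough C model of what the
128-bit split in reload_vsx_from_gpr<mode> does with the two DFmode
halves of the IFmode clobber.  It is a sketch only, not GCC code: the
vsx_reg and gpr_pair types and the model_* names are made up for
illustration.  It assumes mtvsrd writes doubleword 0 of the target and
leaves doubleword 1 undefined, and that xxpermdi with a final operand
of 0 selects doubleword 0 from each source.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model types, not GCC data structures.  */
typedef struct { uint64_t dw[2]; } vsx_reg;   /* dw[0] is doubleword 0 */
typedef struct { uint64_t hi, lo; } gpr_pair; /* 128-bit value in two GPRs */

/* mtvsrd: copy a 64-bit GPR into doubleword 0.  Doubleword 1 is
   undefined after the instruction, modelled here with a poison value.  */
static vsx_reg
model_mtvsrd (uint64_t gpr)
{
  vsx_reg r = { { gpr, 0xDEADBEEFDEADBEEFULL } };
  return r;
}

/* xxpermdi xt,xa,xb,0: result is { xa.dw[0], xb.dw[0] }.  */
static vsx_reg
model_xxpermdi_0 (vsx_reg xa, vsx_reg xb)
{
  vsx_reg r = { { xa.dw[0], xb.dw[0] } };
  return r;
}

/* The three-insn sequence: tmp_hi and tmp_lo stand for the two DFmode
   halves (byte offsets 0 and 8) of the IFmode scratch.  */
static vsx_reg
model_reload_vsx_from_gpr (gpr_pair src)
{
  vsx_reg tmp_hi = model_mtvsrd (src.hi);
  vsx_reg tmp_lo = model_mtvsrd (src.lo);
  vsx_reg dest = model_xxpermdi_0 (tmp_hi, tmp_lo);

  /* The poison in doubleword 1 of the temps never reaches dest.  */
  assert (dest.dw[0] == src.hi && dest.dw[1] == src.lo);
  return dest;
}
```

Calling model_reload_vsx_from_gpr on a pair { hi, lo } returns a value
whose dw[0] is hi and whose dw[1] is lo, which is why two independent
temps are enough and neither needs to be a 128-bit mode.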

Here's the revised patch.  To recap, the main bug fixes here are:
- stop reload_vsx_from_gprsf splitter from emitting a move not
handled by movdi_internal64
- don't use TFmode, which can no longer be assumed to be IBM
double-double.
Secondary to that, not using or passing around TFmode means the %L
restriction no longer matters, and constraints on the reload temp reg
can be relaxed.
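
As an illustration of the first fix, the data flow of the revised
reload_vsx_from_gprsf split can be modelled in plain C.  This is only
a sketch under stated assumptions: the function names are invented,
the SFmode bit pattern is assumed to sit in the low 32 bits of the
source GPR, and NaN handling (xscvspdpn is non-signalling) is not
modelled.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed starting point: the float's bit pattern in the low 32 bits
   of a 64-bit GPR.  */
static uint64_t
gpr_from_float (float f)
{
  uint32_t bits;
  assert (sizeof f == sizeof bits);
  memcpy (&bits, &f, sizeof bits);
  return bits;
}

/* Step 1, ashldi3 by 32: move the SF bits to the upper word.  */
static uint64_t
shift_to_upper_word (uint64_t gpr)
{
  return gpr << 32;
}

/* Step 2, mtvsrd (p8_mtvsrd_sf): the GPR becomes doubleword 0 of the
   VSX register.  Only doubleword 0 is modelled.  */
static uint64_t
model_mtvsrd_dw0 (uint64_t gpr)
{
  return gpr;
}

/* Step 3, xscvspdpn: read word 0 (the upper 32 bits of doubleword 0)
   as single precision and widen it to double precision.  */
static double
model_xscvspdpn (uint64_t dw0)
{
  uint32_t word0 = (uint32_t) (dw0 >> 32);
  float f;
  memcpy (&f, &word0, sizeof f);
  return (double) f;
}

/* The whole sequence emitted by the splitter, in order.  */
static double
model_reload_vsx_from_gprsf (float f)
{
  uint64_t op2 = shift_to_upper_word (gpr_from_float (f));
  uint64_t op0 = model_mtvsrd_dw0 (op2);
  return model_xscvspdpn (op0);
}
```

Feeding 1.5f through model_reload_vsx_from_gprsf yields 1.5 as a
double, matching the scalar double-precision format the VSX register
is expected to hold for an SFmode value.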

Bootstrapped and regression tested on powerpc64-linux biarch and
powerpc64le-linux.  OK, David?

	PR target/68973
	* config/rs6000/rs6000.md (reload_vsx_from_gprsf): Use p8_mtvsrd_sf
	rather than attempting to use movdi_internal64.  Remove op0_di.
	(p8_mtvsrd_df, p8_mtvsrd_sf): New.
	(p8_mtvsrd_1, p8_mtvsrd_2): Delete.
	(p8_mtvsrwz): New.
	(p8_mtvsrwz_1, p8_mtvsrwz_2): Delete.
	(p8_xxpermdi_<mode>): Take two DF inputs rather than one TF.
	(p8_fmrgow_<mode>): Likewise.
	(reload_vsx_from_gpr<mode>): Make clobber IF.  Adjust for above
	changes.
	(reload_fpr_from_gpr<mode>): Similarly.  Use "d" for op0 constraint.
	* config/rs6000/vsx.md (vsx_xscvspdpn_directmove): Make op1 SFmode.

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index cdbf873..ec356cb 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -7488,41 +7488,31 @@
 ;; value, since it is allocated in reload and not all of the flow information
 ;; is setup for it.  We have two patterns to do the two moves between gprs and
 ;; fprs.  There isn't a dependancy between the two, but we could potentially
-;; schedule other instructions between the two instructions.  TFmode is
-;; currently limited to traditional FPR registers.  If/when this is changed, we
-;; will need to revist %L to make sure it works with VSX registers, or add an
-;; %x version of %L.
+;; schedule other instructions between the two instructions.
 
 (define_insn "p8_fmrgow_<mode>"
   [(set (match_operand:FMOVE64X 0 "register_operand" "=d")
-	(unspec:FMOVE64X [(match_operand:TF 1 "register_operand" "d")]
+	(unspec:FMOVE64X [
+		(match_operand:DF 1 "register_operand" "d")
+		(match_operand:DF 2 "register_operand" "d")]
 			 UNSPEC_P8V_FMRGOW))]
   "!TARGET_POWERPC64 && TARGET_DIRECT_MOVE"
-  "fmrgow %0,%1,%L1"
+  "fmrgow %0,%1,%2"
   [(set_attr "type" "vecperm")])
 
-(define_insn "p8_mtvsrwz_1"
-  [(set (match_operand:TF 0 "register_operand" "=d")
-	(unspec:TF [(match_operand:SI 1 "register_operand" "r")]
+(define_insn "p8_mtvsrwz"
+  [(set (match_operand:DF 0 "register_operand" "=d")
+	(unspec:DF [(match_operand:SI 1 "register_operand" "r")]
 		   UNSPEC_P8V_MTVSRWZ))]
   "!TARGET_POWERPC64 && TARGET_DIRECT_MOVE"
   "mtvsrwz %x0,%1"
   [(set_attr "type" "mftgpr")])
 
-(define_insn "p8_mtvsrwz_2"
-  [(set (match_operand:TF 0 "register_operand" "+d")
-	(unspec:TF [(match_dup 0)
-		    (match_operand:SI 1 "register_operand" "r")]
-		   UNSPEC_P8V_MTVSRWZ))]
-  "!TARGET_POWERPC64 && TARGET_DIRECT_MOVE"
-  "mtvsrwz %L0,%1"
-  [(set_attr "type" "mftgpr")])
-
 (define_insn_and_split "reload_fpr_from_gpr<mode>"
-  [(set (match_operand:FMOVE64X 0 "register_operand" "=ws")
+  [(set (match_operand:FMOVE64X 0 "register_operand" "=d")
 	(unspec:FMOVE64X [(match_operand:FMOVE64X 1 "register_operand" "r")]
 			 UNSPEC_P8V_RELOAD_FROM_GPR))
-   (clobber (match_operand:TF 2 "register_operand" "=d"))]
+   (clobber (match_operand:IF 2 "register_operand" "=d"))]
   "!TARGET_POWERPC64 && TARGET_DIRECT_MOVE"
   "#"
   "&& reload_completed"
@@ -7530,42 +7520,36 @@
 {
   rtx dest = operands[0];
   rtx src = operands[1];
-  rtx tmp = operands[2];
+  rtx tmp_hi = simplify_gen_subreg (DFmode, operands[2], IFmode, 0);
+  rtx tmp_lo = simplify_gen_subreg (DFmode, operands[2], IFmode, 8);
   rtx gpr_hi_reg = gen_highpart (SImode, src);
   rtx gpr_lo_reg = gen_lowpart (SImode, src);
 
-  emit_insn (gen_p8_mtvsrwz_1 (tmp, gpr_hi_reg));
-  emit_insn (gen_p8_mtvsrwz_2 (tmp, gpr_lo_reg));
-  emit_insn (gen_p8_fmrgow_<mode> (dest, tmp));
+  emit_insn (gen_p8_mtvsrwz (tmp_hi, gpr_hi_reg));
+  emit_insn (gen_p8_mtvsrwz (tmp_lo, gpr_lo_reg));
+  emit_insn (gen_p8_fmrgow_<mode> (dest, tmp_hi, tmp_lo));
   DONE;
 }
   [(set_attr "length" "12")
    (set_attr "type" "three")])
 
 ;; Move 128 bit values from GPRs to VSX registers in 64-bit mode
-(define_insn "p8_mtvsrd_1"
-  [(set (match_operand:TF 0 "register_operand" "=ws")
-	(unspec:TF [(match_operand:DI 1 "register_operand" "r")]
-		   UNSPEC_P8V_MTVSRD))]
-  "TARGET_POWERPC64 && TARGET_DIRECT_MOVE"
-  "mtvsrd %0,%1"
-  [(set_attr "type" "mftgpr")])
-
-(define_insn "p8_mtvsrd_2"
-  [(set (match_operand:TF 0 "register_operand" "+ws")
-	(unspec:TF [(match_dup 0)
-		    (match_operand:DI 1 "register_operand" "r")]
+(define_insn "p8_mtvsrd_df"
+  [(set (match_operand:DF 0 "register_operand" "=wa")
+	(unspec:DF [(match_operand:DI 1 "register_operand" "r")]
 		   UNSPEC_P8V_MTVSRD))]
   "TARGET_POWERPC64 && TARGET_DIRECT_MOVE"
-  "mtvsrd %L0,%1"
+  "mtvsrd %x0,%1"
   [(set_attr "type" "mftgpr")])
 
 (define_insn "p8_xxpermdi_<mode>"
   [(set (match_operand:FMOVE128_GPR 0 "register_operand" "=wa")
-	(unspec:FMOVE128_GPR [(match_operand:TF 1 "register_operand" "ws")]
-			     UNSPEC_P8V_XXPERMDI))]
+	(unspec:FMOVE128_GPR [
+		(match_operand:DF 1 "register_operand" "wa")
+		(match_operand:DF 2 "register_operand" "wa")]
+		UNSPEC_P8V_XXPERMDI))]
   "TARGET_POWERPC64 && TARGET_DIRECT_MOVE"
-  "xxpermdi %x0,%1,%L1,0"
+  "xxpermdi %x0,%x1,%x2,0"
   [(set_attr "type" "vecperm")])
 
 (define_insn_and_split "reload_vsx_from_gpr<mode>"
@@ -7573,7 +7557,7 @@
 	(unspec:FMOVE128_GPR
 	 [(match_operand:FMOVE128_GPR 1 "register_operand" "r")]
 	 UNSPEC_P8V_RELOAD_FROM_GPR))
-   (clobber (match_operand:TF 2 "register_operand" "=ws"))]
+   (clobber (match_operand:IF 2 "register_operand" "=wa"))]
   "TARGET_POWERPC64 && TARGET_DIRECT_MOVE"
   "#"
   "&& reload_completed"
@@ -7581,13 +7565,17 @@
 {
   rtx dest = operands[0];
   rtx src = operands[1];
-  rtx tmp = operands[2];
+  /* You might think that we could use op0 as one temp and a DF clobber
+     as op2, but you'd be wrong.  Secondary reload move patterns don't
+     check for overlap of the clobber and the destination.  */
+  rtx tmp_hi = simplify_gen_subreg (DFmode, operands[2], IFmode, 0);
+  rtx tmp_lo = simplify_gen_subreg (DFmode, operands[2], IFmode, 8);
   rtx gpr_hi_reg = gen_highpart (DImode, src);
   rtx gpr_lo_reg = gen_lowpart (DImode, src);
 
-  emit_insn (gen_p8_mtvsrd_1 (tmp, gpr_hi_reg));
-  emit_insn (gen_p8_mtvsrd_2 (tmp, gpr_lo_reg));
-  emit_insn (gen_p8_xxpermdi_<mode> (dest, tmp));
+  emit_insn (gen_p8_mtvsrd_df (tmp_hi, gpr_hi_reg));
+  emit_insn (gen_p8_mtvsrd_df (tmp_lo, gpr_lo_reg));
+  emit_insn (gen_p8_xxpermdi_<mode> (dest, tmp_hi, tmp_lo));
   DONE;
 }
   [(set_attr "length" "12")
@@ -7608,6 +7596,13 @@
 ;; Move SFmode to a VSX from a GPR register.  Because scalar floating point
 ;; type is stored internally as double precision in the VSX registers, we have
 ;; to convert it from the vector format.
+(define_insn "p8_mtvsrd_sf"
+  [(set (match_operand:SF 0 "register_operand" "=wa")
+	(unspec:SF [(match_operand:DI 1 "register_operand" "r")]
+		   UNSPEC_P8V_MTVSRD))]
+  "TARGET_POWERPC64 && TARGET_DIRECT_MOVE"
+  "mtvsrd %x0,%1"
+  [(set_attr "type" "mftgpr")])
 
 (define_insn_and_split "reload_vsx_from_gprsf"
   [(set (match_operand:SF 0 "register_operand" "=wa")
@@ -7622,16 +7617,12 @@
   rtx op0 = operands[0];
   rtx op1 = operands[1];
   rtx op2 = operands[2];
-  /* Also use the destination register to hold the unconverted DImode value.
-     This is conceptually a separate value from OP0, so we use gen_rtx_REG
-     rather than simplify_gen_subreg.  */
-  rtx op0_di = gen_rtx_REG (DImode, REGNO (op0));
   rtx op1_di = simplify_gen_subreg (DImode, op1, SFmode, 0);
 
   /* Move SF value to upper 32-bits for xscvspdpn.  */
   emit_insn (gen_ashldi3 (op2, op1_di, GEN_INT (32)));
-  emit_move_insn (op0_di, op2);
-  emit_insn (gen_vsx_xscvspdpn_directmove (op0, op0_di));
+  emit_insn (gen_p8_mtvsrd_sf (op0, op2));
+  emit_insn (gen_vsx_xscvspdpn_directmove (op0, op0));
   DONE;
 }
   [(set_attr "length" "8")
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 997ff31..45af233 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -1521,7 +1521,7 @@
 ;; Used by direct move to move a SFmode value from GPR to VSX register
 (define_insn "vsx_xscvspdpn_directmove"
   [(set (match_operand:SF 0 "vsx_register_operand" "=wa")
-	(unspec:SF [(match_operand:DI 1 "vsx_register_operand" "wa")]
+	(unspec:SF [(match_operand:SF 1 "vsx_register_operand" "wa")]
 		   UNSPEC_VSX_CVSPDPN))]
   "TARGET_XSCVSPDPN"
   "xscvspdpn %x0,%x1"