From patchwork Mon Feb 15 12:36:53 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alan Modra X-Patchwork-Id: 582870 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 5F7C11402EC for ; Mon, 15 Feb 2016 23:37:12 +1100 (AEDT) Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.b=ga1qzaWp; dkim-atps=neutral DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:date :from:to:cc:subject:message-id:references:mime-version :content-type:in-reply-to; q=dns; s=default; b=s6vvoj/BvSoT6nR0W kAfoPj90YzF5uOTxXrwxUyVVfevmS4csIFswnuXjhynnqdWTZ5TRW0pTeP0YlwNK reoX0pm46Pyh2G/qzag88XymqSqBPIsnp2OZuHRlX8HPd54IPY7mazH3gnAqYIKG dKgPKPijt43oJGq3I9HwvSzi/s= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:date :from:to:cc:subject:message-id:references:mime-version :content-type:in-reply-to; s=default; bh=tH4DyqqVpfUJQX2FLn7z3or BebY=; b=ga1qzaWphEamy6BjNEk2jHizRyox2CmFW9VEjwKa0fVTkq8yR1XzZPy Za3Vp0tybRxXI1Zq1UUxqcjYeKhrrIX9rmWwRASxo8cUwLLos0Fqx4i59q1OnL93 WA9qYNnyNqtfqFbW8ymLp7Nq2AKTsKBmDImCoF/s40UHk8xGVCJs= Received: (qmail 2852 invoked by alias); 15 Feb 2016 12:37:05 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 2790 invoked by uid 89); 15 Feb 2016 12:37:04 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-2.4 required=5.0 tests=AWL, BAYES_00, FREEMAIL_FROM, RCVD_IN_DNSWL_LOW, SPF_PASS autolearn=ham version=3.3.2 spammy=sk:!target, sk:TARGET, sk:!TARGET, Similarly X-HELO: mail-pf0-f171.google.com Received: from mail-pf0-f171.google.com (HELO mail-pf0-f171.google.com) (209.85.192.171) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-GCM-SHA256 encrypted) ESMTPS; Mon, 15 Feb 2016 12:37:01 +0000 Received: by mail-pf0-f171.google.com with SMTP id e127so87162870pfe.3 for ; Mon, 15 Feb 2016 04:37:01 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-type:content-disposition:in-reply-to :user-agent; bh=EqD9TKvNbuwHxvfTbq3rehBHweiyY12gfD4R9oaAjxE=; b=LFN4lcWgqg6eXUoDIJzX/7Sv8Qvaivof7RXE47b6b2B2SBjXmZbl/xaq05xOp87Z4P p9V60jfEB/x29TTZ5VYrfbNomzvQwINJd+fnzLZO7QI11dTRLnMR0DHUycSlDMyC3YAj sL8OMXu82UNEkkQgqf1lG7okwlFl0ROd0tHZ3hJ+Q0pI6/pfUf52FeWn3F79boK4zL8g GFLfd9yi8IvsHNwhl/i3h71z76KuCuZrPIVcKBWU266nZ14OI58qTUwQ5g+rMGed3Mhq kmqZV3tTTXWum9tCn+DO4vW8VVpflOReHI16N7mYWfdfJSf70CFY+t9pJE8NAlGx4fIE IRLA== X-Gm-Message-State: AG10YOTcgQdhdmZZN1WKv4zWBSw8U2Iggg51iB97yhPf35kxOuPDTYREkFEeWoUlWZP2EQ== X-Received: by 10.98.1.197 with SMTP id 188mr23132622pfb.8.1455539819549; Mon, 15 Feb 2016 04:36:59 -0800 (PST) Received: from bubble.grove.modra.org (CPE-58-160-146-233.sa.bigpond.net.au. [58.160.146.233]) by smtp.gmail.com with ESMTPSA id u64sm38415507pfa.86.2016.02.15.04.36.58 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 15 Feb 2016 04:36:58 -0800 (PST) Received: by bubble.grove.modra.org (Postfix, from userid 1000) id 08BF1EA0157; Mon, 15 Feb 2016 23:06:53 +1030 (ACDT) Date: Mon, 15 Feb 2016 23:06:53 +1030 From: Alan Modra To: Ulrich Weigand Cc: Michael Meissner , David Edelsohn , GCC Patches , Michael R Meissner Subject: Re: [RS6000] reload_vsx_from_gprsf splitter Message-ID: <20160215123653.GG10888@bubble.grove.modra.org> References: <20160211215302.GB22837@ibm-tiger.the-meissners.org> <20160212135722.AD6DD6B9A@oc7340732750.ibm.com> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20160212135722.AD6DD6B9A@oc7340732750.ibm.com> User-Agent: Mutt/1.5.23 (2014-03-12) X-IsSubscribed: yes On Fri, Feb 12, 2016 at 02:57:22PM +0100, Ulrich Weigand wrote: > > On Fri, Feb 12, 2016 at 08:54:19AM +1030, Alan Modra wrote: > > > Another concern I had about this, besides using %L in asm output (what > > > forces TFmode to use just fprs?), is what happens when we're using > > > IEEE 128-bit floats? In that case it looks like we'd get just one reg. > > > > Good point that it breaks if the default long double (TFmode) type is IEEE > > 128-bit floating point. We would need to have two patterns, one that uses > > TFmode and one that uses IFmode. I wrote the power8 direct move stuff before > > going down the road of IEEE 128-bit floating point. > > Right. It's a bit unfortunate that we can't just use IFmode unconditionally, > but it seems rs6000_scalar_mode_supported_p (IFmode) may return false, and > then we probably shouldn't be using it. Actually, we can use IFmode unconditionally. scalar_mode_supported_p is relevant only up to and including expand. Nothing prevents the backend from using IFmode. > Another option might be to use TDmode to allocate a scratch register pair. That won't work, at least if we want to extract the two component regs with simplify_gen_subreg, due to rs6000_cannot_change_mode_class. In my original patch I just extracted the regs by using gen_rtx_REG but I changed that, based on your criticism of using gen_rtx_REG in reload_vsx_from_gprsf, and because rs6000.md avoids gen_rtx_REG using operand regnos in other places. That particular change is of course entirely cosmetic. I also changed reload_vsx_from_gprsf to avoid mode punning regs, instead duplicating insn patterns as done elsewhere in the vsx support. I don't believe we will see subregs of vsx or fp regs after reload, but I'm quite willing to concede the point for a stage4 fix. Here's the revised patch. To recap, the main bug fixes here are: - stop reload_vsx_from_gprsf splitter from emitting a move not handled by movdi_internal64 - don't use TFmode, which cannot now be assumed to be IBM double-double. Secondary to that, not using or passing around TFmode means the %L restriction no longer matters, and constraints on the reload temp reg can be relaxed. Bootstrapped and regression tested powerpc64-linux biarch and powerpc64le-linux. OK David? PR target/68973 * config/rs6000/rs6000.md (reload_vsx_from_gprsf): Use p8_mtvsrd_sf rather than attempting to use movdi_internal64. Remove op0_di. (p8_mtvsrd_df, p8_mtvsrd_sf): New. (p8_mtvsrd_1, p8_mtvsrd_2): Delete. (p8_mtvsrwz): New. (p8_mtvsrwz_1, p8_mtvsrwz_2): Delete. (p8_xxpermdi_): Take two DF inputs rather than one TF. (p8_fmrgow_): Likewise. (reload_vsx_from_gpr): Make clobber IF. Adjust for above changes. (reload_fpr_from_gpr): Similarly. Use "d" for op0 constraint. * config/rs6000/vsx.md (vsx_xscvspdpn_directmove): Make op1 SFmode. diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md index cdbf873..ec356cb 100644 --- a/gcc/config/rs6000/rs6000.md +++ b/gcc/config/rs6000/rs6000.md @@ -7488,41 +7488,31 @@ ;; value, since it is allocated in reload and not all of the flow information ;; is setup for it. We have two patterns to do the two moves between gprs and ;; fprs. There isn't a dependancy between the two, but we could potentially -;; schedule other instructions between the two instructions. TFmode is -;; currently limited to traditional FPR registers. If/when this is changed, we -;; will need to revist %L to make sure it works with VSX registers, or add an -;; %x version of %L. +;; schedule other instructions between the two instructions. (define_insn "p8_fmrgow_" [(set (match_operand:FMOVE64X 0 "register_operand" "=d") - (unspec:FMOVE64X [(match_operand:TF 1 "register_operand" "d")] + (unspec:FMOVE64X [ + (match_operand:DF 1 "register_operand" "d") + (match_operand:DF 2 "register_operand" "d")] UNSPEC_P8V_FMRGOW))] "!TARGET_POWERPC64 && TARGET_DIRECT_MOVE" - "fmrgow %0,%1,%L1" + "fmrgow %0,%1,%2" [(set_attr "type" "vecperm")]) -(define_insn "p8_mtvsrwz_1" - [(set (match_operand:TF 0 "register_operand" "=d") - (unspec:TF [(match_operand:SI 1 "register_operand" "r")] +(define_insn "p8_mtvsrwz" + [(set (match_operand:DF 0 "register_operand" "=d") + (unspec:DF [(match_operand:SI 1 "register_operand" "r")] UNSPEC_P8V_MTVSRWZ))] "!TARGET_POWERPC64 && TARGET_DIRECT_MOVE" "mtvsrwz %x0,%1" [(set_attr "type" "mftgpr")]) -(define_insn "p8_mtvsrwz_2" - [(set (match_operand:TF 0 "register_operand" "+d") - (unspec:TF [(match_dup 0) - (match_operand:SI 1 "register_operand" "r")] - UNSPEC_P8V_MTVSRWZ))] - "!TARGET_POWERPC64 && TARGET_DIRECT_MOVE" - "mtvsrwz %L0,%1" - [(set_attr "type" "mftgpr")]) - (define_insn_and_split "reload_fpr_from_gpr" - [(set (match_operand:FMOVE64X 0 "register_operand" "=ws") + [(set (match_operand:FMOVE64X 0 "register_operand" "=d") (unspec:FMOVE64X [(match_operand:FMOVE64X 1 "register_operand" "r")] UNSPEC_P8V_RELOAD_FROM_GPR)) - (clobber (match_operand:TF 2 "register_operand" "=d"))] + (clobber (match_operand:IF 2 "register_operand" "=d"))] "!TARGET_POWERPC64 && TARGET_DIRECT_MOVE" "#" "&& reload_completed" @@ -7530,42 +7520,36 @@ { rtx dest = operands[0]; rtx src = operands[1]; - rtx tmp = operands[2]; + rtx tmp_hi = simplify_gen_subreg (DFmode, operands[2], IFmode, 0); + rtx tmp_lo = simplify_gen_subreg (DFmode, operands[2], IFmode, 8); rtx gpr_hi_reg = gen_highpart (SImode, src); rtx gpr_lo_reg = gen_lowpart (SImode, src); - emit_insn (gen_p8_mtvsrwz_1 (tmp, gpr_hi_reg)); - emit_insn (gen_p8_mtvsrwz_2 (tmp, gpr_lo_reg)); - emit_insn (gen_p8_fmrgow_ (dest, tmp)); + emit_insn (gen_p8_mtvsrwz (tmp_hi, gpr_hi_reg)); + emit_insn (gen_p8_mtvsrwz (tmp_lo, gpr_lo_reg)); + emit_insn (gen_p8_fmrgow_ (dest, tmp_hi, tmp_lo)); DONE; } [(set_attr "length" "12") (set_attr "type" "three")]) ;; Move 128 bit values from GPRs to VSX registers in 64-bit mode -(define_insn "p8_mtvsrd_1" - [(set (match_operand:TF 0 "register_operand" "=ws") - (unspec:TF [(match_operand:DI 1 "register_operand" "r")] - UNSPEC_P8V_MTVSRD))] - "TARGET_POWERPC64 && TARGET_DIRECT_MOVE" - "mtvsrd %0,%1" - [(set_attr "type" "mftgpr")]) - -(define_insn "p8_mtvsrd_2" - [(set (match_operand:TF 0 "register_operand" "+ws") - (unspec:TF [(match_dup 0) - (match_operand:DI 1 "register_operand" "r")] +(define_insn "p8_mtvsrd_df" + [(set (match_operand:DF 0 "register_operand" "=wa") + (unspec:DF [(match_operand:DI 1 "register_operand" "r")] UNSPEC_P8V_MTVSRD))] "TARGET_POWERPC64 && TARGET_DIRECT_MOVE" - "mtvsrd %L0,%1" + "mtvsrd %x0,%1" [(set_attr "type" "mftgpr")]) (define_insn "p8_xxpermdi_" [(set (match_operand:FMOVE128_GPR 0 "register_operand" "=wa") - (unspec:FMOVE128_GPR [(match_operand:TF 1 "register_operand" "ws")] - UNSPEC_P8V_XXPERMDI))] + (unspec:FMOVE128_GPR [ + (match_operand:DF 1 "register_operand" "wa") + (match_operand:DF 2 "register_operand" "wa")] + UNSPEC_P8V_XXPERMDI))] "TARGET_POWERPC64 && TARGET_DIRECT_MOVE" - "xxpermdi %x0,%1,%L1,0" + "xxpermdi %x0,%x1,%x2,0" [(set_attr "type" "vecperm")]) (define_insn_and_split "reload_vsx_from_gpr" @@ -7573,7 +7557,7 @@ (unspec:FMOVE128_GPR [(match_operand:FMOVE128_GPR 1 "register_operand" "r")] UNSPEC_P8V_RELOAD_FROM_GPR)) - (clobber (match_operand:TF 2 "register_operand" "=ws"))] + (clobber (match_operand:IF 2 "register_operand" "=wa"))] "TARGET_POWERPC64 && TARGET_DIRECT_MOVE" "#" "&& reload_completed" @@ -7581,13 +7565,17 @@ { rtx dest = operands[0]; rtx src = operands[1]; - rtx tmp = operands[2]; + /* You might think that we could use op0 as one temp and a DF clobber + as op2, but you'd be wrong. Secondary reload move patterns don't + check for overlap of the clobber and the destination. */ + rtx tmp_hi = simplify_gen_subreg (DFmode, operands[2], IFmode, 0); + rtx tmp_lo = simplify_gen_subreg (DFmode, operands[2], IFmode, 8); rtx gpr_hi_reg = gen_highpart (DImode, src); rtx gpr_lo_reg = gen_lowpart (DImode, src); - emit_insn (gen_p8_mtvsrd_1 (tmp, gpr_hi_reg)); - emit_insn (gen_p8_mtvsrd_2 (tmp, gpr_lo_reg)); - emit_insn (gen_p8_xxpermdi_ (dest, tmp)); + emit_insn (gen_p8_mtvsrd_df (tmp_hi, gpr_hi_reg)); + emit_insn (gen_p8_mtvsrd_df (tmp_lo, gpr_lo_reg)); + emit_insn (gen_p8_xxpermdi_ (dest, tmp_hi, tmp_lo)); DONE; } [(set_attr "length" "12") @@ -7608,6 +7596,13 @@ ;; Move SFmode to a VSX from a GPR register. Because scalar floating point ;; type is stored internally as double precision in the VSX registers, we have ;; to convert it from the vector format. +(define_insn "p8_mtvsrd_sf" + [(set (match_operand:SF 0 "register_operand" "=wa") + (unspec:SF [(match_operand:DI 1 "register_operand" "r")] + UNSPEC_P8V_MTVSRD))] + "TARGET_POWERPC64 && TARGET_DIRECT_MOVE" + "mtvsrd %x0,%1" + [(set_attr "type" "mftgpr")]) (define_insn_and_split "reload_vsx_from_gprsf" [(set (match_operand:SF 0 "register_operand" "=wa") @@ -7622,16 +7617,12 @@ rtx op0 = operands[0]; rtx op1 = operands[1]; rtx op2 = operands[2]; - /* Also use the destination register to hold the unconverted DImode value. - This is conceptually a separate value from OP0, so we use gen_rtx_REG - rather than simplify_gen_subreg. */ - rtx op0_di = gen_rtx_REG (DImode, REGNO (op0)); rtx op1_di = simplify_gen_subreg (DImode, op1, SFmode, 0); /* Move SF value to upper 32-bits for xscvspdpn. */ emit_insn (gen_ashldi3 (op2, op1_di, GEN_INT (32))); - emit_move_insn (op0_di, op2); - emit_insn (gen_vsx_xscvspdpn_directmove (op0, op0_di)); + emit_insn (gen_p8_mtvsrd_sf (op0, op2)); + emit_insn (gen_vsx_xscvspdpn_directmove (op0, op0)); DONE; } [(set_attr "length" "8") diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md index 997ff31..45af233 100644 --- a/gcc/config/rs6000/vsx.md +++ b/gcc/config/rs6000/vsx.md @@ -1521,7 +1521,7 @@ ;; Used by direct move to move a SFmode value from GPR to VSX register (define_insn "vsx_xscvspdpn_directmove" [(set (match_operand:SF 0 "vsx_register_operand" "=wa") - (unspec:SF [(match_operand:DI 1 "vsx_register_operand" "wa")] + (unspec:SF [(match_operand:SF 1 "vsx_register_operand" "wa")] UNSPEC_VSX_CVSPDPN))] "TARGET_XSCVSPDPN" "xscvspdpn %x0,%x1"