From patchwork Thu Nov 17 18:55:25 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Michael Meissner
X-Patchwork-Id: 696288
Date: Thu, 17 Nov 2016 13:55:25 -0500
From: Michael Meissner
To: gcc-patches@gcc.gnu.org, Segher Boessenkool, David Edelsohn, Bill Schmidt
Subject: [PATCH] PR target/78101, Fix PowerPC power9-fusion support
Message-Id: <20161117185525.GA31983@ibm-tiger.the-meissners.org>

This patch fixes the problem reported in PR 78101 where the power9-fusion
support generates an insn that isn't matched:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78101

It also fixes the bug that Andrew Stubbs reported:
https://gcc.gnu.org/ml/gcc-patches/2016-11/msg01367.html

There were two bugs in the code:

    1)	The fusion peephole for SFmode and DFmode was inconsistent about
	whether those values were allowed in GPRs if software floating point
	is used.

    2)	The power9 fusion store insn had an early clobber in the
	match_scratch.  Because of that, an address that uses a register,
	does ADDIS to add to the upper 16 bits, and then folds the lower ADDI
	into the store operation would not work if the address used the
	scratch register.

In addition to the two bugs, the fusion code had been written much earlier than
the support for the new ISA 3.0 scalar d-form (register+offset) instructions,
and the fusion code did not match these types of stores.  I have fixed this, so
that those memory references can also be fused.

I have bootstrapped the compiler with these changes and there were no
regressions on the following systems:

    1)	Little endian power8
    2)	Big endian power8 (no support for 32-bit libraries)
    3)	Big endian power7 (support for 32-bit libraries)

I have also built and run the SPEC 2006 CPU tests with this option enabled, and
they run fine, with some minor performance changes on power8 using power9
fusion.

I have built the cam4_r and cam4_s benchmarks of the next generation SPEC (kit
102) and they now compile fine with -mpower9-fusion (they were the source of
bug 78101).

Are these patches OK to check in to the trunk?
Since the bug shows up in GCC 6.x, can I apply and submit these patches to the
GCC 6.x branch?

[gcc]
2016-11-17  Michael Meissner

	PR target/78101
	* config/rs6000/predicates.md (fusion_addis_mem_combo_load): Add
	the appropriate checks for SFmode/DFmode load/stores in GPR
	registers.
	(fusion_addis_mem_combo_store): Likewise.
	* config/rs6000/rs6000.c (rs6000_init_hard_regno_mode_ok): Rename
	fusion_fpr_* to fusion_vsx_* and add in support for ISA 3.0 scalar
	d-form instructions for traditional Altivec registers.
	(emit_fusion_p9_load): Likewise.
	(emit_fusion_p9_store): Likewise.
	* config/rs6000/rs6000.md (p9 fusion store peephole2): Remove
	early clobber from scratch register.  Do not match if the register
	being stored is the scratch register.
	(fusion_vsx___load): Rename fusion_fpr_* to fusion_vsx_* and add
	in support for ISA 3.0 scalar d-form instructions for traditional
	Altivec registers.
	(fusion_fpr___load): Likewise.
	(fusion_vsx___store): Likewise.
	(fusion_fpr___store): Likewise.

[gcc/testsuite]
2016-11-17  Michael Meissner

	PR target/78101
	* gcc.target/powerpc/fusion4.c: New test.

Index: gcc/config/rs6000/predicates.md
===================================================================
--- gcc/config/rs6000/predicates.md	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000)	(revision 242456)
+++ gcc/config/rs6000/predicates.md	(.../gcc/config/rs6000)	(working copy)
@@ -1844,7 +1844,7 @@ (define_predicate "fusion_gpr_mem_load"
 ;; Match a GPR load (lbz, lhz, lwz, ld) that uses a combined address in the
 ;; memory field with both the addis and the memory offset.  Sign extension
 ;; is not handled here, since lha and lwa are not fused.
-;; With extended fusion, also match a FPR load (lfd, lfs) and float_extend
+;; With P9 fusion, also match a fpr/vector load and float_extend
 (define_predicate "fusion_addis_mem_combo_load"
   (match_code "mem,zero_extend,float_extend")
 {
@@ -1873,11 +1873,15 @@ (define_predicate "fusion_addis_mem_comb
       break;
 
     case SFmode:
-    case DFmode:
       if (!TARGET_P9_FUSION)
         return 0;
       break;
 
+    case DFmode:
+      if ((!TARGET_POWERPC64 && !TARGET_DF_FPR) || !TARGET_P9_FUSION)
+        return 0;
+      break;
+
     default:
       return 0;
     }
@@ -1920,6 +1924,7 @@ (define_predicate "fusion_addis_mem_comb
     case QImode:
     case HImode:
     case SImode:
+    case SFmode:
       break;
 
     case DImode:
@@ -1927,13 +1932,8 @@ (define_predicate "fusion_addis_mem_comb
         return 0;
       break;
 
-    case SFmode:
-      if (!TARGET_SF_FPR)
-        return 0;
-      break;
-
     case DFmode:
-      if (!TARGET_DF_FPR)
+      if (!TARGET_POWERPC64 && !TARGET_DF_FPR)
         return 0;
       break;
 
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000)	(revision 242456)
+++ gcc/config/rs6000/rs6000.c	(.../gcc/config/rs6000)	(working copy)
@@ -3441,28 +3441,28 @@ rs6000_init_hard_regno_mode_ok (bool glo
 
       static const struct fuse_insns addis_insns[] = {
         { SFmode, DImode, RELOAD_REG_FPR,
-          CODE_FOR_fusion_fpr_di_sf_load,
-          CODE_FOR_fusion_fpr_di_sf_store },
+          CODE_FOR_fusion_vsx_di_sf_load,
+          CODE_FOR_fusion_vsx_di_sf_store },
 
         { SFmode, SImode, RELOAD_REG_FPR,
-          CODE_FOR_fusion_fpr_si_sf_load,
-          CODE_FOR_fusion_fpr_si_sf_store },
+          CODE_FOR_fusion_vsx_si_sf_load,
+          CODE_FOR_fusion_vsx_si_sf_store },
 
         { DFmode, DImode, RELOAD_REG_FPR,
-          CODE_FOR_fusion_fpr_di_df_load,
-          CODE_FOR_fusion_fpr_di_df_store },
+          CODE_FOR_fusion_vsx_di_df_load,
+          CODE_FOR_fusion_vsx_di_df_store },
 
         { DFmode, SImode, RELOAD_REG_FPR,
-          CODE_FOR_fusion_fpr_si_df_load,
-          CODE_FOR_fusion_fpr_si_df_store },
+          CODE_FOR_fusion_vsx_si_df_load,
+          CODE_FOR_fusion_vsx_si_df_store },
 
         { DImode, DImode, RELOAD_REG_FPR,
-          CODE_FOR_fusion_fpr_di_di_load,
-          CODE_FOR_fusion_fpr_di_di_store },
+          CODE_FOR_fusion_vsx_di_di_load,
+          CODE_FOR_fusion_vsx_di_di_store },
 
         { DImode, SImode, RELOAD_REG_FPR,
-          CODE_FOR_fusion_fpr_si_di_load,
-          CODE_FOR_fusion_fpr_si_di_store },
+          CODE_FOR_fusion_vsx_si_di_load,
+          CODE_FOR_fusion_vsx_si_di_store },
 
         { QImode, DImode, RELOAD_REG_GPR,
           CODE_FOR_fusion_gpr_di_qi_load,
@@ -3522,6 +3522,14 @@ rs6000_init_hard_regno_mode_ok (bool glo
 
           reg_addr[xmode].fusion_addis_ld[rtype] = addis_insns[i].load;
           reg_addr[xmode].fusion_addis_st[rtype] = addis_insns[i].store;
+
+          if (rtype == RELOAD_REG_FPR && TARGET_P9_DFORM_SCALAR)
+            {
+              reg_addr[xmode].fusion_addis_ld[RELOAD_REG_VMX]
+                = addis_insns[i].load;
+              reg_addr[xmode].fusion_addis_st[RELOAD_REG_VMX]
+                = addis_insns[i].store;
+            }
         }
     }
 
@@ -39817,6 +39825,15 @@ emit_fusion_p9_load (rtx reg, rtx mem, r
       else
         gcc_unreachable ();
     }
+  else if (ALTIVEC_REGNO_P (r) && TARGET_P9_DFORM_SCALAR)
+    {
+      if (mode == SFmode)
+        load_string = "lxssp";
+      else if (mode == DFmode || mode == DImode)
+        load_string = "lxsd";
+      else
+        gcc_unreachable ();
+    }
   else if (INT_REGNO_P (r))
     {
       switch (mode)
@@ -39895,6 +39912,15 @@ emit_fusion_p9_store (rtx mem, rtx reg,
       else
         gcc_unreachable ();
     }
+  else if (ALTIVEC_REGNO_P (r) && TARGET_P9_DFORM_SCALAR)
+    {
+      if (mode == SFmode)
+        store_string = "stxssp";
+      else if (mode == DFmode || mode == DImode)
+        store_string = "stxsd";
+      else
+        gcc_unreachable ();
+    }
   else if (INT_REGNO_P (r))
     {
       switch (mode)
Index: gcc/config/rs6000/rs6000.md
===================================================================
--- gcc/config/rs6000/rs6000.md	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000)	(revision 242456)
+++ gcc/config/rs6000/rs6000.md	(.../gcc/config/rs6000)	(working copy)
@@ -13438,7 +13438,8 @@ (define_peephole2
    (set (match_operand:SFDF 2 "offsettable_mem_operand" "")
         (match_operand:SFDF 3 "toc_fusion_or_p9_reg_operand" ""))]
   "TARGET_P9_FUSION && peep2_reg_dead_p (2, operands[0])
-   && fusion_p9_p (operands[0], operands[1], operands[2], operands[3])"
+   && fusion_p9_p (operands[0], operands[1], operands[2], operands[3])
+   && !rtx_equal_p (operands[0], operands[3])"
   [(const_int 0)]
 {
   expand_fusion_p9_store (operands);
@@ -13496,7 +13497,7 @@ (define_insn "fusion_gpr_____load"
-  [(set (match_operand:FPR_FUSION 0 "fpr_reg_operand" "=d")
+(define_insn "fusion_vsx___load"
+  [(set (match_operand:FPR_FUSION 0 "vsx_register_operand" "=dwb")
         (unspec:FPR_FUSION
          [(match_operand:FPR_FUSION 1 "fusion_addis_mem_combo_load" "wF")]
          UNSPEC_FUSION_P9))
@@ -13517,10 +13518,10 @@ (define_insn "fusion_fpr____store"
+(define_insn "fusion_vsx___store"
   [(set (match_operand:FPR_FUSION 0 "fusion_addis_mem_combo_store" "=wF")
         (unspec:FPR_FUSION
-         [(match_operand:FPR_FUSION 1 "fpr_reg_operand" "d")]
+         [(match_operand:FPR_FUSION 1 "vsx_register_operand" "dwb")]
          UNSPEC_FUSION_P9))
    (clobber (match_operand:P 2 "base_reg_operand" "=b"))]
   "TARGET_P9_FUSION"
Index: gcc/testsuite/gcc.target/powerpc/fusion4.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/fusion4.c	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc)	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/fusion4.c	(.../gcc/testsuite/gcc.target/powerpc)	(revision 242499)
@@ -0,0 +1,13 @@
+/* { dg-do compile { target { powerpc*-*-* && ilp32 } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power7" } } */
+/* { dg-options "-mcpu=power7 -mtune=power9 -O3 -msoft-float -m32" } */
+
+#define LARGE 0x12345
+
+float fusion_float_read (float *p){ return p[LARGE]; }
+
+void fusion_float_write (float *p, float f){ p[LARGE] = f; }
+
+/* { dg-final { scan-assembler "store fusion, type SF" } } */
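
For readers unfamiliar with the address form being fused, the following is a
minimal sketch of how a displacement such as the one for p[LARGE] in fusion4.c
splits into an ADDIS part and a 16-bit d-form offset.  It is an illustration
only: split_low, split_high and check_split are hypothetical helper names, not
functions in GCC, and the sketch assumes a 4-byte float, so the displacement is
LARGE * 4.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers, not part of GCC.  */

/* Low 16 bits of the displacement, sign-extended the way a d-form
   load or store sign-extends its offset field.  */
static int32_t
split_low (int32_t disp)
{
  return ((disp & 0xffff) ^ 0x8000) - 0x8000;
}

/* Value for the ADDIS immediate.  It is adjusted so that
   (high << 16) + low gives back the original displacement even
   when the low half is negative.  */
static int32_t
split_high (int32_t disp)
{
  /* disp - low is an exact multiple of 65536, so the division is exact.  */
  return (int32_t) (((int64_t) disp - split_low (disp)) / 65536);
}

/* Check that the two halves recombine to the original displacement.  */
static void
check_split (int32_t disp)
{
  assert ((int64_t) split_high (disp) * 65536 + split_low (disp) == disp);
}
```

Under those assumptions, check_split (0x12345 * 4) passes with split_high
returning 5 and split_low returning -29420, which is the kind of
ADDIS-plus-offset pair the fused load and store patterns are meant to match.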