From patchwork Thu Jun 22 22:54:52 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Michael Meissner
X-Patchwork-Id: 779726
Date: Thu, 22 Jun 2017 18:54:52 -0400
From: Michael Meissner
To: GCC Patches, Segher Boessenkool, David Edelsohn, Bill Schmidt
Subject: [PATCH], PR target/80510, Optimize 32-bit offsettable memory references on power7/power8
Message-Id: <20170622225452.GA7801@ibm-tiger.the-meissners.org>
X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 spamscore=0 suspectscore=0 malwarescore=0 phishscore=0 adultscore=0 bulkscore=0 classifier=spam adjust=0
 reason=mlx scancount=1 engine=8.0.1-1703280000 definitions=main-1706220392
X-IsSubscribed: yes

Andreas Schwab noticed that the two tests for PR 80510 failed on 32-bit systems
due to long being only a 32-bit type.  Yesterday, I committed this patch to
disable the test for 32-bit:
https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01607.html

This patch implements the necessary move and peephole support for 32-bit ISA
2.05/2.06 (power7/power8) targets, so that the compiler can optimize:

        load FPR,
        move FPR, ALTIVEC
        move ALTIVEC, FPR
        store FPR,

into:

        ADDI GPR,
        ADDI GPR,
        load ALTIVEC, GPR
        store ALTIVEC, GPR

I tested it on two systems:

    1) a big endian power7 system that has the 32-bit libraries installed, and
    2) a little endian power8 system.

On both systems, it bootstrapped and passed make check.  I did verify on the
power7 system that it ran the two tests for the functionality correctly.

FWIW, I built two 32-bit versions of Spec 2006, using the compiler without and
with the changes installed.  Unlike 64-bit, I don't see any code changes as a
result of this optimization, and all 30 spec benchmarks built correctly.
However, the tests show that it will generate the instructions in some cases,
but it is evidently not currently triggered.

Can I install this into the trunk and after a burn in period, install it on the
GCC 7 and GCC 6 branches (the previous patch for 64-bit is already installed on
both branches)?  If desired, I can make sure it gets into 6.4, or I can wait to
install the patch until after 6.4 ships.

[gcc]
2017-06-22  Michael Meissner

        PR target/80510
        * config/rs6000/rs6000.md (ALTIVEC_DFORM): Do not allow DImode in
        32-bit, since indexed is not valid for DImode.
        (mov_hardfloat32): Reorder ISA 2.07 load/stores before ISA 3.0
        d-form load/stores to be the same as mov_hardfloat64.
        (define_peephole2 for Altivec d-form load): Add 32-bit support.
        (define_peephole2 for Altivec d-form store): Likewise.

[gcc/testsuite]
2017-06-22  Michael Meissner

        PR target/80510
        * gcc.target/powerpc/pr80510-1.c: Allow test to run on 32-bit.
        * gcc.target/powerpc/pr80510-2.c: Likewise.

Index: gcc/config/rs6000/rs6000.md
===================================================================
--- gcc/config/rs6000/rs6000.md (revision 249488)
+++ gcc/config/rs6000/rs6000.md (working copy)
@@ -690,7 +690,9 @@ (define_code_attr SMINMAX [(smin "SM
 ;; Iterator to optimize the following cases:
 ;;   D-form load to FPR register & move to Altivec register
 ;;   Move Altivec register to FPR register and store
-(define_mode_iterator ALTIVEC_DFORM [DI DF SF])
+(define_mode_iterator ALTIVEC_DFORM [DF
+                                     SF
+                                     (DI "TARGET_POWERPC64")])
 
 
 ;; Start with fixed-point load and store insns.  Here we put only the more
@@ -7391,8 +7393,8 @@ (define_split
 ;; except for 0.0 which can be created on VSX with an xor instruction.
 
 (define_insn "*mov_hardfloat32"
-  [(set (match_operand:FMOVE64 0 "nonimmediate_operand" "=m,d,d,,Z,,wY,,,!r,Y,r,!r")
-        (match_operand:FMOVE64 1 "input_operand" "d,m,d,Z,,wY,,,,,r,Y,r"))]
+  [(set (match_operand:FMOVE64 0 "nonimmediate_operand" "=m,d,d,,wY,,Z,,,!r,Y,r,!r")
+        (match_operand:FMOVE64 1 "input_operand" "d,m,d,wY,,Z,,,,,r,Y,r"))]
   "! TARGET_POWERPC64 && TARGET_HARD_FLOAT && TARGET_DOUBLE_FLOAT
    && (gpc_reg_operand (operands[0], mode)
        || gpc_reg_operand (operands[1], mode))"
@@ -7400,10 +7402,10 @@ (define_insn "*mov_hardfloat32"
    stfd%U0%X0 %1,%0
    lfd%U1%X1 %0,%1
    fmr %0,%1
-   lxsd%U1x %x0,%y1
-   stxsd%U0x %x1,%y0
    lxsd %0,%1
    stxsd %1,%0
+   lxsd%U1x %x0,%y1
+   stxsd%U0x %x1,%y0
    xxlor %x0,%x1,%x1
    xxlxor %x0,%x0,%x0
    #
@@ -13967,13 +13969,13 @@ (define_insn "*fusion_p9__constant
 ;;   LXSDX 32,3,9
 
 (define_peephole2
-  [(match_scratch:DI 0 "b")
+  [(match_scratch:P 0 "b")
    (set (match_operand:ALTIVEC_DFORM 1 "fpr_reg_operand")
         (match_operand:ALTIVEC_DFORM 2 "simple_offsettable_mem_operand"))
    (set (match_operand:ALTIVEC_DFORM 3 "altivec_register_operand")
         (match_dup 1))]
-  "TARGET_VSX && TARGET_POWERPC64 && TARGET_UPPER_REGS_
-   && !TARGET_P9_DFORM_SCALAR && peep2_reg_dead_p (2, operands[1])"
+  "TARGET_VSX && TARGET_UPPER_REGS_ && !TARGET_P9_DFORM_SCALAR
+   && peep2_reg_dead_p (2, operands[1])"
   [(set (match_dup 0)
         (match_dup 4))
    (set (match_dup 3)
@@ -13988,7 +13990,7 @@ (define_peephole2
   add_op0 = XEXP (addr, 0);
   add_op1 = XEXP (addr, 1);
   gcc_assert (REG_P (add_op0));
-  new_addr = gen_rtx_PLUS (DImode, add_op0, tmp_reg);
+  new_addr = gen_rtx_PLUS (Pmode, add_op0, tmp_reg);
 
   operands[4] = add_op1;
   operands[5] = change_address (mem, mode, new_addr);
@@ -14004,13 +14006,13 @@ (define_peephole2
 ;;   STXSDX 32,3,9
 
 (define_peephole2
-  [(match_scratch:DI 0 "b")
+  [(match_scratch:P 0 "b")
    (set (match_operand:ALTIVEC_DFORM 1 "fpr_reg_operand")
         (match_operand:ALTIVEC_DFORM 2 "altivec_register_operand"))
    (set (match_operand:ALTIVEC_DFORM 3 "simple_offsettable_mem_operand")
         (match_dup 1))]
-  "TARGET_VSX && TARGET_POWERPC64 && TARGET_UPPER_REGS_
-   && !TARGET_P9_DFORM_SCALAR && peep2_reg_dead_p (2, operands[1])"
+  "TARGET_VSX && TARGET_UPPER_REGS_ && !TARGET_P9_DFORM_SCALAR
+   && peep2_reg_dead_p (2, operands[1])"
   [(set (match_dup 0)
         (match_dup 4))
    (set (match_dup 5)
@@ -14025,7 +14027,7 @@ (define_peephole2
   add_op0 = XEXP (addr, 0);
   add_op1 = XEXP (addr, 1);
   gcc_assert (REG_P (add_op0));
-  new_addr = gen_rtx_PLUS (DImode, add_op0, tmp_reg);
+  new_addr = gen_rtx_PLUS (Pmode, add_op0, tmp_reg);
 
   operands[4] = add_op1;
   operands[5] = change_address (mem, mode, new_addr);
Index: gcc/testsuite/gcc.target/powerpc/pr80510-1.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr80510-1.c (revision 249488)
+++ gcc/testsuite/gcc.target/powerpc/pr80510-1.c (working copy)
@@ -1,4 +1,4 @@
-/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
+/* { dg-do compile { target { powerpc*-*-* } } } */
 /* { dg-skip-if "" { powerpc*-*-darwin* } } */
 /* { dg-require-effective-target powerpc_vsx_ok } */
 /* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power7" } } */
@@ -6,9 +6,7 @@
 
 /* Make sure that STXSDX is generated for double scalars in Altivec registers
    on power7 instead of moving the value to a FPR register and doing a X-FORM
-   store.
-
-   32-bit currently does not have support for STXSDX in the mov{df,dd} patterns.  */
+   store.  */
 
 #ifndef TYPE
 #define TYPE double
Index: gcc/testsuite/gcc.target/powerpc/pr80510-2.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr80510-2.c (revision 249488)
+++ gcc/testsuite/gcc.target/powerpc/pr80510-2.c (working copy)
@@ -1,4 +1,4 @@
-/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
+/* { dg-do compile { target { powerpc*-*-* } } } */
 /* { dg-skip-if "" { powerpc*-*-darwin* } } */
 /* { dg-require-effective-target powerpc_p8vector_ok } */
 /* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */
@@ -6,9 +6,7 @@
 
 /* Make sure that STXSSPX is generated for float scalars in Altivec registers
    on power7 instead of moving the value to a FPR register and doing a X-FORM
-   store.
-
-   32-bit currently does not have support for STXSSPX in the mov{sf,sd} patterns.  */
+   store.  */
 
 #ifndef TYPE
 #define TYPE float
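
For readers less familiar with the two addressing forms, the load rewrite
that the peepholes perform can be modelled in portable C.  This is only a
rough sketch of the address arithmetic, not code from the patch or from the
pr80510 tests: the helper names are hypothetical, and ordinary C variables
stand in for the FPR, the Altivec register and the scratch GPR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model: read a double from an effective address.  */
static double
load_double (const unsigned char *ea)
{
  double value;
  memcpy (&value, ea, sizeof value);
  return value;
}

/* Before the peephole: a D-form load (base + 16-bit displacement) into an
   FPR, followed by a register move into an Altivec register.  */
static double
load_dform_then_move (const void *base, int16_t displacement)
{
  assert (base != NULL);
  double fpr = load_double ((const unsigned char *) base + displacement);
  double altivec = fpr;		/* Models the FPR -> Altivec move.  */
  return altivec;
}

/* After the peephole: the displacement is first placed in a scratch GPR
   (the ADDI step), then an X-form load (base + index register) targets
   the Altivec register directly.  */
static double
load_addi_then_xform (const void *base, int16_t displacement)
{
  assert (base != NULL);
  intptr_t scratch_gpr = displacement;	/* Models ADDI into the scratch.  */
  double altivec = load_double ((const unsigned char *) base + scratch_gpr);
  return altivec;
}
```

Both helpers read the same memory, which is the property the rewrite relies
on: the effective address is unchanged, only the way it is formed differs.
The store direction is the mirror image, with the scratch GPR again holding
the displacement.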