From patchwork Mon Aug 2 20:19:03 2021
X-Patchwork-Submitter: Pat Haugen
X-Patchwork-Id: 1512558
From: Pat Haugen
To: GCC Patches
Cc: Peter Bergner, David Edelsohn, Segher Boessenkool
Subject: [PATCH, rs6000] Add store fusion support for Power10
Date: Mon, 2 Aug 2021 15:19:03 -0500

Enable store fusion on Power10.

Use the SCHED_REORDER hook to implement Power10 specific ready list
reordering. As of now, pairing stores for store fusion is the only function
being performed.

Bootstrap/regtest on powerpc64le(Power10) with no new regressions. Ok for
master?

-Pat


2021-08-02  Pat Haugen

gcc/ChangeLog:

        * config/rs6000/rs6000-cpus.def (ISA_3_1_MASKS_SERVER): Add new flag.
        (POWERPC_MASKS): Likewise.
        * config/rs6000/rs6000.c (rs6000_option_override_internal): Enable
        store fusion for Power10.
        (is_load_insn1): Verify destination is a register.
        (is_store_insn1): Verify source is a register.
        (is_fusable_store): New.
        (power10_sched_reorder): Likewise.
        (rs6000_sched_reorder): Do Power10 specific reordering.
        (rs6000_sched_reorder2): Likewise.
        * config/rs6000/rs6000.opt: Add new option.



diff --git a/gcc/config/rs6000/rs6000-cpus.def b/gcc/config/rs6000/rs6000-cpus.def
index 6758296c0fd..f5812da0184 100644
--- a/gcc/config/rs6000/rs6000-cpus.def
+++ b/gcc/config/rs6000/rs6000-cpus.def
@@ -90,7 +90,8 @@
                                  | OPTION_MASK_P10_FUSION_2LOGICAL      \
                                  | OPTION_MASK_P10_FUSION_LOGADD        \
                                  | OPTION_MASK_P10_FUSION_ADDLOG        \
-                                 | OPTION_MASK_P10_FUSION_2ADD)
+                                 | OPTION_MASK_P10_FUSION_2ADD          \
+                                 | OPTION_MASK_P10_FUSION_2STORE)
 
 /* Flags that need to be turned off if -mno-power9-vector. */
 #define OTHER_P9_VECTOR_MASKS   (OPTION_MASK_FLOAT128_HW                \
@@ -143,6 +144,7 @@
                                  | OPTION_MASK_P10_FUSION_LOGADD        \
                                  | OPTION_MASK_P10_FUSION_ADDLOG        \
                                  | OPTION_MASK_P10_FUSION_2ADD          \
+                                 | OPTION_MASK_P10_FUSION_2STORE        \
                                  | OPTION_MASK_HTM                      \
                                  | OPTION_MASK_ISEL                     \
                                  | OPTION_MASK_MFCRF                    \
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 279f00cc648..1460a0d7c5c 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -4490,6 +4490,10 @@ rs6000_option_override_internal (bool global_init_p)
       && (rs6000_isa_flags_explicit & OPTION_MASK_P10_FUSION_2ADD) == 0)
     rs6000_isa_flags |= OPTION_MASK_P10_FUSION_2ADD;
 
+  if (TARGET_POWER10
+      && (rs6000_isa_flags_explicit & OPTION_MASK_P10_FUSION_2STORE) == 0)
+    rs6000_isa_flags |= OPTION_MASK_P10_FUSION_2STORE;
+
   /* Turn off vector pair/mma options on non-power10 systems. */
   else if (!TARGET_POWER10 && TARGET_MMA)
     {
@@ -18357,7 +18361,7 @@ is_load_insn1 (rtx pat, rtx *load_mem)
   if (!pat || pat == NULL_RTX)
     return false;
 
-  if (GET_CODE (pat) == SET)
+  if (GET_CODE (pat) == SET && REG_P (SET_DEST (pat)))
     return find_mem_ref (SET_SRC (pat), load_mem);
 
   if (GET_CODE (pat) == PARALLEL)
@@ -18394,7 +18398,8 @@ is_store_insn1 (rtx pat, rtx *str_mem)
   if (!pat || pat == NULL_RTX)
     return false;
 
-  if (GET_CODE (pat) == SET)
+  if (GET_CODE (pat) == SET
+      && (REG_P (SET_SRC (pat)) || SUBREG_P (SET_SRC (pat))))
     return find_mem_ref (SET_DEST (pat), str_mem);
 
   if (GET_CODE (pat) == PARALLEL)
@@ -18859,6 +18864,96 @@ power9_sched_reorder2 (rtx_insn **ready, int lastpos)
   return cached_can_issue_more;
 }
 
+/* Determine if INSN is a store to memory that can be fused with a similar
+   adjacent store. */
+
+static bool
+is_fusable_store (rtx_insn *insn, rtx *str_mem)
+{
+  /* Exit early if not doing store fusion. */
+  if (!(TARGET_P10_FUSION && TARGET_P10_FUSION_2STORE))
+    return false;
+
+  /* Insn must be a non-prefixed base+disp form store. */
+  if (is_store_insn (insn, str_mem)
+      && get_attr_prefixed (insn) == PREFIXED_NO
+      && get_attr_update (insn) == UPDATE_NO
+      && get_attr_indexed (insn) == INDEXED_NO)
+    {
+      /* Further restictions by mode and size. */
+      machine_mode mode = GET_MODE (*str_mem);
+      HOST_WIDE_INT size;
+      if MEM_SIZE_KNOWN_P (*str_mem)
+        size = MEM_SIZE (*str_mem);
+      else
+        return false;
+
+      if INTEGRAL_MODE_P (mode)
+        {
+          /* Must be word or dword size. */
+          return (size == 4 || size == 8);
+        }
+      else if FLOAT_MODE_P (mode)
+        {
+          /* Must be dword size. */
+          return (size == 8);
+        }
+    }
+
+  return false;
+}
+
+/* Do Power10 specific reordering of the ready list. */
+
+static int
+power10_sched_reorder (rtx_insn **ready, int lastpos)
+{
+  int pos;
+  rtx mem1, mem2;
+
+  /* Do store fusion during sched2 only. */
+  if (!reload_completed)
+    return cached_can_issue_more;
+
+  /* If the prior insn finished off a store fusion pair then simply
+     reset the counter and return, nothing more to do. */
+  if (load_store_pendulum != 0)
+    {
+      load_store_pendulum = 0;
+      return cached_can_issue_more;
+    }
+
+  /* Try to pair certain store insns to adjacent memory locations
+     so that the hardware will fuse them to a single operation. */
+  if (is_fusable_store (last_scheduled_insn, &mem1))
+    {
+      /* A fusable store was just scheduled. Scan the ready list for another
+         store that it can fuse with. */
+      pos = lastpos;
+      while (pos >= 0)
+        {
+          /* GPR stores can be ascending or descending offsets, FPR/VSR stores
+             must be ascending only. */
+          if (is_fusable_store (ready[pos], &mem2)
+              && ((INTEGRAL_MODE_P (GET_MODE (mem1))
+                   && adjacent_mem_locations (mem1, mem2))
+                  || (FLOAT_MODE_P (GET_MODE (mem1))
+                      && (adjacent_mem_locations (mem1, mem2) == mem1))))
+            {
+              /* Found a fusable store. Move it to the end of the ready list
+                 so it is scheduled next. */
+              move_to_end_of_ready (ready, pos, lastpos);
+
+              load_store_pendulum = -1;
+              break;
+            }
+          pos--;
+        }
+    }
+
+  return cached_can_issue_more;
+}
+
 /* We are about to begin issuing insns for this clock cycle. */
 
 static int
@@ -18885,6 +18980,10 @@ rs6000_sched_reorder (FILE *dump ATTRIBUTE_UNUSED, int sched_verbose,
   if (rs6000_tune == PROCESSOR_POWER6)
     load_store_pendulum = 0;
 
+  /* Do Power10 dependent reordering. */
+  if (rs6000_tune == PROCESSOR_POWER10 && last_scheduled_insn)
+    power10_sched_reorder (ready, *pn_ready - 1);
+
   return rs6000_issue_rate ();
 }
 
@@ -18906,6 +19005,10 @@ rs6000_sched_reorder2 (FILE *dump, int sched_verbose, rtx_insn **ready,
       && recog_memoized (last_scheduled_insn) >= 0)
     return power9_sched_reorder2 (ready, *pn_ready - 1);
 
+  /* Do Power10 dependent reordering. */
+  if (rs6000_tune == PROCESSOR_POWER10 && last_scheduled_insn)
+    return power10_sched_reorder (ready, *pn_ready - 1);
+
   return cached_can_issue_more;
 }
 
diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
index 0538db387dc..3753de19557 100644
--- a/gcc/config/rs6000/rs6000.opt
+++ b/gcc/config/rs6000/rs6000.opt
@@ -514,6 +514,10 @@ mpower10-fusion-2add
 Target Undocumented Mask(P10_FUSION_2ADD) Var(rs6000_isa_flags)
 Fuse dependent pairs of add or vaddudm instructions for better performance on power10.
 
+mpower10-fusion-2store
+Target Undocumented Mask(P10_FUSION_2STORE) Var(rs6000_isa_flags)
+Fuse certain store operations together for better performance on power10.
+
 mcrypto
 Target Mask(CRYPTO) Var(rs6000_isa_flags)
 Use ISA 2.07 Category:Vector.AES and Category:Vector.SHA2 instructions.
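
For reviewers who want to reason about the pairing rules without reading RTL, here is a minimal standalone sketch of the checks that is_fusable_store and the scan in power10_sched_reorder appear to make. It is not GCC code. The struct store_model type and the model_* functions are hypothetical names invented for illustration. The sketch assumes that adjacent_mem_locations requires the same base register and reports the lower-addressed reference, which is what the "== mem1" comparison in the patch suggests. Like the patch, it keys the ascending/descending rule off the first store only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a store insn; not a GCC data structure. */
enum store_class { GPR_STORE, FPR_STORE };

struct store_model {
  enum store_class cls; /* integer (GPR) or floating-point (FPR/VSR) data */
  int base_reg;         /* base register number */
  long offset;          /* displacement from the base register */
  int size;             /* access size in bytes */
  bool prefixed;        /* prefixed instruction form */
  bool update;          /* update form */
  bool indexed;         /* indexed (reg+reg) form */
};

/* Form and size rules: non-prefixed, non-update, base+disp only;
   GPR stores of 4 or 8 bytes, FPR stores of 8 bytes. */
static bool model_is_fusable(const struct store_model *s)
{
  if (s->prefixed || s->update || s->indexed)
    return false;
  if (s->cls == GPR_STORE)
    return s->size == 4 || s->size == 8;
  return s->size == 8;
}

/* True when B may be scheduled right after A as a fusion pair.
   GPR pairs may ascend or descend; FPR pairs must ascend. */
static bool model_can_pair(const struct store_model *a,
                           const struct store_model *b)
{
  if (!model_is_fusable(a) || !model_is_fusable(b))
    return false;
  if (a->base_reg != b->base_reg)
    return false;

  bool ascending = a->offset + a->size == b->offset;
  bool descending = b->offset + b->size == a->offset;

  if (a->cls == GPR_STORE)
    return ascending || descending;
  return ascending;
}

/* Scan READY from LASTPOS down to 0, as the patch does, and return the
   index of the first store that can pair with LAST, or -1 if none. */
static int model_find_partner(const struct store_model *last,
                              const struct store_model *ready, int lastpos)
{
  assert(last != NULL && (ready != NULL || lastpos < 0));
  for (int pos = lastpos; pos >= 0; pos--)
    if (model_can_pair(last, &ready[pos]))
      return pos;
  return -1;
}
```

In the patch itself, the store found by the scan is moved to the end of the ready list with move_to_end_of_ready and load_store_pendulum is set to -1, so the next call only resets the counter. The sketch stops at returning the index.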