From patchwork Mon May 16 22:31:31 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Bergner X-Patchwork-Id: 1631938 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: bilbo.ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.a=rsa-sha256 header.s=default header.b=tKf9yqS1; dkim-atps=neutral Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Received: from sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by bilbo.ozlabs.org (Postfix) with ESMTPS id 4L2DXP2PMKz9s2R for ; Tue, 17 May 2022 08:32:35 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id BD072383983C for ; Mon, 16 May 2022 22:32:30 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org BD072383983C DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1652740350; bh=487bcvyg/cc9WEZkeHS71l2cZNrRUoBZnNFdERgJd/c=; h=Date:To:Subject:List-Id:List-Unsubscribe:List-Archive:List-Post: List-Help:List-Subscribe:From:Reply-To:Cc:From; b=tKf9yqS1/1l2BN/apE4X0LVurdOk1w7e/raNSeLhc0BwHwDX9q7rQZCMv1M56VWoe aMuln51EU4Ciw0lfGidjmXkYu9R/eOOeD6G1+MarhBx7DvDws9m/7M9qzIpoizUA9I qCVDx4P06B7jp0bwxRfGuTemqNh5YOkBJsxagtHg= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mx0b-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com [148.163.158.5]) by sourceware.org (Postfix) with ESMTPS id 9615D3857407 for ; Mon, 16 May 2022 22:32:09 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 9615D3857407 Received: from pps.filterd (m0127361.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 24GKGaLa029898; Mon, 16 May 2022 22:32:09 GMT Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3g3wgqa1qm-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Mon, 16 May 2022 22:32:08 +0000 Received: from m0127361.ppops.net (m0127361.ppops.net [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 24GMW8EL011330; Mon, 16 May 2022 22:32:08 GMT Received: from ppma03wdc.us.ibm.com (ba.79.3fa9.ip4.static.sl-reverse.com [169.63.121.186]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3g3wgqa1qd-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Mon, 16 May 2022 22:32:08 +0000 Received: from pps.filterd (ppma03wdc.us.ibm.com [127.0.0.1]) by ppma03wdc.us.ibm.com (8.16.1.2/8.16.1.2) with SMTP id 24GMCi5U001456; Mon, 16 May 2022 22:32:07 GMT Received: from b03cxnp07027.gho.boulder.ibm.com (b03cxnp07027.gho.boulder.ibm.com [9.17.130.14]) by ppma03wdc.us.ibm.com with ESMTP id 3g2429h7mt-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Mon, 16 May 2022 22:32:07 +0000 Received: from b03ledav005.gho.boulder.ibm.com (b03ledav005.gho.boulder.ibm.com [9.17.130.236]) by b03cxnp07027.gho.boulder.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 24GMW6eR29819232 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Mon, 16 May 2022 22:32:06 GMT Received: from b03ledav005.gho.boulder.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 643F4BE063; Mon, 16 May 2022 22:32:06 +0000 (GMT) Received: from b03ledav005.gho.boulder.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id C9B0EBE082; Mon, 16 May 2022 22:32:05 +0000 (GMT) Received: from [9.160.179.82] (unknown [9.160.179.82]) by b03ledav005.gho.boulder.ibm.com (Postfix) with ESMTP; Mon, 16 May 2022 22:32:05 +0000 (GMT) Message-ID: <90380588-1f25-3efb-376b-a9820fe3bd46@linux.ibm.com> Date: Mon, 16 May 2022 17:31:31 -0500 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:91.0) Gecko/20100101 Thunderbird/91.9.0 Content-Language: en-US To: Segher Boessenkool Subject: [PATCH v2] rs6000: Prefer assigning the MMA vector operands to altivec registers [PR105556] X-TM-AS-GCONF: 00 X-Proofpoint-ORIG-GUID: 2KqHRdn2iEj0xZMQVxeUyo10nCS3iHmb X-Proofpoint-GUID: smHku3wH8kqaHGZV9qzX5NFpASyA6FCS X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.858,Hydra:6.0.486,FMLib:17.11.64.514 definitions=2022-05-16_15,2022-05-16_02,2022-02-23_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 mlxscore=0 impostorscore=0 lowpriorityscore=0 spamscore=0 priorityscore=1501 mlxlogscore=999 clxscore=1015 malwarescore=0 adultscore=0 bulkscore=0 phishscore=0 suspectscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2202240000 definitions=main-2205160123 X-Spam-Status: No, score=-10.2 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_MSPIKE_H2, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Peter Bergner via Gcc-patches From: Peter Bergner Reply-To: Peter Bergner Cc: GCC Patches , David Edelsohn Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org Sender: "Gcc-patches" On 5/10/22 5:35 PM, Segher Boessenkool wrote: > Out of interest, did you try using v,?wa (so just two alternatives, not > four)? Or did you think it wouldresult in measurably worse code? Or > did you decide it is not such bad backend code size explosion after > all :-) So I tried using just "v,?wa" instead of the 4 alternative "v,v,?d,?d" version and that fixes the performance issue too and is simpler too. The other option is "better", in that it can allow one operand to get a "v" reg when the other gets a "d" reg, but I think that's just a micro-optimization and not worth the extra complexity in the pattern. Thanks for the suggestion! Changes since v1: * Use v,?wa constraints rather than v,v,?d,?d. * Update git log entry and ChangeLog text. When optimizing the DGEMM kernel in OpenBLAS to use MMA, the MMA code uses all 8 accumulators, which overlap all vs0-vs31 vector registers. Current trunk assigns one of the normal vector inputs to one of the MMA instructions, which forces us to spill one of the accumulators to memory, leading to poor performance. The solution here is to replace the "wa" constraints for the vector input operands in the MMA instruction patterns with "v,?wa" so that we prefer using the altivec registers vs32-vs63 over the vs0-vs31 registers. Bootstrap and regtesting on powerpc64le-linux showed no regressions. Ok for trunk and the GCC12 release branch after some burn-in time on trunk? Technically, the same issue exists in GCC11 and GCC10, but the RA assignment is OK with the current code, so unless/until we have a test case that exhibits the issue, I'm only asking for a backport to GCC12 which does show the performance problem. Peter gcc/ PR target/105556 * config/rs6000/mma.md (mma_, mma_, mma_, mma_, mma_, mma_, mma_, mma_, mma_, mma_, mma_, mma_, mma_, mma_): Replace "wa" constraints with "v,?wa". Update other operands accordingly. diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md index 907c9d6d516..a183b6a168a 100644 --- a/gcc/config/rs6000/mma.md +++ b/gcc/config/rs6000/mma.md @@ -490,50 +490,50 @@ (define_insn "mma_xxsetaccz" [(set_attr "type" "mma")]) (define_insn "mma_" - [(set (match_operand:XO 0 "fpr_reg_operand" "=&d") - (unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "wa") - (match_operand:V16QI 2 "vsx_register_operand" "wa")] + [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d") + (unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "v,?wa") + (match_operand:V16QI 2 "vsx_register_operand" "v,?wa")] MMA_VV))] "TARGET_MMA" " %A0,%x1,%x2" [(set_attr "type" "mma")]) (define_insn "mma_" - [(set (match_operand:XO 0 "fpr_reg_operand" "=&d") - (unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0") - (match_operand:V16QI 2 "vsx_register_operand" "wa") - (match_operand:V16QI 3 "vsx_register_operand" "wa")] + [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d") + (unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0,0") + (match_operand:V16QI 2 "vsx_register_operand" "v,?wa") + (match_operand:V16QI 3 "vsx_register_operand" "v,?wa")] MMA_AVV))] "TARGET_MMA" " %A0,%x2,%x3" [(set_attr "type" "mma")]) (define_insn "mma_" - [(set (match_operand:XO 0 "fpr_reg_operand" "=&d") - (unspec:XO [(match_operand:OO 1 "vsx_register_operand" "wa") - (match_operand:V16QI 2 "vsx_register_operand" "wa")] + [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d") + (unspec:XO [(match_operand:OO 1 "vsx_register_operand" "v,?wa") + (match_operand:V16QI 2 "vsx_register_operand" "v,?wa")] MMA_PV))] "TARGET_MMA" " %A0,%x1,%x2" [(set_attr "type" "mma")]) (define_insn "mma_" - [(set (match_operand:XO 0 "fpr_reg_operand" "=&d") - (unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0") - (match_operand:OO 2 "vsx_register_operand" "wa") - (match_operand:V16QI 3 "vsx_register_operand" "wa")] + [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d") + (unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0,0") + (match_operand:OO 2 "vsx_register_operand" "v,?wa") + (match_operand:V16QI 3 "vsx_register_operand" "v,?wa")] MMA_APV))] "TARGET_MMA" " %A0,%x2,%x3" [(set_attr "type" "mma")]) (define_insn "mma_" - [(set (match_operand:XO 0 "fpr_reg_operand" "=&d") - (unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "wa") - (match_operand:V16QI 2 "vsx_register_operand" "wa") - (match_operand:SI 3 "const_0_to_15_operand" "n") - (match_operand:SI 4 "const_0_to_15_operand" "n") - (match_operand:SI 5 "u8bit_cint_operand" "n")] + [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d") + (unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "v,?wa") + (match_operand:V16QI 2 "vsx_register_operand" "v,?wa") + (match_operand:SI 3 "const_0_to_15_operand" "n,n") + (match_operand:SI 4 "const_0_to_15_operand" "n,n") + (match_operand:SI 5 "u8bit_cint_operand" "n,n")] MMA_VVI4I4I8))] "TARGET_MMA" " %A0,%x1,%x2,%3,%4,%5" @@ -541,13 +541,13 @@ (define_insn "mma_" (set_attr "prefixed" "yes")]) (define_insn "mma_" - [(set (match_operand:XO 0 "fpr_reg_operand" "=&d") - (unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0") - (match_operand:V16QI 2 "vsx_register_operand" "wa") - (match_operand:V16QI 3 "vsx_register_operand" "wa") - (match_operand:SI 4 "const_0_to_15_operand" "n") - (match_operand:SI 5 "const_0_to_15_operand" "n") - (match_operand:SI 6 "u8bit_cint_operand" "n")] + [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d") + (unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0,0") + (match_operand:V16QI 2 "vsx_register_operand" "v,?wa") + (match_operand:V16QI 3 "vsx_register_operand" "v,?wa") + (match_operand:SI 4 "const_0_to_15_operand" "n,n") + (match_operand:SI 5 "const_0_to_15_operand" "n,n") + (match_operand:SI 6 "u8bit_cint_operand" "n,n")] MMA_AVVI4I4I8))] "TARGET_MMA" " %A0,%x2,%x3,%4,%5,%6" @@ -555,12 +555,12 @@ (define_insn "mma_" (set_attr "prefixed" "yes")]) (define_insn "mma_" - [(set (match_operand:XO 0 "fpr_reg_operand" "=&d") - (unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "wa") - (match_operand:V16QI 2 "vsx_register_operand" "wa") - (match_operand:SI 3 "const_0_to_15_operand" "n") - (match_operand:SI 4 "const_0_to_15_operand" "n") - (match_operand:SI 5 "const_0_to_3_operand" "n")] + [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d") + (unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "v,?wa") + (match_operand:V16QI 2 "vsx_register_operand" "v,?wa") + (match_operand:SI 3 "const_0_to_15_operand" "n,n") + (match_operand:SI 4 "const_0_to_15_operand" "n,n") + (match_operand:SI 5 "const_0_to_3_operand" "n,n")] MMA_VVI4I4I2))] "TARGET_MMA" " %A0,%x1,%x2,%3,%4,%5" @@ -568,13 +568,13 @@ (define_insn "mma_" (set_attr "prefixed" "yes")]) (define_insn "mma_" - [(set (match_operand:XO 0 "fpr_reg_operand" "=&d") - (unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0") - (match_operand:V16QI 2 "vsx_register_operand" "wa") - (match_operand:V16QI 3 "vsx_register_operand" "wa") - (match_operand:SI 4 "const_0_to_15_operand" "n") - (match_operand:SI 5 "const_0_to_15_operand" "n") - (match_operand:SI 6 "const_0_to_3_operand" "n")] + [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d") + (unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0,0") + (match_operand:V16QI 2 "vsx_register_operand" "v,?wa") + (match_operand:V16QI 3 "vsx_register_operand" "v,?wa") + (match_operand:SI 4 "const_0_to_15_operand" "n,n") + (match_operand:SI 5 "const_0_to_15_operand" "n,n") + (match_operand:SI 6 "const_0_to_3_operand" "n,n")] MMA_AVVI4I4I2))] "TARGET_MMA" " %A0,%x2,%x3,%4,%5,%6" @@ -582,11 +582,11 @@ (define_insn "mma_" (set_attr "prefixed" "yes")]) (define_insn "mma_" - [(set (match_operand:XO 0 "fpr_reg_operand" "=&d") - (unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "wa") - (match_operand:V16QI 2 "vsx_register_operand" "wa") - (match_operand:SI 3 "const_0_to_15_operand" "n") - (match_operand:SI 4 "const_0_to_15_operand" "n")] + [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d") + (unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "v,?wa") + (match_operand:V16QI 2 "vsx_register_operand" "v,?wa") + (match_operand:SI 3 "const_0_to_15_operand" "n,n") + (match_operand:SI 4 "const_0_to_15_operand" "n,n")] MMA_VVI4I4))] "TARGET_MMA" " %A0,%x1,%x2,%3,%4" @@ -594,12 +594,12 @@ (define_insn "mma_" (set_attr "prefixed" "yes")]) (define_insn "mma_" - [(set (match_operand:XO 0 "fpr_reg_operand" "=&d") - (unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0") - (match_operand:V16QI 2 "vsx_register_operand" "wa") - (match_operand:V16QI 3 "vsx_register_operand" "wa") - (match_operand:SI 4 "const_0_to_15_operand" "n") - (match_operand:SI 5 "const_0_to_15_operand" "n")] + [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d") + (unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0,0") + (match_operand:V16QI 2 "vsx_register_operand" "v,?wa") + (match_operand:V16QI 3 "vsx_register_operand" "v,?wa") + (match_operand:SI 4 "const_0_to_15_operand" "n,n") + (match_operand:SI 5 "const_0_to_15_operand" "n,n")] MMA_AVVI4I4))] "TARGET_MMA" " %A0,%x2,%x3,%4,%5" @@ -607,11 +607,11 @@ (define_insn "mma_" (set_attr "prefixed" "yes")]) (define_insn "mma_" - [(set (match_operand:XO 0 "fpr_reg_operand" "=&d") - (unspec:XO [(match_operand:OO 1 "vsx_register_operand" "wa") - (match_operand:V16QI 2 "vsx_register_operand" "wa") - (match_operand:SI 3 "const_0_to_15_operand" "n") - (match_operand:SI 4 "const_0_to_3_operand" "n")] + [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d") + (unspec:XO [(match_operand:OO 1 "vsx_register_operand" "v,?wa") + (match_operand:V16QI 2 "vsx_register_operand" "v,?wa") + (match_operand:SI 3 "const_0_to_15_operand" "n,n") + (match_operand:SI 4 "const_0_to_3_operand" "n,n")] MMA_PVI4I2))] "TARGET_MMA" " %A0,%x1,%x2,%3,%4" @@ -619,12 +619,12 @@ (define_insn "mma_" (set_attr "prefixed" "yes")]) (define_insn "mma_" - [(set (match_operand:XO 0 "fpr_reg_operand" "=&d") - (unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0") - (match_operand:OO 2 "vsx_register_operand" "wa") - (match_operand:V16QI 3 "vsx_register_operand" "wa") - (match_operand:SI 4 "const_0_to_15_operand" "n") - (match_operand:SI 5 "const_0_to_3_operand" "n")] + [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d") + (unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0,0") + (match_operand:OO 2 "vsx_register_operand" "v,?wa") + (match_operand:V16QI 3 "vsx_register_operand" "v,?wa") + (match_operand:SI 4 "const_0_to_15_operand" "n,n") + (match_operand:SI 5 "const_0_to_3_operand" "n,n")] MMA_APVI4I2))] "TARGET_MMA" " %A0,%x2,%x3,%4,%5" @@ -632,12 +632,12 @@ (define_insn "mma_" (set_attr "prefixed" "yes")]) (define_insn "mma_" - [(set (match_operand:XO 0 "fpr_reg_operand" "=&d") - (unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "wa") - (match_operand:V16QI 2 "vsx_register_operand" "wa") - (match_operand:SI 3 "const_0_to_15_operand" "n") - (match_operand:SI 4 "const_0_to_15_operand" "n") - (match_operand:SI 5 "const_0_to_15_operand" "n")] + [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d") + (unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "v,?wa") + (match_operand:V16QI 2 "vsx_register_operand" "v,?wa") + (match_operand:SI 3 "const_0_to_15_operand" "n,n") + (match_operand:SI 4 "const_0_to_15_operand" "n,n") + (match_operand:SI 5 "const_0_to_15_operand" "n,n")] MMA_VVI4I4I4))] "TARGET_MMA" " %A0,%x1,%x2,%3,%4,%5" @@ -645,13 +645,13 @@ (define_insn "mma_" (set_attr "prefixed" "yes")]) (define_insn "mma_" - [(set (match_operand:XO 0 "fpr_reg_operand" "=&d") - (unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0") - (match_operand:V16QI 2 "vsx_register_operand" "wa") - (match_operand:V16QI 3 "vsx_register_operand" "wa") - (match_operand:SI 4 "const_0_to_15_operand" "n") - (match_operand:SI 5 "const_0_to_15_operand" "n") - (match_operand:SI 6 "const_0_to_15_operand" "n")] + [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d") + (unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0,0") + (match_operand:V16QI 2 "vsx_register_operand" "v,?wa") + (match_operand:V16QI 3 "vsx_register_operand" "v,?wa") + (match_operand:SI 4 "const_0_to_15_operand" "n,n") + (match_operand:SI 5 "const_0_to_15_operand" "n,n") + (match_operand:SI 6 "const_0_to_15_operand" "n,n")] MMA_AVVI4I4I4))] "TARGET_MMA" " %A0,%x2,%x3,%4,%5,%6"