From patchwork Mon Oct 28 19:36:17 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Michael Meissner X-Patchwork-Id: 2003474 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=ibm.com header.i=@ibm.com header.a=rsa-sha256 header.s=pp1 header.b=UgoS3ZVa; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=8.43.85.97; helo=server2.sourceware.org; envelope-from=gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [8.43.85.97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4XckCw2Xjfz1xwK for ; Tue, 29 Oct 2024 06:36:44 +1100 (AEDT) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 911CA3858C35 for ; Mon, 28 Oct 2024 19:36:42 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) by sourceware.org (Postfix) with ESMTPS id F0F273858D20 for ; Mon, 28 Oct 2024 19:36:23 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org F0F273858D20 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=linux.ibm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=linux.ibm.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org F0F273858D20 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=148.163.156.1 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1730144186; 
cv=none; b=MrHRMYCulXSDHnWeT2i+PqdwtgwE2V3Qfy+KATEvBaaewhGb2TaqRM8T5JAvxBn7kuGhNqlVIH2QUBWAqlME9WiPAeecIcWYtQTonJsZQ+pEsV50eMtICL3zHdh9IlZbk+wN6VWgnbo55dA4ISDmNp+qyLKTYjLGfmxTiW7NiBU= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1730144186; c=relaxed/simple; bh=grJmzlVw7oSUIW//2C/sYKPCM2Turi+7DxFNnKyAJFs=; h=DKIM-Signature:Date:From:To:Subject:Message-ID:MIME-Version; b=jtIn1rs413Ku/JDXjsyVEObKNb2LYtcGnFUm4jsWBOE/4wqIfXDwJTOCpAHvNf3Gp1JyEYC0I1DLtUx1gX81gWOtzhYe8v0ABHjM5Z4yl6mEjSr6fBioTBxtsd1yZP4CczToLCdSuRN9NwsIVNo62Nd6OBVB6ZwjpoFYs1vvUkg= ARC-Authentication-Results: i=1; server2.sourceware.org Received: from pps.filterd (m0356517.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.18.1.2/8.18.1.2) with ESMTP id 49SHnN9i014734; Mon, 28 Oct 2024 19:36:22 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h= content-type:date:from:in-reply-to:message-id:mime-version :references:subject:to; s=pp1; bh=b3U/O3JSqDBQOKHNgbW16c6SRed+uI dblT1PadrPxPA=; b=UgoS3ZVamUwE6N58uo1VIY5Kb97vyZzn1dvD+W+zAkB3Nl YyaFOhVXjxKkB4UNwT2JhdnIwVHudUjBwWZvPg4RQggj8/kfDBMQ8ZYDgnBIeHZ1 RqqJmIpIGl3fAuFJwVJ8H9G4HjA5ZbRhtYxsE7dk7O5TArEFAC/qoS0UeuoFWLP9 tzwq3FYU+HUZ2c6fCCOjsvVN8bvTkr3YmTq0CAJ+4eSFRtYtPswUC+XSsU9F+xgi WEZ2z4bYv542L/1Y4pMJmO3mGXp5Z2d3M44vHrdtQI0jHqzEHF4SI/eqyujKL0f2 YkD1VgycFIyZuI3ubWZZTLSwifnpcXCd1pynSmCw== Received: from ppma23.wdc07v.mail.ibm.com (5d.69.3da9.ip4.static.sl-reverse.com [169.61.105.93]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 42j3x4cn4a-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Mon, 28 Oct 2024 19:36:21 +0000 (GMT) Received: from pps.filterd (ppma23.wdc07v.mail.ibm.com [127.0.0.1]) by ppma23.wdc07v.mail.ibm.com (8.18.1.2/8.18.1.2) with ESMTP id 49SGVrfu018803; Mon, 28 Oct 2024 19:36:20 GMT Received: from smtprelay05.wdc07v.mail.ibm.com ([172.16.1.72]) by ppma23.wdc07v.mail.ibm.com (PPS) with ESMTPS id 42hc8jynwm-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 
verify=NOT); Mon, 28 Oct 2024 19:36:20 +0000 Received: from smtpav01.dal12v.mail.ibm.com (smtpav01.dal12v.mail.ibm.com [10.241.53.100]) by smtprelay05.wdc07v.mail.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 49SJaJ9D26608250 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Mon, 28 Oct 2024 19:36:19 GMT Received: from smtpav01.dal12v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 3B2AB58057; Mon, 28 Oct 2024 19:36:19 +0000 (GMT) Received: from smtpav01.dal12v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id D9F2358059; Mon, 28 Oct 2024 19:36:18 +0000 (GMT) Received: from cowardly-lion.the-meissners.org (unknown [9.61.80.9]) by smtpav01.dal12v.mail.ibm.com (Postfix) with ESMTPS; Mon, 28 Oct 2024 19:36:18 +0000 (GMT) Date: Mon, 28 Oct 2024 15:36:17 -0400 From: Michael Meissner To: Michael Meissner , gcc-patches@gcc.gnu.org, Segher Boessenkool , Peter Bergner Subject: [PATCH 1/6] Use vector pair load/store for memcpy with -mcpu=future Message-ID: Mail-Followup-To: Michael Meissner , gcc-patches@gcc.gnu.org, Segher Boessenkool , Peter Bergner References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: X-TM-AS-GCONF: 00 X-Proofpoint-GUID: g6Ho2-XoPYQaefH2pp9AkbVPNAJrwWdd X-Proofpoint-ORIG-GUID: g6Ho2-XoPYQaefH2pp9AkbVPNAJrwWdd X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.293,Aquarius:18.0.1051,Hydra:6.0.680,FMLib:17.12.62.30 definitions=2024-10-15_01,2024-10-11_01,2024-09-30_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 impostorscore=0 adultscore=0 mlxscore=0 suspectscore=0 clxscore=1015 mlxlogscore=999 malwarescore=0 spamscore=0 priorityscore=1501 phishscore=0 bulkscore=0 lowpriorityscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.19.0-2409260000 definitions=main-2410280152 X-Spam-Status: No, score=-10.9 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_DNSWL_LOW, RCVD_IN_MSPIKE_H3, 
RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org In the development for the power10 processor, GCC did not enable using the load vector pair and store vector pair instructions when optimizing things like memory copy. This patch enables using those instructions if -mcpu=future is used. 2024-10-22 Michael Meissner gcc/ * config/rs6000/rs6000-cpus.def (ISA_FUTURE_MASKS_SERVER): Enable using load vector pair and store vector pair instructions for memory copy operations. (POWERPC_MASKS): Make the bit for enabling using load vector pair and store vector pair operations set and reset when the PowerPC processor is changed. --- gcc/config/rs6000/rs6000-cpus.def | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/gcc/config/rs6000/rs6000-cpus.def b/gcc/config/rs6000/rs6000-cpus.def index e73d9ef51f8..74151be4048 100644 --- a/gcc/config/rs6000/rs6000-cpus.def +++ b/gcc/config/rs6000/rs6000-cpus.def @@ -86,7 +86,8 @@ #define POWER11_MASKS_SERVER ISA_3_1_MASKS_SERVER -#define FUTURE_MASKS_SERVER POWER11_MASKS_SERVER +#define FUTURE_MASKS_SERVER (POWER11_MASKS_SERVER \ + | OPTION_MASK_BLOCK_OPS_VECTOR_PAIR) /* Flags that need to be turned off if -mno-vsx. */ #define OTHER_VSX_VECTOR_MASKS (OPTION_MASK_EFFICIENT_UNALIGNED_VSX \ @@ -116,6 +117,7 @@ /* Mask of all options to set the default isa flags based on -mcpu=. 
*/ #define POWERPC_MASKS (OPTION_MASK_ALTIVEC \ + | OPTION_MASK_BLOCK_OPS_VECTOR_PAIR \ | OPTION_MASK_CMPB \ | OPTION_MASK_CRYPTO \ | OPTION_MASK_DFP \ From patchwork Mon Oct 28 19:37:16 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Michael Meissner X-Patchwork-Id: 2003475 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=ibm.com header.i=@ibm.com header.a=rsa-sha256 header.s=pp1 header.b=hDzV3Gil; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org; envelope-from=gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4XckF22V2cz1xwK for ; Tue, 29 Oct 2024 06:37:42 +1100 (AEDT) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 896E73858403 for ; Mon, 28 Oct 2024 19:37:40 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mx0b-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com [148.163.158.5]) by sourceware.org (Postfix) with ESMTPS id 441E23858D20 for ; Mon, 28 Oct 2024 19:37:21 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 441E23858D20 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=linux.ibm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=linux.ibm.com ARC-Filter: OpenARC Filter 
v1.0.0 sourceware.org 441E23858D20 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=148.163.158.5 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1730144243; cv=none; b=w6exWdJ8qbdvIrDxuAjC+SqFvaLxJeCDteDHAMnWtgAYApy3t+QrZIMwTJNuOIF+CM9pGyRr+EcKdsUnwHVfzCs/3FWT+NYF5Hf5ZLmkbDaTa4Xoy4JjjK48gZse7KnKtZEUuX5fRv9MhSvuMH6wfXqez8Iyzx0+vTl2zL54xaI= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1730144243; c=relaxed/simple; bh=PsJHtgsS/Yc/6DwOQUZoY3IcKUFN5DN6Lot7kJUSIV4=; h=DKIM-Signature:Date:From:To:Subject:Message-ID:MIME-Version; b=taOAShvzCObFgJiryLTktdQkOiokSDJdRpB2cdvJPqVEHzhcpZWf4yV3m5OQjyBMwjXJRcTAS9Mcu4gKn9VENHgRhZc22ZeIOHDqGuiTHE+q+V7519ofpJqoI06RjXvrKjTsoRNtDK1ZYriO66dz1yYYiJ1+g8VCtedyDy/c8lw= ARC-Authentication-Results: i=1; server2.sourceware.org Received: from pps.filterd (m0360072.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.18.1.2/8.18.1.2) with ESMTP id 49SHnPLj006196; Mon, 28 Oct 2024 19:37:20 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h= content-type:date:from:in-reply-to:message-id:mime-version :references:subject:to; s=pp1; bh=2jYpnSRgs1sy0LZSCCQzKFaB/INgZd 1eymxFolpuupA=; b=hDzV3Gil/fduyPyJ09abSefApsanFSAWNvSIO1m1EoM/83 eDEmreRzOUzZCuLPysUlqiN5Uxf3OGDb2/5R7O13gQzRR8lphFwTqm3tvPNuNV4P TAPoyYEPygVi6KNTzDnCL2WdlYzrJdt2LsQqdSCV/Tpd9Ps1gZ8HnKCeACR2qHkD kCvtf1zXn3ZbJezCsEZY9GgYnRnAPlnDdfgbs05AimrOj7sJ9W3VwwdX+Nf4gRWL qlbdTR8PQ1nGxJxOfCwNRd5HK4ajPsABRwB5KPE3OA34EETbt32BtKO/8t5B2HYE 6JOsBm6G1s9U3w0F0KixACjfIPDHKo8fqK1WDnBA== Received: from ppma21.wdc07v.mail.ibm.com (5b.69.3da9.ip4.static.sl-reverse.com [169.61.105.91]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 42j43ebvxv-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Mon, 28 Oct 2024 19:37:20 +0000 (GMT) Received: from pps.filterd (ppma21.wdc07v.mail.ibm.com [127.0.0.1]) by ppma21.wdc07v.mail.ibm.com (8.18.1.2/8.18.1.2) with ESMTP id 49SGLvlH013670; Mon, 28 Oct 2024 19:37:19 GMT 
Received: from smtprelay02.dal12v.mail.ibm.com ([172.16.1.4]) by ppma21.wdc07v.mail.ibm.com (PPS) with ESMTPS id 42hbrmqrej-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Mon, 28 Oct 2024 19:37:19 +0000 Received: from smtpav03.dal12v.mail.ibm.com (smtpav03.dal12v.mail.ibm.com [10.241.53.102]) by smtprelay02.dal12v.mail.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 49SJbIYp37159334 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Mon, 28 Oct 2024 19:37:18 GMT Received: from smtpav03.dal12v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id BC23E5805A; Mon, 28 Oct 2024 19:37:18 +0000 (GMT) Received: from smtpav03.dal12v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 6584B5803F; Mon, 28 Oct 2024 19:37:18 +0000 (GMT) Received: from cowardly-lion.the-meissners.org (unknown [9.61.80.9]) by smtpav03.dal12v.mail.ibm.com (Postfix) with ESMTPS; Mon, 28 Oct 2024 19:37:18 +0000 (GMT) Date: Mon, 28 Oct 2024 15:37:16 -0400 From: Michael Meissner To: Michael Meissner , gcc-patches@gcc.gnu.org, Segher Boessenkool , Peter Bergner Subject: [PATCH 2/6] Add wD constraint. 
Message-ID: Mail-Followup-To: Michael Meissner , gcc-patches@gcc.gnu.org, Segher Boessenkool , Peter Bergner References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: X-TM-AS-GCONF: 00 X-Proofpoint-GUID: 1na8LeY04ULfZDUBhnSpr_PnLZ1BPles X-Proofpoint-ORIG-GUID: 1na8LeY04ULfZDUBhnSpr_PnLZ1BPles X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.293,Aquarius:18.0.1051,Hydra:6.0.680,FMLib:17.12.62.30 definitions=2024-10-15_01,2024-10-11_01,2024-09-30_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 mlxscore=0 suspectscore=0 bulkscore=0 impostorscore=0 malwarescore=0 clxscore=1015 priorityscore=1501 mlxlogscore=999 lowpriorityscore=0 adultscore=0 phishscore=0 spamscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.19.0-2409260000 definitions=main-2410280152 X-Spam-Status: No, score=-11.2 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_DNSWL_LOW, RCVD_IN_MSPIKE_H3, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org

This patch adds a new constraint ('wD') that matches the accumulator registers that overlap with VSX registers 0..31 on power10. Future patches will add the support for a separate accumulator register class that will be used when the support for dense math registers is added.

2024-10-22 Michael Meissner

* config/rs6000/constraints.md (wD): New constraint.
* config/rs6000/mma.md (mma_): Prepare for alternate accumulator registers. Use wD constraint instead of 'd' constraint. Use accumulator_operand instead of fpr_reg_operand. (mma_): Likewise. (mma_): Likewise.
(mma_): Likewise. (mma_): Likewise. (mma_): Likewise. (mma_): Likewise. (mma_): Likewise. (mma_): Likewise. (mma_): Likewise. (mma_): Likewise. (mma_): Likewise. (mma_): Likewise. (mma_" - [(set (match_operand:XO 0 "fpr_reg_operand" "=&d") - (unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0")] + [(set (match_operand:XO 0 "accumulator_operand" "=&wD") + (unspec:XO [(match_operand:XO 1 "accumulator_operand" "0")] MMA_ACC))] "TARGET_MMA" " %A0" @@ -523,7 +523,7 @@ (define_insn "mma_xxsetaccz" [(set_attr "type" "mma")]) (define_insn "mma_" - [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d") + [(set (match_operand:XO 0 "accumulator_operand" "=&wD,&wD") (unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "v,?wa") (match_operand:V16QI 2 "vsx_register_operand" "v,?wa")] MMA_VV))] @@ -532,8 +532,8 @@ (define_insn "mma_" [(set_attr "type" "mma")]) (define_insn "mma_" - [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d") - (unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0,0") + [(set (match_operand:XO 0 "accumulator_operand" "=&wD,&wD") + (unspec:XO [(match_operand:XO 1 "accumulator_operand" "0,0") (match_operand:V16QI 2 "vsx_register_operand" "v,?wa") (match_operand:V16QI 3 "vsx_register_operand" "v,?wa")] MMA_AVV))] @@ -542,7 +542,7 @@ (define_insn "mma_" [(set_attr "type" "mma")]) (define_insn "mma_" - [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d") + [(set (match_operand:XO 0 "accumulator_operand" "=&wD,&wD") (unspec:XO [(match_operand:OO 1 "vsx_register_operand" "v,?wa") (match_operand:V16QI 2 "vsx_register_operand" "v,?wa")] MMA_PV))] @@ -551,8 +551,8 @@ (define_insn "mma_" [(set_attr "type" "mma")]) (define_insn "mma_" - [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d") - (unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0,0") + [(set (match_operand:XO 0 "accumulator_operand" "=&wD,&wD") + (unspec:XO [(match_operand:XO 1 "accumulator_operand" "0,0") (match_operand:OO 2 "vsx_register_operand" "v,?wa") (match_operand:V16QI 3 
"vsx_register_operand" "v,?wa")] MMA_APV))] @@ -561,7 +561,7 @@ (define_insn "mma_" [(set_attr "type" "mma")]) (define_insn "mma_" - [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d") + [(set (match_operand:XO 0 "accumulator_operand" "=&wD,&wD") (unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "v,?wa") (match_operand:V16QI 2 "vsx_register_operand" "v,?wa") (match_operand:SI 3 "const_0_to_15_operand" "n,n") @@ -574,8 +574,8 @@ (define_insn "mma_" (set_attr "prefixed" "yes")]) (define_insn "mma_" - [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d") - (unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0,0") + [(set (match_operand:XO 0 "accumulator_operand" "=&wD,&wD") + (unspec:XO [(match_operand:XO 1 "accumulator_operand" "0,0") (match_operand:V16QI 2 "vsx_register_operand" "v,?wa") (match_operand:V16QI 3 "vsx_register_operand" "v,?wa") (match_operand:SI 4 "const_0_to_15_operand" "n,n") @@ -588,7 +588,7 @@ (define_insn "mma_" (set_attr "prefixed" "yes")]) (define_insn "mma_" - [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d") + [(set (match_operand:XO 0 "accumulator_operand" "=&wD,&wD") (unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "v,?wa") (match_operand:V16QI 2 "vsx_register_operand" "v,?wa") (match_operand:SI 3 "const_0_to_15_operand" "n,n") @@ -601,8 +601,8 @@ (define_insn "mma_" (set_attr "prefixed" "yes")]) (define_insn "mma_" - [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d") - (unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0,0") + [(set (match_operand:XO 0 "accumulator_operand" "=&wD,&wD") + (unspec:XO [(match_operand:XO 1 "accumulator_operand" "0,0") (match_operand:V16QI 2 "vsx_register_operand" "v,?wa") (match_operand:V16QI 3 "vsx_register_operand" "v,?wa") (match_operand:SI 4 "const_0_to_15_operand" "n,n") @@ -615,7 +615,7 @@ (define_insn "mma_" (set_attr "prefixed" "yes")]) (define_insn "mma_" - [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d") + [(set (match_operand:XO 0 "accumulator_operand" "=&wD,&wD") 
(unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "v,?wa") (match_operand:V16QI 2 "vsx_register_operand" "v,?wa") (match_operand:SI 3 "const_0_to_15_operand" "n,n") @@ -627,8 +627,8 @@ (define_insn "mma_" (set_attr "prefixed" "yes")]) (define_insn "mma_" - [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d") - (unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0,0") + [(set (match_operand:XO 0 "accumulator_operand" "=&wD,&wD") + (unspec:XO [(match_operand:XO 1 "accumulator_operand" "0,0") (match_operand:V16QI 2 "vsx_register_operand" "v,?wa") (match_operand:V16QI 3 "vsx_register_operand" "v,?wa") (match_operand:SI 4 "const_0_to_15_operand" "n,n") @@ -640,7 +640,7 @@ (define_insn "mma_" (set_attr "prefixed" "yes")]) (define_insn "mma_" - [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d") + [(set (match_operand:XO 0 "accumulator_operand" "=&wD,&wD") (unspec:XO [(match_operand:OO 1 "vsx_register_operand" "v,?wa") (match_operand:V16QI 2 "vsx_register_operand" "v,?wa") (match_operand:SI 3 "const_0_to_15_operand" "n,n") @@ -652,8 +652,8 @@ (define_insn "mma_" (set_attr "prefixed" "yes")]) (define_insn "mma_" - [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d") - (unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0,0") + [(set (match_operand:XO 0 "accumulator_operand" "=&wD,&wD") + (unspec:XO [(match_operand:XO 1 "accumulator_operand" "0,0") (match_operand:OO 2 "vsx_register_operand" "v,?wa") (match_operand:V16QI 3 "vsx_register_operand" "v,?wa") (match_operand:SI 4 "const_0_to_15_operand" "n,n") @@ -665,7 +665,7 @@ (define_insn "mma_" (set_attr "prefixed" "yes")]) (define_insn "mma_" - [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d") + [(set (match_operand:XO 0 "accumulator_operand" "=&wD,&wD") (unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "v,?wa") (match_operand:V16QI 2 "vsx_register_operand" "v,?wa") (match_operand:SI 3 "const_0_to_15_operand" "n,n") @@ -678,8 +678,8 @@ (define_insn "mma_" (set_attr "prefixed" "yes")]) (define_insn 
"mma_" - [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d") - (unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0,0") + [(set (match_operand:XO 0 "accumulator_operand" "=&wD,&wD") + (unspec:XO [(match_operand:XO 1 "accumulator_operand" "0,0") (match_operand:V16QI 2 "vsx_register_operand" "v,?wa") (match_operand:V16QI 3 "vsx_register_operand" "v,?wa") (match_operand:SI 4 "const_0_to_15_operand" "n,n") diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md index 0b78901e94b..1827647b7c1 100644 --- a/gcc/config/rs6000/predicates.md +++ b/gcc/config/rs6000/predicates.md @@ -186,6 +186,21 @@ (define_predicate "vlogical_operand" return VLOGICAL_REGNO_P (REGNO (op)); }) +;; Return 1 if op is an accumulator. On power10 systems, the accumulators +;; overlap with the FPRs. +(define_predicate "accumulator_operand" + (match_operand 0 "register_operand") +{ + if (!REG_P (op)) + return 0; + + if (!HARD_REGISTER_P (op)) + return 1; + + int r = REGNO (op); + return FP_REGNO_P (r) && (r & 3) == 0; +}) + ;; Return 1 if op is the carry register. 
(define_predicate "ca_operand" (match_operand 0 "register_operand") diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc index aa67e7256bb..9e9342d4793 100644 --- a/gcc/config/rs6000/rs6000.cc +++ b/gcc/config/rs6000/rs6000.cc @@ -2411,6 +2411,7 @@ rs6000_debug_reg_global (void) "wr reg_class = %s\n" "wx reg_class = %s\n" "wA reg_class = %s\n" + "wD reg_class = %s\n" "\n", reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_d]], reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_v]], @@ -2418,7 +2419,8 @@ rs6000_debug_reg_global (void) reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_we]], reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wr]], reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wx]], - reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wA]]); + reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wA]], + reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wD]]); nl = "\n"; for (m = 0; m < NUM_MACHINE_MODES; ++m) @@ -3081,6 +3083,9 @@ rs6000_init_hard_regno_mode_ok (bool global_init_p) if (TARGET_DIRECT_MOVE_128) rs6000_constraints[RS6000_CONSTRAINT_we] = VSX_REGS; + if (TARGET_MMA) + rs6000_constraints[RS6000_CONSTRAINT_wD] = FLOAT_REGS; + /* Set up the reload helper and direct move functions. */ if (TARGET_VSX || TARGET_ALTIVEC) { diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h index 8cfd9faf77d..07a372b8902 100644 --- a/gcc/config/rs6000/rs6000.h +++ b/gcc/config/rs6000/rs6000.h @@ -1203,6 +1203,7 @@ enum r6000_reg_class_enum { RS6000_CONSTRAINT_wr, /* GPR register if 64-bit */ RS6000_CONSTRAINT_wx, /* FPR register for STFIWX */ RS6000_CONSTRAINT_wA, /* BASE_REGS if 64-bit. */ + RS6000_CONSTRAINT_wD, /* Accumulator regs if MMA/Dense Math. */ RS6000_CONSTRAINT_MAX }; diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi index 6d9c8643739..569172031fd 100644 --- a/gcc/doc/md.texi +++ b/gcc/doc/md.texi @@ -3440,6 +3440,11 @@ Like @code{d}, if @option{-mpowerpc-gfxopt} is used; otherwise, @code{NO_REGS}. 
@item wA Like @code{b}, if @option{-mpowerpc64} is used; otherwise, @code{NO_REGS}. +@item wD +Accumulator register if @option{-mma} is used; otherwise, +@code{NO_REGS}. For @option{-mcpu=power10} the accumulator registers +overlap with VSX vector registers 0..31. + @item wB Signed 5-bit constant integer that can be loaded into an Altivec register. From patchwork Mon Oct 28 19:38:37 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Michael Meissner X-Patchwork-Id: 2003478 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=ibm.com header.i=@ibm.com header.a=rsa-sha256 header.s=pp1 header.b=DgGVfB8g; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org; envelope-from=gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4XckH36rPdz1xwK for ; Tue, 29 Oct 2024 06:39:27 +1100 (AEDT) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 1CDDF3858C52 for ; Mon, 28 Oct 2024 19:39:26 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) by sourceware.org (Postfix) with ESMTPS id 39DBC3858D20 for ; Mon, 28 Oct 2024 19:38:43 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 39DBC3858D20 
Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=linux.ibm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=linux.ibm.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 39DBC3858D20 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=148.163.156.1 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1730144340; cv=none; b=kcnYBu0QVWsVHps0OeUQ6SnVbxpivOgVCkgoCGHxuVrkXVQAXrQLPqgxfwC/z6EWer/FBo/kvrTrcoqrKRYirQTaP375JqbzqyUY9dgrWPc3MTPkzHzUG+xz1QgjVrXPuy8w7tnISygJN3Adazo2dfv5ccSoLE1FtR2+hL7KIxQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1730144340; c=relaxed/simple; bh=G84xPdp2/NEveDsR885eQTO+lhsqj4g1Iat8WEl0V6s=; h=DKIM-Signature:Date:From:To:Subject:Message-ID:MIME-Version; b=pqAIH3iS+n8sxWoaGrU4Co4p5sauOHyVN1hhHj/8Dv3078Yl1g818Uno9a+6OeV8oZbr8Zi8Zwgp/FFVX2KoFJ7wPU/Bi6lZkoDR5Nq6mxZ5rO9M6cIQSQSepvl69suOiV/1Wo3qqVqkpWj49LnwR6UHwe5Dc0c0r6hzbKECfXQ= ARC-Authentication-Results: i=1; server2.sourceware.org Received: from pps.filterd (m0360083.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.18.1.2/8.18.1.2) with ESMTP id 49SHnOfe020521; Mon, 28 Oct 2024 19:38:42 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h= content-type:date:from:in-reply-to:message-id:mime-version :references:subject:to; s=pp1; bh=sZUFUtBoh4Rd08BF2QE2fDTpNXECIT KDWT9ngs2LaJY=; b=DgGVfB8gW6XEHWBxzspQNHH/QCS8+gufsIyP+XJy0WKBsh Jb6N0AvBuAEs4gpbTxzn1ETWaAv6YMwR5J+xzdTzEf4ksDReWCyC/OBoREhhVBFw tPqnKeyZh20p5zRz/n7X0WxsX8xIUYpkg/XnVTim0t8bRZqtHgvjrMc295uXEq6r mfomoutQpRMORunrIAOf2tXHEk4Vw+id6237VVhJqoyrNuI9VNKQcCwl/NmC+jGz Xm+0y2g/1CUelGTu0SC1Yd8BOE9rxsdyrpfYhJVpEHigB18l/GhzcFq216LPHVAZ 0eTZj7dDPpw/yeEo3f4Mdi0b1eayHQ0yMac5UP5w== Received: from ppma22.wdc07v.mail.ibm.com (5c.69.3da9.ip4.static.sl-reverse.com [169.61.105.92]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 42j43fvm5f-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Mon, 28 Oct 2024 
19:38:41 +0000 (GMT) Received: from pps.filterd (ppma22.wdc07v.mail.ibm.com [127.0.0.1]) by ppma22.wdc07v.mail.ibm.com (8.18.1.2/8.18.1.2) with ESMTP id 49SJZfNs028327; Mon, 28 Oct 2024 19:38:40 GMT Received: from smtprelay04.wdc07v.mail.ibm.com ([172.16.1.71]) by ppma22.wdc07v.mail.ibm.com (PPS) with ESMTPS id 42hb4xqvgb-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Mon, 28 Oct 2024 19:38:40 +0000 Received: from smtpav03.wdc07v.mail.ibm.com (smtpav03.wdc07v.mail.ibm.com [10.39.53.230]) by smtprelay04.wdc07v.mail.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 49SJcdaR35258908 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Mon, 28 Oct 2024 19:38:39 GMT Received: from smtpav03.wdc07v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 6B19C5805A; Mon, 28 Oct 2024 19:38:39 +0000 (GMT) Received: from smtpav03.wdc07v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id BFF155805C; Mon, 28 Oct 2024 19:38:38 +0000 (GMT) Received: from cowardly-lion.the-meissners.org (unknown [9.61.80.9]) by smtpav03.wdc07v.mail.ibm.com (Postfix) with ESMTPS; Mon, 28 Oct 2024 19:38:38 +0000 (GMT) Date: Mon, 28 Oct 2024 15:38:37 -0400 From: Michael Meissner To: Michael Meissner , gcc-patches@gcc.gnu.org, Segher Boessenkool , Peter Bergner Subject: [PATCH 3/6] Add support for dense math registers. 
Message-ID: Mail-Followup-To: Michael Meissner , gcc-patches@gcc.gnu.org, Segher Boessenkool , Peter Bergner References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: X-TM-AS-GCONF: 00 X-Proofpoint-GUID: s8_s_NpFezZcVUa1KflJKB6aXO692zsT X-Proofpoint-ORIG-GUID: s8_s_NpFezZcVUa1KflJKB6aXO692zsT X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.293,Aquarius:18.0.1051,Hydra:6.0.680,FMLib:17.12.62.30 definitions=2024-10-15_01,2024-10-11_01,2024-09-30_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 mlxlogscore=999 clxscore=1015 adultscore=0 mlxscore=0 priorityscore=1501 spamscore=0 malwarescore=0 impostorscore=0 lowpriorityscore=0 bulkscore=0 phishscore=0 suspectscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.19.0-2409260000 definitions=main-2410280152 X-Spam-Status: No, score=-11.3 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_DNSWL_LOW, RCVD_IN_MSPIKE_H3, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org

The MMA subsystem added the notion of accumulator registers as an optional feature of ISA 3.1 (power10). In ISA 3.1, these accumulators overlapped with the VSX registers 0..31, but logically the accumulator registers were separate from the FPR registers. In ISA 3.1, it was anticipated that in future systems, the accumulator registers might not overlap with the FPR registers. This patch adds the support for dense math registers as separate registers.

This particular patch does not change the MMA support to use the accumulators within the dense math registers.
This patch just adds the basic support for having separate DMRs. The next patch will switch the MMA support to use the accumulators if -mcpu=future is used.

For testing purposes, I added an undocumented option '-mdense-math' to enable or disable the dense math support.

This patch adds a new constraint (wD). If MMA is selected but dense math is not selected (i.e. -mcpu=power10), the wD constraint will allow access to accumulators that overlap with VSX registers 0..31. If both MMA and dense math are selected (i.e. -mcpu=future), the wD constraint will only allow dense math registers.

This patch modifies the existing %A output modifier. If MMA is selected but dense math is not selected, then the %A output modifier converts the VSX register number to the accumulator number, by dividing it by 4. If both MMA and dense math are selected, then %A will map the separate DMR registers into 0..7.

The intention is that user code using extended asm can be modified to run on both MMA without dense math and MMA with dense math:

1) If possible, don't use extended asm, but instead use the MMA built-in functions;

2) If you do need to write extended asm, change the d constraints targeting accumulators to wD;

3) Only use the built-in zero, assemble and disassemble functions to create or move data between vector quad types and dense math accumulators. I.e. do not use the xxmfacc, xxmtacc, and xxsetaccz instructions directly in the extended asm code. The reason is these instructions assume there is a 1-to-1 correspondence between 4 adjacent FPR registers and an accumulator that overlaps with those registers. With accumulators now being separate registers, there no longer is a 1-to-1 correspondence.

It is possible that the mangling for DMRs and the GDB register numbers may produce other changes in the future.

gcc/

2024-10-28 Michael Meissner

* config/rs6000/mma.md (UNSPEC_MMA_DMSETDMRZ): New unspec. (movxo): Add comments about dense math registers.
	(movxo_nodm): Rename from movxo and restrict the usage to machines
	without dense math registers.
	(movxo_dm): New insn for movxo support for machines with dense math
	registers.
	(mma_): Restrict usage to machines without dense math registers.
	(mma_xxsetaccz): Add a define_expand wrapper, and add support for dense
	math registers.
	(mma_dmsetaccz): New insn.
	* config/rs6000/predicates.md (dmr_operand): New predicate.
	(accumulator_operand): Add support for dense math registers.
	* config/rs6000/rs6000-builtin.cc (rs6000_gimple_fold_mma_builtin): Do
	not issue a de-prime instruction when disassembling a vector quad on a
	system with dense math registers.
	* config/rs6000/rs6000-c.cc (rs6000_define_or_undefine_macro): Define
	__DENSE_MATH__ if we have dense math registers.
	* config/rs6000/rs6000.cc (enum rs6000_reg_type): Add DMR_REG_TYPE.
	(enum rs6000_reload_reg_type): Add RELOAD_REG_DMR.
	(LAST_RELOAD_REG_CLASS): Add support for DMR registers and the wD
	constraint.
	(reload_reg_map): Likewise.
	(rs6000_reg_names): Likewise.
	(alt_reg_names): Likewise.
	(rs6000_hard_regno_nregs_internal): Likewise.
	(rs6000_hard_regno_mode_ok_uncached): Likewise.
	(rs6000_debug_reg_global): Likewise.
	(rs6000_setup_reg_addr_masks): Likewise.
	(rs6000_init_hard_regno_mode_ok): Likewise.
	(rs6000_secondary_reload_memory): Add support for DMR registers.
	(rs6000_secondary_reload_simple_move): Likewise.
	(rs6000_preferred_reload_class): Likewise.
	(rs6000_secondary_reload_class): Likewise.
	(print_operand): Make %A handle both FPRs and DMRs.
	(rs6000_dmr_register_move_cost): New helper function.
	(rs6000_register_move_cost): Add support for DMR registers.
	(rs6000_memory_move_cost): Likewise.
	(rs6000_compute_pressure_classes): Likewise.
	(rs6000_debugger_regno): Likewise.
	(rs6000_split_multireg_move): Add support for DMRs.
	* config/rs6000/rs6000.h (TARGET_DENSE_MATH): New macro.
	(TARGET_MMA_DENSE_MATH): Likewise.
	(TARGET_MMA_NO_DENSE_MATH): Likewise
	(UNITS_PER_DMR_WORD): Likewise.
(FIRST_PSEUDO_REGISTER): Update for DMRs. (FIXED_REGISTERS): Add DMRs. (CALL_REALLY_USED_REGISTERS): Likewise. (REG_ALLOC_ORDER): Likewise. (DMR_REGNO_P): New macro. (enum reg_class): Add DM_REGS. (REG_CLASS_NAMES): Likewise. (REG_CLASS_CONTENTS): Likewise. (enum r6000_reg_class_enum): Add RS6000_CONSTRAINT_wD. (REGISTER_NAMES): Add DMR registers. (ADDITIONAL_REGISTER_NAMES): Likewise. * config/rs6000/rs6000.md (FIRST_DMR_REGNO): New constant. (LAST_DMR_REGNO): Likewise. --- gcc/config/rs6000/mma.md | 74 ++++++++-- gcc/config/rs6000/predicates.md | 21 ++- gcc/config/rs6000/rs6000-builtin.cc | 5 +- gcc/config/rs6000/rs6000-c.cc | 9 +- gcc/config/rs6000/rs6000.cc | 220 ++++++++++++++++++++++------ gcc/config/rs6000/rs6000.h | 43 +++++- gcc/config/rs6000/rs6000.md | 2 + 7 files changed, 311 insertions(+), 63 deletions(-) diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md index e051239df57..ae6e7e9695b 100644 --- a/gcc/config/rs6000/mma.md +++ b/gcc/config/rs6000/mma.md @@ -91,6 +91,7 @@ (define_c_enum "unspec" UNSPEC_MMA_XVI8GER4SPP UNSPEC_MMA_XXMFACC UNSPEC_MMA_XXMTACC + UNSPEC_MMA_DMSETDMRZ ]) (define_c_enum "unspecv" @@ -314,7 +315,9 @@ (define_insn_and_split "*movoo" (set_attr "length" "*,*,8")]) -;; Vector quad support. XOmode can only live in FPRs. +;; Vector quad support. Under the original MMA, XOmode can only live in VSX +;; registers 0..31. With dense math, XOmode can live in either VSX registers +;; (0..63) or DMR registers. 
(define_expand "movxo" [(set (match_operand:XO 0 "nonimmediate_operand") (match_operand:XO 1 "input_operand"))] @@ -339,10 +342,10 @@ (define_expand "movxo" gcc_assert (false); }) -(define_insn_and_split "*movxo" +(define_insn_and_split "*movxo_nodm" [(set (match_operand:XO 0 "nonimmediate_operand" "=d,ZwO,d") (match_operand:XO 1 "input_operand" "ZwO,d,d"))] - "TARGET_MMA + "TARGET_MMA_NO_DENSE_MATH && (gpc_reg_operand (operands[0], XOmode) || gpc_reg_operand (operands[1], XOmode))" "@ @@ -359,6 +362,31 @@ (define_insn_and_split "*movxo" (set_attr "length" "*,*,16") (set_attr "max_prefixed_insns" "2,2,*")]) +(define_insn_and_split "*movxo_dm" + [(set (match_operand:XO 0 "nonimmediate_operand" "=wa,ZwO,wa,wD,wD,wa") + (match_operand:XO 1 "input_operand" "ZwO,wa, wa,wa,wD,wD"))] + "TARGET_MMA_DENSE_MATH + && (gpc_reg_operand (operands[0], XOmode) + || gpc_reg_operand (operands[1], XOmode))" + "@ + # + # + # + dmxxinstdmr512 %0,%1,%Y1,0 + dmmr %0,%1 + dmxxextfdmr512 %0,%Y0,%1,0" + "&& reload_completed + && !dmr_operand (operands[0], XOmode) + && !dmr_operand (operands[1], XOmode)" + [(const_int 0)] +{ + rs6000_split_multireg_move (operands[0], operands[1]); + DONE; +} + [(set_attr "type" "vecload,vecstore,veclogical,mma,mma,mma") + (set_attr "length" "*,*,16,*,*,*") + (set_attr "max_prefixed_insns" "2,2,*,*,*,*")]) + (define_expand "vsx_assemble_pair" [(match_operand:OO 0 "vsx_register_operand") (match_operand:V16QI 1 "mma_assemble_input_operand") @@ -499,29 +527,53 @@ (define_insn_and_split "*mma_disassemble_acc" DONE; }) -;; MMA instructions that do not use their accumulators as an input, still -;; must not allow their vector operands to overlap the registers used by -;; the accumulator. We enforce this by marking the output as early clobber. +;; MMA instructions that do not use their accumulators as an input, still must +;; not allow their vector operands to overlap the registers used by the +;; accumulator. We enforce this by marking the output as early clobber. 
The +;; prime and de-prime instructions are not needed on systems with dense math +;; registers. (define_insn "mma_" [(set (match_operand:XO 0 "accumulator_operand" "=&wD") - (unspec:XO [(match_operand:XO 1 "accumulator_operand" "0")] + (unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0")] MMA_ACC))] - "TARGET_MMA" + "TARGET_MMA_NO_DENSE_MATH" " %A0" [(set_attr "type" "mma")]) ;; We can't have integer constants in XOmode so we wrap this in an -;; UNSPEC_VOLATILE. +;; UNSPEC_VOLATILE. If we have dense math registers, we can just use a normal +;; UNSPEC instead of UNSPEC_VOLATILE. -(define_insn "mma_xxsetaccz" - [(set (match_operand:XO 0 "fpr_reg_operand" "=d") +(define_expand "mma_xxsetaccz" + [(set (match_operand:XO 0 "accumulator_operand") (unspec_volatile:XO [(const_int 0)] UNSPECV_MMA_XXSETACCZ))] "TARGET_MMA" +{ + if (TARGET_DENSE_MATH) + { + emit_insn (gen_mma_dmsetdmrz (operands[0])); + DONE; + } +}) + +(define_insn "*mma_xxsetaccz" + [(set (match_operand:XO 0 "fpr_reg_operand" "=d") + (unspec_volatile:XO [(const_int 0)] + UNSPECV_MMA_XXSETACCZ))] + "TARGET_MMA_NO_DENSE_MATH" "xxsetaccz %A0" [(set_attr "type" "mma")]) +(define_insn "mma_dmsetdmrz" + [(set (match_operand:XO 0 "accumulator_operand" "=wD") + (unspec [(const_int 0)] + UNSPEC_MMA_DMSETDMRZ))] + "TARGET_MMA_DENSE_MATH" + "dmsetdmrz %A0" + [(set_attr "type" "mma")]) + (define_insn "mma_" [(set (match_operand:XO 0 "accumulator_operand" "=&wD,&wD") (unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "v,?wa") diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md index 1827647b7c1..2797c3cf619 100644 --- a/gcc/config/rs6000/predicates.md +++ b/gcc/config/rs6000/predicates.md @@ -186,8 +186,23 @@ (define_predicate "vlogical_operand" return VLOGICAL_REGNO_P (REGNO (op)); }) +;; Return 1 if op is a DMR register +(define_predicate "dmr_operand" + (match_operand 0 "register_operand") +{ + if (!REG_P (op)) + return 0; + + if (!HARD_REGISTER_P (op)) + return 1; + + return 
DMR_REGNO_P (REGNO (op)); +}) + ;; Return 1 if op is an accumulator. On power10 systems, the accumulators -;; overlap with the FPRs. +;; overlap with the FPRs, while on systems with dense math, the accumulators +;; are separate dense math registers and do not overlap with the FPR +;; registers.. (define_predicate "accumulator_operand" (match_operand 0 "register_operand") { @@ -198,7 +213,9 @@ (define_predicate "accumulator_operand" return 1; int r = REGNO (op); - return FP_REGNO_P (r) && (r & 3) == 0; + return (TARGET_MMA_DENSE_MATH + ? DMR_REGNO_P (r) + : FP_REGNO_P (r) && (r & 3) == 0); }) ;; Return 1 if op is the carry register. diff --git a/gcc/config/rs6000/rs6000-builtin.cc b/gcc/config/rs6000/rs6000-builtin.cc index b6093b3cb64..f2063edd2c3 100644 --- a/gcc/config/rs6000/rs6000-builtin.cc +++ b/gcc/config/rs6000/rs6000-builtin.cc @@ -1125,8 +1125,9 @@ rs6000_gimple_fold_mma_builtin (gimple_stmt_iterator *gsi, } /* If we're disassembling an accumulator into a different type, we need - to emit a xxmfacc instruction now, since we cannot do it later. */ - if (fncode == RS6000_BIF_DISASSEMBLE_ACC) + to emit a xxmfacc instruction now, since we cannot do it later. If we + have dense math registers, we don't need to do this. */ + if (fncode == RS6000_BIF_DISASSEMBLE_ACC && !TARGET_DENSE_MATH) { new_decl = rs6000_builtin_decls[RS6000_BIF_XXMFACC_INTERNAL]; new_call = gimple_build_call (new_decl, 1, src); diff --git a/gcc/config/rs6000/rs6000-c.cc b/gcc/config/rs6000/rs6000-c.cc index 82826f96a8e..f0feaa3bd5d 100644 --- a/gcc/config/rs6000/rs6000-c.cc +++ b/gcc/config/rs6000/rs6000-c.cc @@ -590,9 +590,14 @@ rs6000_target_modify_macros (bool define_p, HOST_WIDE_INT flags, if (rs6000_cpu == PROCESSOR_CELL) rs6000_define_or_undefine_macro (define_p, "__PPU__"); - /* Tell the user if we support the MMA instructions. */ + /* Tell the user if we support the MMA instructions. Also tell them if MMA + uses the dense math registers. 
*/ if ((flags & OPTION_MASK_MMA) != 0) - rs6000_define_or_undefine_macro (define_p, "__MMA__"); + { + rs6000_define_or_undefine_macro (define_p, "__MMA__"); + if ((arch_flags & ARCH_MASK_FUTURE) != 0) + rs6000_define_or_undefine_macro (define_p, "__DENSE_MATH__"); + } /* Whether pc-relative code is being generated. */ if ((flags & OPTION_MASK_PCREL) != 0) rs6000_define_or_undefine_macro (define_p, "__PCREL__"); diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc index 9e9342d4793..bd1c979eca2 100644 --- a/gcc/config/rs6000/rs6000.cc +++ b/gcc/config/rs6000/rs6000.cc @@ -293,7 +293,8 @@ enum rs6000_reg_type { ALTIVEC_REG_TYPE, FPR_REG_TYPE, SPR_REG_TYPE, - CR_REG_TYPE + CR_REG_TYPE, + DMR_REG_TYPE }; /* Map register class to register type. */ @@ -307,22 +308,23 @@ static enum rs6000_reg_type reg_class_to_reg_type[N_REG_CLASSES]; /* Register classes we care about in secondary reload or go if legitimate - address. We only need to worry about GPR, FPR, and Altivec registers here, - along an ANY field that is the OR of the 3 register classes. */ + address. We only need to worry about GPR, FPR, Altivec, and DMR registers + here, along an ANY field that is the OR of the 4 register classes. */ enum rs6000_reload_reg_type { RELOAD_REG_GPR, /* General purpose registers. */ RELOAD_REG_FPR, /* Traditional floating point regs. */ RELOAD_REG_VMX, /* Altivec (VMX) registers. */ - RELOAD_REG_ANY, /* OR of GPR, FPR, Altivec masks. */ + RELOAD_REG_DMR, /* DMR registers. */ + RELOAD_REG_ANY, /* OR of GPR/FPR/VMX/DMR masks. */ N_RELOAD_REG }; -/* For setting up register classes, loop through the 3 register classes mapping +/* For setting up register classes, loop through the 4 register classes mapping into real registers, and skip the ANY class, which is just an OR of the bits. 
*/ #define FIRST_RELOAD_REG_CLASS RELOAD_REG_GPR -#define LAST_RELOAD_REG_CLASS RELOAD_REG_VMX +#define LAST_RELOAD_REG_CLASS RELOAD_REG_DMR /* Map reload register type to a register in the register class. */ struct reload_reg_map_type { @@ -334,6 +336,7 @@ static const struct reload_reg_map_type reload_reg_map[N_RELOAD_REG] = { { "Gpr", FIRST_GPR_REGNO }, /* RELOAD_REG_GPR. */ { "Fpr", FIRST_FPR_REGNO }, /* RELOAD_REG_FPR. */ { "VMX", FIRST_ALTIVEC_REGNO }, /* RELOAD_REG_VMX. */ + { "DMR", FIRST_DMR_REGNO }, /* RELOAD_REG_DMR. */ { "Any", -1 }, /* RELOAD_REG_ANY. */ }; @@ -1228,6 +1231,8 @@ char rs6000_reg_names[][8] = "0", "1", "2", "3", "4", "5", "6", "7", /* vrsave vscr sfp */ "vrsave", "vscr", "sfp", + /* DMRs */ + "0", "1", "2", "3", "4", "5", "6", "7", }; #ifdef TARGET_REGNAMES @@ -1254,6 +1259,8 @@ static const char alt_reg_names[][8] = "%cr0", "%cr1", "%cr2", "%cr3", "%cr4", "%cr5", "%cr6", "%cr7", /* vrsave vscr sfp */ "vrsave", "vscr", "sfp", + /* DMRs */ + "%dmr0", "%dmr1", "%dmr2", "%dmr3", "%dmr4", "%dmr5", "%dmr6", "%dmr7", }; #endif @@ -1925,6 +1932,9 @@ rs6000_hard_regno_nregs_internal (int regno, machine_mode mode) else if (ALTIVEC_REGNO_P (regno)) reg_size = UNITS_PER_ALTIVEC_WORD; + else if (DMR_REGNO_P (regno)) + reg_size = UNITS_PER_DMR_WORD; + else reg_size = UNITS_PER_WORD; @@ -1946,9 +1956,35 @@ rs6000_hard_regno_mode_ok_uncached (int regno, machine_mode mode) if (mode == OOmode) return (TARGET_MMA && VSX_REGNO_P (regno) && (regno & 1) == 0); - /* MMA accumulator modes need FPR registers divisible by 4. */ + /* On ISA 3.1 (power10), MMA accumulator modes need FPR registers divisible + by 4. + + If dense math registers are enabled, we can allow all VSX registers plus + the DMR registers. VSX registers are used to load and store the registers + as the accumulator registers do not have load and store instructions. 
+ Because we just use the VSX registers for load/store operations, we just + need to make sure load vector pair and store vector pair instructions can + be used. */ if (mode == XOmode) - return (TARGET_MMA && FP_REGNO_P (regno) && (regno & 3) == 0); + { + if (!TARGET_MMA) + return 0; + + else if (!TARGET_DENSE_MATH) + return (FP_REGNO_P (regno) && (regno & 3) == 0); + + else if (DMR_REGNO_P (regno)) + return 1; + + else + return (VSX_REGNO_P (regno) + && VSX_REGNO_P (last_regno) + && (regno & 1) == 0); + } + + /* No other types other than XOmode can go in DMRs. */ + if (DMR_REGNO_P (regno)) + return 0; /* PTImode can only go in GPRs. Quad word memory operations require even/odd register combinations, and use PTImode where we need to deal with quad @@ -2391,6 +2427,7 @@ rs6000_debug_reg_global (void) rs6000_debug_reg_print (FIRST_ALTIVEC_REGNO, LAST_ALTIVEC_REGNO, "vs"); + rs6000_debug_reg_print (FIRST_DMR_REGNO, LAST_DMR_REGNO, "dmr"); rs6000_debug_reg_print (LR_REGNO, LR_REGNO, "lr"); rs6000_debug_reg_print (CTR_REGNO, CTR_REGNO, "ctr"); rs6000_debug_reg_print (CR0_REGNO, CR7_REGNO, "cr"); @@ -2723,6 +2760,21 @@ rs6000_setup_reg_addr_masks (void) addr_mask = 0; reg = reload_reg_map[rc].reg; + /* Special case DMR registers. */ + if (rc == RELOAD_REG_DMR) + { + if (TARGET_DENSE_MATH && m2 == XOmode) + { + addr_mask = RELOAD_REG_VALID; + reg_addr[m].addr_mask[rc] = addr_mask; + any_addr_mask |= addr_mask; + } + else + reg_addr[m].addr_mask[rc] = 0; + + continue; + } + /* Can mode values go in the GPR/FPR/Altivec registers? 
*/ if (reg >= 0 && rs6000_hard_regno_mode_ok_p[m][reg]) { @@ -2873,6 +2925,9 @@ rs6000_init_hard_regno_mode_ok (bool global_init_p) for (r = CR1_REGNO; r <= CR7_REGNO; ++r) rs6000_regno_regclass[r] = CR_REGS; + for (r = FIRST_DMR_REGNO; r <= LAST_DMR_REGNO; ++r) + rs6000_regno_regclass[r] = DM_REGS; + rs6000_regno_regclass[LR_REGNO] = LINK_REGS; rs6000_regno_regclass[CTR_REGNO] = CTR_REGS; rs6000_regno_regclass[CA_REGNO] = NO_REGS; @@ -2897,6 +2952,7 @@ rs6000_init_hard_regno_mode_ok (bool global_init_p) reg_class_to_reg_type[(int)LINK_OR_CTR_REGS] = SPR_REG_TYPE; reg_class_to_reg_type[(int)CR_REGS] = CR_REG_TYPE; reg_class_to_reg_type[(int)CR0_REGS] = CR_REG_TYPE; + reg_class_to_reg_type[(int)DM_REGS] = DMR_REG_TYPE; if (TARGET_VSX) { @@ -3083,8 +3139,11 @@ rs6000_init_hard_regno_mode_ok (bool global_init_p) if (TARGET_DIRECT_MOVE_128) rs6000_constraints[RS6000_CONSTRAINT_we] = VSX_REGS; + /* Support for the accumulator registers, either FPR registers (aka original + mma) or DMR registers (dense math). */ if (TARGET_MMA) - rs6000_constraints[RS6000_CONSTRAINT_wD] = FLOAT_REGS; + rs6000_constraints[RS6000_CONSTRAINT_wD] + = TARGET_DENSE_MATH ? DM_REGS : FLOAT_REGS; /* Set up the reload helper and direct move functions. */ if (TARGET_VSX || TARGET_ALTIVEC) @@ -12396,6 +12455,11 @@ rs6000_secondary_reload_memory (rtx addr, addr_mask = (reg_addr[mode].addr_mask[RELOAD_REG_VMX] & ~RELOAD_REG_AND_M16); + /* DMR registers use VSX registers for memory operations, and need to + generate some extra instructions. */ + else if (rclass == DM_REGS) + return 2; + /* If the register allocator hasn't made up its mind yet on the register class to use, settle on defaults to use. */ else if (rclass == NO_REGS) @@ -12724,6 +12788,13 @@ rs6000_secondary_reload_simple_move (enum rs6000_reg_type to_type, || (to_type == SPR_REG_TYPE && from_type == GPR_REG_TYPE))) return true; + /* We can transfer between VSX registers and DMR registers without needing + extra registers. 
*/ + if (TARGET_DENSE_MATH && mode == XOmode + && ((to_type == DMR_REG_TYPE && from_type == VSX_REG_TYPE) + || (to_type == VSX_REG_TYPE && from_type == DMR_REG_TYPE))) + return true; + return false; } @@ -13418,6 +13489,10 @@ rs6000_preferred_reload_class (rtx x, enum reg_class rclass) machine_mode mode = GET_MODE (x); bool is_constant = CONSTANT_P (x); + /* DMR registers can't be loaded or stored. */ + if (rclass == DM_REGS) + return NO_REGS; + /* If a mode can't go in FPR/ALTIVEC/VSX registers, don't return a preferred reload class for it. */ if ((rclass == ALTIVEC_REGS || rclass == VSX_REGS) @@ -13514,7 +13589,7 @@ rs6000_preferred_reload_class (rtx x, enum reg_class rclass) return VSX_REGS; if (mode == XOmode) - return FLOAT_REGS; + return TARGET_MMA_DENSE_MATH ? VSX_REGS : FLOAT_REGS; if (GET_MODE_CLASS (mode) == MODE_INT) return GENERAL_REGS; @@ -13639,6 +13714,11 @@ rs6000_secondary_reload_class (enum reg_class rclass, machine_mode mode, else regno = -1; + /* DMR registers don't have loads or stores. We have to go through the VSX + registers to load XOmode (vector quad). */ + if (TARGET_MMA_DENSE_MATH && rclass == DM_REGS) + return VSX_REGS; + /* If we have VSX register moves, prefer moving scalar values between Altivec registers and GPR by going via an FPR (and then via memory) instead of reloading the secondary memory address for Altivec moves. */ @@ -14152,8 +14232,19 @@ print_operand (FILE *file, rtx x, int code) output_operand. */ case 'A': - /* Write the MMA accumulator number associated with VSX register X. */ - if (!REG_P (x) || !FP_REGNO_P (REGNO (x)) || (REGNO (x) % 4) != 0) + /* Write the MMA accumulator number associated with VSX register X. On + dense math systems, only allow DMR accumulators, not accumulators + overlapping with the FPR registers. 
*/ + if (!REG_P (x)) + output_operand_lossage ("invalid %%A value"); + else if (TARGET_MMA_DENSE_MATH) + { + if (DMR_REGNO_P (REGNO (x))) + fprintf (file, "%d", REGNO (x) - FIRST_DMR_REGNO); + else + output_operand_lossage ("%%A operand is not a DMR"); + } + else if (!FP_REGNO_P (REGNO (x)) || (REGNO (x) % 4) != 0) output_operand_lossage ("invalid %%A value"); else fprintf (file, "%d", (REGNO (x) - FIRST_FPR_REGNO) / 4); @@ -22836,6 +22927,31 @@ rs6000_debug_address_cost (rtx x, machine_mode mode, } +/* Subroutine to determine the move cost of dense math registers. If we are + moving to/from VSX_REGISTER registers, the cost is either 1 move (for + 512-bit accumulators) or 2 moves (for 1,024 dmr registers). If we are + moving to anything else like GPR registers, make the cost very high. */ + +static int +rs6000_dmr_register_move_cost (machine_mode mode, reg_class_t rclass) +{ + const int reg_move_base = 2; + HARD_REG_SET vsx_set = (reg_class_contents[rclass] + & reg_class_contents[VSX_REGS]); + + if (TARGET_MMA_DENSE_MATH && !hard_reg_set_empty_p (vsx_set)) + { + /* __vector_quad (i.e. XOmode) is tranfered in 1 instruction. */ + if (mode == XOmode) + return reg_move_base; + + else + return reg_move_base * 2 * hard_regno_nregs (FIRST_DMR_REGNO, mode); + } + + return 1000 * 2 * hard_regno_nregs (FIRST_DMR_REGNO, mode); +} + /* A C expression returning the cost of moving data from a register of class CLASS1 to one of CLASS2. */ @@ -22849,17 +22965,28 @@ rs6000_register_move_cost (machine_mode mode, if (TARGET_DEBUG_COST) dbg_cost_ctrl++; + HARD_REG_SET to_vsx, from_vsx; + to_vsx = reg_class_contents[to] & reg_class_contents[VSX_REGS]; + from_vsx = reg_class_contents[from] & reg_class_contents[VSX_REGS]; + + /* Special case DMR registers, that can only move to/from VSX registers. 
*/ + if (from == DM_REGS && to == DM_REGS) + ret = 2 * hard_regno_nregs (FIRST_DMR_REGNO, mode); + + else if (from == DM_REGS) + ret = rs6000_dmr_register_move_cost (mode, to); + + else if (to == DM_REGS) + ret = rs6000_dmr_register_move_cost (mode, from); + /* If we have VSX, we can easily move between FPR or Altivec registers, otherwise we can only easily move within classes. Do this first so we give best-case answers for union classes containing both gprs and vsx regs. */ - HARD_REG_SET to_vsx, from_vsx; - to_vsx = reg_class_contents[to] & reg_class_contents[VSX_REGS]; - from_vsx = reg_class_contents[from] & reg_class_contents[VSX_REGS]; - if (!hard_reg_set_empty_p (to_vsx) - && !hard_reg_set_empty_p (from_vsx) - && (TARGET_VSX - || hard_reg_set_intersect_p (to_vsx, from_vsx))) + else if (!hard_reg_set_empty_p (to_vsx) + && !hard_reg_set_empty_p (from_vsx) + && (TARGET_VSX + || hard_reg_set_intersect_p (to_vsx, from_vsx))) { int reg = FIRST_FPR_REGNO; if (TARGET_VSX @@ -22956,6 +23083,9 @@ rs6000_memory_move_cost (machine_mode mode, reg_class_t rclass, ret = 4 * hard_regno_nregs (32, mode); else if (reg_classes_intersect_p (rclass, ALTIVEC_REGS)) ret = 4 * hard_regno_nregs (FIRST_ALTIVEC_REGNO, mode); + else if (reg_classes_intersect_p (rclass, DM_REGS)) + ret = (rs6000_dmr_register_move_cost (mode, VSX_REGS) + + rs6000_memory_move_cost (mode, VSX_REGS, false)); else ret = 4 + rs6000_register_move_cost (mode, rclass, GENERAL_REGS); @@ -24164,6 +24294,8 @@ rs6000_compute_pressure_classes (enum reg_class *pressure_classes) if (TARGET_HARD_FLOAT) pressure_classes[n++] = FLOAT_REGS; } + if (TARGET_MMA_DENSE_MATH) + pressure_classes[n++] = DM_REGS; pressure_classes[n++] = CR_REGS; pressure_classes[n++] = SPECIAL_REGS; @@ -24328,6 +24460,10 @@ rs6000_debugger_regno (unsigned int regno, unsigned int format) return 67; if (regno == 64) return 64; + /* XXX: This is a guess. The GCC register number for FIRST_DMR_REGNO is 111, + but the frame pointer regnum uses that. 
*/ + if (DMR_REGNO_P (regno)) + return regno - FIRST_DMR_REGNO + 112; gcc_unreachable (); } @@ -27717,9 +27853,9 @@ rs6000_split_multireg_move (rtx dst, rtx src) unsigned offset = 0; unsigned size = GET_MODE_SIZE (reg_mode); - /* If we are reading an accumulator register, we have to - deprime it before we can access it. */ - if (TARGET_MMA + /* If we are reading an accumulator register, we have to deprime it + before we can access it unless we have dense math registers. */ + if (TARGET_MMA_NO_DENSE_MATH && GET_MODE (src) == XOmode && FP_REGNO_P (REGNO (src))) emit_insn (gen_mma_xxmfacc (src, src)); @@ -27751,9 +27887,9 @@ rs6000_split_multireg_move (rtx dst, rtx src) emit_insn (gen_rtx_SET (dst2, src2)); } - /* If we are writing an accumulator register, we have to - prime it after we've written it. */ - if (TARGET_MMA + /* If we are writing an accumulator register, we have to prime it + after we've written it unless we have dense math registers. */ + if (TARGET_MMA_NO_DENSE_MATH && GET_MODE (dst) == XOmode && FP_REGNO_P (REGNO (dst))) emit_insn (gen_mma_xxmtacc (dst, dst)); @@ -27767,7 +27903,9 @@ rs6000_split_multireg_move (rtx dst, rtx src) || XINT (src, 1) == UNSPECV_MMA_ASSEMBLE); gcc_assert (REG_P (dst)); if (GET_MODE (src) == XOmode) - gcc_assert (FP_REGNO_P (REGNO (dst))); + gcc_assert ((TARGET_MMA_DENSE_MATH + ? VSX_REGNO_P (REGNO (dst)) + : FP_REGNO_P (REGNO (dst)))); if (GET_MODE (src) == OOmode) gcc_assert (VSX_REGNO_P (REGNO (dst))); @@ -27820,9 +27958,9 @@ rs6000_split_multireg_move (rtx dst, rtx src) emit_insn (gen_rtx_SET (dst_i, op)); } - /* We are writing an accumulator register, so we have to - prime it after we've written it. */ - if (GET_MODE (src) == XOmode) + /* We are writing an accumulator register, so we have to prime it + after we've written it unless we have dense math registers. 
*/ + if (GET_MODE (src) == XOmode && !TARGET_DENSE_MATH) emit_insn (gen_mma_xxmtacc (dst, dst)); return; @@ -27833,9 +27971,9 @@ rs6000_split_multireg_move (rtx dst, rtx src) if (REG_P (src) && REG_P (dst) && (REGNO (src) < REGNO (dst))) { - /* If we are reading an accumulator register, we have to - deprime it before we can access it. */ - if (TARGET_MMA + /* If we are reading an accumulator register, we have to deprime it + before we can access it unless we have dense math registers. */ + if (TARGET_MMA_NO_DENSE_MATH && GET_MODE (src) == XOmode && FP_REGNO_P (REGNO (src))) emit_insn (gen_mma_xxmfacc (src, src)); @@ -27861,9 +27999,9 @@ rs6000_split_multireg_move (rtx dst, rtx src) i * reg_mode_size))); } - /* If we are writing an accumulator register, we have to - prime it after we've written it. */ - if (TARGET_MMA + /* If we are writing an accumulator register, we have to prime it after + we've written it unless we have dense math registers. */ + if (TARGET_MMA_NO_DENSE_MATH && GET_MODE (dst) == XOmode && FP_REGNO_P (REGNO (dst))) emit_insn (gen_mma_xxmtacc (dst, dst)); } @@ -27998,9 +28136,9 @@ rs6000_split_multireg_move (rtx dst, rtx src) gcc_assert (rs6000_offsettable_memref_p (dst, reg_mode, true)); } - /* If we are reading an accumulator register, we have to - deprime it before we can access it. */ - if (TARGET_MMA && REG_P (src) + /* If we are reading an accumulator register, we have to deprime it + before we can access it unless we have dense math registers. */ + if (TARGET_MMA_NO_DENSE_MATH && REG_P (src) && GET_MODE (src) == XOmode && FP_REGNO_P (REGNO (src))) emit_insn (gen_mma_xxmfacc (src, src)); @@ -28030,9 +28168,9 @@ rs6000_split_multireg_move (rtx dst, rtx src) j * reg_mode_size))); } - /* If we are writing an accumulator register, we have to - prime it after we've written it. */ - if (TARGET_MMA && REG_P (dst) + /* If we are writing an accumulator register, we have to prime it after + we've written it unless we have dense math registers. 
*/ + if (TARGET_MMA_NO_DENSE_MATH && REG_P (dst) && GET_MODE (dst) == XOmode && FP_REGNO_P (REGNO (dst))) emit_insn (gen_mma_xxmtacc (dst, dst)); diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h index 07a372b8902..39dd8756b12 100644 --- a/gcc/config/rs6000/rs6000.h +++ b/gcc/config/rs6000/rs6000.h @@ -576,6 +576,12 @@ extern int rs6000_vector_align[]; && TARGET_P8_VECTOR \ && TARGET_POWERPC64) +/* Whether we have dense math support. At present, we don't have a dense math + ISA bit, just use the future bit set by -mcpu=future. */ +#define TARGET_DENSE_MATH TARGET_FUTURE +#define TARGET_MMA_DENSE_MATH (TARGET_MMA && TARGET_DENSE_MATH) +#define TARGET_MMA_NO_DENSE_MATH (TARGET_MMA && !TARGET_DENSE_MATH) + /* Inlining allows targets to define the meanings of bits in target_info field of ipa_fn_summary by itself, the used bits for rs6000 are listed below. */ @@ -673,6 +679,7 @@ extern unsigned char rs6000_recip_bits[]; #define UNITS_PER_FP_WORD 8 #define UNITS_PER_ALTIVEC_WORD 16 #define UNITS_PER_VSX_WORD 16 +#define UNITS_PER_DMR_WORD 128 /* Type used for ptrdiff_t, as a string used in a declaration. */ #define PTRDIFF_TYPE "int" @@ -786,7 +793,7 @@ enum data_align { align_abi, align_opt, align_both }; Another pseudo (not included in DWARF_FRAME_REGISTERS) is soft frame pointer, which is eventually eliminated in favor of SP or FP. */ -#define FIRST_PSEUDO_REGISTER 111 +#define FIRST_PSEUDO_REGISTER 119 /* Use standard DWARF numbering for DWARF debugging information. */ #define DEBUGGER_REGNO(REGNO) rs6000_debugger_regno ((REGNO), 0) @@ -823,7 +830,9 @@ enum data_align { align_abi, align_opt, align_both }; /* cr0..cr7 */ \ 0, 0, 0, 0, 0, 0, 0, 0, \ /* vrsave vscr sfp */ \ - 1, 1, 1 \ + 1, 1, 1, \ + /* DMR registers. 
*/ \ + 0, 0, 0, 0, 0, 0, 0, 0 \ } /* Like `CALL_USED_REGISTERS' except this macro doesn't require that @@ -847,7 +856,9 @@ enum data_align { align_abi, align_opt, align_both }; /* cr0..cr7 */ \ 1, 1, 0, 0, 0, 1, 1, 1, \ /* vrsave vscr sfp */ \ - 0, 0, 0 \ + 0, 0, 0, \ + /* DMR registers. */ \ + 0, 0, 0, 0, 0, 0, 0, 0 \ } #define TOTAL_ALTIVEC_REGS (LAST_ALTIVEC_REGNO - FIRST_ALTIVEC_REGNO + 1) @@ -884,6 +895,7 @@ enum data_align { align_abi, align_opt, align_both }; v2 (not saved; incoming vector arg reg; return value) v19 - v14 (not saved or used for anything) v31 - v20 (saved; order given to save least number) + dmr0 - dmr7 (not saved) vrsave, vscr (fixed) sfp (fixed) */ @@ -926,6 +938,9 @@ enum data_align { align_abi, align_opt, align_both }; 66, \ 83, 82, 81, 80, 79, 78, \ 95, 94, 93, 92, 91, 90, 89, 88, 87, 86, 85, 84, \ + /* DMR registers. */ \ + 111, 112, 113, 114, 115, 116, 117, 118, \ + /* Vrsave, vscr, sfp. */ \ 108, 109, \ 110 \ } @@ -952,6 +967,9 @@ enum data_align { align_abi, align_opt, align_both }; /* True if register is a VSX register. */ #define VSX_REGNO_P(N) (FP_REGNO_P (N) || ALTIVEC_REGNO_P (N)) +/* True if register is a DMR register. */ +#define DMR_REGNO_P(N) ((N) >= FIRST_DMR_REGNO && (N) <= LAST_DMR_REGNO) + /* Alternate name for any vector register supporting floating point, no matter which instruction set(s) are available. */ #define VFLOAT_REGNO_P(N) \ @@ -1089,6 +1107,7 @@ enum reg_class FLOAT_REGS, ALTIVEC_REGS, VSX_REGS, + DM_REGS, VRSAVE_REGS, VSCR_REGS, GEN_OR_FLOAT_REGS, @@ -1118,6 +1137,7 @@ enum reg_class "FLOAT_REGS", \ "ALTIVEC_REGS", \ "VSX_REGS", \ + "DM_REGS", \ "VRSAVE_REGS", \ "VSCR_REGS", \ "GEN_OR_FLOAT_REGS", \ @@ -1152,6 +1172,8 @@ enum reg_class { 0x00000000, 0x00000000, 0xffffffff, 0x00000000 }, \ /* VSX_REGS. */ \ { 0x00000000, 0xffffffff, 0xffffffff, 0x00000000 }, \ + /* DM_REGS. */ \ + { 0x00000000, 0x00000000, 0x00000000, 0x007f8000 }, \ /* VRSAVE_REGS. 
*/ \ { 0x00000000, 0x00000000, 0x00000000, 0x00001000 }, \ /* VSCR_REGS. */ \ @@ -1179,7 +1201,7 @@ enum reg_class /* CA_REGS. */ \ { 0x00000000, 0x00000000, 0x00000000, 0x00000004 }, \ /* ALL_REGS. */ \ - { 0xffffffff, 0xffffffff, 0xffffffff, 0x00007fff } \ + { 0xffffffff, 0xffffffff, 0xffffffff, 0x007fffff } \ } /* The same information, inverted: @@ -2080,7 +2102,16 @@ extern char rs6000_reg_names[][8]; /* register names (0 vs. %r0). */ &rs6000_reg_names[108][0], /* vrsave */ \ &rs6000_reg_names[109][0], /* vscr */ \ \ - &rs6000_reg_names[110][0] /* sfp */ \ + &rs6000_reg_names[110][0], /* sfp */ \ + \ + &rs6000_reg_names[111][0], /* dmr0 */ \ + &rs6000_reg_names[112][0], /* dmr1 */ \ + &rs6000_reg_names[113][0], /* dmr2 */ \ + &rs6000_reg_names[114][0], /* dmr3 */ \ + &rs6000_reg_names[115][0], /* dmr4 */ \ + &rs6000_reg_names[116][0], /* dmr5 */ \ + &rs6000_reg_names[117][0], /* dmr6 */ \ + &rs6000_reg_names[118][0], /* dmr7 */ \ } /* Table of additional register names to use in user input. */ @@ -2134,6 +2165,8 @@ extern char rs6000_reg_names[][8]; /* register names (0 vs. %r0). */ {"vs52", 84}, {"vs53", 85}, {"vs54", 86}, {"vs55", 87}, \ {"vs56", 88}, {"vs57", 89}, {"vs58", 90}, {"vs59", 91}, \ {"vs60", 92}, {"vs61", 93}, {"vs62", 94}, {"vs63", 95}, \ + {"dmr0", 111}, {"dmr1", 112}, {"dmr2", 113}, {"dmr3", 114}, \ + {"dmr4", 115}, {"dmr5", 116}, {"dmr6", 117}, {"dmr7", 118}, \ } /* This is how to output an element of a case-vector that is relative. 
*/ diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md index 6642a471796..fd525e47f48 100644 --- a/gcc/config/rs6000/rs6000.md +++ b/gcc/config/rs6000/rs6000.md @@ -51,6 +51,8 @@ (define_constants (VRSAVE_REGNO 108) (VSCR_REGNO 109) (FRAME_POINTER_REGNUM 110) + (FIRST_DMR_REGNO 111) + (LAST_DMR_REGNO 118) ]) ;; From patchwork Mon Oct 28 19:40:31 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Michael Meissner X-Patchwork-Id: 2003479 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=ibm.com header.i=@ibm.com header.a=rsa-sha256 header.s=pp1 header.b=g6xDTJOL; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org; envelope-from=gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4XckJx0Sn4z1xwF for ; Tue, 29 Oct 2024 06:41:05 +1100 (AEDT) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 605423858C35 for ; Mon, 28 Oct 2024 19:41:02 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) by sourceware.org (Postfix) with ESMTPS id 9B6113858C52 for ; Mon, 28 Oct 2024 19:40:36 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 9B6113858C52 
Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=linux.ibm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=linux.ibm.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 9B6113858C52 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=148.163.156.1 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1730144439; cv=none; b=F46SOIctNwW37jkFikmMSvY7/OqCLN+6aP1gI3gg7gxl0lcOiKV1c4tI1ACDo21gkqgxeWO62cFxFGRUPvsvuutQn4ReehqGSH95MdXy6/He2WlZOAXzbczJOK1AXGssHBEM0XJPeruDeRDFRJCUMPzuQFIN0iO2wYQjog6DvXE= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1730144439; c=relaxed/simple; bh=mTCI1aayaIuuEu74tFp5VwJNcLapiwMfqD02mfHaSsI=; h=DKIM-Signature:Date:From:To:Subject:Message-ID:MIME-Version; b=k1sL0j8fGPUCJngyNDMDoaBqrno3pFbZX8p4rT0Nvk/R+vkHiyx2uYni4RhEDSP/esIYYpTXR5i+n60/PskxWhOJpbaNrxfOa2/JKKZlj4XqvNQWROb4Cd8+dMQKKUnidDvB753UT8RQbgi0f6LX6JE00u1XY9YByEgqvSegQC4= ARC-Authentication-Results: i=1; server2.sourceware.org Received: from pps.filterd (m0353729.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.18.1.2/8.18.1.2) with ESMTP id 49SHnMwv004888; Mon, 28 Oct 2024 19:40:35 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h= content-type:date:from:in-reply-to:message-id:mime-version :references:subject:to; s=pp1; bh=kogucWxbBXeirufuwIs1/UAHuekPgp 07TiuNR8hnCo8=; b=g6xDTJOLot1W7dxNva/s7OygUR2Ou4T7Cau35FhJ1hwE14 jrpzpdYtuzO9FrKB9mOkblMWmYjO6UVTDhM4FcFkf6l/EQN7scv336KC/xUZGRD2 mkape1dJ0f60fYyWpvFPkEWpT2FrD0USWmkbP/4ZpXkAd+oPF3oKNUMYYwOWd634 YtaCLlOxopTxIhdDqdvspzP8Y7NTJ0F9WacbbvJXin19YnzCSYyVDs9TO53ico16 hq30iWBzS701XYZMc1efaWzO0BeUETXJ6qmHDPcFmrZNy47N8y7j4k8tFtXkPPXe cJUgbAlKnrA5TMVygeQPw3ih3fKU0v6GwPamoebg== Received: from ppma12.dal12v.mail.ibm.com (dc.9e.1632.ip4.static.sl-reverse.com [50.22.158.220]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 42j3nsmqr7-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Mon, 28 Oct 2024 
19:40:35 +0000 (GMT) Received: from pps.filterd (ppma12.dal12v.mail.ibm.com [127.0.0.1]) by ppma12.dal12v.mail.ibm.com (8.18.1.2/8.18.1.2) with ESMTP id 49SJahOA017307; Mon, 28 Oct 2024 19:40:34 GMT Received: from smtprelay06.dal12v.mail.ibm.com ([172.16.1.8]) by ppma12.dal12v.mail.ibm.com (PPS) with ESMTPS id 42hars7ycn-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Mon, 28 Oct 2024 19:40:34 +0000 Received: from smtpav06.wdc07v.mail.ibm.com (smtpav06.wdc07v.mail.ibm.com [10.39.53.233]) by smtprelay06.dal12v.mail.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 49SJeXpt43254042 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Mon, 28 Oct 2024 19:40:33 GMT Received: from smtpav06.wdc07v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 5596D5803F; Mon, 28 Oct 2024 19:40:33 +0000 (GMT) Received: from smtpav06.wdc07v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id BC5A058054; Mon, 28 Oct 2024 19:40:32 +0000 (GMT) Received: from cowardly-lion.the-meissners.org (unknown [9.61.80.9]) by smtpav06.wdc07v.mail.ibm.com (Postfix) with ESMTPS; Mon, 28 Oct 2024 19:40:32 +0000 (GMT) Date: Mon, 28 Oct 2024 15:40:31 -0400 From: Michael Meissner To: Michael Meissner , gcc-patches@gcc.gnu.org, Segher Boessenkool , Peter Bergner Subject: [PATCH 4/6] Switch to dense math names for all MMA operations with -mcpu=future Message-ID: Mail-Followup-To: Michael Meissner , gcc-patches@gcc.gnu.org, Segher Boessenkool , Peter Bergner References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: X-TM-AS-GCONF: 00 X-Proofpoint-ORIG-GUID: o_EOKK-zenInrXylfGjfYiwpkErcc1wb X-Proofpoint-GUID: o_EOKK-zenInrXylfGjfYiwpkErcc1wb X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.293,Aquarius:18.0.1051,Hydra:6.0.680,FMLib:17.12.62.30 definitions=2024-10-15_01,2024-10-11_01,2024-09-30_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 phishscore=0 adultscore=0 bulkscore=0 
mlxlogscore=953 clxscore=1015 priorityscore=1501 suspectscore=0 lowpriorityscore=0 mlxscore=0 impostorscore=0 malwarescore=0 spamscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.19.0-2409260000 definitions=main-2410280152 X-Spam-Status: No, score=-11.4 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_DNSWL_LOW, RCVD_IN_MSPIKE_H3, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org This patch changes the assembler instruction names for MMA instructions from the original name used in power10 to the new name when used with the dense math system. I.e. xvf64gerpp becomes dmxvf64gerpp. The assembler will emit the same bits for either spelling. For the non-prefixed MMA instructions, we add a 'dm' prefix in front of the instruction. However, the prefixed instructions have a 'pm' prefix, and we add the 'dm' prefix afterwards. To prevent having two sets of parallel int attributes, we remove the "pm" prefix from the instruction string in the attributes, and add it later, both in the insn name and in the output template. 2024-10-28 Michael Meissner gcc/ * config/rs6000/mma.md (vvi4i4i8): Change the instruction to not have a "pm" prefix. (avvi4i4i8): Likewise. (vvi4i4i2): Likewise. (avvi4i4i2): Likewise. (vvi4i4): Likewise. (avvi4i4): Likewise. (pvi4i2): Likewise. (apvi4i2): Likewise. (vvi4i4i4): Likewise. (avvi4i4i4): Likewise. (mma_): Add support for running on DMF systems, generating the dense math instruction and using the dense math accumulators. (mma_): Likewise. (mma_): Likewise. (mma_): Likewise. 
(mma_pm): Add support for running on DMF systems, generating the dense math instruction and using the dense math accumulators. Rename the insn with a 'pm' prefix and add either 'pm' or 'pmdm' prefixes based on whether we have the original MMA specification or if we have dense math support. (mma_pm): Likewise. (mma_pm): Likewise. (mma_pm): Likewise. (mma_pm): Likewise. (mma_pm): Likewise. (mma_pm): Likewise. (mma_pm): Likewise. --- gcc/config/rs6000/mma.md | 157 ++++++++++++++++++++++++++------------- 1 file changed, 104 insertions(+), 53 deletions(-) diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md index ae6e7e9695b..2e04eb653fa 100644 --- a/gcc/config/rs6000/mma.md +++ b/gcc/config/rs6000/mma.md @@ -225,44 +225,47 @@ (define_int_attr apv [(UNSPEC_MMA_XVF64GERPP "xvf64gerpp") (UNSPEC_MMA_XVF64GERNP "xvf64gernp") (UNSPEC_MMA_XVF64GERNN "xvf64gernn")]) -(define_int_attr vvi4i4i8 [(UNSPEC_MMA_PMXVI4GER8 "pmxvi4ger8")]) +;; The "pm" prefix is not in these expansions, so that we can generate +;; pmdmxvi4ger8 on systems with dense math registers and pmxvi4ger8 on systems +;; without dense math registers. 
+(define_int_attr vvi4i4i8 [(UNSPEC_MMA_PMXVI4GER8 "xvi4ger8")]) -(define_int_attr avvi4i4i8 [(UNSPEC_MMA_PMXVI4GER8PP "pmxvi4ger8pp")]) +(define_int_attr avvi4i4i8 [(UNSPEC_MMA_PMXVI4GER8PP "xvi4ger8pp")]) -(define_int_attr vvi4i4i2 [(UNSPEC_MMA_PMXVI16GER2 "pmxvi16ger2") - (UNSPEC_MMA_PMXVI16GER2S "pmxvi16ger2s") - (UNSPEC_MMA_PMXVF16GER2 "pmxvf16ger2") - (UNSPEC_MMA_PMXVBF16GER2 "pmxvbf16ger2")]) +(define_int_attr vvi4i4i2 [(UNSPEC_MMA_PMXVI16GER2 "xvi16ger2") + (UNSPEC_MMA_PMXVI16GER2S "xvi16ger2s") + (UNSPEC_MMA_PMXVF16GER2 "xvf16ger2") + (UNSPEC_MMA_PMXVBF16GER2 "xvbf16ger2")]) -(define_int_attr avvi4i4i2 [(UNSPEC_MMA_PMXVI16GER2PP "pmxvi16ger2pp") - (UNSPEC_MMA_PMXVI16GER2SPP "pmxvi16ger2spp") - (UNSPEC_MMA_PMXVF16GER2PP "pmxvf16ger2pp") - (UNSPEC_MMA_PMXVF16GER2PN "pmxvf16ger2pn") - (UNSPEC_MMA_PMXVF16GER2NP "pmxvf16ger2np") - (UNSPEC_MMA_PMXVF16GER2NN "pmxvf16ger2nn") - (UNSPEC_MMA_PMXVBF16GER2PP "pmxvbf16ger2pp") - (UNSPEC_MMA_PMXVBF16GER2PN "pmxvbf16ger2pn") - (UNSPEC_MMA_PMXVBF16GER2NP "pmxvbf16ger2np") - (UNSPEC_MMA_PMXVBF16GER2NN "pmxvbf16ger2nn")]) +(define_int_attr avvi4i4i2 [(UNSPEC_MMA_PMXVI16GER2PP "xvi16ger2pp") + (UNSPEC_MMA_PMXVI16GER2SPP "xvi16ger2spp") + (UNSPEC_MMA_PMXVF16GER2PP "xvf16ger2pp") + (UNSPEC_MMA_PMXVF16GER2PN "xvf16ger2pn") + (UNSPEC_MMA_PMXVF16GER2NP "xvf16ger2np") + (UNSPEC_MMA_PMXVF16GER2NN "xvf16ger2nn") + (UNSPEC_MMA_PMXVBF16GER2PP "xvbf16ger2pp") + (UNSPEC_MMA_PMXVBF16GER2PN "xvbf16ger2pn") + (UNSPEC_MMA_PMXVBF16GER2NP "xvbf16ger2np") + (UNSPEC_MMA_PMXVBF16GER2NN "xvbf16ger2nn")]) -(define_int_attr vvi4i4 [(UNSPEC_MMA_PMXVF32GER "pmxvf32ger")]) +(define_int_attr vvi4i4 [(UNSPEC_MMA_PMXVF32GER "xvf32ger")]) -(define_int_attr avvi4i4 [(UNSPEC_MMA_PMXVF32GERPP "pmxvf32gerpp") - (UNSPEC_MMA_PMXVF32GERPN "pmxvf32gerpn") - (UNSPEC_MMA_PMXVF32GERNP "pmxvf32gernp") - (UNSPEC_MMA_PMXVF32GERNN "pmxvf32gernn")]) +(define_int_attr avvi4i4 [(UNSPEC_MMA_PMXVF32GERPP "xvf32gerpp") + (UNSPEC_MMA_PMXVF32GERPN "xvf32gerpn") + 
(UNSPEC_MMA_PMXVF32GERNP "xvf32gernp") + (UNSPEC_MMA_PMXVF32GERNN "xvf32gernn")]) -(define_int_attr pvi4i2 [(UNSPEC_MMA_PMXVF64GER "pmxvf64ger")]) +(define_int_attr pvi4i2 [(UNSPEC_MMA_PMXVF64GER "xvf64ger")]) -(define_int_attr apvi4i2 [(UNSPEC_MMA_PMXVF64GERPP "pmxvf64gerpp") - (UNSPEC_MMA_PMXVF64GERPN "pmxvf64gerpn") - (UNSPEC_MMA_PMXVF64GERNP "pmxvf64gernp") - (UNSPEC_MMA_PMXVF64GERNN "pmxvf64gernn")]) +(define_int_attr apvi4i2 [(UNSPEC_MMA_PMXVF64GERPP "xvf64gerpp") + (UNSPEC_MMA_PMXVF64GERPN "xvf64gerpn") + (UNSPEC_MMA_PMXVF64GERNP "xvf64gernp") + (UNSPEC_MMA_PMXVF64GERNN "xvf64gernn")]) -(define_int_attr vvi4i4i4 [(UNSPEC_MMA_PMXVI8GER4 "pmxvi8ger4")]) +(define_int_attr vvi4i4i4 [(UNSPEC_MMA_PMXVI8GER4 "xvi8ger4")]) -(define_int_attr avvi4i4i4 [(UNSPEC_MMA_PMXVI8GER4PP "pmxvi8ger4pp") - (UNSPEC_MMA_PMXVI8GER4SPP "pmxvi8ger4spp")]) +(define_int_attr avvi4i4i4 [(UNSPEC_MMA_PMXVI8GER4PP "xvi8ger4pp") + (UNSPEC_MMA_PMXVI8GER4SPP "xvi8ger4spp")]) ;; Vector pair support. OOmode can only live in VSRs. @@ -580,7 +583,9 @@ (define_insn "mma_" (match_operand:V16QI 2 "vsx_register_operand" "v,?wa")] MMA_VV))] "TARGET_MMA" - " %A0,%x1,%x2" +{ + return TARGET_DENSE_MATH ? "dm %A0,%x1,%x2" : " %A0,%x1,%x2"; +} [(set_attr "type" "mma")]) (define_insn "mma_" @@ -590,7 +595,9 @@ (define_insn "mma_" (match_operand:V16QI 3 "vsx_register_operand" "v,?wa")] MMA_AVV))] "TARGET_MMA" - " %A0,%x2,%x3" +{ + return TARGET_DENSE_MATH ? "dm %A0,%x2,%x3" : " %A0,%x2,%x3"; +} [(set_attr "type" "mma")]) (define_insn "mma_" @@ -599,7 +606,9 @@ (define_insn "mma_" (match_operand:V16QI 2 "vsx_register_operand" "v,?wa")] MMA_PV))] "TARGET_MMA" - " %A0,%x1,%x2" +{ + return TARGET_DENSE_MATH ? "dm %A0,%x1,%x2" : " %A0,%x1,%x2"; +} [(set_attr "type" "mma")]) (define_insn "mma_" @@ -609,10 +618,12 @@ (define_insn "mma_" (match_operand:V16QI 3 "vsx_register_operand" "v,?wa")] MMA_APV))] "TARGET_MMA" - " %A0,%x2,%x3" +{ + return TARGET_DENSE_MATH ? 
"dm %A0,%x2,%x3" : " %A0,%x2,%x3"; +} [(set_attr "type" "mma")]) -(define_insn "mma_" +(define_insn "mma_pm" [(set (match_operand:XO 0 "accumulator_operand" "=&wD,&wD") (unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "v,?wa") (match_operand:V16QI 2 "vsx_register_operand" "v,?wa") @@ -621,11 +632,15 @@ (define_insn "mma_" (match_operand:SI 5 "u8bit_cint_operand" "n,n")] MMA_VVI4I4I8))] "TARGET_MMA" - " %A0,%x1,%x2,%3,%4,%5" +{ + return (TARGET_DENSE_MATH + ? "pmdm %A0,%x1,%x2,%3,%4,%5" + : "pm %A0,%x1,%x2,%3,%4,%5"); +} [(set_attr "type" "mma") (set_attr "prefixed" "yes")]) -(define_insn "mma_" +(define_insn "mma_pm" [(set (match_operand:XO 0 "accumulator_operand" "=&wD,&wD") (unspec:XO [(match_operand:XO 1 "accumulator_operand" "0,0") (match_operand:V16QI 2 "vsx_register_operand" "v,?wa") @@ -635,11 +650,15 @@ (define_insn "mma_" (match_operand:SI 6 "u8bit_cint_operand" "n,n")] MMA_AVVI4I4I8))] "TARGET_MMA" - " %A0,%x2,%x3,%4,%5,%6" +{ + return (TARGET_DENSE_MATH + ? "pmdm %A0,%x2,%x3,%4,%5,%6" + : "pm %A0,%x2,%x3,%4,%5,%6"); +} [(set_attr "type" "mma") (set_attr "prefixed" "yes")]) -(define_insn "mma_" +(define_insn "mma_pm" [(set (match_operand:XO 0 "accumulator_operand" "=&wD,&wD") (unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "v,?wa") (match_operand:V16QI 2 "vsx_register_operand" "v,?wa") @@ -648,11 +667,15 @@ (define_insn "mma_" (match_operand:SI 5 "const_0_to_3_operand" "n,n")] MMA_VVI4I4I2))] "TARGET_MMA" - " %A0,%x1,%x2,%3,%4,%5" +{ + return (TARGET_DENSE_MATH + ? 
"pmdm %A0,%x1,%x2,%3,%4,%5" + : "pm %A0,%x1,%x2,%3,%4,%5"); +} [(set_attr "type" "mma") (set_attr "prefixed" "yes")]) -(define_insn "mma_" +(define_insn "mma_pm" [(set (match_operand:XO 0 "accumulator_operand" "=&wD,&wD") (unspec:XO [(match_operand:XO 1 "accumulator_operand" "0,0") (match_operand:V16QI 2 "vsx_register_operand" "v,?wa") @@ -662,11 +685,15 @@ (define_insn "mma_" (match_operand:SI 6 "const_0_to_3_operand" "n,n")] MMA_AVVI4I4I2))] "TARGET_MMA" - " %A0,%x2,%x3,%4,%5,%6" +{ + return (TARGET_DENSE_MATH + ? "pmdm %A0,%x2,%x3,%4,%5,%6" + : "pm %A0,%x2,%x3,%4,%5,%6"); +} [(set_attr "type" "mma") (set_attr "prefixed" "yes")]) -(define_insn "mma_" +(define_insn "mma_pm" [(set (match_operand:XO 0 "accumulator_operand" "=&wD,&wD") (unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "v,?wa") (match_operand:V16QI 2 "vsx_register_operand" "v,?wa") @@ -674,11 +701,15 @@ (define_insn "mma_" (match_operand:SI 4 "const_0_to_15_operand" "n,n")] MMA_VVI4I4))] "TARGET_MMA" - " %A0,%x1,%x2,%3,%4" +{ + return (TARGET_DENSE_MATH + ? "pmdm %A0,%x1,%x2,%3,%4" + : "pm %A0,%x1,%x2,%3,%4"); +} [(set_attr "type" "mma") (set_attr "prefixed" "yes")]) -(define_insn "mma_" +(define_insn "mma_pm" [(set (match_operand:XO 0 "accumulator_operand" "=&wD,&wD") (unspec:XO [(match_operand:XO 1 "accumulator_operand" "0,0") (match_operand:V16QI 2 "vsx_register_operand" "v,?wa") @@ -687,11 +718,15 @@ (define_insn "mma_" (match_operand:SI 5 "const_0_to_15_operand" "n,n")] MMA_AVVI4I4))] "TARGET_MMA" - " %A0,%x2,%x3,%4,%5" +{ + return (TARGET_DENSE_MATH + ? 
"pmdm %A0,%x2,%x3,%4,%5" + : "pm %A0,%x2,%x3,%4,%5"); +} [(set_attr "type" "mma") (set_attr "prefixed" "yes")]) -(define_insn "mma_" +(define_insn "mma_pm" [(set (match_operand:XO 0 "accumulator_operand" "=&wD,&wD") (unspec:XO [(match_operand:OO 1 "vsx_register_operand" "v,?wa") (match_operand:V16QI 2 "vsx_register_operand" "v,?wa") @@ -699,11 +734,15 @@ (define_insn "mma_" (match_operand:SI 4 "const_0_to_3_operand" "n,n")] MMA_PVI4I2))] "TARGET_MMA" - " %A0,%x1,%x2,%3,%4" +{ + return (TARGET_DENSE_MATH + ? "pmdm %A0,%x1,%x2,%3,%4" + : "pm %A0,%x1,%x2,%3,%4"); +} [(set_attr "type" "mma") (set_attr "prefixed" "yes")]) -(define_insn "mma_" +(define_insn "mma_pm" [(set (match_operand:XO 0 "accumulator_operand" "=&wD,&wD") (unspec:XO [(match_operand:XO 1 "accumulator_operand" "0,0") (match_operand:OO 2 "vsx_register_operand" "v,?wa") @@ -712,11 +751,15 @@ (define_insn "mma_" (match_operand:SI 5 "const_0_to_3_operand" "n,n")] MMA_APVI4I2))] "TARGET_MMA" - " %A0,%x2,%x3,%4,%5" +{ + return (TARGET_DENSE_MATH + ? "pmdm %A0,%x2,%x3,%4,%5" + : "pm %A0,%x2,%x3,%4,%5"); +} [(set_attr "type" "mma") (set_attr "prefixed" "yes")]) -(define_insn "mma_" +(define_insn "mma_pm" [(set (match_operand:XO 0 "accumulator_operand" "=&wD,&wD") (unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "v,?wa") (match_operand:V16QI 2 "vsx_register_operand" "v,?wa") @@ -725,11 +768,15 @@ (define_insn "mma_" (match_operand:SI 5 "const_0_to_15_operand" "n,n")] MMA_VVI4I4I4))] "TARGET_MMA" - " %A0,%x1,%x2,%3,%4,%5" +{ + return (TARGET_DENSE_MATH + ? 
"pmdm %A0,%x1,%x2,%3,%4,%5" + : "pm %A0,%x1,%x2,%3,%4,%5"); +} [(set_attr "type" "mma") (set_attr "prefixed" "yes")]) -(define_insn "mma_" +(define_insn "mma_pm" [(set (match_operand:XO 0 "accumulator_operand" "=&wD,&wD") (unspec:XO [(match_operand:XO 1 "accumulator_operand" "0,0") (match_operand:V16QI 2 "vsx_register_operand" "v,?wa") @@ -739,6 +786,10 @@ (define_insn "mma_" (match_operand:SI 6 "const_0_to_15_operand" "n,n")] MMA_AVVI4I4I4))] "TARGET_MMA" - " %A0,%x2,%x3,%4,%5,%6" +{ + return (TARGET_DENSE_MATH + ? "pmdm %A0,%x2,%x3,%4,%5,%6" + : "pm %A0,%x2,%x3,%4,%5,%6"); +} [(set_attr "type" "mma") (set_attr "prefixed" "yes")]) From patchwork Mon Oct 28 19:42:06 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Michael Meissner X-Patchwork-Id: 2003480 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=ibm.com header.i=@ibm.com header.a=rsa-sha256 header.s=pp1 header.b=FL/1EbII; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org; envelope-from=gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4XckLp1V84z1xwF for ; Tue, 29 Oct 2024 06:42:41 +1100 (AEDT) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 6FEA83858D37 for ; Mon, 28 Oct 2024 19:42:39 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org 
Delivered-To: gcc-patches@gcc.gnu.org Received: from mx0b-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com [148.163.158.5]) by sourceware.org (Postfix) with ESMTPS id 3754E3858D34 for ; Mon, 28 Oct 2024 19:42:11 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 3754E3858D34 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=linux.ibm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=linux.ibm.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 3754E3858D34 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=148.163.158.5 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1730144533; cv=none; b=EY75t5UH3ocgde9GGWQp4WcSG7GxTVCCoXQeggh3OU3TvlJrgwKhyWupDOzJ3Ktz6Mm6eSizFwkVSZ2JpaWManeU3JQkItE9gtNv2kiHLBe0vXhaAQzsiGPvP8rvbM2kLLVktCsyWneGKLOI8AQTH9Ax+JjMIUBJT5WouUzjGDw= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1730144533; c=relaxed/simple; bh=nMuWzdhTtPNImKiSKq3BJYpbT/MnNieX/VhEwaG/VeI=; h=DKIM-Signature:Date:From:To:Subject:Message-ID:MIME-Version; b=KLP8dyUU5xoCCRfQN5FhgXTxQVT7Zy2UhZkoWU4p/zFMEBeapYVBylqPwrx2t1Vunedt48dv+vR77MBFgM7pHKRJ6CN6UI1zQlt9ewIiR13MgU1YALqjZgmJMgNd1wyD07rgCQlg3tLgmYpte5FLv49ZyZ1lIg6Y2DpXFzDtUq0= ARC-Authentication-Results: i=1; server2.sourceware.org Received: from pps.filterd (m0360072.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.18.1.2/8.18.1.2) with ESMTP id 49SHnOj5006132; Mon, 28 Oct 2024 19:42:10 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h= content-type:date:from:in-reply-to:message-id:mime-version :references:subject:to; s=pp1; bh=IqeNm24ly8Oy4mXKiJfkrz16yqlg9A mXBvNcQFsEiqQ=; b=FL/1EbII46AyoYIo9bBjLMHaCtgge7f7HPivbg4jj9CqiF LURm0lq7/QzYi7xhk6orIlyzazYUynMB24gYV1ivvtwfW3wlnaDUZGSco0Phxjsn n5Dzi6JfQYF1Mz9FrbRn0gaxTVtut+v1/dgDYTWJI7Airlos2gPqSedrVCbeqiAi KB/ETuwWtva9+YEPRm7ZcXO3sg2pBxPI1gFte+cOB6oX3p8ZLtGXVIr1ZqudRs0M +rHtTc2rMFnrKVQhMoumOofRhaPUvHPRFou2V+QOJ7gWYuNJhpR5Hv2A1eOUVBOX 
srddC+sSe1O44i1XmysdB3yEIudvi15fMIgkLJkw== Received: from ppma22.wdc07v.mail.ibm.com (5c.69.3da9.ip4.static.sl-reverse.com [169.61.105.92]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 42j43ebwd4-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Mon, 28 Oct 2024 19:42:10 +0000 (GMT) Received: from pps.filterd (ppma22.wdc07v.mail.ibm.com [127.0.0.1]) by ppma22.wdc07v.mail.ibm.com (8.18.1.2/8.18.1.2) with ESMTP id 49SFo1MW028181; Mon, 28 Oct 2024 19:42:10 GMT Received: from smtprelay03.wdc07v.mail.ibm.com ([172.16.1.70]) by ppma22.wdc07v.mail.ibm.com (PPS) with ESMTPS id 42hb4xqvtn-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Mon, 28 Oct 2024 19:42:09 +0000 Received: from smtpav01.wdc07v.mail.ibm.com (smtpav01.wdc07v.mail.ibm.com [10.39.53.228]) by smtprelay03.wdc07v.mail.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 49SJg8DO20775462 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Mon, 28 Oct 2024 19:42:08 GMT Received: from smtpav01.wdc07v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id BCEBE5805B; Mon, 28 Oct 2024 19:42:08 +0000 (GMT) Received: from smtpav01.wdc07v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 3D52458055; Mon, 28 Oct 2024 19:42:08 +0000 (GMT) Received: from cowardly-lion.the-meissners.org (unknown [9.61.80.9]) by smtpav01.wdc07v.mail.ibm.com (Postfix) with ESMTPS; Mon, 28 Oct 2024 19:42:08 +0000 (GMT) Date: Mon, 28 Oct 2024 15:42:06 -0400 From: Michael Meissner To: Michael Meissner , gcc-patches@gcc.gnu.org, Segher Boessenkool , Peter Bergner Subject: [PATCH 5/6] Add dense math test for new instruction names Message-ID: Mail-Followup-To: Michael Meissner , gcc-patches@gcc.gnu.org, Segher Boessenkool , Peter Bergner References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: X-TM-AS-GCONF: 00 X-Proofpoint-GUID: ghEA5I1FGcK0r0EK6UCqmEouSxhUTrqj X-Proofpoint-ORIG-GUID: ghEA5I1FGcK0r0EK6UCqmEouSxhUTrqj 
X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.293,Aquarius:18.0.1051,Hydra:6.0.680,FMLib:17.12.62.30 definitions=2024-10-15_01,2024-10-11_01,2024-09-30_01
X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 mlxscore=0 suspectscore=0 bulkscore=0 impostorscore=0 malwarescore=0 clxscore=1015 priorityscore=1501 mlxlogscore=999 lowpriorityscore=0 adultscore=0 phishscore=0 spamscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.19.0-2409260000 definitions=main-2410280152
X-Spam-Status: No, score=-11.4 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT, RCVD_IN_DNSWL_LOW, RCVD_IN_MSPIKE_H3, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.30
Precedence: list
List-Id: Gcc-patches mailing list
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org

This patch adds a test for the new dense math support.

2024-10-28 Michael Meissner

gcc/testsuite/

	* gcc.target/powerpc/dm-double-test.c: New test.
	* lib/target-supports.exp (check_effective_target_powerpc_dense_math_ok): New target test.
---
 .../gcc.target/powerpc/dm-double-test.c | 194 ++++++++++++++++++
 gcc/testsuite/lib/target-supports.exp | 23 +++
 2 files changed, 217 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/dm-double-test.c

diff --git a/gcc/testsuite/gcc.target/powerpc/dm-double-test.c b/gcc/testsuite/gcc.target/powerpc/dm-double-test.c
new file mode 100644
index 00000000000..66c19779585
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/dm-double-test.c
@@ -0,0 +1,194 @@
+/* Test derived from mma-double-1.c, modified for dense math.
*/ +/* { dg-do compile } */ +/* { dg-require-effective-target powerpc_dense_math_ok } */ +/* { dg-options "-mdejagnu-cpu=future -O2" } */ + +#include +#include +#include + +typedef unsigned char vec_t __attribute__ ((vector_size (16))); +typedef double v4sf_t __attribute__ ((vector_size (16))); +#define SAVE_ACC(ACC, ldc, J) \ + __builtin_mma_disassemble_acc (result, ACC); \ + rowC = (v4sf_t *) &CO[0*ldc+J]; \ + rowC[0] += result[0]; \ + rowC = (v4sf_t *) &CO[1*ldc+J]; \ + rowC[0] += result[1]; \ + rowC = (v4sf_t *) &CO[2*ldc+J]; \ + rowC[0] += result[2]; \ + rowC = (v4sf_t *) &CO[3*ldc+J]; \ + rowC[0] += result[3]; + +void +DM (int m, int n, int k, double *A, double *B, double *C) +{ + __vector_quad acc0, acc1, acc2, acc3, acc4, acc5, acc6, acc7; + v4sf_t result[4]; + v4sf_t *rowC; + for (int l = 0; l < n; l += 4) + { + double *CO; + double *AO; + AO = A; + CO = C; + C += m * 4; + for (int j = 0; j < m; j += 16) + { + double *BO = B; + __builtin_mma_xxsetaccz (&acc0); + __builtin_mma_xxsetaccz (&acc1); + __builtin_mma_xxsetaccz (&acc2); + __builtin_mma_xxsetaccz (&acc3); + __builtin_mma_xxsetaccz (&acc4); + __builtin_mma_xxsetaccz (&acc5); + __builtin_mma_xxsetaccz (&acc6); + __builtin_mma_xxsetaccz (&acc7); + unsigned long i; + + for (i = 0; i < k; i++) + { + vec_t *rowA = (vec_t *) & AO[i * 16]; + __vector_pair rowB; + vec_t *rb = (vec_t *) & BO[i * 4]; + __builtin_mma_assemble_pair (&rowB, rb[1], rb[0]); + __builtin_mma_xvf64gerpp (&acc0, rowB, rowA[0]); + __builtin_mma_xvf64gerpp (&acc1, rowB, rowA[1]); + __builtin_mma_xvf64gerpp (&acc2, rowB, rowA[2]); + __builtin_mma_xvf64gerpp (&acc3, rowB, rowA[3]); + __builtin_mma_xvf64gerpp (&acc4, rowB, rowA[4]); + __builtin_mma_xvf64gerpp (&acc5, rowB, rowA[5]); + __builtin_mma_xvf64gerpp (&acc6, rowB, rowA[6]); + __builtin_mma_xvf64gerpp (&acc7, rowB, rowA[7]); + } + SAVE_ACC (&acc0, m, 0); + SAVE_ACC (&acc2, m, 4); + SAVE_ACC (&acc1, m, 2); + SAVE_ACC (&acc3, m, 6); + SAVE_ACC (&acc4, m, 8); + SAVE_ACC (&acc6, m, 
12); + SAVE_ACC (&acc5, m, 10); + SAVE_ACC (&acc7, m, 14); + AO += k * 16; + BO += k * 4; + CO += 16; + } + B += k * 4; + } +} + +void +init (double *matrix, int row, int column) +{ + for (int j = 0; j < column; j++) + { + for (int i = 0; i < row; i++) + { + matrix[j * row + i] = (i * 16 + 2 + j) / 0.123; + } + } +} + +void +init0 (double *matrix, double *matrix1, int row, int column) +{ + for (int j = 0; j < column; j++) + for (int i = 0; i < row; i++) + matrix[j * row + i] = matrix1[j * row + i] = 0; +} + + +void +print (const char *name, const double *matrix, int row, int column) +{ + printf ("Matrix %s has %d rows and %d columns:\n", name, row, column); + for (int i = 0; i < row; i++) + { + for (int j = 0; j < column; j++) + { + printf ("%f ", matrix[j * row + i]); + } + printf ("\n"); + } + printf ("\n"); +} + +int +main (int argc, char *argv[]) +{ + int rowsA, colsB, common; + int i, j, k; + int ret = 0; + + for (int t = 16; t <= 128; t += 16) + { + for (int t1 = 4; t1 <= 16; t1 += 4) + { + rowsA = t; + colsB = t1; + common = 1; + /* printf ("Running test for rows = %d,cols = %d\n", t, t1); */ + double A[rowsA * common]; + double B[common * colsB]; + double C[rowsA * colsB]; + double D[rowsA * colsB]; + + + init (A, rowsA, common); + init (B, common, colsB); + init0 (C, D, rowsA, colsB); + DM (rowsA, colsB, common, A, B, C); + + for (i = 0; i < colsB; i++) + { + for (j = 0; j < rowsA; j++) + { + D[i * rowsA + j] = 0; + for (k = 0; k < common; k++) + { + D[i * rowsA + j] += + A[k * rowsA + j] * B[k + common * i]; + } + } + } + for (i = 0; i < colsB; i++) + { + for (j = 0; j < rowsA; j++) + { + for (k = 0; k < common; k++) + { + if (D[i * rowsA + j] != C[i * rowsA + j]) + { + printf ("Error %d,%d,%d\n",i,j,k); + ret++; + } + } + } + } + if (ret) + { + print ("A", A, rowsA, common); + print ("B", B, common, colsB); + print ("C", C, rowsA, colsB); + print ("D", D, rowsA, colsB); + } + } + } + +#ifdef VERBOSE + if (ret) + printf ("DM double test fail: %d 
errors\n",ret); + else + printf ("DM double test success: 0 DM errors\n"); +#else + if (ret) + abort(); +#endif + + return ret; +} + +/* { dg-final { scan-assembler {\mdmsetdmrz\M} } } */ +/* { dg-final { scan-assembler {\mdmxvf64gerpp\M} } } */ +/* { dg-final { scan-assembler {\mdmxxextfdmr512\M} } } */ + diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp index d113a08dff7..0b506fdda35 100644 --- a/gcc/testsuite/lib/target-supports.exp +++ b/gcc/testsuite/lib/target-supports.exp @@ -7366,6 +7366,29 @@ proc check_effective_target_power10_ok { } { } } +# Return 1 if this is a PowerPC target supporting -mcpu=future which enables +# the dense math operations. +proc check_effective_target_powerpc_dense_math_ok { } { + if { ([istarget powerpc*-*-*]) } { + return [check_no_compiler_messages powerpc_dense_math_ok object { + __vector_quad vq; + int main (void) { + #ifndef __DENSE_MATH__ + #error "target does not have dense math support." + #else + /* Make sure we have dense math support. */ + __vector_quad dmr; + __asm__ ("dmsetaccz %A0" : "=wD" (dmr)); + vq = dmr; + #endif + return 0; + } + } "-mcpu=future"] + } else { + return 0; + } +} + # Return 1 if this is a PowerPC target supporting -mfloat128 via either # software emulation on power7/power8 systems or hardware support on power9. 
From patchwork Mon Oct 28 19:43:30 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Michael Meissner
X-Patchwork-Id: 2003481
Date: Mon, 28 Oct 2024 15:43:30 -0400
From: Michael Meissner
To: Michael Meissner, gcc-patches@gcc.gnu.org, Segher Boessenkool, Peter Bergner
Subject: [PATCH 6/6] Add support for 1,024 bit Dense Math Registers
List-Id: Gcc-patches mailing list

This patch is a preliminary patch to add the full 1,024 bit dense math
registers (DMRs) for -mcpu=future.  The MMA 512-bit accumulators map onto the
top of the DMR register.

This patch only adds the new 1,024 bit register support.  It does not add
support for any instructions that need 1,024 bit registers instead of 512 bit
registers.

I used the new mode 'TDOmode' as the opaque mode for 1,024 bit registers.  The
'wD' constraint added in previous patches is used for these registers.  I
added support to do load and store of DMRs via the VSX registers, since there
are no load/store dense math instructions.  I added the new keyword '__dmr' to
create 1,024 bit types that can be loaded into DMRs.  At present, I don't have
the aliases __dmr512 and __dmr1024 that we've discussed internally.

The patches have been tested on both little and big endian systems.  Can I
check it into the master branch?

2024-10-28  Michael Meissner

gcc/

	* config/rs6000/mma.md (UNSPEC_DM_INSERT512_UPPER): New unspec.
	(UNSPEC_DM_INSERT512_LOWER): Likewise.
	(UNSPEC_DM_EXTRACT512): Likewise.
	(UNSPEC_DMR_RELOAD_FROM_MEMORY): Likewise.
	(UNSPEC_DMR_RELOAD_TO_MEMORY): Likewise.
	(movtdo): New define_expand and define_insn_and_split to implement
	1,024 bit DMR registers.
	(movtdo_insert512_upper): New insn.
	(movtdo_insert512_lower): Likewise.
	(movtdo_extract512): Likewise.
	(reload_dmr_from_memory): Likewise.
	(reload_dmr_to_memory): Likewise.
	* config/rs6000/rs6000-builtin.cc (rs6000_type_string): Add DMR support.
	(rs6000_init_builtins): Add support for __dmr keyword.
	* config/rs6000/rs6000-call.cc (rs6000_return_in_memory): Add support
	for TDOmode.
	(rs6000_function_arg): Likewise.
	* config/rs6000/rs6000-modes.def (TDOmode): New mode.
	* config/rs6000/rs6000.cc (rs6000_hard_regno_nregs_internal): Add
	support for TDOmode.
	(rs6000_hard_regno_mode_ok_uncached): Likewise.
	(rs6000_hard_regno_mode_ok): Likewise.
	(rs6000_modes_tieable_p): Likewise.
	(rs6000_debug_reg_global): Likewise.
	(rs6000_setup_reg_addr_masks): Likewise.
	(rs6000_init_hard_regno_mode_ok): Add support for TDOmode.  Setup
	reload hooks for DMR mode.
	(reg_offset_addressing_ok_p): Add support for TDOmode.
	(rs6000_emit_move): Likewise.
	(rs6000_secondary_reload_simple_move): Likewise.
	(rs6000_preferred_reload_class): Likewise.
	(rs6000_secondary_reload_class): Likewise.
	(rs6000_mangle_type): Add mangling for __dmr type.
	(rs6000_dmr_register_move_cost): Add support for TDOmode.
	(rs6000_split_multireg_move): Likewise.
	(rs6000_invalid_conversion): Likewise.
	* config/rs6000/rs6000.h (VECTOR_ALIGNMENT_P): Add TDOmode.
	(enum rs6000_builtin_type_index): Add DMR type nodes.
	(dmr_type_node): Likewise.
	(ptr_dmr_type_node): Likewise.

gcc/testsuite/

	* gcc.target/powerpc/dm-1024bit.c: New test.
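
As an illustration of the load/store scheme described in the message above (a
1,024 bit value moved through VSX registers as two 512 bit halves), here is a
minimal portable C sketch of the offset arithmetic.  It is not the GCC
implementation.  It assumes the layout used by the reload patterns in this
patch (upper half at byte offset 0 on big endian, 64 on little endian), and
the names dmr_half_offsets, dmr_split_offsets and vector_pairs_needed are
hypothetical.

```c
#include <assert.h>

/* Sizes in bytes of the opaque modes named in this patch:
   TDOmode (1,024 bits), XOmode (512 bits), OOmode (256 bits).  */
enum { TDO_BYTES = 128, XO_BYTES = 64, OO_BYTES = 32 };

/* Hypothetical helper type: byte offsets in memory of the two 512-bit
   halves of a 1,024-bit value.  */
struct dmr_half_offsets { int upper; int lower; };

/* Mirror the "BYTES_BIG_ENDIAN ? 0 : 64" selection made by the reload
   patterns: the upper half comes first in memory on big endian.  */
static struct dmr_half_offsets
dmr_split_offsets (int big_endian)
{
  struct dmr_half_offsets h;
  assert (big_endian == 0 || big_endian == 1);
  h.upper = big_endian ? 0 : XO_BYTES;
  h.lower = big_endian ? XO_BYTES : 0;
  return h;
}

/* Number of 256-bit vector pair accesses needed to cover SIZE bytes.  */
static int
vector_pairs_needed (int size)
{
  assert (size > 0 && size % OO_BYTES == 0);
  return size / OO_BYTES;
}
```

Calling dmr_split_offsets (0) gives the little endian layout, and
vector_pairs_needed (TDO_BYTES) gives the number of vector pair accesses that
cover one full 1,024 bit value.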
---
 gcc/config/rs6000/mma.md                      | 154 ++++++++++++++++++
 gcc/config/rs6000/rs6000-builtin.cc           |  17 ++
 gcc/config/rs6000/rs6000-call.cc              |  10 +-
 gcc/config/rs6000/rs6000-modes.def            |   4 +
 gcc/config/rs6000/rs6000.cc                   | 101 ++++++++----
 gcc/config/rs6000/rs6000.h                    |   6 +-
 gcc/testsuite/gcc.target/powerpc/dm-1024bit.c |  63 +++++++
 7 files changed, 321 insertions(+), 34 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/dm-1024bit.c

diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md
index 2e04eb653fa..8461499e1c3 100644
--- a/gcc/config/rs6000/mma.md
+++ b/gcc/config/rs6000/mma.md
@@ -92,6 +92,11 @@ (define_c_enum "unspec"
    UNSPEC_MMA_XXMFACC
    UNSPEC_MMA_XXMTACC
    UNSPEC_MMA_DMSETDMRZ
+   UNSPEC_DM_INSERT512_UPPER
+   UNSPEC_DM_INSERT512_LOWER
+   UNSPEC_DM_EXTRACT512
+   UNSPEC_DMR_RELOAD_FROM_MEMORY
+   UNSPEC_DMR_RELOAD_TO_MEMORY
   ])
 
 (define_c_enum "unspecv"
@@ -793,3 +798,152 @@ (define_insn "mma_pm"
 }
   [(set_attr "type" "mma")
    (set_attr "prefixed" "yes")])
+
+;; TDOmode (__dmr keyword for 1,024 bit registers).
+(define_expand "movtdo"
+  [(set (match_operand:TDO 0 "nonimmediate_operand")
+        (match_operand:TDO 1 "input_operand"))]
+  "TARGET_MMA_DENSE_MATH"
+{
+  rs6000_emit_move (operands[0], operands[1], TDOmode);
+  DONE;
+})
+
+(define_insn_and_split "*movtdo"
+  [(set (match_operand:TDO 0 "nonimmediate_operand" "=wa,m,wa,wD,wD,wa")
+        (match_operand:TDO 1 "input_operand" "m,wa,wa,wa,wD,wD"))]
+  "TARGET_MMA_DENSE_MATH
+   && (gpc_reg_operand (operands[0], TDOmode)
+       || gpc_reg_operand (operands[1], TDOmode))"
+  "@
+   #
+   #
+   #
+   #
+   dmmr %0,%1
+   #"
+  "&& reload_completed
+   && (!dmr_operand (operands[0], TDOmode) || !dmr_operand (operands[1], TDOmode))"
+  [(const_int 0)]
+{
+  rtx op0 = operands[0];
+  rtx op1 = operands[1];
+
+  if (REG_P (op0) && REG_P (op1))
+    {
+      int regno0 = REGNO (op0);
+      int regno1 = REGNO (op1);
+
+      if (DMR_REGNO_P (regno0) && VSX_REGNO_P (regno1))
+        {
+          rtx op1_upper = gen_rtx_REG (XOmode, regno1);
+          rtx op1_lower = gen_rtx_REG (XOmode, regno1 + 4);
+          emit_insn (gen_movtdo_insert512_upper (op0, op1_upper));
+          emit_insn (gen_movtdo_insert512_lower (op0, op0, op1_lower));
+          DONE;
+        }
+
+      else if (VSX_REGNO_P (regno0) && DMR_REGNO_P (regno1))
+        {
+          rtx op0_upper = gen_rtx_REG (XOmode, regno0);
+          rtx op0_lower = gen_rtx_REG (XOmode, regno0 + 4);
+          emit_insn (gen_movtdo_extract512 (op0_upper, op1, const0_rtx));
+          emit_insn (gen_movtdo_extract512 (op0_lower, op1, const1_rtx));
+          DONE;
+        }
+
+      else
+        gcc_assert (VSX_REGNO_P (regno0) && VSX_REGNO_P (regno1));
+    }
+
+  rs6000_split_multireg_move (operands[0], operands[1]);
+  DONE;
+}
+  [(set_attr "type" "vecload,vecstore,vecmove,vecmove,vecmove,vecmove")
+   (set_attr "length" "*,*,32,8,*,8")
+   (set_attr "max_prefixed_insns" "4,4,*,*,*,*")])
+
+;; Move from VSX registers to DMR registers via two insert 512 bit
+;; instructions.
+(define_insn "movtdo_insert512_upper"
+  [(set (match_operand:TDO 0 "dmr_operand" "=wD")
+        (unspec:TDO [(match_operand:XO 1 "vsx_register_operand" "wa")]
+                    UNSPEC_DM_INSERT512_UPPER))]
+  "TARGET_MMA_DENSE_MATH"
+  "dmxxinstdmr512 %0,%1,%Y1,0"
+  [(set_attr "type" "mma")])
+
+(define_insn "movtdo_insert512_lower"
+  [(set (match_operand:TDO 0 "dmr_operand" "=wD")
+        (unspec:TDO [(match_operand:TDO 1 "dmr_operand" "0")
+                     (match_operand:XO 2 "vsx_register_operand" "wa")]
+                    UNSPEC_DM_INSERT512_LOWER))]
+  "TARGET_MMA_DENSE_MATH"
+  "dmxxinstdmr512 %0,%2,%Y2,1"
+  [(set_attr "type" "mma")])
+
+;; Move from DMR registers to VSX registers via two extract 512 bit
+;; instructions.
+(define_insn "movtdo_extract512"
+  [(set (match_operand:XO 0 "vsx_register_operand" "=wa")
+        (unspec:XO [(match_operand:TDO 1 "dmr_operand" "wD")
+                    (match_operand 2 "const_0_to_1_operand" "n")]
+                   UNSPEC_DM_EXTRACT512))]
+  "TARGET_MMA_DENSE_MATH"
+  "dmxxextfdmr512 %0,%Y0,%1,%2"
+  [(set_attr "type" "mma")])
+
+;; Reload DMR registers from memory
+(define_insn_and_split "reload_dmr_from_memory"
+  [(set (match_operand:TDO 0 "dmr_operand" "=wD")
+        (unspec:TDO [(match_operand:TDO 1 "memory_operand" "m")]
+                    UNSPEC_DMR_RELOAD_FROM_MEMORY))
+   (clobber (match_operand:XO 2 "vsx_register_operand" "=wa"))]
+  "TARGET_MMA_DENSE_MATH"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  rtx dest = operands[0];
+  rtx src = operands[1];
+  rtx tmp = operands[2];
+  rtx mem_upper = adjust_address (src, XOmode, BYTES_BIG_ENDIAN ? 0 : 64);
+  rtx mem_lower = adjust_address (src, XOmode, BYTES_BIG_ENDIAN ? 64 : 0);
+
+  emit_move_insn (tmp, mem_upper);
+  emit_insn (gen_movtdo_insert512_upper (dest, tmp));
+
+  emit_move_insn (tmp, mem_lower);
+  emit_insn (gen_movtdo_insert512_lower (dest, dest, tmp));
+  DONE;
+}
+  [(set_attr "length" "16")
+   (set_attr "max_prefixed_insns" "2")
+   (set_attr "type" "vecload")])
+
+;; Reload dense math registers to memory
+(define_insn_and_split "reload_dmr_to_memory"
+  [(set (match_operand:TDO 0 "memory_operand" "=m")
+        (unspec:TDO [(match_operand:TDO 1 "dmr_operand" "wD")]
+                    UNSPEC_DMR_RELOAD_TO_MEMORY))
+   (clobber (match_operand:XO 2 "vsx_register_operand" "=wa"))]
+  "TARGET_MMA_DENSE_MATH"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  rtx dest = operands[0];
+  rtx src = operands[1];
+  rtx tmp = operands[2];
+  rtx mem_upper = adjust_address (dest, XOmode, BYTES_BIG_ENDIAN ? 0 : 64);
+  rtx mem_lower = adjust_address (dest, XOmode, BYTES_BIG_ENDIAN ? 64 : 0);
+
+  emit_insn (gen_movtdo_extract512 (tmp, src, const0_rtx));
+  emit_move_insn (mem_upper, tmp);
+
+  emit_insn (gen_movtdo_extract512 (tmp, src, const1_rtx));
+  emit_move_insn (mem_lower, tmp);
+  DONE;
+}
+  [(set_attr "length" "16")
+   (set_attr "max_prefixed_insns" "2")])
diff --git a/gcc/config/rs6000/rs6000-builtin.cc b/gcc/config/rs6000/rs6000-builtin.cc
index f2063edd2c3..8e4335e9b44 100644
--- a/gcc/config/rs6000/rs6000-builtin.cc
+++ b/gcc/config/rs6000/rs6000-builtin.cc
@@ -495,6 +495,8 @@ const char *rs6000_type_string (tree type_node)
     return "__vector_pair";
   else if (type_node == vector_quad_type_node)
     return "__vector_quad";
+  else if (type_node == dmr_type_node)
+    return "__dmr";
 
   return "unknown";
 }
@@ -781,6 +783,21 @@ rs6000_init_builtins (void)
   t = build_qualified_type (vector_quad_type_node, TYPE_QUAL_CONST);
   ptr_vector_quad_type_node = build_pointer_type (t);
 
+  /* For TDOmode (1,024 bit dense math accumulators), don't use an alignment of
+     1,024, use 512.  TDOmode loads and stores are always broken up into 2
+     vector pair loads or stores.  In addition, we don't have support for
+     aligning the stack to 1,024 bits.  */
+  dmr_type_node = make_node (OPAQUE_TYPE);
+  SET_TYPE_MODE (dmr_type_node, TDOmode);
+  TYPE_SIZE (dmr_type_node) = bitsize_int (GET_MODE_BITSIZE (TDOmode));
+  TYPE_PRECISION (dmr_type_node) = GET_MODE_BITSIZE (TDOmode);
+  TYPE_SIZE_UNIT (dmr_type_node) = size_int (GET_MODE_SIZE (TDOmode));
+  SET_TYPE_ALIGN (dmr_type_node, 512);
+  TYPE_USER_ALIGN (dmr_type_node) = 0;
+  lang_hooks.types.register_builtin_type (dmr_type_node, "__dmr");
+  t = build_qualified_type (dmr_type_node, TYPE_QUAL_CONST);
+  ptr_dmr_type_node = build_pointer_type (t);
+
   tdecl = add_builtin_type ("__bool char", bool_char_type_node);
   TYPE_NAME (bool_char_type_node) = tdecl;
 
diff --git a/gcc/config/rs6000/rs6000-call.cc b/gcc/config/rs6000/rs6000-call.cc
index a039ff75f3c..5220654ef6f 100644
--- a/gcc/config/rs6000/rs6000-call.cc
+++ b/gcc/config/rs6000/rs6000-call.cc
@@ -437,14 +437,15 @@ rs6000_return_in_memory (const_tree type, const_tree fntype ATTRIBUTE_UNUSED)
   if (cfun
       && !cfun->machine->mma_return_type_error
       && TREE_TYPE (cfun->decl) == fntype
-      && (TYPE_MODE (type) == OOmode || TYPE_MODE (type) == XOmode))
+      && OPAQUE_MODE_P (TYPE_MODE (type)))
     {
       /* Record we have now handled function CFUN, so the next time we
          are called, we do not re-report the same error.  */
       cfun->machine->mma_return_type_error = true;
       if (TYPE_CANONICAL (type) != NULL_TREE)
         type = TYPE_CANONICAL (type);
-      error ("invalid use of MMA type %qs as a function return value",
+      error ("invalid use of %s type %qs as a function return value",
+             (TYPE_MODE (type) == TDOmode) ? "dense math" : "MMA",
              IDENTIFIER_POINTER (DECL_NAME (TYPE_NAME (type))));
     }
 
@@ -1632,11 +1633,12 @@ rs6000_function_arg (cumulative_args_t cum_v, const function_arg_info &arg)
   int n_elts;
 
   /* We do not allow MMA types being used as function arguments.  */
-  if (mode == OOmode || mode == XOmode)
+  if (OPAQUE_MODE_P (mode))
     {
       if (TYPE_CANONICAL (type) != NULL_TREE)
         type = TYPE_CANONICAL (type);
-      error ("invalid use of MMA operand of type %qs as a function parameter",
+      error ("invalid use of %s operand of type %qs as a function parameter",
+             (mode == TDOmode) ? "dense math" : "MMA",
              IDENTIFIER_POINTER (DECL_NAME (TYPE_NAME (type))));
       return NULL_RTX;
     }
diff --git a/gcc/config/rs6000/rs6000-modes.def b/gcc/config/rs6000/rs6000-modes.def
index b69593c40a6..d089dc2c323 100644
--- a/gcc/config/rs6000/rs6000-modes.def
+++ b/gcc/config/rs6000/rs6000-modes.def
@@ -79,3 +79,7 @@ PARTIAL_INT_MODE (TI, 128, PTI);
 /* Modes used by __vector_pair and __vector_quad.  */
 OPAQUE_MODE (OO, 32);
 OPAQUE_MODE (XO, 64);
+
+/* Mode used by __dmr.  */
+OPAQUE_MODE (TDO, 128);
+
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index bd1c979eca2..a660101f51a 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -1925,7 +1925,8 @@ rs6000_hard_regno_nregs_internal (int regno, machine_mode mode)
      128-bit floating point that can go in vector registers, which has VSX
      memory addressing.  */
   if (FP_REGNO_P (regno))
-    reg_size = (VECTOR_MEM_VSX_P (mode) || VECTOR_ALIGNMENT_P (mode)
+    reg_size = (VECTOR_MEM_VSX_P (mode)
+                || VECTOR_ALIGNMENT_P (mode)
                 ? UNITS_PER_VSX_WORD
                 : UNITS_PER_FP_WORD);
 
@@ -1965,13 +1966,13 @@ rs6000_hard_regno_mode_ok_uncached (int regno, machine_mode mode)
      Because we just use the VSX registers for load/store operations, we just
      need to make sure load vector pair and store vector pair instructions can
      be used.  */
-  if (mode == XOmode)
+  if (mode == XOmode || mode == TDOmode)
     {
       if (!TARGET_MMA)
         return 0;
 
       else if (!TARGET_DENSE_MATH)
-        return (FP_REGNO_P (regno) && (regno & 3) == 0);
+        return (mode == XOmode && FP_REGNO_P (regno) && (regno & 3) == 0);
 
       else if (DMR_REGNO_P (regno))
         return 1;
@@ -1982,7 +1983,7 @@ rs6000_hard_regno_mode_ok_uncached (int regno, machine_mode mode)
                 && (regno & 1) == 0);
     }
 
-  /* No other types other than XOmode can go in DMRs.  */
+  /* No other types other than XOmode or TDOmode can go in DMRs.  */
   if (DMR_REGNO_P (regno))
     return 0;
 
@@ -2090,9 +2091,11 @@ rs6000_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
    GPR registers, and TImode can go in any GPR as well as VSX registers (PR
    57744).
 
-   Similarly, don't allow OOmode (vector pair, restricted to even VSX
-   registers) or XOmode (vector quad, restricted to FPR registers divisible
-   by 4) to tie with other modes.
+   Similarly, don't allow OOmode (vector pair), XOmode (vector quad), or
+   TDOmode (dmr register) to pair with anything else.  Vector pairs are
+   restricted to even/odd VSX registers.  Without dense math, vector quads are
+   limited to FPR registers divisible by 4.  With dense math, vector quads are
+   limited to even VSX registers or DMR registers.
 
    Altivec/VSX vector tests were moved ahead of scalar float mode, so that IEEE
    128-bit floating point on VSX systems ties with other vectors.  */
@@ -2101,7 +2104,8 @@ static bool
 rs6000_modes_tieable_p (machine_mode mode1, machine_mode mode2)
 {
   if (mode1 == PTImode || mode1 == OOmode || mode1 == XOmode
-      || mode2 == PTImode || mode2 == OOmode || mode2 == XOmode)
+      || mode1 == TDOmode || mode2 == PTImode || mode2 == OOmode
+      || mode2 == XOmode || mode2 == TDOmode)
     return mode1 == mode2;
 
   if (ALTIVEC_OR_VSX_VECTOR_MODE (mode1))
@@ -2392,6 +2396,7 @@ rs6000_debug_reg_global (void)
     V4DFmode,
     OOmode,
     XOmode,
+    TDOmode,
     CCmode,
     CCUNSmode,
     CCEQmode,
@@ -2763,7 +2768,7 @@ rs6000_setup_reg_addr_masks (void)
           /* Special case DMR registers.  */
           if (rc == RELOAD_REG_DMR)
             {
-              if (TARGET_DENSE_MATH && m2 == XOmode)
+              if (TARGET_DENSE_MATH && (m2 == XOmode || m2 == TDOmode))
                 {
                   addr_mask = RELOAD_REG_VALID;
                   reg_addr[m].addr_mask[rc] = addr_mask;
@@ -2870,10 +2875,10 @@ rs6000_setup_reg_addr_masks (void)
 
           /* Vector pairs can do both indexed and offset loads if the
              instructions are enabled, otherwise they can only do offset loads
-             since it will be broken into two vector moves.  Vector quads can
-             only do offset loads.  */
+             since it will be broken into two vector moves.  Vector quads and
+             dense math types can only do offset loads.  */
           else if ((addr_mask != 0) && TARGET_MMA
-                   && (m2 == OOmode || m2 == XOmode))
+                   && (m2 == OOmode || m2 == XOmode || m2 == TDOmode))
             {
               addr_mask |= RELOAD_REG_OFFSET;
               if (rc == RELOAD_REG_FPR || rc == RELOAD_REG_VMX)
@@ -3101,6 +3106,14 @@ rs6000_init_hard_regno_mode_ok (bool global_init_p)
       rs6000_vector_align[XOmode] = 512;
     }
 
+  /* Add support for 1,024 bit DMR registers.  */
+  if (TARGET_DENSE_MATH)
+    {
+      rs6000_vector_unit[TDOmode] = VECTOR_NONE;
+      rs6000_vector_mem[TDOmode] = VECTOR_VSX;
+      rs6000_vector_align[TDOmode] = 512;
+    }
+
   /* Register class constraints for the constraints that depend on compile
      switches. When the VSX code was added, different constraints were added
      based on the type (DFmode, V2DFmode, V4SFmode).  For the vector types, all
@@ -3313,6 +3326,12 @@ rs6000_init_hard_regno_mode_ok (bool global_init_p)
         }
     }
 
+  if (TARGET_DENSE_MATH)
+    {
+      reg_addr[TDOmode].reload_load = CODE_FOR_reload_dmr_from_memory;
+      reg_addr[TDOmode].reload_store = CODE_FOR_reload_dmr_to_memory;
+    }
+
   /* Precalculate HARD_REGNO_NREGS.  */
   for (r = 0; HARD_REGISTER_NUM_P (r); ++r)
     for (m = 0; m < NUM_MACHINE_MODES; ++m)
@@ -8771,12 +8790,15 @@ reg_offset_addressing_ok_p (machine_mode mode)
         return mode_supports_dq_form (mode);
       break;
 
-      /* The vector pair/quad types support offset addressing if the
-         underlying vectors support offset addressing.  */
+      /* The vector pair/quad types and the dense math types support offset
+         addressing if the underlying vectors support offset addressing.  */
     case E_OOmode:
     case E_XOmode:
       return TARGET_MMA;
 
+    case E_TDOmode:
+      return TARGET_DENSE_MATH;
+
     case E_SDmode:
       /* If we can do direct load/stores of SDmode, restrict it to reg+reg
          addressing for the LFIWZX and STFIWX instructions.  */
@@ -11327,6 +11349,12 @@ rs6000_emit_move (rtx dest, rtx source, machine_mode mode)
                (mode == OOmode) ? "__vector_pair" : "__vector_quad");
       break;
 
+    case E_TDOmode:
+      if (CONST_INT_P (operands[1]))
+        error ("%qs is an opaque type, and you cannot set it to constants",
+               "__dmr");
+      break;
+
     case E_SImode:
     case E_DImode:
       /* Use default pattern for address of ELF small data */
@@ -12790,7 +12818,7 @@ rs6000_secondary_reload_simple_move (enum rs6000_reg_type to_type,
 
   /* We can transfer between VSX registers and DMR registers without needing
      extra registers.  */
-  if (TARGET_DENSE_MATH && mode == XOmode
+  if (TARGET_DENSE_MATH && (mode == XOmode || mode == TDOmode)
       && ((to_type == DMR_REG_TYPE && from_type == VSX_REG_TYPE)
           || (to_type == VSX_REG_TYPE && from_type == DMR_REG_TYPE)))
     return true;
@@ -13591,6 +13619,9 @@ rs6000_preferred_reload_class (rtx x, enum reg_class rclass)
           if (mode == XOmode)
             return TARGET_MMA_DENSE_MATH ? VSX_REGS : FLOAT_REGS;
 
+          if (mode == TDOmode)
+            return VSX_REGS;
+
           if (GET_MODE_CLASS (mode) == MODE_INT)
             return GENERAL_REGS;
         }
@@ -20820,6 +20851,8 @@ rs6000_mangle_type (const_tree type)
     return "u13__vector_pair";
   if (type == vector_quad_type_node)
     return "u13__vector_quad";
+  if (type == dmr_type_node)
+    return "u5__dmr";
 
   /* For all other types, use the default mangling.  */
   return NULL;
@@ -22945,6 +22978,10 @@ rs6000_dmr_register_move_cost (machine_mode mode, reg_class_t rclass)
       if (mode == XOmode)
         return reg_move_base;
 
+      /* __dmr (i.e. TDOmode) is transferred in 2 instructions.  */
+      else if (mode == TDOmode)
+        return reg_move_base * 2;
+
       else
         return reg_move_base * 2 * hard_regno_nregs (FIRST_DMR_REGNO, mode);
     }
@@ -27782,9 +27819,10 @@ rs6000_split_multireg_move (rtx dst, rtx src)
   mode = GET_MODE (dst);
   nregs = hard_regno_nregs (reg, mode);
 
-  /* If we have a vector quad register for MMA, and this is a load or store,
-     see if we can use vector paired load/stores.  */
-  if (mode == XOmode && TARGET_MMA
+  /* If we have a vector quad register for MMA or DMR register for dense math,
+     and this is a load or store, see if we can use vector paired
+     load/stores.  */
+  if ((mode == XOmode || mode == TDOmode) && TARGET_MMA
       && (MEM_P (dst) || MEM_P (src)))
     {
       reg_mode = OOmode;
@@ -27792,7 +27830,7 @@ rs6000_split_multireg_move (rtx dst, rtx src)
     }
   /* If we have a vector pair/quad mode, split it into two/four separate
      vectors.  */
-  else if (mode == OOmode || mode == XOmode)
+  else if (mode == OOmode || mode == XOmode || mode == TDOmode)
     reg_mode = V1TImode;
   else if (FP_REGNO_P (reg))
     reg_mode = DECIMAL_FLOAT_MODE_P (mode) ? DDmode :
@@ -27838,13 +27876,13 @@ rs6000_split_multireg_move (rtx dst, rtx src)
       return;
     }
 
-  /* The __vector_pair and __vector_quad modes are multi-register
-     modes, so if we have to load or store the registers, we have to be
-     careful to properly swap them if we're in little endian mode
-     below.  This means the last register gets the first memory
-     location.  We also need to be careful of using the right register
-     numbers if we are splitting XO to OO.  */
-  if (mode == OOmode || mode == XOmode)
+  /* The __vector_pair, __vector_quad, and __dmr modes are multi-register
+     modes, so if we have to load or store the registers, we have to be careful
+     to properly swap them if we're in little endian mode below.  This means
+     the last register gets the first memory location.  We also need to be
+     careful of using the right register numbers if we are splitting XO to
+     OO.  */
+  if (mode == OOmode || mode == XOmode || mode == TDOmode)
     {
       nregs = hard_regno_nregs (reg, mode);
       int reg_mode_nregs = hard_regno_nregs (reg, reg_mode);
@@ -27981,7 +28019,7 @@ rs6000_split_multireg_move (rtx dst, rtx src)
          overlap.  */
       int i;
       /* XO/OO are opaque so cannot use subregs. */
-      if (mode == OOmode || mode == XOmode )
+      if (mode == OOmode || mode == XOmode || mode == TDOmode)
         {
           for (i = nregs - 1; i >= 0; i--)
             {
@@ -28155,7 +28193,7 @@ rs6000_split_multireg_move (rtx dst, rtx src)
             continue;
 
           /* XO/OO are opaque so cannot use subregs. */
-          if (mode == OOmode || mode == XOmode )
+          if (mode == OOmode || mode == XOmode || mode == TDOmode)
             {
               rtx dst_i = gen_rtx_REG (reg_mode, REGNO (dst) + j);
               rtx src_i = gen_rtx_REG (reg_mode, REGNO (src) + j);
@@ -29137,7 +29175,8 @@ rs6000_invalid_conversion (const_tree fromtype, const_tree totype)
 
   if (frommode != tomode)
     {
-      /* Do not allow conversions to/from XOmode and OOmode types.  */
+      /* Do not allow conversions to/from XOmode, OOmode, and TDOmode
+         types.  */
       if (frommode == XOmode)
         return N_("invalid conversion from type %<__vector_quad%>");
       if (tomode == XOmode)
@@ -29146,6 +29185,10 @@ rs6000_invalid_conversion (const_tree fromtype, const_tree totype)
         return N_("invalid conversion from type %<__vector_pair%>");
       if (tomode == OOmode)
         return N_("invalid conversion to type %<__vector_pair%>");
+      if (frommode == TDOmode)
+        return N_("invalid conversion from type %<__dmr%>");
+      if (tomode == TDOmode)
+        return N_("invalid conversion to type %<__dmr%>");
     }
 
   /* Conversion allowed.  */
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index 39dd8756b12..eb1b3ceb63e 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -1009,7 +1009,7 @@ enum data_align { align_abi, align_opt, align_both };
 /* Modes that are not vectors, but require vector alignment.  Treat these like
    vectors in terms of loads and stores.  */
 #define VECTOR_ALIGNMENT_P(MODE)                                        \
-  (FLOAT128_VECTOR_P (MODE) || (MODE) == OOmode || (MODE) == XOmode)
+  (FLOAT128_VECTOR_P (MODE) || OPAQUE_MODE_P (MODE))
 
 #define ALTIVEC_VECTOR_MODE(MODE)                                       \
   ((MODE) == V16QImode                                                  \
@@ -2300,6 +2300,7 @@ enum rs6000_builtin_type_index
   RS6000_BTI_const_str,          /* pointer to const char * */
   RS6000_BTI_vector_pair,        /* unsigned 256-bit types (vector pair).  */
   RS6000_BTI_vector_quad,        /* unsigned 512-bit types (vector quad).  */
+  RS6000_BTI_dmr,                /* unsigned 1,024-bit types (dmr).  */
   RS6000_BTI_const_ptr_void,     /* const pointer to void */
   RS6000_BTI_ptr_V16QI,
   RS6000_BTI_ptr_V1TI,
@@ -2338,6 +2339,7 @@ enum rs6000_builtin_type_index
   RS6000_BTI_ptr_dfloat128,
   RS6000_BTI_ptr_vector_pair,
   RS6000_BTI_ptr_vector_quad,
+  RS6000_BTI_ptr_dmr,
   RS6000_BTI_ptr_long_long,
   RS6000_BTI_ptr_long_long_unsigned,
   RS6000_BTI_MAX
@@ -2395,6 +2397,7 @@ enum rs6000_builtin_type_index
 #define const_str_type_node              (rs6000_builtin_types[RS6000_BTI_const_str])
 #define vector_pair_type_node            (rs6000_builtin_types[RS6000_BTI_vector_pair])
 #define vector_quad_type_node            (rs6000_builtin_types[RS6000_BTI_vector_quad])
+#define dmr_type_node                    (rs6000_builtin_types[RS6000_BTI_dmr])
 #define pcvoid_type_node                 (rs6000_builtin_types[RS6000_BTI_const_ptr_void])
 #define ptr_V16QI_type_node              (rs6000_builtin_types[RS6000_BTI_ptr_V16QI])
 #define ptr_V1TI_type_node               (rs6000_builtin_types[RS6000_BTI_ptr_V1TI])
@@ -2433,6 +2436,7 @@ enum rs6000_builtin_type_index
 #define ptr_dfloat128_type_node          (rs6000_builtin_types[RS6000_BTI_ptr_dfloat128])
 #define ptr_vector_pair_type_node        (rs6000_builtin_types[RS6000_BTI_ptr_vector_pair])
 #define ptr_vector_quad_type_node        (rs6000_builtin_types[RS6000_BTI_ptr_vector_quad])
+#define ptr_dmr_type_node                (rs6000_builtin_types[RS6000_BTI_ptr_dmr])
 #define ptr_long_long_integer_type_node  (rs6000_builtin_types[RS6000_BTI_ptr_long_long])
 #define ptr_long_long_unsigned_type_node (rs6000_builtin_types[RS6000_BTI_ptr_long_long_unsigned])
 
diff --git a/gcc/testsuite/gcc.target/powerpc/dm-1024bit.c b/gcc/testsuite/gcc.target/powerpc/dm-1024bit.c
new file mode 100644
index 00000000000..0a9884ddf63
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/dm-1024bit.c
@@ -0,0 +1,63 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_dense_math_ok } */
+/* { dg-options "-mdejagnu-cpu=future -O2" } */
+
+/* Test basic load/store for __dmr type.  */
+
+#ifndef CONSTRAINT
+#if defined(USE_D)
+#define CONSTRAINT "d"
+
+#elif defined(USE_V)
+#define CONSTRAINT "v"
+
+#elif defined(USE_WA)
+#define CONSTRAINT "wa"
+
+#else
+#define CONSTRAINT "wD"
+#endif
+#endif
+const char constraint[] = CONSTRAINT;
+
+void foo_mem_asm (__dmr *p, __dmr *q)
+{
+  /* 4 LXVP instructions.  */
+  __dmr vq = *p;
+
+  /* 2 DMXXINSTDMR512 instructions to transfer VSX to DMR.  */
+  __asm__ ("# foo (" CONSTRAINT ") %A0" : "+" CONSTRAINT (vq));
+  /* 2 DMXXEXTFDMR512 instructions to transfer DMR to VSX.  */
+
+  /* 4 STXVP instructions.  */
+  *q = vq;
+}
+
+void foo_mem_asm2 (__dmr *p, __dmr *q)
+{
+  /* 4 LXVP instructions.  */
+  __dmr vq = *p;
+  __dmr vq2;
+  __dmr vq3;
+
+  /* 2 DMXXINSTDMR512 instructions to transfer VSX to DMR.  */
+  __asm__ ("# foo1 (" CONSTRAINT ") %A0" : "+" CONSTRAINT (vq));
+  /* 2 DMXXEXTFDMR512 instructions to transfer DMR to VSX.  */
+
+  vq2 = vq;
+  __asm__ ("# foo2 (wa) %0" : "+wa" (vq2));
+
+  /* 4 STXVP instructions.  */
+  *q = vq2;
+}
+
+void foo_mem (__dmr *p, __dmr *q)
+{
+  /* 4 LXVP, 4 STXVP instructions, no DMR transfer.  */
+  *q = *p;
+}
+
+/* { dg-final { scan-assembler-times {\mdmxxextfdmr512\M} 4 } } */
+/* { dg-final { scan-assembler-times {\mdmxxinstdmr512\M} 4 } } */
+/* { dg-final { scan-assembler-times {\mlxvp\M} 12 } } */
+/* { dg-final { scan-assembler-times {\mstxvp\M} 12 } } */
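
The scan-assembler-times counts in the new test can be cross-checked with a
small tally.  The sketch below is only my reading of the test, not part of the
patch.  It assumes the default "wD" constraint, that each of the three
functions does one full 1,024 bit load and one full store with 256 bit vector
pair instructions, and that each asm using the DMR constraint costs two
inserts and two extracts.  The names insn_tally, tally_function, tally_add
and tally_test_file are hypothetical.

```c
#include <assert.h>

/* Hypothetical per-function tally of the instructions the test scans for.  */
struct insn_tally { int lxvp; int stxvp; int inserts; int extracts; };

enum { PAIRS_PER_DMR = 1024 / 256, HALVES_PER_DMR = 1024 / 512 };

/* One function that loads a __dmr, passes it through DMR_ASMS inline asm
   statements that use a DMR constraint, then stores it.  */
static struct insn_tally
tally_function (int dmr_asms)
{
  struct insn_tally t;
  assert (dmr_asms >= 0);
  t.lxvp = PAIRS_PER_DMR;
  t.stxvp = PAIRS_PER_DMR;
  t.inserts = HALVES_PER_DMR * dmr_asms;
  t.extracts = HALVES_PER_DMR * dmr_asms;
  return t;
}

/* Field-wise sum of two tallies.  */
static struct insn_tally
tally_add (struct insn_tally a, struct insn_tally b)
{
  a.lxvp += b.lxvp;
  a.stxvp += b.stxvp;
  a.inserts += b.inserts;
  a.extracts += b.extracts;
  return a;
}

/* foo_mem_asm and foo_mem_asm2 each contain one asm with the DMR
   constraint; foo_mem contains none.  */
static struct insn_tally
tally_test_file (void)
{
  struct insn_tally total = tally_function (1);
  total = tally_add (total, tally_function (1));
  total = tally_add (total, tally_function (0));
  return total;
}
```

Under these assumptions tally_test_file () reproduces the four dg-final
counts: 12 lxvp, 12 stxvp, 4 dmxxinstdmr512 and 4 dmxxextfdmr512.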