Date: Mon, 16 Sep 2024 23:45:56 -0400
From: Michael Meissner
To: gcc-patches@gcc.gnu.org, Michael Meissner, Segher Boessenkool, "Kewen.Lin", David Edelsohn, Peter Bergner
Subject: [PATCH] PR 99293: Optimize splat of a V2DF/V2DI extract with constant element

This is an old patch that I first wrote in 2021; in the press of other work, it got lost.

We already have optimizations for a splat of a vector extract for the V4SI and V4SF vector types, but we never added one for V2DI and V2DF.  This patch adds a combiner insn to do this optimization.

For example, given the code:

        vector long long splat_dup_l_0 (vector long long v)
        {
          return __builtin_vec_splats (__builtin_vec_extract (v, 0));
        }

without the patch the compiler generates the following on a little-endian Power9 system, moving the element out to a GPR and then back into a vector register to splat it:

        splat_dup_l_0:
                mfvsrld 9,34
                mtvsrdd 34,9,9
                blr

With the patch, it generates a single permute:

        splat_dup_l_0:
                xxpermdi 34,34,34,3
                blr

I have built compilers with this patch on little-endian and big-endian PowerPC servers, and there were no regressions.  Can I apply this patch to the master trunk for GCC 15?

2024-09-16  Michael Meissner

gcc/

        * config/rs6000/vsx.md (vsx_splat_extract_<mode>): New insn.

gcc/testsuite/

        * gcc.target/powerpc/builtins-1.c: Adjust insn count.
        * gcc.target/powerpc/pr99293.c: New test.
---
 gcc/config/rs6000/vsx.md                      | 18 +++++++++++++++
 gcc/testsuite/gcc.target/powerpc/builtins-1.c |  2 +-
 gcc/testsuite/gcc.target/powerpc/pr99293.c    | 22 +++++++++++++++++++
 3 files changed, 41 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr99293.c

diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index b2fc39acf4e..73f20a86e56 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -4796,6 +4796,24 @@ (define_insn "vsx_splat_<mode>_mem"
   "lxvdsx %x0,%y1"
   [(set_attr "type" "vecload")])
 
+;; Optimize SPLAT of an extract from a V2DF/V2DI vector with a constant element
+(define_insn "*vsx_splat_extract_<mode>"
+  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa")
+	(vec_duplicate:VSX_D
+	 (vec_select:<VEC_base>
+	  (match_operand:VSX_D 1 "vsx_register_operand" "wa")
+	  (parallel [(match_operand 2 "const_0_to_1_operand" "n")]))))]
+  "VECTOR_MEM_VSX_P (<MODE>mode)"
+{
+  int which_word = INTVAL (operands[2]);
+  if (!BYTES_BIG_ENDIAN)
+    which_word = 1 - which_word;
+
+  operands[3] = GEN_INT (which_word ? 3 : 0);
+  return "xxpermdi %x0,%x1,%x1,%3";
+}
+  [(set_attr "type" "vecperm")])
+
 ;; V4SI splat support
 (define_insn "vsx_splat_v4si"
   [(set (match_operand:V4SI 0 "vsx_register_operand" "=wa,wa")
diff --git a/gcc/testsuite/gcc.target/powerpc/builtins-1.c b/gcc/testsuite/gcc.target/powerpc/builtins-1.c
index 8410a5fd431..4e7e5384675 100644
--- a/gcc/testsuite/gcc.target/powerpc/builtins-1.c
+++ b/gcc/testsuite/gcc.target/powerpc/builtins-1.c
@@ -1035,4 +1035,4 @@ foo156 (vector unsigned short usa)
 /* { dg-final { scan-assembler-times {\mvmrglb\M} 3 } } */
 /* { dg-final { scan-assembler-times {\mvmrgew\M} 4 } } */
 /* { dg-final { scan-assembler-times {\mvsplth|xxsplth\M} 4 } } */
-/* { dg-final { scan-assembler-times {\mxxpermdi\M} 44 } } */
+/* { dg-final { scan-assembler-times {\mxxpermdi\M} 42 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/pr99293.c b/gcc/testsuite/gcc.target/powerpc/pr99293.c
new file mode 100644
index 00000000000..20adc1f27f6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr99293.c
@@ -0,0 +1,22 @@
+/* { dg-do compile { target powerpc*-*-* } } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-O2 -mvsx" } */
+
+/* Test for PR 99293, which wants to do:
+	__builtin_vec_splats (__builtin_vec_extract (v, n))
+
+   where v is a V2DF or V2DI vector and n is either 0 or 1.  Previously the
+   compiler would do a direct move to a GPR to select the element and a
+   direct move from the GPR to do the splat.  */
+
+vector long long splat_dup_l_0 (vector long long v)
+{
+  return __builtin_vec_splats (__builtin_vec_extract (v, 0));
+}
+
+vector long long splat_dup_l_1 (vector long long v)
+{
+  return __builtin_vec_splats (__builtin_vec_extract (v, 1));
+}
+
+/* { dg-final { scan-assembler-times "xxpermdi" 2 } } */