From patchwork Sun Nov 17 00:49:16 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Michael Meissner X-Patchwork-Id: 2012415 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=ibm.com header.i=@ibm.com header.a=rsa-sha256 header.s=pp1 header.b=FDwqlBlg; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org; envelope-from=gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4XrXHm4qtYz1xy5 for ; Sun, 17 Nov 2024 11:50:58 +1100 (AEDT) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 449CF3857003 for ; Sun, 17 Nov 2024 00:50:56 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 449CF3857003 Authentication-Results: sourceware.org; dkim=pass (2048-bit key, unprotected) header.d=ibm.com header.i=@ibm.com header.a=rsa-sha256 header.s=pp1 header.b=FDwqlBlg X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) by sourceware.org (Postfix) with ESMTPS id D1B213857341 for ; Sun, 17 Nov 2024 00:49:22 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org D1B213857341 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=linux.ibm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=linux.ibm.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org D1B213857341 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=148.163.156.1 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1731804563; cv=none; b=WYyD4vtlrIH7GC+xYh3h8rF7D+AZpLEnaKAKlQhWy39AEatlOouYlKyp+T2RWeu/3J1nUtBaHW8j9fljF63UnHWFpnalSKP14gcSvnk2phhHy7c51d59gERzNh2zClDkNclLJvCojk6daDGoeFXwVbU+USyWv1zzZ5dE2p+9N/o= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1731804563; c=relaxed/simple; bh=IGckMVLsOVHNDN5zi7iP0r3vYrdm/vvDTaTdiMtO66A=; h=DKIM-Signature:Date:From:To:Subject:Message-ID:MIME-Version; b=I5rZ9ltjqdyJzRbRaUidneucpSf0MGXJQBADF5ZPJdXGiF8dF+KpwFbIyf5om72lWLdvPjSrizVOkyn0T7GRwqEuL3xFpp5jqnbldgwLEmf2nbADR6vzkVlrLqs3ffC0sdm473CpWFs5RRveA+fBC8ibExjah2I7dnC2evJaDHg= ARC-Authentication-Results: i=1; server2.sourceware.org DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org D1B213857341 Received: from pps.filterd (m0360083.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.18.1.2/8.18.1.2) with ESMTP id 4AGLtQQS010768; Sun, 17 Nov 2024 00:49:21 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h= content-type:date:from:message-id:mime-version:subject:to; s= pp1; bh=1aGLn5J59xqo2REbtk4quDieGf423ciTHrvcS+RzHtI=; b=FDwqlBlg Lga5Jgrm0hhffbBi7v/j8uXC2DGPq/kYnPYiyhkO1ez2i4+bN6n35d01VjMBEGns yXh2uxNEgI9KuSMbkomSG1mxtP93aNA5DpKy5SrDTFfX8AaUsGcJB265CvfPCBZv IAXvZ+jsMq4PrFwGxPXXc34LFz1MetuTgAOCLnbV1BW5elGcFgXV77hcoTcJT6+C SqJVo3Rif74GYDLs4LYYGb960NUtu8PUacd/WjtbL94BO80k9LxkmdPoKaqnBK1C IeOwBIPEfbUdLmnAS55IwsIbtdk9tZUzYdm/AZpClfDsfLGvcNTcBjYz+8K1q5Cc 7CeZelcjOb95Fw== Received: from ppma21.wdc07v.mail.ibm.com (5b.69.3da9.ip4.static.sl-reverse.com [169.61.105.91]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 42xjw7kbcb-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Sun, 17 Nov 2024 00:49:20 +0000 (GMT) Received: from pps.filterd (ppma21.wdc07v.mail.ibm.com [127.0.0.1]) by ppma21.wdc07v.mail.ibm.com (8.18.1.2/8.18.1.2) with ESMTP id 4AGJGMMo018606; Sun, 17 Nov 2024 00:49:19 GMT Received: from smtprelay01.wdc07v.mail.ibm.com ([172.16.1.68]) by ppma21.wdc07v.mail.ibm.com (PPS) with ESMTPS id 42tk2n4yh9-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Sun, 17 Nov 2024 00:49:19 +0000 Received: from smtpav01.dal12v.mail.ibm.com (smtpav01.dal12v.mail.ibm.com [10.241.53.100]) by smtprelay01.wdc07v.mail.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 4AH0nILo47382952 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Sun, 17 Nov 2024 00:49:18 GMT Received: from smtpav01.dal12v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 5AD5358058; Sun, 17 Nov 2024 00:49:18 +0000 (GMT) Received: from smtpav01.dal12v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id EFCFC58057; Sun, 17 Nov 2024 00:49:17 +0000 (GMT) Received: from cowardly-lion.the-meissners.org (unknown [9.61.98.188]) by smtpav01.dal12v.mail.ibm.com (Postfix) with ESMTPS; Sun, 17 Nov 2024 00:49:17 +0000 (GMT) Date: Sat, 16 Nov 2024 19:49:16 -0500 From: Michael Meissner To: gcc-patches@gcc.gnu.org, Michael Meissner , Segher Boessenkool , Peter Bergner Subject: [PATCH report] PR target/99293 Optimize splat of a V2DF/V2DI extract with constant element Message-ID: Mail-Followup-To: Michael Meissner , gcc-patches@gcc.gnu.org, Segher Boessenkool , Peter Bergner MIME-Version: 1.0 Content-Disposition: inline X-TM-AS-GCONF: 00 X-Proofpoint-GUID: fkaCxwTWJ43jsAbrUDa5nExJRFuIuvCt X-Proofpoint-ORIG-GUID: fkaCxwTWJ43jsAbrUDa5nExJRFuIuvCt X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.293,Aquarius:18.0.1051,Hydra:6.0.680,FMLib:17.12.62.30 definitions=2024-10-15_01,2024-10-11_01,2024-09-30_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 impostorscore=0 spamscore=0 adultscore=0 priorityscore=1501 bulkscore=0 malwarescore=0 phishscore=0 lowpriorityscore=0 mlxscore=0 mlxlogscore=569 suspectscore=0 clxscore=1015 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.19.0-2409260000 definitions=main-2411170003 X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org I've posted this patch previous, and pinged it a few times. So I'm reposting it now in case it got lost. We had optimizations for splat of a vector extract for the other vector types, but we missed having one for V2DI and V2DF. This patch adds a combiner insn to do this optimization. In looking at the source, we had similar optimizations for V4SI and V4SF extract and splats, but we missed doing V2DI/V2DF. Without the patch for the code: vector long long splat_dup_l_0 (vector long long v) { return __builtin_vec_splats (__builtin_vec_extract (v, 0)); } the compiler generates (on a little endian power9): splat_dup_l_0: mfvsrld 9,34 mtvsrdd 34,9,9 blr Now it generates: splat_dup_l_0: xxpermdi 34,34,34,3 blr I have built GCC with the patches in this patch set applied on both little and big endian PowerPC systems and there were no regressions. Can I apply this patch to GCC 15? 2024-11-16 Michael Meissner gcc/ * config/rs6000/vsx.md (vsx_splat_extract_): New insn. gcc/testsuite/ * gcc.target/powerpc/builtins-1.c: Adjust insn count. * gcc.target/powerpc/pr99293.c: New test. --- gcc/config/rs6000/vsx.md | 18 +++++++++++++++ gcc/testsuite/gcc.target/powerpc/builtins-1.c | 2 +- gcc/testsuite/gcc.target/powerpc/pr99293.c | 22 +++++++++++++++++++ 3 files changed, 41 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/gcc.target/powerpc/pr99293.c diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md index b2fc39acf4e..73f20a86e56 100644 --- a/gcc/config/rs6000/vsx.md +++ b/gcc/config/rs6000/vsx.md @@ -4796,6 +4796,24 @@ (define_insn "vsx_splat__mem" "lxvdsx %x0,%y1" [(set_attr "type" "vecload")]) +;; Optimize SPLAT of an extract from a V2DF/V2DI vector with a constant element +(define_insn "*vsx_splat_extract_" + [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa") + (vec_duplicate:VSX_D + (vec_select: + (match_operand:VSX_D 1 "vsx_register_operand" "wa") + (parallel [(match_operand 2 "const_0_to_1_operand" "n")]))))] + "VECTOR_MEM_VSX_P (mode)" +{ + int which_word = INTVAL (operands[2]); + if (!BYTES_BIG_ENDIAN) + which_word = 1 - which_word; + + operands[3] = GEN_INT (which_word ? 3 : 0); + return "xxpermdi %x0,%x1,%x1,%3"; +} + [(set_attr "type" "vecperm")]) + ;; V4SI splat support (define_insn "vsx_splat_v4si" [(set (match_operand:V4SI 0 "vsx_register_operand" "=wa,wa") diff --git a/gcc/testsuite/gcc.target/powerpc/builtins-1.c b/gcc/testsuite/gcc.target/powerpc/builtins-1.c index 8410a5fd431..4e7e5384675 100644 --- a/gcc/testsuite/gcc.target/powerpc/builtins-1.c +++ b/gcc/testsuite/gcc.target/powerpc/builtins-1.c @@ -1035,4 +1035,4 @@ foo156 (vector unsigned short usa) /* { dg-final { scan-assembler-times {\mvmrglb\M} 3 } } */ /* { dg-final { scan-assembler-times {\mvmrgew\M} 4 } } */ /* { dg-final { scan-assembler-times {\mvsplth|xxsplth\M} 4 } } */ -/* { dg-final { scan-assembler-times {\mxxpermdi\M} 44 } } */ +/* { dg-final { scan-assembler-times {\mxxpermdi\M} 42 } } */ diff --git a/gcc/testsuite/gcc.target/powerpc/pr99293.c b/gcc/testsuite/gcc.target/powerpc/pr99293.c new file mode 100644 index 00000000000..20adc1f27f6 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/pr99293.c @@ -0,0 +1,22 @@ +/* { dg-do compile { target powerpc*-*-* } } */ +/* { dg-require-effective-target powerpc_vsx_ok } */ +/* { dg-options "-O2 -mvsx" } */ + +/* Test for PR 99263, which wants to do: + __builtin_vec_splats (__builtin_vec_extract (v, n)) + + where v is a V2DF or V2DI vector and n is either 0 or 1. Previously the + compiler would do a direct move to the GPR registers to select the item and a + direct move from the GPR registers to do the splat. */ + +vector long long splat_dup_l_0 (vector long long v) +{ + return __builtin_vec_splats (__builtin_vec_extract (v, 0)); +} + +vector long long splat_dup_l_1 (vector long long v) +{ + return __builtin_vec_splats (__builtin_vec_extract (v, 1)); +} + +/* { dg-final { scan-assembler-times "xxpermdi" 2 } } */