From patchwork Tue Sep 17 03:40:45 2024
X-Patchwork-Submitter: Michael Meissner
X-Patchwork-Id: 1986288
Date: Mon, 16 Sep 2024 23:40:45 -0400
From: Michael Meissner
To: gcc-patches@gcc.gnu.org, Michael Meissner, Segher Boessenkool,
 "Kewen.Lin", David Edelsohn, Peter Bergner
Subject: [REPOST, PATCH] PR 89213: Add better support for shifting vectors
 with 64-bit elements

I posted this patch in August, and I never got a reply, so I'm reposting this
now.

This patch fixes PR target/89213 to allow better code to be generated to do
constant shifts of V2DI/V2DF vectors.
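
As a concrete illustration of the kind of source code this affects (my own
example, not taken from the patch; the function name is made up), consider a
constant arithmetic right shift of a vector with 64-bit elements, compiled
with something like -mcpu=power9 -O2:

	/* Illustration only -- not part of the patch.  */
	#include <altivec.h>

	vector long long
	shift_right_by_4 (vector long long a)
	{
	  /* Shift each 64-bit element right (algebraic) by the constant 4.  */
	  return vec_sra (a, (vector unsigned long long) {4, 4});
	}
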
Previously GCC would do constant shifts of vectors with 64-bit elements by
using:

	XXSPLTIB 32,4
	VEXTSB2D 0,0
	VSRAD 2,2,0

I.e., the PowerPC does not have a VSPLTISD instruction to load -15..14 for
the 64-bit shift count in one instruction.  Instead, it would need to load a
byte and then convert it to 64-bit.

With this patch, GCC now realizes that the vector shift instructions will
look at the bottom 6 bits for the shift count, and it can use either a
VSPLTISW or XXSPLTIB instruction to load the shift count.

I have built these patches on both big endian PowerPC server systems and
little endian PowerPC server systems, and there were no regressions.  Can I
check in this patch to the master branch for GCC 15?

2024-09-16  Michael Meissner

gcc/

	PR target/89213
	* config/rs6000/altivec.md (UNSPEC_VECTOR_SHIFT): New unspec.
	(VSHIFT_MODE): New mode iterator.
	(vshift_code): New code iterator.
	(vshift_attr): New code attribute.
	(altivec_<mode>_<vshift_attr>_const): New pattern to optimize
	vector long long/int shifts by a constant.
	(altivec_<mode>_shift_const): New helper insn to load up a
	constant used by the shift operation.
	* config/rs6000/predicates.md (vector_shift_constant): New
	predicate.

gcc/testsuite/

	PR target/89213
	* gcc.target/powerpc/pr89213.c: New test.
	* gcc.target/powerpc/vec-rlmi-rlnm.c: Update instruction count.

---
 gcc/config/rs6000/altivec.md               |  51 +++++++++
 gcc/config/rs6000/predicates.md            |  63 +++++++++++
 gcc/testsuite/gcc.target/powerpc/pr89213.c | 106 ++++++++++++++++++
 .../gcc.target/powerpc/vec-rlmi-rlnm.c     |   4 +-
 4 files changed, 222 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr89213.c

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 1f5489b974f..8faece984e9 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -170,6 +170,7 @@ (define_c_enum "unspec"
    UNSPEC_VSTRIL
    UNSPEC_SLDB
    UNSPEC_SRDB
+   UNSPEC_VECTOR_SHIFT
   ])
 
 (define_c_enum "unspecv"
@@ -2176,6 +2177,56 @@ (define_insn "altivec_vsro"
   "vsro %0,%1,%2"
   [(set_attr "type" "vecperm")])
 
+;; Optimize V2DI shifts by constants.  This relies on the shift instructions
+;; only looking at the bits needed to do the shift.  This means we can use
+;; VSPLTISW or XXSPLTIB to load up the constant, and not worry about the bits
+;; that the vector shift instructions will not use.
+(define_mode_iterator VSHIFT_MODE	[(V4SI "TARGET_P9_VECTOR")
+					 (V2DI "TARGET_P8_VECTOR")])
+
+(define_code_iterator vshift_code	[ashift ashiftrt lshiftrt])
+(define_code_attr vshift_attr		[(ashift   "ashift")
+					 (ashiftrt "ashiftrt")
+					 (lshiftrt "lshiftrt")])
+
+(define_insn_and_split "*altivec_<mode>_<vshift_attr>_const"
+  [(set (match_operand:VSHIFT_MODE 0 "register_operand" "=v")
+	(vshift_code:VSHIFT_MODE
+	 (match_operand:VSHIFT_MODE 1 "register_operand" "v")
+	 (match_operand:VSHIFT_MODE 2 "vector_shift_constant" "")))
+   (clobber (match_scratch:VSHIFT_MODE 3 "=&v"))]
+  "((<MODE>mode == V2DImode && TARGET_P8_VECTOR)
+    || (<MODE>mode == V4SImode && TARGET_P9_VECTOR))"
+  "#"
+  "&& 1"
+  [(set (match_dup 3)
+	(unspec:VSHIFT_MODE [(match_dup 4)] UNSPEC_VECTOR_SHIFT))
+   (set (match_dup 0)
+	(vshift_code:VSHIFT_MODE (match_dup 1)
+				 (match_dup 3)))]
+{
+  if (GET_CODE (operands[3]) == SCRATCH)
+    operands[3] = gen_reg_rtx (<MODE>mode);
+
+  operands[4] = ((GET_CODE (operands[2]) == CONST_VECTOR)
+		 ? CONST_VECTOR_ELT (operands[2], 0)
+		 : XEXP (operands[2], 0));
+})
+
+(define_insn "*altivec_<mode>_shift_const"
+  [(set (match_operand:VSHIFT_MODE 0 "register_operand" "=v")
+	(unspec:VSHIFT_MODE [(match_operand 1 "const_int_operand" "n")]
+			    UNSPEC_VECTOR_SHIFT))]
+  "TARGET_P8_VECTOR"
+{
+  if (UINTVAL (operands[1]) <= 15)
+    return "vspltisw %0,%1";
+
+  else if (TARGET_P9_VECTOR)
+    return "xxspltib %x0,%1";
+
+  else
+    gcc_unreachable ();
+})
+
 (define_insn "altivec_vsum4ubs"
   [(set (match_operand:V4SI 0 "register_operand" "=v")
 	(unspec:V4SI [(match_operand:V16QI 1 "register_operand" "v")
diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index 7f0b4ab61e6..0b78901e94b 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -861,6 +861,69 @@ (define_predicate "vector_int_reg_or_same_bit"
   return op == CONST0_RTX (mode) || op == CONSTM1_RTX (mode);
 })
 
+;; Return 1 if the operand is a V2DI or V4SI const_vector, where each element
+;; is the same constant, and the constant can be used for a shift operation.
+;; This is to prevent sub-optimal code, that needs to load up the constant and
+;; then zero extend it 32 or 64-bit vectors or load up the constant from the
+;; literal pool.
+;;
+;; For V4SImode, we only recognize shifts by 16..31 on ISA 3.0, since shifts by
+;; 1..15 can be handled by the normal VSPLTISW and vector shift instruction.
+;; For V2DImode, we do this all of the time, since there is no convenient
+;; instruction to load up a vector long long splatted constant.
+;;
+;; If we can use XXSPLTIB, then allow constants up to 63.  If not, we restrict
+;; the constant to 0..15 that can be loaded with VSPLTISW.  V4SI shifts are
+;; only optimized for ISA 3.0 when the shift value is >= 16 and <= 31.  Values
+;; between 0 and 15 can use a normal VSPLTISW to load the value, and it doesn't
+;; need this optimization.
+(define_predicate "vector_shift_constant"
+  (match_code "const_vector,vec_duplicate")
+{
+  unsigned HOST_WIDE_INT min_value;
+
+  if (mode == V2DImode)
+    {
+      min_value = 0;
+      if (!TARGET_P8_VECTOR)
+	return 0;
+    }
+  else if (mode == V4SImode)
+    {
+      min_value = 16;
+      if (!TARGET_P9_VECTOR)
+	return 0;
+    }
+  else
+    return 0;
+
+  unsigned HOST_WIDE_INT max_value = TARGET_P9_VECTOR ? 63 : 15;
+
+  if (GET_CODE (op) == CONST_VECTOR)
+    {
+      unsigned HOST_WIDE_INT first = UINTVAL (CONST_VECTOR_ELT (op, 0));
+      unsigned nunits = GET_MODE_NUNITS (mode);
+      unsigned i;
+
+      if (!IN_RANGE (first, min_value, max_value))
+	return 0;
+
+      for (i = 1; i < nunits; i++)
+	if (first != UINTVAL (CONST_VECTOR_ELT (op, i)))
+	  return 0;
+
+      return 1;
+    }
+  else
+    {
+      rtx op0 = XEXP (op, 0);
+      if (!CONST_INT_P (op0))
+	return 0;
+
+      return IN_RANGE (UINTVAL (op0), min_value, max_value);
+    }
+})
+
 ;; Return 1 if operand is 0.0.
 (define_predicate "zero_fp_constant"
   (and (match_code "const_double")
diff --git a/gcc/testsuite/gcc.target/powerpc/pr89213.c b/gcc/testsuite/gcc.target/powerpc/pr89213.c
new file mode 100644
index 00000000000..8f5fae2c3ef
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr89213.c
@@ -0,0 +1,106 @@
+/* { dg-do compile { target { lp64 } } } */
+/* { dg-require-effective-target powerpc_vsx } */
+/* { dg-options "-mcpu=power9 -O2" } */
+
+/* Optimize vector shifts by constants.  */
+
+#include <altivec.h>
+
+typedef vector long long vi64_t;
+typedef vector unsigned long long vui64_t;
+
+typedef vector int vi32_t;
+typedef vector unsigned int vui32_t;
+
+vi64_t
+shiftra_test64_4 (vi64_t a)
+{
+  vui64_t x = {4, 4};
+  return (vi64_t) vec_vsrad (a, x);
+}
+
+vi64_t
+shiftrl_test64_4 (vi64_t a)
+{
+  vui64_t x = {4, 4};
+  return (vi64_t) vec_vsrd (a, x);
+}
+
+vi64_t
+shiftl_test64_4 (vi64_t a)
+{
+  vui64_t x = {4, 4};
+  return (vi64_t) vec_vsld (a, x);
+}
+
+vi64_t
+shiftra_test64_29 (vi64_t a)
+{
+  vui64_t x = {29, 29};
+  return (vi64_t) vec_vsrad (a, x);
+}
+
+vi64_t
+shiftrl_test64_29 (vi64_t a)
+{
+  vui64_t x = {29, 29};
+  return (vi64_t) vec_vsrd (a, x);
+}
+
+vi64_t
+shiftl_test64_29 (vi64_t a)
+{
+  vui64_t x = {29, 29};
+  return (vi64_t) vec_vsld (a, x);
+}
+
+vi32_t
+shiftra_test32_4 (vi32_t a)
+{
+  vui32_t x = {4, 4, 4, 4};
+  return (vi32_t) vec_vsraw (a, x);
+}
+
+vi32_t
+shiftrl_test32_4 (vi32_t a)
+{
+  vui32_t x = {4, 4, 4, 4};
+  return (vi32_t) vec_vsrw (a, x);
+}
+
+vi32_t
+shiftl_test32_4 (vi32_t a)
+{
+  vui32_t x = {4, 4, 4, 4};
+  return (vi32_t) vec_vslw (a, x);
+}
+
+vi32_t
+shiftra_test32_29 (vi32_t a)
+{
+  vui32_t x = {29, 29, 29, 29};
+  return (vi32_t) vec_vsraw (a, x);
+}
+
+vi32_t
+shiftrl_test32_29 (vi32_t a)
+{
+  vui32_t x = {29, 29, 29, 29};
+  return (vi32_t) vec_vsrw (a, x);
+}
+
+vi32_t
+shiftl_test32_29 (vi32_t a)
+{
+  vui32_t x = {29, 29, 29, 29};
+  return (vi32_t) vec_vslw (a, x);
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltib\M} 6 } } */
+/* { dg-final { scan-assembler-times {\mvsld\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvslw\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvspltisw\M} 6 } } */
+/* { dg-final { scan-assembler-times {\mvsrd\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvsrw\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvsrad\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvsraw\M} 2 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-rlmi-rlnm.c b/gcc/testsuite/gcc.target/powerpc/vec-rlmi-rlnm.c
index 6834733b1bf..01fa0a99d46 100644
--- a/gcc/testsuite/gcc.target/powerpc/vec-rlmi-rlnm.c
+++ b/gcc/testsuite/gcc.target/powerpc/vec-rlmi-rlnm.c
@@ -54,12 +54,12 @@ rlnm_test_2 (vector unsigned long long x, vector unsigned long long y,
     - For rlnm_test_1: vspltisw, vslw, xxlor, vrlwnm.
     - For rlnm_test_2: xxspltib, vextsb2d, vsld, xxlor, vrldnm.
    There is a choice of splat instructions in both cases, so we
-   just check for "splt".  */
+   just check for "splt".  In the past vextsb2d would be generated for
+   rlnm_test_2, but the compiler no longer generates it.  */
 
 /* { dg-final { scan-assembler-times "vrlwmi" 1 } } */
 /* { dg-final { scan-assembler-times "vrldmi" 1 } } */
 /* { dg-final { scan-assembler-times "splt" 2 } } */
-/* { dg-final { scan-assembler-times "vextsb2d" 1 } } */
 /* { dg-final { scan-assembler-times "vslw" 1 } } */
 /* { dg-final { scan-assembler-times "vsld" 1 } } */
 /* { dg-final { scan-assembler-times "xxlor" 4 } } */
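
Postscript (my own illustration, not part of the patch): the reason a byte
splat such as XXSPLTIB 29 is a valid shift count here is that, as noted above,
the vector shift instructions only read the low-order bits of each element.
A small host-side sanity check of that arithmetic:

	/* Illustration only.  XXSPLTIB 29 replicates the byte 0x1d into every
	   byte of the vector, so each doubleword element holds
	   0x1d1d1d1d1d1d1d1d.  VSRAD/VSRD/VSLD use only the low 6 bits of each
	   64-bit element as the shift count, and VSRAW/VSRW/VSLW use only the
	   low 5 bits of each 32-bit element, so the splatted value still
	   shifts by exactly 29.  */
	#include <inttypes.h>
	#include <stdio.h>

	int
	main (void)
	{
	  uint64_t splat64 = 0x1d1d1d1d1d1d1d1dULL;	/* byte 29 splatted.  */
	  uint32_t splat32 = 0x1d1d1d1d;		/* same, per 32-bit element.  */

	  printf ("64-bit shift count used: %" PRIu64 "\n", splat64 & 0x3f); /* 29 */
	  printf ("32-bit shift count used: %u\n", splat32 & 0x1f);          /* 29 */
	  return 0;
	}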