From patchwork Mon Jun 24 02:23:06 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Kewen.Lin"
X-Patchwork-Id: 1951660
Date: Mon, 24 Jun 2024 10:23:06 +0800
To: GCC Patches
Cc: Segher Boessenkool, David Edelsohn, Peter Bergner
From: "Kewen.Lin"
Subject: [PATCH] rs6000: Fix wrong RTL patterns for vector merge high/low char on LE
List-Id: Gcc-patches mailing list

Hi,

Commit r12-4496 changed some define_expands and define_insns for
vector merge high/low char, namely altivec_vmrg[hl]b.  These patterns
mainly serve the built-in functions vec_merge{h,l} and some internal
gen function uses.  The built-in functions have to take endianness
into account.  Taking vec_mergeh as an example, the PVIPR defines it
as "Merges the first halves (in element order) of two vectors"; note
that it explicitly says element order.  So it is mapped to vmrghb on
BE and to vmrglb on LE.

Although the mapped insns differ, as discussed in PR106069 the RTL
pattern should still be the same.  That held before commit r12-4496,
but starting from r12-4496 the patterns differ between BE and LE.
As with the 32-bit element case described in the commit log of
r15-1504, this 8-bit element pattern on LE doesn't actually match
what the underlying insn is intended to represent, so once an
optimization such as combine transforms the RTL based on it, it can
cause unexpected results.  The newly constructed test case
pr106069-1.c is a typical example of this issue.
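To make the "element order" wording concrete, here is a minimal
sketch in plain C (no AltiVec needed) of what vec_mergeh and
vec_mergel are expected to return for 16 byte elements.  The names
model_mergeh and model_mergel are hypothetical helpers made up for
this illustration, not GCC or altivec.h functions.

```c
#include <stddef.h>

enum { NELT = 16 };  /* number of 8-bit elements in one vector */

/* Hypothetical model of vec_mergeh in element order: interleave the
   first halves of a and b, giving {a0, b0, a1, b1, ..., a7, b7}.  */
static void
model_mergeh (const unsigned char *a, const unsigned char *b,
              unsigned char *out)
{
  for (size_t i = 0; i < NELT / 2; i++)
    {
      out[2 * i] = a[i];
      out[2 * i + 1] = b[i];
    }
}

/* Hypothetical model of vec_mergel in element order: interleave the
   second halves of a and b, giving {a8, b8, a9, b9, ..., a15, b15}.  */
static void
model_mergel (const unsigned char *a, const unsigned char *b,
              unsigned char *out)
{
  for (size_t i = 0; i < NELT / 2; i++)
    {
      out[2 * i] = a[NELT / 2 + i];
      out[2 * i + 1] = b[NELT / 2 + i];
    }
}
```

Under this model, element 6 of the merge-high result is a[3] and
element 15 of the merge-low result is b[15], independent of
endianness.  With the vectors used in pr106069-1.c these are 12 and
6, which are the values the test checks for.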
So this patch fixes the wrong RTL patterns and makes the associated
RTL patterns the same as before, so that they have the same semantics
as their mapped insns.  With the proposed patch, expanders like
altivec_vmrghb expand into altivec_vmrghb_direct_be or
altivec_vmrglb_direct_le depending on endianness.  The "direct" part
shows which insn will be generated, while the _be and _le suffixes
distinguish the RTL patterns, which differ by endianness.

Following [1], this one handles the 8-bit vector element size.
Bootstrapped and regtested on powerpc64-linux-gnu P8/P9 and
powerpc64le-linux-gnu P9 and P10.

I'm going to push this in two days if there are no objections, thanks!

[1] https://gcc.gnu.org/pipermail/gcc-patches/2024-June/655239.html

Co-authored-by: Xionghu Luo

	PR target/106069
	PR target/115355

gcc/ChangeLog:

	* config/rs6000/altivec.md (altivec_vmrghb_direct): Rename to ...
	(altivec_vmrghb_direct_be): ... this.  Add condition BYTES_BIG_ENDIAN.
	(altivec_vmrghb_direct_le): New define_insn.
	(altivec_vmrglb_direct): Rename to ...
	(altivec_vmrglb_direct_be): ... this.  Add condition BYTES_BIG_ENDIAN.
	(altivec_vmrglb_direct_le): New define_insn.
	(altivec_vmrghb): Adjust by calling gen_altivec_vmrghb_direct_be
	for BE and gen_altivec_vmrglb_direct_le for LE.
	(altivec_vmrglb): Adjust by calling gen_altivec_vmrglb_direct_be
	for BE and gen_altivec_vmrghb_direct_le for LE.
	* config/rs6000/rs6000.cc (altivec_expand_vec_perm_const): Replace
	CODE_FOR_altivec_vmrghb_direct by
	CODE_FOR_altivec_vmrghb_direct_be for BE and
	CODE_FOR_altivec_vmrghb_direct_le for LE.  And replace
	CODE_FOR_altivec_vmrglb_direct by
	CODE_FOR_altivec_vmrglb_direct_be for BE and
	CODE_FOR_altivec_vmrglb_direct_le for LE.

gcc/testsuite/ChangeLog:

	* gcc.target/powerpc/pr106069-1.c: New test.
---
 gcc/config/rs6000/altivec.md                  | 66 +++++++++++++++----
 gcc/config/rs6000/rs6000.cc                   |  8 +--
 gcc/testsuite/gcc.target/powerpc/pr106069-1.c | 39 +++++++++++
 3 files changed, 95 insertions(+), 18 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr106069-1.c

--
2.43.0

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index dcc71cc0f52..a0e8a35b843 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -1152,15 +1152,16 @@ (define_expand "altivec_vmrghb"
    (use (match_operand:V16QI 2 "register_operand"))]
   "TARGET_ALTIVEC"
 {
-  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrghb_direct
-                                                : gen_altivec_vmrglb_direct;
-  if (!BYTES_BIG_ENDIAN)
-    std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (
+      gen_altivec_vmrghb_direct_be (operands[0], operands[1], operands[2]));
+  else
+    emit_insn (
+      gen_altivec_vmrglb_direct_le (operands[0], operands[2], operands[1]));
   DONE;
 })
 
-(define_insn "altivec_vmrghb_direct"
+(define_insn "altivec_vmrghb_direct_be"
   [(set (match_operand:V16QI 0 "register_operand" "=v")
         (vec_select:V16QI
           (vec_concat:V32QI
@@ -1174,7 +1175,25 @@ (define_insn "altivec_vmrghb_direct"
                      (const_int 5) (const_int 21)
                      (const_int 6) (const_int 22)
                      (const_int 7) (const_int 23)])))]
-  "TARGET_ALTIVEC"
+  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
+  "vmrghb %0,%1,%2"
+  [(set_attr "type" "vecperm")])
+
+(define_insn "altivec_vmrghb_direct_le"
+  [(set (match_operand:V16QI 0 "register_operand" "=v")
+        (vec_select:V16QI
+          (vec_concat:V32QI
+            (match_operand:V16QI 2 "register_operand" "v")
+            (match_operand:V16QI 1 "register_operand" "v"))
+          (parallel [(const_int 8) (const_int 24)
+                     (const_int 9) (const_int 25)
+                     (const_int 10) (const_int 26)
+                     (const_int 11) (const_int 27)
+                     (const_int 12) (const_int 28)
+                     (const_int 13) (const_int 29)
+                     (const_int 14) (const_int 30)
+                     (const_int 15) (const_int 31)])))]
+  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
   "vmrghb %0,%1,%2"
   [(set_attr "type" "vecperm")])
 
@@ -1274,15 +1293,16 @@ (define_expand "altivec_vmrglb"
    (use (match_operand:V16QI 2 "register_operand"))]
   "TARGET_ALTIVEC"
 {
-  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrglb_direct
-                                                : gen_altivec_vmrghb_direct;
-  if (!BYTES_BIG_ENDIAN)
-    std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (
+      gen_altivec_vmrglb_direct_be (operands[0], operands[1], operands[2]));
+  else
+    emit_insn (
+      gen_altivec_vmrghb_direct_le (operands[0], operands[2], operands[1]));
   DONE;
 })
 
-(define_insn "altivec_vmrglb_direct"
+(define_insn "altivec_vmrglb_direct_be"
   [(set (match_operand:V16QI 0 "register_operand" "=v")
         (vec_select:V16QI
           (vec_concat:V32QI
@@ -1296,7 +1316,25 @@ (define_insn "altivec_vmrglb_direct"
                      (const_int 13) (const_int 29)
                      (const_int 14) (const_int 30)
                      (const_int 15) (const_int 31)])))]
-  "TARGET_ALTIVEC"
+  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
+  "vmrglb %0,%1,%2"
+  [(set_attr "type" "vecperm")])
+
+(define_insn "altivec_vmrglb_direct_le"
+  [(set (match_operand:V16QI 0 "register_operand" "=v")
+        (vec_select:V16QI
+          (vec_concat:V32QI
+            (match_operand:V16QI 2 "register_operand" "v")
+            (match_operand:V16QI 1 "register_operand" "v"))
+          (parallel [(const_int 0) (const_int 16)
+                     (const_int 1) (const_int 17)
+                     (const_int 2) (const_int 18)
+                     (const_int 3) (const_int 19)
+                     (const_int 4) (const_int 20)
+                     (const_int 5) (const_int 21)
+                     (const_int 6) (const_int 22)
+                     (const_int 7) (const_int 23)])))]
+  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
   "vmrglb %0,%1,%2"
   [(set_attr "type" "vecperm")])
 
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 2046a831938..8e46cf19907 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -23442,8 +23442,8 @@ altivec_expand_vec_perm_const (rtx target, rtx op0, rtx op1,
      CODE_FOR_altivec_vpkuwum_direct,
      {2, 3, 6, 7, 10, 11, 14, 15, 18, 19, 22, 23, 26, 27, 30, 31}},
     {OPTION_MASK_ALTIVEC,
-     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghb_direct
-                      : CODE_FOR_altivec_vmrglb_direct,
+     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghb_direct_be
+                      : CODE_FOR_altivec_vmrglb_direct_le,
      {0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23}},
     {OPTION_MASK_ALTIVEC,
      BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghh_direct
@@ -23454,8 +23454,8 @@ altivec_expand_vec_perm_const (rtx target, rtx op0, rtx op1,
                       : CODE_FOR_altivec_vmrglw_direct_v4si_le,
      {0, 1, 2, 3, 16, 17, 18, 19, 4, 5, 6, 7, 20, 21, 22, 23}},
     {OPTION_MASK_ALTIVEC,
-     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglb_direct
-                      : CODE_FOR_altivec_vmrghb_direct,
+     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglb_direct_be
+                      : CODE_FOR_altivec_vmrghb_direct_le,
      {8, 24, 9, 25, 10, 26, 11, 27, 12, 28, 13, 29, 14, 30, 15, 31}},
     {OPTION_MASK_ALTIVEC,
      BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglh_direct
diff --git a/gcc/testsuite/gcc.target/powerpc/pr106069-1.c b/gcc/testsuite/gcc.target/powerpc/pr106069-1.c
new file mode 100644
index 00000000000..4945d8fedfb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr106069-1.c
@@ -0,0 +1,39 @@
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+/* { dg-require-effective-target vmx_hw } */
+
+/* Test vector merge for 8-bit element size,
+   it will abort if the RTL pattern isn't expected.  */
+
+#include "altivec.h"
+
+__attribute__((noipa))
+signed char elem_6 (vector signed char a, vector signed char b)
+{
+  vector signed char c = vec_mergeh (a,b);
+  return vec_extract (c, 6);
+}
+
+__attribute__((noipa))
+unsigned char elem_15 (vector unsigned char a, vector unsigned char b)
+{
+  vector unsigned char c = vec_mergel (a,b);
+  return vec_extract (c, 15);
+}
+
+int
+main ()
+{
+  vector unsigned char v1
+    = {3, 33, 22, 12, 34, 14, 5, 25, 30, 11, 0, 21, 17, 27, 38, 8};
+  vector unsigned char v2
+    = {81, 82, 83, 84, 68, 67, 66, 65, 99, 100, 101, 102, 250, 125, 0, 6};
+  signed char x1 = elem_6 ((vector signed char) v1, (vector signed char) v2);
+  unsigned char x2 = elem_15 (v1, v2);
+
+  if (x1 != 12 || x2 != 6)
+    __builtin_abort ();
+
+  return 0;
+}
+
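
As a rough cross-check of why the LE expansion of altivec_vmrghb can
use vmrglb with swapped operands and still share one RTL pattern with
BE, the following plain C sketch models both sides.  It rests on two
assumptions that are not spelled out in the mail itself: that
vec_select over vec_concat indexes elements in GCC element order, and
that on LE element j of a byte vector sits at big-endian register
byte 15 - j.  All function names here are hypothetical and exist only
for this illustration.

```c
#include <stddef.h>

enum { NELT = 16 };  /* number of 8-bit elements in one vector */

/* Selector used for the merge-high pattern in element order.  */
static const int mergeh_sel[NELT]
  = {0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23};

/* Model of (vec_select (vec_concat x y) sel): an index below 16
   picks from x, otherwise from y.  */
static void
rtl_select_concat (const unsigned char *x, const unsigned char *y,
                   const int *sel, unsigned char *out)
{
  for (size_t i = 0; i < NELT; i++)
    out[i] = sel[i] < NELT ? x[sel[i]] : y[sel[i] - NELT];
}

/* Model of the vmrglb insn in big-endian register byte numbering:
   interleave bytes 8..15 of va and vb.  */
static void
hw_vmrglb (const unsigned char *va, const unsigned char *vb,
           unsigned char *vd)
{
  for (size_t i = 0; i < NELT / 2; i++)
    {
      vd[2 * i] = va[NELT / 2 + i];
      vd[2 * i + 1] = vb[NELT / 2 + i];
    }
}

/* Convert between LE element order and BE register byte numbering
   (assumed to be a plain reversal for byte elements).  */
static void
reverse16 (const unsigned char *in, unsigned char *out)
{
  for (size_t i = 0; i < NELT; i++)
    out[i] = in[NELT - 1 - i];
}

/* What an LE target would compute for "vmrglb vd,b,a", i.e. the
   operand order the altivec_vmrghb expander passes on LE.  */
static void
le_mergeh_via_vmrglb (const unsigned char *a, const unsigned char *b,
                      unsigned char *out)
{
  unsigned char ra[NELT], rb[NELT], rd[NELT];
  reverse16 (a, ra);
  reverse16 (b, rb);
  hw_vmrglb (rb, ra, rd);  /* operands swapped, as in the expander */
  reverse16 (rd, out);
}
```

Under these assumptions, le_mergeh_via_vmrglb (a, b, out) gives the
same bytes as rtl_select_concat (a, b, mergeh_sel, out), which is the
property the _le patterns are meant to restore.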