From patchwork Thu Jun 20 10:22:07 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Kewen.Lin" <linkw@linux.ibm.com>
X-Patchwork-Id: 1950064
Return-Path: <gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org>
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@legolas.ozlabs.org
Authentication-Results: legolas.ozlabs.org;
	dkim=pass (2048-bit key;
 unprotected) header.d=ibm.com header.i=@ibm.com header.a=rsa-sha256
 header.s=pp1 header.b=Gpj54K9k;
	dkim-atps=neutral
Authentication-Results: legolas.ozlabs.org;
 spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org
 (client-ip=8.43.85.97; helo=server2.sourceware.org;
 envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org;
 receiver=patchwork.ozlabs.org)
Received: from server2.sourceware.org (server2.sourceware.org [8.43.85.97])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
	 key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384)
	(No client certificate requested)
	by legolas.ozlabs.org (Postfix) with ESMTPS id 4W4c4h6mvJz20X8
	for <incoming@patchwork.ozlabs.org>; Thu, 20 Jun 2024 20:22:44 +1000 (AEST)
Received: from server2.sourceware.org (localhost [IPv6:::1])
	by sourceware.org (Postfix) with ESMTP id 2825D3891C34
	for <incoming@patchwork.ozlabs.org>; Thu, 20 Jun 2024 10:22:43 +0000 (GMT)
X-Original-To: gcc-patches@gcc.gnu.org
Delivered-To: gcc-patches@gcc.gnu.org
Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com
 [148.163.156.1])
 by sourceware.org (Postfix) with ESMTPS id A76FE388E827
 for <gcc-patches@gcc.gnu.org>; Thu, 20 Jun 2024 10:22:18 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org A76FE388E827
Authentication-Results: sourceware.org;
 dmarc=none (p=none dis=none) header.from=linux.ibm.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=linux.ibm.com
ARC-Filter: OpenARC Filter v1.0.0 sourceware.org A76FE388E827
Authentication-Results: server2.sourceware.org;
 arc=none smtp.remote-ip=148.163.156.1
ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1718878942; cv=none;
 b=HlIrC5VPGPolF0lFVfu/hvqi25ZENhCc8K/heskl0v80X3Jaroff8XmVjy+Yuuwym6XNajnG2l2p+HVCpOFrwtoDfl6YTrPzGbdVASdsB8XZNFPe+85v9zTW8qs8FBbORKZ4uSoWXOMx2mh5Ts/xnJgZci9DXTR3VGXmjPGtPNE=
ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key;
 t=1718878942; c=relaxed/simple;
 bh=tqaSjojTOODp15IH61YBV3Y1OmkXOKIXvf+QgbpMicU=;
 h=DKIM-Signature:Message-ID:Date:To:From:Subject:MIME-Version;
 b=HE0QGMBPhnvuj/EPyuuxqG0WDGljA4ILUAMQd7ptDXHyKCfwhih5GIzUXojT8nle570jJ98DassdfCDEmejzSQG0oRHZq7FCO9z0VNL8H44JMvBXB8eKJ4LWrycc4AGBos3zNkHpz83Ze2no8KeggsZcruXIg86TTvjbdcmMKg8=
ARC-Authentication-Results: i=1; server2.sourceware.org
Received: from pps.filterd (m0356517.ppops.net [127.0.0.1])
 by mx0a-001b2d01.pphosted.com (8.18.1.2/8.18.1.2) with ESMTP id
 45KALXBA006852;
 Thu, 20 Jun 2024 10:22:17 GMT
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h=
 message-id:date:to:cc:from:subject:content-type
 :content-transfer-encoding:mime-version; s=pp1; bh=HKkP7A1U5CIEw
 yGHMubFTB/wLTZItMRlPifw+oyIeCw=; b=Gpj54K9kr+uBZEPqIJjd5u65QtLLM
 wJ+8qh1V4WJ9Aphz20ahN46CBqecSuYboVe6pBcnXEEAlGhPJ3raVwxSLEnL3cmj
 4uX5horqLMGZW/6HK2U0l+CN0va6MmZEYwxTZYX4x+WwXrW4AshSNLVRtD3bG+H3
 BgMlGKiMSVuVutm8Ckat762VE6cSqNZy9KPZXX4xRcDUqxwdjB87eqSDKVaFcYIw
 Oer9L2OHIesPgM+RxTJwLRO4zpJ3W1xbKyHRQeik0Ja50wInOC+i+IPqhGZLjfea
 iWa4lq5uHyBGxChtxexYd1MxknEgH/xuaSiWXZkTrY7zCh9AOephp8OCw==
Received: from pps.reinject (localhost [127.0.0.1])
 by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3yvhtw83v4-1
 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT);
 Thu, 20 Jun 2024 10:22:17 +0000 (GMT)
Received: from m0356517.ppops.net (m0356517.ppops.net [127.0.0.1])
 by pps.reinject (8.18.0.8/8.18.0.8) with ESMTP id 45KAMGGM009563;
 Thu, 20 Jun 2024 10:22:16 GMT
Received: from ppma22.wdc07v.mail.ibm.com
 (5c.69.3da9.ip4.static.sl-reverse.com [169.61.105.92])
 by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3yvhtw83ux-1
 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT);
 Thu, 20 Jun 2024 10:22:16 +0000 (GMT)
Received: from pps.filterd (ppma22.wdc07v.mail.ibm.com [127.0.0.1])
 by ppma22.wdc07v.mail.ibm.com (8.17.1.19/8.17.1.19) with ESMTP id
 45KA1vou019545; Thu, 20 Jun 2024 10:22:15 GMT
Received: from smtprelay05.fra02v.mail.ibm.com ([9.218.2.225])
 by ppma22.wdc07v.mail.ibm.com (PPS) with ESMTPS id 3ysnp1n59q-1
 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT);
 Thu, 20 Jun 2024 10:22:15 +0000
Received: from smtpav06.fra02v.mail.ibm.com (smtpav06.fra02v.mail.ibm.com
 [10.20.54.105])
 by smtprelay05.fra02v.mail.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id
 45KAMBnt42533342
 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK);
 Thu, 20 Jun 2024 10:22:13 GMT
Received: from smtpav06.fra02v.mail.ibm.com (unknown [127.0.0.1])
 by IMSVA (Postfix) with ESMTP id 7F87820381;
 Thu, 20 Jun 2024 10:22:11 +0000 (GMT)
Received: from smtpav06.fra02v.mail.ibm.com (unknown [127.0.0.1])
 by IMSVA (Postfix) with ESMTP id 33EBD20379;
 Thu, 20 Jun 2024 10:22:09 +0000 (GMT)
Received: from [9.197.227.151] (unknown [9.197.227.151])
 by smtpav06.fra02v.mail.ibm.com (Postfix) with ESMTP;
 Thu, 20 Jun 2024 10:22:08 +0000 (GMT)
Message-ID: <2ed0589f-6c28-5dbc-4e8d-d621ebd03e9f@linux.ibm.com>
Date: Thu, 20 Jun 2024 18:22:07 +0800
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:102.0)
 Gecko/20100101 Thunderbird/102.15.1
Content-Language: en-US
To: Segher Boessenkool <segher@kernel.crashing.org>
Cc: GCC Patches <gcc-patches@gcc.gnu.org>,
 Peter Bergner <bergner@linux.ibm.com>, David Edelsohn <dje.gcc@gmail.com>,
 Xionghu Luo <xionghuluo@tencent.com>
From: "Kewen.Lin" <linkw@linux.ibm.com>
Subject: [PATCH] rs6000: Fix wrong RTL patterns for vector merge high/low word
 on LE
X-TM-AS-GCONF: 00
X-Proofpoint-ORIG-GUID: 2-hb_RqOdBIAEK1_A7Hlsp0mf3WEPWP4
X-Proofpoint-GUID: xS9z3_bE9p1S95WPHlyws49VCTYBJEok
X-Proofpoint-UnRewURL: 0 URL was un-rewritten
MIME-Version: 1.0
X-Proofpoint-Virus-Version: vendor=baseguard
 engine=ICAP:2.0.293,Aquarius:18.0.1039,Hydra:6.0.680,FMLib:17.12.28.16
 definitions=2024-06-20_07,2024-06-19_01,2024-05-17_01
X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0
 impostorscore=0 clxscore=1015
 suspectscore=0 adultscore=0 spamscore=0 mlxscore=0 lowpriorityscore=0
 mlxlogscore=999 phishscore=0 priorityscore=1501 bulkscore=0 malwarescore=0
 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.19.0-2405170001
 definitions=main-2406200070
X-Spam-Status: No, score=-11.5 required=5.0 tests=BAYES_00, DKIM_SIGNED,
 DKIM_VALID, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT, RCVD_IN_MSPIKE_H3,
 RCVD_IN_MSPIKE_WL, SCC_5_SHORT_WORD_LINES, SPF_HELO_NONE, SPF_PASS, TXREP,
 T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.30
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org

Hi Segher,

Following your review comments in [1], this patch is
separated from Xionghu's patch v4 [2] and mainly targetted
for 32-bit element size, it changes with the generic call
altivec_vmrg*w in vec_widen_[su]mult_{hi,lo}* expanders as
well.  If this patch looks good to you, I'll post the others
for 16 and 8 bits element size.

btw, there are still something that can be improved like
the loose predicates pointed out by Peter, vsx_xxmrg[hl]w_*
can be merged with altivec_vmrg*w etc., but I think we want
this one to focus on fixing regression and those enhancements
can be supported by some separated follow-up patches.

Bootstrapped and regtested on powerpc64-linux-gnu P8/P9
and powerpc64le-linux-gnu P9 and P10.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2024-June/655019.html
[2] https://gcc.gnu.org/pipermail/gcc-patches/2023-February/611683.html

Is it ok for trunk?

BR,
Kewen
-----

Commit r12-4496 changes some define_expands and define_insns
for vector merge high/low word, which are altivec_vmrg[hl]w,
vsx_xxmrg[hl]w_<VSX_W:mode>.  These defines are mainly for
built-in function vec_merge{h,l}, __builtin_vsx_xxmrghw,
__builtin_vsx_xxmrghw_4si and some internal gen function
needs.  These functions should consider endianness, taking
vec_mergeh as example, as PVIPR defines, vec_mergeh "Merges
the first halves (in element order) of two vectors", it does
note it's in element order.  So it's mapped into vmrghw on
BE while vmrglw on LE respectively.  Although the mapped
insns are different, as the discussion in PR106069, the RTL
pattern should be still the same, it is conformed before
commit r12-4496, define_expand altivec_vmrghw got expanded
into:

  (vec_select:VSX_W
     (vec_concat:<VS_double>
        (match_operand:VSX_W 1 "register_operand" "wa,v")
        (match_operand:VSX_W 2 "register_operand" "wa,v"))
        (parallel [(const_int 0) (const_int 4)
                   (const_int 1) (const_int 5)])))]

on both BE and LE then.  But commit r12-4496 changed it to
expand into:

  (vec_select:VSX_W
     (vec_concat:<VS_double>
        (match_operand:VSX_W 1 "register_operand" "wa,v")
        (match_operand:VSX_W 2 "register_operand" "wa,v"))
        (parallel [(const_int 0) (const_int 4)
                   (const_int 1) (const_int 5)])))]

on BE, and

  (vec_select:VSX_W
     (vec_concat:<VS_double>
        (match_operand:VSX_W 1 "register_operand" "wa,v")
        (match_operand:VSX_W 2 "register_operand" "wa,v"))
        (parallel [(const_int 2) (const_int 6)
                   (const_int 3) (const_int 7)])))]

on LE, although the mapped insn are still vmrghw on BE and
vmrglw on LE, the associated RTL pattern is completely
wrong and inconsistent with the mapped insn.  If optimization
passes leave this pattern alone, even if its pattern doesn't
represent its mapped insn, it's still fine, that's why simple
testing on bif doesn't expose this issue.  But once some
optimization pass such as combine does some changes basing
on this wrong pattern, because the pattern doesn't match the
semantics that the expanded insn is intended to represent,
it would cause the unexpected result.

So this patch is to fix the wrong RTL pattern, ensure the
associated RTL patterns become the same as before which can
have the same semantic as their mapped insns.  With the
proposed patch, the expanders like altivec_vmrghw expands
into altivec_vmrghb_direct_be or altivec_vmrglb_direct_le
depending on endianness, "direct" can easily show which
insn would be generated, _be and _le are mainly for the
different RTL patterns as endianness.

Co-authored-by: Xionghu Luo <xionghuluo@tencent.com>

	PR target/106069
	PR target/115355

gcc/ChangeLog:

	* config/rs6000/altivec.md (altivec_vmrghw_direct_<VSX_W:mode>): Rename
	to ...
	(altivec_vmrghw_direct_<VSX_W:mode>_be): ... this.  Add the condition
	BYTES_BIG_ENDIAN.
	(altivec_vmrghw_direct_<VSX_W:mode>_le): New define_insn.
	(altivec_vmrglw_direct_<VSX_W:mode>): Rename to ...
	(altivec_vmrglw_direct_<VSX_W:mode>_be): ... this.  Add the condition
	BYTES_BIG_ENDIAN.
	(altivec_vmrglw_direct_<VSX_W:mode>_le): New define_insn.
	(altivec_vmrghw): Adjust by calling gen_altivec_vmrghw_direct_v4si_be
	for BE and gen_altivec_vmrglw_direct_v4si_le for LE.
	(altivec_vmrglw): Adjust by calling gen_altivec_vmrglw_direct_v4si_be
	for BE and gen_altivec_vmrghw_direct_v4si_le for LE.
	(vec_widen_umult_hi_v8hi): Adjust the call to
	gen_altivec_vmrghw_direct_v4si by gen_altivec_vmrghw for BE
	and by gen_altivec_vmrglw for LE.
	(vec_widen_smult_hi_v8hi): Likewise.
	(vec_widen_umult_lo_v8hi): Adjust the call to
	gen_altivec_vmrglw_direct_v4si by gen_altivec_vmrglw for BE
	and by gen_altivec_vmrghw for LE
	(vec_widen_smult_lo_v8hi): Likewise.
	* config/rs6000/rs6000.cc (altivec_expand_vec_perm_const): Replace
	CODE_FOR_altivec_vmrghw_direct_v4si by
	CODE_FOR_altivec_vmrghw_direct_v4si_be for BE and
	CODE_FOR_altivec_vmrghw_direct_v4si_le for LE.  And replace
	CODE_FOR_altivec_vmrglw_direct_v4si by
	CODE_FOR_altivec_vmrglw_direct_v4si_be for BE and
	CODE_FOR_altivec_vmrglw_direct_v4si_le for LE.
	* config/rs6000/vsx.md (vsx_xxmrghw_<VSX_W:mode>): Adjust by calling
	gen_altivec_vmrghw_direct_v4si_be for BE and
	gen_altivec_vmrglw_direct_v4si_le for LE.
	(vsx_xxmrglw_<VSX_W:mode>): Adjust by calling
	gen_altivec_vmrglw_direct_v4si_be for BE and
	gen_altivec_vmrghw_direct_v4si_le for LE.

gcc/testsuite/ChangeLog:

	* g++.target/powerpc/pr106069.C: New test.
	* gcc.target/powerpc/pr115355.c: New test.
---
 gcc/config/rs6000/altivec.md                |  80 +++++++++----
 gcc/config/rs6000/rs6000.cc                 |   8 +-
 gcc/config/rs6000/vsx.md                    |  28 +++--
 gcc/testsuite/g++.target/powerpc/pr106069.C | 119 ++++++++++++++++++++
 gcc/testsuite/gcc.target/powerpc/pr115355.c |  37 ++++++
 5 files changed, 232 insertions(+), 40 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/powerpc/pr106069.C
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr115355.c

--
2.43.0

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index bb20441c096..dcc71cc0f52 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -1212,16 +1212,18 @@ (define_expand "altivec_vmrghw"
    (use (match_operand:V4SI 2 "register_operand"))]
   "VECTOR_MEM_ALTIVEC_P (V4SImode)"
 {
-  rtx (*fun) (rtx, rtx, rtx);
-  fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrghw_direct_v4si
-			 : gen_altivec_vmrglw_direct_v4si;
-  if (!BYTES_BIG_ENDIAN)
-    std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0],
+						  operands[1],
+						  operands[2]));
+  else
+    emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0],
+						  operands[2],
+						  operands[1]));
   DONE;
 })

-(define_insn "altivec_vmrghw_direct_<mode>"
+(define_insn "altivec_vmrghw_direct_<mode>_be"
   [(set (match_operand:VSX_W 0 "register_operand" "=wa,v")
 	(vec_select:VSX_W
 	  (vec_concat:<VS_double>
@@ -1229,7 +1231,21 @@ (define_insn "altivec_vmrghw_direct_<mode>"
 	    (match_operand:VSX_W 2 "register_operand" "wa,v"))
 	  (parallel [(const_int 0) (const_int 4)
 		     (const_int 1) (const_int 5)])))]
-  "TARGET_ALTIVEC"
+  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
+  "@
+   xxmrghw %x0,%x1,%x2
+   vmrghw %0,%1,%2"
+  [(set_attr "type" "vecperm")])
+
+(define_insn "altivec_vmrghw_direct_<mode>_le"
+  [(set (match_operand:VSX_W 0 "register_operand" "=wa,v")
+	(vec_select:VSX_W
+	  (vec_concat:<VS_double>
+	    (match_operand:VSX_W 2 "register_operand" "wa,v")
+	    (match_operand:VSX_W 1 "register_operand" "wa,v"))
+	  (parallel [(const_int 2) (const_int 6)
+		     (const_int 3) (const_int 7)])))]
+  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
   "@
    xxmrghw %x0,%x1,%x2
    vmrghw %0,%1,%2"
@@ -1318,16 +1334,18 @@ (define_expand "altivec_vmrglw"
    (use (match_operand:V4SI 2 "register_operand"))]
   "VECTOR_MEM_ALTIVEC_P (V4SImode)"
 {
-  rtx (*fun) (rtx, rtx, rtx);
-  fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrglw_direct_v4si
-			 : gen_altivec_vmrghw_direct_v4si;
-  if (!BYTES_BIG_ENDIAN)
-    std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0],
+						  operands[1],
+						  operands[2]));
+  else
+    emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0],
+						  operands[2],
+						  operands[1]));
   DONE;
 })

-(define_insn "altivec_vmrglw_direct_<mode>"
+(define_insn "altivec_vmrglw_direct_<mode>_be"
   [(set (match_operand:VSX_W 0 "register_operand" "=wa,v")
 	(vec_select:VSX_W
 	  (vec_concat:<VS_double>
@@ -1335,7 +1353,21 @@ (define_insn "altivec_vmrglw_direct_<mode>"
 	    (match_operand:VSX_W 2 "register_operand" "wa,v"))
 	  (parallel [(const_int 2) (const_int 6)
 		     (const_int 3) (const_int 7)])))]
-  "TARGET_ALTIVEC"
+  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
+  "@
+   xxmrglw %x0,%x1,%x2
+   vmrglw %0,%1,%2"
+  [(set_attr "type" "vecperm")])
+
+(define_insn "altivec_vmrglw_direct_<mode>_le"
+  [(set (match_operand:VSX_W 0 "register_operand" "=wa,v")
+	(vec_select:VSX_W
+	  (vec_concat:<VS_double>
+	    (match_operand:VSX_W 2 "register_operand" "wa,v")
+	    (match_operand:VSX_W 1 "register_operand" "wa,v"))
+	  (parallel [(const_int 0) (const_int 4)
+		     (const_int 1) (const_int 5)])))]
+  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
   "@
    xxmrglw %x0,%x1,%x2
    vmrglw %0,%1,%2"
@@ -3861,13 +3893,13 @@ (define_expand "vec_widen_umult_hi_v8hi"
     {
       emit_insn (gen_altivec_vmuleuh (ve, operands[1], operands[2]));
       emit_insn (gen_altivec_vmulouh (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], ve, vo));
+      emit_insn (gen_altivec_vmrghw (operands[0], ve, vo));
     }
   else
     {
       emit_insn (gen_altivec_vmulouh (ve, operands[1], operands[2]));
       emit_insn (gen_altivec_vmuleuh (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], vo, ve));
+      emit_insn (gen_altivec_vmrglw (operands[0], ve, vo));
     }
   DONE;
 })
@@ -3886,13 +3918,13 @@ (define_expand "vec_widen_umult_lo_v8hi"
     {
       emit_insn (gen_altivec_vmuleuh (ve, operands[1], operands[2]));
       emit_insn (gen_altivec_vmulouh (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], ve, vo));
+      emit_insn (gen_altivec_vmrglw (operands[0], ve, vo));
     }
   else
     {
       emit_insn (gen_altivec_vmulouh (ve, operands[1], operands[2]));
       emit_insn (gen_altivec_vmuleuh (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], vo, ve));
+      emit_insn (gen_altivec_vmrghw (operands[0], ve, vo));
     }
   DONE;
 })
@@ -3911,13 +3943,13 @@ (define_expand "vec_widen_smult_hi_v8hi"
     {
       emit_insn (gen_altivec_vmulesh (ve, operands[1], operands[2]));
       emit_insn (gen_altivec_vmulosh (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], ve, vo));
+      emit_insn (gen_altivec_vmrghw (operands[0], ve, vo));
     }
   else
     {
       emit_insn (gen_altivec_vmulosh (ve, operands[1], operands[2]));
       emit_insn (gen_altivec_vmulesh (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], vo, ve));
+      emit_insn (gen_altivec_vmrglw (operands[0], ve, vo));
     }
   DONE;
 })
@@ -3936,13 +3968,13 @@ (define_expand "vec_widen_smult_lo_v8hi"
     {
       emit_insn (gen_altivec_vmulesh (ve, operands[1], operands[2]));
       emit_insn (gen_altivec_vmulosh (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], ve, vo));
+      emit_insn (gen_altivec_vmrglw (operands[0], ve, vo));
     }
   else
     {
       emit_insn (gen_altivec_vmulosh (ve, operands[1], operands[2]));
       emit_insn (gen_altivec_vmulesh (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], vo, ve));
+      emit_insn (gen_altivec_vmrghw (operands[0], ve, vo));
     }
   DONE;
 })
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index e4dc629ddcc..2046a831938 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -23450,8 +23450,8 @@ altivec_expand_vec_perm_const (rtx target, rtx op0, rtx op1,
 		      : CODE_FOR_altivec_vmrglh_direct,
      {0, 1, 16, 17, 2, 3, 18, 19, 4, 5, 20, 21, 6, 7, 22, 23}},
     {OPTION_MASK_ALTIVEC,
-     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghw_direct_v4si
-		      : CODE_FOR_altivec_vmrglw_direct_v4si,
+     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghw_direct_v4si_be
+		      : CODE_FOR_altivec_vmrglw_direct_v4si_le,
      {0, 1, 2, 3, 16, 17, 18, 19, 4, 5, 6, 7, 20, 21, 22, 23}},
     {OPTION_MASK_ALTIVEC,
      BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglb_direct
@@ -23462,8 +23462,8 @@ altivec_expand_vec_perm_const (rtx target, rtx op0, rtx op1,
 		      : CODE_FOR_altivec_vmrghh_direct,
      {8, 9, 24, 25, 10, 11, 26, 27, 12, 13, 28, 29, 14, 15, 30, 31}},
     {OPTION_MASK_ALTIVEC,
-     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglw_direct_v4si
-		      : CODE_FOR_altivec_vmrghw_direct_v4si,
+     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglw_direct_v4si_be
+		      : CODE_FOR_altivec_vmrghw_direct_v4si_le,
      {8, 9, 10, 11, 24, 25, 26, 27, 12, 13, 14, 15, 28, 29, 30, 31}},
     {OPTION_MASK_P8_VECTOR,
      BYTES_BIG_ENDIAN ? CODE_FOR_p8_vmrgew_v4sf_direct
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index f135fa079bd..7a9c19ac903 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -4798,12 +4798,14 @@ (define_expand "vsx_xxmrghw_<mode>"
 		     (const_int 1) (const_int 5)])))]
   "VECTOR_MEM_VSX_P (<MODE>mode)"
 {
-  rtx (*fun) (rtx, rtx, rtx);
-  fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrghw_direct_<mode>
-			 : gen_altivec_vmrglw_direct_<mode>;
-  if (!BYTES_BIG_ENDIAN)
-    std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0],
+						  operands[1],
+						  operands[2]));
+  else
+    emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0],
+						  operands[2],
+						  operands[1]));
   DONE;
 }
   [(set_attr "type" "vecperm")])
@@ -4818,12 +4820,14 @@ (define_expand "vsx_xxmrglw_<mode>"
 		     (const_int 3) (const_int 7)])))]
   "VECTOR_MEM_VSX_P (<MODE>mode)"
 {
-  rtx (*fun) (rtx, rtx, rtx);
-  fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrglw_direct_<mode>
-			 : gen_altivec_vmrghw_direct_<mode>;
-  if (!BYTES_BIG_ENDIAN)
-    std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0],
+						  operands[1],
+						  operands[2]));
+  else
+    emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0],
+						  operands[2],
+						  operands[1]));
   DONE;
 }
   [(set_attr "type" "vecperm")])
diff --git a/gcc/testsuite/g++.target/powerpc/pr106069.C b/gcc/testsuite/g++.target/powerpc/pr106069.C
new file mode 100644
index 00000000000..537207d2fe8
--- /dev/null
+++ b/gcc/testsuite/g++.target/powerpc/pr106069.C
@@ -0,0 +1,119 @@
+/* { dg-options "-O -fno-tree-forwprop -maltivec" } */
+/* { dg-require-effective-target vmx_hw } */
+/* { dg-do run } */
+
+typedef __attribute__ ((altivec (vector__))) unsigned native_simd_type;
+
+union
+{
+  native_simd_type V;
+  int R[4];
+} store_le_vec;
+
+struct S
+{
+  S () = default;
+  S (unsigned B0)
+  {
+    native_simd_type val{B0};
+    m_simd = val;
+  }
+  void store_le (unsigned int out[])
+  {
+    store_le_vec.V = m_simd;
+    unsigned int x0 = store_le_vec.R[0];
+    __builtin_memcpy (out, &x0, 4);
+  }
+  S rotl (unsigned int r)
+  {
+    native_simd_type rot{r};
+    return __builtin_vec_rl (m_simd, rot);
+  }
+  void operator+= (S other)
+  {
+    m_simd = __builtin_vec_add (m_simd, other.m_simd);
+  }
+  void operator^= (S other)
+  {
+    m_simd = __builtin_vec_xor (m_simd, other.m_simd);
+  }
+  static void transpose (S &B0, S B1, S B2, S B3)
+  {
+    native_simd_type T0 = __builtin_vec_mergeh (B0.m_simd, B2.m_simd);
+    native_simd_type T1 = __builtin_vec_mergeh (B1.m_simd, B3.m_simd);
+    native_simd_type T2 = __builtin_vec_mergel (B0.m_simd, B2.m_simd);
+    native_simd_type T3 = __builtin_vec_mergel (B1.m_simd, B3.m_simd);
+    B0 = __builtin_vec_mergeh (T0, T1);
+    B3 = __builtin_vec_mergel (T2, T3);
+  }
+  S (native_simd_type x) : m_simd (x) {}
+  native_simd_type m_simd;
+};
+
+void
+foo (unsigned int output[], unsigned state[])
+{
+  S R00 = state[0];
+  S R01 = state[0];
+  S R02 = state[2];
+  S R03 = state[0];
+  S R05 = state[5];
+  S R06 = state[6];
+  S R07 = state[7];
+  S R08 = state[8];
+  S R09 = state[9];
+  S R10 = state[10];
+  S R11 = state[11];
+  S R12 = state[12];
+  S R13 = state[13];
+  S R14 = state[4];
+  S R15 = state[15];
+  for (int r = 0; r != 10; ++r)
+    {
+      R09 += R13;
+      R11 += R15;
+      R05 ^= R09;
+      R06 ^= R10;
+      R07 ^= R11;
+      R07 = R07.rotl (7);
+      R00 += R05;
+      R01 += R06;
+      R02 += R07;
+      R15 ^= R00;
+      R12 ^= R01;
+      R13 ^= R02;
+      R00 += R05;
+      R01 += R06;
+      R02 += R07;
+      R15 ^= R00;
+      R12 = R12.rotl (8);
+      R13 = R13.rotl (8);
+      R10 += R15;
+      R11 += R12;
+      R08 += R13;
+      R09 += R14;
+      R05 ^= R10;
+      R06 ^= R11;
+      R07 ^= R08;
+      R05 = R05.rotl (7);
+      R06 = R06.rotl (7);
+      R07 = R07.rotl (7);
+    }
+  R00 += state[0];
+  S::transpose (R00, R01, R02, R03);
+  R00.store_le (output);
+}
+
+unsigned int res[1];
+unsigned main_state[]{1634760805, 60878,      2036477234, 6,
+		      0,	  825562964,  1471091955, 1346092787,
+		      506976774,  4197066702, 518848283,  118491664,
+		      0,	  0,	      0,	  0};
+int
+main ()
+{
+  foo (res, main_state);
+  if (res[0] != 0x41fcef98)
+    __builtin_abort ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/pr115355.c b/gcc/testsuite/gcc.target/powerpc/pr115355.c
new file mode 100644
index 00000000000..8955126b808
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr115355.c
@@ -0,0 +1,37 @@
+/* { dg-do run } */
+/* { dg-require-effective-target p9vector_hw } */
+/* Force vectorization with -fno-vect-cost-model to have vector unpack
+   which exposes the issue in PR115355.  */
+/* { dg-options "-O2 -mdejagnu-cpu=power9 -fno-vect-cost-model" } */
+
+/* Verify it runs successfully.  */
+
+__attribute__((noipa))
+void setToIdentityGOOD(unsigned long long *mVec, unsigned int mLen)
+{
+  #pragma GCC novector
+  for (unsigned int i = 0; i < mLen; i++)
+    mVec[i] = i;
+}
+
+__attribute__((noipa))
+void setToIdentityBAD(unsigned long long *mVec, unsigned int mLen)
+{
+  for (unsigned int i = 0; i < mLen; i++)
+    mVec[i] = i;
+}
+
+unsigned long long vec1[100];
+unsigned long long vec2[100];
+
+int main()
+{
+  unsigned int l = 29;
+  setToIdentityGOOD (vec1, 29);
+  setToIdentityBAD (vec2, 29);
+
+  if (__builtin_memcmp (vec1, vec2, l * sizeof (vec1[0])) != 0)
+    __builtin_abort ();
+
+  return 0;
+}