From patchwork Fri Jun 16 02:10:28 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Michael Meissner <meissner@linux.vnet.ibm.com>
X-Patchwork-Id: 776531
Return-Path: 
 <gcc-patches-return-456060-incoming=patchwork.ozlabs.org@gcc.gnu.org>
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Received: from sourceware.org (server1.sourceware.org [209.132.180.131])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256
	bits)) (No client certificate requested)
	by ozlabs.org (Postfix) with ESMTPS id 3wpkPM1T9xz9s3s
	for <incoming@patchwork.ozlabs.org>;
	Fri, 16 Jun 2017 12:11:02 +1000 (AEST)
Received: (qmail 94038 invoked by alias); 16 Jun 2017 02:10:36 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Unsubscribe: 
 <mailto:gcc-patches-unsubscribe-incoming=patchwork.ozlabs.org@gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Delivered-To: mailing list gcc-patches@gcc.gnu.org
Received: (qmail 94012 invoked by uid 89); 16 Jun 2017 02:10:35 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-9.5 required=5.0 tests=AWL, BAYES_00,
	GIT_PATCH_2, GIT_PATCH_3, KAM_ASCII_DIVIDERS,
	KAM_LAZY_DOMAIN_SECURITY, KHOP_DYNAMIC,
	RCVD_IN_DNSWL_LOW autolearn=ham version=3.3.2 spammy=
X-HELO: mx0a-001b2d01.pphosted.com
Received: from mx0b-001b2d01.pphosted.com (HELO mx0a-001b2d01.pphosted.com)
	(148.163.158.5) by sourceware.org
	(qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP;
	Fri, 16 Jun 2017 02:10:31 +0000
Received: from pps.filterd (m0098421.ppops.net [127.0.0.1])	by
	mx0a-001b2d01.pphosted.com (8.16.0.20/8.16.0.20) with SMTP id
	v5G28W5S053530	for <gcc-patches@gcc.gnu.org>;
	Thu, 15 Jun 2017 22:10:34 -0400
Received: from e38.co.us.ibm.com (e38.co.us.ibm.com [32.97.110.159])	by
	mx0a-001b2d01.pphosted.com with ESMTP id
	2b44gg3egy-1	(version=TLSv1.2 cipher=AES256-SHA bits=256
	verify=NOT)	for <gcc-patches@gcc.gnu.org>;
	Thu, 15 Jun 2017 22:10:33 -0400
Received: from localhost	by e38.co.us.ibm.com with IBM ESMTP SMTP Gateway:
	Authorized Use Only! Violators will be prosecuted	for
	<gcc-patches@gcc.gnu.org> from
	<meissner@ibm-tiger.the-meissners.org>;
	Thu, 15 Jun 2017 20:10:32 -0600
Received: from b03cxnp07029.gho.boulder.ibm.com (9.17.130.16)	by
	e38.co.us.ibm.com (192.168.1.138) with IBM ESMTP SMTP
	Gateway: Authorized Use Only! Violators will be prosecuted;
	Thu, 15 Jun 2017 20:10:29 -0600
Received: from b03ledav004.gho.boulder.ibm.com
	(b03ledav004.gho.boulder.ibm.com [9.17.130.235])	by
	b03cxnp07029.gho.boulder.ibm.com (8.14.9/8.14.9/NCO v10.0)
	with ESMTP id v5G2ATL210027488; Thu, 15 Jun 2017 19:10:29 -0700
Received: from b03ledav004.gho.boulder.ibm.com (unknown [127.0.0.1])	by
	IMSVA (Postfix) with ESMTP id EB1F378043;
	Thu, 15 Jun 2017 20:10:28 -0600 (MDT)
Received: from ibm-tiger.the-meissners.org (unknown [9.32.77.111])	by
	b03ledav004.gho.boulder.ibm.com (Postfix) with ESMTP id
	C60D678038; Thu, 15 Jun 2017 20:10:28 -0600 (MDT)
Received: by ibm-tiger.the-meissners.org (Postfix, from userid 500)	id
	3E72845F58; Thu, 15 Jun 2017 22:10:28 -0400 (EDT)
Date: Thu, 15 Jun 2017 22:10:28 -0400
From: Michael Meissner <meissner@linux.vnet.ibm.com>
To: Michael Meissner <meissner@linux.vnet.ibm.com>,
	Segher Boessenkool <segher@kernel.crashing.org>,
	GCC Patches <gcc-patches@gcc.gnu.org>,
	David Edelsohn <dje.gcc@gmail.com>,
	Bill Schmidt <wschmidt@linux.vnet.ibm.com>
Subject: [PATCH, rev 2] PR target/79799,
	Add vec_insert of V4SFmode on PowerPC ISA 3.0 (power9)
Mail-Followup-To: Michael Meissner <meissner@linux.vnet.ibm.com>,
	Segher Boessenkool <segher@kernel.crashing.org>,
	GCC Patches <gcc-patches@gcc.gnu.org>,
	David Edelsohn <dje.gcc@gmail.com>,
	Bill Schmidt <wschmidt@linux.vnet.ibm.com>
References: <20170615000158.GA11033@ibm-tiger.the-meissners.org>
	<20170615233938.GA15195@ibm-tiger.the-meissners.org>
MIME-Version: 1.0
Content-Disposition: inline
In-Reply-To: <20170615233938.GA15195@ibm-tiger.the-meissners.org>
User-Agent: Mutt/1.5.20 (2009-12-10)
Message-Id: <20170616021027.GA2916@ibm-tiger.the-meissners.org>
X-IsSubscribed: yes

On Thu, Jun 15, 2017 at 07:39:39PM -0400, Michael Meissner wrote:
> I thought the patch was fine as I posted.  I had an optimization I thought
> about (optimizing for inserting 0.0f) and I noticed some problems with it.
> However, even in backing out the change, there are some problems.  So, I will
> hopefully reissue the patch tomorrow.

OK, the problem was that I need to patch the compiler with a workaround to run
code on the current alpha hardware, and in backing out the patches for the code
I was working on, I backed out the workaround as well.

This patch replaces the first patch.  It adds an optimization so that if you
set a field in a V4SFmode vector to 0.0f, the compiler knows it can just
clear the field, and it does not have to convert the 0.0 from internal scalar
format to vector format with the XSCVDPSPN instruction.
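
As a rough illustration of why the 0.0f case needs no conversion (this is not
part of the patch): assuming IEEE 754 single precision with a 4-byte float,
+0.0f is the all-zero bit pattern, so storing a zero word into the element
gives the same result as converting and inserting 0.0f.  The helper names
float_bits and clear_element below are hypothetical and only model the idea in
portable C; they do not use the PowerPC built-ins.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: return the raw 32-bit pattern of a float.
   Assumes sizeof (float) == 4 and IEEE 754 single precision.  */
static uint32_t
float_bits (float f)
{
  uint32_t bits;
  memcpy (&bits, &f, sizeof bits);
  return bits;
}

/* Hypothetical model of "vec_insert (0.0f, v, elt)": instead of
   converting a scalar, store an all-zero 32-bit word into the element.  */
static void
clear_element (float v[4], unsigned elt)
{
  uint32_t zero = 0;

  assert (elt < 4);
  memcpy (&v[elt], &zero, sizeof zero);
}
```

Calling clear_element on an array leaves the other three elements untouched
and makes the selected one compare equal to 0.0f, which is the property the
zero special case relies on.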

As before, I have bootstrapped this patch on a little endian power8 system, and
I had no regressions in the test suite.  The new tests pr79799-{1,2,3,5}.c all
generate the appropriate code.  I have also done a non-bootstrap build and make
check on the alpha power9 hardware with --with-cpu=power9, and there are no
regressions.  The executable test (pr79799-4.c) runs fine.
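
For anyone reading the xxinsertw checks in those tests, here is a small sketch
of the byte offset arithmetic used in the output code of the
*vsx_insert_extract_v4sf_p9 pattern in this patch.  It is only a plain C model
under my reading of that pattern: the function name xxinsertw_byte_offset and
the big_endian_order parameter (standing in for VECTOR_ELT_ORDER_BIG) are
hypothetical, and the constants 4 stand in for GET_MODE_NUNITS (V4SFmode) and
GET_MODE_SIZE (SFmode).

```c
#include <assert.h>

/* Hypothetical model of the operand computation: map a vec_insert
   element number (0..3) to the byte offset given to xxinsertw.
   With little endian element order the element number is mirrored
   first, then scaled by the 4-byte element size.  */
static int
xxinsertw_byte_offset (int elt, int big_endian_order)
{
  assert (elt >= 0 && elt < 4);

  if (!big_endian_order)
    elt = 4 - 1 - elt;

  return 4 * elt;
}
```

So with big endian element order element 3 maps to byte offset 12, while with
little endian element order element 3 maps to byte offset 0 and element 0 maps
to byte offset 12.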

Can I install this change to the trunk?  After a week of burn-in, can I install
this on the GCC 7.x branch?  Note, it will not work on previous branches.

[gcc]
2017-06-15  Michael Meissner  <meissner@linux.vnet.ibm.com>

	PR target/79799
	* config/rs6000/rs6000.c (rs6000_expand_vector_set): Add support
	for doing vector set of SFmode on ISA 3.0.
	* config/rs6000/vsx.md (vsx_set_v4sf_p9): Likewise.
	(vsx_set_v4sf_p9_zero): Special case setting 0.0f to a V4SF
	element.
	(vsx_insert_extract_v4sf_p9): Add an optimization for inserting a
	SFmode value into a V4SF variable that was extracted from another
	V4SF variable without converting the element to double precision
	and back to single precision vector format.
	(vsx_insert_extract_v4sf_p9_2): Likewise.

[gcc/testsuite]
2017-06-15  Michael Meissner  <meissner@linux.vnet.ibm.com>

	PR target/79799
	* gcc.target/powerpc/pr79799-1.c: New test.
	* gcc.target/powerpc/pr79799-2.c: Likewise.
	* gcc.target/powerpc/pr79799-3.c: Likewise.
	* gcc.target/powerpc/pr79799-4.c: Likewise.
	* gcc.target/powerpc/pr79799-5.c: Likewise.

Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 249175)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -7442,6 +7442,9 @@ rs6000_expand_vector_set (rtx target, rt
       else if (mode == V2DImode)
 	insn = gen_vsx_set_v2di (target, target, val, elt_rtx);
 
+      else if (TARGET_P9_VECTOR && mode == V4SFmode)
+	insn = gen_vsx_set_v4sf_p9 (target, target, val, elt_rtx);
+
       else if (TARGET_P9_VECTOR && TARGET_VSX_SMALL_INTEGER
 	       && TARGET_UPPER_REGS_DI && TARGET_POWERPC64)
 	{
Index: gcc/config/rs6000/vsx.md
===================================================================
--- gcc/config/rs6000/vsx.md	(revision 249175)
+++ gcc/config/rs6000/vsx.md	(working copy)
@@ -3012,6 +3012,130 @@ (define_insn "vsx_set_<mode>_p9"
 }
   [(set_attr "type" "vecperm")])
 
+(define_insn_and_split "vsx_set_v4sf_p9"
+  [(set (match_operand:V4SF 0 "gpc_reg_operand" "=wa")
+	(unspec:V4SF
+	 [(match_operand:V4SF 1 "gpc_reg_operand" "0")
+	  (match_operand:SF 2 "gpc_reg_operand" "ww")
+	  (match_operand:QI 3 "const_0_to_3_operand" "n")]
+	 UNSPEC_VSX_SET))
+   (clobber (match_scratch:SI 4 "=&wJwK"))]
+  "VECTOR_MEM_VSX_P (V4SFmode) && TARGET_P9_VECTOR"
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 5)
+	(unspec:V4SF [(match_dup 2)]
+		     UNSPEC_VSX_CVDPSPN))
+   (parallel [(set (match_dup 4)
+		   (vec_select:SI (match_dup 6)
+				  (parallel [(match_dup 7)])))
+	      (clobber (scratch:SI))])
+   (set (match_dup 8)
+	(unspec:V4SI [(match_dup 8)
+		      (match_dup 4)
+		      (match_dup 3)]
+		     UNSPEC_VSX_SET))]
+{
+  unsigned int tmp_regno = reg_or_subregno (operands[4]);
+
+  operands[5] = gen_rtx_REG (V4SFmode, tmp_regno);
+  operands[6] = gen_rtx_REG (V4SImode, tmp_regno);
+  operands[7] = GEN_INT (VECTOR_ELT_ORDER_BIG ? 1 : 2);
+  operands[8] = gen_rtx_REG (V4SImode, reg_or_subregno (operands[0]));
+}
+  [(set_attr "type" "vecperm")
+   (set_attr "length" "12")])
+
+;; Special case setting 0.0f to a V4SF element
+(define_insn_and_split "*vsx_set_v4sf_p9_zero"
+  [(set (match_operand:V4SF 0 "gpc_reg_operand" "=wa")
+	(unspec:V4SF
+	 [(match_operand:V4SF 1 "gpc_reg_operand" "0")
+	  (match_operand:SF 2 "zero_fp_constant" "j")
+	  (match_operand:QI 3 "const_0_to_3_operand" "n")]
+	 UNSPEC_VSX_SET))
+   (clobber (match_scratch:SI 4 "=&wJwK"))]
+  "VECTOR_MEM_VSX_P (V4SFmode) && TARGET_P9_VECTOR"
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 4)
+	(const_int 0))
+   (set (match_dup 5)
+	(unspec:V4SI [(match_dup 5)
+		      (match_dup 4)
+		      (match_dup 3)]
+		     UNSPEC_VSX_SET))]
+{
+  operands[5] = gen_rtx_REG (V4SImode, reg_or_subregno (operands[0]));
+}
+  [(set_attr "type" "vecperm")
+   (set_attr "length" "8")])
+
+;; Optimize x = vec_insert (vec_extract (v2, n), v1, m) if n is the element
+;; that is in the default scalar position (1 for big endian, 2 for little
+;; endian).  We just need to do an xxinsertw since the element is in the
+;; correct location.
+
+(define_insn "*vsx_insert_extract_v4sf_p9"
+  [(set (match_operand:V4SF 0 "gpc_reg_operand" "=wa")
+	(unspec:V4SF
+	 [(match_operand:V4SF 1 "gpc_reg_operand" "0")
+	  (vec_select:SF (match_operand:V4SF 2 "gpc_reg_operand" "wa")
+			 (parallel
+			  [(match_operand:QI 3 "const_0_to_3_operand" "n")]))
+	  (match_operand:QI 4 "const_0_to_3_operand" "n")]
+	 UNSPEC_VSX_SET))]
+  "VECTOR_MEM_VSX_P (V4SFmode) && TARGET_P9_VECTOR
+   && (INTVAL (operands[3]) == (VECTOR_ELT_ORDER_BIG ? 1 : 2))"
+{
+  int ele = INTVAL (operands[4]);
+
+  if (!VECTOR_ELT_ORDER_BIG)
+    ele = GET_MODE_NUNITS (V4SFmode) - 1 - ele;
+
+  operands[4] = GEN_INT (GET_MODE_SIZE (SFmode) * ele);
+  return "xxinsertw %x0,%x2,%4";
+}
+  [(set_attr "type" "vecperm")])
+
+;; Optimize x = vec_insert (vec_extract (v2, n), v1, m) if n is not the element
+;; that is in the default scalar position (1 for big endian, 2 for little
+;; endian).  Convert the insert/extract to int and avoid doing the conversion.
+
+(define_insn_and_split "*vsx_insert_extract_v4sf_p9_2"
+  [(set (match_operand:V4SF 0 "gpc_reg_operand" "=wa")
+	(unspec:V4SF
+	 [(match_operand:V4SF 1 "gpc_reg_operand" "0")
+	  (vec_select:SF (match_operand:V4SF 2 "gpc_reg_operand" "wa")
+			 (parallel
+			  [(match_operand:QI 3 "const_0_to_3_operand" "n")]))
+	  (match_operand:QI 4 "const_0_to_3_operand" "n")]
+	 UNSPEC_VSX_SET))
+   (clobber (match_scratch:SI 5 "=&wJwK"))]
+  "VECTOR_MEM_VSX_P (V4SFmode) && VECTOR_MEM_VSX_P (V4SImode)
+   && TARGET_P9_VECTOR && TARGET_VSX_SMALL_INTEGER
+   && (INTVAL (operands[3]) != (VECTOR_ELT_ORDER_BIG ? 1 : 2))"
+  "#"
+  "&& 1"
+  [(parallel [(set (match_dup 5)
+		   (vec_select:SI (match_dup 6)
+				  (parallel [(match_dup 3)])))
+	      (clobber (scratch:SI))])
+   (set (match_dup 7)
+	(unspec:V4SI [(match_dup 8)
+		      (match_dup 5)
+		      (match_dup 4)]
+		     UNSPEC_VSX_SET))]
+{
+  if (GET_CODE (operands[5]) == SCRATCH)
+    operands[5] = gen_reg_rtx (SImode);
+
+  operands[6] = gen_lowpart (V4SImode, operands[2]);
+  operands[7] = gen_lowpart (V4SImode, operands[0]);
+  operands[8] = gen_lowpart (V4SImode, operands[1]);
+}
+  [(set_attr "type" "vecperm")])
+
 ;; Expanders for builtins
 (define_expand "vsx_mergel_<mode>"
   [(use (match_operand:VSX_D 0 "vsx_register_operand" ""))
Index: gcc/testsuite/gcc.target/powerpc/pr79799-1.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr79799-1.c	(nonexistent)
+++ gcc/testsuite/gcc.target/powerpc/pr79799-1.c	(working copy)
@@ -0,0 +1,43 @@
+/* { dg-do compile { target { powerpc64*-*-* && lp64 } } } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-options "-mcpu=power9 -O2" } */
+
+#include <altivec.h>
+
+/* GCC 7.1 did not have a specialized method for inserting 32-bit floating point on
+   ISA 3.0 (power9) systems.  */
+
+vector float
+insert_arg_0 (vector float vf, float f)
+{
+  return vec_insert (f, vf, 0);
+}
+
+vector float
+insert_arg_1 (vector float vf, float f)
+{
+  return vec_insert (f, vf, 1);
+}
+
+vector float
+insert_arg_2 (vector float vf, float f)
+{
+  return vec_insert (f, vf, 2);
+}
+
+vector float
+insert_arg_3 (vector float vf, float f)
+{
+  return vec_insert (f, vf, 3);
+}
+
+/* { dg-final { scan-assembler     {\mxscvdpspn\M} } } */
+/* { dg-final { scan-assembler     {\mxxinsertw\M} } } */
+/* { dg-final { scan-assembler-not {\mlvewx\M}     } } */
+/* { dg-final { scan-assembler-not {\mlvx\M}       } } */
+/* { dg-final { scan-assembler-not {\mvperm\M}     } } */
+/* { dg-final { scan-assembler-not {\mvpermr\M}    } } */
+/* { dg-final { scan-assembler-not {\mstfs\M}      } } */
+/* { dg-final { scan-assembler-not {\mstxssp\M}    } } */
+/* { dg-final { scan-assembler-not {\mstxsspx\M}   } } */
Index: gcc/testsuite/gcc.target/powerpc/pr79799-2.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr79799-2.c	(nonexistent)
+++ gcc/testsuite/gcc.target/powerpc/pr79799-2.c	(working copy)
@@ -0,0 +1,31 @@
+/* { dg-do compile { target { powerpc64*-*-* && lp64 } } } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-options "-mcpu=power9 -O2" } */
+
+#include <altivec.h>
+
+/* Optimize x = vec_insert (vec_extract (v2, N), v1, M) for SFmode if N is the default
+   scalar position.  */
+
+#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
+#define ELE 2
+#else
+#define ELE 1
+#endif
+
+vector float
+foo (vector float v1, vector float v2)
+{
+  return vec_insert (vec_extract (v2, ELE), v1, 0);
+}
+
+/* { dg-final { scan-assembler     {\mxxinsertw\M}   } } */
+/* { dg-final { scan-assembler-not {\mxxextractuw\M} } } */
+/* { dg-final { scan-assembler-not {\mlvewx\M}       } } */
+/* { dg-final { scan-assembler-not {\mlvx\M}         } } */
+/* { dg-final { scan-assembler-not {\mvperm\M}       } } */
+/* { dg-final { scan-assembler-not {\mvpermr\M}      } } */
+/* { dg-final { scan-assembler-not {\mstfs\M}        } } */
+/* { dg-final { scan-assembler-not {\mstxssp\M}      } } */
+/* { dg-final { scan-assembler-not {\mstxsspx\M}     } } */
Index: gcc/testsuite/gcc.target/powerpc/pr79799-3.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr79799-3.c	(nonexistent)
+++ gcc/testsuite/gcc.target/powerpc/pr79799-3.c	(working copy)
@@ -0,0 +1,24 @@
+/* { dg-do compile { target { powerpc64*-*-* && lp64 } } } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-options "-mcpu=power9 -O2" } */
+
+#include <altivec.h>
+
+/* Optimize x = vec_insert (vec_extract (v2, N), v1, M) for SFmode.  */
+
+vector float
+foo (vector float v1, vector float v2)
+{
+  return vec_insert (vec_extract (v2, 4), v1, 0);
+}
+
+/* { dg-final { scan-assembler     {\mxxinsertw\M}   } } */
+/* { dg-final { scan-assembler     {\mxxextractuw\M} } } */
+/* { dg-final { scan-assembler-not {\mlvewx\M}       } } */
+/* { dg-final { scan-assembler-not {\mlvx\M}         } } */
+/* { dg-final { scan-assembler-not {\mvperm\M}       } } */
+/* { dg-final { scan-assembler-not {\mvpermr\M}      } } */
+/* { dg-final { scan-assembler-not {\mstfs\M}        } } */
+/* { dg-final { scan-assembler-not {\mstxssp\M}      } } */
+/* { dg-final { scan-assembler-not {\mstxsspx\M}     } } */
Index: gcc/testsuite/gcc.target/powerpc/pr79799-4.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr79799-4.c	(nonexistent)
+++ gcc/testsuite/gcc.target/powerpc/pr79799-4.c	(working copy)
@@ -0,0 +1,105 @@
+/* { dg-do run { target { powerpc*-*-linux* } } } */
+/* { dg-require-effective-target vsx_hw } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+/* { dg-require-effective-target p9vector_hw } */
+/* { dg-options "-mcpu=power9 -O2" } */
+
+#include <altivec.h>
+#include <stdlib.h>
+
+__attribute__ ((__noinline__))
+vector float
+insert_0 (vector float v, float f)
+{
+  return vec_insert (f, v, 0);
+}
+
+__attribute__ ((__noinline__))
+vector float
+insert_1 (vector float v, float f)
+{
+  return vec_insert (f, v, 1);
+}
+
+__attribute__ ((__noinline__))
+vector float
+insert_2 (vector float v, float f)
+{
+  return vec_insert (f, v, 2);
+}
+
+__attribute__ ((__noinline__))
+vector float
+insert_3 (vector float v, float f)
+{
+  return vec_insert (f, v, 3);
+}
+
+__attribute__ ((__noinline__))
+void
+test_insert (void)
+{
+  vector float v1 = { 1.0f, 2.0f, 3.0f, 4.0f };
+  vector float v2 = { 5.0f, 6.0f, 7.0f, 8.0f };
+
+  v1 = insert_0 (v1, 5.0f);
+  v1 = insert_1 (v1, 6.0f);
+  v1 = insert_2 (v1, 7.0f);
+  v1 = insert_3 (v1, 8.0f);
+
+  if (vec_any_ne (v1, v2))
+    abort ();
+}
+
+__attribute__ ((__noinline__))
+vector float
+insert_extract_0_3 (vector float v1, vector float v2)
+{
+  return vec_insert (vec_extract (v2, 3), v1, 0);
+}
+
+__attribute__ ((__noinline__))
+vector float
+insert_extract_1_2 (vector float v1, vector float v2)
+{
+  return vec_insert (vec_extract (v2, 2), v1, 1);
+}
+
+__attribute__ ((__noinline__))
+vector float
+insert_extract_2_1 (vector float v1, vector float v2)
+{
+  return vec_insert (vec_extract (v2, 1), v1, 2);
+}
+
+__attribute__ ((__noinline__))
+vector float
+insert_extract_3_0 (vector float v1, vector float v2)
+{
+  return vec_insert (vec_extract (v2, 0), v1, 3);
+}
+
+__attribute__ ((__noinline__))
+void
+test_insert_extract (void)
+{
+  vector float v1 = { 1.0f, 2.0f, 3.0f, 4.0f };
+  vector float v2 = { 5.0f, 6.0f, 7.0f, 8.0f };
+  vector float v3 = { 8.0f, 7.0f, 6.0f, 5.0f };
+
+  v1 = insert_extract_0_3 (v1, v2);
+  v1 = insert_extract_1_2 (v1, v2);
+  v1 = insert_extract_2_1 (v1, v2);
+  v1 = insert_extract_3_0 (v1, v2);
+
+  if (vec_any_ne (v1, v3))
+    abort ();
+}
+
+int
+main (void)
+{
+  test_insert ();
+  test_insert_extract ();
+  return 0;
+}
Index: gcc/testsuite/gcc.target/powerpc/pr79799-5.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr79799-5.c	(nonexistent)
+++ gcc/testsuite/gcc.target/powerpc/pr79799-5.c	(working copy)
@@ -0,0 +1,25 @@
+/* { dg-do compile { target { powerpc64*-*-* && lp64 } } } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-options "-mcpu=power9 -O2" } */
+
+#include <altivec.h>
+
+/* Ensure setting 0.0f to a V4SFmode element does not do a FP conversion.  */
+
+vector float
+insert_arg_0 (vector float vf)
+{
+  return vec_insert (0.0f, vf, 0);
+}
+
+/* { dg-final { scan-assembler     {\mxxinsertw\M}   } } */
+/* { dg-final { scan-assembler-not {\mlvewx\M}       } } */
+/* { dg-final { scan-assembler-not {\mlvx\M}         } } */
+/* { dg-final { scan-assembler-not {\mvperm\M}       } } */
+/* { dg-final { scan-assembler-not {\mvpermr\M}      } } */
+/* { dg-final { scan-assembler-not {\mstfs\M}        } } */
+/* { dg-final { scan-assembler-not {\mstxssp\M}      } } */
+/* { dg-final { scan-assembler-not {\mstxsspx\M}     } } */
+/* { dg-final { scan-assembler-not {\mxscvdpspn\M}   } } */
+/* { dg-final { scan-assembler-not {\mxxextractuw\M} } } */