From patchwork Tue Aug 29 21:47:18 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Bill Schmidt <wschmidt@linux.vnet.ibm.com>
X-Patchwork-Id: 807325
Return-Path: 
 <gcc-patches-return-461124-incoming=patchwork.ozlabs.org@gcc.gnu.org>
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Authentication-Results: ozlabs.org;
	spf=pass (mailfrom) smtp.mailfrom=gcc.gnu.org
	(client-ip=209.132.180.131; helo=sourceware.org;
	envelope-from=gcc-patches-return-461124-incoming=patchwork.ozlabs.org@gcc.gnu.org;
	receiver=<UNKNOWN>)
Authentication-Results: ozlabs.org; dkim=pass (1024-bit key;
	unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org
	header.b="IIwpzl+X"; dkim-atps=neutral
Received: from sourceware.org (server1.sourceware.org [209.132.180.131])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256
	bits)) (No client certificate requested)
	by ozlabs.org (Postfix) with ESMTPS id 3xhj0p3bqzz9s7F
	for <incoming@patchwork.ozlabs.org>;
	Wed, 30 Aug 2017 07:47:36 +1000 (AEST)
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id
	:list-unsubscribe:list-archive:list-post:list-help:sender:to:cc
	:from:subject:date:mime-version:content-type
	:content-transfer-encoding:message-id; q=dns; s=default; b=Odq3F
	OeVbHLOSkeHO4wfF7p4OjQvto4diTNXF1JvtxuShm9DrYjDUKKUdjp7jLEL8RgMb
	kk1Z3cx+do1CZh2ZloIJBetjFEdwC+exdE+i6BQTflY0HAWsiZqHSlOytUmaZnbi
	Zm7TnXecRhh5ak6CR2QJCin4fTlSKL2d4neks4=
DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id
	:list-unsubscribe:list-archive:list-post:list-help:sender:to:cc
	:from:subject:date:mime-version:content-type
	:content-transfer-encoding:message-id; s=default; bh=9FI4zd7JKvS
	YcZOAyaEx9NCTKYE=; b=IIwpzl+XcZuxUCXk0NGnaza6E9Gtg6hGB7fwLxND6QL
	bX087AjC9+lcL9WhCWorLHcK/hCvLAuIiPvT2MuI99lIqC16ixzwq6fZQzWVK3/U
	Tutv5Gk/7n6vbuF8P1MoExzvaoCM/RUMMq71nDazocqaBsGY0bGCLIS3Y+RRFS/8
	=
Received: (qmail 92234 invoked by alias); 29 Aug 2017 21:47:28 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Unsubscribe: 
 <mailto:gcc-patches-unsubscribe-incoming=patchwork.ozlabs.org@gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Delivered-To: mailing list gcc-patches@gcc.gnu.org
Received: (qmail 92225 invoked by uid 89); 29 Aug 2017 21:47:27 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-10.2 required=5.0 tests=AWL, BAYES_00,
	GIT_PATCH_2, GIT_PATCH_3, KAM_ASCII_DIVIDERS,
	KAM_LAZY_DOMAIN_SECURITY,
	RCVD_IN_DNSWL_LOW autolearn=ham version=3.3.2 spammy=
X-HELO: mx0a-001b2d01.pphosted.com
Received: from mx0a-001b2d01.pphosted.com (HELO mx0a-001b2d01.pphosted.com)
	(148.163.156.1) by sourceware.org
	(qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP;
	Tue, 29 Aug 2017 21:47:25 +0000
Received: from pps.filterd (m0098404.ppops.net [127.0.0.1])	by
	mx0a-001b2d01.pphosted.com (8.16.0.21/8.16.0.21) with SMTP id
	v7TLkFU9147163	for <gcc-patches@gcc.gnu.org>;
	Tue, 29 Aug 2017 17:47:23 -0400
Received: from e18.ny.us.ibm.com (e18.ny.us.ibm.com [129.33.205.208])	by
	mx0a-001b2d01.pphosted.com with ESMTP id
	2cna46jtjw-1	(version=TLSv1.2 cipher=AES256-SHA bits=256
	verify=NOT)	for <gcc-patches@gcc.gnu.org>;
	Tue, 29 Aug 2017 17:47:23 -0400
Received: from localhost	by e18.ny.us.ibm.com with IBM ESMTP SMTP Gateway:
	Authorized Use Only! Violators will be prosecuted	for
	<gcc-patches@gcc.gnu.org> from <wschmidt@linux.vnet.ibm.com>;
	Tue, 29 Aug 2017 17:47:21 -0400
Received: from b01cxnp22035.gho.pok.ibm.com (9.57.198.25)	by
	e18.ny.us.ibm.com (146.89.104.205) with IBM ESMTP SMTP
	Gateway: Authorized Use Only! Violators will be prosecuted;
	Tue, 29 Aug 2017 17:47:20 -0400
Received: from b01ledav004.gho.pok.ibm.com (b01ledav004.gho.pok.ibm.com
	[9.57.199.109])	by b01cxnp22035.gho.pok.ibm.com
	(8.14.9/8.14.9/NCO v10.0) with ESMTP id v7TLlJjP32833570;
	Tue, 29 Aug 2017 21:47:19 GMT
Received: from b01ledav004.gho.pok.ibm.com (unknown [127.0.0.1])	by IMSVA
	(Postfix) with ESMTP id E9E59112054;
	Tue, 29 Aug 2017 17:47:05 -0400 (EDT)
Received: from BigMac.local (unknown [9.80.238.20])	by
	b01ledav004.gho.pok.ibm.com (Postfix) with ESMTP id
	9690A112047; Tue, 29 Aug 2017 17:47:05 -0400 (EDT)
To: GCC Patches <gcc-patches@gcc.gnu.org>
Cc: Segher Boessenkool <segher@kernel.crashing.org>,
	David Edelsohn <dje.gcc@gmail.com>
From: Bill Schmidt <wschmidt@linux.vnet.ibm.com>
Subject: [PATCH v2, rs6000] Fix PR81833
Date: Tue, 29 Aug 2017 16:47:18 -0500
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12;
	rv:52.0) Gecko/20100101 Thunderbird/52.3.0
MIME-Version: 1.0
X-TM-AS-GCONF: 00
x-cbid: 17082921-0044-0000-0000-00000384E6BD
X-IBM-SpamModules-Scores: 
X-IBM-SpamModules-Versions: BY=3.00007633; HX=3.00000241; KW=3.00000007;
	PH=3.00000004; SC=3.00000226; SDB=6.00909425; UDB=6.00456107;
	IPR=6.00689709; BA=6.00005560; NDR=6.00000001; ZLA=6.00000005;
	ZF=6.00000009; ZB=6.00000000; ZP=6.00000000; ZH=6.00000000;
	ZU=6.00000002; MB=3.00016921; XFM=3.00000015;
	UTC=2017-08-29 21:47:21
X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused
x-cbparentid: 17082921-0045-0000-0000-000007B2FF3C
Message-Id: <75124dce-5afd-12bf-26a9-3370904c869f@linux.vnet.ibm.com>
X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:, ,
	definitions=2017-08-29_09:, , signatures=0
X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0
	spamscore=0 suspectscore=0 malwarescore=0 phishscore=0
	adultscore=0 bulkscore=0 classifier=spam adjust=0 reason=mlx
	scancount=1 engine=8.0.1-1707230000
	definitions=main-1708290330
X-IsSubscribed: yes

Hi Segher,

Thanks for approving the previous patch with changes.  I've made those and also
modified the test case to require VSX hardware for execution.  I duplicated the
test so we get coverage on P7 BE 32/64 and P8 BE/LE.  I'd appreciate it if you
could look over the dejagnu instructions once more on these.  Thanks!

Bill


[gcc]

2017-08-29  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

	PR target/81833
	* config/rs6000/altivec.md (altivec_vsum2sws): Convert from a
	define_insn to a define_expand.
	(altivec_vsum2sws_direct): New define_insn.
	(altivec_vsumsws): Convert from a define_insn to a define_expand.

[gcc/testsuite]

2017-08-29  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

	PR target/81833
	* gcc.target/powerpc/pr81833-1.c: New file.
	* gcc.target/powerpc/pr81833-2.c: New file.

Index: gcc/config/rs6000/altivec.md
===================================================================
--- gcc/config/rs6000/altivec.md	(revision 251369)
+++ gcc/config/rs6000/altivec.md	(working copy)
@@ -1804,51 +1804,61 @@
   "vsum4s<VI_char>s %0,%1,%2"
   [(set_attr "type" "veccomplex")])
 
-;; FIXME: For the following two patterns, the scratch should only be
-;; allocated for !VECTOR_ELT_ORDER_BIG, and the instructions should
-;; be emitted separately.
-(define_insn "altivec_vsum2sws"
-  [(set (match_operand:V4SI 0 "register_operand" "=v")
-        (unspec:V4SI [(match_operand:V4SI 1 "register_operand" "v")
-                      (match_operand:V4SI 2 "register_operand" "v")]
-		     UNSPEC_VSUM2SWS))
-   (set (reg:SI VSCR_REGNO) (unspec:SI [(const_int 0)] UNSPEC_SET_VSCR))
-   (clobber (match_scratch:V4SI 3 "=v"))]
+(define_expand "altivec_vsum2sws"
+  [(use (match_operand:V4SI 0 "register_operand"))
+   (use (match_operand:V4SI 1 "register_operand"))
+   (use (match_operand:V4SI 2 "register_operand"))]
   "TARGET_ALTIVEC"
 {
   if (VECTOR_ELT_ORDER_BIG)
-    return "vsum2sws %0,%1,%2";
+    emit_insn (gen_altivec_vsum2sws_direct (operands[0], operands[1],
+                                            operands[2]));
   else
-    return "vsldoi %3,%2,%2,12\n\tvsum2sws %3,%1,%3\n\tvsldoi %0,%3,%3,4";
-}
-  [(set_attr "type" "veccomplex")
-   (set (attr "length")
-     (if_then_else
-       (match_test "VECTOR_ELT_ORDER_BIG")
-       (const_string "4")
-       (const_string "12")))])
+    {
+      rtx tmp1 = gen_reg_rtx (V4SImode);
+      rtx tmp2 = gen_reg_rtx (V4SImode);
+      emit_insn (gen_altivec_vsldoi_v4si (tmp1, operands[2],
+                                          operands[2], GEN_INT (12)));
+      emit_insn (gen_altivec_vsum2sws_direct (tmp2, operands[1], tmp1));
+      emit_insn (gen_altivec_vsldoi_v4si (operands[0], tmp2, tmp2,
+                                          GEN_INT (4)));
+    }
+  DONE;
+})
 
-(define_insn "altivec_vsumsws"
+; FIXME: This can probably be expressed without an UNSPEC.
+(define_insn "altivec_vsum2sws_direct"
   [(set (match_operand:V4SI 0 "register_operand" "=v")
         (unspec:V4SI [(match_operand:V4SI 1 "register_operand" "v")
-                      (match_operand:V4SI 2 "register_operand" "v")]
-		     UNSPEC_VSUMSWS))
-   (set (reg:SI VSCR_REGNO) (unspec:SI [(const_int 0)] UNSPEC_SET_VSCR))
-   (clobber (match_scratch:V4SI 3 "=v"))]
+	              (match_operand:V4SI 2 "register_operand" "v")]
+		     UNSPEC_VSUM2SWS))
+   (set (reg:SI VSCR_REGNO) (unspec:SI [(const_int 0)] UNSPEC_SET_VSCR))]
   "TARGET_ALTIVEC"
+  "vsum2sws %0,%1,%2"
+  [(set_attr "type" "veccomplex")])
+
+(define_expand "altivec_vsumsws"
+  [(use (match_operand:V4SI 0 "register_operand"))
+   (use (match_operand:V4SI 1 "register_operand"))
+   (use (match_operand:V4SI 2 "register_operand"))]
+  "TARGET_ALTIVEC"
 {
   if (VECTOR_ELT_ORDER_BIG)
-    return "vsumsws %0,%1,%2";
+    emit_insn (gen_altivec_vsumsws_direct (operands[0], operands[1],
+                                           operands[2]));
   else
-    return "vspltw %3,%2,0\n\tvsumsws %3,%1,%3\n\tvsldoi %0,%3,%3,12";
-}
-  [(set_attr "type" "veccomplex")
-   (set (attr "length")
-     (if_then_else
-       (match_test "(VECTOR_ELT_ORDER_BIG)")
-       (const_string "4")
-       (const_string "12")))])
+    {
+      rtx tmp1 = gen_reg_rtx (V4SImode);
+      rtx tmp2 = gen_reg_rtx (V4SImode);
+      emit_insn (gen_altivec_vspltw_direct (tmp1, operands[2], const0_rtx));
+      emit_insn (gen_altivec_vsumsws_direct (tmp2, operands[1], tmp1));
+      emit_insn (gen_altivec_vsldoi_v4si (operands[0], tmp2, tmp2,
+                                          GEN_INT (12)));
+    }
+  DONE;
+})
 
+; FIXME: This can probably be expressed without an UNSPEC.
 (define_insn "altivec_vsumsws_direct"
   [(set (match_operand:V4SI 0 "register_operand" "=v")
         (unspec:V4SI [(match_operand:V4SI 1 "register_operand" "v")
Index: gcc/testsuite/gcc.target/powerpc/pr81833-1.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr81833-1.c	(nonexistent)
+++ gcc/testsuite/gcc.target/powerpc/pr81833-1.c	(working copy)
@@ -0,0 +1,59 @@
+/* PR81833: This used to fail due to improper implementation of vec_msum.  */
+/* Test case relies on -mcpu=power7 or later.  Currently we don't have
+   machinery to express that, so we have two separate tests for -mcpu=power7
+   and -mcpu=power8 to catch 32-bit BE on P7 and 64-bit BE/LE on P8.  */
+
+/* { dg-do run } */
+/* { dg-require-effective-target vsx_hw } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */
+/* { dg-options "-mcpu=power8 -O2" } */
+
+#include <altivec.h>
+
+#define vec_u8  vector unsigned char
+#define vec_s8  vector signed char
+#define vec_u16 vector unsigned short
+#define vec_s16 vector signed short
+#define vec_u32 vector unsigned int
+#define vec_s32 vector signed int
+#define vec_f   vector float
+
+#define LOAD_ZERO const vec_u8 zerov = vec_splat_u8 (0)
+
+#define zero_u8v  (vec_u8)  zerov
+#define zero_s8v  (vec_s8)  zerov
+#define zero_u16v (vec_u16) zerov
+#define zero_s16v (vec_s16) zerov
+#define zero_u32v (vec_u32) zerov
+#define zero_s32v (vec_s32) zerov
+
+signed int __attribute__((noinline))
+scalarproduct_int16_vsx (const signed short *v1, const signed short *v2,
+			 int order)
+{
+  int i;
+  LOAD_ZERO;
+  register vec_s16 vec1;
+  register vec_s32 res = vec_splat_s32 (0), t;
+  signed int ires;
+
+  for (i = 0; i < order; i += 8) {
+    vec1 = vec_vsx_ld (0, v1);
+    t    = vec_msum (vec1, vec_vsx_ld (0, v2), zero_s32v);
+    res  = vec_sums (t, res);
+    v1  += 8;
+    v2  += 8;
+  }
+  res = vec_splat (res, 3);
+  vec_ste (res, 0, &ires);
+
+  return ires;
+}
+
+int main(void)
+{
+  const signed short test_vec[] = { 1, 1, 1, 1, 1, 1, 1, 1 };
+  if (scalarproduct_int16_vsx (test_vec, test_vec, 8) != 8)
+    __builtin_abort ();
+  return 0;
+}
Index: gcc/testsuite/gcc.target/powerpc/pr81833-2.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr81833-2.c	(nonexistent)
+++ gcc/testsuite/gcc.target/powerpc/pr81833-2.c	(working copy)
@@ -0,0 +1,59 @@
+/* PR81833: This used to fail due to improper implementation of vec_msum.  */
+/* Test case relies on -mcpu=power7 or later.  Currently we don't have
+   machinery to express that, so we have two separate tests for -mcpu=power7
+   and -mcpu=power8 to catch 32-bit BE on P7 and 64-bit BE/LE on P8.  */
+
+/* { dg-do run } */
+/* { dg-require-effective-target vsx_hw } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power7" } } */
+/* { dg-options "-mcpu=power7 -O2" } */
+
+#include <altivec.h>
+
+#define vec_u8  vector unsigned char
+#define vec_s8  vector signed char
+#define vec_u16 vector unsigned short
+#define vec_s16 vector signed short
+#define vec_u32 vector unsigned int
+#define vec_s32 vector signed int
+#define vec_f   vector float
+
+#define LOAD_ZERO const vec_u8 zerov = vec_splat_u8 (0)
+
+#define zero_u8v  (vec_u8)  zerov
+#define zero_s8v  (vec_s8)  zerov
+#define zero_u16v (vec_u16) zerov
+#define zero_s16v (vec_s16) zerov
+#define zero_u32v (vec_u32) zerov
+#define zero_s32v (vec_s32) zerov
+
+signed int __attribute__((noinline))
+scalarproduct_int16_vsx (const signed short *v1, const signed short *v2,
+			 int order)
+{
+  int i;
+  LOAD_ZERO;
+  register vec_s16 vec1;
+  register vec_s32 res = vec_splat_s32 (0), t;
+  signed int ires;
+
+  for (i = 0; i < order; i += 8) {
+    vec1 = vec_vsx_ld (0, v1);
+    t    = vec_msum (vec1, vec_vsx_ld (0, v2), zero_s32v);
+    res  = vec_sums (t, res);
+    v1  += 8;
+    v2  += 8;
+  }
+  res = vec_splat (res, 3);
+  vec_ste (res, 0, &ires);
+
+  return ires;
+}
+
+int main(void)
+{
+  const signed short test_vec[] = { 1, 1, 1, 1, 1, 1, 1, 1 };
+  if (scalarproduct_int16_vsx (test_vec, test_vec, 8) != 8)
+    __builtin_abort ();
+  return 0;
+}