From patchwork Wed Nov  2 02:05:03 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Bill Schmidt <wschmidt@linux.vnet.ibm.com>
X-Patchwork-Id: 690225
To: GCC Patches <gcc-patches@gcc.gnu.org>
Cc: Segher Boessenkool <segher@kernel.crashing.org>,
	David Edelsohn <dje.gcc@gmail.com>, will_schmidt@vnet.ibm.com
From: Bill Schmidt <wschmidt@linux.vnet.ibm.com>
Subject: [PATCH, rs6000] Fold vector addition built-ins in GIMPLE
Date: Tue, 1 Nov 2016 21:05:03 -0500
Message-Id: <ba8b54e7-d2e4-99aa-7b1d-418e1b3d5e31@linux.vnet.ibm.com>

Hi,

As Jakub suggested in response to my *ahem* ornate patch for overloaded
function built-ins, a much better approach is to use the existing
machinery for overloading and then immediately fold the specific
functions during gimplification.  There is a target hook available for
this purpose that we have not previously used.  This patch demonstrates
this functionality by implementing the target hook and folding vector
addition built-ins within it.  Future patches will fold other such
operations, improving the optimization available for many vector
intrinsics.
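
To illustrate why the fold is safe for the modulo-add flavors, here is a
minimal sketch that does not use altivec.h at all.  It relies only on the
generic GCC/Clang vector extension, so it should build on any target.  The
names v4u32, opaque_vadduwm, folded_add, lanes_equal and
check_fold_equivalence are hypothetical and exist only for this
illustration; opaque_vadduwm merely stands in for a built-in call that the
optimizers cannot see through, and folded_add is the plain vector '+' that
such a call is replaced with.

```c
#include <assert.h>
#include <stdint.h>

/* Generic vector type with four 32-bit unsigned lanes (hypothetical name).  */
typedef uint32_t v4u32 __attribute__ ((vector_size (16)));

/* Stand-in for an opaque target built-in: lane-wise modulo addition
   spelled out as a loop.  Not a real rs6000 built-in.  */
static v4u32
opaque_vadduwm (v4u32 a, v4u32 b)
{
  v4u32 r;
  for (int i = 0; i < 4; i++)
    r[i] = a[i] + b[i];
  return r;
}

/* The folded form: an ordinary vector addition, which the middle end
   can combine, reassociate, or constant-fold like any other '+'.  */
static v4u32
folded_add (v4u32 a, v4u32 b)
{
  return a + b;
}

/* Return 1 if every lane of A equals the corresponding lane of B.  */
static int
lanes_equal (v4u32 a, v4u32 b)
{
  for (int i = 0; i < 4; i++)
    if (a[i] != b[i])
      return 0;
  return 1;
}

/* Check that both forms agree, including a lane that wraps around.  */
static void
check_fold_equivalence (void)
{
  v4u32 a = { 1u, 2u, 0xFFFFFFFFu, 7u };
  v4u32 b = { 10u, 20u, 1u, 0u };
  assert (lanes_equal (opaque_vadduwm (a, b), folded_add (a, b)));
}
```

Calling check_fold_equivalence () from a test driver exercises both forms on
the same inputs.  The sketch only models the semantics; the patch itself
performs the replacement on GIMPLE statements, not on source code.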

Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
regressions.  Is this ok for trunk?

Thanks,
Bill


[gcc]

2016-11-01  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

	* config/rs6000/rs6000.c (gimple-ssa.h): New #include.
	(TARGET_GIMPLE_FOLD_BUILTIN): Define as
	rs6000_gimple_fold_builtin.
	(rs6000_gimple_fold_builtin): New function.  Add handling for
	early expansion of vector addition builtins.


[gcc/testsuite]

2016-11-01  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

	* gcc.target/powerpc/fold-vec-add-1.c: New.
	* gcc.target/powerpc/fold-vec-add-2.c: New.
	* gcc.target/powerpc/fold-vec-add-3.c: New.
	* gcc.target/powerpc/fold-vec-add-4.c: New.
	* gcc.target/powerpc/fold-vec-add-5.c: New.
	* gcc.target/powerpc/fold-vec-add-6.c: New.
	* gcc.target/powerpc/fold-vec-add-7.c: New.

Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 241624)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -56,6 +56,7 @@
 #include "sched-int.h"
 #include "gimplify.h"
 #include "gimple-iterator.h"
+#include "gimple-ssa.h"
 #include "gimple-walk.h"
 #include "intl.h"
 #include "params.h"
@@ -1632,6 +1633,8 @@ static const struct attribute_spec rs6000_attribut
 
 #undef TARGET_FOLD_BUILTIN
 #define TARGET_FOLD_BUILTIN rs6000_fold_builtin
+#undef TARGET_GIMPLE_FOLD_BUILTIN
+#define TARGET_GIMPLE_FOLD_BUILTIN rs6000_gimple_fold_builtin
 
 #undef TARGET_EXPAND_BUILTIN
 #define TARGET_EXPAND_BUILTIN rs6000_expand_builtin
@@ -16337,6 +16340,46 @@ rs6000_fold_builtin (tree fndecl, int n_args ATTRI
 #endif
 }
 
+/* Fold a machine-dependent built-in in GIMPLE.  (For folding into
+   a constant, use rs6000_fold_builtin.)  */
+
+bool
+rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
+{
+  gimple *stmt = gsi_stmt (*gsi);
+  tree fndecl = gimple_call_fndecl (stmt);
+  gcc_checking_assert (fndecl && DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_MD);
+  enum rs6000_builtins fn_code
+    = (enum rs6000_builtins) DECL_FUNCTION_CODE (fndecl);
+  tree arg0, arg1, lhs;
+
+  switch (fn_code)
+    {
+    /* Flavors of vec_add.  We deliberately don't expand
+       P8V_BUILTIN_VADDUQM as it gets lowered from V1TImode to
+       TImode, resulting in much poorer code generation.  */
+    case ALTIVEC_BUILTIN_VADDUBM:
+    case ALTIVEC_BUILTIN_VADDUHM:
+    case ALTIVEC_BUILTIN_VADDUWM:
+    case P8V_BUILTIN_VADDUDM:
+    case ALTIVEC_BUILTIN_VADDFP:
+    case VSX_BUILTIN_XVADDDP:
+      {
+	arg0 = gimple_call_arg (stmt, 0);
+	arg1 = gimple_call_arg (stmt, 1);
+	lhs = gimple_call_lhs (stmt);
+	gimple *g = gimple_build_assign (lhs, PLUS_EXPR, arg0, arg1);
+	gimple_set_location (g, gimple_location (stmt));
+	gsi_replace (gsi, g, true);
+	return true;
+      }
+    default:
+      break;
+    }
+
+  return false;
+}
+
 /* Expand an expression EXP that calls a built-in function,
    with result going to TARGET if that's convenient
    (and in mode MODE if that's convenient).
Index: gcc/testsuite/gcc.target/powerpc/fold-vec-add-1.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/fold-vec-add-1.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/fold-vec-add-1.c	(working copy)
@@ -0,0 +1,46 @@
+/* Verify that overloaded built-ins for vec_add with char
+   inputs produce the right results.  */
+
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_altivec_ok } */
+/* { dg-additional-options "-std=gnu11" } */
+
+#include <altivec.h>
+
+vector signed char
+test1 (vector bool char x, vector signed char y)
+{
+  return vec_add (x, y);
+}
+
+vector signed char
+test2 (vector signed char x, vector bool char y)
+{
+  return vec_add (x, y);
+}
+
+vector signed char
+test3 (vector signed char x, vector signed char y)
+{
+  return vec_add (x, y);
+}
+
+vector unsigned char
+test4 (vector bool char x, vector unsigned char y)
+{
+  return vec_add (x, y);
+}
+
+vector unsigned char
+test5 (vector unsigned char x, vector bool char y)
+{
+  return vec_add (x, y);
+}
+
+vector unsigned char
+test6 (vector unsigned char x, vector unsigned char y)
+{
+  return vec_add (x, y);
+}
+
+/* { dg-final { scan-assembler-times "vaddubm" 6 } } */
Index: gcc/testsuite/gcc.target/powerpc/fold-vec-add-2.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/fold-vec-add-2.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/fold-vec-add-2.c	(working copy)
@@ -0,0 +1,46 @@
+/* Verify that overloaded built-ins for vec_add with short
+   inputs produce the right results.  */
+
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_altivec_ok } */
+/* { dg-additional-options "-std=gnu11" } */
+
+#include <altivec.h>
+
+vector signed short
+test1 (vector bool short x, vector signed short y)
+{
+  return vec_add (x, y);
+}
+
+vector signed short
+test2 (vector signed short x, vector bool short y)
+{
+  return vec_add (x, y);
+}
+
+vector signed short
+test3 (vector signed short x, vector signed short y)
+{
+  return vec_add (x, y);
+}
+
+vector unsigned short
+test4 (vector bool short x, vector unsigned short y)
+{
+  return vec_add (x, y);
+}
+
+vector unsigned short
+test5 (vector unsigned short x, vector bool short y)
+{
+  return vec_add (x, y);
+}
+
+vector unsigned short
+test6 (vector unsigned short x, vector unsigned short y)
+{
+  return vec_add (x, y);
+}
+
+/* { dg-final { scan-assembler-times "vadduhm" 6 } } */
Index: gcc/testsuite/gcc.target/powerpc/fold-vec-add-3.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/fold-vec-add-3.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/fold-vec-add-3.c	(working copy)
@@ -0,0 +1,46 @@
+/* Verify that overloaded built-ins for vec_add with int
+   inputs produce the right results.  */
+
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_altivec_ok } */
+/* { dg-additional-options "-std=gnu11" } */
+
+#include <altivec.h>
+
+vector signed int
+test1 (vector bool int x, vector signed int y)
+{
+  return vec_add (x, y);
+}
+
+vector signed int
+test2 (vector signed int x, vector bool int y)
+{
+  return vec_add (x, y);
+}
+
+vector signed int
+test3 (vector signed int x, vector signed int y)
+{
+  return vec_add (x, y);
+}
+
+vector unsigned int
+test4 (vector bool int x, vector unsigned int y)
+{
+  return vec_add (x, y);
+}
+
+vector unsigned int
+test5 (vector unsigned int x, vector bool int y)
+{
+  return vec_add (x, y);
+}
+
+vector unsigned int
+test6 (vector unsigned int x, vector unsigned int y)
+{
+  return vec_add (x, y);
+}
+
+/* { dg-final { scan-assembler-times "vadduwm" 6 } } */
Index: gcc/testsuite/gcc.target/powerpc/fold-vec-add-4.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/fold-vec-add-4.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/fold-vec-add-4.c	(working copy)
@@ -0,0 +1,46 @@
+/* Verify that overloaded built-ins for vec_add with long long
+   inputs produce the right results.  */
+
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-additional-options "-std=gnu11" } */
+
+#include <altivec.h>
+
+vector signed long long
+test1 (vector bool long long x, vector signed long long y)
+{
+  return vec_add (x, y);
+}
+
+vector signed long long
+test2 (vector signed long long x, vector bool long long y)
+{
+  return vec_add (x, y);
+}
+
+vector signed long long
+test3 (vector signed long long x, vector signed long long y)
+{
+  return vec_add (x, y);
+}
+
+vector unsigned long long
+test4 (vector bool long long x, vector unsigned long long y)
+{
+  return vec_add (x, y);
+}
+
+vector unsigned long long
+test5 (vector unsigned long long x, vector bool long long y)
+{
+  return vec_add (x, y);
+}
+
+vector unsigned long long
+test6 (vector unsigned long long x, vector unsigned long long y)
+{
+  return vec_add (x, y);
+}
+
+/* { dg-final { scan-assembler-times "vaddudm" 6 } } */
Index: gcc/testsuite/gcc.target/powerpc/fold-vec-add-5.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/fold-vec-add-5.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/fold-vec-add-5.c	(working copy)
@@ -0,0 +1,16 @@
+/* Verify that overloaded built-ins for vec_add with float
+   inputs produce the right results.  */
+
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_altivec_ok } */
+/* { dg-additional-options "-std=gnu11 -mno-vsx" } */
+
+#include <altivec.h>
+
+vector float
+test1 (vector float x, vector float y)
+{
+  return vec_add (x, y);
+}
+
+/* { dg-final { scan-assembler-times "vaddfp" 1 } } */
Index: gcc/testsuite/gcc.target/powerpc/fold-vec-add-6.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/fold-vec-add-6.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/fold-vec-add-6.c	(working copy)
@@ -0,0 +1,23 @@
+/* Verify that overloaded built-ins for vec_add with float and
+   double inputs for VSX produce the right results.  */
+
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-additional-options "-std=gnu11" } */
+
+#include <altivec.h>
+
+vector float
+test1 (vector float x, vector float y)
+{
+  return vec_add (x, y);
+}
+
+vector double
+test2 (vector double x, vector double y)
+{
+  return vec_add (x, y);
+}
+
+/* { dg-final { scan-assembler-times "xvaddsp" 1 } } */
+/* { dg-final { scan-assembler-times "xvadddp" 1 } } */
Index: gcc/testsuite/gcc.target/powerpc/fold-vec-add-7.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/fold-vec-add-7.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/fold-vec-add-7.c	(working copy)
@@ -0,0 +1,22 @@
+/* Verify that overloaded built-ins for vec_add with __int128
+   inputs produce the right results.  */
+
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-additional-options "-std=gnu11 -Wno-pedantic" } */
+
+#include <altivec.h>
+
+vector signed __int128
+test1 (vector signed __int128 x, vector signed __int128 y)
+{
+  return vec_add (x, y);
+}
+
+vector unsigned __int128
+test2 (vector unsigned __int128 x, vector unsigned __int128 y)
+{
+  return vec_add (x, y);
+}
+
+/* { dg-final { scan-assembler-times "vadduqm" 2 } } */