From patchwork Mon Oct 31 22:28:42 2016
From: Bill Schmidt
Subject: [PATCH, RFC, rs6000] Add overloaded built-in function support to
 altivec.h, and re-implement vec_add
To: GCC Patches
Cc: Segher Boessenkool, David Edelsohn, willschm@vnet.ibm.com,
 Michael Meissner
Date: Mon, 31 Oct 2016 17:28:42 -0500
Message-Id: <4fb8f7f2-ff17-6416-3869-a8576c245dde@linux.vnet.ibm.com>
Hi,

The PowerPC back end loses performance on vector intrinsics because currently all of them are treated as calls throughout the middle-end phases and are only expanded when they reach RTL.  Our version of altivec.h currently defines the public names of overloaded functions (like vec_add) to be #defines for hidden functions (like __builtin_vec_add), which are recognized in the parser as requiring special back-end support.  Tables in rs6000-c.c handle dispatch of the overloaded functions to specific function calls appropriate to the argument types.

The Clang version of altivec.h, by contrast, creates static inlines for each overloaded function variant, relying on a special __attribute__((overloadable)) construct to do the dispatch in the parser itself.  This allows vec_add to be translated immediately into type-specific addition during parsing, so that the resulting expressions are subject to all subsequent optimization.  We have opened a PR suggesting that this attribute be supported in GCC as well (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71199), but so far there hasn't been any success in that regard.

While waiting/hoping for the attribute to be implemented, though, we can use existing mechanisms to create a poor man's version of overloading dispatch.  This patch is a proof of concept for how this can be done, and provides support for early expansion of the overloaded vec_add intrinsic.  If we get this working, then we can gradually add more intrinsics over time.

The dispatch mechanism is provided in a new header file, overload.h, which is included in altivec.h.  This is done because the guts of the dispatch mechanism are pretty ugly to look at.  Overloading is done with a chain of calls to __builtin_choose_expr and __builtin_types_compatible_p.
Currently I envision providing a separate dispatch macro for each combination of the number of arguments and the number of variants to be distinguished.  I also provide a separate "decl" macro for each number of arguments, used to create the function decls for each static inline function.  The vec_add intrinsic takes two input arguments and has 28 variants, so it requires the definition of OVERLOAD_2ARG_28VAR and OVERLOAD_2ARG_DECL in overload.h.

These macros are then instantiated in altivec.h.  The dispatch macro for an overloaded intrinsic is instantiated once, and the decl macro is instantiated once for each variant, along with the associated inline function body.

The dispatch macro may need to vary depending on the supported processor features.  In the vec_add example, we have some variants that support the "vector double" and "vector long long" data types.  These only exist when VSX code generation is supported, so a dispatch table conditioned on __VSX__ includes these, while a separate one without VSX support does not.  Similarly, __POWER8_VECTOR__ must be defined if we are to support "vector signed/unsigned __int128".  Because we use a numbering scheme that needs to be kept consistent, this requires three versions of the dispatch table, where the more restrictive versions replace the unimplemented entries with redundant entries.

Note that if and when we get an overloadable attribute in GCC, the machinery in overload.h will become obsolete: we will remove the dispatch instantiations, and we will replace the decl instantiations with plain decls carrying the overloadable attribute.

There are several complications on top of the basic design:

* When compiling for C++, the dispatch mechanism is not available, and indeed is not necessary.  Thus for C++ we skip the dispatch mechanism, and change the definition of OVERLOAD_2ARG_DECL to use standard function overloading.
* Compiling with -ansi or -std=c11 or the like means the dispatch mechanism is unavailable even for C, since GNU extensions are disallowed.  Regrettably, this means that we can't get rid of the existing late-expansion methods altogether; I don't see any way to avoid this.  Note that this would be the case even if we had __attribute__ ((overloadable)), since that would also be a GNU extension.  Despite the mess, I think the performance improvements for non-strict-ANSI code make the dual maintenance worthwhile.

* "#pragma GCC target" is going to cause a lot of trouble.  With the patch in its present state, we fail gcc.target/powerpc/ppc-target-4.c, which tests the use of the "vsx", "altivec,no-vsx", and "no-altivec" target options, and happens to use vec_add (float, float) as a testbed.  The test fails because altivec.h is #included after the #pragma GCC target("vsx"), which allows the interfaces involving vector long long and vector double to be produced.  However, when the options are changed to "altivec,no-vsx", the subsequent invocation of vec_add expands to a dispatch sequence that includes vector long long, leading to a compile-time error.

I can only think of two ways to deal with this, neither of which is attractive.

The first idea would be to make altivec.h capable of being included more than once.  This essentially requires an #undef before each #define.  Once this is done, usage of #pragma GCC target would be supported provided that altivec.h is re-included after each such #pragma, so that the dispatch macros would be re-evaluated in the new context.  The problem with this is that existing code not conforming to this requirement would fail to compile, so this is probably off the table.

The other way would be to require a specific option on the command line to use the new dispatch mechanism.  When the option is present, we would predefine a macro such as __PPC_FAST_VECTOR__, which would then gate the usage in altivec.h and overload.h.
Use of #pragma GCC target to change the availability of Altivec, VMX, P8-vector, etc. would also be disallowed when the option is present.  This has the advantage of always generating correct code, at the cost of requiring a special option before anyone can leverage the benefits of early vector expansion.  That's unfortunate, but I suspect it's the best we can do.

The current patch is nearly complete, but the #pragma GCC target issue is not yet resolved.  I'd like to get opinions on the overall approach of the patch, and on whether you agree with my assessment of the #pragma issue, before taking the patch forward.

Thanks for reading this far, and thanks in advance for your opinions.  We can get some big performance improvements here eventually, but the road is a bit rocky.

Thanks,
Bill


[gcc]

2016-10-31  Bill Schmidt

	* config/rs6000/altivec.h: Add new include of overload.h; when not
	compiling for C++ or strict ANSI, add new #defines for vec_add in
	terms of the OVERLOAD_2ARG_28VAR and OVERLOAD_2ARG_DECL macros;
	when compiling for C++ but not for strict ANSI, use just the
	OVERLOAD_2ARG_DECL macros; when not compiling for strict ANSI,
	remove the #define of vec_add in terms of __builtin_vec_add.
	* config/rs6000/overload.h: New file, with #defines of
	OVERLOAD_2ARG_28VAR when not compiling for C++ or strict ANSI,
	and two different flavors of OVERLOAD_2ARG_DECL (C++ and
	otherwise) when not compiling for strict ANSI.
	* config.gcc: For each triple that includes altivec.h in
	extra_headers, also add overload.h.

[gcc/testsuite]

2016-10-31  Bill Schmidt

	* gcc.target/powerpc/overload-add-1.c: New.
	* gcc.target/powerpc/overload-add-2.c: New.
	* gcc.target/powerpc/overload-add-3.c: New.
	* gcc.target/powerpc/overload-add-4.c: New.
	* gcc.target/powerpc/overload-add-5.c: New.
	* gcc.target/powerpc/overload-add-6.c: New.
	* gcc.target/powerpc/overload-add-7.c: New.
Index: gcc/config/rs6000/altivec.h =================================================================== --- gcc/config/rs6000/altivec.h (revision 241624) +++ gcc/config/rs6000/altivec.h (working copy) @@ -53,6 +53,353 @@ #define __CR6_LT 2 #define __CR6_LT_REV 3 +/* Machinery to support overloaded functions in C. */ +#include "overload.h" + +/* Overloaded function declarations. Please maintain these in + alphabetical order. */ + +/* Since __builtin_choose_expr and __builtin_types_compatible_p + aren't permitted in C++, we'll need to use standard overloading + for those. Disable this mechanism for C++. GNU extensions are + also unavailable for -ansi, -std=c11, etc. */ +#ifndef __STRICT_ANSI__ +#ifndef __cplusplus + +#ifdef __POWER8_VECTOR__ +#define vec_add(a1, a2) \ + OVERLOAD_2ARG_28VAR(vec_add, a1, a2, \ + 1, vector bool char, vector signed char, \ + 2, vector signed char, vector bool char, \ + 3, vector signed char, vector signed char, \ + 4, vector bool char, vector unsigned char, \ + 5, vector unsigned char, vector bool char, \ + 6, vector unsigned char, vector unsigned char, \ + 7, vector bool short, vector signed short, \ + 8, vector signed short, vector bool short, \ + 9, vector signed short, vector signed short, \ + 10, vector bool short, vector unsigned short, \ + 11, vector unsigned short, vector bool short, \ + 12, vector unsigned short, vector unsigned short, \ + 13, vector bool int, vector signed int, \ + 14, vector signed int, vector bool int, \ + 15, vector signed int, vector signed int, \ + 16, vector bool int, vector unsigned int, \ + 17, vector unsigned int, vector bool int, \ + 18, vector unsigned int, vector unsigned int, \ + 19, vector bool long long, vector signed long long, \ + 20, vector signed long long, vector bool long long, \ + 21, vector signed long long, vector signed long long, \ + 22, vector bool long long, vector unsigned long long, \ + 23, vector unsigned long long, vector bool long long, \ + 24, vector unsigned long long, vector 
unsigned long long, \ + 25, vector float, vector float, \ + 26, vector double, vector double, \ + 27, vector signed __int128, vector signed __int128, \ + 28, vector unsigned __int128, vector unsigned __int128) +#elif defined __VSX__ +#define vec_add(a1, a2) \ + OVERLOAD_2ARG_28VAR(vec_add, a1, a2, \ + 1, vector bool char, vector signed char, \ + 2, vector signed char, vector bool char, \ + 3, vector signed char, vector signed char, \ + 4, vector bool char, vector unsigned char, \ + 5, vector unsigned char, vector bool char, \ + 6, vector unsigned char, vector unsigned char, \ + 7, vector bool short, vector signed short, \ + 8, vector signed short, vector bool short, \ + 9, vector signed short, vector signed short, \ + 10, vector bool short, vector unsigned short, \ + 11, vector unsigned short, vector bool short, \ + 12, vector unsigned short, vector unsigned short, \ + 13, vector bool int, vector signed int, \ + 14, vector signed int, vector bool int, \ + 15, vector signed int, vector signed int, \ + 16, vector bool int, vector unsigned int, \ + 17, vector unsigned int, vector bool int, \ + 18, vector unsigned int, vector unsigned int, \ + 19, vector bool long long, vector signed long long, \ + 20, vector signed long long, vector bool long long, \ + 21, vector signed long long, vector signed long long, \ + 22, vector bool long long, vector unsigned long long, \ + 23, vector unsigned long long, vector bool long long, \ + 24, vector unsigned long long, vector unsigned long long, \ + 25, vector float, vector float, \ + 26, vector double, vector double, \ + 26, vector double, vector double, \ + 26, vector double, vector double) +#else +#define vec_add(a1, a2) \ + OVERLOAD_2ARG_28VAR(vec_add, a1, a2, \ + 1, vector bool char, vector signed char, \ + 2, vector signed char, vector bool char, \ + 3, vector signed char, vector signed char, \ + 4, vector bool char, vector unsigned char, \ + 5, vector unsigned char, vector bool char, \ + 6, vector unsigned char, vector 
unsigned char, \ + 7, vector bool short, vector signed short, \ + 8, vector signed short, vector bool short, \ + 9, vector signed short, vector signed short, \ + 10, vector bool short, vector unsigned short, \ + 11, vector unsigned short, vector bool short, \ + 12, vector unsigned short, vector unsigned short, \ + 13, vector bool int, vector signed int, \ + 14, vector signed int, vector bool int, \ + 15, vector signed int, vector signed int, \ + 16, vector bool int, vector unsigned int, \ + 17, vector unsigned int, vector bool int, \ + 18, vector unsigned int, vector unsigned int, \ + 18, vector unsigned int, vector unsigned int, \ + 18, vector unsigned int, vector unsigned int, \ + 18, vector unsigned int, vector unsigned int, \ + 18, vector unsigned int, vector unsigned int, \ + 18, vector unsigned int, vector unsigned int, \ + 18, vector unsigned int, vector unsigned int, \ + 25, vector float, vector float, \ + 25, vector float, vector float, \ + 25, vector float, vector float, \ + 25, vector float, vector float) +#endif /* __POWER8_VECTOR__ #elif __VSX__ */ + +#endif /* !__cplusplus */ + +OVERLOAD_2ARG_DECL(vec_add, 1, \ + vector signed char, \ + vector bool char, a1, \ + vector signed char, a2) +{ + return (vector signed char)a1 + a2; +} + +OVERLOAD_2ARG_DECL(vec_add, 2, \ + vector signed char, \ + vector signed char, a1, \ + vector bool char, a2) +{ + return a1 + (vector signed char)a2; +} + +OVERLOAD_2ARG_DECL(vec_add, 3, \ + vector signed char, \ + vector signed char, a1, \ + vector signed char, a2) +{ + return a1 + a2; +} + +OVERLOAD_2ARG_DECL(vec_add, 4, \ + vector unsigned char, \ + vector bool char, a1, \ + vector unsigned char, a2) +{ + return (vector unsigned char)a1 + a2; +} + +OVERLOAD_2ARG_DECL(vec_add, 5, \ + vector unsigned char, \ + vector unsigned char, a1, \ + vector bool char, a2) +{ + return a1 + (vector unsigned char)a2; +} + +OVERLOAD_2ARG_DECL(vec_add, 6, \ + vector unsigned char, \ + vector unsigned char, a1, \ + vector unsigned char, 
a2) +{ + return a1 + a2; +} + +OVERLOAD_2ARG_DECL(vec_add, 7, \ + vector signed short, \ + vector bool short, a1, \ + vector signed short, a2) +{ + return (vector signed short)a1 + a2; +} + +OVERLOAD_2ARG_DECL(vec_add, 8, \ + vector signed short, \ + vector signed short, a1, \ + vector bool short, a2) +{ + return a1 + (vector signed short)a2; +} + +OVERLOAD_2ARG_DECL(vec_add, 9, \ + vector signed short, \ + vector signed short, a1, \ + vector signed short, a2) +{ + return a1 + a2; +} + +OVERLOAD_2ARG_DECL(vec_add, 10, \ + vector unsigned short, \ + vector bool short, a1, \ + vector unsigned short, a2) +{ + return (vector unsigned short)a1 + a2; +} + +OVERLOAD_2ARG_DECL(vec_add, 11, \ + vector unsigned short, \ + vector unsigned short, a1, \ + vector bool short, a2) +{ + return a1 + (vector unsigned short)a2; +} + +OVERLOAD_2ARG_DECL(vec_add, 12, \ + vector unsigned short, \ + vector unsigned short, a1, \ + vector unsigned short, a2) +{ + return a1 + a2; +} + +OVERLOAD_2ARG_DECL(vec_add, 13, \ + vector signed int, \ + vector bool int, a1, \ + vector signed int, a2) +{ + return (vector signed int)a1 + a2; +} + +OVERLOAD_2ARG_DECL(vec_add, 14, \ + vector signed int, \ + vector signed int, a1, \ + vector bool int, a2) +{ + return a1 + (vector signed int)a2; +} + +OVERLOAD_2ARG_DECL(vec_add, 15, \ + vector signed int, \ + vector signed int, a1, \ + vector signed int, a2) +{ + return a1 + a2; +} + +OVERLOAD_2ARG_DECL(vec_add, 16, \ + vector unsigned int, \ + vector bool int, a1, \ + vector unsigned int, a2) +{ + return (vector unsigned int)a1 + a2; +} + +OVERLOAD_2ARG_DECL(vec_add, 17, \ + vector unsigned int, \ + vector unsigned int, a1, \ + vector bool int, a2) +{ + return a1 + (vector unsigned int)a2; +} + +OVERLOAD_2ARG_DECL(vec_add, 18, \ + vector unsigned int, \ + vector unsigned int, a1, \ + vector unsigned int, a2) +{ + return a1 + a2; +} + +#ifdef __VSX__ +OVERLOAD_2ARG_DECL(vec_add, 19, \ + vector signed long long, \ + vector bool long long, a1, \ + vector 
signed long long, a2) +{ + return (vector signed long long)a1 + a2; +} + +OVERLOAD_2ARG_DECL(vec_add, 20, \ + vector signed long long, \ + vector signed long long, a1, \ + vector bool long long, a2) +{ + return a1 + (vector signed long long)a2; +} + +OVERLOAD_2ARG_DECL(vec_add, 21, \ + vector signed long long, \ + vector signed long long, a1, \ + vector signed long long, a2) +{ + return a1 + a2; +} + +OVERLOAD_2ARG_DECL(vec_add, 22, \ + vector unsigned long long, \ + vector bool long long, a1, \ + vector unsigned long long, a2) +{ + return (vector unsigned long long)a1 + a2; +} + +OVERLOAD_2ARG_DECL(vec_add, 23, \ + vector unsigned long long, \ + vector unsigned long long, a1, \ + vector bool long long, a2) +{ + return a1 + (vector unsigned long long)a2; +} + +OVERLOAD_2ARG_DECL(vec_add, 24, \ + vector unsigned long long, \ + vector unsigned long long, a1, \ + vector unsigned long long, a2) +{ + return a1 + a2; +} +#endif /* __VSX__ */ + +OVERLOAD_2ARG_DECL(vec_add, 25, \ + vector float, \ + vector float, a1, \ + vector float, a2) +{ + return a1 + a2; +} + +#ifdef __VSX__ +OVERLOAD_2ARG_DECL(vec_add, 26, \ + vector double, \ + vector double, a1, \ + vector double, a2) +{ + return a1 + a2; +} +#endif /* __VSX__ */ + +/* Currently we do not early-expand vec_add for vector __int128. This + is because vector lowering in the middle end casts V1TImode to TImode, + which is probably appropriate since we have very little support for + V1TImode arithmetic. Late expansion ensures we get the single + instruction add. */ +#ifdef __POWER8_VECTOR__ +OVERLOAD_2ARG_DECL(vec_add, 27, \ + vector signed __int128, \ + vector signed __int128, a1, \ + vector signed __int128, a2) +{ + return __builtin_vec_add (a1, a2); +} + +OVERLOAD_2ARG_DECL(vec_add, 28, \ + vector unsigned __int128, \ + vector unsigned __int128, a1, \ + vector unsigned __int128, a2) +{ + return __builtin_vec_add (a1, a2); +} +#endif /* __POWER8_VECTOR__ */ + +#endif /* !__STRICT_ANSI__ */ + /* Synonyms. 
*/ #define vec_vaddcuw vec_addc #define vec_vand vec_and @@ -190,7 +537,9 @@ #define vec_vupklsb __builtin_vec_vupklsb #define vec_abs __builtin_vec_abs #define vec_abss __builtin_vec_abss +#ifdef __STRICT_ANSI__ #define vec_add __builtin_vec_add +#endif #define vec_adds __builtin_vec_adds #define vec_and __builtin_vec_and #define vec_andc __builtin_vec_andc Index: gcc/config/rs6000/overload.h =================================================================== --- gcc/config/rs6000/overload.h (revision 0) +++ gcc/config/rs6000/overload.h (working copy) @@ -0,0 +1,206 @@ +/* Overloaded Built-In Function Support + Copyright (C) 2016 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published + by the Free Software Foundation; either version 3, or (at your + option) any later version. + + GCC is distributed in the hope that it will be useful, but WITHOUT + ANY WARRANTY; without even the implied warranty of MERCHANTABILITY + or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public + License for more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + . */ + +#ifndef _OVERLOAD_H +#define _OVERLOAD_H 1 + +/* Since __builtin_choose_expr and __builtin_types_compatible_p + aren't permitted in C++, we'll need to use standard overloading + for those. Disable this mechanism for C++. GNU extensions are + also unavailable for -ansi, -std=c11, etc. 
*/ +#if !defined __cplusplus && !defined __STRICT_ANSI__ + +/* Macros named OVERLOAD_ARG_VAR provide a dispatch mechanism + for built-in functions taking N input arguments and M overloaded + variants. Note that indentation conventions for nested calls to + __builtin_choose_expr are violated for practicality. Please + maintain these macros in increasing order by N and M for ease + of reuse. */ + +#define OVERLOAD_2ARG_28VAR(NAME, ARG1, ARG2, \ + VAR1_ID, VAR1_TYPE1, VAR1_TYPE2, \ + VAR2_ID, VAR2_TYPE1, VAR2_TYPE2, \ + VAR3_ID, VAR3_TYPE1, VAR3_TYPE2, \ + VAR4_ID, VAR4_TYPE1, VAR4_TYPE2, \ + VAR5_ID, VAR5_TYPE1, VAR5_TYPE2, \ + VAR6_ID, VAR6_TYPE1, VAR6_TYPE2, \ + VAR7_ID, VAR7_TYPE1, VAR7_TYPE2, \ + VAR8_ID, VAR8_TYPE1, VAR8_TYPE2, \ + VAR9_ID, VAR9_TYPE1, VAR9_TYPE2, \ + VAR10_ID, VAR10_TYPE1, VAR10_TYPE2, \ + VAR11_ID, VAR11_TYPE1, VAR11_TYPE2, \ + VAR12_ID, VAR12_TYPE1, VAR12_TYPE2, \ + VAR13_ID, VAR13_TYPE1, VAR13_TYPE2, \ + VAR14_ID, VAR14_TYPE1, VAR14_TYPE2, \ + VAR15_ID, VAR15_TYPE1, VAR15_TYPE2, \ + VAR16_ID, VAR16_TYPE1, VAR16_TYPE2, \ + VAR17_ID, VAR17_TYPE1, VAR17_TYPE2, \ + VAR18_ID, VAR18_TYPE1, VAR18_TYPE2, \ + VAR19_ID, VAR19_TYPE1, VAR19_TYPE2, \ + VAR20_ID, VAR20_TYPE1, VAR20_TYPE2, \ + VAR21_ID, VAR21_TYPE1, VAR21_TYPE2, \ + VAR22_ID, VAR22_TYPE1, VAR22_TYPE2, \ + VAR23_ID, VAR23_TYPE1, VAR23_TYPE2, \ + VAR24_ID, VAR24_TYPE1, VAR24_TYPE2, \ + VAR25_ID, VAR25_TYPE1, VAR25_TYPE2, \ + VAR26_ID, VAR26_TYPE1, VAR26_TYPE2, \ + VAR27_ID, VAR27_TYPE1, VAR27_TYPE2, \ + VAR28_ID, VAR28_TYPE1, VAR28_TYPE2) \ + __builtin_choose_expr ( \ + __builtin_types_compatible_p (__typeof__ (ARG1), VAR1_TYPE1) \ + && __builtin_types_compatible_p (__typeof__ (ARG2), VAR1_TYPE2), \ + _##NAME##_##VAR1_ID ((VAR1_TYPE1)ARG1, (VAR1_TYPE2)ARG2), \ + __builtin_choose_expr ( \ + __builtin_types_compatible_p (__typeof__ (ARG1), VAR2_TYPE1) \ + && __builtin_types_compatible_p (__typeof__ (ARG2), VAR2_TYPE2), \ + _##NAME##_##VAR2_ID ((VAR2_TYPE1)ARG1, (VAR2_TYPE2)ARG2), \ + 
__builtin_choose_expr ( \ + __builtin_types_compatible_p (__typeof__ (ARG1), VAR3_TYPE1) \ + && __builtin_types_compatible_p (__typeof__ (ARG2), VAR3_TYPE2), \ + _##NAME##_##VAR3_ID ((VAR3_TYPE1)ARG1, (VAR3_TYPE2)ARG2), \ + __builtin_choose_expr ( \ + __builtin_types_compatible_p (__typeof__ (ARG1), VAR4_TYPE1) \ + && __builtin_types_compatible_p (__typeof__ (ARG2), VAR4_TYPE2), \ + _##NAME##_##VAR4_ID ((VAR4_TYPE1)ARG1, (VAR4_TYPE2)ARG2), \ + __builtin_choose_expr ( \ + __builtin_types_compatible_p (__typeof__ (ARG1), VAR5_TYPE1) \ + && __builtin_types_compatible_p (__typeof__ (ARG2), VAR5_TYPE2), \ + _##NAME##_##VAR5_ID ((VAR5_TYPE1)ARG1, (VAR5_TYPE2)ARG2), \ + __builtin_choose_expr ( \ + __builtin_types_compatible_p (__typeof__ (ARG1), VAR6_TYPE1) \ + && __builtin_types_compatible_p (__typeof__ (ARG2), VAR6_TYPE2), \ + _##NAME##_##VAR6_ID ((VAR6_TYPE1)ARG1, (VAR6_TYPE2)ARG2), \ + __builtin_choose_expr ( \ + __builtin_types_compatible_p (__typeof__ (ARG1), VAR7_TYPE1) \ + && __builtin_types_compatible_p (__typeof__ (ARG2), VAR7_TYPE2), \ + _##NAME##_##VAR7_ID ((VAR7_TYPE1)ARG1, (VAR7_TYPE2)ARG2), \ + __builtin_choose_expr ( \ + __builtin_types_compatible_p (__typeof__ (ARG1), VAR8_TYPE1) \ + && __builtin_types_compatible_p (__typeof__ (ARG2), VAR8_TYPE2), \ + _##NAME##_##VAR8_ID ((VAR8_TYPE1)ARG1, (VAR8_TYPE2)ARG2), \ + __builtin_choose_expr ( \ + __builtin_types_compatible_p (__typeof__ (ARG1), VAR9_TYPE1) \ + && __builtin_types_compatible_p (__typeof__ (ARG2), VAR9_TYPE2), \ + _##NAME##_##VAR9_ID ((VAR9_TYPE1)ARG1, (VAR9_TYPE2)ARG2), \ + __builtin_choose_expr ( \ + __builtin_types_compatible_p (__typeof__ (ARG1), VAR10_TYPE1) \ + && __builtin_types_compatible_p (__typeof__ (ARG2), VAR10_TYPE2), \ + _##NAME##_##VAR10_ID ((VAR10_TYPE1)ARG1, (VAR10_TYPE2)ARG2), \ + __builtin_choose_expr ( \ + __builtin_types_compatible_p (__typeof__ (ARG1), VAR11_TYPE1) \ + && __builtin_types_compatible_p (__typeof__ (ARG2), VAR11_TYPE2), \ + _##NAME##_##VAR11_ID 
((VAR11_TYPE1)ARG1, (VAR11_TYPE2)ARG2), \ + __builtin_choose_expr ( \ + __builtin_types_compatible_p (__typeof__ (ARG1), VAR12_TYPE1) \ + && __builtin_types_compatible_p (__typeof__ (ARG2), VAR12_TYPE2), \ + _##NAME##_##VAR12_ID ((VAR12_TYPE1)ARG1, (VAR12_TYPE2)ARG2), \ + __builtin_choose_expr ( \ + __builtin_types_compatible_p (__typeof__ (ARG1), VAR13_TYPE1) \ + && __builtin_types_compatible_p (__typeof__ (ARG2), VAR13_TYPE2), \ + _##NAME##_##VAR13_ID ((VAR13_TYPE1)ARG1, (VAR13_TYPE2)ARG2), \ + __builtin_choose_expr ( \ + __builtin_types_compatible_p (__typeof__ (ARG1), VAR14_TYPE1) \ + && __builtin_types_compatible_p (__typeof__ (ARG2), VAR14_TYPE2), \ + _##NAME##_##VAR14_ID ((VAR14_TYPE1)ARG1, (VAR14_TYPE2)ARG2), \ + __builtin_choose_expr ( \ + __builtin_types_compatible_p (__typeof__ (ARG1), VAR15_TYPE1) \ + && __builtin_types_compatible_p (__typeof__ (ARG2), VAR15_TYPE2), \ + _##NAME##_##VAR15_ID ((VAR15_TYPE1)ARG1, (VAR15_TYPE2)ARG2), \ + __builtin_choose_expr ( \ + __builtin_types_compatible_p (__typeof__ (ARG1), VAR16_TYPE1) \ + && __builtin_types_compatible_p (__typeof__ (ARG2), VAR16_TYPE2), \ + _##NAME##_##VAR16_ID ((VAR16_TYPE1)ARG1, (VAR16_TYPE2)ARG2), \ + __builtin_choose_expr ( \ + __builtin_types_compatible_p (__typeof__ (ARG1), VAR17_TYPE1) \ + && __builtin_types_compatible_p (__typeof__ (ARG2), VAR17_TYPE2), \ + _##NAME##_##VAR17_ID ((VAR17_TYPE1)ARG1, (VAR17_TYPE2)ARG2), \ + __builtin_choose_expr ( \ + __builtin_types_compatible_p (__typeof__ (ARG1), VAR18_TYPE1) \ + && __builtin_types_compatible_p (__typeof__ (ARG2), VAR18_TYPE2), \ + _##NAME##_##VAR18_ID ((VAR18_TYPE1)ARG1, (VAR18_TYPE2)ARG2), \ + __builtin_choose_expr ( \ + __builtin_types_compatible_p (__typeof__ (ARG1), VAR19_TYPE1) \ + && __builtin_types_compatible_p (__typeof__ (ARG2), VAR19_TYPE2), \ + _##NAME##_##VAR19_ID ((VAR19_TYPE1)ARG1, (VAR19_TYPE2)ARG2), \ + __builtin_choose_expr ( \ + __builtin_types_compatible_p (__typeof__ (ARG1), VAR20_TYPE1) \ + && 
__builtin_types_compatible_p (__typeof__ (ARG2), VAR20_TYPE2), \ + _##NAME##_##VAR20_ID ((VAR20_TYPE1)ARG1, (VAR20_TYPE2)ARG2), \ + __builtin_choose_expr ( \ + __builtin_types_compatible_p (__typeof__ (ARG1), VAR21_TYPE1) \ + && __builtin_types_compatible_p (__typeof__ (ARG2), VAR21_TYPE2), \ + _##NAME##_##VAR21_ID ((VAR21_TYPE1)ARG1, (VAR21_TYPE2)ARG2), \ + __builtin_choose_expr ( \ + __builtin_types_compatible_p (__typeof__ (ARG1), VAR22_TYPE1) \ + && __builtin_types_compatible_p (__typeof__ (ARG2), VAR22_TYPE2), \ + _##NAME##_##VAR22_ID ((VAR22_TYPE1)ARG1, (VAR22_TYPE2)ARG2), \ + __builtin_choose_expr ( \ + __builtin_types_compatible_p (__typeof__ (ARG1), VAR23_TYPE1) \ + && __builtin_types_compatible_p (__typeof__ (ARG2), VAR23_TYPE2), \ + _##NAME##_##VAR23_ID ((VAR23_TYPE1)ARG1, (VAR23_TYPE2)ARG2), \ + __builtin_choose_expr ( \ + __builtin_types_compatible_p (__typeof__ (ARG1), VAR24_TYPE1) \ + && __builtin_types_compatible_p (__typeof__ (ARG2), VAR24_TYPE2), \ + _##NAME##_##VAR24_ID ((VAR24_TYPE1)ARG1, (VAR24_TYPE2)ARG2), \ + __builtin_choose_expr ( \ + __builtin_types_compatible_p (__typeof__ (ARG1), VAR25_TYPE1) \ + && __builtin_types_compatible_p (__typeof__ (ARG2), VAR25_TYPE2), \ + _##NAME##_##VAR25_ID ((VAR25_TYPE1)ARG1, (VAR25_TYPE2)ARG2), \ + __builtin_choose_expr ( \ + __builtin_types_compatible_p (__typeof__ (ARG1), VAR26_TYPE1) \ + && __builtin_types_compatible_p (__typeof__ (ARG2), VAR26_TYPE2), \ + _##NAME##_##VAR26_ID ((VAR26_TYPE1)ARG1, (VAR26_TYPE2)ARG2), \ + __builtin_choose_expr ( \ + __builtin_types_compatible_p (__typeof__ (ARG1), VAR27_TYPE1) \ + && __builtin_types_compatible_p (__typeof__ (ARG2), VAR27_TYPE2), \ + _##NAME##_##VAR27_ID ((VAR27_TYPE1)ARG1, (VAR27_TYPE2)ARG2), \ + __builtin_choose_expr ( \ + __builtin_types_compatible_p (__typeof__ (ARG1), VAR28_TYPE1) \ + && __builtin_types_compatible_p (__typeof__ (ARG2), VAR28_TYPE2), \ + _##NAME##_##VAR28_ID ((VAR28_TYPE1)ARG1, (VAR28_TYPE2)ARG2), \ + (void)0)))))))))))))))))))))))))))) 
+
+/* Macros named OVERLOAD_<N>ARG_DECL provide a declaration for one
+   variant of an overloaded built-in function having N arguments.
+   Please maintain these macros in increasing order by N for ease
+   of reuse.  */
+
+#define OVERLOAD_2ARG_DECL(NAME, VAR_ID, TYPE0, \
+                           TYPE1, ARG1, \
+                           TYPE2, ARG2) \
+static __inline__ TYPE0 __attribute__ ((__always_inline__)) \
+_##NAME##_##VAR_ID (TYPE1 ARG1, TYPE2 ARG2)
+
+/* With C++, we can just use function overloading.  */
+#elif defined __cplusplus && !defined __STRICT_ANSI__
+
+#define OVERLOAD_2ARG_DECL(NAME, VAR_ID, TYPE0, \
+                           TYPE1, ARG1, \
+                           TYPE2, ARG2) \
+static __inline__ TYPE0 __attribute__ ((__always_inline__)) \
+NAME (TYPE1 ARG1, TYPE2 ARG2)
+
+#endif /* !__cplusplus && !__STRICT_ANSI__ */
+
+#endif /* _OVERLOAD_H */
Index: gcc/config.gcc
===================================================================
--- gcc/config.gcc	(revision 241624)
+++ gcc/config.gcc	(working copy)
@@ -440,7 +440,7 @@ nvptx-*-*)
 	;;
 powerpc*-*-*)
 	cpu_type=rs6000
-	extra_headers="ppc-asm.h altivec.h spe.h ppu_intrinsics.h paired.h spu2vmx.h vec_types.h si2vmx.h htmintrin.h htmxlintrin.h"
+	extra_headers="ppc-asm.h altivec.h spe.h ppu_intrinsics.h paired.h spu2vmx.h vec_types.h si2vmx.h htmintrin.h htmxlintrin.h overload.h"
 	case x$with_cpu in
 	    xpowerpc64|xdefault64|x6[23]0|x970|xG5|xpower[3456789]|xpower6x|xrs64a|xcell|xa2|xe500mc64|xe5500|xe6500)
 		cpu_is_64bit=yes
@@ -2279,13 +2279,13 @@ powerpc-*-darwin*)
 	esac
 	tmake_file="${tmake_file} t-slibgcc"
-	extra_headers=altivec.h
+	extra_headers="altivec.h overload.h"
 	;;
 powerpc64-*-darwin*)
 	extra_options="${extra_options} ${cpu_type}/darwin.opt"
 	tmake_file="${tmake_file} ${cpu_type}/t-darwin64 t-slibgcc"
 	tm_file="${tm_file} ${cpu_type}/darwin8.h ${cpu_type}/darwin64.h"
-	extra_headers=altivec.h
+	extra_headers="altivec.h overload.h"
 	;;
 powerpc*-*-freebsd*)
 	tm_file="${tm_file} dbxelf.h elfos.h ${fbsd_tm_file} rs6000/sysv4.h"
@@ -2512,7 +2512,7 @@ rs6000-ibm-aix5.3.* | powerpc-ibm-aix5.3.*)
 	use_collect2=yes
 	thread_file='aix'
 	use_gcc_stdint=wrap
-	extra_headers=altivec.h
+	extra_headers="altivec.h overload.h"
 	;;
 rs6000-ibm-aix6.* | powerpc-ibm-aix6.*)
 	tm_file="${tm_file} rs6000/aix.h rs6000/aix61.h rs6000/xcoff.h rs6000/aix-stdint.h"
@@ -2521,7 +2521,7 @@ rs6000-ibm-aix6.* | powerpc-ibm-aix6.*)
 	use_collect2=yes
 	thread_file='aix'
 	use_gcc_stdint=wrap
-	extra_headers=altivec.h
+	extra_headers="altivec.h overload.h"
 	default_use_cxa_atexit=yes
 	;;
 rs6000-ibm-aix[789].* | powerpc-ibm-aix[789].*)
@@ -2531,7 +2531,7 @@ rs6000-ibm-aix[789].* | powerpc-ibm-aix[789].*)
 	use_collect2=yes
 	thread_file='aix'
 	use_gcc_stdint=wrap
-	extra_headers=altivec.h
+	extra_headers="altivec.h overload.h"
 	default_use_cxa_atexit=yes
 	;;
 rl78-*-elf*)
Index: gcc/testsuite/gcc.target/powerpc/overload-add-1.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/overload-add-1.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/overload-add-1.c	(working copy)
@@ -0,0 +1,46 @@
+/* Verify that overloaded built-ins for vec_add with char
+   inputs produce the right results.  */
+
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_altivec_ok } */
+/* { dg-additional-options "-std=gnu11" } */
+
+#include <altivec.h>
+
+vector signed char
+test1 (vector bool char x, vector signed char y)
+{
+  return vec_add (x, y);
+}
+
+vector signed char
+test2 (vector signed char x, vector bool char y)
+{
+  return vec_add (x, y);
+}
+
+vector signed char
+test3 (vector signed char x, vector signed char y)
+{
+  return vec_add (x, y);
+}
+
+vector unsigned char
+test4 (vector bool char x, vector unsigned char y)
+{
+  return vec_add (x, y);
+}
+
+vector unsigned char
+test5 (vector unsigned char x, vector bool char y)
+{
+  return vec_add (x, y);
+}
+
+vector unsigned char
+test6 (vector unsigned char x, vector unsigned char y)
+{
+  return vec_add (x, y);
+}
+
+/* { dg-final { scan-assembler-times "vaddubm" 6 } } */
Index: gcc/testsuite/gcc.target/powerpc/overload-add-2.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/overload-add-2.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/overload-add-2.c	(working copy)
@@ -0,0 +1,46 @@
+/* Verify that overloaded built-ins for vec_add with short
+   inputs produce the right results.  */
+
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_altivec_ok } */
+/* { dg-additional-options "-std=gnu11" } */
+
+#include <altivec.h>
+
+vector signed short
+test1 (vector bool short x, vector signed short y)
+{
+  return vec_add (x, y);
+}
+
+vector signed short
+test2 (vector signed short x, vector bool short y)
+{
+  return vec_add (x, y);
+}
+
+vector signed short
+test3 (vector signed short x, vector signed short y)
+{
+  return vec_add (x, y);
+}
+
+vector unsigned short
+test4 (vector bool short x, vector unsigned short y)
+{
+  return vec_add (x, y);
+}
+
+vector unsigned short
+test5 (vector unsigned short x, vector bool short y)
+{
+  return vec_add (x, y);
+}
+
+vector unsigned short
+test6 (vector unsigned short x, vector unsigned short y)
+{
+  return vec_add (x, y);
+}
+
+/* { dg-final { scan-assembler-times "vadduhm" 6 } } */
Index: gcc/testsuite/gcc.target/powerpc/overload-add-3.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/overload-add-3.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/overload-add-3.c	(working copy)
@@ -0,0 +1,46 @@
+/* Verify that overloaded built-ins for vec_add with int
+   inputs produce the right results.  */
+
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_altivec_ok } */
+/* { dg-additional-options "-std=gnu11" } */
+
+#include <altivec.h>
+
+vector signed int
+test1 (vector bool int x, vector signed int y)
+{
+  return vec_add (x, y);
+}
+
+vector signed int
+test2 (vector signed int x, vector bool int y)
+{
+  return vec_add (x, y);
+}
+
+vector signed int
+test3 (vector signed int x, vector signed int y)
+{
+  return vec_add (x, y);
+}
+
+vector unsigned int
+test4 (vector bool int x, vector unsigned int y)
+{
+  return vec_add (x, y);
+}
+
+vector unsigned int
+test5 (vector unsigned int x, vector bool int y)
+{
+  return vec_add (x, y);
+}
+
+vector unsigned int
+test6 (vector unsigned int x, vector unsigned int y)
+{
+  return vec_add (x, y);
+}
+
+/* { dg-final { scan-assembler-times "vadduwm" 6 } } */
Index: gcc/testsuite/gcc.target/powerpc/overload-add-4.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/overload-add-4.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/overload-add-4.c	(working copy)
@@ -0,0 +1,46 @@
+/* Verify that overloaded built-ins for vec_add with long long
+   inputs produce the right results.  */
+
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-additional-options "-std=gnu11" } */
+
+#include <altivec.h>
+
+vector signed long long
+test1 (vector bool long long x, vector signed long long y)
+{
+  return vec_add (x, y);
+}
+
+vector signed long long
+test2 (vector signed long long x, vector bool long long y)
+{
+  return vec_add (x, y);
+}
+
+vector signed long long
+test3 (vector signed long long x, vector signed long long y)
+{
+  return vec_add (x, y);
+}
+
+vector unsigned long long
+test4 (vector bool long long x, vector unsigned long long y)
+{
+  return vec_add (x, y);
+}
+
+vector unsigned long long
+test5 (vector unsigned long long x, vector bool long long y)
+{
+  return vec_add (x, y);
+}
+
+vector unsigned long long
+test6 (vector unsigned long long x, vector unsigned long long y)
+{
+  return vec_add (x, y);
+}
+
+/* { dg-final { scan-assembler-times "vaddudm" 6 } } */
Index: gcc/testsuite/gcc.target/powerpc/overload-add-5.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/overload-add-5.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/overload-add-5.c	(working copy)
@@ -0,0 +1,16 @@
+/* Verify that overloaded built-ins for vec_add with float
+   inputs produce the right results.  */
+
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_altivec_ok } */
+/* { dg-additional-options "-std=gnu11 -mno-vsx" } */
+
+#include <altivec.h>
+
+vector float
+test1 (vector float x, vector float y)
+{
+  return vec_add (x, y);
+}
+
+/* { dg-final { scan-assembler-times "vaddfp" 1 } } */
Index: gcc/testsuite/gcc.target/powerpc/overload-add-6.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/overload-add-6.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/overload-add-6.c	(working copy)
@@ -0,0 +1,23 @@
+/* Verify that overloaded built-ins for vec_add with float and
+   double inputs for VSX produce the right results.  */
+
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-additional-options "-std=gnu11" } */
+
+#include <altivec.h>
+
+vector float
+test1 (vector float x, vector float y)
+{
+  return vec_add (x, y);
+}
+
+vector double
+test2 (vector double x, vector double y)
+{
+  return vec_add (x, y);
+}
+
+/* { dg-final { scan-assembler-times "xvaddsp" 1 } } */
+/* { dg-final { scan-assembler-times "xvadddp" 1 } } */
Index: gcc/testsuite/gcc.target/powerpc/overload-add-7.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/overload-add-7.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/overload-add-7.c	(working copy)
@@ -0,0 +1,22 @@
+/* Verify that overloaded built-ins for vec_add with __int128
+   inputs produce the right results.  */
+
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-additional-options "-std=gnu11 -Wno-pedantic" } */
+
+#include "altivec.h"
+
+vector signed __int128
+test1 (vector signed __int128 x, vector signed __int128 y)
+{
+  return vec_add (x, y);
+}
+
+vector unsigned __int128
+test2 (vector unsigned __int128 x, vector unsigned __int128 y)
+{
+  return vec_add (x, y);
+}
+
+/* { dg-final { scan-assembler-times "vadduqm" 2 } } */