
[RFC,rs6000] Add overloaded built-in function support to altivec.h, and re-implement vec_add

Message ID 4fb8f7f2-ff17-6416-3869-a8576c245dde@linux.vnet.ibm.com

Commit Message

Bill Schmidt Oct. 31, 2016, 10:28 p.m. UTC
Hi,

The PowerPC back end loses performance on vector intrinsics, because currently
all of them are treated as calls throughout the middle-end phases and only
expanded when they reach RTL.  Our version of altivec.h currently defines the
public names of overloaded functions (like vec_add) to be #defines for hidden
functions (like __builtin_vec_add), which are recognized in the parser as 
requiring special back-end support.  Tables in rs6000-c.c handle dispatch of
the overloaded functions to specific function calls appropriate to the argument
types.

The Clang version of altivec.h, by contrast, creates static inlines for each
overloaded function variant, relying on a special __attribute__((overloadable))
construct to do the dispatch in the parser itself.  This allows vec_add to be
translated into type-specific addition during parsing, so that the resulting
expressions are subject to all subsequent optimization.
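
For illustration, the Clang-style declarations look roughly like the following
(a sketch following Clang's altivec.h conventions, not something GCC currently
accepts):

  static __inline__ vector signed char
  __attribute__ ((__overloadable__, __always_inline__))
  vec_add (vector signed char __a, vector signed char __b)
  {
    return __a + __b;
  }

  static __inline__ vector unsigned char
  __attribute__ ((__overloadable__, __always_inline__))
  vec_add (vector unsigned char __a, vector unsigned char __b)
  {
    return __a + __b;
  }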

We have opened a PR suggesting that this attribute be supported in GCC as well
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71199), but so far there hasn't
been any success in that regard.  While waiting/hoping for the attribute to be
implemented, though, we can use existing mechanisms to create a poor man's
version of overloading dispatch.  This patch is a proof of concept for how
this can be done, and provides support for early expansion of the overloaded
vec_add intrinsic.  If we get this working, then we can gradually add more
intrinsics over time.

The dispatch mechanism is provided in a new header file, overload.h, which is
included in altivec.h.  This is done because the guts of the dispatch
mechanism are pretty ugly to look at.  Overloading is done with a chain of
calls to __builtin_choose_expr and __builtin_types_compatible_p.  Currently
I envision providing a separate dispatch macro for each combination of the
number of arguments and the number of variants to be distinguished.  I also
provide a separate "decl" macro for each number of arguments, used to create
the function decls for each static inline function.  The vec_add intrinsic
takes two input arguments and has 28 variants, so it requires the definition
of OVERLOAD_2ARG_28VAR and OVERLOAD_2ARG_DECL in overload.h.
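
To give a feel for the shape of these macros without the full 28-way
expansion, here is a cut-down, purely illustrative two-variant version
(OVERLOAD_2ARG_2VAR is a hypothetical name; only the 28-variant dispatch
macro is defined in the patch, while OVERLOAD_2ARG_DECL is as in overload.h
below):

  #define OVERLOAD_2ARG_2VAR(NAME, ARG1, ARG2,				\
			     VAR1_ID, VAR1_TYPE1, VAR1_TYPE2,		\
			     VAR2_ID, VAR2_TYPE1, VAR2_TYPE2)		\
    __builtin_choose_expr (						\
      __builtin_types_compatible_p (__typeof__ (ARG1), VAR1_TYPE1)	\
      && __builtin_types_compatible_p (__typeof__ (ARG2), VAR1_TYPE2),	\
      _##NAME##_##VAR1_ID ((VAR1_TYPE1)ARG1, (VAR1_TYPE2)ARG2),	\
    __builtin_choose_expr (						\
      __builtin_types_compatible_p (__typeof__ (ARG1), VAR2_TYPE1)	\
      && __builtin_types_compatible_p (__typeof__ (ARG2), VAR2_TYPE2),	\
      _##NAME##_##VAR2_ID ((VAR2_TYPE1)ARG1, (VAR2_TYPE2)ARG2),	\
      (void)0))

  #define OVERLOAD_2ARG_DECL(NAME, VAR_ID, TYPE0,			\
			     TYPE1, ARG1,				\
			     TYPE2, ARG2)				\
  static __inline__ TYPE0 __attribute__ ((__always_inline__))		\
  _##NAME##_##VAR_ID (TYPE1 ARG1, TYPE2 ARG2)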

These macros are then instantiated in altivec.h.  The dispatch macro for an
overloaded intrinsic is instantiated once, and the decl macro is instantiated
once for each variant, along with the associated inline function body.
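
Using the hypothetical two-variant macro above, an instantiation in altivec.h
for an imaginary vec_xyzzy intrinsic with two variants would look roughly
like this (the real vec_add instantiation appears in the patch below):

  #define vec_xyzzy(a1, a2)						\
    OVERLOAD_2ARG_2VAR(vec_xyzzy, a1, a2,				\
      1, vector signed int, vector signed int,				\
      2, vector unsigned int, vector unsigned int)

  OVERLOAD_2ARG_DECL(vec_xyzzy, 1,					\
		     vector signed int,					\
		     vector signed int, a1,				\
		     vector signed int, a2)
  {
    return a1 + a2;
  }

  OVERLOAD_2ARG_DECL(vec_xyzzy, 2,					\
		     vector unsigned int,				\
		     vector unsigned int, a1,				\
		     vector unsigned int, a2)
  {
    return a1 + a2;
  }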

The dispatch macro may need to vary depending on the supported processor
features.  In the vec_add example, we have some variants that support the
"vector double" and "vector long long" data types.  These only exist when
VSX code generation is supported, so a dispatch table conditioned on
__VSX__ includes these, while a separate one without VSX support does not.
Similarly, __POWER8_VECTOR__ must be defined if we are to support "vector
signed/unsigned __int128".  Because we use a numbering scheme that needs
to be kept consistent, this requires three versions of the dispatch table,
where the more restrictive versions replace the unimplemented entries with
redundant entries.

Note that if and when we get an overloadable attribute in GCC, the machinery
in overload.h will become obsolete: we will remove the dispatch instantiations
and replace the decl instantiations with plain decls that have the
overloadable attribute applied.

There are several complications on top of the basic design:

 * When compiling for C++, the dispatch mechanism is not available, and indeed
   is not necessary.  Thus for C++ we skip the dispatch mechanism, and change
   the definition of OVERLOAD_2ARG_DECL to use standard function overloading.

 * Compiling with -ansi or -std=c11 or the like means the dispatch mechanism
   is unavailable even for C, since GNU extensions are disallowed.
   Regrettably, this means that we can't get rid of the existing late-expansion
   methods altogether.  I don't see any way to avoid this.  Note that this
   would be the case even if we had __attribute__ ((overloadable)), since
   that would also be a GNU extension.  Despite the mess, I think that the 
   performance improvements for non-strict-ANSI code make the dual maintenance
   worthwhile.

 * "#pragma GCC target" is going to cause a lot of trouble.  With the patch
   in its present state, we fail gcc.target/powerpc/ppc-target-4.c, which
   tests the use of "vsx", "altivec,no-vsx", and "no-altivec" target options,
   and happens to use vec_add (float, float) as a testbed.  The test fails
   because altivec.h is #included after the #pragma GCC target("vsx"), which
   allows the interfaces involving vector long long and vector double to be
   produced.  However, when the options are changed to "altivec,no-vsx", the
   subsequent invocation of vec_add expands to a dispatch sequence including
   vector long long, leading to a compile-time error.  (A simplified sketch
   of this failure pattern appears after this list.)

   I can only think of two ways to deal with this, neither of which is
   attractive.  The first idea would be to make altivec.h capable of being
   included more than once.  This essentially requires an #undef before each
   #define.  Once this is done, usage of #pragma GCC target would be 
   supported provided that altivec.h is re-included after each such #pragma,
   so that the dispatch macros would be re-evaluated in the new context.
   The problem with this is that existing code not conforming to this
   requirement would fail to compile, so this is probably off the table.

   The other way would be to require a specific option on the command line
   to use the new dispatch mechanism.  When the option is present, we would
   predefine a macro such as __PPC_FAST_VECTOR__, which would then gate the
   usage in altivec.h and overload.h.  Use of #pragma GCC target to change
   the availability of Altivec, VMX, P8-vector, etc. would also be disallowed
   when the option is present.  This has the advantage of always generating
   correct code, at the cost of requiring a special option before anyone
   can leverage the benefits of early vector expansion.  That's unfortunate,
   but I suspect it's the best we can do.
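
For concreteness, here is a simplified sketch of the failure pattern described
in the first bullet above (not the actual ppc-target-4.c test, just an
illustration of the same situation):

  #pragma GCC target ("vsx")
  #include <altivec.h>   /* dispatch tables instantiated with VSX types */

  #pragma GCC target ("altivec,no-vsx")
  vector float
  add_fp (vector float a, vector float b)
  {
    /* vec_add still expands to the VSX dispatch chain, which names
       vector long long and vector double; those types are invalid
       under no-vsx, so compilation fails.  */
    return vec_add (a, b);
  }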

The current patch is nearly complete, but the #pragma GCC target issue is
not yet resolved.  I'd like to get opinions on the overall approach of the
patch and whether you agree with my assessment of the #pragma issue before
taking the patch forward.  Thanks for reading this far, and thanks in 
advance for your opinions.  We can get some big performance improvements
here eventually, but the road is a bit rocky.

Thanks,
Bill


[gcc]

2016-10-31  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

	* config/rs6000/altivec.h: Add new include of overload.h; when not
	compiling for C++ or strict ANSI, add new #defines for vec_add in
	terms of OVERLOAD_2ARG_28VAR and OVERLOAD_2ARG_DECL macros; when
	compiling for C++ but not for strict ANSI, use just the
	OVERLOAD_2ARG_DECL macros; when not compiling for strict ANSI,
	remove #define of vec_add in terms of __builtin_vec_add.
	* config/rs6000/overload.h: New file, with #defines of
	OVERLOAD_2ARG_28VAR when not compiling for C++ or strict ANSI, and
	two different flavors of OVERLOAD_2ARG_DECL (C++ and otherwise)
	when not compiling for strict ANSI.
	* config.gcc: For each triple that includes altivec.h in
	extra_headers, also add overload.h.

[gcc/testsuite]

2016-10-31  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

	* gcc.target/powerpc/overload-add-1.c: New.
	* gcc.target/powerpc/overload-add-2.c: New.
	* gcc.target/powerpc/overload-add-3.c: New.
	* gcc.target/powerpc/overload-add-4.c: New.
	* gcc.target/powerpc/overload-add-5.c: New.
	* gcc.target/powerpc/overload-add-6.c: New.
	* gcc.target/powerpc/overload-add-7.c: New.

Comments

Bill Schmidt Oct. 31, 2016, 11:49 p.m. UTC | #1
> 
> On Oct 31, 2016, at 5:28 PM, Bill Schmidt <wschmidt@linux.vnet.ibm.com> wrote:
> 
>   The other way would be to require a specific option on the command line
>   to use the new dispatch mechanism.  When the option is present, we would
>   predefine a macro such as __PPC_FAST_VECTOR__, which would then gate the
>   usage in altivec.h and overload.h.  Use of #pragma GCC target to change
>   the availability of Altivec, VMX, P8-vector, etc. would also be disallowed
>   when the option is present.  This has the advantage of always generating
>   correct code, at the cost of requiring a special option before anyone
>   can leverage the benefits of early vector expansion.  That's unfortunate,
>   but I suspect it's the best we can do.

Though I suppose we could require the option to turn off the new dispatch
mechanism, and document the change in gcc7/changes.html.  A little irritating
for people already using the pragma support, but I really expect this wouldn't affect
many people at all.

-- Bill

Bill Schmidt, Ph.D.
GCC for Linux on Power
Linux on Power Toolchain
IBM Linux Technology Center
wschmidt@linux.vnet.ibm.com
Jakub Jelinek Nov. 1, 2016, 12:09 a.m. UTC | #2
On Mon, Oct 31, 2016 at 05:28:42PM -0500, Bill Schmidt wrote:
> The PowerPC back end loses performance on vector intrinsics, because currently
> all of them are treated as calls throughout the middle-end phases and only
> expanded when they reach RTL.  Our version of altivec.h currently defines the
> public names of overloaded functions (like vec_add) to be #defines for hidden
> functions (like __builtin_vec_add), which are recognized in the parser as 
> requiring special back-end support.  Tables in rs6000-c.c handle dispatch of
> the overloaded functions to specific function calls appropriate to the argument
> types.

This doesn't look very nice.  If all you care is that the builtins like
__builtin_altivec_vaddubm etc. that __builtin_vec_add overloads into fold
into generic vector operations under certain conditions, just fold those
into whatever you want in targetm.gimple_fold_builtin (gsi).

	Jakub
Michael Meissner Nov. 1, 2016, 12:19 a.m. UTC | #3
On Mon, Oct 31, 2016 at 06:49:20PM -0500, Bill Schmidt wrote:
> > 
> > On Oct 31, 2016, at 5:28 PM, Bill Schmidt <wschmidt@linux.vnet.ibm.com> wrote:
> > 
> >   The other way would be to require a specific option on the command line
> >   to use the new dispatch mechanism.  When the option is present, we would
> >   predefine a macro such as __PPC_FAST_VECTOR__, which would then gate the
> >   usage in altivec.h and overload.h.  Use of #pragma GCC target to change
> >   the availability of Altivec, VMX, P8-vector, etc. would also be disallowed
> >   when the option is present.  This has the advantage of always generating
> >   correct code, at the cost of requiring a special option before anyone
> >   can leverage the benefits of early vector expansion.  That's unfortunate,
> >   but I suspect it's the best we can do.
> 
> Though I suppose we could require the option to turn off the new dispatch
> mechanism, and document the change in gcc7/changes.html.  A little irritating
> for people already using the pragma support, but I really expect this wouldn't affect
> many people at all.

I suspect we may find out how many people are using #pragma GCC target and
altivec vector instructions if we break their code :-(

Even with attribute target instead of pragma, you need to give appropriate
error messages if the user calls a built-in that the current machine doesn't
support.

IIRC, C++ does not support #pragma GCC target nor the target attribute.

One question is how many of the billions and billions (ok, 1,345) of the rs6000
built-ins would be improved by expanding them in gimple time rather than rtl?
Bill Schmidt Nov. 1, 2016, 12:26 a.m. UTC | #4
> On Oct 31, 2016, at 7:09 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> 
> On Mon, Oct 31, 2016 at 05:28:42PM -0500, Bill Schmidt wrote:
>> The PowerPC back end loses performance on vector intrinsics, because currently
>> all of them are treated as calls throughout the middle-end phases and only
>> expanded when they reach RTL.  Our version of altivec.h currently defines the
>> public names of overloaded functions (like vec_add) to be #defines for hidden
>> functions (like __builtin_vec_add), which are recognized in the parser as 
>> requiring special back-end support.  Tables in rs6000-c.c handle dispatch of
>> the overloaded functions to specific function calls appropriate to the argument
>> types.
> 
> This doesn't look very nice.  If all you care is that the builtins like
> __builtin_altivec_vaddubm etc. that __builtin_vec_add overloads into fold
> into generic vector operations under certain conditions, just fold those
> into whatever you want in targetm.gimple_fold_builtin (gsi).
> 
> 	Jakub
> 
Ah, thanks, Jakub.  I wasn't aware of that hook, and that sounds like the best
approach.  I had found previously how difficult it can be to expand some of
these things during the parser hook, but if we can expand them early in
GIMPLE that is probably much easier.  I will look into it.

"This doesn't look very nice" wins the understatement of the year award...
I was getting increasingly unhappy the further I got into it.

Thanks,
Bill
Bill Schmidt Nov. 1, 2016, 12:29 a.m. UTC | #5
On Oct 31, 2016, at 7:19 PM, Michael Meissner <meissner@linux.vnet.ibm.com> wrote:
> 
> One question is how many of the billions and billions (ok, 1,345) of the rs6000
> built-ins would be improved by expanding them in gimple time rather than rtl?
> 
Hundreds and hundreds of them.  All of the basic operators, many of the memory
operations, all of the dozens of flavors of things that are just permutes at heart.
The loads and stores alone are a huge deal that we've seen cause problems in
customer code.

Bill
Marc Glisse Nov. 1, 2016, 9:01 a.m. UTC | #6
Hello,

how far are we from being able to use

#define vec_add(a,b) ((a)+(b))

?

The few tests I tried pass with -flax-vector-conversions, and the only 
ones that require this flag are those involving vector bool XXX. Would it 
make sense to tweak the front-ends to do the right thing for those 
specific vector types (that people probably didn't have in mind when 
developing the C extension)?
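
To illustrate the vector bool point (an illustrative snippet, assuming GCC's
usual rule that the operands of a binary vector operator must have the same
type):

  vector signed int
  f (vector signed int si, vector bool int bi)
  {
    vector signed int a = si + si;  /* fine with the generic operator */
    vector signed int b = bi + si;  /* vector bool int and vector signed int
                                       are distinct types, so this needs
                                       -flax-vector-conversions (or a cast) */
    return a + b;
  }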
Richard Biener Nov. 2, 2016, 9:19 a.m. UTC | #7
On Tue, Nov 1, 2016 at 1:09 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Mon, Oct 31, 2016 at 05:28:42PM -0500, Bill Schmidt wrote:
>> The PowerPC back end loses performance on vector intrinsics, because currently
>> all of them are treated as calls throughout the middle-end phases and only
>> expanded when they reach RTL.  Our version of altivec.h currently defines the
>> public names of overloaded functions (like vec_add) to be #defines for hidden
>> functions (like __builtin_vec_add), which are recognized in the parser as
>> requiring special back-end support.  Tables in rs6000-c.c handle dispatch of
>> the overloaded functions to specific function calls appropriate to the argument
>> types.
>
> This doesn't look very nice.  If all you care is that the builtins like
> __builtin_altivec_vaddubm etc. that __builtin_vec_add overloads into fold
> into generic vector operations under certain conditions, just fold those
> into whatever you want in targetm.gimple_fold_builtin (gsi).

Note that traditionally "overloading" with GCC "builtins" is done by using
varargs and the "type generic" attribute.  That doesn't scale to return type
overloading, though, for which we usually added direct support to the parser
(for example for __builtin_shuffle).

The folding trick of course should work just fine.

Richard.

>         Jakub
Jakub Jelinek Nov. 2, 2016, 9:28 a.m. UTC | #8
On Wed, Nov 02, 2016 at 10:19:26AM +0100, Richard Biener wrote:
> On Tue, Nov 1, 2016 at 1:09 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> > On Mon, Oct 31, 2016 at 05:28:42PM -0500, Bill Schmidt wrote:
> >> The PowerPC back end loses performance on vector intrinsics, because currently
> >> all of them are treated as calls throughout the middle-end phases and only
> >> expanded when they reach RTL.  Our version of altivec.h currently defines the
> >> public names of overloaded functions (like vec_add) to be #defines for hidden
> >> functions (like __builtin_vec_add), which are recognized in the parser as
> >> requiring special back-end support.  Tables in rs6000-c.c handle dispatch of
> >> the overloaded functions to specific function calls appropriate to the argument
> >> types.
> >
> > This doesn't look very nice.  If all you care is that the builtins like
> > __builtin_altivec_vaddubm etc. that __builtin_vec_add overloads into fold
> > into generic vector operations under certain conditions, just fold those
> > into whatever you want in targetm.gimple_fold_builtin (gsi).
> 
> Note that traditionally "overloading" with GCC "builtins" is done by
> using varargs
> and the "type generic" attribute.  That doesn't scale to return type overloading
> though for which we usually added direct support to the parser (for example
> for __builtin_shuffle).

My understanding is that rs6000 already does that, it hooks into
resolve_overloaded_builtin which already handles the fully type generic
builtins where not just the arguments, but also the return type can be
picked up.  But it resolves the overloaded builtins into calls to other
builtins that are not type-generic.

So, either that function, instead of returning the specific md builtin calls,
in some cases already returns trees with the generic behavior of the
builtin; or it returns what it does now, and then, in the gimple fold builtin
target hook (note, the normal fold builtin target hook is not right for
that, because it is mostly used for folding builtins into constants; callers
will usually throw away other results), fold those specific md builtins
into whatever GIMPLE you want.  If we want to decrease the amount of folding
in the FEs, the gimple fold builtin hook is probably better.

	Jakub
Bill Schmidt Nov. 2, 2016, 12:16 p.m. UTC | #9
On Nov 2, 2016, at 4:28 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> 
> On Wed, Nov 02, 2016 at 10:19:26AM +0100, Richard Biener wrote:
>> On Tue, Nov 1, 2016 at 1:09 AM, Jakub Jelinek <jakub@redhat.com> wrote:
>>> On Mon, Oct 31, 2016 at 05:28:42PM -0500, Bill Schmidt wrote:
>>>> The PowerPC back end loses performance on vector intrinsics, because currently
>>>> all of them are treated as calls throughout the middle-end phases and only
>>>> expanded when they reach RTL.  Our version of altivec.h currently defines the
>>>> public names of overloaded functions (like vec_add) to be #defines for hidden
>>>> functions (like __builtin_vec_add), which are recognized in the parser as
>>>> requiring special back-end support.  Tables in rs6000-c.c handle dispatch of
>>>> the overloaded functions to specific function calls appropriate to the argument
>>>> types.
>>> 
>>> This doesn't look very nice.  If all you care is that the builtins like
>>> __builtin_altivec_vaddubm etc. that __builtin_vec_add overloads into fold
>>> into generic vector operations under certain conditions, just fold those
>>> into whatever you want in targetm.gimple_fold_builtin (gsi).
>> 
>> Note that traditionally "overloading" with GCC "builtins" is done by
>> using varargs
>> and the "type generic" attribute.  That doesn't scale to return type overloading
>> though for which we usually added direct support to the parser (for example
>> for __builtin_shuffle).
> 
> My understanding is that rs6000 already does that, it hooks into
> resolve_overloaded_builtin which already handles the fully type generic
> builtins where not just the arguments, but also the return type can be
> picked up.  But it resolves the overloaded builtins into calls to other
> builtins that are not type-generic.
> 
> So, either that function, instead of returning the specific md builtin calls,
> in some cases already returns trees with the generic behavior of the
> builtin; or it returns what it does now, and then, in the gimple fold builtin
> target hook (note, the normal fold builtin target hook is not right for
> that, because it is mostly used for folding builtins into constants; callers
> will usually throw away other results), fold those specific md builtins
> into whatever GIMPLE you want.  If we want to decrease the amount of folding
> in the FEs, the gimple fold builtin hook is probably better.
> 
> 	Jakub

Thanks, all.  Using the gimple_fold_builtin target hook works very well and
is exactly what I'm looking for.  I've reworked the patch to the much simpler
https://gcc.gnu.org/ml/gcc-patches/2016-11/msg00104.html.

Much obliged for the help!

Bill
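
For anyone following along, a rough outline of what folding one of these
builtins in the gimple_fold_builtin target hook can look like follows (this is
an illustrative sketch, not the reworked patch linked above; the enumeration
names such as ALTIVEC_BUILTIN_VADDUBM are the existing ones from
rs6000-builtin.def):

  static bool
  rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
  {
    gimple *stmt = gsi_stmt (*gsi);
    tree fndecl = gimple_call_fndecl (stmt);
    enum rs6000_builtins fn_code
      = (enum rs6000_builtins) DECL_FUNCTION_CODE (fndecl);

    switch (fn_code)
      {
      /* Flavors of vec_add that can become a plain vector addition.
	 The V1TImode (vadduqm) case is deliberately left to late
	 expansion, as discussed in the altivec.h comment below.  */
      case ALTIVEC_BUILTIN_VADDUBM:
      case ALTIVEC_BUILTIN_VADDUHM:
      case ALTIVEC_BUILTIN_VADDUWM:
      case P8V_BUILTIN_VADDUDM:
      case ALTIVEC_BUILTIN_VADDFP:
      case VSX_BUILTIN_XVADDDP:
	{
	  tree arg0 = gimple_call_arg (stmt, 0);
	  tree arg1 = gimple_call_arg (stmt, 1);
	  tree lhs = gimple_call_lhs (stmt);
	  /* Replace the call with an ordinary GIMPLE addition so the
	     middle end can optimize it like any other vector add.  */
	  gimple *g = gimple_build_assign (lhs, PLUS_EXPR, arg0, arg1);
	  gimple_set_location (g, gimple_location (stmt));
	  gsi_replace (gsi, g, true);
	  return true;
	}
      default:
	break;
      }

    return false;
  }

The hook is wired up by defining TARGET_GIMPLE_FOLD_BUILTIN to point at this
function in rs6000.c.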

Patch

Index: gcc/config/rs6000/altivec.h
===================================================================
--- gcc/config/rs6000/altivec.h	(revision 241624)
+++ gcc/config/rs6000/altivec.h	(working copy)
@@ -53,6 +53,353 @@ 
 #define __CR6_LT		2
 #define __CR6_LT_REV		3
 
+/* Machinery to support overloaded functions in C.  */
+#include "overload.h"
+
+/* Overloaded function declarations.  Please maintain these in
+   alphabetical order.  */
+
+/* Since __builtin_choose_expr and __builtin_types_compatible_p
+   aren't permitted in C++, we'll need to use standard overloading
+   for those.  Disable this mechanism for C++.  GNU extensions are
+   also unavailable for -ansi, -std=c11, etc.  */
+#ifndef __STRICT_ANSI__
+#ifndef __cplusplus
+
+#ifdef __POWER8_VECTOR__
+#define vec_add(a1, a2)							\
+  OVERLOAD_2ARG_28VAR(vec_add, a1, a2,					\
+    1, vector bool char, vector signed char,				\
+    2, vector signed char, vector bool char,				\
+    3, vector signed char, vector signed char,				\
+    4, vector bool char, vector unsigned char,				\
+    5, vector unsigned char, vector bool char,				\
+    6, vector unsigned char, vector unsigned char,			\
+    7, vector bool short, vector signed short,				\
+    8, vector signed short, vector bool short,				\
+    9, vector signed short, vector signed short,			\
+    10, vector bool short, vector unsigned short,			\
+    11, vector unsigned short, vector bool short,			\
+    12, vector unsigned short, vector unsigned short,			\
+    13, vector bool int, vector signed int,				\
+    14, vector signed int, vector bool int,				\
+    15, vector signed int, vector signed int,				\
+    16, vector bool int, vector unsigned int,				\
+    17, vector unsigned int, vector bool int,				\
+    18, vector unsigned int, vector unsigned int,			\
+    19, vector bool long long, vector signed long long,			\
+    20, vector signed long long, vector bool long long,			\
+    21, vector signed long long, vector signed long long,		\
+    22, vector bool long long, vector unsigned long long,		\
+    23, vector unsigned long long, vector bool long long,		\
+    24, vector unsigned long long, vector unsigned long long,		\
+    25, vector float, vector float,					\
+    26, vector double, vector double,					\
+    27, vector signed __int128, vector signed __int128,			\
+    28, vector unsigned __int128, vector unsigned __int128)
+#elif defined __VSX__
+#define vec_add(a1, a2)							\
+  OVERLOAD_2ARG_28VAR(vec_add, a1, a2,					\
+    1, vector bool char, vector signed char,				\
+    2, vector signed char, vector bool char,				\
+    3, vector signed char, vector signed char,				\
+    4, vector bool char, vector unsigned char,				\
+    5, vector unsigned char, vector bool char,				\
+    6, vector unsigned char, vector unsigned char,			\
+    7, vector bool short, vector signed short,				\
+    8, vector signed short, vector bool short,				\
+    9, vector signed short, vector signed short,			\
+    10, vector bool short, vector unsigned short,			\
+    11, vector unsigned short, vector bool short,			\
+    12, vector unsigned short, vector unsigned short,			\
+    13, vector bool int, vector signed int,				\
+    14, vector signed int, vector bool int,				\
+    15, vector signed int, vector signed int,				\
+    16, vector bool int, vector unsigned int,				\
+    17, vector unsigned int, vector bool int,				\
+    18, vector unsigned int, vector unsigned int,			\
+    19, vector bool long long, vector signed long long,			\
+    20, vector signed long long, vector bool long long,			\
+    21, vector signed long long, vector signed long long,		\
+    22, vector bool long long, vector unsigned long long,		\
+    23, vector unsigned long long, vector bool long long,		\
+    24, vector unsigned long long, vector unsigned long long,		\
+    25, vector float, vector float,					\
+    26, vector double, vector double,					\
+    26, vector double, vector double,					\
+    26, vector double, vector double)
+#else
+#define vec_add(a1, a2)							\
+  OVERLOAD_2ARG_28VAR(vec_add, a1, a2,					\
+    1, vector bool char, vector signed char,				\
+    2, vector signed char, vector bool char,				\
+    3, vector signed char, vector signed char,				\
+    4, vector bool char, vector unsigned char,				\
+    5, vector unsigned char, vector bool char,				\
+    6, vector unsigned char, vector unsigned char,			\
+    7, vector bool short, vector signed short,				\
+    8, vector signed short, vector bool short,				\
+    9, vector signed short, vector signed short,			\
+    10, vector bool short, vector unsigned short,			\
+    11, vector unsigned short, vector bool short,			\
+    12, vector unsigned short, vector unsigned short,			\
+    13, vector bool int, vector signed int,				\
+    14, vector signed int, vector bool int,				\
+    15, vector signed int, vector signed int,				\
+    16, vector bool int, vector unsigned int,				\
+    17, vector unsigned int, vector bool int,				\
+    18, vector unsigned int, vector unsigned int,			\
+    18, vector unsigned int, vector unsigned int,			\
+    18, vector unsigned int, vector unsigned int,			\
+    18, vector unsigned int, vector unsigned int,			\
+    18, vector unsigned int, vector unsigned int,			\
+    18, vector unsigned int, vector unsigned int,			\
+    18, vector unsigned int, vector unsigned int,			\
+    25, vector float, vector float,					\
+    25, vector float, vector float,					\
+    25, vector float, vector float,					\
+    25, vector float, vector float)
+#endif /* __POWER8_VECTOR__ #elif __VSX__ */
+
+#endif /* !__cplusplus */
+
+OVERLOAD_2ARG_DECL(vec_add, 1,						\
+		   vector signed char,					\
+		   vector bool char, a1,				\
+		   vector signed char, a2)
+{
+  return (vector signed char)a1 + a2;
+}
+
+OVERLOAD_2ARG_DECL(vec_add, 2,						\
+		   vector signed char,					\
+		   vector signed char, a1,				\
+		   vector bool char, a2)
+{
+  return a1 + (vector signed char)a2;
+}
+
+OVERLOAD_2ARG_DECL(vec_add, 3,						\
+		   vector signed char,					\
+		   vector signed char, a1,				\
+		   vector signed char, a2)
+{
+  return a1 + a2;
+}
+
+OVERLOAD_2ARG_DECL(vec_add, 4,						\
+		   vector unsigned char,				\
+		   vector bool char, a1,				\
+		   vector unsigned char, a2)
+{
+  return (vector unsigned char)a1 + a2;
+}
+
+OVERLOAD_2ARG_DECL(vec_add, 5,						\
+		   vector unsigned char,				\
+		   vector unsigned char, a1,				\
+		   vector bool char, a2)
+{
+  return a1 + (vector unsigned char)a2;
+}
+
+OVERLOAD_2ARG_DECL(vec_add, 6,						\
+		   vector unsigned char,				\
+		   vector unsigned char, a1,				\
+		   vector unsigned char, a2)
+{
+  return a1 + a2;
+}
+
+OVERLOAD_2ARG_DECL(vec_add, 7,						\
+		   vector signed short,					\
+		   vector bool short, a1,				\
+		   vector signed short, a2)
+{
+  return (vector signed short)a1 + a2;
+}
+
+OVERLOAD_2ARG_DECL(vec_add, 8,						\
+		   vector signed short,					\
+		   vector signed short, a1,				\
+		   vector bool short, a2)
+{
+  return a1 + (vector signed short)a2;
+}
+
+OVERLOAD_2ARG_DECL(vec_add, 9,						\
+		   vector signed short,					\
+		   vector signed short, a1,				\
+		   vector signed short, a2)
+{
+  return a1 + a2;
+}
+
+OVERLOAD_2ARG_DECL(vec_add, 10,						\
+		   vector unsigned short,				\
+		   vector bool short, a1,				\
+		   vector unsigned short, a2)
+{
+  return (vector unsigned short)a1 + a2;
+}
+
+OVERLOAD_2ARG_DECL(vec_add, 11,						\
+		   vector unsigned short,				\
+		   vector unsigned short, a1,				\
+		   vector bool short, a2)
+{
+  return a1 + (vector unsigned short)a2;
+}
+
+OVERLOAD_2ARG_DECL(vec_add, 12,						\
+		   vector unsigned short,				\
+		   vector unsigned short, a1,				\
+		   vector unsigned short, a2)
+{
+  return a1 + a2;
+}
+
+OVERLOAD_2ARG_DECL(vec_add, 13,						\
+		   vector signed int,					\
+		   vector bool int, a1,					\
+		   vector signed int, a2)
+{
+  return (vector signed int)a1 + a2;
+}
+
+OVERLOAD_2ARG_DECL(vec_add, 14,						\
+		   vector signed int,					\
+		   vector signed int, a1,				\
+		   vector bool int, a2)
+{
+  return a1 + (vector signed int)a2;
+}
+
+OVERLOAD_2ARG_DECL(vec_add, 15,						\
+		   vector signed int,					\
+		   vector signed int, a1,				\
+		   vector signed int, a2)
+{
+  return a1 + a2;
+}
+
+OVERLOAD_2ARG_DECL(vec_add, 16,						\
+		   vector unsigned int,					\
+		   vector bool int, a1,					\
+		   vector unsigned int, a2)
+{
+  return (vector unsigned int)a1 + a2;
+}
+
+OVERLOAD_2ARG_DECL(vec_add, 17,						\
+		   vector unsigned int,					\
+		   vector unsigned int, a1,				\
+		   vector bool int, a2)
+{
+  return a1 + (vector unsigned int)a2;
+}
+
+OVERLOAD_2ARG_DECL(vec_add, 18,						\
+		   vector unsigned int,					\
+		   vector unsigned int, a1,				\
+		   vector unsigned int, a2)
+{
+  return a1 + a2;
+}
+
+#ifdef __VSX__
+OVERLOAD_2ARG_DECL(vec_add, 19,						\
+		   vector signed long long,				\
+		   vector bool long long, a1,				\
+		   vector signed long long, a2)
+{
+  return (vector signed long long)a1 + a2;
+}
+
+OVERLOAD_2ARG_DECL(vec_add, 20,						\
+		   vector signed long long,				\
+		   vector signed long long, a1,				\
+		   vector bool long long, a2)
+{
+  return a1 + (vector signed long long)a2;
+}
+
+OVERLOAD_2ARG_DECL(vec_add, 21,						\
+		   vector signed long long,				\
+		   vector signed long long, a1,				\
+		   vector signed long long, a2)
+{
+  return a1 + a2;
+}
+
+OVERLOAD_2ARG_DECL(vec_add, 22,						\
+		   vector unsigned long long,				\
+		   vector bool long long, a1,				\
+		   vector unsigned long long, a2)
+{
+  return (vector unsigned long long)a1 + a2;
+}
+
+OVERLOAD_2ARG_DECL(vec_add, 23,						\
+		   vector unsigned long long,				\
+		   vector unsigned long long, a1,			\
+		   vector bool long long, a2)
+{
+  return a1 + (vector unsigned long long)a2;
+}
+
+OVERLOAD_2ARG_DECL(vec_add, 24,						\
+		   vector unsigned long long,				\
+		   vector unsigned long long, a1,			\
+		   vector unsigned long long, a2)
+{
+  return a1 + a2;
+}
+#endif /* __VSX__ */
+
+OVERLOAD_2ARG_DECL(vec_add, 25,						\
+		   vector float,					\
+		   vector float, a1,					\
+		   vector float, a2)
+{
+  return a1 + a2;
+}
+
+#ifdef __VSX__
+OVERLOAD_2ARG_DECL(vec_add, 26,						\
+		   vector double,					\
+		   vector double, a1,					\
+		   vector double, a2)
+{
+  return a1 + a2;
+}
+#endif /* __VSX__ */
+
+/* Currently we do not early-expand vec_add for vector __int128.  This
+   is because vector lowering in the middle end casts V1TImode to TImode,
+   which is probably appropriate since we have very little support for
+   V1TImode arithmetic.  Late expansion ensures we get the single
+   instruction add.  */
+#ifdef __POWER8_VECTOR__
+OVERLOAD_2ARG_DECL(vec_add, 27,						\
+		   vector signed __int128,				\
+		   vector signed __int128, a1,				\
+		   vector signed __int128, a2)
+{
+  return __builtin_vec_add (a1, a2);
+}
+
+OVERLOAD_2ARG_DECL(vec_add, 28,						\
+		   vector unsigned __int128,				\
+		   vector unsigned __int128, a1,			\
+		   vector unsigned __int128, a2)
+{
+  return __builtin_vec_add (a1, a2);
+}
+#endif /* __POWER8_VECTOR__ */
+
+#endif /* !__STRICT_ANSI__ */
+
 /* Synonyms.  */
 #define vec_vaddcuw vec_addc
 #define vec_vand vec_and
@@ -190,7 +537,9 @@ 
 #define vec_vupklsb __builtin_vec_vupklsb
 #define vec_abs __builtin_vec_abs
 #define vec_abss __builtin_vec_abss
+#ifdef __STRICT_ANSI__
 #define vec_add __builtin_vec_add
+#endif
 #define vec_adds __builtin_vec_adds
 #define vec_and __builtin_vec_and
 #define vec_andc __builtin_vec_andc
Index: gcc/config/rs6000/overload.h
===================================================================
--- gcc/config/rs6000/overload.h	(revision 0)
+++ gcc/config/rs6000/overload.h	(working copy)
@@ -0,0 +1,206 @@ 
+/* Overloaded Built-In Function Support
+   Copyright (C) 2016 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published
+   by the Free Software Foundation; either version 3, or (at your
+   option) any later version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+   License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _OVERLOAD_H
+#define _OVERLOAD_H 1
+
+/* Since __builtin_choose_expr and __builtin_types_compatible_p
+   aren't permitted in C++, we'll need to use standard overloading
+   for those.  Disable this mechanism for C++.  GNU extensions are
+   also unavailable for -ansi, -std=c11, etc.  */
+#if !defined __cplusplus && !defined __STRICT_ANSI__
+
+/* Macros named OVERLOAD_<N>ARG_<M>VAR provide a dispatch mechanism
+   for built-in functions taking N input arguments and M overloaded
+   variants.  Note that indentation conventions for nested calls to
+   __builtin_choose_expr are violated for practicality.  Please
+   maintain these macros in increasing order by N and M for ease
+   of reuse.  */
+
+#define OVERLOAD_2ARG_28VAR(NAME, ARG1, ARG2,				\
+			    VAR1_ID, VAR1_TYPE1, VAR1_TYPE2,		\
+			    VAR2_ID, VAR2_TYPE1, VAR2_TYPE2,		\
+			    VAR3_ID, VAR3_TYPE1, VAR3_TYPE2,		\
+			    VAR4_ID, VAR4_TYPE1, VAR4_TYPE2,		\
+			    VAR5_ID, VAR5_TYPE1, VAR5_TYPE2,		\
+			    VAR6_ID, VAR6_TYPE1, VAR6_TYPE2,		\
+			    VAR7_ID, VAR7_TYPE1, VAR7_TYPE2,		\
+			    VAR8_ID, VAR8_TYPE1, VAR8_TYPE2,		\
+			    VAR9_ID, VAR9_TYPE1, VAR9_TYPE2,		\
+			    VAR10_ID, VAR10_TYPE1, VAR10_TYPE2,		\
+			    VAR11_ID, VAR11_TYPE1, VAR11_TYPE2,		\
+			    VAR12_ID, VAR12_TYPE1, VAR12_TYPE2,		\
+			    VAR13_ID, VAR13_TYPE1, VAR13_TYPE2,		\
+			    VAR14_ID, VAR14_TYPE1, VAR14_TYPE2,		\
+			    VAR15_ID, VAR15_TYPE1, VAR15_TYPE2,		\
+			    VAR16_ID, VAR16_TYPE1, VAR16_TYPE2,		\
+			    VAR17_ID, VAR17_TYPE1, VAR17_TYPE2,		\
+			    VAR18_ID, VAR18_TYPE1, VAR18_TYPE2,		\
+			    VAR19_ID, VAR19_TYPE1, VAR19_TYPE2,		\
+			    VAR20_ID, VAR20_TYPE1, VAR20_TYPE2,		\
+			    VAR21_ID, VAR21_TYPE1, VAR21_TYPE2,		\
+			    VAR22_ID, VAR22_TYPE1, VAR22_TYPE2,		\
+			    VAR23_ID, VAR23_TYPE1, VAR23_TYPE2,		\
+			    VAR24_ID, VAR24_TYPE1, VAR24_TYPE2,		\
+			    VAR25_ID, VAR25_TYPE1, VAR25_TYPE2,		\
+			    VAR26_ID, VAR26_TYPE1, VAR26_TYPE2,		\
+			    VAR27_ID, VAR27_TYPE1, VAR27_TYPE2,		\
+			    VAR28_ID, VAR28_TYPE1, VAR28_TYPE2)		\
+  __builtin_choose_expr (						\
+    __builtin_types_compatible_p (__typeof__ (ARG1), VAR1_TYPE1)	\
+    && __builtin_types_compatible_p (__typeof__ (ARG2), VAR1_TYPE2),	\
+    _##NAME##_##VAR1_ID ((VAR1_TYPE1)ARG1, (VAR1_TYPE2)ARG2),		\
+  __builtin_choose_expr (						\
+    __builtin_types_compatible_p (__typeof__ (ARG1), VAR2_TYPE1)	\
+    && __builtin_types_compatible_p (__typeof__ (ARG2), VAR2_TYPE2),	\
+    _##NAME##_##VAR2_ID ((VAR2_TYPE1)ARG1, (VAR2_TYPE2)ARG2),		\
+  __builtin_choose_expr (						\
+    __builtin_types_compatible_p (__typeof__ (ARG1), VAR3_TYPE1)	\
+    && __builtin_types_compatible_p (__typeof__ (ARG2), VAR3_TYPE2),	\
+    _##NAME##_##VAR3_ID ((VAR3_TYPE1)ARG1, (VAR3_TYPE2)ARG2),		\
+  __builtin_choose_expr (						\
+    __builtin_types_compatible_p (__typeof__ (ARG1), VAR4_TYPE1)	\
+    && __builtin_types_compatible_p (__typeof__ (ARG2), VAR4_TYPE2),	\
+    _##NAME##_##VAR4_ID ((VAR4_TYPE1)ARG1, (VAR4_TYPE2)ARG2),		\
+  __builtin_choose_expr (						\
+    __builtin_types_compatible_p (__typeof__ (ARG1), VAR5_TYPE1)	\
+    && __builtin_types_compatible_p (__typeof__ (ARG2), VAR5_TYPE2),	\
+    _##NAME##_##VAR5_ID ((VAR5_TYPE1)ARG1, (VAR5_TYPE2)ARG2),		\
+  __builtin_choose_expr (						\
+    __builtin_types_compatible_p (__typeof__ (ARG1), VAR6_TYPE1)	\
+    && __builtin_types_compatible_p (__typeof__ (ARG2), VAR6_TYPE2),	\
+    _##NAME##_##VAR6_ID ((VAR6_TYPE1)ARG1, (VAR6_TYPE2)ARG2),		\
+  __builtin_choose_expr (						\
+    __builtin_types_compatible_p (__typeof__ (ARG1), VAR7_TYPE1)	\
+    && __builtin_types_compatible_p (__typeof__ (ARG2), VAR7_TYPE2),	\
+    _##NAME##_##VAR7_ID ((VAR7_TYPE1)ARG1, (VAR7_TYPE2)ARG2),		\
+  __builtin_choose_expr (						\
+    __builtin_types_compatible_p (__typeof__ (ARG1), VAR8_TYPE1)	\
+    && __builtin_types_compatible_p (__typeof__ (ARG2), VAR8_TYPE2),	\
+    _##NAME##_##VAR8_ID ((VAR8_TYPE1)ARG1, (VAR8_TYPE2)ARG2),		\
+  __builtin_choose_expr (						\
+    __builtin_types_compatible_p (__typeof__ (ARG1), VAR9_TYPE1)	\
+    && __builtin_types_compatible_p (__typeof__ (ARG2), VAR9_TYPE2),	\
+    _##NAME##_##VAR9_ID ((VAR9_TYPE1)ARG1, (VAR9_TYPE2)ARG2),		\
+  __builtin_choose_expr (						\
+    __builtin_types_compatible_p (__typeof__ (ARG1), VAR10_TYPE1)	\
+    && __builtin_types_compatible_p (__typeof__ (ARG2), VAR10_TYPE2),	\
+    _##NAME##_##VAR10_ID ((VAR10_TYPE1)ARG1, (VAR10_TYPE2)ARG2),	\
+  __builtin_choose_expr (						\
+    __builtin_types_compatible_p (__typeof__ (ARG1), VAR11_TYPE1)	\
+    && __builtin_types_compatible_p (__typeof__ (ARG2), VAR11_TYPE2),	\
+    _##NAME##_##VAR11_ID ((VAR11_TYPE1)ARG1, (VAR11_TYPE2)ARG2),	\
+  __builtin_choose_expr (						\
+    __builtin_types_compatible_p (__typeof__ (ARG1), VAR12_TYPE1)	\
+    && __builtin_types_compatible_p (__typeof__ (ARG2), VAR12_TYPE2),	\
+    _##NAME##_##VAR12_ID ((VAR12_TYPE1)ARG1, (VAR12_TYPE2)ARG2),	\
+  __builtin_choose_expr (						\
+    __builtin_types_compatible_p (__typeof__ (ARG1), VAR13_TYPE1)	\
+    && __builtin_types_compatible_p (__typeof__ (ARG2), VAR13_TYPE2),	\
+    _##NAME##_##VAR13_ID ((VAR13_TYPE1)ARG1, (VAR13_TYPE2)ARG2),	\
+  __builtin_choose_expr (						\
+    __builtin_types_compatible_p (__typeof__ (ARG1), VAR14_TYPE1)	\
+    && __builtin_types_compatible_p (__typeof__ (ARG2), VAR14_TYPE2),	\
+    _##NAME##_##VAR14_ID ((VAR14_TYPE1)ARG1, (VAR14_TYPE2)ARG2),	\
+  __builtin_choose_expr (						\
+    __builtin_types_compatible_p (__typeof__ (ARG1), VAR15_TYPE1)	\
+    && __builtin_types_compatible_p (__typeof__ (ARG2), VAR15_TYPE2),	\
+    _##NAME##_##VAR15_ID ((VAR15_TYPE1)ARG1, (VAR15_TYPE2)ARG2),	\
+  __builtin_choose_expr (						\
+    __builtin_types_compatible_p (__typeof__ (ARG1), VAR16_TYPE1)	\
+    && __builtin_types_compatible_p (__typeof__ (ARG2), VAR16_TYPE2),	\
+    _##NAME##_##VAR16_ID ((VAR16_TYPE1)ARG1, (VAR16_TYPE2)ARG2),	\
+  __builtin_choose_expr (						\
+    __builtin_types_compatible_p (__typeof__ (ARG1), VAR17_TYPE1)	\
+    && __builtin_types_compatible_p (__typeof__ (ARG2), VAR17_TYPE2),	\
+    _##NAME##_##VAR17_ID ((VAR17_TYPE1)ARG1, (VAR17_TYPE2)ARG2),	\
+  __builtin_choose_expr (						\
+    __builtin_types_compatible_p (__typeof__ (ARG1), VAR18_TYPE1)	\
+    && __builtin_types_compatible_p (__typeof__ (ARG2), VAR18_TYPE2),	\
+    _##NAME##_##VAR18_ID ((VAR18_TYPE1)ARG1, (VAR18_TYPE2)ARG2),	\
+  __builtin_choose_expr (						\
+    __builtin_types_compatible_p (__typeof__ (ARG1), VAR19_TYPE1)	\
+    && __builtin_types_compatible_p (__typeof__ (ARG2), VAR19_TYPE2),	\
+    _##NAME##_##VAR19_ID ((VAR19_TYPE1)ARG1, (VAR19_TYPE2)ARG2),	\
+  __builtin_choose_expr (						\
+    __builtin_types_compatible_p (__typeof__ (ARG1), VAR20_TYPE1)	\
+    && __builtin_types_compatible_p (__typeof__ (ARG2), VAR20_TYPE2),	\
+    _##NAME##_##VAR20_ID ((VAR20_TYPE1)ARG1, (VAR20_TYPE2)ARG2),	\
+  __builtin_choose_expr (						\
+    __builtin_types_compatible_p (__typeof__ (ARG1), VAR21_TYPE1)	\
+    && __builtin_types_compatible_p (__typeof__ (ARG2), VAR21_TYPE2),	\
+    _##NAME##_##VAR21_ID ((VAR21_TYPE1)ARG1, (VAR21_TYPE2)ARG2),	\
+  __builtin_choose_expr (						\
+    __builtin_types_compatible_p (__typeof__ (ARG1), VAR22_TYPE1)	\
+    && __builtin_types_compatible_p (__typeof__ (ARG2), VAR22_TYPE2),	\
+    _##NAME##_##VAR22_ID ((VAR22_TYPE1)ARG1, (VAR22_TYPE2)ARG2),	\
+  __builtin_choose_expr (						\
+    __builtin_types_compatible_p (__typeof__ (ARG1), VAR23_TYPE1)	\
+    && __builtin_types_compatible_p (__typeof__ (ARG2), VAR23_TYPE2),	\
+    _##NAME##_##VAR23_ID ((VAR23_TYPE1)ARG1, (VAR23_TYPE2)ARG2),	\
+  __builtin_choose_expr (						\
+    __builtin_types_compatible_p (__typeof__ (ARG1), VAR24_TYPE1)	\
+    && __builtin_types_compatible_p (__typeof__ (ARG2), VAR24_TYPE2),	\
+    _##NAME##_##VAR24_ID ((VAR24_TYPE1)ARG1, (VAR24_TYPE2)ARG2),	\
+  __builtin_choose_expr (						\
+    __builtin_types_compatible_p (__typeof__ (ARG1), VAR25_TYPE1)	\
+    && __builtin_types_compatible_p (__typeof__ (ARG2), VAR25_TYPE2),	\
+    _##NAME##_##VAR25_ID ((VAR25_TYPE1)ARG1, (VAR25_TYPE2)ARG2),	\
+  __builtin_choose_expr (						\
+    __builtin_types_compatible_p (__typeof__ (ARG1), VAR26_TYPE1)	\
+    && __builtin_types_compatible_p (__typeof__ (ARG2), VAR26_TYPE2),	\
+    _##NAME##_##VAR26_ID ((VAR26_TYPE1)ARG1, (VAR26_TYPE2)ARG2),	\
+  __builtin_choose_expr (						\
+    __builtin_types_compatible_p (__typeof__ (ARG1), VAR27_TYPE1)	\
+    && __builtin_types_compatible_p (__typeof__ (ARG2), VAR27_TYPE2),	\
+    _##NAME##_##VAR27_ID ((VAR27_TYPE1)ARG1, (VAR27_TYPE2)ARG2),	\
+  __builtin_choose_expr (						\
+    __builtin_types_compatible_p (__typeof__ (ARG1), VAR28_TYPE1)	\
+    && __builtin_types_compatible_p (__typeof__ (ARG2), VAR28_TYPE2),	\
+    _##NAME##_##VAR28_ID ((VAR28_TYPE1)ARG1, (VAR28_TYPE2)ARG2),	\
+    (void)0))))))))))))))))))))))))))))
+
+/* Macros named OVERLOAD_<N>ARG_DECL provide a declaration for one
+   variant of an overloaded built-in function having N arguments.
+   Please maintain these macros in increasing order by N for ease
+   of reuse.  */
+
+#define OVERLOAD_2ARG_DECL(NAME, VAR_ID, TYPE0,				\
+			   TYPE1, ARG1,					\
+			   TYPE2, ARG2)					\
+static __inline__ TYPE0 __attribute__ ((__always_inline__))		\
+_##NAME##_##VAR_ID (TYPE1 ARG1, TYPE2 ARG2)
+
+/* With C++, we can just use function overloading.  */
+#elif defined __cplusplus && !defined __STRICT_ANSI__
+
+#define OVERLOAD_2ARG_DECL(NAME, VAR_ID, TYPE0,				\
+			   TYPE1, ARG1,					\
+			   TYPE2, ARG2)					\
+static __inline__ TYPE0 __attribute__ ((__always_inline__))		\
+NAME (TYPE1 ARG1, TYPE2 ARG2)
+
+#endif /* !__cplusplus && !__STRICT_ANSI__ */
+
+#endif /* _OVERLOAD_H */
Index: gcc/config.gcc
===================================================================
--- gcc/config.gcc	(revision 241624)
+++ gcc/config.gcc	(working copy)
@@ -440,7 +440,7 @@  nvptx-*-*)
 	;;
 powerpc*-*-*)
 	cpu_type=rs6000
-	extra_headers="ppc-asm.h altivec.h spe.h ppu_intrinsics.h paired.h spu2vmx.h vec_types.h si2vmx.h htmintrin.h htmxlintrin.h"
+	extra_headers="ppc-asm.h altivec.h spe.h ppu_intrinsics.h paired.h spu2vmx.h vec_types.h si2vmx.h htmintrin.h htmxlintrin.h overload.h"
 	case x$with_cpu in
 	    xpowerpc64|xdefault64|x6[23]0|x970|xG5|xpower[3456789]|xpower6x|xrs64a|xcell|xa2|xe500mc64|xe5500|xe6500)
 		cpu_is_64bit=yes
@@ -2279,13 +2279,13 @@  powerpc-*-darwin*)
 	    ;;
 	esac
 	tmake_file="${tmake_file} t-slibgcc"
-	extra_headers=altivec.h
+	extra_headers="altivec.h overload.h"
 	;;
 powerpc64-*-darwin*)
 	extra_options="${extra_options} ${cpu_type}/darwin.opt"
 	tmake_file="${tmake_file} ${cpu_type}/t-darwin64 t-slibgcc"
 	tm_file="${tm_file} ${cpu_type}/darwin8.h ${cpu_type}/darwin64.h"
-	extra_headers=altivec.h
+	extra_headers="altivec.h overload.h"
 	;;
 powerpc*-*-freebsd*)
 	tm_file="${tm_file} dbxelf.h elfos.h ${fbsd_tm_file} rs6000/sysv4.h"
@@ -2512,7 +2512,7 @@  rs6000-ibm-aix5.3.* | powerpc-ibm-aix5.3.*)
 	use_collect2=yes
 	thread_file='aix'
 	use_gcc_stdint=wrap
-	extra_headers=altivec.h
+	extra_headers="altivec.h overload.h"
 	;;
 rs6000-ibm-aix6.* | powerpc-ibm-aix6.*)
 	tm_file="${tm_file} rs6000/aix.h rs6000/aix61.h rs6000/xcoff.h rs6000/aix-stdint.h"
@@ -2521,7 +2521,7 @@  rs6000-ibm-aix6.* | powerpc-ibm-aix6.*)
 	use_collect2=yes
 	thread_file='aix'
 	use_gcc_stdint=wrap
-	extra_headers=altivec.h
+	extra_headers="altivec.h overload.h"
 	default_use_cxa_atexit=yes
 	;;
 rs6000-ibm-aix[789].* | powerpc-ibm-aix[789].*)
@@ -2531,7 +2531,7 @@  rs6000-ibm-aix[789].* | powerpc-ibm-aix[789].*)
 	use_collect2=yes
 	thread_file='aix'
 	use_gcc_stdint=wrap
-	extra_headers=altivec.h
+	extra_headers="altivec.h overload.h"
 	default_use_cxa_atexit=yes
 	;;
 rl78-*-elf*)
Index: gcc/testsuite/gcc.target/powerpc/overload-add-1.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/overload-add-1.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/overload-add-1.c	(working copy)
@@ -0,0 +1,46 @@ 
+/* Verify that overloaded built-ins for vec_add with char
+   inputs produce the right results.  */
+
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_altivec_ok } */
+/* { dg-additional-options "-std=gnu11" } */
+
+#include <altivec.h>
+
+vector signed char
+test1 (vector bool char x, vector signed char y)
+{
+  return vec_add (x, y);
+}
+
+vector signed char
+test2 (vector signed char x, vector bool char y)
+{
+  return vec_add (x, y);
+}
+
+vector signed char
+test3 (vector signed char x, vector signed char y)
+{
+  return vec_add (x, y);
+}
+
+vector unsigned char
+test4 (vector bool char x, vector unsigned char y)
+{
+  return vec_add (x, y);
+}
+
+vector unsigned char
+test5 (vector unsigned char x, vector bool char y)
+{
+  return vec_add (x, y);
+}
+
+vector unsigned char
+test6 (vector unsigned char x, vector unsigned char y)
+{
+  return vec_add (x, y);
+}
+
+/* { dg-final { scan-assembler-times "vaddubm" 6 } } */
Index: gcc/testsuite/gcc.target/powerpc/overload-add-2.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/overload-add-2.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/overload-add-2.c	(working copy)
@@ -0,0 +1,46 @@ 
+/* Verify that overloaded built-ins for vec_add with short
+   inputs produce the right results.  */
+
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_altivec_ok } */
+/* { dg-additional-options "-std=gnu11" } */
+
+#include <altivec.h>
+
+vector signed short
+test1 (vector bool short x, vector signed short y)
+{
+  return vec_add (x, y);
+}
+
+vector signed short
+test2 (vector signed short x, vector bool short y)
+{
+  return vec_add (x, y);
+}
+
+vector signed short
+test3 (vector signed short x, vector signed short y)
+{
+  return vec_add (x, y);
+}
+
+vector unsigned short
+test4 (vector bool short x, vector unsigned short y)
+{
+  return vec_add (x, y);
+}
+
+vector unsigned short
+test5 (vector unsigned short x, vector bool short y)
+{
+  return vec_add (x, y);
+}
+
+vector unsigned short
+test6 (vector unsigned short x, vector unsigned short y)
+{
+  return vec_add (x, y);
+}
+
+/* { dg-final { scan-assembler-times "vadduhm" 6 } } */
Index: gcc/testsuite/gcc.target/powerpc/overload-add-3.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/overload-add-3.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/overload-add-3.c	(working copy)
@@ -0,0 +1,46 @@ 
+/* Verify that overloaded built-ins for vec_add with int
+   inputs produce the right results.  */
+
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_altivec_ok } */
+/* { dg-additional-options "-std=gnu11" } */
+
+#include <altivec.h>
+
+vector signed int
+test1 (vector bool int x, vector signed int y)
+{
+  return vec_add (x, y);
+}
+
+vector signed int
+test2 (vector signed int x, vector bool int y)
+{
+  return vec_add (x, y);
+}
+
+vector signed int
+test3 (vector signed int x, vector signed int y)
+{
+  return vec_add (x, y);
+}
+
+vector unsigned int
+test4 (vector bool int x, vector unsigned int y)
+{
+  return vec_add (x, y);
+}
+
+vector unsigned int
+test5 (vector unsigned int x, vector bool int y)
+{
+  return vec_add (x, y);
+}
+
+vector unsigned int
+test6 (vector unsigned int x, vector unsigned int y)
+{
+  return vec_add (x, y);
+}
+
+/* { dg-final { scan-assembler-times "vadduwm" 6 } } */
Index: gcc/testsuite/gcc.target/powerpc/overload-add-4.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/overload-add-4.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/overload-add-4.c	(working copy)
@@ -0,0 +1,46 @@ 
+/* Verify that overloaded built-ins for vec_add with long long
+   inputs produce the right results.  */
+
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-additional-options "-std=gnu11" } */
+
+#include <altivec.h>
+
+vector signed long long
+test1 (vector bool long long x, vector signed long long y)
+{
+  return vec_add (x, y);
+}
+
+vector signed long long
+test2 (vector signed long long x, vector bool long long y)
+{
+  return vec_add (x, y);
+}
+
+vector signed long long
+test3 (vector signed long long x, vector signed long long y)
+{
+  return vec_add (x, y);
+}
+
+vector unsigned long long
+test4 (vector bool long long x, vector unsigned long long y)
+{
+  return vec_add (x, y);
+}
+
+vector unsigned long long
+test5 (vector unsigned long long x, vector bool long long y)
+{
+  return vec_add (x, y);
+}
+
+vector unsigned long long
+test6 (vector unsigned long long x, vector unsigned long long y)
+{
+  return vec_add (x, y);
+}
+
+/* { dg-final { scan-assembler-times "vaddudm" 6 } } */
Index: gcc/testsuite/gcc.target/powerpc/overload-add-5.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/overload-add-5.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/overload-add-5.c	(working copy)
@@ -0,0 +1,16 @@ 
+/* Verify that overloaded built-ins for vec_add with float
+   inputs produce the right results.  */
+
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_altivec_ok } */
+/* { dg-additional-options "-std=gnu11 -mno-vsx" } */
+
+#include <altivec.h>
+
+vector float
+test1 (vector float x, vector float y)
+{
+  return vec_add (x, y);
+}
+
+/* { dg-final { scan-assembler-times "vaddfp" 1 } } */
Index: gcc/testsuite/gcc.target/powerpc/overload-add-6.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/overload-add-6.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/overload-add-6.c	(working copy)
@@ -0,0 +1,23 @@ 
+/* Verify that overloaded built-ins for vec_add with float and
+   double inputs for VSX produce the right results.  */
+
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-additional-options "-std=gnu11" } */
+
+#include <altivec.h>
+
+vector float
+test1 (vector float x, vector float y)
+{
+  return vec_add (x, y);
+}
+
+vector double
+test2 (vector double x, vector double y)
+{
+  return vec_add (x, y);
+}
+
+/* { dg-final { scan-assembler-times "xvaddsp" 1 } } */
+/* { dg-final { scan-assembler-times "xvadddp" 1 } } */
Index: gcc/testsuite/gcc.target/powerpc/overload-add-7.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/overload-add-7.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/overload-add-7.c	(working copy)
@@ -0,0 +1,22 @@ 
+/* Verify that overloaded built-ins for vec_add with __int128
+   inputs produce the right results.  */
+
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-additional-options "-std=gnu11 -Wno-pedantic" } */
+
+#include "altivec.h"
+
+vector signed __int128
+test1 (vector signed __int128 x, vector signed __int128 y)
+{
+  return vec_add (x, y);
+}
+
+vector unsigned __int128
+test2 (vector unsigned __int128 x, vector unsigned __int128 y)
+{
+  return vec_add (x, y);
+}
+
+/* { dg-final { scan-assembler-times "vadduqm" 2 } } */