[powerpc] Add -mmass to use XL's MASS vectorization library

Message ID 20100818220404.GA27010@hungry-tiger.westford.ibm.com
State New

Commit Message

Michael Meissner Aug. 18, 2010, 10:04 p.m. UTC
On Wed, Aug 18, 2010 at 10:36:13PM +0200, Richard Guenther wrote:
> On Wed, Aug 18, 2010 at 10:32 PM, Michael Meissner
> <meissner@linux.vnet.ibm.com> wrote:
> > This patch was cloned from the i386 -mveclibabi=<xxx> support, and it adds a
> > new switch (-mmass) that says to vectorize various mathematical functions (sin,
> > cos, etc.) on power7 systems.  This patch greatly speeds up 3 of the Spec 2006
> > floating point benchmarks (tonto, wrf, GemsFDTD) that heavily use the math
> > functions.  I have done bootstraps on my power systems, and comparison tests
> > and there were no regressions.  Is it ok to install in the tree?
> 
> In the case that we develop a common library for all archs it would be nice
> to have the same switch for ppc as we have for x86, so why didn't you
> use -mveclibabi=mass?

This revised patch changes the name of the switch to -mveclibabi=mass.  Is it
ok to apply?

[gcc]
2010-08-18  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* config/rs6000/rs6000.opt (-mveclibabi=mass): New option to
	enable the compiler to autovectorize mathematical functions for
	power7 using the Mathematical Acceleration Subsystem library.

	* config/rs6000/rs6000.c (rs6000_veclib_handler): New variable to
	handle which vector math library we have.
	(rs6000_override_options): Add -mveclibabi=mass support.
	(rs6000_builtin_vectorized_libmass): New function to handle auto
	vectorizing math functions that are in the MASS library.
	(rs6000_builtin_vectorized_function): Call it.

	* doc/invoke.texi (RS/6000 and PowerPC Options): Document
	-mveclibabi=mass.

[gcc/testsuite]
2010-08-18  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* gcc.target/powerpc/vsx-mass-1.c: New file, test
	-mveclibabi=mass.
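
The way rs6000_builtin_vectorized_libmass derives the library entry point
from the builtin's name can be sketched outside the compiler as follows.
mass_simd_name is a hypothetical helper, not part of GCC, and it uses a
bounds-checked snprintf where the patch uses strcpy/strcat on a fixed
32-byte buffer.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of GCC: turn a builtin name such as
   "__builtin_powf" into the MASS SIMD name ("powf4").  Double-precision
   functions get the suffix "d2", single-precision ones get "4" because
   their scalar name already ends in "f".  Returns 0 on success, -1 if the
   prefix is missing or the buffer is too small.  */
static int
mass_simd_name (const char *builtin, int is_float, char *buf, size_t len)
{
  static const char prefix[] = "__builtin_";
  const size_t plen = sizeof (prefix) - 1;

  if (strncmp (builtin, prefix, plen) != 0)
    return -1;

  int n = snprintf (buf, len, "%s%s", builtin + plen, is_float ? "4" : "d2");
  return (n < 0 || (size_t) n >= len) ? -1 : 0;
}
```

With this helper "__builtin_pow" maps to "powd2" and "__builtin_powf" maps
to "powf4", matching the comments in the patch.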

Comments

David Edelsohn Aug. 20, 2010, 2:34 p.m. UTC | #1
On Wed, Aug 18, 2010 at 6:04 PM, Michael Meissner
<meissner@linux.vnet.ibm.com> wrote:

>        * config/rs6000/rs6000.opt (-mveclibabi=mass): New option to
>        enable the compiler to autovectorize mathematical functions for
>        power7 using the Mathematical Acceleration Subsystem library.
>
>        * config/rs6000/rs6000.c (rs6000_veclib_handler): New variable to
>        handle which vector math library we have.
>        (rs6000_override_options): Add -mveclibabi=mass support.
>        (rs6000_builtin_vectorized_libmass): New function to handle auto
>        vectorizing math functions that are in the MASS library.
>        (rs6000_builtin_vectorized_function): Call it.
>
>        * doc/invoke.texi (RS/6000 and PowerPC Options): Document
>        -mveclibabi=mass.
>
> [gcc/testsuite]
> 2010-08-18  Michael Meissner  <meissner@linux.vnet.ibm.com>
>
>        * gcc.target/powerpc/vsx-mass-1.c: New file, test
>        -mveclibabi=mass.

Okay.

Thanks, David

Patch

Index: gcc/doc/invoke.texi
===================================================================
--- gcc/doc/invoke.texi	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk)	(revision 163345)
+++ gcc/doc/invoke.texi	(working copy)
@@ -786,7 +786,9 @@  See RS/6000 and PowerPC Options.
 -mprototype  -mno-prototype @gol
 -msim  -mmvme  -mads  -myellowknife  -memb  -msdata @gol
 -msdata=@var{opt}  -mvxworks  -G @var{num}  -pthread @gol
--mrecip -mrecip=@var{opt} -mno-recip -mrecip-precision -mno-recip-precision}
+-mrecip -mrecip=@var{opt} -mno-recip -mrecip-precision
+-mno-recip-precision @gol
+-mveclibabi=@var{type}}
 
 @emph{RX Options}
 @gccoptlist{-m64bit-doubles  -m32bit-doubles  -fpu  -nofpu@gol
@@ -15847,6 +15849,30 @@  automatically selects @option{-mrecip-pr
 precision square root estimate instructions are not generated by
 default on low precision machines, since they do not provide an
 estimate that converges after three steps.
+
+@item -mveclibabi=@var{type}
+@opindex mveclibabi
+Specifies the ABI type to use for vectorizing intrinsics using an
+external library.  The only type supported at present is @code{mass},
+which selects IBM's Mathematical Acceleration Subsystem (MASS)
+libraries.
+GCC will currently emit calls to @code{acosd2}, @code{acosf4},
+@code{acoshd2}, @code{acoshf4}, @code{asind2}, @code{asinf4},
+@code{asinhd2}, @code{asinhf4}, @code{atan2d2}, @code{atan2f4},
+@code{atand2}, @code{atanf4}, @code{atanhd2}, @code{atanhf4},
+@code{cbrtd2}, @code{cbrtf4}, @code{cosd2}, @code{cosf4},
+@code{coshd2}, @code{coshf4}, @code{erfcd2}, @code{erfcf4},
+@code{erfd2}, @code{erff4}, @code{exp2d2}, @code{exp2f4},
+@code{expd2}, @code{expf4}, @code{expm1d2}, @code{expm1f4},
+@code{hypotd2}, @code{hypotf4}, @code{lgammad2}, @code{lgammaf4},
+@code{log10d2}, @code{log10f4}, @code{log1pd2}, @code{log1pf4},
+@code{log2d2}, @code{log2f4}, @code{logd2}, @code{logf4},
+@code{powd2}, @code{powf4}, @code{sind2}, @code{sinf4}, @code{sinhd2},
+@code{sinhf4}, @code{sqrtd2}, @code{sqrtf4}, @code{tand2},
+@code{tanf4}, @code{tanhd2}, and @code{tanhf4} when generating code
+for power7.  Both @option{-ftree-vectorize} and
+@option{-funsafe-math-optimizations} have to be enabled.  The MASS
+libraries will have to be specified at link time.
 @end table
 
 @node RX Options
Index: gcc/testsuite/gcc.target/powerpc/vsx-mass-1.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/vsx-mass-1.c	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk)	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/vsx-mass-1.c	(revision 163355)
@@ -0,0 +1,554 @@ 
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-O3 -ftree-vectorize -mcpu=power7 -ffast-math -mveclibabi=mass" } */
+/* { dg-final { scan-assembler "bl atan2d2" } } */
+/* { dg-final { scan-assembler "bl atan2f4" } } */
+/* { dg-final { scan-assembler "bl hypotd2" } } */
+/* { dg-final { scan-assembler "bl hypotf4" } } */
+/* { dg-final { scan-assembler "bl powd2" } } */
+/* { dg-final { scan-assembler "bl powf4" } } */
+/* { dg-final { scan-assembler "bl acosd2" } } */
+/* { dg-final { scan-assembler "bl acosf4" } } */
+/* { dg-final { scan-assembler "bl acoshd2" } } */
+/* { dg-final { scan-assembler "bl acoshf4" } } */
+/* { dg-final { scan-assembler "bl asind2" } } */
+/* { dg-final { scan-assembler "bl asinf4" } } */
+/* { dg-final { scan-assembler "bl asinhd2" } } */
+/* { dg-final { scan-assembler "bl asinhf4" } } */
+/* { dg-final { scan-assembler "bl atand2" } } */
+/* { dg-final { scan-assembler "bl atanf4" } } */
+/* { dg-final { scan-assembler "bl atanhd2" } } */
+/* { dg-final { scan-assembler "bl atanhf4" } } */
+/* { dg-final { scan-assembler "bl cbrtd2" } } */
+/* { dg-final { scan-assembler "bl cbrtf4" } } */
+/* { dg-final { scan-assembler "bl cosd2" } } */
+/* { dg-final { scan-assembler "bl cosf4" } } */
+/* { dg-final { scan-assembler "bl coshd2" } } */
+/* { dg-final { scan-assembler "bl coshf4" } } */
+/* { dg-final { scan-assembler "bl erfd2" } } */
+/* { dg-final { scan-assembler "bl erff4" } } */
+/* { dg-final { scan-assembler "bl erfcd2" } } */
+/* { dg-final { scan-assembler "bl erfcf4" } } */
+/* { dg-final { scan-assembler "bl exp2d2" } } */
+/* { dg-final { scan-assembler "bl exp2f4" } } */
+/* { dg-final { scan-assembler "bl expd2" } } */
+/* { dg-final { scan-assembler "bl expf4" } } */
+/* { dg-final { scan-assembler "bl expm1d2" } } */
+/* { dg-final { scan-assembler "bl expm1f4" } } */
+/* { dg-final { scan-assembler "bl lgamma" } } */
+/* { dg-final { scan-assembler "bl lgammaf" } } */
+/* { dg-final { scan-assembler "bl log10d2" } } */
+/* { dg-final { scan-assembler "bl log10f4" } } */
+/* { dg-final { scan-assembler "bl log1pd2" } } */
+/* { dg-final { scan-assembler "bl log1pf4" } } */
+/* { dg-final { scan-assembler "bl log2d2" } } */
+/* { dg-final { scan-assembler "bl log2f4" } } */
+/* { dg-final { scan-assembler "bl logd2" } } */
+/* { dg-final { scan-assembler "bl logf4" } } */
+/* { dg-final { scan-assembler "bl sind2" } } */
+/* { dg-final { scan-assembler "bl sinf4" } } */
+/* { dg-final { scan-assembler "bl sinhd2" } } */
+/* { dg-final { scan-assembler "bl sinhf4" } } */
+/* { dg-final { scan-assembler "bl tand2" } } */
+/* { dg-final { scan-assembler "bl tanf4" } } */
+/* { dg-final { scan-assembler "bl tanhd2" } } */
+/* { dg-final { scan-assembler "bl tanhf4" } } */
+
+#ifndef SIZE
+#define SIZE 1024
+#endif
+
+double d1[SIZE] __attribute__((__aligned__(32)));
+double d2[SIZE] __attribute__((__aligned__(32)));
+double d3[SIZE] __attribute__((__aligned__(32)));
+
+float f1[SIZE] __attribute__((__aligned__(32)));
+float f2[SIZE] __attribute__((__aligned__(32)));
+float f3[SIZE] __attribute__((__aligned__(32)));
+
+void
+test_double_atan2 (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_atan2 (d2[i], d3[i]);
+}
+
+void
+test_float_atan2 (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_atan2f (f2[i], f3[i]);
+}
+
+void
+test_double_hypot (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_hypot (d2[i], d3[i]);
+}
+
+void
+test_float_hypot (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_hypotf (f2[i], f3[i]);
+}
+
+void
+test_double_pow (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_pow (d2[i], d3[i]);
+}
+
+void
+test_float_pow (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_powf (f2[i], f3[i]);
+}
+
+void
+test_double_acos (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_acos (d2[i]);
+}
+
+void
+test_float_acos (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_acosf (f2[i]);
+}
+
+void
+test_double_acosh (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_acosh (d2[i]);
+}
+
+void
+test_float_acosh (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_acoshf (f2[i]);
+}
+
+void
+test_double_asin (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_asin (d2[i]);
+}
+
+void
+test_float_asin (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_asinf (f2[i]);
+}
+
+void
+test_double_asinh (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_asinh (d2[i]);
+}
+
+void
+test_float_asinh (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_asinhf (f2[i]);
+}
+
+void
+test_double_atan (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_atan (d2[i]);
+}
+
+void
+test_float_atan (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_atanf (f2[i]);
+}
+
+void
+test_double_atanh (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_atanh (d2[i]);
+}
+
+void
+test_float_atanh (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_atanhf (f2[i]);
+}
+
+void
+test_double_cbrt (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_cbrt (d2[i]);
+}
+
+void
+test_float_cbrt (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_cbrtf (f2[i]);
+}
+
+void
+test_double_cos (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_cos (d2[i]);
+}
+
+void
+test_float_cos (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_cosf (f2[i]);
+}
+
+void
+test_double_cosh (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_cosh (d2[i]);
+}
+
+void
+test_float_cosh (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_coshf (f2[i]);
+}
+
+void
+test_double_erf (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_erf (d2[i]);
+}
+
+void
+test_float_erf (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_erff (f2[i]);
+}
+
+void
+test_double_erfc (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_erfc (d2[i]);
+}
+
+void
+test_float_erfc (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_erfcf (f2[i]);
+}
+
+void
+test_double_exp2 (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_exp2 (d2[i]);
+}
+
+void
+test_float_exp2 (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_exp2f (f2[i]);
+}
+
+void
+test_double_exp (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_exp (d2[i]);
+}
+
+void
+test_float_exp (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_expf (f2[i]);
+}
+
+void
+test_double_expm1 (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_expm1 (d2[i]);
+}
+
+void
+test_float_expm1 (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_expm1f (f2[i]);
+}
+
+void
+test_double_lgamma (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_lgamma (d2[i]);
+}
+
+void
+test_float_lgamma (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_lgammaf (f2[i]);
+}
+
+void
+test_double_log10 (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_log10 (d2[i]);
+}
+
+void
+test_float_log10 (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_log10f (f2[i]);
+}
+
+void
+test_double_log1p (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_log1p (d2[i]);
+}
+
+void
+test_float_log1p (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_log1pf (f2[i]);
+}
+
+void
+test_double_log2 (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_log2 (d2[i]);
+}
+
+void
+test_float_log2 (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_log2f (f2[i]);
+}
+
+void
+test_double_log (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_log (d2[i]);
+}
+
+void
+test_float_log (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_logf (f2[i]);
+}
+
+void
+test_double_sin (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_sin (d2[i]);
+}
+
+void
+test_float_sin (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_sinf (f2[i]);
+}
+
+void
+test_double_sinh (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_sinh (d2[i]);
+}
+
+void
+test_float_sinh (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_sinhf (f2[i]);
+}
+
+void
+test_double_sqrt (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_sqrt (d2[i]);
+}
+
+void
+test_float_sqrt (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_sqrtf (f2[i]);
+}
+
+void
+test_double_tan (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_tan (d2[i]);
+}
+
+void
+test_float_tan (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_tanf (f2[i]);
+}
+
+void
+test_double_tanh (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_tanh (d2[i]);
+}
+
+void
+test_float_tanh (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_tanhf (f2[i]);
+}
Index: gcc/config/rs6000/rs6000.opt
===================================================================
--- gcc/config/rs6000/rs6000.opt	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk)	(revision 163345)
+++ gcc/config/rs6000/rs6000.opt	(working copy)
@@ -115,6 +115,10 @@  mpopcntd
 Target Report Mask(POPCNTD)
 Use PowerPC V2.06 popcntd instruction
 
+mveclibabi=
+Target RejectNegative Joined Var(rs6000_veclibabi_name)
+Vector library ABI to use
+
 mvsx
 Target Report Mask(VSX)
 Use vector/scalar (VSX) instructions
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk)	(revision 163345)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -949,6 +949,9 @@  static const enum rs6000_btc builtin_cla
 #undef RS6000_BUILTIN
 #undef RS6000_BUILTIN_EQUATE
 
+/* Support for -mveclibabi=<xxx> to control which vector library to use.  */
+static tree (*rs6000_veclib_handler) (tree, tree, tree);
+
 
 static bool rs6000_function_ok_for_sibcall (tree, tree);
 static const char *rs6000_invalid_within_doloop (const_rtx);
@@ -989,6 +992,7 @@  static rtx rs6000_emit_stack_reset (rs60
 static rtx rs6000_make_savres_rtx (rs6000_stack_t *, rtx, int,
 				   enum machine_mode, bool, bool, bool);
 static bool rs6000_reg_live_or_pic_offset_p (int);
+static tree rs6000_builtin_vectorized_libmass (tree, tree, tree);
 static tree rs6000_builtin_vectorized_function (tree, tree, tree);
 static int rs6000_savres_strategy (rs6000_stack_t *, bool, int, int);
 static void rs6000_restore_saved_cr (rtx, int);
@@ -2771,6 +2775,15 @@  rs6000_override_options (const char *def
 	       rs6000_traceback_name);
     }
 
+  if (rs6000_veclibabi_name)
+    {
+      if (strcmp (rs6000_veclibabi_name, "mass") == 0)
+	rs6000_veclib_handler = rs6000_builtin_vectorized_libmass;
+      else
+	error ("unknown vectorization library ABI type (%s) for "
+	       "-mveclibabi= switch", rs6000_veclibabi_name);
+    }
+
   if (!rs6000_explicit_options.long_double)
     rs6000_long_double_type_size = RS6000_DEFAULT_LONG_DOUBLE_SIZE;
 
@@ -3602,6 +3615,145 @@  rs6000_parse_fpu_option (const char *opt
   return FPU_NONE;
 }
 
+
+/* Handler for the Mathematical Acceleration Subsystem (mass) interface to a
+   library with vectorized intrinsics.  */
+
+static tree
+rs6000_builtin_vectorized_libmass (tree fndecl, tree type_out, tree type_in)
+{
+  char name[32];
+  const char *suffix = NULL;
+  tree fntype, new_fndecl, bdecl = NULL_TREE;
+  int n_args = 1;
+  const char *bname;
+  enum machine_mode el_mode, in_mode;
+  int n, in_n;
+
+  /* Libmass is suitable for unsafe math only as it does not correctly support
+     parts of IEEE with the required precision such as denormals.  Only support
+     it if we have VSX to use the simd d2 or f4 functions.
+     XXX: Add variable length support.  */
+  if (!flag_unsafe_math_optimizations || !TARGET_VSX)
+    return NULL_TREE;
+
+  el_mode = TYPE_MODE (TREE_TYPE (type_out));
+  n = TYPE_VECTOR_SUBPARTS (type_out);
+  in_mode = TYPE_MODE (TREE_TYPE (type_in));
+  in_n = TYPE_VECTOR_SUBPARTS (type_in);
+  if (el_mode != in_mode
+      || n != in_n)
+    return NULL_TREE;
+
+  if (DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_NORMAL)
+    {
+      enum built_in_function fn = DECL_FUNCTION_CODE (fndecl);
+      switch (fn)
+	{
+	case BUILT_IN_ATAN2:
+	case BUILT_IN_HYPOT:
+	case BUILT_IN_POW:
+	  n_args = 2;
+	  /* fall through */
+
+	case BUILT_IN_ACOS:
+	case BUILT_IN_ACOSH:
+	case BUILT_IN_ASIN:
+	case BUILT_IN_ASINH:
+	case BUILT_IN_ATAN:
+	case BUILT_IN_ATANH:
+	case BUILT_IN_CBRT:
+	case BUILT_IN_COS:
+	case BUILT_IN_COSH:
+	case BUILT_IN_ERF:
+	case BUILT_IN_ERFC:
+	case BUILT_IN_EXP2:
+	case BUILT_IN_EXP:
+	case BUILT_IN_EXPM1:
+	case BUILT_IN_LGAMMA:
+	case BUILT_IN_LOG10:
+	case BUILT_IN_LOG1P:
+	case BUILT_IN_LOG2:
+	case BUILT_IN_LOG:
+	case BUILT_IN_SIN:
+	case BUILT_IN_SINH:
+	case BUILT_IN_SQRT:
+	case BUILT_IN_TAN:
+	case BUILT_IN_TANH:
+	  bdecl = implicit_built_in_decls[fn];
+	  suffix = "d2";				/* pow -> powd2 */
+	  if (el_mode != DFmode
+	      || n != 2)
+	    return NULL_TREE;
+	  break;
+
+	case BUILT_IN_ATAN2F:
+	case BUILT_IN_HYPOTF:
+	case BUILT_IN_POWF:
+	  n_args = 2;
+	  /* fall through */
+
+	case BUILT_IN_ACOSF:
+	case BUILT_IN_ACOSHF:
+	case BUILT_IN_ASINF:
+	case BUILT_IN_ASINHF:
+	case BUILT_IN_ATANF:
+	case BUILT_IN_ATANHF:
+	case BUILT_IN_CBRTF:
+	case BUILT_IN_COSF:
+	case BUILT_IN_COSHF:
+	case BUILT_IN_ERFF:
+	case BUILT_IN_ERFCF:
+	case BUILT_IN_EXP2F:
+	case BUILT_IN_EXPF:
+	case BUILT_IN_EXPM1F:
+	case BUILT_IN_LGAMMAF:
+	case BUILT_IN_LOG10F:
+	case BUILT_IN_LOG1PF:
+	case BUILT_IN_LOG2F:
+	case BUILT_IN_LOGF:
+	case BUILT_IN_SINF:
+	case BUILT_IN_SINHF:
+	case BUILT_IN_SQRTF:
+	case BUILT_IN_TANF:
+	case BUILT_IN_TANHF:
+	  bdecl = implicit_built_in_decls[fn];
+	  suffix = "4";					/* powf -> powf4 */
+	  if (el_mode != SFmode
+	      || n != 4)
+	    return NULL_TREE;
+	  break;
+
+	default:
+	  return NULL_TREE;
+	}
+    }
+  else
+    return NULL_TREE;
+
+  gcc_assert (suffix != NULL);
+  bname = IDENTIFIER_POINTER (DECL_NAME (bdecl));
+  strcpy (name, bname + sizeof ("__builtin_") - 1);
+  strcat (name, suffix);
+
+  if (n_args == 1)
+    fntype = build_function_type_list (type_out, type_in, NULL);
+  else if (n_args == 2)
+    fntype = build_function_type_list (type_out, type_in, type_in, NULL);
+  else
+    gcc_unreachable ();
+
+  /* Build a function declaration for the vectorized function.  */
+  new_fndecl = build_decl (BUILTINS_LOCATION,
+			   FUNCTION_DECL, get_identifier (name), fntype);
+  TREE_PUBLIC (new_fndecl) = 1;
+  DECL_EXTERNAL (new_fndecl) = 1;
+  DECL_IS_NOVOPS (new_fndecl) = 1;
+  TREE_READONLY (new_fndecl) = 1;
+
+  return new_fndecl;
+}
+
 /* Returns a function decl for a vectorized version of the builtin function
    with builtin function code FN and the result vector type TYPE, or NULL_TREE
    if it is not available.  */
@@ -3768,6 +3920,10 @@  rs6000_builtin_vectorized_function (tree
 	}
     }
 
+  /* Generate calls to libmass if appropriate.  */
+  if (rs6000_veclib_handler)
+    return rs6000_veclib_handler (fndecl, type_out, type_in);
+
   return NULL_TREE;
 }
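
The option handling above comes down to selecting a handler through a
function pointer and consulting it as a last resort.  A stand-alone sketch
of that pattern, with strings standing in for GCC trees, might look like
this.  All names here (veclib_handler_fn, libmass_handler, select_veclib,
vectorized_function) are made up for illustration, and the two-entry table
is a toy.

```c
#include <stddef.h>
#include <string.h>

/* A handler maps a scalar function name to a vector routine name, or
   returns NULL if it has nothing to offer.  */
typedef const char *(*veclib_handler_fn) (const char *scalar_name);

/* Toy stand-in for the MASS handler: knows only two functions.  */
static const char *
libmass_handler (const char *scalar_name)
{
  if (strcmp (scalar_name, "sin") == 0)
    return "sind2";
  if (strcmp (scalar_name, "cos") == 0)
    return "cosd2";
  return NULL;
}

/* Stays NULL unless a vector library ABI was selected.  */
static veclib_handler_fn veclib_handler;

/* Mirrors the option check: accept "mass", reject anything else.
   Returns 0 on success, -1 for an unknown ABI name.  */
static int
select_veclib (const char *abi_name)
{
  if (abi_name == NULL)
    return 0;
  if (strcmp (abi_name, "mass") == 0)
    {
      veclib_handler = libmass_handler;
      return 0;
    }
  return -1;
}

/* Mirrors the tail of the lookup: fall back to the library handler only
   if one was installed.  */
static const char *
vectorized_function (const char *scalar_name)
{
  if (veclib_handler)
    return veclib_handler (scalar_name);
  return NULL;
}
```

Keeping the handler behind a pointer means a second library could later be
supported by adding another strcmp branch, without touching the lookup.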