
Aarch64: Add simd exp/expf functions

Message ID ea8e4b6e1ebe5eddb9e72dc1a21baad50f8e6fcf.camel@marvell.com
State New
Series Aarch64: Add simd exp/expf functions

Commit Message

Steve Ellcey March 6, 2019, 5:18 p.m. UTC
Here are float and double vector exp functions for Aarch64.  The vector
functions are based on the ieee ones in sysdeps/ieee754/flt-32/e_expf.c
and sysdeps/ieee754/dbl-64/e_exp.c.  If any of the values are 'large'
or NaN they actually call the scalar routines, otherwise they use the
Aarch64 SIMD instructions with the same algorithm as the ieee functions.
My testing has not found any differences in exp output for scalar vs.
vector and the newly added tests for the vector routines pass using the
updated libm-test-ulps file.

This patch also sets build_mathvec to yes by default on Aarch64, applies
the simd attribute to exp and expf in the C header and includes a
Fortran header.  The Fortran header is in finclude so this patch needs
Martin Liska's patch that moves math-vector-fortran.h from bits to
finclude in order to work correctly.

Comments?

Steve Ellcey
sellcey@marvell.com


2019-03-06  Steve Ellcey  <sellcey@marvell.com>

	* sysdeps/aarch64/configure.ac (build_mathvec): Set to yes by default.
	* sysdeps/aarch64/configure: Regenerate.
	* sysdeps/aarch64/fpu/Makefile (CFLAGS-libmvec_double_vlen2_exp.c):
	Set flag.
	(CFLAGS-libmvec_float_vlen4_expf.c): Likewise.
	(CFLAGS-libmvec_exp_data.c): Likewise.
	(CFLAGS-libmvec_exp2f_data.c): Likewise.
	(libmvec-support): Add libmvec_double_vlen2_exp,
	libmvec_float_vlen4_expf, libmvec_exp_data, libmvec_exp2f_data to list.
	(libmvec-static-only-routines): Add dummy name to list.
	(libmvec-tests): Add double-vlen2, float-vlen4 to list.
	(double-vlen2-funcs): Add new vector function name.
	(float-vlen4-funcs): Add new vector function name.
	* sysdeps/aarch64/fpu/Versions: New file.
	* sysdeps/aarch64/fpu/bits/math-vector.h: New file.
	* sysdeps/aarch64/fpu/finclude/math-vector-fortran.h: New file.
	* sysdeps/aarch64/fpu/libmvec_double_vlen2_exp.c: New file.
	* sysdeps/aarch64/fpu/libmvec_exp2f_data.c: New file.
	* sysdeps/aarch64/fpu/libmvec_exp_data.c: New file.
	* sysdeps/aarch64/fpu/libmvec_float_vlen4_expf.c: New file.
	* sysdeps/aarch64/fpu/libmvec_util.h: New file.
	* sysdeps/aarch64/fpu/test-double-vlen2-wrappers.c: New file.
	* sysdeps/aarch64/fpu/test-float-vlen4-wrappers.c: New file.
	* sysdeps/aarch64/libm-test-ulps (exp_vlen2): New entry.
	(exp_vlen4): Likewise.
	* sysdeps/unix/sysv/linux/aarch64/libmvec.abilist: New file.

Comments

Szabolcs Nagy March 6, 2019, 7:04 p.m. UTC | #1
On 06/03/2019 17:18, Steve Ellcey wrote:
> Here are float and double vector exp functions for Aarch64.  The vector
> functions are based on the ieee ones in sysdeps/ieee754/flt-32/e_expf.c
> and sysdeps/ieee754/dbl-64/e_exp.c.  If any of the values are 'large'
> or NaN they actually call the scalar routines, otherwise they use the
> Aarch64 SIMD instructions with the same algorithm as the ieee functions.
> My testing has not found any differences in exp output for scalar vs.
> vector and the newly added tests for the vector routines pass using the
> updated libm-test-ulps file.
> 
> This patch also sets build_mathvec to yes by default on Aarch64, applies
> the simd attribute to exp and expf in the C header and includes a
> Fortran header.  The Fortran header is in finclude so this patch needs
> Martin Liska's patch that moves math-vector-fortran.h from bits to
> finclude in order to work correctly.
> 
> Comments?

thanks

this will need to detect support for
__attribute__((aarch64_vector_pcs))
(which will require gcc-9)

and i plan to fix the lazy binding issue
with vector pcs which will require a new
binutils too (currently that's not super
important since the dynamic linker is
unlikely to use fpregs outside of v0-v7,
but depending on the exact nature of the
solution we may require a new gcc and
new binutils too for libmvec)

the scalar algorithms are not optimal for
simd, but should work and i'm fine with
such initial code to enable libmvec and
then optimize it later.
Steve Ellcey March 6, 2019, 7:14 p.m. UTC | #2
On Wed, 2019-03-06 at 19:04 +0000, Szabolcs Nagy wrote:
> 
> this will need to detect support for
> __attribute__((aarch64_vector_pcs))
> (which will require gcc-9)

That seems easy enough to check for.

> and i plan to fix the lazy binding issue
> with vector pcs which will require a new
> binutils too (currently that's not super
> important since the dynamic linker is
> unlikely to use fpregs outside of v0-v7,
> but depending on the exact nature of the
> solution we may require a new gcc and
> new binutils too for libmvec)

I am not sure how I would check for this.
Will it need to be a version check on binutils
or will there be some functionality that can
be checked for?  A new fixup type?  Do you
have an estimate for when the binutils change
will go in?

Steve Ellcey
sellcey@marvell.com
Florian Weimer March 6, 2019, 7:16 p.m. UTC | #3
* Steve Ellcey:

> On Wed, 2019-03-06 at 19:04 +0000, Szabolcs Nagy wrote:
>> 
>> this will need to detect support for
>> __attribute__((aarch64_vector_pcs))
>> (which will require gcc-9)
>
> That seems easy enough to check for.

Can you add assembler trampolines, so that the compiler support
becomes optional, at a performance cost?

>> and i plan to fix the lazy binding issue
>> with vector pcs which will require a new
>> binutils too (currently that's not super
>> important since the dynamic linker is
>> unlikely to use fpregs outside of v0-v7,
>> but depending on the exact nature of the
>> solution we may require a new gcc and
>> new binutils too for libmvec)
>
> I am not sure how I would check for this.
> Will it need to be a version check on binutils
> or will there be some functionality that can
> be checked for?  A new fixup type?  Do you
> have an estimate for when the binutils change
> will go in?

I don't think the binutils change is needed for building or testing
glibc, at least not initially.  Just disable lazy binding.
Steve Ellcey March 6, 2019, 7:39 p.m. UTC | #4
On Wed, 2019-03-06 at 20:16 +0100, Florian Weimer wrote:

> * Steve Ellcey:
> 
> > On Wed, 2019-03-06 at 19:04 +0000, Szabolcs Nagy wrote:
> > > 
> > > this will need to detect support for
> > > __attribute__((aarch64_vector_pcs))
> > > (which will require gcc-9)
> > 
> > That seems easy enough to check for.
> 
> Can you add assembler trampolines, so that the compiler support
> becomes optional, at a performance cost?

Yuck.  I suppose this is possible, but I do not want to do it.
The whole reason for vector functions (and for the new vector ABI)
is performance so adding a slow path doesn't seem to me like it is
worthwhile.

Steve Ellcey
sellcey@marvell.com
Florian Weimer March 6, 2019, 7:45 p.m. UTC | #5
* Steve Ellcey:

> On Wed, 2019-03-06 at 20:16 +0100, Florian Weimer wrote:
>
>> * Steve Ellcey:
>> 
>> > On Wed, 2019-03-06 at 19:04 +0000, Szabolcs Nagy wrote:
>> > > 
>> > > this will need to detect support for
>> > > __attribute__((aarch64_vector_pcs))
>> > > (which will require gcc-9)
>> > 
>> > That seems easy enough to check for.
>> 
>> Can you add assembler trampolines, so that the compiler support
>> becomes optional, at a performance cost?
>
> Yuck.  I suppose this is possible, but I do not want to do it.
> The whole reason for vector functions (and for the new vector ABI)
> is performance so adding a slow path doesn't seem to me like it is
> worthwhile.

On the other hand, it could help to get libmvec out of the door more
quickly.  I think it's not ideal that if you use an older compiler,
you get only a subset of the glibc ABI.  We can get away with it here
because it affects an entire soname.  Still it might be difficult to
explain why applications are not portable.
Steve Ellcey March 6, 2019, 8:54 p.m. UTC | #6
On Wed, 2019-03-06 at 20:45 +0100, Florian Weimer wrote:

> > > Can you add assembler trampolines, so that the compiler support
> > > becomes optional, at a performance cost?
> > 
> > Yuck.  I suppose this is possible, but I do not want to do it.
> > The whole reason for vector functions (and for the new vector ABI)
> > is performance so adding a slow path doesn't seem to me like it is
> > worthwhile.
> 
> On the other hand, it could help to get libmvec out of the door more
> quickly.  I think it's not ideal that if you use an older compiler,
> you get only a subset of the glibc ABI.  We can get away with it here
> because it affects an entire soname.  Still it might be difficult to
> explain why applications are not portable.

If the user doesn't have gcc-9, their compiler isn't going to generate
any calls to these routines anyway.  So it doesn't really matter if 
they have libmvec or not if they don't have gcc-9.  If a program was
compiled with gcc-9 somewhere else and then moved, then yes the new
platform might not have libmvec and there will be portability problems.

I guess if someone was building a platform with gcc-8 and the latest
glibc then it might be nice if libmvec could be built, but gcc-9 should
be released before the next glibc is released so hopefully anyone using
the latest released glibc will also use the latest gcc and have all the
necessary compiler support.

Steve Ellcey
sellcey@marvell.com
Szabolcs Nagy March 7, 2019, 10:28 a.m. UTC | #7
On 06/03/2019 20:54, Steve Ellcey wrote:
> On Wed, 2019-03-06 at 20:45 +0100, Florian Weimer wrote:
> 
>>>> Can you add assembler trampolines, so that the compiler support
>>>> becomes optional, at a performance cost?
>>>
>>> Yuck.  I suppose this is possible, but I do not want to do it.
>>> The whole reason for vector functions (and for the new vector ABI)
>>> is performance so adding a slow path doesn't seem to me like it is
>>> worthwhile.
>>
>> On the other hand, it could help to get libmvec out of the door more
>> quickly.  I think it's not ideal that if you use an older compiler,
>> you get only a subset of the glibc ABI.  We can get away with it here
>> because it affects an entire soname.  Still it might be difficult to
>> explain why applications are not portable.
> 
> If the user doesn't have gcc-9, their compiler isn't going to generate
> any calls to these routines anyway.  So it doesn't really matter if 
> they have libmvec or not if they don't have gcc-9.  If a program was
> compiled with gcc-9 somewhere else and then moved, then yes the new
> platform might not have libmvec and there will be portability problems.

glibc is probably built with a stable distro gcc, but then the user
may use a trunk gcc to compile code.

of course with trampolines vector math functions may not be worth
calling at all, so it's not clear if having a libmvec with trampolines is
useful other than allowing the glibc abi to be independent of the gcc
version used to compile it.

On 06/03/2019 19:16, Florian Weimer wrote:
> I don't think the binutils change is needed for building or testing
> glibc, at least not initially.  Just disable lazy binding.

in principle libmvec dso as well as anything that references vector
pcs symbols would need to be linked with -z now, and even that's
not enough if we ever want to support LD_AUDIT (which is like
permanent lazy binding). i originally thought i can fix this up
with some simple hack, but it will need a bigger change across the
toolchain, i hope i can post some patches soon and then we can
discuss what to do in glibc.

On 06/03/2019 17:18, Steve Ellcey wrote:
> +  g = __builtin_aarch64_absv2df (x);
> +  h = __builtin_aarch64_reduc_smax_scal_v2df (g);

please use arm_neon.h intrinsics instead of __builtin_aarch64_*, these
are not documented gcc apis, so they may change.
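For instance, the two builtins quoted above correspond to the documented `vabsq_f64` and `vmaxvq_f64` intrinsics. A guarded sketch follows; the portable branch is a stand-in so the fragment also builds on non-AArch64 hosts, and its NaN behavior differs from the real FMAX-based reduction.

```c
#include <assert.h>
#include <math.h>

#if defined (__aarch64__)
# include <arm_neon.h>
/* arm_neon.h equivalents of __builtin_aarch64_absv2df and
   __builtin_aarch64_reduc_smax_scal_v2df.  */
static double
max_abs_lane (const double x[2])
{
  float64x2_t v = vld1q_f64 (x);
  return vmaxvq_f64 (vabsq_f64 (v));
}
#else
/* Portable stand-in for illustration on other hosts (unlike the
   intrinsic, this does not propagate NaN).  */
static double
max_abs_lane (const double x[2])
{
  double a0 = fabs (x[0]), a1 = fabs (x[1]);
  return a0 > a1 ? a0 : a1;
}
#endif
```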
Joseph Myers March 7, 2019, 3:08 p.m. UTC | #8
The following comments are mostly on issues also raised for other 
architectures, so reading the discussions of both the powerpc patches and 
the x86_64 patches is encouraged.

1. The commit message needs to reference the specification of the ABI 
being implemented, and give confirmation of this having been agreed among 
all relevant parties, and give details of the GCC version implementing the 
ABI.  (The ABI document should be clear on exactly what function variants 
the pragma / attributes mean should be available.  If you wish to add 
other variants in future, e.g. SVE variants, those will need to use a 
*different* pragma / attribute, to avoid new compilers misinterpreting the 
headers from old glibc as meaning the SVE variants are available.)

2. There needs to be a NEWS entry describing the new user-visible feature 
and also giving details of the GCC version with support.

3. There should not be any _finite aliases exported from the shared 
library; rather, use static-only wrappers as on x86_64.  Or fix the 
underlying GCC issue to allow the asm name used as a basis for vector 
function names to be different from that used as a scalar function name; 
see <https://gcc.gnu.org/ml/gcc/2015-06/msg00173.html>.

4. There are formatting issues in this code, including missing spaces 
before '(' and incorrect indentation.

5. Give details (including test programs) of how you tested that the 
functions do work, with an installed glibc and new-enough GCC, for 
vectorized calls resulting from source code calling the scalar functions, 
which the glibc testsuite doesn't cover.  This is important end-to-end 
validation that the ABI is as intended; the lack of it for x86_64 resulted 
in sincos ABI issues only being found later.

6. What does if('aarch64') in the Fortran header mean?  Why do you need 
it at all?  The installed header should work for all AArch64 ABIs (so 
currently BE and LE); it's not expected to work for other architectures.

7. "#ifdef BIG_ENDIAN" is not a valid conditional.  The endian.h header 
defines both BIG_ENDIAN and LITTLE_ENDIAN, and then defines BYTE_ORDER to 
one of those.  Does libmvec_util.h get an implicit include of endian.h 
somewhere (so you always get the BE path, which somehow works on LE, 
indicating test coverage issues that should be resolved, preferably 
through automated tests but failing that please describe in the commit 
message how you tested that the endian conditionals were correct), or have 
you only tested for little-endian which worked because of the macro being 
accidentally undefined?

8. Please confirm in the commit message how testing was run for both BE 
and LE, given the presence of such conditionals.
Joseph Myers March 7, 2019, 7:04 p.m. UTC | #9
On Wed, 6 Mar 2019, Florian Weimer wrote:

> On the other hand, it could help to get libmvec out of the door more
> quickly.  I think it's not ideal that if you use an older compiler,
> you get only a subset of the glibc ABI.  We can get away with it here
> because it affects an entire soname.  Still it might be difficult to
> explain why applications are not portable.

On the whole I think I agree with Rich Felker's argument 
<https://sourceware.org/ml/libc-alpha/2015-11/msg00184.html> against 
having the presence of libmvec depend on the tools used for the build.  
(Note that the installed bits/math-vector.h file, which may be shared 
between multilibs, does not depend on the tools used, so if libmvec was 
disabled then the installed bits/math-vector.h is not actually correct and 
some programs will fail to build.)

This is an argument for removing the --disable-mathvec configure option as 
well as either having assembly wrappers or a requirement for new-enough 
tool versions for building libmvec functions on platforms where the oldest 
supported GCC / binutils aren't new enough.
Szabolcs Nagy March 8, 2019, 9:10 a.m. UTC | #10
On 07/03/2019 19:04, Joseph Myers wrote:
> On Wed, 6 Mar 2019, Florian Weimer wrote:
> 
>> On the other hand, it could help to get libmvec out of the door more
>> quickly.  I think it's not ideal that if you use an older compiler,
>> you get only a subset of the glibc ABI.  We can get away with it here
>> because it affects an entire soname.  Still it might be difficult to
>> explain why applications are not portable.
> 
> On the whole I think I agree with Rich Felker's argument 
> <https://sourceware.org/ml/libc-alpha/2015-11/msg00184.html> against 
> having the presence of libmvec depend on the tools used for the build.  
> (Note that the installed bits/math-vector.h file, which may be shared 
> between multilibs, does not depend on the tools used, so if libmvec was 
> disabled then the installed bits/math-vector.h is not actually correct and 
> some programs will fail to build.)
> 
> This is an argument for removing the --disable-mathvec configure option as 
> well as either having assembly wrappers or a requirement for new-enough 
> tool versions for building libmvec functions on platforms where the oldest 
> supported GCC / binutils aren't new enough.

so is it acceptable to submit generated asm to the
source tree together with the c source?

(or even object files if the assembler is not new
enough?)
Florian Weimer March 8, 2019, 10:11 a.m. UTC | #11
* Szabolcs Nagy:

> On 07/03/2019 19:04, Joseph Myers wrote:
>> On Wed, 6 Mar 2019, Florian Weimer wrote:
>> 
>>> On the other hand, it could help to get libmvec out of the door more
>>> quickly.  I think it's not ideal that if you use an older compiler,
>>> you get only a subset of the glibc ABI.  We can get away with it here
>>> because it affects an entire soname.  Still it might be difficult to
>>> explain why applications are not portable.
>> 
>> On the whole I think I agree with Rich Felker's argument 
>> <https://sourceware.org/ml/libc-alpha/2015-11/msg00184.html> against 
>> having the presence of libmvec depend on the tools used for the build.  
>> (Note that the installed bits/math-vector.h file, which may be shared 
>> between multilibs, does not depend on the tools used, so if libmvec was 
>> disabled then the installed bits/math-vector.h is not actually correct and 
>> some programs will fail to build.)
>> 
>> This is an argument for removing the --disable-mathvec configure option as 
>> well as either having assembly wrappers or a requirement for new-enough 
>> tool versions for building libmvec functions on platforms where the oldest 
>> supported GCC / binutils aren't new enough.
>
> so is it acceptable to submit generated asm to the
> source tree together with the c source?

No, before we do that, I think we should just require GCC 9 and binutils
2.33 for building aarch64.

I had the hope that you could build a compatible ABI with just a few
assembler trampolines, but that's not the case if the DSOs need markers
for disabling lazy binding in client code.  (But it is probably more
natural to disable lazy binding through function attributes in the header
file.)

Thanks,
Florian
Joseph Myers March 8, 2019, 11:16 p.m. UTC | #12
On Fri, 8 Mar 2019, Florian Weimer wrote:

> > so is it acceptable to submit generated asm to the
> > source tree together with the c source?
> 
> No, before we do that, I think we should just require GCC 9 and binutils
> 2.33 for building aarch64.

I'm dubious of requiring unreleased versions (for an architecture that 
previously worked with released versions), but given suitable releases, 
requiring recent releases for a given architecture may well be appropriate 
if it's required for some feature the architecture maintainers want to 
support now rather than later.  (Cf. how we set the required version to 
6.2 for powerpc64le to facilitate the work towards IEEE long double 
support.)
Szabolcs Nagy March 18, 2019, 4:52 p.m. UTC | #13
On 07/03/2019 15:08, Joseph Myers wrote:
> 1. The commit message needs to reference the specification of the ABI 
> being implemented, and give confirmation of this having been agreed among 
> all relevant parties, and give details of the GCC version implementing the 
> ABI.  (The ABI document should be clear on exactly what function variants 
> the pragma / attributes mean should be available.  If you wish to add 
> other variants in future, e.g. SVE variants, those will need to use a 
> *different* pragma / attribute, to avoid new compilers misinterpreting the 
> headers from old glibc as meaning the SVE variants are available.)

the next revision of the vector abi document will try to address this.
(might need some gcc changes)

Patch

diff --git a/sysdeps/aarch64/configure.ac b/sysdeps/aarch64/configure.ac
index 7851dd4..c6d9646 100644
--- a/sysdeps/aarch64/configure.ac
+++ b/sysdeps/aarch64/configure.ac
@@ -20,3 +20,7 @@  if test $libc_cv_aarch64_be = yes; then
 else
   LIBC_CONFIG_VAR([default-abi], [lp64])
 fi
+
+if test x"$build_mathvec" = xnotset; then
+  build_mathvec=yes
+fi
diff --git a/sysdeps/aarch64/fpu/Makefile b/sysdeps/aarch64/fpu/Makefile
index 4a182bd..579b6a5 100644
--- a/sysdeps/aarch64/fpu/Makefile
+++ b/sysdeps/aarch64/fpu/Makefile
@@ -12,3 +12,27 @@  CFLAGS-s_fmaxf.c += -ffinite-math-only
 CFLAGS-s_fmin.c += -ffinite-math-only
 CFLAGS-s_fminf.c += -ffinite-math-only
 endif
+
+ifeq ($(subdir),mathvec)
+CFLAGS-libmvec_double_vlen2_exp.c += -march=armv8-a+simd -fno-math-errno
+CFLAGS-libmvec_float_vlen4_expf.c += -march=armv8-a+simd -fno-math-errno
+CFLAGS-libmvec_exp_data.c += -march=armv8-a+simd -fno-math-errno
+CFLAGS-libmvec_exp2f_data.c += -march=armv8-a+simd -fno-math-errno
+
+libmvec-support += libmvec_double_vlen2_exp
+libmvec-support += libmvec_float_vlen4_expf
+libmvec-support += libmvec_exp_data
+libmvec-support += libmvec_exp2f_data
+
+# If I do not add a static routine I do not get libmvec_nonshared.a
+# installed and GCC will fail to link when it cannot find it.
+libmvec-static-only-routines += libmvec_dummy
+endif
+
+ifeq ($(subdir),math)
+ifeq ($(build-mathvec),yes)
+libmvec-tests += double-vlen2 float-vlen4
+double-vlen2-funcs = exp
+float-vlen4-funcs = exp
+endif
+endif
diff --git a/sysdeps/aarch64/fpu/Versions b/sysdeps/aarch64/fpu/Versions
index e69de29..9fe90ba 100644
--- a/sysdeps/aarch64/fpu/Versions
+++ b/sysdeps/aarch64/fpu/Versions
@@ -0,0 +1,5 @@ 
+libmvec {
+  GLIBC_2.30 {
+    _ZGVnN2v___exp_finite; _ZGVnN2v_exp; _ZGVnN4v___expf_finite; _ZGVnN4v_expf;
+  }
+}
diff --git a/sysdeps/aarch64/fpu/bits/math-vector.h b/sysdeps/aarch64/fpu/bits/math-vector.h
index e69de29..4c34159 100644
--- a/sysdeps/aarch64/fpu/bits/math-vector.h
+++ b/sysdeps/aarch64/fpu/bits/math-vector.h
@@ -0,0 +1,43 @@ 
+/* Platform-specific SIMD declarations of math functions.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _MATH_H
+# error "Never include <bits/math-vector.h> directly;\
+ include <math.h> instead."
+#endif
+
+/* Get default empty definitions for simd declarations.  */
+#include <bits/libm-simd-decl-stubs.h>
+
+#if defined __FAST_MATH__
+# if defined _OPENMP && _OPENMP >= 201307
+/* OpenMP case.  */
+#  define __DECL_SIMD_AARCH64 _Pragma ("omp declare simd notinbranch")
+# elif __GNUC_PREREQ (6,0)
+/* W/o OpenMP use GCC 6.* __attribute__ ((__simd__)).  */
+#  define __DECL_SIMD_AARCH64 __attribute__ ((__simd__ ("notinbranch")))
+# endif
+
+# ifdef __DECL_SIMD_AARCH64
+#  undef __DECL_SIMD_exp
+#  define __DECL_SIMD_exp __DECL_SIMD_AARCH64
+#  undef __DECL_SIMD_expf
+#  define __DECL_SIMD_expf __DECL_SIMD_AARCH64
+
+# endif
+#endif
diff --git a/sysdeps/aarch64/fpu/finclude/math-vector-fortran.h b/sysdeps/aarch64/fpu/finclude/math-vector-fortran.h
index e69de29..e42bed4 100644
--- a/sysdeps/aarch64/fpu/finclude/math-vector-fortran.h
+++ b/sysdeps/aarch64/fpu/finclude/math-vector-fortran.h
@@ -0,0 +1,20 @@ 
+! Platform-specific declarations of SIMD math functions for Fortran. -*- f90 -*-
+!   Copyright (C) 2019 Free Software Foundation, Inc.
+!   This file is part of the GNU C Library.
+!
+!   The GNU C Library is free software; you can redistribute it and/or
+!   modify it under the terms of the GNU Lesser General Public
+!   License as published by the Free Software Foundation; either
+!   version 2.1 of the License, or (at your option) any later version.
+!
+!   The GNU C Library is distributed in the hope that it will be useful,
+!   but WITHOUT ANY WARRANTY; without even the implied warranty of
+!   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+!   Lesser General Public License for more details.
+!
+!   You should have received a copy of the GNU Lesser General Public
+!   License along with the GNU C Library; if not, see
+!   <http://www.gnu.org/licenses/>.
+
+!GCC$ builtin (exp) attributes simd (notinbranch) if('aarch64')
+!GCC$ builtin (expf) attributes simd (notinbranch) if('aarch64')
diff --git a/sysdeps/aarch64/fpu/libmvec_double_vlen2_exp.c b/sysdeps/aarch64/fpu/libmvec_double_vlen2_exp.c
index e69de29..fecb0ad 100644
--- a/sysdeps/aarch64/fpu/libmvec_double_vlen2_exp.c
+++ b/sysdeps/aarch64/fpu/libmvec_double_vlen2_exp.c
@@ -0,0 +1,95 @@ 
+/* Double-precision 2 element vector e^x function.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* This function is based on sysdeps/ieee754/dbl-64/e_exp.c.  */
+
+#include <math.h>
+#include <float.h>
+#include <stdint.h>
+#include <stdlib.h>
+#include <ieee754.h>
+#include <math-narrow-eval.h>
+#include "math_config.h"
+#include "libmvec_util.h"
+
+#define N (1 << EXP_TABLE_BITS)
+#define InvLn2N __exp_data.invln2N
+#define NegLn2hiN __exp_data.negln2hiN
+#define NegLn2loN __exp_data.negln2loN
+#define Shift __exp_data.shift
+#define T __exp_data.tab
+#define C2 __exp_data.poly[5 - EXP_POLY_ORDER]
+#define C3 __exp_data.poly[6 - EXP_POLY_ORDER]
+#define C4 __exp_data.poly[7 - EXP_POLY_ORDER]
+#define C5 __exp_data.poly[8 - EXP_POLY_ORDER]
+
+#define LIMIT 700.0
+
+/* Do not inline this call.   That way _ZGVnN2v_exp has no calls to non-vector
+   functions.  This reduces the register saves that _ZGVnN2v_exp has to do.  */
+
+__attribute__((aarch64_vector_pcs, noinline)) static __Float64x2_t
+__scalar_exp(__Float64x2_t x)
+{
+  return (__Float64x2_t) { exp(x[0]), exp(x[1]) };
+}
+
+__attribute__((aarch64_vector_pcs)) __Float64x2_t
+_ZGVnN2v_exp(__Float64x2_t x)
+{
+  double h, z_0, z_1;
+  __Float64x2_t g, scale_v, tail_v, tmp_v, r_v, r2_v, kd_v;
+  __Float64x2_t NegLn2hiN_v, NegLn2loN_v, C2_v, C3_v, C4_v, C5_v;
+  uint64_t ki_0, ki_1, idx_0, idx_1;
+  uint64_t top_0, top_1, sbits_0, sbits_1;
+
+  /* If any value is larger than LIMIT, or NAN, call scalar operation.  */
+  g = __builtin_aarch64_absv2df (x);
+  h = __builtin_aarch64_reduc_smax_scal_v2df (g);
+  if (__glibc_unlikely (!(h < LIMIT)))
+    return __scalar_exp (x);
+
+  z_0 = InvLn2N * x[0];
+  z_1 = InvLn2N * x[1];
+  ki_0 = converttoint (z_0);
+  ki_1 = converttoint (z_1);
+
+  idx_0 = 2 * (ki_0 % N);
+  idx_1 = 2 * (ki_1 % N);
+  top_0 = ki_0 << (52 - EXP_TABLE_BITS);
+  top_1 = ki_1 << (52 - EXP_TABLE_BITS);
+  sbits_0 = T[idx_0 + 1] + top_0;
+  sbits_1 = T[idx_1 + 1] + top_1;
+
+  kd_v = (__Float64x2_t) { roundtoint (z_0), roundtoint (z_1) };
+  scale_v = (__Float64x2_t) { asdouble (sbits_0), asdouble (sbits_1) };
+  tail_v = (__Float64x2_t) { asdouble (T[idx_0]), asdouble (T[idx_1]) };
+  NegLn2hiN_v = (__Float64x2_t) { NegLn2hiN, NegLn2hiN };
+  NegLn2loN_v = (__Float64x2_t) { NegLn2loN, NegLn2loN };
+  C2_v = (__Float64x2_t) { C2, C2 };
+  C3_v = (__Float64x2_t) { C3, C3 };
+  C4_v = (__Float64x2_t) { C4, C4 };
+  C5_v = (__Float64x2_t) { C5, C5 };
+
+  r_v = x + kd_v * NegLn2hiN_v + kd_v * NegLn2loN_v;
+  r2_v = r_v * r_v;
+  tmp_v = tail_v + r_v + r2_v * (C2_v + r_v * C3_v) + r2_v * r2_v
+	  * (C4_v + r_v * C5_v);
+  return scale_v + scale_v * tmp_v;
+}
+weak_alias (_ZGVnN2v_exp, _ZGVnN2v___exp_finite)
diff --git a/sysdeps/aarch64/fpu/libmvec_exp2f_data.c b/sysdeps/aarch64/fpu/libmvec_exp2f_data.c
index e69de29..d97ce15 100644
--- a/sysdeps/aarch64/fpu/libmvec_exp2f_data.c
+++ b/sysdeps/aarch64/fpu/libmvec_exp2f_data.c
@@ -0,0 +1,2 @@ 
+#include <sysdeps/ieee754/flt-32/math_config.h>
+#include <sysdeps/ieee754/flt-32/e_exp2f_data.c>
diff --git a/sysdeps/aarch64/fpu/libmvec_exp_data.c b/sysdeps/aarch64/fpu/libmvec_exp_data.c
index e69de29..a83661b 100644
--- a/sysdeps/aarch64/fpu/libmvec_exp_data.c
+++ b/sysdeps/aarch64/fpu/libmvec_exp_data.c
@@ -0,0 +1 @@ 
+#include <sysdeps/ieee754/dbl-64/e_exp_data.c>
diff --git a/sysdeps/aarch64/fpu/libmvec_float_vlen4_expf.c b/sysdeps/aarch64/fpu/libmvec_float_vlen4_expf.c
index e69de29..6504574 100644
--- a/sysdeps/aarch64/fpu/libmvec_float_vlen4_expf.c
+++ b/sysdeps/aarch64/fpu/libmvec_float_vlen4_expf.c
@@ -0,0 +1,115 @@ 
+/* Single-precision 4 element vector e^x function.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* This function is based on sysdeps/ieee754/flt-32/e_expf.c.  */
+
+#include <math.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <sysdeps/ieee754/flt-32/math_config.h>
+#include "libmvec_util.h"
+
+#define N (1 << EXP2F_TABLE_BITS)
+#define LIMIT 80.0
+
+#define InvLn2N __exp2f_data.invln2_scaled
+#define T __exp2f_data.tab
+#define C __exp2f_data.poly_scaled
+#define SHIFT __exp2f_data.shift
+
+/* Do not inline this call.  That way _ZGVnN4v_expf has no calls to non-vector
+   functions.  This reduces the register saves that _ZGVnN4v_expf has to do.  */
+
+__attribute__((aarch64_vector_pcs,noinline)) static __Float32x4_t
+__scalar_expf (__Float32x4_t x)
+{
+  return (__Float32x4_t) { expf(x[0]), expf(x[1]), expf(x[2]), expf(x[3]) };
+}
+
+__attribute__((aarch64_vector_pcs)) __Float32x4_t
+_ZGVnN4v_expf(__Float32x4_t x)
+{
+  __Float32x4_t g, result;
+  __Float64x2_t xd_0, xd_1, vInvLn2N, z_0, z_1, vkd_0, vkd_1, r_0, r_1;
+  __Float64x2_t vs_0, vs_1, c0, c1, c2, y_0, y_1, r2_0, r2_1, one;
+  uint64_t ki_0, ki_1, ki_2, ki_3, t_0, t_1, t_2, t_3;
+  double s_0, s_1, s_2, s_3;
+  float f;
+
+  /* If any value is larger than LIMIT, or NAN, call scalar operation.  */
+  g = __builtin_aarch64_absv4sf (x);
+  f = __builtin_aarch64_reduc_smax_scal_v4sf (g);
+  if (__glibc_unlikely (!(f < LIMIT)))
+    return __scalar_expf (x);
+
+  xd_0 = get_lo_and_extend (x);
+  xd_1 = get_hi_and_extend (x);
+
+  vInvLn2N = (__Float64x2_t) { InvLn2N, InvLn2N };
+  /* x*N/Ln2 = k + r with r in [-1/2, 1/2] and int k.  */
+  z_0 = vInvLn2N * xd_0;
+  z_1 = vInvLn2N * xd_1;
+
+  /* Round and convert z to int, the result is in [-150*N, 128*N] and
+     ideally ties-to-even rule is used, otherwise the magnitude of r
+     can be bigger which gives larger approximation error.  */
+  vkd_0 = __builtin_aarch64_roundv2df (z_0);
+  vkd_1 = __builtin_aarch64_roundv2df (z_1);
+  r_0 = z_0 - vkd_0;
+  r_1 = z_1 - vkd_1;
+
+  ki_0 = (long) vkd_0[0];
+  ki_1 = (long) vkd_0[1];
+  ki_2 = (long) vkd_1[0];
+  ki_3 = (long) vkd_1[1];
+
+  /* exp(x) = 2^(k/N) * 2^(r/N) ~= s * (C0*r^3 + C1*r^2 + C2*r + 1) */
+  t_0 = T[ki_0 % N];
+  t_1 = T[ki_1 % N];
+  t_2 = T[ki_2 % N];
+  t_3 = T[ki_3 % N];
+  t_0 += ki_0 << (52 - EXP2F_TABLE_BITS);
+  t_1 += ki_1 << (52 - EXP2F_TABLE_BITS);
+  t_2 += ki_2 << (52 - EXP2F_TABLE_BITS);
+  t_3 += ki_3 << (52 - EXP2F_TABLE_BITS);
+  s_0 = asdouble (t_0);
+  s_1 = asdouble (t_1);
+  s_2 = asdouble (t_2);
+  s_3 = asdouble (t_3);
+
+  vs_0 = (__Float64x2_t) { s_0, s_1 };
+  vs_1 = (__Float64x2_t) { s_2, s_3 };
+  c0 = (__Float64x2_t) { C[0], C[0] };
+  c1 = (__Float64x2_t) { C[1], C[1] };
+  c2 = (__Float64x2_t) { C[2], C[2] };
+  one = (__Float64x2_t) { 1.0, 1.0 };
+
+  z_0 = c0 * r_0 + c1;
+  z_1 = c0 * r_1 + c1;
+  r2_0 = r_0 * r_0;
+  r2_1 = r_1 * r_1;
+  y_0 = c2 * r_0 + one;
+  y_1 = c2 * r_1 + one;
+  y_0 = z_0 * r2_0 + y_0;
+  y_1 = z_1 * r2_1 + y_1;
+  y_0 = y_0 * vs_0;
+  y_1 = y_1 * vs_1;
+  result = pack_and_trunc (y_0, y_1);
+  return result;
+}
+weak_alias (_ZGVnN4v_expf, _ZGVnN4v___expf_finite)
diff --git a/sysdeps/aarch64/fpu/libmvec_util.h b/sysdeps/aarch64/fpu/libmvec_util.h
index e69de29..a127724 100644
--- a/sysdeps/aarch64/fpu/libmvec_util.h
+++ b/sysdeps/aarch64/fpu/libmvec_util.h
@@ -0,0 +1,53 @@ 
+/* Utility functions for AArch64 vector functions.
+   Copyright (C) 2015-2019 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <stdint.h>
+
+/* Copy lower 2 elements of a 4 element float vector into a 2 element
+   double vector.  */
+
+static __always_inline __Float64x2_t
+get_lo_and_extend (__Float32x4_t x)
+{
+  __Uint64x2_t tmp1 = (__Uint64x2_t) x;
+#ifdef __AARCH64EB__
+  uint64_t tmp2 = (uint64_t) tmp1[1];
+#else
+  uint64_t tmp2 = (uint64_t) tmp1[0];
+#endif
+  return __builtin_aarch64_float_extend_lo_v2df ((__Float32x2_t) tmp2);
+}
+
+/* Copy upper 2 elements of a 4 element float vector into a 2 element
+   double vector.  */
+
+static __always_inline __Float64x2_t
+get_hi_and_extend (__Float32x4_t x)
+{
+  return __builtin_aarch64_vec_unpacks_hi_v4sf (x);
+}
+
+/* Copy a pair of 2 element double vectors into a 4 element float vector.  */
+
+static __always_inline __Float32x4_t
+pack_and_trunc (__Float64x2_t x, __Float64x2_t y)
+{
+  __Float32x2_t xx = __builtin_aarch64_float_truncate_lo_v2sf (x);
+  __Float32x2_t yy = __builtin_aarch64_float_truncate_lo_v2sf (y);
+  return __builtin_aarch64_combinev2sf (xx, yy);
+}
diff --git a/sysdeps/aarch64/fpu/test-double-vlen2-wrappers.c b/sysdeps/aarch64/fpu/test-double-vlen2-wrappers.c
index e69de29..331a51e 100644
--- a/sysdeps/aarch64/fpu/test-double-vlen2-wrappers.c
+++ b/sysdeps/aarch64/fpu/test-double-vlen2-wrappers.c
@@ -0,0 +1,23 @@ 
+/* Wrapper part of tests for aarch64 double vector math functions.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "test-double-vlen2.h"
+
+#define VEC_TYPE __Float64x2_t
+
+VECTOR_WRAPPER (WRAPPER_NAME (exp), _ZGVnN2v_exp)
diff --git a/sysdeps/aarch64/fpu/test-float-vlen4-wrappers.c b/sysdeps/aarch64/fpu/test-float-vlen4-wrappers.c
index e69de29..e3feef6 100644
--- a/sysdeps/aarch64/fpu/test-float-vlen4-wrappers.c
+++ b/sysdeps/aarch64/fpu/test-float-vlen4-wrappers.c
@@ -0,0 +1,23 @@ 
+/* Wrapper part of tests for aarch64 float vector math functions.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "test-float-vlen4.h"
+
+#define VEC_TYPE __Float32x4_t
+
+VECTOR_WRAPPER (WRAPPER_NAME (expf), _ZGVnN4v_expf)
diff --git a/sysdeps/aarch64/libm-test-ulps b/sysdeps/aarch64/libm-test-ulps
index 585e5bb..1ed4af9 100644
--- a/sysdeps/aarch64/libm-test-ulps
+++ b/sysdeps/aarch64/libm-test-ulps
@@ -1601,6 +1601,12 @@  float: 1
 idouble: 1
 ifloat: 1
 
+Function: "exp_vlen2":
+double: 1
+
+Function: "exp_vlen4":
+float: 1
+
 Function: "expm1":
 double: 1
 float: 1
diff --git a/sysdeps/unix/sysv/linux/aarch64/libmvec.abilist b/sysdeps/unix/sysv/linux/aarch64/libmvec.abilist
index e69de29..b7431a3 100644
--- a/sysdeps/unix/sysv/linux/aarch64/libmvec.abilist
+++ b/sysdeps/unix/sysv/linux/aarch64/libmvec.abilist
@@ -0,0 +1,4 @@ 
+GLIBC_2.30 _ZGVnN2v___exp_finite F
+GLIBC_2.30 _ZGVnN2v_exp F
+GLIBC_2.30 _ZGVnN4v___expf_finite F
+GLIBC_2.30 _ZGVnN4v_expf F