Message ID | ea8e4b6e1ebe5eddb9e72dc1a21baad50f8e6fcf.camel@marvell.com |
---|---|
State | New |
Headers | show |
Series | Aarch64: Add simd exp/expf functions | expand |
On 06/03/2019 17:18, Steve Ellcey wrote: > Here are float and double vector exp functions for Aarch64. The vector > functions are based on the ieee ones in sysdeps/ieee754/flt-32/e_expf.c > and sysdeps/ieee754/dbl-64/e_exp.c. If any of the values are 'large' > or NaN they actually call the scalar routines, otherwise they use the > Aarch64 SIMD instructions with the same algorithm as the ieee functions. > My testing has not found any differences in exp output for scalar vs. > vector and the newly added tests for the vector routines pass using the > updated libm-test-ulps file. > > This patch also sets build_mathvec to yes by default on Aarch64, applies > the simd attribute to exp and expf in the C header and includes a > Fortran header. The Fortran header is in finclude so this patch needs > Martin Liska's patch that moves math-vector-fortran.h from bits to > finclude in order to work correctly. > > Comments? thanks this will need to detect support for __attribute__((aarch64_vector_pcs)) (which will require gcc-9) and i plan to fix the lazy binding issue with vector pcs which will require a new binutils too (currently that's not super important since the dynamic linker is unlikely to use fpregs outside of v0-v7, but depending on the exact nature of the solution we may require a new gcc and new binutils too for libmvec) the scalar algorithms are not optimal for simd, but should work and i'm fine with such initial code to enable libmvec and then optimize it later.
On Wed, 2019-03-06 at 19:04 +0000, Szabolcs Nagy wrote: > > this will need to detect support for > __attribute__((aarch64_vector_pcs)) > (which will require gcc-9) That seems easy enough to check for. > and i plan to fix the lazy binding issue > with vector pcs which will require a new > binutils too (currently that's not super > important since the dynamic linker is > unlikely to use fpregs outside of v0-v7, > but depending on the exact nature of the > solution we may require a new gcc and > new binutils too for libmvec) I am not sure how I would check for this. Will it need to be a version check on binutils or will there be some functionality that can be checked for? A new fixup type? Do you have an estimate for when the binutils change will go in? Steve Ellcey sellcey@marvell.com
* Steve Ellcey: > On Wed, 2019-03-06 at 19:04 +0000, Szabolcs Nagy wrote: >> >> this will need to detect support for >> __attribute__((aarch64_vector_pcs)) >> (which will require gcc-9) > > That seems easy enough to check for. Can you add assembler trampolines, so that the compiler support becomes optional, at a performance cost? >> and i plan to fix the lazy binding issue >> with vector pcs which will require a new >> binutils too (currently that's not super >> important since the dynamic linker is >> unlikely to use fpregs outside of v0-v7, >> but depending on the exact nature of the >> solution we may require a new gcc and >> new binutils too for libmvec) > > I am not sure how I would check for this. > Will it need to be a version check on binutils > or will there be some functionality that can > be checked for? A new fixup type? Do you > have an estimate for when the binutils change > will go in? I don't think the binutils change is needed for building or testing glibc, at least not initially. Just disable lazy binding.
On Wed, 2019-03-06 at 20:16 +0100, Florian Weimer wrote: > * Steve Ellcey: > > > On Wed, 2019-03-06 at 19:04 +0000, Szabolcs Nagy wrote: > > > > > > this will need to detect support for > > > __attribute__((aarch64_vector_pcs)) > > > (which will require gcc-9) > > > > That seems easy enough to check for. > > Can you add assembler trampolines, so that the compiler support > becomes optional, at a performance cost? Yuck. I suppose this is possible, but I do not want to do it. The whole reason for vector functions (and for the new vector ABI) is performance so adding a slow path doesn't seem to me like it is worthwhile. Steve Ellcey sellcey@marvell.com
* Steve Ellcey: > On Wed, 2019-03-06 at 20:16 +0100, Florian Weimer wrote: > >> * Steve Ellcey: >> >> > On Wed, 2019-03-06 at 19:04 +0000, Szabolcs Nagy wrote: >> > > >> > > this will need to detect support for >> > > __attribute__((aarch64_vector_pcs)) >> > > (which will require gcc-9) >> > >> > That seems easy enough to check for. >> >> Can you add assembler trampolines, so that the compiler support >> becomes optional, at a performance cost? > > Yuck. I suppose this is possible, but I do not want to do it. > The whole reason for vector functions (and for the new vector ABI) > is performance so adding a slow path doesn't seem to me like it is > worthwhile. On the other hand, it could help to get libmvec out of the door more quickly. I think it's not ideal that if you use an older compiler, you get only a subset of the glibc ABI. We can get away with it here because it affects an entire soname. Still it might be difficult to explain why applications are not portable.
On Wed, 2019-03-06 at 20:45 +0100, Florian Weimer wrote: > > > Can you add assembler trampolines, so that the compiler support > > > becomes optional, at a performance cost? > > > > Yuck. I suppose this is possible, but I do not want to do it. > > The whole reason for vector functions (and for the new vector ABI) > > is performance so adding a slow path doesn't seem to me like it is > > worthwhile. > > On the other hand, it could help to get libmvec out of the door more > quickly. I think it's not ideal that if you use an older compiler, > you get only a subset of the glibc ABI. We can get away with it here > because it affects an entire soname. Still it might be difficult to > explain why applications are not portable. If the user doesn't have gcc-9, their compiler isn't going to generate any calls to these routines anyway. So it doesn't really matter if they have libmvec or not if they don't have gcc-9. If a program was compiled with gcc-9 somewhere else and then moved, then yes the new platform might not have libmvec and there will be portability problems. I guess if someone was building a platform with gcc-8 and the latest glibc then it might be nice if libmvec could be built, but gcc-9 should be released before the next glibc is released so hopefully anyone using the latest released glibc will also use the latest gcc and have all the necessary compiler support. Steve Ellcey sellcey@marvell.com
On 06/03/2019 20:54, Steve Ellcey wrote: > On Wed, 2019-03-06 at 20:45 +0100, Florian Weimer wrote: > >>>> Can you add assembler trampolines, so that the compiler support >>>> becomes optional, at a performance cost? >>> >>> Yuck. I suppose this is possible, but I do not want to do it. >>> The whole reason for vector functions (and for the new vector ABI) >>> is performance so adding a slow path doesn't seem to me like it is >>> worthwhile. >> >> On the other hand, it could help to get libmvec out of the door more >> quickly. I think it's not ideal that if you use an older compiler, >> you get only a subset of the glibc ABI. We can get away with it here >> because it affects an entire soname. Still it might be difficult to >> explain why applications are not portable. > > If the user doesn't have gcc-9, their compiler isn't going to generate > any calls to these routines anyway. So it doesn't really matter if > they have libmvec or not if they don't have gcc-9. If a program was > compiled with gcc-9 somewhere else and then moved, then yes the new > platform might not have libmvec and there will be portability problems. glibc is probably built with a stable distro gcc, but then the user may use a trunk gcc to compile code. of course with trampolines vector math functions may not be worth to call at all, so it's not clear if having a libmvec with trampolines is useful other than allowing the glibc abi to be independent of the gcc version used to compile it. On 06/03/2019 19:16, Florian Weimer wrote: > I don't think the binutils change is needed for building or testing > glibc, at least not initially. Just disable lazy binding. in principle libmvec dso as well as anything that references vector pcs symbols would need to be linked with -z now, and even that's not enough if we ever want to support LD_AUDIT (which is like permanent lazy binding). i originally thought i can fix this up with some simple hack, but it will need a bigger change across the toolchain, i hope i can post some patches soon and then we can discuss what to do in glibc. On 06/03/2019 17:18, Steve Ellcey wrote: > + g = __builtin_aarch64_absv2df (x); > + h = __builtin_aarch64_reduc_smax_scal_v2df (g); please use arm_neon.h intrinsics instead of __builtin_aarch64_*, these are not documented gcc apis, so they may change.
The following comments are mostly on issues also raised for other architectures, so reading the discussions of both the powerpc patches and the x86_64 patches is encouraged. 1. The commit message needs to reference the specification of the ABI being immplemented, and give confirmation of this having been agreed among all relevant parties, and give details of the GCC version implementing the ABI. (The ABI document should be clear on exactly what function variants the pragma / attributes mean should be available. If you wish to add other variants in future, e.g. SVE variants, those will need to use a *different* pragma / attribute, to avoid new compilers misinterpreting the headers from old glibc as meaning the CVE variants are available.) 2. There needs to be a NEWS entry describing the new user-visible feature and also giving details of the GCC version with support. 3. There should not be any _finite aliases exported from the shared library; rather, use static-only wrappers as on x86_64. Or fix the underlying GCC issue to allow the asm name used as a basis for vector function names to be different from that used as a scalar function name; see <https://gcc.gnu.org/ml/gcc/2015-06/msg00173.html>. 4. There are formatting issues in this code, including missing spaces before '(' and incorrect indentation. 5. Give details (including test programs) of how you tested that the functions do work, with an installed glibc and new-enough GCC, for vectorized calls resulting from source code calling the scalar functions, which the glibc testsuite doesn't cover. This is important end-to-end validation that the ABI is as intended; the lack of it for x86_64 resulted in sincos ABI issues only being found later. 6. What does if('aarch64') in the Fortran header mean? What do you need it at all? The installed header should work for all AArch64 ABIs (so currently BE and LE); it's not expected to work for other architectures. 7. "#ifdef BIG_ENDIAN" is not a valid conditional. The endian.h header defines both BIG_ENDIAN and LITTLE_ENDIAN, and then defines BYTE_ORDER to one of those. Does libmvec_util.h get an implicit include of endian.h somewhere (so you always get the BE path, which somehow works on LE, indicating test coverage issues that should be resolved, preferably through automated tests but failing that please describe in the commit message how you tested that the endian conditionals were correct), or have you only tested for little-endian which worked because of the macro being accidentally undefined? 8. Please confirm in the commit message how testing was run for both BE and LE, given the presence of such conditionals.
On Wed, 6 Mar 2019, Florian Weimer wrote: > On the other hand, it could help to get libmvec out of the door more > quickly. I think it's not ideal that if you use an older compiler, > you get only a subset of the glibc ABI. We can get away with it here > because it affects an entire soname. Still it might be difficult to > explain why applications are not portable. On the whole I think I agree with Rich Felker's argument <https://sourceware.org/ml/libc-alpha/2015-11/msg00184.html> against having the presence of libmvec depend on the tools used for the build. (Note that the installed bits/math-vector.h file, which may be shared between multilibs, does not depend on the tools used, so if libmvec was disabled then the installed bits/math-vector.h is not actually correct and some programs will fail to build.) This is an argument for removing the --disable-mathvec configure option as well as either having assembly wrappers or a requirement for new-enough tool versions for building libmvec functions on platforms where the oldest supported GCC / binutils aren't new enough.
On 07/03/2019 19:04, Joseph Myers wrote: > On Wed, 6 Mar 2019, Florian Weimer wrote: > >> On the other hand, it could help to get libmvec out of the door more >> quickly. I think it's not ideal that if you use an older compiler, >> you get only a subset of the glibc ABI. We can get away with it here >> because it affects an entire soname. Still it might be difficult to >> explain why applications are not portable. > > On the whole I think I agree with Rich Felker's argument > <https://sourceware.org/ml/libc-alpha/2015-11/msg00184.html> against > having the presence of libmvec depend on the tools used for the build. > (Note that the installed bits/math-vector.h file, which may be shared > between multilibs, does not depend on the tools used, so if libmvec was > disabled then the installed bits/math-vector.h is not actually correct and > some programs will fail to build.) > > This is an argument for removing the --disable-mathvec configure option as > well as either having assembly wrappers or a requirement for new-enough > tool versions for building libmvec functions on platforms where the oldest > supported GCC / binutils aren't new enough. so is it acceptable to submit generated asm to the source tree together with the c source? (or even object files if the assembler is not new enough?)
* Szabolcs Nagy: > On 07/03/2019 19:04, Joseph Myers wrote: >> On Wed, 6 Mar 2019, Florian Weimer wrote: >> >>> On the other hand, it could help to get libmvec out of the door more >>> quickly. I think it's not ideal that if you use an older compiler, >>> you get only a subset of the glibc ABI. We can get away with it here >>> because it affects an entire soname. Still it might be difficult to >>> explain why applications are not portable. >> >> On the whole I think I agree with Rich Felker's argument >> <https://sourceware.org/ml/libc-alpha/2015-11/msg00184.html> against >> having the presence of libmvec depend on the tools used for the build. >> (Note that the installed bits/math-vector.h file, which may be shared >> between multilibs, does not depend on the tools used, so if libmvec was >> disabled then the installed bits/math-vector.h is not actually correct and >> some programs will fail to build.) >> >> This is an argument for removing the --disable-mathvec configure option as >> well as either having assembly wrappers or a requirement for new-enough >> tool versions for building libmvec functions on platforms where the oldest >> supported GCC / binutils aren't new enough. > > so is it acceptable to submit generated asm to the > source tree together with the c source? No, before we do that, I think we should just require GCC 9 and binutils 2.33 for building aarch64. I had the hope that you could build a compatible ABI with just a few assember trampolines, but that's not the case if the DSOs need markers for disabling lazy binding in client code. (But it is probably more natural to disable lazy binding though function attributes in the header file.) Thanks, Florian
On Fri, 8 Mar 2019, Florian Weimer wrote: > > so is it acceptable to submit generated asm to the > > source tree together with the c source? > > No, before we do that, I think we should just require GCC 9 and binutils > 2.33 for building aarch64. I'm dubious of requiring unreleased versions (for an architecture that previously worked with released versions), but given suitable releases, requiring recent releases for a given architecture may well be appropriate if it's required for some feature the architecture maintainers want to support now rather than later. (Cf. how we set the required version to 6.2 for powerpc64le to facilitate the work towards IEEE long double support.)
On 07/03/2019 15:08, Joseph Myers wrote: > 1. The commit message needs to reference the specification of the ABI > being immplemented, and give confirmation of this having been agreed among > all relevant parties, and give details of the GCC version implementing the > ABI. (The ABI document should be clear on exactly what function variants > the pragma / attributes mean should be available. If you wish to add > other variants in future, e.g. SVE variants, those will need to use a > *different* pragma / attribute, to avoid new compilers misinterpreting the > headers from old glibc as meaning the CVE variants are available.) the next revision of the vector abi document will try to address this. (might need some gcc changes)
diff --git a/sysdeps/aarch64/configure.ac b/sysdeps/aarch64/configure.ac index 7851dd4..c6d9646 100644 --- a/sysdeps/aarch64/configure.ac +++ b/sysdeps/aarch64/configure.ac @@ -20,3 +20,7 @@ if test $libc_cv_aarch64_be = yes; then else LIBC_CONFIG_VAR([default-abi], [lp64]) fi + +if test x"$build_mathvec" = xnotset; then + build_mathvec=yes +fi diff --git a/sysdeps/aarch64/fpu/Makefile b/sysdeps/aarch64/fpu/Makefile index 4a182bd..579b6a5 100644 --- a/sysdeps/aarch64/fpu/Makefile +++ b/sysdeps/aarch64/fpu/Makefile @@ -12,3 +12,27 @@ CFLAGS-s_fmaxf.c += -ffinite-math-only CFLAGS-s_fmin.c += -ffinite-math-only CFLAGS-s_fminf.c += -ffinite-math-only endif + +ifeq ($(subdir),mathvec) +CFLAGS-libmvec_double_vlen2_exp.c += -march=armv8-a+simd -fno-math-errno +CFLAGS-libmvec_float_vlen4_expf.c += -march=armv8-a+simd -fno-math-errno +CFLAGS-libmvec_exp_data.c += -march=armv8-a+simd -fno-math-errno +CFLAGS-libmvec_exp2f_data.c += -march=armv8-a+simd -fno-math-errno + +libmvec-support += libmvec_double_vlen2_exp +libmvec-support += libmvec_float_vlen4_expf +libmvec-support += libmvec_exp_data +libmvec-support += libmvec_exp2f_data + +# If I do not add a static routine I do not get libmvec_nonshared.a +# installed and GCC will fail to link when it cannot find it. +libmvec-static-only-routines += libmvec_dummy +endif + +ifeq ($(subdir),math) +ifeq ($(build-mathvec),yes) +libmvec-tests += double-vlen2 float-vlen4 +double-vlen2-funcs = exp +float-vlen4-funcs = exp +endif +endif diff --git a/sysdeps/aarch64/fpu/Versions b/sysdeps/aarch64/fpu/Versions index e69de29..9fe90ba 100644 --- a/sysdeps/aarch64/fpu/Versions +++ b/sysdeps/aarch64/fpu/Versions @@ -0,0 +1,5 @@ +libmvec { + GLIBC_2.30 { + _ZGVnN2v___exp_finite; _ZGVnN2v_exp; _ZGVnN4v___expf_finite; _ZGVnN4v_expf; + } +} diff --git a/sysdeps/aarch64/fpu/bits/math-vector.h b/sysdeps/aarch64/fpu/bits/math-vector.h index e69de29..4c34159 100644 --- a/sysdeps/aarch64/fpu/bits/math-vector.h +++ b/sysdeps/aarch64/fpu/bits/math-vector.h @@ -0,0 +1,43 @@ +/* Platform-specific SIMD declarations of math functions. + Copyright (C) 2019 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <http://www.gnu.org/licenses/>. */ + +#ifndef _MATH_H +# error "Never include <bits/math-vector.h> directly;\ + include <math.h> instead." +#endif + +/* Get default empty definitions for simd declarations. */ +#include <bits/libm-simd-decl-stubs.h> + +#if defined __FAST_MATH__ +# if defined _OPENMP && _OPENMP >= 201307 +/* OpenMP case. */ +# define __DECL_SIMD_AARCH64 _Pragma ("omp declare simd notinbranch") +# elif __GNUC_PREREQ (6,0) +/* W/o OpenMP use GCC 6.* __attribute__ ((__simd__)). */ +# define __DECL_SIMD_AARCH64 __attribute__ ((__simd__ ("notinbranch"))) +# endif + +# ifdef __DECL_SIMD_AARCH64 +# undef __DECL_SIMD_exp +# define __DECL_SIMD_exp __DECL_SIMD_AARCH64 +# undef __DECL_SIMD_expf +# define __DECL_SIMD_expf __DECL_SIMD_AARCH64 + +# endif +#endif diff --git a/sysdeps/aarch64/fpu/finclude/math-vector-fortran.h b/sysdeps/aarch64/fpu/finclude/math-vector-fortran.h index e69de29..e42bed4 100644 --- a/sysdeps/aarch64/fpu/finclude/math-vector-fortran.h +++ b/sysdeps/aarch64/fpu/finclude/math-vector-fortran.h @@ -0,0 +1,20 @@ +! Platform-specific declarations of SIMD math functions for Fortran. -*- f90 -*- +! Copyright (C) 2019 Free Software Foundation, Inc. +! This file is part of the GNU C Library. +! +! The GNU C Library is free software; you can redistribute it and/or +! modify it under the terms of the GNU Lesser General Public +! License as published by the Free Software Foundation; either +! version 2.1 of the License, or (at your option) any later version. +! +! The GNU C Library is distributed in the hope that it will be useful, +! but WITHOUT ANY WARRANTY; without even the implied warranty of +! MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU +! Lesser General Public License for more details. +! +! You should have received a copy of the GNU Lesser General Public +! License along with the GNU C Library; if not, see +! <http://www.gnu.org/licenses/>. + +!GCC$ builtin (exp) attributes simd (notinbranch) if('aarch64') +!GCC$ builtin (expf) attributes simd (notinbranch) if('aarch64') diff --git a/sysdeps/aarch64/fpu/libmvec_double_vlen2_exp.c b/sysdeps/aarch64/fpu/libmvec_double_vlen2_exp.c index e69de29..fecb0ad 100644 --- a/sysdeps/aarch64/fpu/libmvec_double_vlen2_exp.c +++ b/sysdeps/aarch64/fpu/libmvec_double_vlen2_exp.c @@ -0,0 +1,95 @@ +/* Double-precision 2 element vector e^x function. + Copyright (C) 2019 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <http://www.gnu.org/licenses/>. */ + +/* This function is based on sysdeps/ieee754/dbl-64/e_exp.c. */ + +#include <math.h> +#include <float.h> +#include <stdint.h> +#include <stdlib.h> +#include <ieee754.h> +#include <math-narrow-eval.h> +#include "math_config.h" +#include "libmvec_util.h" + +#define N (1 << EXP_TABLE_BITS) +#define InvLn2N __exp_data.invln2N +#define NegLn2hiN __exp_data.negln2hiN +#define NegLn2loN __exp_data.negln2loN +#define Shift __exp_data.shift +#define T __exp_data.tab +#define C2 __exp_data.poly[5 - EXP_POLY_ORDER] +#define C3 __exp_data.poly[6 - EXP_POLY_ORDER] +#define C4 __exp_data.poly[7 - EXP_POLY_ORDER] +#define C5 __exp_data.poly[8 - EXP_POLY_ORDER] + +#define LIMIT 700.0 + +/* Do not inline this call. That way _ZGVnN2v_exp has no calls to non-vector + functions. This reduces the register saves that _ZGVnN2v_exp has to do. */ + +__attribute__((aarch64_vector_pcs, noinline)) static __Float64x2_t +__scalar_exp(__Float64x2_t x) +{ + return (__Float64x2_t) { exp(x[0]), exp(x[1]) }; +} + +__attribute__((aarch64_vector_pcs)) __Float64x2_t +_ZGVnN2v_exp(__Float64x2_t x) +{ + double h, z_0, z_1; + __Float64x2_t g, scale_v, tail_v, tmp_v, r_v, r2_v, kd_v; + __Float64x2_t NegLn2hiN_v, NegLn2loN_v, C2_v, C3_v, C4_v, C5_v; + uint64_t ki_0, ki_1, idx_0, idx_1; + uint64_t top_0, top_1, sbits_0, sbits_1; + + /* If any value is larger than LIMIT, or NAN, call scalar operation. */ + g = __builtin_aarch64_absv2df (x); + h = __builtin_aarch64_reduc_smax_scal_v2df (g); + if (__glibc_unlikely (!(h < LIMIT))) + return __scalar_exp (x); + + z_0 = InvLn2N * x[0]; + z_1 = InvLn2N * x[1]; + ki_0 = converttoint (z_0); + ki_1 = converttoint (z_1); + + idx_0 = 2 * (ki_0 % N); + idx_1 = 2 * (ki_1 % N); + top_0 = ki_0 << (52 - EXP_TABLE_BITS); + top_1 = ki_1 << (52 - EXP_TABLE_BITS); + sbits_0 = T[idx_0 + 1] + top_0; + sbits_1 = T[idx_1 + 1] + top_1; + + kd_v = (__Float64x2_t) { roundtoint (z_0), roundtoint (z_1) }; + scale_v = (__Float64x2_t) { asdouble (sbits_0), asdouble (sbits_1) }; + tail_v = (__Float64x2_t) { asdouble (T[idx_0]), asdouble (T[idx_1]) }; + NegLn2hiN_v = (__Float64x2_t) { NegLn2hiN, NegLn2hiN }; + NegLn2loN_v = (__Float64x2_t) { NegLn2loN, NegLn2loN }; + C2_v = (__Float64x2_t) { C2, C2 }; + C3_v = (__Float64x2_t) { C3, C3 }; + C4_v = (__Float64x2_t) { C4, C4 }; + C5_v = (__Float64x2_t) { C5, C5 }; + + r_v = x + kd_v * NegLn2hiN_v + kd_v * NegLn2loN_v; + r2_v = r_v * r_v; + tmp_v = tail_v + r_v + r2_v * (C2_v + r_v * C3_v) + r2_v * r2_v + * (C4_v + r_v * C5_v); + return scale_v + scale_v * tmp_v; +} +weak_alias (_ZGVnN2v_exp, _ZGVnN2v___exp_finite) diff --git a/sysdeps/aarch64/fpu/libmvec_exp2f_data.c b/sysdeps/aarch64/fpu/libmvec_exp2f_data.c index e69de29..d97ce15 100644 --- a/sysdeps/aarch64/fpu/libmvec_exp2f_data.c +++ b/sysdeps/aarch64/fpu/libmvec_exp2f_data.c @@ -0,0 +1,2 @@ +#include <sysdeps/ieee754/flt-32/math_config.h> +#include <sysdeps/ieee754/flt-32/e_exp2f_data.c> diff --git a/sysdeps/aarch64/fpu/libmvec_exp_data.c b/sysdeps/aarch64/fpu/libmvec_exp_data.c index e69de29..a83661b 100644 --- a/sysdeps/aarch64/fpu/libmvec_exp_data.c +++ b/sysdeps/aarch64/fpu/libmvec_exp_data.c @@ -0,0 +1 @@ +#include <sysdeps/ieee754/dbl-64/e_exp_data.c> diff --git a/sysdeps/aarch64/fpu/libmvec_float_vlen4_expf.c b/sysdeps/aarch64/fpu/libmvec_float_vlen4_expf.c index e69de29..6504574 100644 --- a/sysdeps/aarch64/fpu/libmvec_float_vlen4_expf.c +++ b/sysdeps/aarch64/fpu/libmvec_float_vlen4_expf.c @@ -0,0 +1,115 @@ +/* Single-precision 2 element vector e^x function. + Copyright (C) 2019 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <http://www.gnu.org/licenses/>. */ + +/* This function is based on sysdeps/ieee754/flt-32/e_expf.c. */ + +#include <math.h> +#include <stdint.h> +#include <stdio.h> +#include <sysdeps/ieee754/flt-32/math_config.h> +#include "libmvec_util.h" + +#define N (1 << EXP2F_TABLE_BITS) +#define LIMIT 80.0 + +#define InvLn2N __exp2f_data.invln2_scaled +#define T __exp2f_data.tab +#define C __exp2f_data.poly_scaled +#define SHIFT __exp2f_data.shift + +/* Do not inline this call. That way _ZGVnN4v_expf has no calls to non-vector + functions. This reduces the register saves that _ZGVnN4v_expf has to do. */ + +__attribute__((aarch64_vector_pcs,noinline)) static __Float32x4_t +__scalar_expf (__Float32x4_t x) +{ + return (__Float32x4_t) { expf(x[0]), expf(x[1]), expf(x[2]), expf(x[3]) }; +} + +__attribute__((aarch64_vector_pcs)) __Float32x4_t +_ZGVnN4v_expf(__Float32x4_t x) +{ + __Float32x4_t g, result; + __Float64x2_t xd_0, xd_1, vInvLn2N, z_0, z_1, vkd_0, vkd_1, r_0, r_1; + __Float64x2_t vs_0, vs_1, c0, c1, c2, y_0, y_1, r2_0, r2_1, one; + uint64_t ki_0, ki_1, ki_2, ki_3, t_0, t_1, t_2, t_3; + double s_0, s_1, s_2, s_3; + float f; + + /* If any value is larger than LIMIT, or NAN, call scalar operation. */ + g = __builtin_aarch64_absv4sf (x); + f = __builtin_aarch64_reduc_smax_scal_v4sf (g); + if (__glibc_unlikely (!(f < LIMIT))) + return __scalar_expf (x); + + xd_0 = get_lo_and_extend (x); + xd_1 = get_hi_and_extend (x); + + vInvLn2N = (__Float64x2_t) { InvLn2N, InvLn2N }; + /* x*N/Ln2 = k + r with r in [-1/2, 1/2] and int k. */ + z_0 = vInvLn2N * xd_0; + z_1 = vInvLn2N * xd_1; + + /* Round and convert z to int, the result is in [-150*N, 128*N] and + ideally ties-to-even rule is used, otherwise the magnitude of r + can be bigger which gives larger approximation error. */ + vkd_0 = __builtin_aarch64_roundv2df (z_0); + vkd_1 = __builtin_aarch64_roundv2df (z_1); + r_0 = z_0 - vkd_0; + r_1 = z_1 - vkd_1; + + ki_0 = (long) vkd_0[0]; + ki_1 = (long) vkd_0[1]; + ki_2 = (long) vkd_1[0]; + ki_3 = (long) vkd_1[1]; + + /* exp(x) = 2^(k/N) * 2^(r/N) ~= s * (C0*r^3 + C1*r^2 + C2*r + 1) */ + t_0 = T[ki_0 % N]; + t_1 = T[ki_1 % N]; + t_2 = T[ki_2 % N]; + t_3 = T[ki_3 % N]; + t_0 += ki_0 << (52 - EXP2F_TABLE_BITS); + t_1 += ki_1 << (52 - EXP2F_TABLE_BITS); + t_2 += ki_2 << (52 - EXP2F_TABLE_BITS); + t_3 += ki_3 << (52 - EXP2F_TABLE_BITS); + s_0 = asdouble (t_0); + s_1 = asdouble (t_1); + s_2 = asdouble (t_2); + s_3 = asdouble (t_3); + + vs_0 = (__Float64x2_t) { s_0, s_1 }; + vs_1 = (__Float64x2_t) { s_2, s_3 }; + c0 = (__Float64x2_t) { C[0], C[0] }; + c1 = (__Float64x2_t) { C[1], C[1] }; + c2 = (__Float64x2_t) { C[2], C[2] }; + one = (__Float64x2_t) { 1.0, 1.0 }; + + z_0 = c0 * r_0 + c1; + z_1 = c0 * r_1 + c1; + r2_0 = r_0 * r_0; + r2_1 = r_1 * r_1; + y_0 = c2 * r_0 + one; + y_1 = c2 * r_1 + one; + y_0 = z_0 * r2_0 + y_0; + y_1 = z_1 * r2_1 + y_1; + y_0 = y_0 * vs_0; + y_1 = y_1 * vs_1; + result = pack_and_trunc (y_0, y_1); + return result; +} +weak_alias (_ZGVnN4v_expf, _ZGVnN4v___expf_finite) diff --git a/sysdeps/aarch64/fpu/libmvec_util.h b/sysdeps/aarch64/fpu/libmvec_util.h index e69de29..a127724 100644 --- a/sysdeps/aarch64/fpu/libmvec_util.h +++ b/sysdeps/aarch64/fpu/libmvec_util.h @@ -0,0 +1,53 @@ +/* Utility functions for Aarch64 vector functions. + Copyright (C) 2015-2019 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <http://www.gnu.org/licenses/>. */ + +#include <stdint.h> + +/* Copy lower 2 elements of of a 4 element float vector into a 2 element + double vector. */ + +static __always_inline +__Float64x2_t get_lo_and_extend (__Float32x4_t x) +{ + __Uint64x2_t tmp1 = (__Uint64x2_t) x; +#ifdef BIG_ENDIAN + uint64_t tmp2 = (uint64_t) tmp1[1]; +#else + uint64_t tmp2 = (uint64_t) tmp1[0]; +#endif + return __builtin_aarch64_float_extend_lo_v2df ((__Float32x2_t) tmp2); +} + +/* Copy upper 2 elements of of a 4 element float vector into a 2 element + double vector. */ + +static __always_inline +__Float64x2_t get_hi_and_extend (__Float32x4_t x) +{ + return __builtin_aarch64_vec_unpacks_hi_v4sf (x); +} + +/* Copy a pair of 2 element double vectors into a 4 element float vector. */ + +static __always_inline +__Float32x4_t pack_and_trunc (__Float64x2_t x, __Float64x2_t y) +{ + __Float32x2_t xx = __builtin_aarch64_float_truncate_lo_v2sf (x); + __Float32x2_t yy = __builtin_aarch64_float_truncate_lo_v2sf (y); + return (__builtin_aarch64_combinev2sf (xx, yy)); +} diff --git a/sysdeps/aarch64/fpu/test-double-vlen2-wrappers.c b/sysdeps/aarch64/fpu/test-double-vlen2-wrappers.c index e69de29..331a51e 100644 --- a/sysdeps/aarch64/fpu/test-double-vlen2-wrappers.c +++ b/sysdeps/aarch64/fpu/test-double-vlen2-wrappers.c @@ -0,0 +1,23 @@ +/* Wrapper part of tests for aarch64 double vector math functions. + Copyright (C) 2019 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <http://www.gnu.org/licenses/>. */ + +#include "test-double-vlen2.h" + +#define VEC_TYPE __Float64x2_t + +VECTOR_WRAPPER (WRAPPER_NAME (exp), _ZGVnN2v_exp) diff --git a/sysdeps/aarch64/fpu/test-float-vlen4-wrappers.c b/sysdeps/aarch64/fpu/test-float-vlen4-wrappers.c index e69de29..e3feef6 100644 --- a/sysdeps/aarch64/fpu/test-float-vlen4-wrappers.c +++ b/sysdeps/aarch64/fpu/test-float-vlen4-wrappers.c @@ -0,0 +1,23 @@ +/* Wrapper part of tests for float aarch64 vector math functions. + Copyright (C) 2019 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <http://www.gnu.org/licenses/>. */ + +#include "test-float-vlen4.h" + +#define VEC_TYPE __Float32x4_t + +VECTOR_WRAPPER (WRAPPER_NAME (expf), _ZGVnN4v_expf) diff --git a/sysdeps/aarch64/libm-test-ulps b/sysdeps/aarch64/libm-test-ulps index 585e5bb..1ed4af9 100644 --- a/sysdeps/aarch64/libm-test-ulps +++ b/sysdeps/aarch64/libm-test-ulps @@ -1601,6 +1601,12 @@ float: 1 idouble: 1 ifloat: 1 +Function: "exp_vlen2": +double: 1 + +Function: "exp_vlen4": +float: 1 + Function: "expm1": double: 1 float: 1 diff --git a/sysdeps/unix/sysv/linux/aarch64/libmvec.abilist b/sysdeps/unix/sysv/linux/aarch64/libmvec.abilist index e69de29..b7431a3 100644 --- a/sysdeps/unix/sysv/linux/aarch64/libmvec.abilist +++ b/sysdeps/unix/sysv/linux/aarch64/libmvec.abilist @@ -0,0 +1,4 @@ +GLIBC_2.30 _ZGVnN2v___exp_finite F +GLIBC_2.30 _ZGVnN2v_exp F +GLIBC_2.30 _ZGVnN4v___expf_finite F +GLIBC_2.30 _ZGVnN4v_expf F