Message ID | 20190529191106.23888-1-shawn@git.icu |
---|---|
State | New |
Headers | show |
Series | [v3] PPC64: Add libmvec SIMD double-precision natural exponent function. | expand |
On Wed, May 29, 2019 at 2:11 PM Shawn Landden <shawn@git.icu> wrote: > > Passes all tests. I change this but forgot to regenerate the patch. I am getting a bunch of seemingly unrelated test failures. FAIL: dlfcn/bug-atexit3 FAIL: elf/check-abi-libc XPASS: elf/tst-protected1a XPASS: elf/tst-protected1b FAIL: elf/tst-unwind-main FAIL: iconv/tst-iconv-mt FAIL: inet/tst-idna_name_classify FAIL: malloc/tst-malloc-stats-cancellation UNSUPPORTED: math/test-fesetexcept-traps UNSUPPORTED: math/test-fexcept-traps FAIL: misc/tst-gettid-kill FAIL: misc/tst-pkey UNSUPPORTED: nptl/test-cond-printers UNSUPPORTED: nptl/test-condattr-printers FAIL: nptl/test-get_hwcap UNSUPPORTED: nptl/test-mutex-printers UNSUPPORTED: nptl/test-mutexattr-printers UNSUPPORTED: nptl/test-rwlock-printers UNSUPPORTED: nptl/test-rwlockattr-printers FAIL: nptl/tst-basic3 FAIL: nptl/tst-basic4 FAIL: nptl/tst-call-once FAIL: nptl/tst-cancel-self FAIL: nptl/tst-cancel-self-cancelstate FAIL: nptl/tst-cancel-self-canceltype FAIL: nptl/tst-cancel-self-testcancel FAIL: nptl/tst-cancel1 FAIL: nptl/tst-cancel10 FAIL: nptl/tst-cancel11 FAIL: nptl/tst-cancel12 FAIL: nptl/tst-cancel13 FAIL: nptl/tst-cancel14 FAIL: nptl/tst-cancel15 FAIL: nptl/tst-cancel16 FAIL: nptl/tst-cancel17 FAIL: nptl/tst-cancel18 FAIL: nptl/tst-cancel2 FAIL: nptl/tst-cancel20 FAIL: nptl/tst-cancel21 FAIL: nptl/tst-cancel22 FAIL: nptl/tst-cancel23 FAIL: nptl/tst-cancel24 FAIL: nptl/tst-cancel25 FAIL: nptl/tst-cancel3 FAIL: nptl/tst-cancel4 FAIL: nptl/tst-cancel4_1 FAIL: nptl/tst-cancel4_2 FAIL: nptl/tst-cancel5 FAIL: nptl/tst-cancel6 FAIL: nptl/tst-cancel7 FAIL: nptl/tst-cancel8 FAIL: nptl/tst-cancel9 FAIL: nptl/tst-cancelx10 FAIL: nptl/tst-cancelx11 FAIL: nptl/tst-cancelx12 FAIL: nptl/tst-cancelx13 FAIL: nptl/tst-cancelx14 FAIL: nptl/tst-cancelx15 FAIL: nptl/tst-cancelx16 FAIL: nptl/tst-cancelx17 FAIL: nptl/tst-cancelx18 FAIL: nptl/tst-cancelx2 FAIL: nptl/tst-cancelx20 FAIL: nptl/tst-cancelx21 FAIL: nptl/tst-cancelx3 FAIL: nptl/tst-cancelx4 FAIL: nptl/tst-cancelx5 FAIL: nptl/tst-cancelx6 FAIL: nptl/tst-cancelx7 FAIL: nptl/tst-cancelx8 FAIL: nptl/tst-cancelx9 FAIL: nptl/tst-cleanup0 FAIL: nptl/tst-cleanup0-cmp FAIL: nptl/tst-cleanup1 FAIL: nptl/tst-cleanup3 FAIL: nptl/tst-cleanup4 FAIL: nptl/tst-cleanupx0 FAIL: nptl/tst-cleanupx1 FAIL: nptl/tst-cleanupx3 FAIL: nptl/tst-cleanupx4 FAIL: nptl/tst-cnd-basic FAIL: nptl/tst-cnd-broadcast FAIL: nptl/tst-cnd-timedwait FAIL: nptl/tst-cond-except FAIL: nptl/tst-cond22 FAIL: nptl/tst-cond25 FAIL: nptl/tst-cond7 FAIL: nptl/tst-cond8 FAIL: nptl/tst-default-attr FAIL: nptl/tst-exec5 FAIL: nptl/tst-execstack FAIL: nptl/tst-exit2 FAIL: nptl/tst-exit3 FAIL: nptl/tst-fini1 FAIL: nptl/tst-join1 FAIL: nptl/tst-join5 FAIL: nptl/tst-join6 FAIL: nptl/tst-join8 FAIL: nptl/tst-join9 FAIL: nptl/tst-key3 FAIL: nptl/tst-minstack-cancel FAIL: nptl/tst-minstack-exit FAIL: nptl/tst-minstack-throw FAIL: nptl/tst-mtx-basic FAIL: nptl/tst-mtx-timedlock FAIL: nptl/tst-mtx-trylock FAIL: nptl/tst-mutex8 FAIL: nptl/tst-mutexpi8 FAIL: nptl/tst-once3 FAIL: nptl/tst-once4 FAIL: nptl/tst-oncex3 FAIL: nptl/tst-oncex4 FAIL: nptl/tst-robust1 FAIL: nptl/tst-robust2 FAIL: nptl/tst-robust3 FAIL: nptl/tst-robust4 FAIL: nptl/tst-robust5 FAIL: nptl/tst-robust6 FAIL: nptl/tst-robust7 FAIL: nptl/tst-robustpi1 FAIL: nptl/tst-robustpi2 FAIL: nptl/tst-robustpi3 FAIL: nptl/tst-robustpi4 FAIL: nptl/tst-robustpi5 FAIL: nptl/tst-robustpi6 FAIL: nptl/tst-robustpi7 FAIL: nptl/tst-sem11 FAIL: nptl/tst-sem12 FAIL: nptl/tst-sem16 FAIL: nptl/tst-stack4 FAIL: nptl/tst-thrd-detach FAIL: nptl/tst-thrd-sleep FAIL: nptl/tst-thread-exit-clobber FAIL: nptl/tst-thread_local1 FAIL: nptl/tst-tsd5 FAIL: nptl/tst-tss-basic FAIL: nptl/tst-unwind-thread FAIL: nss/tst-cancel-getpwuid_r FAIL: nss/tst-nss-files-hosts-long FAIL: nss/tst-nss-test3 FAIL: posix/tst-getopt-cancel UNSUPPORTED: posix/tst-spawn4-compat UNSUPPORTED: resolv/tst-resolv-ai_idn UNSUPPORTED: resolv/tst-resolv-ai_idn-latin1 FAIL: rt/tst-cpuclock2 FAIL: rt/tst-mqueue8 FAIL: rt/tst-mqueue8x FAIL: rt/tst-shm-cancel FAIL: stdlib/tst-quick_exit FAIL: stdlib/tst-thread-quick_exit > > Unlike other libmvec functions, this sets the underflow and overflow bits. > The caller can check these flags, and possibly re-run the calculations with > scalar expf to figure out what is causing the overflow or underflow. > > The special-case path is not vectorized, and performs much worse than > the scalar code. > Normalized data: 1 to 2^32 converted to double > Running 20 times over 32MiB > vector: mean 563.807107 MiB/s (sd 0.390922) > scalar: mean 226.527824 MiB/s (sd 0.077406) > > Random data: > vector: mean 80.175986 MiB/s (sd 1.110948) > scalar: mean 244.738130 MiB/s (sd 0.029561) > > v2: Fix some serious problems with the algorithm. Not > sure how I missed those problems. > v3: Style problems. Fix symbol order. > > 2019-05-11 Shawn Landden <shawn@git.icu> > > [BZ #24209] > * NEWS: Noted the addition of PPC64 vector exp function. > * sysdeps/powerpc/bits/math-vector.h: Added entry for vector exp. > * sysdeps/powerpc/powerpc64/fpu/Versions: Added vector exp entry. > * sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile: > (libmvec-sysdep_routines, CFLAGS-vec_d_exp2_vsx.c): > (CFLAGS-vec_d_exp_data.c): Added build of VSX SIMD expf > function and tests. Added vec_math_err.c to build. > * sysdeps/powerpc/powerpc64/fpu/multiarch/(math_config.h)->(math_config_dbl.h) > Renamed, and modified for exp. > * sysdeps/powerpc/powerpc64/fpu/multiarch/test-double-vlen2-wrappers.c: > Added entry for vector exp. > * sysdeps/powerpc/powerpc64/fpu/multiarch/vec_d_exp2_vsx.c: New file. > * sysdeps/powerpc/powerpc64/fpu/multiarch/vec_d_exp_data.c: Likewise. > * sysdeps/powerpc/powerpc64/fpu/multiarch/vec_math_err.c: Likewise. > * sysdeps/unix/sysv/linux/powerpc/powerpc64/libmvec.abilist: SIMD expf > added. > * sysdeps/powerpc/powerpc64/fpu/multiarch/vec_d_log_data.c: Rename of math_config.h > * sysdeps/powerpc/powerpc64/fpu/multiarch/vec_d_log2_vsx.c: Likewise > --- > NEWS | 1 + > sysdeps/powerpc/bits/math-vector.h | 2 + > sysdeps/powerpc/powerpc64/fpu/Versions | 2 +- > .../powerpc/powerpc64/fpu/multiarch/Makefile | 8 +- > .../{math_config.h => math_config_dbl.h} | 15 +- > .../multiarch/test-double-vlen2-wrappers.c | 1 + > .../powerpc64/fpu/multiarch/vec_d_exp2_vsx.c | 197 +++++++++++++++++ > .../powerpc64/fpu/multiarch/vec_d_exp_data.c | 202 ++++++++++++++++++ > .../powerpc64/fpu/multiarch/vec_d_log2_vsx.c | 2 +- > .../powerpc64/fpu/multiarch/vec_d_log_data.c | 2 +- > .../powerpc64/fpu/multiarch/vec_math_err.c | 40 ++++ > .../linux/powerpc/powerpc64/libmvec.abilist | 1 + > 12 files changed, 461 insertions(+), 12 deletions(-) > rename sysdeps/powerpc/powerpc64/fpu/multiarch/{math_config.h => math_config_dbl.h} (95%) > create mode 100644 sysdeps/powerpc/powerpc64/fpu/multiarch/vec_d_exp2_vsx.c > create mode 100644 sysdeps/powerpc/powerpc64/fpu/multiarch/vec_d_exp_data.c > create mode 100644 sysdeps/powerpc/powerpc64/fpu/multiarch/vec_math_err.c > > diff --git a/NEWS b/NEWS > index 26f5c67f75..a33af8c316 100644 > --- a/NEWS > +++ b/NEWS > @@ -20,6 +20,7 @@ Major new features: > - single-precision sine: sinf > - double-precision sincos: sincos > - single-precision sincos: sincosf > + - double-precision exp: exp > - single-precision exp: expf > > GCC support for auto-vectorization of functions on PPC64 is not yet > diff --git a/sysdeps/powerpc/bits/math-vector.h b/sysdeps/powerpc/bits/math-vector.h > index bbc9db0dd9..e7846b4bfa 100644 > --- a/sysdeps/powerpc/bits/math-vector.h > +++ b/sysdeps/powerpc/bits/math-vector.h > @@ -46,6 +46,8 @@ > # define __DECL_SIMD_sincosf __DECL_SIMD_PPC64 > # undef __DECL_SIMD_expf > # define __DECL_SIMD_expf __DECL_SIMD_PPC64 > +# undef __DECL_SIMD_exp > +# define __DECL_SIMD_exp __DECL_SIMD_PPC64 > > # endif > #endif > diff --git a/sysdeps/powerpc/powerpc64/fpu/Versions b/sysdeps/powerpc/powerpc64/fpu/Versions > index 7c45e76074..361edee9e7 100644 > --- a/sysdeps/powerpc/powerpc64/fpu/Versions > +++ b/sysdeps/powerpc/powerpc64/fpu/Versions > @@ -2,6 +2,6 @@ libmvec { > GLIBC_2.30 { > _ZGVbN2v_cos; _ZGVbN4v_cosf; _ZGVbN2v_sin; _ZGVbN4v_sinf; > _ZGVbN2v_log; _ZGVbN4v_logf; _ZGVbN2vvv_sincos; _ZGVbN4vvv_sincosf; > - _ZGVbN4v_expf; > + _ZGVbN4v_expf; _ZGVbN2v_exp; > } > } > diff --git a/sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile b/sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile > index d15183d5ee..dd982bb068 100644 > --- a/sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile > +++ b/sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile > @@ -49,7 +49,8 @@ libmvec-sysdep_routines += vec_d_cos2_vsx vec_s_cosf4_vsx \ > vec_d_log2_vsx vec_d_log_data \ > vec_s_logf4_vsx vec_s_logf_data \ > vec_s_expf4_vsx vec_s_exp2f_data \ > - vec_math_errf \ > + vec_math_errf vec_math_err \ > + vec_d_exp2_vsx vec_d_exp_data \ > vec_d_sincos2_vsx vec_s_sincosf4_vsx > CFLAGS-vec_d_cos2_vsx.c += -mabi=altivec -maltivec -mvsx -mpower8-vector > CFLAGS-vec_d_log2_vsx.c += -mabi=altivec -maltivec -mvsx -mpower8-vector > @@ -64,6 +65,9 @@ CFLAGS-vec_d_sincos2_vsx.c += -mabi=altivec -maltivec -mvsx -mpower8-vector > CFLAGS-vec_s_sincosf4_vsx.c += -mabi=altivec -maltivec -mvsx > CFLAGS-vec_s_expf4_vsx.c += -mabi=altivec -maltivec -mvsx > CFLAGS-vec_s_exp2f_data.c += -mabi=altivec -maltivec -mvsx > +CFLAGS-vec_d_exp2_vsx.c += -mabi=altivec -maltivec -mvsx -mpower8-vector > +CFLAGS-vec_d_exp_data.c += -mabi=altivec -maltivec -mvsx > +CFLAGS-vec_math_err.c += -mabi=altivec -maltivec -mvsx > endif > > # Variables for libmvec tests. > @@ -71,7 +75,7 @@ ifeq ($(subdir),math) > ifeq ($(build-mathvec),yes) > libmvec-tests += double-vlen2 float-vlen4 > > -double-vlen2-funcs = cos sin sincos log > +double-vlen2-funcs = cos sin sincos log exp > float-vlen4-funcs = cos sin sincos log exp > > double-vlen2-arch-ext-cflags = -mabi=altivec -maltivec -mvsx -DREQUIRE_VSX > diff --git a/sysdeps/powerpc/powerpc64/fpu/multiarch/math_config.h b/sysdeps/powerpc/powerpc64/fpu/multiarch/math_config_dbl.h > similarity index 95% > rename from sysdeps/powerpc/powerpc64/fpu/multiarch/math_config.h > rename to sysdeps/powerpc/powerpc64/fpu/multiarch/math_config_dbl.h > index bd37906ecb..7c94f5b9b2 100644 > --- a/sysdeps/powerpc/powerpc64/fpu/multiarch/math_config.h > +++ b/sysdeps/powerpc/powerpc64/fpu/multiarch/math_config_dbl.h > @@ -24,6 +24,7 @@ > #include <nan-high-order-bit.h> > #include <stdint.h> > #include <altivec.h> > +#include <math-narrow-eval.h> > > #ifndef WANT_ROUNDING > /* Correct special case results in non-nearest rounding modes. */ > @@ -134,13 +135,13 @@ check_uflow (double x) > #define EXP2_POLY_ORDER 5 > extern const struct exp_data > { > - double invln2N; > - double shift; > - double negln2hiN; > - double negln2loN; > - double poly[4]; /* Last four coefficients. */ > - double exp2_shift; > - double exp2_poly[EXP2_POLY_ORDER]; > + vector double invln2N; > + vector double shift; > + vector double negln2hiN; > + vector double negln2loN; > + vector double poly[4]; /* Last four coefficients. */ > + vector double exp2_shift; > + vector double exp2_poly[EXP2_POLY_ORDER]; > uint64_t tab[2*(1 << EXP_TABLE_BITS)]; > } __exp_data attribute_hidden; > > diff --git a/sysdeps/powerpc/powerpc64/fpu/multiarch/test-double-vlen2-wrappers.c b/sysdeps/powerpc/powerpc64/fpu/multiarch/test-double-vlen2-wrappers.c > index c9e2dbfdd0..f368e47947 100644 > --- a/sysdeps/powerpc/powerpc64/fpu/multiarch/test-double-vlen2-wrappers.c > +++ b/sysdeps/powerpc/powerpc64/fpu/multiarch/test-double-vlen2-wrappers.c > @@ -24,4 +24,5 @@ > VECTOR_WRAPPER (WRAPPER_NAME (cos), _ZGVbN2v_cos) > VECTOR_WRAPPER (WRAPPER_NAME (sin), _ZGVbN2v_sin) > VECTOR_WRAPPER (WRAPPER_NAME (log), _ZGVbN2v_log) > +VECTOR_WRAPPER (WRAPPER_NAME (exp), _ZGVbN2v_exp) > VECTOR_WRAPPER_fFF (WRAPPER_NAME (sincos), _ZGVbN2vvv_sincos) > diff --git a/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_d_exp2_vsx.c b/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_d_exp2_vsx.c > new file mode 100644 > index 0000000000..376531e32b > --- /dev/null > +++ b/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_d_exp2_vsx.c > @@ -0,0 +1,197 @@ > +/* Double-precision vector exp(x) function. > + Copyright (C) 2019 Free Software Foundation, Inc. > + This file is part of the GNU C Library. > + The GNU C Library is free software; you can redistribute it and/or > + modify it under the terms of the GNU Lesser General Public > + License as published by the Free Software Foundation; either > + version 2.1 of the License, or (at your option) any later version. > + The GNU C Library is distributed in the hope that it will be useful, > + but WITHOUT ANY WARRANTY; without even the implied warranty of > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > + Lesser General Public License for more details. > + You should have received a copy of the GNU Lesser General Public > + License along with the GNU C Library; if not, see > + <http://www.gnu.org/licenses/>. */ > +/* Based on sysdeps/ieee754/dbl-64/e_exp.c which came from > + Szabolcs Nagy at ARM Ltd. */ > +#include <altivec.h> > +#include <math.h> > +#include <math-barriers.h> > +#include <math-narrow-eval.h> > + > +#include "math_config_dbl.h" > + > +typedef vector long long unsigned v64u; > +typedef union { > + vector unsigned u; > + vector float f; > + v64u l; > + vector double d; > +} u; > +typedef union { > + double d; > + uint64_t l; > + unsigned u; > + float f; > +} us; > + > +#define N (1 << EXP_TABLE_BITS) > +#define InvLn2N __exp_data.invln2N > +#define NegLn2hiN __exp_data.negln2hiN > +#define NegLn2loN __exp_data.negln2loN > +#define Shift __exp_data.shift > +#define T __exp_data.tab > +#define C2 __exp_data.poly[5 - EXP_POLY_ORDER] > +#define C3 __exp_data.poly[6 - EXP_POLY_ORDER] > +#define C4 __exp_data.poly[7 - EXP_POLY_ORDER] > +#define C5 __exp_data.poly[8 - EXP_POLY_ORDER] > + > +/* Handle cases that may overflow or underflow when computing the result that > + is scale*(1+TMP) without intermediate rounding. The bit representation of > + scale is in SBITS, however it has a computed exponent that may have > + overflown into the sign bit so that needs to be adjusted before using it as > + a double. (int32_t)KI is the k used in the argument reduction and exponent > + adjustment of scale, positive k here means the result may overflow and > + negative k means the result may underflow. */ > +static inline double > +specialcase (double_t tmp, uint64_t sbits, uint64_t ki) > +{ > + double_t scale, y; > + > + if ((ki & 0x80000000) == 0) > + { > + /* k > 0, the exponent of scale might have overflowed by <= 460. */ > + sbits -= 1009ull << 52; > + scale = asdouble (sbits); > + y = 0x1p1009 * (scale + scale * tmp); > + return y; > + } > + /* k < 0, need special care in the subnormal range. */ > + sbits += 1022ull << 52; > + scale = asdouble (sbits); > + y = scale + scale * tmp; > + if (y < 1.0) > + { > + /* Round y to the right precision before scaling it into the subnormal > + range to avoid double rounding that can cause 0.5+E/2 ulp error where > + E is the worst-case ulp error outside the subnormal range. So this > + is only useful if the goal is better than 1 ulp worst-case error. */ > + double_t hi, lo; > + lo = scale - y + scale * tmp; > + hi = 1.0 + y; > + lo = 1.0 - hi + y + lo; > + y = math_narrow_eval (hi + lo) - 1.0; > + /* Avoid -0.0 with downward rounding. */ > + if (WANT_ROUNDING && y == 0.0) > + y = 0.0; > + /* The underflow exception needs to be signaled explicitly. */ > + math_force_eval (math_opt_barrier (0x1p-1022) * 0x1p-1022); > + } > + y = 0x1p-1022 * y; > + return y; > +} > + > +/* Top 12 bits of a double (sign and exponent bits). */ > +static inline uint32_t top12(double x) > +{ > + return asuint64(x) >> 52; > +} > + > +vector double > +_ZGVbN2v_exp (vector double x) > +{ > + v64u abstop, is_special_case, is_special_case2, idx, top, sbits; > + vector double kd, z, r, r2, scale, tail, tmp; > + u t, t2; > + u res; > + u xu; > + xu.d = x; > + v64u zero = {0, 0}; > + u c512; > + vector double load = {512.0, 512.0}; > + c512.d = load; > + u normalize; > + vector double load2 = {0x1p-54, 0x1p-54}; > + normalize.d = load2; > + c512.l &= 0x7ff0000000000000; > + normalize.l &= 0x7ff0000000000000; > + abstop = xu.l & 0x7ff0000000000000; > + is_special_case = (v64u) vec_cmpge (xu.l - normalize.l, c512.l - normalize.l); > + if (__glibc_unlikely (!vec_all_eq (is_special_case, zero))) > + { > + for (int m=0;m<2;++m) > + { > + if (!is_special_case[m]) > + continue; > + double v; > + uint32_t abstops = abstop[m] >> 52; > + if (!is_special_case[m]) > + continue; > + v = x[m]; > + if (abstops - top12(0x1p-54) >= 0x80000000) > + /* Avoid spurious underflow for tiny x. */ > + /* Note: 0 is common input. */ > + res.d[m] = WANT_ROUNDING ? 1.0 + v : 1.0; > + if (abstops >= top12(1024.0)) { > + if (asuint64(v) == asuint64(-INFINITY)) > + res.d[m] = 0.0; > + else if (abstops >= top12(INFINITY)) > + res.d[m] = 1.0 + v; > + else if (asuint64(v) >> 63) > + res.d[m] = __math_uflow(0); > + else > + res.d[m] = __math_oflow(0); > + /* prevent is_special_case2 from matching */ > + abstop[m] = 1; > + } else { > + /* Large x is special cased below. */ > + abstop[m] = 0; > + is_special_case[m] = 0; > + } > + } > + } > + /* exp(x) = 2^(k/N) * exp(r), with exp(r) in [2^(-1/2N),2^(1/2N)]. */ > + /* x = ln2/N*k + r, with int k and r in [-ln2/2N, ln2/2N]. */ > + z = InvLn2N * x; > + u kdu; > + /* z - kd is in [-1, 1] in non-nearest rounding modes. */ > + kd = z + Shift; > + kdu.d = kd; > + kd -= Shift; > + r = x + kd * NegLn2hiN + kd * NegLn2loN; > + /* 2^(k/N) ~= scale * (1 + tail). */ > + idx = 2 * (kdu.l % N); > + top = kdu.l << (52 - EXP_TABLE_BITS); > + v64u load3 = {T[idx[0]], T[idx[1]]}; > + t.l = load3; > + v64u load4 = {T[idx[0] + 1], T[idx[1] + 1]}; > + t2.l = load4; > + tail = t.d; > + /* This is only a valid scale when -1023*N < k < 1024*N. */ > + sbits = t2.l + top; > + /* exp(x) = 2^(k/N) * exp(r) ~= scale + scale * (tail + exp(r) - 1). */ > + /* Evaluation is optimized assuming superscalar pipelined execution. */ > + r2 = r * r; > + /* Without fma the worst case error is 0.25/N ulp larger. */ > + /* Worst case error is less than 0.5+1.11/N+(abs poly error * 2^53) ulp. */ > + tmp = tail + r + r2 * (C2 + r * C3) + r2 * r2 * (C4 + r * C5); > + is_special_case2 = (v64u) vec_cmpeq (abstop, zero); > + if (__glibc_unlikely (!vec_all_eq (is_special_case2, zero))) > + { > + u sc2; > + for (int m=0;m<2;++m) > + { > + sc2.d[m] = specialcase(tmp[m], sbits[m], kdu.l[m]); > + } > + res.l = vec_sel (res.l, sc2.l, is_special_case2); > + } > + u sbitsu; > + sbitsu.l = sbits; > + scale = sbitsu.d; > + u res2; > + res2.d = scale + scale * tmp; > + /* Note: tmp == 0 or |tmp| > 2^-200 and scale > 2^-739, so there > + is no spurious underflow here even without fma. */ > + res.l = vec_sel (res.l, res2.l, ~is_special_case & ~is_special_case2); > + return res.d; > +} > diff --git a/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_d_exp_data.c b/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_d_exp_data.c > new file mode 100644 > index 0000000000..208b897d04 > --- /dev/null > +++ b/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_d_exp_data.c > @@ -0,0 +1,202 @@ > +/* Shared data between vector exp, exp2 and pow > + Copyright (C) 2019 Free Software Foundation, Inc. > + This file is part of the GNU C Library. > + The GNU C Library is free software; you can redistribute it and/or > + modify it under the terms of the GNU Lesser General Public > + License as published by the Free Software Foundation; either > + version 2.1 of the License, or (at your option) any later version. > + The GNU C Library is distributed in the hope that it will be useful, > + but WITHOUT ANY WARRANTY; without even the implied warranty of > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > + Lesser General Public License for more details. > + You should have received a copy of the GNU Lesser General Public > + License along with the GNU C Library; if not, see > + <http://www.gnu.org/licenses/>. */ > +/* Based on sysdeps/ieee754/dbl-64/e_exp.c which came from > + Szabolcs Nagy at ARM Ltd. */ > + > +#include "math_config_dbl.h" > + > +#define N (1 << EXP_TABLE_BITS) > + > +const struct exp_data __exp_data = { > +// N/ln2 > +.invln2N = { > + 0x1.71547652b82fep0 * N, > + 0x1.71547652b82fep0 * N, > +}, > +// -ln2/N > +.negln2hiN = { > + -0x1.62e42fefa0000p-8, > + -0x1.62e42fefa0000p-8, > +}, > +.negln2loN = { > + -0x1.cf79abc9e3b3ap-47, > + -0x1.cf79abc9e3b3ap-47, > +}, > +.shift = { > + 0x1.8p52, > + 0x1.8p52, > +}, > +// exp polynomial coefficients. > +.poly = { > +// abs error: 1.555*2^-66 > +// ulp error: 0.509 (0.511 without fma) > +// if |x| < ln2/256+eps > +// abs error if |x| < ln2/256+0x1p-15: 1.09*2^-65 > +// abs error if |x| < ln2/128: 1.7145*2^-56 > +{0x1.ffffffffffdbdp-2, 0x1.ffffffffffdbdp-2}, > +{0x1.555555555543cp-3, 0x1.555555555543cp-3}, > +{0x1.55555cf172b91p-5, 0x1.55555cf172b91p-5}, > +{0x1.1111167a4d017p-7, 0x1.1111167a4d017p-7}, > +}, > +.exp2_shift = { > + 0x1.8p52 / N, > + 0x1.8p52 / N, > +}, > +// exp2 polynomial coefficients. > +.exp2_poly = { > +// abs error: 1.2195*2^-65 > +// ulp error: 0.507 (0.511 without fma) > +// if |x| < 1/256 > +// abs error if |x| < 1/128: 1.9941*2^-56 > +{0x1.62e42fefa39efp-1, 0x1.62e42fefa39efp-1}, > +{0x1.ebfbdff82c424p-3, 0x1.ebfbdff82c424p-3}, > +{0x1.c6b08d70cf4b5p-5, 0x1.c6b08d70cf4b5p-5}, > +{0x1.3b2abd24650ccp-7, 0x1.3b2abd24650ccp-7}, > +{0x1.5d7e09b4e3a84p-10, 0x1.5d7e09b4e3a84p-10}, > +}, > +// 2^(k/N) ~= H[k]*(1 + T[k]) for int k in [0,N) > +// tab[2*k] = asuint64(T[k]) > +// tab[2*k+1] = asuint64(H[k]) - (k << 52)/N > +.tab = { > +0x0, 0x3ff0000000000000, > +0x3c9b3b4f1a88bf6e, 0x3feff63da9fb3335, > +0xbc7160139cd8dc5d, 0x3fefec9a3e778061, > +0xbc905e7a108766d1, 0x3fefe315e86e7f85, > +0x3c8cd2523567f613, 0x3fefd9b0d3158574, > +0xbc8bce8023f98efa, 0x3fefd06b29ddf6de, > +0x3c60f74e61e6c861, 0x3fefc74518759bc8, > +0x3c90a3e45b33d399, 0x3fefbe3ecac6f383, > +0x3c979aa65d837b6d, 0x3fefb5586cf9890f, > +0x3c8eb51a92fdeffc, 0x3fefac922b7247f7, > +0x3c3ebe3d702f9cd1, 0x3fefa3ec32d3d1a2, > +0xbc6a033489906e0b, 0x3fef9b66affed31b, > +0xbc9556522a2fbd0e, 0x3fef9301d0125b51, > +0xbc5080ef8c4eea55, 0x3fef8abdc06c31cc, > +0xbc91c923b9d5f416, 0x3fef829aaea92de0, > +0x3c80d3e3e95c55af, 0x3fef7a98c8a58e51, > +0xbc801b15eaa59348, 0x3fef72b83c7d517b, > +0xbc8f1ff055de323d, 0x3fef6af9388c8dea, > +0x3c8b898c3f1353bf, 0x3fef635beb6fcb75, > +0xbc96d99c7611eb26, 0x3fef5be084045cd4, > +0x3c9aecf73e3a2f60, 0x3fef54873168b9aa, > +0xbc8fe782cb86389d, 0x3fef4d5022fcd91d, > +0x3c8a6f4144a6c38d, 0x3fef463b88628cd6, > +0x3c807a05b0e4047d, 0x3fef3f49917ddc96, > +0x3c968efde3a8a894, 0x3fef387a6e756238, > +0x3c875e18f274487d, 0x3fef31ce4fb2a63f, > +0x3c80472b981fe7f2, 0x3fef2b4565e27cdd, > +0xbc96b87b3f71085e, 0x3fef24dfe1f56381, > +0x3c82f7e16d09ab31, 0x3fef1e9df51fdee1, > +0xbc3d219b1a6fbffa, 0x3fef187fd0dad990, > +0x3c8b3782720c0ab4, 0x3fef1285a6e4030b, > +0x3c6e149289cecb8f, 0x3fef0cafa93e2f56, > +0x3c834d754db0abb6, 0x3fef06fe0a31b715, > +0x3c864201e2ac744c, 0x3fef0170fc4cd831, > +0x3c8fdd395dd3f84a, 0x3feefc08b26416ff, > +0xbc86a3803b8e5b04, 0x3feef6c55f929ff1, > +0xbc924aedcc4b5068, 0x3feef1a7373aa9cb, > +0xbc9907f81b512d8e, 0x3feeecae6d05d866, > +0xbc71d1e83e9436d2, 0x3feee7db34e59ff7, > +0xbc991919b3ce1b15, 0x3feee32dc313a8e5, > +0x3c859f48a72a4c6d, 0x3feedea64c123422, > +0xbc9312607a28698a, 0x3feeda4504ac801c, > +0xbc58a78f4817895b, 0x3feed60a21f72e2a, > +0xbc7c2c9b67499a1b, 0x3feed1f5d950a897, > +0x3c4363ed60c2ac11, 0x3feece086061892d, > +0x3c9666093b0664ef, 0x3feeca41ed1d0057, > +0x3c6ecce1daa10379, 0x3feec6a2b5c13cd0, > +0x3c93ff8e3f0f1230, 0x3feec32af0d7d3de, > +0x3c7690cebb7aafb0, 0x3feebfdad5362a27, > +0x3c931dbdeb54e077, 0x3feebcb299fddd0d, > +0xbc8f94340071a38e, 0x3feeb9b2769d2ca7, > +0xbc87deccdc93a349, 0x3feeb6daa2cf6642, > +0xbc78dec6bd0f385f, 0x3feeb42b569d4f82, > +0xbc861246ec7b5cf6, 0x3feeb1a4ca5d920f, > +0x3c93350518fdd78e, 0x3feeaf4736b527da, > +0x3c7b98b72f8a9b05, 0x3feead12d497c7fd, > +0x3c9063e1e21c5409, 0x3feeab07dd485429, > +0x3c34c7855019c6ea, 0x3feea9268a5946b7, > +0x3c9432e62b64c035, 0x3feea76f15ad2148, > +0xbc8ce44a6199769f, 0x3feea5e1b976dc09, > +0xbc8c33c53bef4da8, 0x3feea47eb03a5585, > +0xbc845378892be9ae, 0x3feea34634ccc320, > +0xbc93cedd78565858, 0x3feea23882552225, > +0x3c5710aa807e1964, 0x3feea155d44ca973, > +0xbc93b3efbf5e2228, 0x3feea09e667f3bcd, > +0xbc6a12ad8734b982, 0x3feea012750bdabf, > +0xbc6367efb86da9ee, 0x3fee9fb23c651a2f, > +0xbc80dc3d54e08851, 0x3fee9f7df9519484, > +0xbc781f647e5a3ecf, 0x3fee9f75e8ec5f74, > +0xbc86ee4ac08b7db0, 0x3fee9f9a48a58174, > +0xbc8619321e55e68a, 0x3fee9feb564267c9, > +0x3c909ccb5e09d4d3, 0x3feea0694fde5d3f, > +0xbc7b32dcb94da51d, 0x3feea11473eb0187, > +0x3c94ecfd5467c06b, 0x3feea1ed0130c132, > +0x3c65ebe1abd66c55, 0x3feea2f336cf4e62, > +0xbc88a1c52fb3cf42, 0x3feea427543e1a12, > +0xbc9369b6f13b3734, 0x3feea589994cce13, > +0xbc805e843a19ff1e, 0x3feea71a4623c7ad, > +0xbc94d450d872576e, 0x3feea8d99b4492ed, > +0x3c90ad675b0e8a00, 0x3feeaac7d98a6699, > +0x3c8db72fc1f0eab4, 0x3feeace5422aa0db, > +0xbc65b6609cc5e7ff, 0x3feeaf3216b5448c, > +0x3c7bf68359f35f44, 0x3feeb1ae99157736, > +0xbc93091fa71e3d83, 0x3feeb45b0b91ffc6, > +0xbc5da9b88b6c1e29, 0x3feeb737b0cdc5e5, > +0xbc6c23f97c90b959, 0x3feeba44cbc8520f, > +0xbc92434322f4f9aa, 0x3feebd829fde4e50, > +0xbc85ca6cd7668e4b, 0x3feec0f170ca07ba, > +0x3c71affc2b91ce27, 0x3feec49182a3f090, > +0x3c6dd235e10a73bb, 0x3feec86319e32323, > +0xbc87c50422622263, 0x3feecc667b5de565, > +0x3c8b1c86e3e231d5, 0x3feed09bec4a2d33, > +0xbc91bbd1d3bcbb15, 0x3feed503b23e255d, > +0x3c90cc319cee31d2, 0x3feed99e1330b358, > +0x3c8469846e735ab3, 0x3feede6b5579fdbf, > +0xbc82dfcd978e9db4, 0x3feee36bbfd3f37a, > +0x3c8c1a7792cb3387, 0x3feee89f995ad3ad, > +0xbc907b8f4ad1d9fa, 0x3feeee07298db666, > +0xbc55c3d956dcaeba, 0x3feef3a2b84f15fb, > +0xbc90a40e3da6f640, 0x3feef9728de5593a, > +0xbc68d6f438ad9334, 0x3feeff76f2fb5e47, > +0xbc91eee26b588a35, 0x3fef05b030a1064a, > +0x3c74ffd70a5fddcd, 0x3fef0c1e904bc1d2, > +0xbc91bdfbfa9298ac, 0x3fef12c25bd71e09, > +0x3c736eae30af0cb3, 0x3fef199bdd85529c, > +0x3c8ee3325c9ffd94, 0x3fef20ab5fffd07a, > +0x3c84e08fd10959ac, 0x3fef27f12e57d14b, > +0x3c63cdaf384e1a67, 0x3fef2f6d9406e7b5, > +0x3c676b2c6c921968, 0x3fef3720dcef9069, > +0xbc808a1883ccb5d2, 0x3fef3f0b555dc3fa, > +0xbc8fad5d3ffffa6f, 0x3fef472d4a07897c, > +0xbc900dae3875a949, 0x3fef4f87080d89f2, > +0x3c74a385a63d07a7, 0x3fef5818dcfba487, > +0xbc82919e2040220f, 0x3fef60e316c98398, > +0x3c8e5a50d5c192ac, 0x3fef69e603db3285, > +0x3c843a59ac016b4b, 0x3fef7321f301b460, > +0xbc82d52107b43e1f, 0x3fef7c97337b9b5f, > +0xbc892ab93b470dc9, 0x3fef864614f5a129, > +0x3c74b604603a88d3, 0x3fef902ee78b3ff6, > +0x3c83c5ec519d7271, 0x3fef9a51fbc74c83, > +0xbc8ff7128fd391f0, 0x3fefa4afa2a490da, > +0xbc8dae98e223747d, 0x3fefaf482d8e67f1, > +0x3c8ec3bc41aa2008, 0x3fefba1bee615a27, > +0x3c842b94c3a9eb32, 0x3fefc52b376bba97, > +0x3c8a64a931d185ee, 0x3fefd0765b6e4540, > +0xbc8e37bae43be3ed, 0x3fefdbfdad9cbe14, > +0x3c77893b4d91cd9d, 0x3fefe7c1819e90d8, > +0x3c5305c14160cc89, 0x3feff3c22b8f71f1, > +}, > +}; > diff --git a/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_d_log2_vsx.c b/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_d_log2_vsx.c > index d7588a7929..3a9fb05aee 100644 > --- a/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_d_log2_vsx.c > +++ b/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_d_log2_vsx.c > @@ -21,7 +21,7 @@ > #include <math/math-svid-compat.h> > #include <shlib-compat.h> > #include <libm-alias-double.h> > -#include "math_config.h" > +#include "math_config_dbl.h" > #include <altivec.h> > > #define T __log_data.tab > diff --git a/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_d_log_data.c b/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_d_log_data.c > index 2fe569576c..f1299404c4 100644 > --- a/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_d_log_data.c > +++ b/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_d_log_data.c > @@ -16,7 +16,7 @@ > License along with the GNU C Library; if not, see > <http://www.gnu.org/licenses/>. */ > > -#include "math_config.h" > +#include "math_config_dbl.h" > > #define N (1 << LOG_TABLE_BITS) > > diff --git a/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_math_err.c b/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_math_err.c > new file mode 100644 > index 0000000000..e60b0d232f > --- /dev/null > +++ b/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_math_err.c > @@ -0,0 +1,40 @@ > +/* Double-precision math error handling. > + Copyright (C) 2017-2019 Free Software Foundation, Inc. > + This file is part of the GNU C Library. > + > + The GNU C Library is free software; you can redistribute it and/or > + modify it under the terms of the GNU Lesser General Public > + License as published by the Free Software Foundation; either > + version 2.1 of the License, or (at your option) any later version. > + > + The GNU C Library is distributed in the hope that it will be useful, > + but WITHOUT ANY WARRANTY; without even the implied warranty of > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > + Lesser General Public License for more details. > + > + You should have received a copy of the GNU Lesser General Public > + License along with the GNU C Library; if not, see > + <http://www.gnu.org/licenses/>. */ > + > +#include <fpu/math-barriers.h> > +#include "math_config_dbl.h" > + > +/* NOINLINE reduces code size. */ > +NOINLINE static double > +xflow (uint32_t sign, double y) > +{ > + y = math_opt_barrier (sign ? -y : y) * y; > + return y; > +} > + > +attribute_hidden double > +__math_uflow (uint32_t sign) > +{ > + return xflow (sign, 0x1p-767); > +} > + > +attribute_hidden double > +__math_oflow (uint32_t sign) > +{ > + return xflow (sign, 0x1p769); > +} > diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/libmvec.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc64/libmvec.abilist > index bfbd340626..46558dbd77 100644 > --- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/libmvec.abilist > +++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/libmvec.abilist > @@ -1,4 +1,5 @@ > GLIBC_2.30 _ZGVbN2v_cos F > +GLIBC_2.30 _ZGVbN2v_exp F > GLIBC_2.30 _ZGVbN2v_log F > GLIBC_2.30 _ZGVbN2v_sin F > GLIBC_2.30 _ZGVbN2vvv_sincos F > -- > 2.21.0.1020.gf2820cf01a >
* Shawn Landden: > On Wed, May 29, 2019 at 2:11 PM Shawn Landden <shawn@git.icu> wrote: >> >> Passes all tests. > I change this but forgot to regenerate the patch. I am getting a bunch > of seemingly unrelated test failures. Did you rebuild everything from scratch? Incremental builds often result in bizarre failures. 8-( Thanks, Florian
Ping On Wed, May 29, 2019 at 2:11 PM Shawn Landden <shawn@git.icu> wrote: > > Passes all tests. > > Unlike other libmvec functions, this sets the underflow and overflow bits. > The caller can check these flags, and possibly re-run the calculations with > scalar expf to figure out what is causing the overflow or underflow. > > The special-case path is not vectorized, and performs much worse than > the scalar code. > Normalized data: 1 to 2^32 converted to double > Running 20 times over 32MiB > vector: mean 563.807107 MiB/s (sd 0.390922) > scalar: mean 226.527824 MiB/s (sd 0.077406) > > Random data: > vector: mean 80.175986 MiB/s (sd 1.110948) > scalar: mean 244.738130 MiB/s (sd 0.029561) > > v2: Fix some serious problems with the algorithm. Not > sure how I missed those problems. > v3: Style problems. Fix symbol order. > > 2019-05-11 Shawn Landden <shawn@git.icu> > > [BZ #24209] > * NEWS: Noted the addition of PPC64 vector exp function. > * sysdeps/powerpc/bits/math-vector.h: Added entry for vector exp. > * sysdeps/powerpc/powerpc64/fpu/Versions: Added vector exp entry. > * sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile: > (libmvec-sysdep_routines, CFLAGS-vec_d_exp2_vsx.c): > (CFLAGS-vec_d_exp_data.c): Added build of VSX SIMD expf > function and tests. Added vec_math_err.c to build. > * sysdeps/powerpc/powerpc64/fpu/multiarch/(math_config.h)->(math_config_dbl.h) > Renamed, and modified for exp. > * sysdeps/powerpc/powerpc64/fpu/multiarch/test-double-vlen2-wrappers.c: > Added entry for vector exp. > * sysdeps/powerpc/powerpc64/fpu/multiarch/vec_d_exp2_vsx.c: New file. > * sysdeps/powerpc/powerpc64/fpu/multiarch/vec_d_exp_data.c: Likewise. > * sysdeps/powerpc/powerpc64/fpu/multiarch/vec_math_err.c: Likewise. > * sysdeps/unix/sysv/linux/powerpc/powerpc64/libmvec.abilist: SIMD expf > added. > * sysdeps/powerpc/powerpc64/fpu/multiarch/vec_d_log_data.c: Rename of math_config.h > * sysdeps/powerpc/powerpc64/fpu/multiarch/vec_d_log2_vsx.c: Likewise > --- > NEWS | 1 + > sysdeps/powerpc/bits/math-vector.h | 2 + > sysdeps/powerpc/powerpc64/fpu/Versions | 2 +- > .../powerpc/powerpc64/fpu/multiarch/Makefile | 8 +- > .../{math_config.h => math_config_dbl.h} | 15 +- > .../multiarch/test-double-vlen2-wrappers.c | 1 + > .../powerpc64/fpu/multiarch/vec_d_exp2_vsx.c | 197 +++++++++++++++++ > .../powerpc64/fpu/multiarch/vec_d_exp_data.c | 202 ++++++++++++++++++ > .../powerpc64/fpu/multiarch/vec_d_log2_vsx.c | 2 +- > .../powerpc64/fpu/multiarch/vec_d_log_data.c | 2 +- > .../powerpc64/fpu/multiarch/vec_math_err.c | 40 ++++ > .../linux/powerpc/powerpc64/libmvec.abilist | 1 + > 12 files changed, 461 insertions(+), 12 deletions(-) > rename sysdeps/powerpc/powerpc64/fpu/multiarch/{math_config.h => math_config_dbl.h} (95%) > create mode 100644 sysdeps/powerpc/powerpc64/fpu/multiarch/vec_d_exp2_vsx.c > create mode 100644 sysdeps/powerpc/powerpc64/fpu/multiarch/vec_d_exp_data.c > create mode 100644 sysdeps/powerpc/powerpc64/fpu/multiarch/vec_math_err.c > > diff --git a/NEWS b/NEWS > index 26f5c67f75..a33af8c316 100644 > --- a/NEWS > +++ b/NEWS > @@ -20,6 +20,7 @@ Major new features: > - single-precision sine: sinf > - double-precision sincos: sincos > - single-precision sincos: sincosf > + - double-precision exp: exp > - single-precision exp: expf > > GCC support for auto-vectorization of functions on PPC64 is not yet > diff --git a/sysdeps/powerpc/bits/math-vector.h b/sysdeps/powerpc/bits/math-vector.h > index bbc9db0dd9..e7846b4bfa 100644 > --- a/sysdeps/powerpc/bits/math-vector.h > +++ b/sysdeps/powerpc/bits/math-vector.h > @@ -46,6 +46,8 @@ > # define __DECL_SIMD_sincosf __DECL_SIMD_PPC64 > # undef __DECL_SIMD_expf > # define __DECL_SIMD_expf __DECL_SIMD_PPC64 > +# undef __DECL_SIMD_exp > +# define __DECL_SIMD_exp __DECL_SIMD_PPC64 > > # endif > #endif > diff --git a/sysdeps/powerpc/powerpc64/fpu/Versions b/sysdeps/powerpc/powerpc64/fpu/Versions > index 7c45e76074..361edee9e7 100644 > --- a/sysdeps/powerpc/powerpc64/fpu/Versions > +++ b/sysdeps/powerpc/powerpc64/fpu/Versions > @@ -2,6 +2,6 @@ libmvec { > GLIBC_2.30 { > _ZGVbN2v_cos; _ZGVbN4v_cosf; _ZGVbN2v_sin; _ZGVbN4v_sinf; > _ZGVbN2v_log; _ZGVbN4v_logf; _ZGVbN2vvv_sincos; _ZGVbN4vvv_sincosf; > - _ZGVbN4v_expf; > + _ZGVbN4v_expf; _ZGVbN2v_exp; > } > } > diff --git a/sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile b/sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile > index d15183d5ee..dd982bb068 100644 > --- a/sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile > +++ b/sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile > @@ -49,7 +49,8 @@ libmvec-sysdep_routines += vec_d_cos2_vsx vec_s_cosf4_vsx \ > vec_d_log2_vsx vec_d_log_data \ > vec_s_logf4_vsx vec_s_logf_data \ > vec_s_expf4_vsx vec_s_exp2f_data \ > - vec_math_errf \ > + vec_math_errf vec_math_err \ > + vec_d_exp2_vsx vec_d_exp_data \ > vec_d_sincos2_vsx vec_s_sincosf4_vsx > CFLAGS-vec_d_cos2_vsx.c += -mabi=altivec -maltivec -mvsx -mpower8-vector > CFLAGS-vec_d_log2_vsx.c += -mabi=altivec -maltivec -mvsx -mpower8-vector > @@ -64,6 +65,9 @@ CFLAGS-vec_d_sincos2_vsx.c += -mabi=altivec -maltivec -mvsx -mpower8-vector > CFLAGS-vec_s_sincosf4_vsx.c += -mabi=altivec -maltivec -mvsx > CFLAGS-vec_s_expf4_vsx.c += -mabi=altivec -maltivec -mvsx > CFLAGS-vec_s_exp2f_data.c += -mabi=altivec -maltivec -mvsx > +CFLAGS-vec_d_exp2_vsx.c += -mabi=altivec -maltivec -mvsx -mpower8-vector > +CFLAGS-vec_d_exp_data.c += -mabi=altivec -maltivec -mvsx > +CFLAGS-vec_math_err.c += -mabi=altivec -maltivec -mvsx > endif > > # Variables for libmvec tests. > @@ -71,7 +75,7 @@ ifeq ($(subdir),math) > ifeq ($(build-mathvec),yes) > libmvec-tests += double-vlen2 float-vlen4 > > -double-vlen2-funcs = cos sin sincos log > +double-vlen2-funcs = cos sin sincos log exp > float-vlen4-funcs = cos sin sincos log exp > > double-vlen2-arch-ext-cflags = -mabi=altivec -maltivec -mvsx -DREQUIRE_VSX > diff --git a/sysdeps/powerpc/powerpc64/fpu/multiarch/math_config.h b/sysdeps/powerpc/powerpc64/fpu/multiarch/math_config_dbl.h > similarity index 95% > rename from sysdeps/powerpc/powerpc64/fpu/multiarch/math_config.h > rename to sysdeps/powerpc/powerpc64/fpu/multiarch/math_config_dbl.h > index bd37906ecb..7c94f5b9b2 100644 > --- a/sysdeps/powerpc/powerpc64/fpu/multiarch/math_config.h > +++ b/sysdeps/powerpc/powerpc64/fpu/multiarch/math_config_dbl.h > @@ -24,6 +24,7 @@ > #include <nan-high-order-bit.h> > #include <stdint.h> > #include <altivec.h> > +#include <math-narrow-eval.h> > > #ifndef WANT_ROUNDING > /* Correct special case results in non-nearest rounding modes. */ > @@ -134,13 +135,13 @@ check_uflow (double x) > #define EXP2_POLY_ORDER 5 > extern const struct exp_data > { > - double invln2N; > - double shift; > - double negln2hiN; > - double negln2loN; > - double poly[4]; /* Last four coefficients. */ > - double exp2_shift; > - double exp2_poly[EXP2_POLY_ORDER]; > + vector double invln2N; > + vector double shift; > + vector double negln2hiN; > + vector double negln2loN; > + vector double poly[4]; /* Last four coefficients. */ > + vector double exp2_shift; > + vector double exp2_poly[EXP2_POLY_ORDER]; > uint64_t tab[2*(1 << EXP_TABLE_BITS)]; > } __exp_data attribute_hidden; > > diff --git a/sysdeps/powerpc/powerpc64/fpu/multiarch/test-double-vlen2-wrappers.c b/sysdeps/powerpc/powerpc64/fpu/multiarch/test-double-vlen2-wrappers.c > index c9e2dbfdd0..f368e47947 100644 > --- a/sysdeps/powerpc/powerpc64/fpu/multiarch/test-double-vlen2-wrappers.c > +++ b/sysdeps/powerpc/powerpc64/fpu/multiarch/test-double-vlen2-wrappers.c > @@ -24,4 +24,5 @@ > VECTOR_WRAPPER (WRAPPER_NAME (cos), _ZGVbN2v_cos) > VECTOR_WRAPPER (WRAPPER_NAME (sin), _ZGVbN2v_sin) > VECTOR_WRAPPER (WRAPPER_NAME (log), _ZGVbN2v_log) > +VECTOR_WRAPPER (WRAPPER_NAME (exp), _ZGVbN2v_exp) > VECTOR_WRAPPER_fFF (WRAPPER_NAME (sincos), _ZGVbN2vvv_sincos) > diff --git a/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_d_exp2_vsx.c b/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_d_exp2_vsx.c > new file mode 100644 > index 0000000000..376531e32b > --- /dev/null > +++ b/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_d_exp2_vsx.c > @@ -0,0 +1,197 @@ > +/* Double-precision vector exp(x) function. > + Copyright (C) 2019 Free Software Foundation, Inc. > + This file is part of the GNU C Library. > + The GNU C Library is free software; you can redistribute it and/or > + modify it under the terms of the GNU Lesser General Public > + License as published by the Free Software Foundation; either > + version 2.1 of the License, or (at your option) any later version. > + The GNU C Library is distributed in the hope that it will be useful, > + but WITHOUT ANY WARRANTY; without even the implied warranty of > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > + Lesser General Public License for more details. > + You should have received a copy of the GNU Lesser General Public > + License along with the GNU C Library; if not, see > + <http://www.gnu.org/licenses/>. */ > +/* Based on sysdeps/ieee754/dbl-64/e_exp.c which came from > + Szabolcs Nagy at ARM Ltd. */ > +#include <altivec.h> > +#include <math.h> > +#include <math-barriers.h> > +#include <math-narrow-eval.h> > + > +#include "math_config_dbl.h" > + > +typedef vector long long unsigned v64u; > +typedef union { > + vector unsigned u; > + vector float f; > + v64u l; > + vector double d; > +} u; > +typedef union { > + double d; > + uint64_t l; > + unsigned u; > + float f; > +} us; > + > +#define N (1 << EXP_TABLE_BITS) > +#define InvLn2N __exp_data.invln2N > +#define NegLn2hiN __exp_data.negln2hiN > +#define NegLn2loN __exp_data.negln2loN > +#define Shift __exp_data.shift > +#define T __exp_data.tab > +#define C2 __exp_data.poly[5 - EXP_POLY_ORDER] > +#define C3 __exp_data.poly[6 - EXP_POLY_ORDER] > +#define C4 __exp_data.poly[7 - EXP_POLY_ORDER] > +#define C5 __exp_data.poly[8 - EXP_POLY_ORDER] > + > +/* Handle cases that may overflow or underflow when computing the result that > + is scale*(1+TMP) without intermediate rounding. The bit representation of > + scale is in SBITS, however it has a computed exponent that may have > + overflown into the sign bit so that needs to be adjusted before using it as > + a double. (int32_t)KI is the k used in the argument reduction and exponent > + adjustment of scale, positive k here means the result may overflow and > + negative k means the result may underflow. */ > +static inline double > +specialcase (double_t tmp, uint64_t sbits, uint64_t ki) > +{ > + double_t scale, y; > + > + if ((ki & 0x80000000) == 0) > + { > + /* k > 0, the exponent of scale might have overflowed by <= 460. */ > + sbits -= 1009ull << 52; > + scale = asdouble (sbits); > + y = 0x1p1009 * (scale + scale * tmp); > + return y; > + } > + /* k < 0, need special care in the subnormal range. */ > + sbits += 1022ull << 52; > + scale = asdouble (sbits); > + y = scale + scale * tmp; > + if (y < 1.0) > + { > + /* Round y to the right precision before scaling it into the subnormal > + range to avoid double rounding that can cause 0.5+E/2 ulp error where > + E is the worst-case ulp error outside the subnormal range. So this > + is only useful if the goal is better than 1 ulp worst-case error. */ > + double_t hi, lo; > + lo = scale - y + scale * tmp; > + hi = 1.0 + y; > + lo = 1.0 - hi + y + lo; > + y = math_narrow_eval (hi + lo) - 1.0; > + /* Avoid -0.0 with downward rounding. */ > + if (WANT_ROUNDING && y == 0.0) > + y = 0.0; > + /* The underflow exception needs to be signaled explicitly. */ > + math_force_eval (math_opt_barrier (0x1p-1022) * 0x1p-1022); > + } > + y = 0x1p-1022 * y; > + return y; > +} > + > +/* Top 12 bits of a double (sign and exponent bits). */ > +static inline uint32_t top12(double x) > +{ > + return asuint64(x) >> 52; > +} > + > +vector double > +_ZGVbN2v_exp (vector double x) > +{ > + v64u abstop, is_special_case, is_special_case2, idx, top, sbits; > + vector double kd, z, r, r2, scale, tail, tmp; > + u t, t2; > + u res; > + u xu; > + xu.d = x; > + v64u zero = {0, 0}; > + u c512; > + vector double load = {512.0, 512.0}; > + c512.d = load; > + u normalize; > + vector double load2 = {0x1p-54, 0x1p-54}; > + normalize.d = load2; > + c512.l &= 0x7ff0000000000000; > + normalize.l &= 0x7ff0000000000000; > + abstop = xu.l & 0x7ff0000000000000; > + is_special_case = (v64u) vec_cmpge (xu.l - normalize.l, c512.l - normalize.l); > + if (__glibc_unlikely (!vec_all_eq (is_special_case, zero))) > + { > + for (int m=0;m<2;++m) > + { > + if (!is_special_case[m]) > + continue; > + double v; > + uint32_t abstops = abstop[m] >> 52; > + if (!is_special_case[m]) > + continue; > + v = x[m]; > + if (abstops - top12(0x1p-54) >= 0x80000000) > + /* Avoid spurious underflow for tiny x. */ > + /* Note: 0 is common input. */ > + res.d[m] = WANT_ROUNDING ? 1.0 + v : 1.0; > + if (abstops >= top12(1024.0)) { > + if (asuint64(v) == asuint64(-INFINITY)) > + res.d[m] = 0.0; > + else if (abstops >= top12(INFINITY)) > + res.d[m] = 1.0 + v; > + else if (asuint64(v) >> 63) > + res.d[m] = __math_uflow(0); > + else > + res.d[m] = __math_oflow(0); > + /* prevent is_special_case2 from matching */ > + abstop[m] = 1; > + } else { > + /* Large x is special cased below. */ > + abstop[m] = 0; > + is_special_case[m] = 0; > + } > + } > + } > + /* exp(x) = 2^(k/N) * exp(r), with exp(r) in [2^(-1/2N),2^(1/2N)]. */ > + /* x = ln2/N*k + r, with int k and r in [-ln2/2N, ln2/2N]. */ > + z = InvLn2N * x; > + u kdu; > + /* z - kd is in [-1, 1] in non-nearest rounding modes. */ > + kd = z + Shift; > + kdu.d = kd; > + kd -= Shift; > + r = x + kd * NegLn2hiN + kd * NegLn2loN; > + /* 2^(k/N) ~= scale * (1 + tail). */ > + idx = 2 * (kdu.l % N); > + top = kdu.l << (52 - EXP_TABLE_BITS); > + v64u load3 = {T[idx[0]], T[idx[1]]}; > + t.l = load3; > + v64u load4 = {T[idx[0] + 1], T[idx[1] + 1]}; > + t2.l = load4; > + tail = t.d; > + /* This is only a valid scale when -1023*N < k < 1024*N. */ > + sbits = t2.l + top; > + /* exp(x) = 2^(k/N) * exp(r) ~= scale + scale * (tail + exp(r) - 1). */ > + /* Evaluation is optimized assuming superscalar pipelined execution. */ > + r2 = r * r; > + /* Without fma the worst case error is 0.25/N ulp larger. */ > + /* Worst case error is less than 0.5+1.11/N+(abs poly error * 2^53) ulp. */ > + tmp = tail + r + r2 * (C2 + r * C3) + r2 * r2 * (C4 + r * C5); > + is_special_case2 = (v64u) vec_cmpeq (abstop, zero); > + if (__glibc_unlikely (!vec_all_eq (is_special_case2, zero))) > + { > + u sc2; > + for (int m=0;m<2;++m) > + { > + sc2.d[m] = specialcase(tmp[m], sbits[m], kdu.l[m]); > + } > + res.l = vec_sel (res.l, sc2.l, is_special_case2); > + } > + u sbitsu; > + sbitsu.l = sbits; > + scale = sbitsu.d; > + u res2; > + res2.d = scale + scale * tmp; > + /* Note: tmp == 0 or |tmp| > 2^-200 and scale > 2^-739, so there > + is no spurious underflow here even without fma. */ > + res.l = vec_sel (res.l, res2.l, ~is_special_case & ~is_special_case2); > + return res.d; > +} > diff --git a/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_d_exp_data.c b/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_d_exp_data.c > new file mode 100644 > index 0000000000..208b897d04 > --- /dev/null > +++ b/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_d_exp_data.c > @@ -0,0 +1,202 @@ > +/* Shared data between vector exp, exp2 and pow > + Copyright (C) 2019 Free Software Foundation, Inc. > + This file is part of the GNU C Library. > + The GNU C Library is free software; you can redistribute it and/or > + modify it under the terms of the GNU Lesser General Public > + License as published by the Free Software Foundation; either > + version 2.1 of the License, or (at your option) any later version. > + The GNU C Library is distributed in the hope that it will be useful, > + but WITHOUT ANY WARRANTY; without even the implied warranty of > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > + Lesser General Public License for more details. > + You should have received a copy of the GNU Lesser General Public > + License along with the GNU C Library; if not, see > + <http://www.gnu.org/licenses/>. */ > +/* Based on sysdeps/ieee754/dbl-64/e_exp.c which came from > + Szabolcs Nagy at ARM Ltd. */ > + > +#include "math_config_dbl.h" > + > +#define N (1 << EXP_TABLE_BITS) > + > +const struct exp_data __exp_data = { > +// N/ln2 > +.invln2N = { > + 0x1.71547652b82fep0 * N, > + 0x1.71547652b82fep0 * N, > +}, > +// -ln2/N > +.negln2hiN = { > + -0x1.62e42fefa0000p-8, > + -0x1.62e42fefa0000p-8, > +}, > +.negln2loN = { > + -0x1.cf79abc9e3b3ap-47, > + -0x1.cf79abc9e3b3ap-47, > +}, > +.shift = { > + 0x1.8p52, > + 0x1.8p52, > +}, > +// exp polynomial coefficients. > +.poly = { > +// abs error: 1.555*2^-66 > +// ulp error: 0.509 (0.511 without fma) > +// if |x| < ln2/256+eps > +// abs error if |x| < ln2/256+0x1p-15: 1.09*2^-65 > +// abs error if |x| < ln2/128: 1.7145*2^-56 > +{0x1.ffffffffffdbdp-2, 0x1.ffffffffffdbdp-2}, > +{0x1.555555555543cp-3, 0x1.555555555543cp-3}, > +{0x1.55555cf172b91p-5, 0x1.55555cf172b91p-5}, > +{0x1.1111167a4d017p-7, 0x1.1111167a4d017p-7}, > +}, > +.exp2_shift = { > + 0x1.8p52 / N, > + 0x1.8p52 / N, > +}, > +// exp2 polynomial coefficients. > +.exp2_poly = { > +// abs error: 1.2195*2^-65 > +// ulp error: 0.507 (0.511 without fma) > +// if |x| < 1/256 > +// abs error if |x| < 1/128: 1.9941*2^-56 > +{0x1.62e42fefa39efp-1, 0x1.62e42fefa39efp-1}, > +{0x1.ebfbdff82c424p-3, 0x1.ebfbdff82c424p-3}, > +{0x1.c6b08d70cf4b5p-5, 0x1.c6b08d70cf4b5p-5}, > +{0x1.3b2abd24650ccp-7, 0x1.3b2abd24650ccp-7}, > +{0x1.5d7e09b4e3a84p-10, 0x1.5d7e09b4e3a84p-10}, > +}, > +// 2^(k/N) ~= H[k]*(1 + T[k]) for int k in [0,N) > +// tab[2*k] = asuint64(T[k]) > +// tab[2*k+1] = asuint64(H[k]) - (k << 52)/N > +.tab = { > +0x0, 0x3ff0000000000000, > +0x3c9b3b4f1a88bf6e, 0x3feff63da9fb3335, > +0xbc7160139cd8dc5d, 0x3fefec9a3e778061, > +0xbc905e7a108766d1, 0x3fefe315e86e7f85, > +0x3c8cd2523567f613, 0x3fefd9b0d3158574, > +0xbc8bce8023f98efa, 0x3fefd06b29ddf6de, > +0x3c60f74e61e6c861, 0x3fefc74518759bc8, > +0x3c90a3e45b33d399, 0x3fefbe3ecac6f383, > +0x3c979aa65d837b6d, 0x3fefb5586cf9890f, > +0x3c8eb51a92fdeffc, 0x3fefac922b7247f7, > +0x3c3ebe3d702f9cd1, 0x3fefa3ec32d3d1a2, > +0xbc6a033489906e0b, 0x3fef9b66affed31b, > +0xbc9556522a2fbd0e, 0x3fef9301d0125b51, > +0xbc5080ef8c4eea55, 0x3fef8abdc06c31cc, > +0xbc91c923b9d5f416, 0x3fef829aaea92de0, > +0x3c80d3e3e95c55af, 0x3fef7a98c8a58e51, > +0xbc801b15eaa59348, 0x3fef72b83c7d517b, > +0xbc8f1ff055de323d, 0x3fef6af9388c8dea, > +0x3c8b898c3f1353bf, 0x3fef635beb6fcb75, > +0xbc96d99c7611eb26, 0x3fef5be084045cd4, > +0x3c9aecf73e3a2f60, 0x3fef54873168b9aa, > +0xbc8fe782cb86389d, 0x3fef4d5022fcd91d, > +0x3c8a6f4144a6c38d, 0x3fef463b88628cd6, > +0x3c807a05b0e4047d, 0x3fef3f49917ddc96, > +0x3c968efde3a8a894, 0x3fef387a6e756238, > +0x3c875e18f274487d, 0x3fef31ce4fb2a63f, > +0x3c80472b981fe7f2, 0x3fef2b4565e27cdd, > +0xbc96b87b3f71085e, 0x3fef24dfe1f56381, > +0x3c82f7e16d09ab31, 0x3fef1e9df51fdee1, > +0xbc3d219b1a6fbffa, 0x3fef187fd0dad990, > +0x3c8b3782720c0ab4, 0x3fef1285a6e4030b, > +0x3c6e149289cecb8f, 0x3fef0cafa93e2f56, > +0x3c834d754db0abb6, 0x3fef06fe0a31b715, > +0x3c864201e2ac744c, 0x3fef0170fc4cd831, > +0x3c8fdd395dd3f84a, 0x3feefc08b26416ff, > +0xbc86a3803b8e5b04, 0x3feef6c55f929ff1, > +0xbc924aedcc4b5068, 0x3feef1a7373aa9cb, > +0xbc9907f81b512d8e, 0x3feeecae6d05d866, > +0xbc71d1e83e9436d2, 0x3feee7db34e59ff7, > +0xbc991919b3ce1b15, 0x3feee32dc313a8e5, > +0x3c859f48a72a4c6d, 0x3feedea64c123422, > +0xbc9312607a28698a, 0x3feeda4504ac801c, > +0xbc58a78f4817895b, 0x3feed60a21f72e2a, > +0xbc7c2c9b67499a1b, 0x3feed1f5d950a897, > +0x3c4363ed60c2ac11, 0x3feece086061892d, > +0x3c9666093b0664ef, 0x3feeca41ed1d0057, > +0x3c6ecce1daa10379, 0x3feec6a2b5c13cd0, > +0x3c93ff8e3f0f1230, 0x3feec32af0d7d3de, > +0x3c7690cebb7aafb0, 0x3feebfdad5362a27, > +0x3c931dbdeb54e077, 0x3feebcb299fddd0d, > +0xbc8f94340071a38e, 0x3feeb9b2769d2ca7, > +0xbc87deccdc93a349, 0x3feeb6daa2cf6642, > +0xbc78dec6bd0f385f, 0x3feeb42b569d4f82, > +0xbc861246ec7b5cf6, 0x3feeb1a4ca5d920f, > +0x3c93350518fdd78e, 0x3feeaf4736b527da, > +0x3c7b98b72f8a9b05, 0x3feead12d497c7fd, > +0x3c9063e1e21c5409, 0x3feeab07dd485429, > +0x3c34c7855019c6ea, 0x3feea9268a5946b7, > +0x3c9432e62b64c035, 0x3feea76f15ad2148, > +0xbc8ce44a6199769f, 0x3feea5e1b976dc09, > +0xbc8c33c53bef4da8, 0x3feea47eb03a5585, > +0xbc845378892be9ae, 0x3feea34634ccc320, > +0xbc93cedd78565858, 0x3feea23882552225, > +0x3c5710aa807e1964, 0x3feea155d44ca973, > +0xbc93b3efbf5e2228, 0x3feea09e667f3bcd, > +0xbc6a12ad8734b982, 0x3feea012750bdabf, > +0xbc6367efb86da9ee, 0x3fee9fb23c651a2f, > +0xbc80dc3d54e08851, 0x3fee9f7df9519484, > +0xbc781f647e5a3ecf, 0x3fee9f75e8ec5f74, > +0xbc86ee4ac08b7db0, 0x3fee9f9a48a58174, > +0xbc8619321e55e68a, 0x3fee9feb564267c9, > +0x3c909ccb5e09d4d3, 0x3feea0694fde5d3f, > +0xbc7b32dcb94da51d, 0x3feea11473eb0187, > +0x3c94ecfd5467c06b, 0x3feea1ed0130c132, > +0x3c65ebe1abd66c55, 0x3feea2f336cf4e62, > +0xbc88a1c52fb3cf42, 0x3feea427543e1a12, > +0xbc9369b6f13b3734, 0x3feea589994cce13, > +0xbc805e843a19ff1e, 0x3feea71a4623c7ad, > +0xbc94d450d872576e, 0x3feea8d99b4492ed, > +0x3c90ad675b0e8a00, 0x3feeaac7d98a6699, > +0x3c8db72fc1f0eab4, 0x3feeace5422aa0db, > +0xbc65b6609cc5e7ff, 0x3feeaf3216b5448c, > +0x3c7bf68359f35f44, 0x3feeb1ae99157736, > +0xbc93091fa71e3d83, 0x3feeb45b0b91ffc6, > +0xbc5da9b88b6c1e29, 0x3feeb737b0cdc5e5, > +0xbc6c23f97c90b959, 0x3feeba44cbc8520f, > +0xbc92434322f4f9aa, 0x3feebd829fde4e50, > +0xbc85ca6cd7668e4b, 0x3feec0f170ca07ba, > +0x3c71affc2b91ce27, 0x3feec49182a3f090, > +0x3c6dd235e10a73bb, 0x3feec86319e32323, > +0xbc87c50422622263, 0x3feecc667b5de565, > +0x3c8b1c86e3e231d5, 0x3feed09bec4a2d33, > +0xbc91bbd1d3bcbb15, 0x3feed503b23e255d, > +0x3c90cc319cee31d2, 0x3feed99e1330b358, > +0x3c8469846e735ab3, 0x3feede6b5579fdbf, > +0xbc82dfcd978e9db4, 0x3feee36bbfd3f37a, > +0x3c8c1a7792cb3387, 0x3feee89f995ad3ad, > +0xbc907b8f4ad1d9fa, 0x3feeee07298db666, > +0xbc55c3d956dcaeba, 0x3feef3a2b84f15fb, > +0xbc90a40e3da6f640, 0x3feef9728de5593a, > +0xbc68d6f438ad9334, 0x3feeff76f2fb5e47, > +0xbc91eee26b588a35, 0x3fef05b030a1064a, > +0x3c74ffd70a5fddcd, 0x3fef0c1e904bc1d2, > +0xbc91bdfbfa9298ac, 0x3fef12c25bd71e09, > +0x3c736eae30af0cb3, 0x3fef199bdd85529c, > +0x3c8ee3325c9ffd94, 0x3fef20ab5fffd07a, > +0x3c84e08fd10959ac, 0x3fef27f12e57d14b, > +0x3c63cdaf384e1a67, 0x3fef2f6d9406e7b5, > +0x3c676b2c6c921968, 0x3fef3720dcef9069, > +0xbc808a1883ccb5d2, 0x3fef3f0b555dc3fa, > +0xbc8fad5d3ffffa6f, 0x3fef472d4a07897c, > +0xbc900dae3875a949, 0x3fef4f87080d89f2, > +0x3c74a385a63d07a7, 0x3fef5818dcfba487, > +0xbc82919e2040220f, 0x3fef60e316c98398, > +0x3c8e5a50d5c192ac, 0x3fef69e603db3285, > +0x3c843a59ac016b4b, 0x3fef7321f301b460, > +0xbc82d52107b43e1f, 0x3fef7c97337b9b5f, > +0xbc892ab93b470dc9, 0x3fef864614f5a129, > +0x3c74b604603a88d3, 0x3fef902ee78b3ff6, > +0x3c83c5ec519d7271, 0x3fef9a51fbc74c83, > +0xbc8ff7128fd391f0, 0x3fefa4afa2a490da, > +0xbc8dae98e223747d, 0x3fefaf482d8e67f1, > +0x3c8ec3bc41aa2008, 0x3fefba1bee615a27, > +0x3c842b94c3a9eb32, 0x3fefc52b376bba97, > +0x3c8a64a931d185ee, 0x3fefd0765b6e4540, > +0xbc8e37bae43be3ed, 0x3fefdbfdad9cbe14, > +0x3c77893b4d91cd9d, 0x3fefe7c1819e90d8, > +0x3c5305c14160cc89, 0x3feff3c22b8f71f1, > +}, > +}; > diff --git a/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_d_log2_vsx.c b/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_d_log2_vsx.c > index d7588a7929..3a9fb05aee 100644 > --- a/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_d_log2_vsx.c > +++ b/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_d_log2_vsx.c > @@ -21,7 +21,7 @@ > #include <math/math-svid-compat.h> > #include <shlib-compat.h> > #include <libm-alias-double.h> > -#include "math_config.h" > +#include "math_config_dbl.h" > #include <altivec.h> > > #define T __log_data.tab > diff --git a/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_d_log_data.c b/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_d_log_data.c > index 2fe569576c..f1299404c4 100644 > --- a/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_d_log_data.c > +++ b/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_d_log_data.c > @@ -16,7 +16,7 @@ > License along with the GNU C Library; if not, see > <http://www.gnu.org/licenses/>. */ > > -#include "math_config.h" > +#include "math_config_dbl.h" > > #define N (1 << LOG_TABLE_BITS) > > diff --git a/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_math_err.c b/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_math_err.c > new file mode 100644 > index 0000000000..e60b0d232f > --- /dev/null > +++ b/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_math_err.c > @@ -0,0 +1,40 @@ > +/* Double-precision math error handling. > + Copyright (C) 2017-2019 Free Software Foundation, Inc. > + This file is part of the GNU C Library. > + > + The GNU C Library is free software; you can redistribute it and/or > + modify it under the terms of the GNU Lesser General Public > + License as published by the Free Software Foundation; either > + version 2.1 of the License, or (at your option) any later version. > + > + The GNU C Library is distributed in the hope that it will be useful, > + but WITHOUT ANY WARRANTY; without even the implied warranty of > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > + Lesser General Public License for more details. > + > + You should have received a copy of the GNU Lesser General Public > + License along with the GNU C Library; if not, see > + <http://www.gnu.org/licenses/>. */ > + > +#include <fpu/math-barriers.h> > +#include "math_config_dbl.h" > + > +/* NOINLINE reduces code size. */ > +NOINLINE static double > +xflow (uint32_t sign, double y) > +{ > + y = math_opt_barrier (sign ? -y : y) * y; > + return y; > +} > + > +attribute_hidden double > +__math_uflow (uint32_t sign) > +{ > + return xflow (sign, 0x1p-767); > +} > + > +attribute_hidden double > +__math_oflow (uint32_t sign) > +{ > + return xflow (sign, 0x1p769); > +} > diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/libmvec.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc64/libmvec.abilist > index bfbd340626..46558dbd77 100644 > --- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/libmvec.abilist > +++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/libmvec.abilist > @@ -1,4 +1,5 @@ > GLIBC_2.30 _ZGVbN2v_cos F > +GLIBC_2.30 _ZGVbN2v_exp F > GLIBC_2.30 _ZGVbN2v_log F > GLIBC_2.30 _ZGVbN2v_sin F > GLIBC_2.30 _ZGVbN2vvv_sincos F > -- > 2.21.0.1020.gf2820cf01a >
Shawn Landden <shawn@git.icu> writes: > [BZ #24209] > * NEWS: Noted the addition of PPC64 vector exp function. > * sysdeps/powerpc/bits/math-vector.h: Added entry for vector exp. > * sysdeps/powerpc/powerpc64/fpu/Versions: Added vector exp entry. > * sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile: > (libmvec-sysdep_routines, CFLAGS-vec_d_exp2_vsx.c): > (CFLAGS-vec_d_exp_data.c): Added build of VSX SIMD expf > function and tests. Added vec_math_err.c to build. > * sysdeps/powerpc/powerpc64/fpu/multiarch/(math_config.h)->(math_config_dbl.h) > Renamed, and modified for exp. > * sysdeps/powerpc/powerpc64/fpu/multiarch/test-double-vlen2-wrappers.c: > Added entry for vector exp. > * sysdeps/powerpc/powerpc64/fpu/multiarch/vec_d_exp2_vsx.c: New file. > * sysdeps/powerpc/powerpc64/fpu/multiarch/vec_d_exp_data.c: Likewise. > * sysdeps/powerpc/powerpc64/fpu/multiarch/vec_math_err.c: Likewise. > * sysdeps/unix/sysv/linux/powerpc/powerpc64/libmvec.abilist: SIMD expf > added. > * sysdeps/powerpc/powerpc64/fpu/multiarch/vec_d_log_data.c: Rename of math_config.h > * sysdeps/powerpc/powerpc64/fpu/multiarch/vec_d_log2_vsx.c: Likewise Looks like some of the issues you fixed in v2 got reverted here. Fixed. > diff --git a/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_d_exp2_vsx.c b/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_d_exp2_vsx.c > new file mode 100644 > index 0000000000..376531e32b > --- /dev/null > +++ b/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_d_exp2_vsx.c > ... > +/* Top 12 bits of a double (sign and exponent bits). */ > +static inline uint32_t top12(double x) > +{ > + return asuint64(x) >> 52; > +} Something happened here as well and all the coding style issues that you fixed in v2 got lost. Starting from this function. Fixed. I'm pushing this patch with the fixes I mentioned to branch tuliom/libmvec. Thanks!
diff --git a/NEWS b/NEWS index 26f5c67f75..a33af8c316 100644 --- a/NEWS +++ b/NEWS @@ -20,6 +20,7 @@ Major new features: - single-precision sine: sinf - double-precision sincos: sincos - single-precision sincos: sincosf + - double-precision exp: exp - single-precision exp: expf GCC support for auto-vectorization of functions on PPC64 is not yet diff --git a/sysdeps/powerpc/bits/math-vector.h b/sysdeps/powerpc/bits/math-vector.h index bbc9db0dd9..e7846b4bfa 100644 --- a/sysdeps/powerpc/bits/math-vector.h +++ b/sysdeps/powerpc/bits/math-vector.h @@ -46,6 +46,8 @@ # define __DECL_SIMD_sincosf __DECL_SIMD_PPC64 # undef __DECL_SIMD_expf # define __DECL_SIMD_expf __DECL_SIMD_PPC64 +# undef __DECL_SIMD_exp +# define __DECL_SIMD_exp __DECL_SIMD_PPC64 # endif #endif diff --git a/sysdeps/powerpc/powerpc64/fpu/Versions b/sysdeps/powerpc/powerpc64/fpu/Versions index 7c45e76074..361edee9e7 100644 --- a/sysdeps/powerpc/powerpc64/fpu/Versions +++ b/sysdeps/powerpc/powerpc64/fpu/Versions @@ -2,6 +2,6 @@ libmvec { GLIBC_2.30 { _ZGVbN2v_cos; _ZGVbN4v_cosf; _ZGVbN2v_sin; _ZGVbN4v_sinf; _ZGVbN2v_log; _ZGVbN4v_logf; _ZGVbN2vvv_sincos; _ZGVbN4vvv_sincosf; - _ZGVbN4v_expf; + _ZGVbN4v_expf; _ZGVbN2v_exp; } } diff --git a/sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile b/sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile index d15183d5ee..dd982bb068 100644 --- a/sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile +++ b/sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile @@ -49,7 +49,8 @@ libmvec-sysdep_routines += vec_d_cos2_vsx vec_s_cosf4_vsx \ vec_d_log2_vsx vec_d_log_data \ vec_s_logf4_vsx vec_s_logf_data \ vec_s_expf4_vsx vec_s_exp2f_data \ - vec_math_errf \ + vec_math_errf vec_math_err \ + vec_d_exp2_vsx vec_d_exp_data \ vec_d_sincos2_vsx vec_s_sincosf4_vsx CFLAGS-vec_d_cos2_vsx.c += -mabi=altivec -maltivec -mvsx -mpower8-vector CFLAGS-vec_d_log2_vsx.c += -mabi=altivec -maltivec -mvsx -mpower8-vector @@ -64,6 +65,9 @@ CFLAGS-vec_d_sincos2_vsx.c += -mabi=altivec -maltivec -mvsx -mpower8-vector CFLAGS-vec_s_sincosf4_vsx.c += -mabi=altivec -maltivec -mvsx CFLAGS-vec_s_expf4_vsx.c += -mabi=altivec -maltivec -mvsx CFLAGS-vec_s_exp2f_data.c += -mabi=altivec -maltivec -mvsx +CFLAGS-vec_d_exp2_vsx.c += -mabi=altivec -maltivec -mvsx -mpower8-vector +CFLAGS-vec_d_exp_data.c += -mabi=altivec -maltivec -mvsx +CFLAGS-vec_math_err.c += -mabi=altivec -maltivec -mvsx endif # Variables for libmvec tests. @@ -71,7 +75,7 @@ ifeq ($(subdir),math) ifeq ($(build-mathvec),yes) libmvec-tests += double-vlen2 float-vlen4 -double-vlen2-funcs = cos sin sincos log +double-vlen2-funcs = cos sin sincos log exp float-vlen4-funcs = cos sin sincos log exp double-vlen2-arch-ext-cflags = -mabi=altivec -maltivec -mvsx -DREQUIRE_VSX diff --git a/sysdeps/powerpc/powerpc64/fpu/multiarch/math_config.h b/sysdeps/powerpc/powerpc64/fpu/multiarch/math_config_dbl.h similarity index 95% rename from sysdeps/powerpc/powerpc64/fpu/multiarch/math_config.h rename to sysdeps/powerpc/powerpc64/fpu/multiarch/math_config_dbl.h index bd37906ecb..7c94f5b9b2 100644 --- a/sysdeps/powerpc/powerpc64/fpu/multiarch/math_config.h +++ b/sysdeps/powerpc/powerpc64/fpu/multiarch/math_config_dbl.h @@ -24,6 +24,7 @@ #include <nan-high-order-bit.h> #include <stdint.h> #include <altivec.h> +#include <math-narrow-eval.h> #ifndef WANT_ROUNDING /* Correct special case results in non-nearest rounding modes. */ @@ -134,13 +135,13 @@ check_uflow (double x) #define EXP2_POLY_ORDER 5 extern const struct exp_data { - double invln2N; - double shift; - double negln2hiN; - double negln2loN; - double poly[4]; /* Last four coefficients. */ - double exp2_shift; - double exp2_poly[EXP2_POLY_ORDER]; + vector double invln2N; + vector double shift; + vector double negln2hiN; + vector double negln2loN; + vector double poly[4]; /* Last four coefficients. */ + vector double exp2_shift; + vector double exp2_poly[EXP2_POLY_ORDER]; uint64_t tab[2*(1 << EXP_TABLE_BITS)]; } __exp_data attribute_hidden; diff --git a/sysdeps/powerpc/powerpc64/fpu/multiarch/test-double-vlen2-wrappers.c b/sysdeps/powerpc/powerpc64/fpu/multiarch/test-double-vlen2-wrappers.c index c9e2dbfdd0..f368e47947 100644 --- a/sysdeps/powerpc/powerpc64/fpu/multiarch/test-double-vlen2-wrappers.c +++ b/sysdeps/powerpc/powerpc64/fpu/multiarch/test-double-vlen2-wrappers.c @@ -24,4 +24,5 @@ VECTOR_WRAPPER (WRAPPER_NAME (cos), _ZGVbN2v_cos) VECTOR_WRAPPER (WRAPPER_NAME (sin), _ZGVbN2v_sin) VECTOR_WRAPPER (WRAPPER_NAME (log), _ZGVbN2v_log) +VECTOR_WRAPPER (WRAPPER_NAME (exp), _ZGVbN2v_exp) VECTOR_WRAPPER_fFF (WRAPPER_NAME (sincos), _ZGVbN2vvv_sincos) diff --git a/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_d_exp2_vsx.c b/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_d_exp2_vsx.c new file mode 100644 index 0000000000..376531e32b --- /dev/null +++ b/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_d_exp2_vsx.c @@ -0,0 +1,197 @@ +/* Double-precision vector exp(x) function. + Copyright (C) 2019 Free Software Foundation, Inc. + This file is part of the GNU C Library. + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <http://www.gnu.org/licenses/>. */ +/* Based on sysdeps/ieee754/dbl-64/e_exp.c which came from + Szabolcs Nagy at ARM Ltd. */ +#include <altivec.h> +#include <math.h> +#include <math-barriers.h> +#include <math-narrow-eval.h> + +#include "math_config_dbl.h" + +typedef vector long long unsigned v64u; +typedef union { + vector unsigned u; + vector float f; + v64u l; + vector double d; +} u; +typedef union { + double d; + uint64_t l; + unsigned u; + float f; +} us; + +#define N (1 << EXP_TABLE_BITS) +#define InvLn2N __exp_data.invln2N +#define NegLn2hiN __exp_data.negln2hiN +#define NegLn2loN __exp_data.negln2loN +#define Shift __exp_data.shift +#define T __exp_data.tab +#define C2 __exp_data.poly[5 - EXP_POLY_ORDER] +#define C3 __exp_data.poly[6 - EXP_POLY_ORDER] +#define C4 __exp_data.poly[7 - EXP_POLY_ORDER] +#define C5 __exp_data.poly[8 - EXP_POLY_ORDER] + +/* Handle cases that may overflow or underflow when computing the result that + is scale*(1+TMP) without intermediate rounding. The bit representation of + scale is in SBITS, however it has a computed exponent that may have + overflown into the sign bit so that needs to be adjusted before using it as + a double. (int32_t)KI is the k used in the argument reduction and exponent + adjustment of scale, positive k here means the result may overflow and + negative k means the result may underflow. */ +static inline double +specialcase (double_t tmp, uint64_t sbits, uint64_t ki) +{ + double_t scale, y; + + if ((ki & 0x80000000) == 0) + { + /* k > 0, the exponent of scale might have overflowed by <= 460. */ + sbits -= 1009ull << 52; + scale = asdouble (sbits); + y = 0x1p1009 * (scale + scale * tmp); + return y; + } + /* k < 0, need special care in the subnormal range. */ + sbits += 1022ull << 52; + scale = asdouble (sbits); + y = scale + scale * tmp; + if (y < 1.0) + { + /* Round y to the right precision before scaling it into the subnormal + range to avoid double rounding that can cause 0.5+E/2 ulp error where + E is the worst-case ulp error outside the subnormal range. So this + is only useful if the goal is better than 1 ulp worst-case error. */ + double_t hi, lo; + lo = scale - y + scale * tmp; + hi = 1.0 + y; + lo = 1.0 - hi + y + lo; + y = math_narrow_eval (hi + lo) - 1.0; + /* Avoid -0.0 with downward rounding. */ + if (WANT_ROUNDING && y == 0.0) + y = 0.0; + /* The underflow exception needs to be signaled explicitly. */ + math_force_eval (math_opt_barrier (0x1p-1022) * 0x1p-1022); + } + y = 0x1p-1022 * y; + return y; +} + +/* Top 12 bits of a double (sign and exponent bits). */ +static inline uint32_t top12(double x) +{ + return asuint64(x) >> 52; +} + +vector double +_ZGVbN2v_exp (vector double x) +{ + v64u abstop, is_special_case, is_special_case2, idx, top, sbits; + vector double kd, z, r, r2, scale, tail, tmp; + u t, t2; + u res; + u xu; + xu.d = x; + v64u zero = {0, 0}; + u c512; + vector double load = {512.0, 512.0}; + c512.d = load; + u normalize; + vector double load2 = {0x1p-54, 0x1p-54}; + normalize.d = load2; + c512.l &= 0x7ff0000000000000; + normalize.l &= 0x7ff0000000000000; + abstop = xu.l & 0x7ff0000000000000; + is_special_case = (v64u) vec_cmpge (xu.l - normalize.l, c512.l - normalize.l); + if (__glibc_unlikely (!vec_all_eq (is_special_case, zero))) + { + for (int m=0;m<2;++m) + { + if (!is_special_case[m]) + continue; + double v; + uint32_t abstops = abstop[m] >> 52; + if (!is_special_case[m]) + continue; + v = x[m]; + if (abstops - top12(0x1p-54) >= 0x80000000) + /* Avoid spurious underflow for tiny x. */ + /* Note: 0 is common input. */ + res.d[m] = WANT_ROUNDING ? 1.0 + v : 1.0; + if (abstops >= top12(1024.0)) { + if (asuint64(v) == asuint64(-INFINITY)) + res.d[m] = 0.0; + else if (abstops >= top12(INFINITY)) + res.d[m] = 1.0 + v; + else if (asuint64(v) >> 63) + res.d[m] = __math_uflow(0); + else + res.d[m] = __math_oflow(0); + /* prevent is_special_case2 from matching */ + abstop[m] = 1; + } else { + /* Large x is special cased below. */ + abstop[m] = 0; + is_special_case[m] = 0; + } + } + } + /* exp(x) = 2^(k/N) * exp(r), with exp(r) in [2^(-1/2N),2^(1/2N)]. */ + /* x = ln2/N*k + r, with int k and r in [-ln2/2N, ln2/2N]. */ + z = InvLn2N * x; + u kdu; + /* z - kd is in [-1, 1] in non-nearest rounding modes. */ + kd = z + Shift; + kdu.d = kd; + kd -= Shift; + r = x + kd * NegLn2hiN + kd * NegLn2loN; + /* 2^(k/N) ~= scale * (1 + tail). */ + idx = 2 * (kdu.l % N); + top = kdu.l << (52 - EXP_TABLE_BITS); + v64u load3 = {T[idx[0]], T[idx[1]]}; + t.l = load3; + v64u load4 = {T[idx[0] + 1], T[idx[1] + 1]}; + t2.l = load4; + tail = t.d; + /* This is only a valid scale when -1023*N < k < 1024*N. */ + sbits = t2.l + top; + /* exp(x) = 2^(k/N) * exp(r) ~= scale + scale * (tail + exp(r) - 1). */ + /* Evaluation is optimized assuming superscalar pipelined execution. */ + r2 = r * r; + /* Without fma the worst case error is 0.25/N ulp larger. */ + /* Worst case error is less than 0.5+1.11/N+(abs poly error * 2^53) ulp. */ + tmp = tail + r + r2 * (C2 + r * C3) + r2 * r2 * (C4 + r * C5); + is_special_case2 = (v64u) vec_cmpeq (abstop, zero); + if (__glibc_unlikely (!vec_all_eq (is_special_case2, zero))) + { + u sc2; + for (int m=0;m<2;++m) + { + sc2.d[m] = specialcase(tmp[m], sbits[m], kdu.l[m]); + } + res.l = vec_sel (res.l, sc2.l, is_special_case2); + } + u sbitsu; + sbitsu.l = sbits; + scale = sbitsu.d; + u res2; + res2.d = scale + scale * tmp; + /* Note: tmp == 0 or |tmp| > 2^-200 and scale > 2^-739, so there + is no spurious underflow here even without fma. */ + res.l = vec_sel (res.l, res2.l, ~is_special_case & ~is_special_case2); + return res.d; +} diff --git a/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_d_exp_data.c b/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_d_exp_data.c new file mode 100644 index 0000000000..208b897d04 --- /dev/null +++ b/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_d_exp_data.c @@ -0,0 +1,202 @@ +/* Shared data between vector exp, exp2 and pow + Copyright (C) 2019 Free Software Foundation, Inc. + This file is part of the GNU C Library. + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <http://www.gnu.org/licenses/>. */ +/* Based on sysdeps/ieee754/dbl-64/e_exp.c which came from + Szabolcs Nagy at ARM Ltd. */ + +#include "math_config_dbl.h" + +#define N (1 << EXP_TABLE_BITS) + +const struct exp_data __exp_data = { +// N/ln2 +.invln2N = { + 0x1.71547652b82fep0 * N, + 0x1.71547652b82fep0 * N, +}, +// -ln2/N +.negln2hiN = { + -0x1.62e42fefa0000p-8, + -0x1.62e42fefa0000p-8, +}, +.negln2loN = { + -0x1.cf79abc9e3b3ap-47, + -0x1.cf79abc9e3b3ap-47, +}, +.shift = { + 0x1.8p52, + 0x1.8p52, +}, +// exp polynomial coefficients. +.poly = { +// abs error: 1.555*2^-66 +// ulp error: 0.509 (0.511 without fma) +// if |x| < ln2/256+eps +// abs error if |x| < ln2/256+0x1p-15: 1.09*2^-65 +// abs error if |x| < ln2/128: 1.7145*2^-56 +{0x1.ffffffffffdbdp-2, 0x1.ffffffffffdbdp-2}, +{0x1.555555555543cp-3, 0x1.555555555543cp-3}, +{0x1.55555cf172b91p-5, 0x1.55555cf172b91p-5}, +{0x1.1111167a4d017p-7, 0x1.1111167a4d017p-7}, +}, +.exp2_shift = { + 0x1.8p52 / N, + 0x1.8p52 / N, +}, +// exp2 polynomial coefficients. +.exp2_poly = { +// abs error: 1.2195*2^-65 +// ulp error: 0.507 (0.511 without fma) +// if |x| < 1/256 +// abs error if |x| < 1/128: 1.9941*2^-56 +{0x1.62e42fefa39efp-1, 0x1.62e42fefa39efp-1}, +{0x1.ebfbdff82c424p-3, 0x1.ebfbdff82c424p-3}, +{0x1.c6b08d70cf4b5p-5, 0x1.c6b08d70cf4b5p-5}, +{0x1.3b2abd24650ccp-7, 0x1.3b2abd24650ccp-7}, +{0x1.5d7e09b4e3a84p-10, 0x1.5d7e09b4e3a84p-10}, +}, +// 2^(k/N) ~= H[k]*(1 + T[k]) for int k in [0,N) +// tab[2*k] = asuint64(T[k]) +// tab[2*k+1] = asuint64(H[k]) - (k << 52)/N +.tab = { +0x0, 0x3ff0000000000000, +0x3c9b3b4f1a88bf6e, 0x3feff63da9fb3335, +0xbc7160139cd8dc5d, 0x3fefec9a3e778061, +0xbc905e7a108766d1, 0x3fefe315e86e7f85, +0x3c8cd2523567f613, 0x3fefd9b0d3158574, +0xbc8bce8023f98efa, 0x3fefd06b29ddf6de, +0x3c60f74e61e6c861, 0x3fefc74518759bc8, +0x3c90a3e45b33d399, 0x3fefbe3ecac6f383, +0x3c979aa65d837b6d, 0x3fefb5586cf9890f, +0x3c8eb51a92fdeffc, 0x3fefac922b7247f7, +0x3c3ebe3d702f9cd1, 0x3fefa3ec32d3d1a2, +0xbc6a033489906e0b, 0x3fef9b66affed31b, +0xbc9556522a2fbd0e, 0x3fef9301d0125b51, +0xbc5080ef8c4eea55, 0x3fef8abdc06c31cc, +0xbc91c923b9d5f416, 0x3fef829aaea92de0, +0x3c80d3e3e95c55af, 0x3fef7a98c8a58e51, +0xbc801b15eaa59348, 0x3fef72b83c7d517b, +0xbc8f1ff055de323d, 0x3fef6af9388c8dea, +0x3c8b898c3f1353bf, 0x3fef635beb6fcb75, +0xbc96d99c7611eb26, 0x3fef5be084045cd4, +0x3c9aecf73e3a2f60, 0x3fef54873168b9aa, +0xbc8fe782cb86389d, 0x3fef4d5022fcd91d, +0x3c8a6f4144a6c38d, 0x3fef463b88628cd6, +0x3c807a05b0e4047d, 0x3fef3f49917ddc96, +0x3c968efde3a8a894, 0x3fef387a6e756238, +0x3c875e18f274487d, 0x3fef31ce4fb2a63f, +0x3c80472b981fe7f2, 0x3fef2b4565e27cdd, +0xbc96b87b3f71085e, 0x3fef24dfe1f56381, +0x3c82f7e16d09ab31, 0x3fef1e9df51fdee1, +0xbc3d219b1a6fbffa, 0x3fef187fd0dad990, +0x3c8b3782720c0ab4, 0x3fef1285a6e4030b, +0x3c6e149289cecb8f, 0x3fef0cafa93e2f56, +0x3c834d754db0abb6, 0x3fef06fe0a31b715, +0x3c864201e2ac744c, 0x3fef0170fc4cd831, +0x3c8fdd395dd3f84a, 0x3feefc08b26416ff, +0xbc86a3803b8e5b04, 0x3feef6c55f929ff1, +0xbc924aedcc4b5068, 0x3feef1a7373aa9cb, +0xbc9907f81b512d8e, 0x3feeecae6d05d866, +0xbc71d1e83e9436d2, 0x3feee7db34e59ff7, +0xbc991919b3ce1b15, 0x3feee32dc313a8e5, +0x3c859f48a72a4c6d, 0x3feedea64c123422, +0xbc9312607a28698a, 0x3feeda4504ac801c, +0xbc58a78f4817895b, 0x3feed60a21f72e2a, +0xbc7c2c9b67499a1b, 0x3feed1f5d950a897, +0x3c4363ed60c2ac11, 0x3feece086061892d, +0x3c9666093b0664ef, 0x3feeca41ed1d0057, +0x3c6ecce1daa10379, 0x3feec6a2b5c13cd0, +0x3c93ff8e3f0f1230, 0x3feec32af0d7d3de, +0x3c7690cebb7aafb0, 0x3feebfdad5362a27, +0x3c931dbdeb54e077, 0x3feebcb299fddd0d, +0xbc8f94340071a38e, 0x3feeb9b2769d2ca7, +0xbc87deccdc93a349, 0x3feeb6daa2cf6642, +0xbc78dec6bd0f385f, 0x3feeb42b569d4f82, +0xbc861246ec7b5cf6, 0x3feeb1a4ca5d920f, +0x3c93350518fdd78e, 0x3feeaf4736b527da, +0x3c7b98b72f8a9b05, 0x3feead12d497c7fd, +0x3c9063e1e21c5409, 0x3feeab07dd485429, +0x3c34c7855019c6ea, 0x3feea9268a5946b7, +0x3c9432e62b64c035, 0x3feea76f15ad2148, +0xbc8ce44a6199769f, 0x3feea5e1b976dc09, +0xbc8c33c53bef4da8, 0x3feea47eb03a5585, +0xbc845378892be9ae, 0x3feea34634ccc320, +0xbc93cedd78565858, 0x3feea23882552225, +0x3c5710aa807e1964, 0x3feea155d44ca973, +0xbc93b3efbf5e2228, 0x3feea09e667f3bcd, +0xbc6a12ad8734b982, 0x3feea012750bdabf, +0xbc6367efb86da9ee, 0x3fee9fb23c651a2f, +0xbc80dc3d54e08851, 0x3fee9f7df9519484, +0xbc781f647e5a3ecf, 0x3fee9f75e8ec5f74, +0xbc86ee4ac08b7db0, 0x3fee9f9a48a58174, +0xbc8619321e55e68a, 0x3fee9feb564267c9, +0x3c909ccb5e09d4d3, 0x3feea0694fde5d3f, +0xbc7b32dcb94da51d, 0x3feea11473eb0187, +0x3c94ecfd5467c06b, 0x3feea1ed0130c132, +0x3c65ebe1abd66c55, 0x3feea2f336cf4e62, +0xbc88a1c52fb3cf42, 0x3feea427543e1a12, +0xbc9369b6f13b3734, 0x3feea589994cce13, +0xbc805e843a19ff1e, 0x3feea71a4623c7ad, +0xbc94d450d872576e, 0x3feea8d99b4492ed, +0x3c90ad675b0e8a00, 0x3feeaac7d98a6699, +0x3c8db72fc1f0eab4, 0x3feeace5422aa0db, +0xbc65b6609cc5e7ff, 0x3feeaf3216b5448c, +0x3c7bf68359f35f44, 0x3feeb1ae99157736, +0xbc93091fa71e3d83, 0x3feeb45b0b91ffc6, +0xbc5da9b88b6c1e29, 0x3feeb737b0cdc5e5, +0xbc6c23f97c90b959, 0x3feeba44cbc8520f, +0xbc92434322f4f9aa, 0x3feebd829fde4e50, +0xbc85ca6cd7668e4b, 0x3feec0f170ca07ba, +0x3c71affc2b91ce27, 0x3feec49182a3f090, +0x3c6dd235e10a73bb, 0x3feec86319e32323, +0xbc87c50422622263, 0x3feecc667b5de565, +0x3c8b1c86e3e231d5, 0x3feed09bec4a2d33, +0xbc91bbd1d3bcbb15, 0x3feed503b23e255d, +0x3c90cc319cee31d2, 0x3feed99e1330b358, +0x3c8469846e735ab3, 0x3feede6b5579fdbf, +0xbc82dfcd978e9db4, 0x3feee36bbfd3f37a, +0x3c8c1a7792cb3387, 0x3feee89f995ad3ad, +0xbc907b8f4ad1d9fa, 0x3feeee07298db666, +0xbc55c3d956dcaeba, 0x3feef3a2b84f15fb, +0xbc90a40e3da6f640, 0x3feef9728de5593a, +0xbc68d6f438ad9334, 0x3feeff76f2fb5e47, +0xbc91eee26b588a35, 0x3fef05b030a1064a, +0x3c74ffd70a5fddcd, 0x3fef0c1e904bc1d2, +0xbc91bdfbfa9298ac, 0x3fef12c25bd71e09, +0x3c736eae30af0cb3, 0x3fef199bdd85529c, +0x3c8ee3325c9ffd94, 0x3fef20ab5fffd07a, +0x3c84e08fd10959ac, 0x3fef27f12e57d14b, +0x3c63cdaf384e1a67, 0x3fef2f6d9406e7b5, +0x3c676b2c6c921968, 0x3fef3720dcef9069, +0xbc808a1883ccb5d2, 0x3fef3f0b555dc3fa, +0xbc8fad5d3ffffa6f, 0x3fef472d4a07897c, +0xbc900dae3875a949, 0x3fef4f87080d89f2, +0x3c74a385a63d07a7, 0x3fef5818dcfba487, +0xbc82919e2040220f, 0x3fef60e316c98398, +0x3c8e5a50d5c192ac, 0x3fef69e603db3285, +0x3c843a59ac016b4b, 0x3fef7321f301b460, +0xbc82d52107b43e1f, 0x3fef7c97337b9b5f, +0xbc892ab93b470dc9, 0x3fef864614f5a129, +0x3c74b604603a88d3, 0x3fef902ee78b3ff6, +0x3c83c5ec519d7271, 0x3fef9a51fbc74c83, +0xbc8ff7128fd391f0, 0x3fefa4afa2a490da, +0xbc8dae98e223747d, 0x3fefaf482d8e67f1, +0x3c8ec3bc41aa2008, 0x3fefba1bee615a27, +0x3c842b94c3a9eb32, 0x3fefc52b376bba97, +0x3c8a64a931d185ee, 0x3fefd0765b6e4540, +0xbc8e37bae43be3ed, 0x3fefdbfdad9cbe14, +0x3c77893b4d91cd9d, 0x3fefe7c1819e90d8, +0x3c5305c14160cc89, 0x3feff3c22b8f71f1, +}, +}; diff --git a/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_d_log2_vsx.c b/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_d_log2_vsx.c index d7588a7929..3a9fb05aee 100644 --- a/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_d_log2_vsx.c +++ b/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_d_log2_vsx.c @@ -21,7 +21,7 @@ #include <math/math-svid-compat.h> #include <shlib-compat.h> #include <libm-alias-double.h> -#include "math_config.h" +#include "math_config_dbl.h" #include <altivec.h> #define T __log_data.tab diff --git a/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_d_log_data.c b/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_d_log_data.c index 2fe569576c..f1299404c4 100644 --- a/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_d_log_data.c +++ b/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_d_log_data.c @@ -16,7 +16,7 @@ License along with the GNU C Library; if not, see <http://www.gnu.org/licenses/>. */ -#include "math_config.h" +#include "math_config_dbl.h" #define N (1 << LOG_TABLE_BITS) diff --git a/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_math_err.c b/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_math_err.c new file mode 100644 index 0000000000..e60b0d232f --- /dev/null +++ b/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_math_err.c @@ -0,0 +1,40 @@ +/* Double-precision math error handling. + Copyright (C) 2017-2019 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <http://www.gnu.org/licenses/>. */ + +#include <fpu/math-barriers.h> +#include "math_config_dbl.h" + +/* NOINLINE reduces code size. */ +NOINLINE static double +xflow (uint32_t sign, double y) +{ + y = math_opt_barrier (sign ? -y : y) * y; + return y; +} + +attribute_hidden double +__math_uflow (uint32_t sign) +{ + return xflow (sign, 0x1p-767); +} + +attribute_hidden double +__math_oflow (uint32_t sign) +{ + return xflow (sign, 0x1p769); +} diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/libmvec.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc64/libmvec.abilist index bfbd340626..46558dbd77 100644 --- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/libmvec.abilist +++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/libmvec.abilist @@ -1,4 +1,5 @@ GLIBC_2.30 _ZGVbN2v_cos F +GLIBC_2.30 _ZGVbN2v_exp F GLIBC_2.30 _ZGVbN2v_log F GLIBC_2.30 _ZGVbN2v_sin F GLIBC_2.30 _ZGVbN2vvv_sincos F