diff mbox series

[v8] replace tgammaf by the CORE-MATH implementation

Message ID 20240909145429.1122975-1-Paul.Zimmermann@inria.fr
State New
Headers show
Series [v8] replace tgammaf by the CORE-MATH implementation | expand

Commit Message

Paul Zimmermann Sept. 9, 2024, 2:54 p.m. UTC
The CORE-MATH implementation is correctly rounded (for any rounding mode).
This can be checked by exhaustive tests in a few minutes since there are
less than 2^32 values to check against for example GNU MPFR.
This patch also adds some bench values for tgammaf.

Tested on x86_64 and x86 (cfarm26).

With the initial GNU libc code it gave on an Intel(R) Core(TM) i7-8700:

      "tgammaf": {
       "": {
        "duration": 3.50188e+09,
        "iterations": 2e+07,
        "max": 602.891,
        "min": 65.1415,
        "mean": 175.094
       }
      }

With the new code:

      "tgammaf": {
       "": {
        "duration": 3.30825e+09,
        "iterations": 5e+07,
        "max": 211.592,
        "min": 32.0325,
        "mean": 66.1649
       }
      }

With the initial GNU libc code it gave on cfarm26 (i686):

  "tgammaf": {
   "": {
    "duration": 3.70505e+09,
    "iterations": 6e+06,
    "max": 2420.23,
    "min": 243.154,
    "mean": 617.509
   }
  }

With the new code:

  "tgammaf": {
   "": {
    "duration": 3.24497e+09,
    "iterations": 1.8e+07,
    "max": 1238.15,
    "min": 101.155,
    "mean": 180.276
   }
  }

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>

Changes in v2:
    - include <math.h> (fix the linknamespace failures)
    - restored original benchtests/strcoll-inputs/filelist#en_US.UTF-8 file
    - restored original wrapper code (math/w_tgammaf_compat.c),
      except for the dealing with the sign
    - removed the tgammaf/float entries in all libm-test-ulps files
    - address other comments from Joseph Myers
      (https://sourceware.org/pipermail/libc-alpha/2024-July/158736.html)

Changes in v3:
    - pass NULL argument for signgam from w_tgammaf_compat.c
    - use of math_narrow_eval
    - added more comments

Changes in v4:
    - initialize local_signgam to 0 in math/w_tgamma_template.c
    - replace sysdeps/ieee754/dbl-64/gamma_productf.c by dummy file

Changes in v5:
    - do not mention local_signgam any more in math/w_tgammaf_compat.c
    - initialize local_signgam to 1 instead of 0 in w_tgamma_template.c
      and added comment

Changes in v6:
    - pass NULL as 2nd argument of __ieee754_gammaf_r in
      w_tgammaf_compat.c, and check for NULL in e_gammaf_r.c

Changes in v7:
    - added Signed-off-by line for Alexei Sibidanov (author of the code)

Changes in v8:
    - added Signed-off-by line for Paul Zimmermann (submitted of the patch)
---
 benchtests/Makefile                           |   1 +
 math/w_tgammaf_compat.c                       |   6 +-
 sysdeps/aarch64/libm-test-ulps                |   4 -
 sysdeps/alpha/fpu/libm-test-ulps              |   4 -
 sysdeps/arc/fpu/libm-test-ulps                |   4 -
 sysdeps/arc/nofpu/libm-test-ulps              |   1 -
 sysdeps/arm/libm-test-ulps                    |   4 -
 sysdeps/csky/fpu/libm-test-ulps               |   4 -
 sysdeps/csky/nofpu/libm-test-ulps             |   4 -
 sysdeps/hppa/fpu/libm-test-ulps               |   4 -
 sysdeps/i386/fpu/libm-test-ulps               |   4 -
 .../i386/i686/fpu/multiarch/libm-test-ulps    |   4 -
 sysdeps/ieee754/dbl-64/gamma_productf.c       |  45 +--
 sysdeps/ieee754/flt-32/e_gammaf_r.c           | 321 +++++++-----------
 sysdeps/loongarch/lp64/libm-test-ulps         |   4 -
 sysdeps/m68k/coldfire/fpu/libm-test-ulps      |   1 -
 sysdeps/m68k/m680x0/fpu/libm-test-ulps        |   4 -
 sysdeps/microblaze/libm-test-ulps             |   1 -
 sysdeps/mips/mips32/libm-test-ulps            |   4 -
 sysdeps/mips/mips64/libm-test-ulps            |   4 -
 sysdeps/nios2/libm-test-ulps                  |   1 -
 sysdeps/or1k/fpu/libm-test-ulps               |   4 -
 sysdeps/or1k/nofpu/libm-test-ulps             |   4 -
 sysdeps/powerpc/fpu/libm-test-ulps            |   4 -
 sysdeps/powerpc/nofpu/libm-test-ulps          |   4 -
 sysdeps/riscv/nofpu/libm-test-ulps            |   4 -
 sysdeps/riscv/rvd/libm-test-ulps              |   4 -
 sysdeps/s390/fpu/libm-test-ulps               |   4 -
 sysdeps/sh/libm-test-ulps                     |   2 -
 sysdeps/sparc/fpu/libm-test-ulps              |   4 -
 sysdeps/x86_64/fpu/libm-test-ulps             |   4 -
 31 files changed, 133 insertions(+), 334 deletions(-)

--
2.43.0

Comments

Adhemerval Zanella Netto Oct. 7, 2024, 1:06 p.m. UTC | #1
On 09/09/24 11:54, Paul Zimmermann wrote:
> The CORE-MATH implementation is correctly rounded (for any rounding mode).
> This can be checked by exhaustive tests in a few minutes since there are
> less than 2^32 values to check against for example GNU MPFR.
> This patch also adds some bench values for tgammaf.
> 
> Tested on x86_64 and x86 (cfarm26).
> 
> With the initial GNU libc code it gave on an Intel(R) Core(TM) i7-8700:
> 
>       "tgammaf": {
>        "": {
>         "duration": 3.50188e+09,
>         "iterations": 2e+07,
>         "max": 602.891,
>         "min": 65.1415,
>         "mean": 175.094
>        }
>       }
> 
> With the new code:
> 
>       "tgammaf": {
>        "": {
>         "duration": 3.30825e+09,
>         "iterations": 5e+07,
>         "max": 211.592,
>         "min": 32.0325,
>         "mean": 66.1649
>        }
>       }
> 
> With the initial GNU libc code it gave on cfarm26 (i686):
> 
>   "tgammaf": {
>    "": {
>     "duration": 3.70505e+09,
>     "iterations": 6e+06,
>     "max": 2420.23,
>     "min": 243.154,
>     "mean": 617.509
>    }
>   }
> 
> With the new code:
> 
>   "tgammaf": {
>    "": {
>     "duration": 3.24497e+09,
>     "iterations": 1.8e+07,
>     "max": 1238.15,
>     "min": 101.155,
>     "mean": 180.276
>    }
>   }
> 
> Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
> Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>

From last weekly call it seems the license on the implementation (which is
compatible with the one used by glibc) and the sign-off from the authors
are suffice for inclusion.

Florian has raised that we should also update the license used on LICENSES
file, but it can be done in a subsequent patch.

> 
> Changes in v2:
>     - include <math.h> (fix the linknamespace failures)
>     - restored original benchtests/strcoll-inputs/filelist#en_US.UTF-8 file
>     - restored original wrapper code (math/w_tgammaf_compat.c),
>       except for the dealing with the sign
>     - removed the tgammaf/float entries in all libm-test-ulps files
>     - address other comments from Joseph Myers
>       (https://sourceware.org/pipermail/libc-alpha/2024-July/158736.html)
> 
> Changes in v3:
>     - pass NULL argument for signgam from w_tgammaf_compat.c
>     - use of math_narrow_eval
>     - added more comments
> 
> Changes in v4:
>     - initialize local_signgam to 0 in math/w_tgamma_template.c
>     - replace sysdeps/ieee754/dbl-64/gamma_productf.c by dummy file
> 
> Changes in v5:
>     - do not mention local_signgam any more in math/w_tgammaf_compat.c
>     - initialize local_signgam to 1 instead of 0 in w_tgamma_template.c
>       and added comment
> 
> Changes in v6:
>     - pass NULL as 2nd argument of __ieee754_gammaf_r in
>       w_tgammaf_compat.c, and check for NULL in e_gammaf_r.c
> 
> Changes in v7:
>     - added Signed-off-by line for Alexei Sibidanov (author of the code)
> 
> Changes in v8:
>     - added Signed-off-by line for Paul Zimmermann (submitted of the patch)
> ---
>  benchtests/Makefile                           |   1 +
>  math/w_tgammaf_compat.c                       |   6 +-
>  sysdeps/aarch64/libm-test-ulps                |   4 -
>  sysdeps/alpha/fpu/libm-test-ulps              |   4 -
>  sysdeps/arc/fpu/libm-test-ulps                |   4 -
>  sysdeps/arc/nofpu/libm-test-ulps              |   1 -
>  sysdeps/arm/libm-test-ulps                    |   4 -
>  sysdeps/csky/fpu/libm-test-ulps               |   4 -
>  sysdeps/csky/nofpu/libm-test-ulps             |   4 -
>  sysdeps/hppa/fpu/libm-test-ulps               |   4 -
>  sysdeps/i386/fpu/libm-test-ulps               |   4 -
>  .../i386/i686/fpu/multiarch/libm-test-ulps    |   4 -
>  sysdeps/ieee754/dbl-64/gamma_productf.c       |  45 +--
>  sysdeps/ieee754/flt-32/e_gammaf_r.c           | 321 +++++++-----------
>  sysdeps/loongarch/lp64/libm-test-ulps         |   4 -
>  sysdeps/m68k/coldfire/fpu/libm-test-ulps      |   1 -
>  sysdeps/m68k/m680x0/fpu/libm-test-ulps        |   4 -
>  sysdeps/microblaze/libm-test-ulps             |   1 -
>  sysdeps/mips/mips32/libm-test-ulps            |   4 -
>  sysdeps/mips/mips64/libm-test-ulps            |   4 -
>  sysdeps/nios2/libm-test-ulps                  |   1 -
>  sysdeps/or1k/fpu/libm-test-ulps               |   4 -
>  sysdeps/or1k/nofpu/libm-test-ulps             |   4 -
>  sysdeps/powerpc/fpu/libm-test-ulps            |   4 -
>  sysdeps/powerpc/nofpu/libm-test-ulps          |   4 -
>  sysdeps/riscv/nofpu/libm-test-ulps            |   4 -
>  sysdeps/riscv/rvd/libm-test-ulps              |   4 -
>  sysdeps/s390/fpu/libm-test-ulps               |   4 -
>  sysdeps/sh/libm-test-ulps                     |   2 -
>  sysdeps/sparc/fpu/libm-test-ulps              |   4 -
>  sysdeps/x86_64/fpu/libm-test-ulps             |   4 -
>  31 files changed, 133 insertions(+), 334 deletions(-)
> 
> diff --git a/benchtests/Makefile b/benchtests/Makefile
> index 382fb7bae1..265ad34d8d 100644
> --- a/benchtests/Makefile
> +++ b/benchtests/Makefile
> @@ -94,6 +94,7 @@ bench-math := \
>    tan \
>    tanh \
>    tgamma \
> +  tgammaf \
>    trunc \
>    truncf \
>    y0 \

The patch is missing the benchtests/tgammaf-inputs file, without it 
'make bench-build' fails with a missing input file.


> diff --git a/math/w_tgammaf_compat.c b/math/w_tgammaf_compat.c
> index 34e0e096e0..a3851f6728 100644
> --- a/math/w_tgammaf_compat.c
> +++ b/math/w_tgammaf_compat.c
> @@ -14,6 +14,7 @@
> 
>  #include <errno.h>
>  #include <math.h>
> +#include <stddef.h>
>  #include <math_private.h>
>  #include <math-svid-compat.h>
>  #include <libm-alias-float.h>
> @@ -22,8 +23,7 @@
>  float
>  __tgammaf(float x)
>  {
> -	int local_signgam;
> -	float y = __ieee754_gammaf_r(x,&local_signgam);
> +        float y = __ieee754_gammaf_r(x, NULL);

Although this file does not follow current code guidelines, I also think
we should either fix it entirely or keep using the current one.  So
maybe it would be better to just use tab here. 

> 
>  	if(__glibc_unlikely (!isfinite (y) || y == 0)
>  	   && (isfinite (x) || (isinf (x) && x < 0.0))
> @@ -41,7 +41,7 @@ __tgammaf(float x)
>  	    /* tgammaf overflow */
>  	    return __kernel_standard_f(x, x, 140);
>  	}
> -	return local_signgam < 0 ? - y : y;
> +	return y;
>  }
>  libm_alias_float (__tgamma, tgamma)
>  #endif
> diff --git a/sysdeps/aarch64/libm-test-ulps b/sysdeps/aarch64/libm-test-ulps
> index a7a4a94265..b557dee8e5 100644
> --- a/sysdeps/aarch64/libm-test-ulps
> +++ b/sysdeps/aarch64/libm-test-ulps
> @@ -1653,22 +1653,18 @@ ldouble: 3
> 
>  Function: "tgamma":
>  double: 9
> -float: 8
>  ldouble: 4
> 
>  Function: "tgamma_downward":
>  double: 9
> -float: 7
>  ldouble: 5
> 
>  Function: "tgamma_towardzero":
>  double: 9
> -float: 7
>  ldouble: 5
> 
>  Function: "tgamma_upward":
>  double: 9
> -float: 8
>  ldouble: 4
> 
>  Function: "y0":
> diff --git a/sysdeps/alpha/fpu/libm-test-ulps b/sysdeps/alpha/fpu/libm-test-ulps
> index e28c2af683..5c6175406b 100644
> --- a/sysdeps/alpha/fpu/libm-test-ulps
> +++ b/sysdeps/alpha/fpu/libm-test-ulps
> @@ -1410,22 +1410,18 @@ ldouble: 3
> 
>  Function: "tgamma":
>  double: 9
> -float: 8
>  ldouble: 4
> 
>  Function: "tgamma_downward":
>  double: 9
> -float: 7
>  ldouble: 5
> 
>  Function: "tgamma_towardzero":
>  double: 9
> -float: 7
>  ldouble: 5
> 
>  Function: "tgamma_upward":
>  double: 9
> -float: 8
>  ldouble: 4
> 
>  Function: "y0":
> diff --git a/sysdeps/arc/fpu/libm-test-ulps b/sysdeps/arc/fpu/libm-test-ulps
> index 41c8ef16d7..8f4571b5fa 100644
> --- a/sysdeps/arc/fpu/libm-test-ulps
> +++ b/sysdeps/arc/fpu/libm-test-ulps
> @@ -1093,19 +1093,15 @@ float: 3
> 
>  Function: "tgamma":
>  double: 9
> -float: 9
> 
>  Function: "tgamma_downward":
>  double: 9
> -float: 9
> 
>  Function: "tgamma_towardzero":
>  double: 9
> -float: 8
> 
>  Function: "tgamma_upward":
>  double: 9
> -float: 9
> 
>  Function: "y0":
>  double: 3
> diff --git a/sysdeps/arc/nofpu/libm-test-ulps b/sysdeps/arc/nofpu/libm-test-ulps
> index d3f45957d4..6d1f97b3fc 100644
> --- a/sysdeps/arc/nofpu/libm-test-ulps
> +++ b/sysdeps/arc/nofpu/libm-test-ulps
> @@ -262,7 +262,6 @@ float: 2
> 
>  Function: "tgamma":
>  double: 9
> -float: 8
> 
>  Function: "y0":
>  double: 3
> diff --git a/sysdeps/arm/libm-test-ulps b/sysdeps/arm/libm-test-ulps
> index 6480353d39..9ca4750f74 100644
> --- a/sysdeps/arm/libm-test-ulps
> +++ b/sysdeps/arm/libm-test-ulps
> @@ -1152,19 +1152,15 @@ float: 3
> 
>  Function: "tgamma":
>  double: 9
> -float: 8
> 
>  Function: "tgamma_downward":
>  double: 9
> -float: 7
> 
>  Function: "tgamma_towardzero":
>  double: 9
> -float: 7
> 
>  Function: "tgamma_upward":
>  double: 9
> -float: 8
> 
>  Function: "y0":
>  double: 3
> diff --git a/sysdeps/csky/fpu/libm-test-ulps b/sysdeps/csky/fpu/libm-test-ulps
> index fc634f89ca..151df00638 100644
> --- a/sysdeps/csky/fpu/libm-test-ulps
> +++ b/sysdeps/csky/fpu/libm-test-ulps
> @@ -1061,19 +1061,15 @@ float: 3
> 
>  Function: "tgamma":
>  double: 9
> -float: 8
> 
>  Function: "tgamma_downward":
>  double: 8
> -float: 7
> 
>  Function: "tgamma_towardzero":
>  double: 9
> -float: 7
> 
>  Function: "tgamma_upward":
>  double: 9
> -float: 8
> 
>  Function: "y0":
>  double: 3
> diff --git a/sysdeps/csky/nofpu/libm-test-ulps b/sysdeps/csky/nofpu/libm-test-ulps
> index a1e28c8ee0..48697a7b09 100644
> --- a/sysdeps/csky/nofpu/libm-test-ulps
> +++ b/sysdeps/csky/nofpu/libm-test-ulps
> @@ -1092,19 +1092,15 @@ float: 3
> 
>  Function: "tgamma":
>  double: 9
> -float: 8
> 
>  Function: "tgamma_downward":
>  double: 5
> -float: 5
> 
>  Function: "tgamma_towardzero":
>  double: 5
> -float: 4
> 
>  Function: "tgamma_upward":
>  double: 4
> -float: 4
> 
>  Function: "y0":
>  double: 3
> diff --git a/sysdeps/hppa/fpu/libm-test-ulps b/sysdeps/hppa/fpu/libm-test-ulps
> index ea5101f6b6..2459868892 100644
> --- a/sysdeps/hppa/fpu/libm-test-ulps
> +++ b/sysdeps/hppa/fpu/libm-test-ulps
> @@ -1181,20 +1181,16 @@ float: 3
> 
>  Function: "tgamma":
>  double: 9
> -float: 8
>  ldouble: 1
> 
>  Function: "tgamma_downward":
>  double: 9
> -float: 7
> 
>  Function: "tgamma_towardzero":
>  double: 9
> -float: 7
> 
>  Function: "tgamma_upward":
>  double: 9
> -float: 8
> 
>  Function: "y0":
>  double: 3
> diff --git a/sysdeps/i386/fpu/libm-test-ulps b/sysdeps/i386/fpu/libm-test-ulps
> index 03297e6527..5f12dd899a 100644
> --- a/sysdeps/i386/fpu/libm-test-ulps
> +++ b/sysdeps/i386/fpu/libm-test-ulps
> @@ -1699,25 +1699,21 @@ ldouble: 4
> 
>  Function: "tgamma":
>  double: 9
> -float: 8
>  float128: 4
>  ldouble: 5
> 
>  Function: "tgamma_downward":
>  double: 9
> -float: 7
>  float128: 5
>  ldouble: 6
> 
>  Function: "tgamma_towardzero":
>  double: 9
> -float: 7
>  float128: 5
>  ldouble: 6
> 
>  Function: "tgamma_upward":
>  double: 9
> -float: 8
>  float128: 4
>  ldouble: 5
> 
> diff --git a/sysdeps/i386/i686/fpu/multiarch/libm-test-ulps b/sysdeps/i386/i686/fpu/multiarch/libm-test-ulps
> index 85a2456971..35fc9c36bd 100644
> --- a/sysdeps/i386/i686/fpu/multiarch/libm-test-ulps
> +++ b/sysdeps/i386/i686/fpu/multiarch/libm-test-ulps
> @@ -1701,25 +1701,21 @@ ldouble: 4
> 
>  Function: "tgamma":
>  double: 9
> -float: 8
>  float128: 4
>  ldouble: 5
> 
>  Function: "tgamma_downward":
>  double: 9
> -float: 7
>  float128: 5
>  ldouble: 6
> 
>  Function: "tgamma_towardzero":
>  double: 9
> -float: 7
>  float128: 5
>  ldouble: 6
> 
>  Function: "tgamma_upward":
>  double: 8
> -float: 8
>  float128: 4
>  ldouble: 5
> 
> diff --git a/sysdeps/ieee754/dbl-64/gamma_productf.c b/sysdeps/ieee754/dbl-64/gamma_productf.c
> index f3596eeae4..1cc8931700 100644
> --- a/sysdeps/ieee754/dbl-64/gamma_productf.c
> +++ b/sysdeps/ieee754/dbl-64/gamma_productf.c
> @@ -1,44 +1 @@
> -/* Compute a product of X, X+1, ..., with an error estimate.
> -   Copyright (C) 2013-2024 Free Software Foundation, Inc.
> -   This file is part of the GNU C Library.
> -
> -   The GNU C Library is free software; you can redistribute it and/or
> -   modify it under the terms of the GNU Lesser General Public
> -   License as published by the Free Software Foundation; either
> -   version 2.1 of the License, or (at your option) any later version.
> -
> -   The GNU C Library is distributed in the hope that it will be useful,
> -   but WITHOUT ANY WARRANTY; without even the implied warranty of
> -   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> -   Lesser General Public License for more details.
> -
> -   You should have received a copy of the GNU Lesser General Public
> -   License along with the GNU C Library; if not, see
> -   <https://www.gnu.org/licenses/>.  */
> -
> -#include <math.h>
> -#include <math-narrow-eval.h>
> -#include <math_private.h>
> -#include <float.h>
> -
> -/* Compute the product of X + X_EPS, X + X_EPS + 1, ..., X + X_EPS + N
> -   - 1, in the form R * (1 + *EPS) where the return value R is an
> -   approximation to the product and *EPS is set to indicate the
> -   approximate error in the return value.  X is such that all the
> -   values X + 1, ..., X + N - 1 are exactly representable, and X_EPS /
> -   X is small enough that factors quadratic in it can be
> -   neglected.  */
> -
> -float
> -__gamma_productf (float x, float x_eps, int n, float *eps)
> -{
> -  double x_full = (double) x + (double) x_eps;
> -  double ret = x_full;
> -  for (int i = 1; i < n; i++)
> -    ret *= x_full + i;
> -
> -  float fret = math_narrow_eval ((float) ret);
> -  *eps = (ret - fret) / fret;
> -
> -  return fret;
> -}
> +/* Not needed.  */

Ok.

> diff --git a/sysdeps/ieee754/flt-32/e_gammaf_r.c b/sysdeps/ieee754/flt-32/e_gammaf_r.c
> index a9730d61c1..90ed3b4890 100644
> --- a/sysdeps/ieee754/flt-32/e_gammaf_r.c
> +++ b/sysdeps/ieee754/flt-32/e_gammaf_r.c
> @@ -1,215 +1,150 @@
> -/* Implementation of gamma function according to ISO C.
> -   Copyright (C) 1997-2024 Free Software Foundation, Inc.
> -   This file is part of the GNU C Library.
> +/* Implementation of the gamma function for binary32.
> 
> -   The GNU C Library is free software; you can redistribute it and/or
> -   modify it under the terms of the GNU Lesser General Public
> -   License as published by the Free Software Foundation; either
> -   version 2.1 of the License, or (at your option) any later version.
> +Copyright (c) 2023-2024 Alexei Sibidanov.
> 
> -   The GNU C Library is distributed in the hope that it will be useful,
> -   but WITHOUT ANY WARRANTY; without even the implied warranty of
> -   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> -   Lesser General Public License for more details.
> +The original version of this file was copied from the CORE-MATH
> +project (file src/binary32/tgamma/tgammaf.c, revision a48e352).
> 
> -   You should have received a copy of the GNU Lesser General Public
> -   License along with the GNU C Library; if not, see
> -   <https://www.gnu.org/licenses/>.  */
> +Permission is hereby granted, free of charge, to any person obtaining a copy
> +of this software and associated documentation files (the "Software"), to deal
> +in the Software without restriction, including without limitation the rights
> +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> +copies of the Software, and to permit persons to whom the Software is
> +furnished to do so, subject to the following conditions:
> 
> -#include <math.h>
> -#include <math-narrow-eval.h>
> -#include <math_private.h>
> -#include <fenv_private.h>
> -#include <math-underflow.h>
> -#include <float.h>
> -#include <libm-alias-finite.h>
> -
> -/* Coefficients B_2k / 2k(2k-1) of x^-(2k-1) inside exp in Stirling's
> -   approximation to gamma function.  */
> +The above copyright notice and this permission notice shall be included in all
> +copies or substantial portions of the Software.
> 
> -static const float gamma_coeff[] =
> -  {
> -    0x1.555556p-4f,
> -    -0xb.60b61p-12f,
> -    0x3.403404p-12f,
> -  };
> +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
> +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
> +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
> +SOFTWARE.
> + */
> 
> -#define NCOEFF (sizeof (gamma_coeff) / sizeof (gamma_coeff[0]))
> +/* Changes with respect to the original CORE-MATH code:
> +   - removed the dealing with errno
> +     (this is done in the wrapper math/w_tgammaf_compat.c)
> +   - usage of math_narrow_eval to deal with underflow/overflow
> +   - deal with signgamp
> + */
> 
> -/* Return gamma (X), for positive X less than 42, in the form R *
> -   2^(*EXP2_ADJ), where R is the return value and *EXP2_ADJ is set to
> -   avoid overflow or underflow in intermediate calculations.  */
> +#include <math.h>
> +#include <float.h>
> +#include <stdint.h>
> +#include <stddef.h>
> +#include <libm-alias-finite.h>
> +#include <math-narrow-eval.h>
> 
> -static float
> -gammaf_positive (float x, int *exp2_adj)
> -{
> -  int local_signgam;
> -  if (x < 0.5f)
> -    {
> -      *exp2_adj = 0;
> -      return __ieee754_expf (__ieee754_lgammaf_r (x + 1, &local_signgam)) / x;
> -    }
> -  else if (x <= 1.5f)
> -    {
> -      *exp2_adj = 0;
> -      return __ieee754_expf (__ieee754_lgammaf_r (x, &local_signgam));
> -    }
> -  else if (x < 2.5f)
> -    {
> -      *exp2_adj = 0;
> -      float x_adj = x - 1;
> -      return (__ieee754_expf (__ieee754_lgammaf_r (x_adj, &local_signgam))
> -	      * x_adj);
> -    }
> -  else
> -    {
> -      float eps = 0;
> -      float x_eps = 0;
> -      float x_adj = x;
> -      float prod = 1;
> -      if (x < 4.0f)
> -	{
> -	  /* Adjust into the range for applying Stirling's
> -	     approximation.  */
> -	  float n = ceilf (4.0f - x);
> -	  x_adj = math_narrow_eval (x + n);
> -	  x_eps = (x - (x_adj - n));
> -	  prod = __gamma_productf (x_adj - n, x_eps, n, &eps);
> -	}
> -      /* The result is now gamma (X_ADJ + X_EPS) / (PROD * (1 + EPS)).
> -	 Compute gamma (X_ADJ + X_EPS) using Stirling's approximation,
> -	 starting by computing pow (X_ADJ, X_ADJ) with a power of 2
> -	 factored out.  */
> -      float exp_adj = -eps;
> -      float x_adj_int = roundf (x_adj);
> -      float x_adj_frac = x_adj - x_adj_int;
> -      int x_adj_log2;
> -      float x_adj_mant = __frexpf (x_adj, &x_adj_log2);
> -      if (x_adj_mant < M_SQRT1_2f)
> -	{
> -	  x_adj_log2--;
> -	  x_adj_mant *= 2.0f;
> -	}
> -      *exp2_adj = x_adj_log2 * (int) x_adj_int;
> -      float ret = (__ieee754_powf (x_adj_mant, x_adj)
> -		   * __ieee754_exp2f (x_adj_log2 * x_adj_frac)
> -		   * __ieee754_expf (-x_adj)
> -		   * sqrtf (2 * M_PIf / x_adj)
> -		   / prod);
> -      exp_adj += x_eps * __ieee754_logf (x_adj);
> -      float bsum = gamma_coeff[NCOEFF - 1];
> -      float x_adj2 = x_adj * x_adj;
> -      for (size_t i = 1; i <= NCOEFF - 1; i++)
> -	bsum = bsum / x_adj2 + gamma_coeff[NCOEFF - 1 - i];
> -      exp_adj += bsum / x_adj;
> -      return ret + ret * __expm1f (exp_adj);
> -    }
> -}
> +typedef union {float f; uint32_t u;} b32u32_u;
> +typedef union {double f; uint64_t u;} b64u64_u;
> 
>  float
>  __ieee754_gammaf_r (float x, int *signgamp)
>  {
> -  int32_t hx;
> -  float ret;
> +  /* The wrapper in math/w_tgamma_template.c expects *signgamp to be set to a
> +     non-negative value if the returned value is gamma(x), and to a negative
> +     value if it is -gamma(x).
> +     Since the code here directly computes gamma(x), we set it to 1.
> +  */
> +  if (signgamp != NULL)
> +    *signgamp = 1;
> 
> -  GET_FLOAT_WORD (hx, x);
> +  /* List of exceptional cases. Each entry contains the 32-bit encoding u of x,
> +     a binary32 approximation f of gamma(x), and a correction term df.  */
> +  static const struct {uint32_t u; float f, df;} tb[] = {
> +    {0x27de86a9u, 0x1.268266p+47f, 0x1p22f},      // x = 0x1.bd0d52p-48
> +    {0x27e05475u, 0x1.242422p+47f, 0x1p22f},      // x = 0x1.c0a8eap-48
> +    {0xb63befb3u, -0x1.5cb6e4p+18f, 0x1p-7f},     // x = -0x1.77df66p-19
> +    {0x3c7bb570u, 0x1.021d9p+6f, 0x1p-19f},       // x = 0x1.f76aep-7
> +    {0x41e886d1u, 0x1.33136ap+98f, 0x1p73f},      // x = 0x1.d10da2p+4
> +    {0xc067d177u, 0x1.f6850cp-3f, 0x1p-28f},      // x = -0x1.cfa2eep+1
> +    {0xbd99da31u, -0x1.befe66p+3, -0x1p-22f},     // x = -0x1.33b462p-4
> +    {0xbf54c45au, -0x1.a6b4ecp+2, +0x1p-23f},     // x = -0x1.a988b4p-1
> +    {0x41ee77feu, 0x1.d3631cp+101, -0x1p-76f},    // x = 0x1.dceffcp+4
> +    {0x3f843a64u, 0x1.f6c638p-1, 0x1p-26f},       // x = 0x1.0874c8p+0
> +  };
> 
> -  if (__glibc_unlikely ((hx & 0x7fffffff) == 0))
> -    {
> -      /* Return value for x == 0 is Inf with divide by zero exception.  */
> -      *signgamp = 0;
> -      return 1.0 / x;
> +  b32u32_u t = {.f = x};
> +  uint32_t ax = t.u<<1;
> +  if(__builtin_expect(ax>=(0xffu<<24), 0)){ /* x=NaN or +/-Inf */
> +    if(ax==(0xffu<<24)){ /* x=+/-Inf */
> +      if(t.u>>31){ /* x=-Inf */
> +        return x / x; /* will raise the "Invalid operation" exception */
> +      }
> +      return x; /* x=+Inf */
>      }
> -  if (__builtin_expect (hx < 0, 0)
> -      && (uint32_t) hx < 0xff800000 && rintf (x) == x)
> -    {
> -      /* Return value for integer x < 0 is NaN with invalid exception.  */
> -      *signgamp = 0;
> -      return (x - x) / (x - x);
> +    return x + x; /* x=NaN, where x+x ensures the "Invalid operation"
> +                     exception is set if x is sNaN */
> +  }
> +  double z = x;
> +  if(__builtin_expect(ax<0x6d000000u, 0)){ /* |x| < 0x1p-18 */
> +    volatile double d = (0x1.fa658c23b1578p-1 - 0x1.d0a118f324b63p-1*z)*z - 0x1.2788cfc6fb619p-1;
> +    double f = 1.0/z + d;
> +    float r = f;
> +    b64u64_u rt = {.f = f};
> +    if(((rt.u+2)&0xfffffff) < 4){
> +      for(unsigned i=0;i<sizeof(tb)/sizeof(tb[0]);i++)
> +	if(t.u==tb[i].u) return tb[i].f + tb[i].df;
>      }
> -  if (__glibc_unlikely (hx == 0xff800000))
> -    {
> -      /* x == -Inf.  According to ISO this is NaN.  */
> -      *signgamp = 0;
> -      return x - x;
> +    return r;
> +  }
> +  float fx = __builtin_floorf(x);
> +  if(__builtin_expect(x >= 0x1.18522p+5f, 0)){
> +    /* Overflow case. The original CORE-MATH code returns 0x1p127f * 0x1p127f,
> +       but apparently some compilers replace this by +Inf.  */
> +    return math_narrow_eval (x * 0x1p127f);
> +  }
> +  /* compute k only after the overflow check, otherwise the case to integer
> +     might overflow */
> +  int k = fx;
> +  if(__builtin_expect(fx==x, 0)){ /* x is integer */
> +    if(x == 0.0f){
> +      return 1.0f/x;
>      }
> -  if (__glibc_unlikely ((hx & 0x7f800000) == 0x7f800000))
> -    {
> -      /* Positive infinity (return positive infinity) or NaN (return
> -	 NaN).  */
> -      *signgamp = 0;
> -      return x + x;
> +    if(x < 0.0f){
> +      return 0.0f / 0.0f; /* should raise the "Invalid operation" exception */
>      }
> +    double t0 = 1, x0 = 1;
> +    for(int i=1; i<k; i++, x0 += 1.0) t0 *= x0;
> +    return t0;
> +  }
> +  if(__builtin_expect(x<-42.0f, 0)){ /* negative non-integer */
> +    /* For x < -42, x non-integer, |gamma(x)| < 2^-151.  */
> +    static const float sgn[2] = {0x1p-127f, -0x1p-127f};
> +    /* Underflows always happens */
> +    return math_narrow_eval (0x1p-127f * sgn[k&1]);
> +  }
> +  /* The array c[] stores a degree-15 polynomial approximation for gamma(x).  */
> +  static const double c[] =
> +    {0x1.c9a76be577123p+0, 0x1.8f2754ddcf90dp+0, 0x1.0d1191949419bp+0, 0x1.e1f42cf0ae4a1p-2,
> +     0x1.82b358a3ab638p-3, 0x1.e1f2b30cd907bp-5, 0x1.240f6d4071bd8p-6, 0x1.1522c9f3cd012p-8,
> +     0x1.1fd0051a0525bp-10, 0x1.9808a8b96c37ep-13, 0x1.b3f78e01152b5p-15, 0x1.49c85a7e1fd04p-18,
> +     0x1.471ca49184475p-19, -0x1.368f0b7ed9e36p-23, 0x1.882222f9049efp-23, -0x1.a69ed2042842cp-25};
> 
> -  if (x >= 36.0f)
> -    {
> -      /* Overflow.  */
> -      *signgamp = 0;
> -      ret = math_narrow_eval (FLT_MAX * FLT_MAX);
> -      return ret;
> -    }
> -  else
> -    {
> -      SET_RESTORE_ROUNDF (FE_TONEAREST);
> -      if (x > 0.0f)
> -	{
> -	  *signgamp = 0;
> -	  int exp2_adj;
> -	  float tret = gammaf_positive (x, &exp2_adj);
> -	  ret = __scalbnf (tret, exp2_adj);
> -	}
> -      else if (x >= -FLT_EPSILON / 4.0f)
> -	{
> -	  *signgamp = 0;
> -	  ret = 1.0f / x;
> -	}
> -      else
> -	{
> -	  float tx = truncf (x);
> -	  *signgamp = (tx == 2.0f * truncf (tx / 2.0f)) ? -1 : 1;
> -	  if (x <= -42.0f)
> -	    /* Underflow.  */
> -	    ret = FLT_MIN * FLT_MIN;
> -	  else
> -	    {
> -	      float frac = tx - x;
> -	      if (frac > 0.5f)
> -		frac = 1.0f - frac;
> -	      float sinpix = (frac <= 0.25f
> -			      ? __sinf (M_PIf * frac)
> -			      : __cosf (M_PIf * (0.5f - frac)));
> -	      int exp2_adj;
> -	      float tret = M_PIf / (-x * sinpix
> -				    * gammaf_positive (-x, &exp2_adj));
> -	      ret = __scalbnf (tret, -exp2_adj);
> -	      math_check_force_underflow_nonneg (ret);
> -	    }
> -	}
> -      ret = math_narrow_eval (ret);
> -    }
> -  if (isinf (ret) && x != 0)
> -    {
> -      if (*signgamp < 0)
> -	{
> -	  ret = math_narrow_eval (-copysignf (FLT_MAX, ret) * FLT_MAX);
> -	  ret = -ret;
> -	}
> -      else
> -	ret = math_narrow_eval (copysignf (FLT_MAX, ret) * FLT_MAX);
> -      return ret;
> -    }
> -  else if (ret == 0)
> -    {
> -      if (*signgamp < 0)
> -	{
> -	  ret = math_narrow_eval (-copysignf (FLT_MIN, ret) * FLT_MIN);
> -	  ret = -ret;
> -	}
> -      else
> -	ret = math_narrow_eval (copysignf (FLT_MIN, ret) * FLT_MIN);
> -      return ret;
> +  double m = z - 0x1.7p+1, i = __builtin_roundeven(m), step = __builtin_copysign(1.0,i);
> +  double d = m - i, d2 = d*d, d4 = d2*d2, d8 = d4*d4;
> +  double f = (c[0] + d*c[1]) + d2*(c[2] + d*c[3]) + d4*((c[4] + d*c[5]) + d2*(c[6] + d*c[7]))
> +    + d8*((c[8] + d*c[9]) + d2*(c[10] + d*c[11]) + d4*((c[12] + d*c[13]) + d2*(c[14] + d*c[15])));
> +  int jm = __builtin_fabs(i);
> +  double w = 1;
> +  if(jm){
> +    z -= 0.5 + step*0.5;
> +    w = z;
> +    for(int j=jm-1; j; j--) {z -= step; w *= z;}
> +  }
> +  if(i<=-0.5) w = 1/w;
> +  f *= w;
> +  b64u64_u rt = {.f = f};
> +  float r = f;
> +  /* Deal with exceptional cases.  */
> +  if(__builtin_expect(((rt.u+2)&0xfffffff) < 8, 0)){
> +    for(unsigned j=0;j<sizeof(tb)/sizeof(tb[0]);j++) {
> +      if(t.u==tb[j].u) return tb[j].f + tb[j].df;
>      }
> -  else
> -    return ret;
> +  }
> +  return r;
>  }
>  libm_alias_finite (__ieee754_gammaf_r, __gammaf_r)

Is this code meant to be eventually synced with the CORE-MATH? If so I think
it should be ok to keep current style and add an entry to SHARED-FILES with
the master project and some brief instruction on how it should be merged.

Otherwise, if this was adapted to glibc I think it should be better to use
glibc style and convention.  You can easily adapt it with clang-format.py
along with our .clang-format file. There is also a couple of changes that
it would be good to do:

  * replace __builtin_expect with __glibc_likely.
  * replace _builtin_copysign with copysign (which current gcc will also 
    use the builtin anyway).
  * remove C++ style comments ('//').
  * replace the b32u32_u type with the ones on math_config.h
  * replace sizeof(tb)/sizeof(tb[0]) with array_length.
  * return '0.0f / 0.0f' with __math_divzerof.

> diff --git a/sysdeps/loongarch/lp64/libm-test-ulps b/sysdeps/loongarch/lp64/libm-test-ulps
> index bdfd683454..5668a801c9 100644
> --- a/sysdeps/loongarch/lp64/libm-test-ulps
> +++ b/sysdeps/loongarch/lp64/libm-test-ulps
> @@ -1432,22 +1432,18 @@ ldouble: 3
> 
>  Function: "tgamma":
>  double: 9
> -float: 8
>  ldouble: 4
> 
>  Function: "tgamma_downward":
>  double: 9
> -float: 7
>  ldouble: 5
> 
>  Function: "tgamma_towardzero":
>  double: 9
> -float: 7
>  ldouble: 5
> 
>  Function: "tgamma_upward":
>  double: 9
> -float: 8
>  ldouble: 4
> 
>  Function: "y0":
> diff --git a/sysdeps/m68k/coldfire/fpu/libm-test-ulps b/sysdeps/m68k/coldfire/fpu/libm-test-ulps
> index 1b25a70e3f..ae05498719 100644
> --- a/sysdeps/m68k/coldfire/fpu/libm-test-ulps
> +++ b/sysdeps/m68k/coldfire/fpu/libm-test-ulps
> @@ -146,7 +146,6 @@ double: 1
> 
>  Function: "tgamma":
>  double: 1
> -float: 1
> 
>  Function: "y0":
>  double: 2
> diff --git a/sysdeps/m68k/m680x0/fpu/libm-test-ulps b/sysdeps/m68k/m680x0/fpu/libm-test-ulps
> index 6eacfb6b6f..73e98eb479 100644
> --- a/sysdeps/m68k/m680x0/fpu/libm-test-ulps
> +++ b/sysdeps/m68k/m680x0/fpu/libm-test-ulps
> @@ -1208,22 +1208,18 @@ float: 1
> 
>  Function: "tgamma":
>  double: 3
> -float: 9
>  ldouble: 9
> 
>  Function: "tgamma_downward":
>  double: 3
> -float: 9
>  ldouble: 9
> 
>  Function: "tgamma_towardzero":
>  double: 3
> -float: 9
>  ldouble: 9
> 
>  Function: "tgamma_upward":
>  double: 2
> -float: 9
>  ldouble: 9
> 
>  Function: "y0":
> diff --git a/sysdeps/microblaze/libm-test-ulps b/sysdeps/microblaze/libm-test-ulps
> index d3666eb7d4..98ea940ccc 100644
> --- a/sysdeps/microblaze/libm-test-ulps
> +++ b/sysdeps/microblaze/libm-test-ulps
> @@ -257,7 +257,6 @@ float: 2
> 
>  Function: "tgamma":
>  double: 5
> -float: 4
> 
>  Function: "y0":
>  double: 2
> diff --git a/sysdeps/mips/mips32/libm-test-ulps b/sysdeps/mips/mips32/libm-test-ulps
> index 83772d5772..a6480f1dcf 100644
> --- a/sysdeps/mips/mips32/libm-test-ulps
> +++ b/sysdeps/mips/mips32/libm-test-ulps
> @@ -1156,19 +1156,15 @@ float: 3
> 
>  Function: "tgamma":
>  double: 9
> -float: 8
> 
>  Function: "tgamma_downward":
>  double: 9
> -float: 7
> 
>  Function: "tgamma_towardzero":
>  double: 9
> -float: 7
> 
>  Function: "tgamma_upward":
>  double: 9
> -float: 8
> 
>  Function: "y0":
>  double: 3
> diff --git a/sysdeps/mips/mips64/libm-test-ulps b/sysdeps/mips/mips64/libm-test-ulps
> index 0addfd6cb4..3c86d96053 100644
> --- a/sysdeps/mips/mips64/libm-test-ulps
> +++ b/sysdeps/mips/mips64/libm-test-ulps
> @@ -1444,22 +1444,18 @@ ldouble: 3
> 
>  Function: "tgamma":
>  double: 9
> -float: 8
>  ldouble: 4
> 
>  Function: "tgamma_downward":
>  double: 9
> -float: 7
>  ldouble: 5
> 
>  Function: "tgamma_towardzero":
>  double: 9
> -float: 7
>  ldouble: 5
> 
>  Function: "tgamma_upward":
>  double: 9
> -float: 8
>  ldouble: 4
> 
>  Function: "y0":
> diff --git a/sysdeps/nios2/libm-test-ulps b/sysdeps/nios2/libm-test-ulps
> index c8d1a722f7..df6e932cb6 100644
> --- a/sysdeps/nios2/libm-test-ulps
> +++ b/sysdeps/nios2/libm-test-ulps
> @@ -266,7 +266,6 @@ float: 2
> 
>  Function: "tgamma":
>  double: 9
> -float: 8
> 
>  Function: "y0":
>  double: 3
> diff --git a/sysdeps/or1k/fpu/libm-test-ulps b/sysdeps/or1k/fpu/libm-test-ulps
> index 59b9f072f5..4ccb136298 100644
> --- a/sysdeps/or1k/fpu/libm-test-ulps
> +++ b/sysdeps/or1k/fpu/libm-test-ulps
> @@ -1066,19 +1066,15 @@ float: 3
> 
>  Function: "tgamma":
>  double: 9
> -float: 8
> 
>  Function: "tgamma_downward":
>  double: 9
> -float: 9
> 
>  Function: "tgamma_towardzero":
>  double: 9
> -float: 8
> 
>  Function: "tgamma_upward":
>  double: 9
> -float: 8
> 
>  Function: "y0":
>  double: 3
> diff --git a/sysdeps/or1k/nofpu/libm-test-ulps b/sysdeps/or1k/nofpu/libm-test-ulps
> index 726855faaa..7087cf9add 100644
> --- a/sysdeps/or1k/nofpu/libm-test-ulps
> +++ b/sysdeps/or1k/nofpu/libm-test-ulps
> @@ -1064,19 +1064,15 @@ float: 3
> 
>  Function: "tgamma":
>  double: 9
> -float: 8
> 
>  Function: "tgamma_downward":
>  double: 9
> -float: 9
> 
>  Function: "tgamma_towardzero":
>  double: 9
> -float: 8
> 
>  Function: "tgamma_upward":
>  double: 9
> -float: 8
> 
>  Function: "y0":
>  double: 3
> diff --git a/sysdeps/powerpc/fpu/libm-test-ulps b/sysdeps/powerpc/fpu/libm-test-ulps
> index 2e038492cd..dc953bead9 100644
> --- a/sysdeps/powerpc/fpu/libm-test-ulps
> +++ b/sysdeps/powerpc/fpu/libm-test-ulps
> @@ -1828,25 +1828,21 @@ ldouble: 6
> 
>  Function: "tgamma":
>  double: 9
> -float: 8
>  float128: 4
>  ldouble: 5
> 
>  Function: "tgamma_downward":
>  double: 9
> -float: 7
>  float128: 5
>  ldouble: 6
> 
>  Function: "tgamma_towardzero":
>  double: 9
> -float: 7
>  float128: 5
>  ldouble: 5
> 
>  Function: "tgamma_upward":
>  double: 9
> -float: 8
>  float128: 4
>  ldouble: 5
> 
> diff --git a/sysdeps/powerpc/nofpu/libm-test-ulps b/sysdeps/powerpc/nofpu/libm-test-ulps
> index dc9b499cc4..c604baa245 100644
> --- a/sysdeps/powerpc/nofpu/libm-test-ulps
> +++ b/sysdeps/powerpc/nofpu/libm-test-ulps
> @@ -1560,22 +1560,18 @@ ldouble: 6
> 
>  Function: "tgamma":
>  double: 9
> -float: 8
>  ldouble: 5
> 
>  Function: "tgamma_downward":
>  double: 9
> -float: 7
>  ldouble: 5
> 
>  Function: "tgamma_towardzero":
>  double: 9
> -float: 7
>  ldouble: 5
> 
>  Function: "tgamma_upward":
>  double: 9
> -float: 8
>  ldouble: 4
> 
>  Function: "y0":
> diff --git a/sysdeps/riscv/nofpu/libm-test-ulps b/sysdeps/riscv/nofpu/libm-test-ulps
> index 9ad64d1d85..4adad138b0 100644
> --- a/sysdeps/riscv/nofpu/libm-test-ulps
> +++ b/sysdeps/riscv/nofpu/libm-test-ulps
> @@ -1361,22 +1361,18 @@ ldouble: 3
> 
>  Function: "tgamma":
>  double: 9
> -float: 8
>  ldouble: 4
> 
>  Function: "tgamma_downward":
>  double: 5
> -float: 5
>  ldouble: 5
> 
>  Function: "tgamma_towardzero":
>  double: 5
> -float: 4
>  ldouble: 5
> 
>  Function: "tgamma_upward":
>  double: 4
> -float: 4
>  ldouble: 4
> 
>  Function: "y0":
> diff --git a/sysdeps/riscv/rvd/libm-test-ulps b/sysdeps/riscv/rvd/libm-test-ulps
> index 1e6c092361..eb27d4f2f9 100644
> --- a/sysdeps/riscv/rvd/libm-test-ulps
> +++ b/sysdeps/riscv/rvd/libm-test-ulps
> @@ -1431,22 +1431,18 @@ ldouble: 3
> 
>  Function: "tgamma":
>  double: 9
> -float: 8
>  ldouble: 4
> 
>  Function: "tgamma_downward":
>  double: 9
> -float: 7
>  ldouble: 5
> 
>  Function: "tgamma_towardzero":
>  double: 9
> -float: 7
>  ldouble: 5
> 
>  Function: "tgamma_upward":
>  double: 8
> -float: 8
>  ldouble: 4
> 
>  Function: "y0":
> diff --git a/sysdeps/s390/fpu/libm-test-ulps b/sysdeps/s390/fpu/libm-test-ulps
> index 9ac3db4fa5..e6872cf932 100644
> --- a/sysdeps/s390/fpu/libm-test-ulps
> +++ b/sysdeps/s390/fpu/libm-test-ulps
> @@ -1429,22 +1429,18 @@ ldouble: 3
> 
>  Function: "tgamma":
>  double: 9
> -float: 8
>  ldouble: 4
> 
>  Function: "tgamma_downward":
>  double: 9
> -float: 7
>  ldouble: 5
> 
>  Function: "tgamma_towardzero":
>  double: 9
> -float: 7
>  ldouble: 5
> 
>  Function: "tgamma_upward":
>  double: 9
> -float: 8
>  ldouble: 4
> 
>  Function: "y0":
> diff --git a/sysdeps/sh/libm-test-ulps b/sysdeps/sh/libm-test-ulps
> index 3c84259941..36f21ed395 100644
> --- a/sysdeps/sh/libm-test-ulps
> +++ b/sysdeps/sh/libm-test-ulps
> @@ -532,11 +532,9 @@ float: 2
> 
>  Function: "tgamma":
>  double: 9
> -float: 8
> 
>  Function: "tgamma_towardzero":
>  double: 9
> -float: 7
> 
>  Function: "y0":
>  double: 3
> diff --git a/sysdeps/sparc/fpu/libm-test-ulps b/sysdeps/sparc/fpu/libm-test-ulps
> index 0142357b3f..ae1e8ee972 100644
> --- a/sysdeps/sparc/fpu/libm-test-ulps
> +++ b/sysdeps/sparc/fpu/libm-test-ulps
> @@ -1444,22 +1444,18 @@ ldouble: 3
> 
>  Function: "tgamma":
>  double: 9
> -float: 8
>  ldouble: 4
> 
>  Function: "tgamma_downward":
>  double: 9
> -float: 7
>  ldouble: 5
> 
>  Function: "tgamma_towardzero":
>  double: 9
> -float: 7
>  ldouble: 5
> 
>  Function: "tgamma_upward":
>  double: 9
> -float: 8
>  ldouble: 4
> 
>  Function: "y0":
> diff --git a/sysdeps/x86_64/fpu/libm-test-ulps b/sysdeps/x86_64/fpu/libm-test-ulps
> index 37d8998c71..29e3ca926a 100644
> --- a/sysdeps/x86_64/fpu/libm-test-ulps
> +++ b/sysdeps/x86_64/fpu/libm-test-ulps
> @@ -2263,25 +2263,21 @@ double: 1
> 
>  Function: "tgamma":
>  double: 9
> -float: 8
>  float128: 4
>  ldouble: 5
> 
>  Function: "tgamma_downward":
>  double: 9
> -float: 7
>  float128: 5
>  ldouble: 6
> 
>  Function: "tgamma_towardzero":
>  double: 9
> -float: 7
>  float128: 5
>  ldouble: 6
> 
>  Function: "tgamma_upward":
>  double: 9
> -float: 8
>  float128: 4
>  ldouble: 5
> 
> --
> 2.43.0
Paul Zimmermann Oct. 8, 2024, 9:27 a.m. UTC | #2
thank you Adhemerval for your review. I will send a new version
addressing your comments.

Since the file sysdeps/ieee754/flt-32/e_gammaf_r.c is meant to be synced
with CORE-MATH, I have added instructions in SHARED-FILES on how to do it.

Paul
diff mbox series

Patch

diff --git a/benchtests/Makefile b/benchtests/Makefile
index 382fb7bae1..265ad34d8d 100644
--- a/benchtests/Makefile
+++ b/benchtests/Makefile
@@ -94,6 +94,7 @@  bench-math := \
   tan \
   tanh \
   tgamma \
+  tgammaf \
   trunc \
   truncf \
   y0 \
diff --git a/math/w_tgammaf_compat.c b/math/w_tgammaf_compat.c
index 34e0e096e0..a3851f6728 100644
--- a/math/w_tgammaf_compat.c
+++ b/math/w_tgammaf_compat.c
@@ -14,6 +14,7 @@ 

 #include <errno.h>
 #include <math.h>
+#include <stddef.h>
 #include <math_private.h>
 #include <math-svid-compat.h>
 #include <libm-alias-float.h>
@@ -22,8 +23,7 @@ 
 float
 __tgammaf(float x)
 {
-	int local_signgam;
-	float y = __ieee754_gammaf_r(x,&local_signgam);
+        float y = __ieee754_gammaf_r(x, NULL);

 	if(__glibc_unlikely (!isfinite (y) || y == 0)
 	   && (isfinite (x) || (isinf (x) && x < 0.0))
@@ -41,7 +41,7 @@  __tgammaf(float x)
 	    /* tgammaf overflow */
 	    return __kernel_standard_f(x, x, 140);
 	}
-	return local_signgam < 0 ? - y : y;
+	return y;
 }
 libm_alias_float (__tgamma, tgamma)
 #endif
diff --git a/sysdeps/aarch64/libm-test-ulps b/sysdeps/aarch64/libm-test-ulps
index a7a4a94265..b557dee8e5 100644
--- a/sysdeps/aarch64/libm-test-ulps
+++ b/sysdeps/aarch64/libm-test-ulps
@@ -1653,22 +1653,18 @@  ldouble: 3

 Function: "tgamma":
 double: 9
-float: 8
 ldouble: 4

 Function: "tgamma_downward":
 double: 9
-float: 7
 ldouble: 5

 Function: "tgamma_towardzero":
 double: 9
-float: 7
 ldouble: 5

 Function: "tgamma_upward":
 double: 9
-float: 8
 ldouble: 4

 Function: "y0":
diff --git a/sysdeps/alpha/fpu/libm-test-ulps b/sysdeps/alpha/fpu/libm-test-ulps
index e28c2af683..5c6175406b 100644
--- a/sysdeps/alpha/fpu/libm-test-ulps
+++ b/sysdeps/alpha/fpu/libm-test-ulps
@@ -1410,22 +1410,18 @@  ldouble: 3

 Function: "tgamma":
 double: 9
-float: 8
 ldouble: 4

 Function: "tgamma_downward":
 double: 9
-float: 7
 ldouble: 5

 Function: "tgamma_towardzero":
 double: 9
-float: 7
 ldouble: 5

 Function: "tgamma_upward":
 double: 9
-float: 8
 ldouble: 4

 Function: "y0":
diff --git a/sysdeps/arc/fpu/libm-test-ulps b/sysdeps/arc/fpu/libm-test-ulps
index 41c8ef16d7..8f4571b5fa 100644
--- a/sysdeps/arc/fpu/libm-test-ulps
+++ b/sysdeps/arc/fpu/libm-test-ulps
@@ -1093,19 +1093,15 @@  float: 3

 Function: "tgamma":
 double: 9
-float: 9

 Function: "tgamma_downward":
 double: 9
-float: 9

 Function: "tgamma_towardzero":
 double: 9
-float: 8

 Function: "tgamma_upward":
 double: 9
-float: 9

 Function: "y0":
 double: 3
diff --git a/sysdeps/arc/nofpu/libm-test-ulps b/sysdeps/arc/nofpu/libm-test-ulps
index d3f45957d4..6d1f97b3fc 100644
--- a/sysdeps/arc/nofpu/libm-test-ulps
+++ b/sysdeps/arc/nofpu/libm-test-ulps
@@ -262,7 +262,6 @@  float: 2

 Function: "tgamma":
 double: 9
-float: 8

 Function: "y0":
 double: 3
diff --git a/sysdeps/arm/libm-test-ulps b/sysdeps/arm/libm-test-ulps
index 6480353d39..9ca4750f74 100644
--- a/sysdeps/arm/libm-test-ulps
+++ b/sysdeps/arm/libm-test-ulps
@@ -1152,19 +1152,15 @@  float: 3

 Function: "tgamma":
 double: 9
-float: 8

 Function: "tgamma_downward":
 double: 9
-float: 7

 Function: "tgamma_towardzero":
 double: 9
-float: 7

 Function: "tgamma_upward":
 double: 9
-float: 8

 Function: "y0":
 double: 3
diff --git a/sysdeps/csky/fpu/libm-test-ulps b/sysdeps/csky/fpu/libm-test-ulps
index fc634f89ca..151df00638 100644
--- a/sysdeps/csky/fpu/libm-test-ulps
+++ b/sysdeps/csky/fpu/libm-test-ulps
@@ -1061,19 +1061,15 @@  float: 3

 Function: "tgamma":
 double: 9
-float: 8

 Function: "tgamma_downward":
 double: 8
-float: 7

 Function: "tgamma_towardzero":
 double: 9
-float: 7

 Function: "tgamma_upward":
 double: 9
-float: 8

 Function: "y0":
 double: 3
diff --git a/sysdeps/csky/nofpu/libm-test-ulps b/sysdeps/csky/nofpu/libm-test-ulps
index a1e28c8ee0..48697a7b09 100644
--- a/sysdeps/csky/nofpu/libm-test-ulps
+++ b/sysdeps/csky/nofpu/libm-test-ulps
@@ -1092,19 +1092,15 @@  float: 3

 Function: "tgamma":
 double: 9
-float: 8

 Function: "tgamma_downward":
 double: 5
-float: 5

 Function: "tgamma_towardzero":
 double: 5
-float: 4

 Function: "tgamma_upward":
 double: 4
-float: 4

 Function: "y0":
 double: 3
diff --git a/sysdeps/hppa/fpu/libm-test-ulps b/sysdeps/hppa/fpu/libm-test-ulps
index ea5101f6b6..2459868892 100644
--- a/sysdeps/hppa/fpu/libm-test-ulps
+++ b/sysdeps/hppa/fpu/libm-test-ulps
@@ -1181,20 +1181,16 @@  float: 3

 Function: "tgamma":
 double: 9
-float: 8
 ldouble: 1

 Function: "tgamma_downward":
 double: 9
-float: 7

 Function: "tgamma_towardzero":
 double: 9
-float: 7

 Function: "tgamma_upward":
 double: 9
-float: 8

 Function: "y0":
 double: 3
diff --git a/sysdeps/i386/fpu/libm-test-ulps b/sysdeps/i386/fpu/libm-test-ulps
index 03297e6527..5f12dd899a 100644
--- a/sysdeps/i386/fpu/libm-test-ulps
+++ b/sysdeps/i386/fpu/libm-test-ulps
@@ -1699,25 +1699,21 @@  ldouble: 4

 Function: "tgamma":
 double: 9
-float: 8
 float128: 4
 ldouble: 5

 Function: "tgamma_downward":
 double: 9
-float: 7
 float128: 5
 ldouble: 6

 Function: "tgamma_towardzero":
 double: 9
-float: 7
 float128: 5
 ldouble: 6

 Function: "tgamma_upward":
 double: 9
-float: 8
 float128: 4
 ldouble: 5

diff --git a/sysdeps/i386/i686/fpu/multiarch/libm-test-ulps b/sysdeps/i386/i686/fpu/multiarch/libm-test-ulps
index 85a2456971..35fc9c36bd 100644
--- a/sysdeps/i386/i686/fpu/multiarch/libm-test-ulps
+++ b/sysdeps/i386/i686/fpu/multiarch/libm-test-ulps
@@ -1701,25 +1701,21 @@  ldouble: 4

 Function: "tgamma":
 double: 9
-float: 8
 float128: 4
 ldouble: 5

 Function: "tgamma_downward":
 double: 9
-float: 7
 float128: 5
 ldouble: 6

 Function: "tgamma_towardzero":
 double: 9
-float: 7
 float128: 5
 ldouble: 6

 Function: "tgamma_upward":
 double: 8
-float: 8
 float128: 4
 ldouble: 5

diff --git a/sysdeps/ieee754/dbl-64/gamma_productf.c b/sysdeps/ieee754/dbl-64/gamma_productf.c
index f3596eeae4..1cc8931700 100644
--- a/sysdeps/ieee754/dbl-64/gamma_productf.c
+++ b/sysdeps/ieee754/dbl-64/gamma_productf.c
@@ -1,44 +1 @@ 
-/* Compute a product of X, X+1, ..., with an error estimate.
-   Copyright (C) 2013-2024 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#include <math.h>
-#include <math-narrow-eval.h>
-#include <math_private.h>
-#include <float.h>
-
-/* Compute the product of X + X_EPS, X + X_EPS + 1, ..., X + X_EPS + N
-   - 1, in the form R * (1 + *EPS) where the return value R is an
-   approximation to the product and *EPS is set to indicate the
-   approximate error in the return value.  X is such that all the
-   values X + 1, ..., X + N - 1 are exactly representable, and X_EPS /
-   X is small enough that factors quadratic in it can be
-   neglected.  */
-
-float
-__gamma_productf (float x, float x_eps, int n, float *eps)
-{
-  double x_full = (double) x + (double) x_eps;
-  double ret = x_full;
-  for (int i = 1; i < n; i++)
-    ret *= x_full + i;
-
-  float fret = math_narrow_eval ((float) ret);
-  *eps = (ret - fret) / fret;
-
-  return fret;
-}
+/* Not needed.  */
diff --git a/sysdeps/ieee754/flt-32/e_gammaf_r.c b/sysdeps/ieee754/flt-32/e_gammaf_r.c
index a9730d61c1..90ed3b4890 100644
--- a/sysdeps/ieee754/flt-32/e_gammaf_r.c
+++ b/sysdeps/ieee754/flt-32/e_gammaf_r.c
@@ -1,215 +1,150 @@ 
-/* Implementation of gamma function according to ISO C.
-   Copyright (C) 1997-2024 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
+/* Implementation of the gamma function for binary32.

-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
+Copyright (c) 2023-2024 Alexei Sibidanov.

-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
+The original version of this file was copied from the CORE-MATH
+project (file src/binary32/tgamma/tgammaf.c, revision a48e352).

-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:

-#include <math.h>
-#include <math-narrow-eval.h>
-#include <math_private.h>
-#include <fenv_private.h>
-#include <math-underflow.h>
-#include <float.h>
-#include <libm-alias-finite.h>
-
-/* Coefficients B_2k / 2k(2k-1) of x^-(2k-1) inside exp in Stirling's
-   approximation to gamma function.  */
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.

-static const float gamma_coeff[] =
-  {
-    0x1.555556p-4f,
-    -0xb.60b61p-12f,
-    0x3.403404p-12f,
-  };
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
+ */

-#define NCOEFF (sizeof (gamma_coeff) / sizeof (gamma_coeff[0]))
+/* Changes with respect to the original CORE-MATH code:
+   - removed the dealing with errno
+     (this is done in the wrapper math/w_tgammaf_compat.c)
+   - usage of math_narrow_eval to deal with underflow/overflow
+   - deal with signgamp
+ */

-/* Return gamma (X), for positive X less than 42, in the form R *
-   2^(*EXP2_ADJ), where R is the return value and *EXP2_ADJ is set to
-   avoid overflow or underflow in intermediate calculations.  */
+#include <math.h>
+#include <float.h>
+#include <stdint.h>
+#include <stddef.h>
+#include <libm-alias-finite.h>
+#include <math-narrow-eval.h>

-static float
-gammaf_positive (float x, int *exp2_adj)
-{
-  int local_signgam;
-  if (x < 0.5f)
-    {
-      *exp2_adj = 0;
-      return __ieee754_expf (__ieee754_lgammaf_r (x + 1, &local_signgam)) / x;
-    }
-  else if (x <= 1.5f)
-    {
-      *exp2_adj = 0;
-      return __ieee754_expf (__ieee754_lgammaf_r (x, &local_signgam));
-    }
-  else if (x < 2.5f)
-    {
-      *exp2_adj = 0;
-      float x_adj = x - 1;
-      return (__ieee754_expf (__ieee754_lgammaf_r (x_adj, &local_signgam))
-	      * x_adj);
-    }
-  else
-    {
-      float eps = 0;
-      float x_eps = 0;
-      float x_adj = x;
-      float prod = 1;
-      if (x < 4.0f)
-	{
-	  /* Adjust into the range for applying Stirling's
-	     approximation.  */
-	  float n = ceilf (4.0f - x);
-	  x_adj = math_narrow_eval (x + n);
-	  x_eps = (x - (x_adj - n));
-	  prod = __gamma_productf (x_adj - n, x_eps, n, &eps);
-	}
-      /* The result is now gamma (X_ADJ + X_EPS) / (PROD * (1 + EPS)).
-	 Compute gamma (X_ADJ + X_EPS) using Stirling's approximation,
-	 starting by computing pow (X_ADJ, X_ADJ) with a power of 2
-	 factored out.  */
-      float exp_adj = -eps;
-      float x_adj_int = roundf (x_adj);
-      float x_adj_frac = x_adj - x_adj_int;
-      int x_adj_log2;
-      float x_adj_mant = __frexpf (x_adj, &x_adj_log2);
-      if (x_adj_mant < M_SQRT1_2f)
-	{
-	  x_adj_log2--;
-	  x_adj_mant *= 2.0f;
-	}
-      *exp2_adj = x_adj_log2 * (int) x_adj_int;
-      float ret = (__ieee754_powf (x_adj_mant, x_adj)
-		   * __ieee754_exp2f (x_adj_log2 * x_adj_frac)
-		   * __ieee754_expf (-x_adj)
-		   * sqrtf (2 * M_PIf / x_adj)
-		   / prod);
-      exp_adj += x_eps * __ieee754_logf (x_adj);
-      float bsum = gamma_coeff[NCOEFF - 1];
-      float x_adj2 = x_adj * x_adj;
-      for (size_t i = 1; i <= NCOEFF - 1; i++)
-	bsum = bsum / x_adj2 + gamma_coeff[NCOEFF - 1 - i];
-      exp_adj += bsum / x_adj;
-      return ret + ret * __expm1f (exp_adj);
-    }
-}
+typedef union {float f; uint32_t u;} b32u32_u;
+typedef union {double f; uint64_t u;} b64u64_u;

 float
 __ieee754_gammaf_r (float x, int *signgamp)
 {
-  int32_t hx;
-  float ret;
+  /* The wrapper in math/w_tgamma_template.c expects *signgamp to be set to a
+     non-negative value if the returned value is gamma(x), and to a negative
+     value if it is -gamma(x).
+     Since the code here directly computes gamma(x), we set it to 1.
+  */
+  if (signgamp != NULL)
+    *signgamp = 1;

-  GET_FLOAT_WORD (hx, x);
+  /* List of exceptional cases. Each entry contains the 32-bit encoding u of x,
+     a binary32 approximation f of gamma(x), and a correction term df.  */
+  static const struct {uint32_t u; float f, df;} tb[] = {
+    {0x27de86a9u, 0x1.268266p+47f, 0x1p22f},      // x = 0x1.bd0d52p-48
+    {0x27e05475u, 0x1.242422p+47f, 0x1p22f},      // x = 0x1.c0a8eap-48
+    {0xb63befb3u, -0x1.5cb6e4p+18f, 0x1p-7f},     // x = -0x1.77df66p-19
+    {0x3c7bb570u, 0x1.021d9p+6f, 0x1p-19f},       // x = 0x1.f76aep-7
+    {0x41e886d1u, 0x1.33136ap+98f, 0x1p73f},      // x = 0x1.d10da2p+4
+    {0xc067d177u, 0x1.f6850cp-3f, 0x1p-28f},      // x = -0x1.cfa2eep+1
+    {0xbd99da31u, -0x1.befe66p+3, -0x1p-22f},     // x = -0x1.33b462p-4
+    {0xbf54c45au, -0x1.a6b4ecp+2, +0x1p-23f},     // x = -0x1.a988b4p-1
+    {0x41ee77feu, 0x1.d3631cp+101, -0x1p-76f},    // x = 0x1.dceffcp+4
+    {0x3f843a64u, 0x1.f6c638p-1, 0x1p-26f},       // x = 0x1.0874c8p+0
+  };

-  if (__glibc_unlikely ((hx & 0x7fffffff) == 0))
-    {
-      /* Return value for x == 0 is Inf with divide by zero exception.  */
-      *signgamp = 0;
-      return 1.0 / x;
+  b32u32_u t = {.f = x};
+  uint32_t ax = t.u<<1;
+  if(__builtin_expect(ax>=(0xffu<<24), 0)){ /* x=NaN or +/-Inf */
+    if(ax==(0xffu<<24)){ /* x=+/-Inf */
+      if(t.u>>31){ /* x=-Inf */
+        return x / x; /* will raise the "Invalid operation" exception */
+      }
+      return x; /* x=+Inf */
     }
-  if (__builtin_expect (hx < 0, 0)
-      && (uint32_t) hx < 0xff800000 && rintf (x) == x)
-    {
-      /* Return value for integer x < 0 is NaN with invalid exception.  */
-      *signgamp = 0;
-      return (x - x) / (x - x);
+    return x + x; /* x=NaN, where x+x ensures the "Invalid operation"
+                     exception is set if x is sNaN */
+  }
+  double z = x;
+  if(__builtin_expect(ax<0x6d000000u, 0)){ /* |x| < 0x1p-18 */
+    volatile double d = (0x1.fa658c23b1578p-1 - 0x1.d0a118f324b63p-1*z)*z - 0x1.2788cfc6fb619p-1;
+    double f = 1.0/z + d;
+    float r = f;
+    b64u64_u rt = {.f = f};
+    if(((rt.u+2)&0xfffffff) < 4){
+      for(unsigned i=0;i<sizeof(tb)/sizeof(tb[0]);i++)
+	if(t.u==tb[i].u) return tb[i].f + tb[i].df;
     }
-  if (__glibc_unlikely (hx == 0xff800000))
-    {
-      /* x == -Inf.  According to ISO this is NaN.  */
-      *signgamp = 0;
-      return x - x;
+    return r;
+  }
+  float fx = __builtin_floorf(x);
+  if(__builtin_expect(x >= 0x1.18522p+5f, 0)){
+    /* Overflow case. The original CORE-MATH code returns 0x1p127f * 0x1p127f,
+       but apparently some compilers replace this by +Inf.  */
+    return math_narrow_eval (x * 0x1p127f);
+  }
+  /* compute k only after the overflow check, otherwise the case to integer
+     might overflow */
+  int k = fx;
+  if(__builtin_expect(fx==x, 0)){ /* x is integer */
+    if(x == 0.0f){
+      return 1.0f/x;
     }
-  if (__glibc_unlikely ((hx & 0x7f800000) == 0x7f800000))
-    {
-      /* Positive infinity (return positive infinity) or NaN (return
-	 NaN).  */
-      *signgamp = 0;
-      return x + x;
+    if(x < 0.0f){
+      return 0.0f / 0.0f; /* should raise the "Invalid operation" exception */
     }
+    double t0 = 1, x0 = 1;
+    for(int i=1; i<k; i++, x0 += 1.0) t0 *= x0;
+    return t0;
+  }
+  if(__builtin_expect(x<-42.0f, 0)){ /* negative non-integer */
+    /* For x < -42, x non-integer, |gamma(x)| < 2^-151.  */
+    static const float sgn[2] = {0x1p-127f, -0x1p-127f};
+    /* Underflows always happens */
+    return math_narrow_eval (0x1p-127f * sgn[k&1]);
+  }
+  /* The array c[] stores a degree-15 polynomial approximation for gamma(x).  */
+  static const double c[] =
+    {0x1.c9a76be577123p+0, 0x1.8f2754ddcf90dp+0, 0x1.0d1191949419bp+0, 0x1.e1f42cf0ae4a1p-2,
+     0x1.82b358a3ab638p-3, 0x1.e1f2b30cd907bp-5, 0x1.240f6d4071bd8p-6, 0x1.1522c9f3cd012p-8,
+     0x1.1fd0051a0525bp-10, 0x1.9808a8b96c37ep-13, 0x1.b3f78e01152b5p-15, 0x1.49c85a7e1fd04p-18,
+     0x1.471ca49184475p-19, -0x1.368f0b7ed9e36p-23, 0x1.882222f9049efp-23, -0x1.a69ed2042842cp-25};

-  if (x >= 36.0f)
-    {
-      /* Overflow.  */
-      *signgamp = 0;
-      ret = math_narrow_eval (FLT_MAX * FLT_MAX);
-      return ret;
-    }
-  else
-    {
-      SET_RESTORE_ROUNDF (FE_TONEAREST);
-      if (x > 0.0f)
-	{
-	  *signgamp = 0;
-	  int exp2_adj;
-	  float tret = gammaf_positive (x, &exp2_adj);
-	  ret = __scalbnf (tret, exp2_adj);
-	}
-      else if (x >= -FLT_EPSILON / 4.0f)
-	{
-	  *signgamp = 0;
-	  ret = 1.0f / x;
-	}
-      else
-	{
-	  float tx = truncf (x);
-	  *signgamp = (tx == 2.0f * truncf (tx / 2.0f)) ? -1 : 1;
-	  if (x <= -42.0f)
-	    /* Underflow.  */
-	    ret = FLT_MIN * FLT_MIN;
-	  else
-	    {
-	      float frac = tx - x;
-	      if (frac > 0.5f)
-		frac = 1.0f - frac;
-	      float sinpix = (frac <= 0.25f
-			      ? __sinf (M_PIf * frac)
-			      : __cosf (M_PIf * (0.5f - frac)));
-	      int exp2_adj;
-	      float tret = M_PIf / (-x * sinpix
-				    * gammaf_positive (-x, &exp2_adj));
-	      ret = __scalbnf (tret, -exp2_adj);
-	      math_check_force_underflow_nonneg (ret);
-	    }
-	}
-      ret = math_narrow_eval (ret);
-    }
-  if (isinf (ret) && x != 0)
-    {
-      if (*signgamp < 0)
-	{
-	  ret = math_narrow_eval (-copysignf (FLT_MAX, ret) * FLT_MAX);
-	  ret = -ret;
-	}
-      else
-	ret = math_narrow_eval (copysignf (FLT_MAX, ret) * FLT_MAX);
-      return ret;
-    }
-  else if (ret == 0)
-    {
-      if (*signgamp < 0)
-	{
-	  ret = math_narrow_eval (-copysignf (FLT_MIN, ret) * FLT_MIN);
-	  ret = -ret;
-	}
-      else
-	ret = math_narrow_eval (copysignf (FLT_MIN, ret) * FLT_MIN);
-      return ret;
+  double m = z - 0x1.7p+1, i = __builtin_roundeven(m), step = __builtin_copysign(1.0,i);
+  double d = m - i, d2 = d*d, d4 = d2*d2, d8 = d4*d4;
+  double f = (c[0] + d*c[1]) + d2*(c[2] + d*c[3]) + d4*((c[4] + d*c[5]) + d2*(c[6] + d*c[7]))
+    + d8*((c[8] + d*c[9]) + d2*(c[10] + d*c[11]) + d4*((c[12] + d*c[13]) + d2*(c[14] + d*c[15])));
+  int jm = __builtin_fabs(i);
+  double w = 1;
+  if(jm){
+    z -= 0.5 + step*0.5;
+    w = z;
+    for(int j=jm-1; j; j--) {z -= step; w *= z;}
+  }
+  if(i<=-0.5) w = 1/w;
+  f *= w;
+  b64u64_u rt = {.f = f};
+  float r = f;
+  /* Deal with exceptional cases.  */
+  if(__builtin_expect(((rt.u+2)&0xfffffff) < 8, 0)){
+    for(unsigned j=0;j<sizeof(tb)/sizeof(tb[0]);j++) {
+      if(t.u==tb[j].u) return tb[j].f + tb[j].df;
     }
-  else
-    return ret;
+  }
+  return r;
 }
 libm_alias_finite (__ieee754_gammaf_r, __gammaf_r)
diff --git a/sysdeps/loongarch/lp64/libm-test-ulps b/sysdeps/loongarch/lp64/libm-test-ulps
index bdfd683454..5668a801c9 100644
--- a/sysdeps/loongarch/lp64/libm-test-ulps
+++ b/sysdeps/loongarch/lp64/libm-test-ulps
@@ -1432,22 +1432,18 @@  ldouble: 3

 Function: "tgamma":
 double: 9
-float: 8
 ldouble: 4

 Function: "tgamma_downward":
 double: 9
-float: 7
 ldouble: 5

 Function: "tgamma_towardzero":
 double: 9
-float: 7
 ldouble: 5

 Function: "tgamma_upward":
 double: 9
-float: 8
 ldouble: 4

 Function: "y0":
diff --git a/sysdeps/m68k/coldfire/fpu/libm-test-ulps b/sysdeps/m68k/coldfire/fpu/libm-test-ulps
index 1b25a70e3f..ae05498719 100644
--- a/sysdeps/m68k/coldfire/fpu/libm-test-ulps
+++ b/sysdeps/m68k/coldfire/fpu/libm-test-ulps
@@ -146,7 +146,6 @@  double: 1

 Function: "tgamma":
 double: 1
-float: 1

 Function: "y0":
 double: 2
diff --git a/sysdeps/m68k/m680x0/fpu/libm-test-ulps b/sysdeps/m68k/m680x0/fpu/libm-test-ulps
index 6eacfb6b6f..73e98eb479 100644
--- a/sysdeps/m68k/m680x0/fpu/libm-test-ulps
+++ b/sysdeps/m68k/m680x0/fpu/libm-test-ulps
@@ -1208,22 +1208,18 @@  float: 1

 Function: "tgamma":
 double: 3
-float: 9
 ldouble: 9

 Function: "tgamma_downward":
 double: 3
-float: 9
 ldouble: 9

 Function: "tgamma_towardzero":
 double: 3
-float: 9
 ldouble: 9

 Function: "tgamma_upward":
 double: 2
-float: 9
 ldouble: 9

 Function: "y0":
diff --git a/sysdeps/microblaze/libm-test-ulps b/sysdeps/microblaze/libm-test-ulps
index d3666eb7d4..98ea940ccc 100644
--- a/sysdeps/microblaze/libm-test-ulps
+++ b/sysdeps/microblaze/libm-test-ulps
@@ -257,7 +257,6 @@  float: 2

 Function: "tgamma":
 double: 5
-float: 4

 Function: "y0":
 double: 2
diff --git a/sysdeps/mips/mips32/libm-test-ulps b/sysdeps/mips/mips32/libm-test-ulps
index 83772d5772..a6480f1dcf 100644
--- a/sysdeps/mips/mips32/libm-test-ulps
+++ b/sysdeps/mips/mips32/libm-test-ulps
@@ -1156,19 +1156,15 @@  float: 3

 Function: "tgamma":
 double: 9
-float: 8

 Function: "tgamma_downward":
 double: 9
-float: 7

 Function: "tgamma_towardzero":
 double: 9
-float: 7

 Function: "tgamma_upward":
 double: 9
-float: 8

 Function: "y0":
 double: 3
diff --git a/sysdeps/mips/mips64/libm-test-ulps b/sysdeps/mips/mips64/libm-test-ulps
index 0addfd6cb4..3c86d96053 100644
--- a/sysdeps/mips/mips64/libm-test-ulps
+++ b/sysdeps/mips/mips64/libm-test-ulps
@@ -1444,22 +1444,18 @@  ldouble: 3

 Function: "tgamma":
 double: 9
-float: 8
 ldouble: 4

 Function: "tgamma_downward":
 double: 9
-float: 7
 ldouble: 5

 Function: "tgamma_towardzero":
 double: 9
-float: 7
 ldouble: 5

 Function: "tgamma_upward":
 double: 9
-float: 8
 ldouble: 4

 Function: "y0":
diff --git a/sysdeps/nios2/libm-test-ulps b/sysdeps/nios2/libm-test-ulps
index c8d1a722f7..df6e932cb6 100644
--- a/sysdeps/nios2/libm-test-ulps
+++ b/sysdeps/nios2/libm-test-ulps
@@ -266,7 +266,6 @@  float: 2

 Function: "tgamma":
 double: 9
-float: 8

 Function: "y0":
 double: 3
diff --git a/sysdeps/or1k/fpu/libm-test-ulps b/sysdeps/or1k/fpu/libm-test-ulps
index 59b9f072f5..4ccb136298 100644
--- a/sysdeps/or1k/fpu/libm-test-ulps
+++ b/sysdeps/or1k/fpu/libm-test-ulps
@@ -1066,19 +1066,15 @@  float: 3

 Function: "tgamma":
 double: 9
-float: 8

 Function: "tgamma_downward":
 double: 9
-float: 9

 Function: "tgamma_towardzero":
 double: 9
-float: 8

 Function: "tgamma_upward":
 double: 9
-float: 8

 Function: "y0":
 double: 3
diff --git a/sysdeps/or1k/nofpu/libm-test-ulps b/sysdeps/or1k/nofpu/libm-test-ulps
index 726855faaa..7087cf9add 100644
--- a/sysdeps/or1k/nofpu/libm-test-ulps
+++ b/sysdeps/or1k/nofpu/libm-test-ulps
@@ -1064,19 +1064,15 @@  float: 3

 Function: "tgamma":
 double: 9
-float: 8

 Function: "tgamma_downward":
 double: 9
-float: 9

 Function: "tgamma_towardzero":
 double: 9
-float: 8

 Function: "tgamma_upward":
 double: 9
-float: 8

 Function: "y0":
 double: 3
diff --git a/sysdeps/powerpc/fpu/libm-test-ulps b/sysdeps/powerpc/fpu/libm-test-ulps
index 2e038492cd..dc953bead9 100644
--- a/sysdeps/powerpc/fpu/libm-test-ulps
+++ b/sysdeps/powerpc/fpu/libm-test-ulps
@@ -1828,25 +1828,21 @@  ldouble: 6

 Function: "tgamma":
 double: 9
-float: 8
 float128: 4
 ldouble: 5

 Function: "tgamma_downward":
 double: 9
-float: 7
 float128: 5
 ldouble: 6

 Function: "tgamma_towardzero":
 double: 9
-float: 7
 float128: 5
 ldouble: 5

 Function: "tgamma_upward":
 double: 9
-float: 8
 float128: 4
 ldouble: 5

diff --git a/sysdeps/powerpc/nofpu/libm-test-ulps b/sysdeps/powerpc/nofpu/libm-test-ulps
index dc9b499cc4..c604baa245 100644
--- a/sysdeps/powerpc/nofpu/libm-test-ulps
+++ b/sysdeps/powerpc/nofpu/libm-test-ulps
@@ -1560,22 +1560,18 @@  ldouble: 6

 Function: "tgamma":
 double: 9
-float: 8
 ldouble: 5

 Function: "tgamma_downward":
 double: 9
-float: 7
 ldouble: 5

 Function: "tgamma_towardzero":
 double: 9
-float: 7
 ldouble: 5

 Function: "tgamma_upward":
 double: 9
-float: 8
 ldouble: 4

 Function: "y0":
diff --git a/sysdeps/riscv/nofpu/libm-test-ulps b/sysdeps/riscv/nofpu/libm-test-ulps
index 9ad64d1d85..4adad138b0 100644
--- a/sysdeps/riscv/nofpu/libm-test-ulps
+++ b/sysdeps/riscv/nofpu/libm-test-ulps
@@ -1361,22 +1361,18 @@  ldouble: 3

 Function: "tgamma":
 double: 9
-float: 8
 ldouble: 4

 Function: "tgamma_downward":
 double: 5
-float: 5
 ldouble: 5

 Function: "tgamma_towardzero":
 double: 5
-float: 4
 ldouble: 5

 Function: "tgamma_upward":
 double: 4
-float: 4
 ldouble: 4

 Function: "y0":
diff --git a/sysdeps/riscv/rvd/libm-test-ulps b/sysdeps/riscv/rvd/libm-test-ulps
index 1e6c092361..eb27d4f2f9 100644
--- a/sysdeps/riscv/rvd/libm-test-ulps
+++ b/sysdeps/riscv/rvd/libm-test-ulps
@@ -1431,22 +1431,18 @@  ldouble: 3

 Function: "tgamma":
 double: 9
-float: 8
 ldouble: 4

 Function: "tgamma_downward":
 double: 9
-float: 7
 ldouble: 5

 Function: "tgamma_towardzero":
 double: 9
-float: 7
 ldouble: 5

 Function: "tgamma_upward":
 double: 8
-float: 8
 ldouble: 4

 Function: "y0":
diff --git a/sysdeps/s390/fpu/libm-test-ulps b/sysdeps/s390/fpu/libm-test-ulps
index 9ac3db4fa5..e6872cf932 100644
--- a/sysdeps/s390/fpu/libm-test-ulps
+++ b/sysdeps/s390/fpu/libm-test-ulps
@@ -1429,22 +1429,18 @@  ldouble: 3

 Function: "tgamma":
 double: 9
-float: 8
 ldouble: 4

 Function: "tgamma_downward":
 double: 9
-float: 7
 ldouble: 5

 Function: "tgamma_towardzero":
 double: 9
-float: 7
 ldouble: 5

 Function: "tgamma_upward":
 double: 9
-float: 8
 ldouble: 4

 Function: "y0":
diff --git a/sysdeps/sh/libm-test-ulps b/sysdeps/sh/libm-test-ulps
index 3c84259941..36f21ed395 100644
--- a/sysdeps/sh/libm-test-ulps
+++ b/sysdeps/sh/libm-test-ulps
@@ -532,11 +532,9 @@  float: 2

 Function: "tgamma":
 double: 9
-float: 8

 Function: "tgamma_towardzero":
 double: 9
-float: 7

 Function: "y0":
 double: 3
diff --git a/sysdeps/sparc/fpu/libm-test-ulps b/sysdeps/sparc/fpu/libm-test-ulps
index 0142357b3f..ae1e8ee972 100644
--- a/sysdeps/sparc/fpu/libm-test-ulps
+++ b/sysdeps/sparc/fpu/libm-test-ulps
@@ -1444,22 +1444,18 @@  ldouble: 3

 Function: "tgamma":
 double: 9
-float: 8
 ldouble: 4

 Function: "tgamma_downward":
 double: 9
-float: 7
 ldouble: 5

 Function: "tgamma_towardzero":
 double: 9
-float: 7
 ldouble: 5

 Function: "tgamma_upward":
 double: 9
-float: 8
 ldouble: 4

 Function: "y0":
diff --git a/sysdeps/x86_64/fpu/libm-test-ulps b/sysdeps/x86_64/fpu/libm-test-ulps
index 37d8998c71..29e3ca926a 100644
--- a/sysdeps/x86_64/fpu/libm-test-ulps
+++ b/sysdeps/x86_64/fpu/libm-test-ulps
@@ -2263,25 +2263,21 @@  double: 1

 Function: "tgamma":
 double: 9
-float: 8
 float128: 4
 ldouble: 5

 Function: "tgamma_downward":
 double: 9
-float: 7
 float128: 5
 ldouble: 6

 Function: "tgamma_towardzero":
 double: 9
-float: 7
 float128: 5
 ldouble: 6

 Function: "tgamma_upward":
 double: 9
-float: 8
 float128: 4
 ldouble: 5