Message ID | 20170927221818.GA20589@ibm-tiger.the-meissners.org |
---|---|
State | New |
Headers | show |
Series | , Define __FP_FAST_FMAF128 on PowerPC ISA 3.0 | expand |
On Wed, 27 Sep 2017, Michael Meissner wrote: > The glibc team has requested we define the standard macro (__FP_FAST_FMAF128) > for PowerPC code when we have the IEEE 128-bit floating point hardware > instructions enabled. It's not a standard macro. TS 18661-3 has FP_FAST_FMAF128 as an optional math.h macro (but glibc doesn't define it anywhere at present). > This patch does this in the PowerPC backend. As I look at the whole issue, at > some point we should do this more in the machine independent portion of the > compiler. I have some initial patches to do this in the c-family files, but at > the present time, the patches are not complete, and I need to think about it > more. I think a machine-independent definition (for _FloatN / _FloatNx types in general) should go along with machine-independent fmafN / fmafNx built-in functions; when the built-in function is machine-specific, it's natural for the macro to be as well. But in any case, the new macro should be documented in cpp.texi alongside the existing __FP_FAST_FMA* macros (probably in the generic __FP_FAST_FMAF@var{n} and __FP_FAST_FMAF@var{n}X form).
On Thu, Sep 28, 2017 at 12:40:24AM +0000, Joseph Myers wrote: > On Wed, 27 Sep 2017, Michael Meissner wrote: > > > The glibc team has requested we define the standard macro (__FP_FAST_FMAF128) > > for PowerPC code when we have the IEEE 128-bit floating point hardware > > instructions enabled. > > It's not a standard macro. TS 18661-3 has FP_FAST_FMAF128 as an optional > math.h macro (but glibc doesn't define it anywhere at present). > > > This patch does this in the PowerPC backend. As I look at the whole issue, at > > some point we should do this more in the machine independent portion of the > > compiler. I have some initial patches to do this in the c-family files, but at > > the present time, the patches are not complete, and I need to think about it > > more. > > I think a machine-independent definition (for _FloatN / _FloatNx types in > general) should go along with machine-independent fmafN / fmafNx built-in > functions; when the built-in function is machine-specific, it's natural > for the macro to be as well. I have patches for this that I will submit shortly to replace the rs6000 specific patch. I haven't yet found all of the places that need to be changed for more traditional math functions like sqrtf128. > But in any case, the new macro should be documented in cpp.texi alongside > the existing __FP_FAST_FMA* macros (probably in the generic > __FP_FAST_FMAF@var{n} and __FP_FAST_FMAF@var{n}X form).
On Thu, Sep 28, 2017 at 12:40:24AM +0000, Joseph Myers wrote: > On Wed, 27 Sep 2017, Michael Meissner wrote: > > > The glibc team has requested we define the standard macro (__FP_FAST_FMAF128) > > for PowerPC code when we have the IEEE 128-bit floating point hardware > > instructions enabled. > > It's not a standard macro. TS 18661-3 has FP_FAST_FMAF128 as an optional > math.h macro (but glibc doesn't define it anywhere at present). > > > This patch does this in the PowerPC backend. As I look at the whole issue, at > > some point we should do this more in the machine independent portion of the > > compiler. I have some initial patches to do this in the c-family files, but at > > the present time, the patches are not complete, and I need to think about it > > more. > > I think a machine-independent definition (for _FloatN / _FloatNx types in > general) should go along with machine-independent fmafN / fmafNx built-in > functions; when the built-in function is machine-specific, it's natural > for the macro to be as well. > > But in any case, the new macro should be documented in cpp.texi alongside > the existing __FP_FAST_FMA* macros (probably in the generic > __FP_FAST_FMAF@var{n} and __FP_FAST_FMAF@var{n}X form). This patch adds support for adding the built-in __builtin_fmaf<N> and __builtin_fmaf<N>x functions if the target machine supports an appropriate fused multiply-add (FMA) instruction. This patch replaces the original PowerPC specific patch. Because it involves changes in the built-in support, both the c and c-family subdirectories, as well as PowerPC changes, I added the global/release maintainers to the To: list. I have done a bootstrap and make check on a little endian Power8 with no regresions in the tests. I have verified that the changed and new tests both ran fine. I have also bootstrapped the changes on an x86-64 compiler, and it bootstrapped fine. I am currently running the unmodified build, but I'm not expecting any changes in the test suite. Assuming the x86-64 tests also have no regressions, can I check these changes into the trunk? [gcc] 2017-10-02 Michael Meissner <meissner@linux.vnet.ibm.com> * builtins.def (BUILT_IN_FMAF16): Add support for fused multiply-add built-in functions for _Float<N> and _Float<N>x types. (BUILT_IN_FMAF32): Likewise. (BUILT_IN_FMAF64): Likewise. (BUILT_IN_FMAF128): Likewise. (BUILT_IN_FMAF32X): Likewise. (BUILT_IN_FMAF64X): Likewise. (BUILT_IN_FMAF128X): Likewise. * builtin-types.def (BT_FN_FLOAT16_FLOAT16_FLOAT16_FLOAT16): Likewise. (BT_FN_FLOAT32_FLOAT32_FLOAT32_FLOAT32): Likewise. (BT_FN_FLOAT64_FLOAT64_FLOAT64_FLOAT64): Likewise. (BT_FN_FLOAT128_FLOAT128_FLOAT128_FLOAT128): Likewise. (BT_FN_FLOAT32X_FLOAT32X_FLOAT32X_FLOAT32X): Likewise. (BT_FN_FLOAT64X_FLOAT64X_FLOAT64X_FLOAT64X): Likewise. (BT_FN_FLOAT128X_FLOAT128X_FLOAT128X_FLOAT128X): Likewise. * builtins.c (expand_builtin_mathfn_ternary): Likewise. (expand_builtin): Add fused multiply-add builtin support for _Float<N> and _Float<N>X types. Issue a warning if the machine does not provide an appropriate FMA insn. (fold_builtin_3): Add support for fused multiply-add built-in functions for _Float<N> and _Float<N>x types. * config/rs6000/rs6000-builtins.def (FMAF128): Delete creating __builtin_fmaf128, since this is now done in machine independent code. * doc/cpp.texi (__FP_FAST_FMAF16): Document macros set to declare that the appropriate fused multiply-add on _Float<N> and _Float<N>X types is implemented. (__FP_FAST_FMAF32): Likewise. (__FP_FAST_FMAF64): Likewise. (__FP_FAST_FMAF128): Likewise. (__FP_FAST_FMAF32X): Likewise. (__FP_FAST_FMAF64X): Likewise. (__FP_FAST_FMAF128X): Likewise. [gcc/c] 2017-10-02 Michael Meissner <meissner@linux.vnet.ibm.com> * c-decl.c (header_for_builtin_fn): Add support for fused multiply-add built-in functions for _Float<N> and _Float<N>x types. [gcc/c-family] 2017-10-02 Michael Meissner <meissner@linux.vnet.ibm.com> * c-cppbuiltin.c (mode_has_fma): Add support for PowerPC _float128 FMA (KFmode) if long double != __float128. (c_cpp_builtins): Define __FP_FAST_FMAF<N> if _Float<N> fused multiply-add is supported. Define __FP_FAST_FMAF<N>X if _Float<N>x fused multiply-add is supported. [gcc/testsuite] 2017-10-02 Michael Meissner <meissner@linux.vnet.ibm.com> * gcc.target/powerpc/float128-fma2.c: Change error to new warning. * gcc.target/powerpc/float128-fma3.c: New test.
Whoops, I forgot to attach the patch. On Mon, Oct 02, 2017 at 07:51:00PM -0400, Michael Meissner wrote: > On Thu, Sep 28, 2017 at 12:40:24AM +0000, Joseph Myers wrote: > > On Wed, 27 Sep 2017, Michael Meissner wrote: > > > > > The glibc team has requested we define the standard macro (__FP_FAST_FMAF128) > > > for PowerPC code when we have the IEEE 128-bit floating point hardware > > > instructions enabled. > > > > It's not a standard macro. TS 18661-3 has FP_FAST_FMAF128 as an optional > > math.h macro (but glibc doesn't define it anywhere at present). > > > > > This patch does this in the PowerPC backend. As I look at the whole issue, at > > > some point we should do this more in the machine independent portion of the > > > compiler. I have some initial patches to do this in the c-family files, but at > > > the present time, the patches are not complete, and I need to think about it > > > more. > > > > I think a machine-independent definition (for _FloatN / _FloatNx types in > > general) should go along with machine-independent fmafN / fmafNx built-in > > functions; when the built-in function is machine-specific, it's natural > > for the macro to be as well. > > > > But in any case, the new macro should be documented in cpp.texi alongside > > the existing __FP_FAST_FMA* macros (probably in the generic > > __FP_FAST_FMAF@var{n} and __FP_FAST_FMAF@var{n}X form). > > This patch adds support for adding the built-in __builtin_fmaf<N> and > __builtin_fmaf<N>x functions if the target machine supports an appropriate > fused multiply-add (FMA) instruction. This patch replaces the original PowerPC > specific patch. > > Because it involves changes in the built-in support, both the c and c-family > subdirectories, as well as PowerPC changes, I added the global/release > maintainers to the To: list. > > I have done a bootstrap and make check on a little endian Power8 with no > regresions in the tests. I have verified that the changed and new tests both > ran fine. > > I have also bootstrapped the changes on an x86-64 compiler, and it bootstrapped > fine. I am currently running the unmodified build, but I'm not expecting any > changes in the test suite. > > Assuming the x86-64 tests also have no regressions, can I check these changes > into the trunk? > > [gcc] > 2017-10-02 Michael Meissner <meissner@linux.vnet.ibm.com> > > * builtins.def (BUILT_IN_FMAF16): Add support for fused > multiply-add built-in functions for _Float<N> and _Float<N>x > types. > (BUILT_IN_FMAF32): Likewise. > (BUILT_IN_FMAF64): Likewise. > (BUILT_IN_FMAF128): Likewise. > (BUILT_IN_FMAF32X): Likewise. > (BUILT_IN_FMAF64X): Likewise. > (BUILT_IN_FMAF128X): Likewise. > * builtin-types.def (BT_FN_FLOAT16_FLOAT16_FLOAT16_FLOAT16): > Likewise. > (BT_FN_FLOAT32_FLOAT32_FLOAT32_FLOAT32): Likewise. > (BT_FN_FLOAT64_FLOAT64_FLOAT64_FLOAT64): Likewise. > (BT_FN_FLOAT128_FLOAT128_FLOAT128_FLOAT128): Likewise. > (BT_FN_FLOAT32X_FLOAT32X_FLOAT32X_FLOAT32X): Likewise. > (BT_FN_FLOAT64X_FLOAT64X_FLOAT64X_FLOAT64X): Likewise. > (BT_FN_FLOAT128X_FLOAT128X_FLOAT128X_FLOAT128X): Likewise. > * builtins.c (expand_builtin_mathfn_ternary): Likewise. > (expand_builtin): Add fused multiply-add builtin support for > _Float<N> and _Float<N>X types. Issue a warning if the machine > does not provide an appropriate FMA insn. > (fold_builtin_3): Add support for fused multiply-add built-in > functions for _Float<N> and _Float<N>x types. > * config/rs6000/rs6000-builtins.def (FMAF128): Delete creating > __builtin_fmaf128, since this is now done in machine independent > code. > * doc/cpp.texi (__FP_FAST_FMAF16): Document macros set to declare > that the appropriate fused multiply-add on _Float<N> and > _Float<N>X types is implemented. > (__FP_FAST_FMAF32): Likewise. > (__FP_FAST_FMAF64): Likewise. > (__FP_FAST_FMAF128): Likewise. > (__FP_FAST_FMAF32X): Likewise. > (__FP_FAST_FMAF64X): Likewise. > (__FP_FAST_FMAF128X): Likewise. > > [gcc/c] > 2017-10-02 Michael Meissner <meissner@linux.vnet.ibm.com> > > * c-decl.c (header_for_builtin_fn): Add support for fused > multiply-add built-in functions for _Float<N> and _Float<N>x > types. > > [gcc/c-family] > 2017-10-02 Michael Meissner <meissner@linux.vnet.ibm.com> > > * c-cppbuiltin.c (mode_has_fma): Add support for PowerPC _float128 > FMA (KFmode) if long double != __float128. > (c_cpp_builtins): Define __FP_FAST_FMAF<N> if _Float<N> fused > multiply-add is supported. Define __FP_FAST_FMAF<N>X if > _Float<N>x fused multiply-add is supported. > > [gcc/testsuite] > 2017-10-02 Michael Meissner <meissner@linux.vnet.ibm.com> > > * gcc.target/powerpc/float128-fma2.c: Change error to new > warning. > * gcc.target/powerpc/float128-fma3.c: New test. > > > -- > Michael Meissner, IBM > IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA > email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797
On Mon, 2 Oct 2017, Michael Meissner wrote: > > > But in any case, the new macro should be documented in cpp.texi alongside > > > the existing __FP_FAST_FMA* macros (probably in the generic > > > __FP_FAST_FMAF@var{n} and __FP_FAST_FMAF@var{n}X form). > > > > This patch adds support for adding the built-in __builtin_fmaf<N> and > > __builtin_fmaf<N>x functions if the target machine supports an appropriate > > fused multiply-add (FMA) instruction. This patch replaces the original PowerPC > > specific patch. Certainly the <math.h> FP_FAST_FMA* macros are supposed to relate to whether the public functions such as fmaf128 are fast rather than to __builtin_* names. I think there's a strong case that you should provide built-in functions under the public names when defining __FP_FAST_FMA*. I.e., add a variant of DEF_GCC_FLOATN_NX_BUILTINS that uses DEF_EXT_LIB_BUILTIN instead of DEF_GCC_BUILTIN, and use that for the new built-in functions. Then, the built-in functions, in whatever form they are provided, should be documented in extend.texi, alongside the documentation of __builtin_fabsf@var{n} etc. (with, of course, the caveats about availability when appropriate instruction support isn't available - the __builtin_fabsfN, __builtin_copysignfN functions are always inlined, the fma ones aren't, and people may well lack C library support for the underlying functions). Given that, I don't think the warning about lack of instruction support is appropriate; a call to __builtin_fmaf128, if the type is supported but there is no corresponding instruction (on x86_64, say), would just fall back to calling an external fmaf128 function (which in that case would work with glibc 2.26 or later, though calls to fmaf64 etc. wouldn't), much like any other such built-in function (we don't warn about e.g. calling __builtin_clog10 on systems whose C library may not have clog10).
Hi!
On Mon, Oct 02, 2017 at 07:52:50PM -0400, Michael Meissner wrote:
> Whoops, I forgot to attach the patch.
Heh. The rs6000 parts are of course okay (trivial / obvious, but maybe
you are waiting for an ack).
Thanks,
Segher
Index: gcc/config/rs6000/rs6000-c.c =================================================================== --- gcc/config/rs6000/rs6000-c.c (revision 253236) +++ gcc/config/rs6000/rs6000-c.c (working copy) @@ -585,7 +585,10 @@ rs6000_target_modify_macros (bool define /* OPTION_MASK_FLOAT128_HARDWARE can be turned on if -mcpu=power9 is used or via the target attribute/pragma. */ if ((flags & OPTION_MASK_FLOAT128_HW) != 0) - rs6000_define_or_undefine_macro (define_p, "__FLOAT128_HARDWARE__"); + { + rs6000_define_or_undefine_macro (define_p, "__FLOAT128_HARDWARE__"); + rs6000_define_or_undefine_macro (define_p, "__FP_FAST_FMAF128"); + } /* options from the builtin masks. */ /* Note that RS6000_BTM_PAIRED is enabled only if Index: gcc/testsuite/gcc.target/powerpc/float128-fma3.c =================================================================== --- gcc/testsuite/gcc.target/powerpc/float128-fma3.c (nonexistent) +++ gcc/testsuite/gcc.target/powerpc/float128-fma3.c (working copy) @@ -0,0 +1,33 @@ +/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */ +/* { dg-require-effective-target powerpc_p9vector_ok } */ +/* { dg-options "-mpower9-vector -O2" } */ + +/* Make sure the appropriate FMA fast macros are defined. */ + +#ifdef __FP_FAST_FMAF +float +do_fmaf (float a, float b, float c) +{ + return __builtin_fmaf (a, b, c); +} +#endif + +#ifdef __FP_FAST_FMA +double +do_fma (double a, double b, double c) +{ + return __builtin_fma (a, b, c); +} +#endif + +#ifdef __FP_FAST_FMAF128 +_Float128 +do_fmaf128 (_Float128 a, _Float128 b, _Float128 c) +{ + return __builtin_fmaf128 (a, b, c); +} +#endif + +/* { dg-final { scan-assembler {\mfmadds\M|\mxsmadd.sp\M} } } */ +/* { dg-final { scan-assembler {\mfmadd\M|\mxsmadd.dp\M} } } */ +/* { dg-final { scan-assembler {\mxsmaddqp\M} } } */