Message ID | 20240807031351.46105-2-zengxiao@eswincomputing.com |
---|---|
State | New |
Series | [v2,1/1] RISC-V: Support BF16 interfaces in libgcc |
2024-08-07 11:13 Xiao Zeng <zengxiao@eswincomputing.com> wrote:

The existing GCC test case 'gcc.dg/torture/float16-complex.c' already gives good coverage, so no new test cases were added. More test cases are always welcome, of course, and I will add some if necessary.

>
>gcc/ChangeLog:
>
>        * builtin-types.def (BT_COMPLEX_BFLOAT16): Support BF16 node.
>        (BT_BFLOAT16_PTR): Ditto.
>        (BT_FN_BFLOAT16): New.
>        (BT_FN_BFLOAT16_BFLOAT16): Ditto.
>        (BT_FN_COMPLEX_BFLOAT16_COMPLEX_BFLOAT16): Ditto.
>        (BT_FN_BFLOAT16_COMPLEX_BFLOAT16): Ditto.
>        (BT_FN_INT_BFLOAT16): Ditto.
>        (BT_FN_LONG_BFLOAT16): Ditto.
>        (BT_FN_LONGLONG_BFLOAT16): Ditto.
>        (BT_FN_BFLOAT16_BFLOAT16_BFLOAT16PTR): Ditto.
>        (BT_FN_BFLOAT16_BFLOAT16_INT): Ditto.
>        (BT_FN_BFLOAT16_BFLOAT16_INTPTR): Ditto.
>        (BT_FN_BFLOAT16_BFLOAT16_LONG): Ditto.
>        (BT_FN_COMPLEX_BFLOAT16_COMPLEX_BFLOAT16_COMPLEX_BFLOAT16): Ditto.
>        (BT_FN_BFLOAT16_BFLOAT16_BFLOAT16_BFLOAT16): Ditto.
>        (BT_FN_BFLOAT16_BFLOAT16_BFLOAT16_INTPTR): Ditto.
>        * builtins.cc (expand_builtin_classify_type): Support BF16.
>        (mathfn_built_in_2): Ditto.
>        (CASE_MATHFN_FLOATN): Ditto.
>        * builtins.def (DEF_GCC_FLOATN_NX_BUILTINS): Ditto.
>        (DEF_EXT_LIB_FLOATN_NX_BUILTINS): Ditto.
>        (BUILT_IN_NANSF16B): Added in general processing, redundant
>        is removed here.
>        (BUILT_IN_NEXTAFTERF16B): Ditto.
>        * fold-const-call.cc (fold_const_call): Ditto.
>        (fold_const_call_sss): Ditto.
>        * gencfn-macros.cc: Support BF16.
>        * match.pd: Like FP16, add optimization for BF16.
>        * tree.h (CASE_FLT_FN_FLOATN_NX): Support BF16.
>
>gcc/c-family/ChangeLog:
>
>        * c-cppbuiltin.cc (c_cpp_builtins): Modify suffix names to avoid
>        conflicts.
>
>libgcc/ChangeLog:
>
>        * Makefile.in: Add _mulbc3 and _divbc3.
>        * libgcc2.c (if): Ditto.
>        (defined): Ditto.
>        (MTYPE): Macros defined for BF16.
>        (CTYPE): Ditto.
>        (AMTYPE): Ditto.
>        (MODE): Ditto.
>        (CEXT): Ditto.
>        (NOTRUNC): Ditto.
>        * libgcc2.h (LIBGCC2_HAS_BF_MODE): Support BF16.
>        (__attribute__): Ditto.
>        (__divbc3): Add __divbc3 declaration.
>        (__mulbc3): Add __mulbc3 declaration.
>
>Signed-off-by: Xiao Zeng <zengxiao@eswincomputing.com>
>---
> gcc/builtin-types.def        | 30 ++++++++++++++++++++++++++++++
> gcc/builtins.cc              |  6 ++++++
> gcc/builtins.def             | 22 +++++++++++-----------
> gcc/c-family/c-cppbuiltin.cc |  2 +-
> gcc/fold-const-call.cc       |  2 --
> gcc/gencfn-macros.cc         |  5 +++--
> gcc/match.pd                 |  9 ++++++---
> gcc/tree.h                   |  2 +-
> libgcc/Makefile.in           |  6 +++---
> libgcc/libgcc2.c             | 20 ++++++++++++++------
> libgcc/libgcc2.h             | 14 ++++++++++++++
> 11 files changed, 89 insertions(+), 29 deletions(-)
>
>diff --git a/gcc/builtin-types.def b/gcc/builtin-types.def
>index c97d6bad1de..6980873f2f1 100644
>--- a/gcc/builtin-types.def
>+++ b/gcc/builtin-types.def
>@@ -109,6 +109,10 @@ DEF_PRIMITIVE_TYPE (BT_FLOAT128X, (float128x_type_node
> DEF_PRIMITIVE_TYPE (BT_COMPLEX_FLOAT, complex_float_type_node)
> DEF_PRIMITIVE_TYPE (BT_COMPLEX_DOUBLE, complex_double_type_node)
> DEF_PRIMITIVE_TYPE (BT_COMPLEX_LONGDOUBLE, complex_long_double_type_node)
>+DEF_PRIMITIVE_TYPE (BT_COMPLEX_BFLOAT16, (bfloat16_type_node
>+                                          ? build_complex_type
>+                                              (bfloat16_type_node)
>+                                          : error_mark_node))
> DEF_PRIMITIVE_TYPE (BT_COMPLEX_FLOAT16, (float16_type_node
>                                          ? build_complex_type
>                                              (float16_type_node)
>@@ -163,6 +167,9 @@ DEF_PRIMITIVE_TYPE (BT_CONST_DOUBLE_PTR,
>                       (build_qualified_type (double_type_node,
>                                              TYPE_QUAL_CONST)))
> DEF_PRIMITIVE_TYPE (BT_LONGDOUBLE_PTR, long_double_ptr_type_node)
>+DEF_PRIMITIVE_TYPE (BT_BFLOAT16_PTR, (bfloat16_type_node
>+                                      ? build_pointer_type (bfloat16_type_node)
>+                                      : error_mark_node))
> DEF_PRIMITIVE_TYPE (BT_FLOAT16_PTR, (float16_type_node
>                                      ? build_pointer_type (float16_type_node)
>                                      : error_mark_node))
>@@ -239,6 +246,7 @@ DEF_FUNCTION_TYPE_0 (BT_FN_DOUBLE, BT_DOUBLE)
>    distinguish it from two types in sequence, "long" followed by
>    "double". */
> DEF_FUNCTION_TYPE_0 (BT_FN_LONGDOUBLE, BT_LONGDOUBLE)
>+DEF_FUNCTION_TYPE_0 (BT_FN_BFLOAT16, BT_BFLOAT16)
> DEF_FUNCTION_TYPE_0 (BT_FN_FLOAT16, BT_FLOAT16)
> DEF_FUNCTION_TYPE_0 (BT_FN_FLOAT32, BT_FLOAT32)
> DEF_FUNCTION_TYPE_0 (BT_FN_FLOAT64, BT_FLOAT64)
>@@ -257,6 +265,7 @@ DEF_FUNCTION_TYPE_1 (BT_FN_FLOAT_FLOAT, BT_FLOAT, BT_FLOAT)
> DEF_FUNCTION_TYPE_1 (BT_FN_DOUBLE_DOUBLE, BT_DOUBLE, BT_DOUBLE)
> DEF_FUNCTION_TYPE_1 (BT_FN_LONGDOUBLE_LONGDOUBLE,
>                      BT_LONGDOUBLE, BT_LONGDOUBLE)
>+DEF_FUNCTION_TYPE_1 (BT_FN_BFLOAT16_BFLOAT16, BT_BFLOAT16, BT_BFLOAT16)
> DEF_FUNCTION_TYPE_1 (BT_FN_FLOAT16_FLOAT16, BT_FLOAT16, BT_FLOAT16)
> DEF_FUNCTION_TYPE_1 (BT_FN_FLOAT32_FLOAT32, BT_FLOAT32, BT_FLOAT32)
> DEF_FUNCTION_TYPE_1 (BT_FN_FLOAT64_FLOAT64, BT_FLOAT64, BT_FLOAT64)
>@@ -270,6 +279,8 @@ DEF_FUNCTION_TYPE_1 (BT_FN_COMPLEX_DOUBLE_COMPLEX_DOUBLE,
>                      BT_COMPLEX_DOUBLE, BT_COMPLEX_DOUBLE)
> DEF_FUNCTION_TYPE_1 (BT_FN_COMPLEX_LONGDOUBLE_COMPLEX_LONGDOUBLE,
>                      BT_COMPLEX_LONGDOUBLE, BT_COMPLEX_LONGDOUBLE)
>+DEF_FUNCTION_TYPE_1 (BT_FN_COMPLEX_BFLOAT16_COMPLEX_BFLOAT16,
>+                     BT_COMPLEX_BFLOAT16, BT_COMPLEX_BFLOAT16)
> DEF_FUNCTION_TYPE_1 (BT_FN_COMPLEX_FLOAT16_COMPLEX_FLOAT16,
>                      BT_COMPLEX_FLOAT16, BT_COMPLEX_FLOAT16)
> DEF_FUNCTION_TYPE_1 (BT_FN_COMPLEX_FLOAT32_COMPLEX_FLOAT32,
>@@ -290,6 +301,8 @@ DEF_FUNCTION_TYPE_1 (BT_FN_DOUBLE_COMPLEX_DOUBLE,
>                      BT_DOUBLE, BT_COMPLEX_DOUBLE)
> DEF_FUNCTION_TYPE_1 (BT_FN_LONGDOUBLE_COMPLEX_LONGDOUBLE,
>                      BT_LONGDOUBLE, BT_COMPLEX_LONGDOUBLE)
>+DEF_FUNCTION_TYPE_1 (BT_FN_BFLOAT16_COMPLEX_BFLOAT16,
>+                     BT_BFLOAT16, BT_COMPLEX_BFLOAT16)
> DEF_FUNCTION_TYPE_1 (BT_FN_FLOAT16_COMPLEX_FLOAT16,
>                      BT_FLOAT16, BT_COMPLEX_FLOAT16)
> DEF_FUNCTION_TYPE_1 (BT_FN_FLOAT32_COMPLEX_FLOAT32,
>@@ -324,6 +337,7 @@ DEF_FUNCTION_TYPE_1 (BT_FN_INT_PTR, BT_INT, BT_PTR)
> DEF_FUNCTION_TYPE_1 (BT_FN_INT_FLOAT, BT_INT, BT_FLOAT)
> DEF_FUNCTION_TYPE_1 (BT_FN_INT_DOUBLE, BT_INT, BT_DOUBLE)
> DEF_FUNCTION_TYPE_1 (BT_FN_INT_LONGDOUBLE, BT_INT, BT_LONGDOUBLE)
>+DEF_FUNCTION_TYPE_1 (BT_FN_INT_BFLOAT16, BT_INT, BT_BFLOAT16)
> DEF_FUNCTION_TYPE_1 (BT_FN_INT_FLOAT16, BT_INT, BT_FLOAT16)
> DEF_FUNCTION_TYPE_1 (BT_FN_INT_FLOAT32, BT_INT, BT_FLOAT32)
> DEF_FUNCTION_TYPE_1 (BT_FN_INT_FLOAT64, BT_INT, BT_FLOAT64)
>@@ -337,6 +351,7 @@ DEF_FUNCTION_TYPE_1 (BT_FN_INT_DFLOAT128, BT_INT, BT_DFLOAT128)
> DEF_FUNCTION_TYPE_1 (BT_FN_LONG_FLOAT, BT_LONG, BT_FLOAT)
> DEF_FUNCTION_TYPE_1 (BT_FN_LONG_DOUBLE, BT_LONG, BT_DOUBLE)
> DEF_FUNCTION_TYPE_1 (BT_FN_LONG_LONGDOUBLE, BT_LONG, BT_LONGDOUBLE)
>+DEF_FUNCTION_TYPE_1 (BT_FN_LONG_BFLOAT16, BT_LONG, BT_BFLOAT16)
> DEF_FUNCTION_TYPE_1 (BT_FN_LONG_FLOAT16, BT_LONG, BT_FLOAT16)
> DEF_FUNCTION_TYPE_1 (BT_FN_LONG_FLOAT32, BT_LONG, BT_FLOAT32)
> DEF_FUNCTION_TYPE_1 (BT_FN_LONG_FLOAT64, BT_LONG, BT_FLOAT64)
>@@ -347,6 +362,7 @@ DEF_FUNCTION_TYPE_1 (BT_FN_LONG_FLOAT128X, BT_LONG, BT_FLOAT128X)
> DEF_FUNCTION_TYPE_1 (BT_FN_LONGLONG_FLOAT, BT_LONGLONG, BT_FLOAT)
> DEF_FUNCTION_TYPE_1 (BT_FN_LONGLONG_DOUBLE, BT_LONGLONG, BT_DOUBLE)
> DEF_FUNCTION_TYPE_1 (BT_FN_LONGLONG_LONGDOUBLE, BT_LONGLONG, BT_LONGDOUBLE)
>+DEF_FUNCTION_TYPE_1 (BT_FN_LONGLONG_BFLOAT16, BT_LONGLONG, BT_BFLOAT16)
> DEF_FUNCTION_TYPE_1 (BT_FN_LONGLONG_FLOAT16, BT_LONGLONG, BT_FLOAT16)
> DEF_FUNCTION_TYPE_1 (BT_FN_LONGLONG_FLOAT32, BT_LONGLONG, BT_FLOAT32)
> DEF_FUNCTION_TYPE_1 (BT_FN_LONGLONG_FLOAT64, BT_LONGLONG, BT_FLOAT64)
>@@ -525,6 +541,8 @@ DEF_FUNCTION_TYPE_2 (BT_FN_DOUBLE_DOUBLE_DOUBLEPTR,
>                      BT_DOUBLE, BT_DOUBLE, BT_DOUBLE_PTR)
> DEF_FUNCTION_TYPE_2 (BT_FN_LONGDOUBLE_LONGDOUBLE_LONGDOUBLEPTR,
>                      BT_LONGDOUBLE, BT_LONGDOUBLE, BT_LONGDOUBLE_PTR)
>+DEF_FUNCTION_TYPE_2 (BT_FN_BFLOAT16_BFLOAT16_BFLOAT16PTR,
>+                     BT_BFLOAT16, BT_BFLOAT16, BT_BFLOAT16_PTR)
> DEF_FUNCTION_TYPE_2 (BT_FN_FLOAT16_FLOAT16_FLOAT16PTR,
>                      BT_FLOAT16, BT_FLOAT16, BT_FLOAT16_PTR)
> DEF_FUNCTION_TYPE_2 (BT_FN_FLOAT32_FLOAT32_FLOAT32PTR,
>@@ -549,6 +567,8 @@ DEF_FUNCTION_TYPE_2 (BT_FN_DOUBLE_DOUBLE_INT,
>                      BT_DOUBLE, BT_DOUBLE, BT_INT)
> DEF_FUNCTION_TYPE_2 (BT_FN_LONGDOUBLE_LONGDOUBLE_INT,
>                      BT_LONGDOUBLE, BT_LONGDOUBLE, BT_INT)
>+DEF_FUNCTION_TYPE_2 (BT_FN_BFLOAT16_BFLOAT16_INT,
>+                     BT_BFLOAT16, BT_BFLOAT16, BT_INT)
> DEF_FUNCTION_TYPE_2 (BT_FN_FLOAT16_FLOAT16_INT,
>                      BT_FLOAT16, BT_FLOAT16, BT_INT)
> DEF_FUNCTION_TYPE_2 (BT_FN_FLOAT32_FLOAT32_INT,
>@@ -569,6 +589,8 @@ DEF_FUNCTION_TYPE_2 (BT_FN_DOUBLE_DOUBLE_INTPTR,
>                      BT_DOUBLE, BT_DOUBLE, BT_INT_PTR)
> DEF_FUNCTION_TYPE_2 (BT_FN_LONGDOUBLE_LONGDOUBLE_INTPTR,
>                      BT_LONGDOUBLE, BT_LONGDOUBLE, BT_INT_PTR)
>+DEF_FUNCTION_TYPE_2 (BT_FN_BFLOAT16_BFLOAT16_INTPTR,
>+                     BT_BFLOAT16, BT_BFLOAT16, BT_INT_PTR)
> DEF_FUNCTION_TYPE_2 (BT_FN_FLOAT16_FLOAT16_INTPTR,
>                      BT_FLOAT16, BT_FLOAT16, BT_INT_PTR)
> DEF_FUNCTION_TYPE_2 (BT_FN_FLOAT32_FLOAT32_INTPTR,
>@@ -595,6 +617,8 @@ DEF_FUNCTION_TYPE_2 (BT_FN_DOUBLE_DOUBLE_LONG,
>                      BT_DOUBLE, BT_DOUBLE, BT_LONG)
> DEF_FUNCTION_TYPE_2 (BT_FN_LONGDOUBLE_LONGDOUBLE_LONG,
>                      BT_LONGDOUBLE, BT_LONGDOUBLE, BT_LONG)
>+DEF_FUNCTION_TYPE_2 (BT_FN_BFLOAT16_BFLOAT16_LONG,
>+                     BT_BFLOAT16, BT_BFLOAT16, BT_LONG)
> DEF_FUNCTION_TYPE_2 (BT_FN_FLOAT16_FLOAT16_LONG,
>                      BT_FLOAT16, BT_FLOAT16, BT_LONG)
> DEF_FUNCTION_TYPE_2 (BT_FN_FLOAT32_FLOAT32_LONG,
>@@ -621,6 +645,8 @@ DEF_FUNCTION_TYPE_2 (BT_FN_COMPLEX_DOUBLE_COMPLEX_DOUBLE_COMPLEX_DOUBLE,
>                      BT_COMPLEX_DOUBLE, BT_COMPLEX_DOUBLE, BT_COMPLEX_DOUBLE)
> DEF_FUNCTION_TYPE_2 (BT_FN_COMPLEX_LONGDOUBLE_COMPLEX_LONGDOUBLE_COMPLEX_LONGDOUBLE,
>                      BT_COMPLEX_LONGDOUBLE, BT_COMPLEX_LONGDOUBLE, BT_COMPLEX_LONGDOUBLE)
>+DEF_FUNCTION_TYPE_2 (BT_FN_COMPLEX_BFLOAT16_COMPLEX_BFLOAT16_COMPLEX_BFLOAT16,
>+                     BT_COMPLEX_BFLOAT16, BT_COMPLEX_BFLOAT16, BT_COMPLEX_BFLOAT16)
> DEF_FUNCTION_TYPE_2 (BT_FN_COMPLEX_FLOAT16_COMPLEX_FLOAT16_COMPLEX_FLOAT16,
>                      BT_COMPLEX_FLOAT16, BT_COMPLEX_FLOAT16, BT_COMPLEX_FLOAT16)
> DEF_FUNCTION_TYPE_2 (BT_FN_COMPLEX_FLOAT32_COMPLEX_FLOAT32_COMPLEX_FLOAT32,
>@@ -728,6 +754,8 @@ DEF_FUNCTION_TYPE_3 (BT_FN_DOUBLE_DOUBLE_DOUBLE_DOUBLE,
>                      BT_DOUBLE, BT_DOUBLE, BT_DOUBLE, BT_DOUBLE)
> DEF_FUNCTION_TYPE_3 (BT_FN_LONGDOUBLE_LONGDOUBLE_LONGDOUBLE_LONGDOUBLE,
>                      BT_LONGDOUBLE, BT_LONGDOUBLE, BT_LONGDOUBLE, BT_LONGDOUBLE)
>+DEF_FUNCTION_TYPE_3 (BT_FN_BFLOAT16_BFLOAT16_BFLOAT16_BFLOAT16,
>+                     BT_BFLOAT16, BT_BFLOAT16, BT_BFLOAT16, BT_BFLOAT16)
> DEF_FUNCTION_TYPE_3 (BT_FN_FLOAT16_FLOAT16_FLOAT16_FLOAT16,
>                      BT_FLOAT16, BT_FLOAT16, BT_FLOAT16, BT_FLOAT16)
> DEF_FUNCTION_TYPE_3 (BT_FN_FLOAT32_FLOAT32_FLOAT32_FLOAT32,
>@@ -748,6 +776,8 @@ DEF_FUNCTION_TYPE_3 (BT_FN_DOUBLE_DOUBLE_DOUBLE_INTPTR,
>                      BT_DOUBLE, BT_DOUBLE, BT_DOUBLE, BT_INT_PTR)
> DEF_FUNCTION_TYPE_3 (BT_FN_LONGDOUBLE_LONGDOUBLE_LONGDOUBLE_INTPTR,
>                      BT_LONGDOUBLE, BT_LONGDOUBLE, BT_LONGDOUBLE, BT_INT_PTR)
>+DEF_FUNCTION_TYPE_3 (BT_FN_BFLOAT16_BFLOAT16_BFLOAT16_INTPTR,
>+                     BT_BFLOAT16, BT_BFLOAT16, BT_BFLOAT16, BT_INT_PTR)
> DEF_FUNCTION_TYPE_3 (BT_FN_FLOAT16_FLOAT16_FLOAT16_INTPTR,
>                      BT_FLOAT16, BT_FLOAT16, BT_FLOAT16, BT_INT_PTR)
> DEF_FUNCTION_TYPE_3 (BT_FN_FLOAT32_FLOAT32_FLOAT32_INTPTR,
>diff --git a/gcc/builtins.cc b/gcc/builtins.cc
>index 0b902896ddd..d0fc8e755e8 100644
>--- a/gcc/builtins.cc
>+++ b/gcc/builtins.cc
>@@ -1918,6 +1918,7 @@ expand_builtin_classify_type (tree exp)
>   fcodef32 = BUILT_IN_##MATHFN##F32; fcodef64 = BUILT_IN_##MATHFN##F64 ; \
>   fcodef128 = BUILT_IN_##MATHFN##F128 ; fcodef32x = BUILT_IN_##MATHFN##F32X ; \
>   fcodef64x = BUILT_IN_##MATHFN##F64X ; fcodef128x = BUILT_IN_##MATHFN##F128X ;\
>+  fcodef16b = BUILT_IN_##MATHFN##F16B ; \
>   break;
> /* Similar to above, but appends _R after any F/L suffix. */
> #define CASE_MATHFN_REENT(MATHFN) \
>@@ -1937,6 +1938,7 @@ mathfn_built_in_2 (tree type, combined_fn fn)
> {
>   tree mtype;
>   built_in_function fcode, fcodef, fcodel;
>+  built_in_function fcodef16b = END_BUILTINS;
>   built_in_function fcodef16 = END_BUILTINS;
>   built_in_function fcodef32 = END_BUILTINS;
>   built_in_function fcodef64 = END_BUILTINS;
>@@ -2055,6 +2057,8 @@ mathfn_built_in_2 (tree type, combined_fn fn)
>     return fcodef;
>   else if (mtype == long_double_type_node)
>     return fcodel;
>+  else if (mtype == bfloat16_type_node)
>+    return fcodef16b;
>   else if (mtype == float16_type_node)
>     return fcodef16;
>   else if (mtype == float32_type_node)
>@@ -2137,6 +2141,8 @@ mathfn_built_in_type (combined_fn fn)
>
> #define CASE_MATHFN_FLOATN(MATHFN) \
>   CASE_MATHFN(MATHFN) \
>+  case CFN_BUILT_IN_##MATHFN##F16B: \
>+    return bfloat16_type_node; \
>   case CFN_BUILT_IN_##MATHFN##F16: \
>     return float16_type_node; \
>   case CFN_BUILT_IN_##MATHFN##F32: \
>diff --git a/gcc/builtins.def b/gcc/builtins.def
>index f6f3e104f6a..ffd427d7d93 100644
>--- a/gcc/builtins.def
>+++ b/gcc/builtins.def
>@@ -77,11 +77,12 @@ along with GCC; see the file COPYING3. If not see
>   DEF_BUILTIN (ENUM, NAME, BUILT_IN_NORMAL, TYPE, BT_LAST, \
>                false, false, false, ATTRS, true, true)
>
>-/* A set of GCC builtins for _FloatN and _FloatNx types. TYPE_MACRO
>-   is called with an argument such as FLOAT32 to produce the enum
>-   value for the type. */
>+/* A set of GCC builtins for __bf16, _FloatN and _FloatNx types.
>+   TYPE_MACRO is called with an argument such as FLOAT32 to produce
>+   the enum value for the type. */
> #undef DEF_GCC_FLOATN_NX_BUILTINS
> #define DEF_GCC_FLOATN_NX_BUILTINS(ENUM, NAME, TYPE_MACRO, ATTRS) \
>+  DEF_GCC_BUILTIN (ENUM ## F16B, NAME "f16b", TYPE_MACRO (BFLOAT16), ATTRS) \
>   DEF_GCC_BUILTIN (ENUM ## F16, NAME "f16", TYPE_MACRO (FLOAT16), ATTRS) \
>   DEF_GCC_BUILTIN (ENUM ## F32, NAME "f32", TYPE_MACRO (FLOAT32), ATTRS) \
>   DEF_GCC_BUILTIN (ENUM ## F64, NAME "f64", TYPE_MACRO (FLOAT64), ATTRS) \
>@@ -110,12 +111,12 @@ along with GCC; see the file COPYING3. If not see
>   DEF_BUILTIN (ENUM, "__builtin_" NAME, BUILT_IN_NORMAL, TYPE, TYPE, \
>                true, true, true, ATTRS, false, true)
>
>-/* A set of GCC builtins for _FloatN and _FloatNx types. TYPE_MACRO is called
>-   with an argument such as FLOAT32 to produce the enum value for the type. If
>-   we are compiling for the C language with GNU extensions, we enable the name
>-   without the __builtin_ prefix as well as the name with the __builtin_
>-   prefix. C++ does not enable these names by default because a class based
>-   library should use the __builtin_ names. */
>+/* A set of GCC builtins for __bf16, _FloatN and _FloatNx types.
>+   TYPE_MACRO is called with an argument such as FLOAT32 to produce the enum
>+   value for the type. If we are compiling for the C language with GNU
>+   extensions, we enable the name without the __builtin_ prefix as well as the
>+   name with the __builtin_ prefix. C++ does not enable these names by default
>+   because a class based library should use the __builtin_ names. */
> #undef DEF_FLOATN_BUILTIN
> #define DEF_FLOATN_BUILTIN(ENUM, NAME, TYPE, ATTRS) \
>   DEF_BUILTIN (ENUM, "__builtin_" NAME, BUILT_IN_NORMAL, TYPE, TYPE, \
>@@ -123,6 +124,7 @@ along with GCC; see the file COPYING3. If not see
>                false, true)
> #undef DEF_EXT_LIB_FLOATN_NX_BUILTINS
> #define DEF_EXT_LIB_FLOATN_NX_BUILTINS(ENUM, NAME, TYPE_MACRO, ATTRS) \
>+  DEF_FLOATN_BUILTIN (ENUM ## F16B, NAME "f16b", TYPE_MACRO (BFLOAT16), ATTRS) \
>   DEF_FLOATN_BUILTIN (ENUM ## F16, NAME "f16", TYPE_MACRO (FLOAT16), ATTRS) \
>   DEF_FLOATN_BUILTIN (ENUM ## F32, NAME "f32", TYPE_MACRO (FLOAT32), ATTRS) \
>   DEF_FLOATN_BUILTIN (ENUM ## F64, NAME "f64", TYPE_MACRO (FLOAT64), ATTRS) \
>@@ -576,7 +578,6 @@ DEF_GCC_BUILTIN (BUILT_IN_NANSF, "nansf", BT_FN_FLOAT_CONST_STRING, ATTR_
> DEF_GCC_BUILTIN (BUILT_IN_NANSL, "nansl", BT_FN_LONGDOUBLE_CONST_STRING, ATTR_CONST_NOTHROW_NONNULL)
> DEF_GCC_FLOATN_NX_BUILTINS (BUILT_IN_NANS, "nans", NAN_TYPE, ATTR_CONST_NOTHROW_NONNULL)
> #undef NAN_TYPE
>-DEF_GCC_BUILTIN (BUILT_IN_NANSF16B, "nansf16b", BT_FN_BFLOAT16_CONST_STRING, ATTR_CONST_NOTHROW_NONNULL)
> DEF_GCC_BUILTIN (BUILT_IN_NANSD32, "nansd32", BT_FN_DFLOAT32_CONST_STRING, ATTR_CONST_NOTHROW_NONNULL)
> DEF_GCC_BUILTIN (BUILT_IN_NANSD64, "nansd64", BT_FN_DFLOAT64_CONST_STRING, ATTR_CONST_NOTHROW_NONNULL)
> DEF_GCC_BUILTIN (BUILT_IN_NANSD128, "nansd128", BT_FN_DFLOAT128_CONST_STRING, ATTR_CONST_NOTHROW_NONNULL)
>@@ -591,7 +592,6 @@ DEF_C99_BUILTIN (BUILT_IN_NEXTAFTERF, "nextafterf", BT_FN_FLOAT_FLOAT_FLO
> DEF_C99_BUILTIN (BUILT_IN_NEXTAFTERL, "nextafterl", BT_FN_LONGDOUBLE_LONGDOUBLE_LONGDOUBLE, ATTR_MATHFN_ERRNO)
> #define NEXTAFTER_TYPE(F) BT_FN_##F##_##F##_##F
> DEF_EXT_LIB_FLOATN_NX_BUILTINS (BUILT_IN_NEXTAFTER, "nextafter", NEXTAFTER_TYPE, ATTR_MATHFN_ERRNO)
>-DEF_GCC_BUILTIN (BUILT_IN_NEXTAFTERF16B, "nextafterf16b", BT_FN_BFLOAT16_BFLOAT16_BFLOAT16, ATTR_MATHFN_ERRNO)
> DEF_C99_BUILTIN (BUILT_IN_NEXTTOWARD, "nexttoward", BT_FN_DOUBLE_DOUBLE_LONGDOUBLE, ATTR_MATHFN_ERRNO)
> DEF_C99_BUILTIN (BUILT_IN_NEXTTOWARDF, "nexttowardf", BT_FN_FLOAT_FLOAT_LONGDOUBLE, ATTR_MATHFN_ERRNO)
> DEF_C99_BUILTIN (BUILT_IN_NEXTTOWARDL, "nexttowardl", BT_FN_LONGDOUBLE_LONGDOUBLE_LONGDOUBLE, ATTR_MATHFN_ERRNO)
>diff --git a/gcc/c-family/c-cppbuiltin.cc b/gcc/c-family/c-cppbuiltin.cc
>index a80372c8991..273bb9cf028 100644
>--- a/gcc/c-family/c-cppbuiltin.cc
>+++ b/gcc/c-family/c-cppbuiltin.cc
>@@ -1422,7 +1422,7 @@ c_cpp_builtins (cpp_reader *pfile)
>           else if (bfloat16_type_node
>                    && mode == TYPE_MODE (bfloat16_type_node))
>             {
>-              memcpy (suffix, "bf16", 5);
>+              memcpy (suffix, "f16b", 5);
>               memcpy (float_h_prefix, "BFLT16", 7);
>             }
>           else
>diff --git a/gcc/fold-const-call.cc b/gcc/fold-const-call.cc
>index 47bf8d64391..ed1ec0ab3ee 100644
>--- a/gcc/fold-const-call.cc
>+++ b/gcc/fold-const-call.cc
>@@ -1354,7 +1354,6 @@ fold_const_call (combined_fn fn, tree type, tree arg)
>
>     CASE_CFN_NANS:
>     CASE_FLT_FN_FLOATN_NX (CFN_BUILT_IN_NANS):
>-    case CFN_BUILT_IN_NANSF16B:
>     case CFN_BUILT_IN_NANSD32:
>     case CFN_BUILT_IN_NANSD64:
>     case CFN_BUILT_IN_NANSD128:
>@@ -1462,7 +1461,6 @@ fold_const_call_sss (real_value *result, combined_fn fn,
>
>     CASE_CFN_NEXTAFTER:
>     CASE_CFN_NEXTAFTER_FN:
>-    case CFN_BUILT_IN_NEXTAFTERF16B:
>     CASE_CFN_NEXTTOWARD:
>       return fold_const_nextafter (result, arg0, arg1, format);
>
>diff --git a/gcc/gencfn-macros.cc b/gcc/gencfn-macros.cc
>index 2581e758fe6..8c78ef084fe 100644
>--- a/gcc/gencfn-macros.cc
>+++ b/gcc/gencfn-macros.cc
>@@ -156,10 +156,11 @@ const char *const internal_fn_int_names[] = {
>
> static const char *const flt_suffixes[] = { "F", "", "L", NULL };
> static const char *const fltfn_suffixes[] = { "F16", "F32", "F64", "F128",
>-                                              "F32X", "F64X", "F128X", NULL };
>+                                              "F32X", "F64X", "F128X","F16B",
>+                                              NULL };
> static const char *const fltall_suffixes[] = { "F", "", "L", "F16", "F32",
>                                                "F64", "F128", "F32X", "F64X",
>-                                               "F128X", NULL };
>+                                               "F128X", "F16B", NULL };
> static const char *const int_suffixes[] = { "", "L", "LL", "IMAX", NULL };
>
> static const char *const *const suffix_lists[] = {
>diff --git a/gcc/match.pd b/gcc/match.pd
>index c9c8478d286..ca01c6714d8 100644
>--- a/gcc/match.pd
>+++ b/gcc/match.pd
>@@ -8386,7 +8386,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> #if GIMPLE
> (match float16_value_p
>  @0
>- (if (TYPE_MAIN_VARIANT (TREE_TYPE (@0)) == float16_type_node)))
>+ (if ((TYPE_MAIN_VARIANT (TREE_TYPE (@0)) == float16_type_node) ||
>+      (TYPE_MAIN_VARIANT (TREE_TYPE (@0)) == bfloat16_type_node))))
> (for froms (BUILT_IN_TRUNCL BUILT_IN_TRUNC BUILT_IN_TRUNCF
>             BUILT_IN_FLOORL BUILT_IN_FLOOR BUILT_IN_FLOORF
>             BUILT_IN_CEILL BUILT_IN_CEIL BUILT_IN_CEILF
>@@ -8403,8 +8404,10 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>           IFN_NEARBYINT IFN_NEARBYINT IFN_NEARBYINT
>           IFN_RINT IFN_RINT IFN_RINT
>           IFN_SQRT IFN_SQRT IFN_SQRT)
>- /* (_Float16) round ((doube) x) -> __built_in_roundf16 (x), etc.,
>-    if x is a _Float16. */
>+ /* 1 (_Float16) round ((doube) x) -> __built_in_roundf16 (x), etc.,
>+      if x is a _Float16.
>+    2 (__bf16) round ((doube) x) -> __built_in_roundf16b (x), etc.,
>+      if x is a __bf16. */
>  (simplify
>   (convert (froms (convert float16_value_p@0)))
>   (if (optimize
>diff --git a/gcc/tree.h b/gcc/tree.h
>index 5dcbb2fb5dd..67fc2a2e614 100644
>--- a/gcc/tree.h
>+++ b/gcc/tree.h
>@@ -310,7 +310,7 @@ code_helper::is_builtin_fn () const
> #define CASE_FLT_FN(FN) case FN: case FN##F: case FN##L
> #define CASE_FLT_FN_FLOATN_NX(FN) \
>   case FN##F16: case FN##F32: case FN##F64: case FN##F128: \
>-  case FN##F32X: case FN##F64X: case FN##F128X
>+  case FN##F32X: case FN##F64X: case FN##F128X: case FN##F16B
> #define CASE_FLT_FN_REENT(FN) case FN##_R: case FN##F_R: case FN##L_R
> #define CASE_INT_FN(FN) case FN: case FN##L: case FN##LL: case FN##IMAX
>
>diff --git a/libgcc/Makefile.in b/libgcc/Makefile.in
>index 0e46e9ef768..b71fd5e2250 100644
>--- a/libgcc/Makefile.in
>+++ b/libgcc/Makefile.in
>@@ -450,9 +450,9 @@ lib2funcs = _muldi3 _negdi2 _lshrdi3 _ashldi3 _ashrdi3 _cmpdi2 _ucmpdi2 \
>             _negvsi2 _negvdi2 _ctors _ffssi2 _ffsdi2 _clz _clzsi2 _clzdi2 \
>             _ctzsi2 _ctzdi2 _popcount_tab _popcountsi2 _popcountdi2 \
>             _paritysi2 _paritydi2 _powisf2 _powidf2 _powixf2 _powitf2 \
>-            _mulhc3 _mulsc3 _muldc3 _mulxc3 _multc3 _divhc3 _divsc3 \
>-            _divdc3 _divxc3 _divtc3 _bswapsi2 _bswapdi2 _clrsbsi2 \
>-            _clrsbdi2 _mulbitint3
>+            _mulhc3 _mulbc3 _mulsc3 _muldc3 _mulxc3 _multc3 _divhc3 \
>+            _divbc3 _divsc3 _divdc3 _divxc3 _divtc3 _bswapsi2 _bswapdi2 \
>+            _clrsbsi2 _clrsbdi2 _mulbitint3
>
> # The floating-point conversion routines that involve a single-word integer.
> # XX stands for the integer mode.
>diff --git a/libgcc/libgcc2.c b/libgcc/libgcc2.c
>index 3fcb85c5b92..512ca92bfb9 100644
>--- a/libgcc/libgcc2.c
>+++ b/libgcc/libgcc2.c
>@@ -2591,6 +2591,7 @@ NAME (TYPE x, int m)
> #endif
>
>
> #if((defined(L_mulhc3) || defined(L_divhc3)) && LIBGCC2_HAS_HF_MODE) \
>+    || ((defined(L_mulbc3) || defined(L_divbc3)) && LIBGCC2_HAS_BF_MODE) \
>     || ((defined(L_mulsc3) || defined(L_divsc3)) && LIBGCC2_HAS_SF_MODE) \
>     || ((defined(L_muldc3) || defined(L_divdc3)) && LIBGCC2_HAS_DF_MODE) \
>     || ((defined(L_mulxc3) || defined(L_divxc3)) && LIBGCC2_HAS_XF_MODE) \
>@@ -2607,6 +2608,13 @@ NAME (TYPE x, int m)
> # define MODE hc
> # define CEXT __LIBGCC_HF_FUNC_EXT__
> # define NOTRUNC (!__LIBGCC_HF_EXCESS_PRECISION__)
>+#elif defined(L_mulbc3) || defined(L_divbc3)
>+# define MTYPE BFtype
>+# define CTYPE BCtype
>+# define AMTYPE SFtype
>+# define MODE bc
>+# define CEXT __LIBGCC_BF_FUNC_EXT__
>+# define NOTRUNC (!__LIBGCC_BF_EXCESS_PRECISION__)
> #elif defined(L_mulsc3) || defined(L_divsc3)
> # define MTYPE SFtype
> # define CTYPE SCtype
>@@ -2690,8 +2698,8 @@ extern void *compile_type_assert[sizeof(INFINITY) == sizeof(MTYPE) ? 1 : -1];
> # define TRUNC(x) __asm__ ("" : "=m"(x) : "m"(x))
> #endif
>
>-#if defined(L_mulhc3) || defined(L_mulsc3) || defined(L_muldc3) \
>-    || defined(L_mulxc3) || defined(L_multc3)
>+#if defined(L_mulhc3) || defined(L_mulbc3) || defined(L_mulsc3) \
>+    || defined(L_muldc3) || defined(L_mulxc3) || defined(L_multc3)
>
> CTYPE
> CONCAT3(__mul,MODE,3) (MTYPE a, MTYPE b, MTYPE c, MTYPE d)
>@@ -2760,16 +2768,16 @@ CONCAT3(__mul,MODE,3) (MTYPE a, MTYPE b, MTYPE c, MTYPE d)
> }
> #endif /* complex multiply */
>
>-#if defined(L_divhc3) || defined(L_divsc3) || defined(L_divdc3) \
>-    || defined(L_divxc3) || defined(L_divtc3)
>+#if defined(L_divhc3) || defined(L_divbc3) || defined(L_divsc3) \
>+    || defined(L_divdc3) || defined(L_divxc3) || defined(L_divtc3)
>
> CTYPE
> CONCAT3(__div,MODE,3) (MTYPE a, MTYPE b, MTYPE c, MTYPE d)
> {
>-#if defined(L_divhc3) \
>+#if (defined(L_divhc3) || defined(L_divbc3) ) \
>   || (defined(L_divsc3) && defined(__LIBGCC_HAVE_HWDBL__) )
>
>-  /* Half precision is handled with float precision.
>+  /* _Float16 and __bf16 are handled with float precision.
>      float is handled with double precision when double precision
>      hardware is available.
>      Due to the additional precision, the simple complex divide
>diff --git a/libgcc/libgcc2.h b/libgcc/libgcc2.h
>index b358b3a2b50..ee99badde86 100644
>--- a/libgcc/libgcc2.h
>+++ b/libgcc/libgcc2.h
>@@ -43,6 +43,12 @@ extern void __eprintf (const char *, const char *, unsigned int, const char *)
> #define LIBGCC2_HAS_HF_MODE 0
> #endif
>
>+#ifdef __LIBGCC_HAS_BF_MODE__
>+#define LIBGCC2_HAS_BF_MODE 1
>+#else
>+#define LIBGCC2_HAS_BF_MODE 0
>+#endif
>+
> #ifdef __LIBGCC_HAS_SF_MODE__
> #define LIBGCC2_HAS_SF_MODE 1
> #else
>@@ -146,6 +152,10 @@ typedef unsigned int UTItype __attribute__ ((mode (TI)));
> typedef float HFtype __attribute__ ((mode (HF)));
> typedef _Complex float HCtype __attribute__ ((mode (HC)));
> #endif
>+#if LIBGCC2_HAS_BF_MODE
>+typedef float BFtype __attribute__ ((mode (BF)));
>+typedef _Complex float BCtype __attribute__ ((mode (BC)));
>+#endif
> #if LIBGCC2_HAS_SF_MODE
> typedef float SFtype __attribute__ ((mode (SF)));
> typedef _Complex float SCtype __attribute__ ((mode (SC)));
>@@ -465,6 +475,10 @@ extern SItype __negvsi2 (SItype);
> extern HCtype __divhc3 (HFtype, HFtype, HFtype, HFtype);
> extern HCtype __mulhc3 (HFtype, HFtype, HFtype, HFtype);
> #endif
>+#if LIBGCC2_HAS_BF_MODE
>+extern BCtype __divbc3 (BFtype, BFtype, BFtype, BFtype);
>+extern BCtype __mulbc3 (BFtype, BFtype, BFtype, BFtype);
>+#endif
> #if LIBGCC2_HAS_SF_MODE
> extern DWtype __fixsfdi (SFtype);
> extern SFtype __floatdisf (DWtype);
>--
>2.43.0

Thanks
Xiao Zeng
On Wed, Aug 07, 2024 at 11:13:51AM +0800, Xiao Zeng wrote:
> gcc/ChangeLog:
>
>         * builtin-types.def (BT_COMPLEX_BFLOAT16): Support BF16 node.
>         (BT_BFLOAT16_PTR): Ditto.
>         (BT_FN_BFLOAT16): New.
>         (BT_FN_BFLOAT16_BFLOAT16): Ditto.
>         (BT_FN_COMPLEX_BFLOAT16_COMPLEX_BFLOAT16): Ditto.
>         (BT_FN_BFLOAT16_COMPLEX_BFLOAT16): Ditto.
>         (BT_FN_INT_BFLOAT16): Ditto.
>         (BT_FN_LONG_BFLOAT16): Ditto.
>         (BT_FN_LONGLONG_BFLOAT16): Ditto.
>         (BT_FN_BFLOAT16_BFLOAT16_BFLOAT16PTR): Ditto.
>         (BT_FN_BFLOAT16_BFLOAT16_INT): Ditto.
>         (BT_FN_BFLOAT16_BFLOAT16_INTPTR): Ditto.
>         (BT_FN_BFLOAT16_BFLOAT16_LONG): Ditto.
>         (BT_FN_COMPLEX_BFLOAT16_COMPLEX_BFLOAT16_COMPLEX_BFLOAT16): Ditto.
>         (BT_FN_BFLOAT16_BFLOAT16_BFLOAT16_BFLOAT16): Ditto.
>         (BT_FN_BFLOAT16_BFLOAT16_BFLOAT16_INTPTR): Ditto.
>         * builtins.cc (expand_builtin_classify_type): Support BF16.
>         (mathfn_built_in_2): Ditto.
>         (CASE_MATHFN_FLOATN): Ditto.
>         * builtins.def (DEF_GCC_FLOATN_NX_BUILTINS): Ditto.
>         (DEF_EXT_LIB_FLOATN_NX_BUILTINS): Ditto.
>         (BUILT_IN_NANSF16B): Added in general processing, redundant
>         is removed here.
>         (BUILT_IN_NEXTAFTERF16B): Ditto.
>         * fold-const-call.cc (fold_const_call): Ditto.
>         (fold_const_call_sss): Ditto.
>         * gencfn-macros.cc: Support BF16.
>         * match.pd: Like FP16, add optimization for BF16.
>         * tree.h (CASE_FLT_FN_FLOATN_NX): Support BF16.
>
> gcc/c-family/ChangeLog:
>
>         * c-cppbuiltin.cc (c_cpp_builtins): Modify suffix names to avoid
>         conflicts.
>
> libgcc/ChangeLog:
>
>         * Makefile.in: Add _mulbc3 and _divbc3.
>         * libgcc2.c (if): Ditto.
>         (defined): Ditto.
>         (MTYPE): Macros defined for BF16.
>         (CTYPE): Ditto.
>         (AMTYPE): Ditto.
>         (MODE): Ditto.
>         (CEXT): Ditto.
>         (NOTRUNC): Ditto.
>         * libgcc2.h (LIBGCC2_HAS_BF_MODE): Support BF16.
>         (__attribute__): Ditto.
>         (__divbc3): Add __divbc3 declaration.
>         (__mulbc3): Add __mulbc3 declaration.
>
> Signed-off-by: Xiao Zeng <zengxiao@eswincomputing.com>

This looks all wrong to me.

On all the other targets that already support the __bf16 type, it is a storage-only type, so all arithmetic on it is expected to be done in float, not in __bf16. Therefore, those targets really don't want any of those other builtins: there will be no libm support for them, and they don't want support in libgcc either; that is just wasted code.

The only builtins provided are intentionally the minimum required for proper C++23 support, __builtin_nansf16b and __builtin_nextafterf16b, because those need to be constexpr friendly and can't be dealt with by extending to float and using the float builtins.

So, if riscv wants something different (will there be e.g. any libm implementation with all the __bf16 APIs, though?), it should ask for it in some way (target hook or whatever), and only in that case should it enable the other builtins, libgcc APIs etc.

Jakub
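
Why nextafter cannot simply be computed on the widened float can be shown with a small sketch. One float ulp is 2^16 times smaller than one bf16 ulp at the same exponent, so taking a single float step and rounding back to nearest returns the starting value; the step has to be taken in the bf16 encoding itself. The code below is only an illustration under assumptions: the helper names `bf16_order_key` and `bf16_nextafter` and the use of raw `uint16_t` bit patterns are hypothetical, this is not GCC's implementation, and errno and floating-point exception handling are left out.

```c
#include <stdint.h>

/* Map a bfloat16 bit pattern (sign-magnitude) to an integer that orders
   the same way as the encoded value; +0 and -0 both map to 0.  */
static int32_t bf16_order_key(uint16_t h)
{
    int32_t mag = h & 0x7fff;
    return (h & 0x8000) ? -mag : mag;
}

/* Hypothetical helper: next representable bfloat16 after x in the
   direction of y, working on raw bit patterns.  */
static uint16_t bf16_nextafter(uint16_t x, uint16_t y)
{
    uint16_t ax = x & 0x7fff;
    uint16_t ay = y & 0x7fff;

    if (ax > 0x7f80) return x;            /* x is NaN */
    if (ay > 0x7f80) return y;            /* y is NaN */
    if (bf16_order_key(x) == bf16_order_key(y))
        return y;                         /* equal values, including +0 and -0 */
    if (ax == 0)                          /* leaving zero: smallest subnormal with y's sign */
        return (uint16_t)((y & 0x8000) | 0x0001);

    int toward_larger = bf16_order_key(x) < bf16_order_key(y);
    int x_negative = (x & 0x8000) != 0;

    /* The magnitude grows when moving away from zero and shrinks otherwise.  */
    return (toward_larger != x_negative) ? (uint16_t)(x + 1) : (uint16_t)(x - 1);
}
```

For example, starting from 1.0 (bit pattern 0x3f80) and stepping toward 2.0 yields 0x3f81, which is a step of 2^-7, whereas the corresponding float step is 2^-23.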
On 8/7/24 1:16 AM, Jakub Jelinek wrote:

>
> This looks all wrong to me.
>
> On all the other targets that already support the __bf16 type, it is a
> storage-only type, so all arithmetic on it is expected to be done in float,
> not in __bf16.
RISC-V has (via extensions) degrees of arithmetic/conversion support, so for example it can do a multiply-add of bf16 operands widening to float.

>
> So, if riscv wants something different (will there be e.g. any libm
> implementation with all the __bf16 APIs, though?), it should ask for it in
> some way (target hook or whatever), and only in that case should it enable
> the other builtins, libgcc APIs etc.
ISTM that for the limited cases where we want native bf16 support, we could just have target-specific builtins. I'm not sure what the motivation behind trying to support the richer set of operations really is, so perhaps Xiao could start by explaining why this is important.

jeff
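
The widening multiply-add Jeff mentions can be modelled in portable C to make its semantics concrete: both bf16 operands are widened to float, and the product is accumulated into a float. This is a sketch of the arithmetic only, not of any particular RISC-V instruction; the names `bf16_to_float`, `bf16_widening_madd` and `bf16_dot` are hypothetical, and a hardware instruction may round differently (for example with a single fused rounding).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Widen: a bfloat16 is the upper 16 bits of an IEEE binary32 value.  */
static float bf16_to_float(uint16_t h)
{
    uint32_t u = (uint32_t)h << 16;
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* acc + a * b with bf16 inputs and a float accumulator.  The product of
   two 8-bit significands needs at most 16 bits, so the multiply is exact
   in float unless it overflows or underflows; only the add rounds.  */
static float bf16_widening_madd(uint16_t a, uint16_t b, float acc)
{
    return bf16_to_float(a) * bf16_to_float(b) + acc;
}

/* Typical use: a dot product over bf16 data with a float accumulator.  */
static float bf16_dot(const uint16_t *x, const uint16_t *y, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc = bf16_widening_madd(x[i], y[i], acc);
    return acc;
}
```

Keeping the accumulator in float is what makes bf16 usable as an input format: the narrow type only affects storage and bandwidth, not the precision of the running sum.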
On Wed, Aug 07, 2024 at 08:46:11AM -0600, Jeff Law wrote:
>
>
> On 8/7/24 1:16 AM, Jakub Jelinek wrote:
>
> >
> > This looks all wrong to me.
> >
> > On all the other targets that already support the __bf16 type, it is a
> > storage-only type, so all arithmetic on it is expected to be done in float,
> > not in __bf16.
> RISC-V has (via extensions) degrees of arithmetic/conversion support, so for
> example it can do a multiply-add of bf16 operands widening to float.

Even the __builtin_*f16 _Float16 builtins are mostly unused (at least on other targets), but those functions are at least part of C23, even though they are not really implemented yet in libm (at least talking about glibc, but I doubt other C libraries are any further along).
For __bf16, the only functionality required by a standard is in C++23, and the provided builtins are whatever was necessary for that.

I understand that RISC-V has, via extensions, fuller _Float16 and __bf16 support, but if it needs further builtins, the questions are:
1) should they be enabled on all arches or just on those that need them?
2) is there a plan to add libm support for __bf16, even though it is non-standard in C (especially since we don't know whether C2y or newer will add support for it, and whether it will use the chosen suffixes or some others)?
3) is there a plan to add variants for C++23 <cmath> and <complex> etc. to handle _Float16 and __bf16 differently? Currently those types are just handled by doing as much as possible on float, using its builtins.

Jakub
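
The "do as much as possible on float" model described above can be sketched as widen, compute, narrow. The code is a minimal illustration under assumptions: the names `bf16_bits`, `bf16_to_float`, `float_to_bf16` and `bf16_mul` are hypothetical, and round-to-nearest-even narrowing is assumed for the conversion; a real compiler would use its own `__bf16` conversions.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical storage type for this sketch: the raw 16 bits of a bfloat16.  */
typedef uint16_t bf16_bits;

/* Widen: a bfloat16 is the upper 16 bits of an IEEE binary32 value.  */
static float bf16_to_float(bf16_bits h)
{
    uint32_t u = (uint32_t)h << 16;
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Narrow with round-to-nearest, ties-to-even (assumed rounding mode).  */
static bf16_bits float_to_bf16(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    if ((u & 0x7fffffffu) > 0x7f800000u)       /* NaN: keep sign, force a quiet NaN */
        return (bf16_bits)((u >> 16) | 0x0040u);
    u += 0x7fffu + ((u >> 16) & 1u);           /* add half an ulp, biased toward even */
    return (bf16_bits)(u >> 16);
}

/* Storage-only arithmetic: widen, compute in float, narrow the result.  */
static bf16_bits bf16_mul(bf16_bits a, bf16_bits b)
{
    return float_to_bf16(bf16_to_float(a) * bf16_to_float(b));
}
```

With this model no bf16-specific math routine is needed: any operation is expressed through the existing float routine plus the two conversions.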
On 8/7/24 8:55 AM, Jakub Jelinek wrote:
> On Wed, Aug 07, 2024 at 08:46:11AM -0600, Jeff Law wrote:
>>
>>
>> On 8/7/24 1:16 AM, Jakub Jelinek wrote:
>>
>>>
>>> This looks all wrong to me.
>>>
>>> On all the other targets that already support the __bf16 type, it is a
>>> storage-only type, so all arithmetic on it is expected to be done in float,
>>> not in __bf16.
>> RISC-V has (via extensions) degrees of arithmetic/conversion support, so for
>> example it can do a multiply-add of bf16 operands widening to float.
>
> Even the __builtin_*f16 _Float16 builtins are mostly unused (at least on
> other targets), but those functions are at least part of C23, even
> though they are not really implemented yet in libm (at least talking about
> glibc, but I doubt other C libraries are any further along).
> For __bf16, the only functionality required by a standard is in C++23, and
> the provided builtins are whatever was necessary for that.
>
> I understand that RISC-V has, via extensions, fuller _Float16 and __bf16
> support, but if it needs further builtins, the questions are:
> 1) should they be enabled on all arches or just on those that need them?
I'd tend to take a wait-and-see approach, meaning start with them as target builtins and promote them to generic builtins if we see other targets implementing a richer set of bf16 operations.

> 2) is there a plan to add libm support for __bf16, even though it is
> non-standard in C (especially since we don't know whether C2y or newer will
> add support for it, and whether it will use the chosen suffixes or some
> others)?
> 3) is there a plan to add variants for C++23 <cmath> and <complex> etc.
> to handle _Float16 and __bf16 differently? Currently those types are just
> handled by doing as much as possible on float, using its builtins.
I have no idea on either of these questions.

jeff
2024-08-07 23:15 Jeff Law <jeffreyalaw@gmail.com> wrote:
>
>
>
>On 8/7/24 8:55 AM, Jakub Jelinek wrote:
>> On Wed, Aug 07, 2024 at 08:46:11AM -0600, Jeff Law wrote:
>>>
>>>
>>> On 8/7/24 1:16 AM, Jakub Jelinek wrote:
>>>
>>>>
>>>> This looks all wrong to me.
>>>>
>>>> On all the other targets that already support the __bf16 type, it is a
>>>> storage-only type, so all arithmetic on it is expected to be done in float,
>>>> not in __bf16.
>>> RISC-V has (via extensions) degrees of arithmetic/conversion support, so for
>>> example it can do a multiply-add of bf16 operands widening to float.
>>
>> Even the __builtin_*f16 _Float16 builtins are mostly unused (at least on
>> other targets), but those functions are at least part of C23, even
>> though they are not really implemented yet in libm (at least talking about
>> glibc, but I doubt other C libraries are any further along).
>> For __bf16, the only functionality required by a standard is in C++23, and
>> the provided builtins are whatever was necessary for that.
>>
>> I understand that RISC-V has, via extensions, fuller _Float16 and __bf16
>> support, but if it needs further builtins, the questions are:
>> 1) should they be enabled on all arches or just on those that need them?
>I'd tend to take a wait-and-see approach, meaning start with them as
>target builtins and promote them to generic builtins if we see other
>targets implementing a richer set of bf16 operations.
>
>> 2) is there a plan to add libm support for __bf16, even though it is
>> non-standard in C (especially since we don't know whether C2y or newer will
>> add support for it, and whether it will use the chosen suffixes or some
>> others)?
>> 3) is there a plan to add variants for C++23 <cmath> and <complex> etc.
>> to handle _Float16 and __bf16 differently? Currently those types are just
>> handled by doing as much as possible on float, using its builtins.
>I have no idea on either of these questions.
>
>jeff

Thank you very much, Jakub Jelinek and Jeff, for the in-depth discussion.

My knowledge is narrow, and I am not familiar with architectures other than RISC-V. My understanding of libraries such as libc and libm is also shallow.

It took me some time to sort out my thoughts, which delayed my reply. I am very sorry.

1 BF16 is a 16-bit floating-point data type that differs from FP16 only in its encoding (8 exponent bits and 7 fraction bits instead of 5 and 10) and is otherwise handled the same way.

2 BF16 can be used by any architecture, just like FP16.

3 libgcc provides interface functions for floating-point types, such as __mulsc3/__divsc3.

4 Here is a test case:
----------------------------------------------------------------------------------------
typedef _Complex float __cbf16 __attribute__((__mode__(__BC__)));
__cbf16 cbf16;
__cbf16 cbf16_1;
__cbf16 cbf16_2;

/* Complex multiply; lowered to a call to __mulbc3 or __mulsc3.  */
__cbf16 cbf16_mul_cbf16()
{
  cbf16 = cbf16_1 * cbf16_2;
  return cbf16;
}

/* Complex divide; lowered to a call to __divbc3 or __divsc3.  */
__cbf16 cbf16_div_cbf16()
{
  cbf16 = cbf16_1 / cbf16_2;
  return cbf16;
}
----------------------------------------------------------------------------------------

4.1 On RISC-V with -march=rv64imafdcv_zvfh -mabi=lp64d -O2, the generated assembly includes:
----------------------------------------------------------------------------------------
call __mulbc3
call __divbc3
----------------------------------------------------------------------------------------
Because libgcc does not provide the __mulbc3/__divbc3 interfaces, this leads to link errors.

4.2 On RISC-V with -march=rv64imafdcv -mabi=lp64d -O2, the generated assembly includes:
----------------------------------------------------------------------------------------
call __mulsc3
call __divsc3
----------------------------------------------------------------------------------------
Because libgcc provides the __mulsc3/__divsc3 interfaces, this links normally.
4.3 x86_64 architecture, the results obtained after testing are the same as the Riscv architecture in 4.2, that is: ---------------------------------------------------------------------------------------- a) bf16 -> fp32 b) calls the corresponding complex interfaces __mulsc3/__divsc3 ---------------------------------------------------------------------------------------- At the beginning, I had planned to only add the __mulbc3/__divbc3 interface in libgcc. After exploration, it was found that libgcc already has a complete infrastructure, and adding only the __mulbc3/__divbc3 interfaces would cause a lot of trouble. In this context, it was decided to add a new data type BF16 to the infrastructure of libgcc, similar to FP16. Perhaps I can get some suggestions to complete the addition of __mulbc3/__divbc3 and eliminate errors when linking. Thanks Xiao Zeng
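For reference on point 1 above: bfloat16 uses 1 sign bit, 8 exponent bits and 7 fraction bits, so its encoding is the upper half of an IEEE binary32 float, while IEEE binary16 (FP16) uses 1 sign bit, 5 exponent bits and 10 fraction bits. A minimal sketch of the bf16 <-> float conversion follows. It is written in portable C with plain integers rather than the __bf16 type, and the helper names are hypothetical, not libgcc interfaces.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper names; these are not libgcc or GCC interfaces. */

/* Convert a float to bf16 bits: keep the upper 16 bits of the binary32
   encoding, rounding to nearest with ties to even. */
static uint16_t float_to_bf16_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    if ((u & 0x7fffffffu) > 0x7f800000u)      /* NaN: force a quiet NaN */
        return (uint16_t)((u >> 16) | 0x0040u);
    u += 0x7fffu + ((u >> 16) & 1u);          /* round to nearest, ties to even */
    return (uint16_t)(u >> 16);
}

/* Convert bf16 bits back to float: the bits become the upper half of the
   binary32 encoding, the lower half is zero.  This direction is exact. */
static float bf16_bits_to_float(uint16_t h)
{
    uint32_t u = (uint32_t)h << 16;
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}
```

With these helpers, 1.0f maps to the bit pattern 0x3F80, and 3.14159f rounds to 0x4049, which converts back to 3.140625f.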
On Tue, Aug 13, 2024 at 11:14:47AM +0800, Xiao Zeng wrote:
> Thank you very much for the in-depth discussion between Jakub Jelinek and jeff.
> My knowledge is narrow, and I am not familiar with architectures other than RISCV.
> At the same time, my understanding of libraries such as libc and libm is also shallow.
>
> I spent some time sorting out my thoughts, which resulted in slow email replies. I am very sorry.

The important thing is that the current state of BF16 support on other
architectures is what we want there, not more.  So any changes done for
RISCV shouldn't affect the other architectures, that wasn't the case of
the patch you've posted.
E.g. on x86_64, for FP16 we have:
__divhc3@@GCC_12.0.0
__eqhf2@@GCC_12.0.0
__extendhfdf2@@GCC_12.0.0
__extendhfsf2@@GCC_12.0.0
__extendhftf2@@GCC_12.0.0
__extendhfxf2@@GCC_12.0.0
__fixhfti@@GCC_12.0.0
__fixunshfti@@GCC_12.0.0
__floatbitinthf@@GCC_14.0.0
__floattihf@@GCC_12.0.0
__floatuntihf@@GCC_12.0.0
__mulhc3@@GCC_12.0.0
__nehf2@@GCC_12.0.0
__truncdfhf2@@GCC_12.0.0
__trunchfbf2@@GCC_13.0.0
__truncsfhf2@@GCC_12.0.0
__trunctfhf2@@GCC_12.0.0
__truncxfhf2@@GCC_12.0.0
exported from libgcc, while for BF16 just:
__extendbfsf2@@GCC_13.0.0
__floatbitintbf@@GCC_14.0.0
__floattibf@@GCC_13.0.0
__floatuntibf@@GCC_13.0.0
__truncdfbf2@@GCC_13.0.0
__trunchfbf2@@GCC_13.0.0
__truncsfbf2@@GCC_13.0.0
__trunctfbf2@@GCC_13.0.0
__truncxfbf2@@GCC_13.0.0
More attention has been paid to what we actually need there, which is
primarily conversions to/from other types (but even not to all of them), with
some changes on the RTL expression lowering side to make sure we use the
SFmode arithmetics as much as possible and only have the really required
stuff on the libgcc side.
We don't want to change that, if you really need __mulbc3/__divbc3 on RISCV,
then it should be added for that arch only.  And similarly, the choice
of the builtins on the compiler side, the two builtins we have right now is
all we want on the other arches.  So, further builtins would be either a
matter of RISCV specific builtins, or in generic code but guarded by some
target hook so that they aren't enabled on arches which don't want them.
On the libstdc++ side, the current headers provide for std::bfloat16_t and
std::float16_t an implementation which uses SFmode calculations where
possible, so stuff like:
  constexpr _Float16
  acos(_Float16 __x)
  { return _Float16(__builtin_acosf(__x)); }
or
  constexpr __gnu_cxx::__bfloat16_t
  acos(__gnu_cxx::__bfloat16_t __x)
  { return __gnu_cxx::__bfloat16_t(__builtin_acosf(__x)); }
And for printing, note there is
_ZSt20__to_chars_float16_tPcS_fSt12chars_format@@GLIBCXX_3.4.31
_ZSt21__to_chars_bfloat16_tPcS_fSt12chars_format@@GLIBCXX_3.4.31
_ZSt22__from_chars_float16_tPKcS0_RfSt12chars_format@@GLIBCXX_3.4.31
_ZSt23__from_chars_bfloat16_tPKcS0_RfSt12chars_format@@GLIBCXX_3.4.31
which input and output _Float16 and __bf16, but in the parameter passing
they expect those types to be promoted to float, so that the ABIs aren't
dependent on when a particular arch enables those types.

For RISCV, the things to consider are, what is the _Float16 and __bf16
function argument passing/returning ABI?  Is the type enabled on all
variants of RISCV, or just some (e.g. regarding _Float16 and __bf16
on i686-linux, there is support for it only if the SSE2 ISA is available,
so e.g. the *[hb][fc]* functions in libgcc need to be compiled with
-msse2 extra flag)?  If it can be passed/returned the same in all ABIs,
what excess precision mode do you want to use on them?  I mean e.g. the
TARGET_C_EXCESS_PRECISION target hook.  On e.g. x86_64, the default
is to promote all _Float16 and __bf16 calculations to float, so if you have
__bf16 a, b, c, d, e;
...
a = b * c + d - e + c * d;
all variables are converted to SFmode temporaries and all the arithmetics
is done in SFmode and only then at the end finally converted to HFmode
or BFmode.  One can request a different mode, -fexcess-precision=16
in which such promotion isn't done, but as there is no hw support for
most of the operations, the actual multiplication, addition or subtraction
is still done in SFmode, just there is a conversion to BFmode after each
operation (so slower, but more precise).
If you still want to export __divbc3 and __mulbc3, do you want to export
those just on some RISCV ABI variants or all of them?  Depending on that,
arrange for those to be compiled just for those; and, if it is exported
from libgcc_s.so.1, you also need to add a symbol version for those, likely
GCC_15.0.0.

For enabling just those 2 functions, I don't think you need any changes on
the builtins.def etc. side, those aren't builtins but libcalls.

If you need other libgcc calls, similar questions to above apply, but please
don't add them just because you can, but only if you really need them (they
can't be handled in hw instructions and promotion to SFmode and conversion
afterwards is undesirable and you actually have code that proves it emits
those calls).  Again, they should be only enabled on arches which ask for it
(and/or sub-ABIs) and they need the symbol version stuff resolved.

	Jakub
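The two evaluation strategies described above (promote everything to float and convert once at the end, versus convert back to BFmode after every operation) can be sketched in portable C. This is only an emulation under stated assumptions: bf16 values are carried in float variables, round_to_bf16 is a hypothetical helper that rounds to the nearest bf16 value, and nothing here uses the real __bf16 type or the -fexcess-precision option.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: round a float to the nearest bf16-representable
   value (ties to even) and return it as a float.  NaN handling is
   omitted for brevity. */
static float round_to_bf16(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    u += 0x7fffu + ((u >> 16) & 1u);
    u &= 0xffff0000u;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Promotion to float: all arithmetic in float, one conversion at the end. */
static float eval_promoted(float b, float c, float d, float e)
{
    return round_to_bf16(b * c + d - e + c * d);
}

/* Conversion after each operation: the arithmetic itself is still done
   in float, but every intermediate result is rounded to bf16. */
static float eval_per_op(float b, float c, float d, float e)
{
    float t = round_to_bf16(b * c);
    t = round_to_bf16(t + d);
    t = round_to_bf16(t - e);
    return round_to_bf16(t + round_to_bf16(c * d));
}
```

The two strategies can give different results. For b = c = 1.0078125, d = 0 and e = 1.015625, the promoted evaluation returns 2^-14 (6.103515625e-05), while the per-operation evaluation returns 0, because the low bits of b * c are rounded away before the subtraction.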
2024-08-13 15:53 Jakub Jelinek <jakub@redhat.com> wrote:
>On Tue, Aug 13, 2024 at 11:14:47AM +0800, Xiao Zeng wrote:
>> Thank you very much for the in-depth discussion between Jakub Jelinek and jeff.
>> My knowledge is narrow, and I am not familiar with architectures other than RISCV.
>> At the same time, my understanding of libraries such as libc and libm is also shallow.
>>
>> I spent some time sorting out my thoughts, which resulted in slow email replies. I am very sorry.
>
>The important thing is that the current state of BF16 support on other
>architectures is what we want there, not more.  So any changes done for
>RISCV shouldn't affect the other architectures, that wasn't the case of
>the patch you've posted.
>E.g. on x86_64, for FP16 we have:
>__divhc3@@GCC_12.0.0
>__eqhf2@@GCC_12.0.0
>__extendhfdf2@@GCC_12.0.0
>__extendhfsf2@@GCC_12.0.0
>__extendhftf2@@GCC_12.0.0
>__extendhfxf2@@GCC_12.0.0
>__fixhfti@@GCC_12.0.0
>__fixunshfti@@GCC_12.0.0
>__floatbitinthf@@GCC_14.0.0
>__floattihf@@GCC_12.0.0
>__floatuntihf@@GCC_12.0.0
>__mulhc3@@GCC_12.0.0
>__nehf2@@GCC_12.0.0
>__truncdfhf2@@GCC_12.0.0
>__trunchfbf2@@GCC_13.0.0
>__truncsfhf2@@GCC_12.0.0
>__trunctfhf2@@GCC_12.0.0
>__truncxfhf2@@GCC_12.0.0
>exported from libgcc, while for BF16 just:
>__extendbfsf2@@GCC_13.0.0
>__floatbitintbf@@GCC_14.0.0
>__floattibf@@GCC_13.0.0
>__floatuntibf@@GCC_13.0.0
>__truncdfbf2@@GCC_13.0.0
>__trunchfbf2@@GCC_13.0.0
>__truncsfbf2@@GCC_13.0.0
>__trunctfbf2@@GCC_13.0.0
>__truncxfbf2@@GCC_13.0.0
>More attention has been paid to what we actually need there, which is
>primarily conversions to/from other types (but even not to all of them), with
>some changes on the RTL expression lowering side to make sure we use the
>SFmode arithmetics as much as possible and only have the really required
>stuff on the libgcc side.
>We don't want to change that, if you really need __mulbc3/__divbc3 on RISCV,
>then it should be added for that arch only.  And similarly, the choice
>of the builtins on the compiler side, the two builtins we have right now is
>all we want on the other arches.  So, further builtins would be either a
>matter of RISCV specific builtins, or in generic code but guarded by some
>target hook so that they aren't enabled on arches which don't want them.
>On the libstdc++ side, the current headers provide for std::bfloat16_t and
>std::float16_t an implementation which uses SFmode calculations where
>possible, so stuff like:
>  constexpr _Float16
>  acos(_Float16 __x)
>  { return _Float16(__builtin_acosf(__x)); }
>or
>  constexpr __gnu_cxx::__bfloat16_t
>  acos(__gnu_cxx::__bfloat16_t __x)
>  { return __gnu_cxx::__bfloat16_t(__builtin_acosf(__x)); }
>And for printing, note there is
>_ZSt20__to_chars_float16_tPcS_fSt12chars_format@@GLIBCXX_3.4.31
>_ZSt21__to_chars_bfloat16_tPcS_fSt12chars_format@@GLIBCXX_3.4.31
>_ZSt22__from_chars_float16_tPKcS0_RfSt12chars_format@@GLIBCXX_3.4.31
>_ZSt23__from_chars_bfloat16_tPKcS0_RfSt12chars_format@@GLIBCXX_3.4.31
>which input and output _Float16 and __bf16, but in the parameter passing
>they expect those types to be promoted to float, so that the ABIs aren't
>dependent on when a particular arch enables those types.
>
>For RISCV, the things to consider are, what is the _Float16 and __bf16
>function argument passing/returning ABI?  Is the type enabled on all
>variants of RISCV, or just some (e.g. regarding _Float16 and __bf16
>on i686-linux, there is support for it only if the SSE2 ISA is available,
>so e.g. the *[hb][fc]* functions in libgcc need to be compiled with
>-msse2 extra flag)?  If it can be passed/returned the same in all ABIs,
>what excess precision mode do you want to use on them?  I mean e.g. the
>TARGET_C_EXCESS_PRECISION target hook.  On e.g. x86_64, the default
>is to promote all _Float16 and __bf16 calculations to float, so if you have
>__bf16 a, b, c, d, e;
>...
>a = b * c + d - e + c * d;
>all variables are converted to SFmode temporaries and all the arithmetics
>is done in SFmode and only then at the end finally converted to HFmode
>or BFmode.  One can request a different mode, -fexcess-precision=16
>in which such promotion isn't done, but as there is no hw support for
>most of the operations, the actual multiplication, addition or subtraction
>is still done in SFmode, just there is a conversion to BFmode after each
>operation (so slower, but more precise).
>If you still want to export __divbc3 and __mulbc3, do you want to export
>those just on some RISCV ABI variants or all of them?  Depending on that,
>arrange for those to be compiled just for those; and, if it is exported
>from libgcc_s.so.1, you also need to add a symbol version for those, likely
>GCC_15.0.0.
>
>For enabling just those 2 functions, I don't think you need any changes on
>the builtins.def etc. side, those aren't builtins but libcalls.
>
>If you need other libgcc calls, similar questions to above apply, but please
>don't add them just because you can, but only if you really need them (they
>can't be handled in hw instructions and promotion to SFmode and conversion
>afterwards is undesirable and you actually have code that proves it emits
>those calls).  Again, they should be only enabled on arches which ask for it
>(and/or sub-ABIs) and they need the symbol version stuff resolved.
>
>	Jakub

Thanks to Jakub for the detailed analysis of this issue.

It raised points that I had not considered before, such as symbol versions, the
impact on all architectures, RISC-V architecture variants, and so on.

Your analysis has expanded my knowledge, and I will seek better solutions to this
problem in my free time.

Thank you again, Jakub.

Thanks
Xiao Zeng
diff --git a/gcc/builtin-types.def b/gcc/builtin-types.def index c97d6bad1de..6980873f2f1 100644 --- a/gcc/builtin-types.def +++ b/gcc/builtin-types.def @@ -109,6 +109,10 @@ DEF_PRIMITIVE_TYPE (BT_FLOAT128X, (float128x_type_node DEF_PRIMITIVE_TYPE (BT_COMPLEX_FLOAT, complex_float_type_node) DEF_PRIMITIVE_TYPE (BT_COMPLEX_DOUBLE, complex_double_type_node) DEF_PRIMITIVE_TYPE (BT_COMPLEX_LONGDOUBLE, complex_long_double_type_node) +DEF_PRIMITIVE_TYPE (BT_COMPLEX_BFLOAT16, (bfloat16_type_node + ? build_complex_type + (bfloat16_type_node) + : error_mark_node)) DEF_PRIMITIVE_TYPE (BT_COMPLEX_FLOAT16, (float16_type_node ? build_complex_type (float16_type_node) @@ -163,6 +167,9 @@ DEF_PRIMITIVE_TYPE (BT_CONST_DOUBLE_PTR, (build_qualified_type (double_type_node, TYPE_QUAL_CONST))) DEF_PRIMITIVE_TYPE (BT_LONGDOUBLE_PTR, long_double_ptr_type_node) +DEF_PRIMITIVE_TYPE (BT_BFLOAT16_PTR, (bfloat16_type_node + ? build_pointer_type (bfloat16_type_node) + : error_mark_node)) DEF_PRIMITIVE_TYPE (BT_FLOAT16_PTR, (float16_type_node ? build_pointer_type (float16_type_node) : error_mark_node)) @@ -239,6 +246,7 @@ DEF_FUNCTION_TYPE_0 (BT_FN_DOUBLE, BT_DOUBLE) distinguish it from two types in sequence, "long" followed by "double". 
*/ DEF_FUNCTION_TYPE_0 (BT_FN_LONGDOUBLE, BT_LONGDOUBLE) +DEF_FUNCTION_TYPE_0 (BT_FN_BFLOAT16, BT_BFLOAT16) DEF_FUNCTION_TYPE_0 (BT_FN_FLOAT16, BT_FLOAT16) DEF_FUNCTION_TYPE_0 (BT_FN_FLOAT32, BT_FLOAT32) DEF_FUNCTION_TYPE_0 (BT_FN_FLOAT64, BT_FLOAT64) @@ -257,6 +265,7 @@ DEF_FUNCTION_TYPE_1 (BT_FN_FLOAT_FLOAT, BT_FLOAT, BT_FLOAT) DEF_FUNCTION_TYPE_1 (BT_FN_DOUBLE_DOUBLE, BT_DOUBLE, BT_DOUBLE) DEF_FUNCTION_TYPE_1 (BT_FN_LONGDOUBLE_LONGDOUBLE, BT_LONGDOUBLE, BT_LONGDOUBLE) +DEF_FUNCTION_TYPE_1 (BT_FN_BFLOAT16_BFLOAT16, BT_BFLOAT16, BT_BFLOAT16) DEF_FUNCTION_TYPE_1 (BT_FN_FLOAT16_FLOAT16, BT_FLOAT16, BT_FLOAT16) DEF_FUNCTION_TYPE_1 (BT_FN_FLOAT32_FLOAT32, BT_FLOAT32, BT_FLOAT32) DEF_FUNCTION_TYPE_1 (BT_FN_FLOAT64_FLOAT64, BT_FLOAT64, BT_FLOAT64) @@ -270,6 +279,8 @@ DEF_FUNCTION_TYPE_1 (BT_FN_COMPLEX_DOUBLE_COMPLEX_DOUBLE, BT_COMPLEX_DOUBLE, BT_COMPLEX_DOUBLE) DEF_FUNCTION_TYPE_1 (BT_FN_COMPLEX_LONGDOUBLE_COMPLEX_LONGDOUBLE, BT_COMPLEX_LONGDOUBLE, BT_COMPLEX_LONGDOUBLE) +DEF_FUNCTION_TYPE_1 (BT_FN_COMPLEX_BFLOAT16_COMPLEX_BFLOAT16, + BT_COMPLEX_BFLOAT16, BT_COMPLEX_BFLOAT16) DEF_FUNCTION_TYPE_1 (BT_FN_COMPLEX_FLOAT16_COMPLEX_FLOAT16, BT_COMPLEX_FLOAT16, BT_COMPLEX_FLOAT16) DEF_FUNCTION_TYPE_1 (BT_FN_COMPLEX_FLOAT32_COMPLEX_FLOAT32, @@ -290,6 +301,8 @@ DEF_FUNCTION_TYPE_1 (BT_FN_DOUBLE_COMPLEX_DOUBLE, BT_DOUBLE, BT_COMPLEX_DOUBLE) DEF_FUNCTION_TYPE_1 (BT_FN_LONGDOUBLE_COMPLEX_LONGDOUBLE, BT_LONGDOUBLE, BT_COMPLEX_LONGDOUBLE) +DEF_FUNCTION_TYPE_1 (BT_FN_BFLOAT16_COMPLEX_BFLOAT16, + BT_BFLOAT16, BT_COMPLEX_BFLOAT16) DEF_FUNCTION_TYPE_1 (BT_FN_FLOAT16_COMPLEX_FLOAT16, BT_FLOAT16, BT_COMPLEX_FLOAT16) DEF_FUNCTION_TYPE_1 (BT_FN_FLOAT32_COMPLEX_FLOAT32, @@ -324,6 +337,7 @@ DEF_FUNCTION_TYPE_1 (BT_FN_INT_PTR, BT_INT, BT_PTR) DEF_FUNCTION_TYPE_1 (BT_FN_INT_FLOAT, BT_INT, BT_FLOAT) DEF_FUNCTION_TYPE_1 (BT_FN_INT_DOUBLE, BT_INT, BT_DOUBLE) DEF_FUNCTION_TYPE_1 (BT_FN_INT_LONGDOUBLE, BT_INT, BT_LONGDOUBLE) +DEF_FUNCTION_TYPE_1 (BT_FN_INT_BFLOAT16, BT_INT, BT_BFLOAT16) 
DEF_FUNCTION_TYPE_1 (BT_FN_INT_FLOAT16, BT_INT, BT_FLOAT16) DEF_FUNCTION_TYPE_1 (BT_FN_INT_FLOAT32, BT_INT, BT_FLOAT32) DEF_FUNCTION_TYPE_1 (BT_FN_INT_FLOAT64, BT_INT, BT_FLOAT64) @@ -337,6 +351,7 @@ DEF_FUNCTION_TYPE_1 (BT_FN_INT_DFLOAT128, BT_INT, BT_DFLOAT128) DEF_FUNCTION_TYPE_1 (BT_FN_LONG_FLOAT, BT_LONG, BT_FLOAT) DEF_FUNCTION_TYPE_1 (BT_FN_LONG_DOUBLE, BT_LONG, BT_DOUBLE) DEF_FUNCTION_TYPE_1 (BT_FN_LONG_LONGDOUBLE, BT_LONG, BT_LONGDOUBLE) +DEF_FUNCTION_TYPE_1 (BT_FN_LONG_BFLOAT16, BT_LONG, BT_BFLOAT16) DEF_FUNCTION_TYPE_1 (BT_FN_LONG_FLOAT16, BT_LONG, BT_FLOAT16) DEF_FUNCTION_TYPE_1 (BT_FN_LONG_FLOAT32, BT_LONG, BT_FLOAT32) DEF_FUNCTION_TYPE_1 (BT_FN_LONG_FLOAT64, BT_LONG, BT_FLOAT64) @@ -347,6 +362,7 @@ DEF_FUNCTION_TYPE_1 (BT_FN_LONG_FLOAT128X, BT_LONG, BT_FLOAT128X) DEF_FUNCTION_TYPE_1 (BT_FN_LONGLONG_FLOAT, BT_LONGLONG, BT_FLOAT) DEF_FUNCTION_TYPE_1 (BT_FN_LONGLONG_DOUBLE, BT_LONGLONG, BT_DOUBLE) DEF_FUNCTION_TYPE_1 (BT_FN_LONGLONG_LONGDOUBLE, BT_LONGLONG, BT_LONGDOUBLE) +DEF_FUNCTION_TYPE_1 (BT_FN_LONGLONG_BFLOAT16, BT_LONGLONG, BT_BFLOAT16) DEF_FUNCTION_TYPE_1 (BT_FN_LONGLONG_FLOAT16, BT_LONGLONG, BT_FLOAT16) DEF_FUNCTION_TYPE_1 (BT_FN_LONGLONG_FLOAT32, BT_LONGLONG, BT_FLOAT32) DEF_FUNCTION_TYPE_1 (BT_FN_LONGLONG_FLOAT64, BT_LONGLONG, BT_FLOAT64) @@ -525,6 +541,8 @@ DEF_FUNCTION_TYPE_2 (BT_FN_DOUBLE_DOUBLE_DOUBLEPTR, BT_DOUBLE, BT_DOUBLE, BT_DOUBLE_PTR) DEF_FUNCTION_TYPE_2 (BT_FN_LONGDOUBLE_LONGDOUBLE_LONGDOUBLEPTR, BT_LONGDOUBLE, BT_LONGDOUBLE, BT_LONGDOUBLE_PTR) +DEF_FUNCTION_TYPE_2 (BT_FN_BFLOAT16_BFLOAT16_BFLOAT16PTR, + BT_BFLOAT16, BT_BFLOAT16, BT_BFLOAT16_PTR) DEF_FUNCTION_TYPE_2 (BT_FN_FLOAT16_FLOAT16_FLOAT16PTR, BT_FLOAT16, BT_FLOAT16, BT_FLOAT16_PTR) DEF_FUNCTION_TYPE_2 (BT_FN_FLOAT32_FLOAT32_FLOAT32PTR, @@ -549,6 +567,8 @@ DEF_FUNCTION_TYPE_2 (BT_FN_DOUBLE_DOUBLE_INT, BT_DOUBLE, BT_DOUBLE, BT_INT) DEF_FUNCTION_TYPE_2 (BT_FN_LONGDOUBLE_LONGDOUBLE_INT, BT_LONGDOUBLE, BT_LONGDOUBLE, BT_INT) +DEF_FUNCTION_TYPE_2 (BT_FN_BFLOAT16_BFLOAT16_INT, + 
BT_BFLOAT16, BT_BFLOAT16, BT_INT) DEF_FUNCTION_TYPE_2 (BT_FN_FLOAT16_FLOAT16_INT, BT_FLOAT16, BT_FLOAT16, BT_INT) DEF_FUNCTION_TYPE_2 (BT_FN_FLOAT32_FLOAT32_INT, @@ -569,6 +589,8 @@ DEF_FUNCTION_TYPE_2 (BT_FN_DOUBLE_DOUBLE_INTPTR, BT_DOUBLE, BT_DOUBLE, BT_INT_PTR) DEF_FUNCTION_TYPE_2 (BT_FN_LONGDOUBLE_LONGDOUBLE_INTPTR, BT_LONGDOUBLE, BT_LONGDOUBLE, BT_INT_PTR) +DEF_FUNCTION_TYPE_2 (BT_FN_BFLOAT16_BFLOAT16_INTPTR, + BT_BFLOAT16, BT_BFLOAT16, BT_INT_PTR) DEF_FUNCTION_TYPE_2 (BT_FN_FLOAT16_FLOAT16_INTPTR, BT_FLOAT16, BT_FLOAT16, BT_INT_PTR) DEF_FUNCTION_TYPE_2 (BT_FN_FLOAT32_FLOAT32_INTPTR, @@ -595,6 +617,8 @@ DEF_FUNCTION_TYPE_2 (BT_FN_DOUBLE_DOUBLE_LONG, BT_DOUBLE, BT_DOUBLE, BT_LONG) DEF_FUNCTION_TYPE_2 (BT_FN_LONGDOUBLE_LONGDOUBLE_LONG, BT_LONGDOUBLE, BT_LONGDOUBLE, BT_LONG) +DEF_FUNCTION_TYPE_2 (BT_FN_BFLOAT16_BFLOAT16_LONG, + BT_BFLOAT16, BT_BFLOAT16, BT_LONG) DEF_FUNCTION_TYPE_2 (BT_FN_FLOAT16_FLOAT16_LONG, BT_FLOAT16, BT_FLOAT16, BT_LONG) DEF_FUNCTION_TYPE_2 (BT_FN_FLOAT32_FLOAT32_LONG, @@ -621,6 +645,8 @@ DEF_FUNCTION_TYPE_2 (BT_FN_COMPLEX_DOUBLE_COMPLEX_DOUBLE_COMPLEX_DOUBLE, BT_COMPLEX_DOUBLE, BT_COMPLEX_DOUBLE, BT_COMPLEX_DOUBLE) DEF_FUNCTION_TYPE_2 (BT_FN_COMPLEX_LONGDOUBLE_COMPLEX_LONGDOUBLE_COMPLEX_LONGDOUBLE, BT_COMPLEX_LONGDOUBLE, BT_COMPLEX_LONGDOUBLE, BT_COMPLEX_LONGDOUBLE) +DEF_FUNCTION_TYPE_2 (BT_FN_COMPLEX_BFLOAT16_COMPLEX_BFLOAT16_COMPLEX_BFLOAT16, + BT_COMPLEX_BFLOAT16, BT_COMPLEX_BFLOAT16, BT_COMPLEX_BFLOAT16) DEF_FUNCTION_TYPE_2 (BT_FN_COMPLEX_FLOAT16_COMPLEX_FLOAT16_COMPLEX_FLOAT16, BT_COMPLEX_FLOAT16, BT_COMPLEX_FLOAT16, BT_COMPLEX_FLOAT16) DEF_FUNCTION_TYPE_2 (BT_FN_COMPLEX_FLOAT32_COMPLEX_FLOAT32_COMPLEX_FLOAT32, @@ -728,6 +754,8 @@ DEF_FUNCTION_TYPE_3 (BT_FN_DOUBLE_DOUBLE_DOUBLE_DOUBLE, BT_DOUBLE, BT_DOUBLE, BT_DOUBLE, BT_DOUBLE) DEF_FUNCTION_TYPE_3 (BT_FN_LONGDOUBLE_LONGDOUBLE_LONGDOUBLE_LONGDOUBLE, BT_LONGDOUBLE, BT_LONGDOUBLE, BT_LONGDOUBLE, BT_LONGDOUBLE) +DEF_FUNCTION_TYPE_3 (BT_FN_BFLOAT16_BFLOAT16_BFLOAT16_BFLOAT16, + 
BT_BFLOAT16, BT_BFLOAT16, BT_BFLOAT16, BT_BFLOAT16) DEF_FUNCTION_TYPE_3 (BT_FN_FLOAT16_FLOAT16_FLOAT16_FLOAT16, BT_FLOAT16, BT_FLOAT16, BT_FLOAT16, BT_FLOAT16) DEF_FUNCTION_TYPE_3 (BT_FN_FLOAT32_FLOAT32_FLOAT32_FLOAT32, @@ -748,6 +776,8 @@ DEF_FUNCTION_TYPE_3 (BT_FN_DOUBLE_DOUBLE_DOUBLE_INTPTR, BT_DOUBLE, BT_DOUBLE, BT_DOUBLE, BT_INT_PTR) DEF_FUNCTION_TYPE_3 (BT_FN_LONGDOUBLE_LONGDOUBLE_LONGDOUBLE_INTPTR, BT_LONGDOUBLE, BT_LONGDOUBLE, BT_LONGDOUBLE, BT_INT_PTR) +DEF_FUNCTION_TYPE_3 (BT_FN_BFLOAT16_BFLOAT16_BFLOAT16_INTPTR, + BT_BFLOAT16, BT_BFLOAT16, BT_BFLOAT16, BT_INT_PTR) DEF_FUNCTION_TYPE_3 (BT_FN_FLOAT16_FLOAT16_FLOAT16_INTPTR, BT_FLOAT16, BT_FLOAT16, BT_FLOAT16, BT_INT_PTR) DEF_FUNCTION_TYPE_3 (BT_FN_FLOAT32_FLOAT32_FLOAT32_INTPTR, diff --git a/gcc/builtins.cc b/gcc/builtins.cc index 0b902896ddd..d0fc8e755e8 100644 --- a/gcc/builtins.cc +++ b/gcc/builtins.cc @@ -1918,6 +1918,7 @@ expand_builtin_classify_type (tree exp) fcodef32 = BUILT_IN_##MATHFN##F32; fcodef64 = BUILT_IN_##MATHFN##F64 ; \ fcodef128 = BUILT_IN_##MATHFN##F128 ; fcodef32x = BUILT_IN_##MATHFN##F32X ; \ fcodef64x = BUILT_IN_##MATHFN##F64X ; fcodef128x = BUILT_IN_##MATHFN##F128X ;\ + fcodef16b = BUILT_IN_##MATHFN##F16B ; \ break; /* Similar to above, but appends _R after any F/L suffix. 
*/ #define CASE_MATHFN_REENT(MATHFN) \ @@ -1937,6 +1938,7 @@ mathfn_built_in_2 (tree type, combined_fn fn) { tree mtype; built_in_function fcode, fcodef, fcodel; + built_in_function fcodef16b = END_BUILTINS; built_in_function fcodef16 = END_BUILTINS; built_in_function fcodef32 = END_BUILTINS; built_in_function fcodef64 = END_BUILTINS; @@ -2055,6 +2057,8 @@ mathfn_built_in_2 (tree type, combined_fn fn) return fcodef; else if (mtype == long_double_type_node) return fcodel; + else if (mtype == bfloat16_type_node) + return fcodef16b; else if (mtype == float16_type_node) return fcodef16; else if (mtype == float32_type_node) @@ -2137,6 +2141,8 @@ mathfn_built_in_type (combined_fn fn) #define CASE_MATHFN_FLOATN(MATHFN) \ CASE_MATHFN(MATHFN) \ + case CFN_BUILT_IN_##MATHFN##F16B: \ + return bfloat16_type_node; \ case CFN_BUILT_IN_##MATHFN##F16: \ return float16_type_node; \ case CFN_BUILT_IN_##MATHFN##F32: \ diff --git a/gcc/builtins.def b/gcc/builtins.def index f6f3e104f6a..ffd427d7d93 100644 --- a/gcc/builtins.def +++ b/gcc/builtins.def @@ -77,11 +77,12 @@ along with GCC; see the file COPYING3. If not see DEF_BUILTIN (ENUM, NAME, BUILT_IN_NORMAL, TYPE, BT_LAST, \ false, false, false, ATTRS, true, true) -/* A set of GCC builtins for _FloatN and _FloatNx types. TYPE_MACRO - is called with an argument such as FLOAT32 to produce the enum - value for the type. */ +/* A set of GCC builtins for __bf16, _FloatN and _FloatNx types. + TYPE_MACRO is called with an argument such as FLOAT32 to produce + the enum value for the type. 
*/ #undef DEF_GCC_FLOATN_NX_BUILTINS #define DEF_GCC_FLOATN_NX_BUILTINS(ENUM, NAME, TYPE_MACRO, ATTRS) \ + DEF_GCC_BUILTIN (ENUM ## F16B, NAME "f16b", TYPE_MACRO (BFLOAT16), ATTRS) \ DEF_GCC_BUILTIN (ENUM ## F16, NAME "f16", TYPE_MACRO (FLOAT16), ATTRS) \ DEF_GCC_BUILTIN (ENUM ## F32, NAME "f32", TYPE_MACRO (FLOAT32), ATTRS) \ DEF_GCC_BUILTIN (ENUM ## F64, NAME "f64", TYPE_MACRO (FLOAT64), ATTRS) \ @@ -110,12 +111,12 @@ along with GCC; see the file COPYING3. If not see DEF_BUILTIN (ENUM, "__builtin_" NAME, BUILT_IN_NORMAL, TYPE, TYPE, \ true, true, true, ATTRS, false, true) -/* A set of GCC builtins for _FloatN and _FloatNx types. TYPE_MACRO is called - with an argument such as FLOAT32 to produce the enum value for the type. If - we are compiling for the C language with GNU extensions, we enable the name - without the __builtin_ prefix as well as the name with the __builtin_ - prefix. C++ does not enable these names by default because a class based - library should use the __builtin_ names. */ +/* A set of GCC builtins for __bf16, _FloatN and _FloatNx types. + TYPE_MACRO is called with an argument such as FLOAT32 to produce the enum + value for the type. If we are compiling for the C language with GNU + extensions, we enable the name without the __builtin_ prefix as well as the + name with the __builtin_ prefix. C++ does not enable these names by default + because a class based library should use the __builtin_ names. */ #undef DEF_FLOATN_BUILTIN #define DEF_FLOATN_BUILTIN(ENUM, NAME, TYPE, ATTRS) \ DEF_BUILTIN (ENUM, "__builtin_" NAME, BUILT_IN_NORMAL, TYPE, TYPE, \ @@ -123,6 +124,7 @@ along with GCC; see the file COPYING3. 
If not see false, true) #undef DEF_EXT_LIB_FLOATN_NX_BUILTINS #define DEF_EXT_LIB_FLOATN_NX_BUILTINS(ENUM, NAME, TYPE_MACRO, ATTRS) \ + DEF_FLOATN_BUILTIN (ENUM ## F16B, NAME "f16b", TYPE_MACRO (BFLOAT16), ATTRS) \ DEF_FLOATN_BUILTIN (ENUM ## F16, NAME "f16", TYPE_MACRO (FLOAT16), ATTRS) \ DEF_FLOATN_BUILTIN (ENUM ## F32, NAME "f32", TYPE_MACRO (FLOAT32), ATTRS) \ DEF_FLOATN_BUILTIN (ENUM ## F64, NAME "f64", TYPE_MACRO (FLOAT64), ATTRS) \ @@ -576,7 +578,6 @@ DEF_GCC_BUILTIN (BUILT_IN_NANSF, "nansf", BT_FN_FLOAT_CONST_STRING, ATTR_ DEF_GCC_BUILTIN (BUILT_IN_NANSL, "nansl", BT_FN_LONGDOUBLE_CONST_STRING, ATTR_CONST_NOTHROW_NONNULL) DEF_GCC_FLOATN_NX_BUILTINS (BUILT_IN_NANS, "nans", NAN_TYPE, ATTR_CONST_NOTHROW_NONNULL) #undef NAN_TYPE -DEF_GCC_BUILTIN (BUILT_IN_NANSF16B, "nansf16b", BT_FN_BFLOAT16_CONST_STRING, ATTR_CONST_NOTHROW_NONNULL) DEF_GCC_BUILTIN (BUILT_IN_NANSD32, "nansd32", BT_FN_DFLOAT32_CONST_STRING, ATTR_CONST_NOTHROW_NONNULL) DEF_GCC_BUILTIN (BUILT_IN_NANSD64, "nansd64", BT_FN_DFLOAT64_CONST_STRING, ATTR_CONST_NOTHROW_NONNULL) DEF_GCC_BUILTIN (BUILT_IN_NANSD128, "nansd128", BT_FN_DFLOAT128_CONST_STRING, ATTR_CONST_NOTHROW_NONNULL) @@ -591,7 +592,6 @@ DEF_C99_BUILTIN (BUILT_IN_NEXTAFTERF, "nextafterf", BT_FN_FLOAT_FLOAT_FLO DEF_C99_BUILTIN (BUILT_IN_NEXTAFTERL, "nextafterl", BT_FN_LONGDOUBLE_LONGDOUBLE_LONGDOUBLE, ATTR_MATHFN_ERRNO) #define NEXTAFTER_TYPE(F) BT_FN_##F##_##F##_##F DEF_EXT_LIB_FLOATN_NX_BUILTINS (BUILT_IN_NEXTAFTER, "nextafter", NEXTAFTER_TYPE, ATTR_MATHFN_ERRNO) -DEF_GCC_BUILTIN (BUILT_IN_NEXTAFTERF16B, "nextafterf16b", BT_FN_BFLOAT16_BFLOAT16_BFLOAT16, ATTR_MATHFN_ERRNO) DEF_C99_BUILTIN (BUILT_IN_NEXTTOWARD, "nexttoward", BT_FN_DOUBLE_DOUBLE_LONGDOUBLE, ATTR_MATHFN_ERRNO) DEF_C99_BUILTIN (BUILT_IN_NEXTTOWARDF, "nexttowardf", BT_FN_FLOAT_FLOAT_LONGDOUBLE, ATTR_MATHFN_ERRNO) DEF_C99_BUILTIN (BUILT_IN_NEXTTOWARDL, "nexttowardl", BT_FN_LONGDOUBLE_LONGDOUBLE_LONGDOUBLE, ATTR_MATHFN_ERRNO) diff --git a/gcc/c-family/c-cppbuiltin.cc 
b/gcc/c-family/c-cppbuiltin.cc index a80372c8991..273bb9cf028 100644 --- a/gcc/c-family/c-cppbuiltin.cc +++ b/gcc/c-family/c-cppbuiltin.cc @@ -1422,7 +1422,7 @@ c_cpp_builtins (cpp_reader *pfile) else if (bfloat16_type_node && mode == TYPE_MODE (bfloat16_type_node)) { - memcpy (suffix, "bf16", 5); + memcpy (suffix, "f16b", 5); memcpy (float_h_prefix, "BFLT16", 7); } else diff --git a/gcc/fold-const-call.cc b/gcc/fold-const-call.cc index 47bf8d64391..ed1ec0ab3ee 100644 --- a/gcc/fold-const-call.cc +++ b/gcc/fold-const-call.cc @@ -1354,7 +1354,6 @@ fold_const_call (combined_fn fn, tree type, tree arg) CASE_CFN_NANS: CASE_FLT_FN_FLOATN_NX (CFN_BUILT_IN_NANS): - case CFN_BUILT_IN_NANSF16B: case CFN_BUILT_IN_NANSD32: case CFN_BUILT_IN_NANSD64: case CFN_BUILT_IN_NANSD128: @@ -1462,7 +1461,6 @@ fold_const_call_sss (real_value *result, combined_fn fn, CASE_CFN_NEXTAFTER: CASE_CFN_NEXTAFTER_FN: - case CFN_BUILT_IN_NEXTAFTERF16B: CASE_CFN_NEXTTOWARD: return fold_const_nextafter (result, arg0, arg1, format); diff --git a/gcc/gencfn-macros.cc b/gcc/gencfn-macros.cc index 2581e758fe6..8c78ef084fe 100644 --- a/gcc/gencfn-macros.cc +++ b/gcc/gencfn-macros.cc @@ -156,10 +156,11 @@ const char *const internal_fn_int_names[] = { static const char *const flt_suffixes[] = { "F", "", "L", NULL }; static const char *const fltfn_suffixes[] = { "F16", "F32", "F64", "F128", - "F32X", "F64X", "F128X", NULL }; + "F32X", "F64X", "F128X","F16B", + NULL }; static const char *const fltall_suffixes[] = { "F", "", "L", "F16", "F32", "F64", "F128", "F32X", "F64X", - "F128X", NULL }; + "F128X", "F16B", NULL }; static const char *const int_suffixes[] = { "", "L", "LL", "IMAX", NULL }; static const char *const *const suffix_lists[] = { diff --git a/gcc/match.pd b/gcc/match.pd index c9c8478d286..ca01c6714d8 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -8386,7 +8386,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) #if GIMPLE (match float16_value_p @0 - (if (TYPE_MAIN_VARIANT (TREE_TYPE (@0)) == 
float16_type_node)))
+ (if ((TYPE_MAIN_VARIANT (TREE_TYPE (@0)) == float16_type_node) ||
+      (TYPE_MAIN_VARIANT (TREE_TYPE (@0)) == bfloat16_type_node))))
 (for froms (BUILT_IN_TRUNCL BUILT_IN_TRUNC BUILT_IN_TRUNCF
 	    BUILT_IN_FLOORL BUILT_IN_FLOOR BUILT_IN_FLOORF
 	    BUILT_IN_CEILL BUILT_IN_CEIL BUILT_IN_CEILF
@@ -8403,8 +8404,10 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 	  IFN_NEARBYINT IFN_NEARBYINT IFN_NEARBYINT
 	  IFN_RINT IFN_RINT IFN_RINT
 	  IFN_SQRT IFN_SQRT IFN_SQRT)
- /* (_Float16) round ((doube) x) -> __built_in_roundf16 (x), etc.,
-    if x is a _Float16.  */
+ /* 1 (_Float16) round ((doube) x) -> __built_in_roundf16 (x), etc.,
+      if x is a _Float16.
+    2 (__bf16) round ((doube) x) -> __built_in_roundf16b (x), etc.,
+      if x is a __bf16.  */
  (simplify
    (convert (froms (convert float16_value_p@0)))
      (if (optimize
diff --git a/gcc/tree.h b/gcc/tree.h
index 5dcbb2fb5dd..67fc2a2e614 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -310,7 +310,7 @@ code_helper::is_builtin_fn () const
 #define CASE_FLT_FN(FN) case FN: case FN##F: case FN##L
 #define CASE_FLT_FN_FLOATN_NX(FN) \
   case FN##F16: case FN##F32: case FN##F64: case FN##F128: \
-  case FN##F32X: case FN##F64X: case FN##F128X
+  case FN##F32X: case FN##F64X: case FN##F128X: case FN##F16B
 #define CASE_FLT_FN_REENT(FN) case FN##_R: case FN##F_R: case FN##L_R
 #define CASE_INT_FN(FN) case FN: case FN##L: case FN##LL: case FN##IMAX
 
diff --git a/libgcc/Makefile.in b/libgcc/Makefile.in
index 0e46e9ef768..b71fd5e2250 100644
--- a/libgcc/Makefile.in
+++ b/libgcc/Makefile.in
@@ -450,9 +450,9 @@ lib2funcs = _muldi3 _negdi2 _lshrdi3 _ashldi3 _ashrdi3 _cmpdi2 _ucmpdi2 \
 	    _negvsi2 _negvdi2 _ctors _ffssi2 _ffsdi2 _clz _clzsi2 _clzdi2 \
 	    _ctzsi2 _ctzdi2 _popcount_tab _popcountsi2 _popcountdi2 \
 	    _paritysi2 _paritydi2 _powisf2 _powidf2 _powixf2 _powitf2 \
-	    _mulhc3 _mulsc3 _muldc3 _mulxc3 _multc3 _divhc3 _divsc3 \
-	    _divdc3 _divxc3 _divtc3 _bswapsi2 _bswapdi2 _clrsbsi2 \
-	    _clrsbdi2 _mulbitint3
+	    _mulhc3 _mulbc3 _mulsc3 _muldc3 _mulxc3 _multc3 _divhc3 \
+	    _divbc3 _divsc3 _divdc3 _divxc3 _divtc3 _bswapsi2 _bswapdi2 \
+	    _clrsbsi2 _clrsbdi2 _mulbitint3
 
 # The floating-point conversion routines that involve a single-word integer.
 # XX stands for the integer mode.
diff --git a/libgcc/libgcc2.c b/libgcc/libgcc2.c
index 3fcb85c5b92..512ca92bfb9 100644
--- a/libgcc/libgcc2.c
+++ b/libgcc/libgcc2.c
@@ -2591,6 +2591,7 @@ NAME (TYPE x, int m)
 #endif
 
 #if((defined(L_mulhc3) || defined(L_divhc3)) && LIBGCC2_HAS_HF_MODE) \
+    || ((defined(L_mulbc3) || defined(L_divbc3)) && LIBGCC2_HAS_BF_MODE) \
     || ((defined(L_mulsc3) || defined(L_divsc3)) && LIBGCC2_HAS_SF_MODE) \
     || ((defined(L_muldc3) || defined(L_divdc3)) && LIBGCC2_HAS_DF_MODE) \
     || ((defined(L_mulxc3) || defined(L_divxc3)) && LIBGCC2_HAS_XF_MODE) \
@@ -2607,6 +2608,13 @@ NAME (TYPE x, int m)
 # define MODE hc
 # define CEXT __LIBGCC_HF_FUNC_EXT__
 # define NOTRUNC (!__LIBGCC_HF_EXCESS_PRECISION__)
+#elif defined(L_mulbc3) || defined(L_divbc3)
+# define MTYPE BFtype
+# define CTYPE BCtype
+# define AMTYPE SFtype
+# define MODE bc
+# define CEXT __LIBGCC_BF_FUNC_EXT__
+# define NOTRUNC (!__LIBGCC_BF_EXCESS_PRECISION__)
 #elif defined(L_mulsc3) || defined(L_divsc3)
 # define MTYPE SFtype
 # define CTYPE SCtype
@@ -2690,8 +2698,8 @@ extern void *compile_type_assert[sizeof(INFINITY) == sizeof(MTYPE) ? 1 : -1];
 # define TRUNC(x) __asm__ ("" : "=m"(x) : "m"(x))
 #endif
 
-#if defined(L_mulhc3) || defined(L_mulsc3) || defined(L_muldc3) \
-    || defined(L_mulxc3) || defined(L_multc3)
+#if defined(L_mulhc3) || defined(L_mulbc3) || defined(L_mulsc3) \
+    || defined(L_muldc3) || defined(L_mulxc3) || defined(L_multc3)
 
 CTYPE
 CONCAT3(__mul,MODE,3) (MTYPE a, MTYPE b, MTYPE c, MTYPE d)
@@ -2760,16 +2768,16 @@ CONCAT3(__mul,MODE,3) (MTYPE a, MTYPE b, MTYPE c, MTYPE d)
 }
 #endif /* complex multiply */
 
-#if defined(L_divhc3) || defined(L_divsc3) || defined(L_divdc3) \
-    || defined(L_divxc3) || defined(L_divtc3)
+#if defined(L_divhc3) || defined(L_divbc3) || defined(L_divsc3) \
+    || defined(L_divdc3) || defined(L_divxc3) || defined(L_divtc3)
 
 CTYPE
 CONCAT3(__div,MODE,3) (MTYPE a, MTYPE b, MTYPE c, MTYPE d)
 {
-#if defined(L_divhc3) \
+#if (defined(L_divhc3) || defined(L_divbc3) ) \
   || (defined(L_divsc3) && defined(__LIBGCC_HAVE_HWDBL__) )
 
-  /* Half precision is handled with float precision.
+  /* _Float16 and __bf16 are handled with float precision.
      float is handled with double precision when double precision
      hardware is available.
      Due to the additional precision, the simple complex divide
diff --git a/libgcc/libgcc2.h b/libgcc/libgcc2.h
index b358b3a2b50..ee99badde86 100644
--- a/libgcc/libgcc2.h
+++ b/libgcc/libgcc2.h
@@ -43,6 +43,12 @@ extern void __eprintf (const char *, const char *, unsigned int, const char *)
 #define LIBGCC2_HAS_HF_MODE 0
 #endif
 
+#ifdef __LIBGCC_HAS_BF_MODE__
+#define LIBGCC2_HAS_BF_MODE 1
+#else
+#define LIBGCC2_HAS_BF_MODE 0
+#endif
+
 #ifdef __LIBGCC_HAS_SF_MODE__
 #define LIBGCC2_HAS_SF_MODE 1
 #else
@@ -146,6 +152,10 @@ typedef unsigned int UTItype __attribute__ ((mode (TI)));
 typedef float HFtype __attribute__ ((mode (HF)));
 typedef _Complex float HCtype __attribute__ ((mode (HC)));
 #endif
+#if LIBGCC2_HAS_BF_MODE
+typedef float BFtype __attribute__ ((mode (BF)));
+typedef _Complex float BCtype __attribute__ ((mode (BC)));
+#endif
 #if LIBGCC2_HAS_SF_MODE
 typedef float SFtype __attribute__ ((mode (SF)));
 typedef _Complex float SCtype __attribute__ ((mode (SC)));
@@ -465,6 +475,10 @@ extern SItype __negvsi2 (SItype);
 extern HCtype __divhc3 (HFtype, HFtype, HFtype, HFtype);
 extern HCtype __mulhc3 (HFtype, HFtype, HFtype, HFtype);
 #endif
+#if LIBGCC2_HAS_BF_MODE
+extern BCtype __divbc3 (BFtype, BFtype, BFtype, BFtype);
+extern BCtype __mulbc3 (BFtype, BFtype, BFtype, BFtype);
+#endif
 #if LIBGCC2_HAS_SF_MODE
 extern DWtype __fixsfdi (SFtype);
 extern SFtype __floatdisf (DWtype);
gcc/ChangeLog:

	* builtin-types.def (BT_COMPLEX_BFLOAT16): Support BF16 node.
	(BT_BFLOAT16_PTR): Ditto.
	(BT_FN_BFLOAT16): New.
	(BT_FN_BFLOAT16_BFLOAT16): Ditto.
	(BT_FN_COMPLEX_BFLOAT16_COMPLEX_BFLOAT16): Ditto.
	(BT_FN_BFLOAT16_COMPLEX_BFLOAT16): Ditto.
	(BT_FN_INT_BFLOAT16): Ditto.
	(BT_FN_LONG_BFLOAT16): Ditto.
	(BT_FN_LONGLONG_BFLOAT16): Ditto.
	(BT_FN_BFLOAT16_BFLOAT16_BFLOAT16PTR): Ditto.
	(BT_FN_BFLOAT16_BFLOAT16_INT): Ditto.
	(BT_FN_BFLOAT16_BFLOAT16_INTPTR): Ditto.
	(BT_FN_BFLOAT16_BFLOAT16_LONG): Ditto.
	(BT_FN_COMPLEX_BFLOAT16_COMPLEX_BFLOAT16_COMPLEX_BFLOAT16): Ditto.
	(BT_FN_BFLOAT16_BFLOAT16_BFLOAT16_BFLOAT16): Ditto.
	(BT_FN_BFLOAT16_BFLOAT16_BFLOAT16_INTPTR): Ditto.
	* builtins.cc (expand_builtin_classify_type): Support BF16.
	(mathfn_built_in_2): Ditto.
	(CASE_MATHFN_FLOATN): Ditto.
	* builtins.def (DEF_GCC_FLOATN_NX_BUILTINS): Ditto.
	(DEF_EXT_LIB_FLOATN_NX_BUILTINS): Ditto.
	(BUILT_IN_NANSF16B): Added in general processing, redundant
	is removed here.
	(BUILT_IN_NEXTAFTERF16B): Ditto.
	* fold-const-call.cc (fold_const_call): Ditto.
	(fold_const_call_sss): Ditto.
	* gencfn-macros.cc: Support BF16.
	* match.pd: Like FP16, add optimization for BF16.
	* tree.h (CASE_FLT_FN_FLOATN_NX): Support BF16.

gcc/c-family/ChangeLog:

	* c-cppbuiltin.cc (c_cpp_builtins): Modify suffix names to avoid
	conflicts.

libgcc/ChangeLog:

	* Makefile.in: Add _mulbc3 and _divbc3.
	* libgcc2.c (if): Ditto.
	(defined): Ditto.
	(MTYPE): Macros defined for BF16.
	(CTYPE): Ditto.
	(AMTYPE): Ditto.
	(MODE): Ditto.
	(CEXT): Ditto.
	(NOTRUNC): Ditto.
	* libgcc2.h (LIBGCC2_HAS_BF_MODE): Support BF16.
	(__attribute__): Ditto.
	(__divbc3): Add __divbc3 declaration.
	(__mulbc3): Add __mulbc3 declaration.

Signed-off-by: Xiao Zeng <zengxiao@eswincomputing.com>
---
 gcc/builtin-types.def        | 30 ++++++++++++++++++++++++++++++
 gcc/builtins.cc              |  6 ++++++
 gcc/builtins.def             | 22 +++++++++++-----------
 gcc/c-family/c-cppbuiltin.cc |  2 +-
 gcc/fold-const-call.cc       |  2 --
 gcc/gencfn-macros.cc         |  5 +++--
 gcc/match.pd                 |  9 ++++++---
 gcc/tree.h                   |  2 +-
 libgcc/Makefile.in           |  6 +++---
 libgcc/libgcc2.c             | 20 ++++++++++++++------
 libgcc/libgcc2.h             | 14 ++++++++++++++
 11 files changed, 89 insertions(+), 29 deletions(-)
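
Illustration only (not part of the patch): the libgcc2.c hunk sets AMTYPE to SFtype for the BF16 case and notes that _Float16 and __bf16 are handled with float precision, so the simple complex divide can be used. The sketch below shows that idea in portable C under several assumptions. The names bf16_bits, bf16_complex, bf16_to_float, float_to_bf16, bf16_cdiv_simple and bf16_cdiv_example are hypothetical and do not exist in libgcc; __bf16 is emulated as the upper 16 bits of an IEEE binary32 float; and any special-case handling of infinities and NaNs in the result is left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for __bf16: the upper 16 bits of an IEEE binary32.
   Assumes 'float' is IEEE binary32 on the host.  */
typedef uint16_t bf16_bits;

/* Hypothetical result pair; libgcc itself returns a _Complex type (BCtype).  */
typedef struct { bf16_bits re, im; } bf16_complex;

/* Widening is exact: the 16 missing fraction bits are zero.  */
static float bf16_to_float(bf16_bits h)
{
    uint32_t u = (uint32_t)h << 16;
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Narrowing with round-to-nearest-even; NaNs stay (quiet) NaNs.  */
static bf16_bits float_to_bf16(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    if ((u & 0x7fffffffu) > 0x7f800000u)
        return (bf16_bits)((u >> 16) | 0x0040u);
    u += 0x7fffu + ((u >> 16) & 1u);
    return (bf16_bits)(u >> 16);
}

/* (a + ib) / (c + id) by the textbook formula, evaluated in float and
   rounded to bf16 only once at the end.  No inf/NaN recovery.  */
bf16_complex bf16_cdiv_simple(bf16_bits a, bf16_bits b,
                              bf16_bits c, bf16_bits d)
{
    float fa = bf16_to_float(a), fb = bf16_to_float(b);
    float fc = bf16_to_float(c), fd = bf16_to_float(d);
    float denom = fc * fc + fd * fd;
    bf16_complex r;
    r.re = float_to_bf16((fa * fc + fb * fd) / denom);
    r.im = float_to_bf16((fb * fc - fa * fd) / denom);
    return r;
}

/* Usage: (4 + 2i) / (1 + 1i) = 3 - 1i; every value is exact in bf16.  */
void bf16_cdiv_example(void)
{
    bf16_complex q = bf16_cdiv_simple(0x4080, 0x4000, 0x3f80, 0x3f80);
    assert(q.re == 0x4040);  /* 3.0 */
    assert(q.im == 0xbf80);  /* -1.0 */
}
```

The point of the wider intermediate type is that binary32 has the same exponent range as bf16 but 16 more fraction bits, so the products and the sum in the denominator lose far less precision than they would if each step were rounded to bf16.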