[v2,1/1] RISC-V: Support BF16 interfaces in libgcc

Message ID	20240807031351.46105-2-zengxiao@eswincomputing.com
State	New
Headers	Return-Path: <gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org> DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 00B203858429 From: Xiao Zeng <zengxiao@eswincomputing.com> To: gcc-patches@gcc.gnu.org Cc: jeffreyalaw@gmail.com, research_trasio@irq.a4lg.com, kito.cheng@gmail.com, palmer@dabbelt.com, zhengyu@eswincomputing.com, Xiao Zeng <zengxiao@eswincomputing.com> Subject: [PATCH v2 1/1] RISC-V: Support BF16 interfaces in libgcc Date: Wed, 7 Aug 2024 11:13:51 +0800 Message-Id: <20240807031351.46105-2-zengxiao@eswincomputing.com> In-Reply-To: <20240807031351.46105-1-zengxiao@eswincomputing.com> References: <20240807031351.46105-1-zengxiao@eswincomputing.com> Precedence: list Errors-To: gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org
Series	[v2,1/1] RISC-V: Support BF16 interfaces in libgcc

Xiao Zeng Aug. 7, 2024, 3:13 a.m. UTC

gcc/ChangeLog:

	* builtin-types.def (BT_COMPLEX_BFLOAT16): Support BF16 node.
	(BT_BFLOAT16_PTR): Ditto.
	(BT_FN_BFLOAT16): New.
	(BT_FN_BFLOAT16_BFLOAT16): Ditto.
	(BT_FN_COMPLEX_BFLOAT16_COMPLEX_BFLOAT16): Ditto.
	(BT_FN_BFLOAT16_COMPLEX_BFLOAT16): Ditto.
	(BT_FN_INT_BFLOAT16): Ditto.
	(BT_FN_LONG_BFLOAT16): Ditto.
	(BT_FN_LONGLONG_BFLOAT16): Ditto.
	(BT_FN_BFLOAT16_BFLOAT16_BFLOAT16PTR): Ditto.
	(BT_FN_BFLOAT16_BFLOAT16_INT): Ditto.
	(BT_FN_BFLOAT16_BFLOAT16_INTPTR): Ditto.
	(BT_FN_BFLOAT16_BFLOAT16_LONG): Ditto.
	(BT_FN_COMPLEX_BFLOAT16_COMPLEX_BFLOAT16_COMPLEX_BFLOAT16): Ditto.
	(BT_FN_BFLOAT16_BFLOAT16_BFLOAT16_BFLOAT16): Ditto.
	(BT_FN_BFLOAT16_BFLOAT16_BFLOAT16_INTPTR): Ditto.
	* builtins.cc (expand_builtin_classify_type): Support BF16.
	(mathfn_built_in_2): Ditto.
	(CASE_MATHFN_FLOATN): Ditto.
	* builtins.def (DEF_GCC_FLOATN_NX_BUILTINS): Ditto.
	(DEF_EXT_LIB_FLOATN_NX_BUILTINS): Ditto.
	(BUILT_IN_NANSF16B): Remove redundant definition; now covered by
	the general FLOATN_NX processing.
	(BUILT_IN_NEXTAFTERF16B): Ditto.
	* fold-const-call.cc (fold_const_call): Ditto.
	(fold_const_call_sss): Ditto.
	* gencfn-macros.cc: Support BF16.
	* match.pd: Like FP16, add optimization for BF16.
	* tree.h (CASE_FLT_FN_FLOATN_NX): Support BF16.

gcc/c-family/ChangeLog:

	* c-cppbuiltin.cc (c_cpp_builtins): Modify suffix names to avoid
	conflicts.

libgcc/ChangeLog:

	* Makefile.in: Add _mulbc3 and _divbc3.
	* libgcc2.c (if): Ditto.
	(defined): Ditto.
	(MTYPE): Macros defined for BF16.
	(CTYPE): Ditto.
	(AMTYPE): Ditto.
	(MODE): Ditto.
	(CEXT): Ditto.
	(NOTRUNC): Ditto.
	* libgcc2.h (LIBGCC2_HAS_BF_MODE): Support BF16.
	(__attribute__): Ditto.
	(__divbc3): Add __divbc3 declaration.
	(__mulbc3): Add __mulbc3 declaration.

Signed-off-by: Xiao Zeng <zengxiao@eswincomputing.com>
---
 gcc/builtin-types.def        | 30 ++++++++++++++++++++++++++++++
 gcc/builtins.cc              |  6 ++++++
 gcc/builtins.def             | 22 +++++++++++-----------
 gcc/c-family/c-cppbuiltin.cc |  2 +-
 gcc/fold-const-call.cc       |  2 --
 gcc/gencfn-macros.cc         |  5 +++--
 gcc/match.pd                 |  9 ++++++---
 gcc/tree.h                   |  2 +-
 libgcc/Makefile.in           |  6 +++---
 libgcc/libgcc2.c             | 20 ++++++++++++++------
 libgcc/libgcc2.h             | 14 ++++++++++++++
 11 files changed, 89 insertions(+), 29 deletions(-)

Xiao Zeng Aug. 7, 2024, 3:28 a.m. UTC | #1

2024-08-07 11:13  Xiao Zeng <zengxiao@eswincomputing.com> wrote: 

The existing GCC test cases 'gcc.dg/portal/float16 complex.c' are
already sufficient, so no new test cases were added.

More test cases are always welcome, of course, and I will add some if
necessary.
>
>gcc/ChangeLog:
>
>	* builtin-types.def (BT_COMPLEX_BFLOAT16): Support BF16 node.
>	(BT_BFLOAT16_PTR): Ditto.
>	(BT_FN_BFLOAT16): New.
>	(BT_FN_BFLOAT16_BFLOAT16): Ditto.
>	(BT_FN_COMPLEX_BFLOAT16_COMPLEX_BFLOAT16): Ditto.
>	(BT_FN_BFLOAT16_COMPLEX_BFLOAT16): Ditto.
>	(BT_FN_INT_BFLOAT16): Ditto.
>	(BT_FN_LONG_BFLOAT16): Ditto.
>	(BT_FN_LONGLONG_BFLOAT16): Ditto.
>	(BT_FN_BFLOAT16_BFLOAT16_BFLOAT16PTR): Ditto.
>	(BT_FN_BFLOAT16_BFLOAT16_INT): Ditto.
>	(BT_FN_BFLOAT16_BFLOAT16_INTPTR): Ditto.
>	(BT_FN_BFLOAT16_BFLOAT16_LONG): Ditto.
>	(BT_FN_COMPLEX_BFLOAT16_COMPLEX_BFLOAT16_COMPLEX_BFLOAT16): Ditto.
>	(BT_FN_BFLOAT16_BFLOAT16_BFLOAT16_BFLOAT16): Ditto.
>	(BT_FN_BFLOAT16_BFLOAT16_BFLOAT16_INTPTR): Ditto.
>	* builtins.cc (expand_builtin_classify_type): Support BF16.
>	(mathfn_built_in_2): Ditto.
>	(CASE_MATHFN_FLOATN): Ditto.
>	* builtins.def (DEF_GCC_FLOATN_NX_BUILTINS): Ditto.
>	(DEF_EXT_LIB_FLOATN_NX_BUILTINS): Ditto.
>	(BUILT_IN_NANSF16B): Remove redundant definition; now covered by
>	the general FLOATN_NX processing.
>	(BUILT_IN_NEXTAFTERF16B): Ditto.
>	* fold-const-call.cc (fold_const_call): Ditto.
>	(fold_const_call_sss): Ditto.
>	* gencfn-macros.cc: Support BF16.
>	* match.pd: Like FP16, add optimization for BF16.
>	* tree.h (CASE_FLT_FN_FLOATN_NX): Support BF16.
>
>gcc/c-family/ChangeLog:
>
>	* c-cppbuiltin.cc (c_cpp_builtins): Modify suffix names to avoid
>	conflicts.
>
>libgcc/ChangeLog:
>
>	* Makefile.in: Add _mulbc3 and _divbc3.
>	* libgcc2.c (if): Ditto.
>	(defined): Ditto.
>	(MTYPE): Macros defined for BF16.
>	(CTYPE): Ditto.
>	(AMTYPE): Ditto.
>	(MODE): Ditto.
>	(CEXT): Ditto.
>	(NOTRUNC): Ditto.
>	* libgcc2.h (LIBGCC2_HAS_BF_MODE): Support BF16.
>	(__attribute__): Ditto.
>	(__divbc3): Add __divbc3 declaration.
>	(__mulbc3): Add __mulbc3 declaration.
>
>Signed-off-by: Xiao Zeng <zengxiao@eswincomputing.com>
>---
> gcc/builtin-types.def        | 30 ++++++++++++++++++++++++++++++
> gcc/builtins.cc              |  6 ++++++
> gcc/builtins.def             | 22 +++++++++++-----------
> gcc/c-family/c-cppbuiltin.cc |  2 +-
> gcc/fold-const-call.cc       |  2 --
> gcc/gencfn-macros.cc         |  5 +++--
> gcc/match.pd                 |  9 ++++++---
> gcc/tree.h                   |  2 +-
> libgcc/Makefile.in           |  6 +++---
> libgcc/libgcc2.c             | 20 ++++++++++++++------
> libgcc/libgcc2.h             | 14 ++++++++++++++
> 11 files changed, 89 insertions(+), 29 deletions(-)
>
>diff --git a/gcc/builtin-types.def b/gcc/builtin-types.def
>index c97d6bad1de..6980873f2f1 100644
>--- a/gcc/builtin-types.def
>+++ b/gcc/builtin-types.def
>@@ -109,6 +109,10 @@ DEF_PRIMITIVE_TYPE (BT_FLOAT128X, (float128x_type_node
> DEF_PRIMITIVE_TYPE (BT_COMPLEX_FLOAT, complex_float_type_node)
> DEF_PRIMITIVE_TYPE (BT_COMPLEX_DOUBLE, complex_double_type_node)
> DEF_PRIMITIVE_TYPE (BT_COMPLEX_LONGDOUBLE, complex_long_double_type_node)
>+DEF_PRIMITIVE_TYPE (BT_COMPLEX_BFLOAT16, (bfloat16_type_node
>+	? build_complex_type
>+	(bfloat16_type_node)
>+	: error_mark_node))
> DEF_PRIMITIVE_TYPE (BT_COMPLEX_FLOAT16, (float16_type_node
> ? build_complex_type
> (float16_type_node)
>@@ -163,6 +167,9 @@ DEF_PRIMITIVE_TYPE (BT_CONST_DOUBLE_PTR,
>      (build_qualified_type (double_type_node,
>          TYPE_QUAL_CONST)))
> DEF_PRIMITIVE_TYPE (BT_LONGDOUBLE_PTR, long_double_ptr_type_node)
>+DEF_PRIMITIVE_TYPE (BT_BFLOAT16_PTR, (bfloat16_type_node
>+	      ? build_pointer_type (bfloat16_type_node)
>+	      : error_mark_node))
> DEF_PRIMITIVE_TYPE (BT_FLOAT16_PTR, (float16_type_node
>       ? build_pointer_type (float16_type_node)
>       : error_mark_node))
>@@ -239,6 +246,7 @@ DEF_FUNCTION_TYPE_0 (BT_FN_DOUBLE, BT_DOUBLE)
>    distinguish it from two types in sequence, "long" followed by
>    "double".  */
> DEF_FUNCTION_TYPE_0 (BT_FN_LONGDOUBLE, BT_LONGDOUBLE)
>+DEF_FUNCTION_TYPE_0 (BT_FN_BFLOAT16, BT_BFLOAT16)
> DEF_FUNCTION_TYPE_0 (BT_FN_FLOAT16, BT_FLOAT16)
> DEF_FUNCTION_TYPE_0 (BT_FN_FLOAT32, BT_FLOAT32)
> DEF_FUNCTION_TYPE_0 (BT_FN_FLOAT64, BT_FLOAT64)
>@@ -257,6 +265,7 @@ DEF_FUNCTION_TYPE_1 (BT_FN_FLOAT_FLOAT, BT_FLOAT, BT_FLOAT)
> DEF_FUNCTION_TYPE_1 (BT_FN_DOUBLE_DOUBLE, BT_DOUBLE, BT_DOUBLE)
> DEF_FUNCTION_TYPE_1 (BT_FN_LONGDOUBLE_LONGDOUBLE,
>      BT_LONGDOUBLE, BT_LONGDOUBLE)
>+DEF_FUNCTION_TYPE_1 (BT_FN_BFLOAT16_BFLOAT16, BT_BFLOAT16, BT_BFLOAT16)
> DEF_FUNCTION_TYPE_1 (BT_FN_FLOAT16_FLOAT16, BT_FLOAT16, BT_FLOAT16)
> DEF_FUNCTION_TYPE_1 (BT_FN_FLOAT32_FLOAT32, BT_FLOAT32, BT_FLOAT32)
> DEF_FUNCTION_TYPE_1 (BT_FN_FLOAT64_FLOAT64, BT_FLOAT64, BT_FLOAT64)
>@@ -270,6 +279,8 @@ DEF_FUNCTION_TYPE_1 (BT_FN_COMPLEX_DOUBLE_COMPLEX_DOUBLE,
>      BT_COMPLEX_DOUBLE, BT_COMPLEX_DOUBLE)
> DEF_FUNCTION_TYPE_1 (BT_FN_COMPLEX_LONGDOUBLE_COMPLEX_LONGDOUBLE,
>      BT_COMPLEX_LONGDOUBLE, BT_COMPLEX_LONGDOUBLE)
>+DEF_FUNCTION_TYPE_1 (BT_FN_COMPLEX_BFLOAT16_COMPLEX_BFLOAT16,
>+	     BT_COMPLEX_BFLOAT16, BT_COMPLEX_BFLOAT16)
> DEF_FUNCTION_TYPE_1 (BT_FN_COMPLEX_FLOAT16_COMPLEX_FLOAT16,
>      BT_COMPLEX_FLOAT16, BT_COMPLEX_FLOAT16)
> DEF_FUNCTION_TYPE_1 (BT_FN_COMPLEX_FLOAT32_COMPLEX_FLOAT32,
>@@ -290,6 +301,8 @@ DEF_FUNCTION_TYPE_1 (BT_FN_DOUBLE_COMPLEX_DOUBLE,
>      BT_DOUBLE, BT_COMPLEX_DOUBLE)
> DEF_FUNCTION_TYPE_1 (BT_FN_LONGDOUBLE_COMPLEX_LONGDOUBLE,
>      BT_LONGDOUBLE, BT_COMPLEX_LONGDOUBLE)
>+DEF_FUNCTION_TYPE_1 (BT_FN_BFLOAT16_COMPLEX_BFLOAT16,
>+	     BT_BFLOAT16, BT_COMPLEX_BFLOAT16)
> DEF_FUNCTION_TYPE_1 (BT_FN_FLOAT16_COMPLEX_FLOAT16,
>      BT_FLOAT16, BT_COMPLEX_FLOAT16)
> DEF_FUNCTION_TYPE_1 (BT_FN_FLOAT32_COMPLEX_FLOAT32,
>@@ -324,6 +337,7 @@ DEF_FUNCTION_TYPE_1 (BT_FN_INT_PTR, BT_INT, BT_PTR)
> DEF_FUNCTION_TYPE_1 (BT_FN_INT_FLOAT, BT_INT, BT_FLOAT)
> DEF_FUNCTION_TYPE_1 (BT_FN_INT_DOUBLE, BT_INT, BT_DOUBLE)
> DEF_FUNCTION_TYPE_1 (BT_FN_INT_LONGDOUBLE, BT_INT, BT_LONGDOUBLE)
>+DEF_FUNCTION_TYPE_1 (BT_FN_INT_BFLOAT16, BT_INT, BT_BFLOAT16)
> DEF_FUNCTION_TYPE_1 (BT_FN_INT_FLOAT16, BT_INT, BT_FLOAT16)
> DEF_FUNCTION_TYPE_1 (BT_FN_INT_FLOAT32, BT_INT, BT_FLOAT32)
> DEF_FUNCTION_TYPE_1 (BT_FN_INT_FLOAT64, BT_INT, BT_FLOAT64)
>@@ -337,6 +351,7 @@ DEF_FUNCTION_TYPE_1 (BT_FN_INT_DFLOAT128, BT_INT, BT_DFLOAT128)
> DEF_FUNCTION_TYPE_1 (BT_FN_LONG_FLOAT, BT_LONG, BT_FLOAT)
> DEF_FUNCTION_TYPE_1 (BT_FN_LONG_DOUBLE, BT_LONG, BT_DOUBLE)
> DEF_FUNCTION_TYPE_1 (BT_FN_LONG_LONGDOUBLE, BT_LONG, BT_LONGDOUBLE)
>+DEF_FUNCTION_TYPE_1 (BT_FN_LONG_BFLOAT16, BT_LONG, BT_BFLOAT16)
> DEF_FUNCTION_TYPE_1 (BT_FN_LONG_FLOAT16, BT_LONG, BT_FLOAT16)
> DEF_FUNCTION_TYPE_1 (BT_FN_LONG_FLOAT32, BT_LONG, BT_FLOAT32)
> DEF_FUNCTION_TYPE_1 (BT_FN_LONG_FLOAT64, BT_LONG, BT_FLOAT64)
>@@ -347,6 +362,7 @@ DEF_FUNCTION_TYPE_1 (BT_FN_LONG_FLOAT128X, BT_LONG, BT_FLOAT128X)
> DEF_FUNCTION_TYPE_1 (BT_FN_LONGLONG_FLOAT, BT_LONGLONG, BT_FLOAT)
> DEF_FUNCTION_TYPE_1 (BT_FN_LONGLONG_DOUBLE, BT_LONGLONG, BT_DOUBLE)
> DEF_FUNCTION_TYPE_1 (BT_FN_LONGLONG_LONGDOUBLE, BT_LONGLONG, BT_LONGDOUBLE)
>+DEF_FUNCTION_TYPE_1 (BT_FN_LONGLONG_BFLOAT16, BT_LONGLONG, BT_BFLOAT16)
> DEF_FUNCTION_TYPE_1 (BT_FN_LONGLONG_FLOAT16, BT_LONGLONG, BT_FLOAT16)
> DEF_FUNCTION_TYPE_1 (BT_FN_LONGLONG_FLOAT32, BT_LONGLONG, BT_FLOAT32)
> DEF_FUNCTION_TYPE_1 (BT_FN_LONGLONG_FLOAT64, BT_LONGLONG, BT_FLOAT64)
>@@ -525,6 +541,8 @@ DEF_FUNCTION_TYPE_2 (BT_FN_DOUBLE_DOUBLE_DOUBLEPTR,
>      BT_DOUBLE, BT_DOUBLE, BT_DOUBLE_PTR)
> DEF_FUNCTION_TYPE_2 (BT_FN_LONGDOUBLE_LONGDOUBLE_LONGDOUBLEPTR,
>      BT_LONGDOUBLE, BT_LONGDOUBLE, BT_LONGDOUBLE_PTR)
>+DEF_FUNCTION_TYPE_2 (BT_FN_BFLOAT16_BFLOAT16_BFLOAT16PTR,
>+	     BT_BFLOAT16, BT_BFLOAT16, BT_BFLOAT16_PTR)
> DEF_FUNCTION_TYPE_2 (BT_FN_FLOAT16_FLOAT16_FLOAT16PTR,
>      BT_FLOAT16, BT_FLOAT16, BT_FLOAT16_PTR)
> DEF_FUNCTION_TYPE_2 (BT_FN_FLOAT32_FLOAT32_FLOAT32PTR,
>@@ -549,6 +567,8 @@ DEF_FUNCTION_TYPE_2 (BT_FN_DOUBLE_DOUBLE_INT,
>      BT_DOUBLE, BT_DOUBLE, BT_INT)
> DEF_FUNCTION_TYPE_2 (BT_FN_LONGDOUBLE_LONGDOUBLE_INT,
>      BT_LONGDOUBLE, BT_LONGDOUBLE, BT_INT)
>+DEF_FUNCTION_TYPE_2 (BT_FN_BFLOAT16_BFLOAT16_INT,
>+	     BT_BFLOAT16, BT_BFLOAT16, BT_INT)
> DEF_FUNCTION_TYPE_2 (BT_FN_FLOAT16_FLOAT16_INT,
>      BT_FLOAT16, BT_FLOAT16, BT_INT)
> DEF_FUNCTION_TYPE_2 (BT_FN_FLOAT32_FLOAT32_INT,
>@@ -569,6 +589,8 @@ DEF_FUNCTION_TYPE_2 (BT_FN_DOUBLE_DOUBLE_INTPTR,
>      BT_DOUBLE, BT_DOUBLE, BT_INT_PTR)
> DEF_FUNCTION_TYPE_2 (BT_FN_LONGDOUBLE_LONGDOUBLE_INTPTR,
>      BT_LONGDOUBLE, BT_LONGDOUBLE, BT_INT_PTR)
>+DEF_FUNCTION_TYPE_2 (BT_FN_BFLOAT16_BFLOAT16_INTPTR,
>+	     BT_BFLOAT16, BT_BFLOAT16, BT_INT_PTR)
> DEF_FUNCTION_TYPE_2 (BT_FN_FLOAT16_FLOAT16_INTPTR,
>      BT_FLOAT16, BT_FLOAT16, BT_INT_PTR)
> DEF_FUNCTION_TYPE_2 (BT_FN_FLOAT32_FLOAT32_INTPTR,
>@@ -595,6 +617,8 @@ DEF_FUNCTION_TYPE_2 (BT_FN_DOUBLE_DOUBLE_LONG,
>      BT_DOUBLE, BT_DOUBLE, BT_LONG)
> DEF_FUNCTION_TYPE_2 (BT_FN_LONGDOUBLE_LONGDOUBLE_LONG,
>      BT_LONGDOUBLE, BT_LONGDOUBLE, BT_LONG)
>+DEF_FUNCTION_TYPE_2 (BT_FN_BFLOAT16_BFLOAT16_LONG,
>+	     BT_BFLOAT16, BT_BFLOAT16, BT_LONG)
> DEF_FUNCTION_TYPE_2 (BT_FN_FLOAT16_FLOAT16_LONG,
>      BT_FLOAT16, BT_FLOAT16, BT_LONG)
> DEF_FUNCTION_TYPE_2 (BT_FN_FLOAT32_FLOAT32_LONG,
>@@ -621,6 +645,8 @@ DEF_FUNCTION_TYPE_2 (BT_FN_COMPLEX_DOUBLE_COMPLEX_DOUBLE_COMPLEX_DOUBLE,
>      BT_COMPLEX_DOUBLE, BT_COMPLEX_DOUBLE, BT_COMPLEX_DOUBLE)
> DEF_FUNCTION_TYPE_2 (BT_FN_COMPLEX_LONGDOUBLE_COMPLEX_LONGDOUBLE_COMPLEX_LONGDOUBLE,
>      BT_COMPLEX_LONGDOUBLE, BT_COMPLEX_LONGDOUBLE, BT_COMPLEX_LONGDOUBLE)
>+DEF_FUNCTION_TYPE_2 (BT_FN_COMPLEX_BFLOAT16_COMPLEX_BFLOAT16_COMPLEX_BFLOAT16,
>+	     BT_COMPLEX_BFLOAT16, BT_COMPLEX_BFLOAT16, BT_COMPLEX_BFLOAT16)
> DEF_FUNCTION_TYPE_2 (BT_FN_COMPLEX_FLOAT16_COMPLEX_FLOAT16_COMPLEX_FLOAT16,
>      BT_COMPLEX_FLOAT16, BT_COMPLEX_FLOAT16, BT_COMPLEX_FLOAT16)
> DEF_FUNCTION_TYPE_2 (BT_FN_COMPLEX_FLOAT32_COMPLEX_FLOAT32_COMPLEX_FLOAT32,
>@@ -728,6 +754,8 @@ DEF_FUNCTION_TYPE_3 (BT_FN_DOUBLE_DOUBLE_DOUBLE_DOUBLE,
>      BT_DOUBLE, BT_DOUBLE, BT_DOUBLE, BT_DOUBLE)
> DEF_FUNCTION_TYPE_3 (BT_FN_LONGDOUBLE_LONGDOUBLE_LONGDOUBLE_LONGDOUBLE,
>      BT_LONGDOUBLE, BT_LONGDOUBLE, BT_LONGDOUBLE, BT_LONGDOUBLE)
>+DEF_FUNCTION_TYPE_3 (BT_FN_BFLOAT16_BFLOAT16_BFLOAT16_BFLOAT16,
>+	     BT_BFLOAT16, BT_BFLOAT16, BT_BFLOAT16, BT_BFLOAT16)
> DEF_FUNCTION_TYPE_3 (BT_FN_FLOAT16_FLOAT16_FLOAT16_FLOAT16,
>      BT_FLOAT16, BT_FLOAT16, BT_FLOAT16, BT_FLOAT16)
> DEF_FUNCTION_TYPE_3 (BT_FN_FLOAT32_FLOAT32_FLOAT32_FLOAT32,
>@@ -748,6 +776,8 @@ DEF_FUNCTION_TYPE_3 (BT_FN_DOUBLE_DOUBLE_DOUBLE_INTPTR,
>      BT_DOUBLE, BT_DOUBLE, BT_DOUBLE, BT_INT_PTR)
> DEF_FUNCTION_TYPE_3 (BT_FN_LONGDOUBLE_LONGDOUBLE_LONGDOUBLE_INTPTR,
>      BT_LONGDOUBLE, BT_LONGDOUBLE, BT_LONGDOUBLE, BT_INT_PTR)
>+DEF_FUNCTION_TYPE_3 (BT_FN_BFLOAT16_BFLOAT16_BFLOAT16_INTPTR,
>+	     BT_BFLOAT16, BT_BFLOAT16, BT_BFLOAT16, BT_INT_PTR)
> DEF_FUNCTION_TYPE_3 (BT_FN_FLOAT16_FLOAT16_FLOAT16_INTPTR,
>      BT_FLOAT16, BT_FLOAT16, BT_FLOAT16, BT_INT_PTR)
> DEF_FUNCTION_TYPE_3 (BT_FN_FLOAT32_FLOAT32_FLOAT32_INTPTR,
>diff --git a/gcc/builtins.cc b/gcc/builtins.cc
>index 0b902896ddd..d0fc8e755e8 100644
>--- a/gcc/builtins.cc
>+++ b/gcc/builtins.cc
>@@ -1918,6 +1918,7 @@ expand_builtin_classify_type (tree exp)
>   fcodef32 = BUILT_IN_##MATHFN##F32; fcodef64 = BUILT_IN_##MATHFN##F64 ; \
>   fcodef128 = BUILT_IN_##MATHFN##F128 ; fcodef32x = BUILT_IN_##MATHFN##F32X ; \
>   fcodef64x = BUILT_IN_##MATHFN##F64X ; fcodef128x = BUILT_IN_##MATHFN##F128X ;\
>+  fcodef16b = BUILT_IN_##MATHFN##F16B ; \
>   break;
> /* Similar to above, but appends _R after any F/L suffix.  */
> #define CASE_MATHFN_REENT(MATHFN) \
>@@ -1937,6 +1938,7 @@ mathfn_built_in_2 (tree type, combined_fn fn)
> {
>   tree mtype;
>   built_in_function fcode, fcodef, fcodel;
>+  built_in_function fcodef16b = END_BUILTINS;
>   built_in_function fcodef16 = END_BUILTINS;
>   built_in_function fcodef32 = END_BUILTINS;
>   built_in_function fcodef64 = END_BUILTINS;
>@@ -2055,6 +2057,8 @@ mathfn_built_in_2 (tree type, combined_fn fn)
>     return fcodef;
>   else if (mtype == long_double_type_node)
>     return fcodel;
>+  else if (mtype == bfloat16_type_node)
>+    return fcodef16b;
>   else if (mtype == float16_type_node)
>     return fcodef16;
>   else if (mtype == float32_type_node)
>@@ -2137,6 +2141,8 @@ mathfn_built_in_type (combined_fn fn)
>
> #define CASE_MATHFN_FLOATN(MATHFN)	\
>   CASE_MATHFN(MATHFN)	\
>+  case CFN_BUILT_IN_##MATHFN##F16B:	\
>+    return bfloat16_type_node;	\
>   case CFN_BUILT_IN_##MATHFN##F16:	\
>     return float16_type_node;	\
>   case CFN_BUILT_IN_##MATHFN##F32:	\
>diff --git a/gcc/builtins.def b/gcc/builtins.def
>index f6f3e104f6a..ffd427d7d93 100644
>--- a/gcc/builtins.def
>+++ b/gcc/builtins.def
>@@ -77,11 +77,12 @@ along with GCC; see the file COPYING3.  If not see
>   DEF_BUILTIN (ENUM, NAME, BUILT_IN_NORMAL, TYPE, BT_LAST,	\
>        false, false, false, ATTRS, true, true)
>
>-/* A set of GCC builtins for _FloatN and _FloatNx types.  TYPE_MACRO
>-   is called with an argument such as FLOAT32 to produce the enum
>-   value for the type.  */
>+/* A set of GCC builtins for __bf16, _FloatN and _FloatNx types.
>+   TYPE_MACRO is called with an argument such as FLOAT32 to produce
>+   the enum value for the type.  */
> #undef DEF_GCC_FLOATN_NX_BUILTINS
> #define DEF_GCC_FLOATN_NX_BUILTINS(ENUM, NAME, TYPE_MACRO, ATTRS)	\
>+  DEF_GCC_BUILTIN (ENUM ## F16B, NAME "f16b", TYPE_MACRO (BFLOAT16), ATTRS) \
>   DEF_GCC_BUILTIN (ENUM ## F16, NAME "f16", TYPE_MACRO (FLOAT16), ATTRS) \
>   DEF_GCC_BUILTIN (ENUM ## F32, NAME "f32", TYPE_MACRO (FLOAT32), ATTRS) \
>   DEF_GCC_BUILTIN (ENUM ## F64, NAME "f64", TYPE_MACRO (FLOAT64), ATTRS) \
>@@ -110,12 +111,12 @@ along with GCC; see the file COPYING3.  If not see
>   DEF_BUILTIN (ENUM, "__builtin_" NAME, BUILT_IN_NORMAL, TYPE, TYPE,	\
>        true, true, true, ATTRS, false, true)
>
>-/* A set of GCC builtins for _FloatN and _FloatNx types.  TYPE_MACRO is called
>-   with an argument such as FLOAT32 to produce the enum value for the type.  If
>-   we are compiling for the C language with GNU extensions, we enable the name
>-   without the __builtin_ prefix as well as the name with the __builtin_
>-   prefix.  C++ does not enable these names by default because a class based
>-   library should use the __builtin_ names.  */
>+/* A set of GCC builtins for __bf16, _FloatN and _FloatNx types.
>+   TYPE_MACRO is called with an argument such as FLOAT32 to produce the enum
>+   value for the type.  If we are compiling for the C language with GNU
>+   extensions, we enable the name without the __builtin_ prefix as well as the
>+   name with the __builtin_ prefix.  C++ does not enable these names by default
>+   because a class based library should use the __builtin_ names.  */
> #undef DEF_FLOATN_BUILTIN
> #define DEF_FLOATN_BUILTIN(ENUM, NAME, TYPE, ATTRS)	\
>   DEF_BUILTIN (ENUM, "__builtin_" NAME, BUILT_IN_NORMAL, TYPE, TYPE,	\
>@@ -123,6 +124,7 @@ along with GCC; see the file COPYING3.  If not see
>        false, true)
> #undef DEF_EXT_LIB_FLOATN_NX_BUILTINS
> #define DEF_EXT_LIB_FLOATN_NX_BUILTINS(ENUM, NAME, TYPE_MACRO, ATTRS)	\
>+  DEF_FLOATN_BUILTIN (ENUM ## F16B, NAME "f16b", TYPE_MACRO (BFLOAT16), ATTRS) \
>   DEF_FLOATN_BUILTIN (ENUM ## F16, NAME "f16", TYPE_MACRO (FLOAT16), ATTRS) \
>   DEF_FLOATN_BUILTIN (ENUM ## F32, NAME "f32", TYPE_MACRO (FLOAT32), ATTRS) \
>   DEF_FLOATN_BUILTIN (ENUM ## F64, NAME "f64", TYPE_MACRO (FLOAT64), ATTRS) \
>@@ -576,7 +578,6 @@ DEF_GCC_BUILTIN        (BUILT_IN_NANSF, "nansf", BT_FN_FLOAT_CONST_STRING, ATTR_
> DEF_GCC_BUILTIN        (BUILT_IN_NANSL, "nansl", BT_FN_LONGDOUBLE_CONST_STRING, ATTR_CONST_NOTHROW_NONNULL)
> DEF_GCC_FLOATN_NX_BUILTINS (BUILT_IN_NANS, "nans", NAN_TYPE, ATTR_CONST_NOTHROW_NONNULL)
> #undef NAN_TYPE
>-DEF_GCC_BUILTIN        (BUILT_IN_NANSF16B, "nansf16b", BT_FN_BFLOAT16_CONST_STRING, ATTR_CONST_NOTHROW_NONNULL)
> DEF_GCC_BUILTIN        (BUILT_IN_NANSD32, "nansd32", BT_FN_DFLOAT32_CONST_STRING, ATTR_CONST_NOTHROW_NONNULL)
> DEF_GCC_BUILTIN        (BUILT_IN_NANSD64, "nansd64", BT_FN_DFLOAT64_CONST_STRING, ATTR_CONST_NOTHROW_NONNULL)
> DEF_GCC_BUILTIN        (BUILT_IN_NANSD128, "nansd128", BT_FN_DFLOAT128_CONST_STRING, ATTR_CONST_NOTHROW_NONNULL)
>@@ -591,7 +592,6 @@ DEF_C99_BUILTIN        (BUILT_IN_NEXTAFTERF, "nextafterf", BT_FN_FLOAT_FLOAT_FLO
> DEF_C99_BUILTIN        (BUILT_IN_NEXTAFTERL, "nextafterl", BT_FN_LONGDOUBLE_LONGDOUBLE_LONGDOUBLE, ATTR_MATHFN_ERRNO)
> #define NEXTAFTER_TYPE(F) BT_FN_##F##_##F##_##F
> DEF_EXT_LIB_FLOATN_NX_BUILTINS (BUILT_IN_NEXTAFTER, "nextafter", NEXTAFTER_TYPE, ATTR_MATHFN_ERRNO)
>-DEF_GCC_BUILTIN        (BUILT_IN_NEXTAFTERF16B, "nextafterf16b", BT_FN_BFLOAT16_BFLOAT16_BFLOAT16, ATTR_MATHFN_ERRNO)
> DEF_C99_BUILTIN        (BUILT_IN_NEXTTOWARD, "nexttoward", BT_FN_DOUBLE_DOUBLE_LONGDOUBLE, ATTR_MATHFN_ERRNO)
> DEF_C99_BUILTIN        (BUILT_IN_NEXTTOWARDF, "nexttowardf", BT_FN_FLOAT_FLOAT_LONGDOUBLE, ATTR_MATHFN_ERRNO)
> DEF_C99_BUILTIN        (BUILT_IN_NEXTTOWARDL, "nexttowardl", BT_FN_LONGDOUBLE_LONGDOUBLE_LONGDOUBLE, ATTR_MATHFN_ERRNO)
>diff --git a/gcc/c-family/c-cppbuiltin.cc b/gcc/c-family/c-cppbuiltin.cc
>index a80372c8991..273bb9cf028 100644
>--- a/gcc/c-family/c-cppbuiltin.cc
>+++ b/gcc/c-family/c-cppbuiltin.cc
>@@ -1422,7 +1422,7 @@ c_cpp_builtins (cpp_reader *pfile)
>   else if (bfloat16_type_node
>    && mode == TYPE_MODE (bfloat16_type_node))
>     {
>-	      memcpy (suffix, "bf16", 5);
>+	      memcpy (suffix, "f16b", 5);
>       memcpy (float_h_prefix, "BFLT16", 7);
>     }
>   else
>diff --git a/gcc/fold-const-call.cc b/gcc/fold-const-call.cc
>index 47bf8d64391..ed1ec0ab3ee 100644
>--- a/gcc/fold-const-call.cc
>+++ b/gcc/fold-const-call.cc
>@@ -1354,7 +1354,6 @@ fold_const_call (combined_fn fn, tree type, tree arg)
>
>     CASE_CFN_NANS:
>     CASE_FLT_FN_FLOATN_NX (CFN_BUILT_IN_NANS):
>-    case CFN_BUILT_IN_NANSF16B:
>     case CFN_BUILT_IN_NANSD32:
>     case CFN_BUILT_IN_NANSD64:
>     case CFN_BUILT_IN_NANSD128:
>@@ -1462,7 +1461,6 @@ fold_const_call_sss (real_value *result, combined_fn fn,
>
>     CASE_CFN_NEXTAFTER:
>     CASE_CFN_NEXTAFTER_FN:
>-    case CFN_BUILT_IN_NEXTAFTERF16B:
>     CASE_CFN_NEXTTOWARD:
>       return fold_const_nextafter (result, arg0, arg1, format);
>
>diff --git a/gcc/gencfn-macros.cc b/gcc/gencfn-macros.cc
>index 2581e758fe6..8c78ef084fe 100644
>--- a/gcc/gencfn-macros.cc
>+++ b/gcc/gencfn-macros.cc
>@@ -156,10 +156,11 @@ const char *const internal_fn_int_names[] = {
>
> static const char *const flt_suffixes[] = { "F", "", "L", NULL };
> static const char *const fltfn_suffixes[] = { "F16", "F32", "F64", "F128",
>-	      "F32X", "F64X", "F128X", NULL };
>+	      "F32X", "F64X", "F128X","F16B",
>+	      NULL };
> static const char *const fltall_suffixes[] = { "F", "", "L", "F16", "F32",
>        "F64", "F128", "F32X", "F64X",
>-	       "F128X", NULL };
>+	       "F128X", "F16B", NULL };
> static const char *const int_suffixes[] = { "", "L", "LL", "IMAX", NULL };
>
> static const char *const *const suffix_lists[] = {
>diff --git a/gcc/match.pd b/gcc/match.pd
>index c9c8478d286..ca01c6714d8 100644
>--- a/gcc/match.pd
>+++ b/gcc/match.pd
>@@ -8386,7 +8386,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> #if GIMPLE
> (match float16_value_p
>  @0
>- (if (TYPE_MAIN_VARIANT (TREE_TYPE (@0)) == float16_type_node)))
>+ (if ((TYPE_MAIN_VARIANT (TREE_TYPE (@0)) == float16_type_node) ||
>+      (TYPE_MAIN_VARIANT (TREE_TYPE (@0)) == bfloat16_type_node))))
> (for froms (BUILT_IN_TRUNCL BUILT_IN_TRUNC BUILT_IN_TRUNCF
>     BUILT_IN_FLOORL BUILT_IN_FLOOR BUILT_IN_FLOORF
>     BUILT_IN_CEILL BUILT_IN_CEIL BUILT_IN_CEILF
>@@ -8403,8 +8404,10 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   IFN_NEARBYINT IFN_NEARBYINT IFN_NEARBYINT
>   IFN_RINT IFN_RINT IFN_RINT
>   IFN_SQRT IFN_SQRT IFN_SQRT)
>- /* (_Float16) round ((doube) x) -> __built_in_roundf16 (x), etc.,
>-    if x is a _Float16.  */
>+ /* 1 (_Float16) round ((double) x) -> __builtin_roundf16 (x), etc.,
>+    if x is a _Float16.
>+    2 (__bf16) round ((double) x) -> __builtin_roundf16b (x), etc.,
>+    if x is a __bf16.  */
>  (simplify
>    (convert (froms (convert float16_value_p@0)))
>      (if (optimize
>diff --git a/gcc/tree.h b/gcc/tree.h
>index 5dcbb2fb5dd..67fc2a2e614 100644
>--- a/gcc/tree.h
>+++ b/gcc/tree.h
>@@ -310,7 +310,7 @@ code_helper::is_builtin_fn () const
> #define CASE_FLT_FN(FN) case FN: case FN##F: case FN##L
> #define CASE_FLT_FN_FLOATN_NX(FN)	   \
>   case FN##F16: case FN##F32: case FN##F64: case FN##F128: \
>-  case FN##F32X: case FN##F64X: case FN##F128X
>+  case FN##F32X: case FN##F64X: case FN##F128X: case FN##F16B
> #define CASE_FLT_FN_REENT(FN) case FN##_R: case FN##F_R: case FN##L_R
> #define CASE_INT_FN(FN) case FN: case FN##L: case FN##LL: case FN##IMAX
>
>diff --git a/libgcc/Makefile.in b/libgcc/Makefile.in
>index 0e46e9ef768..b71fd5e2250 100644
>--- a/libgcc/Makefile.in
>+++ b/libgcc/Makefile.in
>@@ -450,9 +450,9 @@ lib2funcs = _muldi3 _negdi2 _lshrdi3 _ashldi3 _ashrdi3 _cmpdi2 _ucmpdi2	   \
>     _negvsi2 _negvdi2 _ctors _ffssi2 _ffsdi2 _clz _clzsi2 _clzdi2  \
>     _ctzsi2 _ctzdi2 _popcount_tab _popcountsi2 _popcountdi2	   \
>     _paritysi2 _paritydi2 _powisf2 _powidf2 _powixf2 _powitf2	   \
>-	    _mulhc3 _mulsc3 _muldc3 _mulxc3 _multc3 _divhc3 _divsc3	   \
>-	    _divdc3 _divxc3 _divtc3 _bswapsi2 _bswapdi2 _clrsbsi2	   \
>-	    _clrsbdi2 _mulbitint3
>+	    _mulhc3 _mulbc3 _mulsc3 _muldc3 _mulxc3 _multc3 _divhc3	   \
>+	    _divbc3 _divsc3 _divdc3 _divxc3 _divtc3 _bswapsi2 _bswapdi2	   \
>+	    _clrsbsi2 _clrsbdi2 _mulbitint3
>
> # The floating-point conversion routines that involve a single-word integer.
> # XX stands for the integer mode.
>diff --git a/libgcc/libgcc2.c b/libgcc/libgcc2.c
>index 3fcb85c5b92..512ca92bfb9 100644
>--- a/libgcc/libgcc2.c
>+++ b/libgcc/libgcc2.c
>@@ -2591,6 +2591,7 @@ NAME (TYPE x, int m)
> #endif
>
>
> #if((defined(L_mulhc3) || defined(L_divhc3)) && LIBGCC2_HAS_HF_MODE) \
>+    || ((defined(L_mulbc3) || defined(L_divbc3)) && LIBGCC2_HAS_BF_MODE) \
>     || ((defined(L_mulsc3) || defined(L_divsc3)) && LIBGCC2_HAS_SF_MODE) \
>     || ((defined(L_muldc3) || defined(L_divdc3)) && LIBGCC2_HAS_DF_MODE) \
>     || ((defined(L_mulxc3) || defined(L_divxc3)) && LIBGCC2_HAS_XF_MODE) \
>@@ -2607,6 +2608,13 @@ NAME (TYPE x, int m)
> # define MODE	hc
> # define CEXT	__LIBGCC_HF_FUNC_EXT__
> # define NOTRUNC (!__LIBGCC_HF_EXCESS_PRECISION__)
>+#elif defined(L_mulbc3) || defined(L_divbc3)
>+# define MTYPE  BFtype
>+# define CTYPE  BCtype
>+# define AMTYPE SFtype
>+# define MODE   bc
>+# define CEXT   __LIBGCC_BF_FUNC_EXT__
>+# define NOTRUNC (!__LIBGCC_BF_EXCESS_PRECISION__)
> #elif defined(L_mulsc3) || defined(L_divsc3)
> # define MTYPE	SFtype
> # define CTYPE	SCtype
>@@ -2690,8 +2698,8 @@ extern void *compile_type_assert[sizeof(INFINITY) == sizeof(MTYPE) ? 1 : -1];
> # define TRUNC(x)	__asm__ ("" : "=m"(x) : "m"(x))
> #endif
>
>-#if defined(L_mulhc3) || defined(L_mulsc3) || defined(L_muldc3) \
>-    || defined(L_mulxc3) || defined(L_multc3)
>+#if defined(L_mulhc3) || defined(L_mulbc3) || defined(L_mulsc3) \
>+    || defined(L_muldc3) || defined(L_mulxc3) || defined(L_multc3)
>
> CTYPE
> CONCAT3(__mul,MODE,3) (MTYPE a, MTYPE b, MTYPE c, MTYPE d)
>@@ -2760,16 +2768,16 @@ CONCAT3(__mul,MODE,3) (MTYPE a, MTYPE b, MTYPE c, MTYPE d)
> }
> #endif /* complex multiply */
>
>-#if defined(L_divhc3) || defined(L_divsc3) || defined(L_divdc3) \
>-    || defined(L_divxc3) || defined(L_divtc3)
>+#if defined(L_divhc3) || defined(L_divbc3) || defined(L_divsc3) \
>+    || defined(L_divdc3) || defined(L_divxc3) || defined(L_divtc3)
>
> CTYPE
> CONCAT3(__div,MODE,3) (MTYPE a, MTYPE b, MTYPE c, MTYPE d)
> {
>-#if defined(L_divhc3)	\
>+#if (defined(L_divhc3) || defined(L_divbc3) )	\
>   || (defined(L_divsc3) && defined(__LIBGCC_HAVE_HWDBL__) )
>
>-  /* Half precision is handled with float precision.
>+  /* _Float16 and __bf16 are handled with float precision.
>      float is handled with double precision when double precision
>      hardware is available.
>      Due to the additional precision, the simple complex divide
>diff --git a/libgcc/libgcc2.h b/libgcc/libgcc2.h
>index b358b3a2b50..ee99badde86 100644
>--- a/libgcc/libgcc2.h
>+++ b/libgcc/libgcc2.h
>@@ -43,6 +43,12 @@ extern void __eprintf (const char *, const char *, unsigned int, const char *)
> #define LIBGCC2_HAS_HF_MODE 0
> #endif
>
>+#ifdef __LIBGCC_HAS_BF_MODE__
>+#define LIBGCC2_HAS_BF_MODE 1
>+#else
>+#define LIBGCC2_HAS_BF_MODE 0
>+#endif
>+
> #ifdef __LIBGCC_HAS_SF_MODE__
> #define LIBGCC2_HAS_SF_MODE 1
> #else
>@@ -146,6 +152,10 @@ typedef unsigned int UTItype	__attribute__ ((mode (TI)));
> typedef	float HFtype	__attribute__ ((mode (HF)));
> typedef _Complex float HCtype	__attribute__ ((mode (HC)));
> #endif
>+#if LIBGCC2_HAS_BF_MODE
>+typedef	float BFtype	__attribute__ ((mode (BF)));
>+typedef _Complex float BCtype	__attribute__ ((mode (BC)));
>+#endif
> #if LIBGCC2_HAS_SF_MODE
> typedef float SFtype	__attribute__ ((mode (SF)));
> typedef _Complex float SCtype	__attribute__ ((mode (SC)));
>@@ -465,6 +475,10 @@ extern SItype __negvsi2 (SItype);
> extern HCtype __divhc3 (HFtype, HFtype, HFtype, HFtype);
> extern HCtype __mulhc3 (HFtype, HFtype, HFtype, HFtype);
> #endif
>+#if LIBGCC2_HAS_BF_MODE
>+extern BCtype __divbc3 (BFtype, BFtype, BFtype, BFtype);
>+extern BCtype __mulbc3 (BFtype, BFtype, BFtype, BFtype);
>+#endif
> #if LIBGCC2_HAS_SF_MODE
> extern DWtype __fixsfdi (SFtype);
> extern SFtype __floatdisf (DWtype);
>--
>2.43.0
Thanks
Xiao Zeng
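
For readers following the libgcc2.c hunk: it sets MTYPE to BFtype but AMTYPE to SFtype, and the updated comment says that __bf16, like _Float16, is "handled with float precision". Below is a minimal sketch of that idea, not the libgcc implementation. It assumes bfloat16 can be emulated as the upper 16 bits of an IEEE binary32 value; bf16_bits, bf16_complex and every function name are hypothetical stand-ins so the code builds on any target, and NaN/infinity handling is left out.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for __bf16: the upper 16 bits of a binary32.  */
typedef uint16_t bf16_bits;

/* Hypothetical stand-in for a complex __bf16 value.  */
typedef struct { bf16_bits re, im; } bf16_complex;

/* Widening is exact: append 16 zero bits.  */
float bf16_to_float (bf16_bits h)
{
  uint32_t u = (uint32_t) h << 16;
  float f;
  memcpy (&f, &u, sizeof f);
  return f;
}

/* Narrow with round-to-nearest-even.  NaN inputs are not handled.  */
bf16_bits float_to_bf16 (float f)
{
  uint32_t u;
  memcpy (&u, &f, sizeof u);
  u += 0x7fffu + ((u >> 16) & 1u);
  return (bf16_bits) (u >> 16);
}

/* (a + ib) / (c + id) with the textbook formula.  All intermediates
   are float (the AMTYPE role); only the two results are narrowed.  */
bf16_complex bf16_complex_div (bf16_bits a, bf16_bits b,
                               bf16_bits c, bf16_bits d)
{
  float aa = bf16_to_float (a), bb = bf16_to_float (b);
  float cc = bf16_to_float (c), dd = bf16_to_float (d);
  float denom = cc * cc + dd * dd;
  bf16_complex r;
  r.re = float_to_bf16 ((aa * cc + bb * dd) / denom);
  r.im = float_to_bf16 ((bb * cc - aa * dd) / denom);
  return r;
}
```

As a usage example, dividing 3+1i by 1+1i (bit patterns 0x4040, 0x3f80, 0x3f80, 0x3f80) yields 2-1i, that is re == 0x4000 and im == 0xbf80.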

Jakub Jelinek Aug. 7, 2024, 7:16 a.m. UTC | #2

On Wed, Aug 07, 2024 at 11:13:51AM +0800, Xiao Zeng wrote:
> gcc/ChangeLog:
> 
> 	* builtin-types.def (BT_COMPLEX_BFLOAT16): Support BF16 node.
> 	(BT_BFLOAT16_PTR): Ditto.
> 	(BT_FN_BFLOAT16): New.
> 	(BT_FN_BFLOAT16_BFLOAT16): Ditto.
> 	(BT_FN_COMPLEX_BFLOAT16_COMPLEX_BFLOAT16): Ditto.
> 	(BT_FN_BFLOAT16_COMPLEX_BFLOAT16): Ditto.
> 	(BT_FN_INT_BFLOAT16): Ditto.
> 	(BT_FN_LONG_BFLOAT16): Ditto.
> 	(BT_FN_LONGLONG_BFLOAT16): Ditto.
> 	(BT_FN_BFLOAT16_BFLOAT16_BFLOAT16PTR): Ditto.
> 	(BT_FN_BFLOAT16_BFLOAT16_INT): Ditto.
> 	(BT_FN_BFLOAT16_BFLOAT16_INTPTR): Ditto.
> 	(BT_FN_BFLOAT16_BFLOAT16_LONG): Ditto.
> 	(BT_FN_COMPLEX_BFLOAT16_COMPLEX_BFLOAT16_COMPLEX_BFLOAT16): Ditto.
> 	(BT_FN_BFLOAT16_BFLOAT16_BFLOAT16_BFLOAT16): Ditto.
> 	(BT_FN_BFLOAT16_BFLOAT16_BFLOAT16_INTPTR): Ditto.
> 	* builtins.cc (expand_builtin_classify_type): Support BF16.
> 	(mathfn_built_in_2): Ditto.
> 	(CASE_MATHFN_FLOATN): Ditto.
> 	* builtins.def (DEF_GCC_FLOATN_NX_BUILTINS): Ditto.
> 	(DEF_EXT_LIB_FLOATN_NX_BUILTINS): Ditto.
> 	(BUILT_IN_NANSF16B): Remove redundant definition; now covered by
> 	the general FLOATN_NX processing.
> 	(BUILT_IN_NEXTAFTERF16B): Ditto.
> 	* fold-const-call.cc (fold_const_call): Ditto.
> 	(fold_const_call_sss): Ditto.
> 	* gencfn-macros.cc: Support BF16.
> 	* match.pd: Like FP16, add optimization for BF16.
> 	* tree.h (CASE_FLT_FN_FLOATN_NX): Support BF16.
> 
> gcc/c-family/ChangeLog:
> 
> 	* c-cppbuiltin.cc (c_cpp_builtins): Modify suffix names to avoid
> 	conflicts.
> 
> libgcc/ChangeLog:
> 
> 	* Makefile.in: Add _mulbc3 and _divbc3.
> 	* libgcc2.c (if): Ditto.
> 	(defined): Ditto.
> 	(MTYPE): Macros defined for BF16.
> 	(CTYPE): Ditto.
> 	(AMTYPE): Ditto.
> 	(MODE): Ditto.
> 	(CEXT): Ditto.
> 	(NOTRUNC): Ditto.
> 	* libgcc2.h (LIBGCC2_HAS_BF_MODE): Support BF16.
> 	(__attribute__): Ditto.
> 	(__divbc3): Add __divbc3 declaration.
> 	(__mulbc3): Add __mulbc3 declaration.
> 
> Signed-off-by: Xiao Zeng <zengxiao@eswincomputing.com>

This looks all wrong to me.

On all the other targets that already support the __bf16 type it is a
storage-only type, so all arithmetic on it is expected to be done in float,
not in __bf16.
Therefore, those targets really don't want any of those other builtins,
there will be no libm support for it, and they don't want support in libgcc
either, that is just wasted code.
Intentionally the only builtins provided are the minimum required for proper
C++23 support, __builtin_nansf16b and __builtin_nextafterf16b, because
those need to be constexpr friendly and can't be dealt with by extending to
float and using float builtins.
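
To illustrate the nextafter point, here is a minimal sketch, assuming bfloat16 is emulated as the upper 16 bits of an IEEE binary32 value. The type bf16_bits and all function names are hypothetical, not GCC builtins, and only positive finite inputs stepping upward are handled. Stepping by one float ulp and narrowing back rounds to the starting value, whereas a real bf16 step moves by one bf16 ulp.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for __bf16: the upper 16 bits of a binary32.  */
typedef uint16_t bf16_bits;

/* Widening is exact: append 16 zero bits.  */
float bf16_to_float (bf16_bits h)
{
  uint32_t u = (uint32_t) h << 16;
  float f;
  memcpy (&f, &u, sizeof f);
  return f;
}

/* Narrow with round-to-nearest-even.  NaN inputs are not handled.  */
bf16_bits float_to_bf16 (float f)
{
  uint32_t u;
  memcpy (&u, &f, sizeof u);
  u += 0x7fffu + ((u >> 16) & 1u);
  return (bf16_bits) (u >> 16);
}

/* Next binary32 value above a positive finite F.  This mirrors what a
   float nextafter does in that case, written out to avoid libm.  */
float float_next_up (float f)
{
  uint32_t u;
  memcpy (&u, &f, sizeof u);
  u += 1u;
  memcpy (&f, &u, sizeof f);
  return f;
}

/* Widen, step in float, narrow: the step is lost in rounding.  */
bf16_bits bf16_next_up_via_float (bf16_bits x)
{
  return float_to_bf16 (float_next_up (bf16_to_float (x)));
}

/* Step directly in the bf16 encoding (positive finite X only).  */
bf16_bits bf16_next_up (bf16_bits x)
{
  return (bf16_bits) (x + 1u);
}
```

For x = 1.0 (bit pattern 0x3f80), the float route returns 0x3f80 again, while the direct step returns 0x3f81, which is 1.0078125.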

So, if riscv wants something different (will there be e.g. any libm
implementation with all the __bf16 APIs though?), it should ask for it some way
(target hook or whatever) and only in that case it should enable the other
builtins, libgcc APIs etc.

	Jakub

Jeff Law Aug. 7, 2024, 2:46 p.m. UTC | #3

On 8/7/24 1:16 AM, Jakub Jelinek wrote:

> 
> This looks all wrong to me.
> 
> On all the other targets that already do support __bf16 type it is a storage
> only type, so all arithmetics on it is expected to be done on float, not in
> __bf16.
RISC-V has (via extensions) degrees of arithmetic/conversion support, so 
for example it can do a multiply-add of bf16 operands widening to float.

> 
> So, if riscv wants something different (will there be e.g. any libm
> implementation with all the __bf16 APIs though?), it should ask for it some way
> (target hook or whatever) and only in that case it should enable the other
> builtins, libgcc APIs etc.
ISTM for the limited cases where we want native bf16 support we could 
just have target specific builtins.

I'm not sure what the motivation behind trying to support the richer
set of operations really is.  So perhaps Xiao could start by
explaining why this is important.

jeff

Jakub Jelinek Aug. 7, 2024, 2:55 p.m. UTC | #4

On Wed, Aug 07, 2024 at 08:46:11AM -0600, Jeff Law wrote:
> 
> 
> On 8/7/24 1:16 AM, Jakub Jelinek wrote:
> 
> > 
> > This looks all wrong to me.
> > 
> > On all the other targets that already do support __bf16 type it is a storage
> > only type, so all arithmetics on it is expected to be done on float, not in
> > __bf16.
> RISC-V has (via extensions) degrees of arithmetic/conversion support, so for
> example it can do a multiply-add of bf16 operands widening to float.

Even the __builtin_*f16 _Float16 builtins are mostly unused (at least on
other targets), but there those functions are at least part of C23, even
when they are really not implemented yet in libm (at least talking about
glibc, but I doubt other C libraries are any further than that).
For __bf16, the only standard required stuff is in C++23 and the provided
builtins are whatever was necessary for that.

I understand RISC-V has via extensions more full _Float16 and __bf16
support, but if it needs further builtins, the questions are:
1) should they be enabled on all arches or just on those that need them?
2) is there plan to add libm support for __bf16, even when it is
non-standard in C (especially if we don't know if C2y or newer will or won't
add support for it and if it will use the chosen suffixes or some others)?
3) is there plan to add variants for C++23 <cmath> and <complex> etc.
to handle _Float16 and __bf16 differently?  Currently those types are just
handled by doing as much as possible on float, using its builtins

	Jakub

Jeff Law Aug. 7, 2024, 3:15 p.m. UTC | #5

On 8/7/24 8:55 AM, Jakub Jelinek wrote:
> On Wed, Aug 07, 2024 at 08:46:11AM -0600, Jeff Law wrote:
>>
>>
>> On 8/7/24 1:16 AM, Jakub Jelinek wrote:
>>
>>>
>>> This looks all wrong to me.
>>>
>>> On all the other targets that already do support __bf16 type it is a storage
>>> only type, so all arithmetics on it is expected to be done on float, not in
>>> __bf16.
>> RISC-V has (via extensions) degrees of arithmetic/conversion support, so for
>> example it can do a multiply-add of bf16 operands widening to float.
> 
> Even the __builtin_*f16 _Float16 builtins are mostly unused (at least on
> other targets), but there those functions are at least part of C23, even
> when they are really not implemented yet in libm (at least talking about
> glibc, but I doubt other C libraries are any further than that).
> For __bf16, the only standard required stuff is in C++23 and the provided
> builtins are whatever was necessary for that.
> 
> I understand RISC-V has via extensions more full _Float16 and __bf16
> support, but if it needs further builtins, the questions are:
> 1) should they be enabled on all arches or just on those that need them?
I'd tend to take a wait and see approach, meaning start with them as
target builtins and promote them to generic builtins if we see other
targets implementing a richer set of bf16 operations.

> 2) is there plan to add libm support for __bf16, even when it is
> non-standard in C (especially if we don't know if C2y or newer will or won't
> add support for it and if it will use the chosen suffixes or some others)?
> 3) is there plan to add variants for C++23 <cmath> and <complex> etc.
> to handle _Float16 and __bf16 differently?  Currently those types are just
> handled by doing as much as possible on float, using its builtins
I have no idea on either of these questions.

jeff

Xiao Zeng Aug. 13, 2024, 3:14 a.m. UTC | #6

2024-08-07 23:15  Jeff Law <jeffreyalaw@gmail.com> wrote:
>
>
>
>On 8/7/24 8:55 AM, Jakub Jelinek wrote:
>> On Wed, Aug 07, 2024 at 08:46:11AM -0600, Jeff Law wrote:
>>>
>>>
>>> On 8/7/24 1:16 AM, Jakub Jelinek wrote:
>>>
>>>>
>>>> This looks all wrong to me.
>>>>
>>>> On all the other targets that already do support __bf16 type it is a storage
>>>> only type, so all arithmetics on it is expected to be done on float, not in
>>>> __bf16.
>>> RISC-V has (via extensions) degrees of arithmetic/conversion support, so for
>>> example it can do a multiply-add of bf16 operands widening to float.
>>
>> Even the __builtin_*f16 _Float16 builtins are mostly unused (at least on
>> other targets), but there those functions are at least part of C23, even
>> when they are really not implemented yet in libm (at least talking about
>> glibc, but I doubt other C libraries are any further than that).
>> For __bf16, the only standard required stuff is in C++23 and the provided
>> builtins are whatever was necessary for that.
>>
>> I understand RISC-V has via extensions more full _Float16 and __bf16
>> support, but if it needs further builtins, the questions are:
>> 1) should they be enabled on all arches or just on those that need them?
>I'd tend to take a wait and see approach, meaning start with them as
>target builtins and promote them to generic builtins if we see other
>targets implementing a richer set of bf16 operations.
>
>> 2) is there plan to add libm support for __bf16, even when it is
>> non-standard in C (especially if we don't know if C2y or newer will or won't
>> add support for it and if it will use the chosen suffixes or some others)?
>> 3) is there plan to add variants for C++23 <cmath> and <complex> etc.
>> to handle _Float16 and __bf16 differently?  Currently those types are just
>> handled by doing as much as possible on float, using its builtins
>I have no idea on either of these questions.
>
>jeff

Thank you very much for the in-depth discussion between Jakub Jelinek and jeff.
My knowledge is narrow, and I am not familiar with architectures other than RISCV.
At the same time, my understanding of libraries such as libc and libm is also shallow.

I spent some time sorting out my thoughts, which resulted in slow email replies. I am very sorry.

1 BF16 is a 16 bit floating-point data type that differs only in encoding from FP16, but is otherwise the same.

2 BF16 can be used by any architecture, just like FP16.

3 libgcc provides interface functions related to floating-point types, such as __mulsc3/__divsc3.

4 There is a test case:
----------------------------------------------------------------------------------------
typedef _Complex float __cbf16 __attribute__((__mode__(__BC__)));
__cbf16 cbf16;
__cbf16 cbf16_1;
__cbf16 cbf16_2;
__cbf16 cbf16_mul_cbf16() { return cbf16 = cbf16_1 * cbf16_2; }
__cbf16 cbf16_div_cbf16() { return cbf16 = cbf16_1 / cbf16_2; }
----------------------------------------------------------------------------------------

4.1 Riscv architecture, -march=rv64imafdcv_zvfh -mabi=lp64d -O2. After compilation, the resulting assembly will include:
----------------------------------------------------------------------------------------
call	__mulbc3
call	__divbc3
----------------------------------------------------------------------------------------
Due to the absence of the __mulbc3/__divbc3 interface in libgcc, this can result in link errors.

4.2 Riscv architecture, -march=rv64imafdcv -mabi=lp64d -O2. After compilation, the resulting assembly will include:
----------------------------------------------------------------------------------------
call	__mulsc3
call	__divsc3
----------------------------------------------------------------------------------------
Due to the presence of the __mulsc3/__divsc3 interface in libgcc, it can be linked normally.

4.3 x86_64 architecture, the results obtained after testing are the same as the Riscv architecture in 4.2, that is:
----------------------------------------------------------------------------------------
a) bf16 -> fp32
b) calls the corresponding complex interfaces __mulsc3/__divsc3
----------------------------------------------------------------------------------------
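
As an illustration of the "bf16 -> fp32, then the float complex routines"
path seen in 4.2 and 4.3, the following is a minimal sketch in portable C.
It is not the libgcc implementation: bf16_bits, cbf16_bits and all helper
names are hypothetical, and the bf16 values are emulated as raw 16-bit
patterns.  The product and quotient are plain float _Complex operations,
for which GCC typically emits the __mulsc3/__divsc3 libcalls.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint16_t bf16_bits;                       /* hypothetical bf16 storage */
typedef struct { bf16_bits re, im; } cbf16_bits;  /* hypothetical complex bf16 */

/* Widen: a bf16 value is the upper 16 bits of a binary32 float. */
float bf16_to_float(bf16_bits h)
{
    uint32_t u = (uint32_t)h << 16;
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Narrow with round-to-nearest, ties-to-even (simplified NaN handling). */
bf16_bits float_to_bf16(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    if ((u & 0x7fffffffu) > 0x7f800000u)
        return (bf16_bits)((u >> 16) | 0x0040u);
    u += 0x7fffu + ((u >> 16) & 1u);
    return (bf16_bits)(u >> 16);
}

/* Widen both parts; a complex float has the layout of float[2] (C99). */
float _Complex cbf16_widen(cbf16_bits z)
{
    float parts[2] = { bf16_to_float(z.re), bf16_to_float(z.im) };
    float _Complex w;
    assert(sizeof w == sizeof parts);
    memcpy(&w, parts, sizeof w);
    return w;
}

/* Narrow both parts back to bf16 storage. */
cbf16_bits cbf16_narrow(float _Complex w)
{
    float parts[2];
    cbf16_bits z;
    memcpy(parts, &w, sizeof parts);
    z.re = float_to_bf16(parts[0]);
    z.im = float_to_bf16(parts[1]);
    return z;
}

/* Complex multiply and divide carried out entirely in float precision. */
cbf16_bits cbf16_mul(cbf16_bits a, cbf16_bits b)
{
    return cbf16_narrow(cbf16_widen(a) * cbf16_widen(b));
}

cbf16_bits cbf16_div(cbf16_bits a, cbf16_bits b)
{
    return cbf16_narrow(cbf16_widen(a) / cbf16_widen(b));
}
```

With this approach no BCmode-specific routine is needed; only the
conversions between bf16 and float are.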

At the beginning, I had planned to only add the __mulbc3/__divbc3 interface in libgcc.
After exploration, it was found that libgcc already has a complete infrastructure, and adding
only the __mulbc3/__divbc3 interfaces would cause a lot of trouble.
In this context, it was decided to add a new data type BF16 to the infrastructure of libgcc, similar to FP16.

Perhaps I can get some suggestions to complete the addition of __mulbc3/__divbc3 and eliminate errors when linking.

Thanks
Xiao Zeng

Jakub Jelinek Aug. 13, 2024, 7:53 a.m. UTC | #7

On Tue, Aug 13, 2024 at 11:14:47AM +0800, Xiao Zeng wrote:
> Thank you very much for the in-depth discussion between Jakub Jelinek and jeff.
> My knowledge is narrow, and I am not familiar with architectures other than RISCV.
> At the same time, my understanding of libraries such as libc and libm is also shallow.
> 
> I spent some time sorting out my thoughts, which resulted in slow email replies. I am very sorry.

The important thing is that the current state of BF16 support on other
architectures is what we want there, not more.  So any changes done for
RISCV shouldn't affect the other architectures, that wasn't the case of
the patch you've posted.
E.g. on x86_64, for FP16 we have:
__divhc3@@GCC_12.0.0
__eqhf2@@GCC_12.0.0
__extendhfdf2@@GCC_12.0.0
__extendhfsf2@@GCC_12.0.0
__extendhftf2@@GCC_12.0.0
__extendhfxf2@@GCC_12.0.0
__fixhfti@@GCC_12.0.0
__fixunshfti@@GCC_12.0.0
__floatbitinthf@@GCC_14.0.0
__floattihf@@GCC_12.0.0
__floatuntihf@@GCC_12.0.0
__mulhc3@@GCC_12.0.0
__nehf2@@GCC_12.0.0
__truncdfhf2@@GCC_12.0.0
__trunchfbf2@@GCC_13.0.0
__truncsfhf2@@GCC_12.0.0
__trunctfhf2@@GCC_12.0.0
__truncxfhf2@@GCC_12.0.0
exported from libgcc, while for BF16 just:
__extendbfsf2@@GCC_13.0.0
__floatbitintbf@@GCC_14.0.0
__floattibf@@GCC_13.0.0
__floatuntibf@@GCC_13.0.0
__truncdfbf2@@GCC_13.0.0
__trunchfbf2@@GCC_13.0.0
__truncsfbf2@@GCC_13.0.0
__trunctfbf2@@GCC_13.0.0
__truncxfbf2@@GCC_13.0.0
More attention has been paid to what we actually need there, which is
primarily conversions to/from other types (but not even to all of them), with
some changes on the RTL expression lowering side to make sure we use the
SFmode arithmetics as much as possible and only have the really required
stuff on the libgcc side.
We don't want to change that, if you really need __mulbc3/__divbc3 on RISCV,
then it should be added for that arch only.  And similarly, the choice
of the builtins on the compiler side, the two builtins we have right now is
all we want on the other arches.  So, further builtins would be either a
matter of RISCV specific builtins, or in generic code but guarded by some
target hook so that they aren't enabled on arches which don't want them.
On the libstdc++ side, the current headers provide for std::bfloat16_t and
std::float16_t an implementation which uses SFmode calculations where
possible, so stuff like:
  constexpr _Float16
  acos(_Float16 __x)
  { return _Float16(__builtin_acosf(__x)); }
or
  constexpr __gnu_cxx::__bfloat16_t
  acos(__gnu_cxx::__bfloat16_t __x)
  { return __gnu_cxx::__bfloat16_t(__builtin_acosf(__x)); }
And for printing, note there is
_ZSt20__to_chars_float16_tPcS_fSt12chars_format@@GLIBCXX_3.4.31
_ZSt21__to_chars_bfloat16_tPcS_fSt12chars_format@@GLIBCXX_3.4.31
_ZSt22__from_chars_float16_tPKcS0_RfSt12chars_format@@GLIBCXX_3.4.31
_ZSt23__from_chars_bfloat16_tPKcS0_RfSt12chars_format@@GLIBCXX_3.4.31
which input and output _Float16 and __bf16, but in the parameter passing
they expect those types to be promoted to float, so that the ABIs aren't
dependent on when a particular arch enables those types.

For RISCV, the things to consider are, what is the _Float16 and __bf16
function argument passing/returning ABI?  Is the type enabled on all
variants of RISCV, or just some (e.g. regarding _Float16 and __bf16
on i686-linux, there is support for it only if the SSE2 ISA is available,
so e.g. the *[hb][fc]* functions in libgcc need to be compiled with
-msse2 extra flag)?  If it can be passed/returned the same in all ABIs,
what excess precision mode do you want to use on them?  I mean e.g. the
TARGET_C_EXCESS_PRECISION target hook.  On e.g. x86_64, the default
is to promote all _Float16 and __bf16 calculations to float, so if you have
__bf16 a, b, c, d, e;
...
a = b * c + d - e + c * d;
all variables are converted to SFmode temporaries and all the arithmetics
is done in SFmode and only then at the end finally converted to HFmode
or BFmode.  One can request a different mode, -fexcess-precision=16
in which such promotion isn't done, but as there is no hw support for
most of the operations, the actual multiplication, addition or subtraction
is still done in SFmode, just there is a conversion to BFmode after each
operation (so slower, but more precise).
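
The difference between the two modes can be sketched in portable C.  This
is an illustration only, not GCC code: bf16 values are emulated as raw
16-bit patterns, and bf16_bits, eval_promoted and eval_rounded_each_op are
hypothetical names.  The first function narrows once at the end, the
second narrows after every operation, and the results can differ.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint16_t bf16_bits;   /* hypothetical stand-in for __bf16 storage */

/* Widen: a bf16 value is the upper 16 bits of a binary32 float. */
float bf16_to_float(bf16_bits h)
{
    uint32_t u = (uint32_t)h << 16;
    float f;
    assert(sizeof f == sizeof u);   /* assumption: float is binary32 */
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Narrow with round-to-nearest, ties-to-even (simplified NaN handling). */
bf16_bits float_to_bf16(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    if ((u & 0x7fffffffu) > 0x7f800000u)
        return (bf16_bits)((u >> 16) | 0x0040u);
    u += 0x7fffu + ((u >> 16) & 1u);
    return (bf16_bits)(u >> 16);
}

/* Round a float intermediate to bf16 precision and widen it again. */
float round_to_bf16(float f)
{
    return bf16_to_float(float_to_bf16(f));
}

/* a = b * c + d - e + c * d, all in float, one conversion at the end. */
bf16_bits eval_promoted(bf16_bits b, bf16_bits c, bf16_bits d, bf16_bits e)
{
    float fb = bf16_to_float(b), fc = bf16_to_float(c);
    float fd = bf16_to_float(d), fe = bf16_to_float(e);
    return float_to_bf16(fb * fc + fd - fe + fc * fd);
}

/* Same expression, but every operation is rounded back to bf16. */
bf16_bits eval_rounded_each_op(bf16_bits b, bf16_bits c, bf16_bits d, bf16_bits e)
{
    float fb = bf16_to_float(b), fc = bf16_to_float(c);
    float fd = bf16_to_float(d), fe = bf16_to_float(e);
    float t = round_to_bf16(fb * fc);
    t = round_to_bf16(t + fd);
    t = round_to_bf16(t - fe);
    t = round_to_bf16(t + round_to_bf16(fc * fd));
    return float_to_bf16(t);
}
```

For b = c = 1 + 2^-7 (0x3F81), d = 0 and e = 1 + 2^-6 (0x3F82), the
promoted evaluation keeps the 2^-14 term of the product, while rounding
after each operation loses it and returns zero.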
If you still want to export __divbc3 and __mulbc3, do you want to export
those just on some RISCV ABI variants or all of them?  Depending on that,
arrange for those to be compiled just for those; and, if it is exported
from libgcc_s.so.1, you also need to add a symbol version for those, likely
GCC_15.0.0.

For enabling just those 2 functions, I don't think you need any changes on
the builtins.def etc. side, those aren't builtins but libcalls.

If you need other libgcc calls, similar questions to above apply, but please
don't add them just because you can, but only if you really need them (they
can't be handled in hw instructions and promotion to SFmode and conversion
afterwards is undesirable and you actually have code that proves it emits
those calls).  Again, they should be only enabled on arches which ask for it
(and/or sub-ABIs) and they need the symbol version stuff resolved.

	Jakub

Xiao Zeng Aug. 15, 2024, 9:05 a.m. UTC | #8

2024-08-13 15:53  Jakub Jelinek <jakub@redhat.com> wrote:
>
>On Tue, Aug 13, 2024 at 11:14:47AM +0800, Xiao Zeng wrote:
>> Thank you very much for the in-depth discussion between Jakub Jelinek and jeff.
>> My knowledge is narrow, and I am not familiar with architectures other than RISCV.
>> At the same time, my understanding of libraries such as libc and libm is also shallow.
>>
>> I spent some time sorting out my thoughts, which resulted in slow email replies. I am very sorry.
>
>The important thing is that the current state of BF16 support on other
>architectures is what we want there, not more.  So any changes done for
>RISCV shouldn't affect the other architectures, that wasn't the case of
>the patch you've posted.
>E.g. on x86_64, for FP16 we have:
>__divhc3@@GCC_12.0.0
>__eqhf2@@GCC_12.0.0
>__extendhfdf2@@GCC_12.0.0
>__extendhfsf2@@GCC_12.0.0
>__extendhftf2@@GCC_12.0.0
>__extendhfxf2@@GCC_12.0.0
>__fixhfti@@GCC_12.0.0
>__fixunshfti@@GCC_12.0.0
>__floatbitinthf@@GCC_14.0.0
>__floattihf@@GCC_12.0.0
>__floatuntihf@@GCC_12.0.0
>__mulhc3@@GCC_12.0.0
>__nehf2@@GCC_12.0.0
>__truncdfhf2@@GCC_12.0.0
>__trunchfbf2@@GCC_13.0.0
>__truncsfhf2@@GCC_12.0.0
>__trunctfhf2@@GCC_12.0.0
>__truncxfhf2@@GCC_12.0.0
>exported from libgcc, while for BF16 just:
>__extendbfsf2@@GCC_13.0.0
>__floatbitintbf@@GCC_14.0.0
>__floattibf@@GCC_13.0.0
>__floatuntibf@@GCC_13.0.0
>__truncdfbf2@@GCC_13.0.0
>__trunchfbf2@@GCC_13.0.0
>__truncsfbf2@@GCC_13.0.0
>__trunctfbf2@@GCC_13.0.0
>__truncxfbf2@@GCC_13.0.0
>More attention has been paid to what we actually need there, which is
>primarily conversions to/from other types (but not even to all of them), with
>some changes on the RTL expression lowering side to make sure we use the
>SFmode arithmetics as much as possible and only have the really required
>stuff on the libgcc side.
>We don't want to change that, if you really need __mulbc3/__divbc3 on RISCV,
>then it should be added for that arch only.  And similarly, the choice
>of the builtins on the compiler side, the two builtins we have right now is
>all we want on the other arches.  So, further builtins would be either a
>matter of RISCV specific builtins, or in generic code but guarded by some
>target hook so that they aren't enabled on arches which don't want them.
>On the libstdc++ side, the current headers provide for std::bfloat16_t and
>std::float16_t an implementation which uses SFmode calculations where
>possible, so stuff like:
>  constexpr _Float16
>  acos(_Float16 __x)
>  { return _Float16(__builtin_acosf(__x)); }
>or
>  constexpr __gnu_cxx::__bfloat16_t
>  acos(__gnu_cxx::__bfloat16_t __x)
>  { return __gnu_cxx::__bfloat16_t(__builtin_acosf(__x)); }
>And for printing, note there is
>_ZSt20__to_chars_float16_tPcS_fSt12chars_format@@GLIBCXX_3.4.31
>_ZSt21__to_chars_bfloat16_tPcS_fSt12chars_format@@GLIBCXX_3.4.31
>_ZSt22__from_chars_float16_tPKcS0_RfSt12chars_format@@GLIBCXX_3.4.31
>_ZSt23__from_chars_bfloat16_tPKcS0_RfSt12chars_format@@GLIBCXX_3.4.31
>which input and output _Float16 and __bf16, but in the parameter passing
>they expect those types to be promoted to float, so that the ABIs aren't
>dependent on when a particular arch enables those types.
>
>For RISCV, the things to consider are, what is the _Float16 and __bf16
>function argument passing/returning ABI?  Is the type enabled on all
>variants of RISCV, or just some (e.g. regarding _Float16 and __bf16
>on i686-linux, there is support for it only if the SSE2 ISA is available,
>so e.g. the *[hb][fc]* functions in libgcc need to be compiled with
>-msse2 extra flag)?  If it can be passed/returned the same in all ABIs,
>what excess precision mode do you want to use on them?  I mean e.g. the
>TARGET_C_EXCESS_PRECISION target hook.  On e.g. x86_64, the default
>is to promote all _Float16 and __bf16 calculations to float, so if you have
>__bf16 a, b, c, d, e;
>...
>a = b * c + d - e + c * d;
>all variables are converted to SFmode temporaries and all the arithmetics
>is done in SFmode and only then at the end finally converted to HFmode
>or BFmode.  One can request a different mode, -fexcess-precision=16
>in which such promotion isn't done, but as there is no hw support for
>most of the operations, the actual multiplication, addition or subtraction
>is still done in SFmode, just there is a conversion to BFmode after each
>operation (so slower, but more precise).
>If you still want to export __divbc3 and __mulbc3, do you want to export
>those just on some RISCV ABI variants or all of them?  Depending on that,
>arrange for those to be compiled just for those; and, if it is exported
>from libgcc_s.so.1, you also need to add a symbol version for those, likely
>GCC_15.0.0.
>
>For enabling just those 2 functions, I don't think you need any changes on
>the builtins.def etc. side, those aren't builtins but libcalls.
>
>If you need other libgcc calls, similar questions to above apply, but please
>don't add them just because you can, but only if you really need them (they
>can't be handled in hw instructions and promotion to SFmode and conversion
>afterwards is undesirable and you actually have code that proves it emits
>those calls).  Again, they should be only enabled on arches which ask for it
>(and/or sub-ABIs) and they need the symbol version stuff resolved.
>
>	Jakub 
Thank you, Jakub, for the detailed analysis of this issue.

It raised issues that I had not considered before, such as
symbol versions, the impact on all architectures, RISC-V architecture variants, and so on.

Your analysis has expanded my knowledge, and I will seek better solutions to this problem in my free time.

Thank you again, Jakub.

Thanks
Xiao Zeng
