
[1/2] builtins: Don't expand bit query builtins for __int128_t if the target supports an optab for it

Message ID 20240816213559.1486438-1-quic_apinski@quicinc.com
State New
Series [1/2] builtins: Don't expand bit query builtins for __int128_t if the target supports an optab for it

Commit Message

Andrew Pinski Aug. 16, 2024, 9:35 p.m. UTC
On aarch64 without CSSC instructions, popcount is implemented using the SIMD instruction cnt. For a
128-bit value it is better to use one 128-bit cnt (V16QI mode) instead of two 64-bit cnts (V8QI mode),
which also needs only one reduction addition instead of two. Currently fold_builtin_bit_query always
expands without checking whether there is an optab for the type; this patch adds that check so the
backend can handle the operation when it supports it.
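As a rough source-level illustration of the generic expansion described above, the sketch below computes the popcount of an `unsigned __int128` as the sum of two 64-bit `__builtin_popcountll` calls. That is the two-cnt, two-reduction shape the patch avoids when the target has a 128-bit popcount pattern. The helper name `popcount_u128_split` is hypothetical; this is not GCC's internal code, and it assumes a GCC or Clang target that provides `unsigned __int128` and a 64-bit `unsigned long long`.

```cpp
// Hypothetical helper, for illustration only: popcount of a 128-bit value
// computed as the sum of two 64-bit popcounts, mirroring the shape of the
// generic "two long long builtins" expansion.
inline int popcount_u128_split(unsigned __int128 x)
{
  unsigned long long lo = static_cast<unsigned long long>(x);
  unsigned long long hi = static_cast<unsigned long long>(x >> 64);
  // Per the commit message, on aarch64 without CSSC each of these 64-bit
  // counts becomes a V8QI cnt followed by its own reduction addition.
  return __builtin_popcountll(lo) + __builtin_popcountll(hi);
}
```

A single 128-bit cnt (V16QI) with one reduction computes the same result, which is why deferring to the target's optab is preferable when it exists.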

Bootstrapped and tested on x86_64-linux-gnu and built and tested for aarch64-linux-gnu.

gcc/ChangeLog:

	* builtins.cc (fold_builtin_bit_query): Don't expand double
	`unsigned long long` types if there is an optab entry for that
	type.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
---
 gcc/builtins.cc | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Comments

Richard Sandiford Aug. 20, 2024, 4:45 p.m. UTC | #1
Andrew Pinski <quic_apinski@quicinc.com> writes:
> On aarch64 without CSSC instructions, popcount is implemented using the SIMD instruction cnt. For a
> 128-bit value it is better to use one 128-bit cnt (V16QI mode) instead of two 64-bit cnts (V8QI mode),
> which also needs only one reduction addition instead of two. Currently fold_builtin_bit_query always
> expands without checking whether there is an optab for the type; this patch adds that check so the
> backend can handle the operation when it supports it.
>
> Bootstrapped and tested on x86_64-linux-gnu and built and tested for aarch64-linux-gnu.
>
> gcc/ChangeLog:
>
> 	* builtins.cc (fold_builtin_bit_query): Don't expand double
> 	`unsigned long long` types if there is an optab entry for that
> 	type.

OK.  The logic in the function seems a bit twisty (the same condition
is checked later), but all my attempts to improve it only made it worse.

Thanks,
Richard

Andrew Pinski Aug. 21, 2024, 12:22 a.m. UTC | #2
On Tue, Aug 20, 2024 at 9:46 AM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> Andrew Pinski <quic_apinski@quicinc.com> writes:
> > On aarch64 without CSSC instructions, popcount is implemented using the SIMD instruction cnt. For a
> > 128-bit value it is better to use one 128-bit cnt (V16QI mode) instead of two 64-bit cnts (V8QI mode),
> > which also needs only one reduction addition instead of two. Currently fold_builtin_bit_query always
> > expands without checking whether there is an optab for the type; this patch adds that check so the
> > backend can handle the operation when it supports it.
> >
> > Bootstrapped and tested on x86_64-linux-gnu and built and tested for aarch64-linux-gnu.
> >
> > gcc/ChangeLog:
> >
> >       * builtins.cc (fold_builtin_bit_query): Don't expand double
> >       `unsigned long long` types if there is an optab entry for that
> >       type.
>
> OK.  The logic in the function seems a bit twisty (the same condition
> is checked later), but all my attempts to improve it only made it worse.

I also looked for a good refactoring here but didn't
find one either.
Anyway, I have now pushed it as
r15-3056-g50b5000a5e430aaf99a5e00465cc9e25563d908b .

Thanks,
Andrew


Patch

diff --git a/gcc/builtins.cc b/gcc/builtins.cc
index 0b902896ddd..b4d51eaeba5 100644
--- a/gcc/builtins.cc
+++ b/gcc/builtins.cc
@@ -10185,7 +10185,9 @@ fold_builtin_bit_query (location_t loc, enum built_in_function fcode,
   tree call = NULL_TREE, tem;
   if (TYPE_PRECISION (arg0_type) == MAX_FIXED_MODE_SIZE
       && (TYPE_PRECISION (arg0_type)
-	  == 2 * TYPE_PRECISION (long_long_unsigned_type_node)))
+	  == 2 * TYPE_PRECISION (long_long_unsigned_type_node))
+      /* If the target supports the optab, then don't do the expansion. */
+      && !direct_internal_fn_supported_p (ifn, arg0_type, OPTIMIZE_FOR_BOTH))
     {
       /* __int128 expansions using up to 2 long long builtins.  */
       arg0 = save_expr (arg0);
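The context comment in the hunk refers to `__int128` expansions "using up to 2 long long builtins". The sketch below models that idea at source level for count-leading and count-trailing zeros, where the result comes from one half unless that half is zero. It also models the guard the patch changes, with a hypothetical `target_has_optab` flag standing in for `direct_internal_fn_supported_p (ifn, arg0_type, OPTIMIZE_FOR_BOTH)`. All names here are illustrative assumptions, not GCC API, and this is not necessarily the exact tree GCC builds; it assumes `unsigned __int128` support and a 64-bit `unsigned long long`.

```cpp
#include <cassert>

// Hypothetical model of the guard in fold_builtin_bit_query after the patch:
// split into two 64-bit builtins only when the operand precision equals
// MAX_FIXED_MODE_SIZE, is exactly twice that of unsigned long long, and the
// target has no direct optab for the operation at that width.
enum class Strategy { SplitIntoTwoHalves, NoSplit };

inline Strategy choose_strategy(unsigned precision,
                                unsigned max_fixed_mode_size,
                                unsigned long_long_precision,
                                bool target_has_optab)
{
  if (precision == max_fixed_mode_size
      && precision == 2 * long_long_precision
      && !target_has_optab)      // the condition added by the patch
    return Strategy::SplitIntoTwoHalves;
  return Strategy::NoSplit;      // handled by the rest of the function
}

// Count leading zeros of a nonzero 128-bit value with 64-bit builtins:
// a nonzero high half decides the result, otherwise add 64.
inline int clz_u128_split(unsigned __int128 x)
{
  assert(x != 0 && "undefined for zero, like __builtin_clzll");
  unsigned long long hi = static_cast<unsigned long long>(x >> 64);
  unsigned long long lo = static_cast<unsigned long long>(x);
  return hi != 0 ? __builtin_clzll(hi) : 64 + __builtin_clzll(lo);
}

// Count trailing zeros of a nonzero 128-bit value: a nonzero low half
// decides the result, otherwise add 64.
inline int ctz_u128_split(unsigned __int128 x)
{
  assert(x != 0 && "undefined for zero, like __builtin_ctzll");
  unsigned long long hi = static_cast<unsigned long long>(x >> 64);
  unsigned long long lo = static_cast<unsigned long long>(x);
  return lo != 0 ? __builtin_ctzll(lo) : 64 + __builtin_ctzll(hi);
}
```

For a 128-bit operand on a target with 64-bit `long long` and a 128-bit MAX_FIXED_MODE_SIZE, `choose_strategy (128, 128, 64, true)` yields `Strategy::NoSplit`, which corresponds to the aarch64 popcount case the patch is aimed at.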