
middle-end: reorder masking priority of math functions

Message ID 20241002162551.1575129-1-victor.donascimento@arm.com
State New
Series middle-end: reorder masking priority of math functions

Commit Message

Victor Do Nascimento Oct. 2, 2024, 4:25 p.m. UTC
Because math built-in functions are categorized as `ECF_CONST',
if-conversion does not mask their calls; they are instead issued
with an all-true predicate.

This, however, is not appropriate where built-ins have library
equivalents, whose behavior may be highly architecture-specific.
For example, vectorized implementations may delegate the
computation of values outside a certain acceptable numerical range to
special (non-vectorized) routines which considerably slow down
computation.

As numerical simulation programs often perform bounds checks on input values
prior to math calls, conditionally assigning default output values for
out-of-bounds input and skipping the math call altogether, these
fallback implementations should seldom be called in the execution of
vectorized code.  If, however, we don't apply any masking to these
math functions, we end up effectively executing both if and else
branches for these values, leading to considerable performance
degradation on scientific workloads.

We therefore invert the order of handling of math function calls in
`if_convertible_stmt_p' to prioritize the handling of their
library-provided implementations over the equivalent internal function.

Regression tested on aarch64-none-linux-gnu & x86_64-linux-gnu w/ no
new regressions.

gcc/ChangeLog:

	* tree-if-conv.cc (if_convertible_stmt_p): Check for explicit
	function declaration before IFN fallback.

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/vect-fncall-mask-math.c: New.
---
 .../gcc.dg/vect/vect-fncall-mask-math.c       | 33 +++++++++++++++++++
 gcc/tree-if-conv.cc                           | 18 +++++-----
 2 files changed, 42 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-fncall-mask-math.c

Comments

Tamar Christina Oct. 4, 2024, 8:32 a.m. UTC | #1
Hi Victor,

> -----Original Message-----
> From: Victor Do Nascimento <victor.donascimento@arm.com>
> Sent: Wednesday, October 2, 2024 5:26 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Tamar Christina <Tamar.Christina@arm.com>; richard.guenther@gmail.com;
> Victor Do Nascimento <Victor.DoNascimento@arm.com>
> Subject: [PATCH] middle-end: reorder masking priority of math functions
> 
> Given the categorization of math built-in functions as `ECF_CONST',
> when if-converting their uses, their calls are not masked and are thus
> called with an all-true predicate.
> 
> This, however, is not appropriate where built-ins have library
> equivalents, wherein they may exhibit highly architecture-specific
> behaviors. For example, vectorized implementations may delegate the
> computation of values outside a certain acceptable numerical range to
> special (non-vectorized) routines which considerably slow down
> computation.
> 
> As numerical simulation programs often do bounds check on input values
> prior to math calls, conditionally assigning default output values for
> out-of-bounds input and skipping the math call altogether, these
> fallback implementations should seldom be called in the execution of
> vectorized code.  If, however, we don't apply any masking to these
> math functions, we end up effectively executing both if and else
> branches for these values, leading to considerable performance
> degradation on scientific workloads.
> 
> We therefore invert the order of handling of math function calls in
> `if_convertible_stmt_p' to prioritize the handling of their
> library-provided implementations over the equivalent internal function.

This makes sense to me from a technical standpoint and from an SVE
one, though I think the original order may have been there because of the
assumption that on some uarches unpredicated implementations are faster than
predicated ones.

So there may be some concern about the new order being slower for some targets.
I'll leave it up to Richi since, e.g., I don't know the perf characteristics of the
x86 variants here, but if there is a concern you could use the
conditional_operation_is_expensive target hook to decide on the preferred order.

But other than that the change itself looks good to me, though you still need approval.

Cheers,
Tamar

> 
> Regression tested on aarch64-none-linux-gnu & x86_64-linux-gnu w/ no
> new regressions.
> 
> gcc/ChangeLog:
> 
> 	* tree-if-conv.cc (if_convertible_stmt_p): Check for explicit
> 	function declaration before IFN fallback.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.dg/vect/vect-fncall-mask-math.c: New.
> ---
>  .../gcc.dg/vect/vect-fncall-mask-math.c       | 33 +++++++++++++++++++
>  gcc/tree-if-conv.cc                           | 18 +++++-----
>  2 files changed, 42 insertions(+), 9 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-fncall-mask-math.c
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-fncall-mask-math.c
> b/gcc/testsuite/gcc.dg/vect/vect-fncall-mask-math.c
> new file mode 100644
> index 00000000000..15e22da2807
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-fncall-mask-math.c
> @@ -0,0 +1,33 @@
> +/* Test the correct application of masking to autovectorized math function calls.
> +   Test is currently set to xfail pending the release of the relevant libmvec
> +   support. */
> +/* { dg-do compile { target { aarch64*-*-* } } } */
> +/* { dg-additional-options "-march=armv8.2-a+sve -fdump-tree-ifcvt-raw -Ofast"
> { target { aarch64*-*-* } } } */
> +
> +#include <math.h>
> +
> +const int N = 20;
> +const float lim = 101.0;
> +const float cst =  -1.0;
> +float tot =   0.0;
> +
> +float b[20];
> +float a[20] = { [0 ... 9] = 1.7014118e39, /* If branch. */
> +		[10 ... 19] = 100.0 };    /* Else branch.  */
> +
> +int main (void)
> +{
> +  #pragma omp simd
> +  for (int i = 0; i < N; i += 1)
> +    {
> +      if (a[i] > lim)
> +	b[i] = cst;
> +      else
> +	b[i] = expf (a[i]);
> +      tot += b[i];
> +    }
> +  return (0);
> +}
> +
> +/* { dg-final { scan-tree-dump-not { gimple_call <expf, _2, _1>} ifcvt { xfail {
> aarch64*-*-* } } } } */
> +/* { dg-final { scan-tree-dump { gimple_call <.MASK_CALL, _2, expf, _1, _30>} ifcvt
> { xfail { aarch64*-*-* } } } } */
> diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc
> index 3b04d1e8d34..90c754a4814 100644
> --- a/gcc/tree-if-conv.cc
> +++ b/gcc/tree-if-conv.cc
> @@ -1133,15 +1133,6 @@ if_convertible_stmt_p (gimple *stmt,
> vec<data_reference_p> refs)
> 
>      case GIMPLE_CALL:
>        {
> -	/* There are some IFN_s that are used to replace builtins but have the
> -	   same semantics.  Even if MASK_CALL cannot handle them vectorable_call
> -	   will insert the proper selection, so do not block conversion.  */
> -	int flags = gimple_call_flags (stmt);
> -	if ((flags & ECF_CONST)
> -	    && !(flags & ECF_LOOPING_CONST_OR_PURE)
> -	    && gimple_call_combined_fn (stmt) != CFN_LAST)
> -	  return true;
> -
>  	tree fndecl = gimple_call_fndecl (stmt);
>  	if (fndecl)
>  	  {
> @@ -1160,6 +1151,15 @@ if_convertible_stmt_p (gimple *stmt,
> vec<data_reference_p> refs)
>  		  }
>  	  }
> 
> +	/* There are some IFN_s that are used to replace builtins but have the
> +	   same semantics.  Even if MASK_CALL cannot handle them vectorable_call
> +	   will insert the proper selection, so do not block conversion.  */
> +	int flags = gimple_call_flags (stmt);
> +	if ((flags & ECF_CONST)
> +	    && !(flags & ECF_LOOPING_CONST_OR_PURE)
> +	    && gimple_call_combined_fn (stmt) != CFN_LAST)
> +	  return true;
> +
>  	return false;
>        }
> 
> --
> 2.34.1
Victor Do Nascimento Oct. 4, 2024, 9:24 a.m. UTC | #2
On 10/4/24 09:32, Tamar Christina wrote:
> Hi Victor,
> 
>> -----Original Message-----
>> From: Victor Do Nascimento <victor.donascimento@arm.com>
>> Sent: Wednesday, October 2, 2024 5:26 PM
>> To: gcc-patches@gcc.gnu.org
>> Cc: Tamar Christina <Tamar.Christina@arm.com>; richard.guenther@gmail.com;
>> Victor Do Nascimento <Victor.DoNascimento@arm.com>
>> Subject: [PATCH] middle-end: reorder masking priority of math functions
>>
>> Given the categorization of math built-in functions as `ECF_CONST',
>> when if-converting their uses, their calls are not masked and are thus
>> called with an all-true predicate.
>>
>> This, however, is not appropriate where built-ins have library
>> equivalents, wherein they may exhibit highly architecture-specific
>> behaviors. For example, vectorized implementations may delegate the
>> computation of values outside a certain acceptable numerical range to
>> special (non-vectorized) routines which considerably slow down
>> computation.
>>
>> As numerical simulation programs often do bounds check on input values
>> prior to math calls, conditionally assigning default output values for
>> out-of-bounds input and skipping the math call altogether, these
>> fallback implementations should seldom be called in the execution of
>> vectorized code.  If, however, we don't apply any masking to these
>> math functions, we end up effectively executing both if and else
>> branches for these values, leading to considerable performance
>> degradation on scientific workloads.
>>
>> We therefore invert the order of handling of math function calls in
>> `if_convertible_stmt_p' to prioritize the handling of their
>> library-provided implementations over the equivalent internal function.
> 
> I think this makes sense to me from a technical standpoint and from an SVE
> one.  Though I think the original order may have been there because of the
> assumption that on some uarches unpredicated implementations are faster than
> predicated ones.
> 
> So there may be some concerns about this order being slower for some.
> I'll leave it up to Richi since e.g. I don't know the perf characteristics of the
> x86 variants here, but if there is a concern you could use the
> conditional_operation_is_expensive target hook to decide on the preferred order.
> 
> But other than that the change itself looks good to me, though you still need approval.
> 
> Cheers,
> Tamar

Thank you very much for your input here, Tamar.

Yes, I agree that this solution may well not be the best path forward
for all architectures; that possibility had indeed crossed my mind.

Nonetheless, I thought the best way to get further feedback on the
matter was to present this initial proposal, to which others could
respond with the performance characteristics of their architectures.

Let's see what Richi has to say. If necessary we can, as you rightly 
suggested, resort to the use of the `conditional_operation_is_expensive' 
target hook.


Many thanks once again,
Victor

>>
>> Regression tested on aarch64-none-linux-gnu & x86_64-linux-gnu w/ no
>> new regressions.
>>
>> gcc/ChangeLog:
>>
>> 	* tree-if-conv.cc (if_convertible_stmt_p): Check for explicit
>> 	function declaration before IFN fallback.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 	* gcc.dg/vect/vect-fncall-mask-math.c: New.
>> ---
>>   .../gcc.dg/vect/vect-fncall-mask-math.c       | 33 +++++++++++++++++++
>>   gcc/tree-if-conv.cc                           | 18 +++++-----
>>   2 files changed, 42 insertions(+), 9 deletions(-)
>>   create mode 100644 gcc/testsuite/gcc.dg/vect/vect-fncall-mask-math.c
>>
>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-fncall-mask-math.c
>> b/gcc/testsuite/gcc.dg/vect/vect-fncall-mask-math.c
>> new file mode 100644
>> index 00000000000..15e22da2807
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/vect/vect-fncall-mask-math.c
>> @@ -0,0 +1,33 @@
>> +/* Test the correct application of masking to autovectorized math function calls.
>> +   Test is currently set to xfail pending the release of the relevant libmvec
>> +   support. */
>> +/* { dg-do compile { target { aarch64*-*-* } } } */
>> +/* { dg-additional-options "-march=armv8.2-a+sve -fdump-tree-ifcvt-raw -Ofast"
>> { target { aarch64*-*-* } } } */
>> +
>> +#include <math.h>
>> +
>> +const int N = 20;
>> +const float lim = 101.0;
>> +const float cst =  -1.0;
>> +float tot =   0.0;
>> +
>> +float b[20];
>> +float a[20] = { [0 ... 9] = 1.7014118e39, /* If branch. */
>> +		[10 ... 19] = 100.0 };    /* Else branch.  */
>> +
>> +int main (void)
>> +{
>> +  #pragma omp simd
>> +  for (int i = 0; i < N; i += 1)
>> +    {
>> +      if (a[i] > lim)
>> +	b[i] = cst;
>> +      else
>> +	b[i] = expf (a[i]);
>> +      tot += b[i];
>> +    }
>> +  return (0);
>> +}
>> +
>> +/* { dg-final { scan-tree-dump-not { gimple_call <expf, _2, _1>} ifcvt { xfail {
>> aarch64*-*-* } } } } */
>> +/* { dg-final { scan-tree-dump { gimple_call <.MASK_CALL, _2, expf, _1, _30>} ifcvt
>> { xfail { aarch64*-*-* } } } } */
>> diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc
>> index 3b04d1e8d34..90c754a4814 100644
>> --- a/gcc/tree-if-conv.cc
>> +++ b/gcc/tree-if-conv.cc
>> @@ -1133,15 +1133,6 @@ if_convertible_stmt_p (gimple *stmt,
>> vec<data_reference_p> refs)
>>
>>       case GIMPLE_CALL:
>>         {
>> -	/* There are some IFN_s that are used to replace builtins but have the
>> -	   same semantics.  Even if MASK_CALL cannot handle them vectorable_call
>> -	   will insert the proper selection, so do not block conversion.  */
>> -	int flags = gimple_call_flags (stmt);
>> -	if ((flags & ECF_CONST)
>> -	    && !(flags & ECF_LOOPING_CONST_OR_PURE)
>> -	    && gimple_call_combined_fn (stmt) != CFN_LAST)
>> -	  return true;
>> -
>>   	tree fndecl = gimple_call_fndecl (stmt);
>>   	if (fndecl)
>>   	  {
>> @@ -1160,6 +1151,15 @@ if_convertible_stmt_p (gimple *stmt,
>> vec<data_reference_p> refs)
>>   		  }
>>   	  }
>>
>> +	/* There are some IFN_s that are used to replace builtins but have the
>> +	   same semantics.  Even if MASK_CALL cannot handle them vectorable_call
>> +	   will insert the proper selection, so do not block conversion.  */
>> +	int flags = gimple_call_flags (stmt);
>> +	if ((flags & ECF_CONST)
>> +	    && !(flags & ECF_LOOPING_CONST_OR_PURE)
>> +	    && gimple_call_combined_fn (stmt) != CFN_LAST)
>> +	  return true;
>> +
>>   	return false;
>>         }
>>
>> --
>> 2.34.1
>

Patch

diff --git a/gcc/testsuite/gcc.dg/vect/vect-fncall-mask-math.c b/gcc/testsuite/gcc.dg/vect/vect-fncall-mask-math.c
new file mode 100644
index 00000000000..15e22da2807
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-fncall-mask-math.c
@@ -0,0 +1,33 @@ 
+/* Test the correct application of masking to autovectorized math function calls.
+   Test is currently set to xfail pending the release of the relevant libmvec
+   support. */
+/* { dg-do compile { target { aarch64*-*-* } } } */
+/* { dg-additional-options "-march=armv8.2-a+sve -fdump-tree-ifcvt-raw -Ofast" { target { aarch64*-*-* } } } */
+
+#include <math.h>
+
+const int N = 20;
+const float lim = 101.0;
+const float cst =  -1.0;
+float tot =   0.0;
+
+float b[20];
+float a[20] = { [0 ... 9] = 1.7014118e39, /* If branch. */
+		[10 ... 19] = 100.0 };    /* Else branch.  */
+
+int main (void)
+{
+  #pragma omp simd
+  for (int i = 0; i < N; i += 1)
+    {
+      if (a[i] > lim)
+	b[i] = cst;
+      else
+	b[i] = expf (a[i]);
+      tot += b[i];
+    }
+  return (0);
+}
+
+/* { dg-final { scan-tree-dump-not { gimple_call <expf, _2, _1>} ifcvt { xfail { aarch64*-*-* } } } } */
+/* { dg-final { scan-tree-dump { gimple_call <.MASK_CALL, _2, expf, _1, _30>} ifcvt { xfail { aarch64*-*-* } } } } */
diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc
index 3b04d1e8d34..90c754a4814 100644
--- a/gcc/tree-if-conv.cc
+++ b/gcc/tree-if-conv.cc
@@ -1133,15 +1133,6 @@  if_convertible_stmt_p (gimple *stmt, vec<data_reference_p> refs)
 
     case GIMPLE_CALL:
       {
-	/* There are some IFN_s that are used to replace builtins but have the
-	   same semantics.  Even if MASK_CALL cannot handle them vectorable_call
-	   will insert the proper selection, so do not block conversion.  */
-	int flags = gimple_call_flags (stmt);
-	if ((flags & ECF_CONST)
-	    && !(flags & ECF_LOOPING_CONST_OR_PURE)
-	    && gimple_call_combined_fn (stmt) != CFN_LAST)
-	  return true;
-
 	tree fndecl = gimple_call_fndecl (stmt);
 	if (fndecl)
 	  {
@@ -1160,6 +1151,15 @@  if_convertible_stmt_p (gimple *stmt, vec<data_reference_p> refs)
 		  }
 	  }
 
+	/* There are some IFN_s that are used to replace builtins but have the
+	   same semantics.  Even if MASK_CALL cannot handle them vectorable_call
+	   will insert the proper selection, so do not block conversion.  */
+	int flags = gimple_call_flags (stmt);
+	if ((flags & ECF_CONST)
+	    && !(flags & ECF_LOOPING_CONST_OR_PURE)
+	    && gimple_call_combined_fn (stmt) != CFN_LAST)
+	  return true;
+
 	return false;
       }