
vect: Merge loop mask and cond_op mask in fold-left reduction.

Message ID 7c770d27-3d66-4c0f-8594-dbd7e084eab0@gmail.com
State New
Series vect: Merge loop mask and cond_op mask in fold-left reduction.

Commit Message

Robin Dapp June 10, 2024, 7:22 a.m. UTC
Hi,

Currently we discard the cond-op mask when the loop is fully masked,
which causes wrong code in
gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c
when compiled with
-O3 -march=cascadelake --param vect-partial-vector-usage=2.

This patch ANDs both masks instead.

Bootstrapped and regtested on x86, aarch64 and power10.
Regtested on riscv64 and armv8.8-a+sve via qemu.

Regards
 Robin

gcc/ChangeLog:

	* tree-vect-loop.cc (vectorize_fold_left_reduction): Merge loop
	mask and cond-op mask.
---
 gcc/tree-vect-loop.cc | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

Comments

Robin Dapp June 10, 2024, 7:37 a.m. UTC | #1
Just realized I missed the PR115382 tag in the patch...

Regards
 Robin
Richard Sandiford June 10, 2024, 9:13 a.m. UTC | #2
Robin Dapp <rdapp.gcc@gmail.com> writes:
> Hi,
>
> currently we discard the cond-op mask when the loop is fully masked
> which causes wrong code in
> gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c
> when compiled with
> -O3 -march=cascadelake --param vect-partial-vector-usage=2.
>
> This patch ANDs both masks instead.
>
> Bootstrapped and regtested on x86, aarch64 and power10.
> Regtested on riscv64 and armv8.8-a+sve via qemu.
>
> Regards
>  Robin
>
> gcc/ChangeLog:
>
> 	* tree-vect-loop.cc (vectorize_fold_left_reduction): Merge loop
> 	mask and cond-op mask.

OK, thanks.

Richard

> ---
>  gcc/tree-vect-loop.cc | 16 +++++++++++++++-
>  1 file changed, 15 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index 028692614bb..f9bf6a45611 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -7215,7 +7215,21 @@ vectorize_fold_left_reduction (loop_vec_info loop_vinfo,
>        tree len = NULL_TREE;
>        tree bias = NULL_TREE;
>        if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
> -	mask = vect_get_loop_mask (loop_vinfo, gsi, masks, vec_num, vectype_in, i);
> +	{
> +	  tree mask_loop = vect_get_loop_mask (loop_vinfo, gsi, masks,
> +					       vec_num, vectype_in, i);
> +	  if (is_cond_op)
> +	    {
> +	      /* Merge the loop mask and the cond_op mask.  */
> +	      mask = make_ssa_name (TREE_TYPE (mask_loop));
> +	      gassign *and_stmt = gimple_build_assign (mask, BIT_AND_EXPR,
> +						       mask_loop,
> +						       vec_opmask[i]);
> +	      gsi_insert_before (gsi, and_stmt, GSI_SAME_STMT);
> +	    }
> +	  else
> +	    mask = mask_loop;
> +	}
>        else if (is_cond_op)
>  	mask = vec_opmask[i];
>        if (LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))
Richard Sandiford June 10, 2024, 9:45 a.m. UTC | #3
Richard Sandiford <richard.sandiford@arm.com> writes:
> Robin Dapp <rdapp.gcc@gmail.com> writes:
>> Hi,
>>
>> currently we discard the cond-op mask when the loop is fully masked
>> which causes wrong code in
>> gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c
>> when compiled with
>> -O3 -march=cascadelake --param vect-partial-vector-usage=2.
>>
>> This patch ANDs both masks instead.
>>
>> Bootstrapped and regtested on x86, aarch64 and power10.
>> Regtested on riscv64 and armv8.8-a+sve via qemu.
>>
>> Regards
>>  Robin
>>
>> gcc/ChangeLog:
>>
>> 	* tree-vect-loop.cc (vectorize_fold_left_reduction): Merge loop
>> 	mask and cond-op mask.
>
> OK, thanks.

Actually, as Richard mentioned in the PR, it would probably be better
to use prepare_vec_mask instead.  It should work in this context too
and would avoid redundant double masking.

Thanks,
Richard

>
>> ---
>>  gcc/tree-vect-loop.cc | 16 +++++++++++++++-
>>  1 file changed, 15 insertions(+), 1 deletion(-)
>>
>> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
>> index 028692614bb..f9bf6a45611 100644
>> --- a/gcc/tree-vect-loop.cc
>> +++ b/gcc/tree-vect-loop.cc
>> @@ -7215,7 +7215,21 @@ vectorize_fold_left_reduction (loop_vec_info loop_vinfo,
>>        tree len = NULL_TREE;
>>        tree bias = NULL_TREE;
>>        if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
>> -	mask = vect_get_loop_mask (loop_vinfo, gsi, masks, vec_num, vectype_in, i);
>> +	{
>> +	  tree mask_loop = vect_get_loop_mask (loop_vinfo, gsi, masks,
>> +					       vec_num, vectype_in, i);
>> +	  if (is_cond_op)
>> +	    {
>> +	      /* Merge the loop mask and the cond_op mask.  */
>> +	      mask = make_ssa_name (TREE_TYPE (mask_loop));
>> +	      gassign *and_stmt = gimple_build_assign (mask, BIT_AND_EXPR,
>> +						       mask_loop,
>> +						       vec_opmask[i]);
>> +	      gsi_insert_before (gsi, and_stmt, GSI_SAME_STMT);
>> +	    }
>> +	  else
>> +	    mask = mask_loop;
>> +	}
>>        else if (is_cond_op)
>>  	mask = vec_opmask[i];
>>        if (LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))

Patch

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 028692614bb..f9bf6a45611 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -7215,7 +7215,21 @@ vectorize_fold_left_reduction (loop_vec_info loop_vinfo,
       tree len = NULL_TREE;
       tree bias = NULL_TREE;
       if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
-	mask = vect_get_loop_mask (loop_vinfo, gsi, masks, vec_num, vectype_in, i);
+	{
+	  tree mask_loop = vect_get_loop_mask (loop_vinfo, gsi, masks,
+					       vec_num, vectype_in, i);
+	  if (is_cond_op)
+	    {
+	      /* Merge the loop mask and the cond_op mask.  */
+	      mask = make_ssa_name (TREE_TYPE (mask_loop));
+	      gassign *and_stmt = gimple_build_assign (mask, BIT_AND_EXPR,
+						       mask_loop,
+						       vec_opmask[i]);
+	      gsi_insert_before (gsi, and_stmt, GSI_SAME_STMT);
+	    }
+	  else
+	    mask = mask_loop;
+	}
       else if (is_cond_op)
 	mask = vec_opmask[i];
       if (LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))