
vect: Do not try to duplicate_and_interleave one-element mode.

Message ID D3Z9ALBHPQVX.3R9RVP0K2C01A@gmail.com
State New
Series vect: Do not try to duplicate_and_interleave one-element mode.

Commit Message

Robin Dapp Sept. 6, 2024, 2:04 p.m. UTC
Hi,

PR112694 shows that we try to create sub-vectors of single-element
vectors because can_duplicate_and_interleave_p returns true.
The problem resurfaced in PR116611.

This patch makes can_duplicate_and_interleave_p return false
if count / nvectors == 0 (i.e. if nvectors > count) and removes the
corresponding check in the riscv backend.

This partially fixes the FAIL in slp-19a.c: at least when built
with the cost model, we no longer use LOAD_LANES.  Without the cost
model, as in the test suite, we take a different path and still end
up with LOAD_LANES.

Bootstrapped and regtested on x86 and power10, regtested on
rv64gcv_zvfh_zvbb.  Still waiting for the aarch64 results.

Regards
 Robin

gcc/ChangeLog:

	PR target/112694
	PR target/116611

	* config/riscv/riscv-v.cc (expand_vec_perm_const): Remove early
	return.
	* tree-vect-slp.cc (can_duplicate_and_interleave_p): Return
	false when we cannot create sub-elements.
---
 gcc/config/riscv/riscv-v.cc | 9 ---------
 gcc/tree-vect-slp.cc        | 4 ++++
 2 files changed, 4 insertions(+), 9 deletions(-)

Comments

Richard Biener Sept. 6, 2024, 2:56 p.m. UTC | #1
> On 06.09.2024 at 16:05, Robin Dapp <rdapp.gcc@gmail.com> wrote:
> 
> Hi,
> 
> PR112694 shows that we try to create sub-vectors of single-element
> vectors because can_duplicate_and_interleave_p returns true.

Can we avoid querying the function?  CCing Richard, who should know more about this.

Richard 

Richard Sandiford Sept. 9, 2024, 11:07 a.m. UTC | #2
Richard Biener <rguenther@suse.de> writes:
>> On 06.09.2024 at 16:05, Robin Dapp <rdapp.gcc@gmail.com> wrote:
>> 
>> Hi,
>> 
>> PR112694 shows that we try to create sub-vectors of single-element
>> vectors because can_duplicate_and_interleave_p returns true.
>
> Can we avoid querying the function?  CCing Richard, who should know more about this.
>
> Richard 
>
>> The problem resurfaced in PR116611.
>> 
>> This patch makes can_duplicate_and_interleave_p return false
>> if count / nvectors == 0 (i.e. if nvectors > count) and removes the
>> corresponding check in the riscv backend.
>> 
>> This partially gets rid of the FAIL in slp-19a.c.  At least when built
>> with cost model we don't have LOAD_LANES anymore.  Without cost model,
>> as in the test suite, we choose a different path and still end up with
>> LOAD_LANES.

Could you walk me through the failure in more detail?  It sounds
like can_duplicate_and_interleave_p eventually gets to the point of
subdividing the original elements, instead of either combining consecutive
elements (the best case) or leaving them as-is (the expected fallback
for SVE).  It sounds like those attempts fail in this case, while an
attempt to subdivide the elements succeeds.  Is that right?  And if so,
why does that happen?

Thanks,
Richard


Patch

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 9b6c3a21e2d..5c5ed63d22e 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -3709,15 +3709,6 @@  expand_vec_perm_const (machine_mode vmode, machine_mode op_mode, rtx target,
      mask to do the iteration loop control. Just disable it directly.  */
   if (GET_MODE_CLASS (vmode) == MODE_VECTOR_BOOL)
     return false;
-  /* FIXME: Explicitly disable VLA interleave SLP vectorization when we
-     may encounter ICE for poly size (1, 1) vectors in loop vectorizer.
-     Ideally, middle-end loop vectorizer should be able to disable it
-     itself, We can remove the codes here when middle-end code is able
-     to disable VLA SLP vectorization for poly size (1, 1) VF.  */
-  if (!BYTES_PER_RISCV_VECTOR.is_constant ()
-      && maybe_lt (BYTES_PER_RISCV_VECTOR * TARGET_MAX_LMUL,
-		   poly_int64 (16, 16)))
-    return false;
 
   struct expand_vec_perm_d d;
 
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 3d2973698e2..17b59870c69 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -434,6 +434,10 @@  can_duplicate_and_interleave_p (vec_info *vinfo, unsigned int count,
   unsigned int nvectors = 1;
   for (;;)
     {
+      /* We need to be able to fuse COUNT / NVECTORS elements together,
+	 so no point in continuing if there are none.  */
+      if (nvectors > count)
+	return false;
       scalar_int_mode int_mode;
       poly_int64 elt_bits = elt_bytes * BITS_PER_UNIT;
       if (int_mode_for_size (elt_bits, 1).exists (&int_mode))