RISC-V: optim const DF +0.0 store to mem [PR/110748]

Message ID	20230721175552.2693295-1-vineetg@rivosinc.com
State	New
Headers	show Return-Path: <gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org> DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 06DFB3858409 From: Vineet Gupta <vineetg@rivosinc.com> To: gcc-patches@gcc.gnu.org, Manolis Tsamis <manolis.tsamis@vrull.eu> Cc: kito.cheng@gmail.com, Jeff Law <jeffreyalaw@gmail.com>, Palmer Dabbelt <palmer@rivosinc.com>, gnu-toolchain@rivosinc.com, Vineet Gupta <vineetg@rivosinc.com> Subject: [PATCH] RISC-V: optim const DF +0.0 store to mem [PR/110748] Date: Fri, 21 Jul 2023 10:55:52 -0700 Message-Id: <20230721175552.2693295-1-vineetg@rivosinc.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: list Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org Sender: "Gcc-patches" <gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org>
Series	RISC-V: optim const DF +0.0 store to mem [PR/110748] \| expand RISC-V: optim const DF +0.0 store to mem [PR/110748]

Vineet Gupta July 21, 2023, 5:55 p.m. UTC

DF +0.0 is bitwise all zeros so int x0 store to mem can be used to optimize it.

void zd(double *) { *d = 0.0; }

currently:

| fmv.d.x fa5,zero
| fsd     fa5,0(a0)
| ret

With patch

| sd      zero,0(a0)
| ret

This came to light when testing the in-flight f-m-o patch where an ICE
was gettinh triggered due to lack of this pattern but turns out this
is an independent optimization of its own [1]

[1] https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624857.html

Apparently this is a regression in gcc-13, introduced by commit
ef85d150b5963 ("RISC-V: Enable TARGET_SUPPORTS_WIDE_INT") and the fix
thus is a partial revert of that change.

Ran thru full multilib testsuite, there was 1 false failure due to
random string "lw" appearing in lto build assembler output,
which is also fixed in the patch.

gcc/Changelog:

	* config/riscv/predicates.md (const_0_operand): Add back
	  const_double.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/pr110748-1.c: New Test.
	* gcc.target/riscv/xtheadfmv-fmv.c: Add '\t' around test
	  patterns to avoid random string matches.

Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
---
 gcc/config/riscv/predicates.md                 |  2 +-
 gcc/testsuite/gcc.target/riscv/pr110748-1.c    | 10 ++++++++++
 gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c |  8 ++++----
 3 files changed, 15 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/pr110748-1.c

Philipp Tomsich July 21, 2023, 6:15 p.m. UTC | #1

On Fri, 21 Jul 2023 at 19:56, Vineet Gupta <vineetg@rivosinc.com> wrote:
>
> DF +0.0 is bitwise all zeros so int x0 store to mem can be used to optimize it.
>
> void zd(double *) { *d = 0.0; }
>
> currently:
>
> | fmv.d.x fa5,zero
> | fsd     fa5,0(a0)
> | ret
>
> With patch
>
> | sd      zero,0(a0)
> | ret
> This came to light when testing the in-flight f-m-o patch where an ICE
> was gettinh triggered due to lack of this pattern but turns out this

typo: "gettinh" -> "getting"

> is an independent optimization of its own [1]
>
> [1] https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624857.html
>
> Apparently this is a regression in gcc-13, introduced by commit
> ef85d150b5963 ("RISC-V: Enable TARGET_SUPPORTS_WIDE_INT") and the fix
> thus is a partial revert of that change.

Should we add a "Fixes: "?

> Ran thru full multilib testsuite, there was 1 false failure due to
> random string "lw" appearing in lto build assembler output,
> which is also fixed in the patch.
>
> gcc/Changelog:

PR target/110748

>
>         * config/riscv/predicates.md (const_0_operand): Add back
>           const_double.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/riscv/pr110748-1.c: New Test.
>         * gcc.target/riscv/xtheadfmv-fmv.c: Add '\t' around test
>           patterns to avoid random string matches.
>
> Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
> ---
>  gcc/config/riscv/predicates.md                 |  2 +-
>  gcc/testsuite/gcc.target/riscv/pr110748-1.c    | 10 ++++++++++
>  gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c |  8 ++++----
>  3 files changed, 15 insertions(+), 5 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr110748-1.c
>
> diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
> index 5a22c77f0cd0..9db28c2def7e 100644
> --- a/gcc/config/riscv/predicates.md
> +++ b/gcc/config/riscv/predicates.md
> @@ -58,7 +58,7 @@
>         (match_test "INTVAL (op) + 1 != 0")))
>
>  (define_predicate "const_0_operand"
> -  (and (match_code "const_int,const_wide_int,const_vector")
> +  (and (match_code "const_int,const_wide_int,const_double,const_vector")
>         (match_test "op == CONST0_RTX (GET_MODE (op))")))
>
>  (define_predicate "const_1_operand"
> diff --git a/gcc/testsuite/gcc.target/riscv/pr110748-1.c b/gcc/testsuite/gcc.target/riscv/pr110748-1.c
> new file mode 100644
> index 000000000000..2f5bc08aae72
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/pr110748-1.c
> @@ -0,0 +1,10 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target hard_float } */
> +/* { dg-options "-march=rv64g -mabi=lp64d -O2" } */
> +
> +
> +void zd(double *d) { *d = 0.0;  }
> +void zf(float *f)  { *f = 0.0;  }
> +
> +/* { dg-final { scan-assembler-not "\tfmv\\.d\\.x\t" } } */
> +/* { dg-final { scan-assembler-not "\tfmv\\.s\\.x\t" } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c b/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c
> index 1036044291e7..89eb48bed1b9 100644
> --- a/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c
> +++ b/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c
> @@ -18,7 +18,7 @@ d2ll (double d)
>  /* { dg-final { scan-assembler "th.fmv.hw.x" } } */
>  /* { dg-final { scan-assembler "fmv.x.w" } } */
>  /* { dg-final { scan-assembler "th.fmv.x.hw" } } */
> -/* { dg-final { scan-assembler-not "sw" } } */
> -/* { dg-final { scan-assembler-not "fld" } } */
> -/* { dg-final { scan-assembler-not "fsd" } } */
> -/* { dg-final { scan-assembler-not "lw" } } */
> +/* { dg-final { scan-assembler-not "\tsw\t" } } */
> +/* { dg-final { scan-assembler-not "\tfld\t" } } */
> +/* { dg-final { scan-assembler-not "\tfsd\t" } } */
> +/* { dg-final { scan-assembler-not "\tlw\t" } } */
> --
> 2.34.1
>

Vineet Gupta July 21, 2023, 6:23 p.m. UTC | #2

On 7/21/23 11:15, Philipp Tomsich wrote:
> On Fri, 21 Jul 2023 at 19:56, Vineet Gupta <vineetg@rivosinc.com> wrote:
>> DF +0.0 is bitwise all zeros so int x0 store to mem can be used to optimize it.
>>
>> void zd(double *) { *d = 0.0; }
>>
>> currently:
>>
>> | fmv.d.x fa5,zero
>> | fsd     fa5,0(a0)
>> | ret
>>
>> With patch
>>
>> | sd      zero,0(a0)
>> | ret
>> This came to light when testing the in-flight f-m-o patch where an ICE
>> was gettinh triggered due to lack of this pattern but turns out this
> typo: "gettinh" -> "getting"

Fixed.

>> is an independent optimization of its own [1]
>>
>> [1] https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624857.html
>>
>> Apparently this is a regression in gcc-13, introduced by commit
>> ef85d150b5963 ("RISC-V: Enable TARGET_SUPPORTS_WIDE_INT") and the fix
>> thus is a partial revert of that change.
> Should we add a "Fixes: "?

Sure. Although gcc usage of Fixes tag seems slightly different than say 
linux kernel's.


>
>> Ran thru full multilib testsuite, there was 1 false failure due to
>> random string "lw" appearing in lto build assembler output,
>> which is also fixed in the patch.
>>
>> gcc/Changelog:
> PR target/110748

Added.

Thx,
-Vineet


>
>>          * config/riscv/predicates.md (const_0_operand): Add back
>>            const_double.
>>
>> gcc/testsuite/ChangeLog:
>>
>>          * gcc.target/riscv/pr110748-1.c: New Test.
>>          * gcc.target/riscv/xtheadfmv-fmv.c: Add '\t' around test
>>            patterns to avoid random string matches.
>>
>> Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
>> ---
>>   gcc/config/riscv/predicates.md                 |  2 +-
>>   gcc/testsuite/gcc.target/riscv/pr110748-1.c    | 10 ++++++++++
>>   gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c |  8 ++++----
>>   3 files changed, 15 insertions(+), 5 deletions(-)
>>   create mode 100644 gcc/testsuite/gcc.target/riscv/pr110748-1.c
>>
>> diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
>> index 5a22c77f0cd0..9db28c2def7e 100644
>> --- a/gcc/config/riscv/predicates.md
>> +++ b/gcc/config/riscv/predicates.md
>> @@ -58,7 +58,7 @@
>>          (match_test "INTVAL (op) + 1 != 0")))
>>
>>   (define_predicate "const_0_operand"
>> -  (and (match_code "const_int,const_wide_int,const_vector")
>> +  (and (match_code "const_int,const_wide_int,const_double,const_vector")
>>          (match_test "op == CONST0_RTX (GET_MODE (op))")))
>>
>>   (define_predicate "const_1_operand"
>> diff --git a/gcc/testsuite/gcc.target/riscv/pr110748-1.c b/gcc/testsuite/gcc.target/riscv/pr110748-1.c
>> new file mode 100644
>> index 000000000000..2f5bc08aae72
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/riscv/pr110748-1.c
>> @@ -0,0 +1,10 @@
>> +/* { dg-do compile } */
>> +/* { dg-require-effective-target hard_float } */
>> +/* { dg-options "-march=rv64g -mabi=lp64d -O2" } */
>> +
>> +
>> +void zd(double *d) { *d = 0.0;  }
>> +void zf(float *f)  { *f = 0.0;  }
>> +
>> +/* { dg-final { scan-assembler-not "\tfmv\\.d\\.x\t" } } */
>> +/* { dg-final { scan-assembler-not "\tfmv\\.s\\.x\t" } } */
>> diff --git a/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c b/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c
>> index 1036044291e7..89eb48bed1b9 100644
>> --- a/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c
>> +++ b/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c
>> @@ -18,7 +18,7 @@ d2ll (double d)
>>   /* { dg-final { scan-assembler "th.fmv.hw.x" } } */
>>   /* { dg-final { scan-assembler "fmv.x.w" } } */
>>   /* { dg-final { scan-assembler "th.fmv.x.hw" } } */
>> -/* { dg-final { scan-assembler-not "sw" } } */
>> -/* { dg-final { scan-assembler-not "fld" } } */
>> -/* { dg-final { scan-assembler-not "fsd" } } */
>> -/* { dg-final { scan-assembler-not "lw" } } */
>> +/* { dg-final { scan-assembler-not "\tsw\t" } } */
>> +/* { dg-final { scan-assembler-not "\tfld\t" } } */
>> +/* { dg-final { scan-assembler-not "\tfsd\t" } } */
>> +/* { dg-final { scan-assembler-not "\tlw\t" } } */
>> --
>> 2.34.1
>>

Palmer Dabbelt July 21, 2023, 6:31 p.m. UTC | #3

On Fri, 21 Jul 2023 10:55:52 PDT (-0700), Vineet Gupta wrote:
> DF +0.0 is bitwise all zeros so int x0 store to mem can be used to optimize it.
>
> void zd(double *) { *d = 0.0; }
>
> currently:
>
> | fmv.d.x fa5,zero
> | fsd     fa5,0(a0)
> | ret
>
> With patch
>
> | sd      zero,0(a0)
> | ret
>
> This came to light when testing the in-flight f-m-o patch where an ICE
> was gettinh triggered due to lack of this pattern but turns out this
> is an independent optimization of its own [1]
>
> [1] https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624857.html
>
> Apparently this is a regression in gcc-13, introduced by commit
> ef85d150b5963 ("RISC-V: Enable TARGET_SUPPORTS_WIDE_INT") and the fix
> thus is a partial revert of that change.

Given that it can ICE, we should probably backport it to 13.

> Ran thru full multilib testsuite, there was 1 false failure due to

Did you run the test with autovec?  There's also a 
pmode_reg_or_0_operand, some of those don't appear protected from FP 
values.  So we might need something like

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index cd5b19457f8..d8ce9223343 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -63,7 +63,7 @@ (define_expand "movmisalign<mode>"
 
 (define_expand "len_mask_gather_load<VNX1_QHSD:mode><VNX1_QHSDI:mode>"
   [(match_operand:VNX1_QHSD 0 "register_operand")
-   (match_operand 1 "pmode_reg_or_0_operand")
+   (match_operand:P 1 "pmode_reg_or_0_operand")
    (match_operand:VNX1_QHSDI 2 "register_operand")
    (match_operand 3 "<VNX1_QHSD:gs_extension>")
    (match_operand 4 "<VNX1_QHSD:gs_scale>")

a bunch of times, as there's a ton of them?  I'm not entirely sure if that
could manifest as an actual bug, though...

> random string "lw" appearing in lto build assembler output,
> which is also fixed in the patch.
>
> gcc/Changelog:
>
> 	* config/riscv/predicates.md (const_0_operand): Add back
> 	  const_double.
>
> gcc/testsuite/ChangeLog:
>
> 	* gcc.target/riscv/pr110748-1.c: New Test.
> 	* gcc.target/riscv/xtheadfmv-fmv.c: Add '\t' around test
> 	  patterns to avoid random string matches.
>
> Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
> ---
>  gcc/config/riscv/predicates.md                 |  2 +-
>  gcc/testsuite/gcc.target/riscv/pr110748-1.c    | 10 ++++++++++
>  gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c |  8 ++++----
>  3 files changed, 15 insertions(+), 5 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr110748-1.c
>
> diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
> index 5a22c77f0cd0..9db28c2def7e 100644
> --- a/gcc/config/riscv/predicates.md
> +++ b/gcc/config/riscv/predicates.md
> @@ -58,7 +58,7 @@
>         (match_test "INTVAL (op) + 1 != 0")))
>
>  (define_predicate "const_0_operand"
> -  (and (match_code "const_int,const_wide_int,const_vector")
> +  (and (match_code "const_int,const_wide_int,const_double,const_vector")
>         (match_test "op == CONST0_RTX (GET_MODE (op))")))
>
>  (define_predicate "const_1_operand"
> diff --git a/gcc/testsuite/gcc.target/riscv/pr110748-1.c b/gcc/testsuite/gcc.target/riscv/pr110748-1.c
> new file mode 100644
> index 000000000000..2f5bc08aae72
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/pr110748-1.c
> @@ -0,0 +1,10 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target hard_float } */
> +/* { dg-options "-march=rv64g -mabi=lp64d -O2" } */
> +
> +
> +void zd(double *d) { *d = 0.0;  }
> +void zf(float *f)  { *f = 0.0;  }
> +
> +/* { dg-final { scan-assembler-not "\tfmv\\.d\\.x\t" } } */
> +/* { dg-final { scan-assembler-not "\tfmv\\.s\\.x\t" } } */

IIUC the pattern to emit fmv suffers from the same bug -- it's fixed in the same
way, but I think we might be able to come up with a test for it: `fmv.d.x FREG,
x0` would be the fastest way to generate 0.0, so maybe something like

    double sum(double *d) {
      double sum = 0;
      for (int i = 0; i < 8; ++i)
        sum += d[i];
      return sum;
    }

would do it?  That's generating the fmv on 13 for me, though, so maybe I'm
missing something?`

> diff --git a/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c b/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c
> index 1036044291e7..89eb48bed1b9 100644
> --- a/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c
> +++ b/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c
> @@ -18,7 +18,7 @@ d2ll (double d)
>  /* { dg-final { scan-assembler "th.fmv.hw.x" } } */
>  /* { dg-final { scan-assembler "fmv.x.w" } } */
>  /* { dg-final { scan-assembler "th.fmv.x.hw" } } */
> -/* { dg-final { scan-assembler-not "sw" } } */
> -/* { dg-final { scan-assembler-not "fld" } } */
> -/* { dg-final { scan-assembler-not "fsd" } } */
> -/* { dg-final { scan-assembler-not "lw" } } */
> +/* { dg-final { scan-assembler-not "\tsw\t" } } */
> +/* { dg-final { scan-assembler-not "\tfld\t" } } */
> +/* { dg-final { scan-assembler-not "\tfsd\t" } } */
> +/* { dg-final { scan-assembler-not "\tlw\t" } } */

I think that autovec one is the only possible dependency that might have snuck
in, so we should be safe otherwise.  Thanks!

Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com>

Jeff Law July 21, 2023, 6:47 p.m. UTC | #4

On 7/21/23 12:31, Palmer Dabbelt wrote:

> 
> (define_expand "len_mask_gather_load<VNX1_QHSD:mode><VNX1_QHSDI:mode>"
>    [(match_operand:VNX1_QHSD 0 "register_operand")
> -   (match_operand 1 "pmode_reg_or_0_operand")
> +   (match_operand:P 1 "pmode_reg_or_0_operand")
>     (match_operand:VNX1_QHSDI 2 "register_operand")
>     (match_operand 3 "<VNX1_QHSD:gs_extension>")
>     (match_operand 4 "<VNX1_QHSD:gs_scale>")
> 
> a bunch of times, as there's a ton of them?  I'm not entirely sure if that
> could manifest as an actual bug, though...
But won't this cause (const_int 0) to no longer match because CONST_INT 
nodes are modeless (VOIDmode)?

Jeff

Vineet Gupta July 21, 2023, 6:55 p.m. UTC | #5

On 7/21/23 11:31, Palmer Dabbelt wrote:
> On Fri, 21 Jul 2023 10:55:52 PDT (-0700), Vineet Gupta wrote:
>> DF +0.0 is bitwise all zeros so int x0 store to mem can be used to 
>> optimize it.
>>
>> void zd(double *) { *d = 0.0; }
>>
>> currently:
>>
>> | fmv.d.x fa5,zero
>> | fsd     fa5,0(a0)
>> | ret
>>
>> With patch
>>
>> | sd      zero,0(a0)
>> | ret
>>
>> This came to light when testing the in-flight f-m-o patch where an ICE
>> was gettinh triggered due to lack of this pattern but turns out this
>> is an independent optimization of its own [1]
>>
>> [1] https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624857.html
>>
>> Apparently this is a regression in gcc-13, introduced by commit
>> ef85d150b5963 ("RISC-V: Enable TARGET_SUPPORTS_WIDE_INT") and the fix
>> thus is a partial revert of that change.
>
> Given that it can ICE, we should probably backport it to 13.

FWIW ICE is on an in-flight for-gcc-14 patch, not something in tree 
already. And this will merge ahead of that.
I'm fine with backport though.

>
>> Ran thru full multilib testsuite, there was 1 false failure due to
>
> Did you run the test with autovec?

I have standard 32/64 mutlilibs, but no 'v' in arch so autovec despite 
being enabled at -O2 and above will not kick in.
I think we should add a 'v' multilib.


> There's also a pmode_reg_or_0_operand, some of those don't appear 
> protected from FP values.  So we might need something like
>
> diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
> index cd5b19457f8..d8ce9223343 100644
> --- a/gcc/config/riscv/autovec.md
> +++ b/gcc/config/riscv/autovec.md
> @@ -63,7 +63,7 @@ (define_expand "movmisalign<mode>"
>
> (define_expand "len_mask_gather_load<VNX1_QHSD:mode><VNX1_QHSDI:mode>"
>   [(match_operand:VNX1_QHSD 0 "register_operand")
> -   (match_operand 1 "pmode_reg_or_0_operand")
> +   (match_operand:P 1 "pmode_reg_or_0_operand")
>    (match_operand:VNX1_QHSDI 2 "register_operand")
>    (match_operand 3 "<VNX1_QHSD:gs_extension>")
>    (match_operand 4 "<VNX1_QHSD:gs_scale>")
>
> a bunch of times, as there's a ton of them?  I'm not entirely sure if 
> that
> could manifest as an actual bug, though...

What does 'P' do here ?

>> +
>> +void zd(double *d) { *d = 0.0;  }
>> +void zf(float *f)  { *f = 0.0;  }
>> +
>> +/* { dg-final { scan-assembler-not "\tfmv\\.d\\.x\t" } } */
>> +/* { dg-final { scan-assembler-not "\tfmv\\.s\\.x\t" } } */
>
> IIUC the pattern to emit fmv suffers from the same bug -- it's fixed 
> in the same
> way, but I think we might be able to come up with a test for it: 
> `fmv.d.x FREG,
> x0` would be the fastest way to generate 0.0, so maybe something like
>
>    double sum(double *d) {
>      double sum = 0;
>      for (int i = 0; i < 8; ++i)
>        sum += d[i];
>      return sum;
>    }
>
> would do it?  That's generating the fmv on 13 for me, though, so maybe 
> I'm
> missing something?`

I need to unpack this first :-)

>
>> diff --git a/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c 
>> b/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c
>> index 1036044291e7..89eb48bed1b9 100644
>> --- a/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c
>> +++ b/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c
>> @@ -18,7 +18,7 @@ d2ll (double d)
>>  /* { dg-final { scan-assembler "th.fmv.hw.x" } } */
>>  /* { dg-final { scan-assembler "fmv.x.w" } } */
>>  /* { dg-final { scan-assembler "th.fmv.x.hw" } } */
>> -/* { dg-final { scan-assembler-not "sw" } } */
>> -/* { dg-final { scan-assembler-not "fld" } } */
>> -/* { dg-final { scan-assembler-not "fsd" } } */
>> -/* { dg-final { scan-assembler-not "lw" } } */
>> +/* { dg-final { scan-assembler-not "\tsw\t" } } */
>> +/* { dg-final { scan-assembler-not "\tfld\t" } } */
>> +/* { dg-final { scan-assembler-not "\tfsd\t" } } */
>> +/* { dg-final { scan-assembler-not "\tlw\t" } } */
>
> I think that autovec one is the only possible dependency that might 
> have snuck
> in, so we should be safe otherwise.  Thanks!

I'm not sure if this specific comment is related to the xthead test or 
continuation of above.
For xthead it is real issue since I saw a random "lw" in lto assembler 
output.

Vineet Gupta July 21, 2023, 7:37 p.m. UTC | #6

On 7/21/23 11:31, Palmer Dabbelt wrote:
>
> IIUC the pattern to emit fmv suffers from the same bug -- it's fixed 
> in the same
> way, but I think we might be able to come up with a test for it: 
> `fmv.d.x FREG,
> x0` would be the fastest way to generate 0.0, so maybe something like
>
>    double sum(double *d) {
>      double sum = 0;
>      for (int i = 0; i < 8; ++i)
>        sum += d[i];
>      return sum;
>    }
>
> would do it?  That's generating the fmv on 13 for me, though, so maybe 
> I'm
> missing something?` 

I don't think we can avoid FMV in this case

     fmv.d.x    fa0,zero     #1
     addi    a5,a0,64
.L2:
     fld    fa5,0(a0)
     addi    a0,a0,8
     fadd.d    fa0,fa0,fa5   #2
     bne    a0,a5,.L2
     ret

In #1, the zero needs to be setup in FP reg (possible using FMV), since 
in #2 it will be used for FP math.

If we change ur test slightly,

double zadd(double *d) {
      double sum = 0.0;
      for (int i = 0; i < 8; ++i)
        d[i] = sum;
      return sum;
}

We still get the optimal code for writing to FP 0. The last FMV is 
unavoidable as we need an FP return reg.


     addi    a5,a0,64
.L2:
     sd    zero,0(a0)
     addi    a0,a0,8
     bne    a0,a5,.L2
     fmv.d.x    fa0,zero
     ret

Jeff Law July 22, 2023, 6:03 a.m. UTC | #7

On 7/21/23 12:55, Vineet Gupta wrote:

>>>
>>> Apparently this is a regression in gcc-13, introduced by commit
>>> ef85d150b5963 ("RISC-V: Enable TARGET_SUPPORTS_WIDE_INT") and the fix
>>> thus is a partial revert of that change.
>>
>> Given that it can ICE, we should probably backport it to 13.
> 
> FWIW ICE is on an in-flight for-gcc-14 patch, not something in tree 
> already. And this will merge ahead of that.
> I'm fine with backport though.
It's latent on the gcc-13 branch from an ICE standpoint and would 
probably stay that way -- triggering would require something to 
re-recognize the pattern after register allocation.

I won't object to it going into gcc-13.  No strong opinions either way, 
happy to go with consensus opinion.



>> There's also a pmode_reg_or_0_operand, some of those don't appear 
>> protected from FP values.  So we might need something like
>>
>> diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
>> index cd5b19457f8..d8ce9223343 100644
>> --- a/gcc/config/riscv/autovec.md
>> +++ b/gcc/config/riscv/autovec.md
>> @@ -63,7 +63,7 @@ (define_expand "movmisalign<mode>"
>>
>> (define_expand "len_mask_gather_load<VNX1_QHSD:mode><VNX1_QHSDI:mode>"
>>   [(match_operand:VNX1_QHSD 0 "register_operand")
>> -   (match_operand 1 "pmode_reg_or_0_operand")
>> +   (match_operand:P 1 "pmode_reg_or_0_operand")
>>    (match_operand:VNX1_QHSDI 2 "register_operand")
>>    (match_operand 3 "<VNX1_QHSD:gs_extension>")
>>    (match_operand 4 "<VNX1_QHSD:gs_scale>")
>>
>> a bunch of times, as there's a ton of them?  I'm not entirely sure if 
>> that
>> could manifest as an actual bug, though...
> 
> What does 'P' do here ?
It will force the operand to match the P mode iterator, which should be 
DImode for rv64 and SImode for rv32.  I'm not sure this is wise as I 
think it'll end up rejecting (const_int 0) because CONST_INT nodes do 
not have a mode.

Jeff

Palmer Dabbelt July 25, 2023, 11:05 p.m. UTC | #8

On Fri, 21 Jul 2023 11:47:58 PDT (-0700), gcc-patches@gcc.gnu.org wrote:
> On 7/21/23 12:31, Palmer Dabbelt wrote:
>> (define_expand "len_mask_gather_load<VNX1_QHSD:mode><VNX1_QHSDI:mode>"
>>    [(match_operand:VNX1_QHSD 0 "register_operand")
>> -   (match_operand 1 "pmode_reg_or_0_operand")
>> +   (match_operand:P 1 "pmode_reg_or_0_operand")
>>     (match_operand:VNX1_QHSDI 2 "register_operand")
>>     (match_operand 3 "<VNX1_QHSD:gs_extension>")
>>     (match_operand 4 "<VNX1_QHSD:gs_scale>")
>>
>> a bunch of times, as there's a ton of them?  I'm not entirely sure if that
>> could manifest as an actual bug, though...
> But won't this cause (const_int 0) to no longer match because CONST_INT
> nodes are modeless (VOIDmode)?

I poked around a bit and I'm not actually sure, I'm kind of lost on the docs
here.  IIUC we're eliding the VOIDmode in the predicate correctly

    (define_predicate "const_0_operand"
      (and (match_code "const_int,const_wide_int,const_vector")
           (match_test "op == CONST0_RTX (GET_MODE (op))")))

so we're OK there, otherwise we'd presumably have similar problems with
expanders like

    (define_expand "subsi3"
      [(set (match_operand:SI           0 "register_operand" "= r")
           (minus:SI (match_operand:SI 1 "reg_or_0_operand" " rJ")
                     (match_operand:SI 2 "register_operand" "  r")))]
      ""

which we have a few of -- though it'd be kind of a silent failure, as
presumably we'd just end up with some more move-x0s emitted?

Jeff Law July 26, 2023, 3:09 a.m. UTC | #9

On 7/25/23 17:05, Palmer Dabbelt wrote:
> On Fri, 21 Jul 2023 11:47:58 PDT (-0700), gcc-patches@gcc.gnu.org wrote:
>> On 7/21/23 12:31, Palmer Dabbelt wrote:
>>> (define_expand "len_mask_gather_load<VNX1_QHSD:mode><VNX1_QHSDI:mode>"
>>>    [(match_operand:VNX1_QHSD 0 "register_operand")
>>> -   (match_operand 1 "pmode_reg_or_0_operand")
>>> +   (match_operand:P 1 "pmode_reg_or_0_operand")
>>>     (match_operand:VNX1_QHSDI 2 "register_operand")
>>>     (match_operand 3 "<VNX1_QHSD:gs_extension>")
>>>     (match_operand 4 "<VNX1_QHSD:gs_scale>")
>>>
>>> a bunch of times, as there's a ton of them?  I'm not entirely sure if 
>>> that
>>> could manifest as an actual bug, though...
>> But won't this cause (const_int 0) to no longer match because CONST_INT
>> nodes are modeless (VOIDmode)?
> 
> I poked around a bit and I'm not actually sure, I'm kind of lost on the 
> docs
> here.  IIUC we're eliding the VOIDmode in the predicate correctly
> 
>     (define_predicate "const_0_operand"
>       (and (match_code "const_int,const_wide_int,const_vector")
>            (match_test "op == CONST0_RTX (GET_MODE (op))")))
> 
> so we're OK there, otherwise we'd presumably have similar problems with
> expanders like
> 
>     (define_expand "subsi3"
>       [(set (match_operand:SI           0 "register_operand" "= r")
>            (minus:SI (match_operand:SI 1 "reg_or_0_operand" " rJ")
>                      (match_operand:SI 2 "register_operand" "  r")))]
>       ""
> 
> which we have a few of -- though it'd be kind of a silent failure, as
> presumably we'd just end up with some more move-x0s emitted?
It's a bit messy to say the least.  However, we can look at other ports 
and after doing so I'm less sure my concern is valid.

Take the typical movXX pattern or expander.  Both operands have a mode, 
so things like CONST_INT must be passing through, even though they're 
VOIDmode.

So it's probably a non-issue.
jeff

RISC-V: optim const DF +0.0 store to mem [PR/110748]

Commit Message

Comments

Patch