diff mbox series

RISC-V: Enable Hoist to GCSE simple constants

Message ID 20230810003026.214982-1-vineetg@rivosinc.com
State New
Headers show
Series RISC-V: Enable Hoist to GCSE simple constants | expand

Commit Message

Vineet Gupta Aug. 10, 2023, 12:30 a.m. UTC
Hoist want_to_gcse_p () calls rtx_cost () to compute max distance for
hoist candidates . For a const with cost 1 backend currently returns 0,
causing Hoist to bail and elide GCSE.

Note that constants requiring more than 1 insns to setup were working
already since backend is returning 1 as well. Arguably that needs updating
as well to reflect cost better, but that should be a different change
anyways.

To keep testsuite parity, some V tests need updatinge which started failing
in the new costing regime.

gcc/ChangeLog:
	* gcc/config/riscv.cc (riscv_rtx_costs): Adjust const_int cost.

gcc/testsuite/ChangeLog:
	* gcc.target/riscv/gcse-const.c: New Test
	* gcc.target/riscv/rvv/vsetvl/vlmax_conflict-7.c: Disable for
	  -O2.
	* gcc.target/riscv/rvv/vsetvl/vlmax_conflict-8.c: Ditto.

Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
---
 gcc/config/riscv/riscv.cc                           |  9 ++-------
 gcc/testsuite/gcc.target/riscv/gcse-const.c         | 13 +++++++++++++
 .../gcc.target/riscv/rvv/vsetvl/vlmax_conflict-7.c  |  2 +-
 .../gcc.target/riscv/rvv/vsetvl/vlmax_conflict-8.c  |  2 +-
 4 files changed, 17 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/gcse-const.c

Comments

Jeff Law Aug. 23, 2023, 8:04 p.m. UTC | #1
On 8/9/23 18:30, Vineet Gupta wrote:
> Hoist want_to_gcse_p () calls rtx_cost () to compute max distance for
> hoist candidates . For a const with cost 1 backend currently returns 0,
> causing Hoist to bail and elide GCSE.
> 
> Note that constants requiring more than 1 insns to setup were working
> already since backend is returning 1 as well. Arguably that needs updating
> as well to reflect cost better, but that should be a different change
> anyways.
> 
> To keep testsuite parity, some V tests need updatinge which started failing
> in the new costing regime.
> 
> gcc/ChangeLog:
> 	* gcc/config/riscv.cc (riscv_rtx_costs): Adjust const_int cost.
> 
> gcc/testsuite/ChangeLog:
> 	* gcc.target/riscv/gcse-const.c: New Test
> 	* gcc.target/riscv/rvv/vsetvl/vlmax_conflict-7.c: Disable for
> 	  -O2.
> 	* gcc.target/riscv/rvv/vsetvl/vlmax_conflict-8.c: Ditto.
Thanks for your patience on this.  I needed a bit of time to gather my 
thoughts and review some code.


> index 8b7256108157..1802eef908fc 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -2464,14 +2464,9 @@ riscv_rtx_costs (rtx x, machine_mode mode, int outer_code, int opno ATTRIBUTE_UN
>       case CONST:
>         if ((cost = riscv_const_insns (x)) > 0)
>   	{
> -	  /* If the constant is likely to be stored in a GPR, SETs of
> -	     single-insn constants are as cheap as register sets; we
> -	     never want to CSE them.  */
> +	  /* Hoist will GCSE constants only if cost is non-zero.  */
>   	  if (cost == 1 && outer_code == SET)
> -	    *total = 0;
> -	  /* When we load a constant more than once, it usually is better
> -	     to duplicate the last operation in the sequence than to CSE
> -	     the constant itself.  */
> +	    *total = COSTS_N_INSNS (1);
>   	  else if (outer_code == SET || GET_MODE (x) == VOIDmode)
>   	    *total = COSTS_N_INSNS (1);
>   	}
So the concern here was we have two classes of constants which can be 
synthesized in a single instruction.

One class would be those constants that can be used as-is in most 
instructions.  (const_int 0) being the most obvious, but of course 
there's many others.

The other class can be synthesized in a single instruction, but aren't 
typically usable in something like addi, andi, etc.  A good example 
might be (const_int 0x40000000).


I wanted to make sure we were doing something sensible across those two 
cases.  And I think we probably are as we have an earlier check in the 
CONST_INT case (no I don't like the case fallthrus at all :(

So I think your change makes sense.   But I think it can be refined to 
simplify the larger chunk of code we're looking at:

>           /* If the constant is likely to be stored in a GPR, SETs of
>              single-insn constants are as cheap as register sets; we
>              never want to CSE them.  */
>           if (cost == 1 && outer_code == SET)
>             *total = 0;
>           /* When we load a constant more than once, it usually is better
>              to duplicate the last operation in the sequence than to CSE
>              the constant itself.  */
>           else if (outer_code == SET || GET_MODE (x) == VOIDmode)
>             *total = COSTS_N_INSNS (1);

Turns into
   if (outer_code == SET || GET_MODE (x) == VOIDmode)
     *total = COSTS_N_INSNS (1);

With a suitable comment about GCSE and the general desire to duplicate 
the last op rather than CSE the constant for multi instruction constant 
synthesis cases.

If you agree, then consider the patch pre-approved with that change.  If 
not, then state why and your original patch is OK as well.

Again, thanks for your patience on this.

jeff
Vineet Gupta Aug. 24, 2023, 12:42 a.m. UTC | #2
On 8/23/23 13:04, Jeff Law wrote:
> Thanks for your patience on this.  I needed a bit of time to gather my 
> thoughts and review some code.

No worries at all.

>> index 8b7256108157..1802eef908fc 100644
>> --- a/gcc/config/riscv/riscv.cc
>> +++ b/gcc/config/riscv/riscv.cc
>> @@ -2464,14 +2464,9 @@ riscv_rtx_costs (rtx x, machine_mode mode, int 
>> outer_code, int opno ATTRIBUTE_UN
>>       case CONST:
>>         if ((cost = riscv_const_insns (x)) > 0)
>>       {
>> -      /* If the constant is likely to be stored in a GPR, SETs of
>> -         single-insn constants are as cheap as register sets; we
>> -         never want to CSE them.  */
>> +      /* Hoist will GCSE constants only if cost is non-zero. */
>>         if (cost == 1 && outer_code == SET)
>> -        *total = 0;
>> -      /* When we load a constant more than once, it usually is better
>> -         to duplicate the last operation in the sequence than to CSE
>> -         the constant itself.  */
>> +        *total = COSTS_N_INSNS (1);
>>         else if (outer_code == SET || GET_MODE (x) == VOIDmode)
>>           *total = COSTS_N_INSNS (1);
>>       }
> So the concern here was we have two classes of constants which can be 
> synthesized in a single instruction.
>
> One class would be those constants that can be used as-is in most 
> instructions.  (const_int 0) being the most obvious, but of course 
> there's many others.
>
> The other class can be synthesized in a single instruction, but aren't 
> typically usable in something like addi, andi, etc.  A good example 
> might be (const_int 0x40000000).
>
>
> I wanted to make sure we were doing something sensible across those 
> two cases.  And I think we probably are as we have an earlier check in 
> the CONST_INT case

Exactly, I have a note written down to trace the call flow to remind 
myself how this works. I'll add a comment here to make this clear.

> (no I don't like the case fallthrus at all :(

Seriously, I detest it too, but the irony is I've now made my 2nd change 
in there and keep adding to ugliness :-(

>
> So I think your change makes sense.   But I think it can be refined to 
> simplify the larger chunk of code we're looking at:
>
>>           /* If the constant is likely to be stored in a GPR, SETs of
>>              single-insn constants are as cheap as register sets; we
>>              never want to CSE them.  */
>>           if (cost == 1 && outer_code == SET)
>>             *total = 0;
>>           /* When we load a constant more than once, it usually is 
>> better
>>              to duplicate the last operation in the sequence than to CSE
>>              the constant itself.  */
>>           else if (outer_code == SET || GET_MODE (x) == VOIDmode)
>>             *total = COSTS_N_INSNS (1);
>
> Turns into
>   if (outer_code == SET || GET_MODE (x) == VOIDmode)
>     *total = COSTS_N_INSNS (1);

Yep that's what I started with too but then left it, leaving it as an 
visual indication to fix things up when ultimately cost model returns 
the actual num of insns for a non trivial large const.Leaving the code 
there meant we  But I agree I'll fold it and add a TODO comment for 
improving the cost model.

For the current proposal I do want to understand/reason what is left 
there - what cases are we trying to filter out with #2467 ?

|    case CONST:
|      if ((cost = riscv_const_insns (x)) > 0)                 # 2465
|     {
|        if (outer_code == SET || GET_MODE (x) == VOIDmode)  # 2467
|                        *total = COSTS_N_INSNS 
(1);                            # 2468

(1) AFAIU, VOIDmode is for const_int - and supposedly true for symbolic 
addresses etc whose actual values are not known at compile time ? Or is 
it needed just as an artifact of the weird fall through.

(2) outer_code SET will kick in for set_src_cost( ) call from Hoist, 
which passes a const_int rtx directly.
       But in case of say expand_mult () -> set_src_cost ( ) called for say
         (mult:DI (reg:DI 134)
             (const_int [0x202020202020202]))

       In the eventual call for const_int operand, outer_code is MULT, 
so we elide #2468

       But wait we don't even hit #2467 since #2465 has a weird cap 
inside which caps 6 to 0.

|    case CONST_INT:
|      {
|          int cost = riscv_integer_cost (INTVAL (x));
|          /* Force complicated constants to memory.  */
|          return cost < 4 ? cost : 0;          #1321
|      }

This definitely needs to be tracked separately in a PR

>
> With a suitable comment about GCSE and the general desire to duplicate 
> the last op rather than CSE the constant for multi instruction 
> constant synthesis cases.

Hmm, do we not prefer GCSE/CSE a 3-4 insn const sequence. It seems the 
RA is capable enough of undoing that ;-)
For now I can I can keep the comment with current philosophy

>
> If you agree, then consider the patch pre-approved with that change.  
> If not, then state why and your original patch is OK as well.

I'm afraid I have more questions than either of us were hoping for :-)

But it seems we can chunk up the work for just Hoist enabling and then 
improve the const cost model with that new PR.
I'll spin up a buildroot testbed to get a wider impact of that change.



Thx,
-Vineet
Jeff Law Aug. 24, 2023, 1:51 p.m. UTC | #3
On 8/23/23 18:42, Vineet Gupta wrote:

> 
> Seriously, I detest it too, but the irony is I've now made my 2nd change 
> in there and keep adding to ugliness :-(
Happens to all of us sometimes.

> 
>>
>> So I think your change makes sense.   But I think it can be refined to 
>> simplify the larger chunk of code we're looking at:
>>
>>>           /* If the constant is likely to be stored in a GPR, SETs of
>>>              single-insn constants are as cheap as register sets; we
>>>              never want to CSE them.  */
>>>           if (cost == 1 && outer_code == SET)
>>>             *total = 0;
>>>           /* When we load a constant more than once, it usually is 
>>> better
>>>              to duplicate the last operation in the sequence than to CSE
>>>              the constant itself.  */
>>>           else if (outer_code == SET || GET_MODE (x) == VOIDmode)
>>>             *total = COSTS_N_INSNS (1);
>>
>> Turns into
>>   if (outer_code == SET || GET_MODE (x) == VOIDmode)
>>     *total = COSTS_N_INSNS (1);
> 
> Yep that's what I started with too but then left it, leaving it as an 
> visual indication to fix things up when ultimately cost model returns 
> the actual num of insns for a non trivial large const.Leaving the code 
> there meant we  But I agree I'll fold it and add a TODO comment for 
> improving the cost model.
> 
> For the current proposal I do want to understand/reason what is left 
> there - what cases are we trying to filter out with #2467 ?
> 
> |    case CONST:
> |      if ((cost = riscv_const_insns (x)) > 0)                 # 2465
> |     {
> |        if (outer_code == SET || GET_MODE (x) == VOIDmode)  # 2467
> |                        *total = COSTS_N_INSNS 
> (1);                            # 2468
> 
> (1) AFAIU, VOIDmode is for const_int - and supposedly true for symbolic 
> addresses etc whose actual values are not known at compile time ? Or is 
> it needed just as an artifact of the weird fall through.
I'd expect it to filter away some symbolics and perhaps floating point 
constants.


> 
> (2) outer_code SET will kick in for set_src_cost( ) call from Hoist, 
> which passes a const_int rtx directly.
>        But in case of say expand_mult () -> set_src_cost ( ) called for say
>          (mult:DI (reg:DI 134)
>              (const_int [0x202020202020202]))
> 
>        In the eventual call for const_int operand, outer_code is MULT, 
> so we elide #2468
> 
>        But wait we don't even hit #2467 since #2465 has a weird cap 
> inside which caps 6 to 0.
> 
> |    case CONST_INT:
> |      {
> |          int cost = riscv_integer_cost (INTVAL (x));
> |          /* Force complicated constants to memory.  */
> |          return cost < 4 ? cost : 0;          #1321
> |      }
> 
> This definitely needs to be tracked separately in a PR
And when we solve it, it'll likely need to be uarch driven.  I've hinted 
before that there'll be changes in this area that will allow some 
targets to handle most, if not all, of those complicated constants in a 
single cycle.

> 
>>
>> With a suitable comment about GCSE and the general desire to duplicate 
>> the last op rather than CSE the constant for multi instruction 
>> constant synthesis cases.
> 
> Hmm, do we not prefer GCSE/CSE a 3-4 insn const sequence. It seems the 
> RA is capable enough of undoing that ;-)
> For now I can I can keep the comment with current philosophy
I mentally set aside the multi-insn sequence concern.  Presumably the 
code was written that way for a reason and without a good one, backed by 
data, I was hesitant to push back on that choice.

> 
>>
>> If you agree, then consider the patch pre-approved with that change. 
>> If not, then state why and your original patch is OK as well.
> 
> I'm afraid I have more questions than either of us were hoping for :-)
> 
> But it seems we can chunk up the work for just Hoist enabling and then 
> improve the const cost model with that new PR.
> I'll spin up a buildroot testbed to get a wider impact of that change.
Exactly.  THere's multiple issues, but I think they can be tackled 
independently.

jeff
diff mbox series

Patch

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 8b7256108157..1802eef908fc 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -2464,14 +2464,9 @@  riscv_rtx_costs (rtx x, machine_mode mode, int outer_code, int opno ATTRIBUTE_UN
     case CONST:
       if ((cost = riscv_const_insns (x)) > 0)
 	{
-	  /* If the constant is likely to be stored in a GPR, SETs of
-	     single-insn constants are as cheap as register sets; we
-	     never want to CSE them.  */
+	  /* Hoist will GCSE constants only if cost is non-zero.  */
 	  if (cost == 1 && outer_code == SET)
-	    *total = 0;
-	  /* When we load a constant more than once, it usually is better
-	     to duplicate the last operation in the sequence than to CSE
-	     the constant itself.  */
+	    *total = COSTS_N_INSNS (1);
 	  else if (outer_code == SET || GET_MODE (x) == VOIDmode)
 	    *total = COSTS_N_INSNS (1);
 	}
diff --git a/gcc/testsuite/gcc.target/riscv/gcse-const.c b/gcc/testsuite/gcc.target/riscv/gcse-const.c
new file mode 100644
index 000000000000..b04707ce9745
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/gcse-const.c
@@ -0,0 +1,13 @@ 
+/* Slightly modified copy of gcc.target/arm/pr40956.c.  */
+/* { dg-options "-Os" }  */
+/* Make sure the constant "6" is loaded into register only once.  */
+/* { dg-final { scan-assembler-times "\tli.*6" 1 } } */
+
+int foo(int p, int* q)
+{
+  if (p!=9)
+    *q = 6;
+  else
+    *(q+1) = 6;
+  return 3;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_conflict-7.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_conflict-7.c
index 60ad108666f8..83618880c8ea 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_conflict-7.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_conflict-7.c
@@ -21,5 +21,5 @@  void f (int32_t * restrict in, int32_t * restrict out, size_t n, size_t cond, si
 }
 
 /* { dg-final { scan-assembler-times {vsetvli} 4 { target { no-opts "-O0"  no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" } } } } */
-/* { dg-final { scan-assembler-times {j\s+\.L[0-9]+\s+\.L[0-9]+:\s+vlm\.v} 1 { target { no-opts "-O0" no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" } } } } */
+/* { dg-final { scan-assembler-times {j\s+\.L[0-9]+\s+\.L[0-9]+:\s+vlm\.v} 1 { target { no-opts "-O0" no-opts "-O1" no-opts "-O2" no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" } } } } */
 /* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*zero,\s*e8,\s*m8,\s*t[au],\s*m[au]} 3 { target { no-opts "-O0" no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" } } } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_conflict-8.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_conflict-8.c
index 7b9574cc332d..8aca839c49aa 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_conflict-8.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_conflict-8.c
@@ -21,7 +21,7 @@  void f (int32_t * restrict in, int32_t * restrict out, size_t n, size_t cond, si
 }
 
 /* { dg-final { scan-assembler-times {vsetvli} 5 { target { no-opts "-O0"  no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" } } } } */
-/* { dg-final { scan-assembler-times {j\s+\.L[0-9]+\s+\.L[0-9]+:\s+vlm\.v} 1 { target { no-opts "-O0" no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" } } } } */
+/* { dg-final { scan-assembler-times {j\s+\.L[0-9]+\s+\.L[0-9]+:\s+vlm\.v} 1 { target { no-opts "-O0" no-opts "-O1" no-opts "-O2"  no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" } } } } */
 /* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*zero,\s*e8,\s*m8,\s*t[au],\s*m[au]} 3 { target { no-opts "-O0" no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" } } } } */
 /* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*zero,\s*e8,\s*mf8,\s*t[au],\s*m[au]} 1 { target { no-opts "-O0" no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" } } } } */
 /* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*zero,\s*e16,\s*mf2,\s*t[au],\s*m[au]} 1 { target { no-opts "-O0" no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" } } } } */