diff mbox series

[to-be-committed,RISC-V] Avoid unnecessary extensions after sCC insns

Message ID 84d0a9e2-a49c-4030-93b1-5f813c93389a@gmail.com
State New
Headers show
Series [to-be-committed,RISC-V] Avoid unnecessary extensions after sCC insns | expand

Commit Message

Jeff Law Sept. 4, 2024, 8:47 p.m. UTC
So I was looking at a performance regression in spec with Ventana's 
internal tree.  Ultimately the problem was a bad interaction with an 
internal patch (REP_MODE_EXTENDED), fwprop and ext-dce.  The details of 
that problem aren't particularly important.


Removal of the local patch went reasonably well.  But I did see some 
secondary cases where we had redundant sign extensions.  The most 
notable cases come from the integer sCC insns.

Expansion of those cases for rv64 can be improved using Jivan's trick. 
ie, if the target is not DImode, then create a DImode temporary for the 
result and copy the low bits out with a promoted subreg to the real target.


With the change in expansion the final code we generate is slightly 
different for a few tests at -O1/-Og, but should perform the same.  The 
key for the affected tests is we're not seeing the introduction of 
unnecessary extensions.  Rather than adjust the regexps to handle the 
-O1/-Og output, skipping for those seemed OK to me.  I didn't extract a 
testcase.  I'm a bit fried from digging through LTO'd code right now ;-)

Anyway, this has gone through my tester on rv32 and rv64.  I'll let it 
spin in the pre-commit tester before taking further action.

Jeff
gcc/
	* config/riscv/riscv.cc (riscv_expand_int_scc): For rv64, use a DI
	temporary for the output and a promoted subreg to extract it into SI
	target.

gcc/testsuite/
	* gcc.target/riscv/sge.c: Skip for -O1 and -Og.
	* gcc.target/riscv/sgeu.c: Likewise.
	* gcc.target/riscv/sle.c: Likewise.
	* gcc.target/riscv/sleu.c: Likewise.

Comments

Palmer Dabbelt Sept. 4, 2024, 10:11 p.m. UTC | #1
On Wed, 04 Sep 2024 13:47:58 PDT (-0700), jeffreyalaw@gmail.com wrote:
>
> So I was looking at a performance regression in spec with Ventana's
> internal tree.  Ultimately the problem was a bad interaction with an
> internal patch (REP_MODE_EXTENDED), fwprop and ext-dce.  The details of
> that problem aren't particularly important.
>
>
> Removal of the local patch went reasonably well.  But I did see some
> secondary cases where we had redundant sign extensions.  The most
> notable cases come from the integer sCC insns.
>
> Expansion of those cases for rv64 can be improved using Jivan's trick.
> ie, if the target is not DImode, then create a DImode temporary for the
> result and copy the low bits out with a promoted subreg to the real target.
>
>
> With the change in expansion the final code we generate is slightly
> different for a few tests at -O1/-Og, but should perform the same.  The
> key for the affected tests is we're not seeing the introduction of
> unnecessary extensions.  Rather than adjust the regexps to handle the
> -O1/-Og output, skipping for those seemed OK to me.  I didn't extract a
> testcase.  I'm a bit fried from digging through LTO'd code right now ;-)

Just spot-checking that first one, it's changing from

	slt	a0,a0,a1
	seqz	a0,a0
	ret

to

	slt	a0,a0,a1
	xori	a0,a0,1
	ret

which smells like it's just an arbitrary instruction selection decision getting
tripped up -- they're the same cost (or at least I'd expect them to be pretty
much everywhere) and do the same thing for the 1-bit output, so it's not like
three's a reason to select one rather than the other.

IIUC we're really trying to optimize for "doing a setCC without an 
extension", so I think if we just adjust the positive-regex to match for 
that (as opposed to matching for inverting the output) then we'd have 
tests rebust to the instruction selection.

Unless I've managed to screw something up, this would do it
<https://inbox.sourceware.org/gcc-patches/20240904220707.3900-1-palmer@rivosinc.com/>
-- the sign extension rules are always a bit tricky as there's some implicit
assumptions, but I think this is safe because it's all internal to this
restricted case of the setCC.

> Anyway, this has gone through my tester on rv32 and rv64.  I'll let it
> spin in the pre-commit tester before taking further action.

Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>

Thanks!
>
> Jeff
>
> gcc/
> 	* config/riscv/riscv.cc (riscv_expand_int_scc): For rv64, use a DI
> 	temporary for the output and a promoted subreg to extract it into SI
> 	target.
>
> gcc/testsuite/
> 	* gcc.target/riscv/sge.c: Skip for -O1 and -Og.
> 	* gcc.target/riscv/sgeu.c: Likewise.
> 	* gcc.target/riscv/sle.c: Likewise.
> 	* gcc.target/riscv/sleu.c: Likewise.
>
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index f82e64a6fec..ad8bcd3e959 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -4891,13 +4891,31 @@ riscv_expand_int_scc (rtx target, enum rtx_code code, rtx op0, rtx op1, bool *in
>    riscv_extend_comparands (code, &op0, &op1);
>    op0 = force_reg (word_mode, op0);
>
> +  /* For sub-word targets on rv64, do the computation in DImode
> +     then extract the lowpart for the final target, marking it
> +     as sign extended.  Note that it's also properly zero extended,
> +     but it's probably more profitable to expose it as sign extended.  */
> +  rtx t;
> +  if (TARGET_64BIT && GET_MODE (target) == SImode)
> +    t = gen_reg_rtx (DImode);
> +  else
> +    t = target;
> +
>    if (code == EQ || code == NE)
>      {
>        rtx zie = riscv_zero_if_equal (op0, op1);
> -      riscv_emit_binary (code, target, zie, const0_rtx);
> +      riscv_emit_binary (code, t, zie, const0_rtx);
>      }
>    else
> -    riscv_emit_int_order_test (code, invert_ptr, target, op0, op1);
> +    riscv_emit_int_order_test (code, invert_ptr, t, op0, op1);
> +
> +  if (t != target)
> +    {
> +      t = gen_lowpart (SImode, t);
> +      SUBREG_PROMOTED_VAR_P (t) = 1;
> +      SUBREG_PROMOTED_SET (t, SRP_SIGNED);
> +      emit_move_insn (target, t);
> +    }
>  }
>
>  /* Like riscv_expand_int_scc, but for floating-point comparisons.  */
> diff --git a/gcc/testsuite/gcc.target/riscv/sge.c b/gcc/testsuite/gcc.target/riscv/sge.c
> index 5f7e7ae82db..ab278d18d3b 100644
> --- a/gcc/testsuite/gcc.target/riscv/sge.c
> +++ b/gcc/testsuite/gcc.target/riscv/sge.c
> @@ -1,6 +1,6 @@
>  /* { dg-do compile } */
>  /* { dg-require-effective-target rv64 } */
> -/* { dg-skip-if "" { *-*-* } { "-O0" } } */
> +/* { dg-skip-if "" { *-*-* } { "-O0" "-O1" "-Og" } } */
>
>  int
>  sge (int x, int y)
> diff --git a/gcc/testsuite/gcc.target/riscv/sgeu.c b/gcc/testsuite/gcc.target/riscv/sgeu.c
> index 234b9aa52bd..38c2272a55f 100644
> --- a/gcc/testsuite/gcc.target/riscv/sgeu.c
> +++ b/gcc/testsuite/gcc.target/riscv/sgeu.c
> @@ -1,6 +1,6 @@
>  /* { dg-do compile } */
>  /* { dg-require-effective-target rv64 } */
> -/* { dg-skip-if "" { *-*-* } { "-O0" } } */
> +/* { dg-skip-if "" { *-*-* } { "-O0" "-O1" "-Og" } } */
>
>  int
>  sgeu (unsigned int x, unsigned int y)
> diff --git a/gcc/testsuite/gcc.target/riscv/sle.c b/gcc/testsuite/gcc.target/riscv/sle.c
> index 3259c191598..4371068c50e 100644
> --- a/gcc/testsuite/gcc.target/riscv/sle.c
> +++ b/gcc/testsuite/gcc.target/riscv/sle.c
> @@ -1,6 +1,6 @@
>  /* { dg-do compile } */
>  /* { dg-require-effective-target rv64 } */
> -/* { dg-skip-if "" { *-*-* } { "-O0" } } */
> +/* { dg-skip-if "" { *-*-* } { "-O0" "-O1" "-Og" } } */
>
>  int
>  sle (int x, int y)
> diff --git a/gcc/testsuite/gcc.target/riscv/sleu.c b/gcc/testsuite/gcc.target/riscv/sleu.c
> index 301b8c32eb7..8d4ee50b07c 100644
> --- a/gcc/testsuite/gcc.target/riscv/sleu.c
> +++ b/gcc/testsuite/gcc.target/riscv/sleu.c
> @@ -1,6 +1,6 @@
>  /* { dg-do compile } */
>  /* { dg-require-effective-target rv64 } */
> -/* { dg-skip-if "" { *-*-* } { "-O0" } } */
> +/* { dg-skip-if "" { *-*-* } { "-O0" "-O1" "-Og" } } */
>
>  int
>  sleu (unsigned int x, unsigned int y)
Jeff Law Sept. 4, 2024, 10:31 p.m. UTC | #2
On 9/4/24 4:11 PM, Palmer Dabbelt wrote:
> On Wed, 04 Sep 2024 13:47:58 PDT (-0700), jeffreyalaw@gmail.com wrote:
>>
>> So I was looking at a performance regression in spec with Ventana's
>> internal tree.  Ultimately the problem was a bad interaction with an
>> internal patch (REP_MODE_EXTENDED), fwprop and ext-dce.  The details of
>> that problem aren't particularly important.
>>
>>
>> Removal of the local patch went reasonably well.  But I did see some
>> secondary cases where we had redundant sign extensions.  The most
>> notable cases come from the integer sCC insns.
>>
>> Expansion of those cases for rv64 can be improved using Jivan's trick.
>> ie, if the target is not DImode, then create a DImode temporary for the
>> result and copy the low bits out with a promoted subreg to the real target.
>>
>>
>> With the change in expansion the final code we generate is slightly
>> different for a few tests at -O1/-Og, but should perform the same.  The
>> key for the affected tests is we're not seeing the introduction of
>> unnecessary extensions.  Rather than adjust the regexps to handle the
>> -O1/-Og output, skipping for those seemed OK to me.  I didn't extract a
>> testcase.  I'm a bit fried from digging through LTO'd code right now ;-)
> 
> Just spot-checking that first one, it's changing from
> 
> 	slt	a0,a0,a1
> 	seqz	a0,a0
> 	ret
> 
> to
> 
> 	slt	a0,a0,a1
> 	xori	a0,a0,1
> 	ret
> 
> which smells like it's just an arbitrary instruction selection decision getting
> tripped up -- they're the same cost (or at least I'd expect them to be pretty
> much everywhere) and do the same thing for the 1-bit output, so it's not like
> three's a reason to select one rather than the other.
Exactly.  It's just instruction selection changes as a result of 
something changing early in the RTL pipeline.  I would expect both of 
those sequences to perform the same.

The RTL generation change is really targeted at cases that show up 
elsewhere.  For example   feval(state_t*, int, t_eval_comps*) in deepsjeng:

>       011038b3          snez                    a7,a7
>       01e037b3          snez                    a5,t5
[ ... ]
>       2881              sext.w                  a7,a7
[ ... ]
>       0007851b          sext.w                  a0,a5


By constructing into an X mode temporary and extracting via a promoted 
subreg, we tell the rest of the RTL pipeline that the value is already 
sign extended and the explicit extensions above are unnecessary.
Jeff
diff mbox series

Patch

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index f82e64a6fec..ad8bcd3e959 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -4891,13 +4891,31 @@  riscv_expand_int_scc (rtx target, enum rtx_code code, rtx op0, rtx op1, bool *in
   riscv_extend_comparands (code, &op0, &op1);
   op0 = force_reg (word_mode, op0);
 
+  /* For sub-word targets on rv64, do the computation in DImode
+     then extract the lowpart for the final target, marking it
+     as sign extended.  Note that it's also properly zero extended,
+     but it's probably more profitable to expose it as sign extended.  */
+  rtx t;
+  if (TARGET_64BIT && GET_MODE (target) == SImode)
+    t = gen_reg_rtx (DImode);
+  else
+    t = target;
+
   if (code == EQ || code == NE)
     {
       rtx zie = riscv_zero_if_equal (op0, op1);
-      riscv_emit_binary (code, target, zie, const0_rtx);
+      riscv_emit_binary (code, t, zie, const0_rtx);
     }
   else
-    riscv_emit_int_order_test (code, invert_ptr, target, op0, op1);
+    riscv_emit_int_order_test (code, invert_ptr, t, op0, op1);
+
+  if (t != target)
+    {
+      t = gen_lowpart (SImode, t);
+      SUBREG_PROMOTED_VAR_P (t) = 1;
+      SUBREG_PROMOTED_SET (t, SRP_SIGNED);
+      emit_move_insn (target, t);
+    }
 }
 
 /* Like riscv_expand_int_scc, but for floating-point comparisons.  */
diff --git a/gcc/testsuite/gcc.target/riscv/sge.c b/gcc/testsuite/gcc.target/riscv/sge.c
index 5f7e7ae82db..ab278d18d3b 100644
--- a/gcc/testsuite/gcc.target/riscv/sge.c
+++ b/gcc/testsuite/gcc.target/riscv/sge.c
@@ -1,6 +1,6 @@ 
 /* { dg-do compile } */
 /* { dg-require-effective-target rv64 } */
-/* { dg-skip-if "" { *-*-* } { "-O0" } } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-O1" "-Og" } } */
 
 int
 sge (int x, int y)
diff --git a/gcc/testsuite/gcc.target/riscv/sgeu.c b/gcc/testsuite/gcc.target/riscv/sgeu.c
index 234b9aa52bd..38c2272a55f 100644
--- a/gcc/testsuite/gcc.target/riscv/sgeu.c
+++ b/gcc/testsuite/gcc.target/riscv/sgeu.c
@@ -1,6 +1,6 @@ 
 /* { dg-do compile } */
 /* { dg-require-effective-target rv64 } */
-/* { dg-skip-if "" { *-*-* } { "-O0" } } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-O1" "-Og" } } */
 
 int
 sgeu (unsigned int x, unsigned int y)
diff --git a/gcc/testsuite/gcc.target/riscv/sle.c b/gcc/testsuite/gcc.target/riscv/sle.c
index 3259c191598..4371068c50e 100644
--- a/gcc/testsuite/gcc.target/riscv/sle.c
+++ b/gcc/testsuite/gcc.target/riscv/sle.c
@@ -1,6 +1,6 @@ 
 /* { dg-do compile } */
 /* { dg-require-effective-target rv64 } */
-/* { dg-skip-if "" { *-*-* } { "-O0" } } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-O1" "-Og" } } */
 
 int
 sle (int x, int y)
diff --git a/gcc/testsuite/gcc.target/riscv/sleu.c b/gcc/testsuite/gcc.target/riscv/sleu.c
index 301b8c32eb7..8d4ee50b07c 100644
--- a/gcc/testsuite/gcc.target/riscv/sleu.c
+++ b/gcc/testsuite/gcc.target/riscv/sleu.c
@@ -1,6 +1,6 @@ 
 /* { dg-do compile } */
 /* { dg-require-effective-target rv64 } */
-/* { dg-skip-if "" { *-*-* } { "-O0" } } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-O1" "-Og" } } */
 
 int
 sleu (unsigned int x, unsigned int y)