diff mbox series

[2/2] RISC-V: Constant synthesis by shifting the lower half

Message ID 20240808171010.16216-2-rzinsly@ventanamicro.com
State New
Headers show
Series [1/2] RISC-V: Constant synthesis with same upper and lower halves | expand

Commit Message

Raphael Moreira Zinsly Aug. 8, 2024, 5:10 p.m. UTC
Improve handling of constants where the high half can be constructed
by shifting the low half.

gcc/ChangeLog:
	* config/riscv/riscv.cc (riscv_build_integer): Detect constants
	where the higher half is a shift of the lower half.

gcc/testsuite/ChangeLog:
	* gcc.target/riscv/synthesis-12.c: New test.
---
 gcc/config/riscv/riscv.cc                     | 39 +++++++++++++++++++
 gcc/testsuite/gcc.target/riscv/synthesis-12.c | 27 +++++++++++++
 2 files changed, 66 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/synthesis-12.c

Comments

Jeff Law Aug. 25, 2024, 3:51 p.m. UTC | #1
On 8/8/24 11:10 AM, Raphael Moreira Zinsly wrote:
> Improve handling of constants where the high half can be constructed
> by shifting the low half.
> 
> gcc/ChangeLog:
> 	* config/riscv/riscv.cc (riscv_build_integer): Detect constants
> 	where the higher half is a shift of the lower half.
> 
> gcc/testsuite/ChangeLog:
> 	* gcc.target/riscv/synthesis-12.c: New test.
Don't you need to check somewhere that the upper/lower halves are the 
same after an appropriate shift?   It looks like you just assume they are.


Jeff
Jeff Law Aug. 25, 2024, 4:26 p.m. UTC | #2
On 8/8/24 11:10 AM, Raphael Moreira Zinsly wrote:
> Improve handling of constants where the high half can be constructed
> by shifting the low half.
> 
> gcc/ChangeLog:
> 	* config/riscv/riscv.cc (riscv_build_integer): Detect constants
> 	where the higher half is a shift of the lower half.
> 
> gcc/testsuite/ChangeLog:
> 	* gcc.target/riscv/synthesis-12.c: New test.
Oh, nevermind.  The test is a bit later than I expected to find it.

I'd move the test for equality after shifting to a point before you call 
riscv_build_integer_1.  That routine is more expensive than I'd like 
with all the recursive calls and such, so let's do the relatively cheap 
test first and only call riscv_build_integer_1 when there's a reasonable 
chance we can optimize.
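
A standalone sketch of that cheap shift-equality test (the helper name is
hypothetical and the GCC builtins stand in for ctz_hwi/clz_hwi so it
compiles on its own):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper mirroring the patch's trailing/leading-zero
   alignment: returns true and sets *shift (positive = left shift) when
   hival equals loval shifted by a nonzero amount.  Cheap enough to run
   before calling riscv_build_integer_1.  */
static bool
hi_is_shift_of_lo (uint32_t hival, uint32_t loval, int *shift)
{
  if (loval == 0 || hival == 0)
    return false;

  /* Align on trailing zeros: a positive difference means the high half
     is the low half shifted left.  */
  int s = __builtin_ctz (hival) - __builtin_ctz (loval);
  if (s > 0 && hival == loval << s)
    { *shift = s; return true; }
  if (s < 0 && hival == loval >> -s)
    { *shift = s; return true; }

  /* Align on leading zeros, catching shifts that discard set bits at
     one end of the low half.  */
  s = __builtin_clz (loval) - __builtin_clz (hival);
  if (s > 0 && hival == loval << s)
    { *shift = s; return true; }
  if (s < 0 && hival == loval >> -s)
    { *shift = s; return true; }

  return false;
}
```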

This code should also test ALLOW_NEW_PSEUDOS since we need cost 
stability before/after reload.

Repost after those changes.


With this framework I think you could also handle the case where the 
upper/lower vary by just one bit fairly trivially.

ie, when popcount (upper ^ lower) == 1 use binv to flip the bit in the high 
word.  Obviously this only applies when ZBS is enabled.  If you want to 
do this, I'd structure it largely like the shifted case.

And if high is +-2k from low, then there may be a synthesis for that 
case as well.

And if the high word is 3x, 5x or 9x the low word, then shadd applies.

Those three additional cases aren't required for this patch to move 
forward.  Just additional enhancements if you want to tackle them.
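
A sketch of how those three extra checks might look, reading "+-2k" as the
addi immediate range [-2048, 2047] (an assumption); the function name and
return-code convention are illustrative, not from the tree:

```c
#include <stdint.h>

/* Illustrative classification of the three follow-on cases; returns 0
   when none applies.  Ordering matters: a single differing low bit is
   also within addi range, so test popcount first.  */
static int
classify_high_half (uint32_t hival, uint32_t loval)
{
  /* One differing bit: copy the low half and binv the bit (needs Zbs).  */
  if (__builtin_popcount (hival ^ loval) == 1)
    return 1;

  /* High half within addi range of the low half (assumed meaning of
     "+-2k" above).  */
  int64_t diff = (int64_t) hival - (int64_t) loval;
  if (diff >= -2048 && diff <= 2047)
    return 2;

  /* High half is 3x, 5x or 9x the low half: sh1add/sh2add/sh3add
     (needs Zba).  */
  if (hival == loval * 3 || hival == loval * 5 || hival == loval * 9)
    return 3;

  return 0;
}
```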

Jeff
diff mbox series

Patch

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 454220d8ba4..a3e8a243f15 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -1259,6 +1259,45 @@  riscv_build_integer (struct riscv_integer_op *codes, HOST_WIDE_INT value,
 	      cost = alt_cost;
 	    }
 	}
+
+      if (cost > 4 && !bit31)
+	{
+	  int trailing_shift = ctz_hwi (loval) - ctz_hwi (hival);
+	  int leading_shift = clz_hwi (loval) - clz_hwi (hival);
+	  alt_cost = 2 + riscv_build_integer_1 (alt_codes, sext_hwi (loval, 32),
+						mode);
+	  /* For constants where the upper half is a shift of the lower half we
+	     can do a similar transformation as for constants with the same
+	     halves.  */
+	  if (alt_cost < cost)
+	    {
+	      alt_codes[alt_cost - 3].save_temporary = true;
+	      alt_codes[alt_cost - 2].code = ASHIFT;
+	      alt_codes[alt_cost - 2].use_uw = false;
+	      alt_codes[alt_cost - 2].save_temporary = false;
+	      alt_codes[alt_cost - 1].code = CONCAT;
+	      alt_codes[alt_cost - 1].value = 0;
+	      alt_codes[alt_cost - 1].use_uw = false;
+	      alt_codes[alt_cost - 1].save_temporary = false;
+
+	      /* Adjust the shift into the high half accordingly.  */
+	      if ((trailing_shift > 0 && hival == (loval >> trailing_shift))
+		  || (trailing_shift < 0 && hival == (loval << -trailing_shift)))
+		{
+		  alt_codes[alt_cost - 2].value = 32 - trailing_shift;
+		  memcpy (codes, alt_codes, sizeof (alt_codes));
+		  cost = alt_cost;
+		}
+	      else if ((leading_shift < 0 && hival == (loval >> -leading_shift))
+			|| (leading_shift > 0
+			    && hival == (loval << leading_shift)))
+		{
+		  alt_codes[alt_cost - 2].value = 32 + leading_shift;
+		  memcpy (codes, alt_codes, sizeof (alt_codes));
+		  cost = alt_cost;
+		}
+	    }
+	}
     }
 
   return cost;
diff --git a/gcc/testsuite/gcc.target/riscv/synthesis-12.c b/gcc/testsuite/gcc.target/riscv/synthesis-12.c
new file mode 100644
index 00000000000..0265a2d6f13
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/synthesis-12.c
@@ -0,0 +1,27 @@ 
+
+/* { dg-do compile } */
+/* { dg-require-effective-target rv64 } */
+/* We aggressively skip as we really just need to test the basic synthesis
+   which shouldn't vary based on the optimization level.  -O1 seems to work
+   and eliminates the usual sources of extraneous dead code that would throw
+   off the counts.  */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-O2" "-O3" "-Os" "-Oz" "-flto" } } */
+/* { dg-options "-march=rv64gc" } */
+
+/* Rather than test for a specific synthesis of all these constants or
+   having thousands of tests each testing one variant, we just test the
+   total number of instructions.
+
+   This isn't expected to change much and any change is worthy of a look.  */
+/* { dg-final { scan-assembler-times "\\t(add|addi|bseti|li|pack|ret|sh1add|sh2add|sh3add|slli|srli|xori|or)" 45 } } */
+
+
+unsigned long foo_0x7857f2de7857f2de(void) { return 0x7857f2de7857f2deUL; }
+unsigned long foo_0x7fffdffe3fffefff(void) { return 0x7fffdffe3fffefffUL; }
+unsigned long foo_0x1ffff7fe3fffeffc(void) { return 0x1ffff7fe3fffeffcUL; }
+unsigned long foo_0x0a3fdbf0028ff6fc(void) { return 0x0a3fdbf0028ff6fcUL; }
+unsigned long foo_0x014067e805019fa0(void) { return 0x014067e805019fa0UL; }
+unsigned long foo_0x09d87e90009d87e9(void) { return 0x09d87e90009d87e9UL; }
+unsigned long foo_0x2302320000118119(void) { return 0x2302320000118119UL; }
+unsigned long foo_0x000711eb00e23d60(void) { return 0x000711eb00e23d60UL; }
+unsigned long foo_0x5983800001660e00(void) { return 0x5983800001660e00UL; }