diff mbox series

[to-be-committed,RISC-V] Improve bset generation for another case

Message ID 49edf2b0-ed30-4baf-87d0-a729164357e4@ventanamicro.com
State New
Headers show
Series [to-be-committed,RISC-V] Improve bset generation for another case | expand

Commit Message

Jeff Law July 9, 2024, 6:05 p.m. UTC
So another minor improvement for bitmanip code generation.

Essentially we have a pattern which matches a bset idiom for
x = zero_extend (1 << n).   That pattern only handles SI->DI extension.

For the QI/HI case the 1<<n is first in a narrowing subreg to QI/HI.  ie

(zero_extend:DI (subreg:QI (ashift (...))))

The same principles apply to this case as it can be implemented with 
bset target,x0,bitpos and by using x0 we'll get the desired zero extension.

I think this testcase is ultimately derived from 500.perlbench.  Our 
code for this testcase still isn't great, but this is an easy 
improvement and makes one of the remaining inefficiencies more obvious:


>         bset    a5,x0,a5        # 24    [c=8 l=4]  *bsetdi_3
>         andn    a3,a0,a5        # 52    [c=4 l=4]  and_notdi3
>         beq     a4,zero,.L3     # 41    [c=12 l=4]  *branchdi
>         or      a3,a0,a5        # 44    [c=4 l=4]  *iordi3/0

The bset is what this patch generates instead of a li+sll sequence. In 
the form above its easier see that the andn can be replaced with a bclr 
and the or can be replaced with a bset which in turn would allow the 
bset above to go away completely.


This has been tested in my tester for rv32 and rv64.  I'll wait for 
pre-commit testing to complete before moving forward.

jeff
gcc/
	* config/riscv/bitmanip.md (bset<X:mode>_3): New pattern.

testsuite/

	* gcc.target/riscv/zbs-bset-2.c: New test.

Comments

Jeff Law July 10, 2024, 6:04 a.m. UTC | #1
On 7/9/24 12:05 PM, Jeff Law wrote:
> So another minor improvement for bitmanip code generation.
> 
> Essentially we have a pattern which matches a bset idiom for
> x = zero_extend (1 << n).   That pattern only handles SI->DI extension.
> 
> For the QI/HI case the 1<<n is first in a narrowing subreg to QI/HI.  ie
> 
> (zero_extend:DI (subreg:QI (ashift (...))))
> 
> The same principles apply to this case as it can be implemented with 
> bset target,x0,bitpos and by using x0 we'll get the desired zero extension.
> 
> I think this testcase is ultimately derived from 500.perlbench.  Our 
> code for this testcase still isn't great, but this is an easy 
> improvement and makes one of the remaining inefficiencies more obvious:
> 
> 
>>         bset    a5,x0,a5        # 24    [c=8 l=4]  *bsetdi_3
>>         andn    a3,a0,a5        # 52    [c=4 l=4]  and_notdi3
>>         beq     a4,zero,.L3     # 41    [c=12 l=4]  *branchdi
>>         or      a3,a0,a5        # 44    [c=4 l=4]  *iordi3/0
> 
> The bset is what this patch generates instead of a li+sll sequence. In 
> the form above its easier see that the andn can be replaced with a bclr 
> and the or can be replaced with a bset which in turn would allow the 
> bset above to go away completely.
> 
> 
> This has been tested in my tester for rv32 and rv64.  I'll wait for pre- 
> commit testing to complete before moving forward.
This has to be dropped.  It's wrong.

+(define_insn "*bset<X:mode>_3"
+  [(set (match_operand:X 0 "register_operand" "=r")
+	(zero_extend:X
+	  (subreg:SHORT
+	    (ashift:X (const_int 1)
+		      (match_operand:QI 1 "register_operand" "r")) 0)))]

That can't be a naked bset.  The problem is the DImode shift may have 
set a bit outside of the mode of SHORT which could be cleared by the 
outer zero_extend from SHORT to DI.

If we tried to implement that with a naked bset that bit outside SHORT 
would be left on resulting in incorrect code.

It's too bad.  There's two additional follow-ups improvements that 
aren't viable.

Jeff
diff mbox series

Patch

diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index f403ba8dbba..b5e052fe4f2 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -611,6 +611,18 @@  (define_insn "*bsetdi_2"
   "bset\t%0,x0,%1"
   [(set_attr "type" "bitmanip")])
 
+;; Similar, but we have a narrowing SUBREG.  We're still using x0 as
+;; a source, so the result is still zero extended.
+(define_insn "*bset<X:mode>_3"
+  [(set (match_operand:X 0 "register_operand" "=r")
+	(zero_extend:X
+	  (subreg:SHORT
+	    (ashift:X (const_int 1)
+		      (match_operand:QI 1 "register_operand" "r")) 0)))]
+  "TARGET_ZBS"
+  "bset\t%0,x0,%1"
+  [(set_attr "type" "bitmanip")])
+
 ;; These two splitters take advantage of the limited range of the
 ;; shift constant.  With the limited range we know the SImode sign
 ;; bit is never set, thus we can treat this as zero extending and
diff --git a/gcc/testsuite/gcc.target/riscv/zbs-bset-2.c b/gcc/testsuite/gcc.target/riscv/zbs-bset-2.c
new file mode 100644
index 00000000000..8555e3784ff
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zbs-bset-2.c
@@ -0,0 +1,31 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=rv64gc_zba_zbb_zbs -mabi=lp64d" { target { riscv64*-*-* } } } */
+/* { dg-options "-O2 -march=rv32gc_zba_zbb_zbs -mabi=ilp32" { target { riscv32*-*-* } } } */
+
+typedef struct SHA {
+        unsigned char block[128];
+        unsigned int blockcnt;
+
+} SHA;
+
+#define BITSET(s, pos)    s[(pos) >> 3] &  (char)  (0x01 << (7 - (pos) % 8))
+#define SETBIT(s, pos)    s[(pos) >> 3] |= (char)  (0x01 << (7 - (pos) % 8))
+#define CLRBIT(s, pos)    s[(pos) >> 3] &= (char) ~(0x01 << (7 - (pos) % 8))
+
+#define ULNG        unsigned long
+
+void shabits(char *bitstr, long bitcnt, SHA *s, ULNG i)
+{
+  if (BITSET(bitstr, i))
+    SETBIT(s->block, s->blockcnt);
+  else
+    CLRBIT(s->block, s->blockcnt);
+}
+
+/* { dg-final { scan-assembler-times "bset\t\[a-x\]\[0-9\]+.x0" 2 { target { riscv64*-*-* } } } } */
+
+/* rv32 doesn't have the first bset for some reason, probably a missed
+   generalization of one of the other bitmanip combiner patterns.  The
+   bset we do generate corresponds to the one the related patch generates.  */
+/* { dg-final { scan-assembler-times "bset\t\[a-x\]\[0-9\]+.x0" 1 { target { riscv32*-*-* } } } } */
+