Message ID | 49edf2b0-ed30-4baf-87d0-a729164357e4@ventanamicro.com |
---|---|
State | New |
Headers | show |
Series | [to-be-committed,RISC-V] Improve bset generation for another case | expand |
On 7/9/24 12:05 PM, Jeff Law wrote: > So another minor improvement for bitmanip code generation. > > Essentially we have a pattern which matches a bset idiom for > x = zero_extend (1 << n). That pattern only handles SI->DI extension. > > For the QI/HI case the 1<<n is first in a narrowing subreg to QI/HI. ie > > (zero_extend:DI (subreg:QI (ashift (...)))) > > The same principles apply to this case as it can be implemented with > bset target,x0,bitpos and by using x0 we'll get the desired zero extension. > > I think this testcase is ultimately derived from 500.perlbench. Our > code for this testcase still isn't great, but this is an easy > improvement and makes one of the remaining inefficiencies more obvious: > > >> bset a5,x0,a5 # 24 [c=8 l=4] *bsetdi_3 >> andn a3,a0,a5 # 52 [c=4 l=4] and_notdi3 >> beq a4,zero,.L3 # 41 [c=12 l=4] *branchdi >> or a3,a0,a5 # 44 [c=4 l=4] *iordi3/0 > > The bset is what this patch generates instead of a li+sll sequence. In > the form above its easier see that the andn can be replaced with a bclr > and the or can be replaced with a bset which in turn would allow the > bset above to go away completely. > > > This has been tested in my tester for rv32 and rv64. I'll wait for pre- > commit testing to complete before moving forward. This has to be dropped. It's wrong. +(define_insn "*bset<X:mode>_3" + [(set (match_operand:X 0 "register_operand" "=r") + (zero_extend:X + (subreg:SHORT + (ashift:X (const_int 1) + (match_operand:QI 1 "register_operand" "r")) 0)))] That can't be a naked bset. The problem is the DImode shift may have set a bit outside of the mode of SHORT which could be cleared by the outer zero_extend from SHORT to DI. If we tried to implement that with a naked bset that bit outside SHORT would be left on resulting in incorrect code. It's too bad. There's two additional follow-ups improvements that aren't viable. Jeff
diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md index f403ba8dbba..b5e052fe4f2 100644 --- a/gcc/config/riscv/bitmanip.md +++ b/gcc/config/riscv/bitmanip.md @@ -611,6 +611,18 @@ (define_insn "*bsetdi_2" "bset\t%0,x0,%1" [(set_attr "type" "bitmanip")]) +;; Similar, but we have a narrowing SUBREG. We're still using x0 as +;; a source, so the result is still zero extended. +(define_insn "*bset<X:mode>_3" + [(set (match_operand:X 0 "register_operand" "=r") + (zero_extend:X + (subreg:SHORT + (ashift:X (const_int 1) + (match_operand:QI 1 "register_operand" "r")) 0)))] + "TARGET_ZBS" + "bset\t%0,x0,%1" + [(set_attr "type" "bitmanip")]) + ;; These two splitters take advantage of the limited range of the ;; shift constant. With the limited range we know the SImode sign ;; bit is never set, thus we can treat this as zero extending and diff --git a/gcc/testsuite/gcc.target/riscv/zbs-bset-2.c b/gcc/testsuite/gcc.target/riscv/zbs-bset-2.c new file mode 100644 index 00000000000..8555e3784ff --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/zbs-bset-2.c @@ -0,0 +1,31 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -march=rv64gc_zba_zbb_zbs -mabi=lp64d" { target { riscv64*-*-* } } } */ +/* { dg-options "-O2 -march=rv32gc_zba_zbb_zbs -mabi=ilp32" { target { riscv32*-*-* } } } */ + +typedef struct SHA { + unsigned char block[128]; + unsigned int blockcnt; + +} SHA; + +#define BITSET(s, pos) s[(pos) >> 3] & (char) (0x01 << (7 - (pos) % 8)) +#define SETBIT(s, pos) s[(pos) >> 3] |= (char) (0x01 << (7 - (pos) % 8)) +#define CLRBIT(s, pos) s[(pos) >> 3] &= (char) ~(0x01 << (7 - (pos) % 8)) + +#define ULNG unsigned long + +void shabits(char *bitstr, long bitcnt, SHA *s, ULNG i) +{ + if (BITSET(bitstr, i)) + SETBIT(s->block, s->blockcnt); + else + CLRBIT(s->block, s->blockcnt); +} + +/* { dg-final { scan-assembler-times "bset\t\[a-x\]\[0-9\]+.x0" 2 { target { riscv64*-*-* } } } } */ + +/* rv32 doesn't have the first bset for some reason, probably a missed + generalization of one of the other bitmanip combiner patterns. The + bset we do generate corresponds to the one the related patch generates. */ +/* { dg-final { scan-assembler-times "bset\t\[a-x\]\[0-9\]+.x0" 1 { target { riscv32*-*-* } } } } */ +