Message ID | AS1PR08MB7449D252D609A655A60571FEC5312@AS1PR08MB7449.eurprd08.prod.outlook.com |
---|---|
State | New |
Headers | show |
Series | RISC-V: optimization on checking certain bits set ((x & mask) == val) | expand |
On 12/6/24 6:12 AM, Oliver Kozul wrote: > The patch optimizes code generation for comparisons of the form > X & C1 == C2 by converting them to (X | ~C1) == (C2 | ~C1). > C1 is a constant that requires li and addi to be loaded, > while ~C1 requires a single lui instruction. > > 2024-12-06 Oliver Kozul <oliver.kozul@rt-rk.com> > > PR target/114087 > > gcc/ChangeLog: > > * config/riscv/riscv.md (*lui_constraint<ANYI:mode>_and_to_or): > New pattern. > > gcc/testsuite/ChangeLog: > > * gcc.target/riscv/pr114087-1.c: New test. A few comments: > > +(define_insn_and_split "*lui_constraint<ANYI:mode>_and_to_or" > + [(set (match_operand:ANYI 0 "register_operand" "=r") > + (plus:ANYI (and:ANYI (match_operand:ANYI 1 "register_operand" "r") > + (match_operand 2 "const_int_operand")) > + (match_operand 3 "const_int_operand"))) > + (clobber (match_scratch:X 4 "=&r"))] Similar formatting comment as I had on the other patch. Easily fixed. Just for giggles I asked chatgpt to format it and it came up with: > (define_insn_and_split "*lui_constraint<ANYI:mode>_and_to_or" > [(set (match_operand:ANYI 0 "register_operand" "=r") > (plus:ANYI (and:ANYI (match_operand:ANYI 1 "register_operand" "r") > (match_operand 2 "const_int_operand")) > (match_operand 3 "const_int_operand"))) > (clobber (match_scratch:X 4 "=&r"))] > "LUI_OPERAND (INTVAL (operands[2]) + 1) > && (INTVAL (operands[2]) & (-INTVAL (operands[3]))) == (-INTVAL (operands[3]))" > "#" > "&& reload_completed" > [(set (match_dup 4) (match_dup 5)) > (set (match_dup 0) (ior:X (match_dup 1) (match_dup 4))) > (set (match_dup 4) (match_dup 6)) > (set (match_dup 0) (minus:X (match_dup 0) (match_dup 4)))] > { > operands[5] = GEN_INT (~INTVAL (operands[2])); > operands[6] = GEN_INT ((~INTVAL (operands[2])) | (-INTVAL (operands[3]))); > } > [(set_attr "type" "arith")]) Which looks basically correct. More importantly we probably want a comment on what this pattern is actually doing. Perhaps something like: ;; Transform (X & C1) + C2 into (X | ~C1) - (-C2 | ~C1) ;; Where C1 is not a LUI operand, but ~C1 is a LUI operand Did I get that correct? > + "LUI_OPERAND (INTVAL (operands[2]) + 1) So shouldn't this have been LUI_OPERAND (~INTVAL (operands[2]))? I can kind-of see why you used +1, but given the C fragment in the pattern uses ~operands[2], let's use the same here for the test if we can. > + && (INTVAL (operands[2]) & (-INTVAL (operands[3]))) > + == (-INTVAL (operands[3]))" Does it make sense to verify that operands[3] is a constant that requires synthesis and that ~INTVAL (operands[2]) | -INTVAL (operands[3]) is no more expensive to synthesize than operands[3]? ie, saving the addi on the first constant isn't a win if the second constant gets more complex to synthesize. The API isn't great, but you can use riscv_const_insns to find out how many instructions are necessary to synthesize a constant. Overall it looks good. Just needs a bit of polishing. jeff
Thank you again for your helpful comments. I will implement the formatting changes you mentioned. I agree that a description like: ;; Transform (X & C1) + C2 into (X | ~C1) - (-C2 | ~C1) ;; Where C1 is not a LUI operand, but ~C1 is a LUI operand is more fitting since we are catching a plus expression. Your proposed condition also makes more sense than the current one. LUI_OPERAND (INTVAL (operands[2]) + 1) made sense when starting out, but I never got around to changing to LUI_OPERAND (~INTVAL (operands[2]). I did not know about the costs of constant synthesis. I thought that the values of those constants are known at compile time, and therefore their creation would not have an impact on performance. I will definitely take a look at riscv_const_insns and evaluate the synthesis complexity. Best regards, Oliver
On 12/15/24 3:11 AM, Oliver Kozul wrote: > > I did not know about the costs of constant synthesis. > I thought that the values of those constants are known at compile time, > and therefore their creation would not have an impact on performance. > I will definitely take a look at riscv_const_insns and evaluate the > synthesis complexity. On an rv32 system constructing a word sized constant takes at most two instructions (lui+addi). On rv64 it can take as many as 6 instructions (lui+addi+lui+addi+sll+ior). If you dive into riscv_const_insns you'll see a ton of code that's designed to bring down the cost of constant synthesis, particularly on rv64 using various tricks. Jeff
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md index 3a4cd1d93a0..add31bbf51c 100644 --- a/gcc/config/riscv/riscv.md +++ b/gcc/config/riscv/riscv.md @@ -858,6 +858,27 @@ [(set_attr "type" "arith") (set_attr "mode" "SI")]) +(define_insn_and_split "*lui_constraint<ANYI:mode>_and_to_or" + [(set (match_operand:ANYI 0 "register_operand" "=r") + (plus:ANYI (and:ANYI (match_operand:ANYI 1 "register_operand" "r") + (match_operand 2 "const_int_operand")) + (match_operand 3 "const_int_operand"))) + (clobber (match_scratch:X 4 "=&r"))] + "LUI_OPERAND (INTVAL (operands[2]) + 1) + && (INTVAL (operands[2]) & (-INTVAL (operands[3]))) + == (-INTVAL (operands[3]))" + "#" + "&& reload_completed" + [(set (match_dup 4) (match_dup 5)) + (set (match_dup 0) (ior:X (match_dup 1) (match_dup 4))) + (set (match_dup 4) (match_dup 6)) + (set (match_dup 0) (minus:X (match_dup 0) (match_dup 4)))] + { + operands[5] = GEN_INT (~INTVAL (operands[2])); + operands[6] = GEN_INT ((~INTVAL (operands[2])) | (-INTVAL (operands[3]))); + } + [(set_attr "type" "arith")]) + ;; ;; .................... ;; diff --git a/gcc/testsuite/gcc.target/riscv/pr114087-1.c b/gcc/testsuite/gcc.target/riscv/pr114087-1.c new file mode 100644 index 00000000000..5e40b5f7b5b --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/pr114087-1.c @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target rv64 } */ +/* { dg-skip-if "" { *-*-* } { "-O0" "-O1" "-Og" } } */ +/* { dg-options "-march=rv64gc -mabi=lp64d" } */ + +int pred1a(int x) { + return ((x & 0x55555FFF) == 0x14501DEF); +} + +/* { dg-final { scan-assembler {or\s*[a-x0-9]+,\s*[a-x0-9]+,\s*[a-x0-9]+} } } */