Message ID | DB9PR08MB6634E94834941085D2F96115B7612@DB9PR08MB6634.eurprd08.prod.outlook.com |
---|---|
State | New |
Headers | show |
Series | RISC-V: Improve code generation for select of consecutive constants | expand |
On 9/17/24 5:25 AM, Jovan Vukic wrote: > The patch optimizes code generated for comparisons of the form x > y ? 2 : 3 > (x <= y ? 3 : 2) and x < y ? 2 : 3 (x >= y ? 3 : 2). > > For the following C code: > > long f1(long x, long y) { > return (x > y) ? 2 : 3; > } > > long f2(long x, long y) { > return (x < y) ? 2 : 3; > } > > > Before the patch, the generated assembly is: > > f1(long, long): > sgt a0, a0, a1 > xori a0, a0, 1 > addi a0, a0, 2 > ret > > f2(long, long): > slt a0, a0, a1 > xori a0, a0, 1 > addi a0, a0, 2 > ret > > > After the patch, the generated assembly is: > > f1(long, long): > slt a0,a1,a0 > xori a0,a0,3 > ret > > f2(long, long): > slt a0,a0,a1 > xori a0,a0,3 > ret > > > If we only consider the first example, during the combine pass, it can match > the "le:" pattern (x <= y), but that can only be replaced by the inverse > operation x > y (y < x), which requires replacing the following addi > instruction with an xori instruction as well, as proposed by the patch. > > Tested on both RV32 and RV64 with no regressions. > > 2024-09-17 Jovan Vukic <Jovan.Vukic@rt-rk.com> > > PR target/108038 > > gcc/ChangeLog: > > * config/riscv/iterators.md (paired_lt): New code attributes. > * config/riscv/riscv.md (*sle<u>_<X:mode><GPR:mode>_xor): New pattern. > (*sge<u>_<X:mode><GPR:mode>_xor): New pattern. > > gcc/testsuite/ChangeLog: > > * gcc.target/riscv/slt-1.c: New test. So you're essentially taking advantage of the fact that the bits set by the sCC and the constant in the addi are disjoint. That's a pretty generic concept. So there's a real possibility we could do this in generic code so that everyone could benefit. The first point in the various pipelines would probably be phi-opt which tries to collapse certain if-then constructs into conditional expressions (tree-ssa-phiopt.cc). It'll be presented with something like this: > ;; basic block 2, loop depth 0 > ;; pred: ENTRY > if (x_2(D) > y_3(D)) > goto <bb 3>; [50.00%] > else > goto <bb 4>; [50.00%] > ;; succ: 3 > ;; 4 > > ;; basic block 3, loop depth 0 > ;; pred: 2 > ;; succ: 4 > > ;; basic block 4, loop depth 0 > ;; pred: 2 > ;; 3 > # _1 = PHI <3(2), 2(3)> Which seems well suited for value_replacement in tree-ssa-phiopt.cc except that value_replacement seems to want an equality comparison and SSA_NAMEs (rather than constants) as PHI args. But still, it's pretty close. Somewhere in tree-ssa-phiopt.cc seems like the best place to handle this. An alternative would be in in simplify_binary_operation in the RTL pipeline. At some point it should be passed a useful PLUS as the outer operator and (xor (reg) (const_int 1)) as the first argument and the constant as the second argument. *If* we ultimately decided to do this in the risc-v backend, the first thing I would suggest would be to try and use a define_split rather than a define_insn_and_split. The latter tends to throw off costing models for splitting, thus the former is slightly preferred. And just to be 100% clear, I think this is a good patch, but that there's a better way to accomplish what you're trying to do. jeff
diff --git a/gcc/config/riscv/iterators.md b/gcc/config/riscv/iterators.md index 2844cb02ff0..964d0f24f4c 100644 --- a/gcc/config/riscv/iterators.md +++ b/gcc/config/riscv/iterators.md @@ -234,6 +234,10 @@ (define_code_iterator any_lt [lt ltu]) (define_code_iterator any_le [le leu]) +(define_code_attr paired_lt [(gt "lt") (gtu "ltu") + (ge "lt") (geu "ltu") + (le "lt") (leu "ltu")]) + ; atomics code iterator (define_code_iterator any_atomic [plus ior xor and]) diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md index 9f94b5aa023..b22a8ac05e0 100644 --- a/gcc/config/riscv/riscv.md +++ b/gcc/config/riscv/riscv.md @@ -3600,6 +3600,38 @@ [(set_attr "type" "slt") (set_attr "mode" "<X:MODE>")]) +(define_insn_and_split "*sle<u>_<X:mode><GPR:mode>_xor" + [(set (match_operand:GPR 0 "register_operand" "=r") + (plus:GPR (any_le:GPR (match_operand:X 1 "register_operand" " r") + (match_operand:X 2 "register_operand" " r")) + (match_operand 3 "const_arith_operand" "i")))] + "INTVAL (operands[3]) % 2 == 0" + "#" + "&& 1" + [(set (match_dup 0) (<paired_lt>:GPR (match_dup 2) (match_dup 1))) + (set (match_dup 0) (xor:GPR (match_dup 0) (match_dup 3)))] +{ + operands[3] = GEN_INT (INTVAL (operands[3]) + 1); +} + [(set_attr "type" "slt") + (set_attr "mode" "<X:MODE>")]) + +(define_insn_and_split "*sge<u>_<X:mode><GPR:mode>_xor" + [(set (match_operand:GPR 0 "register_operand" "=r") + (plus:GPR (any_ge:GPR (match_operand:X 1 "register_operand" " r") + (match_operand:X 2 "register_operand" " r")) + (match_operand 3 "const_arith_operand" "i")))] + "INTVAL (operands[3]) % 2 == 0" + "#" + "&& 1" + [(set (match_dup 0) (<paired_lt>:GPR (match_dup 1) (match_dup 2))) + (set (match_dup 0) (xor:GPR (match_dup 0) (match_dup 3)))] +{ + operands[3] = GEN_INT (INTVAL (operands[3]) + 1); +} + [(set_attr "type" "slt") + (set_attr "mode" "<X:MODE>")]) + ;; ;; .................... ;; diff --git a/gcc/testsuite/gcc.target/riscv/slt-1.c b/gcc/testsuite/gcc.target/riscv/slt-1.c new file mode 100644 index 00000000000..c672da92acf --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/slt-1.c @@ -0,0 +1,44 @@ +/* { dg-do compile } */ +/* { dg-options "-march=rv64gc -mabi=lp64d" } */ +/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" } } */ + +#include <stdint.h> + +#define COMPARISON(TYPE, OP, OPN, RESULT_TRUE, RESULT_FALSE) \ + TYPE test##OPN(TYPE x, TYPE y) { \ + return (x OP y) ? RESULT_TRUE : RESULT_FALSE; \ + } + +/* Signed comparisons */ +COMPARISON(int64_t, >, GT1, 2, 3) +COMPARISON(int64_t, >, GT3, 5, 6) + +COMPARISON(int64_t, <, LT1, 2, 3) +COMPARISON(int64_t, <, LT3, 5, 6) + +COMPARISON(int64_t, >=, GE2, 3, 2) +COMPARISON(int64_t, >=, GE4, 6, 5) + +COMPARISON(int64_t, <=, LE2, 3, 2) +COMPARISON(int64_t, <=, LE4, 6, 5) + +/* Unsigned comparisons */ +COMPARISON(uint64_t, >, GTU1, 2, 3) +COMPARISON(uint64_t, >, GTU3, 5, 6) + +COMPARISON(uint64_t, <, LTU1, 2, 3) +COMPARISON(uint64_t, <, LTU3, 5, 6) + +COMPARISON(uint64_t, >=, GEU2, 3, 2) +COMPARISON(uint64_t, >=, GEU4, 6, 5) + +COMPARISON(uint64_t, <=, LEU2, 3, 2) +COMPARISON(uint64_t, <=, LEU4, 6, 5) + +/* { dg-final { scan-assembler-times "sgt\\t" 2 } } */ +/* { dg-final { scan-assembler-times "sgtu\\t" 2 } } */ +/* { dg-final { scan-assembler-times "slt\\t" 6 } } */ +/* { dg-final { scan-assembler-times "sltu\\t" 6 } } */ +/* { dg-final { scan-assembler-times "xori\\ta0,a0,1" 8 } } */ +/* { dg-final { scan-assembler-times "xori\\ta0,a0,3" 8 } } */ +