Message ID: 20230301195315.1793087-1-vineetg@rivosinc.com
State:      New
Series:     RISC-V: costs: miscomputed shiftadd_cost triggering synth_mult [PR/108987]
On Wed, 1 Mar 2023 at 20:53, Vineet Gupta <vineetg@rivosinc.com> wrote:
>
> This showed up as a dynamic icount regression in SPEC 531.deepsjeng with
> upstream gcc (vs. gcc 12.2). gcc was resorting to a synthetic multiply using
> shift+add(s) even when the multiply had a clear cost benefit.
> [...]
> Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>

Reviewed-by: Philipp Tomsich <philipp.tomsich@vrull.eu>
Committed 2 weeks ago, but apparently I didn't send mail to say that. Thanks, Vineet.

On Thu, Mar 2, 2023 at 3:56 AM Philipp Tomsich <philipp.tomsich@vrull.eu> wrote:
>
> On Wed, 1 Mar 2023 at 20:53, Vineet Gupta <vineetg@rivosinc.com> wrote:
> >
> > This showed up as a dynamic icount regression in SPEC 531.deepsjeng with
> > upstream gcc (vs. gcc 12.2).
> > [...]
> > Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
>
> Reviewed-by: Philipp Tomsich <philipp.tomsich@vrull.eu>
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index e36ff05695a6..2cf172f59c28 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -2496,7 +2496,8 @@ riscv_rtx_costs (rtx x, machine_mode mode, int outer_code, int opno ATTRIBUTE_UN
 	  && GET_CODE (XEXP (x, 0)) == MULT
 	  && REG_P (XEXP (XEXP (x, 0), 0))
 	  && CONST_INT_P (XEXP (XEXP (x, 0), 1))
-	  && IN_RANGE (pow2p_hwi (INTVAL (XEXP (XEXP (x, 0), 1))), 1, 3))
+	  && pow2p_hwi (INTVAL (XEXP (XEXP (x, 0), 1)))
+	  && IN_RANGE (exact_log2 (INTVAL (XEXP (XEXP (x, 0), 1))), 1, 3))
 	{
 	  *total = COSTS_N_INSNS (1);
 	  return true;
diff --git a/gcc/testsuite/gcc.target/riscv/pr108987.c b/gcc/testsuite/gcc.target/riscv/pr108987.c
new file mode 100644
index 000000000000..6179c7e13a45
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr108987.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zba -mabi=lp64 -O2" } */
+
+unsigned long long f5(unsigned long long i)
+{
+  return i * 0x0202020202020202ULL;
+}
+
+/* { dg-final { scan-assembler-times "mul" 1 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/zba-shNadd-07.c b/gcc/testsuite/gcc.target/riscv/zba-shNadd-07.c
index 98d35e1da9b4..93da241c9b60 100644
--- a/gcc/testsuite/gcc.target/riscv/zba-shNadd-07.c
+++ b/gcc/testsuite/gcc.target/riscv/zba-shNadd-07.c
@@ -26,6 +26,6 @@ f4 (unsigned long i)
 }

 /* { dg-final { scan-assembler-times "sh2add" 2 } } */
-/* { dg-final { scan-assembler-times "sh1add" 2 } } */
-/* { dg-final { scan-assembler-times "slli" 5 } } */
-/* { dg-final { scan-assembler-times "mul" 1 } } */
+/* { dg-final { scan-assembler-times "sh1add" 1 } } */
+/* { dg-final { scan-assembler-times "slli" 3 } } */
+/* { dg-final { scan-assembler-times "mul" 2 } } */
This showed up as a dynamic icount regression in SPEC 531.deepsjeng with
upstream gcc (vs. gcc 12.2). gcc was resorting to a synthetic multiply using
shift+add(s) even when the multiply had a clear cost benefit.

|00000000000133b8 <see(state_t*, int, int, int, int) [clone .constprop.0]+0x382>:
|   133b8:	srl	a3,a1,s6
|   133bc:	and	a3,a3,s5
|   133c0:	slli	a4,a3,0x9
|   133c4:	add	a4,a4,a3
|   133c6:	slli	a4,a4,0x9
|   133c8:	add	a4,a4,a3
|   133ca:	slli	a3,a4,0x1b
|   133ce:	add	a4,a4,a3

vs. gcc 12 doing something like below:

|00000000000131c4 <see(state_t*, int, int, int, int) [clone .constprop.0]+0x35c>:
|   131c4:	ld	s1,8(sp)
|   131c6:	srl	a3,a1,s4
|   131ca:	and	a3,a3,s11
|   131ce:	mul	a3,a3,s1

Bisected this to f90cb39235c4 ("RISC-V: costs: support shift-and-add in
strength-reduction"). The intent was to optimize the cost for
shift-add-pow2-{1,2,3}, corresponding to the bitmanip insns SH*ADD, but it
ended up doing so for all shift values, which favors synthesizing the
multiply, among other things.

The bug itself is trivial: IN_RANGE() was calling pow2p_hwi(), which returns a
bool, instead of exact_log2(), which returns the power of 2.

This fix also requires an update to the test introduced by the same commit,
which now generates a MUL vs. synthesizing it.

gcc/ChangeLog:

	* config/riscv/riscv.cc (riscv_rtx_costs): Fixed IN_RANGE() to
	use exact_log2().

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/zba-shNadd-07.c: f2(i*783) now generates MUL vs.
	5 insn sh1add+slli+add+slli+sub.
	* gcc.target/riscv/pr108987.c: New test.

Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
---
 gcc/config/riscv/riscv.cc                      | 3 ++-
 gcc/testsuite/gcc.target/riscv/pr108987.c      | 9 +++++++++
 gcc/testsuite/gcc.target/riscv/zba-shNadd-07.c | 6 +++---
 3 files changed, 14 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/pr108987.c