Message ID | d942aada-0060-4fc5-a058-9b1456990d6b@yahoo.co.jp |
---|---|
State | New |
Headers | show |
Series | [v2] xtensa: Fix the issue in "*extzvsi-1bit_addsubx" | expand |
On Sat, Nov 9, 2024 at 10:39 PM Takayuki 'January June' Suwa <jjsuwa_sys3175@yahoo.co.jp> wrote: > > The second source register of insn "*extzvsi-1bit_addsubx" cannot be the > same as the destination register, because that register will be overwritten > with an intermediate value after insn splitting. > > /* example #1 */ > int test1(int b, int a) { > return ((a & 1024) ? 4 : 0) + b; > } > > ;; result #1 (incorrect) > test1: > extui a2, a3, 10, 1 ;; overwrites A2 before used > addx4 a2, a2, a2 > ret.n Interestingly I couldn't reproduce it with the current gcc mainline. For me it produces the following for the above source: test1: mov.n a9, a2 extui a2, a3, 10, 1 addx4 a2, a2, a9 ret.n With this change the generated code is one instruction shorter, so applying it still makes sense. > This patch fixes that. > > ;; result #1 (correct) > test1: > extui a3, a3, 10, 1 ;; uses A3 and then overwrites > addx4 a2, a3, a2 > ret.n > > However, it should be noted that the first source register can be the same > as the destination without any problems. > > /* example #2 */ > int test2(int a, int b) { > return ((a & 1024) ? 4 : 0) + b; > } > > ;; result (correct) > test2: > extui a2, a2, 10, 1 ;; uses A2 and then overwrites > addx4 a2, a2, a3 > ret.n > > gcc/ChangeLog: > > * config/xtensa/xtensa.md (*extzvsi-1bit_addsubx): > Add '&' to the destination register constraint to indicate that > it is 'earlyclobber', append '0' to the first source register > constraint to indicate that it can be the same as the destination > register, and change the split condition from 1 to reload_completed > so that the insn will be split only after RA in order to obtain > allocated registers that satisfy the above constraints. > --- > gcc/config/xtensa/xtensa.md | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) Regtested for target=xtensa-linux-uclibc, no new regressions. Committed to master
Takayuki, thank you for the quick fix! It seems works good now except only one degradation. Instead generating two instructions: 7 ptr += (i & 1); 0x40078564 <+12>: extui a9, a8, 0, 1 0x40078567 <+15>: addx2 a2, a9, a2 Now it generates three: 7 ptr += (i & 1); 0x40078564 <+12>: extui a9, a8, 0, 1 0x40078567 <+15>: addx2 a9, a9, a2 0x4007856a <+18>: mov.n a2, a9 Testcase to reproduce: #include <stdint.h> __attribute__((noinline)) void * func (short *ptr, uint8_t count) { for (unsigned i = 0; i < count; i++) { ptr += (i & 1); } return ptr; } int main(void) { short buf[16] = {}; func(buf, 16); }
diff --git a/gcc/config/xtensa/xtensa.md b/gcc/config/xtensa/xtensa.md index e8349a93b..695b9c861 100644 --- a/gcc/config/xtensa/xtensa.md +++ b/gcc/config/xtensa/xtensa.md @@ -1108,17 +1108,17 @@ (const_int 6)))]) (define_insn_and_split "*extzvsi-1bit_addsubx" - [(set (match_operand:SI 0 "register_operand" "=a") + [(set (match_operand:SI 0 "register_operand" "=&a") (match_operator:SI 5 "addsub_operator" [(and:SI (match_operator:SI 6 "logical_shift_operator" - [(match_operand:SI 1 "register_operand" "r") + [(match_operand:SI 1 "register_operand" "r0") (match_operand:SI 3 "const_int_operand" "i")]) (match_operand:SI 4 "const_int_operand" "i")) (match_operand:SI 2 "register_operand" "r")]))] "TARGET_ADDX && IN_RANGE (exact_log2 (INTVAL (operands[4])), 1, 3)" "#" - "&& 1" + "&& reload_completed" [(set (match_dup 0) (zero_extract:SI (match_dup 1) (const_int 1)