| Message ID | d942aada-0060-4fc5-a058-9b1456990d6b@yahoo.co.jp |
|---|---|
| State | New |
| Headers | show |
| Series | [v2] xtensa: Fix the issue in "*extzvsi-1bit_addsubx" | expand |
On Sat, Nov 9, 2024 at 10:39 PM Takayuki 'January June' Suwa <jjsuwa_sys3175@yahoo.co.jp> wrote: > > The second source register of insn "*extzvsi-1bit_addsubx" cannot be the > same as the destination register, because that register will be overwritten > with an intermediate value after insn splitting. > > /* example #1 */ > int test1(int b, int a) { > return ((a & 1024) ? 4 : 0) + b; > } > > ;; result #1 (incorrect) > test1: > extui a2, a3, 10, 1 ;; overwrites A2 before used > addx4 a2, a2, a2 > ret.n Interestingly I couldn't reproduce it with the current gcc mainline. For me it produces the following for the above source: test1: mov.n a9, a2 extui a2, a3, 10, 1 addx4 a2, a2, a9 ret.n With this change the generated code is one instruction shorter, so applying it still makes sense. > This patch fixes that. > > ;; result #1 (correct) > test1: > extui a3, a3, 10, 1 ;; uses A3 and then overwrites > addx4 a2, a3, a2 > ret.n > > However, it should be noted that the first source register can be the same > as the destination without any problems. > > /* example #2 */ > int test2(int a, int b) { > return ((a & 1024) ? 4 : 0) + b; > } > > ;; result (correct) > test2: > extui a2, a2, 10, 1 ;; uses A2 and then overwrites > addx4 a2, a2, a3 > ret.n > > gcc/ChangeLog: > > * config/xtensa/xtensa.md (*extzvsi-1bit_addsubx): > Add '&' to the destination register constraint to indicate that > it is 'earlyclobber', append '0' to the first source register > constraint to indicate that it can be the same as the destination > register, and change the split condition from 1 to reload_completed > so that the insn will be split only after RA in order to obtain > allocated registers that satisfy the above constraints. > --- > gcc/config/xtensa/xtensa.md | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) Regtested for target=xtensa-linux-uclibc, no new regressions. Committed to master
Takayuki, thank you for the quick fix!
It seems works good now except only one degradation. Instead generating two instructions:
7 ptr += (i & 1);
0x40078564 <+12>: extui a9, a8, 0, 1
0x40078567 <+15>: addx2 a2, a9, a2
Now it generates three:
7 ptr += (i & 1);
0x40078564 <+12>: extui a9, a8, 0, 1
0x40078567 <+15>: addx2 a9, a9, a2
0x4007856a <+18>: mov.n a2, a9
Testcase to reproduce:
#include <stdint.h>
__attribute__((noinline))
void * func (short *ptr, uint8_t count)
{
for (unsigned i = 0; i < count; i++) {
ptr += (i & 1);
}
return ptr;
}
int main(void)
{
short buf[16] = {};
func(buf, 16);
}
diff --git a/gcc/config/xtensa/xtensa.md b/gcc/config/xtensa/xtensa.md index e8349a93b..695b9c861 100644 --- a/gcc/config/xtensa/xtensa.md +++ b/gcc/config/xtensa/xtensa.md @@ -1108,17 +1108,17 @@ (const_int 6)))]) (define_insn_and_split "*extzvsi-1bit_addsubx" - [(set (match_operand:SI 0 "register_operand" "=a") + [(set (match_operand:SI 0 "register_operand" "=&a") (match_operator:SI 5 "addsub_operator" [(and:SI (match_operator:SI 6 "logical_shift_operator" - [(match_operand:SI 1 "register_operand" "r") + [(match_operand:SI 1 "register_operand" "r0") (match_operand:SI 3 "const_int_operand" "i")]) (match_operand:SI 4 "const_int_operand" "i")) (match_operand:SI 2 "register_operand" "r")]))] "TARGET_ADDX && IN_RANGE (exact_log2 (INTVAL (operands[4])), 1, 3)" "#" - "&& 1" + "&& reload_completed" [(set (match_dup 0) (zero_extract:SI (match_dup 1) (const_int 1)
The second source register of insn "*extzvsi-1bit_addsubx" cannot be the same as the destination register, because that register will be overwritten with an intermediate value after insn splitting. /* example #1 */ int test1(int b, int a) { return ((a & 1024) ? 4 : 0) + b; } ;; result #1 (incorrect) test1: extui a2, a3, 10, 1 ;; overwrites A2 before used addx4 a2, a2, a2 ret.n This patch fixes that. ;; result #1 (correct) test1: extui a3, a3, 10, 1 ;; uses A3 and then overwrites addx4 a2, a3, a2 ret.n However, it should be noted that the first source register can be the same as the destination without any problems. /* example #2 */ int test2(int a, int b) { return ((a & 1024) ? 4 : 0) + b; } ;; result (correct) test2: extui a2, a2, 10, 1 ;; uses A2 and then overwrites addx4 a2, a2, a3 ret.n gcc/ChangeLog: * config/xtensa/xtensa.md (*extzvsi-1bit_addsubx): Add '&' to the destination register constraint to indicate that it is 'earlyclobber', append '0' to the first source register constraint to indicate that it can be the same as the destination register, and change the split condition from 1 to reload_completed so that the insn will be split only after RA in order to obtain allocated registers that satisfy the above constraints. --- gcc/config/xtensa/xtensa.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)