diff mbox series

[v2] xtensa: Fix the issue in "*extzvsi-1bit_addsubx"

Message ID d942aada-0060-4fc5-a058-9b1456990d6b@yahoo.co.jp
State New
Headers show
Series [v2] xtensa: Fix the issue in "*extzvsi-1bit_addsubx" | expand

Commit Message

Takayuki 'January June' Suwa Nov. 10, 2024, 6:39 a.m. UTC
The second source register of insn "*extzvsi-1bit_addsubx" cannot be the
same as the destination register, because that register will be overwritten
with an intermediate value after insn splitting.

     /* example #1 */
     int test1(int b, int a) {
       return ((a & 1024) ? 4 : 0) + b;
     }

     ;; result #1 (incorrect)
     test1:
     	extui	a2, a3, 10, 1	;; overwrites A2 before used
     	addx4	a2, a2, a2
     	ret.n

This patch fixes that.

     ;; result #1 (correct)
     test1:
     	extui	a3, a3, 10, 1	;; uses A3 and then overwrites
     	addx4	a2, a3, a2
     	ret.n

However, it should be noted that the first source register can be the same
as the destination without any problems.

     /* example #2 */
     int test2(int a, int b) {
       return ((a & 1024) ? 4 : 0) + b;
     }

     ;; result (correct)
     test2:
     	extui	a2, a2, 10, 1	;; uses A2 and then overwrites
     	addx4	a2, a2, a3
     	ret.n

gcc/ChangeLog:

	* config/xtensa/xtensa.md (*extzvsi-1bit_addsubx):
	Add '&' to the destination register constraint to indicate that
	it is 'earlyclobber', append '0' to the first source register
	constraint to indicate that it can be the same as the destination
	register, and change the split condition from 1 to reload_completed
	so that the insn will be split only after RA in order to obtain
         allocated registers that satisfy the above constraints.
---
  gcc/config/xtensa/xtensa.md | 6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)

Comments

Max Filippov Nov. 11, 2024, 3:01 a.m. UTC | #1
On Sat, Nov 9, 2024 at 10:39 PM Takayuki 'January June' Suwa
<jjsuwa_sys3175@yahoo.co.jp> wrote:
>
> The second source register of insn "*extzvsi-1bit_addsubx" cannot be the
> same as the destination register, because that register will be overwritten
> with an intermediate value after insn splitting.
>
>      /* example #1 */
>      int test1(int b, int a) {
>        return ((a & 1024) ? 4 : 0) + b;
>      }
>
>      ;; result #1 (incorrect)
>      test1:
>         extui   a2, a3, 10, 1   ;; overwrites A2 before used
>         addx4   a2, a2, a2
>         ret.n

Interestingly I couldn't reproduce it with the current gcc mainline.
For me it produces the following for the above source:

test1:
       mov.n   a9, a2
       extui   a2, a3, 10, 1
       addx4   a2, a2, a9
       ret.n

With this change the generated code is one instruction shorter,
so applying it still makes sense.

> This patch fixes that.
>
>      ;; result #1 (correct)
>      test1:
>         extui   a3, a3, 10, 1   ;; uses A3 and then overwrites
>         addx4   a2, a3, a2
>         ret.n
>
> However, it should be noted that the first source register can be the same
> as the destination without any problems.
>
>      /* example #2 */
>      int test2(int a, int b) {
>        return ((a & 1024) ? 4 : 0) + b;
>      }
>
>      ;; result (correct)
>      test2:
>         extui   a2, a2, 10, 1   ;; uses A2 and then overwrites
>         addx4   a2, a2, a3
>         ret.n
>
> gcc/ChangeLog:
>
>         * config/xtensa/xtensa.md (*extzvsi-1bit_addsubx):
>         Add '&' to the destination register constraint to indicate that
>         it is 'earlyclobber', append '0' to the first source register
>         constraint to indicate that it can be the same as the destination
>         register, and change the split condition from 1 to reload_completed
>         so that the insn will be split only after RA in order to obtain
>          allocated registers that satisfy the above constraints.
> ---
>   gcc/config/xtensa/xtensa.md | 6 +++---
>   1 file changed, 3 insertions(+), 3 deletions(-)

Regtested for target=xtensa-linux-uclibc, no new regressions.
Committed to master
Alexey Lapshin Nov. 13, 2024, 11:19 a.m. UTC | #2
Takayuki, thank you for the quick fix!

It seems works good now except only one degradation. Instead generating two instructions:


7	        ptr += (i & 1);
   0x40078564 <+12>:	extui	a9, a8, 0, 1
   0x40078567 <+15>:	addx2	a2, a9, a2

Now it generates three:


7	        ptr += (i & 1);
   0x40078564 <+12>:	extui	a9, a8, 0, 1
   0x40078567 <+15>:	addx2	a9, a9, a2
   0x4007856a <+18>:	mov.n	a2, a9


Testcase to reproduce:


#include <stdint.h>

__attribute__((noinline))
void * func (short *ptr, uint8_t count)
{
    for (unsigned i = 0; i < count; i++) {
        ptr += (i & 1);
    }
    return ptr;
}

int main(void)
{
    short buf[16] = {}; 
    func(buf, 16);
}
diff mbox series

Patch

diff --git a/gcc/config/xtensa/xtensa.md b/gcc/config/xtensa/xtensa.md
index e8349a93b..695b9c861 100644
--- a/gcc/config/xtensa/xtensa.md
+++ b/gcc/config/xtensa/xtensa.md
@@ -1108,17 +1108,17 @@ 
  		      (const_int 6)))])
  
  (define_insn_and_split "*extzvsi-1bit_addsubx"
-  [(set (match_operand:SI 0 "register_operand" "=a")
+  [(set (match_operand:SI 0 "register_operand" "=&a")
  	(match_operator:SI 5 "addsub_operator"
  		[(and:SI (match_operator:SI 6 "logical_shift_operator"
-				[(match_operand:SI 1 "register_operand" "r")
+				[(match_operand:SI 1 "register_operand" "r0")
  				 (match_operand:SI 3 "const_int_operand" "i")])
  			 (match_operand:SI 4 "const_int_operand" "i"))
  		 (match_operand:SI 2 "register_operand" "r")]))]
    "TARGET_ADDX
     && IN_RANGE (exact_log2 (INTVAL (operands[4])), 1, 3)"
    "#"
-  "&& 1"
+  "&& reload_completed"
    [(set (match_dup 0)
  	(zero_extract:SI (match_dup 1)
  			 (const_int 1)