Message ID | bfcff14a-49fc-4f47-9a1e-2c04d9cd3458@linux.ibm.com |
---|---|
State | New |
Headers | show |
Series | [rs6000] Add subreg patterns for SImode rotate and mask insert | expand |
Hi, on 2024/3/1 10:41, HAO CHEN GUI wrote: > Hi, > This patch fixes regression cases in gcc.target/powerpc/rlwimi-2.c. In > combine pass, SImode (subreg from DImode) lshiftrt is converted to DImode > lshiftrt with an out AND. It matches a DImode rotate and mask insert on > rs6000. > > Trying 2 -> 7: > 2: r122:DI=r129:DI > REG_DEAD r129:DI > 7: r125:SI=r122:DI#0 0>>0x1f > REG_DEAD r122:DI > Failed to match this instruction: > (set (subreg:DI (reg:SI 125 [ x ]) 0) > (zero_extract:DI (reg:DI 129) > (const_int 32 [0x20]) > (const_int 1 [0x1]))) > Successfully matched this instruction: > (set (subreg:DI (reg:SI 125 [ x ]) 0) > (and:DI (lshiftrt:DI (reg:DI 129) > (const_int 31 [0x1f])) > (const_int 4294967295 [0xffffffff]))) > > This conversion blocks the further combination which combines to a SImode > rotate and mask insert insn. > > Trying 9, 7 -> 10: > 9: r127:SI=r130:DI#0&0xfffffffffffffffe > REG_DEAD r130:DI > 7: r125:SI#0=r129:DI 0>>0x1f&0xffffffff > REG_DEAD r129:DI > 10: r124:SI=r127:SI|r125:SI > REG_DEAD r125:SI > REG_DEAD r127:SI > Failed to match this instruction: > (set (reg:SI 124) > (ior:SI (and:SI (subreg:SI (reg:DI 130) 0) > (const_int -2 [0xfffffffffffffffe])) > (subreg:SI (zero_extract:DI (reg:DI 129) > (const_int 32 [0x20]) > (const_int 1 [0x1])) 0))) > Failed to match this instruction: > (set (reg:SI 124) > (ior:SI (and:SI (subreg:SI (reg:DI 130) 0) > (const_int -2 [0xfffffffffffffffe])) > (subreg:SI (and:DI (lshiftrt:DI (reg:DI 129) > (const_int 31 [0x1f])) > (const_int 4294967295 [0xffffffff])) 0))) > > The root cause of the issue is if it's necessary to do the widen mode for > lshiftrt when the target already has the narrow mode lshiftrt and its cost > is not high. My former patch tried to fix the problem but not accepted yet. > https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624852.html Hope Segher can chime in this proposal updating combine pass, I can understand the new proposal by introducing new patterns in target code is able to fix the issue, but IMHO it's likely there are some other mis-optimization which don't get noticed and they need some similar pattern extension (duplicate some pattern & adjust with subreg) to optimize, from this perspective, it would be nice if it's possible to have a more general fix. Some minor comments for this patch itself are inlined. > > As it's stage 4 now, I drafted this patch to fix the regression by adding > subreg patterns of SImode rotate and mask insert. It actually does reversed > things and narrow the mode for lshiftrt so that it can matches the SImode > rotate and mask insert. > > The case "rlwimi-2.c" is fixed and restore the corresponding number of > insns to original ones. The case "rlwinm-0.c" is also changed and 9 "rlwinm" > is replaced with 9 "rldicl" as the sequence of combine is changed. It's not > a regression as the total number of insns isn't changed. > > Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no > regressions. Is it OK for the trunk? > > Thanks > Gui Haochen > > > ChangeLog > rs6000: Add subreg patterns for SImode rotate and mask insert > > In combine pass, SImode (subreg from DImode) lshiftrt is converted to DImode > lshiftrt with an AND. The new pattern matches rotate and mask insert on > rs6000. Thus it blocks the pattern to be further combined to a SImode rotate > and mask insert pattern. This patch fixes the problem by adding two subreg > pattern for SImode rotate and mask insert patterns. > > gcc/ > PR target/93738 > * config/rs6000/rs6000.md (*rotlsi3_insert_9): New. > (*rotlsi3_insert_8): New. > > gcc/testsuite/ > PR target/93738 > * gcc.target/powerpc/rlwimi-2.c: Adjust the number of 64bit and 32bit > rotate instructions. > * gcc.target/powerpc/rlwinm-0.c: Likewise. > > patch.diff > diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md > index bc8bc6ab060..b0b40f91e3e 100644 > --- a/gcc/config/rs6000/rs6000.md > +++ b/gcc/config/rs6000/rs6000.md > @@ -4253,6 +4253,36 @@ (define_insn "*rotl<mode>3_insert" > ; difference between rlwimi and rldimi. We also might want dot forms, > ; but not for rlwimi on POWER4 and similar processors. > > +; Subreg pattern of insn "*rotlsi3_insert" > +(define_insn_and_split "*rotlsi3_insert_9" Nit: "*rotlsi3_insert_subreg" seems a better name, ... > + [(set (match_operand:SI 0 "gpc_reg_operand" "=r") > + (ior:SI (and:SI > + (match_operator:SI 8 "lowpart_subreg_operator" > + [(and:DI (match_operator:DI 4 "rotate_mask_operator" > + [(match_operand:DI 1 "gpc_reg_operand" "r") > + (match_operand:SI 2 "const_int_operand" "n")]) > + (match_operand:DI 3 "const_int_operand" "n"))]) > + (match_operand:SI 5 "const_int_operand" "n")) > + (and:SI (match_operand:SI 6 "gpc_reg_operand" "0") > + (match_operand:SI 7 "const_int_operand" "n"))))] > + "rs6000_is_valid_insert_mask (operands[5], operands[4], SImode) > + && GET_CODE (operands[4]) == LSHIFTRT > + && INTVAL (operands[3]) == 0xffffffff > + && UINTVAL (operands[5]) + UINTVAL (operands[7]) + 1 == 0" > + "#" > + "&& 1" > + [(set (match_dup 0) > + (ior:SI (and:SI (lshiftrt:SI (match_dup 9) > + (match_dup 2)) > + (match_dup 5)) > + (and:SI (match_dup 6) > + (match_dup 7))))] > +{ > + int offset = BYTES_BIG_ENDIAN ? 4 : 0; > + operands[9] = gen_rtx_SUBREG (SImode, operands[1], offset); > +} > + [(set_attr "type" "insert")]) > + > (define_insn "*rotl<mode>3_insert_2" > [(set (match_operand:GPR 0 "gpc_reg_operand" "=r") > (ior:GPR (and:GPR (match_operand:GPR 5 "gpc_reg_operand" "0") > @@ -4331,6 +4361,31 @@ (define_insn "*rotlsi3_insert_4" > "rlwimi %0,%1,32-%h2,%h2,31" > [(set_attr "type" "insert")]) > > +; Subreg pattern of insn "*rotlsi3_insert_4" > +(define_insn_and_split "*rotlsi3_insert_8" ..., and "*rotlsi3_insert_4_subreg" for this. > + [(set (match_operand:SI 0 "gpc_reg_operand" "=r") > + (ior:SI (and:SI (match_operand:SI 3 "gpc_reg_operand" "0") > + (match_operand:SI 4 "const_int_operand" "n")) > + (match_operator:SI 6 "lowpart_subreg_operator" > + [(and:DI > + (lshiftrt:DI (match_operand:DI 1 "gpc_reg_operand" "r") > + (match_operand:SI 2 "const_int_operand" "n")) > + (match_operand:DI 5 "const_int_operand" "n"))])))] > + "INTVAL (operands[2]) + exact_log2 (-UINTVAL (operands[4])) == 32 > + && INTVAL (operands[5]) == 0xffffffff" > + "#" > + "&& 1" > + [(set (match_dup 0) > + (ior:SI (and:SI (match_dup 3) > + (match_dup 4)) > + (lshiftrt:SI (match_dup 7) > + (match_dup 2))))] > +{ > + int offset = BYTES_BIG_ENDIAN ? 4 : 0; > + operands[7] = gen_rtx_SUBREG (SImode, operands[1], offset); > +} > + [(set_attr "type" "insert")]) > + > (define_insn "*rotlsi3_insert_5" > [(set (match_operand:SI 0 "gpc_reg_operand" "=r,r") > (ior:SI (and:SI (match_operand:SI 1 "gpc_reg_operand" "0,r") > diff --git a/gcc/testsuite/gcc.target/powerpc/rlwimi-2.c b/gcc/testsuite/gcc.target/powerpc/rlwimi-2.c > index bafa371db73..62344a95aa0 100644 > --- a/gcc/testsuite/gcc.target/powerpc/rlwimi-2.c > +++ b/gcc/testsuite/gcc.target/powerpc/rlwimi-2.c > @@ -6,10 +6,9 @@ > /* { dg-final { scan-assembler-times {(?n)^\s+blr} 6750 } } */ > /* { dg-final { scan-assembler-times {(?n)^\s+mr} 643 { target ilp32 } } } */ > /* { dg-final { scan-assembler-times {(?n)^\s+mr} 11 { target lp64 } } } */ > -/* { dg-final { scan-assembler-times {(?n)^\s+rldicl} 7790 { target lp64 } } } */ > +/* { dg-final { scan-assembler-times {(?n)^\s+rldicl} 6728 { target lp64 } } } */ > > -/* { dg-final { scan-assembler-times {(?n)^\s+rlwimi} 1692 { target ilp32 } } } */ > -/* { dg-final { scan-assembler-times {(?n)^\s+rlwimi} 1666 { target lp64 } } } */ > +/* { dg-final { scan-assembler-times {(?n)^\s+rlwimi} 1692 } } */ > > /* { dg-final { scan-assembler-times {(?n)^\s+mulli} 5036 } } */ > > diff --git a/gcc/testsuite/gcc.target/powerpc/rlwinm-0.c b/gcc/testsuite/gcc.target/powerpc/rlwinm-0.c > index 4f4fca2d8ef..a10d9174306 100644 > --- a/gcc/testsuite/gcc.target/powerpc/rlwinm-0.c > +++ b/gcc/testsuite/gcc.target/powerpc/rlwinm-0.c > @@ -4,10 +4,10 @@ > /* { dg-final { scan-assembler-times {(?n)^\s+[a-z]} 6739 { target ilp32 } } } */ > /* { dg-final { scan-assembler-times {(?n)^\s+[a-z]} 9716 { target lp64 } } } */ > /* { dg-final { scan-assembler-times {(?n)^\s+blr} 3375 } } */ > -/* { dg-final { scan-assembler-times {(?n)^\s+rldicl} 3081 { target lp64 } } } */ > +/* { dg-final { scan-assembler-times {(?n)^\s+rldicl} 3090 { target lp64 } } } */ > > /* { dg-final { scan-assembler-times {(?n)^\s+rlwinm} 3197 { target ilp32 } } } */ > -/* { dg-final { scan-assembler-times {(?n)^\s+rlwinm} 3093 { target lp64 } } } */ > +/* { dg-final { scan-assembler-times {(?n)^\s+rlwinm} 3084 { target lp64 } } } */ > /* { dg-final { scan-assembler-times {(?n)^\s+rotlwi} 154 } } */ > /* { dg-final { scan-assembler-times {(?n)^\s+srwi} 13 { target ilp32 } } } */ > /* { dg-final { scan-assembler-times {(?n)^\s+srdi} 13 { target lp64 } } } */ Peter's commit r14-9085-g81e5f276c59897 has fixed this with same counts, not rebased? BR, Kewen
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md index bc8bc6ab060..b0b40f91e3e 100644 --- a/gcc/config/rs6000/rs6000.md +++ b/gcc/config/rs6000/rs6000.md @@ -4253,6 +4253,36 @@ (define_insn "*rotl<mode>3_insert" ; difference between rlwimi and rldimi. We also might want dot forms, ; but not for rlwimi on POWER4 and similar processors. +; Subreg pattern of insn "*rotlsi3_insert" +(define_insn_and_split "*rotlsi3_insert_9" + [(set (match_operand:SI 0 "gpc_reg_operand" "=r") + (ior:SI (and:SI + (match_operator:SI 8 "lowpart_subreg_operator" + [(and:DI (match_operator:DI 4 "rotate_mask_operator" + [(match_operand:DI 1 "gpc_reg_operand" "r") + (match_operand:SI 2 "const_int_operand" "n")]) + (match_operand:DI 3 "const_int_operand" "n"))]) + (match_operand:SI 5 "const_int_operand" "n")) + (and:SI (match_operand:SI 6 "gpc_reg_operand" "0") + (match_operand:SI 7 "const_int_operand" "n"))))] + "rs6000_is_valid_insert_mask (operands[5], operands[4], SImode) + && GET_CODE (operands[4]) == LSHIFTRT + && INTVAL (operands[3]) == 0xffffffff + && UINTVAL (operands[5]) + UINTVAL (operands[7]) + 1 == 0" + "#" + "&& 1" + [(set (match_dup 0) + (ior:SI (and:SI (lshiftrt:SI (match_dup 9) + (match_dup 2)) + (match_dup 5)) + (and:SI (match_dup 6) + (match_dup 7))))] +{ + int offset = BYTES_BIG_ENDIAN ? 4 : 0; + operands[9] = gen_rtx_SUBREG (SImode, operands[1], offset); +} + [(set_attr "type" "insert")]) + (define_insn "*rotl<mode>3_insert_2" [(set (match_operand:GPR 0 "gpc_reg_operand" "=r") (ior:GPR (and:GPR (match_operand:GPR 5 "gpc_reg_operand" "0") @@ -4331,6 +4361,31 @@ (define_insn "*rotlsi3_insert_4" "rlwimi %0,%1,32-%h2,%h2,31" [(set_attr "type" "insert")]) +; Subreg pattern of insn "*rotlsi3_insert_4" +(define_insn_and_split "*rotlsi3_insert_8" + [(set (match_operand:SI 0 "gpc_reg_operand" "=r") + (ior:SI (and:SI (match_operand:SI 3 "gpc_reg_operand" "0") + (match_operand:SI 4 "const_int_operand" "n")) + (match_operator:SI 6 "lowpart_subreg_operator" + [(and:DI + (lshiftrt:DI (match_operand:DI 1 "gpc_reg_operand" "r") + (match_operand:SI 2 "const_int_operand" "n")) + (match_operand:DI 5 "const_int_operand" "n"))])))] + "INTVAL (operands[2]) + exact_log2 (-UINTVAL (operands[4])) == 32 + && INTVAL (operands[5]) == 0xffffffff" + "#" + "&& 1" + [(set (match_dup 0) + (ior:SI (and:SI (match_dup 3) + (match_dup 4)) + (lshiftrt:SI (match_dup 7) + (match_dup 2))))] +{ + int offset = BYTES_BIG_ENDIAN ? 4 : 0; + operands[7] = gen_rtx_SUBREG (SImode, operands[1], offset); +} + [(set_attr "type" "insert")]) + (define_insn "*rotlsi3_insert_5" [(set (match_operand:SI 0 "gpc_reg_operand" "=r,r") (ior:SI (and:SI (match_operand:SI 1 "gpc_reg_operand" "0,r") diff --git a/gcc/testsuite/gcc.target/powerpc/rlwimi-2.c b/gcc/testsuite/gcc.target/powerpc/rlwimi-2.c index bafa371db73..62344a95aa0 100644 --- a/gcc/testsuite/gcc.target/powerpc/rlwimi-2.c +++ b/gcc/testsuite/gcc.target/powerpc/rlwimi-2.c @@ -6,10 +6,9 @@ /* { dg-final { scan-assembler-times {(?n)^\s+blr} 6750 } } */ /* { dg-final { scan-assembler-times {(?n)^\s+mr} 643 { target ilp32 } } } */ /* { dg-final { scan-assembler-times {(?n)^\s+mr} 11 { target lp64 } } } */ -/* { dg-final { scan-assembler-times {(?n)^\s+rldicl} 7790 { target lp64 } } } */ +/* { dg-final { scan-assembler-times {(?n)^\s+rldicl} 6728 { target lp64 } } } */ -/* { dg-final { scan-assembler-times {(?n)^\s+rlwimi} 1692 { target ilp32 } } } */ -/* { dg-final { scan-assembler-times {(?n)^\s+rlwimi} 1666 { target lp64 } } } */ +/* { dg-final { scan-assembler-times {(?n)^\s+rlwimi} 1692 } } */ /* { dg-final { scan-assembler-times {(?n)^\s+mulli} 5036 } } */ diff --git a/gcc/testsuite/gcc.target/powerpc/rlwinm-0.c b/gcc/testsuite/gcc.target/powerpc/rlwinm-0.c index 4f4fca2d8ef..a10d9174306 100644 --- a/gcc/testsuite/gcc.target/powerpc/rlwinm-0.c +++ b/gcc/testsuite/gcc.target/powerpc/rlwinm-0.c @@ -4,10 +4,10 @@ /* { dg-final { scan-assembler-times {(?n)^\s+[a-z]} 6739 { target ilp32 } } } */ /* { dg-final { scan-assembler-times {(?n)^\s+[a-z]} 9716 { target lp64 } } } */ /* { dg-final { scan-assembler-times {(?n)^\s+blr} 3375 } } */ -/* { dg-final { scan-assembler-times {(?n)^\s+rldicl} 3081 { target lp64 } } } */ +/* { dg-final { scan-assembler-times {(?n)^\s+rldicl} 3090 { target lp64 } } } */ /* { dg-final { scan-assembler-times {(?n)^\s+rlwinm} 3197 { target ilp32 } } } */ -/* { dg-final { scan-assembler-times {(?n)^\s+rlwinm} 3093 { target lp64 } } } */ +/* { dg-final { scan-assembler-times {(?n)^\s+rlwinm} 3084 { target lp64 } } } */ /* { dg-final { scan-assembler-times {(?n)^\s+rotlwi} 154 } } */ /* { dg-final { scan-assembler-times {(?n)^\s+srwi} 13 { target ilp32 } } } */ /* { dg-final { scan-assembler-times {(?n)^\s+srdi} 13 { target lp64 } } } */