Message ID | 368de06c-f6d6-e759-0f91-5df170687346@linux.ibm.com |
---|---|
State | New |
Headers | show |
Series | [v2] Modify combine pattern by a pseudo AND with its nonzero bits [PR93453] | expand |
Hi! On Thu, Jul 07, 2022 at 04:30:50PM +0800, HAO CHEN GUI wrote: > This patch modifies the combine pattern after recog fails. With a helper It modifies combine itself, not just a pattern in the machine description. > - change_pseudo_and_mask, it converts a single pseudo to the pseudo AND with > a mask when the outer operator is IOR/XOR/PLUS and inner operator is ASHIFT > or AND. The conversion helps pattern to match rotate and mask insn on some > targets. > For test case rlwimi-2.c, current trunk fails on > "scan-assembler-times (?n)^\\s+[a-z]". It reports 21305 times. So my patch > reduces the total number of insns from 21305 to 21279. That is incorrect. You need to figure out what actually changed, and if that is wanted or not, and then write some explanation about that. > * config/rs6000/rs6000.md (plus_ior_xor): Removed. > (anonymous split pattern for plus_ior_xor): Removed. "Remove.", in both cases. Always use imperative in changelogs and commit messages and the like, not some passive tense. > +/* When the outer code of set_src is IOR/XOR/PLUS and the inner code is > + ASHIFT/AND, convert a pseudo to psuedo AND with a mask if its nonzero_bits s/psuedo/pseudo/ > + is less than its mode mask. The nonzero_bits in other pass doesn't return > + the same value as it does in combine pass. */ That isn't quite the problem. Later passes can return a mask for nonzero_bits (which means: bits that are not known to be zero) that is not a superset of what was known during combine. If you use nonzero_bits in the insn condition of a define_insn (or define_insn_and_split, same thing under the covers) you then can end up with an insns that is fine during combine, but no longer recog()ed later. > +static bool > +change_pseudo_and_mask (rtx pat) > +{ > + rtx src = SET_SRC (pat); > + if ((GET_CODE (src) == IOR > + || GET_CODE (src) == XOR > + || GET_CODE (src) == PLUS) > + && (((GET_CODE (XEXP (src, 0)) == ASHIFT > + || GET_CODE (XEXP (src, 0)) == AND) > + && REG_P (XEXP (src, 1))))) > + { > + rtx *reg = &XEXP (SET_SRC (pat), 1); Why the extra indirection? SUBST is a macro, it can take lvalues just fine :-) > + machine_mode mode = GET_MODE (*reg); > + unsigned HOST_WIDE_INT nonzero = nonzero_bits (*reg, mode); > + if (nonzero < GET_MODE_MASK (mode)) > + { > + int shift; > + > + if (GET_CODE (XEXP (src, 0)) == ASHIFT) > + shift = INTVAL (XEXP (XEXP (src, 0), 1)); > + else > + shift = ctz_hwi (INTVAL (XEXP (XEXP (src, 0), 1))); > + > + if (shift > 0 > + && ((HOST_WIDE_INT_1U << shift) - 1) >= nonzero) Too many parens. > + { > + unsigned HOST_WIDE_INT mask = (HOST_WIDE_INT_1U << shift) - 1; > + rtx x = gen_rtx_AND (mode, *reg, GEN_INT (mask)); > + SUBST (*reg, x); > + maybe_swap_commutative_operands (SET_SRC (pat)); > + return true; > + } > + } > + } > + return false; Broken indentation. > --- a/gcc/testsuite/gcc.target/powerpc/20050603-3.c > +++ b/gcc/testsuite/gcc.target/powerpc/20050603-3.c > @@ -12,7 +12,7 @@ void rotins (unsigned int x) > b.y = (x<<12) | (x>>20); > } > > -/* { dg-final { scan-assembler-not {\mrlwinm} } } */ > +/* { dg-final { scan-assembler-not {\mrlwinm} { target ilp32 } } } */ Why? > +/* { dg-final { scan-assembler-times {\mrlwimi\M} 2 { target ilp32 } } } */ > +/* { dg-final { scan-assembler-times {\mrldimi\M} 2 { target lp64 } } } */ Can this just be /* { dg-final { scan-assembler-times {\mrl[wd]imi\M} 2 } } */ or is it necessary to not want rlwimi on 64-bit? > --- a/gcc/testsuite/gcc.target/powerpc/rlwimi-2.c > +++ b/gcc/testsuite/gcc.target/powerpc/rlwimi-2.c > @@ -2,14 +2,14 @@ > /* { dg-options "-O2" } */ > > /* { dg-final { scan-assembler-times {(?n)^\s+[a-z]} 14121 { target ilp32 } } } */ > -/* { dg-final { scan-assembler-times {(?n)^\s+[a-z]} 20217 { target lp64 } } } */ > +/* { dg-final { scan-assembler-times {(?n)^\s+[a-z]} 21279 { target lp64 } } } */ You are saying there should be 21279 instructions generated by this test case. What makes you say that? Yes, we regressed some time ago, we generate too many insns in many cases, but that is *bad*. > /* { dg-final { scan-assembler-times {(?n)^\s+rlwimi} 1692 { target ilp32 } } } */ > -/* { dg-final { scan-assembler-times {(?n)^\s+rlwimi} 1666 { target lp64 } } } */ > +/* { dg-final { scan-assembler-times {(?n)^\s+rlwimi} 1692 { target lp64 } } } */ This needs an explanation (and then the 32-bit and 64-bit checks can be merged). This probably needs changes after 4306339798b6 (if it is still wanted?) Segher
Hi, Segher On 8/7/2022 上午 1:31, Segher Boessenkool wrote: >> --- a/gcc/testsuite/gcc.target/powerpc/rlwimi-2.c >> +++ b/gcc/testsuite/gcc.target/powerpc/rlwimi-2.c >> @@ -2,14 +2,14 @@ >> /* { dg-options "-O2" } */ >> >> /* { dg-final { scan-assembler-times {(?n)^\s+[a-z]} 14121 { target ilp32 } } } */ >> -/* { dg-final { scan-assembler-times {(?n)^\s+[a-z]} 20217 { target lp64 } } } */ >> +/* { dg-final { scan-assembler-times {(?n)^\s+[a-z]} 21279 { target lp64 } } } */ > You are saying there should be 21279 instructions generated by this test > case. What makes you say that? Yes, we regressed some time ago, we > generate too many insns in many cases, but that is *bad*. > The trunk generates 21305. My patch generates 26 "rlwimi" instead of "rlwimn+ior". So it saves 26 insns and reduce the total number of insns from 21305 to 21279 and increase the number of "rlwimi" from 1666 to 1692. I did a biset for the problem. After commit "commit 8d2d39587: combine: Do not combine moves from hard registers", the case fails. The root cause is it can't combine from the hard registers and has to use subreg which causes its high part to be undefined. Thus, there is an additional "AND" generated. Before the commit Trying 2 -> 7: 2: r125:DI=%3:DI REG_DEAD %3:DI 7: r128:SI=r125:DI#0 0>>0x1f REG_DEAD r125:DI Successfully matched this instruction: (set (reg:SI 128 [ x ]) (lshiftrt:SI (reg:SI 3 3 [ x ]) (const_int 31 [0x1f]))) allowing combination of insns 2 and 7 After the commit Trying 20 -> 7: 20: r125:DI=r132:DI REG_DEAD r132:DI 7: r128:SI=r125:DI#0 0>>0x1f REG_DEAD r125:DI Failed to match this instruction: (set (subreg:DI (reg:SI 128 [ x ]) 0) (zero_extract:DI (reg:DI 132) (const_int 32 [0x20]) (const_int 1 [0x1]))) Successfully matched this instruction: (set (subreg:DI (reg:SI 128 [ x ]) 0) (and:DI (lshiftrt:DI (reg:DI 132) (const_int 31 [0x1f])) (const_int 4294967295 [0xffffffff]))) allowing combination of insns 20 and 7 The problem should be fixed in another case? Please advice. Thanks Gui Haochen
Hi! On Mon, Jul 11, 2022 at 10:13:41AM +0800, HAO CHEN GUI wrote: > I did a biset for the problem. After commit "commit 8d2d39587: combine: Do not combine > moves from hard registers", the case fails. The root cause is it can't combine from the > hard registers and has to use subreg which causes its high part to be undefined. Thus, > there is an additional "AND" generated. > > Before the commit > Trying 2 -> 7: > 2: r125:DI=%3:DI > REG_DEAD %3:DI > 7: r128:SI=r125:DI#0 0>>0x1f > REG_DEAD r125:DI > Successfully matched this instruction: > (set (reg:SI 128 [ x ]) > (lshiftrt:SI (reg:SI 3 3 [ x ]) > (const_int 31 [0x1f]))) > allowing combination of insns 2 and 7 > > After the commit > Trying 20 -> 7: > 20: r125:DI=r132:DI > REG_DEAD r132:DI > 7: r128:SI=r125:DI#0 0>>0x1f > REG_DEAD r125:DI > Failed to match this instruction: > (set (subreg:DI (reg:SI 128 [ x ]) 0) > (zero_extract:DI (reg:DI 132) > (const_int 32 [0x20]) > (const_int 1 [0x1]))) > Successfully matched this instruction: > (set (subreg:DI (reg:SI 128 [ x ]) 0) > (and:DI (lshiftrt:DI (reg:DI 132) > (const_int 31 [0x1f])) > (const_int 4294967295 [0xffffffff]))) > allowing combination of insns 20 and 7 > > The problem should be fixed in another case? Please advice. You should not change the expected counts to what is currently generated. What is currently generated is sub-optimal. It all starts with those zero_extracts, which are always bad for us -- it is a harder to manipulate representation of a limited subset of more basic operations we *do* have. And combine and simplify can handle the more general and simpler formulation just fine. Ideally combine would not try to use *_extract at all if this is not used in the machine description (compare to rotatert for example, a similarly redundant operation). But it currently needs it as intermediate form, untangling this all is quite a bit of work. These testcases (all the rl* ones) should have a big fat comment explaining what the expected, wanted code is. This was easier to do originally, when I actually tested all 65536 possibly combinations, because the expected counts were more "regular" numbers. But this is too slow to test in normal testsuite runs :-) It is wrong to pretend the current state makes the wanted code, these testcases are meant to show exactly when we make suboptimal machine code :-) Segher
diff --git a/gcc/combine.cc b/gcc/combine.cc index a5fabf397f7..3cd7b2b652b 100644 --- a/gcc/combine.cc +++ b/gcc/combine.cc @@ -11599,6 +11599,47 @@ change_zero_ext (rtx pat) return changed; } +/* When the outer code of set_src is IOR/XOR/PLUS and the inner code is + ASHIFT/AND, convert a pseudo to psuedo AND with a mask if its nonzero_bits + is less than its mode mask. The nonzero_bits in other pass doesn't return + the same value as it does in combine pass. */ +static bool +change_pseudo_and_mask (rtx pat) +{ + rtx src = SET_SRC (pat); + if ((GET_CODE (src) == IOR + || GET_CODE (src) == XOR + || GET_CODE (src) == PLUS) + && (((GET_CODE (XEXP (src, 0)) == ASHIFT + || GET_CODE (XEXP (src, 0)) == AND) + && REG_P (XEXP (src, 1))))) + { + rtx *reg = &XEXP (SET_SRC (pat), 1); + machine_mode mode = GET_MODE (*reg); + unsigned HOST_WIDE_INT nonzero = nonzero_bits (*reg, mode); + if (nonzero < GET_MODE_MASK (mode)) + { + int shift; + + if (GET_CODE (XEXP (src, 0)) == ASHIFT) + shift = INTVAL (XEXP (XEXP (src, 0), 1)); + else + shift = ctz_hwi (INTVAL (XEXP (XEXP (src, 0), 1))); + + if (shift > 0 + && ((HOST_WIDE_INT_1U << shift) - 1) >= nonzero) + { + unsigned HOST_WIDE_INT mask = (HOST_WIDE_INT_1U << shift) - 1; + rtx x = gen_rtx_AND (mode, *reg, GEN_INT (mask)); + SUBST (*reg, x); + maybe_swap_commutative_operands (SET_SRC (pat)); + return true; + } + } + } + return false; +} + /* Like recog, but we receive the address of a pointer to a new pattern. We try to match the rtx that the pointer points to. If that fails, we may try to modify or replace the pattern, @@ -11646,7 +11687,10 @@ recog_for_combine (rtx *pnewpat, rtx_insn *insn, rtx *pnotes) } } else - changed = change_zero_ext (pat); + { + changed = change_pseudo_and_mask (pat); + changed |= change_zero_ext (pat); + } } else if (GET_CODE (pat) == PARALLEL) { diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md index 1367a2cb779..2bd6bd5f908 100644 --- a/gcc/config/rs6000/rs6000.md +++ b/gcc/config/rs6000/rs6000.md @@ -4207,24 +4207,6 @@ (define_insn_and_split "*rotl<mode>3_insert_3_<code>" (ior:GPR (and:GPR (match_dup 3) (match_dup 4)) (ashift:GPR (match_dup 1) (match_dup 2))))]) -(define_code_iterator plus_ior_xor [plus ior xor]) - -(define_split - [(set (match_operand:GPR 0 "gpc_reg_operand") - (plus_ior_xor:GPR (ashift:GPR (match_operand:GPR 1 "gpc_reg_operand") - (match_operand:SI 2 "const_int_operand")) - (match_operand:GPR 3 "gpc_reg_operand")))] - "nonzero_bits (operands[3], <MODE>mode) - < HOST_WIDE_INT_1U << INTVAL (operands[2])" - [(set (match_dup 0) - (ior:GPR (and:GPR (match_dup 3) - (match_dup 4)) - (ashift:GPR (match_dup 1) - (match_dup 2))))] -{ - operands[4] = GEN_INT ((HOST_WIDE_INT_1U << INTVAL (operands[2])) - 1); -}) - (define_insn "*rotlsi3_insert_4" [(set (match_operand:SI 0 "gpc_reg_operand" "=r") (ior:SI (and:SI (match_operand:SI 3 "gpc_reg_operand" "0") diff --git a/gcc/testsuite/gcc.target/powerpc/20050603-3.c b/gcc/testsuite/gcc.target/powerpc/20050603-3.c index 4017d34f429..e628be11532 100644 --- a/gcc/testsuite/gcc.target/powerpc/20050603-3.c +++ b/gcc/testsuite/gcc.target/powerpc/20050603-3.c @@ -12,7 +12,7 @@ void rotins (unsigned int x) b.y = (x<<12) | (x>>20); } -/* { dg-final { scan-assembler-not {\mrlwinm} } } */ +/* { dg-final { scan-assembler-not {\mrlwinm} { target ilp32 } } } */ /* { dg-final { scan-assembler-not {\mrldic} } } */ /* { dg-final { scan-assembler-not {\mrot[lr]} } } */ /* { dg-final { scan-assembler-not {\ms[lr][wd]} } } */ diff --git a/gcc/testsuite/gcc.target/powerpc/pr93453-2.c b/gcc/testsuite/gcc.target/powerpc/pr93453-2.c new file mode 100644 index 00000000000..34b7834af8d --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/pr93453-2.c @@ -0,0 +1,20 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +long foo (char a, long b) +{ + long c = a; + c = c | (b << 12); + return c; +} + +long bar (long b, char a) +{ + long c = a; + long m = -4096; + c = c | (b & m); + return c; +} + +/* { dg-final { scan-assembler-times {\mrlwimi\M} 2 { target ilp32 } } } */ +/* { dg-final { scan-assembler-times {\mrldimi\M} 2 { target lp64 } } } */ diff --git a/gcc/testsuite/gcc.target/powerpc/rlwimi-2.c b/gcc/testsuite/gcc.target/powerpc/rlwimi-2.c index bafa371db73..ffb5f9e450f 100644 --- a/gcc/testsuite/gcc.target/powerpc/rlwimi-2.c +++ b/gcc/testsuite/gcc.target/powerpc/rlwimi-2.c @@ -2,14 +2,14 @@ /* { dg-options "-O2" } */ /* { dg-final { scan-assembler-times {(?n)^\s+[a-z]} 14121 { target ilp32 } } } */ -/* { dg-final { scan-assembler-times {(?n)^\s+[a-z]} 20217 { target lp64 } } } */ +/* { dg-final { scan-assembler-times {(?n)^\s+[a-z]} 21279 { target lp64 } } } */ /* { dg-final { scan-assembler-times {(?n)^\s+blr} 6750 } } */ /* { dg-final { scan-assembler-times {(?n)^\s+mr} 643 { target ilp32 } } } */ /* { dg-final { scan-assembler-times {(?n)^\s+mr} 11 { target lp64 } } } */ /* { dg-final { scan-assembler-times {(?n)^\s+rldicl} 7790 { target lp64 } } } */ /* { dg-final { scan-assembler-times {(?n)^\s+rlwimi} 1692 { target ilp32 } } } */ -/* { dg-final { scan-assembler-times {(?n)^\s+rlwimi} 1666 { target lp64 } } } */ +/* { dg-final { scan-assembler-times {(?n)^\s+rlwimi} 1692 { target lp64 } } } */ /* { dg-final { scan-assembler-times {(?n)^\s+mulli} 5036 } } */