Message ID | CACgzC7CJnoaOsFMDjXDZ0CLiECWPbejxDHg2GOk_AUtjrzC80w@mail.gmail.com |
---|---|
State | New |
Headers | show |
On 08/04/14 02:24, Zhenqiang Chen wrote: >>> >>> ChangeLog: >>> 2014-05-22 Zhenqiang Chen <zhenqiang.chen@linaro.org> >>> >>> Part of PR rtl-optimization/61225 >>> * config/i386/i386-protos.h (ix86_peephole2_rtx_equal_p): New >>> proto. >>> * config/i386/i386.c (ix86_peephole2_rtx_equal_p): New function. >>> * regcprop.c (replace_oldest_value_reg): Add REG_EQUAL note when >>> propagating to SET. >> >> I can't help but wonder why the new 4 insn combination code isn't presenting >> this as a nice big fat insn to the x86 backend which would eliminate the >> need for the peep2. >> >> But, assuming there's a fundamental reason why that's not kicking in... > > Current combine pass can only handle > > I0 -> I1 -> I2 -> I3. > I0, I1 -> I2, I2 -> I3. > I0 -> I2; I1, I2 -> I3. > I0 -> I1; I1, I2 -> I3. > > For the case, it is > I1 -> I2 -> I3; I2 -> INSN > > I3 and INSN looks like not related. But INSN is a COMPARE to set CC > and I3 can also set CC. I3 and INSN can be combined together as one > instruction to set CC. Presumably there's no dataflow between I3 and INSN because they both set CC (doesn't that make them anti-dependent? Can you show me the RTL corresponding to I1, I2, I3 and INSN, I simply find it easier to look at RTL rather than guess why we don't have the appropriate linkage and thus not attempting the combinations we want. Pseudo code for the resulting I3 and INSN would help -- as I work through this there's some inconsistencies in how I'm interpreting a few things and RTL and pseudo-rtl for the desired output RTL would help a lot. I > > ChangeLog > 2014-08-04 Zhenqiang Chen <zhenqiang.chen@linaro.org> > > Part of PR rtl-optimization/61225 > * combine.c (refer_same_reg_p): New function. > (combine_instructions): Handle I1 -> I2 -> I3; I2 -> insn. > (try_combine): Add one more parameter TO_COMBINED_INSN, which is > used to create a new insn parallel (TO_COMBINED_INSN, I3). > > testsuite/ChangeLog: > 2014-08-04 Zhenqiang Chen <zhenqiang.chen@linaro.org> > > * gcc.target/i386/pr61225.c: New test. > > diff --git a/gcc/combine.c b/gcc/combine.c > index 53ac1d6..42098ab 100644 > --- a/gcc/combine.c > +++ b/gcc/combine.c > @@ -412,7 +412,7 @@ static int cant_combine_insn_p (rtx); > static int can_combine_p (rtx, rtx, rtx, rtx, rtx, rtx, rtx *, rtx *); > static int combinable_i3pat (rtx, rtx *, rtx, rtx, rtx, int, int, rtx *); > static int contains_muldiv (rtx); > -static rtx try_combine (rtx, rtx, rtx, rtx, int *, rtx); > +static rtx try_combine (rtx, rtx, rtx, rtx, int *, rtx, rtx); > static void undo_all (void); > static void undo_commit (void); > static rtx *find_split_point (rtx *, rtx, bool); > @@ -1099,6 +1099,46 @@ insn_a_feeds_b (rtx a, rtx b) > #endif > return false; > } > + > +/* A is a compare (reg1, 0) and B is SINGLE_SET which SET_SRC is reg2. > + It returns TRUE, if reg1 == reg2, and no other refer of reg1 > + except A and B. */ > + > +static bool > +refer_same_reg_p (rtx a, rtx b) > +{ > + rtx seta = single_set (a); > + rtx setb = single_set (b); > + > + if (BLOCK_FOR_INSN (a) != BLOCK_FOR_INSN (b) > + || !seta || !setb) > + return false; Go ahead and use || !setb || !setb It's a bit more vertical space, but I believe closer in line with our coding standards. > + > + if (GET_CODE (SET_SRC (seta)) != COMPARE > + || GET_MODE_CLASS (GET_MODE (SET_DEST (seta))) != MODE_CC > + || !REG_P (XEXP (SET_SRC (seta), 0)) > + || !const0_rtx > + || !REG_P (SET_SRC (setb)) > + || REGNO (SET_SRC (setb)) != REGNO (XEXP (SET_SRC (seta), 0))) > + return false; What's the !const0_rtx test here? Don't you want to test some object from SETA against const0_rtx? Also note that you may need to test against CONST0_RTX (mode) > @@ -1431,6 +1468,50 @@ combine_instructions (rtx f, unsigned int nregs) > } > } > > + /* Try to combine a compare insn that sets CC > + with a preceding insn that can set CC, and maybe with its > + logical predecessor as well. > + We need this special code because data flow connections > + do not get entered in LOG_LINKS. */ So you'd want to be more specific about what dataflow connections are not in the LOG_LINKS that we want. It feels to me like we're missing the anti-dependence links on CC and that there's a general aspect to combine missing here. But I want to hold off on final judgement until I know more. I also wonder if compare-elim ought to be helping here. Isn't that the point here, to eliminate the comparison and instead get it for free as part of the arithmetic? If so, is it the fact that we have memory references that prevents compare-elim from kicking in? jeff
> I also wonder if compare-elim ought to be helping here. Isn't that the > point here, to eliminate the comparison and instead get it for free as > part of the arithmetic? If so, is it the fact that we have memory > references that prevents compare-elim from kicking in? Yes, compare-elim doesn't work with memory references but, more radically, it is not enabled for x86 (it is only enabled for aarch64, mn10300 and rx).
> -----Original Message----- > From: gcc-patches-owner@gcc.gnu.org [mailto:gcc-patches- > owner@gcc.gnu.org] On Behalf Of Jeff Law > Sent: Tuesday, December 02, 2014 6:11 AM > To: Zhenqiang Chen > Cc: Steven Bosscher; gcc-patches@gcc.gnu.org; Jakub Jelinek > Subject: Re: [PATCH] Fix PR 61225 > > On 08/04/14 02:24, Zhenqiang Chen wrote: > > >>> > >>> ChangeLog: > >>> 2014-05-22 Zhenqiang Chen <zhenqiang.chen@linaro.org> > >>> > >>> Part of PR rtl-optimization/61225 > >>> * config/i386/i386-protos.h (ix86_peephole2_rtx_equal_p): > >>> New proto. > >>> * config/i386/i386.c (ix86_peephole2_rtx_equal_p): New function. > >>> * regcprop.c (replace_oldest_value_reg): Add REG_EQUAL note > when > >>> propagating to SET. > >> > >> I can't help but wonder why the new 4 insn combination code isn't > >> presenting this as a nice big fat insn to the x86 backend which would > >> eliminate the need for the peep2. > >> > >> But, assuming there's a fundamental reason why that's not kicking in... > > > > Current combine pass can only handle > > > > I0 -> I1 -> I2 -> I3. > > I0, I1 -> I2, I2 -> I3. > > I0 -> I2; I1, I2 -> I3. > > I0 -> I1; I1, I2 -> I3. > > > > For the case, it is > > I1 -> I2 -> I3; I2 -> INSN > > > > I3 and INSN looks like not related. But INSN is a COMPARE to set CC > > and I3 can also set CC. I3 and INSN can be combined together as one > > instruction to set CC. > Presumably there's no dataflow between I3 and INSN because they both set > CC (doesn't that make them anti-dependent? > > Can you show me the RTL corresponding to I1, I2, I3 and INSN, I simply find it > easier to look at RTL rather than guess why we don't have the appropriate > linkage and thus not attempting the combinations we want. > > Pseudo code for the resulting I3 and INSN would help -- as I work through > this there's some inconsistencies in how I'm interpreting a few things and RTL > and pseudo-rtl for the desired output RTL would help a lot. C code: if (!--*p) rtl code: 6: r91:SI=[r90:SI] 7: {r88:SI=r91:SI-0x1;clobber flags:CC;} 8: [r90:SI]=r88:SI 9: flags:CCZ=cmp(r88:SI,0) expected output: 8: {flags:CCZ=cmp([r90:SI]-0x1,0);[r90:SI]=[r90:SI]-0x1;} in assemble, it is decl (%eax) > > > > ChangeLog > > 2014-08-04 Zhenqiang Chen <zhenqiang.chen@linaro.org> > > > > Part of PR rtl-optimization/61225 > > * combine.c (refer_same_reg_p): New function. > > (combine_instructions): Handle I1 -> I2 -> I3; I2 -> insn. > > (try_combine): Add one more parameter TO_COMBINED_INSN, which > is > > used to create a new insn parallel (TO_COMBINED_INSN, I3). > > > > testsuite/ChangeLog: > > 2014-08-04 Zhenqiang Chen <zhenqiang.chen@linaro.org> > > > > * gcc.target/i386/pr61225.c: New test. > > > > diff --git a/gcc/combine.c b/gcc/combine.c index 53ac1d6..42098ab > > 100644 > > --- a/gcc/combine.c > > +++ b/gcc/combine.c > > @@ -412,7 +412,7 @@ static int cant_combine_insn_p (rtx); > > static int can_combine_p (rtx, rtx, rtx, rtx, rtx, rtx, rtx *, rtx *); > > static int combinable_i3pat (rtx, rtx *, rtx, rtx, rtx, int, int, rtx *); > > static int contains_muldiv (rtx); > > -static rtx try_combine (rtx, rtx, rtx, rtx, int *, rtx); > > +static rtx try_combine (rtx, rtx, rtx, rtx, int *, rtx, rtx); > > static void undo_all (void); > > static void undo_commit (void); > > static rtx *find_split_point (rtx *, rtx, bool); @@ -1099,6 +1099,46 > > @@ insn_a_feeds_b (rtx a, rtx b) > > #endif > > return false; > > } > > + > > +/* A is a compare (reg1, 0) and B is SINGLE_SET which SET_SRC is reg2. > > + It returns TRUE, if reg1 == reg2, and no other refer of reg1 > > + except A and B. */ > > + > > +static bool > > +refer_same_reg_p (rtx a, rtx b) > > +{ > > + rtx seta = single_set (a); > > + rtx setb = single_set (b); > > + > > + if (BLOCK_FOR_INSN (a) != BLOCK_FOR_INSN (b) > > + || !seta || !setb) > > + return false; > Go ahead and use > || !setb > || !setb > > It's a bit more vertical space, but I believe closer in line with our coding > standards. Updated. > > + > > + if (GET_CODE (SET_SRC (seta)) != COMPARE > > + || GET_MODE_CLASS (GET_MODE (SET_DEST (seta))) != MODE_CC > > + || !REG_P (XEXP (SET_SRC (seta), 0)) > > + || !const0_rtx > > + || !REG_P (SET_SRC (setb)) > > + || REGNO (SET_SRC (setb)) != REGNO (XEXP (SET_SRC (seta), 0))) > > + return false; > What's the !const0_rtx test here? Don't you want to test some object > from SETA against const0_rtx? Also note that you may need to test > against CONST0_RTX (mode) It's my fault. It should be XEXP (SET_SRC (seta), 1) != const0_rtx The updated patch is attached. Thanks! -Zhenqiang > > @@ -1431,6 +1468,50 @@ combine_instructions (rtx f, unsigned int nregs) > > } > > } > > > > + /* Try to combine a compare insn that sets CC > > + with a preceding insn that can set CC, and maybe with its > > + logical predecessor as well. > > + We need this special code because data flow connections > > + do not get entered in LOG_LINKS. */ > So you'd want to be more specific about what dataflow connections are > not in the LOG_LINKS that we want. > > It feels to me like we're missing the anti-dependence links on CC and > that there's a general aspect to combine missing here. But I want to > hold off on final judgement until I know more. > > I also wonder if compare-elim ought to be helping here. Isn't that the > point here, to eliminate the comparison and instead get it for free as > part of the arithmetic? If so, is it the fact that we have memory > references that prevents compare-elim from kicking in? > > jeff >
On Thu, Dec 04, 2014 at 04:43:34PM +0800, Zhenqiang Chen wrote: > C code: > > if (!--*p) > > rtl code: > > 6: r91:SI=[r90:SI] > 7: {r88:SI=r91:SI-0x1;clobber flags:CC;} > 8: [r90:SI]=r88:SI > 9: flags:CCZ=cmp(r88:SI,0) > > expected output: > > 8: {flags:CCZ=cmp([r90:SI]-0x1,0);[r90:SI]=[r90:SI]-0x1;} > > in assemble, it is > > decl (%eax) Combine does not consider combining 9 into 7 because there is no LOG_LINK between them (the link for r88 is between 8 and 7 already). Segher
On Thu, Dec 04, 2014 at 02:49:56PM -0600, Segher Boessenkool wrote: > On Thu, Dec 04, 2014 at 04:43:34PM +0800, Zhenqiang Chen wrote: > > C code: > > > > if (!--*p) > > > > rtl code: > > > > 6: r91:SI=[r90:SI] > > 7: {r88:SI=r91:SI-0x1;clobber flags:CC;} > > 8: [r90:SI]=r88:SI > > 9: flags:CCZ=cmp(r88:SI,0) > > > > expected output: > > > > 8: {flags:CCZ=cmp([r90:SI]-0x1,0);[r90:SI]=[r90:SI]-0x1;} > > > > in assemble, it is > > > > decl (%eax) > > Combine does not consider combining 9 into 7 because there is no LOG_LINK > between them (the link for r88 is between 8 and 7 already). So combine tries to combine 6+7+8; the RTL it comes up with is a parallel of the memory decrement (without cc clobber, but that is fine), and setting r88 to the mem minus one. There is no such pattern in the target, and combine cannot break the parallel into two sets (because the first modifies the mem used by the second), so 6+7+8 doesn't combine. Adding a bridge pattern in the target would work; or you can enhance combine so it can break up this parallel correctly. Segher
On 12/04/14 13:49, Segher Boessenkool wrote: > On Thu, Dec 04, 2014 at 04:43:34PM +0800, Zhenqiang Chen wrote: >> C code: >> >> if (!--*p) >> >> rtl code: >> >> 6: r91:SI=[r90:SI] >> 7: {r88:SI=r91:SI-0x1;clobber flags:CC;} >> 8: [r90:SI]=r88:SI >> 9: flags:CCZ=cmp(r88:SI,0) >> >> expected output: >> >> 8: {flags:CCZ=cmp([r90:SI]-0x1,0);[r90:SI]=[r90:SI]-0x1;} >> >> in assemble, it is >> >> decl (%eax) > > Combine does not consider combining 9 into 7 because there is no LOG_LINK > between them (the link for r88 is between 8 and 7 already). OK, yea, that's a long standing design decision. We don't feed a single def into multiple use sites. jeff
On 12/04/14 13:57, Segher Boessenkool wrote: > > So combine tries to combine 6+7+8; the RTL it comes up with is a parallel > of the memory decrement (without cc clobber, but that is fine), and setting > r88 to the mem minus one. There is no such pattern in the target, and > combine cannot break the parallel into two sets (because the first modifies > the mem used by the second), so 6+7+8 doesn't combine. > > Adding a bridge pattern in the target would work; or you can enhance combine > so it can break up this parallel correctly. I think myself or someone suggested a bridge pattern in the past, but I can't find it, perhaps it was one of the other threads WRT limitations of the combiner. Zhenqiang, can you look at what happens if you provide a pattern for 6+7+8 (probably via a define_and_split)? Jeff
On Fri, Dec 05, 2014 at 03:36:01PM -0700, Jeff Law wrote: > >So combine tries to combine 6+7+8; the RTL it comes up with is a parallel > >of the memory decrement (without cc clobber, but that is fine), and setting > >r88 to the mem minus one. There is no such pattern in the target, and > >combine cannot break the parallel into two sets (because the first modifies > >the mem used by the second), so 6+7+8 doesn't combine. > > > >Adding a bridge pattern in the target would work; or you can enhance > >combine > >so it can break up this parallel correctly. > I think myself or someone suggested a bridge pattern in the past, but I > can't find it, perhaps it was one of the other threads WRT limitations > of the combiner. > > Zhenqiang, can you look at what happens if you provide a pattern for > 6+7+8 (probably via a define_and_split)? I tried this out yesterday. There are a few options (a bridge pattern for 6+7+8, or one for 7+8). I went with 6+7+8. So the code combine is asked to optimise is 6 A = M 7 T = A + B 8 M = T 9 C = cmp T, 0 and the bridge pattern I added is M = M + B :: T = M + B (I made it to split to M = M + B ; T = M which is probably not optimal, but irrelevant for the rest here). So combine happily combines 6+7+8 to the bridge pattern. But then it forgets to make a link from 9. I suppose it just doesn't know how to make a link to a parallel (it wouldn't ever be useful before my recent patches). Investigating... Segher
On Fri, Dec 05, 2014 at 03:31:54PM -0700, Jeff Law wrote: > >Combine does not consider combining 9 into 7 because there is no LOG_LINK > >between them (the link for r88 is between 8 and 7 already). > OK, yea, that's a long standing design decision. We don't feed a single > def into multiple use sites. There is no real reason not to do that. It doesn't increase computational complexity, although it is of course more expensive than what combine does today (it is more work, after all). And combining with a later use does not have too big a chance to succeed (since it has to keep the result of the earlier insn around always). GCC 6 or later ;-) Segher
On Thu, Dec 04, 2014 at 02:57:56PM -0600, Segher Boessenkool wrote: > Adding a bridge pattern in the target would work; or you can enhance combine > so it can break up this parallel correctly. I also investigated that second option. The enhancement transforms the combine result M = XXX :: T = XXX into M = XXX T = M and then the set of T can combine with its later use (the compare), but it won't ever combine that with the store to M: there is never a link for memory, only for registers. Never mind that this is unsuitable for many targets anyway (it creates a read-after-write hazard). Segher
On Fri, Dec 05, 2014 at 06:09:11PM -0600, Segher Boessenkool wrote: > On Fri, Dec 05, 2014 at 03:36:01PM -0700, Jeff Law wrote: > > Zhenqiang, can you look at what happens if you provide a pattern for > > 6+7+8 (probably via a define_and_split)? > > I tried this out yesterday. There are a few options (a bridge pattern > for 6+7+8, or one for 7+8). I went with 6+7+8. > > So the code combine is asked to optimise is > > 6 A = M > 7 T = A + B > 8 M = T > 9 C = cmp T, 0 ... and combine will never combine a write to memory (8 here) into a later insn (see can_combine_p). So this won't ever fly. I see no reasonably simple way combine can be convinced to do this. There are various possible schemes to pull insn 9 to before 8, but when does this help and when does it hurt? It all depends on the target :-( Segher
On 12/05/14 17:16, Segher Boessenkool wrote: > On Fri, Dec 05, 2014 at 03:31:54PM -0700, Jeff Law wrote: >>> Combine does not consider combining 9 into 7 because there is no LOG_LINK >>> between them (the link for r88 is between 8 and 7 already). >> OK, yea, that's a long standing design decision. We don't feed a single >> def into multiple use sites. > > There is no real reason not to do that. It doesn't increase computational > complexity, although it is of course more expensive than what combine does > today (it is more work, after all). And combining with a later use does > not have too big a chance to succeed (since it has to keep the result of > the earlier insn around always). No fundamental reason, it's just always been that way. One could argue that a bridge pattern often makes this unnecessary and bridges have been a well known way to work around combine's failings for a long time. > GCC 6 or later ;-) Yea, I think so :) jeff
On 12/04/14 01:43, Zhenqiang Chen wrote: >>> > > >>> > > Part of PR rtl-optimization/61225 >>> > > * combine.c (refer_same_reg_p): New function. >>> > > (combine_instructions): Handle I1 -> I2 -> I3; I2 -> insn. >>> > > (try_combine): Add one more parameter TO_COMBINED_INSN, which >> >is >>> > > used to create a new insn parallel (TO_COMBINED_INSN, I3). >>> > > >>> > >testsuite/ChangeLog: >>> > >2014-08-04 Zhenqiang Chen<zhenqiang.chen@linaro.org> >>> > > >>> > > * gcc.target/i386/pr61225.c: New test. THanks for the updates and clarifications. Just a few minor things and while it's a bit of a hack, I'll approve: > + > +/* A is a compare (reg1, 0) and B is SINGLE_SET which SET_SRC is reg2. > + It returns TRUE, if reg1 == reg2, and no other refer of reg1 > + except A and B. */ > + > +static bool > +refer_same_reg_p (rtx_insn *a, rtx_insn *b) > +{ > + rtx seta = single_set (a); > + rtx setb = single_set (b); > + > + if (BLOCK_FOR_INSN (a) != BLOCK_FOR_INSN (b) > + || !seta > + || !setb) > + return false; > + > + if (GET_CODE (SET_SRC (seta)) != COMPARE > + || GET_MODE_CLASS (GET_MODE (SET_DEST (seta))) != MODE_CC > + || !REG_P (XEXP (SET_SRC (seta), 0)) > + || XEXP (SET_SRC (seta), 1) != const0_rtx > + || !REG_P (SET_SRC (setb)) > + || REGNO (SET_SRC (setb)) != REGNO (XEXP (SET_SRC (seta), 0))) > + return false; Do you need to verify SETA and SETB satisfy single_set? Or has that already been done elsewhere? The name refer_same_reg_p seems wrong -- your function is verifying the underlying RTL store as well as the existence of a a dependency between the insns. Can you try to come up with a better name? Please use CONST0_RTX (mode) IIRC that'll allow this to work regardless of the size of the modes relative to the host word size. > + > + if (DF_REG_USE_COUNT (REGNO (SET_SRC (setb))) > 2) > + { > + df_ref use; > + rtx insn; > + unsigned int i = REGNO (SET_SRC (setb)); > + > + for (use = DF_REG_USE_CHAIN (i); use; use = DF_REF_NEXT_REG (use)) > + { > + insn = DF_REF_INSN (use); > + if (insn != a && insn != b && !(NOTE_P (insn) || DEBUG_INSN_P (insn))) > + return false; > + } > + } > + > + return true; > +} Is this fragment really needed? Does it ever trigger? I'd think that for > 2 uses punting would be fine. Do we really commonly have cases with > 2 uses, but where they're all in SETA and SETB? > } > } > > + /* Try to combine a compare insn that sets CC > + with a preceding insn that can set CC, and maybe with its > + logical predecessor as well. > + We need this special code because data flow connections > + do not get entered in LOG_LINKS. */ > + if ((prev = prev_nonnote_nondebug_insn (insn)) != NULL_RTX > + && refer_same_reg_p (insn, prev) > + && max_combine >= 4) > + { > + struct insn_link *next1; > + FOR_EACH_LOG_LINK (next1, prev) > + { > + rtx_insn *link1 = next1->insn; > + if (NOTE_P (link1)) > + continue; > + /* I1 -> I2 -> I3; I2 -> insn; > + output parallel (insn, I3). */ > + FOR_EACH_LOG_LINK (nextlinks, link1) > + if ((next = try_combine (prev, link1, > + nextlinks->insn, NULL, > + &new_direct_jump_p, > + last_combined_insn, insn)) != 0) > + > + { > + delete_insn (insn); > + insn = next; > + statistics_counter_event (cfun, "four-insn combine", 1); > + goto retry; > + } > + /* I2 -> I3; I2 -> insn > + output next = parallel (insn, I3). */ > + if ((next = try_combine (prev, link1, > + NULL, NULL, > + &new_direct_jump_p, > + last_combined_insn, insn)) != 0) > + > + { > + delete_insn (insn); > + insn = next; > + statistics_counter_event (cfun, "three-insn combine", 1); > + goto retry; > + } > + } > + } So you've got two new combine cases here, but I think the testcase only tests one of them. Can you include a testcase for both of hte major paths above (I1->I2->I3; I2->insn and I2->I3; I2->INSN) Please make those changes and repost for final approval. jeff
diff --git a/gcc/combine.c b/gcc/combine.c index 53ac1d6..42098ab 100644 --- a/gcc/combine.c +++ b/gcc/combine.c @@ -412,7 +412,7 @@ static int cant_combine_insn_p (rtx); static int can_combine_p (rtx, rtx, rtx, rtx, rtx, rtx, rtx *, rtx *); static int combinable_i3pat (rtx, rtx *, rtx, rtx, rtx, int, int, rtx *); static int contains_muldiv (rtx); -static rtx try_combine (rtx, rtx, rtx, rtx, int *, rtx); +static rtx try_combine (rtx, rtx, rtx, rtx, int *, rtx, rtx); static void undo_all (void); static void undo_commit (void); static rtx *find_split_point (rtx *, rtx, bool); @@ -1099,6 +1099,46 @@ insn_a_feeds_b (rtx a, rtx b) #endif return false; } + +/* A is a compare (reg1, 0) and B is SINGLE_SET which SET_SRC is reg2. + It returns TRUE, if reg1 == reg2, and no other refer of reg1 + except A and B. */ + +static bool +refer_same_reg_p (rtx a, rtx b) +{ + rtx seta = single_set (a); + rtx setb = single_set (b); + + if (BLOCK_FOR_INSN (a) != BLOCK_FOR_INSN (b) + || !seta || !setb) + return false; + + if (GET_CODE (SET_SRC (seta)) != COMPARE + || GET_MODE_CLASS (GET_MODE (SET_DEST (seta))) != MODE_CC + || !REG_P (XEXP (SET_SRC (seta), 0)) + || !const0_rtx + || !REG_P (SET_SRC (setb)) + || REGNO (SET_SRC (setb)) != REGNO (XEXP (SET_SRC (seta), 0))) + return false; + + if (DF_REG_USE_COUNT (REGNO (SET_SRC (setb))) > 2) + { + df_ref use; + rtx insn; + unsigned int i = REGNO (SET_SRC (setb)); + + for (use = DF_REG_USE_CHAIN (i); use; use = DF_REF_NEXT_REG (use)) + { + insn = DF_REF_INSN (use); + if (insn != a && insn != b && !(NOTE_P (insn) || DEBUG_INSN_P (insn))) + return false; + } + } + + return true; +} + ^L /* Main entry point for combiner. F is the first insn of the function. NREGS is the first unused pseudo-reg number. @@ -1108,10 +1148,7 @@ insn_a_feeds_b (rtx a, rtx b) static int combine_instructions (rtx f, unsigned int nregs) { - rtx insn, next; -#ifdef HAVE_cc0 - rtx prev; -#endif + rtx insn, next, prev; struct insn_link *links, *nextlinks; rtx first; basic_block last_bb; @@ -1258,7 +1295,7 @@ combine_instructions (rtx f, unsigned int nregs) FOR_EACH_LOG_LINK (links, insn) if ((next = try_combine (insn, links->insn, NULL_RTX, NULL_RTX, &new_direct_jump_p, - last_combined_insn)) != 0) + last_combined_insn, NULL_RTX)) != 0) { statistics_counter_event (cfun, "two-insn combine", 1); goto retry; @@ -1279,7 +1316,7 @@ combine_instructions (rtx f, unsigned int nregs) FOR_EACH_LOG_LINK (nextlinks, link) if ((next = try_combine (insn, link, nextlinks->insn, NULL_RTX, &new_direct_jump_p, - last_combined_insn)) != 0) + last_combined_insn, NULL_RTX)) != 0) { statistics_counter_event (cfun, "three-insn combine", 1); goto retry; @@ -1301,13 +1338,13 @@ combine_instructions (rtx f, unsigned int nregs) { if ((next = try_combine (insn, prev, NULL_RTX, NULL_RTX, &new_direct_jump_p, - last_combined_insn)) != 0) + last_combined_insn, NULL_RTX)) != 0) goto retry; FOR_EACH_LOG_LINK (nextlinks, prev) if ((next = try_combine (insn, prev, nextlinks->insn, NULL_RTX, &new_direct_jump_p, - last_combined_insn)) != 0) + last_combined_insn, NULL_RTX)) != 0) goto retry; } @@ -1321,13 +1358,13 @@ combine_instructions (rtx f, unsigned int nregs) { if ((next = try_combine (insn, prev, NULL_RTX, NULL_RTX, &new_direct_jump_p, - last_combined_insn)) != 0) + last_combined_insn, NULL_RTX)) != 0) goto retry; FOR_EACH_LOG_LINK (nextlinks, prev) if ((next = try_combine (insn, prev, nextlinks->insn, NULL_RTX, &new_direct_jump_p, - last_combined_insn)) != 0) + last_combined_insn, NULL_RTX)) != 0) goto retry; } @@ -1343,7 +1380,7 @@ combine_instructions (rtx f, unsigned int nregs) && sets_cc0_p (PATTERN (prev)) && (next = try_combine (insn, links->insn, prev, NULL_RTX, &new_direct_jump_p, - last_combined_insn)) != 0) + last_combined_insn, NULL_RTX)) != 0) goto retry; #endif @@ -1356,7 +1393,7 @@ combine_instructions (rtx f, unsigned int nregs) if ((next = try_combine (insn, links->insn, nextlinks->insn, NULL_RTX, &new_direct_jump_p, - last_combined_insn)) != 0) + last_combined_insn, NULL_RTX)) != 0) { statistics_counter_event (cfun, "three-insn combine", 1); @@ -1385,7 +1422,7 @@ combine_instructions (rtx f, unsigned int nregs) if ((next = try_combine (insn, link, link1, nextlinks->insn, &new_direct_jump_p, - last_combined_insn)) != 0) + last_combined_insn, NULL_RTX)) != 0) { statistics_counter_event (cfun, "four-insn combine", 1); goto retry; @@ -1396,7 +1433,7 @@ combine_instructions (rtx f, unsigned int nregs) if ((next = try_combine (insn, link, link1, nextlinks->insn, &new_direct_jump_p, - last_combined_insn)) != 0) + last_combined_insn, NULL_RTX)) != 0) { statistics_counter_event (cfun, "four-insn combine", 1); goto retry; @@ -1413,7 +1450,7 @@ combine_instructions (rtx f, unsigned int nregs) if ((next = try_combine (insn, link, link1, nextlinks->insn, &new_direct_jump_p, - last_combined_insn)) != 0) + last_combined_insn, NULL_RTX)) != 0) { statistics_counter_event (cfun, "four-insn combine", 1); goto retry; @@ -1423,7 +1460,7 @@ combine_instructions (rtx f, unsigned int nregs) if ((next = try_combine (insn, link, link1, nextlinks->insn, &new_direct_jump_p, - last_combined_insn)) != 0) + last_combined_insn, NULL_RTX)) != 0) { statistics_counter_event (cfun, "four-insn combine", 1); goto retry; @@ -1431,6 +1468,50 @@ combine_instructions (rtx f, unsigned int nregs) } } + /* Try to combine a compare insn that sets CC + with a preceding insn that can set CC, and maybe with its + logical predecessor as well. + We need this special code because data flow connections + do not get entered in LOG_LINKS. */ + if ((prev = prev_nonnote_nondebug_insn (insn)) != NULL_RTX + && refer_same_reg_p (insn, prev) + && max_combine >= 4) + { + struct insn_link *next1; + FOR_EACH_LOG_LINK (next1, prev) + { + rtx link1 = next1->insn; + if (NOTE_P (link1)) + continue; + /* I1 -> I2 -> I3; I2 -> insn; + output parallel (insn, I3). */ + FOR_EACH_LOG_LINK (nextlinks, link1) + if ((next = try_combine (prev, link1, + nextlinks->insn, NULL_RTX, + &new_direct_jump_p, + last_combined_insn, insn)) != 0) + + { + delete_insn (insn); + insn = next; + statistics_counter_event (cfun, "four-insn combine", 1); + goto retry; + } + /* I2 -> I3; I2 -> insn + output next = parallel (insn, I3). */ + if ((next = try_combine (prev, link1, + NULL_RTX, NULL_RTX, + &new_direct_jump_p, + last_combined_insn, insn)) != 0) + + { + delete_insn (insn); + insn = next; + statistics_counter_event (cfun, "three-insn combine", 1); + goto retry; + } + } + } /* Try this insn with each REG_EQUAL note it links back to. */ FOR_EACH_LOG_LINK (links, insn) { @@ -1456,7 +1537,7 @@ combine_instructions (rtx f, unsigned int nregs) i2mod_new_rhs = copy_rtx (note); next = try_combine (insn, i2mod, NULL_RTX, NULL_RTX, &new_direct_jump_p, - last_combined_insn); + last_combined_insn, NULL_RTX); i2mod = NULL_RTX; if (next) { @@ -2450,11 +2531,14 @@ update_cfg_for_uncondjump (rtx insn) LAST_COMBINED_INSN is either I3, or some insn after I3 that has been I3 passed to an earlier try_combine within the same basic - block. */ + block. + + TO_COMBINED_INSN is an insn after I3 that sets CC flags. It is used to + combine with I3 to make a new insn. */ static rtx try_combine (rtx i3, rtx i2, rtx i1, rtx i0, int *new_direct_jump_p, - rtx last_combined_insn) + rtx last_combined_insn, rtx to_combined_insn) { /* New patterns for I3 and I2, respectively. */ rtx newpat, newi2pat = 0; @@ -2543,6 +2627,7 @@ try_combine (rtx i3, rtx i2, rtx i1, rtx i0, int *new_direct_jump_p, || cant_combine_insn_p (i2) || (i1 && cant_combine_insn_p (i1)) || (i0 && cant_combine_insn_p (i0)) + || (to_combined_insn && cant_combine_insn_p (to_combined_insn)) || likely_spilled_retval_p (i3)) return 0; @@ -3216,7 +3301,11 @@ try_combine (rtx i3, rtx i2, rtx i1, rtx i0, int *new_direct_jump_p, rtx old = newpat; total_sets = 1 + extra_sets; newpat = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (total_sets)); - XVECEXP (newpat, 0, 0) = old; + + if (to_combined_insn) + XVECEXP (newpat, 0, --total_sets) = old; + else + XVECEXP (newpat, 0, 0) = old; } if (added_sets_0) @@ -3239,6 +3328,21 @@ try_combine (rtx i3, rtx i2, rtx i1, rtx i0, int *new_direct_jump_p, if ((i0_feeds_i1_n && i1_feeds_i2_n) || i0_feeds_i2_n) t = subst (t, i0dest, i0src_copy2 ? i0src_copy2 : i0src, 0, 0, 0); + if (to_combined_insn) + { + rtx todest = NULL_RTX, tosrc = NULL_RTX; + if (can_combine_p (i2, to_combined_insn, NULL_RTX, NULL_RTX, + i3, NULL_RTX, &todest, &tosrc)) + { + rtx pat = copy_rtx (PATTERN (to_combined_insn)); + t = subst (pat, todest, tosrc, 0, 0, 0); + } + else + { + undo_all (); + return 0; + } + } XVECEXP (newpat, 0, --total_sets) = t; } } diff --git a/gcc/testsuite/gcc.target/i386/pr61225.c b/gcc/testsuite/gcc.target/i386/pr61225.c new file mode 100644 index 0000000..9c08102 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr61225.c @@ -0,0 +1,16 @@ +/* PR rtl-optimization/61225 */ +/* { dg-do compile } */ +/* { dg-options "-Os -fdump-rtl-combine-details" } */ + +void foo (void *); + +int * +f1 (int *x) +{ + if (!--*x) + foo (x); + return x; +} + +/* { dg-final { scan-rtl-dump "Successfully matched this instruction" "combine" } } */