From patchwork Tue Dec 9 09:49:18 2014 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Zhenqiang Chen X-Patchwork-Id: 418988 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 589DB1400D2 for ; Tue, 9 Dec 2014 20:49:44 +1100 (AEDT) DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:cc:references:in-reply-to:subject:date:message-id :mime-version:content-type:content-transfer-encoding; q=dns; s= default; b=NfEDlv7jKYu+LHVL+tUDdwSQqHQDhLmp8/gCaiDyNjaRbPa7Fy1+R V4BZ8ydfYAnN0ETl8pwjfPv1wdiOtAyDJVq582RiFCP1Ax1FNXhaW0uUljwurK3U 5LOTN+e2yDtiEo2rHwlVaTYB/TVVfXskx/sxfEFvF78rwpo+bbWNJY= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:cc:references:in-reply-to:subject:date:message-id :mime-version:content-type:content-transfer-encoding; s=default; bh=usuLJAT+CjgM9j/ImhcUORvScIY=; b=mZL5E4m2dGbHLnFUu7n71UmX9r0s 0uVVPDQblMBM4i5RFxHFIKAswzRO56GKAjrZQsJbV9i6BZPPHRJZ+HHz89uh7Jru x3U6I+TXBjZxhNklBe3fk9+7OZCWsJPWAtx4P85cnKjlbGrWpqsYKzgmPb8Do1X0 xEwKTTVfNWG6wjY= Received: (qmail 32732 invoked by alias); 9 Dec 2014 09:49:37 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 32718 invoked by uid 89); 9 Dec 2014 09:49:36 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-1.8 required=5.0 tests=AWL, BAYES_00, SPF_PASS autolearn=ham version=3.3.2 X-HELO: service87.mimecast.com Received: from service87.mimecast.com (HELO service87.mimecast.com) (91.220.42.44) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Tue, 09 Dec 2014 09:49:33 +0000 Received: from cam-owa1.Emea.Arm.com (fw-tnat.cambridge.arm.com [217.140.96.140]) by service87.mimecast.com; Tue, 09 Dec 2014 09:49:30 +0000 Received: from shawin003 ([10.164.2.132]) by cam-owa1.Emea.Arm.com with Microsoft SMTPSVC(6.0.3790.3959); Tue, 9 Dec 2014 09:49:29 +0000 From: "Zhenqiang Chen" To: "'Jeff Law'" Cc: References: <53C73EBD.9070909@redhat.com> <547CE75B.6090709@redhat.com> <000001d00f9e$64f43600$2edca200$@arm.com> <54861817.4030409@redhat.com> In-Reply-To: <54861817.4030409@redhat.com> Subject: RE: [PATCH] Fix PR 61225 Date: Tue, 9 Dec 2014 17:49:18 +0800 Message-ID: <000301d01395$675d5890$361809b0$@arm.com> MIME-Version: 1.0 X-MC-Unique: 114120909493014601 X-IsSubscribed: yes > -----Original Message----- > From: Jeff Law [mailto:law@redhat.com] > Sent: Tuesday, December 09, 2014 5:29 AM > To: Zhenqiang Chen > Cc: Steven Bosscher; gcc-patches@gcc.gnu.org; Jakub Jelinek > Subject: Re: [PATCH] Fix PR 61225 > > On 12/04/14 01:43, Zhenqiang Chen wrote: > >>> > > > >>> > > Part of PR rtl-optimization/61225 > >>> > > * combine.c (refer_same_reg_p): New function. > >>> > > (combine_instructions): Handle I1 -> I2 -> I3; I2 -> insn. > >>> > > (try_combine): Add one more parameter TO_COMBINED_INSN, > >>> > > which > >> >is > >>> > > used to create a new insn parallel (TO_COMBINED_INSN, I3). > >>> > > > >>> > >testsuite/ChangeLog: > >>> > >2014-08-04 Zhenqiang Chen > >>> > > > >>> > > * gcc.target/i386/pr61225.c: New test. > THanks for the updates and clarifications. Just a few minor things and while > it's a bit of a hack, I'll approve: > > > > > + > > +/* A is a compare (reg1, 0) and B is SINGLE_SET which SET_SRC is reg2. > > + It returns TRUE, if reg1 == reg2, and no other refer of reg1 > > + except A and B. */ > > + > > +static bool > > +refer_same_reg_p (rtx_insn *a, rtx_insn *b) > > +{ > > + rtx seta = single_set (a); > > + rtx setb = single_set (b); > > + > > + if (BLOCK_FOR_INSN (a) != BLOCK_FOR_INSN (b) > > + || !seta > > + || !setb) > > + return false; > > > > + if (GET_CODE (SET_SRC (seta)) != COMPARE > > + || GET_MODE_CLASS (GET_MODE (SET_DEST (seta))) != MODE_CC > > + || !REG_P (XEXP (SET_SRC (seta), 0)) > > + || XEXP (SET_SRC (seta), 1) != const0_rtx > > + || !REG_P (SET_SRC (setb)) > > + || REGNO (SET_SRC (setb)) != REGNO (XEXP (SET_SRC (seta), 0))) > > + return false; > Do you need to verify SETA and SETB satisfy single_set? Or has that > already been done elsewhere? A is NEXT_INSN (insn) B is prev_nonnote_nondebug_insn (insn), For I1 -> I2 -> B; I2 -> A; LOG_LINK can make sure I1 and I2 are single_set, but not A and B. And I did found codes in function try_combine, which can make sure B (or I3) is single_set. So I think the check can skip failed cases at early stage. > The name refer_same_reg_p seems wrong -- your function is verifying the > underlying RTL store as well as the existence of a a dependency between > the insns. Can you try to come up with a better name? Change it to can_reuse_cc_set_p. > Please use CONST0_RTX (mode) IIRC that'll allow this to work regardless > of the size of the modes relative to the host word size. Updated. > > + > > + if (DF_REG_USE_COUNT (REGNO (SET_SRC (setb))) > 2) > > + { > > + df_ref use; > > + rtx insn; > > + unsigned int i = REGNO (SET_SRC (setb)); > > + > > + for (use = DF_REG_USE_CHAIN (i); use; use = DF_REF_NEXT_REG (use)) > > + { > > + insn = DF_REF_INSN (use); > > + if (insn != a && insn != b && !(NOTE_P (insn) || DEBUG_INSN_P > (insn))) > > + return false; > > + } > > + } > > + > > + return true; > > +} > Is this fragment really needed? Does it ever trigger? I'd think that > for > 2 uses punting would be fine. Do we really commonly have cases > with > 2 uses, but where they're all in SETA and SETB? The check is to make sure the correctness. Here is a case, int f1 (int *x) { int t = --*x; if (!t) foo (x); return t; } _4 = *x_3(D); _5 = _4 + -1; *x_3(D) = _5; # DEBUG t => _5 if (_5 == 0) ... : return _5; "_5" is used in "return _5". So we can not remove "_5 = _4 + -1". > > } > > } > > > > + /* Try to combine a compare insn that sets CC > > + with a preceding insn that can set CC, and maybe with its > > + logical predecessor as well. > > + We need this special code because data flow connections > > + do not get entered in LOG_LINKS. */ > > + if ((prev = prev_nonnote_nondebug_insn (insn)) != NULL_RTX > > + && refer_same_reg_p (insn, prev) > > + && max_combine >= 4) > > + { > > + struct insn_link *next1; > > + FOR_EACH_LOG_LINK (next1, prev) > > + { > > + rtx_insn *link1 = next1->insn; > > + if (NOTE_P (link1)) > > + continue; > > + /* I1 -> I2 -> I3; I2 -> insn; > > + output parallel (insn, I3). */ > > + FOR_EACH_LOG_LINK (nextlinks, link1) > > + if ((next = try_combine (prev, link1, > > + nextlinks->insn, NULL, > > + &new_direct_jump_p, > > + last_combined_insn, insn)) != 0) > > + > > + { > > + delete_insn (insn); > > + insn = next; > > + statistics_counter_event (cfun, "four-insn combine", > 1); > > + goto retry; > > + } > > + /* I2 -> I3; I2 -> insn > > + output next = parallel (insn, I3). */ > > + if ((next = try_combine (prev, link1, > > + NULL, NULL, > > + &new_direct_jump_p, > > + last_combined_insn, insn)) != 0) > > + > > + { > > + delete_insn (insn); > > + insn = next; > > + statistics_counter_event (cfun, "three-insn combine", > 1); > > + goto retry; > > + } > > + } > > + } > So you've got two new combine cases here, but I think the testcase only > tests one of them. Can you include a testcase for both of hte major > paths above (I1->I2->I3; I2->insn and I2->I3; I2->INSN) pr61225.c is the case to cover I1->I2->I3; I2->insn. For I2 -> I3; I2 -> insn, I tried my test cases and found peephole2 can also handle them. So I removed the code from the patch. Here is the final patch. Bootstrap and no make check regression on X86-64. ChangeLog: 2014-11-09 Zhenqiang Chen Part of PR rtl-optimization/61225 * combine.c (can_reuse_cc_set_p): New function. (combine_instructions): Handle I1 -> I2 -> I3; I2 -> insn. (try_combine): Add one more parameter TO_COMBINED_INSN, which is used to create a new insn parallel (TO_COMBINED_INSN, I3). testsuite/ChangeLog: 2014-11-09 Zhenqiang Chen * gcc.target/i386/pr61225.c: New test. +/* { dg-final { cleanup-rtl-dump "combine" } } */ diff --git a/gcc/combine.c b/gcc/combine.c index e6deb41..7360ca3 100644 --- a/gcc/combine.c +++ b/gcc/combine.c @@ -431,7 +431,7 @@ static int can_combine_p (rtx_insn *, rtx_insn *, rtx_insn *, rtx_insn *, static int combinable_i3pat (rtx_insn *, rtx *, rtx, rtx, rtx, int, int, rtx *); static int contains_muldiv (rtx); static rtx_insn *try_combine (rtx_insn *, rtx_insn *, rtx_insn *, rtx_insn *, - int *, rtx_insn *); + int *, rtx_insn *, rtx_insn *); static void undo_all (void); static void undo_commit (void); static rtx *find_split_point (rtx *, rtx_insn *, bool); @@ -1120,6 +1120,46 @@ insn_a_feeds_b (rtx_insn *a, rtx_insn *b) #endif return false; } + +/* A is a compare (reg1, 0) and B is SINGLE_SET which SET_SRC is reg2. + It returns TRUE, if reg1 == reg2, and no other refer of reg1 + except A and B. */ + +static bool +can_reuse_cc_set_p (rtx_insn *a, rtx_insn *b) +{ + rtx seta = single_set (a); + rtx setb = single_set (b); + + if (BLOCK_FOR_INSN (a) != BLOCK_FOR_INSN (b) + || !seta + || !setb) + return false; + + if (GET_CODE (SET_SRC (seta)) != COMPARE + || GET_MODE_CLASS (GET_MODE (SET_DEST (seta))) != MODE_CC + || !REG_P (XEXP (SET_SRC (seta), 0)) + || XEXP (SET_SRC (seta), 1) != CONST0_RTX (GET_MODE (SET_SRC (seta))) + || !REG_P (SET_SRC (setb)) + || REGNO (SET_SRC (setb)) != REGNO (XEXP (SET_SRC (seta), 0))) + return false; + + if (DF_REG_USE_COUNT (REGNO (SET_SRC (setb))) > 2) + { + df_ref use; + rtx insn; + unsigned int i = REGNO (SET_SRC (setb)); + + for (use = DF_REG_USE_CHAIN (i); use; use = DF_REF_NEXT_REG (use)) + { + insn = DF_REF_INSN (use); + if (insn != a && insn != b && !(NOTE_P (insn) || DEBUG_INSN_P (insn))) + return false; + } + } + return true; +} + /* Main entry point for combiner. F is the first insn of the function. NREGS is the first unused pseudo-reg number. @@ -1129,10 +1169,7 @@ insn_a_feeds_b (rtx_insn *a, rtx_insn *b) static int combine_instructions (rtx_insn *f, unsigned int nregs) { - rtx_insn *insn, *next; -#ifdef HAVE_cc0 - rtx_insn *prev; -#endif + rtx_insn *insn, *next, *prev; struct insn_link *links, *nextlinks; rtx_insn *first; basic_block last_bb; @@ -1279,7 +1316,7 @@ combine_instructions (rtx_insn *f, unsigned int nregs) FOR_EACH_LOG_LINK (links, insn) if ((next = try_combine (insn, links->insn, NULL, NULL, &new_direct_jump_p, - last_combined_insn)) != 0) + last_combined_insn, NULL)) != 0) { statistics_counter_event (cfun, "two-insn combine", 1); goto retry; @@ -1300,7 +1337,7 @@ combine_instructions (rtx_insn *f, unsigned int nregs) FOR_EACH_LOG_LINK (nextlinks, link) if ((next = try_combine (insn, link, nextlinks->insn, NULL, &new_direct_jump_p, - last_combined_insn)) != 0) + last_combined_insn, NULL)) != 0) { statistics_counter_event (cfun, "three-insn combine", 1); goto retry; @@ -1322,13 +1359,13 @@ combine_instructions (rtx_insn *f, unsigned int nregs) { if ((next = try_combine (insn, prev, NULL, NULL, &new_direct_jump_p, - last_combined_insn)) != 0) + last_combined_insn, NULL)) != 0) goto retry; FOR_EACH_LOG_LINK (nextlinks, prev) if ((next = try_combine (insn, prev, nextlinks->insn, NULL, &new_direct_jump_p, - last_combined_insn)) != 0) + last_combined_insn, NULL)) != 0) goto retry; } @@ -1342,13 +1379,13 @@ combine_instructions (rtx_insn *f, unsigned int nregs) { if ((next = try_combine (insn, prev, NULL, NULL, &new_direct_jump_p, - last_combined_insn)) != 0) + last_combined_insn, NULL)) != 0) goto retry; FOR_EACH_LOG_LINK (nextlinks, prev) if ((next = try_combine (insn, prev, nextlinks->insn, NULL, &new_direct_jump_p, - last_combined_insn)) != 0) + last_combined_insn, NULL)) != 0) goto retry; } @@ -1364,7 +1401,7 @@ combine_instructions (rtx_insn *f, unsigned int nregs) && sets_cc0_p (PATTERN (prev)) && (next = try_combine (insn, links->insn, prev, NULL, &new_direct_jump_p, - last_combined_insn)) != 0) + last_combined_insn, NULL)) != 0) goto retry; #endif @@ -1377,7 +1414,7 @@ combine_instructions (rtx_insn *f, unsigned int nregs) if ((next = try_combine (insn, links->insn, nextlinks->insn, NULL, &new_direct_jump_p, - last_combined_insn)) != 0) + last_combined_insn, NULL)) != 0) { statistics_counter_event (cfun, "three-insn combine", 1); @@ -1406,7 +1443,7 @@ combine_instructions (rtx_insn *f, unsigned int nregs) if ((next = try_combine (insn, link, link1, nextlinks->insn, &new_direct_jump_p, - last_combined_insn)) != 0) + last_combined_insn, NULL)) != 0) { statistics_counter_event (cfun, "four-insn combine", 1); goto retry; @@ -1417,7 +1454,7 @@ combine_instructions (rtx_insn *f, unsigned int nregs) if ((next = try_combine (insn, link, link1, nextlinks->insn, &new_direct_jump_p, - last_combined_insn)) != 0) + last_combined_insn, NULL)) != 0) { statistics_counter_event (cfun, "four-insn combine", 1); goto retry; @@ -1434,7 +1471,7 @@ combine_instructions (rtx_insn *f, unsigned int nregs) if ((next = try_combine (insn, link, link1, nextlinks->insn, &new_direct_jump_p, - last_combined_insn)) != 0) + last_combined_insn, NULL)) != 0) { statistics_counter_event (cfun, "four-insn combine", 1); goto retry; @@ -1444,7 +1481,7 @@ combine_instructions (rtx_insn *f, unsigned int nregs) if ((next = try_combine (insn, link, link1, nextlinks->insn, &new_direct_jump_p, - last_combined_insn)) != 0) + last_combined_insn, NULL)) != 0) { statistics_counter_event (cfun, "four-insn combine", 1); goto retry; @@ -1452,6 +1489,37 @@ combine_instructions (rtx_insn *f, unsigned int nregs) } } + /* Try to combine a compare insn that sets CC + with a preceding insn that can set CC, and maybe with its + logical predecessor as well. + We need this special code because data flow connections + do not get entered in LOG_LINKS. */ + if ((prev = prev_nonnote_nondebug_insn (insn)) != NULL_RTX + && can_reuse_cc_set_p (insn, prev) + && max_combine >= 4) + { + struct insn_link *next1; + FOR_EACH_LOG_LINK (next1, prev) + { + rtx_insn *link1 = next1->insn; + if (NOTE_P (link1)) + continue; + /* I1 -> I2 -> I3; I2 -> insn; + output parallel (insn, I3). */ + FOR_EACH_LOG_LINK (nextlinks, link1) + if ((next = try_combine (prev, link1, + nextlinks->insn, NULL, + &new_direct_jump_p, + last_combined_insn, insn)) != 0) + + { + delete_insn (insn); + insn = next; + statistics_counter_event (cfun, "four-insn combine", 1); + goto retry; + } + } + } /* Try this insn with each REG_EQUAL note it links back to. */ FOR_EACH_LOG_LINK (links, insn) { @@ -1477,7 +1545,7 @@ combine_instructions (rtx_insn *f, unsigned int nregs) i2mod_new_rhs = copy_rtx (note); next = try_combine (insn, i2mod, NULL, NULL, &new_direct_jump_p, - last_combined_insn); + last_combined_insn, NULL); i2mod = NULL; if (next) { @@ -2533,11 +2601,15 @@ can_split_parallel_of_n_reg_sets (rtx_insn *insn, int n) LAST_COMBINED_INSN is either I3, or some insn after I3 that has been I3 passed to an earlier try_combine within the same basic - block. */ + block. + + TO_COMBINED_INSN is an insn after I3 that sets CC flags. It is used to + combine with I3 to make a new insn. */ static rtx_insn * try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx_insn *i0, - int *new_direct_jump_p, rtx_insn *last_combined_insn) + int *new_direct_jump_p, rtx_insn *last_combined_insn, + rtx_insn *to_combined_insn) { /* New patterns for I3 and I2, respectively. */ rtx newpat, newi2pat = 0; @@ -2630,6 +2702,7 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx_insn *i0, || cant_combine_insn_p (i2) || (i1 && cant_combine_insn_p (i1)) || (i0 && cant_combine_insn_p (i0)) + || (to_combined_insn && cant_combine_insn_p (to_combined_insn)) || likely_spilled_retval_p (i3)) return 0; @@ -3323,7 +3396,11 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx_insn *i0, rtx old = newpat; total_sets = 1 + extra_sets; newpat = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (total_sets)); - XVECEXP (newpat, 0, 0) = old; + + if (to_combined_insn) + XVECEXP (newpat, 0, --total_sets) = old; + else + XVECEXP (newpat, 0, 0) = old; } if (added_sets_0) @@ -3346,6 +3423,21 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx_insn *i0, if ((i0_feeds_i1_n && i1_feeds_i2_n) || i0_feeds_i2_n) t = subst (t, i0dest, i0src_copy2 ? i0src_copy2 : i0src, 0, 0, 0); + if (to_combined_insn) + { + rtx todest = NULL_RTX, tosrc = NULL_RTX; + if (can_combine_p (i2, to_combined_insn, NULL, NULL, + i3, NULL, &todest, &tosrc)) + { + rtx pat = copy_rtx (PATTERN (to_combined_insn)); + t = subst (pat, todest, tosrc, 0, 0, 0); + } + else + { + undo_all (); + return 0; + } + } XVECEXP (newpat, 0, --total_sets) = t; } } diff --git a/gcc/testsuite/gcc.target/i386/pr61225.c b/gcc/testsuite/gcc.target/i386/pr61225.c new file mode 100644 index 0000000..9c08102 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr61225.c @@ -0,0 +1,16 @@ +/* PR rtl-optimization/61225 */ +/* { dg-do compile } */ +/* { dg-options "-Os -fdump-rtl-combine-details" } */ + +void foo (void *); + +int * +f1 (int *x) +{ + if (!--*x) + foo (x); + return x; +} + +/* { dg-final { scan-rtl-dump "Successfully matched this instruction" "combine" } } */