From patchwork Tue Apr 28 13:57:13 2015 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alan Lawrence X-Patchwork-Id: 465561 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 964FC140079 for ; Tue, 28 Apr 2015 23:57:29 +1000 (AEST) Authentication-Results: ozlabs.org; dkim=pass reason="1024-bit key; unprotected key" header.d=gcc.gnu.org header.i=@gcc.gnu.org header.b=OQ32Uf5K; dkim-adsp=none (unprotected policy); dkim-atps=neutral DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender :message-id:date:from:mime-version:to:subject:references :in-reply-to:content-type; q=dns; s=default; b=KNkLIQt1CgwuCLjUE qy/cJGleXfPEylRm+MLGyjqnVYjJpq4neriAjrOmWLTYg7+CDm4jmzUtof/NFfa7 89z2Tm7cwFHLMtyOOBfBrGNdz4bpvjcyRUdyblNJDE64ULlAwg0NlcrSFrXflrbh bxSF1kSn8WMbzaqM9Jnubrou1k= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender :message-id:date:from:mime-version:to:subject:references :in-reply-to:content-type; s=default; bh=3icYo6in4F2CWQC3J/p1lIg xt/g=; b=OQ32Uf5KVeLqSgVXM+ECcntXVFU1ZtgW5uio93Uy22HLvoo/Jh6CuEs oSUL2ShQLCdgImPSvrByoDurTsyOyA5lRs0mdpjhASO9+l2tnPKkcH8hBMAtKb8e 9rnrIe/xLHFw6raVLT9a7g4AvUwMuWWPMCMTAuszX8FQPXjd0zZU= Received: (qmail 114313 invoked by alias); 28 Apr 2015 13:57:20 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 114274 invoked by uid 89); 28 Apr 2015 13:57:19 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-1.8 required=5.0 tests=AWL, BAYES_00, SPF_PASS autolearn=ham version=3.3.2 X-HELO: eu-smtp-delivery-143.mimecast.com Received: from eu-smtp-delivery-143.mimecast.com (HELO eu-smtp-delivery-143.mimecast.com) (146.101.78.143) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Tue, 28 Apr 2015 13:57:17 +0000 Received: from cam-owa2.Emea.Arm.com (fw-tnat.cambridge.arm.com [217.140.96.140]) by uk-mta-1.uk.mimecast.lan; Tue, 28 Apr 2015 14:57:13 +0100 Received: from [10.2.207.65] ([10.1.2.79]) by cam-owa2.Emea.Arm.com with Microsoft SMTPSVC(6.0.3790.3959); Tue, 28 Apr 2015 14:57:13 +0100 Message-ID: <553F91B9.7050703@arm.com> Date: Tue, 28 Apr 2015 14:57:13 +0100 From: Alan Lawrence User-Agent: Thunderbird 2.0.0.24 (X11/20101213) MIME-Version: 1.0 To: "gcc-patches@gcc.gnu.org" Subject: Re: [PATCH] Remove some restrictions on loop shape in tree-if-conv.c References: <553F916C.30609@arm.com> In-Reply-To: <553F916C.30609@arm.com> X-MC-Unique: mDQzvxidQpm8DEYUtX-jVA-1 X-IsSubscribed: yes Alan Lawrence wrote: > Tree if-conversion currently bails out for loops that (a) contain nested loops; > (b) have more than one exit; (c) where the exit block (source of the exit edge) > does not dominate the loop latch; (d) where the exit block is the loop header, > or there are statements after the exit. > > This patch removes restrictions (c) and (d). The intuition is that, for (c), "if > (P) {... if (Q) break;}" is equivalent to "if (P) {...}; if (P&&Q) break;" and > this is mostly handled by existing code for propagating conditions. For (d), "if > (P) break; stmts" is equivalent to "if (!P) stmts; if (P) break;" - this > requires inserting the predicated stmts before the branch rather than after. > > Mostly thus this patch is just removing assumptions about when we do/don't need > to store predicates. One 'gotcha' was in some test cases the latch block passed > into if-conversion is non-empty; in such cases, if-conversion will now restore > "good form" by moving the statement into the exit block (predicated with > !exit-condition). > > The condition on dominance in add_to_predicate_list, I haven't quite managed to > convince myself is right; we _do_ want to store a predicate for the latch block > to handle the above case, but I'm not totally sure of the postdominance > condition - I think it may store conditions in cases where we don't really need > to (e.g. "for (;;) { ... if (P) { for (;;) ; } }" which might look nested but > isn't, and has no route to the function exit). However, storing conditions when > we don't need to, is OK, unlike failing to store when we do need to ;). > > A simple example of the patch at work: > > int > foo () > { > for (int i = 0; i < N ; i++) > { > int m = (a[i] & i) ? 5 : 4; > b[i] = a[i] * m; > } > } > > compiled at -O3, -fdump-tree-ivcanon shows this immediately before > tree-if-conversion: > > ...function entry, variables, etc... > : > _10 = a[0]; > goto ; > > : > _5 = a[i_9]; > _6 = _5 & i_9; > if (_6 != 0) > goto ; > else > goto ; > > : > > : > # m_14 = PHI <5(3), 4(4)> > > : > # m_2 = PHI > # _15 = PHI <_5(5), _10(2)> > # i_16 = PHI > # ivtmp_13 = PHI > _7 = m_2 * _15; > b[i_16] = _7; > i_9 = i_16 + 1; > ivtmp_3 = ivtmp_13 - 1; > if (ivtmp_3 != 0) > goto ; > else > goto ; > > which previously was not if-converted. With this patch: > > : > _10 = a[0]; > goto ; > > : > > : > # m_2 = PHI > # _15 = PHI <_5(3), _10(2)> > # i_16 = PHI > # ivtmp_13 = PHI > _7 = m_2 * _15; > b[i_16] = _7; > i_9 = i_16 + 1; > ivtmp_3 = ivtmp_13 - 1; > _5 = a[i_9]; > _6 = _5 & i_9; > m_14 = _6 != 0 ? 5 : 4; > if (ivtmp_3 != 0) > goto ; > else > goto ; > > : > return; > > (Unfortunately the vectorizer still doesn't handle this loop either, but that's > another issue/patch...) > > Bootstrapped + check-gcc on x86_64-unknown-linux-gnu and aarch64-none-linux-gnu. > Cross-tested check-gcc on aarch64-none-elf. > I'm investigating impact on benchmarks - on AArch64 Spec2k6, this touches a > number of object files, leading to an overall slight decrease in the number of > instructions, but no change that looks significant (specifically, no more or > less vectorization). > > Is this OK for trunk? > > Cheers, Alan Attached! --Alan diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c index 400ee01a5122c0c74b21a9b100f36b2951e45a46..6bd642e54848ed2faf72aa99987ab9f838634ca6 100644 --- a/gcc/tree-if-conv.c +++ b/gcc/tree-if-conv.c @@ -455,8 +455,8 @@ add_to_predicate_list (struct loop *loop, basic_block bb, tree nc) return; /* If dominance tells us this basic block is always executed, - don't record any predicates for it. */ - if (dominated_by_p (CDI_DOMINATORS, loop->latch, bb)) + i.e. it postdominates the header, don't record any predicates for it. */ + if (dominated_by_p (CDI_POST_DOMINATORS, loop->header, bb)) return; dom_bb = get_immediate_dominator (CDI_DOMINATORS, bb); @@ -1025,11 +1025,10 @@ has_pred_critical_p (basic_block bb) Last restriction is valid if aggressive_if_conv is false. - EXIT_BB is the basic block containing the exit of the LOOP. BB is - inside LOOP. */ + BB is inside LOOP. */ static bool -if_convertible_bb_p (struct loop *loop, basic_block bb, basic_block exit_bb) +if_convertible_bb_p (struct loop *loop, basic_block bb) { edge e; edge_iterator ei; @@ -1044,30 +1043,6 @@ if_convertible_bb_p (struct loop *loop, basic_block bb, basic_block exit_bb) && !aggressive_if_conv) return false; - if (exit_bb) - { - if (bb != loop->latch) - { - if (dump_file && (dump_flags & TDF_DETAILS)) - fprintf (dump_file, "basic block after exit bb but before latch\n"); - return false; - } - else if (!empty_block_p (bb)) - { - if (dump_file && (dump_flags & TDF_DETAILS)) - fprintf (dump_file, "non empty basic block after exit bb\n"); - return false; - } - else if (bb == loop->latch - && bb != exit_bb - && !dominated_by_p (CDI_DOMINATORS, bb, exit_bb)) - { - if (dump_file && (dump_flags & TDF_DETAILS)) - fprintf (dump_file, "latch is not dominated by exit_block\n"); - return false; - } - } - /* Be less adventurous and handle only normal edges. */ FOR_EACH_EDGE (e, ei, bb->succs) if (e->flags & (EDGE_EH | EDGE_ABNORMAL | EDGE_IRREDUCIBLE_LOOP)) @@ -1199,15 +1174,6 @@ predicate_bbs (loop_p loop) tree cond; gimple stmt; - /* The loop latch and loop exit block are always executed and - have no extra conditions to be processed: skip them. */ - if (bb == loop->latch - || bb_with_exit_edge_p (loop, bb)) - { - reset_bb_predicate (bb); - continue; - } - cond = bb_predicate (bb); stmt = last_stmt (bb); if (stmt && gimple_code (stmt) == GIMPLE_COND) @@ -1271,7 +1237,6 @@ if_convertible_loop_p_1 (struct loop *loop, { bool res; unsigned int i; - basic_block exit_bb = NULL; /* Don't if-convert the loop when the data dependences cannot be computed: the loop won't be vectorized in that case. */ @@ -1295,11 +1260,8 @@ if_convertible_loop_p_1 (struct loop *loop, { basic_block bb = ifc_bbs[i]; - if (!if_convertible_bb_p (loop, bb, exit_bb)) + if (!if_convertible_bb_p (loop, bb)) return false; - - if (bb_with_exit_edge_p (loop, bb)) - exit_bb = bb; } for (i = 0; i < loop->num_nodes; i++) @@ -1375,14 +1337,11 @@ if_convertible_loop_p_1 (struct loop *loop, - it is innermost, - it has two or more basic blocks, - it has only one exit, - - loop header is not the exit edge, - if its basic blocks and phi nodes are if convertible. */ static bool if_convertible_loop_p (struct loop *loop, bool *any_mask_load_store) { - edge e; - edge_iterator ei; bool res = false; vec refs; vec ddrs; @@ -1411,12 +1370,6 @@ if_convertible_loop_p (struct loop *loop, bool *any_mask_load_store) return false; } - /* If one of the loop header's edge is an exit edge then do not - apply if-conversion. */ - FOR_EACH_EDGE (e, ei, loop->header->succs) - if (loop_exit_edge_p (loop, e)) - return false; - refs.create (5); ddrs.create (25); auto_vec loop_nest; @@ -2249,7 +2202,6 @@ remove_conditions_and_labels (loop_p loop) static void combine_blocks (struct loop *loop, bool any_mask_load_store) { - basic_block bb, exit_bb, merge_target_bb; unsigned int orig_loop_num_nodes = loop->num_nodes; unsigned int i; edge e; @@ -2265,10 +2217,10 @@ combine_blocks (struct loop *loop, bool any_mask_load_store) /* Merge basic blocks: first remove all the edges in the loop, except for those from the exit block. */ - exit_bb = NULL; + basic_block exit_bb = NULL; for (i = 0; i < orig_loop_num_nodes; i++) { - bb = ifc_bbs[i]; + basic_block bb = ifc_bbs[i]; free_bb_predicate (bb); if (bb_with_exit_edge_p (loop, bb)) { @@ -2280,7 +2232,7 @@ combine_blocks (struct loop *loop, bool any_mask_load_store) for (i = 1; i < orig_loop_num_nodes; i++) { - bb = ifc_bbs[i]; + basic_block bb = ifc_bbs[i]; for (ei = ei_start (bb->preds); (e = ei_safe_edge (ei));) { @@ -2315,37 +2267,53 @@ combine_blocks (struct loop *loop, bool any_mask_load_store) set_immediate_dominator (CDI_DOMINATORS, loop->latch, loop->header); } - merge_target_bb = loop->header; + basic_block merge_target_bb = loop->header; + /* Indicates that any further statements must be inserted + _before_ the final jump. */ + bool seen_exit = (loop->header == exit_bb); for (i = 1; i < orig_loop_num_nodes; i++) { - gimple_stmt_iterator gsi; - gimple_stmt_iterator last; - - bb = ifc_bbs[i]; + basic_block bb = ifc_bbs[i]; - if (bb == exit_bb || bb == loop->latch) - continue; + if (bb == exit_bb) + { + /* If possible, merge loop header to the block with the exit edge. + This reduces the number of basic blocks to two, to please the + vectorizer that handles only loops with two nodes. */ + if (can_merge_blocks_p (loop->header, bb)) + merge_blocks (loop->header, bb); + else + { + /* Must leave stmts in this BB; move following stmts here too. */ + merge_target_bb = bb; + } + seen_exit = true; + continue; + } /* Make stmts member of loop->header. */ - for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi)) + for (gimple_stmt_iterator gsi = gsi_start_bb (bb); + !gsi_end_p (gsi); + gsi_next (&gsi)) gimple_set_bb (gsi_stmt (gsi), merge_target_bb); /* Update stmt list. */ - last = gsi_last_bb (merge_target_bb); - gsi_insert_seq_after (&last, bb_seq (bb), GSI_NEW_STMT); + gimple_stmt_iterator last = gsi_last_bb (merge_target_bb); + + if (seen_exit) + { + gcc_assert (is_ctrl_stmt (gsi_stmt (last))); + gsi_insert_seq_before (&last, bb_seq(bb), GSI_SAME_STMT); + } + else + gsi_insert_seq_after (&last, bb_seq (bb), GSI_NEW_STMT); + set_bb_seq (bb, NULL); - delete_basic_block (bb); + if (bb != loop->latch) + delete_basic_block (bb); } - /* If possible, merge loop header to the block with the exit edge. - This reduces the number of basic blocks to two, to please the - vectorizer that handles only loops with two nodes. */ - if (exit_bb - && exit_bb != loop->header - && can_merge_blocks_p (loop->header, exit_bb)) - merge_blocks (loop->header, exit_bb); - free (ifc_bbs); ifc_bbs = NULL; }