From patchwork Tue Apr 28 13:57:13 2015
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Alan Lawrence <alan.lawrence@arm.com>
X-Patchwork-Id: 465561
Return-Path: 
 <gcc-patches-return-396207-incoming=patchwork.ozlabs.org@gcc.gnu.org>
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Received: from sourceware.org (server1.sourceware.org [209.132.180.131])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256
	bits)) (No client certificate requested)
	by ozlabs.org (Postfix) with ESMTPS id 964FC140079
	for <incoming@patchwork.ozlabs.org>;
	Tue, 28 Apr 2015 23:57:29 +1000 (AEST)
Authentication-Results: ozlabs.org; dkim=pass
	reason="1024-bit key; unprotected key"
	header.d=gcc.gnu.org header.i=@gcc.gnu.org header.b=OQ32Uf5K;
	dkim-adsp=none (unprotected policy); dkim-atps=neutral
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id
	:list-unsubscribe:list-archive:list-post:list-help:sender
	:message-id:date:from:mime-version:to:subject:references
	:in-reply-to:content-type; q=dns; s=default; b=KNkLIQt1CgwuCLjUE
	qy/cJGleXfPEylRm+MLGyjqnVYjJpq4neriAjrOmWLTYg7+CDm4jmzUtof/NFfa7
	89z2Tm7cwFHLMtyOOBfBrGNdz4bpvjcyRUdyblNJDE64ULlAwg0NlcrSFrXflrbh
	bxSF1kSn8WMbzaqM9Jnubrou1k=
DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id
	:list-unsubscribe:list-archive:list-post:list-help:sender
	:message-id:date:from:mime-version:to:subject:references
	:in-reply-to:content-type; s=default; bh=3icYo6in4F2CWQC3J/p1lIg
	xt/g=; b=OQ32Uf5KVeLqSgVXM+ECcntXVFU1ZtgW5uio93Uy22HLvoo/Jh6CuEs
	oSUL2ShQLCdgImPSvrByoDurTsyOyA5lRs0mdpjhASO9+l2tnPKkcH8hBMAtKb8e
	9rnrIe/xLHFw6raVLT9a7g4AvUwMuWWPMCMTAuszX8FQPXjd0zZU=
Received: (qmail 114313 invoked by alias); 28 Apr 2015 13:57:20 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Unsubscribe: 
 <mailto:gcc-patches-unsubscribe-incoming=patchwork.ozlabs.org@gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Delivered-To: mailing list gcc-patches@gcc.gnu.org
Received: (qmail 114274 invoked by uid 89); 28 Apr 2015 13:57:19 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-1.8 required=5.0 tests=AWL, BAYES_00,
	SPF_PASS autolearn=ham version=3.3.2
X-HELO: eu-smtp-delivery-143.mimecast.com
Received: from eu-smtp-delivery-143.mimecast.com (HELO
	eu-smtp-delivery-143.mimecast.com) (146.101.78.143) by
	sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP;
	Tue, 28 Apr 2015 13:57:17 +0000
Received: from cam-owa2.Emea.Arm.com (fw-tnat.cambridge.arm.com
	[217.140.96.140]) by uk-mta-1.uk.mimecast.lan;
	Tue, 28 Apr 2015 14:57:13 +0100
Received: from [10.2.207.65] ([10.1.2.79]) by cam-owa2.Emea.Arm.com with
	Microsoft SMTPSVC(6.0.3790.3959); Tue, 28 Apr 2015 14:57:13 +0100
Message-ID: <553F91B9.7050703@arm.com>
Date: Tue, 28 Apr 2015 14:57:13 +0100
From: Alan Lawrence <alan.lawrence@arm.com>
User-Agent: Thunderbird 2.0.0.24 (X11/20101213)
MIME-Version: 1.0
To: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH] Remove some restrictions on loop shape in tree-if-conv.c
References: <553F916C.30609@arm.com>
In-Reply-To: <553F916C.30609@arm.com>
X-MC-Unique: mDQzvxidQpm8DEYUtX-jVA-1
X-IsSubscribed: yes

Alan Lawrence wrote:
> Tree if-conversion currently bails out for loops that (a) contain nested loops; 
> (b) have more than one exit; (c) where the exit block (source of the exit edge) 
> does not dominate the loop latch; (d) where the exit block is the loop header, 
> or there are statements after the exit.
> 
> This patch removes restrictions (c) and (d). The intuition is that, for (c), "if 
> (P) {... if (Q) break;}" is equivalent to "if (P) {...}; if (P&&Q) break;" and 
> this is mostly handled by existing code for propagating conditions. For (d), "if 
> (P) break; stmts" is equivalent to "if (!P) stmts; if (P) break;" - this 
> requires inserting the predicated stmts before the branch rather than after.
> 
> Mostly thus this patch is just removing assumptions about when we do/don't need 
> to store predicates. One 'gotcha' was in some test cases the latch block passed 
> into if-conversion is non-empty; in such cases, if-conversion will now restore 
> "good form" by moving the statement into the exit block (predicated with 
> !exit-condition).
> 
> The condition on dominance in add_to_predicate_list, I haven't quite managed to 
> convince myself is right; we _do_ want to store a predicate for the latch block 
> to handle the above case, but I'm not totally sure of the postdominance 
> condition - I think it may store conditions in cases where we don't really need 
> to (e.g. "for (;;) { ... if (P) { for (;;) ; } }" which might look nested but 
> isn't, and has no route to the function exit). However, storing conditions when 
> we don't need to, is OK, unlike failing to store when we do need to ;).
> 
> A simple example of the patch at work:
> 
> int
> foo ()
> {
>    for (int i = 0; i < N ; i++)
>    {
>      int m = (a[i] & i) ? 5 : 4;
>      b[i] = a[i] * m;
>    }
> }
> 
> compiled at -O3, -fdump-tree-ivcanon shows this immediately before 
> tree-if-conversion:
> 
> ...function entry, variables, etc...
>    <bb 2>:
>    _10 = a[0];
>    goto <bb 6>;
> 
>    <bb 3>:
>    _5 = a[i_9];
>    _6 = _5 & i_9;
>    if (_6 != 0)
>      goto <bb 5>;
>    else
>      goto <bb 4>;
> 
>    <bb 4>:
> 
>    <bb 5>:
>    # m_14 = PHI <5(3), 4(4)>
> 
>    <bb 6>:
>    # m_2 = PHI <m_14(5), 4(2)>
>    # _15 = PHI <_5(5), _10(2)>
>    # i_16 = PHI <i_9(5), 0(2)>
>    # ivtmp_13 = PHI <ivtmp_3(5), 32(2)>
>    _7 = m_2 * _15;
>    b[i_16] = _7;
>    i_9 = i_16 + 1;
>    ivtmp_3 = ivtmp_13 - 1;
>    if (ivtmp_3 != 0)
>      goto <bb 3>;
>    else
>      goto <bb 7>;
> 
> which previously was not if-converted. With this patch:
> 
>    <bb 2>:
>    _10 = a[0];
>    goto <bb 4>;
> 
>    <bb 3>:
> 
>    <bb 4>:
>    # m_2 = PHI <m_14(3), 4(2)>
>    # _15 = PHI <_5(3), _10(2)>
>    # i_16 = PHI <i_9(3), 0(2)>
>    # ivtmp_13 = PHI <ivtmp_3(3), 32(2)>
>    _7 = m_2 * _15;
>    b[i_16] = _7;
>    i_9 = i_16 + 1;
>    ivtmp_3 = ivtmp_13 - 1;
>    _5 = a[i_9];
>    _6 = _5 & i_9;
>    m_14 = _6 != 0 ? 5 : 4;
>    if (ivtmp_3 != 0)
>      goto <bb 3>;
>    else
>      goto <bb 5>;
> 
>    <bb 5>:
>    return;
> 
> (Unfortunately the vectorizer still doesn't handle this loop either, but that's 
> another issue/patch...)
> 
> Bootstrapped + check-gcc on x86_64-unknown-linux-gnu and aarch64-none-linux-gnu.
> Cross-tested check-gcc on aarch64-none-elf.
> I'm investigating impact on benchmarks - on AArch64 Spec2k6, this touches a 
> number of object files, leading to an overall slight decrease in the number of 
> instructions, but no change that looks significant (specifically, no more or 
> less vectorization).
> 
> Is this OK for trunk?
> 
> Cheers, Alan

Attached!

--Alan

diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
index 400ee01a5122c0c74b21a9b100f36b2951e45a46..6bd642e54848ed2faf72aa99987ab9f838634ca6 100644
--- a/gcc/tree-if-conv.c
+++ b/gcc/tree-if-conv.c
@@ -455,8 +455,8 @@ add_to_predicate_list (struct loop *loop, basic_block bb, tree nc)
     return;
 
   /* If dominance tells us this basic block is always executed,
-     don't record any predicates for it.  */
-  if (dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
+     i.e. it postdominates the header, don't record any predicates for it.  */
+  if (dominated_by_p (CDI_POST_DOMINATORS, loop->header, bb))
     return;
 
   dom_bb = get_immediate_dominator (CDI_DOMINATORS, bb);
@@ -1025,11 +1025,10 @@ has_pred_critical_p (basic_block bb)
 
    Last restriction is valid if aggressive_if_conv is false.
 
-   EXIT_BB is the basic block containing the exit of the LOOP.  BB is
-   inside LOOP.  */
+   BB is inside LOOP.  */
 
 static bool
-if_convertible_bb_p (struct loop *loop, basic_block bb, basic_block exit_bb)
+if_convertible_bb_p (struct loop *loop, basic_block bb)
 {
   edge e;
   edge_iterator ei;
@@ -1044,30 +1043,6 @@ if_convertible_bb_p (struct loop *loop, basic_block bb, basic_block exit_bb)
       && !aggressive_if_conv)
     return false;
 
-  if (exit_bb)
-    {
-      if (bb != loop->latch)
-	{
-	  if (dump_file && (dump_flags & TDF_DETAILS))
-	    fprintf (dump_file, "basic block after exit bb but before latch\n");
-	  return false;
-	}
-      else if (!empty_block_p (bb))
-	{
-	  if (dump_file && (dump_flags & TDF_DETAILS))
-	    fprintf (dump_file, "non empty basic block after exit bb\n");
-	  return false;
-	}
-      else if (bb == loop->latch
-	       && bb != exit_bb
-	       && !dominated_by_p (CDI_DOMINATORS, bb, exit_bb))
-	  {
-	    if (dump_file && (dump_flags & TDF_DETAILS))
-	      fprintf (dump_file, "latch is not dominated by exit_block\n");
-	    return false;
-	  }
-    }
-
   /* Be less adventurous and handle only normal edges.  */
   FOR_EACH_EDGE (e, ei, bb->succs)
     if (e->flags & (EDGE_EH | EDGE_ABNORMAL | EDGE_IRREDUCIBLE_LOOP))
@@ -1199,15 +1174,6 @@ predicate_bbs (loop_p loop)
       tree cond;
       gimple stmt;
 
-      /* The loop latch and loop exit block are always executed and
-	 have no extra conditions to be processed: skip them.  */
-      if (bb == loop->latch
-	  || bb_with_exit_edge_p (loop, bb))
-	{
-	  reset_bb_predicate (bb);
-	  continue;
-	}
-
       cond = bb_predicate (bb);
       stmt = last_stmt (bb);
       if (stmt && gimple_code (stmt) == GIMPLE_COND)
@@ -1271,7 +1237,6 @@ if_convertible_loop_p_1 (struct loop *loop,
 {
   bool res;
   unsigned int i;
-  basic_block exit_bb = NULL;
 
   /* Don't if-convert the loop when the data dependences cannot be
      computed: the loop won't be vectorized in that case.  */
@@ -1295,11 +1260,8 @@ if_convertible_loop_p_1 (struct loop *loop,
     {
       basic_block bb = ifc_bbs[i];
 
-      if (!if_convertible_bb_p (loop, bb, exit_bb))
+      if (!if_convertible_bb_p (loop, bb))
 	return false;
-
-      if (bb_with_exit_edge_p (loop, bb))
-	exit_bb = bb;
     }
 
   for (i = 0; i < loop->num_nodes; i++)
@@ -1375,14 +1337,11 @@ if_convertible_loop_p_1 (struct loop *loop,
    - it is innermost,
    - it has two or more basic blocks,
    - it has only one exit,
-   - loop header is not the exit edge,
    - if its basic blocks and phi nodes are if convertible.  */
 
 static bool
 if_convertible_loop_p (struct loop *loop, bool *any_mask_load_store)
 {
-  edge e;
-  edge_iterator ei;
   bool res = false;
   vec<data_reference_p> refs;
   vec<ddr_p> ddrs;
@@ -1411,12 +1370,6 @@ if_convertible_loop_p (struct loop *loop, bool *any_mask_load_store)
       return false;
     }
 
-  /* If one of the loop header's edge is an exit edge then do not
-     apply if-conversion.  */
-  FOR_EACH_EDGE (e, ei, loop->header->succs)
-    if (loop_exit_edge_p (loop, e))
-      return false;
-
   refs.create (5);
   ddrs.create (25);
   auto_vec<loop_p, 3> loop_nest;
@@ -2249,7 +2202,6 @@ remove_conditions_and_labels (loop_p loop)
 static void
 combine_blocks (struct loop *loop, bool any_mask_load_store)
 {
-  basic_block bb, exit_bb, merge_target_bb;
   unsigned int orig_loop_num_nodes = loop->num_nodes;
   unsigned int i;
   edge e;
@@ -2265,10 +2217,10 @@ combine_blocks (struct loop *loop, bool any_mask_load_store)
 
   /* Merge basic blocks: first remove all the edges in the loop,
      except for those from the exit block.  */
-  exit_bb = NULL;
+  basic_block exit_bb = NULL;
   for (i = 0; i < orig_loop_num_nodes; i++)
     {
-      bb = ifc_bbs[i];
+      basic_block bb = ifc_bbs[i];
       free_bb_predicate (bb);
       if (bb_with_exit_edge_p (loop, bb))
 	{
@@ -2280,7 +2232,7 @@ combine_blocks (struct loop *loop, bool any_mask_load_store)
 
   for (i = 1; i < orig_loop_num_nodes; i++)
     {
-      bb = ifc_bbs[i];
+      basic_block bb = ifc_bbs[i];
 
       for (ei = ei_start (bb->preds); (e = ei_safe_edge (ei));)
 	{
@@ -2315,37 +2267,53 @@ combine_blocks (struct loop *loop, bool any_mask_load_store)
       set_immediate_dominator (CDI_DOMINATORS, loop->latch, loop->header);
     }
 
-  merge_target_bb = loop->header;
+  basic_block merge_target_bb = loop->header;
+  /* Indicates that any further statements must be inserted
+     _before_ the final jump.  */
+  bool seen_exit = (loop->header == exit_bb);
   for (i = 1; i < orig_loop_num_nodes; i++)
     {
-      gimple_stmt_iterator gsi;
-      gimple_stmt_iterator last;
-
-      bb = ifc_bbs[i];
+      basic_block bb = ifc_bbs[i];
 
-      if (bb == exit_bb || bb == loop->latch)
-	continue;
+      if (bb == exit_bb)
+	{
+	  /* If possible, merge loop header to the block with the exit edge.
+	     This reduces the number of basic blocks to two, to please the
+	     vectorizer that handles only loops with two nodes.  */
+	  if (can_merge_blocks_p (loop->header, bb))
+	    merge_blocks (loop->header, bb);
+	  else
+	    {
+	      /* Must leave stmts in this BB; move following stmts here too.  */
+	      merge_target_bb = bb;
+	    }
+	  seen_exit = true;
+	  continue;
+	}
 
       /* Make stmts member of loop->header.  */
-      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+      for (gimple_stmt_iterator gsi = gsi_start_bb (bb);
+	   !gsi_end_p (gsi);
+	   gsi_next (&gsi))
 	gimple_set_bb (gsi_stmt (gsi), merge_target_bb);
 
       /* Update stmt list.  */
-      last = gsi_last_bb (merge_target_bb);
-      gsi_insert_seq_after (&last, bb_seq (bb), GSI_NEW_STMT);
+      gimple_stmt_iterator last = gsi_last_bb (merge_target_bb);
+
+      if (seen_exit)
+	{
+	  gcc_assert (is_ctrl_stmt (gsi_stmt (last)));
+	  gsi_insert_seq_before (&last, bb_seq(bb), GSI_SAME_STMT);
+	}
+      else
+	  gsi_insert_seq_after (&last, bb_seq (bb), GSI_NEW_STMT);
+
       set_bb_seq (bb, NULL);
 
-      delete_basic_block (bb);
+      if (bb != loop->latch)
+	delete_basic_block (bb);
     }
 
-  /* If possible, merge loop header to the block with the exit edge.
-     This reduces the number of basic blocks to two, to please the
-     vectorizer that handles only loops with two nodes.  */
-  if (exit_bb
-      && exit_bb != loop->header
-      && can_merge_blocks_p (loop->header, exit_bb))
-    merge_blocks (loop->header, exit_bb);
-
   free (ifc_bbs);
   ifc_bbs = NULL;
 }