Remove many restrictions for identifying jump threads across backedges

Message ID 528F2D5A.7030902@redhat.com
State New

Commit Message

Jeff Law Nov. 22, 2013, 10:09 a.m. UTC
It's amazing what 10 years away from a hunk of code will do...


So we've long had an arbitrary restriction in the jump thread 
identification code which made it very conservative when searching for 
threading opportunities once it encountered a back edge.

I've always known the problem was pollution of the tables with 
equivalences that are no longer valid once we cross the backedge.  But 
I'd never *really* looked at the problem.  I'd just had a sneaking 
suspicion the problems were related to the equivalences we record for 
the true/false arms of conditionals, PHIs, and maybe other lurkers.

So for example, if we have a PHI

x2 = PHI (x0 (bb2), x1 (bb3));
[ ... ]
x1 = <whatever>


where the incoming edge from bb3 is the backedge, we will record a 
temporary equivalence x2 = x1.  Once x1 gets reassigned, we can no 
longer use that equivalence.

So, once we have traversed a backedge, we record in one bitmap the PHI 
argument of any PHI node used to create an equivalence, and in a 
second bitmap the LHS of that PHI node.

If we then see an assignment to any name recorded in the PHI-argument 
bitmap, we iterate over the recorded LHS names to see if any of them 
is currently equivalent to the name that was just reassigned. 
Obviously, if such a situation is found, we eliminate the equivalence. 
This is probably somewhat imprecise, but it's good enough.
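
To make the invalidation scheme concrete, here is a standalone C model
of the two bitmaps (plain arrays stand in for GCC's bitmap and SSA
machinery; MAX_NAMES, record_phi_equiv and invalidate are invented for
this sketch, not names from the patch):

  #include <string.h>

  #define MAX_NAMES 64

  static int  equiv[MAX_NAMES];    /* equiv[d] = s: name d is equivalent
                                      to s; -1 means no equivalence.  */
  static char src_map[MAX_NAMES];  /* Names used as a PHI arg in some
                                      recorded equivalence.  */
  static char dst_map[MAX_NAMES];  /* Names that are the LHS of such
                                      a PHI.  */

  /* Record d == s for a PHI seen after crossing a backedge.  */
  static void
  record_phi_equiv (int d, int s)
  {
    equiv[d] = s;
    src_map[s] = 1;
    dst_map[d] = 1;
  }

  /* A statement just assigned a new value to LHS; kill any recorded
     equivalence that used LHS as its source.  */
  static void
  invalidate (int lhs)
  {
    if (!src_map[lhs])
      return;
    for (int d = 0; d < MAX_NAMES; d++)
      if (dst_map[d] && equiv[d] == lhs)
        equiv[d] = -1;      /* Analogous to recording a NULL_TREE.  */
  }

  int
  main (void)
  {
    memset (equiv, -1, sizeof equiv);
    record_phi_equiv (2, 1);  /* x2 = PHI (x0, x1): record x2 == x1.  */
    invalidate (1);           /* x1 = <whatever>: x2 == x1 is stale.  */
    return equiv[2] == -1 ? 0 : 1;
  }

The real patch keys both bitmaps by SSA_NAME_VERSION and walks the
destination map with EXECUTE_IF_SET_IN_BITMAP, but the invalidation
has the same shape.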

We also have problems with expressions in the hash tables.  These are 
pretty trivial to solve -- the hash tables are only used by the SIMPLIFY 
callback.  Thus, we can just use a dummy SIMPLIFY callback once we've 
passed a backedge in the CFG.
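
Mechanically that substitution is just a function-pointer swap.  A
minimal sketch (the callback type and the string-based signature are
made up for illustration; the real callbacks take gimple statements):

  #include <stddef.h>

  /* Simplify callback: return a simplified value, or NULL for
     "don't know".  */
  typedef const char *(*simplify_fn) (const char *stmt);

  /* Never simplifies anything, so the (possibly stale) expression
     hash tables behind the real callbacks can no longer influence
     the threader's decisions.  */
  static const char *
  dummy_simplify (const char *stmt)
  {
    (void) stmt;
    return NULL;
  }

  int
  main (void)
  {
    simplify_fn simplify = dummy_simplify;
    /* After a backedge the threader performs exactly this kind of
       swap, so every lookup fails from that point on.  */
    return simplify ("x_1 == 0") == NULL ? 0 : 1;
  }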

Finally, for assignments that set an SSA_NAME to a value that isn't 
particularly useful, we need to record *something* in the equivalence 
tables (a NULL_TREE is good).  That effectively invalidates any 
equivalences involving that name.  This is useful if we had used a 
conditional to derive a single-point range for an SSA_NAME, but the 
SSA_NAME's defining statement doesn't provide such range precision.
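
In the toy model above, that sentinel entry would look like this
(again a sketch; record_unknown is an invented name):

  /* LHS was assigned something we cannot simplify: shadow any earlier
     equivalence recorded for LHS itself, then kill equivalences that
     used LHS as their source.  */
  static void
  record_unknown (int lhs)
  {
    equiv[lhs] = -1;    /* The NULL_TREE entry.  */
    invalidate (lhs);
  }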

This patch has been bootstrapped and tested in various forms over the 
last couple of weeks.  However, I made some simplifications late 
tonight, and thus it needs to go through another spin.

Bootstrapped; regression testing is in progress.  Will commit in the AM if 
everything is clean and a final once-over doesn't cause something to 
jump out as wrong.

Onwards to stage3 bugfixing :-)

Jeff
* tree-ssa-threadedge.c (record_temporary_equivalence): Handle
	NULL for RHS, which we use to invalidate equivalences.
	(record_temporary_equivalences_from_phis): New bitmap arguments
	and a boolean indicating if we have passed a backedge.  If we
	have passed a backedge, then set the appropriate bit in the
	bitmaps for the SRC & DEST of PHIs creating equivalences.
	(invalidate_equivalences, dummy_simplify): New functions.
	(cond_arg_set_in_b): Remove.
	(record_temporary_equivalences_from_stmts_at_dest): New bitmap
	arguments and a boolean indicating if we have passed a backedge.
	If we have passed a backedge, then perform invalidations as
	needed.
	(thread_around_empty_blocks): If we have seen a backedge, then
	use the dummy simplify routine.
	(thread_through_normal_block): Likewise.  Pass bitmaps and
	backedge status to children.  Do not pessimize so much when
	traversing backedges in the CFG.
	(thread_across_edge): Manage the SRC_MAP/DST_MAP bitmaps.
	If we have seen a backedge, then use the dummy simplify routine.
	Do not pessimize so much when traversing backedges.

Comments

Steven Bosscher Nov. 22, 2013, 8:05 p.m. UTC | #1
On Fri, Nov 22, 2013 at 11:09 AM, Jeff Law wrote:
>
> So we've long had an arbitrary restriction in the jump thread identification
> code which made it very conservative when searching for threading
> opportunities once it encountered a back edge.

You're adding a pretty complex piece of code for something that's
been tricky in the past: threading around loops makes control and data
flow harder for later passes to handle.

Is there any code you have in mind for which removing the conservatism
is a win?  If not, then perhaps we should first test whether your
patch hurts the loop and vect opts' ability to perform their job.

Or is this prep work for the FSA patch?

Ciao!
Steven
Jeff Law Nov. 22, 2013, 8:27 p.m. UTC | #2
On 11/22/13 13:05, Steven Bosscher wrote:
> On Fri, Nov 22, 2013 at 11:09 AM, Jeff Law wrote:
>>
>> So we've long had an arbitrary restriction in the jump thread identification
>> code which made it very conservative when searching for threading
>> opportunities once it encountered a back edge.
>
> You're adding a pretty complex piece of code for something that's
> been tricky in the past: threading around loops makes control and data
> flow harder for later passes to handle.
>
> Is there any code you have in mind for which removing the conservatism
> is a win?  If not, then perhaps we should first test whether your
> patch hurts the loop and vect opts' ability to perform their job.
>
> Or is this prep work for the FSA patch?
It was the last piece in the FSA work.  It should dramatically help 
the coremark benchmark.


jeff

Patch

diff --git a/gcc/tree-ssa-threadedge.c b/gcc/tree-ssa-threadedge.c
index 7600d7b..ce161a8 100644
--- a/gcc/tree-ssa-threadedge.c
+++ b/gcc/tree-ssa-threadedge.c
@@ -170,7 +170,8 @@  record_temporary_equivalence (tree x, tree y, vec<tree> *stack)
 {
   tree prev_x = SSA_NAME_VALUE (x);
 
-  if (TREE_CODE (y) == SSA_NAME)
+  /* Y may be NULL if we are invalidating entries in the table.  */
+  if (y && TREE_CODE (y) == SSA_NAME)
     {
       tree tmp = SSA_NAME_VALUE (y);
       y = tmp ? tmp : y;
@@ -186,10 +187,16 @@  record_temporary_equivalence (tree x, tree y, vec<tree> *stack)
    edge E.  Record unwind information for the equivalences onto STACK.
 
    If a PHI which prevents threading is encountered, then return FALSE
-   indicating we should not thread this edge, else return TRUE.  */
+   indicating we should not thread this edge, else return TRUE. 
+
+   If SRC_MAP/DST_MAP exist, then mark the source and destination SSA_NAMEs
+   of any equivalences recorded.  We use this to make invalidation after
+   traversing back edges less painful.  */
 
 static bool
-record_temporary_equivalences_from_phis (edge e, vec<tree> *stack)
+record_temporary_equivalences_from_phis (edge e, vec<tree> *stack,
+					 bool backedge_seen,
+					 bitmap src_map, bitmap dst_map)
 {
   gimple_stmt_iterator gsi;
 
@@ -217,6 +224,14 @@  record_temporary_equivalences_from_phis (edge e, vec<tree> *stack)
 	stmt_count++;
 
       record_temporary_equivalence (dst, src, stack);
+
+      /* If we have crossed a backedge, then start recording equivalences
+	 we might need to invalidate.  */
+      if (backedge_seen && TREE_CODE (src) == SSA_NAME)
+	{
+	  bitmap_set_bit (src_map, SSA_NAME_VERSION (src));
+	  bitmap_set_bit (dst_map, SSA_NAME_VERSION (dst));
+	}
     }
   return true;
 }
@@ -271,6 +286,34 @@  fold_assignment_stmt (gimple stmt)
     }
 }
 
+/* A new value has been assigned to LHS.  If necessary, invalidate any
+   equivalences that are no longer valid.  */
+static void
+invalidate_equivalences (tree lhs, vec<tree> *stack,
+			 bitmap src_map, bitmap dst_map)
+{
+  /* SRC_MAP contains the source SSA_NAMEs for equivalences created by PHI
+     nodes.  If an entry in SRC_MAP changes, there's some destination that
+     has been recorded as equivalent to the source and that equivalency
+     needs to be eliminated.  */
+  if (bitmap_bit_p (src_map, SSA_NAME_VERSION (lhs)))
+    {
+      unsigned int i;
+      bitmap_iterator bi;
+
+      /* We know that the LHS of STMT was used as the RHS in an equivalency
+	 created by a PHI.  All the LHS of such PHIs were recorded into DST_MAP.
+	 So we can iterate over them to see if any have the LHS of STMT as
+	 an equivalence, and if so, remove the equivalence as it is no longer
+	 valid.  */
+      EXECUTE_IF_SET_IN_BITMAP (dst_map, 0, i, bi)
+	{
+	  if (SSA_NAME_VALUE (ssa_name (i)) == lhs)
+	    record_temporary_equivalence (ssa_name (i), NULL_TREE, stack);
+	}
+    }
+}
+
 /* Try to simplify each statement in E->dest, ultimately leading to
    a simplification of the COND_EXPR at the end of E->dest.
 
@@ -292,7 +335,10 @@  static gimple
 record_temporary_equivalences_from_stmts_at_dest (edge e,
 						  vec<tree> *stack,
 						  tree (*simplify) (gimple,
-								    gimple))
+								    gimple),
+						  bool backedge_seen,
+						  bitmap src_map,
+						  bitmap dst_map)
 {
   gimple stmt = NULL;
   gimple_stmt_iterator gsi;
@@ -369,7 +415,15 @@  record_temporary_equivalences_from_stmts_at_dest (edge e,
 	  if (fndecl
 	      && (DECL_FUNCTION_CODE (fndecl) == BUILT_IN_OBJECT_SIZE
 		  || DECL_FUNCTION_CODE (fndecl) == BUILT_IN_CONSTANT_P))
-	    continue;
+	    {
+	      if (backedge_seen)
+		{
+		  tree lhs = gimple_get_lhs (stmt);
+		  record_temporary_equivalence (lhs, NULL_TREE, stack);
+		  invalidate_equivalences (lhs, stack, src_map, dst_map);
+		}
+	      continue;
+	    }
 	}
 
       /* At this point we have a statement which assigns an RHS to an
@@ -433,15 +487,36 @@  record_temporary_equivalences_from_stmts_at_dest (edge e,
 	}
 
       /* Record the context sensitive equivalence if we were able
-	 to simplify this statement.  */
+	 to simplify this statement. 
+
+	 If we have traversed a backedge at some point during threading,
+	 then always enter something here.  Either a real equivalence, 
+	 or a NULL_TREE equivalence which is effectively invalidation of
+	 prior equivalences.  */
       if (cached_lhs
 	  && (TREE_CODE (cached_lhs) == SSA_NAME
 	      || is_gimple_min_invariant (cached_lhs)))
 	record_temporary_equivalence (gimple_get_lhs (stmt), cached_lhs, stack);
+      else if (backedge_seen)
+	record_temporary_equivalence (gimple_get_lhs (stmt), NULL_TREE, stack);
+
+      if (backedge_seen)
+	invalidate_equivalences (gimple_get_lhs (stmt), stack,
+				 src_map, dst_map);
     }
   return stmt;
 }
 
+/* Once we have passed a backedge in the CFG when threading, we do not want to
+   utilize edge equivalences for simplification purpose.  They are no longer
+   necessarily valid.  We use this callback rather than the ones provided by
+   DOM/VRP to achieve that effect.  */
+static tree
+dummy_simplify (gimple stmt1 ATTRIBUTE_UNUSED, gimple stmt2 ATTRIBUTE_UNUSED)
+{
+  return NULL_TREE;
+}
+
 /* Simplify the control statement at the end of the block E->dest.
 
    To avoid allocating memory unnecessarily, a scratch GIMPLE_COND
@@ -581,44 +656,6 @@  simplify_control_stmt_condition (edge e,
   return cached_lhs;
 }
 
-/* Return TRUE if the statement at the end of e->dest depends on
-   the output of any statement in BB.   Otherwise return FALSE.
-
-   This is used when we are threading a backedge and need to ensure
-   that temporary equivalences from BB do not affect the condition
-   in e->dest.  */
-
-static bool
-cond_arg_set_in_bb (edge e, basic_block bb)
-{
-  ssa_op_iter iter;
-  use_operand_p use_p;
-  gimple last = last_stmt (e->dest);
-
-  /* E->dest does not have to end with a control transferring
-     instruction.  This can occur when we try to extend a jump
-     threading opportunity deeper into the CFG.  In that case
-     it is safe for this check to return false.  */
-  if (!last)
-    return false;
-
-  if (gimple_code (last) != GIMPLE_COND
-      && gimple_code (last) != GIMPLE_GOTO
-      && gimple_code (last) != GIMPLE_SWITCH)
-    return false;
-
-  FOR_EACH_SSA_USE_OPERAND (use_p, last, iter, SSA_OP_USE | SSA_OP_VUSE)
-    {
-      tree use = USE_FROM_PTR (use_p);
-
-      if (TREE_CODE (use) == SSA_NAME
-	  && gimple_code (SSA_NAME_DEF_STMT (use)) != GIMPLE_PHI
-	  && gimple_bb (SSA_NAME_DEF_STMT (use)) == bb)
-	return true;
-    }
-  return false;
-}
-
 /* Copy debug stmts from DEST's chain of single predecessors up to
    SRC, so that we don't lose the bindings as PHI nodes are introduced
    when DEST gains new predecessors.  */
@@ -805,6 +842,8 @@  thread_around_empty_blocks (edge taken_edge,
 	      path->safe_push (x);
 	      bitmap_set_bit (visited, taken_edge->dest->index);
 	      *backedge_seen_p |= ((taken_edge->flags & EDGE_DFS_BACK) != 0);
+	      if (*backedge_seen_p)
+		simplify = dummy_simplify;
 	      return thread_around_empty_blocks (taken_edge,
 						 dummy_cond,
 						 handle_dominating_asserts,
@@ -827,6 +866,13 @@  thread_around_empty_blocks (edge taken_edge,
       && gimple_code (stmt) != GIMPLE_SWITCH)
     return false;
 
+  /* If we have traversed a backedge, then we do not want to look
+     at certain expressions in the table that can not be relied upon.
+     Luckily the only code that looked at those expressions is the
+     SIMPLIFY callback, which we replace if we can no longer use it.  */
+  if (*backedge_seen_p)
+    simplify = dummy_simplify;
+
   /* Extract and simplify the condition.  */
   cond = simplify_control_stmt_condition (taken_edge, stmt, dummy_cond,
 					  simplify, handle_dominating_asserts);
@@ -846,6 +892,8 @@  thread_around_empty_blocks (edge taken_edge,
 	= new jump_thread_edge (taken_edge, EDGE_NO_COPY_SRC_BLOCK);
       path->safe_push (x);
       *backedge_seen_p |= ((taken_edge->flags & EDGE_DFS_BACK) != 0);
+      if (*backedge_seen_p)
+	simplify = dummy_simplify;
 
       thread_around_empty_blocks (taken_edge,
 				  dummy_cond,
@@ -896,24 +944,28 @@  thread_through_normal_block (edge e,
 			     tree (*simplify) (gimple, gimple),
 			     vec<jump_thread_edge *> *path,
 			     bitmap visited,
-			     bool *backedge_seen_p)
+			     bool *backedge_seen_p,
+			     bitmap src_map,
+			     bitmap dst_map)
 {
-  /* If we have crossed a backedge, then we want to verify that the COND_EXPR,
-     SWITCH_EXPR or GOTO_EXPR at the end of e->dest is not affected
-     by any statements in e->dest.  If it is affected, then it is not
-     safe to thread this edge.  */
-  if (*backedge_seen_p
-      && cond_arg_set_in_bb (e, e->dest))
-    return false;
+  /* If we have traversed a backedge, then we do not want to look
+     at certain expressions in the table that can not be relied upon.
+     Luckily the only code that looked at those expressions is the
+     SIMPLIFY callback, which we replace if we can no longer use it.  */
+  if (*backedge_seen_p)
+    simplify = dummy_simplify;
 
   /* PHIs create temporary equivalences.  */
-  if (!record_temporary_equivalences_from_phis (e, stack))
+  if (!record_temporary_equivalences_from_phis (e, stack, *backedge_seen_p,
+						src_map, dst_map))
     return false;
 
   /* Now walk each statement recording any context sensitive
      temporary equivalences we can detect.  */
   gimple stmt
-    = record_temporary_equivalences_from_stmts_at_dest (e, stack, simplify);
+    = record_temporary_equivalences_from_stmts_at_dest (e, stack, simplify,
+							*backedge_seen_p,
+							src_map, dst_map);
   if (!stmt)
     return false;
 
@@ -955,25 +1007,24 @@  thread_through_normal_block (edge e,
 	    = new jump_thread_edge (taken_edge, EDGE_COPY_SRC_BLOCK);
 	  path->safe_push (x);
 	  *backedge_seen_p |= ((taken_edge->flags & EDGE_DFS_BACK) != 0);
+	  if (*backedge_seen_p)
+	    simplify = dummy_simplify;
 
 	  /* See if we can thread through DEST as well, this helps capture
 	     secondary effects of threading without having to re-run DOM or
-	     VRP.  */
-	  if (!*backedge_seen_p
-	      || ! cond_arg_set_in_bb (taken_edge, e->dest))
-	    {
-	      /* We don't want to thread back to a block we have already
- 		 visited.  This may be overly conservative.  */
-	      bitmap_set_bit (visited, dest->index);
-	      bitmap_set_bit (visited, e->dest->index);
-	      thread_around_empty_blocks (taken_edge,
-					  dummy_cond,
-					  handle_dominating_asserts,
-					  simplify,
-					  visited,
-					  path,
-					  backedge_seen_p);
-	    }
+	     VRP. 
+
+	     We don't want to thread back to a block we have already
+ 	     visited.  This may be overly conservative.  */
+	  bitmap_set_bit (visited, dest->index);
+	  bitmap_set_bit (visited, e->dest->index);
+	  thread_around_empty_blocks (taken_edge,
+				      dummy_cond,
+				      handle_dominating_asserts,
+				      simplify,
+				      visited,
+				      path,
+				      backedge_seen_p);
 	  return true;
 	}
     }
@@ -1015,6 +1066,8 @@  thread_across_edge (gimple dummy_cond,
 		    tree (*simplify) (gimple, gimple))
 {
   bitmap visited = BITMAP_ALLOC (NULL);
+  bitmap src_map = BITMAP_ALLOC (NULL);
+  bitmap dst_map = BITMAP_ALLOC (NULL);
   bool backedge_seen;
 
   stmt_count = 0;
@@ -1024,14 +1077,19 @@  thread_across_edge (gimple dummy_cond,
   bitmap_set_bit (visited, e->src->index);
   bitmap_set_bit (visited, e->dest->index);
   backedge_seen = ((e->flags & EDGE_DFS_BACK) != 0);
+  if (backedge_seen)
+    simplify = dummy_simplify;
+
   if (thread_through_normal_block (e, dummy_cond, handle_dominating_asserts,
 				   stack, simplify, path, visited,
-				   &backedge_seen))
+				   &backedge_seen, src_map, dst_map))
     {
       propagate_threaded_block_debug_into (path->last ()->e->dest,
 					   e->dest);
       remove_temporary_equivalences (stack);
       BITMAP_FREE (visited);
+      BITMAP_FREE (src_map);
+      BITMAP_FREE (dst_map);
       register_jump_thread (path);
       return;
     }
@@ -1066,15 +1124,26 @@  thread_across_edge (gimple dummy_cond,
 	{
 	  remove_temporary_equivalences (stack);
 	  BITMAP_FREE (visited);
+	  BITMAP_FREE (src_map);
+	  BITMAP_FREE (dst_map);
 	  return;
 	}
 
+    /* We need to restore the state of the maps to this point each loop
+       iteration.  */
+    bitmap src_map_copy = BITMAP_ALLOC (NULL);
+    bitmap dst_map_copy = BITMAP_ALLOC (NULL);
+    bitmap_copy (src_map_copy, src_map);
+    bitmap_copy (dst_map_copy, dst_map);
+
     /* Look at each successor of E->dest to see if we can thread through it.  */
     FOR_EACH_EDGE (taken_edge, ei, e->dest->succs)
       {
 	/* Push a fresh marker so we can unwind the equivalences created
 	   for each of E->dest's successors.  */
 	stack->safe_push (NULL_TREE);
+	bitmap_copy (src_map_copy, src_map);
+	bitmap_copy (dst_map_copy, dst_map);
      
 	/* Avoid threading to any block we have already visited.  */
 	bitmap_clear (visited);
@@ -1093,23 +1162,25 @@  thread_across_edge (gimple dummy_cond,
 	found = false;
 	backedge_seen = ((e->flags & EDGE_DFS_BACK) != 0);
 	backedge_seen |= ((taken_edge->flags & EDGE_DFS_BACK) != 0);
-	if (!backedge_seen
-	    || ! cond_arg_set_in_bb (path->last ()->e, e->dest))
-	  found = thread_around_empty_blocks (taken_edge,
-					      dummy_cond,
-					      handle_dominating_asserts,
-					      simplify,
-					      visited,
-					      path,
-					      &backedge_seen);
-
-	if (!found
-	    && (!backedge_seen
-		|| ! cond_arg_set_in_bb (path->last ()->e, e->dest)))
+	if (backedge_seen)
+	  simplify = dummy_simplify;
+	found = thread_around_empty_blocks (taken_edge,
+					    dummy_cond,
+					    handle_dominating_asserts,
+					    simplify,
+					    visited,
+					    path,
+					    &backedge_seen);
+
+	if (backedge_seen)
+	  simplify = dummy_simplify;
+
+	if (!found)
 	  found = thread_through_normal_block (path->last ()->e, dummy_cond,
 					       handle_dominating_asserts,
 					       stack, simplify, path, visited,
-					       &backedge_seen);
+					       &backedge_seen,
+					       src_map, dst_map);
 
 	/* If we were able to thread through a successor of E->dest, then
 	   record the jump threading opportunity.  */
@@ -1128,6 +1199,10 @@  thread_across_edge (gimple dummy_cond,
 	remove_temporary_equivalences (stack);
       }
     BITMAP_FREE (visited);
+    BITMAP_FREE (src_map);
+    BITMAP_FREE (dst_map);
+    BITMAP_FREE (src_map_copy);
+    BITMAP_FREE (dst_map_copy);
   }
 
   remove_temporary_equivalences (stack);