diff mbox

[PR,tree-optimization/69347] Speedup DOM slightly

Message ID 56A28F36.9000807@redhat.com
State New
Headers show

Commit Message

Jeff Law Jan. 22, 2016, 8:21 p.m. UTC
So as noted in BZ69347 we have regressed a bit in the amount of time 
spent in DOM for included testcase.  My previous patch helped a little, 
but there's still some low hanging fruit.

One of the things that was added during this cycle was the ability to do 
a bit of secondary equivalence discovery.  ie, we discover an 
equivalence due to an edge traversal -- we can, in limited 
circumstances, back propagate the known value backwards and discover 
additional equivalences.

That code turns out to be surprisingly expensive.  After a fair amount 
poking with perf & gprof a few things stick out.

While dominance testing is relatively cheap, when done often enough it 
gets expensive.

We can do a bit better.  Essentially we're walking over a set of 
immediate uses and seeing which of those uses dominate a given block.

Instead we can take the given block and compute its dominators into a 
bitmap.  We then check if the immediate use blocks are in the bitmap.

That turns out to be considerably faster, I believe that's because the 
immediate uses are typically clustered, thus there's a single element in 
the bitmap.  Testing is two memory loads and a memory bit test.

Contrast that to 9 memory loads for dominated_by_p if I'm counting 
correctly.

That cuts the amount of time spent in DOM in half for the 69347 testcase.

The second change is in cprop_into_successor_phis and was found as I was 
wandering the perf report.  Essentially SSA_NAME_VALUE will always be 
null, an SSA_NAME or a constant (min_invariant).  Thus the test that 
it's an SSA_NAME || is_gimple_min_invariant is totally useless.

Bootstrapped and regression tested on x86.  Also verified that for my 
bucket of .i files, there was no change in the resulting code.

Given the major issues are resolved, I'm moving this to a P4.  There's 
still a compile-time regression that's not accounted for, but it's 
relatively small and I suspect it's related to the computed goto used 
for a dispatch table -- which is certainly supported, but it's not the 
typical code pushed through GCC.  The big regression left is intentional 
as we've intentionally upped the parameter for when to avoid gcse.

Installed on the trunk.

Jeff
commit c3b7df35046db818c4c4b7d808b5a3471a0ee9b9
Author: Jeff Law <law@redhat.com>
Date:   Fri Jan 22 13:17:54 2016 -0700

    	PR middle-end/69347
    	* tree-ssa-dom.c (back_propagate_equivalences): Factored out of
    	record_temporary_equivalences.  Rewritten to avoid unnecessary calls
    	into dominated_by_p.
    	(cprop_into_successor_phis): Avoid unnecessary tests.
diff mbox

Patch

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 85cde94..73df84f 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,11 @@ 
+2016-01-21  Jeff Law  <law@redhat.com>
+
+	PR middle-end/69347
+	* tree-ssa-dom.c (back_propagate_equivalences): Factored out of
+	record_temporary_equivalences.  Rewritten to avoid unnecessary calls
+	into dominated_by_p.
+	(cprop_into_successor_phis): Avoid unnecessary tests.
+
 2016-01-22  Richard Henderson  <rth@redhat.com>
 
 	PR target/69416
diff --git a/gcc/tree-ssa-dom.c b/gcc/tree-ssa-dom.c
index 84c9a6a..b690d92 100644
--- a/gcc/tree-ssa-dom.c
+++ b/gcc/tree-ssa-dom.c
@@ -819,6 +819,74 @@  dom_valueize (tree t)
   return t;
 }
 
+/* We have just found an equivalence for LHS on an edge E.
+   Look backwards to other uses of LHS and see if we can derive
+   additional equivalences that are valid on edge E.  */
+static void
+back_propagate_equivalences (tree lhs, edge e,
+			     class const_and_copies *const_and_copies)
+{
+  use_operand_p use_p;
+  imm_use_iterator iter;
+  bitmap domby = NULL;
+  basic_block dest = e->dest;
+
+  /* Iterate over the uses of LHS to see if any dominate E->dest.
+     If so, they may create useful equivalences too.
+
+     ???  If the code gets re-organized to a worklist to catch more
+     indirect opportunities and it is made to handle PHIs then this
+     should only consider use_stmts in basic-blocks we have already visited.  */
+  FOR_EACH_IMM_USE_FAST (use_p, iter, lhs)
+    {
+      gimple *use_stmt = USE_STMT (use_p);
+
+      /* Often the use is in DEST, which we trivially know we can't use.
+	 This is cheaper than the dominator set tests below.  */
+      if (dest == gimple_bb (use_stmt))
+	continue;
+
+      /* Filter out statements that can never produce a useful
+	 equivalence.  */
+      tree lhs2 = gimple_get_lhs (use_stmt);
+      if (!lhs2 || TREE_CODE (lhs2) != SSA_NAME)
+	continue;
+
+      /* Profiling has shown the domination tests here can be fairly
+	 expensive.  We get significant improvements by building the
+	 set of blocks that dominate BB.  We can then just test
+	 for set membership below.
+
+	 We also initialize the set lazily since often the only uses
+	 are going to be in the same block as DEST.  */
+      if (!domby)
+	{
+	  domby = BITMAP_ALLOC (NULL);
+	  basic_block bb = get_immediate_dominator (CDI_DOMINATORS, dest);
+	  while (bb)
+	    {
+	      bitmap_set_bit (domby, bb->index);
+	      bb = get_immediate_dominator (CDI_DOMINATORS, bb);
+	    }
+	}
+
+      /* This tests if USE_STMT does not dominate DEST.  */
+      if (!bitmap_bit_p (domby, gimple_bb (use_stmt)->index))
+	continue;
+
+      /* At this point USE_STMT dominates DEST and may result in a
+	 useful equivalence.  Try to simplify its RHS to a constant
+	 or SSA_NAME.  */
+      tree res = gimple_fold_stmt_to_constant_1 (use_stmt, dom_valueize,
+						 no_follow_ssa_edges);
+      if (res && (TREE_CODE (res) == SSA_NAME || is_gimple_min_invariant (res)))
+	record_equality (lhs2, res, const_and_copies);
+    }
+
+  if (domby)
+    BITMAP_FREE (domby);
+}
+
 /* Record into CONST_AND_COPIES and AVAIL_EXPRS_STACK any equivalences implied
    by traversing edge E (which are cached in E->aux).
 
@@ -836,19 +904,23 @@  record_temporary_equivalences (edge e,
   if (edge_info)
     {
       cond_equivalence *eq;
+      /* If we have 0 = COND or 1 = COND equivalences, record them
+	 into our expression hash tables.  */
+      for (i = 0; edge_info->cond_equivalences.iterate (i, &eq); ++i)
+	record_cond (eq, avail_exprs_stack);
+
       tree lhs = edge_info->lhs;
-      tree rhs = edge_info->rhs;
+      if (!lhs || TREE_CODE (lhs) != SSA_NAME)
+	return;
 
-      /* If we have a simple NAME = VALUE equivalence, record it.  */
-      if (lhs)
-	record_equality (lhs, rhs, const_and_copies);
+      /* Record the simple NAME = VALUE equivalence.  */
+      tree rhs = edge_info->rhs;
+      record_equality (lhs, rhs, const_and_copies);
 
       /* If LHS is an SSA_NAME and RHS is a constant integer and LHS was
 	 set via a widening type conversion, then we may be able to record
 	 additional equivalences.  */
-      if (lhs
-	  && TREE_CODE (lhs) == SSA_NAME
-	  && TREE_CODE (rhs) == INTEGER_CST)
+      if (TREE_CODE (rhs) == INTEGER_CST)
 	{
 	  gimple *defstmt = SSA_NAME_DEF_STMT (lhs);
 
@@ -875,45 +947,10 @@  record_temporary_equivalences (edge e,
 	    }
 	}
 
-      /* If LHS is an SSA_NAME with a new equivalency then try if
-         stmts with uses of that LHS that dominate the edge destination
-	 simplify and allow further equivalences to be recorded.  */
-      if (lhs && TREE_CODE (lhs) == SSA_NAME)
-	{
-	  use_operand_p use_p;
-	  imm_use_iterator iter;
-	  FOR_EACH_IMM_USE_FAST (use_p, iter, lhs)
-	    {
-	      gimple *use_stmt = USE_STMT (use_p);
-
-	      /* Only bother to record more equivalences for lhs that
-	         can be directly used by e->dest.
-		 ???  If the code gets re-organized to a worklist to
-		 catch more indirect opportunities and it is made to
-		 handle PHIs then this should only consider use_stmts
-		 in basic-blocks we have already visited.  */
-	      if (e->dest == gimple_bb (use_stmt)
-		  || !dominated_by_p (CDI_DOMINATORS,
-				      e->dest, gimple_bb (use_stmt)))
-		continue;
-	      tree lhs2 = gimple_get_lhs (use_stmt);
-	      if (lhs2 && TREE_CODE (lhs2) == SSA_NAME)
-		{
-		  tree res
-		    = gimple_fold_stmt_to_constant_1 (use_stmt, dom_valueize,
-						      no_follow_ssa_edges);
-		  if (res
-		      && (TREE_CODE (res) == SSA_NAME
-			  || is_gimple_min_invariant (res)))
-		    record_equality (lhs2, res, const_and_copies);
-		}
-	    }
-	}
-
-      /* If we have 0 = COND or 1 = COND equivalences, record them
-	 into our expression hash tables.  */
-      for (i = 0; edge_info->cond_equivalences.iterate (i, &eq); ++i)
-	record_cond (eq, avail_exprs_stack);
+      /* Any equivalence found for LHS may result in additional
+	 equivalences for other uses of LHS that we have already
+	 processed.  */
+      back_propagate_equivalences (lhs, e, const_and_copies);
     }
 }
 
@@ -1321,8 +1358,6 @@  cprop_into_successor_phis (basic_block bb,
 	  new_val = SSA_NAME_VALUE (orig_val);
 	  if (new_val
 	      && new_val != orig_val
-	      && (TREE_CODE (new_val) == SSA_NAME
-		  || is_gimple_min_invariant (new_val))
 	      && may_propagate_copy (orig_val, new_val))
 	    propagate_value (orig_p, new_val);
 	}