diff mbox

0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch

Message ID 4C225BA4.9070603@codesourcery.com
State New
Headers show

Commit Message

Maxim Kuvyrkov June 23, 2010, 7:08 p.m. UTC
On 6/22/10 4:24 PM, Maxim Kuvyrkov wrote:
> On 6/22/10 12:18 AM, Jeff Law wrote:
>> On 06/21/10 13:28, Steven Bosscher wrote:
>>>
>>> I experimented with a patch similar to Maxim's already 2.5 years ago
>>> (and offered to work on it for CS, but there was no interest in this
>>> work at the time :-/) See these three Bugzilla comments:
>>>
>>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33828#c2
>> Right. This is precisely the problem with using immediate dominators.
>> This doesn't argue that Maxim's approach is wrong or bad for compile
>> time performance or anything like that. It merely raises the same issue.
>
> I agree with Steven that the search is better be constrained, possibly,
> with a large enough constant. I've added a new parameter and a
> dominance.c function to return dominated blocks up to depth N in the
> dominator tree (with N==1 being immediate dominators and N==0 being all
> dominators).

The attached patch adds max-hoist-depth parameter to control depth of 
descend in dominator tree.  The default value of 30 should be enough for 
most practical purposes.

>
> Does this sound OK?
>
> ...
>>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33828#c13
>>>
>>> Note especially the pessimization in comment #13 of PR33828.
>>> Therefore I maintain my objection to this patch.
>> Clearly you don't want to hoist any higher than the lowest common
>> dominator. Otherwise you unreasonably lengthen lifetimes. Maxim will
>> need to address this problem as well.
>
> This can be addressed with a walk over the dominator tree after we
> compute VBEout. Start with the root and descend in the tree keeping a
> bitset of expressions that should be alive up the tree. If current node
>
> 1. has a single successor,
> 2. has i'th expression set in VBEout,
> 3. the successor has i'th expression set in VBEout,
> 4. current node doesn't generate i'th expression,
> 5. i'th expression is not marked in the bitset as required up the tree,
>
> than we can hoist i'th expression in the successor with the same result
> as in the current node and not unnecessarily extend live ranges. There
> maybe a couple more details to the above, but the problem should be
> easily fixable.

This is implemented as cleanup_code_hoist_vbeout() function.  The 
solution it produces is OK from correctness point of view (it removes 
bits from VBEout), but, please, *check my reasoning* to make sure it 
doesn't remove from VBEout expressions it shouldn't.

VBEout for the testcase in PR33828 is now just as expected:

hoisting vbeinout computation: 2 passes
vbeout(2): n_bits = 5, set = {1 3 }
vbeout(3): n_bits = 5, set = {1 3 }
vbeout(4): n_bits = 5, set = {1 3 }
vbeout(5): n_bits = 5, set = {0 1 2 3 }
vbeout(6): n_bits = 5, set = {}
hoisting vbeout cleanup pass
vbeout(2): n_bits = 5, set = {}
vbeout(3): n_bits = 5, set = {}
vbeout(4): n_bits = 5, set = {1 3 }
vbeout(5): n_bits = 5, set = {}
vbeout(6): n_bits = 5, set = {}

With cleaned up vbeout the pass hoists occurences from bb5 and bb6 to 
bb4 instead of [unnecessarily far] to bb2:

PRE/HOIST: end of bb 4, insn 47, copying expression 1 to reg 146

Cleaning up vbeout also makes for less mechanic work to be done in 
hoist_code speeding up the pass.

Any comments?  OK to check in?

Comments

Paolo Bonzini June 23, 2010, 7:20 p.m. UTC | #1
On 06/23/2010 09:08 PM, Maxim Kuvyrkov wrote:
> On 6/22/10 4:24 PM, Maxim Kuvyrkov wrote:
>> On 6/22/10 12:18 AM, Jeff Law wrote:
>>> On 06/21/10 13:28, Steven Bosscher wrote:
>>>>
>>>> I experimented with a patch similar to Maxim's already 2.5 years ago
>>>> (and offered to work on it for CS, but there was no interest in this
>>>> work at the time :-/) See these three Bugzilla comments:
>>>>
>>>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33828#c2
>>> Right. This is precisely the problem with using immediate dominators.
>>> This doesn't argue that Maxim's approach is wrong or bad for compile
>>> time performance or anything like that. It merely raises the same issue.
>>
>> I agree with Steven that the search is better be constrained, possibly,
>> with a large enough constant. I've added a new parameter and a
>> dominance.c function to return dominated blocks up to depth N in the
>> dominator tree (with N==1 being immediate dominators and N==0 being all
>> dominators).
>
> The attached patch adds max-hoist-depth parameter to control depth of
> descend in dominator tree. The default value of 30 should be enough for
> most practical purposes.

30 seems like "infinite" for most practical cases.  Have you measured 
the code size impact on CSiBE with say 1, 5, 10, 20, 30?

Paolo
Maxim Kuvyrkov June 23, 2010, 7:52 p.m. UTC | #2
On 6/23/10 11:20 PM, Paolo Bonzini wrote:
> On 06/23/2010 09:08 PM, Maxim Kuvyrkov wrote:
>> On 6/22/10 4:24 PM, Maxim Kuvyrkov wrote:
>>> On 6/22/10 12:18 AM, Jeff Law wrote:
>>>> On 06/21/10 13:28, Steven Bosscher wrote:
>>>>>
>>>>> I experimented with a patch similar to Maxim's already 2.5 years ago
>>>>> (and offered to work on it for CS, but there was no interest in this
>>>>> work at the time :-/) See these three Bugzilla comments:
>>>>>
>>>>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33828#c2
>>>> Right. This is precisely the problem with using immediate dominators.
>>>> This doesn't argue that Maxim's approach is wrong or bad for compile
>>>> time performance or anything like that. It merely raises the same
>>>> issue.
>>>
>>> I agree with Steven that the search is better be constrained, possibly,
>>> with a large enough constant. I've added a new parameter and a
>>> dominance.c function to return dominated blocks up to depth N in the
>>> dominator tree (with N==1 being immediate dominators and N==0 being all
>>> dominators).
>>
>> The attached patch adds max-hoist-depth parameter to control depth of
>> descend in dominator tree. The default value of 30 should be enough for
>> most practical purposes.
>
> 30 seems like "infinite" for most practical cases.

Exactly.  The purpose of the parameter is to avoid quadratic behavior on 
weird CFGs.  For normal graphs code hoisting should traverse the whole 
structure.

The problem of excessive code hoisting that arises when looking for 
expressions in the whole dominator subtree [instead of just immediately 
dominated blocks] will be addressed in another patch I'm about to post. 
  The the GCSE-complex-constants thread.

Thanks,
Steven Bosscher June 23, 2010, 8:49 p.m. UTC | #3
On Wed, Jun 23, 2010 at 9:52 PM, Maxim Kuvyrkov <maxim@codesourcery.com> wrote:
>>> The attached patch adds max-hoist-depth parameter to control depth of
>>> descend in dominator tree. The default value of 30 should be enough for
>>> most practical purposes.
>>
>> 30 seems like "infinite" for most practical cases.
>
> Exactly.  The purpose of the parameter is to avoid quadratic behavior on
> weird CFGs.  For normal graphs code hoisting should traverse the whole
> structure.
>
> The problem of excessive code hoisting that arises when looking for
> expressions in the whole dominator subtree [instead of just immediately
> dominated blocks] will be addressed in another patch I'm about to post.  The
> the GCSE-complex-constants thread.

Actually the parameter also makes a difference for code size. When I
experimented with this all this time ago, I had the best CSiBE size
scores with a depth of 5 or 6.

Ciao!
Steven
Maxim Kuvyrkov June 23, 2010, 8:59 p.m. UTC | #4
On 6/24/10 12:49 AM, Steven Bosscher wrote:
...
> Actually the parameter also makes a difference for code size. When I
> experimented with this all this time ago, I had the best CSiBE size
> scores with a depth of 5 or 6.

Yes, I got similar results when tested this patch without max_distance 
restriction on expressions.  Depth of 5 or 6 avoids nasty regressions in 
code with high register pressure; with a greater depth all small 
expressions get hoisted to the first basic block and register pressure 
sky-rockets.

However, it is still better to address this problem by restricting 
individual expressions with max_distance.  Using depth of the search 
seems like too big a hammer.

Regards,
diff mbox

Patch

diff --git a/gcc/basic-block.h b/gcc/basic-block.h
index 135c0c2..1bf192d 100644
--- a/gcc/basic-block.h
+++ b/gcc/basic-block.h
@@ -854,6 +854,8 @@  extern VEC (basic_block, heap) *get_dominated_by (enum cdi_direction, basic_bloc
 extern VEC (basic_block, heap) *get_dominated_by_region (enum cdi_direction,
 							 basic_block *,
 							 unsigned);
+extern VEC (basic_block, heap) *get_dominated_to_depth (enum cdi_direction,
+							basic_block, int);
 extern VEC (basic_block, heap) *get_all_dominated_blocks (enum cdi_direction,
 							  basic_block);
 extern void add_to_dominance_info (enum cdi_direction, basic_block);
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 9e517e9..05ebcf0 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -8195,6 +8195,12 @@  when @option{-ftree-vectorize} is used.  The number of iterations after
 vectorization needs to be greater than the value specified by this option
 to allow vectorization.  The default value is 0.
 
+@item max-hoist-depth
+The depth of search in the dominator tree for expressions to hoist.
+This is used to avoid quadratic behavior in hoisting algorithm.
+The value of 0 will avoid limiting the search, but may slow down compilation
+of huge functions.  The default value is 30.
+
 @item max-unrolled-insns
 The maximum number of instructions that a loop should have if that loop
 is unrolled, and if the loop is unrolled, it determines how many times
diff --git a/gcc/dominance.c b/gcc/dominance.c
index 9c2dcf0..7861439 100644
--- a/gcc/dominance.c
+++ b/gcc/dominance.c
@@ -783,16 +783,20 @@  get_dominated_by_region (enum cdi_direction dir, basic_block *region,
 }
 
 /* Returns the list of basic blocks including BB dominated by BB, in the
-   direction DIR.  The vector will be sorted in preorder.  */
+   direction DIR up to DEPTH in the dominator tree.  The DEPTH of zero will
+   produce a vector containing all dominated blocks.  The vector will be sorted
+   in preorder.  */
 
 VEC (basic_block, heap) *
-get_all_dominated_blocks (enum cdi_direction dir, basic_block bb)
+get_dominated_to_depth (enum cdi_direction dir, basic_block bb, int depth)
 {
   VEC(basic_block, heap) *bbs = NULL;
   unsigned i;
+  unsigned next_level_start;
 
   i = 0;
   VEC_safe_push (basic_block, heap, bbs, bb);
+  next_level_start = 1; /* = VEC_length (basic_block, bbs); */
 
   do
     {
@@ -803,12 +807,24 @@  get_all_dominated_blocks (enum cdi_direction dir, basic_block bb)
 	   son;
 	   son = next_dom_son (dir, son))
 	VEC_safe_push (basic_block, heap, bbs, son);
+
+      if (i == next_level_start && --depth)
+	next_level_start = VEC_length (basic_block, bbs);
     }
-  while (i < VEC_length (basic_block, bbs));
+  while (i < next_level_start);
 
   return bbs;
 }
 
+/* Returns the list of basic blocks including BB dominated by BB, in the
+   direction DIR.  The vector will be sorted in preorder.  */
+
+VEC (basic_block, heap) *
+get_all_dominated_blocks (enum cdi_direction dir, basic_block bb)
+{
+  return get_dominated_to_depth (dir, bb, 0);
+}
+
 /* Redirect all edges pointing to BB to TO.  */
 void
 redirect_immediate_dominators (enum cdi_direction dir, basic_block bb,
diff --git a/gcc/gcse.c b/gcc/gcse.c
index 3af1a01..e323db1 100644
--- a/gcc/gcse.c
+++ b/gcc/gcse.c
@@ -4145,6 +4145,57 @@  free_code_hoist_mem (void)
   free_dominance_info (CDI_DOMINATORS);
 }
 
+/* Cleanup VBEout of BB and its successors in the dominator tree.  */
+
+static void
+cleanup_code_hoist_vbeout (basic_block bb)
+{
+  basic_block son;
+  bool first_p = true;
+
+  /* We follow two rules to clean up VBEout[BB]:
+
+     1. If BB does not have any dominated blocks, nothing will ever be hoisted
+     to BB, so we can just wipe its VBEout clean.
+
+     2. If an expression can be hoisted both to BB and to a *single* successor
+     of BB in the dominator tree, then there is no point of hoisting
+     the expression to BB over BB's successor.  Doing otherwise would
+     unnecessarily extend live ranges.  One exception to this rule is when
+     an expression is computed in BB and available at BB's end, so we need
+     to subtract comp[bb] from the set of expressions that are present in
+     only one of the dominated blocks.  */
+
+  for (son = first_dom_son (CDI_DOMINATORS, bb);
+       son != NULL;
+       son = next_dom_son (CDI_DOMINATORS, son))
+    {
+      cleanup_code_hoist_vbeout (son);
+
+      if (first_p)
+	{
+	  sbitmap_copy (hoist_vbein[bb->index], hoist_vbeout[son->index]);
+	  first_p = false;
+	}
+      else
+	sbitmap_difference (hoist_vbein[bb->index],
+			    hoist_vbein[bb->index], hoist_vbeout[son->index]);
+    }
+
+  if (!first_p)
+    {
+      sbitmap_difference (hoist_vbein[bb->index],
+			  hoist_vbein[bb->index], comp[bb->index]);
+
+      if (sbitmap_any_common_bits (hoist_vbeout[bb->index],
+				   hoist_vbein[bb->index]))
+	sbitmap_difference (hoist_vbeout[bb->index],
+			    hoist_vbeout[bb->index], hoist_vbein[bb->index]);
+    }
+  else
+    sbitmap_zero (hoist_vbeout[bb->index]);
+}
+
 /* Compute the very busy expressions at entry/exit from each block.
 
    An expression is very busy if all paths from a given point
@@ -4203,6 +4254,19 @@  compute_code_hoist_vbeinout (void)
 	  dump_sbitmap_file (dump_file, hoist_vbeout[bb->index]);
 	}
     }
+
+  cleanup_code_hoist_vbeout (ENTRY_BLOCK_PTR->next_bb);
+
+  if (dump_file)
+    {
+      fprintf (dump_file, "hoisting vbeout cleanup pass\n");
+
+      FOR_EACH_BB (bb)
+        {
+	  fprintf (dump_file, "vbeout(%d): ", bb->index);
+	  dump_sbitmap_file (dump_file, hoist_vbeout[bb->index]);
+	}
+    }
 }
 
 /* Top level routine to do the dataflow analysis needed by code hoisting.  */
@@ -4212,8 +4276,8 @@  compute_code_hoist_data (void)
 {
   compute_local_properties (transp, comp, antloc, &expr_hash_table);
   compute_transpout ();
-  compute_code_hoist_vbeinout ();
   calculate_dominance_info (CDI_DOMINATORS);
+  compute_code_hoist_vbeinout ();
   if (dump_file)
     fprintf (dump_file, "\n");
 }
@@ -4306,7 +4370,8 @@  hoist_code (void)
       int found = 0;
       int insn_inserted_p;
 
-      domby = get_dominated_by (CDI_DOMINATORS, bb);
+      domby = get_dominated_to_depth (CDI_DOMINATORS, bb, MAX_HOIST_DEPTH);
+
       /* Examine each expression that is very busy at the exit of this
 	 block.  These are the potentially hoistable expressions.  */
       for (i = 0; i < hoist_vbeout[bb->index]->n_bits; i++)
@@ -4397,7 +4462,11 @@  hoist_code (void)
 		     it would be safe to compute it at the start of the
 		     dominated block.  Now we have to determine if the
 		     expression would reach the dominated block if it was
-		     placed at the end of BB.  */
+		     placed at the end of BB.
+		     Note: the fact that hoist_exprs has i-th bit set means
+		     that /some/, not necesserilly all, occurences from
+		     the dominated blocks can be hoisted to BB.  Here we check
+		     if a specific occurence can be hoisted to BB.  */
 		  if (hoist_expr_reaches_here_p (bb, i, dominated, NULL))
 		    {
 		      struct expr *expr = index_map[i];
@@ -4410,6 +4479,12 @@  hoist_code (void)
 			occr = occr->next;
 
 		      gcc_assert (occr);
+
+		      /* An occurence might've been already deleted
+			 while processing a dominator of BB.  */
+		      if (occr->deleted_p)
+			continue;
+
 		      insn = occr->insn;
 		      set = single_set (insn);
 		      gcc_assert (set);
diff --git a/gcc/params.def b/gcc/params.def
index 35650ff..f08d482 100644
--- a/gcc/params.def
+++ b/gcc/params.def
@@ -219,6 +219,14 @@  DEFPARAM(PARAM_GCSE_AFTER_RELOAD_CRITICAL_FRACTION,
 	"gcse-after-reload-critical-fraction",
 	"The threshold ratio of critical edges execution count that permit performing redundancy elimination after reload",
         10, 0, 0)
+/* How deep from a given basic block the dominator tree should be searched
+   for expressions to hoist to the block.  The value of 0 will avoid limiting
+   the search.  */
+DEFPARAM(PARAM_MAX_HOIST_DEPTH,
+	 "max-hoist-depth",
+	 "Maximum depth of search in the dominator tree for expressions to hoist",
+	 30, 0, 0)
+
 /* This parameter limits the number of insns in a loop that will be unrolled,
    and by how much the loop is unrolled.
 
diff --git a/gcc/params.h b/gcc/params.h
index 833fc3b..c0404ca 100644
--- a/gcc/params.h
+++ b/gcc/params.h
@@ -125,6 +125,8 @@  typedef enum compiler_param
   PARAM_VALUE (PARAM_GCSE_AFTER_RELOAD_PARTIAL_FRACTION)
 #define GCSE_AFTER_RELOAD_CRITICAL_FRACTION \
   PARAM_VALUE (PARAM_GCSE_AFTER_RELOAD_CRITICAL_FRACTION)
+#define MAX_HOIST_DEPTH \
+  PARAM_VALUE (PARAM_MAX_HOIST_DEPTH)
 #define MAX_UNROLLED_INSNS \
   PARAM_VALUE (PARAM_MAX_UNROLLED_INSNS)
 #define MAX_SMS_LOOP_NUMBER \