diff mbox series

[V2] tree-optimization: Add register pressure heuristics and appropriate use of profile data

Message ID e07b1b73-b64d-469a-8d45-3e0e6b967791@linux.ibm.com
State New
Headers show
Series [V2] tree-optimization: Add register pressure heuristics and appropriate use of profile data | expand

Commit Message

Ajit Agarwal Nov. 16, 2023, 9:33 a.m. UTC
Hello Richard:

This patch does decision making in code sinking considers the following decision.
High register pressure region is true if the following criteria
satisfied.

a) If liveout (early_bb) <= livein (early_bb).
b) if liveout (best_bb) <= liveout (early_bb).
c) !best_bb->count >= early_bb->count.

If above is true we decide to do code motion (code sinking).

Decision making doesn't include sinking threshold and multiplication
factor of 100.

Decisions as above is based on liveness analysis and profile
data.

consider a stmt a = b + c; where b and c die at the definition of a.

There are chances that live_out (best_bb) greater if for all
successors of best_bb there are more GEN (variables). If
live_out (best_bb) is less means there more KILL (Variables)
in successors of best_bb.

With below heuristics live_out (best_bb) > live_out (early_bb)
then we dont do code motion as there are chances of more
interfering live ranges.If liveout (best_bb) <= liveout (early_bb)
then we do code motion as there is there are more KILL(for all
successors of best_bb) and there is less chance of interfering
live ranges.

With moving down above stmt from early_bb to best_bb increases
live_out (early_bb) by one but live_out (best_bb) may be remains.
If live_out (early_bb) increase by 1 but if it becomes greater
than live_out (best_bb) then we dont do code motion if we have
more GEN (Variables) in best_bb otherewise its safer to do
code motion.

for above statement a = b + c dies b and c and generates a in
early_bb then liveout(early_bb) increases by 1. If before moving
if liveout (best_bb) is 10 and then liveout (early_bb) becomes > 10
then we dont do code motion otherwise we do code motion.

Bootstrapped and regtested on powerpc64-linux-gnu.

Thanks & Regards
Ajit


tree-optimization: Add register pressure heuristics and  appropriate use of profile data

Decision making in code sinking considers the following decision.
High register pressure region is true if the following criteria
satisfied.

a) If liveout (early_bb) <= livein (early_bb).
b) if liveout (best_bb) <= liveout (early_bb).
c) !best_bb->count >= early_bb->count.

If above is true we decide to do code motion (code sinking).

Decision making doesn't include sinking threshold and multiplication
factor of 100.

Decisions as above is based on liveness analysis and profile
data.

consider a stmt a = b + c; where b and c die at the definition of a.

There are chances that live_out (best_bb) greater if for all
successors of best_bb there are more GEN (variables). If
live_out (best_bb) is less means there more KILL (Variables)
in successors of best_bb.

With below heuristics live_out (best_bb) > live_out (early_bb)
then we dont do code motion as there are chances of more
interfering live ranges.If liveout (best_bb) <= liveout (early_bb)
then we do code motion as there is there are more KILL(for all
successors of best_bb) and there is less chance of interfering
live ranges.

With moving down above stmt from early_bb to best_bb increases
live_out (early_bb) by one but live_out (best_bb) may be remains.
If live_out (early_bb) increase by 1 but if it becomes greater
than live_out (best_bb) then we dont do code motion if we have
more GEN (Variables) in best_bb otherewise its safer to do
code motion.

for above statement a = b + c dies b and c and generates a in
early_bb then liveout(early_bb) increases by 1. If before moving
if liveout (best_bb) is 10 and then liveout (early_bb) becomes > 10
then we dont do code motion otherwise we do code motion.

2023-11-16  Ajit Kumar Agarwal  <aagarwa1@linux.ibm.com>

gcc/ChangeLog:

        * tree-ssa-sink.cc (statement_sink_location): Add tree_live_info_p
        as paramters.
        (sink_code_in_bb): Ditto.
        (select_best_block): Add register pressure heuristics to select
        the best blocks in the immediate dominator for same loop nest depth.
        (execute): Add live range analysis.
        (additional_var_map): New function.
        * tree-ssa-live.cc (set_var_live_on_entry): Add virtual operand
        tests on ssa_names.
        (verify_live_on_entry): Ditto.

gcc/testsuite/ChangeLog:

        * gcc.dg/tree-ssa/ssa-sink-21.c: New test.
        * gcc.dg/tree-ssa/ssa-sink-22.c: New test.
---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c |  15 +++
 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c |  19 ++++
 gcc/tree-ssa-live.cc                        |  11 +-
 gcc/tree-ssa-sink.cc                        | 120 +++++++++++++++-----
 4 files changed, 131 insertions(+), 34 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c
diff mbox series

Patch

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
new file mode 100644
index 00000000000..d3b79ca5803
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
@@ -0,0 +1,15 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-sink-stats" } */
+void bar();
+int j;
+void foo(int a, int b, int c, int d, int e, int f)
+{
+  int l;
+  l = a + b + c + d +e + f;
+  if (a != 5)
+    {
+      bar();
+      j = l;
+    }
+}
+/* { dg-final { scan-tree-dump {l_12\s+=\s+_4\s+\+\s+f_11\(D\);\n\s+bar\s+\(\)} sink1 } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c
new file mode 100644
index 00000000000..84e7938c54f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c
@@ -0,0 +1,19 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-sink-stats" } */
+void bar();
+int j, x;
+void foo(int a, int b, int c, int d, int e, int f)
+{
+  int l;
+  l = a + b + c + d +e + f;
+  if (a != 5)
+    {
+      bar();
+      if (b != 3)
+        x = 3;
+      else
+        x = 5;
+      j = l;
+    }
+}
+/* { dg-final { scan-tree-dump {l_13\s+=\s+_4\s+\+\s+f_12\(D\);\n\s+bar\s+\(\)} sink1 } } */
diff --git a/gcc/tree-ssa-live.cc b/gcc/tree-ssa-live.cc
index f06daf23035..998fe588278 100644
--- a/gcc/tree-ssa-live.cc
+++ b/gcc/tree-ssa-live.cc
@@ -1141,7 +1141,8 @@  set_var_live_on_entry (tree ssa_name, tree_live_info_p live)
     def_bb = ENTRY_BLOCK_PTR_FOR_FN (cfun);
 
   /* An undefined local variable does not need to be very alive.  */
-  if (ssa_undefined_value_p (ssa_name, false))
+  if (virtual_operand_p (ssa_name)
+      || ssa_undefined_value_p (ssa_name, false))
     return;
 
   /* Visit each use of SSA_NAME and if it isn't in the same block as the def,
@@ -1540,7 +1541,6 @@  debug (tree_live_info_d *ptr)
 
 
 /* Verify that the info in LIVE matches the current cfg.  */
-
 static void
 verify_live_on_entry (tree_live_info_p live)
 {
@@ -1569,11 +1569,13 @@  verify_live_on_entry (tree_live_info_p live)
 	  tree d = NULL_TREE;
 	  bitmap loe;
 	  var = partition_to_var (map, i);
+	  if (var == NULL_TREE)
+	    continue;
 	  stmt = SSA_NAME_DEF_STMT (var);
 	  tmp = gimple_bb (stmt);
+
 	  if (SSA_NAME_VAR (var))
 	    d = ssa_default_def (cfun, SSA_NAME_VAR (var));
-
 	  loe = live_on_entry (live, e->dest);
 	  if (loe && bitmap_bit_p (loe, i))
 	    {
@@ -1614,7 +1616,8 @@  verify_live_on_entry (tree_live_info_p live)
 	      {
 		/* An undefined local variable does not need to be very
 		   alive.  */
-		if (ssa_undefined_value_p (var, false))
+		if (virtual_operand_p (var)
+		    || ssa_undefined_value_p (var, false))
 		  continue;
 
 		/* The only way this var shouldn't be marked live on entry is
diff --git a/gcc/tree-ssa-sink.cc b/gcc/tree-ssa-sink.cc
index a360c5cdd6e..eca4a4dd8e9 100644
--- a/gcc/tree-ssa-sink.cc
+++ b/gcc/tree-ssa-sink.cc
@@ -176,6 +176,9 @@  nearest_common_dominator_of_uses (def_operand_p def_p, bool *debug_stmts)
    tree, return the best basic block between them (inclusive) to place
    statements.
 
+   The best basic block should be an immediate dominator of
+   best basic block if we've moved to same loop nest.
+
    We want the most control dependent block in the shallowest loop nest.
 
    If the resulting block is in a shallower loop nest, then use it.  Else
@@ -191,24 +194,23 @@  nearest_common_dominator_of_uses (def_operand_p def_p, bool *debug_stmts)
 static basic_block
 select_best_block (basic_block early_bb,
 		   basic_block late_bb,
-		   gimple *stmt)
+		   gimple *stmt,
+		   tree_live_info_p &live)
 {
   basic_block best_bb = late_bb;
   basic_block temp_bb = late_bb;
-  int threshold;
 
   while (temp_bb != early_bb)
     {
       /* If we've moved into a lower loop nest, then that becomes
 	 our best block.  */
-      if (bb_loop_depth (temp_bb) < bb_loop_depth (best_bb))
+      if (bb_loop_depth (temp_bb) <= bb_loop_depth (best_bb))
 	best_bb = temp_bb;
 
       /* Walk up the dominator tree, hopefully we'll find a shallower
  	 loop nest.  */
       temp_bb = get_immediate_dominator (CDI_DOMINATORS, temp_bb);
     }
-
   /* Placing a statement before a setjmp-like function would be invalid
      (it cannot be reevaluated when execution follows an abnormal edge).
      If we selected a block with abnormal predecessors, just punt.  */
@@ -233,24 +235,63 @@  select_best_block (basic_block early_bb,
       && !dominated_by_p (CDI_DOMINATORS, best_bb->loop_father->latch, best_bb))
     return early_bb;
 
-  /* Get the sinking threshold.  If the statement to be moved has memory
-     operands, then increase the threshold by 7% as those are even more
-     profitable to avoid, clamping at 100%.  */
-  threshold = param_sink_frequency_threshold;
-  if (gimple_vuse (stmt) || gimple_vdef (stmt))
-    {
-      threshold += 7;
-      if (threshold > 100)
-	threshold = 100;
-    }
+  int best_bb_liveout_cnt
+    = bitmap_count_bits (&live->liveout[best_bb->index]);
+  int early_bb_liveout_cnt
+    = bitmap_count_bits (&live->liveout[early_bb->index]);
+  int early_livein_cnt
+    = bitmap_count_bits (&live->livein[early_bb->index]);
+
+  /* consider a stmt a = b + c; where b and c die at the definition
+     of a.
+
+     There are chances that live_out (best_bb) greater if for all
+     successors of best_bb there are more GEN (variables). If
+     live_out (best_bb) is less means there more KILL (Variables)
+     in successors of best_bb.
+
+     With below heuristics live_out (best_bb) > live_out (early_bb)
+     then we dont do code motion as there are chances of more
+     interfering live ranges.If liveout (best_bb) <= liveout (early_bb)
+     then we do code motion as there is there are more KILL (for all
+     successors of best_bb) and there is less chance of interfering
+     live ranges.
+
+     With moving down above stmt from early_bb to best_bb increases
+     live_out (early_bb) by one but live_out (best_bb) may be remains.
+     If live_out (early_bb) increase by 1 but if it becomes greater
+     than live_out (best_bb) then we dont do code motion if we have
+     more GEN (Variables) in best_bb otherewise its safer to do
+     code motion.  */
+  bool live_in_rgn = (early_livein_cnt != 0
+		      && early_bb_liveout_cnt <= early_livein_cnt);
+
+  bool high_reg_pressure_rgn = false;
+
+  if (live_in_rgn)
+    high_reg_pressure_rgn
+      =	(best_bb_liveout_cnt <= early_bb_liveout_cnt);
+  else
+    high_reg_pressure_rgn
+      = (best_bb_liveout_cnt != 0
+	 && best_bb_liveout_cnt <= early_bb_liveout_cnt);
+
+  /* Add profile count along with register pressure.  */
+  high_reg_pressure_rgn
+    = (high_reg_pressure_rgn && !(best_bb->count >= early_bb->count));
 
   /* If BEST_BB is at the same nesting level, then require it to have
-     significantly lower execution frequency to avoid gratuitous movement.  */
+     significantly high register pressure region to avoid gratuitous
+      movement.  */
   if (bb_loop_depth (best_bb) == bb_loop_depth (early_bb)
-      /* If result of comparsion is unknown, prefer EARLY_BB.
-	 Thus use !(...>=..) rather than (...<...)  */
-      && !(best_bb->count * 100 >= early_bb->count * threshold))
-    return best_bb;
+     && high_reg_pressure_rgn)
+    {
+     /* Avoid sinking to immediate dominator if the statement to be moved
+	has memory operand and same loop nest.  */
+      if (best_bb != late_bb && gimple_vuse (stmt))
+	return late_bb;
+      return best_bb;
+    }
 
   /* No better block found, so return EARLY_BB, which happens to be the
      statement's original block.  */
@@ -265,7 +306,8 @@  select_best_block (basic_block early_bb,
 static bool
 statement_sink_location (gimple *stmt, basic_block frombb,
 			 gimple_stmt_iterator *togsi, bool *zero_uses_p,
-			 virtual_operand_live &vop_live)
+			 virtual_operand_live &vop_live,
+			 tree_live_info_p &live)
 {
   gimple *use;
   use_operand_p one_use = NULL_USE_OPERAND_P;
@@ -413,7 +455,7 @@  statement_sink_location (gimple *stmt, basic_block frombb,
       if (!dominated_by_p (CDI_DOMINATORS, commondom, frombb))
 	return false;
 
-      commondom = select_best_block (frombb, commondom, stmt);
+      commondom = select_best_block (frombb, commondom, stmt, live);
 
       if (commondom == frombb)
 	return false;	
@@ -430,19 +472,17 @@  statement_sink_location (gimple *stmt, basic_block frombb,
 	    continue;
 	  break;
 	}
+
       use = USE_STMT (one_use);
 
       if (gimple_code (use) != GIMPLE_PHI)
 	{
-	  sinkbb = select_best_block (frombb, gimple_bb (use), stmt);
+	  sinkbb = select_best_block (frombb, gimple_bb (use), stmt, live);
 
 	  if (sinkbb == frombb)
 	    return false;
 
-	  if (sinkbb == gimple_bb (use))
-	    *togsi = gsi_for_stmt (use);
-	  else
-	    *togsi = gsi_after_labels (sinkbb);
+	  *togsi = gsi_after_labels (sinkbb);
 
 	  return true;
 	}
@@ -454,7 +494,7 @@  statement_sink_location (gimple *stmt, basic_block frombb,
   if (!sinkbb)
     return false;
   
-  sinkbb = select_best_block (frombb, sinkbb, stmt);
+  sinkbb = select_best_block (frombb, sinkbb, stmt, live);
   if (!sinkbb || sinkbb == frombb)
     return false;
 
@@ -643,7 +683,8 @@  sink_common_stores_to_bb (basic_block bb)
 /* Perform code sinking on BB */
 
 static unsigned
-sink_code_in_bb (basic_block bb, virtual_operand_live &vop_live)
+sink_code_in_bb (basic_block bb, virtual_operand_live &vop_live,
+		tree_live_info_p &live)
 {
   gimple_stmt_iterator gsi;
   edge_iterator ei;
@@ -670,7 +711,8 @@  sink_code_in_bb (basic_block bb, virtual_operand_live &vop_live)
       gimple_stmt_iterator togsi;
       bool zero_uses_p;
 
-      if (!statement_sink_location (stmt, bb, &togsi, &zero_uses_p, vop_live))
+      if (!statement_sink_location (stmt, bb, &togsi, &zero_uses_p,
+				    vop_live, live))
 	{
 	  gimple_stmt_iterator saved = gsi;
 	  if (!gsi_end_p (gsi))
@@ -814,6 +856,15 @@  private:
   bool unsplit_edges;
 }; // class pass_sink_code
 
+void additional_var_map (var_map &map)
+{
+  map->bmp_bbs = BITMAP_ALLOC (NULL);
+  map->outofssa_p = false;
+  basic_block bb;
+  FOR_EACH_BB_FN (bb, cfun)
+    bitmap_set_bit (map->bmp_bbs, bb->index);
+}
+
 unsigned int
 pass_sink_code::execute (function *fun)
 {
@@ -827,12 +878,21 @@  pass_sink_code::execute (function *fun)
   calculate_dominance_info (CDI_DOMINATORS);
 
   virtual_operand_live vop_live;
+  var_map map = init_var_map (num_ssa_names);
+  additional_var_map (map);
+  tree_live_info_p live = calculate_live_ranges (map, true);
 
   int *rpo = XNEWVEC (int, n_basic_blocks_for_fn (cfun));
   int n = inverted_rev_post_order_compute (fun, rpo);
+
   for (int i = 0; i < n; ++i)
-    todo |= sink_code_in_bb (BASIC_BLOCK_FOR_FN (fun, rpo[i]), vop_live);
+    todo |= sink_code_in_bb (BASIC_BLOCK_FOR_FN (fun, rpo[i]), vop_live, live);
   free (rpo);
+  if (live)
+    delete_tree_live_info (live);
+
+  if (map)
+    delete_var_map (map);
 
   statistics_counter_event (fun, "Sunk statements", sink_stats.sunk);
   statistics_counter_event (fun, "Commoned stores", sink_stats.commoned);