diff mbox

Sanitize block partitioning under -freorder-blocks-and-partition

Message ID CAAe5K+UhCzfwm_WomE1yk+ET1tBiNT5svfn_LAc57MUSqJbnaQ@mail.gmail.com
State New
Headers show

Commit Message

Teresa Johnson Aug. 27, 2013, 6:04 p.m. UTC
On Mon, Aug 19, 2013 at 10:47 AM, Teresa Johnson <tejohnson@google.com> wrote:
> On Mon, Aug 19, 2013 at 8:09 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
>>> Remember it isn't using dominance anymore. The latest patch was
>>> instead ensuring the most frequent path between hot blocks and the
>>> entry/exit are marked hot. That should be better than the dominance
>>> approach used in the earlier version.
>>
>> Indeed, that looks more resonable approach.
>> Can you point me to the last version of patch? Last one I remember still
>> walked dominators...
>
> I've included the latest patch below. I still use dominators in the
> post-cfg-optimization fixup (fixup_partitions), but not in the
> partition sanitizing done during the partitioning itself
> (sanitize_hot_paths). The former is looking for hot bbs newly
> dominated by cold bbs after cfg transformations.
>
>>>
>>> > We can commit it and work on better solution incrementally but it will
>>> > probably mean replacing it later.  If you think it makes things easier
>>> > to work on it incrementally, I think the patch is OK.
>>>
>>> Yes, I think this is a big step forward from what is there now for
>>> splitting, which does the splitting purely based on bb count in
>>> isolation. I don't have a much better solution in mind yet.
>>>

Ping on this patch. Honza, did the latest version I sent last week
look ok to you? I've included below a new version that adds the
partitioning insanity warnings we discussed (emitted to the dump only
since as I noted there are various optimization passes that provoke
this due to not fixing up profile data).

(Honza - I am also going to move the discussion we started in this
thread on the COMDAT missing profile handling patch to a different
thread. Should have an update on that shortly.)

Latest patch below retested with bootstrap and regression testing on
x86-64-unknown-linux-gnu, and with a profiledbootstrap and
-freorder-blocks-and-partition forced on. Ok for trunk?

Thanks,
Teresa

2013-08-27  Teresa Johnson  <tejohnson@google.com>
            Steven Bosscher  <steven@gcc.gnu.org>

        * cfgrtl.c (fixup_new_cold_bb): New routine.
        (commit_edge_insertions): Invoke fixup_partitions.
        (find_partition_fixes): New routine.
        (fixup_partitions): Ditto.
        (verify_hot_cold_block_grouping): Update comments.
        (rtl_verify_edges): Invoke find_partition_fixes.
        (rtl_verify_bb_pointers): Update comments.
        (rtl_verify_bb_layout): Ditto.
        * basic-block.h (probably_never_executed_edge_p): Declare.
        (fixup_partitions): Ditto.
        * cfgcleanup.c (try_optimize_cfg): Invoke fixup_partitions.
        * bb-reorder.c (sanitize_hot_paths): New function.
        (find_rarely_executed_basic_blocks_and_crossing_edges): Invoke
        sanitize_hot_paths.
        * predict.c (probably_never_executed_edge_p): New routine.
        * cfg.c (check_bb_profile): Add partition insanity warnings.

Comments

Jan Hubicka Aug. 28, 2013, 4:58 p.m. UTC | #1
Hi,
with Martin we did bit of progress on analyzing the problems.  We now have
COMDAT profile merging for FDO and we also noticed that forks can make your
basic blocks appear never executed even though they are executed every run:
the fork is accounted as 3 independnet runs of the program.  First run
is until fork, the second 2 runs are both variant.

I have patch to track this.  Moreover vforks seems to produce repeated
merging of results.

These two factors seems to help to riddle enought the firefox profiles
so it took us weeks to understand what happens.
> +         if (changed)
> +            {
> +              /* Edge forwarding in particular can cause hot blocks previously
> +                 reached by both hot and cold blocks to become dominated only
> +                 by cold blocks. This will cause the verification
> below to fail,
> +                 and lead to now cold code in the hot section. This is not easy
> +                 to detect and fix during edge forwarding, and in some cases
> +                 is only visible after newly unreachable blocks are deleted,
> +                 which will be done in fixup_partitions.  */
> +              fixup_partitions ();

Is it really necessary to run this from internal loop of the cfgcleanup?  It seems
you will play back and forth game where edge forwarding will remove your fallthru
and you will be re-adding it?

I would wait for cfgcleanup to finish its job (I don't really think it needs to be
iterative) and then do the fixup possibly cleaning up when the blocks was repoisitoined
(I suppose it is only case where the code above introduces new cfgcleanup oppurtunities).

> +      /* Walk the preds/succs and check if there is at least one already
> +         marked hot. Keep track of the most frequent pred/succ so that we
> +         can mark it hot if we don't find one.  */
> +      FOR_EACH_EDGE (e, ei, edges)
> +        {
> +          basic_block reach_bb = walk_up ? e->src : e->dest;
> +
> +          if (e->flags & EDGE_DFS_BACK)
> +            continue;
> +
> +          if (BB_PARTITION (reach_bb) != BB_COLD_PARTITION)
> +          {
> +            found = true;
> +            break;
> +          }
> +          if (e->probability > highest_probability)
> +            highest_probability = e->probability;

When doing predecestor walk if you have two predecestors, one executing 100000
times and getting to the block with probability 1%, you want to chose it over
block executing once and getting to you with probability 100%.

You probably want to look for most likely predecestor here.  You need to look
for highest e->count and if they are all 0 for highest EDGE_FREQUENCY and for
maximal probability only if all fails?

> +        }
> +
> +      /* If bb is reached by (or reaches, in the case of !WALK_UP) another hot
> +         block (or unpartitioned, e.g. the entry block) then it is ok. If not,
> +         then the most frequent pred (or succ) needs to be adjusted.  In the
> +         case where multiple preds/succs have the same probability (e.g. a
> +         50-50 branch), then both will be adjusted.  */
> +      if (found)
> +        continue;
> +
> +      FOR_EACH_EDGE (e, ei, edges)
> +        {
> +          if (e->flags & EDGE_DFS_BACK)
> +            continue;
> +          if (e->probability < highest_probability)
> +            continue;

Again for predecestor walk you need to wind down the bit crazy logic described above.
> Index: predict.c
> ===================================================================
> --- predict.c   (revision 202021)
> +++ predict.c   (working copy)
> @@ -241,6 +241,22 @@ probably_never_executed_bb_p (struct function *fun
>    return false;
>  }
> 
> +
> +/* Return true in case edge E is probably never executed.  */
> +
> +bool
> +probably_never_executed_edge_p (struct function *fun, edge e)
> +{
> +  gcc_checking_assert (fun);
> +  if (profile_info && flag_branch_probabilities)
> +    return ((e->count + profile_info->runs / 2) / profile_info->runs) == 0;
> +  if ((!profile_info || !flag_branch_probabilities)
> +      && (cgraph_get_node (fun->decl)->frequency
> +         == NODE_FREQUENCY_UNLIKELY_EXECUTED))
> +    return true;
> +  return false;
Instead of duplicating the conditional, break out the tests into
probably_never_executed_count_p, like we have for maybe_hot_count_p.

It would be nice to extend this to work w/o profile: probably_never_executed_edge_p
can return true for EH edges, setjmp edges and we can then walk bodies ignoring
EH/setjmp flagging blocks that are probably never executed enabling the partitioning
to do a job w/o profile.

Otherwise the patch look OK to me. Thanks for working on this. Do we have agreement on C++ way of mixing
declarations and code?

Honza
Teresa Johnson Aug. 28, 2013, 6:20 p.m. UTC | #2
On Wed, Aug 28, 2013 at 9:58 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
> Hi,
> with Martin we did bit of progress on analyzing the problems.  We now have
> COMDAT profile merging for FDO

 Great! Is this the LTO merging you were talking about in an earlier
message, or the gcov runtime fix (that would presumably not be
lto-specific)?

> and we also noticed that forks can make your
> basic blocks appear never executed even though they are executed every run:
> the fork is accounted as 3 independnet runs of the program.  First run
> is until fork, the second 2 runs are both variant.
>
> I have patch to track this.  Moreover vforks seems to produce repeated
> merging of results.

Aha, does this explain the gimp issue as well?

>
> These two factors seems to help to riddle enought the firefox profiles
> so it took us weeks to understand what happens.
>> +         if (changed)
>> +            {
>> +              /* Edge forwarding in particular can cause hot blocks previously
>> +                 reached by both hot and cold blocks to become dominated only
>> +                 by cold blocks. This will cause the verification
>> below to fail,
>> +                 and lead to now cold code in the hot section. This is not easy
>> +                 to detect and fix during edge forwarding, and in some cases
>> +                 is only visible after newly unreachable blocks are deleted,
>> +                 which will be done in fixup_partitions.  */
>> +              fixup_partitions ();
>
> Is it really necessary to run this from internal loop of the cfgcleanup?

The reason I added it here is that just below there is a call to
verify_flow_info, and that will fail with the new verification.

> It seems
> you will play back and forth game where edge forwarding will remove your fallthru
> and you will be re-adding it?

fixup_partitions will not add new fall through edges. (It may invoke
force_nonfallthru to do the opposite.) So there shouldn't be any
ping-ponging effect.

>
> I would wait for cfgcleanup to finish its job (I don't really think it needs to be
> iterative) and then do the fixup possibly cleaning up when the blocks was repoisitoined
> (I suppose it is only case where the code above introduces new cfgcleanup oppurtunities).

As noted above, I can't do this due to the call to verify_flow_info
for each iteration. One option is to move both down outside the loop.

>
>> +      /* Walk the preds/succs and check if there is at least one already
>> +         marked hot. Keep track of the most frequent pred/succ so that we
>> +         can mark it hot if we don't find one.  */
>> +      FOR_EACH_EDGE (e, ei, edges)
>> +        {
>> +          basic_block reach_bb = walk_up ? e->src : e->dest;
>> +
>> +          if (e->flags & EDGE_DFS_BACK)
>> +            continue;
>> +
>> +          if (BB_PARTITION (reach_bb) != BB_COLD_PARTITION)
>> +          {
>> +            found = true;
>> +            break;
>> +          }
>> +          if (e->probability > highest_probability)
>> +            highest_probability = e->probability;
>
> When doing predecestor walk if you have two predecestors, one executing 100000
> times and getting to the block with probability 1%, you want to chose it over
> block executing once and getting to you with probability 100%.
>
> You probably want to look for most likely predecestor here.  You need to look
> for highest e->count and if they are all 0 for highest EDGE_FREQUENCY and for
> maximal probability only if all fails?

Yes, thanks, let me do that.

>
>> +        }
>> +
>> +      /* If bb is reached by (or reaches, in the case of !WALK_UP) another hot
>> +         block (or unpartitioned, e.g. the entry block) then it is ok. If not,
>> +         then the most frequent pred (or succ) needs to be adjusted.  In the
>> +         case where multiple preds/succs have the same probability (e.g. a
>> +         50-50 branch), then both will be adjusted.  */
>> +      if (found)
>> +        continue;
>> +
>> +      FOR_EACH_EDGE (e, ei, edges)
>> +        {
>> +          if (e->flags & EDGE_DFS_BACK)
>> +            continue;
>> +          if (e->probability < highest_probability)
>> +            continue;
>
> Again for predecestor walk you need to wind down the bit crazy logic described above.
>> Index: predict.c
>> ===================================================================
>> --- predict.c   (revision 202021)
>> +++ predict.c   (working copy)
>> @@ -241,6 +241,22 @@ probably_never_executed_bb_p (struct function *fun
>>    return false;
>>  }
>>
>> +
>> +/* Return true in case edge E is probably never executed.  */
>> +
>> +bool
>> +probably_never_executed_edge_p (struct function *fun, edge e)
>> +{
>> +  gcc_checking_assert (fun);
>> +  if (profile_info && flag_branch_probabilities)
>> +    return ((e->count + profile_info->runs / 2) / profile_info->runs) == 0;
>> +  if ((!profile_info || !flag_branch_probabilities)
>> +      && (cgraph_get_node (fun->decl)->frequency
>> +         == NODE_FREQUENCY_UNLIKELY_EXECUTED))
>> +    return true;
>> +  return false;
> Instead of duplicating the conditional, break out the tests into
> probably_never_executed_count_p, like we have for maybe_hot_count_p.

ok

>
> It would be nice to extend this to work w/o profile: probably_never_executed_edge_p
> can return true for EH edges, setjmp edges and we can then walk bodies ignoring
> EH/setjmp flagging blocks that are probably never executed enabling the partitioning
> to do a job w/o profile.

agreed. although I would prefer to leave that for a follow-on patch so
it can be tuned a bit.

>
> Otherwise the patch look OK to me. Thanks for working on this. Do we have agreement on C++ way of mixing
> declarations and code?

According to http://gcc.gnu.org/wiki/CppConventions:

"In new code variables which are used in a small scope should be
defined at the point of first use, rather than at the top of the
function. Variables which are used throughout the function may be
defined at the top of the function, as in C."

I think I am following that, but let me know if you see something that
needs to be fixed.

Thanks,
Teresa

>
> Honza
diff mbox

Patch

Index: cfgrtl.c
===================================================================
--- cfgrtl.c    (revision 202021)
+++ cfgrtl.c    (working copy)
@@ -1358,6 +1358,43 @@  fixup_partition_crossing (edge e)
     }
 }

+/* Called when block BB has been reassigned to the cold partition,
+   because it is now dominated by another cold block,
+   to ensure that the region crossing attributes are updated.  */
+
+static void
+fixup_new_cold_bb (basic_block bb)
+{
+  edge e;
+  edge_iterator ei;
+
+  /* This is called when a hot bb is found to now be dominated
+     by a cold bb and therefore needs to become cold. Therefore,
+     its preds will no longer be region crossing. Any non-dominating
+     preds that were previously hot would also have become cold
+     in the caller for the same region. Any preds that were previously
+     region-crossing will be adjusted in fixup_partition_crossing.  */
+  FOR_EACH_EDGE (e, ei, bb->preds)
+    {
+      fixup_partition_crossing (e);
+    }
+
+  /* Possibly need to make bb's successor edges region crossing,
+     or remove stale region crossing.  */
+  FOR_EACH_EDGE (e, ei, bb->succs)
+    {
+      /* We can't have fall-through edges across partition boundaries.
+         Note that force_nonfallthru will do any necessary partition
+         boundary fixup by calling fixup_partition_crossing itself.  */
+      if ((e->flags & EDGE_FALLTHRU)
+          && BB_PARTITION (bb) != BB_PARTITION (e->dest)
+          && e->dest != EXIT_BLOCK_PTR)
+        force_nonfallthru (e);
+      else
+        fixup_partition_crossing (e);
+    }
+}
+
 /* Attempt to change code to redirect edge E to TARGET.  Don't do that on
    expense of adding new instructions or reordering basic blocks.

@@ -1996,6 +2033,14 @@  commit_edge_insertions (void)
 {
   basic_block bb;

+  /* Optimization passes that invoke this routine can cause hot blocks
+     previously reached by both hot and cold blocks to become dominated only
+     by cold blocks. This will cause the verification below to fail,
+     and lead to now cold code in the hot section. In some cases this
+     may only be visible after newly unreachable blocks are deleted,
+     which will be done by fixup_partitions.  */
+  fixup_partitions ();
+
 #ifdef ENABLE_CHECKING
   verify_flow_info ();
 #endif
@@ -2190,6 +2235,101 @@  get_last_bb_insn (basic_block bb)
   return end;
 }

+/* Sanity check partition hotness to ensure that basic blocks in
+   the cold partition don't dominate basic blocks in the hot partition.
+   If FLAG_ONLY is true, report violations as errors. Otherwise
+   re-mark the dominated blocks as cold, since this is run after
+   cfg optimizations that may make hot blocks previously reached
+   by both hot and cold blocks now only reachable along cold paths.  */
+
+static vec<basic_block>
+find_partition_fixes (bool flag_only)
+{
+  basic_block bb;
+  vec<basic_block> bbs_in_cold_partition = vNULL;
+  vec<basic_block> bbs_to_fix = vNULL;
+
+  /* Callers check this.  */
+  gcc_checking_assert (crtl->has_bb_partition);
+
+  FOR_EACH_BB (bb)
+    if ((BB_PARTITION (bb) == BB_COLD_PARTITION))
+      bbs_in_cold_partition.safe_push (bb);
+
+  if (bbs_in_cold_partition.is_empty ())
+    return vNULL;
+
+  bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS);
+
+  if (dom_calculated_here)
+    calculate_dominance_info (CDI_DOMINATORS);
+
+  while (! bbs_in_cold_partition.is_empty  ())
+    {
+      bb = bbs_in_cold_partition.pop ();
+      /* Any blocks dominated by a block in the cold section
+         must also be cold.  */
+      basic_block son;
+      for (son = first_dom_son (CDI_DOMINATORS, bb);
+           son;
+           son = next_dom_son (CDI_DOMINATORS, son))
+        {
+          /* If son is not yet cold, then mark it cold here and
+             enqueue it for further processing.  */
+          if ((BB_PARTITION (son) != BB_COLD_PARTITION))
+            {
+              if (flag_only)
+                error ("non-cold basic block %d dominated "
+                       "by a block in the cold partition (%d)",
son->index, bb->index);
+              else
+                BB_SET_PARTITION (son, BB_COLD_PARTITION);
+              bbs_to_fix.safe_push (son);
+              bbs_in_cold_partition.safe_push (son);
+            }
+        }
+    }
+
+  if (dom_calculated_here)
+    free_dominance_info (CDI_DOMINATORS);
+
+  return bbs_to_fix;
+}
+
+/* Perform cleanup on the hot/cold bb partitioning after optimization
+   passes that modify the cfg.  */
+
+void
+fixup_partitions (void)
+{
+  basic_block bb;
+
+  if (!crtl->has_bb_partition)
+    return;
+
+  /* Delete any blocks that became unreachable and weren't
+     already cleaned up, for example during edge forwarding
+     and convert_jumps_to_returns. This will expose more
+     opportunities for fixing the partition boundaries here.
+     Also, the calculation of the dominance graph during verification
+     will assert if there are unreachable nodes.  */
+  delete_unreachable_blocks ();
+
+  /* If there are partitions, do a sanity check on them: A basic block in
+     a cold partition cannot dominate a basic block in a hot partition.
+     Fixup any that now violate this requirement, as a result of edge
+     forwarding and unreachable block deletion.  */
+  vec<basic_block> bbs_to_fix = find_partition_fixes (false);
+
+  /* Do the partition fixup after all necessary blocks have been converted to
+     cold, so that we only update the region crossings the minimum number of
+     places, which can require forcing edges to be non fallthru.  */
+  while (! bbs_to_fix.is_empty ())
+    {
+      bb = bbs_to_fix.pop ();
+      fixup_new_cold_bb (bb);
+    }
+}
+
 /* Verify, in the basic block chain, that there is at most one switch
    between hot/cold partitions. This condition will not be true until
    after reorder_basic_blocks is called.  */
@@ -2236,7 +2376,8 @@  verify_hot_cold_block_grouping (void)
 /* Perform several checks on the edges out of each block, such as
    the consistency of the branch probabilities, the correctness
    of hot/cold partition crossing edges, and the number of expected
-   successor edges.  */
+   successor edges.  Also verify that the dominance relationship
+   between hot/cold blocks is sane.  */

 static int
 rtl_verify_edges (void)
@@ -2399,6 +2540,14 @@  rtl_verify_edges (void)
        }
     }

+  /* If there are partitions, do a sanity check on them: A basic block in
+     a cold partition cannot dominate a basic block in a hot partition.  */
+  if (crtl->has_bb_partition && !err)
+    {
+      vec<basic_block> bbs_to_fix = find_partition_fixes (true);
+      err = !bbs_to_fix.is_empty ();
+    }
+
   /* Clean up.  */
   return err;
 }
@@ -2532,7 +2681,7 @@  rtl_verify_bb_pointers (void)
      and NOTE_INSN_BASIC_BLOCK
    - verify that no fall_thru edge crosses hot/cold partition boundaries
    - verify that there are no pending RTL branch predictions
-   - verify that there is a single hot/cold partition boundary after bbro
+   - verify that hot blocks are not dominated by cold blocks

    In future it can be extended check a lot of other stuff as well
    (reachability of basic blocks, life information, etc. etc.).  */
@@ -2778,7 +2927,8 @@  rtl_verify_bb_layout (void)
    - check that all insns are in the basic blocks
      (except the switch handling code, barriers and notes)
    - check that all returns are followed by barriers
-   - check that all fallthru edge points to the adjacent blocks.  */
+   - check that all fallthru edge points to the adjacent blocks
+   - verify that there is a single hot/cold partition boundary after bbro  */

 static int
 rtl_verify_flow_info (void)
Index: basic-block.h
===================================================================
--- basic-block.h       (revision 202021)
+++ basic-block.h       (working copy)
@@ -726,6 +726,7 @@  extern void compute_available (sbitmap *, sbitmap
 extern bool maybe_hot_bb_p (struct function *, const_basic_block);
 extern bool maybe_hot_edge_p (edge);
 extern bool probably_never_executed_bb_p (struct function *,
const_basic_block);
+extern bool probably_never_executed_edge_p (struct function *, edge);
 extern bool optimize_bb_for_size_p (const_basic_block);
 extern bool optimize_bb_for_speed_p (const_basic_block);
 extern bool optimize_edge_for_size_p (edge);
@@ -797,6 +798,7 @@  extern bool contains_no_active_insn_p (const_basic
 extern bool forwarder_block_p (const_basic_block);
 extern bool can_fallthru (basic_block, basic_block);
 extern void emit_barrier_after_bb (basic_block bb);
+extern void fixup_partitions (void);

 /* In cfgbuild.c.  */
 extern void find_many_sub_basic_blocks (sbitmap);
Index: cfgcleanup.c
===================================================================
--- cfgcleanup.c        (revision 202021)
+++ cfgcleanup.c        (working copy)
@@ -2807,10 +2807,21 @@  try_optimize_cfg (int mode)
              df_analyze ();
            }

+         if (changed)
+            {
+              /* Edge forwarding in particular can cause hot blocks previously
+                 reached by both hot and cold blocks to become dominated only
+                 by cold blocks. This will cause the verification
below to fail,
+                 and lead to now cold code in the hot section. This is not easy
+                 to detect and fix during edge forwarding, and in some cases
+                 is only visible after newly unreachable blocks are deleted,
+                 which will be done in fixup_partitions.  */
+              fixup_partitions ();
+
 #ifdef ENABLE_CHECKING
-         if (changed)
-           verify_flow_info ();
+              verify_flow_info ();
 #endif
+            }

          changed_overall |= changed;
          first_pass = false;
Index: bb-reorder.c
===================================================================
--- bb-reorder.c        (revision 202021)
+++ bb-reorder.c        (working copy)
@@ -1444,27 +1444,134 @@  fix_up_crossing_landing_pad (eh_landing_pad old_lp
       ei_next (&ei);
 }

+
+/* Ensure that all hot bbs are included in a hot path through the
+   procedure. This is done by calling this function twice, once
+   with WALK_UP true (to look for paths from the entry to hot bbs) and
+   once with WALK_UP false (to look for paths from hot bbs to the exit).
+   Returns the updated value of COLD_BB_COUNT and adds newly-hot bbs
+   to BBS_IN_HOT_PARTITION.  */
+
+static unsigned int
+sanitize_hot_paths (bool walk_up, unsigned int cold_bb_count,
+                    vec<basic_block> *bbs_in_hot_partition)
+{
+  /* Callers check this.  */
+  gcc_checking_assert (cold_bb_count);
+
+  /* Keep examining hot bbs while we still have some left to check
+     and there are remaining cold bbs.  */
+  vec<basic_block> hot_bbs_to_check = bbs_in_hot_partition->copy ();
+  while (! hot_bbs_to_check.is_empty ()
+         && cold_bb_count)
+    {
+      basic_block bb = hot_bbs_to_check.pop ();
+      vec<edge, va_gc> *edges = walk_up ? bb->preds : bb->succs;
+      edge e;
+      edge_iterator ei;
+      int highest_probability = 0;
+      bool found = false;
+
+      /* Walk the preds/succs and check if there is at least one already
+         marked hot. Keep track of the most frequent pred/succ so that we
+         can mark it hot if we don't find one.  */
+      FOR_EACH_EDGE (e, ei, edges)
+        {
+          basic_block reach_bb = walk_up ? e->src : e->dest;
+
+          if (e->flags & EDGE_DFS_BACK)
+            continue;
+
+          if (BB_PARTITION (reach_bb) != BB_COLD_PARTITION)
+          {
+            found = true;
+            break;
+          }
+          if (e->probability > highest_probability)
+            highest_probability = e->probability;
+        }
+
+      /* If bb is reached by (or reaches, in the case of !WALK_UP) another hot
+         block (or unpartitioned, e.g. the entry block) then it is ok. If not,
+         then the most frequent pred (or succ) needs to be adjusted.  In the
+         case where multiple preds/succs have the same probability (e.g. a
+         50-50 branch), then both will be adjusted.  */
+      if (found)
+        continue;
+
+      FOR_EACH_EDGE (e, ei, edges)
+        {
+          if (e->flags & EDGE_DFS_BACK)
+            continue;
+          if (e->probability < highest_probability)
+            continue;
+
+          basic_block reach_bb = walk_up ? e->src : e->dest;
+
+          /* We have a hot bb with an immediate dominator that is cold.
+             The dominator needs to be re-marked hot.  */
+          BB_SET_PARTITION (reach_bb, BB_HOT_PARTITION);
+          cold_bb_count--;
+
+          /* Now we need to examine newly-hot reach_bb to see if it is also
+             dominated by a cold bb.  */
+          bbs_in_hot_partition->safe_push (reach_bb);
+          hot_bbs_to_check.safe_push (reach_bb);
+        }
+    }
+
+  return cold_bb_count;
+}
+
+
 /* Find the basic blocks that are rarely executed and need to be moved to
    a separate section of the .o file (to cut down on paging and improve
    cache locality).  Return a vector of all edges that cross.  */

-static vec<edge>
+static vec<edge>
 find_rarely_executed_basic_blocks_and_crossing_edges (void)
 {
   vec<edge> crossing_edges = vNULL;
   basic_block bb;
   edge e;
   edge_iterator ei;
+  unsigned int cold_bb_count = 0;
+  vec<basic_block> bbs_in_hot_partition = vNULL;

   /* Mark which partition (hot/cold) each basic block belongs in.  */
   FOR_EACH_BB (bb)
     {
       if (probably_never_executed_bb_p (cfun, bb))
-       BB_SET_PARTITION (bb, BB_COLD_PARTITION);
+        {
+          BB_SET_PARTITION (bb, BB_COLD_PARTITION);
+          cold_bb_count++;
+        }
       else
-       BB_SET_PARTITION (bb, BB_HOT_PARTITION);
+        {
+          BB_SET_PARTITION (bb, BB_HOT_PARTITION);
+          bbs_in_hot_partition.safe_push (bb);
+        }
     }

+  /* Ensure that hot bbs are included along a hot path from the entry to exit.
+     Several different possibilities may include cold bbs along all paths
+     to/from a hot bb. One is that there are edge weight insanities
+     due to optimization phases that do not properly update basic block profile
+     counts. The second is that the entry of the function may not be
hot, because
+     it is entered fewer times than the number of profile training
runs, but there
+     is a loop inside the function that causes blocks within the function to be
+     above the threshold for hotness. This is fixed by walking up from hot bbs
+     to the entry block, and then down from hot bbs to the exit, performing
+     partitioning fixups as necessary.  */
+  if (cold_bb_count)
+    {
+      mark_dfs_back_edges ();
+      cold_bb_count = sanitize_hot_paths (true, cold_bb_count,
+                                          &bbs_in_hot_partition);
+      if (cold_bb_count)
+        sanitize_hot_paths (false, cold_bb_count, &bbs_in_hot_partition);
+    }
+
   /* The format of .gcc_except_table does not allow landing pads to
      be in a different partition as the throw.  Fix this by either
      moving or duplicating the landing pads.  */
Index: predict.c
===================================================================
--- predict.c   (revision 202021)
+++ predict.c   (working copy)
@@ -241,6 +241,22 @@  probably_never_executed_bb_p (struct function *fun
   return false;
 }

+
+/* Return true in case edge E is probably never executed.  */
+
+bool
+probably_never_executed_edge_p (struct function *fun, edge e)
+{
+  gcc_checking_assert (fun);
+  if (profile_info && flag_branch_probabilities)
+    return ((e->count + profile_info->runs / 2) / profile_info->runs) == 0;
+  if ((!profile_info || !flag_branch_probabilities)
+      && (cgraph_get_node (fun->decl)->frequency
+         == NODE_FREQUENCY_UNLIKELY_EXECUTED))
+    return true;
+  return false;
+}
+
 /* Return true if NODE should be optimized for size.  */

 bool
Index: cfg.c
===================================================================
--- cfg.c       (revision 202021)
+++ cfg.c       (working copy)
@@ -446,6 +446,21 @@  check_bb_profile (basic_block bb, FILE * file, int
                 (flags & TDF_COMMENT) ? ";; " : "", s_indent,
                 (int) lsum, (int) bb->count);
     }
+  if (BB_PARTITION (bb) == BB_COLD_PARTITION)
+    {
+      /* Warn about inconsistencies in the partitioning that are
+         currently caused by profile insanities created via optimization.  */
+      if (!probably_never_executed_bb_p (fun, bb))
+        fprintf (file, "%s%sBlock in cold partition with hot count\n",
+                 (flags & TDF_COMMENT) ? ";; " : "", s_indent);
+      FOR_EACH_EDGE (e, ei, bb->preds)
+        {
+          if (!probably_never_executed_edge_p (fun, e))
+            fprintf (file,
+                     "%s%sBlock in cold partition with incoming hot edge\n",
+                     (flags & TDF_COMMENT) ? ";; " : "", s_indent);
+        }
+    }
 }
 ^L
 void