diff mbox

[P1,tree-optimization/68541] Add heuristics to path splitting

Message ID 56B534F6.6010201@redhat.com
State New
Headers show

Commit Message

Jeff Law Feb. 5, 2016, 11:49 p.m. UTC
This patch addresses multiple issues with how we determine when to split 
paths.  The primary motivation is addressing 68541 (P1).


As I've gotten testcodes from Ajit, I've been able to look closely at 
the path splitting opportunities and consistently what I've seen is that 
contrary to the original belief, CSE/DCE opportunities are rarely 
exposed by path splitting.  It seems the benefit is more due to removing 
the unconditional jump inherent in a CFG diamond -- even more so on the 
microblaze where these jumps have delay slots.

While there are cases where splitting allows for CSE/DCE, they're the 
exception rather than the rule in the codes I've looked at.

With that in mind, we want to encourage path splitting when the amount 
of code we're duplicating is small.   We also want to balance that with 
not path splitting when the original code is something that may be 
if-convertable.

The vast majority of if-convertable codes are cases where the two 
predecessors of the join block are single statement blocks with their 
results feeding the same PHI in the join block.  We now reject that case 
for path splitting so as not to spoil if-conversion.

The only wrinkle with that heuristic is that some codes (MIN, MAX, ABS, 
COND, calls, etc) are not good candidates for further if conversion. 
(ie, we have a MIN_EXPR in both predecessors that feed the PHI in the 
join).   So we still allow those cases for splitting.  This can be 
easily refined as we find more cases or as the if-converter is extended.

So with that in place as the first filter, we just need to put a limiter 
on the number of statements we allow to be copied.  I punted a bit and 
re-used the PARAM for jump threading.  They've got enough in common that 
I felt it was reasonable.  If I were to create a new PARAM, I'd probably 
start with a smaller default value after doing further instrumentation.

While I was working through this I noticed we were path splitting in 
some cases where we shouldn't.  Particularly if we had a CFG like:

           a
          / \
         b   c
        / \ /
       e   d
          / \
         /   \
        /     \
      latch   exit


Duplicating d for the edges b->d and c->d isn't as helpful because the 
duplicate can't be merged into b.  We now detect this kind of case and 
reject it for path splitting.

The new tests are a combination of reductions of codes from Ajit and my 
own poking around.

Looking further out, I really wonder if the low level aspects of path 
splitting like we're trying to capture here really belong in the 
cross-jumping phase of the RTL optimizers.  The higher level aspects 
(when we are able to expose CSE/DCE opportunities) should be driven by 
actually looking at the propagation/simplifications enabled by a 
potential split path.  But those are gcc-7 things.

Bootstrapped and regression tested on x86 linux.  Installed on the 
trunk.  I'll obviously keep an eye out for how the tests are handled on 
other platforms -- I don't think the tests are real sensitive to branch 
costs and such, but I've been surprised before.

Onward to the jump threading code explosion wrap-up...

jeff

Comments

Andreas Schwab Feb. 6, 2016, 10:19 a.m. UTC | #1
Jeff Law <law@redhat.com> writes:

> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/split-path-2.c b/gcc/testsuite/gcc.dg/tree-ssa/split-path-2.c
> new file mode 100644
> index 0000000..aeb926e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/split-path-2.c
> @@ -0,0 +1,21 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fsplit-paths -fdump-tree-split-paths-details " } */
> +
> +int
> +foo(char *p, int n)
> +{
> +  int s = 0;
> +  int i;
> +
> +  for (i = 0; i < n; i++) {
> +    if (p[i] >= 0)
> +      s++;
> +    else
> +      s--;
> +  }
> +
> +  return s;
> +}
> +
> +/* { dg-final { scan-tree-dump "appears to be a join point for if-convertable diamond" "split-paths" } } */

That fails on aarch64.

Andreas.
Jeff Law Feb. 8, 2016, 8:35 a.m. UTC | #2
On 02/06/2016 03:19 AM, Andreas Schwab wrote:
> Jeff Law <law@redhat.com> writes:
>
>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/split-path-2.c b/gcc/testsuite/gcc.dg/tree-ssa/split-path-2.c
>> new file mode 100644
>> index 0000000..aeb926e
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/split-path-2.c
>> @@ -0,0 +1,21 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2 -fsplit-paths -fdump-tree-split-paths-details " } */
>> +
>> +int
>> +foo(char *p, int n)
>> +{
>> +  int s = 0;
>> +  int i;
>> +
>> +  for (i = 0; i < n; i++) {
>> +    if (p[i] >= 0)
>> +      s++;
>> +    else
>> +      s--;
>> +  }
>> +
>> +  return s;
>> +}
>> +
>> +/* { dg-final { scan-tree-dump "appears to be a join point for if-convertable diamond" "split-paths" } } */
>
> That fails on aarch64.
AH.  Depends chars being signed by default.  I'll fix it momentarily.

Thanks,
jeff
Kyrill Tkachov Feb. 8, 2016, 2:18 p.m. UTC | #3
Hi Jeff,

On 05/02/16 23:49, Jeff Law wrote:
> This patch addresses multiple issues with how we determine when to split paths.  The primary motivation is addressing 68541 (P1).
>
>
> As I've gotten testcodes from Ajit, I've been able to look closely at the path splitting opportunities and consistently what I've seen is that contrary to the original belief, CSE/DCE opportunities are rarely exposed by path splitting.  
> It seems the benefit is more due to removing the unconditional jump inherent in a CFG diamond -- even more so on the microblaze where these jumps have delay slots.
>
> While there are cases where splitting allows for CSE/DCE, they're the exception rather than the rule in the codes I've looked at.
>
> With that in mind, we want to encourage path splitting when the amount of code we're duplicating is small.   We also want to balance that with not path splitting when the original code is something that may be if-convertable.
>
> The vast majority of if-convertable codes are cases where the two predecessors of the join block are single statement blocks with their results feeding the same PHI in the join block.  We now reject that case for path splitting so as not 
> to spoil if-conversion.
>
> The only wrinkle with that heuristic is that some codes (MIN, MAX, ABS, COND, calls, etc) are not good candidates for further if conversion. (ie, we have a MIN_EXPR in both predecessors that feed the PHI in the join).   So we still allow 
> those cases for splitting.  This can be easily refined as we find more cases or as the if-converter is extended.
>
> So with that in place as the first filter, we just need to put a limiter on the number of statements we allow to be copied.  I punted a bit and re-used the PARAM for jump threading.  They've got enough in common that I felt it was 
> reasonable.  If I were to create a new PARAM, I'd probably start with a smaller default value after doing further instrumentation.
>
> While I was working through this I noticed we were path splitting in some cases where we shouldn't.  Particularly if we had a CFG like:
>
>           a
>          / \
>         b   c
>        / \ /
>       e   d
>          / \
>         /   \
>        /     \
>      latch   exit
>
>
> Duplicating d for the edges b->d and c->d isn't as helpful because the duplicate can't be merged into b.  We now detect this kind of case and reject it for path splitting.
>
> The new tests are a combination of reductions of codes from Ajit and my own poking around.
>
> Looking further out, I really wonder if the low level aspects of path splitting like we're trying to capture here really belong in the cross-jumping phase of the RTL optimizers.  The higher level aspects (when we are able to expose 
> CSE/DCE opportunities) should be driven by actually looking at the propagation/simplifications enabled by a potential split path.  But those are gcc-7 things.
>
> Bootstrapped and regression tested on x86 linux.  Installed on the trunk.  I'll obviously keep an eye out for how the tests are handled on other platforms -- I don't think the tests are real sensitive to branch costs and such, but I've 
> been surprised before.
>
> Onward to the jump threading code explosion wrap-up...
>
> jeff
>

After this patch I also see:
FAIL: gcc.dg/tree-ssa/split-path-1.c scan-tree-dump split-paths "Duplicating join block"

on arm, but not on aarch64. My arm-none-eabi cross compiler is configured with:
--with-float=hard --with-cpu=cortex-a9 --with-fpu=neon --with-mode=thumb

Kyrill
Jeff Law Feb. 8, 2016, 5:02 p.m. UTC | #4
On 02/08/2016 07:18 AM, Kyrill Tkachov wrote:
> Hi Jeff,
>
> On 05/02/16 23:49, Jeff Law wrote:
>> This patch addresses multiple issues with how we determine when to
>> split paths.  The primary motivation is addressing 68541 (P1).
>>
>>
>> As I've gotten testcodes from Ajit, I've been able to look closely at
>> the path splitting opportunities and consistently what I've seen is
>> that contrary to the original belief, CSE/DCE opportunities are rarely
>> exposed by path splitting. It seems the benefit is more due to
>> removing the unconditional jump inherent in a CFG diamond -- even more
>> so on the microblaze where these jumps have delay slots.
>>
>> While there are cases where splitting allows for CSE/DCE, they're the
>> exception rather than the rule in the codes I've looked at.
>>
>> With that in mind, we want to encourage path splitting when the amount
>> of code we're duplicating is small.   We also want to balance that
>> with not path splitting when the original code is something that may
>> be if-convertable.
>>
>> The vast majority of if-convertable codes are cases where the two
>> predecessors of the join block are single statement blocks with their
>> results feeding the same PHI in the join block.  We now reject that
>> case for path splitting so as not to spoil if-conversion.
>>
>> The only wrinkle with that heuristic is that some codes (MIN, MAX,
>> ABS, COND, calls, etc) are not good candidates for further if
>> conversion. (ie, we have a MIN_EXPR in both predecessors that feed the
>> PHI in the join).   So we still allow those cases for splitting.  This
>> can be easily refined as we find more cases or as the if-converter is
>> extended.
>>
>> So with that in place as the first filter, we just need to put a
>> limiter on the number of statements we allow to be copied.  I punted a
>> bit and re-used the PARAM for jump threading.  They've got enough in
>> common that I felt it was reasonable.  If I were to create a new
>> PARAM, I'd probably start with a smaller default value after doing
>> further instrumentation.
>>
>> While I was working through this I noticed we were path splitting in
>> some cases where we shouldn't.  Particularly if we had a CFG like:
>>
>>           a
>>          / \
>>         b   c
>>        / \ /
>>       e   d
>>          / \
>>         /   \
>>        /     \
>>      latch   exit
>>
>>
>> Duplicating d for the edges b->d and c->d isn't as helpful because the
>> duplicate can't be merged into b.  We now detect this kind of case and
>> reject it for path splitting.
>>
>> The new tests are a combination of reductions of codes from Ajit and
>> my own poking around.
>>
>> Looking further out, I really wonder if the low level aspects of path
>> splitting like we're trying to capture here really belong in the
>> cross-jumping phase of the RTL optimizers.  The higher level aspects
>> (when we are able to expose CSE/DCE opportunities) should be driven by
>> actually looking at the propagation/simplifications enabled by a
>> potential split path.  But those are gcc-7 things.
>>
>> Bootstrapped and regression tested on x86 linux.  Installed on the
>> trunk.  I'll obviously keep an eye out for how the tests are handled
>> on other platforms -- I don't think the tests are real sensitive to
>> branch costs and such, but I've been surprised before.
>>
>> Onward to the jump threading code explosion wrap-up...
>>
>> jeff
>>
>
> After this patch I also see:
> FAIL: gcc.dg/tree-ssa/split-path-1.c scan-tree-dump split-paths
> "Duplicating join block"
>
> on arm, but not on aarch64. My arm-none-eabi cross compiler is
> configured with:
> --with-float=hard --with-cpu=cortex-a9 --with-fpu=neon --with-mode=thumb
Thanks.  That configuration just has more statements in the join block 
and bumps up against the limit.

I'll probably just manually increase the limit for the test.

jeff
diff mbox

Patch

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 9a51133..a465156 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,14 @@ 
+2016-02-05  Jeff Law  <law@redhat.com>
+
+	PR tree-optimization/68541
+	* gimple-ssa-split-paths.c: Include tree-cfg.h and params.h.
+	(count_stmts_in_block): New function.
+	(poor_ifcvt_candidate_code): Likewise.
+	(is_feasible_trace): Add some heuristics to determine when path
+	splitting is profitable.
+	(find_block_to_duplicate_for_splitting_paths): Make sure the graph
+	is a diamond with a single exit.
+
 2016-02-05  Martin Sebor  <msebor@redhat.com>
 
 	PR c++/69662
diff --git a/gcc/gimple-ssa-split-paths.c b/gcc/gimple-ssa-split-paths.c
index 40c099a..ac6de81 100644
--- a/gcc/gimple-ssa-split-paths.c
+++ b/gcc/gimple-ssa-split-paths.c
@@ -25,11 +25,13 @@  along with GCC; see the file COPYING3.  If not see
 #include "tree.h"
 #include "gimple.h"
 #include "tree-pass.h"
+#include "tree-cfg.h"
 #include "cfganal.h"
 #include "cfgloop.h"
 #include "gimple-iterator.h"
 #include "tracer.h"
 #include "predict.h"
+#include "params.h"
 
 /* Given LATCH, the latch block in a loop, see if the shape of the
    path reaching LATCH is suitable for being split by duplication.
@@ -79,6 +81,11 @@  find_block_to_duplicate_for_splitting_paths (basic_block latch)
 	      || !find_edge (bb_idom, EDGE_PRED (bb, 1)->src))
 	    return NULL;
 
+	  /* And that the predecessors of BB each have a single successor.  */
+	  if (!single_succ_p (EDGE_PRED (bb, 0)->src)
+	      || !single_succ_p (EDGE_PRED (bb, 1)->src))
+	    return NULL;
+
 	  /* So at this point we have a simple diamond for an IF-THEN-ELSE
 	     construct starting at BB_IDOM, with a join point at BB.  BB
 	     pass control outside the loop or to the loop latch.
@@ -91,29 +98,111 @@  find_block_to_duplicate_for_splitting_paths (basic_block latch)
   return NULL;
 }
 
+/* Return the number of non-debug statements in a block.  */
+static unsigned int
+count_stmts_in_block (basic_block bb)
+{
+  gimple_stmt_iterator gsi;
+  unsigned int num_stmts = 0;
+
+  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+    {
+      gimple *stmt = gsi_stmt (gsi);
+      if (!is_gimple_debug (stmt))
+	num_stmts++;
+    }
+  return num_stmts;
+}
+
+/* Return TRUE if CODE represents a tree code that is not likely to
+   be easily if-convertable because it likely expands into multiple
+   insns, FALSE otherwise.  */
+static bool
+poor_ifcvt_candidate_code (enum tree_code code)
+{
+  return (code == MIN_EXPR
+	  || code == MAX_EXPR
+	  || code == ABS_EXPR
+	  || code == COND_EXPR
+	  || code == CALL_EXPR);
+}
+
 /* Return TRUE if BB is a reasonable block to duplicate by examining
    its size, false otherwise.  BB will always be a loop latch block.
 
-   Should this use the same tests as we do for jump threading?  */
+   Things to consider:
+
+     We do not want to spoil if-conversion if at all possible.
+
+     Most of the benefit seems to be from eliminating the unconditional
+     jump rather than CSE/DCE opportunities.  So favor duplicating
+     small latches.  A latch with just a conditional branch is ideal.
+
+     CSE/DCE opportunties crop up when statements from the predecessors
+     feed statements in the latch and allow statements in the latch to
+     simplify.  */
 
 static bool
 is_feasible_trace (basic_block bb)
 {
-  int num_stmt = 0;
-  gimple_stmt_iterator gsi;
-
-  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+  basic_block pred1 = EDGE_PRED (bb, 0)->src;
+  basic_block pred2 = EDGE_PRED (bb, 1)->src;
+  int num_stmts_in_join = count_stmts_in_block (bb);
+  int num_stmts_in_pred1 = count_stmts_in_block (pred1);
+  int num_stmts_in_pred2 = count_stmts_in_block (pred2);
+
+  /* This is meant to catch cases that are likely opportunities for
+     if-conversion.  Essentially we look for the case where
+     BB's predecessors are both single statement blocks where
+     the output of that statement feed the same PHI in BB.  */
+  if (num_stmts_in_pred1 == 1 && num_stmts_in_pred2 == 1)
     {
-      gimple *stmt = gsi_stmt (gsi);
-      if (!is_gimple_debug (stmt))
-	num_stmt++;
+      gimple *stmt1 = last_and_only_stmt (pred1);
+      gimple *stmt2 = last_and_only_stmt (pred2);
+
+      if (stmt1 && stmt2
+	  && gimple_code (stmt1) == GIMPLE_ASSIGN
+	  && gimple_code (stmt2) == GIMPLE_ASSIGN)
+	{
+	  enum tree_code code1 = gimple_assign_rhs_code (stmt1);
+	  enum tree_code code2 = gimple_assign_rhs_code (stmt2);
+
+	  if (!poor_ifcvt_candidate_code (code1)
+	      && !poor_ifcvt_candidate_code (code2))
+	    {
+	      tree lhs1 = gimple_assign_lhs (stmt1);
+	      tree lhs2 = gimple_assign_lhs (stmt2);
+	      gimple_stmt_iterator gsi;
+	      for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+		{
+		  gimple *phi = gsi_stmt (gsi);
+		  if ((gimple_phi_arg_def (phi, 0) == lhs1
+		       && gimple_phi_arg_def (phi, 1) == lhs2)
+		      || (gimple_phi_arg_def (phi, 1) == lhs1
+			  && gimple_phi_arg_def (phi, 0) == lhs2))
+		    {
+		      if (dump_file && (dump_flags & TDF_DETAILS))
+			fprintf (dump_file,
+				 "Block %d appears to be a join point for "
+				 "if-convertable diamond.\n",
+				 bb->index);
+		      return false;
+		    }
+		}
+	    }
+	}
     }
 
-  /* We may want to limit how many statements we copy.  */
-  if (num_stmt > 1)
-    return true;
+  /* We may want something here which looks at dataflow and tries
+     to guess if duplication of BB is likely to result in simplification
+     of instructions in BB in either the original or the duplicate.  */
+
+  /* Upper Hard limit on the number statements to copy.  */
+  if (num_stmts_in_join
+      >= PARAM_VALUE (PARAM_MAX_JUMP_THREAD_DUPLICATION_STMTS))
+    return false;
 
-  return false;
+  return true;
 }
 
 /* If the immediate dominator of the latch of the loop is
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index cce24b4..388b2b9 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,13 @@ 
+2016-02-05  Jeff Law  <law@redhat.com>
+
+        PR tree-optimization/68541
+	* gcc.dg/tree-ssa/split-path-2.c: New test.
+	* gcc.dg/tree-ssa/split-path-3.c: New test.
+	* gcc.dg/tree-ssa/split-path-4.c: New test.
+	* gcc.dg/tree-ssa/split-path-5.c: New test.
+	* gcc.dg/tree-ssa/split-path-6.c: New test.
+	* gcc.dg/tree-ssa/split-path-7.c: New test.
+
 2016-02-05  Martin Sebor  <msebor@redhat.com>
 
 	PR c++/69662
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/split-path-2.c b/gcc/testsuite/gcc.dg/tree-ssa/split-path-2.c
new file mode 100644
index 0000000..aeb926e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/split-path-2.c
@@ -0,0 +1,21 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -fsplit-paths -fdump-tree-split-paths-details " } */
+
+int
+foo(char *p, int n)
+{
+  int s = 0;
+  int i;
+
+  for (i = 0; i < n; i++) {
+    if (p[i] >= 0)
+      s++;
+    else
+      s--;
+  }
+
+  return s;
+}
+
+/* { dg-final { scan-tree-dump "appears to be a join point for if-convertable diamond" "split-paths" } } */
+
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/split-path-3.c b/gcc/testsuite/gcc.dg/tree-ssa/split-path-3.c
new file mode 100644
index 0000000..814504a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/split-path-3.c
@@ -0,0 +1,90 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -fsplit-paths -fdump-tree-split-paths-details -w" } */
+
+typedef struct bitmap_head_def *bitmap;
+extern void vec_assert_fail (const char *, const char *, const char *file_,
+			     unsigned line_, const char *function_)
+  __attribute__ ((__noreturn__));
+typedef struct VEC_int_base
+{
+  unsigned num;
+  unsigned alloc;
+  int vec[1];
+}
+VEC_int_base;
+static __inline__ int
+VEC_int_base_space (VEC_int_base * vec_, int alloc_, const char *file_,
+		    unsigned line_, const char *function_)
+{
+  return vec_ ? vec_->alloc - vec_->num >= (unsigned) alloc_ : !alloc_;
+}
+
+static __inline__ int *
+VEC_int_base_quick_push (VEC_int_base * vec_, int obj_, const char *file_,
+			 unsigned line_, const char *function_)
+{
+  (void) ((vec_->num <
+	   vec_->alloc) ? 0 : (vec_assert_fail ("push", "VEC(int,base)",
+						file_, line_, function_), 0));
+}
+
+typedef struct VEC_int_heap
+{
+  VEC_int_base base;
+}
+VEC_int_heap;
+static __inline__ int
+VEC_int_heap_reserve (VEC_int_heap ** vec_, int alloc_, const char *file_,
+		      unsigned line_, const char *function_)
+{
+  int extend =
+    !VEC_int_base_space (((*vec_) ? &(*vec_)->base : 0), alloc_, file_, line_,
+			 function_);
+  if (extend)
+    *vec_ =
+      (VEC_int_heap *) vec_heap_o_reserve (*vec_, alloc_,
+					   __builtin_offsetof (VEC_int_heap,
+							       base.vec),
+					   sizeof (int));
+}
+
+static __inline__ int *
+VEC_int_heap_safe_push (VEC_int_heap ** vec_, const int obj_,
+			const char *file_, unsigned line_,
+			const char *function_)
+{
+  VEC_int_heap_reserve (vec_, 1, file_, line_, function_);
+  return VEC_int_base_quick_push (((*vec_) ? &(*vec_)->base : 0), obj_, file_,
+				  line_, function_);
+};
+
+typedef struct bitmap_head_def
+{
+}
+bitmap_head;
+typedef struct
+{
+}
+bitmap_iterator;
+bitmap
+compute_idf (bitmap_head * dfs)
+{
+  bitmap_iterator bi;
+  unsigned bb_index, i;
+  VEC_int_heap *work_stack;
+  bitmap phi_insertion_points;
+  while ((VEC_int_base_length (((work_stack) ? &(work_stack)->base : 0))) > 0)
+    {
+      for (bmp_iter_and_compl_init
+	   (&(bi), (&dfs[bb_index]), (phi_insertion_points), (0), &(i));
+	   bmp_iter_and_compl (&(bi), &(i)); bmp_iter_next (&(bi), &(i)))
+	{
+	  (VEC_int_heap_safe_push
+	   (&(work_stack), i, "/home/gcc/virgin-gcc/gcc/cfganal.c", 1349,
+	    __FUNCTION__));
+	}
+    }
+  (VEC_int_heap_free (&work_stack));
+}
+
+/* { dg-final { scan-tree-dump-not "Duplicating join block" "split-paths" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/split-path-4.c b/gcc/testsuite/gcc.dg/tree-ssa/split-path-4.c
new file mode 100644
index 0000000..dac931c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/split-path-4.c
@@ -0,0 +1,24 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -fsplit-paths -fdump-tree-split-paths-details -w" } */
+
+powi_cost (long n)
+{
+  unsigned char cache[256];
+  unsigned long digit;
+  unsigned long val;
+  int result;
+  while (val >= 256)
+    {
+      if (val & 1)
+	{
+	  result += powi_lookup_cost (digit, cache) + 3 + 1;
+	}
+      else
+	{
+	  val >>= 1;
+	}
+    }
+}
+
+/* { dg-final { scan-tree-dump-times "Duplicating join block" 1 "split-paths" } } */
+
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/split-path-5.c b/gcc/testsuite/gcc.dg/tree-ssa/split-path-5.c
new file mode 100644
index 0000000..5044c73
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/split-path-5.c
@@ -0,0 +1,60 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -fsplit-paths -fdump-tree-split-paths-details -w" } */
+
+const extern char *__ctype_ptr__;
+typedef unsigned char uchar;
+static int patlen;
+static int skip[(0x7f * 2 + 1) + 1];
+static uchar *pat = ((void *) 0);
+void
+bmhi_init (const char *pattern)
+{
+  int i, lastpatchar;
+  patlen = strlen (pattern);
+  for (i = 0; i < patlen; i++)
+    pat[i] = (
+	       {
+	       __typeof__ (pattern[i]) __x = (pattern[i]);
+	       ((((__ctype_ptr__ +
+		   sizeof (""[__x]))[(int) (__x)]) & (01 | 02))
+		== 02) ? (int) __x - 'a' + 'A' : (int) __x;
+	       });
+  for (i = 0; i < patlen - 1; ++i)
+    {
+      skip[(
+	     {
+	     __typeof__ (pat[i]) __x = (pat[i]);
+	     ((((__ctype_ptr__ +
+		 sizeof (""[__x]))[(int) (__x)]) & (01 | 02)) ==
+	      01) ? (int) __x - 'A' + 'a' : (int) __x;
+	     })] = patlen - i - 1;
+    }
+  skip[(
+	 {
+	 __typeof__ (lastpatchar) __x = (lastpatchar);
+	 ((((__ctype_ptr__ +
+	     sizeof (""[__x]))[(int) (__x)]) & (01 | 02)) ==
+	  01) ? (int) __x - 'A' + 'a' : (int) __x;
+	 })] = 32767;
+  for (i = 0; i < patlen - 1; ++i)
+    {
+    }
+}
+
+char *
+bmhi_search (const char *string, const int stringlen)
+{
+  int i, j;
+  char *s;
+  for (;;)
+    {
+      while (--j >= 0 && (
+			   {
+			   __typeof__ (s[j]) __x = (s[j]);
+			   ((((__ctype_ptr__ +
+			       sizeof (""[__x]))[(int) (__x)]) &
+			     (01 | 02)) ==
+			    02) ? (int) __x - 'a' +
+			   'A' : (int) __x;}) == pat[j]);
+}}
+/* { dg-final { scan-tree-dump-times "Duplicating join block" 2 "split-paths" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/split-path-6.c b/gcc/testsuite/gcc.dg/tree-ssa/split-path-6.c
new file mode 100644
index 0000000..682166f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/split-path-6.c
@@ -0,0 +1,72 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -fsplit-paths -fdump-tree-split-paths-details -w" } */
+
+struct __sFILE
+{
+  unsigned char *_p;
+  int _r;
+};
+typedef struct __sFILE __FILE;
+struct _reent
+{
+  __FILE *_stdin, *_stdout, *_stderr;
+};
+extern struct _reent *_impure_ptr;
+extern char contextbufs[10][1024];
+extern int contextoffset;
+extern int sflag;
+void
+givehelp (interactive)
+     int interactive;
+{
+  if (interactive)
+    {
+      while ((--((_impure_ptr->_stdin))->_r <
+	      0 ? __srget_r (_impure_ptr,
+			     (_impure_ptr->
+			      _stdin)) : (int) (*((_impure_ptr->_stdin))->
+						_p++)) != ' ');
+    }
+}
+
+oof ()
+{
+  int bufsize;
+  int hadnl;
+  while (1)
+    {
+      if (bufsize == (sizeof contextbufs[0]) / 2 - 1)
+	{
+	  if (contextbufs[0][0] == '*' || contextbufs[0][0] == '@')
+	    treeinsert (ichartosstr (strtosichar (contextbufs[0] + 1, 0), 1),
+			(100 + 4 * 20 + 4), contextbufs[0][0] == '*');
+	}
+      if (hadnl)
+	contextoffset = 0;
+      else
+	contextoffset += bufsize;
+      if (sflag)
+	{
+	  stop ();
+	}
+    }
+}
+
+void
+lookharder (string)
+     char *string;
+{
+  register char *g;
+  register char *s;
+  for (s = string; *s != '\0'; s++)
+    {
+      if (*s == '*')
+	{
+	  *g++ = '.';
+	}
+      else
+	*g++ = *s;
+    }
+}
+
+/* { dg-final { scan-tree-dump-times "Duplicating join block" 3 "split-paths" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/split-path-7.c b/gcc/testsuite/gcc.dg/tree-ssa/split-path-7.c
new file mode 100644
index 0000000..f14ab79
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/split-path-7.c
@@ -0,0 +1,94 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -fsplit-paths -fdump-tree-split-paths-details -w" } */
+
+
+struct _reent
+{
+};
+typedef unsigned char ichar_t;
+struct dent
+{
+  char *word;
+};
+struct flagent
+{
+  ichar_t *strip;
+  ichar_t *affix;
+  short stripl;
+  short affl;
+};
+union ptr_union
+{
+  struct flagptr *fp;
+  struct flagent *ent;
+};
+struct flagptr
+{
+  union ptr_union pu;
+  int numents;
+};
+struct hashheader
+{
+};
+extern struct dent *hashtbl;
+extern char *hashstrings;
+extern int hashsize;
+extern int nodictflag;
+extern int numsflags;
+extern int numpflags;
+extern struct flagent *pflaglist;
+extern struct flagent *sflaglist;
+int
+linit ()
+{
+  register int i;
+  register struct dent *dp;
+  struct flagent *entry;
+  struct flagptr *ind;
+  int viazero;
+  register ichar_t *cp;
+  if (!nodictflag)
+    {
+      for (i = hashsize, dp = hashtbl; --i >= 0; dp++)
+	{
+	  if (dp->word == (char *) -1)
+	    dp->word = ((void *) 0);
+	  else
+	    dp->word = &hashstrings[(int) (dp->word)];
+	}
+    }
+  for (i = numsflags + numpflags, entry = sflaglist; --i >= 0; entry++)
+    {
+      if (entry->stripl)
+	entry->strip = (ichar_t *) & hashstrings[(int) entry->strip];
+      else
+	entry->affix = ((void *) 0);
+    }
+  for (i = numsflags, entry = sflaglist; i > 0; i--, entry++)
+    {
+      if (entry->affl == 0)
+	{
+	  if (ind->pu.fp == ((void *) 0))
+	    {
+	    }
+	}
+    }
+  for (i = numpflags, entry = pflaglist; i > 0; i--, entry++)
+    {
+      if (entry->affl == 0)
+	{
+	  while (ind->numents == 0 && ind->pu.fp != ((void *) 0))
+	    {
+	      if (*cp == 0)
+		{
+		}
+	    }
+	}
+      if (!viazero && ind->numents >= 4
+	  && strcmp ((char *) (entry->affix),
+		     (char *) (ind->pu.ent->affix)) != 0)
+	{
+	}
+    }
+}
+/* { dg-final { scan-tree-dump-times "Duplicating join block" 2 "split-paths" } } */