diff mbox

Fix PR64878: do not jump thread across two loop iterations

Message ID 20150204212006.GA18747@f1.c.bardezibar.internal
State New
Headers show

Commit Message

Sebastian Pop Feb. 4, 2015, 9:20 p.m. UTC
Hi,

The attached patch stops the recursion in the detection of FSM jump-threads at
loop phi nodes after having visited a loop phi node.  This avoids jump-threading
two iterations forward that were possible due to a flip-flop operation that
exchange the value of the switch control variable as illustrated in the
testcase:

do {
  c = read_from_memory;
  switch (a) {
  case 0:
    if (c == ' ')
     [...]
    else
      a = b;   // flip-flop
    break;
  case 1:
    a = 0;
    b = 15; // this will jump-thread to 15
    break;

  case 15:
  [...]
  }
} while (...);

The patch has passed bootstrap and regression testing on x86_64-linux.
Ok to commit?

Thanks,
Sebastian

Comments

Richard Biener Feb. 5, 2015, 7:21 a.m. UTC | #1
On February 4, 2015 10:20:06 PM CET, Sebastian Pop <sebpop@gmail.com> wrote:
>Hi,
>
>The attached patch stops the recursion in the detection of FSM
>jump-threads at
>loop phi nodes after having visited a loop phi node.  This avoids
>jump-threading
>two iterations forward that were possible due to a flip-flop operation
>that
>exchange the value of the switch control variable as illustrated in the
>testcase:
>
>do {
>  c = read_from_memory;
>  switch (a) {
>  case 0:
>    if (c == ' ')
>     [...]
>    else
>      a = b;   // flip-flop
>    break;
>  case 1:
>    a = 0;
>    b = 15; // this will jump-thread to 15
>    break;
>
>  case 15:
>  [...]
>  }
>} while (...);
>
>The patch has passed bootstrap and regression testing on x86_64-linux.
>Ok to commit?

But is sounds like a useful optimization?

Richard.

>Thanks,
>Sebastian
Jakub Jelinek Feb. 5, 2015, 2:46 p.m. UTC | #2
On Thu, Feb 05, 2015 at 08:21:46AM +0100, Richard Biener wrote:
> On February 4, 2015 10:20:06 PM CET, Sebastian Pop <sebpop@gmail.com> wrote:
> >The attached patch stops the recursion in the detection of FSM
> >jump-threads at
> >loop phi nodes after having visited a loop phi node.  This avoids
> >jump-threading
> >two iterations forward that were possible due to a flip-flop operation
> >that
> >exchange the value of the switch control variable as illustrated in the
> >testcase:
> >
> >do {
> >  c = read_from_memory;
> >  switch (a) {
> >  case 0:
> >    if (c == ' ')
> >     [...]
> >    else
> >      a = b;   // flip-flop
> >    break;
> >  case 1:
> >    a = 0;
> >    b = 15; // this will jump-thread to 15
> >    break;
> >
> >  case 15:
> >  [...]
> >  }
> >} while (...);
> >
> >The patch has passed bootstrap and regression testing on x86_64-linux.
> >Ok to commit?
> 
> But is sounds like a useful optimization?

Only if we could do it reliably.  If it isn't fixable in a different way for
GCC 5.0, then I think it is better to live with a possibly
missed-optimization than wrong-code.

	Jakub
Sebastian Pop Feb. 5, 2015, 7:35 p.m. UTC | #3
Jakub Jelinek wrote:
> On Thu, Feb 05, 2015 at 08:21:46AM +0100, Richard Biener wrote:
> > On February 4, 2015 10:20:06 PM CET, Sebastian Pop <sebpop@gmail.com> wrote:
> > >The attached patch stops the recursion in the detection of FSM
> > >jump-threads at
> > >loop phi nodes after having visited a loop phi node.  This avoids
> > >jump-threading
> > >two iterations forward that were possible due to a flip-flop operation
> > >that
> > >exchange the value of the switch control variable as illustrated in the
> > >testcase:
> > >
> > >do {
> > >  c = read_from_memory;
> > >  switch (a) {
> > >  case 0:
> > >    if (c == ' ')
> > >     [...]
> > >    else
> > >      a = b;   // flip-flop
> > >    break;
> > >  case 1:
> > >    a = 0;
> > >    b = 15; // this will jump-thread to 15
> > >    break;
> > >
> > >  case 15:
> > >  [...]
> > >  }
> > >} while (...);
> > >
> > >The patch has passed bootstrap and regression testing on x86_64-linux.
> > >Ok to commit?
> > 
> > But is sounds like a useful optimization?
> 
> Only if we could do it reliably.  If it isn't fixable in a different way for
> GCC 5.0, then I think it is better to live with a possibly
> missed-optimization than wrong-code.

The alternative to correct the code generation for this jump-thread is to add
code to detect control dependences ("a = b" is dependent on all the conditions
around it, "c != ' ' && c != '/'") and to copy the control dependences over in
the duplicated jump-thread path.  I'm more confortable with the 3 line fix that
I submitted to disable the functionality than adding 100 lines to support copy
of control dependences.  From a perf point of view, the patch only disables one
jump-thread in the testcase that was failing, all the other jump-threads are
still enabled, and we do not regress on coremark.

Does it make sense to also add a check on how many jumps are threaded for the
testcase that I'm adding with this patch?

Thanks,
Sebastian
Jeff Law Feb. 6, 2015, 7:55 a.m. UTC | #4
On 02/04/15 14:20, Sebastian Pop wrote:
> Hi,
>
> The attached patch stops the recursion in the detection of FSM jump-threads at
> loop phi nodes after having visited a loop phi node.  This avoids jump-threading
> two iterations forward that were possible due to a flip-flop operation that
> exchange the value of the switch control variable as illustrated in the
> testcase:
>
> do {
>    c = read_from_memory;
>    switch (a) {
>    case 0:
>      if (c == ' ')
>       [...]
>      else
>        a = b;   // flip-flop
>      break;
>    case 1:
>      a = 0;
>      b = 15; // this will jump-thread to 15
>      break;
>
>    case 15:
>    [...]
>    }
> } while (...);
>
> The patch has passed bootstrap and regression testing on x86_64-linux.
> Ok to commit?
>
> Thanks,
> Sebastian
>
>
> 0001-PR-64878-do-not-jump-thread-across-more-than-one-bac.patch
>
>
>  From 59c486720749fc3d5feac2a5364d52eac60ba2d0 Mon Sep 17 00:00:00 2001
> From: Sebastian Pop<s.pop@samsung.com>
> Date: Wed, 4 Feb 2015 11:17:54 -0600
> Subject: [PATCH] PR 64878: do not jump thread across more than one back-edge
>
> 2015-02-04  Sebastian Pop<s.pop@samsung.com>
> 	    Brian Rzycki<b.rzycki@samsung.com>
>
> 	PR tree-optimization/64878
> 	* tree-ssa-threadedge.c: Include tree-ssa-loop.h.
> 	(fsm_find_control_statement_thread_paths): Add parameter seen_loop_phi.
> 	Stop recursion at loop phi nodes after having visited a loop phi node.
>
> 	* testsuite/gcc.dg/tree-ssa/ssa-dom-thread-8.c: New.

+      /* PR64878: do not take more than a loop phi node: it may be a 
flip-flop
+	 operation across two latch edges.  */

The comment doesn't make much sense.  Shouldn't it be something like "Do 
not walk through more than one loop PHI node"?

OK for the trunk with the comment clarified.

jeff
diff mbox

Patch

From 59c486720749fc3d5feac2a5364d52eac60ba2d0 Mon Sep 17 00:00:00 2001
From: Sebastian Pop <s.pop@samsung.com>
Date: Wed, 4 Feb 2015 11:17:54 -0600
Subject: [PATCH] PR 64878: do not jump thread across more than one back-edge

2015-02-04  Sebastian Pop  <s.pop@samsung.com>
	    Brian Rzycki  <b.rzycki@samsung.com>

	PR tree-optimization/64878
	* tree-ssa-threadedge.c: Include tree-ssa-loop.h.
	(fsm_find_control_statement_thread_paths): Add parameter seen_loop_phi.
	Stop recursion at loop phi nodes after having visited a loop phi node.

	* testsuite/gcc.dg/tree-ssa/ssa-dom-thread-8.c: New.
---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-8.c | 440 +++++++++++++++++++++++
 gcc/tree-ssa-threadedge.c                        |  20 +-
 2 files changed, 457 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-8.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-8.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-8.c
new file mode 100644
index 0000000..9be75aa
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-8.c
@@ -0,0 +1,440 @@ 
+/* PR 64878 */
+/* { dg-options "-O2" } */
+/* { dg-do run } */
+
+struct A { int a1; };
+struct B { char *b1; int b2; int b3; };
+struct C { char *c1; int c2; struct B *c3; };
+extern struct A *f1 (char *s);
+static struct A *f2 (struct C *x);
+__attribute__ ((noinline, noclone)) int f3 (struct A *x, struct A *z) { asm volatile ("" : : "g" (x), "g" (z) : "memory"); return 0; }
+__attribute__ ((noinline, noclone)) void f4 (struct A *x, char *y, struct A *z) { asm volatile ("" : : "g" (x), "g" (z), "g" (y) : "memory"); }
+__attribute__ ((noinline, noclone)) struct B *f5 (void) { static char b[32]; static struct B f3 = { b, 0, 32 }; return &f3; }
+__attribute__ ((noinline, noclone)) int f6 (struct B *p, char *w, int z) { asm volatile ("" : : "g" (p), "g" (w), "g" (z) : "memory"); return 0; }
+__attribute__ ((noinline, noclone)) void f7 (struct B *p) { asm volatile ("" : : "g" (p) : "memory"); }
+__attribute__ ((noinline, noclone)) void f8 (struct B *p) { asm volatile ("" : : "g" (p) : "memory"); }
+__attribute__ ((noinline, noclone)) void f9 (struct A *x) { asm volatile ("" : : "g" (x) : "memory"); }
+__attribute__ ((noinline, noclone)) struct A *f10 (void) { static struct A j; asm volatile ("" : :  : "memory"); return &j; }
+__attribute__ ((noinline, noclone)) struct A *f11 (void) { static struct A j; asm volatile ("" : :  : "memory"); return &j; }
+__attribute__ ((noinline, noclone)) struct A *f12 (int b) { static struct A j; asm volatile ("" : : "g" (b) : "memory"); return &j; }
+__attribute__ ((noinline, noclone)) struct A *f13 (int i) { static struct A j; asm volatile ("" : : "g" (i) : "memory"); return &j; }
+__attribute__ ((noinline, noclone)) struct A *f14 (double d) { static struct A j; asm volatile ("" : : "g" (&d) : "memory"); return &j; }
+__attribute__ ((noinline, noclone)) struct A *f15 (char *s) { static struct A j; asm volatile ("" : : "g" (s) : "memory"); return &j; }
+char *t = "0123456789abcdef";
+char *u = "0123456789.+-e";
+
+__attribute__ ((noinline, noclone)) struct A *
+f1 (char *s)
+{
+  struct C f;
+  struct A *o;
+  f.c1 = s;
+  f.c2 = 0;
+  f.c3 = f5 ();
+  o = f2 (&f);
+  f8 (f.c3);
+  return o;
+}
+
+static struct A *
+f2 (struct C *x)
+{
+  int a, b, e = 0;
+  struct A *f = 0, *o;
+  char *g = 0;
+  char h = '\0';
+  int i = 0, j = 0;
+  a = 0;
+  b = 1;
+  char c;
+  do
+    {
+      c = x->c1[x->c2];
+      switch (a)
+	{
+	case 0:
+	  if (c == ' ')
+	    x->c2++;
+	  else if (c == '/')
+	    {
+	      a = 4;
+	      j = x->c2++;
+	    }
+	  else
+	    a = b;
+	  break;
+	case 1:
+	  switch (c)
+	    {
+	    case '{':
+	      a = 0;
+	      b = 15;
+	      f = f10 ();
+	      x->c2++;
+	      break;
+	    case '[':
+	      a = 0;
+	      b = 13;
+	      f = f11 ();
+	      x->c2++;
+	      break;
+	    case 'N':
+	    case 'n':
+	      a = 3;
+	      j = x->c2++;
+	      break;
+	    case '"':
+	    case '\'':
+	      h = c;
+	      f7 (x->c3);
+	      a = 8;
+	      j = ++x->c2;
+	      break;
+	    case 'T':
+	    case 't':
+	    case 'F':
+	    case 'f':
+	      a = 11;
+	      j = x->c2++;
+	      break;
+	    case '0' ... '9':
+	    case '-':
+	      i = 0;
+	      a = 12;
+	      j = x->c2++;
+	      break;
+	    default:
+	      e = 1;
+	      goto out;
+	    }
+	  break;
+	case 2:
+	  goto out;
+	case 3:
+	  if (__builtin_strncmp ("null", x->c1 + j, x->c2 - j))
+	    {
+	      e = 2;
+	      goto out;
+	    }
+	  if (x->c2 - j == 4)
+	    {
+	      f = 0;
+	      b = 2;
+	      a = 0;
+	    }
+	  else
+	    x->c2++;
+	  break;
+	case 4:
+	  if (c == '*')
+	    a = 5;
+	  else if (c == '/')
+	    a = 6;
+	  else
+	    {
+	      e = 8;
+	      goto out;
+	    }
+	  x->c2++;
+	  break;
+	case 5:
+	  if (c == '*')
+	    a = 7;
+	  x->c2++;
+	  break;
+	case 6:
+	  if (c == '\n')
+	    a = 0;
+	  x->c2++;
+	  break;
+	case 7:
+	  if (c == '/')
+	    a = 0;
+	  else
+	    a = 5;
+	  x->c2++;
+	  break;
+	case 8:
+	  if (c == h)
+	    {
+	      f6 (x->c3, x->c1 + j, x->c2 - j);
+	      f = f15 (x->c3->b1);
+	      b = 2;
+	      a = 0;
+	    }
+	  else if (c == '\\')
+	    {
+	      b = 8;
+	      a = 9;
+	    }
+	  x->c2++;
+	  break;
+	case 9:
+	  switch (c)
+	    {
+	    case '"':
+	    case '\\':
+	      f6 (x->c3, x->c1 + j, x->c2 - j - 1);
+	      j = x->c2++;
+	      a = b;
+	      break;
+	    case 'b':
+	    case 'n':
+	    case 'r':
+	    case 't':
+	      f6 (x->c3, x->c1 + j, x->c2 - j - 1);
+	      if (c == 'b')
+		f6 (x->c3, "\b", 1);
+	      else if (c == 'n')
+		f6 (x->c3, "\n", 1);
+	      else if (c == 'r')
+		f6 (x->c3, "\r", 1);
+	      else if (c == 't')
+		f6 (x->c3, "\t", 1);
+	      j = ++x->c2;
+	      a = b;
+	      break;
+	    case 'u':
+	      f6 (x->c3, x->c1 + j, x->c2 - j - 1);
+	      j = ++x->c2;
+	      a = 10;
+	      break;
+	    default:
+	      e = 7;
+	      goto out;
+	    }
+	  break;
+	case 10:
+	  if (__builtin_strchr (t, c))
+	    {
+	      x->c2++;
+	      if (x->c2 - j == 4)
+		{
+		  unsigned char w[3];
+		  unsigned int s =
+		    (((x->c1[j] <= '9') ? x->c1[j] - '0' : (x->c1[j] & 7) + 9) << 12)
+		    + (((x->c1[j + 1] <= '9') ? x->c1[j + 1] - '0' : (x->c1[j + 1] & 7) + 9) << 8)
+		    + (((x->c1[j + 2] <= '9') ? x->c1[j + 2] - '0' : (x->c1[j + 2] & 7) + 9) << 4)
+		    + ((x->c1[j + 3] <= '9') ? x->c1[j + 3] - '0' : (x->c1[j + 3] & 7) + 9);
+		  if (s < 0x80)
+		    {
+		      w[0] = s;
+		      f6 (x->c3, (char *) w, 1);
+		    }
+		  else if (s < 0x800)
+		    {
+		      w[0] = 0xc0 | (s >> 6);
+		      w[1] = 0x80 | (s & 0x3f);
+		      f6 (x->c3, (char *) w, 2);
+		    }
+		  else
+		    {
+		      w[0] = 0x0 | (s >> 12);
+		      w[1] = 0x80 | ((s >> 6) & 0x3f);
+		      w[2] = 0x80 | (s & 0x3f);
+		      f6 (x->c3, (char *) w, 3);
+		    }
+		  j = x->c2;
+		  a = b;
+		}
+	    }
+	  else
+	    {
+	      e = 7;
+	      goto out;
+	    }
+	  break;
+	case 11:
+	  if (__builtin_strncmp ("true", x->c1 + j, x->c2 - j) == 0)
+	    {
+	      if (x->c2 - j == 4)
+		{
+		  f = f12 (1);
+		  b = 2;
+		  a = 0;
+		}
+	      else
+		x->c2++;
+	    }
+	  else if (__builtin_strncmp ("false", x->c1 + j, x->c2 - j) == 0)
+	    {
+	      if (x->c2 - j == 5)
+		{
+		  f = f12 (0);
+		  b = 2;
+		  a = 0;
+		}
+	      else
+		x->c2++;
+	    }
+	  else
+	    {
+	      e = 3;
+	      goto out;
+	    }
+	  break;
+	case 12:
+	  if (!c || !__builtin_strchr (u, c))
+	    {
+	      if (!i)
+		f = f13 (0);
+	      else
+		f = f14 (0.0);
+	      b = 2;
+	      a = 0;
+	    }
+	  else
+	    {
+	      if (c == '.' || c == 'e')
+		i = 1;
+	      x->c2++;
+	    }
+	  break;
+	case 13:
+	  if (c == ']')
+	    {
+	      x->c2++;
+	      b = 2;
+	      a = 0;
+	    }
+	  else
+	    {
+	      o = f2 (x);
+	      if (((unsigned long) o > (unsigned long) -4000L))
+		{
+		  e = 5;
+		  goto out;
+		}
+	      f3 (f, o);
+	      b = 14;
+	      a = 0;
+	    }
+	  break;
+	case 14:
+	  if (c == ']')
+	    {
+	      x->c2++;
+	      b = 2;
+	      a = 0;
+	    }
+	  else if (c == ',')
+	    {
+	      x->c2++;
+	      b = 13;
+	      a = 0;
+	    }
+	  else
+	    {
+	      f9 (f);
+	      e = 5;
+	      goto out;
+	    }
+	  break;
+	case 15:
+	  a = 16;
+	  j = x->c2;
+	  break;
+	case 16:
+	  if (c == '}')
+	    {
+	      x->c2++;
+	      b = 2;
+	      a = 0;
+	    }
+	  else if (c == '"' || c == '\'')
+	    {
+	      h = c;
+	      f7 (x->c3);
+	      a = 17;
+	      j = ++x->c2;
+	    }
+	  else
+	    {
+	      e = 6;
+	      goto out;
+	    }
+	  break;
+	case 17:
+	  if (c == h)
+	    {
+	      f6 (x->c3, x->c1 + j, x->c2 - j);
+	      g = __builtin_strdup (x->c3->b1);
+	      b = 18;
+	      a = 0;
+	    }
+	  else if (c == '\\')
+	    {
+	      b = 17;
+	      a = 9;
+	    }
+	  x->c2++;
+	  break;
+	case 18:
+	  if (c == ':')
+	    {
+	      x->c2++;
+	      b = 19;
+	      a = 0;
+	    }
+	  else
+	    {
+	      e = -6;
+	      goto out;
+	    }
+	  break;
+	case 19:
+	  o = f2 (x);
+	  if (((unsigned long) o > (unsigned long) -4000L))
+	    {
+	      e = 6;
+	      goto out;
+	    }
+	  f4 (f, g, o);
+	  __builtin_free (g);
+	  g = 0;
+	  b = 20;
+	  a = 0;
+	  break;
+	case 20:
+	  if (c == '}')
+	    {
+	      x->c2++;
+	      b = 2;
+	      a = 0;
+	    }
+	  else if (c == ',')
+	    {
+	      x->c2++;
+	      b = 15;
+	      a = 0;
+	    }
+	  else
+	    {
+	      e = 6;
+	      goto out;
+	    }
+	  break;
+	}
+    }
+  while (c);
+  if (a != 2 && b != 2)
+    e = 9;
+out:
+  __builtin_free (g);
+  if (e == 0)
+    return f;
+  f9 (f);
+  return 0;
+}
+
+int
+main ()
+{
+  asm volatile ("" : : : "memory");
+  struct A *r = f1 ("{ \"id\": null, \"blahah\": \"foobarbazbar\", \"barbar\": { \"barbarbarba\":"
+		    "\"abcdefgh\", \"ijklmnopqr\": \"stuvwxyzabcdefghijklmnopqrstuv\", \"xyzxyz\":"
+		    " [ \"1\" ] } }");
+  if (!r)
+    __builtin_abort ();
+  return 0;
+}
diff --git a/gcc/tree-ssa-threadedge.c b/gcc/tree-ssa-threadedge.c
index 9e10e44..b2e7764 100644
--- a/gcc/tree-ssa-threadedge.c
+++ b/gcc/tree-ssa-threadedge.c
@@ -61,6 +61,7 @@  along with GCC; see the file COPYING3.  If not see
 #include "langhooks.h"
 #include "params.h"
 #include "tree-ssa-threadedge.h"
+#include "tree-ssa-loop.h"
 #include "builtins.h"
 #include "cfg.h"
 #include "cfganal.h"
@@ -1006,7 +1007,8 @@  static int max_threaded_paths;
 static void
 fsm_find_control_statement_thread_paths (tree expr,
 					 hash_set<gimple> *visited_phis,
-					 vec<basic_block, va_gc> *&path)
+					 vec<basic_block, va_gc> *&path,
+					 bool seen_loop_phi)
 {
   tree var = SSA_NAME_VAR (expr);
   gimple def_stmt = SSA_NAME_DEF_STMT (expr);
@@ -1030,6 +1032,15 @@  fsm_find_control_statement_thread_paths (tree expr,
   int next_path_length = 0;
   basic_block last_bb_in_path = path->last ();
 
+  if (loop_containing_stmt (phi)->header == gimple_bb (phi))
+    {
+      /* PR64878: do not take more than a loop phi node: it may be a flip-flop
+	 operation across two latch edges.  */
+      if (seen_loop_phi)
+	return;
+      seen_loop_phi = true;
+    }
+
   /* Following the chain of SSA_NAME definitions, we jumped from a definition in
      LAST_BB_IN_PATH to a definition in VAR_BB.  When these basic blocks are
      different, append to PATH the blocks from LAST_BB_IN_PATH to VAR_BB.  */
@@ -1090,7 +1101,9 @@  fsm_find_control_statement_thread_paths (tree expr,
 	{
 	  vec_safe_push (path, bbi);
 	  /* Recursively follow SSA_NAMEs looking for a constant definition.  */
-	  fsm_find_control_statement_thread_paths (arg, visited_phis, path);
+	  fsm_find_control_statement_thread_paths (arg, visited_phis, path,
+						   seen_loop_phi);
+
 	  path->pop ();
 	  continue;
 	}
@@ -1357,7 +1370,8 @@  thread_through_normal_block (edge e,
       hash_set<gimple> *visited_phis = new hash_set<gimple>;
 
       max_threaded_paths = PARAM_VALUE (PARAM_MAX_FSM_THREAD_PATHS);
-      fsm_find_control_statement_thread_paths (cond, visited_phis, bb_path);
+      fsm_find_control_statement_thread_paths (cond, visited_phis, bb_path,
+					       false);
 
       delete visited_phis;
       vec_free (bb_path);
-- 
1.9.1