
Eliminates phi on branch for CMP result

Message ID 1557152644-3063-1-git-send-email-guojiufu@linux.ibm.com
State New
Series: Eliminates phi on branch for CMP result

Commit Message

Jiufu Guo May 6, 2019, 2:24 p.m. UTC
Hi,

This patch implements the optimization in PR77820.  The optimization
eliminates a phi and the phi's basic block, if the phi is used only by a
conditional branch and each of the phi's incoming values is the result of
a CMP.

This optimization eliminates:
1. saving the CMP result into a register: p0 = a CMP b.
2. the additional CMP on the branch: if (phi != 0).
3. the conversion of the CMP result, if there is one: phi = (INT_CONV) p0.

  <P0>
  p0 = a CMP b
  goto <X>;

  <P1>
  p1 = c CMP d
  goto <X>;

  <X>
  # phi = PHI <p0 (P0), p1 (P1)>
  if (phi != 0) goto <Y>; else goto <Z>;

Transform to:

  <P0>
  p0 = a CMP b
  if (p0 != 0) goto <Y>; else goto <Z>;

  <P1>
  p1 = c CMP d
  if (p1 != 0) goto <Y>; else goto <Z>;
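
For reference, a C function roughly like the following (similar in shape to
the new testcases; the function and variable names here are illustrative
only) produces the GIMPLE shown above:

  void g (void);

  void
  f (long a, long b, long c, long d, int x)
  {
    _Bool t;
    if (x)
      t = a < b;    /* <P0>: p0 = a CMP b */
    else
      t = c < d;    /* <P1>: p1 = c CMP d */

    if (t)          /* <X>: branch on the PHI of p0/p1 */
      g ();
  }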

Bootstrapped and tested on powerpc64le with no regressions, and new testcases
were added.  Is this OK for trunk?

Thanks!

[gcc]
2019-05-06  Jiufu Guo  <guojiufu@linux.ibm.com>
	    Lijia He  <helijia@linux.ibm.com>

	PR tree-optimization/77820
	* tree-ssa-mergephicmp.c: New file.
	* Makefile.in (OBJS): Add tree-ssa-mergephicmp.o.
	* common.opt (ftree-mergephicmp): New flag.
	* passes.def (pass_mergephicmp): New pass.
	* timevar.def (TV_TREE_MERGEPHICMP): New timevar.
	* tree-pass.h (make_pass_mergephicmp): New declaration.

[gcc/testsuite]
2019-05-06  Jiufu Guo  <guojiufu@linux.ibm.com>
	    Lijia He  <helijia@linux.ibm.com>

	PR tree-optimization/77820
	* gcc.dg/tree-ssa/phi_on_compare-1.c: New testcase.
	* gcc.dg/tree-ssa/phi_on_compare-2.c: New testcase.


---
 gcc/Makefile.in                                  |   1 +
 gcc/common.opt                                   |   4 +
 gcc/passes.def                                   |   1 +
 gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-1.c |  31 +++
 gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-2.c |  31 +++
 gcc/timevar.def                                  |   1 +
 gcc/tree-pass.h                                  |   1 +
 gcc/tree-ssa-mergephicmp.c                       | 260 +++++++++++++++++++++++
 8 files changed, 330 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-1.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-2.c
 create mode 100644 gcc/tree-ssa-mergephicmp.c

Comments

Richard Biener May 7, 2019, 1:31 p.m. UTC | #1
On Mon, 6 May 2019, Jiufu Guo wrote:

> Hi,
> 
> This patch implements the optimization in PR77820.  The optimization
> eliminates phi and phi's basic block, if the phi is used only by
> condition branch, and the phi's incoming value in the result of a
> CMP result.
> 
> This optimization eliminates:
> 1. saving CMP result: p0 = a CMP b.
> 2. additional CMP on branch: if (phi != 0).
> 3. converting CMP result if there is phi = (INT_CONV) p0 if there is.
> 
>   <P0>
>   p0 = a CMP b
>   goto <X>;
> 
>   <P1>
>   p1 = c CMP d
>   goto <X>;
> 
>   <X>
>   # phi = PHI <p0 (P0), p1 (P1)>
>   if (phi != 0) goto <Y>; else goto <Z>;
> 
> Transform to:
> 
>   <P0>
>   p0 = a CMP b
>   if (p0 != 0) goto <Y>; else goto <Z>;
> 
>   <P1>
>   p1 = c CMP d
>   if (p1 != 0) goto <Y>; else goto <Z>;
> 
> Bootstrapped and tested on powerpc64le with no regressions, and testcases were
> saved. Is this ok for trunk?

I'm not sure I like a new pass here ;)  The transform is basically
tail-duplicating the PHI block because the exit conditional can
be "simplfied" - that's something jump threading generally does
though it relies on "simplified" being a little bit more simplified
than above.

I suspect this transform was implemented because of some benchmark?

I suspect the performance benefit is because of better branch
prediction by not mangling both conditional branches into one?

The transform is also somewhat similar to tail-duplication done
in path splitting or tracer.

The pass itself does quite strict pattern-matching but I wonder
if more cases should be handled this way.

Any specific reason for the pass placement between PRE and sinking?
tracer and path splitting run much later, jump threading runs
all over the place.

Thanks,
Richard.

> Thanks!
> 
> [gcc]
> 2019-05-06  Jiufu Guo  <guojiufu@linux.ibm.com>
> 	    Lijia He  <helijia@linux.ibm.com>
> 
> 	PR tree-optimization/77820
> 	* tree-ssa-mergephicmp.c: New file.
> 	* Makefile.in (OBJS): Add tree-ssa-mergephicmp.o.
> 	* common.opt (ftree-mergephicmp): New flag.
> 	* passes.def (pass_mergephicmp): New pass.
> 	* timevar.def (TV_TREE_MERGEPHICMP): New timevar.
> 	* tree-pass.h: New file.
> 
> [gcc/testsuite]
> 2019-05-06  Jiufu Guo  <guojiufu@linux.ibm.com>
> 	    Lijia He  <helijia@linux.ibm.com>
> 
> 	PR tree-optimization/77820
> 	* gcc.dg/tree-ssa/phi_on_compare-1.c: New testcase.
> 	* gcc.dg/tree-ssa/phi_on_compare-2.c: New testcase.
> 
> 
> ---
>  gcc/Makefile.in                                  |   1 +
>  gcc/common.opt                                   |   4 +
>  gcc/passes.def                                   |   1 +
>  gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-1.c |  31 +++
>  gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-2.c |  31 +++
>  gcc/timevar.def                                  |   1 +
>  gcc/tree-pass.h                                  |   1 +
>  gcc/tree-ssa-mergephicmp.c                       | 260 +++++++++++++++++++++++
>  8 files changed, 330 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-2.c
>  create mode 100644 gcc/tree-ssa-mergephicmp.c
> 
> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> index d186d71..9729501 100644
> --- a/gcc/Makefile.in
> +++ b/gcc/Makefile.in
> @@ -1567,6 +1567,7 @@ OBJS = \
>  	tree-ssa-reassoc.o \
>  	tree-ssa-sccvn.o \
>  	tree-ssa-scopedtables.o \
> +	tree-ssa-mergephicmp.o \
>  	tree-ssa-sink.o \
>  	tree-ssa-strlen.o \
>  	tree-ssa-structalias.o \
> diff --git a/gcc/common.opt b/gcc/common.opt
> index d342c4f..5ea5ed2 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -2702,6 +2702,10 @@ ftree-salias
>  Common Ignore
>  Does nothing.  Preserved for backward compatibility.
>  
> +ftree-mergephicmp
> +Common Report Var(flag_mergephicmp) Init(1) Optimization
> +Enable optimization on branch phi compare on trees.
> +
>  ftree-sink
>  Common Report Var(flag_tree_sink) Optimization
>  Enable SSA code sinking on trees.
> diff --git a/gcc/passes.def b/gcc/passes.def
> index 446a7c4..e3a3913 100644
> --- a/gcc/passes.def
> +++ b/gcc/passes.def
> @@ -249,6 +249,7 @@ along with GCC; see the file COPYING3.  If not see
>        NEXT_PASS (pass_lim);
>        NEXT_PASS (pass_walloca, false);
>        NEXT_PASS (pass_pre);
> +      NEXT_PASS (pass_mergephicmp);
>        NEXT_PASS (pass_sink_code);
>        NEXT_PASS (pass_sancov);
>        NEXT_PASS (pass_asan);
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-1.c b/gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-1.c
> new file mode 100644
> index 0000000..2e3f4f6
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-1.c
> @@ -0,0 +1,31 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-mergephicmp" } */
> +
> +void g (void);
> +void g1 (void);
> +
> +void
> +f (long a, long b, long c, long d, int x)
> +{
> +  _Bool t;
> +  if (x)
> +    {
> +      t = a < b;
> +    }
> +  else if (d == a + b)
> +    {
> +      t = c < d;
> +    }
> +  else
> +    {
> +      t = a == c;
> +    }
> +
> +  if (t)
> +    {
> +      g1 ();
> +      g ();
> +    }
> +}
> +
> +/* { dg-final { scan-tree-dump-not "PHI" "mergephicmp" } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-2.c b/gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-2.c
> new file mode 100644
> index 0000000..7c35417
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-2.c
> @@ -0,0 +1,31 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-mergephicmp" } */
> +
> +void g (void);
> +void g1 (void);
> +
> +void
> +f (long a, long b, long c, long d, int x)
> +{
> +  int t;
> +  if (x)
> +    {
> +      t = a < b;
> +    }
> +  else if (d == a + b)
> +    {
> +      t = c < d;
> +    }
> +  else
> +    {
> +      t = a == c;
> +    }
> +
> +  if (t)
> +    {
> +      g1 ();
> +      g ();
> +    }
> +}
> +
> +/* { dg-final { scan-tree-dump-times "PHI" 0 "mergephicmp" } } */
> diff --git a/gcc/timevar.def b/gcc/timevar.def
> index 5415446..91f92d7 100644
> --- a/gcc/timevar.def
> +++ b/gcc/timevar.def
> @@ -170,6 +170,7 @@ DEFTIMEVAR (TV_TREE_SPLIT_EDGES      , "tree split crit edges")
>  DEFTIMEVAR (TV_TREE_REASSOC          , "tree reassociation")
>  DEFTIMEVAR (TV_TREE_PRE		     , "tree PRE")
>  DEFTIMEVAR (TV_TREE_FRE		     , "tree FRE")
> +DEFTIMEVAR (TV_TREE_MERGEPHICMP	     , "tree branch on PHI compare")
>  DEFTIMEVAR (TV_TREE_SINK             , "tree code sinking")
>  DEFTIMEVAR (TV_TREE_PHIOPT	     , "tree linearize phis")
>  DEFTIMEVAR (TV_TREE_BACKPROP	     , "tree backward propagate")
> diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
> index 47be59b..e21e820 100644
> --- a/gcc/tree-pass.h
> +++ b/gcc/tree-pass.h
> @@ -441,6 +441,7 @@ extern gimple_opt_pass *make_pass_tree_ifcombine (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_dse (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_nrv (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_rename_ssa_copies (gcc::context *ctxt);
> +extern gimple_opt_pass *make_pass_mergephicmp (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_sink_code (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_fre (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_check_data_deps (gcc::context *ctxt);
> diff --git a/gcc/tree-ssa-mergephicmp.c b/gcc/tree-ssa-mergephicmp.c
> new file mode 100644
> index 0000000..7bb82d5
> --- /dev/null
> +++ b/gcc/tree-ssa-mergephicmp.c
> @@ -0,0 +1,260 @@
> +/* Eliminate PHI nodes which come from comparisons and are used by branches.
> +   Copyright (C) 2004-2019 Free Software Foundation, Inc.
> +
> +This file is part of GCC.
> +
> +GCC is free software; you can redistribute it and/or modify it
> +under the terms of the GNU General Public License as published by the
> +Free Software Foundation; either version 3, or (at your option) any
> +later version.
> +
> +GCC is distributed in the hope that it will be useful, but WITHOUT
> +ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> +for more details.
> +
> +You should have received a copy of the GNU General Public License
> +along with GCC; see the file COPYING3.  If not see
> +<http://www.gnu.org/licenses/>.  */
> +
> +#include "config.h"
> +#include "system.h"
> +#include "coretypes.h"
> +#include "backend.h"
> +#include "tree.h"
> +#include "gimple.h"
> +#include "cfghooks.h"
> +#include "tree-pass.h"
> +#include "ssa.h"
> +#include "gimple-iterator.h"
> +#include "tree-cfg.h"
> +
> +/* Return true if it is ok to merge phi's incoming value:
> +  - phi's incoming value is
> +  phi_arg = a CMP b
> +  or
> +  cmp = a CMP b
> +  phi_arg = (INT_CONV) cmp
> +  - phi's incoming value is defined in incoming basic block.
> +
> +  * Parameter PHI: the phi to be checked.
> +  * Parameter INDEX: index'th incoming argument of phi to be checked. */
> +static bool
> +could_incoming_merge (gphi *phi, int index)
> +{
> +  tree value = gimple_phi_arg_def (phi, index);
> +  if (!(TREE_CODE (value) == SSA_NAME && has_single_use (value)))
> +    return false;
> +
> +  gimple *def = SSA_NAME_DEF_STMT (value);
> +
> +  if (!is_gimple_assign (def))
> +    return false;
> +
> +  if (CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (def)))
> +    {
> +      value = gimple_assign_rhs1 (def);
> +      if (!has_single_use (value))
> +	return false;
> +
> +      def = SSA_NAME_DEF_STMT (value);
> +
> +      if (!is_gimple_assign (def))
> +	return false;
> +    }
> +
> +  if (TREE_CODE_CLASS (gimple_assign_rhs_code (def)) != tcc_comparison)
> +    return false;
> +
> +  /* Check if phi's incoming value is defined in the incoming basic_block.  */
> +  edge e = gimple_phi_arg_edge (phi, index);
> +  if (def->bb != e->src)
> +    return false;
> +
> +  if (!single_succ_p (def->bb))
> +    return false;
> +
> +  return true;
> +}
> +
> +/* Return true if the basic_block is ok to optimize:
> +   <X>
> +   res = PHI <v0(b0), v1(b1)...>
> +   if (res != 0) goto l_true; goto l_false;
> +
> +   The phi stmt is saved to argument PHI.
> +   The gcond stmt is saved to argument GC.  */
> +static bool
> +could_optimize_phi_bb (basic_block bb, gphi **phi, gcond **gc)
> +{
> +  /* There should be only one phi.  */
> +  gphi_iterator pi = gsi_start_phis (bb);
> +  if (gsi_end_p (pi))
> +    return false;
> +
> +  *phi = pi.phi ();
> +  if (!has_single_use ((*phi)->result))
> +    return false;
> +
> +  gsi_next (&pi);
> +  if (!gsi_end_p (pi))
> +    return false;
> +
> +  /* There should be only one stmt which is gcond.  */
> +  gimple *gs = last_and_only_stmt (bb);
> +  if (gs == NULL)
> +    return false;
> +
> +  if (!is_a<gcond *> (gs))
> +    return false;
> +
> +  *gc = as_a<gcond *> (gs);
> +  if (gimple_cond_lhs (*gc) != (*phi)->result)
> +    return false;
> +
> +  /* Check if all incoming basic blocks can merge.  */
> +  for (unsigned int i = 0; i < (*phi)->nargs; i++)
> +    if (!could_incoming_merge (*phi, i))
> +      return false;
> +
> +  /* Check if there is no phi in any successors.  */
> +  edge e;
> +  edge_iterator ei;
> +  FOR_EACH_EDGE (e, ei, (*phi)->bb->succs)
> +    {
> +      if (!gsi_end_p (gsi_start_phis (e->dest)))
> +	return false;
> +    }
> +
> +  return true;
> +}
> +
> +/* Merge the phi and the gcond into the index'th incoming block.  */
> +static void
> +merge_incoming_bb (gcond *gc, gphi *phi, int index)
> +{
> +  tree value = gimple_phi_arg_def (phi, index);
> +
> +  gcond *new_gc = gimple_build_cond (gimple_cond_code (gc), value,
> +				     gimple_cond_rhs (gc), NULL, NULL);
> +
> +  gimple *def = SSA_NAME_DEF_STMT (value);
> +  gimple_stmt_iterator last = gsi_last_bb (def->bb);
> +  gsi_insert_after (&last, new_gc, GSI_NEW_STMT);
> +
> +  edge e;
> +  edge_iterator ei;
> +  FOR_EACH_EDGE (e, ei, phi->bb->succs)
> +    {
> +      edge new_e = make_edge (def->bb, e->dest, e->flags);
> +      new_e->probability = e->probability;
> +    }
> +}
> +
> +/* If there are basic blocks look like:
> +  <P0>
> +  p0 = a CMP b
> +  goto <X>;
> +
> +  <P1>
> +  p1 = c CMP d
> +  goto <X>;
> +
> +  <X>
> +  # phi = PHI <p0 (P0), p1 (P1)>
> +  if (phi != 0) goto <Y>; else goto <Z>;
> +
> +Transform it to:
> +
> +  <P0>
> +  p0 = a CMP b
> +  if (p0 != 0) goto <Y>; else goto <Z>;
> +
> +  <P1>
> +  p1 = c CMP d
> +  if (p1 != 0) goto <Y>; else goto <Z>; */
> +
> +static bool
> +mergephicmp_once (function *fun)
> +{
> +  bool change = false;
> +  basic_block bb;
> +  basic_block prev_need_delete = NULL;
> +
> +  FOR_EACH_BB_FN (bb, fun)
> +    {
> +      gphi *phi;
> +      gcond *gc;
> +
> +      /* Prev bb can be deleted only after iterator move to next bb.  */
> +      if (prev_need_delete)
> +	delete_basic_block (prev_need_delete);
> +      prev_need_delete = NULL;
> +
> +      if (could_optimize_phi_bb (bb, &phi, &gc))
> +	{
> +	  for (unsigned int i = 0; i < phi->nargs; i++)
> +	    merge_incoming_bb (gc, phi, i);
> +
> +	  change = true;
> +	  prev_need_delete = bb;
> +	}
> +    }
> +
> +  return change;
> +}
> +
> +namespace {
> +
> +const pass_data pass_data_mergephicmp = {
> +  GIMPLE_PASS,		/* type */
> +  "mergephicmp",       /* name */
> +  OPTGROUP_NONE,	/* optinfo_flags */
> +  TV_TREE_MERGEPHICMP, /* tv_id */
> +  /* PROP_no_crit_edges is ensured by running split_critical_edges in
> +     pass_data_sink_code::execute ().  */
> +  (PROP_cfg | PROP_ssa), /* properties_required */
> +  0,			 /* properties_provided */
> +  0,			 /* properties_destroyed */
> +  0,			 /* todo_flags_start */
> +  0,			 /* todo_flags_finish */
> +};
> +
> +class pass_mergephicmp : public gimple_opt_pass
> +{
> +public:
> +  pass_mergephicmp (gcc::context *ctxt)
> +    : gimple_opt_pass (pass_data_mergephicmp, ctxt)
> +  {
> +  }
> +
> +  /* opt_pass methods: */
> +  virtual bool gate (function *) { return flag_mergephicmp != 0; }
> +
> +  virtual unsigned int execute (function *);
> +
> +}; // class pass_mergephicmp
> +
> +unsigned int
> +pass_mergephicmp::execute (function *fun)
> +{
> +  bool changed;
> +
> +  if (n_basic_blocks_for_fn (fun) <= NUM_FIXED_BLOCKS + 1)
> +    return 0;
> +
> +  changed = mergephicmp_once (fun);
> +
> +  if (changed)
> +    free_dominance_info (CDI_DOMINATORS);
> +
> +  return changed ? TODO_cleanup_cfg : 0;
> +}
> +
> +} // anon namespace
> +
> +gimple_opt_pass *
> +make_pass_mergephicmp (gcc::context *ctxt)
> +{
> +  return new pass_mergephicmp (ctxt);
> +}
>
Segher Boessenkool May 7, 2019, 4:42 p.m. UTC | #2
Let me try to answer some of this...

On Tue, May 07, 2019 at 03:31:27PM +0200, Richard Biener wrote:
> On Mon, 6 May 2019, Jiufu Guo wrote:
> > This patch implements the optimization in PR77820.  The optimization
> > eliminates phi and phi's basic block, if the phi is used only by
> > condition branch, and the phi's incoming value in the result of a
> > CMP result.

> I'm not sure I like a new pass here ;)  The transform is basically
> tail-duplicating the PHI block because the exit conditional can
> be "simplfied" - that's something jump threading generally does
> though it relies on "simplified" being a little bit more simplified
> than above.

Right, but where in the pipeline does this fit in?

> I suspect this transform was implemented because of some benchmark?

Something in SPEC...  2006 iirc...  Will need to dig it up, I forgot
the details.

> I suspect the performance benefit is because of better branch
> prediction by not mangling both conditional branches into one?

No, it is that previously a condition was moved to a GPR, and then compared
again.  See PR77820.  This is expensive, and serial, too.

> The transform is also somewhat similar to tail-duplication done
> in path splitting or tracer.

Yes.

> The pass itself does quite strict pattern-matching but I wonder
> if more cases should be handled this way.

Maybe.  Probably.  But which?

> Any specific reason for the pass placement between PRE and sinking?
> tracer and path splitting run much later, jump threading runs
> all over the place.

Dunno.  Jiufu, does the pass placement matter much?


Segher
Andrew Pinski May 7, 2019, 5:14 p.m. UTC | #3
On Mon, May 6, 2019 at 7:24 AM Jiufu Guo <guojiufu@linux.ibm.com> wrote:
>
> Hi,
>
> This patch implements the optimization in PR77820.  The optimization
> eliminates phi and phi's basic block, if the phi is used only by
> condition branch, and the phi's incoming value in the result of a
> CMP result.
>
> This optimization eliminates:
> 1. saving CMP result: p0 = a CMP b.
> 2. additional CMP on branch: if (phi != 0).
> 3. converting CMP result if there is phi = (INT_CONV) p0 if there is.
>
>   <P0>
>   p0 = a CMP b
>   goto <X>;
>
>   <P1>
>   p1 = c CMP d
>   goto <X>;
>
>   <X>
>   # phi = PHI <p0 (P0), p1 (P1)>
>   if (phi != 0) goto <Y>; else goto <Z>;
>
> Transform to:
>
>   <P0>
>   p0 = a CMP b
>   if (p0 != 0) goto <Y>; else goto <Z>;
>
>   <P1>
>   p1 = c CMP d
>   if (p1 != 0) goto <Y>; else goto <Z>;
>
> Bootstrapped and tested on powerpc64le with no regressions, and testcases were
> saved. Is this ok for trunk?

forwprop was created originally to do something similar, but this case is a
specific case of backwards prop (almost).
I wonder if it could be combined with that or, as Richard mentioned,
jump threading.

Thanks,
Andrew Pinski

>
> Thanks!
>
> [gcc]
> 2019-05-06  Jiufu Guo  <guojiufu@linux.ibm.com>
>             Lijia He  <helijia@linux.ibm.com>
>
>         PR tree-optimization/77820
>         * tree-ssa-mergephicmp.c: New file.
>         * Makefile.in (OBJS): Add tree-ssa-mergephicmp.o.
>         * common.opt (ftree-mergephicmp): New flag.
>         * passes.def (pass_mergephicmp): New pass.
>         * timevar.def (TV_TREE_MERGEPHICMP): New timevar.
>         * tree-pass.h: New file.
>
> [gcc/testsuite]
> 2019-05-06  Jiufu Guo  <guojiufu@linux.ibm.com>
>             Lijia He  <helijia@linux.ibm.com>
>
>         PR tree-optimization/77820
>         * gcc.dg/tree-ssa/phi_on_compare-1.c: New testcase.
>         * gcc.dg/tree-ssa/phi_on_compare-2.c: New testcase.
>
>
> ---
>  gcc/Makefile.in                                  |   1 +
>  gcc/common.opt                                   |   4 +
>  gcc/passes.def                                   |   1 +
>  gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-1.c |  31 +++
>  gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-2.c |  31 +++
>  gcc/timevar.def                                  |   1 +
>  gcc/tree-pass.h                                  |   1 +
>  gcc/tree-ssa-mergephicmp.c                       | 260 +++++++++++++++++++++++
>  8 files changed, 330 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-2.c
>  create mode 100644 gcc/tree-ssa-mergephicmp.c
>
> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> index d186d71..9729501 100644
> --- a/gcc/Makefile.in
> +++ b/gcc/Makefile.in
> @@ -1567,6 +1567,7 @@ OBJS = \
>         tree-ssa-reassoc.o \
>         tree-ssa-sccvn.o \
>         tree-ssa-scopedtables.o \
> +       tree-ssa-mergephicmp.o \
>         tree-ssa-sink.o \
>         tree-ssa-strlen.o \
>         tree-ssa-structalias.o \
> diff --git a/gcc/common.opt b/gcc/common.opt
> index d342c4f..5ea5ed2 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -2702,6 +2702,10 @@ ftree-salias
>  Common Ignore
>  Does nothing.  Preserved for backward compatibility.
>
> +ftree-mergephicmp
> +Common Report Var(flag_mergephicmp) Init(1) Optimization
> +Enable optimization on branch phi compare on trees.
> +
>  ftree-sink
>  Common Report Var(flag_tree_sink) Optimization
>  Enable SSA code sinking on trees.
> diff --git a/gcc/passes.def b/gcc/passes.def
> index 446a7c4..e3a3913 100644
> --- a/gcc/passes.def
> +++ b/gcc/passes.def
> @@ -249,6 +249,7 @@ along with GCC; see the file COPYING3.  If not see
>        NEXT_PASS (pass_lim);
>        NEXT_PASS (pass_walloca, false);
>        NEXT_PASS (pass_pre);
> +      NEXT_PASS (pass_mergephicmp);
>        NEXT_PASS (pass_sink_code);
>        NEXT_PASS (pass_sancov);
>        NEXT_PASS (pass_asan);
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-1.c b/gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-1.c
> new file mode 100644
> index 0000000..2e3f4f6
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-1.c
> @@ -0,0 +1,31 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-mergephicmp" } */
> +
> +void g (void);
> +void g1 (void);
> +
> +void
> +f (long a, long b, long c, long d, int x)
> +{
> +  _Bool t;
> +  if (x)
> +    {
> +      t = a < b;
> +    }
> +  else if (d == a + b)
> +    {
> +      t = c < d;
> +    }
> +  else
> +    {
> +      t = a == c;
> +    }
> +
> +  if (t)
> +    {
> +      g1 ();
> +      g ();
> +    }
> +}
> +
> +/* { dg-final { scan-tree-dump-not "PHI" "mergephicmp" } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-2.c b/gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-2.c
> new file mode 100644
> index 0000000..7c35417
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-2.c
> @@ -0,0 +1,31 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-mergephicmp" } */
> +
> +void g (void);
> +void g1 (void);
> +
> +void
> +f (long a, long b, long c, long d, int x)
> +{
> +  int t;
> +  if (x)
> +    {
> +      t = a < b;
> +    }
> +  else if (d == a + b)
> +    {
> +      t = c < d;
> +    }
> +  else
> +    {
> +      t = a == c;
> +    }
> +
> +  if (t)
> +    {
> +      g1 ();
> +      g ();
> +    }
> +}
> +
> +/* { dg-final { scan-tree-dump-times "PHI" 0 "mergephicmp" } } */
> diff --git a/gcc/timevar.def b/gcc/timevar.def
> index 5415446..91f92d7 100644
> --- a/gcc/timevar.def
> +++ b/gcc/timevar.def
> @@ -170,6 +170,7 @@ DEFTIMEVAR (TV_TREE_SPLIT_EDGES      , "tree split crit edges")
>  DEFTIMEVAR (TV_TREE_REASSOC          , "tree reassociation")
>  DEFTIMEVAR (TV_TREE_PRE                     , "tree PRE")
>  DEFTIMEVAR (TV_TREE_FRE                     , "tree FRE")
> +DEFTIMEVAR (TV_TREE_MERGEPHICMP             , "tree branch on PHI compare")
>  DEFTIMEVAR (TV_TREE_SINK             , "tree code sinking")
>  DEFTIMEVAR (TV_TREE_PHIOPT          , "tree linearize phis")
>  DEFTIMEVAR (TV_TREE_BACKPROP        , "tree backward propagate")
> diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
> index 47be59b..e21e820 100644
> --- a/gcc/tree-pass.h
> +++ b/gcc/tree-pass.h
> @@ -441,6 +441,7 @@ extern gimple_opt_pass *make_pass_tree_ifcombine (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_dse (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_nrv (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_rename_ssa_copies (gcc::context *ctxt);
> +extern gimple_opt_pass *make_pass_mergephicmp (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_sink_code (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_fre (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_check_data_deps (gcc::context *ctxt);
> diff --git a/gcc/tree-ssa-mergephicmp.c b/gcc/tree-ssa-mergephicmp.c
> new file mode 100644
> index 0000000..7bb82d5
> --- /dev/null
> +++ b/gcc/tree-ssa-mergephicmp.c
> @@ -0,0 +1,260 @@
> +/* Elimiate PHI nodes which come from comparasion and used by branch.
> +   Copyright (C) 2004-2019 Free Software Foundation, Inc.
> +
> +This file is part of GCC.
> +
> +GCC is free software; you can redistribute it and/or modify it
> +under the terms of the GNU General Public License as published by the
> +Free Software Foundation; either version 3, or (at your option) any
> +later version.
> +
> +GCC is distributed in the hope that it will be useful, but WITHOUT
> +ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> +for more details.
> +
> +You should have received a copy of the GNU General Public License
> +along with GCC; see the file COPYING3.  If not see
> +<http://www.gnu.org/licenses/>.  */
> +
> +#include "config.h"
> +#include "system.h"
> +#include "coretypes.h"
> +#include "backend.h"
> +#include "tree.h"
> +#include "gimple.h"
> +#include "cfghooks.h"
> +#include "tree-pass.h"
> +#include "ssa.h"
> +#include "gimple-iterator.h"
> +#include "tree-cfg.h"
> +
> +/* Return true if it is ok to merge phi's incoming value:
> +  - phi's incoming value is
> +  phi_arg = a CMP b
> +  or
> +  cmp = a CMP b
> +  phi_arg = (INT_CONV) cmp
> +  - phi's incoming value is defined in incoming basic block.
> +
> +  * Parameter PHI: the phi to be checked.
> +  * Parameter INDEX: index'th incoming argument of phi to be checked. */
> +static bool
> +could_incoming_merge (gphi *phi, int index)
> +{
> +  tree value = gimple_phi_arg_def (phi, index);
> +  if (!(TREE_CODE (value) == SSA_NAME && has_single_use (value)))
> +    return false;
> +
> +  gimple *def = SSA_NAME_DEF_STMT (value);
> +
> +  if (!is_gimple_assign (def))
> +    return false;
> +
> +  if (CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (def)))
> +    {
> +      value = gimple_assign_rhs1 (def);
> +      if (!has_single_use (value))
> +       return false;
> +
> +      def = SSA_NAME_DEF_STMT (value);
> +
> +      if (!is_gimple_assign (def))
> +       return false;
> +    }
> +
> +  if (TREE_CODE_CLASS (gimple_assign_rhs_code (def)) != tcc_comparison)
> +    return false;
> +
> +  /* Check if phi's incoming value is defined in the incoming basic_block.  */
> +  edge e = gimple_phi_arg_edge (phi, index);
> +  if (def->bb != e->src)
> +    return false;
> +
> +  if (!single_succ_p (def->bb))
> +    return false;
> +
> +  return true;
> +}
> +
> +/* Return true if the basic_block is ok to optimize:
> +   <X>
> +   res = PHI <v0(b0), v1(b1)...>
> +   if (res != 0) goto l_true; goto l_false;
> +
> +   The phi stmt is saved to argument PHI.
> +   The gcond stmt is saved to argument GC.  */
> +static bool
> +could_optimize_phi_bb (basic_block bb, gphi **phi, gcond **gc)
> +{
> +  /* There should be only one phi.  */
> +  gphi_iterator pi = gsi_start_phis (bb);
> +  if (gsi_end_p (pi))
> +    return false;
> +
> +  *phi = pi.phi ();
> +  if (!has_single_use ((*phi)->result))
> +    return false;
> +
> +  gsi_next (&pi);
> +  if (!gsi_end_p (pi))
> +    return false;
> +
> +  /* There should be only one stmt which is gcond.  */
> +  gimple *gs = last_and_only_stmt (bb);
> +  if (gs == NULL)
> +    return false;
> +
> +  if (!is_a<gcond *> (gs))
> +    return false;
> +
> +  *gc = as_a<gcond *> (gs);
> +  if (gimple_cond_lhs (*gc) != (*phi)->result)
> +    return false;
> +
> +  /* Check if all incoming basic blocks can merge.  */
> +  for (unsigned int i = 0; i < (*phi)->nargs; i++)
> +    if (!could_incoming_merge (*phi, i))
> +      return false;
> +
> +  /* Check if there is no phi in any successors.  */
> +  edge e;
> +  edge_iterator ei;
> +  FOR_EACH_EDGE (e, ei, (*phi)->bb->succs)
> +    {
> +      if (!gsi_end_p (gsi_start_phis (e->dest)))
> +       return false;
> +    }
> +
> +  return true;
> +}
> +
> +/* Merge the phi and the gcond into the index'th incoming block.  */
> +static void
> +merge_incoming_bb (gcond *gc, gphi *phi, int index)
> +{
> +  tree value = gimple_phi_arg_def (phi, index);
> +
> +  gcond *new_gc = gimple_build_cond (gimple_cond_code (gc), value,
> +                                    gimple_cond_rhs (gc), NULL, NULL);
> +
> +  gimple *def = SSA_NAME_DEF_STMT (value);
> +  gimple_stmt_iterator last = gsi_last_bb (def->bb);
> +  gsi_insert_after (&last, new_gc, GSI_NEW_STMT);
> +
> +  edge e;
> +  edge_iterator ei;
> +  FOR_EACH_EDGE (e, ei, phi->bb->succs)
> +    {
> +      edge new_e = make_edge (def->bb, e->dest, e->flags);
> +      new_e->probability = e->probability;
> +    }
> +}
> +
> +/* If there are basic blocks look like:
> +  <P0>
> +  p0 = a CMP b
> +  goto <X>;
> +
> +  <P1>
> +  p1 = c CMP d
> +  goto <X>;
> +
> +  <X>
> +  # phi = PHI <p0 (P0), p1 (P1)>
> +  if (phi != 0) goto <Y>; else goto <Z>;
> +
> +Transform it to:
> +
> +  <P0>
> +  p0 = a CMP b
> +  if (p0 != 0) goto <Y>; else goto <Z>;
> +
> +  <P1>
> +  p1 = c CMP d
> +  if (p1 != 0) goto <Y>; else goto <Z>; */
> +
> +static bool
> +mergephicmp_once (function *fun)
> +{
> +  bool change = false;
> +  basic_block bb;
> +  basic_block prev_need_delete = NULL;
> +
> +  FOR_EACH_BB_FN (bb, fun)
> +    {
> +      gphi *phi;
> +      gcond *gc;
> +
> +      /* Prev bb can be deleted only after iterator move to next bb.  */
> +      if (prev_need_delete)
> +       delete_basic_block (prev_need_delete);
> +      prev_need_delete = NULL;
> +
> +      if (could_optimize_phi_bb (bb, &phi, &gc))
> +       {
> +         for (unsigned int i = 0; i < phi->nargs; i++)
> +           merge_incoming_bb (gc, phi, i);
> +
> +         change = true;
> +         prev_need_delete = bb;
> +       }
> +    }
> +
> +  return change;
> +}
> +
> +namespace {
> +
> +const pass_data pass_data_mergephicmp = {
> +  GIMPLE_PASS,         /* type */
> +  "mergephicmp",       /* name */
> +  OPTGROUP_NONE,       /* optinfo_flags */
> +  TV_TREE_MERGEPHICMP, /* tv_id */
> +  /* PROP_no_crit_edges is ensured by running split_critical_edges in
> +     pass_data_sink_code::execute ().  */
> +  (PROP_cfg | PROP_ssa), /* properties_required */
> +  0,                    /* properties_provided */
> +  0,                    /* properties_destroyed */
> +  0,                    /* todo_flags_start */
> +  0,                    /* todo_flags_finish */
> +};
> +
> +class pass_mergephicmp : public gimple_opt_pass
> +{
> +public:
> +  pass_mergephicmp (gcc::context *ctxt)
> +    : gimple_opt_pass (pass_data_mergephicmp, ctxt)
> +  {
> +  }
> +
> +  /* opt_pass methods: */
> +  virtual bool gate (function *) { return flag_mergephicmp != 0; }
> +
> +  virtual unsigned int execute (function *);
> +
> +}; // class pass_sink_code
> +
> +unsigned int
> +pass_mergephicmp::execute (function *fun)
> +{
> +  bool changed;
> +
> +  if (n_basic_blocks_for_fn (fun) <= NUM_FIXED_BLOCKS + 1)
> +    return 0;
> +
> +  changed = mergephicmp_once (fun);
> +
> +  if (changed)
> +    free_dominance_info (CDI_DOMINATORS);
> +
> +  return changed ? TODO_cleanup_cfg : 0;
> +}
> +
> +} // anon namespace
> +
> +gimple_opt_pass *
> +make_pass_mergephicmp (gcc::context *ctxt)
> +{
> +  return new pass_mergephicmp (ctxt);
> +}
> --
> 2.7.4
>
Jiufu Guo May 8, 2019, 6:32 a.m. UTC | #4
Hi,

Thanks Richard, Segher, Andrew and all.

Segher Boessenkool <segher@kernel.crashing.org> writes:

> Let me try to answer some of this...
>
> On Tue, May 07, 2019 at 03:31:27PM +0200, Richard Biener wrote:
>> On Mon, 6 May 2019, Jiufu Guo wrote:
>> > This patch implements the optimization in PR77820.  The optimization
>> > eliminates phi and phi's basic block, if the phi is used only by
>> > condition branch, and the phi's incoming value in the result of a
>> > CMP result.
>
>> I'm not sure I like a new pass here ;)  The transform is basically
>> tail-duplicating the PHI block because the exit conditional can
>> be "simplfied" - that's something jump threading generally does
>> though it relies on "simplified" being a little bit more simplified
>> than above.
>
I added a new pass just because it may be relatively easy to extend
and maintain.

Adding this micro-optimization into other passes is also a good
suggestion.

- Similar to jump threading, this new transform replaces a jump's old
destination with a new destination.  For this case, though, the new
destination cannot be determined at compile time.

- And as Andrew said, forwprop/backprop are similar, but this case is the
opposite: it spreads common code (phi+gcond) into the different predecessors.

- And there is the phiopt pass, which focuses on making 'phi' stmts better.
It also does something similar:
     bb0:
       if (a != b) goto bb2; else goto bb1;
     bb1:
     bb2:
       x = PHI <a (bb1), b (bb0), ...>;

   transform to:

     bb0:
     bb2:
       x = PHI <b (bb0), ...>;

If I avoid adding a new pass, any suggestions about which existing pass
(jump threading/forwprop/phiopt/...) would be more suitable to extend to
support this new transform?

> Right, but where in the pipeline does this fit in?
>
>> I suspect this transform was implemented because of some benchmark?
>
> Something in SPEC...  2006 iirc...  Will need to dig it up, I forgot
> the details.
>
>> I suspect the performance benefit is because of better branch
>> prediction by not mangling both conditional branches into one?
>
> No, it is that previously a condition was moved to a GPR, and then compared
> again.  See PR77820.  This is expensive, and serial, too.
>
This transform focuses on eliminating the additional GPR save and the
additional compare, so it helps a lot if the comparison is on a hot
path.
>> The transform is also somewhat similar to tail-duplication done
>> in path splitting or tracer.
>
> Yes.
>
>> The pass itself does quite strict pattern-matching but I wonder
>> if more cases should be handled this way.
>
> Maybe.  Probably.  But which?
Currently this pass handles the case where there is only one PHI and the
phi is used by a gcond.  If there are other suitable cases, we can just
extend this transform.

>
>> Any specific reason for the pass placement between PRE and sinking?
>> tracer and path splitting run much later, jump threading runs
>> all over the place.
>
> Dunno.  Jiufu, does the pass placement matter much?
This pass just needs to be inserted after ssa, and before the last dce
and forwprop, which could help to eliminate the temporary conversion from
bool to int:
 _1 = a_8(D) < b_9(D);
 t_14 = (int) _1;
 if (_1 != 0)

Putting it after PRE is just because some cases are better handled by PRE,
like the case below: if (x) t = a < b; else t = a < b; if (t) ...
Actually, this pass is OK to insert at other placements.
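
A minimal C sketch of that PRE-friendly case (names are illustrative only):

  void g (void);

  void
  f (long a, long b, int x)
  {
    _Bool t;
    if (x)
      t = a < b;
    else
      t = a < b;   /* same compare on both arms; PRE can commonize it */

    if (t)
      g ();
  }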
     
>
>
> Segher
Richard Biener May 8, 2019, 12:06 p.m. UTC | #5
On Tue, 7 May 2019, Andrew Pinski wrote:

> On Mon, May 6, 2019 at 7:24 AM Jiufu Guo <guojiufu@linux.ibm.com> wrote:
> >
> > Hi,
> >
> > This patch implements the optimization in PR77820.  The optimization
> > eliminates phi and phi's basic block, if the phi is used only by
> > condition branch, and the phi's incoming value in the result of a
> > CMP result.
> >
> > This optimization eliminates:
> > 1. saving CMP result: p0 = a CMP b.
> > 2. additional CMP on branch: if (phi != 0).
> > 3. converting CMP result if there is phi = (INT_CONV) p0 if there is.
> >
> >   <P0>
> >   p0 = a CMP b
> >   goto <X>;
> >
> >   <P1>
> >   p1 = c CMP d
> >   goto <X>;
> >
> >   <X>
> >   # phi = PHI <p0 (P0), p1 (P1)>
> >   if (phi != 0) goto <Y>; else goto <Z>;
> >
> > Transform to:
> >
> >   <P0>
> >   p0 = a CMP b
> >   if (p0 != 0) goto <Y>; else goto <Z>;
> >
> >   <P1>
> >   p1 = c CMP d
> >   if (p1 != 0) goto <Y>; else goto <Z>;
> >
> > Bootstrapped and tested on powerpc64le with no regressions, and testcases were
> > saved. Is this ok for trunk?
> 
> forwprop was created orginally to something similar but this case is a
> specific case of backwards prop (almost).
> I wonder if it could be combined with that or as Richard mentioned,
> jump threading.

The awkward part is that it changes the CFG so it doesn't fit forwprop
well.  I thought of ifcombine which kind-of does a reverse transform
but in the end it is really duplicating <X> to get rid of the PHI and
then applying forwprop.

Richard.

> Thanks,
> Andrew Pinski
> 
> >
> > Thanks!
> >
> > [gcc]
> > 2019-05-06  Jiufu Guo  <guojiufu@linux.ibm.com>
> >             Lijia He  <helijia@linux.ibm.com>
> >
> >         PR tree-optimization/77820
> >         * tree-ssa-mergephicmp.c: New file.
> >         * Makefile.in (OBJS): Add tree-ssa-mergephicmp.o.
> >         * common.opt (ftree-mergephicmp): New flag.
> >         * passes.def (pass_mergephicmp): New pass.
> >         * timevar.def (TV_TREE_MERGEPHICMP): New timevar.
> >         * tree-pass.h: New file.
> >
> > [gcc/testsuite]
> > 2019-05-06  Jiufu Guo  <guojiufu@linux.ibm.com>
> >             Lijia He  <helijia@linux.ibm.com>
> >
> >         PR tree-optimization/77820
> >         * gcc.dg/tree-ssa/phi_on_compare-1.c: New testcase.
> >         * gcc.dg/tree-ssa/phi_on_compare-2.c: New testcase.
> >
> >
> > ---
> >  gcc/Makefile.in                                  |   1 +
> >  gcc/common.opt                                   |   4 +
> >  gcc/passes.def                                   |   1 +
> >  gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-1.c |  31 +++
> >  gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-2.c |  31 +++
> >  gcc/timevar.def                                  |   1 +
> >  gcc/tree-pass.h                                  |   1 +
> >  gcc/tree-ssa-mergephicmp.c                       | 260 +++++++++++++++++++++++
> >  8 files changed, 330 insertions(+)
> >  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-1.c
> >  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-2.c
> >  create mode 100644 gcc/tree-ssa-mergephicmp.c
> >
> > diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> > index d186d71..9729501 100644
> > --- a/gcc/Makefile.in
> > +++ b/gcc/Makefile.in
> > @@ -1567,6 +1567,7 @@ OBJS = \
> >         tree-ssa-reassoc.o \
> >         tree-ssa-sccvn.o \
> >         tree-ssa-scopedtables.o \
> > +       tree-ssa-mergephicmp.o \
> >         tree-ssa-sink.o \
> >         tree-ssa-strlen.o \
> >         tree-ssa-structalias.o \
> > diff --git a/gcc/common.opt b/gcc/common.opt
> > index d342c4f..5ea5ed2 100644
> > --- a/gcc/common.opt
> > +++ b/gcc/common.opt
> > @@ -2702,6 +2702,10 @@ ftree-salias
> >  Common Ignore
> >  Does nothing.  Preserved for backward compatibility.
> >
> > +ftree-mergephicmp
> > +Common Report Var(flag_mergephicmp) Init(1) Optimization
> > +Enable optimization on branch phi compare on trees.
> > +
> >  ftree-sink
> >  Common Report Var(flag_tree_sink) Optimization
> >  Enable SSA code sinking on trees.
> > diff --git a/gcc/passes.def b/gcc/passes.def
> > index 446a7c4..e3a3913 100644
> > --- a/gcc/passes.def
> > +++ b/gcc/passes.def
> > @@ -249,6 +249,7 @@ along with GCC; see the file COPYING3.  If not see
> >        NEXT_PASS (pass_lim);
> >        NEXT_PASS (pass_walloca, false);
> >        NEXT_PASS (pass_pre);
> > +      NEXT_PASS (pass_mergephicmp);
> >        NEXT_PASS (pass_sink_code);
> >        NEXT_PASS (pass_sancov);
> >        NEXT_PASS (pass_asan);
> > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-1.c b/gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-1.c
> > new file mode 100644
> > index 0000000..2e3f4f6
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-1.c
> > @@ -0,0 +1,31 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -fdump-tree-mergephicmp" } */
> > +
> > +void g (void);
> > +void g1 (void);
> > +
> > +void
> > +f (long a, long b, long c, long d, int x)
> > +{
> > +  _Bool t;
> > +  if (x)
> > +    {
> > +      t = a < b;
> > +    }
> > +  else if (d == a + b)
> > +    {
> > +      t = c < d;
> > +    }
> > +  else
> > +    {
> > +      t = a == c;
> > +    }
> > +
> > +  if (t)
> > +    {
> > +      g1 ();
> > +      g ();
> > +    }
> > +}
> > +
> > +/* { dg-final { scan-tree-dump-not "PHI" "mergephicmp" } } */
> > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-2.c b/gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-2.c
> > new file mode 100644
> > index 0000000..7c35417
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-2.c
> > @@ -0,0 +1,31 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -fdump-tree-mergephicmp" } */
> > +
> > +void g (void);
> > +void g1 (void);
> > +
> > +void
> > +f (long a, long b, long c, long d, int x)
> > +{
> > +  int t;
> > +  if (x)
> > +    {
> > +      t = a < b;
> > +    }
> > +  else if (d == a + b)
> > +    {
> > +      t = c < d;
> > +    }
> > +  else
> > +    {
> > +      t = a == c;
> > +    }
> > +
> > +  if (t)
> > +    {
> > +      g1 ();
> > +      g ();
> > +    }
> > +}
> > +
> > +/* { dg-final { scan-tree-dump-times "PHI" 0 "mergephicmp" } } */
> > diff --git a/gcc/timevar.def b/gcc/timevar.def
> > index 5415446..91f92d7 100644
> > --- a/gcc/timevar.def
> > +++ b/gcc/timevar.def
> > @@ -170,6 +170,7 @@ DEFTIMEVAR (TV_TREE_SPLIT_EDGES      , "tree split crit edges")
> >  DEFTIMEVAR (TV_TREE_REASSOC          , "tree reassociation")
> >  DEFTIMEVAR (TV_TREE_PRE                     , "tree PRE")
> >  DEFTIMEVAR (TV_TREE_FRE                     , "tree FRE")
> > +DEFTIMEVAR (TV_TREE_MERGEPHICMP             , "tree branch on PHI compare")
> >  DEFTIMEVAR (TV_TREE_SINK             , "tree code sinking")
> >  DEFTIMEVAR (TV_TREE_PHIOPT          , "tree linearize phis")
> >  DEFTIMEVAR (TV_TREE_BACKPROP        , "tree backward propagate")
> > diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
> > index 47be59b..e21e820 100644
> > --- a/gcc/tree-pass.h
> > +++ b/gcc/tree-pass.h
> > @@ -441,6 +441,7 @@ extern gimple_opt_pass *make_pass_tree_ifcombine (gcc::context *ctxt);
> >  extern gimple_opt_pass *make_pass_dse (gcc::context *ctxt);
> >  extern gimple_opt_pass *make_pass_nrv (gcc::context *ctxt);
> >  extern gimple_opt_pass *make_pass_rename_ssa_copies (gcc::context *ctxt);
> > +extern gimple_opt_pass *make_pass_mergephicmp (gcc::context *ctxt);
> >  extern gimple_opt_pass *make_pass_sink_code (gcc::context *ctxt);
> >  extern gimple_opt_pass *make_pass_fre (gcc::context *ctxt);
> >  extern gimple_opt_pass *make_pass_check_data_deps (gcc::context *ctxt);
> > diff --git a/gcc/tree-ssa-mergephicmp.c b/gcc/tree-ssa-mergephicmp.c
> > new file mode 100644
> > index 0000000..7bb82d5
> > --- /dev/null
> > +++ b/gcc/tree-ssa-mergephicmp.c
> > @@ -0,0 +1,260 @@
> > +/* Elimiate PHI nodes which come from comparasion and used by branch.
> > +   Copyright (C) 2004-2019 Free Software Foundation, Inc.
> > +
> > +This file is part of GCC.
> > +
> > +GCC is free software; you can redistribute it and/or modify it
> > +under the terms of the GNU General Public License as published by the
> > +Free Software Foundation; either version 3, or (at your option) any
> > +later version.
> > +
> > +GCC is distributed in the hope that it will be useful, but WITHOUT
> > +ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> > +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> > +for more details.
> > +
> > +You should have received a copy of the GNU General Public License
> > +along with GCC; see the file COPYING3.  If not see
> > +<http://www.gnu.org/licenses/>.  */
> > +
> > +#include "config.h"
> > +#include "system.h"
> > +#include "coretypes.h"
> > +#include "backend.h"
> > +#include "tree.h"
> > +#include "gimple.h"
> > +#include "cfghooks.h"
> > +#include "tree-pass.h"
> > +#include "ssa.h"
> > +#include "gimple-iterator.h"
> > +#include "tree-cfg.h"
> > +
> > +/* Return true if it is ok to merge phi's incoming value:
> > +  - phi's incoming value is
> > +  phi_arg = a CMP b
> > +  or
> > +  cmp = a CMP b
> > +  phi_arg = (INT_CONV) cmp
> > +  - phi's incoming value is defined in incoming basic block.
> > +
> > +  * Parameter PHI: the phi to be checked.
> > +  * Parameter INDEX: index'th incoming argument of phi to be checked. */
> > +static bool
> > +could_incoming_merge (gphi *phi, int index)
> > +{
> > +  tree value = gimple_phi_arg_def (phi, index);
> > +  if (!(TREE_CODE (value) == SSA_NAME && has_single_use (value)))
> > +    return false;
> > +
> > +  gimple *def = SSA_NAME_DEF_STMT (value);
> > +
> > +  if (!is_gimple_assign (def))
> > +    return false;
> > +
> > +  if (CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (def)))
> > +    {
> > +      value = gimple_assign_rhs1 (def);
> > +      if (!has_single_use (value))
> > +       return false;
> > +
> > +      def = SSA_NAME_DEF_STMT (value);
> > +
> > +      if (!is_gimple_assign (def))
> > +       return false;
> > +    }
> > +
> > +  if (TREE_CODE_CLASS (gimple_assign_rhs_code (def)) != tcc_comparison)
> > +    return false;
> > +
> > +  /* Check if phi's incoming value is defined in the incoming basic_block.  */
> > +  edge e = gimple_phi_arg_edge (phi, index);
> > +  if (def->bb != e->src)
> > +    return false;
> > +
> > +  if (!single_succ_p (def->bb))
> > +    return false;
> > +
> > +  return true;
> > +}
> > +
> > +/* Return true if the basic_block is ok to optimize:
> > +   <X>
> > +   res = PHI <v0(b0), v1(b1)...>
> > +   if (res != 0) goto l_true; goto l_false;
> > +
> > +   The phi stmt is saved to argument PHI.
> > +   The gcond stmt is saved to argument GC.  */
> > +static bool
> > +could_optimize_phi_bb (basic_block bb, gphi **phi, gcond **gc)
> > +{
> > +  /* There should be only one phi.  */
> > +  gphi_iterator pi = gsi_start_phis (bb);
> > +  if (gsi_end_p (pi))
> > +    return false;
> > +
> > +  *phi = pi.phi ();
> > +  if (!has_single_use ((*phi)->result))
> > +    return false;
> > +
> > +  gsi_next (&pi);
> > +  if (!gsi_end_p (pi))
> > +    return false;
> > +
> > +  /* There should be only one stmt which is gcond.  */
> > +  gimple *gs = last_and_only_stmt (bb);
> > +  if (gs == NULL)
> > +    return false;
> > +
> > +  if (!is_a<gcond *> (gs))
> > +    return false;
> > +
> > +  *gc = as_a<gcond *> (gs);
> > +  if (gimple_cond_lhs (*gc) != (*phi)->result)
> > +    return false;
> > +
> > +  /* Check if all incoming basic blocks can merge.  */
> > +  for (unsigned int i = 0; i < (*phi)->nargs; i++)
> > +    if (!could_incoming_merge (*phi, i))
> > +      return false;
> > +
> > +  /* Check if there is no phi in any successors.  */
> > +  edge e;
> > +  edge_iterator ei;
> > +  FOR_EACH_EDGE (e, ei, (*phi)->bb->succs)
> > +    {
> > +      if (!gsi_end_p (gsi_start_phis (e->dest)))
> > +       return false;
> > +    }
> > +
> > +  return true;
> > +}
> > +
> > +/* Merge the phi and the gcond into the index'th incoming block.  */
> > +static void
> > +merge_incoming_bb (gcond *gc, gphi *phi, int index)
> > +{
> > +  tree value = gimple_phi_arg_def (phi, index);
> > +
> > +  gcond *new_gc = gimple_build_cond (gimple_cond_code (gc), value,
> > +                                    gimple_cond_rhs (gc), NULL, NULL);
> > +
> > +  gimple *def = SSA_NAME_DEF_STMT (value);
> > +  gimple_stmt_iterator last = gsi_last_bb (def->bb);
> > +  gsi_insert_after (&last, new_gc, GSI_NEW_STMT);
> > +
> > +  edge e;
> > +  edge_iterator ei;
> > +  FOR_EACH_EDGE (e, ei, phi->bb->succs)
> > +    {
> > +      edge new_e = make_edge (def->bb, e->dest, e->flags);
> > +      new_e->probability = e->probability;
> > +    }
> > +}
> > +
> > +/* If there are basic blocks look like:
> > +  <P0>
> > +  p0 = a CMP b
> > +  goto <X>;
> > +
> > +  <P1>
> > +  p1 = c CMP d
> > +  goto <X>;
> > +
> > +  <X>
> > +  # phi = PHI <p0 (P0), p1 (P1)>
> > +  if (phi != 0) goto <Y>; else goto <Z>;
> > +
> > +Transform it to:
> > +
> > +  <P0>
> > +  p0 = a CMP b
> > +  if (p0 != 0) goto <Y>; else goto <Z>;
> > +
> > +  <P1>
> > +  p1 = c CMP d
> > +  if (p1 != 0) goto <Y>; else goto <Z>; */
> > +
> > +static bool
> > +mergephicmp_once (function *fun)
> > +{
> > +  bool change = false;
> > +  basic_block bb;
> > +  basic_block prev_need_delete = NULL;
> > +
> > +  FOR_EACH_BB_FN (bb, fun)
> > +    {
> > +      gphi *phi;
> > +      gcond *gc;
> > +
> > +      /* Prev bb can be deleted only after iterator move to next bb.  */
> > +      if (prev_need_delete)
> > +       delete_basic_block (prev_need_delete);
> > +      prev_need_delete = NULL;
> > +
> > +      if (could_optimize_phi_bb (bb, &phi, &gc))
> > +       {
> > +         for (unsigned int i = 0; i < phi->nargs; i++)
> > +           merge_incoming_bb (gc, phi, i);
> > +
> > +         change = true;
> > +         prev_need_delete = bb;
> > +       }
> > +    }
> > +
> > +  return change;
> > +}
> > +
> > +namespace {
> > +
> > +const pass_data pass_data_mergephicmp = {
> > +  GIMPLE_PASS,         /* type */
> > +  "mergephicmp",       /* name */
> > +  OPTGROUP_NONE,       /* optinfo_flags */
> > +  TV_TREE_MERGEPHICMP, /* tv_id */
> > +  /* PROP_no_crit_edges is ensured by running split_critical_edges in
> > +     pass_data_sink_code::execute ().  */
> > +  (PROP_cfg | PROP_ssa), /* properties_required */
> > +  0,                    /* properties_provided */
> > +  0,                    /* properties_destroyed */
> > +  0,                    /* todo_flags_start */
> > +  0,                    /* todo_flags_finish */
> > +};
> > +
> > +class pass_mergephicmp : public gimple_opt_pass
> > +{
> > +public:
> > +  pass_mergephicmp (gcc::context *ctxt)
> > +    : gimple_opt_pass (pass_data_mergephicmp, ctxt)
> > +  {
> > +  }
> > +
> > +  /* opt_pass methods: */
> > +  virtual bool gate (function *) { return flag_mergephicmp != 0; }
> > +
> > +  virtual unsigned int execute (function *);
> > +
> > +}; // class pass_sink_code
> > +
> > +unsigned int
> > +pass_mergephicmp::execute (function *fun)
> > +{
> > +  bool changed;
> > +
> > +  if (n_basic_blocks_for_fn (fun) <= NUM_FIXED_BLOCKS + 1)
> > +    return 0;
> > +
> > +  changed = mergephicmp_once (fun);
> > +
> > +  if (changed)
> > +    free_dominance_info (CDI_DOMINATORS);
> > +
> > +  return changed ? TODO_cleanup_cfg : 0;
> > +}
> > +
> > +} // anon namespace
> > +
> > +gimple_opt_pass *
> > +make_pass_mergephicmp (gcc::context *ctxt)
> > +{
> > +  return new pass_mergephicmp (ctxt);
> > +}
> > --
> > 2.7.4
> >
>
Richard Biener May 8, 2019, 12:20 p.m. UTC | #6
On Wed, 8 May 2019, Jiufu Guo wrote:

> Hi,
> 
> Thanks Richard, Segher, Andrew and all.
> 
> Segher Boessenkool <segher@kernel.crashing.org> writes:
> 
> > Let me try to answer some of this...
> >
> > On Tue, May 07, 2019 at 03:31:27PM +0200, Richard Biener wrote:
> >> On Mon, 6 May 2019, Jiufu Guo wrote:
> >> > This patch implements the optimization in PR77820.  The optimization
> >> > eliminates phi and phi's basic block, if the phi is used only by
> >> > condition branch, and the phi's incoming value in the result of a
> >> > CMP result.
> >
> >> I'm not sure I like a new pass here ;)  The transform is basically
> >> tail-duplicating the PHI block because the exit conditional can
> >> be "simplfied" - that's something jump threading generally does
> >> though it relies on "simplified" being a little bit more simplified
> >> than above.
> >
> Adding a new pass is just because it may be relatively easy to extend
> and maintain. 
> 
> Adding this micor-optimization into other passes is also a good
> sugguestion. 
> 
> - Similar with jump threading, this new transform is replacing jump
> old destination with new destination. While for this case, the new
> destination can not be determined at compile time. 
> 
> - And as Andrew said: forwprop/backprop are similar, but this case is in
> opposite: it is spread common code(phi+gcond) into different preds.
> 
> - And there is phiopt pass which focus on making 'phi' stmt better. 
> it also do some similar thing:
>      bb0:
>        if (a != b) goto bb2; else goto bb1;
>      bb1:
>      bb2:
>        x = PHI <a (bb1), b (bb0), ...>;
> 
>    tranform to:
> 
>      bb0:
>      bb2:
>        x = PHI <b (bb0), ...>;
> 
> If I avoid to add new pass, any suggustions about which exiting pass
> (jumpthreading/forwprop/phiopt/...) would be more suitable to extend to
> support this new transform? 

The main thing the transform does is tail-duplicate the PHI block;
if we'd just do that, followup transforms would do the rest.

So my suggestion would be to look at passes doing exactly that,
which would be tracer and path splitting (but that's only run for
loops).  If you want to fit it into jump threading then it would
be the backwards threader, since its heuristics would look at
opportunities to simplify the conditional, see it fed by a PHI,
and that in turn by (two) conditional(s).

You do not put any constraints on the block relation of the
conditional generators, for example path splitting looks
for diamonds, duplicating the merge block.

It might be that extending the backwards threader with the
pattern matching is easiest (this one also runs quite a few times).

> > Right, but where in the pipeline does this fit in?
> >
> >> I suspect this transform was implemented because of some benchmark?
> >
> > Something in SPEC...  2006 iirc...  Will need to dig it up, I forgot
> > the details.
> >
> >> I suspect the performance benefit is because of better branch
> >> prediction by not mangling both conditional branches into one?
> >
> > No, it is that previously a condition was moved to a GPR, and then compared
> > again.  See PR77820.  This is expensive, and serial, too.
> >
> This transform focusing on eliminating additional GPR saving and
> additional compare, then it would helps a lot if the comparasion in in
> hot path.
> >> The transform is also somewhat similar to tail-duplication done
> >> in path splitting or tracer.
> >
> > Yes.
> >
> >> The pass itself does quite strict pattern-matching but I wonder
> >> if more cases should be handled this way.
> >
> > Maybe.  Probably.  But which?
> Currently this pass handles the case if there is only one PHI and the
> phi is used by gcond. If there are other suitable cases, we can just
> extend this tranform.

I was just thinking of a small amount of unrelated stmts in those
blocks, or the case where just one PHI argument is fed by a conditional
and thus only one path would simplify (but maybe the hot one if you look
at edge probabilities).
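
For illustration, a hypothetical C case of the kind described above, where
only one PHI argument comes from a fresh compare, so only that predecessor
would simplify:

  void g (void);

  void
  f (long a, long b, _Bool p, int x)
  {
    _Bool t;
    if (x)
      t = a < b;   /* this arm feeds the PHI with a compare */
    else
      t = p;       /* this arm just forwards an existing value */

    if (t)
      g ();
  }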

> >
> >> Any specific reason for the pass placement between PRE and sinking?
> >> tracer and path splitting run much later, jump threading runs
> >> all over the place.
> >
> > Dunno.  Jiufu, does the pass placement matter much?
> This pass would just need be inserted after ssa, and before the last dce
> and forwprop which could help to eliminate temp conversion from bool to
> int:
>  _1 = a_8(D) < b_9(D);
>  t_14 = (int) _1;
>  if (_1 != 0)
> 
> Putting it after PRE is just because some case is better be handled by PREx,
> like below case:  if (x) t = a < b; else t = a < b; if (t) xx.
> While actually, this pass is ok to insert at other placement.

OK, so all of the suggestions above would work since we have a late
forwprop and DCE pass to clean up.

Btw, I wonder if, on RTL, basic-block reordering (which also does
some tail duplication) could be a place to do such a transform?
Or is it too late to do the desired cleanups after that?
Possibly, since we're after RA.

Richard.
Segher Boessenkool May 8, 2019, 3:43 p.m. UTC | #7
On Wed, May 08, 2019 at 02:20:19PM +0200, Richard Biener wrote:
> Btw, I wonder if on RTL basic-block reordering (which also does
> some tail duplication) could be a place to do such transform?
> Or is it too late to do the desired cleanups after that?
> Possibly since we're after RA.

It is *much* too late; it is too late to do it at combine time, already
(and combine of course cannot do this).  You need the early RTL passes to
clean up all that mess the conditionals generate -- the point of this patch
is to never have that conditional-to-integer-reg stuff to begin with.

RTL isn't nice for doing cross-BB transforms, either.


Segher
Richard Biener May 8, 2019, 4:34 p.m. UTC | #8
On Wed, 8 May 2019, Segher Boessenkool wrote:

> On Wed, May 08, 2019 at 02:20:19PM +0200, Richard Biener wrote:
> > Btw, I wonder if on RTL basic-block reordering (which also does
> > some tail duplication) could be a place to do such transform?
> > Or is it too late to do the desired cleanups after that?
> > Possibly since we're after RA.
> 
> It is *much* too late; it is too late to do it at combine time, already
> (and combine of course cannot do this).  You need the early RTL passes to
> clean up all that mess the conditionals generate -- the point of this patch
> is to never have that conditional-to-integer-reg stuff to begin with.
> 
> RTL isn't nice for doing cross-BB transforms, either.

Ah, well - I thought it works reasonably well for extended BBs ;)

We can of course also match this at RTL expansion time but exposing
this earlier definitely looks valuable.

Richard.
Jeff Law May 8, 2019, 8:39 p.m. UTC | #9
On 5/8/19 6:20 AM, Richard Biener wrote:
> On Wed, 8 May 2019, Jiufu Guo wrote:
> 
>> Hi,
>>
>> Thanks Richard, Segher, Andrew and all.
>>
>> Segher Boessenkool <segher@kernel.crashing.org> writes:
>>
>>> Let me try to answer some of this...
>>>
>>> On Tue, May 07, 2019 at 03:31:27PM +0200, Richard Biener wrote:
>>>> On Mon, 6 May 2019, Jiufu Guo wrote:
>>>>> This patch implements the optimization in PR77820.  The optimization
>>>>> eliminates phi and phi's basic block, if the phi is used only by
>>>>> condition branch, and the phi's incoming value in the result of a
>>>>> CMP result.
>>>
>>>> I'm not sure I like a new pass here ;)  The transform is basically
>>>> tail-duplicating the PHI block because the exit conditional can
>>>> be "simplfied" - that's something jump threading generally does
>>>> though it relies on "simplified" being a little bit more simplified
>>>> than above.
>>>
>> Adding a new pass is just because it may be relatively easy to extend
>> and maintain. 
>>
>> Adding this micor-optimization into other passes is also a good
>> sugguestion. 
>>
>> - Similar with jump threading, this new transform is replacing jump
>> old destination with new destination. While for this case, the new
>> destination can not be determined at compile time. 
>>
>> - And as Andrew said: forwprop/backprop are similar, but this case is in
>> opposite: it is spread common code(phi+gcond) into different preds.
>>
>> - And there is phiopt pass which focus on making 'phi' stmt better. 
>> it also do some similar thing:
>>      bb0:
>>        if (a != b) goto bb2; else goto bb1;
>>      bb1:
>>      bb2:
>>        x = PHI <a (bb1), b (bb0), ...>;
>>
>>    tranform to:
>>
>>      bb0:
>>      bb2:
>>        x = PHI <b (bb0), ...>;
>>
>> If I avoid to add new pass, any suggustions about which exiting pass
>> (jumpthreading/forwprop/phiopt/...) would be more suitable to extend to
>> support this new transform? 
> 
> The main thing the transform does is tail-duplicate the PHI block;
> if we'd just do that, followup transforms would do the rest.
One might even claim this is really a transformation for cfg cleanups.
If we ignore the PHI, what we have is an unconditional jump to a
conditional jump.  The obvious way to optimize that is to replace the
unconditional jump with a copy of the conditional jump.
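
As a purely illustrative C analogue of that view (hypothetical, not how
cfgcleanup is actually written), the "unconditional jump to a
conditional jump" shape and its cleaned-up form would be:

  void act1 (void);
  void act2 (void);

  void
  before (int p, int q)
  {
    if (p)
      goto join;                        /* unconditional jump to ...  */
    q = q + 1;
  join:
    if (q) act1 (); else act2 ();       /* ... a conditional jump */
  }

  void
  after (int p, int q)
  {
    if (p)
      {
        /* The unconditional jump was replaced by a copy of the
           conditional jump.  */
        if (q) act1 (); else act2 ();
        return;
      }
    q = q + 1;
    if (q) act1 (); else act2 ();
  }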



>> Currently this pass handles the case if there is only one PHI and the
>> phi is used by gcond. If there are other suitable cases, we can just
>> extend this tranform.
> 
> I was just thinking of a small amount of unrelated stmts in those
> blocks or the case where just one PHI argument is fed by a conditional
> thus only one path would simplify (but maybe the hot one if you look
> at edge probabilities).
Perhaps, but do these happen enough in practice?  With the code motions
we do I'd be a bit surprised.

> 
>>>
>>>> Any specific reason for the pass placement between PRE and sinking?
>>>> tracer and path splitting run much later, jump threading runs
>>>> all over the place.
>>>
>>> Dunno.  Jiufu, does the pass placement matter much?
>> This pass would just need be inserted after ssa, and before the last dce
>> and forwprop which could help to eliminate temp conversion from bool to
>> int:
>>  _1 = a_8(D) < b_9(D);
>>  t_14 = (int) _1;
>>  if (_1 != 0)
>>
>> Putting it after PRE is just because some case is better be handled by PREx,
>> like below case:  if (x) t = a < b; else t = a < b; if (t) xx.
>> While actually, this pass is ok to insert at other placement.
> 
> OK, so all of the suggestions above would work since we have a late
> forwprop and DCE pass to clean up.
I'd expect this transformation to be useful for jump threading, VRP,
DOM, etc.

> 
> Btw, I wonder if on RTL basic-block reordering (which also does
> some tail duplication) could be a place to do such transform?
> Or is it too late to do the desired cleanups after that?
> Possibly since we're after RA.
IIRC we've had code in jump.c to do this in the past.  It may have
morphed through the years, but I'm pretty sure we've done this kind of
thing on a low level before.
jeff
Jeff Law May 8, 2019, 9:03 p.m. UTC | #10
On 5/6/19 8:24 AM, Jiufu Guo wrote:
> Hi,
> 
> This patch implements the optimization in PR77820.  The optimization
> eliminates phi and phi's basic block, if the phi is used only by
> condition branch, and the phi's incoming value in the result of a
> CMP result.
> 
> This optimization eliminates:
> 1. saving CMP result: p0 = a CMP b.
> 2. additional CMP on branch: if (phi != 0).
> 3. converting CMP result if there is phi = (INT_CONV) p0 if there is.
> 
>   <P0>
>   p0 = a CMP b
>   goto <X>;
> 
>   <P1>
>   p1 = c CMP d
>   goto <X>;
> 
>   <X>
>   # phi = PHI <p0 (P0), p1 (P1)>
>   if (phi != 0) goto <Y>; else goto <Z>;
> 
> Transform to:
> 
>   <P0>
>   p0 = a CMP b
>   if (p0 != 0) goto <Y>; else goto <Z>;
> 
>   <P1>
>   p1 = c CMP d
>   if (p1 != 0) goto <Y>; else goto <Z>;
> 
> Bootstrapped and tested on powerpc64le with no regressions, and testcases were
> saved. Is this ok for trunk?
So there's been talk of somehow integrating with jump threading.   The
big problem with that idea is that jump threading is only concerned with
rearranging the CFG when doing so allows it to determine the result of a
conditional branch.
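
For contrast, in a hypothetical variant like the one below (not from
the PR), one PHI argument is a constant, so the branch result is known
on that path and jump threading already applies; in the PR77820 pattern
all of the arguments are unknown compare results, so there is nothing
for the threader to determine:

  void g (void);

  void
  f (long a, long b, int x)
  {
    _Bool t;
    if (x)
      t = a < b;
    else
      t = 0;    /* constant: the 'if (t)' below is known false here */

    if (t)
      g ();
  }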

This is actually a generalized form of path splitting that we currently
do for loop latches.  You may find you can twiddle that code to do what
you want.

One of the things that falls out of that realization is that you have to
be very careful that this doesn't muck up if-conversion.

Jeff
Jiufu Guo May 9, 2019, 3:14 a.m. UTC | #11
Jeff Law <law@redhat.com> writes:

> On 5/6/19 8:24 AM, Jiufu Guo wrote:
>> Hi,
>> 
>> This patch implements the optimization in PR77820.  The optimization
>> eliminates phi and phi's basic block, if the phi is used only by
>> condition branch, and the phi's incoming value in the result of a
>> CMP result.
>> 
>> This optimization eliminates:
>> 1. saving CMP result: p0 = a CMP b.
>> 2. additional CMP on branch: if (phi != 0).
>> 3. converting CMP result if there is phi = (INT_CONV) p0 if there is.
>> 
>>   <P0>
>>   p0 = a CMP b
>>   goto <X>;
>> 
>>   <P1>
>>   p1 = c CMP d
>>   goto <X>;
>> 
>>   <X>
>>   # phi = PHI <p0 (P0), p1 (P1)>
>>   if (phi != 0) goto <Y>; else goto <Z>;
>> 
>> Transform to:
>> 
>>   <P0>
>>   p0 = a CMP b
>>   if (p0 != 0) goto <Y>; else goto <Z>;
>> 
>>   <P1>
>>   p1 = c CMP d
>>   if (p1 != 0) goto <Y>; else goto <Z>;
>> 
>> Bootstrapped and tested on powerpc64le with no regressions, and testcases were
>> saved. Is this ok for trunk?
> So there's been talk of somehow integrating with jump threading.   The
> big problem with that idea is that jump threading is only concerned with
> rearranging the CFG when doing so allows it to determine the result of a
> conditional branch.
>
> This is actually a generalized form of path splitting that we currently
> do for loop latches.  You may find you can twiddle that code to do what
> you want.
>
> One of the things that falls out of that realization is that you have to
> be very careful that this doesn't muck up if-conversion.
>
> Jeff

Thanks for all your suggestions: Richard, Segher, Andrew and Jeff!
I will send a new patch after investigating and refining the code.

Jiufu Guo.
Richard Biener May 9, 2019, 6:07 a.m. UTC | #12
On Wed, 8 May 2019, Jeff Law wrote:

> On 5/8/19 6:20 AM, Richard Biener wrote:
> > On Wed, 8 May 2019, Jiufu Guo wrote:
> > 
> >> Hi,
> >>
> >> Thanks Richard, Segher, Andrew and all.
> >>
> >> Segher Boessenkool <segher@kernel.crashing.org> writes:
> >>
> >>> Let me try to answer some of this...
> >>>
> >>> On Tue, May 07, 2019 at 03:31:27PM +0200, Richard Biener wrote:
> >>>> On Mon, 6 May 2019, Jiufu Guo wrote:
> >>>>> This patch implements the optimization in PR77820.  The optimization
> >>>>> eliminates phi and phi's basic block, if the phi is used only by
> >>>>> condition branch, and the phi's incoming value in the result of a
> >>>>> CMP result.
> >>>
> >>>> I'm not sure I like a new pass here ;)  The transform is basically
> >>>> tail-duplicating the PHI block because the exit conditional can
> >>>> be "simplfied" - that's something jump threading generally does
> >>>> though it relies on "simplified" being a little bit more simplified
> >>>> than above.
> >>>
> >> Adding a new pass is just because it may be relatively easy to extend
> >> and maintain. 
> >>
> >> Adding this micor-optimization into other passes is also a good
> >> sugguestion. 
> >>
> >> - Similar with jump threading, this new transform is replacing jump
> >> old destination with new destination. While for this case, the new
> >> destination can not be determined at compile time. 
> >>
> >> - And as Andrew said: forwprop/backprop are similar, but this case is in
> >> opposite: it is spread common code(phi+gcond) into different preds.
> >>
> >> - And there is phiopt pass which focus on making 'phi' stmt better. 
> >> it also do some similar thing:
> >>      bb0:
> >>        if (a != b) goto bb2; else goto bb1;
> >>      bb1:
> >>      bb2:
> >>        x = PHI <a (bb1), b (bb0), ...>;
> >>
> >>    tranform to:
> >>
> >>      bb0:
> >>      bb2:
> >>        x = PHI <b (bb0), ...>;
> >>
> >> If I avoid to add new pass, any suggustions about which exiting pass
> >> (jumpthreading/forwprop/phiopt/...) would be more suitable to extend to
> >> support this new transform? 
> > 
> > The main thing the transform does is tail-duplicate the PHI block;
> > if we'd just do that, followup transforms would do the rest.
> One might even claim this is really a transformation for cfg cleanups.
> If we ignore the PHI, what we have is an unconditional jump to a
> conditional jump.  The obvious way to optimize that is to replace the
> unconditional jump with a copy of the conditional jump.

I thought about this too, but then quickly disregarded it as too gross ;)

> >> Currently this pass handles the case if there is only one PHI and the
> >> phi is used by gcond. If there are other suitable cases, we can just
> >> extend this tranform.
> > 
> > I was just thinking of a small amount of unrelated stmts in those
> > blocks or the case where just one PHI argument is fed by a conditional
> > thus only one path would simplify (but maybe the hot one if you look
> > at edge probabilities).
> Perhaps, but do these happen enough in practice?  With the code motions
> we do I'd be a bit surprised.

It probably depends on what kind of source this typically arises from.

Richard.
Jeff Law May 9, 2019, 2:52 p.m. UTC | #13
On 5/9/19 12:07 AM, Richard Biener wrote:
> On Wed, 8 May 2019, Jeff Law wrote:
> 
>> On 5/8/19 6:20 AM, Richard Biener wrote:
>>> On Wed, 8 May 2019, Jiufu Guo wrote:
>>>
>>>> Hi,
>>>>
>>>> Thanks Richard, Segher, Andrew and all.
>>>>
>>>> Segher Boessenkool <segher@kernel.crashing.org> writes:
>>>>
>>>>> Let me try to answer some of this...
>>>>>
>>>>> On Tue, May 07, 2019 at 03:31:27PM +0200, Richard Biener wrote:
>>>>>> On Mon, 6 May 2019, Jiufu Guo wrote:
>>>>>>> This patch implements the optimization in PR77820.  The optimization
>>>>>>> eliminates phi and phi's basic block, if the phi is used only by
>>>>>>> condition branch, and the phi's incoming value in the result of a
>>>>>>> CMP result.
>>>>>
>>>>>> I'm not sure I like a new pass here ;)  The transform is basically
>>>>>> tail-duplicating the PHI block because the exit conditional can
>>>>>> be "simplfied" - that's something jump threading generally does
>>>>>> though it relies on "simplified" being a little bit more simplified
>>>>>> than above.
>>>>>
>>>> Adding a new pass is just because it may be relatively easy to extend
>>>> and maintain. 
>>>>
>>>> Adding this micor-optimization into other passes is also a good
>>>> sugguestion. 
>>>>
>>>> - Similar with jump threading, this new transform is replacing jump
>>>> old destination with new destination. While for this case, the new
>>>> destination can not be determined at compile time. 
>>>>
>>>> - And as Andrew said: forwprop/backprop are similar, but this case is in
>>>> opposite: it is spread common code(phi+gcond) into different preds.
>>>>
>>>> - And there is phiopt pass which focus on making 'phi' stmt better. 
>>>> it also do some similar thing:
>>>>      bb0:
>>>>        if (a != b) goto bb2; else goto bb1;
>>>>      bb1:
>>>>      bb2:
>>>>        x = PHI <a (bb1), b (bb0), ...>;
>>>>
>>>>    tranform to:
>>>>
>>>>      bb0:
>>>>      bb2:
>>>>        x = PHI <b (bb0), ...>;
>>>>
>>>> If I avoid to add new pass, any suggustions about which exiting pass
>>>> (jumpthreading/forwprop/phiopt/...) would be more suitable to extend to
>>>> support this new transform? 
>>>
>>> The main thing the transform does is tail-duplicate the PHI block;
>>> if we'd just do that, followup transforms would do the rest.
>> One might even claim this is really a transformation for cfg cleanups.
>> If we ignore the PHI, what we have is an unconditional jump to a
>> conditional jump.  The obvious way to optimize that is to replace the
>> unconditional jump with a copy of the conditional jump.
> 
> I thought about this too, but then quickly disregarded it as too gross ;)
Yea, the changes to the CFG (and dominator tree and SSA graph) are
probably significant enough that trying to do it in cleanup_cfg is
just madness.  One of those few cases where doing something on RTL is
probably easier than gimple.

Jeff
Jiufu Guo May 13, 2019, 3:07 a.m. UTC | #14
Jeff Law <law@redhat.com> writes:

> On 5/8/19 6:20 AM, Richard Biener wrote:
>> On Wed, 8 May 2019, Jiufu Guo wrote:
>> 
>> The main thing the transform does is tail-duplicate the PHI block;
>> if we'd just do that, followup transforms would do the rest.
> One might even claim this is really a transformation for cfg cleanups.
> If we ignore the PHI, what we have is an unconditional jump to a
> conditional jump.  The obvious way to optimize that is to replace the
> unconditional jump with a copy of the conditional jump.
>

>> Btw, I wonder if on RTL basic-block reordering (which also does
>> some tail duplication) could be a place to do such transform?
>> Or is it too late to do the desired cleanups after that?
>> Possibly since we're after RA.
> IIRC we've had code in jump.c to do this in the past.  It may have
> morphed through the years, but I'm pretty sure we've done this kind of
> thing on a low level before.

Yes, RTL basic-block reordering duplicates the conditional jump and
replaces the unconditional jump with it:
   17: %9:DI=%9:DI^0x1
   19: %9:DI=zero_extend(%9:QI)
   58: pc=L31
   59: barrier
--->
   17: %9:DI=%9:DI^0x1
   19: %9:DI=zero_extend(%9:QI)
   33: %0:CC=cmp(%9:DI,0)
      REG_DEAD %9:DI
   34: pc={(%0:CC!=0)?L90:pc}
      REG_DEAD %0:CC
      REG_BR_PROB 354334804

However, the conditional jump still goes through a GPR, even though the
GPR (%9:DI) is the result of a CMP instruction, because the move of the
CMP result into a GPR is generated when transforming gimple to RTL.

To eliminate the instructions that move the CMP result to a GPR, I'm
wondering whether we could do a combine (or a peephole) after bbro to
let the conditional jump use the result of the CMP directly.

Jiufu Guo.

> jeff
Segher Boessenkool May 13, 2019, 10:25 a.m. UTC | #15
On Sun, May 12, 2019 at 10:07:28PM -0500, Jiufu Guo wrote:
> To eliminate the instructions that move the CMP result to a GPR, I'm
> wondering whether we could do a combine (or a peephole) after bbro to
> let the conditional jump use the result of the CMP directly.

It will have to be a peephole.  And peepholes are machine-specific.

You will also not get any further optimisations in those blocks, that way;
not combine etc.

It really should be done on gimple, not rtl.


Segher
Richard Biener May 13, 2019, 10:33 a.m. UTC | #16
On Mon, 13 May 2019, Segher Boessenkool wrote:

> On Sun, May 12, 2019 at 10:07:28PM -0500, Jiufu Guo wrote:
> > To eliminate the instructions that move the CMP result to a GPR, I'm
> > wondering whether we could do a combine (or a peephole) after bbro to
> > let the conditional jump use the result of the CMP directly.
> 
> It will have to be a peephole.  And peepholes are machine-specific.
> 
> You will also not get any further optimisations in those blocks, that way;
> not combine etc.
> 
> It really should be done on gimple, not rtl.

Agreed.
diff mbox series

Patch

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index d186d71..9729501 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1567,6 +1567,7 @@  OBJS = \
 	tree-ssa-reassoc.o \
 	tree-ssa-sccvn.o \
 	tree-ssa-scopedtables.o \
+	tree-ssa-mergephicmp.o \
 	tree-ssa-sink.o \
 	tree-ssa-strlen.o \
 	tree-ssa-structalias.o \
diff --git a/gcc/common.opt b/gcc/common.opt
index d342c4f..5ea5ed2 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2702,6 +2702,10 @@  ftree-salias
 Common Ignore
 Does nothing.  Preserved for backward compatibility.
 
+ftree-mergephicmp
+Common Report Var(flag_mergephicmp) Init(1) Optimization
+Enable merging PHIs of compare results used only by a conditional branch on trees.
+
 ftree-sink
 Common Report Var(flag_tree_sink) Optimization
 Enable SSA code sinking on trees.
diff --git a/gcc/passes.def b/gcc/passes.def
index 446a7c4..e3a3913 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -249,6 +249,7 @@  along with GCC; see the file COPYING3.  If not see
       NEXT_PASS (pass_lim);
       NEXT_PASS (pass_walloca, false);
       NEXT_PASS (pass_pre);
+      NEXT_PASS (pass_mergephicmp);
       NEXT_PASS (pass_sink_code);
       NEXT_PASS (pass_sancov);
       NEXT_PASS (pass_asan);
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-1.c b/gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-1.c
new file mode 100644
index 0000000..2e3f4f6
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-1.c
@@ -0,0 +1,31 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-mergephicmp" } */
+
+void g (void);
+void g1 (void);
+
+void
+f (long a, long b, long c, long d, int x)
+{
+  _Bool t;
+  if (x)
+    {
+      t = a < b;
+    }
+  else if (d == a + b)
+    {
+      t = c < d;
+    }
+  else
+    {
+      t = a == c;
+    }
+
+  if (t)
+    {
+      g1 ();
+      g ();
+    }
+}
+
+/* { dg-final { scan-tree-dump-not "PHI" "mergephicmp" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-2.c b/gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-2.c
new file mode 100644
index 0000000..7c35417
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-2.c
@@ -0,0 +1,31 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-mergephicmp" } */
+
+void g (void);
+void g1 (void);
+
+void
+f (long a, long b, long c, long d, int x)
+{
+  int t;
+  if (x)
+    {
+      t = a < b;
+    }
+  else if (d == a + b)
+    {
+      t = c < d;
+    }
+  else
+    {
+      t = a == c;
+    }
+
+  if (t)
+    {
+      g1 ();
+      g ();
+    }
+}
+
+/* { dg-final { scan-tree-dump-times "PHI" 0 "mergephicmp" } } */
diff --git a/gcc/timevar.def b/gcc/timevar.def
index 5415446..91f92d7 100644
--- a/gcc/timevar.def
+++ b/gcc/timevar.def
@@ -170,6 +170,7 @@  DEFTIMEVAR (TV_TREE_SPLIT_EDGES      , "tree split crit edges")
 DEFTIMEVAR (TV_TREE_REASSOC          , "tree reassociation")
 DEFTIMEVAR (TV_TREE_PRE		     , "tree PRE")
 DEFTIMEVAR (TV_TREE_FRE		     , "tree FRE")
+DEFTIMEVAR (TV_TREE_MERGEPHICMP	     , "tree branch on PHI compare")
 DEFTIMEVAR (TV_TREE_SINK             , "tree code sinking")
 DEFTIMEVAR (TV_TREE_PHIOPT	     , "tree linearize phis")
 DEFTIMEVAR (TV_TREE_BACKPROP	     , "tree backward propagate")
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index 47be59b..e21e820 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -441,6 +441,7 @@  extern gimple_opt_pass *make_pass_tree_ifcombine (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_dse (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_nrv (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_rename_ssa_copies (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_mergephicmp (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_sink_code (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_fre (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_check_data_deps (gcc::context *ctxt);
diff --git a/gcc/tree-ssa-mergephicmp.c b/gcc/tree-ssa-mergephicmp.c
new file mode 100644
index 0000000..7bb82d5
--- /dev/null
+++ b/gcc/tree-ssa-mergephicmp.c
@@ -0,0 +1,260 @@ 
+/* Eliminate PHI nodes which come from comparisons and are used by branches.
+   Copyright (C) 2004-2019 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT
+ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "backend.h"
+#include "tree.h"
+#include "gimple.h"
+#include "cfghooks.h"
+#include "tree-pass.h"
+#include "ssa.h"
+#include "gimple-iterator.h"
+#include "tree-cfg.h"
+
+/* Return true if it is OK to merge the phi's incoming value:
+  - phi's incoming value is
+  phi_arg = a CMP b
+  or
+  cmp = a CMP b
+  phi_arg = (INT_CONV) cmp
+  - phi's incoming value is defined in the incoming basic block.
+
+  * Parameter PHI: the phi to be checked.
+  * Parameter INDEX: the index'th incoming argument of the phi to check.  */
+static bool
+could_incoming_merge (gphi *phi, int index)
+{
+  tree value = gimple_phi_arg_def (phi, index);
+  if (!(TREE_CODE (value) == SSA_NAME && has_single_use (value)))
+    return false;
+
+  gimple *def = SSA_NAME_DEF_STMT (value);
+
+  if (!is_gimple_assign (def))
+    return false;
+
+  if (CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (def)))
+    {
+      value = gimple_assign_rhs1 (def);
+      if (!has_single_use (value))
+	return false;
+
+      def = SSA_NAME_DEF_STMT (value);
+
+      if (!is_gimple_assign (def))
+	return false;
+    }
+
+  if (TREE_CODE_CLASS (gimple_assign_rhs_code (def)) != tcc_comparison)
+    return false;
+
+  /* Check if phi's incoming value is defined in the incoming basic_block.  */
+  edge e = gimple_phi_arg_edge (phi, index);
+  if (def->bb != e->src)
+    return false;
+
+  if (!single_succ_p (def->bb))
+    return false;
+
+  return true;
+}
+
+/* Return true if the basic_block is ok to optimize:
+   <X>
+   res = PHI <v0(b0), v1(b1)...>
+   if (res != 0) goto l_true; goto l_false;
+
+   The phi stmt is saved to argument PHI.
+   The gcond stmt is saved to argument GC.  */
+static bool
+could_optimize_phi_bb (basic_block bb, gphi **phi, gcond **gc)
+{
+  /* There should be only one phi.  */
+  gphi_iterator pi = gsi_start_phis (bb);
+  if (gsi_end_p (pi))
+    return false;
+
+  *phi = pi.phi ();
+  if (!has_single_use ((*phi)->result))
+    return false;
+
+  gsi_next (&pi);
+  if (!gsi_end_p (pi))
+    return false;
+
+  /* There should be only one stmt, which is a gcond.  */
+  gimple *gs = last_and_only_stmt (bb);
+  if (gs == NULL)
+    return false;
+
+  if (!is_a<gcond *> (gs))
+    return false;
+
+  *gc = as_a<gcond *> (gs);
+  if (gimple_cond_lhs (*gc) != (*phi)->result)
+    return false;
+
+  /* Check if all incoming basic blocks can merge.  */
+  for (unsigned int i = 0; i < (*phi)->nargs; i++)
+    if (!could_incoming_merge (*phi, i))
+      return false;
+
+  /* Check that there is no phi in any successor.  */
+  edge e;
+  edge_iterator ei;
+  FOR_EACH_EDGE (e, ei, (*phi)->bb->succs)
+    {
+      if (!gsi_end_p (gsi_start_phis (e->dest)))
+	return false;
+    }
+
+  return true;
+}
+
+/* Merge the phi and the gcond into the index'th incoming block.  */
+static void
+merge_incoming_bb (gcond *gc, gphi *phi, int index)
+{
+  tree value = gimple_phi_arg_def (phi, index);
+
+  gcond *new_gc = gimple_build_cond (gimple_cond_code (gc), value,
+				     gimple_cond_rhs (gc), NULL, NULL);
+
+  gimple *def = SSA_NAME_DEF_STMT (value);
+  gimple_stmt_iterator last = gsi_last_bb (def->bb);
+  gsi_insert_after (&last, new_gc, GSI_NEW_STMT);
+
+  edge e;
+  edge_iterator ei;
+  FOR_EACH_EDGE (e, ei, phi->bb->succs)
+    {
+      edge new_e = make_edge (def->bb, e->dest, e->flags);
+      new_e->probability = e->probability;
+    }
+}
+
+/* If there are basic blocks that look like:
+  <P0>
+  p0 = a CMP b
+  goto <X>;
+
+  <P1>
+  p1 = c CMP d
+  goto <X>;
+
+  <X>
+  # phi = PHI <p0 (P0), p1 (P1)>
+  if (phi != 0) goto <Y>; else goto <Z>;
+
+Transform it to:
+
+  <P0>
+  p0 = a CMP b
+  if (p0 != 0) goto <Y>; else goto <Z>;
+
+  <P1>
+  p1 = c CMP d
+  if (p1 != 0) goto <Y>; else goto <Z>; */
+
+static bool
+mergephicmp_once (function *fun)
+{
+  bool change = false;
+  basic_block bb;
+  basic_block prev_need_delete = NULL;
+
+  FOR_EACH_BB_FN (bb, fun)
+    {
+      gphi *phi;
+      gcond *gc;
+
+      /* Prev bb can only be deleted after the iterator moves past it.  */
+      if (prev_need_delete)
+	delete_basic_block (prev_need_delete);
+      prev_need_delete = NULL;
+
+      if (could_optimize_phi_bb (bb, &phi, &gc))
+	{
+	  for (unsigned int i = 0; i < phi->nargs; i++)
+	    merge_incoming_bb (gc, phi, i);
+
+	  change = true;
+	  prev_need_delete = bb;
+	}
+    }
+
+  return change;
+}
+
+namespace {
+
+const pass_data pass_data_mergephicmp = {
+  GIMPLE_PASS,		/* type */
+  "mergephicmp",       /* name */
+  OPTGROUP_NONE,	/* optinfo_flags */
+  TV_TREE_MERGEPHICMP, /* tv_id */
+  /* PROP_no_crit_edges is ensured by running split_critical_edges in
+     pass_sink_code::execute ().  */
+  (PROP_cfg | PROP_ssa), /* properties_required */
+  0,			 /* properties_provided */
+  0,			 /* properties_destroyed */
+  0,			 /* todo_flags_start */
+  0,			 /* todo_flags_finish */
+};
+
+class pass_mergephicmp : public gimple_opt_pass
+{
+public:
+  pass_mergephicmp (gcc::context *ctxt)
+    : gimple_opt_pass (pass_data_mergephicmp, ctxt)
+  {
+  }
+
+  /* opt_pass methods: */
+  virtual bool gate (function *) { return flag_mergephicmp != 0; }
+
+  virtual unsigned int execute (function *);
+
+}; // class pass_mergephicmp
+
+unsigned int
+pass_mergephicmp::execute (function *fun)
+{
+  bool changed;
+
+  if (n_basic_blocks_for_fn (fun) <= NUM_FIXED_BLOCKS + 1)
+    return 0;
+
+  changed = mergephicmp_once (fun);
+
+  if (changed)
+    free_dominance_info (CDI_DOMINATORS);
+
+  return changed ? TODO_cleanup_cfg : 0;
+}
+
+} // anon namespace
+
+gimple_opt_pass *
+make_pass_mergephicmp (gcc::context *ctxt)
+{
+  return new pass_mergephicmp (ctxt);
+}