Message ID: 1557152644-3063-1-git-send-email-guojiufu@linux.ibm.com
State:      New
Series:     Eliminates phi on branch for CMP result
On Mon, 6 May 2019, Jiufu Guo wrote:

> Hi,
>
> This patch implements the optimization in PR77820.  The optimization
> eliminates the phi and the phi's basic block, if the phi is used only by
> a conditional branch and the phi's incoming value is the result of a
> CMP.
>
> This optimization eliminates:
> 1. saving the CMP result: p0 = a CMP b.
> 2. the additional CMP on the branch: if (phi != 0).
> 3. converting the CMP result: phi = (INT_CONV) p0, if present.
>
> <P0>
> p0 = a CMP b
> goto <X>;
>
> <P1>
> p1 = c CMP d
> goto <X>;
>
> <X>
> # phi = PHI <p0 (P0), p1 (P1)>
> if (phi != 0) goto <Y>; else goto <Z>;
>
> Transform to:
>
> <P0>
> p0 = a CMP b
> if (p0 != 0) goto <Y>; else goto <Z>;
>
> <P1>
> p1 = c CMP d
> if (p1 != 0) goto <Y>; else goto <Z>;
>
> Bootstrapped and tested on powerpc64le with no regressions, and
> testcases were saved.  Is this ok for trunk?

I'm not sure I like a new pass here ;)  The transform is basically
tail-duplicating the PHI block because the exit conditional can be
"simplified" - that's something jump threading generally does, though it
relies on "simplified" being a little bit more simplified than above.

I suspect this transform was implemented because of some benchmark?

I suspect the performance benefit is because of better branch prediction
by not mangling both conditional branches into one?

The transform is also somewhat similar to the tail duplication done in
path splitting or tracer.

The pass itself does quite strict pattern matching, but I wonder if more
cases should be handled this way.

Any specific reason for the pass placement between PRE and sinking?
tracer and path splitting run much later; jump threading runs all over
the place.

Thanks,
Richard.

> Thanks!
>
> [gcc]
> 2019-05-06  Jiufu Guo  <guojiufu@linux.ibm.com>
> 	    Lijia He  <helijia@linux.ibm.com>
>
> 	PR tree-optimization/77820
> 	* tree-ssa-mergephicmp.c: New file.
> 	* Makefile.in (OBJS): Add tree-ssa-mergephicmp.o.
> 	* common.opt (ftree-mergephicmp): New flag.
> 	* passes.def (pass_mergephicmp): New pass.
> 	* timevar.def (TV_TREE_MERGEPHICMP): New timevar.
> 	* tree-pass.h: New file.
>
> [gcc/testsuite]
> 2019-05-06  Jiufu Guo  <guojiufu@linux.ibm.com>
> 	    Lijia He  <helijia@linux.ibm.com>
>
> 	PR tree-optimization/77820
> 	* gcc.dg/tree-ssa/phi_on_compare-1.c: New testcase.
> 	* gcc.dg/tree-ssa/phi_on_compare-2.c: New testcase.
>
> ---
>  gcc/Makefile.in                                  |   1 +
>  gcc/common.opt                                   |   4 +
>  gcc/passes.def                                   |   1 +
>  gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-1.c |  31 +++
>  gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-2.c |  31 +++
>  gcc/timevar.def                                  |   1 +
>  gcc/tree-pass.h                                  |   1 +
>  gcc/tree-ssa-mergephicmp.c                       | 260 +++++++++++++++++++++++
>  8 files changed, 330 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-2.c
>  create mode 100644 gcc/tree-ssa-mergephicmp.c
>
> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> index d186d71..9729501 100644
> --- a/gcc/Makefile.in
> +++ b/gcc/Makefile.in
> @@ -1567,6 +1567,7 @@ OBJS = \
>  	tree-ssa-reassoc.o \
>  	tree-ssa-sccvn.o \
>  	tree-ssa-scopedtables.o \
> +	tree-ssa-mergephicmp.o \
>  	tree-ssa-sink.o \
>  	tree-ssa-strlen.o \
>  	tree-ssa-structalias.o \
> diff --git a/gcc/common.opt b/gcc/common.opt
> index d342c4f..5ea5ed2 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -2702,6 +2702,10 @@ ftree-salias
>  Common Ignore
>  Does nothing.  Preserved for backward compatibility.
>
> +ftree-mergephicmp
> +Common Report Var(flag_mergephicmp) Init(1) Optimization
> +Enable optimization on branch phi compare on trees.
> +
>  ftree-sink
>  Common Report Var(flag_tree_sink) Optimization
>  Enable SSA code sinking on trees.
> diff --git a/gcc/passes.def b/gcc/passes.def
> index 446a7c4..e3a3913 100644
> --- a/gcc/passes.def
> +++ b/gcc/passes.def
> @@ -249,6 +249,7 @@ along with GCC; see the file COPYING3.  If not see
>  	  NEXT_PASS (pass_lim);
>  	  NEXT_PASS (pass_walloca, false);
>  	  NEXT_PASS (pass_pre);
> +	  NEXT_PASS (pass_mergephicmp);
>  	  NEXT_PASS (pass_sink_code);
>  	  NEXT_PASS (pass_sancov);
>  	  NEXT_PASS (pass_asan);
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-1.c b/gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-1.c
> new file mode 100644
> index 0000000..2e3f4f6
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-1.c
> @@ -0,0 +1,31 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-mergephicmp" } */
> +
> +void g (void);
> +void g1 (void);
> +
> +void
> +f (long a, long b, long c, long d, int x)
> +{
> +  _Bool t;
> +  if (x)
> +    {
> +      t = a < b;
> +    }
> +  else if (d == a + b)
> +    {
> +      t = c < d;
> +    }
> +  else
> +    {
> +      t = a == c;
> +    }
> +
> +  if (t)
> +    {
> +      g1 ();
> +      g ();
> +    }
> +}
> +
> +/* { dg-final { scan-tree-dump-not "PHI" "mergephicmp" } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-2.c b/gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-2.c
> new file mode 100644
> index 0000000..7c35417
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-2.c
> @@ -0,0 +1,31 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-mergephicmp" } */
> +
> +void g (void);
> +void g1 (void);
> +
> +void
> +f (long a, long b, long c, long d, int x)
> +{
> +  int t;
> +  if (x)
> +    {
> +      t = a < b;
> +    }
> +  else if (d == a + b)
> +    {
> +      t = c < d;
> +    }
> +  else
> +    {
> +      t = a == c;
> +    }
> +
> +  if (t)
> +    {
> +      g1 ();
> +      g ();
> +    }
> +}
> +
> +/* { dg-final { scan-tree-dump-times "PHI" 0 "mergephicmp" } } */
> diff --git a/gcc/timevar.def b/gcc/timevar.def
> index 5415446..91f92d7 100644
> --- a/gcc/timevar.def
> +++ b/gcc/timevar.def
> @@ -170,6 +170,7 @@ DEFTIMEVAR (TV_TREE_SPLIT_EDGES      , "tree split crit edges")
>  DEFTIMEVAR (TV_TREE_REASSOC          , "tree reassociation")
>  DEFTIMEVAR (TV_TREE_PRE              , "tree PRE")
>  DEFTIMEVAR (TV_TREE_FRE              , "tree FRE")
> +DEFTIMEVAR (TV_TREE_MERGEPHICMP      , "tree branch on PHI compare")
>  DEFTIMEVAR (TV_TREE_SINK             , "tree code sinking")
>  DEFTIMEVAR (TV_TREE_PHIOPT           , "tree linearize phis")
>  DEFTIMEVAR (TV_TREE_BACKPROP         , "tree backward propagate")
> diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
> index 47be59b..e21e820 100644
> --- a/gcc/tree-pass.h
> +++ b/gcc/tree-pass.h
> @@ -441,6 +441,7 @@ extern gimple_opt_pass *make_pass_tree_ifcombine (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_dse (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_nrv (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_rename_ssa_copies (gcc::context *ctxt);
> +extern gimple_opt_pass *make_pass_mergephicmp (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_sink_code (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_fre (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_check_data_deps (gcc::context *ctxt);
> diff --git a/gcc/tree-ssa-mergephicmp.c b/gcc/tree-ssa-mergephicmp.c
> new file mode 100644
> index 0000000..7bb82d5
> --- /dev/null
> +++ b/gcc/tree-ssa-mergephicmp.c
> @@ -0,0 +1,260 @@
> +/* Elimiate PHI nodes which come from comparasion and used by branch.
> +   Copyright (C) 2004-2019 Free Software Foundation, Inc.
> +
> +This file is part of GCC.
> +
> +GCC is free software; you can redistribute it and/or modify it
> +under the terms of the GNU General Public License as published by the
> +Free Software Foundation; either version 3, or (at your option) any
> +later version.
> +
> +GCC is distributed in the hope that it will be useful, but WITHOUT
> +ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> +for more details.
> +
> +You should have received a copy of the GNU General Public License
> +along with GCC; see the file COPYING3.  If not see
> +<http://www.gnu.org/licenses/>.  */
> +
> +#include "config.h"
> +#include "system.h"
> +#include "coretypes.h"
> +#include "backend.h"
> +#include "tree.h"
> +#include "gimple.h"
> +#include "cfghooks.h"
> +#include "tree-pass.h"
> +#include "ssa.h"
> +#include "gimple-iterator.h"
> +#include "tree-cfg.h"
> +
> +/* Return true if it is ok to merge phi's incoming value:
> +   - phi's incoming value is
> +       phi_arg = a CMP b
> +     or
> +       cmp = a CMP b
> +       phi_arg = (INT_CONV) cmp
> +   - phi's incoming value is defined in incoming basic block.
> +
> +   * Parameter PHI: the phi to be checked.
> +   * Parameter INDEX: index'th incoming argument of phi to be checked.  */
> +static bool
> +could_incoming_merge (gphi *phi, int index)
> +{
> +  tree value = gimple_phi_arg_def (phi, index);
> +  if (!(TREE_CODE (value) == SSA_NAME && has_single_use (value)))
> +    return false;
> +
> +  gimple *def = SSA_NAME_DEF_STMT (value);
> +
> +  if (!is_gimple_assign (def))
> +    return false;
> +
> +  if (CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (def)))
> +    {
> +      value = gimple_assign_rhs1 (def);
> +      if (!has_single_use (value))
> +	return false;
> +
> +      def = SSA_NAME_DEF_STMT (value);
> +
> +      if (!is_gimple_assign (def))
> +	return false;
> +    }
> +
> +  if (TREE_CODE_CLASS (gimple_assign_rhs_code (def)) != tcc_comparison)
> +    return false;
> +
> +  /* Check if phi's incoming value is defined in the incoming basic_block.  */
> +  edge e = gimple_phi_arg_edge (phi, index);
> +  if (def->bb != e->src)
> +    return false;
> +
> +  if (!single_succ_p (def->bb))
> +    return false;
> +
> +  return true;
> +}
> +
> +/* Return true if the basic_block is ok to optimize:
> +   <X>
> +   res = PHI <v0(b0), v1(b1)...>
> +   if (res != 0) goto l_true; goto l_false;
> +
> +   The phi stmt is saved to argument PHI.
> +   The gcond stmt is saved to argument GC.  */
> +static bool
> +could_optimize_phi_bb (basic_block bb, gphi **phi, gcond **gc)
> +{
> +  /* There should be only one phi.  */
> +  gphi_iterator pi = gsi_start_phis (bb);
> +  if (gsi_end_p (pi))
> +    return false;
> +
> +  *phi = pi.phi ();
> +  if (!has_single_use ((*phi)->result))
> +    return false;
> +
> +  gsi_next (&pi);
> +  if (!gsi_end_p (pi))
> +    return false;
> +
> +  /* There should be only one stmt which is gcond.  */
> +  gimple *gs = last_and_only_stmt (bb);
> +  if (gs == NULL)
> +    return false;
> +
> +  if (!is_a<gcond *> (gs))
> +    return false;
> +
> +  *gc = as_a<gcond *> (gs);
> +  if (gimple_cond_lhs (*gc) != (*phi)->result)
> +    return false;
> +
> +  /* Check if all incoming basic blocks can merge.  */
> +  for (unsigned int i = 0; i < (*phi)->nargs; i++)
> +    if (!could_incoming_merge (*phi, i))
> +      return false;
> +
> +  /* Check if there is no phi in any successors.  */
> +  edge e;
> +  edge_iterator ei;
> +  FOR_EACH_EDGE (e, ei, (*phi)->bb->succs)
> +    {
> +      if (!gsi_end_p (gsi_start_phis (e->dest)))
> +	return false;
> +    }
> +
> +  return true;
> +}
> +
> +/* Merge the phi and the gcond into the index'th incoming block.  */
> +static void
> +merge_incoming_bb (gcond *gc, gphi *phi, int index)
> +{
> +  tree value = gimple_phi_arg_def (phi, index);
> +
> +  gcond *new_gc = gimple_build_cond (gimple_cond_code (gc), value,
> +				     gimple_cond_rhs (gc), NULL, NULL);
> +
> +  gimple *def = SSA_NAME_DEF_STMT (value);
> +  gimple_stmt_iterator last = gsi_last_bb (def->bb);
> +  gsi_insert_after (&last, new_gc, GSI_NEW_STMT);
> +
> +  edge e;
> +  edge_iterator ei;
> +  FOR_EACH_EDGE (e, ei, phi->bb->succs)
> +    {
> +      edge new_e = make_edge (def->bb, e->dest, e->flags);
> +      new_e->probability = e->probability;
> +    }
> +}
> +
> +/* If there are basic blocks look like:
> +   <P0>
> +   p0 = a CMP b
> +   goto <X>;
> +
> +   <P1>
> +   p1 = c CMP d
> +   goto <X>;
> +
> +   <X>
> +   # phi = PHI <p0 (P0), p1 (P1)>
> +   if (phi != 0) goto <Y>; else goto <Z>;
> +
> +Transform it to:
> +
> +   <P0>
> +   p0 = a CMP b
> +   if (p0 != 0) goto <Y>; else goto <Z>;
> +
> +   <P1>
> +   p1 = c CMP d
> +   if (p1 != 0) goto <Y>; else goto <Z>;  */
> +
> +static bool
> +mergephicmp_once (function *fun)
> +{
> +  bool change = false;
> +  basic_block bb;
> +  basic_block prev_need_delete = NULL;
> +
> +  FOR_EACH_BB_FN (bb, fun)
> +    {
> +      gphi *phi;
> +      gcond *gc;
> +
> +      /* Prev bb can be deleted only after iterator move to next bb.  */
> +      if (prev_need_delete)
> +	delete_basic_block (prev_need_delete);
> +      prev_need_delete = NULL;
> +
> +      if (could_optimize_phi_bb (bb, &phi, &gc))
> +	{
> +	  for (unsigned int i = 0; i < phi->nargs; i++)
> +	    merge_incoming_bb (gc, phi, i);
> +
> +	  change = true;
> +	  prev_need_delete = bb;
> +	}
> +    }
> +
> +  return change;
> +}
> +
> +namespace {
> +
> +const pass_data pass_data_mergephicmp = {
> +  GIMPLE_PASS, /* type */
> +  "mergephicmp", /* name */
> +  OPTGROUP_NONE, /* optinfo_flags */
> +  TV_TREE_MERGEPHICMP, /* tv_id */
> +  /* PROP_no_crit_edges is ensured by running split_critical_edges in
> +     pass_data_sink_code::execute ().  */
> +  (PROP_cfg | PROP_ssa), /* properties_required */
> +  0, /* properties_provided */
> +  0, /* properties_destroyed */
> +  0, /* todo_flags_start */
> +  0, /* todo_flags_finish */
> +};
> +
> +class pass_mergephicmp : public gimple_opt_pass
> +{
> +public:
> +  pass_mergephicmp (gcc::context *ctxt)
> +    : gimple_opt_pass (pass_data_mergephicmp, ctxt)
> +  {
> +  }
> +
> +  /* opt_pass methods: */
> +  virtual bool gate (function *) { return flag_mergephicmp != 0; }
> +
> +  virtual unsigned int execute (function *);
> +
> +}; // class pass_sink_code
> +
> +unsigned int
> +pass_mergephicmp::execute (function *fun)
> +{
> +  bool changed;
> +
> +  if (n_basic_blocks_for_fn (fun) <= NUM_FIXED_BLOCKS + 1)
> +    return 0;
> +
> +  changed = mergephicmp_once (fun);
> +
> +  if (changed)
> +    free_dominance_info (CDI_DOMINATORS);
> +
> +  return changed ? TODO_cleanup_cfg : 0;
> +}
> +
> +} // anon namespace
> +
> +gimple_opt_pass *
> +make_pass_mergephicmp (gcc::context *ctxt)
> +{
> +  return new pass_mergephicmp (ctxt);
> +}
Let me try to answer some of this...

On Tue, May 07, 2019 at 03:31:27PM +0200, Richard Biener wrote:
> On Mon, 6 May 2019, Jiufu Guo wrote:
> > This patch implements the optimization in PR77820.  The optimization
> > eliminates the phi and the phi's basic block, if the phi is used only
> > by a conditional branch, and the phi's incoming value is the result
> > of a CMP.

> I'm not sure I like a new pass here ;)  The transform is basically
> tail-duplicating the PHI block because the exit conditional can be
> "simplified" - that's something jump threading generally does, though
> it relies on "simplified" being a little bit more simplified than
> above.

Right, but where in the pipeline does this fit in?

> I suspect this transform was implemented because of some benchmark?

Something in SPEC... 2006 iirc...  Will need to dig it up, I forgot
the details.

> I suspect the performance benefit is because of better branch
> prediction by not mangling both conditional branches into one?

No, it is that previously a condition was moved to a GPR, and then
compared again.  See PR77820.  This is expensive, and serial, too.

> The transform is also somewhat similar to tail-duplication done
> in path splitting or tracer.

Yes.

> The pass itself does quite strict pattern-matching but I wonder
> if more cases should be handled this way.

Maybe.  Probably.  But which?

> Any specific reason for the pass placement between PRE and sinking?
> tracer and path splitting run much later, jump threading runs
> all over the place.

Dunno.  Jiufu, does the pass placement matter much?


Segher
On Mon, May 6, 2019 at 7:24 AM Jiufu Guo <guojiufu@linux.ibm.com> wrote:
>
> Hi,
>
> This patch implements the optimization in PR77820.  The optimization
> eliminates the phi and the phi's basic block, if the phi is used only by
> a conditional branch and the phi's incoming value is the result of a
> CMP.
>
> This optimization eliminates:
> 1. saving the CMP result: p0 = a CMP b.
> 2. the additional CMP on the branch: if (phi != 0).
> 3. converting the CMP result: phi = (INT_CONV) p0, if present.
>
> <P0>
> p0 = a CMP b
> goto <X>;
>
> <P1>
> p1 = c CMP d
> goto <X>;
>
> <X>
> # phi = PHI <p0 (P0), p1 (P1)>
> if (phi != 0) goto <Y>; else goto <Z>;
>
> Transform to:
>
> <P0>
> p0 = a CMP b
> if (p0 != 0) goto <Y>; else goto <Z>;
>
> <P1>
> p1 = c CMP d
> if (p1 != 0) goto <Y>; else goto <Z>;
>
> Bootstrapped and tested on powerpc64le with no regressions, and
> testcases were saved.  Is this ok for trunk?

forwprop was originally created to do something similar, but this case
is a specific case of backwards prop (almost).  I wonder if it could be
combined with that or, as Richard mentioned, jump threading.

Thanks,
Andrew Pinski

> Thanks!
>
> [gcc]
> 2019-05-06  Jiufu Guo  <guojiufu@linux.ibm.com>
> 	    Lijia He  <helijia@linux.ibm.com>
>
> 	PR tree-optimization/77820
> 	* tree-ssa-mergephicmp.c: New file.
> 	* Makefile.in (OBJS): Add tree-ssa-mergephicmp.o.
> 	* common.opt (ftree-mergephicmp): New flag.
> 	* passes.def (pass_mergephicmp): New pass.
> 	* timevar.def (TV_TREE_MERGEPHICMP): New timevar.
> 	* tree-pass.h: New file.
>
> [gcc/testsuite]
> 2019-05-06  Jiufu Guo  <guojiufu@linux.ibm.com>
> 	    Lijia He  <helijia@linux.ibm.com>
>
> 	PR tree-optimization/77820
> 	* gcc.dg/tree-ssa/phi_on_compare-1.c: New testcase.
> 	* gcc.dg/tree-ssa/phi_on_compare-2.c: New testcase.
Hi,

Thanks Richard, Segher, Andrew and all.

Segher Boessenkool <segher@kernel.crashing.org> writes:

> Let me try to answer some of this...
>
> On Tue, May 07, 2019 at 03:31:27PM +0200, Richard Biener wrote:
>> On Mon, 6 May 2019, Jiufu Guo wrote:
>> > This patch implements the optimization in PR77820.  The optimization
>> > eliminates the phi and the phi's basic block, if the phi is used
>> > only by a conditional branch, and the phi's incoming value is the
>> > result of a CMP.
>
>> I'm not sure I like a new pass here ;)  The transform is basically
>> tail-duplicating the PHI block because the exit conditional can
>> be "simplified" - that's something jump threading generally does
>> though it relies on "simplified" being a little bit more simplified
>> than above.

Adding a new pass is just because it may be relatively easy to extend
and maintain.  Folding this micro-optimization into another pass is also
a good suggestion.

- Similar to jump threading, this new transform replaces the jump's old
  destination with a new destination.  For this case, though, the new
  destination cannot be determined at compile time.

- And as Andrew said, forwprop/backprop are similar, but this case is
  the opposite: it spreads the common code (phi+gcond) into the
  different predecessors.

- And there is the phiopt pass, which focuses on making 'phi' stmts
  better.  It also does something similar:

    bb0:
      if (a != b) goto bb2; else goto bb1;
    bb1:
    bb2:
      x = PHI <a (bb1), b (bb0), ...>;

  transform to:

    bb0:
    bb2:
      x = PHI <b (bb0), ...>;

If I avoid adding a new pass, any suggestions about which existing pass
(jump threading/forwprop/phiopt/...) would be more suitable to extend to
support this new transform?

> Right, but where in the pipeline does this fit in?
>
>> I suspect this transform was implemented because of some benchmark?
>
> Something in SPEC... 2006 iirc...  Will need to dig it up, I forgot
> the details.
>
>> I suspect the performance benefit is because of better branch
>> prediction by not mangling both conditional branches into one?
>
> No, it is that previously a condition was moved to a GPR, and then
> compared again.  See PR77820.  This is expensive, and serial, too.

This transform focuses on eliminating the additional GPR save and the
additional compare, so it helps a lot if the comparison is in a hot
path.

>> The transform is also somewhat similar to tail-duplication done
>> in path splitting or tracer.
>
> Yes.
>
>> The pass itself does quite strict pattern-matching but I wonder
>> if more cases should be handled this way.
>
> Maybe.  Probably.  But which?

Currently this pass handles the case where there is only one PHI and the
phi is used by a gcond.  If there are other suitable cases, we can just
extend this transform.

>> Any specific reason for the pass placement between PRE and sinking?
>> tracer and path splitting run much later, jump threading runs
>> all over the place.
>
> Dunno.  Jiufu, does the pass placement matter much?

This pass just needs to be inserted after ssa, and before the last dce
and forwprop, which can help to eliminate the temporary conversion from
bool to int:

  _1 = a_8(D) < b_9(D);
  t_14 = (int) _1;
  if (_1 != 0)

Putting it after PRE is just because some cases are better handled by
PRE first, like the case below:

  if (x)
    t = a < b;
  else
    t = a < b;
  if (t)
    xx.

While actually, this pass is ok to insert at other placements.

>
> Segher
On Tue, 7 May 2019, Andrew Pinski wrote:

> On Mon, May 6, 2019 at 7:24 AM Jiufu Guo <guojiufu@linux.ibm.com> wrote:
> >
> > Hi,
> >
> > This patch implements the optimization in PR77820.  The optimization
> > eliminates the phi and the phi's basic block, if the phi is used only
> > by a conditional branch, and the phi's incoming value is the result
> > of a CMP.
> >
> > This optimization eliminates:
> > 1. saving the CMP result: p0 = a CMP b.
> > 2. the additional CMP on the branch: if (phi != 0).
> > 3. converting the CMP result: phi = (INT_CONV) p0, if present.
> >
> > <P0>
> > p0 = a CMP b
> > goto <X>;
> >
> > <P1>
> > p1 = c CMP d
> > goto <X>;
> >
> > <X>
> > # phi = PHI <p0 (P0), p1 (P1)>
> > if (phi != 0) goto <Y>; else goto <Z>;
> >
> > Transform to:
> >
> > <P0>
> > p0 = a CMP b
> > if (p0 != 0) goto <Y>; else goto <Z>;
> >
> > <P1>
> > p1 = c CMP d
> > if (p1 != 0) goto <Y>; else goto <Z>;
> >
> > Bootstrapped and tested on powerpc64le with no regressions, and
> > testcases were saved.  Is this ok for trunk?
>
> forwprop was originally created to do something similar, but this case
> is a specific case of backwards prop (almost).
> I wonder if it could be combined with that or, as Richard mentioned,
> jump threading.

The awkward part is that it changes the CFG, so it doesn't fit forwprop
well.  I thought of ifcombine, which kind of does a reverse transform,
but in the end it is really duplicating <X> to get rid of the PHI and
then applying forwprop.

Richard.
> > > > [gcc/testsuite] > > 2019-05-06 Jiufu Guo <guojiufu@linux.ibm.com> > > Lijia He <helijia@linux.ibm.com> > > > > PR tree-optimization/77820 > > * gcc.dg/tree-ssa/phi_on_compare-1.c: New testcase. > > * gcc.dg/tree-ssa/phi_on_compare-2.c: New testcase. > > > > > > --- > > gcc/Makefile.in | 1 + > > gcc/common.opt | 4 + > > gcc/passes.def | 1 + > > gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-1.c | 31 +++ > > gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-2.c | 31 +++ > > gcc/timevar.def | 1 + > > gcc/tree-pass.h | 1 + > > gcc/tree-ssa-mergephicmp.c | 260 +++++++++++++++++++++++ > > 8 files changed, 330 insertions(+) > > create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-1.c > > create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-2.c > > create mode 100644 gcc/tree-ssa-mergephicmp.c > > > > diff --git a/gcc/Makefile.in b/gcc/Makefile.in > > index d186d71..9729501 100644 > > --- a/gcc/Makefile.in > > +++ b/gcc/Makefile.in > > @@ -1567,6 +1567,7 @@ OBJS = \ > > tree-ssa-reassoc.o \ > > tree-ssa-sccvn.o \ > > tree-ssa-scopedtables.o \ > > + tree-ssa-mergephicmp.o \ > > tree-ssa-sink.o \ > > tree-ssa-strlen.o \ > > tree-ssa-structalias.o \ > > diff --git a/gcc/common.opt b/gcc/common.opt > > index d342c4f..5ea5ed2 100644 > > --- a/gcc/common.opt > > +++ b/gcc/common.opt > > @@ -2702,6 +2702,10 @@ ftree-salias > > Common Ignore > > Does nothing. Preserved for backward compatibility. > > > > +ftree-mergephicmp > > +Common Report Var(flag_mergephicmp) Init(1) Optimization > > +Enable optimization on branch phi compare on trees. > > + > > ftree-sink > > Common Report Var(flag_tree_sink) Optimization > > Enable SSA code sinking on trees. > > diff --git a/gcc/passes.def b/gcc/passes.def > > index 446a7c4..e3a3913 100644 > > --- a/gcc/passes.def > > +++ b/gcc/passes.def > > @@ -249,6 +249,7 @@ along with GCC; see the file COPYING3. 
If not see > > NEXT_PASS (pass_lim); > > NEXT_PASS (pass_walloca, false); > > NEXT_PASS (pass_pre); > > + NEXT_PASS (pass_mergephicmp); > > NEXT_PASS (pass_sink_code); > > NEXT_PASS (pass_sancov); > > NEXT_PASS (pass_asan); > > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-1.c b/gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-1.c > > new file mode 100644 > > index 0000000..2e3f4f6 > > --- /dev/null > > +++ b/gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-1.c > > @@ -0,0 +1,31 @@ > > +/* { dg-do compile } */ > > +/* { dg-options "-O2 -fdump-tree-mergephicmp" } */ > > + > > +void g (void); > > +void g1 (void); > > + > > +void > > +f (long a, long b, long c, long d, int x) > > +{ > > + _Bool t; > > + if (x) > > + { > > + t = a < b; > > + } > > + else if (d == a + b) > > + { > > + t = c < d; > > + } > > + else > > + { > > + t = a == c; > > + } > > + > > + if (t) > > + { > > + g1 (); > > + g (); > > + } > > +} > > + > > +/* { dg-final { scan-tree-dump-not "PHI" "mergephicmp" } } */ > > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-2.c b/gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-2.c > > new file mode 100644 > > index 0000000..7c35417 > > --- /dev/null > > +++ b/gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-2.c > > @@ -0,0 +1,31 @@ > > +/* { dg-do compile } */ > > +/* { dg-options "-O2 -fdump-tree-mergephicmp" } */ > > + > > +void g (void); > > +void g1 (void); > > + > > +void > > +f (long a, long b, long c, long d, int x) > > +{ > > + int t; > > + if (x) > > + { > > + t = a < b; > > + } > > + else if (d == a + b) > > + { > > + t = c < d; > > + } > > + else > > + { > > + t = a == c; > > + } > > + > > + if (t) > > + { > > + g1 (); > > + g (); > > + } > > +} > > + > > +/* { dg-final { scan-tree-dump-times "PHI" 0 "mergephicmp" } } */ > > diff --git a/gcc/timevar.def b/gcc/timevar.def > > index 5415446..91f92d7 100644 > > --- a/gcc/timevar.def > > +++ b/gcc/timevar.def > > @@ -170,6 +170,7 @@ DEFTIMEVAR (TV_TREE_SPLIT_EDGES , "tree split crit edges") > > 
DEFTIMEVAR (TV_TREE_REASSOC , "tree reassociation") > > DEFTIMEVAR (TV_TREE_PRE , "tree PRE") > > DEFTIMEVAR (TV_TREE_FRE , "tree FRE") > > +DEFTIMEVAR (TV_TREE_MERGEPHICMP , "tree branch on PHI compare") > > DEFTIMEVAR (TV_TREE_SINK , "tree code sinking") > > DEFTIMEVAR (TV_TREE_PHIOPT , "tree linearize phis") > > DEFTIMEVAR (TV_TREE_BACKPROP , "tree backward propagate") > > diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h > > index 47be59b..e21e820 100644 > > --- a/gcc/tree-pass.h > > +++ b/gcc/tree-pass.h > > @@ -441,6 +441,7 @@ extern gimple_opt_pass *make_pass_tree_ifcombine (gcc::context *ctxt); > > extern gimple_opt_pass *make_pass_dse (gcc::context *ctxt); > > extern gimple_opt_pass *make_pass_nrv (gcc::context *ctxt); > > extern gimple_opt_pass *make_pass_rename_ssa_copies (gcc::context *ctxt); > > +extern gimple_opt_pass *make_pass_mergephicmp (gcc::context *ctxt); > > extern gimple_opt_pass *make_pass_sink_code (gcc::context *ctxt); > > extern gimple_opt_pass *make_pass_fre (gcc::context *ctxt); > > extern gimple_opt_pass *make_pass_check_data_deps (gcc::context *ctxt); > > diff --git a/gcc/tree-ssa-mergephicmp.c b/gcc/tree-ssa-mergephicmp.c > > new file mode 100644 > > index 0000000..7bb82d5 > > --- /dev/null > > +++ b/gcc/tree-ssa-mergephicmp.c > > @@ -0,0 +1,260 @@ > > +/* Elimiate PHI nodes which come from comparasion and used by branch. > > + Copyright (C) 2004-2019 Free Software Foundation, Inc. > > + > > +This file is part of GCC. > > + > > +GCC is free software; you can redistribute it and/or modify it > > +under the terms of the GNU General Public License as published by the > > +Free Software Foundation; either version 3, or (at your option) any > > +later version. > > + > > +GCC is distributed in the hope that it will be useful, but WITHOUT > > +ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or > > +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License > > +for more details. 
> > + > > +You should have received a copy of the GNU General Public License > > +along with GCC; see the file COPYING3. If not see > > +<http://www.gnu.org/licenses/>. */ > > + > > +#include "config.h" > > +#include "system.h" > > +#include "coretypes.h" > > +#include "backend.h" > > +#include "tree.h" > > +#include "gimple.h" > > +#include "cfghooks.h" > > +#include "tree-pass.h" > > +#include "ssa.h" > > +#include "gimple-iterator.h" > > +#include "tree-cfg.h" > > + > > +/* Return true if it is ok to merge phi's incoming value: > > + - phi's incoming value is > > + phi_arg = a CMP b > > + or > > + cmp = a CMP b > > + phi_arg = (INT_CONV) cmp > > + - phi's incoming value is defined in incoming basic block. > > + > > + * Parameter PHI: the phi to be checked. > > + * Parameter INDEX: index'th incoming argument of phi to be checked. */ > > +static bool > > +could_incoming_merge (gphi *phi, int index) > > +{ > > + tree value = gimple_phi_arg_def (phi, index); > > + if (!(TREE_CODE (value) == SSA_NAME && has_single_use (value))) > > + return false; > > + > > + gimple *def = SSA_NAME_DEF_STMT (value); > > + > > + if (!is_gimple_assign (def)) > > + return false; > > + > > + if (CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (def))) > > + { > > + value = gimple_assign_rhs1 (def); > > + if (!has_single_use (value)) > > + return false; > > + > > + def = SSA_NAME_DEF_STMT (value); > > + > > + if (!is_gimple_assign (def)) > > + return false; > > + } > > + > > + if (TREE_CODE_CLASS (gimple_assign_rhs_code (def)) != tcc_comparison) > > + return false; > > + > > + /* Check if phi's incoming value is defined in the incoming basic_block. 
*/ > > + edge e = gimple_phi_arg_edge (phi, index); > > + if (def->bb != e->src) > > + return false; > > + > > + if (!single_succ_p (def->bb)) > > + return false; > > + > > + return true; > > +} > > + > > +/* Return true if the basic_block is ok to optimize: > > + <X> > > + res = PHI <v0(b0), v1(b1)...> > > + if (res != 0) goto l_true; goto l_false; > > + > > + The phi stmt is saved to argument PHI. > > + The gcond stmt is saved to argument GC. */ > > +static bool > > +could_optimize_phi_bb (basic_block bb, gphi **phi, gcond **gc) > > +{ > > + /* There should be only one phi. */ > > + gphi_iterator pi = gsi_start_phis (bb); > > + if (gsi_end_p (pi)) > > + return false; > > + > > + *phi = pi.phi (); > > + if (!has_single_use ((*phi)->result)) > > + return false; > > + > > + gsi_next (&pi); > > + if (!gsi_end_p (pi)) > > + return false; > > + > > + /* There should be only one stmt which is gcond. */ > > + gimple *gs = last_and_only_stmt (bb); > > + if (gs == NULL) > > + return false; > > + > > + if (!is_a<gcond *> (gs)) > > + return false; > > + > > + *gc = as_a<gcond *> (gs); > > + if (gimple_cond_lhs (*gc) != (*phi)->result) > > + return false; > > + > > + /* Check if all incoming basic blocks can merge. */ > > + for (unsigned int i = 0; i < (*phi)->nargs; i++) > > + if (!could_incoming_merge (*phi, i)) > > + return false; > > + > > + /* Check if there is no phi in any successors. */ > > + edge e; > > + edge_iterator ei; > > + FOR_EACH_EDGE (e, ei, (*phi)->bb->succs) > > + { > > + if (!gsi_end_p (gsi_start_phis (e->dest))) > > + return false; > > + } > > + > > + return true; > > +} > > + > > +/* Merge the phi and the gcond into the index'th incoming block. 
*/ > > +static void > > +merge_incoming_bb (gcond *gc, gphi *phi, int index) > > +{ > > + tree value = gimple_phi_arg_def (phi, index); > > + > > + gcond *new_gc = gimple_build_cond (gimple_cond_code (gc), value, > > + gimple_cond_rhs (gc), NULL, NULL); > > + > > + gimple *def = SSA_NAME_DEF_STMT (value); > > + gimple_stmt_iterator last = gsi_last_bb (def->bb); > > + gsi_insert_after (&last, new_gc, GSI_NEW_STMT); > > + > > + edge e; > > + edge_iterator ei; > > + FOR_EACH_EDGE (e, ei, phi->bb->succs) > > + { > > + edge new_e = make_edge (def->bb, e->dest, e->flags); > > + new_e->probability = e->probability; > > + } > > +} > > + > > +/* If there are basic blocks look like: > > + <P0> > > + p0 = a CMP b > > + goto <X>; > > + > > + <P1> > > + p1 = c CMP d > > + goto <X>; > > + > > + <X> > > + # phi = PHI <p0 (P0), p1 (P1)> > > + if (phi != 0) goto <Y>; else goto <Z>; > > + > > +Transform it to: > > + > > + <P0> > > + p0 = a CMP b > > + if (p0 != 0) goto <Y>; else goto <Z>; > > + > > + <P1> > > + p1 = c CMP d > > + if (p1 != 0) goto <Y>; else goto <Z>; */ > > + > > +static bool > > +mergephicmp_once (function *fun) > > +{ > > + bool change = false; > > + basic_block bb; > > + basic_block prev_need_delete = NULL; > > + > > + FOR_EACH_BB_FN (bb, fun) > > + { > > + gphi *phi; > > + gcond *gc; > > + > > + /* Prev bb can be deleted only after iterator move to next bb. 
*/ > > + if (prev_need_delete) > > + delete_basic_block (prev_need_delete); > > + prev_need_delete = NULL; > > + > > + if (could_optimize_phi_bb (bb, &phi, &gc)) > > + { > > + for (unsigned int i = 0; i < phi->nargs; i++) > > + merge_incoming_bb (gc, phi, i); > > + > > + change = true; > > + prev_need_delete = bb; > > + } > > + } > > + > > + return change; > > +} > > + > > +namespace { > > + > > +const pass_data pass_data_mergephicmp = { > > + GIMPLE_PASS, /* type */ > > + "mergephicmp", /* name */ > > + OPTGROUP_NONE, /* optinfo_flags */ > > + TV_TREE_MERGEPHICMP, /* tv_id */ > > + /* PROP_no_crit_edges is ensured by running split_critical_edges in > > + pass_data_sink_code::execute (). */ > > + (PROP_cfg | PROP_ssa), /* properties_required */ > > + 0, /* properties_provided */ > > + 0, /* properties_destroyed */ > > + 0, /* todo_flags_start */ > > + 0, /* todo_flags_finish */ > > +}; > > + > > +class pass_mergephicmp : public gimple_opt_pass > > +{ > > +public: > > + pass_mergephicmp (gcc::context *ctxt) > > + : gimple_opt_pass (pass_data_mergephicmp, ctxt) > > + { > > + } > > + > > + /* opt_pass methods: */ > > + virtual bool gate (function *) { return flag_mergephicmp != 0; } > > + > > + virtual unsigned int execute (function *); > > + > > +}; // class pass_sink_code > > + > > +unsigned int > > +pass_mergephicmp::execute (function *fun) > > +{ > > + bool changed; > > + > > + if (n_basic_blocks_for_fn (fun) <= NUM_FIXED_BLOCKS + 1) > > + return 0; > > + > > + changed = mergephicmp_once (fun); > > + > > + if (changed) > > + free_dominance_info (CDI_DOMINATORS); > > + > > + return changed ? TODO_cleanup_cfg : 0; > > +} > > + > > +} // anon namespace > > + > > +gimple_opt_pass * > > +make_pass_mergephicmp (gcc::context *ctxt) > > +{ > > + return new pass_mergephicmp (ctxt); > > +} > > -- > > 2.7.4 > > >
On Wed, 8 May 2019, Jiufu Guo wrote:

> Hi,
>
> Thanks Richard, Segher, Andrew and all.
>
> Segher Boessenkool <segher@kernel.crashing.org> writes:
>
> > Let me try to answer some of this...
> >
> > On Tue, May 07, 2019 at 03:31:27PM +0200, Richard Biener wrote:
> >> On Mon, 6 May 2019, Jiufu Guo wrote:
> >> > This patch implements the optimization in PR77820.  The optimization
> >> > eliminates the phi and the phi's basic block, if the phi is used only by
> >> > a condition branch, and the phi's incoming value is the result of a
> >> > CMP.
> >
> >> I'm not sure I like a new pass here ;)  The transform is basically
> >> tail-duplicating the PHI block because the exit conditional can
> >> be "simplified" - that's something jump threading generally does
> >> though it relies on "simplified" being a little bit more simplified
> >> than above.
>
> Adding a new pass is just because it may be relatively easy to extend
> and maintain.
>
> Adding this micro-optimization into other passes is also a good
> suggestion.
>
> - Similar to jump threading, this new transform replaces a jump's old
>   destination with a new destination.  But for this case, the new
>   destination cannot be determined at compile time.
>
> - And as Andrew said: forwprop/backprop are similar, but this case is the
>   opposite: it spreads common code (phi+gcond) into the different preds.
>
> - And there is the phiopt pass, which focuses on making 'phi' stmts
>   better.  It also does some similar things:
>     bb0:
>       if (a != b) goto bb2; else goto bb1;
>     bb1:
>     bb2:
>       x = PHI <a (bb1), b (bb0), ...>;
>
>   transform to:
>
>     bb0:
>     bb2:
>       x = PHI <b (bb0), ...>;
>
> If I avoid adding a new pass, any suggestions about which existing pass
> (jump threading/forwprop/phiopt/...) would be more suitable to extend to
> support this new transform?

The main thing the transform does is tail-duplicate the PHI block; if we'd
just do that, followup transforms would do the rest.
So my suggestion would be to look at passes doing exactly that, which would
be tracer and path splitting (but that's just run for loops).  If you want
to fit it into jump threading then it would be the backwards threader, since
the heuristics would look at opportunities to simplify the conditional, see
it fed by a PHI and, from there, by (two) conditional(s).  You do not put
any constraints on the block relation of the conditional generators; for
example, path splitting looks for diamonds, duplicating the merge block.
It might be that extending the backwards threader with the pattern matching
is easiest (this one also runs quite a few times).

> > Right, but where in the pipeline does this fit in?
> >
> >> I suspect this transform was implemented because of some benchmark?
> >
> > Something in SPEC... 2006 iirc...  Will need to dig it up, I forgot
> > the details.
> >
> >> I suspect the performance benefit is because of better branch
> >> prediction by not mangling both conditional branches into one?
> >
> > No, it is that previously a condition was moved to a GPR, and then compared
> > again.  See PR77820.  This is expensive, and serial, too.
>
> This transform focuses on eliminating the additional GPR save and the
> additional compare, so it helps a lot when the comparison is on a hot
> path.
>
> >> The transform is also somewhat similar to tail-duplication done
> >> in path splitting or tracer.
> >
> > Yes.
> >
> >> The pass itself does quite strict pattern-matching but I wonder
> >> if more cases should be handled this way.
> >
> > Maybe.  Probably.  But which?
>
> Currently this pass handles the case where there is only one PHI and the
> phi is used by a gcond.  If there are other suitable cases, we can just
> extend this transform.

I was just thinking of a small amount of unrelated stmts in those
blocks, or the case where just one PHI argument is fed by a conditional
and thus only one path would simplify (but maybe the hot one if you look
at edge probabilities).
>
> >> Any specific reason for the pass placement between PRE and sinking?
> >> tracer and path splitting run much later, jump threading runs
> >> all over the place.
> >
> > Dunno.  Jiufu, does the pass placement matter much?
>
> This pass would just need to be inserted after ssa, and before the last
> dce and forwprop, which could help to eliminate the temporary conversion
> from bool to int:
>   _1 = a_8(D) < b_9(D);
>   t_14 = (int) _1;
>   if (_1 != 0)
>
> Putting it after PRE is just because some cases are better handled by PRE,
> like the case below: if (x) t = a < b; else t = a < b; if (t) xx.
> Actually, this pass is OK to insert at other placements.

OK, so all of the suggestions above would work since we have a late
forwprop and DCE pass to clean up.

Btw, I wonder if RTL basic-block reordering (which also does
some tail duplication) could be a place to do such a transform?
Or is it too late to do the desired cleanups after that?
Possibly, since we're after RA.

Richard.
On Wed, May 08, 2019 at 02:20:19PM +0200, Richard Biener wrote: > Btw, I wonder if on RTL basic-block reordering (which also does > some tail duplication) could be a place to do such transform? > Or is it too late to do the desired cleanups after that? > Possibly since we're after RA. It is *much* too late; it is too late to do it at combine time, already (and combine of course cannot do this). You need the early RTL passes to clean up all that mess the conditionals generate -- the point of this patch is to never have that conditional-to-integer-reg stuff to begin with. RTL isn't nice for doing cross-BB transforms, either. Segher
On Wed, 8 May 2019, Segher Boessenkool wrote: > On Wed, May 08, 2019 at 02:20:19PM +0200, Richard Biener wrote: > > Btw, I wonder if on RTL basic-block reordering (which also does > > some tail duplication) could be a place to do such transform? > > Or is it too late to do the desired cleanups after that? > > Possibly since we're after RA. > > It is *much* too late; it is too late to do it at combine time, already > (and combine of course cannot do this). You need the early RTL passes to > clean up all that mess the conditionals generate -- the point of this patch > is to never have that conditional-to-integer-reg stuff to begin with. > > RTL isn't nice for doing cross-BB transforms, either. Ah, well - I thought it works reasonably well for extended BBs ;) We can of course also match this at RTL expansion time but exposing this earlier definitely looks valuable. Richard.
On 5/8/19 6:20 AM, Richard Biener wrote: > On Wed, 8 May 2019, Jiufu Guo wrote: > >> Hi, >> >> Thanks Richard, Segher, Andrew and all. >> >> Segher Boessenkool <segher@kernel.crashing.org> writes: >> >>> Let me try to answer some of this... >>> >>> On Tue, May 07, 2019 at 03:31:27PM +0200, Richard Biener wrote: >>>> On Mon, 6 May 2019, Jiufu Guo wrote: >>>>> This patch implements the optimization in PR77820. The optimization >>>>> eliminates phi and phi's basic block, if the phi is used only by >>>>> condition branch, and the phi's incoming value in the result of a >>>>> CMP result. >>> >>>> I'm not sure I like a new pass here ;) The transform is basically >>>> tail-duplicating the PHI block because the exit conditional can >>>> be "simplfied" - that's something jump threading generally does >>>> though it relies on "simplified" being a little bit more simplified >>>> than above. >>> >> Adding a new pass is just because it may be relatively easy to extend >> and maintain. >> >> Adding this micor-optimization into other passes is also a good >> sugguestion. >> >> - Similar with jump threading, this new transform is replacing jump >> old destination with new destination. While for this case, the new >> destination can not be determined at compile time. >> >> - And as Andrew said: forwprop/backprop are similar, but this case is in >> opposite: it is spread common code(phi+gcond) into different preds. >> >> - And there is phiopt pass which focus on making 'phi' stmt better. >> it also do some similar thing: >> bb0: >> if (a != b) goto bb2; else goto bb1; >> bb1: >> bb2: >> x = PHI <a (bb1), b (bb0), ...>; >> >> tranform to: >> >> bb0: >> bb2: >> x = PHI <b (bb0), ...>; >> >> If I avoid to add new pass, any suggustions about which exiting pass >> (jumpthreading/forwprop/phiopt/...) would be more suitable to extend to >> support this new transform? 
> > The main thing the transform does is tail-duplicate the PHI block, > if we'd just do that followup transforms would do the rest. One might even claim this is really a transformation for cfg cleanups. If we ignore the PHI what we have is a unconditional jump to a conditional jump. The obvious way to optimize that is to replace the unconditional jump with a copy of the conditional jump. >> Currently this pass handles the case if there is only one PHI and the >> phi is used by gcond. If there are other suitable cases, we can just >> extend this tranform. > > I was just thinking of a small amount of unrelated stmts in those > blocks or the case where just one PHI argument is fed by a conditional > thus only one path would simplify (but maybe the hot one if you look > at edge probabilities). Perhaps, but do these happen enough in practice? With the code motions we do I'd be a bit surprised. > >>> >>>> Any specific reason for the pass placement between PRE and sinking? >>>> tracer and path splitting run much later, jump threading runs >>>> all over the place. >>> >>> Dunno. Jiufu, does the pass placement matter much? >> This pass would just need be inserted after ssa, and before the last dce >> and forwprop which could help to eliminate temp conversion from bool to >> int: >> _1 = a_8(D) < b_9(D); >> t_14 = (int) _1; >> if (_1 != 0) >> >> Putting it after PRE is just because some case is better be handled by PREx, >> like below case: if (x) t = a < b; else t = a < b; if (t) xx. >> While actually, this pass is ok to insert at other placement. > > OK, so all of the suggestions above would work since we have a late > forwprop and DCE pass to clean up. I'd expect this transformation to be useful for jump threading, VRP, DOM, etc. > > Btw, I wonder if on RTL basic-block reordering (which also does > some tail duplication) could be a place to do such transform? > Or is it too late to do the desired cleanups after that? 
> Possibly since we're after RA.

IIRC we've had code in jump.c to do this in the past.  It may have
morphed through the years, but I'm pretty sure we've done this kind of
thing on a low level before.

jeff
On 5/6/19 8:24 AM, Jiufu Guo wrote: > Hi, > > This patch implements the optimization in PR77820. The optimization > eliminates phi and phi's basic block, if the phi is used only by > condition branch, and the phi's incoming value in the result of a > CMP result. > > This optimization eliminates: > 1. saving CMP result: p0 = a CMP b. > 2. additional CMP on branch: if (phi != 0). > 3. converting CMP result if there is phi = (INT_CONV) p0 if there is. > > <P0> > p0 = a CMP b > goto <X>; > > <P1> > p1 = c CMP d > goto <X>; > > <X> > # phi = PHI <p0 (P0), p1 (P1)> > if (phi != 0) goto <Y>; else goto <Z>; > > Transform to: > > <P0> > p0 = a CMP b > if (p0 != 0) goto <Y>; else goto <Z>; > > <P1> > p1 = c CMP d > if (p1 != 0) goto <Y>; else goto <Z>; > > Bootstrapped and tested on powerpc64le with no regressions, and testcases were > saved. Is this ok for trunk? So there's been talk of somehow integrating with jump threading. The big problem with that idea is jump threading is only concerned with rearranging the CFG when doing so allows it to determine the result of a conditional branch. This is actually a generalized form of path splitting that we currently do for loop latches. You may find you can twiddle that code to do what you want. One of the things that falls out of that realization is that you have to be very careful that this doesn't muck up if-conversion. Jeff
Jeff Law <law@redhat.com> writes: > On 5/6/19 8:24 AM, Jiufu Guo wrote: >> Hi, >> >> This patch implements the optimization in PR77820. The optimization >> eliminates phi and phi's basic block, if the phi is used only by >> condition branch, and the phi's incoming value in the result of a >> CMP result. >> >> This optimization eliminates: >> 1. saving CMP result: p0 = a CMP b. >> 2. additional CMP on branch: if (phi != 0). >> 3. converting CMP result if there is phi = (INT_CONV) p0 if there is. >> >> <P0> >> p0 = a CMP b >> goto <X>; >> >> <P1> >> p1 = c CMP d >> goto <X>; >> >> <X> >> # phi = PHI <p0 (P0), p1 (P1)> >> if (phi != 0) goto <Y>; else goto <Z>; >> >> Transform to: >> >> <P0> >> p0 = a CMP b >> if (p0 != 0) goto <Y>; else goto <Z>; >> >> <P1> >> p1 = c CMP d >> if (p1 != 0) goto <Y>; else goto <Z>; >> >> Bootstrapped and tested on powerpc64le with no regressions, and testcases were >> saved. Is this ok for trunk? > So there's been talk of somehow integrating with jump threading. The > big problem with that idea is jump threading is only concerned with > rearranging the CFG when doing so allows it to determine the result of a > conditional branch. > > This is actually a generalized form of path splitting that we currently > do for loop latches. You may find you can twiddle that code to do what > you want. > > One of the things that falls out of that realization is that you have to > be very careful that this doesn't muck up if-conversion. > > Jeff Thanks for all your suggeustions: Richard, Segher, Andrew and Jeff! Will send new patch after investigating and code refining. Jiufu Guo.
On Wed, 8 May 2019, Jeff Law wrote: > On 5/8/19 6:20 AM, Richard Biener wrote: > > On Wed, 8 May 2019, Jiufu Guo wrote: > > > >> Hi, > >> > >> Thanks Richard, Segher, Andrew and all. > >> > >> Segher Boessenkool <segher@kernel.crashing.org> writes: > >> > >>> Let me try to answer some of this... > >>> > >>> On Tue, May 07, 2019 at 03:31:27PM +0200, Richard Biener wrote: > >>>> On Mon, 6 May 2019, Jiufu Guo wrote: > >>>>> This patch implements the optimization in PR77820. The optimization > >>>>> eliminates phi and phi's basic block, if the phi is used only by > >>>>> condition branch, and the phi's incoming value in the result of a > >>>>> CMP result. > >>> > >>>> I'm not sure I like a new pass here ;) The transform is basically > >>>> tail-duplicating the PHI block because the exit conditional can > >>>> be "simplfied" - that's something jump threading generally does > >>>> though it relies on "simplified" being a little bit more simplified > >>>> than above. > >>> > >> Adding a new pass is just because it may be relatively easy to extend > >> and maintain. > >> > >> Adding this micor-optimization into other passes is also a good > >> sugguestion. > >> > >> - Similar with jump threading, this new transform is replacing jump > >> old destination with new destination. While for this case, the new > >> destination can not be determined at compile time. > >> > >> - And as Andrew said: forwprop/backprop are similar, but this case is in > >> opposite: it is spread common code(phi+gcond) into different preds. > >> > >> - And there is phiopt pass which focus on making 'phi' stmt better. > >> it also do some similar thing: > >> bb0: > >> if (a != b) goto bb2; else goto bb1; > >> bb1: > >> bb2: > >> x = PHI <a (bb1), b (bb0), ...>; > >> > >> tranform to: > >> > >> bb0: > >> bb2: > >> x = PHI <b (bb0), ...>; > >> > >> If I avoid to add new pass, any suggustions about which exiting pass > >> (jumpthreading/forwprop/phiopt/...) 
would be more suitable to extend to
> >> support this new transform?
> >
> > The main thing the transform does is tail-duplicate the PHI block,
> > if we'd just do that followup transforms would do the rest.
>
> One might even claim this is really a transformation for cfg cleanups.
> If we ignore the PHI what we have is an unconditional jump to a
> conditional jump.  The obvious way to optimize that is to replace the
> unconditional jump with a copy of the conditional jump.

I thought about this too, but then quickly disregarded it as too gross ;)

> >> Currently this pass handles the case where there is only one PHI and the
> >> phi is used by a gcond.  If there are other suitable cases, we can just
> >> extend this transform.
> >
> > I was just thinking of a small amount of unrelated stmts in those
> > blocks or the case where just one PHI argument is fed by a conditional
> > thus only one path would simplify (but maybe the hot one if you look
> > at edge probabilities).
>
> Perhaps, but do these happen enough in practice?  With the code motions
> we do I'd be a bit surprised.

It probably depends on what kind of source this typically arises from.

Richard.
On 5/9/19 12:07 AM, Richard Biener wrote: > On Wed, 8 May 2019, Jeff Law wrote: > >> On 5/8/19 6:20 AM, Richard Biener wrote: >>> On Wed, 8 May 2019, Jiufu Guo wrote: >>> >>>> Hi, >>>> >>>> Thanks Richard, Segher, Andrew and all. >>>> >>>> Segher Boessenkool <segher@kernel.crashing.org> writes: >>>> >>>>> Let me try to answer some of this... >>>>> >>>>> On Tue, May 07, 2019 at 03:31:27PM +0200, Richard Biener wrote: >>>>>> On Mon, 6 May 2019, Jiufu Guo wrote: >>>>>>> This patch implements the optimization in PR77820. The optimization >>>>>>> eliminates phi and phi's basic block, if the phi is used only by >>>>>>> condition branch, and the phi's incoming value in the result of a >>>>>>> CMP result. >>>>> >>>>>> I'm not sure I like a new pass here ;) The transform is basically >>>>>> tail-duplicating the PHI block because the exit conditional can >>>>>> be "simplfied" - that's something jump threading generally does >>>>>> though it relies on "simplified" being a little bit more simplified >>>>>> than above. >>>>> >>>> Adding a new pass is just because it may be relatively easy to extend >>>> and maintain. >>>> >>>> Adding this micor-optimization into other passes is also a good >>>> sugguestion. >>>> >>>> - Similar with jump threading, this new transform is replacing jump >>>> old destination with new destination. While for this case, the new >>>> destination can not be determined at compile time. >>>> >>>> - And as Andrew said: forwprop/backprop are similar, but this case is in >>>> opposite: it is spread common code(phi+gcond) into different preds. >>>> >>>> - And there is phiopt pass which focus on making 'phi' stmt better. >>>> it also do some similar thing: >>>> bb0: >>>> if (a != b) goto bb2; else goto bb1; >>>> bb1: >>>> bb2: >>>> x = PHI <a (bb1), b (bb0), ...>; >>>> >>>> tranform to: >>>> >>>> bb0: >>>> bb2: >>>> x = PHI <b (bb0), ...>; >>>> >>>> If I avoid to add new pass, any suggustions about which exiting pass >>>> (jumpthreading/forwprop/phiopt/...) 
would be more suitable to extend to >>>> support this new transform? >>> >>> The main thing the transform does is tail-duplicate the PHI block, >>> if we'd just do that followup transforms would do the rest. >> One might even claim this is really a transformation for cfg cleanups. >> If we ignore the PHI what we have is a unconditional jump to a >> conditional jump. The obvious way to optimize that is to replace the >> unconditional jump with a copy of the conditional jump. > > I though about this too, but then quickly disregarded as too gross ;) Yea, the changes to the CFG (and dominator tree and SSA graph) are probably significant enough that trying to do it into cleanup_cfg is just madness. One of those few cases where doing something on RTL is probably easier than gimple. Jeff
Jeff Law <law@redhat.com> writes:

> On 5/8/19 6:20 AM, Richard Biener wrote:
>> On Wed, 8 May 2019, Jiufu Guo wrote:
>>
>> The main thing the transform does is tail-duplicate the PHI block,
>> if we'd just do that followup transforms would do the rest.
>
> One might even claim this is really a transformation for cfg cleanups.
> If we ignore the PHI what we have is an unconditional jump to a
> conditional jump.  The obvious way to optimize that is to replace the
> unconditional jump with a copy of the conditional jump.
>
>> Btw, I wonder if RTL basic-block reordering (which also does
>> some tail duplication) could be a place to do such a transform?
>> Or is it too late to do the desired cleanups after that?
>> Possibly since we're after RA.
>
> IIRC we've had code in jump.c to do this in the past.  It may have
> morphed through the years, but I'm pretty sure we've done this kind of
> thing on a low level before.

Yes, RTL basic-block reordering duplicates the conditional jump and
replaces the unconditional jump with it:

  17: %9:DI=%9:DI^0x1
  19: %9:DI=zero_extend(%9:QI)
  58: pc=L31
  59: barrier

--->

  17: %9:DI=%9:DI^0x1
  19: %9:DI=zero_extend(%9:QI)
  33: %0:CC=cmp(%9:DI,0)
      REG_DEAD %9:DI
  34: pc={(%0:CC!=0)?L90:pc}
      REG_DEAD %0:CC
      REG_BR_PROB 354334804

However, the conditional jump still goes through a GPR, even though the
GPR (%9:DI) is the result of a CMP instruction, because the move of the
CMP result into a GPR is generated when transforming gimple to RTL.

To eliminate the instructions that move the CMP result to a GPR, I'm
wondering whether we could do a combine (or a peephole) after bbro to let
the conditional jump use the result of the CMP directly.

Jiufu Guo.

> jeff
On Sun, May 12, 2019 at 10:07:28PM -0500, Jiufu Guo wrote:
> To eliminate the instructions that move the result of the CMP into a
> GPR, I'm wondering whether we could do a combine (or a peephole) after
> bbro to let the conditional jump use the result of the CMP directly.

It will have to be a peephole.  And peepholes are machine-specific.

You will also not get any further optimisations in those blocks, that
way; not combine etc.

It really should be done on gimple, not rtl.

Segher
On Mon, 13 May 2019, Segher Boessenkool wrote:

> On Sun, May 12, 2019 at 10:07:28PM -0500, Jiufu Guo wrote:
> > To eliminate the instructions that move the result of the CMP into a
> > GPR, I'm wondering whether we could do a combine (or a peephole)
> > after bbro to let the conditional jump use the result of the CMP
> > directly.
>
> It will have to be a peephole.  And peepholes are machine-specific.
>
> You will also not get any further optimisations in those blocks, that
> way; not combine etc.
>
> It really should be done on gimple, not rtl.

Agreed.
diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index d186d71..9729501 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1567,6 +1567,7 @@ OBJS = \
 	tree-ssa-reassoc.o \
 	tree-ssa-sccvn.o \
 	tree-ssa-scopedtables.o \
+	tree-ssa-mergephicmp.o \
 	tree-ssa-sink.o \
 	tree-ssa-strlen.o \
 	tree-ssa-structalias.o \
diff --git a/gcc/common.opt b/gcc/common.opt
index d342c4f..5ea5ed2 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2702,6 +2702,10 @@ ftree-salias
 Common Ignore
 Does nothing.  Preserved for backward compatibility.
 
+ftree-mergephicmp
+Common Report Var(flag_mergephicmp) Init(1) Optimization
+Enable optimization on branch phi compare on trees.
+
 ftree-sink
 Common Report Var(flag_tree_sink) Optimization
 Enable SSA code sinking on trees.
diff --git a/gcc/passes.def b/gcc/passes.def
index 446a7c4..e3a3913 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -249,6 +249,7 @@ along with GCC; see the file COPYING3.  If not see
       NEXT_PASS (pass_lim);
       NEXT_PASS (pass_walloca, false);
       NEXT_PASS (pass_pre);
+      NEXT_PASS (pass_mergephicmp);
       NEXT_PASS (pass_sink_code);
       NEXT_PASS (pass_sancov);
       NEXT_PASS (pass_asan);
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-1.c b/gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-1.c
new file mode 100644
index 0000000..2e3f4f6
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-1.c
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-mergephicmp" } */
+
+void g (void);
+void g1 (void);
+
+void
+f (long a, long b, long c, long d, int x)
+{
+  _Bool t;
+  if (x)
+    {
+      t = a < b;
+    }
+  else if (d == a + b)
+    {
+      t = c < d;
+    }
+  else
+    {
+      t = a == c;
+    }
+
+  if (t)
+    {
+      g1 ();
+      g ();
+    }
+}
+
+/* { dg-final { scan-tree-dump-not "PHI" "mergephicmp" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-2.c b/gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-2.c
new file mode 100644
index 0000000..7c35417
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-2.c
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-mergephicmp" } */
+
+void g (void);
+void g1 (void);
+
+void
+f (long a, long b, long c, long d, int x)
+{
+  int t;
+  if (x)
+    {
+      t = a < b;
+    }
+  else if (d == a + b)
+    {
+      t = c < d;
+    }
+  else
+    {
+      t = a == c;
+    }
+
+  if (t)
+    {
+      g1 ();
+      g ();
+    }
+}
+
+/* { dg-final { scan-tree-dump-times "PHI" 0 "mergephicmp" } } */
diff --git a/gcc/timevar.def b/gcc/timevar.def
index 5415446..91f92d7 100644
--- a/gcc/timevar.def
+++ b/gcc/timevar.def
@@ -170,6 +170,7 @@ DEFTIMEVAR (TV_TREE_SPLIT_EDGES      , "tree split crit edges")
 DEFTIMEVAR (TV_TREE_REASSOC          , "tree reassociation")
 DEFTIMEVAR (TV_TREE_PRE              , "tree PRE")
 DEFTIMEVAR (TV_TREE_FRE              , "tree FRE")
+DEFTIMEVAR (TV_TREE_MERGEPHICMP      , "tree branch on PHI compare")
 DEFTIMEVAR (TV_TREE_SINK             , "tree code sinking")
 DEFTIMEVAR (TV_TREE_PHIOPT           , "tree linearize phis")
 DEFTIMEVAR (TV_TREE_BACKPROP         , "tree backward propagate")
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index 47be59b..e21e820 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -441,6 +441,7 @@ extern gimple_opt_pass *make_pass_tree_ifcombine (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_dse (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_nrv (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_rename_ssa_copies (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_mergephicmp (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_sink_code (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_fre (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_check_data_deps (gcc::context *ctxt);
diff --git a/gcc/tree-ssa-mergephicmp.c b/gcc/tree-ssa-mergephicmp.c
new file mode 100644
index 0000000..7bb82d5
--- /dev/null
+++ b/gcc/tree-ssa-mergephicmp.c
@@ -0,0 +1,260 @@
+/* Eliminate PHI nodes which come from comparisons and are only used
+   by conditional branches.
+   Copyright (C) 2004-2019 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT
+ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "backend.h"
+#include "tree.h"
+#include "gimple.h"
+#include "cfghooks.h"
+#include "tree-pass.h"
+#include "ssa.h"
+#include "gimple-iterator.h"
+#include "tree-cfg.h"
+
+/* Return true if it is ok to merge phi's incoming value:
+   - phi's incoming value is
+       phi_arg = a CMP b
+     or
+       cmp = a CMP b
+       phi_arg = (INT_CONV) cmp
+   - phi's incoming value is defined in the incoming basic block.
+
+   PHI is the phi to be checked.
+   INDEX is the index of the incoming argument of PHI to be checked.  */
+
+static bool
+could_incoming_merge (gphi *phi, int index)
+{
+  tree value = gimple_phi_arg_def (phi, index);
+  if (!(TREE_CODE (value) == SSA_NAME && has_single_use (value)))
+    return false;
+
+  gimple *def = SSA_NAME_DEF_STMT (value);
+
+  if (!is_gimple_assign (def))
+    return false;
+
+  if (CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (def)))
+    {
+      value = gimple_assign_rhs1 (def);
+      if (!has_single_use (value))
+	return false;
+
+      def = SSA_NAME_DEF_STMT (value);
+
+      if (!is_gimple_assign (def))
+	return false;
+    }
+
+  if (TREE_CODE_CLASS (gimple_assign_rhs_code (def)) != tcc_comparison)
+    return false;
+
+  /* Check if phi's incoming value is defined in the incoming basic_block.  */
+  edge e = gimple_phi_arg_edge (phi, index);
+  if (def->bb != e->src)
+    return false;
+
+  if (!single_succ_p (def->bb))
+    return false;
+
+  return true;
+}
+
+/* Return true if the basic block BB is ok to optimize:
+   <X>
+   res = PHI <v0(b0), v1(b1)...>
+   if (res != 0) goto l_true; goto l_false;
+
+   The phi stmt is saved to argument PHI.
+   The gcond stmt is saved to argument GC.  */
+
+static bool
+could_optimize_phi_bb (basic_block bb, gphi **phi, gcond **gc)
+{
+  /* There should be only one phi.  */
+  gphi_iterator pi = gsi_start_phis (bb);
+  if (gsi_end_p (pi))
+    return false;
+
+  *phi = pi.phi ();
+  if (!has_single_use ((*phi)->result))
+    return false;
+
+  gsi_next (&pi);
+  if (!gsi_end_p (pi))
+    return false;
+
+  /* There should be only one stmt, which is a gcond.  */
+  gimple *gs = last_and_only_stmt (bb);
+  if (gs == NULL)
+    return false;
+
+  if (!is_a<gcond *> (gs))
+    return false;
+
+  *gc = as_a<gcond *> (gs);
+  if (gimple_cond_lhs (*gc) != (*phi)->result)
+    return false;
+
+  /* Check if all incoming basic blocks can merge.  */
+  for (unsigned int i = 0; i < (*phi)->nargs; i++)
+    if (!could_incoming_merge (*phi, i))
+      return false;
+
+  /* Check that there is no phi in any successor.  */
+  edge e;
+  edge_iterator ei;
+  FOR_EACH_EDGE (e, ei, (*phi)->bb->succs)
+    {
+      if (!gsi_end_p (gsi_start_phis (e->dest)))
+	return false;
+    }
+
+  return true;
+}
+
+/* Merge the phi and the gcond into the INDEX'th incoming block.  */
+
+static void
+merge_incoming_bb (gcond *gc, gphi *phi, int index)
+{
+  tree value = gimple_phi_arg_def (phi, index);
+
+  gcond *new_gc = gimple_build_cond (gimple_cond_code (gc), value,
+				     gimple_cond_rhs (gc), NULL, NULL);
+
+  gimple *def = SSA_NAME_DEF_STMT (value);
+  gimple_stmt_iterator last = gsi_last_bb (def->bb);
+  gsi_insert_after (&last, new_gc, GSI_NEW_STMT);
+
+  edge e;
+  edge_iterator ei;
+  FOR_EACH_EDGE (e, ei, phi->bb->succs)
+    {
+      edge new_e = make_edge (def->bb, e->dest, e->flags);
+      new_e->probability = e->probability;
+    }
+}
+
+/* If there are basic blocks that look like:
+   <P0>
+   p0 = a CMP b
+   goto <X>;
+
+   <P1>
+   p1 = c CMP d
+   goto <X>;
+
+   <X>
+   # phi = PHI <p0 (P0), p1 (P1)>
+   if (phi != 0) goto <Y>; else goto <Z>;
+
+   Transform them to:
+
+   <P0>
+   p0 = a CMP b
+   if (p0 != 0) goto <Y>; else goto <Z>;
+
+   <P1>
+   p1 = c CMP d
+   if (p1 != 0) goto <Y>; else goto <Z>;  */
+
+static bool
+mergephicmp_once (function *fun)
+{
+  bool change = false;
+  basic_block bb;
+  basic_block prev_need_delete = NULL;
+
+  FOR_EACH_BB_FN (bb, fun)
+    {
+      gphi *phi;
+      gcond *gc;
+
+      /* The previous bb can be deleted only after the iterator has
+	 moved to the next bb.  */
+      if (prev_need_delete)
+	delete_basic_block (prev_need_delete);
+      prev_need_delete = NULL;
+
+      if (could_optimize_phi_bb (bb, &phi, &gc))
+	{
+	  for (unsigned int i = 0; i < phi->nargs; i++)
+	    merge_incoming_bb (gc, phi, i);
+
+	  change = true;
+	  prev_need_delete = bb;
+	}
+    }
+
+  return change;
+}
+
+namespace {
+
+const pass_data pass_data_mergephicmp = {
+  GIMPLE_PASS, /* type */
+  "mergephicmp", /* name */
+  OPTGROUP_NONE, /* optinfo_flags */
+  TV_TREE_MERGEPHICMP, /* tv_id */
+  /* PROP_no_crit_edges is ensured by running split_critical_edges in
+     pass_data_sink_code::execute ().  */
+  (PROP_cfg | PROP_ssa), /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  0, /* todo_flags_finish */
+};
+
+class pass_mergephicmp : public gimple_opt_pass
+{
+public:
+  pass_mergephicmp (gcc::context *ctxt)
+    : gimple_opt_pass (pass_data_mergephicmp, ctxt)
+  {
+  }
+
+  /* opt_pass methods: */
+  virtual bool gate (function *) { return flag_mergephicmp != 0; }
+
+  virtual unsigned int execute (function *);
+
+}; // class pass_mergephicmp
+
+unsigned int
+pass_mergephicmp::execute (function *fun)
+{
+  bool changed;
+
+  if (n_basic_blocks_for_fn (fun) <= NUM_FIXED_BLOCKS + 1)
+    return 0;
+
+  changed = mergephicmp_once (fun);
+
+  if (changed)
+    free_dominance_info (CDI_DOMINATORS);
+
+  return changed ? TODO_cleanup_cfg : 0;
+}
+
+} // anon namespace
+
+gimple_opt_pass *
+make_pass_mergephicmp (gcc::context *ctxt)
+{
+  return new pass_mergephicmp (ctxt);
}