Message ID | 56450B62.4090404@redhat.com |
---|---|
State | New |
Headers | show |
On Thu, Nov 12, 2015 at 10:57 PM, Jeff Law <law@redhat.com> wrote: > On 11/12/2015 11:32 AM, Jeff Law wrote: >> >> On 11/12/2015 10:05 AM, Jeff Law wrote: >>>> >>>> But IIRC you mentioned it should enable vectorization or so? In this >>>> case >>>> that's obviously too late. >>> >>> The opposite. Path splitting interferes with if-conversion & >>> vectorization. Path splitting mucks up the CFG enough that >>> if-conversion won't fire and as a result vectorization is inhibited. It >>> also creates multi-latch loops, which isn't a great situation either. >>> >>> It *may* be the case that dropping it that far down in the pipeline and >>> making the modifications necessary to handle simple latches may in turn >>> make the path splitting code play better with if-conversion and >>> vectorization and avoid creation of multi-latch loops. At least that's >>> how it looks on paper when I draw out the CFG manipulations. >>> >>> I'll do some experiments. >> >> It doesn't look too terrible to ravamp the recognition code to work >> later in the pipeline with simple latches. Sadly that doesn't seem to >> have fixed the bad interactions with if-conversion. >> >> *But* that does open up the possibility of moving the path splitting >> pass even deeper in the pipeline -- in particular we can move it past >> the vectorizer. Which is may be a win. >> >> So the big question is whether or not we'll still see enough benefits >> from having it so late in the pipeline. It's still early enough that we >> get DOM, VRP, reassoc, forwprop, phiopt, etc. >> >> Ajit, I'll pass along an updated patch after doing some more testing. > > So here's what I'm working with. It runs after the vectorizer now. > > Ajit, if you could benchmark this it would be greatly appreciated. I know > you saw significant improvements on one or more benchmarks in the past. > It'd be good to know that the updated placement of the pass doesn't > invalidate the gains you saw. > > With the updated pass placement, we don't have to worry about switching the > pass on/off based on whether or not the vectorizer & if-conversion are > enabled. So that hackery is gone. > > I think I've beefed up the test to identify the diamond patterns we want so > that it's stricter in what we accept. The call to ignore_bb_p is a part of > that test so that we're actually looking at the right block in a world where > we're doing this transformation with simple latches. > > I've also put a graphical comment before perform_path_splitting which > hopefully shows the CFG transformation we're making a bit clearer. > > This bootstraps and regression tests cleanly on x86_64-linux-gnu. > > > > diff --git a/gcc/Makefile.in b/gcc/Makefile.in > index 34d2356..6613e83 100644 > --- a/gcc/Makefile.in > +++ b/gcc/Makefile.in > @@ -1474,6 +1474,7 @@ OBJS = \ > tree-ssa-loop.o \ > tree-ssa-math-opts.o \ > tree-ssa-operands.o \ > + tree-ssa-path-split.o \ gimple-ssa-path-split please. > tree-ssa-phionlycprop.o \ > tree-ssa-phiopt.o \ > tree-ssa-phiprop.o \ > diff --git a/gcc/common.opt b/gcc/common.opt > index 757ce85..3e946ca 100644 > --- a/gcc/common.opt > +++ b/gcc/common.opt > @@ -2403,6 +2403,10 @@ ftree-vrp > Common Report Var(flag_tree_vrp) Init(0) Optimization > Perform Value Range Propagation on trees. > > +ftree-path-split fsplit-paths > +Common Report Var(flag_tree_path_split) Init(0) Optimization > +Perform Path Splitting on trees for loop backedges. > + > funit-at-a-time > Common Report Var(flag_unit_at_a_time) Init(1) > Compile whole compilation unit at a time. > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi > index 213a9d0..b1e95da 100644 > --- a/gcc/doc/invoke.texi > +++ b/gcc/doc/invoke.texi > @@ -354,6 +354,7 @@ Objective-C and Objective-C++ Dialects}. > -fdump-tree-fre@r{[}-@var{n}@r{]} @gol > -fdump-tree-vtable-verify @gol > -fdump-tree-vrp@r{[}-@var{n}@r{]} @gol > +-fdump-tree-path-split@r{[}-@var{n}@r{]} @gol > -fdump-tree-storeccp@r{[}-@var{n}@r{]} @gol > -fdump-final-insns=@var{file} @gol > -fcompare-debug@r{[}=@var{opts}@r{]} -fcompare-debug-second @gol > @@ -462,7 +463,7 @@ Objective-C and Objective-C++ Dialects}. > -ftree-parallelize-loops=@var{n} -ftree-pre -ftree-partial-pre -ftree-pta > @gol > -ftree-reassoc -ftree-sink -ftree-slsr -ftree-sra @gol > -ftree-switch-conversion -ftree-tail-merge -ftree-ter @gol > --ftree-vectorize -ftree-vrp @gol > +-ftree-vectorize -ftree-vrp @gol -ftree-path-split @gol > -funit-at-a-time -funroll-all-loops -funroll-loops @gol > -funsafe-loop-optimizations -funsafe-math-optimizations -funswitch-loops > @gol > -fipa-ra -fvariable-expansion-in-unroller -fvect-cost-model -fvpt @gol > @@ -7169,6 +7170,11 @@ output on to @file{stderr}. If two conflicting dump > filenames are > given for the same pass, then the latter option overrides the earlier > one. > > +@item path-split > +@opindex fdump-tree-path-split > +Dump each function after path splitting. The file name is made by > +appending @file{.path-split} to the source file name. > + > @item all > Turn on all options, except @option{raw}, @option{slim}, @option{verbose} > and @option{lineno}. > @@ -7811,6 +7817,7 @@ also turns on the following optimization flags: > -ftree-switch-conversion -ftree-tail-merge @gol > -ftree-pre @gol > -ftree-vrp @gol > +-ftree-path-split @gol > -fipa-ra} > > Please note the warning under @option{-fgcse} about > @@ -8819,7 +8826,7 @@ currently enabled, but may be enabled by @option{-O2} > in the future. > > @item -ftree-sink > @opindex ftree-sink > -Perform forward store motion on trees. This flag is > +Perform forward store motion on trees. This flag is > enabled by default at @option{-O} and higher. > > @item -ftree-bit-ccp > @@ -9125,6 +9132,13 @@ enabled by default at @option{-O2} and higher. Null > pointer check > elimination is only done if @option{-fdelete-null-pointer-checks} is > enabled. > > +@item -ftree-path-split > +@opindex ftree-path-split > +Perform Path Splitting on trees. When the two execution paths of a > +if-then-else merge at the loop latch node, try to duplicate the > +merge node into two paths. This is enabled by default at @option{-O2} > +and above. > + I think if we go into the detail of the transform we should mention the effective result (creating a loop nest with disambiguation figuring out which is the "better" inner loop). > @item -fsplit-ivs-in-unroller > @opindex fsplit-ivs-in-unroller > Enables expression of values of induction variables in later iterations > diff --git a/gcc/opts.c b/gcc/opts.c > index 9a3fbb3..9a0b27c 100644 > --- a/gcc/opts.c > +++ b/gcc/opts.c > @@ -509,6 +509,7 @@ static const struct default_options > default_options_table[] = > { OPT_LEVELS_2_PLUS, OPT_fisolate_erroneous_paths_dereference, NULL, 1 > }, > { OPT_LEVELS_2_PLUS, OPT_fipa_ra, NULL, 1 }, > { OPT_LEVELS_2_PLUS, OPT_flra_remat, NULL, 1 }, > + { OPT_LEVELS_2_PLUS, OPT_ftree_path_split, NULL, 1 }, Is this transform a good idea for -Os? > > /* -O3 optimizations. */ > { OPT_LEVELS_3_PLUS, OPT_ftree_loop_distribute_patterns, NULL, 1 }, > diff --git a/gcc/passes.def b/gcc/passes.def > index c0ab6b9..4c9ef5f 100644 > --- a/gcc/passes.def > +++ b/gcc/passes.def > @@ -274,6 +274,7 @@ along with GCC; see the file COPYING3. If not see > POP_INSERT_PASSES () > NEXT_PASS (pass_simduid_cleanup); > NEXT_PASS (pass_lower_vector_ssa); > + NEXT_PASS (pass_path_split); > NEXT_PASS (pass_cse_reciprocals); > NEXT_PASS (pass_reassoc); > NEXT_PASS (pass_strength_reduction); > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/path-split-1.c > b/gcc/testsuite/gcc.dg/tree-ssa/path-split-1.c > new file mode 100644 > index 0000000..c7e9515 > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/tree-ssa/path-split-1.c > @@ -0,0 +1,67 @@ > +/* { dg-do run } */ > +/* { dg-options "-O2 -fdump-tree-path-split-details " } */ > + > +#include <stdio.h> > +#include <stdlib.h> > + > +#define RGBMAX 255 > + > +int > +test() > +{ > + int i, Pels; > + unsigned char sum = 0; > + unsigned char xr, xg, xb; > + unsigned char xc, xm, xy, xk; > + unsigned char *ReadPtr, *EritePtr; > + > + ReadPtr = ( unsigned char *) malloc (sizeof (unsigned char) * 100); > + EritePtr = ( unsigned char *) malloc (sizeof (unsigned char) * 100); > + > + for (i = 0; i < 100;i++) > + { > + ReadPtr[i] = 100 - i; > + } > + > + for (i = 0; i < 100; i++) > + { > + xr = *ReadPtr++; > + xg = *ReadPtr++; > + xb = *ReadPtr++; > + > + xc = (unsigned char) (RGBMAX - xr); > + xm = (unsigned char) (RGBMAX - xg); > + xy = (unsigned char) (RGBMAX - xb); > + > + if (xc < xm) > + { > + xk = (unsigned char) (xc < xy ? xc : xy); > + } > + else > + { > + xk = (unsigned char) (xm < xy ? xm : xy); > + } > + > + xc = (unsigned char) (xc - xk); > + xm = (unsigned char) (xm - xk); > + xy = (unsigned char) (xy - xk); > + > + *EritePtr++ = xc; > + *EritePtr++ = xm; > + *EritePtr++ = xy; > + *EritePtr++ = xk; > + sum += *EritePtr; > + } > + return sum; > +} > + > +int > +main() > +{ > + if (test() != 33) > + abort(); > + > + return 0; > +} > + > +/* { dg-final { scan-tree-dump "Duplicating join block" "path-split" } } */ > diff --git a/gcc/timevar.def b/gcc/timevar.def > index b429faf..3dba68b 100644 > --- a/gcc/timevar.def > +++ b/gcc/timevar.def > @@ -300,3 +300,4 @@ DEFTIMEVAR (TV_LINK , "link JIT code") > DEFTIMEVAR (TV_LOAD , "load JIT result") > DEFTIMEVAR (TV_JIT_ACQUIRING_MUTEX , "acquiring JIT mutex") > DEFTIMEVAR (TV_JIT_CLIENT_CODE , "JIT client code") > +DEFTIMEVAR (TV_TREE_PATH_SPLIT , "tree path split") > diff --git a/gcc/tracer.c b/gcc/tracer.c > index 941dc20..c7b5150 100644 > --- a/gcc/tracer.c > +++ b/gcc/tracer.c > @@ -51,9 +51,9 @@ > #include "tree-inline.h" > #include "cfgloop.h" > #include "fibonacci_heap.h" > +#include "tracer.h" > > static int count_insns (basic_block); > -static bool ignore_bb_p (const_basic_block); > static bool better_p (const_edge, const_edge); > static edge find_best_successor (basic_block); > static edge find_best_predecessor (basic_block); > @@ -85,7 +85,7 @@ bb_seen_p (basic_block bb) > } > > /* Return true if we should ignore the basic block for purposes of tracing. > */ > -static bool > +bool > ignore_bb_p (const_basic_block bb) > { > if (bb->index < NUM_FIXED_BLOCKS) > @@ -226,6 +226,24 @@ find_trace (basic_block bb, basic_block *trace) > return i; > } > > +/* Duplicate block BB2, placing it after BB in the CFG. Return the > + newly created block. */ > +basic_block > +transform_duplicate (basic_block bb, basic_block bb2) > +{ > + edge e; > + basic_block copy; > + > + e = find_edge (bb, bb2); > + > + copy = duplicate_block (bb2, e, bb); > + flush_pending_stmts (e); > + > + add_phi_args_after_copy (©, 1, NULL); > + > + return (copy); > +} > + > /* Look for basic blocks in frequency order, construct traces and tail > duplicate > if profitable. */ > > @@ -321,17 +339,8 @@ tail_duplicate (void) > entries or at least rotate the loop. */ > && bb2->loop_father->header != bb2) > { > - edge e; > - basic_block copy; > - > - nduplicated += counts [bb2->index]; > - > - e = find_edge (bb, bb2); > - > - copy = duplicate_block (bb2, e, bb); > - flush_pending_stmts (e); > - > - add_phi_args_after_copy (©, 1, NULL); > + nduplicated += counts [bb2->index]; > + basic_block copy = transform_duplicate (bb, bb2); > > /* Reconsider the original copy of block we've duplicated. > Removing the most common predecessor may make it to be > diff --git a/gcc/tracer.h b/gcc/tracer.h > new file mode 100644 > index 0000000..cd1792a > --- /dev/null > +++ b/gcc/tracer.h > @@ -0,0 +1,26 @@ > +/* Header file for Tracer. > + Copyright (C) 2015 Free Software Foundation, Inc. > + > +This file is part of GCC. > + > +GCC is free software; you can redistribute it and/or modify it under > +the terms of the GNU General Public License as published by the Free > +Software Foundation; either version 3, or (at your option) any later > +version. > + > +GCC is distributed in the hope that it will be useful, but WITHOUT ANY > +WARRANTY; without even the implied warranty of MERCHANTABILITY or > +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License > + for more details. > + > +You should have received a copy of the GNU General Public License > +along with GCC; see the file COPYING3. If not see > +<http://www.gnu.org/licenses/>. */ > + > +#ifndef GCC_TRACER_H > +#define GCC_TRACER_H > + > +extern basic_block transform_duplicate (basic_block bb, basic_block bb2); > +extern bool ignore_bb_p (const_basic_block bb); > + > +#endif /* GCC_TRACER_H */ > diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h > index 49e22a9..6963acc 100644 > --- a/gcc/tree-pass.h > +++ b/gcc/tree-pass.h > @@ -390,6 +390,7 @@ extern gimple_opt_pass *make_pass_tree_loop_done > (gcc::context *ctxt); > extern gimple_opt_pass *make_pass_ch (gcc::context *ctxt); > extern gimple_opt_pass *make_pass_ch_vect (gcc::context *ctxt); > extern gimple_opt_pass *make_pass_ccp (gcc::context *ctxt); > +extern gimple_opt_pass *make_pass_path_split (gcc::context *ctxt); > extern gimple_opt_pass *make_pass_phi_only_cprop (gcc::context *ctxt); > extern gimple_opt_pass *make_pass_build_ssa (gcc::context *ctxt); > extern gimple_opt_pass *make_pass_build_alias (gcc::context *ctxt); > diff --git a/gcc/tree-ssa-path-split.c b/gcc/tree-ssa-path-split.c > new file mode 100644 > index 0000000..9f61bd4 > --- /dev/null > +++ b/gcc/tree-ssa-path-split.c > @@ -0,0 +1,275 @@ > +/* Support routines for Path Splitting. > + Copyright (C) 2015 Free Software Foundation, Inc. > + Contributed by Ajit Kumar Agarwal <ajitkum@xilinx.com>. > + > + This file is part of GCC. > + > + GCC is free software; you can redistribute it and/or modify > + it under the terms of the GNU General Public License as published by > + the Free Software Foundation; either version 3, or (at your option) > + any later version. > + > +GCC is distributed in the hope that it will be useful, > +but WITHOUT ANY WARRANTY; without even the implied warranty of > +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > +GNU General Public License for more details. > + > +You should have received a copy of the GNU General Public License > +along with GCC; see the file COPYING3. If not see > +<http://www.gnu.org/licenses/>. */ > + > +#include "config.h" > +#include "system.h" > +#include "coretypes.h" > +#include "backend.h" > +#include "tree.h" > +#include "gimple.h" > +#include "tree-pass.h" > +#include "cfganal.h" > +#include "cfgloop.h" > +#include "gimple-iterator.h" > +#include "tracer.h" > + > +/* Given LATCH, the latch block in a loop, see if the shape of the > + path reaching LATCH is suitable for path splitting. If so, return > + the block that will be duplicated into its predecessor paths. Else > + return NULL. */ > + > +static basic_block > +find_block_to_duplicate_for_path_splitting (basic_block latch) > +{ > + /* We should have simple latches at this point. So the latch should > + have a single successor. This implies the predecessor of the latch > + likely has the loop exit. And it's that predecessor we're most > + interested in. To keep things simple, we're going to require that > + the latch have a single predecessor too. */ > + if (single_succ_p (latch) && single_pred_p (latch)) > + { > + basic_block bb = get_immediate_dominator (CDI_DOMINATORS, latch); > + gcc_assert (single_pred_edge (latch)->src == bb); > + > + /* If BB has been marked as not to be duplicated, then honor that > + request. */ > + if (ignore_bb_p (bb)) > + return NULL; > + > + gimple *last = gsi_stmt (gsi_last_nondebug_bb (bb)); > + /* The immediate dominator of the latch must end in a conditional. > */ > + if (!last || gimple_code (last) != GIMPLE_COND) > + return NULL; > + > + /* We're hoping that BB is a join point for an IF-THEN-ELSE diamond > + region. Verify that it is. > + > + First, verify that BB has two predecessors (each arm of the > + IF-THEN-ELSE) and two successors (the latch and exit). */ > + if (EDGE_COUNT (bb->preds) == 2 && EDGE_COUNT (bb->succs) == 2) > + { > + /* Now verify that BB's immediate dominator ends in a > + conditional as well. */ > + basic_block bb_idom = get_immediate_dominator (CDI_DOMINATORS, > bb); > + gimple *last = gsi_stmt (gsi_last_nondebug_bb (bb_idom)); > + if (!last || gimple_code (last) != GIMPLE_COND) > + return NULL; > + > + /* And that BB's immediate dominator's successors are the > + the predecessors of BB. */ > + if (!find_edge (bb_idom, EDGE_PRED (bb, 0)->src) > + || !find_edge (bb_idom, EDGE_PRED (bb, 1)->src)) > + return NULL; > + > + /* So at this point we have a simple diamond for an IF-THEN-ELSE > + construct starting at BB_IDOM, with a join point at BB. BB > + pass control outside the loop or to the loop latch. > + > + We're going to want to create two duplicates of BB, one for > + each successor of BB_IDOM. */ > + return bb; > + } > + } > + return NULL; > +} > + > +/* Return TRUE if BB is a reasonable block to duplicate by examining > + its size, false otherwise. BB will always be a loop latch block. > + > + Should this use the same tests as we do for jump threading? */ > + > +static bool > +is_feasible_trace (basic_block bb) > +{ > + int num_stmt = 0; > + gimple_stmt_iterator gsi; > + > + for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi)) > + { > + gimple *stmt = gsi_stmt (gsi); > + if (!is_gimple_debug (stmt)) > + num_stmt++; > + } > + > + /* We may want to limit how many statements we copy. */ > + if (num_stmt > 1) > + return true; > + > + return false; > +} > + > +/* If the immediate dominator of the latch of the loop is > + block with conditional branch, then the loop latch is > + duplicated to its predecessors path preserving the SSA > + semantics. > + > + CFG before transformation. > + > + 2 > + | > + | > + +---->3 > + | / \ > + | / \ > + | 4 5 > + | \ / > + | \ / > + | 6 > + | / \ > + | / \ > + | 8 7 > + | | | > + ---+ E > + > + > + > + Block 8 is the latch. We're going to make copies of block 6 (9 & 10) > + and wire things up so they look like this: > + > + 2 > + | > + | > + +---->3 > + | / \ > + | / \ > + | 4 5 > + | | | > + | | | > + | 9 10 > + | |\ /| > + | | \ / | > + | | 7 | > + | | | | > + | | E | > + | | | > + | \ / > + | \ / > + +-----8 > + > + > + Blocks 9 and 10 will get merged into blocks 4 & 5 respectively which > + enables CSE, DCE and other optimizations to occur on a larger block > + of code. */ > + > +static bool > +perform_path_splitting () > +{ > + bool changed = false; > + loop_p loop; > + > + loop_optimizer_init (LOOPS_NORMAL | LOOPS_HAVE_RECORDED_EXITS); > + initialize_original_copy_tables (); > + calculate_dominance_info (CDI_DOMINATORS); > + > + FOR_EACH_LOOP (loop, LI_FROM_INNERMOST) > + { > + /* See if there is a block that we can duplicate to split the > + path to the loop latch. */ > + basic_block bb = find_block_to_duplicate_for_path_splitting > (loop->latch); > + > + /* BB is the merge point for an IF-THEN-ELSE we want to transform. > + > + Essentially we want to create two duplicates of BB and append > + a duplicate to the THEN and ELSE clauses. This will split the > + path leading to the latch. BB will be unreachable and removed. */ > + if (bb && is_feasible_trace (bb)) > + { > + if (dump_file && (dump_flags & TDF_DETAILS)) > + fprintf (dump_file, > + "Duplicating join block %d into predecessor paths\n", > + bb->index); > + basic_block pred0 = EDGE_PRED (bb, 0)->src; > + basic_block pred1 = EDGE_PRED (bb, 1)->src; > + transform_duplicate (pred0, bb); > + transform_duplicate (pred1, bb); > + changed = true; > + } > + } > + > + loop_optimizer_finalize (); > + free_original_copy_tables (); > + return changed; > +} > + > +/* Main entry point for path splitting. Returns TODO_cleanup_cfg if any > + paths where split, otherwise return zero. */ > + > +static unsigned int > +execute_path_split (void) > +{ > + /* If we don't have at least 2 real blocks and backedges in the > + CFG, then there's no point in trying to perform path splitting. */ > + if (n_basic_blocks_for_fn (cfun) <= NUM_FIXED_BLOCKS + 1 > + || !mark_dfs_back_edges ()) > + return 0; > + > + bool changed = perform_path_splitting(); > + if (changed) > + { > + free_dominance_info (CDI_DOMINATORS); > + /* If we changed the CFG schedule loops for fixup by cleanup_cfg. */ > + if (current_loops) > + loops_state_set (LOOPS_NEED_FIXUP); > + } > + > + return changed ? TODO_cleanup_cfg : 0; > +} > + > +static bool > +gate_path_split (void) > +{ > + return flag_tree_path_split; > +} > + > +namespace { > + > +const pass_data pass_data_path_split = > +{ > + GIMPLE_PASS, /* type */ > + "path-split", /* name */ > + OPTGROUP_NONE, /* optinfo_flags */ > + TV_TREE_PATH_SPLIT, /* tv_id */ > + PROP_ssa, /* properties_required */ > + 0, /* properties_provided */ > + 0, /* properties_destroyed */ > + 0, /* todo_flags_start */ > + TODO_update_ssa, /* todo_flags_finish */ > +}; > + > +class pass_path_split : public gimple_opt_pass > +{ > + public: > + pass_path_split (gcc::context *ctxt) > + : gimple_opt_pass (pass_data_path_split, ctxt) > + {} > + /* opt_pass methods: */ > + opt_pass * clone () { return new pass_path_split (m_ctxt); } > + virtual bool gate (function *) { return gate_path_split (); } > + virtual unsigned int execute (function *) { return execute_path_split > (); } > + > +}; // class pass_path_split > + > +} // anon namespace > + > +gimple_opt_pass * > +make_pass_path_split (gcc::context *ctxt) > +{ > + return new pass_path_split (ctxt); > +} >
-----Original Message----- From: Jeff Law [mailto:law@redhat.com] Sent: Friday, November 13, 2015 3:28 AM To: Richard Biener Cc: Ajit Kumar Agarwal; GCC Patches; Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju Mekala Subject: Re: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa representation On 11/12/2015 11:32 AM, Jeff Law wrote: > On 11/12/2015 10:05 AM, Jeff Law wrote: >>> But IIRC you mentioned it should enable vectorization or so? In >>> this case that's obviously too late. >> The opposite. Path splitting interferes with if-conversion & >> vectorization. Path splitting mucks up the CFG enough that >> if-conversion won't fire and as a result vectorization is inhibited. >> It also creates multi-latch loops, which isn't a great situation either. >> >> It *may* be the case that dropping it that far down in the pipeline >> and making the modifications necessary to handle simple latches may >> in turn make the path splitting code play better with if-conversion >> and vectorization and avoid creation of multi-latch loops. At least >> that's how it looks on paper when I draw out the CFG manipulations. >> >> I'll do some experiments. > It doesn't look too terrible to ravamp the recognition code to work > later in the pipeline with simple latches. Sadly that doesn't seem to > have fixed the bad interactions with if-conversion. > > *But* that does open up the possibility of moving the path splitting > pass even deeper in the pipeline -- in particular we can move it past > the vectorizer. Which is may be a win. > > So the big question is whether or not we'll still see enough benefits > from having it so late in the pipeline. It's still early enough that > we get DOM, VRP, reassoc, forwprop, phiopt, etc. > > Ajit, I'll pass along an updated patch after doing some more testing. Hello Jeff: >>So here's what I'm working with. It runs after the vectorizer now. >>Ajit, if you could benchmark this it would be greatly appreciated. I know you saw significant improvements on one or more benchmarks in the past. It'd be good to know that the >>updated placement of the pass doesn't invalidate the gains you saw. >>With the updated pass placement, we don't have to worry about switching the pass on/off based on whether or not the vectorizer & if-conversion are enabled. So that hackery is gone. >>I think I've beefed up the test to identify the diamond patterns we want so that it's stricter in what we accept. The call to ignore_bb_p is a part of that test so that we're actually looking at >>the right block in a world where we're doing this transformation with simple latches. >>I've also put a graphical comment before perform_path_splitting which hopefully shows the CFG transformation we're making a bit clearer. >>This bootstraps and regression tests cleanly on x86_64-linux-gnu. Thank you for the inputs. I will build the compiler and run SPEC CPU 2000 benchmarks for X86 target and respond back as soon as run is done. I will also run the EEMBC/Mibench benchmarks for Microblaze target. Would let you know the results at the earliest. Thanks & Regards Ajit
On 11/13/2015 03:13 AM, Richard Biener wrote: >> diff --git a/gcc/Makefile.in b/gcc/Makefile.in >> index 34d2356..6613e83 100644 >> --- a/gcc/Makefile.in >> +++ b/gcc/Makefile.in >> @@ -1474,6 +1474,7 @@ OBJS = \ >> tree-ssa-loop.o \ >> tree-ssa-math-opts.o \ >> tree-ssa-operands.o \ >> + tree-ssa-path-split.o \ > > gimple-ssa-path-split please. Agreed. I'll make that change for Ajit. > >> tree-ssa-phionlycprop.o \ >> tree-ssa-phiopt.o \ >> tree-ssa-phiprop.o \ >> diff --git a/gcc/common.opt b/gcc/common.opt >> index 757ce85..3e946ca 100644 >> --- a/gcc/common.opt >> +++ b/gcc/common.opt >> @@ -2403,6 +2403,10 @@ ftree-vrp >> Common Report Var(flag_tree_vrp) Init(0) Optimization >> Perform Value Range Propagation on trees. >> >> +ftree-path-split > > fsplit-paths And this plus related variable name fixes and such. >> >> +@item -ftree-path-split >> +@opindex ftree-path-split >> +Perform Path Splitting on trees. When the two execution paths of a >> +if-then-else merge at the loop latch node, try to duplicate the >> +merge node into two paths. This is enabled by default at @option{-O2} >> +and above. >> + > > I think if we go into the detail of the transform we should mention the > effective result (creating a loop nest with disambiguation figuring out > which is the "better" inner loop). It no longer creates a loop nest. The overall shape of the CFG is maintained. ie, we still have a single simple latch for the loop. The blocks we create are internal to the loop. I always struggle with the right level at which to document these options. I'll take a look at this for Ajit. BTW Do we have an API for indicating that new blocks have been added to a loop? If so, then we can likely drop the LOOPS_NEED_FIXUP. > >> @item -fsplit-ivs-in-unroller >> @opindex fsplit-ivs-in-unroller >> Enables expression of values of induction variables in later iterations >> diff --git a/gcc/opts.c b/gcc/opts.c >> index 9a3fbb3..9a0b27c 100644 >> --- a/gcc/opts.c >> +++ b/gcc/opts.c >> @@ -509,6 +509,7 @@ static const struct default_options >> default_options_table[] = >> { OPT_LEVELS_2_PLUS, OPT_fisolate_erroneous_paths_dereference, NULL, 1 >> }, >> { OPT_LEVELS_2_PLUS, OPT_fipa_ra, NULL, 1 }, >> { OPT_LEVELS_2_PLUS, OPT_flra_remat, NULL, 1 }, >> + { OPT_LEVELS_2_PLUS, OPT_ftree_path_split, NULL, 1 }, > > Is this transform a good idea for -Os? In general, no because of the block duplication. jeff
On November 13, 2015 5:26:01 PM GMT+01:00, Jeff Law <law@redhat.com> wrote: >On 11/13/2015 03:13 AM, Richard Biener wrote: > >>> diff --git a/gcc/Makefile.in b/gcc/Makefile.in >>> index 34d2356..6613e83 100644 >>> --- a/gcc/Makefile.in >>> +++ b/gcc/Makefile.in >>> @@ -1474,6 +1474,7 @@ OBJS = \ >>> tree-ssa-loop.o \ >>> tree-ssa-math-opts.o \ >>> tree-ssa-operands.o \ >>> + tree-ssa-path-split.o \ >> >> gimple-ssa-path-split please. >Agreed. I'll make that change for Ajit. > > >> >>> tree-ssa-phionlycprop.o \ >>> tree-ssa-phiopt.o \ >>> tree-ssa-phiprop.o \ >>> diff --git a/gcc/common.opt b/gcc/common.opt >>> index 757ce85..3e946ca 100644 >>> --- a/gcc/common.opt >>> +++ b/gcc/common.opt >>> @@ -2403,6 +2403,10 @@ ftree-vrp >>> Common Report Var(flag_tree_vrp) Init(0) Optimization >>> Perform Value Range Propagation on trees. >>> >>> +ftree-path-split >> >> fsplit-paths >And this plus related variable name fixes and such. > > >>> >>> +@item -ftree-path-split >>> +@opindex ftree-path-split >>> +Perform Path Splitting on trees. When the two execution paths of a >>> +if-then-else merge at the loop latch node, try to duplicate the >>> +merge node into two paths. This is enabled by default at >@option{-O2} >>> +and above. >>> + >> >> I think if we go into the detail of the transform we should mention >the >> effective result (creating a loop nest with disambiguation figuring >out >> which is the "better" inner loop). >It no longer creates a loop nest. The overall shape of the CFG is >maintained. ie, we still have a single simple latch for the loop. The > >blocks we create are internal to the loop. > >I always struggle with the right level at which to document these >options. I'll take a look at this for Ajit. > >BTW Do we have an API for indicating that new blocks have been added to > >a loop? If so, then we can likely drop the LOOPS_NEED_FIXUP. Please. It's called add_to_loop or so. Richard. > >> >>> @item -fsplit-ivs-in-unroller >>> @opindex fsplit-ivs-in-unroller >>> Enables expression of values of induction variables in later >iterations >>> diff --git a/gcc/opts.c b/gcc/opts.c >>> index 9a3fbb3..9a0b27c 100644 >>> --- a/gcc/opts.c >>> +++ b/gcc/opts.c >>> @@ -509,6 +509,7 @@ static const struct default_options >>> default_options_table[] = >>> { OPT_LEVELS_2_PLUS, OPT_fisolate_erroneous_paths_dereference, >NULL, 1 >>> }, >>> { OPT_LEVELS_2_PLUS, OPT_fipa_ra, NULL, 1 }, >>> { OPT_LEVELS_2_PLUS, OPT_flra_remat, NULL, 1 }, >>> + { OPT_LEVELS_2_PLUS, OPT_ftree_path_split, NULL, 1 }, >> >> Is this transform a good idea for -Os? >In general, no because of the block duplication. > >jeff
On 11/13/2015 11:09 AM, Richard Biener wrote: >> >> BTW Do we have an API for indicating that new blocks have been added to >> >> a loop? If so, then we can likely drop the LOOPS_NEED_FIXUP. > > Please. It's called add_to_loop or so. Haha, the block duplication code was handling this already. So in theory I can just drop the LOOPS_NEED_FIXUP completely. Testing now. jeff
diff --git a/gcc/Makefile.in b/gcc/Makefile.in index 34d2356..6613e83 100644 --- a/gcc/Makefile.in +++ b/gcc/Makefile.in @@ -1474,6 +1474,7 @@ OBJS = \ tree-ssa-loop.o \ tree-ssa-math-opts.o \ tree-ssa-operands.o \ + tree-ssa-path-split.o \ tree-ssa-phionlycprop.o \ tree-ssa-phiopt.o \ tree-ssa-phiprop.o \ diff --git a/gcc/common.opt b/gcc/common.opt index 757ce85..3e946ca 100644 --- a/gcc/common.opt +++ b/gcc/common.opt @@ -2403,6 +2403,10 @@ ftree-vrp Common Report Var(flag_tree_vrp) Init(0) Optimization Perform Value Range Propagation on trees. +ftree-path-split +Common Report Var(flag_tree_path_split) Init(0) Optimization +Perform Path Splitting on trees for loop backedges. + funit-at-a-time Common Report Var(flag_unit_at_a_time) Init(1) Compile whole compilation unit at a time. diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 213a9d0..b1e95da 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -354,6 +354,7 @@ Objective-C and Objective-C++ Dialects}. -fdump-tree-fre@r{[}-@var{n}@r{]} @gol -fdump-tree-vtable-verify @gol -fdump-tree-vrp@r{[}-@var{n}@r{]} @gol +-fdump-tree-path-split@r{[}-@var{n}@r{]} @gol -fdump-tree-storeccp@r{[}-@var{n}@r{]} @gol -fdump-final-insns=@var{file} @gol -fcompare-debug@r{[}=@var{opts}@r{]} -fcompare-debug-second @gol @@ -462,7 +463,7 @@ Objective-C and Objective-C++ Dialects}. -ftree-parallelize-loops=@var{n} -ftree-pre -ftree-partial-pre -ftree-pta @gol -ftree-reassoc -ftree-sink -ftree-slsr -ftree-sra @gol -ftree-switch-conversion -ftree-tail-merge -ftree-ter @gol --ftree-vectorize -ftree-vrp @gol +-ftree-vectorize -ftree-vrp @gol -ftree-path-split @gol -funit-at-a-time -funroll-all-loops -funroll-loops @gol -funsafe-loop-optimizations -funsafe-math-optimizations -funswitch-loops @gol -fipa-ra -fvariable-expansion-in-unroller -fvect-cost-model -fvpt @gol @@ -7169,6 +7170,11 @@ output on to @file{stderr}. If two conflicting dump filenames are given for the same pass, then the latter option overrides the earlier one. +@item path-split +@opindex fdump-tree-path-split +Dump each function after path splitting. The file name is made by +appending @file{.path-split} to the source file name. + @item all Turn on all options, except @option{raw}, @option{slim}, @option{verbose} and @option{lineno}. @@ -7811,6 +7817,7 @@ also turns on the following optimization flags: -ftree-switch-conversion -ftree-tail-merge @gol -ftree-pre @gol -ftree-vrp @gol +-ftree-path-split @gol -fipa-ra} Please note the warning under @option{-fgcse} about @@ -8819,7 +8826,7 @@ currently enabled, but may be enabled by @option{-O2} in the future. @item -ftree-sink @opindex ftree-sink -Perform forward store motion on trees. This flag is +Perform forward store motion on trees. This flag is enabled by default at @option{-O} and higher. @item -ftree-bit-ccp @@ -9125,6 +9132,13 @@ enabled by default at @option{-O2} and higher. Null pointer check elimination is only done if @option{-fdelete-null-pointer-checks} is enabled. +@item -ftree-path-split +@opindex ftree-path-split +Perform Path Splitting on trees. When the two execution paths of a +if-then-else merge at the loop latch node, try to duplicate the +merge node into two paths. This is enabled by default at @option{-O2} +and above. + @item -fsplit-ivs-in-unroller @opindex fsplit-ivs-in-unroller Enables expression of values of induction variables in later iterations diff --git a/gcc/opts.c b/gcc/opts.c index 9a3fbb3..9a0b27c 100644 --- a/gcc/opts.c +++ b/gcc/opts.c @@ -509,6 +509,7 @@ static const struct default_options default_options_table[] = { OPT_LEVELS_2_PLUS, OPT_fisolate_erroneous_paths_dereference, NULL, 1 }, { OPT_LEVELS_2_PLUS, OPT_fipa_ra, NULL, 1 }, { OPT_LEVELS_2_PLUS, OPT_flra_remat, NULL, 1 }, + { OPT_LEVELS_2_PLUS, OPT_ftree_path_split, NULL, 1 }, /* -O3 optimizations. */ { OPT_LEVELS_3_PLUS, OPT_ftree_loop_distribute_patterns, NULL, 1 }, diff --git a/gcc/passes.def b/gcc/passes.def index c0ab6b9..4c9ef5f 100644 --- a/gcc/passes.def +++ b/gcc/passes.def @@ -274,6 +274,7 @@ along with GCC; see the file COPYING3. If not see POP_INSERT_PASSES () NEXT_PASS (pass_simduid_cleanup); NEXT_PASS (pass_lower_vector_ssa); + NEXT_PASS (pass_path_split); NEXT_PASS (pass_cse_reciprocals); NEXT_PASS (pass_reassoc); NEXT_PASS (pass_strength_reduction); diff --git a/gcc/testsuite/gcc.dg/tree-ssa/path-split-1.c b/gcc/testsuite/gcc.dg/tree-ssa/path-split-1.c new file mode 100644 index 0000000..c7e9515 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/path-split-1.c @@ -0,0 +1,67 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -fdump-tree-path-split-details " } */ + +#include <stdio.h> +#include <stdlib.h> + +#define RGBMAX 255 + +int +test() +{ + int i, Pels; + unsigned char sum = 0; + unsigned char xr, xg, xb; + unsigned char xc, xm, xy, xk; + unsigned char *ReadPtr, *EritePtr; + + ReadPtr = ( unsigned char *) malloc (sizeof (unsigned char) * 100); + EritePtr = ( unsigned char *) malloc (sizeof (unsigned char) * 100); + + for (i = 0; i < 100;i++) + { + ReadPtr[i] = 100 - i; + } + + for (i = 0; i < 100; i++) + { + xr = *ReadPtr++; + xg = *ReadPtr++; + xb = *ReadPtr++; + + xc = (unsigned char) (RGBMAX - xr); + xm = (unsigned char) (RGBMAX - xg); + xy = (unsigned char) (RGBMAX - xb); + + if (xc < xm) + { + xk = (unsigned char) (xc < xy ? xc : xy); + } + else + { + xk = (unsigned char) (xm < xy ? xm : xy); + } + + xc = (unsigned char) (xc - xk); + xm = (unsigned char) (xm - xk); + xy = (unsigned char) (xy - xk); + + *EritePtr++ = xc; + *EritePtr++ = xm; + *EritePtr++ = xy; + *EritePtr++ = xk; + sum += *EritePtr; + } + return sum; +} + +int +main() +{ + if (test() != 33) + abort(); + + return 0; +} + +/* { dg-final { scan-tree-dump "Duplicating join block" "path-split" } } */ diff --git a/gcc/timevar.def b/gcc/timevar.def index b429faf..3dba68b 100644 --- a/gcc/timevar.def +++ b/gcc/timevar.def @@ -300,3 +300,4 @@ DEFTIMEVAR (TV_LINK , "link JIT code") DEFTIMEVAR (TV_LOAD , "load JIT result") DEFTIMEVAR (TV_JIT_ACQUIRING_MUTEX , "acquiring JIT mutex") DEFTIMEVAR (TV_JIT_CLIENT_CODE , "JIT client code") +DEFTIMEVAR (TV_TREE_PATH_SPLIT , "tree path split") diff --git a/gcc/tracer.c b/gcc/tracer.c index 941dc20..c7b5150 100644 --- a/gcc/tracer.c +++ b/gcc/tracer.c @@ -51,9 +51,9 @@ #include "tree-inline.h" #include "cfgloop.h" #include "fibonacci_heap.h" +#include "tracer.h" static int count_insns (basic_block); -static bool ignore_bb_p (const_basic_block); static bool better_p (const_edge, const_edge); static edge find_best_successor (basic_block); static edge find_best_predecessor (basic_block); @@ -85,7 +85,7 @@ bb_seen_p (basic_block bb) } /* Return true if we should ignore the basic block for purposes of tracing. */ -static bool +bool ignore_bb_p (const_basic_block bb) { if (bb->index < NUM_FIXED_BLOCKS) @@ -226,6 +226,24 @@ find_trace (basic_block bb, basic_block *trace) return i; } +/* Duplicate block BB2, placing it after BB in the CFG. Return the + newly created block. */ +basic_block +transform_duplicate (basic_block bb, basic_block bb2) +{ + edge e; + basic_block copy; + + e = find_edge (bb, bb2); + + copy = duplicate_block (bb2, e, bb); + flush_pending_stmts (e); + + add_phi_args_after_copy (©, 1, NULL); + + return (copy); +} + /* Look for basic blocks in frequency order, construct traces and tail duplicate if profitable. */ @@ -321,17 +339,8 @@ tail_duplicate (void) entries or at least rotate the loop. */ && bb2->loop_father->header != bb2) { - edge e; - basic_block copy; - - nduplicated += counts [bb2->index]; - - e = find_edge (bb, bb2); - - copy = duplicate_block (bb2, e, bb); - flush_pending_stmts (e); - - add_phi_args_after_copy (©, 1, NULL); + nduplicated += counts [bb2->index]; + basic_block copy = transform_duplicate (bb, bb2); /* Reconsider the original copy of block we've duplicated. Removing the most common predecessor may make it to be diff --git a/gcc/tracer.h b/gcc/tracer.h new file mode 100644 index 0000000..cd1792a --- /dev/null +++ b/gcc/tracer.h @@ -0,0 +1,26 @@ +/* Header file for Tracer. + Copyright (C) 2015 Free Software Foundation, Inc. + +This file is part of GCC. + +GCC is free software; you can redistribute it and/or modify it under +the terms of the GNU General Public License as published by the Free +Software Foundation; either version 3, or (at your option) any later +version. + +GCC is distributed in the hope that it will be useful, but WITHOUT ANY +WARRANTY; without even the implied warranty of MERCHANTABILITY or +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License + for more details. + +You should have received a copy of the GNU General Public License +along with GCC; see the file COPYING3. If not see +<http://www.gnu.org/licenses/>. */ + +#ifndef GCC_TRACER_H +#define GCC_TRACER_H + +extern basic_block transform_duplicate (basic_block bb, basic_block bb2); +extern bool ignore_bb_p (const_basic_block bb); + +#endif /* GCC_TRACER_H */ diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h index 49e22a9..6963acc 100644 --- a/gcc/tree-pass.h +++ b/gcc/tree-pass.h @@ -390,6 +390,7 @@ extern gimple_opt_pass *make_pass_tree_loop_done (gcc::context *ctxt); extern gimple_opt_pass *make_pass_ch (gcc::context *ctxt); extern gimple_opt_pass *make_pass_ch_vect (gcc::context *ctxt); extern gimple_opt_pass *make_pass_ccp (gcc::context *ctxt); +extern gimple_opt_pass *make_pass_path_split (gcc::context *ctxt); extern gimple_opt_pass *make_pass_phi_only_cprop (gcc::context *ctxt); extern gimple_opt_pass *make_pass_build_ssa (gcc::context *ctxt); extern gimple_opt_pass *make_pass_build_alias (gcc::context *ctxt); diff --git a/gcc/tree-ssa-path-split.c b/gcc/tree-ssa-path-split.c new file mode 100644 index 0000000..9f61bd4 --- /dev/null +++ b/gcc/tree-ssa-path-split.c @@ -0,0 +1,275 @@ +/* Support routines for Path Splitting. + Copyright (C) 2015 Free Software Foundation, Inc. + Contributed by Ajit Kumar Agarwal <ajitkum@xilinx.com>. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + +GCC is distributed in the hope that it will be useful, +but WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +GNU General Public License for more details. + +You should have received a copy of the GNU General Public License +along with GCC; see the file COPYING3. If not see +<http://www.gnu.org/licenses/>. */ + +#include "config.h" +#include "system.h" +#include "coretypes.h" +#include "backend.h" +#include "tree.h" +#include "gimple.h" +#include "tree-pass.h" +#include "cfganal.h" +#include "cfgloop.h" +#include "gimple-iterator.h" +#include "tracer.h" + +/* Given LATCH, the latch block in a loop, see if the shape of the + path reaching LATCH is suitable for path splitting. If so, return + the block that will be duplicated into its predecessor paths. Else + return NULL. */ + +static basic_block +find_block_to_duplicate_for_path_splitting (basic_block latch) +{ + /* We should have simple latches at this point. So the latch should + have a single successor. This implies the predecessor of the latch + likely has the loop exit. And it's that predecessor we're most + interested in. To keep things simple, we're going to require that + the latch have a single predecessor too. */ + if (single_succ_p (latch) && single_pred_p (latch)) + { + basic_block bb = get_immediate_dominator (CDI_DOMINATORS, latch); + gcc_assert (single_pred_edge (latch)->src == bb); + + /* If BB has been marked as not to be duplicated, then honor that + request. */ + if (ignore_bb_p (bb)) + return NULL; + + gimple *last = gsi_stmt (gsi_last_nondebug_bb (bb)); + /* The immediate dominator of the latch must end in a conditional. */ + if (!last || gimple_code (last) != GIMPLE_COND) + return NULL; + + /* We're hoping that BB is a join point for an IF-THEN-ELSE diamond + region. Verify that it is. + + First, verify that BB has two predecessors (each arm of the + IF-THEN-ELSE) and two successors (the latch and exit). */ + if (EDGE_COUNT (bb->preds) == 2 && EDGE_COUNT (bb->succs) == 2) + { + /* Now verify that BB's immediate dominator ends in a + conditional as well. */ + basic_block bb_idom = get_immediate_dominator (CDI_DOMINATORS, bb); + gimple *last = gsi_stmt (gsi_last_nondebug_bb (bb_idom)); + if (!last || gimple_code (last) != GIMPLE_COND) + return NULL; + + /* And that BB's immediate dominator's successors are the + the predecessors of BB. */ + if (!find_edge (bb_idom, EDGE_PRED (bb, 0)->src) + || !find_edge (bb_idom, EDGE_PRED (bb, 1)->src)) + return NULL; + + /* So at this point we have a simple diamond for an IF-THEN-ELSE + construct starting at BB_IDOM, with a join point at BB. BB + pass control outside the loop or to the loop latch. + + We're going to want to create two duplicates of BB, one for + each successor of BB_IDOM. */ + return bb; + } + } + return NULL; +} + +/* Return TRUE if BB is a reasonable block to duplicate by examining + its size, false otherwise. BB will always be a loop latch block. + + Should this use the same tests as we do for jump threading? */ + +static bool +is_feasible_trace (basic_block bb) +{ + int num_stmt = 0; + gimple_stmt_iterator gsi; + + for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi)) + { + gimple *stmt = gsi_stmt (gsi); + if (!is_gimple_debug (stmt)) + num_stmt++; + } + + /* We may want to limit how many statements we copy. */ + if (num_stmt > 1) + return true; + + return false; +} + +/* If the immediate dominator of the latch of the loop is + block with conditional branch, then the loop latch is + duplicated to its predecessors path preserving the SSA + semantics. + + CFG before transformation. + + 2 + | + | + +---->3 + | / \ + | / \ + | 4 5 + | \ / + | \ / + | 6 + | / \ + | / \ + | 8 7 + | | | + ---+ E + + + + Block 8 is the latch. We're going to make copies of block 6 (9 & 10) + and wire things up so they look like this: + + 2 + | + | + +---->3 + | / \ + | / \ + | 4 5 + | | | + | | | + | 9 10 + | |\ /| + | | \ / | + | | 7 | + | | | | + | | E | + | | | + | \ / + | \ / + +-----8 + + + Blocks 9 and 10 will get merged into blocks 4 & 5 respectively which + enables CSE, DCE and other optimizations to occur on a larger block + of code. */ + +static bool +perform_path_splitting () +{ + bool changed = false; + loop_p loop; + + loop_optimizer_init (LOOPS_NORMAL | LOOPS_HAVE_RECORDED_EXITS); + initialize_original_copy_tables (); + calculate_dominance_info (CDI_DOMINATORS); + + FOR_EACH_LOOP (loop, LI_FROM_INNERMOST) + { + /* See if there is a block that we can duplicate to split the + path to the loop latch. */ + basic_block bb = find_block_to_duplicate_for_path_splitting (loop->latch); + + /* BB is the merge point for an IF-THEN-ELSE we want to transform. + + Essentially we want to create two duplicates of BB and append + a duplicate to the THEN and ELSE clauses. This will split the + path leading to the latch. BB will be unreachable and removed. */ + if (bb && is_feasible_trace (bb)) + { + if (dump_file && (dump_flags & TDF_DETAILS)) + fprintf (dump_file, + "Duplicating join block %d into predecessor paths\n", + bb->index); + basic_block pred0 = EDGE_PRED (bb, 0)->src; + basic_block pred1 = EDGE_PRED (bb, 1)->src; + transform_duplicate (pred0, bb); + transform_duplicate (pred1, bb); + changed = true; + } + } + + loop_optimizer_finalize (); + free_original_copy_tables (); + return changed; +} + +/* Main entry point for path splitting. Returns TODO_cleanup_cfg if any + paths where split, otherwise return zero. */ + +static unsigned int +execute_path_split (void) +{ + /* If we don't have at least 2 real blocks and backedges in the + CFG, then there's no point in trying to perform path splitting. */ + if (n_basic_blocks_for_fn (cfun) <= NUM_FIXED_BLOCKS + 1 + || !mark_dfs_back_edges ()) + return 0; + + bool changed = perform_path_splitting(); + if (changed) + { + free_dominance_info (CDI_DOMINATORS); + /* If we changed the CFG schedule loops for fixup by cleanup_cfg. */ + if (current_loops) + loops_state_set (LOOPS_NEED_FIXUP); + } + + return changed ? TODO_cleanup_cfg : 0; +} + +static bool +gate_path_split (void) +{ + return flag_tree_path_split; +} + +namespace { + +const pass_data pass_data_path_split = +{ + GIMPLE_PASS, /* type */ + "path-split", /* name */ + OPTGROUP_NONE, /* optinfo_flags */ + TV_TREE_PATH_SPLIT, /* tv_id */ + PROP_ssa, /* properties_required */ + 0, /* properties_provided */ + 0, /* properties_destroyed */ + 0, /* todo_flags_start */ + TODO_update_ssa, /* todo_flags_finish */ +}; + +class pass_path_split : public gimple_opt_pass +{ + public: + pass_path_split (gcc::context *ctxt) + : gimple_opt_pass (pass_data_path_split, ctxt) + {} + /* opt_pass methods: */ + opt_pass * clone () { return new pass_path_split (m_ctxt); } + virtual bool gate (function *) { return gate_path_split (); } + virtual unsigned int execute (function *) { return execute_path_split (); } + +}; // class pass_path_split + +} // anon namespace + +gimple_opt_pass * +make_pass_path_split (gcc::context *ctxt) +{ + return new pass_path_split (ctxt); +}