Message ID | alpine.LSU.2.20.1511121734040.11029@wotan.suse.de |
---|---|
State | New |
Headers | show |
On 11/12/2015 09:52 AM, Michael Matz wrote: > Hello, > > this new pass implements loop iteration space splitting for loops that > contain a conditional that's always true for one part of the iteration > space and false for the other, i.e. such situations: FWIW, Ajit suggested the same transformation earlier this year. During that discussion Richi indicated that for hmmer this transformation would enable vectorization. > > This transformation is in itself a good one but can also be an enabler for > the vectorizer. Agreed. It does increase code size, when the loop body contains > also unconditional code (that one is duplicated), so we only transform hot > loops. Probably ought to be disabled when we're not optimizing for speed as well. I'm a bit unsure of the placement of the new pass, or if it should > be an own pass at all. Right now I've placed it after unswitching and > scev_cprop, before loop distribution. Ideally I think all three, together > with loop fusion and an gimple unroller should be integrated into one loop > nest optimizer, alas, we aren't there yet. Given its impact on the looping structure, I'd think early in the loop optimizer. Given the similarities with unswitching, I think before/after unswitching is a natural first cut. We can always iterate if it looks like putting it elsewhere would make sense. > I've regstrapped this pass enabled with -O2 on x86-64-linux, without > regressions. I've also checked cpu2006 (the non-fortran part) for > correctness, not yet for performance. In the end it should probably only > be enabled for -O3+ (although if the whole loop body is conditional it > makes sense to also have it with -O2 because code growth is very small > then). Very curious on the performance side, so if you could get some #s on that, it'd be greatly appreciated. I'd be comfortable with this at -O2, but won't object if you'd prefer -O3. > > So, okay for trunk? > > > Ciao, > Michael. > * passes.def (pass_loop_split): Add. > * timevar.def (TV_LOOP_SPLIT): Add. > * tree-pass.h (make_pass_loop_split): Declare. > * tree-ssa-loop-manip.h (rewrite_into_loop_closed_ssa_1): Declare. > * tree-ssa-loop-unswitch.c: Include tree-ssa-loop-manip.h, > cfganal.h, tree-chrec.h, tree-affine.h, tree-scalar-evolution.h, > gimple-pretty-print.h, gimple-fold.h, gimplify-me.h. > (split_at_bb_p, patch_loop_exit, find_or_create_guard_phi, > split_loop, tree_ssa_split_loops, > make_pass_loop_split): New functions. > (pass_data_loop_split): New. > (pass_loop_split): New. > > testsuite/ > * gcc.dg/loop-split.c: New test. Please clean up the #if 0/#if 1 code in the new tests. You might also want to clean out the TRACE stuff. Essentially the tests look like you just dropped in a test you'd been running by hand until now :-) I don't see any negative tests -- ie tests that should not be split due to boundary conditions. Do you have any from development? If so it'd be good to have those too. > > Index: tree-ssa-loop-manip.h > =================================================================== > --- tree-ssa-loop-manip.h (revision 229763) > +++ tree-ssa-loop-manip.h (working copy) > @@ -24,6 +24,8 @@ typedef void (*transform_callback)(struc > > extern void create_iv (tree, tree, tree, struct loop *, gimple_stmt_iterator *, > bool, tree *, tree *); > +extern void rewrite_into_loop_closed_ssa_1 (bitmap, unsigned, int, > + struct loop *); > extern void rewrite_into_loop_closed_ssa (bitmap, unsigned); > extern void rewrite_virtuals_into_loop_closed_ssa (struct loop *); > extern void verify_loop_closed_ssa (bool); > Index: tree-ssa-loop-unswitch.c > =================================================================== > --- tree-ssa-loop-unswitch.c (revision 229763) > +++ tree-ssa-loop-unswitch.c (working copy) Given the amount of new code, unless there's a strong need, I'd prefer this transformation to be implemented in its own file. > + > +/* Give an induction variable GUARD_IV, and its affine descriptor IV, > + find the loop phi node in LOOP defining it directly, or create > + such phi node. Return that phi node. */ > + > +static gphi * > +find_or_create_guard_phi (struct loop *loop, tree guard_iv, affine_iv * /*iv*/) > +{ > + gimple *def = SSA_NAME_DEF_STMT (guard_iv); > + gphi *phi; > + if ((phi = dyn_cast <gphi *> (def)) > + && gimple_bb (phi) == loop->header) > + return phi; > + > + /* XXX Create the PHI instead. */ > + return NULL; So right now we just punt if we need to create the PHI? Does that happen with any kind of regularity in practice? > +} > + > +/* Checks if LOOP contains an conditional block whose condition > + depends on which side in the iteration space it is, and if so > + splits the iteration space into two loops. Returns true if the > + loop was split. NITER must contain the iteration descriptor for the > + single exit of LOOP. */ > + > +static bool > +split_loop (struct loop *loop, struct tree_niter_desc *niter) This should probably be broken up a bit more. It's loooong as-is. Without looking at how much stuff would have to be passed around, diddling the exit edge of the first loop, phi updates for the 2nd loop, fix iteration space of 2nd loop, exit block fixup might be a good initial cut at breaking this down into something of manageable size. Not sure if the setup and initial versioning should be broken out or not. > + initialize_original_copy_tables (); > + basic_block cond_bb; > + struct loop *floop = loop_version (loop, cond, &cond_bb, > + REG_BR_PROB_BASE, REG_BR_PROB_BASE, > + REG_BR_PROB_BASE, false); > + gcc_assert (floop); > + update_ssa (TODO_update_ssa); > + > + /* Now diddle the exit edge of the first loop (floop->join in the > + above) to either go to the common exit (join) or to the second > + loop, depending on if there are still iterations left, or not. > + We split the floop exit edge and insert a copy of the > + original exit expression into the new block, that either > + skips the second loop or goes to it. */ So after diddling, haven't we mucked up the dominator tree and the SSA graph? You're iterating over each PHI in two loop headers and fixing the SSA graph by hand AFAICT. But ISTM the dominator tree is still mucked up, right? I'm thinking specifically about the 2nd loop. Though perhaps it just works since after all your transformations it'll still be immediately dominated by the same block as before your transformations. Overall I think this looks real good. THe biggest problem IMHO is breaking down that monster function a bit. I'm a bit concerned by the dominator tree state. Worst case is we have to rebuild the dominators before ensuring we're LCSSA form, and even that doesn't seem too bad. As I mentioned, it may actually be the case that we're OK on the dominator tree, kindof by accident more than design. Jeff
Hi, On Thu, 12 Nov 2015, Jeff Law wrote: > > this new pass implements loop iteration space splitting for loops that > > contain a conditional that's always true for one part of the iteration > > space and false for the other, i.e. such situations: > FWIW, Ajit suggested the same transformation earlier this year. During that > discussion Richi indicated that for hmmer this transformation would enable > vectorization. It's a prerequisite indeed, but not enough in itself. The next problem will be that only parts of access chains inside the hot loop are vectorizable, but for that those parts need to be disambiguated. ICC is doing that by a massive chain of conditionals testing non-overlapping of the respective arrays at runtime. Our vectorizer could also do that (possibly by increasing the allowed number of conditionals), but the next problem then is that one of these (then indeed separated) parts is not vectorizable by our vectorizer: it's a 'a[i] = f(a[i-1])' dependency that can't yet be handled by us. If the separation of parts would be done by loop distribution that would be fine (we'd have separate loops for the parts, some of them vectorizable), but our loop distribution can't do runtime disambiguation, only our vectorizer. hmmer is actually quite interesting because it's a fairly isolated hot loop posing quite challenging problems for us :) > > It does increase code size, when the loop body contains > > also unconditional code (that one is duplicated), so we only transform hot > > loops. > > Probably ought to be disabled when we're not optimizing for speed as well. That should be dealt with by '!optimize_loop_for_size_p (loop)'. > > I've regstrapped this pass enabled with -O2 on x86-64-linux, without > > regressions. I've also checked cpu2006 (the non-fortran part) for > > correctness, not yet for performance. In the end it should probably only > > be enabled for -O3+ (although if the whole loop body is conditional it > > makes sense to also have it with -O2 because code growth is very small > > then). > > Very curious on the performance side, so if you could get some #s on that, > it'd be greatly appreciated. My test machine misbehaved over the weekend, but as soon as I have them I'll update here. > > testsuite/ > > * gcc.dg/loop-split.c: New test. > > Please clean up the #if 0/#if 1 code in the new tests. Actually I'd prefer if that test contains the by-hand code and the TRACE stuff as well, I'd only change the #if 0 into some #if BYHAND or so ... > You might also want to clean out the TRACE stuff. Essentially the tests > look like you just dropped in a test you'd been running by hand until > now :-) ... the reason being, that bugs in the splitter are somewhat unwieldy to debug by just staring at the dumps, you only get a checksum mismatch, so TRACE=1 is for finding out which of the params and loops is actually miscompiled, TRACE=2 for finding the specific iteration that's broken, and the #if0 code for putting that situation into a non-macroized and smaller function than dotest. (That's actually how I've run the testcase after I had it basically working, extending dotest with a couple more lines, aka example loop sitations, adjusting the checksum, and then making a face and scratching my head and mucking with the TRACE and #if0 macros :) ). > I don't see any negative tests -- ie tests that should not be split due > to boundary conditions. Do you have any from development? Good point, I had some but only ones where I was able to extend the splitters to cover them. I'll think of some that really shouldn't be split. > > Index: tree-ssa-loop-unswitch.c > > =================================================================== > > --- tree-ssa-loop-unswitch.c (revision 229763) > > +++ tree-ssa-loop-unswitch.c (working copy) > Given the amount of new code, unless there's a strong need, I'd prefer this > transformation to be implemented in its own file. Okay. > > +/* Give an induction variable GUARD_IV, and its affine descriptor IV, > > + find the loop phi node in LOOP defining it directly, or create > > + such phi node. Return that phi node. */ > > + > > +static gphi * > > +find_or_create_guard_phi (struct loop *loop, tree guard_iv, affine_iv * > > /*iv*/) > > +{ > > + gimple *def = SSA_NAME_DEF_STMT (guard_iv); > > + gphi *phi; > > + if ((phi = dyn_cast <gphi *> (def)) > > + && gimple_bb (phi) == loop->header) > > + return phi; > > + > > + /* XXX Create the PHI instead. */ > > + return NULL; > > So right now we just punt if we need to create the PHI? Does that > happen with any kind of regularity in practice? Only with such situations: for (int i = start; i < end; i++) { if (i + offset < bound) ... } Here the condition-IV is not directly defined by a PHI node. If it happens often I don't know, I guess the usual situation is testing the control IV directly. The deficiency is not hard to fix. > > +static bool > > +split_loop (struct loop *loop, struct tree_niter_desc *niter) > This should probably be broken up a bit more. It's loooong as-is. > > Without looking at how much stuff would have to be passed around, > diddling the exit edge of the first loop, phi updates for the 2nd loop, > fix iteration space of 2nd loop, exit block fixup might be a good > initial cut at breaking this down into something of manageable size. Thanks, I'll do that. > > + initialize_original_copy_tables (); > > + basic_block cond_bb; > > + struct loop *floop = loop_version (loop, cond, &cond_bb, > > + REG_BR_PROB_BASE, REG_BR_PROB_BASE, > > + REG_BR_PROB_BASE, false); > > + gcc_assert (floop); > > + update_ssa (TODO_update_ssa); > > + > > + /* Now diddle the exit edge of the first loop (floop->join in the > > + above) to either go to the common exit (join) or to the second > > + loop, depending on if there are still iterations left, or not. > > + We split the floop exit edge and insert a copy of the > > + original exit expression into the new block, that either > > + skips the second loop or goes to it. */ > > So after diddling, haven't we mucked up the dominator tree and the SSA > graph? You're iterating over each PHI in two loop headers and fixing the > SSA graph by hand AFAICT. But ISTM the dominator tree is still mucked > up, right? I think I convinced myself on paper that the dominator tree is correct due to our helpers doing the right thing (loop_version() for the initial loop copying and split_edge for the above diddling). Let's see if I can paint some ASCII art. So, after loop_version (which updates dom) we have: .------if (cond)-------. v v pre1 pre2 | | h1<----. h2<----. | | | | .--ex1 | .------ex2 | | \ | | \ | | l1---' | l2---' | | | | '--X--------->join<' At this point dominators are all correct (due to loop_version updating them), in particular dom(pre1)==dom(pre2)==if(cond). Now we split ex1->join at X, and split_edge also updates them (trivially), but we insert a new edge from split_bb to pre2. There are no paths from region2 into region1, and anything in region2 except pre2 is still dominated by pre2 (or something further down), so if anything changes, then dom(pre2). .------if (cond)----. v | pre1 | | | h1<----. | | | | ex1 | | | \ | | | l1-' | v | .-split-----------. | | v | | pre2<----' | | | h2<----. | | | | ex2 | | | \ | | | l2--' | .-' '------>join<--' But there's a path directly to pre2, skipping whole region1, so dom(pre2) must be still if(cond), as originally. Also dom(join) doesn't change, because what was first a normal diamond between if(cond),region1,region2,join now is a meddled diamond with paths from region1 to region2, but not back, so the dominator of the join block still is the if(cond) block. This is all true if the internal structure of region1/region2 is sensible, and single_exit() regions are such. Even multiple exits to something behind join wouldn't change this, but we don't even have to think about this. In addition, anything not updating dominators correctly would scream loudly in the verifier. The SSA tree is correct after loop_version() and split_edge. The new edge split_bb->pre2 needs the adjustments in that loop over loop PHI nodes. That walk must catch everything, if it wouldn't then that would mean a use in region2 that's defined in region1, that wasn't originally dominated by the def (and hence must have been a loop-carried value and hence be defined in the loop header PHI block). > Overall I think this looks real good. THe biggest problem IMHO is > breaking down that monster function a bit. I'm a bit concerned by the > dominator tree state. Worst case is we have to rebuild the dominators > before ensuring we're LCSSA form, and even that doesn't seem too bad. Actually keeping LCSSA form correct is doable as well, but needs another loop over one or the other PHI nodes. I punted for now and called rewrite_into_loop_closed_ssa_1, which actually isn't too expensive for a single loop. > As I mentioned, it may actually be the case that we're OK on the > dominator tree, kindof by accident more than design. I'm pretty sure it is correct, and it is so by design :) Thanks for the feedback, I'll update the patch accordingly. Ciao, Michael.
On 11/16/2015 09:05 AM, Michael Matz wrote: > It's a prerequisite indeed, but not enough in itself. Sigh. OK. One can always hope. > > hmmer is actually quite interesting because it's a fairly isolated hot > loop posing quite challenging problems for us :) Sounds like it. Essentially, it's a TODO list :-) >> Probably ought to be disabled when we're not optimizing for speed as well. > > That should be dealt with by '!optimize_loop_for_size_p (loop)'. Doh, must have missed that. >> >> Please clean up the #if 0/#if 1 code in the new tests. > > Actually I'd prefer if that test contains the by-hand code and the TRACE > stuff as well, I'd only change the #if 0 into some #if BYHAND or so ... > >> You might also want to clean out the TRACE stuff. Essentially the tests >> look like you just dropped in a test you'd been running by hand until >> now :-) > > ... the reason being, that bugs in the splitter are somewhat unwieldy to > debug by just staring at the dumps, you only get a checksum mismatch, so > TRACE=1 is for finding out which of the params and loops is actually > miscompiled, TRACE=2 for finding the specific iteration that's broken, and > the #if0 code for putting that situation into a non-macroized and smaller > function than dotest. (That's actually how I've run the testcase after I > had it basically working, extending dotest with a couple more lines, aka > example loop sitations, adjusting the checksum, and then making a face and > scratching my head and mucking with the TRACE and #if0 macros :) ). OK, if you want to keep them, then have a consistent way to turn them on/off for future debugging. if0/if1 doesn't provide much of a clue to someone else what to turn on/off if they need to debug this stuff. > >> I don't see any negative tests -- ie tests that should not be split due >> to boundary conditions. Do you have any from development? > > Good point, I had some but only ones where I was able to extend the > splitters to cover them. I'll think of some that really shouldn't be > split. If you've got them, certainly add them. Though I realize they may get lost over time. > > Only with such situations: > > for (int i = start; i < end; i++) { > if (i + offset < bound) > ... > } > > Here the condition-IV is not directly defined by a PHI node. If it > happens often I don't know, I guess the usual situation is testing the > control IV directly. The deficiency is not hard to fix. I'm comfortable waiting until we see the need. > I think I convinced myself on paper that the dominator tree is correct due > to our helpers doing the right thing (loop_version() for the initial > loop copying and split_edge for the above diddling). Let's see if I can > paint some ASCII art. So, after loop_version (which updates dom) we > have: OK. I was worried about the next step -- where we insert the conditional on the exit from pre1 to have it transfer to join or pre2. But in that case, the immediate dominator of pre2 & join is still the initial if statement. So I think we're OK. That was the conclusion I was starting to come to yesterday, having the ascii art makes it pretty clear. I'm just not good at conceptualizing a CFG. I have to see it explicitly and then everything seems so clear and simple. jeff
Hi, On Mon, 16 Nov 2015, Jeff Law wrote: > OK, if you want to keep them, then have a consistent way to turn them > on/off for future debugging. if0/if1 doesn't provide much of a clue to > someone else what to turn on/off if they need to debug this stuff. > > > I don't see any negative tests -- ie tests that should not be split > > > due to boundary conditions. Do you have any from development? > > > > Good point, I had some but only ones where I was able to extend the > > splitters to cover them. I'll think of some that really shouldn't be > > split. > If you've got them, certainly add them. Though I realize they may get > lost over time. Actually, thinking a bit more about this, I don't have any that wouldn't be merely restrictions in the implementation that couldn't be lifted in the future (e.g. unequal step sizes), so I've added no additional ones. > But in that case, the immediate dominator of pre2 & join is still the > initial if statement. So I think we're OK. That was the conclusion I > was starting to come to yesterday, having the ascii art makes it pretty > clear. I'm just not good at conceptualizing a CFG. I have to see it > explicitly and then everything seems so clear and simple. So, this second version should reflect the review. I've moved everything to a new file, split the long function into several logically separate ones, and even included ascii art in the comments :) The testcase got a comment about what to #define for debugging. I've included the pass to -O3 or alternatively if profile-use is on, similar to funswitch-loops. I've also added a proper -fsplit-loops option. There's two functional changes in v2: a bugfix to not try splitting a non-iterating loop (irritatingly such a look returns true from number_of_iterations_exit, but with an ERROR_MARK comparator), and a limitation to avoid combinatorical explosion in artificial testcases: Once we have done a splitting, we don't do any in that loops parents (we may still do splitting in siblings or childs of siblings). I've also done some measurements: first, bootstrap time is unaffected, and regstrapping succeeds without regressions when I activate the pass by default. Then SPECcpu2006: build times are unaffected, everything builds and works also with -fsplit-loops, performance is mostly unaffected, base is -Ofast -funroll-loops -fpeel-loops, peak adds -fsplit-loops. Estimated Estimated Base Base Base Peak Peak Peak Benchmarks Ref. Run Time Ratio Ref. Run Time Ratio -------------- ------ --------- --------- ------ --------- --------- 400.perlbench 9770 325 30.1 * 9770 323 30.3 * 401.bzip2 9650 382 25.2 * 9650 382 25.3 * 403.gcc 8050 242 33.3 * 8050 241 33.4 * 429.mcf 9120 311 29.3 * 9120 311 29.3 * 445.gobmk 10490 392 26.8 * 10490 391 26.8 * 456.hmmer 9330 345 27.0 * 9330 342 27.3 * 458.sjeng 12100 422 28.7 * 12100 420 28.8 * 462.libquantum 20720 308 67.3 * 20720 308 67.3 * 464.h264ref 22130 423 52.3 * 22130 423 52.3 * 471.omnetpp 6250 273 22.9 * 6250 273 22.9 * 473.astar 7020 311 22.6 * 7020 311 22.6 * 483.xalancbmk 6900 191 36.2 * 6900 190 36.2 * Est. SPECint_base2006 31.7 Est. SPECint2006 31.7 Estimated Estimated Base Base Base Peak Peak Peak Benchmarks Ref. Run Time Ratio Ref. Run Time Ratio -------------- ------ --------- --------- ------ --------- --------- 410.bwaves 13590 235 57.7 * 13590 235 57.8 * 416.gamess NR NR 433.milc 9180 347 26.5 * 9180 345 26.6 * 434.zeusmp 9100 269 33.9 * 9100 268 33.9 * 435.gromacs 7140 260 27.4 * 7140 262 27.3 * 436.cactusADM 11950 237 50.5 * 11950 240 49.9 * 437.leslie3d 9400 228 41.3 * 9400 228 41.2 * 444.namd 8020 312 25.7 * 8020 311 25.7 * 447.dealII 11440 254 45.0 * 11440 254 45.0 * 450.soplex 8340 201 41.4 * 8340 202 41.4 * 453.povray NR NR 454.calculix 8250 282 29.2 * 8250 283 29.2 * 459.GemsFDTD 10610 310 34.3 * 10610 309 34.3 * 465.tonto 9840 683 14.4 * 9840 684 14.4 * 470.lbm 13740 224 61.2 * 13740 224 61.3 * 481.wrf 11170 291 38.4 * 11170 291 38.4 * 482.sphinx3 19490 377 51.7 * 19490 377 51.6 * Est. SPECfp_base2006 36.3 Est. SPECfp2006 36.3 The 1% improvements and degradations are all inside the normal result variations on this machine (I have the feeling that the hmmer improvement is stable, and will recheck this). Not all of the above had loops split at all, only: SPECint: 400.perlbench, 403.gcc, 445.gobmk, 456.hmmer, 462.libquantum, 464.h264ref, 471.omnetpp and SPECfp: 435.gromacs, 436.cactusADM, 447.dealII, 454.calculix. So, okay for trunk? Ciao, Michael.
On 12/01/2015 09:46 AM, Michael Matz wrote: > Hi, > > So, okay for trunk? -ENOPATCH Jeff
On Thu, Nov 12, 2015 at 8:52 AM, Michael Matz <matz@suse.de> wrote: > Hello, > > this new pass implements loop iteration space splitting for loops that > contain a conditional that's always true for one part of the iteration > space and false for the other, i.e. such situations: > > for (i = beg; i < end; i++) > if (i < p) > dothis(); > else > dothat(); > > this is transformed into roughly: > > for (i = beg; i < p; i++) > dothis(); > for (; i < end; i++) > dothat(); > > Of course, not quite the above as there needs to be provisions for the > border conditions, if e.g. 'p' is outside the original iteration space, or > the conditional doesn't directly use the control IV, but some other, or > the IV runs backwards. The testcase checks many of these border > conditions. > > This transformation is in itself a good one but can also be an enabler for > the vectorizer. It does increase code size, when the loop body contains > also unconditional code (that one is duplicated), so we only transform hot > loops. I'm a bit unsure of the placement of the new pass, or if it should > be an own pass at all. Right now I've placed it after unswitching and > scev_cprop, before loop distribution. Ideally I think all three, together > with loop fusion and an gimple unroller should be integrated into one loop > nest optimizer, alas, we aren't there yet. > > I'm planning to work on loop fusion in the future as well, but that's not > for GCC 6. > > I've regstrapped this pass enabled with -O2 on x86-64-linux, without > regressions. I've also checked cpu2006 (the non-fortran part) for > correctness, not yet for performance. In the end it should probably only > be enabled for -O3+ (although if the whole loop body is conditional it > makes sense to also have it with -O2 because code growth is very small > then). > > So, okay for trunk? What ever happened to this patch? I was looking into doing this myself today but I found this patch. It is stage 1 of GCC 7, it might be a good idea to get this patch into GCC. Thanks, Andrew > > > Ciao, > Michael. > * passes.def (pass_loop_split): Add. > * timevar.def (TV_LOOP_SPLIT): Add. > * tree-pass.h (make_pass_loop_split): Declare. > * tree-ssa-loop-manip.h (rewrite_into_loop_closed_ssa_1): Declare. > * tree-ssa-loop-unswitch.c: Include tree-ssa-loop-manip.h, > cfganal.h, tree-chrec.h, tree-affine.h, tree-scalar-evolution.h, > gimple-pretty-print.h, gimple-fold.h, gimplify-me.h. > (split_at_bb_p, patch_loop_exit, find_or_create_guard_phi, > split_loop, tree_ssa_split_loops, > make_pass_loop_split): New functions. > (pass_data_loop_split): New. > (pass_loop_split): New. > > testsuite/ > * gcc.dg/loop-split.c: New test. > > Index: passes.def > =================================================================== > --- passes.def (revision 229763) > +++ passes.def (working copy) > @@ -233,6 +233,7 @@ along with GCC; see the file COPYING3. > NEXT_PASS (pass_dce); > NEXT_PASS (pass_tree_unswitch); > NEXT_PASS (pass_scev_cprop); > + NEXT_PASS (pass_loop_split); > NEXT_PASS (pass_record_bounds); > NEXT_PASS (pass_loop_distribution); > NEXT_PASS (pass_copy_prop); > Index: timevar.def > =================================================================== > --- timevar.def (revision 229763) > +++ timevar.def (working copy) > @@ -179,6 +179,7 @@ DEFTIMEVAR (TV_LIM , " > DEFTIMEVAR (TV_TREE_LOOP_IVCANON , "tree canonical iv") > DEFTIMEVAR (TV_SCEV_CONST , "scev constant prop") > DEFTIMEVAR (TV_TREE_LOOP_UNSWITCH , "tree loop unswitching") > +DEFTIMEVAR (TV_LOOP_SPLIT , "loop splitting") > DEFTIMEVAR (TV_COMPLETE_UNROLL , "complete unrolling") > DEFTIMEVAR (TV_TREE_PARALLELIZE_LOOPS, "tree parallelize loops") > DEFTIMEVAR (TV_TREE_VECTORIZATION , "tree vectorization") > Index: tree-pass.h > =================================================================== > --- tree-pass.h (revision 229763) > +++ tree-pass.h (working copy) > @@ -366,6 +366,7 @@ extern gimple_opt_pass *make_pass_tree_n > extern gimple_opt_pass *make_pass_tree_loop_init (gcc::context *ctxt); > extern gimple_opt_pass *make_pass_lim (gcc::context *ctxt); > extern gimple_opt_pass *make_pass_tree_unswitch (gcc::context *ctxt); > +extern gimple_opt_pass *make_pass_loop_split (gcc::context *ctxt); > extern gimple_opt_pass *make_pass_predcom (gcc::context *ctxt); > extern gimple_opt_pass *make_pass_iv_canon (gcc::context *ctxt); > extern gimple_opt_pass *make_pass_scev_cprop (gcc::context *ctxt); > Index: tree-ssa-loop-manip.h > =================================================================== > --- tree-ssa-loop-manip.h (revision 229763) > +++ tree-ssa-loop-manip.h (working copy) > @@ -24,6 +24,8 @@ typedef void (*transform_callback)(struc > > extern void create_iv (tree, tree, tree, struct loop *, gimple_stmt_iterator *, > bool, tree *, tree *); > +extern void rewrite_into_loop_closed_ssa_1 (bitmap, unsigned, int, > + struct loop *); > extern void rewrite_into_loop_closed_ssa (bitmap, unsigned); > extern void rewrite_virtuals_into_loop_closed_ssa (struct loop *); > extern void verify_loop_closed_ssa (bool); > Index: tree-ssa-loop-unswitch.c > =================================================================== > --- tree-ssa-loop-unswitch.c (revision 229763) > +++ tree-ssa-loop-unswitch.c (working copy) > @@ -31,12 +31,20 @@ along with GCC; see the file COPYING3. > #include "tree-ssa.h" > #include "tree-ssa-loop-niter.h" > #include "tree-ssa-loop.h" > +#include "tree-ssa-loop-manip.h" > #include "tree-into-ssa.h" > +#include "cfganal.h" > #include "cfgloop.h" > +#include "tree-chrec.h" > +#include "tree-affine.h" > +#include "tree-scalar-evolution.h" > #include "params.h" > #include "tree-inline.h" > #include "gimple-iterator.h" > +#include "gimple-pretty-print.h" > #include "cfghooks.h" > +#include "gimple-fold.h" > +#include "gimplify-me.h" > > /* This file implements the loop unswitching, i.e. transformation of loops like > > @@ -842,4 +850,551 @@ make_pass_tree_unswitch (gcc::context *c > return new pass_tree_unswitch (ctxt); > } > > +/* Return true when BB inside LOOP is a potential iteration space > + split point, i.e. ends with a condition like "IV < comp", which > + is true on one side of the iteration space and false on the other, > + and the split point can be computed. If so, also return the border > + point in *BORDER and the comparison induction variable in IV. */ > > +static tree > +split_at_bb_p (struct loop *loop, basic_block bb, tree *border, affine_iv *iv) > +{ > + gimple *last; > + gcond *stmt; > + affine_iv iv2; > + > + /* BB must end in a simple conditional jump. */ > + last = last_stmt (bb); > + if (!last || gimple_code (last) != GIMPLE_COND) > + return NULL_TREE; > + stmt = as_a <gcond *> (last); > + > + enum tree_code code = gimple_cond_code (stmt); > + > + /* Only handle relational comparisons, for equality and non-equality > + we'd have to split the loop into two loops and a middle statement. */ > + switch (code) > + { > + case LT_EXPR: > + case LE_EXPR: > + case GT_EXPR: > + case GE_EXPR: > + break; > + default: > + return NULL_TREE; > + } > + > + if (loop_exits_from_bb_p (loop, bb)) > + return NULL_TREE; > + > + tree op0 = gimple_cond_lhs (stmt); > + tree op1 = gimple_cond_rhs (stmt); > + > + if (!simple_iv (loop, loop, op0, iv, false)) > + return NULL_TREE; > + if (!simple_iv (loop, loop, op1, &iv2, false)) > + return NULL_TREE; > + > + /* Make it so, that the first argument of the condition is > + the looping one. */ > + if (integer_zerop (iv->step)) > + { > + std::swap (op0, op1); > + std::swap (*iv, iv2); > + code = swap_tree_comparison (code); > + gimple_cond_set_condition (stmt, code, op0, op1); > + update_stmt (stmt); > + } > + > + if (integer_zerop (iv->step)) > + return NULL_TREE; > + if (!integer_zerop (iv2.step)) > + return NULL_TREE; > + > + if (dump_file && (dump_flags & TDF_DETAILS)) > + { > + fprintf (dump_file, "Found potential split point: "); > + print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM); > + fprintf (dump_file, " { "); > + print_generic_expr (dump_file, iv->base, TDF_SLIM); > + fprintf (dump_file, " + I*"); > + print_generic_expr (dump_file, iv->step, TDF_SLIM); > + fprintf (dump_file, " } %s ", get_tree_code_name (code)); > + print_generic_expr (dump_file, iv2.base, TDF_SLIM); > + fprintf (dump_file, "\n"); > + } > + > + *border = iv2.base; > + return op0; > +} > + > +/* Given a GUARD conditional stmt inside LOOP, which we want to make always > + true or false depending on INITIAL_TRUE, and adjusted values NEXTVAL > + (a post-increment IV) and NEWBOUND (the comparator) adjust the loop > + exit test statement to loop back only if the GUARD statement will > + also be true/false in the next iteration. */ > + > +static void > +patch_loop_exit (struct loop *loop, gcond *guard, tree nextval, tree newbound, > + bool initial_true) > +{ > + edge exit = single_exit (loop); > + gcond *stmt = as_a <gcond *> (last_stmt (exit->src)); > + gimple_cond_set_condition (stmt, gimple_cond_code (guard), > + nextval, newbound); > + update_stmt (stmt); > + > + edge stay = single_pred_edge (loop->latch); > + > + exit->flags &= ~(EDGE_TRUE_VALUE | EDGE_FALSE_VALUE); > + stay->flags &= ~(EDGE_TRUE_VALUE | EDGE_FALSE_VALUE); > + > + if (initial_true) > + { > + exit->flags |= EDGE_FALSE_VALUE; > + stay->flags |= EDGE_TRUE_VALUE; > + } > + else > + { > + exit->flags |= EDGE_TRUE_VALUE; > + stay->flags |= EDGE_FALSE_VALUE; > + } > +} > + > +/* Give an induction variable GUARD_IV, and its affine descriptor IV, > + find the loop phi node in LOOP defining it directly, or create > + such phi node. Return that phi node. */ > + > +static gphi * > +find_or_create_guard_phi (struct loop *loop, tree guard_iv, affine_iv * /*iv*/) > +{ > + gimple *def = SSA_NAME_DEF_STMT (guard_iv); > + gphi *phi; > + if ((phi = dyn_cast <gphi *> (def)) > + && gimple_bb (phi) == loop->header) > + return phi; > + > + /* XXX Create the PHI instead. */ > + return NULL; > +} > + > +/* Checks if LOOP contains an conditional block whose condition > + depends on which side in the iteration space it is, and if so > + splits the iteration space into two loops. Returns true if the > + loop was split. NITER must contain the iteration descriptor for the > + single exit of LOOP. */ > + > +static bool > +split_loop (struct loop *loop, struct tree_niter_desc *niter) > +{ > + basic_block *bbs; > + unsigned i; > + bool changed = false; > + tree guard_iv; > + tree border; > + affine_iv iv; > + > + bbs = get_loop_body (loop); > + > + /* Find a splitting opportunity. */ > + for (i = 0; i < loop->num_nodes; i++) > + if ((guard_iv = split_at_bb_p (loop, bbs[i], &border, &iv))) > + { > + /* Handling opposite steps is not implemented yet. Neither > + is handling different step sizes. */ > + if ((tree_int_cst_sign_bit (iv.step) > + != tree_int_cst_sign_bit (niter->control.step)) > + || !tree_int_cst_equal (iv.step, niter->control.step)) > + continue; > + > + /* Find a loop PHI node that defines guard_iv directly, > + or create one doing that. */ > + gphi *phi = find_or_create_guard_phi (loop, guard_iv, &iv); > + if (!phi) > + continue; > + gcond *guard_stmt = as_a<gcond *> (last_stmt (bbs[i])); > + tree guard_init = PHI_ARG_DEF_FROM_EDGE (phi, > + loop_preheader_edge (loop)); > + enum tree_code guard_code = gimple_cond_code (guard_stmt); > + > + /* Loop splitting is implemented by versioning the loop, placing > + the new loop in front of the old loop, make the first loop iterate > + as long as the conditional stays true (or false) and let the > + second (original) loop handle the rest of the iterations. > + > + First we need to determine if the condition will start being true > + or false in the first loop. */ > + bool initial_true; > + switch (guard_code) > + { > + case LT_EXPR: > + case LE_EXPR: > + initial_true = !tree_int_cst_sign_bit (iv.step); > + break; > + case GT_EXPR: > + case GE_EXPR: > + initial_true = tree_int_cst_sign_bit (iv.step); > + break; > + default: > + gcc_unreachable (); > + } > + > + /* Build a condition that will skip the first loop when the > + guard condition won't ever be true (or false). */ > + tree cond = build2 (guard_code, boolean_type_node, guard_init, border); > + if (initial_true) > + cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond); > + > + /* Now version the loop, we will then have this situation: > + if (!cond) > + for (...) {body} //floop > + else > + for (...) {body} //loop > + join: */ > + initialize_original_copy_tables (); > + basic_block cond_bb; > + struct loop *floop = loop_version (loop, cond, &cond_bb, > + REG_BR_PROB_BASE, REG_BR_PROB_BASE, > + REG_BR_PROB_BASE, false); > + gcc_assert (floop); > + update_ssa (TODO_update_ssa); > + > + /* Now diddle the exit edge of the first loop (floop->join in the > + above) to either go to the common exit (join) or to the second > + loop, depending on if there are still iterations left, or not. > + We split the floop exit edge and insert a copy of the > + original exit expression into the new block, that either > + skips the second loop or goes to it. */ > + edge exit = single_exit (floop); > + basic_block skip_bb = split_edge (exit); > + gcond *skip_stmt; > + gimple_stmt_iterator gsi; > + edge new_e, skip_e; > + > + gimple *stmt = last_stmt (exit->src); > + skip_stmt = gimple_build_cond (gimple_cond_code (stmt), > + gimple_cond_lhs (stmt), > + gimple_cond_rhs (stmt), > + NULL_TREE, NULL_TREE); > + gsi = gsi_last_bb (skip_bb); > + gsi_insert_after (&gsi, skip_stmt, GSI_NEW_STMT); > + > + skip_e = EDGE_SUCC (skip_bb, 0); > + skip_e->flags &= ~EDGE_FALLTHRU; > + new_e = make_edge (skip_bb, loop_preheader_edge (loop)->src, 0); > + if (exit->flags & EDGE_TRUE_VALUE) > + { > + skip_e->flags |= EDGE_TRUE_VALUE; > + new_e->flags |= EDGE_FALSE_VALUE; > + } > + else > + { > + skip_e->flags |= EDGE_FALSE_VALUE; > + new_e->flags |= EDGE_TRUE_VALUE; > + } > + > + new_e->count = skip_bb->count; > + new_e->probability = PROB_LIKELY; > + new_e->count = apply_probability (skip_e->count, PROB_LIKELY); > + skip_e->count -= new_e->count; > + skip_e->probability = inverse_probability (PROB_LIKELY); > + > + /* Now we have created this situation: > + if (!cond) { > + for (...) {body; if (cexit) break;} > + if (!cexit) goto second; > + } else { > + second: > + for (...) {body; if (cexit) break;} > + } > + join: > + > + The second loop can now be entered by skipping the first > + loop (the inital values of its PHI nodes will be the > + original initial values), or by falling in from the first > + loop (the initial values will be the continuation values > + from the first loop). Insert PHI nodes reflecting this > + in the pre-header of the second loop. */ > + > + basic_block rest = loop_preheader_edge (loop)->src; > + edge skip_first = find_edge (cond_bb, rest); > + gcc_assert (skip_first); > + > + edge firste = loop_preheader_edge (floop); > + edge seconde = loop_preheader_edge (loop); > + edge firstn = loop_latch_edge (floop); > + gphi *new_guard_phi = 0; > + gphi_iterator psi_first, psi_second; > + for (psi_first = gsi_start_phis (floop->header), > + psi_second = gsi_start_phis (loop->header); > + !gsi_end_p (psi_first); > + gsi_next (&psi_first), gsi_next (&psi_second)) > + { > + tree init, next, new_init; > + use_operand_p op; > + gphi *phi_first = psi_first.phi (); > + gphi *phi_second = psi_second.phi (); > + > + if (phi_second == phi) > + new_guard_phi = phi_first; > + > + init = PHI_ARG_DEF_FROM_EDGE (phi_first, firste); > + next = PHI_ARG_DEF_FROM_EDGE (phi_first, firstn); > + op = PHI_ARG_DEF_PTR_FROM_EDGE (phi_second, seconde); > + gcc_assert (operand_equal_for_phi_arg_p (init, USE_FROM_PTR (op))); > + > + /* Prefer using original variable as a base for the new ssa name. > + This is necessary for virtual ops, and useful in order to avoid > + losing debug info for real ops. */ > + if (TREE_CODE (next) == SSA_NAME > + && useless_type_conversion_p (TREE_TYPE (next), > + TREE_TYPE (init))) > + new_init = copy_ssa_name (next); > + else if (TREE_CODE (init) == SSA_NAME > + && useless_type_conversion_p (TREE_TYPE (init), > + TREE_TYPE (next))) > + new_init = copy_ssa_name (init); > + else if (useless_type_conversion_p (TREE_TYPE (next), > + TREE_TYPE (init))) > + new_init = make_temp_ssa_name (TREE_TYPE (next), NULL, > + "unrinittmp"); > + else > + new_init = make_temp_ssa_name (TREE_TYPE (init), NULL, > + "unrinittmp"); > + > + gphi * newphi = create_phi_node (new_init, rest); > + add_phi_arg (newphi, init, skip_first, UNKNOWN_LOCATION); > + add_phi_arg (newphi, next, new_e, UNKNOWN_LOCATION); > + SET_USE (op, new_init); > + } > + > + /* The iterations of the second loop is now already > + exactly those that the first loop didn't do, but the > + iteration space of the first loop is still the original one. > + Build a new one, exactly covering those iterations where > + the conditional is true (or false). For example, from such a loop: > + > + for (i = beg, j = beg2; i < end; i++, j++) > + if (j < c) // this is supposed to be true > + ... > + > + we build new bounds and change the exit condtions such that > + it's effectively this: > + > + newend = min (end+beg2-beg, c) > + for (i = beg; j = beg2; j < newend; i++, j++) > + if (j < c) > + ... > + > + Depending on the direction of the IVs and if the exit tests > + are strict or include equality we need to use MIN or MAX, > + and add or subtract 1. */ > + > + gimple_seq stmts = NULL; > + /* The niter structure contains the after-increment IV, we need > + the loop-enter base, so subtract STEP once. */ > + tree controlbase = force_gimple_operand (niter->control.base, > + &stmts, true, NULL_TREE); > + tree controlstep = niter->control.step; > + tree enddiff; > + if (POINTER_TYPE_P (TREE_TYPE (controlbase))) > + { > + controlstep = gimple_build (&stmts, NEGATE_EXPR, > + TREE_TYPE (controlstep), controlstep); > + enddiff = gimple_build (&stmts, POINTER_PLUS_EXPR, > + TREE_TYPE (controlbase), > + controlbase, controlstep); > + } > + else > + enddiff = gimple_build (&stmts, MINUS_EXPR, > + TREE_TYPE (controlbase), > + controlbase, controlstep); > + > + /* Compute beg-beg2. */ > + if (POINTER_TYPE_P (TREE_TYPE (enddiff))) > + { > + tree tem = gimple_convert (&stmts, sizetype, guard_init); > + tem = gimple_build (&stmts, NEGATE_EXPR, sizetype, tem); > + enddiff = gimple_build (&stmts, POINTER_PLUS_EXPR, > + TREE_TYPE (enddiff), > + enddiff, tem); > + } > + else > + enddiff = gimple_build (&stmts, MINUS_EXPR, TREE_TYPE (enddiff), > + enddiff, guard_init); > + > + /* Compute end-(beg-beg2). */ > + gimple_seq stmts2; > + tree newbound = force_gimple_operand (niter->bound, &stmts2, > + true, NULL_TREE); > + gimple_seq_add_seq_without_update (&stmts, stmts2); > + > + if (POINTER_TYPE_P (TREE_TYPE (enddiff)) > + || POINTER_TYPE_P (TREE_TYPE (newbound))) > + { > + enddiff = gimple_convert (&stmts, sizetype, enddiff); > + enddiff = gimple_build (&stmts, NEGATE_EXPR, sizetype, enddiff); > + newbound = gimple_build (&stmts, POINTER_PLUS_EXPR, > + TREE_TYPE (newbound), > + newbound, enddiff); > + } > + else > + newbound = gimple_build (&stmts, MINUS_EXPR, TREE_TYPE (enddiff), > + newbound, enddiff); > + > + /* Depending on the direction of the IVs the new bound for the first > + loop is the minimum or maximum of old bound and border. > + Also, if the guard condition isn't strictly less or greater, > + we need to adjust the bound. */ > + int addbound = 0; > + enum tree_code minmax; > + if (niter->cmp == LT_EXPR) > + { > + /* GT and LE are the same, inverted. */ > + if (guard_code == GT_EXPR || guard_code == LE_EXPR) > + addbound = -1; > + minmax = MIN_EXPR; > + } > + else > + { > + gcc_assert (niter->cmp == GT_EXPR); > + if (guard_code == GE_EXPR || guard_code == LT_EXPR) > + addbound = 1; > + minmax = MAX_EXPR; > + } > + > + if (addbound) > + { > + tree type2 = TREE_TYPE (newbound); > + if (POINTER_TYPE_P (type2)) > + type2 = sizetype; > + newbound = gimple_build (&stmts, > + POINTER_TYPE_P (TREE_TYPE (newbound)) > + ? POINTER_PLUS_EXPR : PLUS_EXPR, > + TREE_TYPE (newbound), > + newbound, > + build_int_cst (type2, addbound)); > + } > + > + tree newend = gimple_build (&stmts, minmax, TREE_TYPE (border), > + border, newbound); > + if (stmts) > + gsi_insert_seq_on_edge_immediate (loop_preheader_edge (floop), > + stmts); > + > + /* Now patch the exit block of the first loop to compare > + the post-increment value of the guarding IV with the new end > + value. */ > + tree new_guard_next = PHI_ARG_DEF_FROM_EDGE (new_guard_phi, > + loop_latch_edge (floop)); > + patch_loop_exit (floop, guard_stmt, new_guard_next, newend, > + initial_true); > + > + /* Finally patch out the two copies of the condition to be always > + true/false (or opposite). */ > + gcond *force_true = as_a<gcond *> (last_stmt (get_bb_copy (bbs[i]))); > + gcond *force_false = as_a<gcond *> (last_stmt (bbs[i])); > + if (!initial_true) > + std::swap (force_true, force_false); > + gimple_cond_make_true (force_true); > + gimple_cond_make_false (force_false); > + update_stmt (force_true); > + update_stmt (force_false); > + > + free_original_copy_tables (); > + > + /* We destroyed LCSSA form above. Eventually we might be able > + to fix it on the fly, for now simply punt and use the helper. */ > + rewrite_into_loop_closed_ssa_1 (NULL, 0, SSA_OP_USE, floop); > + > + changed = true; > + if (dump_file && (dump_flags & TDF_DETAILS)) > + fprintf (dump_file, ";; Loop split.\n"); > + > + /* Only deal with the first opportunity. */ > + break; > + } > + > + free (bbs); > + return changed; > +} > + > +/* Main entry point. Perform loop splitting on all suitable loops. */ > + > +static unsigned int > +tree_ssa_split_loops (void) > +{ > + struct loop *loop; > + bool changed = false; > + > + gcc_assert (scev_initialized_p ()); > + /* Go through all loops starting from innermost. */ > + FOR_EACH_LOOP (loop, LI_FROM_INNERMOST) > + { > + struct tree_niter_desc niter; > + if (single_exit (loop) > + /* ??? We could handle non-empty latches when we split > + the latch edge (not the exit edge), and put the new > + exit condition in the new block. OTOH this executes some > + code unconditionally that might have been skipped by the > + original exit before. */ > + && empty_block_p (loop->latch) > + && !optimize_loop_for_size_p (loop) > + && number_of_iterations_exit (loop, single_exit (loop), &niter, > + false, true) > + /* We can't yet handle loops controlled by a != predicate. */ > + && niter.cmp != NE_EXPR) > + changed |= split_loop (loop, &niter); > + } > + > + if (changed) > + return TODO_cleanup_cfg; > + return 0; > +} > + > +/* Loop splitting pass. */ > + > +namespace { > + > +const pass_data pass_data_loop_split = > +{ > + GIMPLE_PASS, /* type */ > + "lsplit", /* name */ > + OPTGROUP_LOOP, /* optinfo_flags */ > + TV_LOOP_SPLIT, /* tv_id */ > + PROP_cfg, /* properties_required */ > + 0, /* properties_provided */ > + 0, /* properties_destroyed */ > + 0, /* todo_flags_start */ > + 0, /* todo_flags_finish */ > +}; > + > +class pass_loop_split : public gimple_opt_pass > +{ > +public: > + pass_loop_split (gcc::context *ctxt) > + : gimple_opt_pass (pass_data_loop_split, ctxt) > + {} > + > + /* opt_pass methods: */ > + virtual bool gate (function *) { return optimize >= 2; } > + virtual unsigned int execute (function *); > + > +}; // class pass_loop_split > + > +unsigned int > +pass_loop_split::execute (function *fun) > +{ > + if (number_of_loops (fun) <= 1) > + return 0; > + > + return tree_ssa_split_loops (); > +} > + > +} // anon namespace > + > +gimple_opt_pass * > +make_pass_loop_split (gcc::context *ctxt) > +{ > + return new pass_loop_split (ctxt); > +} > Index: testsuite/gcc.dg/loop-split.c > =================================================================== > --- testsuite/gcc.dg/loop-split.c (revision 0) > +++ testsuite/gcc.dg/loop-split.c (working copy) > @@ -0,0 +1,141 @@ > +/* { dg-do run } */ > +/* { dg-options "-O2 -fdump-tree-lsplit-details" } */ > + > +#ifdef __cplusplus > +extern "C" int printf (const char *, ...); > +extern "C" void abort (void); > +#else > +extern int printf (const char *, ...); > +extern void abort (void); > +#endif > + > +#ifndef TRACE > +#define TRACE 0 > +#endif > + > +#define loop(beg,step,beg2,cond1,cond2) \ > + do \ > + { \ > + sum = 0; \ > + for (i = (beg), j = (beg2); (cond1); i+=(step),j+=(step)) \ > + { \ > + if (cond2) { \ > + if (TRACE > 1) printf ("a: %d %d\n", i, j); \ > + sum += a[i]; \ > + } else { \ > + if (TRACE > 1) printf ("b: %d %d\n", i, j); \ > + sum += b[i]; \ > + } \ > + } \ > + if (TRACE > 0) printf ("sum: %d\n", sum); \ > + check = check * 47 + sum; \ > + } while (0) > + > +#if 1 > +unsigned __attribute__((noinline, noclone)) dotest (int beg, int end, int step, > + int c, int *a, int *b, int beg2) > +{ > + unsigned check = 0; > + int sum; > + int i, j; > + loop (beg, 1, beg2, i < end, j < c); > + loop (beg, 1, beg2, i <= end, j < c); > + loop (beg, 1, beg2, i < end, j <= c); > + loop (beg, 1, beg2, i <= end, j <= c); > + loop (beg, 1, beg2, i < end, j > c); > + loop (beg, 1, beg2, i <= end, j > c); > + loop (beg, 1, beg2, i < end, j >= c); > + loop (beg, 1, beg2, i <= end, j >= c); > + beg2 += end-beg; > + loop (end, -1, beg2, i >= beg, j >= c); > + loop (end, -1, beg2, i >= beg, j > c); > + loop (end, -1, beg2, i > beg, j >= c); > + loop (end, -1, beg2, i > beg, j > c); > + loop (end, -1, beg2, i >= beg, j <= c); > + loop (end, -1, beg2, i >= beg, j < c); > + loop (end, -1, beg2, i > beg, j <= c); > + loop (end, -1, beg2, i > beg, j < c); > + return check; > +} > + > +#else > + > +int __attribute__((noinline, noclone)) f (int beg, int end, int step, > + int c, int *a, int *b, int beg2) > +{ > + int sum = 0; > + int i, j; > + //for (i = beg, j = beg2; i < end; i += 1, j++ /*step*/) > + for (i = end, j = beg2 + (end-beg); i > beg; i += -1, j-- /*step*/) > + { > + // i - j == X --> i = X + j > + // --> i < end == X+j < end == j < end - X > + // --> newend = end - (i_init - j_init) > + // j < end-X && j < c --> j < min(end-X,c) > + // j < end-X && j <= c --> j <= min(end-X-1,c) or j < min(end-X,c+1{OF!}) > + //if (j < c) > + if (j >= c) > + printf ("a: %d %d\n", i, j); > + /*else > + printf ("b: %d %d\n", i, j);*/ > + /*sum += a[i]; > + else > + sum += b[i];*/ > + } > + return sum; > +} > + > +int __attribute__((noinline, noclone)) f2 (int *beg, int *end, int step, > + int *c, int *a, int *b, int *beg2) > +{ > + int sum = 0; > + int *i, *j; > + for (i = beg, j = beg2; i < end; i += 1, j++ /*step*/) > + { > + if (j <= c) > + printf ("%d %d\n", i - beg, j - beg); > + /*sum += a[i]; > + else > + sum += b[i];*/ > + } > + return sum; > +} > +#endif > + > +extern int printf (const char *, ...); > + > +int main () > +{ > + int a[] = {0,0,0,0,0, 1,2,3,4,5,6,7,8,9, 0,0,0,0,0}; > + int b[] = {0,0,0,0,0, -1,-2,-3,-4,-5,-6,-7,-8,-9, 0,0,0,0,0,}; > + int c; > + int diff = 0; > + unsigned check = 0; > + //dotest (0, 9, 1, -1, a+5, b+5, -1); > + //return 0; > + //f (0, 9, 1, -1, a+5, b+5, -1); > + //return 0; > + for (diff = -5; diff <= 5; diff++) > + { > + for (c = -1; c <= 10; c++) > + { > +#if 0 > + int s = f (0, 9, 1, c, a+5, b+5, diff); > + //int s = f2 (a+0, a+9, 1, a+c, a+5, b+5, a+diff); > + printf ("%d ", s); > +#else > + if (TRACE > 0) > + printf ("check %d %d\n", c, diff); > + check = check * 51 + dotest (0, 9, 1, c, a+5, b+5, diff); > +#endif > + } > + //printf ("\n"); > + } > + //printf ("%u\n", check); > + if (check != 3213344948) > + abort (); > + return 0; > +} > + > +/* All 16 loops in dotest should be split. */ > +/* { dg-final { scan-tree-dump-times "Loop split" 16 "lsplit" } } */
Hi, On Sun, 24 Jul 2016, Andrew Pinski wrote: > What ever happened to this patch? It got accepted but I deferred inclusion in GCC 6 because it was late in the cycle then and performance results didn't show super improvements (only looked at cpu2006). No regressions, but no nice speedups either. > I was looking into doing this myself today but I found this patch. It is > stage 1 of GCC 7, it might be a good idea to get this patch into GCC. Indeed. If you want to performance test it on something you know where it should help, I'm all ears. Ciao, Michael.
Index: passes.def =================================================================== --- passes.def (revision 229763) +++ passes.def (working copy) @@ -233,6 +233,7 @@ along with GCC; see the file COPYING3. NEXT_PASS (pass_dce); NEXT_PASS (pass_tree_unswitch); NEXT_PASS (pass_scev_cprop); + NEXT_PASS (pass_loop_split); NEXT_PASS (pass_record_bounds); NEXT_PASS (pass_loop_distribution); NEXT_PASS (pass_copy_prop); Index: timevar.def =================================================================== --- timevar.def (revision 229763) +++ timevar.def (working copy) @@ -179,6 +179,7 @@ DEFTIMEVAR (TV_LIM , " DEFTIMEVAR (TV_TREE_LOOP_IVCANON , "tree canonical iv") DEFTIMEVAR (TV_SCEV_CONST , "scev constant prop") DEFTIMEVAR (TV_TREE_LOOP_UNSWITCH , "tree loop unswitching") +DEFTIMEVAR (TV_LOOP_SPLIT , "loop splitting") DEFTIMEVAR (TV_COMPLETE_UNROLL , "complete unrolling") DEFTIMEVAR (TV_TREE_PARALLELIZE_LOOPS, "tree parallelize loops") DEFTIMEVAR (TV_TREE_VECTORIZATION , "tree vectorization") Index: tree-pass.h =================================================================== --- tree-pass.h (revision 229763) +++ tree-pass.h (working copy) @@ -366,6 +366,7 @@ extern gimple_opt_pass *make_pass_tree_n extern gimple_opt_pass *make_pass_tree_loop_init (gcc::context *ctxt); extern gimple_opt_pass *make_pass_lim (gcc::context *ctxt); extern gimple_opt_pass *make_pass_tree_unswitch (gcc::context *ctxt); +extern gimple_opt_pass *make_pass_loop_split (gcc::context *ctxt); extern gimple_opt_pass *make_pass_predcom (gcc::context *ctxt); extern gimple_opt_pass *make_pass_iv_canon (gcc::context *ctxt); extern gimple_opt_pass *make_pass_scev_cprop (gcc::context *ctxt); Index: tree-ssa-loop-manip.h =================================================================== --- tree-ssa-loop-manip.h (revision 229763) +++ tree-ssa-loop-manip.h (working copy) @@ -24,6 +24,8 @@ typedef void (*transform_callback)(struc extern void create_iv (tree, tree, tree, struct loop *, gimple_stmt_iterator *, bool, tree *, tree *); +extern void rewrite_into_loop_closed_ssa_1 (bitmap, unsigned, int, + struct loop *); extern void rewrite_into_loop_closed_ssa (bitmap, unsigned); extern void rewrite_virtuals_into_loop_closed_ssa (struct loop *); extern void verify_loop_closed_ssa (bool); Index: tree-ssa-loop-unswitch.c =================================================================== --- tree-ssa-loop-unswitch.c (revision 229763) +++ tree-ssa-loop-unswitch.c (working copy) @@ -31,12 +31,20 @@ along with GCC; see the file COPYING3. #include "tree-ssa.h" #include "tree-ssa-loop-niter.h" #include "tree-ssa-loop.h" +#include "tree-ssa-loop-manip.h" #include "tree-into-ssa.h" +#include "cfganal.h" #include "cfgloop.h" +#include "tree-chrec.h" +#include "tree-affine.h" +#include "tree-scalar-evolution.h" #include "params.h" #include "tree-inline.h" #include "gimple-iterator.h" +#include "gimple-pretty-print.h" #include "cfghooks.h" +#include "gimple-fold.h" +#include "gimplify-me.h" /* This file implements the loop unswitching, i.e. transformation of loops like @@ -842,4 +850,551 @@ make_pass_tree_unswitch (gcc::context *c return new pass_tree_unswitch (ctxt); } +/* Return true when BB inside LOOP is a potential iteration space + split point, i.e. ends with a condition like "IV < comp", which + is true on one side of the iteration space and false on the other, + and the split point can be computed. If so, also return the border + point in *BORDER and the comparison induction variable in IV. */ +static tree +split_at_bb_p (struct loop *loop, basic_block bb, tree *border, affine_iv *iv) +{ + gimple *last; + gcond *stmt; + affine_iv iv2; + + /* BB must end in a simple conditional jump. */ + last = last_stmt (bb); + if (!last || gimple_code (last) != GIMPLE_COND) + return NULL_TREE; + stmt = as_a <gcond *> (last); + + enum tree_code code = gimple_cond_code (stmt); + + /* Only handle relational comparisons, for equality and non-equality + we'd have to split the loop into two loops and a middle statement. */ + switch (code) + { + case LT_EXPR: + case LE_EXPR: + case GT_EXPR: + case GE_EXPR: + break; + default: + return NULL_TREE; + } + + if (loop_exits_from_bb_p (loop, bb)) + return NULL_TREE; + + tree op0 = gimple_cond_lhs (stmt); + tree op1 = gimple_cond_rhs (stmt); + + if (!simple_iv (loop, loop, op0, iv, false)) + return NULL_TREE; + if (!simple_iv (loop, loop, op1, &iv2, false)) + return NULL_TREE; + + /* Make it so, that the first argument of the condition is + the looping one. */ + if (integer_zerop (iv->step)) + { + std::swap (op0, op1); + std::swap (*iv, iv2); + code = swap_tree_comparison (code); + gimple_cond_set_condition (stmt, code, op0, op1); + update_stmt (stmt); + } + + if (integer_zerop (iv->step)) + return NULL_TREE; + if (!integer_zerop (iv2.step)) + return NULL_TREE; + + if (dump_file && (dump_flags & TDF_DETAILS)) + { + fprintf (dump_file, "Found potential split point: "); + print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM); + fprintf (dump_file, " { "); + print_generic_expr (dump_file, iv->base, TDF_SLIM); + fprintf (dump_file, " + I*"); + print_generic_expr (dump_file, iv->step, TDF_SLIM); + fprintf (dump_file, " } %s ", get_tree_code_name (code)); + print_generic_expr (dump_file, iv2.base, TDF_SLIM); + fprintf (dump_file, "\n"); + } + + *border = iv2.base; + return op0; +} + +/* Given a GUARD conditional stmt inside LOOP, which we want to make always + true or false depending on INITIAL_TRUE, and adjusted values NEXTVAL + (a post-increment IV) and NEWBOUND (the comparator) adjust the loop + exit test statement to loop back only if the GUARD statement will + also be true/false in the next iteration. */ + +static void +patch_loop_exit (struct loop *loop, gcond *guard, tree nextval, tree newbound, + bool initial_true) +{ + edge exit = single_exit (loop); + gcond *stmt = as_a <gcond *> (last_stmt (exit->src)); + gimple_cond_set_condition (stmt, gimple_cond_code (guard), + nextval, newbound); + update_stmt (stmt); + + edge stay = single_pred_edge (loop->latch); + + exit->flags &= ~(EDGE_TRUE_VALUE | EDGE_FALSE_VALUE); + stay->flags &= ~(EDGE_TRUE_VALUE | EDGE_FALSE_VALUE); + + if (initial_true) + { + exit->flags |= EDGE_FALSE_VALUE; + stay->flags |= EDGE_TRUE_VALUE; + } + else + { + exit->flags |= EDGE_TRUE_VALUE; + stay->flags |= EDGE_FALSE_VALUE; + } +} + +/* Give an induction variable GUARD_IV, and its affine descriptor IV, + find the loop phi node in LOOP defining it directly, or create + such phi node. Return that phi node. */ + +static gphi * +find_or_create_guard_phi (struct loop *loop, tree guard_iv, affine_iv * /*iv*/) +{ + gimple *def = SSA_NAME_DEF_STMT (guard_iv); + gphi *phi; + if ((phi = dyn_cast <gphi *> (def)) + && gimple_bb (phi) == loop->header) + return phi; + + /* XXX Create the PHI instead. */ + return NULL; +} + +/* Checks if LOOP contains an conditional block whose condition + depends on which side in the iteration space it is, and if so + splits the iteration space into two loops. Returns true if the + loop was split. NITER must contain the iteration descriptor for the + single exit of LOOP. */ + +static bool +split_loop (struct loop *loop, struct tree_niter_desc *niter) +{ + basic_block *bbs; + unsigned i; + bool changed = false; + tree guard_iv; + tree border; + affine_iv iv; + + bbs = get_loop_body (loop); + + /* Find a splitting opportunity. */ + for (i = 0; i < loop->num_nodes; i++) + if ((guard_iv = split_at_bb_p (loop, bbs[i], &border, &iv))) + { + /* Handling opposite steps is not implemented yet. Neither + is handling different step sizes. */ + if ((tree_int_cst_sign_bit (iv.step) + != tree_int_cst_sign_bit (niter->control.step)) + || !tree_int_cst_equal (iv.step, niter->control.step)) + continue; + + /* Find a loop PHI node that defines guard_iv directly, + or create one doing that. */ + gphi *phi = find_or_create_guard_phi (loop, guard_iv, &iv); + if (!phi) + continue; + gcond *guard_stmt = as_a<gcond *> (last_stmt (bbs[i])); + tree guard_init = PHI_ARG_DEF_FROM_EDGE (phi, + loop_preheader_edge (loop)); + enum tree_code guard_code = gimple_cond_code (guard_stmt); + + /* Loop splitting is implemented by versioning the loop, placing + the new loop in front of the old loop, make the first loop iterate + as long as the conditional stays true (or false) and let the + second (original) loop handle the rest of the iterations. + + First we need to determine if the condition will start being true + or false in the first loop. */ + bool initial_true; + switch (guard_code) + { + case LT_EXPR: + case LE_EXPR: + initial_true = !tree_int_cst_sign_bit (iv.step); + break; + case GT_EXPR: + case GE_EXPR: + initial_true = tree_int_cst_sign_bit (iv.step); + break; + default: + gcc_unreachable (); + } + + /* Build a condition that will skip the first loop when the + guard condition won't ever be true (or false). */ + tree cond = build2 (guard_code, boolean_type_node, guard_init, border); + if (initial_true) + cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond); + + /* Now version the loop, we will then have this situation: + if (!cond) + for (...) {body} //floop + else + for (...) {body} //loop + join: */ + initialize_original_copy_tables (); + basic_block cond_bb; + struct loop *floop = loop_version (loop, cond, &cond_bb, + REG_BR_PROB_BASE, REG_BR_PROB_BASE, + REG_BR_PROB_BASE, false); + gcc_assert (floop); + update_ssa (TODO_update_ssa); + + /* Now diddle the exit edge of the first loop (floop->join in the + above) to either go to the common exit (join) or to the second + loop, depending on if there are still iterations left, or not. + We split the floop exit edge and insert a copy of the + original exit expression into the new block, that either + skips the second loop or goes to it. */ + edge exit = single_exit (floop); + basic_block skip_bb = split_edge (exit); + gcond *skip_stmt; + gimple_stmt_iterator gsi; + edge new_e, skip_e; + + gimple *stmt = last_stmt (exit->src); + skip_stmt = gimple_build_cond (gimple_cond_code (stmt), + gimple_cond_lhs (stmt), + gimple_cond_rhs (stmt), + NULL_TREE, NULL_TREE); + gsi = gsi_last_bb (skip_bb); + gsi_insert_after (&gsi, skip_stmt, GSI_NEW_STMT); + + skip_e = EDGE_SUCC (skip_bb, 0); + skip_e->flags &= ~EDGE_FALLTHRU; + new_e = make_edge (skip_bb, loop_preheader_edge (loop)->src, 0); + if (exit->flags & EDGE_TRUE_VALUE) + { + skip_e->flags |= EDGE_TRUE_VALUE; + new_e->flags |= EDGE_FALSE_VALUE; + } + else + { + skip_e->flags |= EDGE_FALSE_VALUE; + new_e->flags |= EDGE_TRUE_VALUE; + } + + new_e->count = skip_bb->count; + new_e->probability = PROB_LIKELY; + new_e->count = apply_probability (skip_e->count, PROB_LIKELY); + skip_e->count -= new_e->count; + skip_e->probability = inverse_probability (PROB_LIKELY); + + /* Now we have created this situation: + if (!cond) { + for (...) {body; if (cexit) break;} + if (!cexit) goto second; + } else { + second: + for (...) {body; if (cexit) break;} + } + join: + + The second loop can now be entered by skipping the first + loop (the inital values of its PHI nodes will be the + original initial values), or by falling in from the first + loop (the initial values will be the continuation values + from the first loop). Insert PHI nodes reflecting this + in the pre-header of the second loop. */ + + basic_block rest = loop_preheader_edge (loop)->src; + edge skip_first = find_edge (cond_bb, rest); + gcc_assert (skip_first); + + edge firste = loop_preheader_edge (floop); + edge seconde = loop_preheader_edge (loop); + edge firstn = loop_latch_edge (floop); + gphi *new_guard_phi = 0; + gphi_iterator psi_first, psi_second; + for (psi_first = gsi_start_phis (floop->header), + psi_second = gsi_start_phis (loop->header); + !gsi_end_p (psi_first); + gsi_next (&psi_first), gsi_next (&psi_second)) + { + tree init, next, new_init; + use_operand_p op; + gphi *phi_first = psi_first.phi (); + gphi *phi_second = psi_second.phi (); + + if (phi_second == phi) + new_guard_phi = phi_first; + + init = PHI_ARG_DEF_FROM_EDGE (phi_first, firste); + next = PHI_ARG_DEF_FROM_EDGE (phi_first, firstn); + op = PHI_ARG_DEF_PTR_FROM_EDGE (phi_second, seconde); + gcc_assert (operand_equal_for_phi_arg_p (init, USE_FROM_PTR (op))); + + /* Prefer using original variable as a base for the new ssa name. + This is necessary for virtual ops, and useful in order to avoid + losing debug info for real ops. */ + if (TREE_CODE (next) == SSA_NAME + && useless_type_conversion_p (TREE_TYPE (next), + TREE_TYPE (init))) + new_init = copy_ssa_name (next); + else if (TREE_CODE (init) == SSA_NAME + && useless_type_conversion_p (TREE_TYPE (init), + TREE_TYPE (next))) + new_init = copy_ssa_name (init); + else if (useless_type_conversion_p (TREE_TYPE (next), + TREE_TYPE (init))) + new_init = make_temp_ssa_name (TREE_TYPE (next), NULL, + "unrinittmp"); + else + new_init = make_temp_ssa_name (TREE_TYPE (init), NULL, + "unrinittmp"); + + gphi * newphi = create_phi_node (new_init, rest); + add_phi_arg (newphi, init, skip_first, UNKNOWN_LOCATION); + add_phi_arg (newphi, next, new_e, UNKNOWN_LOCATION); + SET_USE (op, new_init); + } + + /* The iterations of the second loop is now already + exactly those that the first loop didn't do, but the + iteration space of the first loop is still the original one. + Build a new one, exactly covering those iterations where + the conditional is true (or false). For example, from such a loop: + + for (i = beg, j = beg2; i < end; i++, j++) + if (j < c) // this is supposed to be true + ... + + we build new bounds and change the exit condtions such that + it's effectively this: + + newend = min (end+beg2-beg, c) + for (i = beg; j = beg2; j < newend; i++, j++) + if (j < c) + ... + + Depending on the direction of the IVs and if the exit tests + are strict or include equality we need to use MIN or MAX, + and add or subtract 1. */ + + gimple_seq stmts = NULL; + /* The niter structure contains the after-increment IV, we need + the loop-enter base, so subtract STEP once. */ + tree controlbase = force_gimple_operand (niter->control.base, + &stmts, true, NULL_TREE); + tree controlstep = niter->control.step; + tree enddiff; + if (POINTER_TYPE_P (TREE_TYPE (controlbase))) + { + controlstep = gimple_build (&stmts, NEGATE_EXPR, + TREE_TYPE (controlstep), controlstep); + enddiff = gimple_build (&stmts, POINTER_PLUS_EXPR, + TREE_TYPE (controlbase), + controlbase, controlstep); + } + else + enddiff = gimple_build (&stmts, MINUS_EXPR, + TREE_TYPE (controlbase), + controlbase, controlstep); + + /* Compute beg-beg2. */ + if (POINTER_TYPE_P (TREE_TYPE (enddiff))) + { + tree tem = gimple_convert (&stmts, sizetype, guard_init); + tem = gimple_build (&stmts, NEGATE_EXPR, sizetype, tem); + enddiff = gimple_build (&stmts, POINTER_PLUS_EXPR, + TREE_TYPE (enddiff), + enddiff, tem); + } + else + enddiff = gimple_build (&stmts, MINUS_EXPR, TREE_TYPE (enddiff), + enddiff, guard_init); + + /* Compute end-(beg-beg2). */ + gimple_seq stmts2; + tree newbound = force_gimple_operand (niter->bound, &stmts2, + true, NULL_TREE); + gimple_seq_add_seq_without_update (&stmts, stmts2); + + if (POINTER_TYPE_P (TREE_TYPE (enddiff)) + || POINTER_TYPE_P (TREE_TYPE (newbound))) + { + enddiff = gimple_convert (&stmts, sizetype, enddiff); + enddiff = gimple_build (&stmts, NEGATE_EXPR, sizetype, enddiff); + newbound = gimple_build (&stmts, POINTER_PLUS_EXPR, + TREE_TYPE (newbound), + newbound, enddiff); + } + else + newbound = gimple_build (&stmts, MINUS_EXPR, TREE_TYPE (enddiff), + newbound, enddiff); + + /* Depending on the direction of the IVs the new bound for the first + loop is the minimum or maximum of old bound and border. + Also, if the guard condition isn't strictly less or greater, + we need to adjust the bound. */ + int addbound = 0; + enum tree_code minmax; + if (niter->cmp == LT_EXPR) + { + /* GT and LE are the same, inverted. */ + if (guard_code == GT_EXPR || guard_code == LE_EXPR) + addbound = -1; + minmax = MIN_EXPR; + } + else + { + gcc_assert (niter->cmp == GT_EXPR); + if (guard_code == GE_EXPR || guard_code == LT_EXPR) + addbound = 1; + minmax = MAX_EXPR; + } + + if (addbound) + { + tree type2 = TREE_TYPE (newbound); + if (POINTER_TYPE_P (type2)) + type2 = sizetype; + newbound = gimple_build (&stmts, + POINTER_TYPE_P (TREE_TYPE (newbound)) + ? POINTER_PLUS_EXPR : PLUS_EXPR, + TREE_TYPE (newbound), + newbound, + build_int_cst (type2, addbound)); + } + + tree newend = gimple_build (&stmts, minmax, TREE_TYPE (border), + border, newbound); + if (stmts) + gsi_insert_seq_on_edge_immediate (loop_preheader_edge (floop), + stmts); + + /* Now patch the exit block of the first loop to compare + the post-increment value of the guarding IV with the new end + value. */ + tree new_guard_next = PHI_ARG_DEF_FROM_EDGE (new_guard_phi, + loop_latch_edge (floop)); + patch_loop_exit (floop, guard_stmt, new_guard_next, newend, + initial_true); + + /* Finally patch out the two copies of the condition to be always + true/false (or opposite). */ + gcond *force_true = as_a<gcond *> (last_stmt (get_bb_copy (bbs[i]))); + gcond *force_false = as_a<gcond *> (last_stmt (bbs[i])); + if (!initial_true) + std::swap (force_true, force_false); + gimple_cond_make_true (force_true); + gimple_cond_make_false (force_false); + update_stmt (force_true); + update_stmt (force_false); + + free_original_copy_tables (); + + /* We destroyed LCSSA form above. Eventually we might be able + to fix it on the fly, for now simply punt and use the helper. */ + rewrite_into_loop_closed_ssa_1 (NULL, 0, SSA_OP_USE, floop); + + changed = true; + if (dump_file && (dump_flags & TDF_DETAILS)) + fprintf (dump_file, ";; Loop split.\n"); + + /* Only deal with the first opportunity. */ + break; + } + + free (bbs); + return changed; +} + +/* Main entry point. Perform loop splitting on all suitable loops. */ + +static unsigned int +tree_ssa_split_loops (void) +{ + struct loop *loop; + bool changed = false; + + gcc_assert (scev_initialized_p ()); + /* Go through all loops starting from innermost. */ + FOR_EACH_LOOP (loop, LI_FROM_INNERMOST) + { + struct tree_niter_desc niter; + if (single_exit (loop) + /* ??? We could handle non-empty latches when we split + the latch edge (not the exit edge), and put the new + exit condition in the new block. OTOH this executes some + code unconditionally that might have been skipped by the + original exit before. */ + && empty_block_p (loop->latch) + && !optimize_loop_for_size_p (loop) + && number_of_iterations_exit (loop, single_exit (loop), &niter, + false, true) + /* We can't yet handle loops controlled by a != predicate. */ + && niter.cmp != NE_EXPR) + changed |= split_loop (loop, &niter); + } + + if (changed) + return TODO_cleanup_cfg; + return 0; +} + +/* Loop splitting pass. */ + +namespace { + +const pass_data pass_data_loop_split = +{ + GIMPLE_PASS, /* type */ + "lsplit", /* name */ + OPTGROUP_LOOP, /* optinfo_flags */ + TV_LOOP_SPLIT, /* tv_id */ + PROP_cfg, /* properties_required */ + 0, /* properties_provided */ + 0, /* properties_destroyed */ + 0, /* todo_flags_start */ + 0, /* todo_flags_finish */ +}; + +class pass_loop_split : public gimple_opt_pass +{ +public: + pass_loop_split (gcc::context *ctxt) + : gimple_opt_pass (pass_data_loop_split, ctxt) + {} + + /* opt_pass methods: */ + virtual bool gate (function *) { return optimize >= 2; } + virtual unsigned int execute (function *); + +}; // class pass_loop_split + +unsigned int +pass_loop_split::execute (function *fun) +{ + if (number_of_loops (fun) <= 1) + return 0; + + return tree_ssa_split_loops (); +} + +} // anon namespace + +gimple_opt_pass * +make_pass_loop_split (gcc::context *ctxt) +{ + return new pass_loop_split (ctxt); +} Index: testsuite/gcc.dg/loop-split.c =================================================================== --- testsuite/gcc.dg/loop-split.c (revision 0) +++ testsuite/gcc.dg/loop-split.c (working copy) @@ -0,0 +1,141 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -fdump-tree-lsplit-details" } */ + +#ifdef __cplusplus +extern "C" int printf (const char *, ...); +extern "C" void abort (void); +#else +extern int printf (const char *, ...); +extern void abort (void); +#endif + +#ifndef TRACE +#define TRACE 0 +#endif + +#define loop(beg,step,beg2,cond1,cond2) \ + do \ + { \ + sum = 0; \ + for (i = (beg), j = (beg2); (cond1); i+=(step),j+=(step)) \ + { \ + if (cond2) { \ + if (TRACE > 1) printf ("a: %d %d\n", i, j); \ + sum += a[i]; \ + } else { \ + if (TRACE > 1) printf ("b: %d %d\n", i, j); \ + sum += b[i]; \ + } \ + } \ + if (TRACE > 0) printf ("sum: %d\n", sum); \ + check = check * 47 + sum; \ + } while (0) + +#if 1 +unsigned __attribute__((noinline, noclone)) dotest (int beg, int end, int step, + int c, int *a, int *b, int beg2) +{ + unsigned check = 0; + int sum; + int i, j; + loop (beg, 1, beg2, i < end, j < c); + loop (beg, 1, beg2, i <= end, j < c); + loop (beg, 1, beg2, i < end, j <= c); + loop (beg, 1, beg2, i <= end, j <= c); + loop (beg, 1, beg2, i < end, j > c); + loop (beg, 1, beg2, i <= end, j > c); + loop (beg, 1, beg2, i < end, j >= c); + loop (beg, 1, beg2, i <= end, j >= c); + beg2 += end-beg; + loop (end, -1, beg2, i >= beg, j >= c); + loop (end, -1, beg2, i >= beg, j > c); + loop (end, -1, beg2, i > beg, j >= c); + loop (end, -1, beg2, i > beg, j > c); + loop (end, -1, beg2, i >= beg, j <= c); + loop (end, -1, beg2, i >= beg, j < c); + loop (end, -1, beg2, i > beg, j <= c); + loop (end, -1, beg2, i > beg, j < c); + return check; +} + +#else + +int __attribute__((noinline, noclone)) f (int beg, int end, int step, + int c, int *a, int *b, int beg2) +{ + int sum = 0; + int i, j; + //for (i = beg, j = beg2; i < end; i += 1, j++ /*step*/) + for (i = end, j = beg2 + (end-beg); i > beg; i += -1, j-- /*step*/) + { + // i - j == X --> i = X + j + // --> i < end == X+j < end == j < end - X + // --> newend = end - (i_init - j_init) + // j < end-X && j < c --> j < min(end-X,c) + // j < end-X && j <= c --> j <= min(end-X-1,c) or j < min(end-X,c+1{OF!}) + //if (j < c) + if (j >= c) + printf ("a: %d %d\n", i, j); + /*else + printf ("b: %d %d\n", i, j);*/ + /*sum += a[i]; + else + sum += b[i];*/ + } + return sum; +} + +int __attribute__((noinline, noclone)) f2 (int *beg, int *end, int step, + int *c, int *a, int *b, int *beg2) +{ + int sum = 0; + int *i, *j; + for (i = beg, j = beg2; i < end; i += 1, j++ /*step*/) + { + if (j <= c) + printf ("%d %d\n", i - beg, j - beg); + /*sum += a[i]; + else + sum += b[i];*/ + } + return sum; +} +#endif + +extern int printf (const char *, ...); + +int main () +{ + int a[] = {0,0,0,0,0, 1,2,3,4,5,6,7,8,9, 0,0,0,0,0}; + int b[] = {0,0,0,0,0, -1,-2,-3,-4,-5,-6,-7,-8,-9, 0,0,0,0,0,}; + int c; + int diff = 0; + unsigned check = 0; + //dotest (0, 9, 1, -1, a+5, b+5, -1); + //return 0; + //f (0, 9, 1, -1, a+5, b+5, -1); + //return 0; + for (diff = -5; diff <= 5; diff++) + { + for (c = -1; c <= 10; c++) + { +#if 0 + int s = f (0, 9, 1, c, a+5, b+5, diff); + //int s = f2 (a+0, a+9, 1, a+c, a+5, b+5, a+diff); + printf ("%d ", s); +#else + if (TRACE > 0) + printf ("check %d %d\n", c, diff); + check = check * 51 + dotest (0, 9, 1, c, a+5, b+5, diff); +#endif + } + //printf ("\n"); + } + //printf ("%u\n", check); + if (check != 3213344948) + abort (); + return 0; +} + +/* All 16 loops in dotest should be split. */ +/* { dg-final { scan-tree-dump-times "Loop split" 16 "lsplit" } } */