loop-unroll.c TLC

Hi,
this patch applies some TLC to the RTL loop unroller.  The following issues are
fixed:

 1) while updating loop-iv.c I did not notice that the upper bound computed
    is in fact conditional on the infinite and assumptions flags.
    The patch disables recording it if those are non-zero.  I checked that
    this does not lead to missed optimizations, since all paths that rely on
    the upper bound test that these are zero.
    (For the same reason it also won't lead to wrong code prior to the
    following changes.)
 2) I updated complete peeling the same way as I updated the tree level:
    use the maximal iteration bound known and, when it is small, peel and add
    a barrier to the last iteration of the loop.  This applies when complete
    peeling is possible but we do not know which exit will really terminate
    the loop.

    I considered the option to remove it after I get the tree level
    change in, but even with that change it triggers a few times during
    the bootstrap, even when the tree level is peeling more aggressively.
    I checked the reasons and these are: loop ordering issues (often
    loop-lim is needed to make the iteration count trigger), some cases
    where RTL can duplicate more than trees (such as computed gotos), and
    the fact that the RTL level handles outer loops.

    Given the major embarrassment of trying to unroll or stupidly peel a loop
    with a small upper bound, and the fact that the analyses are already done
    and thus the pass is virtually free when it does not trigger, I think it
    makes sense to keep it for now.
    In general the RTL world is better informed on code size and thus it seems
    to make sense to keep the functionality.  We may even want to reduce
    tree level peeling when the loops have complicated control flow
    and leave the rest of the opportunities for RTL land.
 3) loop_exit_at_end_p can be made stronger by use of active_insn_p,
    which is the proper predicate here.  I tried to make it use
    forwarder_block_p, but someone hacked in logic to make it return false
    on loop headers/latches, which I would probably put into a wrapper.
 4) I noticed that we handle peeling by two paths - decide_peel_once_rolling
    and decide_peel_completely.  The path in decide_peel_once_rolling seems
    confused.  It passes only loops that do not roll at all (those can exist)
    and in this case it makes no sense to bound the number of instructions in
    the loop because no duplication is done.
    I also see no reason to treat loops rolling once differently from loops
    rolling many times (and we didn't do that), so I removed the undocumented
    param max-once-peeled-insns.
 5) decide_unroll_constant_iterations assumes that loops with a constant
    number of iterations really loop many times.  This is not true if the
    loop has more than one exit; we should still consult the profile here.
 6) decide_peel_simple gives up on loops with a constant number of iterations;
    this is wrong - if the profile tells us that the loop rolls just a few
    times it is still more sensible to peel it rather than unroll.
 7) decide_peel_simple gives up when the loop contains branches.  This seems
    confused and I will analyze it better later.  In general peeling may make
    loops with very small iteration bounds more predictable.
    For now I disabled the test when profile feedback gives us a strong
    indication that the loop is a really good peeling candidate.
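
To make 2) more concrete, here is a source-level analogy of complete peeling
by the maximal iteration bound.  It is only a sketch: the real transformation
works on RTL, the function names and the "n & 3" bound are made up for
illustration, and __builtin_unreachable stands in for the barrier.

```c
/* The bound (n & 3) proves the body runs at most 3 times, but because of
   the early break we do not know which exit really terminates the loop.  */
static int
sum_until_negative (const int *a, unsigned n)
{
  int sum = 0;
  for (unsigned i = 0; i < (n & 3); i++)
    {
      if (a[i] < 0)
        break;
      sum += a[i];
    }
  return sum;
}

/* Hand-peeled equivalent: three copies of the body, each keeping both
   exit tests.  After the third copy the back edge cannot be taken, so it
   is replaced by a barrier.  */
static int
sum_until_negative_peeled (const int *a, unsigned n)
{
  int sum = 0;
  unsigned bound = n & 3;
  unsigned i = 0;

  if (i >= bound || a[i] < 0)
    return sum;
  sum += a[i];
  i++;

  if (i >= bound || a[i] < 0)
    return sum;
  sum += a[i];
  i++;

  if (i >= bound || a[i] < 0)
    return sum;
  sum += a[i];
  i++;

  /* i == 3 and bound <= 3, so the loop exit test must succeed here.  */
  if (i < bound)
    __builtin_unreachable ();
  return sum;
}
```

Both functions return the same value for any input; the peeled form has no
loop left, which is the point of doing this only when the bound is small.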

I also started to add testcases for interesting cases.  So far I cover 2), 5),
6) and 7).  I failed to construct a testcase for 4) because of another bug in
tree-ssa-loop-niter that I will file a separate PR for.
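
For illustration, a loop of the shape that 5) and 6) are concerned with might
look like the following.  This is not one of the new testcases; the function
name and the constant 1024 are assumptions chosen only to show the pattern.

```c
/* Simple loop analysis sees a constant iteration count of 1024, but the
   second exit (the early return) may fire after only a few iterations.
   Only the profile can tell whether this loop really rolls many times,
   i.e. whether unrolling or peeling a few copies is the better choice.  */
static int
find_first (const int *a, int key)
{
  for (int i = 0; i < 1024; i++)
    if (a[i] == key)
      return i;
  return -1;
}
```

If the profile shows the key is usually found near the start, peeling a few
iterations is likely the more sensible choice than unrolling.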

Bootstrapped/regtested x86_64-linux, OK?

	* loop-iv.c (iv_number_of_iterations): Record upper bound estimates
	only when the bound is unconditional.
	* cfgloopmanip.c (unloop): Export.
	* cfgloop.h (unloop): Declare.
	(scale_loop_frequencies): Tidy.
	* loop-unroll.c (decide_peel_once_rolling): Rename to ...
	(decide_peel_not_rolling): ... this one; update documentation;
	kill insn bound.
	(loop_exit_at_end_p): Assert that latches have no conditionals;
	skip non active_insn_p.
	(peel_loops_completely): Update.
	(decide_peel_completely): Use max_loop_iterations_int instead of
	const_iter bound given by simple loop analysis.
	(peel_loop_completely): Handle the non-simple loops.
	(decide_unroll_constant_iterations): Consult profile if we want
	to unroll.
	* gcc.dg/tree-prof/unroll-1.c: New testcase.
	* gcc.dg/tree-prof/peel-1.c: New testcase.
	* gcc.dg/unroll_6.c: New testcase.
