[PATCH/RFC] Make loop-header-copying more aggressive, rerun before tree-if-conversion

This example, which I wrote to test if-conversion, currently fails to if-convert 
or vectorize:

int a[32], b[32];

void foo ()
{
   for (int i = 0; i < 32 ; i++)
     {
       int m = (a[i] & i) ? 5 : 4;
       b[i] = a[i] * m;
     }
}

...because jump-threading in dom1 rearranged the loop into a form that neither 
if-conversion nor vectorization would attempt. Discussion at 
https://gcc.gnu.org/ml/gcc/2015-04/msg00343.html led to the suggestion that I 
should rerun loop-header copying. (An earlier attempt to fix if-conversion, 
https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01743.html, still did not enable 
vectorization.)
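
For readers less familiar with if-conversion, here is a hand-written, source-level sketch of the effect wanted on the example above. It is only an illustration: the real pass works on GIMPLE, and the function names, the array size N and the arithmetic form of the select are my own assumptions rather than anything taken from the patch.

```c
#define N 32

int a[N], b[N];

/* Branchy form: the loop body contains control flow, which the
   vectorizer cannot handle directly.  */
void scale_branchy (void)
{
  for (int i = 0; i < N; i++)
    {
      int m;
      if (a[i] & i)
        m = 5;
      else
        m = 4;
      b[i] = a[i] * m;
    }
}

/* If-converted form, written by hand: the body is straight-line code
   and the choice between 5 and 4 is computed as a value.  */
void scale_flat (void)
{
  for (int i = 0; i < N; i++)
    {
      int cond = (a[i] & i) != 0;   /* 0 or 1 */
      int m = 4 + cond;             /* 5 when cond is set, otherwise 4 */
      b[i] = a[i] * m;
    }
}
```

Both functions store the same values into b; the second has a single basic block in the loop body, which is the shape the vectorizer wants.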

This patch does so (and makes the pass slightly less conservative, to tackle the example 
above). I found I had to make this a separate pass, so that the phi nodes were 
cleaned up at the end of the pass before running tree_if_conversion. Also at 
this stage in the compiler (inside loop opts) it was not possible to run 
loop_optimizer_init+finalize, or other loop_optimizer data structures needed 
later would be deleted; hence, I have two nearly-but-not-quite-identical passes, 
the new "ch_vect" avoiding the init/finalize. I tried to tackle this with some 
C++ subclassing, which removes the duplication, but the result feels a little 
ugly; suggestions for any neater approach welcome.
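
As a reminder of what loop-header copying does, here is a minimal source-level sketch, assuming a simple counted loop. The pass itself operates on the CFG, not on source, and the function names below are hypothetical.

```c
#include <stddef.h>

/* Before: the exit test lives in the loop header and is evaluated at
   the top of every iteration.  */
int sum_while (const int *a, size_t n)
{
  int s = 0;
  size_t i = 0;
  while (i < n)
    {
      s += a[i];
      i++;
    }
  return s;
}

/* After header copying (hand-written equivalent): the test is
   duplicated as a guard in front of the loop, and the loop itself
   becomes a do-while with the exit test at the bottom.  */
int sum_do_while (const int *a, size_t n)
{
  int s = 0;
  size_t i = 0;
  if (i < n)
    {
      do
        {
          s += a[i];
          i++;
        }
      while (i < n);
    }
  return s;
}
```

The two functions compute the same result, including for n == 0, where the guard skips the loop entirely.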

This patch causes failure of the scan-tree-dump of dom2 in gcc.dg/tree-ssa/pr21417.c. 
This looks for jump-threading to perform an optimization, but no longer finds 
the expected line in the log - as the loop-header-copying phase has already done 
an equivalent transformation *before* dom2. The final CFG is thus in the desired 
form, but I'm not sure how to determine this (scanning the CFG itself is very 
difficult, well beyond what we can do with regex, requiring looking at multiple 
lines and basic blocks). Can anyone advise? [The test issue can be worked around 
by preserving the old do_while_p logic for the first header-copying pass, and 
using the new logic only for the second, but this is more awkward inside the 
compiler, which feels wrong.]

Besides the new vect-ifcvt-11.c, the testsuite actually has a couple of other 
examples where this patch enables (undesired!) vectorization. I've dealt with 
these, but for the record:
	* gcc.dg/vect/slp-perm-7.c: the initialization loop in main contained a check 
that input[i] < 200; this was already optimized out (because input[i] was set to 
i%256, where i<N with N #defined to 16), but that loop was not vectorized because:
/work/alalaw01/oban/srcfsf/gcc/gcc/testsuite/gcc.dg/vect/slp-perm-7.c:54:3: 
note: not vectorized: latch block not empty.
/work/alalaw01/oban/srcfsf/gcc/gcc/testsuite/gcc.dg/vect/slp-perm-7.c:54:3: 
note: bad loop form.

	* gcc.dg/vect/vect-strided-a-u16-i4.c: the main1() function has three loops; 
the first (initialization) has an 'if (y) abort() /* Avoid vectorization.  */'. 
However, the 'volatile int y = 0' that this was meant to reference is actually 
shadowed by a local non-volatile; the test is thus peeled off and absent from 
the body of the loop. The loop only avoided vectorization because of non-empty 
latch and bad loop form, as before.

With this patch, both those loops now have good form, hence I have fixed both to 
check a global volatile to prevent these extraneous parts from being vectorized.
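
The idiom and the shadowing pitfall can be sketched as follows. This is a simplified illustration, not the text of either test: the names guard, input, init_guarded and init_shadowed are hypothetical.

```c
#include <stdlib.h>

#define N 16

/* Global volatile: every read has to be performed, so the check
   cannot be removed from the loop body.  */
volatile int guard = 0;

unsigned int input[N];

/* Intended idiom: the volatile read keeps control flow in the loop.  */
void init_guarded (void)
{
  for (int i = 0; i < N; i++)
    {
      input[i] = i % 256;
      if (guard)
        abort ();
    }
}

/* Pitfall: a local, non-volatile variable of the same name shadows
   the global.  The compiler can see it is always zero, so it is free
   to drop the check, leaving a loop that may be vectorized.  */
void init_shadowed (void)
{
  int guard = 0;
  for (int i = 0; i < N; i++)
    {
      input[i] = i % 256;
      if (guard)
        abort ();
    }
}
```

Both functions fill input identically at run time; the difference is only in what the optimizers are allowed to assume about the check.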

Tested with bootstrap + check-gcc on x86_64 and AArch64 (linux). As noted above, 
this causes a spurious PASS->FAIL of a scan-tree-dump test, which I'm unsure how 
to fix, but no other regressions.

gcc/ChangeLog:

	* tree-pass.h (make_pass_ch_vect): New.
	* passes.def: Add pass_ch_vect just before pass_if_conversion.

	* tree-ssa-loop-ch.c (do_while_loop_p): For single-exit loops,
	look for blocks with exit edge and code after them.
	(pass_data_ch_vect, class pass_ch_vect, make_pass_ch_vect): New.
	(class pass_ch): Extend pass_ch_vect.
	(pass_ch::execute): Move all but loop_optimizer_init/finalize to...
	(pass_ch_vect::execute): ...here.

	* tree-ssa-loop.c (pass_tree_loop_init::execute): Add flags
	LOOPS_HAVE_PREHEADERS and LOOPS_HAVE_SIMPLE_LATCHES.

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/slp-perm-7.c (zero): New.
	(main): Test zero rather than input[i], to avoid vectorization.
	* gcc.dg/vect/vect-strided-a-u16-i4.c (main1): Narrow scope of x,y,z,w.
	* gcc.dg/vect/vect-ifcvt-11.c: New.
