PATCH: stop ivopts from pessimizing code (PR42505)

I spent a lot of time bumbling around in the ivopts cost model for PR42505 
before I finally realized that, even using its existing cost functions, the 
supposedly "optimized" solution it was coming up with had a higher cost metric 
than the original code for the test case in the PR.  Apparently its approach of 
using a heuristic to choose an initial solution and then jiggling it to try to 
find lower-cost alternatives is incapable of jiggling hard enough to find its 
way back to the original set of IV candidates.  So, the simple solution is just 
to compare what it comes up with against the cost of the original set, but 
having done that much it's not much more work to try jiggling the original set 
too to see if it can be optimized further.  That's what I've implemented in the 
attached patch.
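
As a rough illustration of the "try both starting points, keep the cheaper 
result" idea, here is a minimal standalone sketch.  It is not code from the 
patch: struct iv_set, strategy_fn, pick_cheapest and the two toy strategies 
are hypothetical names, and the cost values are made up.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an IV candidate set; only its cost matters here.  */
struct iv_set {
    const char *label;
    unsigned cost;
};

/* A search strategy: pick an initial set, improve it locally, return it.  */
typedef struct iv_set (*strategy_fn)(void);

/* Run every strategy and return the cheapest result.
   On a tie the earlier strategy wins.  */
static struct iv_set
pick_cheapest(const strategy_fn *strategies, size_t n)
{
    assert(strategies != NULL && n > 0);
    struct iv_set best = strategies[0]();
    for (size_t i = 1; i < n; i++) {
        struct iv_set s = strategies[i]();
        if (s.cost < best.cost)
            best = s;
    }
    return best;
}

/* Toy strategies with invented costs, for illustration only.  */
static struct iv_set from_heuristic(void)
{
    return (struct iv_set){ "heuristic", 17 };
}

static struct iv_set from_original_ivs(void)
{
    return (struct iv_set){ "original", 12 };
}
```

Passing { from_heuristic, from_original_ivs } to pick_cheapest returns the 
set seeded from the original IVs, since its (invented) cost is lower.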

The one change to the cost model that I found to be worthwhile was using 
call_used_regs rather than fixed_regs to count the number of registers available 
for holding loop variables.  E.g., on Thumb-1 there are really only 4 registers 
available, not 9, so ivopts was significantly underestimating register pressure 
in the test case filed with the PR.
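
To show why the choice of table changes the count, here is a small standalone 
sketch.  The 6-register machine and the toy_* arrays are invented for 
illustration (they only borrow the names of the real tables and do not 
describe Thumb-1 or any other target), and count_avail_regs is a hypothetical 
helper, not the function changed by the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Count the registers for which EXCLUDED[i] is false.  */
static unsigned
count_avail_regs(const bool *excluded, size_t n_regs)
{
    assert(excluded != NULL);
    unsigned avail = 0;
    for (size_t i = 0; i < n_regs; i++)
        if (!excluded[i])
            avail++;
    return avail;
}

/* Toy machine: register 5 is fixed; registers 0, 1 and 5 are call-used.
   These values are assumptions made up for this example.  */
enum { TOY_NUM_REGS = 6 };

static const bool toy_fixed_regs[TOY_NUM_REGS] =
    { false, false, false, false, false, true };

static const bool toy_call_used_regs[TOY_NUM_REGS] =
    { true, true, false, false, false, true };
```

On this toy machine, excluding only the fixed registers counts 5 available 
registers, while excluding the call-used ones counts 3, so the second choice 
reports noticeably higher register pressure for the same loop.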

I tested this patch with a full bootstrap and regression test on native 
x86-64.  I did some initial size benchmarking with CSiBE on both x86-64 and 
ARM, then some more detailed benchmarking on ARM for both code size and 
speed.  Here's a summary of the numbers.

Size benchmarks (smaller is better):
   x86-64:
     CSiBE	-0.22%
   ARMv5TE Thumb-1:
     CSiBE	-0.15%
     CoreMark	-2.7%
     eembc CINT	-0.1%
     spec2000	-0.1%
   ARMv7-a Thumb-2:
     CSiBE	-0.19%
     CoreMark	-5.7%
     eembc CINT	-0.4%
     spec2000	-0.3%

Speed benchmarks (bigger is better):
   ARMv5TE:
     CoreMark	-0.5%
     eembc CINT	-0.3%
     spec2000	-1.2%
   ARMv7-a:
     CoreMark	+1.3%
     eembc CINT	+0.6%
     spec2000	-0.3%

With only the change to check for pessimization and not the costs change, speed 
numbers went down on both ARM targets.  The costs change helped a lot more on 
ARMv7-a than it did on ARMv5TE.  So, perhaps there is still something wrong with 
the ivopts cost model, but perhaps this is just tripping over bugs somewhere 
else in the compiler.  I noted that on both ARMv5TE and ARMv7-a, some of the 
individual spec2000 and eembc benchmarks had quite large regressions (> 5%).  I 
spent a couple days on some further experiments with the cost model, but nothing 
I tried gave an obvious improvement.  My gut feeling now is that there are 
unnecessary spills being generated somewhere else and that there's no point in 
further tinkering with ivopts costs without some specific evidence or test cases 
that indicate that's where the problem is.

Anyway, I've about run out of time for working on this issue for now.  I think 
both changes in my current patch are abstractly improvements over the existing 
code, and from an experimental point of view at least the code size improvements 
are good even if the speed results are mixed.  So, OK to check in the patch as-is?

-Sandra

2010-06-18  Sandra Loosemore  <sandra@codesourcery.com>

	PR middle-end/42505

	gcc/
	* tree-ssa-loop-ivopts.c (determine_set_costs): Delete obsolete
	comments about cost model.
	(try_add_cand_for): Add second strategy for choosing initial set
	based on original IVs, controlled by ORIGINALP argument.
	(get_initial_solution): Add ORIGINALP argument.
	(find_optimal_iv_set_1): New function, split from find_optimal_iv_set.
	(find_optimal_iv_set): Try two different strategies for choosing
	the IV set, and return the one with lower cost.

	* cfgloopanal.c (init_set_costs): Use call_used_regs rather than
	fixed_regs to count number of registers available for loop variables.

	gcc/testsuite/
	* gcc.target/arm/pr42505.c: New test case.
