Message ID | 20141017182811.GA14499@f1.c.bardezibar.internal |
---|---|
State | New |
On Fri, 17 Oct 2014, Sebastian Pop wrote:

> Sebastian Pop wrote:
> > Richard Biener wrote:
> > > looks like
> > > RTL issues and/or IVOPTs issues?
> >
> > I should have posted the first diff between the compilers with -fdump-tree-all:
> > that would expose the problem at its root.
>
> Looks like this is caused by the fwprop pass:
>
> diff -u -r ./foo.i.087t.forwprop3 ../mas/foo.i.087t.forwprop3
> --- ./foo.i.087t.forwprop3 2014-10-17 13:17:29.985327000 -0500
> +++ ../mas/foo.i.087t.forwprop3 2014-10-17 13:17:29.308814000 -0500
> @@ -5,6 +5,8 @@
>  Pass statistics:
>  ----------------
>
> +Applying pattern match-comparison.pd:43, gimple-match.c:11747
> +gimple_simplified to if (i_20 != 99)
>
>  Pass statistics:
>  ----------------
> @@ -60,7 +62,7 @@
>    i_17 = i_20 + 1;
>    # DEBUG iD.2450 => i_17
>    # DEBUG iD.2450 => i_17
> -  if (i_17 != 100)
> +  if (i_20 != 99)
>      goto <bb 3>;
>    else
>      goto <bb 4>;

OK, so this is one effect of the thing Marc pointed out: currently no pattern (well, none but one) guards itself with a has_single_use predicate. That was a conscious decision, and the idea was that the caller should do this via its lattice valueization function, which could look like

  tree
  valueize (tree t)
  {
    if (TREE_CODE (t) == SSA_NAME
        && !has_single_use (t))
      return NULL_TREE;
    return t;
  }

But of course doing that unconditionally would also pessimize code. Generally we'd like to avoid un-CSEing stuff in a way that cannot be CSEd again. That's a more complex condition than what can be implemented with has_single_use. You might also consider a stmt doing a_1 + a_1, where a_1 has two uses now.

For Sebastian's case above the issue is that we are apparently bad at optimizing post-increment exit tests. But if you consider code like

  i_2 = i_1 + 1;
  b1_3 = i_2 < 100;
  b2_4 = i_2 > 50;
  if (b1_3 && b2_4)
    ...

then it is profitable to remove i_2 by changing the two comparisons to i_1 <= 98 and i_1 > 49.
I thought about doing all simplifications first without committing any simplified sequence to the IL, then scanning over the result and pruning out the cases that end up pessimizing code (how exactly isn't yet clear to me).

So I'm not sure what we want to do here now. I don't very much like doing things explicitly in the pattern description (nor using the "has_single_use" predicate). I suppose for the gimple_build () stuff we could restrict simplifications to the expression we are building (not simplifying with SSA defs in the IL), mimicking fold_buildN behavior more exactly. I suppose for forwprop we could use the above valueize hook (but then regress, because not all patterns as implemented in forwprop guard their def stmt lookup with has_single_use...).

Any opinion on this? Any idea of a "simple" cost function if you have the function's IL before and after simplifications (but without any DCE/CSE applied)?

Thanks,
Richard.
On 10/20/14 05:42, Richard Biener wrote:

> That was a conscious decision and the idea was that the caller should
> do this via its lattice valueization function which could look like
>
>   tree
>   valueize (tree t)
>   {
>     if (TREE_CODE (t) == SSA_NAME
>         && !has_single_use (t))
>       return NULL_TREE;
>     return t;
>   }
>
> But of course doing that unconditionally would also pessimize code.
> Generally we'd like to avoid un-CSEing stuff in a way that cannot
> be CSEd again. That's a more complex condition than what can be
> implemented with has_single_use. You might also consider a
> stmt doing a_1 + a_1 where a_1 has two uses now.

FWIW, I wouldn't worry much about the case of two uses in a single statement. I looked at that in RTL eons ago; it just doesn't happen often enough to bother trying to detect it and treat it as a single use.

>
> I thought about doing all simplifications first without committing
> any simplified sequence to the IL, then scanning over the result,
> pruning out cases that end up pessimizing code (how exactly isn't
> yet clear to me).
>
> So I'm not sure what we want to do here now. I don't very much like
> doing things explicitly in the pattern description (nor using the
> "has_single_use" predicate).
> I suppose for the gimple_build () stuff we could restrict simplifications
> to the expression we are building (not simplifying with SSA defs in the
> IL), more exactly mimicking fold_buildN behavior.
> I suppose for forwprop we could use the above valueize hook (but then
> regress because not all patterns as implemented in forwprop guard
> their def stmt lookup with has_single_use...).
>
> Any opinion on this? Any idea of a "simple" cost function if
> you have the function's IL before and after simplifications (but
> without any DCE/CSE applied)?

It's certainly ideal to be able to CSE/un-CSE depending on the final context, and it's a design goal I've heard other compiler developers state.
I.e., every early transformation that may be somewhat speculative must be "un-doable" later. But the infrastructure for that is, umm, hard.

The concept of simplifying on the side and then pruning out whatever isn't profitable is nice, but as you state, that's nontrivial as well.

In general, the has_single_use case is profitable. So we want to go after those aggressively, and I think we can commit those immediately and use the valueize function shown above. Maybe you then look at the more speculative cases...

jeff
diff -u -r ./foo.i.087t.forwprop3 ../mas/foo.i.087t.forwprop3
--- ./foo.i.087t.forwprop3 2014-10-17 13:17:29.985327000 -0500
+++ ../mas/foo.i.087t.forwprop3 2014-10-17 13:17:29.308814000 -0500
@@ -5,6 +5,8 @@
 Pass statistics:
 ----------------

+Applying pattern match-comparison.pd:43, gimple-match.c:11747
+gimple_simplified to if (i_20 != 99)

 Pass statistics:
 ----------------
@@ -60,7 +62,7 @@
   i_17 = i_20 + 1;
   # DEBUG iD.2450 => i_17
   # DEBUG iD.2450 => i_17
-  if (i_17 != 100)
+  if (i_20 != 99)
     goto <bb 3>;
   else
     goto <bb 4>;
[...]
diff -u -r ./foo.i.089t.ccp3 ../mas/foo.i.089t.ccp3
--- ./foo.i.089t.ccp3 2014-10-17 13:17:29.991734000 -0500
+++ ../mas/foo.i.089t.ccp3 2014-10-17 13:17:29.316140000 -0500
@@ -53,13 +53,13 @@
   # VUSE <.MEM_16>
   return;

-i_17 : -->2 uses.
+i_17 : --> single use.
 i_20 = PHI <i_17(3), 0(2)>
 # DEBUG i => i_17
-if (i_17 != 100)
 # DEBUG i => i_17

-i_20 : -->2 uses.
+i_20 : -->3 uses.
+if (i_20 != 99)
 i_17 = i_20 + 1;
 _4 = (long unsigned int) i_20;
 # DEBUG i => i_20