Message ID | 201805092341.w49Nfr4o030638@ignucius.se.axis.com |
---|---|
State | New |
Headers | show |
Series | Fix PR85726 (div-div suboptimization) and a rant on match.pd :s-flag | expand |
(not a review) On Thu, 10 May 2018, Hans-Peter Nilsson wrote: > Replacing a division feeding a division helps only when the > second division is the only user, and "fusing" the divisions is Well, that's not quite true. int x, y; void f(int n){ int c = 3 << 20; x = n / c; y = n / c; } Here we can optimize the last division to y = 0. After your patch, we likely need VRP to do that simplification. There are probably more complicated transformations this disables. > downright bad if another user of the result of first division is > a modulus of the same value as the second division, forming a > divmod pair. See the test-case, where for the tested > architectures (which all fail the test-case before the patch) > the div and mod are implemented using the high-part of a widened > multiplication and shift, emitted separately but combined as > late as in rtl, with the multiplicaton and shift re-used. That > of course does not happen if later passes see (y / 48; y % 3). > While in match.pd, I noticed the corresponding mul-mul match, > which I believe should be guarded the same way. Did you notice bad codegen because of the multiplication? You are only adding a test for divisions. I am asking because I know such a change will help some cases and hurt others... > Now a rant on the match.pd ":s" flag, which reasonable people > may reasonably suggest I should have used in the patch instead > of the (if (single_use ...)). > > Initially, I got the match.pd-language (which deserves a proper > name) all wrong. Then I read the documentation and still got it > wrong. I "misunderstood" that the ":s" on an operand "O" was > supposed to have the effect of conditionalize the replacement > "R" by wrapping it in "(if (single_use (O)) R)" as in the > suggested patch (above). To wit, this does not work; it will > *not* stop the replacement as seen in the test-case (THIS IS NOT > A SUGGESTED PATCH): > > (for div (trunc_div exact_div) > (simplify > - (div (div @0 INTEGER_CST@1) INTEGER_CST@2) > + (div (div:s @0 INTEGER_CST@1) INTEGER_CST@2) > (with { > bool overflow_p; > wide_int mul = wi::mul (wi::to_wide (@1), wi::to_wide (@2), > > In PR69556, it seems other people seem to have read the > documentation of ":s" the same way, but are corrected by other > comments there, so I guess it's not my reading that's flawed. > > I suggest preferably (1) correcting the semantics of ":s" to do > as the documentation says because I don't understand the > explanation in PR69556 comment #4 that the replacement "is still > allowed if it is a single operation as that replaces at least > one other (the one we are simplifying)"; I see that as a > complete nullification of the :s flag, making it a nop. > Alternatively (2), if the :s is *not* a nop in some case add an > example of that case and sufficient explanation to > match-and-simplify.texi *and* the suggested ":S" flag > (i.e. *really* conditionalize the replacement by a single-use of > that operand). To simplify, the goal of :s is to avoid increasing the number of instructions. Normally, the transformation output is smaller (or the same size but cheaper, simpler, more canonical) than the input. But if we can't get rid of some of the input intermediate results, the size may still increase. :s does test single_use, but it has a special case. If the output (possibly after several rounds of resimplifications) is at most one instruction, then it cannot be larger than the input (we are at least getting rid of the instruction we called the simplification on), so the transformation does happen even if !single_use. Originally this was only done when the simplification led to an SSA_NAME or a constant, IIRC. Then people start wanting single_use restrictions to reduce register pressure, reduce the size / number of constants, etc. And not all of those want exactly the same conditions. It is useful for high-level transformations to push the canonicalization as far as possible, to notice equivalent quantities or constant bounds in particular. So on a case by case basis, we use :s or single_use or whatever... If we use both y=x/3 and z=x/15 in the same function, should we make an effort to detect it and rewrite to z=y/5?
On Thu, May 10, 2018 at 10:33:39AM +0200, Marc Glisse wrote: > (not a review) > > On Thu, 10 May 2018, Hans-Peter Nilsson wrote: > > >Replacing a division feeding a division helps only when the > >second division is the only user, and "fusing" the divisions is > > Well, that's not quite true. > int x, y; > void f(int n){ > int c = 3 << 20; > x = n / c; > y = x / c; > } [ fixed the typo ] > Here we can optimize the last division to y = 0. After your patch, we > likely need VRP to do that simplification. There are probably more > complicated transformations this disables. Without the replacement we have two dependent divisions; with the replacement we have two independent divisions, and that is much faster on some (many?) systems (that can do two fixed-point divisions in parallel), even if things do not simplify further. Segher
On Thu, May 10, 2018 at 1:41 AM, Hans-Peter Nilsson <hans-peter.nilsson@axis.com> wrote: [...] With all the followups this generated I'm still going to reply here. > Now a rant on the match.pd ":s" flag, which reasonable people > may reasonably suggest I should have used in the patch instead > of the (if (single_use ...)). > > Initially, I got the match.pd-language (which deserves a proper > name) all wrong. Then I read the documentation and still got it > wrong. I "misunderstood" that the ":s" on an operand "O" was > supposed to have the effect of conditionalize the replacement > "R" by wrapping it in "(if (single_use (O)) R)" as in the > suggested patch (above). To wit, this does not work; it will > *not* stop the replacement as seen in the test-case (THIS IS NOT > A SUGGESTED PATCH): > > (for div (trunc_div exact_div) > (simplify > - (div (div @0 INTEGER_CST@1) INTEGER_CST@2) > + (div (div:s @0 INTEGER_CST@1) INTEGER_CST@2) > (with { > bool overflow_p; > wide_int mul = wi::mul (wi::to_wide (@1), wi::to_wide (@2), > > In PR69556, it seems other people seem to have read the > documentation of ":s" the same way, but are corrected by other > comments there, so I guess it's not my reading that's flawed. The documentation doesn't mention a few details, yes ... I will amend it. > I suggest preferably (1) correcting the semantics of ":s" to do > as the documentation says because I don't understand the > explanation in PR69556 comment #4 that the replacement "is still > allowed if it is a single operation as that replaces at least > one other (the one we are simplifying)"; I see that as a > complete nullification of the :s flag, making it a nop. It doesn't make it a nop in all cases. One motivation of the match.pd language was to allow pattern-matching and simplification in during value-numbering in an efficient way. So when I originally added :s it failed to detect equivalences it could otherwise handle, like computing that (x / 5) / 6 is equal to x / 30 because when one (rightfully so!) adds :s to the x / 5 then for the purpose of canonicalization (which is what value-numbering is interested in) the "simplification" no longer happens. So based on what our value-numbering implementation handles I restricted :s to be in effect only if the simplification causes extra expressions beside the toplevel one to be emitted. Marc gave good examples for the (rdiv (rdiv:s ...) ...) pattern. That's not perfect and we've added explicit single_use () checks in (too many) places. Most issues pop up with stmt-local foldings which indeed should better treat :s "literally". That's because we are lacking a stmt folder with a global cost model - if you have two expressions like tem = x/5; (tem / 7) and (tem / 8) then you _do_ want to simplify them to x / 35 and x / 40 because tem becomes dead. I suppose we need a way to make :s behave depending on context (value numbering or stmt folding). The current implementation doesn't make that too easy - in fact the current semantic is very simple in the implementation. I also think that if the result evaluates to a non-operation (constant or operand) we do not want to "honor" :s either. > Alternatively (2), if the :s is *not* a nop in some case add an > example of that case and sufficient explanation to > match-and-simplify.texi *and* the suggested ":S" flag > (i.e. *really* conditionalize the replacement by a single-use of > that operand). I didn't want to add :S. Beacuse it's not about :s or :S it's really about the context in which the patterns are used. In some sense :s should even be implicitely present on all expressions since originally we just moved fold-const.c patterns to match.pd and there's no such thing as multi-uses in GENERIC (ok, only if you ignore SAVE_EXPR). That said, without actually reviewing the patch, there's room for improvement. That is not adding :S. What it exactly is remains to be determined ;) Richard. > That's just a detail, I do like that language. > > brgds, H-P
On Fri, May 11, 2018 at 2:17 PM, Richard Biener <richard.guenther@gmail.com> wrote: > On Thu, May 10, 2018 at 1:41 AM, Hans-Peter Nilsson > <hans-peter.nilsson@axis.com> wrote: > > [...] > > With all the followups this generated I'm still going to reply here. > >> Now a rant on the match.pd ":s" flag, which reasonable people >> may reasonably suggest I should have used in the patch instead >> of the (if (single_use ...)). >> >> Initially, I got the match.pd-language (which deserves a proper >> name) all wrong. Then I read the documentation and still got it >> wrong. I "misunderstood" that the ":s" on an operand "O" was >> supposed to have the effect of conditionalize the replacement >> "R" by wrapping it in "(if (single_use (O)) R)" as in the >> suggested patch (above). To wit, this does not work; it will >> *not* stop the replacement as seen in the test-case (THIS IS NOT >> A SUGGESTED PATCH): >> >> (for div (trunc_div exact_div) >> (simplify >> - (div (div @0 INTEGER_CST@1) INTEGER_CST@2) >> + (div (div:s @0 INTEGER_CST@1) INTEGER_CST@2) >> (with { >> bool overflow_p; >> wide_int mul = wi::mul (wi::to_wide (@1), wi::to_wide (@2), >> >> In PR69556, it seems other people seem to have read the >> documentation of ":s" the same way, but are corrected by other >> comments there, so I guess it's not my reading that's flawed. > > The documentation doesn't mention a few details, yes ... I will > amend it. As follows, does this make things more clear? Btw, since your case is about not disrupting a div-mod pair a fix would be to expose that pairing earlier (very early actually...). We do expose it during the widen_mul pass which more or less runs right before RTL expansion only. Jakubs suggestion to explicitely check for a 2nd use in a modulo operation would also work but that gets a bit awkward given it requires immediate uses which are _not_ required to be present or up-to-date during the folding machinery (only SSA use->def chains are required to be valid). Thanks, Richard. diff --git a/gcc/doc/match-and-simplify.texi b/gcc/doc/match-and-simplify.texi index 6bc2c47bee7..024d747cafd 100644 --- a/gcc/doc/match-and-simplify.texi +++ b/gcc/doc/match-and-simplify.texi @@ -250,7 +250,9 @@ come second in commutative expressions. The second supported flag is @code{s} which tells the code generator to fail the pattern if the expression marked with -@code{s} does have more than one use. For example in +@code{s} does have more than one use and the simplification +results in an expression with more than one operator. +For example in @smallexample (simplify @@ -261,6 +263,14 @@ generator to fail the pattern if the expression marked with this avoids the association if @code{(pointer_plus @@0 @@1)} is used outside of the matched expression and thus it would stay live and not trivially removed by dead code elimination. +Now consider @code{((x + 3) + -3)} with the temporary +holding @code{(x + 3)} used elsewhere. This simplifies down +to @code{x} which is desirable and thus flagging with @code{s} +does not prevent the transform. Now consider @code{((x + 3) + 1)} +which simplifies to @code{(x + 4)}. Despite being flagged with +@code{s} the simplification will be performed. The +simplification of @code{((x + a) + 1)} to @code{(x + (a + 1))} will +not performed in this case though. More features exist to avoid too much repetition. >> I suggest preferably (1) correcting the semantics of ":s" to do >> as the documentation says because I don't understand the >> explanation in PR69556 comment #4 that the replacement "is still >> allowed if it is a single operation as that replaces at least >> one other (the one we are simplifying)"; I see that as a >> complete nullification of the :s flag, making it a nop. > > It doesn't make it a nop in all cases. One motivation of the > match.pd language was to allow pattern-matching and simplification > in during value-numbering in an efficient way. So when I originally > added :s it failed to detect equivalences it could otherwise handle, > like computing that (x / 5) / 6 is equal to x / 30 because when > one (rightfully so!) adds :s to the x / 5 then for the purpose of > canonicalization (which is what value-numbering is interested in) > the "simplification" no longer happens. So based on what > our value-numbering implementation handles I restricted :s to > be in effect only if the simplification causes extra expressions > beside the toplevel one to be emitted. Marc gave good examples > for the (rdiv (rdiv:s ...) ...) pattern. > > That's not perfect and we've added explicit single_use () checks > in (too many) places. > > Most issues pop up with stmt-local foldings which indeed should > better treat :s "literally". That's because we are lacking a stmt > folder with a global cost model - if you have two expressions like > tem = x/5; (tem / 7) and (tem / 8) then you _do_ want to simplify > them to x / 35 and x / 40 because tem becomes dead. > > I suppose we need a way to make :s behave depending on > context (value numbering or stmt folding). The current > implementation doesn't make that too easy - in fact the > current semantic is very simple in the implementation. > > I also think that if the result evaluates to a non-operation > (constant or operand) we do not want to "honor" :s either. > >> Alternatively (2), if the :s is *not* a nop in some case add an >> example of that case and sufficient explanation to >> match-and-simplify.texi *and* the suggested ":S" flag >> (i.e. *really* conditionalize the replacement by a single-use of >> that operand). > > I didn't want to add :S. Beacuse it's not about :s or :S it's really > about the context in which the patterns are used. In some > sense :s should even be implicitely present on all expressions > since originally we just moved fold-const.c patterns to match.pd > and there's no such thing as multi-uses in GENERIC (ok, only if you > ignore SAVE_EXPR). > > That said, without actually reviewing the patch, there's room for > improvement. That is not adding :S. What it exactly is remains > to be determined ;) > > Richard. > >> That's just a detail, I do like that language. >> >> brgds, H-P
--- /dev/null Tue Oct 29 15:57:07 2002 +++ gcc.dg/pr85726.c Tue May 8 07:18:19 2018 @@ -0,0 +1,18 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-rtl-fwprop1" } */ + +/* There should just be one mult-as-div sequence, re-used for the mod, + not one for each of the y / 3 and y % 3 (as in due to suboptimal + simplification as y / 48 and y % 3) and there should no be a trace of + the constant used for the 1 / 48 mult-as-div. */ +int ww, vv; +int x(int y) +{ + int z = y / 16; + int w = z / 3; + int v = z % 3; + ww = w; + return v; +} +/* { dg-final { scan-rtl-dump-times "truncate:SI .lshiftrt:DI .mult:DI .sign_extend:DI" 1 "fwprop1" { target mips*-*-* } } } */ +/* { dg-final { scan-rtl-dump-times " .0x2aaaaaab" 0 "fwprop1" } } */ diff --git a/gcc/match.pd b/gcc/match.pd index 0de4432..e8ebeac 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -278,11 +278,14 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) && TYPE_UNSIGNED (type)) (trunc_div @0 @1))) -/* Combine two successive divisions. Note that combining ceil_div - and floor_div is trickier and combining round_div even more so. */ +/* Combine two successive divisions, if the second is the only + user of the result of the first, or else we'll just end up + with two divisions anyway. Note that combining ceil_div and + floor_div is trickier and combining round_div even more so. */ (for div (trunc_div exact_div) (simplify - (div (div @0 INTEGER_CST@1) INTEGER_CST@2) + (div (div@3 @0 INTEGER_CST@1) INTEGER_CST@2) + (if (single_use (@3)) (with { bool overflow_p; wide_int mul = wi::mul (wi::to_wide (@1), wi::to_wide (@2), @@ -292,12 +295,13 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) (div @0 { wide_int_to_tree (type, mul); }) (if (TYPE_UNSIGNED (type) || mul != wi::min_value (TYPE_PRECISION (type), SIGNED)) - { build_zero_cst (type); }))))) + { build_zero_cst (type); })))))) /* Combine successive multiplications. Similar to above, but handling overflow is different. */ (simplify - (mult (mult @0 INTEGER_CST@1) INTEGER_CST@2) + (mult (mult@3 @0 INTEGER_CST@1) INTEGER_CST@2) + (if (single_use (@3)) (with { bool overflow_p; wide_int mul = wi::mul (wi::to_wide (@1), wi::to_wide (@2), @@ -306,7 +310,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) /* Skip folding on overflow: the only special case is @1 * @2 == -INT_MIN, otherwise undefined overflow implies that @0 must be zero. */ (if (!overflow_p || TYPE_OVERFLOW_WRAPS (type)) - (mult @0 { wide_int_to_tree (type, mul); })))) + (mult @0 { wide_int_to_tree (type, mul); }))))) /* Optimize A / A to 1.0 if we don't care about NaNs or Infinities. */