Message ID | 202007060201.06621sH0032715@ignucius.se.axis.com |
---|---|
State | New |
Headers | show |
Series | RFC: make combine do as advertised (cheaper-than)? | expand |
On Mon, Jul 6, 2020 at 4:03 AM Hans-Peter Nilsson via Gcc-patches <gcc-patches@gcc.gnu.org> wrote: > > Most comments, including the second sentence in the head comment > of combine_validate_cost, the main decision-maker of the combine > pass, refer to the function as returning true if the new > insns(s) *cheaper* than the old insns, when in fact the function > returned true also if the cost was the same. Returning true for > cheaper also seems more sane than as-cheap-as considering the > need to avoid oscillation between same-cost combinations. Also, > it makes the job of later passes harder, having combine make > more complex combinations of the same cost. > > Right, you can affect this with your target TARGET_RTX_COSTS and > TARGET_INSN_COST hooks, but only for trivial cases, and you have > increasingly more complex combinations (many-to-many > combinations) where you have to twist simple logic to appease > combine (stop it from combining) or give up. > > Main-interest ports are unsurprisingly pretty tied to this > effect. I'd love to install the following patch, adjusting the > function and the two opposing comments. But...it causes > hundreds of regressions for each of x86_64-linux and > aarch64-linux (tens for ppc64le-linux), so I'm just not up to > the task, at least not without previous buy-in from reviewers. > > It would need those targets to have their TARGET_INSN_COST > and/or TARGET_RTX_COSTS functions adjusted. > > Alternatives from the top of my head, one of: > > - With buy-in from global reviewers, installing this patch on a > development branch and let all target maintainers adjust their > target test-cases and cost-functions there, for merge when > first-class targets are done. (I'm a dreamer.) > > - A target combine hook for the decision (passing for inspection > tuples of from-insns and to-insns and costs) and just falling > back to the current addition of rtx costs. > > - A simpler target combine decision hook that says which one of > "cheaper" or "as-cheap-as". A target combine decision hook that on old == new cost decides between both and given the actual insns? Might lead to quite hackish target hook implementations... > - Adjusting documentation and comments that are currently > untruthful about the cost decision to instead say (to the effect > of) "as cheap as" instead of "cheaper". Well, adjusting the function comment to reflect reality is kind-of obvious ;) > So, WDYT? > > (Tested as above, causing massive pattern-match regressions.) > > gcc: > * combine.c (combine_validate_cost): Reject unless the new total > cost is cheaper than the original. Adjust the minority of > comments that don't say "cheaper": > > --- > gcc/combine.c | 8 ++++---- > 1 file changed, 4 insertions(+), 4 deletions(-) > > diff --git a/gcc/combine.c b/gcc/combine.c > index f69413a..7da144e 100644 > --- a/gcc/combine.c > +++ b/gcc/combine.c > @@ -846,8 +846,8 @@ do_SUBST_LINK (struct insn_link **into, struct insn_link *newval) > than the original sequence I0, I1, I2, I3 and undobuf.other_insn. Note > that I0, I1 and/or NEWI2PAT may be NULL_RTX. Similarly, NEWOTHERPAT and > undobuf.other_insn may also both be NULL_RTX. Return false if the cost > - of all the instructions can be estimated and the replacements are more > - expensive than the original sequence. */ > + of all the instructions can be estimated and the replacements are not > + cheaper than the original sequence. */ > > static bool > combine_validate_cost (rtx_insn *i0, rtx_insn *i1, rtx_insn *i2, rtx_insn *i3, > @@ -938,8 +938,8 @@ combine_validate_cost (rtx_insn *i0, rtx_insn *i1, rtx_insn *i2, rtx_insn *i3, > } > > /* Disallow this combination if both new_cost and old_cost are greater than > - zero, and new_cost is greater than old cost. */ > - int reject = old_cost > 0 && new_cost > old_cost; > + zero, and new_cost is greater than or equal to the old cost. */ > + int reject = old_cost > 0 && new_cost >= old_cost; > > if (dump_file) > { > -- > 2.11.0
Hans-Peter Nilsson via Gcc-patches <gcc-patches@gcc.gnu.org> writes: > Most comments, including the second sentence in the head comment > of combine_validate_cost, the main decision-maker of the combine > pass, refer to the function as returning true if the new > insns(s) *cheaper* than the old insns, when in fact the function > returned true also if the cost was the same. Returning true for > cheaper also seems more sane than as-cheap-as considering the > need to avoid oscillation between same-cost combinations. Also, > it makes the job of later passes harder, having combine make > more complex combinations of the same cost. > > Right, you can affect this with your target TARGET_RTX_COSTS and > TARGET_INSN_COST hooks, but only for trivial cases, and you have > increasingly more complex combinations (many-to-many > combinations) where you have to twist simple logic to appease > combine (stop it from combining) or give up. > > Main-interest ports are unsurprisingly pretty tied to this > effect. I'd love to install the following patch, adjusting the > function and the two opposing comments. But...it causes > hundreds of regressions for each of x86_64-linux and > aarch64-linux (tens for ppc64le-linux), so I'm just not up to > the task, at least not without previous buy-in from reviewers. Out of interest, how do the results change if we still allow the combination for equal costs if the new sequence is shorter than the original? I think that still counts as “cheaper than”, but maybe I'm too RISC-centric. ;-) (I'm not saying that's what we should do, I'm just curious.) Originally combine always produced shorter sequences, so by the above reasoning, combining for == was correct. These days we allow N-to-N replacements too, which are obviously a good thing when the new cost is lower, but are more of a wash when the costs are the same. But even then, the combination should have a “canonicalisation” effect. (Unfortunately that effect includes the result of expand_compound_operation/make_compound_operation.) Do you have specific examples of where things go wrong? Thanks, Richard
Hi! On Mon, Jul 06, 2020 at 10:48:25AM +0100, Richard Sandiford wrote: > Originally combine always produced shorter sequences, so by the (Shorter in # insns, not # bytes). > above reasoning, combining for == was correct. These days we allow > N-to-N replacements too, which are obviously a good thing when Only 2-2 so far (not 1-1 yet, there are some target problems to overcome first). > the new cost is lower, but are more of a wash when the costs > are the same. But even then, the combination should have a > “canonicalisation” effect. (Unfortunately that effect includes > the result of expand_compound_operation/make_compound_operation.) 2-2 is always reducing latency if the costs are equal (and sane ;-) ), that is a large part of what makes 2-2 combinations useful. Originally the output of i2 is input to i3, but not anymore in the new insns. Segher
Hi! On Mon, Jul 06, 2020 at 04:01:54AM +0200, Hans-Peter Nilsson via Gcc-patches wrote: > Most comments, including the second sentence in the head comment > of combine_validate_cost, the main decision-maker of the combine > pass, refer to the function as returning true if the new > insns(s) *cheaper* than the old insns, when in fact the function > returned true also if the cost was the same. Returning true for > cheaper also seems more sane than as-cheap-as considering the > need to avoid oscillation between same-cost combinations. Also, > it makes the job of later passes harder, having combine make > more complex combinations of the same cost. All of this was added in https://gcc.gnu.org/g:64b8935d4809 , which also adds the + /* Disallow this recombination if both new_cost and old_cost are + greater than zero, and new_cost is greater than old cost. */ comment (which is what it actually does). Before that change, combine didn't look at costs at all. Combine never has used a "strictly smaller than" cost thing. > Right, you can affect this with your target TARGET_RTX_COSTS and > TARGET_INSN_COST hooks, but only for trivial cases, and you have > increasingly more complex combinations (many-to-many > combinations) where you have to twist simple logic to appease > combine (stop it from combining) or give up. There are 2-1, 3-1, 4-1, 3-2, 4-2, which all reduce the number of insns, and then there is 2-2, which gets rid of one log_link. If the isnn_cost stays the same, it always wins something else. > Alternatives from the top of my head, one of: ... 5) Improve your target so that its insn_cost reflects ithe costs of the insns better. Can you share some typical examples where things are worse with the current behaviour? Segher
Hi! On Mon, Aug 10, 2020 at 05:47:53PM +0200, Hans-Peter Nilsson wrote: > > All of this was added in https://gcc.gnu.org/g:64b8935d4809 , which also > > adds the > > > > + /* Disallow this recombination if both new_cost and old_cost are > > + greater than zero, and new_cost is greater than old cost. */ > > > > comment (which is what it actually does). Before that change, combine > > didn't look at costs at all. > > > > Combine never has used a "strictly smaller than" cost thing. > > (I suggest we use terms of generally greater/lower cost, instead > of smaller/greater cost, to avoid confusion whether "smaller" > refers to the general cost or just code size. It refers to neither. It refers to the concept of "cost" in combine, which is a number (and some extra conditions). Combine is a local optimisation ("peephole", if you want), so it uses a local cost function. And yes, that cost is calculated differently if you use -Os. The is because it uses insn_cost to base on. As it is a number, I (and most other people) use "smaller than". > > > Right, you can affect this with your target TARGET_RTX_COSTS and > > > TARGET_INSN_COST hooks, but only for trivial cases, and you have > > > increasingly more complex combinations (many-to-many > > > combinations) where you have to twist simple logic to appease > > > combine (stop it from combining) or give up. > > > > There are 2-1, 3-1, 4-1, 3-2, 4-2, which all reduce the number of insns, > > and then there is 2-2, which gets rid of one log_link. If the isnn_cost > > stays the same, it always wins something else. > > That "something else" is presumptious. I wrote this code, I am maintainer of combine, I really do know. This is not presumptuous. This is important to make sure combine never can get into an infinite loop! > A longer > dependency-chain may sum up to faster execution considering > parallel or OoO execution. Someone adding another N-M > combination will notice, after two more years of work. :) Combine knows nothing about how any specific CPU will actually execute the code. It sees just a tiny fraction of code at any one time, anyway. All it does is optimise for the lowest cost. Whether that actually performs best, is something the backend writer has to take care of. (insn_cost is only the secondary "score", anyway: the number of RTL insns is primary). > > 5) Improve your target so that its insn_cost reflects ithe costs of > > the insns better. > > I see cheap gnawing, I hope I didn't add any of that myself. I guess we both did. Sorry. > I already covered target costs before the 1..4 list and you > actually quoted that (the quoted paragraph above your log_links > comment). Let me write this differently, then: 5) Write your target cost callbacks so that you work *with* combine, not *against* it. This gives better results. > To recap, it all began with the observation of comments in > combine.c saying "cheaper than", with the code implementing > "same or cheaper", together with the flaw fixed by 84c5396d4bdbf > which I belive is a sufficient conclusion for that specific > instance. A lower cost also seems more consistent with other > decisions. (BTW, the commit is a bit misleading; I believe all > "case #2" cc0 convertees will benefit from an accurate > is_just_move, not just CRIS.) It makes a difference on many other targets, but it never results in different code there. I wonder why not (or, why it does for cris). > The fixed 2-2 case is all the "typical examples" I had so far. > Remaining suggestions are just fixing perceived inconsistencies. All the other cases reduce the number of RTL insns, which is a clear cost metric (and originally the only one!) If you have RTL insns that correspond to more than just single machine insns, this can hurt you; otherwise, there are a few cases where you want to trick combine, but not many (doing that tends to backfire anyway, better keep it to a minimum). Segher
diff --git a/gcc/combine.c b/gcc/combine.c index f69413a..7da144e 100644 --- a/gcc/combine.c +++ b/gcc/combine.c @@ -846,8 +846,8 @@ do_SUBST_LINK (struct insn_link **into, struct insn_link *newval) than the original sequence I0, I1, I2, I3 and undobuf.other_insn. Note that I0, I1 and/or NEWI2PAT may be NULL_RTX. Similarly, NEWOTHERPAT and undobuf.other_insn may also both be NULL_RTX. Return false if the cost - of all the instructions can be estimated and the replacements are more - expensive than the original sequence. */ + of all the instructions can be estimated and the replacements are not + cheaper than the original sequence. */ static bool combine_validate_cost (rtx_insn *i0, rtx_insn *i1, rtx_insn *i2, rtx_insn *i3, @@ -938,8 +938,8 @@ combine_validate_cost (rtx_insn *i0, rtx_insn *i1, rtx_insn *i2, rtx_insn *i3, } /* Disallow this combination if both new_cost and old_cost are greater than - zero, and new_cost is greater than old cost. */ - int reject = old_cost > 0 && new_cost > old_cost; + zero, and new_cost is greater than or equal to the old cost. */ + int reject = old_cost > 0 && new_cost >= old_cost; if (dump_file) {