Message ID | 202002120647.01C6l4Zi026181@ignucius.se.axis.com |
---|---|
Series | A set of compare-elimination-fixes. |
> I just rebased and updated the vendors/axis branch
> axis/cris-decc0 with the following commits, which should bring
> back compare-elimination results to that of cc0 on master.

Nice work! An example of transition done properly...

> With the exception of the bit-test patterns (btst / btstq which
> is more of a "combine" matter), everything is centered around
> working together with the "cmpelim" pass with the help of
> define_subst attributes.

Feel free to further tweak the cmpelim pass if need be; I did it for the Visium so there is a precedent. :-)

> No performance tests yet though, but I expect axis/cris-decc0 to
> be a win over master, since as I've mentioned before, I see
> improvements in register-allocation already in libgcc, which
> should get back what's lost in all the special patterns I
> deleted. I haven't looked into the cause, but it shouldn't
> surprise anyone that there's some noticeable goodies inside
> something to the effect of #ifndef HAVE_cc0, even with IRA.
> (Conversion to LRA is way down on the TODO list.)

For the Visium, the transition was a win overall, with sporadic regressions in delay slot filling. The transition to LRA looks more problematic.
On Wed, 2020-02-12 at 07:47 +0100, Hans-Peter Nilsson wrote:
> I just rebased and updated the vendors/axis branch
> axis/cris-decc0 with the following commits, which should bring
> back compare-elimination results to that of cc0 on master.
>
> With the exception of the bit-test patterns (btst / btstq which
> is more of a "combine" matter), everything is centered around
> working together with the "cmpelim" pass with the help of
> define_subst attributes. Regression test-cases have already
> been committed to master (the recently committed pr93372-*
> tests), covering all patterns but not all CCmodes or conditions.
> All patches regtested for cris-elf, at a smaller granularity
> than these partially squashed commits, but naturally with
> regressions for the pr93372-* testcases until the last one of
> these commits.
>
> No performance tests yet though, but I expect axis/cris-decc0 to
> be a win over master, since as I've mentioned before, I see
> improvements in register-allocation already in libgcc, which
> should get back what's lost in all the special patterns I
> deleted. I haven't looked into the cause, but it shouldn't
> surprise anyone that there's some noticeable goodies inside
> something to the effect of #ifndef HAVE_cc0, even with IRA.
> (Conversion to LRA is way down on the TODO list.)
>
> It's a bit unfortunate that so many pattern names are now
> obfuscated with the define_subst_attr attributes (like
> "<acc><anz><anzvc>zero_extend<mode>si2<setcc><setnz><setnzvc>"
> instead of "zero_extend<mode>si2"), but I'll take that single
> line change in patterns over duplicated or triplicated patterns.

FWIW, I'm evaluating the converted H8 on/off. In general it looks to be a wash there. There are a few cases where we're doing better, possibly because I've actually improved the precision of condition code tracking in various patterns and done some other simplifications along the way.

The H8 is a type-2 port. It's easiest to think of it as everything clobbering the condition codes, even most moves. The H8 also doesn't perform variable or multi-position shifts -- and the shifting patterns sometimes need scratch registers. So at expand time we inject a (clobber (match_scratch ...)) expression. Then post-reload splitting adds the clobber of the condition code register, resulting in two clobbers on all the shift insns. As it turns out, cmp-elim won't handle that.

So I improved some of the H8 expanders to generate simpler RTL when we know at expansion time that the scratch won't be needed. That's allowing cmp-elim to do a reasonable job exploiting the condition codes set by the shift/rotate insns. But it also allows fwprop to make a trivial improvement on some tests. That trivial improvement from fwprop in turn hinders combine and can occasionally cause us to fail to narrow certain shift constructs from HI to QI modes.

Anyway, I mostly mention it because of the multi-clobber problem. If your CC_REG clobbering insn has more clobbers than just the CC register, then it won't be used to eliminate comparisons.

Jeff
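
To make the multi-clobber point concrete, here is a rough sketch of the two insn shapes being contrasted, written as schematic machine-description RTL. The modes, predicates, constraints, and operand numbers are assumptions made up for illustration and are not copied from the H8 port; only the number of clobbers in each parallel matters.

```
;; Illustrative only: modes, predicates, constraints and operand
;; numbers are hypothetical, not taken from any real port.

;; One set plus a single clobber of the CC register.
;; This is the shape cmp-elim can work with.
(parallel [(set (match_operand:HI 0 "register_operand" "=r")
                (ashift:HI (match_operand:HI 1 "register_operand" "0")
                           (const_int 1)))
           (clobber (reg:CC CC_REG))])

;; Scratch clobber injected at expand time, plus the CC clobber
;; added by the post-reload split: two clobbers, so per the
;; description above the insn is not used to eliminate a compare.
(parallel [(set (match_operand:HI 0 "register_operand" "=r")
                (ashift:HI (match_operand:HI 1 "register_operand" "0")
                           (match_operand:QI 2 "nonmemory_operand" "rn")))
           (clobber (match_scratch:QI 3 "=&r"))
           (clobber (reg:CC CC_REG))])
```

The expander change described above amounts to emitting the first shape whenever it is already known at expansion time that no scratch register will be needed.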