Message ID | 4DBEC003.4050300@gjlay.de |
---|---|
State | New |
Headers | show |
2011/5/2 Georg-Johann Lay <avr@gjlay.de>: > This is a fix for an optimization flaw when a long value is composed > from byte values. > > For -fsplit-wide-types (which is still default for avr) the code is > worse than with -fno-split-wide-types. The code for the test case is > better in either situations, i.e. compared to code without the patch, > but it is still not optimal. > > Fixing this by some combine patterns is the only thing the BE can do. > I did not write more complex patterns because things get too complex > with little performance gain. > > Tested without regressions. > > Johann > > 2011-05-02 Georg-Johann Lay <avr@gjlay.de> > > PR target/27663 > * config/avr/predicates.md (const_8_16_24_operand): New predicate. > * config/avr/avr.md ("*ior<mode>qi.byte0", > "*ior<mode>qi.byte1-3"): New define_insn_and_split patterns. > I'm sorry, but I dot'n like to have a both combiner related patches in port because code improvement isn't much and your patterns are difficult to understand and maintain. May be somebody else have a different oppinion ? I'm open to discussion. Denis.
Denis Chertykov schrieb: > 2011/5/2 Georg-Johann Lay <avr@gjlay.de>: >> This is a fix for an optimization flaw when a long value is composed >> from byte values. >> >> For -fsplit-wide-types (which is still default for avr) the code is >> worse than with -fno-split-wide-types. The code for the test case is >> better in either situations, i.e. compared to code without the patch, >> but it is still not optimal. >> >> Fixing this by some combine patterns is the only thing the BE can do. >> I did not write more complex patterns because things get too complex >> with little performance gain. >> >> Tested without regressions. >> >> Johann >> >> 2011-05-02 Georg-Johann Lay <avr@gjlay.de> >> >> PR target/27663 >> * config/avr/predicates.md (const_8_16_24_operand): New predicate. >> * config/avr/avr.md ("*ior<mode>qi.byte0", >> "*ior<mode>qi.byte1-3"): New define_insn_and_split patterns. >> > > I'm sorry, but I dot'n like to have a both combiner related patches in > port because code improvement isn't much and your patterns are > difficult to understand and maintain. You refer to this patch for PR42210? http://gcc.gnu.org/ml/gcc-patches/2011-04/msg02099.html > May be somebody else have a different oppinion ? > I'm open to discussion. The patterns in this patch are similar to "*addhi3_zero_extend", "*addhi3_zero_extend1" that handle HI+QI resp. "*addhi3_zero_extend" that handle SI+QI. The difference is that they handle IOR instead of PLUS. It's true that the user has to use some specific code (addition of QI to HI resp. SI in the first case and ORing of QI to HI resp. SI in the second). IMO insn combine is a very powerful pass and I do not see why the avr BE should not take advantage of it to synthesize new instructions. Note that other parts like "*sbi" or "*cbi" rely on insn combine, too. If it's hard to understand what their intention is, I can add some more comments. As insn combine is capable of generating new instructions that are not covered by standard patterns, it is only natural that they might be more complicated than standard patterns. But almost everything in GCC is complicated, even in the avr BE stuff like, e.g. handling of rotate, is way much more complicated. The new patterns are restricted to one single place in the backend. If they are correct, they are supposed to work in the future without steadily maintaining them. I agree that it would be nice if the middleend detected the expressions as, say, (set (zero_extract:QI (reg:SI ...))), but that's not the case; not even on 32-bit targets with full insv/extzv support. And as I already wrote, the -fsplit-wide-types is not a good choice on avr (except for 64-bit stuff where subreg lowering leads to much code), see http://gcc.gnu.org/ml/gcc/2011-03/msg00261.html So with -fno-split-wide-types and some more elaborate testcase you will see that the new patterns are a clear improvement. Johann > > Denis. >
Denis Chertykov schrieb: > 2011/5/2 Georg-Johann Lay <avr@gjlay.de>: > >>This is a fix for an optimization flaw when a long value is composed >>from byte values. >> >>For -fsplit-wide-types (which is still default for avr) the code is >>worse than with -fno-split-wide-types. The code for the test case is >>better in either situations, i.e. compared to code without the patch, >>but it is still not optimal. >> >>Fixing this by some combine patterns is the only thing the BE can do. >>I did not write more complex patterns because things get too complex >>with little performance gain. >> >>Tested without regressions. >> >>Johann >> >>2011-05-02 Georg-Johann Lay <avr@gjlay.de> >> >> PR target/27663 >> * config/avr/predicates.md (const_8_16_24_operand): New predicate. >> * config/avr/avr.md ("*ior<mode>qi.byte0", >> "*ior<mode>qi.byte1-3"): New define_insn_and_split patterns. >> > I'm sorry, but I dot'n like to have a both combiner related patches in > port because code improvement isn't much and your patterns are > difficult to understand and maintain. > > May be somebody else have a different oppinion ? > I'm open to discussion. > > Denis. Let me add that the patch is generic enough to also improve ORing HI against QI like described in http://gcc.gnu.org/PR41076 which is not uncommon on avr. Johann
2011/5/11 Georg-Johann Lay <avr@gjlay.de>: > Denis Chertykov schrieb: >> >> 2011/5/2 Georg-Johann Lay <avr@gjlay.de>: >> >>> This is a fix for an optimization flaw when a long value is composed >>> from byte values. >>> >>> For -fsplit-wide-types (which is still default for avr) the code is >>> worse than with -fno-split-wide-types. The code for the test case is >>> better in either situations, i.e. compared to code without the patch, >>> but it is still not optimal. >>> >>> Fixing this by some combine patterns is the only thing the BE can do. >>> I did not write more complex patterns because things get too complex >>> with little performance gain. >>> >>> Tested without regressions. >>> >>> Johann >>> >>> 2011-05-02 Georg-Johann Lay <avr@gjlay.de> >>> >>> PR target/27663 >>> * config/avr/predicates.md (const_8_16_24_operand): New predicate. >>> * config/avr/avr.md ("*ior<mode>qi.byte0", >>> "*ior<mode>qi.byte1-3"): New define_insn_and_split patterns. >>> >> I'm sorry, but I dot'n like to have a both combiner related patches in >> port because code improvement isn't much and your patterns are >> difficult to understand and maintain. >> >> May be somebody else have a different oppinion ? >> I'm open to discussion. >> >> Denis. > > Let me add that the patch is generic enough to also improve ORing HI against > QI like described in > > http://gcc.gnu.org/PR41076 > > which is not uncommon on avr. You right here. Please, apply the patch. Denis.
Denis Chertykov schrieb: > 2011/5/11 Georg-Johann Lay <avr@gjlay.de>: >> Denis Chertykov schrieb: >>> 2011/5/2 Georg-Johann Lay <avr@gjlay.de>: >>> >>>> This is a fix for an optimization flaw when a long value is composed >>>> from byte values. >>>> >>>> For -fsplit-wide-types (which is still default for avr) the code is >>>> worse than with -fno-split-wide-types. The code for the test case is >>>> better in either situations, i.e. compared to code without the patch, >>>> but it is still not optimal. >>>> >>>> Fixing this by some combine patterns is the only thing the BE can do. >>>> I did not write more complex patterns because things get too complex >>>> with little performance gain. >>>> >>>> Tested without regressions. >>>> >>>> Johann >>>> >>>> 2011-05-02 Georg-Johann Lay <avr@gjlay.de> >>>> >>>> PR target/27663 >>>> * config/avr/predicates.md (const_8_16_24_operand): New predicate. >>>> * config/avr/avr.md ("*ior<mode>qi.byte0", >>>> "*ior<mode>qi.byte1-3"): New define_insn_and_split patterns. >>>> >>> I'm sorry, but I dot'n like to have a both combiner related patches in >>> port because code improvement isn't much and your patterns are >>> difficult to understand and maintain. >>> >>> May be somebody else have a different oppinion ? >>> I'm open to discussion. >>> >>> Denis. >> Let me add that the patch is generic enough to also improve ORing HI against >> QI like described in >> >> http://gcc.gnu.org/PR41076 >> >> which is not uncommon on avr. > > You right here. > Please, apply the patch. Applied the current version of the patch: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=173792 There are still situations that could be handled easily by avr like: [1]: Two instructions (set (reg:HI 50) (ior:HI (ashift:HI (zero_extend:HI (reg:QI 52)) (const_int 8)) (zero_extend:HI (reg:QI 55)))) [2]: Two instructions (set (reg:HI 50) (ior:HI (ashift:HI (reg:HI 51) (const_int)) (zero_extend:HI (reg:QI 55)))) [3]: One Instruction (set (reg:HI 50) (ior:HI (ashift:HI (reg:HI 51) (const_int 8)) (reg:HI 54))) If you agree, I make a patch to cover these cases, too. Just for HI, without mode iterators so that they are easier to understand. IMO it's preferred to do some pre-reload transforms instead of trying to catch such situations in peepholes or let the optimization opportunity pass by without making use of it. Unfortunately, there is no standard form the middle end tries; it differs depending on if a value is in memory or in a register and of -fsplit-wide-types is on or not. The patterns may seem complicated, but it's just what's going on... Johann > > Denis. >
2011/5/16 Georg-Johann Lay <avr@gjlay.de>: > Denis Chertykov schrieb: >> 2011/5/11 Georg-Johann Lay <avr@gjlay.de>: >>> Denis Chertykov schrieb: >>>> 2011/5/2 Georg-Johann Lay <avr@gjlay.de>: >>>> >>>>> This is a fix for an optimization flaw when a long value is composed >>>>> from byte values. >>>>> >>>>> For -fsplit-wide-types (which is still default for avr) the code is >>>>> worse than with -fno-split-wide-types. The code for the test case is >>>>> better in either situations, i.e. compared to code without the patch, >>>>> but it is still not optimal. >>>>> >>>>> Fixing this by some combine patterns is the only thing the BE can do. >>>>> I did not write more complex patterns because things get too complex >>>>> with little performance gain. >>>>> >>>>> Tested without regressions. >>>>> >>>>> Johann >>>>> >>>>> 2011-05-02 Georg-Johann Lay <avr@gjlay.de> >>>>> >>>>> PR target/27663 >>>>> * config/avr/predicates.md (const_8_16_24_operand): New predicate. >>>>> * config/avr/avr.md ("*ior<mode>qi.byte0", >>>>> "*ior<mode>qi.byte1-3"): New define_insn_and_split patterns. >>>>> >>>> I'm sorry, but I dot'n like to have a both combiner related patches in >>>> port because code improvement isn't much and your patterns are >>>> difficult to understand and maintain. >>>> >>>> May be somebody else have a different oppinion ? >>>> I'm open to discussion. >>>> >>>> Denis. >>> Let me add that the patch is generic enough to also improve ORing HI against >>> QI like described in >>> >>> http://gcc.gnu.org/PR41076 >>> >>> which is not uncommon on avr. >> >> You right here. >> Please, apply the patch. > > Applied the current version of the patch: > > http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=173792 > > There are still situations that could be handled easily by avr like: > > [1]: Two instructions > (set (reg:HI 50) > (ior:HI (ashift:HI (zero_extend:HI (reg:QI 52)) > (const_int 8)) > (zero_extend:HI (reg:QI 55)))) > > [2]: Two instructions > (set (reg:HI 50) > (ior:HI (ashift:HI (reg:HI 51) > (const_int)) > (zero_extend:HI (reg:QI 55)))) > > [3]: One Instruction > (set (reg:HI 50) > (ior:HI (ashift:HI (reg:HI 51) > (const_int 8)) > (reg:HI 54))) > > If you agree, I make a patch to cover these cases, too. Just for HI, > without mode iterators so that they are easier to understand. > > IMO it's preferred to do some pre-reload transforms instead of trying > to catch such situations in peepholes or let the optimization > opportunity pass by without making use of it. > > Unfortunately, there is no standard form the middle end tries; it > differs depending on if a value is in memory or in a register and of > -fsplit-wide-types is on or not. > > The patterns may seem complicated, but it's just what's going on... Please, provide the patch. Denis.
Index: config/avr/predicates.md =================================================================== --- config/avr/predicates.md (Revision 172902) +++ config/avr/predicates.md (Arbeitskopie) @@ -138,3 +138,10 @@ (define_predicate "call_insn_operand" (define_predicate "pseudo_register_operand" (and (match_code "reg") (match_test "!HARD_REGISTER_P (op)"))) + +;; Return true if OP is a constant integer that is either +;; 8 or 16 or 24. +(define_predicate "const_8_16_24_operand" + (and (match_code "const_int") + (match_test "8 == INTVAL(op) || 16 == INTVAL(op) || 24 == INTVAL(op)"))) + Index: config/avr/avr.md =================================================================== --- config/avr/avr.md (Revision 172902) +++ config/avr/avr.md (Arbeitskopie) @@ -3388,3 +3388,42 @@ (define_insn "fmulsu" clr __zero_reg__" [(set_attr "length" "3") (set_attr "cc" "clobber")]) + + +;; Some combine patterns that try to fix bad code when a value is composed +;; from byte parts like in PR27663. +;; The patterns give some release but the code still is not optimal, +;; in particular when subreg lowering (-fsplit-wide-types) is turned on. +;; That switch obfuscates things here and in many other places. + +(define_insn_and_split "*ior<mode>qi.byte0" + [(set (match_operand:HISI 0 "register_operand" "=r") + (ior:HISI + (zero_extend:HISI (match_operand:QI 1 "register_operand" "r")) + (match_operand:HISI 2 "register_operand" "0")))] + "" + "#" + "reload_completed" + [(set (match_dup 3) + (ior:QI (match_dup 3) + (match_dup 1)))] + { + operands[3] = simplify_gen_subreg (QImode, operands[0], <MODE>mode, 0); + }) + +(define_insn_and_split "*ior<mode>qi.byte1-3" + [(set (match_operand:HISI 0 "register_operand" "=r") + (ior:HISI + (ashift:HISI (zero_extend:HISI (match_operand:QI 1 "register_operand" "r")) + (match_operand:QI 2 "const_8_16_24_operand" "n")) + (match_operand:HISI 3 "register_operand" "0")))] + "INTVAL(operands[2]) < GET_MODE_BITSIZE (<MODE>mode)" + "#" + "&& reload_completed" + [(set (match_dup 4) + (ior:QI (match_dup 4) + (match_dup 1)))] + { + int byteno = INTVAL(operands[2]) / BITS_PER_UNIT; + operands[4] = simplify_gen_subreg (QImode, operands[0], <MODE>mode, byteno); + })