Message ID: 4DA880F6.4070109@gjlay.de
State: New
2011/4/15 Georg-Johann Lay <avr@gjlay.de>:
> Denis Chertykov schrieb:
>> 2011/4/14 Georg-Johann Lay <avr@gjlay.de>:
>>> Denis Chertykov schrieb:
>>>> 2011/4/14 Georg-Johann Lay <avr@gjlay.de>:
>>>>> The "rotl<mode>3" expanders (mode \in {HI,SI,DI}) violate the
>>>>> synopsis by using 4 operands instead of 3. This runs into an ICE
>>>>> at the top of optabs.c:maybe_gen_insn.
>>>>>
>>>>> The right way to do this is to use match_dup, not match_operand.
>>>>> So the fix is obvious.
>>>>>
>>>>> Rebuilt avr-gcc and ran it against the avr testsuite:
>>>>>
>>>>> gcc.target/avr/torture/pr41885.c generates these patterns
>>>>>
>>>>> Johann
>>>>>
>>>>> 2011-04-14  Georg-Johann Lay  <avr@gjlay.de>
>>>>>
>>>>>         * config/avr/avr.md ("rotlhi3", "rotlsi3", "rotldi3"): Fix
>>>>>         expanders' operand 3 from match_operand to match_dup.
>>>>
>>>> Maybe better to use (clobber (match_scratch ...)) here and in
>>>> `*rotw<mode>' and `*rotb<mode>'.
>>>
>>> These are splitters that might split before reload, producing
>>> strange, non-matching insns like
>>>    (set (scratch:QI) ...
>>> or crashing already in avr_rotate_bytes, because the split condition
>>> reads
>>>    "&& (reload_completed || <MODE>mode == DImode)"
>>
>> Generally I agree with you; changing match_operand to match_dup is easy.
>
> I already submitted the easy patch to avoid the ICE.
>
>> But IMHO the right way is:
>> - the splitters and avr_rotate_bytes are wrong and must be fixed too;
>> - operands[3] is a scratch register, and the right way is to use
>>   match_scratch.
>>
>> I can approve the patch.
>>
>> Denis.
>
> Ok, here is the right-way patch.
>
> The expander generates a SCRATCH instead of a pseudo, and in case of a
> pre-reload split the SCRATCH is replaced with a pseudo, because it is
> not known whether the scratch will actually be needed.
>
> avr_rotate_bytes now skips operations on a non-REG 'scratch'.
> Furthermore, I added an assertion that ensures that 'scratch' is
> actually a REG when it is needed.
>
> Besides that, I fixed indentation. I know it is not optimal to fix
> indentation alongside functionality... did it anyway :-)
>
> Finally, I exposed alternative #3 of the insns to the register
> allocator, because it is not possible to distinguish between
> overlapping and non-overlapping regs, and #3 does not need a scratch.
>
> Ran the C testsuite with no regressions.

Have you encountered any difference in code size?

Denis.
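The shape of the fix discussed above, as a minimal sketch for HImode
only (not the verbatim committed hunk): the expander exposes exactly
the three operands of the rotl optab, and the clobbered scratch is a
match_dup that the preparation statements fill in.

    ;; Minimal sketch of the match_dup fix, not the committed hunk:
    ;; only operands 0..2 belong to the optab interface, so operand 3
    ;; must be a match_dup created in the preparation statements.
    (define_expand "rotlhi3"
      [(parallel [(set (match_operand:HI 0 "register_operand" "")
                       (rotate:HI (match_operand:HI 1 "register_operand" "")
                                  (match_operand:QI 2 "const_int_operand" "")))
                  (clobber (match_dup 3))])]  ;; was: (match_operand 3 ...)
      ""
      {
        /* Operand 3 is not passed in by the caller; create it here.  */
        operands[3] = gen_reg_rtx (QImode);
      })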
2011/4/17 Denis Chertykov <chertykov@gmail.com>:
> 2011/4/15 Georg-Johann Lay <avr@gjlay.de>:
>> Finally, I exposed alternative #3 of the insns to the register
>> allocator, because it is not possible to distinguish between
>> overlapping and non-overlapping regs, and #3 does not need a scratch.
>>
>> Ran the C testsuite with no regressions.
>
> Have you encountered any difference in code size?

I ask about code size because the IRA pass doesn't work with
`scratch:MODE' at all. This leads to bad/wrong register allocation in
the IRA pass. The reload pass will correct such a wrong allocation, but
reload can't generate optimal code (reload generates correct code).
Because of that, maybe you are right and it may be better to have
(clobber (match_operand ...)) instead of (clobber (match_scratch ...)).

Denis.
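Schematically, the two clobber styles being weighed here (a sketch, not
taken from the patch itself):

    ;; (a) match_scratch: invisible to IRA; reload allocates the
    ;;     register late and spills if nothing is free.
    (clobber (match_scratch:QI 3 "=&r"))

    ;; (b) match_operand on a pseudo: an ordinary operand, so IRA
    ;;     allocates it together with everything else.
    (clobber (match_operand:QI 3 "register_operand" "=&r"))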
Denis Chertykov schrieb:
> 2011/4/17 Denis Chertykov <chertykov@gmail.com>:
>> 2011/4/15 Georg-Johann Lay <avr@gjlay.de>:
>>> Finally, I exposed alternative #3 of the insns to the register
>>> allocator, because it is not possible to distinguish between
>>> overlapping and non-overlapping regs, and #3 does not need a scratch.
>>>
>>> Ran the C testsuite with no regressions.
>>
>> Have you encountered any difference in code size?
>
> I ask about code size because the IRA pass doesn't work with
> `scratch:MODE' at all.

I wonder what the difference is; IRA could treat scratch:MODE just the
same way as pseudo:MODE. Anyway, thanks for that insight. It's nearly
impossible to get the big picture of IRA/reload from digging through
the sources.

So that means IRA generates garbage for scratches and reload has to fix
it, generate spill code if needed, etc.?

Does this mean that if IRA sees a pseudo together with constraint "X"
it will allocate the pseudo anyway?

> This leads to bad/wrong register allocation in the IRA pass.
> The reload pass will correct such a wrong allocation, but reload
> can't generate optimal code (reload generates correct code).
> Because of that, maybe you are right and it may be better to have
> (clobber (match_operand ...)) instead of (clobber (match_scratch ...)).

I don't see effects on code size, at least for the binaries left by the
testsuite.

Is there a policy on how to drive size/performance tests for avr?
Anatoly posted a result on code size some weeks ago (concerning move
costs); it would be interesting to know his setup and what
sources/options he uses. Using the testsuite seems inappropriate to me,
because it runs -O3, -finline etc., and most of the code is not
real-world code, at least not for avr.

FYI, what I observe is a remarkable dependency on subreg lowering that
can be generalized as this:
* code size decreases for mode <= SImode
* code size increases for mode > SImode

An example for gcc.target/avr/torture/pr41885.c at -Os:

-fno-split-wide-types
00000000 00000008 T rotl_16a
00000008 00000006 T rotl_16b
00000014 00000012 T rotl_32a
00000026 00000010 T rotl_32b
00000036 00000006 T rotl_32c
00000042 00000010 T rotl_32d
00000052 00000010 T rotl_32e
00000062 00000088 T rotl_64
00000150 00000106 T rotl_64a
00000256 00000082 T rotl_64b
00000338 00000096 T rotl_64c
00000434 00000098 T rotl_64d
00000532 00000100 T rotl_64e
00000632 00000102 T rotl_64f
00000734 00000104 T rotl_64g
00000838 00000110 T rotl_64h

-fsplit-wide-types
00000000 00000008 T rotl_16a
00000008 00000006 T rotl_16b
00000014 00000024 T rotl_32a
00000038 00000020 T rotl_32b
00000058 00000024 T rotl_32c
00000082 00000022 T rotl_32d
00000104 00000022 T rotl_32e
00000126 00000036 T rotl_64
00000162 00000050 T rotl_64a
00000212 00000026 T rotl_64b
00000238 00000036 T rotl_64c
00000274 00000038 T rotl_64d
00000312 00000040 T rotl_64e
00000352 00000042 T rotl_64f
00000394 00000044 T rotl_64g
00000438 00000038 T rotl_64h

Johann

> Denis.
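For context, subreg lowering rewrites word-at-a-time RTL roughly like
this (schematic RTL, not an actual compiler dump): the wide pseudos are
replaced by independent word-sized pseudos, so each part gets its own
life range and can be allocated separately.

    ;; Schematic effect of -fsplit-wide-types (the lower-subreg pass):
    ;; one SImode move ...
    (set (reg:SI 50) (reg:SI 51))
    ;; ... becomes independent HImode moves on fresh pseudos:
    (set (reg:HI 60) (reg:HI 62))
    (set (reg:HI 61) (reg:HI 63))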
2011/4/18 Georg-Johann Lay <avr@gjlay.de>:
> Denis Chertykov schrieb:
>> 2011/4/17 Denis Chertykov <chertykov@gmail.com>:
>>> 2011/4/15 Georg-Johann Lay <avr@gjlay.de>:
>>>> Finally, I exposed alternative #3 of the insns to the register
>>>> allocator, because it is not possible to distinguish between
>>>> overlapping and non-overlapping regs, and #3 does not need a scratch.
>>>>
>>>> Ran the C testsuite with no regressions.
>>>
>>> Have you encountered any difference in code size?
>>
>> I ask about code size because the IRA pass doesn't work with
>> `scratch:MODE' at all.
>
> I wonder what the difference is; IRA could treat scratch:MODE just the
> same way as pseudo:MODE. Anyway, thanks for that insight. It's nearly
> impossible to get the big picture of IRA/reload from digging through
> the sources.
>
> So that means IRA generates garbage for scratches and reload has to
> fix it, generate spill code if needed, etc.?

No garbage; IRA just ignores scratches.
The reload pass will allocate scratch registers if possible.
If that is impossible, then the reload pass will spill and generate
reloads for the insns with scratches.

> Does this mean that if IRA sees a pseudo together with constraint "X"
> it will allocate the pseudo anyway?

I think so, but better to test IRA.

>> This leads to bad/wrong register allocation in the IRA pass.
>> The reload pass will correct such a wrong allocation, but reload
>> can't generate optimal code (reload generates correct code).
>> Because of that, maybe you are right and it may be better to have
>> (clobber (match_operand ...)) instead of (clobber (match_scratch ...)).
>
> I don't see effects on code size, at least for the binaries left by
> the testsuite.
>
> Is there a policy on how to drive size/performance tests for avr?
> Anatoly posted a result on code size some weeks ago (concerning move
> costs); it would be interesting to know his setup and what
> sources/options he uses. Using the testsuite seems inappropriate to
> me, because it runs -O3, -finline etc., and most of the code is not
> real-world code, at least not for avr.

IMHO libgcc is a good test case.

> FYI, what I observe is a remarkable dependency on subreg lowering that
> can be generalized as this:
> * code size decreases for mode <= SImode
> * code size increases for mode > SImode

I think that DImode move instructions must be supported to get
representative test results for mode > SImode.

Personally, I don't want to support DImode for AVR.
IMHO DImode is overkill for an 8-bit controller.

One more note:
A few years ago I played with early splitting of everything possible
(move, add, sub, and, ...). The results were very bad.
That happened because the flow of split insns (8-bit insns) becomes
unreadable for most of GCC's optimisation passes.
Maybe splitting is appropriate before register allocation, or even
after the reload pass.

Denis.

PS: about code size tests: try to email Anatoly directly.
PPS: I can try to call him.
2011/4/18 Denis Chertykov <chertykov@gmail.com>:
> 2011/4/18 Georg-Johann Lay <avr@gjlay.de>:
> A few years ago I played with early splitting of everything possible
> (move, add, sub, and, ...). The results were very bad.
> That happened because the flow of split insns (8-bit insns) becomes
> unreadable for most of GCC's optimisation passes.
> Maybe splitting is appropriate before register allocation, or even
> after the reload pass.

Maybe something like this (early splitting) happens with subreg
lowering.

Denis.
Denis Chertykov schrieb:
> 2011/4/18 Georg-Johann Lay <avr@gjlay.de>:
>> Denis Chertykov schrieb:
>>> 2011/4/17 Denis Chertykov <chertykov@gmail.com>:
>>>> 2011/4/15 Georg-Johann Lay <avr@gjlay.de>:
>>>>> Finally, I exposed alternative #3 of the insns to the register
>>>>> allocator, because it is not possible to distinguish between
>>>>> overlapping and non-overlapping regs, and #3 does not need a
>>>>> scratch.
>>>>>
>>>>> Ran the C testsuite with no regressions.
>>>> Have you encountered any difference in code size?
>>> I ask about code size because the IRA pass doesn't work with
>>> `scratch:MODE' at all.
>> I wonder what the difference is; IRA could treat scratch:MODE just
>> the same way as pseudo:MODE. Anyway, thanks for that insight. It's
>> nearly impossible to get the big picture of IRA/reload from digging
>> through the sources.
>>
>> So that means IRA generates garbage for scratches and reload has to
>> fix it, generate spill code if needed, etc.?
>
> No garbage; IRA just ignores scratches.
> The reload pass will allocate scratch registers if possible.
> If that is impossible, then the reload pass will spill and generate
> reloads for the insns with scratches.
>
>> Does this mean that if IRA sees a pseudo together with constraint "X"
>> it will allocate the pseudo anyway?
>
> I think so, but better to test IRA.
>
>>> This leads to bad/wrong register allocation in the IRA pass.
>>> The reload pass will correct such a wrong allocation, but reload
>>> can't generate optimal code (reload generates correct code).
>>> Because of that, maybe you are right and it may be better to have
>>> (clobber (match_operand ...)) instead of
>>> (clobber (match_scratch ...)).
>> I don't see effects on code size, at least for the binaries left by
>> the testsuite.
>>
>> Is there a policy on how to drive size/performance tests for avr?
>> Anatoly posted a result on code size some weeks ago (concerning move
>> costs); it would be interesting to know his setup and what
>> sources/options he uses. Using the testsuite seems inappropriate to
>> me, because it runs -O3, -finline etc., and most of the code is not
>> real-world code, at least not for avr.
>
> IMHO libgcc is a good test case.

I don't see differences there, either. This is presumably because the
rotate patterns are not used, or only very seldom.

>> FYI, what I observe is a remarkable dependency on subreg lowering
>> that can be generalized as this:
>> * code size decreases for mode <= SImode
>> * code size increases for mode > SImode
>
> I think that DImode move instructions must be supported to get
> representative test results for mode > SImode.
>
> Personally, I don't want to support DImode for AVR.
> IMHO DImode is overkill for an 8-bit controller.

Some people use 64 bits on AVR. But supporting a movdi would be very
painful; I think it would cause more harm (like spill failures) than
good.

> One more note:
> A few years ago I played with early splitting of everything possible
> (move, add, sub, and, ...). The results were very bad.
> That happened because the flow of split insns (8-bit insns) becomes
> unreadable for most of GCC's optimisation passes.
> Maybe splitting is appropriate before register allocation, or even
> after the reload pass.

The -f[no-]split-wide-types lowering happens before reload, in .subreg1
(prior to combine) and .subreg2 (after combine), respectively.

For 64 bits, without subreg lowering, there is an insn like
   (clobber (reg:DI 43 [ <retval> ]))
which causes a bunch of moves and finally the setup of a frame pointer,
even though nothing lives in the frame.

How can add, sub etc. be split? This would need an explicit
representation of carry.

> Denis.
> PS: about code size tests: try to email Anatoly directly.
> PPS: I can try to call him.

Johann
2011/4/19 Georg-Johann Lay <avr@gjlay.de>:
> How can add, sub etc. be split? This would need an explicit
> representation of carry.

Yes.

Look at http://gcc.gnu.org/ml/gcc/2005-03/msg00871.html

Denis.
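To make the idea concrete, an explicit-carry split could look roughly
like the sketch below. REG_CC is a hypothetical fixed condition-code
register, no such patterns exist in the avr port under discussion, and
the exact RTL shape of the carry is only one of several possibilities.

    ;; Purely illustrative: split an HImode add into a low-byte add
    ;; that sets the (hypothetical) carry register and a high-byte add
    ;; that consumes it.  Carry-out is modelled in the canonical way:
    ;; set iff the truncated sum compares LTU against one addend.
    (define_split
      [(set (match_operand:HI 0 "register_operand" "")
            (plus:HI (match_operand:HI 1 "register_operand" "")
                     (match_operand:HI 2 "register_operand" "")))]
      "reload_completed"
      [(parallel [(set (subreg:QI (match_dup 0) 0)
                       (plus:QI (subreg:QI (match_dup 1) 0)
                                (subreg:QI (match_dup 2) 0)))
                  (set (reg:CC REG_CC)  ; carry out of the low byte
                       (compare:CC (plus:QI (subreg:QI (match_dup 1) 0)
                                            (subreg:QI (match_dup 2) 0))
                                   (subreg:QI (match_dup 1) 0)))])
       (parallel [(set (subreg:QI (match_dup 0) 1)
                       (plus:QI (plus:QI (subreg:QI (match_dup 1) 1)
                                         (subreg:QI (match_dup 2) 1))
                                (ltu:QI (reg:CC REG_CC) (const_int 0))))
                  (clobber (reg:CC REG_CC))])]
      "")

Note how every resulting insn is a PARALLEL that touches REG_CC; this
is exactly the clutter objected to in the next message.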
Denis Chertykov schrieb:
> 2011/4/19 Georg-Johann Lay <avr@gjlay.de>:
>> How can add, sub etc. be split? This would need an explicit
>> representation of carry.
>
> Yes.
>
> Look at http://gcc.gnu.org/ml/gcc/2005-03/msg00871.html

Just skimmed the conversation. I have thought about making the AVR
ISA's effects on SREG explicit several times, but I always got stuck at
some point.

- It's not only about scheduling (which does not happen for avr) but
  also about moving instructions across jumps.

- Many transformations happen before reload, but at those stages the
  effect on SREG is not yet known in many cases. There is sophisticated
  instruction output for many patterns, and their impact on SREG/CC is
  not known before reload.

- Making CC explicit would render many single_set insns into PARALLELs,
  making the optimizers' life much harder or impossible. Imagine
  instructions that could be combined: explicit CC would clutter up the
  insns, and combine won't try to transform the bulky patterns.

- The backend would be much more complicated, harder to maintain and
  understand. Almost every insn would have to be changed.

Johann

> Denis.
2011/4/19 Georg-Johann Lay <avr@gjlay.de>:
> Denis Chertykov schrieb:
>> 2011/4/19 Georg-Johann Lay <avr@gjlay.de>:
>>> How can add, sub etc. be split? This would need an explicit
>>> representation of carry.
>>
>> Yes.
>>
>> Look at http://gcc.gnu.org/ml/gcc/2005-03/msg00871.html
>
> Just skimmed the conversation. I have thought about making the AVR
> ISA's effects on SREG explicit several times, but I always got stuck
> at some point.
>
> - It's not only about scheduling (which does not happen for avr) but
>   also about moving instructions across jumps.
>
> - Many transformations happen before reload, but at those stages the
>   effect on SREG is not yet known in many cases. There is
>   sophisticated instruction output for many patterns, and their
>   impact on SREG/CC is not known before reload.
>
> - Making CC explicit would render many single_set insns into
>   PARALLELs, making the optimizers' life much harder or impossible.
>   Imagine instructions that could be combined: explicit CC would
>   clutter up the insns, and combine won't try to transform the bulky
>   patterns.
>
> - The backend would be much more complicated, harder to maintain and
>   understand. Almost every insn would have to be changed.

Generally, I agree with you; the AVR port uses CC0 because of that.

Denis.
On Tue, 2011-04-19 at 15:17 +0400, Denis Chertykov wrote:
> 2011/4/19 Georg-Johann Lay <avr@gjlay.de>:
>> Denis Chertykov schrieb:
>>> 2011/4/19 Georg-Johann Lay <avr@gjlay.de>:
>>>> How can add, sub etc. be split? This would need an explicit
>>>> representation of carry.
>>>
>>> Yes.
>>>
>>> Look at http://gcc.gnu.org/ml/gcc/2005-03/msg00871.html
>>
>> Just skimmed the conversation. I have thought about making the AVR
>> ISA's effects on SREG explicit several times, but I always got stuck
>> at some point.
>>
>> - It's not only about scheduling (which does not happen for avr) but
>>   also about moving instructions across jumps.
>>
>> - Many transformations happen before reload, but at those stages the
>>   effect on SREG is not yet known in many cases. There is
>>   sophisticated instruction output for many patterns, and their
>>   impact on SREG/CC is not known before reload.
>>
>> - Making CC explicit would render many single_set insns into
>>   PARALLELs, making the optimizers' life much harder or impossible.
>>   Imagine instructions that could be combined: explicit CC would
>>   clutter up the insns, and combine won't try to transform the bulky
>>   patterns.
>>
>> - The backend would be much more complicated, harder to maintain and
>>   understand. Almost every insn would have to be changed.
>
> Generally, I agree with you; the AVR port uses CC0 because of that.

Thumb-1 support in the ARM compiler has similar flag-setting properties
(i.e. most instructions set the condition codes). It doesn't use CC0.
It works because it doesn't model the condition code register at all,
but treats compare/branch sequences as indivisible operations.

R.
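Schematically, that style matches the comparison and the branch as one
indivisible pattern, so no live condition register ever appears between
insns. A minimal sketch in that spirit (not copied from arm.md; the
"%C0" condition-code output modifier is assumed for illustration):

    ;; Compare and conditional branch emitted as a single insn; the
    ;; flags set by "cp" never live across an insn boundary, so the
    ;; condition-code register needs no RTL representation at all.
    (define_insn "cbranchqi4"
      [(set (pc)
            (if_then_else
              (match_operator 0 "ordered_comparison_operator"
                [(match_operand:QI 1 "register_operand" "r")
                 (match_operand:QI 2 "register_operand" "r")])
              (label_ref (match_operand 3 "" ""))
              (pc)))]
      ""
      "cp %1,%2\;br%C0 %3"
    )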
Denis Chertykov schrieb:
> 2011/4/17 Denis Chertykov <chertykov@gmail.com>:
>> 2011/4/15 Georg-Johann Lay <avr@gjlay.de>:
>>> Finally, I exposed alternative #3 of the insns to the register
>>> allocator, because it is not possible to distinguish between
>>> overlapping and non-overlapping regs, and #3 does not need a scratch.
>>>
>>> Ran the C testsuite with no regressions.
>> Have you encountered any difference in code size?
>
> I ask about code size because the IRA pass doesn't work with
> `scratch:MODE' at all.
> This leads to bad/wrong register allocation in the IRA pass.
> The reload pass will correct such a wrong allocation, but reload can't
> generate optimal code (reload generates correct code).
> Because of that, maybe you are right and it may be better to have
> (clobber (match_operand ...)) instead of (clobber (match_scratch ...)).

So the conclusion is not to commit this patch, and that the one-liner
already installed is sufficient to fix the ICE?

> Denis.
2011/4/20 Georg-Johann Lay <avr@gjlay.de>:
> Denis Chertykov schrieb:
>> 2011/4/17 Denis Chertykov <chertykov@gmail.com>:
>>> 2011/4/15 Georg-Johann Lay <avr@gjlay.de>:
>>>> Finally, I exposed alternative #3 of the insns to the register
>>>> allocator, because it is not possible to distinguish between
>>>> overlapping and non-overlapping regs, and #3 does not need a
>>>> scratch.
>>>>
>>>> Ran the C testsuite with no regressions.
>>> Have you encountered any difference in code size?
>>
>> I ask about code size because the IRA pass doesn't work with
>> `scratch:MODE' at all.
>> This leads to bad/wrong register allocation in the IRA pass.
>> The reload pass will correct such a wrong allocation, but reload
>> can't generate optimal code (reload generates correct code).
>> Because of that, maybe you are right and it may be better to have
>> (clobber (match_operand ...)) instead of
>> (clobber (match_scratch ...)).
>
> So the conclusion is not to commit this patch, and that the one-liner
> already installed is sufficient to fix the ICE?

Yes.

Denis.
On 04/19/2011 02:52 AM, Denis Chertykov wrote:
> 2011/4/19 Georg-Johann Lay <avr@gjlay.de>:
>> How can add, sub etc. be split? This would need an explicit
>> representation of carry.
>
> Yes.
>
> Look at http://gcc.gnu.org/ml/gcc/2005-03/msg00871.html

Well, sort-of, but not really. It gets a tad ugly, but have a look at
the adddi3* patterns in the mn10300 and rx ports. In particular note
how both the inputs and outputs to the insn (not the expander) are all
SImode, allowing lower_subreg to do its job. The patterns are split
post-reload -- some of that is for scheduling, some of it simply makes
computing the individual insn lengths significantly easier -- but you
wouldn't really have to do that for AVR.

For AVR things would become even trickier. You might consider

(define_predicate "concat_operator"
  (match_code "concat"))

(define_insn "addsi3_qqqq"
  [(set (match_operand:QI 0 "register_operand" "=r")
        (truncate:QI
          (plus:SI
            (match_operator:SI 12 "concat_operator"
              [(match_operand:QI 4 "register_operand" "0")
               (match_operand:QI 5 "register_operand" "1")
               (match_operand:QI 6 "register_operand" "2")
               (match_operand:QI 7 "register_operand" "3")])
            (match_operator:SI 13 "concat_operator"
              [(match_operand:QI 8 "reg_or_0_operand" "rL")
               (match_operand:QI 9 "reg_or_0_operand" "rL")
               (match_operand:QI 10 "reg_or_0_operand" "rL")
               (match_operand:QI 11 "reg_or_0_operand" "rL")]))))
   (set (match_operand:QI 1 "register_operand" "=r")
        (truncate:QI (lshiftrt:SI (plus:SI (match_dup 12) (match_dup 13))
                                  (const_int 8))))
   (set (match_operand:QI 2 "register_operand" "=r")
        (truncate:QI (lshiftrt:SI (plus:SI (match_dup 12) (match_dup 13))
                                  (const_int 16))))
   (set (match_operand:QI 3 "register_operand" "=r")
        (truncate:QI (lshiftrt:SI (plus:SI (match_dup 12) (match_dup 13))
                                  (const_int 24))))]
  ""
  "add %0,%Z8\;adc %1,%Z9\;adc %2,%Z10\;adc %3,%Z11"
  [(set_attr "length" "4")]
)

This may require a little bit of code inside combine to handle CONCAT
in a reasonable way, but that should be fairly minimal. It may also
want some more special-case patterns and/or peep2s to handle constants
more efficiently, particularly considering adiw and subi. But I think
it's at least worth investigating.


r~
Index: config/avr/avr.md
===================================================================
--- config/avr/avr.md	(Revision 172493)
+++ config/avr/avr.md	(Arbeitskopie)
@@ -1519,22 +1519,22 @@ (define_mode_attr rotsmode [(DI "QI") (S
 (define_expand "rotl<mode>3"
   [(parallel [(set (match_operand:HIDI 0 "register_operand" "")
-	       (rotate:HIDI (match_operand:HIDI 1 "register_operand" "")
-			    (match_operand:VOID 2 "const_int_operand" "")))
-	      (clobber (match_dup 3))])]
+               (rotate:HIDI (match_operand:HIDI 1 "register_operand" "")
+                            (match_operand:VOID 2 "const_int_operand" "")))
+               (clobber (match_dup 3))])]
   ""
-  "
-{
-  if (CONST_INT_P (operands[2]) && 0 == (INTVAL (operands[2]) % 8))
-    {
-      if (AVR_HAVE_MOVW && 0 == INTVAL (operands[2]) % 16)
-        operands[3] = gen_reg_rtx (<rotsmode>mode);
-      else
-        operands[3] = gen_reg_rtx (QImode);
-    }
-  else
-    FAIL;
-}")
+  {
+    if (CONST_INT_P (operands[2])
+        && 0 == INTVAL (operands[2]) % 8)
+      {
+        if (AVR_HAVE_MOVW && 0 == INTVAL (operands[2]) % 16)
+          operands[3] = gen_rtx_SCRATCH (<rotsmode>mode);
+        else
+          operands[3] = gen_rtx_SCRATCH (QImode);
+      }
+    else
+      FAIL;
+  })
 
 ;; Overlapping non-HImode registers often (but not always) need a scratch.
@@ -1545,35 +1545,49 @@ (define_expand "rotl<mode>3"
 ; Split word aligned rotates using scratch that is mode dependent.
 (define_insn_and_split "*rotw<mode>"
-  [(set (match_operand:HIDI 0 "register_operand" "=r,r,#&r")
-	(rotate:HIDI (match_operand:HIDI 1 "register_operand" "0,r,r")
-		     (match_operand 2 "immediate_operand" "n,n,n")))
-   (clobber (match_operand:<rotsmode> 3 "register_operand" "=<rotx>" ))]
-  "(CONST_INT_P (operands[2]) &&
-     (0 == (INTVAL (operands[2]) % 16) && AVR_HAVE_MOVW))"
+  [(set (match_operand:HIDI 0 "register_operand" "=r,r,&r")
+        (rotate:HIDI (match_operand:HIDI 1 "register_operand" "0,r,r")
+                     (match_operand 2 "const_int_operand" "n,n,n")))
+   (clobber (match_scratch:<rotsmode> 3 "=<rotx>"))]
+  "AVR_HAVE_MOVW
+   && 0 == INTVAL (operands[2]) % 16"
   "#"
   "&& (reload_completed || <MODE>mode == DImode)"
   [(const_int 0)]
-  "avr_rotate_bytes (operands);
-   DONE;"
-)
+  {
+    /* Split happens in .split1: Cook up a pseudo */
+    if (SCRATCH == GET_CODE (operands[3])
+        && !reload_completed)
+      {
+        operands[3] = gen_reg_rtx (GET_MODE (operands[3]));
+      }
+    avr_rotate_bytes (operands);
+    DONE;
+  })
 
 ; Split byte aligned rotates using scratch that is always QI mode.
 (define_insn_and_split "*rotb<mode>"
-  [(set (match_operand:HIDI 0 "register_operand" "=r,r,#&r")
-	(rotate:HIDI (match_operand:HIDI 1 "register_operand" "0,r,r")
-		     (match_operand 2 "immediate_operand" "n,n,n")))
-   (clobber (match_operand:QI 3 "register_operand" "=<rotx>" ))]
-  "(CONST_INT_P (operands[2]) &&
-     (8 == (INTVAL (operands[2]) % 16)
-      || (!AVR_HAVE_MOVW && 0 == (INTVAL (operands[2]) % 16))))"
+  [(set (match_operand:HIDI 0 "register_operand" "=r,r,&r")
+        (rotate:HIDI (match_operand:HIDI 1 "register_operand" "0,r,r")
+                     (match_operand 2 "const_int_operand" "n,n,n")))
+   (clobber (match_scratch:QI 3 "=<rotx>" ))]
+  "8 == INTVAL (operands[2]) % 16
+   || (!AVR_HAVE_MOVW
+       && 0 == INTVAL (operands[2]) % 16)"
   "#"
   "&& (reload_completed || <MODE>mode == DImode)"
   [(const_int 0)]
-  "avr_rotate_bytes (operands);
-   DONE;"
-)
+  {
+    /* Split happens in .split1: Cook up a pseudo */
+    if (SCRATCH == GET_CODE (operands[3])
+        && !reload_completed)
+      {
+        operands[3] = gen_reg_rtx (QImode);
+      }
+    avr_rotate_bytes (operands);
+    DONE;
+  })
 
 ;;<< << << << << << << << << << << << << << << << << << << << << << << << << <<
Index: config/avr/avr.c
===================================================================
--- config/avr/avr.c	(Revision 172493)
+++ config/avr/avr.c	(Arbeitskopie)
@@ -4481,141 +4481,147 @@ lshrsi3_out (rtx insn, rtx operands[], i
 bool
 avr_rotate_bytes (rtx operands[])
 {
-    int i, j;
-    enum machine_mode mode = GET_MODE (operands[0]);
-    bool overlapped = reg_overlap_mentioned_p (operands[0], operands[1]);
-    bool same_reg = rtx_equal_p (operands[0], operands[1]);
-    int num = INTVAL (operands[2]);
-    rtx scratch = operands[3];
-    /* Work out if byte or word move is needed.  Odd byte rotates need QImode.
-       Word move if no scratch is needed, otherwise use size of scratch.  */
-    enum machine_mode move_mode = QImode;
-    int move_size, offset, size;
-
-    if (num & 0xf)
-      move_mode = QImode;
-    else if ((mode == SImode && !same_reg) || !overlapped)
-      move_mode = HImode;
-    else
-      move_mode = GET_MODE (scratch);
-
-    /* Force DI rotate to use QI moves since other DI moves are currently split
-       into QI moves so forward propagation works better.  */
-    if (mode == DImode)
-      move_mode = QImode;
-    /* Make scratch smaller if needed.  */
-    if (GET_MODE (scratch) == HImode && move_mode == QImode)
-      scratch = simplify_gen_subreg (move_mode, scratch, HImode, 0);
-
-    move_size = GET_MODE_SIZE (move_mode);
-    /* Number of bytes/words to rotate.  */
-    offset = (num >> 3) / move_size;
-    /* Number of moves needed.  */
-    size = GET_MODE_SIZE (mode) / move_size;
-    /* Himode byte swap is special case to avoid a scratch register.  */
-    if (mode == HImode && same_reg)
-      {
-        /* HImode byte swap, using xor.  This is as quick as using scratch.  */
-        rtx src, dst;
-        src = simplify_gen_subreg (move_mode, operands[1], mode, 0);
-        dst = simplify_gen_subreg (move_mode, operands[0], mode, 1);
-        if (!rtx_equal_p (dst, src))
-          {
-            emit_move_insn (dst, gen_rtx_XOR (QImode, dst, src));
-            emit_move_insn (src, gen_rtx_XOR (QImode, src, dst));
-            emit_move_insn (dst, gen_rtx_XOR (QImode, dst, src));
-          }
-      }
-    else
-      {
+  int i, j;
+  enum machine_mode mode = GET_MODE (operands[0]);
+  bool overlapped = reg_overlap_mentioned_p (operands[0], operands[1]);
+  bool same_reg = rtx_equal_p (operands[0], operands[1]);
+  int num = INTVAL (operands[2]);
+  rtx scratch = operands[3];
+  /* Work out if byte or word move is needed.  Odd byte rotates need QImode.
+     Word move if no scratch is needed, otherwise use size of scratch.  */
+  enum machine_mode move_mode = QImode;
+  int move_size, offset, size;
+
+  if (num & 0xf)
+    move_mode = QImode;
+  else if ((mode == SImode && !same_reg) || !overlapped)
+    move_mode = HImode;
+  else
+    move_mode = GET_MODE (scratch);
+
+  /* Force DI rotate to use QI moves since other DI moves are currently split
+     into QI moves so forward propagation works better.  */
+  if (mode == DImode)
+    move_mode = QImode;
+  /* Make scratch smaller if needed.  */
+  if (GET_MODE (scratch) == HImode
+      && move_mode == QImode
+      && REG_P (scratch))
+    {
+      scratch = simplify_gen_subreg (move_mode, scratch, HImode, 0);
+    }
+
+  move_size = GET_MODE_SIZE (move_mode);
+  /* Number of bytes/words to rotate.  */
+  offset = (num >> 3) / move_size;
+  /* Number of moves needed.  */
+  size = GET_MODE_SIZE (mode) / move_size;
+  /* Himode byte swap is special case to avoid a scratch register.  */
+  if (mode == HImode && same_reg)
+    {
+      /* HImode byte swap, using xor.  This is as quick as using scratch.  */
+      rtx src, dst;
+      src = simplify_gen_subreg (move_mode, operands[1], mode, 0);
+      dst = simplify_gen_subreg (move_mode, operands[0], mode, 1);
+      if (!rtx_equal_p (dst, src))
+        {
+          emit_move_insn (dst, gen_rtx_XOR (QImode, dst, src));
+          emit_move_insn (src, gen_rtx_XOR (QImode, src, dst));
+          emit_move_insn (dst, gen_rtx_XOR (QImode, dst, src));
+        }
+    }
+  else
+    {
 #define MAX_SIZE 8 /* GET_MODE_SIZE (DImode) / GET_MODE_SIZE (QImode) */
-        /* Create linked list of moves to determine move order.  */
-        struct {
-          rtx src, dst;
-          int links;
-        } move[MAX_SIZE + 8];
-        int blocked, moves;
-
-        gcc_assert (size <= MAX_SIZE);
-        /* Generate list of subreg moves.  */
-        for (i = 0; i < size; i++)
-          {
-            int from = i;
-            int to = (from + offset) % size;
-            move[i].src = simplify_gen_subreg (move_mode, operands[1],
-                                               mode, from * move_size);
-            move[i].dst = simplify_gen_subreg (move_mode, operands[0],
-                                               mode, to * move_size);
-            move[i].links = -1;
-          }
-        /* Mark dependence where a dst of one move is the src of another move.
-           The first move is a conflict as it must wait until second is
-           performed.  We ignore moves to self - we catch this later.  */
-        if (overlapped)
-          for (i = 0; i < size; i++)
-            if (reg_overlap_mentioned_p (move[i].dst, operands[1]))
-              for (j = 0; j < size; j++)
-                if (j != i && rtx_equal_p (move[j].src, move[i].dst))
-                  {
-                    /* The dst of move i is the src of move j.  */
-                    move[i].links = j;
-                    break;
-                  }
-
-        blocked = -1;
-        moves = 0;
-        /* Go through move list and perform non-conflicting moves.  As each
-           non-overlapping move is made, it may remove other conflicts
-           so the process is repeated until no conflicts remain.  */
-        do
-          {
-            blocked = -1;
-            moves = 0;
-            /* Emit move where dst is not also a src or we have used that
-               src already.  */
-            for (i = 0; i < size; i++)
-              if (move[i].src != NULL_RTX)
-                {
-                  if (move[i].links == -1
-                      || move[move[i].links].src == NULL_RTX)
-                    {
-                      moves++;
-                      /* Ignore NOP moves to self.  */
-                      if (!rtx_equal_p (move[i].dst, move[i].src))
-                        emit_move_insn (move[i].dst, move[i].src);
-
-                      /* Remove conflict from list.  */
-                      move[i].src = NULL_RTX;
-                    }
-                  else
-                    blocked = i;
-                }
-
-            /* Check for deadlock.  This is when no moves occurred and we have
-               at least one blocked move.  */
-            if (moves == 0 && blocked != -1)
-              {
-                /* Need to use scratch register to break deadlock.
-                   Add move to put dst of blocked move into scratch.
-                   When this move occurs, it will break chain deadlock.
-                   The scratch register is substituted for real move.  */
-
-                move[size].src = move[blocked].dst;
-                move[size].dst = scratch;
-                /* Scratch move is never blocked.  */
-                move[size].links = -1;
-                /* Make sure we have valid link.  */
-                gcc_assert (move[blocked].links != -1);
-                /* Replace src of blocking move with scratch reg.  */
-                move[move[blocked].links].src = scratch;
-                /* Make dependent on scratch move occuring.  */
-                move[blocked].links = size;
-                size=size+1;
-              }
-          }
-        while (blocked != -1);
-      }
-    return true;
+      /* Create linked list of moves to determine move order.  */
+      struct {
+        rtx src, dst;
+        int links;
+      } move[MAX_SIZE + 8];
+      int blocked, moves;
+
+      gcc_assert (size <= MAX_SIZE);
+      /* Generate list of subreg moves.  */
+      for (i = 0; i < size; i++)
+        {
+          int from = i;
+          int to = (from + offset) % size;
+          move[i].src = simplify_gen_subreg (move_mode, operands[1],
+                                             mode, from * move_size);
+          move[i].dst = simplify_gen_subreg (move_mode, operands[0],
+                                             mode, to * move_size);
+          move[i].links = -1;
+        }
+      /* Mark dependence where a dst of one move is the src of another move.
+         The first move is a conflict as it must wait until second is
+         performed.  We ignore moves to self - we catch this later.  */
+      if (overlapped)
+        for (i = 0; i < size; i++)
+          if (reg_overlap_mentioned_p (move[i].dst, operands[1]))
+            for (j = 0; j < size; j++)
+              if (j != i && rtx_equal_p (move[j].src, move[i].dst))
+                {
+                  /* The dst of move i is the src of move j.  */
+                  move[i].links = j;
+                  break;
+                }
+
+      blocked = -1;
+      moves = 0;
+      /* Go through move list and perform non-conflicting moves.  As each
+         non-overlapping move is made, it may remove other conflicts
+         so the process is repeated until no conflicts remain.  */
+      do
+        {
+          blocked = -1;
+          moves = 0;
+          /* Emit move where dst is not also a src or we have used that
+             src already.  */
+          for (i = 0; i < size; i++)
+            if (move[i].src != NULL_RTX)
+              {
+                if (move[i].links == -1
+                    || move[move[i].links].src == NULL_RTX)
+                  {
+                    moves++;
+                    /* Ignore NOP moves to self.  */
+                    if (!rtx_equal_p (move[i].dst, move[i].src))
+                      emit_move_insn (move[i].dst, move[i].src);
+
+                    /* Remove conflict from list.  */
+                    move[i].src = NULL_RTX;
+                  }
+                else
+                  blocked = i;
+              }
+
+          /* Check for deadlock.  This is when no moves occurred and we have
+             at least one blocked move.  */
+          if (moves == 0 && blocked != -1)
+            {
+              /* Need to use scratch register to break deadlock.
+                 Add move to put dst of blocked move into scratch.
+                 When this move occurs, it will break chain deadlock.
+                 The scratch register is substituted for real move.  */
+
+              gcc_assert (REG_P (scratch));
+
+              move[size].src = move[blocked].dst;
+              move[size].dst = scratch;
+              /* Scratch move is never blocked.  */
+              move[size].links = -1;
+              /* Make sure we have valid link.  */
+              gcc_assert (move[blocked].links != -1);
+              /* Replace src of blocking move with scratch reg.  */
+              move[move[blocked].links].src = scratch;
+              /* Make dependent on scratch move occuring.  */
+              move[blocked].links = size;
+              size=size+1;
+            }
+        }
+      while (blocked != -1);
+    }
+  return true;
 }
 
 /* Modifies the length assigned to instruction INSN