Message ID | 4E9D6966.6000003@gjlay.de |
---|---|
State | New |
Headers | show |
2011/10/18 Georg-Johann Lay <avr@gjlay.de>: > This patch do some tweaks to addhi3 like adding QI scratch register. > > The original *addhi3 insn is still there and located prior to new > addhi3_clobber insn because addhi3 is special to reload (thanks Danis for this > note) so that there is a version with and a version without scratch register. > > Patch passes without regressions. > Which improvements added by this patch ? Denis.
Denis Chertykov schrieb: > 2011/10/18 Georg-Johann Lay <avr@gjlay.de>: >> This patch do some tweaks to addhi3 like adding QI scratch register. >> >> The original *addhi3 insn is still there and located prior to new >> addhi3_clobber insn because addhi3 is special to reload (thanks Danis for this >> note) so that there is a version with and a version without scratch register. >> >> Patch passes without regressions. >> > > Which improvements added by this patch ? > > Denis. If the addhi3 is expanded early, the addition happens with QI scratch which avoids reload of constant if target register is in NO_LD. And reduce register pressure as only QI is needed and not reload of constant to HI. Otherwise, there might be sequences like ldi r31, 2 ; *reload_inhi mov r12, r31 clr r13 add r14, r12 ; *addhi3 adc r15, r13 which now will be ldi r31, 2 ; addhi3_clobber add r14, r31 adc r15, __zero_reg__ Similar applies if the reload of the constant happens to LD regs: ldi r30, 2 ; *movhi clr r31 add r14, r12 ; *addhi3 adc r15, r13 will become ldi r30, 2 ; addhi3_clobber add r14, r30 adc r15, __zero_reg__ For *addhi3 insns the register pressure is not reduced but the insn sequence might be smarter if peep2 comes up with a QI scratch or if it detects a *reload_inhi insn just prior to the addition (and the reg that holds the reloaded constant dies after the addition). As *addhi3 is special to reload, there is still an "ordinary" add addhi insn without scratch. This is easier because, e.g. prologue and epilogue generation generate add insns (not by means of addhi3 expander but by explicit gan_rtx_PLUS). Yet the addhi3 expander factors out the situations when an addhi3 insn is to be generated via addhi3 expander late in the compilation process. Johann
2011/10/18 Georg-Johann Lay <avr@gjlay.de>: > Denis Chertykov schrieb: >> 2011/10/18 Georg-Johann Lay <avr@gjlay.de>: >>> This patch do some tweaks to addhi3 like adding QI scratch register. >>> >>> The original *addhi3 insn is still there and located prior to new >>> addhi3_clobber insn because addhi3 is special to reload (thanks Danis for this >>> note) so that there is a version with and a version without scratch register. >>> >>> Patch passes without regressions. >>> >> >> Which improvements added by this patch ? >> >> Denis. > > If the addhi3 is expanded early, the addition happens with QI scratch which > avoids reload of constant if target register is in NO_LD. And reduce register > pressure as only QI is needed and not reload of constant to HI. > > Otherwise, there might be sequences like > > ldi r31, 2 ; *reload_inhi > mov r12, r31 > clr r13 > > add r14, r12 ; *addhi3 > adc r15, r13 > > which now will be > > ldi r31, 2 ; addhi3_clobber > add r14, r31 > adc r15, __zero_reg__ > > Similar applies if the reload of the constant happens to LD regs: > > ldi r30, 2 ; *movhi > clr r31 > > add r14, r12 ; *addhi3 > adc r15, r13 > > will become > > ldi r30, 2 ; addhi3_clobber > add r14, r30 > adc r15, __zero_reg__ > > For *addhi3 insns the register pressure is not reduced but the insn sequence > might be smarter if peep2 comes up with a QI scratch or if it detects a > *reload_inhi insn just prior to the addition (and the reg that holds the > reloaded constant dies after the addition). > > As *addhi3 is special to reload, there is still an "ordinary" add addhi insn > without scratch. This is easier because, e.g. prologue and epilogue generation > generate add insns (not by means of addhi3 expander but by explicit > gan_rtx_PLUS). Yet the addhi3 expander factors out the situations when an > addhi3 insn is to be generated via addhi3 expander late in the compilation process Please provide any real world example. Denis.
Denis Chertykov schrieb: > 2011/10/18 Georg-Johann Lay <avr@gjlay.de>: >> Denis Chertykov schrieb: >>> 2011/10/18 Georg-Johann Lay <avr@gjlay.de>: >>>> This patch do some tweaks to addhi3 like adding QI scratch register. >>>> >>>> The original *addhi3 insn is still there and located prior to new >>>> addhi3_clobber insn because addhi3 is special to reload (thanks Danis for this >>>> note) so that there is a version with and a version without scratch register. >>>> >>>> Patch passes without regressions. >>>> >>> Which improvements added by this patch ? >>> >>> Denis. >> If the addhi3 is expanded early, the addition happens with QI scratch which >> avoids reload of constant if target register is in NO_LD. And reduce register >> pressure as only QI is needed and not reload of constant to HI. >> >> Otherwise, there might be sequences like >> >> ldi r31, 2 ; *reload_inhi >> mov r12, r31 >> clr r13 >> >> add r14, r12 ; *addhi3 >> adc r15, r13 >> >> which now will be >> >> ldi r31, 2 ; addhi3_clobber >> add r14, r31 >> adc r15, __zero_reg__ >> >> Similar applies if the reload of the constant happens to LD regs: >> >> ldi r30, 2 ; *movhi >> clr r31 >> >> add r14, r12 ; *addhi3 >> adc r15, r13 >> >> will become >> >> ldi r30, 2 ; addhi3_clobber >> add r14, r30 >> adc r15, __zero_reg__ >> >> For *addhi3 insns the register pressure is not reduced but the insn sequence >> might be smarter if peep2 comes up with a QI scratch or if it detects a >> *reload_inhi insn just prior to the addition (and the reg that holds the >> reloaded constant dies after the addition). >> >> As *addhi3 is special to reload, there is still an "ordinary" add addhi insn >> without scratch. This is easier because, e.g. prologue and epilogue generation >> generate add insns (not by means of addhi3 expander but by explicit >> gan_rtx_PLUS). Yet the addhi3 expander factors out the situations when an >> addhi3 insn is to be generated via addhi3 expander late in the compilation process > > Please provide any real world example. > > Denis. Consider avr-libc (under the assumption that it is "real world" code): In avr-libc's build directory, and with the patch integrated: $ cd avr/lib/avr4 $ make clean && make CFLAGS='-save-temps -dp -Os' $ grep -A 2 'addhi3_clobber\/2' *.s > out-nopeep2.txt (see attachment) $ grep 'addhi3_clobber\/2' *.s | wc -l 33 This shows that the insns are already there before peep2 and thus no reload of 16-bit constant is needed; an 8-bit scratch is sufficient. Alternatively, the implementation could omit the expansion to addhi3_clobber in addhi3 expander and instead rely completely on peep2. However, that does not reduce register pressure because a 16-bit register will be allocated and the peep2 just prints things smarter and needs just a QI scratch to call avr_out_plus_clobber. For +/-1, the addition with SEC/ADD/ADC resp. SEC/SBC/SBC leaves cc0 in a mess. as most loops use +/-1 on the counter variable, LDI/SUB/SBC is not shorter but better because it sets cc0. So you like this patch? Or prefer a patch that is neutral with respect to register allocator and just uses peep2 to print things smarter? Johann dtoa_prf.s: ldi r31,3 ; , ; 338 addhi3_clobber/2 [length = 3] dtoa_prf.s- add r12,r31 ; s, dtoa_prf.s- adc r13,__zero_reg__ ; s -- dtoa_prf.s: ldi r31,3 ; , ; 447 addhi3_clobber/2 [length = 3] dtoa_prf.s- add r12,r31 ; s, dtoa_prf.s- adc r13,__zero_reg__ ; s -- fgets.s: ldi r31,1 ; , ; 70 addhi3_clobber/2 [length = 3] fgets.s- sub r14,r31 ; ivtmp.9, fgets.s- sbc r15,__zero_reg__ ; ivtmp.9 -- realloc.s: ldi r17,2 ; , ; 80 addhi3_clobber/2 [length = 3] realloc.s- add r12,r17 ; tmp83, realloc.s- adc r13,__zero_reg__ ; -- realloc.s: ldi r18,2 ; , ; 85 addhi3_clobber/2 [length = 3] realloc.s- add r12,r18 ; tmp84, realloc.s- adc r13,__zero_reg__ ; -- strtod.s: ldi r31,1 ; , ; 101 addhi3_clobber/2 [length = 3] strtod.s- sub r14,r31 ; D.2581, strtod.s- sbc r15,__zero_reg__ ; D.2581 -- strtod.s: ldi r18,2 ; , ; 110 addhi3_clobber/2 [length = 3] strtod.s- add r14,r18 ; nptr, strtod.s- adc r15,__zero_reg__ ; nptr -- strtod.s: ldi r21,7 ; , ; 120 addhi3_clobber/2 [length = 3] strtod.s- add r14,r21 ; nptr, strtod.s- adc r15,__zero_reg__ ; nptr -- strtod.s: ldi r31,255 ; , ; 175 addhi3_clobber/2 [length = 3] strtod.s- sub r14,r31 ; exp, strtod.s- sbc r15,r31 ; exp, -- strtod.s: ldi r18,1 ; , ; 185 addhi3_clobber/2 [length = 3] strtod.s- sub r14,r18 ; exp, strtod.s- sbc r15,__zero_reg__ ; exp -- strtod.s: ldi r31,24 ; , ; 376 addhi3_clobber/2 [length = 3] strtod.s- sub r8,r31 ; D.2735, strtod.s- sbc r9,__zero_reg__ ; D.2735 -- strtol.s: ldi r31,2 ; , ; 128 addhi3_clobber/2 [length = 3] strtol.s- add r6,r31 ; nptr, strtol.s- adc r7,__zero_reg__ ; nptr -- strtol.s: ldi r31,1 ; , ; 242 addhi3_clobber/2 [length = 3] strtol.s- sub r6,r31 ; tmp117, strtol.s- sbc r7,__zero_reg__ ; -- strtol.s: ldi r31,2 ; , ; 252 addhi3_clobber/2 [length = 3] strtol.s- sub r6,r31 ; tmp119, strtol.s- sbc r7,__zero_reg__ ; -- strtoul.s: ldi r31,2 ; , ; 126 addhi3_clobber/2 [length = 3] strtoul.s- add r14,r31 ; nptr, strtoul.s- adc r15,__zero_reg__ ; nptr -- strtoul.s: ldi r31,1 ; , ; 229 addhi3_clobber/2 [length = 3] strtoul.s- sub r14,r31 ; tmp113, strtoul.s- sbc r15,__zero_reg__ ; -- strtoul.s: ldi r31,2 ; , ; 239 addhi3_clobber/2 [length = 3] strtoul.s- sub r14,r31 ; tmp115, strtoul.s- sbc r15,__zero_reg__ ; -- vfprintf.s: ldi r24,4 ; , ; 399 addhi3_clobber/2 [length = 3] vfprintf.s- add r4,r24 ; ap, vfprintf.s- adc r5,__zero_reg__ ; ap -- vfprintf.s: ldi r21,10 ; , ; 850 addhi3_clobber/2 [length = 3] vfprintf.s- sub r10,r21 ; exp, vfprintf.s- sbc r11,__zero_reg__ ; exp -- vfprintf.s: ldi r30,2 ; , ; 882 addhi3_clobber/2 [length = 3] vfprintf.s- add r4,r30 ; ap, vfprintf.s- adc r5,__zero_reg__ ; ap -- vfprintf.s: ldi r31,2 ; , ; 892 addhi3_clobber/2 [length = 3] vfprintf.s- add r4,r31 ; ap, vfprintf.s- adc r5,__zero_reg__ ; ap -- vfprintf.s: ldi r31,2 ; , ; 919 addhi3_clobber/2 [length = 3] vfprintf.s- add r4,r31 ; ap, vfprintf.s- adc r5,__zero_reg__ ; ap -- vfprintf.s: ldi r31,1 ; , ; 987 addhi3_clobber/2 [length = 3] vfprintf.s- sub r8,r31 ; size, vfprintf.s- sbc r9,__zero_reg__ ; size -- vfprintf.s: ldi r18,4 ; , ; 1012 addhi3_clobber/2 [length = 3] vfprintf.s- add r4,r18 ; ap, vfprintf.s- adc r5,__zero_reg__ ; ap -- vfprintf.s: ldi r31,2 ; , ; 1019 addhi3_clobber/2 [length = 3] vfprintf.s- add r4,r31 ; ap, vfprintf.s- adc r5,__zero_reg__ ; ap -- vfprintf.s: ldi r30,4 ; , ; 1109 addhi3_clobber/2 [length = 3] vfprintf.s- add r4,r30 ; ap, vfprintf.s- adc r5,__zero_reg__ ; ap -- vfprintf.s: ldi r31,2 ; , ; 1116 addhi3_clobber/2 [length = 3] vfprintf.s- add r4,r31 ; ap, vfprintf.s- adc r5,__zero_reg__ ; ap -- vfscanf.s: ldi r27,1 ; , ; 213 addhi3_clobber/2 [length = 3] vfscanf.s- sub r10,r27 ; width, vfscanf.s- sbc r11,__zero_reg__ ; width -- vfscanf.s: ldi r25,255 ; , ; 163 addhi3_clobber/2 [length = 3] vfscanf.s- sub r12,r25 ; exp, vfscanf.s- sbc r13,r25 ; exp, -- vfscanf.s: ldi r30,1 ; , ; 173 addhi3_clobber/2 [length = 3] vfscanf.s- sub r12,r30 ; exp, vfscanf.s- sbc r13,__zero_reg__ ; exp -- vfscanf.s: ldi r25,24 ; , ; 354 addhi3_clobber/2 [length = 3] vfscanf.s- sub r6,r25 ; D.3471, vfscanf.s- sbc r7,__zero_reg__ ; D.3471 -- vfscanf.s: ldi r31,1 ; , ; 235 addhi3_clobber/2 [length = 3] vfscanf.s- sub r12,r31 ; width, vfscanf.s- sbc r13,__zero_reg__ ; width -- vfscanf.s: ldi r31,1 ; , ; 334 addhi3_clobber/2 [length = 3] vfscanf.s- sub r12,r31 ; width, vfscanf.s- sbc r13,__zero_reg__ ; width
2011/10/18 Georg-Johann Lay <avr@gjlay.de>: > Denis Chertykov schrieb: >> 2011/10/18 Georg-Johann Lay <avr@gjlay.de>: >>> Denis Chertykov schrieb: >>>> 2011/10/18 Georg-Johann Lay <avr@gjlay.de>: >>>>> This patch do some tweaks to addhi3 like adding QI scratch register. >>>>> >>>>> The original *addhi3 insn is still there and located prior to new >>>>> addhi3_clobber insn because addhi3 is special to reload (thanks Danis for this >>>>> note) so that there is a version with and a version without scratch register. >>>>> >>>>> Patch passes without regressions. >>>>> >>>> Which improvements added by this patch ? >>>> >>>> Denis. >>> If the addhi3 is expanded early, the addition happens with QI scratch which >>> avoids reload of constant if target register is in NO_LD. And reduce register >>> pressure as only QI is needed and not reload of constant to HI. >>> >>> Otherwise, there might be sequences like >>> >>> ldi r31, 2 ; *reload_inhi >>> mov r12, r31 >>> clr r13 >>> >>> add r14, r12 ; *addhi3 >>> adc r15, r13 >>> >>> which now will be >>> >>> ldi r31, 2 ; addhi3_clobber >>> add r14, r31 >>> adc r15, __zero_reg__ >>> >>> Similar applies if the reload of the constant happens to LD regs: >>> >>> ldi r30, 2 ; *movhi >>> clr r31 >>> >>> add r14, r12 ; *addhi3 >>> adc r15, r13 >>> >>> will become >>> >>> ldi r30, 2 ; addhi3_clobber >>> add r14, r30 >>> adc r15, __zero_reg__ >>> >>> For *addhi3 insns the register pressure is not reduced but the insn sequence >>> might be smarter if peep2 comes up with a QI scratch or if it detects a >>> *reload_inhi insn just prior to the addition (and the reg that holds the >>> reloaded constant dies after the addition). >>> >>> As *addhi3 is special to reload, there is still an "ordinary" add addhi insn >>> without scratch. This is easier because, e.g. prologue and epilogue generation >>> generate add insns (not by means of addhi3 expander but by explicit >>> gan_rtx_PLUS). Yet the addhi3 expander factors out the situations when an >>> addhi3 insn is to be generated via addhi3 expander late in the compilation process >> >> Please provide any real world example. >> >> Denis. > > Consider avr-libc (under the assumption that it is "real world" code): > > In avr-libc's build directory, and with the patch integrated: > > $ cd avr/lib/avr4 > $ make clean && make CFLAGS='-save-temps -dp -Os' > $ grep -A 2 'addhi3_clobber\/2' *.s > out-nopeep2.txt (see attachment) > $ grep 'addhi3_clobber\/2' *.s | wc -l > 33 > > This shows that the insns are already there before peep2 and thus no reload of > 16-bit constant is needed; an 8-bit scratch is sufficient. > > Alternatively, the implementation could omit the expansion to addhi3_clobber in > addhi3 expander and instead rely completely on peep2. However, that does not > reduce register pressure because a 16-bit register will be allocated and the > peep2 just prints things smarter and needs just a QI scratch to call > avr_out_plus_clobber. > > For +/-1, the addition with SEC/ADD/ADC resp. SEC/SBC/SBC leaves cc0 in a mess. > as most loops use +/-1 on the counter variable, LDI/SUB/SBC is not shorter but > better because it sets cc0. > > So you like this patch? > Or prefer a patch that is neutral with respect to register allocator and just > uses peep2 to print things smarter? I'm interested in code improvements. What difference in size of avr-libc ? Denis.
Just a note. Instead of `!reload_completed && !reload_in_progress'you can use `can_create_pseudo_p()' declared in rtl.h Denis.
Index: config/avr/avr.md =================================================================== --- config/avr/avr.md (revision 180104) +++ config/avr/avr.md (working copy) @@ -78,7 +78,7 @@ (define_c_enum "unspecv" ;; Condition code settings. (define_attr "cc" "none,set_czn,set_zn,set_n,compare,clobber, - out_plus" + out_plus, out_plus_noclobber" (const_string "none")) (define_attr "type" "branch,branch1,arith,xcall" @@ -125,7 +125,8 @@ (define_attr "length" "" ;; Otherwise do special processing depending on the attribute. (define_attr "adjust_len" - "out_bitop, out_plus, addto_sp, tsthi, tstsi, compare, call, + "out_bitop, out_plus, out_plus_noclobber, addto_sp, + tsthi, tstsi, compare, call, mov8, mov16, mov32, reload_in16, reload_in32, ashlqi, ashrqi, lshrqi, ashlhi, ashrhi, lshrhi, @@ -759,14 +760,23 @@ (define_expand "addhi3" (plus:HI (match_operand:HI 1 "register_operand" "") (match_operand:HI 2 "nonmemory_operand" "")))] "" - " -{ - if (GET_CODE (operands[2]) == CONST_INT) - { - short tmp = INTVAL (operands[2]); - operands[2] = GEN_INT(tmp); - } -}") + { + if (CONST_INT_P (operands[2])) + { + operands[2] = gen_int_mode (INTVAL (operands[2]), HImode); + + if (!reload_completed + && !reload_in_progress + && !stack_register_operand (operands[0], HImode) + && !stack_register_operand (operands[1], HImode) + && !d_register_operand (operands[0], HImode) + && !d_register_operand (operands[1], HImode)) + { + emit_insn (gen_addhi3_clobber (operands[0], operands[1], operands[2])); + DONE; + } + } + }) (define_insn "*addhi3_zero_extend" @@ -803,20 +813,77 @@ (define_insn "*addhi3_sp_R" (set_attr "adjust_len" "addto_sp")]) (define_insn "*addhi3" - [(set (match_operand:HI 0 "register_operand" "=r,!w,!w,d,r,r") - (plus:HI - (match_operand:HI 1 "register_operand" "%0,0,0,0,0,0") - (match_operand:HI 2 "nonmemory_operand" "r,I,J,i,P,N")))] + [(set (match_operand:HI 0 "register_operand" "=r,d,d") + (plus:HI (match_operand:HI 1 "register_operand" "%0,0,0") + (match_operand:HI 2 "nonmemory_operand" "r,s,n")))] "" - "@ - add %A0,%A2\;adc %B0,%B2 - adiw %A0,%2 - sbiw %A0,%n2 - subi %A0,lo8(-(%2))\;sbci %B0,hi8(-(%2)) - sec\;adc %A0,__zero_reg__\;adc %B0,__zero_reg__ - sec\;sbc %A0,__zero_reg__\;sbc %B0,__zero_reg__" - [(set_attr "length" "2,1,1,2,3,3") - (set_attr "cc" "set_n,set_czn,set_czn,set_czn,set_n,set_n")]) + { + static const char * const asm_code[] = + { + "add %A0,%A2\;adc %B0,%B2", + "subi %A0,lo8(-(%2))\;sbci %B0,hi8(-(%2))", + "" + }; + + if (*asm_code[which_alternative]) + return asm_code[which_alternative]; + + return avr_out_plus_noclobber (operands, NULL, NULL); + } + [(set_attr "length" "2,2,2") + (set_attr "adjust_len" "*,*,out_plus_noclobber") + (set_attr "cc" "set_n,set_czn,out_plus_noclobber")]) + +;; Adding a constant to NO_LD_REGS might have lead to a reload of +;; that constant to LD_REGS. We don't add a scratch to *addhi3 +;; itself because that insn is special to reload. + +(define_peephole2 ; *addhi3_clobber + [(set (match_operand:HI 0 "d_register_operand" "") + (match_operand:HI 1 "const_int_operand" "")) + (set (match_operand:HI 2 "l_register_operand" "") + (plus:HI (match_dup 2) + (match_dup 0)))] + "peep2_reg_dead_p (2, operands[0])" + [(parallel [(set (match_dup 2) + (plus:HI (match_dup 2) + (match_dup 1))) + (clobber (match_dup 3))])] + { + operands[3] = simplify_gen_subreg (QImode, operands[0], HImode, 0); + }) + +;; Same, but with reload to NO_LD_REGS +;; Combine *reload_inhi with *addhi3 + +(define_peephole2 ; *addhi3_clobber + [(parallel [(set (match_operand:HI 0 "l_register_operand" "") + (match_operand:HI 1 "const_int_operand" "")) + (clobber (match_operand:QI 2 "d_register_operand" ""))]) + (set (match_operand:HI 3 "l_register_operand" "") + (plus:HI (match_dup 3) + (match_dup 0)))] + "peep2_reg_dead_p (2, operands[0])" + [(parallel [(set (match_dup 3) + (plus:HI (match_dup 3) + (match_dup 1))) + (clobber (match_dup 2))])]) + +(define_insn "addhi3_clobber" + [(set (match_operand:HI 0 "register_operand" "=d,l") + (plus:HI (match_operand:HI 1 "register_operand" "%0,0") + (match_operand:HI 2 "const_int_operand" "n,n"))) + (clobber (match_scratch:QI 3 "=X,&d"))] + "" + { + gcc_assert (REGNO (operands[0]) == REGNO (operands[1])); + + return avr_out_plus (operands, NULL, NULL); + } + [(set_attr "length" "4") + (set_attr "adjust_len" "out_plus") + (set_attr "cc" "out_plus")]) + (define_insn "addsi3" [(set (match_operand:SI 0 "register_operand" "=r,d ,d,r") Index: config/avr/avr-protos.h =================================================================== --- config/avr/avr-protos.h (revision 180102) +++ config/avr/avr-protos.h (working copy) @@ -83,6 +83,7 @@ extern void avr_output_addr_vec_elt (FIL extern const char *avr_out_sbxx_branch (rtx insn, rtx operands[]); extern const char* avr_out_bitop (rtx, rtx*, int*); extern const char* avr_out_plus (rtx*, int*, int*); +extern const char* avr_out_plus_noclobber (rtx*, int*, int*); extern const char* avr_out_addto_sp (rtx*, int*); extern bool avr_popcount_each_byte (rtx, int, int); Index: config/avr/avr.c =================================================================== --- config/avr/avr.c (revision 180104) +++ config/avr/avr.c (working copy) @@ -1051,9 +1051,10 @@ expand_epilogue (bool sibcall_p) if (frame_pointer_needed) { /* Get rid of frame. */ - emit_move_insn(frame_pointer_rtx, - gen_rtx_PLUS (HImode, frame_pointer_rtx, - gen_int_mode (size, HImode))); + if (size) + emit_move_insn (frame_pointer_rtx, + gen_rtx_PLUS (HImode, frame_pointer_rtx, + gen_int_mode (size, HImode))); } else { @@ -1682,14 +1683,19 @@ notice_update_cc (rtx body ATTRIBUTE_UNU break; case CC_OUT_PLUS: + case CC_OUT_PLUS_NOCLOBBER: { rtx *op = recog_data.operand; int len_dummy, icc; /* Extract insn's operands. */ extract_constrain_insn_cached (insn); + + if (CC_OUT_PLUS == cc) + avr_out_plus (op, &len_dummy, &icc); + else + avr_out_plus_noclobber (op, &len_dummy, &icc); - avr_out_plus (op, &len_dummy, &icc); cc = (enum attr_cc) icc; break; @@ -4773,7 +4779,8 @@ avr_out_plus_1 (rtx *xop, int *plen, enu /* Value to add. There are two ways to add VAL: R += VAL and R -= -VAL. */ rtx xval = xop[2]; - /* Addition does not set cc0 in a usable way. */ + /* Except in the case of ADIW with 16-bit register (see below) + addition does not set cc0 in a usable way. */ *pcc = (MINUS == code) ? CC_SET_CZN : CC_CLOBBER; @@ -4821,6 +4828,9 @@ avr_out_plus_1 (rtx *xop, int *plen, enu started = true; avr_asm_len (code == PLUS ? "adiw %0,%1" : "sbiw %0,%1", op, plen, 1); + + if (n_bytes == 2 && PLUS == code) + *pcc = CC_SET_ZN; } i++; @@ -4836,6 +4846,14 @@ avr_out_plus_1 (rtx *xop, int *plen, enu op, plen, 1); continue; } + else if ((val8 == 1 || val8 == 0xff) + && !started + && i == n_bytes - 1) + { + avr_asm_len ((code == PLUS) ^ (val8 == 1) ? "dec %0" : "inc %0", + op, plen, 1); + break; + } switch (code) { @@ -4924,6 +4942,22 @@ avr_out_plus (rtx *xop, int *plen, int * } +/* Same as above but XOP has just 3 entries. + Supply a dummy 4th operand. */ + +const char* +avr_out_plus_noclobber (rtx *xop, int *plen, int *pcc) +{ + rtx op[4]; + + op[0] = xop[0]; + op[1] = xop[1]; + op[2] = xop[2]; + op[3] = NULL_RTX; + + return avr_out_plus (op, plen, pcc); +} + /* Output bit operation (IOR, AND, XOR) with register XOP[0] and compile time constant XOP[2]: @@ -5308,6 +5342,8 @@ adjust_insn_length (rtx insn, int len) case ADJUST_LEN_OUT_BITOP: avr_out_bitop (insn, op, &len); break; case ADJUST_LEN_OUT_PLUS: avr_out_plus (op, &len, NULL); break; + case ADJUST_LEN_OUT_PLUS_NOCLOBBER: + avr_out_plus_noclobber (op, &len, NULL); break; case ADJUST_LEN_ADDTO_SP: avr_out_addto_sp (op, &len); break;