Message ID | 80797ddb7efb09eef63b444485bd3f5c9fd328b9.1309865252.git.batuzovk@ispras.ru |
---|---|
State | New |
On 07.07.2011 14:37, Kirill Batuzov wrote:
> Make tcg_constant_folding do copy and constant propagation. It is a
> preparational work before actual constant folding.
>
> Signed-off-by: Kirill Batuzov<batuzovk@ispras.ru>
> ---
>  tcg/optimize.c |  182 +++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>  1 files changed, 180 insertions(+), 2 deletions(-)
>
> diff --git a/tcg/optimize.c b/tcg/optimize.c
> index c7c7da9..f8afe71 100644
> --- a/tcg/optimize.c
> +++ b/tcg/optimize.c
>

...

This patch breaks QEMU on 32 bit hosts (tested on 386 Linux
and w32 hosts). Simply running qemu (BIOS only) terminates
with abort(). As the error is easy to reproduce, I don't provide
a stack frame here.

> +static void tcg_opt_gen_mov(TCGArg *gen_args, TCGArg dst, TCGArg src,
> +                            int nb_temps, int nb_globals)
> +{
> +    reset_temp(dst, nb_temps, nb_globals);
> +    assert(temps[src].state != TCG_TEMP_COPY);
> +    if (src>= nb_globals) {
> +        assert(temps[src].state != TCG_TEMP_CONST);
> +        if (temps[src].state != TCG_TEMP_HAS_COPY) {
> +            temps[src].state = TCG_TEMP_HAS_COPY;
> +            temps[src].next_copy = src;
> +            temps[src].prev_copy = src;
> +        }
> +        temps[dst].state = TCG_TEMP_COPY;
> +        temps[dst].val = src;
> +        temps[dst].next_copy = temps[src].next_copy;
> +        temps[dst].prev_copy = src;
> +        temps[temps[dst].next_copy].prev_copy = dst;
> +        temps[src].next_copy = dst;
> +    }
> +    gen_args[0] = dst;
> +    gen_args[1] = src;
> +}
>

QEMU with a modified tcg_opt_gen_mov() (without the if block) works.

Kind regards,
Stefan Weil

On Wed, Aug 3, 2011 at 7:00 PM, Stefan Weil <weil@mail.berlios.de> wrote:
> [...]
>
> This patch breaks QEMU on 32 bit hosts (tested on 386 Linux
> and w32 hosts). Simply running qemu (BIOS only) terminates
> with abort(). As the error is easy to reproduce, I don't provide
> a stack frame here.

I can't reproduce, i386/Linux and win32 versions of i386, Sparc32 and
Sparc64 emulators work fine.

Maybe you have a stale build (bug in Makefile dependencies)?

> [...]
>
> QEMU with a modified tcg_opt_gen_mov() (without the if block) works.
>
> Kind regards,
> Stefan Weil

On 03.08.2011 22:20, Blue Swirl wrote:
> On Wed, Aug 3, 2011 at 7:00 PM, Stefan Weil <weil@mail.berlios.de> wrote:
>> [...]
>>
>> This patch breaks QEMU on 32 bit hosts (tested on 386 Linux
>> and w32 hosts). Simply running qemu (BIOS only) terminates
>> with abort(). As the error is easy to reproduce, I don't provide
>> a stack frame here.
>
> I can't reproduce, i386/Linux and win32 versions of i386, Sparc32 and
> Sparc64 emulators work fine.
>
> Maybe you have a stale build (bug in Makefile dependencies)?

Sorry, an important piece of information was wrong / missing in my report.
It's not qemu, but qemu-system-x86_64 which fails to work.

I just tested it once more with a new build:

$ bin/x86_64-softmmu/qemu-system-x86_64 -L pc-bios
/qemu/tcg/tcg.c:1646: tcg fatal error
Abgebrochen

Cheers,
Stefan

On 03.08.2011 22:56, Stefan Weil wrote:
> [...]
>
> Sorry, an important information was wrong / missing in my report.
> It's not qemu, but qemu-system-x86_64 which fails to work.
>
> I just tested it once more with a new build:
>
> $ bin/x86_64-softmmu/qemu-system-x86_64 -L pc-bios
> /qemu/tcg/tcg.c:1646: tcg fatal error
> Abgebrochen
>
> Cheers,
> Stefan

qemu-system-mips64el fails with the same error, so the problem
occurs when running 64 bit emulations on 32 bit hosts.

On Wed, Aug 3, 2011 at 9:03 PM, Stefan Weil <weil@mail.berlios.de> wrote:
> On 03.08.2011 22:56, Stefan Weil wrote:
>> [...]
>>
>> Sorry, an important information was wrong / missing in my report.
>> It's not qemu, but qemu-system-x86_64 which fails to work.
>>
>> I just tested it once more with a new build:
>>
>> $ bin/x86_64-softmmu/qemu-system-x86_64 -L pc-bios
>> /qemu/tcg/tcg.c:1646: tcg fatal error
>> Abgebrochen

OK, now that is broken also for me.

>> Cheers,
>> Stefan
>
> qemu-system-mips64el fails with the same error, so the problem
> occurs when running 64 bit emulations on 32 bit hosts.

Not always, Sparc64 still works fine.

On Thu, Aug 4, 2011 at 6:42 PM, Blue Swirl <blauwirbel@gmail.com> wrote:
> On Wed, Aug 3, 2011 at 9:03 PM, Stefan Weil <weil@mail.berlios.de> wrote:
>> [...]
>>
>> qemu-system-mips64el fails with the same error, so the problem
>> occurs when running 64 bit emulations on 32 bit hosts.
>
> Not always, Sparc64 still works fine.

x86_64 fails because 'mov_i32 cc_src_0,loc25' is incorrectly optimized
to 'mov_i32 cc_src_0,tmp6' where tmp6 is dead after brcond.

IN:
0x000000000ffeb90a:  shl    %cl,%eax

OP:
 ---- 0xffeb90a
 mov_i32 tmp2,rcx_0
 mov_i32 tmp3,rcx_1
 mov_i32 tmp0,rax_0
 mov_i32 tmp1,rax_1
 movi_i32 tmp20,$0x1f
 and_i32 tmp2,tmp2,tmp20
 movi_i32 tmp3,$0x0
 movi_i32 tmp21,$0xffffffff
 movi_i32 tmp22,$0xffffffff
 add2_i32 tmp16,tmp17,tmp2,tmp3,tmp21,tmp22
 movi_i32 tmp20,$0x80bd4e0
 call tmp20,$0x30,$2,tmp6,tmp7,tmp0,tmp1,tmp16,tmp17

...tmp6 is assigned here...

 movi_i32 tmp20,$0x80bd4e0
 call tmp20,$0x30,$2,tmp0,tmp1,tmp0,tmp1,tmp2,tmp3
 mov_i32 rax_0,tmp0
 movi_i32 rax_1,$0x0
 mov_i32 loc23,tmp0
 mov_i32 loc24,tmp1
 mov_i32 loc25,tmp6

...tmp6 saved to loc25 to survive brcond...

 mov_i32 loc26,tmp7
 movi_i32 tmp21,$0x0
 movi_i32 tmp22,$0x0
 brcond2_i32 tmp2,tmp3,tmp21,tmp22,eq,$0x0
 mov_i32 cc_src_0,loc25

...used here.

 mov_i32 cc_src_1,loc26
 mov_i32 cc_dst_0,loc23
 mov_i32 cc_dst_1,loc24
 movi_i32 cc_op,$0x24
 set_label $0x0
 movi_i32 tmp8,$0xffeb90c
 movi_i32 tmp9,$0x0
 st_i32 tmp8,env,$0x80
 st_i32 tmp9,env,$0x84
 movi_i32 tmp20,$debug
 call tmp20,$0x0,$0

OP after liveness analysis:
 ---- 0xffeb90a
 mov_i32 tmp2,rcx_0
 nopn $0x2,$0x2
 mov_i32 tmp0,rax_0
 mov_i32 tmp1,rax_1
 movi_i32 tmp20,$0x1f
 and_i32 tmp2,tmp2,tmp20
 movi_i32 tmp3,$0x0
 movi_i32 tmp21,$0xffffffff
 movi_i32 tmp22,$0xffffffff
 add2_i32 tmp16,tmp17,tmp2,tmp3,tmp21,tmp22
 movi_i32 tmp20,$0x80bd4e0
 call tmp20,$0x30,$2,tmp6,tmp7,tmp0,tmp1,tmp16,tmp17

OK

 movi_i32 tmp20,$0x80bd4e0
 call tmp20,$0x30,$2,tmp0,tmp1,tmp0,tmp1,tmp2,tmp3
 mov_i32 rax_0,tmp0
 movi_i32 rax_1,$0x0
 mov_i32 loc23,tmp0
 mov_i32 loc24,tmp1
 mov_i32 loc25,tmp6

OK, though loc25 is unused after this, so why is it not optimized away?

 mov_i32 loc26,tmp7
 movi_i32 tmp21,$0x0
 movi_i32 tmp22,$0x0
 brcond2_i32 tmp2,tmp3,tmp21,tmp22,eq,$0x0
 mov_i32 cc_src_0,tmp6

Incorrect optimization.

 mov_i32 cc_src_1,tmp7
 mov_i32 cc_dst_0,tmp0
 mov_i32 cc_dst_1,tmp1
 movi_i32 cc_op,$0x24
 set_label $0x0
 movi_i32 tmp8,$0xffeb90c
 movi_i32 tmp9,$0x0
 st_i32 tmp8,env,$0x80
 st_i32 tmp9,env,$0x84
 movi_i32 tmp20,$debug
 call tmp20,$0x0,$0
 end

The corresponding translation code is in
target-i386/translate.c:1456, and it looks correct. Maybe the optimizer
should treat stack and memory temporaries differently from register
temporaries?

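The failure mode in the dump can be reproduced in miniature. The sketch below is a toy model, not QEMU code: every identifier in it (bb_op, bb_copy_propagate, the knows_brcond2 switch) is hypothetical. It assumes one possible reading of the dump and the patch: the patch clears the copy state only at jmp, br and CASE_OP_32_64(brcond), while the dump branches with brcond2_i32, so the fact "loc25 is a copy of tmp6" may survive the branch. Whether the right remedy is to reset the state at every basic block end or to treat local temporaries specially is not settled in this message.

```c
#include <assert.h>

/* Toy model only: none of these names exist in QEMU. */
#define BB_MAX_TEMPS 32
#define BB_NO_COPY   (-1)

enum bb_opc { BB_MOV, BB_BRCOND, BB_BRCOND2, BB_USE };

struct bb_op {
    enum bb_opc opc;
    int dst;  /* temp written by BB_MOV, unused otherwise */
    int src;  /* temp read by BB_MOV and BB_USE */
};

/* Does this opcode end a basic block as far as the optimizer knows?
   knows_brcond2 == 0 models an optimizer that forgets about brcond2. */
static int bb_ends_block(enum bb_opc opc, int knows_brcond2)
{
    return opc == BB_BRCOND || (knows_brcond2 && opc == BB_BRCOND2);
}

/* Forward copy propagation over a linear list of ops.
   copy_of[t] is the temp that t currently duplicates, or BB_NO_COPY. */
static void bb_copy_propagate(struct bb_op *ops, int n, int knows_brcond2)
{
    int copy_of[BB_MAX_TEMPS];
    int i, t;

    for (t = 0; t < BB_MAX_TEMPS; t++) {
        copy_of[t] = BB_NO_COPY;
    }
    for (i = 0; i < n; i++) {
        struct bb_op *op = &ops[i];

        if (bb_ends_block(op->opc, knows_brcond2)) {
            /* Plain temps die here, so forget every recorded copy. */
            for (t = 0; t < BB_MAX_TEMPS; t++) {
                copy_of[t] = BB_NO_COPY;
            }
            continue;
        }
        if (op->opc != BB_MOV && op->opc != BB_USE) {
            continue;  /* unrecognized branch: stale state is kept */
        }
        /* Replace the input by the temp it is known to duplicate. */
        if (copy_of[op->src] != BB_NO_COPY) {
            op->src = copy_of[op->src];
        }
        if (op->opc == BB_MOV) {
            /* dst is overwritten: copies of the old dst are stale. */
            for (t = 0; t < BB_MAX_TEMPS; t++) {
                if (copy_of[t] == op->dst) {
                    copy_of[t] = BB_NO_COPY;
                }
            }
            copy_of[op->dst] = (op->src == op->dst) ? BB_NO_COPY : op->src;
        }
    }
}
```

Feeding it the three relevant ops from the dump (mov loc25,tmp6; brcond2; a use of loc25, with temps numbered 25 and 6) rewrites the use to tmp6 when knows_brcond2 is 0 and leaves it as loc25 when knows_brcond2 is 1.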
diff --git a/tcg/optimize.c b/tcg/optimize.c
index c7c7da9..f8afe71 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -40,24 +40,196 @@
         glue(glue(case INDEX_op_, x), _i32)
 #endif
 
+typedef enum {
+    TCG_TEMP_UNDEF = 0,
+    TCG_TEMP_CONST,
+    TCG_TEMP_COPY,
+    TCG_TEMP_HAS_COPY,
+    TCG_TEMP_ANY
+} tcg_temp_state;
+
+struct tcg_temp_info {
+    tcg_temp_state state;
+    uint16_t prev_copy;
+    uint16_t next_copy;
+    tcg_target_ulong val;
+};
+
+static struct tcg_temp_info temps[TCG_MAX_TEMPS];
+
+/* Reset TEMP's state to TCG_TEMP_ANY.  If TEMP was a representative of some
+   class of equivalent temp's, a new representative should be chosen in this
+   class. */
+static void reset_temp(TCGArg temp, int nb_temps, int nb_globals)
+{
+    int i;
+    TCGArg new_base = (TCGArg)-1;
+    if (temps[temp].state == TCG_TEMP_HAS_COPY) {
+        for (i = temps[temp].next_copy; i != temp; i = temps[i].next_copy) {
+            if (i >= nb_globals) {
+                temps[i].state = TCG_TEMP_HAS_COPY;
+                new_base = i;
+                break;
+            }
+        }
+        for (i = temps[temp].next_copy; i != temp; i = temps[i].next_copy) {
+            if (new_base == (TCGArg)-1) {
+                temps[i].state = TCG_TEMP_ANY;
+            } else {
+                temps[i].val = new_base;
+            }
+        }
+        temps[temps[temp].next_copy].prev_copy = temps[temp].prev_copy;
+        temps[temps[temp].prev_copy].next_copy = temps[temp].next_copy;
+    } else if (temps[temp].state == TCG_TEMP_COPY) {
+        temps[temps[temp].next_copy].prev_copy = temps[temp].prev_copy;
+        temps[temps[temp].prev_copy].next_copy = temps[temp].next_copy;
+        new_base = temps[temp].val;
+    }
+    temps[temp].state = TCG_TEMP_ANY;
+    if (new_base != (TCGArg)-1 && temps[new_base].next_copy == new_base) {
+        temps[new_base].state = TCG_TEMP_ANY;
+    }
+}
+
+static int op_bits(int op)
+{
+    switch (op) {
+    case INDEX_op_mov_i32:
+        return 32;
+#if TCG_TARGET_REG_BITS == 64
+    case INDEX_op_mov_i64:
+        return 64;
+#endif
+    default:
+        fprintf(stderr, "Unrecognized operation %d in op_bits.\n", op);
+        tcg_abort();
+    }
+}
+
+static int op_to_movi(int op)
+{
+    switch (op_bits(op)) {
+    case 32:
+        return INDEX_op_movi_i32;
+#if TCG_TARGET_REG_BITS == 64
+    case 64:
+        return INDEX_op_movi_i64;
+#endif
+    default:
+        fprintf(stderr, "op_to_movi: unexpected return value of "
+                "function op_bits.\n");
+        tcg_abort();
+    }
+}
+
+static void tcg_opt_gen_mov(TCGArg *gen_args, TCGArg dst, TCGArg src,
+                            int nb_temps, int nb_globals)
+{
+    reset_temp(dst, nb_temps, nb_globals);
+    assert(temps[src].state != TCG_TEMP_COPY);
+    if (src >= nb_globals) {
+        assert(temps[src].state != TCG_TEMP_CONST);
+        if (temps[src].state != TCG_TEMP_HAS_COPY) {
+            temps[src].state = TCG_TEMP_HAS_COPY;
+            temps[src].next_copy = src;
+            temps[src].prev_copy = src;
+        }
+        temps[dst].state = TCG_TEMP_COPY;
+        temps[dst].val = src;
+        temps[dst].next_copy = temps[src].next_copy;
+        temps[dst].prev_copy = src;
+        temps[temps[dst].next_copy].prev_copy = dst;
+        temps[src].next_copy = dst;
+    }
+    gen_args[0] = dst;
+    gen_args[1] = src;
+}
+
+static void tcg_opt_gen_movi(TCGArg *gen_args, TCGArg dst, TCGArg val,
+                             int nb_temps, int nb_globals)
+{
+    reset_temp(dst, nb_temps, nb_globals);
+    temps[dst].state = TCG_TEMP_CONST;
+    temps[dst].val = val;
+    gen_args[0] = dst;
+    gen_args[1] = val;
+}
+
+/* Propagate constants and copies, fold constant expressions. */
 static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
                                     TCGArg *args, TCGOpDef *tcg_op_defs)
 {
-    int i, nb_ops, op_index, op, nb_temps, nb_globals;
+    int i, nb_ops, op_index, op, nb_temps, nb_globals, nb_call_args;
     const TCGOpDef *def;
     TCGArg *gen_args;
+    /* Array VALS has an element for each temp.
+       If this temp holds a constant then its value is kept in VALS' element.
+       If this temp is a copy of other ones then this equivalence class'
+       representative is kept in VALS' element.
+       If this temp is neither copy nor constant then corresponding VALS'
+       element is unused. */
 
     nb_temps = s->nb_temps;
     nb_globals = s->nb_globals;
+    memset(temps, 0, nb_temps * sizeof(struct tcg_temp_info));
 
     nb_ops = tcg_opc_ptr - gen_opc_buf;
     gen_args = args;
     for (op_index = 0; op_index < nb_ops; op_index++) {
         op = gen_opc_buf[op_index];
         def = &tcg_op_defs[op];
+        /* Do copy propagation */
+        if (!(def->flags & (TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS))) {
+            assert(op != INDEX_op_call);
+            for (i = def->nb_oargs; i < def->nb_oargs + def->nb_iargs; i++) {
+                if (temps[args[i]].state == TCG_TEMP_COPY) {
+                    args[i] = temps[args[i]].val;
+                }
+            }
+        }
+
+        /* Propagate constants through copy operations and do constant
+           folding.  Constants will be substituted to arguments by register
+           allocator where needed and possible.  Also detect copies. */
         switch (op) {
+        CASE_OP_32_64(mov):
+            if ((temps[args[1]].state == TCG_TEMP_COPY
+                 && temps[args[1]].val == args[0])
+                || args[0] == args[1]) {
+                args += 2;
+                gen_opc_buf[op_index] = INDEX_op_nop;
+                break;
+            }
+            if (temps[args[1]].state != TCG_TEMP_CONST) {
+                tcg_opt_gen_mov(gen_args, args[0], args[1],
+                                nb_temps, nb_globals);
+                gen_args += 2;
+                args += 2;
+                break;
+            }
+            /* Source argument is constant.  Rewrite the operation and
+               let movi case handle it. */
+            op = op_to_movi(op);
+            gen_opc_buf[op_index] = op;
+            args[1] = temps[args[1]].val;
+            /* fallthrough */
+        CASE_OP_32_64(movi):
+            tcg_opt_gen_movi(gen_args, args[0], args[1], nb_temps, nb_globals);
+            gen_args += 2;
+            args += 2;
+            break;
         case INDEX_op_call:
-            i = (args[0] >> 16) + (args[0] & 0xffff) + 3;
+            nb_call_args = (args[0] >> 16) + (args[0] & 0xffff);
+            if (!(args[nb_call_args + 1] & (TCG_CALL_CONST | TCG_CALL_PURE))) {
+                for (i = 0; i < nb_globals; i++) {
+                    reset_temp(i, nb_temps, nb_globals);
+                }
+            }
+            for (i = 0; i < (args[0] >> 16); i++) {
+                reset_temp(args[i + 1], nb_temps, nb_globals);
+            }
+            i = nb_call_args + 3;
             while (i) {
                 *gen_args = *args;
                 args++;
@@ -69,6 +241,7 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
         case INDEX_op_jmp:
         case INDEX_op_br:
         CASE_OP_32_64(brcond):
+            memset(temps, 0, nb_temps * sizeof(struct tcg_temp_info));
             for (i = 0; i < def->nb_args; i++) {
                 *gen_args = *args;
                 args++;
@@ -76,6 +249,11 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
             }
             break;
         default:
+            /* Default case: we do know nothing about operation so no
+               propagation is done.  We only trash output args. */
+            for (i = 0; i < def->nb_oargs; i++) {
+                reset_temp(args[i], nb_temps, nb_globals);
+            }
             for (i = 0; i < def->nb_args; i++) {
                 gen_args[i] = args[i];
             }

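The bookkeeping in tcg_opt_gen_mov() keeps each class of equal temps as a circular doubly linked list threaded through the temps array, with one representative per class. The sketch below isolates just that insertion step so it can be tried on its own. It is a simplified illustration and not the patch itself: the cl_ names are hypothetical, indices are plain int, and it leaves out both the reset_temp() call and the nb_globals check, so it assumes dst is not already part of any class.

```c
#include <assert.h>

/* Simplified stand-in for the patch's data structure; the cl_ names
   are hypothetical and do not exist in QEMU. */
#define CL_MAX_TEMPS 8

enum cl_state { CL_ANY = 0, CL_COPY, CL_HAS_COPY };

struct cl_temp {
    enum cl_state state;
    int prev_copy;
    int next_copy;
    int val;  /* representative of the class, valid when state == CL_COPY */
};

static struct cl_temp cl_temps[CL_MAX_TEMPS];

/* Record that dst now holds a copy of src.  Assumes src is not itself a
   CL_COPY and that dst currently belongs to no class. */
static void cl_record_copy(int dst, int src)
{
    if (cl_temps[src].state != CL_HAS_COPY) {
        /* src starts a new class: a ring containing only itself. */
        cl_temps[src].state = CL_HAS_COPY;
        cl_temps[src].next_copy = src;
        cl_temps[src].prev_copy = src;
    }
    /* Splice dst into the ring right after src. */
    cl_temps[dst].state = CL_COPY;
    cl_temps[dst].val = src;
    cl_temps[dst].next_copy = cl_temps[src].next_copy;
    cl_temps[dst].prev_copy = src;
    cl_temps[cl_temps[dst].next_copy].prev_copy = dst;
    cl_temps[src].next_copy = dst;
}

/* Number of temps in the class that contains t. */
static int cl_class_size(int t)
{
    int n = 1;
    int i;

    if (cl_temps[t].state == CL_ANY) {
        return 1;
    }
    for (i = cl_temps[t].next_copy; i != t; i = cl_temps[i].next_copy) {
        n++;
    }
    return n;
}
```

Calling cl_record_copy(2, 1) and then cl_record_copy(3, 1) builds the ring 1, 3, 2 in next_copy order, with temp 1 as the representative stored in val for both copies.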
Make tcg_constant_folding do copy and constant propagation. It is a
preparational work before actual constant folding.

Signed-off-by: Kirill Batuzov <batuzovk@ispras.ru>
---
 tcg/optimize.c |  182 +++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 files changed, 180 insertions(+), 2 deletions(-)
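
As a rough illustration of the constant propagation half of this description, the sketch below rewrites a mov whose source is known to hold a constant into a movi, which is the same idea as the mov-to-movi rewrite in the patch. It is a toy over straight-line code with hypothetical cp_ names, not the patch's implementation: there are no branches, calls, or copy classes, and temp indices are assumed to be smaller than CP_MAX_TEMPS.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model only: the cp_ names are hypothetical, not QEMU API. */
#define CP_MAX_TEMPS 8

enum cp_opc { CP_MOV, CP_MOVI };

struct cp_op {
    enum cp_opc opc;
    int dst;       /* destination temp, must be < CP_MAX_TEMPS */
    uint32_t arg;  /* source temp index for CP_MOV, immediate for CP_MOVI */
};

/* Rewrite `mov dst, src` into `movi dst, value` whenever src is known to
   hold a constant at that point.  Straight-line code only. */
static void cp_propagate_constants(struct cp_op *ops, int n)
{
    int is_const[CP_MAX_TEMPS] = {0};
    uint32_t value[CP_MAX_TEMPS] = {0};
    int i;

    for (i = 0; i < n; i++) {
        struct cp_op *op = &ops[i];

        if (op->opc == CP_MOV && is_const[op->arg]) {
            /* Source is a known constant: turn the copy into a movi. */
            op->opc = CP_MOVI;
            op->arg = value[op->arg];
        }
        if (op->opc == CP_MOVI) {
            is_const[op->dst] = 1;
            value[op->dst] = op->arg;
        } else {
            /* dst now holds an unknown value. */
            is_const[op->dst] = 0;
        }
    }
}
```

For example, `movi t1, 0x1f` followed by `mov t2, t1` becomes `movi t1, 0x1f; movi t2, 0x1f`, while a later `mov t1, t3` with unknown t3 makes t1 non-constant again, so subsequent copies of t1 are left alone.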