Message ID | 1348084823-18277-3-git-send-email-aurelien@aurel32.net |
---|---|
State | New |
Headers | show |
On 09/19/2012 01:00 PM, Aurelien Jarno wrote: > The copy propagation doesn't check the types of the temps during copy > propagation. However TCG is using the mov_i32 for the i64 to i32 > conversion and thus the two are not equivalent. > > With this patch tcg_opt_gen_mov() doesn't consider two temps with > different types as copies anymore. > > So far it seems the optimization was not aggressive enough to trigger > this bug, but it will be triggered later in this series once the copy > propagation is improved. Exactly where/how does this manifest as problematic? We do this mov_i32 trick from i64->i32 when we're truncating. Given that we're not (yet) targeting mips64 and having to maintain 32-bit sign-extended quantities, I can't see how that would matter. We do the i32->i64 trick immediately before a proper extension. In either case I can't see how plain copy propagation should matter until some other operation is involved. So, do we have some other data propagation bug that is being masked here? r~
On Wed, Sep 19, 2012 at 02:33:46PM -0700, Richard Henderson wrote: > On 09/19/2012 01:00 PM, Aurelien Jarno wrote: > > The copy propagation doesn't check the types of the temps during copy > > propagation. However TCG is using the mov_i32 for the i64 to i32 > > conversion and thus the two are not equivalent. > > > > With this patch tcg_opt_gen_mov() doesn't consider two temps with > > different types as copies anymore. > > > > So far it seems the optimization was not aggressive enough to trigger > > this bug, but it will be triggered later in this series once the copy > > propagation is improved. > > Exactly where/how does this manifest as problematic? The problem arise when a 64-bit value is moved to a 32-bit value, and later this 64-bit value is reused. With the copy propagation if you consider them as identical, the 32-bit value might be used instead of the 64-bit value. This happens for example for the umull instruction on arm: | OP: | ---- 0xf67e5ea0 | mov_i32 tmp5,r3 | mov_i32 tmp6,r8 | ext32u_i64 tmp7,tmp5 | ext32u_i64 tmp8,tmp6 | mul_i64 tmp7,tmp7,tmp8 tmp7 is a 64-bit value | mov_i32 tmp6,tmp7 Now transfered to a 32-bit tmp | mov_i32 r1,tmp6 and a 32-bit register. | movi_i64 tmp8,$0x20 | shr_i64 tmp7,tmp7,tmp8 | mov_i32 tmp6,tmp7 | mov_i32 r3,tmp6 | goto_tb $0x1 | movi_i32 pc,$0xf67e5ea4 | exit_tb $0x7f16948b0299 | | OP after optimization and liveness analysis: | ---- 0xf67e5ea0 | nopn $0x2,$0x2 | nopn $0x2,$0x2 | ext32u_i64 tmp7,r3 | ext32u_i64 tmp8,r8 | mul_i64 tmp7,tmp7,tmp8 | mov_i32 tmp6,tmp7 | mov_i32 r1,tmp6 | movi_i64 tmp8,$0x20 | shr_i64 tmp7,r1,tmp8 Here, tmp7 is replaced by r1. However r1 only contains the 32-bit low part of tmp7, thus returning 0. | mov_i32 tmp6,tmp7 | mov_i32 r3,tmp6 | goto_tb $0x1 | movi_i32 pc,$0xf67e5ea4 | exit_tb $0x7f16948b0299 | end > We do this mov_i32 trick from i64->i32 when we're truncating. > Given that we're not (yet) targeting mips64 and having to > maintain 32-bit sign-extended quantities, I can't see how > that would matter. It does matter on architectures using different instructions to work on the 32 part of a register the registers. Actually in the case a above with a register shift right, shr_i32 and shr_i64 are always implemented differently to not shift bits from the 32bit high part to the low part. > We do the i32->i64 trick immediately before a proper extension. > > In either case I can't see how plain copy propagation should > matter until some other operation is involved. So, do we have > some other data propagation bug that is being masked here? I think it's a bug of the copy propagation considering these registers are equivalent. If on x86-64 you replace all accesses to rax by eax, your code will be smaller, but I am not sure it is going to work correctly. The only latent bug I can see there, is having mov_i32 both being used to move data between 32-bit temps and between a 64-bit temp and a 32-bit temp.
On 09/19/2012 10:54 PM, Aurelien Jarno wrote: > | mov_i32 r1,tmp6 > | movi_i64 tmp8,$0x20 > | shr_i64 tmp7,r1,tmp8 > > Here, tmp7 is replaced by r1. However r1 only contains the 32-bit low > part of tmp7, thus returning 0. Ok. Thanks for getting that on the record. r~
diff --git a/tcg/optimize.c b/tcg/optimize.c index ca81198..13b5b15 100644 --- a/tcg/optimize.c +++ b/tcg/optimize.c @@ -106,12 +106,13 @@ static TCGOpcode op_to_movi(TCGOpcode op) } } -static void tcg_opt_gen_mov(TCGArg *gen_args, TCGArg dst, TCGArg src, - int nb_temps, int nb_globals) +static void tcg_opt_gen_mov(TCGContext *s, TCGArg *gen_args, + TCGArg dst, TCGArg src) { - reset_temp(dst, nb_temps, nb_globals); + reset_temp(dst, s->nb_temps, s->nb_globals); assert(temps[src].state != TCG_TEMP_COPY); - if (src >= nb_globals) { + /* Only consider temps with the same type (width) as copies. */ + if (src >= s->nb_globals && s->temps[dst].type == s->temps[src].type) { assert(temps[src].state != TCG_TEMP_CONST); if (temps[src].state != TCG_TEMP_HAS_COPY) { temps[src].state = TCG_TEMP_HAS_COPY; @@ -440,8 +441,7 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr, gen_opc_buf[op_index] = INDEX_op_nop; } else { gen_opc_buf[op_index] = op_to_mov(op); - tcg_opt_gen_mov(gen_args, args[0], args[1], - nb_temps, nb_globals); + tcg_opt_gen_mov(s, gen_args, args[0], args[1]); gen_args += 2; } args += 3; @@ -478,8 +478,7 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr, gen_opc_buf[op_index] = INDEX_op_nop; } else { gen_opc_buf[op_index] = op_to_mov(op); - tcg_opt_gen_mov(gen_args, args[0], args[1], nb_temps, - nb_globals); + tcg_opt_gen_mov(s, gen_args, args[0], args[1]); gen_args += 2; } args += 3; @@ -503,8 +502,7 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr, break; } if (temps[args[1]].state != TCG_TEMP_CONST) { - tcg_opt_gen_mov(gen_args, args[0], args[1], - nb_temps, nb_globals); + tcg_opt_gen_mov(s, gen_args, args[0], args[1]); gen_args += 2; args += 2; break;
The copy propagation doesn't check the types of the temps during copy propagation. However TCG is using the mov_i32 for the i64 to i32 conversion and thus the two are not equivalent. With this patch tcg_opt_gen_mov() doesn't consider two temps with different types as copies anymore. So far it seems the optimization was not aggressive enough to trigger this bug, but it will be triggered later in this series once the copy propagation is improved. Signed-off-by: Aurelien Jarno <aurelien@aurel32.net> --- tcg/optimize.c | 18 ++++++++---------- 1 file changed, 8 insertions(+), 10 deletions(-)