| Message ID | 1436958199-5181-9-git-send-email-aurelien@aurel32.net |
|---|---|
| State | New |
On 07/15/2015 12:03 PM, Aurelien Jarno wrote:
> Now that we have real size changing ops, we don't need to mark high
> bits of the destination as garbage. The goal of the optimizer is to
> predict the value of the temps (and not of the registers) and do
> simplifications when possible. The problem there is therefore not the
> fact that those bits are not counted as garbage, but that a size
> changing op is replaced by a move.
>
> This patch is basically a revert of 24666baf, including the changes
> that have been made since then.
>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Richard Henderson <rth@twiddle.net>
> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>

What we're missing here is whether the omitted size changing op is extu or
exts. The mask should be extended to match. Which means keeping most of
this code.

r~
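Richard's objection is that replacing the size-changing op with a plain move loses whether the widening was zero- or sign-extending, so the known-bits mask can no longer be widened correctly. A minimal sketch of the two cases (the helper names are ours, not qemu's):

```c
#include <stdint.h>

/* When an extu_i32_i64 (zero-extend) is elided, the high 32 bits of
 * the known-bits mask become 0; when an ext_i32_i64 (sign-extend) is
 * elided, they copy bit 31 of the mask.  Folding either op into a
 * plain move leaves the mask wrong for one of the two cases. */
static uint64_t mask_after_extu(uint32_t mask32)
{
    return (uint64_t)mask32;                   /* high bits known zero */
}

static uint64_t mask_after_exts(uint32_t mask32)
{
    return (uint64_t)(int64_t)(int32_t)mask32; /* high bits copy bit 31 */
}
```

For example, with mask32 = 0x80000001 (bit 31 set), extu yields 0x0000000080000001 while exts yields 0xffffffff80000001 — two different masks from the same 32-bit value.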
On 2015-07-17 07:38, Richard Henderson wrote:
> On 07/15/2015 12:03 PM, Aurelien Jarno wrote:
> > Now that we have real size changing ops, we don't need to mark high
> > bits of the destination as garbage. The goal of the optimizer is to
> > predict the value of the temps (and not of the registers) and do
> > simplifications when possible. The problem there is therefore not the
> > fact that those bits are not counted as garbage, but that a size
> > changing op is replaced by a move.
> >
> > This patch is basically a revert of 24666baf, including the changes
> > that have been made since then.
> >
> > Cc: Paolo Bonzini <pbonzini@redhat.com>
> > Cc: Richard Henderson <rth@twiddle.net>
> > Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
>
> What we're missing here is whether the omitted size changing op is extu
> or exts. The mask should be extended to match. Which means keeping most
> of this code.

I am afraid you're correct. Unfortunately one of my goals is to remove this
part of the optimizer, as I need that in a patch series I am preparing. I
have also tried to check the temp type directly from the optimizer (it is
accessible), but it has some performance impact. Propagating the extu/exts
as real opcodes means propagating the information about size changing up to
the optimizer or the register allocator, without having to recreate it from
other available information.

For now I do wonder if we shouldn't make the size-changing extu/exts
mandatory instead of reusing the 64-bit-only versions. This doesn't change
the generated code, at least on x86.
On 07/17/2015 11:33 AM, Aurelien Jarno wrote:
> For now I do wonder if we shouldn't make the size-changing extu/exts
> mandatory instead of reusing the 64-bit-only versions. This doesn't
> change the generated code, at least on x86.

I'd be surprised if it did anywhere. I don't mind starting with them being
required, and then figuring out a way to optimize.

r~
On 2015-07-18 08:24, Richard Henderson wrote:
> On 07/17/2015 11:33 AM, Aurelien Jarno wrote:
> > For now I do wonder if we shouldn't make the size-changing extu/exts
> > mandatory instead of reusing the 64-bit-only versions. This doesn't
> > change the generated code, at least on x86.
>
> I'd be surprised if it did anywhere. I don't mind starting with them
> being required, and then figuring out a way to optimize.

I have a patch series ready for that; if you want, I can post it as an RFC.

That said, looking more deeply into the problem you found, I guess we can
solve it easily by using the same convention as the real CPU for storing
32-bit constants in the TCG optimizer. This roughly means the following
code for the 32-bit ops:

    /* 32-bit ops generate 32-bit results. */
    if (!(def->flags & TCG_OPF_64BIT)) {
        if (!TCG_TARGET_HAS_ext_i32_i64) {
            /* registers are maintained sign-extended */
            mask = (int32_t)mask;
            affected = (int32_t)mask;
        } else if (!TCG_TARGET_HAS_extu_i32_i64) {
            /* registers are maintained zero-extended */
            mask = (uint32_t)mask;
            affected = (uint32_t)mask;
        } else {
            /* high bits will be computed by ext/extu_i32_i64 */
            mask = (uint32_t)mask;
            affected = (uint32_t)mask;
        }
    }

And that would be fine for my patch series in preparation, as long as I can
predict the high part instead of considering it as garbage.
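The convention Aurelien sketches can be read as a single canonicalization step. A hedged sketch of that idea, with plain int flags standing in for the TCG_TARGET_HAS_* macros (names and structure are illustrative, not qemu's actual code):

```c
#include <stdint.h>

typedef uint64_t tcg_target_ulong;

/* Canonicalize the known-bits mask of a 32-bit op the same way the
 * target CPU keeps 32-bit values in its 64-bit registers, so the
 * high bits stay predictable rather than garbage.  The two flags
 * stand in for TCG_TARGET_HAS_ext_i32_i64 / TCG_TARGET_HAS_extu_i32_i64. */
static tcg_target_ulong canonical_mask32(tcg_target_ulong mask,
                                         int has_ext_i32_i64,
                                         int has_extu_i32_i64)
{
    if (!has_ext_i32_i64) {
        /* registers are maintained sign-extended */
        return (tcg_target_ulong)(int32_t)mask;
    } else if (!has_extu_i32_i64) {
        /* registers are maintained zero-extended */
        return (uint32_t)mask;
    }
    /* high bits will be computed by an explicit ext/extu_i32_i64 op */
    return (uint32_t)mask;
}
```

With mask = 0x80000000, the sign-extended convention yields 0xffffffff80000000 while both other branches yield 0x0000000080000000 — either way the high 32 bits are predicted, not marked as garbage.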
diff --git a/tcg/optimize.c b/tcg/optimize.c
index 18b7bc3..d1a0b6d 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -197,19 +197,13 @@ static void tcg_opt_gen_movi(TCGContext *s, TCGOp *op, TCGArg *args,
                             TCGArg dst, TCGArg val)
 {
     TCGOpcode new_op = op_to_movi(op->opc);
-    tcg_target_ulong mask;
 
     op->opc = new_op;
     reset_temp(dst);
     temps[dst].state = TCG_TEMP_CONST;
     temps[dst].val = val;
-    mask = val;
-    if (TCG_TARGET_REG_BITS > 32 && new_op == INDEX_op_mov_i32) {
-        /* High bits of the destination are now garbage.  */
-        mask |= ~0xffffffffull;
-    }
-    temps[dst].mask = mask;
+    temps[dst].mask = val;
 
     args[0] = dst;
     args[1] = val;
@@ -229,17 +223,11 @@ static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg *args,
     }
 
     TCGOpcode new_op = op_to_mov(op->opc);
-    tcg_target_ulong mask;
 
     op->opc = new_op;
     reset_temp(dst);
-    mask = temps[src].mask;
-    if (TCG_TARGET_REG_BITS > 32 && new_op == INDEX_op_mov_i32) {
-        /* High bits of the destination are now garbage.  */
-        mask |= ~0xffffffffull;
-    }
-    temps[dst].mask = mask;
+    temps[dst].mask = temps[src].mask;
 
     assert(temps[src].state != TCG_TEMP_CONST);
@@ -590,7 +578,7 @@ void tcg_optimize(TCGContext *s)
     reset_all_temps(nb_temps);
 
     for (oi = s->gen_first_op_idx; oi >= 0; oi = oi_next) {
-        tcg_target_ulong mask, partmask, affected;
+        tcg_target_ulong mask, affected;
         int nb_oargs, nb_iargs, i;
         TCGArg tmp;
 
@@ -945,17 +933,13 @@ void tcg_optimize(TCGContext *s)
             break;
         }
 
-        /* 32-bit ops generate 32-bit results.  For the result is zero test
-           below, we can ignore high bits, but for further optimizations we
-           need to record that the high bits contain garbage.  */
-        partmask = mask;
+        /* 32-bit ops generate 32-bit results. */
         if (!(def->flags & TCG_OPF_64BIT)) {
-            mask |= ~(tcg_target_ulong)0xffffffffu;
-            partmask &= 0xffffffffu;
+            mask &= 0xffffffffu;
             affected &= 0xffffffffu;
         }
 
-        if (partmask == 0) {
+        if (mask == 0) {
             assert(nb_oargs == 1);
             tcg_opt_gen_movi(s, op, args, args[0], 0);
             continue;
Now that we have real size changing ops, we don't need to mark high
bits of the destination as garbage. The goal of the optimizer is to
predict the value of the temps (and not of the registers) and do
simplifications when possible. The problem there is therefore not the
fact that those bits are not counted as garbage, but that a size
changing op is replaced by a move.

This patch is basically a revert of 24666baf, including the changes that
have been made since then.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 tcg/optimize.c | 28 ++++++----------------------
 1 file changed, 6 insertions(+), 22 deletions(-)
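For reference, the two-mask scheme this patch deletes can be sketched as a small helper (variable names mirror the removed code; this is an illustration of the old behavior, not qemu code):

```c
#include <stdint.h>

typedef uint64_t tcg_target_ulong;

/* The removed code kept two masks after a 32-bit op: 'partmask',
 * restricted to the low 32 bits, drove the result-is-zero test,
 * while 'mask' had the high bits forced on — i.e. recorded as
 * garbage — for all later optimizations. */
static void old_mask_scheme(tcg_target_ulong mask,
                            tcg_target_ulong *out_mask,
                            tcg_target_ulong *out_partmask)
{
    *out_partmask = mask & 0xffffffffu;
    *out_mask = mask | ~(tcg_target_ulong)0xffffffffu;
}
```

So after a 32-bit op whose low 32 bits are provably zero, partmask is 0 (letting the optimizer fold the op to movi 0) while mask still reports the high 32 bits as unknown — which is exactly the double bookkeeping the patch argues becomes unnecessary once size-changing ops are explicit.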