| Message ID | 1436958199-5181-9-git-send-email-aurelien@aurel32.net |
|---|---|
| State | New |
On 07/15/2015 12:03 PM, Aurelien Jarno wrote:
> Now that we have real size changing ops, we don't need to mark high
> bits of the destination as garbage. The goal of the optimizer is to
> predict the value of the temps (and not of the registers) and do
> simplifications when possible. The problem there is therefore not the
> fact that those bits are not counted as garbage, but that a size
> changing op is replaced by a move.
>
> This patch is basically a revert of 24666baf, including the changes
> that have been made since then.
>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Richard Henderson <rth@twiddle.net>
> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>

What we're missing here is whether the omitted size changing op is extu or
exts. The mask should be extended to match. Which means keeping most of
this code.

r~
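Richard's objection is that replacing the size-changing op with a plain move loses whether the widening was zero- or sign-extending, so the known-bits mask can no longer be widened correctly. A minimal sketch of the two cases (the helper names are ours, not qemu's):

```c
#include <stdint.h>

/* When an extu_i32_i64 (zero-extend) is elided, the high 32 bits of
 * the known-bits mask become 0; when an ext_i32_i64 (sign-extend) is
 * elided, they copy bit 31 of the mask.  Folding either op into a
 * plain move leaves the mask wrong for one of the two cases. */
static uint64_t mask_after_extu(uint32_t mask32)
{
    return (uint64_t)mask32;                   /* high bits known zero */
}

static uint64_t mask_after_exts(uint32_t mask32)
{
    return (uint64_t)(int64_t)(int32_t)mask32; /* high bits copy bit 31 */
}
```

For example, with mask32 = 0x80000001 (bit 31 set), extu yields 0x0000000080000001 while exts yields 0xffffffff80000001 — two different masks from the same 32-bit value.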
On 2015-07-17 07:38, Richard Henderson wrote:
> On 07/15/2015 12:03 PM, Aurelien Jarno wrote:
> > Now that we have real size changing ops, we don't need to mark high
> > bits of the destination as garbage. The goal of the optimizer is to
> > predict the value of the temps (and not of the registers) and do
> > simplifications when possible. The problem there is therefore not the
> > fact that those bits are not counted as garbage, but that a size
> > changing op is replaced by a move.
> >
> > This patch is basically a revert of 24666baf, including the changes
> > that have been made since then.
> >
> > Cc: Paolo Bonzini <pbonzini@redhat.com>
> > Cc: Richard Henderson <rth@twiddle.net>
> > Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
>
> What we're missing here is whether the omitted size changing op is extu
> or exts. The mask should be extended to match. Which means keeping most
> of this code.

I am afraid you're correct. Unfortunately one of my goals is to remove this
part of the optimizer, as I need that in a patch series I am preparing. I
have also tried to check the temp type directly from the optimizer (it is
accessible), but it has some performance impact. Propagating the extu/exts
as real opcodes means propagating the information about size changing up to
the optimizer or the register allocator, without having to recreate it from
other available information.

For now I do wonder if we shouldn't make the size-changing extu/exts
mandatory instead of reusing the 64-bit-only versions. This doesn't change
the generated code, at least on x86.
On 07/17/2015 11:33 AM, Aurelien Jarno wrote:
> For now I do wonder if we shouldn't make the size-changing extu/exts
> mandatory instead of reusing the 64-bit-only versions. This doesn't
> change the generated code, at least on x86.

I'd be surprised if it did anywhere. I don't mind starting with them being
required, and then figuring out a way to optimize.

r~
On 2015-07-18 08:24, Richard Henderson wrote:
> On 07/17/2015 11:33 AM, Aurelien Jarno wrote:
> > For now I do wonder if we shouldn't make the size-changing extu/exts
> > mandatory instead of reusing the 64-bit-only versions. This doesn't
> > change the generated code, at least on x86.
>
> I'd be surprised if it did anywhere. I don't mind starting with them
> being required, and then figuring out a way to optimize.

I have a patch series ready for that; if you want, I can post it as an RFC.

That said, looking more deeply into the problem you found, I guess we can
solve it easily by using the same convention as the real CPU for storing
32-bit constants in the TCG optimizer. This roughly means the following
code for the 32-bit ops:

    /* 32-bit ops generate 32-bit results. */
    if (!(def->flags & TCG_OPF_64BIT)) {
        if (!TCG_TARGET_HAS_ext_i32_i64) {
            /* registers are maintained sign-extended */
            mask = (int32_t)mask;
            affected = (int32_t)mask;
        } else if (!TCG_TARGET_HAS_extu_i32_i64) {
            /* registers are maintained zero-extended */
            mask = (uint32_t)mask;
            affected = (uint32_t)mask;
        } else {
            /* high bits will be computed by ext/extu_i32_i64 */
            mask = (uint32_t)mask;
            affected = (uint32_t)mask;
        }
    }

And that would be fine for my patch series in preparation, as long as I can
predict the high part instead of considering it as garbage.
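The convention Aurelien sketches can be read as a single canonicalization step. A hedged sketch of that idea, with plain int flags standing in for the TCG_TARGET_HAS_* macros (names and structure are illustrative, not qemu's actual code):

```c
#include <stdint.h>

typedef uint64_t tcg_target_ulong;

/* Canonicalize the known-bits mask of a 32-bit op the same way the
 * target CPU keeps 32-bit values in its 64-bit registers, so the
 * high bits stay predictable rather than garbage.  The two flags
 * stand in for TCG_TARGET_HAS_ext_i32_i64 / TCG_TARGET_HAS_extu_i32_i64. */
static tcg_target_ulong canonical_mask32(tcg_target_ulong mask,
                                         int has_ext_i32_i64,
                                         int has_extu_i32_i64)
{
    if (!has_ext_i32_i64) {
        /* registers are maintained sign-extended */
        return (tcg_target_ulong)(int32_t)mask;
    } else if (!has_extu_i32_i64) {
        /* registers are maintained zero-extended */
        return (uint32_t)mask;
    }
    /* high bits will be computed by an explicit ext/extu_i32_i64 op */
    return (uint32_t)mask;
}
```

With mask = 0x80000000, the sign-extended convention yields 0xffffffff80000000 while both other branches yield 0x0000000080000000 — either way the high 32 bits are predicted, not marked as garbage.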
diff --git a/tcg/optimize.c b/tcg/optimize.c
index 18b7bc3..d1a0b6d 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -197,19 +197,13 @@ static void tcg_opt_gen_movi(TCGContext *s, TCGOp *op, TCGArg *args,
                             TCGArg dst, TCGArg val)
 {
     TCGOpcode new_op = op_to_movi(op->opc);
-    tcg_target_ulong mask;
 
     op->opc = new_op;
     reset_temp(dst);
     temps[dst].state = TCG_TEMP_CONST;
     temps[dst].val = val;
-    mask = val;
-    if (TCG_TARGET_REG_BITS > 32 && new_op == INDEX_op_mov_i32) {
-        /* High bits of the destination are now garbage.  */
-        mask |= ~0xffffffffull;
-    }
-    temps[dst].mask = mask;
+    temps[dst].mask = val;
 
     args[0] = dst;
     args[1] = val;
@@ -229,17 +223,11 @@ static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg *args,
     }
 
     TCGOpcode new_op = op_to_mov(op->opc);
-    tcg_target_ulong mask;
 
     op->opc = new_op;
     reset_temp(dst);
-    mask = temps[src].mask;
-    if (TCG_TARGET_REG_BITS > 32 && new_op == INDEX_op_mov_i32) {
-        /* High bits of the destination are now garbage.  */
-        mask |= ~0xffffffffull;
-    }
-    temps[dst].mask = mask;
+    temps[dst].mask = temps[src].mask;
 
     assert(temps[src].state != TCG_TEMP_CONST);
@@ -590,7 +578,7 @@ void tcg_optimize(TCGContext *s)
     reset_all_temps(nb_temps);
 
     for (oi = s->gen_first_op_idx; oi >= 0; oi = oi_next) {
-        tcg_target_ulong mask, partmask, affected;
+        tcg_target_ulong mask, affected;
         int nb_oargs, nb_iargs, i;
         TCGArg tmp;
 
@@ -945,17 +933,13 @@ void tcg_optimize(TCGContext *s)
             break;
         }
 
-        /* 32-bit ops generate 32-bit results.  For the result is zero test
-           below, we can ignore high bits, but for further optimizations we
-           need to record that the high bits contain garbage.  */
-        partmask = mask;
+        /* 32-bit ops generate 32-bit results. */
         if (!(def->flags & TCG_OPF_64BIT)) {
-            mask |= ~(tcg_target_ulong)0xffffffffu;
-            partmask &= 0xffffffffu;
+            mask &= 0xffffffffu;
             affected &= 0xffffffffu;
         }
 
-        if (partmask == 0) {
+        if (mask == 0) {
             assert(nb_oargs == 1);
             tcg_opt_gen_movi(s, op, args, args[0], 0);
             continue;
Now that we have real size changing ops, we don't need to mark high
bits of the destination as garbage. The goal of the optimizer is to
predict the value of the temps (and not of the registers) and do
simplifications when possible. The problem there is therefore not the
fact that those bits are not counted as garbage, but that a size
changing op is replaced by a move.

This patch is basically a revert of 24666baf, including the changes that
have been made since then.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 tcg/optimize.c | 28 ++++++----------------------
 1 file changed, 6 insertions(+), 22 deletions(-)
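For reference, the two-mask scheme this patch deletes can be sketched as a small helper (variable names mirror the removed code; this is an illustration of the old behavior, not qemu code):

```c
#include <stdint.h>

typedef uint64_t tcg_target_ulong;

/* The removed code kept two masks after a 32-bit op: 'partmask',
 * restricted to the low 32 bits, drove the result-is-zero test,
 * while 'mask' had the high bits forced on — i.e. recorded as
 * garbage — for all later optimizations. */
static void old_mask_scheme(tcg_target_ulong mask,
                            tcg_target_ulong *out_mask,
                            tcg_target_ulong *out_partmask)
{
    *out_partmask = mask & 0xffffffffu;
    *out_mask = mask | ~(tcg_target_ulong)0xffffffffu;
}
```

So after a 32-bit op whose low 32 bits are provably zero, partmask is 0 (letting the optimizer fold the op to movi 0) while mask still reports the high 32 bits as unknown — which is exactly the double bookkeeping the patch argues becomes unnecessary once size-changing ops are explicit.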