From patchwork Wed Aug 5 08:13:38 2009
X-Patchwork-Submitter: Pablo Virolainen
X-Patchwork-Id: 30783
Message-ID: <4A793F32.4090207@nomovok.com>
Date: Wed, 05 Aug 2009 11:13:38 +0300
From: Pablo Virolainen
To: qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [PATCH] RFC: TCG constant propagation.

Filip Navara wrote:
> Add support for constant propagation to TCG. This has to be paired with the liveness
> analysis to remove the dead code. Not all possible operations are covered, but the
> most common ones are. This improves the code generation for several ARM instructions,
> like MVN (immediate), and it may help other targets as well.

On my small benchmark, qemu-system-sh4 was about 3% slower with this patch on an
Intel Xeon E5405 @ 2.00GHz, running in 64-bit mode. The benchmark builds zlib 1.2.3,
so it is a real-world workload. I ran it several times and the results were
quite consistent.

P.S. I added INDEX_op_*_i64 cases to the evaluation part. I'm not completely sure
whether those "& mask" operations should be there.

Pablo Virolainen

--- qemu-0.11.0-rc1_orig/tcg/tcg.c	2009-07-30 03:38:26.000000000 +0300
+++ qemu-0.11.0-rc1/tcg/tcg.c	2009-08-05 10:43:48.000000000 +0300
@@ -1021,7 +1021,194 @@
 #endif
         tdefs++;
     }
+}
 
+static void tcg_const_analysis(TCGContext *s)
+{
+    int nb_cargs, nb_iargs, nb_oargs, dest, src, src2, del_args, i;
+    TCGArg *args;
+    uint16_t op;
+    uint16_t *opc_ptr;
+    const TCGOpDef *def;
+    uint8_t *const_temps;
+    tcg_target_ulong *temp_values;
+    tcg_target_ulong val, mask;
+    tcg_target_ulong dest_val, src_val, src2_val;
+
+    const_temps = tcg_malloc(s->nb_temps);
+    memset(const_temps, 0, s->nb_temps);
+    temp_values = tcg_malloc(s->nb_temps * sizeof(tcg_target_ulong));
+
+    opc_ptr = gen_opc_buf;
+    args = gen_opparam_buf;
+    while (opc_ptr < gen_opc_ptr) {
+        op = *opc_ptr;
+        def = &tcg_op_defs[op];
+        nb_oargs = def->nb_oargs;
+        nb_iargs = def->nb_iargs;
+        nb_cargs = def->nb_cargs;
+        del_args = 0;
+        mask = ~((tcg_target_ulong)0);
+
+        switch(op) {
+        case INDEX_op_movi_i32:
+#if TCG_TARGET_REG_BITS == 64
+        case INDEX_op_movi_i64:
+#endif
+            dest = args[0];
+            val = args[1];
+            const_temps[dest] = 1;
+            temp_values[dest] = val;
+            break;
+        case INDEX_op_mov_i32:
+#if TCG_TARGET_REG_BITS == 64
+        case INDEX_op_mov_i64:
+#endif
+            dest = args[0];
+            src = args[1];
+            const_temps[dest] = const_temps[src];
+            temp_values[dest] = temp_values[src];
+            break;
+        case INDEX_op_not_i32:
+#if TCG_TARGET_REG_BITS == 64
+            mask = 0xffffffff;
+        case INDEX_op_not_i64:
+#endif
+            dest = args[0];
+            src = args[1];
+            if (const_temps[src]) {
+                const_temps[dest] = 1;
+                dest_val = ~temp_values[src];
+                *opc_ptr = INDEX_op_movi_i32;
+                args[1] = temp_values[dest] = dest_val & mask;
+            } else {
+                const_temps[dest] = 0;
+            }
+            break;
+        case INDEX_op_add_i32:
+        case INDEX_op_sub_i32:
+        case INDEX_op_mul_i32:
+        case INDEX_op_and_i32:
+        case INDEX_op_or_i32:
+        case INDEX_op_xor_i32:
+        case INDEX_op_shl_i32:
+        case INDEX_op_shr_i32:
+#if TCG_TARGET_REG_BITS == 64
+            mask = 0xffffffff;
+        case INDEX_op_add_i64:
+        case INDEX_op_sub_i64:
+        case INDEX_op_mul_i64:
+        case INDEX_op_and_i64:
+        case INDEX_op_or_i64:
+        case INDEX_op_xor_i64:
+        case INDEX_op_shl_i64:
+        case INDEX_op_shr_i64:
+#endif
+
+            dest = args[0];
+            src = args[1];
+            src2 = args[2];
+            if (const_temps[src] && const_temps[src2]) {
+                src_val = temp_values[src];
+                src2_val = temp_values[src2];
+                const_temps[dest] = 1;
+                switch (op) {
+                case INDEX_op_add_i32:
+                    dest_val = src_val + src2_val;
+                    break;
+                case INDEX_op_add_i64:
+                    dest_val = (src_val + src2_val) & mask;
+                    break;
+                case INDEX_op_sub_i32:
+                    dest_val = src_val - src2_val;
+                    break;
+                case INDEX_op_sub_i64:
+                    dest_val = (src_val - src2_val) & mask;
+                    break;
+                case INDEX_op_mul_i32:
+                    dest_val = src_val * src2_val;
+                    break;
+                case INDEX_op_mul_i64:
+                    dest_val = (src_val * src2_val) & mask;
+                    break;
+                case INDEX_op_and_i32:
+                    dest_val = src_val & src2_val;
+                    break;
+                case INDEX_op_and_i64:
+                    dest_val = src_val & src2_val & mask;
+                    break;
+                case INDEX_op_or_i32:
+                    dest_val = src_val | src2_val;
+                    break;
+                case INDEX_op_or_i64:
+                    dest_val = (src_val | src2_val) & mask;
+                    break;
+                case INDEX_op_xor_i32:
+                    dest_val = src_val ^ src2_val;
+                    break;
+                case INDEX_op_xor_i64:
+                    dest_val = (src_val ^ src2_val) & mask;
+                    break;
+                case INDEX_op_shl_i32:
+                    dest_val = src_val << src2_val;
+                    break;
+                case INDEX_op_shl_i64:
+                    dest_val = (src_val << src2_val) & mask;
+                    break;
+                case INDEX_op_shr_i32:
+                    dest_val = src_val >> src2_val;
+                    break;
+                case INDEX_op_shr_i64:
+                    dest_val = (src_val >> src2_val) & mask;
+                    break;
+                default:
+                    fprintf(stderr,"index op %i\n",op);
+                    tcg_abort();
+                    return;
+                }
+                *opc_ptr = INDEX_op_movi_i32;
+                args[1] = temp_values[dest] = dest_val & mask;
+                del_args = 1;
+            } else {
+                const_temps[dest] = 0;
+            }
+            break;
+        case INDEX_op_call:
+            nb_oargs = args[0] >> 16;
+            nb_iargs = args[0] & 0xffff;
+            nb_cargs = def->nb_cargs;
+            args++;
+            for (i = 0; i < nb_oargs; i++) {
+                const_temps[args[i]] = 0;
+            }
+            break;
+        case INDEX_op_nopn:
+            /* variable number of arguments */
+            nb_cargs = args[0];
+            break;
+        case INDEX_op_set_label:
+            memset(const_temps, 0, s->nb_temps);
+            break;
+        default:
+            if (def->flags & TCG_OPF_BB_END) {
+                memset(const_temps, 0, s->nb_temps);
+            } else {
+                for (i = 0; i < nb_oargs; i++) {
+                    const_temps[args[i]] = 0;
+                }
+            }
+            break;
+        }
+        opc_ptr++;
+        args += nb_iargs + nb_oargs + nb_cargs - del_args;
+        if (del_args > 0) {
+            gen_opparam_ptr -= del_args;
+            memmove(args, args + del_args, (gen_opparam_ptr - args) * sizeof(*args));
+        }
+    }
+
+    if (args != gen_opparam_ptr)
+        tcg_abort();
 }
 
 #ifdef USE_LIVENESS_ANALYSIS
@@ -1891,6 +2078,8 @@
     }
 #endif
 
+    tcg_const_analysis(s);
+
 #ifdef CONFIG_PROFILER
     s->la_time -= profile_getclock();
 #endif
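
On the "& mask" question: in the patch the mask is all ones for the _i64
cases, so masking there is a no-op, while for the _i32 cases on a 64-bit host
the final "dest_val & mask" is what truncates the folded value to 32 bits.
The standalone sketch below illustrates that idea outside of TCG. It is only
an illustration under assumptions: BinOp, the OP_* names and fold_binop() are
hypothetical and not part of the QEMU API, values are held in a uint64_t to
mimic a 64-bit host, and refusing to fold out-of-range shift counts is a
choice made here, not something the patch does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical opcode set for illustration; not TCG's INDEX_op_* values. */
typedef enum {
    OP_ADD, OP_SUB, OP_MUL, OP_AND, OP_OR, OP_XOR, OP_SHL, OP_SHR
} BinOp;

/*
 * Fold "a op b" for an operation that is `bits` wide (32 or 64), with the
 * operands held in a 64-bit word. Returns false when the operation is not
 * folded (here: shift counts >= bits, which would be undefined in C).
 */
bool fold_binop(BinOp op, unsigned bits, uint64_t a, uint64_t b, uint64_t *out)
{
    assert(bits == 32 || bits == 64);
    const uint64_t mask = (bits == 32) ? UINT64_C(0xffffffff) : ~UINT64_C(0);
    uint64_t r;

    /* Only the low `bits` bits of each operand are meaningful. */
    a &= mask;
    b &= mask;

    switch (op) {
    case OP_ADD: r = a + b; break;
    case OP_SUB: r = a - b; break;
    case OP_MUL: r = a * b; break;
    case OP_AND: r = a & b; break;
    case OP_OR:  r = a | b; break;
    case OP_XOR: r = a ^ b; break;
    case OP_SHL:
        if (b >= bits) {
            return false;
        }
        r = a << b;
        break;
    case OP_SHR:
        if (b >= bits) {
            return false;
        }
        r = a >> b;
        break;
    default:
        return false;
    }

    /* Truncate to the operation width; a no-op when bits == 64. */
    *out = r & mask;
    return true;
}
```

With this sketch, folding a 32-bit add of 0xffffffff and 1 yields 0, while
the same operands with a 64-bit add yield 0x100000000, which is the
difference the single mask at the end is responsible for.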