From patchwork Mon Oct 10 19:13:45 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lucas Mateus Martins Araujo e Castro X-Patchwork-Id: 1688229 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=nongnu.org (client-ip=209.51.188.17; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4MmTFb5PC3z20d4 for ; Tue, 11 Oct 2022 06:17:39 +1100 (AEDT) Received: from localhost ([::1]:49706 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1ohyHR-0001ks-8u for incoming@patchwork.ozlabs.org; Mon, 10 Oct 2022 15:17:37 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:33222) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1ohyF5-00014l-0m; Mon, 10 Oct 2022 15:15:11 -0400 Received: from [200.168.210.66] (port=19040 helo=outlook.eldorado.org.br) by eggs.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1ohyF2-0003Ag-Sx; Mon, 10 Oct 2022 15:15:10 -0400 Received: from p9ibm ([10.10.71.235]) by outlook.eldorado.org.br over TLS secured channel with Microsoft SMTPSVC(8.5.9600.16384); Mon, 10 Oct 2022 16:13:58 -0300 Received: from eldorado.org.br (unknown [10.10.70.45]) by p9ibm (Postfix) with ESMTP id 83461800278; Mon, 10 Oct 2022 16:13:58 -0300 (-03) From: "Lucas Mateus Castro(alqotel)" To: qemu-devel@nongnu.org, qemu-ppc@nongnu.org Cc: richard.henderson@linaro.org, Daniel Henrique Barboza , "Lucas Mateus Castro (alqotel)" , =?utf-8?q?C?= =?utf-8?q?=C3=A9dric_Le_Goater?= , David Gibson , Greg Kurz Subject: [PATCH v2 01/12] target/ppc: Moved VMLADDUHM to decodetree and use gvec Date: Mon, 10 Oct 2022 16:13:45 -0300 Message-Id: <20221010191356.83659-2-lucas.araujo@eldorado.org.br> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20221010191356.83659-1-lucas.araujo@eldorado.org.br> References: <20221010191356.83659-1-lucas.araujo@eldorado.org.br> MIME-Version: 1.0 X-OriginalArrivalTime: 10 Oct 2022 19:13:58.0796 (UTC) FILETIME=[730394C0:01D8DCDC] X-Host-Lookup-Failed: Reverse DNS lookup failed for 200.168.210.66 (failed) Received-SPF: pass client-ip=200.168.210.66; envelope-from=lucas.araujo@eldorado.org.br; helo=outlook.eldorado.org.br X-Spam_score_int: -10 X-Spam_score: -1.1 X-Spam_bar: - X-Spam_report: (-1.1 / 5.0 requ) BAYES_00=-1.9, RDNS_NONE=0.793, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=no autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: "Qemu-devel" From: "Lucas Mateus Castro (alqotel)" This patch moves VMLADDUHM to decodetree a creates a gvec implementation using mul_vec and add_vec. rept loop master patch 8 12500 0,01810500 0,00903100 (-50.1%) 25 4000 0,01739400 0,00747700 (-57.0%) 100 1000 0,01843600 0,00901400 (-51.1%) 500 200 0,02574600 0,01971000 (-23.4%) 2500 40 0,05921600 0,07121800 (+20.3%) 8000 12 0,15326700 0,21725200 (+41.7%) The significant difference in performance when REPT is low and LOOP is high I think is due to the fact that the new implementation has a higher translation time, as when using a helper only 5 TCGop are used but with the patch a total of 10 TCGop are needed (Power lacks a direct mul_vec equivalent so this instruction is implemented with the help of 5 others, vmuleu, vmulou, vmrgh, vmrgl and vpkum). Signed-off-by: Lucas Mateus Castro (alqotel) Reviewed-by: Richard Henderson --- target/ppc/helper.h | 2 +- target/ppc/insn32.decode | 2 ++ target/ppc/int_helper.c | 3 +- target/ppc/translate.c | 1 - target/ppc/translate/vmx-impl.c.inc | 48 ++++++++++++++++++----------- 5 files changed, 35 insertions(+), 21 deletions(-) diff --git a/target/ppc/helper.h b/target/ppc/helper.h index 57eee07256..9c562ab00e 100644 --- a/target/ppc/helper.h +++ b/target/ppc/helper.h @@ -264,7 +264,7 @@ DEF_HELPER_FLAGS_4(VMSUMUHM, TCG_CALL_NO_RWG, void, avr, avr, avr, avr) DEF_HELPER_5(VMSUMUHS, void, env, avr, avr, avr, avr) DEF_HELPER_FLAGS_4(VMSUMSHM, TCG_CALL_NO_RWG, void, avr, avr, avr, avr) DEF_HELPER_5(VMSUMSHS, void, env, avr, avr, avr, avr) -DEF_HELPER_FLAGS_4(vmladduhm, TCG_CALL_NO_RWG, void, avr, avr, avr, avr) +DEF_HELPER_FLAGS_5(VMLADDUHM, TCG_CALL_NO_RWG, void, avr, avr, avr, avr, i32) DEF_HELPER_FLAGS_2(mtvscr, TCG_CALL_NO_RWG, void, env, i32) DEF_HELPER_FLAGS_1(mfvscr, TCG_CALL_NO_RWG, i32, env) DEF_HELPER_3(lvebx, void, env, avr, tl) diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode index a5249ee32c..7445455a12 100644 --- a/target/ppc/insn32.decode +++ b/target/ppc/insn32.decode @@ -693,6 +693,8 @@ VMSUMUHS 000100 ..... ..... ..... ..... 100111 @VA VMSUMCUD 000100 ..... ..... ..... ..... 010111 @VA VMSUMUDM 000100 ..... ..... ..... ..... 100011 @VA +VMLADDUHM 000100 ..... ..... ..... ..... 100010 @VA + ## Vector String Instructions VSTRIBL 000100 ..... 00000 ..... . 0000001101 @VX_tb_rc diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c index 696096100b..0d25000b2a 100644 --- a/target/ppc/int_helper.c +++ b/target/ppc/int_helper.c @@ -974,7 +974,8 @@ void helper_vmhraddshs(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *a, } } -void helper_vmladduhm(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b, ppc_avr_t *c) +void helper_VMLADDUHM(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b, ppc_avr_t *c, + uint32_t v) { int i; diff --git a/target/ppc/translate.c b/target/ppc/translate.c index e810842925..11f729c60c 100644 --- a/target/ppc/translate.c +++ b/target/ppc/translate.c @@ -6921,7 +6921,6 @@ GEN_HANDLER(lvsl, 0x1f, 0x06, 0x00, 0x00000001, PPC_ALTIVEC), GEN_HANDLER(lvsr, 0x1f, 0x06, 0x01, 0x00000001, PPC_ALTIVEC), GEN_HANDLER(mfvscr, 0x04, 0x2, 0x18, 0x001ff800, PPC_ALTIVEC), GEN_HANDLER(mtvscr, 0x04, 0x2, 0x19, 0x03ff0000, PPC_ALTIVEC), -GEN_HANDLER(vmladduhm, 0x04, 0x11, 0xFF, 0x00000000, PPC_ALTIVEC), #if defined(TARGET_PPC64) GEN_HANDLER_E(maddhd_maddhdu, 0x04, 0x18, 0xFF, 0x00000000, PPC_NONE, PPC2_ISA300), diff --git a/target/ppc/translate/vmx-impl.c.inc b/target/ppc/translate/vmx-impl.c.inc index e644ad3236..9f18c6d4f2 100644 --- a/target/ppc/translate/vmx-impl.c.inc +++ b/target/ppc/translate/vmx-impl.c.inc @@ -2523,24 +2523,6 @@ static void glue(gen_, name0##_##name1)(DisasContext *ctx) \ GEN_VAFORM_PAIRED(vmhaddshs, vmhraddshs, 16) -static void gen_vmladduhm(DisasContext *ctx) -{ - TCGv_ptr ra, rb, rc, rd; - if (unlikely(!ctx->altivec_enabled)) { - gen_exception(ctx, POWERPC_EXCP_VPU); - return; - } - ra = gen_avr_ptr(rA(ctx->opcode)); - rb = gen_avr_ptr(rB(ctx->opcode)); - rc = gen_avr_ptr(rC(ctx->opcode)); - rd = gen_avr_ptr(rD(ctx->opcode)); - gen_helper_vmladduhm(rd, ra, rb, rc); - tcg_temp_free_ptr(ra); - tcg_temp_free_ptr(rb); - tcg_temp_free_ptr(rc); - tcg_temp_free_ptr(rd); -} - static bool do_va_helper(DisasContext *ctx, arg_VA *a, void (*gen_helper)(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_ptr)) { @@ -2569,6 +2551,36 @@ TRANS_FLAGS2(ALTIVEC_207, VSUBECUQ, do_va_helper, gen_helper_VSUBECUQ) TRANS_FLAGS(ALTIVEC, VPERM, do_va_helper, gen_helper_VPERM) TRANS_FLAGS2(ISA300, VPERMR, do_va_helper, gen_helper_VPERMR) +static void gen_vmladduhm_vec(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b, + TCGv_vec c) +{ + tcg_gen_mul_vec(vece, t, a, b); + tcg_gen_add_vec(vece, t, t, c); +} + +static bool trans_VMLADDUHM(DisasContext *ctx, arg_VA *a) +{ + static const TCGOpcode vecop_list[] = { + INDEX_op_add_vec, INDEX_op_mul_vec, 0 + }; + + static const GVecGen4 op = { + .fno = gen_helper_VMLADDUHM, + .fniv = gen_vmladduhm_vec, + .opt_opc = vecop_list, + .vece = MO_16 + }; + + REQUIRE_INSNS_FLAGS(ctx, ALTIVEC); + REQUIRE_VECTOR(ctx); + + tcg_gen_gvec_4(avr_full_offset(a->vrt), avr_full_offset(a->vra), + avr_full_offset(a->vrb), avr_full_offset(a->rc), + 16, 16, &op); + + return true; +} + static bool trans_VSEL(DisasContext *ctx, arg_VA *a) { REQUIRE_INSNS_FLAGS(ctx, ALTIVEC); From patchwork Mon Oct 10 19:13:46 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lucas Mateus Martins Araujo e Castro X-Patchwork-Id: 1688231 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=nongnu.org (client-ip=209.51.188.17; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4MmTHC75k4z20d4 for ; Tue, 11 Oct 2022 06:19:03 +1100 (AEDT) Received: from localhost ([::1]:49708 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1ohyIn-0001qD-RA for incoming@patchwork.ozlabs.org; Mon, 10 Oct 2022 15:19:01 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:33224) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1ohyF7-0001BB-Kz; Mon, 10 Oct 2022 15:15:13 -0400 Received: from [200.168.210.66] (port=19040 helo=outlook.eldorado.org.br) by eggs.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1ohyF5-0003Ag-Re; Mon, 10 Oct 2022 15:15:13 -0400 Received: from p9ibm ([10.10.71.235]) by outlook.eldorado.org.br over TLS secured channel with Microsoft SMTPSVC(8.5.9600.16384); Mon, 10 Oct 2022 16:13:59 -0300 Received: from eldorado.org.br (unknown [10.10.70.45]) by p9ibm (Postfix) with ESMTP id CC26A800278; Mon, 10 Oct 2022 16:13:58 -0300 (-03) From: "Lucas Mateus Castro(alqotel)" To: qemu-devel@nongnu.org, qemu-ppc@nongnu.org Cc: richard.henderson@linaro.org, Daniel Henrique Barboza , "Lucas Mateus Castro (alqotel)" , =?utf-8?q?C?= =?utf-8?q?=C3=A9dric_Le_Goater?= , David Gibson , Greg Kurz Subject: [PATCH v2 02/12] target/ppc: Move VMH[R]ADDSHS instruction to decodetree Date: Mon, 10 Oct 2022 16:13:46 -0300 Message-Id: <20221010191356.83659-3-lucas.araujo@eldorado.org.br> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20221010191356.83659-1-lucas.araujo@eldorado.org.br> References: <20221010191356.83659-1-lucas.araujo@eldorado.org.br> MIME-Version: 1.0 X-OriginalArrivalTime: 10 Oct 2022 19:13:59.0046 (UTC) FILETIME=[7329BA60:01D8DCDC] X-Host-Lookup-Failed: Reverse DNS lookup failed for 200.168.210.66 (failed) Received-SPF: pass client-ip=200.168.210.66; envelope-from=lucas.araujo@eldorado.org.br; helo=outlook.eldorado.org.br X-Spam_score_int: -10 X-Spam_score: -1.1 X-Spam_bar: - X-Spam_report: (-1.1 / 5.0 requ) BAYES_00=-1.9, RDNS_NONE=0.793, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=no autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: "Qemu-devel" From: "Lucas Mateus Castro (alqotel)" This patch moves VMHADDSHS and VMHRADDSHS to decodetree I couldn't find a satisfactory implementation with TCG inline. vmhaddshs: rept loop master patch 8 12500 0,02983400 0,02648500 (-11.2%) 25 4000 0,02946000 0,02518000 (-14.5%) 100 1000 0,03104300 0,02638000 (-15.0%) 500 200 0,04002000 0,03502500 (-12.5%) 2500 40 0,08090100 0,07562200 (-6.5%) 8000 12 0,19242600 0,18626800 (-3.2%) vmhraddshs: rept loop master patch 8 12500 0,03078600 0,02851000 (-7.4%) 25 4000 0,02793200 0,02746900 (-1.7%) 100 1000 0,02886000 0,02839900 (-1.6%) 500 200 0,03714700 0,03799200 (+2.3%) 2500 40 0,07948000 0,07852200 (-1.2%) 8000 12 0,19049800 0,18813900 (-1.2%) Signed-off-by: Lucas Mateus Castro (alqotel) Reviewed-by: Richard Henderson --- target/ppc/helper.h | 4 ++-- target/ppc/insn32.decode | 2 ++ target/ppc/int_helper.c | 4 ++-- target/ppc/translate/vmx-impl.c.inc | 5 +++-- target/ppc/translate/vmx-ops.c.inc | 1 - 5 files changed, 9 insertions(+), 7 deletions(-) diff --git a/target/ppc/helper.h b/target/ppc/helper.h index 9c562ab00e..f02a9497b7 100644 --- a/target/ppc/helper.h +++ b/target/ppc/helper.h @@ -258,8 +258,8 @@ DEF_HELPER_4(vpkuhum, void, env, avr, avr, avr) DEF_HELPER_4(vpkuwum, void, env, avr, avr, avr) DEF_HELPER_4(vpkudum, void, env, avr, avr, avr) DEF_HELPER_FLAGS_3(vpkpx, TCG_CALL_NO_RWG, void, avr, avr, avr) -DEF_HELPER_5(vmhaddshs, void, env, avr, avr, avr, avr) -DEF_HELPER_5(vmhraddshs, void, env, avr, avr, avr, avr) +DEF_HELPER_5(VMHADDSHS, void, env, avr, avr, avr, avr) +DEF_HELPER_5(VMHRADDSHS, void, env, avr, avr, avr, avr) DEF_HELPER_FLAGS_4(VMSUMUHM, TCG_CALL_NO_RWG, void, avr, avr, avr, avr) DEF_HELPER_5(VMSUMUHS, void, env, avr, avr, avr, avr) DEF_HELPER_FLAGS_4(VMSUMSHM, TCG_CALL_NO_RWG, void, avr, avr, avr, avr) diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode index 7445455a12..9a509e84df 100644 --- a/target/ppc/insn32.decode +++ b/target/ppc/insn32.decode @@ -694,6 +694,8 @@ VMSUMCUD 000100 ..... ..... ..... ..... 010111 @VA VMSUMUDM 000100 ..... ..... ..... ..... 100011 @VA VMLADDUHM 000100 ..... ..... ..... ..... 100010 @VA +VMHADDSHS 000100 ..... ..... ..... ..... 100000 @VA +VMHRADDSHS 000100 ..... ..... ..... ..... 100001 @VA ## Vector String Instructions diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c index 0d25000b2a..ae1ba8084d 100644 --- a/target/ppc/int_helper.c +++ b/target/ppc/int_helper.c @@ -939,7 +939,7 @@ target_ulong helper_vctzlsbb(ppc_avr_t *r) return count; } -void helper_vmhaddshs(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *a, +void helper_VMHADDSHS(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b, ppc_avr_t *c) { int sat = 0; @@ -957,7 +957,7 @@ void helper_vmhaddshs(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *a, } } -void helper_vmhraddshs(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *a, +void helper_VMHRADDSHS(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b, ppc_avr_t *c) { int sat = 0; diff --git a/target/ppc/translate/vmx-impl.c.inc b/target/ppc/translate/vmx-impl.c.inc index 9f18c6d4f2..3acd585a2f 100644 --- a/target/ppc/translate/vmx-impl.c.inc +++ b/target/ppc/translate/vmx-impl.c.inc @@ -2521,7 +2521,7 @@ static void glue(gen_, name0##_##name1)(DisasContext *ctx) \ tcg_temp_free_ptr(rd); \ } -GEN_VAFORM_PAIRED(vmhaddshs, vmhraddshs, 16) +GEN_VAFORM_PAIRED(vmaddfp, vnmsubfp, 23) static bool do_va_helper(DisasContext *ctx, arg_VA *a, void (*gen_helper)(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_ptr)) @@ -2620,7 +2620,8 @@ static bool do_va_env_helper(DisasContext *ctx, arg_VA *a, TRANS_FLAGS(ALTIVEC, VMSUMUHS, do_va_env_helper, gen_helper_VMSUMUHS) TRANS_FLAGS(ALTIVEC, VMSUMSHS, do_va_env_helper, gen_helper_VMSUMSHS) -GEN_VAFORM_PAIRED(vmaddfp, vnmsubfp, 23) +TRANS_FLAGS(ALTIVEC, VMHADDSHS, do_va_env_helper, gen_helper_VMHADDSHS) +TRANS_FLAGS(ALTIVEC, VMHRADDSHS, do_va_env_helper, gen_helper_VMHRADDSHS) GEN_VXFORM_NOA(vclzb, 1, 28) GEN_VXFORM_NOA(vclzh, 1, 29) diff --git a/target/ppc/translate/vmx-ops.c.inc b/target/ppc/translate/vmx-ops.c.inc index a3a0fd0650..7cd9d40e06 100644 --- a/target/ppc/translate/vmx-ops.c.inc +++ b/target/ppc/translate/vmx-ops.c.inc @@ -219,7 +219,6 @@ GEN_VXFORM_UIMM(vctsxs, 5, 15), #define GEN_VAFORM_PAIRED(name0, name1, opc2) \ GEN_HANDLER(name0##_##name1, 0x04, opc2, 0xFF, 0x00000000, PPC_ALTIVEC) -GEN_VAFORM_PAIRED(vmhaddshs, vmhraddshs, 16), GEN_VAFORM_PAIRED(vmaddfp, vnmsubfp, 23), GEN_VXFORM_DUAL(vclzb, vpopcntb, 1, 28, PPC_NONE, PPC2_ALTIVEC_207), From patchwork Mon Oct 10 19:13:47 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lucas Mateus Martins Araujo e Castro X-Patchwork-Id: 1688233 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=nongnu.org (client-ip=209.51.188.17; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4MmTNB49f5z23jn for ; Tue, 11 Oct 2022 06:23:22 +1100 (AEDT) Received: from localhost ([::1]:53604 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1ohyMy-0007h6-FS for incoming@patchwork.ozlabs.org; Mon, 10 Oct 2022 15:23:20 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:46574) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1ohyFA-0001Hl-Np; Mon, 10 Oct 2022 15:15:16 -0400 Received: from [200.168.210.66] (port=19040 helo=outlook.eldorado.org.br) by eggs.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1ohyF8-0003Ag-N7; Mon, 10 Oct 2022 15:15:16 -0400 Received: from p9ibm ([10.10.71.235]) by outlook.eldorado.org.br over TLS secured channel with Microsoft SMTPSVC(8.5.9600.16384); Mon, 10 Oct 2022 16:13:59 -0300 Received: from eldorado.org.br (unknown [10.10.70.45]) by p9ibm (Postfix) with ESMTP id 1AAC3800278; Mon, 10 Oct 2022 16:13:59 -0300 (-03) From: "Lucas Mateus Castro(alqotel)" To: qemu-devel@nongnu.org, qemu-ppc@nongnu.org Cc: richard.henderson@linaro.org, Daniel Henrique Barboza , "Lucas Mateus Castro (alqotel)" , =?utf-8?q?C?= =?utf-8?q?=C3=A9dric_Le_Goater?= , David Gibson , Greg Kurz Subject: [PATCH v2 03/12] target/ppc: Move V(ADD|SUB)CUW to decodetree and use gvec Date: Mon, 10 Oct 2022 16:13:47 -0300 Message-Id: <20221010191356.83659-4-lucas.araujo@eldorado.org.br> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20221010191356.83659-1-lucas.araujo@eldorado.org.br> References: <20221010191356.83659-1-lucas.araujo@eldorado.org.br> MIME-Version: 1.0 X-OriginalArrivalTime: 10 Oct 2022 19:13:59.0374 (UTC) FILETIME=[735BC6E0:01D8DCDC] X-Host-Lookup-Failed: Reverse DNS lookup failed for 200.168.210.66 (failed) Received-SPF: pass client-ip=200.168.210.66; envelope-from=lucas.araujo@eldorado.org.br; helo=outlook.eldorado.org.br X-Spam_score_int: -10 X-Spam_score: -1.1 X-Spam_bar: - X-Spam_report: (-1.1 / 5.0 requ) BAYES_00=-1.9, RDNS_NONE=0.793, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=no autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: "Qemu-devel" From: "Lucas Mateus Castro (alqotel)" This patch moves VADDCUW and VSUBCUW to decodtree with gvec using an implementation based on the helper, with the main difference being changing the -1 (aka all bits set to 1) result returned by cmp when true to +1. It also implemented a .fni4 version of those instructions and dropped the helper. vaddcuw: rept loop master patch 8 12500 0,01008200 0,00612400 (-39.3%) 25 4000 0,01091500 0,00471600 (-56.8%) 100 1000 0,01332500 0,00593700 (-55.4%) 500 200 0,01998500 0,01275700 (-36.2%) 2500 40 0,04704300 0,04364300 (-7.2%) 8000 12 0,10748200 0,11241000 (+4.6%) vsubcuw: rept loop master patch 8 12500 0,01226200 0,00571600 (-53.4%) 25 4000 0,01493500 0,00462100 (-69.1%) 100 1000 0,01522700 0,00455100 (-70.1%) 500 200 0,02384600 0,01133500 (-52.5%) 2500 40 0,04935200 0,03178100 (-35.6%) 8000 12 0,09039900 0,09440600 (+4.4%) Overall there was a gain in performance, but the TCGop code was still slightly bigger in the new version (it went from 4 to 5). Signed-off-by: Lucas Mateus Castro (alqotel) Reviewed-by: Richard Henderson --- target/ppc/helper.h | 2 - target/ppc/insn32.decode | 2 + target/ppc/int_helper.c | 18 --------- target/ppc/translate/vmx-impl.c.inc | 61 +++++++++++++++++++++++++++-- target/ppc/translate/vmx-ops.c.inc | 3 +- 5 files changed, 60 insertions(+), 26 deletions(-) diff --git a/target/ppc/helper.h b/target/ppc/helper.h index f02a9497b7..f7047ed2aa 100644 --- a/target/ppc/helper.h +++ b/target/ppc/helper.h @@ -193,11 +193,9 @@ DEF_HELPER_FLAGS_3(vslo, TCG_CALL_NO_RWG, void, avr, avr, avr) DEF_HELPER_FLAGS_3(vsro, TCG_CALL_NO_RWG, void, avr, avr, avr) DEF_HELPER_FLAGS_3(vsrv, TCG_CALL_NO_RWG, void, avr, avr, avr) DEF_HELPER_FLAGS_3(vslv, TCG_CALL_NO_RWG, void, avr, avr, avr) -DEF_HELPER_FLAGS_3(vaddcuw, TCG_CALL_NO_RWG, void, avr, avr, avr) DEF_HELPER_FLAGS_2(vprtybw, TCG_CALL_NO_RWG, void, avr, avr) DEF_HELPER_FLAGS_2(vprtybd, TCG_CALL_NO_RWG, void, avr, avr) DEF_HELPER_FLAGS_2(vprtybq, TCG_CALL_NO_RWG, void, avr, avr) -DEF_HELPER_FLAGS_3(vsubcuw, TCG_CALL_NO_RWG, void, avr, avr, avr) DEF_HELPER_FLAGS_5(vaddsbs, TCG_CALL_NO_RWG, void, avr, avr, avr, avr, i32) DEF_HELPER_FLAGS_5(vaddshs, TCG_CALL_NO_RWG, void, avr, avr, avr, avr, i32) DEF_HELPER_FLAGS_5(vaddsws, TCG_CALL_NO_RWG, void, avr, avr, avr, avr, i32) diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode index 9a509e84df..aebc7b73c8 100644 --- a/target/ppc/insn32.decode +++ b/target/ppc/insn32.decode @@ -608,12 +608,14 @@ VRLQNM 000100 ..... ..... ..... 00101000101 @VX ## Vector Integer Arithmetic Instructions +VADDCUW 000100 ..... ..... ..... 00110000000 @VX VADDCUQ 000100 ..... ..... ..... 00101000000 @VX VADDUQM 000100 ..... ..... ..... 00100000000 @VX VADDEUQM 000100 ..... ..... ..... ..... 111100 @VA VADDECUQ 000100 ..... ..... ..... ..... 111101 @VA +VSUBCUW 000100 ..... ..... ..... 10110000000 @VX VSUBCUQ 000100 ..... ..... ..... 10101000000 @VX VSUBUQM 000100 ..... ..... ..... 10100000000 @VX diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c index ae1ba8084d..f8dd12e8ae 100644 --- a/target/ppc/int_helper.c +++ b/target/ppc/int_helper.c @@ -492,15 +492,6 @@ static inline void set_vscr_sat(CPUPPCState *env) env->vscr_sat.u32[0] = 1; } -void helper_vaddcuw(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b) -{ - int i; - - for (i = 0; i < ARRAY_SIZE(r->u32); i++) { - r->u32[i] = ~a->u32[i] < b->u32[i]; - } -} - /* vprtybw */ void helper_vprtybw(ppc_avr_t *r, ppc_avr_t *b) { @@ -1962,15 +1953,6 @@ void helper_vsro(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b) #endif } -void helper_vsubcuw(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b) -{ - int i; - - for (i = 0; i < ARRAY_SIZE(r->u32); i++) { - r->u32[i] = a->u32[i] >= b->u32[i]; - } -} - void helper_vsumsws(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b) { int64_t t; diff --git a/target/ppc/translate/vmx-impl.c.inc b/target/ppc/translate/vmx-impl.c.inc index 3acd585a2f..f52485a5f1 100644 --- a/target/ppc/translate/vmx-impl.c.inc +++ b/target/ppc/translate/vmx-impl.c.inc @@ -803,8 +803,6 @@ GEN_VXFORM(vsrv, 2, 28); GEN_VXFORM(vslv, 2, 29); GEN_VXFORM(vslo, 6, 16); GEN_VXFORM(vsro, 6, 17); -GEN_VXFORM(vaddcuw, 0, 6); -GEN_VXFORM(vsubcuw, 0, 22); static bool do_vector_gvec3_VX(DisasContext *ctx, arg_VX *a, int vece, void (*gen_gvec)(unsigned, uint32_t, uint32_t, @@ -2847,8 +2845,6 @@ static void gen_xpnd04_2(DisasContext *ctx) } -GEN_VXFORM_DUAL(vsubcuw, PPC_ALTIVEC, PPC_NONE, \ - xpnd04_1, PPC_NONE, PPC2_ISA300) GEN_VXFORM_DUAL(vsubsws, PPC_ALTIVEC, PPC_NONE, \ xpnd04_2, PPC_NONE, PPC2_ISA300) @@ -3110,6 +3106,63 @@ TRANS_FLAGS2(ALTIVEC_207, VPMSUMD, do_vx_helper, gen_helper_VPMSUMD) TRANS_FLAGS2(ALTIVEC_207, VSUBCUQ, do_vx_helper, gen_helper_VSUBCUQ) TRANS_FLAGS2(ALTIVEC_207, VSUBUQM, do_vx_helper, gen_helper_VSUBUQM) +static void gen_VADDCUW_vec(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b) +{ + tcg_gen_not_vec(vece, a, a); + tcg_gen_cmp_vec(TCG_COND_LTU, vece, t, a, b); + tcg_gen_and_vec(vece, t, t, tcg_constant_vec_matching(t, vece, 1)); +} + +static void gen_VADDCUW_i32(TCGv_i32 t, TCGv_i32 a, TCGv_i32 b) +{ + tcg_gen_not_i32(a, a); + tcg_gen_setcond_i32(TCG_COND_LTU, t, a, b); +} + +static void gen_VSUBCUW_vec(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b) +{ + tcg_gen_cmp_vec(TCG_COND_GEU, vece, t, a, b); + tcg_gen_and_vec(vece, t, t, tcg_constant_vec_matching(t, vece, 1)); +} + +static void gen_VSUBCUW_i32(TCGv_i32 t, TCGv_i32 a, TCGv_i32 b) +{ + tcg_gen_setcond_i32(TCG_COND_GEU, t, a, b); +} + +static bool do_vx_vaddsubcuw(DisasContext *ctx, arg_VX *a, int add) +{ + static const TCGOpcode vecop_list[] = { + INDEX_op_cmp_vec, 0 + }; + + static const GVecGen3 op[] = { + { + .fniv = gen_VSUBCUW_vec, + .fni4 = gen_VSUBCUW_i32, + .opt_opc = vecop_list, + .vece = MO_32 + }, + { + .fniv = gen_VADDCUW_vec, + .fni4 = gen_VADDCUW_i32, + .opt_opc = vecop_list, + .vece = MO_32 + }, + }; + + REQUIRE_INSNS_FLAGS(ctx, ALTIVEC); + REQUIRE_VECTOR(ctx); + + tcg_gen_gvec_3(avr_full_offset(a->vrt), avr_full_offset(a->vra), + avr_full_offset(a->vrb), 16, 16, &op[add]); + + return true; +} + +TRANS(VSUBCUW, do_vx_vaddsubcuw, 0) +TRANS(VADDCUW, do_vx_vaddsubcuw, 1) + static bool do_vx_vmuleo(DisasContext *ctx, arg_VX *a, bool even, void (*gen_mul)(TCGv_i64, TCGv_i64, TCGv_i64, TCGv_i64)) { diff --git a/target/ppc/translate/vmx-ops.c.inc b/target/ppc/translate/vmx-ops.c.inc index 7cd9d40e06..ded0234123 100644 --- a/target/ppc/translate/vmx-ops.c.inc +++ b/target/ppc/translate/vmx-ops.c.inc @@ -106,12 +106,11 @@ GEN_VXFORM_300(vsrv, 2, 28), GEN_VXFORM_300(vslv, 2, 29), GEN_VXFORM(vslo, 6, 16), GEN_VXFORM(vsro, 6, 17), -GEN_VXFORM(vaddcuw, 0, 6), GEN_HANDLER_E_2(vprtybw, 0x4, 0x1, 0x18, 8, 0, PPC_NONE, PPC2_ISA300), GEN_HANDLER_E_2(vprtybd, 0x4, 0x1, 0x18, 9, 0, PPC_NONE, PPC2_ISA300), GEN_HANDLER_E_2(vprtybq, 0x4, 0x1, 0x18, 10, 0, PPC_NONE, PPC2_ISA300), -GEN_VXFORM_DUAL(vsubcuw, xpnd04_1, 0, 22, PPC_ALTIVEC, PPC_NONE), +GEN_VXFORM(xpnd04_1, 0, 22), GEN_VXFORM_300(bcdsr, 0, 23), GEN_VXFORM_300(bcdsr, 0, 31), GEN_VXFORM_DUAL(vaddubs, vmul10uq, 0, 8, PPC_ALTIVEC, PPC_NONE), From patchwork Mon Oct 10 19:13:48 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lucas Mateus Martins Araujo e Castro X-Patchwork-Id: 1688249 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=nongnu.org (client-ip=209.51.188.17; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4MmTmk5WMDz1yqk for ; Tue, 11 Oct 2022 06:41:10 +1100 (AEDT) Received: from localhost ([::1]:54270 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1ohyeC-0006IU-GR for incoming@patchwork.ozlabs.org; Mon, 10 Oct 2022 15:41:08 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:46576) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1ohyFD-0001LK-Jx; Mon, 10 Oct 2022 15:15:19 -0400 Received: from [200.168.210.66] (port=19040 helo=outlook.eldorado.org.br) by eggs.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1ohyFB-0003Ag-RO; Mon, 10 Oct 2022 15:15:19 -0400 Received: from p9ibm ([10.10.71.235]) by outlook.eldorado.org.br over TLS secured channel with Microsoft SMTPSVC(8.5.9600.16384); Mon, 10 Oct 2022 16:13:59 -0300 Received: from eldorado.org.br (unknown [10.10.70.45]) by p9ibm (Postfix) with ESMTP id 65E43800278; Mon, 10 Oct 2022 16:13:59 -0300 (-03) From: "Lucas Mateus Castro(alqotel)" To: qemu-devel@nongnu.org, qemu-ppc@nongnu.org Cc: richard.henderson@linaro.org, Daniel Henrique Barboza , "Lucas Mateus Castro (alqotel)" , =?utf-8?q?C?= =?utf-8?q?=C3=A9dric_Le_Goater?= , David Gibson , Greg Kurz Subject: [PATCH v2 04/12] target/ppc: Move VNEG[WD] to decodtree and use gvec Date: Mon, 10 Oct 2022 16:13:48 -0300 Message-Id: <20221010191356.83659-5-lucas.araujo@eldorado.org.br> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20221010191356.83659-1-lucas.araujo@eldorado.org.br> References: <20221010191356.83659-1-lucas.araujo@eldorado.org.br> MIME-Version: 1.0 X-OriginalArrivalTime: 10 Oct 2022 19:13:59.0655 (UTC) FILETIME=[7386A770:01D8DCDC] X-Host-Lookup-Failed: Reverse DNS lookup failed for 200.168.210.66 (failed) Received-SPF: pass client-ip=200.168.210.66; envelope-from=lucas.araujo@eldorado.org.br; helo=outlook.eldorado.org.br X-Spam_score_int: -10 X-Spam_score: -1.1 X-Spam_bar: - X-Spam_report: (-1.1 / 5.0 requ) BAYES_00=-1.9, RDNS_NONE=0.793, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=no autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: "Qemu-devel" From: "Lucas Mateus Castro (alqotel)" Moved the instructions VNEGW and VNEGD to decodetree and used gvec to decode it. vnegw: rept loop master patch 8 12500 0,01053200 0,00548400 (-47.9%) 25 4000 0,01030500 0,00390000 (-62.2%) 100 1000 0,01096300 0,00395400 (-63.9%) 500 200 0,01472000 0,00712300 (-51.6%) 2500 40 0,03809000 0,02147700 (-43.6%) 8000 12 0,09957100 0,06202100 (-37.7%) vnegd: rept loop master patch 8 12500 0,00594600 0,00543800 (-8.5%) 25 4000 0,00575200 0,00396400 (-31.1%) 100 1000 0,00676100 0,00394800 (-41.6%) 500 200 0,01149300 0,00709400 (-38.3%) 2500 40 0,03441500 0,02169600 (-37.0%) 8000 12 0,09516900 0,06337000 (-33.4%) Signed-off-by: Lucas Mateus Castro (alqotel) Reviewed-by: Richard Henderson --- target/ppc/helper.h | 2 -- target/ppc/insn32.decode | 3 +++ target/ppc/int_helper.c | 12 ------------ target/ppc/translate/vmx-impl.c.inc | 15 +++++++++++++-- target/ppc/translate/vmx-ops.c.inc | 2 -- 5 files changed, 16 insertions(+), 18 deletions(-) diff --git a/target/ppc/helper.h b/target/ppc/helper.h index f7047ed2aa..b2e910b089 100644 --- a/target/ppc/helper.h +++ b/target/ppc/helper.h @@ -229,8 +229,6 @@ DEF_HELPER_FLAGS_2(VSTRIBL, TCG_CALL_NO_RWG, i32, avr, avr) DEF_HELPER_FLAGS_2(VSTRIBR, TCG_CALL_NO_RWG, i32, avr, avr) DEF_HELPER_FLAGS_2(VSTRIHL, TCG_CALL_NO_RWG, i32, avr, avr) DEF_HELPER_FLAGS_2(VSTRIHR, TCG_CALL_NO_RWG, i32, avr, avr) -DEF_HELPER_FLAGS_2(vnegw, TCG_CALL_NO_RWG, void, avr, avr) -DEF_HELPER_FLAGS_2(vnegd, TCG_CALL_NO_RWG, void, avr, avr) DEF_HELPER_FLAGS_2(vupkhpx, TCG_CALL_NO_RWG, void, avr, avr) DEF_HELPER_FLAGS_2(vupklpx, TCG_CALL_NO_RWG, void, avr, avr) DEF_HELPER_FLAGS_2(vupkhsb, TCG_CALL_NO_RWG, void, avr, avr) diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode index aebc7b73c8..2658dd3395 100644 --- a/target/ppc/insn32.decode +++ b/target/ppc/insn32.decode @@ -629,6 +629,9 @@ VEXTSH2D 000100 ..... 11001 ..... 11000000010 @VX_tb VEXTSW2D 000100 ..... 11010 ..... 11000000010 @VX_tb VEXTSD2Q 000100 ..... 11011 ..... 11000000010 @VX_tb +VNEGD 000100 ..... 00111 ..... 11000000010 @VX_tb +VNEGW 000100 ..... 00110 ..... 11000000010 @VX_tb + ## Vector Mask Manipulation Instructions MTVSRBM 000100 ..... 10000 ..... 11001000010 @VX_tb diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c index f8dd12e8ae..c7fd0d1faa 100644 --- a/target/ppc/int_helper.c +++ b/target/ppc/int_helper.c @@ -1928,18 +1928,6 @@ XXBLEND(W, 32) XXBLEND(D, 64) #undef XXBLEND -#define VNEG(name, element) \ -void helper_##name(ppc_avr_t *r, ppc_avr_t *b) \ -{ \ - int i; \ - for (i = 0; i < ARRAY_SIZE(r->element); i++) { \ - r->element[i] = -b->element[i]; \ - } \ -} -VNEG(vnegw, s32) -VNEG(vnegd, s64) -#undef VNEG - void helper_vsro(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b) { int sh = (b->VsrB(0xf) >> 3) & 0xf; diff --git a/target/ppc/translate/vmx-impl.c.inc b/target/ppc/translate/vmx-impl.c.inc index f52485a5f1..b9a9e83ab3 100644 --- a/target/ppc/translate/vmx-impl.c.inc +++ b/target/ppc/translate/vmx-impl.c.inc @@ -2625,8 +2625,19 @@ GEN_VXFORM_NOA(vclzb, 1, 28) GEN_VXFORM_NOA(vclzh, 1, 29) GEN_VXFORM_TRANS(vclzw, 1, 30) GEN_VXFORM_TRANS(vclzd, 1, 31) -GEN_VXFORM_NOA_2(vnegw, 1, 24, 6) -GEN_VXFORM_NOA_2(vnegd, 1, 24, 7) + +static bool do_vneg(DisasContext *ctx, arg_VX_tb *a, unsigned vece) +{ + REQUIRE_INSNS_FLAGS2(ctx, ISA300); + REQUIRE_VECTOR(ctx); + + tcg_gen_gvec_neg(vece, avr_full_offset(a->vrt), avr_full_offset(a->vrb), + 16, 16); + return true; +} + +TRANS(VNEGW, do_vneg, MO_32) +TRANS(VNEGD, do_vneg, MO_64) static void gen_vexts_i64(TCGv_i64 t, TCGv_i64 b, int64_t s) { diff --git a/target/ppc/translate/vmx-ops.c.inc b/target/ppc/translate/vmx-ops.c.inc index ded0234123..27908533dd 100644 --- a/target/ppc/translate/vmx-ops.c.inc +++ b/target/ppc/translate/vmx-ops.c.inc @@ -181,8 +181,6 @@ GEN_VXFORM_300_EXT(vextractd, 6, 11, 0x100000), GEN_VXFORM(vspltisb, 6, 12), GEN_VXFORM(vspltish, 6, 13), GEN_VXFORM(vspltisw, 6, 14), -GEN_VXFORM_300_EO(vnegw, 0x01, 0x18, 0x06), -GEN_VXFORM_300_EO(vnegd, 0x01, 0x18, 0x07), GEN_VXFORM_300_EO(vctzb, 0x01, 0x18, 0x1C), GEN_VXFORM_300_EO(vctzh, 0x01, 0x18, 0x1D), GEN_VXFORM_300_EO(vctzw, 0x01, 0x18, 0x1E), From patchwork Mon Oct 10 19:13:49 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lucas Mateus Martins Araujo e Castro X-Patchwork-Id: 1688237 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=nongnu.org (client-ip=209.51.188.17; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4MmTSZ5F9Tz1yqk for ; Tue, 11 Oct 2022 06:27:09 +1100 (AEDT) Received: from localhost ([::1]:54708 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1ohyQb-0005tY-LT for incoming@patchwork.ozlabs.org; Mon, 10 Oct 2022 15:27:05 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:46578) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1ohyFH-0001QH-Qo; Mon, 10 Oct 2022 15:15:26 -0400 Received: from [200.168.210.66] (port=19040 helo=outlook.eldorado.org.br) by eggs.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1ohyFF-0003Ag-Q9; Mon, 10 Oct 2022 15:15:23 -0400 Received: from p9ibm ([10.10.71.235]) by outlook.eldorado.org.br over TLS secured channel with Microsoft SMTPSVC(8.5.9600.16384); Mon, 10 Oct 2022 16:13:59 -0300 Received: from eldorado.org.br (unknown [10.10.70.45]) by p9ibm (Postfix) with ESMTP id B0B36800278; Mon, 10 Oct 2022 16:13:59 -0300 (-03) From: "Lucas Mateus Castro(alqotel)" To: qemu-devel@nongnu.org, qemu-ppc@nongnu.org Cc: richard.henderson@linaro.org, Daniel Henrique Barboza , "Lucas Mateus Castro (alqotel)" , =?utf-8?q?C?= =?utf-8?q?=C3=A9dric_Le_Goater?= , David Gibson , Greg Kurz Subject: [PATCH v2 05/12] target/ppc: Move VPRTYB[WDQ] to decodetree and use gvec Date: Mon, 10 Oct 2022 16:13:49 -0300 Message-Id: <20221010191356.83659-6-lucas.araujo@eldorado.org.br> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20221010191356.83659-1-lucas.araujo@eldorado.org.br> References: <20221010191356.83659-1-lucas.araujo@eldorado.org.br> MIME-Version: 1.0 X-OriginalArrivalTime: 10 Oct 2022 19:14:00.0014 (UTC) FILETIME=[73BD6EE0:01D8DCDC] X-Host-Lookup-Failed: Reverse DNS lookup failed for 200.168.210.66 (failed) Received-SPF: pass client-ip=200.168.210.66; envelope-from=lucas.araujo@eldorado.org.br; helo=outlook.eldorado.org.br X-Spam_score_int: -10 X-Spam_score: -1.1 X-Spam_bar: - X-Spam_report: (-1.1 / 5.0 requ) BAYES_00=-1.9, RDNS_NONE=0.793, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=no autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: "Qemu-devel" From: "Lucas Mateus Castro (alqotel)" Moved VPRTYBW and VPRTYBD to use gvec and both of them and VPRTYBQ to decodetree. VPRTYBW and VPRTYBD now also use .fni4 and .fni8, respectively. vprtybw: rept loop master patch 8 12500 0,00991200 0,00626300 (-36.8%) 25 4000 0,01040600 0,00550600 (-47.1%) 100 1000 0,01084500 0,00601100 (-44.6%) 500 200 0,01490600 0,01394100 (-6.5%) 2500 40 0,03285100 0,05143000 (+56.6%) 8000 12 0,08971500 0,14662500 (+63.4%) vprtybd: rept loop master patch 8 12500 0,00665800 0,00652800 (-2.0%) 25 4000 0,00589300 0,00670400 (+13.8%) 100 1000 0,00646800 0,00743900 (+15.0%) 500 200 0,01065800 0,01586400 (+48.8%) 2500 40 0,03497000 0,07180100 (+105.3%) 8000 12 0,09242200 0,21566600 (+133.3%) vprtybq: rept loop master patch 8 12500 0,00656200 0,00665800 (+1.5%) 25 4000 0,00620500 0,00644900 (+3.9%) 100 1000 0,00707500 0,00764900 (+8.1%) 500 200 0,01203500 0,01349500 (+12.1%) 2500 40 0,03505700 0,04123100 (+17.6%) 8000 12 0,09590600 0,11586700 (+20.8%) I wasn't expecting such a performance lost in both VPRTYBD and VPRTYBQ, I'm not sure if it's worth to move those instructions. Comparing the assembly of the helper with the TCGop they are pretty similar, so I'm not sure why vprtybd took so much more time. Signed-off-by: Lucas Mateus Castro (alqotel) Reviewed-by: Richard Henderson --- target/ppc/helper.h | 4 +- target/ppc/insn32.decode | 4 ++ target/ppc/int_helper.c | 25 +-------- target/ppc/translate/vmx-impl.c.inc | 80 +++++++++++++++++++++++++++-- target/ppc/translate/vmx-ops.c.inc | 3 -- 5 files changed, 83 insertions(+), 33 deletions(-) diff --git a/target/ppc/helper.h b/target/ppc/helper.h index b2e910b089..a06193bc67 100644 --- a/target/ppc/helper.h +++ b/target/ppc/helper.h @@ -193,9 +193,7 @@ DEF_HELPER_FLAGS_3(vslo, TCG_CALL_NO_RWG, void, avr, avr, avr) DEF_HELPER_FLAGS_3(vsro, TCG_CALL_NO_RWG, void, avr, avr, avr) DEF_HELPER_FLAGS_3(vsrv, TCG_CALL_NO_RWG, void, avr, avr, avr) DEF_HELPER_FLAGS_3(vslv, TCG_CALL_NO_RWG, void, avr, avr, avr) -DEF_HELPER_FLAGS_2(vprtybw, TCG_CALL_NO_RWG, void, avr, avr) -DEF_HELPER_FLAGS_2(vprtybd, TCG_CALL_NO_RWG, void, avr, avr) -DEF_HELPER_FLAGS_2(vprtybq, TCG_CALL_NO_RWG, void, avr, avr) +DEF_HELPER_FLAGS_3(VPRTYBQ, TCG_CALL_NO_RWG, void, avr, avr, i32) DEF_HELPER_FLAGS_5(vaddsbs, TCG_CALL_NO_RWG, void, avr, avr, avr, avr, i32) DEF_HELPER_FLAGS_5(vaddshs, TCG_CALL_NO_RWG, void, avr, avr, avr, avr, i32) DEF_HELPER_FLAGS_5(vaddsws, TCG_CALL_NO_RWG, void, avr, avr, avr, avr, i32) diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode index 2658dd3395..aa4968e6b9 100644 --- a/target/ppc/insn32.decode +++ b/target/ppc/insn32.decode @@ -529,6 +529,10 @@ VCTZDM 000100 ..... ..... ..... 11111000100 @VX VPDEPD 000100 ..... ..... ..... 10111001101 @VX VPEXTD 000100 ..... ..... ..... 10110001101 @VX +VPRTYBD 000100 ..... 01001 ..... 11000000010 @VX_tb +VPRTYBQ 000100 ..... 01010 ..... 11000000010 @VX_tb +VPRTYBW 000100 ..... 01000 ..... 11000000010 @VX_tb + ## Vector Permute and Formatting Instruction VEXTDUBVLX 000100 ..... ..... ..... ..... 011000 @VA diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c index c7fd0d1faa..c6ce4665fa 100644 --- a/target/ppc/int_helper.c +++ b/target/ppc/int_helper.c @@ -492,31 +492,8 @@ static inline void set_vscr_sat(CPUPPCState *env) env->vscr_sat.u32[0] = 1; } -/* vprtybw */ -void helper_vprtybw(ppc_avr_t *r, ppc_avr_t *b) -{ - int i; - for (i = 0; i < ARRAY_SIZE(r->u32); i++) { - uint64_t res = b->u32[i] ^ (b->u32[i] >> 16); - res ^= res >> 8; - r->u32[i] = res & 1; - } -} - -/* vprtybd */ -void helper_vprtybd(ppc_avr_t *r, ppc_avr_t *b) -{ - int i; - for (i = 0; i < ARRAY_SIZE(r->u64); i++) { - uint64_t res = b->u64[i] ^ (b->u64[i] >> 32); - res ^= res >> 16; - res ^= res >> 8; - r->u64[i] = res & 1; - } -} - /* vprtybq */ -void helper_vprtybq(ppc_avr_t *r, ppc_avr_t *b) +void helper_VPRTYBQ(ppc_avr_t *r, ppc_avr_t *b, uint32_t v) { uint64_t res = b->u64[0] ^ b->u64[1]; res ^= res >> 32; diff --git a/target/ppc/translate/vmx-impl.c.inc b/target/ppc/translate/vmx-impl.c.inc index b9a9e83ab3..23601942bc 100644 --- a/target/ppc/translate/vmx-impl.c.inc +++ b/target/ppc/translate/vmx-impl.c.inc @@ -1659,9 +1659,83 @@ GEN_VXFORM_NOA_ENV(vrfim, 5, 11); GEN_VXFORM_NOA_ENV(vrfin, 5, 8); GEN_VXFORM_NOA_ENV(vrfip, 5, 10); GEN_VXFORM_NOA_ENV(vrfiz, 5, 9); -GEN_VXFORM_NOA(vprtybw, 1, 24); -GEN_VXFORM_NOA(vprtybd, 1, 24); -GEN_VXFORM_NOA(vprtybq, 1, 24); + +static void gen_vprtyb_vec(unsigned vece, TCGv_vec t, TCGv_vec b) +{ + int i; + TCGv_vec tmp = tcg_temp_new_vec_matching(b); + /* MO_32 is 2, so 2 iteractions for MO_32 and 3 for MO_64 */ + for (i = 0; i < vece; i++) { + tcg_gen_shri_vec(vece, tmp, b, (4 << (vece - i))); + tcg_gen_xor_vec(vece, b, tmp, b); + } + tcg_gen_and_vec(vece, t, b, tcg_constant_vec_matching(t, vece, 1)); + tcg_temp_free_vec(tmp); +} + +/* vprtybw */ +static void gen_vprtyb_i32(TCGv_i32 t, TCGv_i32 b) +{ + TCGv_i32 tmp = tcg_temp_new_i32(); + tcg_gen_shri_i32(tmp, b, 16); + tcg_gen_xor_i32(b, tmp, b); + tcg_gen_shri_i32(tmp, b, 8); + tcg_gen_xor_i32(b, tmp, b); + tcg_gen_and_i32(t, b, tcg_constant_i32(1)); + tcg_temp_free_i32(tmp); +} + +/* vprtybd */ +static void gen_vprtyb_i64(TCGv_i64 t, TCGv_i64 b) +{ + TCGv_i64 tmp = tcg_temp_new_i64(); + tcg_gen_shri_i64(tmp, b, 32); + tcg_gen_xor_i64(b, tmp, b); + tcg_gen_shri_i64(tmp, b, 16); + tcg_gen_xor_i64(b, tmp, b); + tcg_gen_shri_i64(tmp, b, 8); + tcg_gen_xor_i64(b, tmp, b); + tcg_gen_and_i64(t, b, tcg_constant_i64(1)); + tcg_temp_free_i64(tmp); +} + +static bool do_vx_vprtyb(DisasContext *ctx, arg_VX_tb *a, unsigned vece) +{ + static const TCGOpcode vecop_list[] = { + INDEX_op_shri_vec, 0 + }; + + static const GVecGen2 op[] = { + { + .fniv = gen_vprtyb_vec, + .fni4 = gen_vprtyb_i32, + .opt_opc = vecop_list, + .vece = MO_32 + }, + { + .fniv = gen_vprtyb_vec, + .fni8 = gen_vprtyb_i64, + .opt_opc = vecop_list, + .vece = MO_64 + }, + { + .fno = gen_helper_VPRTYBQ, + .vece = MO_128 + }, + }; + + REQUIRE_INSNS_FLAGS2(ctx, ISA300); + REQUIRE_VECTOR(ctx); + + tcg_gen_gvec_2(avr_full_offset(a->vrt), avr_full_offset(a->vrb), + 16, 16, &op[vece - MO_32]); + + return true; +} + +TRANS(VPRTYBW, do_vx_vprtyb, MO_32) +TRANS(VPRTYBD, do_vx_vprtyb, MO_64) +TRANS(VPRTYBQ, do_vx_vprtyb, MO_128) static void gen_vsplt(DisasContext *ctx, int vece) { diff --git a/target/ppc/translate/vmx-ops.c.inc b/target/ppc/translate/vmx-ops.c.inc index 27908533dd..46a620a232 100644 --- a/target/ppc/translate/vmx-ops.c.inc +++ b/target/ppc/translate/vmx-ops.c.inc @@ -106,9 +106,6 @@ GEN_VXFORM_300(vsrv, 2, 28), GEN_VXFORM_300(vslv, 2, 29), GEN_VXFORM(vslo, 6, 16), GEN_VXFORM(vsro, 6, 17), -GEN_HANDLER_E_2(vprtybw, 0x4, 0x1, 0x18, 8, 0, PPC_NONE, PPC2_ISA300), -GEN_HANDLER_E_2(vprtybd, 0x4, 0x1, 0x18, 9, 0, PPC_NONE, PPC2_ISA300), -GEN_HANDLER_E_2(vprtybq, 0x4, 0x1, 0x18, 10, 0, PPC_NONE, PPC2_ISA300), GEN_VXFORM(xpnd04_1, 0, 22), GEN_VXFORM_300(bcdsr, 0, 23), From patchwork Mon Oct 10 19:13:50 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lucas Mateus Martins Araujo e Castro X-Patchwork-Id: 1688252 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=nongnu.org (client-ip=209.51.188.17; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4MmTvY2wf1z23jS for ; Tue, 11 Oct 2022 06:47:05 +1100 (AEDT) Received: from localhost ([::1]:34464 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1ohyjv-0003xg-3n for incoming@patchwork.ozlabs.org; Mon, 10 Oct 2022 15:47:03 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:59120) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1ohyFL-0001Vg-F2; Mon, 10 Oct 2022 15:15:27 -0400 Received: from [200.168.210.66] (port=19040 helo=outlook.eldorado.org.br) by eggs.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1ohyFI-0003Ag-UJ; Mon, 10 Oct 2022 15:15:27 -0400 Received: from p9ibm ([10.10.71.235]) by outlook.eldorado.org.br over TLS secured channel with Microsoft SMTPSVC(8.5.9600.16384); Mon, 10 Oct 2022 16:14:00 -0300 Received: from eldorado.org.br (unknown [10.10.70.45]) by p9ibm (Postfix) with ESMTP id 09225800278; Mon, 10 Oct 2022 16:14:00 -0300 (-03) From: "Lucas Mateus Castro(alqotel)" To: qemu-devel@nongnu.org, qemu-ppc@nongnu.org Cc: richard.henderson@linaro.org, Daniel Henrique Barboza , "Lucas Mateus Castro (alqotel)" , =?utf-8?q?C?= =?utf-8?q?=C3=A9dric_Le_Goater?= , David Gibson , Greg Kurz Subject: [PATCH v2 06/12] target/ppc: Move VAVG[SU][BHW] to decodetree and use gvec Date: Mon, 10 Oct 2022 16:13:50 -0300 Message-Id: <20221010191356.83659-7-lucas.araujo@eldorado.org.br> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20221010191356.83659-1-lucas.araujo@eldorado.org.br> References: <20221010191356.83659-1-lucas.araujo@eldorado.org.br> MIME-Version: 1.0 X-OriginalArrivalTime: 10 Oct 2022 19:14:00.0249 (UTC) FILETIME=[73E14A90:01D8DCDC] X-Host-Lookup-Failed: Reverse DNS lookup failed for 200.168.210.66 (failed) Received-SPF: pass client-ip=200.168.210.66; envelope-from=lucas.araujo@eldorado.org.br; helo=outlook.eldorado.org.br X-Spam_score_int: -10 X-Spam_score: -1.1 X-Spam_bar: - X-Spam_report: (-1.1 / 5.0 requ) BAYES_00=-1.9, RDNS_NONE=0.793, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=no autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: "Qemu-devel" From: "Lucas Mateus Castro (alqotel)" Moved the instructions VAVGUB, VAVGUH, VAVGUW, VAVGSB, VAVGSH, VAVGSW, to decodetree and use gvec with them. For these one the right shift had to be made before the sum as to avoid an overflow, so add 1 at the end if any of the entries had 1 in its LSB as to replicate the "+ 1" before the shift described by the ISA. vavgub: rept loop master patch 8 12500 0,02616600 0,00754200 (-71.2%) 25 4000 0,02530000 0,00637700 (-74.8%) 100 1000 0,02604600 0,00790100 (-69.7%) 500 200 0,03189300 0,01838400 (-42.4%) 2500 40 0,06006900 0,06851000 (+14.1%) 8000 12 0,13941000 0,20548500 (+47.4%) vavguh: rept loop master patch 8 12500 0,01818200 0,00780600 (-57.1%) 25 4000 0,01789300 0,00641600 (-64.1%) 100 1000 0,01899100 0,00787200 (-58.5%) 500 200 0,02527200 0,01828400 (-27.7%) 2500 40 0,05361800 0,06773000 (+26.3%) 8000 12 0,12886600 0,20291400 (+57.5%) vavguw: rept loop master patch 8 12500 0,01423100 0,00776600 (-45.4%) 25 4000 0,01780800 0,00638600 (-64.1%) 100 1000 0,02085500 0,00787000 (-62.3%) 500 200 0,02737100 0,01828800 (-33.2%) 2500 40 0,05572600 0,06774200 (+21.6%) 8000 12 0,13101700 0,20311600 (+55.0%) vavgsb: rept loop master patch 8 12500 0,03006000 0,00788600 (-73.8%) 25 4000 0,02882200 0,00637800 (-77.9%) 100 1000 0,02958000 0,00791400 (-73.2%) 500 200 0,03548800 0,01860400 (-47.6%) 2500 40 0,06360000 0,06850800 (+7.7%) 8000 12 0,13816500 0,20550300 (+48.7%) vavgsh: rept loop master patch 8 12500 0,01965900 0,00776600 (-60.5%) 25 4000 0,01875400 0,00638700 (-65.9%) 100 1000 0,01952200 0,00786900 (-59.7%) 500 200 0,02562000 0,01760300 (-31.3%) 2500 40 0,05384300 0,06742800 (+25.2%) 8000 12 0,13240800 0,20330000 (+53.5%) vavgsw: rept loop master patch 8 12500 0,01407700 0,00775600 (-44.9%) 25 4000 0,01762300 0,00640000 (-63.7%) 100 1000 0,02046500 0,00788500 (-61.5%) 500 200 0,02745600 0,01843000 (-32.9%) 2500 40 0,05375500 0,06820500 (+26.9%) 8000 12 0,13068300 0,20304900 (+55.4%) These results to me seems to indicate that with gvec the results have a slower translation but faster execution. Signed-off-by: Lucas Mateus Castro (alqotel) Reviewed-by: Richard Henderson --- target/ppc/helper.h | 12 ++-- target/ppc/insn32.decode | 9 +++ target/ppc/int_helper.c | 32 ++++----- target/ppc/translate/vmx-impl.c.inc | 106 ++++++++++++++++++++++++---- target/ppc/translate/vmx-ops.c.inc | 9 +-- 5 files changed, 127 insertions(+), 41 deletions(-) diff --git a/target/ppc/helper.h b/target/ppc/helper.h index a06193bc67..71c22efc2e 100644 --- a/target/ppc/helper.h +++ b/target/ppc/helper.h @@ -143,15 +143,15 @@ DEF_HELPER_FLAGS_1(ftsqrt, TCG_CALL_NO_RWG_SE, i32, i64) #define dh_ctype_acc ppc_acc_t * #define dh_typecode_acc dh_typecode_ptr -DEF_HELPER_FLAGS_3(vavgub, TCG_CALL_NO_RWG, void, avr, avr, avr) -DEF_HELPER_FLAGS_3(vavguh, TCG_CALL_NO_RWG, void, avr, avr, avr) -DEF_HELPER_FLAGS_3(vavguw, TCG_CALL_NO_RWG, void, avr, avr, avr) +DEF_HELPER_FLAGS_4(VAVGUB, TCG_CALL_NO_RWG, void, avr, avr, avr, i32) +DEF_HELPER_FLAGS_4(VAVGUH, TCG_CALL_NO_RWG, void, avr, avr, avr, i32) +DEF_HELPER_FLAGS_4(VAVGUW, TCG_CALL_NO_RWG, void, avr, avr, avr, i32) DEF_HELPER_FLAGS_3(vabsdub, TCG_CALL_NO_RWG, void, avr, avr, avr) DEF_HELPER_FLAGS_3(vabsduh, TCG_CALL_NO_RWG, void, avr, avr, avr) DEF_HELPER_FLAGS_3(vabsduw, TCG_CALL_NO_RWG, void, avr, avr, avr) -DEF_HELPER_FLAGS_3(vavgsb, TCG_CALL_NO_RWG, void, avr, avr, avr) -DEF_HELPER_FLAGS_3(vavgsh, TCG_CALL_NO_RWG, void, avr, avr, avr) -DEF_HELPER_FLAGS_3(vavgsw, TCG_CALL_NO_RWG, void, avr, avr, avr) +DEF_HELPER_FLAGS_4(VAVGSB, TCG_CALL_NO_RWG, void, avr, avr, avr, i32) +DEF_HELPER_FLAGS_4(VAVGSH, TCG_CALL_NO_RWG, void, avr, avr, avr, i32) +DEF_HELPER_FLAGS_4(VAVGSW, TCG_CALL_NO_RWG, void, avr, avr, avr, i32) DEF_HELPER_4(vcmpeqfp, void, env, avr, avr, avr) DEF_HELPER_4(vcmpgefp, void, env, avr, avr, avr) DEF_HELPER_4(vcmpgtfp, void, env, avr, avr, avr) diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode index aa4968e6b9..38458c01de 100644 --- a/target/ppc/insn32.decode +++ b/target/ppc/insn32.decode @@ -519,6 +519,15 @@ VCMPNEZW 000100 ..... ..... ..... . 0110000111 @VC VCMPSQ 000100 ... -- ..... ..... 00101000001 @VX_bf VCMPUQ 000100 ... -- ..... ..... 00100000001 @VX_bf +## Vector Integer Average Instructions + +VAVGSB 000100 ..... ..... ..... 10100000010 @VX +VAVGSH 000100 ..... ..... ..... 10101000010 @VX +VAVGSW 000100 ..... ..... ..... 10110000010 @VX +VAVGUB 000100 ..... ..... ..... 10000000010 @VX +VAVGUH 000100 ..... ..... ..... 10001000010 @VX +VAVGUW 000100 ..... ..... ..... 10010000010 @VX + ## Vector Bit Manipulation Instruction VGNB 000100 ..... -- ... ..... 10011001100 @VX_n diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c index c6ce4665fa..bda76e54d4 100644 --- a/target/ppc/int_helper.c +++ b/target/ppc/int_helper.c @@ -570,25 +570,23 @@ VARITHSAT_UNSIGNED(w, u32, uint64_t, cvtsduw) #undef VARITHSAT_SIGNED #undef VARITHSAT_UNSIGNED -#define VAVG_DO(name, element, etype) \ - void helper_v##name(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b) \ - { \ - int i; \ - \ - for (i = 0; i < ARRAY_SIZE(r->element); i++) { \ - etype x = (etype)a->element[i] + (etype)b->element[i] + 1; \ - r->element[i] = x >> 1; \ - } \ +#define VAVG(name, element, etype) \ + void helper_##name(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b, uint32_t v)\ + { \ + int i; \ + \ + for (i = 0; i < ARRAY_SIZE(r->element); i++) { \ + etype x = (etype)a->element[i] + (etype)b->element[i] + 1; \ + r->element[i] = x >> 1; \ + } \ } -#define VAVG(type, signed_element, signed_type, unsigned_element, \ - unsigned_type) \ - VAVG_DO(avgs##type, signed_element, signed_type) \ - VAVG_DO(avgu##type, unsigned_element, unsigned_type) -VAVG(b, s8, int16_t, u8, uint16_t) -VAVG(h, s16, int32_t, u16, uint32_t) -VAVG(w, s32, int64_t, u32, uint64_t) -#undef VAVG_DO +VAVG(VAVGSB, s8, int16_t) +VAVG(VAVGUB, u8, uint16_t) +VAVG(VAVGSH, s16, int32_t) +VAVG(VAVGUH, u16, uint32_t) +VAVG(VAVGSW, s32, int64_t) +VAVG(VAVGUW, u32, uint64_t) #undef VAVG #define VABSDU_DO(name, element) \ diff --git a/target/ppc/translate/vmx-impl.c.inc b/target/ppc/translate/vmx-impl.c.inc index 23601942bc..1e3e099739 100644 --- a/target/ppc/translate/vmx-impl.c.inc +++ b/target/ppc/translate/vmx-impl.c.inc @@ -431,21 +431,9 @@ GEN_VXFORM_V(vminsb, MO_8, tcg_gen_gvec_smin, 1, 12); GEN_VXFORM_V(vminsh, MO_16, tcg_gen_gvec_smin, 1, 13); GEN_VXFORM_V(vminsw, MO_32, tcg_gen_gvec_smin, 1, 14); GEN_VXFORM_V(vminsd, MO_64, tcg_gen_gvec_smin, 1, 15); -GEN_VXFORM(vavgub, 1, 16); GEN_VXFORM(vabsdub, 1, 16); -GEN_VXFORM_DUAL(vavgub, PPC_ALTIVEC, PPC_NONE, \ - vabsdub, PPC_NONE, PPC2_ISA300) -GEN_VXFORM(vavguh, 1, 17); GEN_VXFORM(vabsduh, 1, 17); -GEN_VXFORM_DUAL(vavguh, PPC_ALTIVEC, PPC_NONE, \ - vabsduh, PPC_NONE, PPC2_ISA300) -GEN_VXFORM(vavguw, 1, 18); GEN_VXFORM(vabsduw, 1, 18); -GEN_VXFORM_DUAL(vavguw, PPC_ALTIVEC, PPC_NONE, \ - vabsduw, PPC_NONE, PPC2_ISA300) -GEN_VXFORM(vavgsb, 1, 20); -GEN_VXFORM(vavgsh, 1, 21); -GEN_VXFORM(vavgsw, 1, 22); GEN_VXFORM(vmrghb, 6, 0); GEN_VXFORM(vmrghh, 6, 1); GEN_VXFORM(vmrghw, 6, 2); @@ -3385,6 +3373,100 @@ TRANS(VMULHSD, do_vx_mulh, true , do_vx_vmulhd_i64) TRANS(VMULHUW, do_vx_mulh, false, do_vx_vmulhw_i64) TRANS(VMULHUD, do_vx_mulh, false, do_vx_vmulhd_i64) +static void do_vavg(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b, + void (*gen_shr_vec)(unsigned, TCGv_vec, TCGv_vec, int64_t)) +{ + TCGv_vec tmp = tcg_temp_new_vec_matching(t); + tcg_gen_or_vec(vece, tmp, a, b); + tcg_gen_and_vec(vece, tmp, tmp, tcg_constant_vec_matching(t, vece, 1)); + gen_shr_vec(vece, a, a, 1); + gen_shr_vec(vece, b, b, 1); + tcg_gen_add_vec(vece, t, a, b); + tcg_gen_add_vec(vece, t, t, tmp); + tcg_temp_free_vec(tmp); +} + +QEMU_FLATTEN +static void gen_vavgu(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b) +{ + do_vavg(vece, t, a, b, tcg_gen_shri_vec); +} + +QEMU_FLATTEN +static void gen_vavgs(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b) +{ + do_vavg(vece, t, a, b, tcg_gen_sari_vec); +} + +static bool do_vx_vavg(DisasContext *ctx, arg_VX *a, int sign, int vece) +{ + static const TCGOpcode vecop_list_s[] = { + INDEX_op_add_vec, INDEX_op_sari_vec, 0 + }; + static const TCGOpcode vecop_list_u[] = { + INDEX_op_add_vec, INDEX_op_shri_vec, 0 + }; + + static const GVecGen3 op[2][3] = { + { + { + .fniv = gen_vavgu, + .fno = gen_helper_VAVGUB, + .opt_opc = vecop_list_u, + .vece = MO_8 + }, + { + .fniv = gen_vavgu, + .fno = gen_helper_VAVGUH, + .opt_opc = vecop_list_u, + .vece = MO_16 + }, + { + .fniv = gen_vavgu, + .fno = gen_helper_VAVGUW, + .opt_opc = vecop_list_u, + .vece = MO_32 + }, + }, + { + { + .fniv = gen_vavgs, + .fno = gen_helper_VAVGSB, + .opt_opc = vecop_list_s, + .vece = MO_8 + }, + { + .fniv = gen_vavgs, + .fno = gen_helper_VAVGSH, + .opt_opc = vecop_list_s, + .vece = MO_16 + }, + { + .fniv = gen_vavgs, + .fno = gen_helper_VAVGSW, + .opt_opc = vecop_list_s, + .vece = MO_32 + }, + }, + }; + + REQUIRE_VECTOR(ctx); + + tcg_gen_gvec_3(avr_full_offset(a->vrt), avr_full_offset(a->vra), + avr_full_offset(a->vrb), 16, 16, &op[sign][vece]); + + + return true; +} + + +TRANS_FLAGS(ALTIVEC, VAVGSB, do_vx_vavg, 1, MO_8) +TRANS_FLAGS(ALTIVEC, VAVGSH, do_vx_vavg, 1, MO_16) +TRANS_FLAGS(ALTIVEC, VAVGSW, do_vx_vavg, 1, MO_32) +TRANS_FLAGS(ALTIVEC, VAVGUB, do_vx_vavg, 0, MO_8) +TRANS_FLAGS(ALTIVEC, VAVGUH, do_vx_vavg, 0, MO_16) +TRANS_FLAGS(ALTIVEC, VAVGUW, do_vx_vavg, 0, MO_32) + static bool do_vdiv_vmod(DisasContext *ctx, arg_VX *a, const int vece, void (*func_32)(TCGv_i32 t, TCGv_i32 a, TCGv_i32 b), void (*func_64)(TCGv_i64 t, TCGv_i64 a, TCGv_i64 b)) diff --git a/target/ppc/translate/vmx-ops.c.inc b/target/ppc/translate/vmx-ops.c.inc index 46a620a232..02db51def0 100644 --- a/target/ppc/translate/vmx-ops.c.inc +++ b/target/ppc/translate/vmx-ops.c.inc @@ -83,12 +83,9 @@ GEN_VXFORM(vminsb, 1, 12), GEN_VXFORM(vminsh, 1, 13), GEN_VXFORM(vminsw, 1, 14), GEN_VXFORM_207(vminsd, 1, 15), -GEN_VXFORM_DUAL(vavgub, vabsdub, 1, 16, PPC_ALTIVEC, PPC_NONE), -GEN_VXFORM_DUAL(vavguh, vabsduh, 1, 17, PPC_ALTIVEC, PPC_NONE), -GEN_VXFORM_DUAL(vavguw, vabsduw, 1, 18, PPC_ALTIVEC, PPC_NONE), -GEN_VXFORM(vavgsb, 1, 20), -GEN_VXFORM(vavgsh, 1, 21), -GEN_VXFORM(vavgsw, 1, 22), +GEN_VXFORM(vabsdub, 1, 16), +GEN_VXFORM(vabsduh, 1, 17), +GEN_VXFORM(vabsduw, 1, 18), GEN_VXFORM(vmrghb, 6, 0), GEN_VXFORM(vmrghh, 6, 1), GEN_VXFORM(vmrghw, 6, 2), From patchwork Mon Oct 10 19:13:51 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lucas Mateus Martins Araujo e Castro X-Patchwork-Id: 1688256 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=nongnu.org (client-ip=209.51.188.17; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4MmV2c4vNGz20cX for ; Tue, 11 Oct 2022 06:53:12 +1100 (AEDT) Received: from localhost ([::1]:37598 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1ohypq-00024K-EP for incoming@patchwork.ozlabs.org; Mon, 10 Oct 2022 15:53:10 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:59122) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1ohyFO-0001bi-B3; Mon, 10 Oct 2022 15:15:30 -0400 Received: from [200.168.210.66] (port=19040 helo=outlook.eldorado.org.br) by eggs.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1ohyFM-0003Ag-F6; Mon, 10 Oct 2022 15:15:30 -0400 Received: from p9ibm ([10.10.71.235]) by outlook.eldorado.org.br over TLS secured channel with Microsoft SMTPSVC(8.5.9600.16384); Mon, 10 Oct 2022 16:14:00 -0300 Received: from eldorado.org.br (unknown [10.10.70.45]) by p9ibm (Postfix) with ESMTP id 46699800278; Mon, 10 Oct 2022 16:14:00 -0300 (-03) From: "Lucas Mateus Castro(alqotel)" To: qemu-devel@nongnu.org, qemu-ppc@nongnu.org Cc: richard.henderson@linaro.org, Daniel Henrique Barboza , "Lucas Mateus Castro (alqotel)" , =?utf-8?q?C?= =?utf-8?q?=C3=A9dric_Le_Goater?= , David Gibson , Greg Kurz Subject: [PATCH v2 07/12] target/ppc: Move VABSDU[BHW] to decodetree and use gvec Date: Mon, 10 Oct 2022 16:13:51 -0300 Message-Id: <20221010191356.83659-8-lucas.araujo@eldorado.org.br> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20221010191356.83659-1-lucas.araujo@eldorado.org.br> References: <20221010191356.83659-1-lucas.araujo@eldorado.org.br> MIME-Version: 1.0 X-OriginalArrivalTime: 10 Oct 2022 19:14:00.0546 (UTC) FILETIME=[740E9C20:01D8DCDC] X-Host-Lookup-Failed: Reverse DNS lookup failed for 200.168.210.66 (failed) Received-SPF: pass client-ip=200.168.210.66; envelope-from=lucas.araujo@eldorado.org.br; helo=outlook.eldorado.org.br X-Spam_score_int: -10 X-Spam_score: -1.1 X-Spam_bar: - X-Spam_report: (-1.1 / 5.0 requ) BAYES_00=-1.9, RDNS_NONE=0.793, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=no autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: "Qemu-devel" From: "Lucas Mateus Castro (alqotel)" Moved VABSDUB, VABSDUH and VABSDUW to decodetree and use gvec to translate them. vabsdub: rept loop master patch 8 12500 0,03601600 0,00688500 (-80.9%) 25 4000 0,03651000 0,00532100 (-85.4%) 100 1000 0,03666900 0,00595300 (-83.8%) 500 200 0,04305800 0,01244600 (-71.1%) 2500 40 0,06893300 0,04273700 (-38.0%) 8000 12 0,14633200 0,12660300 (-13.5%) vabsduh: rept loop master patch 8 12500 0,02172400 0,00687500 (-68.4%) 25 4000 0,02154100 0,00531500 (-75.3%) 100 1000 0,02235400 0,00596300 (-73.3%) 500 200 0,02827500 0,01245100 (-56.0%) 2500 40 0,05638400 0,04285500 (-24.0%) 8000 12 0,13166000 0,12641400 (-4.0%) vabsduw: rept loop master patch 8 12500 0,01646400 0,00688300 (-58.2%) 25 4000 0,01454500 0,00475500 (-67.3%) 100 1000 0,01545800 0,00511800 (-66.9%) 500 200 0,02168200 0,01114300 (-48.6%) 2500 40 0,04571300 0,04138800 (-9.5%) 8000 12 0,12209500 0,12178500 (-0.3%) Same as VADDCUW and VSUBCUW, overall performance gain but it uses more TCGop (4 before the patch, 6 after). Signed-off-by: Lucas Mateus Castro (alqotel) Reviewed-by: Richard Henderson --- target/ppc/helper.h | 6 ++-- target/ppc/insn32.decode | 6 ++++ target/ppc/int_helper.c | 13 +++----- target/ppc/translate/vmx-impl.c.inc | 49 +++++++++++++++++++++++++++-- target/ppc/translate/vmx-ops.c.inc | 3 -- 5 files changed, 60 insertions(+), 17 deletions(-) diff --git a/target/ppc/helper.h b/target/ppc/helper.h index 71c22efc2e..fd8280dfa7 100644 --- a/target/ppc/helper.h +++ b/target/ppc/helper.h @@ -146,9 +146,9 @@ DEF_HELPER_FLAGS_1(ftsqrt, TCG_CALL_NO_RWG_SE, i32, i64) DEF_HELPER_FLAGS_4(VAVGUB, TCG_CALL_NO_RWG, void, avr, avr, avr, i32) DEF_HELPER_FLAGS_4(VAVGUH, TCG_CALL_NO_RWG, void, avr, avr, avr, i32) DEF_HELPER_FLAGS_4(VAVGUW, TCG_CALL_NO_RWG, void, avr, avr, avr, i32) -DEF_HELPER_FLAGS_3(vabsdub, TCG_CALL_NO_RWG, void, avr, avr, avr) -DEF_HELPER_FLAGS_3(vabsduh, TCG_CALL_NO_RWG, void, avr, avr, avr) -DEF_HELPER_FLAGS_3(vabsduw, TCG_CALL_NO_RWG, void, avr, avr, avr) +DEF_HELPER_FLAGS_4(VABSDUB, TCG_CALL_NO_RWG, void, avr, avr, avr, i32) +DEF_HELPER_FLAGS_4(VABSDUH, TCG_CALL_NO_RWG, void, avr, avr, avr, i32) +DEF_HELPER_FLAGS_4(VABSDUW, TCG_CALL_NO_RWG, void, avr, avr, avr, i32) DEF_HELPER_FLAGS_4(VAVGSB, TCG_CALL_NO_RWG, void, avr, avr, avr, i32) DEF_HELPER_FLAGS_4(VAVGSH, TCG_CALL_NO_RWG, void, avr, avr, avr, i32) DEF_HELPER_FLAGS_4(VAVGSW, TCG_CALL_NO_RWG, void, avr, avr, avr, i32) diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode index 38458c01de..ae151c4b62 100644 --- a/target/ppc/insn32.decode +++ b/target/ppc/insn32.decode @@ -528,6 +528,12 @@ VAVGUB 000100 ..... ..... ..... 10000000010 @VX VAVGUH 000100 ..... ..... ..... 10001000010 @VX VAVGUW 000100 ..... ..... ..... 10010000010 @VX +## Vector Integer Absolute Difference Instructions + +VABSDUB 000100 ..... ..... ..... 10000000011 @VX +VABSDUH 000100 ..... ..... ..... 10001000011 @VX +VABSDUW 000100 ..... ..... ..... 10010000011 @VX + ## Vector Bit Manipulation Instruction VGNB 000100 ..... -- ... ..... 10011001100 @VX_n diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c index bda76e54d4..d97a7f1f28 100644 --- a/target/ppc/int_helper.c +++ b/target/ppc/int_helper.c @@ -589,8 +589,8 @@ VAVG(VAVGSW, s32, int64_t) VAVG(VAVGUW, u32, uint64_t) #undef VAVG -#define VABSDU_DO(name, element) \ -void helper_v##name(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b) \ +#define VABSDU(name, element) \ +void helper_##name(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b, uint32_t v)\ { \ int i; \ \ @@ -606,12 +606,9 @@ void helper_v##name(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b) \ * name - instruction mnemonic suffix (b: byte, h: halfword, w: word) * element - element type to access from vector */ -#define VABSDU(type, element) \ - VABSDU_DO(absdu##type, element) -VABSDU(b, u8) -VABSDU(h, u16) -VABSDU(w, u32) -#undef VABSDU_DO +VABSDU(VABSDUB, u8) +VABSDU(VABSDUH, u16) +VABSDU(VABSDUW, u32) #undef VABSDU #define VCF(suffix, cvt, element) \ diff --git a/target/ppc/translate/vmx-impl.c.inc b/target/ppc/translate/vmx-impl.c.inc index 1e3e099739..f46a354d31 100644 --- a/target/ppc/translate/vmx-impl.c.inc +++ b/target/ppc/translate/vmx-impl.c.inc @@ -431,9 +431,6 @@ GEN_VXFORM_V(vminsb, MO_8, tcg_gen_gvec_smin, 1, 12); GEN_VXFORM_V(vminsh, MO_16, tcg_gen_gvec_smin, 1, 13); GEN_VXFORM_V(vminsw, MO_32, tcg_gen_gvec_smin, 1, 14); GEN_VXFORM_V(vminsd, MO_64, tcg_gen_gvec_smin, 1, 15); -GEN_VXFORM(vabsdub, 1, 16); -GEN_VXFORM(vabsduh, 1, 17); -GEN_VXFORM(vabsduw, 1, 18); GEN_VXFORM(vmrghb, 6, 0); GEN_VXFORM(vmrghh, 6, 1); GEN_VXFORM(vmrghw, 6, 2); @@ -3467,6 +3464,52 @@ TRANS_FLAGS(ALTIVEC, VAVGUB, do_vx_vavg, 0, MO_8) TRANS_FLAGS(ALTIVEC, VAVGUH, do_vx_vavg, 0, MO_16) TRANS_FLAGS(ALTIVEC, VAVGUW, do_vx_vavg, 0, MO_32) +static void gen_vabsdu(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b) +{ + tcg_gen_umax_vec(vece, t, a, b); + tcg_gen_umin_vec(vece, a, a, b); + tcg_gen_sub_vec(vece, t, t, a); +} + +static bool do_vabsdu(DisasContext *ctx, arg_VX *a, const int vece) +{ + static const TCGOpcode vecop_list[] = { + INDEX_op_umax_vec, INDEX_op_umin_vec, INDEX_op_sub_vec, 0 + }; + + static const GVecGen3 op[] = { + { + .fniv = gen_vabsdu, + .fno = gen_helper_VABSDUB, + .opt_opc = vecop_list, + .vece = MO_8 + }, + { + .fniv = gen_vabsdu, + .fno = gen_helper_VABSDUH, + .opt_opc = vecop_list, + .vece = MO_16 + }, + { + .fniv = gen_vabsdu, + .fno = gen_helper_VABSDUW, + .opt_opc = vecop_list, + .vece = MO_32 + }, + }; + + REQUIRE_VECTOR(ctx); + + tcg_gen_gvec_3(avr_full_offset(a->vrt), avr_full_offset(a->vra), + avr_full_offset(a->vrb), 16, 16, &op[vece]); + + return true; +} + +TRANS_FLAGS2(ISA300, VABSDUB, do_vabsdu, MO_8) +TRANS_FLAGS2(ISA300, VABSDUH, do_vabsdu, MO_16) +TRANS_FLAGS2(ISA300, VABSDUW, do_vabsdu, MO_32) + static bool do_vdiv_vmod(DisasContext *ctx, arg_VX *a, const int vece, void (*func_32)(TCGv_i32 t, TCGv_i32 a, TCGv_i32 b), void (*func_64)(TCGv_i64 t, TCGv_i64 a, TCGv_i64 b)) diff --git a/target/ppc/translate/vmx-ops.c.inc b/target/ppc/translate/vmx-ops.c.inc index 02db51def0..33fec8aca4 100644 --- a/target/ppc/translate/vmx-ops.c.inc +++ b/target/ppc/translate/vmx-ops.c.inc @@ -83,9 +83,6 @@ GEN_VXFORM(vminsb, 1, 12), GEN_VXFORM(vminsh, 1, 13), GEN_VXFORM(vminsw, 1, 14), GEN_VXFORM_207(vminsd, 1, 15), -GEN_VXFORM(vabsdub, 1, 16), -GEN_VXFORM(vabsduh, 1, 17), -GEN_VXFORM(vabsduw, 1, 18), GEN_VXFORM(vmrghb, 6, 0), GEN_VXFORM(vmrghh, 6, 1), GEN_VXFORM(vmrghw, 6, 2), From patchwork Mon Oct 10 19:13:52 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lucas Mateus Martins Araujo e Castro X-Patchwork-Id: 1688244 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=nongnu.org (client-ip=209.51.188.17; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4MmTcT34yXz23jS for ; Tue, 11 Oct 2022 06:34:00 +1100 (AEDT) Received: from localhost ([::1]:38844 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1ohyXF-00045h-29 for incoming@patchwork.ozlabs.org; Mon, 10 Oct 2022 15:33:57 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:59124) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1ohyFR-0001h4-A3; Mon, 10 Oct 2022 15:15:33 -0400 Received: from [200.168.210.66] (port=19040 helo=outlook.eldorado.org.br) by eggs.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1ohyFP-0003Ag-Cv; Mon, 10 Oct 2022 15:15:33 -0400 Received: from p9ibm ([10.10.71.235]) by outlook.eldorado.org.br over TLS secured channel with Microsoft SMTPSVC(8.5.9600.16384); Mon, 10 Oct 2022 16:14:00 -0300 Received: from eldorado.org.br (unknown [10.10.70.45]) by p9ibm (Postfix) with ESMTP id 88C9F800278; Mon, 10 Oct 2022 16:14:00 -0300 (-03) From: "Lucas Mateus Castro(alqotel)" To: qemu-devel@nongnu.org, qemu-ppc@nongnu.org Cc: richard.henderson@linaro.org, Daniel Henrique Barboza , "Lucas Mateus Castro (alqotel)" , =?utf-8?q?C?= =?utf-8?q?=C3=A9dric_Le_Goater?= , David Gibson , Greg Kurz Subject: [PATCH v2 08/12] target/ppc: Use gvec to decode XV[N]ABS[DS]P/XVNEG[DS]P Date: Mon, 10 Oct 2022 16:13:52 -0300 Message-Id: <20221010191356.83659-9-lucas.araujo@eldorado.org.br> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20221010191356.83659-1-lucas.araujo@eldorado.org.br> References: <20221010191356.83659-1-lucas.araujo@eldorado.org.br> MIME-Version: 1.0 X-OriginalArrivalTime: 10 Oct 2022 19:14:00.0812 (UTC) FILETIME=[743732C0:01D8DCDC] X-Host-Lookup-Failed: Reverse DNS lookup failed for 200.168.210.66 (failed) Received-SPF: pass client-ip=200.168.210.66; envelope-from=lucas.araujo@eldorado.org.br; helo=outlook.eldorado.org.br X-Spam_score_int: -10 X-Spam_score: -1.1 X-Spam_bar: - X-Spam_report: (-1.1 / 5.0 requ) BAYES_00=-1.9, RDNS_NONE=0.793, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=no autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: "Qemu-devel" From: "Lucas Mateus Castro (alqotel)" Moved XVABSSP, XVABSDP, XVNABSSP,XVNABSDP, XVNEGSP and XVNEGDP to decodetree and used gvec to translate them. xvabssp: rept loop master patch 8 12500 0,00477900 0,00476000 (-0.4%) 25 4000 0,00442800 0,00353300 (-20.2%) 100 1000 0,00478700 0,00366100 (-23.5%) 500 200 0,00973200 0,00649400 (-33.3%) 2500 40 0,03165200 0,02226700 (-29.7%) 8000 12 0,09315900 0,06674900 (-28.3%) xvabsdp: rept loop master patch 8 12500 0,00475000 0,00474400 (-0.1%) 25 4000 0,00355600 0,00367500 (+3.3%) 100 1000 0,00444200 0,00366000 (-17.6%) 500 200 0,00942700 0,00732400 (-22.3%) 2500 40 0,02990000 0,02308500 (-22.8%) 8000 12 0,08770300 0,06683800 (-23.8%) xvnabssp: rept loop master patch 8 12500 0,00494500 0,00492900 (-0.3%) 25 4000 0,00397700 0,00338600 (-14.9%) 100 1000 0,00421400 0,00353500 (-16.1%) 500 200 0,01048000 0,00707100 (-32.5%) 2500 40 0,03251500 0,02238300 (-31.2%) 8000 12 0,08889100 0,06469800 (-27.2%) xvnabsdp: rept loop master patch 8 12500 0,00511000 0,00492700 (-3.6%) 25 4000 0,00398800 0,00381500 (-4.3%) 100 1000 0,00390500 0,00365900 (-6.3%) 500 200 0,00924800 0,00784600 (-15.2%) 2500 40 0,03138900 0,02391600 (-23.8%) 8000 12 0,09654200 0,05684600 (-41.1%) xvnegsp: rept loop master patch 8 12500 0,00493900 0,00452800 (-8.3%) 25 4000 0,00369100 0,00366800 (-0.6%) 100 1000 0,00371100 0,00380000 (+2.4%) 500 200 0,00991100 0,00652300 (-34.2%) 2500 40 0,03025800 0,02422300 (-19.9%) 8000 12 0,09251100 0,06457600 (-30.2%) xvnegdp: rept loop master patch 8 12500 0,00474900 0,00454400 (-4.3%) 25 4000 0,00353100 0,00325600 (-7.8%) 100 1000 0,00398600 0,00366800 (-8.0%) 500 200 0,01032300 0,00702400 (-32.0%) 2500 40 0,03125000 0,02422400 (-22.5%) 8000 12 0,09475100 0,06173000 (-34.9%) This one to me seemed the opposite of the previous instructions, as it looks like there was an improvement in the translation time (itself not a surprise as operations were done twice before so there was the need to translate twice as many TCGop) Signed-off-by: Lucas Mateus Castro (alqotel) Reviewed-by: Richard Henderson --- target/ppc/insn32.decode | 9 ++++ target/ppc/translate/vsx-impl.c.inc | 73 ++++++++++++++++++++++++++--- target/ppc/translate/vsx-ops.c.inc | 6 --- 3 files changed, 76 insertions(+), 12 deletions(-) diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode index ae151c4b62..5b687078be 100644 --- a/target/ppc/insn32.decode +++ b/target/ppc/insn32.decode @@ -754,6 +754,15 @@ STXVRHX 011111 ..... ..... ..... 0010101101 . @X_TSX STXVRWX 011111 ..... ..... ..... 0011001101 . @X_TSX STXVRDX 011111 ..... ..... ..... 0011101101 . @X_TSX +## VSX Vector Binary Floating-Point Sign Manipulation Instructions + +XVABSDP 111100 ..... 00000 ..... 111011001 .. @XX2 +XVABSSP 111100 ..... 00000 ..... 110011001 .. @XX2 +XVNABSDP 111100 ..... 00000 ..... 111101001 .. @XX2 +XVNABSSP 111100 ..... 00000 ..... 110101001 .. @XX2 +XVNEGDP 111100 ..... 00000 ..... 111111001 .. @XX2 +XVNEGSP 111100 ..... 00000 ..... 110111001 .. @XX2 + ## VSX Scalar Multiply-Add Instructions XSMADDADP 111100 ..... ..... ..... 00100001 . . . @XX3 diff --git a/target/ppc/translate/vsx-impl.c.inc b/target/ppc/translate/vsx-impl.c.inc index 7acdbceec4..3f9af811dc 100644 --- a/target/ppc/translate/vsx-impl.c.inc +++ b/target/ppc/translate/vsx-impl.c.inc @@ -782,15 +782,76 @@ static void glue(gen_, name)(DisasContext *ctx) \ tcg_temp_free_i64(sgm); \ } -VSX_VECTOR_MOVE(xvabsdp, OP_ABS, SGN_MASK_DP) -VSX_VECTOR_MOVE(xvnabsdp, OP_NABS, SGN_MASK_DP) -VSX_VECTOR_MOVE(xvnegdp, OP_NEG, SGN_MASK_DP) VSX_VECTOR_MOVE(xvcpsgndp, OP_CPSGN, SGN_MASK_DP) -VSX_VECTOR_MOVE(xvabssp, OP_ABS, SGN_MASK_SP) -VSX_VECTOR_MOVE(xvnabssp, OP_NABS, SGN_MASK_SP) -VSX_VECTOR_MOVE(xvnegsp, OP_NEG, SGN_MASK_SP) VSX_VECTOR_MOVE(xvcpsgnsp, OP_CPSGN, SGN_MASK_SP) +#define TCG_OP_IMM_i64(FUNC, OP, IMM) \ + static void FUNC(TCGv_i64 t, TCGv_i64 b) \ + { \ + OP(t, b, IMM); \ + } + +TCG_OP_IMM_i64(do_xvabssp_i64, tcg_gen_andi_i64, ~SGN_MASK_SP) +TCG_OP_IMM_i64(do_xvnabssp_i64, tcg_gen_ori_i64, SGN_MASK_SP) +TCG_OP_IMM_i64(do_xvnegsp_i64, tcg_gen_xori_i64, SGN_MASK_SP) +TCG_OP_IMM_i64(do_xvabsdp_i64, tcg_gen_andi_i64, ~SGN_MASK_DP) +TCG_OP_IMM_i64(do_xvnabsdp_i64, tcg_gen_ori_i64, SGN_MASK_DP) +TCG_OP_IMM_i64(do_xvnegdp_i64, tcg_gen_xori_i64, SGN_MASK_DP) +#undef TCG_OP_IMM_i64 + +static void xv_msb_op1(unsigned vece, TCGv_vec t, TCGv_vec b, + void (*tcg_gen_op_vec)(unsigned, TCGv_vec, TCGv_vec, TCGv_vec)) +{ + uint64_t msb = (vece == MO_32) ? SGN_MASK_SP : SGN_MASK_DP; + tcg_gen_op_vec(vece, t, b, tcg_constant_vec_matching(t, vece, msb)); +} + +static void do_xvabs_vec(unsigned vece, TCGv_vec t, TCGv_vec b) +{ + xv_msb_op1(vece, t, b, tcg_gen_andc_vec); +} + +static void do_xvnabs_vec(unsigned vece, TCGv_vec t, TCGv_vec b) +{ + xv_msb_op1(vece, t, b, tcg_gen_or_vec); +} + +static void do_xvneg_vec(unsigned vece, TCGv_vec t, TCGv_vec b) +{ + xv_msb_op1(vece, t, b, tcg_gen_xor_vec); +} + +static bool do_vsx_msb_op(DisasContext *ctx, arg_XX2 *a, unsigned vece, + void (*vec)(unsigned, TCGv_vec, TCGv_vec), + void (*i64)(TCGv_i64, TCGv_i64)) +{ + static const TCGOpcode vecop_list[] = { + 0 + }; + + const GVecGen2 op = { + .fni8 = i64, + .fniv = vec, + .opt_opc = vecop_list, + .vece = vece + }; + + REQUIRE_INSNS_FLAGS2(ctx, VSX); + REQUIRE_VSX(ctx); + + tcg_gen_gvec_2(vsr_full_offset(a->xt), vsr_full_offset(a->xb), + 16, 16, &op); + + return true; +} + +TRANS(XVABSDP, do_vsx_msb_op, MO_64, do_xvabs_vec, do_xvabsdp_i64) +TRANS(XVNABSDP, do_vsx_msb_op, MO_64, do_xvnabs_vec, do_xvnabsdp_i64) +TRANS(XVNEGDP, do_vsx_msb_op, MO_64, do_xvneg_vec, do_xvnegdp_i64) +TRANS(XVABSSP, do_vsx_msb_op, MO_32, do_xvabs_vec, do_xvabssp_i64) +TRANS(XVNABSSP, do_vsx_msb_op, MO_32, do_xvnabs_vec, do_xvnabssp_i64) +TRANS(XVNEGSP, do_vsx_msb_op, MO_32, do_xvneg_vec, do_xvnegsp_i64) + #define VSX_CMP(name, op1, op2, inval, type) \ static void gen_##name(DisasContext *ctx) \ { \ diff --git a/target/ppc/translate/vsx-ops.c.inc b/target/ppc/translate/vsx-ops.c.inc index bff14bbece..b77324e0a8 100644 --- a/target/ppc/translate/vsx-ops.c.inc +++ b/target/ppc/translate/vsx-ops.c.inc @@ -165,13 +165,7 @@ GEN_XX3FORM(name, opc2, opc3 | 1, fl2) GEN_XX2FORM_DCMX(xvtstdcdp, 0x14, 0x1E, PPC2_ISA300), GEN_XX2FORM_DCMX(xvtstdcsp, 0x14, 0x1A, PPC2_ISA300), -GEN_XX2FORM(xvabsdp, 0x12, 0x1D, PPC2_VSX), -GEN_XX2FORM(xvnabsdp, 0x12, 0x1E, PPC2_VSX), -GEN_XX2FORM(xvnegdp, 0x12, 0x1F, PPC2_VSX), GEN_XX3FORM(xvcpsgndp, 0x00, 0x1E, PPC2_VSX), -GEN_XX2FORM(xvabssp, 0x12, 0x19, PPC2_VSX), -GEN_XX2FORM(xvnabssp, 0x12, 0x1A, PPC2_VSX), -GEN_XX2FORM(xvnegsp, 0x12, 0x1B, PPC2_VSX), GEN_XX3FORM(xvcpsgnsp, 0x00, 0x1A, PPC2_VSX), GEN_XX3FORM(xsadddp, 0x00, 0x04, PPC2_VSX), From patchwork Mon Oct 10 19:13:53 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lucas Mateus Martins Araujo e Castro X-Patchwork-Id: 1688245 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=nongnu.org (client-ip=209.51.188.17; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4MmTj90zgsz20cX for ; Tue, 11 Oct 2022 06:38:04 +1100 (AEDT) Received: from localhost ([::1]:52876 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1ohybB-0002Ie-FQ for incoming@patchwork.ozlabs.org; Mon, 10 Oct 2022 15:38:01 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:53902) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1ohyFU-0001nH-8y; Mon, 10 Oct 2022 15:15:36 -0400 Received: from [200.168.210.66] (port=19040 helo=outlook.eldorado.org.br) by eggs.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1ohyFS-0003Ag-FP; Mon, 10 Oct 2022 15:15:36 -0400 Received: from p9ibm ([10.10.71.235]) by outlook.eldorado.org.br over TLS secured channel with Microsoft SMTPSVC(8.5.9600.16384); Mon, 10 Oct 2022 16:14:01 -0300 Received: from eldorado.org.br (unknown [10.10.70.45]) by p9ibm (Postfix) with ESMTP id BE474800631; Mon, 10 Oct 2022 16:14:00 -0300 (-03) From: "Lucas Mateus Castro(alqotel)" To: qemu-devel@nongnu.org, qemu-ppc@nongnu.org Cc: richard.henderson@linaro.org, Daniel Henrique Barboza , "Lucas Mateus Castro (alqotel)" , =?utf-8?q?C?= =?utf-8?q?=C3=A9dric_Le_Goater?= , David Gibson , Greg Kurz Subject: [PATCH v2 09/12] target/ppc: Use gvec to decode XVCPSGN[SD]P Date: Mon, 10 Oct 2022 16:13:53 -0300 Message-Id: <20221010191356.83659-10-lucas.araujo@eldorado.org.br> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20221010191356.83659-1-lucas.araujo@eldorado.org.br> References: <20221010191356.83659-1-lucas.araujo@eldorado.org.br> MIME-Version: 1.0 X-OriginalArrivalTime: 10 Oct 2022 19:14:01.0015 (UTC) FILETIME=[74562C70:01D8DCDC] X-Host-Lookup-Failed: Reverse DNS lookup failed for 200.168.210.66 (failed) Received-SPF: pass client-ip=200.168.210.66; envelope-from=lucas.araujo@eldorado.org.br; helo=outlook.eldorado.org.br X-Spam_score_int: -10 X-Spam_score: -1.1 X-Spam_bar: - X-Spam_report: (-1.1 / 5.0 requ) BAYES_00=-1.9, RDNS_NONE=0.793, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=no autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: "Qemu-devel" From: "Lucas Mateus Castro (alqotel)" Moved XVCPSGNSP and XVCPSGNDP to decodetree and used gvec to translate them. xvcpsgnsp: rept loop master patch 8 12500 0,00561400 0,00537900 (-4.2%) 25 4000 0,00562100 0,00400000 (-28.8%) 100 1000 0,00696900 0,00416300 (-40.3%) 500 200 0,02211900 0,00840700 (-62.0%) 2500 40 0,09328600 0,02728300 (-70.8%) 8000 12 0,27295300 0,06867800 (-74.8%) xvcpsgndp: rept loop master patch 8 12500 0,00556300 0,00584200 (+5.0%) 25 4000 0,00482700 0,00431700 (-10.6%) 100 1000 0,00585800 0,00464400 (-20.7%) 500 200 0,01565300 0,00839700 (-46.4%) 2500 40 0,05766500 0,02430600 (-57.8%) 8000 12 0,19875300 0,07947100 (-60.0%) Like the previous instructions there seemed to be a improvement on translation time. Signed-off-by: Lucas Mateus Castro (alqotel) --- target/ppc/insn32.decode | 2 + target/ppc/translate/vsx-impl.c.inc | 109 ++++++++++++++-------------- target/ppc/translate/vsx-ops.c.inc | 3 - 3 files changed, 55 insertions(+), 59 deletions(-) diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode index 5b687078be..6549c4040e 100644 --- a/target/ppc/insn32.decode +++ b/target/ppc/insn32.decode @@ -762,6 +762,8 @@ XVNABSDP 111100 ..... 00000 ..... 111101001 .. @XX2 XVNABSSP 111100 ..... 00000 ..... 110101001 .. @XX2 XVNEGDP 111100 ..... 00000 ..... 111111001 .. @XX2 XVNEGSP 111100 ..... 00000 ..... 110111001 .. @XX2 +XVCPSGNDP 111100 ..... ..... ..... 11110000 ... @XX3 +XVCPSGNSP 111100 ..... ..... ..... 11010000 ... @XX3 ## VSX Scalar Multiply-Add Instructions diff --git a/target/ppc/translate/vsx-impl.c.inc b/target/ppc/translate/vsx-impl.c.inc index 3f9af811dc..4f17da514c 100644 --- a/target/ppc/translate/vsx-impl.c.inc +++ b/target/ppc/translate/vsx-impl.c.inc @@ -729,62 +729,6 @@ VSX_SCALAR_MOVE_QP(xsnabsqp, OP_NABS, SGN_MASK_DP) VSX_SCALAR_MOVE_QP(xsnegqp, OP_NEG, SGN_MASK_DP) VSX_SCALAR_MOVE_QP(xscpsgnqp, OP_CPSGN, SGN_MASK_DP) -#define VSX_VECTOR_MOVE(name, op, sgn_mask) \ -static void glue(gen_, name)(DisasContext *ctx) \ - { \ - TCGv_i64 xbh, xbl, sgm; \ - if (unlikely(!ctx->vsx_enabled)) { \ - gen_exception(ctx, POWERPC_EXCP_VSXU); \ - return; \ - } \ - xbh = tcg_temp_new_i64(); \ - xbl = tcg_temp_new_i64(); \ - sgm = tcg_temp_new_i64(); \ - get_cpu_vsr(xbh, xB(ctx->opcode), true); \ - get_cpu_vsr(xbl, xB(ctx->opcode), false); \ - tcg_gen_movi_i64(sgm, sgn_mask); \ - switch (op) { \ - case OP_ABS: { \ - tcg_gen_andc_i64(xbh, xbh, sgm); \ - tcg_gen_andc_i64(xbl, xbl, sgm); \ - break; \ - } \ - case OP_NABS: { \ - tcg_gen_or_i64(xbh, xbh, sgm); \ - tcg_gen_or_i64(xbl, xbl, sgm); \ - break; \ - } \ - case OP_NEG: { \ - tcg_gen_xor_i64(xbh, xbh, sgm); \ - tcg_gen_xor_i64(xbl, xbl, sgm); \ - break; \ - } \ - case OP_CPSGN: { \ - TCGv_i64 xah = tcg_temp_new_i64(); \ - TCGv_i64 xal = tcg_temp_new_i64(); \ - get_cpu_vsr(xah, xA(ctx->opcode), true); \ - get_cpu_vsr(xal, xA(ctx->opcode), false); \ - tcg_gen_and_i64(xah, xah, sgm); \ - tcg_gen_and_i64(xal, xal, sgm); \ - tcg_gen_andc_i64(xbh, xbh, sgm); \ - tcg_gen_andc_i64(xbl, xbl, sgm); \ - tcg_gen_or_i64(xbh, xbh, xah); \ - tcg_gen_or_i64(xbl, xbl, xal); \ - tcg_temp_free_i64(xah); \ - tcg_temp_free_i64(xal); \ - break; \ - } \ - } \ - set_cpu_vsr(xT(ctx->opcode), xbh, true); \ - set_cpu_vsr(xT(ctx->opcode), xbl, false); \ - tcg_temp_free_i64(xbh); \ - tcg_temp_free_i64(xbl); \ - tcg_temp_free_i64(sgm); \ - } - -VSX_VECTOR_MOVE(xvcpsgndp, OP_CPSGN, SGN_MASK_DP) -VSX_VECTOR_MOVE(xvcpsgnsp, OP_CPSGN, SGN_MASK_SP) - #define TCG_OP_IMM_i64(FUNC, OP, IMM) \ static void FUNC(TCGv_i64 t, TCGv_i64 b) \ { \ @@ -852,6 +796,59 @@ TRANS(XVABSSP, do_vsx_msb_op, MO_32, do_xvabs_vec, do_xvabssp_i64) TRANS(XVNABSSP, do_vsx_msb_op, MO_32, do_xvnabs_vec, do_xvnabssp_i64) TRANS(XVNEGSP, do_vsx_msb_op, MO_32, do_xvneg_vec, do_xvnegsp_i64) +static void do_xvcpsgndp_i64(TCGv_i64 t, TCGv_i64 a, TCGv_i64 b) +{ + tcg_gen_andi_i64(a, a, SGN_MASK_DP); + tcg_gen_andi_i64(b, b, ~SGN_MASK_DP); + tcg_gen_or_i64(t, a, b); +} + +static void do_xvcpsgnsp_i64(TCGv_i64 t, TCGv_i64 a, TCGv_i64 b) +{ + tcg_gen_andi_i64(a, a, SGN_MASK_SP); + tcg_gen_andi_i64(b, b, ~SGN_MASK_SP); + tcg_gen_or_i64(t, a, b); +} + +static void do_xvcpsgn_vec(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b) +{ + uint64_t msb = (vece == MO_32) ? SGN_MASK_SP : SGN_MASK_DP; + tcg_gen_bitsel_vec(vece, t, tcg_constant_vec_matching(t, vece, msb), a, b); +} + +static bool do_xvcpsgn(DisasContext *ctx, arg_XX3 *a, unsigned vece) +{ + static const TCGOpcode vecop_list[] = { + 0 + }; + + static const GVecGen3 op[] = { + { + .fni8 = do_xvcpsgnsp_i64, + .fniv = do_xvcpsgn_vec, + .opt_opc = vecop_list, + .vece = MO_32 + }, + { + .fni8 = do_xvcpsgndp_i64, + .fniv = do_xvcpsgn_vec, + .opt_opc = vecop_list, + .vece = MO_64 + }, + }; + + REQUIRE_INSNS_FLAGS2(ctx, VSX); + REQUIRE_VSX(ctx); + + tcg_gen_gvec_3(vsr_full_offset(a->xt), vsr_full_offset(a->xa), + vsr_full_offset(a->xb), 16, 16, &op[vece - MO_32]); + + return true; +} + +TRANS(XVCPSGNSP, do_xvcpsgn, MO_32) +TRANS(XVCPSGNDP, do_xvcpsgn, MO_64) + #define VSX_CMP(name, op1, op2, inval, type) \ static void gen_##name(DisasContext *ctx) \ { \ diff --git a/target/ppc/translate/vsx-ops.c.inc b/target/ppc/translate/vsx-ops.c.inc index b77324e0a8..f7d7377379 100644 --- a/target/ppc/translate/vsx-ops.c.inc +++ b/target/ppc/translate/vsx-ops.c.inc @@ -165,9 +165,6 @@ GEN_XX3FORM(name, opc2, opc3 | 1, fl2) GEN_XX2FORM_DCMX(xvtstdcdp, 0x14, 0x1E, PPC2_ISA300), GEN_XX2FORM_DCMX(xvtstdcsp, 0x14, 0x1A, PPC2_ISA300), -GEN_XX3FORM(xvcpsgndp, 0x00, 0x1E, PPC2_VSX), -GEN_XX3FORM(xvcpsgnsp, 0x00, 0x1A, PPC2_VSX), - GEN_XX3FORM(xsadddp, 0x00, 0x04, PPC2_VSX), GEN_VSX_XFORM_300(xsaddqp, 0x04, 0x00, 0x0), GEN_XX3FORM(xssubdp, 0x00, 0x05, PPC2_VSX), From patchwork Mon Oct 10 19:13:54 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lucas Mateus Martins Araujo e Castro X-Patchwork-Id: 1688257 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=nongnu.org (client-ip=209.51.188.17; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4MmV2m6vBrz20cX for ; Tue, 11 Oct 2022 06:53:20 +1100 (AEDT) Received: from localhost ([::1]:51316 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1ohypy-0002F0-Pf for incoming@patchwork.ozlabs.org; Mon, 10 Oct 2022 15:53:18 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:53904) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1ohyFX-0001uH-72; Mon, 10 Oct 2022 15:15:39 -0400 Received: from [200.168.210.66] (port=19040 helo=outlook.eldorado.org.br) by eggs.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1ohyFV-0003Ag-Ab; Mon, 10 Oct 2022 15:15:38 -0400 Received: from p9ibm ([10.10.71.235]) by outlook.eldorado.org.br over TLS secured channel with Microsoft SMTPSVC(8.5.9600.16384); Mon, 10 Oct 2022 16:14:01 -0300 Received: from eldorado.org.br (unknown [10.10.70.45]) by p9ibm (Postfix) with ESMTP id 145E4800278; Mon, 10 Oct 2022 16:14:01 -0300 (-03) From: "Lucas Mateus Castro(alqotel)" To: qemu-devel@nongnu.org, qemu-ppc@nongnu.org Cc: richard.henderson@linaro.org, Daniel Henrique Barboza , "Lucas Mateus Castro (alqotel)" , =?utf-8?q?C?= =?utf-8?q?=C3=A9dric_Le_Goater?= , David Gibson , Greg Kurz Subject: [PATCH v2 10/12] target/ppc: Moved XVTSTDC[DS]P to decodetree Date: Mon, 10 Oct 2022 16:13:54 -0300 Message-Id: <20221010191356.83659-11-lucas.araujo@eldorado.org.br> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20221010191356.83659-1-lucas.araujo@eldorado.org.br> References: <20221010191356.83659-1-lucas.araujo@eldorado.org.br> MIME-Version: 1.0 X-OriginalArrivalTime: 10 Oct 2022 19:14:01.0345 (UTC) FILETIME=[74888710:01D8DCDC] X-Host-Lookup-Failed: Reverse DNS lookup failed for 200.168.210.66 (failed) Received-SPF: pass client-ip=200.168.210.66; envelope-from=lucas.araujo@eldorado.org.br; helo=outlook.eldorado.org.br X-Spam_score_int: -10 X-Spam_score: -1.1 X-Spam_bar: - X-Spam_report: (-1.1 / 5.0 requ) BAYES_00=-1.9, RDNS_NONE=0.793, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=no autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: "Qemu-devel" From: "Lucas Mateus Castro (alqotel)" Moved XVTSTDCSP and XVTSTDCDP to decodetree an restructured the helper to be simpler and do all decoding in the decodetree (so XB, XT and DCMX are all calculated outside the helper). Obs: The tests in this one are slightly different, these are the sum of these instructions with all possible immediate and those instructions are repeated 10 times. xvtstdcsp: rept loop master patch 8 12500 2,76402100 2,70699100 (-2.1%) 25 4000 2,64867100 2,67884100 (+1.1%) 100 1000 2,73806300 2,78701000 (+1.8%) 500 200 3,44666500 3,61027600 (+4.7%) 2500 40 5,85790200 6,47475500 (+10.5%) 8000 12 15,22102100 17,46062900 (+14.7%) xvtstdcdp: rept loop master patch 8 12500 2,11818000 1,61065300 (-24.0%) 25 4000 2,04573400 1,60132200 (-21.7%) 100 1000 2,13834100 1,69988100 (-20.5%) 500 200 2,73977000 2,48631700 (-9.3%) 2500 40 5,05067000 5,25914100 (+4.1%) 8000 12 14,60507800 15,93704900 (+9.1%) Signed-off-by: Lucas Mateus Castro (alqotel) Reviewed-by: Richard Henderson --- target/ppc/fpu_helper.c | 39 +++++++++++++++++++++++++++-- target/ppc/helper.h | 4 +-- target/ppc/insn32.decode | 5 ++++ target/ppc/translate/vsx-impl.c.inc | 28 +++++++++++++++++++-- target/ppc/translate/vsx-ops.c.inc | 8 ------ 5 files changed, 70 insertions(+), 14 deletions(-) diff --git a/target/ppc/fpu_helper.c b/target/ppc/fpu_helper.c index ae25f32d6e..960a76a8a5 100644 --- a/target/ppc/fpu_helper.c +++ b/target/ppc/fpu_helper.c @@ -3295,11 +3295,46 @@ void helper_##op(CPUPPCState *env, uint32_t opcode) \ } \ } -VSX_TEST_DC(xvtstdcdp, 2, xB(opcode), float64, VsrD(i), VsrD(i), UINT64_MAX, 0) -VSX_TEST_DC(xvtstdcsp, 4, xB(opcode), float32, VsrW(i), VsrW(i), UINT32_MAX, 0) VSX_TEST_DC(xststdcdp, 1, xB(opcode), float64, VsrD(0), VsrD(0), 0, 1) VSX_TEST_DC(xststdcqp, 1, (rB(opcode) + 32), float128, f128, VsrD(0), 0, 1) +#define VSX_TSTDC(tp) \ +static int32_t tp##_tstdc(tp b, uint32_t dcmx) \ +{ \ + uint32_t match = 0; \ + uint32_t sign = tp##_is_neg(b); \ + if (tp##_is_any_nan(b)) { \ + match = extract32(dcmx, 6, 1); \ + } else if (tp##_is_infinity(b)) { \ + match = extract32(dcmx, 4 + !sign, 1); \ + } else if (tp##_is_zero(b)) { \ + match = extract32(dcmx, 2 + !sign, 1); \ + } else if (tp##_is_zero_or_denormal(b)) { \ + match = extract32(dcmx, 0 + !sign, 1); \ + } \ + return (match != 0); \ +} + +VSX_TSTDC(float32) +VSX_TSTDC(float64) +#undef VSX_TSTDC + +void helper_XVTSTDCDP(ppc_vsr_t *t, ppc_vsr_t *b, uint64_t dcmx, uint32_t v) +{ + int i; + for (i = 0; i < 2; i++) { + t->s64[i] = (int64_t)-float64_tstdc(b->f64[i], dcmx); + } +} + +void helper_XVTSTDCSP(ppc_vsr_t *t, ppc_vsr_t *b, uint64_t dcmx, uint32_t v) +{ + int i; + for (i = 0; i < 4; i++) { + t->s32[i] = (int32_t)-float32_tstdc(b->f32[i], dcmx); + } +} + void helper_xststdcsp(CPUPPCState *env, uint32_t opcode, ppc_vsr_t *xb) { uint32_t dcmx, sign, exp; diff --git a/target/ppc/helper.h b/target/ppc/helper.h index fd8280dfa7..9e5d11939b 100644 --- a/target/ppc/helper.h +++ b/target/ppc/helper.h @@ -517,8 +517,8 @@ DEF_HELPER_3(xvcvsxdsp, void, env, vsr, vsr) DEF_HELPER_3(xvcvuxdsp, void, env, vsr, vsr) DEF_HELPER_3(xvcvsxwsp, void, env, vsr, vsr) DEF_HELPER_3(xvcvuxwsp, void, env, vsr, vsr) -DEF_HELPER_2(xvtstdcsp, void, env, i32) -DEF_HELPER_2(xvtstdcdp, void, env, i32) +DEF_HELPER_FLAGS_4(XVTSTDCSP, TCG_CALL_NO_RWG, void, vsr, vsr, i64, i32) +DEF_HELPER_FLAGS_4(XVTSTDCDP, TCG_CALL_NO_RWG, void, vsr, vsr, i64, i32) DEF_HELPER_3(xvrspi, void, env, vsr, vsr) DEF_HELPER_3(xvrspic, void, env, vsr, vsr) DEF_HELPER_3(xvrspim, void, env, vsr, vsr) diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode index 6549c4040e..c0a531be5c 100644 --- a/target/ppc/insn32.decode +++ b/target/ppc/insn32.decode @@ -199,6 +199,9 @@ @XX2_uim4 ...... ..... . uim:4 ..... ......... .. &XX2_uim xt=%xx_xt xb=%xx_xb +%xx_uim7 6:1 2:1 16:5 +@XX2_uim7 ...... ..... ..... ..... .... . ... . .. &XX2_uim xt=%xx_xt xb=%xx_xb uim=%xx_uim7 + &XX2_bf_xb bf xb @XX2_bf_xb ...... bf:3 .. ..... ..... ......... . . &XX2_bf_xb xb=%xx_xb @@ -848,6 +851,8 @@ XSCVSPDPN 111100 ..... ----- ..... 101001011 .. @XX2 ## VSX Binary Floating-Point Math Support Instructions XVXSIGSP 111100 ..... 01001 ..... 111011011 .. @XX2 +XVTSTDCDP 111100 ..... ..... ..... 1111 . 101 ... @XX2_uim7 +XVTSTDCSP 111100 ..... ..... ..... 1101 . 101 ... @XX2_uim7 ## VSX Vector Test Least-Significant Bit by Byte Instruction diff --git a/target/ppc/translate/vsx-impl.c.inc b/target/ppc/translate/vsx-impl.c.inc index 4f17da514c..8b42402343 100644 --- a/target/ppc/translate/vsx-impl.c.inc +++ b/target/ppc/translate/vsx-impl.c.inc @@ -630,6 +630,8 @@ static void gen_mtvsrws(DisasContext *ctx) #define OP_CPSGN 4 #define SGN_MASK_DP 0x8000000000000000ull #define SGN_MASK_SP 0x8000000080000000ull +#define EXP_MASK_DP 0x7FF0000000000000ull +#define EXP_MASK_SP 0x7F8000007F800000ull #define VSX_SCALAR_MOVE(name, op, sgn_mask) \ static void glue(gen_, name)(DisasContext *ctx) \ @@ -1111,6 +1113,30 @@ GEN_VSX_HELPER_X2(xscvhpdp, 0x16, 0x15, 0x10, PPC2_ISA300) GEN_VSX_HELPER_R2(xscvsdqp, 0x04, 0x1A, 0x0A, PPC2_ISA300) GEN_VSX_HELPER_X2(xscvspdp, 0x12, 0x14, 0, PPC2_VSX) +static bool do_xvtstdc(DisasContext *ctx, arg_XX2_uim *a, unsigned vece) +{ + static const GVecGen2i op[] = { + { + .fnoi = gen_helper_XVTSTDCSP, + .vece = MO_32 + }, + { + .fnoi = gen_helper_XVTSTDCDP, + .vece = MO_64 + }, + }; + + REQUIRE_VSX(ctx); + + tcg_gen_gvec_2i(vsr_full_offset(a->xt), vsr_full_offset(a->xb), + 16, 16, (int32_t)(a->uim), &op[vece - MO_32]); + + return true; +} + +TRANS_FLAGS2(VSX, XVTSTDCSP, do_xvtstdc, MO_32) +TRANS_FLAGS2(VSX, XVTSTDCDP, do_xvtstdc, MO_64) + bool trans_XSCVSPDPN(DisasContext *ctx, arg_XX2 *a) { TCGv_i64 tmp; @@ -1214,8 +1240,6 @@ GEN_VSX_HELPER_X2(xvrspic, 0x16, 0x0A, 0, PPC2_VSX) GEN_VSX_HELPER_X2(xvrspim, 0x12, 0x0B, 0, PPC2_VSX) GEN_VSX_HELPER_X2(xvrspip, 0x12, 0x0A, 0, PPC2_VSX) GEN_VSX_HELPER_X2(xvrspiz, 0x12, 0x09, 0, PPC2_VSX) -GEN_VSX_HELPER_2(xvtstdcsp, 0x14, 0x1A, 0, PPC2_VSX) -GEN_VSX_HELPER_2(xvtstdcdp, 0x14, 0x1E, 0, PPC2_VSX) static bool trans_XXPERM(DisasContext *ctx, arg_XX3 *a) { diff --git a/target/ppc/translate/vsx-ops.c.inc b/target/ppc/translate/vsx-ops.c.inc index f7d7377379..4b317d4b06 100644 --- a/target/ppc/translate/vsx-ops.c.inc +++ b/target/ppc/translate/vsx-ops.c.inc @@ -157,14 +157,6 @@ GEN_XX2FORM_EO(xvxexpdp, 0x16, 0x1D, 0x00, PPC2_ISA300), GEN_XX2FORM_EO(xvxsigdp, 0x16, 0x1D, 0x01, PPC2_ISA300), GEN_XX2FORM_EO(xvxexpsp, 0x16, 0x1D, 0x08, PPC2_ISA300), -/* DCMX = bit[25] << 6 | bit[29] << 5 | bit[11:15] */ -#define GEN_XX2FORM_DCMX(name, opc2, opc3, fl2) \ -GEN_XX3FORM(name, opc2, opc3 | 0, fl2), \ -GEN_XX3FORM(name, opc2, opc3 | 1, fl2) - -GEN_XX2FORM_DCMX(xvtstdcdp, 0x14, 0x1E, PPC2_ISA300), -GEN_XX2FORM_DCMX(xvtstdcsp, 0x14, 0x1A, PPC2_ISA300), - GEN_XX3FORM(xsadddp, 0x00, 0x04, PPC2_VSX), GEN_VSX_XFORM_300(xsaddqp, 0x04, 0x00, 0x0), GEN_XX3FORM(xssubdp, 0x00, 0x05, PPC2_VSX), From patchwork Mon Oct 10 19:13:55 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lucas Mateus Martins Araujo e Castro X-Patchwork-Id: 1688258 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=nongnu.org (client-ip=209.51.188.17; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4MmV5z4ltzz1yqk for ; Tue, 11 Oct 2022 06:56:06 +1100 (AEDT) Received: from localhost ([::1]:51432 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1ohysd-0006MJ-6W for incoming@patchwork.ozlabs.org; Mon, 10 Oct 2022 15:56:03 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:53906) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1ohyFc-00022R-H7; Mon, 10 Oct 2022 15:15:44 -0400 Received: from [200.168.210.66] (port=19040 helo=outlook.eldorado.org.br) by eggs.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1ohyFY-0003Ag-8u; Mon, 10 Oct 2022 15:15:43 -0400 Received: from p9ibm ([10.10.71.235]) by outlook.eldorado.org.br over TLS secured channel with Microsoft SMTPSVC(8.5.9600.16384); Mon, 10 Oct 2022 16:14:01 -0300 Received: from eldorado.org.br (unknown [10.10.70.45]) by p9ibm (Postfix) with ESMTP id 5FDC8800278; Mon, 10 Oct 2022 16:14:01 -0300 (-03) From: "Lucas Mateus Castro(alqotel)" To: qemu-devel@nongnu.org, qemu-ppc@nongnu.org Cc: richard.henderson@linaro.org, Daniel Henrique Barboza , "Lucas Mateus Castro (alqotel)" , =?utf-8?q?C?= =?utf-8?q?=C3=A9dric_Le_Goater?= , David Gibson , Greg Kurz Subject: [PATCH v2 11/12] target/ppc: Moved XSTSTDC[QDS]P to decodetree Date: Mon, 10 Oct 2022 16:13:55 -0300 Message-Id: <20221010191356.83659-12-lucas.araujo@eldorado.org.br> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20221010191356.83659-1-lucas.araujo@eldorado.org.br> References: <20221010191356.83659-1-lucas.araujo@eldorado.org.br> MIME-Version: 1.0 X-OriginalArrivalTime: 10 Oct 2022 19:14:01.0642 (UTC) FILETIME=[74B5D8A0:01D8DCDC] X-Host-Lookup-Failed: Reverse DNS lookup failed for 200.168.210.66 (failed) Received-SPF: pass client-ip=200.168.210.66; envelope-from=lucas.araujo@eldorado.org.br; helo=outlook.eldorado.org.br X-Spam_score_int: -10 X-Spam_score: -1.1 X-Spam_bar: - X-Spam_report: (-1.1 / 5.0 requ) BAYES_00=-1.9, RDNS_NONE=0.793, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=no autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: "Qemu-devel" From: "Lucas Mateus Castro (alqotel)" Moved XSTSTDCSP, XSTSTDCDP and XSTSTDCQP to decodetree and moved some of its decoding away from the helper as previously the DCMX, XB and BF were calculated in the helper with the help of cpu_env, now that part was moved to the decodetree with the rest. xvtstdcsp: rept loop master patch 8 12500 1,85393600 1,94683600 (+5.0%) 25 4000 1,78779800 1,92479000 (+7.7%) 100 1000 2,12775000 2,28895500 (+7.6%) 500 200 2,99655300 3,23102900 (+7.8%) 2500 40 6,89082200 7,44827500 (+8.1%) 8000 12 17,50585500 18,95152100 (+8.3%) xvtstdcdp: rept loop master patch 8 12500 1,39043100 1,33539800 (-4.0%) 25 4000 1,35731800 1,37347800 (+1.2%) 100 1000 1,51514800 1,56053000 (+3.0%) 500 200 2,21014400 2,47906000 (+12.2%) 2500 40 5,39488200 6,68766700 (+24.0%) 8000 12 13,98623900 18,17661900 (+30.0%) xvtstdcdp: rept loop master patch 8 12500 1,35123800 1,34455800 (-0.5%) 25 4000 1,36441200 1,36759600 (+0.2%) 100 1000 1,49763500 1,54138400 (+2.9%) 500 200 2,19020200 2,46196400 (+12.4%) 2500 40 5,39265700 6,68147900 (+23.9%) 8000 12 14,04163600 18,19669600 (+29.6%) As some values are now decoded outside the helper and passed to it as an argument the number of arguments of the helper increased, the number of TCGop needed to load the arguments increased. I suspect that's why the slow-down in the tests with a high REPT but low LOOP. Signed-off-by: Lucas Mateus Castro (alqotel) --- target/ppc/fpu_helper.c | 114 +++++++++------------------- target/ppc/helper.h | 6 +- target/ppc/insn32.decode | 6 ++ target/ppc/translate/vsx-impl.c.inc | 20 ++++- target/ppc/translate/vsx-ops.c.inc | 4 - 5 files changed, 60 insertions(+), 90 deletions(-) diff --git a/target/ppc/fpu_helper.c b/target/ppc/fpu_helper.c index 960a76a8a5..a66e16c212 100644 --- a/target/ppc/fpu_helper.c +++ b/target/ppc/fpu_helper.c @@ -3241,63 +3241,6 @@ void helper_XVXSIGSP(ppc_vsr_t *xt, ppc_vsr_t *xb) *xt = t; } -/* - * VSX_TEST_DC - VSX floating point test data class - * op - instruction mnemonic - * nels - number of elements (1, 2 or 4) - * xbn - VSR register number - * tp - type (float32 or float64) - * fld - vsr_t field (VsrD(*) or VsrW(*)) - * tfld - target vsr_t field (VsrD(*) or VsrW(*)) - * fld_max - target field max - * scrf - set result in CR and FPCC - */ -#define VSX_TEST_DC(op, nels, xbn, tp, fld, tfld, fld_max, scrf) \ -void helper_##op(CPUPPCState *env, uint32_t opcode) \ -{ \ - ppc_vsr_t *xt = &env->vsr[xT(opcode)]; \ - ppc_vsr_t *xb = &env->vsr[xbn]; \ - ppc_vsr_t t = { }; \ - uint32_t i, sign, dcmx; \ - uint32_t cc, match = 0; \ - \ - if (!scrf) { \ - dcmx = DCMX_XV(opcode); \ - } else { \ - t = *xt; \ - dcmx = DCMX(opcode); \ - } \ - \ - for (i = 0; i < nels; i++) { \ - sign = tp##_is_neg(xb->fld); \ - if (tp##_is_any_nan(xb->fld)) { \ - match = extract32(dcmx, 6, 1); \ - } else if (tp##_is_infinity(xb->fld)) { \ - match = extract32(dcmx, 4 + !sign, 1); \ - } else if (tp##_is_zero(xb->fld)) { \ - match = extract32(dcmx, 2 + !sign, 1); \ - } else if (tp##_is_zero_or_denormal(xb->fld)) { \ - match = extract32(dcmx, 0 + !sign, 1); \ - } \ - \ - if (scrf) { \ - cc = sign << CRF_LT_BIT | match << CRF_EQ_BIT; \ - env->fpscr &= ~FP_FPCC; \ - env->fpscr |= cc << FPSCR_FPCC; \ - env->crf[BF(opcode)] = cc; \ - } else { \ - t.tfld = match ? fld_max : 0; \ - } \ - match = 0; \ - } \ - if (!scrf) { \ - *xt = t; \ - } \ -} - -VSX_TEST_DC(xststdcdp, 1, xB(opcode), float64, VsrD(0), VsrD(0), 0, 1) -VSX_TEST_DC(xststdcqp, 1, (rB(opcode) + 32), float128, f128, VsrD(0), 0, 1) - #define VSX_TSTDC(tp) \ static int32_t tp##_tstdc(tp b, uint32_t dcmx) \ { \ @@ -3317,6 +3260,7 @@ static int32_t tp##_tstdc(tp b, uint32_t dcmx) \ VSX_TSTDC(float32) VSX_TSTDC(float64) +VSX_TSTDC(float128) #undef VSX_TSTDC void helper_XVTSTDCDP(ppc_vsr_t *t, ppc_vsr_t *b, uint64_t dcmx, uint32_t v) @@ -3335,34 +3279,44 @@ void helper_XVTSTDCSP(ppc_vsr_t *t, ppc_vsr_t *b, uint64_t dcmx, uint32_t v) } } -void helper_xststdcsp(CPUPPCState *env, uint32_t opcode, ppc_vsr_t *xb) +static bool not_SP_value(float64 val) { - uint32_t dcmx, sign, exp; - uint32_t cc, match = 0, not_sp = 0; - float64 arg = xb->VsrD(0); - float64 arg_sp; - - dcmx = DCMX(opcode); - exp = (arg >> 52) & 0x7FF; - sign = float64_is_neg(arg); - - if (float64_is_any_nan(arg)) { - match = extract32(dcmx, 6, 1); - } else if (float64_is_infinity(arg)) { - match = extract32(dcmx, 4 + !sign, 1); - } else if (float64_is_zero(arg)) { - match = extract32(dcmx, 2 + !sign, 1); - } else if (float64_is_zero_or_denormal(arg) || (exp > 0 && exp < 0x381)) { - match = extract32(dcmx, 0 + !sign, 1); - } - - arg_sp = helper_todouble(helper_tosingle(arg)); - not_sp = arg != arg_sp; + return val != helper_todouble(helper_tosingle(val)); +} +/* + * VSX_XS_TSTDC - VSX Scalar Test Data Class + * NAME - instruction name + * FLD - vsr_t field (VsrD(0) or f128) + * TP - type (float64 or float128) + */ +#define VSX_XS_TSTDC(NAME, FLD, TP) \ + void helper_##NAME(CPUPPCState *env, uint32_t bf, \ + uint32_t dcmx, ppc_vsr_t *b) \ + { \ + uint32_t cc, match, sign = TP##_is_neg(b->FLD); \ + match = TP##_tstdc(b->FLD, dcmx); \ + cc = sign << CRF_LT_BIT | match << CRF_EQ_BIT; \ + env->fpscr &= ~FP_FPCC; \ + env->fpscr |= cc << FPSCR_FPCC; \ + env->crf[bf] = cc; \ + } + +VSX_XS_TSTDC(XSTSTDCDP, VsrD(0), float64) +VSX_XS_TSTDC(XSTSTDCQP, f128, float128) +#undef VSX_XS_TSTDC + +void helper_XSTSTDCSP(CPUPPCState *env, uint32_t bf, + uint32_t dcmx, ppc_vsr_t *b) +{ + uint32_t cc, match, sign = float64_is_neg(b->VsrD(0)); + uint32_t exp = (b->VsrD(0) >> 52) & 0x7FF; + int not_sp = (int)not_SP_value(b->VsrD(0)); + match = float64_tstdc(b->VsrD(0), dcmx) || (exp > 0 && exp < 0x381); cc = sign << CRF_LT_BIT | match << CRF_EQ_BIT | not_sp << CRF_SO_BIT; env->fpscr &= ~FP_FPCC; env->fpscr |= cc << FPSCR_FPCC; - env->crf[BF(opcode)] = cc; + env->crf[bf] = cc; } void helper_xsrqpi(CPUPPCState *env, uint32_t opcode, diff --git a/target/ppc/helper.h b/target/ppc/helper.h index 9e5d11939b..8344fe39c6 100644 --- a/target/ppc/helper.h +++ b/target/ppc/helper.h @@ -417,9 +417,9 @@ DEF_HELPER_3(xscvuxdsp, void, env, vsr, vsr) DEF_HELPER_3(xscvsxdsp, void, env, vsr, vsr) DEF_HELPER_4(xscvudqp, void, env, i32, vsr, vsr) DEF_HELPER_3(xscvuxddp, void, env, vsr, vsr) -DEF_HELPER_3(xststdcsp, void, env, i32, vsr) -DEF_HELPER_2(xststdcdp, void, env, i32) -DEF_HELPER_2(xststdcqp, void, env, i32) +DEF_HELPER_4(XSTSTDCSP, void, env, i32, i32, vsr) +DEF_HELPER_4(XSTSTDCDP, void, env, i32, i32, vsr) +DEF_HELPER_4(XSTSTDCQP, void, env, i32, i32, vsr) DEF_HELPER_3(xsrdpi, void, env, vsr, vsr) DEF_HELPER_3(xsrdpic, void, env, vsr, vsr) DEF_HELPER_3(xsrdpim, void, env, vsr, vsr) diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode index c0a531be5c..334eb1beca 100644 --- a/target/ppc/insn32.decode +++ b/target/ppc/insn32.decode @@ -202,6 +202,9 @@ %xx_uim7 6:1 2:1 16:5 @XX2_uim7 ...... ..... ..... ..... .... . ... . .. &XX2_uim xt=%xx_xt xb=%xx_xb uim=%xx_uim7 +&XX2_bf_uim bf xb uim +@XX2_bf_uim ...... bf:3 uim:7 ..... ......... . . &XX2_bf_uim + &XX2_bf_xb bf xb @XX2_bf_xb ...... bf:3 .. ..... ..... ......... . . &XX2_bf_xb xb=%xx_xb @@ -853,6 +856,9 @@ XSCVSPDPN 111100 ..... ----- ..... 101001011 .. @XX2 XVXSIGSP 111100 ..... 01001 ..... 111011011 .. @XX2 XVTSTDCDP 111100 ..... ..... ..... 1111 . 101 ... @XX2_uim7 XVTSTDCSP 111100 ..... ..... ..... 1101 . 101 ... @XX2_uim7 +XSTSTDCSP 111100 ... ....... ..... 100101010 . - @XX2_bf_uim xb=%xx_xb +XSTSTDCDP 111100 ... ....... ..... 101101010 . - @XX2_bf_uim xb=%xx_xb +XSTSTDCQP 111111 ... ....... xb:5 1011000100 - @XX2_bf_uim ## VSX Vector Test Least-Significant Bit by Byte Instruction diff --git a/target/ppc/translate/vsx-impl.c.inc b/target/ppc/translate/vsx-impl.c.inc index 8b42402343..4fdbc45ff4 100644 --- a/target/ppc/translate/vsx-impl.c.inc +++ b/target/ppc/translate/vsx-impl.c.inc @@ -1137,6 +1137,23 @@ static bool do_xvtstdc(DisasContext *ctx, arg_XX2_uim *a, unsigned vece) TRANS_FLAGS2(VSX, XVTSTDCSP, do_xvtstdc, MO_32) TRANS_FLAGS2(VSX, XVTSTDCDP, do_xvtstdc, MO_64) +static bool do_XX2_bf_uim(DisasContext *ctx, arg_XX2_bf_uim *a, bool vsr, + void (*gen_helper)(TCGv_env, TCGv_i32, TCGv_i32, TCGv_ptr)) +{ + TCGv_ptr xb; + + REQUIRE_VSX(ctx); + xb = vsr ? gen_vsr_ptr(a->xb) : gen_avr_ptr(a->xb); + gen_helper(cpu_env, tcg_constant_i32(a->bf), tcg_constant_i32(a->uim), xb); + tcg_temp_free_ptr(xb); + + return true; +} + +TRANS_FLAGS2(ISA300, XSTSTDCSP, do_XX2_bf_uim, true, gen_helper_XSTSTDCSP) +TRANS_FLAGS2(ISA300, XSTSTDCDP, do_XX2_bf_uim, true, gen_helper_XSTSTDCDP) +TRANS_FLAGS2(ISA300, XSTSTDCQP, do_XX2_bf_uim, false, gen_helper_XSTSTDCQP) + bool trans_XSCVSPDPN(DisasContext *ctx, arg_XX2 *a) { TCGv_i64 tmp; @@ -1183,9 +1200,6 @@ GEN_VSX_HELPER_X2(xssqrtsp, 0x16, 0x00, 0, PPC2_VSX207) GEN_VSX_HELPER_X2(xsrsqrtesp, 0x14, 0x00, 0, PPC2_VSX207) GEN_VSX_HELPER_X2(xscvsxdsp, 0x10, 0x13, 0, PPC2_VSX207) GEN_VSX_HELPER_X2(xscvuxdsp, 0x10, 0x12, 0, PPC2_VSX207) -GEN_VSX_HELPER_X1(xststdcsp, 0x14, 0x12, 0, PPC2_ISA300) -GEN_VSX_HELPER_2(xststdcdp, 0x14, 0x16, 0, PPC2_ISA300) -GEN_VSX_HELPER_2(xststdcqp, 0x04, 0x16, 0, PPC2_ISA300) GEN_VSX_HELPER_X3(xvadddp, 0x00, 0x0C, 0, PPC2_VSX) GEN_VSX_HELPER_X3(xvsubdp, 0x00, 0x0D, 0, PPC2_VSX) diff --git a/target/ppc/translate/vsx-ops.c.inc b/target/ppc/translate/vsx-ops.c.inc index 4b317d4b06..a3ba094d62 100644 --- a/target/ppc/translate/vsx-ops.c.inc +++ b/target/ppc/translate/vsx-ops.c.inc @@ -147,10 +147,6 @@ GEN_HANDLER_E(xsiexpdp, 0x3C, 0x16, 0x1C, 0, PPC_NONE, PPC2_ISA300), GEN_VSX_XFORM_300(xsiexpqp, 0x4, 0x1B, 0x00000001), #endif -GEN_XX2FORM(xststdcdp, 0x14, 0x16, PPC2_ISA300), -GEN_XX2FORM(xststdcsp, 0x14, 0x12, PPC2_ISA300), -GEN_VSX_XFORM_300(xststdcqp, 0x04, 0x16, 0x00000001), - GEN_XX3FORM(xviexpsp, 0x00, 0x1B, PPC2_ISA300), GEN_XX3FORM(xviexpdp, 0x00, 0x1F, PPC2_ISA300), GEN_XX2FORM_EO(xvxexpdp, 0x16, 0x1D, 0x00, PPC2_ISA300), From patchwork Mon Oct 10 19:13:56 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lucas Mateus Martins Araujo e Castro X-Patchwork-Id: 1688259 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=nongnu.org (client-ip=209.51.188.17; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4MmVC04Wpnz23jS for ; Tue, 11 Oct 2022 07:00:28 +1100 (AEDT) Received: from localhost ([::1]:51556 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1ohywr-0003JK-N3 for incoming@patchwork.ozlabs.org; Mon, 10 Oct 2022 16:00:25 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:56730) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1ohyFf-00028h-PL; Mon, 10 Oct 2022 15:15:47 -0400 Received: from [200.168.210.66] (port=19040 helo=outlook.eldorado.org.br) by eggs.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1ohyFc-0003Ag-Iq; Mon, 10 Oct 2022 15:15:47 -0400 Received: from p9ibm ([10.10.71.235]) by outlook.eldorado.org.br over TLS secured channel with Microsoft SMTPSVC(8.5.9600.16384); Mon, 10 Oct 2022 16:14:01 -0300 Received: from eldorado.org.br (unknown [10.10.70.45]) by p9ibm (Postfix) with ESMTP id 95372800631; Mon, 10 Oct 2022 16:14:01 -0300 (-03) From: "Lucas Mateus Castro(alqotel)" To: qemu-devel@nongnu.org, qemu-ppc@nongnu.org Cc: richard.henderson@linaro.org, Daniel Henrique Barboza , "Lucas Mateus Castro (alqotel)" , =?utf-8?q?C?= =?utf-8?q?=C3=A9dric_Le_Goater?= , David Gibson , Greg Kurz Subject: [PATCH v2 12/12] target/ppc: Use gvec to decode XVTSTDC[DS]P Date: Mon, 10 Oct 2022 16:13:56 -0300 Message-Id: <20221010191356.83659-13-lucas.araujo@eldorado.org.br> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20221010191356.83659-1-lucas.araujo@eldorado.org.br> References: <20221010191356.83659-1-lucas.araujo@eldorado.org.br> MIME-Version: 1.0 X-OriginalArrivalTime: 10 Oct 2022 19:14:01.0876 (UTC) FILETIME=[74D98D40:01D8DCDC] X-Host-Lookup-Failed: Reverse DNS lookup failed for 200.168.210.66 (failed) Received-SPF: pass client-ip=200.168.210.66; envelope-from=lucas.araujo@eldorado.org.br; helo=outlook.eldorado.org.br X-Spam_score_int: -10 X-Spam_score: -1.1 X-Spam_bar: - X-Spam_report: (-1.1 / 5.0 requ) BAYES_00=-1.9, RDNS_NONE=0.793, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=no autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: "Qemu-devel" From: "Lucas Mateus Castro (alqotel)" Used gvec to translate XVTSTDCSP and XVTSTDCDP. xvtstdcsp: rept loop imm prev version current version 25 4000 0 0,047550 0,040820 (-14.2%) 25 4000 1 0,069520 0,053520 (-23.0%) 25 4000 3 0,078660 0,058470 (-25.7%) 25 4000 51 0,099280 0,190100 (+91.5%) 25 4000 127 0,129690 0,201750 (+55.6%) 8000 12 0 0,554625 0,391385 (-29.4%) 8000 12 1 2,675635 1,423656 (-46.8%) 8000 12 3 3,186823 1,756885 (-44.9%) 8000 12 51 4,284417 1,363698 (-68.2%) 8000 12 127 5,638000 1,305333 (-76.8%) xvtstdcdp: rept loop imm prev version current version 25 4000 0 0,047450 0,040590 (-14.5%) 25 4000 1 0,074130 0,053570 (-27.7%) 25 4000 3 0,084180 0,063020 (-25.1%) 25 4000 51 0,103340 0,127980 (+23.8%) 25 4000 127 0,134670 0,128660 (-4.5%) 8000 12 0 0,522427 0,391510 (-25.1%) 8000 12 1 2,884708 1,426802 (-50.5%) 8000 12 3 3,427625 1,972115 (-42.5%) 8000 12 51 4,450260 1,251865 (-71.9%) 8000 12 127 5,854479 1,250719 (-78.6%) Overall, these instructions are the hardest ones to measure performance as the gvec implementation is affected by the immediate. Above there are 5 different scenarios when it comes to immediate and 2 when it comes to rept/loop combination. The immediates scenarios are: all bits are 0 therefore the target register should just be changed to 0, with 1 bit set, with 2 bits set in a combination the new implementation can deal with using gvec, 4 bits set and the new implementation can't deal with it using gvec and all bits set. The rept/loop scenarios are high loop and low rept (so it should spend more time executing it than translating it) and high rept low loop (so it should spend more time translating it than executing this code). There was a gain when it came to translating the instructions and in the execution time in the immediates the new implementation is configured to accept, but a loss in performance in execution time for more exoteric immediates. Signed-off-by: Lucas Mateus Castro (alqotel) --- target/ppc/fpu_helper.c | 7 +- target/ppc/helper.h | 4 +- target/ppc/translate/vsx-impl.c.inc | 188 ++++++++++++++++++++++++++-- 3 files changed, 184 insertions(+), 15 deletions(-) diff --git a/target/ppc/fpu_helper.c b/target/ppc/fpu_helper.c index a66e16c212..6c94576575 100644 --- a/target/ppc/fpu_helper.c +++ b/target/ppc/fpu_helper.c @@ -22,6 +22,7 @@ #include "exec/exec-all.h" #include "internal.h" #include "fpu/softfloat.h" +#include "tcg/tcg-gvec-desc.h" static inline float128 float128_snan_to_qnan(float128 x) { @@ -3263,17 +3264,19 @@ VSX_TSTDC(float64) VSX_TSTDC(float128) #undef VSX_TSTDC -void helper_XVTSTDCDP(ppc_vsr_t *t, ppc_vsr_t *b, uint64_t dcmx, uint32_t v) +void helper_XVTSTDCDP(ppc_vsr_t *t, ppc_vsr_t *b, uint32_t dcmx) { int i; + dcmx = simd_data(dcmx); for (i = 0; i < 2; i++) { t->s64[i] = (int64_t)-float64_tstdc(b->f64[i], dcmx); } } -void helper_XVTSTDCSP(ppc_vsr_t *t, ppc_vsr_t *b, uint64_t dcmx, uint32_t v) +void helper_XVTSTDCSP(ppc_vsr_t *t, ppc_vsr_t *b, uint32_t dcmx) { int i; + dcmx = simd_data(dcmx); for (i = 0; i < 4; i++) { t->s32[i] = (int32_t)-float32_tstdc(b->f32[i], dcmx); } diff --git a/target/ppc/helper.h b/target/ppc/helper.h index 8344fe39c6..2851418acc 100644 --- a/target/ppc/helper.h +++ b/target/ppc/helper.h @@ -517,8 +517,8 @@ DEF_HELPER_3(xvcvsxdsp, void, env, vsr, vsr) DEF_HELPER_3(xvcvuxdsp, void, env, vsr, vsr) DEF_HELPER_3(xvcvsxwsp, void, env, vsr, vsr) DEF_HELPER_3(xvcvuxwsp, void, env, vsr, vsr) -DEF_HELPER_FLAGS_4(XVTSTDCSP, TCG_CALL_NO_RWG, void, vsr, vsr, i64, i32) -DEF_HELPER_FLAGS_4(XVTSTDCDP, TCG_CALL_NO_RWG, void, vsr, vsr, i64, i32) +DEF_HELPER_FLAGS_3(XVTSTDCSP, TCG_CALL_NO_RWG, void, vsr, vsr, i32) +DEF_HELPER_FLAGS_3(XVTSTDCDP, TCG_CALL_NO_RWG, void, vsr, vsr, i32) DEF_HELPER_3(xvrspi, void, env, vsr, vsr) DEF_HELPER_3(xvrspic, void, env, vsr, vsr) DEF_HELPER_3(xvrspim, void, env, vsr, vsr) diff --git a/target/ppc/translate/vsx-impl.c.inc b/target/ppc/translate/vsx-impl.c.inc index 4fdbc45ff4..26fc8c0b01 100644 --- a/target/ppc/translate/vsx-impl.c.inc +++ b/target/ppc/translate/vsx-impl.c.inc @@ -632,6 +632,8 @@ static void gen_mtvsrws(DisasContext *ctx) #define SGN_MASK_SP 0x8000000080000000ull #define EXP_MASK_DP 0x7FF0000000000000ull #define EXP_MASK_SP 0x7F8000007F800000ull +#define FRC_MASK_DP (~(SGN_MASK_DP | EXP_MASK_DP)) +#define FRC_MASK_SP (~(SGN_MASK_SP | EXP_MASK_SP)) #define VSX_SCALAR_MOVE(name, op, sgn_mask) \ static void glue(gen_, name)(DisasContext *ctx) \ @@ -1113,23 +1115,187 @@ GEN_VSX_HELPER_X2(xscvhpdp, 0x16, 0x15, 0x10, PPC2_ISA300) GEN_VSX_HELPER_R2(xscvsdqp, 0x04, 0x1A, 0x0A, PPC2_ISA300) GEN_VSX_HELPER_X2(xscvspdp, 0x12, 0x14, 0, PPC2_VSX) +/* test if +Inf */ +static void gen_is_pos_inf(unsigned vece, TCGv_vec t, TCGv_vec b) +{ + uint64_t exp_msk = (vece == MO_32) ? (uint32_t)EXP_MASK_SP : EXP_MASK_DP; + tcg_gen_cmp_vec(TCG_COND_EQ, vece, t, b, + tcg_constant_vec_matching(t, vece, exp_msk)); +} + +/* test if -Inf */ +static void gen_is_neg_inf(unsigned vece, TCGv_vec t, TCGv_vec b) +{ + uint64_t exp_msk = (vece == MO_32) ? (uint32_t)EXP_MASK_SP : EXP_MASK_DP; + uint64_t sgn_msk = (vece == MO_32) ? (uint32_t)SGN_MASK_SP : SGN_MASK_DP; + tcg_gen_cmp_vec(TCG_COND_EQ, vece, t, b, + tcg_constant_vec_matching(t, vece, sgn_msk | exp_msk)); +} + +/* test if +Inf or -Inf */ +static void gen_is_any_inf(unsigned vece, TCGv_vec t, TCGv_vec b) +{ + uint64_t exp_msk = (vece == MO_32) ? (uint32_t)EXP_MASK_SP : EXP_MASK_DP; + uint64_t sgn_msk = (vece == MO_32) ? (uint32_t)SGN_MASK_SP : SGN_MASK_DP; + tcg_gen_andc_vec(vece, b, b, tcg_constant_vec_matching(t, vece, exp_msk)); + tcg_gen_cmp_vec(TCG_COND_EQ, vece, t, b, + tcg_constant_vec_matching(t, vece, sgn_msk)); +} + +/* test if +0 */ +static void gen_is_pos_zero(unsigned vece, TCGv_vec t, TCGv_vec b) +{ + tcg_gen_cmp_vec(TCG_COND_EQ, vece, t, b, + tcg_constant_vec_matching(t, vece, 0)); +} + +/* test if -0 */ +static void gen_is_neg_zero(unsigned vece, TCGv_vec t, TCGv_vec b) +{ + uint64_t sgn_msk = (vece == MO_32) ? (uint32_t)SGN_MASK_SP : SGN_MASK_DP; + tcg_gen_cmp_vec(TCG_COND_EQ, vece, t, b, + tcg_constant_vec_matching(t, vece, sgn_msk)); +} + +/* test if +0 or -0 */ +static void gen_is_any_zero(unsigned vece, TCGv_vec t, TCGv_vec b) +{ + uint64_t sgn_msk = (vece == MO_32) ? (uint32_t)SGN_MASK_SP : SGN_MASK_DP; + tcg_gen_and_vec(vece, b, b, tcg_constant_vec_matching(t, vece, ~sgn_msk)); + tcg_gen_cmp_vec(TCG_COND_EQ, vece, t, b, + tcg_constant_vec_matching(t, vece, 0)); +} + +/* test if +Denormal */ +static void gen_is_pos_denormal(unsigned vece, TCGv_vec t, TCGv_vec b) +{ + uint64_t frc_msk = (vece == MO_32) ? (uint32_t)FRC_MASK_SP : FRC_MASK_DP; + tcg_gen_cmp_vec(TCG_COND_LEU, vece, t, b, + tcg_constant_vec_matching(t, vece, frc_msk)); + tcg_gen_cmp_vec(TCG_COND_NE, vece, b, b, + tcg_constant_vec_matching(t, vece, 0)); + tcg_gen_and_vec(vece, t, t, b); +} + +/* test if -Denormal */ +static void gen_is_neg_denormal(unsigned vece, TCGv_vec t, TCGv_vec b) +{ + uint64_t sgn_msk = (vece == MO_32) ? (uint32_t)SGN_MASK_SP : SGN_MASK_DP; + uint64_t frc_msk = (vece == MO_32) ? (uint32_t)FRC_MASK_SP : FRC_MASK_DP; + tcg_gen_cmp_vec(TCG_COND_LEU, vece, t, b, + tcg_constant_vec_matching(t, vece, sgn_msk | frc_msk)); + tcg_gen_cmp_vec(TCG_COND_GTU, vece, b, b, + tcg_constant_vec_matching(t, vece, sgn_msk)); + tcg_gen_and_vec(vece, t, t, b); +} + +/* test if +Denormal or -Denormal */ +static void gen_is_any_denormal(unsigned vece, TCGv_vec t, TCGv_vec b) +{ + uint64_t sgn_msk = (vece == MO_32) ? (uint32_t)SGN_MASK_SP : SGN_MASK_DP; + uint64_t frc_msk = (vece == MO_32) ? (uint32_t)FRC_MASK_SP : FRC_MASK_DP; + tcg_gen_and_vec(vece, b, b, tcg_constant_vec_matching(t, vece, ~sgn_msk)); + tcg_gen_cmp_vec(TCG_COND_LE, vece, t, b, + tcg_constant_vec_matching(t, vece, frc_msk)); + tcg_gen_cmp_vec(TCG_COND_NE, vece, b, b, + tcg_constant_vec_matching(t, vece, 0)); + tcg_gen_and_vec(vece, t, t, b); +} + +/* test if NaN */ +static void gen_is_nan(unsigned vece, TCGv_vec t, TCGv_vec b) +{ + uint64_t exp_msk = (vece == MO_32) ? (uint32_t)EXP_MASK_SP : EXP_MASK_DP; + uint64_t sgn_msk = (vece == MO_32) ? (uint32_t)SGN_MASK_SP : SGN_MASK_DP; + tcg_gen_and_vec(vece, b, b, tcg_constant_vec_matching(t, vece, ~sgn_msk)); + tcg_gen_cmp_vec(TCG_COND_GT, vece, t, b, + tcg_constant_vec_matching(t, vece, exp_msk)); +} + static bool do_xvtstdc(DisasContext *ctx, arg_XX2_uim *a, unsigned vece) { - static const GVecGen2i op[] = { - { - .fnoi = gen_helper_XVTSTDCSP, - .vece = MO_32 - }, - { - .fnoi = gen_helper_XVTSTDCDP, - .vece = MO_64 - }, + static const TCGOpcode vecop_list[] = { + INDEX_op_cmp_vec, 0 + }; + + GVecGen2 op = { + .fno = (vece == MO_32) ? gen_helper_XVTSTDCSP : gen_helper_XVTSTDCDP, + .vece = vece, + .opt_opc = vecop_list }; REQUIRE_VSX(ctx); - tcg_gen_gvec_2i(vsr_full_offset(a->xt), vsr_full_offset(a->xb), - 16, 16, (int32_t)(a->uim), &op[vece - MO_32]); + switch (a->uim) { + case 0: + set_cpu_vsr(a->xt, tcg_constant_i64(0), true); + set_cpu_vsr(a->xt, tcg_constant_i64(0), false); + break; + case ((1 << 0) | (1 << 1)): + /* test if +Denormal or -Denormal */ + op.fniv = gen_is_any_denormal, + tcg_gen_gvec_2(vsr_full_offset(a->xt), vsr_full_offset(a->xb), 16, 16, + &op); + break; + case (1 << 0): + /* test if -Denormal */ + op.fniv = gen_is_neg_denormal, + tcg_gen_gvec_2(vsr_full_offset(a->xt), vsr_full_offset(a->xb), 16, 16, + &op); + break; + case (1 << 1): + /* test if +Denormal */ + op.fniv = gen_is_pos_denormal, + tcg_gen_gvec_2(vsr_full_offset(a->xt), vsr_full_offset(a->xb), 16, 16, + &op); + break; + case ((1 << 2) | (1 << 3)): + /* test if +0 or -0 */ + op.fniv = gen_is_any_zero, + tcg_gen_gvec_2(vsr_full_offset(a->xt), vsr_full_offset(a->xb), 16, 16, + &op); + break; + case (1 << 2): + /* test if -0 */ + op.fniv = gen_is_neg_zero, + tcg_gen_gvec_2(vsr_full_offset(a->xt), vsr_full_offset(a->xb), 16, 16, + &op); + break; + case (1 << 3): + /* test if +0 */ + op.fniv = gen_is_pos_zero, + tcg_gen_gvec_2(vsr_full_offset(a->xt), vsr_full_offset(a->xb), 16, 16, + &op); + break; + case ((1 << 4) | (1 << 5)): + /* test if +Inf or -Inf */ + op.fniv = gen_is_any_inf, + tcg_gen_gvec_2(vsr_full_offset(a->xt), vsr_full_offset(a->xb), 16, 16, + &op); + break; + case (1 << 4): + /* test if -Inf */ + op.fniv = gen_is_neg_inf, + tcg_gen_gvec_2(vsr_full_offset(a->xt), vsr_full_offset(a->xb), 16, 16, + &op); + break; + case (1 << 5): + /* test if +Inf */ + op.fniv = gen_is_pos_inf, + tcg_gen_gvec_2(vsr_full_offset(a->xt), vsr_full_offset(a->xb), 16, 16, + &op); + break; + case (1 << 6): + /* test if NaN */ + op.fniv = gen_is_nan, + tcg_gen_gvec_2(vsr_full_offset(a->xt), vsr_full_offset(a->xb), 16, 16, + &op); + break; + default: + tcg_gen_gvec_2_ool(vsr_full_offset(a->xt), vsr_full_offset(a->xb), 16, + 16, (int32_t)(a->uim), op.fno); + break; + } return true; }