
[v1,04/46] target/loongarch: Implement xvadd/xvsub

Message ID 20230620093814.123650-5-gaosong@loongson.cn
State New
Series Add LoongArch LASX instructions

Commit Message

Song Gao June 20, 2023, 9:37 a.m. UTC
This patch includes:
- XVADD.{B/H/W/D/Q};
- XVSUB.{B/H/W/D/Q}.

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
 target/loongarch/disas.c                     | 23 ++++++++
 target/loongarch/insn_trans/trans_lasx.c.inc | 59 ++++++++++++++++++++
 target/loongarch/insns.decode                | 23 ++++++++
 target/loongarch/translate.c                 | 17 ++++++
 4 files changed, 122 insertions(+)

Comments

Richard Henderson June 20, 2023, 12:25 p.m. UTC | #1
On 6/20/23 11:37, Song Gao wrote:
> +static bool gvec_xxx(DisasContext *ctx, arg_xxx *a, MemOp mop,
> +                     void (*func)(unsigned, uint32_t, uint32_t,
> +                                  uint32_t, uint32_t, uint32_t))
> +{
> +    uint32_t xd_ofs, xj_ofs, xk_ofs;
> +
> +    CHECK_ASXE;
> +
> +    xd_ofs = vec_full_offset(a->xd);
> +    xj_ofs = vec_full_offset(a->xj);
> +    xk_ofs = vec_full_offset(a->xk);
> +
> +    func(mop, xd_ofs, xj_ofs, xk_ofs, 32, ctx->vl / 8);
> +    return true;
> +}

Comparing gvec_xxx vs gvec_vvv for LSX,

>     func(mop, vd_ofs, vj_ofs, vk_ofs, 16, ctx->vl/8);

gvec_vvv will write 16 bytes of output, followed by 16 bytes of zero to satisfy vl / 8.

I presume this is the intended behaviour of mixing LSX with LASX, that the high 128-bits 
that are not considered by the LSX instruction are zeroed on write?

Which means that your macros from patch 1,

> +#if HOST_BIG_ENDIAN
...
> +#define XB(x)  XB[31 - (x)]
> +#define XH(x)  XH[15 - (x)]

are incorrect.  We need big-endian within the Int128, but little-endian ordering of the 
two Int128. This can be done with

#define XB(x)  XB[(x) ^ 15]
#define XH(x)  XH[(x) ^ 7]

etc.

It would be nice to share more code with trans_lsx.c, if possible.


r~
Song Gao June 21, 2023, 9:19 a.m. UTC | #2
On 2023/6/20 at 8:25 PM, Richard Henderson wrote:
> On 6/20/23 11:37, Song Gao wrote:
>> +static bool gvec_xxx(DisasContext *ctx, arg_xxx *a, MemOp mop,
>> +                     void (*func)(unsigned, uint32_t, uint32_t,
>> +                                  uint32_t, uint32_t, uint32_t))
>> +{
>> +    uint32_t xd_ofs, xj_ofs, xk_ofs;
>> +
>> +    CHECK_ASXE;
>> +
>> +    xd_ofs = vec_full_offset(a->xd);
>> +    xj_ofs = vec_full_offset(a->xj);
>> +    xk_ofs = vec_full_offset(a->xk);
>> +
>> +    func(mop, xd_ofs, xj_ofs, xk_ofs, 32, ctx->vl / 8);
>> +    return true;
>> +}
>
> Comparing gvec_xxx vs gvec_vvv for LSX,
>
>>     func(mop, vd_ofs, vj_ofs, vk_ofs, 16, ctx->vl/8);
>
> gvec_vvv will write 16 bytes of output, followed by 16 bytes of zero 
> to satisfy vl / 8.
>
>
> I presume this is the intended behaviour of mixing LSX with LASX, that 
> the high 128-bits that are not considered by the LSX instruction are 
> zeroed on write?
>
Yes, the LSX instructions can ignore the high 128 bits.

> Which means that your macros from patch 1,
>
>> +#if HOST_BIG_ENDIAN
> ...
>> +#define XB(x)  XB[31 - (x)]
>> +#define XH(x)  XH[15 - (x)]
>
> are incorrect.  We need big-endian within the Int128, but 
> little-endian ordering of the two Int128. This can be done with
>
> #define XB(x)  XB[(x) ^ 15]
> #define XH(x)  XH[(x) ^ 7]
>
> etc.
>
Ok, I will correct it.
> It would be nice to share more code with trans_lsx.c, if possible.
>
Some functions can be merged, e.g. gvec_vvv and gvec_xxx.

Many of the later patches are similar to LSX, so maybe more code can be
merged.

Thanks.
Song Gao
Richard Henderson June 21, 2023, 9:27 a.m. UTC | #3
On 6/21/23 11:19, Song Gao wrote:
> 
> 
> On 2023/6/20 at 8:25 PM, Richard Henderson wrote:
>> On 6/20/23 11:37, Song Gao wrote:
>>> +static bool gvec_xxx(DisasContext *ctx, arg_xxx *a, MemOp mop,
>>> +                     void (*func)(unsigned, uint32_t, uint32_t,
>>> +                                  uint32_t, uint32_t, uint32_t))
>>> +{
>>> +    uint32_t xd_ofs, xj_ofs, xk_ofs;
>>> +
>>> +    CHECK_ASXE;
>>> +
>>> +    xd_ofs = vec_full_offset(a->xd);
>>> +    xj_ofs = vec_full_offset(a->xj);
>>> +    xk_ofs = vec_full_offset(a->xk);
>>> +
>>> +    func(mop, xd_ofs, xj_ofs, xk_ofs, 32, ctx->vl / 8);
>>> +    return true;
>>> +}
>>
>> Comparing gvec_xxx vs gvec_vvv for LSX,
>>
>>>     func(mop, vd_ofs, vj_ofs, vk_ofs, 16, ctx->vl/8);
>>
>> gvec_vvv will write 16 bytes of output, followed by 16 bytes of zero to satisfy vl / 8.
>>
>>
>> I presume this is the intended behaviour of mixing LSX with LASX, that the high 128-bits 
>> that are not considered by the LSX instruction are zeroed on write?
>>
> Yes, the LSX instructions can ignore the high 128 bits.

Ignore != write zeros on output.  What is the behaviour?


r~
Song Gao June 21, 2023, 9:56 a.m. UTC | #4
On 2023/6/21 at 5:27 PM, Richard Henderson wrote:
> On 6/21/23 11:19, Song Gao wrote:
>>
>>
>> 在 2023/6/20 下午8:25, Richard Henderson 写道:
>>> On 6/20/23 11:37, Song Gao wrote:
>>>> +static bool gvec_xxx(DisasContext *ctx, arg_xxx *a, MemOp mop,
>>>> +                     void (*func)(unsigned, uint32_t, uint32_t,
>>>> +                                  uint32_t, uint32_t, uint32_t))
>>>> +{
>>>> +    uint32_t xd_ofs, xj_ofs, xk_ofs;
>>>> +
>>>> +    CHECK_ASXE;
>>>> +
>>>> +    xd_ofs = vec_full_offset(a->xd);
>>>> +    xj_ofs = vec_full_offset(a->xj);
>>>> +    xk_ofs = vec_full_offset(a->xk);
>>>> +
>>>> +    func(mop, xd_ofs, xj_ofs, xk_ofs, 32, ctx->vl / 8);
>>>> +    return true;
>>>> +}
>>>
>>> Comparing gvec_xxx vs gvec_vvv for LSX,
>>>
>>>>     func(mop, vd_ofs, vj_ofs, vk_ofs, 16, ctx->vl/8);
>>>
>>> gvec_vvv will write 16 bytes of output, followed by 16 bytes of zero 
>>> to satisfy vl / 8.
>>>
>>>
>>> I presume this is the intended behaviour of mixing LSX with LASX, 
>>> that the high 128-bits that are not considered by the LSX 
>>> instruction are zeroed on write?
>>>
>> Yes, the LSX instructions can ignore the high 128 bits.
>
> Ignore != write zeros on output.  What is the behaviour?
>
Unpredictable.

In more detail:

LSX:
LA64 fp instructions change fp register values; bits [127:64] of the
same-numbered LSX register are unpredictable.

LASX:
LA64 fp instructions change fp register values; bits [255:64] of the
same-numbered LASX register are unpredictable.
LSX instructions change LSX register values; bits [255:128] of the
same-numbered LASX register are unpredictable.

Thanks.
Song Gao.

Patch

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 5c402d944d..696f78c491 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1695,3 +1695,26 @@  INSN_LSX(vstelm_d,         vr_ii)
 INSN_LSX(vstelm_w,         vr_ii)
 INSN_LSX(vstelm_h,         vr_ii)
 INSN_LSX(vstelm_b,         vr_ii)
+
+#define INSN_LASX(insn, type)                               \
+static bool trans_##insn(DisasContext *ctx, arg_##type * a) \
+{                                                           \
+    output_##type(ctx, a, #insn);                           \
+    return true;                                            \
+}
+
+static void output_xxx(DisasContext *ctx, arg_xxx * a, const char *mnemonic)
+{
+    output(ctx, mnemonic, "x%d, x%d, x%d", a->xd, a->xj, a->xk);
+}
+
+INSN_LASX(xvadd_b,           xxx)
+INSN_LASX(xvadd_h,           xxx)
+INSN_LASX(xvadd_w,           xxx)
+INSN_LASX(xvadd_d,           xxx)
+INSN_LASX(xvadd_q,           xxx)
+INSN_LASX(xvsub_b,           xxx)
+INSN_LASX(xvsub_h,           xxx)
+INSN_LASX(xvsub_w,           xxx)
+INSN_LASX(xvsub_d,           xxx)
+INSN_LASX(xvsub_q,           xxx)
diff --git a/target/loongarch/insn_trans/trans_lasx.c.inc b/target/loongarch/insn_trans/trans_lasx.c.inc
index 75a77f5dce..c918522f96 100644
--- a/target/loongarch/insn_trans/trans_lasx.c.inc
+++ b/target/loongarch/insn_trans/trans_lasx.c.inc
@@ -14,3 +14,62 @@ 
 #else
 #define CHECK_ASXE
 #endif
+
+static bool gvec_xxx(DisasContext *ctx, arg_xxx *a, MemOp mop,
+                     void (*func)(unsigned, uint32_t, uint32_t,
+                                  uint32_t, uint32_t, uint32_t))
+{
+    uint32_t xd_ofs, xj_ofs, xk_ofs;
+
+    CHECK_ASXE;
+
+    xd_ofs = vec_full_offset(a->xd);
+    xj_ofs = vec_full_offset(a->xj);
+    xk_ofs = vec_full_offset(a->xk);
+
+    func(mop, xd_ofs, xj_ofs, xk_ofs, 32, ctx->vl / 8);
+    return true;
+}
+
+TRANS(xvadd_b, gvec_xxx, MO_8, tcg_gen_gvec_add)
+TRANS(xvadd_h, gvec_xxx, MO_16, tcg_gen_gvec_add)
+TRANS(xvadd_w, gvec_xxx, MO_32, tcg_gen_gvec_add)
+TRANS(xvadd_d, gvec_xxx, MO_64, tcg_gen_gvec_add)
+
+#define XVADDSUB_Q(NAME)                                        \
+static bool trans_xv## NAME ##_q(DisasContext *ctx, arg_xxx *a) \
+{                                                               \
+    TCGv_i64 rh, rl, ah, al, bh, bl;                            \
+    int i;                                                      \
+                                                                \
+    CHECK_ASXE;                                                 \
+                                                                \
+    rh = tcg_temp_new_i64();                                    \
+    rl = tcg_temp_new_i64();                                    \
+    ah = tcg_temp_new_i64();                                    \
+    al = tcg_temp_new_i64();                                    \
+    bh = tcg_temp_new_i64();                                    \
+    bl = tcg_temp_new_i64();                                    \
+                                                                \
+    for (i = 0; i < 2; i++) {                                   \
+        get_xreg64(ah, a->xj, 1 + i * 2);                       \
+        get_xreg64(al, a->xj, 0 + i * 2);                       \
+        get_xreg64(bh, a->xk, 1 + i * 2);                       \
+        get_xreg64(bl, a->xk, 0 + i * 2);                       \
+                                                                \
+        tcg_gen_## NAME ##2_i64(rl, rh, al, ah, bl, bh);        \
+                                                                \
+        set_xreg64(rh, a->xd, 1 + i * 2);                       \
+        set_xreg64(rl, a->xd, 0 + i * 2);                       \
+   }                                                            \
+                                                                \
+    return true;                                                \
+}
+
+XVADDSUB_Q(add)
+XVADDSUB_Q(sub)
+
+TRANS(xvsub_b, gvec_xxx, MO_8, tcg_gen_gvec_sub)
+TRANS(xvsub_h, gvec_xxx, MO_16, tcg_gen_gvec_sub)
+TRANS(xvsub_w, gvec_xxx, MO_32, tcg_gen_gvec_sub)
+TRANS(xvsub_d, gvec_xxx, MO_64, tcg_gen_gvec_sub)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index c9c3bc2c73..bac1903975 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -1296,3 +1296,26 @@  vstelm_d         0011 00010001 0 . ........ ..... .....   @vr_i8i1
 vstelm_w         0011 00010010 .. ........ ..... .....    @vr_i8i2
 vstelm_h         0011 0001010 ... ........ ..... .....    @vr_i8i3
 vstelm_b         0011 000110 .... ........ ..... .....    @vr_i8i4
+
+#
+# LASX Argument sets
+#
+
+&xxx          xd xj xk
+
+#
+# LASX Formats
+#
+
+@xxx                .... ........ ..... xk:5 xj:5 xd:5    &xxx
+
+xvadd_b          0111 01000000 10100 ..... ..... .....    @xxx
+xvadd_h          0111 01000000 10101 ..... ..... .....    @xxx
+xvadd_w          0111 01000000 10110 ..... ..... .....    @xxx
+xvadd_d          0111 01000000 10111 ..... ..... .....    @xxx
+xvadd_q          0111 01010010 11010 ..... ..... .....    @xxx
+xvsub_b          0111 01000000 11000 ..... ..... .....    @xxx
+xvsub_h          0111 01000000 11001 ..... ..... .....    @xxx
+xvsub_w          0111 01000000 11010 ..... ..... .....    @xxx
+xvsub_d          0111 01000000 11011 ..... ..... .....    @xxx
+xvsub_q          0111 01010010 11011 ..... ..... .....    @xxx
diff --git a/target/loongarch/translate.c b/target/loongarch/translate.c
index 6bf2d726d6..5300e14815 100644
--- a/target/loongarch/translate.c
+++ b/target/loongarch/translate.c
@@ -18,6 +18,7 @@ 
 #include "fpu/softfloat.h"
 #include "translate.h"
 #include "internals.h"
+#include "vec.h"
 
 /* Global register indices */
 TCGv cpu_gpr[32], cpu_pc;
@@ -48,6 +49,18 @@  static inline void set_vreg64(TCGv_i64 src, int regno, int index)
                    offsetof(CPULoongArchState, fpr[regno].vreg.D(index)));
 }
 
+static inline void get_xreg64(TCGv_i64 dest, int regno, int index)
+{
+    tcg_gen_ld_i64(dest, cpu_env,
+                   offsetof(CPULoongArchState, fpr[regno].xreg.XD(index)));
+}
+
+static inline void set_xreg64(TCGv_i64 src, int regno, int index)
+{
+    tcg_gen_st_i64(src, cpu_env,
+                   offsetof(CPULoongArchState, fpr[regno].xreg.XD(index)));
+}
+
 static inline int plus_1(DisasContext *ctx, int x)
 {
     return x + 1;
@@ -119,6 +132,10 @@  static void loongarch_tr_init_disas_context(DisasContextBase *dcbase,
         ctx->vl = LSX_LEN;
     }
 
+    if (FIELD_EX64(env->cpucfg[2], CPUCFG2, LASX)) {
+        ctx->vl = LASX_LEN;
+    }
+
     ctx->zero = tcg_constant_tl(0);
 }