Message ID | 20230620093814.123650-5-gaosong@loongson.cn |
---|---|
State | New |
Series | Add LoongArch LASX instructions |
On 6/20/23 11:37, Song Gao wrote:
> +static bool gvec_xxx(DisasContext *ctx, arg_xxx *a, MemOp mop,
> +                     void (*func)(unsigned, uint32_t, uint32_t,
> +                                  uint32_t, uint32_t, uint32_t))
> +{
> +    uint32_t xd_ofs, xj_ofs, xk_ofs;
> +
> +    CHECK_ASXE;
> +
> +    xd_ofs = vec_full_offset(a->xd);
> +    xj_ofs = vec_full_offset(a->xj);
> +    xk_ofs = vec_full_offset(a->xk);
> +
> +    func(mop, xd_ofs, xj_ofs, xk_ofs, 32, ctx->vl / 8);
> +    return true;
> +}

Comparing gvec_xxx vs gvec_vvv for LSX,

>     func(mop, vd_ofs, vj_ofs, vk_ofs, 16, ctx->vl/8);

gvec_vvv will write 16 bytes of output, followed by 16 bytes of zero to
satisfy vl / 8.

I presume this is the intended behaviour of mixing LSX with LASX, that the
high 128-bits that are not considered by the LSX instruction are zeroed on
write?

Which means that your macros from patch 1,

> +#if HOST_BIG_ENDIAN
...
> +#define XB(x) XB[31 - (x)]
> +#define XH(x) XH[15 - (x)]

are incorrect.  We need big-endian within the Int128, but little-endian
ordering of the two Int128.  This can be done with

#define XB(x) XB[(x) ^ 15]
#define XH(x) XH[(x) ^ 7]

etc.

It would be nice to share more code with trans_lsx.c, if possible.


r~
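To check the index arithmetic in the review, here is a small stand-alone C sketch. It assumes the layout described above (the two Int128 halves of a 256-bit register kept in little-endian order, each half stored big-endian on a big-endian host) and assumes the existing LSX byte macro maps B(x) to B[15 - (x)] on such hosts. The helper functions are hypothetical stand-ins for the XB/XH macros, not QEMU code.

```c
#include <stdbool.h>

/* Host array index of logical element x of a 256-bit register, assuming
 * two Int128 halves in little-endian order (Q(0) at the lower address),
 * each half stored big-endian as on a big-endian host.
 * per_half is the element count per 128-bit half: 16 bytes or 8 halfwords. */
static int expected_index(int x, int per_half)
{
    int half = x / per_half;     /* which Int128: 0 = low, 1 = high */
    int within = x % per_half;   /* element number inside that Int128 */
    return half * per_half + (per_half - 1 - within);
}

static int xb_patch(int x) { return 31 - (x); }  /* XB(x) as in patch 1 */
static int xb_fixed(int x) { return (x) ^ 15; }  /* suggested XB(x) */
static int xh_fixed(int x) { return (x) ^ 7; }   /* suggested XH(x) */
static int b_lsx(int x)    { return 15 - (x); }  /* assumed LSX B(x) on BE */

/* Counts how many of the 32 byte lanes the patch-1 mapping misplaces. */
static int count_xb_patch_errors(void)
{
    int errors = 0;
    for (int x = 0; x < 32; x++) {
        errors += xb_patch(x) != expected_index(x, 16);
    }
    return errors;
}

/* True if the suggested mappings match the assumed layout and agree with
 * the LSX view of the same register on the low 128 bits. */
static bool fixed_mappings_ok(void)
{
    for (int x = 0; x < 32; x++) {
        if (xb_fixed(x) != expected_index(x, 16)) {
            return false;
        }
    }
    for (int x = 0; x < 16; x++) {
        if (xh_fixed(x) != expected_index(x, 8)) {
            return false;
        }
        if (xb_fixed(x) != b_lsx(x)) {
            return false;
        }
    }
    return true;
}
```

Because 31 - x equals x ^ 31 for x in 0..31, it differs from x ^ 15 in bit 4 for every lane, so each byte lands in the wrong Int128; x ^ 15 instead reduces to the LSX mapping 15 - x on the low 128 bits.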
On 2023/6/20 8:25 PM, Richard Henderson wrote:
> On 6/20/23 11:37, Song Gao wrote:
>> +static bool gvec_xxx(DisasContext *ctx, arg_xxx *a, MemOp mop,
>> [...]
>
> Comparing gvec_xxx vs gvec_vvv for LSX,
>
>>     func(mop, vd_ofs, vj_ofs, vk_ofs, 16, ctx->vl/8);
>
> gvec_vvv will write 16 bytes of output, followed by 16 bytes of zero
> to satisfy vl / 8.
>
>
> I presume this is the intended behaviour of mixing LSX with LASX, that
> the high 128-bits that are not considered by the LSX instruction are
> zeroed on write?
>
Yes, the LSX instruction can ignore the high 128-bits.

> Which means that your macros from patch 1,
>
>> +#if HOST_BIG_ENDIAN
> ...
>> +#define XB(x) XB[31 - (x)]
>> +#define XH(x) XH[15 - (x)]
>
> are incorrect.  We need big-endian within the Int128, but
> little-endian ordering of the two Int128.  This can be done with
>
> #define XB(x) XB[(x) ^ 15]
> #define XH(x) XH[(x) ^ 7]
>
> etc.
>
OK, I will correct it.

> It would be nice to share more code with trans_lsx.c, if possible.
>
Some functions can be merged, e.g. gvec_vvv and gvec_xxx.
Many of the later patches are similar to the LSX ones, so maybe more code
can be merged as well.

Thanks.
Song Gao
On 6/21/23 11:19, Song Gao wrote:
> On 2023/6/20 8:25 PM, Richard Henderson wrote:
>> On 6/20/23 11:37, Song Gao wrote:
>>> +static bool gvec_xxx(DisasContext *ctx, arg_xxx *a, MemOp mop,
>>> [...]
>>
>> Comparing gvec_xxx vs gvec_vvv for LSX,
>>
>>>     func(mop, vd_ofs, vj_ofs, vk_ofs, 16, ctx->vl/8);
>>
>> gvec_vvv will write 16 bytes of output, followed by 16 bytes of zero to satisfy vl / 8.
>>
>>
>> I presume this is the intended behaviour of mixing LSX with LASX, that the high 128-bits
>> that are not considered by the LSX instruction are zeroed on write?
>>
> Yes, the LSX instruction can ignore the high 128-bits.

Ignore != write zeros on output.  What is the behaviour?


r~
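The difference between "ignore" and "write zeros" can be made concrete with a plain-C model of a gvec operation. As noted in the review, gvec_vvv passes oprsz = 16 and maxsz = ctx->vl / 8, so with a 256-bit vl the bytes beyond the 16-byte result are written as zero rather than left untouched. model_gvec_add8 is a hypothetical illustration that works in byte-index order and ignores host endianness; it is not the tcg_gen_gvec_add implementation.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical plain-C model of a gvec add with byte elements:
 * compute oprsz bytes of result, then clear bytes [oprsz, maxsz)
 * of the destination. Byte-index order; host endianness is ignored. */
static void model_gvec_add8(uint8_t *d, const uint8_t *a, const uint8_t *b,
                            size_t oprsz, size_t maxsz)
{
    for (size_t i = 0; i < oprsz; i++) {
        d[i] = (uint8_t)(a[i] + b[i]);       /* per-byte add, wraps mod 256 */
    }
    if (maxsz > oprsz) {
        memset(d + oprsz, 0, maxsz - oprsz); /* tail is written as zero */
    }
}
```

With oprsz = 16 and maxsz = 32, the high 16 bytes of d end up zero whatever they held before; an implementation that merely ignored them would leave them unchanged.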
On 2023/6/21 5:27 PM, Richard Henderson wrote:
> On 6/21/23 11:19, Song Gao wrote:
>> On 2023/6/20 8:25 PM, Richard Henderson wrote:
>>> On 6/20/23 11:37, Song Gao wrote:
>>>> +static bool gvec_xxx(DisasContext *ctx, arg_xxx *a, MemOp mop,
>>>> [...]
>>>
>>> Comparing gvec_xxx vs gvec_vvv for LSX,
>>>
>>>>     func(mop, vd_ofs, vj_ofs, vk_ofs, 16, ctx->vl/8);
>>>
>>> gvec_vvv will write 16 bytes of output, followed by 16 bytes of zero
>>> to satisfy vl / 8.
>>>
>>>
>>> I presume this is the intended behaviour of mixing LSX with LASX,
>>> that the high 128-bits that are not considered by the LSX
>>> instruction are zeroed on write?
>>>
>> Yes, the LSX instruction can ignore the high 128-bits.
>
> Ignore != write zeros on output. What is the behaviour?
>
It is unpredictable. More specifically:

LSX: when an LA64 FP instruction changes the value of an FP register,
bits [127:64] of the LSX register with the same number are unpredictable.

LASX: when an LA64 FP instruction changes the value of an FP register,
bits [255:64] of the LASX register with the same number are unpredictable.
When an LSX instruction changes the value of an LSX register, bits
[255:128] of the LASX register with the same number are unpredictable.

Thanks.
Song Gao.
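The three cases listed above follow one pattern: a write of W bits leaves bits [R-1:W] of the same-numbered R-bit view unpredictable. A minimal C sketch of that rule; unpredictable_mask is a hypothetical helper, and widths are assumed to be multiples of 64 (64 for FP, 128 for LSX, 256 for LASX).

```c
#include <stdint.h>

/* Hypothetical helper: an instruction that writes the low write_bits of a
 * register leaves bits [view_bits - 1 : write_bits] of the wider,
 * same-numbered view unpredictable. mask[] covers 256 bits as four 64-bit
 * words, least significant word first; set bits mark unpredictable bits. */
static void unpredictable_mask(int write_bits, int view_bits, uint64_t mask[4])
{
    for (int w = 0; w < 4; w++) {
        int lo = 64 * w;    /* lowest bit number covered by word w */
        mask[w] = (lo >= write_bits && lo < view_bits) ? UINT64_MAX : 0;
    }
}
```

Since any value is allowed in those bits, zeroing them, as gvec_vvv does with maxsz = vl / 8, is presumably one permissible implementation, though the summary above does not say so explicitly.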
```diff
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 5c402d944d..696f78c491 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1695,3 +1695,26 @@ INSN_LSX(vstelm_d, vr_ii)
 INSN_LSX(vstelm_w, vr_ii)
 INSN_LSX(vstelm_h, vr_ii)
 INSN_LSX(vstelm_b, vr_ii)
+
+#define INSN_LASX(insn, type)                               \
+static bool trans_##insn(DisasContext *ctx, arg_##type * a) \
+{                                                           \
+    output_##type(ctx, a, #insn);                           \
+    return true;                                            \
+}
+
+static void output_xxx(DisasContext *ctx, arg_xxx * a, const char *mnemonic)
+{
+    output(ctx, mnemonic, "x%d, x%d, x%d", a->xd, a->xj, a->xk);
+}
+
+INSN_LASX(xvadd_b, xxx)
+INSN_LASX(xvadd_h, xxx)
+INSN_LASX(xvadd_w, xxx)
+INSN_LASX(xvadd_d, xxx)
+INSN_LASX(xvadd_q, xxx)
+INSN_LASX(xvsub_b, xxx)
+INSN_LASX(xvsub_h, xxx)
+INSN_LASX(xvsub_w, xxx)
+INSN_LASX(xvsub_d, xxx)
+INSN_LASX(xvsub_q, xxx)
diff --git a/target/loongarch/insn_trans/trans_lasx.c.inc b/target/loongarch/insn_trans/trans_lasx.c.inc
index 75a77f5dce..c918522f96 100644
--- a/target/loongarch/insn_trans/trans_lasx.c.inc
+++ b/target/loongarch/insn_trans/trans_lasx.c.inc
@@ -14,3 +14,62 @@
 #else
 #define CHECK_ASXE
 #endif
+
+static bool gvec_xxx(DisasContext *ctx, arg_xxx *a, MemOp mop,
+                     void (*func)(unsigned, uint32_t, uint32_t,
+                                  uint32_t, uint32_t, uint32_t))
+{
+    uint32_t xd_ofs, xj_ofs, xk_ofs;
+
+    CHECK_ASXE;
+
+    xd_ofs = vec_full_offset(a->xd);
+    xj_ofs = vec_full_offset(a->xj);
+    xk_ofs = vec_full_offset(a->xk);
+
+    func(mop, xd_ofs, xj_ofs, xk_ofs, 32, ctx->vl / 8);
+    return true;
+}
+
+TRANS(xvadd_b, gvec_xxx, MO_8, tcg_gen_gvec_add)
+TRANS(xvadd_h, gvec_xxx, MO_16, tcg_gen_gvec_add)
+TRANS(xvadd_w, gvec_xxx, MO_32, tcg_gen_gvec_add)
+TRANS(xvadd_d, gvec_xxx, MO_64, tcg_gen_gvec_add)
+
+#define XVADDSUB_Q(NAME)                                        \
+static bool trans_xv## NAME ##_q(DisasContext *ctx, arg_xxx *a) \
+{                                                               \
+    TCGv_i64 rh, rl, ah, al, bh, bl;                            \
+    int i;                                                      \
+                                                                \
+    CHECK_ASXE;                                                 \
+                                                                \
+    rh = tcg_temp_new_i64();                                    \
+    rl = tcg_temp_new_i64();                                    \
+    ah = tcg_temp_new_i64();                                    \
+    al = tcg_temp_new_i64();                                    \
+    bh = tcg_temp_new_i64();                                    \
+    bl = tcg_temp_new_i64();                                    \
+                                                                \
+    for (i = 0; i < 2; i++) {                                   \
+        get_xreg64(ah, a->xj, 1 + i * 2);                       \
+        get_xreg64(al, a->xj, 0 + i * 2);                       \
+        get_xreg64(bh, a->xk, 1 + i * 2);                       \
+        get_xreg64(bl, a->xk, 0 + i * 2);                       \
+                                                                \
+        tcg_gen_## NAME ##2_i64(rl, rh, al, ah, bl, bh);        \
+                                                                \
+        set_xreg64(rh, a->xd, 1 + i * 2);                       \
+        set_xreg64(rl, a->xd, 0 + i * 2);                       \
+    }                                                           \
+                                                                \
+    return true;                                                \
+}
+
+XVADDSUB_Q(add)
+XVADDSUB_Q(sub)
+
+TRANS(xvsub_b, gvec_xxx, MO_8, tcg_gen_gvec_sub)
+TRANS(xvsub_h, gvec_xxx, MO_16, tcg_gen_gvec_sub)
+TRANS(xvsub_w, gvec_xxx, MO_32, tcg_gen_gvec_sub)
+TRANS(xvsub_d, gvec_xxx, MO_64, tcg_gen_gvec_sub)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index c9c3bc2c73..bac1903975 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -1296,3 +1296,26 @@ vstelm_d 0011 00010001 0 . ........ ..... ..... @vr_i8i1
 vstelm_w 0011 00010010 .. ........ ..... ..... @vr_i8i2
 vstelm_h 0011 0001010 ... ........ ..... ..... @vr_i8i3
 vstelm_b 0011 000110 .... ........ ..... ..... @vr_i8i4
+
+#
+# LASX Argument sets
+#
+
+&xxx xd xj xk
+
+#
+# LASX Formats
+#
+
+@xxx .... ........ ..... xk:5 xj:5 xd:5 &xxx
+
+xvadd_b 0111 01000000 10100 ..... ..... ..... @xxx
+xvadd_h 0111 01000000 10101 ..... ..... ..... @xxx
+xvadd_w 0111 01000000 10110 ..... ..... ..... @xxx
+xvadd_d 0111 01000000 10111 ..... ..... ..... @xxx
+xvadd_q 0111 01010010 11010 ..... ..... ..... @xxx
+xvsub_b 0111 01000000 11000 ..... ..... ..... @xxx
+xvsub_h 0111 01000000 11001 ..... ..... ..... @xxx
+xvsub_w 0111 01000000 11010 ..... ..... ..... @xxx
+xvsub_d 0111 01000000 11011 ..... ..... ..... @xxx
+xvsub_q 0111 01010010 11011 ..... ..... ..... @xxx
diff --git a/target/loongarch/translate.c b/target/loongarch/translate.c
index 6bf2d726d6..5300e14815 100644
--- a/target/loongarch/translate.c
+++ b/target/loongarch/translate.c
@@ -18,6 +18,7 @@
 #include "fpu/softfloat.h"
 #include "translate.h"
 #include "internals.h"
+#include "vec.h"
 
 /* Global register indices */
 TCGv cpu_gpr[32], cpu_pc;
@@ -48,6 +49,18 @@ static inline void set_vreg64(TCGv_i64 src, int regno, int index)
                    offsetof(CPULoongArchState, fpr[regno].vreg.D(index)));
 }
 
+static inline void get_xreg64(TCGv_i64 dest, int regno, int index)
+{
+    tcg_gen_ld_i64(dest, cpu_env,
+                   offsetof(CPULoongArchState, fpr[regno].xreg.XD(index)));
+}
+
+static inline void set_xreg64(TCGv_i64 src, int regno, int index)
+{
+    tcg_gen_st_i64(src, cpu_env,
+                   offsetof(CPULoongArchState, fpr[regno].xreg.XD(index)));
+}
+
 static inline int plus_1(DisasContext *ctx, int x)
 {
     return x + 1;
@@ -119,6 +132,10 @@ static void loongarch_tr_init_disas_context(DisasContextBase *dcbase,
         ctx->vl = LSX_LEN;
     }
 
+    if (FIELD_EX64(env->cpucfg[2], CPUCFG2, LASX)) {
+        ctx->vl = LASX_LEN;
+    }
+
     ctx->zero = tcg_constant_tl(0);
 }
```
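XVADDSUB_Q splits each 128-bit lane into two 64-bit words (indices 2*i and 2*i + 1 via get_xreg64/set_xreg64) and relies on tcg_gen_add2_i64 / tcg_gen_sub2_i64 to propagate the carry or borrow between them. The following plain-C model, with hypothetical helper names, sketches what that computes for one register; it is an illustration, not TCG code.

```c
#include <stdint.h>

/* Plain-C model of {rh:rl} = {ah:al} + {bh:bl} modulo 2^128,
 * using the same argument order as tcg_gen_add2_i64. */
static void add2_u64(uint64_t *rl, uint64_t *rh,
                     uint64_t al, uint64_t ah, uint64_t bl, uint64_t bh)
{
    uint64_t lo = al + bl;          /* wraps modulo 2^64 */
    uint64_t carry = lo < al;       /* 1 if the low word overflowed */
    *rl = lo;
    *rh = ah + bh + carry;
}

/* Plain-C model of {rh:rl} = {ah:al} - {bh:bl} modulo 2^128. */
static void sub2_u64(uint64_t *rl, uint64_t *rh,
                     uint64_t al, uint64_t ah, uint64_t bl, uint64_t bh)
{
    uint64_t borrow = al < bl;      /* 1 if the low word underflows */
    *rl = al - bl;
    *rh = ah - bh - borrow;
}

/* A 256-bit register modeled as four 64-bit words, word 0 lowest.
 * Lane i uses words 2*i (low) and 2*i + 1 (high), mirroring the
 * get_xreg64/set_xreg64 indices in the macro. */
static void model_xvadd_q(uint64_t d[4], const uint64_t j[4], const uint64_t k[4])
{
    for (int i = 0; i < 2; i++) {
        uint64_t rl, rh;
        add2_u64(&rl, &rh, j[2 * i], j[2 * i + 1], k[2 * i], k[2 * i + 1]);
        d[2 * i] = rl;
        d[2 * i + 1] = rh;
    }
}
```

Each lane is computed independently, so no carry crosses from the low 128-bit lane into the high one.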
```
This patch includes:
- XVADD.{B/H/W/D/Q};
- XVSUB.{B/H/W/D/Q}.

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
 target/loongarch/disas.c                     | 23 ++++++++
 target/loongarch/insn_trans/trans_lasx.c.inc | 59 ++++++++++++++++++++
 target/loongarch/insns.decode                | 23 ++++++++
 target/loongarch/translate.c                 | 17 ++++++
 4 files changed, 122 insertions(+)
```
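The ten instructions listed in the commit message share the @xxx format in insns.decode: opcode bits [31:15], then xk in [14:10], xj in [9:5] and xd in [4:0]. Below is a small stand-alone decoder sketch built from the patterns in the decode hunk; decode_xxx and pattern_value are hypothetical helpers, not the decodetree-generated code QEMU actually uses.

```c
#include <stddef.h>
#include <stdint.h>

/* Opcode patterns for bits [31:15], copied from the insns.decode hunk. */
static const struct {
    const char *name;
    const char *bits;   /* 17 pattern bits; spaces are ignored */
} xxx_insns[] = {
    { "xvadd_b", "0111 01000000 10100" },
    { "xvadd_h", "0111 01000000 10101" },
    { "xvadd_w", "0111 01000000 10110" },
    { "xvadd_d", "0111 01000000 10111" },
    { "xvadd_q", "0111 01010010 11010" },
    { "xvsub_b", "0111 01000000 11000" },
    { "xvsub_h", "0111 01000000 11001" },
    { "xvsub_w", "0111 01000000 11010" },
    { "xvsub_d", "0111 01000000 11011" },
    { "xvsub_q", "0111 01010010 11011" },
};

/* Converts a bit string such as "0111 01000000 10100" to its value,
 * skipping spaces. */
static uint32_t pattern_value(const char *bits)
{
    uint32_t v = 0;
    for (; *bits != '\0'; bits++) {
        if (*bits == '0' || *bits == '1') {
            v = (v << 1) | (uint32_t)(*bits - '0');
        }
    }
    return v;
}

/* Returns the mnemonic and fills in the register fields of an @xxx
 * instruction word, or returns NULL if no pattern matches. */
static const char *decode_xxx(uint32_t insn, int *xd, int *xj, int *xk)
{
    uint32_t opc = insn >> 15;                  /* bits [31:15] */
    for (size_t i = 0; i < sizeof(xxx_insns) / sizeof(xxx_insns[0]); i++) {
        if (pattern_value(xxx_insns[i].bits) == opc) {
            *xd = (int)(insn & 0x1f);           /* bits [4:0]   */
            *xj = (int)((insn >> 5) & 0x1f);    /* bits [9:5]   */
            *xk = (int)((insn >> 10) & 0x1f);   /* bits [14:10] */
            return xxx_insns[i].name;
        }
    }
    return NULL;
}
```

For example, the word 0x740A0C41 carries the xvadd_b opcode (0xE814 in bits [31:15]) with xd = 1, xj = 2 and xk = 3.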