[RFC,v4,67/70] target/riscv: rvv-1.0: relax RV_VLEN_MAX to 512-bits

Message ID 20200817084955.28793-68-frank.chang@sifive.com
State New
Series support vector extension v1.0

Commit Message

Frank Chang Aug. 17, 2020, 8:49 a.m. UTC
From: Frank Chang <frank.chang@sifive.com>

GVEC only supports MAXSZ and OPRSZ in the range [8..256] bytes, and LMUL
can be fractional, so the maximum vector size that can be operated on may
be less than 8 bytes or larger than 256 bytes.
Skip GVEC if the maximum vector size is < 8 or > 256 bytes.

Signed-off-by: Frank Chang <frank.chang@sifive.com>

--
Relaxing the limits on MAXSZ or OPRSZ might be a better approach.

Signed-off-by: Frank Chang <frank.chang@sifive.com>
---
 target/riscv/cpu.h                      | 13 +++++++------
 target/riscv/insn_trans/trans_rvv.inc.c |  2 +-
 target/riscv/vector_helper.c            |  2 +-
 3 files changed, 9 insertions(+), 8 deletions(-)

Comments

Richard Henderson Aug. 30, 2020, 1:39 a.m. UTC | #1
On 8/17/20 1:49 AM, frank.chang@sifive.com wrote:
> From: Frank Chang <frank.chang@sifive.com>
> 
> As GVEC only supports MAXSZ and OPRSZ in the range of: [8..256] bytes
> and LMUL could be a fractional number. The maximum vector size can be
> operated might be less than 8 bytes or larger than 256 bytes.
> Skip to use GVEC if maximum vector size <= 8 or >= 256 bytes.
> 
> Signed-off-by: Frank Chang <frank.chang@sifive.com>
> 
> --
> Maybe to relax the limitations of MAXSZ or OPRSZ would be a better
> approach.

I would definitely like to improve gvec to handle any actual vector length that
you need.  With VLEN=512 (bits) and LMUL=8, that gives you 512 byte vectors.
Is that the limit of what you need, or did you want to go higher?
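
As a quick check of that arithmetic, here is a minimal sketch.
max_group_bytes() is a hypothetical helper written for illustration, not a
QEMU function; it assumes integer LMUL only.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not part of QEMU): size in bytes of the largest
 * vector register group for a given VLEN (in bits) and integer LMUL.
 */
static uint32_t max_group_bytes(uint32_t vlen_bits, uint32_t lmul)
{
    return vlen_bits / 8 * lmul;
}
```

With VLEN=512 and LMUL=8 this gives 512 bytes, which is beyond the 256 bytes
the commit message says GVEC handles today; VLEN=256 with LMUL=8 gives
exactly 256.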

There will have to be some maximum supported by tcg, though.
It's probably worth having an assert somewhere.

Perhaps something like

/*
 * RV_VLEN_MAX (bits) / 8 (bits-per-byte) * 8 (LMUL)
 * = RV_VLEN_MAX (bytes)
 *
 * must not exceed the number of bytes supported by gvec.
 */
QEMU_BUILD_BUG_ON(RV_VLEN_MAX > (8 << SIMD_MAXSZ_BITS));

Perhaps placed in vector_helper.c, so that cpu.h does not have to include
"tcg/tcg-gvec-desc.h".

However... simply increasing the number of bits in SIMD_MAXSZ_BITS and
SIMD_OPRSZ_BITS will break Arm SVE -- we need 20 bits in simd_data(), and
that's exactly what we have at present.

If we can come up with a more compact encoding of oprsz/maxsz, that would be
ideal.  Otherwise, I need to compress the data currently stored in simd_data().

-----

I suppose one point here is that for RISC-V, oprsz always equals maxsz.  So
we've effectively wasted 5 bits.  Moreover, that's also true for Arm SVE.

However, Arm AdvSIMD, the older vector isa, will have oprsz == 8 or oprsz ==
16.  Since the vector registers overlap, maxsz is the SVE vector length, and
the area in between oprsz and maxsz is cleared.
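
To illustrate the oprsz/maxsz distinction, here is a minimal sketch of an
operation that works on the first oprsz bytes and clears the rest up to
maxsz. The function name and the byte-wise add are assumptions chosen for
illustration; this is not the actual gvec helper code.

```c
#include <stdint.h>
#include <string.h>

/*
 * Illustrative only: byte-wise add over the first oprsz bytes of the
 * destination, then zero the tail between oprsz and maxsz.
 */
static void add8_with_tail_clear(uint8_t *d, const uint8_t *a,
                                 const uint8_t *b,
                                 uint32_t oprsz, uint32_t maxsz)
{
    for (uint32_t i = 0; i < oprsz; i++) {
        d[i] = (uint8_t)(a[i] + b[i]);
    }
    if (maxsz > oprsz) {
        memset(d + oprsz, 0, maxsz - oprsz);
    }
}
```

When oprsz == maxsz, as for RISC-V and SVE above, the tail-clearing step
does nothing.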

If we ever merge the x86_64 AVX2 patches from last year's GSoC, and then expand
on that to implement AVX512, then we would have oprsz == 16 or oprsz == 32,
with maxsz == 64.

Perhaps we could reduce the generality of oprsz, and compress it into 2 bits:

  0b00 -> 8
  0b01 -> 16
  0b10 -> 32
  0b11 -> maxsz

Now we have 3 bits we can move over to the maxsz field, which will let us
represent 8 * 256 or 2048 byte vectors.
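
A minimal sketch of what such an encoding could look like. All names here
(SKETCH_*, sketch_encode(), sketch_maxsz(), sketch_oprsz()) and the bit
positions are assumptions made for illustration; this is not the existing
simd_desc() layout.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical field widths: 8 bits for maxsz, 2 bits for oprsz. */
#define SKETCH_MAXSZ_BITS 8
#define SKETCH_OPRSZ_BITS 2

/* Pack oprsz/maxsz: maxsz as (maxsz / 8 - 1), oprsz as a 2-bit code. */
static uint32_t sketch_encode(uint32_t oprsz, uint32_t maxsz)
{
    uint32_t code;

    assert(maxsz % 8 == 0 && maxsz >= 8);
    assert(maxsz <= (8u << SKETCH_MAXSZ_BITS));
    assert(oprsz <= maxsz);

    if (oprsz == maxsz) {
        code = 3;               /* 0b11 -> maxsz */
    } else if (oprsz == 8) {
        code = 0;               /* 0b00 -> 8 */
    } else if (oprsz == 16) {
        code = 1;               /* 0b01 -> 16 */
    } else {
        assert(oprsz == 32);
        code = 2;               /* 0b10 -> 32 */
    }
    return (code << SKETCH_MAXSZ_BITS) | (maxsz / 8 - 1);
}

static uint32_t sketch_maxsz(uint32_t desc)
{
    return ((desc & ((1u << SKETCH_MAXSZ_BITS) - 1)) + 1) * 8;
}

static uint32_t sketch_oprsz(uint32_t desc)
{
    uint32_t code = (desc >> SKETCH_MAXSZ_BITS)
                    & ((1u << SKETCH_OPRSZ_BITS) - 1);

    return code == 3 ? sketch_maxsz(desc) : 8u << code;
}
```

With this layout, sketch_encode(2048, 2048) round-trips through both
accessors, and an AVX-style case such as oprsz 16 with maxsz 64 still fits.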

Thoughts?


r~

Patch

diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 6e9b17c4e38..2c7ce500fa7 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -92,7 +92,7 @@  typedef struct CPURISCVState CPURISCVState;
 
 #include "pmp.h"
 
-#define RV_VLEN_MAX 256
+#define RV_VLEN_MAX 512
 
 FIELD(VTYPE, VLMUL, 0, 3)
 FIELD(VTYPE, VSEW, 3, 3)
@@ -413,16 +413,17 @@  static inline void cpu_get_tb_cpu_state(CPURISCVState *env, target_ulong *pc,
         /*
          * If env->vl equals to VLMAX, we can use generic vector operation
          * expanders (GVEC) to accerlate the vector operations.
-         * However, as LMUL could be a fractional number. The maximum
-         * vector size can be operated might be less than 8 bytes,
-         * which is not supported by GVEC. So we set vl_eq_vlmax flag to true
-         * only when maxsz >= 8 bytes.
+         * However, GVEC only supports MAXSZ and OPRSZ in the range
+         * [8..256] bytes, and LMUL can be fractional, so the maximum
+         * vector size that can be operated on may be less than 8 bytes
+         * or larger than 256 bytes. So we set the vl_eq_vlmax flag to
+         * true only when maxsz >= 8 bytes and <= 256 bytes.
          */
         uint32_t vlmax = vext_get_vlmax(env_archcpu(env), env->vtype);
         uint32_t sew = FIELD_EX64(env->vtype, VTYPE, VSEW);
         uint32_t maxsz = vlmax << sew;
         bool vl_eq_vlmax = (env->vstart == 0) && (vlmax == env->vl)
-                           && (maxsz >= 8);
+                           && (maxsz >= 8) && (maxsz <= 256);
         flags = FIELD_DP32(flags, TB_FLAGS, VILL,
                     FIELD_EX64(env->vtype, VTYPE, VILL));
         flags = FIELD_DP32(flags, TB_FLAGS, SEW, sew);
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
index f2edf804460..9ad64762239 100644
--- a/target/riscv/insn_trans/trans_rvv.inc.c
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -669,7 +669,7 @@  static bool ldst_us_trans(uint32_t vd, uint32_t rs1, uint32_t data,
 
     /*
      * As simd_desc supports at most 256 bytes, and in this implementation,
-     * the max vector group length is 1024 bytes. So split it into two parts.
+     * the max vector group length is 2048 bytes. So split it into two parts.
      *
      * The first part is vlen in bytes, encoded in maxsz of simd_desc.
      * The second part is lmul, encoded in data of simd_desc.
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index 316e435f8af..07d1ee60717 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -129,7 +129,7 @@  static uint32_t vext_wd(uint32_t desc)
 static inline uint32_t vext_max_elems(uint32_t desc, uint32_t esz)
 {
     /*
-     * As simd_desc support at most 256 bytes, the max vlen is 256 bits.
+     * As simd_desc supports at most 256 bytes, the max vlen is 512 bits.
      * so vlen in bytes (vlenb) is encoded as maxsz.
      */
     uint32_t vlenb = simd_maxsz(desc);