Message ID | 2cf2fa3f-541b-4c39-8689-161c7a047f7a@gmail.com |
---|---|
State | New |
Series | RISC-V: Vectorized str(n)cmp and strlen. |
On 11/30/23 15:22, Robin Dapp wrote:
> Hi,
>
> this adds vectorized implementations of strcmp and strncmp as well as
> strlen. strlen falls back to the previously implemented rawmemchr.
> Also, it fixes a rawmemchr bug causing a SPEC2017 execution failure:
> We would only ever increment the source address by 1 regardless of
> the input type.
>
> The patch also changes the stringop-strategy handling slightly:
> auto is now an aggregate (including vector and scalar,
> possibly more in the future) and expansion functions try all
> matching strategies in their preferred order.
>
> As before, str* expansion is guarded by -minline-str* and not active
> by default. This might change in the future as I would rather have
> those on by default. As of now, though, there is still a latent bug:
>
> With -minline-strlen and -minline-strcmp we have several execution
> failures in gcc.c-torture/execute/builtins/. From my initial analysis
> it looks like we don't insert a vsetvl at the right spot (which would
> be right after a setjmp in those cases). This leaves the initial
> vle8ff without a proper vtype or vl causing a SIGILL.
> Still, I figured I'd rather post the patch as-is so the bug can be
> reproduced upstream.
>
> Regards
> Robin
>
> [...]
Do you want to extract the rawmemchr fix and push it forward independently? Or do we think the vsetvl issue will be resolved quickly enough that extracting that fix would just be "make work"?
Jeff
Hi, Robin.
Thanks for working on this. I know this is tedious work.
A couple of comments here:
- if (TARGET_ZBB || TARGET_XTHEADBB)
+ if (TARGET_VECTOR && stringop_strategy & STRINGOP_STRATEGY_VECTOR)
+ {
+ bool ok = riscv_vector::expand_strcmp (result, src1, src2, bytes_rtx,
+ alignment, ncompare);
+ if (ok)
+ return true;
+ }
+ if (TARGET_VECTOR && (stringop_strategy & STRINGOP_STRATEGY_VECTOR))
+ {
+ riscv_vector::expand_rawmemchr (E_QImode, result, src, search_char,
+ /* strlen */ true);
+ return true;
+ }
To keep the code consistent, I think you should change cpymem as well:
(define_expand "cpymem<mode>"
[(parallel [(set (match_operand:BLK 0 "general_operand")
(match_operand:BLK 1 "general_operand"))
(use (match_operand:P 2 ""))
(use (match_operand:SI 3 "const_int_operand"))])]
""
{
if (riscv_vector::expand_block_move (operands[0], operands[1], operands[2]))
DONE;
else if (riscv_expand_block_move (operands[0], operands[1], operands[2]))
DONE;
else
FAIL;
})
Alternatively, you could first change the cpymem code (in a separate patch) the same way you did for strcmp/strlen in this patch.
I don't have a strong opinion here; it's up to you.
-bool
-riscv_expand_block_move (rtx dest, rtx src, rtx length)
+static bool
+riscv_expand_block_move_scalar (rtx dest, rtx src, rtx length)
{
- if (riscv_memcpy_strategy == USE_LIBCALL
- || riscv_memcpy_strategy == USE_VECTOR)
+ if (!CONST_INT_P (length))
return false;
- if (CONST_INT_P (length))
- {
- unsigned HOST_WIDE_INT hwi_length = UINTVAL (length);
- unsigned HOST_WIDE_INT factor, align;
+ unsigned HOST_WIDE_INT hwi_length = UINTVAL (length);
+ unsigned HOST_WIDE_INT factor, align;
- align = MIN (MIN (MEM_ALIGN (src), MEM_ALIGN (dest)), BITS_PER_WORD);
- factor = BITS_PER_WORD / align;
+ align = MIN (MIN (MEM_ALIGN (src), MEM_ALIGN (dest)), BITS_PER_WORD);
+ factor = BITS_PER_WORD / align;
- if (optimize_function_for_size_p (cfun)
- && hwi_length * factor * UNITS_PER_WORD > MOVE_RATIO (false))
- return false;
+ if (optimize_function_for_size_p (cfun)
+ && hwi_length * factor * UNITS_PER_WORD > MOVE_RATIO (false))
+ return false;
- if (hwi_length <= (RISCV_MAX_MOVE_BYTES_STRAIGHT / factor))
+ if (hwi_length <= (RISCV_MAX_MOVE_BYTES_STRAIGHT / factor))
+ {
+ riscv_block_move_straight (dest, src, INTVAL (length));
+ return true;
+ }
+ else if (optimize && align >= BITS_PER_WORD)
+ {
+ unsigned min_iter_words
+ = RISCV_MAX_MOVE_BYTES_PER_LOOP_ITER / UNITS_PER_WORD;
+ unsigned iter_words = min_iter_words;
+ unsigned HOST_WIDE_INT bytes = hwi_length;
+ unsigned HOST_WIDE_INT words = bytes / UNITS_PER_WORD;
+
+ /* Lengthen the loop body if it shortens the tail. */
+ for (unsigned i = min_iter_words; i < min_iter_words * 2 - 1; i++)
{
- riscv_block_move_straight (dest, src, INTVAL (length));
- return true;
+ unsigned cur_cost = iter_words + words % iter_words;
+ unsigned new_cost = i + words % i;
+ if (new_cost <= cur_cost)
+ iter_words = i;
}
- else if (optimize && align >= BITS_PER_WORD)
- {
- unsigned min_iter_words
- = RISCV_MAX_MOVE_BYTES_PER_LOOP_ITER / UNITS_PER_WORD;
- unsigned iter_words = min_iter_words;
- unsigned HOST_WIDE_INT bytes = hwi_length;
- unsigned HOST_WIDE_INT words = bytes / UNITS_PER_WORD;
-
- /* Lengthen the loop body if it shortens the tail. */
- for (unsigned i = min_iter_words; i < min_iter_words * 2 - 1; i++)
- {
- unsigned cur_cost = iter_words + words % iter_words;
- unsigned new_cost = i + words % i;
- if (new_cost <= cur_cost)
- iter_words = i;
- }
- riscv_block_move_loop (dest, src, bytes, iter_words * UNITS_PER_WORD);
- return true;
- }
+ riscv_block_move_loop (dest, src, bytes, iter_words * UNITS_PER_WORD);
+ return true;
+ }
+
+ return false;
+}
I don't understand why you touch the scalar part here. Is it just formatting?
If so, it should be a separate patch.
Otherwise, OK from my side.
juzhe.zhong@rivai.ai
From: Robin Dapp
Date: 2023-12-01 06:22
To: gcc-patches; palmer; Kito Cheng; jeffreyalaw; juzhe.zhong@rivai.ai
CC: rdapp.gcc
Subject: [PATCH] RISC-V: Vectorized str(n)cmp and strlen.
Hi,
this adds vectorized implementations of strcmp and strncmp as well as
strlen. strlen falls back to the previously implemented rawmemchr.
Also, it fixes a rawmemchr bug causing a SPEC2017 execution failure:
We would only ever increment the source address by 1 regardless of
the input type.
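The loops described above can be modelled in plain C to make the data flow concrete. This is a sketch under stated assumptions, not the RTL the patch emits: `VLMAX_BYTES`, the `limit` arguments (standing in for the end of accessible memory, where a fault-only-first load would truncate vl) and the `ff_load_vl`/`model_*` names are all hypothetical. It shows the corrected rawmemchr bump (by `cnt << shift` bytes rather than `cnt`), strlen computed as rawmemchr(s, 0) - s, and the strcmp mask `(v1 == 0) | (v1 != v2)` whose first set bit ends the loop.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical VLMAX in bytes (VLEN / 8 * LMUL); any positive value works.  */
#define VLMAX_BYTES 16

/* Model of a fault-only-first load plus reading vl: the number of SIZE-byte
   elements at P that lie before LIMIT (assumed end of accessible memory),
   capped at VLMAX.  A real ff load traps if element 0 faults, so vl > 0.  */
static size_t
ff_load_vl (const unsigned char *p, const unsigned char *limit, size_t size)
{
  size_t avail = (size_t) (limit - p) / size;
  size_t vlmax = VLMAX_BYTES / size;
  assert (avail > 0);
  return avail < vlmax ? avail : vlmax;
}

/* Model of the rawmemchr loop for elements of (1 << SHIFT) bytes: returns a
   pointer to the first element whose bytes equal *PAT.  As with rawmemchr
   itself, PAT must occur before LIMIT.  */
static const unsigned char *
model_rawmemchr (const unsigned char *src, const unsigned char *limit,
                 unsigned shift, const void *pat)
{
  size_t size = (size_t) 1 << shift;
  size_t cnt = 0; /* vl of the previous iteration.  */
  for (;;)
    {
      /* Bump by CNT elements, i.e. CNT << SHIFT bytes.  Adding only CNT
         (the old behavior) is correct for byte elements alone.  */
      src += cnt << shift;
      cnt = ff_load_vl (src, limit, size);
      /* Compare + vfirst.m: index of the first matching element.  */
      for (size_t i = 0; i < cnt; i++)
        if (memcmp (src + (i << shift), pat, size) == 0)
          return src + (i << shift);
    }
}

/* strlen as rawmemchr (s, 0) - s.  */
static size_t
model_strlen (const char *s, const char *limit)
{
  const unsigned char nul = 0;
  const unsigned char *start = (const unsigned char *) s;
  const unsigned char *hit
    = model_rawmemchr (start, (const unsigned char *) limit, 0, &nul);
  return (size_t) (hit - start);
}

/* Model of the vector strcmp loop: mask = (v1 == 0) | (v1 != v2), then the
   first set bit (vfirst.m).  No hit: advance both pointers by vl.  Hit:
   return the byte difference at that index.  */
static int
model_strcmp (const char *s1, const char *lim1, const char *s2,
              const char *lim2)
{
  const unsigned char *p1 = (const unsigned char *) s1;
  const unsigned char *p2 = (const unsigned char *) s2;
  size_t cnt = 0;
  for (;;)
    {
      p1 += cnt;
      p2 += cnt;
      /* Assumption: the chunk ends where either load would fault.  */
      size_t vl1 = ff_load_vl (p1, (const unsigned char *) lim1, 1);
      size_t vl2 = ff_load_vl (p2, (const unsigned char *) lim2, 1);
      cnt = vl1 < vl2 ? vl1 : vl2;
      for (size_t i = 0; i < cnt; i++)
        if (p1[i] == 0 || p1[i] != p2[i])
          return p1[i] - p2[i];
    }
}
```

Only the sign of a strcmp result is meaningful in C; the model, like the patch's final subtraction, returns the difference of the first differing bytes.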
The patch also changes the stringop-strategy handling slightly:
auto is now an aggregate (including vector and scalar,
possibly more in the future) and expansion functions try all
matching strategies in their preferred order.
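A minimal sketch of that dispatch order, assuming each expander either emits code or declines. The enum values mirror `stringop_strategy_enum` from the patch; `pick_expansion` and its `vector_ok`/`scalar_ok` inputs are hypothetical stand-ins for "the target supports it and the expander succeeded".

```c
#include <stdbool.h>

/* Same bit values as stringop_strategy_enum in the patch.  */
enum stringop_strategy_enum {
  /* No expansion.  */
  STRINGOP_STRATEGY_LIBCALL = 1,
  STRINGOP_STRATEGY_SCALAR = 2,
  STRINGOP_STRATEGY_VECTOR = 4,
  STRINGOP_STRATEGY_AUTO = STRINGOP_STRATEGY_SCALAR | STRINGOP_STRATEGY_VECTOR
};

/* Return the strategy that ends up emitting code, or LIBCALL when every
   allowed expander declines (hypothetical helper for illustration).  */
static enum stringop_strategy_enum
pick_expansion (unsigned strategy, bool vector_ok, bool scalar_ok)
{
  /* Preferred order: vector first, then scalar, then the library call.  */
  if ((strategy & STRINGOP_STRATEGY_VECTOR) && vector_ok)
    return STRINGOP_STRATEGY_VECTOR;
  if ((strategy & STRINGOP_STRATEGY_SCALAR) && scalar_ok)
    return STRINGOP_STRATEGY_SCALAR;
  return STRINGOP_STRATEGY_LIBCALL;
}
```

Because auto is the bitwise OR of the scalar and vector bits, a declined vector expansion falls through to the scalar one, whereas vector alone falls back to the library call.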
As before, str* expansion is guarded by -minline-str* and not active
by default. This might change in the future as I would rather have
those on by default. As of now, though, there is still a latent bug:
With -minline-strlen and -minline-strcmp we have several execution
failures in gcc.c-torture/execute/builtins/. From my initial analysis
it looks like we don't insert a vsetvl at the right spot (which would
be right after a setjmp in those cases). This leaves the initial
vle8ff without a proper vtype or vl causing a SIGILL.
Still, I figured I'd rather post the patch as-is so the bug can be
reproduced upstream.
Regards
Robin
gcc/ChangeLog:
PR target/112109
* config/riscv/riscv-opts.h (enum riscv_stringop_strategy_enum):
Rename.
(enum stringop_strategy_enum): To this.
* config/riscv/riscv-protos.h (expand_rawmemchr): Add strlen
param.
(expand_strcmp): Define.
* config/riscv/riscv-string.cc (riscv_expand_strcmp): Add
vector version.
(riscv_expand_strlen): Ditto.
(riscv_expand_block_move_scalar): Handle existing scalar expansion.
(riscv_expand_block_move): Expand to either vector or scalar
version.
(expand_block_move): Add stringop strategy.
(expand_rawmemchr): Handle strlen and fix increment bug.
(expand_strcmp): New expander.
* config/riscv/riscv.md: Add vector.
* config/riscv/riscv.opt: Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c: New test.
* gcc.target/riscv/rvv/autovec/builtin/strcmp.c: New test.
* gcc.target/riscv/rvv/autovec/builtin/strlen-run.c: New test.
* gcc.target/riscv/rvv/autovec/builtin/strlen.c: New test.
---
gcc/config/riscv/riscv-opts.h | 20 +-
gcc/config/riscv/riscv-protos.h | 4 +-
gcc/config/riscv/riscv-string.cc | 287 +++++++++++++++---
gcc/config/riscv/riscv.md | 18 +-
gcc/config/riscv/riscv.opt | 18 +-
.../riscv/rvv/autovec/builtin/strcmp-run.c | 32 ++
.../riscv/rvv/autovec/builtin/strcmp.c | 13 +
.../riscv/rvv/autovec/builtin/strlen-run.c | 37 +++
.../riscv/rvv/autovec/builtin/strlen.c | 12 +
9 files changed, 363 insertions(+), 78 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strcmp.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strlen-run.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strlen.c
diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h
index e6e55ad7071..315f6ddb239 100644
--- a/gcc/config/riscv/riscv-opts.h
+++ b/gcc/config/riscv/riscv-opts.h
@@ -103,16 +103,16 @@ enum riscv_entity
MAX_RISCV_ENTITIES
};
-/* RISC-V stringop strategy. */
-enum riscv_stringop_strategy_enum {
- /* Use scalar or vector instructions. */
- USE_AUTO,
- /* Always use a library call. */
- USE_LIBCALL,
- /* Only use scalar instructions. */
- USE_SCALAR,
- /* Only use vector instructions. */
- USE_VECTOR
+/* RISC-V builtin strategy. */
+enum stringop_strategy_enum {
+ /* No expansion. */
+ STRINGOP_STRATEGY_LIBCALL = 1,
+ /* Use scalar expansion if possible. */
+ STRINGOP_STRATEGY_SCALAR = 2,
+ /* Use vector expansion if possible. */
+ STRINGOP_STRATEGY_VECTOR = 4,
+ /* Use any. */
+ STRINGOP_STRATEGY_AUTO = STRINGOP_STRATEGY_SCALAR | STRINGOP_STRATEGY_VECTOR
};
#define TARGET_ZICOND_LIKE (TARGET_ZICOND || (TARGET_XVENTANACONDOPS && TARGET_64BIT))
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 695ee24ad6f..51359154846 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -557,7 +557,9 @@ void expand_cond_unop (unsigned, rtx *);
void expand_cond_binop (unsigned, rtx *);
void expand_cond_ternop (unsigned, rtx *);
void expand_popcount (rtx *);
-void expand_rawmemchr (machine_mode, rtx, rtx, rtx);
+void expand_rawmemchr (machine_mode, rtx, rtx, rtx, bool = false);
+bool expand_strcmp (rtx, rtx, rtx, rtx,
+ unsigned HOST_WIDE_INT, bool);
void emit_vec_extract (rtx, rtx, poly_int64);
/* Rounding mode bitfield for fixed point VXRM. */
diff --git a/gcc/config/riscv/riscv-string.cc b/gcc/config/riscv/riscv-string.cc
index 80e3b5981af..ce259831a5c 100644
--- a/gcc/config/riscv/riscv-string.cc
+++ b/gcc/config/riscv/riscv-string.cc
@@ -511,7 +511,16 @@ riscv_expand_strcmp (rtx result, rtx src1, rtx src2,
return false;
alignment = UINTVAL (align_rtx);
- if (TARGET_ZBB || TARGET_XTHEADBB)
+ if (TARGET_VECTOR && stringop_strategy & STRINGOP_STRATEGY_VECTOR)
+ {
+ bool ok = riscv_vector::expand_strcmp (result, src1, src2, bytes_rtx,
+ alignment, ncompare);
+ if (ok)
+ return true;
+ }
+
+ if ((TARGET_ZBB || TARGET_XTHEADBB)
+ && stringop_strategy & STRINGOP_STRATEGY_SCALAR)
{
return riscv_expand_strcmp_scalar (result, src1, src2, nbytes, alignment,
ncompare);
@@ -588,9 +597,17 @@ riscv_expand_strlen_scalar (rtx result, rtx src, rtx align)
bool
riscv_expand_strlen (rtx result, rtx src, rtx search_char, rtx align)
{
+ if (TARGET_VECTOR && (stringop_strategy & STRINGOP_STRATEGY_VECTOR))
+ {
+ riscv_vector::expand_rawmemchr (E_QImode, result, src, search_char,
+ /* strlen */ true);
+ return true;
+ }
+
gcc_assert (search_char == const0_rtx);
- if (TARGET_ZBB || TARGET_XTHEADBB)
+ if ((TARGET_ZBB || TARGET_XTHEADBB)
+ && stringop_strategy & STRINGOP_STRATEGY_SCALAR)
return riscv_expand_strlen_scalar (result, src, align);
return false;
@@ -707,51 +724,68 @@ riscv_block_move_loop (rtx dest, rtx src, unsigned HOST_WIDE_INT length,
/* Expand a cpymemsi instruction, which copies LENGTH bytes from
memory reference SRC to memory reference DEST. */
-bool
-riscv_expand_block_move (rtx dest, rtx src, rtx length)
+static bool
+riscv_expand_block_move_scalar (rtx dest, rtx src, rtx length)
{
- if (riscv_memcpy_strategy == USE_LIBCALL
- || riscv_memcpy_strategy == USE_VECTOR)
+ if (!CONST_INT_P (length))
return false;
- if (CONST_INT_P (length))
- {
- unsigned HOST_WIDE_INT hwi_length = UINTVAL (length);
- unsigned HOST_WIDE_INT factor, align;
+ unsigned HOST_WIDE_INT hwi_length = UINTVAL (length);
+ unsigned HOST_WIDE_INT factor, align;
- align = MIN (MIN (MEM_ALIGN (src), MEM_ALIGN (dest)), BITS_PER_WORD);
- factor = BITS_PER_WORD / align;
+ align = MIN (MIN (MEM_ALIGN (src), MEM_ALIGN (dest)), BITS_PER_WORD);
+ factor = BITS_PER_WORD / align;
- if (optimize_function_for_size_p (cfun)
- && hwi_length * factor * UNITS_PER_WORD > MOVE_RATIO (false))
- return false;
+ if (optimize_function_for_size_p (cfun)
+ && hwi_length * factor * UNITS_PER_WORD > MOVE_RATIO (false))
+ return false;
- if (hwi_length <= (RISCV_MAX_MOVE_BYTES_STRAIGHT / factor))
+ if (hwi_length <= (RISCV_MAX_MOVE_BYTES_STRAIGHT / factor))
+ {
+ riscv_block_move_straight (dest, src, INTVAL (length));
+ return true;
+ }
+ else if (optimize && align >= BITS_PER_WORD)
+ {
+ unsigned min_iter_words
+ = RISCV_MAX_MOVE_BYTES_PER_LOOP_ITER / UNITS_PER_WORD;
+ unsigned iter_words = min_iter_words;
+ unsigned HOST_WIDE_INT bytes = hwi_length;
+ unsigned HOST_WIDE_INT words = bytes / UNITS_PER_WORD;
+
+ /* Lengthen the loop body if it shortens the tail. */
+ for (unsigned i = min_iter_words; i < min_iter_words * 2 - 1; i++)
{
- riscv_block_move_straight (dest, src, INTVAL (length));
- return true;
+ unsigned cur_cost = iter_words + words % iter_words;
+ unsigned new_cost = i + words % i;
+ if (new_cost <= cur_cost)
+ iter_words = i;
}
- else if (optimize && align >= BITS_PER_WORD)
- {
- unsigned min_iter_words
- = RISCV_MAX_MOVE_BYTES_PER_LOOP_ITER / UNITS_PER_WORD;
- unsigned iter_words = min_iter_words;
- unsigned HOST_WIDE_INT bytes = hwi_length;
- unsigned HOST_WIDE_INT words = bytes / UNITS_PER_WORD;
-
- /* Lengthen the loop body if it shortens the tail. */
- for (unsigned i = min_iter_words; i < min_iter_words * 2 - 1; i++)
- {
- unsigned cur_cost = iter_words + words % iter_words;
- unsigned new_cost = i + words % i;
- if (new_cost <= cur_cost)
- iter_words = i;
- }
- riscv_block_move_loop (dest, src, bytes, iter_words * UNITS_PER_WORD);
- return true;
- }
+ riscv_block_move_loop (dest, src, bytes, iter_words * UNITS_PER_WORD);
+ return true;
+ }
+
+ return false;
+}
+
+/* This function delegates block-move expansion to either the vector
+ implementation or the scalar one. Return TRUE if successful or FALSE
+ otherwise. */
+
+bool
+riscv_expand_block_move (rtx dest, rtx src, rtx length)
+{
+ if (TARGET_VECTOR && stringop_strategy & STRINGOP_STRATEGY_VECTOR)
+ {
+ bool ok = riscv_vector::expand_block_move (dest, src, length);
+ if (ok)
+ return true;
}
+
+ if (stringop_strategy & STRINGOP_STRATEGY_SCALAR)
+ return riscv_expand_block_move_scalar (dest, src, length);
+
return false;
}
@@ -777,9 +811,6 @@ expand_block_move (rtx dst_in, rtx src_in, rtx length_in)
bnez a2, loop # Any more?
ret # Return
*/
- if (!TARGET_VECTOR || riscv_memcpy_strategy == USE_LIBCALL
- || riscv_memcpy_strategy == USE_SCALAR)
- return false;
HOST_WIDE_INT potential_ew
= (MIN (MIN (MEM_ALIGN (src_in), MEM_ALIGN (dst_in)), BITS_PER_WORD)
/ BITS_PER_UNIT);
@@ -968,7 +999,8 @@ expand_block_move (rtx dst_in, rtx src_in, rtx length_in)
behavior is undefined. */
void
-expand_rawmemchr (machine_mode mode, rtx dst, rtx src, rtx pat)
+expand_rawmemchr (machine_mode mode, rtx dst, rtx src, rtx pat,
+ bool strlen)
{
/*
rawmemchr:
@@ -1001,6 +1033,8 @@ expand_rawmemchr (machine_mode mode, rtx dst, rtx src, rtx pat)
machine_mode mask_mode = riscv_vector::get_mask_mode (vmode);
rtx cnt = gen_reg_rtx (Pmode);
+ emit_move_insn (cnt, CONST0_RTX (Pmode));
+
rtx end = gen_reg_rtx (Pmode);
rtx vec = gen_reg_rtx (vmode);
rtx mask = gen_reg_rtx (mask_mode);
@@ -1011,12 +1045,18 @@ expand_rawmemchr (machine_mode mode, rtx dst, rtx src, rtx pat)
unsigned int shift = exact_log2 (GET_MODE_SIZE (mode).to_constant ());
rtx src_addr = copy_addr_to_reg (XEXP (src, 0));
+ rtx start_addr = copy_addr_to_reg (XEXP (src, 0));
rtx loop = gen_label_rtx ();
emit_label (loop);
rtx vsrc = change_address (src, vmode, src_addr);
+ /* Bump the pointer. */
+ rtx step = gen_reg_rtx (Pmode);
+ emit_insn (gen_rtx_SET (step, gen_rtx_ASHIFT (Pmode, cnt, GEN_INT (shift))));
+ emit_insn (gen_rtx_SET (src_addr, gen_rtx_PLUS (Pmode, src_addr, step)));
+
/* Emit a first-fault load. */
rtx vlops[] = {vec, vsrc};
emit_vlmax_insn (code_for_pred_fault_load (vmode),
@@ -1039,19 +1079,166 @@ expand_rawmemchr (machine_mode mode, rtx dst, rtx src, rtx pat)
emit_nonvlmax_insn (code_for_pred_ffs (mask_mode, Pmode),
riscv_vector::CPOP_OP, vfops, cnt);
- /* Bump the pointer. */
- emit_insn (gen_rtx_SET (src_addr, gen_rtx_PLUS (Pmode, src_addr, cnt)));
-
/* Emit the loop condition. */
rtx test = gen_rtx_LT (VOIDmode, end, const0_rtx);
emit_jump_insn (gen_cbranch4 (Pmode, test, end, const0_rtx, loop));
- /* We overran by CNT, subtract it. */
- emit_insn (gen_rtx_SET (src_addr, gen_rtx_MINUS (Pmode, src_addr, cnt)));
-
- /* We found something at SRC + END * [1,2,4,8]. */
- emit_insn (gen_rtx_SET (end, gen_rtx_ASHIFT (Pmode, end, GEN_INT (shift))));
- emit_insn (gen_rtx_SET (dst, gen_rtx_PLUS (Pmode, src_addr, end)));
+ if (strlen)
+ {
+ /* For strlen, return the length. */
+ emit_insn (gen_rtx_SET (dst, gen_rtx_PLUS (Pmode, src_addr, end)));
+ emit_insn (gen_rtx_SET (dst, gen_rtx_MINUS (Pmode, dst, start_addr)));
+ }
+ else
+ {
+ /* For rawmemchr, return the position at SRC + END * [1,2,4,8]. */
+ emit_insn (gen_rtx_SET (end, gen_rtx_ASHIFT (Pmode, end, GEN_INT (shift))));
+ emit_insn (gen_rtx_SET (dst, gen_rtx_PLUS (Pmode, src_addr, end)));
+ }
}
+/* Implement cmpstr<mode> using vector instructions. */
+
+bool
+expand_strcmp (rtx result, rtx src1, rtx src2, rtx nbytes,
+ unsigned HOST_WIDE_INT, bool)
+{
+ gcc_assert (TARGET_VECTOR);
+
+ /* We don't support big endian. */
+ if (BYTES_BIG_ENDIAN)
+ return false;
+
+ bool with_length = nbytes != NULL_RTX;
+
+ if (with_length
+ && (!REG_P (nbytes) && !SUBREG_P (nbytes) && !CONST_INT_P (nbytes)))
+ return false;
+
+ if (with_length && CONST_INT_P (nbytes))
+ nbytes = force_reg (Pmode, nbytes);
+
+ machine_mode mode = E_QImode;
+ unsigned int isize = GET_MODE_SIZE (mode).to_constant ();
+ int lmul = TARGET_MAX_LMUL;
+ poly_int64 nunits = exact_div (BYTES_PER_RISCV_VECTOR * lmul, isize);
+
+ machine_mode vmode;
+ if (!riscv_vector::get_vector_mode (GET_MODE_INNER (mode),
+ nunits).exists (&vmode))
+ gcc_unreachable ();
+
+ machine_mode mask_mode = riscv_vector::get_mask_mode (vmode);
+
+ /* Prepare addresses. */
+ rtx src_addr1 = copy_addr_to_reg (XEXP (src1, 0));
+ rtx vsrc1 = change_address (src1, vmode, src_addr1);
+
+ rtx src_addr2 = copy_addr_to_reg (XEXP (src2, 0));
+ rtx vsrc2 = change_address (src2, vmode, src_addr2);
+
+ /* Set initial pointer bump to 0. */
+ rtx cnt = gen_reg_rtx (Pmode);
+ emit_move_insn (cnt, CONST0_RTX (Pmode));
+
+ rtx sub = gen_reg_rtx (Pmode);
+ emit_move_insn (sub, CONST0_RTX (Pmode));
+
+ /* Create source vectors. */
+ rtx vec1 = gen_reg_rtx (vmode);
+ rtx vec2 = gen_reg_rtx (vmode);
+
+ rtx done = gen_label_rtx ();
+ rtx loop = gen_label_rtx ();
+ emit_label (loop);
+
+ /* Bump the pointers. */
+ emit_insn (gen_rtx_SET (src_addr1, gen_rtx_PLUS (Pmode, src_addr1, cnt)));
+ emit_insn (gen_rtx_SET (src_addr2, gen_rtx_PLUS (Pmode, src_addr2, cnt)));
+
+ rtx vlops1[] = {vec1, vsrc1};
+ rtx vlops2[] = {vec2, vsrc2};
+
+ if (!with_length)
+ {
+ emit_vlmax_insn (code_for_pred_fault_load (vmode),
+ riscv_vector::UNARY_OP, vlops1);
+
+ emit_vlmax_insn (code_for_pred_fault_load (vmode),
+ riscv_vector::UNARY_OP, vlops2);
+ }
+ else
+ {
+ nbytes = gen_lowpart (Pmode, nbytes);
+ emit_nonvlmax_insn (code_for_pred_fault_load (vmode),
+ riscv_vector::UNARY_OP, vlops1, nbytes);
+
+ emit_nonvlmax_insn (code_for_pred_fault_load (vmode),
+ riscv_vector::UNARY_OP, vlops2, nbytes);
+ }
+
+ /* Read the vl for the next pointer bump. */
+ if (Pmode == SImode)
+ emit_insn (gen_read_vlsi (cnt));
+ else
+ emit_insn (gen_read_vldi_zero_extend (cnt));
+
+ if (with_length)
+ {
+ rtx test_done = gen_rtx_EQ (VOIDmode, cnt, const0_rtx);
+ emit_jump_insn (gen_cbranch4 (Pmode, test_done, cnt, const0_rtx, done));
+ emit_insn (gen_rtx_SET (nbytes, gen_rtx_MINUS (Pmode, nbytes, cnt)));
+ }
+
+ /* Look for a \0 in the first string. */
+ rtx mask0 = gen_reg_rtx (mask_mode);
+ rtx eq0 = gen_rtx_EQ (mask_mode,
+ gen_const_vec_duplicate (vmode, CONST0_RTX (mode)),
+ vec1);
+ rtx vmsops1[] = {mask0, eq0, vec1, CONST0_RTX (mode)};
+ emit_nonvlmax_insn (code_for_pred_eqne_scalar (vmode),
+ riscv_vector::COMPARE_OP, vmsops1, cnt);
+
+ /* Look for vec1 != vec2 (includes vec2[i] == 0). */
+ rtx maskne = gen_reg_rtx (mask_mode);
+ rtx ne = gen_rtx_NE (mask_mode, vec1, vec2);
+ rtx vmsops[] = {maskne, ne, vec1, vec2};
+ emit_nonvlmax_insn (code_for_pred_cmp (vmode),
+ riscv_vector::COMPARE_OP, vmsops, cnt);
+
+ /* Combine both masks into one. */
+ rtx mask = gen_reg_rtx (mask_mode);
+ rtx vmorops[] = {mask, mask0, maskne};
+ emit_nonvlmax_insn (code_for_pred (IOR, mask_mode),
+ riscv_vector::BINARY_MASK_OP, vmorops, cnt);
+
+ /* Find the first bit in the mask (the first unequal element). */
+ rtx found_at = gen_reg_rtx (Pmode);
+ rtx vfops[] = {found_at, mask};
+ emit_nonvlmax_insn (code_for_pred_ffs (mask_mode, Pmode),
+ riscv_vector::CPOP_OP, vfops, cnt);
+
+ /* Emit the loop condition. */
+ rtx test = gen_rtx_LT (VOIDmode, found_at, const0_rtx);
+ emit_jump_insn (gen_cbranch4 (Pmode, test, found_at, const0_rtx, loop));
+
+ /* Walk up to the difference point. */
+ emit_insn (gen_rtx_SET (src_addr1, gen_rtx_PLUS (Pmode, src_addr1, found_at)));
+ emit_insn (gen_rtx_SET (src_addr2, gen_rtx_PLUS (Pmode, src_addr2, found_at)));
+
+ /* Load the respective byte and compute the difference. */
+ rtx c1 = gen_reg_rtx (Pmode);
+ rtx c2 = gen_reg_rtx (Pmode);
+
+ do_load_from_addr (mode, c1, src_addr1, src1);
+ do_load_from_addr (mode, c2, src_addr2, src2);
+
+ do_sub3 (sub, c1, c2);
+
+ if (with_length)
+ emit_label (done);
+
+ emit_insn (gen_movsi (result, gen_lowpart (SImode, sub)));
+ return true;
+}
}
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 6bf2dfdf9b4..ce092e92465 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -2336,9 +2336,7 @@ (define_expand "cpymem<mode>"
(use (match_operand:SI 3 "const_int_operand"))])]
""
{
- if (riscv_vector::expand_block_move (operands[0], operands[1], operands[2]))
- DONE;
- else if (riscv_expand_block_move (operands[0], operands[1], operands[2]))
+ if (riscv_expand_block_move (operands[0], operands[1], operands[2]))
DONE;
else
FAIL;
@@ -3705,7 +3703,8 @@ (define_expand "cmpstrnsi"
(match_operand:BLK 2)))
(use (match_operand:SI 3))
(use (match_operand:SI 4))])]
- "riscv_inline_strncmp && !optimize_size && (TARGET_ZBB || TARGET_XTHEADBB)"
+ "riscv_inline_strncmp && !optimize_size
+ && (TARGET_ZBB || TARGET_XTHEADBB || TARGET_VECTOR)"
{
if (riscv_expand_strcmp (operands[0], operands[1], operands[2],
operands[3], operands[4]))
@@ -3725,7 +3724,8 @@ (define_expand "cmpstrsi"
(compare:SI (match_operand:BLK 1)
(match_operand:BLK 2)))
(use (match_operand:SI 3))])]
- "riscv_inline_strcmp && !optimize_size && (TARGET_ZBB || TARGET_XTHEADBB)"
+ "riscv_inline_strcmp && !optimize_size
+ && (TARGET_ZBB || TARGET_XTHEADBB || TARGET_VECTOR)"
{
if (riscv_expand_strcmp (operands[0], operands[1], operands[2],
NULL_RTX, operands[3]))
@@ -3746,14 +3746,16 @@ (define_expand "strlen<mode>"
(match_operand:SI 2 "const_int_operand")
(match_operand:SI 3 "const_int_operand")]
UNSPEC_STRLEN))]
- "riscv_inline_strlen && !optimize_size && (TARGET_ZBB || TARGET_XTHEADBB)"
+ "riscv_inline_strlen && !optimize_size
+ && (TARGET_ZBB || TARGET_XTHEADBB || TARGET_VECTOR)"
{
rtx search_char = operands[2];
- if (search_char != const0_rtx)
+ if (search_char != const0_rtx && !TARGET_VECTOR)
FAIL;
- if (riscv_expand_strlen (operands[0], operands[1], operands[2], operands[3]))
+ else if (riscv_expand_strlen (operands[0], operands[1], operands[2],
+ operands[3]))
DONE;
else
FAIL;
diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
index 0c6517bdc8b..00b52f5dc77 100644
--- a/gcc/config/riscv/riscv.opt
+++ b/gcc/config/riscv/riscv.opt
@@ -536,21 +536,21 @@ Enable the use of vector registers for function arguments and return value.
This is an experimental switch and may be subject to change in the future.
Enum
-Name(riscv_stringop_strategy) Type(enum riscv_stringop_strategy_enum)
-Valid arguments to -mmemcpy-strategy=:
+Name(stringop_strategy) Type(enum stringop_strategy_enum)
+Valid arguments to -mbuiltin-strategy=:
EnumValue
-Enum(riscv_stringop_strategy) String(auto) Value(USE_AUTO)
+Enum(stringop_strategy) String(auto) Value(STRINGOP_STRATEGY_AUTO)
EnumValue
-Enum(riscv_stringop_strategy) String(libcall) Value(USE_LIBCALL)
+Enum(stringop_strategy) String(libcall) Value(STRINGOP_STRATEGY_LIBCALL)
EnumValue
-Enum(riscv_stringop_strategy) String(scalar) Value(USE_SCALAR)
+Enum(stringop_strategy) String(scalar) Value(STRINGOP_STRATEGY_SCALAR)
EnumValue
-Enum(riscv_stringop_strategy) String(vector) Value(USE_VECTOR)
+Enum(stringop_strategy) String(vector) Value(STRINGOP_STRATEGY_VECTOR)
-mmemcpy-strategy=
-Target RejectNegative Joined Enum(riscv_stringop_strategy) Var(riscv_memcpy_strategy) Init(USE_AUTO)
-Specify memcpy expansion strategy.
+mbuiltin-strategy=
+Target RejectNegative Joined Enum(stringop_strategy) Var(stringop_strategy) Init(STRINGOP_STRATEGY_AUTO)
+Specify builtin expansion strategy.
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c
new file mode 100644
index 00000000000..6dec7da91c1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c
@@ -0,0 +1,32 @@
+/* { dg-do run } */
+/* { dg-additional-options "-O3 -minline-strcmp" } */
+
+#include <string.h>
+
+int
+__attribute__ ((noipa))
+foo (const char *s, const char *t)
+{
+ return __builtin_strcmp (s, t);
+}
+
+int
+__attribute__ ((noipa, optimize ("0")))
+foo2 (const char *s, const char *t)
+{
+ return strcmp (s, t);
+}
+
+#define SZ 10
+
+int main ()
+{
+ const char *s[SZ]
+ = {"", "asdf", "0", "\0", "!@#$%***m1123fdnmoi43",
+ "a", "z", "1", "9", "12345678901234567889012345678901234567890"};
+
+ for (int i = 0; i < SZ; i++)
+ for (int j = 0; j < SZ; j++)
+ if (foo (s[i], s[j]) != foo2 (s[i], s[j]))
+ __builtin_abort ();
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strcmp.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strcmp.c
new file mode 100644
index 00000000000..f9d33a74fc5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strcmp.c
@@ -0,0 +1,13 @@
+/* { dg-do compile { target { riscv_v } } } */
+/* { dg-additional-options "-O3 -minline-strcmp" } */
+
+int
+__attribute__ ((noipa))
+foo (const char *s, const char *t)
+{
+ return __builtin_strcmp (s, t);
+}
+
+/* { dg-final { scan-assembler-times "vle8ff" 2 } } */
+/* { dg-final { scan-assembler-times "vfirst.m" 1 } } */
+/* { dg-final { scan-assembler-times "vmor.m" 1 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strlen-run.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strlen-run.c
new file mode 100644
index 00000000000..d29297a5f86
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strlen-run.c
@@ -0,0 +1,37 @@
+/* { dg-do run } */
+/* { dg-additional-options "-O3 -minline-strlen" } */
+
+int
+__attribute__ ((noipa))
+foo (const char *s)
+{
+ return __builtin_strlen (s);
+}
+
+int
+__attribute__ ((noipa))
+foo2 (const char *s)
+{
+ int n = 0;
+ while (*s++ != '\0')
+ {
+ asm volatile ("");
+ n++;
+ }
+ return n;
+}
+
+#define SZ 10
+
+int main ()
+{
+ const char *s[SZ]
+ = {"", "asdf", "0", "\0", "!@#$%***m1123fdnmoi43",
+ "a", "z", "1", "9", "12345678901234567889012345678901234567890"};
+
+ for (int i = 0; i < SZ; i++)
+ {
+ if (foo (s[i]) != foo2 (s[i]))
+ __builtin_abort ();
+ }
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strlen.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strlen.c
new file mode 100644
index 00000000000..0c6cca63ebf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strlen.c
@@ -0,0 +1,12 @@
+/* { dg-do compile { target { riscv_v } } } */
+/* { dg-additional-options "-O3 -minline-strlen" } */
+
+int
+__attribute__ ((noipa))
+foo (const char *s)
+{
+ return __builtin_strlen (s);
+}
+
+/* { dg-final { scan-assembler-times "vle8ff" 1 } } */
+/* { dg-final { scan-assembler-times "vfirst.m" 1 } } */
Ah. I see:
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -2336,9 +2336,7 @@ (define_expand "cpymem<mode>"
(use (match_operand:SI 3 "const_int_operand"))])]
""
{
- if (riscv_vector::expand_block_move (operands[0], operands[1], operands[2]))
- DONE;
- else if (riscv_expand_block_move (operands[0], operands[1], operands[2]))
+ if (riscv_expand_block_move (operands[0], operands[1], operands[2]))
DONE;
I think this should be a separate NFC patch.
juzhe.zhong@rivai.ai
From: Robin Dapp
Date: 2023-12-01 06:22
To: gcc-patches; palmer; Kito Cheng; jeffreyalaw; juzhe.zhong@rivai.ai
CC: rdapp.gcc
Subject: [PATCH] RISC-V: Vectorized str(n)cmp and strlen.
Hi,
this adds vectorized implementations of strcmp and strncmp as well as
strlen. strlen falls back to the previously implemented rawmemchr.
Also, it fixes a rawmemchr bug causing a SPEC2017 execution failure:
We would only ever increment the source address by 1 regardless of
the input type.
The patch also changes the stringop-strategy handling slightly:
auto is now an aggregate (including vector and scalar,
possibly more in the future) and expansion functions try all
matching strategies in their preferred order.
As before, str* expansion is guarded by -minline-str* and not active
by default. This might change in the future as I would rather have
those on by default. As of now, though, there is still a latent bug:
With -minline-strlen and -minline-strcmp we have several execution
failures in gcc.c-torture/execute/builtins/. From my initial analysis
it looks like we don't insert a vsetvl at the right spot (which would
be right after a setjmp in those cases). This leaves the initial
vle8ff without a proper vtype or vl causing a SIGILL.
Still, I figured I'd rather post the patch as-is so the bug can be
reproduced upstream.
Regards
Robin

gcc/ChangeLog:

	PR target/112109

	* config/riscv/riscv-opts.h (enum riscv_stringop_strategy_enum):
	Rename.
	(enum stringop_strategy_enum): To this.
	* config/riscv/riscv-protos.h (expand_rawmemchr): Add strlen
	param.
	(expand_strcmp): Define.
	* config/riscv/riscv-string.cc (riscv_expand_strcmp): Add
	vector version.
	(riscv_expand_strlen): Ditto.
	(riscv_expand_block_move_scalar): Handle existing scalar expansion.
	(riscv_expand_block_move): Expand to either vector or scalar
	version.
	(expand_block_move): Add stringop strategy.
	(expand_rawmemchr): Handle strlen and fix increment bug.
	(expand_strcmp): New expander.
	* config/riscv/riscv.md: Add vector.
	* config/riscv/riscv.opt: Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c: New test.
	* gcc.target/riscv/rvv/autovec/builtin/strcmp.c: New test.
	* gcc.target/riscv/rvv/autovec/builtin/strlen-run.c: New test.
	* gcc.target/riscv/rvv/autovec/builtin/strlen.c: New test.
---
gcc/config/riscv/riscv-opts.h | 20 +-
gcc/config/riscv/riscv-protos.h | 4 +-
gcc/config/riscv/riscv-string.cc | 287 +++++++++++++++---
gcc/config/riscv/riscv.md | 18 +-
gcc/config/riscv/riscv.opt | 18 +-
.../riscv/rvv/autovec/builtin/strcmp-run.c | 32 ++
.../riscv/rvv/autovec/builtin/strcmp.c | 13 +
.../riscv/rvv/autovec/builtin/strlen-run.c | 37 +++
.../riscv/rvv/autovec/builtin/strlen.c | 12 +
9 files changed, 363 insertions(+), 78 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strcmp.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strlen-run.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strlen.c
diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h
index e6e55ad7071..315f6ddb239 100644
--- a/gcc/config/riscv/riscv-opts.h
+++ b/gcc/config/riscv/riscv-opts.h
@@ -103,16 +103,16 @@ enum riscv_entity
MAX_RISCV_ENTITIES
};
-/* RISC-V stringop strategy. */
-enum riscv_stringop_strategy_enum {
- /* Use scalar or vector instructions. */
- USE_AUTO,
- /* Always use a library call. */
- USE_LIBCALL,
- /* Only use scalar instructions. */
- USE_SCALAR,
- /* Only use vector instructions. */
- USE_VECTOR
+/* RISC-V builtin strategy. */
+enum stringop_strategy_enum {
+ /* No expansion. */
+ STRINGOP_STRATEGY_LIBCALL = 1,
+ /* Use scalar expansion if possible. */
+ STRINGOP_STRATEGY_SCALAR = 2,
+ /* Use vector expansion if possible. */
+ STRINGOP_STRATEGY_VECTOR = 4,
+ /* Use any. */
+ STRINGOP_STRATEGY_AUTO = STRINGOP_STRATEGY_SCALAR | STRINGOP_STRATEGY_VECTOR
};
#define TARGET_ZICOND_LIKE (TARGET_ZICOND || (TARGET_XVENTANACONDOPS && TARGET_64BIT))
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 695ee24ad6f..51359154846 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -557,7 +557,9 @@ void expand_cond_unop (unsigned, rtx *);
void expand_cond_binop (unsigned, rtx *);
void expand_cond_ternop (unsigned, rtx *);
void expand_popcount (rtx *);
-void expand_rawmemchr (machine_mode, rtx, rtx, rtx);
+void expand_rawmemchr (machine_mode, rtx, rtx, rtx, bool = false);
+bool expand_strcmp (rtx, rtx, rtx, rtx,
+ unsigned HOST_WIDE_INT, bool);
void emit_vec_extract (rtx, rtx, poly_int64);
/* Rounding mode bitfield for fixed point VXRM. */
diff --git a/gcc/config/riscv/riscv-string.cc b/gcc/config/riscv/riscv-string.cc
index 80e3b5981af..ce259831a5c 100644
--- a/gcc/config/riscv/riscv-string.cc
+++ b/gcc/config/riscv/riscv-string.cc
@@ -511,7 +511,16 @@ riscv_expand_strcmp (rtx result, rtx src1, rtx src2,
return false;
alignment = UINTVAL (align_rtx);
- if (TARGET_ZBB || TARGET_XTHEADBB)
+ if (TARGET_VECTOR && stringop_strategy & STRINGOP_STRATEGY_VECTOR)
+ {
+ bool ok = riscv_vector::expand_strcmp (result, src1, src2, bytes_rtx,
+ alignment, ncompare);
+ if (ok)
+ return true;
+ }
+
+ if ((TARGET_ZBB || TARGET_XTHEADBB)
+ && stringop_strategy & STRINGOP_STRATEGY_SCALAR)
{
return riscv_expand_strcmp_scalar (result, src1, src2, nbytes, alignment,
ncompare);
@@ -588,9 +597,17 @@ riscv_expand_strlen_scalar (rtx result, rtx src, rtx align)
bool
riscv_expand_strlen (rtx result, rtx src, rtx search_char, rtx align)
{
+ if (TARGET_VECTOR && (stringop_strategy & STRINGOP_STRATEGY_VECTOR))
+ {
+ riscv_vector::expand_rawmemchr (E_QImode, result, src, search_char,
+ /* strlen */ true);
+ return true;
+ }
+
gcc_assert (search_char == const0_rtx);
- if (TARGET_ZBB || TARGET_XTHEADBB)
+ if ((TARGET_ZBB || TARGET_XTHEADBB)
+ && stringop_strategy & STRINGOP_STRATEGY_SCALAR)
return riscv_expand_strlen_scalar (result, src, align);
return false;
@@ -707,51 +724,68 @@ riscv_block_move_loop (rtx dest, rtx src, unsigned HOST_WIDE_INT length,
/* Expand a cpymemsi instruction, which copies LENGTH bytes from
memory reference SRC to memory reference DEST. */
-bool
-riscv_expand_block_move (rtx dest, rtx src, rtx length)
+static bool
+riscv_expand_block_move_scalar (rtx dest, rtx src, rtx length)
{
- if (riscv_memcpy_strategy == USE_LIBCALL
- || riscv_memcpy_strategy == USE_VECTOR)
+ if (!CONST_INT_P (length))
return false;
- if (CONST_INT_P (length))
- {
- unsigned HOST_WIDE_INT hwi_length = UINTVAL (length);
- unsigned HOST_WIDE_INT factor, align;
+ unsigned HOST_WIDE_INT hwi_length = UINTVAL (length);
+ unsigned HOST_WIDE_INT factor, align;
- align = MIN (MIN (MEM_ALIGN (src), MEM_ALIGN (dest)), BITS_PER_WORD);
- factor = BITS_PER_WORD / align;
+ align = MIN (MIN (MEM_ALIGN (src), MEM_ALIGN (dest)), BITS_PER_WORD);
+ factor = BITS_PER_WORD / align;
- if (optimize_function_for_size_p (cfun)
- && hwi_length * factor * UNITS_PER_WORD > MOVE_RATIO (false))
- return false;
+ if (optimize_function_for_size_p (cfun)
+ && hwi_length * factor * UNITS_PER_WORD > MOVE_RATIO (false))
+ return false;
- if (hwi_length <= (RISCV_MAX_MOVE_BYTES_STRAIGHT / factor))
+ if (hwi_length <= (RISCV_MAX_MOVE_BYTES_STRAIGHT / factor))
+ {
+ riscv_block_move_straight (dest, src, INTVAL (length));
+ return true;
+ }
+ else if (optimize && align >= BITS_PER_WORD)
+ {
+ unsigned min_iter_words
+ = RISCV_MAX_MOVE_BYTES_PER_LOOP_ITER / UNITS_PER_WORD;
+ unsigned iter_words = min_iter_words;
+ unsigned HOST_WIDE_INT bytes = hwi_length;
+ unsigned HOST_WIDE_INT words = bytes / UNITS_PER_WORD;
+
+ /* Lengthen the loop body if it shortens the tail. */
+ for (unsigned i = min_iter_words; i < min_iter_words * 2 - 1; i++)
{
- riscv_block_move_straight (dest, src, INTVAL (length));
- return true;
+ unsigned cur_cost = iter_words + words % iter_words;
+ unsigned new_cost = i + words % i;
+ if (new_cost <= cur_cost)
+ iter_words = i;
}
- else if (optimize && align >= BITS_PER_WORD)
- {
- unsigned min_iter_words
- = RISCV_MAX_MOVE_BYTES_PER_LOOP_ITER / UNITS_PER_WORD;
- unsigned iter_words = min_iter_words;
- unsigned HOST_WIDE_INT bytes = hwi_length;
- unsigned HOST_WIDE_INT words = bytes / UNITS_PER_WORD;
-
- /* Lengthen the loop body if it shortens the tail. */
- for (unsigned i = min_iter_words; i < min_iter_words * 2 - 1; i++)
- {
- unsigned cur_cost = iter_words + words % iter_words;
- unsigned new_cost = i + words % i;
- if (new_cost <= cur_cost)
- iter_words = i;
- }
- riscv_block_move_loop (dest, src, bytes, iter_words * UNITS_PER_WORD);
- return true;
- }
+ riscv_block_move_loop (dest, src, bytes, iter_words * UNITS_PER_WORD);
+ return true;
+ }
+
+ return false;
+}
+
+/* This function delegates block-move expansion to either the vector
+ implementation or the scalar one. Return TRUE if successful or FALSE
+ otherwise. */
+
+bool
+riscv_expand_block_move (rtx dest, rtx src, rtx length)
+{
+ if (TARGET_VECTOR && stringop_strategy & STRINGOP_STRATEGY_VECTOR)
+ {
+ bool ok = riscv_vector::expand_block_move (dest, src, length);
+ if (ok)
+ return true;
}
+
+ if (stringop_strategy & STRINGOP_STRATEGY_SCALAR)
+ return riscv_expand_block_move_scalar (dest, src, length);
+
return false;
}
@@ -777,9 +811,6 @@ expand_block_move (rtx dst_in, rtx src_in, rtx length_in)
bnez a2, loop # Any more?
ret # Return
*/
- if (!TARGET_VECTOR || riscv_memcpy_strategy == USE_LIBCALL
- || riscv_memcpy_strategy == USE_SCALAR)
- return false;
HOST_WIDE_INT potential_ew
= (MIN (MIN (MEM_ALIGN (src_in), MEM_ALIGN (dst_in)), BITS_PER_WORD)
/ BITS_PER_UNIT);
@@ -968,7 +999,8 @@ expand_block_move (rtx dst_in, rtx src_in, rtx length_in)
behavior is undefined. */
void
-expand_rawmemchr (machine_mode mode, rtx dst, rtx src, rtx pat)
+expand_rawmemchr (machine_mode mode, rtx dst, rtx src, rtx pat,
+ bool strlen)
{
/*
rawmemchr:
@@ -1001,6 +1033,8 @@ expand_rawmemchr (machine_mode mode, rtx dst, rtx src, rtx pat)
machine_mode mask_mode = riscv_vector::get_mask_mode (vmode);
rtx cnt = gen_reg_rtx (Pmode);
+ emit_move_insn (cnt, CONST0_RTX (Pmode));
+
rtx end = gen_reg_rtx (Pmode);
rtx vec = gen_reg_rtx (vmode);
rtx mask = gen_reg_rtx (mask_mode);
@@ -1011,12 +1045,18 @@ expand_rawmemchr (machine_mode mode, rtx dst, rtx src, rtx pat)
unsigned int shift = exact_log2 (GET_MODE_SIZE (mode).to_constant ());
rtx src_addr = copy_addr_to_reg (XEXP (src, 0));
+ rtx start_addr = copy_addr_to_reg (XEXP (src, 0));
rtx loop = gen_label_rtx ();
emit_label (loop);
rtx vsrc = change_address (src, vmode, src_addr);
+ /* Bump the pointer. */
+ rtx step = gen_reg_rtx (Pmode);
+ emit_insn (gen_rtx_SET (step, gen_rtx_ASHIFT (Pmode, cnt, GEN_INT (shift))));
+ emit_insn (gen_rtx_SET (src_addr, gen_rtx_PLUS (Pmode, src_addr, step)));
+
/* Emit a first-fault load. */
rtx vlops[] = {vec, vsrc};
emit_vlmax_insn (code_for_pred_fault_load (vmode),
@@ -1039,19 +1079,166 @@ expand_rawmemchr (machine_mode mode, rtx dst, rtx src, rtx pat)
emit_nonvlmax_insn (code_for_pred_ffs (mask_mode, Pmode),
riscv_vector::CPOP_OP, vfops, cnt);
- /* Bump the pointer. */
- emit_insn (gen_rtx_SET (src_addr, gen_rtx_PLUS (Pmode, src_addr, cnt)));
-
/* Emit the loop condition. */
rtx test = gen_rtx_LT (VOIDmode, end, const0_rtx);
emit_jump_insn (gen_cbranch4 (Pmode, test, end, const0_rtx, loop));
- /* We overran by CNT, subtract it. */
- emit_insn (gen_rtx_SET (src_addr, gen_rtx_MINUS (Pmode, src_addr, cnt)));
-
- /* We found something at SRC + END * [1,2,4,8]. */
- emit_insn (gen_rtx_SET (end, gen_rtx_ASHIFT (Pmode, end, GEN_INT (shift))));
- emit_insn (gen_rtx_SET (dst, gen_rtx_PLUS (Pmode, src_addr, end)));
+ if (strlen)
+ {
+ /* For strlen, return the length. */
+ emit_insn (gen_rtx_SET (dst, gen_rtx_PLUS (Pmode, src_addr, end)));
+ emit_insn (gen_rtx_SET (dst, gen_rtx_MINUS (Pmode, dst, start_addr)));
+ }
+ else
+ {
+ /* For rawmemchr, return the position at SRC + END * [1,2,4,8]. */
+ emit_insn (gen_rtx_SET (end, gen_rtx_ASHIFT (Pmode, end, GEN_INT (shift))));
+ emit_insn (gen_rtx_SET (dst, gen_rtx_PLUS (Pmode, src_addr, end)));
+ }
}
+/* Implement cmpstr<mode> using vector instructions. */
+
+bool
+expand_strcmp (rtx result, rtx src1, rtx src2, rtx nbytes,
+ unsigned HOST_WIDE_INT, bool)
+{
+ gcc_assert (TARGET_VECTOR);
+
+ /* We don't support big endian. */
+ if (BYTES_BIG_ENDIAN)
+ return false;
+
+ bool with_length = nbytes != NULL_RTX;
+
+ if (with_length
+ && (!REG_P (nbytes) && !SUBREG_P (nbytes) && !CONST_INT_P (nbytes)))
+ return false;
+
+ if (with_length && CONST_INT_P (nbytes))
+ nbytes = force_reg (Pmode, nbytes);
+
+ machine_mode mode = E_QImode;
+ unsigned int isize = GET_MODE_SIZE (mode).to_constant ();
+ int lmul = TARGET_MAX_LMUL;
+ poly_int64 nunits = exact_div (BYTES_PER_RISCV_VECTOR * lmul, isize);
+
+ machine_mode vmode;
+ if (!riscv_vector::get_vector_mode (GET_MODE_INNER (mode),
+ nunits).exists (&vmode))
+ gcc_unreachable ();
+
+ machine_mode mask_mode = riscv_vector::get_mask_mode (vmode);
+
+ /* Prepare addresses. */
+ rtx src_addr1 = copy_addr_to_reg (XEXP (src1, 0));
+ rtx vsrc1 = change_address (src1, vmode, src_addr1);
+
+ rtx src_addr2 = copy_addr_to_reg (XEXP (src2, 0));
+ rtx vsrc2 = change_address (src2, vmode, src_addr2);
+
+ /* Set initial pointer bump to 0. */
+ rtx cnt = gen_reg_rtx (Pmode);
+ emit_move_insn (cnt, CONST0_RTX (Pmode));
+
+ rtx sub = gen_reg_rtx (Pmode);
+ emit_move_insn (sub, CONST0_RTX (Pmode));
+
+ /* Create source vectors. */
+ rtx vec1 = gen_reg_rtx (vmode);
+ rtx vec2 = gen_reg_rtx (vmode);
+
+ rtx done = gen_label_rtx ();
+ rtx loop = gen_label_rtx ();
+ emit_label (loop);
+
+ /* Bump the pointers. */
+ emit_insn (gen_rtx_SET (src_addr1, gen_rtx_PLUS (Pmode, src_addr1, cnt)));
+ emit_insn (gen_rtx_SET (src_addr2, gen_rtx_PLUS (Pmode, src_addr2, cnt)));
+
+ rtx vlops1[] = {vec1, vsrc1};
+ rtx vlops2[] = {vec2, vsrc2};
+
+ if (!with_length)
+ {
+ emit_vlmax_insn (code_for_pred_fault_load (vmode),
+ riscv_vector::UNARY_OP, vlops1);
+
+ emit_vlmax_insn (code_for_pred_fault_load (vmode),
+ riscv_vector::UNARY_OP, vlops2);
+ }
+ else
+ {
+ nbytes = gen_lowpart (Pmode, nbytes);
+ emit_nonvlmax_insn (code_for_pred_fault_load (vmode),
+ riscv_vector::UNARY_OP, vlops1, nbytes);
+
+ emit_nonvlmax_insn (code_for_pred_fault_load (vmode),
+ riscv_vector::UNARY_OP, vlops2, nbytes);
+ }
+
+ /* Read the vl for the next pointer bump. */
+ if (Pmode == SImode)
+ emit_insn (gen_read_vlsi (cnt));
+ else
+ emit_insn (gen_read_vldi_zero_extend (cnt));
+
+ if (with_length)
+ {
+ rtx test_done = gen_rtx_EQ (VOIDmode, cnt, const0_rtx);
+ emit_jump_insn (gen_cbranch4 (Pmode, test_done, cnt, const0_rtx, done));
+ emit_insn (gen_rtx_SET (nbytes, gen_rtx_MINUS (Pmode, nbytes, cnt)));
+ }
+
+ /* Look for a \0 in the first string. */
+ rtx mask0 = gen_reg_rtx (mask_mode);
+ rtx eq0 = gen_rtx_EQ (mask_mode,
+ gen_const_vec_duplicate (vmode, CONST0_RTX (mode)),
+ vec1);
+ rtx vmsops1[] = {mask0, eq0, vec1, CONST0_RTX (mode)};
+ emit_nonvlmax_insn (code_for_pred_eqne_scalar (vmode),
+ riscv_vector::COMPARE_OP, vmsops1, cnt);
+
+ /* Look for vec1 != vec2 (includes vec2[i] == 0). */
+ rtx maskne = gen_reg_rtx (mask_mode);
+ rtx ne = gen_rtx_NE (mask_mode, vec1, vec2);
+ rtx vmsops[] = {maskne, ne, vec1, vec2};
+ emit_nonvlmax_insn (code_for_pred_cmp (vmode),
+ riscv_vector::COMPARE_OP, vmsops, cnt);
+
+ /* Combine both masks into one. */
+ rtx mask = gen_reg_rtx (mask_mode);
+ rtx vmorops[] = {mask, mask0, maskne};
+ emit_nonvlmax_insn (code_for_pred (IOR, mask_mode),
+ riscv_vector::BINARY_MASK_OP, vmorops, cnt);
+
+ /* Find the first bit in the mask (the first unequal element). */
+ rtx found_at = gen_reg_rtx (Pmode);
+ rtx vfops[] = {found_at, mask};
+ emit_nonvlmax_insn (code_for_pred_ffs (mask_mode, Pmode),
+ riscv_vector::CPOP_OP, vfops, cnt);
+
+ /* Emit the loop condition. */
+ rtx test = gen_rtx_LT (VOIDmode, found_at, const0_rtx);
+ emit_jump_insn (gen_cbranch4 (Pmode, test, found_at, const0_rtx, loop));
+
+ /* Walk up to the difference point. */
+ emit_insn (gen_rtx_SET (src_addr1, gen_rtx_PLUS (Pmode, src_addr1, found_at)));
+ emit_insn (gen_rtx_SET (src_addr2, gen_rtx_PLUS (Pmode, src_addr2, found_at)));
+
+ /* Load the respective byte and compute the difference. */
+ rtx c1 = gen_reg_rtx (Pmode);
+ rtx c2 = gen_reg_rtx (Pmode);
+
+ do_load_from_addr (mode, c1, src_addr1, src1);
+ do_load_from_addr (mode, c2, src_addr2, src2);
+
+ do_sub3 (sub, c1, c2);
+
+ if (with_length)
+ emit_label (done);
+
+ emit_insn (gen_movsi (result, gen_lowpart (SImode, sub)));
+ return true;
+}
}
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 6bf2dfdf9b4..ce092e92465 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -2336,9 +2336,7 @@ (define_expand "cpymem<mode>"
(use (match_operand:SI 3 "const_int_operand"))])]
""
{
- if (riscv_vector::expand_block_move (operands[0], operands[1], operands[2]))
- DONE;
- else if (riscv_expand_block_move (operands[0], operands[1], operands[2]))
+ if (riscv_expand_block_move (operands[0], operands[1], operands[2]))
DONE;
else
FAIL;
@@ -3705,7 +3703,8 @@ (define_expand "cmpstrnsi"
(match_operand:BLK 2)))
(use (match_operand:SI 3))
(use (match_operand:SI 4))])]
- "riscv_inline_strncmp && !optimize_size && (TARGET_ZBB || TARGET_XTHEADBB)"
+ "riscv_inline_strncmp && !optimize_size
+ && (TARGET_ZBB || TARGET_XTHEADBB || TARGET_VECTOR)"
{
if (riscv_expand_strcmp (operands[0], operands[1], operands[2],
operands[3], operands[4]))
@@ -3725,7 +3724,8 @@ (define_expand "cmpstrsi"
(compare:SI (match_operand:BLK 1)
(match_operand:BLK 2)))
(use (match_operand:SI 3))])]
- "riscv_inline_strcmp && !optimize_size && (TARGET_ZBB || TARGET_XTHEADBB)"
+ "riscv_inline_strcmp && !optimize_size
+ && (TARGET_ZBB || TARGET_XTHEADBB || TARGET_VECTOR)"
{
if (riscv_expand_strcmp (operands[0], operands[1], operands[2],
NULL_RTX, operands[3]))
@@ -3746,14 +3746,16 @@ (define_expand "strlen<mode>"
(match_operand:SI 2 "const_int_operand")
(match_operand:SI 3 "const_int_operand")]
UNSPEC_STRLEN))]
- "riscv_inline_strlen && !optimize_size && (TARGET_ZBB || TARGET_XTHEADBB)"
+ "riscv_inline_strlen && !optimize_size
+ && (TARGET_ZBB || TARGET_XTHEADBB || TARGET_VECTOR)"
{
rtx search_char = operands[2];
- if (search_char != const0_rtx)
+ if (search_char != const0_rtx && !TARGET_VECTOR)
FAIL;
- if (riscv_expand_strlen (operands[0], operands[1], operands[2], operands[3]))
+ else if (riscv_expand_strlen (operands[0], operands[1], operands[2],
+ operands[3]))
DONE;
else
FAIL;
diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
index 0c6517bdc8b..00b52f5dc77 100644
--- a/gcc/config/riscv/riscv.opt
+++ b/gcc/config/riscv/riscv.opt
@@ -536,21 +536,21 @@ Enable the use of vector registers for function arguments and return value.
This is an experimental switch and may be subject to change in the future.
Enum
-Name(riscv_stringop_strategy) Type(enum riscv_stringop_strategy_enum)
-Valid arguments to -mmemcpy-strategy=:
+Name(stringop_strategy) Type(enum stringop_strategy_enum)
+Valid arguments to -mbuiltin-strategy=:
EnumValue
-Enum(riscv_stringop_strategy) String(auto) Value(USE_AUTO)
+Enum(stringop_strategy) String(auto) Value(STRINGOP_STRATEGY_AUTO)
EnumValue
-Enum(riscv_stringop_strategy) String(libcall) Value(USE_LIBCALL)
+Enum(stringop_strategy) String(libcall) Value(STRINGOP_STRATEGY_LIBCALL)
EnumValue
-Enum(riscv_stringop_strategy) String(scalar) Value(USE_SCALAR)
+Enum(stringop_strategy) String(scalar) Value(STRINGOP_STRATEGY_SCALAR)
EnumValue
-Enum(riscv_stringop_strategy) String(vector) Value(USE_VECTOR)
+Enum(stringop_strategy) String(vector) Value(STRINGOP_STRATEGY_VECTOR)
-mmemcpy-strategy=
-Target RejectNegative Joined Enum(riscv_stringop_strategy) Var(riscv_memcpy_strategy) Init(USE_AUTO)
-Specify memcpy expansion strategy.
+mbuiltin-strategy=
+Target RejectNegative Joined Enum(stringop_strategy) Var(stringop_strategy) Init(STRINGOP_STRATEGY_AUTO)
+Specify builtin expansion strategy.
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c
new file mode 100644
index 00000000000..6dec7da91c1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c
@@ -0,0 +1,32 @@
+/* { dg-do run } */
+/* { dg-additional-options "-O3 -minline-strcmp" } */
+
+#include <string.h>
+
+int
+__attribute__ ((noipa))
+foo (const char *s, const char *t)
+{
+ return __builtin_strcmp (s, t);
+}
+
+int
+__attribute__ ((noipa, optimize ("0")))
+foo2 (const char *s, const char *t)
+{
+ return strcmp (s, t);
+}
+
+#define SZ 10
+
+int main ()
+{
+ const char *s[SZ]
+ = {"", "asdf", "0", "\0", "!@#$%***m1123fdnmoi43",
+ "a", "z", "1", "9", "12345678901234567889012345678901234567890"};
+
+ for (int i = 0; i < SZ; i++)
+ for (int j = 0; j < SZ; j++)
+ if (foo (s[i], s[j]) != foo2 (s[i], s[j]))
+ __builtin_abort ();
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strcmp.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strcmp.c
new file mode 100644
index 00000000000..f9d33a74fc5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strcmp.c
@@ -0,0 +1,13 @@
+/* { dg-do compile { target { riscv_v } } } */
+/* { dg-additional-options "-O3 -minline-strcmp" } */
+
+int
+__attribute__ ((noipa))
+foo (const char *s, const char *t)
+{
+ return __builtin_strcmp (s, t);
+}
+
+/* { dg-final { scan-assembler-times "vle8ff" 2 } } */
+/* { dg-final { scan-assembler-times "vfirst.m" 1 } } */
+/* { dg-final { scan-assembler-times "vmor.m" 1 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strlen-run.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strlen-run.c
new file mode 100644
index 00000000000..d29297a5f86
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strlen-run.c
@@ -0,0 +1,37 @@
+/* { dg-do run } */
+/* { dg-additional-options "-O3 -minline-strlen" } */
+
+int
+__attribute__ ((noipa))
+foo (const char *s)
+{
+ return __builtin_strlen (s);
+}
+
+int
+__attribute__ ((noipa))
+foo2 (const char *s)
+{
+ int n = 0;
+ while (*s++ != '\0')
+ {
+ asm volatile ("");
+ n++;
+ }
+ return n;
+}
+
+#define SZ 10
+
+int main ()
+{
+ const char *s[SZ]
+ = {"", "asdf", "0", "\0", "!@#$%***m1123fdnmoi43",
+ "a", "z", "1", "9", "12345678901234567889012345678901234567890"};
+
+ for (int i = 0; i < SZ; i++)
+ {
+ if (foo (s[i]) != foo2 (s[i]))
+ __builtin_abort ();
+ }
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strlen.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strlen.c
new file mode 100644
index 00000000000..0c6cca63ebf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strlen.c
@@ -0,0 +1,12 @@
+/* { dg-do compile { target { riscv_v } } } */
+/* { dg-additional-options "-O3 -minline-strlen" } */
+
+int
+__attribute__ ((noipa))
+foo (const char *s)
+{
+ return __builtin_strlen (s);
+}
+
+/* { dg-final { scan-assembler-times "vle8ff" 1 } } */
+/* { dg-final { scan-assembler-times "vfirst.m" 1 } } */
Split it into four separate patches now.

Regards
Robin
operands[2]; - if (search_char != const0_rtx) + if (search_char != const0_rtx && !TARGET_VECTOR) FAIL; - if (riscv_expand_strlen (operands[0], operands[1], operands[2], operands[3])) + else if (riscv_expand_strlen (operands[0], operands[1], operands[2], + operands[3])) DONE; else FAIL; diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt index 0c6517bdc8b..00b52f5dc77 100644 --- a/gcc/config/riscv/riscv.opt +++ b/gcc/config/riscv/riscv.opt @@ -536,21 +536,21 @@ Enable the use of vector registers for function arguments and return value. This is an experimental switch and may be subject to change in the future. Enum -Name(riscv_stringop_strategy) Type(enum riscv_stringop_strategy_enum) -Valid arguments to -mmemcpy-strategy=: +Name(stringop_strategy) Type(enum stringop_strategy_enum) +Valid arguments to -mbuilin-strategy=: EnumValue -Enum(riscv_stringop_strategy) String(auto) Value(USE_AUTO) +Enum(stringop_strategy) String(auto) Value(STRINGOP_STRATEGY_AUTO) EnumValue -Enum(riscv_stringop_strategy) String(libcall) Value(USE_LIBCALL) +Enum(stringop_strategy) String(libcall) Value(STRINGOP_STRATEGY_LIBCALL) EnumValue -Enum(riscv_stringop_strategy) String(scalar) Value(USE_SCALAR) +Enum(stringop_strategy) String(scalar) Value(STRINGOP_STRATEGY_SCALAR) EnumValue -Enum(riscv_stringop_strategy) String(vector) Value(USE_VECTOR) +Enum(stringop_strategy) String(vector) Value(STRINGOP_STRATEGY_VECTOR) -mmemcpy-strategy= -Target RejectNegative Joined Enum(riscv_stringop_strategy) Var(riscv_memcpy_strategy) Init(USE_AUTO) -Specify memcpy expansion strategy. +mbuiltin-strategy= +Target RejectNegative Joined Enum(stringop_strategy) Var(stringop_strategy) Init(STRINGOP_STRATEGY_AUTO) +Specify builtin expansion strategy. 
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c
new file mode 100644
index 00000000000..6dec7da91c1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c
@@ -0,0 +1,32 @@
+/* { dg-do run } */
+/* { dg-additional-options "-O3 -minline-strcmp" } */
+
+#include <string.h>
+
+int
+__attribute__ ((noipa))
+foo (const char *s, const char *t)
+{
+  return __builtin_strcmp (s, t);
+}
+
+int
+__attribute__ ((noipa, optimize ("0")))
+foo2 (const char *s, const char *t)
+{
+  return strcmp (s, t);
+}
+
+#define SZ 10
+
+int main ()
+{
+  const char *s[SZ]
+    = {"", "asdf", "0", "\0", "!@#$%***m1123fdnmoi43",
+       "a", "z", "1", "9", "12345678901234567889012345678901234567890"};
+
+  for (int i = 0; i < SZ; i++)
+    for (int j = 0; j < SZ; j++)
+      if (foo (s[i], s[j]) != foo2 (s[i], s[j]))
+        __builtin_abort ();
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strcmp.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strcmp.c
new file mode 100644
index 00000000000..f9d33a74fc5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strcmp.c
@@ -0,0 +1,13 @@
+/* { dg-do compile { target { riscv_v } } } */
+/* { dg-additional-options "-O3 -minline-strcmp" } */
+
+int
+__attribute__ ((noipa))
+foo (const char *s, const char *t)
+{
+  return __builtin_strcmp (s, t);
+}
+
+/* { dg-final { scan-assembler-times "vle8ff" 2 } } */
+/* { dg-final { scan-assembler-times "vfirst.m" 1 } } */
+/* { dg-final { scan-assembler-times "vmor.m" 1 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strlen-run.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strlen-run.c
new file mode 100644
index 00000000000..d29297a5f86
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strlen-run.c
@@ -0,0 +1,37 @@
+/* { dg-do run } */
+/* { dg-additional-options "-O3 -minline-strlen" } */
+
+int
+__attribute__ ((noipa))
+foo (const char *s)
+{
+  return __builtin_strlen (s);
+}
+
+int
+__attribute__ ((noipa))
+foo2 (const char *s)
+{
+  int n = 0;
+  while (*s++ != '\0')
+    {
+      asm volatile ("");
+      n++;
+    }
+  return n;
+}
+
+#define SZ 10
+
+int main ()
+{
+  const char *s[SZ]
+    = {"", "asdf", "0", "\0", "!@#$%***m1123fdnmoi43",
+       "a", "z", "1", "9", "12345678901234567889012345678901234567890"};
+
+  for (int i = 0; i < SZ; i++)
+    {
+      if (foo (s[i]) != foo2 (s[i]))
+        __builtin_abort ();
+    }
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strlen.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strlen.c
new file mode 100644
index 00000000000..0c6cca63ebf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strlen.c
@@ -0,0 +1,12 @@
+/* { dg-do compile { target { riscv_v } } } */
+/* { dg-additional-options "-O3 -minline-strlen" } */
+
+int
+__attribute__ ((noipa))
+foo (const char *s)
+{
+  return __builtin_strlen (s);
+}
+
+/* { dg-final { scan-assembler-times "vle8ff" 1 } } */
+/* { dg-final { scan-assembler-times "vfirst.m" 1 } } */
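The control flow of `expand_strcmp` and of the strlen path in `expand_rawmemchr` may be easier to follow next to a scalar model of what the emitted loop computes. The C sketch below is illustrative only and rests on several assumptions. `CHUNK`, `ff_load` and the `vec_*_model` functions are hypothetical names. `CHUNK` stands in for the hardware vector length. `ff_load` imitates `vle8ff`, but to stay memory-safe in portable C it stops after the first NUL instead of before a faulting element. The model also takes the shorter of the two loaded lengths as `vl`, which simplifies how the real sequence reads `vl`. Per chunk it mirrors the patch: build a mask of "NUL in the first string" OR "bytes differ" (vmseq/vmsne/vmor.m), take the first set bit (vfirst.m, -1 if none), then either return the byte difference or bump both pointers by `vl`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical vector length in bytes; real hardware decides this.  */
#define CHUNK 16

/* Stand-in for vle8ff (assumption): copy up to MAXVL bytes from SRC into
   DST and return how many were loaded (the new vl).  Real vle8ff only
   shortens vl when a later element would fault; this model stops right
   after the first NUL byte instead, so it never reads past a string.  */
static size_t
ff_load (unsigned char *dst, const unsigned char *src, size_t maxvl)
{
  size_t vl = 0;
  assert (maxvl <= CHUNK);
  while (vl < maxvl)
    {
      unsigned char c = src[vl];
      dst[vl++] = c;
      if (c == '\0')
        break;
    }
  return vl;
}

/* Chunked strncmp model.  N bounds the bytes compared (the AVL of the
   length-limited expander); the strcmp model passes SIZE_MAX.  */
int
vec_strncmp_model (const char *s, const char *t, size_t n)
{
  const unsigned char *a = (const unsigned char *) s;
  const unsigned char *b = (const unsigned char *) t;
  unsigned char v1[CHUNK], v2[CHUNK];

  for (;;)
    {
      size_t maxvl = n < CHUNK ? n : CHUNK;
      size_t vl1 = ff_load (v1, a, maxvl);
      size_t vl2 = ff_load (v2, b, maxvl);
      /* Simplification: use the shorter of the two loads as vl.  */
      size_t vl = vl1 < vl2 ? vl1 : vl2;

      /* Length exhausted without a difference: the strings compare equal.  */
      if (vl == 0)
        return 0;
      n -= vl;

      /* Mask (v1 == 0) | (v1 != v2), then the index of its first set bit.  */
      long found = -1;
      for (size_t i = 0; i < vl; i++)
        if (v1[i] == 0 || v1[i] != v2[i])
          {
            found = (long) i;
            break;
          }

      /* Terminator or difference found: return the byte difference.  */
      if (found >= 0)
        return v1[found] - v2[found];

      /* Otherwise advance both pointers by vl and load the next chunk.  */
      a += vl;
      b += vl;
    }
}

int
vec_strcmp_model (const char *s, const char *t)
{
  return vec_strncmp_model (s, t, SIZE_MAX);
}

/* Chunked strlen model (rawmemchr for '\0'): the result is the address of
   the first NUL minus the start address.  For a rawmemchr on wider
   elements, the byte bump per chunk would scale with the element size.  */
size_t
vec_strlen_model (const char *s)
{
  const unsigned char *start = (const unsigned char *) s;
  const unsigned char *p = start;
  unsigned char v[CHUNK];

  for (;;)
    {
      size_t vl = ff_load (v, p, CHUNK);
      for (size_t i = 0; i < vl; i++)
        if (v[i] == '\0')
          return (size_t) (p + i - start);
      p += vl;
    }
}
```

In the length-limited model, each chunk's `vl` is subtracted from the remaining length, and a zero `vl` ends the loop with a result of 0. This plays the role of the `done` label in the expander, where `sub` is still 0.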