From patchwork Thu Nov 30 22:22:35 2023
X-Patchwork-Submitter: Robin Dapp
X-Patchwork-Id: 1870433
Message-ID: <2cf2fa3f-541b-4c39-8689-161c7a047f7a@gmail.com>
Date: Thu, 30 Nov 2023 23:22:35 +0100
From: Robin Dapp
Subject: [PATCH] RISC-V: Vectorized str(n)cmp and strlen.
To: gcc-patches, palmer, Kito Cheng, jeffreyalaw, "juzhe.zhong@rivai.ai"
Cc: rdapp.gcc@gmail.com

Hi,

this adds vectorized implementations of strcmp and strncmp as well as
strlen. strlen reuses the previously implemented vector rawmemchr
expansion.

Also, it fixes a rawmemchr bug that caused a SPEC2017 execution failure:
the source address was advanced by the number of elements processed
rather than by the number of bytes, i.e. as if every element were one
byte wide, regardless of the input type.

The patch also changes the stringop-strategy handling slightly: "auto"
is now an aggregate (vector and scalar, possibly more in the future),
and the expansion functions try all enabled strategies in their
preferred order (vector first, then scalar). The corresponding option
is renamed from -mmemcpy-strategy= to -mbuiltin-strategy=.

As before, str* expansion is guarded by -minline-str* and is not active
by default. This might change in the future, as I would rather have
those on by default.

There is, however, still a latent bug: with -minline-strlen and
-minline-strcmp we see several execution failures in
gcc.c-torture/execute/builtins/. From my initial analysis it looks like
we don't insert a vsetvl at the right spot (which would be right after a
setjmp in those cases). This leaves the initial vle8ff without a proper
vtype or vl, causing a SIGILL. Still, I'd rather post the patch as-is so
the bug can be reproduced upstream.

Regards
 Robin

gcc/ChangeLog:

	PR target/112109

	* config/riscv/riscv-opts.h (enum riscv_stringop_strategy_enum):
	Rename.
	(enum stringop_strategy_enum): To this.
	* config/riscv/riscv-protos.h (expand_rawmemchr): Add strlen param.
	(expand_strcmp): Declare.
	* config/riscv/riscv-string.cc (riscv_expand_strcmp): Add vector
	version.
	(riscv_expand_strlen): Ditto.
	(riscv_expand_block_move_scalar): New function, split out from
	riscv_expand_block_move.
	(riscv_expand_block_move): Expand to either vector or scalar version.
	(expand_block_move): Remove the strategy check; it is now done by
	riscv_expand_block_move.
	(expand_rawmemchr): Handle strlen and fix increment bug.
	(expand_strcmp): New expander.
	* config/riscv/riscv.md (cpymem): Only call riscv_expand_block_move.
	(cmpstrnsi, cmpstrsi, strlen): Allow TARGET_VECTOR.
	* config/riscv/riscv.opt: Rename -mmemcpy-strategy= to
	-mbuiltin-strategy= and use stringop_strategy_enum.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c: New test.
	* gcc.target/riscv/rvv/autovec/builtin/strcmp.c: New test.
	* gcc.target/riscv/rvv/autovec/builtin/strlen-run.c: New test.
	* gcc.target/riscv/rvv/autovec/builtin/strlen.c: New test.
---
 gcc/config/riscv/riscv-opts.h                 |  20 +-
 gcc/config/riscv/riscv-protos.h               |   4 +-
 gcc/config/riscv/riscv-string.cc              | 287 +++++++++++++++---
 gcc/config/riscv/riscv.md                     |  18 +-
 gcc/config/riscv/riscv.opt                    |  18 +-
 .../riscv/rvv/autovec/builtin/strcmp-run.c    |  32 ++
 .../riscv/rvv/autovec/builtin/strcmp.c        |  13 +
 .../riscv/rvv/autovec/builtin/strlen-run.c    |  37 +++
 .../riscv/rvv/autovec/builtin/strlen.c        |  12 +
 9 files changed, 363 insertions(+), 78 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strcmp.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strlen-run.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strlen.c

diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h
index e6e55ad7071..315f6ddb239 100644
--- a/gcc/config/riscv/riscv-opts.h
+++ b/gcc/config/riscv/riscv-opts.h
@@ -103,16 +103,16 @@ enum riscv_entity
   MAX_RISCV_ENTITIES
 };
 
-/* RISC-V stringop strategy. */
-enum riscv_stringop_strategy_enum {
-  /* Use scalar or vector instructions. */
-  USE_AUTO,
-  /* Always use a library call. */
-  USE_LIBCALL,
-  /* Only use scalar instructions. */
-  USE_SCALAR,
-  /* Only use vector instructions. */
-  USE_VECTOR
+/* RISC-V builtin strategy. */
+enum stringop_strategy_enum {
+  /* No expansion. */
+  STRINGOP_STRATEGY_LIBCALL = 1,
+  /* Use scalar expansion if possible. */
+  STRINGOP_STRATEGY_SCALAR = 2,
+  /* Use vector expansion if possible. */
+  STRINGOP_STRATEGY_VECTOR = 4,
+  /* Use any. */
+  STRINGOP_STRATEGY_AUTO = STRINGOP_STRATEGY_SCALAR | STRINGOP_STRATEGY_VECTOR
 };
 
 #define TARGET_ZICOND_LIKE (TARGET_ZICOND || (TARGET_XVENTANACONDOPS && TARGET_64BIT))
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 695ee24ad6f..51359154846 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -557,7 +557,9 @@ void expand_cond_unop (unsigned, rtx *);
 void expand_cond_binop (unsigned, rtx *);
 void expand_cond_ternop (unsigned, rtx *);
 void expand_popcount (rtx *);
-void expand_rawmemchr (machine_mode, rtx, rtx, rtx);
+void expand_rawmemchr (machine_mode, rtx, rtx, rtx, bool = false);
+bool expand_strcmp (rtx, rtx, rtx, rtx,
+                    unsigned HOST_WIDE_INT, bool);
 void emit_vec_extract (rtx, rtx, poly_int64);
 
 /* Rounding mode bitfield for fixed point VXRM. */
diff --git a/gcc/config/riscv/riscv-string.cc b/gcc/config/riscv/riscv-string.cc
index 80e3b5981af..ce259831a5c 100644
--- a/gcc/config/riscv/riscv-string.cc
+++ b/gcc/config/riscv/riscv-string.cc
@@ -511,7 +511,16 @@ riscv_expand_strcmp (rtx result, rtx src1, rtx src2,
     return false;
   alignment = UINTVAL (align_rtx);
 
-  if (TARGET_ZBB || TARGET_XTHEADBB)
+  if (TARGET_VECTOR && stringop_strategy & STRINGOP_STRATEGY_VECTOR)
+    {
+      bool ok = riscv_vector::expand_strcmp (result, src1, src2, bytes_rtx,
+                                             alignment, ncompare);
+      if (ok)
+        return true;
+    }
+
+  if ((TARGET_ZBB || TARGET_XTHEADBB)
+      && stringop_strategy & STRINGOP_STRATEGY_SCALAR)
     {
       return riscv_expand_strcmp_scalar (result, src1, src2, nbytes, alignment,
                                          ncompare);
@@ -588,9 +597,17 @@ riscv_expand_strlen_scalar (rtx result, rtx src, rtx align)
 bool
 riscv_expand_strlen (rtx result, rtx src, rtx search_char, rtx align)
 {
+  if (TARGET_VECTOR && (stringop_strategy & STRINGOP_STRATEGY_VECTOR))
+    {
+      riscv_vector::expand_rawmemchr (E_QImode, result, src, search_char,
+                                      /* strlen */ true);
+      return true;
+    }
+
   gcc_assert (search_char == const0_rtx);
 
-  if (TARGET_ZBB || TARGET_XTHEADBB)
+  if ((TARGET_ZBB || TARGET_XTHEADBB)
+      && stringop_strategy & STRINGOP_STRATEGY_SCALAR)
     return riscv_expand_strlen_scalar (result, src, align);
 
   return false;
@@ -707,51 +724,68 @@ riscv_block_move_loop (rtx dest, rtx src, unsigned HOST_WIDE_INT length,
 /* Expand a cpymemsi instruction, which copies LENGTH bytes from
    memory reference SRC to memory reference DEST. */
 
-bool
-riscv_expand_block_move (rtx dest, rtx src, rtx length)
+static bool
+riscv_expand_block_move_scalar (rtx dest, rtx src, rtx length)
 {
-  if (riscv_memcpy_strategy == USE_LIBCALL
-      || riscv_memcpy_strategy == USE_VECTOR)
+  if (!CONST_INT_P (length))
     return false;
 
-  if (CONST_INT_P (length))
-    {
-      unsigned HOST_WIDE_INT hwi_length = UINTVAL (length);
-      unsigned HOST_WIDE_INT factor, align;
+  unsigned HOST_WIDE_INT hwi_length = UINTVAL (length);
+  unsigned HOST_WIDE_INT factor, align;
 
-      align = MIN (MIN (MEM_ALIGN (src), MEM_ALIGN (dest)), BITS_PER_WORD);
-      factor = BITS_PER_WORD / align;
+  align = MIN (MIN (MEM_ALIGN (src), MEM_ALIGN (dest)), BITS_PER_WORD);
+  factor = BITS_PER_WORD / align;
 
-      if (optimize_function_for_size_p (cfun)
-          && hwi_length * factor * UNITS_PER_WORD > MOVE_RATIO (false))
-        return false;
+  if (optimize_function_for_size_p (cfun)
+      && hwi_length * factor * UNITS_PER_WORD > MOVE_RATIO (false))
+    return false;
 
-      if (hwi_length <= (RISCV_MAX_MOVE_BYTES_STRAIGHT / factor))
+  if (hwi_length <= (RISCV_MAX_MOVE_BYTES_STRAIGHT / factor))
+    {
+      riscv_block_move_straight (dest, src, INTVAL (length));
+      return true;
+    }
+  else if (optimize && align >= BITS_PER_WORD)
+    {
+      unsigned min_iter_words
+        = RISCV_MAX_MOVE_BYTES_PER_LOOP_ITER / UNITS_PER_WORD;
+      unsigned iter_words = min_iter_words;
+      unsigned HOST_WIDE_INT bytes = hwi_length;
+      unsigned HOST_WIDE_INT words = bytes / UNITS_PER_WORD;
+
+      /* Lengthen the loop body if it shortens the tail. */
+      for (unsigned i = min_iter_words; i < min_iter_words * 2 - 1; i++)
         {
-          riscv_block_move_straight (dest, src, INTVAL (length));
-          return true;
+          unsigned cur_cost = iter_words + words % iter_words;
+          unsigned new_cost = i + words % i;
+          if (new_cost <= cur_cost)
+            iter_words = i;
         }
-      else if (optimize && align >= BITS_PER_WORD)
-        {
-          unsigned min_iter_words
-            = RISCV_MAX_MOVE_BYTES_PER_LOOP_ITER / UNITS_PER_WORD;
-          unsigned iter_words = min_iter_words;
-          unsigned HOST_WIDE_INT bytes = hwi_length;
-          unsigned HOST_WIDE_INT words = bytes / UNITS_PER_WORD;
-
-          /* Lengthen the loop body if it shortens the tail. */
-          for (unsigned i = min_iter_words; i < min_iter_words * 2 - 1; i++)
-            {
-              unsigned cur_cost = iter_words + words % iter_words;
-              unsigned new_cost = i + words % i;
-              if (new_cost <= cur_cost)
-                iter_words = i;
-            }
 
-          riscv_block_move_loop (dest, src, bytes, iter_words * UNITS_PER_WORD);
-          return true;
-        }
+      riscv_block_move_loop (dest, src, bytes, iter_words * UNITS_PER_WORD);
+      return true;
+    }
+
+  return false;
+}
+
+/* This function delegates block-move expansion to either the vector
+   implementation or the scalar one. Return TRUE if successful or FALSE
+   otherwise. */
+
+bool
+riscv_expand_block_move (rtx dest, rtx src, rtx length)
+{
+  if (TARGET_VECTOR && stringop_strategy & STRINGOP_STRATEGY_VECTOR)
+    {
+      bool ok = riscv_vector::expand_block_move (dest, src, length);
+      if (ok)
+        return true;
     }
+
+  if (stringop_strategy & STRINGOP_STRATEGY_SCALAR)
+    return riscv_expand_block_move_scalar (dest, src, length);
+
   return false;
 }
@@ -777,9 +811,6 @@ expand_block_move (rtx dst_in, rtx src_in, rtx length_in)
         bnez a2, loop        # Any more?
         ret                  # Return
   */
-  if (!TARGET_VECTOR || riscv_memcpy_strategy == USE_LIBCALL
-      || riscv_memcpy_strategy == USE_SCALAR)
-    return false;
   HOST_WIDE_INT potential_ew
     = (MIN (MIN (MEM_ALIGN (src_in), MEM_ALIGN (dst_in)), BITS_PER_WORD)
        / BITS_PER_UNIT);
@@ -968,7 +999,8 @@ expand_block_move (rtx dst_in, rtx src_in, rtx length_in)
    behavior is undefined. */
 
 void
-expand_rawmemchr (machine_mode mode, rtx dst, rtx src, rtx pat)
+expand_rawmemchr (machine_mode mode, rtx dst, rtx src, rtx pat,
+                  bool strlen)
 {
   /*
     rawmemchr:
@@ -1001,6 +1033,8 @@ expand_rawmemchr (machine_mode mode, rtx dst, rtx src, rtx pat)
   machine_mode mask_mode = riscv_vector::get_mask_mode (vmode);
 
   rtx cnt = gen_reg_rtx (Pmode);
+  emit_move_insn (cnt, CONST0_RTX (Pmode));
+
   rtx end = gen_reg_rtx (Pmode);
   rtx vec = gen_reg_rtx (vmode);
   rtx mask = gen_reg_rtx (mask_mode);
@@ -1011,12 +1045,18 @@ expand_rawmemchr (machine_mode mode, rtx dst, rtx src, rtx pat)
   unsigned int shift = exact_log2 (GET_MODE_SIZE (mode).to_constant ());
 
   rtx src_addr = copy_addr_to_reg (XEXP (src, 0));
+  rtx start_addr = copy_addr_to_reg (XEXP (src, 0));
 
   rtx loop = gen_label_rtx ();
   emit_label (loop);
 
   rtx vsrc = change_address (src, vmode, src_addr);
 
+  /* Bump the pointer. */
+  rtx step = gen_reg_rtx (Pmode);
+  emit_insn (gen_rtx_SET (step, gen_rtx_ASHIFT (Pmode, cnt, GEN_INT (shift))));
+  emit_insn (gen_rtx_SET (src_addr, gen_rtx_PLUS (Pmode, src_addr, step)));
+
   /* Emit a first-fault load. */
   rtx vlops[] = {vec, vsrc};
   emit_vlmax_insn (code_for_pred_fault_load (vmode),
@@ -1039,19 +1079,166 @@ expand_rawmemchr (machine_mode mode, rtx dst, rtx src, rtx pat)
   emit_nonvlmax_insn (code_for_pred_ffs (mask_mode, Pmode),
                       riscv_vector::CPOP_OP, vfops, cnt);
 
-  /* Bump the pointer. */
-  emit_insn (gen_rtx_SET (src_addr, gen_rtx_PLUS (Pmode, src_addr, cnt)));
-
   /* Emit the loop condition. */
   rtx test = gen_rtx_LT (VOIDmode, end, const0_rtx);
   emit_jump_insn (gen_cbranch4 (Pmode, test, end, const0_rtx, loop));
 
-  /* We overran by CNT, subtract it. */
-  emit_insn (gen_rtx_SET (src_addr, gen_rtx_MINUS (Pmode, src_addr, cnt)));
-
-  /* We found something at SRC + END * [1,2,4,8]. */
-  emit_insn (gen_rtx_SET (end, gen_rtx_ASHIFT (Pmode, end, GEN_INT (shift))));
-  emit_insn (gen_rtx_SET (dst, gen_rtx_PLUS (Pmode, src_addr, end)));
+  if (strlen)
+    {
+      /* For strlen, return the length. */
+      emit_insn (gen_rtx_SET (dst, gen_rtx_PLUS (Pmode, src_addr, end)));
+      emit_insn (gen_rtx_SET (dst, gen_rtx_MINUS (Pmode, dst, start_addr)));
+    }
+  else
+    {
+      /* For rawmemchr, return the position at SRC + END * [1,2,4,8]. */
+      emit_insn (gen_rtx_SET (end, gen_rtx_ASHIFT (Pmode, end, GEN_INT (shift))));
+      emit_insn (gen_rtx_SET (dst, gen_rtx_PLUS (Pmode, src_addr, end)));
+    }
 }
 
+/* Implement cmpstr using vector instructions. */
+
+bool
+expand_strcmp (rtx result, rtx src1, rtx src2, rtx nbytes,
+               unsigned HOST_WIDE_INT, bool)
+{
+  gcc_assert (TARGET_VECTOR);
+
+  /* We don't support big endian. */
+  if (BYTES_BIG_ENDIAN)
+    return false;
+
+  bool with_length = nbytes != NULL_RTX;
+
+  if (with_length
+      && (!REG_P (nbytes) && !SUBREG_P (nbytes) && !CONST_INT_P (nbytes)))
+    return false;
+
+  if (with_length && CONST_INT_P (nbytes))
+    nbytes = force_reg (Pmode, nbytes);
+
+  machine_mode mode = E_QImode;
+  unsigned int isize = GET_MODE_SIZE (mode).to_constant ();
+  int lmul = TARGET_MAX_LMUL;
+  poly_int64 nunits = exact_div (BYTES_PER_RISCV_VECTOR * lmul, isize);
+
+  machine_mode vmode;
+  if (!riscv_vector::get_vector_mode (GET_MODE_INNER (mode),
+                                      nunits).exists (&vmode))
+    gcc_unreachable ();
+
+  machine_mode mask_mode = riscv_vector::get_mask_mode (vmode);
+
+  /* Prepare addresses. */
+  rtx src_addr1 = copy_addr_to_reg (XEXP (src1, 0));
+  rtx vsrc1 = change_address (src1, vmode, src_addr1);
+
+  rtx src_addr2 = copy_addr_to_reg (XEXP (src2, 0));
+  rtx vsrc2 = change_address (src2, vmode, src_addr2);
+
+  /* Set initial pointer bump to 0. */
+  rtx cnt = gen_reg_rtx (Pmode);
+  emit_move_insn (cnt, CONST0_RTX (Pmode));
+
+  rtx sub = gen_reg_rtx (Pmode);
+  emit_move_insn (sub, CONST0_RTX (Pmode));
+
+  /* Create source vectors. */
+  rtx vec1 = gen_reg_rtx (vmode);
+  rtx vec2 = gen_reg_rtx (vmode);
+
+  rtx done = gen_label_rtx ();
+  rtx loop = gen_label_rtx ();
+  emit_label (loop);
+
+  /* Bump the pointers. */
+  emit_insn (gen_rtx_SET (src_addr1, gen_rtx_PLUS (Pmode, src_addr1, cnt)));
+  emit_insn (gen_rtx_SET (src_addr2, gen_rtx_PLUS (Pmode, src_addr2, cnt)));
+
+  rtx vlops1[] = {vec1, vsrc1};
+  rtx vlops2[] = {vec2, vsrc2};
+
+  if (!with_length)
+    {
+      emit_vlmax_insn (code_for_pred_fault_load (vmode),
+                       riscv_vector::UNARY_OP, vlops1);
+
+      emit_vlmax_insn (code_for_pred_fault_load (vmode),
+                       riscv_vector::UNARY_OP, vlops2);
+    }
+  else
+    {
+      nbytes = gen_lowpart (Pmode, nbytes);
+      emit_nonvlmax_insn (code_for_pred_fault_load (vmode),
+                          riscv_vector::UNARY_OP, vlops1, nbytes);
+
+      emit_nonvlmax_insn (code_for_pred_fault_load (vmode),
+                          riscv_vector::UNARY_OP, vlops2, nbytes);
+    }
+
+  /* Read the vl for the next pointer bump. */
+  if (Pmode == SImode)
+    emit_insn (gen_read_vlsi (cnt));
+  else
+    emit_insn (gen_read_vldi_zero_extend (cnt));
+
+  if (with_length)
+    {
+      rtx test_done = gen_rtx_EQ (VOIDmode, cnt, const0_rtx);
+      emit_jump_insn (gen_cbranch4 (Pmode, test_done, cnt, const0_rtx, done));
+      emit_insn (gen_rtx_SET (nbytes, gen_rtx_MINUS (Pmode, nbytes, cnt)));
+    }
+
+  /* Look for a \0 in the first string. */
+  rtx mask0 = gen_reg_rtx (mask_mode);
+  rtx eq0 = gen_rtx_EQ (mask_mode,
+                        gen_const_vec_duplicate (vmode, CONST0_RTX (mode)),
+                        vec1);
+  rtx vmsops1[] = {mask0, eq0, vec1, CONST0_RTX (mode)};
+  emit_nonvlmax_insn (code_for_pred_eqne_scalar (vmode),
+                      riscv_vector::COMPARE_OP, vmsops1, cnt);
+
+  /* Look for vec1 != vec2 (includes vec2[i] == 0). */
+  rtx maskne = gen_reg_rtx (mask_mode);
+  rtx ne = gen_rtx_NE (mask_mode, vec1, vec2);
+  rtx vmsops[] = {maskne, ne, vec1, vec2};
+  emit_nonvlmax_insn (code_for_pred_cmp (vmode),
+                      riscv_vector::COMPARE_OP, vmsops, cnt);
+
+  /* Combine both masks into one. */
+  rtx mask = gen_reg_rtx (mask_mode);
+  rtx vmorops[] = {mask, mask0, maskne};
+  emit_nonvlmax_insn (code_for_pred (IOR, mask_mode),
+                      riscv_vector::BINARY_MASK_OP, vmorops, cnt);
+
+  /* Find the first bit in the mask (the first unequal element). */
+  rtx found_at = gen_reg_rtx (Pmode);
+  rtx vfops[] = {found_at, mask};
+  emit_nonvlmax_insn (code_for_pred_ffs (mask_mode, Pmode),
+                      riscv_vector::CPOP_OP, vfops, cnt);
+
+  /* Emit the loop condition. */
+  rtx test = gen_rtx_LT (VOIDmode, found_at, const0_rtx);
+  emit_jump_insn (gen_cbranch4 (Pmode, test, found_at, const0_rtx, loop));
+
+  /* Walk up to the difference point. */
+  emit_insn (gen_rtx_SET (src_addr1, gen_rtx_PLUS (Pmode, src_addr1, found_at)));
+  emit_insn (gen_rtx_SET (src_addr2, gen_rtx_PLUS (Pmode, src_addr2, found_at)));
+
+  /* Load the respective byte and compute the difference. */
+  rtx c1 = gen_reg_rtx (Pmode);
+  rtx c2 = gen_reg_rtx (Pmode);
+
+  do_load_from_addr (mode, c1, src_addr1, src1);
+  do_load_from_addr (mode, c2, src_addr2, src2);
+
+  do_sub3 (sub, c1, c2);
+
+  if (with_length)
+    emit_label (done);
+
+  emit_insn (gen_movsi (result, gen_lowpart (SImode, sub)));
+  return true;
+}
 }
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 6bf2dfdf9b4..ce092e92465 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -2336,9 +2336,7 @@ (define_expand "cpymem"
               (use (match_operand:SI 3 "const_int_operand"))])]
   ""
 {
-  if (riscv_vector::expand_block_move (operands[0], operands[1], operands[2]))
-    DONE;
-  else if (riscv_expand_block_move (operands[0], operands[1], operands[2]))
+  if (riscv_expand_block_move (operands[0], operands[1], operands[2]))
     DONE;
   else
     FAIL;
@@ -3705,7 +3703,8 @@ (define_expand "cmpstrnsi"
                               (match_operand:BLK 2)))
               (use (match_operand:SI 3))
               (use (match_operand:SI 4))])]
-  "riscv_inline_strncmp && !optimize_size && (TARGET_ZBB || TARGET_XTHEADBB)"
+  "riscv_inline_strncmp && !optimize_size
+   && (TARGET_ZBB || TARGET_XTHEADBB || TARGET_VECTOR)"
 {
   if (riscv_expand_strcmp (operands[0], operands[1], operands[2],
                            operands[3], operands[4]))
@@ -3725,7 +3724,8 @@ (define_expand "cmpstrsi"
         (compare:SI (match_operand:BLK 1)
                     (match_operand:BLK 2)))
               (use (match_operand:SI 3))])]
-  "riscv_inline_strcmp && !optimize_size && (TARGET_ZBB || TARGET_XTHEADBB)"
+  "riscv_inline_strcmp && !optimize_size
+   && (TARGET_ZBB || TARGET_XTHEADBB || TARGET_VECTOR)"
 {
   if (riscv_expand_strcmp (operands[0], operands[1], operands[2],
                            NULL_RTX, operands[3]))
@@ -3746,14 +3746,16 @@ (define_expand "strlen"
               (match_operand:SI 2 "const_int_operand")
               (match_operand:SI 3 "const_int_operand")]
           UNSPEC_STRLEN))]
-  "riscv_inline_strlen && !optimize_size && (TARGET_ZBB || TARGET_XTHEADBB)"
+  "riscv_inline_strlen && !optimize_size
+   && (TARGET_ZBB || TARGET_XTHEADBB || TARGET_VECTOR)"
 {
   rtx search_char = operands[2];
 
-  if (search_char != const0_rtx)
+  if (search_char != const0_rtx && !TARGET_VECTOR)
     FAIL;
 
-  if (riscv_expand_strlen (operands[0], operands[1], operands[2], operands[3]))
+  else if (riscv_expand_strlen (operands[0], operands[1], operands[2],
+                                operands[3]))
     DONE;
   else
     FAIL;
diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
index 0c6517bdc8b..00b52f5dc77 100644
--- a/gcc/config/riscv/riscv.opt
+++ b/gcc/config/riscv/riscv.opt
@@ -536,21 +536,21 @@ Enable the use of vector registers for function arguments and return value.
 This is an experimental switch and may be subject to change in the future.
 
 Enum
-Name(riscv_stringop_strategy) Type(enum riscv_stringop_strategy_enum)
-Valid arguments to -mmemcpy-strategy=:
+Name(stringop_strategy) Type(enum stringop_strategy_enum)
+Valid arguments to -mbuiltin-strategy=:
 
 EnumValue
-Enum(riscv_stringop_strategy) String(auto) Value(USE_AUTO)
+Enum(stringop_strategy) String(auto) Value(STRINGOP_STRATEGY_AUTO)
 
 EnumValue
-Enum(riscv_stringop_strategy) String(libcall) Value(USE_LIBCALL)
+Enum(stringop_strategy) String(libcall) Value(STRINGOP_STRATEGY_LIBCALL)
 
 EnumValue
-Enum(riscv_stringop_strategy) String(scalar) Value(USE_SCALAR)
+Enum(stringop_strategy) String(scalar) Value(STRINGOP_STRATEGY_SCALAR)
 
 EnumValue
-Enum(riscv_stringop_strategy) String(vector) Value(USE_VECTOR)
+Enum(stringop_strategy) String(vector) Value(STRINGOP_STRATEGY_VECTOR)
 
--mmemcpy-strategy=
-Target RejectNegative Joined Enum(riscv_stringop_strategy) Var(riscv_memcpy_strategy) Init(USE_AUTO)
-Specify memcpy expansion strategy.
+mbuiltin-strategy=
+Target RejectNegative Joined Enum(stringop_strategy) Var(stringop_strategy) Init(STRINGOP_STRATEGY_AUTO)
+Specify builtin expansion strategy.
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c
new file mode 100644
index 00000000000..6dec7da91c1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c
@@ -0,0 +1,32 @@
+/* { dg-do run } */
+/* { dg-additional-options "-O3 -minline-strcmp" } */
+
+#include <string.h>
+
+int
+__attribute__ ((noipa))
+foo (const char *s, const char *t)
+{
+  return __builtin_strcmp (s, t);
+}
+
+int
+__attribute__ ((noipa, optimize ("0")))
+foo2 (const char *s, const char *t)
+{
+  return strcmp (s, t);
+}
+
+#define SZ 10
+
+int main ()
+{
+  const char *s[SZ]
+    = {"", "asdf", "0", "\0", "!@#$%***m1123fdnmoi43",
+       "a", "z", "1", "9", "12345678901234567889012345678901234567890"};
+
+  for (int i = 0; i < SZ; i++)
+    for (int j = 0; j < SZ; j++)
+      if (foo (s[i], s[j]) != foo2 (s[i], s[j]))
+        __builtin_abort ();
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strcmp.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strcmp.c
new file mode 100644
index 00000000000..f9d33a74fc5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strcmp.c
@@ -0,0 +1,13 @@
+/* { dg-do compile { target { riscv_v } } } */
+/* { dg-additional-options "-O3 -minline-strcmp" } */
+
+int
+__attribute__ ((noipa))
+foo (const char *s, const char *t)
+{
+  return __builtin_strcmp (s, t);
+}
+
+/* { dg-final { scan-assembler-times "vle8ff" 2 } } */
+/* { dg-final { scan-assembler-times "vfirst.m" 1 } } */
+/* { dg-final { scan-assembler-times "vmor.m" 1 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strlen-run.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strlen-run.c
new file mode 100644
index 00000000000..d29297a5f86
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strlen-run.c
@@ -0,0 +1,37 @@
+/* { dg-do run } */
+/* { dg-additional-options "-O3 -minline-strlen" } */
+
+int
+__attribute__ ((noipa))
+foo (const char *s)
+{
+  return __builtin_strlen (s);
+}
+
+int
+__attribute__ ((noipa))
+foo2 (const char *s)
+{
+  int n = 0;
+  while (*s++ != '\0')
+    {
+      asm volatile ("");
+      n++;
+    }
+  return n;
+}
+
+#define SZ 10
+
+int main ()
+{
+  const char *s[SZ]
+    = {"", "asdf", "0", "\0", "!@#$%***m1123fdnmoi43",
+       "a", "z", "1", "9", "12345678901234567889012345678901234567890"};
+
+  for (int i = 0; i < SZ; i++)
+    {
+      if (foo (s[i]) != foo2 (s[i]))
+        __builtin_abort ();
+    }
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strlen.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strlen.c
new file mode 100644
index 00000000000..0c6cca63ebf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strlen.c
@@ -0,0 +1,12 @@
+/* { dg-do compile { target { riscv_v } } } */
+/* { dg-additional-options "-O3 -minline-strlen" } */
+
+int
+__attribute__ ((noipa))
+foo (const char *s)
+{
+  return __builtin_strlen (s);
+}
+
+/* { dg-final { scan-assembler-times "vle8ff" 1 } } */
+/* { dg-final { scan-assembler-times "vfirst.m" 1 } } */
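The reworked strategy handling makes "auto" the union of the scalar and vector bits. Each expander then tries the enabled strategies in a fixed order: vector first, then scalar, and otherwise a library call. The C sketch below models only that dispatch order. It assumes each expander simply reports success or failure. `pick_expansion` and its boolean parameters are hypothetical, `have_vector` stands in for TARGET_VECTOR, and the enum values are copied from the patch's riscv-opts.h.

```c
#include <assert.h>
#include <stdbool.h>

/* Same bit values as stringop_strategy_enum in the patch.  */
enum stringop_strategy_enum
{
  STRINGOP_STRATEGY_LIBCALL = 1,
  STRINGOP_STRATEGY_SCALAR = 2,
  STRINGOP_STRATEGY_VECTOR = 4,
  STRINGOP_STRATEGY_AUTO = STRINGOP_STRATEGY_SCALAR | STRINGOP_STRATEGY_VECTOR
};

/* Hypothetical model of riscv_expand_block_move and friends.
   HAVE_VECTOR stands in for TARGET_VECTOR; VECTOR_OK and SCALAR_OK say
   whether the respective expander would accept this particular call.
   Returns the strategy actually used, or STRINGOP_STRATEGY_LIBCALL when
   nothing is expanded inline.  */
static enum stringop_strategy_enum
pick_expansion (int strategy, bool have_vector, bool vector_ok,
                bool scalar_ok)
{
  /* Every accepted option value has at least one bit set.  */
  assert (strategy != 0);

  /* Vector expansion is preferred when it is enabled and succeeds...  */
  if (have_vector && (strategy & STRINGOP_STRATEGY_VECTOR) && vector_ok)
    return STRINGOP_STRATEGY_VECTOR;

  /* ...otherwise fall through to the scalar expander, if enabled.  */
  if ((strategy & STRINGOP_STRATEGY_SCALAR) && scalar_ok)
    return STRINGOP_STRATEGY_SCALAR;

  /* Neither applies: leave it to a library call.  */
  return STRINGOP_STRATEGY_LIBCALL;
}
```

With "auto" on a vector target, a call that the vector expander declines falls through to the scalar path. "libcall" sets neither bit, so nothing is expanded inline.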
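Below is a scalar C model of the vector rawmemchr loop after the patch, which makes the pointer-bump fix concrete. It is a sketch under simplifying assumptions. `rawmemchr_model`, `strlen_model` and `load_elem` are hypothetical names. `vl` is a fixed chunk size, whereas the real fault-only-first load may return a shorter vl. The key line is the bump by `cnt * elem_size` (`cnt << shift` in the patch); the old code added `cnt` alone.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read one element of SIZE bytes (1, 2, 4 or 8) at P.  */
static uint64_t
load_elem (const unsigned char *p, size_t size)
{
  uint8_t u8;
  uint16_t u16;
  uint32_t u32;
  uint64_t u64;
  switch (size)
    {
    case 1: memcpy (&u8, p, 1); return u8;
    case 2: memcpy (&u16, p, 2); return u16;
    case 4: memcpy (&u32, p, 4); return u32;
    default: memcpy (&u64, p, 8); return u64;
    }
}

/* Hypothetical scalar model of the vector rawmemchr loop.  VL is a fixed
   stand-in for the vl read back after the fault-only-first load, and
   ELEM_SIZE corresponds to 1 << shift in the patch.  */
static const unsigned char *
rawmemchr_model (const unsigned char *src, size_t elem_size, uint64_t pat,
                 size_t vl)
{
  assert (vl > 0);
  assert (elem_size == 1 || elem_size == 2 || elem_size == 4 || elem_size == 8);

  const unsigned char *addr = src;
  size_t cnt = 0;   /* Elements processed in the previous round.  */
  for (;;)
    {
      /* Bump the pointer by CNT elements, i.e. CNT * ELEM_SIZE bytes.
         Adding only CNT (bytes) was the bug the patch fixes.  */
      addr += cnt * elem_size;

      /* Compare + vfirst: index of the first match, or -1.  Elements
         after a match are never read, so the model stays in bounds.  */
      ptrdiff_t end = -1;
      for (size_t i = 0; i < vl; i++)
        if (load_elem (addr + i * elem_size, elem_size) == pat)
          {
            end = (ptrdiff_t) i;
            break;
          }
      cnt = vl;

      if (end >= 0)
        return addr + (size_t) end * elem_size;   /* SRC + END * [1,2,4,8] */
    }
}

/* strlen on top of the byte-sized variant: (ADDR + END) - START.  */
static size_t
strlen_model (const char *s, size_t vl)
{
  const unsigned char *start = (const unsigned char *) s;
  return (size_t) (rawmemchr_model (start, 1, 0, vl) - start);
}
```

For strlen, the patch runs the same loop in QImode and returns `(src_addr + end) - start_addr` instead of the match address, which is what `strlen_model` computes.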
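The loop emitted by the new `expand_strcmp` can be read as the scalar C model below. Each round "loads" up to vl bytes of both strings and builds the mask (s1 == 0) | (s1 != s2): two vector compares combined with vmor.m. vfirst.m then locates the first set bit. This is a sketch, not the patch's code. `strcmp_model` and its parameters are hypothetical, and `vl` is fixed instead of coming from a fault-only-first load. The model compares bytes as unsigned char, which is what C requires of strcmp.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar model of the vector strcmp/strncmp loop.
   HAVE_N and N model the strncmp length; VL is a fixed chunk size.
   Returns the byte difference at the first position where S1 has a NUL
   or S1 and S2 differ, or 0 if the (bounded) strings are equal.  */
static int
strcmp_model (const char *s1, const char *s2, int have_n, size_t n,
              size_t vl)
{
  assert (vl > 0);
  const unsigned char *a = (const unsigned char *) s1;
  const unsigned char *b = (const unsigned char *) s2;
  size_t cnt = 0;
  for (;;)
    {
      /* Bump both pointers by the previous vl.  */
      a += cnt;
      b += cnt;

      /* Bytes "loaded" this round; for strncmp the remaining length
         caps it, like the non-VLMAX loads with AVL = nbytes.  */
      cnt = have_n && n < vl ? n : vl;
      if (have_n)
        {
          if (cnt == 0)
            return 0;   /* Length exhausted: the branch to DONE.  */
          n -= cnt;
        }

      /* mask = (a == 0) | (a != b); stopping at the first set bit means
         no byte past either terminator is read.  */
      for (size_t i = 0; i < cnt; i++)
        if (a[i] == 0 || a[i] != b[i])
          return a[i] - b[i];
    }
}
```

Passing `have_n = 1` models strncmp. Once the length is used up the result is 0, mirroring the `done` label in the patch.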