From patchwork Fri Oct 18 13:12:58 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Craig Blackmore
X-Patchwork-Id: 1999115
From: Craig Blackmore
To: gcc-patches@gcc.gnu.org
Cc: Craig Blackmore
Subject: [PATCH 5/7] RISC-V: Move vector memcpy decision making to separate function [NFC]
Date: Fri, 18 Oct 2024 14:12:58 +0100
Message-ID: <20241018131300.1150819-6-craig.blackmore@embecosm.com>
In-Reply-To: <20241018131300.1150819-1-craig.blackmore@embecosm.com>
References: <20241018131300.1150819-1-craig.blackmore@embecosm.com>

This moves the code for deciding whether to generate a
vectorized memcpy, which vector mode to use, and whether a loop is needed
out of riscv_vector::expand_block_move and into a new function,
riscv_vector::use_vector_stringop_p, so that it can be reused for other
string operations.

gcc/ChangeLog:

	* config/riscv/riscv-string.cc (struct stringop_info): New.
	(expand_block_move): Move decision making code to...
	(use_vector_stringop_p): ...here.
---
 gcc/config/riscv/riscv-string.cc | 143 +++++++++++++++++++------------
 1 file changed, 87 insertions(+), 56 deletions(-)

diff --git a/gcc/config/riscv/riscv-string.cc b/gcc/config/riscv/riscv-string.cc
index 64fd6b29092..118c02a4021 100644
--- a/gcc/config/riscv/riscv-string.cc
+++ b/gcc/config/riscv/riscv-string.cc
@@ -1051,35 +1051,31 @@ riscv_expand_block_clear (rtx dest, rtx length)
 
 namespace riscv_vector
 {
-/* Used by cpymemsi in riscv.md .  */
+struct stringop_info {
+  rtx avl;
+  bool need_loop;
+  machine_mode vmode;
+};
 
-bool
-expand_block_move (rtx dst_in, rtx src_in, rtx length_in, bool movmem_p)
-{
-  /*
-    memcpy:
-	mv a3, a0			# Copy destination
-    loop:
-	vsetvli t0, a2, e8, m8, ta, ma	# Vectors of 8b
-	vle8.v v0, (a1)			# Load bytes
-	add a1, a1, t0			# Bump pointer
-	sub a2, a2, t0			# Decrement count
-	vse8.v v0, (a3)			# Store bytes
-	add a3, a3, t0			# Bump pointer
-	bnez a2, loop			# Any more?
-	ret				# Return
-  */
-  gcc_assert (TARGET_VECTOR);
+/* If a vectorized stringop should be used populate INFO and return TRUE.
+   Otherwise return false and leave INFO unchanged.
 
-  HOST_WIDE_INT potential_ew
-    = (MIN (MIN (MEM_ALIGN (src_in), MEM_ALIGN (dst_in)), BITS_PER_WORD)
-       / BITS_PER_UNIT);
-  machine_mode vmode = VOIDmode;
+   MAX_EW is the maximum element width that the caller wants to use and
+   LENGTH_IN is the length of the stringop in bytes.
+*/
+
+static bool
+use_vector_stringop_p (struct stringop_info &info, HOST_WIDE_INT max_ew,
+		       rtx length_in)
+{
   bool need_loop = true;
-  bool size_p = optimize_function_for_size_p (cfun);
-  rtx src, dst;
-  rtx vec;
-  rtx length_rtx = length_in;
+  machine_mode vmode = VOIDmode;
+  /* The number of elements in the stringop.  */
+  rtx avl = length_in;
+  HOST_WIDE_INT potential_ew = max_ew;
+
+  if (!TARGET_VECTOR || !(stringop_strategy & STRATEGY_VECTOR))
+    return false;
 
   if (CONST_INT_P (length_in))
     {
@@ -1113,17 +1109,7 @@ expand_block_move (rtx dst_in, rtx src_in, rtx length_in, bool movmem_p)
	 for small element widths, we might allow larger element widths for
	 loops too.  */
       if (need_loop)
-	{
-	  if (movmem_p)
-	    /* Inlining general memmove is a pessimisation: we can't avoid
-	       having to decide which direction to go at runtime, which is
-	       costly in instruction count however for situations where the
-	       entire move fits in one vector operation we can do all reads
-	       before doing any writes so we don't have to worry so generate
-	       the inline vector code in such situations.  */
-	    return false;
-	  potential_ew = 1;
-	}
+	potential_ew = 1;
       for (; potential_ew; potential_ew >>= 1)
	{
	  scalar_int_mode elem_mode;
@@ -1193,7 +1179,7 @@ expand_block_move (rtx dst_in, rtx src_in, rtx length_in, bool movmem_p)
	  gcc_assert (potential_ew > 1);
	}
       if (potential_ew > 1)
-	length_rtx = GEN_INT (length / potential_ew);
+	avl = GEN_INT (length / potential_ew);
     }
   else
     {
@@ -1203,35 +1189,80 @@ expand_block_move (rtx dst_in, rtx src_in, rtx length_in, bool movmem_p)
   /* A memcpy libcall in the worst case takes 3 instructions to prepare the
      arguments + 1 for the call.  When RVV should take 7 instructions and
      we're optimizing for size a libcall may be preferable.  */
-  if (size_p && need_loop)
+  if (optimize_function_for_size_p (cfun) && need_loop)
     return false;
 
-  /* length_rtx holds the (remaining) length of the required copy.
+  info.need_loop = need_loop;
+  info.vmode = vmode;
+  info.avl = avl;
+  return true;
+}
+
+/* Used by cpymemsi in riscv.md .  */
+
+bool
+expand_block_move (rtx dst_in, rtx src_in, rtx length_in, bool movmem_p)
+{
+  /*
+    memcpy:
+	mv a3, a0			# Copy destination
+    loop:
+	vsetvli t0, a2, e8, m8, ta, ma	# Vectors of 8b
+	vle8.v v0, (a1)			# Load bytes
+	add a1, a1, t0			# Bump pointer
+	sub a2, a2, t0			# Decrement count
+	vse8.v v0, (a3)			# Store bytes
+	add a3, a3, t0			# Bump pointer
+	bnez a2, loop			# Any more?
+	ret				# Return
+  */
+  struct stringop_info info;
+
+  HOST_WIDE_INT potential_ew
+    = (MIN (MIN (MEM_ALIGN (src_in), MEM_ALIGN (dst_in)), BITS_PER_WORD)
+       / BITS_PER_UNIT);
+
+  if (!use_vector_stringop_p (info, potential_ew, length_in))
+    return false;
+
+  /* Inlining general memmove is a pessimisation: we can't avoid having to
+     decide which direction to go at runtime, which is costly in instruction
+     count however for situations where the entire move fits in one vector
+     operation we can do all reads before doing any writes so we don't have to
+     worry so generate the inline vector code in such situations.  */
+  if (info.need_loop && movmem_p)
+    return false;
+
+  rtx src, dst;
+  rtx vec;
+
+  /* avl holds the (remaining) length of the required copy.
      cnt holds the length we copy with the current load/store pair.  */
-  rtx cnt = length_rtx;
+  rtx cnt = info.avl;
   rtx label = NULL_RTX;
   rtx dst_addr = copy_addr_to_reg (XEXP (dst_in, 0));
   rtx src_addr = copy_addr_to_reg (XEXP (src_in, 0));
 
-  if (need_loop)
+  if (info.need_loop)
     {
-      length_rtx = copy_to_mode_reg (Pmode, length_rtx);
+      info.avl = copy_to_mode_reg (Pmode, info.avl);
       cnt = gen_reg_rtx (Pmode);
       label = gen_label_rtx ();
 
       emit_label (label);
-      emit_insn (riscv_vector::gen_no_side_effects_vsetvl_rtx (vmode, cnt,
-							       length_rtx));
+      emit_insn (riscv_vector::gen_no_side_effects_vsetvl_rtx (info.vmode, cnt,
+							       info.avl));
     }
 
-  vec = gen_reg_rtx (vmode);
-  src = change_address (src_in, vmode, src_addr);
-  dst = change_address (dst_in, vmode, dst_addr);
+  vec = gen_reg_rtx (info.vmode);
+  src = change_address (src_in, info.vmode, src_addr);
+  dst = change_address (dst_in, info.vmode, dst_addr);
 
   /* If we don't need a loop and have a suitable mode to describe the size,
      just do a load / store pair and leave it up to the later lazy code motion
      pass to insert the appropriate vsetvli.  */
-  if (!need_loop && known_eq (GET_MODE_SIZE (vmode), INTVAL (length_in)))
+  if (!info.need_loop
+      && known_eq (GET_MODE_SIZE (info.vmode), INTVAL (length_in)))
     {
       emit_move_insn (vec, src);
       emit_move_insn (dst, vec);
@@ -1239,26 +1270,26 @@ expand_block_move (rtx dst_in, rtx src_in, rtx length_in, bool movmem_p)
   else
     {
       machine_mode mask_mode = riscv_vector::get_vector_mode
-	(BImode, GET_MODE_NUNITS (vmode)).require ();
+	(BImode, GET_MODE_NUNITS (info.vmode)).require ();
       rtx mask = CONSTM1_RTX (mask_mode);
       if (!satisfies_constraint_K (cnt))
	cnt= force_reg (Pmode, cnt);
       rtx m_ops[] = {vec, mask, src};
-      emit_nonvlmax_insn (code_for_pred_mov (vmode),
+      emit_nonvlmax_insn (code_for_pred_mov (info.vmode),
			  riscv_vector::UNARY_OP_TAMA, m_ops, cnt);
-      emit_insn (gen_pred_store (vmode, dst, mask, vec, cnt,
+      emit_insn (gen_pred_store (info.vmode, dst, mask, vec, cnt,
				 get_avl_type_rtx (riscv_vector::NONVLMAX)));
     }
 
-  if (need_loop)
+  if (info.need_loop)
     {
       emit_insn (gen_rtx_SET (src_addr, gen_rtx_PLUS (Pmode, src_addr, cnt)));
       emit_insn (gen_rtx_SET (dst_addr, gen_rtx_PLUS (Pmode, dst_addr, cnt)));
-      emit_insn (gen_rtx_SET (length_rtx, gen_rtx_MINUS (Pmode, length_rtx, cnt)));
+      emit_insn (gen_rtx_SET (info.avl, gen_rtx_MINUS (Pmode, info.avl, cnt)));
 
       /* Emit the loop condition.  */
-      rtx test = gen_rtx_NE (VOIDmode, length_rtx, const0_rtx);
-      emit_jump_insn (gen_cbranch4 (Pmode, test, length_rtx, const0_rtx, label));
+      rtx test = gen_rtx_NE (VOIDmode, info.avl, const0_rtx);
+      emit_jump_insn (gen_cbranch4 (Pmode, test, info.avl, const0_rtx, label));
       emit_insn (gen_nop ());
     }