From patchwork Fri Oct 18 13:12:58 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Craig Blackmore
X-Patchwork-Id: 1999115
From: Craig Blackmore
To: gcc-patches@gcc.gnu.org
Cc: Craig Blackmore
Subject: [PATCH 5/7] RISC-V: Move vector memcpy decision making to separate function [NFC]
Date: Fri, 18 Oct 2024 14:12:58 +0100
Message-ID: <20241018131300.1150819-6-craig.blackmore@embecosm.com>
In-Reply-To: <20241018131300.1150819-1-craig.blackmore@embecosm.com>
References: <20241018131300.1150819-1-craig.blackmore@embecosm.com>

This moves the code for deciding whether to generate a
vectorized memcpy, which vector mode to use, and whether a loop is needed
out of riscv_vector::expand_block_move and into a new function,
riscv_vector::use_vector_stringop_p, so that it can be reused for other
string operations.

gcc/ChangeLog:

	* config/riscv/riscv-string.cc (struct stringop_info): New.
	(expand_block_move): Move decision making code to...
	(use_vector_stringop_p): ...here.
---
 gcc/config/riscv/riscv-string.cc | 143 +++++++++++++++++++------------
 1 file changed, 87 insertions(+), 56 deletions(-)

diff --git a/gcc/config/riscv/riscv-string.cc b/gcc/config/riscv/riscv-string.cc
index 64fd6b29092..118c02a4021 100644
--- a/gcc/config/riscv/riscv-string.cc
+++ b/gcc/config/riscv/riscv-string.cc
@@ -1051,35 +1051,31 @@ riscv_expand_block_clear (rtx dest, rtx length)
 
 namespace riscv_vector
 {
-/* Used by cpymemsi in riscv.md .  */
+struct stringop_info {
+  rtx avl;
+  bool need_loop;
+  machine_mode vmode;
+};
 
-bool
-expand_block_move (rtx dst_in, rtx src_in, rtx length_in, bool movmem_p)
-{
-  /*
-    memcpy:
-	mv a3, a0			# Copy destination
-    loop:
-	vsetvli t0, a2, e8, m8, ta, ma	# Vectors of 8b
-	vle8.v v0, (a1)			# Load bytes
-	add a1, a1, t0			# Bump pointer
-	sub a2, a2, t0			# Decrement count
-	vse8.v v0, (a3)			# Store bytes
-	add a3, a3, t0			# Bump pointer
-	bnez a2, loop			# Any more?
-	ret				# Return
-  */
-  gcc_assert (TARGET_VECTOR);
+/* If a vectorized stringop should be used populate INFO and return TRUE.
+   Otherwise return false and leave INFO unchanged.
 
-  HOST_WIDE_INT potential_ew
-    = (MIN (MIN (MEM_ALIGN (src_in), MEM_ALIGN (dst_in)), BITS_PER_WORD)
-       / BITS_PER_UNIT);
-  machine_mode vmode = VOIDmode;
+   MAX_EW is the maximum element width that the caller wants to use and
+   LENGTH_IN is the length of the stringop in bytes.
+*/
+
+static bool
+use_vector_stringop_p (struct stringop_info &info, HOST_WIDE_INT max_ew,
+		       rtx length_in)
+{
   bool need_loop = true;
-  bool size_p = optimize_function_for_size_p (cfun);
-  rtx src, dst;
-  rtx vec;
-  rtx length_rtx = length_in;
+  machine_mode vmode = VOIDmode;
+  /* The number of elements in the stringop.  */
+  rtx avl = length_in;
+  HOST_WIDE_INT potential_ew = max_ew;
+
+  if (!TARGET_VECTOR || !(stringop_strategy & STRATEGY_VECTOR))
+    return false;
 
   if (CONST_INT_P (length_in))
     {
@@ -1113,17 +1109,7 @@ expand_block_move (rtx dst_in, rtx src_in, rtx length_in, bool movmem_p)
	 for small element widths, we might allow larger element widths for
	 loops too.  */
       if (need_loop)
-	{
-	  if (movmem_p)
-	    /* Inlining general memmove is a pessimisation: we can't avoid
-	       having to decide which direction to go at runtime, which is
-	       costly in instruction count however for situations where the
-	       entire move fits in one vector operation we can do all reads
-	       before doing any writes so we don't have to worry so generate
-	       the inline vector code in such situations.  */
-	    return false;
-	  potential_ew = 1;
-	}
+	potential_ew = 1;
       for (; potential_ew; potential_ew >>= 1)
	{
	  scalar_int_mode elem_mode;
@@ -1193,7 +1179,7 @@ expand_block_move (rtx dst_in, rtx src_in, rtx length_in, bool movmem_p)
	  gcc_assert (potential_ew > 1);
	}
       if (potential_ew > 1)
-	length_rtx = GEN_INT (length / potential_ew);
+	avl = GEN_INT (length / potential_ew);
     }
   else
     {
@@ -1203,35 +1189,80 @@ expand_block_move (rtx dst_in, rtx src_in, rtx length_in, bool movmem_p)
   /* A memcpy libcall in the worst case takes 3 instructions to prepare the
      arguments + 1 for the call.  When RVV should take 7 instructions and
      we're optimizing for size a libcall may be preferable.  */
-  if (size_p && need_loop)
+  if (optimize_function_for_size_p (cfun) && need_loop)
     return false;
 
-  /* length_rtx holds the (remaining) length of the required copy.
+  info.need_loop = need_loop;
+  info.vmode = vmode;
+  info.avl = avl;
+  return true;
+}
+
+/* Used by cpymemsi in riscv.md .  */
+
+bool
+expand_block_move (rtx dst_in, rtx src_in, rtx length_in, bool movmem_p)
+{
+  /*
+    memcpy:
+	mv a3, a0			# Copy destination
+    loop:
+	vsetvli t0, a2, e8, m8, ta, ma	# Vectors of 8b
+	vle8.v v0, (a1)			# Load bytes
+	add a1, a1, t0			# Bump pointer
+	sub a2, a2, t0			# Decrement count
+	vse8.v v0, (a3)			# Store bytes
+	add a3, a3, t0			# Bump pointer
+	bnez a2, loop			# Any more?
+	ret				# Return
+  */
+  struct stringop_info info;
+
+  HOST_WIDE_INT potential_ew
+    = (MIN (MIN (MEM_ALIGN (src_in), MEM_ALIGN (dst_in)), BITS_PER_WORD)
+       / BITS_PER_UNIT);
+
+  if (!use_vector_stringop_p (info, potential_ew, length_in))
+    return false;
+
+  /* Inlining general memmove is a pessimisation: we can't avoid having to
+     decide which direction to go at runtime, which is costly in instruction
+     count however for situations where the entire move fits in one vector
+     operation we can do all reads before doing any writes so we don't have to
+     worry so generate the inline vector code in such situations.  */
+  if (info.need_loop && movmem_p)
+    return false;
+
+  rtx src, dst;
+  rtx vec;
+
+  /* avl holds the (remaining) length of the required copy.
      cnt holds the length we copy with the current load/store pair.  */
-  rtx cnt = length_rtx;
+  rtx cnt = info.avl;
   rtx label = NULL_RTX;
   rtx dst_addr = copy_addr_to_reg (XEXP (dst_in, 0));
   rtx src_addr = copy_addr_to_reg (XEXP (src_in, 0));
 
-  if (need_loop)
+  if (info.need_loop)
     {
-      length_rtx = copy_to_mode_reg (Pmode, length_rtx);
+      info.avl = copy_to_mode_reg (Pmode, info.avl);
       cnt = gen_reg_rtx (Pmode);
       label = gen_label_rtx ();
 
       emit_label (label);
-      emit_insn (riscv_vector::gen_no_side_effects_vsetvl_rtx (vmode, cnt,
-							       length_rtx));
+      emit_insn (riscv_vector::gen_no_side_effects_vsetvl_rtx (info.vmode, cnt,
+							       info.avl));
     }
 
-  vec = gen_reg_rtx (vmode);
-  src = change_address (src_in, vmode, src_addr);
-  dst = change_address (dst_in, vmode, dst_addr);
+  vec = gen_reg_rtx (info.vmode);
+  src = change_address (src_in, info.vmode, src_addr);
+  dst = change_address (dst_in, info.vmode, dst_addr);
 
   /* If we don't need a loop and have a suitable mode to describe the size,
      just do a load / store pair and leave it up to the later lazy code motion
      pass to insert the appropriate vsetvli.  */
-  if (!need_loop && known_eq (GET_MODE_SIZE (vmode), INTVAL (length_in)))
+  if (!info.need_loop
+      && known_eq (GET_MODE_SIZE (info.vmode), INTVAL (length_in)))
     {
       emit_move_insn (vec, src);
       emit_move_insn (dst, vec);
@@ -1239,26 +1270,26 @@ expand_block_move (rtx dst_in, rtx src_in, rtx length_in, bool movmem_p)
   else
     {
       machine_mode mask_mode = riscv_vector::get_vector_mode
-	(BImode, GET_MODE_NUNITS (vmode)).require ();
+	(BImode, GET_MODE_NUNITS (info.vmode)).require ();
       rtx mask = CONSTM1_RTX (mask_mode);
       if (!satisfies_constraint_K (cnt))
	cnt= force_reg (Pmode, cnt);
       rtx m_ops[] = {vec, mask, src};
-      emit_nonvlmax_insn (code_for_pred_mov (vmode),
+      emit_nonvlmax_insn (code_for_pred_mov (info.vmode),
			  riscv_vector::UNARY_OP_TAMA, m_ops, cnt);
-      emit_insn (gen_pred_store (vmode, dst, mask, vec, cnt,
+      emit_insn (gen_pred_store (info.vmode, dst, mask, vec, cnt,
				 get_avl_type_rtx (riscv_vector::NONVLMAX)));
     }
 
-  if (need_loop)
+  if (info.need_loop)
     {
       emit_insn (gen_rtx_SET (src_addr, gen_rtx_PLUS (Pmode, src_addr, cnt)));
       emit_insn (gen_rtx_SET (dst_addr, gen_rtx_PLUS (Pmode, dst_addr, cnt)));
-      emit_insn (gen_rtx_SET (length_rtx, gen_rtx_MINUS (Pmode, length_rtx, cnt)));
+      emit_insn (gen_rtx_SET (info.avl, gen_rtx_MINUS (Pmode, info.avl, cnt)));
 
       /* Emit the loop condition.  */
-      rtx test = gen_rtx_NE (VOIDmode, length_rtx, const0_rtx);
-      emit_jump_insn (gen_cbranch4 (Pmode, test, length_rtx, const0_rtx, label));
+      rtx test = gen_rtx_NE (VOIDmode, info.avl, const0_rtx);
+      emit_jump_insn (gen_cbranch4 (Pmode, test, info.avl, const0_rtx, label));
       emit_insn (gen_nop ());
     }