From patchwork Tue Nov 7 01:31:05 2017
X-Patchwork-Submitter: Palmer Dabbelt
X-Patchwork-Id: 835065
Subject: [PATCH] RISC-V: Implement movmemsi
Date: Mon, 6 Nov 2017 17:31:05 -0800
Message-Id: <20171107013105.15859-1-palmer@dabbelt.com>
Cc: patches@groups.riscv.org, Andrew Waterman
From: Palmer Dabbelt
To: gcc-patches@gcc.gnu.org

From: Andrew Waterman

Without this patch we don't get proper memcpy inlining on RISC-V targets,
which is particularly disastrous for Dhrystone performance on RV32IM
systems.

gcc/ChangeLog

2017-11-06  Andrew Waterman

        * config/riscv/riscv-protos.h (riscv_hard_regno_nregs): New
        prototype.
        (riscv_expand_block_move): Likewise.
        * config/riscv/riscv.h (MOVE_RATIO): Tune cost to the movmemsi
        implementation.
        (RISCV_MAX_MOVE_BYTES_PER_LOOP_ITER): New define.
        (RISCV_MAX_MOVE_BYTES_STRAIGHT): New define.
        * config/riscv/riscv.c (riscv_block_move_straight): New function.
        (riscv_adjust_block_mem): Likewise.
        (riscv_block_move_loop): Likewise.
        (riscv_expand_block_move): Likewise.
        * config/riscv/riscv.md (movmemsi): New pattern.
---
 gcc/config/riscv/riscv-protos.h |   4 +-
 gcc/config/riscv/riscv.c        | 156 ++++++++++++++++++++++++++++++++++++++++
 gcc/config/riscv/riscv.h        |  21 +++++-
 gcc/config/riscv/riscv.md       |  13 ++++
 4 files changed, 190 insertions(+), 4 deletions(-)

diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index ae551fb39775..34f9859928e2 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -67,7 +67,9 @@ extern HOST_WIDE_INT riscv_initial_elimination_offset (int, int);
 extern void riscv_expand_prologue (void);
 extern void riscv_expand_epilogue (bool);
 extern bool riscv_can_use_return_insn (void);
-extern rtx riscv_function_value (const_tree, const_tree, machine_mode);
+extern rtx riscv_function_value (const_tree, const_tree, enum machine_mode);
+extern unsigned int riscv_hard_regno_nregs (int, enum machine_mode);
+extern bool riscv_expand_block_move (rtx, rtx, rtx);
 
 /* Routines implemented in riscv-c.c.  */
 void riscv_cpu_cpp_builtins (cpp_reader *);
diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c
index 4a16a75fbafa..e9783e920ef6 100644
--- a/gcc/config/riscv/riscv.c
+++ b/gcc/config/riscv/riscv.c
@@ -2642,6 +2642,162 @@ riscv_legitimize_call_address (rtx addr)
   return addr;
 }
 
+/* Emit straight-line code to move LENGTH bytes from SRC to DEST.
+   Assume that the areas do not overlap.  */
+
+static void
+riscv_block_move_straight (rtx dest, rtx src, HOST_WIDE_INT length)
+{
+  HOST_WIDE_INT offset, delta;
+  unsigned HOST_WIDE_INT bits;
+  int i;
+  enum machine_mode mode;
+  rtx *regs;
+
+  bits = MAX (BITS_PER_UNIT,
+              MIN (BITS_PER_WORD, MIN (MEM_ALIGN (src), MEM_ALIGN (dest))));
+
+  mode = mode_for_size (bits, MODE_INT, 0);
+  delta = bits / BITS_PER_UNIT;
+
+  /* Allocate a buffer for the temporary registers.  */
+  regs = XALLOCAVEC (rtx, length / delta);
+
+  /* Load as many BITS-sized chunks as possible.  Use a normal load if
+     the source has enough alignment, otherwise use left/right pairs.  */
+  for (offset = 0, i = 0; offset + delta <= length; offset += delta, i++)
+    {
+      regs[i] = gen_reg_rtx (mode);
+      riscv_emit_move (regs[i], adjust_address (src, mode, offset));
+    }
+
+  /* Copy the chunks to the destination.  */
+  for (offset = 0, i = 0; offset + delta <= length; offset += delta, i++)
+    riscv_emit_move (adjust_address (dest, mode, offset), regs[i]);
+
+  /* Mop up any left-over bytes.  */
+  if (offset < length)
+    {
+      src = adjust_address (src, BLKmode, offset);
+      dest = adjust_address (dest, BLKmode, offset);
+      move_by_pieces (dest, src, length - offset,
+                      MIN (MEM_ALIGN (src), MEM_ALIGN (dest)), 0);
+    }
+}
+
+/* Helper function for doing a loop-based block operation on memory
+   reference MEM.  Each iteration of the loop will operate on LENGTH
+   bytes of MEM.
+
+   Create a new base register for use within the loop and point it to
+   the start of MEM.  Create a new memory reference that uses this
+   register.  Store them in *LOOP_REG and *LOOP_MEM respectively.  */
+
+static void
+riscv_adjust_block_mem (rtx mem, HOST_WIDE_INT length,
+                        rtx *loop_reg, rtx *loop_mem)
+{
+  *loop_reg = copy_addr_to_reg (XEXP (mem, 0));
+
+  /* Although the new mem does not refer to a known location,
+     it does keep up to LENGTH bytes of alignment.  */
+  *loop_mem = change_address (mem, BLKmode, *loop_reg);
+  set_mem_align (*loop_mem, MIN (MEM_ALIGN (mem), length * BITS_PER_UNIT));
+}
+
+/* Move LENGTH bytes from SRC to DEST using a loop that moves BYTES_PER_ITER
+   bytes at a time.  LENGTH must be at least BYTES_PER_ITER.  Assume that
+   the memory regions do not overlap.  */
+
+static void
+riscv_block_move_loop (rtx dest, rtx src, HOST_WIDE_INT length,
+                       HOST_WIDE_INT bytes_per_iter)
+{
+  rtx label, src_reg, dest_reg, final_src, test;
+  HOST_WIDE_INT leftover;
+
+  leftover = length % bytes_per_iter;
+  length -= leftover;
+
+  /* Create registers and memory references for use within the loop.  */
+  riscv_adjust_block_mem (src, bytes_per_iter, &src_reg, &src);
+  riscv_adjust_block_mem (dest, bytes_per_iter, &dest_reg, &dest);
+
+  /* Calculate the value that SRC_REG should have after the last iteration
+     of the loop.  */
+  final_src = expand_simple_binop (Pmode, PLUS, src_reg, GEN_INT (length),
+                                   0, 0, OPTAB_WIDEN);
+
+  /* Emit the start of the loop.  */
+  label = gen_label_rtx ();
+  emit_label (label);
+
+  /* Emit the loop body.  */
+  riscv_block_move_straight (dest, src, bytes_per_iter);
+
+  /* Move on to the next block.  */
+  riscv_emit_move (src_reg, plus_constant (Pmode, src_reg, bytes_per_iter));
+  riscv_emit_move (dest_reg, plus_constant (Pmode, dest_reg, bytes_per_iter));
+
+  /* Emit the loop condition.  */
+  test = gen_rtx_NE (VOIDmode, src_reg, final_src);
+  if (Pmode == DImode)
+    emit_jump_insn (gen_cbranchdi4 (test, src_reg, final_src, label));
+  else
+    emit_jump_insn (gen_cbranchsi4 (test, src_reg, final_src, label));
+
+  /* Mop up any left-over bytes.  */
+  if (leftover)
+    riscv_block_move_straight (dest, src, leftover);
+  else
+    emit_insn (gen_nop ());
+}
+
+/* Expand a movmemsi instruction, which copies LENGTH bytes from
+   memory reference SRC to memory reference DEST.  */
+
+bool
+riscv_expand_block_move (rtx dest, rtx src, rtx length)
+{
+  if (CONST_INT_P (length))
+    {
+      HOST_WIDE_INT factor, align;
+
+      align = MIN (MIN (MEM_ALIGN (src), MEM_ALIGN (dest)), BITS_PER_WORD);
+      factor = BITS_PER_WORD / align;
+
+      if (optimize_function_for_size_p (cfun)
+          && INTVAL (length) * factor * UNITS_PER_WORD > MOVE_RATIO (false))
+        return false;
+
+      if (INTVAL (length) <= RISCV_MAX_MOVE_BYTES_STRAIGHT / factor)
+        {
+          riscv_block_move_straight (dest, src, INTVAL (length));
+          return true;
+        }
+      else if (optimize && align >= BITS_PER_WORD)
+        {
+          unsigned min_iter_words
+            = RISCV_MAX_MOVE_BYTES_PER_LOOP_ITER / UNITS_PER_WORD;
+          unsigned iter_words = min_iter_words;
+          HOST_WIDE_INT bytes = INTVAL (length), words = bytes / UNITS_PER_WORD;
+
+          /* Lengthen the loop body if it shortens the tail.  */
+          for (unsigned i = min_iter_words; i < min_iter_words * 2 - 1; i++)
+            {
+              unsigned cur_cost = iter_words + words % iter_words;
+              unsigned new_cost = i + words % i;
+              if (new_cost <= cur_cost)
+                iter_words = i;
+            }
+
+          riscv_block_move_loop (dest, src, bytes, iter_words * UNITS_PER_WORD);
+          return true;
+        }
+    }
+  return false;
+}
+
 /* Print symbolic operand OP, which is part of a HIGH or LO_SUM
    in context CONTEXT.  HI_RELOC indicates a high-part reloc.  */
 
diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
index a802a3f8cbbb..c0901a093033 100644
--- a/gcc/config/riscv/riscv.h
+++ b/gcc/config/riscv/riscv.h
@@ -808,10 +808,25 @@ while (0)
 #undef PTRDIFF_TYPE
 #define PTRDIFF_TYPE (POINTER_SIZE == 64 ? "long int" : "int")
 
-/* If a memory-to-memory move would take MOVE_RATIO or more simple
-   move-instruction pairs, we will do a movmem or libcall instead.  */
+/* The maximum number of bytes copied by one iteration of a movmemsi loop.  */
+
+#define RISCV_MAX_MOVE_BYTES_PER_LOOP_ITER (UNITS_PER_WORD * 4)
+
+/* The maximum number of bytes that can be copied by a straight-line
+   movmemsi implementation.  */
 
-#define MOVE_RATIO(speed) (CLEAR_RATIO (speed) / 2)
+#define RISCV_MAX_MOVE_BYTES_STRAIGHT (RISCV_MAX_MOVE_BYTES_PER_LOOP_ITER * 3)
+
+/* If a memory-to-memory move would take MOVE_RATIO or more simple
+   move-instruction pairs, we will do a movmem or libcall instead.
+   Do not use move_by_pieces at all when strict alignment is not
+   in effect but the target has slow unaligned accesses; in this
+   case, movmem or libcall is more efficient.  */
+
+#define MOVE_RATIO(speed)                                              \
+  (!STRICT_ALIGNMENT && riscv_slow_unaligned_access ? 1 :              \
+   (speed) ? RISCV_MAX_MOVE_BYTES_PER_LOOP_ITER / UNITS_PER_WORD :     \
+   CLEAR_RATIO (speed) / 2)
 
 /* For CLEAR_RATIO, when optimizing for size, give a better estimate
    of the length of a memset call, but use the default otherwise.  */
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 53e1db97db7d..814ff6ec6ad7 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -1436,6 +1436,19 @@
   DONE;
 })
 
+(define_expand "movmemsi"
+  [(parallel [(set (match_operand:BLK 0 "general_operand")
+                   (match_operand:BLK 1 "general_operand"))
+              (use (match_operand:SI 2 ""))
+              (use (match_operand:SI 3 "const_int_operand"))])]
+  ""
+{
+  if (riscv_expand_block_move (operands[0], operands[1], operands[2]))
+    DONE;
+  else
+    FAIL;
+})
+
 ;; Expand in-line code to clear the instruction cache between operand[0] and
 ;; operand[1].
 (define_expand "clear_cache"