From patchwork Tue Nov 7 01:31:05 2017
X-Patchwork-Submitter: Palmer Dabbelt
X-Patchwork-Id: 835065
Subject: [PATCH] RISC-V: Implement movmemsi
Date: Mon, 6 Nov 2017 17:31:05 -0800
Message-Id: <20171107013105.15859-1-palmer@dabbelt.com>
Cc: patches@groups.riscv.org, Andrew Waterman
From: Palmer Dabbelt
To: gcc-patches@gcc.gnu.org

From: Andrew Waterman

Without this patch we don't get proper memcpy inlining on RISC-V targets,
which is particularly disastrous for Dhrystone performance on RV32IM
systems.

gcc/ChangeLog

2017-11-06  Andrew Waterman

        * config/riscv/riscv-protos.h (riscv_hard_regno_nregs): New
        prototype.
        (riscv_expand_block_move): Likewise.
        * config/riscv/riscv.h (MOVE_RATIO): Tune cost to the movmemsi
        implementation.
        (RISCV_MAX_MOVE_BYTES_PER_LOOP_ITER): New define.
        (RISCV_MAX_MOVE_BYTES_STRAIGHT): New define.
        * config/riscv/riscv.c (riscv_block_move_straight): New function.
        (riscv_adjust_block_mem): Likewise.
        (riscv_block_move_loop): Likewise.
        (riscv_expand_block_move): Likewise.
        * config/riscv/riscv.md (movmemsi): New pattern.
---
 gcc/config/riscv/riscv-protos.h |   4 +-
 gcc/config/riscv/riscv.c        | 156 ++++++++++++++++++++++++++++++++++++++++
 gcc/config/riscv/riscv.h        |  21 +++++-
 gcc/config/riscv/riscv.md       |  13 ++++
 4 files changed, 190 insertions(+), 4 deletions(-)

diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index ae551fb39775..34f9859928e2 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -67,7 +67,9 @@ extern HOST_WIDE_INT riscv_initial_elimination_offset (int, int);
 extern void riscv_expand_prologue (void);
 extern void riscv_expand_epilogue (bool);
 extern bool riscv_can_use_return_insn (void);
-extern rtx riscv_function_value (const_tree, const_tree, machine_mode);
+extern rtx riscv_function_value (const_tree, const_tree, enum machine_mode);
+extern unsigned int riscv_hard_regno_nregs (int, enum machine_mode);
+extern bool riscv_expand_block_move (rtx, rtx, rtx);
 
 /* Routines implemented in riscv-c.c.  */
 void riscv_cpu_cpp_builtins (cpp_reader *);
diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c
index 4a16a75fbafa..e9783e920ef6 100644
--- a/gcc/config/riscv/riscv.c
+++ b/gcc/config/riscv/riscv.c
@@ -2642,6 +2642,162 @@ riscv_legitimize_call_address (rtx addr)
   return addr;
 }
 
+/* Emit straight-line code to move LENGTH bytes from SRC to DEST.
+   Assume that the areas do not overlap.  */
+
+static void
+riscv_block_move_straight (rtx dest, rtx src, HOST_WIDE_INT length)
+{
+  HOST_WIDE_INT offset, delta;
+  unsigned HOST_WIDE_INT bits;
+  int i;
+  enum machine_mode mode;
+  rtx *regs;
+
+  bits = MAX (BITS_PER_UNIT,
+              MIN (BITS_PER_WORD, MIN (MEM_ALIGN (src), MEM_ALIGN (dest))));
+
+  mode = mode_for_size (bits, MODE_INT, 0);
+  delta = bits / BITS_PER_UNIT;
+
+  /* Allocate a buffer for the temporary registers.  */
+  regs = XALLOCAVEC (rtx, length / delta);
+
+  /* Load as many BITS-sized chunks as possible.  Use a normal load if
+     the source has enough alignment, otherwise use left/right pairs.  */
+  for (offset = 0, i = 0; offset + delta <= length; offset += delta, i++)
+    {
+      regs[i] = gen_reg_rtx (mode);
+      riscv_emit_move (regs[i], adjust_address (src, mode, offset));
+    }
+
+  /* Copy the chunks to the destination.  */
+  for (offset = 0, i = 0; offset + delta <= length; offset += delta, i++)
+    riscv_emit_move (adjust_address (dest, mode, offset), regs[i]);
+
+  /* Mop up any left-over bytes.  */
+  if (offset < length)
+    {
+      src = adjust_address (src, BLKmode, offset);
+      dest = adjust_address (dest, BLKmode, offset);
+      move_by_pieces (dest, src, length - offset,
+                      MIN (MEM_ALIGN (src), MEM_ALIGN (dest)), 0);
+    }
+}
+
+/* Helper function for doing a loop-based block operation on memory
+   reference MEM.  Each iteration of the loop will operate on LENGTH
+   bytes of MEM.
+
+   Create a new base register for use within the loop and point it to
+   the start of MEM.  Create a new memory reference that uses this
+   register.  Store them in *LOOP_REG and *LOOP_MEM respectively.  */
+
+static void
+riscv_adjust_block_mem (rtx mem, HOST_WIDE_INT length,
+                        rtx *loop_reg, rtx *loop_mem)
+{
+  *loop_reg = copy_addr_to_reg (XEXP (mem, 0));
+
+  /* Although the new mem does not refer to a known location,
+     it does keep up to LENGTH bytes of alignment.  */
+  *loop_mem = change_address (mem, BLKmode, *loop_reg);
+  set_mem_align (*loop_mem, MIN (MEM_ALIGN (mem), length * BITS_PER_UNIT));
+}
+
+/* Move LENGTH bytes from SRC to DEST using a loop that moves BYTES_PER_ITER
+   bytes at a time.  LENGTH must be at least BYTES_PER_ITER.  Assume that
+   the memory regions do not overlap.  */
+
+static void
+riscv_block_move_loop (rtx dest, rtx src, HOST_WIDE_INT length,
+                       HOST_WIDE_INT bytes_per_iter)
+{
+  rtx label, src_reg, dest_reg, final_src, test;
+  HOST_WIDE_INT leftover;
+
+  leftover = length % bytes_per_iter;
+  length -= leftover;
+
+  /* Create registers and memory references for use within the loop.  */
+  riscv_adjust_block_mem (src, bytes_per_iter, &src_reg, &src);
+  riscv_adjust_block_mem (dest, bytes_per_iter, &dest_reg, &dest);
+
+  /* Calculate the value that SRC_REG should have after the last iteration
+     of the loop.  */
+  final_src = expand_simple_binop (Pmode, PLUS, src_reg, GEN_INT (length),
+                                   0, 0, OPTAB_WIDEN);
+
+  /* Emit the start of the loop.  */
+  label = gen_label_rtx ();
+  emit_label (label);
+
+  /* Emit the loop body.  */
+  riscv_block_move_straight (dest, src, bytes_per_iter);
+
+  /* Move on to the next block.  */
+  riscv_emit_move (src_reg, plus_constant (Pmode, src_reg, bytes_per_iter));
+  riscv_emit_move (dest_reg, plus_constant (Pmode, dest_reg, bytes_per_iter));
+
+  /* Emit the loop condition.  */
+  test = gen_rtx_NE (VOIDmode, src_reg, final_src);
+  if (Pmode == DImode)
+    emit_jump_insn (gen_cbranchdi4 (test, src_reg, final_src, label));
+  else
+    emit_jump_insn (gen_cbranchsi4 (test, src_reg, final_src, label));
+
+  /* Mop up any left-over bytes.  */
+  if (leftover)
+    riscv_block_move_straight (dest, src, leftover);
+  else
+    emit_insn (gen_nop ());
+}
+
+/* Expand a movmemsi instruction, which copies LENGTH bytes from
+   memory reference SRC to memory reference DEST.  */
+
+bool
+riscv_expand_block_move (rtx dest, rtx src, rtx length)
+{
+  if (CONST_INT_P (length))
+    {
+      HOST_WIDE_INT factor, align;
+
+      align = MIN (MIN (MEM_ALIGN (src), MEM_ALIGN (dest)), BITS_PER_WORD);
+      factor = BITS_PER_WORD / align;
+
+      if (optimize_function_for_size_p (cfun)
+          && INTVAL (length) * factor * UNITS_PER_WORD > MOVE_RATIO (false))
+        return false;
+
+      if (INTVAL (length) <= RISCV_MAX_MOVE_BYTES_STRAIGHT / factor)
+        {
+          riscv_block_move_straight (dest, src, INTVAL (length));
+          return true;
+        }
+      else if (optimize && align >= BITS_PER_WORD)
+        {
+          unsigned min_iter_words
+            = RISCV_MAX_MOVE_BYTES_PER_LOOP_ITER / UNITS_PER_WORD;
+          unsigned iter_words = min_iter_words;
+          HOST_WIDE_INT bytes = INTVAL (length), words = bytes / UNITS_PER_WORD;
+
+          /* Lengthen the loop body if it shortens the tail.  */
+          for (unsigned i = min_iter_words; i < min_iter_words * 2 - 1; i++)
+            {
+              unsigned cur_cost = iter_words + words % iter_words;
+              unsigned new_cost = i + words % i;
+              if (new_cost <= cur_cost)
+                iter_words = i;
+            }
+
+          riscv_block_move_loop (dest, src, bytes, iter_words * UNITS_PER_WORD);
+          return true;
+        }
+    }
+  return false;
+}
+
 /* Print symbolic operand OP, which is part of a HIGH or LO_SUM
    in context CONTEXT.  HI_RELOC indicates a high-part reloc.  */
 
diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
index a802a3f8cbbb..c0901a093033 100644
--- a/gcc/config/riscv/riscv.h
+++ b/gcc/config/riscv/riscv.h
@@ -808,10 +808,25 @@ while (0)
 #undef PTRDIFF_TYPE
 #define PTRDIFF_TYPE (POINTER_SIZE == 64 ? "long int" : "int")
 
-/* If a memory-to-memory move would take MOVE_RATIO or more simple
-   move-instruction pairs, we will do a movmem or libcall instead.  */
+/* The maximum number of bytes copied by one iteration of a movmemsi loop.  */
+
+#define RISCV_MAX_MOVE_BYTES_PER_LOOP_ITER (UNITS_PER_WORD * 4)
+
+/* The maximum number of bytes that can be copied by a straight-line
+   movmemsi implementation.  */
 
-#define MOVE_RATIO(speed) (CLEAR_RATIO (speed) / 2)
+#define RISCV_MAX_MOVE_BYTES_STRAIGHT (RISCV_MAX_MOVE_BYTES_PER_LOOP_ITER * 3)
+
+/* If a memory-to-memory move would take MOVE_RATIO or more simple
+   move-instruction pairs, we will do a movmem or libcall instead.
+   Do not use move_by_pieces at all when strict alignment is not
+   in effect but the target has slow unaligned accesses; in this
+   case, movmem or libcall is more efficient.  */
+
+#define MOVE_RATIO(speed)                                              \
+  (!STRICT_ALIGNMENT && riscv_slow_unaligned_access ? 1 :              \
+   (speed) ? RISCV_MAX_MOVE_BYTES_PER_LOOP_ITER / UNITS_PER_WORD :     \
+   CLEAR_RATIO (speed) / 2)
 
 /* For CLEAR_RATIO, when optimizing for size, give a better estimate
    of the length of a memset call, but use the default otherwise.  */
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 53e1db97db7d..814ff6ec6ad7 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -1436,6 +1436,19 @@
   DONE;
 })
 
+(define_expand "movmemsi"
+  [(parallel [(set (match_operand:BLK 0 "general_operand")
+                   (match_operand:BLK 1 "general_operand"))
+              (use (match_operand:SI 2 ""))
+              (use (match_operand:SI 3 "const_int_operand"))])]
+  ""
+{
+  if (riscv_expand_block_move (operands[0], operands[1], operands[2]))
+    DONE;
+  else
+    FAIL;
+})
+
 ;; Expand in-line code to clear the instruction cache between operand[0] and
 ;; operand[1].
 (define_expand "clear_cache"