From patchwork Mon Oct 16 20:04:47 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Jeff Law X-Patchwork-Id: 1849539 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=ventanamicro.com header.i=@ventanamicro.com header.a=rsa-sha256 header.s=google header.b=ItptD5wt; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4S8SlC3B2vz1ypX for ; Tue, 17 Oct 2023 07:05:11 +1100 (AEDT) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 46A193857BA4 for ; Mon, 16 Oct 2023 20:05:09 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mail-pf1-x42f.google.com (mail-pf1-x42f.google.com [IPv6:2607:f8b0:4864:20::42f]) by sourceware.org (Postfix) with ESMTPS id 424663858C2A for ; Mon, 16 Oct 2023 20:04:55 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 424663858C2A Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=ventanamicro.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=ventanamicro.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 424663858C2A Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2607:f8b0:4864:20::42f ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1697486697; cv=none; b=Y8M/KrH/h8ELtO1gyeyjnzs7JYeBPYXAZir4t2UkiM0URVOBr47a4qhZJwT3+6ANwRd6uDSaRCr84GdYvpK7rJwxBMWrh1FRiBLj+RGgvcz3YXH7OyMBxJL0L5drzvN1DQKavoBT/o5DWpddyG/hF8GS66pBWkQ0iLfV2UMJUpc= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1697486697; c=relaxed/simple; bh=y0kVlz2PLBSovdXQ5+LiFBEjXx6mZVmiLF8YCpVVIx8=; h=DKIM-Signature:Message-ID:Date:MIME-Version:To:From:Subject; b=PvCv8BaYhgKEjHmzE8DrAxFGTBhkPK109XNKJECXLR/H7feJ6mRK0cJp+tI4asU2Z3h1j+KI+mmue+mtXW8zRZ+pdYfiL97LkdghzzVJs7Zy+91TaAmHoIU86NKhp5f7L5HYk9mTa4No4R6SOVXZUrEmuVK6JsQPA8+w2b8VJ3s= ARC-Authentication-Results: i=1; server2.sourceware.org Received: by mail-pf1-x42f.google.com with SMTP id d2e1a72fcca58-6b1d1099a84so3299166b3a.1 for ; Mon, 16 Oct 2023 13:04:55 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ventanamicro.com; s=google; t=1697486694; x=1698091494; darn=gcc.gnu.org; h=subject:from:to:content-language:user-agent:mime-version:date :message-id:from:to:cc:subject:date:message-id:reply-to; bh=HKhOfAnUqlSAMzLfX3eBXBooIUrcMvs0obQNcQb30WQ=; b=ItptD5wt9AcjJdoms1qIGbBwocPWxpI9ftdCjcBjBk43EKLCIuNQ/wwwYrvIj8FE0a n414cWQlx88WhVAI53bMrWI8laD00pT+H7Pkd9aDJDZFI7x3sKY4JNq3Akh1HN7YkpC/ 6DW7jQVF/ZnvqFxwYt0R5RpCYxMe0uKZfd7MlQWHiVD1It2znmiLJ9s8lBU/oPOHKSHu dKHWBKPLm/CcoU2++t4B9ZXDhrjVJG8YodSGdKKwfJyXFpkVr8GDc3Dyr2uSNPtY58yA W4RJ5Zgitg2uel1AVJvGBMakF2o5mJ5BXNfNJwHvmkZQqg6jIXiOy/yVhXRPbJs8dqLc m5IQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1697486694; x=1698091494; h=subject:from:to:content-language:user-agent:mime-version:date :message-id:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=HKhOfAnUqlSAMzLfX3eBXBooIUrcMvs0obQNcQb30WQ=; b=KHz8akHtidYBhNJ+l+JstWlHDfSGv6CFM1bK9oHVdu1Ny7J99soRRSlQcuC5bj6d+K MI9XxqhDmP5QBTnCoP3t1pY3EfUx8lRXkB6PiKW2QLCN65gHT/+wjlvKDHI3KAmzaYK1 Hvxpm1I8coBPpUoqL6bMAj+S6Cn+HbYAShdo6MksjOmWKD86lMX/obHIJRrdODJ0nIgY TVMdH2XKrvQaYKEq2451F9TodDGg1fMeY6gNKvttlqBCTWG9iX7bTIygD/NKh1NV6jtO 7/tomJDDLaijWpjSyJQRvT1OyYgKoOp38ijYJRYSkoAX2UGeJgqDO/gQHLw7eKuXRXWC TL8A== X-Gm-Message-State: AOJu0YzN4jRfOut3iohBf5rchE4owGQLv3QFsaSte50V3pEbcP86TrJt XCDkoUzUUdrgnc2eQuO9lLsigW6a/WDdHT1SP9A= X-Google-Smtp-Source: AGHT+IGuvmfbPELPrSNxIOlsarfgGLLj3us94WjgtCmwWX9uyqaTKowff9Eu9DUlBeyDIqW0MJogsQ== X-Received: by 2002:a05:6a20:258b:b0:154:c836:9ed5 with SMTP id k11-20020a056a20258b00b00154c8369ed5mr75971pzd.17.1697486693868; Mon, 16 Oct 2023 13:04:53 -0700 (PDT) Received: from [172.31.0.109] ([136.36.130.248]) by smtp.gmail.com with ESMTPSA id az7-20020a17090b028700b0027360359b70sm5027448pjb.48.2023.10.16.13.04.50 for (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Mon, 16 Oct 2023 13:04:53 -0700 (PDT) Message-ID: <26b66674-778a-49ad-bbf3-d25446b35814@ventanamicro.com> Date: Mon, 16 Oct 2023 14:04:47 -0600 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Content-Language: en-US To: "gcc-patches@gcc.gnu.org" From: Jeff Law Subject: [committed] RISC-V: NFC: Move scalar block move expansion code into riscv-string.cc X-Spam-Status: No, score=-11.1 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org This just moves a few functions out of riscv.cc into riscv-string.cc in an attempt to keep riscv.cc manageable. This was originally Christoph's code and I'm just pushing it on his behalf. Full disclosure: I built rv64gc after changing to verify everything still builds. Given it was just lifting code from one place to another, I didn't run the testsuite. Jeff commit 328745607c5d403a1c7b6bc2ecaa1574ee42122f Author: Christoph Müllner Date: Mon Oct 16 13:57:43 2023 -0600 RISC-V: NFC: Move scalar block move expansion code into riscv-string.cc This just moves a few functions out of riscv.cc into riscv-string.cc in an attempt to keep riscv.cc manageable. This was originally Christoph's code and I'm just pushing it on his behalf. Full disclosure: I built rv64gc after changing to verify everything still builds. Given it was just lifting code from one place to another, I didn't run the testsuite. gcc/ * config/riscv/riscv-protos.h (emit_block_move): Remove redundant prototype. Improve comment. * config/riscv/riscv.cc (riscv_block_move_straight): Move from riscv.cc into riscv-string.cc. (riscv_adjust_block_mem, riscv_block_move_loop): Likewise. (riscv_expand_block_move): Likewise. * config/riscv/riscv-string.cc (riscv_block_move_straight): Add moved function. (riscv_adjust_block_mem, riscv_block_move_loop): Likewise. (riscv_expand_block_move): Likewise. diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h index 49bdcdf2f93..6190faab501 100644 --- a/gcc/config/riscv/riscv-protos.h +++ b/gcc/config/riscv/riscv-protos.h @@ -117,7 +117,6 @@ extern rtx riscv_emit_binary (enum rtx_code code, rtx dest, rtx x, rtx y); extern bool riscv_expand_conditional_move (rtx, rtx, rtx, rtx); extern rtx riscv_legitimize_call_address (rtx); extern void riscv_set_return_address (rtx, rtx); -extern bool riscv_expand_block_move (rtx, rtx, rtx); extern rtx riscv_return_addr (int, rtx); extern poly_int64 riscv_initial_elimination_offset (int, int); extern void riscv_expand_prologue (void); @@ -125,7 +124,6 @@ extern void riscv_expand_epilogue (int); extern bool riscv_epilogue_uses (unsigned int); extern bool riscv_can_use_return_insn (void); extern rtx riscv_function_value (const_tree, const_tree, enum machine_mode); -extern bool riscv_expand_block_move (rtx, rtx, rtx); extern bool riscv_store_data_bypass_p (rtx_insn *, rtx_insn *); extern rtx riscv_gen_gpr_save_insn (struct riscv_frame_info *); extern bool riscv_gpr_save_operation_p (rtx); @@ -160,6 +158,9 @@ extern bool riscv_hard_regno_rename_ok (unsigned, unsigned); rtl_opt_pass * make_pass_shorten_memrefs (gcc::context *ctxt); rtl_opt_pass * make_pass_vsetvl (gcc::context *ctxt); +/* Routines implemented in riscv-string.c. */ +extern bool riscv_expand_block_move (rtx, rtx, rtx); + /* Information about one CPU we know about. */ struct riscv_cpu_info { /* This CPU's canonical name. */ diff --git a/gcc/config/riscv/riscv-string.cc b/gcc/config/riscv/riscv-string.cc index 2bdff0374e8..0b4606aa7b2 100644 --- a/gcc/config/riscv/riscv-string.cc +++ b/gcc/config/riscv/riscv-string.cc @@ -592,3 +592,158 @@ riscv_expand_strlen (rtx result, rtx src, rtx search_char, rtx align) return false; } + +/* Emit straight-line code to move LENGTH bytes from SRC to DEST. + Assume that the areas do not overlap. */ + +static void +riscv_block_move_straight (rtx dest, rtx src, unsigned HOST_WIDE_INT length) +{ + unsigned HOST_WIDE_INT offset, delta; + unsigned HOST_WIDE_INT bits; + int i; + enum machine_mode mode; + rtx *regs; + + bits = MAX (BITS_PER_UNIT, + MIN (BITS_PER_WORD, MIN (MEM_ALIGN (src), MEM_ALIGN (dest)))); + + mode = mode_for_size (bits, MODE_INT, 0).require (); + delta = bits / BITS_PER_UNIT; + + /* Allocate a buffer for the temporary registers. */ + regs = XALLOCAVEC (rtx, length / delta); + + /* Load as many BITS-sized chunks as possible. Use a normal load if + the source has enough alignment, otherwise use left/right pairs. */ + for (offset = 0, i = 0; offset + delta <= length; offset += delta, i++) + { + regs[i] = gen_reg_rtx (mode); + riscv_emit_move (regs[i], adjust_address (src, mode, offset)); + } + + /* Copy the chunks to the destination. */ + for (offset = 0, i = 0; offset + delta <= length; offset += delta, i++) + riscv_emit_move (adjust_address (dest, mode, offset), regs[i]); + + /* Mop up any left-over bytes. */ + if (offset < length) + { + src = adjust_address (src, BLKmode, offset); + dest = adjust_address (dest, BLKmode, offset); + move_by_pieces (dest, src, length - offset, + MIN (MEM_ALIGN (src), MEM_ALIGN (dest)), RETURN_BEGIN); + } +} + +/* Helper function for doing a loop-based block operation on memory + reference MEM. Each iteration of the loop will operate on LENGTH + bytes of MEM. + + Create a new base register for use within the loop and point it to + the start of MEM. Create a new memory reference that uses this + register. Store them in *LOOP_REG and *LOOP_MEM respectively. */ + +static void +riscv_adjust_block_mem (rtx mem, unsigned HOST_WIDE_INT length, + rtx *loop_reg, rtx *loop_mem) +{ + *loop_reg = copy_addr_to_reg (XEXP (mem, 0)); + + /* Although the new mem does not refer to a known location, + it does keep up to LENGTH bytes of alignment. */ + *loop_mem = change_address (mem, BLKmode, *loop_reg); + set_mem_align (*loop_mem, MIN (MEM_ALIGN (mem), length * BITS_PER_UNIT)); +} + +/* Move LENGTH bytes from SRC to DEST using a loop that moves BYTES_PER_ITER + bytes at a time. LENGTH must be at least BYTES_PER_ITER. Assume that + the memory regions do not overlap. */ + +static void +riscv_block_move_loop (rtx dest, rtx src, unsigned HOST_WIDE_INT length, + unsigned HOST_WIDE_INT bytes_per_iter) +{ + rtx label, src_reg, dest_reg, final_src, test; + unsigned HOST_WIDE_INT leftover; + + leftover = length % bytes_per_iter; + length -= leftover; + + /* Create registers and memory references for use within the loop. */ + riscv_adjust_block_mem (src, bytes_per_iter, &src_reg, &src); + riscv_adjust_block_mem (dest, bytes_per_iter, &dest_reg, &dest); + + /* Calculate the value that SRC_REG should have after the last iteration + of the loop. */ + final_src = expand_simple_binop (Pmode, PLUS, src_reg, GEN_INT (length), + 0, 0, OPTAB_WIDEN); + + /* Emit the start of the loop. */ + label = gen_label_rtx (); + emit_label (label); + + /* Emit the loop body. */ + riscv_block_move_straight (dest, src, bytes_per_iter); + + /* Move on to the next block. */ + riscv_emit_move (src_reg, plus_constant (Pmode, src_reg, bytes_per_iter)); + riscv_emit_move (dest_reg, plus_constant (Pmode, dest_reg, bytes_per_iter)); + + /* Emit the loop condition. */ + test = gen_rtx_NE (VOIDmode, src_reg, final_src); + emit_jump_insn (gen_cbranch4 (Pmode, test, src_reg, final_src, label)); + + /* Mop up any left-over bytes. */ + if (leftover) + riscv_block_move_straight (dest, src, leftover); + else + emit_insn(gen_nop ()); +} + +/* Expand a cpymemsi instruction, which copies LENGTH bytes from + memory reference SRC to memory reference DEST. */ + +bool +riscv_expand_block_move (rtx dest, rtx src, rtx length) +{ + if (CONST_INT_P (length)) + { + unsigned HOST_WIDE_INT hwi_length = UINTVAL (length); + unsigned HOST_WIDE_INT factor, align; + + align = MIN (MIN (MEM_ALIGN (src), MEM_ALIGN (dest)), BITS_PER_WORD); + factor = BITS_PER_WORD / align; + + if (optimize_function_for_size_p (cfun) + && hwi_length * factor * UNITS_PER_WORD > MOVE_RATIO (false)) + return false; + + if (hwi_length <= (RISCV_MAX_MOVE_BYTES_STRAIGHT / factor)) + { + riscv_block_move_straight (dest, src, INTVAL (length)); + return true; + } + else if (optimize && align >= BITS_PER_WORD) + { + unsigned min_iter_words + = RISCV_MAX_MOVE_BYTES_PER_LOOP_ITER / UNITS_PER_WORD; + unsigned iter_words = min_iter_words; + unsigned HOST_WIDE_INT bytes = hwi_length; + unsigned HOST_WIDE_INT words = bytes / UNITS_PER_WORD; + + /* Lengthen the loop body if it shortens the tail. */ + for (unsigned i = min_iter_words; i < min_iter_words * 2 - 1; i++) + { + unsigned cur_cost = iter_words + words % iter_words; + unsigned new_cost = i + words % i; + if (new_cost <= cur_cost) + iter_words = i; + } + + riscv_block_move_loop (dest, src, bytes, iter_words * UNITS_PER_WORD); + return true; + } + } + return false; +} diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc index 44746256f61..f2dcb0db6fb 100644 --- a/gcc/config/riscv/riscv.cc +++ b/gcc/config/riscv/riscv.cc @@ -5147,161 +5147,6 @@ riscv_legitimize_call_address (rtx addr) return addr; } -/* Emit straight-line code to move LENGTH bytes from SRC to DEST. - Assume that the areas do not overlap. */ - -static void -riscv_block_move_straight (rtx dest, rtx src, unsigned HOST_WIDE_INT length) -{ - unsigned HOST_WIDE_INT offset, delta; - unsigned HOST_WIDE_INT bits; - int i; - enum machine_mode mode; - rtx *regs; - - bits = MAX (BITS_PER_UNIT, - MIN (BITS_PER_WORD, MIN (MEM_ALIGN (src), MEM_ALIGN (dest)))); - - mode = mode_for_size (bits, MODE_INT, 0).require (); - delta = bits / BITS_PER_UNIT; - - /* Allocate a buffer for the temporary registers. */ - regs = XALLOCAVEC (rtx, length / delta); - - /* Load as many BITS-sized chunks as possible. Use a normal load if - the source has enough alignment, otherwise use left/right pairs. */ - for (offset = 0, i = 0; offset + delta <= length; offset += delta, i++) - { - regs[i] = gen_reg_rtx (mode); - riscv_emit_move (regs[i], adjust_address (src, mode, offset)); - } - - /* Copy the chunks to the destination. */ - for (offset = 0, i = 0; offset + delta <= length; offset += delta, i++) - riscv_emit_move (adjust_address (dest, mode, offset), regs[i]); - - /* Mop up any left-over bytes. */ - if (offset < length) - { - src = adjust_address (src, BLKmode, offset); - dest = adjust_address (dest, BLKmode, offset); - move_by_pieces (dest, src, length - offset, - MIN (MEM_ALIGN (src), MEM_ALIGN (dest)), RETURN_BEGIN); - } -} - -/* Helper function for doing a loop-based block operation on memory - reference MEM. Each iteration of the loop will operate on LENGTH - bytes of MEM. - - Create a new base register for use within the loop and point it to - the start of MEM. Create a new memory reference that uses this - register. Store them in *LOOP_REG and *LOOP_MEM respectively. */ - -static void -riscv_adjust_block_mem (rtx mem, unsigned HOST_WIDE_INT length, - rtx *loop_reg, rtx *loop_mem) -{ - *loop_reg = copy_addr_to_reg (XEXP (mem, 0)); - - /* Although the new mem does not refer to a known location, - it does keep up to LENGTH bytes of alignment. */ - *loop_mem = change_address (mem, BLKmode, *loop_reg); - set_mem_align (*loop_mem, MIN (MEM_ALIGN (mem), length * BITS_PER_UNIT)); -} - -/* Move LENGTH bytes from SRC to DEST using a loop that moves BYTES_PER_ITER - bytes at a time. LENGTH must be at least BYTES_PER_ITER. Assume that - the memory regions do not overlap. */ - -static void -riscv_block_move_loop (rtx dest, rtx src, unsigned HOST_WIDE_INT length, - unsigned HOST_WIDE_INT bytes_per_iter) -{ - rtx label, src_reg, dest_reg, final_src, test; - unsigned HOST_WIDE_INT leftover; - - leftover = length % bytes_per_iter; - length -= leftover; - - /* Create registers and memory references for use within the loop. */ - riscv_adjust_block_mem (src, bytes_per_iter, &src_reg, &src); - riscv_adjust_block_mem (dest, bytes_per_iter, &dest_reg, &dest); - - /* Calculate the value that SRC_REG should have after the last iteration - of the loop. */ - final_src = expand_simple_binop (Pmode, PLUS, src_reg, GEN_INT (length), - 0, 0, OPTAB_WIDEN); - - /* Emit the start of the loop. */ - label = gen_label_rtx (); - emit_label (label); - - /* Emit the loop body. */ - riscv_block_move_straight (dest, src, bytes_per_iter); - - /* Move on to the next block. */ - riscv_emit_move (src_reg, plus_constant (Pmode, src_reg, bytes_per_iter)); - riscv_emit_move (dest_reg, plus_constant (Pmode, dest_reg, bytes_per_iter)); - - /* Emit the loop condition. */ - test = gen_rtx_NE (VOIDmode, src_reg, final_src); - emit_jump_insn (gen_cbranch4 (Pmode, test, src_reg, final_src, label)); - - /* Mop up any left-over bytes. */ - if (leftover) - riscv_block_move_straight (dest, src, leftover); - else - emit_insn(gen_nop ()); -} - -/* Expand a cpymemsi instruction, which copies LENGTH bytes from - memory reference SRC to memory reference DEST. */ - -bool -riscv_expand_block_move (rtx dest, rtx src, rtx length) -{ - if (CONST_INT_P (length)) - { - unsigned HOST_WIDE_INT hwi_length = UINTVAL (length); - unsigned HOST_WIDE_INT factor, align; - - align = MIN (MIN (MEM_ALIGN (src), MEM_ALIGN (dest)), BITS_PER_WORD); - factor = BITS_PER_WORD / align; - - if (optimize_function_for_size_p (cfun) - && hwi_length * factor * UNITS_PER_WORD > MOVE_RATIO (false)) - return false; - - if (hwi_length <= (RISCV_MAX_MOVE_BYTES_STRAIGHT / factor)) - { - riscv_block_move_straight (dest, src, INTVAL (length)); - return true; - } - else if (optimize && align >= BITS_PER_WORD) - { - unsigned min_iter_words - = RISCV_MAX_MOVE_BYTES_PER_LOOP_ITER / UNITS_PER_WORD; - unsigned iter_words = min_iter_words; - unsigned HOST_WIDE_INT bytes = hwi_length; - unsigned HOST_WIDE_INT words = bytes / UNITS_PER_WORD; - - /* Lengthen the loop body if it shortens the tail. */ - for (unsigned i = min_iter_words; i < min_iter_words * 2 - 1; i++) - { - unsigned cur_cost = iter_words + words % iter_words; - unsigned new_cost = i + words % i; - if (new_cost <= cur_cost) - iter_words = i; - } - - riscv_block_move_loop (dest, src, bytes, iter_words * UNITS_PER_WORD); - return true; - } - } - return false; -} - /* Print symbolic operand OP, which is part of a HIGH or LO_SUM in context CONTEXT. HI_RELOC indicates a high-part reloc. */