From patchwork Sat Sep 9 07:03:40 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Xi Ruoyao
X-Patchwork-Id: 1831778
Subject: Pushed: [PATCH v2] LoongArch: Use LSX and LASX for block move
To: chenglulu, gcc-patches@gcc.gnu.org
Cc: Chenghui Pan, i@xen0n.name, xuchenghua@loongson.cn
Date: Sat, 09 Sep 2023 15:03:40 +0800
References: <20230907161407.27338-2-xry111@xry111.site>
List-Id: Gcc-patches mailing list
X-Patchwork-Original-From: Xi Ruoyao via Gcc-patches
From: Xi Ruoyao
Reply-To: Xi Ruoyao

Pushed r14-3818 with test cases added.  The pushed patch is attached.

On Sat, 2023-09-09 at 14:10 +0800, chenglulu wrote:
> 
> On 2023/9/8 at 12:14 AM, Xi Ruoyao wrote:
> > gcc/ChangeLog:
> > 
> >         * config/loongarch/loongarch.h (LARCH_MAX_MOVE_PER_INSN):
> >         Define to the maximum amount of bytes able to be loaded or
> >         stored with one machine instruction.
> >         * config/loongarch/loongarch.cc (loongarch_mode_for_move_size):
> >         New static function.
> >         (loongarch_block_move_straight): Call
> >         loongarch_mode_for_move_size for machine_mode to be moved.
> >         (loongarch_expand_block_move): Use LARCH_MAX_MOVE_PER_INSN
> >         instead of UNITS_PER_WORD.
> > ---
> > 
> > Bootstrapped and regtested on loongarch64-linux-gnu, with PR110939 patch
> > applied, the "lib_build_self_spec = %<..." line in t-linux commented out
> > (because it's silently making -mlasx in BOOT_CFLAGS ineffective, Yujie
> > is working on a proper fix), and BOOT_CFLAGS="-O3 -mlasx".  Ok for trunk?
> 
> I think test cases need to be added here.
> 
> Otherwise OK, thanks!

/* snip */

From 35adc54b55aa199f17e2c84e382792e424b6171e Mon Sep 17 00:00:00 2001
From: Xi Ruoyao
Date: Tue, 5 Sep 2023 21:02:38 +0800
Subject: [PATCH v2] LoongArch: Use LSX and LASX for block move

gcc/ChangeLog:

	* config/loongarch/loongarch.h (LARCH_MAX_MOVE_PER_INSN):
	Define to the maximum amount of bytes able to be loaded or
	stored with one machine instruction.
	* config/loongarch/loongarch.cc (loongarch_mode_for_move_size):
	New static function.
	(loongarch_block_move_straight): Call
	loongarch_mode_for_move_size for machine_mode to be moved.
	(loongarch_expand_block_move): Use LARCH_MAX_MOVE_PER_INSN
	instead of UNITS_PER_WORD.

gcc/testsuite/ChangeLog:

	* gcc.target/loongarch/memcpy-vec-1.c: New test.
	* gcc.target/loongarch/memcpy-vec-2.c: New test.
	* gcc.target/loongarch/memcpy-vec-3.c: New test.
---
 gcc/config/loongarch/loongarch.cc             | 22 +++++++++++++++----
 gcc/config/loongarch/loongarch.h              |  3 +++
 .../gcc.target/loongarch/memcpy-vec-1.c       | 11 ++++++++++
 .../gcc.target/loongarch/memcpy-vec-2.c       | 12 ++++++++++
 .../gcc.target/loongarch/memcpy-vec-3.c       |  6 +++++
 5 files changed, 50 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/memcpy-vec-1.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/memcpy-vec-2.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/memcpy-vec-3.c

diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
index 6698414281e..509ef2b97f1 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -5191,6 +5191,20 @@ loongarch_function_ok_for_sibcall (tree decl ATTRIBUTE_UNUSED,
   return true;
 }
 
+static machine_mode
+loongarch_mode_for_move_size (HOST_WIDE_INT size)
+{
+  switch (size)
+    {
+    case 32:
+      return V32QImode;
+    case 16:
+      return V16QImode;
+    }
+
+  return int_mode_for_size (size * BITS_PER_UNIT, 0).require ();
+}
+
 /* Emit straight-line code to move LENGTH bytes from SRC to DEST.
    Assume that the areas do not overlap.  */
 
@@ -5220,7 +5234,7 @@ loongarch_block_move_straight (rtx dest, rtx src, HOST_WIDE_INT length,
 
   for (delta_cur = delta, i = 0, offs = 0; offs < length; delta_cur /= 2)
     {
-      mode = int_mode_for_size (delta_cur * BITS_PER_UNIT, 0).require ();
+      mode = loongarch_mode_for_move_size (delta_cur);
 
       for (; offs + delta_cur <= length; offs += delta_cur, i++)
 	{
@@ -5231,7 +5245,7 @@ loongarch_block_move_straight (rtx dest, rtx src, HOST_WIDE_INT length,
 
   for (delta_cur = delta, i = 0, offs = 0; offs < length; delta_cur /= 2)
     {
-      mode = int_mode_for_size (delta_cur * BITS_PER_UNIT, 0).require ();
+      mode = loongarch_mode_for_move_size (delta_cur);
 
       for (; offs + delta_cur <= length; offs += delta_cur, i++)
 	loongarch_emit_move (adjust_address (dest, mode, offs), regs[i]);
@@ -5326,8 +5340,8 @@ loongarch_expand_block_move (rtx dest, rtx src, rtx r_length, rtx r_align)
 
   HOST_WIDE_INT align = INTVAL (r_align);
 
-  if (!TARGET_STRICT_ALIGN || align > UNITS_PER_WORD)
-    align = UNITS_PER_WORD;
+  if (!TARGET_STRICT_ALIGN || align > LARCH_MAX_MOVE_PER_INSN)
+    align = LARCH_MAX_MOVE_PER_INSN;
 
   if (length <= align * LARCH_MAX_MOVE_OPS_STRAIGHT)
     {
diff --git a/gcc/config/loongarch/loongarch.h b/gcc/config/loongarch/loongarch.h
index 3fc9dc43ab1..7e391205583 100644
--- a/gcc/config/loongarch/loongarch.h
+++ b/gcc/config/loongarch/loongarch.h
@@ -1181,6 +1181,9 @@ typedef struct {
    least twice.  */
 #define LARCH_MAX_MOVE_OPS_STRAIGHT (LARCH_MAX_MOVE_OPS_PER_LOOP_ITER * 2)
 
+#define LARCH_MAX_MOVE_PER_INSN \
+  (ISA_HAS_LASX ? 32 : (ISA_HAS_LSX ? 16 : UNITS_PER_WORD))
+
 /* The base cost of a memcpy call, for MOVE_RATIO and friends.  These
    values were determined experimentally by benchmarking with CSiBE.
 */
diff --git a/gcc/testsuite/gcc.target/loongarch/memcpy-vec-1.c b/gcc/testsuite/gcc.target/loongarch/memcpy-vec-1.c
new file mode 100644
index 00000000000..8d9fedc9e4f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/memcpy-vec-1.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mabi=lp64d -march=la464 -mno-strict-align" } */
+/* { dg-final { scan-assembler-times "xvst" 2 } } */
+/* { dg-final { scan-assembler-times "\tvst" 1 } } */
+/* { dg-final { scan-assembler-times "st\\.d|stptr\\.d" 1 } } */
+/* { dg-final { scan-assembler-times "st\\.w|stptr\\.w" 1 } } */
+/* { dg-final { scan-assembler-times "st\\.h" 1 } } */
+/* { dg-final { scan-assembler-times "st\\.b" 1 } } */
+
+extern char a[], b[];
+void test() { __builtin_memcpy(a, b, 95); }
diff --git a/gcc/testsuite/gcc.target/loongarch/memcpy-vec-2.c b/gcc/testsuite/gcc.target/loongarch/memcpy-vec-2.c
new file mode 100644
index 00000000000..6b28b884db0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/memcpy-vec-2.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mabi=lp64d -march=la464 -mno-strict-align" } */
+/* { dg-final { scan-assembler-times "xvst" 2 } } */
+/* { dg-final { scan-assembler-times "\tvst" 1 } } */
+/* { dg-final { scan-assembler-times "st\\.d|stptr\\.d" 1 } } */
+/* { dg-final { scan-assembler-times "st\\.w|stptr\\.w" 1 } } */
+/* { dg-final { scan-assembler-times "st\\.h" 1 } } */
+/* { dg-final { scan-assembler-times "st\\.b" 1 } } */
+
+typedef char __attribute__ ((vector_size (32), aligned (32))) vec;
+extern vec a[], b[];
+void test() { __builtin_memcpy(a, b, 95); }
diff --git a/gcc/testsuite/gcc.target/loongarch/memcpy-vec-3.c b/gcc/testsuite/gcc.target/loongarch/memcpy-vec-3.c
new file mode 100644
index 00000000000..233ed215078
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/memcpy-vec-3.c
@@ -0,0 +1,6 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=la464 -mabi=lp64d -mstrict-align" } */
+/* { dg-final { scan-assembler-not "vst" } } */
+
+extern char a[], b[];
+void test() { __builtin_memcpy(a, b, 16); }
-- 
2.42.0