From patchwork Fri Oct 18 13:13:00 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Craig Blackmore X-Patchwork-Id: 1999112 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=embecosm.com header.i=@embecosm.com header.a=rsa-sha256 header.s=google header.b=NRz5iBh5; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org; envelope-from=gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4XVQDM1f5tz1xth for ; Sat, 19 Oct 2024 00:15:15 +1100 (AEDT) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 6B8C7385AC0D for ; Fri, 18 Oct 2024 13:15:13 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mail-wm1-x32d.google.com (mail-wm1-x32d.google.com [IPv6:2a00:1450:4864:20::32d]) by sourceware.org (Postfix) with ESMTPS id 059493858D20 for ; Fri, 18 Oct 2024 13:14:35 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 059493858D20 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=embecosm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=embecosm.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 059493858D20 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2a00:1450:4864:20::32d ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1729257283; cv=none; b=R5XvKOVvufUAP17tCE1uH8bg3Ew8T8FCJ8hn/kz3GzAKWhoCjh/LsFHk43UDstQDtiDjAQZvW/tulbXvVLEOkw9X30OCC8ajCUcdJ5WUVayNvt9BUvLkwRkkuCNctA3CzgdgUeFcgFoJpPDEHDGsEXOgG0YwHOGMFz0dGVlNlkw= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1729257283; c=relaxed/simple; bh=XrHdMlLs/pHbZ/8TYdfwxy4TQvVq3ueU2KC2Yv4X84w=; h=DKIM-Signature:From:To:Subject:Date:Message-ID:MIME-Version; b=c50IhtpMKOV2p0t1nf+DgsVOZR55aKp1Pa4QurZ7n6mYsEU9Qt6RiZ2BRyarw3JdfoixhO4deNXKqiohCAkeLgViQfhBMjXaFa3tArZdnCdscnmjCk875g+NqzAeRicWf1n4JrfwgNu48qmGfkSkfKo1z1Kjd8Eqz52rVo8thLY= ARC-Authentication-Results: i=1; server2.sourceware.org Received: by mail-wm1-x32d.google.com with SMTP id 5b1f17b1804b1-431137d12a5so21314205e9.1 for ; Fri, 18 Oct 2024 06:14:34 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=embecosm.com; s=google; t=1729257274; x=1729862074; darn=gcc.gnu.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=xgUR2PRDXdI3wXadocUWAe9/46vvp8A6YJtR7uQfMjA=; b=NRz5iBh5GzWyHmTMZLX0XuoZfuKzY0pJDIpRpRcWoCOcgnVXYEcO6MvkgFDOp37qWg d/hMcMMkj2Hz/KE6zk0EySyYiFnYHHIGZ+SsDQRKBJBIhsUnyK/Bjqt5wH4wdUqYpLkf 8WSJeDko++hyCsQPqiNlI0dY4RMqd5x8jGqv3crzahqbF0r1d3PRXHZeiy/FURIAM5of 7avLRW+dutkx8iSXW5j7RteSAvw9ALY51Fflvt2DHNVLLJRl4HeHhlJ2QISyCTPeR4yi FbFBkCIhIIkDAixOwsOO+4OLUxOJC4q0iFv277yPWTRQ4mNBMsosKwLZ3AGWkvJwxgsD UI9w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1729257274; x=1729862074; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=xgUR2PRDXdI3wXadocUWAe9/46vvp8A6YJtR7uQfMjA=; b=L7ME6VYFUtyfzDTQ5T9y/HcOMAVrSIRqZ6FZ7/FNx8wGGsYZcb3/H4w5gGiIJXY40S kpSag8heW7eFHglBXgH/0+ypn7WVmn3wAtPVoPEAul/dvId2QGSluLeNQa3lLur4NW7c BGxEY5OoJyZLz/b7trayvWI3nhQD2Eza8xAwm3SEfhc45AZZS6/+j6dJ/qyYkIgR1T4H jv6UnW0s1icLkx+kpbpGk7Z6HRlwRe6Q3GDyzPx9XFOJNHOQDuPGlcm0AUBeaN8VBXJx VWNzBsQQpWUs9vZKCzqEJpr1SwVx6d9cqd8Ncz3F6vrInWZPAw1iWQ6x/PaKZ1aI2jfa Mcxg== X-Gm-Message-State: AOJu0YyMvzl1n+EntXi5NvFNLS3EoJeo42cvYovGZmoyAC34Cp30yZiA +LhbpisfQ+wpA/cG2+5eXp/BjnLXxU+FNmhVNTKlM232oUJbpacksTULh0V3L2Yl01G0rfWVev/ m X-Google-Smtp-Source: AGHT+IEzypwovsSi8kRfbiffqThqJOhVXTvpQSDhlQuU5rXZImd6D/j8yBG4ZORnsXnFJl6x7bKRSg== X-Received: by 2002:a05:600c:34d4:b0:431:588a:44a2 with SMTP id 5b1f17b1804b1-43161636685mr19111885e9.12.1729257273633; Fri, 18 Oct 2024 06:14:33 -0700 (PDT) Received: from dorian.. (sals-04-b2-v4wan-167965-cust660.vm36.cable.virginm.net. [80.3.10.149]) by smtp.gmail.com with ESMTPSA id 5b1f17b1804b1-43160dc9a89sm23577435e9.16.2024.10.18.06.14.33 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 18 Oct 2024 06:14:33 -0700 (PDT) From: Craig Blackmore To: gcc-patches@gcc.gnu.org Cc: Craig Blackmore Subject: [PATCH 7/7] RISC-V: Disable by pieces for vector setmem length > UNITS_PER_WORD Date: Fri, 18 Oct 2024 14:13:00 +0100 Message-ID: <20241018131300.1150819-8-craig.blackmore@embecosm.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20241018131300.1150819-1-craig.blackmore@embecosm.com> References: <20241018131300.1150819-1-craig.blackmore@embecosm.com> MIME-Version: 1.0 X-Spam-Status: No, score=-12.5 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org For fast unaligned access targets, by pieces uses up to UNITS_PER_WORD size pieces resulting in more store instructions than needed. For example gcc.target/riscv/rvv/base/setmem-1.c:f1 built with `-O3 -march=rv64gcv -mtune=thead-c906`: ``` f1: vsetivli zero,8,e8,mf2,ta,ma vmv.v.x v1,a1 vsetivli zero,0,e32,mf2,ta,ma sb a1,14(a0) vmv.x.s a4,v1 vsetivli zero,8,e16,m1,ta,ma vmv.x.s a5,v1 vse8.v v1,0(a0) sw a4,8(a0) sh a5,12(a0) ret ``` The slow unaligned access version built with `-O3 -march=rv64gcv` used 15 sb instructions: ``` f1: sb a1,0(a0) sb a1,1(a0) sb a1,2(a0) sb a1,3(a0) sb a1,4(a0) sb a1,5(a0) sb a1,6(a0) sb a1,7(a0) sb a1,8(a0) sb a1,9(a0) sb a1,10(a0) sb a1,11(a0) sb a1,12(a0) sb a1,13(a0) sb a1,14(a0) ret ``` After this patch, the following is generated in both cases: ``` f1: vsetivli zero,15,e8,m1,ta,ma vmv.v.x v1,a1 vse8.v v1,0(a0) ret ``` gcc/ChangeLog: * config/riscv/riscv.cc (riscv_use_by_pieces_infrastructure_p): New function. (TARGET_USE_BY_PIECES_INFRASTRUCTURE_P): Define. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/pr113469.c: Expect mf2 setmem. * gcc.target/riscv/rvv/base/setmem-2.c: Update f1 to expect straight-line vector memset. * gcc.target/riscv/rvv/base/setmem-3.c: Likewise. --- gcc/config/riscv/riscv.cc | 19 +++++++++++++++++++ .../gcc.target/riscv/rvv/autovec/pr113469.c | 3 ++- .../gcc.target/riscv/rvv/base/setmem-2.c | 12 +++++++----- .../gcc.target/riscv/rvv/base/setmem-3.c | 12 +++++++----- 4 files changed, 35 insertions(+), 11 deletions(-) diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc index e111cb07284..c008b2da3b7 100644 --- a/gcc/config/riscv/riscv.cc +++ b/gcc/config/riscv/riscv.cc @@ -12583,6 +12583,22 @@ riscv_stack_clash_protection_alloca_probe_range (void) return STACK_CLASH_CALLER_GUARD; } +static bool +riscv_use_by_pieces_infrastructure_p (unsigned HOST_WIDE_INT size, + unsigned alignment, + enum by_pieces_operation op, bool speed_p) +{ + /* For set/clear with size > UNITS_PER_WORD, by pieces uses vector broadcasts + with UNITS_PER_WORD size pieces. Use setmem instead which can use + bigger chunks. */ + if (TARGET_VECTOR && stringop_strategy & STRATEGY_VECTOR + && (op == CLEAR_BY_PIECES || op == SET_BY_PIECES) + && speed_p && size > UNITS_PER_WORD) + return false; + + return default_use_by_pieces_infrastructure_p (size, alignment, op, speed_p); +} + /* Initialize the GCC target structure. */ #undef TARGET_ASM_ALIGNED_HI_OP #define TARGET_ASM_ALIGNED_HI_OP "\t.half\t" @@ -12948,6 +12964,9 @@ riscv_stack_clash_protection_alloca_probe_range (void) #undef TARGET_C_MODE_FOR_FLOATING_TYPE #define TARGET_C_MODE_FOR_FLOATING_TYPE riscv_c_mode_for_floating_type +#undef TARGET_USE_BY_PIECES_INFRASTRUCTURE_P +#define TARGET_USE_BY_PIECES_INFRASTRUCTURE_P riscv_use_by_pieces_infrastructure_p + struct gcc_target targetm = TARGET_INITIALIZER; #include "gt-riscv.h" diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr113469.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr113469.c index d1c118c02d6..f86084bdb40 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr113469.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr113469.c @@ -51,4 +51,5 @@ void p(int buf, __builtin_va_list ab, int q) { } while (k); } -/* { dg-final { scan-assembler-times {vsetivli\tzero,\s*4,\s*e8,\s*mf4,\s*t[au],\s*m[au]} 2 } } */ +/* { dg-final { scan-assembler-times {vsetivli\tzero,\s*4,\s*e8,\s*mf4,\s*t[au],\s*m[au]} 1 } } */ +/* { dg-final { scan-assembler-times {vsetivli\tzero,\s*8,\s*e8,\s*mf2,\s*t[au],\s*m[au]} 1 } } */ diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/setmem-2.c b/gcc/testsuite/gcc.target/riscv/rvv/base/setmem-2.c index 9da1c9309d8..67d62f7193e 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/base/setmem-2.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/setmem-2.c @@ -5,15 +5,17 @@ #define MIN_VECTOR_BYTES (__riscv_v_min_vlen / 8) -/* Small memsets shouldn't be vectorised. +/* Vectorise with no loop. ** f1: ** ( -** sb\s+a1,0\(a0\) -** ... +** vsetivli\s+zero,\d+,e8,m1,ta,ma ** | -** li\s+a2,\d+ -** tail\s+memset +** li\s+a\d+,\d+ +** vsetvli\s+zero,a\d+,e8,m1,ta,ma ** ) +** vmv\.v\.x\s+v\d+,a1 +** vse8\.v\s+v\d+,0\(a0\) +** ret */ void * f1 (void *a, int const b) diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/setmem-3.c b/gcc/testsuite/gcc.target/riscv/rvv/base/setmem-3.c index 2111a139ad4..7ade7ef415b 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/base/setmem-3.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/setmem-3.c @@ -5,15 +5,17 @@ #define MIN_VECTOR_BYTES (__riscv_v_min_vlen / 8) -/* Small memsets shouldn't be vectorised. +/* Vectorise with no loop. ** f1: ** ( -** sb\s+a1,0\(a0\) -** ... +** vsetivli\s+zero,\d+,e8,m1,ta,ma ** | -** li\s+a2,\d+ -** tail\s+memset +** li\s+a\d+,\d+ +** vsetvli\s+zero,a\d+,e8,m1,ta,ma ** ) +** vmv\.v\.x\s+v\d+,a1 +** vse8\.v\s+v\d+,0\(a0\) +** ret */ void * f1 (void *a, int const b)