From patchwork Fri Oct 18 13:12:54 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Craig Blackmore X-Patchwork-Id: 1999108 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=embecosm.com header.i=@embecosm.com header.a=rsa-sha256 header.s=google header.b=aUFt+a4P; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org; envelope-from=gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4XVQCs5gkTz1xvV for ; Sat, 19 Oct 2024 00:14:49 +1100 (AEDT) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id F21363858031 for ; Fri, 18 Oct 2024 13:14:47 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mail-wm1-x335.google.com (mail-wm1-x335.google.com [IPv6:2a00:1450:4864:20::335]) by sourceware.org (Postfix) with ESMTPS id 3F97D385843D for ; Fri, 18 Oct 2024 13:14:16 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 3F97D385843D Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=embecosm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=embecosm.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 3F97D385843D Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2a00:1450:4864:20::335 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1729257258; cv=none; b=L2UdxH8BQ9eilKnDv1NF2X6Y8xagQHmB1CheJfsuk7RNMgvCs8rZy4XMASd32iDb13K3jXdwN34q+4dO4SxOFQuRJaU26+oLZMDjkGFEvmuh8m7v4BEjVTi446IDiap2pMbEgDF4UaZXXTxNjjgXTzZOCd3QV6BNwquC7ySLNaw= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1729257258; c=relaxed/simple; bh=+INiAWAG6jzF+aoGZIhH+oVJICY96kIe7meWZruyq8c=; h=DKIM-Signature:From:To:Subject:Date:Message-ID:MIME-Version; b=MP1pRHLQDq39rLn0AN8OoHvN9ZGBow840gUdCsbQWGo/nvNhu0J0e6Hnmo81wVWZHYBEgaucMUp0yAC5QHShnfaVdYwH1pINOZ77V5yf+z59mdI/rKXsDV5VEY7jf6ML1S+moZWGPaL2vE7dsFisplkT0uld84eztQHCq08Feo0= ARC-Authentication-Results: i=1; server2.sourceware.org Received: by mail-wm1-x335.google.com with SMTP id 5b1f17b1804b1-4314c4cb752so21243995e9.2 for ; Fri, 18 Oct 2024 06:14:16 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=embecosm.com; s=google; t=1729257255; x=1729862055; darn=gcc.gnu.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=W7dKjmbDemwda5DYjmOwbEiNfhNreb8XIAV5q6En5RQ=; b=aUFt+a4PGzgNTZl9EXhKdTIELcydfQx/nU8Dzz4slIC4kvoOinSNWyK7aI9BrVLikW sZkqlcyx9aTGuJ9g6tKic5jijJ69vFZU/yGmw+9wLfV56OambCjOuuVCMUXBAagC94vs +qhxPdXUa/HtoVThoprQVU6HRB/Bi/n03xMJmTa9SeQX62dd0b5YcxeWcBlrTTfr99wm X9M6xmJD7Yrg3MXJos5n+hZVo8NMsrD5ITSkjfwHLSkbur+dmRto8vtS2LMPtzqX7e/p FAWZqFSX+s1aU9bTQT0zFGaxdjzlfWMVflaPZBqPxJmoB7zFtwJV04Qia++eF93gjsRk RZCg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1729257255; x=1729862055; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=W7dKjmbDemwda5DYjmOwbEiNfhNreb8XIAV5q6En5RQ=; b=QCSoTgjCV7Rl6uHdNs2DK/47lxuAGii7gZwDHhtlLyPnzDC+NC2QExG3ZB1Ra9DZqS wz8y7LUCBiNiJj6mc2LjZd+OmOmUA3WA6jeH1sUNfhv+AnBatNdRAl+GRwIVDChTdYh1 WjOmUjnOdcga2NNynwysIALCqcPXK0MK2QJdWWXVr9rWqU5B6MFwvsn53LfOCuyfbBTu +cmIHMcHWKCoPXl7wor8tODj+qsrxfn0w4s0wY19kuaiwYQb1jDGhETvfAbsU9ltCwoW no3hekC/7UOjEgHk4aIlByvNxEqYNZbrigSm3YfQ4MSHvC7X82wWJi5GEvg/k15lgfPI O0NQ== X-Gm-Message-State: AOJu0YzYMvQNSZhpTFkgNdojW/A1VGrMt5aEZ42WL+kDrkf/O9ixis7O E9xO9EtAdUO8HF4/qYibq7nfclJkTB1nSMKi5DYyP/lcp6vHT49Ic7YsZQnG1507vQ70QVMtpbB k X-Google-Smtp-Source: AGHT+IFODALPPWnIqwIsiEVW9sHoZBzZisUx0Ej4Hef+MCZ8jM1aurz/4WCpGQmSFocUZZ2O3K3G5g== X-Received: by 2002:a05:600c:3151:b0:426:61e8:fb3b with SMTP id 5b1f17b1804b1-43161685b2dmr17376285e9.27.1729257254656; Fri, 18 Oct 2024 06:14:14 -0700 (PDT) Received: from dorian.. (sals-04-b2-v4wan-167965-cust660.vm36.cable.virginm.net. [80.3.10.149]) by smtp.gmail.com with ESMTPSA id 5b1f17b1804b1-43160dc9a89sm23577435e9.16.2024.10.18.06.14.14 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 18 Oct 2024 06:14:14 -0700 (PDT) From: Craig Blackmore To: gcc-patches@gcc.gnu.org Cc: Craig Blackmore Subject: [PATCH 1/7] RISC-V: Fix indentation in riscv_vector::expand_block_move [NFC] Date: Fri, 18 Oct 2024 14:12:54 +0100 Message-ID: <20241018131300.1150819-2-craig.blackmore@embecosm.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20241018131300.1150819-1-craig.blackmore@embecosm.com> References: <20241018131300.1150819-1-craig.blackmore@embecosm.com> MIME-Version: 1.0 X-Spam-Status: No, score=-11.9 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org gcc/ChangeLog: * config/riscv/riscv-string.cc (expand_block_move): Fix indentation. --- gcc/config/riscv/riscv-string.cc | 32 ++++++++++++++++---------------- 1 file changed, 16 insertions(+), 16 deletions(-) diff --git a/gcc/config/riscv/riscv-string.cc b/gcc/config/riscv/riscv-string.cc index 4bb8bcec4a5..0c5ffd7d861 100644 --- a/gcc/config/riscv/riscv-string.cc +++ b/gcc/config/riscv/riscv-string.cc @@ -1086,22 +1086,22 @@ expand_block_move (rtx dst_in, rtx src_in, rtx length_in) { HOST_WIDE_INT length = INTVAL (length_in); - /* By using LMUL=8, we can copy as many bytes in one go as there - are bits in a vector register. If the entire block thus fits, - we don't need a loop. */ - if (length <= TARGET_MIN_VLEN) - { - need_loop = false; - - /* If a single scalar load / store pair can do the job, leave it - to the scalar code to do that. */ - /* ??? If fast unaligned access is supported, the scalar code could - use suitably sized scalars irrespective of alignment. If that - gets fixed, we have to adjust the test here. */ - - if (pow2p_hwi (length) && length <= potential_ew) - return false; - } + /* By using LMUL=8, we can copy as many bytes in one go as there + are bits in a vector register. If the entire block thus fits, + we don't need a loop. */ + if (length <= TARGET_MIN_VLEN) + { + need_loop = false; + + /* If a single scalar load / store pair can do the job, leave it + to the scalar code to do that. */ + /* ??? If fast unaligned access is supported, the scalar code could + use suitably sized scalars irrespective of alignment. If that + gets fixed, we have to adjust the test here. */ + + if (pow2p_hwi (length) && length <= potential_ew) + return false; + } /* Find the vector mode to use. Using the largest possible element size is likely to give smaller constants, and thus potentially From patchwork Fri Oct 18 13:12:55 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Craig Blackmore X-Patchwork-Id: 1999114 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=embecosm.com header.i=@embecosm.com header.a=rsa-sha256 header.s=google header.b=Fjas5IlG; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org; envelope-from=gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4XVQF54VBXz1xth for ; Sat, 19 Oct 2024 00:15:53 +1100 (AEDT) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id C7BB0385AC19 for ; Fri, 18 Oct 2024 13:15:51 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mail-wm1-x334.google.com (mail-wm1-x334.google.com [IPv6:2a00:1450:4864:20::334]) by sourceware.org (Postfix) with ESMTPS id 86CFF3858410 for ; Fri, 18 Oct 2024 13:14:19 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 86CFF3858410 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=embecosm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=embecosm.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 86CFF3858410 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2a00:1450:4864:20::334 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1729257261; cv=none; b=hgMe3MRqHUg8DLDu9UCxBx5XNTkOVqr6D/asEeU43UGYv2FUJ057b9vCqiGPYuNUHI0o1s0syd/mmYq9Y0m7Pn2p9TSKYsYoq88+tm8QdUFkElKUJpac3d+mn/+n/2uC7z2fqWpnUnn5uH57uIla1mnbyotQYMMJkNO0A0C336o= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1729257261; c=relaxed/simple; bh=laJB1s6fWzFrRHueizeRKINW6u3pKVJL8sTuX0O6kEM=; h=DKIM-Signature:From:To:Subject:Date:Message-ID:MIME-Version; b=FGt4oobh28ZLnCibRy0QbUtpkP9VzfsyvybLMcfJ4MTGCVMGZTgG+G/ygxMmqb6YoAhLv/rQdw5jBDLDDeKhQn3N+mlO2vVbB829E+asgTJqpxGyRkpggA/yAGfzIgqDfiVJnprk8QiORhLjSOKVGvoKN6NSjiNK+8ryr+N+mPI= ARC-Authentication-Results: i=1; server2.sourceware.org Received: by mail-wm1-x334.google.com with SMTP id 5b1f17b1804b1-43159c9f617so17568735e9.2 for ; Fri, 18 Oct 2024 06:14:19 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=embecosm.com; s=google; t=1729257258; x=1729862058; darn=gcc.gnu.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=ymgmCBxXfzF74LepquK/UXZrl8Y3SHbp/tmDu8IsaS8=; b=Fjas5IlGyPkAoCJTmVOu2KKwGK8BzOGDTpW+JlGCP6Rjhz1bBdR0YagFFcoBkvakoq ZodhhwPfw2p9t7iYZV6MFEnfJa2fRsIOSvnwOFqep0QJ0XKzg0LjOmXqFRpPgXhFb7jg kOfm9p4W4IksOss14KWCaGuulsP2VV9s2rfE7IJO1s0/2XJcuOoHbzJJUEhhDc9Nu/E+ a1rM9VQsfH4P7I1xVWcC1l/soQvNlOM9s8Tmrlqx249UFnPZdqXXDk7wUiQv2uaHDy5t xxj2ItPWBcgHal7UW+a7kY4aqc0cmW8sEAZ0dUuJ6kaJ3zxHSIC3HM4BdyIDmVGCSa6p dt2g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1729257258; x=1729862058; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=ymgmCBxXfzF74LepquK/UXZrl8Y3SHbp/tmDu8IsaS8=; b=PZphXeM7sau1KntFkDqgqnppWKO61shppobwTK4+qVsN2U9KVYVniVVyTe7hQclItU 5Vt53Qtav+uxDEqL3i4maq1gJMcGoMMp/fOG4DbXCyA8oRzOUC3AjhEloP5jueSLkdBA hZgPDuIN1vdcTduMVj1YM63SZ3wFuzSEE+ipCMTOgOHtanWMXU11dqlkeeoXxSnbrZMl HRO30WJmVSwYpFYbAQ7ncwDxcOI8/bUXBj/WVY+avsGCTRfU+WhRC8wAVl/JMr3hmCWn /SobDy+A2ri8R31EBJvbxkFatLpIpOcEzhO0xU0QNx8cQPc+qOUukXY4BTGEFjemMme/ X5bw== X-Gm-Message-State: AOJu0YwmJoDRblH8drnDO2vcXYSdmQGk8yg+eIcdgCPzgigzIWIiX1Ef vlK6NLECYN9yc35k8sokOpeLSOIP1iOlS1eQiBRL3eIIu7uKjPmXoCBh8fzZ/chA79NEthDCyuk 4 X-Google-Smtp-Source: AGHT+IGQkOpGoIcyrGfIMZbw51vVd+QHYeEnV6w1TwJx438R/nDCwEGfTrn69d1fBKga32+SqFEcOA== X-Received: by 2002:a05:600c:4e12:b0:431:50fa:89c4 with SMTP id 5b1f17b1804b1-43161622aa4mr17908925e9.3.1729257258081; Fri, 18 Oct 2024 06:14:18 -0700 (PDT) Received: from dorian.. (sals-04-b2-v4wan-167965-cust660.vm36.cable.virginm.net. [80.3.10.149]) by smtp.gmail.com with ESMTPSA id 5b1f17b1804b1-43160dc9a89sm23577435e9.16.2024.10.18.06.14.17 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 18 Oct 2024 06:14:17 -0700 (PDT) From: Craig Blackmore To: gcc-patches@gcc.gnu.org Cc: Craig Blackmore Subject: [PATCH 2/7] RISC-V: Fix uninitialized reg in memcpy Date: Fri, 18 Oct 2024 14:12:55 +0100 Message-ID: <20241018131300.1150819-3-craig.blackmore@embecosm.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20241018131300.1150819-1-craig.blackmore@embecosm.com> References: <20241018131300.1150819-1-craig.blackmore@embecosm.com> MIME-Version: 1.0 X-Spam-Status: No, score=-12.1 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org riscv_vector::expand_block_move contains a gen_rtx_NE that uses uninitialized reg rtx `end`. It looks like `length_rtx` was supposed to be used here. gcc/ChangeLog: * config/riscv/riscv-string.cc (expand_block_move): Replace `end` with `length_rtx` in gen_rtx_NE. --- gcc/config/riscv/riscv-string.cc | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/gcc/config/riscv/riscv-string.cc b/gcc/config/riscv/riscv-string.cc index 0c5ffd7d861..0f1353baba3 100644 --- a/gcc/config/riscv/riscv-string.cc +++ b/gcc/config/riscv/riscv-string.cc @@ -1078,7 +1078,6 @@ expand_block_move (rtx dst_in, rtx src_in, rtx length_in) bool need_loop = true; bool size_p = optimize_function_for_size_p (cfun); rtx src, dst; - rtx end = gen_reg_rtx (Pmode); rtx vec; rtx length_rtx = length_in; @@ -1245,7 +1244,7 @@ expand_block_move (rtx dst_in, rtx src_in, rtx length_in) emit_insn (gen_rtx_SET (length_rtx, gen_rtx_MINUS (Pmode, length_rtx, cnt))); /* Emit the loop condition. */ - rtx test = gen_rtx_NE (VOIDmode, end, const0_rtx); + rtx test = gen_rtx_NE (VOIDmode, length_rtx, const0_rtx); emit_jump_insn (gen_cbranch4 (Pmode, test, length_rtx, const0_rtx, label)); emit_insn (gen_nop ()); } From patchwork Fri Oct 18 13:12:56 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Craig Blackmore X-Patchwork-Id: 1999110 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=embecosm.com header.i=@embecosm.com header.a=rsa-sha256 header.s=google header.b=cfUCROpQ; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org; envelope-from=gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4XVQD54NKFz1xvV for ; Sat, 19 Oct 2024 00:15:01 +1100 (AEDT) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id C61FE3857C7F for ; Fri, 18 Oct 2024 13:14:59 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mail-wm1-x32e.google.com (mail-wm1-x32e.google.com [IPv6:2a00:1450:4864:20::32e]) by sourceware.org (Postfix) with ESMTPS id 26EEB3858288 for ; Fri, 18 Oct 2024 13:14:22 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 26EEB3858288 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=embecosm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=embecosm.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 26EEB3858288 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2a00:1450:4864:20::32e ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1729257267; cv=none; b=rQxK/ljz1Kau7nDVDmX4PLJamghlSWvTecFaDVZHuI+qNw1YaYy0UX4GqkgW4OZvWZ8HVCbJZcOkkRJtIP+1f0JBfUaAOJlsLJTJtaoHze+7/eVDDitWsTfkT2HLlVcSxQhhnamxwBzmn9RIT5A63mXwXdPJXt/EDh/wppGfLNo= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1729257267; c=relaxed/simple; bh=y4fPNj5bzJYw03VsZuBfREsJqCSUuE+pdiixk0lPoDA=; h=DKIM-Signature:From:To:Subject:Date:Message-ID:MIME-Version; b=FZRei/nTuX9MDYt9Cd5S03aDGsLIdwVPxv91hVbvFmITIUpwrYutSN6pO9s1+ybtIROunoKXCiaocHXWRAbOetW6YEcfJwc5uRyTKnm+iZfn1p4QM7aGv0Dt4lMHqkRkXa0/Sueg9FCmxbTAm1L9+RrdPLCIKvdHh9m0NKA1D5A= ARC-Authentication-Results: i=1; server2.sourceware.org Received: by mail-wm1-x32e.google.com with SMTP id 5b1f17b1804b1-431137d12a5so21312005e9.1 for ; Fri, 18 Oct 2024 06:14:22 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=embecosm.com; s=google; t=1729257261; x=1729862061; darn=gcc.gnu.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=XpWp92hULpUzu43rxQaCcETCp6k2BlgRa86qWQQ/oic=; b=cfUCROpQgGkJxBKj0sTEyOzymiZpPFzB8Th0gzh9eV75nT+kZaZTwudaWWuzEpQYwP tCtiBcmrSOc59BkRm6bW49lmHxNAWiWWlHHpcq0qajxVMI3+GbIa1SwPmfItQ39+YrLP r6BDqMANPDAr8eonghao9onsF1XaMhJ4IjrysKh/FE7c0XLZAaziGCAcJx3ImCNHlOFV w0MTvcMtG5vvp0QuZVvX2DI7VmzHzaBMQBU8EotA0fvI6WGsqvSMflN09gsd6iYClMuk OOqQH1uLjOp9d9PozJgUtBKwoJZ49z2OCgqshOVmFDS9Any+coGkfNKN/RUoaP/0bb0F Jyjw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1729257261; x=1729862061; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=XpWp92hULpUzu43rxQaCcETCp6k2BlgRa86qWQQ/oic=; b=pZcIDhkkp165g5luT+5iv/qF55SuA0lfzFh3z4Y9RRAYeecoGpRS261F5TqhWKpv6D Zrz1qgyt1WA4pmxwPZoiNuGsF+7TtvEfvTE8OgKXAso6Z5rhcFhJSjdN4iixkzJaUsyD aqj55x/8Z2pdY/FFgD3pt6taQLv+MLnJUgEaNa2tVFoJsSCx82wuQWiabPtJkMY2Ps8z XwJEXnoTfkGYmAT1KAACDfJ4hhi7Ni/2dMxXXMj2hDzoR5nwKCzntQdmcdtrPqG92FRL M0hahraRTHMxPAR38UaU6/Ei1beWTQo1igFeaHnJEkji8yO5HZ+NFv0rHg5OrhUN/4jJ WRXw== X-Gm-Message-State: AOJu0YzRtS6IM9iiEtW6JyuZch99iaFvkC648bRI7js+IiIkD5JD0yZ7 F9jjpXmLMR8nJkCfLx75flxTY3B2Dw3arrW1Lo3vzZId3WhJ07norEENvj4dXBQn8BwYy2zxi0i w X-Google-Smtp-Source: AGHT+IGVunqeEi0p2nxKOuWC4lSJ0yb9r+57ElEhPtGhUhhqgOQLfaDooc8VgbtwoJ6zwKDJRvSw+A== X-Received: by 2002:a05:600c:4e12:b0:430:5887:c238 with SMTP id 5b1f17b1804b1-43161628886mr20910405e9.11.1729257260629; Fri, 18 Oct 2024 06:14:20 -0700 (PDT) Received: from dorian.. (sals-04-b2-v4wan-167965-cust660.vm36.cable.virginm.net. [80.3.10.149]) by smtp.gmail.com with ESMTPSA id 5b1f17b1804b1-43160dc9a89sm23577435e9.16.2024.10.18.06.14.20 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 18 Oct 2024 06:14:20 -0700 (PDT) From: Craig Blackmore To: gcc-patches@gcc.gnu.org Cc: Craig Blackmore Subject: [PATCH 3/7] RISC-V: Fix vector memcpy smaller LMUL generation Date: Fri, 18 Oct 2024 14:12:56 +0100 Message-ID: <20241018131300.1150819-4-craig.blackmore@embecosm.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20241018131300.1150819-1-craig.blackmore@embecosm.com> References: <20241018131300.1150819-1-craig.blackmore@embecosm.com> MIME-Version: 1.0 X-Spam-Status: No, score=-11.2 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT, LIKELY_SPAM_BODY, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org If riscv_vector::expand_block_move is generating a straight-line memcpy using a predicated store, it tries to use a smaller LMUL to reduce register pressure if it still allows an entire transfer. This happens in the inner loop of riscv_vector::expand_block_move, however, the vmode chosen by this loop gets overwritten later in the function, so I have added the missing break from the outer loop. I have also addressed a couple of issues with the conditions of the if statement within the inner loop. The first condition did not make sense to me: ``` TARGET_MIN_VLEN * lmul <= nunits * BITS_PER_UNIT ``` I think this was supposed to be checking that the length fits within the given LMUL, so I have changed it to do that. The second condition: ``` /* Avoid loosing the option of using vsetivli . */ && (nunits <= 31 * lmul || nunits > 31 * 8) ``` seems to imply that lmul affects the range of AVL immediate that vsetivli can take but I don't think that is correct. Anyway, I don't think this condition is necessary because if we find a suitable mode we should stick with it, regardless of whether it allowed vsetivli, rather than continuing to try larger lmul which would increase register pressure or smaller potential_ew which would increase AVL. I have removed this condition. gcc/ChangeLog: * config/riscv/riscv-string.cc (expand_block_move): Fix condition for using smaller LMUL. Break outer loop if a suitable vmode has been found. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/vsetvl/pr112929-1.c: Expect smaller lmul. * gcc.target/riscv/rvv/vsetvl/pr112988-1.c: Likewise. * gcc.target/riscv/rvv/base/cpymem-3.c: New test. --- gcc/config/riscv/riscv-string.cc | 8 +- .../gcc.target/riscv/rvv/base/cpymem-3.c | 85 +++++++++++++++++++ .../gcc.target/riscv/rvv/vsetvl/pr112929-1.c | 2 +- .../gcc.target/riscv/rvv/vsetvl/pr112988-1.c | 2 +- 4 files changed, 92 insertions(+), 5 deletions(-) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/cpymem-3.c diff --git a/gcc/config/riscv/riscv-string.cc b/gcc/config/riscv/riscv-string.cc index 0f1353baba3..b590c516354 100644 --- a/gcc/config/riscv/riscv-string.cc +++ b/gcc/config/riscv/riscv-string.cc @@ -1153,9 +1153,7 @@ expand_block_move (rtx dst_in, rtx src_in, rtx length_in) Still, by choosing a lower LMUL factor that still allows an entire transfer, we can reduce register pressure. */ for (unsigned lmul = 1; lmul <= 4; lmul <<= 1) - if (TARGET_MIN_VLEN * lmul <= nunits * BITS_PER_UNIT - /* Avoid loosing the option of using vsetivli . */ - && (nunits <= 31 * lmul || nunits > 31 * 8) + if (length * BITS_PER_UNIT <= TARGET_MIN_VLEN * lmul && multiple_p (BYTES_PER_RISCV_VECTOR * lmul, potential_ew) && (riscv_vector::get_vector_mode (elem_mode, exact_div (BYTES_PER_RISCV_VECTOR * lmul, @@ -1163,6 +1161,10 @@ expand_block_move (rtx dst_in, rtx src_in, rtx length_in) break; } + /* Stop searching if a suitable vmode has been found. */ + if (vmode != VOIDmode) + break; + /* The RVVM8?I modes are notionally 8 * BYTES_PER_RISCV_VECTOR bytes wide. BYTES_PER_RISCV_VECTOR can't be evenly divided by the sizes of larger element types; the LMUL factor of 8 can at diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/cpymem-3.c b/gcc/testsuite/gcc.target/riscv/rvv/base/cpymem-3.c new file mode 100644 index 00000000000..f07078ba6a7 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/cpymem-3.c @@ -0,0 +1,85 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-O1 -fno-schedule-insns -fno-schedule-insns2 -mrvv-max-lmul=m8" } */ +/* { dg-add-options riscv_v } */ +/* { dg-final { check-function-bodies "**" "" } } */ + +#define MIN_VECTOR_BYTES (__riscv_v_min_vlen / 8) + +/* Check that vector memcpy with predicated store uses smaller LMUL where + possible. + +/* m1 +** f1: +** ( +** vsetivli\s+zero,\d+,e8,m1,ta,ma +** | +** li\s+[ta][0-7],\d+ +** vsetvli\s+zero,[ta][0-7],e8,m1,ta,ma +** ) +** vle8.v\s+v\d+,0\(a1\) +** vse8.v\s+v\d+,0\(a0\) +** ret +*/ + +void f1 (char *d, char *s) +{ + __builtin_memcpy (d, s, MIN_VECTOR_BYTES - 1); +} + +/* m2 +** f2: +** ( +** vsetivli\s+zero,\d+,e8,m2,ta,ma +** | +** li\s+[ta][0-7],\d+ +** vsetvli\s+zero,[ta][0-7],e8,m2,ta,ma +** ) +** vle8.v\s+v\d+,0\(a1\) +** vse8.v\s+v\d+,0\(a0\) +** ret +*/ + +void f2 (char *d, char *s) +{ + __builtin_memcpy (d, s, 2 * MIN_VECTOR_BYTES - 1); +} + +/* m4 +** f3: +** ( +** vsetivli\s+zero,\d+,e8,m4,ta,ma +** | +** li\s+[ta][0-7],\d+ +** vsetvli\s+zero,[ta][0-7],e8,m4,ta,ma +** ) +** vle8.v\s+v\d+,0\(a1\) +** vse8.v\s+v\d+,0\(a0\) +** ret +*/ + +void f3 (char *d, char *s) +{ + __builtin_memcpy (d, s, 4 * MIN_VECTOR_BYTES - 1); +} + +/* m8 +** f4: +** ( +** vsetivli\s+zero,\d+,e8,m8,ta,ma +** | +** li\s+[ta][0-7],\d+ +** vsetvli\s+zero,[ta][0-7],e8,m8,ta,ma +** | +** li\s+[ta][0-7],\d+ +** addi\s+[ta][0-7],[ta][0-7],-?\d+ +** vsetvli\s+zero,[ta][0-7],e8,m8,ta,ma +** ) +** vle8.v\s+v\d+,0\(a1\) +** vse8.v\s+v\d+,0\(a0\) +** ret +*/ + +void f4 (char *d, char *s) +{ + __builtin_memcpy (d, s, 8 * MIN_VECTOR_BYTES - 1); +} diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr112929-1.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr112929-1.c index 86d65ddcbab..e55604e1114 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr112929-1.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr112929-1.c @@ -54,5 +54,5 @@ int main() { /* { dg-final { scan-assembler-times {vsetvli} 2 { target { no-opts "-O0" no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" } } } } */ /* { dg-final { scan-assembler-not {vsetivli} } } */ -/* { dg-final { scan-assembler-times {vsetvli\tzero,\s*[a-x0-9]+,\s*e8,\s*m8,\s*t[au],\s*m[au]} 2 { target { no-opts "-O0" no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" } } } } */ +/* { dg-final { scan-assembler-times {vsetvli\tzero,\s*[a-x0-9]+,\s*e8,\s*m2,\s*t[au],\s*m[au]} 2 { target { no-opts "-O0" no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" } } } } */ /* { dg-final { scan-assembler-times {li\t[a-x0-9]+,\s*32} 2 { target { no-opts "-O0" no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" } } } } */ diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr112988-1.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr112988-1.c index 63817f21385..b20e46395aa 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr112988-1.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr112988-1.c @@ -64,5 +64,5 @@ int main() { /* { dg-final { scan-assembler-times {vsetvli} 4 } } */ /* { dg-final { scan-assembler-not {vsetivli} } } */ -/* { dg-final { scan-assembler-times {vsetvli\tzero,\s*[a-x0-9]+,\s*e8,\s*m8,\s*t[au],\s*m[au]} 1 } } */ +/* { dg-final { scan-assembler-times {vsetvli\tzero,\s*[a-x0-9]+,\s*e8,\s*m2,\s*t[au],\s*m[au]} 1 } } */ /* { dg-final { scan-assembler-times {li\t[a-x0-9]+,\s*32} 1 } } */ From patchwork Fri Oct 18 13:12:57 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Craig Blackmore X-Patchwork-Id: 1999111 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=embecosm.com header.i=@embecosm.com header.a=rsa-sha256 header.s=google header.b=QNinxjxg; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=8.43.85.97; helo=server2.sourceware.org; envelope-from=gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [8.43.85.97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4XVQDF3TM5z1xvV for ; Sat, 19 Oct 2024 00:15:08 +1100 (AEDT) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id EF6A63857BB3 for ; Fri, 18 Oct 2024 13:15:04 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mail-wm1-x336.google.com (mail-wm1-x336.google.com [IPv6:2a00:1450:4864:20::336]) by sourceware.org (Postfix) with ESMTPS id C2B193858401 for ; Fri, 18 Oct 2024 13:14:27 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org C2B193858401 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=embecosm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=embecosm.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org C2B193858401 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2a00:1450:4864:20::336 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1729257275; cv=none; b=JrJ6+MkMXER4wSvFFGCH75bLxJ+1JTHroVQjZvd5QsCwOHGR6NNXonv9RYVc2qmZal1BTJ+s1jxSqpMQBsfYyvhE5gC+mpvIv/ELGiAbVcfZZK7zRTiOP8sFVYy+0phjRHoxhJebPmBJg8mcAaThT+S+Pgbsblp3Nd50x8HSSk8= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1729257275; c=relaxed/simple; bh=0uTetZ4HM7W7a8rAf7TU1pDjyPLPrt0Z2VgK5YJ2keQ=; h=DKIM-Signature:From:To:Subject:Date:Message-ID:MIME-Version; b=JLU5tLTqW/MaV+GjDz5nsga/YMYriencSOTUGSyk6aWHR1Hd48BipMVnVfKgGdAIB821IM2rMk3bhpUOOhsX9zxAQgmoF5ZaHEmeoccd0KZMixHjQXJeFVcY2WZUtNQzosf8sx92TLGkG4DCRRPVrnLqD9zB7z8cUK6eaw8sjok= ARC-Authentication-Results: i=1; server2.sourceware.org Received: by mail-wm1-x336.google.com with SMTP id 5b1f17b1804b1-4315abed18aso16363885e9.2 for ; Fri, 18 Oct 2024 06:14:27 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=embecosm.com; s=google; t=1729257266; x=1729862066; darn=gcc.gnu.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=9VJb5cwCE5LbcWQ+5x63yLvDILCkBGGyuW+1oujCLDw=; b=QNinxjxgdAHz0FLlPjMGAhUWFKwlK5LyMhcGEa6DEbWSg4wDskTVY5VUUGnH631h1D 2an1bIaaw9WQ60rbDgrwJpaL0p2yJ27VMtRIX8O2lQCw+3jP+4RX+fusu8MJfp8DzJ81 2JFI/al1kqCDYxYMZwMJHGY896HtXXev2m1O0seLHngdjwLl3xEJHj/S84MJPwPlXG2s bZOCK9l82nZQLDB9aje7f0xgo+ai/ASqbRvX/ki0FtAmv9Kld4M7ngd4CAIpBQlX47nh r3qH53YFiaQXigDjbAhTq+Ahq9DjKSCC+52Zx6Pf8au23ErCyUkyGs/leoTm6neE3g2j 79vw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1729257266; x=1729862066; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=9VJb5cwCE5LbcWQ+5x63yLvDILCkBGGyuW+1oujCLDw=; b=tQpHtvN+gYPVVoUSRR6+MxsefujSB4xWQb82pVwRnCY8LIDnTxgSd9n0KUJiemiS0m L9SopFVg2KEMRYJznsKJz5CeN/Nc0KFCqjyEylWT1H5GbD/U8agcD+PfAEh4FuLyrf8X OCwyeTsSWFFOldsRmZY1Rh4ZovCQjAv1Jr6ilxpAa49OXvMMj6l4boqVJclSf5NAkftJ j3qUoL0pK/GlVo9Pa1bmYIjbqbu/qRe5aCemI9v1OzyGizwOfEISRV9YhcoZXdPSwGtr FOkOJaI2WlM78mXp/D7Hvyrk6aU8XZaa5Kkx/XLAvc8nfWn3nzb77W8aBcWXd6dYMBem lbdg== X-Gm-Message-State: AOJu0YwkPJXL1xCNabEcfyLF0iErOvWu2mH2cBz0+1SKhDtTPtbbO+ZD iqwlLnmg92SFoKnUWs3U/KpSsvBBQbMc3YpDPPoVzLTxsTOEmBm9NxJhZfhv2vcbaBIi/HGexo/ b X-Google-Smtp-Source: AGHT+IGkR0KZbQSqt773nPwC8Qi4TQJPS9irzpmbNANhygTouKwmRqkVR6d2tdqQl0VEzmcF7sL3AQ== X-Received: by 2002:a05:600c:3849:b0:431:52a3:d9ea with SMTP id 5b1f17b1804b1-431615b9cb1mr18645285e9.0.1729257266193; Fri, 18 Oct 2024 06:14:26 -0700 (PDT) Received: from dorian.. (sals-04-b2-v4wan-167965-cust660.vm36.cable.virginm.net. [80.3.10.149]) by smtp.gmail.com with ESMTPSA id 5b1f17b1804b1-43160dc9a89sm23577435e9.16.2024.10.18.06.14.25 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 18 Oct 2024 06:14:25 -0700 (PDT) From: Craig Blackmore To: gcc-patches@gcc.gnu.org Cc: Craig Blackmore Subject: [PATCH 4/7] RISC-V: Honour -mrvv-max-lmul in riscv_vector::expand_block_move Date: Fri, 18 Oct 2024 14:12:57 +0100 Message-ID: <20241018131300.1150819-5-craig.blackmore@embecosm.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20241018131300.1150819-1-craig.blackmore@embecosm.com> References: <20241018131300.1150819-1-craig.blackmore@embecosm.com> MIME-Version: 1.0 X-Spam-Status: No, score=-10.8 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT, LIKELY_SPAM_BODY, RCVD_IN_DNSWL_NONE, SCC_5_SHORT_WORD_LINES, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org Unlike the other vector string ops, expand_block_move was using max LMUL m8 regardless of TARGET_MAX_LMUL. The check for whether to generate inline vector code for movmem has been moved from movmem to riscv_vector::expand_block_move to avoid maintaining multiple versions of similar logic. They already differed on the minimum length for which they would generate vector code. Now that the expand_block_move value is used, movmem will be generated for smaller lengths. Limiting memcpy to m1 caused some memcpy loops to be generated in the calling convention tests which makes it awkward to add suitable scan assembler tests checking the return value being set, so -mrvv-max-lmul=m8 has been added to these tests. Other tests have been adjusted to expect the new memcpy m1 generation where reasonably straight-forward, otherwise -mrvv-max-lmul=m8 has been added. pr111720-[0-9].c regressed because a memcpy loop is generated instead of straight-line. This reveals an existing issue where a redundant straight-line memcpy gets eliminated but a memcpy loop does not (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117205). For example, on pr111720-0.c after this patch: -mrvv-max-lmul=m8: test: lui a5,%hi(.LANCHOR0) li a4,32 addi sp,sp,-32 addi a5,a5,%lo(.LANCHOR0) vsetvli zero,a4,e8,m1,ta,ma vle8.v v8,0(a5) addi sp,sp,32 jr ra -mrvv-max-lmul=m1: test: addi sp,sp,-32 lui a5,%hi(.LANCHOR0) addi a5,a5,%lo(.LANCHOR0) mv a2,sp li a3,32 .L2: vsetvli a4,a3,e8,m1,ta,ma vle8.v v8,0(a5) sub a3,a3,a4 add a5,a5,a4 vse8.v v8,0(a2) add a2,a2,a4 bne a3,zero,.L2 li a5,32 vsetvli zero,a5,e8,m1,ta,ma vle8.v v8,0(sp) addi sp,sp,32 jr ra I have added -mrvv-max-lmul=m8 to pr111720-[0-9].c so that we continue to test the elimination of straight-line memcpy. gcc/ChangeLog: * config/riscv/riscv-protos.h (get_lmul_mode): New prototype. (expand_block_move): Add bool parameter for movmem_p. * config/riscv/riscv-string.cc (riscv_expand_block_move_scalar): Pass movmem_p as false to riscv_vector::expand_block_move. (expand_block_move): Add movmem_p parameter. Return false if loop needed and movmem_p is true. Respect TARGET_MAX_LMUL. * config/riscv/riscv-v.cc (get_lmul_mode): New function. * config/riscv/riscv.md (movmem): Move checking for whether to generate inline vector code to riscv_vector::expand_block_move by passing movmem_p as true. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/pr113206-1.c: Add -mrvv-max-lmul=m8. * gcc.target/riscv/rvv/autovec/pr113206-2.c: Likewise. * gcc.target/riscv/rvv/autovec/vls/calling-convention-1.c: Add -mrvv-max-lmul=m8 and adjust assembly scans. * gcc.target/riscv/rvv/autovec/vls/calling-convention-2.c: Likewise. * gcc.target/riscv/rvv/autovec/vls/calling-convention-3.c: Likewise. * gcc.target/riscv/rvv/autovec/vls/calling-convention-4.c: Likewise. * gcc.target/riscv/rvv/autovec/vls/calling-convention-5.c: Likewise. * gcc.target/riscv/rvv/autovec/vls/calling-convention-6.c: Likewise. * gcc.target/riscv/rvv/autovec/vls/calling-convention-7.c: Likewise. * gcc.target/riscv/rvv/autovec/vls/spill-4.c: Add -mrvv-max-lmul=m8. * gcc.target/riscv/rvv/autovec/vls/spill-7.c: Likewise. * gcc.target/riscv/rvv/base/cpymem-1.c: Expect m1 in f1 and f2. * gcc.target/riscv/rvv/base/cpymem-2.c: Add -mrvv-max-lmul=m8. * gcc.target/riscv/rvv/base/movmem-1.c: Adjust f1 to a length that will not get vectorized. * gcc.target/riscv/rvv/base/pr111720-0.c: Add -mrvv-max-lmul=m8. * gcc.target/riscv/rvv/base/pr111720-1.c: Likewise. * gcc.target/riscv/rvv/base/pr111720-2.c: Likewise. * gcc.target/riscv/rvv/base/pr111720-3.c: Likewise. * gcc.target/riscv/rvv/base/pr111720-4.c: Likewise. * gcc.target/riscv/rvv/base/pr111720-5.c: Likewise. * gcc.target/riscv/rvv/base/pr111720-6.c: Likewise. * gcc.target/riscv/rvv/base/pr111720-7.c: Likewise. * gcc.target/riscv/rvv/base/pr111720-8.c: Likewise. * gcc.target/riscv/rvv/base/pr111720-9.c: Likewise. * gcc.target/riscv/rvv/autovec/pr112929-1.c: Expect memcpy m1 loops. * gcc.target/riscv/rvv/autovec/pr112988-1.c: Likewise. --- gcc/config/riscv/riscv-protos.h | 3 +- gcc/config/riscv/riscv-string.cc | 65 +++++++++++-------- gcc/config/riscv/riscv-v.cc | 12 ++++ gcc/config/riscv/riscv.md | 12 +--- .../gcc.target/riscv/rvv/autovec/pr113206-1.c | 2 +- .../gcc.target/riscv/rvv/autovec/pr113206-2.c | 2 +- .../rvv/autovec/vls/calling-convention-1.c | 11 +--- .../rvv/autovec/vls/calling-convention-2.c | 11 +--- .../rvv/autovec/vls/calling-convention-3.c | 11 +--- .../rvv/autovec/vls/calling-convention-4.c | 8 +-- .../rvv/autovec/vls/calling-convention-5.c | 11 +--- .../rvv/autovec/vls/calling-convention-6.c | 11 +--- .../rvv/autovec/vls/calling-convention-7.c | 8 +-- .../riscv/rvv/autovec/vls/spill-4.c | 2 +- .../riscv/rvv/autovec/vls/spill-7.c | 2 +- .../gcc.target/riscv/rvv/base/cpymem-1.c | 4 +- .../gcc.target/riscv/rvv/base/cpymem-2.c | 2 +- .../gcc.target/riscv/rvv/base/movmem-1.c | 7 +- .../gcc.target/riscv/rvv/base/pr111720-0.c | 2 +- .../gcc.target/riscv/rvv/base/pr111720-1.c | 2 +- .../gcc.target/riscv/rvv/base/pr111720-2.c | 2 +- .../gcc.target/riscv/rvv/base/pr111720-3.c | 2 +- .../gcc.target/riscv/rvv/base/pr111720-4.c | 2 +- .../gcc.target/riscv/rvv/base/pr111720-5.c | 2 +- .../gcc.target/riscv/rvv/base/pr111720-6.c | 2 +- .../gcc.target/riscv/rvv/base/pr111720-7.c | 2 +- .../gcc.target/riscv/rvv/base/pr111720-8.c | 2 +- .../gcc.target/riscv/rvv/base/pr111720-9.c | 2 +- .../gcc.target/riscv/rvv/vsetvl/pr112929-1.c | 6 +- .../gcc.target/riscv/rvv/vsetvl/pr112988-1.c | 6 +- 30 files changed, 95 insertions(+), 121 deletions(-) diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h index 1e6d10a1402..5f6f0cb59dc 100644 --- a/gcc/config/riscv/riscv-protos.h +++ b/gcc/config/riscv/riscv-protos.h @@ -624,6 +624,7 @@ enum mask_policy enum tail_policy get_prefer_tail_policy (); enum mask_policy get_prefer_mask_policy (); rtx get_avl_type_rtx (enum avl_type); +opt_machine_mode get_lmul_mode (scalar_mode, int); opt_machine_mode get_vector_mode (scalar_mode, poly_uint64); opt_machine_mode get_tuple_mode (machine_mode, unsigned int); bool simm5_p (rtx); @@ -672,7 +673,7 @@ bool slide1_sew64_helper (int, machine_mode, machine_mode, machine_mode, rtx *); rtx gen_avl_for_scalar_move (rtx); void expand_tuple_move (rtx *); -bool expand_block_move (rtx, rtx, rtx); +bool expand_block_move (rtx, rtx, rtx, bool); machine_mode preferred_simd_mode (scalar_mode); machine_mode get_mask_mode (machine_mode); void expand_vec_series (rtx, rtx, rtx, rtx = 0); diff --git a/gcc/config/riscv/riscv-string.cc b/gcc/config/riscv/riscv-string.cc index b590c516354..64fd6b29092 100644 --- a/gcc/config/riscv/riscv-string.cc +++ b/gcc/config/riscv/riscv-string.cc @@ -966,7 +966,7 @@ riscv_expand_block_move_scalar (rtx dest, rtx src, rtx length) /* This function delegates block-move expansion to either the vector implementation or the scalar one. Return TRUE if successful or FALSE - otherwise. */ + otherwise. Assume that the memory regions do not overlap. */ bool riscv_expand_block_move (rtx dest, rtx src, rtx length) @@ -974,7 +974,7 @@ riscv_expand_block_move (rtx dest, rtx src, rtx length) if ((TARGET_VECTOR && !TARGET_XTHEADVECTOR) && stringop_strategy & STRATEGY_VECTOR) { - bool ok = riscv_vector::expand_block_move (dest, src, length); + bool ok = riscv_vector::expand_block_move (dest, src, length, false); if (ok) return true; } @@ -1054,7 +1054,7 @@ namespace riscv_vector { /* Used by cpymemsi in riscv.md . */ bool -expand_block_move (rtx dst_in, rtx src_in, rtx length_in) +expand_block_move (rtx dst_in, rtx src_in, rtx length_in, bool movmem_p) { /* memcpy: @@ -1085,10 +1085,9 @@ expand_block_move (rtx dst_in, rtx src_in, rtx length_in) { HOST_WIDE_INT length = INTVAL (length_in); - /* By using LMUL=8, we can copy as many bytes in one go as there - are bits in a vector register. If the entire block thus fits, - we don't need a loop. */ - if (length <= TARGET_MIN_VLEN) + /* If the VLEN and preferred LMUL allow the entire block to be copied in + one go then no loop is needed. */ + if (known_le (length, BYTES_PER_RISCV_VECTOR * TARGET_MAX_LMUL)) { need_loop = false; @@ -1114,19 +1113,32 @@ expand_block_move (rtx dst_in, rtx src_in, rtx length_in) for small element widths, we might allow larger element widths for loops too. */ if (need_loop) - potential_ew = 1; + { + if (movmem_p) + /* Inlining general memmove is a pessimisation: we can't avoid + having to decide which direction to go at runtime, which is + costly in instruction count however for situations where the + entire move fits in one vector operation we can do all reads + before doing any writes so we don't have to worry so generate + the inline vector code in such situations. */ + return false; + potential_ew = 1; + } for (; potential_ew; potential_ew >>= 1) { scalar_int_mode elem_mode; unsigned HOST_WIDE_INT bits = potential_ew * BITS_PER_UNIT; - unsigned HOST_WIDE_INT per_iter; - HOST_WIDE_INT nunits; + poly_uint64 per_iter; + poly_int64 nunits; if (need_loop) - per_iter = TARGET_MIN_VLEN; + per_iter = BYTES_PER_RISCV_VECTOR * TARGET_MAX_LMUL; else per_iter = length; - nunits = per_iter / potential_ew; + /* BYTES_PER_RISCV_VECTOR * TARGET_MAX_LMUL may not be divisible by + this potential_ew. */ + if (!multiple_p (per_iter, potential_ew, &nunits)) + continue; /* Unless we get an implementation that's slow for small element size / non-word-aligned accesses, we assume that the hardware @@ -1137,6 +1149,8 @@ expand_block_move (rtx dst_in, rtx src_in, rtx length_in) if (length % potential_ew != 0 || !int_mode_for_size (bits, 0).exists (&elem_mode)) continue; + + poly_uint64 mode_units; /* Find the mode to use for the copy inside the loop - or the sole copy, if there is no loop. */ if (!need_loop) @@ -1152,12 +1166,12 @@ expand_block_move (rtx dst_in, rtx src_in, rtx length_in) pointless. Still, by choosing a lower LMUL factor that still allows an entire transfer, we can reduce register pressure. */ - for (unsigned lmul = 1; lmul <= 4; lmul <<= 1) - if (length * BITS_PER_UNIT <= TARGET_MIN_VLEN * lmul - && multiple_p (BYTES_PER_RISCV_VECTOR * lmul, potential_ew) + for (unsigned lmul = 1; lmul < TARGET_MAX_LMUL; lmul <<= 1) + if (known_le (length * BITS_PER_UNIT, TARGET_MIN_VLEN * lmul) + && multiple_p (BYTES_PER_RISCV_VECTOR * lmul, potential_ew, + &mode_units) && (riscv_vector::get_vector_mode - (elem_mode, exact_div (BYTES_PER_RISCV_VECTOR * lmul, - potential_ew)).exists (&vmode))) + (elem_mode, mode_units).exists (&vmode))) break; } @@ -1165,15 +1179,12 @@ expand_block_move (rtx dst_in, rtx src_in, rtx length_in) if (vmode != VOIDmode) break; - /* The RVVM8?I modes are notionally 8 * BYTES_PER_RISCV_VECTOR bytes - wide. BYTES_PER_RISCV_VECTOR can't be evenly divided by - the sizes of larger element types; the LMUL factor of 8 can at - the moment be divided by the SEW, with SEW of up to 8 bytes, - but there are reserved encodings so there might be larger - SEW in the future. */ - if (riscv_vector::get_vector_mode - (elem_mode, exact_div (BYTES_PER_RISCV_VECTOR * 8, - potential_ew)).exists (&vmode)) + /* BYTES_PER_RISCV_VECTOR * TARGET_MAX_LMUL will at least be divisible + by potential_ew 1, so this should succeed eventually. */ + if (multiple_p (BYTES_PER_RISCV_VECTOR * TARGET_MAX_LMUL, + potential_ew, &mode_units) + && riscv_vector::get_vector_mode (elem_mode, + mode_units).exists (&vmode)) break; /* We may get here if we tried an element size that's larger than @@ -1186,7 +1197,7 @@ expand_block_move (rtx dst_in, rtx src_in, rtx length_in) } else { - vmode = E_RVVM8QImode; + gcc_assert (get_lmul_mode (QImode, TARGET_MAX_LMUL).exists (&vmode)); } /* A memcpy libcall in the worst case takes 3 instructions to prepare the diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc index ca3a80cceb9..0802a7069a2 100644 --- a/gcc/config/riscv/riscv-v.cc +++ b/gcc/config/riscv/riscv-v.cc @@ -1890,6 +1890,18 @@ get_mask_mode (machine_mode mode) return get_vector_mode (BImode, nunits).require (); } +/* Return the appropriate LMUL mode for MODE. */ + +opt_machine_mode +get_lmul_mode (scalar_mode mode, int lmul) +{ + poly_uint64 lmul_nunits; + unsigned int bytes = GET_MODE_SIZE (mode); + if (multiple_p (BYTES_PER_RISCV_VECTOR * lmul, bytes, &lmul_nunits)) + return get_vector_mode (mode, lmul_nunits); + return E_VOIDmode; +} + /* Return the appropriate M1 mode for MODE. */ static opt_machine_mode diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md index 78112afbb26..c5a38b42301 100644 --- a/gcc/config/riscv/riscv.md +++ b/gcc/config/riscv/riscv.md @@ -2745,12 +2745,6 @@ FAIL; }) -;; Inlining general memmove is a pessimisation: we can't avoid having to decide -;; which direction to go at runtime, which is costly in instruction count -;; however for situations where the entire move fits in one vector operation -;; we can do all reads before doing any writes so we don't have to worry -;; so generate the inline vector code in such situations -;; nb. prefer scalar path for tiny memmoves. (define_expand "movmem" [(parallel [(set (match_operand:BLK 0 "general_operand") (match_operand:BLK 1 "general_operand")) @@ -2758,10 +2752,8 @@ (use (match_operand:SI 3 "const_int_operand"))])] "TARGET_VECTOR" { - if ((INTVAL (operands[2]) >= TARGET_MIN_VLEN / 8) - && (INTVAL (operands[2]) <= TARGET_MIN_VLEN) - && riscv_vector::expand_block_move (operands[0], operands[1], - operands[2])) + if (riscv_vector::expand_block_move (operands[0], operands[1], operands[2], + true)) DONE; else FAIL; diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr113206-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr113206-1.c index ef92c6f35d1..45086182aa8 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr113206-1.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr113206-1.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-march=rv64gcv -mabi=lp64d -O3" } */ +/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -mrvv-max-lmul=m8" } */ signed char e; short f = 8; diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr113206-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr113206-2.c index cfce88988f7..a3c61b467da 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr113206-2.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr113206-2.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -frename-registers" } */ +/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -frename-registers -mrvv-max-lmul=m8" } */ signed char e; short f = 8; diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/calling-convention-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/calling-convention-1.c index 82039f5ac4e..86c2400ce51 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/calling-convention-1.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/calling-convention-1.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-march=rv64gcv_zvl4096b -mrvv-vector-bits=scalable -mabi=lp64d -O3 -fno-schedule-insns2" } */ +/* { dg-options "-march=rv64gcv_zvl4096b -mrvv-vector-bits=scalable -mabi=lp64d -O3 -fno-schedule-insns2 -mrvv-max-lmul=m8" } */ #include "def.h" @@ -143,10 +143,6 @@ DEF_RET1_ARG9 (v1024qi) DEF_RET1_ARG9 (v2048qi) DEF_RET1_ARG9 (v4096qi) -// RET1_ARG0 tests -/* { dg-final { scan-assembler-times {li\s+a[0-1],\s*0} 9 } } */ -/* { dg-final { scan-assembler-times {call\s+memset} 3 } } */ - // v1qi tests: return value (lbu) and function prologue (sb) // 1 lbu per test, argnum sb's when args > 1 /* { dg-final { scan-assembler-times {lbu\s+a0,\s*[0-9]+\(sp\)} 8 } } */ @@ -169,7 +165,4 @@ DEF_RET1_ARG9 (v4096qi) /* { dg-final { scan-assembler-times {sd\s+a[0-7],\s*[0-9]+\(sp\)} 103 } } */ // v32-4096qi tests: return value (vse8.v) -/* { dg-final { scan-assembler-times {vse8.v\s+v[0-9],\s*[0-9]+\(a0\)} 74 } } */ -// v1024-4096qi_ARG1 tests: return value (vse64.v) -// for some reason ARG1 returns using vse64 instead of vse8 -/* { dg-final { scan-assembler-times {vse64.v\s+v[0-9],\s*[0-9]+\(a0\)\s+ret} 3 } } */ +/* { dg-final { scan-assembler-times {vse8.v\s+v[0-9],\s*[0-9]+\(a0\)} 80 } } */ diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/calling-convention-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/calling-convention-2.c index af52b703986..c489a9ff796 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/calling-convention-2.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/calling-convention-2.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-march=rv64gcv_zvl4096b -mrvv-vector-bits=scalable -mabi=lp64d -O3 -fno-schedule-insns2" } */ +/* { dg-options "-march=rv64gcv_zvl4096b -mrvv-vector-bits=scalable -mabi=lp64d -O3 -fno-schedule-insns2 -mrvv-max-lmul=m8" } */ #include "def.h" @@ -133,10 +133,6 @@ DEF_RET1_ARG9 (v512hi) DEF_RET1_ARG9 (v1024hi) DEF_RET1_ARG9 (v2048hi) -// RET1_ARG0 tests -/* { dg-final { scan-assembler-times {li\s+a[0-1],\s*0} 8 } } */ -/* { dg-final { scan-assembler-times {call\s+memset} 3 } } */ - // v1hi tests: return value (lhu) and function prologue (sh) // 1 lhu per test, argnum sh's when args > 1 /* { dg-final { scan-assembler-times {lhu\s+a0,\s*[0-9]+\(sp\)} 8 } } */ @@ -155,7 +151,4 @@ DEF_RET1_ARG9 (v2048hi) /* { dg-final { scan-assembler-times {sd\s+a[0-7],\s*[0-9]+\(sp\)} 103 } } */ // v16-2048hi tests: return value (vse16.v) -/* { dg-final { scan-assembler-times {vse16.v\s+v[0-9],\s*[0-9]+\(a0\)} 74 } } */ -// v512-2048qi_ARG1 tests: return value (vse64.v) -// for some reason ARG1 returns using vse64 instead of vse16 -/* { dg-final { scan-assembler-times {vse64.v\s+v[0-9],\s*[0-9]+\(a0\)\s+ret} 3 } } */ +/* { dg-final { scan-assembler-times {vse16.v\s+v[0-9],\s*[0-9]+\(a0\)} 80 } } */ diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/calling-convention-3.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/calling-convention-3.c index 01c5a1a1ba2..97a3282a657 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/calling-convention-3.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/calling-convention-3.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-march=rv64gcv_zvl4096b -mrvv-vector-bits=scalable -mabi=lp64d -O3 -fno-schedule-insns2" } */ +/* { dg-options "-march=rv64gcv_zvl4096b -mrvv-vector-bits=scalable -mabi=lp64d -O3 -fno-schedule-insns2 -mrvv-max-lmul=m8" } */ #include "def.h" @@ -123,10 +123,6 @@ DEF_RET1_ARG9 (v256si) DEF_RET1_ARG9 (v512si) DEF_RET1_ARG9 (v1024si) -// RET1_ARG0 tests -/* { dg-final { scan-assembler-times {li\s+a[0-1],\s*0} 7 } } */ -/* { dg-final { scan-assembler-times {call\s+memset} 3 } } */ - // v1si tests: return value (lw) and function prologue (sw) // 1 lw per test, argnum sw's when args > 1 /* { dg-final { scan-assembler-times {lw\s+a0,\s*[0-9]+\(sp\)} 8 } } */ @@ -140,7 +136,4 @@ DEF_RET1_ARG9 (v1024si) /* { dg-final { scan-assembler-times {sd\s+a[0-7],\s*[0-9]+\(sp\)} 103 } } */ // v8-1024si tests: return value (vse32.v) -/* { dg-final { scan-assembler-times {vse32.v\s+v[0-9],\s*[0-9]+\(a0\)} 74 } } */ -// 256-1024si tests: return value (vse64.v) -// for some reason ARG1 returns using vse64 instead of vse32 -/* { dg-final { scan-assembler-times {vse64.v\s+v[0-9],\s*[0-9]+\(a0\)\s+ret} 3 } } */ +/* { dg-final { scan-assembler-times {vse32.v\s+v[0-9],\s*[0-9]+\(a0\)} 80 } } */ diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/calling-convention-4.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/calling-convention-4.c index 2c01aa8c260..a892919feb7 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/calling-convention-4.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/calling-convention-4.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-march=rv64gcv_zvl4096b -mrvv-vector-bits=scalable -mabi=lp64d -O3 -fno-schedule-insns2" } */ +/* { dg-options "-march=rv64gcv_zvl4096b -mrvv-vector-bits=scalable -mabi=lp64d -O3 -fno-schedule-insns2 -mrvv-max-lmul=m8" } */ #include "def.h" @@ -113,10 +113,6 @@ DEF_RET1_ARG9 (v128di) DEF_RET1_ARG9 (v256di) DEF_RET1_ARG9 (v512di) -// RET1_ARG0 tests -/* { dg-final { scan-assembler-times {li\s+a[0-1],\s*0} 6 } } */ -/* { dg-final { scan-assembler-times {call\s+memset} 3 } } */ - // v1di and v2di tests: return value (ld) and function prologue (sd) // - 1 ld per v1di and 2 ld per v2di with args > 1 // - argnum sd's per v1di when argnum > 1 @@ -125,4 +121,4 @@ DEF_RET1_ARG9 (v512di) /* { dg-final { scan-assembler-times {sd\s+a[0-7],\s*[0-9]+\(sp\)} 103 } } */ // v4-512di tests: return value (vse64.v) -/* { dg-final { scan-assembler-times {vse64.v\s+v[0-9],\s*[0-9]+\(a0\)} 77 } } */ +/* { dg-final { scan-assembler-times {vse64.v\s+v[0-9],\s*[0-9]+\(a0\)} 80 } } */ diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/calling-convention-5.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/calling-convention-5.c index 98d6d4a758a..0d40349fb0f 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/calling-convention-5.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/calling-convention-5.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-march=rv64gcv_zvfh_zvl4096b -mrvv-vector-bits=scalable -mabi=lp64d -O3 -fno-schedule-insns2" } */ +/* { dg-options "-march=rv64gcv_zvfh_zvl4096b -mrvv-vector-bits=scalable -mabi=lp64d -O3 -fno-schedule-insns2 -mrvv-max-lmul=m8" } */ #include "def.h" @@ -133,10 +133,6 @@ DEF_RET1_ARG9 (v512hf) DEF_RET1_ARG9 (v1024hf) DEF_RET1_ARG9 (v2048hf) -// RET1_ARG0 tests -/* { dg-final { scan-assembler-times {li\s+a[0-1],\s*0} 8 } } */ -/* { dg-final { scan-assembler-times {call\s+memset} 3 } } */ - // v1hf tests: return value (lhu) and function prologue (sh) // 1 lhu per test, argnum sh's when args > 1 /* { dg-final { scan-assembler-times {lhu\s+a[0-1],\s*[0-9]+\(sp\)} 8 } } */ @@ -155,7 +151,4 @@ DEF_RET1_ARG9 (v2048hf) /* { dg-final { scan-assembler-times {sd\s+a[0-7],\s*[0-9]+\(sp\)} 103 } } */ // v16-2048hf tests: return value (vse16.v) -/* { dg-final { scan-assembler-times {vse16.v\s+v[0-9],\s*[0-9]+\(a0\)} 74 } } */ -// v512-2048qf_ARG1 tests: return value (vse64.v) -// for some reason ARG1 returns using vse64 instead of vse16 -/* { dg-final { scan-assembler-times {vse64.v\s+v[0-9],\s*[0-9]+\(a0\)\s+ret} 3 } } */ +/* { dg-final { scan-assembler-times {vse16.v\s+v[0-9],\s*[0-9]+\(a0\)} 80 } } */ diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/calling-convention-6.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/calling-convention-6.c index 5f59f001969..8b5a779467d 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/calling-convention-6.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/calling-convention-6.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-march=rv64gcv_zvl4096b -mrvv-vector-bits=scalable -mabi=lp64d -O3 -fno-schedule-insns2" } */ +/* { dg-options "-march=rv64gcv_zvl4096b -mrvv-vector-bits=scalable -mabi=lp64d -O3 -fno-schedule-insns2 -mrvv-max-lmul=m8" } */ #include "def.h" @@ -123,10 +123,6 @@ DEF_RET1_ARG9 (v256sf) DEF_RET1_ARG9 (v512sf) DEF_RET1_ARG9 (v1024sf) -// RET1_ARG0 tests -/* { dg-final { scan-assembler-times {li\s+a[0-1],\s*0} 7 } } */ -/* { dg-final { scan-assembler-times {call\s+memset} 3 } } */ - // v1sf tests: return value (lw) and function prologue (sw) // 1 lw per test, argnum sw's when args > 1 /* { dg-final { scan-assembler-times {lw\s+a[0-1],\s*[0-9]+\(sp\)} 8 } } */ @@ -140,7 +136,4 @@ DEF_RET1_ARG9 (v1024sf) /* { dg-final { scan-assembler-times {sd\s+a[0-7],\s*[0-9]+\(sp\)} 103 } } */ // v8-1024sf tests: return value (vse32.v) -/* { dg-final { scan-assembler-times {vse32.v\s+v[0-9],\s*[0-9]+\(a0\)} 74 } } */ -// 256-1024sf tests: return value (vse64.v) -// for some reason ARG1 returns using vse64 instead of vse32 -/* { dg-final { scan-assembler-times {vse64.v\s+v[0-9],\s*[0-9]+\(a0\)\s+ret} 3 } } */ +/* { dg-final { scan-assembler-times {vse32.v\s+v[0-9],\s*[0-9]+\(a0\)} 80 } } */ diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/calling-convention-7.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/calling-convention-7.c index 1d427fd08d6..3ba4e1f1864 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/calling-convention-7.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/calling-convention-7.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-march=rv64gcv_zvl4096b -mrvv-vector-bits=scalable -mabi=lp64d -O3 -fno-schedule-insns2" } */ +/* { dg-options "-march=rv64gcv_zvl4096b -mrvv-vector-bits=scalable -mabi=lp64d -O3 -fno-schedule-insns2 -mrvv-max-lmul=m8" } */ #include "def.h" @@ -113,10 +113,6 @@ DEF_RET1_ARG9 (v128df) DEF_RET1_ARG9 (v256df) DEF_RET1_ARG9 (v512df) -// RET1_ARG0 tests -/* { dg-final { scan-assembler-times {li\s+a[0-1],\s*0} 6 } } */ -/* { dg-final { scan-assembler-times {call\s+memset} 3 } } */ - // v1df and v2df tests: return value (ld) and function prologue (sd) // - 1 ld per v1df and 2 ld per v2df with args > 1 // - argnum sd's per v1df when argnum > 1 @@ -125,4 +121,4 @@ DEF_RET1_ARG9 (v512df) /* { dg-final { scan-assembler-times {sd\s+a[0-7],\s*[0-9]+\(sp\)} 103 } } */ // v4-512df tests: return value (vse64.v) -/* { dg-final { scan-assembler-times {vse64.v\s+v[0-9],\s*[0-9]+\(a0\)} 77 } } */ +/* { dg-final { scan-assembler-times {vse64.v\s+v[0-9],\s*[0-9]+\(a0\)} 80 } } */ diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/spill-4.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/spill-4.c index 1faf31ffd8e..1df83847363 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/spill-4.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/spill-4.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-march=rv64gcv_zvfh_zvl4096b -mabi=lp64d -O3 -fno-schedule-insns -fno-schedule-insns2" } */ +/* { dg-options "-march=rv64gcv_zvfh_zvl4096b -mabi=lp64d -O3 -fno-schedule-insns -fno-schedule-insns2 -mrvv-max-lmul=m8" } */ #include "def.h" diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/spill-7.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/spill-7.c index e3980a29540..74b7b699f1a 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/spill-7.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/spill-7.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-march=rv64gcv_zvfh_zvl4096b -mabi=lp64d -O3 -fno-schedule-insns -fno-schedule-insns2" } */ +/* { dg-options "-march=rv64gcv_zvfh_zvl4096b -mabi=lp64d -O3 -fno-schedule-insns -fno-schedule-insns2 -mrvv-max-lmul=m8" } */ #include "def.h" diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/cpymem-1.c b/gcc/testsuite/gcc.target/riscv/rvv/base/cpymem-1.c index 0699cb78dd5..6edb4c9253a 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/base/cpymem-1.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/cpymem-1.c @@ -12,7 +12,7 @@ extern void *memcpy(void *__restrict dest, const void *__restrict src, __SIZE_TY /* memcpy should be implemented using the cpymem pattern. ** f1: XX \.L\d+: # local label is ignored -** vsetvli\s+[ta][0-7],a2,e8,m8,ta,ma +** vsetvli\s+[ta][0-7],a2,e8,m1,ta,ma ** vle8\.v\s+v\d+,0\(a1\) ** vse8\.v\s+v\d+,0\(a0\) ** add\s+a1,a1,[ta][0-7] @@ -31,7 +31,7 @@ void f1 (void *a, void *b, __SIZE_TYPE__ l) overflow is undefined. ** f2: XX \.L\d+: # local label is ignored -** vsetvli\s+[ta][0-7],a2,e8,m8,ta,ma +** vsetvli\s+[ta][0-7],a2,e8,m1,ta,ma ** vle8\.v\s+v\d+,0\(a1\) ** vse8\.v\s+v\d+,0\(a0\) ** add\s+a1,a1,[ta][0-7] diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/cpymem-2.c b/gcc/testsuite/gcc.target/riscv/rvv/base/cpymem-2.c index 6a854c87cd0..7b6a429f34c 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/base/cpymem-2.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/cpymem-2.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-additional-options "-O1 -fno-schedule-insns -fno-schedule-insns2" } */ +/* { dg-additional-options "-O1 -fno-schedule-insns -fno-schedule-insns2 -mrvv-max-lmul=m8" } */ /* { dg-add-options riscv_v } */ /* { dg-final { check-function-bodies "**" "" } } */ diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/movmem-1.c b/gcc/testsuite/gcc.target/riscv/rvv/base/movmem-1.c index d9d4a70a392..1f148bc7052 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/base/movmem-1.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/movmem-1.c @@ -7,13 +7,14 @@ /* Tiny memmoves should not be vectorised. ** f1: -** li\s+a2,\d+ -** tail\s+memmove +** lbu\s+[ta][0-7],0\(a1\) +** sb\s+[ta][0-7],0\(a0\) +** ret */ char * f1 (char *a, char const *b) { - return __builtin_memmove (a, b, MIN_VECTOR_BYTES - 1); + return __builtin_memmove (a, b, 1); } /* Vectorise+inline minimum vector register width with LMUL=1 diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr111720-0.c b/gcc/testsuite/gcc.target/riscv/rvv/base/pr111720-0.c index 8265105f4eb..7e40ac583bb 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/base/pr111720-0.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr111720-0.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O3 -march=rv64gcv -mabi=lp64d -ftree-vectorize -mrvv-vector-bits=zvl" } */ +/* { dg-options "-O3 -march=rv64gcv -mabi=lp64d -ftree-vectorize -mrvv-vector-bits=zvl -mrvv-max-lmul=m8" } */ #include "riscv_vector.h" diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr111720-1.c b/gcc/testsuite/gcc.target/riscv/rvv/base/pr111720-1.c index 682d3e9cb7e..c5be5b1d28e 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/base/pr111720-1.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr111720-1.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O3 -march=rv64gcv -mabi=lp64d -ftree-vectorize -mrvv-vector-bits=zvl" } */ +/* { dg-options "-O3 -march=rv64gcv -mabi=lp64d -ftree-vectorize -mrvv-vector-bits=zvl -mrvv-max-lmul=m8" } */ #include "riscv_vector.h" diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr111720-2.c b/gcc/testsuite/gcc.target/riscv/rvv/base/pr111720-2.c index 73a9f51a16b..8f66d9670f3 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/base/pr111720-2.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr111720-2.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O3 -march=rv64gcv -mabi=lp64d -ftree-vectorize -mrvv-vector-bits=zvl" } */ +/* { dg-options "-O3 -march=rv64gcv -mabi=lp64d -ftree-vectorize -mrvv-vector-bits=zvl -mrvv-max-lmul=m8" } */ #include "riscv_vector.h" diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr111720-3.c b/gcc/testsuite/gcc.target/riscv/rvv/base/pr111720-3.c index bec9b28008d..3e23ae717ae 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/base/pr111720-3.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr111720-3.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O3 -march=rv64gcv -mabi=lp64d -ftree-vectorize -mrvv-vector-bits=zvl" } */ +/* { dg-options "-O3 -march=rv64gcv -mabi=lp64d -ftree-vectorize -mrvv-vector-bits=zvl -mrvv-max-lmul=m8" } */ #include "riscv_vector.h" diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr111720-4.c b/gcc/testsuite/gcc.target/riscv/rvv/base/pr111720-4.c index c8978052b91..11cdc74ec72 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/base/pr111720-4.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr111720-4.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O3 -march=rv64gcv -mabi=lp64d -ftree-vectorize -mrvv-vector-bits=zvl" } */ +/* { dg-options "-O3 -march=rv64gcv -mabi=lp64d -ftree-vectorize -mrvv-vector-bits=zvl -mrvv-max-lmul=m8" } */ #include "riscv_vector.h" diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr111720-5.c b/gcc/testsuite/gcc.target/riscv/rvv/base/pr111720-5.c index 5604ca280fe..7a5d04e3c5c 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/base/pr111720-5.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr111720-5.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O3 -march=rv64gcv -mabi=lp64d -ftree-vectorize -mrvv-vector-bits=zvl" } */ +/* { dg-options "-O3 -march=rv64gcv -mabi=lp64d -ftree-vectorize -mrvv-vector-bits=zvl -mrvv-max-lmul=m8" } */ #include "riscv_vector.h" diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr111720-6.c b/gcc/testsuite/gcc.target/riscv/rvv/base/pr111720-6.c index 9c6484479cf..ba96b340279 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/base/pr111720-6.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr111720-6.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O3 -march=rv64gcv -mabi=lp64d -ftree-vectorize -mrvv-vector-bits=zvl" } */ +/* { dg-options "-O3 -march=rv64gcv -mabi=lp64d -ftree-vectorize -mrvv-vector-bits=zvl -mrvv-max-lmul=m8" } */ #include "riscv_vector.h" diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr111720-7.c b/gcc/testsuite/gcc.target/riscv/rvv/base/pr111720-7.c index 0bb2260cf1c..c0e8d6f1b39 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/base/pr111720-7.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr111720-7.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O3 -march=rv64gcv -mabi=lp64d -ftree-vectorize -mrvv-vector-bits=zvl" } */ +/* { dg-options "-O3 -march=rv64gcv -mabi=lp64d -ftree-vectorize -mrvv-vector-bits=zvl -mrvv-max-lmul=m8" } */ #include "riscv_vector.h" diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr111720-8.c b/gcc/testsuite/gcc.target/riscv/rvv/base/pr111720-8.c index 1ad588ff8ad..91743009639 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/base/pr111720-8.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr111720-8.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O3 -march=rv64gcv -mabi=lp64d -ftree-vectorize -mrvv-vector-bits=zvl" } */ +/* { dg-options "-O3 -march=rv64gcv -mabi=lp64d -ftree-vectorize -mrvv-vector-bits=zvl -mrvv-max-lmul=m8" } */ #include "riscv_vector.h" diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr111720-9.c b/gcc/testsuite/gcc.target/riscv/rvv/base/pr111720-9.c index 5b28863b6ad..ac7ec74472d 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/base/pr111720-9.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr111720-9.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O3 -march=rv64gcv -mabi=lp64d -ftree-vectorize -mrvv-vector-bits=zvl" } */ +/* { dg-options "-O3 -march=rv64gcv -mabi=lp64d -ftree-vectorize -mrvv-vector-bits=zvl -mrvv-max-lmul=m8" } */ #include "riscv_vector.h" diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr112929-1.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr112929-1.c index e55604e1114..9ab04b07c12 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr112929-1.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr112929-1.c @@ -52,7 +52,7 @@ int main() { printf("%d\n", m); } -/* { dg-final { scan-assembler-times {vsetvli} 2 { target { no-opts "-O0" no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" } } } } */ +/* { dg-final { scan-assembler-times {vsetvli} 3 { target { no-opts "-O0" no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" } } } } */ /* { dg-final { scan-assembler-not {vsetivli} } } */ -/* { dg-final { scan-assembler-times {vsetvli\tzero,\s*[a-x0-9]+,\s*e8,\s*m2,\s*t[au],\s*m[au]} 2 { target { no-opts "-O0" no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" } } } } */ -/* { dg-final { scan-assembler-times {li\t[a-x0-9]+,\s*32} 2 { target { no-opts "-O0" no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" } } } } */ +/* { dg-final { scan-assembler-times {vsetvli\t[a-x0-9]+,\s*[a-x0-9]+,\s*e8,\s*m1,\s*t[au],\s*m[au]} 3 { target { no-opts "-O0" no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" } } } } */ +/* { dg-final { scan-assembler-times {li\t[a-x0-9]+,\s*32} 3 { target { no-opts "-O0" no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" } } } } */ diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr112988-1.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr112988-1.c index b20e46395aa..1facfd55d79 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr112988-1.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr112988-1.c @@ -62,7 +62,7 @@ int main() { return 0; } -/* { dg-final { scan-assembler-times {vsetvli} 4 } } */ +/* { dg-final { scan-assembler-times {vsetvli} 5 } } */ /* { dg-final { scan-assembler-not {vsetivli} } } */ -/* { dg-final { scan-assembler-times {vsetvli\tzero,\s*[a-x0-9]+,\s*e8,\s*m2,\s*t[au],\s*m[au]} 1 } } */ -/* { dg-final { scan-assembler-times {li\t[a-x0-9]+,\s*32} 1 } } */ +/* { dg-final { scan-assembler-times {vsetvli\t[a-x0-9]+,\s*[a-x0-9]+,\s*e8,\s*m1,\s*t[au],\s*m[au]} 2 } } */ +/* { dg-final { scan-assembler-times {li\t[a-x0-9]+,\s*32} 2 } } */ From patchwork Fri Oct 18 13:12:58 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Craig Blackmore X-Patchwork-Id: 1999115 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=embecosm.com header.i=@embecosm.com header.a=rsa-sha256 header.s=google header.b=EUBZGrQK; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org; envelope-from=gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4XVQGd48Q1z1xth for ; Sat, 19 Oct 2024 00:17:13 +1100 (AEDT) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id B9C503858282 for ; Fri, 18 Oct 2024 13:17:11 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mail-wm1-x32d.google.com (mail-wm1-x32d.google.com [IPv6:2a00:1450:4864:20::32d]) by sourceware.org (Postfix) with ESMTPS id DF6323858429 for ; Fri, 18 Oct 2024 13:14:30 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org DF6323858429 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=embecosm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=embecosm.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org DF6323858429 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2a00:1450:4864:20::32d ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1729257275; cv=none; b=fA8/7Nv2qiA5Mnuo3M5Ih0GlrqrL1h+rTGuAM97CAE0/0rVPbiXc08wVm67FXISLyJHaMCo3c2pRai5HzS+CcJYiqhrywxGgApLVwMFTsVHosvkLT4UPRwcTDzhS7YlI+ms89n0KgewgQ+T+Nh64iV4Ty/j5s4Z8Vdqeem0Fn3M= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1729257275; c=relaxed/simple; bh=A/kZslWybJKaPzqCtHS4bd1xCZtPhWs+mkIuJKyozkA=; h=DKIM-Signature:From:To:Subject:Date:Message-ID:MIME-Version; b=qFluKIctC1SAB80GvGELaSbMV39cYylFycNRkcu9OxAyycrkgstR2dQJ8fUWPepfe1ITMFjNCT+tI/vSTJBuQVvN+GSoaK+7EQISqiXQmCEIxjVLWvyNJvOtLnJuL2TKqyG2z0dGdDTb67Unm/tUb/lj9qPQefsl/i8/yRb/gZ8= ARC-Authentication-Results: i=1; server2.sourceware.org Received: by mail-wm1-x32d.google.com with SMTP id 5b1f17b1804b1-4315abed18aso16364335e9.2 for ; Fri, 18 Oct 2024 06:14:30 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=embecosm.com; s=google; t=1729257269; x=1729862069; darn=gcc.gnu.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=11KfAi+WfuuzV6ZNe263VtwD3S8c/IOhbDbvWOtQNnc=; b=EUBZGrQKtnjtqaa9e3ZEMaTY8BGhBBnmoTfxU4u16/ejkHqeqkdOh5q0OlaApjKMom PGrrwy8Drtpasqh2WCeRignzhrvdiQkgUT4UoByovwUj46MqMCS23NCJsSow0Au9/4of TAPMHiJooTOCv6X3t9/28eEZN/waJgytGBSgOisu2B0sCPCxwi11ht+OekLJaUxs/JlR 739H0QkA1MZU0G3KE96Gbrn637W+pHQ+N26bis48p9EtujEYe0BSx8ougm91FMuT+u5L uZuZWsGmKtxipOnSxUhdqVG/slBm8seobSq/C7A0qCk2AiH6ajUulvhNIvv5LMMpqNaQ KTDQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1729257269; x=1729862069; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=11KfAi+WfuuzV6ZNe263VtwD3S8c/IOhbDbvWOtQNnc=; b=NDyItMhJvjgtbDNLMXJXVRaKuNgHCYXijrZbS+LOh3SUYo44r7B8xfSceBzVzw/5zI 9k+X9Sq1hC/iUSuvjOg9tNxXj6A1VaR/1bINBp3oU40epMylQLF7l48efdT+yBFB2bwK wYGva246FURJ7QsjQdVINILTZMRf3nvzXGjr4P90oJZy8IuXVPxBfUG9qSCz40NBuYz/ fSieevx+92eTomnkuQ7SRbLvOgW3lS1aSoelAdFUfPhd5jvpjVnF8rfZWmr7jVoIZIMy XFUyrRh3UEshYrUqTCmgfMM4sW1T4OzL4coPb+kvrkjvMAxU6NSMW3sqkMVjbX5OtSZn GKuQ== X-Gm-Message-State: AOJu0YxwxZNQw1toBIJNSycRFjZApiO7j677Dp+hsXKuEco38DBtn+2H s0ha6+Z+kMqKCLGhaftgVgi9kh37XrVB5KNHP30dOAz+Of9GTAs0LAOQr+Ur1iHYC6dLeI0ts9a 4 X-Google-Smtp-Source: AGHT+IEkithmcJYGT/ASW6ofQu9WlmqMnM2QW+MuKoChYLXpJAaCSXrUjamemd6oe2fJ1UfhOgUfjg== X-Received: by 2002:a05:600c:5120:b0:431:5533:8f0b with SMTP id 5b1f17b1804b1-431616a3a96mr17901745e9.32.1729257269363; Fri, 18 Oct 2024 06:14:29 -0700 (PDT) Received: from dorian.. (sals-04-b2-v4wan-167965-cust660.vm36.cable.virginm.net. [80.3.10.149]) by smtp.gmail.com with ESMTPSA id 5b1f17b1804b1-43160dc9a89sm23577435e9.16.2024.10.18.06.14.28 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 18 Oct 2024 06:14:29 -0700 (PDT) From: Craig Blackmore To: gcc-patches@gcc.gnu.org Cc: Craig Blackmore Subject: [PATCH 5/7] RISC-V: Move vector memcpy decision making to separate function [NFC] Date: Fri, 18 Oct 2024 14:12:58 +0100 Message-ID: <20241018131300.1150819-6-craig.blackmore@embecosm.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20241018131300.1150819-1-craig.blackmore@embecosm.com> References: <20241018131300.1150819-1-craig.blackmore@embecosm.com> MIME-Version: 1.0 X-Spam-Status: No, score=-12.3 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org This moves the code for deciding whether to generate a vectorized memcpy, what vector mode to use and whether a loop is needed out of riscv_vector::expand_block_move and into a new function riscv_vector::use_stringop_p so that it can be reused for other string operations. gcc/ChangeLog: * config/riscv/riscv-string.cc (struct stringop_info): New. (expand_block_move): Move decision making code to... (use_vector_stringop_p): ...here. --- gcc/config/riscv/riscv-string.cc | 143 +++++++++++++++++++------------ 1 file changed, 87 insertions(+), 56 deletions(-) diff --git a/gcc/config/riscv/riscv-string.cc b/gcc/config/riscv/riscv-string.cc index 64fd6b29092..118c02a4021 100644 --- a/gcc/config/riscv/riscv-string.cc +++ b/gcc/config/riscv/riscv-string.cc @@ -1051,35 +1051,31 @@ riscv_expand_block_clear (rtx dest, rtx length) namespace riscv_vector { -/* Used by cpymemsi in riscv.md . */ +struct stringop_info { + rtx avl; + bool need_loop; + machine_mode vmode; +}; -bool -expand_block_move (rtx dst_in, rtx src_in, rtx length_in, bool movmem_p) -{ - /* - memcpy: - mv a3, a0 # Copy destination - loop: - vsetvli t0, a2, e8, m8, ta, ma # Vectors of 8b - vle8.v v0, (a1) # Load bytes - add a1, a1, t0 # Bump pointer - sub a2, a2, t0 # Decrement count - vse8.v v0, (a3) # Store bytes - add a3, a3, t0 # Bump pointer - bnez a2, loop # Any more? - ret # Return - */ - gcc_assert (TARGET_VECTOR); +/* If a vectorized stringop should be used populate INFO and return TRUE. + Otherwise return false and leave INFO unchanged. - HOST_WIDE_INT potential_ew - = (MIN (MIN (MEM_ALIGN (src_in), MEM_ALIGN (dst_in)), BITS_PER_WORD) - / BITS_PER_UNIT); - machine_mode vmode = VOIDmode; + MAX_EW is the maximum element width that the caller wants to use and + LENGTH_IN is the length of the stringop in bytes. +*/ + +static bool +use_vector_stringop_p (struct stringop_info &info, HOST_WIDE_INT max_ew, + rtx length_in) +{ bool need_loop = true; - bool size_p = optimize_function_for_size_p (cfun); - rtx src, dst; - rtx vec; - rtx length_rtx = length_in; + machine_mode vmode = VOIDmode; + /* The number of elements in the stringop. */ + rtx avl = length_in; + HOST_WIDE_INT potential_ew = max_ew; + + if (!TARGET_VECTOR || !(stringop_strategy & STRATEGY_VECTOR)) + return false; if (CONST_INT_P (length_in)) { @@ -1113,17 +1109,7 @@ expand_block_move (rtx dst_in, rtx src_in, rtx length_in, bool movmem_p) for small element widths, we might allow larger element widths for loops too. */ if (need_loop) - { - if (movmem_p) - /* Inlining general memmove is a pessimisation: we can't avoid - having to decide which direction to go at runtime, which is - costly in instruction count however for situations where the - entire move fits in one vector operation we can do all reads - before doing any writes so we don't have to worry so generate - the inline vector code in such situations. */ - return false; - potential_ew = 1; - } + potential_ew = 1; for (; potential_ew; potential_ew >>= 1) { scalar_int_mode elem_mode; @@ -1193,7 +1179,7 @@ expand_block_move (rtx dst_in, rtx src_in, rtx length_in, bool movmem_p) gcc_assert (potential_ew > 1); } if (potential_ew > 1) - length_rtx = GEN_INT (length / potential_ew); + avl = GEN_INT (length / potential_ew); } else { @@ -1203,35 +1189,80 @@ expand_block_move (rtx dst_in, rtx src_in, rtx length_in, bool movmem_p) /* A memcpy libcall in the worst case takes 3 instructions to prepare the arguments + 1 for the call. When RVV should take 7 instructions and we're optimizing for size a libcall may be preferable. */ - if (size_p && need_loop) + if (optimize_function_for_size_p (cfun) && need_loop) return false; - /* length_rtx holds the (remaining) length of the required copy. + info.need_loop = need_loop; + info.vmode = vmode; + info.avl = avl; + return true; +} + +/* Used by cpymemsi in riscv.md . */ + +bool +expand_block_move (rtx dst_in, rtx src_in, rtx length_in, bool movmem_p) +{ + /* + memcpy: + mv a3, a0 # Copy destination + loop: + vsetvli t0, a2, e8, m8, ta, ma # Vectors of 8b + vle8.v v0, (a1) # Load bytes + add a1, a1, t0 # Bump pointer + sub a2, a2, t0 # Decrement count + vse8.v v0, (a3) # Store bytes + add a3, a3, t0 # Bump pointer + bnez a2, loop # Any more? + ret # Return + */ + struct stringop_info info; + + HOST_WIDE_INT potential_ew + = (MIN (MIN (MEM_ALIGN (src_in), MEM_ALIGN (dst_in)), BITS_PER_WORD) + / BITS_PER_UNIT); + + if (!use_vector_stringop_p (info, potential_ew, length_in)) + return false; + + /* Inlining general memmove is a pessimisation: we can't avoid having to + decide which direction to go at runtime, which is costly in instruction + count however for situations where the entire move fits in one vector + operation we can do all reads before doing any writes so we don't have to + worry so generate the inline vector code in such situations. */ + if (info.need_loop && movmem_p) + return false; + + rtx src, dst; + rtx vec; + + /* avl holds the (remaining) length of the required copy. cnt holds the length we copy with the current load/store pair. */ - rtx cnt = length_rtx; + rtx cnt = info.avl; rtx label = NULL_RTX; rtx dst_addr = copy_addr_to_reg (XEXP (dst_in, 0)); rtx src_addr = copy_addr_to_reg (XEXP (src_in, 0)); - if (need_loop) + if (info.need_loop) { - length_rtx = copy_to_mode_reg (Pmode, length_rtx); + info.avl = copy_to_mode_reg (Pmode, info.avl); cnt = gen_reg_rtx (Pmode); label = gen_label_rtx (); emit_label (label); - emit_insn (riscv_vector::gen_no_side_effects_vsetvl_rtx (vmode, cnt, - length_rtx)); + emit_insn (riscv_vector::gen_no_side_effects_vsetvl_rtx (info.vmode, cnt, + info.avl)); } - vec = gen_reg_rtx (vmode); - src = change_address (src_in, vmode, src_addr); - dst = change_address (dst_in, vmode, dst_addr); + vec = gen_reg_rtx (info.vmode); + src = change_address (src_in, info.vmode, src_addr); + dst = change_address (dst_in, info.vmode, dst_addr); /* If we don't need a loop and have a suitable mode to describe the size, just do a load / store pair and leave it up to the later lazy code motion pass to insert the appropriate vsetvli. */ - if (!need_loop && known_eq (GET_MODE_SIZE (vmode), INTVAL (length_in))) + if (!info.need_loop + && known_eq (GET_MODE_SIZE (info.vmode), INTVAL (length_in))) { emit_move_insn (vec, src); emit_move_insn (dst, vec); @@ -1239,26 +1270,26 @@ expand_block_move (rtx dst_in, rtx src_in, rtx length_in, bool movmem_p) else { machine_mode mask_mode = riscv_vector::get_vector_mode - (BImode, GET_MODE_NUNITS (vmode)).require (); + (BImode, GET_MODE_NUNITS (info.vmode)).require (); rtx mask = CONSTM1_RTX (mask_mode); if (!satisfies_constraint_K (cnt)) cnt= force_reg (Pmode, cnt); rtx m_ops[] = {vec, mask, src}; - emit_nonvlmax_insn (code_for_pred_mov (vmode), + emit_nonvlmax_insn (code_for_pred_mov (info.vmode), riscv_vector::UNARY_OP_TAMA, m_ops, cnt); - emit_insn (gen_pred_store (vmode, dst, mask, vec, cnt, + emit_insn (gen_pred_store (info.vmode, dst, mask, vec, cnt, get_avl_type_rtx (riscv_vector::NONVLMAX))); } - if (need_loop) + if (info.need_loop) { emit_insn (gen_rtx_SET (src_addr, gen_rtx_PLUS (Pmode, src_addr, cnt))); emit_insn (gen_rtx_SET (dst_addr, gen_rtx_PLUS (Pmode, dst_addr, cnt))); - emit_insn (gen_rtx_SET (length_rtx, gen_rtx_MINUS (Pmode, length_rtx, cnt))); + emit_insn (gen_rtx_SET (info.avl, gen_rtx_MINUS (Pmode, info.avl, cnt))); /* Emit the loop condition. */ - rtx test = gen_rtx_NE (VOIDmode, length_rtx, const0_rtx); - emit_jump_insn (gen_cbranch4 (Pmode, test, length_rtx, const0_rtx, label)); + rtx test = gen_rtx_NE (VOIDmode, info.avl, const0_rtx); + emit_jump_insn (gen_cbranch4 (Pmode, test, info.avl, const0_rtx, label)); emit_insn (gen_nop ()); } From patchwork Fri Oct 18 13:12:59 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Craig Blackmore X-Patchwork-Id: 1999113 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=embecosm.com header.i=@embecosm.com header.a=rsa-sha256 header.s=google header.b=AB8RLMBU; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org; envelope-from=gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4XVQDn2mmmz1xth for ; Sat, 19 Oct 2024 00:15:37 +1100 (AEDT) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 8DFC53858405 for ; Fri, 18 Oct 2024 13:15:35 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mail-wm1-x330.google.com (mail-wm1-x330.google.com [IPv6:2a00:1450:4864:20::330]) by sourceware.org (Postfix) with ESMTPS id 2B25C3858404 for ; Fri, 18 Oct 2024 13:14:33 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 2B25C3858404 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=embecosm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=embecosm.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 2B25C3858404 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2a00:1450:4864:20::330 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1729257278; cv=none; b=vzqnfZAc1JZRnSOTsma6qXTWF3uN9eImkY8KwfDqNHXWWnHfCdnvzk5qaUcji2pcdTqg79pN+LB+6RmewgM0bI6jbD3293XzitktTkxsC6osmeVLdbUauAFtpMKOlmhyGhYnyr8TozZw9TNhLKF/O3IuvYeTWfImHucNR2E+eH0= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1729257278; c=relaxed/simple; bh=3YgSGYjHc0DxB53FgBtDTkN7crmDm3lEZs3Cc1y0lbA=; h=DKIM-Signature:From:To:Subject:Date:Message-ID:MIME-Version; b=FcMc/SRhVczvf9453JgdVfNjX6Tfj5PzldpPXmRiuhCH0uJQ/jvQo8TfhFClkZIgyya9QDBETUmNqPBdLfuczMO1YYQvYKY1KGGq1rJvaUTA4v6fZxWdTRUvjILghtACdaqb1VRFiEQ/4x9tlN2kuDkNtGjFO02LlHRAEbtFL7A= ARC-Authentication-Results: i=1; server2.sourceware.org Received: by mail-wm1-x330.google.com with SMTP id 5b1f17b1804b1-43155afca99so16758485e9.1 for ; Fri, 18 Oct 2024 06:14:33 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=embecosm.com; s=google; t=1729257272; x=1729862072; darn=gcc.gnu.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=CsMwWjuIx1ZCkJtxR8fz2Oc/UTK26btOgALS/BenQeU=; b=AB8RLMBUoYfYQSrDSG6PB/C2s7ACC9/rd5gWbBxDXmoymBGC1EAYRYcJ6rxmTxa+uz lbICrbFZDR7TPt3s+LAMYjbJCvBQmL6RnPowtJmfw6zV74p9j36HdZMyX+eFkE/Dofay duVZ1Yk3H8kCIWB8M7Tc11ZujlMBKXUoS5KBm7bYqKJ00Lx/Pb4fRoctnu9YLnk2BkMi ZDZWvZyRgbPCkimWpnZrLxI0oHH0pH4wkc7to+FHKG5V5fWFweaKyx+Ug4Fh4S0Nk8Bf 0eRqjZdOBJZue4mh+WaTLNYT0AkrynHg+px0BrhIFju5e+QaQXBdmRk6QmNO30qw+pO3 T8vw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1729257272; x=1729862072; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=CsMwWjuIx1ZCkJtxR8fz2Oc/UTK26btOgALS/BenQeU=; b=Aebe5hpodnWVvdtE1HlTctbfPtejpc9FuYIIMD6iOukYT89aJYrP8Qr/XNwrQ7m/RO wFUp6kqwSZ+962MLnjAe4Zvae20vREclaUjd2V5QpsnYLiQ6cf0bT9XA8brKQx2cLKaQ m0ZRHIIdWXLiWbdJZqghhiRcTtRxJvZNI6nas9hMoljMAfPc/yLfpgiLeoTFnb5NPDHT ER05pMAuhb0tu1ugAk6TNbFoDMpZc3c4OJ4bIj2muKvIDFjxCRy5eQN7/P9rj5C8HmIY ALtYyn5T+xRZrFB+F3iBWE0TskH1vSl4NStScyH1bVfoZjSk/vtqRRXVfhSQMk01+s5O 1Seg== X-Gm-Message-State: AOJu0YygdYSB/aPrlsRo2cMxWfkaD6hy5auS8O/bB3q6R/8QhbRMkrY+ d5cbWN/yS6BUBiQsOHa+61wGseEDdIftEfC3YO9U7UgJInGUpiYnvIPVn2WhNNjmaQ+wL4gI9tU M X-Google-Smtp-Source: AGHT+IHvDPgm3B7SJV4Jn0yS8cb9D0X6U/mr0fopsUdZHQI4MvmuJoGpA03b0Ulv1Ex75IjyjMerfg== X-Received: by 2002:a05:600c:34d3:b0:431:55af:a220 with SMTP id 5b1f17b1804b1-4316168501fmr15236415e9.12.1729257271509; Fri, 18 Oct 2024 06:14:31 -0700 (PDT) Received: from dorian.. (sals-04-b2-v4wan-167965-cust660.vm36.cable.virginm.net. [80.3.10.149]) by smtp.gmail.com with ESMTPSA id 5b1f17b1804b1-43160dc9a89sm23577435e9.16.2024.10.18.06.14.30 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 18 Oct 2024 06:14:31 -0700 (PDT) From: Craig Blackmore To: gcc-patches@gcc.gnu.org Cc: Craig Blackmore Subject: [PATCH 6/7] RISC-V: Make vectorized memset handle more cases Date: Fri, 18 Oct 2024 14:12:59 +0100 Message-ID: <20241018131300.1150819-7-craig.blackmore@embecosm.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20241018131300.1150819-1-craig.blackmore@embecosm.com> References: <20241018131300.1150819-1-craig.blackmore@embecosm.com> MIME-Version: 1.0 X-Spam-Status: No, score=-12.4 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org `expand_vec_setmem` only generated vectorized memset if it fitted into a single vector store. Extend it to generate a loop for longer and unknown lengths. The test cases now use -O1 so that they are not sensitive to scheduling. gcc/ChangeLog: * config/riscv/riscv-string.cc (use_vector_stringop_p): Add comment. (expand_vec_setmem): Use use_vector_stringop_p instead of check_vectorise_memory_operation. Add loop generation. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/base/setmem-1.c: Use -O1. Expect a loop instead of a libcall. Add test for unknown length. * gcc.target/riscv/rvv/base/setmem-2.c: Likewise. * gcc.target/riscv/rvv/base/setmem-3.c: Likewise and expect smaller lmul. --- gcc/config/riscv/riscv-string.cc | 83 ++++++++++++++----- .../gcc.target/riscv/rvv/base/setmem-1.c | 37 ++++++++- .../gcc.target/riscv/rvv/base/setmem-2.c | 37 ++++++++- .../gcc.target/riscv/rvv/base/setmem-3.c | 41 +++++++-- 4 files changed, 160 insertions(+), 38 deletions(-) diff --git a/gcc/config/riscv/riscv-string.cc b/gcc/config/riscv/riscv-string.cc index 118c02a4021..91b0ec03118 100644 --- a/gcc/config/riscv/riscv-string.cc +++ b/gcc/config/riscv/riscv-string.cc @@ -1062,6 +1062,9 @@ struct stringop_info { MAX_EW is the maximum element width that the caller wants to use and LENGTH_IN is the length of the stringop in bytes. + + This is currently used for cpymem and setmem. If expand_vec_cmpmem switches + to using it too then check_vectorise_memory_operation can be removed. */ static bool @@ -1600,41 +1603,75 @@ check_vectorise_memory_operation (rtx length_in, HOST_WIDE_INT &lmul_out) bool expand_vec_setmem (rtx dst_in, rtx length_in, rtx fill_value_in) { - HOST_WIDE_INT lmul; + stringop_info info; + /* Check we are able and allowed to vectorise this operation; bail if not. */ - if (!check_vectorise_memory_operation (length_in, lmul)) + if (!use_vector_stringop_p (info, 1, length_in)) return false; - machine_mode vmode - = riscv_vector::get_vector_mode (QImode, BYTES_PER_RISCV_VECTOR * lmul) - .require (); + /* avl holds the (remaining) length of the required set. + cnt holds the length we set with the current store. */ + rtx cnt = info.avl; rtx dst_addr = copy_addr_to_reg (XEXP (dst_in, 0)); - rtx dst = change_address (dst_in, vmode, dst_addr); + rtx dst = change_address (dst_in, info.vmode, dst_addr); - rtx fill_value = gen_reg_rtx (vmode); + rtx fill_value = gen_reg_rtx (info.vmode); rtx broadcast_ops[] = { fill_value, fill_value_in }; - /* If the length is exactly vlmax for the selected mode, do that. - Otherwise, use a predicated store. */ - if (known_eq (GET_MODE_SIZE (vmode), INTVAL (length_in))) + rtx label = NULL_RTX; + rtx mask = NULL_RTX; + + /* If we don't need a loop and the length is exactly vlmax for the selected + mode do a broadcast and store, otherwise use a predicated store. */ + if (!info.need_loop + && known_eq (GET_MODE_SIZE (info.vmode), INTVAL (length_in))) { - emit_vlmax_insn (code_for_pred_broadcast (vmode), UNARY_OP, - broadcast_ops); + emit_vlmax_insn (code_for_pred_broadcast (info.vmode), UNARY_OP, + broadcast_ops); emit_move_insn (dst, fill_value); + return true; } - else + + machine_mode mask_mode + = riscv_vector::get_vector_mode (BImode, + GET_MODE_NUNITS (info.vmode)).require (); + mask = CONSTM1_RTX (mask_mode); + if (!satisfies_constraint_K (cnt)) + cnt = force_reg (Pmode, cnt); + + if (info.need_loop) { - if (!satisfies_constraint_K (length_in)) - length_in = force_reg (Pmode, length_in); - emit_nonvlmax_insn (code_for_pred_broadcast (vmode), UNARY_OP, - broadcast_ops, length_in); - machine_mode mask_mode - = riscv_vector::get_vector_mode (BImode, GET_MODE_NUNITS (vmode)) - .require (); - rtx mask = CONSTM1_RTX (mask_mode); - emit_insn (gen_pred_store (vmode, dst, mask, fill_value, length_in, - get_avl_type_rtx (riscv_vector::NONVLMAX))); + info.avl = copy_to_mode_reg (Pmode, info.avl); + cnt = gen_reg_rtx (Pmode); + emit_insn (riscv_vector::gen_no_side_effects_vsetvl_rtx (info.vmode, cnt, + info.avl)); + } + + emit_nonvlmax_insn (code_for_pred_broadcast (info.vmode), + riscv_vector::UNARY_OP, broadcast_ops, cnt); + + if (info.need_loop) + { + label = gen_label_rtx (); + + emit_label (label); + emit_insn (riscv_vector::gen_no_side_effects_vsetvl_rtx (info.vmode, cnt, + info.avl)); + } + + emit_insn (gen_pred_store (info.vmode, dst, mask, fill_value, cnt, + get_avl_type_rtx (riscv_vector::NONVLMAX))); + + if (info.need_loop) + { + emit_insn (gen_rtx_SET (dst_addr, gen_rtx_PLUS (Pmode, dst_addr, cnt))); + emit_insn (gen_rtx_SET (info.avl, gen_rtx_MINUS (Pmode, info.avl, cnt))); + + /* Emit the loop condition. */ + rtx test = gen_rtx_NE (VOIDmode, info.avl, const0_rtx); + emit_jump_insn (gen_cbranch4 (Pmode, test, info.avl, const0_rtx, label)); + emit_insn (gen_nop ()); } return true; diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/setmem-1.c b/gcc/testsuite/gcc.target/riscv/rvv/base/setmem-1.c index 22844ff348c..32d85ea4f14 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/base/setmem-1.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/setmem-1.c @@ -1,6 +1,6 @@ /* { dg-do compile } */ /* { dg-add-options riscv_v } */ -/* { dg-additional-options "-O3 -mrvv-max-lmul=dynamic" } */ +/* { dg-additional-options "-O1 -mrvv-max-lmul=dynamic" } */ /* { dg-final { check-function-bodies "**" "" } } */ #define MIN_VECTOR_BYTES (__riscv_v_min_vlen / 8) @@ -91,13 +91,42 @@ f6 (void *a, int const b) return __builtin_memset (a, b, MIN_VECTOR_BYTES * 8); } -/* Don't vectorise if the move is too large for one operation. +/* Vectorise with loop for larger lengths ** f7: -** li\s+a2,\d+ -** tail\s+memset +** mv\s+[ta][0-7],a0 +** li\s+[ta][0-7],129 +** vsetvli\s+zero,[ta][0-7],e8,m8,ta,ma +** vmv.v.x\s+v8,a1 +XX \.L\d+: +** vsetvli\s+[ta][0-7],[ta][0-7],e8,m8,ta,ma +** vse8.v\s+v8,0\(a[0-9]\) +** add\s+[ta][0-7],[ta][0-7],[ta][0-7] +** sub\s+[ta][0-7],[ta][0-7],[ta][0-7] +** bne\s+[ta][0-7],zero,\.L\d+ +** ret */ void * f7 (void *a, int const b) { return __builtin_memset (a, b, MIN_VECTOR_BYTES * 8 + 1); } + +/* Vectorize with loop for unknown length. +** f8: +** mv\s+[ta][0-7],a0 +** mv\s+[ta][0-7],a2 +** vsetvli\s+zero,[ta][0-7],e8,m8,ta,ma +** vmv.v.x\s+v8,a1 +XX \.L\d+: +** vsetvli\s+[ta][0-7],[ta][0-7],e8,m8,ta,ma +** vse8.v\s+v8,0\(a[0-9]\) +** add\s+[ta][0-7],[ta][0-7],[ta][0-7] +** sub\s+[ta][0-7],[ta][0-7],[ta][0-7] +** bne\s+[ta][0-7],zero,\.L\d+ +** ret +*/ +void * +f8 (void *a, int const b, int n) +{ + return __builtin_memset (a, b, n); +} diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/setmem-2.c b/gcc/testsuite/gcc.target/riscv/rvv/base/setmem-2.c index faea442a4bd..9da1c9309d8 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/base/setmem-2.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/setmem-2.c @@ -1,6 +1,6 @@ /* { dg-do compile } */ /* { dg-add-options riscv_v } */ -/* { dg-additional-options "-O3 -mrvv-max-lmul=m1" } */ +/* { dg-additional-options "-O1 -mrvv-max-lmul=m1" } */ /* { dg-final { check-function-bodies "**" "" } } */ #define MIN_VECTOR_BYTES (__riscv_v_min_vlen / 8) @@ -39,13 +39,42 @@ f2 (void *a, int const b) return __builtin_memset (a, b, MIN_VECTOR_BYTES); } -/* Don't vectorise if the move is too large for requested lmul. +/* Vectorise with loop for larger lengths ** f3: -** li\s+a2,\d+ -** tail\s+memset +** mv\s+[ta][0-7],a0 +** li\s+[ta][0-7],17 +** vsetvli\s+zero,[ta][0-7],e8,m1,ta,ma +** vmv.v.x\s+v1,a1 +XX \.L\d+: +** vsetvli\s+[ta][0-7],[ta][0-7],e8,m1,ta,ma +** vse8.v\s+v1,0\(a[0-9]\) +** add\s+[ta][0-7],[ta][0-7],[ta][0-7] +** sub\s+[ta][0-7],[ta][0-7],[ta][0-7] +** bne\s+[ta][0-7],zero,\.L\d+ +** ret */ void * f3 (void *a, int const b) { return __builtin_memset (a, b, MIN_VECTOR_BYTES + 1); } + +/* Vectorize with loop for unknown length. +** f4: +** mv\s+[ta][0-7],a0 +** mv\s+[ta][0-7],a2 +** vsetvli\s+zero,[ta][0-7],e8,m1,ta,ma +** vmv.v.x\s+v1,a1 +XX \.L\d+: +** vsetvli\s+[ta][0-7],[ta][0-7],e8,m1,ta,ma +** vse8.v\s+v1,0\(a[0-9]\) +** add\s+[ta][0-7],[ta][0-7],[ta][0-7] +** sub\s+[ta][0-7],[ta][0-7],[ta][0-7] +** bne\s+[ta][0-7],zero,\.L\d+ +** ret +*/ +void * +f4 (void *a, int const b, int n) +{ + return __builtin_memset (a, b, n); +} diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/setmem-3.c b/gcc/testsuite/gcc.target/riscv/rvv/base/setmem-3.c index 25be694d248..2111a139ad4 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/base/setmem-3.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/setmem-3.c @@ -1,6 +1,6 @@ /* { dg-do compile } */ /* { dg-add-options riscv_v } */ -/* { dg-additional-options "-O3 -mrvv-max-lmul=m8" } */ +/* { dg-additional-options "-O1 -mrvv-max-lmul=m8" } */ /* { dg-final { check-function-bodies "**" "" } } */ #define MIN_VECTOR_BYTES (__riscv_v_min_vlen / 8) @@ -21,13 +21,13 @@ f1 (void *a, int const b) return __builtin_memset (a, b, MIN_VECTOR_BYTES - 1); } -/* Vectorise+inline minimum vector register width using requested lmul. +/* Vectorised code should use smallest lmul known to fit length. ** f2: ** ( -** vsetivli\s+zero,\d+,e8,m8,ta,ma +** vsetivli\s+zero,\d+,e8,m1,ta,ma ** | ** li\s+a\d+,\d+ -** vsetvli\s+zero,a\d+,e8,m8,ta,ma +** vsetvli\s+zero,a\d+,e8,m1,ta,ma ** ) ** vmv\.v\.x\s+v\d+,a1 ** vse8\.v\s+v\d+,0\(a0\) @@ -57,13 +57,40 @@ f3 (void *a, int const b) return __builtin_memset (a, b, MIN_VECTOR_BYTES * 8); } -/* Don't vectorise if the move is too large for requested lmul. +/* Vectorise with loop for larger lengths ** f4: -** li\s+a2,\d+ -** tail\s+memset +** mv\s+[ta][0-7],a0 +** li\s+[ta][0-7],129 +** vsetvli\s+zero,[ta][0-7],e8,m8,ta,ma +** vmv.v.x\s+v8,a1 +** vsetvli\s+[ta][0-7],[ta][0-7],e8,m8,ta,ma +** vse8.v\s+v8,0\(a[0-9]\) +** add\s+[ta][0-7],[ta][0-7],[ta][0-7] +** sub\s+[ta][0-7],[ta][0-7],[ta][0-7] +** bne\s+[ta][0-7],zero,\.L\d+ +** ret */ void * f4 (void *a, int const b) { return __builtin_memset (a, b, MIN_VECTOR_BYTES * 8 + 1); } + +/* Vectorize with loop for unknown length. +** f5: +** mv\s+[ta][0-7],a0 +** mv\s+[ta][0-7],a2 +** vsetvli\s+zero,[ta][0-7],e8,m8,ta,ma +** vmv.v.x\s+v8,a1 +** vsetvli\s+[ta][0-7],[ta][0-7],e8,m8,ta,ma +** vse8.v\s+v8,0\(a[0-9]\) +** add\s+[ta][0-7],[ta][0-7],[ta][0-7] +** sub\s+[ta][0-7],[ta][0-7],[ta][0-7] +** bne\s+[ta][0-7],zero,\.L\d+ +** ret +*/ +void * +f5 (void *a, int const b, int n) +{ + return __builtin_memset (a, b, n); +} From patchwork Fri Oct 18 13:13:00 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Craig Blackmore X-Patchwork-Id: 1999112 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=embecosm.com header.i=@embecosm.com header.a=rsa-sha256 header.s=google header.b=NRz5iBh5; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org; envelope-from=gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4XVQDM1f5tz1xth for ; Sat, 19 Oct 2024 00:15:15 +1100 (AEDT) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 6B8C7385AC0D for ; Fri, 18 Oct 2024 13:15:13 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mail-wm1-x32d.google.com (mail-wm1-x32d.google.com [IPv6:2a00:1450:4864:20::32d]) by sourceware.org (Postfix) with ESMTPS id 059493858D20 for ; Fri, 18 Oct 2024 13:14:35 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 059493858D20 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=embecosm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=embecosm.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 059493858D20 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2a00:1450:4864:20::32d ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1729257283; cv=none; b=R5XvKOVvufUAP17tCE1uH8bg3Ew8T8FCJ8hn/kz3GzAKWhoCjh/LsFHk43UDstQDtiDjAQZvW/tulbXvVLEOkw9X30OCC8ajCUcdJ5WUVayNvt9BUvLkwRkkuCNctA3CzgdgUeFcgFoJpPDEHDGsEXOgG0YwHOGMFz0dGVlNlkw= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1729257283; c=relaxed/simple; bh=XrHdMlLs/pHbZ/8TYdfwxy4TQvVq3ueU2KC2Yv4X84w=; h=DKIM-Signature:From:To:Subject:Date:Message-ID:MIME-Version; b=c50IhtpMKOV2p0t1nf+DgsVOZR55aKp1Pa4QurZ7n6mYsEU9Qt6RiZ2BRyarw3JdfoixhO4deNXKqiohCAkeLgViQfhBMjXaFa3tArZdnCdscnmjCk875g+NqzAeRicWf1n4JrfwgNu48qmGfkSkfKo1z1Kjd8Eqz52rVo8thLY= ARC-Authentication-Results: i=1; server2.sourceware.org Received: by mail-wm1-x32d.google.com with SMTP id 5b1f17b1804b1-431137d12a5so21314205e9.1 for ; Fri, 18 Oct 2024 06:14:34 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=embecosm.com; s=google; t=1729257274; x=1729862074; darn=gcc.gnu.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=xgUR2PRDXdI3wXadocUWAe9/46vvp8A6YJtR7uQfMjA=; b=NRz5iBh5GzWyHmTMZLX0XuoZfuKzY0pJDIpRpRcWoCOcgnVXYEcO6MvkgFDOp37qWg d/hMcMMkj2Hz/KE6zk0EySyYiFnYHHIGZ+SsDQRKBJBIhsUnyK/Bjqt5wH4wdUqYpLkf 8WSJeDko++hyCsQPqiNlI0dY4RMqd5x8jGqv3crzahqbF0r1d3PRXHZeiy/FURIAM5of 7avLRW+dutkx8iSXW5j7RteSAvw9ALY51Fflvt2DHNVLLJRl4HeHhlJ2QISyCTPeR4yi FbFBkCIhIIkDAixOwsOO+4OLUxOJC4q0iFv277yPWTRQ4mNBMsosKwLZ3AGWkvJwxgsD UI9w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1729257274; x=1729862074; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=xgUR2PRDXdI3wXadocUWAe9/46vvp8A6YJtR7uQfMjA=; b=L7ME6VYFUtyfzDTQ5T9y/HcOMAVrSIRqZ6FZ7/FNx8wGGsYZcb3/H4w5gGiIJXY40S kpSag8heW7eFHglBXgH/0+ypn7WVmn3wAtPVoPEAul/dvId2QGSluLeNQa3lLur4NW7c BGxEY5OoJyZLz/b7trayvWI3nhQD2Eza8xAwm3SEfhc45AZZS6/+j6dJ/qyYkIgR1T4H jv6UnW0s1icLkx+kpbpGk7Z6HRlwRe6Q3GDyzPx9XFOJNHOQDuPGlcm0AUBeaN8VBXJx VWNzBsQQpWUs9vZKCzqEJpr1SwVx6d9cqd8Ncz3F6vrInWZPAw1iWQ6x/PaKZ1aI2jfa Mcxg== X-Gm-Message-State: AOJu0YyMvzl1n+EntXi5NvFNLS3EoJeo42cvYovGZmoyAC34Cp30yZiA +LhbpisfQ+wpA/cG2+5eXp/BjnLXxU+FNmhVNTKlM232oUJbpacksTULh0V3L2Yl01G0rfWVev/ m X-Google-Smtp-Source: AGHT+IEzypwovsSi8kRfbiffqThqJOhVXTvpQSDhlQuU5rXZImd6D/j8yBG4ZORnsXnFJl6x7bKRSg== X-Received: by 2002:a05:600c:34d4:b0:431:588a:44a2 with SMTP id 5b1f17b1804b1-43161636685mr19111885e9.12.1729257273633; Fri, 18 Oct 2024 06:14:33 -0700 (PDT) Received: from dorian.. (sals-04-b2-v4wan-167965-cust660.vm36.cable.virginm.net. [80.3.10.149]) by smtp.gmail.com with ESMTPSA id 5b1f17b1804b1-43160dc9a89sm23577435e9.16.2024.10.18.06.14.33 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 18 Oct 2024 06:14:33 -0700 (PDT) From: Craig Blackmore To: gcc-patches@gcc.gnu.org Cc: Craig Blackmore Subject: [PATCH 7/7] RISC-V: Disable by pieces for vector setmem length > UNITS_PER_WORD Date: Fri, 18 Oct 2024 14:13:00 +0100 Message-ID: <20241018131300.1150819-8-craig.blackmore@embecosm.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20241018131300.1150819-1-craig.blackmore@embecosm.com> References: <20241018131300.1150819-1-craig.blackmore@embecosm.com> MIME-Version: 1.0 X-Spam-Status: No, score=-12.5 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org For fast unaligned access targets, by pieces uses up to UNITS_PER_WORD size pieces resulting in more store instructions than needed. For example gcc.target/riscv/rvv/base/setmem-1.c:f1 built with `-O3 -march=rv64gcv -mtune=thead-c906`: ``` f1: vsetivli zero,8,e8,mf2,ta,ma vmv.v.x v1,a1 vsetivli zero,0,e32,mf2,ta,ma sb a1,14(a0) vmv.x.s a4,v1 vsetivli zero,8,e16,m1,ta,ma vmv.x.s a5,v1 vse8.v v1,0(a0) sw a4,8(a0) sh a5,12(a0) ret ``` The slow unaligned access version built with `-O3 -march=rv64gcv` used 15 sb instructions: ``` f1: sb a1,0(a0) sb a1,1(a0) sb a1,2(a0) sb a1,3(a0) sb a1,4(a0) sb a1,5(a0) sb a1,6(a0) sb a1,7(a0) sb a1,8(a0) sb a1,9(a0) sb a1,10(a0) sb a1,11(a0) sb a1,12(a0) sb a1,13(a0) sb a1,14(a0) ret ``` After this patch, the following is generated in both cases: ``` f1: vsetivli zero,15,e8,m1,ta,ma vmv.v.x v1,a1 vse8.v v1,0(a0) ret ``` gcc/ChangeLog: * config/riscv/riscv.cc (riscv_use_by_pieces_infrastructure_p): New function. (TARGET_USE_BY_PIECES_INFRASTRUCTURE_P): Define. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/pr113469.c: Expect mf2 setmem. * gcc.target/riscv/rvv/base/setmem-2.c: Update f1 to expect straight-line vector memset. * gcc.target/riscv/rvv/base/setmem-3.c: Likewise. --- gcc/config/riscv/riscv.cc | 19 +++++++++++++++++++ .../gcc.target/riscv/rvv/autovec/pr113469.c | 3 ++- .../gcc.target/riscv/rvv/base/setmem-2.c | 12 +++++++----- .../gcc.target/riscv/rvv/base/setmem-3.c | 12 +++++++----- 4 files changed, 35 insertions(+), 11 deletions(-) diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc index e111cb07284..c008b2da3b7 100644 --- a/gcc/config/riscv/riscv.cc +++ b/gcc/config/riscv/riscv.cc @@ -12583,6 +12583,22 @@ riscv_stack_clash_protection_alloca_probe_range (void) return STACK_CLASH_CALLER_GUARD; } +static bool +riscv_use_by_pieces_infrastructure_p (unsigned HOST_WIDE_INT size, + unsigned alignment, + enum by_pieces_operation op, bool speed_p) +{ + /* For set/clear with size > UNITS_PER_WORD, by pieces uses vector broadcasts + with UNITS_PER_WORD size pieces. Use setmem instead which can use + bigger chunks. */ + if (TARGET_VECTOR && stringop_strategy & STRATEGY_VECTOR + && (op == CLEAR_BY_PIECES || op == SET_BY_PIECES) + && speed_p && size > UNITS_PER_WORD) + return false; + + return default_use_by_pieces_infrastructure_p (size, alignment, op, speed_p); +} + /* Initialize the GCC target structure. */ #undef TARGET_ASM_ALIGNED_HI_OP #define TARGET_ASM_ALIGNED_HI_OP "\t.half\t" @@ -12948,6 +12964,9 @@ riscv_stack_clash_protection_alloca_probe_range (void) #undef TARGET_C_MODE_FOR_FLOATING_TYPE #define TARGET_C_MODE_FOR_FLOATING_TYPE riscv_c_mode_for_floating_type +#undef TARGET_USE_BY_PIECES_INFRASTRUCTURE_P +#define TARGET_USE_BY_PIECES_INFRASTRUCTURE_P riscv_use_by_pieces_infrastructure_p + struct gcc_target targetm = TARGET_INITIALIZER; #include "gt-riscv.h" diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr113469.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr113469.c index d1c118c02d6..f86084bdb40 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr113469.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr113469.c @@ -51,4 +51,5 @@ void p(int buf, __builtin_va_list ab, int q) { } while (k); } -/* { dg-final { scan-assembler-times {vsetivli\tzero,\s*4,\s*e8,\s*mf4,\s*t[au],\s*m[au]} 2 } } */ +/* { dg-final { scan-assembler-times {vsetivli\tzero,\s*4,\s*e8,\s*mf4,\s*t[au],\s*m[au]} 1 } } */ +/* { dg-final { scan-assembler-times {vsetivli\tzero,\s*8,\s*e8,\s*mf2,\s*t[au],\s*m[au]} 1 } } */ diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/setmem-2.c b/gcc/testsuite/gcc.target/riscv/rvv/base/setmem-2.c index 9da1c9309d8..67d62f7193e 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/base/setmem-2.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/setmem-2.c @@ -5,15 +5,17 @@ #define MIN_VECTOR_BYTES (__riscv_v_min_vlen / 8) -/* Small memsets shouldn't be vectorised. +/* Vectorise with no loop. ** f1: ** ( -** sb\s+a1,0\(a0\) -** ... +** vsetivli\s+zero,\d+,e8,m1,ta,ma ** | -** li\s+a2,\d+ -** tail\s+memset +** li\s+a\d+,\d+ +** vsetvli\s+zero,a\d+,e8,m1,ta,ma ** ) +** vmv\.v\.x\s+v\d+,a1 +** vse8\.v\s+v\d+,0\(a0\) +** ret */ void * f1 (void *a, int const b) diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/setmem-3.c b/gcc/testsuite/gcc.target/riscv/rvv/base/setmem-3.c index 2111a139ad4..7ade7ef415b 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/base/setmem-3.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/setmem-3.c @@ -5,15 +5,17 @@ #define MIN_VECTOR_BYTES (__riscv_v_min_vlen / 8) -/* Small memsets shouldn't be vectorised. +/* Vectorise with no loop. ** f1: ** ( -** sb\s+a1,0\(a0\) -** ... +** vsetivli\s+zero,\d+,e8,m1,ta,ma ** | -** li\s+a2,\d+ -** tail\s+memset +** li\s+a\d+,\d+ +** vsetvli\s+zero,a\d+,e8,m1,ta,ma ** ) +** vmv\.v\.x\s+v\d+,a1 +** vse8\.v\s+v\d+,0\(a0\) +** ret */ void * f1 (void *a, int const b)