From patchwork Fri Oct 18 13:12:56 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Craig Blackmore X-Patchwork-Id: 1999110 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=embecosm.com header.i=@embecosm.com header.a=rsa-sha256 header.s=google header.b=cfUCROpQ; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org; envelope-from=gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4XVQD54NKFz1xvV for ; Sat, 19 Oct 2024 00:15:01 +1100 (AEDT) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id C61FE3857C7F for ; Fri, 18 Oct 2024 13:14:59 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mail-wm1-x32e.google.com (mail-wm1-x32e.google.com [IPv6:2a00:1450:4864:20::32e]) by sourceware.org (Postfix) with ESMTPS id 26EEB3858288 for ; Fri, 18 Oct 2024 13:14:22 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 26EEB3858288 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=embecosm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=embecosm.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 26EEB3858288 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2a00:1450:4864:20::32e ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1729257267; cv=none; b=rQxK/ljz1Kau7nDVDmX4PLJamghlSWvTecFaDVZHuI+qNw1YaYy0UX4GqkgW4OZvWZ8HVCbJZcOkkRJtIP+1f0JBfUaAOJlsLJTJtaoHze+7/eVDDitWsTfkT2HLlVcSxQhhnamxwBzmn9RIT5A63mXwXdPJXt/EDh/wppGfLNo= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1729257267; c=relaxed/simple; bh=y4fPNj5bzJYw03VsZuBfREsJqCSUuE+pdiixk0lPoDA=; h=DKIM-Signature:From:To:Subject:Date:Message-ID:MIME-Version; b=FZRei/nTuX9MDYt9Cd5S03aDGsLIdwVPxv91hVbvFmITIUpwrYutSN6pO9s1+ybtIROunoKXCiaocHXWRAbOetW6YEcfJwc5uRyTKnm+iZfn1p4QM7aGv0Dt4lMHqkRkXa0/Sueg9FCmxbTAm1L9+RrdPLCIKvdHh9m0NKA1D5A= ARC-Authentication-Results: i=1; server2.sourceware.org Received: by mail-wm1-x32e.google.com with SMTP id 5b1f17b1804b1-431137d12a5so21312005e9.1 for ; Fri, 18 Oct 2024 06:14:22 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=embecosm.com; s=google; t=1729257261; x=1729862061; darn=gcc.gnu.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=XpWp92hULpUzu43rxQaCcETCp6k2BlgRa86qWQQ/oic=; b=cfUCROpQgGkJxBKj0sTEyOzymiZpPFzB8Th0gzh9eV75nT+kZaZTwudaWWuzEpQYwP tCtiBcmrSOc59BkRm6bW49lmHxNAWiWWlHHpcq0qajxVMI3+GbIa1SwPmfItQ39+YrLP r6BDqMANPDAr8eonghao9onsF1XaMhJ4IjrysKh/FE7c0XLZAaziGCAcJx3ImCNHlOFV w0MTvcMtG5vvp0QuZVvX2DI7VmzHzaBMQBU8EotA0fvI6WGsqvSMflN09gsd6iYClMuk OOqQH1uLjOp9d9PozJgUtBKwoJZ49z2OCgqshOVmFDS9Any+coGkfNKN/RUoaP/0bb0F Jyjw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1729257261; x=1729862061; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=XpWp92hULpUzu43rxQaCcETCp6k2BlgRa86qWQQ/oic=; b=pZcIDhkkp165g5luT+5iv/qF55SuA0lfzFh3z4Y9RRAYeecoGpRS261F5TqhWKpv6D Zrz1qgyt1WA4pmxwPZoiNuGsF+7TtvEfvTE8OgKXAso6Z5rhcFhJSjdN4iixkzJaUsyD aqj55x/8Z2pdY/FFgD3pt6taQLv+MLnJUgEaNa2tVFoJsSCx82wuQWiabPtJkMY2Ps8z XwJEXnoTfkGYmAT1KAACDfJ4hhi7Ni/2dMxXXMj2hDzoR5nwKCzntQdmcdtrPqG92FRL M0hahraRTHMxPAR38UaU6/Ei1beWTQo1igFeaHnJEkji8yO5HZ+NFv0rHg5OrhUN/4jJ WRXw== X-Gm-Message-State: AOJu0YzRtS6IM9iiEtW6JyuZch99iaFvkC648bRI7js+IiIkD5JD0yZ7 F9jjpXmLMR8nJkCfLx75flxTY3B2Dw3arrW1Lo3vzZId3WhJ07norEENvj4dXBQn8BwYy2zxi0i w X-Google-Smtp-Source: AGHT+IGVunqeEi0p2nxKOuWC4lSJ0yb9r+57ElEhPtGhUhhqgOQLfaDooc8VgbtwoJ6zwKDJRvSw+A== X-Received: by 2002:a05:600c:4e12:b0:430:5887:c238 with SMTP id 5b1f17b1804b1-43161628886mr20910405e9.11.1729257260629; Fri, 18 Oct 2024 06:14:20 -0700 (PDT) Received: from dorian.. (sals-04-b2-v4wan-167965-cust660.vm36.cable.virginm.net. [80.3.10.149]) by smtp.gmail.com with ESMTPSA id 5b1f17b1804b1-43160dc9a89sm23577435e9.16.2024.10.18.06.14.20 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 18 Oct 2024 06:14:20 -0700 (PDT) From: Craig Blackmore To: gcc-patches@gcc.gnu.org Cc: Craig Blackmore Subject: [PATCH 3/7] RISC-V: Fix vector memcpy smaller LMUL generation Date: Fri, 18 Oct 2024 14:12:56 +0100 Message-ID: <20241018131300.1150819-4-craig.blackmore@embecosm.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20241018131300.1150819-1-craig.blackmore@embecosm.com> References: <20241018131300.1150819-1-craig.blackmore@embecosm.com> MIME-Version: 1.0 X-Spam-Status: No, score=-11.2 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT, LIKELY_SPAM_BODY, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org If riscv_vector::expand_block_move is generating a straight-line memcpy using a predicated store, it tries to use a smaller LMUL to reduce register pressure if it still allows an entire transfer. This happens in the inner loop of riscv_vector::expand_block_move, however, the vmode chosen by this loop gets overwritten later in the function, so I have added the missing break from the outer loop. I have also addressed a couple of issues with the conditions of the if statement within the inner loop. The first condition did not make sense to me: ``` TARGET_MIN_VLEN * lmul <= nunits * BITS_PER_UNIT ``` I think this was supposed to be checking that the length fits within the given LMUL, so I have changed it to do that. The second condition: ``` /* Avoid loosing the option of using vsetivli . */ && (nunits <= 31 * lmul || nunits > 31 * 8) ``` seems to imply that lmul affects the range of AVL immediate that vsetivli can take but I don't think that is correct. Anyway, I don't think this condition is necessary because if we find a suitable mode we should stick with it, regardless of whether it allowed vsetivli, rather than continuing to try larger lmul which would increase register pressure or smaller potential_ew which would increase AVL. I have removed this condition. gcc/ChangeLog: * config/riscv/riscv-string.cc (expand_block_move): Fix condition for using smaller LMUL. Break outer loop if a suitable vmode has been found. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/vsetvl/pr112929-1.c: Expect smaller lmul. * gcc.target/riscv/rvv/vsetvl/pr112988-1.c: Likewise. * gcc.target/riscv/rvv/base/cpymem-3.c: New test. --- gcc/config/riscv/riscv-string.cc | 8 +- .../gcc.target/riscv/rvv/base/cpymem-3.c | 85 +++++++++++++++++++ .../gcc.target/riscv/rvv/vsetvl/pr112929-1.c | 2 +- .../gcc.target/riscv/rvv/vsetvl/pr112988-1.c | 2 +- 4 files changed, 92 insertions(+), 5 deletions(-) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/cpymem-3.c diff --git a/gcc/config/riscv/riscv-string.cc b/gcc/config/riscv/riscv-string.cc index 0f1353baba3..b590c516354 100644 --- a/gcc/config/riscv/riscv-string.cc +++ b/gcc/config/riscv/riscv-string.cc @@ -1153,9 +1153,7 @@ expand_block_move (rtx dst_in, rtx src_in, rtx length_in) Still, by choosing a lower LMUL factor that still allows an entire transfer, we can reduce register pressure. */ for (unsigned lmul = 1; lmul <= 4; lmul <<= 1) - if (TARGET_MIN_VLEN * lmul <= nunits * BITS_PER_UNIT - /* Avoid loosing the option of using vsetivli . */ - && (nunits <= 31 * lmul || nunits > 31 * 8) + if (length * BITS_PER_UNIT <= TARGET_MIN_VLEN * lmul && multiple_p (BYTES_PER_RISCV_VECTOR * lmul, potential_ew) && (riscv_vector::get_vector_mode (elem_mode, exact_div (BYTES_PER_RISCV_VECTOR * lmul, @@ -1163,6 +1161,10 @@ expand_block_move (rtx dst_in, rtx src_in, rtx length_in) break; } + /* Stop searching if a suitable vmode has been found. */ + if (vmode != VOIDmode) + break; + /* The RVVM8?I modes are notionally 8 * BYTES_PER_RISCV_VECTOR bytes wide. BYTES_PER_RISCV_VECTOR can't be evenly divided by the sizes of larger element types; the LMUL factor of 8 can at diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/cpymem-3.c b/gcc/testsuite/gcc.target/riscv/rvv/base/cpymem-3.c new file mode 100644 index 00000000000..f07078ba6a7 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/cpymem-3.c @@ -0,0 +1,85 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-O1 -fno-schedule-insns -fno-schedule-insns2 -mrvv-max-lmul=m8" } */ +/* { dg-add-options riscv_v } */ +/* { dg-final { check-function-bodies "**" "" } } */ + +#define MIN_VECTOR_BYTES (__riscv_v_min_vlen / 8) + +/* Check that vector memcpy with predicated store uses smaller LMUL where + possible. + +/* m1 +** f1: +** ( +** vsetivli\s+zero,\d+,e8,m1,ta,ma +** | +** li\s+[ta][0-7],\d+ +** vsetvli\s+zero,[ta][0-7],e8,m1,ta,ma +** ) +** vle8.v\s+v\d+,0\(a1\) +** vse8.v\s+v\d+,0\(a0\) +** ret +*/ + +void f1 (char *d, char *s) +{ + __builtin_memcpy (d, s, MIN_VECTOR_BYTES - 1); +} + +/* m2 +** f2: +** ( +** vsetivli\s+zero,\d+,e8,m2,ta,ma +** | +** li\s+[ta][0-7],\d+ +** vsetvli\s+zero,[ta][0-7],e8,m2,ta,ma +** ) +** vle8.v\s+v\d+,0\(a1\) +** vse8.v\s+v\d+,0\(a0\) +** ret +*/ + +void f2 (char *d, char *s) +{ + __builtin_memcpy (d, s, 2 * MIN_VECTOR_BYTES - 1); +} + +/* m4 +** f3: +** ( +** vsetivli\s+zero,\d+,e8,m4,ta,ma +** | +** li\s+[ta][0-7],\d+ +** vsetvli\s+zero,[ta][0-7],e8,m4,ta,ma +** ) +** vle8.v\s+v\d+,0\(a1\) +** vse8.v\s+v\d+,0\(a0\) +** ret +*/ + +void f3 (char *d, char *s) +{ + __builtin_memcpy (d, s, 4 * MIN_VECTOR_BYTES - 1); +} + +/* m8 +** f4: +** ( +** vsetivli\s+zero,\d+,e8,m8,ta,ma +** | +** li\s+[ta][0-7],\d+ +** vsetvli\s+zero,[ta][0-7],e8,m8,ta,ma +** | +** li\s+[ta][0-7],\d+ +** addi\s+[ta][0-7],[ta][0-7],-?\d+ +** vsetvli\s+zero,[ta][0-7],e8,m8,ta,ma +** ) +** vle8.v\s+v\d+,0\(a1\) +** vse8.v\s+v\d+,0\(a0\) +** ret +*/ + +void f4 (char *d, char *s) +{ + __builtin_memcpy (d, s, 8 * MIN_VECTOR_BYTES - 1); +} diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr112929-1.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr112929-1.c index 86d65ddcbab..e55604e1114 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr112929-1.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr112929-1.c @@ -54,5 +54,5 @@ int main() { /* { dg-final { scan-assembler-times {vsetvli} 2 { target { no-opts "-O0" no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" } } } } */ /* { dg-final { scan-assembler-not {vsetivli} } } */ -/* { dg-final { scan-assembler-times {vsetvli\tzero,\s*[a-x0-9]+,\s*e8,\s*m8,\s*t[au],\s*m[au]} 2 { target { no-opts "-O0" no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" } } } } */ +/* { dg-final { scan-assembler-times {vsetvli\tzero,\s*[a-x0-9]+,\s*e8,\s*m2,\s*t[au],\s*m[au]} 2 { target { no-opts "-O0" no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" } } } } */ /* { dg-final { scan-assembler-times {li\t[a-x0-9]+,\s*32} 2 { target { no-opts "-O0" no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" } } } } */ diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr112988-1.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr112988-1.c index 63817f21385..b20e46395aa 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr112988-1.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr112988-1.c @@ -64,5 +64,5 @@ int main() { /* { dg-final { scan-assembler-times {vsetvli} 4 } } */ /* { dg-final { scan-assembler-not {vsetivli} } } */ -/* { dg-final { scan-assembler-times {vsetvli\tzero,\s*[a-x0-9]+,\s*e8,\s*m8,\s*t[au],\s*m[au]} 1 } } */ +/* { dg-final { scan-assembler-times {vsetvli\tzero,\s*[a-x0-9]+,\s*e8,\s*m2,\s*t[au],\s*m[au]} 1 } } */ /* { dg-final { scan-assembler-times {li\t[a-x0-9]+,\s*32} 1 } } */