From patchwork Wed Dec 27 02:38:26 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: =?utf-8?b?6ZKf5bGF5ZOy?= <juzhe.zhong@rivai.ai>
X-Patchwork-Id: 1880389
Return-Path: <gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org>
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@legolas.ozlabs.org
Authentication-Results: legolas.ozlabs.org;
 spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org
 (client-ip=8.43.85.97; helo=server2.sourceware.org;
 envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org;
 receiver=patchwork.ozlabs.org)
Received: from server2.sourceware.org (server2.sourceware.org [8.43.85.97])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
	 key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384)
	(No client certificate requested)
	by legolas.ozlabs.org (Postfix) with ESMTPS id 4T0G740PR6z1ydd
	for <incoming@patchwork.ozlabs.org>; Wed, 27 Dec 2023 13:39:10 +1100 (AEDT)
Received: from server2.sourceware.org (localhost [IPv6:::1])
	by sourceware.org (Postfix) with ESMTP id CD9A43858C41
	for <incoming@patchwork.ozlabs.org>; Wed, 27 Dec 2023 02:39:08 +0000 (GMT)
X-Original-To: gcc-patches@gcc.gnu.org
Delivered-To: gcc-patches@gcc.gnu.org
Received: from smtpbgeu1.qq.com (smtpbgeu1.qq.com [52.59.177.22])
 by sourceware.org (Postfix) with ESMTPS id E4E583858D38
 for <gcc-patches@gcc.gnu.org>; Wed, 27 Dec 2023 02:38:41 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org E4E583858D38
Authentication-Results: sourceware.org;
 dmarc=none (p=none dis=none) header.from=rivai.ai
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=rivai.ai
ARC-Filter: OpenARC Filter v1.0.0 sourceware.org E4E583858D38
Authentication-Results: server2.sourceware.org;
 arc=none smtp.remote-ip=52.59.177.22
ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1703644729; cv=none;
 b=kKZwRqSOzr+MmA7NsfSWDpZ9wCMIjkH2dTzobWgWQWepQk57+0rI8UFTFDRaPrMviOxIr6DmdUZaY3YReEe4FproZpPM1VrhhxzO33XTZlvyo9sds/pU1tqV/vQz3HLw2ZGqFxBZ3xxNI08vl2Wb8dwKt2ahDAAn6AocJTUlM+Q=
ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key;
 t=1703644729; c=relaxed/simple;
 bh=g2TC5hEo10BYBzZ5UH8yKACMSvBHi/L3CpmARfg8bro=;
 h=From:To:Subject:Date:Message-Id:MIME-Version;
 b=tIDoMWqQgQPMUuK2hLkCYmOPt9VAqp7lpl7LC0CNSjS/Yg54CxhU2Gmj8rZCH6/kP+1PjjM+6xcHZjSVqyG+uCzA0DLj5BvZYqAcXQT7isgw08MFjI14QQykm6f+JwAmzpIAuxSK2bY0Fgc9BppL0ack1LgRBPYk0NYZNGuHoew=
ARC-Authentication-Results: i=1; server2.sourceware.org
X-QQ-mid: bizesmtp77t1703644713tfomnjjv
Received: from rios-cad122.hadoop.rioslab.org ( [58.60.1.26])
 by bizesmtp.qq.com (ESMTP) with
 id ; Wed, 27 Dec 2023 10:38:32 +0800 (CST)
X-QQ-SSF: 01400000000000G0V000000A0000000
X-QQ-FEAT: E0jzmhOyEPKzb3UxYYOcN1EQht6HQTM2GyM4u1nHk3GRW2RAO1FQPeL6lmEhP
 NbM3G4Nm/EiND+Z7qQStiNWtGNPO1nNR0m32i5FPOwnWC3Psan69oeNHE6+nFsQxZ1FtqOt
 V/Dffqr+jjDp5HHeLdJp4mGtyJ8AWQuCyB5IvgFwp7lMXquy/X1kDM3eD8yuCsKS2UA8/19
 +NKqSWhKourd2808sAWE6x74xFrSZUL9qDUtK4FLU4X3TB066WXeqwYk3/8aIVs6BpGshSa
 x8hbbBoY366B+/r00WTTemIdgrXGzaZD8KeiE9V9Mk5q4OZQtwS24WMUS4FkujY5OL23Rzd
 IZzjI/5vFrdq3YZtvaXxdJxNWttqnk/LiVfT0OT11G9V3oe75/+YcPNWNWeBQ==
X-QQ-GoodBg: 2
X-BIZMAIL-ID: 8396225421396755344
From: Juzhe-Zhong <juzhe.zhong@rivai.ai>
To: gcc-patches@gcc.gnu.org
Cc: kito.cheng@gmail.com, kito.cheng@sifive.com, jeffreyalaw@gmail.com,
 rdapp.gcc@gmail.com, Juzhe-Zhong <juzhe.zhong@rivai.ai>
Subject: [PATCH V2] RISC-V: Disallow transformation into VLMAX AVL for
 cond_len_xxx when length is in range [0, 31]
Date: Wed, 27 Dec 2023 10:38:26 +0800
Message-Id: <20231227023826.226460-1-juzhe.zhong@rivai.ai>
X-Mailer: git-send-email 2.36.3
MIME-Version: 1.0
X-QQ-SENDSIZE: 520
Feedback-ID: bizesmtp:rivai.ai:qybglogicsvrgz:qybglogicsvrgz7a-one-0
X-Spam-Status: No, score=-10.3 required=5.0 tests=BAYES_00, GIT_PATCH_0,
 KAM_DMARC_STATUS, KAM_SHORT, RCVD_IN_BARRACUDACENTRAL, RCVD_IN_DNSWL_NONE,
 RCVD_IN_MSPIKE_H2, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE,
 T_SPF_HELO_TEMPERROR autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.30
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org

Consider the following code generation:

        vsetivli        zero,4,e32,m1,ta,ma
        vlseg4e32.v     v4,(a5)
        vlseg4e32.v     v12,(a3)
        vsetvli a5,zero,e32,m1,tu,ma             ---> Redundant: VLMAX AVL = 4 under fixed-vlmax
        vfadd.vf        v3,v13,fa0
        vfadd.vf        v1,v12,fa1
        vfmul.vv        v17,v3,v5
        vfmul.vv        v16,v1,v5

The root cause is that we blindly transform COND_LEN_xxx into VLMAX AVL whenever len == NUNITS.
However, the transformation is unnecessary when len is in the range [0, 31]: such a length fits the
immediate AVL of vsetivli, so it does not consume a scalar register.
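
As a rough illustration of the rule above, the decision can be modelled on plain
integers. This is only a sketch, not the code in riscv-v.cc: fits_imm_avl and
use_vlmax_avl are hypothetical names, NUNITS is assumed to be a compile-time
constant, and fits_imm_avl stands in for satisfies_constraint_K.

```cpp
#include <cstdint>

// Hypothetical stand-in for satisfies_constraint_K: true when LEN fits the
// 5-bit unsigned immediate AVL of vsetivli, i.e. lies in [0, 31].
constexpr bool fits_imm_avl (std::int64_t len)
{
  return len >= 0 && len <= 31;
}

// Simplified model of is_vlmax_len_p for a constant NUNITS: switch to
// VLMAX AVL only when LEN equals NUNITS and cannot be an immediate.
constexpr bool use_vlmax_avl (std::int64_t len, std::int64_t nunits)
{
  return len == nunits && !fits_imm_avl (len);
}
```

With NUNITS = 4, as in the example above, use_vlmax_avl (4, 4) is false, so the
immediate form is kept; with NUNITS = 64, use_vlmax_avl (64, 64) is true.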

After this patch:

	vsetivli	zero,4,e32,m1,tu,ma
	addi	a4,a5,400
	vlseg4e32.v	v12,(a3)
	vfadd.vf	v3,v13,fa0
	vfadd.vf	v1,v12,fa1
	vlseg4e32.v	v4,(a4)
	vfadd.vf	v2,v14,fa1
	vfmul.vv	v17,v3,v5
	vfmul.vv	v16,v1,v5

Tested on both RV32 and RV64 with no regressions.

OK for trunk?

gcc/ChangeLog:

	* config/riscv/riscv-v.cc (is_vlmax_len_p): New function.
	(expand_load_store): Disallow transformation into VLMAX when len is in the range [0, 31].
	(expand_cond_len_op): Ditto.
	(expand_gather_scatter): Ditto.
	(expand_lanes_load_store): Ditto.
	(expand_fold_extract_last): Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/autovec/post-ra-avl.c: Adapt test.
	* gcc.target/riscv/rvv/base/vf_avl-2.c: New test.
---
 gcc/config/riscv/riscv-v.cc                   | 21 +++++++++++++------
 .../riscv/rvv/autovec/post-ra-avl.c           |  2 +-
 .../gcc.target/riscv/rvv/base/vf_avl-2.c      | 21 +++++++++++++++++++
 3 files changed, 37 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vf_avl-2.c

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 038ab084a37..0cc7af58da6 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -68,6 +68,16 @@ imm_avl_p (machine_mode mode)
 	   : false;
 }
 
+/* Return true if LEN is equal to NUNITS and lies outside the range [0, 31].  */
+static bool
+is_vlmax_len_p (machine_mode mode, rtx len)
+{
+  poly_int64 value;
+  return poly_int_rtx_p (len, &value)
+	 && known_eq (value, GET_MODE_NUNITS (mode))
+	 && !satisfies_constraint_K (len);
+}
+
 /* Helper functions for insn_flags && insn_types */
 
 /* Return true if caller need pass mask operand for insn pattern with
@@ -3776,7 +3786,7 @@ expand_load_store (rtx *ops, bool is_load)
   rtx len = ops[3];
   machine_mode mode = GET_MODE (ops[0]);
 
-  if (poly_int_rtx_p (len, &value) && known_eq (value, GET_MODE_NUNITS (mode)))
+  if (is_vlmax_len_p (mode, len))
     {
       /* If the length operand is equal to VF, it is VLMAX load/store.  */
       if (is_load)
@@ -3842,8 +3852,7 @@ expand_cond_len_op (unsigned icode, insn_flags op_type, rtx *ops, rtx len)
   machine_mode mask_mode = GET_MODE (mask);
   poly_int64 value;
   bool is_dummy_mask = rtx_equal_p (mask, CONSTM1_RTX (mask_mode));
-  bool is_vlmax_len
-    = poly_int_rtx_p (len, &value) && known_eq (value, GET_MODE_NUNITS (mode));
+  bool is_vlmax_len = is_vlmax_len_p (mode, len);
 
   unsigned insn_flags = HAS_DEST_P | HAS_MASK_P | HAS_MERGE_P | op_type;
   if (is_dummy_mask)
@@ -4012,7 +4021,7 @@ expand_gather_scatter (rtx *ops, bool is_load)
   unsigned inner_offsize = GET_MODE_BITSIZE (inner_idx_mode);
   poly_int64 nunits = GET_MODE_NUNITS (vec_mode);
   poly_int64 value;
-  bool is_vlmax = poly_int_rtx_p (len, &value) && known_eq (value, nunits);
+  bool is_vlmax = is_vlmax_len_p (vec_mode, len);
 
   /* Extend the offset element to address width.  */
   if (inner_offsize < BITS_PER_WORD)
@@ -4199,7 +4208,7 @@ expand_lanes_load_store (rtx *ops, bool is_load)
   rtx reg = is_load ? ops[0] : ops[1];
   machine_mode mode = GET_MODE (ops[0]);
 
-  if (poly_int_rtx_p (len, &value) && known_eq (value, GET_MODE_NUNITS (mode)))
+  if (is_vlmax_len_p (mode, len))
     {
       /* If the length operand is equal to VF, it is VLMAX load/store.  */
       if (is_load)
@@ -4252,7 +4261,7 @@ expand_fold_extract_last (rtx *ops)
   rtx slide_vect = gen_reg_rtx (mode);
   insn_code icode;
 
-  if (poly_int_rtx_p (len, &value) && known_eq (value, GET_MODE_NUNITS (mode)))
+  if (is_vlmax_len_p (mode, len))
     len = NULL_RTX;
 
   /* Calculate the number of 1-bit in mask. */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/post-ra-avl.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/post-ra-avl.c
index f3d12bac7cd..bff6dcb1c38 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/post-ra-avl.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/post-ra-avl.c
@@ -13,4 +13,4 @@ int foo() {
   return a;
 }
 
-/* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*zero} 1 } } */
+/* { dg-final { scan-assembler-not {vsetvli\s+[a-x0-9]+,\s*zero} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/vf_avl-2.c b/gcc/testsuite/gcc.target/riscv/rvv/base/vf_avl-2.c
new file mode 100644
index 00000000000..5a94a51f308
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/vf_avl-2.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=rv64gcv -mabi=lp64d --param riscv-autovec-preference=fixed-vlmax" } */
+
+float f[12][100];
+
+void bad1(float v1, float v2)
+{
+  for (int r = 0; r < 100; r += 4)
+    {
+      int i = r + 1;
+      f[0][r] = f[1][r] * (f[2][r] + v2) - f[1][i] * (f[2][i] + v1);
+      f[0][i] = f[1][r] * (f[2][i] + v1) + f[1][i] * (f[2][r] + v2);
+      f[0][r+2] = f[1][r+2] * (f[2][r+2] + v2) - f[1][i+2] * (f[2][i+2] + v1);
+      f[0][i+2] = f[1][r+2] * (f[2][i+2] + v1) + f[1][i+2] * (f[2][r+2] + v2);
+    }
+}
+
+/* { dg-final { scan-assembler-times {vsetivli\s+zero,\s*4,\s*e32,\s*m1,\s*t[au],\s*m[au]} 1 } } */
+/* { dg-final { scan-assembler-times {vsetivli\s+zero,\s*1,\s*e32,\s*m1,\s*t[au],\s*m[au]} 1 } } */
+/* { dg-final { scan-assembler-times {vsetivli} 2 } } */
+/* { dg-final { scan-assembler-not {vsetvli} } } */