From patchwork Mon Oct 14 11:10:50 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: "Li, Pan2" <pan2.li@intel.com>
X-Patchwork-Id: 1996818
From: pan2.li@intel.com
To: gcc-patches@gcc.gnu.org
Cc: richard.guenther@gmail.com, Tamar.Christina@arm.com, juzhe.zhong@rivai.ai,
 kito.cheng@gmail.com, jeffreyalaw@gmail.com, rdapp.gcc@gmail.com,
 Pan Li <pan2.li@intel.com>
Subject: [PATCH 03/11] RISC-V: Implement vector SAT_TRUNC for signed integer
Date: Mon, 14 Oct 2024 19:10:50 +0800
Message-ID: <20241014111058.1033886-3-pan2.li@intel.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20241014111058.1033886-1-pan2.li@intel.com>
References: <20241014111058.1033886-1-pan2.li@intel.com>

From: Pan Li <pan2.li@intel.com>

This patch implements sstrunc (signed saturating truncation) for vector
signed integers.

Form 1:
  #define DEF_VEC_SAT_S_TRUNC_FMT_1(NT, WT, NT_MIN, NT_MAX)             \
  void __attribute__((noinline))                                        \
  vec_sat_s_trunc_##NT##_##WT##_fmt_1 (NT *out, WT *in, unsigned limit) \
  {                                                                     \
    unsigned i;                                                         \
    for (i = 0; i < limit; i++)                                         \
      {                                                                 \
        WT x = in[i];                                                   \
        NT trunc = (NT)x;                                               \
        out[i] = (WT)NT_MIN <= x && x <= (WT)NT_MAX                     \
          ? trunc                                                       \
          : x < 0 ? NT_MIN : NT_MAX;                                    \
      }                                                                 \
  }

DEF_VEC_SAT_S_TRUNC_FMT_1(int32_t, int64_t, INT32_MIN, INT32_MAX)
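
As a point of reference, the per-element semantics of Form 1 for the
int64_t -> int32_t instance can be written as a plain scalar function.
This is only a sketch; the helper names sat_s_trunc_i64_i32 and
sat_s_trunc_i64_i32_loop are hypothetical and are not part of GCC or
its testsuite.

```c
#include <stdint.h>

/* Hypothetical scalar model of one element of Form 1 with
   NT = int32_t and WT = int64_t.  */
static inline int32_t
sat_s_trunc_i64_i32 (int64_t x)
{
  /* In range: the narrowing conversion preserves the value.  */
  if ((int64_t) INT32_MIN <= x && x <= (int64_t) INT32_MAX)
    return (int32_t) x;

  /* Out of range: saturate toward the sign of x.  */
  return x < 0 ? INT32_MIN : INT32_MAX;
}

/* Hypothetical: the Form 1 loop written out for this one type pair.  */
void
sat_s_trunc_i64_i32_loop (int32_t *out, const int64_t *in, unsigned limit)
{
  for (unsigned i = 0; i < limit; i++)
    out[i] = sat_s_trunc_i64_i32 (in[i]);
}
```

For example, sat_s_trunc_i64_i32 (INT64_MAX) yields INT32_MAX, while
sat_s_trunc_i64_i32 (-1) yields -1 unchanged.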

Before this patch:
  vsetvli     a5,a2,e64,m1,ta,ma
  vle64.v     v1,0(a1)
  slli        a3,a5,3
  slli        a4,a5,2
  sub         a2,a2,a5
  add         a1,a1,a3
  vadd.vv     v0,v1,v5
  vsetvli     zero,zero,e32,mf2,ta,ma
  vnsrl.wx    v2,v1,a6
  vncvt.x.x.w v1,v1
  vsetvli     zero,zero,e64,m1,ta,ma
  vmsgtu.vv   v0,v0,v4
  vsetvli     zero,zero,e32,mf2,ta,mu
  vneg.v      v2,v2
  vxor.vv     v1,v2,v3,v0.t
  vse32.v     v1,0(a0)
  add         a0,a0,a4
  bne         a2,zero,.L3

After this patch:
  vsetvli     a5,a2,e32,mf2,ta,ma
  vle64.v     v1,0(a1)
  slli        a3,a5,3
  slli        a4,a5,2
  sub         a2,a2,a5
  add         a1,a1,a3
  vnclip.wi   v1,v1,0
  vse32.v     v1,0(a0)
  add         a0,a0,a4
  bne         a2,zero,.L3
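
In the new sequence a single vnclip.wi with a shift of 0 performs the
whole saturation. The sketch below is a scalar model of one element of
that instruction for SEW = 32, written from our reading of the RVV 1.0
specification (result = clip (roundoff_signed (v, shift)), rounding per
vxrm). The function name is hypothetical, and the model assumes that >>
on a negative int64_t is an arithmetic shift, which GCC documents. With
a shift of 0 no bits are discarded, so the rounding increment is 0 and
the RNU mode requested via BINARY_OP_VXRM_RNU has no effect; the
instruction then reduces to the saturation in Form 1.

```c
#include <stdint.h>

/* Hypothetical scalar model of one vnclip element with SEW = 32
   (64-bit source element) and vxrm = rnu (round-to-nearest-up).
   Assumes arithmetic right shift for negative values (GCC behaviour).
   The vxsat flag that the hardware sets on saturation is not modelled.  */
static inline int32_t
vnclip_rnu_e32 (int64_t v, unsigned shift)
{
  unsigned d = shift & 63;      /* Only lg2 (2 * SEW) = 6 bits are used.  */
  int64_t r = 0;                /* Rounding increment; 0 when d == 0.  */

  if (d > 0)
    r = (v >> (d - 1)) & 1;     /* rnu: add the last bit shifted out.  */

  int64_t res = (v >> d) + r;   /* Cannot overflow: v >> d is halved.  */

  /* Clip to the signed 32-bit range.  */
  if (res > INT32_MAX)
    return INT32_MAX;
  if (res < INT32_MIN)
    return INT32_MIN;
  return (int32_t) res;
}
```

For example, vnclip_rnu_e32 (INT64_MAX, 0) gives INT32_MAX, while
vnclip_rnu_e32 (3, 1) gives 2 because RNU rounds 1.5 up; only the first
kind of call (shift 0) corresponds to the code generated by this patch.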

The following test suites pass with this patch:
* The rv64gcv full regression test.

gcc/ChangeLog:

	* config/riscv/autovec.md (sstrunc<mode><v_double_trunc>2): Add
	new pattern sstrunc for double trunc.
	(sstrunc<mode><v_quad_trunc>2): Ditto but for quad trunc.
	(sstrunc<mode><v_oct_trunc>2): Ditto but for oct trunc.
	* config/riscv/riscv-protos.h (expand_vec_double_sstrunc): Add
	new func decl to expand double trunc.
	(expand_vec_quad_sstrunc): Ditto but for quad trunc.
	(expand_vec_oct_sstrunc): Ditto but for oct trunc.
	* config/riscv/riscv-v.cc (expand_vec_double_sstrunc): Add new
	func to expand double trunc.
	(expand_vec_quad_sstrunc): Ditto but for quad trunc.
	(expand_vec_oct_sstrunc): Ditto but for oct trunc.

Signed-off-by: Pan Li <pan2.li@intel.com>
---
 gcc/config/riscv/autovec.md     | 34 ++++++++++++++++++++++++
 gcc/config/riscv/riscv-protos.h |  4 +++
 gcc/config/riscv/riscv-v.cc     | 46 +++++++++++++++++++++++++++++++++
 3 files changed, 84 insertions(+)

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 7dc78a48874..82d65a95e7a 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -2779,6 +2779,40 @@ (define_expand "ustrunc<mode><v_oct_trunc>2"
   }
 )
 
+(define_expand "sstrunc<mode><v_double_trunc>2"
+  [(match_operand:<V_DOUBLE_TRUNC> 0 "register_operand")
+   (match_operand:VWEXTI           1 "register_operand")]
+  "TARGET_VECTOR"
+  {
+    riscv_vector::expand_vec_double_sstrunc (operands[0], operands[1],
+					  <MODE>mode);
+    DONE;
+  }
+)
+
+(define_expand "sstrunc<mode><v_quad_trunc>2"
+  [(match_operand:<V_QUAD_TRUNC> 0 "register_operand")
+   (match_operand:VQEXTI         1 "register_operand")]
+  "TARGET_VECTOR"
+  {
+    riscv_vector::expand_vec_quad_sstrunc (operands[0], operands[1], <MODE>mode,
+					   <V_DOUBLE_TRUNC>mode);
+    DONE;
+  }
+)
+
+(define_expand "sstrunc<mode><v_oct_trunc>2"
+  [(match_operand:<V_OCT_TRUNC> 0 "register_operand")
+   (match_operand:VOEXTI        1 "register_operand")]
+  "TARGET_VECTOR"
+  {
+    riscv_vector::expand_vec_oct_sstrunc (operands[0], operands[1], <MODE>mode,
+					  <V_DOUBLE_TRUNC>mode,
+					  <V_QUAD_TRUNC>mode);
+    DONE;
+  }
+)
+
 ;; =========================================================================
 ;; == Early break auto-vectorization patterns
 ;; =========================================================================
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index b2f5d72f494..2b2378468e2 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -651,9 +651,13 @@ void expand_vec_ssadd (rtx, rtx, rtx, machine_mode);
 void expand_vec_ussub (rtx, rtx, rtx, machine_mode);
 void expand_vec_sssub (rtx, rtx, rtx, machine_mode);
 void expand_vec_double_ustrunc (rtx, rtx, machine_mode);
+void expand_vec_double_sstrunc (rtx, rtx, machine_mode);
 void expand_vec_quad_ustrunc (rtx, rtx, machine_mode, machine_mode);
+void expand_vec_quad_sstrunc (rtx, rtx, machine_mode, machine_mode);
 void expand_vec_oct_ustrunc (rtx, rtx, machine_mode, machine_mode,
 			     machine_mode);
+void expand_vec_oct_sstrunc (rtx, rtx, machine_mode, machine_mode,
+			     machine_mode);
 #endif
 bool sew64_scalar_helper (rtx *, rtx *, rtx, machine_mode,
 			  bool, void (*)(rtx *, rtx), enum avl_type);
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index fba35652cc2..65d36dc31d2 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -4927,6 +4927,22 @@ expand_vec_double_ustrunc (rtx op_0, rtx op_1, machine_mode vec_mode)
   emit_vlmax_insn (icode, BINARY_OP_VXRM_RNU, ops);
 }
 
+/* Expand the standard name sstrunc<m><n>2 for double vector mode, like
+   DI => SI.  We can leverage the vector narrowing fixed-point clip
+   (vnclip with a zero shift amount) directly.  */
+
+void
+expand_vec_double_sstrunc (rtx op_0, rtx op_1, machine_mode vec_mode)
+{
+  insn_code icode;
+  rtx zero = CONST0_RTX (Xmode);
+  enum unspec unspec = UNSPEC_VNCLIP;
+  rtx ops[] = {op_0, op_1, zero};
+
+  icode = code_for_pred_narrow_clip_scalar (unspec, vec_mode);
+  emit_vlmax_insn (icode, BINARY_OP_VXRM_RNU, ops);
+}
+
 /* Expand the standard name ustrunc<m><n>2 for double vector mode,  like
    DI => HI.  we can leverage the vector fixed point vector narrowing
    fixed-point clip directly.  */
@@ -4941,6 +4957,20 @@ expand_vec_quad_ustrunc (rtx op_0, rtx op_1, machine_mode vec_mode,
   expand_vec_double_ustrunc (op_0, double_rtx, double_mode);
 }
 
+/* Expand the standard name sstrunc<m><n>2 for quad vector mode, like
+   DI => HI.  This is done as two successive double-width narrowing
+   clips (vnclip), e.g. DI => SI => HI.  */
+
+void
+expand_vec_quad_sstrunc (rtx op_0, rtx op_1, machine_mode vec_mode,
+			 machine_mode double_mode)
+{
+  rtx double_rtx = gen_reg_rtx (double_mode);
+
+  expand_vec_double_sstrunc (double_rtx, op_1, vec_mode);
+  expand_vec_double_sstrunc (op_0, double_rtx, double_mode);
+}
+
 /* Expand the standard name ustrunc<m><n>2 for double vector mode,  like
    DI => QI.  we can leverage the vector fixed point vector narrowing
    fixed-point clip directly.  */
@@ -4957,6 +4987,22 @@ expand_vec_oct_ustrunc (rtx op_0, rtx op_1, machine_mode vec_mode,
   expand_vec_double_ustrunc (op_0, quad_rtx, quad_mode);
 }
 
+/* Expand the standard name sstrunc<m><n>2 for oct vector mode, like
+   DI => QI.  This is done as three successive double-width narrowing
+   clips (vnclip), e.g. DI => SI => HI => QI.  */
+
+void
+expand_vec_oct_sstrunc (rtx op_0, rtx op_1, machine_mode vec_mode,
+			machine_mode double_mode, machine_mode quad_mode)
+{
+  rtx double_rtx = gen_reg_rtx (double_mode);
+  rtx quad_rtx = gen_reg_rtx (quad_mode);
+
+  expand_vec_double_sstrunc (double_rtx, op_1, vec_mode);
+  expand_vec_double_sstrunc (quad_rtx, double_rtx, double_mode);
+  expand_vec_double_sstrunc (op_0, quad_rtx, quad_mode);
+}
+
 /* Vectorize popcount by the Wilkes-Wheeler-Gill algorithm that libgcc uses as
    well.  */
 void
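
The quad and oct expanders above rely on saturation composing: clipping
DI to SI and then SI to HI gives the same result as clipping DI to HI
directly, because the HI range lies inside the SI range. The following
scalar sketch, with hypothetical helper names, mirrors the chained calls
in expand_vec_quad_sstrunc and expand_vec_oct_sstrunc and compares them
with a one-step saturation.

```c
#include <stdint.h>

/* Hypothetical: one signed narrowing clip step each, i.e. the
   per-element effect of a single vnclip with a zero shift.  */
static inline int32_t
clip_i64_i32 (int64_t x)
{
  return x > INT32_MAX ? INT32_MAX : x < INT32_MIN ? INT32_MIN : (int32_t) x;
}

static inline int16_t
clip_i32_i16 (int32_t x)
{
  return x > INT16_MAX ? INT16_MAX : x < INT16_MIN ? INT16_MIN : (int16_t) x;
}

static inline int8_t
clip_i16_i8 (int16_t x)
{
  return x > INT8_MAX ? INT8_MAX : x < INT8_MIN ? INT8_MIN : (int8_t) x;
}

/* Quad (DI => HI): two chained steps, DI => SI => HI.  */
static inline int16_t
quad_sstrunc (int64_t x)
{
  return clip_i32_i16 (clip_i64_i32 (x));
}

/* Oct (DI => QI): three chained steps, DI => SI => HI => QI.  */
static inline int8_t
oct_sstrunc (int64_t x)
{
  return clip_i16_i8 (clip_i32_i16 (clip_i64_i32 (x)));
}

/* One-step saturation straight from DI, for comparison.  */
static inline int16_t
direct_i64_i16 (int64_t x)
{
  return x > INT16_MAX ? INT16_MAX : x < INT16_MIN ? INT16_MIN : (int16_t) x;
}

static inline int8_t
direct_i64_i8 (int64_t x)
{
  return x > INT8_MAX ? INT8_MAX : x < INT8_MIN ? INT8_MIN : (int8_t) x;
}
```

Every in-range value passes through each step unchanged, and every
out-of-range value is pinned to the bound of the first step, which the
later steps then pin to their own bound of the same sign.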