From patchwork Mon Oct 14 11:10:50 2024
X-Patchwork-Submitter: "Li, Pan2"
X-Patchwork-Id: 1996818
From: pan2.li@intel.com
To: gcc-patches@gcc.gnu.org
Cc:
 richard.guenther@gmail.com, Tamar.Christina@arm.com, juzhe.zhong@rivai.ai,
 kito.cheng@gmail.com, jeffreyalaw@gmail.com, rdapp.gcc@gmail.com, Pan Li
Subject: [PATCH 03/11] RISC-V: Implement vector SAT_TRUNC for signed integer
Date: Mon, 14 Oct 2024 19:10:50 +0800
Message-ID: <20241014111058.1033886-3-pan2.li@intel.com>
In-Reply-To: <20241014111058.1033886-1-pan2.li@intel.com>
References: <20241014111058.1033886-1-pan2.li@intel.com>

From: Pan Li

This patch implements the sstrunc standard name (signed saturating
truncation) for vectors of signed integers.

Form 1:
  #define DEF_VEC_SAT_S_TRUNC_FMT_1(NT, WT, NT_MIN, NT_MAX) \
  void __attribute__((noinline)) \
  vec_sat_s_trunc_##NT##_##WT##_fmt_1 (NT *out, WT *in, unsigned limit) \
  { \
    unsigned i; \
    for (i = 0; i < limit; i++) \
      { \
        WT x = in[i]; \
        NT trunc = (NT)x; \
        out[i] = (WT)NT_MIN <= x && x <= (WT)NT_MAX \
          ? trunc \
          : x < 0 ? NT_MIN : NT_MAX; \
      } \
  }

  DEF_VEC_SAT_S_TRUNC_FMT_1(int32_t, int64_t, INT32_MIN, INT32_MAX)

Before this patch:
  vsetvli a5,a2,e64,m1,ta,ma
  vle64.v v1,0(a1)
  slli a3,a5,3
  slli a4,a5,2
  sub a2,a2,a5
  add a1,a1,a3
  vadd.vv v0,v1,v5
  vsetvli zero,zero,e32,mf2,ta,ma
  vnsrl.wx v2,v1,a6
  vncvt.x.x.w v1,v1
  vsetvli zero,zero,e64,m1,ta,ma
  vmsgtu.vv v0,v0,v4
  vsetvli zero,zero,e32,mf2,ta,mu
  vneg.v v2,v2
  vxor.vv v1,v2,v3,v0.t
  vse32.v v1,0(a0)
  add a0,a0,a4
  bne a2,zero,.L3

After this patch:
  vsetvli a5,a2,e32,mf2,ta,ma
  vle64.v v1,0(a1)
  slli a3,a5,3
  slli a4,a5,2
  sub a2,a2,a5
  add a1,a1,a3
  vnclip.wi v1,v1,0
  vse32.v v1,0(a0)
  add a0,a0,a4
  bne a2,zero,.L3

The following test suite passes with this patch:
* The rv64gcv full regression test.

gcc/ChangeLog:

	* config/riscv/autovec.md (sstrunc2): Add new pattern sstrunc
	for double trunc.
	(sstrunc2): Ditto but for quad trunc.
	(sstrunc2): Ditto but for oct trunc.
	* config/riscv/riscv-protos.h (expand_vec_double_sstrunc): Add
	new func decl to expand double trunc.
	(expand_vec_quad_sstrunc): Ditto but for quad trunc.
	(expand_vec_oct_sstrunc): Ditto but for oct trunc.
	* config/riscv/riscv-v.cc (expand_vec_double_sstrunc): Add new
	func to expand double trunc.
	(expand_vec_quad_sstrunc): Ditto but for quad trunc.
	(expand_vec_oct_sstrunc): Ditto but for oct trunc.

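For reference, here is a minimal scalar sketch (not part of this patch) of
the per-element behaviour that Form 1 describes, and of why the quad and
oct expanders can be built by chaining double-width steps: clamping
64 -> 32 and then 32 -> 16 gives the same result as clamping 64 -> 16
directly. All function names below are hypothetical and exist only for
illustration; the sketch models the C semantics and assumes, without
claiming to reproduce, the saturating behaviour of vnclip with a zero
shift.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of one signed saturating narrowing step, 64 -> 32 bits.
   Same result as Form 1 with NT = int32_t and WT = int64_t.  */
static int32_t
sat_s_trunc_i64_i32 (int64_t x)
{
  if (x < (int64_t) INT32_MIN)
    return INT32_MIN;
  if (x > (int64_t) INT32_MAX)
    return INT32_MAX;
  return (int32_t) x;
}

/* Same step one level down, 32 -> 16 bits.  */
static int16_t
sat_s_trunc_i32_i16 (int32_t x)
{
  if (x < (int32_t) INT16_MIN)
    return INT16_MIN;
  if (x > (int32_t) INT16_MAX)
    return INT16_MAX;
  return (int16_t) x;
}

/* Direct 64 -> 16 saturating truncation, used as the reference.  */
static int16_t
sat_s_trunc_i64_i16 (int64_t x)
{
  if (x < (int64_t) INT16_MIN)
    return INT16_MIN;
  if (x > (int64_t) INT16_MAX)
    return INT16_MAX;
  return (int16_t) x;
}

/* Quad truncation expressed as two chained double-width steps.  */
static int16_t
sat_s_trunc_i64_i16_chained (int64_t x)
{
  return sat_s_trunc_i32_i16 (sat_s_trunc_i64_i32 (x));
}

/* Check the chained form against the direct form on boundary values.  */
static void
check_chained_matches_direct (void)
{
  static const int64_t samples[] = {
    INT64_MIN, (int64_t) INT32_MIN - 1, INT32_MIN,
    INT16_MIN - 1, INT16_MIN, -1, 0, 1,
    INT16_MAX, INT16_MAX + 1, INT32_MAX,
    (int64_t) INT32_MAX + 1, INT64_MAX
  };
  size_t i;

  for (i = 0; i < sizeof (samples) / sizeof (samples[0]); i++)
    assert (sat_s_trunc_i64_i16_chained (samples[i])
	    == sat_s_trunc_i64_i16 (samples[i]));
}
```

Calling check_chained_matches_direct () from a test driver exercises the
boundary values on both sides of each narrowing step.
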
Signed-off-by: Pan Li
---
 gcc/config/riscv/autovec.md     | 34 ++++++++++++++++++++++++
 gcc/config/riscv/riscv-protos.h |  4 +++
 gcc/config/riscv/riscv-v.cc     | 46 +++++++++++++++++++++++++++++++++
 3 files changed, 84 insertions(+)

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 7dc78a48874..82d65a95e7a 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -2779,6 +2779,40 @@ (define_expand "ustrunc2"
   }
 )
 
+(define_expand "sstrunc2"
+  [(match_operand: 0 "register_operand")
+   (match_operand:VWEXTI 1 "register_operand")]
+  "TARGET_VECTOR"
+  {
+    riscv_vector::expand_vec_double_sstrunc (operands[0], operands[1],
+                                             mode);
+    DONE;
+  }
+)
+
+(define_expand "sstrunc2"
+  [(match_operand: 0 "register_operand")
+   (match_operand:VQEXTI 1 "register_operand")]
+  "TARGET_VECTOR"
+  {
+    riscv_vector::expand_vec_quad_sstrunc (operands[0], operands[1], mode,
+                                           mode);
+    DONE;
+  }
+)
+
+(define_expand "sstrunc2"
+  [(match_operand: 0 "register_operand")
+   (match_operand:VOEXTI 1 "register_operand")]
+  "TARGET_VECTOR"
+  {
+    riscv_vector::expand_vec_oct_sstrunc (operands[0], operands[1], mode,
+                                          mode,
+                                          mode);
+    DONE;
+  }
+)
+
 ;; =========================================================================
 ;; == Early break auto-vectorization patterns
 ;; =========================================================================
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index b2f5d72f494..2b2378468e2 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -651,9 +651,13 @@ void expand_vec_ssadd (rtx, rtx, rtx, machine_mode);
 void expand_vec_ussub (rtx, rtx, rtx, machine_mode);
 void expand_vec_sssub (rtx, rtx, rtx, machine_mode);
 void expand_vec_double_ustrunc (rtx, rtx, machine_mode);
+void expand_vec_double_sstrunc (rtx, rtx, machine_mode);
 void expand_vec_quad_ustrunc (rtx, rtx, machine_mode, machine_mode);
+void expand_vec_quad_sstrunc (rtx, rtx, machine_mode, machine_mode);
 void expand_vec_oct_ustrunc (rtx, rtx, machine_mode, machine_mode,
                              machine_mode);
+void expand_vec_oct_sstrunc (rtx, rtx, machine_mode, machine_mode,
+                             machine_mode);
 #endif
 bool sew64_scalar_helper (rtx *, rtx *, rtx, machine_mode,
                           bool, void (*)(rtx *, rtx), enum avl_type);
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index fba35652cc2..65d36dc31d2 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -4927,6 +4927,22 @@ expand_vec_double_ustrunc (rtx op_0, rtx op_1, machine_mode vec_mode)
   emit_vlmax_insn (icode, BINARY_OP_VXRM_RNU, ops);
 }
 
+/* Expand the standard name sstrunc2 for double vector mode, like
+   DI => SI. we can leverage the vector fixed point vector narrowing
+   fixed-point clip directly. */
+
+void
+expand_vec_double_sstrunc (rtx op_0, rtx op_1, machine_mode vec_mode)
+{
+  insn_code icode;
+  rtx zero = CONST0_RTX (Xmode);
+  enum unspec unspec = UNSPEC_VNCLIP;
+  rtx ops[] = {op_0, op_1, zero};
+
+  icode = code_for_pred_narrow_clip_scalar (unspec, vec_mode);
+  emit_vlmax_insn (icode, BINARY_OP_VXRM_RNU, ops);
+}
+
 /* Expand the standard name ustrunc2 for double vector mode, like
    DI => HI. we can leverage the vector fixed point vector narrowing
    fixed-point clip directly. */
@@ -4941,6 +4957,20 @@ expand_vec_quad_ustrunc (rtx op_0, rtx op_1, machine_mode vec_mode,
   expand_vec_double_ustrunc (op_0, double_rtx, double_mode);
 }
 
+/* Expand the standard name sstrunc2 for quad vector mode, like
+   DI => HI. we can leverage the vector fixed point vector narrowing
+   fixed-point clip directly. */
+
+void
+expand_vec_quad_sstrunc (rtx op_0, rtx op_1, machine_mode vec_mode,
+                         machine_mode double_mode)
+{
+  rtx double_rtx = gen_reg_rtx (double_mode);
+
+  expand_vec_double_sstrunc (double_rtx, op_1, vec_mode);
+  expand_vec_double_sstrunc (op_0, double_rtx, double_mode);
+}
+
 /* Expand the standard name ustrunc2 for double vector mode, like
    DI => QI. we can leverage the vector fixed point vector narrowing
    fixed-point clip directly. */
@@ -4957,6 +4987,22 @@ expand_vec_oct_ustrunc (rtx op_0, rtx op_1, machine_mode vec_mode,
   expand_vec_double_ustrunc (op_0, quad_rtx, quad_mode);
 }
 
+/* Expand the standard name sstrunc2 for oct vector mode, like
+   DI => QI. we can leverage the vector fixed point vector narrowing
+   fixed-point clip directly. */
+
+void
+expand_vec_oct_sstrunc (rtx op_0, rtx op_1, machine_mode vec_mode,
+                        machine_mode double_mode, machine_mode quad_mode)
+{
+  rtx double_rtx = gen_reg_rtx (double_mode);
+  rtx quad_rtx = gen_reg_rtx (quad_mode);
+
+  expand_vec_double_sstrunc (double_rtx, op_1, vec_mode);
+  expand_vec_double_sstrunc (quad_rtx, double_rtx, double_mode);
+  expand_vec_double_sstrunc (op_0, quad_rtx, quad_mode);
+}
+
 /* Vectorize popcount by the Wilkes-Wheeler-Gill algorithm that
    libgcc uses as well. */
 void