From patchwork Tue May 28 03:14:52 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Li, Pan2"
X-Patchwork-Id: 1940200
From: pan2.li@intel.com
To: gcc-patches@gcc.gnu.org
Cc: juzhe.zhong@rivai.ai, kito.cheng@gmail.com, tamar.christina@arm.com,
 richard.guenther@gmail.com, Pan Li
Subject: [PATCH v1] Internal-fn: Add new IFN mask_len_strided_load/store
Date: Tue, 28 May 2024 11:14:52 +0800
Message-Id: <20240528031452.2706461-1-pan2.li@intel.com>
X-Mailer: git-send-email 2.34.1

From: Pan Li

This patch adds new internal functions for the following two IFNs:

* mask_len_strided_load
* mask_len_strided_store

The GIMPLE v = MASK_LEN_STRIDED_LOAD (ptr, stride, mask, len, bias)
will be expanded into
v = mask_len_strided_load (ptr, stride, mask, len, bias).

The GIMPLE MASK_LEN_STRIDED_STORE (ptr, stride, v, mask, len, bias)
will be expanded into
mask_len_strided_store (ptr, stride, v, mask, len, bias).

The following test suites passed with this patch:

* The x86 bootstrap test.
* The x86 full regression test.
* The riscv full regression test.

gcc/ChangeLog:

	* doc/md.texi: Add description for mask_len_strided_load/store.
	* internal-fn.cc (strided_load_direct): New internal_fn define
	for strided_load_direct.
	(strided_store_direct): Ditto but for store.
	(expand_strided_load_optab_fn): New expand func for
	mask_len_strided_load.
	(expand_strided_store_optab_fn): Ditto but for store.
	(direct_strided_load_optab_supported_p): New define for load
	direct optab supported.
	(direct_strided_store_optab_supported_p): Ditto but for store.
	(internal_fn_len_index): Add len index for both load and store.
	(internal_fn_mask_index): Ditto but for mask index.
	(internal_fn_stored_value_index): Add stored index.
	* internal-fn.def (MASK_LEN_STRIDED_LOAD): New direct fn define
	for strided_load.
	(MASK_LEN_STRIDED_STORE): Ditto but for strided_store.
	* optabs.def (OPTAB_D): New optab define for load and store.

Signed-off-by: Pan Li
Co-Authored-By: Juzhe-Zhong
---
 gcc/doc/md.texi     | 27 ++++++++++++++++
 gcc/internal-fn.cc  | 75 +++++++++++++++++++++++++++++++++++++++++++++
 gcc/internal-fn.def |  6 ++++
 gcc/optabs.def      |  2 ++
 4 files changed, 110 insertions(+)

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 5730bda80dc..3d242675c63 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5138,6 +5138,20 @@ Bit @var{i} of the mask is set if element @var{i} of the result should
 be loaded from memory and clear if element @var{i} of the result should be undefined.
 Mask elements @var{i} with @var{i} > (operand 6 + operand 7) are ignored.
 
+@cindex @code{mask_len_strided_load@var{m}} instruction pattern
+@item @samp{mask_len_strided_load@var{m}}
+Load several separate memory locations into a destination vector of mode @var{m}.
+Operand 0 is a destination vector of mode @var{m}.
+Operand 1 is a scalar base address and operand 2 is a scalar stride of Pmode.
+Operand 3 is the mask operand, operand 4 is the length operand and operand 5 is the bias operand.
+The instruction can be seen as a special case of @code{mask_len_gather_load@var{m}@var{n}}
+with an offset vector that is a @code{vec_series} with operand 1 as base and operand 2 as step.
+For each element index @var{i} the load address is operand 1 + @var{i} * operand 2.
+Similar to mask_len_load, the instruction loads at most (operand 4 + operand 5) elements from memory.
+Element @var{i} of the mask (operand 3) is set if element @var{i} of the result should
+be loaded from memory and clear if element @var{i} of the result should be zero.
+Mask elements @var{i} with @var{i} > (operand 4 + operand 5) are ignored.
+
 @cindex @code{scatter_store@var{m}@var{n}} instruction pattern
 @item @samp{scatter_store@var{m}@var{n}}
 Store a vector of mode @var{m} into several distinct memory locations.
@@ -5175,6 +5189,19 @@ at most (operand 6 + operand 7) elements of (operand 4) to memory.
 Bit @var{i} of the mask is set if element @var{i} of (operand 4) should be stored.
 Mask elements @var{i} with @var{i} > (operand 6 + operand 7) are ignored.
 
+@cindex @code{mask_len_strided_store@var{m}} instruction pattern
+@item @samp{mask_len_strided_store@var{m}}
+Store a vector of mode @var{m} into several distinct memory locations.
+Operand 0 is a scalar base address and operand 1 is a scalar stride of Pmode.
+Operand 2 is the vector of values that should be stored, which is of mode @var{m}.
+Operand 3 is the mask operand, operand 4 is the length operand and operand 5 is the bias operand.
+The instruction can be seen as a special case of @code{mask_len_scatter_store@var{m}@var{n}}
+with an offset vector that is a @code{vec_series} with operand 0 as base and operand 1 as step.
+For each element index @var{i} the store address is operand 0 + @var{i} * operand 1.
+Similar to mask_len_store, the instruction stores at most (operand 4 + operand 5) elements of (operand 2) to memory.
+Element @var{i} of the mask (operand 3) is set if element @var{i} of (operand 2) should be stored.
+Mask elements @var{i} with @var{i} > (operand 4 + operand 5) are ignored.
+
 @cindex @code{vec_set@var{m}} instruction pattern
 @item @samp{vec_set@var{m}}
 Set given field in the vector value.  Operand 0 is the vector to modify,
diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index 9c09026793f..f6e5329cd84 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -159,6 +159,7 @@ init_internal_fns ()
 #define load_lanes_direct { -1, -1, false }
 #define mask_load_lanes_direct { -1, -1, false }
 #define gather_load_direct { 3, 1, false }
+#define strided_load_direct { -1, -1, false }
 #define len_load_direct { -1, -1, false }
 #define mask_len_load_direct { -1, 4, false }
 #define mask_store_direct { 3, 2, false }
@@ -168,6 +169,7 @@ init_internal_fns ()
 #define vec_cond_mask_len_direct { 1, 1, false }
 #define vec_cond_direct { 2, 0, false }
 #define scatter_store_direct { 3, 1, false }
+#define strided_store_direct { 1, 1, false }
 #define len_store_direct { 3, 3, false }
 #define mask_len_store_direct { 4, 5, false }
 #define vec_set_direct { 3, 3, false }
@@ -3668,6 +3670,68 @@ expand_gather_load_optab_fn (internal_fn, gcall *stmt, direct_optab optab)
     emit_move_insn (lhs_rtx, ops[0].value);
 }
 
+/* Expand MASK_LEN_STRIDED_LOAD call STMT by optab OPTAB.  */
+
+static void
+expand_strided_load_optab_fn (ATTRIBUTE_UNUSED internal_fn, gcall *stmt,
+                              direct_optab optab)
+{
+  tree lhs = gimple_call_lhs (stmt);
+  tree base = gimple_call_arg (stmt, 0);
+  tree stride = gimple_call_arg (stmt, 1);
+
+  rtx lhs_rtx = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
+  rtx base_rtx = expand_normal (base);
+  rtx stride_rtx = expand_normal (stride);
+
+  unsigned i = 0;
+  class expand_operand ops[6];
+  machine_mode mode = TYPE_MODE (TREE_TYPE (lhs));
+
+  create_output_operand (&ops[i++], lhs_rtx, mode);
+  create_address_operand (&ops[i++], base_rtx);
+  create_address_operand (&ops[i++], stride_rtx);
+
+  insn_code icode = direct_optab_handler (optab, mode);
+
+  i = add_mask_and_len_args (ops, i, stmt);
+  expand_insn (icode, i, ops);
+
+  if (!rtx_equal_p (lhs_rtx, ops[0].value))
+    emit_move_insn (lhs_rtx, ops[0].value);
+}
+
+/* Expand MASK_LEN_STRIDED_STORE call STMT by optab OPTAB.  */
+
+static void
+expand_strided_store_optab_fn (ATTRIBUTE_UNUSED internal_fn, gcall *stmt,
+                               direct_optab optab)
+{
+  internal_fn fn = gimple_call_internal_fn (stmt);
+  int rhs_index = internal_fn_stored_value_index (fn);
+
+  tree base = gimple_call_arg (stmt, 0);
+  tree stride = gimple_call_arg (stmt, 1);
+  tree rhs = gimple_call_arg (stmt, rhs_index);
+
+  rtx base_rtx = expand_normal (base);
+  rtx stride_rtx = expand_normal (stride);
+  rtx rhs_rtx = expand_normal (rhs);
+
+  unsigned i = 0;
+  class expand_operand ops[6];
+  machine_mode mode = TYPE_MODE (TREE_TYPE (rhs));
+
+  create_address_operand (&ops[i++], base_rtx);
+  create_address_operand (&ops[i++], stride_rtx);
+  create_input_operand (&ops[i++], rhs_rtx, mode);
+
+  insn_code icode = direct_optab_handler (optab, mode);
+  i = add_mask_and_len_args (ops, i, stmt);
+
+  expand_insn (icode, i, ops);
+}
+
 /* Helper for expand_DIVMOD.  Return true if the sequence starting with
    INSN contains any call insns or insns with {,U}{DIV,MOD} rtxes.  */
 
@@ -4058,6 +4122,7 @@ multi_vector_optab_supported_p (convert_optab optab, tree_pair types,
 #define direct_load_lanes_optab_supported_p multi_vector_optab_supported_p
 #define direct_mask_load_lanes_optab_supported_p multi_vector_optab_supported_p
 #define direct_gather_load_optab_supported_p convert_optab_supported_p
+#define direct_strided_load_optab_supported_p direct_optab_supported_p
 #define direct_len_load_optab_supported_p direct_optab_supported_p
 #define direct_mask_len_load_optab_supported_p convert_optab_supported_p
 #define direct_mask_store_optab_supported_p convert_optab_supported_p
@@ -4066,6 +4131,7 @@ multi_vector_optab_supported_p (convert_optab optab, tree_pair types,
 #define direct_vec_cond_mask_optab_supported_p convert_optab_supported_p
 #define direct_vec_cond_optab_supported_p convert_optab_supported_p
 #define direct_scatter_store_optab_supported_p convert_optab_supported_p
+#define direct_strided_store_optab_supported_p direct_optab_supported_p
 #define direct_len_store_optab_supported_p direct_optab_supported_p
 #define direct_mask_len_store_optab_supported_p convert_optab_supported_p
 #define direct_while_optab_supported_p convert_optab_supported_p
@@ -4723,6 +4789,8 @@ internal_fn_len_index (internal_fn fn)
     case IFN_COND_LEN_XOR:
     case IFN_COND_LEN_SHL:
     case IFN_COND_LEN_SHR:
+    case IFN_MASK_LEN_STRIDED_LOAD:
+    case IFN_MASK_LEN_STRIDED_STORE:
       return 4;
 
     case IFN_COND_LEN_NEG:
@@ -4817,6 +4885,10 @@ internal_fn_mask_index (internal_fn fn)
     case IFN_MASK_LEN_STORE:
       return 2;
 
+    case IFN_MASK_LEN_STRIDED_LOAD:
+    case IFN_MASK_LEN_STRIDED_STORE:
+      return 3;
+
     case IFN_MASK_GATHER_LOAD:
     case IFN_MASK_SCATTER_STORE:
     case IFN_MASK_LEN_GATHER_LOAD:
@@ -4840,6 +4912,9 @@ internal_fn_stored_value_index (internal_fn fn)
 {
   switch (fn)
     {
+    case IFN_MASK_LEN_STRIDED_STORE:
+      return 2;
+
     case IFN_MASK_STORE:
     case IFN_MASK_STORE_LANES:
     case IFN_SCATTER_STORE:
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index 25badbb86e5..b30a7a5b009 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -56,6 +56,7 @@ along with GCC; see the file COPYING3.  If not see
    - mask_load_lanes: currently just vec_mask_load_lanes
    - mask_len_load_lanes: currently just vec_mask_len_load_lanes
    - gather_load: used for {mask_,mask_len_,}gather_load
+   - strided_load: currently just mask_len_strided_load
    - len_load: currently just len_load
    - mask_len_load: currently just mask_len_load
 
@@ -64,6 +65,7 @@ along with GCC; see the file COPYING3.  If not see
    - mask_store_lanes: currently just vec_mask_store_lanes
    - mask_len_store_lanes: currently just vec_mask_len_store_lanes
    - scatter_store: used for {mask_,mask_len_,}scatter_store
+   - strided_store: currently just mask_len_strided_store
    - len_store: currently just len_store
    - mask_len_store: currently just mask_len_store
 
@@ -212,6 +214,8 @@ DEF_INTERNAL_OPTAB_FN (MASK_GATHER_LOAD, ECF_PURE,
                        mask_gather_load, gather_load)
 DEF_INTERNAL_OPTAB_FN (MASK_LEN_GATHER_LOAD, ECF_PURE,
                        mask_len_gather_load, gather_load)
+DEF_INTERNAL_OPTAB_FN (MASK_LEN_STRIDED_LOAD, ECF_PURE,
+                       mask_len_strided_load, strided_load)
 
 DEF_INTERNAL_OPTAB_FN (LEN_LOAD, ECF_PURE, len_load, len_load)
 DEF_INTERNAL_OPTAB_FN (MASK_LEN_LOAD, ECF_PURE, mask_len_load, mask_len_load)
@@ -221,6 +225,8 @@ DEF_INTERNAL_OPTAB_FN (MASK_SCATTER_STORE, 0,
                        mask_scatter_store, scatter_store)
 DEF_INTERNAL_OPTAB_FN (MASK_LEN_SCATTER_STORE, 0,
                        mask_len_scatter_store, scatter_store)
+DEF_INTERNAL_OPTAB_FN (MASK_LEN_STRIDED_STORE, 0,
+                       mask_len_strided_store, strided_store)
 
 DEF_INTERNAL_OPTAB_FN (MASK_STORE, 0, maskstore, mask_store)
 DEF_INTERNAL_OPTAB_FN (STORE_LANES, ECF_CONST, vec_store_lanes, store_lanes)
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 3f2cb46aff8..630b1de8f97 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -539,4 +539,6 @@ OPTAB_DC (vec_series_optab, "vec_series$a", VEC_SERIES)
 OPTAB_D (vec_shl_insert_optab, "vec_shl_insert_$a")
 OPTAB_D (len_load_optab, "len_load_$a")
 OPTAB_D (len_store_optab, "len_store_$a")
+OPTAB_D (mask_len_strided_load_optab, "mask_len_strided_load_$a")
+OPTAB_D (mask_len_strided_store_optab, "mask_len_strided_store_$a")
 OPTAB_D (select_vl_optab, "select_vl$a")
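
Note for reviewers (not part of the patch): the operand semantics that the
md.texi hunks describe can be sketched as a scalar reference model in C.
Everything below is an illustrative assumption rather than GCC code: the
function names ref_mask_len_strided_load/ref_mask_len_strided_store, the
fixed int32_t element type, the nunits parameter, and the reading of the
stride as a byte offset.  Lanes at or beyond len + bias are simply left
untouched here, since the documentation only states that their mask bits
are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: number of active lanes, i.e. min (len + bias, nunits).
   In GCC the bias operand is 0 or -1; any value with len + bias >= 0 is
   accepted in this sketch.  */
size_t
ref_active_lanes (size_t len, ptrdiff_t bias, size_t nunits)
{
  ptrdiff_t n = (ptrdiff_t) len + bias;
  assert (n >= 0);
  return (size_t) n < nunits ? (size_t) n : nunits;
}

/* Sketch of mask_len_strided_load: lane I reads from BASE + I * STRIDE
   (STRIDE assumed to be in bytes) when MASK[I] is set, and is zeroed when
   MASK[I] is clear.  Lanes past the active length are not written.  */
void
ref_mask_len_strided_load (int32_t *dest, const char *base, ptrdiff_t stride,
                           const bool *mask, size_t len, ptrdiff_t bias,
                           size_t nunits)
{
  size_t active = ref_active_lanes (len, bias, nunits);
  for (size_t i = 0; i < active; i++)
    {
      if (mask[i])
        memcpy (&dest[i], base + (ptrdiff_t) i * stride, sizeof (int32_t));
      else
        dest[i] = 0;
    }
}

/* Sketch of mask_len_strided_store: lane I of SRC is written to
   BASE + I * STRIDE when MASK[I] is set; memory for masked-off or
   inactive lanes is left unchanged.  */
void
ref_mask_len_strided_store (char *base, ptrdiff_t stride, const int32_t *src,
                            const bool *mask, size_t len, ptrdiff_t bias,
                            size_t nunits)
{
  size_t active = ref_active_lanes (len, bias, nunits);
  for (size_t i = 0; i < active; i++)
    if (mask[i])
      memcpy (base + (ptrdiff_t) i * stride, &src[i], sizeof (int32_t));
}
```

With a stride of 2 * sizeof (int32_t), a four-lane load with len = 3 and
bias = 0 touches only source elements 0, 2 and 4, and skips any of those
whose mask bit is clear.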