From patchwork Thu Oct 12 06:41:37 2023
X-Patchwork-Submitter: =?utf-8?b?6ZKf5bGF5ZOy?=
X-Patchwork-Id: 1847238
From: Juzhe-Zhong
To: gcc-patches@gcc.gnu.org
Cc: richard.sandiford@arm.com, rguenther@suse.de, Juzhe-Zhong
Subject: [PATCH V2] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
Date: Thu, 12 Oct 2023 14:41:37 +0800
Message-Id: <20231012064137.733900-1-juzhe.zhong@rivai.ai>
X-Mailer: git-send-email 2.36.3

This patch fixes the following FAILs in the RISC-V regression run:

FAIL: gcc.dg/vect/vect-gather-1.c -flto -ffat-lto-objects scan-tree-dump vect "Loop contains only SLP stmts"
FAIL: gcc.dg/vect/vect-gather-1.c scan-tree-dump vect "Loop contains only SLP stmts"
FAIL: gcc.dg/vect/vect-gather-3.c -flto -ffat-lto-objects scan-tree-dump vect "Loop contains only SLP stmts"
FAIL: gcc.dg/vect/vect-gather-3.c scan-tree-dump vect "Loop contains only SLP stmts"

The root cause of these FAILs is that SLP fails on MASK_LEN_GATHER_LOAD.
To reuse the existing flow of GATHER_LOAD/MASK_GATHER_LOAD, I adjust the
MASK_LEN_GATHER_LOAD/MASK_LEN_SCATTER_STORE pattern in
tree-vect-patterns.cc.

Here is the adjustment in tree-vect-patterns.cc:

1. For unconditional gather load/scatter store:

   MASK_LEN_GATHER_LOAD (base, offset, scale, zero, -1)
   --->
   MASK_LEN_GATHER_LOAD (base, offset, scale, zero)

   Note that we remove the dummy mask (-1) of MASK_LEN_GATHER_LOAD, so
   that we can reuse the current SLP flow of GATHER_LOAD.

2. For conditional gather load/scatter store:

   We don't change the IR, so they keep the additional conditional mask,
   and we reuse the current flow of MASK_GATHER_LOAD.

So, after pattern recognition (tree-vect-patterns.cc), we end up with
scalar gather/scatter IR with a different number of arguments
(4 arguments for unconditional, 5 arguments for conditional).
The difference only applies to the scalar gather/scatter IR.

To handle this, pass a "call" argument through to internal_fn_mask_index
and return the mask index according to CALL for
mask_len_gather/mask_len_scatter.

The vector IR is always the same (it keeps the original format):

   MASK_LEN_GATHER_LOAD (ptr, offset, scale, zero, mask, len, bias)

Hence, the optabs of mask_len gather/scatter don't change.

To conclude, we only change the format of the mask_len gather/scatter
scalar IR in tree-vect-patterns.cc. The flow of
MASK_LEN_GATHER_LOAD/MASK_LEN_SCATTER_STORE after this patch seems more
natural and reasonable.

Also, I noticed that SLP of conditional gather_load is not covered by a
test, so I add one.

RISC-V regression passed; bootstrap and regression on x86 passed.

Ok for trunk?

gcc/ChangeLog:

	* internal-fn.cc (internal_fn_mask_index): Add call argument.
	* internal-fn.h (internal_fn_mask_index): Ditto.
	* tree-vect-patterns.cc (vect_recog_gather_scatter_pattern): Delete
	MASK_LEN_GATHER_LOAD/MASK_LEN_SCATTER_STORE.
	* tree-vect-slp.cc (vect_get_operand_map): Add MASK_LEN_GATHER_LOAD.
	(vect_build_slp_tree_1): Ditto.
	(vect_build_slp_tree_2): Ditto.
	* tree-vect-stmts.cc (exist_non_indexing_operands_for_use_p): Ditto.
	(vectorizable_store): Adapt for new interface of internal_fn_mask_index.
	(vectorizable_load): Ditto.

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/vect-gather-6.c: New test.

---
 gcc/internal-fn.cc                        | 16 ++++++++++++++--
 gcc/internal-fn.h                         |  2 +-
 gcc/testsuite/gcc.dg/vect/vect-gather-6.c | 15 +++++++++++++++
 gcc/tree-vect-patterns.cc                 |  4 +---
 gcc/tree-vect-slp.cc                      | 17 +++++++++++++++--
 gcc/tree-vect-stmts.cc                    |  6 +++---
 6 files changed, 49 insertions(+), 11 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-gather-6.c

diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index 61d5a9e4772..009ebd95785 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -4701,7 +4701,7 @@ internal_fn_len_index (internal_fn fn)
    otherwise return -1.  */
 
 int
-internal_fn_mask_index (internal_fn fn)
+internal_fn_mask_index (internal_fn fn, gcall *call)
 {
   switch (fn)
     {
@@ -4717,9 +4717,21 @@ internal_fn_mask_index (internal_fn fn)
 
     case IFN_MASK_GATHER_LOAD:
     case IFN_MASK_SCATTER_STORE:
+      return 4;
+
     case IFN_MASK_LEN_GATHER_LOAD:
     case IFN_MASK_LEN_SCATTER_STORE:
-      return 4;
+      /* In tree-vect-patterns.cc, we will have these 2 situations:
+
+         - Unconditional gather load transforms
+           into MASK_LEN_GATHER_LOAD with no mask.
+
+         - Conditional gather load transforms
+           into MASK_LEN_GATHER_LOAD with real conditional mask.*/
+      if (!call || gimple_num_args (call) == 5)
+        return 4;
+      else
+        return -1;
 
     default:
       return (conditional_internal_fn_code (fn) != ERROR_MARK
diff --git a/gcc/internal-fn.h b/gcc/internal-fn.h
index 99de13a0199..62fbbd537f4 100644
--- a/gcc/internal-fn.h
+++ b/gcc/internal-fn.h
@@ -235,7 +235,7 @@ extern bool can_interpret_as_conditional_op_p (gimple *, tree *,
 extern bool internal_load_fn_p (internal_fn);
 extern bool internal_store_fn_p (internal_fn);
 extern bool internal_gather_scatter_fn_p (internal_fn);
-extern int internal_fn_mask_index (internal_fn);
+extern int internal_fn_mask_index (internal_fn, gcall * = nullptr);
 extern int internal_fn_len_index (internal_fn);
 extern int internal_fn_stored_value_index (internal_fn);
 extern bool internal_gather_scatter_fn_supported_p (internal_fn, tree,
diff --git a/gcc/testsuite/gcc.dg/vect/vect-gather-6.c b/gcc/testsuite/gcc.dg/vect/vect-gather-6.c
new file mode 100644
index 00000000000..ff55f321854
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-gather-6.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+
+void
+f (int *restrict y, int *restrict x, int *restrict indices, int *restrict cond, int n)
+{
+  for (int i = 0; i < n; ++i)
+    {
+      if (cond[i * 2])
+        y[i * 2] = x[indices[i * 2]] + 1;
+      if (cond[i * 2 + 1])
+        y[i * 2 + 1] = x[indices[i * 2 + 1]] + 2;
+    }
+}
+
+/* { dg-final { scan-tree-dump "Loop contains only SLP stmts" vect { target vect_gather_load_ifn } } } */
diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index 6964c998698..7aaeecbbaed 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -6142,9 +6142,7 @@ vect_recog_gather_scatter_pattern (vec_info *vinfo,
     mask = vect_convert_mask_for_vectype (mask, gs_vectype, stmt_info,
                                           loop_vinfo);
   else if (gs_info.ifn == IFN_MASK_SCATTER_STORE
-           || gs_info.ifn == IFN_MASK_GATHER_LOAD
-           || gs_info.ifn == IFN_MASK_LEN_SCATTER_STORE
-           || gs_info.ifn == IFN_MASK_LEN_GATHER_LOAD)
+           || gs_info.ifn == IFN_MASK_GATHER_LOAD)
     mask = build_int_cst (TREE_TYPE (truth_type_for (gs_vectype)), -1);
 
   /* Get the invariant base and non-invariant offset, converting the
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index fa098f9ff4e..8e4116f6fa8 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -544,6 +544,16 @@ vect_get_operand_map (const gimple *stmt, unsigned char swap = 0)
           case IFN_MASK_GATHER_LOAD:
             return arg1_arg4_map;
 
+          case IFN_MASK_LEN_GATHER_LOAD:
+            /* In tree-vect-patterns.cc, we will have these 2 situations:
+
+               - Unconditional gather load transforms
+                 into MASK_LEN_GATHER_LOAD with dummy mask which is -1.
+
+               - Conditional gather load transforms
+                 into MASK_LEN_GATHER_LOAD with real conditional mask.*/
+            return gimple_num_args (call) == 5 ? arg1_arg4_map : arg1_map;
+
           case IFN_MASK_STORE:
             return arg3_arg2_map;
 
@@ -1077,7 +1087,8 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
 
           if (cfn == CFN_MASK_LOAD
               || cfn == CFN_GATHER_LOAD
-              || cfn == CFN_MASK_GATHER_LOAD)
+              || cfn == CFN_MASK_GATHER_LOAD
+              || cfn == CFN_MASK_LEN_GATHER_LOAD)
             ldst_p = true;
           else if (cfn == CFN_MASK_STORE)
             {
@@ -1337,6 +1348,7 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
           if (DR_IS_READ (STMT_VINFO_DATA_REF (stmt_info))
               && rhs_code != CFN_GATHER_LOAD
               && rhs_code != CFN_MASK_GATHER_LOAD
+              && rhs_code != CFN_MASK_LEN_GATHER_LOAD
               /* Not grouped loads are handled as externals for BB
                  vectorization.  For loop vectorization we can handle splats the
                  same we handle single element interleaving.  */
@@ -1837,7 +1849,8 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
       if (gcall *stmt = dyn_cast <gcall *> (stmt_info->stmt))
         gcc_assert (gimple_call_internal_p (stmt, IFN_MASK_LOAD)
                     || gimple_call_internal_p (stmt, IFN_GATHER_LOAD)
-                    || gimple_call_internal_p (stmt, IFN_MASK_GATHER_LOAD));
+                    || gimple_call_internal_p (stmt, IFN_MASK_GATHER_LOAD)
+                    || gimple_call_internal_p (stmt, IFN_MASK_LEN_GATHER_LOAD));
       else
         {
           *max_nunits = this_max_nunits;
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index cd7c1090d88..a2a3486931d 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -448,7 +448,7 @@ exist_non_indexing_operands_for_use_p (tree use, stmt_vec_info stmt_info)
   if (call && gimple_call_internal_p (call))
     {
       internal_fn ifn = gimple_call_internal_fn (call);
-      int mask_index = internal_fn_mask_index (ifn);
+      int mask_index = internal_fn_mask_index (ifn, call);
       if (mask_index >= 0
           && use == gimple_call_arg (call, mask_index))
         return true;
@@ -8246,7 +8246,7 @@ vectorizable_store (vec_info *vinfo,
       if (!internal_store_fn_p (ifn))
         return false;
 
-      int mask_index = internal_fn_mask_index (ifn);
+      int mask_index = internal_fn_mask_index (ifn, call);
       if (mask_index >= 0 && slp_node)
         mask_index = vect_slp_child_index_for_operand (call, mask_index);
       if (mask_index >= 0
@@ -9574,7 +9574,7 @@ vectorizable_load (vec_info *vinfo,
       if (!scalar_dest)
         return false;
 
-      mask_index = internal_fn_mask_index (ifn);
+      mask_index = internal_fn_mask_index (ifn, call);
       if (mask_index >= 0 && slp_node)
         mask_index = vect_slp_child_index_for_operand (call, mask_index);
       if (mask_index >= 0
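
The argument-count rule from the commit message can be tried outside GCC. The following is only a minimal sketch under stated assumptions: `ifn`, `scalar_call`, `mask_index`, `slp_operands` and `self_check` are hypothetical stand-ins for `internal_fn`, `gcall`, `internal_fn_mask_index` and `vect_get_operand_map`, and the only thing modeled is the number of arguments of the scalar call (4 for unconditional, 5 for conditional).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for illustration only; these are not GCC types.
enum class ifn { mask_gather_load, mask_len_gather_load };

struct scalar_call
{
  ifn fn;
  std::size_t num_args; // 4 = unconditional, 5 = conditional (has a mask)
};

// Models the patched internal_fn_mask_index: for the MASK_LEN variant the
// answer depends on the scalar call; with no call the mask is at index 4.
int
mask_index (ifn fn, const scalar_call *call = nullptr)
{
  switch (fn)
    {
    case ifn::mask_gather_load:
      return 4;
    case ifn::mask_len_gather_load:
      return (!call || call->num_args == 5) ? 4 : -1;
    }
  return -1;
}

// Models the vect_get_operand_map choice: which scalar arguments become
// SLP children (offset only, or offset plus mask).
std::vector<int>
slp_operands (const scalar_call &call)
{
  if (call.fn == ifn::mask_len_gather_load && call.num_args != 5)
    return {1};
  return {1, 4};
}

// Small self-check covering both scalar forms.
void
self_check ()
{
  scalar_call uncond{ifn::mask_len_gather_load, 4};
  scalar_call cond{ifn::mask_len_gather_load, 5};
  assert (mask_index (uncond.fn, &uncond) == -1);
  assert (mask_index (cond.fn, &cond) == 4);
  assert (mask_index (ifn::mask_len_gather_load) == 4);
  assert ((slp_operands (uncond) == std::vector<int>{1}));
  assert ((slp_operands (cond) == std::vector<int>{1, 4}));
}
```

Calling `self_check ()` from a `main` exercises both scalar forms. The defaulted `call` parameter mirrors the `gcall * = nullptr` default in internal-fn.h, so callers that only know the function code still get index 4.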