From patchwork Fri Aug 23 16:55:30 2019
X-Patchwork-Submitter: "Andre Vieira (lists)"
X-Patchwork-Id: 1152311
To: gcc-patches, Richard Biener
From: "Andre Vieira (lists)"
Subject: [RFC][vect] PR 65930: teach vectorizer to handle SUM reductions with sign-change casts
Message-ID: <780e0f8d-2b31-a9e8-1701-4a895e95d9b6@arm.com>
Date: Fri, 23 Aug 2019 17:55:30 +0100

Hi Richard,

I have come up with a way to teach the vectorizer to handle sign-changing reductions, restricted to SUM operations, as I am not sure other reductions are equivalent under different signs. The core of the approach is to recognize reductions of the form:

  Phi -> NopConversion? -> Plus/Minus-reduction -> NopConversion? -> Phi

and then vectorize the statements normally, with some extra workarounds to handle the conversions. These are mainly needed where the vectorizer looks for uses of the reduction result: we now have to check the uses of the conversion's result instead.
I am curious to know what you think of this approach. I have regression tested this on aarch64 and x86_64 with AVX512 and it shows no regressions. On the month-old version of trunk I tested, it even seems to make gcc.dg/vect/pr89268.c pass, where it used to fail with an ICE complaining about a definition not dominating a use. Initial benchmarks also show a 14% improvement on x264_r from SPEC2017 on aarch64.

Cheers,
Andre

diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index b0cbbac0cb5ba1ffce706715d3dbb9139063803d..a346547153b6b12fd9090dd7491766986ab2f4f9 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -2576,6 +2576,26 @@ report_vect_op (dump_flags_t msg_type, gimple *stmt, const char *msg)
   dump_printf_loc (msg_type, vect_location, "%s%G", msg, stmt);
 }
 
+/* Function is_nop_conversion_stmt
+
+   Check if STMT is a gimple assign statement that does a tree nop
+   conversion.  */
+
+bool
+is_nop_conversion_stmt (gimple *stmt)
+{
+  tree outer_t, inner_t;
+  if (!is_gimple_assign (stmt))
+    return false;
+  if (gimple_assign_rhs_code (stmt) != NOP_EXPR)
+    return false;
+
+  outer_t = TREE_TYPE (gimple_assign_lhs (stmt));
+  inner_t = TREE_TYPE (gimple_assign_rhs1 (stmt));
+
+  return tree_nop_conversion_p (outer_t, inner_t);
+}
+
 /* DEF_STMT_INFO occurs in a loop that contains a potential reduction
    operation.  Return true if the results of DEF_STMT_INFO are something
    that can be accumulated by such a reduction.  */
@@ -2649,7 +2669,9 @@ vect_is_slp_reduction (loop_vec_info loop_info, gimple *phi,
 	  if (flow_bb_inside_loop_p (loop, gimple_bb (use_stmt)))
 	    {
 	      loop_use_stmt = use_stmt;
-	      nloop_uses++;
+	      /* Do not count a nop conversion as a use.  */
+	      if (!is_nop_conversion_stmt (use_stmt))
+		nloop_uses++;
 	    }
 	  else
 	    n_out_of_loop_uses++;
@@ -2663,18 +2685,24 @@ vect_is_slp_reduction (loop_vec_info loop_info, gimple *phi,
       if (found)
 	break;
 
-      /* We reached a statement with no loop uses.  */
-      if (nloop_uses == 0)
-	return false;
-
       /* This is a loop exit phi, and we haven't reached the reduction phi.  */
       if (gimple_code (loop_use_stmt) == GIMPLE_PHI)
 	return false;
 
-      if (!is_gimple_assign (loop_use_stmt)
-	  || code != gimple_assign_rhs_code (loop_use_stmt)
-	  || !flow_bb_inside_loop_p (loop, gimple_bb (loop_use_stmt)))
-	return false;
+      if (!is_gimple_assign (loop_use_stmt))
+	return false;
+
+      /* Keep moving along the def-use chain, ignoring nop conversions.  */
+      if (!is_nop_conversion_stmt (loop_use_stmt))
+	{
+	  /* We reached a statement with no loop uses.  */
+	  if (nloop_uses == 0)
+	    return false;
+
+	  else if (code != gimple_assign_rhs_code (loop_use_stmt)
+		   || !flow_bb_inside_loop_p (loop,
+					      gimple_bb (loop_use_stmt)))
+	    return false;
+	}
 
       /* Insert USE_STMT into reduction chain.  */
       use_stmt_info = loop_info->lookup_stmt (loop_use_stmt);
@@ -2693,7 +2721,9 @@ vect_is_slp_reduction (loop_vec_info loop_info, gimple *phi,
   for (unsigned i = 0; i < reduc_chain.length (); ++i)
     {
       gassign *next_stmt = as_a <gassign *> (reduc_chain[i]->stmt);
-      if (gimple_assign_rhs2 (next_stmt) == lhs)
+      if (is_nop_conversion_stmt (next_stmt))
+	continue;
+      else if (gimple_assign_rhs2 (next_stmt) == lhs)
 	{
 	  tree op = gimple_assign_rhs1 (next_stmt);
 	  stmt_vec_info def_stmt_info = loop_info->lookup_def (op);
@@ -3120,6 +3150,28 @@ vect_is_simple_reduction (loop_vec_info loop_info, stmt_vec_info phi_info,
   gassign *def_stmt = as_a <gassign *> (def_stmt_info->stmt);
   code = orig_code = gimple_assign_rhs_code (def_stmt);
 
+  /* If the def_stmt is a nop conversion then this is not the real reduction
+     definition statement.  Follow the definition of the variable this
+     statement is converting to the actual reduction definition.  */
+  if (is_nop_conversion_stmt (def_stmt))
+    {
+      tree rhs = gimple_assign_rhs1 (def_stmt);
+      gimple *new_def = SSA_NAME_DEF_STMT (rhs);
+
+      if (is_gimple_assign (new_def))
+	{
+	  enum tree_code new_code = gimple_assign_rhs_code (new_def);
+	  /* Only do this for reductions that are safe to ignore the
+	     signedness though.  */
+	  if (new_code == PLUS_EXPR || new_code == MINUS_EXPR)
+	    {
+	      def_stmt = as_a <gassign *> (new_def);
+	      def_stmt_info = loop_info->lookup_stmt (def_stmt);
+	      code = orig_code = new_code;
+	    }
+	}
+    }
+
   if (nested_in_vect_loop && !check_reduction)
     {
       /* FIXME: Even for non-reductions code generation is funneled
@@ -4551,6 +4603,7 @@ vect_create_epilog_for_reduction (vec<tree> vect_defs,
   tree new_phi_result;
   stmt_vec_info inner_phi = NULL;
   tree induction_index = NULL_TREE;
+  stmt_vec_info use_stmt_info;
 
   if (slp_node)
     group_size = SLP_TREE_SCALAR_STMTS (slp_node).length ();
@@ -4798,6 +4851,18 @@ vect_create_epilog_for_reduction (vec<tree> vect_defs,
         v_out1 = phi
      Store them in NEW_PHIS.  */
 
+  stmt_vec_info orig_stmt_info = vect_orig_stmt (stmt_info);
+  scalar_dest = gimple_assign_lhs (orig_stmt_info->stmt);
+  if ((use_stmt_info = loop_vinfo->lookup_single_use (scalar_dest))
+      && is_nop_conversion_stmt (use_stmt_info->stmt))
+    scalar_dest = gimple_assign_lhs (use_stmt_info->stmt);
+
+  scalar_type = TREE_TYPE (scalar_dest);
+  scalar_results.create (group_size);
+  new_scalar_dest = vect_create_destination_var (scalar_dest, NULL);
+  bitsize = TYPE_SIZE (scalar_type);
+
   exit_bb = single_exit (loop)->dest;
   prev_phi_info = NULL;
   new_phis.create (vect_defs.length ());
@@ -4805,6 +4870,34 @@ vect_create_epilog_for_reduction (vec<tree> vect_defs,
     {
       for (j = 0; j < ncopies; j++)
 	{
+	  /* If use_stmt_info is NULL, then the scalar destination does not
+	     have a single use.  This means we could have the following case:
+	       loop:
+		 phi_r = (phi_i, loop), (initial_def, pre_header)
+		 cast_i = (int) phi_r;
+		 sum = cast_i + ...;
+		 phi_i = (unsigned int) sum;
+	       loop_exit:
+		 phi_out = (sum, loop)
+
+	     In this case, the def will currently point to the result of the
+	     cast rather than the result of the reduction, which means the
+	     loop exit phis will be constructed using the wrong type.  For
+	     this reason we want to use the value of the reduction before the
+	     casting.  Note that we only accept reductions with sign-changing
+	     casts if they are using operations that are sign-invariant.  */
+	  gimple *def_stmt;
+	  if (!use_stmt_info
+	      && !useless_type_conversion_p (TREE_TYPE (TREE_TYPE (def)),
+					     scalar_type)
+	      && tree_nop_conversion_p (TREE_TYPE (TREE_TYPE (def)),
+					scalar_type)
+	      && (def_stmt = SSA_NAME_DEF_STMT (def))
+	      && is_gimple_assign (def_stmt)
+	      && gimple_assign_rhs_code (def_stmt) == VIEW_CONVERT_EXPR)
+	    def = TREE_OPERAND (gimple_assign_rhs1 (def_stmt), 0);
+
 	  tree new_def = copy_ssa_name (def);
 	  phi = create_phi_node (new_def, exit_bb);
 	  stmt_vec_info phi_info = loop_vinfo->add_stmt (phi);
@@ -4863,7 +4956,6 @@ vect_create_epilog_for_reduction (vec<tree> vect_defs,
          Otherwise (it is a regular reduction) - the tree-code and
          scalar-def are taken from STMT.  */
 
-  stmt_vec_info orig_stmt_info = vect_orig_stmt (stmt_info);
   if (orig_stmt_info != stmt_info)
     {
       /* Reduction pattern  */
@@ -4877,12 +4969,6 @@ vect_create_epilog_for_reduction (vec<tree> vect_defs,
   if (code == MINUS_EXPR)
     code = PLUS_EXPR;
 
-  scalar_dest = gimple_assign_lhs (orig_stmt_info->stmt);
-  scalar_type = TREE_TYPE (scalar_dest);
-  scalar_results.create (group_size);
-  new_scalar_dest = vect_create_destination_var (scalar_dest, NULL);
-  bitsize = TYPE_SIZE (scalar_type);
-
   /* In case this is a reduction in an inner-loop while vectorizing an
      outer loop - we don't need to extract a single scalar result at the
      end of the inner-loop (unless it is double reduction, i.e., the use of
     reduction is
@@ -5591,16 +5677,50 @@ vect_finalize_reduction:
   if (adjustment_def)
     {
       gcc_assert (!slp_reduc);
+
       if (nested_in_vect_loop)
 	{
-	  new_phi = new_phis[0];
+	  new_phi = new_phis[0];
+	  new_temp = PHI_RESULT (new_phi);
+	  if (!useless_type_conversion_p (TREE_TYPE (TREE_TYPE (new_temp)),
+					  TREE_TYPE (initial_def)))
+	    {
+	      gimple_seq stmts;
+	      poly_uint64 sz
+		= GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (initial_def)));
+	      vectype
+		= get_vectype_for_scalar_type_and_size (TREE_TYPE (initial_def),
+							sz);
+
+	      gcc_assert (tree_nop_conversion_p (TREE_TYPE (TREE_TYPE (new_temp)),
+						 TREE_TYPE (initial_def)));
+
+	      new_temp = build1 (VIEW_CONVERT_EXPR, vectype, new_temp);
+	      new_temp = force_gimple_operand (unshare_expr (new_temp), &stmts,
+					       true, NULL_TREE);
+	      if (stmts)
+		gsi_insert_before (&exit_gsi, stmts, GSI_SAME_STMT);
+	    }
 	  gcc_assert (TREE_CODE (TREE_TYPE (adjustment_def)) == VECTOR_TYPE);
-	  expr = build2 (code, vectype, PHI_RESULT (new_phi), adjustment_def);
+	  expr = build2 (code, vectype, new_temp, adjustment_def);
 	  new_dest = vect_create_destination_var (scalar_dest, vectype);
 	}
       else
 	{
-	  new_temp = scalar_results[0];
+	  new_temp = scalar_results[0];
+	  if (!useless_type_conversion_p (TREE_TYPE (new_temp),
+					  TREE_TYPE (initial_def)))
+	    {
+	      gimple_seq stmts;
+	      scalar_type = TREE_TYPE (initial_def);
+
+	      gcc_assert (tree_nop_conversion_p (TREE_TYPE (new_temp),
+						 scalar_type));
+	      new_temp = build1 (NOP_EXPR, scalar_type, new_temp);
+	      new_temp = force_gimple_operand (unshare_expr (new_temp), &stmts,
+					       true, NULL_TREE);
+	      if (stmts)
+		gsi_insert_before (&exit_gsi, stmts, GSI_SAME_STMT);
+	    }
 	  gcc_assert (TREE_CODE (TREE_TYPE (adjustment_def)) != VECTOR_TYPE);
 	  expr = build2 (code, scalar_type, new_temp, adjustment_def);
 	  new_dest = vect_create_destination_var (scalar_dest, scalar_type);
@@ -5817,17 +5937,22 @@ vect_finalize_reduction:
 	      continue;
 	    }
 
-	  phis.create (3);
+	  auto_vec<use_operand_p> dest_uses;
+	  dest_uses.create (3);
 	  /* Find the loop-closed-use at the loop exit of the original scalar
 	     result.  (The reduction result is expected to have two immediate
 	     uses, one at the latch block, and one at the loop exit).  For
 	     double reductions we are looking for exit phis of the outer
 	     loop.  */
+
+	  use_stmt_info = loop_vinfo->lookup_single_use (scalar_dest);
+
 	  FOR_EACH_IMM_USE_FAST (use_p, imm_iter, scalar_dest)
 	    {
 	      if (!flow_bb_inside_loop_p (loop, gimple_bb (USE_STMT (use_p))))
 		{
 		  if (!is_gimple_debug (USE_STMT (use_p)))
-		    phis.safe_push (USE_STMT (use_p));
+		    dest_uses.safe_push (use_p);
 		}
 	      else
 		{
@@ -5840,23 +5965,70 @@ vect_finalize_reduction:
 		      if (!flow_bb_inside_loop_p (loop,
 						  gimple_bb (USE_STMT (phi_use_p)))
 			  && !is_gimple_debug (USE_STMT (phi_use_p)))
-			phis.safe_push (USE_STMT (phi_use_p));
+			dest_uses.safe_push (phi_use_p);
 		    }
 		}
 	    }
 	}
 
-	  FOR_EACH_VEC_ELT (phis, i, exit_phi)
-	    {
-	      /* Replace the uses:  */
-	      orig_name = PHI_RESULT (exit_phi);
-	      scalar_result = scalar_results[k];
-	      FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, orig_name)
-		FOR_EACH_IMM_USE_ON_STMT (use_p, imm_iter)
-		  SET_USE (use_p, scalar_result);
-	    }
+	  scalar_result = scalar_results[k];
+	  /* Not quite sure why we initially expect these PHI-nodes to have a
+	     single argument.  Given the sign-change reductions we sometimes
+	     see code generation that results in these phi-nodes having
+	     multiple arguments.  If that is the case we replace the actual
+	     argument within the phi-node rather than the uses of the result
+	     of the phi-node.  */
+	  FOR_EACH_VEC_ELT (dest_uses, i, use_p)
+	    {
+	      if (gimple_phi_num_args (USE_STMT (use_p)) > 1)
+		{
+		  tree use = USE_FROM_PTR (use_p);
+		  if (!useless_type_conversion_p (TREE_TYPE (use),
+						  TREE_TYPE (scalar_result)))
+		    {
+		      gimple_stmt_iterator gsi;
+		      gimple_seq stmts;
+		      gcc_assert (tree_nop_conversion_p (TREE_TYPE (use),
+							 TREE_TYPE
+							 (scalar_result)));
+		      gsi = gsi_for_stmt (SSA_NAME_DEF_STMT (scalar_result));
+		      scalar_result = build1 (NOP_EXPR, TREE_TYPE (use),
+					      scalar_result);
+		      scalar_result
+			= force_gimple_operand (unshare_expr (scalar_result),
+						&stmts, true, NULL_TREE);
+		      gsi_insert_after (&gsi, stmts, GSI_SAME_STMT);
+		    }
+		  SET_USE (use_p, scalar_result);
+		}
+	      else
+		{
+		  orig_name = PHI_RESULT (USE_STMT (use_p));
+		  FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, orig_name)
+		    FOR_EACH_IMM_USE_ON_STMT (use_p, imm_iter)
+		      {
+			tree use = USE_FROM_PTR (use_p);
+			if (!useless_type_conversion_p (TREE_TYPE (use),
+							TREE_TYPE
+							(scalar_result)))
+			  {
+			    gimple_stmt_iterator gsi;
+			    gimple_seq stmts;
+			    gcc_assert (tree_nop_conversion_p (TREE_TYPE (use),
+							       TREE_TYPE
+							       (scalar_result)));
+			    gsi = gsi_for_stmt (SSA_NAME_DEF_STMT
+						(scalar_result));
+			    scalar_result = build1 (NOP_EXPR, TREE_TYPE (use),
+						    scalar_result);
+			    scalar_result
+			      = force_gimple_operand (unshare_expr
+						      (scalar_result),
+						      &stmts, true, NULL_TREE);
+			    gsi_insert_after (&gsi, stmts, GSI_SAME_STMT);
+			  }
+			SET_USE (use_p, scalar_result);
+		      }
+		}
+	    }
 
-	  phis.release ();
+	  dest_uses.release ();
 	}
     }
@@ -6339,6 +6511,9 @@ vectorizable_reduction (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
       for (unsigned k = 1; k < gimple_num_ops (reduc_stmt); ++k)
 	{
 	  tree op = gimple_op (reduc_stmt, k);
+	  if (TREE_CODE (op) == SSA_NAME
+	      && is_nop_conversion_stmt (SSA_NAME_DEF_STMT (op)))
+	    op = gimple_assign_rhs1 (SSA_NAME_DEF_STMT (op));
 	  if (op == phi_result)
 	    continue;
 	  if (k == 1 && code == COND_EXPR)
@@ -6367,9 +6542,16 @@ vectorizable_reduction (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
       stmt_vec_info use_stmt_info;
       if (ncopies > 1
 	  && STMT_VINFO_RELEVANT (reduc_stmt_info) <= vect_used_only_live
-	  && (use_stmt_info = loop_vinfo->lookup_single_use (phi_result))
-	  && vect_stmt_to_vectorize (use_stmt_info) == reduc_stmt_info)
-	single_defuse_cycle = true;
+	  && (use_stmt_info = loop_vinfo->lookup_single_use (phi_result)))
+	{
+	  if (is_nop_conversion_stmt (use_stmt_info->stmt))
+	    {
+	      tree lhs = gimple_assign_lhs (use_stmt_info->stmt);
+	      use_stmt_info = loop_vinfo->lookup_single_use (lhs);
+	    }
+	  if (vect_stmt_to_vectorize (use_stmt_info) == reduc_stmt_info)
+	    single_defuse_cycle = true;
+	}
 
       /* Create the destination vector  */
       scalar_dest = gimple_assign_lhs (reduc_stmt);
@@ -6512,8 +6694,13 @@ vectorizable_reduction (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
				   &def_stmt_info);
       dt = dts[i];
       gcc_assert (is_simple_use);
-      if (dt == vect_reduction_def
-	  && ops[i] == reduc_def)
+
+      if ((dt == vect_reduction_def
+	   && ops[i] == reduc_def)
+	  || (def_stmt_info
+	      && is_nop_conversion_stmt (def_stmt_info->stmt)
+	      && gimple_assign_rhs1 (def_stmt_info->stmt) == reduc_def))
 	{
 	  reduc_index = i;
 	  continue;
@@ -6536,7 +6723,10 @@ vectorizable_reduction (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
 	    return false;
 
 	  if (dt == vect_nested_cycle
-	      && ops[i] == reduc_def)
+	      && (ops[i] == reduc_def
+		  || (def_stmt_info
+		      && is_nop_conversion_stmt (def_stmt_info->stmt)
+		      && gimple_assign_rhs1 (def_stmt_info->stmt) == reduc_def)))
 	    {
 	      found_nested_cycle_def = true;
 	      reduc_index = i;
@@ -6579,6 +6769,8 @@ vectorizable_reduction (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
   if (!(reduc_index == -1
	|| dts[reduc_index] == vect_reduction_def
	|| dts[reduc_index] == vect_nested_cycle
+	|| (dts[reduc_index] == vect_internal_def
+	    && is_nop_conversion_stmt (SSA_NAME_DEF_STMT (ops[reduc_index])))
	|| ((dts[reduc_index] == vect_internal_def
	     || dts[reduc_index] == vect_external_def
	     || dts[reduc_index] == vect_constant_def
@@ -6749,6 +6941,12 @@ vectorizable_reduction (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
       def_arg = PHI_ARG_DEF_FROM_EDGE (reduc_def_phi,
				       loop_preheader_edge (def_stmt_loop));
       stmt_vec_info def_arg_stmt_info = loop_vinfo->lookup_def (def_arg);
+      if (def_arg_stmt_info
+	  && is_nop_conversion_stmt (def_arg_stmt_info->stmt))
+	{
+	  tree rhs = gimple_assign_rhs1 (def_arg_stmt_info->stmt);
+	  def_arg_stmt_info = loop_vinfo->lookup_def (rhs);
+	}
       if (def_arg_stmt_info
	  && (STMT_VINFO_DEF_TYPE (def_arg_stmt_info)
	      == vect_double_reduction_def))
@@ -7133,12 +7331,19 @@ vectorizable_reduction (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
      This only works when we see both the reduction PHI and its only consumer
      in vectorizable_reduction and there are no intermediate stmts
      participating.  */
-  stmt_vec_info use_stmt_info;
+  stmt_vec_info use_stmt_info = NULL;
   tree reduc_phi_result = gimple_phi_result (reduc_def_phi);
   if (ncopies > 1
       && (STMT_VINFO_RELEVANT (stmt_info) <= vect_used_only_live)
-      && (use_stmt_info = loop_vinfo->lookup_single_use (reduc_phi_result))
-      && vect_stmt_to_vectorize (use_stmt_info) == stmt_info)
+      && (use_stmt_info = loop_vinfo->lookup_single_use (reduc_phi_result)))
+    {
+      if (is_nop_conversion_stmt (use_stmt_info->stmt))
+	{
+	  tree lhs = gimple_assign_lhs (use_stmt_info->stmt);
+	  use_stmt_info = loop_vinfo->lookup_single_use (lhs);
+	}
+    }
+  if (use_stmt_info && vect_stmt_to_vectorize (use_stmt_info) == stmt_info)
     {
       single_defuse_cycle = true;
       epilog_copies = 1;
@@ -7405,6 +7610,27 @@ vectorizable_reduction (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
   if ((!single_defuse_cycle || code == COND_EXPR) && !slp_node)
     vect_defs[0] = gimple_get_lhs ((*vec_stmt)->stmt);
 
+  for (j = 0; j < vec_num; ++j)
+    {
+      gimple_seq stmts;
+      gimple_stmt_iterator it
+	= gsi_for_stmt (SSA_NAME_DEF_STMT (vect_defs[j]));
+      tree def_t = TREE_TYPE (vect_defs[j]);
+      tree phi_t = TREE_TYPE (PHI_RESULT (reduc_def_phi));
+      if (tree_nop_conversion_p (TREE_TYPE (def_t), phi_t))
+	{
+	  /* TODO: Not sure about slp_node here...  */
+	  poly_uint64 sz = GET_MODE_SIZE (TYPE_MODE (def_t));
+	  tree vectype = get_vectype_for_scalar_type_and_size (phi_t, sz);
+	  vect_defs[j] = fold_build1 (VIEW_CONVERT_EXPR, vectype,
+				      vect_defs[j]);
+	  vect_defs[j]
+	    = force_gimple_operand (unshare_expr (vect_defs[j]), &stmts,
+				    true, NULL_TREE);
+	  if (stmts)
+	    gsi_insert_after (&it, stmts, GSI_SAME_STMT);
+	}
+    }
+
   vect_create_epilog_for_reduction (vect_defs, stmt_info, reduc_def_phi,
				    epilog_copies, reduc_fn, phis,
				    double_reduc, slp_node, slp_node_instance,
@@ -8148,7 +8374,7 @@ vectorizable_live_operation (stmt_vec_info stmt_info,
   else
     {
       enum vect_def_type dt = STMT_VINFO_DEF_TYPE (stmt_info);
-      vec_lhs = vect_get_vec_def_for_operand_1 (stmt_info, dt);
+      vec_lhs = vect_get_vec_def_for_operand_1 (NULL, stmt_info, dt);
 
       gcc_checking_assert (ncopies == 1
			   || !LOOP_VINFO_FULLY_MASKED_P (loop_vinfo));
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 601a6f55fbff388c89f88d994e790aebf2bf960e..e6af73af30f63ed72ffd6df8486212a83fa57313 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -1519,7 +1519,8 @@ vect_init_vector (stmt_vec_info stmt_info, tree val, tree type,
    with type DT that will be used in the vectorized stmt.  */
 
 tree
-vect_get_vec_def_for_operand_1 (stmt_vec_info def_stmt_info,
+vect_get_vec_def_for_operand_1 (stmt_vec_info stmt_vinfo,
+				stmt_vec_info def_stmt_info,
				enum vect_def_type dt)
 {
   tree vec_oprnd;
@@ -1533,14 +1534,19 @@ vect_get_vec_def_for_operand_1 (stmt_vec_info def_stmt_info,
       /* Code should use vect_get_vec_def_for_operand.  */
       gcc_unreachable ();
 
-    /* Operand is defined by a loop header phi.  In case of nested
-       cycles we also may have uses of the backedge def.  */
+    /* Operand is defined by a loop header phi or by the reduction statement
+       itself when we are asking for the definition of the rhs of a nop
+       conversion.  In case of nested cycles we also may have uses of the
+       backedge def.  */
     case vect_reduction_def:
     case vect_double_reduction_def:
     case vect_nested_cycle:
     case vect_induction_def:
       gcc_assert (gimple_code (def_stmt_info->stmt) == GIMPLE_PHI
-		  || dt == vect_nested_cycle);
+		  || dt == vect_nested_cycle
+		  || (dt == vect_reduction_def
+		      && stmt_vinfo
+		      && is_nop_conversion_stmt (stmt_vinfo->stmt)));
       /* Fallthru.  */
 
     /* operand is defined inside the loop.  */
@@ -1616,7 +1622,7 @@ vect_get_vec_def_for_operand (tree op, stmt_vec_info stmt_vinfo, tree vectype)
       return vect_init_vector (stmt_vinfo, op, vector_type, NULL);
     }
   else
-    return vect_get_vec_def_for_operand_1 (def_stmt_info, dt);
+    return vect_get_vec_def_for_operand_1 (stmt_vinfo, def_stmt_info, dt);
 }
 
@@ -5359,6 +5365,17 @@ vectorizable_assignment (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
   else
     ncopies = vect_get_num_copies (loop_vinfo, vectype);
 
+  if (ncopies > 1 && is_nop_conversion_stmt (stmt_info->stmt))
+    {
+      tree lhs = gimple_assign_lhs (stmt_info->stmt);
+      stmt_vec_info reduc_info = loop_vinfo->lookup_single_use (lhs);
+      gimple *phi_def
+	= SSA_NAME_DEF_STMT (gimple_assign_rhs1 (stmt_info->stmt));
+      if (reduc_info && STMT_VINFO_REDUC_DEF (reduc_info)
+	  && gimple_code (phi_def) == GIMPLE_PHI)
+	ncopies = 1;
+    }
+
   gcc_assert (ncopies >= 1);
 
   if (!vect_is_simple_use (op, vinfo, &dt[0], &vectype_in))
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 1456cde4c2c2dec7244c504d2c496248894a4f1e..387bca3d4433403185c7bbc4b81b3e6520dba531 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -1515,7 +1515,8 @@ extern stmt_vec_info vect_finish_stmt_generation (stmt_vec_info, gimple *,
						  gimple_stmt_iterator *);
 extern opt_result vect_mark_stmts_to_be_vectorized (loop_vec_info, bool *);
 extern tree vect_get_store_rhs (stmt_vec_info);
-extern tree vect_get_vec_def_for_operand_1 (stmt_vec_info, enum vect_def_type);
+extern tree vect_get_vec_def_for_operand_1 (stmt_vec_info, stmt_vec_info,
+					    enum vect_def_type);
 extern tree vect_get_vec_def_for_operand (tree, stmt_vec_info, tree = NULL);
 extern void vect_get_vec_defs (tree, tree, stmt_vec_info, vec<tree> *,
			       vec<tree> *, slp_tree);
@@ -1620,6 +1621,7 @@ extern void vect_record_loop_mask (loop_vec_info, vec_loop_masks *,
				   unsigned int, tree);
 extern tree vect_get_loop_mask (gimple_stmt_iterator *, vec_loop_masks *,
				unsigned int, tree, unsigned int);
+extern bool is_nop_conversion_stmt (gimple *);
 
 /* Drive for loop transformation stage.  */
 extern class loop *vect_transform_loop (loop_vec_info);