From patchwork Fri Jul 31 14:51:06 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Kewen.Lin" X-Patchwork-Id: 1339481 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=sourceware.org; envelope-from=gcc-patches-bounces@gcc.gnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=gcc.gnu.org Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.a=rsa-sha256 header.s=default header.b=CWR+4du2; dkim-atps=neutral Received: from sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4BJ9G61tSLz9s1x for ; Sat, 1 Aug 2020 00:51:24 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id A0BF63851C11; Fri, 31 Jul 2020 14:51:20 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org A0BF63851C11 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1596207080; bh=BEg6XxYMKfTwRCUz9Ydqnypup9CHwmu0tlJ0bbxxBDs=; h=Subject:To:References:Date:In-Reply-To:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To:Cc: From; b=CWR+4du2+w+pun3yWGtRHOyLdyx9thEnaNPn7OyzltzaBDiBGtwDCuuY+4d/vvquO hUPcFShg6ZDT3T8LvT6qXQzECoA9/x8M9CpY6zWyfhXjMOv0skjNK5Z3931H0bjhD0 niX68VzmBprvOuw1DU/cohEygruIS7lhxjYjyD70= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mx0a-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com [148.163.158.5]) by sourceware.org (Postfix) with ESMTPS id 02102385702D for ; Fri, 31 Jul 2020 14:51:17 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 02102385702D Received: from pps.filterd (m0098414.ppops.net [127.0.0.1]) by mx0b-001b2d01.pphosted.com (8.16.0.42/8.16.0.42) with SMTP id 06VEXQXh092827; Fri, 31 Jul 2020 10:51:15 -0400 Received: from pps.reinject (localhost [127.0.0.1]) by mx0b-001b2d01.pphosted.com with ESMTP id 32mhysxe15-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 31 Jul 2020 10:51:15 -0400 Received: from m0098414.ppops.net (m0098414.ppops.net [127.0.0.1]) by pps.reinject (8.16.0.36/8.16.0.36) with SMTP id 06VEXvMh097993; Fri, 31 Jul 2020 10:51:15 -0400 Received: from ppma06ams.nl.ibm.com (66.31.33a9.ip4.static.sl-reverse.com [169.51.49.102]) by mx0b-001b2d01.pphosted.com with ESMTP id 32mhysxe0h-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 31 Jul 2020 10:51:14 -0400 Received: from pps.filterd (ppma06ams.nl.ibm.com [127.0.0.1]) by ppma06ams.nl.ibm.com (8.16.0.42/8.16.0.42) with SMTP id 06VEkn2g029432; Fri, 31 Jul 2020 14:51:13 GMT Received: from b06avi18878370.portsmouth.uk.ibm.com (b06avi18878370.portsmouth.uk.ibm.com [9.149.26.194]) by ppma06ams.nl.ibm.com with ESMTP id 32gcqgqc3y-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 31 Jul 2020 14:51:13 +0000 Received: from d06av26.portsmouth.uk.ibm.com (d06av26.portsmouth.uk.ibm.com [9.149.105.62]) by b06avi18878370.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 06VEpA9T61669712 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Fri, 31 Jul 2020 14:51:10 GMT Received: from d06av26.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 77CF8AE056; Fri, 31 Jul 2020 14:51:10 +0000 (GMT) Received: from d06av26.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 15800AE055; Fri, 31 Jul 2020 14:51:08 +0000 (GMT) Received: from KewenLins-MacBook-Pro.local (unknown [9.197.236.127]) by d06av26.portsmouth.uk.ibm.com (Postfix) with ESMTP; Fri, 31 Jul 2020 14:51:07 +0000 (GMT) Subject: [PATCH v5] vect/rs6000: Support vector with length cost modeling To: GCC Patches , richard.sandiford@arm.com References: <419f1fad-05be-115c-1a53-cb710ae7b2dc@linux.ibm.com> <1aeabdc7-0cf4-055b-a3ec-74c283053cf5@linux.ibm.com> Message-ID: Date: Fri, 31 Jul 2020 22:51:06 +0800 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:68.0) Gecko/20100101 Thunderbird/68.9.0 MIME-Version: 1.0 In-Reply-To: Content-Language: en-US X-TM-AS-GCONF: 00 X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:6.0.235, 18.0.687 definitions=2020-07-31_05:2020-07-31, 2020-07-31 signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 malwarescore=0 adultscore=0 spamscore=0 priorityscore=1501 clxscore=1015 phishscore=0 bulkscore=0 lowpriorityscore=0 suspectscore=0 mlxlogscore=999 impostorscore=0 mlxscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2006250000 definitions=main-2007310108 X-Spam-Status: No, score=-10.5 required=5.0 tests=BAYES_00, GIT_PATCH_0, KAM_DMARC_STATUS, RCVD_IN_DNSWL_LOW, RCVD_IN_MSPIKE_H2, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: "Kewen.Lin via Gcc-patches" From: "Kewen.Lin" Reply-To: "Kewen.Lin" Cc: Bill Schmidt , Segher Boessenkool Errors-To: gcc-patches-bounces@gcc.gnu.org Sender: "Gcc-patches" Hi Richard, New version v5 is attached. v5 main changes against v4: 1) use _stmt instead of _cnt to avoid confusion 2) factor out function vect_rgroup_iv_might_wrap_p 3) use generic scalar_stmt for min/max stmt Does this look better? Thanks in advance! BR, Kewen ----- gcc/ChangeLog: * config/rs6000/rs6000.c (rs6000_adjust_vect_cost_per_loop): New function. (rs6000_finish_cost): Call rs6000_adjust_vect_cost_per_loop. * tree-vect-loop.c (vect_estimate_min_profitable_iters): Add cost modeling for vector with length. (vect_rgroup_iv_might_wrap_p): New function, factored out from... * tree-vect-loop-manip.c (vect_set_loop_controls_directly): ...this. Update function comment. * tree-vect-stmts.c (vect_gen_len): Update function comment. * tree-vectorizer.h (vect_rgroup_iv_might_wrap_p): New declare. diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c index 009afc5f894..86ef584e09b 100644 --- a/gcc/config/rs6000/rs6000.c +++ b/gcc/config/rs6000/rs6000.c @@ -5177,6 +5177,34 @@ rs6000_add_stmt_cost (class vec_info *vinfo, void *data, int count, return retval; } +/* For some target specific vectorization cost which can't be handled per stmt, + we check the requisite conditions and adjust the vectorization cost + accordingly if satisfied. One typical example is to model shift cost for + vector with length by counting number of required lengths under condition + LOOP_VINFO_FULLY_WITH_LENGTH_P. */ + +static void +rs6000_adjust_vect_cost_per_loop (rs6000_cost_data *data) +{ + struct loop *loop = data->loop_info; + gcc_assert (loop); + loop_vec_info loop_vinfo = loop_vec_info_for_loop (loop); + + if (LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo)) + { + rgroup_controls *rgc; + unsigned int num_vectors_m1; + unsigned int shift_cnt = 0; + FOR_EACH_VEC_ELT (LOOP_VINFO_LENS (loop_vinfo), num_vectors_m1, rgc) + if (rgc->type) + /* Each length needs one shift to fill into bits 0-7. */ + shift_cnt += num_vectors_m1 + 1; + + rs6000_add_stmt_cost (loop_vinfo, (void *) data, shift_cnt, scalar_stmt, + NULL, NULL_TREE, 0, vect_body); + } +} + /* Implement targetm.vectorize.finish_cost. */ static void @@ -5186,7 +5214,10 @@ rs6000_finish_cost (void *data, unsigned *prologue_cost, rs6000_cost_data *cost_data = (rs6000_cost_data*) data; if (cost_data->loop_info) - rs6000_density_test (cost_data); + { + rs6000_adjust_vect_cost_per_loop (cost_data); + rs6000_density_test (cost_data); + } /* Don't vectorize minimum-vectorization-factor, simple copy loops that require versioning for any reason. The vectorization is at diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c index 490e7befea3..2a0d3b1840d 100644 --- a/gcc/tree-vect-loop-manip.c +++ b/gcc/tree-vect-loop-manip.c @@ -412,7 +412,10 @@ vect_maybe_permute_loop_masks (gimple_seq *seq, rgroup_controls *dest_rgm, This means that we cannot guarantee that such an induction variable would ever hit a value that produces a set of all-false masks or zero - lengths for RGC. */ + lengths for RGC. + + Note that please check cost modeling whether needed to be updated in + function vect_estimate_min_profitable_iters if any changes. */ static tree vect_set_loop_controls_directly (class loop *loop, loop_vec_info loop_vinfo, @@ -711,8 +714,6 @@ vect_set_loop_condition_partial_vectors (class loop *loop, else niters = gimple_convert (&preheader_seq, compare_type, niters); - widest_int iv_limit = vect_iv_limit_for_partial_vectors (loop_vinfo); - /* Iterate over all the rgroups and fill in their controls. We could use the first control from any rgroup for the loop condition; here we arbitrarily pick the last. */ @@ -739,11 +740,7 @@ vect_set_loop_condition_partial_vectors (class loop *loop, /* See whether zero-based IV would ever generate all-false masks or zero length before wrapping around. */ - unsigned nitems_per_iter = rgc->max_nscalars_per_iter * rgc->factor; - bool might_wrap_p - = (iv_limit == -1 - || (wi::min_precision (iv_limit * nitems_per_iter, UNSIGNED) - > compare_precision)); + bool might_wrap_p = vect_rgroup_iv_might_wrap_p (loop_vinfo, rgc); /* Set up all controls for this group. */ test_ctrl = vect_set_loop_controls_directly (loop, loop_vinfo, diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c index 43ec4fb04d5..dea9ca6778f 100644 --- a/gcc/tree-vect-loop.c +++ b/gcc/tree-vect-loop.c @@ -3798,6 +3798,58 @@ vect_estimate_min_profitable_iters (loop_vec_info loop_vinfo, (void) add_stmt_cost (loop_vinfo, target_cost_data, num_masks - 1, vector_stmt, NULL, NULL_TREE, 0, vect_body); } + else if (LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo)) + { + /* Referring to the functions vect_set_loop_condition_partial_vectors + and vect_set_loop_controls_directly, we need to generate each + length in the prologue and in the loop body if required. Although + there are some possible optimizations, we consider the worst case + here. */ + + bool niters_known_p = LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo); + bool need_iterate_p + = (!LOOP_VINFO_EPILOGUE_P (loop_vinfo) + && !vect_known_niters_smaller_than_vf (loop_vinfo)); + + /* Calculate how many statements to be added. */ + unsigned int prologue_stmt = 0; + unsigned int body_stmt = 0; + + rgroup_controls *rgc; + unsigned int num_vectors_m1; + FOR_EACH_VEC_ELT (LOOP_VINFO_LENS (loop_vinfo), num_vectors_m1, rgc) + if (rgc->type) + { + /* May need one SHIFT for nitems_total computation. */ + unsigned nitems = rgc->max_nscalars_per_iter * rgc->factor; + if (nitems != 1 && !niters_known_p) + prologue_stmt += 1; + + /* May need one MAX and one MINUS for wrap around. */ + if (vect_rgroup_iv_might_wrap_p (loop_vinfo, rgc)) + prologue_stmt += 2; + + /* Need one MAX and one MINUS for each batch limit excepting for + the 1st one. */ + prologue_stmt += num_vectors_m1 * 2; + + unsigned int num_vectors = num_vectors_m1 + 1; + + /* Need to set up lengths in prologue, only one MIN required + for each since start index is zero. */ + prologue_stmt += num_vectors; + + /* Each may need two MINs and one MINUS to update lengths in body + for next iteration. */ + if (need_iterate_p) + body_stmt += 3 * num_vectors; + } + + (void) add_stmt_cost (loop_vinfo, target_cost_data, prologue_stmt, + scalar_stmt, NULL, NULL_TREE, 0, vect_prologue); + (void) add_stmt_cost (loop_vinfo, target_cost_data, body_stmt, + scalar_stmt, NULL, NULL_TREE, 0, vect_body); + } /* FORNOW: The scalar outside cost is incremented in one of the following ways: @@ -3932,8 +3984,8 @@ vect_estimate_min_profitable_iters (loop_vec_info loop_vinfo, } /* ??? The "if" arm is written to handle all cases; see below for what - we would do for !LOOP_VINFO_FULLY_MASKED_P. */ - if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo)) + we would do for !LOOP_VINFO_USING_PARTIAL_VECTORS_P. */ + if (LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo)) { /* Rewriting the condition above in terms of the number of vector iterations (vniters) rather than the number of @@ -3960,7 +4012,7 @@ vect_estimate_min_profitable_iters (loop_vec_info loop_vinfo, dump_printf (MSG_NOTE, " Minimum number of vector iterations: %d\n", min_vec_niters); - if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo)) + if (LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo)) { /* Now that we know the minimum number of vector iterations, find the minimum niters for which the scalar cost is larger: @@ -4015,6 +4067,10 @@ vect_estimate_min_profitable_iters (loop_vec_info loop_vinfo, && min_profitable_iters < (assumed_vf + peel_iters_prologue)) /* We want the vectorized loop to execute at least once. */ min_profitable_iters = assumed_vf + peel_iters_prologue; + else if (min_profitable_iters < peel_iters_prologue) + /* For LOOP_VINFO_USING_PARTIAL_VECTORS_P, we need to ensure the + vectorized loop to execute at least once. */ + min_profitable_iters = peel_iters_prologue; if (dump_enabled_p ()) dump_printf_loc (MSG_NOTE, vect_location, @@ -4032,7 +4088,7 @@ vect_estimate_min_profitable_iters (loop_vec_info loop_vinfo, if (vec_outside_cost <= 0) min_profitable_estimate = 0; - else if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo)) + else if (LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo)) { /* This is a repeat of the code above, but with + SOC rather than - SOC. */ @@ -4044,7 +4100,7 @@ vect_estimate_min_profitable_iters (loop_vec_info loop_vinfo, if (outside_overhead > 0) min_vec_niters = outside_overhead / saving_per_viter + 1; - if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo)) + if (LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo)) { int threshold = (vec_inside_cost * min_vec_niters + vec_outside_cost @@ -9351,3 +9407,25 @@ vect_iv_limit_for_partial_vectors (loop_vec_info loop_vinfo) return iv_limit; } +/* For the given rgroup_controls RGC, check whether an induction variable + would ever hit a value that produces a set of all-false masks or zero + lengths before wrapping around. Return true if it's possible to wrap + around before hitting the desirable value, otherwise return false. */ + +bool +vect_rgroup_iv_might_wrap_p (loop_vec_info loop_vinfo, rgroup_controls *rgc) +{ + widest_int iv_limit = vect_iv_limit_for_partial_vectors (loop_vinfo); + + if (iv_limit == -1) + return true; + + tree compare_type = LOOP_VINFO_RGROUP_COMPARE_TYPE (loop_vinfo); + unsigned int compare_precision = TYPE_PRECISION (compare_type); + unsigned nitems = rgc->max_nscalars_per_iter * rgc->factor; + + if (wi::min_precision (iv_limit * nitems, UNSIGNED) > compare_precision) + return true; + + return false; +} diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c index 31af46ae19c..8550a252f44 100644 --- a/gcc/tree-vect-stmts.c +++ b/gcc/tree-vect-stmts.c @@ -12090,7 +12090,10 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info, min_of_start_and_end = min (START_INDEX, END_INDEX); left_len = END_INDEX - min_of_start_and_end; rhs = min (left_len, LEN_LIMIT); - LEN = rhs; */ + LEN = rhs; + + Note that please check cost modeling whether needed to be updated in + function vect_estimate_min_profitable_iters if any changes. */ gimple_seq vect_gen_len (tree len, tree start_index, tree end_index, tree len_limit) diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h index 5466c78c20b..7ab6d8234ac 100644 --- a/gcc/tree-vectorizer.h +++ b/gcc/tree-vectorizer.h @@ -1960,6 +1960,7 @@ extern tree vect_create_addr_base_for_vector_ref (vec_info *, /* In tree-vect-loop.c. */ extern widest_int vect_iv_limit_for_partial_vectors (loop_vec_info loop_vinfo); +bool vect_rgroup_iv_might_wrap_p (loop_vec_info, rgroup_controls *); /* Used in tree-vect-loop-manip.c */ extern void determine_peel_for_niter (loop_vec_info); /* Used in gimple-loop-interchange.c and tree-parloops.c. */