From patchwork Sat May 8 08:05:29 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: "Kewen.Lin" X-Patchwork-Id: 1475811 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=8.43.85.97; helo=sourceware.org; envelope-from=gcc-patches-bounces@gcc.gnu.org; receiver=) Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.a=rsa-sha256 header.s=default header.b=GeU2rOeC; dkim-atps=neutral Received: from sourceware.org (ip-8-43-85-97.sourceware.org [8.43.85.97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4FcfyN1SnHz9sWk for ; Sat, 8 May 2021 18:05:47 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 8E06A3939C1B; Sat, 8 May 2021 08:05:43 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 8E06A3939C1B DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1620461143; bh=EmGd0i7t962TScizHAW+7/Ms93jntKShKNI+n4x8wvI=; h=Subject:To:References:Date:In-Reply-To:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To:Cc: From; b=GeU2rOeCYoIhEXly67fnKmfdqW+kC0ZwZV225X7uG6ktCPUK+j5X4mFzUc/iqnfDl 3wDBLynYkVp7qLst4zaFtl0kkghAvZO34b91UpxUHiyoDMW7uIK7TLIlfgrzgKyuut 8nvlqy6t3iaB3VgLcFIW1+cMNrj4xKkEYR0ZjN8c= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mx0a-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com [148.163.158.5]) by sourceware.org (Postfix) with ESMTPS id 9288E3896C31 for ; Sat, 8 May 2021 08:05:39 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 9288E3896C31 Received: from pps.filterd (m0098420.ppops.net [127.0.0.1]) by mx0b-001b2d01.pphosted.com (8.16.0.43/8.16.0.43) with SMTP id 148853vx102252; Sat, 8 May 2021 04:05:39 -0400 Received: from pps.reinject (localhost [127.0.0.1]) by mx0b-001b2d01.pphosted.com with ESMTP id 38dptt00ef-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Sat, 08 May 2021 04:05:38 -0400 Received: from m0098420.ppops.net (m0098420.ppops.net [127.0.0.1]) by pps.reinject (8.16.0.43/8.16.0.43) with SMTP id 14885I6e102648; Sat, 8 May 2021 04:05:38 -0400 Received: from ppma03ams.nl.ibm.com (62.31.33a9.ip4.static.sl-reverse.com [169.51.49.98]) by mx0b-001b2d01.pphosted.com with ESMTP id 38dptt00dx-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Sat, 08 May 2021 04:05:38 -0400 Received: from pps.filterd (ppma03ams.nl.ibm.com [127.0.0.1]) by ppma03ams.nl.ibm.com (8.16.0.43/8.16.0.43) with SMTP id 14882J1W002658; Sat, 8 May 2021 08:05:36 GMT Received: from b06avi18878370.portsmouth.uk.ibm.com (b06avi18878370.portsmouth.uk.ibm.com [9.149.26.194]) by ppma03ams.nl.ibm.com with ESMTP id 38dj9882a7-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Sat, 08 May 2021 08:05:36 +0000 Received: from d06av22.portsmouth.uk.ibm.com (d06av22.portsmouth.uk.ibm.com [9.149.105.58]) by b06avi18878370.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 148858ci32965006 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Sat, 8 May 2021 08:05:08 GMT Received: from d06av22.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 34B884C04A; Sat, 8 May 2021 08:05:34 +0000 (GMT) Received: from d06av22.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id C121E4C040; Sat, 8 May 2021 08:05:31 +0000 (GMT) Received: from KewenLins-MacBook-Pro.local (unknown [9.200.45.107]) by d06av22.portsmouth.uk.ibm.com (Postfix) with ESMTP; Sat, 8 May 2021 08:05:31 +0000 (GMT) Subject: [PATCH 1/2] vect: Add costing_for_scalar parameter to init_cost hook To: Richard Biener References: <990b2492-9198-b713-c79f-68563d488ba0@linux.ibm.com> Message-ID: <5169a943-4a45-040a-60c6-91af7718dc6f@linux.ibm.com> Date: Sat, 8 May 2021 16:05:29 +0800 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:78.0) Gecko/20100101 Thunderbird/78.10.0 MIME-Version: 1.0 In-Reply-To: Content-Language: en-US X-TM-AS-GCONF: 00 X-Proofpoint-GUID: JJYbGVkphY2HCELJ3IbHX51P54EotPvX X-Proofpoint-ORIG-GUID: 0N4KtEDOnP1a7H7uu7678DgiiwI4ihXo X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:6.0.391, 18.0.761 definitions=2021-05-08_05:2021-05-06, 2021-05-08 signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 priorityscore=1501 adultscore=0 mlxlogscore=999 impostorscore=0 malwarescore=0 suspectscore=0 phishscore=0 clxscore=1015 lowpriorityscore=0 bulkscore=0 mlxscore=0 spamscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2104190000 definitions=main-2105080060 X-Spam-Status: No, score=-11.9 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_DNSWL_LOW, RCVD_IN_MSPIKE_H2, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: "Kewen.Lin via Gcc-patches" From: "Kewen.Lin" Reply-To: "Kewen.Lin" Cc: Bill Schmidt , GCC Patches , Segher Boessenkool , David Edelsohn Errors-To: gcc-patches-bounces@gcc.gnu.org Sender: "Gcc-patches" Hi Richi, Thanks for the comments! on 2021/5/7 下午5:43, Richard Biener wrote: > On Fri, May 7, 2021 at 5:30 AM Kewen.Lin via Gcc-patches > wrote: >> >> Hi, >> >> When I was investigating density_test heuristics, I noticed that >> the current rs6000_density_test could be used for single scalar >> iteration cost calculation, through the call trace: >> vect_compute_single_scalar_iteration_cost >> -> rs6000_finish_cost >> -> rs6000_density_test >> >> It looks unexpected as its desriptive function comments and Bill >> helped to confirm this needs to be fixed (thanks!). >> >> So this patch is to check the passed data, if it's the same as >> the one in loop_vinfo, it indicates it's working on vector version >> cost calculation, otherwise just early return. >> >> Bootstrapped/regtested on powerpc64le-linux-gnu P9. >> >> Nothing remarkable was observed with SPEC2017 Power9 full run. >> >> Is it ok for trunk? > > + /* Only care about cost of vector version, so exclude scalar > version here. */ > + if (LOOP_VINFO_TARGET_COST_DATA (loop_vinfo) != (void *) data) > + return; > > Hmm, looks like a quite "random" test to me. What about adding a > parameter to finish_cost () (or init_cost?) indicating the cost kind? > I originally wanted to change the hook interface, but noticed that the finish_cost in function vect_estimate_min_profitable_iters is the only invocation with LOOP_VINFO_TARGET_COST_DATA (loop_vinfo), it looks enough to differentiate the scalar version costing or vector version costing for loop. Do you mean this observation/ assumption easy to be broken sometime later? The attached patch to add one new parameter to indicate the costing kind explicitly as you suggested. Does it look better? gcc/ChangeLog: * doc/tm.texi: Regenerated. * target.def (init_cost): Add new parameter costing_for_scalar. * targhooks.c (default_init_cost): Adjust for new parameter. * targhooks.h (default_init_cost): Likewise. * tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Likewise. (vect_compute_single_scalar_iteration_cost): Likewise. (vect_analyze_loop_2): Likewise. * tree-vect-slp.c (_bb_vec_info::_bb_vec_info): Likewise. (vect_bb_vectorization_profitable_p): Likewise. * tree-vectorizer.h (init_cost): Likewise. * config/aarch64/aarch64.c (aarch64_init_cost): Likewise. * config/i386/i386.c (ix86_init_cost): Likewise. * config/rs6000/rs6000.c (rs6000_init_cost): Likewise. > OTOH we already pass scalar_stmt to individual add_stmt_cost, > so not sure whether the context really matters. That said, > the density test looks "interesting" ... the intent was that finish_cost > might look at gathered data from add_stmt, not that it looks at > the GIMPLE IL ... so why are you not counting vector_stmt vs. > scalar_stmt entries in vect_body and using that for this metric? > Good to know the intention behind finish_cost, thanks! I'm afraid that the check on vector_stmt and scalar_stmt entries from add_stmt_cost doesn't work for the density test here. The density test focuses on the vector version itself, there are some stmts whose relevants are marked as vect_unused_in_scope, IIUC they won't be passed down when costing for both versions. But the existing density check would like to know the cost for the non-vectorized part. The current implementation does: vec_cost = data->cost[vect_body] if (!STMT_VINFO_RELEVANT_P (stmt_info) && !STMT_VINFO_IN_PATTERN_P (stmt_info)) not_vec_cost++ density_pct = (vec_cost * 100) / (vec_cost + not_vec_cost); it takes those unrelevant stmts into account, and then has both costs from the non-vectorized part (not_vec_cost) and vectorized part (cost[vect_body]), it can calculate the vectorization code density ratio. BR, Kewen gcc/config/aarch64/aarch64.c | 2 +- gcc/config/i386/i386.c | 2 +- gcc/config/rs6000/rs6000.c | 2 +- gcc/doc/tm.texi | 4 ++-- gcc/target.def | 6 ++++-- gcc/targhooks.c | 3 ++- gcc/targhooks.h | 2 +- gcc/tree-vect-loop.c | 6 +++--- gcc/tree-vect-slp.c | 8 +++++--- gcc/tree-vectorizer.h | 4 ++-- 10 files changed, 22 insertions(+), 17 deletions(-) diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 12625a4bee3..de3c86d85fb 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -14390,7 +14390,7 @@ struct aarch64_vector_costs /* Implement TARGET_VECTORIZE_INIT_COST. */ void * -aarch64_init_cost (class loop *) +aarch64_init_cost (class loop *, bool) { return new aarch64_vector_costs; } diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 7c41302c75b..5078d94970a 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -22223,7 +22223,7 @@ ix86_noce_conversion_profitable_p (rtx_insn *seq, struct noce_if_info *if_info) /* Implement targetm.vectorize.init_cost. */ static void * -ix86_init_cost (class loop *) +ix86_init_cost (class loop *, bool) { unsigned *cost = XNEWVEC (unsigned, 3); cost[vect_prologue] = cost[vect_body] = cost[vect_epilogue] = 0; diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c index e2ff00145c5..88f16b909a3 100644 --- a/gcc/config/rs6000/rs6000.c +++ b/gcc/config/rs6000/rs6000.c @@ -5292,7 +5292,7 @@ rs6000_density_test (rs6000_cost_data *data) /* Implement targetm.vectorize.init_cost. */ static void * -rs6000_init_cost (struct loop *loop_info) +rs6000_init_cost (struct loop *loop_info, bool) { rs6000_cost_data *data = XNEW (struct _rs6000_cost_data); data->loop_info = loop_info; diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi index 823f85ba9ab..8b3af67782e 100644 --- a/gcc/doc/tm.texi +++ b/gcc/doc/tm.texi @@ -6111,8 +6111,8 @@ type @code{internal_fn}) should be considered expensive when the mask is all zeros. GCC can then try to branch around the instruction instead. @end deftypefn -@deftypefn {Target Hook} {void *} TARGET_VECTORIZE_INIT_COST (class loop *@var{loop_info}) -This hook should initialize target-specific data structures in preparation for modeling the costs of vectorizing a loop or basic block. The default allocates three unsigned integers for accumulating costs for the prologue, body, and epilogue of the loop or basic block. If @var{loop_info} is non-NULL, it identifies the loop being vectorized; otherwise a single block is being vectorized. +@deftypefn {Target Hook} {void *} TARGET_VECTORIZE_INIT_COST (class loop *@var{loop_info}, bool @var{costing_for_scalar}) +This hook should initialize target-specific data structures in preparation for modeling the costs of vectorizing a loop or basic block. The default allocates three unsigned integers for accumulating costs for the prologue, body, and epilogue of the loop or basic block. If @var{loop_info} is non-NULL, it identifies the loop being vectorized; otherwise a single block is being vectorized. If @var{costing_for_scalar} is true, it indicates the current cost model is for the scalar version of a loop or block; otherwise it is for the vector version. @end deftypefn @deftypefn {Target Hook} unsigned TARGET_VECTORIZE_ADD_STMT_COST (class vec_info *@var{}, void *@var{data}, int @var{count}, enum vect_cost_for_stmt @var{kind}, class _stmt_vec_info *@var{stmt_info}, tree @var{vectype}, int @var{misalign}, enum vect_cost_model_location @var{where}) diff --git a/gcc/target.def b/gcc/target.def index d7b94bd8e5d..9407ed17f2d 100644 --- a/gcc/target.def +++ b/gcc/target.def @@ -2004,9 +2004,11 @@ DEFHOOK "allocates three unsigned integers for accumulating costs for the prologue, " "body, and epilogue of the loop or basic block. If @var{loop_info} is " "non-NULL, it identifies the loop being vectorized; otherwise a single block " - "is being vectorized.", + "is being vectorized. If @var{costing_for_scalar} is true, it indicates the " + "current cost model is for the scalar version of a loop or block; otherwise " + "it is for the vector version.", void *, - (class loop *loop_info), + (class loop *loop_info, bool costing_for_scalar), default_init_cost) /* Target function to record N statements of the given kind using the diff --git a/gcc/targhooks.c b/gcc/targhooks.c index 952fad422eb..2e0fdb797e0 100644 --- a/gcc/targhooks.c +++ b/gcc/targhooks.c @@ -1373,7 +1373,8 @@ default_empty_mask_is_expensive (unsigned ifn) array of three unsigned ints, set it to zero, and return its address. */ void * -default_init_cost (class loop *loop_info ATTRIBUTE_UNUSED) +default_init_cost (class loop *loop_info ATTRIBUTE_UNUSED, + bool costing_for_scalar ATTRIBUTE_UNUSED) { unsigned *cost = XNEWVEC (unsigned, 3); cost[vect_prologue] = cost[vect_body] = cost[vect_epilogue] = 0; diff --git a/gcc/targhooks.h b/gcc/targhooks.h index 9928d064abd..b537038c0aa 100644 --- a/gcc/targhooks.h +++ b/gcc/targhooks.h @@ -117,7 +117,7 @@ extern opt_machine_mode default_vectorize_related_mode (machine_mode, poly_uint64); extern opt_machine_mode default_get_mask_mode (machine_mode); extern bool default_empty_mask_is_expensive (unsigned); -extern void *default_init_cost (class loop *); +extern void *default_init_cost (class loop *, bool); extern unsigned default_add_stmt_cost (class vec_info *, void *, int, enum vect_cost_for_stmt, class _stmt_vec_info *, tree, int, diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c index 2aba503fef7..f10e66a2465 100644 --- a/gcc/tree-vect-loop.c +++ b/gcc/tree-vect-loop.c @@ -813,7 +813,7 @@ bb_in_loop_p (const_basic_block bb, const void *data) stmt_vec_info structs for all the stmts in LOOP_IN. */ _loop_vec_info::_loop_vec_info (class loop *loop_in, vec_info_shared *shared) - : vec_info (vec_info::loop, init_cost (loop_in), shared), + : vec_info (vec_info::loop, init_cost (loop_in, false), shared), loop (loop_in), bbs (XCNEWVEC (basic_block, loop->num_nodes)), num_itersm1 (NULL_TREE), @@ -1284,7 +1284,7 @@ vect_compute_single_scalar_iteration_cost (loop_vec_info loop_vinfo) } /* Now accumulate cost. */ - void *target_cost_data = init_cost (loop); + void *target_cost_data = init_cost (loop, true); stmt_info_for_cost *si; int j; FOR_EACH_VEC_ELT (LOOP_VINFO_SCALAR_ITERATION_COST (loop_vinfo), @@ -2723,7 +2723,7 @@ again: /* Reset target cost data. */ destroy_cost_data (LOOP_VINFO_TARGET_COST_DATA (loop_vinfo)); LOOP_VINFO_TARGET_COST_DATA (loop_vinfo) - = init_cost (LOOP_VINFO_LOOP (loop_vinfo)); + = init_cost (LOOP_VINFO_LOOP (loop_vinfo), false); /* Reset accumulated rgroup information. */ release_vec_loop_controls (&LOOP_VINFO_MASKS (loop_vinfo)); release_vec_loop_controls (&LOOP_VINFO_LENS (loop_vinfo)); diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c index 1c5b7ae84e2..0ec92b0f0ca 100644 --- a/gcc/tree-vect-slp.c +++ b/gcc/tree-vect-slp.c @@ -3690,7 +3690,9 @@ vect_detect_hybrid_slp (loop_vec_info loop_vinfo) /* Initialize a bb_vec_info struct for the statements in BBS basic blocks. */ _bb_vec_info::_bb_vec_info (vec _bbs, vec_info_shared *shared) - : vec_info (vec_info::bb, init_cost (NULL), shared), bbs (_bbs), roots (vNULL) + : vec_info (vec_info::bb, init_cost (NULL, false), shared), + bbs (_bbs), + roots (vNULL) { for (unsigned i = 0; i < bbs.length (); ++i) { @@ -4530,7 +4532,7 @@ vect_bb_vectorization_profitable_p (bb_vec_info bb_vinfo, continue; } - void *scalar_target_cost_data = init_cost (NULL); + void *scalar_target_cost_data = init_cost (NULL, true); do { add_stmt_cost (bb_vinfo, scalar_target_cost_data, @@ -4544,7 +4546,7 @@ vect_bb_vectorization_profitable_p (bb_vec_info bb_vinfo, destroy_cost_data (scalar_target_cost_data); /* Complete the target-specific vector cost calculation. */ - void *vect_target_cost_data = init_cost (NULL); + void *vect_target_cost_data = init_cost (NULL, false); do { add_stmt_cost (bb_vinfo, vect_target_cost_data, diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h index 9861d9e8810..8d1ffafdbf0 100644 --- a/gcc/tree-vectorizer.h +++ b/gcc/tree-vectorizer.h @@ -1455,9 +1455,9 @@ int vect_get_stmt_cost (enum vect_cost_for_stmt type_of_cost) /* Alias targetm.vectorize.init_cost. */ static inline void * -init_cost (class loop *loop_info) +init_cost (class loop *loop_info, bool costing_for_scalar) { - return targetm.vectorize.init_cost (loop_info); + return targetm.vectorize.init_cost (loop_info, costing_for_scalar); } extern void dump_stmt_cost (FILE *, void *, int, enum vect_cost_for_stmt,