From patchwork Wed May 19 09:46:58 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: "Kewen.Lin" X-Patchwork-Id: 1480797 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=sourceware.org; envelope-from=gcc-patches-bounces@gcc.gnu.org; receiver=) Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.a=rsa-sha256 header.s=default header.b=xIIGeLP5; dkim-atps=neutral Received: from sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4FlShR3LC9z9sW4 for ; Wed, 19 May 2021 19:47:17 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 2EA943896C0E; Wed, 19 May 2021 09:47:15 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 2EA943896C0E DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1621417635; bh=AoowMRoy2B6VDWNc+Gc9U6tEbtiu3rbol1ERAHZJbz4=; h=Subject:To:References:Date:In-Reply-To:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To:Cc: From; b=xIIGeLP5v1GIB+fUKBo1weJ6PgNgEe1mDWoP//S04+fijI7TKeCVUfRBp8aXgFOrF G1HbvRyoDE+3aeliSv8lSojUncmaUiGczS5jH8Lj6iqw3VARl/D6XhdxwZksrQJw/3 l1vIwdK+86vkOPUYO60bcZgzjcvIdqS+BZr6JO7o= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) by sourceware.org (Postfix) with ESMTPS id 9DD0238930D7 for ; Wed, 19 May 2021 09:47:11 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 9DD0238930D7 Received: from pps.filterd (m0098396.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.16.0.43/8.16.0.43) with SMTP id 14J9XjqS051716; Wed, 19 May 2021 05:47:07 -0400 Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com with ESMTP id 38mye9j0fu-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Wed, 19 May 2021 05:47:07 -0400 Received: from m0098396.ppops.net (m0098396.ppops.net [127.0.0.1]) by pps.reinject (8.16.0.43/8.16.0.43) with SMTP id 14J9YPMH055290; Wed, 19 May 2021 05:47:06 -0400 Received: from ppma03ams.nl.ibm.com (62.31.33a9.ip4.static.sl-reverse.com [169.51.49.98]) by mx0a-001b2d01.pphosted.com with ESMTP id 38mye9j0eh-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Wed, 19 May 2021 05:47:06 -0400 Received: from pps.filterd (ppma03ams.nl.ibm.com [127.0.0.1]) by ppma03ams.nl.ibm.com (8.16.0.43/8.16.0.43) with SMTP id 14J9Sfrl020543; Wed, 19 May 2021 09:47:04 GMT Received: from b06cxnps3074.portsmouth.uk.ibm.com (d06relay09.portsmouth.uk.ibm.com [9.149.109.194]) by ppma03ams.nl.ibm.com with ESMTP id 38j5x7t0ua-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Wed, 19 May 2021 09:47:03 +0000 Received: from d06av26.portsmouth.uk.ibm.com (d06av26.portsmouth.uk.ibm.com [9.149.105.62]) by b06cxnps3074.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 14J9l1Ko32244214 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Wed, 19 May 2021 09:47:01 GMT Received: from d06av26.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id A313CAE051; Wed, 19 May 2021 09:47:01 +0000 (GMT) Received: from d06av26.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id E7280AE04D; Wed, 19 May 2021 09:46:59 +0000 (GMT) Received: from kewenlins-mbp.cn.ibm.com (unknown [9.200.146.228]) by d06av26.portsmouth.uk.ibm.com (Postfix) with ESMTP; Wed, 19 May 2021 09:46:59 +0000 (GMT) Subject: [PATCH v2] vect: Replace hardcoded weight factor with param To: Richard Biener References: <61b73f06-cced-0365-3b16-5c18c0dfc683@linux.ibm.com> Message-ID: <1ea0e5ca-3ce5-b35d-9b07-044d07ad5abf@linux.ibm.com> Date: Wed, 19 May 2021 17:46:58 +0800 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:78.0) Gecko/20100101 Thunderbird/78.10.0 MIME-Version: 1.0 In-Reply-To: Content-Language: en-US X-TM-AS-GCONF: 00 X-Proofpoint-GUID: vsVtJu1tnSM4DZeNP-dyL4ypKdKWy3c7 X-Proofpoint-ORIG-GUID: QrOA2JLjrlZRxVs_NvmArUX8oJNdwDqY X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:6.0.391, 18.0.761 definitions=2021-05-19_04:2021-05-18, 2021-05-19 signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 adultscore=0 spamscore=0 phishscore=0 clxscore=1015 priorityscore=1501 suspectscore=0 mlxlogscore=999 malwarescore=0 mlxscore=0 lowpriorityscore=0 bulkscore=0 impostorscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2104190000 definitions=main-2105190069 X-Spam-Status: No, score=-11.9 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_DNSWL_LOW, RCVD_IN_MSPIKE_H3, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: "Kewen.Lin via Gcc-patches" From: "Kewen.Lin" Reply-To: "Kewen.Lin" Cc: Richard Sandiford , Bill Schmidt , GCC Patches , Segher Boessenkool Errors-To: gcc-patches-bounces@gcc.gnu.org Sender: "Gcc-patches" Hi Richi, on 2021/5/19 下午4:15, Richard Biener wrote: > On Wed, May 19, 2021 at 8:20 AM Kewen.Lin wrote: >> >> Hi, >> >> This patch is to replace the current hardcoded weight factor 50 >> for those statements in an inner loop relative to the loop being >> vectorized with a specific parameter vect-inner-loop-weight-factor. >> >> The motivation behind this change is: if targets want to have one >> unique function to gather some information in each add_stmt_cost >> call, no matter that it's put before or after the cost tweaking >> part for inner loop, it may have the need to adjust (expand or >> shrink) the gathered data as the factor. Now the factor is >> hardcoded, it's not easily maintained. Since it's possible that >> targets have their own decisions on this costing like the others, >> I used parameter instead of one unique macro here. >> >> Testing is ongoing, is it ok for trunk if everything goes well? > > Certainly an improvement. I suppose we might want to put > the factor into vinfo->inner_loop_cost_factor. That way > we could adjust it easily in common code in the vectorizer > when we for example have (non-guessed) profile data. > > "weight_factor" is kind-of double-speak and I'm missing 'cost' ... > so, bike-shedding to vect_inner_loop_cost_factor? > > Just suggestions - as said, the patch is an improvement already. > Thanks for your nice suggestions! I've updated the patch accordingly and attached it. Does it look better to you? btw, the testing on the previous patch passed, new round testing was just kicked off. BR, Kewen ------ gcc/ChangeLog: * doc/invoke.texi (vect-inner-loop-cost-factor): Document new parameter. * params.opt (vect-inner-loop-cost-factor): New. * targhooks.c (default_add_stmt_cost): Replace hardcoded factor 50 with LOOP_VINFO_INNER_LOOP_COST_FACTOR, include head file tree-vectorizer.h and its required ones. * config/aarch64/aarch64.c (aarch64_add_stmt_cost): Replace hardcoded factor 50 with LOOP_VINFO_INNER_LOOP_COST_FACTOR. * config/arm/arm.c (arm_add_stmt_cost): Likewise. * config/i386/i386.c (ix86_add_stmt_cost): Likewise. * config/rs6000/rs6000.c (rs6000_add_stmt_cost): Likewise. * tree-vect-loop.c (vect_compute_single_scalar_iteration_cost): Likewise. (_loop_vec_info::_loop_vec_info): Init inner_loop_cost_factor. * tree-vectorizer.h (_loop_vec_info): Add inner_loop_cost_factor. (LOOP_VINFO_INNER_LOOP_COST_FACTOR): New macro. diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 12625a4bee3..be883b61059 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -15437,7 +15437,10 @@ aarch64_add_stmt_cost (class vec_info *vinfo, void *data, int count, arbitrary and could potentially be improved with analysis. */ if (where == vect_body && stmt_info && stmt_in_inner_loop_p (vinfo, stmt_info)) - count *= 50; /* FIXME */ + { + gcc_assert (loop_vinfo); + count *= LOOP_VINFO_INNER_LOOP_COST_FACTOR (loop_vinfo); /* FIXME */ + } retval = (unsigned) (count * stmt_cost); costs->region[where] += retval; diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 340f7c95d76..223faa49b11 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -12201,7 +12201,11 @@ arm_add_stmt_cost (vec_info *vinfo, void *data, int count, arbitrary and could potentially be improved with analysis. */ if (where == vect_body && stmt_info && stmt_in_inner_loop_p (vinfo, stmt_info)) - count *= 50; /* FIXME. */ + { + loop_vec_info loop_vinfo = dyn_cast (vinfo); + gcc_assert (loop_vinfo); + count *= LOOP_VINFO_INNER_LOOP_COST_FACTOR (loop_vinfo); /* FIXME. */ + } retval = (unsigned) (count * stmt_cost); cost[where] += retval; diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 7c41302c75b..43b1fb0de0b 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -22396,7 +22396,11 @@ ix86_add_stmt_cost (class vec_info *vinfo, void *data, int count, arbitrary and could potentially be improved with analysis. */ if (where == vect_body && stmt_info && stmt_in_inner_loop_p (vinfo, stmt_info)) - count *= 50; /* FIXME. */ + { + loop_vec_info loop_vinfo = dyn_cast (vinfo); + gcc_assert (loop_vinfo); + count *= LOOP_VINFO_INNER_LOOP_COST_FACTOR (loop_vinfo); /* FIXME. */ + } retval = (unsigned) (count * stmt_cost); diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c index 48b8efd732b..859da8bd0ed 100644 --- a/gcc/config/rs6000/rs6000.c +++ b/gcc/config/rs6000/rs6000.c @@ -5348,7 +5348,11 @@ rs6000_add_stmt_cost (class vec_info *vinfo, void *data, int count, arbitrary and could potentially be improved with analysis. */ if (where == vect_body && stmt_info && stmt_in_inner_loop_p (vinfo, stmt_info)) - count *= 50; /* FIXME. */ + { + loop_vec_info loop_vinfo = dyn_cast (vinfo); + gcc_assert (loop_vinfo); + count *= LOOP_VINFO_INNER_LOOP_COST_FACTOR (loop_vinfo); /* FIXME. */ + } retval = (unsigned) (count * stmt_cost); cost_data->cost[where] += retval; diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 8b70fdf580d..2234801cab4 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -14221,6 +14221,10 @@ code to iterate. 2 allows partial vector loads and stores in all loops. The parameter only has an effect on targets that support partial vector loads and stores. +@item vect-inner-loop-cost-factor +The factor which loop vectorizer uses to over weight those statements in +an inner loop relative to the loop being vectorized. + @item avoid-fma-max-bits Maximum number of bits for which we avoid creating FMAs. diff --git a/gcc/params.opt b/gcc/params.opt index 7c7aa78992a..a35c2abe359 100644 --- a/gcc/params.opt +++ b/gcc/params.opt @@ -1089,4 +1089,8 @@ Bound on number of runtime checks inserted by the vectorizer's loop versioning f Common Joined UInteger Var(param_vect_partial_vector_usage) Init(2) IntegerRange(0, 2) Param Optimization Controls how loop vectorizer uses partial vectors. 0 means never, 1 means only for loops whose need to iterate can be removed, 2 means for all loops. The default value is 2. +-param=vect-inner-loop-cost-factor= +Common Joined UInteger Var(param_vect_inner_loop_cost_factor) Init(50) IntegerRange(1, 999999) Param Optimization +Indicates the factor which loop vectorizer uses to over weight those statements in an inner loop relative to the loop being vectorized. The default value is 50. + ; This comment is to ensure we retain the blank line above. diff --git a/gcc/targhooks.c b/gcc/targhooks.c index 952fad422eb..b595b7838af 100644 --- a/gcc/targhooks.c +++ b/gcc/targhooks.c @@ -90,6 +90,9 @@ along with GCC; see the file COPYING3. If not see #include "attribs.h" #include "asan.h" #include "emit-rtl.h" +#include "gimple.h" +#include "cfgloop.h" +#include "tree-vectorizer.h" bool default_legitimate_address_p (machine_mode mode ATTRIBUTE_UNUSED, @@ -1400,7 +1403,11 @@ default_add_stmt_cost (class vec_info *vinfo, void *data, int count, arbitrary and could potentially be improved with analysis. */ if (where == vect_body && stmt_info && stmt_in_inner_loop_p (vinfo, stmt_info)) - count *= 50; /* FIXME. */ + { + loop_vec_info loop_vinfo = dyn_cast (vinfo); + gcc_assert (loop_vinfo); + count *= LOOP_VINFO_INNER_LOOP_COST_FACTOR (loop_vinfo); + } retval = (unsigned) (count * stmt_cost); cost[where] += retval; diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c index 2aba503fef7..106c91964b5 100644 --- a/gcc/tree-vect-loop.c +++ b/gcc/tree-vect-loop.c @@ -836,6 +836,7 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in, vec_info_shared *shared) single_scalar_iteration_cost (0), vec_outside_cost (0), vec_inside_cost (0), + inner_loop_cost_factor (param_vect_inner_loop_cost_factor), vectorizable (false), can_use_partial_vectors_p (param_vect_partial_vector_usage != 0), using_partial_vectors_p (false), @@ -1237,7 +1238,7 @@ vect_compute_single_scalar_iteration_cost (loop_vec_info loop_vinfo) /* FORNOW. */ innerloop_iters = 1; if (loop->inner) - innerloop_iters = 50; /* FIXME */ + innerloop_iters = LOOP_VINFO_INNER_LOOP_COST_FACTOR (loop_vinfo); for (i = 0; i < nbbs; i++) { diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h index 9861d9e8810..b8ba63cc8e2 100644 --- a/gcc/tree-vectorizer.h +++ b/gcc/tree-vectorizer.h @@ -689,6 +689,10 @@ public: /* The cost of the vector loop body. */ int vec_inside_cost; + /* The factor used to over weight those statements in an inner loop + relative to the loop being vectorized. */ + unsigned int inner_loop_cost_factor; + /* Is the loop vectorizable? */ bool vectorizable; @@ -807,6 +811,7 @@ public: #define LOOP_VINFO_SINGLE_SCALAR_ITERATION_COST(L) (L)->single_scalar_iteration_cost #define LOOP_VINFO_ORIG_LOOP_INFO(L) (L)->orig_loop_info #define LOOP_VINFO_SIMD_IF_COND(L) (L)->simd_if_cond +#define LOOP_VINFO_INNER_LOOP_COST_FACTOR(L) (L)->inner_loop_cost_factor #define LOOP_VINFO_FULLY_MASKED_P(L) \ (LOOP_VINFO_USING_PARTIAL_VECTORS_P (L) \