From patchwork Thu Jan 16 09:39:40 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Kewen.Lin"
X-Patchwork-Id: 1224105
Subject: [PATCH 1/4 GCC11] Add middle-end unroll factor estimation
From: "Kewen.Lin"
To: GCC Patches
Cc: Segher Boessenkool, Bill Schmidt, "bin.cheng", Richard Guenther
Date: Thu, 16 Jan 2020 17:39:40 +0800
Message-Id: <131a3294-1951-3678-453b-085744366af6@linux.ibm.com>

gcc/ChangeLog

2020-01-16  Kewen Lin

	* cfgloop.h (struct loop): New field estimated_uf.
	* config/rs6000/rs6000.c (TARGET_LOOP_UNROLL_ADJUST_TREE): New macro.
	(rs6000_loop_unroll_adjust_tree): New function.
	* doc/tm.texi: Regenerate.
	* doc/tm.texi.in (TARGET_LOOP_UNROLL_ADJUST_TREE): New hook.
	* target.def (loop_unroll_adjust_tree): New hook.
	* tree-ssa-loop-manip.c (decide_uf_const_iter): New function.
	(decide_uf_runtime_iter): Likewise.
	(decide_uf_stupid): Likewise.
	(estimate_unroll_factor): Likewise.
	* tree-ssa-loop-manip.h (estimate_unroll_factor): New declare.
	* tree-ssa-loop.c (tree_average_num_loop_insns): New function.
	* tree-ssa-loop.h (tree_average_num_loop_insns): New declare.

 gcc/cfgloop.h              |   3 +
 gcc/config/rs6000/rs6000.c |  16 ++-
 gcc/doc/tm.texi            |   6 ++
 gcc/doc/tm.texi.in         |   2 +
 gcc/target.def             |   8 ++
 gcc/tree-ssa-loop-manip.c  | 254 +++++++++++++++++++++++++++++++++++++++++++++
 gcc/tree-ssa-loop-manip.h  |   3 +-
 gcc/tree-ssa-loop.c        |  33 ++++++
 gcc/tree-ssa-loop.h        |   2 +
 9 files changed, 324 insertions(+), 3 deletions(-)

diff --git a/gcc/cfgloop.h b/gcc/cfgloop.h
index e3590d7..feceed6 100644
--- a/gcc/cfgloop.h
+++ b/gcc/cfgloop.h
@@ -232,6 +232,9 @@ public:
      Other values means unroll with the given unrolling factor.  */
   unsigned short unroll;
 
+  /* Like unroll field above, but it's estimated in middle-end.  */
+  unsigned short estimated_uf;
+
   /* If this loop was inlined the main clique of the callee which does
      not need remapping when copying the loop body.  */
   unsigned short owned_clique;
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 2995348..0dabaa6 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -1431,6 +1431,9 @@ static const struct attribute_spec rs6000_attribute_table[] =
 #undef TARGET_LOOP_UNROLL_ADJUST
 #define TARGET_LOOP_UNROLL_ADJUST rs6000_loop_unroll_adjust
 
+#undef TARGET_LOOP_UNROLL_ADJUST_TREE
+#define TARGET_LOOP_UNROLL_ADJUST_TREE rs6000_loop_unroll_adjust_tree
+
 #undef TARGET_INIT_BUILTINS
 #define TARGET_INIT_BUILTINS rs6000_init_builtins
 #undef TARGET_BUILTIN_DECL
@@ -5090,7 +5093,8 @@ rs6000_destroy_cost_data (void *data)
   free (data);
 }
 
-/* Implement targetm.loop_unroll_adjust.  */
+/* Implement targetm.loop_unroll_adjust.  Don't forget to update
+   loop_unroll_adjust_tree for any changes.  */
 
 static unsigned
 rs6000_loop_unroll_adjust (unsigned nunroll, struct loop *loop)
@@ -5109,6 +5113,16 @@ rs6000_loop_unroll_adjust (unsigned nunroll, struct loop *loop)
   return nunroll;
 }
 
+/* Implement targetm.loop_unroll_adjust_tree, strictly refers to
+   targetm.loop_unroll_adjust.  */
+
+static unsigned
+rs6000_loop_unroll_adjust_tree (unsigned nunroll, struct loop *loop)
+{
+  /* For now loop_unroll_adjust is simple, just invoke directly.  */
+  return rs6000_loop_unroll_adjust (nunroll, loop);
+}
+
 /* Handler for the Mathematical Acceleration Subsystem (mass) interface to a
    library with vectorized intrinsics.  */
 
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 2244df4..86ad278 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -11875,6 +11875,12 @@ is required only when the target has special constraints like maximum
 number of memory accesses.
 @end deftypefn
 
+@deftypefn {Target Hook} unsigned TARGET_LOOP_UNROLL_ADJUST_TREE (unsigned @var{nunroll}, class loop *@var{loop})
+This target hook is the same as @code{loop_unroll_adjust}, but it's for
+middle-end unroll factor estimation computation.  See
+@code{loop_unroll_adjust} for the function description.
+@end deftypefn
+
 @defmac POWI_MAX_MULTS
 If defined, this macro is interpreted as a signed integer C expression
 that specifies the maximum number of floating point multiplications
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 52cd603..fd9769e 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -8008,6 +8008,8 @@ lists.
 
 @hook TARGET_LOOP_UNROLL_ADJUST
 
+@hook TARGET_LOOP_UNROLL_ADJUST_TREE
+
 @defmac POWI_MAX_MULTS
 If defined, this macro is interpreted as a signed integer C expression
 that specifies the maximum number of floating point multiplications
diff --git a/gcc/target.def b/gcc/target.def
index e705c5d..f61c831 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -2725,6 +2725,14 @@ number of memory accesses.",
  unsigned, (unsigned nunroll, class loop *loop),
  NULL)
 
+DEFHOOK
+(loop_unroll_adjust_tree,
+ "This target hook is the same as @code{loop_unroll_adjust}, but it's for\n\
+middle-end unroll factor estimation computation.  See\n\
+@code{loop_unroll_adjust} for the function description.",
+ unsigned, (unsigned nunroll, class loop *loop),
+ NULL)
+
 /* True if X is a legitimate MODE-mode immediate operand.  */
 DEFHOOK
 (legitimate_constant_p,
diff --git a/gcc/tree-ssa-loop-manip.c b/gcc/tree-ssa-loop-manip.c
index a79912a..db7f6e6 100644
--- a/gcc/tree-ssa-loop-manip.c
+++ b/gcc/tree-ssa-loop-manip.c
@@ -21,6 +21,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "system.h"
 #include "coretypes.h"
 #include "backend.h"
+#include "target.h"
 #include "tree.h"
 #include "gimple.h"
 #include "cfghooks.h"
@@ -42,6 +43,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "cfgloop.h"
 #include "tree-scalar-evolution.h"
 #include "tree-inline.h"
+#include "wide-int.h"
 
 /* All bitmaps for rewriting into loop-closed SSA go on this obstack,
    so that we can free them all at once.  */
@@ -1592,3 +1594,255 @@ canonicalize_loop_ivs (class loop *loop, tree *nit, bool bump_in_latch)
 
   return var_before;
 }
+
+/* Try to determine estimated unroll factor for given LOOP with constant number
+   of iterations, mainly refer to decide_unroll_constant_iterations.
+    - NITER_DESC holds number of iteration description if it isn't NULL.
+    - NUNROLL holds an unroll factor value computed with instruction numbers.
+    - ITER holds estimated or likely max loop iterations.
+   Return true if it succeeds, also update estimated_uf.  */
+
+static bool
+decide_uf_const_iter (class loop *loop, const tree_niter_desc *niter_desc,
+                      unsigned nunroll, const widest_int *iter)
+{
+  /* Skip big loops.  */
+  if (nunroll <= 1)
+    return false;
+
+  gcc_assert (niter_desc && niter_desc->assumptions);
+
+  /* Check number of iterations is constant.  */
+  if ((niter_desc->may_be_zero && !integer_zerop (niter_desc->may_be_zero))
+      || !tree_fits_uhwi_p (niter_desc->niter))
+    return false;
+
+  unsigned HOST_WIDE_INT const_niter = tree_to_uhwi (niter_desc->niter);
+
+  /* Check for an explicit unrolling factor.  */
+  if (loop->unroll > 0 && loop->unroll < USHRT_MAX)
+    {
+      /* It should have been peeled instead.  */
+      if (const_niter == 0 || (unsigned) loop->unroll > const_niter - 1)
+        loop->estimated_uf = 1;
+      else
+        loop->estimated_uf = loop->unroll;
+      return true;
+    }
+
+  /* Check whether the loop rolls enough to consider.  */
+  if (const_niter < 2 * nunroll || wi::ltu_p (*iter, 2 * nunroll))
+    return false;
+
+  /* Success; now compute number of iterations to unroll.  */
+  unsigned best_unroll = 0, n_copies = 0;
+  unsigned best_copies = 2 * nunroll + 10;
+  unsigned i = 2 * nunroll + 2;
+
+  if (i > const_niter - 2)
+    i = const_niter - 2;
+
+  for (; i >= nunroll - 1; i--)
+    {
+      unsigned exit_mod = const_niter % (i + 1);
+
+      if (!empty_block_p (loop->latch))
+        n_copies = exit_mod + i + 1;
+      else if (exit_mod != i)
+        n_copies = exit_mod + i + 2;
+      else
+        n_copies = i + 1;
+
+      if (n_copies < best_copies)
+        {
+          best_copies = n_copies;
+          best_unroll = i;
+        }
+    }
+
+  loop->estimated_uf = best_unroll + 1;
+  return true;
+}
+
+/* Try to determine estimated unroll factor for given LOOP with countable but
+   non-constant number of iterations, mainly refer to
+   decide_unroll_runtime_iterations.
+    - NITER_DESC holds number of iteration description if it isn't NULL.
+    - NUNROLL_IN holds an unroll factor value computed with instruction numbers.
+    - ITER holds estimated or likely max loop iterations.
+   Return true if it succeeds, also update estimated_uf.  */
+
+static bool
+decide_uf_runtime_iter (class loop *loop, const tree_niter_desc *niter_desc,
+                        unsigned nunroll_in, const widest_int *iter)
+{
+  unsigned nunroll = nunroll_in;
+  if (loop->unroll > 0 && loop->unroll < USHRT_MAX)
+    nunroll = loop->unroll;
+
+  /* Skip big loops.  */
+  if (nunroll <= 1)
+    return false;
+
+  gcc_assert (niter_desc && niter_desc->assumptions);
+
+  /* Skip constant number of iterations.  */
+  if ((!niter_desc->may_be_zero || !integer_zerop (niter_desc->may_be_zero))
+      && tree_fits_uhwi_p (niter_desc->niter))
+    return false;
+
+  /* Check whether the loop rolls.  */
+  if (wi::ltu_p (*iter, 2 * nunroll))
+    return false;
+
+  /* Success; now force nunroll to be power of 2.  */
+  unsigned i;
+  for (i = 1; 2 * i <= nunroll; i *= 2)
+    continue;
+
+  loop->estimated_uf = i;
+  return true;
+}
+
+/* Try to determine estimated unroll factor for given LOOP with uncountable
+   number of iterations, mainly refer to decide_unroll_stupid.
+    - NITER_DESC holds number of iteration description if it isn't NULL.
+    - NUNROLL_IN holds an unroll factor value computed with instruction numbers.
+    - ITER holds estimated or likely max loop iterations.
+   Return true if it succeeds, also update estimated_uf.  */
+
+static bool
+decide_uf_stupid (class loop *loop, const tree_niter_desc *niter_desc,
+                  unsigned nunroll_in, const widest_int *iter)
+{
+  if (!flag_unroll_all_loops && !loop->unroll)
+    return false;
+
+  unsigned nunroll = nunroll_in;
+  if (loop->unroll > 0 && loop->unroll < USHRT_MAX)
+    nunroll = loop->unroll;
+
+  /* Skip big loops.  */
+  if (nunroll <= 1)
+    return false;
+
+  gcc_assert (!niter_desc || !niter_desc->assumptions);
+
+  /* Skip loop with multiple branches for now.  */
+  if (num_loop_branches (loop) > 1)
+    return false;
+
+  /* Check whether the loop rolls.  */
+  if (wi::ltu_p (*iter, 2 * nunroll))
+    return false;
+
+  /* Success; now force nunroll to be power of 2.  */
+  unsigned i;
+  for (i = 1; 2 * i <= nunroll; i *= 2)
+    continue;
+
+  loop->estimated_uf = i;
+  return true;
+}
+
+/* Try to estimate whether this given LOOP can be unrolled or not, and compute
+   its estimated unroll factor if it can.  To avoid duplicated computation, you
+   can pass number of iterations information by DESC.  The heuristics mainly
+   refer to decide_unrolling in loop-unroll.c.  */
+
+void
+estimate_unroll_factor (class loop *loop, tree_niter_desc *desc)
+{
+  /* Return the existing estimated unroll factor.  */
+  if (loop->estimated_uf)
+    return;
+
+  /* Don't unroll explicitly.  */
+  if (loop->unroll == 1)
+    {
+      loop->estimated_uf = loop->unroll;
+      return;
+    }
+
+  /* Like decide_unrolling, don't unroll if:
+     1) the loop is cold.
+     2) the loop can't be manipulated.
+     3) the loop isn't innermost.  */
+  if (optimize_loop_for_size_p (loop)
+      || !can_duplicate_loop_p (loop)
+      || loop->inner != NULL)
+    {
+      loop->estimated_uf = 1;
+      return;
+    }
+
+  /* Don't unroll without explicit information.  */
+  if (!loop->unroll && !flag_unroll_loops && !flag_unroll_all_loops)
+    {
+      loop->estimated_uf = 1;
+      return;
+    }
+
+  /* Check for instruction number and average instruction number.  */
+  loop->ninsns = tree_num_loop_insns (loop, &eni_size_weights);
+  loop->av_ninsns = tree_average_num_loop_insns (loop, &eni_size_weights);
+  unsigned nunroll = param_max_unrolled_insns / loop->ninsns;
+  unsigned nunroll_by_av = param_max_average_unrolled_insns / loop->av_ninsns;
+
+  if (nunroll > nunroll_by_av)
+    nunroll = nunroll_by_av;
+  if (nunroll > (unsigned) param_max_unroll_times)
+    nunroll = param_max_unroll_times;
+
+  if (targetm.loop_unroll_adjust_tree)
+    nunroll = targetm.loop_unroll_adjust_tree (nunroll, loop);
+
+  tree_niter_desc *niter_desc = NULL;
+  bool desc_need_delete = false;
+
+  /* Compute number of iterations if needed.  */
+  if (!desc)
+    {
+      /* For now, use single_dom_exit for simplicity.  TODO: Support multiple
+         exits like find_simple_exit if we find some profitable cases.  */
+      niter_desc = XNEW (class tree_niter_desc);
+      gcc_assert (niter_desc);
+      edge exit = single_dom_exit (loop);
+      if (!exit || !number_of_iterations_exit (loop, exit, niter_desc, true))
+        {
+          XDELETE (niter_desc);
+          niter_desc = NULL;
+        }
+      else
+        desc_need_delete = true;
+    }
+  else
+    niter_desc = desc;
+
+  /* For checking the loop rolls enough to consider, also consult loop bounds
+     and profile.  */
+  widest_int iterations;
+  if (!get_estimated_loop_iterations (loop, &iterations)
+      && !get_likely_max_loop_iterations (loop, &iterations))
+    iterations = 0;
+
+  if (niter_desc && niter_desc->assumptions)
+    {
+      /* For countable loops.  */
+      if (!decide_uf_const_iter (loop, niter_desc, nunroll, &iterations)
+          && !decide_uf_runtime_iter (loop, niter_desc, nunroll, &iterations))
+        loop->estimated_uf = 1;
+    }
+  else
+    {
+      if (!decide_uf_stupid (loop, niter_desc, nunroll, &iterations))
+        loop->estimated_uf = 1;
+    }
+
+  if (desc_need_delete)
+    {
+      XDELETE (niter_desc);
+      niter_desc = NULL;
+    }
+}
+
diff --git a/gcc/tree-ssa-loop-manip.h b/gcc/tree-ssa-loop-manip.h
index 8263a67..46aec40 100644
--- a/gcc/tree-ssa-loop-manip.h
+++ b/gcc/tree-ssa-loop-manip.h
@@ -55,7 +55,6 @@ extern void tree_transform_and_unroll_loop (class loop *, unsigned,
 extern void tree_unroll_loop (class loop *, unsigned,
                               edge, class tree_niter_desc *);
 extern tree canonicalize_loop_ivs (class loop *, tree *, bool);
-
-
+extern void estimate_unroll_factor (class loop *, tree_niter_desc *);
 
 #endif /* GCC_TREE_SSA_LOOP_MANIP_H */
diff --git a/gcc/tree-ssa-loop.c b/gcc/tree-ssa-loop.c
index fc9f083..d07422e 100644
--- a/gcc/tree-ssa-loop.c
+++ b/gcc/tree-ssa-loop.c
@@ -40,6 +40,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "diagnostic-core.h"
 #include "stringpool.h"
 #include "attribs.h"
+#include "sreal.h"
 
 
 /* A pass making sure loops are fixed up.  */
@@ -790,5 +791,37 @@ tree_num_loop_insns (class loop *loop, eni_weights *weights)
   return size;
 }
 
+/* Computes an estimated number of insns on average per iteration in LOOP,
+   weighted by WEIGHTS.  Refer to function average_num_loop_insns.  */
+unsigned
+tree_average_num_loop_insns (class loop *loop, eni_weights *weights)
+{
+  basic_block *body = get_loop_body (loop);
+  gimple_stmt_iterator gsi;
+  unsigned bb_size, i;
+  sreal nsize = 0;
+
+  for (i = 0; i < loop->num_nodes; i++)
+    {
+      bb_size = 0;
+      for (gsi = gsi_start_bb (body[i]); !gsi_end_p (gsi); gsi_next (&gsi))
+        bb_size += estimate_num_insns (gsi_stmt (gsi), weights);
+      nsize += (sreal) bb_size
+               * body[i]->count.to_sreal_scale (loop->header->count);
+      /* Avoid overflows.  */
+      if (nsize > 1000000)
+        {
+          free (body);
+          return 1000000;
+        }
+    }
+  free (body);
+
+  unsigned ret = nsize.to_int ();
+  if (!ret)
+    ret = 1; /* To avoid division by zero.  */
+
+  return ret;
+}
 
 
diff --git a/gcc/tree-ssa-loop.h b/gcc/tree-ssa-loop.h
index e523de2..7bf6ba7 100644
--- a/gcc/tree-ssa-loop.h
+++ b/gcc/tree-ssa-loop.h
@@ -67,6 +67,8 @@ public:
 extern bool for_each_index (tree *, bool (*) (tree, tree *, void *), void *);
 extern char *get_lsm_tmp_name (tree ref, unsigned n, const char *suffix = NULL);
 extern unsigned tree_num_loop_insns (class loop *, struct eni_weights *);
+extern unsigned tree_average_num_loop_insns (class loop *,
+                                             struct eni_weights *);
 
 /* Returns the loop of the statement STMT.  */

From patchwork Thu Jan 16 09:40:44 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Kewen.Lin"
X-Patchwork-Id: 1224106
Subject: [PATCH 2/4 GCC11] Add target hook stride_dform_valid_p
From: "Kewen.Lin"
To: GCC Patches
Cc: Segher Boessenkool, Bill Schmidt, "bin.cheng", Richard Guenther
Date: Thu, 16 Jan 2020 17:40:44 +0800

gcc/ChangeLog

2020-01-16  Kewen Lin

	* config/rs6000/rs6000.c (TARGET_STRIDE_DFORM_VALID_P): New macro.
	(rs6000_stride_dform_valid_p): New function.
	* doc/tm.texi: Regenerate.
	* doc/tm.texi.in (TARGET_STRIDE_DFORM_VALID_P): New hook.
	* target.def (stride_dform_valid_p): New hook.
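
The displacement constraint behind the stride_dform_valid_p hook named in
the ChangeLog above can be sketched in standalone C.  This is only an
illustration under stated assumptions, not the hook itself: the enum
disp_form and the function stride_fits_form are hypothetical names, and
the sketch assumes a signed 16-bit displacement field in which DS-form
needs the low two bits clear and DQ-form the low four bits clear.  Which
machine mode maps to which form is deliberately left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the addressing forms; not GCC identifiers.  */
enum disp_form { FORM_D, FORM_DS, FORM_DQ };

/* Return true if every displacement 0, STRIDE, ..., (NUNROLL - 1) * STRIDE
   fits a signed 16-bit field and keeps the low bits required by FORM
   clear.  Treating NUNROLL == 0 as invalid is an assumption of this
   sketch.  */
static bool
stride_fits_form (int64_t stride, unsigned nunroll, enum disp_form form)
{
  const int64_t max_bound = 0x7fff;
  const int64_t min_bound = -0x8000;

  if (nunroll == 0)
    return false;

  /* Reject out-of-range strides first so the product below cannot
     overflow a 64-bit integer.  */
  if (nunroll > 1 && (stride < min_bound || stride > max_bound))
    return false;

  /* The displacements are monotonic in the copy index, so checking the
     last one covers all of them.  */
  int64_t last = (int64_t) (nunroll - 1) * stride;
  if (last < min_bound || last > max_bound)
    return false;

  /* Every displacement is a multiple of STRIDE, so alignment of STRIDE
     implies alignment of all of them.  */
  int64_t mask = form == FORM_DQ ? 0xf : form == FORM_DS ? 0x3 : 0x0;
  return (stride & mask) == 0;
}
```

With a stride of 8 and an unroll factor of 4 the displacements are 0, 8,
16 and 24, so stride_fits_form (8, 4, FORM_DS) holds, while a stride of 6
fails the DS-form alignment requirement.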
 gcc/config/rs6000/rs6000.c | 40 ++++++++++++++++++++++++++++++++++++++++
 gcc/doc/tm.texi            |  8 ++++++++
 gcc/doc/tm.texi.in         |  2 ++
 gcc/target.def             | 13 ++++++++++++-
 4 files changed, 62 insertions(+), 1 deletion(-)

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 0dabaa6..1e41fcf 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -1657,6 +1657,9 @@ static const struct attribute_spec rs6000_attribute_table[] =
 #undef TARGET_PREDICT_DOLOOP_P
 #define TARGET_PREDICT_DOLOOP_P rs6000_predict_doloop_p
 
+#undef TARGET_STRIDE_DFORM_VALID_P
+#define TARGET_STRIDE_DFORM_VALID_P rs6000_stride_dform_valid_p
+
 #undef TARGET_HAVE_COUNT_REG_DECR_P
 #define TARGET_HAVE_COUNT_REG_DECR_P true
 
@@ -26272,6 +26275,43 @@ rs6000_predict_doloop_p (struct loop *loop)
   return true;
 }
 
+/* Return true if the memory access with mode MODE, signedness SIGNED_P and
+   store STORE_P with offset from 0 to (NUNROLL-1) * STRIDE are valid with
+   D-form instructions.  */
+
+static bool
+rs6000_stride_dform_valid_p (machine_mode mode, signed HOST_WIDE_INT stride,
+                             bool signed_p, bool store_p, unsigned nunroll)
+{
+  static const HOST_WIDE_INT max_bound = 0x7fff;
+  static const HOST_WIDE_INT min_bound = -0x8000;
+
+  if (!IN_RANGE ((nunroll - 1) * stride, min_bound, max_bound))
+    return false;
+
+  /* Check DQ-form for vector mode or float128 mode.  */
+  if (VECTOR_MODE_P (mode) || FLOAT128_VECTOR_P (mode))
+    {
+      if (mode_supports_dq_form (mode) && !(stride & 0xF))
+        return true;
+      else
+        return false;
+    }
+
+  /* Simply consider non VSX instructions.  */
+  if (mode == QImode || mode == HImode || mode == SFmode || mode == DFmode)
+    return true;
+
+  /* lwz/stw is D-form, but lwa is DS-form.  */
+  if (mode == SImode && (!signed_p || store_p || !(stride & 0x03)))
+    return true;
+
+  if (mode == DImode && !(stride & 0x03))
+    return true;
+
+  return false;
+}
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include "gt-rs6000.h"
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 86ad278..0b8bc7c 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -11669,6 +11669,14 @@ function version at run-time for a given set of function versions.
 body must be generated.
 @end deftypefn
 
+@deftypefn {Target Hook} bool TARGET_STRIDE_DFORM_VALID_P (machine_mode @var{mode}, signed HOST_WIDE_INT @var{stride}, bool @var{signed_p}, bool @var{store_p}, unsigned @var{nunroll})
+For a given memory access, check whether it is valid to put 0, @var{stride}
+, 2 * @var{stride}, ... , (@var{nunroll} - 1) * @var{stride} to the instruction D-form
+displacement, with mode @var{mode}, signedness @var{signed_p} and store
+@var{store_p}.  Return true if valid.
+The default version of this hook returns false.
+@end deftypefn
+
 @deftypefn {Target Hook} bool TARGET_PREDICT_DOLOOP_P (class loop *@var{loop})
 Return true if we can predict it is possible to use a low-overhead loop
 for a particular loop.  The parameter @var{loop} is a pointer to the loop.
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index fd9769e..e90d020 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -7953,6 +7953,8 @@ to by @var{ce_info}.
 
 @hook TARGET_GENERATE_VERSION_DISPATCHER_BODY
 
+@hook TARGET_STRIDE_DFORM_VALID_P
+
 @hook TARGET_PREDICT_DOLOOP_P
 
 @hook TARGET_HAVE_COUNT_REG_DECR_P
diff --git a/gcc/target.def b/gcc/target.def
index f61c831..ee19a8d 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -4300,7 +4300,18 @@ DEFHOOK
  emits a @code{speculation_barrier} instruction if that is defined.",
  rtx, (machine_mode mode, rtx result, rtx val, rtx failval),
  default_speculation_safe_value)
-
+
+DEFHOOK
+(stride_dform_valid_p,
+ "For a given memory access, check whether it is valid to put 0, @var{stride}\n\
+, 2 * @var{stride}, ... , (@var{nunroll} - 1) * @var{stride} to the instruction D-form\n\
+displacement, with mode @var{mode}, signedness @var{signed_p} and store\n\
+@var{store_p}.  Return true if valid.\n\
+The default version of this hook returns false.",
+ bool, (machine_mode mode, signed HOST_WIDE_INT stride, bool signed_p,
+ bool store_p, unsigned nunroll),
+ NULL)
+
 DEFHOOK
 (predict_doloop_p,
  "Return true if we can predict it is possible to use a low-overhead loop\n\

From patchwork Thu Jan 16 09:41:27 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Kewen.Lin"
X-Patchwork-Id: 1224107
Subject: [PATCH 3/4 GCC11] IVOPTs Consider cost_step on different forms during unrolling
From: "Kewen.Lin"
To: GCC Patches
Cc: Segher Boessenkool, Bill Schmidt, "bin.cheng", Richard Guenther
Date: Thu, 16 Jan 2020 17:41:27 +0800

gcc/ChangeLog

2020-01-16  Kewen Lin

	* tree-ssa-loop-ivopts.c (struct iv_group): New field dform_p.
	(struct iv_cand): New field dform_p.
(struct ivopts_data): New field mark_dform_p. (record_group): Initialize dform_p. (mark_dform_groups): New function. (find_interesting_uses): Call mark_dform_groups. (add_candidate_1): Update dform_p if derived from dform_p group. (determine_iv_cost): Increase cost by considering unroll factor. (tree_ssa_iv_optimize_loop): Call estimate_unroll_factor, update mark_dform_p. gcc/tree-ssa-loop-ivopts.c | 84 +++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 83 insertions(+), 1 deletion(-) diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c index ab52cbe..a0d29bb 100644 --- a/gcc/tree-ssa-loop-ivopts.c +++ b/gcc/tree-ssa-loop-ivopts.c @@ -429,6 +429,8 @@ struct iv_group struct iv_cand *selected; /* To indicate this is a doloop use group. */ bool doloop_p; + /* To indicate this group is D-form preferred. */ + bool dform_p; /* Uses in the group. */ vec vuses; }; @@ -470,6 +472,7 @@ struct iv_cand struct iv *orig_iv; /* The original iv if this cand is added from biv with smaller type. */ bool doloop_p; /* Whether this is a doloop candidate. */ + bool dform_p; /* Derived from one D-form preferred group. */ }; /* Hashtable entry for common candidate derived from iv uses. */ @@ -650,6 +653,10 @@ struct ivopts_data /* Whether the loop has doloop comparison use. */ bool doloop_use_p; + + /* Whether the loop is likely to unroll and need to check and mark + D-form group for better step cost modeling. */ + bool mark_dform_p; }; /* An assignment of iv candidates to uses. */ @@ -1575,6 +1582,7 @@ record_group (struct ivopts_data *data, enum use_type type) group->related_cands = BITMAP_ALLOC (NULL); group->vuses.create (1); group->doloop_p = false; + group->dform_p = false; data->vgroups.safe_push (group); return group; @@ -2724,6 +2732,59 @@ split_address_groups (struct ivopts_data *data) } } +/* Go through all address type groups, check and mark D-form preferred. 
*/ +static void +mark_dform_groups (struct ivopts_data *data) +{ + if (!data->mark_dform_p) + return; + + class loop *loop = data->current_loop; + bool dump_details = (dump_file && (dump_flags & TDF_DETAILS)); + for (unsigned i = 0; i < data->vgroups.length (); i++) + { + struct iv_group *group = data->vgroups[i]; + if (address_p (group->type)) + { + bool found = true; + for (unsigned j = 0; j < group->vuses.length (); j++) + { + struct iv_use *use = group->vuses[j]; + gcc_assert (use->mem_type); + /* Ensure the step fits into the D-form field. */ + if (TREE_CODE (use->iv->step) != INTEGER_CST + || !tree_fits_shwi_p (use->iv->step)) + { + found = false; + if (dump_details) + fprintf (dump_file, + " Group use %u.%u doesn't " + "have constant step for D-form.\n", + i, j); + break; + } + bool is_store + = TREE_CODE (gimple_assign_lhs (use->stmt)) != SSA_NAME; + if (!targetm.stride_dform_valid_p (TYPE_MODE (use->mem_type), + tree_to_shwi (use->iv->step), + TYPE_UNSIGNED (use->mem_type), + is_store, loop->estimated_uf)) + { + found = false; + if (dump_details) + fprintf (dump_file, + " Group use %u.%u isn't " + "suitable for D-form.\n", + i, j); + break; + } + } + if (found) + group->dform_p = true; + } + } +} + /* Finds uses of the induction variables that are interesting. */ static void @@ -2755,6 +2816,8 @@ find_interesting_uses (struct ivopts_data *data) split_address_groups (data); + mark_dform_groups (data); + if (dump_file && (dump_flags & TDF_DETAILS)) { fprintf (dump_file, "\n:\n"); @@ -3137,6 +3200,7 @@ add_candidate_1 (struct ivopts_data *data, tree base, tree step, bool important, cand->important = important; cand->incremented_at = incremented_at; cand->doloop_p = doloop; + cand->dform_p = false; data->vcands.safe_push (cand); if (!poly_int_tree_p (step)) @@ -3173,7 +3237,11 @@ add_candidate_1 (struct ivopts_data *data, tree base, tree step, bool important, /* Relate candidate to the group for which it is added.
*/ if (use) - bitmap_set_bit (data->vgroups[use->group_id]->related_cands, i); + { + bitmap_set_bit (data->vgroups[use->group_id]->related_cands, i); + if (data->vgroups[use->group_id]->dform_p) + cand->dform_p = true; + } return cand; } @@ -5867,6 +5935,10 @@ determine_iv_cost (struct ivopts_data *data, struct iv_cand *cand) cost_step = add_cost (data->speed, TYPE_MODE (TREE_TYPE (base))); cost = cost_step + adjust_setup_cost (data, cost_base.cost); + /* Consider the stride update cost during unrolling. */ + if (!cand->dform_p) + cost += (data->current_loop->estimated_uf - 1) * cost_step; + /* Prefer the original ivs unless we may gain something by replacing it. The reason is to make debugging simpler; so this is not relevant for artificial ivs created by other optimization passes. */ @@ -7953,6 +8025,7 @@ tree_ssa_iv_optimize_loop (struct ivopts_data *data, class loop *loop, data->current_loop = loop; data->loop_loc = find_loop_location (loop).get_location_t (); data->speed = optimize_loop_for_speed_p (loop); + data->mark_dform_p = false; if (dump_file && (dump_flags & TDF_DETAILS)) { @@ -7984,6 +8057,15 @@ tree_ssa_iv_optimize_loop (struct ivopts_data *data, class loop *loop, if (!find_induction_variables (data)) goto finish; + if (targetm.stride_dform_valid_p && exit) + { + tree_niter_desc *desc = niter_for_exit (data, exit); + estimate_unroll_factor (loop, desc); + if (dump_file && (dump_flags & TDF_DETAILS)) + fprintf (dump_file, "estimated_uf:%u\n", loop->estimated_uf); + data->mark_dform_p = loop->estimated_uf > 1; + } + /* Finds interesting uses (item 1). 
*/ find_interesting_uses (data); if (data->vgroups.length () > MAX_CONSIDERED_GROUPS) From patchwork Thu Jan 16 09:42:41 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Kewen.Lin" X-Patchwork-Id: 1224108 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=209.132.180.131; helo=sourceware.org; envelope-from=gcc-patches-return-517500-incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=linux.ibm.com Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.a=rsa-sha1 header.s=default header.b=MrMhDaLs; dkim-atps=neutral Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 47yzlN42Dgz9sNx for ; Thu, 16 Jan 2020 20:43:11 +1100 (AEDT) DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender :subject:from:to:cc:references:date:mime-version:in-reply-to :content-type:message-id; q=dns; s=default; b=SZUUPQ5/akBwePMgT1 R/EXxppVmfOjstSWnEpoCItXfpU7X3YbV5IoVMk1jkt+B/fe41Jr5SUwUZJF+E/v uSCqfSQ/WlIZHELLw9J7qEGuum8KAgnGemj8Jj9y8RjdE/gtow/KNDbZMTqNrvJh k6alLcvJGgZmFbIzmuo3qTqRM= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender :subject:from:to:cc:references:date:mime-version:in-reply-to :content-type:message-id; s=default; bh=Vot1fmQUxTmwZR3hqo6/HRH8 JJY=; b=MrMhDaLskXzWuZZswxQNm+RgbXs8ViASaUyn5Vr6vYCS2D69osJRZU63 e6wI3bKoqao4f0tlrGRHIEQrk8HpRLKmSYaaBx03RNjQAhoNObX84IKkd2NcOJYU 
z0GpKJdFBsnn6h2Yzvcn6J393trJ4leyo5tPRkYTo6E9dX2reL4= Received: (qmail 6999 invoked by alias); 16 Jan 2020 09:43:04 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 6990 invoked by uid 89); 16 Jan 2020 09:43:03 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-21.9 required=5.0 tests=AWL, BAYES_00, GIT_PATCH_0, GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3, RCVD_IN_DNSWL_LOW, SPF_PASS autolearn=ham version=3.3.1 spammy= X-HELO: mx0a-001b2d01.pphosted.com Received: from mx0b-001b2d01.pphosted.com (HELO mx0a-001b2d01.pphosted.com) (148.163.158.5) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Thu, 16 Jan 2020 09:42:53 +0000 Received: from pps.filterd (m0098420.ppops.net [127.0.0.1]) by mx0b-001b2d01.pphosted.com (8.16.0.42/8.16.0.42) with SMTP id 00G9gjxl018144 for ; Thu, 16 Jan 2020 04:42:51 -0500 Received: from e06smtp04.uk.ibm.com (e06smtp04.uk.ibm.com [195.75.94.100]) by mx0b-001b2d01.pphosted.com with ESMTP id 2xhgv6p796-1 (version=TLSv1.2 cipher=AES256-GCM-SHA384 bits=256 verify=NOT) for ; Thu, 16 Jan 2020 04:42:49 -0500 Received: from localhost by e06smtp04.uk.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Thu, 16 Jan 2020 09:42:47 -0000 Received: from b06cxnps4075.portsmouth.uk.ibm.com (9.149.109.197) by e06smtp04.uk.ibm.com (192.168.101.134) with IBM ESMTP SMTP Gateway: Authorized Use Only! 
Violators will be prosecuted; (version=TLSv1/SSLv3 cipher=AES256-GCM-SHA384 bits=256/256) Thu, 16 Jan 2020 09:42:45 -0000 Received: from d06av21.portsmouth.uk.ibm.com (d06av21.portsmouth.uk.ibm.com [9.149.105.232]) by b06cxnps4075.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 00G9gi0e57540756 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 16 Jan 2020 09:42:44 GMT Received: from d06av21.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 4E31F5205F; Thu, 16 Jan 2020 09:42:44 +0000 (GMT) Received: from kewenlins-mbp.cn.ibm.com (unknown [9.200.147.222]) by d06av21.portsmouth.uk.ibm.com (Postfix) with ESMTP id 72C1552051; Thu, 16 Jan 2020 09:42:42 +0000 (GMT) Subject: [PATCH 4/4 GCC11] rs6000: P9 D-form test cases From: "Kewen.Lin" To: GCC Patches Cc: Segher Boessenkool , Bill Schmidt , "bin.cheng" , Richard Guenther References: Date: Thu, 16 Jan 2020 17:42:41 +0800 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:68.0) Gecko/20100101 Thunderbird/68.2.2 MIME-Version: 1.0 In-Reply-To: x-cbid: 20011609-0016-0000-0000-000002DDCDAE X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused x-cbparentid: 20011609-0017-0000-0000-000033406495 Message-Id: <91209c2f-50f8-556b-bde4-3a4f93c8accd@linux.ibm.com> X-IsSubscribed: yes gcc/testsuite/ChangeLog 2020-01-16 Kelvin Nilsen Kewen Lin * gcc.target/powerpc/p9-dform-0.c: New test. * gcc.target/powerpc/p9-dform-1.c: New test. * gcc.target/powerpc/p9-dform-2.c: New test. * gcc.target/powerpc/p9-dform-3.c: New test. * gcc.target/powerpc/p9-dform-4.c: New test. * gcc.target/powerpc/p9-dform-generic.h: New test. 
gcc/testsuite/gcc.target/powerpc/p9-dform-0.c | 43 +++++++++++++++++ gcc/testsuite/gcc.target/powerpc/p9-dform-1.c | 55 ++++++++++++++++++++++ gcc/testsuite/gcc.target/powerpc/p9-dform-2.c | 12 +++++ gcc/testsuite/gcc.target/powerpc/p9-dform-3.c | 15 ++++++ gcc/testsuite/gcc.target/powerpc/p9-dform-4.c | 12 +++++ .../gcc.target/powerpc/p9-dform-generic.h | 34 +++++++++++++ 6 files changed, 171 insertions(+) create mode 100644 gcc/testsuite/gcc.target/powerpc/p9-dform-0.c create mode 100644 gcc/testsuite/gcc.target/powerpc/p9-dform-1.c create mode 100644 gcc/testsuite/gcc.target/powerpc/p9-dform-2.c create mode 100644 gcc/testsuite/gcc.target/powerpc/p9-dform-3.c create mode 100644 gcc/testsuite/gcc.target/powerpc/p9-dform-4.c create mode 100644 gcc/testsuite/gcc.target/powerpc/p9-dform-generic.h diff --git a/gcc/testsuite/gcc.target/powerpc/p9-dform-0.c b/gcc/testsuite/gcc.target/powerpc/p9-dform-0.c new file mode 100644 index 0000000..01f8b69 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/p9-dform-0.c @@ -0,0 +1,43 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target powerpc_p9vector_ok } */ +/* { dg-options "-O3 -mdejagnu-cpu=power9 -funroll-loops" } */ + +/* This test confirms that the dform instructions are selected in the + translation of this main program. */ + +extern void first_dummy (); +extern void dummy (double sacc, int n); +extern void other_dummy (); + +extern float opt_value; +extern char *opt_desc; + +#define M 128 +#define N 512 + +double x [N]; +double y [N]; + +int main (int argc, char *argv []) { + double sacc; + + first_dummy (); + for (int j = 0; j < M; j++) { + + sacc = 0.00; + for (unsigned long long int i = 0; i < N; i++) { + sacc += x[i] * y[i]; + } + dummy (sacc, N); + } + opt_value = ((float) N) * 2 * ((float) M); + opt_desc = "flops"; + other_dummy (); +} + +/* At the time the dform optimization pass was merged with trunk, 12 + lxv instructions were emitted in place of the same number of lxvx + instructions.
No need to require exactly this number, as it may + change when other optimization passes evolve. */ + +/* { dg-final { scan-assembler {\mlxv\M} } } */ diff --git a/gcc/testsuite/gcc.target/powerpc/p9-dform-1.c b/gcc/testsuite/gcc.target/powerpc/p9-dform-1.c new file mode 100644 index 0000000..c6f1d76 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/p9-dform-1.c @@ -0,0 +1,55 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target powerpc_p9vector_ok } */ +/* { dg-options "-O3 -mdejagnu-cpu=power9 -funroll-loops" } */ + +/* This test confirms that the dform instructions are selected in the + translation of this main program. */ + +extern void first_dummy (); +extern void dummy (double sacc, int n); +extern void other_dummy (); + +extern float opt_value; +extern char *opt_desc; + +#define M 128 +#define N 512 + +double x [N]; +double y [N]; +double z [N]; + +int main (int argc, char *argv []) { + double sacc; + + first_dummy (); + for (int j = 0; j < M; j++) { + + sacc = 0.00; + for (unsigned long long int i = 0; i < N; i++) { + z[i] = x[i] * y[i]; + sacc += z[i]; + } + dummy (sacc, N); + } + opt_value = ((float) N) * 2 * ((float) M); + opt_desc = "flops"; + other_dummy (); +} + + + +/* At the time the dform optimization pass was merged with trunk, 12 + lxv instructions were emitted in place of the same number of lxvx + instructions. No need to require exactly this number, as it may + change when other optimization passes evolve. */ + +/* { dg-final { scan-assembler {\mlxv\M} } } */ + +/* At the time the dform optimization pass was merged with trunk, 6 + stxv instructions were emitted in place of the same number of stxvx + instructions. No need to require exactly this number, as it may + change when other optimization passes evolve.
*/ + +/* { dg-final { scan-assembler {\mstxv\M} } } */ + diff --git a/gcc/testsuite/gcc.target/powerpc/p9-dform-2.c b/gcc/testsuite/gcc.target/powerpc/p9-dform-2.c new file mode 100644 index 0000000..8752f3d --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/p9-dform-2.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target powerpc_p9vector_ok } */ +/* { dg-options "-O3 -mdejagnu-cpu=power9 -funroll-loops" } */ + +#define TYPE int +#include "p9-dform-generic.h" + +/* The precise number of lxv and stxv instructions may be impacted by + complex interactions between optimization passes, but we expect at + least one of each. */ +/* { dg-final { scan-assembler {\mlxv\M} } } */ +/* { dg-final { scan-assembler {\mstxv\M} } } */ diff --git a/gcc/testsuite/gcc.target/powerpc/p9-dform-3.c b/gcc/testsuite/gcc.target/powerpc/p9-dform-3.c new file mode 100644 index 0000000..df299a6 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/p9-dform-3.c @@ -0,0 +1,15 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target powerpc_p9vector_ok } */ +/* { dg-options "-O3 -mdejagnu-cpu=power9 -funroll-loops" } */ + +#define TYPE double +#include "p9-dform-generic.h" + +/* At the time the dform optimization pass was merged with trunk, 6 + lxv instructions were emitted in place of the same number of lxvx + instructions and 8 stxv instructions replace the same number of + stxvx instructions. No need to require exactly this number, as it + may change when other optimization passes evolve.
*/ + +/* { dg-final { scan-assembler {\mlxv\M} } } */ +/* { dg-final { scan-assembler {\mstxv\M} } } */ diff --git a/gcc/testsuite/gcc.target/powerpc/p9-dform-4.c b/gcc/testsuite/gcc.target/powerpc/p9-dform-4.c new file mode 100644 index 0000000..d712958 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/p9-dform-4.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target powerpc_p9vector_ok } */ +/* { dg-options "-O2 -mdejagnu-cpu=power9 -funroll-loops -mfloat128" } */ + +#define TYPE __float128 +#include "p9-dform-generic.h" + +/* The precise number of lxv and stxv instructions may be impacted by + complex interactions between optimization passes, but we expect at + least one of each. */ +/* { dg-final { scan-assembler {\mlxv\M} } } */ +/* { dg-final { scan-assembler {\mstxv\M} } } */ diff --git a/gcc/testsuite/gcc.target/powerpc/p9-dform-generic.h b/gcc/testsuite/gcc.target/powerpc/p9-dform-generic.h new file mode 100644 index 0000000..3caed25 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/p9-dform-generic.h @@ -0,0 +1,34 @@ + +#define ITERATIONS 1000000 + +#define SIZE (16384/sizeof(TYPE)) + +static TYPE x[SIZE] __attribute__ ((aligned (16))); +static TYPE y[SIZE] __attribute__ ((aligned (16))); +static TYPE a; + +void obfuscate(void *a, ...); + +static void __attribute__((noinline)) do_one(void) +{ + unsigned long i; + + obfuscate(x, y, &a); + + for (i = 0; i < SIZE; i++) + y[i] = a * x[i]; + + obfuscate(x, y, &a); + +} + +int main(void) +{ + unsigned long i; + + for (i = 0; i < ITERATIONS; i++) + do_one(); + + return 0; + +}
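
For readers following the series, the idea behind the stride_dform_valid_p hook and the determine_iv_cost change can be sketched in plain C. This is only an illustration under stated assumptions, not the rs6000 implementation: struct dform_limits, stride_dform_ok and iv_step_cost are hypothetical names, and the displacement limits are supplied by the caller rather than taken from any real target description. The first function checks that every displacement 0, stride, ..., (nunroll - 1) * stride fits the given range and alignment. The second mirrors the extra (estimated_uf - 1) * cost_step charged to candidates that are not D-form preferred; the estimated_uf > 1 guard is an addition of this sketch and is not in the patch.

```c
#include <stdbool.h>

/* Hypothetical description of a D-form displacement field: an
   inclusive signed range plus a required alignment of the offset.
   These are assumptions for illustration, not GCC data structures.  */
struct dform_limits
{
  long min_disp;
  long max_disp;
  long align;
};

/* Return true if all displacements 0, stride, 2 * stride, ...,
   (nunroll - 1) * stride fit the field described by LIM.  */
bool
stride_dform_ok (const struct dform_limits *lim, long stride,
		 unsigned nunroll)
{
  for (unsigned k = 0; k < nunroll; k++)
    {
      long disp = (long) k * stride;
      if (disp < lim->min_disp || disp > lim->max_disp)
	return false;
      if (disp % lim->align != 0)
	return false;
    }
  return true;
}

/* Step cost of one iv candidate once unrolling is taken into account.
   A D-form candidate needs a single update per unrolled body, while
   any other candidate is assumed to pay one update per copy.  */
unsigned
iv_step_cost (unsigned cost_step, unsigned estimated_uf, bool dform_p)
{
  unsigned cost = cost_step;
  if (!dform_p && estimated_uf > 1)
    cost += (estimated_uf - 1) * cost_step;
  return cost;
}
```

As an example, take limits of -32768 to 32752 with offsets required to be multiples of 16 (roughly the shape of the lxv/stxv displacement field, stated here as an assumption). A stride of 16 unrolled 8 times passes, while a stride of 8 already fails on the second access. With cost_step 4 and an unroll factor of 8, a candidate that is not D-form is charged 32, against 4 for a D-form one.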