From patchwork Wed Apr 24 08:41:01 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Kewen.Lin" <linkw@linux.ibm.com>
X-Patchwork-Id: 1090000
Return-Path: 
 <gcc-patches-return-499589-incoming=patchwork.ozlabs.org@gcc.gnu.org>
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Authentication-Results: ozlabs.org;
	spf=pass (mailfrom) smtp.mailfrom=gcc.gnu.org
	(client-ip=209.132.180.131; helo=sourceware.org;
	envelope-from=gcc-patches-return-499589-incoming=patchwork.ozlabs.org@gcc.gnu.org;
	receiver=<UNKNOWN>)
Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none)
	header.from=linux.ibm.com
Authentication-Results: ozlabs.org; dkim=pass (1024-bit key;
	unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org
	header.b="aq+AJwF5"; dkim-atps=neutral
Received: from sourceware.org (server1.sourceware.org [209.132.180.131])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256
	bits)) (No client certificate requested)
	by ozlabs.org (Postfix) with ESMTPS id 44pv1R3kYLz9s4Y
	for <incoming@patchwork.ozlabs.org>;
	Wed, 24 Apr 2019 18:41:28 +1000 (AEST)
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id
	:list-unsubscribe:list-archive:list-post:list-help:sender:to:cc
	:from:subject:date:mime-version:content-type
	:content-transfer-encoding:message-id; q=dns; s=default; b=MU5JT
	F3PT1WjGKCCw1gPaaXX12PiXwcP61m6p7Gqulby6OYND9A5GJfc08/+XtkrEZv5l
	wNMt29+GMG17nHQpcAWQV7q60LaZsYznr4oWL5r1MmlMLTGcr9lg//UOuptopjES
	4fzoQQ6tSEoilent1xg5/BZWPJ5aB/DtGybB0Q=
DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id
	:list-unsubscribe:list-archive:list-post:list-help:sender:to:cc
	:from:subject:date:mime-version:content-type
	:content-transfer-encoding:message-id; s=default; bh=eTFGzdLUKvL
	AgD7eSQk8vx0ji4U=; b=aq+AJwF5LHYIi6eaeHdBmlXruBxbRCYwxqTeQ0rDaV4
	djKvZ3zxi4qrUxG353hn2bjZsK6woAQrWzVvfEuOnYmRy1tdTIcXgldNJuolHk6p
	5RpypQKBKPOuM5kf6odX1Z/JlVXPRnvvKW1txpMKsQ5wos86m4UjF6YRRkDPDYI4
	=
Received: (qmail 100066 invoked by alias); 24 Apr 2019 08:41:19 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Unsubscribe: 
 <mailto:gcc-patches-unsubscribe-incoming=patchwork.ozlabs.org@gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Delivered-To: mailing list gcc-patches@gcc.gnu.org
Received: (qmail 100057 invoked by uid 89); 24 Apr 2019 08:41:19 -0000
Authentication-Results: sourceware.org; auth=none
X-Spam-SWARE-Status: No, score=-19.2 required=5.0 tests=AWL, BAYES_00,
	GIT_PATCH_0, GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3, KAM_SHORT,
	KAM_STOCKGEN, KHOP_DYNAMIC, RCVD_IN_DNSWL_LOW,
	SPF_PASS autolearn=ham version=3.3.1 spammy=
X-HELO: mx0a-001b2d01.pphosted.com
Received: from mx0a-001b2d01.pphosted.com (HELO mx0a-001b2d01.pphosted.com)
	(148.163.156.1) by sourceware.org
	(qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP;
	Wed, 24 Apr 2019 08:41:13 +0000
Received: from pps.filterd (m0098394.ppops.net [127.0.0.1])	by
	mx0a-001b2d01.pphosted.com (8.16.0.27/8.16.0.27) with SMTP id
	x3O8dMdI029937	for <gcc-patches@gcc.gnu.org>;
	Wed, 24 Apr 2019 04:41:11 -0400
Received: from e06smtp03.uk.ibm.com (e06smtp03.uk.ibm.com [195.75.94.99])	by
	mx0a-001b2d01.pphosted.com with ESMTP id
	2s2hb80yud-1	(version=TLSv1.2 cipher=AES256-GCM-SHA384
	bits=256 verify=NOT)	for <gcc-patches@gcc.gnu.org>;
	Wed, 24 Apr 2019 04:41:11 -0400
Received: from localhost	by e06smtp03.uk.ibm.com with IBM ESMTP SMTP
	Gateway: Authorized Use Only! Violators will be
	prosecuted	for <gcc-patches@gcc.gnu.org> from
	<linkw@linux.ibm.com>; Wed, 24 Apr 2019 09:41:09 +0100
Received: from b06cxnps3075.portsmouth.uk.ibm.com (9.149.109.195)	by
	e06smtp03.uk.ibm.com (192.168.101.133) with IBM ESMTP SMTP
	Gateway: Authorized Use Only! Violators will be prosecuted;
	(version=TLSv1/SSLv3 cipher=AES256-GCM-SHA384 bits=256/256)	Wed, 24
	Apr 2019 09:41:06 +0100
Received: from d06av24.portsmouth.uk.ibm.com (d06av24.portsmouth.uk.ibm.com
	[9.149.105.60])	by b06cxnps3075.portsmouth.uk.ibm.com
	(8.14.9/8.14.9/NCO v10.0) with ESMTP id
	x3O8f5oA62390324	(version=TLSv1/SSLv3
	cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK);
	Wed, 24 Apr 2019 08:41:05 GMT
Received: from d06av24.portsmouth.uk.ibm.com (unknown [127.0.0.1])	by IMSVA
	(Postfix) with ESMTP id A223F4203F;
	Wed, 24 Apr 2019 08:41:05 +0000 (GMT)
Received: from d06av24.portsmouth.uk.ibm.com (unknown [127.0.0.1])	by IMSVA
	(Postfix) with ESMTP id A766342042;
	Wed, 24 Apr 2019 08:41:02 +0000 (GMT)
Received: from kewenlins-mbp.cn.ibm.com (unknown [9.200.147.132])	by
	d06av24.portsmouth.uk.ibm.com (Postfix) with ESMTP;
	Wed, 24 Apr 2019 08:41:02 +0000 (GMT)
To: GCC Patches <gcc-patches@gcc.gnu.org>
Cc: bin.cheng@linux.alibaba.com,
	Segher Boessenkool <segher@kernel.crashing.org>,
	Bill Schmidt <wschmidt@linux.ibm.com>,
	Richard Guenther <rguenther@suse.de>, jakub@redhat.com
From: "Kewen.Lin" <linkw@linux.ibm.com>
Subject: [PATCH, RFC, rs6000] PR80791 Consider doloop in ivopts
Date: Wed, 24 Apr 2019 16:41:01 +0800
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14;
	rv:60.0) Gecko/20100101 Thunderbird/60.6.1
MIME-Version: 1.0
x-cbid: 19042408-0012-0000-0000-00000312CAC9
X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused
x-cbparentid: 19042408-0013-0000-0000-0000214B1F16
Message-Id: <f63c1459-6e99-8ea5-45d1-a486ed801c0f@linux.ibm.com>
X-IsSubscribed: yes

Hi all,

As PR80791 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80791, on
some targets which support low-overhead loop, the related compare
type ivs use is possible to become dead and removed eventually.
However, the current ivopt cost modeling doesn't consider this
possible elimination, it leads suboptimal iv candidate decision.

To make it better, this patch is to introduce one target hook, to
provide target which supports low-overhead loop a way to teach
ivopt to know which compare type iv can be ignored in modeling.

Since the low-overhead loop optimize transformation is based on RTL,
some of those checks are hard to be imitated on gimple, so it's not
possible to predict the current loop will be transformed exactly
in middle-end.  But if we can have most loop predicted precisely,
it would be helpful.  It highly depends on target hook fine tuning.
It's acceptable to have some loops which can be transformed to low-
overhead loop but we don't catch.  But we should try our best to
avoid to predict some loop as low-overhead loop but which isn't.

Bootstrapped and regression testing on powerpc64le.

One failure was found in regression testing, which is:
  gcc.target/powerpc/20050830-1.c

I did some investigation and found: with this patch, one new iv cand
is chosen for exit condition, which is used to rewrite the compare
type use.  Later loop iv analysis in RTL can NOT determine the loop
iteration number is finite, which causes doloop_optimize not transform
the loop.  The check doesn't find the expected pattern.

Still investigating how to solve this failure, like either to enhance
loop iv analysis or to check some special condition and guard in ivopt.
Any suggestions are highly welcomed.

Btw, this is for GCC10.

---------------------------------------------

gcc/ChangeLog

2019-04-24  Kewen Lin  <linkw@gcc.gnu.org>

	PR middle-end/80791
	* target.def (predict_doloop_p): New.
	* targhooks.h (default_predict_doloop_p): New.
	* targhooks.c (default_predict_doloop_p): Likewise.
	* doc/tm.texi.in (TARGET_PREDICT_DOLOOP_P): New.
	* doc/tm.texi: Regenerate.
	* config/rs6000/rs6000.c (invalid_insn_for_doloop_p): New.
	  (costly_iter_for_doloop_p): New.
	  (rs6000_predict_doloop_p): New.
	* gcc/expr.c (produce_memory_decl_rtl): New.
	  (prepare_decl_rtl): New.
	* gcc/expr.h (produce_memory_decl_rtl): Declare.
	  (prepare_decl_rtl): Declare.
	* gcc/tree-ssa-loop-ivopts.c (tree_ssa_iv_optimize_loop): Call
	  predict_doloop_p hook.
	  (tailor_cmp_uses): New.
	  (preserve_ivs_for_use): New.
	  (computation_cost): Call refactored prepare_decl_rtl, consider
	  zero cost iv use.
	  (dump_use): Dump zero_cost_p field.
	  (record_use): Init zero_cost_p field.
	  (produce_memory_decl_rtl): Remove.
	  (prepare_decl_rtl): Remove.

gcc/testsuite/ChangeLog

2019-04-24  Kewen Lin  <linkw@gcc.gnu.org>

	PR middle-end/80791
	* gcc.dg/tree-ssa/ivopts-lt.c : Adjust.
---
 gcc/config/rs6000/rs6000.c                | 174 ++++++++++++++++++++-
 gcc/doc/tm.texi                           |   8 +
 gcc/doc/tm.texi.in                        |   2 +
 gcc/expr.c                                |  91 +++++++++++
 gcc/expr.h                                |  16 +-
 gcc/target.def                            |   9 ++
 gcc/targhooks.c                           |  13 ++
 gcc/targhooks.h                           |   1 +
 gcc/testsuite/gcc.dg/tree-ssa/ivopts-lt.c |   7 +-
 gcc/tree-ssa-loop-ivopts.c                | 247 +++++++++++++++++++-----------
 10 files changed, 472 insertions(+), 96 deletions(-)
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index a21f4f7..756f914 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -83,6 +83,9 @@
 #include "tree-ssa-propagate.h"
 #include "tree-vrp.h"
 #include "tree-ssanames.h"
+#include "tree-ssa-loop-niter.h"
+#include "tree-cfg.h"
+#include "tree-scalar-evolution.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -1914,6 +1917,9 @@ static const struct attribute_spec rs6000_attribute_table[] =
 #undef TARGET_CAN_USE_DOLOOP_P
 #define TARGET_CAN_USE_DOLOOP_P can_use_doloop_if_innermost
 
+#undef TARGET_PREDICT_DOLOOP_P
+#define TARGET_PREDICT_DOLOOP_P rs6000_predict_doloop_p
+
 #undef TARGET_ATOMIC_ASSIGN_EXPAND_FENV
 #define TARGET_ATOMIC_ASSIGN_EXPAND_FENV rs6000_atomic_assign_expand_fenv
 
@@ -39436,7 +39442,173 @@ rs6000_mangle_decl_assembler_name (tree decl, tree id)
   return id;
 }
 
-
+/* Check whether there are some instructions preventing doloop transformation
+   inside loop body, mainly for instructions which are possible to kill CTR.
+
+   Return true if some invalid insn exits, otherwise return false.  */
+
+static bool
+invalid_insn_for_doloop_p (struct loop *loop)
+{
+  basic_block *body = get_loop_body (loop);
+  unsigned num_nodes = loop->num_nodes;
+  gimple_stmt_iterator gsi;
+  unsigned i;
+
+  for (i = 0; i < num_nodes; i++)
+    for (gsi = gsi_start_bb (body[i]); !gsi_end_p (gsi); gsi_next (&gsi))
+      {
+	gimple *stmt = gsi_stmt (gsi);
+	if (is_gimple_call (stmt) && !gimple_call_internal_p (stmt)
+	    && !is_inexpensive_builtin (gimple_call_fndecl (stmt)))
+	  {
+	    if (dump_file && (dump_flags & TDF_DETAILS))
+	      fprintf (dump_file,
+		       "predict doloop failure due to finding call.\n");
+	    return true;
+	  }
+	if (computed_goto_p (stmt))
+	  {
+	    if (dump_file && (dump_flags & TDF_DETAILS))
+	      fprintf (
+		dump_file,
+		"predict doloop failure due to finding computed jump.\n");
+	    return true;
+	  }
+	/* TODO: Now this hook is expected to be called in ivopts, which is
+	   before switchlower1/switchlower2.  Checking for SWITCH at this point
+       will eliminate some good candidates.  But since there are only a few
+       cases having _a_ switch statement in the innermost loop, it can be a low
+	   priority enhancement.  */
+
+	if (gimple_code (stmt) == GIMPLE_SWITCH)
+	  {
+	    if (dump_file && (dump_flags & TDF_DETAILS))
+	      fprintf (dump_file,
+		       "predict doloop failure due to finding switch.\n");
+	    return true;
+	  }
+      }
+
+  return false;
+}
+
+/* Check whether number of iteration computation is too costly for doloop
+   transformation.  It expands the gimple sequence to equivalent RTL insn
+   sequence, then evaluate the cost.
+
+   Return true if it's costly, otherwise return false.  */
+
+static bool
+costly_iter_for_doloop_p (struct loop *loop, tree niters)
+{
+  tree type = TREE_TYPE (niters);
+  unsigned cost = 0, i;
+  tree obj;
+  bool speed = optimize_loop_for_speed_p (loop);
+  int regno = LAST_VIRTUAL_REGISTER + 1;
+  vec<tree> tvec;
+  tvec.create (20);
+  decl_rtl_data tdata;
+  tdata.regno = &regno;
+  tdata.treevec = &tvec;
+  walk_tree (&niters, prepare_decl_rtl, &tdata, NULL);
+  start_sequence ();
+  expand_expr (niters, NULL_RTX, TYPE_MODE (type), EXPAND_NORMAL);
+  rtx_insn *seq = get_insns ();
+  end_sequence ();
+  FOR_EACH_VEC_ELT (tvec, i, obj)
+    SET_DECL_RTL (obj, NULL_RTX);
+  tvec.release ();
+
+  for (; seq; seq = NEXT_INSN (seq))
+    {
+      if (!INSN_P (seq))
+	continue;
+      rtx body = PATTERN (seq);
+      if (GET_CODE (body) == SET)
+	{
+	  rtx set_val = XEXP (body, 1);
+	  enum rtx_code code = GET_CODE (set_val);
+	  enum rtx_class cls = GET_RTX_CLASS (code);
+	  /* For now, we only consider these two RTX classes, to match what we
+	 get in doloop_optimize, excluding operations like zero/sign extend.  */
+	  if (cls == RTX_BIN_ARITH || cls == RTX_COMM_ARITH)
+	    cost += set_src_cost (set_val, GET_MODE (set_val), speed);
+	}
+    }
+  unsigned max_cost
+    = COSTS_N_INSNS (PARAM_VALUE (PARAM_MAX_ITERATIONS_COMPUTATION_COST));
+  if (cost > max_cost)
+    return true;
+
+  return false;
+}
+
+/*
+   Predict whether the given loop will be transformed in the RTL
+   doloop_optimize pass.  This is for use by the IVOPTS middle-end pass.
+   Attempt to duplicate as many doloop_optimize checks as possible.
+
+   Some RTL specific checks seems unable to be checked here, if any
+   new checks or easy checks _are_ missing here.  */
+
+static bool
+rs6000_predict_doloop_p (struct loop *loop)
+{
+  gcc_assert (loop);
+
+  /* On rs6000, targetm.can_use_doloop_p is actually
+     can_use_doloop_if_innermost.  Just ensure it's innermost.  */
+  if (loop->inner != NULL)
+    {
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	fprintf (dump_file, "predict doloop failure due to no innermost.\n");
+      return false;
+    }
+
+  /* number_of_latch_executions is not so costly, so we don't use
+     number_of_iterations_exit for iteration description.  */
+  tree niters = number_of_latch_executions (loop);
+  if (niters == NULL_TREE || niters == chrec_dont_know)
+    {
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	fprintf (dump_file, "predict doloop failure due to unexpected niters.\n");
+      return false;
+    }
+
+  /* Similar to doloop_optimize, check whether iteration count too small
+     and not profitable.  */
+  HOST_WIDE_INT est_niter = get_estimated_loop_iterations_int (loop);
+  if (est_niter == -1)
+    est_niter = get_likely_max_loop_iterations_int (loop);
+  if (est_niter >= 0 && est_niter < 3)
+    {
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	fprintf (dump_file,
+		 "predict doloop failure due to too few iterations (%d).\n",
+		 (unsigned int) est_niter);
+      return false;
+    }
+
+  /* Similar to doloop_optimize, check whether number of iterations too costly
+     to compute.  */
+  if (costly_iter_for_doloop_p (loop, niters))
+    {
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	fprintf (dump_file,
+		 "predict doloop failure due to costly niter computation.\n");
+      return false;
+    }
+
+  /* Similar to doloop_optimize targetm.invalid_within_doloop, check for
+     CALL, tablejump, computed_jump.  */
+  if (invalid_insn_for_doloop_p (loop))
+    return false;
+
+  return true;
+}
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include "gt-rs6000.h"
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 8c8978b..d3ad208 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -11603,6 +11603,14 @@ function version at run-time for a given set of function versions.
 body must be generated.
 @end deftypefn
 
+@deftypefn {Target Hook} bool TARGET_PREDICT_DOLOOP_P (struct loop *@var{loop})
+Return true if we can predict it is possible to use low-overhead loops
+for a particular loop.  The parameter @var{loop} is a pointer to the loop
+which is going to be checked. This target hook is required only when the
+target supports low-overhead loops, and will help some earlier middle-end
+passes to make some decisions.
+@end deftypefn
+
 @deftypefn {Target Hook} bool TARGET_CAN_USE_DOLOOP_P (const widest_int @var{&iterations}, const widest_int @var{&iterations_max}, unsigned int @var{loop_depth}, bool @var{entered_at_top})
 Return true if it is possible to use low-overhead loops (@code{doloop_end}
 and @code{doloop_begin}) for a particular loop.  @var{iterations} gives the
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index fe1194e..e047734 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -7942,6 +7942,8 @@ to by @var{ce_info}.
 
 @hook TARGET_GENERATE_VERSION_DISPATCHER_BODY
 
+@hook TARGET_PREDICT_DOLOOP_P
+
 @hook TARGET_CAN_USE_DOLOOP_P
 
 @hook TARGET_INVALID_WITHIN_DOLOOP
diff --git a/gcc/expr.c b/gcc/expr.c
index 9ff5e5f..1f2ad45 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -12539,3 +12539,94 @@ int_expr_size (tree exp)
 
   return tree_to_shwi (size);
 }
+
+/* Produce DECL_RTL for object obj so it looks like it is stored in memory.  */
+
+rtx
+produce_memory_decl_rtl (tree obj, int *regno)
+{
+  addr_space_t as = TYPE_ADDR_SPACE (TREE_TYPE (obj));
+  machine_mode address_mode = targetm.addr_space.address_mode (as);
+  rtx x;
+
+  gcc_assert (obj);
+  if (TREE_STATIC (obj) || DECL_EXTERNAL (obj))
+    {
+      const char *name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (obj));
+      x = gen_rtx_SYMBOL_REF (address_mode, name);
+      SET_SYMBOL_REF_DECL (x, obj);
+      x = gen_rtx_MEM (DECL_MODE (obj), x);
+      set_mem_addr_space (x, as);
+      targetm.encode_section_info (obj, x, true);
+    }
+  else
+    {
+      x = gen_raw_REG (address_mode, (*regno)++);
+      x = gen_rtx_MEM (DECL_MODE (obj), x);
+      set_mem_addr_space (x, as);
+    }
+
+  return x;
+}
+
+/* Prepares decl_rtl for variables referred in *EXPR_P.  Callback for
+   walk_tree.  DATA contains the actual fake register number.  */
+
+tree
+prepare_decl_rtl (tree *expr_p, int *ws, void *data)
+{
+  tree obj = NULL_TREE;
+  rtx x = NULL_RTX;
+  decl_rtl_data *info = (decl_rtl_data *) data;
+  int *regno = info->regno;
+  vec<tree> *treevec = info->treevec;
+
+  switch (TREE_CODE (*expr_p))
+    {
+    case ADDR_EXPR:
+      for (expr_p = &TREE_OPERAND (*expr_p, 0); handled_component_p (*expr_p);
+	   expr_p = &TREE_OPERAND (*expr_p, 0))
+	continue;
+      obj = *expr_p;
+      if (DECL_P (obj) && HAS_RTL_P (obj) && !DECL_RTL_SET_P (obj))
+	x = produce_memory_decl_rtl (obj, regno);
+      break;
+
+    case SSA_NAME:
+      *ws = 0;
+      obj = SSA_NAME_VAR (*expr_p);
+      /* Defer handling of anonymous SSA_NAMEs to the expander.  */
+      if (!obj)
+	return NULL_TREE;
+      if (!DECL_RTL_SET_P (obj))
+	x = gen_raw_REG (DECL_MODE (obj), (*regno)++);
+      break;
+
+    case VAR_DECL:
+    case PARM_DECL:
+    case RESULT_DECL:
+      *ws = 0;
+      obj = *expr_p;
+
+      if (DECL_RTL_SET_P (obj))
+	break;
+
+      if (DECL_MODE (obj) == BLKmode)
+	x = produce_memory_decl_rtl (obj, regno);
+      else
+	x = gen_raw_REG (DECL_MODE (obj), (*regno)++);
+
+      break;
+
+    default:
+      break;
+    }
+
+  if (x)
+    {
+      treevec->safe_push (obj);
+      SET_DECL_RTL (obj, x);
+    }
+
+  return NULL_TREE;
+}
diff --git a/gcc/expr.h b/gcc/expr.h
index 17c3962..b1894a6b 100644
--- a/gcc/expr.h
+++ b/gcc/expr.h
@@ -53,7 +53,21 @@ typedef struct separate_ops
   tree type;
   tree op0, op1, op2;
 } *sepops;
-
+
+/* This structure is used to pass information to tree walker function
+   prepare_decl_rtl.  */
+typedef struct data_for_decl_rtl
+{
+  int *regno;
+  vec<tree> *treevec;
+} decl_rtl_data;
+
+/* Produce decl_rtl for object so it looks like it is stored in memory.  */
+rtx produce_memory_decl_rtl (tree, int *);
+
+/* Prepares decl_rtl for variables referred.  Callback for walk_tree.  */
+tree prepare_decl_rtl (tree *, int *, void *);
+
 /* This is run during target initialization to set up which modes can be
    used directly in memory and to initialize the block move optab.  */
 extern void init_expr_target (void);
diff --git a/gcc/target.def b/gcc/target.def
index 66cee07..2f28e80 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -4227,6 +4227,15 @@ DEFHOOK
 rtx, (machine_mode mode, rtx result, rtx val, rtx failval),
  default_speculation_safe_value)
  
+DEFHOOK
+(predict_doloop_p,
+ "Return true if we can predict it is possible to use low-overhead loops\n\
+for a particular loop.  The parameter @var{loop} is a pointer to the loop\n\
+which is going to be checked. This target hook is required only when the\n\
+target supports low-overhead loops, and will help some earlier middle-end\n\
+passes to make some decisions.",
+ bool, (struct loop *loop),
+ default_predict_doloop_p)
 
 DEFHOOK
 (can_use_doloop_p,
diff --git a/gcc/targhooks.c b/gcc/targhooks.c
index 318f7e9..56ed4cc 100644
--- a/gcc/targhooks.c
+++ b/gcc/targhooks.c
@@ -643,6 +643,19 @@ default_has_ifunc_p (void)
   return HAVE_GNU_INDIRECT_FUNCTION;
 }
 
+/* True if we can predict this loop is possible to be transformed to a
+   low-overhead loop, otherwise returns false.
+
+   By default, false is returned, as this hook's applicability should be
+   verified for each target.  Target maintainers should re-define the hook
+   if the target can take advantage of it.  */
+
+bool
+default_predict_doloop_p (struct loop *loop ATTRIBUTE_UNUSED)
+{
+  return false;
+}
+
 /* NULL if INSN insn is valid within a low-overhead loop, otherwise returns
    an error message.
 
diff --git a/gcc/targhooks.h b/gcc/targhooks.h
index 5943627..70bfb17 100644
--- a/gcc/targhooks.h
+++ b/gcc/targhooks.h
@@ -85,6 +85,7 @@ extern bool default_fixed_point_supported_p (void);
 
 extern bool default_has_ifunc_p (void);
 
+extern bool default_predict_doloop_p (struct loop *);
 extern const char * default_invalid_within_doloop (const rtx_insn *);
 
 extern tree default_builtin_vectorized_function (unsigned int, tree, tree);
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ivopts-lt.c b/gcc/testsuite/gcc.dg/tree-ssa/ivopts-lt.c
index 171c85a..f61288c 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ivopts-lt.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ivopts-lt.c
@@ -17,6 +17,7 @@ f1 (char *p, uintptr_t i, uintptr_t n)
   while (i < n);
 }
 
-/* { dg-final { scan-tree-dump-times "PHI" 1 "ivopts" } } */
-/* { dg-final { scan-tree-dump-times "PHI <p_" 1 "ivopts"} } */
-/* { dg-final { scan-tree-dump-times "p_\[0-9\]* <" 1 "ivopts" } } */
+/* { dg-final { scan-tree-dump-times "PHI" 1 "ivopts" { target { ! powerpc*-*-* } } } } */
+/* { dg-final { scan-tree-dump-times "PHI" 2 "ivopts" { target { powerpc*-*-* } } } } */
+/* { dg-final { scan-tree-dump-times "PHI <p_" 1 "ivopts" { target { ! powerpc*-*-* } } } } */
+/* { dg-final { scan-tree-dump-times "p_\[0-9\]* <" 1 "ivopts" { target { ! powerpc*-*-* } } } } */
diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index a44b4cb..148a48d 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -109,6 +109,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "builtins.h"
 #include "tree-vectorizer.h"
 
+
 /* FIXME: Expressions are expanded to RTL in this pass to determine the
    cost of different addressing modes.  This should be moved to a TBD
    interface between the GIMPLE and RTL worlds.  */
@@ -376,6 +377,7 @@ struct iv_use
   tree addr_base;	/* Base address with const offset stripped.  */
   poly_uint64_pod addr_offset;
 			/* Const offset stripped from base address.  */
+  bool zero_cost_p; /* This iv use takes zero cost, only compare type.  */
 };
 
 /* Group of uses.  */
@@ -691,6 +693,8 @@ static vec<tree> decl_rtl_to_reset;
 
 static comp_cost force_expr_to_var_cost (tree, bool);
 
+static void preserve_ivs_for_use (struct ivopts_data*, struct iv_use*);
+
 /* The single loop exit if it dominates the latch, NULL otherwise.  */
 
 edge
@@ -765,6 +769,8 @@ dump_use (FILE *file, struct iv_use *use)
   fprintf (file, "    At pos:\t");
   if (use->op_p)
     print_generic_expr (file, *use->op_p, TDF_SLIM);
+  if (use->zero_cost_p)
+    fprintf (file, "\n    Zero cost");
   fprintf (file, "\n");
   dump_iv (file, use->iv, false, 2);
 }
@@ -1549,6 +1555,7 @@ record_use (struct iv_group *group, tree *use_p, struct iv *iv,
   use->op_p = use_p;
   use->addr_base = addr_base;
   use->addr_offset = addr_offset;
+  use->zero_cost_p = false;
 
   group->vuses.safe_push (use);
   return use;
@@ -3687,94 +3694,6 @@ get_group_iv_cost (struct ivopts_data *data, struct iv_group *group,
   return NULL;
 }
 
-/* Produce DECL_RTL for object obj so it looks like it is stored in memory.  */
-static rtx
-produce_memory_decl_rtl (tree obj, int *regno)
-{
-  addr_space_t as = TYPE_ADDR_SPACE (TREE_TYPE (obj));
-  machine_mode address_mode = targetm.addr_space.address_mode (as);
-  rtx x;
-
-  gcc_assert (obj);
-  if (TREE_STATIC (obj) || DECL_EXTERNAL (obj))
-    {
-      const char *name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (obj));
-      x = gen_rtx_SYMBOL_REF (address_mode, name);
-      SET_SYMBOL_REF_DECL (x, obj);
-      x = gen_rtx_MEM (DECL_MODE (obj), x);
-      set_mem_addr_space (x, as);
-      targetm.encode_section_info (obj, x, true);
-    }
-  else
-    {
-      x = gen_raw_REG (address_mode, (*regno)++);
-      x = gen_rtx_MEM (DECL_MODE (obj), x);
-      set_mem_addr_space (x, as);
-    }
-
-  return x;
-}
-
-/* Prepares decl_rtl for variables referred in *EXPR_P.  Callback for
-   walk_tree.  DATA contains the actual fake register number.  */
-
-static tree
-prepare_decl_rtl (tree *expr_p, int *ws, void *data)
-{
-  tree obj = NULL_TREE;
-  rtx x = NULL_RTX;
-  int *regno = (int *) data;
-
-  switch (TREE_CODE (*expr_p))
-    {
-    case ADDR_EXPR:
-      for (expr_p = &TREE_OPERAND (*expr_p, 0);
-	   handled_component_p (*expr_p);
-	   expr_p = &TREE_OPERAND (*expr_p, 0))
-	continue;
-      obj = *expr_p;
-      if (DECL_P (obj) && HAS_RTL_P (obj) && !DECL_RTL_SET_P (obj))
-	x = produce_memory_decl_rtl (obj, regno);
-      break;
-
-    case SSA_NAME:
-      *ws = 0;
-      obj = SSA_NAME_VAR (*expr_p);
-      /* Defer handling of anonymous SSA_NAMEs to the expander.  */
-      if (!obj)
-	return NULL_TREE;
-      if (!DECL_RTL_SET_P (obj))
-	x = gen_raw_REG (DECL_MODE (obj), (*regno)++);
-      break;
-
-    case VAR_DECL:
-    case PARM_DECL:
-    case RESULT_DECL:
-      *ws = 0;
-      obj = *expr_p;
-
-      if (DECL_RTL_SET_P (obj))
-	break;
-
-      if (DECL_MODE (obj) == BLKmode)
-	x = produce_memory_decl_rtl (obj, regno);
-      else
-	x = gen_raw_REG (DECL_MODE (obj), (*regno)++);
-
-      break;
-
-    default:
-      break;
-    }
-
-  if (x)
-    {
-      decl_rtl_to_reset.safe_push (obj);
-      SET_DECL_RTL (obj, x);
-    }
-
-  return NULL_TREE;
-}
 
 /* Determines cost of the computation of EXPR.  */
 
@@ -3792,7 +3711,10 @@ computation_cost (tree expr, bool speed)
 
   node->frequency = NODE_FREQUENCY_NORMAL;
   crtl->maybe_hot_insn_p = speed;
-  walk_tree (&expr, prepare_decl_rtl, &regno, NULL);
+  decl_rtl_data data;
+  data.regno = &regno;
+  data.treevec = &decl_rtl_to_reset;
+  walk_tree (&expr, prepare_decl_rtl, &data, NULL);
   start_sequence ();
   rslt = expand_expr (expr, NULL_RTX, TYPE_MODE (type), EXPAND_NORMAL);
   seq = get_insns ();
@@ -5284,6 +5206,8 @@ determine_group_iv_cost_cond (struct ivopts_data *data,
       inv_exprs = BITMAP_ALLOC (NULL);
       bitmap_set_bit (inv_exprs, inv_expr->id);
     }
+  if (use->zero_cost_p)
+    cost.cost = 0;
   set_group_iv_cost (data, group, cand, cost,
 		     inv_vars, bound, comp, inv_exprs);
 
@@ -7202,6 +7126,11 @@ rewrite_use_compare (struct ivopts_data *data,
       return;
     }
 
+  if (use->zero_cost_p) {
+    preserve_ivs_for_use(data, use);
+    return;
+  }
+
   /* The induction variable elimination failed; just express the original
      giv.  */
   comp = get_computation_at (data->current_loop, use->stmt, use, cand);
@@ -7248,8 +7177,8 @@ rewrite_groups (struct ivopts_data *data)
 
 	  for (j = 0; j < group->vuses.length (); j++)
 	    {
-	      rewrite_use_compare (data, group->vuses[j], cand);
-	      update_stmt (group->vuses[j]->stmt);
+		  rewrite_use_compare (data, group->vuses[j], cand);
+		  update_stmt (group->vuses[j]->stmt);
 	    }
 	}
     }
@@ -7527,12 +7456,134 @@ loop_body_includes_call (basic_block *body, unsigned num_nodes)
   return false;
 }
 
+/* For a comparison iv_use that we've chosen to ignore, we have to ensure
+   that its dependent IVs are preserved here.  Otherwise remove_unused_ivs
+   will remove those IVs, leading this use to have no feeding IV.  */
+
+static void
+preserve_ivs_for_use (struct ivopts_data *data, struct iv_use *use)
+{
+  struct loop *loop = data->current_loop;
+  struct iv *iv = use->iv;
+  auto_vec<tree> work_list;
+  work_list.safe_push (iv->ssa_name);
+  bitmap visited = BITMAP_ALLOC (NULL);
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    fprintf (dump_file, "Preserving IVs: ");
+
+  while (!work_list.is_empty ())
+    {
+      tree op = work_list.pop ();
+      gcc_assert (TREE_CODE (op) == SSA_NAME);
+      gcc_assert (name_info (data, op));
+      if (bitmap_bit_p (visited, SSA_NAME_VERSION (op)))
+	continue;
+
+      struct version_info *vi = name_info (data, op);
+      vi->preserve_biv = true;
+      bitmap_set_bit (visited, SSA_NAME_VERSION (iv->ssa_name));
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	{
+	  print_generic_expr (dump_file, op, TDF_DETAILS);
+	  fprintf (dump_file, " ");
+	}
+
+      gimple *def = SSA_NAME_DEF_STMT (op);
+      /* Ensure the incr_iv preserved as well, for some case
+         like:
+           i2 = phi(i0, i1);
+           i1 = i2 + ...;
+           cmp i2 ... ;
+      */
+      if (gimple_code (def) == GIMPLE_PHI && gimple_bb (def) == loop->header)
+	{
+	  tree incr_iv = PHI_ARG_DEF_FROM_EDGE (def, loop_latch_edge (loop));
+	  if (bitmap_bit_p (visited, SSA_NAME_VERSION (incr_iv)))
+	    continue;
+	  name_info (data, incr_iv)->preserve_biv = true;
+	  if (dump_file && (dump_flags & TDF_DETAILS))
+	    {
+	      print_generic_expr (dump_file, incr_iv, TDF_DETAILS);
+	      fprintf (dump_file, " ");
+	    }
+	  continue;
+	}
+
+      /* Only need to check assign for iv chain.  */
+      if (gimple_code (def) != GIMPLE_ASSIGN)
+	continue;
+
+      ssa_op_iter iter;
+      use_operand_p use_p;
+      tree use_op;
+      struct iv *use_iv;
+      FOR_EACH_PHI_OR_STMT_USE (use_p, def, iter, SSA_OP_USE)
+	{
+	  use_op = USE_FROM_PTR (use_p);
+	  if ((use_iv = name_info (data, use_op)->iv)
+	      && !integer_zerop (use_iv->step))
+	    work_list.safe_push (use_op);
+	}
+    }
+
+  BITMAP_FREE (visited);
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    fprintf (dump_file, "\n\n");
+}
+
+/* Doloop optimization RTL pass can make the related comparison computation
+   become dead and get eliminated, then these comparison IV uses should NOT
+   be considered in optimal IVs determination, mark them as ignored and taken
+   as zero cost in cost modeling.  */
+
+static void
+tailor_cmp_uses (struct ivopts_data *data)
+{
+  unsigned i;
+  struct iv_group *group;
+  struct loop *loop = data->current_loop;
+
+  for (i = 0; i < data->vgroups.length (); i++)
+    {
+      group = data->vgroups[i];
+      if (group->type == USE_COMPARE)
+	{
+	  gcc_assert (group->vuses.length () == 1);
+	  struct iv_use *use = group->vuses[0];
+	  gimple *stmt = use->stmt;
+	  if (gimple_code (stmt) == GIMPLE_COND)
+	    {
+	      basic_block bb = gimple_bb (stmt);
+	      edge true_edge, false_edge;
+	      extract_true_false_edges_from_block (bb, &true_edge, &false_edge);
+
+	      /* This comparison is used for loop latch.  Require latch is empty
+		     for now.  */
+	      if ((loop->latch == true_edge->dest
+		   || loop->latch == false_edge->dest)
+		  && gsi_end_p (gsi_last_bb (loop->latch)))
+		{
+		  use->zero_cost_p = true;
+		  /* Dump for this use.  */
+		  if (dump_file && (dump_flags & TDF_DETAILS))
+		    {
+		      fprintf (dump_file, "Zeroing cost for use: ");
+		      print_gimple_stmt (dump_file, stmt, TDF_DETAILS);
+		    }
+		}
+	    }
+	}
+    }
+}
+
 /* Optimizes the LOOP.  Returns true if anything changed.  */
 
 static bool
 tree_ssa_iv_optimize_loop (struct ivopts_data *data, struct loop *loop)
 {
   bool changed = false;
+  bool tailor_cmp_p = false;
   struct iv_ca *iv_ca;
   edge exit = single_dom_exit (loop);
   basic_block *body;
@@ -7578,6 +7629,20 @@ tree_ssa_iv_optimize_loop (struct ivopts_data *data, struct loop *loop)
   if (data->vgroups.length () > MAX_CONSIDERED_GROUPS)
     goto finish;
 
+  tailor_cmp_p = targetm.predict_doloop_p (loop);
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      fprintf (dump_file,
+	       "Predict loop %d can %sperform doloop optimization later.\n",
+	       loop->num, tailor_cmp_p ? "" : "not ");
+      flow_loop_dump (loop, dump_file, NULL, 1);
+    }
+
+  /* Some compare iv_use is probably useless once the doloop optimization
+     performs.  */
+  if (tailor_cmp_p)
+    tailor_cmp_uses (data);
+
   /* Finds candidates for the induction variables (item 2).  */
   find_iv_candidates (data);