From patchwork Tue May 14 03:09:55 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Kewen.Lin" <linkw@linux.ibm.com>
X-Patchwork-Id: 1099196
Return-Path: 
 <gcc-patches-return-500601-incoming=patchwork.ozlabs.org@gcc.gnu.org>
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Authentication-Results: ozlabs.org;
	spf=pass (mailfrom) smtp.mailfrom=gcc.gnu.org
	(client-ip=209.132.180.131; helo=sourceware.org;
	envelope-from=gcc-patches-return-500601-incoming=patchwork.ozlabs.org@gcc.gnu.org;
	receiver=<UNKNOWN>)
Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none)
	header.from=linux.ibm.com
Authentication-Results: ozlabs.org; dkim=pass (1024-bit key;
	unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org
	header.b="L51tY938"; dkim-atps=neutral
Received: from sourceware.org (server1.sourceware.org [209.132.180.131])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256
	bits)) (No client certificate requested)
	by ozlabs.org (Postfix) with ESMTPS id 4532k073yHz9sBV
	for <incoming@patchwork.ozlabs.org>;
	Tue, 14 May 2019 13:10:16 +1000 (AEST)
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id
	:list-unsubscribe:list-archive:list-post:list-help:sender:from
	:to:cc:subject:date:message-id; q=dns; s=default; b=jeDPWyAlZbao
	VxnvPZ7PU2Koch8UCWG5eyEdfF7Mu5q/D9vsrL0MetBmy3RwRd8/XEWrDm4tngHs
	g/WM0v5XF7o7ETuXCm2Tx44/YZmxdqeVAgKczNbCyf9NjniW+S3WZbZHShXJzpQi
	u1weehXvyRLsiEWi3Jv5sVKYqF6DYec=
DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id
	:list-unsubscribe:list-archive:list-post:list-help:sender:from
	:to:cc:subject:date:message-id; s=default; bh=3dAwsUCVxEUwL5h2eW
	3+21bU8e4=; b=L51tY938QKoEK1Tza2W9wR/8zCKJINslxOXdAkfAnVJlkwFT/T
	TKcg4ZhWmJubxioMKowQXeU5apO9IY6mgl/qFfimaGpyNUA8K/046A/T9BEukxQu
	KP+l73Ylblg3VeEjJLWReMS6k5KvFKJao4iI1vXgdu/svJaYO5f61NC9g=
Received: (qmail 2440 invoked by alias); 14 May 2019 03:10:09 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Unsubscribe: 
 <mailto:gcc-patches-unsubscribe-incoming=patchwork.ozlabs.org@gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Delivered-To: mailing list gcc-patches@gcc.gnu.org
Received: (qmail 2430 invoked by uid 89); 14 May 2019 03:10:09 -0000
Authentication-Results: sourceware.org; auth=none
X-Spam-SWARE-Status: No, score=-26.9 required=5.0 tests=AWL, BAYES_00,
	GIT_PATCH_0, GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3, KAM_SHORT,
	RCVD_IN_DNSWL_LOW,
	SPF_PASS autolearn=ham version=3.3.1 spammy=
X-HELO: mx0a-001b2d01.pphosted.com
Received: from mx0a-001b2d01.pphosted.com (HELO mx0a-001b2d01.pphosted.com)
	(148.163.156.1) by sourceware.org
	(qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP;
	Tue, 14 May 2019 03:10:06 +0000
Received: from pps.filterd (m0098393.ppops.net [127.0.0.1])	by
	mx0a-001b2d01.pphosted.com (8.16.0.27/8.16.0.27) with SMTP id
	x4E329xK031765	for <gcc-patches@gcc.gnu.org>;
	Mon, 13 May 2019 23:10:05 -0400
Received: from e06smtp05.uk.ibm.com (e06smtp05.uk.ibm.com
	[195.75.94.101])	by mx0a-001b2d01.pphosted.com with ESMTP id
	2sfj48ph1t-1	(version=TLSv1.2 cipher=AES256-GCM-SHA384
	bits=256 verify=NOT)	for <gcc-patches@gcc.gnu.org>;
	Mon, 13 May 2019 23:10:05 -0400
Received: from localhost	by e06smtp05.uk.ibm.com with IBM ESMTP SMTP
	Gateway: Authorized Use Only! Violators will be
	prosecuted	for <gcc-patches@gcc.gnu.org> from
	<linkw@linux.ibm.com>; Tue, 14 May 2019 04:10:03 +0100
Received: from b06cxnps4074.portsmouth.uk.ibm.com (9.149.109.196)	by
	e06smtp05.uk.ibm.com (192.168.101.135) with IBM ESMTP SMTP
	Gateway: Authorized Use Only! Violators will be prosecuted;
	(version=TLSv1/SSLv3 cipher=AES256-GCM-SHA384 bits=256/256)	Tue, 14
	May 2019 04:10:00 +0100
Received: from d06av25.portsmouth.uk.ibm.com (d06av25.portsmouth.uk.ibm.com
	[9.149.105.61])	by b06cxnps4074.portsmouth.uk.ibm.com
	(8.14.9/8.14.9/NCO v10.0) with ESMTP id
	x4E39ww648562206	(version=TLSv1/SSLv3
	cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK);
	Tue, 14 May 2019 03:09:58 GMT
Received: from d06av25.portsmouth.uk.ibm.com (unknown [127.0.0.1])	by IMSVA
	(Postfix) with ESMTP id 7373E11C064;
	Tue, 14 May 2019 03:09:58 +0000 (GMT)
Received: from d06av25.portsmouth.uk.ibm.com (unknown [127.0.0.1])	by IMSVA
	(Postfix) with ESMTP id 52AAD11C052;
	Tue, 14 May 2019 03:09:57 +0000 (GMT)
Received: from genoa.aus.stglabs.ibm.com (unknown [9.40.192.157])	by
	d06av25.portsmouth.uk.ibm.com (Postfix) with ESMTP;
	Tue, 14 May 2019 03:09:57 +0000 (GMT)
From: linkw@linux.ibm.com
To: gcc-patches@gcc.gnu.org
Cc: segher@kernel.crashing.org, wschmidt@linux.ibm.com,
	bin.cheng@linux.alibaba.com, rguenther@suse.de,
	jakub@redhat.com, Kewen Lin <linkw@linux.ibm.com>
Subject: [PATCH v2 2/3] Add predict_doloop_p target hook
Date: Mon, 13 May 2019 22:09:55 -0500
x-cbid: 19051403-0020-0000-0000-0000033C597E
X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused
x-cbparentid: 19051403-0021-0000-0000-0000218F124A
Message-Id: <1557803395-117707-1-git-send-email-linkw@linux.ibm.com>
X-IsSubscribed: yes

From: Kewen Lin <linkw@linux.ibm.com>

Previous version link for background:
https://gcc.gnu.org/ml/gcc-patches/2019-04/msg00912.html

This hook is to predict whether one loop in gimple will
be transformed to low-overhead loop later in RTL, and
designed to be called in ivopts.

"Since the low-overhead loop optimize transformation is
based on RTL, some of those checks are hard to be imitated
on gimple, so it's not possible to predict the current
loop will be transformed exactly in middle-end.  But if we
can have most loop predicted precisely, it would be helpful.
It highly depends on target hook fine tuning. It's
acceptable to have some loops which can be transformed to
low-overhead loop but we don't catch.  But we should try
our best to avoid to predict some loop as low-overhead loop
but which isn't."

Bootstrapped and regression testing passed on powerpc64le.

Is it ok for trunk?

gcc/ChangeLog

2019-05-13  Kewen Lin  <linkw@gcc.gnu.org>

	PR middle-end/80791
	* target.def (predict_doloop_p): New hook.
	* targhooks.h (default_predict_doloop_p): New declaration.
	* targhooks.c (default_predict_doloop_p): New function.
	* doc/tm.texi.in (TARGET_PREDICT_DOLOOP_P): New hook.
	* doc/tm.texi: Regenerate.
	* config/rs6000/rs6000.c (invalid_insn_for_doloop_p): New function.
	(costly_iter_for_doloop_p): Likewise.
	(rs6000_predict_doloop_p): Likewise.
	(TARGET_PREDICT_DOLOOP_P): New macro.
---
 gcc/config/rs6000/rs6000.c | 174 ++++++++++++++++++++++++++++++++++++++++++++-
 gcc/doc/tm.texi            |   8 +++
 gcc/doc/tm.texi.in         |   2 +
 gcc/target.def             |   9 +++
 gcc/targhooks.c            |  13 ++++
 gcc/targhooks.h            |   1 +
 6 files changed, 206 insertions(+), 1 deletion(-)
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index a21f4f7..1c1c8eb 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -83,6 +83,9 @@
 #include "tree-ssa-propagate.h"
 #include "tree-vrp.h"
 #include "tree-ssanames.h"
+#include "tree-ssa-loop-niter.h"
+#include "tree-cfg.h"
+#include "tree-scalar-evolution.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -1914,6 +1917,9 @@ static const struct attribute_spec rs6000_attribute_table[] =
 #undef TARGET_CAN_USE_DOLOOP_P
 #define TARGET_CAN_USE_DOLOOP_P can_use_doloop_if_innermost
 
+#undef TARGET_PREDICT_DOLOOP_P
+#define TARGET_PREDICT_DOLOOP_P rs6000_predict_doloop_p
+
 #undef TARGET_ATOMIC_ASSIGN_EXPAND_FENV
 #define TARGET_ATOMIC_ASSIGN_EXPAND_FENV rs6000_atomic_assign_expand_fenv
 
@@ -39436,7 +39442,173 @@ rs6000_mangle_decl_assembler_name (tree decl, tree id)
   return id;
 }
 
-
+/* Check whether there are some instructions preventing doloop transformation
+   inside loop body, mainly for instructions which are possible to kill CTR.
+
+   Return true if some invalid insn exits, otherwise return false.  */
+
+static bool
+invalid_insn_for_doloop_p (struct loop *loop)
+{
+  basic_block *body = get_loop_body (loop);
+  gimple_stmt_iterator gsi;
+
+  for (unsigned i = 0; i < loop->num_nodes; i++)
+    for (gsi = gsi_start_bb (body[i]); !gsi_end_p (gsi); gsi_next (&gsi))
+      {
+	gimple *stmt = gsi_stmt (gsi);
+	if (is_gimple_call (stmt) && !gimple_call_internal_p (stmt)
+	    && !is_inexpensive_builtin (gimple_call_fndecl (stmt)))
+	  {
+	    if (dump_file && (dump_flags & TDF_DETAILS))
+	      fprintf (dump_file,
+		       "predict doloop failure due to finding call.\n");
+	    return true;
+	  }
+	if (computed_goto_p (stmt))
+	  {
+	    if (dump_file && (dump_flags & TDF_DETAILS))
+	      fprintf (dump_file, "predict doloop failure due to"
+				  "finding computed jump.\n");
+	    return true;
+	  }
+	/* TODO: Now this hook is expected to be called in ivopts, which is
+	   before switchlower1/switchlower2.  Checking for SWITCH at this point
+       will eliminate some good candidates.  But since there are only a few
+       cases having _a_ switch statement in the innermost loop, it can be a low
+	   priority enhancement.  */
+
+	if (gimple_code (stmt) == GIMPLE_SWITCH)
+	  {
+	    if (dump_file && (dump_flags & TDF_DETAILS))
+	      fprintf (dump_file,
+		       "predict doloop failure due to finding switch.\n");
+	    return true;
+	  }
+      }
+
+  return false;
+}
+
+/* Check whether number of iteration computation is too costly for doloop
+   transformation.  It expands the gimple sequence to equivalent RTL insn
+   sequence, then evaluate the cost.
+
+   Return true if it's costly, otherwise return false.  */
+
+static bool
+costly_iter_for_doloop_p (struct loop *loop, tree niters)
+{
+  tree type = TREE_TYPE (niters);
+  unsigned cost = 0, i;
+  tree obj;
+  bool speed = optimize_loop_for_speed_p (loop);
+  int regno = LAST_VIRTUAL_REGISTER + 1;
+  vec<tree> tvec;
+  tvec.create (20);
+  decl_rtl_data tdata;
+  tdata.regno = &regno;
+  tdata.treevec = &tvec;
+  walk_tree (&niters, prepare_decl_rtl, &tdata, NULL);
+  start_sequence ();
+  expand_expr (niters, NULL_RTX, TYPE_MODE (type), EXPAND_NORMAL);
+  rtx_insn *seq = get_insns ();
+  end_sequence ();
+  FOR_EACH_VEC_ELT (tvec, i, obj)
+    SET_DECL_RTL (obj, NULL_RTX);
+  tvec.release ();
+
+  for (; seq; seq = NEXT_INSN (seq))
+    {
+      if (!INSN_P (seq))
+	continue;
+      rtx body = PATTERN (seq);
+      if (GET_CODE (body) == SET)
+	{
+	  rtx set_val = XEXP (body, 1);
+	  enum rtx_code code = GET_CODE (set_val);
+	  enum rtx_class cls = GET_RTX_CLASS (code);
+	  /* For now, we only consider these two RTX classes, to match what we
+	 get in doloop_optimize, excluding operations like zero/sign extend.  */
+	  if (cls == RTX_BIN_ARITH || cls == RTX_COMM_ARITH)
+	    cost += set_src_cost (set_val, GET_MODE (set_val), speed);
+	}
+    }
+  unsigned max_cost
+    = COSTS_N_INSNS (PARAM_VALUE (PARAM_MAX_ITERATIONS_COMPUTATION_COST));
+  if (cost > max_cost)
+    return true;
+
+  return false;
+}
+
+/*
+   Predict whether the given loop will be transformed in the RTL
+   doloop_optimize pass.  This is for use by the IVOPTS middle-end pass.
+   Attempt to duplicate as many doloop_optimize checks as possible.
+
+   Some RTL specific checks seems unable to be checked here, if any
+   new checks or easy checks _are_ missing here.  */
+
+static bool
+rs6000_predict_doloop_p (struct loop *loop)
+{
+  gcc_assert (loop);
+
+  /* On rs6000, targetm.can_use_doloop_p is actually
+     can_use_doloop_if_innermost.  Just ensure it's innermost.  */
+  if (loop->inner != NULL)
+    {
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	fprintf (dump_file, "predict doloop failure due to"
+			    "no innermost.\n");
+      return false;
+    }
+
+  /* number_of_latch_executions is not so costly, so we don't use
+     number_of_iterations_exit for iteration description.  */
+  tree niters = number_of_latch_executions (loop);
+  if (niters == NULL_TREE || niters == chrec_dont_know)
+    {
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	fprintf (dump_file, "predict doloop failure due to"
+			    "unexpected niters.\n");
+      return false;
+    }
+
+  /* Similar to doloop_optimize, check whether iteration count too small
+     and not profitable.  */
+  HOST_WIDE_INT est_niter = get_estimated_loop_iterations_int (loop);
+  if (est_niter == -1)
+    est_niter = get_likely_max_loop_iterations_int (loop);
+  if (est_niter >= 0 && est_niter < 3)
+    {
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	fprintf (dump_file,
+		 "predict doloop failure due to"
+		 "too few iterations (%d).\n",
+		 (unsigned int) est_niter);
+      return false;
+    }
+
+  /* Similar to doloop_optimize, check whether number of iterations too costly
+     to compute.  */
+  if (costly_iter_for_doloop_p (loop, niters))
+    {
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	fprintf (dump_file, "predict doloop failure due to"
+			    "costly niter computation.\n");
+      return false;
+    }
+
+  /* Similar to doloop_optimize targetm.invalid_within_doloop, check for
+     CALL, tablejump, computed_jump.  */
+  if (invalid_insn_for_doloop_p (loop))
+    return false;
+
+  return true;
+}
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include "gt-rs6000.h"
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 8c8978b..e595b8b 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -11603,6 +11603,14 @@ function version at run-time for a given set of function versions.
 body must be generated.
 @end deftypefn
 
+@deftypefn {Target Hook} bool TARGET_PREDICT_DOLOOP_P (struct loop *@var{loop})
+Return true if we can predict it is possible to use low-overhead loops
+for a particular loop.  The parameter @var{loop} is a pointer to the loop
+which is going to be checked.  This target hook is required only when the
+target supports low-overhead loops, and will help some earlier middle-end
+passes to make some decisions.
+@end deftypefn
+
 @deftypefn {Target Hook} bool TARGET_CAN_USE_DOLOOP_P (const widest_int @var{&iterations}, const widest_int @var{&iterations_max}, unsigned int @var{loop_depth}, bool @var{entered_at_top})
 Return true if it is possible to use low-overhead loops (@code{doloop_end}
 and @code{doloop_begin}) for a particular loop.  @var{iterations} gives the
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index fe1194e..e047734 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -7942,6 +7942,8 @@ to by @var{ce_info}.
 
 @hook TARGET_GENERATE_VERSION_DISPATCHER_BODY
 
+@hook TARGET_PREDICT_DOLOOP_P
+
 @hook TARGET_CAN_USE_DOLOOP_P
 
 @hook TARGET_INVALID_WITHIN_DOLOOP
diff --git a/gcc/target.def b/gcc/target.def
index 66cee07..0a80de3 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -4227,6 +4227,15 @@ DEFHOOK
 rtx, (machine_mode mode, rtx result, rtx val, rtx failval),
  default_speculation_safe_value)
  
+DEFHOOK
+(predict_doloop_p,
+ "Return true if we can predict it is possible to use low-overhead loops\n\
+for a particular loop.  The parameter @var{loop} is a pointer to the loop\n\
+which is going to be checked.  This target hook is required only when the\n\
+target supports low-overhead loops, and will help some earlier middle-end\n\
+passes to make some decisions.",
+ bool, (struct loop *loop),
+ default_predict_doloop_p)
 
 DEFHOOK
 (can_use_doloop_p,
diff --git a/gcc/targhooks.c b/gcc/targhooks.c
index 318f7e9..56ed4cc 100644
--- a/gcc/targhooks.c
+++ b/gcc/targhooks.c
@@ -643,6 +643,19 @@ default_has_ifunc_p (void)
   return HAVE_GNU_INDIRECT_FUNCTION;
 }
 
+/* True if we can predict this loop is possible to be transformed to a
+   low-overhead loop, otherwise returns false.
+
+   By default, false is returned, as this hook's applicability should be
+   verified for each target.  Target maintainers should re-define the hook
+   if the target can take advantage of it.  */
+
+bool
+default_predict_doloop_p (struct loop *loop ATTRIBUTE_UNUSED)
+{
+  return false;
+}
+
 /* NULL if INSN insn is valid within a low-overhead loop, otherwise returns
    an error message.
 
diff --git a/gcc/targhooks.h b/gcc/targhooks.h
index 5943627..70bfb17 100644
--- a/gcc/targhooks.h
+++ b/gcc/targhooks.h
@@ -85,6 +85,7 @@ extern bool default_fixed_point_supported_p (void);
 
 extern bool default_has_ifunc_p (void);
 
+extern bool default_predict_doloop_p (struct loop *);
 extern const char * default_invalid_within_doloop (const rtx_insn *);
 
 extern tree default_builtin_vectorized_function (unsigned int, tree, tree);