From patchwork Tue Aug 18 09:02:58 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Kewen.Lin" <linkw@linux.ibm.com>
X-Patchwork-Id: 1346720
Subject: [PATCH 3/4 v2] ivopts: Consider cost_step on different forms during
 unrolling
To: "Bin.Cheng" <amker.cheng@gmail.com>
References: <a2a9189d-d114-7d07-3b3e-1a4d13613ef1@linux.ibm.com>
 <16f8ae40-c0ae-dd57-346e-f9cacea55038@linux.ibm.com>
 <CAHFci28Gkv_8s2jzDfodb4vu0fWf+fBEam-VzbaMu27WnrYf1Q@mail.gmail.com>
 <e62ab8e7-02a9-7ddf-8b21-eab3bdf0a2ba@linux.ibm.com>
 <CAHFci28TVbvtVSWSmqYonrcuikg1j=fbdtH9cUm2qHQ35a-LMA@mail.gmail.com>
 <54fd03a8-0efa-af95-c929-6beb70f563ea@linux.ibm.com>
 <CAHFci2_T3bS5xJi9NzhAjiEb45dfSDWvQO8dFz0b_nwLtq7j=A@mail.gmail.com>
Message-ID: <7130462d-7ffd-af0b-b62a-fb792a41762f@linux.ibm.com>
Date: Tue, 18 Aug 2020 17:02:58 +0800
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:68.0)
 Gecko/20100101 Thunderbird/68.9.0
MIME-Version: 1.0
In-Reply-To: 
 <CAHFci2_T3bS5xJi9NzhAjiEb45dfSDWvQO8dFz0b_nwLtq7j=A@mail.gmail.com>
Content-Language: en-US
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
X-Patchwork-Original-From: "Kewen.Lin via Gcc-patches"
 <gcc-patches@gcc.gnu.org>
From: "Kewen.Lin" <linkw@linux.ibm.com>
Reply-To: "Kewen.Lin" <linkw@linux.ibm.com>
Cc: "bin.cheng" <bin.cheng@linux.alibaba.com>,
 Segher Boessenkool <segher@kernel.crashing.org>,
 GCC Patches <gcc-patches@gcc.gnu.org>, Bill Schmidt <wschmidt@linux.ibm.com>,
 Richard Guenther <rguenther@suse.de>
Errors-To: gcc-patches-bounces@gcc.gnu.org
Sender: "Gcc-patches" <gcc-patches-bounces@gcc.gnu.org>

Hi Bin,

> I see, it's similar to the auto-increment case where cost should be
> recorded only once.  So this is okay given 1) fine predicting
> rtl-unroll is likely impossible here; 2) the patch has very limited
> impact.
> 
Really appreciate your time and patience!

I extended the previous version to address Richard S.'s comments on
candidates with the same base/step but different offsets here:
https://gcc.gnu.org/pipermail/gcc-patches/2020-June/547014.html.

The previous version only allowed the candidate derived from the group
of interest.  This updated patch extends that to candidates which have
the same base and step as the group, with the same or a different offset,
as long as the offset stays in the acceptable range once unrolling is
considered.
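
To make the "acceptable range" condition concrete, here is a minimal
standalone sketch of the check.  It is not the patch code: offset_fits is a
hypothetical stand-in for addr_offset_valid_p, and the signed 16-bit
displacement range is only an assumption for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for addr_offset_valid_p: assume the target's
// reg+offset form accepts signed 16-bit displacements.
static bool
offset_fits (int64_t off)
{
  return off >= -32768 && off <= 32767;
}

// A candidate qualifies when the offset relative to the candidate is valid
// both for the first unrolled copy and for the last one, which is
// (unroll - 1) steps further along.
static bool
cand_ok_for_unroll (int64_t use_offset, int64_t cand_offset,
		    int64_t step, unsigned unroll)
{
  assert (unroll >= 1);
  int64_t off = use_offset - cand_offset;
  int64_t max_increase = (int64_t) (unroll - 1) * step;
  return offset_fits (off) && offset_fits (off + max_increase);
}
```

Checking only the two ends is enough in this model because the offsets of
the intermediate copies lie between them.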

For a particular case such as:

            for (i = 0; i < SIZE; i++)
              y[i] = a * x[i] + z[i];

we now mark reg_offset_p for the following IV candidates on x:
   - (unsigned long) (x_18(D) + 8)    // the only one marked before.
   - x_18(D) + 8
   - (unsigned long) (x_18(D) + 24)
   - (unsigned long) ((vector(2) double *) (x_18(D) + 8) + 18446744073709551600)
   ...
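
As a rough source-level illustration of why such candidates are cheaper
after unrolling, the sketch below unrolls the loop above by hand.  The
factor of 2 and the function names are assumptions chosen for the example;
it does not show what GCC actually emits.  Each pointer is stepped once per
unrolled body, and the second copy reaches its element through a constant
offset from the same base.

```cpp
#include <cstddef>

// The loop from the example, in its original rolled form.
void
axpz (double *y, const double *x, const double *z, double a, std::size_t n)
{
  for (std::size_t i = 0; i < n; i++)
    y[i] = a * x[i] + z[i];
}

// Hand-unrolled by 2 (hypothetical factor).  The accesses at index 1 are
// base + 8 bytes, so a reg+offset address covers them without a second
// step update of the pointer.
void
axpz_unroll2 (double *y, const double *x, const double *z, double a,
	      std::size_t n)
{
  std::size_t rem = n;
  while (rem >= 2)
    {
      y[0] = a * x[0] + z[0];
      y[1] = a * x[1] + z[1];
      // One step update per pointer for two iterations' worth of work.
      x += 2;
      y += 2;
      z += 2;
      rem -= 2;
    }
  // Leftover iteration when n is odd.
  if (rem != 0)
    y[0] = a * x[0] + z[0];
}
```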

Would you mind reviewing it again?  Thanks in advance!

Bootstrapped/regtested on powerpc64le-linux-gnu P8 and P9.

A SPEC2017 performance run on P9 shows no notable degradations or
improvements.

BR,
Kewen
-----
gcc/ChangeLog:

	* tree-ssa-loop-ivopts.c (struct iv_cand): New field reg_offset_p.
	(struct ivopts_data): New field consider_reg_offset_for_unroll_p.
	(mark_reg_offset_candidates): New function.
	(add_candidate_1): Set reg_offset_p to false for new candidate.
	(set_group_iv_cost): Scale up group cost by estimated_unroll if
	consider_reg_offset_for_unroll_p.
	(determine_iv_cost): Increase step cost by estimated_unroll if
	consider_reg_offset_for_unroll_p and the candidate is not reg_offset_p.
	(tree_ssa_iv_optimize_loop): Call estimate_unroll_factor, update
	consider_reg_offset_for_unroll_p.
diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index 1d2697ae1ba..5a19b53c8d5 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -473,6 +473,9 @@ struct iv_cand
   struct iv *orig_iv;	/* The original iv if this cand is added from biv with
 			   smaller type.  */
   bool doloop_p;	/* Whether this is a doloop candidate.  */
+  bool reg_offset_p;	/* Whether this cand suits an address type group
+			   all of whose uses can adopt the reg_offset
+			   addressing mode, even after unrolling.  */
 };
 
 /* Hashtable entry for common candidate derived from iv uses.  */
@@ -653,6 +656,10 @@ struct ivopts_data
 
   /* Whether the loop has doloop comparison use.  */
   bool doloop_use_p;
+
+  /* Whether we need to consider the register offset addressing mode for the
+     loop, given its upcoming unrolling by the estimated unroll factor.  */
+  bool consider_reg_offset_for_unroll_p;
 };
 
 /* An assignment of iv candidates to uses.  */
@@ -2731,6 +2738,112 @@ split_address_groups (struct ivopts_data *data)
     }
 }
 
+/* For each address type group, find the address-based IV candidates that
+   have the same base and step as the group.  Mark as reg_offset_p those
+   that the whole group can use with the reg_offset addressing mode, taking
+   into account both the address offset differences within the group and the
+   extra offset added by the estimated unroll factor.  */
+
+static void
+mark_reg_offset_candidates (struct ivopts_data *data)
+{
+  class loop *loop = data->current_loop;
+  gcc_assert (data->current_loop->estimated_unroll > 1);
+  bool any_reg_offset_p = false;
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    fprintf (dump_file, "<Reg_offset_p Candidates>:\n");
+
+  auto valid_reg_offset_p
+    = [] (struct iv_use *use, poly_uint64 off, poly_uint64 max_inc) {
+	if (!addr_offset_valid_p (use, off))
+	  return false;
+	if (!addr_offset_valid_p (use, off + max_inc))
+	  return false;
+	return true;
+      };
+
+  for (unsigned i = 0; i < data->vgroups.length (); i++)
+    {
+      struct iv_group *group = data->vgroups[i];
+
+      if (address_p (group->type))
+	{
+	  struct iv_use *head_use = group->vuses[0];
+	  if (!tree_fits_poly_int64_p (head_use->iv->step))
+	    continue;
+
+	  poly_int64 step = tree_to_poly_int64 (head_use->iv->step);
+	  /* Max extra offset to be added due to unrolling.  */
+	  poly_int64 max_increase = (loop->estimated_unroll - 1) * step;
+
+	  tree use_base = head_use->addr_base;
+	  STRIP_NOPS (use_base);
+
+	  struct iv_use *last_use = NULL;
+	  unsigned group_size = group->vuses.length ();
+	  gcc_assert (group_size >= 1);
+	  if (maybe_ne (head_use->addr_offset,
+			group->vuses[group_size - 1]->addr_offset))
+	    last_use = group->vuses[group_size - 1];
+
+	  unsigned j;
+	  bitmap_iterator bi;
+	  EXECUTE_IF_SET_IN_BITMAP (group->related_cands, 0, j, bi)
+	  {
+	    struct iv_cand *cand = data->vcands[j];
+
+	    if (!cand->iv->base_object)
+	      continue;
+
+	    if (cand->reg_offset_p)
+	      continue;
+
+	    if (!operand_equal_p (head_use->iv->base_object,
+				  cand->iv->base_object, 0))
+	      continue;
+
+	    if (!operand_equal_p (head_use->iv->step, cand->iv->step, 0))
+	      continue;
+
+	    poly_uint64 cand_offset = 0;
+	    tree cand_base = strip_offset (cand->iv->base, &cand_offset);
+	    STRIP_NOPS (cand_base);
+	    if (!operand_equal_p (use_base, cand_base, 0))
+	      continue;
+
+	    /* Only need to check the first one and the last one in the group
+	       since it's sorted.  If both are valid, the other intermediate
+	       ones should be in the acceptable range.  */
+	    poly_uint64 head_off = head_use->addr_offset - cand_offset;
+	    if (!valid_reg_offset_p (head_use, head_off, max_increase))
+	      continue;
+
+	    if (last_use)
+	      {
+		poly_uint64 last_off = last_use->addr_offset - cand_offset;
+		if (!valid_reg_offset_p (last_use, last_off, max_increase))
+		  continue;
+	      }
+
+	    cand->reg_offset_p = true;
+
+	    if (dump_file && (dump_flags & TDF_DETAILS))
+	      fprintf (dump_file, "  cand %u valid for group %u\n", j, i);
+
+	    if (!any_reg_offset_p)
+	      any_reg_offset_p = true;
+	  }
+	}
+    }
+
+  if (!any_reg_offset_p)
+    data->consider_reg_offset_for_unroll_p = false;
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    fprintf (dump_file, "\n");
+}
+
 /* Finds uses of the induction variables that are interesting.  */
 
 static void
@@ -3147,6 +3260,7 @@ add_candidate_1 (struct ivopts_data *data, tree base, tree step, bool important,
       cand->important = important;
       cand->incremented_at = incremented_at;
       cand->doloop_p = doloop;
+      cand->reg_offset_p = false;
       data->vcands.safe_push (cand);
 
       if (!poly_int_tree_p (step))
@@ -3654,6 +3768,14 @@ set_group_iv_cost (struct ivopts_data *data,
       return;
     }
 
+  /* Since non reg_offset IV candidates were charged extra step cost, scale
+     up the appropriate IV group costs to match.  For simplicity, treat every
+     USE_COMPARE group as the loop exit test and leave it unscaled.  FIXME if
+     multiple exits are supported or non loop exit comparisons matter.  */
+  if (data->consider_reg_offset_for_unroll_p
+      && group->vuses[0]->type != USE_COMPARE)
+    cost *= (HOST_WIDE_INT) data->current_loop->estimated_unroll;
+
   if (data->consider_all_candidates)
     {
       group->cost_map[cand->id].cand = cand;
@@ -5718,6 +5840,9 @@ find_iv_candidates (struct ivopts_data *data)
   if (!data->consider_all_candidates)
     relate_compare_use_with_all_cands (data);
 
+  if (data->consider_reg_offset_for_unroll_p)
+    mark_reg_offset_candidates (data);
+
   if (dump_file && (dump_flags & TDF_DETAILS))
     {
       unsigned i;
@@ -5890,6 +6015,10 @@ determine_iv_cost (struct ivopts_data *data, struct iv_cand *cand)
     cost_step = add_cost (data->speed, TYPE_MODE (TREE_TYPE (base)));
   cost = cost_step + adjust_setup_cost (data, cost_base.cost);
 
+  /* Consider additional step updates during unrolling.  */
+  if (data->consider_reg_offset_for_unroll_p && !cand->reg_offset_p)
+    cost += (data->current_loop->estimated_unroll - 1) * cost_step;
+
   /* Prefer the original ivs unless we may gain something by replacing it.
      The reason is to make debugging simpler; so this is not relevant for
      artificial ivs created by other optimization passes.  */
@@ -7976,6 +8105,7 @@ tree_ssa_iv_optimize_loop (struct ivopts_data *data, class loop *loop,
   data->current_loop = loop;
   data->loop_loc = find_loop_location (loop).get_location_t ();
   data->speed = optimize_loop_for_speed_p (loop);
+  data->consider_reg_offset_for_unroll_p = false;
 
   if (dump_file && (dump_flags & TDF_DETAILS))
     {
@@ -8008,6 +8138,16 @@ tree_ssa_iv_optimize_loop (struct ivopts_data *data, class loop *loop,
   if (!find_induction_variables (data))
     goto finish;
 
+  if (param_iv_consider_reg_offset_for_unroll != 0 && exit)
+    {
+      tree_niter_desc *desc = niter_for_exit (data, exit);
+      estimate_unroll_factor (loop, desc);
+      data->consider_reg_offset_for_unroll_p = loop->estimated_unroll > 1;
+      if (dump_file && (dump_flags & TDF_DETAILS)
+	  && data->consider_reg_offset_for_unroll_p)
+	fprintf (dump_file, "\nEstimated_unroll:%u\n", loop->estimated_unroll);
+    }
+
   /* Finds interesting uses (item 1).  */
   find_interesting_uses (data);
   if (data->vgroups.length () > MAX_CONSIDERED_GROUPS)