From patchwork Fri Mar 11 03:01:02 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jeff Law <law@redhat.com>
X-Patchwork-Id: 596033
Return-Path: 
 <gcc-patches-return-423105-incoming=patchwork.ozlabs.org@gcc.gnu.org>
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Received: from sourceware.org (server1.sourceware.org [209.132.180.131])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256
	bits)) (No client certificate requested)
	by ozlabs.org (Postfix) with ESMTPS id EC52614030E
	for <incoming@patchwork.ozlabs.org>;
	Fri, 11 Mar 2016 14:01:28 +1100 (AEDT)
Authentication-Results: ozlabs.org; dkim=pass (1024-bit key;
	unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org
	header.b=W4+68Esa; dkim-atps=neutral
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id
	:list-unsubscribe:list-archive:list-post:list-help:sender:from
	:subject:to:message-id:date:mime-version:content-type; q=dns; s=
	default; b=R28rvaKSVFPC64FBsDQzPzpEmDUbheyllqBl+/j/W5oDGJWZaB8wx
	IBziH9QEm3CH2pBTG/NxElYzi48Waqsf/alfSHR9GlrHoy6iW9/6IUG08fIX439s
	86q5DcrrMyxQIJKu5GRmbRZ+WT5gQOaNLdwxBmU9lviCSDe7doZuMQ=
DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id
	:list-unsubscribe:list-archive:list-post:list-help:sender:from
	:subject:to:message-id:date:mime-version:content-type; s=
	default; bh=LV4a13c4Fv8J5m/tCSoqWY3nHQs=; b=W4+68EsaKv2r1QXcRjW1
	HmmSGutiQYlpr81htcE2A+296ZI48VBSwwAfbPes5xs4gzloDElJMjHizWussUt6
	liLSdscUerx3iVFRda1PwAsW3DTUkFVsUx6thtohBJWAXtMzExFvTj/MA/QckCmE
	i3+HriR/ZGcT9kXLnWsaLvI=
Received: (qmail 116693 invoked by alias); 11 Mar 2016 03:01:17 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Unsubscribe: 
 <mailto:gcc-patches-unsubscribe-incoming=patchwork.ozlabs.org@gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Delivered-To: mailing list gcc-patches@gcc.gnu.org
Received: (qmail 116622 invoked by uid 89); 11 Mar 2016 03:01:17 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-1.9 required=5.0 tests=BAYES_00,
	RP_MATCHES_RCVD,
	SPF_HELO_PASS autolearn=ham version=3.3.2 spammy=*ptr, ppi,
	eliminates, positively
X-HELO: mx1.redhat.com
Received: from mx1.redhat.com (HELO mx1.redhat.com) (209.132.183.28) by
	sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with
	(AES256-GCM-SHA384 encrypted) ESMTPS;
	Fri, 11 Mar 2016 03:01:04 +0000
Received: from int-mx14.intmail.prod.int.phx2.redhat.com
	(int-mx14.intmail.prod.int.phx2.redhat.com [10.5.11.27])	by
	mx1.redhat.com (Postfix) with ESMTPS id 7D32A75742	for
	<gcc-patches@gcc.gnu.org>; Fri, 11 Mar 2016 03:01:03 +0000 (UTC)
Received: from slagheap.utah.redhat.com (ovpn-113-86.phx2.redhat.com
	[10.3.113.86])	by int-mx14.intmail.prod.int.phx2.redhat.com
	(8.14.4/8.14.4) with ESMTP id u2B313jK004835	for
	<gcc-patches@gcc.gnu.org>; Thu, 10 Mar 2016 22:01:03 -0500
From: Jeff Law <law@redhat.com>
Subject: [RFA][PATCH][PR tree-optimization/64058] Improve and stabilize
	sorting of coalesce pairs
To: gcc-patches <gcc-patches@gcc.gnu.org>
Message-ID: <56E234EE.9080804@redhat.com>
Date: Thu, 10 Mar 2016 20:01:02 -0700
User-Agent: Mozilla/5.0 (X11; Linux x86_64;
	rv:38.0) Gecko/20100101 Thunderbird/38.6.0
MIME-Version: 1.0
X-IsSubscribed: yes

As discussed in the BZ, we have multiple problems with how we sort the 
coalesce list during out-of-ssa coalescing.

First, the sort is not stable.  If the cost of two coalesce pairs is the 
same, we break the tie by looking at the underlying SSA_NAME_VERSION of 
the first, then the second elements in the coalesce pairs.

As a result, changes in SSA_NAME_VERSIONs in the IL can result in 
different coalescing during out-of-ssa.  That in turn can cause changes 
in what objects are coalesced, which in turn causes random performance 
changes.

This patch addresses that problem by recording an index for each 
coalescing pair discovered and using that index as the final tiebreaker 
rather than looking at SSA_NAME_VERSIONs.  That brings stability to the 
coalescing process and avoids a lot of unnecessary differences in the 
code we generate when SSA_NAME_VERSIONs change.

The second problem is our costing heuristic only looks at edge 
frequencies & flags.  It's actually a pretty good heuristic and captures 
the main goal of coalescing -- reducing the most commonly executed 
copies.  However, in the case where the edge frequencies/flags result in 
the same cost we can do better.

When we coalesce two SSA_NAMEs, we have to build the union of the 
conflicts of each of the SSA_NAMEs -- which means the resulting union 
object is less likely to be able to participate in further coalescing.

So given two coalescing pairs with the same primary cost, preferring the 
coalescing pair with the smaller resulting conflict set gives us a 
better chance that the resulting object will be able to participate in 
further coalescing.

That heuristic broadly mirrors one aspect of how iterated conservative 
coalescing works.  The other interesting heuristic (that I did not 
implement) was to favor coalescing of the pair which had a higher degree 
of common conflicts between the two nodes -- which broadly falls into 
the same category as what we're doing with this patch.  The key being 
that the conflict sets are an important thing to consider when coalescing.

Using the conflict sizes as a tie-breaker eliminates the regression in 
64058 and AFAICT also eliminates the regression in 68654 (the latter 
doesn't include a testcase or as in-depth analysis as 64058, but my 
testing indicates this patch should generate the desired code for both 
cases).

The patch has (of course) bootstrapped and regression tested on 
x86_64-linux-gnu.

I'd be curious for thoughts on how to build a testcase for this.  I 
could emit the conflict sizes along with the coalescing cost in the 
dumps, but that won't positively verify that we've done the preferred 
set of coalescings.

I might be able to look at the .expand dumps and perhaps look for copies 
on edges.  However, unless the only copies are the ones that were 
causing the regression, I suspect such a test would end up being rather 
fragile.

Other thoughts on how to get this under regression testing?  And of 
course, thoughts on the patch itself?

Thanks,
Jeff

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index cc91e84..f28baa2 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,18 @@
+2016-03-10  Jeff Law  <law@redhat.com>
+
+	PR tree-optimization/64058
+	* tree-ssa-coalesce.c (struct coalesce_pair): Add new fields
+	CONFLICT_COUNT and INDEX.
+	(num_coalesce_pairs): Move up earlier in the file.
+	(find_coalesce_pair): Initialize the INDEX field for each pair
+	discovered.
+	(add_conflict_counts): New function to initialize the CONFLICT_COUNT
+	field for each conflict pair.
+	(coalesce_ssa_name): Call it.
+	(compare_pairs): No longer sort on the elements of each pair.
+	Instead break ties with the conflict size and finally the index
+	of the coalesce pair.
+
 2016-03-10  Ulrich Weigand  <Ulrich.Weigand@de.ibm.com>
 
 	PR target/70168
diff --git a/gcc/tree-ssa-coalesce.c b/gcc/tree-ssa-coalesce.c
index 6624e7e..b8a2e0d 100644
--- a/gcc/tree-ssa-coalesce.c
+++ b/gcc/tree-ssa-coalesce.c
@@ -50,6 +50,19 @@ struct coalesce_pair
   int first_element;
   int second_element;
   int cost;
+
+  /* A count of the number of unique partitions this pair would conflict
+     with if coalescing was successful.  This is the secondary sort key,
+     given two pairs with equal costs, we will prefer the pair with a smaller
+     conflict set.
+
+     Note this is not updated and propagated as pairs are coalesced.  */
+  int conflict_count;
+
+  /* The order in which coalescing pairs are discovered is recorded in this
+     field, which is used as the final tie breaker when sorting coalesce
+     pairs.  */
+  int index;
 };
 
 /* Coalesce pair hashtable helpers.  */
@@ -254,6 +267,13 @@ delete_coalesce_list (coalesce_list *cl)
   free (cl);
 }
 
+/* Return the number of unique coalesce pairs in CL.  */
+
+static inline int
+num_coalesce_pairs (coalesce_list *cl)
+{
+  return cl->list->elements ();
+}
 
 /* Find a matching coalesce pair object in CL for the pair P1 and P2.  If
    one isn't found, return NULL if CREATE is false, otherwise create a new
@@ -290,6 +310,7 @@ find_coalesce_pair (coalesce_list *cl, int p1, int p2, bool create)
       pair->first_element = p.first_element;
       pair->second_element = p.second_element;
       pair->cost = 0;
+      pair->index = num_coalesce_pairs (cl);
       *slot = pair;
     }
 
@@ -343,29 +364,21 @@ compare_pairs (const void *p1, const void *p2)
   int result;
 
   result = (* pp1)->cost - (* pp2)->cost;
-  /* Since qsort does not guarantee stability we use the elements
-     as a secondary key.  This provides us with independence from
-     the host's implementation of the sorting algorithm.  */
+  /* We use the size of the resulting conflict set as the secondary sort key.
+     Given two equal costing coalesce pairs, we want to prefer the pair that
+     has the smaller conflict set.  */
   if (result == 0)
     {
-      result = (* pp2)->first_element - (* pp1)->first_element;
+      result = (*pp2)->conflict_count - (*pp1)->conflict_count;
+      /* And if everything else is equal, then sort based on which
+	 coalesce pair was found first.  */
       if (result == 0)
-	result = (* pp2)->second_element - (* pp1)->second_element;
+	result = (*pp2)->index - (*pp1)->index;
     }
 
   return result;
 }
 
-
-/* Return the number of unique coalesce pairs in CL.  */
-
-static inline int
-num_coalesce_pairs (coalesce_list *cl)
-{
-  return cl->list->elements ();
-}
-
-
 /* Iterate over CL using ITER, returning values in PAIR.  */
 
 #define FOR_EACH_PARTITION_PAIR(PAIR, ITER, CL)		\
@@ -578,6 +591,42 @@ ssa_conflicts_merge (ssa_conflicts *ptr, unsigned x, unsigned y)
     }
 }
 
+/* Compute and record how many unique conflicts would exist for the
+   representative partition for each coalesce pair in CL.
+
+   CONFLICTS is the conflict graph and MAP is the current partition view.  */
+
+static void
+add_conflict_counts (ssa_conflicts *conflicts, var_map map, coalesce_list *cl)
+{
+  coalesce_pair *p;
+  coalesce_iterator_type ppi;
+  bitmap tmp = BITMAP_ALLOC (NULL);
+
+  FOR_EACH_PARTITION_PAIR (p, ppi, cl)
+    {
+      int p1 = var_to_partition (map, ssa_name (p->first_element));
+      int p2 = var_to_partition (map, ssa_name (p->second_element));
+
+      /* 4 cases.  If both P1 and P2 have conflicts, then build their
+	 union and count the members.  Else handle the degenerate cases
+	 in the obvious ways.  */
+      if (conflicts->conflicts[p1] && conflicts->conflicts[p2])
+	{
+	  bitmap_ior (tmp, conflicts->conflicts[p1], conflicts->conflicts[p2]);
+	  p->conflict_count = bitmap_count_bits (tmp);
+	  bitmap_clear (tmp);
+	}
+      else if (conflicts->conflicts[p1])
+	p->conflict_count = bitmap_count_bits (conflicts->conflicts[p1]);
+      else if (conflicts->conflicts[p2])
+	p->conflict_count = bitmap_count_bits (conflicts->conflicts[p2]);
+      else
+	p->conflict_count = 0;
+    }
+
+  BITMAP_FREE (tmp);
+}
 
 /* Dump a conflicts graph.  */
 
@@ -1802,6 +1851,7 @@ coalesce_ssa_name (void)
   if (dump_file && (dump_flags & TDF_DETAILS))
     ssa_conflicts_dump (dump_file, graph);
 
+  add_conflict_counts (graph, map, cl);
   sort_coalesce_list (cl);
 
   if (dump_file && (dump_flags & TDF_DETAILS))