From patchwork Fri Dec 15 16:19:14 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jeff Law <law@redhat.com>
X-Patchwork-Id: 849242
Return-Path: 
 <gcc-patches-return-469362-incoming=patchwork.ozlabs.org@gcc.gnu.org>
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Authentication-Results: ozlabs.org;
	spf=pass (mailfrom) smtp.mailfrom=gcc.gnu.org
	(client-ip=209.132.180.131; helo=sourceware.org;
	envelope-from=gcc-patches-return-469362-incoming=patchwork.ozlabs.org@gcc.gnu.org;
	receiver=<UNKNOWN>)
Authentication-Results: ozlabs.org; dkim=pass (1024-bit key;
	unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org
	header.b="oz5B8haM"; dkim-atps=neutral
Received: from sourceware.org (server1.sourceware.org [209.132.180.131])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256
	bits)) (No client certificate requested)
	by ozlabs.org (Postfix) with ESMTPS id 3yywcJ0T8fz9sNr
	for <incoming@patchwork.ozlabs.org>;
	Sat, 16 Dec 2017 03:19:27 +1100 (AEDT)
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id
	:list-unsubscribe:list-archive:list-post:list-help:sender:from
	:to:subject:message-id:date:mime-version:content-type; q=dns; s=
	default; b=VXMgF9suFV2sJh3GbDNNpFAZbHxXc3QP5k/Mw3iBsUM5/fEsJtm3s
	fwozgyziJCw2dMxi2fmXaF3HXCeFvAIWp8zjyFx2elYqdcoR8J0Wbb7MLBZv6R5e
	S0EmevQPru4Ruteed8gJMaP3L232lGzi5H5cdg+PCnJDcSBmzpOppo=
DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id
	:list-unsubscribe:list-archive:list-post:list-help:sender:from
	:to:subject:message-id:date:mime-version:content-type; s=
	default; bh=OeA90O6ztmxfjKjXy65l4Maiv6Y=; b=oz5B8haMtUfe8CBKy1b9
	ufftOLxuhj5IEnM8gGpaaHdNfLkQAHcIi4MR0vZKBf11Ahrps+3sAieIbmiPvodL
	2wROps9lj0b8fhX+1+DYMSWlP1h4WuF8cZk5fR0O+YIpTGKLhsCDnsfFwMjQc4cU
	Ul9JJU34hPvSn2UJxaeblnc=
Received: (qmail 33237 invoked by alias); 15 Dec 2017 16:19:19 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Unsubscribe: 
 <mailto:gcc-patches-unsubscribe-incoming=patchwork.ozlabs.org@gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Delivered-To: mailing list gcc-patches@gcc.gnu.org
Received: (qmail 33225 invoked by uid 89); 15 Dec 2017 16:19:19 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-26.9 required=5.0 tests=BAYES_00, GIT_PATCH_0,
	GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3, SPF_HELO_PASS,
	T_RP_MATCHES_RCVD autolearn=ham version=3.3.2 spammy=plays,
	wise
X-HELO: mx1.redhat.com
Received: from mx1.redhat.com (HELO mx1.redhat.com) (209.132.183.28) by
	sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP;
	Fri, 15 Dec 2017 16:19:17 +0000
Received: from smtp.corp.redhat.com
	(int-mx04.intmail.prod.int.phx2.redhat.com
	[10.5.11.14])	(using TLSv1.2 with cipher AECDH-AES256-SHA
	(256/256 bits))	(No client certificate requested)	by
	mx1.redhat.com (Postfix) with ESMTPS id 9A155C014179	for
	<gcc-patches@gcc.gnu.org>; Fri, 15 Dec 2017 16:19:16 +0000 (UTC)
Received: from localhost.localdomain (ovpn-112-2.rdu2.redhat.com
	[10.10.112.2])	by smtp.corp.redhat.com (Postfix) with ESMTP
	id CEE776CDAB	for <gcc-patches@gcc.gnu.org>;
	Fri, 15 Dec 2017 16:19:15 +0000 (UTC)
From: Jeff Law <law@redhat.com>
To: gcc-patches <gcc-patches@gcc.gnu.org>
Subject: [committed][PR tree-optimization/83410] Avoid some jump threads
	when parallelizing loops
Message-ID: <c56e84d9-5e0a-5430-4d2f-0584c51e3012@redhat.com>
Date: Fri, 15 Dec 2017 09:19:14 -0700
User-Agent: Mozilla/5.0 (X11; Linux x86_64;
	rv:52.0) Gecko/20100101 Thunderbird/52.4.0
MIME-Version: 1.0
X-IsSubscribed: yes

I hate this patch.

The fundamental problem we have is that there are times when we very
much want to thread jumps and yet there are other times when we do not.

To date we have been able to largely select between those two by looking
at the shape of the CFG and the jump thread to see how threading a
particular jump would impact the shape of the CFG (particularly loops in
the CFG).

In this BZ we have a loop like this:

        2
        |
        3<-------+
        |        |
        4<---+   |
       / \   |   |
      5   6  |   |
       \  /  |   |
         7   |   |
        / \  |   |
       8  11-+   |
      / \        |
     9  10-------+
     |
     E

We want to thread the path (6, 7) (7, 11).  ie, there's a path through
that inner loop where we don't need to do the loop test.  Here's an
example (from libgomp testsuite)

(N is 1500)

     for (j = 0; j < N; j++)
        {
          if (j > 500)
            {
              x[i][j] = i + j + 3;
              y[j] = i*j + 10;
            }
          else
            x[i][j] = x[i][j]*3;
        }



Threading (in effect) puts the loop test into both arms of the
conditional, then removes it from the ELSE arm because the condition is
always "keep looping" in that arm.

This plays poorly with loop parallelization -- I tried to follow what's
going on in there and just simply got lost.  I got as far as seeing that
the parallelization code thought there were loop carried dependencies.
At least that's how it looked to me.  But I don't see any loop carried
dependencies in the code.

You might naturally ask if threading this is actually wise.  It seems to
broadly fit into the cases where we throttle threading so as not to muck
around too much with the loop structure.  It's not terrible to detect a
CFG of this shape and avoid threading.

BUT....


You can have essentially the same shape CFG (not 100% the same, but the
same key characteristics), but where jump threading simplifies things in
ways that are good for the SLP vectorizer (vect/bb-slp-16.c) or where
jump threading avoids spurious warnings (graphite/scop-4.c)

Initially I thought I'd seen a key difference in the contents of the
latch block, but I think I was just up too late and mis-read the dump.

So I don't see anything in the CFG shape or the contents of the blocks
that can be reasonably analyzed at jump threading time.  Given we have
two opposite needs and no reasonable way I can find to select between
them, I'm resorting to a disgusting hack.  Namely to avoid threading
through the latch to another point in the loop *when loop
parallelization is enabled*.

Let me be clear.  This is a hack.  I don't like it, not one little bit.
But I don't see a way to resolve the regression without introducing
other regressions and the loop parallelization code is a total mystery
to me.

Bootstrapped on x86_64 and regression tested with and without graphite.
Confirmed it fixes the graphite related regressions mentioned in the BZ
on x86_64.

Committing to the trunk and hanging my head in shame.

Jeff
commit aa9f2e239944cc5baafdae431c821b900e7f37a9
Author: Jeff Law <law@torsion.usersys.redhat.com>
Date:   Fri Dec 15 11:09:50 2017 -0500

            PR tree-optimization/83410
            * tree-ssa-threadupdate.c (thread_block_1): Avoid certain jump
            threads when parallelizing loops.
diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 8830638d226..2d53e24b4c1 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,9 @@
+2017-12-12  Jeff Law  <law@redhat.com>
+
+	PR tree-optimization/83410
+	* tree-ssa-threadupdate.c (thread_block_1): Avoid certain jump
+	threads when parallelizing loops.
+
 2017-12-15  Jakub Jelinek  <jakub@redhat.com>
 
 	* tree-core.h (struct attribute_spec): Swap affects_type_identity and
diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c
index 045905eceb7..63ad8f9c953 100644
--- a/gcc/tree-ssa-threadupdate.c
+++ b/gcc/tree-ssa-threadupdate.c
@@ -1333,6 +1333,31 @@ thread_block_1 (basic_block bb, bool noloop_only, bool joiners)
 
 	  if (i != path->length ())
 	    continue;
+
+	  /* Loop parallelization can be confused by the result of
+	     threading through the loop exit test back into the loop.
+	     However, theading those jumps seems to help other codes.
+
+	     I have been unable to find anything related to the shape of
+	     the CFG, the contents of the affected blocks, etc which would
+	     allow a more sensible test than what we're using below which
+	     merely avoids the optimization when parallelizing loops.  */
+	  if (flag_tree_parallelize_loops > 1)
+	    {
+	      for (i = 1; i < path->length (); i++)
+	        if (bb->loop_father == e2->src->loop_father
+		    && loop_exits_from_bb_p (bb->loop_father,
+					     (*path)[i]->e->src)
+		    && !loop_exit_edge_p (bb->loop_father, e2))
+		  break;
+
+	      if (i != path->length ())
+		{
+		  delete_jump_thread_path (path);
+		  e->aux = NULL;
+		  continue;
+		}
+	    }
 	}
 
       /* Insert the outgoing edge into the hash table if it is not