From patchwork Fri Apr  8 07:03:10 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Tom de Vries <Tom_deVries@mentor.com>
X-Patchwork-Id: 607892
Return-Path: 
 <gcc-patches-return-424532-incoming=patchwork.ozlabs.org@gcc.gnu.org>
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Received: from sourceware.org (server1.sourceware.org [209.132.180.131])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256
	bits)) (No client certificate requested)
	by ozlabs.org (Postfix) with ESMTPS id 3qh9RW5K9Lz9t5g
	for <incoming@patchwork.ozlabs.org>;
	Fri,  8 Apr 2016 17:03:49 +1000 (AEST)
Authentication-Results: ozlabs.org; dkim=pass (1024-bit key;
	unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org
	header.b=wlOfwLmy; dkim-atps=neutral
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id
	:list-unsubscribe:list-archive:list-post:list-help:sender:from
	:subject:to:cc:message-id:date:mime-version:content-type; q=dns;
	s=default; b=juvDUY/V6embq4swtDwXKbACF4NovOInBaAtH5qI5oolC5LsWY
	+JkHF4aRKjhB45Rm6D9udu5I9YxFrsu1V0EZp7S0FZpSRPuxWJU5xUya8YQ+xfvn
	ysakE9E3HykXf3t92PdAa4a92dQXmHXzGmve+b2fYUTn0gynFEHWqQTP0=
DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id
	:list-unsubscribe:list-archive:list-post:list-help:sender:from
	:subject:to:cc:message-id:date:mime-version:content-type; s=
	default; bh=c5xDsvfBf4H182iW0rJp0G6kAEo=; b=wlOfwLmymRHRyObY5pnu
	b+gWsNAPIToP+hi3Dc7NXXDFB6Qp//77fRFmRxVWONc8M6tuheUyjtu6MMQkkPO6
	eLFQ+SEoOPKHW3S7YmhlATxDQO/EdP6vtVNV5fn9XBxb0Sw8Laoq5RBM7w2Il3Bc
	yMQwMC/e59U7WcSbiOWMxnY=
Received: (qmail 71474 invoked by alias); 8 Apr 2016 07:03:40 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Unsubscribe: 
 <mailto:gcc-patches-unsubscribe-incoming=patchwork.ozlabs.org@gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Delivered-To: mailing list gcc-patches@gcc.gnu.org
Received: (qmail 71450 invoked by uid 89); 8 Apr 2016 07:03:39 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-2.0 required=5.0 tests=AWL, BAYES_00,
	RCVD_IN_DNSWL_NONE, SPF_PASS,
	URIBL_RED autolearn=ham version=3.3.2 spammy=1, 30, sk:graphit,
	inversion, nest
X-HELO: relay1.mentorg.com
Received: from relay1.mentorg.com (HELO relay1.mentorg.com) (192.94.38.131)
	by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with
	(AES256-GCM-SHA384 encrypted) ESMTPS;
	Fri, 08 Apr 2016 07:03:29 +0000
Received: from nat-ies.mentorg.com ([192.94.31.2]
	helo=SVR-IES-FEM-01.mgc.mentorg.com)	by relay1.mentorg.com
	with esmtp id 1aoQRv-00030Q-P5 from Tom_deVries@mentor.com ;
	Fri, 08 Apr 2016 00:03:24 -0700
Received: from [127.0.0.1] (137.202.0.76) by SVR-IES-FEM-01.mgc.mentorg.com
	(137.202.0.104) with Microsoft SMTP Server id 14.3.224.2;
	Fri, 8 Apr 2016 08:03:22 +0100
From: Tom de Vries <Tom_deVries@mentor.com>
Subject: [PATCH, PR68953] Fix pdr accesses order
To: Sebastian Pop <sebpop@gmail.com>, <tobias@grosser.es>
CC: Richard Biener <rguenther@suse.de>, GCC Patches <gcc-patches@gcc.gnu.org>
Message-ID: <570757AE.7070306@mentor.com>
Date: Fri, 8 Apr 2016 09:03:10 +0200
User-Agent: Mozilla/5.0 (X11; Linux x86_64;
	rv:38.0) Gecko/20100101 Thunderbird/38.6.0
MIME-Version: 1.0

Hi,

this patch fixes wrong-code PR68953, a graphite 6 regression.


I.

Consider test.c:
...
int yu[4][1] = { { 1 }, { 2 }, { 3 }, { 4 } };

int
main (void)
{
   int zh, ro;

   for (zh = 0; zh < 2; ++zh)
     for (ro = 0; ro < 3; ++ro)
       yu[ro][0] = yu[zh + 1][0];

   return yu[0][0];
}
...

The program is supposed to return 2, but when compiling with -O1 
-floop-nest-optimize, the program returns 3 instead.

The problem is that graphite rewrites this loop nest:
...
   for (zh = 0; zh < 2; ++zh)
     for (ro = 0; ro < 3; ++ro)
       yu[ro][0] = yu[zh + 1][0];
...
into this loop nest, which has different semantics:
...
   for (ro = 0; ro < 3; ++ro)
     for (zh = 0; zh < 2; ++zh)
       yu[ro][0] = yu[zh + 1][0];
...


II.

Now, consider test2.c, the equivalent of test.c without the extra array 
dimension on yu:
...
int yu[4] = { 1, 2, 3, 4 };

int
main (void)
{
   int zh, ro;

   for (zh = 0; zh < 2; ++zh)
     for (ro = 0; ro < 3; ++ro)
       yu[ro] = yu[zh + 1];

   return yu[0];
}
...

In this case, when compiling with -O1 -floop-nest-optimize, the graphite 
transformation doesn't happen, and the program returns 2.


III.

So, what is the difference in graphite analysis that causes different 
conclusions?

For test2.c, we find the data references:
...
data references (
   reads: { S_4[i1, i2] -> [1, 1 + i1] : i1 >= 0 and i1 <= 1 and i2 <= 2 
and i2 >= 0 }
   must_writes: { S_4[i1, i2] -> [1, i2] : i2 >= 0 and i2 <= 2 and i1 >= 
0 and i1 <= 1 }
   may_writes: {  }
)
...

For test.c, we find these data references:
...
data references (
   reads: {  }
   must_writes: { S_4[i1, 0] -> [1, 0, 0] : i1 >= 0 and i1 <= 1 }
   may_writes: {  }
)
...

In other words, there are no reads in the program, which is of course 
incorrect.

Focusing on the reads, we find for test2.c:
...
Adding read to depedence graph: pdr_0 (read
in gimple stmt: _9 = yu[_8];
data accesses: { S_4[i1, i2] -> [1, 1 + i1] }
subscript sizes: { [1, i1] : i1 >= 0 and i1 <= 3 }
)
Reads depedence graph: { S_4[i1, i2] -> [1, 1 + i1] : i1 >= 0 and i1 <= 
1 and i2 <= 2 and i2 >= 0 }
...

And for test.c:
...
Adding read to depedence graph: pdr_0 (read
in gimple stmt: _9 = yu[_8][0];
data accesses: { S_4[i1, i2] -> [1, 0, 1 + i1] }
subscript sizes: { [1, i1, 0] : i1 >= 0 and i1 <= 3 }
)
Reads depedence graph: {  }
...


IV.

AFAICT, the data accesses and subscript sizes are parsed as follows:
...
data accesses:
{
   S_4[i1, i2]
      /* [alias set, last subscript, first subscript] */
   -> [1, 0, 1 + i1]
}
subscript sizes:
{
   /* [alias set range, first subscript range, last subscript range] */
   [1, i1, 0]
   : i1 >= 0 and i1 <= 3
}
...

In add_pdr_constraints, we do an intersection of the data accesses and 
the subscript_sizes:
...
/* Constrain pdr->accesses with pdr->subscript_sizes and pbb->domain.*/

static isl_map *
add_pdr_constraints (poly_dr_p pdr, poly_bb_p pbb)
{
   isl_map *x
     = isl_map_intersect_range (isl_map_copy (pdr->accesses),
                                isl_set_copy (pdr->subscript_sizes));
   x = isl_map_coalesce (x);
   return constrain_domain (x, isl_set_copy (pbb->domain));
}

...

The output of the intersection is:
...
x: { S_4[-1, i2] -> [1, 0, 0] }
...

AFAICT, we're intersecting
   '[1, 0, 1 + i1]'
with
   '[1, i1, 0]'
and end up with
   '[1, 0, 0]'.

In other words, we're intersecting:
- the last subscript access function with the first subscript range,
   and
- the first subscript access function with the last subscript range.

This is the point from where things go wrong.


V.

I'm not really sure how this is supposed to be fixed. I imagine that we 
should do one of 3:
1. we change the order in the access functions
2. we change the order in the subscript_sizes
3. we keep the orders as they are, but don't intersect them directly
    but do an order inversion before.

I've picked 1, since that was the easiest for me to implement (but I'm 
not sure if by doing so, I've broken any hardcoded graphite assumptions).

Bootstrapped and regtested on x86_64.

OK for stage4?

Thanks,
- Tom

Fix pdr accesses order

2016-04-07  Tom de Vries  <tom@codesourcery.com>

	PR tree-optimization/68953
	* graphite-sese-to-poly.c (pdr_add_memory_accesses): Order accesses from
	first to last subscript.

	* gcc.dg/graphite/pr68953.c: New test.

---
 gcc/graphite-sese-to-poly.c             |  2 +-
 gcc/testsuite/gcc.dg/graphite/pr68953.c | 30 ++++++++++++++++++++++++++++++
 2 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/gcc/graphite-sese-to-poly.c b/gcc/graphite-sese-to-poly.c
index b62789f8..22a09a1 100644
--- a/gcc/graphite-sese-to-poly.c
+++ b/gcc/graphite-sese-to-poly.c
@@ -672,7 +672,7 @@ pdr_add_memory_accesses (isl_map *acc, dr_info &dri)
 
       aff = extract_affine (scop, afn,
 			    isl_space_domain (isl_map_get_space (acc)));
-      acc = set_index (acc, i + 1, aff);
+      acc = set_index (acc, nb_subscripts - i , aff);
     }
 
   return isl_map_coalesce (acc);
diff --git a/gcc/testsuite/gcc.dg/graphite/pr68953.c b/gcc/testsuite/gcc.dg/graphite/pr68953.c
new file mode 100644
index 0000000..12c632d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/graphite/pr68953.c
@@ -0,0 +1,30 @@
+/* { dg-do run } */
+/* { dg-options "-O1 -floop-nest-optimize" } */
+
+extern void abort (void);
+
+int yu[4][1] = { { 1 }, { 2 }, { 3 }, { 4 } };
+
+static void __attribute__((noinline,noclone))
+foo (void)
+{
+  int zh, ro;
+
+  for (zh = 0; zh < 2; ++zh)
+    for (ro = 0; ro < 3; ++ro)
+      yu[ro][0] = yu[zh + 1][0];
+}
+
+int
+main (void)
+{
+  foo ();
+
+  if (yu[0][0] != 2
+      || yu[1][0] != 2
+      || yu[2][0] != 2
+      || yu[3][0] != 4)
+    abort ();
+
+  return 0;
+}