From patchwork Fri Apr 8 07:03:10 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Tom de Vries X-Patchwork-Id: 607892 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 3qh9RW5K9Lz9t5g for ; Fri, 8 Apr 2016 17:03:49 +1000 (AEST) Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.b=wlOfwLmy; dkim-atps=neutral DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :subject:to:cc:message-id:date:mime-version:content-type; q=dns; s=default; b=juvDUY/V6embq4swtDwXKbACF4NovOInBaAtH5qI5oolC5LsWY +JkHF4aRKjhB45Rm6D9udu5I9YxFrsu1V0EZp7S0FZpSRPuxWJU5xUya8YQ+xfvn ysakE9E3HykXf3t92PdAa4a92dQXmHXzGmve+b2fYUTn0gynFEHWqQTP0= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :subject:to:cc:message-id:date:mime-version:content-type; s= default; bh=c5xDsvfBf4H182iW0rJp0G6kAEo=; b=wlOfwLmymRHRyObY5pnu b+gWsNAPIToP+hi3Dc7NXXDFB6Qp//77fRFmRxVWONc8M6tuheUyjtu6MMQkkPO6 eLFQ+SEoOPKHW3S7YmhlATxDQO/EdP6vtVNV5fn9XBxb0Sw8Laoq5RBM7w2Il3Bc yMQwMC/e59U7WcSbiOWMxnY= Received: (qmail 71474 invoked by alias); 8 Apr 2016 07:03:40 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 71450 invoked by uid 89); 8 Apr 2016 07:03:39 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-2.0 required=5.0 tests=AWL, BAYES_00, RCVD_IN_DNSWL_NONE, SPF_PASS, URIBL_RED autolearn=ham version=3.3.2 spammy=1, 30, sk:graphit, inversion, nest X-HELO: relay1.mentorg.com Received: from relay1.mentorg.com (HELO relay1.mentorg.com) (192.94.38.131) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES256-GCM-SHA384 encrypted) ESMTPS; Fri, 08 Apr 2016 07:03:29 +0000 Received: from nat-ies.mentorg.com ([192.94.31.2] helo=SVR-IES-FEM-01.mgc.mentorg.com) by relay1.mentorg.com with esmtp id 1aoQRv-00030Q-P5 from Tom_deVries@mentor.com ; Fri, 08 Apr 2016 00:03:24 -0700 Received: from [127.0.0.1] (137.202.0.76) by SVR-IES-FEM-01.mgc.mentorg.com (137.202.0.104) with Microsoft SMTP Server id 14.3.224.2; Fri, 8 Apr 2016 08:03:22 +0100 From: Tom de Vries Subject: [PATCH, PR68953] Fix pdr accesses order To: Sebastian Pop , CC: Richard Biener , GCC Patches Message-ID: <570757AE.7070306@mentor.com> Date: Fri, 8 Apr 2016 09:03:10 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.6.0 MIME-Version: 1.0 Hi, this patch fixes wrong-code PR68953, a graphite 6 regression. I. Consider test.c: ... int yu[4][1] = { { 1 }, { 2 }, { 3 }, { 4 } }; int main (void) { int zh, ro; for (zh = 0; zh < 2; ++zh) for (ro = 0; ro < 3; ++ro) yu[ro][0] = yu[zh + 1][0]; return yu[0][0]; } ... The program is supposed to return 2, but when compiling with -O1 -floop-nest-optimize, the program returns 3 instead. The problem is that graphite rewrites this loop nest: ... for (zh = 0; zh < 2; ++zh) for (ro = 0; ro < 3; ++ro) yu[ro][0] = yu[zh + 1][0]; ... into this loop nest, which has different semantics: ... for (ro = 0; ro < 3; ++ro) for (zh = 0; zh < 2; ++zh) yu[ro][0] = yu[zh + 1][0]; ... II. Now, consider test2.c, the equivalent of test.c without the extra array dimension on yu: ... int yu[4] = { 1, 2, 3, 4 }; int main (void) { int zh, ro; for (zh = 0; zh < 2; ++zh) for (ro = 0; ro < 3; ++ro) yu[ro] = yu[zh + 1]; return yu[0]; } ... In this case, when compiling with -O1 -floop-nest-optimize, the graphite transformation doesn't happen, and the program returns 2. III. So, what is the difference in graphite analysis that causes different conclusions? For test2.c, we find the data references: ... data references ( reads: { S_4[i1, i2] -> [1, 1 + i1] : i1 >= 0 and i1 <= 1 and i2 <= 2 and i2 >= 0 } must_writes: { S_4[i1, i2] -> [1, i2] : i2 >= 0 and i2 <= 2 and i1 >= 0 and i1 <= 1 } may_writes: { } ) ... For test.c, we find these data references: ... data references ( reads: { } must_writes: { S_4[i1, 0] -> [1, 0, 0] : i1 >= 0 and i1 <= 1 } may_writes: { } ) ... In other words, there are no reads in the program, which is of course incorrect. Focusing on the reads, we find for test2.c: ... Adding read to depedence graph: pdr_0 (read in gimple stmt: _9 = yu[_8]; data accesses: { S_4[i1, i2] -> [1, 1 + i1] } subscript sizes: { [1, i1] : i1 >= 0 and i1 <= 3 } ) Reads depedence graph: { S_4[i1, i2] -> [1, 1 + i1] : i1 >= 0 and i1 <= 1 and i2 <= 2 and i2 >= 0 } ... And for test.c: ... Adding read to depedence graph: pdr_0 (read in gimple stmt: _9 = yu[_8][0]; data accesses: { S_4[i1, i2] -> [1, 0, 1 + i1] } subscript sizes: { [1, i1, 0] : i1 >= 0 and i1 <= 3 } ) Reads depedence graph: { } ... IV. AFAICT, the data accesses and subscript sizes are parsed as follows: ... data accesses: { S_4[i1, i2] /* [alias set, last subscript, first subscript] */ -> [1, 0, 1 + i1] } subscript sizes: { /* [alias set range, first subscript range, last subscript range] */ [1, i1, 0] : i1 >= 0 and i1 <= 3 } ... In add_pdr_constraints, we do an intersection of the data accesses and the subscript_sizes: ... /* Constrain pdr->accesses with pdr->subscript_sizes and pbb->domain.*/ static isl_map * add_pdr_constraints (poly_dr_p pdr, poly_bb_p pbb) { isl_map *x = isl_map_intersect_range (isl_map_copy (pdr->accesses), isl_set_copy (pdr->subscript_sizes)); x = isl_map_coalesce (x); return constrain_domain (x, isl_set_copy (pbb->domain)); } ... The output of the intersection is: ... x: { S_4[-1, i2] -> [1, 0, 0] } ... AFAICT, we're intersecting '[1, 0, 1 + i1]' with '[1, i1, 0]' and end up with '[1, 0, 0]'. In other words, we're intersecting: - the last subscript access function with the first subscript range, and - the first subscript access function with the last subscript range. This is the point from where things go wrong. V. I'm not really sure how this is supposed to be fixed. I imagine that we should do one of 3: 1. we change the order in the access functions 2. we change the order in the subscript_sizes 3. we keep the orders as they are, but don't intersect them directly but do an order inversion before. I've picked 1, since that was the easiest for me to implement (but I'm not sure if by doing so, I've broken any hardcoded graphite assumptions). Bootstrapped and regtested on x86_64. OK for stage4? Thanks, - Tom Fix pdr accesses order 2016-04-07 Tom de Vries PR tree-optimization/68953 * graphite-sese-to-poly.c (pdr_add_memory_accesses): Order accesses from first to last subscript. * gcc.dg/graphite/pr68953.c: New test. --- gcc/graphite-sese-to-poly.c | 2 +- gcc/testsuite/gcc.dg/graphite/pr68953.c | 30 ++++++++++++++++++++++++++++++ 2 files changed, 31 insertions(+), 1 deletion(-) diff --git a/gcc/graphite-sese-to-poly.c b/gcc/graphite-sese-to-poly.c index b62789f8..22a09a1 100644 --- a/gcc/graphite-sese-to-poly.c +++ b/gcc/graphite-sese-to-poly.c @@ -672,7 +672,7 @@ pdr_add_memory_accesses (isl_map *acc, dr_info &dri) aff = extract_affine (scop, afn, isl_space_domain (isl_map_get_space (acc))); - acc = set_index (acc, i + 1, aff); + acc = set_index (acc, nb_subscripts - i , aff); } return isl_map_coalesce (acc); diff --git a/gcc/testsuite/gcc.dg/graphite/pr68953.c b/gcc/testsuite/gcc.dg/graphite/pr68953.c new file mode 100644 index 0000000..12c632d --- /dev/null +++ b/gcc/testsuite/gcc.dg/graphite/pr68953.c @@ -0,0 +1,30 @@ +/* { dg-do run } */ +/* { dg-options "-O1 -floop-nest-optimize" } */ + +extern void abort (void); + +int yu[4][1] = { { 1 }, { 2 }, { 3 }, { 4 } }; + +static void __attribute__((noinline,noclone)) +foo (void) +{ + int zh, ro; + + for (zh = 0; zh < 2; ++zh) + for (ro = 0; ro < 3; ++ro) + yu[ro][0] = yu[zh + 1][0]; +} + +int +main (void) +{ + foo (); + + if (yu[0][0] != 2 + || yu[1][0] != 2 + || yu[2][0] != 2 + || yu[3][0] != 4) + abort (); + + return 0; +}