From patchwork Fri Nov 6 17:44:37 2015
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Tom de Vries
X-Patchwork-Id: 541049
To: "gcc-patches@gnu.org"
CC: Jakub Jelinek
From: Tom de Vries
Subject: [gomp4, committed] Revert "Use marked_independent in oacc kernels region"
Message-ID: <563CE705.5040901@mentor.com>
Date: Fri, 6 Nov 2015 18:44:37 +0100

Hi,

this patch reverts the independent clause support in the oacc kernels region.

The independent clause support is broken, in a subtle way. We currently set the marked_independent field in struct loop for loops with the independent clause in a kernels region.
So that property holds for all the loads and stores present at source level. But at omp-lowering we introduce new loads and stores. Those new loads and stores are supposed to be eliminated from the loop by the kernels pass group, but in general we can't guarantee that this happens. So at parloops we cannot assume, based on marked_independent, that all loads and stores in the loop body are in fact independent.

Committed to gomp-4_0-branch.

Thanks,
- Tom

Revert "Use marked_independent in oacc kernels region"

2015-10-20  Tom de Vries

        Revert:
        2015-07-14  Tom de Vries

        * tree-parloops.c (parallelize_loops): Use marked_independent flag in
        oacc kernels region.
        * c-c++-common/goacc/kernels-independent.c: New test.
        * testsuite/libgomp.oacc-c-c++-common/kernels-independent.c: New test.
---
 .../c-c++-common/goacc/kernels-independent.c       | 41 --------------------
 gcc/tree-parloops.c                                | 21 ++--------
 .../kernels-independent.c                          | 45 ----------------------
 3 files changed, 3 insertions(+), 104 deletions(-)
 delete mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-independent.c
 delete mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-independent.c

diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-independent.c b/gcc/testsuite/c-c++-common/goacc/kernels-independent.c
deleted file mode 100644
index 1f36323..0000000
--- a/gcc/testsuite/c-c++-common/goacc/kernels-independent.c
+++ /dev/null
@@ -1,41 +0,0 @@
-/* { dg-additional-options "-O2" } */
-/* { dg-additional-options "-ftree-parallelize-loops=32" } */
-/* { dg-additional-options "-fdump-tree-parloops_oacc_kernels-all" } */
-/* { dg-additional-options "-fdump-tree-optimized" } */
-
-#include
-
-#define N (1024 * 512)
-#define COUNTERTYPE unsigned int
-
-void
-foo (unsigned int *a, unsigned int *b, unsigned int *c)
-{
-
-  for (COUNTERTYPE i = 0; i < N; i++)
-    a[i] = i * 2;
-
-  for (COUNTERTYPE i = 0; i < N; i++)
-    b[i] = i * 4;
-
-#pragma acc kernels copyin (a[0:N], b[0:N]) copyout (c[0:N])
-  {
-    #pragma acc loop independent
-    for (COUNTERTYPE ii = 0; ii < N; ii++)
-      c[ii] = a[ii] + b[ii];
-  }
-
-  for (COUNTERTYPE i = 0; i < N; i++)
-    if (c[i] != a[i] + b[i])
-      abort ();
-}
-
-/* Check that only one loop is analyzed, and that it can be parallelized.  */
-/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized, marked independent" 1 "parloops_oacc_kernels" } } */
-/* { dg-final { scan-tree-dump-not "FAILED:" "parloops_oacc_kernels" } } */
-
-/* Check that the loop has been split off into a function.  */
-/* { dg-final { scan-tree-dump-times "(?n);; Function .*foo.*\\._omp_fn\\.0" 1 "optimized" } } */
-
-/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(32," 1 "parloops_oacc_kernels" } } */
-
diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c
index 05827d1..b4039ad 100644
--- a/gcc/tree-parloops.c
+++ b/gcc/tree-parloops.c
@@ -3258,24 +3258,9 @@ parallelize_loops (bool oacc_kernels_p)
       if (!try_create_reduction_list (loop, &reduction_list, oacc_kernels_p))
         continue;
 
-      if (!flag_loop_parallelize_all)
-        {
-          bool independent = (oacc_kernels_p
-                              && loop->marked_independent);
-
-          if (independent)
-            {
-              if (dump_file
-                  && (dump_flags & TDF_DETAILS))
-                fprintf (dump_file,
-                         "  SUCCESS: may be parallelized, marked independent\n");
-            }
-          else
-            independent = loop_parallel_p (loop, &parloop_obstack);
-
-          if (!independent)
-            continue;
-        }
+      if (!flag_loop_parallelize_all
+          && !loop_parallel_p (loop, &parloop_obstack))
+        continue;
 
       if (oacc_kernels_p
           && !oacc_entry_exit_ok (loop, &reduction_list))
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-independent.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-independent.c
deleted file mode 100644
index d169a5f..0000000
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-independent.c
+++ /dev/null
@@ -1,45 +0,0 @@
-/* { dg-do run } */
-/* { dg-additional-options "-ftree-parallelize-loops=32" } */
-
-#include
-
-#define N (1024 * 512)
-#define COUNTERTYPE unsigned int
-
-void __attribute__((noinline,noclone))
-foo (unsigned int *a, unsigned int *b, unsigned int *c)
-{
-
-  for (COUNTERTYPE i = 0; i < N; i++)
-    a[i] = i * 2;
-
-  for (COUNTERTYPE i = 0; i < N; i++)
-    b[i] = i * 4;
-
-#pragma acc kernels copyin (a[0:N], b[0:N]) copyout (c[0:N])
-  {
-    #pragma acc loop independent
-    for (COUNTERTYPE ii = 0; ii < N; ii++)
-      c[ii] = a[ii] + b[ii];
-  }
-
-  for (COUNTERTYPE i = 0; i < N; i++)
-    if (c[i] != a[i] + b[i])
-      abort ();
-}
-
-int
-main (void)
-{
-  unsigned int *__restrict a;
-  unsigned int *__restrict b;
-  unsigned int *__restrict c;
-
-  a = (unsigned int *__restrict)malloc (N * sizeof (unsigned int));
-  b = (unsigned int *__restrict)malloc (N * sizeof (unsigned int));
-  c = (unsigned int *__restrict)malloc (N * sizeof (unsigned int));
-
-  foo (a, b, c);
-
-  return 0;
-}
-- 
1.9.1
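
Illustration of the revert rationale: a minimal sketch, in plain C, of why an independence promise made about the source-level loads and stores need not cover the loop the compiler sees after lowering. Everything here is an assumption made for illustration: struct region_data, add_source_level and add_lowered are hypothetical names, and the struct only stands in for whatever data block omp-lowering builds for the region. It is not GCC's actual generated code.

```c
#include <stddef.h>

/* Hypothetical stand-in for the data block through which a lowered
   region body reaches its arrays.  The layout is assumed.  */
struct region_data
{
  unsigned int *a;
  unsigned int *b;
  unsigned int *c;
};

/* Source-level view: the only memory accesses in the loop are a[i],
   b[i] and c[i].  These are the accesses an "independent" clause on
   the loop makes a promise about.  */
void
add_source_level (unsigned int *a, unsigned int *b, unsigned int *c,
                  size_t n)
{
  for (size_t i = 0; i < n; i++)
    c[i] = a[i] + b[i];
}

/* Lowered view (illustrative): every iteration additionally loads the
   array pointers from *data.  Those loads did not exist at source
   level, so the user's promise says nothing about them.  Unless a
   later pass moves them out of the loop, the loop body contains
   accesses that were never declared independent.  */
void
add_lowered (struct region_data *data, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      unsigned int *a = data->a;   /* load introduced by lowering */
      unsigned int *b = data->b;   /* load introduced by lowering */
      unsigned int *c = data->c;   /* load introduced by lowering */
      c[i] = a[i] + b[i];
    }
}
```

Both functions compute the same result; they differ only in which memory accesses sit inside the loop. If the three pointer loads are moved out of the loop, the remaining accesses are exactly the source-level ones. The revert is about the case where that cannot be guaranteed, which is why the restored code asks loop_parallel_p instead of trusting marked_independent.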