X-Patchwork-Id: 373457
Delivered-To: mailing list gcc-patches@gcc.gnu.org
From: Thomas Schwinge
CC: Tom de Vries
Subject: Re: [PATCH] [gomp4] Initial support of OpenACC loop directive in C front-end.
In-Reply-To: <87zjklkk2f.fsf@kepler.schwinge.homeip.net>
References: <53283E04.6010501@samsung.com> <87ha6vipjf.fsf@schwinge.name> <87zjklkk2f.fsf@kepler.schwinge.homeip.net>
Date: Thu, 24 Jul 2014 17:58:33 +0200
Message-ID: <87lhriiw9i.fsf@kepler.schwinge.homeip.net>

Hi!

On Thu, 20 Mar 2014 15:42:48 +0100, I wrote:
> On Tue, 18 Mar 2014 14:50:44 +0100, I wrote:
> > On Tue, 18 Mar 2014 16:37:24 +0400, Ilmir Usmanov wrote:
> > > This patch introduces support of OpenACC loop directive (and combined
> > > directives) in C front-end up to GENERIC. Currently no clause is allowed.
> >
> > Thanks!  I had worked on a simpler patch, not yet dealing with combined
> > clauses.  Also, I have some work for the GIMPLE level, namely building on
> > GIMPLE_OMP_FOR, adding a new GF_OMP_FOR_KIND_OACC_LOOP.  I'll post this
> > soon.
>
> Here are the patches, committed in r208702..4 to gomp-4_0-branch.
> commit f1d39706db8dccbc988e2c66552511cd54632257
> Author: tschwinge
> Date:   Thu Mar 20 14:40:01 2014 +0000
>
>     Continue implementation of OpenACC loop construct.

For loop scheduling, this currently uses expand_omp_for_static_nochunk.
For a loop iterating through [0; 100) on 32 threads, this gives us the
following schedule (each row covers 32 consecutive iterations, starting at
the iteration number on the left; each entry is the thread that executes
that iteration):

     0   0  0  0  0  1  1  1  1  2  2  2  2  3  3  3  3  4  4  4  5  5  5  6  6  6  7  7  7  8  8  8  9
    32   9  9 10 10 10 11 11 11 12 12 12 13 13 13 14 14 14 15 15 15 16 16 16 17 17 17 18 18 18 19 19 19
    64  20 20 20 21 21 21 22 22 22 23 23 23 24 24 24 25 25 25 26 26 26 27 27 27 28 28 28 29 29 29 30 30
    96  30 31 31 31

..., that is, several consecutive loop iterations are executed on the same
thread.  This isn't ideal for GPUs: there, we'd like a group of "threads"
executing in parallel to work through one "bucket" of consecutive loop
iterations together, one iteration per thread, and then have the whole
group move on to the next "bucket".  So we'd like a schedule as follows:

     0   0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
    32   0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
    64   0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
    96   0  1  2  3

Here, "buckets" of 32 iterations are executed by 32 threads, then the next
32 iterations, and so on.  (This is actually one of the OpenACC
parallelism concepts, vector parallelism, mapped to the "warp size" of an
Nvidia GPU.)

In r213006, I committed the following hack, which implements the desired
scheduling by forcing a chunk_size of one, so that
expand_omp_for_static_chunk is used instead of
expand_omp_for_static_nochunk.

commit 9a545f89fbb1b361286005ceb68e154d0afc84bd
Author: tschwinge
Date:   Thu Jul 24 15:55:49 2014 +0000

    Force OpenACC loop to use a chunk size of one.
    
        gcc/
        * omp-low.c (extract_omp_for_data): Force OpenACC loop to use a
        chunk size of one.
    
    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@213006 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog.gomp |  3 +++
 gcc/omp-low.c      | 10 ++++++++++
 2 files changed, 13 insertions(+)

Grüße,
 Thomas

diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp
index f8a9d74..cc9b06c 100644
--- gcc/ChangeLog.gomp
+++ gcc/ChangeLog.gomp
@@ -1,5 +1,8 @@
 2014-07-24  Thomas Schwinge
 
+        * omp-low.c (extract_omp_for_data): Force OpenACC loop to use a
+        chunk size of one.
+
         * omp-low.c (expand_omp_for_static_chunk): Merge changes
         previously applied to expand_omp_for_static_nochunk.
 
diff --git gcc/omp-low.c gcc/omp-low.c
index 2799638..b188e2d 100644
--- gcc/omp-low.c
+++ gcc/omp-low.c
@@ -619,6 +619,16 @@ extract_omp_for_data (gimple for_stmt, struct omp_for_data *fd,
       fd->loop.step = build_int_cst (TREE_TYPE (fd->loop.v), 1);
       fd->loop.cond_code = LT_EXPR;
     }
+
+  //TODO
+  /* For OpenACC loops, force a chunk size of one, as this avoids the default
+     scheduling where several subsequent iterations are being executed by the
+     same thread.  */
+  if (gimple_omp_for_kind (for_stmt) == GF_OMP_FOR_KIND_OACC_LOOP)
+    {
+      gcc_assert (fd->chunk_size == NULL_TREE);
+      fd->chunk_size = build_int_cst (TREE_TYPE (fd->loop.v), 1);
+    }
 }

In r213005, I committed changes to expand_omp_for_static_chunk that merely
mirror what had previously been applied to expand_omp_for_static_nochunk.
(Internally, we have builtins to query the real nthreads and threadid
values, instead of the dummy values of one and zero that I'm using here.)

commit 6c07d1bd13f6ceef80beb3c62cd25c3aaa397f1b
Author: tschwinge
Date:   Thu Jul 24 15:55:39 2014 +0000

    Make expand_omp_for_static_chunk usable for OpenACC.
    
        gcc/
        * omp-low.c (expand_omp_for_static_chunk): Merge changes
        previously applied to expand_omp_for_static_nochunk.
    
    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@213005 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog.gomp |  5 +++++
 gcc/omp-low.c      | 19 +++++++++++++++++--
 2 files changed, 22 insertions(+), 2 deletions(-)

diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp
index adfae10..f8a9d74 100644
--- gcc/ChangeLog.gomp
+++ gcc/ChangeLog.gomp
@@ -1,3 +1,8 @@
+2014-07-24  Thomas Schwinge
+
+        * omp-low.c (expand_omp_for_static_chunk): Merge changes
+        previously applied to expand_omp_for_static_nochunk.
+
 2014-07-14  Cesar Philippidis
 
         * omp-low.c (extract_omp_for_data): Likewise.
diff --git gcc/omp-low.c gcc/omp-low.c
index 6345e14..2799638 100644
--- gcc/omp-low.c
+++ gcc/omp-low.c
@@ -7040,8 +7040,6 @@ static void
 expand_omp_for_static_chunk (struct omp_region *region,
                              struct omp_for_data *fd, gimple inner_stmt)
 {
-  gcc_assert (gimple_omp_for_kind (fd->for_stmt) != GF_OMP_FOR_KIND_OACC_LOOP);
-
   tree n, s0, e0, e, t;
   tree trip_var, trip_init, trip_main, trip_back, nthreads, threadid;
   tree type, itype, vmain, vback, vextra;
@@ -7054,6 +7052,10 @@ expand_omp_for_static_chunk (struct omp_region *region,
   tree *counts = NULL;
   tree n1, n2, step;
 
+  gcc_assert ((gimple_omp_for_kind (fd->for_stmt)
+               != GF_OMP_FOR_KIND_OACC_LOOP)
+              || !inner_stmt);
+
   itype = type = TREE_TYPE (fd->loop.v);
   if (POINTER_TYPE_P (type))
     itype = signed_type_for (type);
@@ -7153,6 +7155,10 @@ expand_omp_for_static_chunk (struct omp_region *region,
       threadid = builtin_decl_explicit (BUILT_IN_OMP_GET_TEAM_NUM);
       threadid = build_call_expr (threadid, 0);
       break;
+    case GF_OMP_FOR_KIND_OACC_LOOP:
+      nthreads = integer_one_node;
+      threadid = integer_zero_node;
+      break;
     default:
       gcc_unreachable ();
     }
@@ -7168,6 +7174,9 @@ expand_omp_for_static_chunk (struct omp_region *region,
   step = fd->loop.step;
   if (gimple_omp_for_combined_into_p (fd->for_stmt))
     {
+      gcc_assert (gimple_omp_for_kind (fd->for_stmt)
+                  != GF_OMP_FOR_KIND_OACC_LOOP);
+
       tree innerc = find_omp_clause (gimple_omp_for_clauses (fd->for_stmt),
                                      OMP_CLAUSE__LOOPTEMP_);
       gcc_assert (innerc);
@@ -7351,6 +7360,9 @@ expand_omp_for_static_chunk (struct omp_region *region,
   gsi = gsi_last_bb (exit_bb);
   if (!gimple_omp_return_nowait_p (gsi_stmt (gsi)))
     {
+      gcc_assert (gimple_omp_for_kind (fd->for_stmt)
+                  != GF_OMP_FOR_KIND_OACC_LOOP);
+
       t = gimple_omp_return_lhs (gsi_stmt (gsi));
       gsi_insert_after (&gsi, build_omp_barrier (t), GSI_SAME_STMT);
     }
@@ -7365,6 +7377,9 @@ expand_omp_for_static_chunk (struct omp_region *region,
       se = find_edge (cont_bb, body_bb);
       if (gimple_omp_for_combined_p (fd->for_stmt))
         {
+          gcc_assert (gimple_omp_for_kind (fd->for_stmt)
+                      != GF_OMP_FOR_KIND_OACC_LOOP);
+
           remove_edge (se);
           se = NULL;
         }