From patchwork Thu Jul 24 15:58:33 2014
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Thomas Schwinge <thomas@codesourcery.com>
X-Patchwork-Id: 373457
Return-Path: 
 <gcc-patches-return-373225-incoming=patchwork.ozlabs.org@gcc.gnu.org>
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Received: from sourceware.org (server1.sourceware.org [209.132.180.131])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256
	bits)) (No client certificate requested)
	by ozlabs.org (Postfix) with ESMTPS id 0603D1400D7
	for <incoming@patchwork.ozlabs.org>;
	Fri, 25 Jul 2014 01:59:04 +1000 (EST)
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id
	:list-unsubscribe:list-archive:list-post:list-help:sender:from
	:to:cc:subject:in-reply-to:references:date:message-id
	:mime-version:content-type; q=dns; s=default; b=PFCzBknSODouu5Ms
	h1+IQmz/Z+Fey4dOBninblKZS+23wjm+Mr2t+qnIEpHLWB39fU4LwDFSdXO01+8N
	ltGe5CazTnN1cmulFq/ve2Rw4MAxQTX1whtFyXbC29aLTxqFtwJTph+HbfsbmW3u
	IVqLaWQlMrXMC6taW55aBiYYpiA=
DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id
	:list-unsubscribe:list-archive:list-post:list-help:sender:from
	:to:cc:subject:in-reply-to:references:date:message-id
	:mime-version:content-type; s=default; bh=gHltSk8TlhkkSSPG+JJR7i
	rM7Ug=; b=XTUlLFGiYRQPchZjGYl5NgoojcVC9NnPtIUTgAzLJqfDjJ1InbKOcz
	/lz1wT/ZG+QhHir87Don2hgXsKI9KRLNBb/AoLrpvEObqzywf7xvywwo3TmXFB2a
	JaCHlF2zk/gO2A0bONVTJalxdRdM9Cq/41DUu+Ls1AqhFjN4tz/ls=
Received: (qmail 5558 invoked by alias); 24 Jul 2014 15:58:57 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Unsubscribe: 
 <mailto:gcc-patches-unsubscribe-incoming=patchwork.ozlabs.org@gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Delivered-To: mailing list gcc-patches@gcc.gnu.org
Received: (qmail 5548 invoked by uid 89); 24 Jul 2014 15:58:56 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-2.0 required=5.0 tests=AWL,
	BAYES_00 autolearn=ham version=3.3.2
X-HELO: relay1.mentorg.com
Received: from relay1.mentorg.com (HELO relay1.mentorg.com) (192.94.38.131)
	by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with
	ESMTP; Thu, 24 Jul 2014 15:58:52 +0000
Received: from svr-orw-fem-01.mgc.mentorg.com ([147.34.98.93])	by
	relay1.mentorg.com with esmtp id 1XALPs-0006lJ-BZ from
	Thomas_Schwinge@mentor.com ; Thu, 24 Jul 2014 08:58:48 -0700
Received: from SVR-IES-FEM-01.mgc.mentorg.com ([137.202.0.104]) by
	svr-orw-fem-01.mgc.mentorg.com over TLS secured channel with
	Microsoft SMTPSVC(6.0.3790.4675); Thu, 24 Jul 2014 08:58:48 -0700
Received: from feldtkeller.schwinge.homeip.net (137.202.0.76) by
	SVR-IES-FEM-01.mgc.mentorg.com (137.202.0.104) with Microsoft
	SMTP Server id 14.2.247.3; Thu, 24 Jul 2014 16:58:46 +0100
From: Thomas Schwinge <thomas@codesourcery.com>
To: <gcc-patches@gcc.gnu.org>
CC: <jakub@redhat.com>, Tom de Vries <tom@codesourcery.com>
Subject: Re: [PATCH] [gomp4] Initial support of OpenACC loop directive in C
	front-end.
In-Reply-To: <87zjklkk2f.fsf@kepler.schwinge.homeip.net>
References: <53283E04.6010501@samsung.com> <87ha6vipjf.fsf@schwinge.name>
	<87zjklkk2f.fsf@kepler.schwinge.homeip.net>
User-Agent: Notmuch/0.9-101-g81dad07 (http://notmuchmail.org) Emacs/23.4.1
	(i486-pc-linux-gnu)
Date: Thu, 24 Jul 2014 17:58:33 +0200
Message-ID: <87lhriiw9i.fsf@kepler.schwinge.homeip.net>
MIME-Version: 1.0

Hi!

On Thu, 20 Mar 2014 15:42:48 +0100, I wrote:
> On Tue, 18 Mar 2014 14:50:44 +0100, I wrote:
> > On Tue, 18 Mar 2014 16:37:24 +0400, Ilmir Usmanov <i.usmanov@samsung.com> wrote:
> > > This patch introduces support of OpenACC loop directive (and combined 
> > > directives) in C front-end up to GENERIC. Currently no clause is allowed.
> > 
> > Thanks!  I had worked on a simpler patch, not yet dealing with combined
> > clauses.  Also, I have some work for the GIMPLE level, namely building on
> > GIMPLE_OMP_FOR, adding a new GF_OMP_FOR_KIND_OACC_LOOP.  I'll post this
> > soon.
> 
> Here are the patches, committed in r208702..4 to gomp-4_0-branch.

> commit f1d39706db8dccbc988e2c66552511cd54632257
> Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
> Date:   Thu Mar 20 14:40:01 2014 +0000
> 
>     Continue implementation of OpenACC loop construct.

For loop scheduling, this is currently using
expand_omp_for_static_nochunk.  For a loop iterating through [0; 100) on
32 threads, this gives us the following schedule:

    0       0 0 0 0 1 1 1 1 2 2 2 2 3 3 3 3 4 4 4 5 5 5 6 6 6 7 7 7 8 8 8 9
    32      9 9 10 10 10 11 11 11 12 12 12 13 13 13 14 14 14 15 15 15 16 16 16 17 17 17 18 18 18 19 19 19
    64      20 20 20 21 21 21 22 22 22 23 23 23 24 24 24 25 25 25 26 26 26 27 27 27 28 28 28 29 29 29 30 30
    96      30 31 31 31

..., that is, several consecutive loop iterations are executed on the
same thread.  This isn't ideal for GPUs: there, we'd like a set of
"threads" executing in parallel to work through one "bucket" of
consecutive loop iterations together, and then move on to the next
"bucket" as a whole, so we'd like a schedule as follows:

    0       0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
    32      0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
    64      0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
    96      0 1 2 3

Here, a "bucket" of 32 iterations is executed by the 32 threads, then
the next 32 iterations, and so on.  (This is actually one of the OpenACC
parallelism concepts, vector parallelism, mapped to the "warp size" of
an Nvidia GPU.)
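
The two schedules shown above can be reproduced with a small standalone
sketch.  This is not code from omp-low.c; the function names
(owner_static_nochunk, owner_static_chunk, print_schedule) are made up
for illustration, and the block split simply assumes that the first
N % NTHREADS threads each take one extra iteration, which is consistent
with the first table.

```c
#include <stdio.h>

/* Thread that executes iteration I (0 <= I < N) when the N iterations
   are split into one contiguous block per thread, i.e. no chunk size.
   Assumption: the first N % NTHREADS threads get one extra iteration.  */
int
owner_static_nochunk (int i, int n, int nthreads)
{
  int q = n / nthreads;		/* Base block length.  */
  int tt = n % nthreads;	/* Number of threads with q + 1 iterations.  */
  int boundary = tt * (q + 1);	/* First iteration of the shorter blocks.  */

  if (i < boundary)
    return i / (q + 1);
  /* Only reached when q > 0, because I < N.  */
  return tt + (i - boundary) / q;
}

/* Thread that executes iteration I when chunks of CHUNK consecutive
   iterations are handed out to the threads round-robin.  */
int
owner_static_chunk (int i, int chunk, int nthreads)
{
  return (i / chunk) % nthreads;
}

/* Print the owning thread of every iteration, 32 per row, with the
   first iteration number of each row in the left column.  A CHUNK of
   zero selects the schedule without a chunk size.  */
void
print_schedule (int n, int nthreads, int chunk)
{
  for (int i = 0; i < n; i++)
    {
      if (i % 32 == 0)
	printf ("%s    %-8d", i ? "\n" : "", i);
      printf ("%d ", chunk
		     ? owner_static_chunk (i, chunk, nthreads)
		     : owner_static_nochunk (i, n, nthreads));
    }
  printf ("\n");
}
```

Calling print_schedule (100, 32, 0) prints the blocked schedule, and
print_schedule (100, 32, 1) prints the round-robin one, where each
"bucket" of 32 consecutive iterations is spread across all 32 threads.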

In r213006, I committed the following hack to use
expand_omp_for_static_chunk instead of expand_omp_for_static_nochunk, by
specifying a chunk_size of one to implement the desired scheduling.

commit 9a545f89fbb1b361286005ceb68e154d0afc84bd
Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Thu Jul 24 15:55:49 2014 +0000

    Force OpenACC loop to use a chunk size of one.
    
    	gcc/
    	* omp-low.c (extract_omp_for_data): Force OpenACC loop to use a
    	chunk size of one.
    
    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@213006 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog.gomp |  3 +++
 gcc/omp-low.c      | 10 ++++++++++
 2 files changed, 13 insertions(+)


Grüße,
 Thomas

diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp
index f8a9d74..cc9b06c 100644
--- gcc/ChangeLog.gomp
+++ gcc/ChangeLog.gomp
@@ -1,5 +1,8 @@
 2014-07-24  Thomas Schwinge  <thomas@codesourcery.com>
 
+	* omp-low.c (extract_omp_for_data): Force OpenACC loop to use a
+	chunk size of one.
+
 	* omp-low.c (expand_omp_for_static_chunk): Merge changes
 	previously applied to expand_omp_for_static_nochunk.
 
diff --git gcc/omp-low.c gcc/omp-low.c
index 2799638..b188e2d 100644
--- gcc/omp-low.c
+++ gcc/omp-low.c
@@ -619,6 +619,16 @@ extract_omp_for_data (gimple for_stmt, struct omp_for_data *fd,
       fd->loop.step = build_int_cst (TREE_TYPE (fd->loop.v), 1);
       fd->loop.cond_code = LT_EXPR;
     }
+
+  //TODO
+  /* For OpenACC loops, force a chunk size of one, as this avoids the default
+    scheduling where several subsequent iterations are being executed by the
+    same thread.  */
+  if (gimple_omp_for_kind (for_stmt) == GF_OMP_FOR_KIND_OACC_LOOP)
+    {
+      gcc_assert (fd->chunk_size == NULL_TREE);
+      fd->chunk_size = build_int_cst (TREE_TYPE (fd->loop.v), 1);
+    }
 }
 
 
In r213005, I committed changes to expand_omp_for_static_chunk that are
just what has previously been applied to expand_omp_for_static_nochunk.
(Internally, we have builtins to query the real nthreads and threadid,
instead of the dummy values of one and zero that I'm using here.)
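
As a rough illustration of what the dummy values mean for the chunked
schedule, the following standalone sketch counts how many iterations a
given thread executes.  It assumes the usual trip-based scheme, in
which thread THREADID handles the chunk starting at
(trip * nthreads + threadid) * chunk on each trip; the function name
iterations_for_thread is hypothetical and does not exist in omp-low.c.

```c
/* Number of iterations of an N-iteration loop that thread THREADID out
   of NTHREADS executes under a static schedule with chunk size CHUNK.
   Assumed scheme: on each trip, the thread takes the chunk that starts
   at (trip * nthreads + threadid) * chunk, clipped to N.  */
int
iterations_for_thread (int n, int chunk, int nthreads, int threadid)
{
  int count = 0;

  for (int trip = 0; ; trip++)
    {
      int s0 = (trip * nthreads + threadid) * chunk;
      int e0 = s0 + chunk < n ? s0 + chunk : n;

      /* No chunk left for this thread: the loop is done.  */
      if (s0 >= n)
	break;
      count += e0 - s0;
    }
  return count;
}
```

With nthreads set to one and threadid set to zero, every chunk lands on
the same single thread, so iterations_for_thread (100, 1, 1, 0) covers
all 100 iterations in order.  With real values, for example 32 threads,
thread 0 would execute iterations 0, 32, 64 and 96.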

commit 6c07d1bd13f6ceef80beb3c62cd25c3aaa397f1b
Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Thu Jul 24 15:55:39 2014 +0000

    Make expand_omp_for_static_chunk usable for OpenACC.
    
    	gcc/
    	* omp-low.c (expand_omp_for_static_chunk): Merge changes
    	previously applied to expand_omp_for_static_nochunk.
    
    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@213005 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog.gomp |  5 +++++
 gcc/omp-low.c      | 19 +++++++++++++++++--
 2 files changed, 22 insertions(+), 2 deletions(-)

diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp
index adfae10..f8a9d74 100644
--- gcc/ChangeLog.gomp
+++ gcc/ChangeLog.gomp
@@ -1,3 +1,8 @@
+2014-07-24  Thomas Schwinge  <thomas@codesourcery.com>
+
+	* omp-low.c (expand_omp_for_static_chunk): Merge changes
+	previously applied to expand_omp_for_static_nochunk.
+
 2014-07-14  Cesar Philippidis  <cesar@codesourcery.com>
 
 	* omp-low.c (extract_omp_for_data): Likewise.
diff --git gcc/omp-low.c gcc/omp-low.c
index 6345e14..2799638 100644
--- gcc/omp-low.c
+++ gcc/omp-low.c
@@ -7040,8 +7040,6 @@ static void
 expand_omp_for_static_chunk (struct omp_region *region,
 			     struct omp_for_data *fd, gimple inner_stmt)
 {
-  gcc_assert (gimple_omp_for_kind (fd->for_stmt) != GF_OMP_FOR_KIND_OACC_LOOP);
-
   tree n, s0, e0, e, t;
   tree trip_var, trip_init, trip_main, trip_back, nthreads, threadid;
   tree type, itype, vmain, vback, vextra;
@@ -7054,6 +7052,10 @@ expand_omp_for_static_chunk (struct omp_region *region,
   tree *counts = NULL;
   tree n1, n2, step;
 
+  gcc_assert ((gimple_omp_for_kind (fd->for_stmt)
+	       != GF_OMP_FOR_KIND_OACC_LOOP)
+	      || !inner_stmt);
+
   itype = type = TREE_TYPE (fd->loop.v);
   if (POINTER_TYPE_P (type))
     itype = signed_type_for (type);
@@ -7153,6 +7155,10 @@ expand_omp_for_static_chunk (struct omp_region *region,
       threadid = builtin_decl_explicit (BUILT_IN_OMP_GET_TEAM_NUM);
       threadid = build_call_expr (threadid, 0);
       break;
+    case GF_OMP_FOR_KIND_OACC_LOOP:
+      nthreads = integer_one_node;
+      threadid = integer_zero_node;
+      break;
     default:
       gcc_unreachable ();
     }
@@ -7168,6 +7174,9 @@ expand_omp_for_static_chunk (struct omp_region *region,
   step = fd->loop.step;
   if (gimple_omp_for_combined_into_p (fd->for_stmt))
     {
+      gcc_assert (gimple_omp_for_kind (fd->for_stmt)
+		  != GF_OMP_FOR_KIND_OACC_LOOP);
+
       tree innerc = find_omp_clause (gimple_omp_for_clauses (fd->for_stmt),
 				     OMP_CLAUSE__LOOPTEMP_);
       gcc_assert (innerc);
@@ -7351,6 +7360,9 @@ expand_omp_for_static_chunk (struct omp_region *region,
   gsi = gsi_last_bb (exit_bb);
   if (!gimple_omp_return_nowait_p (gsi_stmt (gsi)))
     {
+      gcc_assert (gimple_omp_for_kind (fd->for_stmt)
+		  != GF_OMP_FOR_KIND_OACC_LOOP);
+
       t = gimple_omp_return_lhs (gsi_stmt (gsi));
       gsi_insert_after (&gsi, build_omp_barrier (t), GSI_SAME_STMT);
     }
@@ -7365,6 +7377,9 @@ expand_omp_for_static_chunk (struct omp_region *region,
       se = find_edge (cont_bb, body_bb);
       if (gimple_omp_for_combined_p (fd->for_stmt))
 	{
+	  gcc_assert (gimple_omp_for_kind (fd->for_stmt)
+		      != GF_OMP_FOR_KIND_OACC_LOOP);
+
 	  remove_edge (se);
 	  se = NULL;
 	}