From patchwork Tue Dec 8 20:47:08 2015 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Thomas Schwinge X-Patchwork-Id: 554043 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 3F3781402CD for ; Wed, 9 Dec 2015 07:47:34 +1100 (AEDT) Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.b=FH78x8FN; dkim-atps=neutral DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:cc:subject:in-reply-to:references:date:message-id :mime-version:content-type:content-transfer-encoding; q=dns; s= default; b=BiPtcZLl5Nmj4kWJ74v61Xq43sT0ssrIBAntncNFeiQR9r1yWiQDA ROjFa6BoqgNQK3DLaKb7OCC1qaFgEDgdWxgtwhMqqY65z5ZROiB73R1Nxo1uD091 4k45YVcsWV3UuyDTPJ6sHBFy6ouaXuVBCPQN5qXrAaGZA0KCq/KXUU= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:cc:subject:in-reply-to:references:date:message-id :mime-version:content-type:content-transfer-encoding; s=default; bh=dVVsdhtXz3zMmHaZqtfKGYi/h2o=; b=FH78x8FN1I1RHjWW28KdlfqC+8bs o3aYLcieLsWUkVU9uDBiaZb9dNvd3n03h2NlTVtCw/U/ueeBAklMcyG39/ln69LG BD7igUYH8GV6+qKwwAoIWKKE8fcOd3ecifuWqmcW192jAv+xOc/xZbGA7xgN9C5x Vu9kO1w9shsj3jU= Received: (qmail 82157 invoked by alias); 8 Dec 2015 20:47:20 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 81846 invoked by uid 89); 8 Dec 2015 20:47:16 -0000 
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-1.9 required=5.0 tests=AWL, BAYES_00, KAM_ASCII_DIVIDERS, RCVD_IN_DNSWL_LOW, SPF_PASS autolearn=no version=3.3.2
X-HELO: relay1.mentorg.com
Received: from relay1.mentorg.com (HELO relay1.mentorg.com) (192.94.38.131) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Tue, 08 Dec 2015 20:47:15 +0000
Received: from nat-ies.mentorg.com ([192.94.31.2] helo=SVR-IES-FEM-01.mgc.mentorg.com) by relay1.mentorg.com with esmtp id 1a6PAE-0005ad-VD from Thomas_Schwinge@mentor.com for gcc-patches@gcc.gnu.org; Tue, 08 Dec 2015 12:47:11 -0800
Received: from hertz.schwinge.homeip.net (137.202.0.76) by SVR-IES-FEM-01.mgc.mentorg.com (137.202.0.104) with Microsoft SMTP Server id 14.3.224.2; Tue, 8 Dec 2015 20:47:09 +0000
From: Thomas Schwinge
To:
CC: James Norris
Subject: [gomp4] libgomp documentation: CUDA Streams Usage
In-Reply-To: <54B426C3.9080004@mentor.com>
References: <54B426C3.9080004@mentor.com>
User-Agent: Notmuch/0.9-101-g81dad07 (http://notmuchmail.org) Emacs/24.4.1 (x86_64-pc-linux-gnu)
Date: Tue, 8 Dec 2015 21:47:08 +0100
Message-ID: <87r3iw3bxv.fsf@hertz.schwinge.homeip.net>
MIME-Version: 1.0

Hi!

On Mon, 12 Jan 2015 13:55:47 -0600, James Norris wrote:
> The attached patch adds a new section to the documentation
> for libgomp. This section describes the use of streams
> within the OpenACC portion of the library.

That never made it upstream; with a little bit of copy-editing now
committed to gomp-4_0-branch in r231424:

commit ec7ae163b644bd11fd7343dd576cc9da0b50cbc7
Author: tschwinge
Date:   Tue Dec 8 20:44:06 2015 +0000

    libgomp documentation: CUDA Streams Usage

    libgomp/
    * libgomp.texi (CUDA Streams Usage): New chapter.

    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@231424 138bc75d-0d04-0410-961f-82ee72b054a4
---
 libgomp/ChangeLog.gomp | 5 +++++
 libgomp/libgomp.texi | 49 +++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 53 insertions(+), 1 deletion(-)

Grüße
 Thomas

diff --git libgomp/ChangeLog.gomp libgomp/ChangeLog.gomp
index a59cc9d..4b99302 100644
--- libgomp/ChangeLog.gomp
+++ libgomp/ChangeLog.gomp
@@ -1,4 +1,9 @@
 2015-12-08 Thomas Schwinge
+ James Norris
+
+ * libgomp.texi (CUDA Streams Usage): New chapter.
+
+2015-12-08 Thomas Schwinge
 
  * testsuite/libgomp.oacc-c-c++-common/routine-bind-nohost-1.c: New file.
 
diff --git libgomp/libgomp.texi libgomp/libgomp.texi
index 019e439..542ca2f 100644
--- libgomp/libgomp.texi
+++ libgomp/libgomp.texi
@@ -100,6 +100,8 @@ changed to GNU Offloading and Multi Processing Runtime Library.
                                programming interface.
 * OpenACC Environment Variables:: Influencing OpenACC runtime behavior with
                                environment variables.
+* CUDA Streams Usage::         Notes on the implementation of
+                               asynchronous operations.
 * OpenACC Library Interoperability:: OpenACC library interoperability with the
                                NVIDIA CUBLAS library.
 * Enabling OpenMP::            How to enable OpenMP for your
@@ -552,6 +554,51 @@ Print debug information pertaining to the accelerator.
 @end table
 
 
+
+@c ---------------------------------------------------------------------
+@c CUDA Streams Usage
+@c ---------------------------------------------------------------------
+
+@node CUDA Streams Usage
+@chapter CUDA Streams Usage
+
+This applies to the @code{nvptx} plugin only.
+
+The library provides elements that perform asynchronous movement of
+data and asynchronous operation of computing constructs. This
+asynchronous functionality is implemented by making use of CUDA
+streams@footnote{See "Stream Management" in "CUDA Driver API",
+TRM-06703-001, Version 5.5, July 2013, for additional information}.
+
+The primary means by which the asynchronous functionality is accessed
+is through the use of those OpenACC directives which make use of the
+@code{async} and @code{wait} clauses. When the @code{async} clause is
+first used with a directive, it will create a CUDA stream. If an
+@code{async-argument} is used with the @code{async} clause, then the
+stream will be associated with the specified @code{async-argument}.
+
+Following the creation of an association between a CUDA stream and the
+@code{async-argument} of an @code{async} clause, both the @code{wait}
+clause and the @code{wait} directive can be used. When either the
+clause or directive is used after stream creation, it creates a
+rendezvous point whereby execution will wait until all operations
+associated with the @code{async-argument}, that is, stream, have
+completed.
+
+Normally, the management of the streams that are created as a result of
+using the @code{async} clause, is done without any intervention by the
+caller. This implies the association between the @code{async-argument}
+and the CUDA stream will be maintained for the lifetime of the program.
+However, this association can be changed through the use of the library
+function @code{acc_set_cuda_stream}. When the function
+@code{acc_set_cuda_stream} is used, the CUDA stream that was
+originally associated with the @code{async} clause will be destroyed.
+Caution should be taken when changing the association as subsequent
+references to the @code{async-argument} will be referring to a different
+CUDA stream.
+
+
+
 @c ---------------------------------------------------------------------
 @c OpenACC Library Interoperability
 @c ---------------------------------------------------------------------
@@ -564,7 +611,7 @@ Print debug information pertaining to the accelerator.
 As the OpenACC library is built using the CUDA Driver API, the question has
 arisen on what impact does using the OpenACC library have on a program that
 uses the Runtime library, or a library based on the Runtime library, e.g.,
-CUBLAS@footnote{Seee section 2.26, "Interactions with the CUDA Driver API" in
+CUBLAS@footnote{See section 2.26, "Interactions with the CUDA Driver API" in
 "CUDA Runtime API", Version 5.5, July 2013 and section 2.27, "VDPAU
 Interoperability", in "CUDA Driver API", TRM-06703-001, Version 5.5, July 2013,
 for additional information on library interoperability.}.
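
For illustration only (not part of the patch): the async/wait behaviour
that the new chapter describes might be exercised from user code roughly
as follows. This is a minimal sketch; the function name scale_offset_sum
and the queue numbers QUEUE_SCALE and QUEUE_OFFSET are made up for this
example. With the nvptx plugin, each distinct async-argument would be
expected to map to its own CUDA stream as described in the chapter; when
compiled without -fopenacc the pragmas are ignored and the loops simply
run sequentially on the host, giving the same result.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical async-argument values, chosen only for this sketch.  */
enum { QUEUE_SCALE = 1, QUEUE_OFFSET = 2 };

/* Doubles every element of A and adds 1.0 to every element of B on two
   independent asynchronous queues, then returns the sum of both arrays.  */
double
scale_offset_sum (double *a, double *b, size_t n)
{
  assert (a != NULL && b != NULL);
  double sum = 0.0;

#pragma acc data copy(a[0:n], b[0:n])
  {
    /* First use of async(QUEUE_SCALE): per the chapter text, a stream
       is created and associated with this async-argument.  */
#pragma acc parallel loop async(QUEUE_SCALE)
    for (size_t i = 0; i < n; i++)
      a[i] *= 2.0;

    /* A different async-argument, so this work is queued separately.  */
#pragma acc parallel loop async(QUEUE_OFFSET)
    for (size_t i = 0; i < n; i++)
      b[i] += 1.0;

    /* wait directive: rendezvous with everything queued on QUEUE_SCALE.  */
#pragma acc wait(QUEUE_SCALE)

    /* wait clause: this construct starts only after QUEUE_OFFSET is done.  */
#pragma acc parallel loop reduction(+:sum) wait(QUEUE_OFFSET)
    for (size_t i = 0; i < n; i++)
      sum += a[i] + b[i];
  }

  return sum;
}
```

Calling scale_offset_sum on a = {1, 2, 3, 4} and b = {0, 0, 0, 0} leaves
a doubled and b incremented in place. The sketch deliberately does not
call acc_set_cuda_stream, since that needs a real CUDA stream handle and
hence an nvptx-enabled toolchain.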