Message ID | e1cccd27-4f07-6123-e9ee-82be74002c08@mentor.com |
---|---|
Headers | show |
Series | Async re-work | expand |
Hi Chung-Lin! Happy New Year now to you, too! :-) On Tue, 22 Jan 2019 22:52:09 +0800, Chung-Lin Tang <chunglin_tang@mentor.com> wrote: > Hi, this is a rebase to current trunk and re-submission of the OpenACC Async > re-organization work, aiming to commit when stage1 re-opens. Thanks! > This is technically > the 2nd time I'm sending this whole patch series, but because I've named > partial revisions up to v4 by now, for clarity I will just call this entire set "v5". As far as I'm concerned, these patches should all (with a few exceptions to be split out, see below) be merged into one patch, because they logically all belong together, as one piece: "async re-work". > Thomas, I hope I resolved all discussed issues in this current patch set. Please > kindly remind if I missed anything, as there were so many emails to re-check :) I'm still waiting for you to commit the PR87924 "OpenACC wait clauses without async-arguments" changes, as a prerequisite to this re-work, <https://gcc.gnu.org/ml/gcc-patches/2019-01/msg01282.html>. If we agree that we actually need such a thing (I'll have to re-read Jakub's comments), please submit the 'GOMP_PLUGIN_IF_VERSION' changes separately, with 'GOMP_PLUGIN_IF_VERSION' equal to 'GOMP_VERSION' (initially). As this then is only a kind of documentation update, this might then go into trunk right now -- and even if not right now, should still be done separately as a prerequisite patch to this re-work, which will then just increment 'GOMP_PLUGIN_IF_VERSION'. Maybe rename 'GOMP_PLUGIN_IF_VERSION' to 'GOMP_PLUGIN_VERSION', for similarity with 'GOMP_VERSION'? And, it's then a bit confusing that 'GOMP_PLUGIN_VERSION' is returned from 'GOMP_OFFLOAD_version' functions (plus 'host_version'); we there got "plugin" vs. "offload". But I suppose we'll just live with that? The 'GOMP_OFFLOAD_version' functions should then also get their source code comments updated: "libgomp [plugin] version"? Now, back to the actual async re-work. I see you've incorporated some of the incremental patches I provided (thanks!), but not all of them. I don't know if you just missed (some of) these, or actually object? I had requested that the OpenACC 2.5 'default_async' changes be discussed separately, after this re-work has gone in, so please remove these changes from this patch series. I've again attached "into async re-work: revert default_async changes". I had provided changes, "into async re-work: don't create an asyncqueue just to then test/synchronize with it", again attached. I had asked that you 'Please especially review the "libgomp/oacc-parallel.c:goacc_wait" change, and confirm no corresponding "libgomp/oacc-parallel.c:GOACC_wait" change to be done, because that code is structured differently'. I had requested that we maintain the current behavior, that "acc_async_noval" stays in its own, separate asyncqueue, instead of aliasing it to 'async(0)'. I had proposed "into async re-work: libgomp/oacc-async.c:async2id", again attached. You said you don't like the 'async2id' function I'm adding there (I still don't understand why), so I assume you'd then implement this async-argument to queue ID translation in 'lookup_goacc_asyncqueue' proper? I had provided "[WIP] into async re-work: documentation", again attached, as 'A little bit of documentation starter update for you to include. Please make sure that all relevant functions have such comments addded'. I'm again attaching my changes 'into async re-work: replicate "[PR88407] [OpenACC] Correctly handle unseen async-arguments"', which -- I suppose -- are necessary to maintain the current GCC trunk behavior (that is, avoid testsuite regressions). I'm again attaching my changes 'into async re-work: replicate "[PR88370] acc_get_cuda_stream/acc_set_cuda_stream: acc_async_sync, acc_async_noval"', which -- I suppose -- are necessary to maintain the current GCC trunk behavior (that is, avoid testsuite regressions). I'm again attaching my changes 'into async re-work: adjust for test case added in "[PR88484] OpenACC wait directive without wait argument but with async clause"', which -- I suppose -- are necessary to maintain the current GCC trunk behavior (that is, avoid testsuite regressions). You suggested that "Instead of fixing it here, will it make more sense to have the serialize_func hook to accommodate the NULL asyncqueue?", to which I said "Sure, that may make sense, yes. Right: if there's no asyncqueue to serialize with, then serialize/synchronize with the local (host) thread", but this has not yet been implemented, as far as I can tell. I'm again attaching my changes 'into async re-work: don't synchronize with the local thread unless actually necessary', which is the behavior that makes most sense to me, and I had asked 'Would you please review the "TODO" comments, and again also especially review the "libgomp/oacc-parallel.c:goacc_wait" change, and confirm no corresponding "libgomp/oacc-parallel.c:GOACC_wait" change to be done, because that code is structured differently'. By means of a "TODO" comment that I added, I had asked you to verify in your 'libgomp/oacc-parallel.c:GOACC_enter_exit_data' translation from 'async_set_async_func' function call to 'async' formal parameter, whether/why one case deliberately has not been converted; again attaching this remaining piece of 'into async re-work: more async function usage'. > The more detailed descriptions are in the individual patch submissions. I'll respond to a few items individually, later on. Grüße Thomas From da84271ae48b0d6882be3304b8b97028e08158a1 Mon Sep 17 00:00:00 2001 From: Thomas Schwinge <thomas@codesourcery.com> Date: Thu, 6 Dec 2018 15:57:46 +0100 Subject: [PATCH 1/9] into async re-work: revert default_async changes --- include/gomp-constants.h | 1 - libgomp/libgomp.map | 4 - libgomp/oacc-async.c | 19 +- libgomp/oacc-init.c | 2 - libgomp/oacc-int.h | 3 - libgomp/openacc.f90 | 22 +- libgomp/openacc.h | 3 - libgomp/openacc_lib.h | 13 - .../libgomp.oacc-c-c++-common/asyncwait-2.c | 904 ------------------ 9 files changed, 2 insertions(+), 969 deletions(-) delete mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/asyncwait-2.c diff --git a/include/gomp-constants.h b/include/gomp-constants.h index e37f1f9e9e4..f1e2ca3c75c 100644 --- a/include/gomp-constants.h +++ b/include/gomp-constants.h @@ -160,7 +160,6 @@ enum gomp_map_kind /* Asynchronous behavior. Keep in sync with libgomp/{openacc.h,openacc.f90,openacc_lib.h}:acc_async_t. */ -#define GOMP_ASYNC_DEFAULT 0 #define GOMP_ASYNC_NOVAL -1 #define GOMP_ASYNC_SYNC -2 diff --git a/libgomp/libgomp.map b/libgomp/libgomp.map index 8feec91f5a3..d8e2fd1818b 100644 --- a/libgomp/libgomp.map +++ b/libgomp/libgomp.map @@ -464,12 +464,8 @@ OACC_2.5 { acc_delete_finalize_async_32_h_; acc_delete_finalize_async_64_h_; acc_delete_finalize_async_array_h_; - acc_get_default_async; - acc_get_default_async_h_; acc_memcpy_from_device_async; acc_memcpy_to_device_async; - acc_set_default_async; - acc_set_default_async_h_; acc_update_device_async; acc_update_device_async_32_h_; acc_update_device_async_64_h_; diff --git a/libgomp/oacc-async.c b/libgomp/oacc-async.c index 00bed7452af..ea5ae542ac3 100644 --- a/libgomp/oacc-async.c +++ b/libgomp/oacc-async.c @@ -60,7 +60,7 @@ lookup_goacc_asyncqueue (struct goacc_thread *thr, bool create, int async) /* The special value acc_async_noval (-1) maps to the thread-specific default async stream. */ if (async == acc_async_noval) - async = thr->default_async; + async = 0; //TODO thr->default_async; if (async == acc_async_sync) return NULL; @@ -246,23 +246,6 @@ acc_wait_all_async (int async) gomp_fatal ("wait all async(%d) failed", async); } -int -acc_get_default_async (void) -{ - struct goacc_thread *thr = get_goacc_thread (); - return thr->default_async; -} - -void -acc_set_default_async (int async) -{ - if (async < acc_async_sync) - gomp_fatal ("invalid async argument: %d", async); - - struct goacc_thread *thr = get_goacc_thread (); - thr->default_async = async; -} - attribute_hidden void goacc_async_free (struct gomp_device_descr *devicep, struct goacc_asyncqueue *aq, void *ptr) diff --git a/libgomp/oacc-init.c b/libgomp/oacc-init.c index a561792b243..28471e40ba0 100644 --- a/libgomp/oacc-init.c +++ b/libgomp/oacc-init.c @@ -426,8 +426,6 @@ goacc_attach_host_thread_to_device (int ord) thr->target_tls = acc_dev->openacc.create_thread_data_func (ord); - - thr->default_async = acc_async_default; } /* OpenACC 2.0a (3.2.12, 3.2.13) doesn't specify whether the serialization of diff --git a/libgomp/oacc-int.h b/libgomp/oacc-int.h index b343a06472e..e4b6ea6b7db 100644 --- a/libgomp/oacc-int.h +++ b/libgomp/oacc-int.h @@ -73,9 +73,6 @@ struct goacc_thread /* Target-specific data (used by plugin). */ void *target_tls; - - /* Default OpenACC async queue for current thread, exported to plugin. */ - int default_async; }; #if defined HAVE_TLS || defined USE_EMUTLS diff --git a/libgomp/openacc.f90 b/libgomp/openacc.f90 index 971c16f6239..bc205453f82 100644 --- a/libgomp/openacc.f90 +++ b/libgomp/openacc.f90 @@ -51,10 +51,9 @@ module openacc_kinds integer, parameter :: acc_handle_kind = int32 - public :: acc_async_default, acc_async_noval, acc_async_sync + public :: acc_async_noval, acc_async_sync ! Keep in sync with include/gomp-constants.h. - integer (acc_handle_kind), parameter :: acc_async_default = 0 integer (acc_handle_kind), parameter :: acc_async_noval = -1 integer (acc_handle_kind), parameter :: acc_async_sync = -2 @@ -93,16 +92,6 @@ module openacc_internal integer (acc_device_kind) d end function - subroutine acc_set_default_async_h (a) - import - integer a - end subroutine - - function acc_get_default_async_h () - import - integer acc_get_default_async_h - end function - function acc_async_test_h (a) logical acc_async_test_h integer a @@ -731,7 +720,6 @@ module openacc public :: acc_get_num_devices, acc_set_device_type, acc_get_device_type public :: acc_set_device_num, acc_get_device_num, acc_async_test - public :: acc_set_default_async, acc_get_default_async public :: acc_async_test_all public :: acc_wait, acc_async_wait, acc_wait_async public :: acc_wait_all, acc_async_wait_all, acc_wait_all_async @@ -764,14 +752,6 @@ module openacc procedure :: acc_get_device_num_h end interface - interface acc_set_default_async - procedure :: acc_set_default_async_h - end interface - - interface acc_get_default_async - procedure :: acc_get_default_async_h - end interface - interface acc_async_test procedure :: acc_async_test_h end interface diff --git a/libgomp/openacc.h b/libgomp/openacc.h index 381f74f39d0..1bbe6c90e7f 100644 --- a/libgomp/openacc.h +++ b/libgomp/openacc.h @@ -63,7 +63,6 @@ typedef enum acc_device_t { typedef enum acc_async_t { /* Keep in sync with include/gomp-constants.h. */ - acc_async_default = 0, acc_async_noval = -1, acc_async_sync = -2 } acc_async_t; @@ -73,8 +72,6 @@ void acc_set_device_type (acc_device_t) __GOACC_NOTHROW; acc_device_t acc_get_device_type (void) __GOACC_NOTHROW; void acc_set_device_num (int, acc_device_t) __GOACC_NOTHROW; int acc_get_device_num (acc_device_t) __GOACC_NOTHROW; -void acc_set_default_async (int) __GOACC_NOTHROW; -int acc_get_default_async (void) __GOACC_NOTHROW; int acc_async_test (int) __GOACC_NOTHROW; int acc_async_test_all (void) __GOACC_NOTHROW; void acc_wait (int) __GOACC_NOTHROW; diff --git a/libgomp/openacc_lib.h b/libgomp/openacc_lib.h index 9fe47bbc48d..fbd8f5e3625 100644 --- a/libgomp/openacc_lib.h +++ b/libgomp/openacc_lib.h @@ -46,7 +46,6 @@ integer, parameter :: acc_handle_kind = 4 ! Keep in sync with include/gomp-constants.h. - integer (acc_handle_kind), parameter :: acc_async_default = 0 integer (acc_handle_kind), parameter :: acc_async_noval = -1 integer (acc_handle_kind), parameter :: acc_async_sync = -2 @@ -90,18 +89,6 @@ end function end interface - interface acc_set_default_async - subroutine acc_set_default_async_h (a) - integer a - end subroutine - end interface - - interface acc_get_default_async - function acc_get_default_async_h () - integer acc_get_default_async_h - end function - end interface - interface acc_async_test function acc_async_test_h (a) logical acc_async_test_h diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/asyncwait-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/asyncwait-2.c deleted file mode 100644 index 94205407d41..00000000000 --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/asyncwait-2.c +++ /dev/null @@ -1,904 +0,0 @@ -/* { dg-do run { target openacc_nvidia_accel_selected } } */ -/* { dg-additional-options "-lcuda" } */ - -#include <openacc.h> -#include <stdlib.h> -#include <cuda.h> - -#include <stdio.h> -#include <time.h> -#include <sys/time.h> - -int -main (int argc, char **argv) -{ - CUresult r; - CUstream stream1; - int N = 128; //1024 * 1024; - float *a, *b, *c, *d, *e; - int i; - int nbytes; - - srand (time (NULL)); - int s = rand () % 100; - - acc_init (acc_device_nvidia); - - nbytes = N * sizeof (float); - - a = (float *) malloc (nbytes); - b = (float *) malloc (nbytes); - c = (float *) malloc (nbytes); - d = (float *) malloc (nbytes); - e = (float *) malloc (nbytes); - - for (i = 0; i < N; i++) - { - a[i] = 3.0; - b[i] = 0.0; - } - - acc_set_default_async (s); - -#pragma acc data copy (a[0:N]) copy (b[0:N]) copyin (N) - { - -#pragma acc parallel async - { - int ii; - - for (ii = 0; ii < N; ii++) - b[ii] = a[ii]; - } - -#pragma acc wait - - } - - for (i = 0; i < N; i++) - { - if (a[i] != 3.0) - abort (); - - if (b[i] != 3.0) - abort (); - } - - for (i = 0; i < N; i++) - { - a[i] = 2.0; - b[i] = 0.0; - } - -#pragma acc data copy (a[0:N]) copy (b[0:N]) copyin (N) - { - -#pragma acc parallel async - { - int ii; - - for (ii = 0; ii < N; ii++) - b[ii] = a[ii]; - } - -#pragma acc wait (s) - - } - - for (i = 0; i < N; i++) - { - if (a[i] != 2.0) - abort (); - - if (b[i] != 2.0) - abort (); - } - - for (i = 0; i < N; i++) - { - a[i] = 3.0; - b[i] = 0.0; - c[i] = 0.0; - d[i] = 0.0; - } - -#pragma acc data copy (a[0:N]) copy (b[0:N]) copy (c[0:N]) copy (d[0:N]) copyin (N) - { - -#pragma acc parallel async - { - int ii; - - for (ii = 0; ii < N; ii++) - b[ii] = (a[ii] * a[ii] * a[ii]) / a[ii]; - } - -#pragma acc parallel async - { - int ii; - - for (ii = 0; ii < N; ii++) - c[ii] = (a[ii] + a[ii] + a[ii] + a[ii]) / a[ii]; - } - - -#pragma acc parallel async - { - int ii; - - for (ii = 0; ii < N; ii++) - d[ii] = ((a[ii] * a[ii] + a[ii]) / a[ii]) - a[ii]; - } - -#pragma acc wait (s) - - } - - for (i = 0; i < N; i++) - { - if (a[i] != 3.0) - abort (); - - if (b[i] != 9.0) - abort (); - - if (c[i] != 4.0) - abort (); - - if (d[i] != 1.0) - abort (); - } - - for (i = 0; i < N; i++) - { - a[i] = 2.0; - b[i] = 0.0; - c[i] = 0.0; - d[i] = 0.0; - e[i] = 0.0; - } - -#pragma acc data copy (a[0:N], b[0:N], c[0:N], d[0:N], e[0:N]) copyin (N) - { - -#pragma acc parallel async - { - int ii; - - for (ii = 0; ii < N; ii++) - b[ii] = (a[ii] * a[ii] * a[ii]) / a[ii]; - } - -#pragma acc parallel async - { - int ii; - - for (ii = 0; ii < N; ii++) - c[ii] = (a[ii] + a[ii] + a[ii] + a[ii]) / a[ii]; - } - -#pragma acc parallel async - { - int ii; - - for (ii = 0; ii < N; ii++) - d[ii] = ((a[ii] * a[ii] + a[ii]) / a[ii]) - a[ii]; - } - -#pragma acc parallel wait (s) async (s) - { - int ii; - - for (ii = 0; ii < N; ii++) - e[ii] = a[ii] + b[ii] + c[ii] + d[ii]; - } - -#pragma acc wait (s) - - } - - for (i = 0; i < N; i++) - { - if (a[i] != 2.0) - abort (); - - if (b[i] != 4.0) - abort (); - - if (c[i] != 4.0) - abort (); - - if (d[i] != 1.0) - abort (); - - if (e[i] != 11.0) - abort (); - } - - - r = cuStreamCreate (&stream1, CU_STREAM_NON_BLOCKING); - if (r != CUDA_SUCCESS) - { - fprintf (stderr, "cuStreamCreate failed: %d\n", r); - abort (); - } - - acc_set_cuda_stream (1, stream1); - - for (i = 0; i < N; i++) - { - a[i] = 5.0; - b[i] = 0.0; - } - -#pragma acc data copy (a[0:N], b[0:N]) copyin (N) - { - -#pragma acc parallel async - { - int ii; - - for (ii = 0; ii < N; ii++) - b[ii] = a[ii]; - } - -#pragma acc wait (s) - - } - - for (i = 0; i < N; i++) - { - if (a[i] != 5.0) - abort (); - - if (b[i] != 5.0) - abort (); - } - - for (i = 0; i < N; i++) - { - a[i] = 7.0; - b[i] = 0.0; - c[i] = 0.0; - d[i] = 0.0; - } - -#pragma acc data copy (a[0:N]) copy (b[0:N]) copy (c[0:N]) copy (d[0:N]) copyin (N) - { - -#pragma acc parallel async - { - int ii; - - for (ii = 0; ii < N; ii++) - b[ii] = (a[ii] * a[ii] * a[ii]) / a[ii]; - } - -#pragma acc parallel async - { - int ii; - - for (ii = 0; ii < N; ii++) - c[ii] = (a[ii] + a[ii] + a[ii] + a[ii]) / a[ii]; - } - -#pragma acc parallel async - { - int ii; - - for (ii = 0; ii < N; ii++) - d[ii] = ((a[ii] * a[ii] + a[ii]) / a[ii]) - a[ii]; - } - -#pragma acc wait (s) - - } - - for (i = 0; i < N; i++) - { - if (a[i] != 7.0) - abort (); - - if (b[i] != 49.0) - abort (); - - if (c[i] != 4.0) - abort (); - - if (d[i] != 1.0) - abort (); - } - - for (i = 0; i < N; i++) - { - a[i] = 3.0; - b[i] = 0.0; - c[i] = 0.0; - d[i] = 0.0; - e[i] = 0.0; - } - -#pragma acc data copy (a[0:N], b[0:N], c[0:N], d[0:N], e[0:N]) copyin (N) - { - -#pragma acc parallel async - { - int ii; - - for (ii = 0; ii < N; ii++) - b[ii] = (a[ii] * a[ii] * a[ii]) / a[ii]; - } - -#pragma acc parallel async - { - int ii; - - for (ii = 0; ii < N; ii++) - c[ii] = (a[ii] + a[ii] + a[ii] + a[ii]) / a[ii]; - } - -#pragma acc parallel async - { - int ii; - - for (ii = 0; ii < N; ii++) - d[ii] = ((a[ii] * a[ii] + a[ii]) / a[ii]) - a[ii]; - } - -#pragma acc parallel wait (s) async (s) - { - int ii; - - for (ii = 0; ii < N; ii++) - e[ii] = a[ii] + b[ii] + c[ii] + d[ii]; - } - -#pragma acc wait (s) - - } - - for (i = 0; i < N; i++) - { - if (a[i] != 3.0) - abort (); - - if (b[i] != 9.0) - abort (); - - if (c[i] != 4.0) - abort (); - - if (d[i] != 1.0) - abort (); - - if (e[i] != 17.0) - abort (); - } - - for (i = 0; i < N; i++) - { - a[i] = 4.0; - b[i] = 0.0; - c[i] = 0.0; - d[i] = 0.0; - e[i] = 0.0; - } - -#pragma acc data copyin (a[0:N], b[0:N], c[0:N]) copyin (N) - { - -#pragma acc parallel async - { - int ii; - - for (ii = 0; ii < N; ii++) - b[ii] = (a[ii] * a[ii] * a[ii]) / a[ii]; - } - -#pragma acc parallel async - { - int ii; - - for (ii = 0; ii < N; ii++) - c[ii] = (a[ii] + a[ii] + a[ii] + a[ii]) / a[ii]; - } - -#pragma acc update host (a[0:N], b[0:N], c[0:N]) wait (s) - - } - - for (i = 0; i < N; i++) - { - if (a[i] != 4.0) - abort (); - - if (b[i] != 16.0) - abort (); - - if (c[i] != 4.0) - abort (); - } - - - for (i = 0; i < N; i++) - { - a[i] = 5.0; - b[i] = 0.0; - c[i] = 0.0; - d[i] = 0.0; - e[i] = 0.0; - } - -#pragma acc data copyin (a[0:N], b[0:N], c[0:N]) copyin (N) - { - -#pragma acc parallel async - { - int ii; - - for (ii = 0; ii < N; ii++) - b[ii] = (a[ii] * a[ii] * a[ii]) / a[ii]; - } - -#pragma acc parallel async - { - int ii; - - for (ii = 0; ii < N; ii++) - c[ii] = (a[ii] + a[ii] + a[ii] + a[ii]) / a[ii]; - } - -#pragma acc update host (a[0:N], b[0:N], c[0:N]) async - -#pragma acc wait (s) - - } - - for (i = 0; i < N; i++) - { - if (a[i] != 5.0) - abort (); - - if (b[i] != 25.0) - abort (); - - if (c[i] != 4.0) - abort (); - } - - for (i = 0; i < N; i++) - { - a[i] = 3.0; - b[i] = 0.0; - } - -#pragma acc data copy (a[0:N]) copy (b[0:N]) copyin (N) - { - -#pragma acc kernels async - { - int ii; - - for (ii = 0; ii < N; ii++) - b[ii] = a[ii]; - } - -#pragma acc wait - - } - - for (i = 0; i < N; i++) - { - if (a[i] != 3.0) - abort (); - - if (b[i] != 3.0) - abort (); - } - - for (i = 0; i < N; i++) - { - a[i] = 2.0; - b[i] = 0.0; - } - -#pragma acc data copy (a[0:N]) copy (b[0:N]) copyin (N) - { - -#pragma acc kernels async - { - int ii; - - for (ii = 0; ii < N; ii++) - b[ii] = a[ii]; - } - -#pragma acc wait (s) - - } - - for (i = 0; i < N; i++) - { - if (a[i] != 2.0) - abort (); - - if (b[i] != 2.0) - abort (); - } - - for (i = 0; i < N; i++) - { - a[i] = 3.0; - b[i] = 0.0; - c[i] = 0.0; - d[i] = 0.0; - } - -#pragma acc data copy (a[0:N]) copy (b[0:N]) copy (c[0:N]) copy (d[0:N]) copyin (N) - { - -#pragma acc kernels async - { - int ii; - - for (ii = 0; ii < N; ii++) - b[ii] = (a[ii] * a[ii] * a[ii]) / a[ii]; - } - -#pragma acc kernels async - { - int ii; - - for (ii = 0; ii < N; ii++) - c[ii] = (a[ii] + a[ii] + a[ii] + a[ii]) / a[ii]; - } - - -#pragma acc kernels async - { - int ii; - - for (ii = 0; ii < N; ii++) - d[ii] = ((a[ii] * a[ii] + a[ii]) / a[ii]) - a[ii]; - } - -#pragma acc wait (s) - - } - - for (i = 0; i < N; i++) - { - if (a[i] != 3.0) - abort (); - - if (b[i] != 9.0) - abort (); - - if (c[i] != 4.0) - abort (); - - if (d[i] != 1.0) - abort (); - } - - for (i = 0; i < N; i++) - { - a[i] = 2.0; - b[i] = 0.0; - c[i] = 0.0; - d[i] = 0.0; - e[i] = 0.0; - } - -#pragma acc data copy (a[0:N], b[0:N], c[0:N], d[0:N], e[0:N]) copyin (N) - { - -#pragma acc kernels async - { - int ii; - - for (ii = 0; ii < N; ii++) - b[ii] = (a[ii] * a[ii] * a[ii]) / a[ii]; - } - -#pragma acc kernels async - { - int ii; - - for (ii = 0; ii < N; ii++) - c[ii] = (a[ii] + a[ii] + a[ii] + a[ii]) / a[ii]; - } - -#pragma acc kernels async - { - int ii; - - for (ii = 0; ii < N; ii++) - d[ii] = ((a[ii] * a[ii] + a[ii]) / a[ii]) - a[ii]; - } - -#pragma acc kernels wait (s) async (s) - { - int ii; - - for (ii = 0; ii < N; ii++) - e[ii] = a[ii] + b[ii] + c[ii] + d[ii]; - } - -#pragma acc wait (s) - - } - - for (i = 0; i < N; i++) - { - if (a[i] != 2.0) - abort (); - - if (b[i] != 4.0) - abort (); - - if (c[i] != 4.0) - abort (); - - if (d[i] != 1.0) - abort (); - - if (e[i] != 11.0) - abort (); - } - - - r = cuStreamCreate (&stream1, CU_STREAM_NON_BLOCKING); - if (r != CUDA_SUCCESS) - { - fprintf (stderr, "cuStreamCreate failed: %d\n", r); - abort (); - } - - acc_set_cuda_stream (1, stream1); - - for (i = 0; i < N; i++) - { - a[i] = 5.0; - b[i] = 0.0; - } - -#pragma acc data copy (a[0:N], b[0:N]) copyin (N) - { - -#pragma acc kernels async - { - int ii; - - for (ii = 0; ii < N; ii++) - b[ii] = a[ii]; - } - -#pragma acc wait (s) - - } - - for (i = 0; i < N; i++) - { - if (a[i] != 5.0) - abort (); - - if (b[i] != 5.0) - abort (); - } - - for (i = 0; i < N; i++) - { - a[i] = 7.0; - b[i] = 0.0; - c[i] = 0.0; - d[i] = 0.0; - } - -#pragma acc data copy (a[0:N]) copy (b[0:N]) copy (c[0:N]) copy (d[0:N]) copyin (N) - { - -#pragma acc kernels async - { - int ii; - - for (ii = 0; ii < N; ii++) - b[ii] = (a[ii] * a[ii] * a[ii]) / a[ii]; - } - -#pragma acc kernels async - { - int ii; - - for (ii = 0; ii < N; ii++) - c[ii] = (a[ii] + a[ii] + a[ii] + a[ii]) / a[ii]; - } - -#pragma acc kernels async - { - int ii; - - for (ii = 0; ii < N; ii++) - d[ii] = ((a[ii] * a[ii] + a[ii]) / a[ii]) - a[ii]; - } - -#pragma acc wait (s) - - } - - for (i = 0; i < N; i++) - { - if (a[i] != 7.0) - abort (); - - if (b[i] != 49.0) - abort (); - - if (c[i] != 4.0) - abort (); - - if (d[i] != 1.0) - abort (); - } - - for (i = 0; i < N; i++) - { - a[i] = 3.0; - b[i] = 0.0; - c[i] = 0.0; - d[i] = 0.0; - e[i] = 0.0; - } - -#pragma acc data copy (a[0:N], b[0:N], c[0:N], d[0:N], e[0:N]) copyin (N) - { - -#pragma acc kernels async - { - int ii; - - for (ii = 0; ii < N; ii++) - b[ii] = (a[ii] * a[ii] * a[ii]) / a[ii]; - } - -#pragma acc kernels async - { - int ii; - - for (ii = 0; ii < N; ii++) - c[ii] = (a[ii] + a[ii] + a[ii] + a[ii]) / a[ii]; - } - -#pragma acc kernels async - { - int ii; - - for (ii = 0; ii < N; ii++) - d[ii] = ((a[ii] * a[ii] + a[ii]) / a[ii]) - a[ii]; - } - -#pragma acc kernels wait (s) async (s) - { - int ii; - - for (ii = 0; ii < N; ii++) - e[ii] = a[ii] + b[ii] + c[ii] + d[ii]; - } - -#pragma acc wait (s) - - } - - for (i = 0; i < N; i++) - { - if (a[i] != 3.0) - abort (); - - if (b[i] != 9.0) - abort (); - - if (c[i] != 4.0) - abort (); - - if (d[i] != 1.0) - abort (); - - if (e[i] != 17.0) - abort (); - } - - for (i = 0; i < N; i++) - { - a[i] = 4.0; - b[i] = 0.0; - c[i] = 0.0; - d[i] = 0.0; - e[i] = 0.0; - } - -#pragma acc data copyin (a[0:N], b[0:N], c[0:N]) copyin (N) - { - -#pragma acc kernels async - { - int ii; - - for (ii = 0; ii < N; ii++) - b[ii] = (a[ii] * a[ii] * a[ii]) / a[ii]; - } - -#pragma acc kernels async - { - int ii; - - for (ii = 0; ii < N; ii++) - c[ii] = (a[ii] + a[ii] + a[ii] + a[ii]) / a[ii]; - } - -#pragma acc update host (a[0:N], b[0:N], c[0:N]) wait (s) - - } - - for (i = 0; i < N; i++) - { - if (a[i] != 4.0) - abort (); - - if (b[i] != 16.0) - abort (); - - if (c[i] != 4.0) - abort (); - } - - - for (i = 0; i < N; i++) - { - a[i] = 5.0; - b[i] = 0.0; - c[i] = 0.0; - d[i] = 0.0; - e[i] = 0.0; - } - -#pragma acc data copyin (a[0:N], b[0:N], c[0:N]) copyin (N) - { - -#pragma acc kernels async - { - int ii; - - for (ii = 0; ii < N; ii++) - b[ii] = (a[ii] * a[ii] * a[ii]) / a[ii]; - } - -#pragma acc kernels async - { - int ii; - - for (ii = 0; ii < N; ii++) - c[ii] = (a[ii] + a[ii] + a[ii] + a[ii]) / a[ii]; - } - -#pragma acc update host (a[0:N], b[0:N], c[0:N]) async - -#pragma acc wait (s) - - } - - for (i = 0; i < N; i++) - { - if (a[i] != 5.0) - abort (); - - if (b[i] != 25.0) - abort (); - - if (c[i] != 4.0) - abort (); - } - - acc_shutdown (acc_device_nvidia); - - return 0; -}