From patchwork Fri Dec 14 21:10:06 2018
X-Patchwork-Submitter: Thomas Schwinge
X-Patchwork-Id: 1013710
Delivered-To: mailing list gcc-patches@gcc.gnu.org
From: Thomas Schwinge
CC: Chung-Lin Tang
Subject: Revise libgomp.oacc-c-c++-common/data-2-lib.c, libgomp.oacc-c-c++-common/data-2.c
Date: Fri, 14 Dec 2018 22:10:06 +0100
Message-ID: <87pnu3rf0x.fsf@euler.schwinge.homeip.net>

Hi!

Committed to trunk in r267149:

commit 1d61d32a5dda2b567f2253284ce3ecf40c253fab
Author: tschwinge
Date:   Fri Dec 14 20:42:29 2018 +0000

    Revise libgomp.oacc-c-c++-common/data-2-lib.c, libgomp.oacc-c-c++-common/data-2.c
    
    These are meant to be functionally equivalent (but no longer are), just using
    different means.  Also, use the OpenACC "*_async" functions recently added.
    
    	libgomp/
    	* testsuite/libgomp.oacc-c-c++-common/data-2-lib.c: Revise.
    	* testsuite/libgomp.oacc-c-c++-common/data-2.c: Likewise.
    
    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@267149 138bc75d-0d04-0410-961f-82ee72b054a4
---
 libgomp/ChangeLog                                  |   5 +
 .../libgomp.oacc-c-c++-common/data-2-lib.c         | 129 ++++++++----------
 .../testsuite/libgomp.oacc-c-c++-common/data-2.c   | 148 +++++++++------------
 3 files changed, 125 insertions(+), 157 deletions(-)


Grüße
 Thomas


diff --git libgomp/ChangeLog libgomp/ChangeLog
index b6cbb34908a2..d84c3f4bfe2e 100644
--- libgomp/ChangeLog
+++ libgomp/ChangeLog
@@ -1,3 +1,8 @@
+2018-12-14  Thomas Schwinge
+
+	* testsuite/libgomp.oacc-c-c++-common/data-2-lib.c: Revise.
+	* testsuite/libgomp.oacc-c-c++-common/data-2.c: Likewise.
+
 2018-12-14  Chung-Lin Tang
 
 	* testsuite/libgomp.oacc-c-c++-common/data-2-lib.c: Adjust.
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/data-2-lib.c libgomp/testsuite/libgomp.oacc-c-c++-common/data-2-lib.c
index f553d3d839c5..e432f8d9c796 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/data-2-lib.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/data-2-lib.c
@@ -1,16 +1,15 @@
-/* This test is similar to data-2.c, but it uses acc_* library functions
-   to move data.  */
-
-/* { dg-do run } */
+/* Test asynchronous, unstructed data regions, runtime library variant.  */
+/* See also data-2.c.  */
 
 #include
+#undef NDEBUG
 #include
 #include
 
 int
 main (int argc, char **argv)
 {
-  int N = 128; //1024 * 1024;
+  int N = 12345;
   float *a, *b, *c, *d, *e;
   void *d_a, *d_b, *d_c, *d_d;
   int i;
@@ -30,19 +29,21 @@ main (int argc, char **argv)
       b[i] = 0.0;
     }
 
-  d_a = acc_copyin (a, nbytes);
-  d_b = acc_copyin (b, nbytes);
-  acc_copyin (&N, sizeof (int));
+  acc_copyin_async (a, nbytes, acc_async_noval);
+  acc_copyin_async (b, nbytes, acc_async_noval);
+  acc_copyin_async (&N, sizeof (int), acc_async_noval);
 
-#pragma acc parallel present (a[0:N], b[0:N], N) async wait
+#pragma acc parallel present (a[0:N], b[0:N], N) async
 #pragma acc loop
   for (i = 0; i < N; i++)
     b[i] = a[i];
 
-  acc_wait_all ();
+  d_a = acc_deviceptr (a);
+  acc_memcpy_from_device_async (a, d_a, nbytes, acc_async_noval);
+  d_b = acc_deviceptr (b);
+  acc_memcpy_from_device_async (b, d_b, nbytes, acc_async_noval);
 
-  acc_memcpy_from_device (a, d_a, nbytes);
-  acc_memcpy_from_device (b, d_b, nbytes);
+  acc_wait (acc_async_noval);
 
   for (i = 0; i < N; i++)
     {
@@ -56,19 +57,19 @@ main (int argc, char **argv)
       b[i] = 0.0;
     }
 
-  acc_update_device (a, nbytes);
-  acc_update_device (b, nbytes);
+  acc_update_device_async (a, nbytes, 1);
+  acc_update_device_async (b, nbytes, 1);
 
-#pragma acc parallel present (a[0:N], b[0:N], N) async (1)
+#pragma acc parallel present (a[0:N], b[0:N], N) async (1)
 #pragma acc loop
   for (i = 0; i < N; i++)
     b[i] = a[i];
 
+  acc_memcpy_from_device_async (a, d_a, nbytes, 1);
+  acc_memcpy_from_device_async (b, d_b, nbytes, 1);
+
   acc_wait (1);
 
-  acc_memcpy_from_device (a, d_a, nbytes);
-  acc_memcpy_from_device (b, d_b, nbytes);
-
   for (i = 0; i < N; i++)
     {
       assert (a[i] == 2.0);
@@ -83,46 +84,42 @@ main (int argc, char **argv)
       d[i] = 0.0;
     }
 
-  acc_update_device (a, nbytes);
-  acc_update_device (b, nbytes);
-  d_c = acc_copyin (c, nbytes);
-  d_d = acc_copyin (d, nbytes);
+  acc_update_device_async (a, nbytes, 0);
+  acc_update_device_async (b, nbytes, 1);
+  acc_copyin_async (c, nbytes, 2);
+  acc_copyin_async (d, nbytes, 3);
 
-#pragma acc parallel present (a[0:N], b[0:N], N) async (1)
+#pragma acc parallel present (a[0:N], b[0:N], N) wait (0) async (1)
 #pragma acc loop
   for (i = 0; i < N; i++)
     b[i] = (a[i] * a[i] * a[i]) / a[i];
 
-#pragma acc parallel present (a[0:N], c[0:N], N) async (2)
+#pragma acc parallel present (a[0:N], c[0:N], N) wait (0) async (2)
 #pragma acc loop
   for (i = 0; i < N; i++)
     c[i] = (a[i] + a[i] + a[i] + a[i]) / a[i];
 
-#pragma acc parallel present (a[0:N], d[0:N], N) async (3)
+#pragma acc parallel present (a[0:N], d[0:N], N) wait (0) async (3)
 #pragma acc loop
   for (i = 0; i < N; i++)
     d[i] = ((a[i] * a[i] + a[i]) / a[i]) - a[i];
 
-  acc_wait_all ();
+  acc_memcpy_from_device_async (a, d_a, nbytes, 0);
+  acc_memcpy_from_device_async (b, d_b, nbytes, 1);
+  d_c = acc_deviceptr (c);
+  acc_memcpy_from_device_async (c, d_c, nbytes, 2);
+  d_d = acc_deviceptr (d);
+  acc_memcpy_from_device_async (d, d_d, nbytes, 3);
 
-  acc_memcpy_from_device (a, d_a, nbytes);
-  acc_memcpy_from_device (b, d_b, nbytes);
-  acc_memcpy_from_device (c, d_c, nbytes);
-  acc_memcpy_from_device (d, d_d, nbytes);
+  acc_wait_all_async (0);
+  acc_wait (0);
 
   for (i = 0; i < N; i++)
     {
-      if (a[i] != 3.0)
-	abort ();
-
-      if (b[i] != 9.0)
-	abort ();
-
-      if (c[i] != 4.0)
-	abort ();
-
-      if (d[i] != 1.0)
-	abort ();
+      assert (a[i] == 3.0);
+      assert (b[i] == 9.0);
+      assert (c[i] == 4.0);
+      assert (d[i] == 1.0);
     }
 
   for (i = 0; i < N; i++)
@@ -134,53 +131,43 @@ main (int argc, char **argv)
       e[i] = 0.0;
     }
 
-  acc_update_device (a, nbytes);
-  acc_update_device (b, nbytes);
-  acc_update_device (c, nbytes);
-  acc_update_device (d, nbytes);
-  acc_copyin (e, nbytes);
+  acc_update_device_async (a, nbytes, 10);
+  acc_update_device_async (b, nbytes, 11);
+  acc_update_device_async (c, nbytes, 12);
+  acc_update_device_async (d, nbytes, 13);
+  acc_copyin_async (e, nbytes, 14);
 
-#pragma acc parallel present (a[0:N], b[0:N], N) async (1)
+#pragma acc parallel present (a[0:N], b[0:N], N) wait (10) async (11)
   for (int ii = 0; ii < N; ii++)
     b[ii] = (a[ii] * a[ii] * a[ii]) / a[ii];
 
-#pragma acc parallel present (a[0:N], c[0:N], N) async (2)
+#pragma acc parallel present (a[0:N], c[0:N], N) wait (10) async (12)
   for (int ii = 0; ii < N; ii++)
     c[ii] = (a[ii] + a[ii] + a[ii] + a[ii]) / a[ii];
 
-#pragma acc parallel present (a[0:N], d[0:N], N) async (3)
+#pragma acc parallel present (a[0:N], d[0:N], N) wait (10) async (13)
   for (int ii = 0; ii < N; ii++)
     d[ii] = ((a[ii] * a[ii] + a[ii]) / a[ii]) - a[ii];
 
-#pragma acc parallel present (a[0:N], b[0:N], c[0:N], d[0:N], e[0:N], N) \
-  wait (1, 2, 3) async (4)
+#pragma acc parallel present (a[0:N], b[0:N], c[0:N], d[0:N], e[0:N], N) wait (11) wait (12) wait (13) async (14)
   for (int ii = 0; ii < N; ii++)
     e[ii] = a[ii] + b[ii] + c[ii] + d[ii];
 
+  acc_copyout_async (a, nbytes, 10);
+  acc_copyout_async (b, nbytes, 11);
+  acc_copyout_async (c, nbytes, 12);
+  acc_copyout_async (d, nbytes, 13);
+  acc_copyout_async (e, nbytes, 14);
+  acc_delete_async (&N, sizeof (int), 15);
   acc_wait_all ();
-  acc_copyout (a, nbytes);
-  acc_copyout (b, nbytes);
-  acc_copyout (c, nbytes);
-  acc_copyout (d, nbytes);
-  acc_copyout (e, nbytes);
-  acc_delete (&N, sizeof (int));
 
   for (i = 0; i < N; i++)
     {
-      if (a[i] != 2.0)
-	abort ();
-
-      if (b[i] != 4.0)
-	abort ();
-
-      if (c[i] != 4.0)
-	abort ();
-
-      if (d[i] != 1.0)
-	abort ();
-
-      if (e[i] != 11.0)
-	abort ();
+      assert (a[i] == 2.0);
+      assert (b[i] == 4.0);
+      assert (c[i] == 4.0);
+      assert (d[i] == 1.0);
+      assert (e[i] == 11.0);
     }
 
   return 0;
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/data-2.c libgomp/testsuite/libgomp.oacc-c-c++-common/data-2.c
index 81d623afa0ea..c0f36d3be6ba 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/data-2.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/data-2.c
@@ -1,14 +1,14 @@
-/* Test 'acc enter/exit data' regions.  */
-
-/* { dg-do run } */
-/* { dg-xfail-run-if "TODO" { openacc_nvidia_accel_selected } } */
+/* Test asynchronous, unstructed data regions, directives variant.  */
+/* See also data-2-lib.c.  */
 
 #include
+#undef NDEBUG
+#include
 
 int
 main (int argc, char **argv)
 {
-  int N = 128; //1024 * 1024;
+  int N = 12345;
   float *a, *b, *c, *d, *e;
   int i;
   int nbytes;
@@ -27,48 +27,24 @@ main (int argc, char **argv)
       b[i] = 0.0;
     }
 
-#pragma acc enter data copyin (a[0:N]) copyin (b[0:N]) copyin (N) async
-#pragma acc parallel present (a[0:N], b[0:N]) async wait
-#pragma acc loop
-  for (i = 0; i < N; i++)
-    b[i] = a[i];
-
-#pragma acc exit data copyout (a[0:N]) copyout (b[0:N]) wait async
-#pragma acc wait
-
-  for (i = 0; i < N; i++)
-    {
-      if (a[i] != 3.0)
-	abort ();
-
-      if (b[i] != 3.0)
-	abort ();
-    }
+#pragma acc enter data copyin (a[0:N]) async
+#pragma acc enter data copyin (b[0:N]) async
+#pragma acc enter data copyin (N) async
 
-  for (i = 0; i < N; i++)
-    {
-      a[i] = 3.0;
-      b[i] = 0.0;
-    }
-
-#pragma acc enter data copyin (a[0:N]) async
-#pragma acc enter data copyin (b[0:N]) async wait
-#pragma acc enter data copyin (N) async wait
-#pragma acc parallel async wait
+#pragma acc parallel present (a[0:N], b[0:N], N) async
 #pragma acc loop
   for (i = 0; i < N; i++)
     b[i] = a[i];
 
-#pragma acc exit data copyout (a[0:N]) copyout (b[0:N]) delete (N) wait async
+#pragma acc update self (a[0:N]) async
+#pragma acc update self (b[0:N]) async
+
 #pragma acc wait
 
   for (i = 0; i < N; i++)
     {
-      if (a[i] != 3.0)
-	abort ();
-
-      if (b[i] != 3.0)
-	abort ();
+      assert (a[i] == 3.0);
+      assert (b[i] == 3.0);
     }
 
   for (i = 0; i < N; i++)
@@ -77,22 +53,23 @@ main (int argc, char **argv)
       b[i] = 0.0;
     }
 
-#pragma acc enter data copyin (a[0:N]) copyin (b[0:N]) copyin (N) async (1)
-#pragma acc parallel present (a[0:N], b[0:N]) async (1)
+#pragma acc update device (a[0:N]) async (1)
+#pragma acc update device (b[0:N]) async (1)
+
+#pragma acc parallel present (a[0:N], b[0:N], N) async (1)
 #pragma acc loop
   for (i = 0; i < N; i++)
     b[i] = a[i];
 
-#pragma acc exit data copyout (a[0:N]) copyout (b[0:N]) wait (1) async (1)
+#pragma acc update self (a[0:N]) async (1)
+#pragma acc update self (b[0:N]) async (1)
+
 #pragma acc wait (1)
 
   for (i = 0; i < N; i++)
     {
-      if (a[i] != 2.0)
-	abort ();
-
-      if (b[i] != 2.0)
-	abort ();
+      assert (a[i] == 2.0);
+      assert (b[i] == 2.0);
     }
 
   for (i = 0; i < N; i++)
@@ -103,39 +80,40 @@ main (int argc, char **argv)
       d[i] = 0.0;
     }
 
-#pragma acc enter data copyin (a[0:N]) copyin (b[0:N]) copyin (c[0:N]) copyin (d[0:N]) copyin (N) async (1)
+#pragma acc update device (a[0:N]) async (0)
+#pragma acc update device (b[0:N]) async (1)
+#pragma acc enter data copyin (c[0:N]) async (2)
+#pragma acc enter data copyin (d[0:N]) async (3)
 
-#pragma acc parallel present (a[0:N], b[0:N]) async (1) wait (1)
+#pragma acc parallel present (a[0:N], b[0:N], N) wait (0) async (1)
 #pragma acc loop
   for (i = 0; i < N; i++)
     b[i] = (a[i] * a[i] * a[i]) / a[i];
 
-#pragma acc parallel present (a[0:N], c[0:N]) async (2) wait (1)
+#pragma acc parallel present (a[0:N], c[0:N], N) wait (0) async (2)
 #pragma acc loop
   for (i = 0; i < N; i++)
     c[i] = (a[i] + a[i] + a[i] + a[i]) / a[i];
 
-#pragma acc parallel present (a[0:N], d[0:N]) async (3) wait (1)
+#pragma acc parallel present (a[0:N], d[0:N], N) wait (0) async (3)
 #pragma acc loop
   for (i = 0; i < N; i++)
     d[i] = ((a[i] * a[i] + a[i]) / a[i]) - a[i];
 
-#pragma acc exit data copyout (a[0:N]) copyout (b[0:N]) copyout (c[0:N]) copyout (d[0:N]) wait (1, 2, 3) async (1)
-#pragma acc wait (1)
+#pragma acc update self (a[0:N]) async (0)
+#pragma acc update self (b[0:N]) async (1)
+#pragma acc update self (c[0:N]) async (2)
+#pragma acc update self (d[0:N]) async (3)
+
+#pragma acc wait async (0)
+#pragma acc wait (0)
 
   for (i = 0; i < N; i++)
     {
-      if (a[i] != 3.0)
-	abort ();
-
-      if (b[i] != 9.0)
-	abort ();
-
-      if (c[i] != 4.0)
-	abort ();
-
-      if (d[i] != 1.0)
-	abort ();
+      assert (a[i] == 3.0);
+      assert (b[i] == 9.0);
+      assert (c[i] == 4.0);
+      assert (d[i] == 1.0);
     }
 
   for (i = 0; i < N; i++)
@@ -147,45 +125,43 @@ main (int argc, char **argv)
       e[i] = 0.0;
     }
 
-#pragma acc enter data copyin (a[0:N]) copyin (b[0:N]) copyin (c[0:N]) copyin (d[0:N]) copyin (e[0:N]) copyin (N) async (1)
+#pragma acc update device (a[0:N]) async (10)
+#pragma acc update device (b[0:N]) async (11)
+#pragma acc update device (c[0:N]) async (12)
+#pragma acc update device (d[0:N]) async (13)
+#pragma acc enter data copyin (e[0:N]) async (14)
 
-#pragma acc parallel present (a[0:N], b[0:N]) async (1) wait (1)
+#pragma acc parallel present (a[0:N], b[0:N], N) wait (10) async (11)
   for (int ii = 0; ii < N; ii++)
     b[ii] = (a[ii] * a[ii] * a[ii]) / a[ii];
 
-#pragma acc parallel present (a[0:N], c[0:N]) async (2) wait (1)
+#pragma acc parallel present (a[0:N], c[0:N], N) wait (10) async (12)
   for (int ii = 0; ii < N; ii++)
     c[ii] = (a[ii] + a[ii] + a[ii] + a[ii]) / a[ii];
 
-#pragma acc parallel present (a[0:N], d[0:N]) async (3) wait (1)
+#pragma acc parallel present (a[0:N], d[0:N], N) wait (10) async (13)
   for (int ii = 0; ii < N; ii++)
     d[ii] = ((a[ii] * a[ii] + a[ii]) / a[ii]) - a[ii];
 
-#pragma acc parallel present (a[0:N], b[0:N], c[0:N], d[0:N], e[0:N]) \
-  wait (1, 2, 3) async (4)
+#pragma acc parallel present (a[0:N], b[0:N], c[0:N], d[0:N], e[0:N], N) wait (11) wait (12) wait (13) async (14)
   for (int ii = 0; ii < N; ii++)
     e[ii] = a[ii] + b[ii] + c[ii] + d[ii];
 
-#pragma acc exit data copyout (a[0:N]) copyout (b[0:N]) copyout (c[0:N]) \
-  copyout (d[0:N]) copyout (e[0:N]) wait (1, 2, 3, 4) async (1)
-#pragma acc wait (1)
+#pragma acc exit data copyout (a[0:N]) async (10)
+#pragma acc exit data copyout (b[0:N]) async (11)
+#pragma acc exit data copyout (c[0:N]) async (12)
+#pragma acc exit data copyout (d[0:N]) async (13)
+#pragma acc exit data copyout (e[0:N]) async (14)
+#pragma acc exit data delete (N) async (15)
+#pragma acc wait
 
   for (i = 0; i < N; i++)
     {
-      if (a[i] != 2.0)
-	abort ();
-
-      if (b[i] != 4.0)
-	abort ();
-
-      if (c[i] != 4.0)
-	abort ();
-
-      if (d[i] != 1.0)
-	abort ();
-
-      if (e[i] != 11.0)
-	abort ();
+      assert (a[i] == 2.0);
+      assert (b[i] == 4.0);
+      assert (c[i] == 4.0);
+      assert (d[i] == 1.0);
+      assert (e[i] == 11.0);
     }
 
   return 0;
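
For readers skimming the diff, the queue ordering that the revised directives variant relies on can be sketched as a standalone function. This is only an illustration and not part of the commit: the name check_async_chain and its mismatch-count return value are hypothetical, and it covers a single kernel rather than the full test. If it is built without OpenACC support, the pragmas are ignored and the loop simply runs on the host, which should produce the same values.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper, not part of the commit: returns the number of
   elements of B that do not hold the expected value.  */
int
check_async_chain (int n)
{
  float *a = (float *) malloc (n * sizeof (float));
  float *b = (float *) malloc (n * sizeof (float));
  int i, mismatches = 0;

  assert (a != NULL && b != NULL);

  for (i = 0; i < n; i++)
    {
      a[i] = 3.0f;
      b[i] = 0.0f;
    }

  /* Start both host-to-device copies on separate queues.  */
#pragma acc enter data copyin (a[0:n]) async (0)
#pragma acc enter data copyin (b[0:n]) async (1)

  /* Enqueued on queue 1, so it is ordered after the copy of B; the
     'wait (0)' additionally orders it after the copy of A.  */
#pragma acc parallel present (a[0:n], b[0:n]) wait (0) async (1)
#pragma acc loop
  for (i = 0; i < n; i++)
    b[i] = (a[i] * a[i] * a[i]) / a[i];

  /* Copy B back on the queue that produced it, then wait for all queues.  */
#pragma acc update self (b[0:n]) async (1)
#pragma acc wait

  /* Nothing is in flight any more, so the device copies can be released.  */
#pragma acc exit data delete (a[0:n], b[0:n])

  for (i = 0; i < n; i++)
    if (b[i] != 9.0f)
      mismatches++;

  free (a);
  free (b);
  return mismatches;
}
```

Calling check_async_chain with a positive element count is expected to return 0. The sketch deliberately waits on all queues before releasing the device data, which is a more conservative choice than the per-queue copyout in the test itself.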