Message ID | 20191109010421.5b8b689c@squid.athome |
---|---|
State | New |
Headers | show |
Series | OpenACC "present" subarrays: runtime API return value and unmapping fixes | expand |
Hi Julian! On 2019-11-09T01:04:21+0000, Julian Brown <julian@codesourcery.com> wrote: > This patch fixes an issue I noticed when investigating an answer > for Thomas's question about device pointer return values in: > > https://gcc.gnu.org/ml/gcc-patches/2019-10/msg02260.html > > It looks to me like the return value for the present case is wrong in > the existing code: in case of a acc_pcopyin or similar call that refers > to a subarray of a larger block already mapped on the target, the > device pointer return value will be the start of the larger block, not > of the subarray being copied. Note that I've filed <https://gcc.gnu.org/PR92511> "[OpenACC] Support subset subarray mappings", so please reference that one in the ChangeLog/commit log. Principal ACK for that problem, and it's solution ('libgomp/oacc-mem.c:present_create_copy' 'if (n)' change). Then, I was confused, because I couldn't really find wording in the OpenACC specification that explicitly permits such things. But given that, for example, in OpenACC 2.7, 3.2.20. "acc_copyin", 'acc_copyin' is described to be "equivalent to the 'enter data' directive with a 'copyin' clause", and the latter supposedly (?) does allow such "subset subarray mappings", and in 2.7.6. "copyin clause" it is said that "An 'enter data' directive with a 'copyin' clause is functionally equivalent to a call to the 'acc_copyin' API routine", that's probably motivation enough to fix the latter to conform what the former supposedly already is allowing (though not implementing by means of 'enter data copyin' just calling 'acc_copyin' etc. I see that 2.7.6. "copyin clause" also states that "The restrictions regarding subarrays in the present clause apply to this clause", which per 2.7.4. "present clause" is that "If only a subarray of an array is present in the current device memory, the 'present' clause must specify the same subarray, or a subarray that is a proper subset of the subarray in the data lifetime". From that we probably are to deduce that it's fine the other way round (as you've argued): if a subarray of an array (or, the whole array) is present in the current device memory, the 'present' clause may specify the same subarray, or a subarray that is a proper subset of the subarray in the data lifetime (my words). Unless you object to that, we shall (later) try to get the clarified/amended in the OpenACC specification. Indeed I am confirming that such subset subarray mappings do work fine with PGI 19.4 and 19.10 -- but only when using OpenACC directives, not necessarily when using OpenACC runtime library calls, huh. (That's not our problem to solve, of course, and under the assumption that my test case has actually been valid.) Later (not now), we should then also add corresponding testing for actual 'data' etc. constructs being nested in that way. > The attached patch corrects this issue, and also relaxes a restriction > on acc_delete, acc_copyout (etc.) to allow them to unmap/copyout > subarrays of a larger block already present on the target. There's no > particular reason to disallow that, as far as I can tell. (That's where PGI fails at runtime, but I have not analyzed how exactly this fails -- let's first clarify that with OpenACC Technical Committee, later on.) > This is > necessary to allow the new tests included with this patch to pass, and > a couple of existing "shouldfail" tests no longer fail, and have been > adjusted accordingly. These should then actually be removed, or re-written, because in their current form they no longer make much sense, as far as I can tell: For example, 'libgomp.oacc-c-c++-common/lib-22.c': acc_copyin (h, N); ... followed by: acc_copyout (h + 1, N - 1); ... is now meant to no longer abort with a "surrounds2" message, but instead we now expect success, and '!acc_is_present'. I'll take care of that later on -- I have some more tests to add anyway. > It's still an error to try to copy data beyond > the bounds of a mapped block, and other existing tests cover those > cases. ACK. > The calculation for the return value for the non-present case of > present_create_copy has also been adjusted in anticipation of a new > version of the above-linked patch. But please back out this one, for it's not related to this bug fix, and we shall take care of that in a later patch. (No need for you to re-post that one just for this.) > Tested with offloading to nvptx. OK for trunk? I'm see C++ compilation failures the new libgomp test cases; OK with these resolved. To record the review effort, please include "Reviewed-by: Thomas Schwinge <thomas@codesourcery.com>" in the commit log, see <https://gcc.gnu.org/wiki/Reviewed-by>. Grüße Thomas
Hi! On 2019-11-14T17:02:02+0100, I wrote: > [...] I couldn't really find wording in the > OpenACC specification that explicitly permits such things. But given > that, for example, in OpenACC 2.7, 3.2.20. "acc_copyin", 'acc_copyin' is > described to be "equivalent to the 'enter data' directive with a 'copyin' > clause", and the latter supposedly (?) does allow such "subset subarray > mappings", and in 2.7.6. "copyin clause" it is said that "An 'enter data' > directive with a 'copyin' clause is functionally equivalent to a call to > the 'acc_copyin' API routine", that's probably motivation enough to fix > the latter to conform what the former supposedly already is allowing > (though not implementing by means of 'enter data copyin' just calling > 'acc_copyin' etc. > > I see that 2.7.6. "copyin clause" also states that "The restrictions > regarding subarrays in the present clause apply to this clause", which > per 2.7.4. "present clause" is that "If only a subarray of an array is > present in the current device memory, the 'present' clause must specify > the same subarray, or a subarray that is a proper subset of the subarray > in the data lifetime". From that we probably are to deduce that it's > fine the other way round (as you've argued): if a subarray of an array > (or, the whole array) is present in the current device memory, the > 'present' clause may specify the same subarray, or a subarray that is a > proper subset of the subarray in the data lifetime (my words). Unless > you object to that, we shall (later) try to get the clarified/amended in > the OpenACC specification. I filed <https://github.com/OpenACC/openacc-spec/issues/247> "Subset subarray restrictions". > Later (not now), we should then also add corresponding testing for actual > 'data' etc. constructs being nested in that way. > On 2019-11-09T01:04:21+0000, Julian Brown <julian@codesourcery.com> wrote: >> a couple of existing "shouldfail" tests no longer fail, and have been >> adjusted accordingly. > > These should then actually be removed, or re-written, because in their > current form they no longer make much sense, as far as I can tell: > > For example, 'libgomp.oacc-c-c++-common/lib-22.c': > > acc_copyin (h, N); > > ... followed by: > > acc_copyout (h + 1, N - 1); > > ... is now meant to no longer abort with a "surrounds2" message, but > instead we now expect success, and '!acc_is_present'. > > I'll take care of that later on -- I have some more tests to add anyway. See attached '[PR92511] More testing for OpenACC "present" subarrays', committed to trunk in r279122. Grüße Thomas
commit 00607b06c8e506b0f0744a230856e1e8776633c3 Author: Julian Brown <julian@codesourcery.com> Date: Thu Nov 7 14:24:49 2019 -0800 OpenACC "present" subarrays: runtime API return value and unmapping fixes libgomp/ * oacc-mem.c (present_create_copy): Fix device pointer return value in case of "present" subarray. Use tgt->tgt_start instead of tgt->to_free in non-present/create case. (delete_copyout): Change error condition to fail only on copies outside of mapped block. Adjust error message accordingly. * testsuite/libgomp.oacc-c-c++-common/copyin-devptr-1.c: New test. * testsuite/libgomp.oacc-c-c++-common/copyin-devptr-2.c: New test. * testsuite/libgomp.oacc-c-c++-common/lib-20.c: Adjust expected error message. * testsuite/libgomp.oacc-c-c++-common/lib-23.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/lib-22.c: Allow test to pass now. * testsuite/libgomp.oacc-c-c++-common/lib-30.c: Likewise. diff --git a/libgomp/oacc-mem.c b/libgomp/oacc-mem.c index 2f271009fb8..0a41f11210c 100644 --- a/libgomp/oacc-mem.c +++ b/libgomp/oacc-mem.c @@ -535,7 +535,7 @@ present_create_copy (unsigned f, void *h, size_t s, int async) if (n) { /* Present. */ - d = (void *) (n->tgt->tgt_start + n->tgt_offset); + d = (void *) (n->tgt->tgt_start + n->tgt_offset + h - n->host_start); if (!(f & FLAG_PRESENT)) { @@ -584,7 +584,7 @@ present_create_copy (unsigned f, void *h, size_t s, int async) gomp_mutex_lock (&acc_dev->lock); - d = tgt->to_free; + d = (void *) tgt->tgt_start; tgt->prev = acc_dev->openacc.data_environ; acc_dev->openacc.data_environ = tgt; @@ -669,7 +669,6 @@ acc_pcopyin (void *h, size_t s) static void delete_copyout (unsigned f, void *h, size_t s, int async, const char *libfnname) { - size_t host_size; splay_tree_key n; void *d; struct goacc_thread *thr = goacc_thread (); @@ -703,13 +702,12 @@ delete_copyout (unsigned f, void *h, size_t s, int async, const char *libfnname) d = (void *) (n->tgt->tgt_start + n->tgt_offset + (uintptr_t) h - n->host_start); - host_size = n->host_end - n->host_start; - - if (n->host_start != (uintptr_t) h || host_size != s) + if ((uintptr_t) h < n->host_start || (uintptr_t) h + s > n->host_end) { + size_t host_size = n->host_end - n->host_start; gomp_mutex_unlock (&acc_dev->lock); - gomp_fatal ("[%p,%d] surrounds2 [%p,+%d]", - (void *) n->host_start, (int) host_size, (void *) h, (int) s); + gomp_fatal ("[%p,+%d] outside mapped block [%p,+%d]", + (void *) h, (int) s, (void *) n->host_start, (int) host_size); } if (n->refcount == REFCOUNT_INFINITY) diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/copyin-devptr-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/copyin-devptr-1.c new file mode 100644 index 00000000000..bee0b10ca7b --- /dev/null +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/copyin-devptr-1.c @@ -0,0 +1,28 @@ +/* { dg-skip-if "" { *-*-* } { "-DACC_MEM_SHARED=1" } } */ + +#include <openacc.h> +#include <stdlib.h> +#include <assert.h> +#include <stdint.h> + +int main (int argc, char* argv[]) +{ + char* myblock = malloc (1024); + int i; + void *dst; + for (i = 0; i < 1024; i++) + myblock[i] = i; + dst = acc_copyin (myblock, 1024); + for (i = 0; i < 1024; i += 256) + { + void *partdst = acc_pcopyin (&myblock[i], 256); + assert ((uintptr_t) partdst == (uintptr_t) dst + i); + } + for (i = 0; i < 1024; i += 256) + acc_delete (&myblock[i], 256); + assert (acc_is_present (myblock, 1024)); + acc_delete (myblock, 1024); + assert (!acc_is_present (myblock, 1024)); + free (myblock); + return 0; +} diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/copyin-devptr-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/copyin-devptr-2.c new file mode 100644 index 00000000000..d35ab5c4b71 --- /dev/null +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/copyin-devptr-2.c @@ -0,0 +1,35 @@ +/* { dg-skip-if "" { *-*-* } { "-DACC_MEM_SHARED=1" } } */ + +#include <openacc.h> +#include <stdlib.h> +#include <assert.h> +#include <stdint.h> + +int main (int argc, char* argv[]) +{ + char* block1 = malloc (1024); + char *block2 = malloc (1024); + char *block3 = malloc (1024); + int i; + void *dst; + for (i = 0; i < 1024; i++) + block1[i] = block2[i] = block3[i] = i; + #pragma acc data copyin(block1[0:1024]) copyin(block2[0:1024]) \ + copyin(block3[0:1024]) + { + dst = acc_deviceptr (block2); + for (i = 0; i < 1024; i += 256) + { + void *partdst = acc_pcopyin (&block2[i], 256); + assert ((uintptr_t) partdst == (uintptr_t) dst + i); + } + } + assert (acc_is_present (block2, 1024)); + for (i = 0; i < 1024; i += 256) + acc_delete (&block2[i], 256); + assert (!acc_is_present (block2, 1024)); + free (block1); + free (block2); + free (block3); + return 0; +} diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/lib-20.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/lib-20.c index 25ceb3a26af..10d3cbc5cc6 100644 --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/lib-20.c +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/lib-20.c @@ -31,5 +31,5 @@ main (int argc, char **argv) } /* { dg-output "CheCKpOInT(\n|\r\n|\r).*" } */ -/* { dg-output "\\\[\[0-9a-fA-FxX\]+,256\\\] surrounds2 \\\[\[0-9a-fA-FxX\]+,\\\+257\\\]" } */ +/* { dg-output "\\\[\[0-9a-fA-FxX\]+,\\\+257\\\] outside mapped block \\\[\[0-9a-fA-FxX\]+,\\\+256\\\]" } */ /* { dg-shouldfail "" } */ diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/lib-22.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/lib-22.c index 65ff440a528..cb32bbcb652 100644 --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/lib-22.c +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/lib-22.c @@ -31,5 +31,3 @@ main (int argc, char **argv) } /* { dg-output "CheCKpOInT(\n|\r\n|\r).*" } */ -/* { dg-output "\\\[\[0-9a-fA-FxX\]+,256\\\] surrounds2 \\\[\[0-9a-fA-FxX\]+,\\\+255\\\]" } */ -/* { dg-shouldfail "" } */ diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/lib-23.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/lib-23.c index fd4dc5971a1..b1f3e71f278 100644 --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/lib-23.c +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/lib-23.c @@ -41,5 +41,5 @@ main (int argc, char **argv) } /* { dg-output "CheCKpOInT(\n|\r\n|\r).*" } */ -/* { dg-output "\\\[\[0-9a-fA-FxX\]+,256\\\] surrounds2 \\\[\[0-9a-fA-FxX\]+,\\\+512\\\]" } */ +/* { dg-output "\\\[\[0-9a-fA-FxX\]+,\\\+512\\\] outside mapped block \\\[\[0-9a-fA-FxX\]+,\\\+256\\\]" } */ /* { dg-shouldfail "" } */ diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/lib-30.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/lib-30.c index 9bc9ecc1068..d0e5ffb0691 100644 --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/lib-30.c +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/lib-30.c @@ -28,5 +28,3 @@ main (int argc, char **argv) } /* { dg-output "CheCKpOInT(\n|\r\n|\r).*" } */ -/* { dg-output "\\\[\[0-9a-fA-FxX\]+,256\\\] surrounds2 \\\[\[0-9a-fA-FxX\]+,\\\+254\\\]" } */ -/* { dg-shouldfail "" } */