Message ID | 87y32ggz9o.fsf@euler.schwinge.homeip.net |
---|---|
State | New |
Headers | show |
Series | [PR90743] Fortran 'allocatable' with OpenACC data/OpenMP 'target' 'map' clauses | expand |
On Wed, Jun 05, 2019 at 11:25:07AM +0200, Thomas Schwinge wrote: > libgomp/ > PR fortran/90743 > * oacc-parallel.c (GOACC_parallel_keyed): Handle NULL case. > * testsuite/libgomp.fortran/target-allocatable-1.f90: New file. > * testsuite/libgomp.oacc-fortran/allocatable-1.f90: New file. > --- > libgomp/oacc-parallel.c | 9 ++- > .../libgomp.fortran/target-allocatable-1.f90 | 8 +++ > .../libgomp.oacc-fortran/allocatable-1.f90 | 70 +++++++++++++++++++ > 3 files changed, 84 insertions(+), 3 deletions(-) > create mode 100644 libgomp/testsuite/libgomp.fortran/target-allocatable-1.f90 > create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/allocatable-1.f90 > > diff --git a/libgomp/oacc-parallel.c b/libgomp/oacc-parallel.c > index e56330f6226b..0c2cfa05a438 100644 > --- a/libgomp/oacc-parallel.c > +++ b/libgomp/oacc-parallel.c > @@ -325,9 +325,12 @@ GOACC_parallel_keyed (int flags_m, void (*fn) (void *), > > devaddrs = gomp_alloca (sizeof (void *) * mapnum); > for (i = 0; i < mapnum; i++) > - devaddrs[i] = (void *) (tgt->list[i].key->tgt->tgt_start > - + tgt->list[i].key->tgt_offset > - + tgt->list[i].offset); > + if (tgt->list[i].key != NULL) > + devaddrs[i] = (void *) (tgt->list[i].key->tgt->tgt_start > + + tgt->list[i].key->tgt_offset > + + tgt->list[i].offset); > + else > + devaddrs[i] = NULL; I don't know what does OpenACC require for allocatables, so can't comment on this (and it falls under OpenACC maintainance). > + > + !$omp target map(to: a) map(tofrom: b, c, d) map(from: e) > + !$acc parallel copyin(a) copy(b, c, d) copyout(e) Is mixing OpenMP and OpenACC construct this way defined at all? I see we reject OpenMP constructs inside of OpenACC contexts, and using OpenACC constructs inside host OpenMP constructs should be generally fine too, but mixing OpenMP offloading constructs with OpenACC constructs sounds wrong. Jakub
Hi Jakub! On Wed, 5 Jun 2019 12:00:25 +0200, Jakub Jelinek <jakub@redhat.com> wrote: > On Wed, Jun 05, 2019 at 11:25:07AM +0200, Thomas Schwinge wrote: > > + !$omp target map(to: a) map(tofrom: b, c, d) map(from: e) > > + !$acc parallel copyin(a) copy(b, c, d) copyout(e) > > Is mixing OpenMP and OpenACC construct this way defined at all? It's not. I'm using this just to avoid duplicating the test case file, that is, '-fopenacc' and '-fopenmp' aren't enabled at the same time. > I see we reject OpenMP constructs inside of OpenACC contexts ACK. > and > using OpenACC constructs inside host OpenMP constructs should be generally > fine too ACK. (Should, but currently isn't.) > but mixing OpenMP offloading constructs with OpenACC constructs > sounds wrong. ACK. (Unless that gets defined by the two standards.) Grüße Thomas
On Wed, Jun 05, 2019 at 12:26:32PM +0200, Thomas Schwinge wrote: > Hi Jakub! > > On Wed, 5 Jun 2019 12:00:25 +0200, Jakub Jelinek <jakub@redhat.com> wrote: > > On Wed, Jun 05, 2019 at 11:25:07AM +0200, Thomas Schwinge wrote: > > > + !$omp target map(to: a) map(tofrom: b, c, d) map(from: e) > > > + !$acc parallel copyin(a) copy(b, c, d) copyout(e) > > > > Is mixing OpenMP and OpenACC construct this way defined at all? > > It's not. I'm using this just to avoid duplicating the test case file, > that is, '-fopenacc' and '-fopenmp' aren't enabled at the same time. I think it is better to duplicate the test, it avoids confusion. Jakub
Hi! On Wed, 05 Jun 2019 11:25:07 +0200, I wrote: > I [...] had a look at what OpenMP 5.0 > is saying about Fortran 'allocatable' in 'map' clauses [...] Will you (Jakub, and/or Fortran guys), please have a look at the attached test case. If I disable the '!$omp declare target (ar)', then this works with offloading enabled, otherwise it fails, and here is my understanding what happens. With '!$omp declare target (ar)' active, the array descriptor globally gets allocated on the device, via the 'offload_vars' handling in 'gcc/omp-offload.c' etc. Then, in 'gcc/omp-low.c:scan_sharing_clauses', the 'GOMP_MAP_TO_PSET(ar)' (which would update the array descriptor on the device after the host-side 'allocate') gets removed, because 'Global variables with "omp declare target" attribute don't need to be copied, the receiver side will use them directly'. So, on the device, we'll still have the dummy (unallocated) array descriptor, and will thus fail. But even with that behavior disabled, we'll still fail, because in 'libgomp/target.c:gomp_map_vars_internal', when processing the 'GOMP_MAP_TO_PSET(ar)', we find that 'ar' has already been mapped (globally, as mentioned above), so for 'n && n->refcount != REFCOUNT_LINK', we'll call 'gomp_map_vars_existing', and that one again won't update device-side the array descriptor, because it's not 'GOMP_MAP_ALWAYS_TO_P'. Indeed, if I use '!$omp declare target link(ar)' ('link' added), then things seem to work as expected. Unless I got something wrong, at which level do you suggest this should be fixed? Grüße Thomas ! { dg-do run } module mod implicit none integer, parameter :: n = 40 integer, allocatable :: ar(:,:,:) !$omp declare target (ar) end module mod program main use mod implicit none integer :: i integer, parameter :: ar_rank = rank(ar) integer :: ar_rank_v integer :: ar_size_v integer, dimension(ar_rank) :: ar_extent_v integer, dimension(ar_rank) :: ar_shape_v allocate (ar(n,0:n+1,-1:n+2)) !$omp target enter data map(alloc:ar) !$omp target map(from:ar_rank_v, ar_size_v, ar_extent_v, ar_shape_v) ar_rank_v = rank(ar) ar_size_v = size(ar) do i = 1, ar_rank ar_extent_v(i) = size(ar, i) end do ar_shape_v = shape(ar) !$omp end target print *, ar_rank_v if (ar_rank_v /= rank(ar)) error stop print *, ar_size_v if (ar_size_v /= size(ar)) error stop print *, ar_extent_v if (any (ar_extent_v /= shape(ar))) error stop print *, ar_shape_v if (any (ar_shape_v /= shape(ar))) error stop end program main
Hi! On Fri, 21 Jun 2019 13:46:09 +0200, I wrote: > On Wed, 05 Jun 2019 11:25:07 +0200, I wrote: > > I [...] had a look at what OpenMP 5.0 > > is saying about Fortran 'allocatable' in 'map' clauses [...] > > Will you (Jakub, and/or Fortran guys), please have a look at the attached > test case. > > If I disable the '!$omp declare target (ar)', then this works with > offloading enabled, otherwise it fails, and here is my understanding what > happens. > > With '!$omp declare target (ar)' active, the array descriptor globally > gets allocated on the device, via the 'offload_vars' handling in > 'gcc/omp-offload.c' etc. > > Then, in 'gcc/omp-low.c:scan_sharing_clauses', the 'GOMP_MAP_TO_PSET(ar)' > (which would update the array descriptor on the device after the > host-side 'allocate') gets removed, because 'Global variables with "omp > declare target" attribute don't need to be copied, the receiver side will > use them directly'. So, on the device, we'll still have the dummy > (unallocated) array descriptor, and will thus fail. > > But even with that behavior disabled, we'll still fail, because in > 'libgomp/target.c:gomp_map_vars_internal', when processing the > 'GOMP_MAP_TO_PSET(ar)', we find that 'ar' has already been mapped > (globally, as mentioned above), so for 'n && n->refcount != > REFCOUNT_LINK', we'll call 'gomp_map_vars_existing', and that one again > won't update device-side the array descriptor, because it's not > 'GOMP_MAP_ALWAYS_TO_P'. > > Indeed, if I use '!$omp declare target link(ar)' ('link' added), then > things seem to work as expected. > > Unless I got something wrong, at which level do you suggest this should > be fixed? So, handling 'declare'd 'allocatable's via the 'link' case does seem to make some sense to me -- but not yet fully thought through, for example, how that'd relate to the (default) 'to' property of '!$omp declare target [to](ar)'? Is there even something in OpenMP that we'd indeed copy data to the device (as opposed to just allocate)? Because, in OpenACC, we actually do have '!$acc declare copyin', etc. (But what would that mean with an 'allocatable'?) Anyway, I tried a WIP patch, and I'm also attaching the previous test case rewritten to add a '!$omp declare target' routine 'ar_properties' to read the device-side array properties. However, compiling that for offloading results in: [...]/target-array-properties-allocatable-declare-1-2.f90:6: error: variable 'ar' has been referenced in offloaded code but hasn't been marked to be included in the offloaded code 6 | integer, allocatable :: ar(:,:,:) | lto1: fatal error: errors during merging of translation units Same, if I use an explicit '!$omp declare target link(ar)'. (I thought that was the very point of the 'link' clause, that you could then reference such a variable in offloaded code; I shall re-read the standards again. Or, is that just another bug?) (There doesn't seem to be any Fortran libgomp test case exercising the '!$omp declare target link'?) For the record, for the corresponding OpenACC code, we get: [...]/array-properties-allocatable-declare-1-2.f90:6:0: 6 | integer, allocatable :: ar(:,:,:) | Error: 'ar' with 'link' clause used in 'routine' function [...]/array-properties-allocatable-declare-1-2.f90:6:0: Error: 'ar' with 'link' clause used in 'routine' function [...]/array-properties-allocatable-declare-1-2.f90:6:0: Error: 'ar' with 'link' clause used in 'routine' function ..., which is different -- but not better. WIP patch: diff --git gcc/fortran/trans-common.c gcc/fortran/trans-common.c index debdbd98ac0..5c2cd9b539e 100644 --- gcc/fortran/trans-common.c +++ gcc/fortran/trans-common.c @@ -452,16 +452,18 @@ build_common_decl (gfc_common_head *com, tree union_type, bool is_init) DECL_USER_ALIGN (decl) = 0; GFC_DECL_COMMON_OR_EQUIV (decl) = 1; gfc_set_decl_location (decl, &com->where); if (com->threadprivate) set_decl_tls_model (decl, decl_default_tls_model (decl)); + /* Not possible to get an 'allocatable' here, no special handling + required. */ if (com->omp_declare_target_link) DECL_ATTRIBUTES (decl) = tree_cons (get_identifier ("omp declare target link"), NULL_TREE, DECL_ATTRIBUTES (decl)); else if (com->omp_declare_target) DECL_ATTRIBUTES (decl) = tree_cons (get_identifier ("omp declare target"), NULL_TREE, DECL_ATTRIBUTES (decl)); diff --git gcc/fortran/trans-decl.c gcc/fortran/trans-decl.c index 8803525f6ae..029962f65c6 100644 --- gcc/fortran/trans-decl.c +++ gcc/fortran/trans-decl.c @@ -1429,18 +1429,26 @@ add_attributes_to_decl (symbol_attribute sym_attr, tree list) tree c = build_omp_clause (UNKNOWN_LOCATION, code); OMP_CLAUSE_CHAIN (c) = clauses; clauses = c; tree dims = oacc_build_routine_dims (clauses); list = oacc_replace_fn_attrib_attr (list, dims); } + /* For an 'allocatable', it's the user's responsibility to 'allocate' it, and + create a device copy, so here, we handle it like the 'link' case. */ if (sym_attr.omp_declare_target_link - || sym_attr.oacc_declare_link) + || sym_attr.oacc_declare_link + || ((sym_attr.omp_declare_target + || sym_attr.oacc_declare_create + || sym_attr.oacc_declare_copyin + || sym_attr.oacc_declare_deviceptr + || sym_attr.oacc_declare_device_resident) + && sym_attr.allocatable)) list = tree_cons (get_identifier ("omp declare target link"), NULL_TREE, list); else if (sym_attr.omp_declare_target || sym_attr.oacc_declare_create || sym_attr.oacc_declare_copyin || sym_attr.oacc_declare_deviceptr || sym_attr.oacc_declare_device_resident) list = tree_cons (get_identifier ("omp declare target"), @@ -6473,17 +6481,22 @@ find_module_oacc_declare_clauses (gfc_symbol *sym) map_op = OMP_MAP_DEVICE_RESIDENT; if (sym->attr.oacc_declare_create || sym->attr.oacc_declare_copyin || sym->attr.oacc_declare_deviceptr || sym->attr.oacc_declare_device_resident) { sym->attr.referenced = 1; - add_clause (sym, map_op); + /* For an 'allocatable', it's the user's responsibility to 'allocate' + it, and create a device copy, so here, we handle it like the + 'link' case, and don't add it to the outer OpenACC 'data' + construct. */ + if (!sym->attr.allocatable) + add_clause (sym, map_op); } } } void finish_oacc_declare (gfc_namespace *ns, gfc_symbol *sym, bool block) { Grüße Thomas ! { dg-do run } module mod implicit none integer, parameter :: n = 40 integer, allocatable :: ar(:,:,:) integer, parameter :: ar_rank = rank(ar) integer :: ar_rank_v integer :: ar_size_v integer, dimension(ar_rank) :: ar_extent_v integer, dimension(ar_rank) :: ar_shape_v !$omp declare target (ar, ar_rank_v, ar_size_v, ar_extent_v, ar_shape_v) end module mod subroutine ar_properties use mod implicit none !$omp declare target integer :: i ar_rank_v = rank(ar) ar_size_v = size(ar) do i = 1, ar_rank ar_extent_v(i) = size(ar, i) end do ar_shape_v = shape(ar) end subroutine ar_properties program main use mod implicit none allocate (ar(n,0:n+1,-1:n+2)) !$omp target enter data map(alloc:ar) !$omp target map(from:ar_rank_v, ar_size_v, ar_extent_v, ar_shape_v) call ar_properties !$omp end target !$omp target update from(ar_rank_v, ar_size_v, ar_extent_v, ar_shape_v) print *, ar_rank_v if (ar_rank_v /= rank(ar)) error stop print *, ar_size_v if (ar_size_v /= size(ar)) error stop print *, ar_extent_v if (any (ar_extent_v /= shape(ar))) error stop print *, ar_shape_v if (any (ar_shape_v /= shape(ar))) error stop end program main
From a55b64d3dbc7e6c9c649f7f3d23e72a0a8712ee0 Mon Sep 17 00:00:00 2001 From: Thomas Schwinge <thomas@codesourcery.com> Date: Tue, 4 Jun 2019 20:25:41 +0200 Subject: [PATCH] [PR90743] Fortran 'allocatable' with OpenACC data/OpenMP 'target' 'map' clauses Test what OpenMP 5.0 has to say on this topic. And make OpenACC do the same. libgomp/ PR fortran/90743 * oacc-parallel.c (GOACC_parallel_keyed): Handle NULL case. * testsuite/libgomp.fortran/target-allocatable-1.f90: New file. * testsuite/libgomp.oacc-fortran/allocatable-1.f90: New file. --- libgomp/oacc-parallel.c | 9 ++- .../libgomp.fortran/target-allocatable-1.f90 | 8 +++ .../libgomp.oacc-fortran/allocatable-1.f90 | 70 +++++++++++++++++++ 3 files changed, 84 insertions(+), 3 deletions(-) create mode 100644 libgomp/testsuite/libgomp.fortran/target-allocatable-1.f90 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/allocatable-1.f90 diff --git a/libgomp/oacc-parallel.c b/libgomp/oacc-parallel.c index e56330f6226b..0c2cfa05a438 100644 --- a/libgomp/oacc-parallel.c +++ b/libgomp/oacc-parallel.c @@ -325,9 +325,12 @@ GOACC_parallel_keyed (int flags_m, void (*fn) (void *), devaddrs = gomp_alloca (sizeof (void *) * mapnum); for (i = 0; i < mapnum; i++) - devaddrs[i] = (void *) (tgt->list[i].key->tgt->tgt_start - + tgt->list[i].key->tgt_offset - + tgt->list[i].offset); + if (tgt->list[i].key != NULL) + devaddrs[i] = (void *) (tgt->list[i].key->tgt->tgt_start + + tgt->list[i].key->tgt_offset + + tgt->list[i].offset); + else + devaddrs[i] = NULL; if (aq == NULL) acc_dev->openacc.exec_func (tgt_fn, mapnum, hostaddrs, devaddrs, dims, tgt); diff --git a/libgomp/testsuite/libgomp.fortran/target-allocatable-1.f90 b/libgomp/testsuite/libgomp.fortran/target-allocatable-1.f90 new file mode 100644 index 000000000000..9f0b7f0c3b53 --- /dev/null +++ b/libgomp/testsuite/libgomp.fortran/target-allocatable-1.f90 @@ -0,0 +1,8 @@ +! Test 'allocatable' with OpenMP 'target' 'map' clauses. + +! { dg-do run } +! { dg-additional-options "-cpp" } +! { dg-additional-options "-DACC_MEM_SHARED=0" { target offload_device_nonshared_as } } +! { dg-additional-options "-DACC_MEM_SHARED=1" { target offload_device_shared_as } } + +#include "../libgomp.oacc-fortran/allocatable-1.f90" diff --git a/libgomp/testsuite/libgomp.oacc-fortran/allocatable-1.f90 b/libgomp/testsuite/libgomp.oacc-fortran/allocatable-1.f90 new file mode 100644 index 000000000000..0941f71f8c30 --- /dev/null +++ b/libgomp/testsuite/libgomp.oacc-fortran/allocatable-1.f90 @@ -0,0 +1,70 @@ +! Test 'allocatable' with OpenACC data clauses. + +! This is also 'include'd from '../libgomp.fortran/target-allocatable-1.f90'. + +! { dg-do run } +! { dg-additional-options "-cpp" } + +program main + implicit none + integer, allocatable :: a, b, c, d, e + + allocate (a) + a = 11 + + b = 25 ! Implicit allocation. + + c = 52 ! Implicit allocation. + + !No 'allocate (d)' here. + + !No 'allocate (e)' here. + + !$omp target map(to: a) map(tofrom: b, c, d) map(from: e) + !$acc parallel copyin(a) copy(b, c, d) copyout(e) + + if (.not. allocated (a)) stop 1 + if (a .ne. 11) stop 2 + a = 33 + + if (.not. allocated (b)) stop 3 + if (b .ne. 25) stop 4 + + if (.not. allocated (c)) stop 5 + if (c .ne. 52) stop 6 + c = 10 + + if (allocated (d)) stop 7 + d = 42 ! Implicit allocation, but on device only. + if (.not. allocated (d)) stop 8 + deallocate (d) ! OpenMP requires must be "unallocated upon exit from the region". + + if (allocated (e)) stop 9 + e = 24 ! Implicit allocation, but on device only. + if (.not. allocated (e)) stop 10 + deallocate (e) ! OpenMP requires must be "unallocated upon exit from the region". + + !$acc end parallel + !$omp end target + + if (.not. allocated (a)) stop 20 +#if ACC_MEM_SHARED + if (a .ne. 33) stop 21 +#else + if (a .ne. 11) stop 22 +#endif + deallocate (a) + + if (.not. allocated (b)) stop 23 + if (b .ne. 25) stop 24 + deallocate (b) + + if (.not. allocated (c)) stop 25 + if (c .ne. 10) stop 26 + deallocate (c) + + if (allocated (d)) stop 27 + + if (allocated (e)) stop 28 + +end program main -- 2.17.1