From patchwork Wed Sep 6 09:34:34 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Julian Brown X-Patchwork-Id: 1830286 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=8.43.85.97; helo=server2.sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [8.43.85.97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4Rgcgz5ZFLz1yh1 for ; Wed, 6 Sep 2023 19:36:15 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 8AC883858431 for ; Wed, 6 Sep 2023 09:36:13 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from esa1.mentor.iphmx.com (esa1.mentor.iphmx.com [68.232.129.153]) by sourceware.org (Postfix) with ESMTPS id 35A8B3857033; Wed, 6 Sep 2023 09:35:52 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 35A8B3857033 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=codesourcery.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com X-IronPort-AV: E=Sophos;i="6.02,231,1688457600"; d="scan'208";a="18331047" Received: from orw-gwy-01-in.mentorg.com ([192.94.38.165]) by esa1.mentor.iphmx.com with ESMTP; 06 Sep 2023 01:35:50 -0800 IronPort-SDR: pcydfgKvHDsE6EXmqLysoqtAZpIR3b2UV5WBuQ89pQACD9g9mqyD1bwPDIBC8Kh2Rs5PoWpCWN 438aZMQwvy1DgyUfxY6P7mbIsfZQWXQhV5k85I9UslUdMbxwI3EdB3y4tpwIql+bwiPH/waWxA QBn3DgoG2AeY9oYNVea7PPpCHqLPs7J5DEuIS6lPD/9if817TlVE7JLmpSjD37BxVhcM3KOpQE 1An4bJb1px77aZG9t68SqchWLvW/H5Lr8tFRSeDF/DWR6A4yG99kMEPzRSUFwWneuy8ixTkXlP SKw= From: Julian Brown To: CC: , , Subject: [PATCH 5/5] OpenMP: Noncontiguous "target update" for Fortran Date: Wed, 6 Sep 2023 02:34:34 -0700 Message-ID: <2b85bf8067f775244ff9b3a6b35ede81473c3a55.1693991759.git.julian@codesourcery.com> X-Mailer: git-send-email 2.41.0 In-Reply-To: References: MIME-Version: 1.0 X-Originating-IP: [137.202.0.90] X-ClientProxiedBy: svr-ies-mbx-14.mgc.mentorg.com (139.181.222.14) To svr-ies-mbx-11.mgc.mentorg.com (139.181.222.11) X-Spam-Status: No, score=-11.8 required=5.0 tests=BAYES_00, GIT_PATCH_0, HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, SPF_HELO_PASS, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org Sender: "Gcc-patches" This patch implements noncontiguous "target update" for Fortran. The existing middle end/runtime bits relating to C and C++ support are reused, with some small adjustments, e.g.: 1. The node used to map the OMP "array descriptor" (from omp-low.cc onwards) now uses the OMP_CLAUSE_SIZE field as a bias (the difference between the "virtual origin" element with zero indices in each dimension and the first element actually stored in memory). 2. The OMP_CLAUSE_SIZE field of a GOMP_MAP_DIM_STRIDE node may now be used to store a "span", which is the distance in bytes between two adjacent elements in an array (with unit stride) when that is different from the element size, as it can be in Fortran. The implementation goes to some effort to massage Fortran array metadata (array descriptors) into a form that can ultimately be consumed by omp_target_memcpy_rect_worker. The method for doing this is described in comments in the patch body. This version of the patch has been rebased and contains some additional minor fixes relative to the version posted for the og13 branch. 2023-09-05 Julian Brown gcc/fortran/ * openmp.cc (omp_verify_map_motion_clauses): Allow strided array sections with 'omp target update'. * trans-openmp.cc (gfc_trans_omp_arrayshape_type, gfc_omp_calculate_gcd, gfc_desc_to_omp_noncontig_array, gfc_omp_contiguous_update_p): New functions. (gfc_trans_omp_clauses): Handle noncontiguous to/from clauses for OMP "target update" directives. gcc/ * gimplify.cc (gimplify_adjust_omp_clauses): Don't gimplify VIEW_CONVERT_EXPR away in GOMP_MAP_TO_GRID/GOMP_MAP_FROM_GRID clauses. * omp-low.cc (omp_noncontig_descriptor_type): Add SPAN field. (scan_sharing_clauses): Don't store descriptor size in its OMP_CLAUSE_SIZE field. (lower_omp_target): Add missing OMP_CLAUSE_MAP check. Add special-case string handling. Handle span and bias. Use low bound instead of zero as index for trailing full dimensions. gcc/testsuite/ * gfortran.dg/gomp/noncontig-updates-1.f90: New test. * gfortran.dg/gomp/noncontig-updates-2.f90: New test. * gfortran.dg/gomp/noncontig-updates-3.f90: New test. * gfortran.dg/gomp/noncontig-updates-4.f90: New test. libgomp/ * libgomp.h (omp_noncontig_array_desc): Add span field. * target.c (omp_target_memcpy_rect_worker): Add span parameter. Update forward declaration. Handle span != element_size. (gomp_update): Handle bias in descriptor's size slot. Update calls to omp_target_memcpy_rect_worker. (omp_target_memcpy_rect_worker): Add element_size == span checks (to existing stride == 1 check) to guard use of target plugin's 2D/3D memcpy hooks. * testsuite/libgomp.fortran/noncontig-updates-1.f90: New test. * testsuite/libgomp.fortran/noncontig-updates-2.f90: New test. * testsuite/libgomp.fortran/noncontig-updates-3.f90: New test. * testsuite/libgomp.fortran/noncontig-updates-4.f90: New test. * testsuite/libgomp.fortran/noncontig-updates-5.f90: New test. * testsuite/libgomp.fortran/noncontig-updates-6.f90: New test. * testsuite/libgomp.fortran/noncontig-updates-7.f90: New test. * testsuite/libgomp.fortran/noncontig-updates-8.f90: New test. * testsuite/libgomp.fortran/noncontig-updates-9.f90: New test. * testsuite/libgomp.fortran/noncontig-updates-10.f90: New test. * testsuite/libgomp.fortran/noncontig-updates-11.f90: New test. * testsuite/libgomp.fortran/noncontig-updates-12.f90: New test. * testsuite/libgomp.fortran/noncontig-updates-13.f90: New test. --- gcc/fortran/openmp.cc | 5 +- gcc/fortran/trans-openmp.cc | 499 ++++++++++++++++++ gcc/gimplify.cc | 10 + gcc/omp-low.cc | 53 +- .../gfortran.dg/gomp/noncontig-updates-1.f90 | 19 + .../gfortran.dg/gomp/noncontig-updates-2.f90 | 16 + .../gfortran.dg/gomp/noncontig-updates-3.f90 | 16 + .../gfortran.dg/gomp/noncontig-updates-4.f90 | 15 + libgomp/libgomp.h | 1 + libgomp/target.c | 57 +- .../libgomp.fortran/noncontig-updates-1.f90 | 54 ++ .../libgomp.fortran/noncontig-updates-10.f90 | 29 + .../libgomp.fortran/noncontig-updates-11.f90 | 51 ++ .../libgomp.fortran/noncontig-updates-12.f90 | 59 +++ .../libgomp.fortran/noncontig-updates-13.f90 | 42 ++ .../libgomp.fortran/noncontig-updates-2.f90 | 101 ++++ .../libgomp.fortran/noncontig-updates-3.f90 | 47 ++ .../libgomp.fortran/noncontig-updates-4.f90 | 78 +++ .../libgomp.fortran/noncontig-updates-5.f90 | 55 ++ .../libgomp.fortran/noncontig-updates-6.f90 | 34 ++ .../libgomp.fortran/noncontig-updates-7.f90 | 36 ++ .../libgomp.fortran/noncontig-updates-8.f90 | 39 ++ .../libgomp.fortran/noncontig-updates-9.f90 | 34 ++ 23 files changed, 1311 insertions(+), 39 deletions(-) create mode 100644 gcc/testsuite/gfortran.dg/gomp/noncontig-updates-1.f90 create mode 100644 gcc/testsuite/gfortran.dg/gomp/noncontig-updates-2.f90 create mode 100644 gcc/testsuite/gfortran.dg/gomp/noncontig-updates-3.f90 create mode 100644 gcc/testsuite/gfortran.dg/gomp/noncontig-updates-4.f90 create mode 100644 libgomp/testsuite/libgomp.fortran/noncontig-updates-1.f90 create mode 100644 libgomp/testsuite/libgomp.fortran/noncontig-updates-10.f90 create mode 100644 libgomp/testsuite/libgomp.fortran/noncontig-updates-11.f90 create mode 100644 libgomp/testsuite/libgomp.fortran/noncontig-updates-12.f90 create mode 100644 libgomp/testsuite/libgomp.fortran/noncontig-updates-13.f90 create mode 100644 libgomp/testsuite/libgomp.fortran/noncontig-updates-2.f90 create mode 100644 libgomp/testsuite/libgomp.fortran/noncontig-updates-3.f90 create mode 100644 libgomp/testsuite/libgomp.fortran/noncontig-updates-4.f90 create mode 100644 libgomp/testsuite/libgomp.fortran/noncontig-updates-5.f90 create mode 100644 libgomp/testsuite/libgomp.fortran/noncontig-updates-6.f90 create mode 100644 libgomp/testsuite/libgomp.fortran/noncontig-updates-7.f90 create mode 100644 libgomp/testsuite/libgomp.fortran/noncontig-updates-8.f90 create mode 100644 libgomp/testsuite/libgomp.fortran/noncontig-updates-9.f90 diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc index 585ffe035236..42d003776ee0 100644 --- a/gcc/fortran/openmp.cc +++ b/gcc/fortran/openmp.cc @@ -8026,7 +8026,10 @@ omp_verify_map_motion_clauses (gfc_code *code, int list, const char *name, int i; gfc_array_ref *ar = &lastslice->u.ar; for (i = 0; i < ar->dimen; i++) - if (ar->stride[i] && code && code->op != EXEC_OACC_UPDATE) + if (ar->stride[i] + && code + && code->op != EXEC_OACC_UPDATE + && code->op != EXEC_OMP_TARGET_UPDATE) { gfc_error ("Stride should not be specified for array section " "in %s clause at %L", name, &n->where); diff --git a/gcc/fortran/trans-openmp.cc b/gcc/fortran/trans-openmp.cc index 95ce33797d62..3c4d9ccc5432 100644 --- a/gcc/fortran/trans-openmp.cc +++ b/gcc/fortran/trans-openmp.cc @@ -2781,6 +2781,346 @@ get_symbol_rooted_namelist (hash_map *dims) +{ + gcc_assert (dims->length () > 0); + + for (unsigned i = 0; i < dims->length (); i++) + { + tree dim = fold_convert (sizetype, (*dims)[i]); + /* We need the index of the last element, not the array size. */ + dim = size_binop (MINUS_EXPR, dim, size_one_node); + tree idxtype = build_index_type (dim); + type = build_array_type (type, idxtype); + } + + return type; +} + +/* Emit code to find the greatest common divisor of two (gfc_array_index_type) + trees to BLOCK. This is Euclid's algorithm: + + int + gcd (int a, int b) + { + int tmp; + while (b != 0) + { + tmp = b; + b = a % b; + a = tmp; + } + return a; + } +*/ + +static void +gfc_omp_calculate_gcd (stmtblock_t *block, tree dst, tree a, tree b) +{ + tree tmp = gfc_create_var (gfc_array_index_type, "tmp"); + tree avar = gfc_create_var (gfc_array_index_type, "a"); + tree bvar = gfc_create_var (gfc_array_index_type, "b"); + + /* Avoid clobbering the inputs. */ + gfc_add_modify (block, avar, a); + gfc_add_modify (block, bvar, b); + + tree label_cond = gfc_build_label_decl (NULL_TREE); + tree label_loop = gfc_build_label_decl (NULL_TREE); + TREE_USED (label_cond) = 1; + TREE_USED (label_loop) = 1; + + gfc_add_expr_to_block (block, build1_v (GOTO_EXPR, label_cond)); + gfc_add_expr_to_block (block, build1_v (LABEL_EXPR, label_loop)); + + gfc_add_modify (block, tmp, bvar); + gfc_add_modify (block, bvar, + fold_build2_loc (input_location, TRUNC_MOD_EXPR, + gfc_array_index_type, avar, bvar)); + gfc_add_modify (block, avar, tmp); + + gfc_add_expr_to_block (block, build1_v (LABEL_EXPR, label_cond)); + + tmp = fold_build2_loc (input_location, NE_EXPR, boolean_type_node, bvar, + gfc_index_zero_node); + tmp = build3_v (COND_EXPR, tmp, build1_v (GOTO_EXPR, label_loop), + build_empty_stmt (input_location)); + gfc_add_expr_to_block (block, tmp); + + gfc_add_modify (block, dst, avar); +} + +/* Convert a gfortran array descriptor -- specifically the per-dimension + strides -- into a form that can be easily translated to a noncontiguous + OpenMP "target update" operation. We emit a specialized version of a + function like this inline: + + void + gfc_desc_to_omp_noncontig_array (int *dims, int *strides, int ndims, + int *fstrides, int *flo, int *fhi) + { + dims[ndims - 1] = (fhi[ndims - 1] - flo[ndims - 1] + 1); + strides[0] = fstrides[0]; + if (ndims > 1) + strides[ndims - 1] = 1; + if (ndims == 2) + dims[0] = fstrides[1]; + else if (ndims > 2) + { + int grains[ndims - 2]; + + int bigger_grain = fstrides[ndims - 1]; + for (int i = ndims - 2; i > 0; i--) + { + grains[i - 1] = gcd (fstrides[i], bigger_grain); + bigger_grain = grains[i - 1]; + } + + int volume = 1; + for (int i = 0; i < ndims - 2; i++) + { + int g = grains[i]; + dims[i] = g / volume; + strides[i + 1] = fstrides[i + 1] / g; + volume = volume * dims[i]; + } + dims[ndims - 2] = fstrides[ndims - 1] / volume; + } + } + + where "fstrides", "flo" and "fhi" represent the stride, low bound and upper + bound of each dimension in the Fortran array descriptor. + + (Note that most of the complexity only applies to arrays with more than two + dimensions, and the final stanza won't be emitted at all for lower-ranked + arrays.) + + The output of the algorithm is a set of dimensions dims[] = { D, C, B, A } + "as if" the array was declared like this (in C): + + type arr[A][B][C][D]; + + i.e. with the innermost dimension first, and a set of strides (in terms of + the step size along each dimension, without previous dimensions multiplied + in). + + As an example, if we have an array: + + allocate (arr(18,19,20,21,22)) + + and an update operation: + + !$omp target update to(arr(1:3:2,1:4:3,1:5:4,1:6:5,1:7:6)) + + the strides we see in the Fortran array descriptor will be: + + 2 54 1368 34200 861840 + + as given by: + + 2 = stride0 + 54 = dim0 * stride1 + 1368 = dim0 * dim1 * stride2 + 34200 = dim0 * dim1 * dim2 * stride3 + 861840 = dim0 * dim1 * dim2 * dim3 * stride4 + + where "dimN" are the extents of each dimension (18,19,20,21,22), and + "strideN" are the strides in terms of step length along each dimension + (2,3,4,5,6). + + We'd like to figure out what the original dimN, strideN were from the + Fortran array descriptor, but that's in general impossible. Furthermore, + if we naively divide a stride by the preceding stride, the result isn't + necessarily an integer, as for e.g.: + + 861840/34200 = 25.2 + + What we can do though is figure out the greatest common divisor of + each stride and the preceding one, from the largest down, and use those as + units of granularity, i.e. the size of the corresponding dimension we pass + to the middle-end/runtime. The stepwise stride is then the number of + times each "grain" fits into the Fortran array descriptor stride. + + The output of the algorithm will be: + + dims strides + 18 2 + 76 3 + 5 1 + 126 5 + 9 1 + + These numbers work fine for libgomp target.c:omp_target_memcpy_rect_worker. + Multiplying them through also gives the same numbers as the source Fortran + array strides, i.e. dim0*dim1*dim2*stride3 (18*76*5*5) = 34200. */ + +static void +gfc_desc_to_omp_noncontig_array (stmtblock_t *block, tree *ompdimsp, + tree *ompstridesp, tree desc, int ndims) +{ + tree lastdim = build_int_cst (gfc_array_index_type, ndims - 1); + tree dimrange = build_index_type (lastdim); + tree ndimarrtype = build_array_type (gfc_array_index_type, dimrange); + tree ompdims = gfc_create_var (ndimarrtype, "dims"); + tree ompstrides = gfc_create_var (ndimarrtype, "strides"); + + *ompdimsp = ompdims; + *ompstridesp = ompstrides; + + /* dims[ndims - 1] = (fhi[ndims - 1] - flo[ndims - 1] + 1); */ + tree lastlbound = gfc_conv_array_lbound (desc, ndims - 1); + tree lastubound = gfc_conv_array_ubound (desc, ndims - 1); + tree lastrange = fold_build2_loc (input_location, MINUS_EXPR, + gfc_array_index_type, lastubound, + lastlbound); + lastrange = fold_build2_loc (input_location, PLUS_EXPR, gfc_array_index_type, + lastrange, gfc_index_one_node); + + gfc_add_modify (block, + gfc_build_array_ref (ompdims, lastdim, NULL_TREE, true), + lastrange); + + /* strides[0] = fstrides[0]; */ + tree stride0 = gfc_conv_array_stride (desc, 0); + gfc_add_modify (block, + gfc_build_array_ref (ompstrides, gfc_index_zero_node, + NULL_TREE, true), + stride0); + + if (ndims > 1) + /* strides[ndims - 1] = 1; */ + gfc_add_modify (block, + gfc_build_array_ref (ompstrides, lastdim, NULL_TREE, true), + gfc_index_one_node); + + if (ndims == 2) + /* dims[0] = fstrides[1]; */ + gfc_add_modify (block, + gfc_build_array_ref (ompdims, gfc_index_zero_node, + NULL_TREE, true), + gfc_conv_array_stride (desc, 1)); + else if (ndims > 2) + { + /* int grains[ndims - 2]; */ + tree lastgrain = build_int_cst (gfc_array_index_type, ndims - 3); + tree grainrange = build_index_type (lastgrain); + tree grainarrtype = build_array_type (gfc_array_index_type, grainrange); + tree grains = gfc_create_var (grainarrtype, "grains"); + + /* int bigger_grain = fstrides[ndims - 1]; */ + tree bigger_grain = gfc_create_var (gfc_array_index_type, "bigger_grain"); + tree fstridem1 = gfc_conv_array_stride (desc, ndims - 1); + gfc_add_modify (block, bigger_grain, fstridem1); + + /* + for (int i = ndims - 2; i > 0; i--) + { + grains[i - 1] = gcd (fstrides[i], bigger_grain); + bigger_grain = grains[i - 1]; + } + */ + stmtblock_t loop_body; + gfc_init_block (&loop_body); + + tree idx = gfc_create_var (gfc_array_index_type, "idx"); + + tree gcdtmp = gfc_create_var (gfc_array_index_type, "tmp"); + gfc_omp_calculate_gcd (&loop_body, gcdtmp, + gfc_conv_descriptor_stride_get (desc, idx), + bigger_grain); + tree idxm1 = fold_build2_loc (input_location, MINUS_EXPR, + gfc_array_index_type, idx, + gfc_index_one_node); + gfc_add_modify (&loop_body, + gfc_build_array_ref (grains, idxm1, NULL_TREE, true), + gcdtmp); + gfc_add_modify (&loop_body, bigger_grain, gcdtmp); + + gfc_simple_for_loop (block, idx, + build_int_cst (gfc_array_index_type, ndims - 2), + gfc_index_zero_node, GT_EXPR, + build_int_cst (gfc_array_index_type, -1), + gfc_finish_block (&loop_body)); + /* + int volume = 1; + for (int i = 0; i < ndims - 2; i++) + { + int g = grains[i]; + dims[i] = g / volume; + strides[i + 1] = fstrides[i + 1] / g; + volume = volume * dims[i]; + } + */ + tree volume = gfc_create_var (gfc_array_index_type, "volume"); + gfc_add_modify (block, volume, gfc_index_one_node); + + gfc_init_block (&loop_body); + tree grain = gfc_create_var (gfc_array_index_type, "grain"); + gfc_add_modify (&loop_body, grain, + gfc_build_array_ref (grains, idx, NULL_TREE, true)); + tree dims_i = gfc_build_array_ref (ompdims, idx, NULL_TREE, true); + gfc_add_modify (&loop_body, dims_i, + fold_build2_loc (input_location, TRUNC_DIV_EXPR, + gfc_array_index_type, grain, volume)); + tree nidx = fold_build2_loc (input_location, PLUS_EXPR, + gfc_array_index_type, idx, + gfc_index_one_node); + tree strides_ni = gfc_build_array_ref (ompstrides, nidx, NULL_TREE, true); + tree fstrides_ni = gfc_conv_descriptor_stride_get (desc, nidx); + gfc_add_modify (&loop_body, strides_ni, + fold_build2_loc (input_location, TRUNC_DIV_EXPR, + gfc_array_index_type, fstrides_ni, + grain)); + gfc_add_modify (&loop_body, volume, + fold_build2_loc (input_location, MULT_EXPR, + gfc_array_index_type, volume, dims_i)); + + gfc_simple_for_loop (block, idx, gfc_index_zero_node, + build_int_cst (gfc_array_index_type, ndims - 2), + LT_EXPR, gfc_index_one_node, + gfc_finish_block (&loop_body)); + + /* dims[ndims - 2] = fstrides[ndims - 1] / volume; */ + tree dimsm2 + = gfc_build_array_ref (ompdims, + build_int_cst (gfc_array_index_type, ndims - 2), + NULL_TREE, true); + gfc_add_modify (block, dimsm2, + fold_build2_loc (input_location, TRUNC_DIV_EXPR, + gfc_array_index_type, fstridem1, + volume)); + } +} + +/* Return TRUE if update for N can definitely be done with a single contiguous + transfer. If no or if we can't tell, return FALSE. */ + +static bool +gfc_omp_contiguous_update_p (gfc_omp_namelist *n) +{ + gfc_expr *contig_expr = n->expr; + + if (!n->expr) + { + if (n->sym->attr.contiguous) + return true; + + tree desc = gfc_trans_omp_variable (n->sym, false); + tree type = TREE_TYPE (desc); + if (!GFC_ARRAY_TYPE_P (type) && !GFC_DESCRIPTOR_TYPE_P (type)) + return true; + + contig_expr = gfc_lval_expr_from_sym (n->sym); + } + + return gfc_is_simply_contiguous (contig_expr, false, true); +} + static tree gfc_trans_omp_clauses (stmtblock_t *block, gfc_omp_clauses *clauses, locus where, toc_directive cd = TOC_OPENMP) @@ -4218,6 +4558,165 @@ gfc_trans_omp_clauses (stmtblock_t *block, gfc_omp_clauses *clauses, default: gcc_unreachable (); } + + gfc_ref *lastref = NULL; + if (n->expr) + for (gfc_ref *ref = n->expr->ref; ref; ref = ref->next) + if (ref->type == REF_COMPONENT || ref->type == REF_ARRAY) + lastref = ref; + + if ((list == OMP_LIST_TO || list == OMP_LIST_FROM) + && (!n->expr || (lastref && lastref->type == REF_ARRAY)) + && !gfc_omp_contiguous_update_p (n)) + { + int ndims; + gfc_se se; + gfc_init_se (&se, NULL); + + tree desc, span = NULL_TREE; + + if (n->expr) + { + if (n->expr->rank) + gfc_conv_expr_descriptor (&se, n->expr); + else + gfc_conv_expr (&se, n->expr); + + desc = se.expr; + /* The span is the distance between two array elements + along the innermost dimension (there may be padding + or other data between elements, e.g. of a derived-type + array). */ + span = gfc_get_array_span (desc, n->expr); + ndims = lastref->u.ar.dimen; + } + else + { + desc = gfc_trans_omp_variable (n->sym, false); + tree type = TREE_TYPE (desc); + if (GFC_DESCRIPTOR_TYPE_P (type)) + span = gfc_conv_descriptor_span_get (desc); + ndims = GFC_TYPE_ARRAY_RANK (type); + } + + gfc_add_block_to_block (block, &se.pre); + + tree ompdims, ompstrides; + + gfc_desc_to_omp_noncontig_array (block, &ompdims, + &ompstrides, desc, ndims); + + tree type = TREE_TYPE (desc); + tree etype = gfc_get_element_type (type); + tree elsize = fold_convert (gfc_array_index_type, + size_in_bytes (etype)); + + tree ptr = gfc_conv_array_data (desc); + tree offset = gfc_conv_array_offset (desc); + + if (!span) + /* The span is the element size. */ + span = elsize; + + tree node = build_omp_clause (input_location, OMP_CLAUSE_MAP); + + switch (list) + { + case OMP_LIST_TO: + OMP_CLAUSE_SET_MAP_KIND (node, GOMP_MAP_TO_GRID); + break; + case OMP_LIST_FROM: + OMP_CLAUSE_SET_MAP_KIND (node, GOMP_MAP_FROM_GRID); + break; + default: + gcc_unreachable (); + } + + gcc_assert (POINTER_TYPE_P (TREE_TYPE (ptr))); + tree byte_offset = fold_convert (sizetype, offset); + byte_offset = size_binop (MULT_EXPR, byte_offset, + fold_convert (sizetype, span)); + tree origin + = fold_build2_loc (input_location, POINTER_PLUS_EXPR, + TREE_TYPE (ptr), ptr, byte_offset); + + OMP_CLAUSE_SIZE (node) = elsize; + + omp_clauses = gfc_trans_add_clause (node, omp_clauses); + + auto_vec dims; + + for (int r = 0; r < ndims; r++) + { + tree d + = gfc_build_array_ref (ompdims, + build_int_cst + (gfc_array_index_type, r), + NULL_TREE, true); + d = gfc_evaluate_now (d, block); + dims.safe_push (d); + } + + for (int r = ndims - 1; r >= 0; r--) + { + tree stride_r, len_r, lowbound_r; + + tree rcst = build_int_cst (gfc_array_index_type, r); + + stride_r = gfc_build_array_ref (ompstrides, rcst, + NULL_TREE, true); + lowbound_r = gfc_conv_array_lbound (desc, r); + len_r + = fold_build2_loc (input_location, MINUS_EXPR, + gfc_array_index_type, + gfc_conv_array_ubound (desc, r), + lowbound_r); + len_r + = fold_build2_loc (input_location, PLUS_EXPR, + gfc_array_index_type, + len_r, gfc_index_one_node); + + lowbound_r + = fold_build2_loc (input_location, MULT_EXPR, + gfc_array_index_type, lowbound_r, + stride_r); + + stride_r = gfc_evaluate_now (stride_r, block); + lowbound_r = gfc_evaluate_now (lowbound_r, block); + len_r = gfc_evaluate_now (len_r, block); + + tree dim = build_omp_clause (input_location, + OMP_CLAUSE_MAP); + OMP_CLAUSE_SET_MAP_KIND (dim, GOMP_MAP_GRID_DIM); + OMP_CLAUSE_DECL (dim) = lowbound_r; + OMP_CLAUSE_SIZE (dim) = len_r; + + omp_clauses = gfc_trans_add_clause (dim, omp_clauses); + + if (!integer_onep (stride_r) + || (r == 0 && !operand_equal_p (span, elsize))) + { + tree snode = build_omp_clause (input_location, + OMP_CLAUSE_MAP); + OMP_CLAUSE_SET_MAP_KIND (snode, + GOMP_MAP_GRID_STRIDE); + OMP_CLAUSE_DECL (snode) = stride_r; + if (r == 0 && !operand_equal_p (span, elsize)) + OMP_CLAUSE_SIZE (snode) = span; + omp_clauses = gfc_trans_add_clause (snode, + omp_clauses); + } + } + origin = build_fold_indirect_ref (origin); + tree eltype = gfc_get_element_type (TREE_TYPE (desc)); + tree arrtype + = gfc_trans_omp_arrayshape_type (eltype, &dims); + OMP_CLAUSE_DECL (node) + = build1_loc (input_location, VIEW_CONVERT_EXPR, + arrtype, origin); + continue; + } + tree node = build_omp_clause (input_location, clause_code); if (n->expr == NULL || (n->expr->ref->type == REF_ARRAY diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc index c74e7696da42..695373992462 100644 --- a/gcc/gimplify.cc +++ b/gcc/gimplify.cc @@ -14426,6 +14426,16 @@ gimplify_adjust_omp_clauses (gimple_seq *pre_p, gimple_seq body, tree *list_p, if (OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_STRUCT) break; + /* If we have a non-contiguous (strided/rectangular) update + operation with a VIEW_CONVERT_EXPR, we need to be careful not + to gimplify the conversion away, because we need it during + omp-low.cc in order to retrieve the array's dimensions. Just + gimplify partially instead. */ + if ((OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_TO_GRID + || OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_FROM_GRID) + && TREE_CODE (*pd) == VIEW_CONVERT_EXPR) + pd = &TREE_OPERAND (*pd, 0); + /* We've already partly gimplified this in gimplify_scan_omp_clauses. Don't do any more. */ if (code == OMP_TARGET && OMP_CLAUSE_MAP_IN_REDUCTION (c)) diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc index bce74707fe89..f0885df05b84 100644 --- a/gcc/omp-low.cc +++ b/gcc/omp-low.cc @@ -1162,6 +1162,11 @@ omp_noncontig_descriptor_type (location_t loc) TREE_CHAIN (field) = fields; fields = field; + field = build_decl (loc, FIELD_DECL, get_identifier ("__span"), + size_type_node); + TREE_CHAIN (field) = fields; + fields = field; + tree ptr_size_type = build_pointer_type (size_type_node); field = build_decl (loc, FIELD_DECL, get_identifier ("__dim"), ptr_size_type); @@ -1756,7 +1761,6 @@ scan_sharing_clauses (tree clauses, omp_context *ctx) OMP_CLAUSE_MAP); OMP_CLAUSE_SET_MAP_KIND (dn, GOMP_MAP_TO_PSET); OMP_CLAUSE_DECL (dn) = desc; - OMP_CLAUSE_SIZE (dn) = TYPE_SIZE_UNIT (desc_type); OMP_CLAUSE_CHAIN (dn) = OMP_CLAUSE_CHAIN (c); OMP_CLAUSE_CHAIN (c) = dn; @@ -12967,6 +12971,7 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx) && OMP_CLAUSE_MAP_KIND (nc) == GOMP_MAP_TO_PSET); c = nc; while ((nc = OMP_CLAUSE_CHAIN (c)) + && OMP_CLAUSE_CODE (nc) == OMP_CLAUSE_MAP && (OMP_CLAUSE_MAP_KIND (nc) == GOMP_MAP_GRID_DIM || OMP_CLAUSE_MAP_KIND (nc) == GOMP_MAP_GRID_STRIDE)) c = nc; @@ -13279,7 +13284,7 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx) int i, dims = 0; auto_vec tdims; bool pointer_based = false, handled_pointer_section = false; - tree arrsize = fold_convert (sizetype, elsize); + tree arrsize = size_one_node; /* Allow a single (maybe strided) array section if we have a pointer base. */ @@ -13291,8 +13296,12 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx) dims = 1; } else + /* NOTE: Don't treat (e.g. Fortran, fixed-length) strings as + array types here; array section syntax isn't applicable to + strings. */ for (tree itype = type; - TREE_CODE (itype) == ARRAY_TYPE; + TREE_CODE (itype) == ARRAY_TYPE + && !TYPE_STRING_FLAG (itype); itype = TREE_TYPE (itype)) { tdims.safe_push (itype); @@ -13333,13 +13342,16 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx) oc = c; c = dn; + tree span = NULL_TREE; + for (i = 0; i < dims; i++) { nc = OMP_CLAUSE_CHAIN (c); tree dim = NULL_TREE, index = NULL_TREE, len = NULL_TREE, stride = size_one_node; - if (OMP_CLAUSE_CODE (nc) == OMP_CLAUSE_MAP + if (nc + && OMP_CLAUSE_CODE (nc) == OMP_CLAUSE_MAP && OMP_CLAUSE_MAP_KIND (nc) == GOMP_MAP_GRID_DIM) { index = OMP_CLAUSE_DECL (nc); @@ -13356,6 +13368,18 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx) { stride = OMP_CLAUSE_DECL (nc2); stride = fold_convert (sizetype, stride); + if (OMP_CLAUSE_SIZE (nc2)) + { + /* If the element size is not the same as the + distance between two adjacent array + elements (in the innermost dimension), + retrieve the latter value ("span") from the + size field of the stride. We only expect to + see one such field per array. */ + gcc_assert (!span); + span = OMP_CLAUSE_SIZE (nc2); + span = fold_convert (sizetype, span); + } nc = nc2; } @@ -13413,7 +13437,8 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx) dim = size_binop (MINUS_EXPR, maxval, minval); dim = size_binop (PLUS_EXPR, dim, size_one_node); len = dim; - index = size_zero_node; + index = minval; + nc = c; } if (TREE_CODE (dim) != INTEGER_CST) @@ -13451,7 +13476,7 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx) + index[1] * stride[1] * dim[2] + index[2] * stride[2] - All multiplied by "elsize". */ + All multiplied by "span" (or "elsize"). */ tree index_stride = size_binop (MULT_EXPR, index, stride); bias = size_binop (PLUS_EXPR, bias, @@ -13474,14 +13499,16 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx) enclosure = volume; } - elsize = fold_convert (sizetype, elsize); + /* If we don't have a separate span size, use the element size + instead. */ + if (!span) + span = fold_convert (sizetype, elsize); /* The size of a volume enclosing the elements to be transferred. */ - OMP_CLAUSE_SIZE (oc) - = size_binop (MULT_EXPR, enclosure, elsize); + OMP_CLAUSE_SIZE (oc) = size_binop (MULT_EXPR, enclosure, span); /* And the bias of the first element we will update. */ - OMP_CLAUSE_SIZE (dn) = size_binop (MULT_EXPR, bias, elsize); + OMP_CLAUSE_SIZE (dn) = size_binop (MULT_EXPR, bias, span); tree cdim = build_constructor (size_arr_type, vdim); tree cindex = build_constructor (size_arr_type, vindex); @@ -13512,13 +13539,14 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx) tree ndims_field = TYPE_FIELDS (desc_type); tree elemsize_field = DECL_CHAIN (ndims_field); - tree dim_field = DECL_CHAIN (elemsize_field); + tree span_field = DECL_CHAIN (elemsize_field); + tree dim_field = DECL_CHAIN (span_field); tree index_field = DECL_CHAIN (dim_field); tree len_field = DECL_CHAIN (index_field); tree stride_field = DECL_CHAIN (len_field); vec *v; - vec_alloc (v, 6); + vec_alloc (v, 7); bool all_static = (TREE_STATIC (dim_tmp) && TREE_STATIC (index_tmp) @@ -13548,6 +13576,7 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx) CONSTRUCTOR_APPEND_ELT (v, ndims_field, ndims); CONSTRUCTOR_APPEND_ELT (v, elemsize_field, elsize); + CONSTRUCTOR_APPEND_ELT (v, span_field, span); CONSTRUCTOR_APPEND_ELT (v, dim_field, dim_tmp); CONSTRUCTOR_APPEND_ELT (v, index_field, index_tmp); CONSTRUCTOR_APPEND_ELT (v, len_field, len_tmp); diff --git a/gcc/testsuite/gfortran.dg/gomp/noncontig-updates-1.f90 b/gcc/testsuite/gfortran.dg/gomp/noncontig-updates-1.f90 new file mode 100644 index 000000000000..5c60f5cac620 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/gomp/noncontig-updates-1.f90 @@ -0,0 +1,19 @@ +! { dg-additional-options "-fdump-tree-original" } + +integer :: basicarray(100) +integer, allocatable :: allocarray(:) + +allocate(allocarray(1:20)) + +!$omp target update to(basicarray) + +!$omp target update from(basicarray(:)) + +!$omp target update to(allocarray) + +!$omp target update from(allocarray(:)) + +end + +! { dg-final { scan-tree-dump-times {omp target update from\(} 2 "original" } } +! { dg-final { scan-tree-dump-times {omp target update to\(} 2 "original" } } diff --git a/gcc/testsuite/gfortran.dg/gomp/noncontig-updates-2.f90 b/gcc/testsuite/gfortran.dg/gomp/noncontig-updates-2.f90 new file mode 100644 index 000000000000..f5a52736b0cc --- /dev/null +++ b/gcc/testsuite/gfortran.dg/gomp/noncontig-updates-2.f90 @@ -0,0 +1,16 @@ +! { dg-additional-options "-fdump-tree-original" } + +integer, allocatable :: allocarray(:) +integer, allocatable :: allocarray2(:,:) + +allocate(allocarray(1:20)) +allocate(allocarray2(1:20,1:20)) + +! This one must be noncontiguous +!$omp target update to(allocarray(::2)) +! { dg-final { scan-tree-dump {omp target update map\(to_grid:} "original" } } + +!$omp target update from(allocarray2(:,5:15)) +! { dg-final { scan-tree-dump {omp target update from\(} "original" } } + +end diff --git a/gcc/testsuite/gfortran.dg/gomp/noncontig-updates-3.f90 b/gcc/testsuite/gfortran.dg/gomp/noncontig-updates-3.f90 new file mode 100644 index 000000000000..5cbfe7c7be54 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/gomp/noncontig-updates-3.f90 @@ -0,0 +1,16 @@ +! { dg-additional-options "-fdump-tree-original" } + +integer, allocatable :: allocarray(:,:) + +allocate(allocarray(1:20,1:20)) + +! This one could possibly be handled as a contiguous update - but isn't, +! for now. +!$omp target update to(allocarray(1:20,5:15)) +! { dg-final { scan-tree-dump {omp target update map\(to_grid:} "original" } } + +!$omp target update from(allocarray(:,5:15:2)) +! { dg-final { scan-tree-dump {omp target update map\(from_grid:} "original" } } + +end + diff --git a/gcc/testsuite/gfortran.dg/gomp/noncontig-updates-4.f90 b/gcc/testsuite/gfortran.dg/gomp/noncontig-updates-4.f90 new file mode 100644 index 000000000000..53152aacbb41 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/gomp/noncontig-updates-4.f90 @@ -0,0 +1,15 @@ +! { dg-additional-options "-fdump-tree-original" } + +integer, target :: tgtarray(20) +integer, pointer, contiguous :: arrayptr(:) + +arrayptr => tgtarray + +!$omp target update from(arrayptr) +! { dg-final { scan-tree-dump {omp target update from\(} "original" } } + +!$omp target update to(arrayptr(::2)) +! { dg-final { scan-tree-dump {omp target update map\(to_grid:} "original" } } + +end + diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h index a930a8243eae..5a2f768c29b8 100644 --- a/libgomp/libgomp.h +++ b/libgomp/libgomp.h @@ -1308,6 +1308,7 @@ struct target_mem_desc { typedef struct { size_t ndims; size_t elemsize; + size_t span; size_t *dim; size_t *index; size_t *length; diff --git a/libgomp/target.c b/libgomp/target.c index d11e222d6fa8..9b9b08db8be0 100644 --- a/libgomp/target.c +++ b/libgomp/target.c @@ -2126,8 +2126,8 @@ goacc_unmap_vars (struct target_mem_desc *tgt, bool do_copyfrom, gomp_unmap_vars_internal (tgt, do_copyfrom, NULL, aq); } -static int omp_target_memcpy_rect_worker (void *, const void *, size_t, int, - const size_t *, const size_t *, +static int omp_target_memcpy_rect_worker (void *, const void *, size_t, size_t, + int, const size_t *, const size_t *, const size_t *, const size_t *, const size_t *, const size_t *, struct gomp_device_descr *, @@ -2165,9 +2165,9 @@ gomp_update (struct gomp_device_descr *devicep, size_t mapnum, void **hostaddrs, { omp_noncontig_array_desc *desc = (omp_noncontig_array_desc *) hostaddrs[i + 1]; - cur_node.host_start = (uintptr_t) hostaddrs[i]; + size_t bias = sizes[i + 1]; + cur_node.host_start = (uintptr_t) hostaddrs[i] + bias; cur_node.host_end = cur_node.host_start + sizes[i]; - assert (sizes[i + 1] == sizeof (omp_noncontig_array_desc)); splay_tree_key n = splay_tree_lookup (&devicep->mem_map, &cur_node); if (n) { @@ -2179,21 +2179,24 @@ gomp_update (struct gomp_device_descr *devicep, size_t mapnum, void **hostaddrs, } void *devaddr = (void *) (n->tgt->tgt_start + n->tgt_offset + cur_node.host_start - - n->host_start); + - n->host_start + - bias); if ((kind & typemask) == GOMP_MAP_TO_GRID) omp_target_memcpy_rect_worker (devaddr, hostaddrs[i], - desc->elemsize, desc->ndims, - desc->length, desc->stride, - desc->index, desc->index, - desc->dim, desc->dim, devicep, - NULL, &tmp_size, &tmp); + desc->elemsize, desc->span, + desc->ndims, desc->length, + desc->stride, desc->index, + desc->index, desc->dim, + desc->dim, devicep, NULL, + &tmp_size, &tmp); else omp_target_memcpy_rect_worker (hostaddrs[i], devaddr, - desc->elemsize, desc->ndims, - desc->length, desc->stride, - desc->index, desc->index, - desc->dim, desc->dim, NULL, - devicep, &tmp_size, &tmp); + desc->elemsize, desc->span, + desc->ndims, desc->length, + desc->stride, desc->index, + desc->index, desc->dim, + desc->dim, NULL, devicep, + &tmp_size, &tmp); } i++; } @@ -4640,7 +4643,7 @@ omp_target_memcpy_async (void *dst, const void *src, size_t length, static int omp_target_memcpy_rect_worker (void *dst, const void *src, size_t element_size, - int num_dims, const size_t *volume, + size_t span, int num_dims, const size_t *volume, const size_t *strides, const size_t *dst_offsets, const size_t *src_offsets, @@ -4655,7 +4658,7 @@ omp_target_memcpy_rect_worker (void *dst, const void *src, size_t element_size, size_t j, dst_off, src_off, length; int i, ret; - if (num_dims == 1 && (!strides || strides[0] == 1)) + if (num_dims == 1 && (!strides || (strides[0] == 1 && element_size == span))) { if (__builtin_mul_overflow (element_size, volume[0], &length) || __builtin_mul_overflow (element_size, dst_offsets[0], &dst_off) @@ -4716,12 +4719,11 @@ omp_target_memcpy_rect_worker (void *dst, const void *src, size_t element_size, assert ((src_devicep == NULL || dst_devicep == NULL) && (src_devicep != NULL || dst_devicep != NULL)); - if (__builtin_mul_overflow (element_size, dst_offsets[0], &dst_off) - || __builtin_mul_overflow (element_size, src_offsets[0], &src_off)) + if (__builtin_mul_overflow (span, dst_offsets[0], &dst_off) + || __builtin_mul_overflow (span, src_offsets[0], &src_off)) return EINVAL; - if (strides - && __builtin_mul_overflow (element_size, strides[0], &stride)) + if (__builtin_mul_overflow (span, strides[0], &stride)) return EINVAL; for (i = 0, ret = 1; i < volume[0] && ret; i++) @@ -4744,7 +4746,9 @@ omp_target_memcpy_rect_worker (void *dst, const void *src, size_t element_size, /* host->device, device->host and intra device. */ if (num_dims == 2 - && (!strides || (strides[0] == 1 && strides[1] == 1)) + && (!strides || (strides[0] == 1 + && strides[1] == 1 + && element_size == span)) && ((src_devicep && src_devicep == dst_devicep && src_devicep->memcpy2d_func) @@ -4773,7 +4777,8 @@ omp_target_memcpy_rect_worker (void *dst, const void *src, size_t element_size, else if (num_dims == 3 && (!strides || (strides[0] == 1 && strides[1] == 1 - && strides[2] == 1)) + && strides[2] == 1 + && element_size == span)) && ((src_devicep && src_devicep == dst_devicep && src_devicep->memcpy3d_func) @@ -4817,7 +4822,7 @@ omp_target_memcpy_rect_worker (void *dst, const void *src, size_t element_size, { ret = omp_target_memcpy_rect_worker ((char *) dst + dst_off, (const char *) src + src_off, - element_size, num_dims - 1, + element_size, span, num_dims - 1, volume + 1, strides ? strides + 1 : NULL, dst_offsets + 1, src_offsets + 1, @@ -4870,8 +4875,8 @@ omp_target_memcpy_rect_copy (void *dst, const void *src, gomp_mutex_lock (&src_devicep->lock); if (lock_dst) gomp_mutex_lock (&dst_devicep->lock); - int ret = omp_target_memcpy_rect_worker (dst, src, element_size, num_dims, - volume, NULL, dst_offsets, + int ret = omp_target_memcpy_rect_worker (dst, src, element_size, element_size, + num_dims, volume, NULL, dst_offsets, src_offsets, dst_dimensions, src_dimensions, dst_devicep, src_devicep, diff --git a/libgomp/testsuite/libgomp.fortran/noncontig-updates-1.f90 b/libgomp/testsuite/libgomp.fortran/noncontig-updates-1.f90 new file mode 100644 index 000000000000..6ee87e8043b3 --- /dev/null +++ b/libgomp/testsuite/libgomp.fortran/noncontig-updates-1.f90 @@ -0,0 +1,54 @@ +! { dg-do run } +! { dg-require-effective-target offload_device_nonshared_as } + +implicit none +integer, allocatable, target :: arr(:), arr2(:,:) +integer, pointer :: ap(:), ap2(:,:) +integer :: i, j + +allocate(arr(1:20)) + +arr = 0 + +!$omp target enter data map(to: arr) + +ap => arr(1:20:2) +ap = 5 + +!$omp target update to(ap) + +!$omp target exit data map(from: arr) + +do i=1,20 + if (mod(i,2).eq.1.and.arr(i).ne.5) stop 1 + if (mod(i,2).eq.0.and.arr(i).ne.0) stop 2 +end do + +allocate(arr2(1:20,1:20)) + +ap2 => arr2(2:10:2,3:12:3) + +arr2 = 1 + +!$omp target enter data map(to: arr2) + +!$omp target +ap2 = 5 +!$omp end target + +!$omp target update from(ap2) + +do i=1,20 + do j=1,20 + if (i.ge.2.and.i.le.10.and.mod(i-2,2).eq.0.and.& + &j.ge.3.and.j.le.12.and.mod(j-3,3).eq.0) then + if (arr2(i,j).ne.5) stop 3 + else + if (arr2(i,j).ne.1) stop 4 + end if + end do +end do + +!$omp target exit data map(delete: arr2) + +end diff --git a/libgomp/testsuite/libgomp.fortran/noncontig-updates-10.f90 b/libgomp/testsuite/libgomp.fortran/noncontig-updates-10.f90 new file mode 100644 index 000000000000..c47ce38918d6 --- /dev/null +++ b/libgomp/testsuite/libgomp.fortran/noncontig-updates-10.f90 @@ -0,0 +1,29 @@ +! { dg-do run } +! { dg-require-effective-target offload_device_nonshared_as } + +character(len=8), allocatable, dimension(:) :: lines +integer :: i + +allocate(lines(10)) + +lines = "OMPHELLO" + +!$omp target enter data map(to: lines) + +!$omp target +lines = "NEWVALUE" +!$omp end target + +!$omp target update from(lines(5:7:2)) + +do i=1,10 + if (i.eq.5.or.i.eq.7) then + if (lines(i).ne."NEWVALUE") stop 1 + else + if (lines(i).ne."OMPHELLO") stop 2 + end if +end do + +!$omp target exit data map(delete: lines) + +end diff --git a/libgomp/testsuite/libgomp.fortran/noncontig-updates-11.f90 b/libgomp/testsuite/libgomp.fortran/noncontig-updates-11.f90 new file mode 100644 index 000000000000..a93acf21d770 --- /dev/null +++ b/libgomp/testsuite/libgomp.fortran/noncontig-updates-11.f90 @@ -0,0 +1,51 @@ +! { dg-do run } +! { dg-require-effective-target offload_device_nonshared_as } + +program p +implicit none +real(kind=4) :: arr(10,10,10,10) + +call s(arr,9,9,9,9) + +contains + +subroutine s(arr,m,n,o,p) +implicit none +integer :: i,m,n,o,p +integer :: a,b,c,d +real(kind=4) :: arr(0:m,0:n,0:o,0:p) + +arr = 0 + +!$omp target enter data map(to: arr) + +!$omp target +do i=0,9 + arr(i,i,i,i) = i +end do +!$omp end target + +!$omp target update from(arr(0:2,0:2,0:2,0:2)) + +do a=0,9 + do b=0,9 + do c=0,9 + do d=0,9 + if (a.le.2.and.b.le.2.and.c.le.2.and.d.le.2) then + if (a.eq.b.and.b.eq.c.and.c.eq.d) then + if (arr(a,b,c,d).ne.a) stop 1 + else + if (arr(a,b,c,d).ne.0) stop 2 + end if + else + if (arr(a,b,c,d).ne.0) stop 3 + end if + end do + end do + end do +end do + +!$omp target exit data map(delete: arr) + +end subroutine s +end program p diff --git a/libgomp/testsuite/libgomp.fortran/noncontig-updates-12.f90 b/libgomp/testsuite/libgomp.fortran/noncontig-updates-12.f90 new file mode 100644 index 000000000000..c47fbdb0d112 --- /dev/null +++ b/libgomp/testsuite/libgomp.fortran/noncontig-updates-12.f90 @@ -0,0 +1,59 @@ +! { dg-do run } +! { dg-require-effective-target offload_device_nonshared_as } + +! Test plain, fixed-size arrays, and also pointers to same. + +implicit none +integer(kind=8) :: arr(10,30) +integer, target :: arr2(9,11,13) +integer, pointer :: parr(:,:,:) +integer :: i, j, k + +arr = 0 +!$omp target enter data map(to: arr) + +!$omp target +arr = 99 +!$omp end target + +!$omp target update from(arr(1:10:3,5:30:7)) + +do i=1,10 + do j=1,30 + if (mod(i-1,3).eq.0.and.mod(j-5,7).eq.0) then + if (arr(i,j).ne.99) stop 1 + else + if (arr(i,j).ne.0) stop 2 + endif + end do +end do + +!$omp target exit data map(delete: arr) + +arr2 = 0 +parr => arr2 +!$omp target enter data map(to: parr) + +!$omp target +parr = 99 +!$omp end target + +!$omp target update from(parr(7:9:2,5:7:2,3:6:3)) + +do i=1,9 + do j=1,11 + do k=1,13 + if (i.ge.7.and.j.ge.5.and.k.ge.3.and.& + &i.le.9.and.j.le.7.and.k.le.6.and.& + &mod(i-7,2).eq.0.and.mod(j-5,2).eq.0.and.mod(k-3,3).eq.0) then + if (parr(i,j,k).ne.99) stop 3 + else + if (parr(i,j,k).ne.0) stop 4 + end if + end do + end do +end do + +!$omp target exit data map(delete: parr) + +end diff --git a/libgomp/testsuite/libgomp.fortran/noncontig-updates-13.f90 b/libgomp/testsuite/libgomp.fortran/noncontig-updates-13.f90 new file mode 100644 index 000000000000..42f867efefc1 --- /dev/null +++ b/libgomp/testsuite/libgomp.fortran/noncontig-updates-13.f90 @@ -0,0 +1,42 @@ +! { dg-do run } +! { dg-require-effective-target offload_device_nonshared_as } + +implicit none +integer, allocatable :: arr(:,:,:,:,:) +integer :: i, j, k, l, m + +allocate (arr(18,19,20,21,22)) + +arr = 0 + +!$omp target enter data map(to: arr) + +arr = 10 + +!$omp target update to(arr(1:3:2,1:4:3,1:5:4,1:6:5,1:7:6)) + +!$omp target +do i=1,18 + do j=1,19 + do k=1,20 + do l=1,21 + do m=1,22 + if ((i.eq.1.or.i.eq.3).and.& + &(j.eq.1.or.j.eq.4).and.& + &(k.eq.1.or.k.eq.5).and.& + &(l.eq.1.or.l.eq.6).and.& + &(m.eq.1.or.m.eq.7)) then + if (arr(i,j,k,l,m).ne.10) stop 1 + else + if (arr(i,j,k,l,m).ne.0) stop 2 + end if + end do + end do + end do + end do +end do +!$omp end target + +!$omp target exit data map(delete: arr) + +end diff --git a/libgomp/testsuite/libgomp.fortran/noncontig-updates-2.f90 b/libgomp/testsuite/libgomp.fortran/noncontig-updates-2.f90 new file mode 100644 index 000000000000..2d3efb8bfccc --- /dev/null +++ b/libgomp/testsuite/libgomp.fortran/noncontig-updates-2.f90 @@ -0,0 +1,101 @@ +! { dg-do run } +! { dg-require-effective-target offload_device_nonshared_as } + +program p +implicit none +integer, allocatable, target :: arr3(:,:,:) +integer, pointer :: ap3(:,:,:) +integer :: i, j, k + +allocate(arr3(1:10,1:10,1:10)) + +! CHECK 1 + +arr3 = 0 +ap3 => arr3(1:10,1:10,1:10:2) + +!$omp target enter data map(to: arr3) + +!$omp target +ap3 = 5 +!$omp end target + +!$omp target update from(ap3) + +call check(arr3, 0, 1, 1, 2) + +!$omp target exit data map(delete: arr3) + +! CHECK 2 + +arr3 = 0 +ap3 => arr3(1:10,1:10:2,1:10) + +!$omp target enter data map(to: arr3) + +!$omp target +ap3 = 5 +!$omp end target + +!$omp target update from(ap3) + +call check(arr3, 2, 1, 2, 1) + +!$omp target exit data map(delete: arr3) + +! CHECK 3 + +arr3 = 0 +ap3 => arr3(1:10:2,1:10,1:10) + +!$omp target enter data map(to: arr3) + +!$omp target +ap3 = 5 +!$omp end target + +!$omp target update from(ap3) + +call check(arr3, 4, 2, 1, 1) + +!$omp target exit data map(delete: arr3) + +! CHECK 4 + +arr3 = 0 +ap3 => arr3(1:10:2,1:10:2,1:10:2) + +!$omp target enter data map(to: arr3) + +!$omp target +ap3 = 5 +!$omp end target + +!$omp target update from(ap3) + +call check(arr3, 6, 2, 2, 2) + +!$omp target exit data map(delete: arr3) + +contains + +subroutine check(arr,cb,s1,s2,s3) +implicit none +integer :: arr(:,:,:) +integer :: cb, s1, s2, s3 + +do i=1,10 + do j=1,10 + do k=1,10 + if (mod(k-1,s1).eq.0.and.mod(j-1,s2).eq.0.and.mod(i-1,s3).eq.0) then + if (arr(k,j,i).ne.5) stop cb+1 + else + if (arr(k,j,i).ne.0) stop cb+2 + end if + end do + end do +end do + +end subroutine check + +end program p diff --git a/libgomp/testsuite/libgomp.fortran/noncontig-updates-3.f90 b/libgomp/testsuite/libgomp.fortran/noncontig-updates-3.f90 new file mode 100644 index 000000000000..14f1288a6970 --- /dev/null +++ b/libgomp/testsuite/libgomp.fortran/noncontig-updates-3.f90 @@ -0,0 +1,47 @@ +program p +! { dg-do run } +! { dg-require-effective-target offload_device_nonshared_as } + +integer :: A(200) +A = [(i, i=1,200)] +!$omp target enter data map(to: A(40:200)) +call foo(A(101:)) + +contains + +subroutine foo(x) +integer, target :: x(100) +integer, pointer :: p(:,:) +integer :: i, j + +p(0:5,-5:-1) => x(::2) + +!$omp target +x = x * 2 +!$omp end target + +!$omp target update from(x(1:20:2)) + +do i=1,20 +if (mod(i,2).eq.1 .and. x(i).ne.(100+i)*2) stop 1 +if (mod(i,2).eq.0 .and. x(i).ne.100+i) stop 2 +end do + +!$omp target +p = 0 +!$omp end target + +!$omp target update from(p(::3,::2)) + +do i=0,5 + do j=-5,-1 + if (mod(i,3).eq.0 .and. mod(j+5,2).eq.0) then + if (p(i,j).ne.0) stop 3 + else + if (p(i,j).eq.0) stop 4 + end if + end do +end do + +end subroutine foo +end program p diff --git a/libgomp/testsuite/libgomp.fortran/noncontig-updates-4.f90 b/libgomp/testsuite/libgomp.fortran/noncontig-updates-4.f90 new file mode 100644 index 000000000000..46e8c23d2856 --- /dev/null +++ b/libgomp/testsuite/libgomp.fortran/noncontig-updates-4.f90 @@ -0,0 +1,78 @@ +! { dg-do run } +! { dg-require-effective-target offload_device_nonshared_as } + +type t + complex(kind=8) :: c + integer :: i +end type t + +type u + integer :: i, j + complex(kind=8) :: c + integer :: k +end type u + +type(t), target :: var(10) +type(u), target :: var2(10) +complex(kind=8), pointer :: ptr(:) +integer :: i + +do i=1,10 + var(i)%c = dcmplx(i,0) + var(i)%i = i +end do + +ptr => var(:)%c + +!$omp target enter data map(to: var) + +!$omp target +var(:)%c = dcmplx(0,0) +var(:)%i = 0 +!$omp end target + +!$omp target update from(ptr) + +do i=1,10 + if (var(i)%c.ne.dcmplx(0,0)) stop 1 + if (var(i)%i.ne.i) stop 2 +end do + +!$omp target exit data map(delete: var) + +! Now do it again with a differently-ordered derived type. + +do i=1,10 + var2(i)%c = dcmplx(0,i) + var2(i)%i = i + var2(i)%j = i * 2 + var2(i)%k = i * 3 +end do + +ptr => var2(::2)%c + +!$omp target enter data map(to: var2) + +!$omp target +var2(:)%c = dcmplx(0,0) +var2(:)%i = 0 +var2(:)%j = 0 +var2(:)%k = 0 +!$omp end target + +!$omp target update from(ptr) + +do i=1,10 + if (mod(i,2).eq.1) then + if (var2(i)%c.ne.dcmplx(0,0)) stop 3 + else + if (var2(i)%c.ne.dcmplx(0,i)) stop 4 + end if + if (var2(i)%i.ne.i) stop 5 + if (var2(i)%j.ne.i * 2) stop 6 + if (var2(i)%k.ne.i * 3) stop 7 +end do + +!$omp target exit data map(delete: var2) + +end diff --git a/libgomp/testsuite/libgomp.fortran/noncontig-updates-5.f90 b/libgomp/testsuite/libgomp.fortran/noncontig-updates-5.f90 new file mode 100644 index 000000000000..9cc20fa321eb --- /dev/null +++ b/libgomp/testsuite/libgomp.fortran/noncontig-updates-5.f90 @@ -0,0 +1,55 @@ +! { dg-do run } +! { dg-require-effective-target offload_device_nonshared_as } + +! Only some of an array mapped on the target + +integer, target :: arr(100) +integer, pointer :: ptr(:) + +arr = [(i * 2, i=1,100)] + +!$omp target enter data map(to: arr(51:100)) + +!$omp target +arr(51:100) = arr(51:100) + 1 +!$omp end target + +!$omp target update from(arr(51:100:2)) + +do i=1,100 + if (i.le.50) then + if (arr(i).ne.i*2) stop 1 + else + if (mod(i,2).eq.1 .and. arr(i).ne.i*2+1) stop 2 + if (mod(i,2).eq.0 .and. arr(i).ne.i*2) stop 3 + end if +end do + +!$omp target exit data map(delete: arr) + +arr = [(i * 2, i=1,100)] + +! Similar, but update via pointer. + +ptr => arr(51:100) + +!$omp target enter data map(to: ptr(1:50)) + +!$omp target +ptr = ptr + 1 +!$omp end target + +!$omp target update from(ptr(::2)) + +do i=1,100 + if (i.le.50) then + if (arr(i).ne.i*2) stop 1 + else + if (mod(i,2).eq.1 .and. arr(i).ne.i*2+1) stop 2 + if (mod(i,2).eq.0 .and. arr(i).ne.i*2) stop 3 + end if +end do + +!$omp target exit data map(delete: ptr) + +end diff --git a/libgomp/testsuite/libgomp.fortran/noncontig-updates-6.f90 b/libgomp/testsuite/libgomp.fortran/noncontig-updates-6.f90 new file mode 100644 index 000000000000..5c42b9077b38 --- /dev/null +++ b/libgomp/testsuite/libgomp.fortran/noncontig-updates-6.f90 @@ -0,0 +1,34 @@ +! { dg-do run } +! { dg-require-effective-target offload_device_nonshared_as } + +program p +implicit none +integer, dimension(100) :: parr +integer :: i + +parr = [(i, i=1,100)] + +!$omp target enter data map(to: parr) + +call s(parr) + +do i=1,100 + if (mod(i,3).eq.1 .and. parr(i).ne.999) stop 1 + if (mod(i,3).ne.1 .and. parr(i).ne.i) stop 2 +end do + +!$omp target exit data map(delete: parr) + +contains +subroutine s(arr) +implicit none +integer, intent(inout) :: arr(*) + +!$omp target map(alloc: arr(1:100)) +arr(1:100) = 999 +!$omp end target + +!$omp target update from(arr(1:100:3)) + +end subroutine s +end program p diff --git a/libgomp/testsuite/libgomp.fortran/noncontig-updates-7.f90 b/libgomp/testsuite/libgomp.fortran/noncontig-updates-7.f90 new file mode 100644 index 000000000000..120fd9c90ed5 --- /dev/null +++ b/libgomp/testsuite/libgomp.fortran/noncontig-updates-7.f90 @@ -0,0 +1,36 @@ +! { dg-do run } +! { dg-require-effective-target offload_device_nonshared_as } + +! Assumed-shape arrays + +program p +implicit none +integer, dimension(100) :: parr +integer :: i + +parr = [(i, i=1,100)] + +!$omp target enter data map(to: parr) + +call s(parr) + +do i=1,100 + if (mod(i,3).eq.1 .and. parr(i).ne.999) stop 1 + if (mod(i,3).ne.1 .and. parr(i).ne.i) stop 2 +end do + +!$omp target exit data map(delete: parr) + +contains +subroutine s(arr) +implicit none +integer, intent(inout) :: arr(:) + +!$omp target +arr = 999 +!$omp end target + +!$omp target update from(arr(1:100:3)) + +end subroutine s +end program p diff --git a/libgomp/testsuite/libgomp.fortran/noncontig-updates-8.f90 b/libgomp/testsuite/libgomp.fortran/noncontig-updates-8.f90 new file mode 100644 index 000000000000..d9b3c9ca8966 --- /dev/null +++ b/libgomp/testsuite/libgomp.fortran/noncontig-updates-8.f90 @@ -0,0 +1,39 @@ +! { dg-do run } +! { dg-require-effective-target offload_device_nonshared_as } + +! Test biasing for target-region lookup. + +implicit none +integer, allocatable, target :: var(:,:,:) +integer, pointer :: p(:,:,:) +integer :: i, j, k + +allocate(var(1:20,5:25,10:30)) + +var = 0 + +!$omp target enter data map(to: var) + +!$omp target +var = 99 +!$omp end target + +p => var(1:3:2,5:5,10:10) + +!$omp target update from(p) + +do i=1,20 + do j=5,25 + do k=10,30 + if ((i.eq.1.or.i.eq.3).and.j.eq.5.and.k.eq.10) then + if (var(i,j,k).ne.99) stop 1 + else + if (var(i,j,k).ne.0) stop 2 + end if + end do + end do +end do + +!$omp target exit data map(delete: var) + +end diff --git a/libgomp/testsuite/libgomp.fortran/noncontig-updates-9.f90 b/libgomp/testsuite/libgomp.fortran/noncontig-updates-9.f90 new file mode 100644 index 000000000000..689a46a91f0e --- /dev/null +++ b/libgomp/testsuite/libgomp.fortran/noncontig-updates-9.f90 @@ -0,0 +1,34 @@ +! { dg-do run } +! { dg-require-effective-target offload_device_nonshared_as } + +! This test case hits the problem described in: +! https://gcc.gnu.org/pipermail/gcc-patches/2023-February/612219.html + +! { dg-xfail-run-if "'enter data' bug" { offload_device_nonshared_as } } + +character(len=:), allocatable, dimension(:) :: lines +integer :: i + +allocate(character(len=8) :: lines(10)) + +lines = "OMPHELLO" + +!$omp target enter data map(to: lines) + +!$omp target +lines = "NEWVALUE" +!$omp end target + +!$omp target update from(lines(5:7:2)) + +do i=1,10 + if (i.eq.5.or.i.eq.7) then + if (lines(i).ne."NEWVALUE") stop 1 + else + if (lines(i).ne."OMPHELLO") stop 2 + end if +end do + +!$omp target exit data map(delete: lines) + +end