From patchwork Mon Jan 30 15:26:21 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Thomas Schwinge X-Patchwork-Id: 721565 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 3vBtXq2CwJz9sf9 for ; Tue, 31 Jan 2017 02:26:51 +1100 (AEDT) Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.b="EvldR+QZ"; dkim-atps=neutral DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:cc:subject:in-reply-to:references:date:message-id :mime-version:content-type; q=dns; s=default; b=I+XhfuvkXCudg8Jr ZTpDjW28kNRy/s2lQ9LAAS/Rxp8xFQzLM2qjNDQPQZTLnDS45XwLZcrnrMmQiw3v lyDhXPA3FtA0YP4RBKqDQsEtcYYQfbN7k6rrelDOjHZCsiKamTUEEZPA/UyvTYcd GnWPF+y2TN4VR0V7VK/XgShgciU= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:cc:subject:in-reply-to:references:date:message-id :mime-version:content-type; s=default; bh=JB0XYg6OmCarfy/gQlU0Lv Yrx7E=; b=EvldR+QZozELnYnEgVOplwpwIkEezzXdsyYf0bwXY75l+aRlpOPbyg Bhncy7/W+G99YFiuhFZWj4D+dA75a14lWBL3wd7PeZ6rCqy0K6+TnqVVHb7JyVYm 7mbuE7c9z/mY1tft/PmJd7k2Tocuncd2VZEUv70QMOqOFhzpRvONY= Received: (qmail 106910 invoked by alias); 30 Jan 2017 15:26:42 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 106672 invoked by uid 89); 30 Jan 2017 15:26:41 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-1.9 required=5.0 tests=AWL, BAYES_00, RCVD_IN_DNSWL_NONE, SPF_PASS, URIBL_RED autolearn=ham version=3.3.2 spammy=clique, location_t, U*thomas, thomas@codesourcery.com X-Spam-User: qpsmtpd, 2 recipients X-HELO: relay1.mentorg.com Received: from relay1.mentorg.com (HELO relay1.mentorg.com) (192.94.38.131) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Mon, 30 Jan 2017 15:26:30 +0000 Received: from svr-orw-fem-02x.mgc.mentorg.com ([147.34.96.206] helo=SVR-ORW-FEM-02.mgc.mentorg.com) by relay1.mentorg.com with esmtp id 1cYDqe-0002kF-Ov from Thomas_Schwinge@mentor.com ; Mon, 30 Jan 2017 07:26:28 -0800 Received: from tftp-cs (147.34.91.1) by svr-orw-fem-02.mgc.mentorg.com (147.34.96.168) with Microsoft SMTP Server id 14.3.224.2; Mon, 30 Jan 2017 07:26:28 -0800 Received: by tftp-cs (Postfix, from userid 49978) id A6192C224D; Mon, 30 Jan 2017 07:26:27 -0800 (PST) From: Thomas Schwinge To: Cesar Philippidis CC: "gcc-patches@gcc.gnu.org" , Fortran List Subject: Re: [gomp4] partially enable GOMP_MAP_FIRSTPRIVATE_POINTER in gfortran In-Reply-To: <23548788-8508-dda9-f559-b4e588e9c644@codesourcery.com> References: <23548788-8508-dda9-f559-b4e588e9c644@codesourcery.com> User-Agent: Notmuch/0.9-125-g4686d11 (http://notmuchmail.org) Emacs/24.5.1 (x86_64-pc-linux-gnu) Date: Mon, 30 Jan 2017 16:26:21 +0100 Message-ID: <87inowbmci.fsf@euler.schwinge.homeip.net> MIME-Version: 1.0 Hi Cesar! (It's me, again!) ;-) On Fri, 27 Jan 2017 09:13:06 -0800, Cesar Philippidis wrote: > This patch partially enables GOMP_MAP_FIRSTPRIVATE_POINTER in gfortran. > gfortran still falls back to GOMP_MAP_POINTER for arrays with > descriptors and derived types. The limitation on derived types is there > because we don't have much test coverage for it, and this patch series > was more exploratory for performance enhancements. Now that you still freshly remember it, please file an issue so that we'll take care of that later. > With that in mind, > there are a couple of shortcomings with this patch. > > 1) Dummy reduction variables fallback to GOMP_MAP_POINTER because of a > pointer dereferencing bug. Please also file an issue for that. > The state of debugging such problems on > PTX targets leaves something to be desired, especially since print > isn't working on nvptx targets currently. If the following is what you mean, then that's working for me: $ cat < ../printf.c int main(int argc, char *argv[]) { #pragma acc parallel copyin(argv[0][0:__builtin_strlen(argv[0]) + 1]) { __builtin_printf("Offloaded from %s.\n", argv[0]); } return 0; } $ build-gcc/gcc/xgcc [...] -Wall -Wextra -g ../printf.c -fopenacc -O2 $ GOMP_DEBUG=1 ./a.out [...] nvptx_exec: kernel main$_omp_fn$0: launch gangs=1, workers=1, vectors=32 Offloaded from ./a.out. nvptx_exec: kernel main$_omp_fn$0: finished GOMP_offload_unregister_ver (1, 0x400c20, 5, 0x401560) GOMP_offload_unregister_ver (0, 0x400c20, 6, 0x602050) Again, please file an issue as appropriate. ;-) > 2) Apparently, firstprivate pointers negatively affects the alias > analysis used by ACC KERNELS and parloops, so a couple of more > execution tests fail to generate offloaded code. > > I plan to resolve issue 1) in a follow up patch later on (but maybe not > in the immediate future). Regarding 2), ACC KERNELS are eventually going > to need a significant rework, but that's not going to happen in the near > future either. I've been pushing to get the performance of ACC PARALLEL > regions on par to other OpenACC compilers first, and hopefully that > won't be too far way. Hmm, hmm. > With this patch, I'm observing an approximate 0.6s reduction in > CloverLeaf's original 0.9s execution time (it takes approximate 0.9s > after the GOMP_MAP_FIRSTPRIVATE_INT and GOMP_MAP_TO_PSET patches), to > yield a final execution time somewhere in the neighborhood of 0.3s. > That's about a one second savings from the unpatched version of GCC. Yay! \o/ > This patch has been committed to gomp-4_0-branch. (Not reviewed in detail.) > --- a/gcc/fortran/trans-openmp.c > +++ b/gcc/fortran/trans-openmp.c > @@ -2005,9 +2005,12 @@ gfc_trans_omp_clauses_1 (stmtblock_t *block, gfc_omp_clauses *clauses, > (TREE_TYPE (TREE_TYPE (decl))))) > { > tree orig_decl = decl; > + enum gomp_map_kind gmk = GOMP_MAP_FIRSTPRIVATE_POINTER; > + if (n->u.map_op == OMP_MAP_FORCE_DEVICEPTR) > + gmk = GOMP_MAP_POINTER; Curious, why is "deviceptr" different? > node4 = build_omp_clause (input_location, > OMP_CLAUSE_MAP); > - OMP_CLAUSE_SET_MAP_KIND (node4, GOMP_MAP_POINTER); > + OMP_CLAUSE_SET_MAP_KIND (node4, gmk); > OMP_CLAUSE_DECL (node4) = decl; > OMP_CLAUSE_SIZE (node4) = size_int (0); > decl = build_fold_indirect_ref (decl); > --- a/gcc/gimplify.c > +++ b/gcc/gimplify.c > @@ -6605,11 +6636,10 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p, > ctx = new_omp_context (region_type); > ctx->clauses = *list_p; > outer_ctx = ctx->outer_context; > - if (code == OMP_TARGET && !lang_GNU_Fortran ()) > + if (code == OMP_TARGET && !(lang_GNU_Fortran () && !(region_type & ORT_ACC))) > { > - ctx->target_map_pointers_as_0len_arrays = true; > - /* FIXME: For Fortran we want to set this too, when > - the Fortran FE is updated to OpenMP 4.5. */ > + if (!lang_GNU_Fortran () || region_type & ORT_ACC) > + ctx->target_map_pointers_as_0len_arrays = true; > ctx->target_map_scalars_firstprivate = true; > } I guess the Fortran OpenMP comment should stay? And, isn't that logic a bit complicated; could simplify this as follows, unless I'm confused? > --- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-n.f95 > +++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-n.f95 > @@ -37,4 +37,6 @@ end module test > ! Check that the loop has been split off into a function. > ! { dg-final { scan-tree-dump-times "(?n);; Function __test_MOD_foo._omp_fn.0 " 1 "optimized" } } > > -! { dg-final { scan-tree-dump-times "(?n)oacc function \\(0," 1 "parloops1" } } > +! This failure was introduced with the GOMP_MAP_POINTER -> > +! GOMP_MAP_FIRSTPRIVATE_POINTER conversion. > +! { dg-final { scan-tree-dump-times "(?n)oacc function \\(0," 1 "parloops1" { xfail *-*-* } } } Hmm, hmm. > --- a/libgomp/testsuite/libgomp.oacc-fortran/deviceptr-1.f90 > +++ b/libgomp/testsuite/libgomp.oacc-fortran/deviceptr-1.f90 > @@ -3,6 +3,7 @@ > ! the deviceptr variable is implied. > > ! { dg-do run } > +! { dg-additional-options "-foffload-force" } > > subroutine subr1 (a, b) > implicit none This is also an OpenACC kernels issue. > --- a/libgomp/testsuite/libgomp.oacc-fortran/non-scalar-data.f90 > +++ b/libgomp/testsuite/libgomp.oacc-fortran/non-scalar-data.f90 > @@ -2,6 +2,7 @@ > ! offloaded regions are properly mapped using present_or_copy. > > ! { dg-do run } > +! { dg-additional-options "-foffload-force" } > > program main > implicit none Likweise. I do agree that our OpenACC kernels implementation leaves a lot to be desired, but that we're now also regressing such very simple cases, is a bit unfortunate. Have you already made an attempt at figuring out what's going wrong? Another OpenMP regression: PASS: libgomp.fortran/target2.f90 -O0 (test for excess errors) [-PASS:-]{+FAIL:+} libgomp.fortran/target2.f90 -O0 execution test PASS: libgomp.fortran/target2.f90 -O1 (test for excess errors) [-PASS:-]{+FAIL:+} libgomp.fortran/target2.f90 -O1 execution test PASS: libgomp.fortran/target2.f90 -O2 (test for excess errors) [-PASS:-]{+FAIL:+} libgomp.fortran/target2.f90 -O2 execution test PASS: libgomp.fortran/target2.f90 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions (test for excess errors) [-PASS:-]{+FAIL:+} libgomp.fortran/target2.f90 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test PASS: libgomp.fortran/target2.f90 -O3 -g (test for excess errors) [-PASS:-]{+FAIL:+} libgomp.fortran/target2.f90 -O3 -g execution test PASS: libgomp.fortran/target2.f90 -Os (test for excess errors) [-PASS:-]{+FAIL:+} libgomp.fortran/target2.f90 -Os execution test That is: offload error: process on the device 0 unexpectedly exited with code 0 ..., which, as far as I remember, basically means "SIGSEGV" in the Intel MIC (emulated) offloaded code. Porting this gomp-4_0-branch r244987 "Partially enable GOMP_MAP_FIRSTPRIVATE_POINTER in gfortran." to trunk (see attached, if you want to experiment with that), I can reproduce some (maybe even the same?) issue with OpenMP nvptx offloading: "libgomp: cuCtxSynchronize error: an illegal memory access was encountered". Do you have an idea which of your changes might cause that? Grüße Thomas From 35dfd63154e01e2d9f299daaa876adcc6f94f013 Mon Sep 17 00:00:00 2001 From: Thomas Schwinge Date: Mon, 30 Jan 2017 14:48:40 +0100 Subject: [PATCH] Partially enable GOMP_MAP_FIRSTPRIVATE_POINTER in gfortran. gcc/fortran/ * trans-openmp.c (gfc_omp_finish_clause): Use GOMP_MAP_POINTER for POINTER_TYPE decls. (gfc_trans_omp_clauses_1): Likewise. gcc/ * gimplify.c (demote_firstprivate_pointer): New function. (gimplify_scan_omp_clauses): Enable target_map_pointers_as_0len_arrays and target_map_scalars_firstprivate in OpenACC and gfortran. (gimplify_adjust_omp_clauses): Demote FIRSTPRIVATE_POINTERS for OpenACC retuction variables. * omp-low.c (lower_omp_target): Adjust receiver reference of decls for fortran dummy arguments. gcc/testsuite/ * gfortran.dg/goacc/kernels-loop-n.f95: Xfail test. libgomp/ * testsuite/libgomp.oacc-fortran/deviceptr-1.f90: Add -foffload-force. * testsuite/libgomp.oacc-fortran/non-scalar-data.f90: Likewise. (cherry picked from commit 771fd834ccc7b5b06dc763240636f0b9a883a8fc) --- gcc/fortran/trans-openmp.c | 7 ++- gcc/gimplify.c | 52 +++++++++++++++++++--- gcc/omp-low.c | 3 +- .../gfortran.dg/goacc/kernels-alias-3.f95 | 3 +- 4 files changed, 55 insertions(+), 10 deletions(-) diff --git a/gcc/fortran/trans-openmp.c b/gcc/fortran/trans-openmp.c index 4f525fe..0afe8a0 100644 --- a/gcc/fortran/trans-openmp.c +++ b/gcc/fortran/trans-openmp.c @@ -1070,7 +1070,7 @@ gfc_omp_finish_clause (tree c, gimple_seq *pre_p) return; tree orig_decl = decl; c4 = build_omp_clause (OMP_CLAUSE_LOCATION (c), OMP_CLAUSE_MAP); - OMP_CLAUSE_SET_MAP_KIND (c4, GOMP_MAP_POINTER); + OMP_CLAUSE_SET_MAP_KIND (c4, GOMP_MAP_FIRSTPRIVATE_POINTER); OMP_CLAUSE_DECL (c4) = decl; OMP_CLAUSE_SIZE (c4) = size_int (0); decl = build_fold_indirect_ref (decl); @@ -2095,9 +2095,12 @@ gfc_trans_omp_clauses (stmtblock_t *block, gfc_omp_clauses *clauses, (TREE_TYPE (TREE_TYPE (decl))))) { tree orig_decl = decl; + enum gomp_map_kind gmk = GOMP_MAP_FIRSTPRIVATE_POINTER; + if (n->u.map_op == OMP_MAP_FORCE_DEVICEPTR) + gmk = GOMP_MAP_POINTER; node4 = build_omp_clause (input_location, OMP_CLAUSE_MAP); - OMP_CLAUSE_SET_MAP_KIND (node4, GOMP_MAP_POINTER); + OMP_CLAUSE_SET_MAP_KIND (node4, gmk); OMP_CLAUSE_DECL (node4) = decl; OMP_CLAUSE_SIZE (node4) = size_int (0); decl = build_fold_indirect_ref (decl); diff --git a/gcc/gimplify.c b/gcc/gimplify.c index feb5fa0..cd6c2aa 100644 --- a/gcc/gimplify.c +++ b/gcc/gimplify.c @@ -178,6 +178,7 @@ struct gimplify_omp_ctx /* Iteration variables in an OMP_FOR. */ vec loop_iter_var; location_t location; + tree clauses; enum omp_clause_default_kind default_kind; enum omp_region_type region_type; bool combined_loop; @@ -402,6 +403,7 @@ new_omp_context (enum omp_region_type region_type) c->variables = splay_tree_new (splay_tree_compare_decl_uid, 0, 0); c->privatized_types = new hash_set; c->location = input_location; + c->clauses = NULL_TREE; c->region_type = region_type; if ((region_type & ORT_TASK) == 0) c->default_kind = OMP_CLAUSE_DEFAULT_SHARED; @@ -7318,6 +7320,37 @@ find_decl_expr (tree *tp, int *walk_subtrees, void *data) return NULL_TREE; } +static void +demote_firstprivate_pointer (tree decl, gimplify_omp_ctx *ctx) +{ + if (!lang_GNU_Fortran ()) + return; + + while (ctx) + { + if (ctx->region_type == ORT_ACC_PARALLEL + || ctx->region_type == ORT_ACC_KERNELS) + break; + ctx = ctx->outer_context; + } + + if (ctx == NULL) + return; + + tree clauses = ctx->clauses; + + for (tree c = clauses; c; c = OMP_CLAUSE_CHAIN (c)) + { + if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP + && OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_FIRSTPRIVATE_POINTER + && OMP_CLAUSE_DECL (c) == decl) + { + OMP_CLAUSE_SET_MAP_KIND (c, GOMP_MAP_POINTER); + return; + } + } +} + /* Scan the OMP clauses in *LIST_P, installing mappings into a new and previous omp contexts. */ @@ -7333,9 +7366,10 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p, ctx = new_omp_context (region_type); outer_ctx = ctx->outer_context; + ctx->clauses = *list_p; if (code == OMP_TARGET) { - if (!lang_GNU_Fortran ()) + if (!lang_GNU_Fortran () || region_type & ORT_ACC) ctx->target_map_pointers_as_0len_arrays = true; ctx->target_map_scalars_firstprivate = true; } @@ -7459,6 +7493,7 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p, if (!(region_type & ORT_ACC)) check_non_private = "reduction"; decl = OMP_CLAUSE_DECL (c); + demote_firstprivate_pointer (decl, ctx->outer_context); if (TREE_CODE (decl) == MEM_REF) { tree type = TREE_TYPE (decl); @@ -8910,11 +8945,16 @@ gimplify_adjust_omp_clauses (gimple_seq *pre_p, gimple_seq body, tree *list_p, && kind != GOMP_MAP_FORCE_PRESENT && kind != GOMP_MAP_POINTER) { - warning_at (OMP_CLAUSE_LOCATION (c), 0, - "incompatible data clause with reduction " - "on %qE; promoting to present_or_copy", - DECL_NAME (t)); - OMP_CLAUSE_SET_MAP_KIND (c, GOMP_MAP_TOFROM); + if (lang_hooks.decls.omp_privatize_by_reference (decl)) + OMP_CLAUSE_SET_MAP_KIND (c, GOMP_MAP_POINTER); + else + { + warning_at (OMP_CLAUSE_LOCATION (c), 0, + "incompatible data clause with reduction " + "on %qE; promoting to present_or_copy", + DECL_NAME (t)); + OMP_CLAUSE_SET_MAP_KIND (c, GOMP_MAP_TOFROM); + } } } } diff --git a/gcc/omp-low.c b/gcc/omp-low.c index ff0f447..18aa394 100644 --- a/gcc/omp-low.c +++ b/gcc/omp-low.c @@ -8328,7 +8328,8 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx) } else is_ref = omp_is_reference (var); - if (OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_FIRSTPRIVATE_REFERENCE) + if (OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_FIRSTPRIVATE_REFERENCE + || (lang_GNU_Fortran () && TREE_CODE (var) == PARM_DECL)) is_ref = false; bool ref_to_array = false; if (is_ref) diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-alias-3.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-alias-3.f95 index 07dc8d6..8ca47a0 100644 --- a/gcc/testsuite/gfortran.dg/goacc/kernels-alias-3.f95 +++ b/gcc/testsuite/gfortran.dg/goacc/kernels-alias-3.f95 @@ -16,4 +16,5 @@ end program main ! Only the omp_data_i related loads should be annotated with cliques. ! { dg-final { scan-tree-dump-times "clique 1 base 1" 2 "ealias" } } -! { dg-final { scan-tree-dump-times "(?n)clique 1 base 0" 3 "ealias" } } +! TODO +! { dg-final { scan-tree-dump-times "(?n)clique 1 base 0" 3 "ealias" { xfail *-*-* } } } -- 2.9.3