From patchwork Fri Oct 18 15:08:54 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chung-Lin Tang X-Patchwork-Id: 1179438 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=209.132.180.131; helo=sourceware.org; envelope-from=gcc-patches-return-511296-incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=mentor.com Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.b="WtnIeHFX"; dkim-atps=neutral Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 46vqFD1bq3z9sNw for ; Sat, 19 Oct 2019 02:09:17 +1100 (AEDT) DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender :reply-to:from:subject:to:cc:message-id:date:mime-version :content-type; q=dns; s=default; b=OEuq2nsJUaTcmC+LrLZZeK5o3YT6h dUZw4gub9iYRzqR3KRaKIg1NyfV/ZVnviFQP+r8eMrRqroUa5ch1EE+uZL5EMC9d QPDs1MJT0HuAb1zHR0slrQiJeex1qzX+c/HAZzV267LM8iucPrQWjmxbKMY/tHFy /Tr2AkoPa3ZeqA= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender :reply-to:from:subject:to:cc:message-id:date:mime-version :content-type; s=default; bh=knkr3APw65Pe3zbO5lI1ChR3Ocs=; b=Wtn IeHFXx/fbyYlOILONrkwzjyXCTw/3oGJ459gSmrewlCL0HmLAX2iX1876uMUqs82 BJN79hV9I7IJWLfDAfy68dV/3O8eg+7DiiZwpoI6XFG5XBN0Wg8RCBAQ21VIF13e I2KmUsg8MV9xZtK4GMpoAe0b9zeCcrTBwsBbttSw= Received: (qmail 115639 invoked by alias); 18 Oct 2019 15:09:06 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 115623 invoked by uid 89); 18 Oct 2019 15:09:06 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-10.6 required=5.0 tests=AWL, BAYES_00, GIT_PATCH_2, GIT_PATCH_3, KAM_ASCII_DIVIDERS, KAM_SHORT, SPF_PASS autolearn=ham version=3.3.1 spammy=DECL_NAME, 84, decl_name X-HELO: esa3.mentor.iphmx.com Received: from esa3.mentor.iphmx.com (HELO esa3.mentor.iphmx.com) (68.232.137.180) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Fri, 18 Oct 2019 15:09:04 +0000 IronPort-SDR: LJQfitJ8igyL26eJOjuVFELZt5QS4nxfgF1vIwihju4jyRFTVcntYnVIf7y0T3SXD4x79ELigh ZE3RiwAgwhhr/39uUHcPoz8nKdjyc3NWjar90FI8QlXOns29V8yWYm8FWhS98jUSkVBvvWV/uN pQMIfYZ8ASDki7tL9K/VbsJ4ql4uZPDy1RW/grHl5tkW5jn2f2RrjyQSWstefnLS3BhC3G6O/J D4YJmkDPfmbUEdr6AAGG9WLn618dyNoTPNiPPNLUbOIxfcjihPCgA1PIzbdb+8goGZelTwBgiB Stc= Received: from orw-gwy-01-in.mentorg.com ([192.94.38.165]) by esa3.mentor.iphmx.com with ESMTP; 18 Oct 2019 07:09:02 -0800 IronPort-SDR: HufJXZNRbQPAUTP6/sKFplAtkKy6oCNn6VK6qvBOQtOCK6sf0npxWzYAVNwtMD5L1jhPBGrNup DtvD8UZ7y1vKfTSM05V0HzvSlAVSg6ZioEXFwQNG8093JL8SmDMUo5wx+jfzSSFjtNGBRKJObx Qx1avftb9ju6PzQkaxxRGGj6u71IY0U86XypW13V41lslXOO/hw2WTdUjKEx6XhIsdffqG50wv FsZSowr4wzmHGbkOBQa8rkgog+eWir1GyyOGllkhKKCfbDwQZ0s5zd6H5Md5PZ1u2Mvn4niz1w Pmk= Reply-To: From: Chung-Lin Tang Subject: [PATCH, OpenACC] Fortran deviceptr To: Thomas Schwinge CC: gcc-patches , Fortran List Message-ID: Date: Fri, 18 Oct 2019 23:08:54 +0800 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:60.0) Gecko/20100101 Thunderbird/60.9.0 MIME-Version: 1.0 Hi Thomas, this is the updated Fortran deviceptr patche, originated from Cesar, and one of the tests was from James Norris: https://gcc.gnu.org/ml/gcc-patches/2018-05/msg00286.html https://gcc.gnu.org/ml/gcc-patches/2018-08/msg00532.html There were a few style cleanups, but the goal of modification is the same: to use only one clause to represent Fortran deviceptr, and to preserve it during gimplification. Because of this modification, and as we discussed earlier, the handle_ftn_pointers() code in libgomp/oacc-parallel.c appeared to be no longer needed. I have remove them in this patch, and tested libgomp without regressions. Also, I've added a new libgomp.oacc-fortran/deviceptr-2.f90 testcase that actually copies out and verifies the deviceptr computation. Is this okay for trunk now? Thanks, Chung-Lin 2019-10-18 Cesar Philippidis Chung-Lin Tang gcc/fortran/ * trans-openmp.c (gfc_omp_finish_clause): Don't create pointer data mappings for deviceptr clauses. (gfc_trans_omp_clauses): Likewise. gcc/ * gimplify.c (enum gimplify_omp_var_data): Add GOVD_DEVICETPR. (omp_notice_variable): Add GOVD_DEVICEPTR attribute when appropriate. (gimplify_scan_omp_clauses): Likewise. (gimplify_adjust_omp_clauses_1): Set GOMP_MAP_FORCE_DEVICEPTR for implicit deviceptr mappings. gcc/testsuite/ * c-c++-common/goacc/deviceptr-4.c: Update expected data mapping. 2019-10-18 Chung-Lin Tang James Norris libgomp/ * oacc-parallel.c (handle_ftn_pointers): Delete function. (GOACC_parallel_keyed): Remove call to handle_ftn_pointers. * testsuite/libgomp.oacc-fortran/deviceptr-1.f90: New test. * testsuite/libgomp.oacc-fortran/deviceptr-2.f90: New test. Index: gcc/fortran/trans-openmp.c =================================================================== --- gcc/fortran/trans-openmp.c (revision 277155) +++ gcc/fortran/trans-openmp.c (working copy) @@ -1099,7 +1099,8 @@ gfc_omp_clause_dtor (tree clause, tree decl) void gfc_omp_finish_clause (tree c, gimple_seq *pre_p) { - if (OMP_CLAUSE_CODE (c) != OMP_CLAUSE_MAP) + if (OMP_CLAUSE_CODE (c) != OMP_CLAUSE_MAP + || OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_FORCE_DEVICEPTR) return; tree decl = OMP_CLAUSE_DECL (c); @@ -2173,6 +2174,12 @@ gfc_trans_omp_clauses (stmtblock_t *block, gfc_omp if (n->expr == NULL || n->expr->ref->u.ar.type == AR_FULL) { if (POINTER_TYPE_P (TREE_TYPE (decl)) + && n->u.map_op == OMP_MAP_FORCE_DEVICEPTR) + { + OMP_CLAUSE_DECL (node) = decl; + goto finalize_map_clause; + } + else if (POINTER_TYPE_P (TREE_TYPE (decl)) && (gfc_omp_privatize_by_reference (decl) || GFC_DECL_GET_SCALAR_POINTER (decl) || GFC_DECL_GET_SCALAR_ALLOCATABLE (decl) @@ -2346,6 +2353,7 @@ gfc_trans_omp_clauses (stmtblock_t *block, gfc_omp OMP_CLAUSE_SIZE (node3) = fold_build2 (MINUS_EXPR, sizetype, ptr, ptr2); } + finalize_map_clause: switch (n->u.map_op) { case OMP_MAP_ALLOC: Index: gcc/gimplify.c =================================================================== --- gcc/gimplify.c (revision 277155) +++ gcc/gimplify.c (working copy) @@ -123,6 +123,9 @@ enum gimplify_omp_var_data /* Flag for GOVD_REDUCTION: inscan seen in {in,ex}clusive clause. */ GOVD_REDUCTION_INSCAN = 0x2000000, + /* Flag for OpenACC deviceptrs. */ + GOVD_DEVICEPTR = 0x4000000, + GOVD_DATA_SHARE_CLASS = (GOVD_SHARED | GOVD_PRIVATE | GOVD_FIRSTPRIVATE | GOVD_LASTPRIVATE | GOVD_REDUCTION | GOVD_LINEAR | GOVD_LOCAL) @@ -7426,6 +7429,7 @@ omp_notice_variable (struct gimplify_omp_ctx *ctx, error ("variable %qE declared in enclosing " "% region", DECL_NAME (decl)); nflags |= GOVD_MAP; + nflags |= (n2->value & GOVD_DEVICEPTR); if (octx->region_type == ORT_ACC_DATA && (n2->value & GOVD_MAP_0LEN_ARRAY)) nflags |= GOVD_MAP_0LEN_ARRAY; @@ -8943,6 +8947,8 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_se if (OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_ALWAYS_TO || OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_ALWAYS_TOFROM) flags |= GOVD_MAP_ALWAYS_TO; + else if (OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_FORCE_DEVICEPTR) + flags |= GOVD_DEVICEPTR; goto do_add; case OMP_CLAUSE_DEPEND: @@ -9727,7 +9733,8 @@ gimplify_adjust_omp_clauses_1 (splay_tree_node n, | GOVD_MAP_FORCE | GOVD_MAP_FORCE_PRESENT | GOVD_MAP_ALLOC_ONLY - | GOVD_MAP_FROM_ONLY)) + | GOVD_MAP_FROM_ONLY + | GOVD_DEVICEPTR)) { case 0: kind = GOMP_MAP_TOFROM; @@ -9750,6 +9757,9 @@ gimplify_adjust_omp_clauses_1 (splay_tree_node n, case GOVD_MAP_FORCE_PRESENT: kind = GOMP_MAP_FORCE_PRESENT; break; + case GOVD_DEVICEPTR: + kind = GOMP_MAP_FORCE_DEVICEPTR; + break; default: gcc_unreachable (); } Index: gcc/testsuite/c-c++-common/goacc/deviceptr-4.c =================================================================== --- gcc/testsuite/c-c++-common/goacc/deviceptr-4.c (revision 277155) +++ gcc/testsuite/c-c++-common/goacc/deviceptr-4.c (working copy) @@ -8,4 +8,4 @@ subr (int *a) a[0] += 1.0; } -/* { dg-final { scan-tree-dump-times "#pragma omp target oacc_parallel.*map\\(tofrom:a" 1 "gimple" } } */ +/* { dg-final { scan-tree-dump-times "#pragma omp target oacc_parallel.*map\\(force_deviceptr:a" 1 "gimple" } } */ Index: libgomp/oacc-parallel.c =================================================================== --- libgomp/oacc-parallel.c (revision 277155) +++ libgomp/oacc-parallel.c (working copy) @@ -66,51 +66,6 @@ find_pointer (int pos, size_t mapnum, unsigned sho return 0; } -/* Handle the mapping pair that are presented when a - deviceptr clause is used with Fortran. */ - -static void -handle_ftn_pointers (size_t mapnum, void **hostaddrs, size_t *sizes, - unsigned short *kinds) -{ - int i; - - for (i = 0; i < mapnum; i++) - { - unsigned short kind1 = kinds[i] & 0xff; - - /* Handle Fortran deviceptr clause. */ - if (kind1 == GOMP_MAP_FORCE_DEVICEPTR) - { - unsigned short kind2; - - if (i < (signed)mapnum - 1) - kind2 = kinds[i + 1] & 0xff; - else - kind2 = 0xffff; - - if (sizes[i] == sizeof (void *)) - continue; - - /* At this point, we're dealing with a Fortran deviceptr. - If the next element is not what we're expecting, then - this is an instance of where the deviceptr variable was - not used within the region and the pointer was removed - by the gimplifier. */ - if (kind2 == GOMP_MAP_POINTER - && sizes[i + 1] == 0 - && hostaddrs[i] == *(void **)hostaddrs[i + 1]) - { - kinds[i+1] = kinds[i]; - sizes[i+1] = sizeof (void *); - } - - /* Invalidate the entry. */ - hostaddrs[i] = NULL; - } - } -} - static void goacc_wait (int async, int num_waits, va_list *ap); @@ -203,8 +158,6 @@ GOACC_parallel_keyed (int flags_m, void (*fn) (voi goacc_profiling_dispatch (&prof_info, &compute_construct_event_info, &api_info); - handle_ftn_pointers (mapnum, hostaddrs, sizes, kinds); - /* Host fallback if "if" clause is false or if the current device is set to the host. */ if (flags & GOACC_FLAG_HOST_FALLBACK) Index: libgomp/testsuite/libgomp.oacc-fortran/deviceptr-1.f90 =================================================================== --- libgomp/testsuite/libgomp.oacc-fortran/deviceptr-1.f90 (nonexistent) +++ libgomp/testsuite/libgomp.oacc-fortran/deviceptr-1.f90 (working copy) @@ -0,0 +1,197 @@ +! { dg-do run } + +! Test the deviceptr clause with various directives +! and in combination with other directives where +! the deviceptr variable is implied. + +subroutine subr1 (a, b) + implicit none + integer, parameter :: N = 8 + integer :: a(N) + integer :: b(N) + integer :: i = 0 + + !$acc data deviceptr (a) + + !$acc parallel copy (b) + do i = 1, N + a(i) = i * 2 + b(i) = a(i) + end do + !$acc end parallel + + !$acc end data + +end subroutine + +subroutine subr2 (a, b) + implicit none + integer, parameter :: N = 8 + integer :: a(N) + !$acc declare deviceptr (a) + integer :: b(N) + integer :: i = 0 + + !$acc parallel copy (b) + do i = 1, N + a(i) = i * 4 + b(i) = a(i) + end do + !$acc end parallel + +end subroutine + +subroutine subr3 (a, b) + implicit none + integer, parameter :: N = 8 + integer :: a(N) + !$acc declare deviceptr (a) + integer :: b(N) + integer :: i = 0 + + !$acc kernels copy (b) + do i = 1, N + a(i) = i * 8 + b(i) = a(i) + end do + !$acc end kernels + +end subroutine + +subroutine subr4 (a, b) + implicit none + integer, parameter :: N = 8 + integer :: a(N) + integer :: b(N) + integer :: i = 0 + + !$acc parallel deviceptr (a) copy (b) + do i = 1, N + a(i) = i * 16 + b(i) = a(i) + end do + !$acc end parallel + +end subroutine + +subroutine subr5 (a, b) + implicit none + integer, parameter :: N = 8 + integer :: a(N) + integer :: b(N) + integer :: i = 0 + + !$acc kernels deviceptr (a) copy (b) + do i = 1, N + a(i) = i * 32 + b(i) = a(i) + end do + !$acc end kernels + +end subroutine + +subroutine subr6 (a, b) + implicit none + integer, parameter :: N = 8 + integer :: a(N) + integer :: b(N) + integer :: i = 0 + + !$acc parallel deviceptr (a) copy (b) + do i = 1, N + b(i) = i + end do + !$acc end parallel + +end subroutine + +subroutine subr7 (a, b) + implicit none + integer, parameter :: N = 8 + integer :: a(N) + integer :: b(N) + integer :: i = 0 + + !$acc data deviceptr (a) + + !$acc parallel copy (b) + do i = 1, N + a(i) = i * 2 + b(i) = a(i) + end do + !$acc end parallel + + !$acc parallel copy (b) + do i = 1, N + a(i) = b(i) * 2 + b(i) = a(i) + end do + !$acc end parallel + + !$acc end data + +end subroutine + +program main + use iso_c_binding, only: c_ptr, c_f_pointer + implicit none + type (c_ptr) :: cp + integer, parameter :: N = 8 + integer, pointer :: fp(:) + integer :: i = 0 + integer :: b(N) + + interface + function acc_malloc (s) bind (C) + use iso_c_binding, only: c_ptr, c_size_t + integer (c_size_t), value :: s + type (c_ptr) :: acc_malloc + end function + end interface + + cp = acc_malloc (N * sizeof (fp(N))) + call c_f_pointer (cp, fp, [N]) + + call subr1 (fp, b) + + do i = 1, N + if (b(i) .ne. i * 2) call abort + end do + + call subr2 (fp, b) + + do i = 1, N + if (b(i) .ne. i * 4) call abort + end do + + call subr3 (fp, b) + + do i = 1, N + if (b(i) .ne. i * 8) call abort + end do + + call subr4 (fp, b) + + do i = 1, N + if (b(i) .ne. i * 16) call abort + end do + + call subr5 (fp, b) + + do i = 1, N + if (b(i) .ne. i * 32) call abort + end do + + call subr6 (fp, b) + + do i = 1, N + if (b(i) .ne. i) call abort + end do + + call subr7 (fp, b) + + do i = 1, N + if (b(i) .ne. i * 4) call abort + end do + +end program main Index: libgomp/testsuite/libgomp.oacc-fortran/deviceptr-2.f90 =================================================================== --- libgomp/testsuite/libgomp.oacc-fortran/deviceptr-2.f90 (nonexistent) +++ libgomp/testsuite/libgomp.oacc-fortran/deviceptr-2.f90 (working copy) @@ -0,0 +1,54 @@ +! { dg-do run } + +! Test deviceptr clause to see if computation on device memory array +! and copy back to host memory works. + +subroutine process_by_openacc (a, c) + implicit none + integer, parameter :: N = 8 + integer :: a(N) + integer :: i = 0 + integer :: c + + !$acc parallel deviceptr (a) + do i = 1, N + a(i) = i * c + end do + !$acc end parallel + +end subroutine + +program main + use iso_c_binding, only: c_ptr, c_f_pointer, c_loc + implicit none + type (c_ptr) :: cp + integer, parameter :: N = 8 + integer, pointer :: fp(:) + integer, target :: res(N) + integer :: i + + interface + function acc_malloc (s) bind (C) + use iso_c_binding, only: c_ptr, c_size_t + integer (c_size_t), value :: s + type (c_ptr) :: acc_malloc + end function acc_malloc + + subroutine acc_memcpy_from_device (d, s, sz) bind (C) + use iso_c_binding, only: c_ptr, c_size_t + type (c_ptr), value :: d, s + integer (c_size_t), value :: sz + end subroutine acc_memcpy_from_device + end interface + + cp = acc_malloc (N * sizeof (fp(N))) + call c_f_pointer (cp, fp, [N]) + + call process_by_openacc (fp, 1234) + call acc_memcpy_from_device (c_loc (res), cp, N * sizeof (fp(N))) + + do i = 1, N + if (res(i) .ne. i * 1234) call abort + end do + +end program main