From patchwork Mon Dec 7 11:19:57 2015
X-Patchwork-Submitter: Martin Jambor
X-Patchwork-Id: 553343
Date: Mon, 7 Dec 2015 12:19:57 +0100
From: Martin Jambor
To: GCC Patches
Cc: Jakub Jelinek
Subject: [hsa 2/10] Modifications to libgomp proper
Message-ID: <20151207111957.GC24234@virgil.suse.cz>
References: <20151207111758.GA24234@virgil.suse.cz>
In-Reply-To: <20151207111758.GA24234@virgil.suse.cz>

Hi,

The patch below contains all changes to libgomp files except for the hsa plugin (which is in the following patch). The changes can roughly be divided into three categories.

First, it contains changes that are necessary to support shared-memory devices. In the majority of cases this means treating them like the host fallback, because there is no need to copy, host malloc can be used for allocation, etc. It also means that GOMP_target_ext and gomp_target_task_fn should not remap arguments but should pass to the plugin the same thing the host fallback function would receive.

Second, because the GCC HSA backend often does not emit HSAIL for functions it knows it cannot handle, these two functions need to gracefully handle the case when no device implementation of a particular function is available, by doing host fallback too.

Third, the patch implements the libgomp part of the device-specific arguments passed to GOMP_target_ext, as requested by Jakub (well, some are actually for all devices but that is what we call them).
Because of nowait target constructs, the arguments have proliferated into tasking too, as did firstprivate copies.

Any feedback will be greatly appreciated,

Martin


2015-12-04  Martin Jambor
	    Martin Liska

include/
	* gomp-constants.h (GOMP_DEVICE_HSA): New macro.
	(GOMP_VERSION_HSA): Likewise.
	(GOMP_TARGET_ARG_DEVICE_MASK): Likewise.
	(GOMP_TARGET_ARG_DEVICE_ALL): Likewise.
	(GOMP_TARGET_ARG_SUBSEQUENT_PARAM): Likewise.
	(GOMP_TARGET_ARG_ID_MASK): Likewise.
	(GOMP_TARGET_ARG_NUM_TEAMS): Likewise.
	(GOMP_TARGET_ARG_THREAD_LIMIT): Likewise.
	(GOMP_TARGET_ARG_VALUE_SHIFT): Likewise.
	(GOMP_TARGET_ARG_HSA_KERNEL_ATTRIBUTES): Likewise.
	(GOMP_kernel_launch_attributes): New type.
	(GOMP_hsa_kernel_dispatch): New type.

libgomp/
	* libgomp-plugin.h (offload_target_type): New element
	OFFLOAD_TARGET_TYPE_HSA.
	* libgomp.h (gomp_target_task): New fields firstprivate_copies and
	args.
	(gomp_create_target_task): Update prototype.
	(gomp_device_descr): Extra parameter of run_func and async_run_func,
	new field can_run_func.
	* libgomp_g.h (GOMP_target_ext): Change prototype.
	* oacc-host.c (host_run): Add new parameter args.
	* target.c (gomp_target_unshare_firstprivate): New function.
	(gomp_target_fallback_firstprivate): Use
	gomp_target_unshare_firstprivate.
	(gomp_get_target_fn_addr): Allow returning NULL for shared memory
	devices.
	(GOMP_target): Do host fallback for all shared memory devices.  Do not
	pass any args to plugins.
	(GOMP_target_ext): Add new parameter args.  Allow host fallback if
	device shares memory.  Do not remap data if device has shared memory.
	(gomp_target_task_fn): Likewise.  Also treat shared memory devices
	like host fallback for mappings.
	(GOMP_target_data): Treat shared memory devices like host fallback.
	(GOMP_target_data_ext): Likewise.
	(GOMP_target_update): Likewise.
	(GOMP_target_update_ext): Likewise.  Also pass NULL as args to
	gomp_create_target_task.
	(GOMP_target_enter_exit_data): Likewise.
	(omp_target_alloc): Treat shared memory devices like host fallback.
	(omp_target_free): Likewise.
	(omp_target_is_present): Likewise.
	(omp_target_memcpy): Likewise.
	(omp_target_memcpy_rect): Likewise.
	(omp_target_associate_ptr): Likewise.
	(gomp_load_plugin_for_device): Also load can_run.
	* task.c (GOMP_PLUGIN_target_task_completion): Free
	firstprivate_copies.
	(gomp_create_target_task): Accept new argument args and store it to
	ttask.

liboffloadmic/plugin
	* libgomp-plugin-intelmic.cpp (GOMP_OFFLOAD_async_run): New unused
	parameter.
	(GOMP_OFFLOAD_run): Likewise.

diff --git a/include/gomp-constants.h b/include/gomp-constants.h index dffd631..1dae474 100644 --- a/include/gomp-constants.h +++ b/include/gomp-constants.h @@ -176,6 +176,7 @@ enum gomp_map_kind #define GOMP_DEVICE_NOT_HOST 4 #define GOMP_DEVICE_NVIDIA_PTX 5 #define GOMP_DEVICE_INTEL_MIC 6 +#define GOMP_DEVICE_HSA 7 #define GOMP_DEVICE_ICV -1 #define GOMP_DEVICE_HOST_FALLBACK -2 @@ -201,6 +202,7 @@ enum gomp_map_kind #define GOMP_VERSION 0 #define GOMP_VERSION_NVIDIA_PTX 1 #define GOMP_VERSION_INTEL_MIC 0 +#define GOMP_VERSION_HSA 0 #define GOMP_VERSION_PACK(LIB, DEV) (((LIB) << 16) | (DEV)) #define GOMP_VERSION_LIB(PACK) (((PACK) >> 16) & 0xffff) @@ -228,4 +230,74 @@ enum gomp_map_kind #define GOMP_LAUNCH_OP(X) (((X) >> GOMP_LAUNCH_OP_SHIFT) & 0xffff) #define GOMP_LAUNCH_OP_MAX 0xffff +/* Bitmask to apply in order to find out the intended device of a target + argument. */ +#define GOMP_TARGET_ARG_DEVICE_MASK ((1 << 7) - 1) +/* The target argument is significant for all devices. */ +#define GOMP_TARGET_ARG_DEVICE_ALL 0 + +/* Flag set when the subsequent element holds the device-specific argument + value. */ +#define GOMP_TARGET_ARG_SUBSEQUENT_PARAM (1 << 7) + +/* Bitmask to apply to a target argument to find out the value identifier. */ +#define GOMP_TARGET_ARG_ID_MASK (((1 << 8) - 1) << 8) +/* Target argument index of NUM_TEAMS. */ +#define GOMP_TARGET_ARG_NUM_TEAMS (1 << 8) +/* Target argument index of THREAD_LIMIT. 
*/ +#define GOMP_TARGET_ARG_THREAD_LIMIT (2 << 8) + +/* If the value is directly embedded in the target argument, it should be at + most 16 bits wide and shifted by this many bits. */ +#define GOMP_TARGET_ARG_VALUE_SHIFT 16 + +/* HSA specific data structures. */ + +/* Identifiers of device-specific target arguments. */ +#define GOMP_TARGET_ARG_HSA_KERNEL_ATTRIBUTES (1 << 8) + +/* Structure describing the run-time and grid properties of an HSA kernel + launch. */ + +struct GOMP_kernel_launch_attributes +{ + /* Number of dimensions the workload has. Maximum number is 3. */ + uint32_t ndim; + /* Size of the grid in the three respective dimensions. */ + uint32_t gdims[3]; + /* Size of work-groups in the respective dimensions. */ + uint32_t wdims[3]; +}; + +/* Collection of information needed for a dispatch of a kernel from a + kernel. */ + +struct GOMP_hsa_kernel_dispatch +{ + /* Pointer to a command queue associated with a kernel dispatch agent. */ + void *queue; + /* Pointer to reserved memory for OMP data struct copying. */ + void *omp_data_memory; + /* Pointer to a memory space used for kernel arguments passing. */ + void *kernarg_address; + /* Kernel object. */ + uint64_t object; + /* Synchronization signal used for dispatch synchronization. */ + uint64_t signal; + /* Private segment size. */ + uint32_t private_segment_size; + /* Group segment size. */ + uint32_t group_segment_size; + /* Number of children kernel dispatches. */ + uint64_t kernel_dispatch_count; + /* Number of threads. */ + uint32_t omp_num_threads; + /* Debug purpose argument. */ + uint64_t debug; + /* Levels-var ICV. */ + uint64_t omp_level; + /* Kernel dispatch structures created for children kernel dispatches. 
*/ + struct GOMP_hsa_kernel_dispatch **children_dispatches; +}; + #endif diff --git a/libgomp/libgomp-plugin.h b/libgomp/libgomp-plugin.h index ab22e85..0204491 100644 --- a/libgomp/libgomp-plugin.h +++ b/libgomp/libgomp-plugin.h @@ -48,7 +48,8 @@ enum offload_target_type OFFLOAD_TARGET_TYPE_HOST = 2, /* OFFLOAD_TARGET_TYPE_HOST_NONSHM = 3 removed. */ OFFLOAD_TARGET_TYPE_NVIDIA_PTX = 5, - OFFLOAD_TARGET_TYPE_INTEL_MIC = 6 + OFFLOAD_TARGET_TYPE_INTEL_MIC = 6, + OFFLOAD_TARGET_TYPE_HSA = 7 }; /* Auxiliary struct, used for transferring pairs of addresses from plugin diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h index c467f97..16c591f 100644 --- a/libgomp/libgomp.h +++ b/libgomp/libgomp.h @@ -496,6 +496,10 @@ struct gomp_target_task struct target_mem_desc *tgt; struct gomp_task *task; struct gomp_team *team; + /* Copies of firstprivate mapped data for shared memory accelerators. */ + void *firstprivate_copies; + /* Device-specific target arguments. */ + void **args; void *hostaddrs[]; }; @@ -750,7 +754,8 @@ extern void gomp_task_maybe_wait_for_dependencies (void **); extern bool gomp_create_target_task (struct gomp_device_descr *, void (*) (void *), size_t, void **, size_t *, unsigned short *, unsigned int, - void **, enum gomp_target_task_state); + void **, void **, + enum gomp_target_task_state); static void inline gomp_finish_task (struct gomp_task *task) @@ -924,8 +929,9 @@ struct gomp_device_descr void *(*dev2host_func) (int, void *, const void *, size_t); void *(*host2dev_func) (int, void *, const void *, size_t); void *(*dev2dev_func) (int, void *, const void *, size_t); - void (*run_func) (int, void *, void *); - void (*async_run_func) (int, void *, void *, void *); + bool (*can_run_func) (void *); + void (*run_func) (int, void *, void *, void **); + void (*async_run_func) (int, void *, void *, void **, void *); /* Splay tree containing information about mapped memory regions. 
*/ struct splay_tree_s mem_map; diff --git a/libgomp/libgomp_g.h b/libgomp/libgomp_g.h index c238e6a..9c90d59 100644 --- a/libgomp/libgomp_g.h +++ b/libgomp/libgomp_g.h @@ -278,8 +278,7 @@ extern void GOMP_single_copy_end (void *); extern void GOMP_target (int, void (*) (void *), const void *, size_t, void **, size_t *, unsigned char *); extern void GOMP_target_ext (int, void (*) (void *), size_t, void **, size_t *, - unsigned short *, unsigned int, void **, - int, int); + unsigned short *, unsigned int, void **, void **); extern void GOMP_target_data (int, const void *, size_t, void **, size_t *, unsigned char *); extern void GOMP_target_data_ext (int, size_t, void **, size_t *, diff --git a/libgomp/oacc-host.c b/libgomp/oacc-host.c index 9874804..a769211 100644 --- a/libgomp/oacc-host.c +++ b/libgomp/oacc-host.c @@ -123,7 +123,8 @@ host_host2dev (int n __attribute__ ((unused)), } static void -host_run (int n __attribute__ ((unused)), void *fn_ptr, void *vars) +host_run (int n __attribute__ ((unused)), void *fn_ptr, void *vars, + void **args __attribute__((unused))) { void (*fn)(void *) = (void (*)(void *)) fn_ptr; diff --git a/libgomp/target.c b/libgomp/target.c index cf9d0e6..b453c0c 100644 --- a/libgomp/target.c +++ b/libgomp/target.c @@ -1261,12 +1261,13 @@ gomp_target_fallback (void (*fn) (void *), void **hostaddrs) *thr = old_thr; } -/* Host fallback with firstprivate map-type handling. */ +/* Handle firstprivate map-type for shared memory devices and the host + fallback. Return the pointer of firstprivate copies which has to be freed + after use. 
*/ -static void -gomp_target_fallback_firstprivate (void (*fn) (void *), size_t mapnum, - void **hostaddrs, size_t *sizes, - unsigned short *kinds) +static void * +gomp_target_unshare_firstprivate (size_t mapnum, void **hostaddrs, + size_t *sizes, unsigned short *kinds) { size_t i, tgt_align = 0, tgt_size = 0; char *tgt = NULL; @@ -1281,7 +1282,7 @@ gomp_target_fallback_firstprivate (void (*fn) (void *), size_t mapnum, } if (tgt_align) { - tgt = gomp_alloca (tgt_size + tgt_align - 1); + tgt = gomp_malloc (tgt_size + tgt_align - 1); uintptr_t al = (uintptr_t) tgt & (tgt_align - 1); if (al) tgt += tgt_align - al; @@ -1296,7 +1297,19 @@ gomp_target_fallback_firstprivate (void (*fn) (void *), size_t mapnum, tgt_size = tgt_size + sizes[i]; } } + return tgt; +} + +/* Host fallback with firstprivate map-type handling. */ + +static void +gomp_target_fallback_firstprivate (void (*fn) (void *), size_t mapnum, + void **hostaddrs, size_t *sizes, + unsigned short *kinds) +{ + void *fpc = gomp_target_unshare_firstprivate (mapnum, hostaddrs, sizes, kinds); gomp_target_fallback (fn, hostaddrs); + free (fpc); } /* Helper function of GOMP_target{,_ext} routines. */ @@ -1316,7 +1329,12 @@ gomp_get_target_fn_addr (struct gomp_device_descr *devicep, splay_tree_key tgt_fn = splay_tree_lookup (&devicep->mem_map, &k); gomp_mutex_unlock (&devicep->lock); if (tgt_fn == NULL) - gomp_fatal ("Target function wasn't mapped"); + { + if (devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM) + return NULL; + else + gomp_fatal ("Target function wasn't mapped"); + } return (void *) tgt_fn->tgt_offset; } @@ -1340,7 +1358,9 @@ GOMP_target (int device, void (*fn) (void *), const void *unused, struct gomp_device_descr *devicep = resolve_device (device); if (devicep == NULL - || !(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400)) + || !(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400) + /* All shared memory devices should use the GOMP_target_ext function. 
*/ + || devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM) return gomp_target_fallback (fn, hostaddrs); void *fn_addr = gomp_get_target_fn_addr (devicep, fn); @@ -1348,7 +1368,8 @@ GOMP_target (int device, void (*fn) (void *), const void *unused, struct target_mem_desc *tgt_vars = gomp_map_vars (devicep, mapnum, hostaddrs, NULL, sizes, kinds, false, GOMP_MAP_VARS_TARGET); - devicep->run_func (devicep->target_id, fn_addr, (void *) tgt_vars->tgt_start); + devicep->run_func (devicep->target_id, fn_addr, (void *) tgt_vars->tgt_start, + NULL); gomp_unmap_vars (tgt_vars, true); } @@ -1356,6 +1377,11 @@ GOMP_target (int device, void (*fn) (void *), const void *unused, and several arguments have been added: FLAGS is a bitmask, see GOMP_TARGET_FLAG_* in gomp-constants.h. DEPEND is array of dependencies, see GOMP_task for details. + ARGS is a pointer to an array consisting of NUM_TEAMS, THREAD_LIMIT and a + variable number of device-specific arguments, which always take two elements + where the first specifies the type and the second the actual value. The + last element of the array is a single NULL. 
+ NUM_TEAMS is positive if GOMP_teams will be called in the body with that value, or 1 if teams construct is not present, or 0, if teams construct does not have num_teams clause and so the choice is @@ -1369,14 +1395,10 @@ GOMP_target (int device, void (*fn) (void *), const void *unused, void GOMP_target_ext (int device, void (*fn) (void *), size_t mapnum, void **hostaddrs, size_t *sizes, unsigned short *kinds, - unsigned int flags, void **depend, int num_teams, - int thread_limit) + unsigned int flags, void **depend, void **args) { struct gomp_device_descr *devicep = resolve_device (device); - (void) num_teams; - (void) thread_limit; - if (flags & GOMP_TARGET_FLAG_NOWAIT) { struct gomp_thread *thr = gomp_thread (); @@ -1413,7 +1435,7 @@ GOMP_target_ext (int device, void (*fn) (void *), size_t mapnum, && !thr->task->final_task) { gomp_create_target_task (devicep, fn, mapnum, hostaddrs, - sizes, kinds, flags, depend, + sizes, kinds, flags, depend, args, GOMP_TARGET_TASK_BEFORE_MAP); return; } @@ -1430,20 +1452,33 @@ GOMP_target_ext (int device, void (*fn) (void *), size_t mapnum, gomp_task_maybe_wait_for_dependencies (depend); } + void *fn_addr; if (devicep == NULL - || !(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400)) + || !(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400) + || !(fn_addr = gomp_get_target_fn_addr (devicep, fn)) + || (devicep->can_run_func && !devicep->can_run_func (fn_addr))) { gomp_target_fallback_firstprivate (fn, mapnum, hostaddrs, sizes, kinds); return; } - void *fn_addr = gomp_get_target_fn_addr (devicep, fn); - - struct target_mem_desc *tgt_vars - = gomp_map_vars (devicep, mapnum, hostaddrs, NULL, sizes, kinds, true, - GOMP_MAP_VARS_TARGET); - devicep->run_func (devicep->target_id, fn_addr, (void *) tgt_vars->tgt_start); - gomp_unmap_vars (tgt_vars, true); + struct target_mem_desc *tgt_vars; + void *fpc = NULL; + if (devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM) + { + fpc = gomp_target_unshare_firstprivate (mapnum, hostaddrs, 
sizes, kinds); + tgt_vars = NULL; + } + else + tgt_vars = gomp_map_vars (devicep, mapnum, hostaddrs, NULL, sizes, kinds, + true, GOMP_MAP_VARS_TARGET); + devicep->run_func (devicep->target_id, fn_addr, + tgt_vars ? (void *) tgt_vars->tgt_start : hostaddrs, + args); + if (tgt_vars) + gomp_unmap_vars (tgt_vars, true); + else + free (fpc); } /* Host fallback for GOMP_target_data{,_ext} routines. */ @@ -1473,6 +1508,7 @@ GOMP_target_data (int device, const void *unused, size_t mapnum, struct gomp_device_descr *devicep = resolve_device (device); if (devicep == NULL + || (devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM) || !(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400)) return gomp_target_data_fallback (); @@ -1491,7 +1527,8 @@ GOMP_target_data_ext (int device, size_t mapnum, void **hostaddrs, struct gomp_device_descr *devicep = resolve_device (device); if (devicep == NULL - || !(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400)) + || !(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400) + || devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM) return gomp_target_data_fallback (); struct target_mem_desc *tgt @@ -1521,7 +1558,8 @@ GOMP_target_update (int device, const void *unused, size_t mapnum, struct gomp_device_descr *devicep = resolve_device (device); if (devicep == NULL - || !(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400)) + || !(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400) + || devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM) return; gomp_update (devicep, mapnum, hostaddrs, sizes, kinds, false); @@ -1552,7 +1590,7 @@ GOMP_target_update_ext (int device, size_t mapnum, void **hostaddrs, if (gomp_create_target_task (devicep, (void (*) (void *)) NULL, mapnum, hostaddrs, sizes, kinds, flags | GOMP_TARGET_FLAG_UPDATE, - depend, GOMP_TARGET_TASK_DATA)) + depend, NULL, GOMP_TARGET_TASK_DATA)) return; } else @@ -1572,7 +1610,8 @@ GOMP_target_update_ext (int device, size_t mapnum, void **hostaddrs, } if (devicep == NULL - || 
!(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400)) + || !(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400) + || devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM) return; struct gomp_thread *thr = gomp_thread (); @@ -1673,7 +1712,7 @@ GOMP_target_enter_exit_data (int device, size_t mapnum, void **hostaddrs, { if (gomp_create_target_task (devicep, (void (*) (void *)) NULL, mapnum, hostaddrs, sizes, kinds, - flags, depend, + flags, depend, NULL, GOMP_TARGET_TASK_DATA)) return; } @@ -1694,7 +1733,8 @@ GOMP_target_enter_exit_data (int device, size_t mapnum, void **hostaddrs, } if (devicep == NULL - || !(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400)) + || !(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400) + || devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM) return; struct gomp_thread *thr = gomp_thread (); @@ -1729,8 +1769,11 @@ gomp_target_task_fn (void *data) if (ttask->fn != NULL) { + void *fn_addr; if (devicep == NULL - || !(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400)) + || !(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400) + || !(fn_addr = gomp_get_target_fn_addr (devicep, ttask->fn)) + || (devicep->can_run_func && !devicep->can_run_func (fn_addr))) { ttask->state = GOMP_TARGET_TASK_FALLBACK; gomp_target_fallback_firstprivate (ttask->fn, ttask->mapnum, @@ -1741,23 +1784,38 @@ gomp_target_task_fn (void *data) if (ttask->state == GOMP_TARGET_TASK_FINISHED) { - gomp_unmap_vars (ttask->tgt, true); + if (ttask->tgt) + gomp_unmap_vars (ttask->tgt, true); return false; } - void *fn_addr = gomp_get_target_fn_addr (devicep, ttask->fn); - ttask->tgt - = gomp_map_vars (devicep, ttask->mapnum, ttask->hostaddrs, NULL, - ttask->sizes, ttask->kinds, true, - GOMP_MAP_VARS_TARGET); + bool shared_mem; + if (devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM) + { + shared_mem = true; + ttask->tgt = NULL; + ttask->firstprivate_copies + = gomp_target_unshare_firstprivate (ttask->mapnum, ttask->hostaddrs, + ttask->sizes, ttask->kinds); + } + else 
+ { + shared_mem = false; + ttask->tgt = gomp_map_vars (devicep, ttask->mapnum, ttask->hostaddrs, + NULL, ttask->sizes, ttask->kinds, true, + GOMP_MAP_VARS_TARGET); + } ttask->state = GOMP_TARGET_TASK_READY_TO_RUN; devicep->async_run_func (devicep->target_id, fn_addr, - (void *) ttask->tgt->tgt_start, (void *) ttask); + shared_mem ? ttask->hostaddrs + : (void *) ttask->tgt->tgt_start, + ttask->args, (void *) ttask); return true; } else if (devicep == NULL - || !(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400)) + || !(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400) + || devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM) return false; size_t i; @@ -1807,7 +1865,8 @@ omp_target_alloc (size_t size, int device_num) if (devicep == NULL) return NULL; - if (!(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400)) + if (!(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400) + || devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM) return malloc (size); gomp_mutex_lock (&devicep->lock); @@ -1835,7 +1894,8 @@ omp_target_free (void *device_ptr, int device_num) if (devicep == NULL) return; - if (!(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400)) + if (!(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400) + || devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM) { free (device_ptr); return; @@ -1862,7 +1922,8 @@ omp_target_is_present (void *ptr, int device_num) if (devicep == NULL) return 0; - if (!(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400)) + if (!(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400) + || devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM) return 1; gomp_mutex_lock (&devicep->lock); @@ -1892,7 +1953,8 @@ omp_target_memcpy (void *dst, void *src, size_t length, size_t dst_offset, if (dst_devicep == NULL) return EINVAL; - if (!(dst_devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400)) + if (!(dst_devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400) + || dst_devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM) dst_devicep = NULL; } if 
(src_device_num != GOMP_DEVICE_HOST_FALLBACK) @@ -1904,7 +1966,8 @@ omp_target_memcpy (void *dst, void *src, size_t length, size_t dst_offset, if (src_devicep == NULL) return EINVAL; - if (!(src_devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400)) + if (!(src_devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400) + || src_devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM) src_devicep = NULL; } if (src_devicep == NULL && dst_devicep == NULL) @@ -2034,7 +2097,8 @@ omp_target_memcpy_rect (void *dst, void *src, size_t element_size, if (dst_devicep == NULL) return EINVAL; - if (!(dst_devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400)) + if (!(dst_devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400) + || dst_devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM) dst_devicep = NULL; } if (src_device_num != GOMP_DEVICE_HOST_FALLBACK) @@ -2046,7 +2110,8 @@ omp_target_memcpy_rect (void *dst, void *src, size_t element_size, if (src_devicep == NULL) return EINVAL; - if (!(src_devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400)) + if (!(src_devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400) + || src_devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM) src_devicep = NULL; } @@ -2082,7 +2147,8 @@ omp_target_associate_ptr (void *host_ptr, void *device_ptr, size_t size, if (devicep == NULL) return EINVAL; - if (!(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400)) + if (!(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400) + || devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM) return EINVAL; gomp_mutex_lock (&devicep->lock); @@ -2225,6 +2291,7 @@ gomp_load_plugin_for_device (struct gomp_device_descr *device, { DLSYM (run); DLSYM (async_run); + DLSYM_OPT (can_run, can_run); DLSYM (dev2dev); } if (device->capabilities & GOMP_OFFLOAD_CAP_OPENACC_200) diff --git a/libgomp/task.c b/libgomp/task.c index 620facd..aa6bd67 100644 --- a/libgomp/task.c +++ b/libgomp/task.c @@ -581,6 +581,7 @@ GOMP_PLUGIN_target_task_completion (void *data) gomp_mutex_unlock (&team->task_lock); } 
ttask->state = GOMP_TARGET_TASK_FINISHED; + free (ttask->firstprivate_copies); gomp_target_task_completion (team, task); gomp_mutex_unlock (&team->task_lock); } @@ -593,7 +594,7 @@ bool gomp_create_target_task (struct gomp_device_descr *devicep, void (*fn) (void *), size_t mapnum, void **hostaddrs, size_t *sizes, unsigned short *kinds, - unsigned int flags, void **depend, + unsigned int flags, void **depend, void **args, enum gomp_target_task_state state) { struct gomp_thread *thr = gomp_thread (); @@ -653,6 +654,7 @@ gomp_create_target_task (struct gomp_device_descr *devicep, ttask->devicep = devicep; ttask->fn = fn; ttask->mapnum = mapnum; + ttask->args = args; memcpy (ttask->hostaddrs, hostaddrs, mapnum * sizeof (void *)); ttask->sizes = (size_t *) &ttask->hostaddrs[mapnum]; memcpy (ttask->sizes, sizes, mapnum * sizeof (size_t)); diff --git a/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp b/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp index f8c1725..48599dd 100644 --- a/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp +++ b/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp @@ -539,7 +539,7 @@ GOMP_OFFLOAD_dev2dev (int device, void *dst_ptr, const void *src_ptr, extern "C" void GOMP_OFFLOAD_async_run (int device, void *tgt_fn, void *tgt_vars, - void *async_data) + void **, void *async_data) { TRACE ("(device = %d, tgt_fn = %p, tgt_vars = %p, async_data = %p)", device, tgt_fn, tgt_vars, async_data); @@ -555,7 +555,7 @@ GOMP_OFFLOAD_async_run (int device, void *tgt_fn, void *tgt_vars, } extern "C" void -GOMP_OFFLOAD_run (int device, void *tgt_fn, void *tgt_vars) +GOMP_OFFLOAD_run (int device, void *tgt_fn, void *tgt_vars, void **) { TRACE ("(device = %d, tgt_fn = %p, tgt_vars = %p)", device, tgt_fn, tgt_vars);
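
A note for readers on how the GOMP_TARGET_ARG_* constants added to gomp-constants.h might fit together. The following is only a minimal sketch and is not part of the patch. example_find_target_arg is a hypothetical helper, and the layout it assumes is inferred from the macro comments alone: each element of the NULL-terminated args array is a tag word with the device in the low 7 bits and the identifier in bits 8-15, and the value is either embedded above GOMP_TARGET_ARG_VALUE_SHIFT or, when GOMP_TARGET_ARG_SUBSEQUENT_PARAM is set, stored in the next element. The real consumer of the array is the plugin, which may decode it differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Constants copied from the gomp-constants.h hunk of this patch.  */
#define GOMP_DEVICE_HSA 7
#define GOMP_TARGET_ARG_DEVICE_MASK ((1 << 7) - 1)
#define GOMP_TARGET_ARG_DEVICE_ALL 0
#define GOMP_TARGET_ARG_SUBSEQUENT_PARAM (1 << 7)
#define GOMP_TARGET_ARG_ID_MASK (((1 << 8) - 1) << 8)
#define GOMP_TARGET_ARG_NUM_TEAMS (1 << 8)
#define GOMP_TARGET_ARG_THREAD_LIMIT (2 << 8)
#define GOMP_TARGET_ARG_VALUE_SHIFT 16
#define GOMP_TARGET_ARG_HSA_KERNEL_ATTRIBUTES (1 << 8)

/* Hypothetical helper (not in the patch).  Search the NULL-terminated ARGS
   array for the argument tagged with DEVICE and ID.  Return 1 and store the
   value in *OUT when found, 0 otherwise.  Assumes that every tag with
   GOMP_TARGET_ARG_SUBSEQUENT_PARAM set is followed by a value element.  */
static int
example_find_target_arg (void **args, int device, int id, uintptr_t *out)
{
  assert (out != NULL);
  if (args == NULL)
    return 0;

  for (size_t i = 0; args[i] != NULL; i++)
    {
      uintptr_t tag = (uintptr_t) args[i];
      uintptr_t value;

      if (tag & GOMP_TARGET_ARG_SUBSEQUENT_PARAM)
        /* Assumed layout: the value lives in the next element, which is
           consumed here so the loop does not treat it as a tag.  */
        value = (uintptr_t) args[++i];
      else
        /* Assumed layout: a small value is embedded in the tag itself.  */
        value = tag >> GOMP_TARGET_ARG_VALUE_SHIFT;

      if ((int) (tag & GOMP_TARGET_ARG_DEVICE_MASK) == device
          && (int) (tag & GOMP_TARGET_ARG_ID_MASK) == id)
        {
          *out = value;
          return 1;
        }
    }
  return 0;
}
```

Under these assumptions a caller would look up NUM_TEAMS with GOMP_TARGET_ARG_DEVICE_ALL and the kernel attributes with GOMP_DEVICE_HSA. The device field is matched exactly because GOMP_TARGET_ARG_NUM_TEAMS and GOMP_TARGET_ARG_HSA_KERNEL_ATTRIBUTES share the identifier (1 << 8) and differ only in the device bits.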