From patchwork Fri Aug 26 11:15:24 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Tobias Burnus X-Patchwork-Id: 1670679 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=8.43.85.97; helo=sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Received: from sourceware.org (server2.sourceware.org [8.43.85.97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-384) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4MDchT3xCpz1yhT for ; Fri, 26 Aug 2022 21:15:52 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id D7E6538515D3 for ; Fri, 26 Aug 2022 11:15:48 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from esa3.mentor.iphmx.com (esa3.mentor.iphmx.com [68.232.137.180]) by sourceware.org (Postfix) with ESMTPS id 29DCD3858C74 for ; Fri, 26 Aug 2022 11:15:35 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 29DCD3858C74 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=codesourcery.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com X-IronPort-AV: E=Sophos;i="5.93,265,1654588800"; d="diff'?scan'208,217";a="81862680" Received: from orw-gwy-01-in.mentorg.com ([192.94.38.165]) by esa3.mentor.iphmx.com with ESMTP; 26 Aug 2022 03:15:32 -0800 IronPort-SDR: lJyGBGKrL+p1zCRsKDxC1Uw5mUh5JHnyaumyUtQ+St8a1F0sFfKYy7RA01p6dYsVpom4QXFuxQ MVt1tBVb/9HpTs2ZhuEC2eD2prnHgk/c0eSu8kD1kNFNg1UePt5w3U06zEbXbDdVPTLgWwZO5V drfo0QWXNp2GfPEwngyYTBgpLaEVxAY3ro3KOuh78xpZ/e5uaFmL+0+p67oWIAvFdff+zOV8Qn UqVWvPfLWcFgD0FTqhFT6Hc0KRYZ2RYDk01ld9VUeJf0x5v/Wdt2mT+UubPvzI24uZvfaet+6o ww4= Message-ID: Date: Fri, 26 Aug 2022 13:15:24 +0200 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.2.0 Content-Language: en-US To: gcc-patches , Jakub Jelinek From: Tobias Burnus Subject: [Patch] libgomp: Add reverse-offload splay tree X-Originating-IP: [137.202.0.90] X-ClientProxiedBy: svr-ies-mbx-14.mgc.mentorg.com (139.181.222.14) To svr-ies-mbx-12.mgc.mentorg.com (139.181.222.12) X-Spam-Status: No, score=-11.4 required=5.0 tests=BAYES_00, GIT_PATCH_0, HEADER_FROM_DIFFERENT_DOMAINS, HTML_MESSAGE, KAM_DMARC_STATUS, KAM_SHORT, SPF_HELO_PASS, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-Content-Filtered-By: Mailman/MimeDel 2.1.29 X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org Sender: "Gcc-patches" For reverse-offload data handling, we need to support: (a) device fn addr -> host fn address (b) finding already mapped host -> device vars via their device address For (a), the functions addrs, we need some extra code (cf. previous patches) as this information does not exist already. For (b), the variables, we have two options: (i) Do a reverse search on the existing data. That's done in oacc-mem.c's lookup_dev and obviously is O(N) as it has to check all nodes. With the assumption "This is not expected to be a common operation." is might be still okay. (ii) Using a second splay tree for the reverse lookup. The attached patch prepares for this – but does not yet handle all locations and is not yet active. The 'devicep->load_image_func' call depends whether the previous [1/3] patch (cf. below) has been applied or not. (The advantage of the reverse-offload mapping is that 'nowait' is not permitted and 'target {enter,exit,} data device(anchestor:1)' is neither. Thus, all 'omp target device(ancestor:1)' mapping done in target_rev can be undone in the same function - and all others are preexisting.) OK for mainline? Tobias PS: Still to be done (for nvptx, for gcn a bit more is needed): - handle remaining var-mapping cases - gomp_target_rev - mainly: handle all of map/firstprivate. - turn on reverse-offload support (accept omp_requires) + add tons of testcases. Pending reverse-offload patches (libgomp only with abit gcc/config/{gcn,nvptx}): [Patch][1/3] libgomp: Prepare for reverse offload fn lookup [Patch][2/3] GCN: libgomp+mkoffload.cc: Prepare for reverse offload fn lookup [Patch][2/3] nvptx: libgomp+mkoffload.cc: Prepare for reverse offload fn lookup [1/3] is at: https://gcc.gnu.org/pipermail/gcc-patches/2022-August/600333.html [Patch] libgomp/nvptx: Prepare for reverse-offload callback handling https://gcc.gnu.org/pipermail/gcc-patches/2022-August/600381.html ----------------- Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955 libgomp: Add reverse-offload splay tree Adds a splay tree for reverse offloading; effectively, it is not yet filled as reverse offload is not enabled for any device type. It also does not yet track variables later mapped (and then unmapped). Those will be added in follow-up commits. libgomp/ChangeLog: * libgomp.h: Include splay-tree.h with splay_tree_prefix set to 'reverse'. (struct target_mem_desc): Declare at old line; move definition down and add 'reverse_splay_tree_node rev_array' member. (struct reverse_splay_tree_key_s, reverse_splay_compare): New. (reverse_splay_tree_node, reverse_splay_tree, reverse_splay_tree_key): New typedef. (struct gomp_device_descr): Add 'reverse_splay_tree_s mem_map_rev'. * oacc-host.c (host_dispatch): NULL init mem_map_rev. * target.c: Include splay-tree.h with splay_tree_c set and splay_tree_prefix set to 'reverse'. (gomp_map_lookup_rev): New function (for now inside '#if 0'). (gomp_unmap_tgt): Free array_rev. (gomp_load_image_to_device): Fill reverse-offload splay tree with funct and var address. (gomp_unload_image_from_device): Empty reverse-offload splay tree. (gomp_target_init): Init mem_map_rev. libgomp/libgomp.h | 83 ++++++++++++++++++++++++++++++++++++++--------------- libgomp/oacc-host.c | 1 + libgomp/target.c | 78 +++++++++++++++++++++++++++++++++++++++++++++++-- 3 files changed, 137 insertions(+), 25 deletions(-) diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h index c243c4d6cf4..10518a8348b 100644 --- a/libgomp/libgomp.h +++ b/libgomp/libgomp.h @@ -1038,29 +1038,7 @@ struct target_var_desc { uintptr_t length; }; -struct target_mem_desc { - /* Reference count. */ - uintptr_t refcount; - /* All the splay nodes allocated together. */ - splay_tree_node array; - /* Start of the target region. */ - uintptr_t tgt_start; - /* End of the targer region. */ - uintptr_t tgt_end; - /* Handle to free. */ - void *to_free; - /* Previous target_mem_desc. */ - struct target_mem_desc *prev; - /* Number of items in following list. */ - size_t list_count; - - /* Corresponding target device descriptor. */ - struct gomp_device_descr *device_descr; - - /* List of target items to remove (or decrease refcount) - at the end of region. */ - struct target_var_desc list[]; -}; +struct target_mem_desc; /* Special value for refcount - mask to indicate existence of special values. Right now we allocate 3 bits. */ @@ -1154,6 +1132,64 @@ splay_compare (splay_tree_key x, splay_tree_key y) #include "splay-tree.h" +/* Reverse offload splay-tree handling. */ + +struct reverse_splay_tree_key_s { + /* Address of the device object. */ + uintptr_t dev_start; + /* Address immediately after the device object. */ + uintptr_t dev_end; + + splay_tree_key k; +}; + +typedef struct reverse_splay_tree_node_s *reverse_splay_tree_node; +typedef struct reverse_splay_tree_s *reverse_splay_tree; +typedef struct reverse_splay_tree_key_s *reverse_splay_tree_key; + +static inline int +reverse_splay_compare (reverse_splay_tree_key x, reverse_splay_tree_key y) +{ + if (x->dev_start == x->dev_end + && y->dev_start == y->dev_end) + return 0; + if (x->dev_end <= y->dev_start) + return -1; + if (x->dev_start >= y->dev_end) + return 1; + return 0; +} + +#define splay_tree_prefix reverse +#include "splay-tree.h" + +struct target_mem_desc { + /* Reference count. */ + uintptr_t refcount; + /* All the splay nodes allocated together. */ + splay_tree_node array; + /* Likeverse for the reverse lookup device->host for reverse offload. */ + reverse_splay_tree_node rev_array; + /* Start of the target region. */ + uintptr_t tgt_start; + /* End of the targer region. */ + uintptr_t tgt_end; + /* Handle to free. */ + void *to_free; + /* Previous target_mem_desc. */ + struct target_mem_desc *prev; + /* Number of items in following list. */ + size_t list_count; + + /* Corresponding target device descriptor. */ + struct gomp_device_descr *device_descr; + + /* List of target items to remove (or decrease refcount) + at the end of region. */ + struct target_var_desc list[]; +}; + + typedef struct acc_dispatch_t { /* Execute. */ @@ -1248,6 +1284,7 @@ struct gomp_device_descr /* Splay tree containing information about mapped memory regions. */ struct splay_tree_s mem_map; + struct reverse_splay_tree_s mem_map_rev; /* Mutex for the mutable data. */ gomp_mutex_t lock; diff --git a/libgomp/oacc-host.c b/libgomp/oacc-host.c index eb11b9cf16a..89a15a4d80f 100644 --- a/libgomp/oacc-host.c +++ b/libgomp/oacc-host.c @@ -283,6 +283,7 @@ static struct gomp_device_descr host_dispatch = .run_func = host_run, .mem_map = { NULL }, + .mem_map_rev = { NULL }, /* .lock initialized in goacc_host_init. */ .state = GOMP_DEVICE_UNINITIALIZED, diff --git a/libgomp/target.c b/libgomp/target.c index 135db1d88ab..b22678f687f 100644 --- a/libgomp/target.c +++ b/libgomp/target.c @@ -45,6 +45,11 @@ #include "plugin-suffix.h" #endif +/* Define another splay tree instantiation - for reverse offload. */ +#define splay_tree_prefix reverse +#define splay_tree_c +#include "splay-tree.h" + typedef uintptr_t *hash_entry_type; static inline void * htab_alloc (size_t size) { return gomp_malloc (size); } static inline void htab_free (void *ptr) { free (ptr); } @@ -200,6 +205,27 @@ gomp_map_lookup (splay_tree mem_map, splay_tree_key key) return splay_tree_lookup (mem_map, key); } +#if 0 +static inline reverse_splay_tree_key +gomp_map_lookup_rev (reverse_splay_tree mem_map_rev, reverse_splay_tree_key key) +{ + if (key->dev_start != key->dev_end) + return reverse_splay_tree_lookup (mem_map_rev, key); + + key->dev_end++; + reverse_splay_tree_key n = reverse_splay_tree_lookup (mem_map_rev, key); + key->dev_end--; + if (n) + return n; + key->dev_start--; + n = reverse_splay_tree_lookup (mem_map_rev, key); + key->dev_start++; + if (n) + return n; + return reverse_splay_tree_lookup (mem_map_rev, key); +} +#endif + static inline splay_tree_key gomp_map_0len_lookup (splay_tree mem_map, splay_tree_key key) { @@ -1811,6 +1837,7 @@ gomp_unmap_tgt (struct target_mem_desc *tgt) if (tgt->tgt_end) gomp_free_device_memory (tgt->device_descr, tgt->to_free); + free (tgt->rev_array); free (tgt->array); free (tgt); } @@ -2133,11 +2160,16 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version, /* Load image to device and get target addresses for the image. */ struct addr_pair *target_table = NULL; + uint64_t *rev_target_fn_table = NULL; int i, num_target_entries; + /* With reverse offload, insert also target-host addresses. */ + bool rev_lookup = omp_requires_mask & GOMP_REQUIRES_REVERSE_OFFLOAD; + num_target_entries = devicep->load_image_func (devicep->target_id, version, - target_data, &target_table); + target_data, &target_table /* , + rev_lookup ? &rev_target_fn_table : NULL */); if (num_target_entries != num_funcs + num_vars /* Others (device_num) are included as trailing entries in pair list. */ @@ -2154,6 +2186,15 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version, /* Insert host-target address mapping into splay tree. */ struct target_mem_desc *tgt = gomp_malloc (sizeof (*tgt)); tgt->array = gomp_malloc ((num_funcs + num_vars) * sizeof (*tgt->array)); + if (rev_lookup) + { + size_t n = num_vars; + if (rev_target_fn_table) + n += num_funcs; + tgt->rev_array = gomp_malloc (n * sizeof (*tgt->rev_array)); + } + else + tgt->rev_array = NULL; tgt->refcount = REFCOUNT_INFINITY; tgt->tgt_start = 0; tgt->tgt_end = 0; @@ -2162,6 +2203,7 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version, tgt->list_count = 0; tgt->device_descr = devicep; splay_tree_node array = tgt->array; + reverse_splay_tree_node rev_array = tgt->rev_array; for (i = 0; i < num_funcs; i++) { @@ -2176,6 +2218,18 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version, array->left = NULL; array->right = NULL; splay_tree_insert (&devicep->mem_map, array); + if (rev_target_fn_table) + { + reverse_splay_tree_key k2 = &rev_array->key; + k2->dev_start = rev_target_fn_table[i]; + k2->dev_end = rev_target_fn_table[i] + 1; + k2->k = k; + rev_array->left = NULL; + rev_array->right = NULL; + if (k2->dev_start != 0) + reverse_splay_tree_insert (&devicep->mem_map_rev, rev_array); + rev_array++; + } array++; } @@ -2210,6 +2264,17 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version, array->left = NULL; array->right = NULL; splay_tree_insert (&devicep->mem_map, array); + if (rev_lookup) + { + reverse_splay_tree_key k2 = &rev_array->key; + k2->dev_start = target_table[num_funcs + i].start; + k2->dev_end = k2->dev_start + k->host_end - k->host_start; + k2->k = k; + rev_array->left = NULL; + rev_array->right = NULL; + reverse_splay_tree_insert (&devicep->mem_map_rev, rev_array); + rev_array++; + } array++; } @@ -2242,11 +2307,16 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version, } } + if (rev_target_fn_table) + free (rev_target_fn_table); free (target_table); } /* Unload the mappings described by target_data from device DEVICE_P. - The device must be locked. */ + The device must be locked. + Note: For reverse offload, the code assumes that the splay tree is + no longer needed and all images for this device are deleted. + (This is the case for all current callers.) */ static void gomp_unload_image_from_device (struct gomp_device_descr *devicep, @@ -2266,6 +2336,9 @@ gomp_unload_image_from_device (struct gomp_device_descr *devicep, struct splay_tree_key_s k; splay_tree_key node = NULL; + /* Remove splay tree access; the memory is freed in gomp_unmap_tgt. */ + devicep->mem_map_rev.root = NULL; + /* Find mapping at start of node array */ if (num_funcs || num_vars) { @@ -4233,6 +4306,7 @@ gomp_target_init (void) /* current_device.capabilities has already been set. */ current_device.type = current_device.get_type_func (); current_device.mem_map.root = NULL; + current_device.mem_map_rev.root = NULL; current_device.state = GOMP_DEVICE_UNINITIALIZED; for (i = 0; i < new_num_devs; i++) {