From patchwork Tue Nov 20 21:54:47 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Julian Brown X-Patchwork-Id: 1000753 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=gcc.gnu.org (client-ip=209.132.180.131; helo=sourceware.org; envelope-from=gcc-patches-return-490566-incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=codesourcery.com Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.b="j9rkkuvu"; dkim-atps=neutral Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 42zzz41Cypz9s3q for ; Wed, 21 Nov 2018 08:55:27 +1100 (AEDT) DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-type; q=dns; s=default; b=lZcBtvnhptYbjIut So8tbtB1mvjE3A4BNC9OBCiMS2xiZEB2Z/xtwnIxN5ekMUH/pyxygUpdKnuaqB2F ZA2GRZHBMUMwMD56cO47phoz8UQYd1S+9kjXjD0ovQYRcoXC6oz5+AC5yvlX6tHA G/eOU58mm0cKOOsDk0DoSYNM3lQ= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-type; s=default; bh=uuAv5080rAztv0xWIuYtAj B4HqE=; b=j9rkkuvuOJdP9TmvLldbTSx0hJE3VEN+K6cgmCJgtTxQAJj8TH8WGz 5SE8yi62JKRko/LyHOkWnKu2IpSsLjWmpqAsF+GpKHLFcV5RNpjzc58vEdn5oyvL M2ZUeq5ipZgCHEBZe4wjFK2ogLgew5nroniYacATMUaGrebEXRWjI= Received: (qmail 41043 invoked by alias); 20 Nov 2018 21:55:12 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 40919 invoked by uid 89); 20 Nov 2018 21:55:11 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-23.6 required=5.0 tests=AWL, BAYES_00, GIT_PATCH_0, GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3, KAM_SHORT, RCVD_IN_DNSWL_NONE, SPF_PASS, UNWANTED_LANGUAGE_BODY autolearn=ham version=3.3.2 spammy=transfers, coalesce, onto, Special X-HELO: relay1.mentorg.com Received: from relay1.mentorg.com (HELO relay1.mentorg.com) (192.94.38.131) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Tue, 20 Nov 2018 21:55:08 +0000 Received: from nat-ies.mentorg.com ([192.94.31.2] helo=SVR-IES-MBX-04.mgc.mentorg.com) by relay1.mentorg.com with esmtps (TLSv1.2:ECDHE-RSA-AES256-SHA384:256) id 1gPDz8-0002vM-Rz from Julian_Brown@mentor.com ; Tue, 20 Nov 2018 13:55:07 -0800 Received: from localhost.localdomain (147.34.91.1) by SVR-IES-MBX-04.mgc.mentorg.com (139.181.222.4) with Microsoft SMTP Server (TLS) id 15.0.1320.4; Tue, 20 Nov 2018 21:55:01 +0000 From: Julian Brown To: CC: , , Subject: [PATCH 1/6] [og8] Host-to-device transfer coalescing & magic offset value self-documentation Date: Tue, 20 Nov 2018 13:54:47 -0800 Message-ID: In-Reply-To: References: MIME-Version: 1.0 X-IsSubscribed: yes Previously posted upstream: https://gcc.gnu.org/ml/gcc-patches/2018-11/msg00825.html libgomp/ * libgomp.h (OFFSET_INLINED, OFFSET_POINTER, OFFSET_STRUCT): Define. * target.c (FIELD_TGT_EMPTY): Define. (gomp_coalesce_chunk): New. (gomp_coalesce_buf): Use above instead of flat array of size_t pairs. (gomp_coalesce_buf_add): Adjust for above change. (gomp_copy_host2dev): Likewise. (gomp_map_val): Use OFFSET_* macros instead of magic constants. Write as switch instead of list of ifs. (gomp_map_vars_async): Adjust for gomp_coalesce_chunk change. Use OFFSET_* macros. --- libgomp/libgomp.h | 5 +++ libgomp/target.c | 101 +++++++++++++++++++++++++++++++--------------------- 2 files changed, 65 insertions(+), 41 deletions(-) diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h index 607f4c2..acf7f8f 100644 --- a/libgomp/libgomp.h +++ b/libgomp/libgomp.h @@ -842,6 +842,11 @@ struct target_mem_desc { artificial pointer to "omp declare target link" object. */ #define REFCOUNT_LINK (~(uintptr_t) 1) +/* Special offset values. */ +#define OFFSET_INLINED (~(uintptr_t) 0) +#define OFFSET_POINTER (~(uintptr_t) 1) +#define OFFSET_STRUCT (~(uintptr_t) 2) + struct splay_tree_key_s { /* Address of the host object. */ uintptr_t host_start; diff --git a/libgomp/target.c b/libgomp/target.c index ab17650..7220ac6 100644 --- a/libgomp/target.c +++ b/libgomp/target.c @@ -45,6 +45,8 @@ #include "plugin-suffix.h" #endif +#define FIELD_TGT_EMPTY (~(size_t) 0) + static void gomp_target_init (void); /* The whole initialization code for offloading plugins is only run one. */ @@ -206,8 +208,14 @@ goacc_device_copy_async (struct gomp_device_descr *devicep, } } -/* Infrastructure for coalescing adjacent or nearly adjacent (in device addresses) - host to device memory transfers. */ +/* Infrastructure for coalescing adjacent or nearly adjacent (in device + addresses) host to device memory transfers. */ + +struct gomp_coalesce_chunk +{ + /* The starting and ending point of a coalesced chunk of memory. */ + size_t start, end; +}; struct gomp_coalesce_buf { @@ -215,10 +223,10 @@ struct gomp_coalesce_buf it will be copied to the device. */ void *buf; struct target_mem_desc *tgt; - /* Array with offsets, chunks[2 * i] is the starting offset and - chunks[2 * i + 1] ending offset relative to tgt->tgt_start device address + /* Array with offsets, chunks[i].start is the starting offset and + chunks[i].end ending offset relative to tgt->tgt_start device address of chunks which are to be copied to buf and later copied to device. */ - size_t *chunks; + struct gomp_coalesce_chunk *chunks; /* Number of chunks in chunks array, or -1 if coalesce buffering should not be performed. */ long chunk_cnt; @@ -251,14 +259,14 @@ gomp_coalesce_buf_add (struct gomp_coalesce_buf *cbuf, size_t start, size_t len) { if (cbuf->chunk_cnt < 0) return; - if (start < cbuf->chunks[2 * cbuf->chunk_cnt - 1]) + if (start < cbuf->chunks[cbuf->chunk_cnt-1].end) { cbuf->chunk_cnt = -1; return; } - if (start < cbuf->chunks[2 * cbuf->chunk_cnt - 1] + MAX_COALESCE_BUF_GAP) + if (start < cbuf->chunks[cbuf->chunk_cnt-1].end + MAX_COALESCE_BUF_GAP) { - cbuf->chunks[2 * cbuf->chunk_cnt - 1] = start + len; + cbuf->chunks[cbuf->chunk_cnt-1].end = start + len; cbuf->use_cnt++; return; } @@ -268,8 +276,8 @@ gomp_coalesce_buf_add (struct gomp_coalesce_buf *cbuf, size_t start, size_t len) if (cbuf->use_cnt == 1) cbuf->chunk_cnt--; } - cbuf->chunks[2 * cbuf->chunk_cnt] = start; - cbuf->chunks[2 * cbuf->chunk_cnt + 1] = start + len; + cbuf->chunks[cbuf->chunk_cnt].start = start; + cbuf->chunks[cbuf->chunk_cnt].end = start + len; cbuf->chunk_cnt++; cbuf->use_cnt = 1; } @@ -301,20 +309,20 @@ gomp_copy_host2dev (struct gomp_device_descr *devicep, if (cbuf) { uintptr_t doff = (uintptr_t) d - cbuf->tgt->tgt_start; - if (doff < cbuf->chunks[2 * cbuf->chunk_cnt - 1]) + if (doff < cbuf->chunks[cbuf->chunk_cnt-1].end) { long first = 0; long last = cbuf->chunk_cnt - 1; while (first <= last) { long middle = (first + last) >> 1; - if (cbuf->chunks[2 * middle + 1] <= doff) + if (cbuf->chunks[middle].end <= doff) first = middle + 1; - else if (cbuf->chunks[2 * middle] <= doff) + else if (cbuf->chunks[middle].start <= doff) { - if (doff + sz > cbuf->chunks[2 * middle + 1]) + if (doff + sz > cbuf->chunks[middle].end) gomp_fatal ("internal libgomp cbuf error"); - memcpy ((char *) cbuf->buf + (doff - cbuf->chunks[0]), + memcpy ((char *) cbuf->buf + (doff - cbuf->chunks[0].start), h, sz); return; } @@ -538,17 +546,25 @@ gomp_map_val (struct target_mem_desc *tgt, void **hostaddrs, size_t i) return tgt->list[i].key->tgt->tgt_start + tgt->list[i].key->tgt_offset + tgt->list[i].offset; - if (tgt->list[i].offset == ~(uintptr_t) 0) - return (uintptr_t) hostaddrs[i]; - if (tgt->list[i].offset == ~(uintptr_t) 1) - return 0; - if (tgt->list[i].offset == ~(uintptr_t) 2) - return tgt->list[i + 1].key->tgt->tgt_start - + tgt->list[i + 1].key->tgt_offset - + tgt->list[i + 1].offset - + (uintptr_t) hostaddrs[i] - - (uintptr_t) hostaddrs[i + 1]; - return tgt->tgt_start + tgt->list[i].offset; + + switch (tgt->list[i].offset) + { + case OFFSET_INLINED: + return (uintptr_t) hostaddrs[i]; + + case OFFSET_POINTER: + return 0; + + case OFFSET_STRUCT: + return tgt->list[i + 1].key->tgt->tgt_start + + tgt->list[i + 1].key->tgt_offset + + tgt->list[i + 1].offset + + (uintptr_t) hostaddrs[i] + - (uintptr_t) hostaddrs[i + 1]; + + default: + return tgt->tgt_start + tgt->list[i].offset; + } } /* Dynamic array related data structures, interfaces with the compiler. */ @@ -758,8 +774,8 @@ gomp_map_vars_async (struct gomp_device_descr *devicep, cbuf.buf = NULL; if (mapnum > 1 || pragma_kind == GOMP_MAP_VARS_TARGET) { - cbuf.chunks - = (size_t *) gomp_alloca ((2 * mapnum + 2) * sizeof (size_t)); + size_t chunk_size = (mapnum + 1) * sizeof (struct gomp_coalesce_chunk); + cbuf.chunks = (struct gomp_coalesce_chunk *) gomp_alloca (chunk_size); cbuf.chunk_cnt = 0; } if (pragma_kind == GOMP_MAP_VARS_TARGET) @@ -769,8 +785,8 @@ gomp_map_vars_async (struct gomp_device_descr *devicep, tgt_size = mapnum * sizeof (void *); cbuf.chunk_cnt = 1; cbuf.use_cnt = 1 + (mapnum > 1); - cbuf.chunks[0] = 0; - cbuf.chunks[1] = tgt_size; + cbuf.chunks[0].start = 0; + cbuf.chunks[0].end = tgt_size; } gomp_mutex_lock (&devicep->lock); @@ -788,7 +804,7 @@ gomp_map_vars_async (struct gomp_device_descr *devicep, || (kind & typemask) == GOMP_MAP_FIRSTPRIVATE_INT) { tgt->list[i].key = NULL; - tgt->list[i].offset = ~(uintptr_t) 0; + tgt->list[i].offset = OFFSET_INLINED; continue; } else if ((kind & typemask) == GOMP_MAP_USE_DEVICE_PTR) @@ -806,7 +822,7 @@ gomp_map_vars_async (struct gomp_device_descr *devicep, = (void *) (n->tgt->tgt_start + n->tgt_offset + cur_node.host_start); tgt->list[i].key = NULL; - tgt->list[i].offset = ~(uintptr_t) 0; + tgt->list[i].offset = OFFSET_INLINED; continue; } else if ((kind & typemask) == GOMP_MAP_STRUCT) @@ -817,7 +833,7 @@ gomp_map_vars_async (struct gomp_device_descr *devicep, cur_node.host_end = (uintptr_t) hostaddrs[last] + sizes[last]; tgt->list[i].key = NULL; - tgt->list[i].offset = ~(uintptr_t) 2; + tgt->list[i].offset = OFFSET_STRUCT; splay_tree_key n = splay_tree_lookup (mem_map, &cur_node); if (n == NULL) { @@ -850,7 +866,7 @@ gomp_map_vars_async (struct gomp_device_descr *devicep, else if ((kind & typemask) == GOMP_MAP_ALWAYS_POINTER) { tgt->list[i].key = NULL; - tgt->list[i].offset = ~(uintptr_t) 1; + tgt->list[i].offset = OFFSET_POINTER; has_firstprivate = true; continue; } @@ -894,7 +910,7 @@ gomp_map_vars_async (struct gomp_device_descr *devicep, if (!n) { tgt->list[i].key = NULL; - tgt->list[i].offset = ~(uintptr_t) 1; + tgt->list[i].offset = OFFSET_POINTER; continue; } } @@ -1018,7 +1034,7 @@ gomp_map_vars_async (struct gomp_device_descr *devicep, if (cbuf.chunk_cnt > 0) { cbuf.buf - = malloc (cbuf.chunks[2 * cbuf.chunk_cnt - 1] - cbuf.chunks[0]); + = malloc (cbuf.chunks[cbuf.chunk_cnt-1].end - cbuf.chunks[0].start); if (cbuf.buf) { cbuf.tgt = tgt; @@ -1144,6 +1160,8 @@ gomp_map_vars_async (struct gomp_device_descr *devicep, else k->host_end = k->host_start + sizeof (void *); splay_tree_key n = splay_tree_lookup (mem_map, k); + /* Need to account for the case where a struct field hasn't been + mapped onto the accelerator yet. */ if (n && n->refcount != REFCOUNT_LINK) gomp_map_vars_existing (devicep, aq, n, k, &tgt->list[i], kind & typemask, cbufp); @@ -1160,12 +1178,12 @@ gomp_map_vars_async (struct gomp_device_descr *devicep, size_t align = (size_t) 1 << (kind >> rshift); tgt->list[i].key = k; k->tgt = tgt; - if (field_tgt_clear != ~(size_t) 0) + if (field_tgt_clear != FIELD_TGT_EMPTY) { k->tgt_offset = k->host_start - field_tgt_base + field_tgt_offset; if (i == field_tgt_clear) - field_tgt_clear = ~(size_t) 0; + field_tgt_clear = FIELD_TGT_EMPTY; } else { @@ -1419,9 +1437,10 @@ gomp_map_vars_async (struct gomp_device_descr *devicep, long c = 0; for (c = 0; c < cbuf.chunk_cnt; ++c) gomp_copy_host2dev (devicep, aq, - (void *) (tgt->tgt_start + cbuf.chunks[2 * c]), - (char *) cbuf.buf + (cbuf.chunks[2 * c] - cbuf.chunks[0]), - cbuf.chunks[2 * c + 1] - cbuf.chunks[2 * c], NULL); + (void *) (tgt->tgt_start + cbuf.chunks[c].start), + (char *) cbuf.buf + (cbuf.chunks[c].start + - cbuf.chunks[0].start), + cbuf.chunks[c].end - cbuf.chunks[c].start, NULL); free (cbuf.buf); }