From patchwork Sat Nov 10 17:11:18 2018
X-Patchwork-Submitter: Julian Brown
X-Patchwork-Id: 995937
Delivered-To: mailing list gcc-patches@gcc.gnu.org
From: Julian Brown
Subject: [PATCH 1/3] Host-to-device transfer coalescing & magic offset value self-documentation
Date: Sat, 10 Nov 2018 09:11:18 -0800
Message-ID: <8340b3d7685106871b060c54f894105f20cdc052.1541863637.git.julian@codesourcery.com>

This patch (by Cesar, with some minor additional changes) replaces
several magic constants in target.c with named macros. It also replaces
the flat array of size_t pairs used for coalescing host-to-device copies
with an array of a new struct that has start and end fields.

Tested and bootstrapped alongside the other patches in this series (plus
the async patches). OK?

Julian

ChangeLog

	libgomp/
	* libgomp.h (OFFSET_INLINED, OFFSET_POINTER, OFFSET_STRUCT): Define.
	* target.c (FIELD_TGT_EMPTY): Define.
	(gomp_coalesce_chunk): New.
	(gomp_coalesce_buf): Use above instead of flat array of size_t pairs.
	(gomp_coalesce_buf_add): Adjust for above change.
	(gomp_copy_host2dev): Likewise.
	(gomp_map_val): Use OFFSET_* macros instead of magic constants.  Write
	as switch instead of list of ifs.
	(gomp_map_vars_async): Adjust for gomp_coalesce_chunk change.  Use
	OFFSET_* macros.
---
 libgomp/libgomp.h |   5 +++
 libgomp/target.c  | 101 ++++++++++++++++++++++++++++++++----------------------
 2 files changed, 65 insertions(+), 41 deletions(-)

diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index dac8dc4..cb25e86 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -902,6 +902,11 @@ struct target_mem_desc {
    artificial pointer to "omp declare target link" object.  */
 #define REFCOUNT_LINK (~(uintptr_t) 1)
 
+/* Special offset values.  */
+#define OFFSET_INLINED (~(uintptr_t) 0)
+#define OFFSET_POINTER (~(uintptr_t) 1)
+#define OFFSET_STRUCT (~(uintptr_t) 2)
+
 struct splay_tree_key_s {
   /* Address of the host object.  */
   uintptr_t host_start;
diff --git a/libgomp/target.c b/libgomp/target.c
index f3e2332..2bfc7e2 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -45,6 +45,8 @@
 #include "plugin-suffix.h"
 #endif
 
+#define FIELD_TGT_EMPTY (~(size_t) 0)
+
 static void gomp_target_init (void);
 
 /* The whole initialization code for offloading plugins is only run one.  */
@@ -205,8 +207,14 @@ goacc_device_copy_async (struct gomp_device_descr *devicep,
     }
 }
 
-/* Infrastructure for coalescing adjacent or nearly adjacent (in device addresses)
-   host to device memory transfers.  */
+/* Infrastructure for coalescing adjacent or nearly adjacent (in device
+   addresses) host to device memory transfers.  */
+
+struct gomp_coalesce_chunk
+{
+  /* The starting and ending point of a coalesced chunk of memory.  */
+  size_t start, end;
+};
 
 struct gomp_coalesce_buf
 {
@@ -214,10 +222,10 @@ struct gomp_coalesce_buf
      it will be copied to the device.  */
   void *buf;
   struct target_mem_desc *tgt;
-  /* Array with offsets, chunks[2 * i] is the starting offset and
-     chunks[2 * i + 1] ending offset relative to tgt->tgt_start device address
+  /* Array with offsets, chunks[i].start is the starting offset and
+     chunks[i].end ending offset relative to tgt->tgt_start device address
      of chunks which are to be copied to buf and later copied to device.  */
-  size_t *chunks;
+  struct gomp_coalesce_chunk *chunks;
   /* Number of chunks in chunks array, or -1 if coalesce buffering should
      not be performed.  */
   long chunk_cnt;
@@ -250,14 +258,14 @@ gomp_coalesce_buf_add (struct gomp_coalesce_buf *cbuf, size_t start, size_t len)
     {
       if (cbuf->chunk_cnt < 0)
         return;
-      if (start < cbuf->chunks[2 * cbuf->chunk_cnt - 1])
+      if (start < cbuf->chunks[cbuf->chunk_cnt-1].end)
         {
           cbuf->chunk_cnt = -1;
           return;
         }
-      if (start < cbuf->chunks[2 * cbuf->chunk_cnt - 1] + MAX_COALESCE_BUF_GAP)
+      if (start < cbuf->chunks[cbuf->chunk_cnt-1].end + MAX_COALESCE_BUF_GAP)
         {
-          cbuf->chunks[2 * cbuf->chunk_cnt - 1] = start + len;
+          cbuf->chunks[cbuf->chunk_cnt-1].end = start + len;
           cbuf->use_cnt++;
           return;
         }
@@ -267,8 +275,8 @@ gomp_coalesce_buf_add (struct gomp_coalesce_buf *cbuf, size_t start, size_t len)
       if (cbuf->use_cnt == 1)
         cbuf->chunk_cnt--;
     }
-  cbuf->chunks[2 * cbuf->chunk_cnt] = start;
-  cbuf->chunks[2 * cbuf->chunk_cnt + 1] = start + len;
+  cbuf->chunks[cbuf->chunk_cnt].start = start;
+  cbuf->chunks[cbuf->chunk_cnt].end = start + len;
   cbuf->chunk_cnt++;
   cbuf->use_cnt = 1;
 }
@@ -300,20 +308,20 @@ gomp_copy_host2dev (struct gomp_device_descr *devicep,
   if (cbuf)
     {
       uintptr_t doff = (uintptr_t) d - cbuf->tgt->tgt_start;
-      if (doff < cbuf->chunks[2 * cbuf->chunk_cnt - 1])
+      if (doff < cbuf->chunks[cbuf->chunk_cnt-1].end)
         {
           long first = 0;
           long last = cbuf->chunk_cnt - 1;
           while (first <= last)
             {
               long middle = (first + last) >> 1;
-              if (cbuf->chunks[2 * middle + 1] <= doff)
+              if (cbuf->chunks[middle].end <= doff)
                 first = middle + 1;
-              else if (cbuf->chunks[2 * middle] <= doff)
+              else if (cbuf->chunks[middle].start <= doff)
                 {
-                  if (doff + sz > cbuf->chunks[2 * middle + 1])
+                  if (doff + sz > cbuf->chunks[middle].end)
                     gomp_fatal ("internal libgomp cbuf error");
-                  memcpy ((char *) cbuf->buf + (doff - cbuf->chunks[0]),
+                  memcpy ((char *) cbuf->buf + (doff - cbuf->chunks[0].start),
                           h, sz);
                   return;
                 }
@@ -504,17 +512,25 @@ gomp_map_val (struct target_mem_desc *tgt, void **hostaddrs, size_t i)
     return tgt->list[i].key->tgt->tgt_start
            + tgt->list[i].key->tgt_offset
            + tgt->list[i].offset;
-  if (tgt->list[i].offset == ~(uintptr_t) 0)
-    return (uintptr_t) hostaddrs[i];
-  if (tgt->list[i].offset == ~(uintptr_t) 1)
-    return 0;
-  if (tgt->list[i].offset == ~(uintptr_t) 2)
-    return tgt->list[i + 1].key->tgt->tgt_start
-           + tgt->list[i + 1].key->tgt_offset
-           + tgt->list[i + 1].offset
-           + (uintptr_t) hostaddrs[i]
-           - (uintptr_t) hostaddrs[i + 1];
-  return tgt->tgt_start + tgt->list[i].offset;
+
+  switch (tgt->list[i].offset)
+    {
+    case OFFSET_INLINED:
+      return (uintptr_t) hostaddrs[i];
+
+    case OFFSET_POINTER:
+      return 0;
+
+    case OFFSET_STRUCT:
+      return tgt->list[i + 1].key->tgt->tgt_start
+             + tgt->list[i + 1].key->tgt_offset
+             + tgt->list[i + 1].offset
+             + (uintptr_t) hostaddrs[i]
+             - (uintptr_t) hostaddrs[i + 1];
+
+    default:
+      return tgt->tgt_start + tgt->list[i].offset;
+    }
 }
 
 attribute_hidden struct target_mem_desc *
@@ -562,8 +578,8 @@ gomp_map_vars_async (struct gomp_device_descr *devicep,
   cbuf.buf = NULL;
   if (mapnum > 1 || pragma_kind == GOMP_MAP_VARS_TARGET)
     {
-      cbuf.chunks
-        = (size_t *) gomp_alloca ((2 * mapnum + 2) * sizeof (size_t));
+      size_t chunk_size = (mapnum + 1) * sizeof (struct gomp_coalesce_chunk);
+      cbuf.chunks = (struct gomp_coalesce_chunk *) gomp_alloca (chunk_size);
       cbuf.chunk_cnt = 0;
     }
   if (pragma_kind == GOMP_MAP_VARS_TARGET)
@@ -573,8 +589,8 @@ gomp_map_vars_async (struct gomp_device_descr *devicep,
       tgt_size = mapnum * sizeof (void *);
       cbuf.chunk_cnt = 1;
       cbuf.use_cnt = 1 + (mapnum > 1);
-      cbuf.chunks[0] = 0;
-      cbuf.chunks[1] = tgt_size;
+      cbuf.chunks[0].start = 0;
+      cbuf.chunks[0].end = tgt_size;
     }
 
   gomp_mutex_lock (&devicep->lock);
@@ -592,7 +608,7 @@ gomp_map_vars_async (struct gomp_device_descr *devicep,
           || (kind & typemask) == GOMP_MAP_FIRSTPRIVATE_INT)
         {
           tgt->list[i].key = NULL;
-          tgt->list[i].offset = ~(uintptr_t) 0;
+          tgt->list[i].offset = OFFSET_INLINED;
           continue;
         }
       else if ((kind & typemask) == GOMP_MAP_USE_DEVICE_PTR)
@@ -610,7 +626,7 @@ gomp_map_vars_async (struct gomp_device_descr *devicep,
             = (void *) (n->tgt->tgt_start + n->tgt_offset
                         + cur_node.host_start);
           tgt->list[i].key = NULL;
-          tgt->list[i].offset = ~(uintptr_t) 0;
+          tgt->list[i].offset = OFFSET_INLINED;
           continue;
         }
       else if ((kind & typemask) == GOMP_MAP_STRUCT)
@@ -621,7 +637,7 @@ gomp_map_vars_async (struct gomp_device_descr *devicep,
           cur_node.host_end = (uintptr_t) hostaddrs[last]
                               + sizes[last];
           tgt->list[i].key = NULL;
-          tgt->list[i].offset = ~(uintptr_t) 2;
+          tgt->list[i].offset = OFFSET_STRUCT;
           splay_tree_key n = splay_tree_lookup (mem_map, &cur_node);
           if (n == NULL)
             {
@@ -654,7 +670,7 @@ gomp_map_vars_async (struct gomp_device_descr *devicep,
       else if ((kind & typemask) == GOMP_MAP_ALWAYS_POINTER)
         {
           tgt->list[i].key = NULL;
-          tgt->list[i].offset = ~(uintptr_t) 1;
+          tgt->list[i].offset = OFFSET_POINTER;
           has_firstprivate = true;
           continue;
         }
@@ -684,7 +700,7 @@ gomp_map_vars_async (struct gomp_device_descr *devicep,
               if (!n)
                 {
                   tgt->list[i].key = NULL;
-                  tgt->list[i].offset = ~(uintptr_t) 1;
+                  tgt->list[i].offset = OFFSET_POINTER;
                   continue;
                 }
             }
@@ -759,7 +775,7 @@ gomp_map_vars_async (struct gomp_device_descr *devicep,
   if (cbuf.chunk_cnt > 0)
     {
       cbuf.buf
-        = malloc (cbuf.chunks[2 * cbuf.chunk_cnt - 1] - cbuf.chunks[0]);
+        = malloc (cbuf.chunks[cbuf.chunk_cnt-1].end - cbuf.chunks[0].start);
       if (cbuf.buf)
         {
           cbuf.tgt = tgt;
@@ -876,6 +892,8 @@ gomp_map_vars_async (struct gomp_device_descr *devicep,
                 else
                   k->host_end = k->host_start + sizeof (void *);
                 splay_tree_key n = splay_tree_lookup (mem_map, k);
+                /* Need to account for the case where a struct field hasn't been
+                   mapped onto the accelerator yet.  */
                 if (n && n->refcount != REFCOUNT_LINK)
                   gomp_map_vars_existing (devicep, aq, n, k, &tgt->list[i],
                                           kind & typemask, cbufp);
@@ -892,12 +910,12 @@ gomp_map_vars_async (struct gomp_device_descr *devicep,
                 size_t align = (size_t) 1 << (kind >> rshift);
                 tgt->list[i].key = k;
                 k->tgt = tgt;
-                if (field_tgt_clear != ~(size_t) 0)
+                if (field_tgt_clear != FIELD_TGT_EMPTY)
                   {
                     k->tgt_offset = k->host_start - field_tgt_base
                                     + field_tgt_offset;
                     if (i == field_tgt_clear)
-                      field_tgt_clear = ~(size_t) 0;
+                      field_tgt_clear = FIELD_TGT_EMPTY;
                   }
                 else
                   {
@@ -1035,9 +1053,10 @@ gomp_map_vars_async (struct gomp_device_descr *devicep,
       long c = 0;
       for (c = 0; c < cbuf.chunk_cnt; ++c)
         gomp_copy_host2dev (devicep, aq,
-                            (void *) (tgt->tgt_start + cbuf.chunks[2 * c]),
-                            (char *) cbuf.buf + (cbuf.chunks[2 * c] - cbuf.chunks[0]),
-                            cbuf.chunks[2 * c + 1] - cbuf.chunks[2 * c], NULL);
+                            (void *) (tgt->tgt_start + cbuf.chunks[c].start),
+                            (char *) cbuf.buf + (cbuf.chunks[c].start
+                                                 - cbuf.chunks[0].start),
+                            cbuf.chunks[c].end - cbuf.chunks[c].start, NULL);
       free (cbuf.buf);
     }
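
For readers new to the coalescing code, here is a minimal standalone sketch of the same technique using a start/end chunk struct. All names in it (struct chunk, struct coalesce_buf, coalesce_add, coalesce_lookup, SKETCH_MAX_GAP and the capacity field) are hypothetical and not part of libgomp, and the gap value is arbitrary. It follows only the rules visible in the patch: a transfer starting before the end of the last chunk disables coalescing, one starting within the gap threshold extends the last chunk, and a chunk used by a single transfer is dropped before a new one is opened. The lookup mirrors the binary search in gomp_copy_host2dev.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical gap threshold for this sketch; libgomp uses its own
   MAX_COALESCE_BUF_GAP value.  */
#define SKETCH_MAX_GAP 32

/* One coalesced range of device offsets, [start, end).  */
struct chunk
{
  size_t start, end;
};

/* Hypothetical simplified analogue of struct gomp_coalesce_buf.  */
struct coalesce_buf
{
  struct chunk *chunks;  /* Caller-provided storage.  */
  long capacity;         /* Number of entries in CHUNKS.  */
  long chunk_cnt;        /* Chunks in use, or -1 if coalescing is disabled.  */
  long use_cnt;          /* Transfers merged into the last chunk.  */
};

/* Record a host-to-device transfer of LEN bytes at device offset START.
   Transfers are expected in increasing offset order.  */
static void
coalesce_add (struct coalesce_buf *cbuf, size_t start, size_t len)
{
  if (cbuf->chunk_cnt < 0)
    return;
  if (cbuf->chunk_cnt > 0)
    {
      struct chunk *last = &cbuf->chunks[cbuf->chunk_cnt - 1];
      if (start < last->end)
        {
          /* Overlapping or out-of-order transfer: stop coalescing.  */
          cbuf->chunk_cnt = -1;
          return;
        }
      if (start < last->end + SKETCH_MAX_GAP)
        {
          /* Close enough: grow the last chunk over the gap.  */
          last->end = start + len;
          cbuf->use_cnt++;
          return;
        }
      /* A chunk holding a single transfer gains nothing from staging,
         so drop it and reuse its slot.  */
      if (cbuf->use_cnt == 1)
        cbuf->chunk_cnt--;
    }
  assert (cbuf->chunk_cnt < cbuf->capacity);
  cbuf->chunks[cbuf->chunk_cnt].start = start;
  cbuf->chunks[cbuf->chunk_cnt].end = start + len;
  cbuf->chunk_cnt++;
  cbuf->use_cnt = 1;
}

/* Binary-search for the chunk containing [doff, doff + sz).  Return the
   byte offset into the staging buffer, or -1 if no single chunk covers
   the range (libgomp treats a partial overlap as a fatal error).  */
static long
coalesce_lookup (const struct coalesce_buf *cbuf, size_t doff, size_t sz)
{
  long first = 0;
  long last = cbuf->chunk_cnt - 1;
  while (first <= last)
    {
      long middle = (first + last) >> 1;
      if (cbuf->chunks[middle].end <= doff)
        first = middle + 1;
      else if (cbuf->chunks[middle].start <= doff)
        {
          if (doff + sz > cbuf->chunks[middle].end)
            return -1;
          /* The staging buffer begins at the first chunk's start.  */
          return (long) (doff - cbuf->chunks[0].start);
        }
      else
        last = middle - 1;
    }
  return -1;
}
```

Offsets returned by coalesce_lookup are relative to chunks[0].start because, as with the malloc call in gomp_map_vars_async, the staging buffer spans from the first chunk's start to the last chunk's end, gaps included.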
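
The OFFSET_* macros name three sentinel values stored in the offset field of a tgt->list[] entry. The sketch below reproduces the switch from gomp_map_val against a simplified, hypothetical entry type: struct entry and map_val are illustrative names, dev_base stands in for key->tgt->tgt_start + key->tgt_offset, and a dev_base of 0 stands in for a NULL key. It is a sketch under those assumptions, not libgomp code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same sentinel values as the patch adds to libgomp.h.  */
#define OFFSET_INLINED (~(uintptr_t) 0)
#define OFFSET_POINTER (~(uintptr_t) 1)
#define OFFSET_STRUCT (~(uintptr_t) 2)

/* Hypothetical stand-in for a tgt->list[] entry.  DEV_BASE plays the role
   of key->tgt->tgt_start + key->tgt_offset; 0 means "no key".  */
struct entry
{
  uintptr_t dev_base;
  uintptr_t offset;
};

/* Compute the device value for entry I, following gomp_map_val.  */
static uintptr_t
map_val (uintptr_t tgt_start, const struct entry *list, void **hostaddrs,
         size_t i)
{
  if (list[i].dev_base != 0)
    return list[i].dev_base + list[i].offset;

  switch (list[i].offset)
    {
    case OFFSET_INLINED:
      /* hostaddrs[i] already holds the value to pass through.  */
      return (uintptr_t) hostaddrs[i];

    case OFFSET_POINTER:
      return 0;

    case OFFSET_STRUCT:
      /* Derive the struct's device address from the following entry,
         shifted by the host-side distance between the two.  */
      assert (list[i + 1].dev_base != 0);
      return list[i + 1].dev_base + list[i + 1].offset
             + (uintptr_t) hostaddrs[i] - (uintptr_t) hostaddrs[i + 1];

    default:
      /* An ordinary offset into the newly allocated target block.  */
      return tgt_start + list[i].offset;
    }
}
```

One small benefit of the switch form over the old chain of ifs is that the compiler rejects duplicate case values, so two sentinels accidentally defined to the same value would fail to build.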