From patchwork Mon Jan 30 10:26:31 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Thomas Schwinge X-Patchwork-Id: 721336 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 3vBlvD5TVzz9ryT for ; Mon, 30 Jan 2017 21:27:18 +1100 (AEDT) Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.b="IY8gXeF+"; dkim-atps=neutral DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:cc:subject:in-reply-to:references:date:message-id :mime-version:content-type:content-transfer-encoding; q=dns; s= default; b=UO23eAz+Qr2zMSko8VIXCCKZML0DFHsaP5K+CWCXsFXxaVmdVu3cc cYnU7K9545YcZBU6l6vKf5vpaBreka9GEoddlxHPQc42r5SCM53LB/1cBisYYC4X 4J0rwS6CwflCe4TGtQsHp2br0GzgFEr6DHujJes3TnojJPNbbLO/34= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:cc:subject:in-reply-to:references:date:message-id :mime-version:content-type:content-transfer-encoding; s=default; bh=nwq8bP9rqa7iwLJaSrWel2Ij750=; b=IY8gXeF+DTOIcRkMzVLXgJRAH9y/ YCPYPQoBtYw4sjKxDBkwgfMVKeqBUPQ3DTg13c9wsntqX3SbqJFX9gXhyHpXr4UV MZCI4AFZSSurt7keGBAYvyioZanA1WQ2cxSRLYXxjtM/7T0FGo2n/9W+K2MxuPYC CJUXoYjqTdG7LUg= Received: (qmail 101366 invoked by alias); 30 Jan 2017 10:26:58 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 101340 invoked by uid 89); 30 Jan 2017 10:26:58 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-1.9 required=5.0 tests=AWL, BAYES_00, RCVD_IN_DNSWL_NONE, SPF_PASS, URIBL_RED autolearn=ham version=3.3.2 spammy=H*r:PST, transfer X-Spam-User: qpsmtpd, 2 recipients X-HELO: relay1.mentorg.com Received: from relay1.mentorg.com (HELO relay1.mentorg.com) (192.94.38.131) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Mon, 30 Jan 2017 10:26:47 +0000 Received: from svr-orw-fem-02x.mgc.mentorg.com ([147.34.96.206] helo=SVR-ORW-FEM-02.mgc.mentorg.com) by relay1.mentorg.com with esmtp id 1cY9Ab-0006r6-Lr from Thomas_Schwinge@mentor.com ; Mon, 30 Jan 2017 02:26:45 -0800 Received: from tftp-cs (147.34.91.1) by svr-orw-fem-02.mgc.mentorg.com (147.34.96.168) with Microsoft SMTP Server id 14.3.224.2; Mon, 30 Jan 2017 02:26:45 -0800 Received: by tftp-cs (Postfix, from userid 49978) id 9B966C23DB; Mon, 30 Jan 2017 02:26:44 -0800 (PST) From: Thomas Schwinge To: Cesar Philippidis CC: "gcc-patches@gcc.gnu.org" , Fortran List Subject: Re: [gomp4] optimize GOMP_MAP_TO_PSET In-Reply-To: <551a81a0-8ad4-9250-d8cd-68378e06d8fc@codesourcery.com> References: <551a81a0-8ad4-9250-d8cd-68378e06d8fc@codesourcery.com> User-Agent: Notmuch/0.9-125-g4686d11 (http://notmuchmail.org) Emacs/24.5.1 (x86_64-pc-linux-gnu) Date: Mon, 30 Jan 2017 11:26:31 +0100 Message-ID: <87lgtsc088.fsf@euler.schwinge.homeip.net> MIME-Version: 1.0 Hi! On Fri, 27 Jan 2017 08:06:22 -0800, Cesar Philippidis wrote: > This patch optimizes GOMP_MAP_TO_PSET in libgomp by installing the > remapped pointer to the array data directly in the PSET, instead of > uploading it separately with GOMP_MAP_POINTER. Effectively this > eliminates the GOMP_MAP_POINTER that is associated with the PSET, > thereby eliminating an additional host2dev data transfer. > > While it does work, it's not quite as effective as I had hope it would > be. I'm only observing about 0.05s, if that, in CloverLeaf, and arguably > that's statistical noise. Ah, too bad. > This is probably because CloverLeaf makes use > of ACC DATA regions in the critical sections, so all of those PSETs and > POINTERs are already preset on the accelerator. > > One thing I don't like about this patch is that I'm updating the host's > copy of the PSET prior to uploading it. The host's PSET does get > restored prior to returning from gomp_map_vars, however that might > impact things if the host were to run in multi-threaded applications. > Maybe I'll drop this patch from gomp4 since it's not very effective. ... also there is some bug somewhere; I see: PASS: libgomp.fortran/examples-4/async_target-2.f90 -O0 (test for excess errors) [-PASS:-]{+FAIL:+} libgomp.fortran/examples-4/async_target-2.f90 -O0 execution test PASS: libgomp.fortran/examples-4/async_target-2.f90 -O1 (test for excess errors) [-PASS:-]{+FAIL:+} libgomp.fortran/examples-4/async_target-2.f90 -O1 execution test PASS: libgomp.fortran/examples-4/async_target-2.f90 -O2 (test for excess errors) [-PASS:-]{+FAIL:+} libgomp.fortran/examples-4/async_target-2.f90 -O2 execution test PASS: libgomp.fortran/examples-4/async_target-2.f90 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions (test for excess errors) [-PASS:-]{+FAIL:+} libgomp.fortran/examples-4/async_target-2.f90 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test PASS: libgomp.fortran/examples-4/async_target-2.f90 -O3 -g (test for excess errors) [-PASS:-]{+FAIL:+} libgomp.fortran/examples-4/async_target-2.f90 -O3 -g execution test PASS: libgomp.fortran/examples-4/async_target-2.f90 -Os (test for excess errors) [-PASS:-]{+FAIL:+} libgomp.fortran/examples-4/async_target-2.f90 -Os execution test ..., and: PASS: libgomp.fortran/target3.f90 -O0 (test for excess errors) [-PASS:-]{+FAIL:+} libgomp.fortran/target3.f90 -O0 execution test PASS: libgomp.fortran/target3.f90 -O1 (test for excess errors) [-PASS:-]{+FAIL:+} libgomp.fortran/target3.f90 -O1 execution test PASS: libgomp.fortran/target3.f90 -O2 (test for excess errors) [-PASS:-]{+FAIL:+} libgomp.fortran/target3.f90 -O2 execution test PASS: libgomp.fortran/target3.f90 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions (test for excess errors) [-PASS:-]{+FAIL:+} libgomp.fortran/target3.f90 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test PASS: libgomp.fortran/target3.f90 -O3 -g (test for excess errors) [-PASS:-]{+FAIL:+} libgomp.fortran/target3.f90 -O3 -g execution test PASS: libgomp.fortran/target3.f90 -Os (test for excess errors) [-PASS:-]{+FAIL:+} libgomp.fortran/target3.f90 -Os execution test In all cases, the run-time error message is: libgomp: Pointer target of array section wasn't mapped For reference, I'm appending the patch, which wasn't included in the original email. Grüße Thomas commit 29f783a0e2162fea3e21e7f4295bc16f252e1c1f Author: cesar Date: Fri Jan 27 15:51:52 2017 +0000 Optimize GOMP_MAP_TO_PSET. libgomp/ * target.c (gomp_map_pset): New function. (gomp_map_vars): Update GOMP_MAP_POINTER with GOMP_MAP_TO_PSET, when possible. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@244984 138bc75d-0d04-0410-961f-82ee72b054a4 --- libgomp/ChangeLog.gomp | 6 +++ libgomp/target.c | 101 ++++++++++++++++++++++++++++++++++--------------- 2 files changed, 77 insertions(+), 30 deletions(-) diff --git libgomp/ChangeLog.gomp libgomp/ChangeLog.gomp index 40b536f..0fb5297 100644 --- libgomp/ChangeLog.gomp +++ libgomp/ChangeLog.gomp @@ -1,5 +1,11 @@ 2017-01-27 Cesar Philippidis + * target.c (gomp_map_pset): New function. + (gomp_map_vars): Update GOMP_MAP_POINTER with GOMP_MAP_TO_PSET, when + possible. + +2017-01-27 Cesar Philippidis + * plugin/plugin-nvptx.c (nvptx_exec): Make aware of GOMP_MAP_FIRSTPRIVATE_INT host addresses. * testsuite/libgomp.oacc-c++/firstprivate-int.C: New test. diff --git libgomp/target.c libgomp/target.c index 557f565..590b7dd 100644 --- libgomp/target.c +++ libgomp/target.c @@ -295,6 +295,37 @@ gomp_map_pointer (struct target_mem_desc *tgt, uintptr_t host_ptr, (void *) &cur_node.tgt_offset, sizeof (void *)); } +static uintptr_t +gomp_map_pset (struct target_mem_desc *tgt, uintptr_t host_ptr, + uintptr_t target_offset, uintptr_t bias) +{ + struct gomp_device_descr *devicep = tgt->device_descr; + struct splay_tree_s *mem_map = &devicep->mem_map; + struct splay_tree_key_s cur_node; + + cur_node.host_start = host_ptr; + + /* Add bias to the pointer value. */ + cur_node.host_start += bias; + cur_node.host_end = cur_node.host_start; + splay_tree_key n = gomp_map_lookup (mem_map, &cur_node); + if (n == NULL) + { + gomp_mutex_unlock (&devicep->lock); + gomp_fatal ("Pointer target of array section wasn't mapped"); + } + + cur_node.host_start -= n->host_start; + cur_node.tgt_offset + = n->tgt->tgt_start + n->tgt_offset + cur_node.host_start; + /* At this point tgt_offset is target address of the + array section. Now subtract bias to get what we want + to initialize the pointer with. */ + cur_node.tgt_offset -= bias; + + return cur_node.tgt_offset; +} + static void gomp_map_fields_existing (struct target_mem_desc *tgt, splay_tree_key n, size_t first, size_t i, void **hostaddrs, @@ -984,36 +1015,46 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum, break; case GOMP_MAP_TO_PSET: /* FIXME: see above FIXME comment. */ - gomp_copy_host2dev (devicep, - (void *) (tgt->tgt_start - + k->tgt_offset), - (void *) k->host_start, - k->host_end - k->host_start); - - for (j = i + 1; j < mapnum; j++) - if (!GOMP_MAP_POINTER_P (get_kind (short_mapkind, kinds, - j) - & typemask)) - break; - else if ((uintptr_t) hostaddrs[j] < k->host_start - || ((uintptr_t) hostaddrs[j] + sizeof (void *) - > k->host_end)) - break; - else - { - tgt->list[j].key = k; - tgt->list[j].copy_from = false; - tgt->list[j].always_copy_from = false; - if (k->refcount != REFCOUNT_INFINITY) - k->refcount++; - gomp_map_pointer (tgt, - (uintptr_t) *(void **) hostaddrs[j], - k->tgt_offset - + ((uintptr_t) hostaddrs[j] - - k->host_start), - sizes[j]); - i++; - } + { + bool found_pointer = false; + for (j = i + 1; j < mapnum; j++) + if (!GOMP_MAP_POINTER_P (get_kind (short_mapkind, kinds, + j) + & typemask)) + break; + else if ((uintptr_t) hostaddrs[j] < k->host_start + || ((uintptr_t) hostaddrs[j] + sizeof (void *) + > k->host_end)) + break; + else + { + uintptr_t tptr = 0; + uintptr_t toffset = + gomp_map_pset (tgt, + (uintptr_t) *(void **) + hostaddrs[j], + k->tgt_offset + + ((uintptr_t) hostaddrs[j] + - k->host_start), + sizes[j]); + tptr = *(uintptr_t *) hostaddrs[i]; + *(uintptr_t *) hostaddrs[i]= toffset; + gomp_copy_host2dev (devicep, + (void *) (tgt->tgt_start + + k->tgt_offset), + (void *) k->host_start, + k->host_end - k->host_start); + *(uintptr_t *) hostaddrs[i] = tptr; + i++; + found_pointer = true; + } + if (!found_pointer) + gomp_copy_host2dev (devicep, + (void *) (tgt->tgt_start + + k->tgt_offset), + (void *) k->host_start, + k->host_end - k->host_start); + } break; case GOMP_MAP_FORCE_PRESENT: {