From patchwork Tue Oct 28 16:07:21 2014
X-Patchwork-Submitter: Thomas Schwinge
X-Patchwork-Id: 404286
From: Thomas Schwinge
To: gcc-patches@gcc.gnu.org
CC: Julian Brown
Subject: Re: [gomp4] [1/3] OpenACC 2.0 support for libgomp - OpenACC runtime, NVidia PTX/CUDA plugin
In-Reply-To: <20141014171118.6cec6fb4@octopus>
References: <20141014171118.6cec6fb4@octopus>
Date: Tue, 28 Oct 2014 17:07:21 +0100
Message-ID: <87bnowgoue.fsf@kepler.schwinge.homeip.net>

Hi!

Following the noble goal of code re-use, we had been using <sys/queue.h>
for a standard C linked list implementation.  However, we found that
older glibc releases (old, but still sufficient to build GCC) contain a
variant of <sys/queue.h> that pre-dates a 2006 upstream glibc update to a
more recent upstream BSD version of that file, and so is missing certain
interfaces that we were using.  Instead of conditionally re-implementing
those, in r216803 I committed a patch to remove the SLIST_* usage, and
instead do things manually:

commit ba8916f6bc1dd93d8b6dc92f3d84aec49b68dea9
Author: tschwinge
Date:   Tue Oct 28 15:57:37 2014 +0000

    libgomp: Don't use <sys/queue.h>'s SLIST_*.

    Some of the interfaces are "too new".

        libgomp/
        * oacc-init.c: Don't use <sys/queue.h>'s SLIST_*.
        * plugin-nvptx.c: Likewise.

    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@216803 138bc75d-0d04-0410-961f-82ee72b054a4
---
 libgomp/ChangeLog.gomp |   5 ++
 libgomp/oacc-init.c    |  23 ++++-----
 libgomp/plugin-nvptx.c | 138 +++++++++++++++++++++++++++++--------------------
 3 files changed, 96 insertions(+), 70 deletions(-)

Regards,
 Thomas

diff --git libgomp/ChangeLog.gomp libgomp/ChangeLog.gomp
index 5363068..fda1cbc 100644
--- libgomp/ChangeLog.gomp
+++ libgomp/ChangeLog.gomp
@@ -1,3 +1,8 @@
+2014-10-28  Thomas Schwinge
+
+        * oacc-init.c: Don't use <sys/queue.h>'s SLIST_*.
+        * plugin-nvptx.c: Likewise.
+
 2014-10-23  Thomas Schwinge
 
         * testsuite/libgomp.oacc-c/reduction-initial-1.c: New file.
diff --git libgomp/oacc-init.c libgomp/oacc-init.c
index f797f89..ffa9ad8 100644
--- libgomp/oacc-init.c
+++ libgomp/oacc-init.c
@@ -31,7 +31,6 @@
 #include
 #include
 #include
-#include <sys/queue.h>
 #include
 
 gomp_mutex_t acc_device_lock;
@@ -55,11 +54,11 @@ static __thread int handle_num = -1;
 struct ACC_context {
   struct memmap_t *ACC_memmap;
   void *ACC_handle;
-  SLIST_ENTRY(ACC_context) next;
+
+  struct ACC_context *next;
 };
 
-static SLIST_HEAD(_ACC_contexts, ACC_context) _ACC_contexts;
-static struct _ACC_contexts *ACC_contexts;
+static struct ACC_context *ACC_contexts;
 
 static struct gomp_device_descr const *dispatchers[_ACC_device_hwm] = { 0 };
 
@@ -198,7 +197,7 @@ lazy_open (int ord)
   ACC_handle = ACC_dev->openacc.open_device_func (ord);
   handle_num = ord;
 
-  SLIST_FOREACH(acc_ctx, ACC_contexts, next)
+  for (acc_ctx = ACC_contexts; acc_ctx != NULL; acc_ctx = acc_ctx->next)
     {
       if (acc_ctx->ACC_handle == ACC_handle)
         {
@@ -220,7 +219,8 @@ lazy_open (int ord)
   if (!ACC_memmap->mem_map.is_initialized)
     gomp_init_tables (ACC_dev, &ACC_memmap->mem_map);
 
-  SLIST_INSERT_HEAD(ACC_contexts, acc_ctx, next);
+  acc_ctx->next = ACC_contexts;
+  ACC_contexts = acc_ctx;
 }
 
 /* OpenACC 2.0a (3.2.12, 3.2.13) doesn't specify whether the serialization of
@@ -259,12 +259,10 @@ _acc_shutdown (acc_device_t d)
 
   close_handle ();
 
-  while (SLIST_FIRST(ACC_contexts) != NULL)
+  while (ACC_contexts != NULL)
     {
-      struct ACC_context *c;
-
-      c = SLIST_FIRST(ACC_contexts);
-      SLIST_REMOVE_HEAD(ACC_contexts, next);
+      struct ACC_context *c = ACC_contexts;
+      ACC_contexts = ACC_contexts->next;
       free (c);
     }
 
@@ -467,8 +465,7 @@ ACC_runtime_initialize (void)
 {
   gomp_mutex_init (&acc_device_lock);
 
-  ACC_contexts = &_ACC_contexts;
-  SLIST_INIT (ACC_contexts);
+  ACC_contexts = NULL;
 }
 
 /* Compiler helper functions */
diff --git libgomp/plugin-nvptx.c libgomp/plugin-nvptx.c
index f193229..33f868a 100644
--- libgomp/plugin-nvptx.c
+++ libgomp/plugin-nvptx.c
@@ -40,7 +40,6 @@
 #include "libgomp-plugin.h"
 
 #include
-#include <sys/queue.h>
 #include
 #include
 #include
@@ -149,11 +148,9 @@ struct PTX_stream
   void *h_prev;
   void *h_tail;
 
-  SLIST_ENTRY(PTX_stream) next;
+  struct PTX_stream *next;
 };
 
-SLIST_HEAD(PTX_streams, PTX_stream);
-
 /* Each thread may select a stream (also specific to a device/context).  */
 static __thread struct PTX_stream *current_stream;
 
@@ -293,7 +290,7 @@ struct PTX_device
   /* All non-null streams associated with this device (actually context),
      either created implicitly or passed in from the user (via
      acc_set_cuda_stream).  */
-  struct PTX_streams active_streams;
+  struct PTX_stream *active_streams;
   struct {
     struct PTX_stream **arr;
     int size;
@@ -306,12 +303,12 @@ struct PTX_device
   bool concur;
   int  mode;
   bool mkern;
-  SLIST_ENTRY(PTX_device) next;
+
+  struct PTX_device *next;
 };
 
 static __thread struct PTX_device *PTX_dev;
-static SLIST_HEAD(_PTX_devices, PTX_device) _PTX_devices;
-static struct _PTX_devices *PTX_devices;
+static struct PTX_device *PTX_devices;
 
 enum PTX_event_type
 {
@@ -327,12 +324,12 @@ struct PTX_event
   int type;
   void *addr;
   int ord;
-  SLIST_ENTRY(PTX_event) next;
+
+  struct PTX_event *next;
 };
 
 static gomp_mutex_t PTX_event_lock;
-static SLIST_HEAD(_PTX_events, PTX_event) _PTX_events;
-static struct _PTX_events *PTX_events;
+static struct PTX_event *PTX_events;
 
 #define _XSTR(s) _STR(s)
 #define _STR(s) #s
@@ -417,7 +414,7 @@ init_streams_for_device (struct PTX_device *ptx_dev, int concurrency)
   map_init (null_stream);
 
   ptx_dev->null_stream = null_stream;
-  SLIST_INIT (&ptx_dev->active_streams);
+  ptx_dev->active_streams = NULL;
   GOMP_PLUGIN_mutex_init (&ptx_dev->stream_lock);
 
   if (concurrency < 1)
@@ -437,13 +434,13 @@ init_streams_for_device (struct PTX_device *ptx_dev, int concurrency)
 static void
 fini_streams_for_device (struct PTX_device *ptx_dev)
 {
-  struct PTX_stream *s;
   free (ptx_dev->async_streams.arr);
 
-  while (!SLIST_EMPTY (&ptx_dev->active_streams))
+  while (ptx_dev->active_streams != NULL)
     {
-      s = SLIST_FIRST (&ptx_dev->active_streams);
-      SLIST_REMOVE_HEAD (&ptx_dev->active_streams, next);
+      struct PTX_stream *s = ptx_dev->active_streams;
+      ptx_dev->active_streams = ptx_dev->active_streams->next;
+
       cuStreamDestroy (s->stream);
       map_fini (s);
       free (s);
@@ -535,7 +532,8 @@ select_stream_for_async (int async, pthread_t thread, bool create,
       s->h = NULL;
       map_init (s);
 
-      SLIST_INSERT_HEAD (&ptx_dev->active_streams, s, next);
+      s->next = ptx_dev->active_streams;
+      ptx_dev->active_streams = s;
       ptx_dev->async_streams.arr[async] = s;
     }
 
@@ -593,11 +591,8 @@ PTX_init (void)
   if (r != CUDA_SUCCESS)
     GOMP_PLUGIN_fatal ("cuInit error: %s", cuErrorMsg (r));
 
-  PTX_devices = &_PTX_devices;
-  PTX_events = &_PTX_events;
-
-  SLIST_INIT(PTX_devices);
-  SLIST_INIT(PTX_events);
+  PTX_devices = NULL;
+  PTX_events = NULL;
 
   GOMP_PLUGIN_mutex_init (&PTX_event_lock);
 
@@ -625,7 +620,9 @@ PTX_open_device (int n)
 {
   struct PTX_device *ptx_device;
 
-  SLIST_FOREACH(ptx_device, PTX_devices, next)
+  for (ptx_device = PTX_devices;
+       ptx_device != NULL;
+       ptx_device = ptx_device->next)
     {
       if (ptx_device->ord == n)
         {
@@ -653,7 +650,8 @@ PTX_open_device (int n)
   PTX_dev->dev = dev;
   PTX_dev->ctx_shared = false;
 
-  SLIST_INSERT_HEAD(PTX_devices, PTX_dev, next);
+  PTX_dev->next = PTX_devices;
+  PTX_devices = PTX_dev;
 
   r = cuCtxGetCurrent (&PTX_dev->ctx);
   if (r != CUDA_SUCCESS)
@@ -729,7 +727,15 @@ PTX_close_device (void *h __attribute__((unused)))
         GOMP_PLUGIN_fatal ("cuCtxDestroy error: %s", cuErrorMsg (r));
     }
 
-  SLIST_REMOVE(PTX_devices, PTX_dev, PTX_device, next);
+  if (PTX_devices == PTX_dev)
+    PTX_devices = PTX_devices->next;
+  else
+    {
+      struct PTX_device* d = PTX_devices;
+      while (d->next != PTX_dev)
+        d = d->next;
+      d->next = d->next->next;
+    }
   free (PTX_dev);
 
   PTX_dev = NULL;
@@ -920,60 +926,67 @@ link_ptx (CUmodule *module, char *ptx_code)
 static void
 event_gc (bool memmap_lockable)
 {
-  struct PTX_event *ptx_event;
+  struct PTX_event *ptx_event = PTX_events;
 
   GOMP_PLUGIN_mutex_lock (&PTX_event_lock);
 
-  for (ptx_event = SLIST_FIRST (PTX_events); ptx_event;)
+  while (ptx_event != NULL)
     {
       CUresult r;
-      struct PTX_event *next = SLIST_NEXT (ptx_event, next);
+      struct PTX_event *e = ptx_event;
 
-      if (ptx_event->ord != PTX_dev->ord)
-        goto next_event;
+      ptx_event = ptx_event->next;
 
-      r = cuEventQuery (*ptx_event->evt);
+      if (e->ord != PTX_dev->ord)
+        continue;
+
+      r = cuEventQuery (*e->evt);
       if (r == CUDA_SUCCESS)
-        {
-          CUevent *te;
+        {
+          CUevent *te;
 
-          te = ptx_event->evt;
+          te = e->evt;
 
-          switch (ptx_event->type)
+          switch (e->type)
             {
             case PTX_EVT_MEM:
             case PTX_EVT_SYNC:
               break;
 
             case PTX_EVT_KNL:
-              map_pop (ptx_event->addr);
+              map_pop (e->addr);
               break;
 
             case PTX_EVT_ASYNC_CLEANUP:
-              {
-                /* The function GOMP_PLUGIN_async_unmap_vars needs to claim the
+              {
+                /* The function GOMP_PLUGIN_async_unmap_vars needs to claim the
                    memory-map splay tree lock for the current device, so we
                    can't call it when one of our callers has already claimed
                    the lock.  In that case, just delay the GC for this event
-                   until later.  */
-                if (!memmap_lockable)
-                  goto next_event;
+                   until later.  */
+                if (!memmap_lockable)
+                  continue;
 
-                GOMP_PLUGIN_async_unmap_vars (ptx_event->addr);
-              }
+                GOMP_PLUGIN_async_unmap_vars (e->addr);
+              }
               break;
             }
 
-          cuEventDestroy (*te);
-          free ((void *)te);
+          cuEventDestroy (*te);
+          free ((void *)te);
 
-          SLIST_REMOVE (PTX_events, ptx_event, PTX_event, next);
+          if (PTX_events == e)
+            PTX_events = PTX_events->next;
+          else
+            {
+              struct PTX_event *e_ = PTX_events;
+              while (e_->next != e)
+                e_ = e_->next;
+              e_->next = e_->next->next;
+            }
 
-          free (ptx_event);
-        }
-
-    next_event:
-      ptx_event = next;
+          free (e);
+        }
     }
 
   GOMP_PLUGIN_mutex_unlock (&PTX_event_lock);
@@ -995,7 +1008,8 @@ event_add (enum PTX_event_type type, CUevent *e, void *h)
 
   GOMP_PLUGIN_mutex_lock (&PTX_event_lock);
 
-  SLIST_INSERT_HEAD(PTX_events, ptx_event, next);
+  ptx_event->next = PTX_events;
+  PTX_events = ptx_event;
 
   GOMP_PLUGIN_mutex_unlock (&PTX_event_lock);
 }
@@ -1316,7 +1330,7 @@ PTX_async_test_all (void)
 
   GOMP_PLUGIN_mutex_lock (&PTX_dev->stream_lock);
 
-  SLIST_FOREACH (s, &PTX_dev->active_streams, next)
+  for (s = PTX_dev->active_streams; s != NULL; s = s->next)
     {
       if ((s->multithreaded || pthread_equal (s->host_thread, self))
           && cuStreamQuery (s->stream) == CUDA_ERROR_NOT_READY)
@@ -1400,7 +1414,7 @@ PTX_wait_all (void)
 
   /* Wait for active streams initiated by this thread (or by multiple threads)
      to complete.  */
-  SLIST_FOREACH (s, &PTX_dev->active_streams, next)
+  for (s = PTX_dev->active_streams; s != NULL; s = s->next)
     {
       if (s->multithreaded || pthread_equal (s->host_thread, self))
         {
@@ -1443,7 +1457,9 @@ PTX_wait_all_async (int async)
 
   GOMP_PLUGIN_mutex_lock (&PTX_dev->stream_lock);
 
-  SLIST_FOREACH (other_stream, &PTX_dev->active_streams, next)
+  for (other_stream = PTX_dev->active_streams;
+       other_stream != NULL;
+       other_stream = other_stream->next)
     {
       if (!other_stream->multithreaded
           && !pthread_equal (other_stream->host_thread, self))
@@ -1524,8 +1540,16 @@ PTX_set_cuda_stream (int async, void *stream)
 
   if (oldstream)
     {
-      SLIST_REMOVE (&PTX_dev->active_streams, oldstream, PTX_stream, next);
-
+      if (PTX_dev->active_streams == oldstream)
+        PTX_dev->active_streams = PTX_dev->active_streams->next;
+      else
+        {
+          struct PTX_stream *s = PTX_dev->active_streams;
+          while (s->next != oldstream)
+            s = s->next;
+          s->next = s->next->next;
+        }
+
       cuStreamDestroy (oldstream->stream);
       map_fini (oldstream);
       free (oldstream);
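
For reference, the four open-coded patterns that replace the SLIST_* macros in the patch above (insert at head, traverse, unlink one element, drain the list) can be collected into one stand-alone sketch.  This is only an illustration and not code from libgomp: struct node and the list_* helper names are hypothetical, and list_remove walks a pointer-to-pointer instead of special-casing the head as the patch does, so it also tolerates an element that is not on the list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical element type, standing in for ACC_context, PTX_stream,
   PTX_device or PTX_event.  */
struct node
{
  int ord;
  struct node *next;
};

/* Push N onto the front of the list; stands in for SLIST_INSERT_HEAD.  */
static void
list_push (struct node **head, struct node *n)
{
  assert (n != NULL);
  n->next = *head;
  *head = n;
}

/* Return the first element whose ORD matches, or NULL; the loop is the
   open-coded form of SLIST_FOREACH.  */
static struct node *
list_find (struct node *head, int ord)
{
  struct node *n;

  for (n = head; n != NULL; n = n->next)
    if (n->ord == ord)
      return n;
  return NULL;
}

/* Unlink N without freeing it; stands in for SLIST_REMOVE.  Returns 1 if
   N was on the list and 0 otherwise.  */
static int
list_remove (struct node **head, struct node *n)
{
  struct node **link;

  for (link = head; *link != NULL; link = &(*link)->next)
    if (*link == n)
      {
        *link = n->next;
        n->next = NULL;
        return 1;
      }
  return 0;
}

/* Free every element; stands in for the SLIST_FIRST / SLIST_REMOVE_HEAD
   loop.  Returns the number of elements freed.  */
static int
list_drain (struct node **head)
{
  int count = 0;

  while (*head != NULL)
    {
      struct node *n = *head;
      *head = n->next;
      free (n);
      count++;
    }
  return count;
}
```

An empty list is just a NULL head pointer, which is why the SLIST_INIT calls in the patch turn into plain NULL assignments.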