From patchwork Wed Jan 23 08:19:33 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Tom de Vries X-Patchwork-Id: 1029784 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=gcc.gnu.org (client-ip=209.132.180.131; helo=sourceware.org; envelope-from=gcc-patches-return-494575-incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=suse.de Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.b="fvuB/14o"; dkim-atps=neutral Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 43kyqm1bKzz9s3q for ; Wed, 23 Jan 2019 19:19:15 +1100 (AEDT) DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:date :from:to:cc:subject:message-id:mime-version:content-type; q=dns; s=default; b=XvSHAKP3ivvMIiEnJBj0SH0lrGbpjUF7xOlQ6vHNNTi6yssYl6 aFPovzjUDj89ab7wS4geXrDF3amzqzeuieII+Qtf+aMfaDghfi1klNiDEDt9GguQ N2r8/EK+v1N3EsxHAfaC5Y9FQVypa79W/kT4BBpPyC1WFGa+9akthNdl8= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:date :from:to:cc:subject:message-id:mime-version:content-type; s= default; bh=EM3/kd8N27neVem6bQCdt8ym0Rw=; b=fvuB/14o0BcYTPGjzbhw HXT9NAOG58Dzx+mQv0/L7Sx/gKvTTyV8tfNBhghsFZHktqC2LuhT9W7UljLwfqNZ z+iCqGe+rkhb7BXF5Xr1+kg2c1mmyjPzEM4rLNQnqEx4XkMDHTuNh/hPij1z7za1 ZApkf2xIOy2Qr9UFPAGgBUs= Received: (qmail 34855 invoked by alias); 23 Jan 2019 08:19:07 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 34846 invoked by uid 89); 23 Jan 2019 08:19:07 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-26.9 required=5.0 tests=BAYES_00, GIT_PATCH_0, GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3, SPF_PASS autolearn=ham version=3.3.2 spammy=H*Ad:D*name X-HELO: mx1.suse.de Received: from mx2.suse.de (HELO mx1.suse.de) (195.135.220.15) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Wed, 23 Jan 2019 08:19:06 +0000 Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 71FE4AF5B; Wed, 23 Jan 2019 08:19:03 +0000 (UTC) Date: Wed, 23 Jan 2019 09:19:33 +0100 From: Tom de Vries To: gcc-patches@gcc.gnu.org Cc: Thomas Schwinge Subject: [committed][nvptx, libgomp] Fix map_push Message-ID: <20190123081931.GA7156@delia> MIME-Version: 1.0 Content-Disposition: inline User-Agent: Mutt/1.10.1 (2018-07-13) X-IsSubscribed: yes Hi, The map field of a struct ptx_stream is a FIFO. The FIFO is implemented as a single linked list, with pop-from-the-front semantics. The function map_pop pops an element, either by: - deallocating the element, if there is more than one element - or marking the element inactive, if there's only one element The responsibility of map_push is to push an element to the back, as well as selecting the element to push, by: - allocating an element, or - reusing the element at the front if inactive and big enough, or - dropping the element at the front if inactive and not big enough, and allocating one that's big enough The current implemention gets at least the first and most basic scenario wrong: > map = cuda_map_create (size); We create an element, and assign it to map. > for (t = s->map; t->next != NULL; t = t->next) > ; We determine the last element in the fifo. > t->next = map; We append the new element. > s->map = map; But here, we throw away the rest of the FIFO, and declare the FIFO to be just the new element. This problem causes the test-case asyncwait-1.c to fail intermittently on some systems. The pr87835.c test-case added here is a a minimized and modified version of asyncwait-1.c (avoiding the kernel construct) that is more likely to fail. Fix this by rewriting map_pop more robustly, by: - seperating the function in two phases: select element, push element - when reusing or dropping an element, making sure that the element is cleanly popped from the queue - rewriting the push element part in such a way that it can handle all cases without needing if statements, such that each line is exercised for each of the three cases. Committed to trunk. Thanks, - Tom [nvptx, libgomp] Fix map_push 2019-01-22 Tom de Vries PR target/87835 * plugin/plugin-nvptx.c (map_push): Fix adding of allocated element. * testsuite/libgomp.oacc-c-c++-common/pr87835.c: New test. --- libgomp/plugin/plugin-nvptx.c | 47 +++++++++------- .../testsuite/libgomp.oacc-c-c++-common/pr87835.c | 62 ++++++++++++++++++++++ 2 files changed, 91 insertions(+), 18 deletions(-) diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c index dd2bcf3083f..a220560b189 100644 --- a/libgomp/plugin/plugin-nvptx.c +++ b/libgomp/plugin/plugin-nvptx.c @@ -296,35 +296,46 @@ map_pop (struct ptx_stream *s) static CUdeviceptr map_push (struct ptx_stream *s, size_t size) { - struct cuda_map *map = NULL, *t = NULL; + struct cuda_map *map = NULL; + struct cuda_map **t; assert (s); assert (s->map); - /* Each PTX stream requires a separate data region to store the - launch arguments for cuLaunchKernel. Allocate a new - cuda_map and push it to the end of the list. */ + /* Select an element to push. */ if (s->map->active) + map = cuda_map_create (size); + else { - map = cuda_map_create (size); + /* Pop the inactive front element. */ + struct cuda_map *pop = s->map; + s->map = pop->next; + pop->next = NULL; - for (t = s->map; t->next != NULL; t = t->next) - ; + if (pop->size < size) + { + cuda_map_destroy (pop); - t->next = map; - } - else if (s->map->size < size) - { - cuda_map_destroy (s->map); - map = cuda_map_create (size); + map = cuda_map_create (size); + } + else + map = pop; } - else - map = s->map; - s->map = map; - s->map->active = true; + /* Check that the element is as expected. */ + assert (map->next == NULL); + assert (!map->active); + + /* Mark the element active. */ + map->active = true; + + /* Push the element to the back of the list. */ + for (t = &s->map; (*t) != NULL; t = &(*t)->next) + ; + assert (t != NULL && *t == NULL); + *t = map; - return s->map->d; + return map->d; } /* Target data function launch information. */ diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/pr87835.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/pr87835.c new file mode 100644 index 00000000000..310a485e74f --- /dev/null +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/pr87835.c @@ -0,0 +1,62 @@ +/* { dg-do run { target openacc_nvidia_accel_selected } } */ +/* { dg-additional-options "-lcuda" } */ + +#include +#include +#include "cuda.h" + +#include + +#define n 128 + +int +main (void) +{ + CUresult r; + CUstream stream1; + int N = n; + int a[n]; + int b[n]; + int c[n]; + + acc_init (acc_device_nvidia); + + r = cuStreamCreate (&stream1, CU_STREAM_NON_BLOCKING); + if (r != CUDA_SUCCESS) + { + fprintf (stderr, "cuStreamCreate failed: %d\n", r); + abort (); + } + + acc_set_cuda_stream (1, stream1); + + for (int i = 0; i < n; i++) + { + a[i] = 3; + c[i] = 0; + } + +#pragma acc data copy (a, b, c) copyin (N) + { +#pragma acc parallel async (1) + ; + +#pragma acc parallel async (1) num_gangs (320) + #pragma loop gang + for (int ii = 0; ii < N; ii++) + c[ii] = (a[ii] + a[N - ii - 1]); + +#pragma acc parallel async (1) + #pragma acc loop seq + for (int ii = 0; ii < n; ii++) + a[ii] = 6; + +#pragma acc wait (1) + } + + for (int i = 0; i < n; i++) + if (c[i] != 6) + abort (); + + return 0; +}