From: Tom de Vries
To: GCC Patches
CC: Thomas Schwinge, Jakub Jelinek
Subject: [og7] Fix hang when running oacc exec with CUDA 9.0 nvprof
Date: Thu, 22 Feb 2018 12:23:25 +0100

Hi,

When using CUDA 9 nvprof with an OpenACC executable, the executable hangs.

The scenario resulting in the hang is as follows:
1. goacc_lazy_initialize calls gomp_mutex_lock (&acc_device_lock)
2. goacc_lazy_initialize calls acc_init_1
3. acc_init_1 calls goacc_profiling_dispatch (&prof_info,
   &device_init_event_info, &api_info);
4. goacc_profiling_dispatch calls the callback registered by the CUDA
   profiling library
5. the registered callback calls acc_get_device_type
6. acc_get_device_type calls gomp_mutex_lock (&acc_device_lock)
7. The lock is not recursive and is already held by this same thread (step
   1), so the thread deadlocks

The callback registered by CUDA 8 does not call acc_get_device_type, so the
hang does not occur there.

This patch fixes the hang by detecting, in acc_get_device_type, that the
calling thread is the thread currently initializing the OpenACC part of
libgomp, and returning acc_device_none in that case.  That is a legal
return value, since the OpenACC standard states "If the device type has not
yet been selected, the value acc_device_none may be returned".

Committed to the og7 branch.

Thanks,
- Tom

Fix hang when running oacc exec with CUDA 9.0 nvprof

2018-02-15  Tom de Vries

	* oacc-init.c (acc_init_state_lock, acc_init_state, acc_init_thread):
	New variable.
	(acc_init_1): Set acc_init_thread to pthread_self ().  Set
	acc_init_state to initializing at the start, and to initialized at the
	end.
	(self_initializing_p): New function.
	(acc_get_device_type): Return acc_device_none if called by the thread
	that is currently executing acc_init_1.

---
 libgomp/oacc-init.c | 33 +++++++++++++++++++++++++++++++++
 2 files changed, 44 insertions(+)

diff --git a/libgomp/oacc-init.c b/libgomp/oacc-init.c
index 6dada0b..d8348c0 100644
--- a/libgomp/oacc-init.c
+++ b/libgomp/oacc-init.c
@@ -40,6 +40,11 @@ static gomp_mutex_t acc_device_lock;
 
+static gomp_mutex_t acc_init_state_lock;
+static enum { uninitialized, initializing, initialized } acc_init_state
+  = uninitialized;
+static pthread_t acc_init_thread;
+
 /* A cached version of the dispatcher for the global "current" accelerator
    type, e.g. used as the default when creating new host threads.
    This is the device-type equivalent of goacc_device_num (which specifies
    which device to
@@ -220,6 +225,11 @@ acc_dev_num_out_of_range (acc_device_t d, int ord, int ndevs)
 static struct gomp_device_descr *
 acc_init_1 (acc_device_t d, acc_construct_t parent_construct, int implicit)
 {
+  gomp_mutex_lock (&acc_init_state_lock);
+  acc_init_state = initializing;
+  acc_init_thread = pthread_self ();
+  gomp_mutex_unlock (&acc_init_state_lock);
+
   bool check_not_nested_p;
   if (implicit)
     {
@@ -312,6 +322,9 @@ acc_init_1 (acc_device_t d, acc_construct_t parent_construct, int implicit)
                                     &api_info);
     }
 
+  gomp_mutex_lock (&acc_init_state_lock);
+  acc_init_state = initialized;
+  gomp_mutex_unlock (&acc_init_state_lock);
   return base_dev;
 }
 
@@ -644,6 +657,17 @@ acc_set_device_type (acc_device_t d)
 
 ialias (acc_set_device_type)
 
+static bool
+self_initializing_p (void)
+{
+  bool res;
+  gomp_mutex_lock (&acc_init_state_lock);
+  res = (acc_init_state == initializing
+         && pthread_equal (acc_init_thread, pthread_self ()));
+  gomp_mutex_unlock (&acc_init_state_lock);
+  return res;
+}
+
 acc_device_t
 acc_get_device_type (void)
 {
@@ -653,6 +677,15 @@ acc_get_device_type (void)
 
   if (thr && thr->base_dev)
     res = acc_device_type (thr->base_dev->type);
+  else if (self_initializing_p ())
+    /* The Cuda libaccinj64.so version 9.0+ calls acc_get_device_type during the
+       acc_ev_device_init_start event callback, which is dispatched during
+       acc_init_1.  Trying to lock acc_device_lock during such a call (as we do
+       in the else clause below), will result in deadlock, since the lock has
+       already been taken by the acc_init_1 caller.  We work around this problem
+       by using the acc_get_device_type property "If the device type has not yet
+       been selected, the value acc_device_none may be returned".  */
+    ;
   else
     {
       acc_prof_info prof_info;
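The guard this patch adds can be illustrated outside libgomp.  The following is a minimal sketch using plain POSIX threads, not libgomp code: device_lock, init_state_lock, get_device_type, lazy_initialize and the device_t values are hypothetical stand-ins for acc_device_lock, acc_init_state_lock, acc_get_device_type, goacc_lazy_initialize/acc_init_1 and acc_device_t.  It assumes, as in the scenario above, that the outer lock is non-recursive and that the profiling callback runs on the initializing thread.  It also omits the per-thread fast path (thr->base_dev) that acc_get_device_type checks first.

```c
/* Sketch of the "am I the initializing thread?" guard, using plain POSIX
   threads.  All names are hypothetical stand-ins for libgomp internals.  */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

typedef enum { DEVICE_NONE, DEVICE_HOST, DEVICE_NVIDIA } device_t;

/* Outer, non-recursive lock held for the whole initialization
   (plays the role of acc_device_lock).  */
static pthread_mutex_t device_lock = PTHREAD_MUTEX_INITIALIZER;
static device_t current_device = DEVICE_NONE;

/* Separate lock guarding the initialization state and the identity of the
   initializing thread (plays the role of acc_init_state_lock).  */
static pthread_mutex_t init_state_lock = PTHREAD_MUTEX_INITIALIZER;
static enum { UNINITIALIZED, INITIALIZING, INITIALIZED } init_state
  = UNINITIALIZED;
static pthread_t init_thread;

/* True only when called by the thread that is currently initializing.  */
static bool
self_initializing_p (void)
{
  bool res;
  pthread_mutex_lock (&init_state_lock);
  res = (init_state == INITIALIZING
         && pthread_equal (init_thread, pthread_self ()));
  pthread_mutex_unlock (&init_state_lock);
  return res;
}

/* Query the current device type.  A re-entrant call from the initializing
   thread returns DEVICE_NONE instead of relocking device_lock.  */
device_t
get_device_type (void)
{
  device_t res;
  if (self_initializing_p ())
    return DEVICE_NONE;
  pthread_mutex_lock (&device_lock);
  res = current_device;
  pthread_mutex_unlock (&device_lock);
  return res;
}

/* Initialize device D while holding device_lock, invoking CALLBACK (if
   non-null) in the middle, the way a profiling event is dispatched during
   initialization.  Returns the device type the callback observed.  */
device_t
lazy_initialize (device_t d, device_t (*callback) (void))
{
  device_t seen = DEVICE_NONE;

  /* A nested initialization from the same thread would self-deadlock.  */
  assert (!self_initializing_p ());
  pthread_mutex_lock (&device_lock);

  pthread_mutex_lock (&init_state_lock);
  init_state = INITIALIZING;
  init_thread = pthread_self ();
  pthread_mutex_unlock (&init_state_lock);

  if (callback)
    seen = callback ();  /* May call get_device_type without deadlocking.  */

  current_device = d;

  pthread_mutex_lock (&init_state_lock);
  init_state = INITIALIZED;
  pthread_mutex_unlock (&init_state_lock);

  pthread_mutex_unlock (&device_lock);
  return seen;
}
```

With this sketch, lazy_initialize (DEVICE_NVIDIA, get_device_type) returns DEVICE_NONE (the value the callback saw during initialization), and a later get_device_type () returns DEVICE_NVIDIA.  Another thread calling get_device_type during initialization is not short-circuited: self_initializing_p compares thread identities with pthread_equal, so that thread still blocks on device_lock until initialization finishes.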