From patchwork Thu Feb 22 11:23:25 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Tom de Vries <Tom_deVries@mentor.com>
X-Patchwork-Id: 876597
To: GCC Patches <gcc-patches@gcc.gnu.org>
From: Tom de Vries <Tom_deVries@mentor.com>
Subject: [og7] Fix hang when running oacc exec with CUDA 9.0 nvprof
CC: Thomas Schwinge <thomas@codesourcery.com>,
	Jakub Jelinek	<jakub@redhat.com>
Message-ID: <c9712aa1-d85a-53fc-1624-d2933e45789d@mentor.com>
Date: Thu, 22 Feb 2018 12:23:25 +0100

Hi,

When running an OpenACC executable under CUDA 9 nvprof, the executable hangs.

The scenario resulting in the hang is as follows:
1. goacc_lazy_initialize calls gomp_mutex_lock (&acc_device_lock)
2. goacc_lazy_initialize calls acc_init_1
3. acc_init_1 calls goacc_profiling_dispatch (&prof_info,
    &device_init_event_info, &api_info);
4. goacc_profiling_dispatch calls the registered callback in the CUDA
    profiling library
5. the registered callback calls acc_get_device_type
6. acc_get_device_type calls gomp_mutex_lock (&acc_device_lock)
7. acc_device_lock is not recursive, so the thread deadlocks on itself
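
The steps above can be reproduced outside libgomp with a plain pthread
mutex. This is only a minimal sketch: device_lock, profiling_callback and
lazy_initialize_demo are hypothetical stand-ins for acc_device_lock, the
CUDA profiling callback and goacc_lazy_initialize, and the nested attempt
uses pthread_mutex_trylock so that the demo reports the conflict instead
of hanging the way the real code does.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for acc_device_lock: a plain, non-recursive mutex.  */
static pthread_mutex_t device_lock = PTHREAD_MUTEX_INITIALIZER;

/* Result of the nested lock attempt, recorded by the callback.  */
static int nested_lock_result = -1;

/* Hypothetical profiling callback.  Like the CUDA 9 callback, it re-enters
   the runtime and wants device_lock.  trylock is used here so the demo
   returns an error code rather than blocking forever.  */
static void
profiling_callback (void)
{
  nested_lock_result = pthread_mutex_trylock (&device_lock);
  if (nested_lock_result == 0)
    pthread_mutex_unlock (&device_lock);
}

/* Mirrors steps 1-6: take the lock, then dispatch the callback while still
   holding it.  Returns what the nested lock attempt reported.  */
int
lazy_initialize_demo (void)
{
  int rc = pthread_mutex_lock (&device_lock);
  assert (rc == 0);
  profiling_callback ();
  pthread_mutex_unlock (&device_lock);
  return nested_lock_result;
}
```

Calling lazy_initialize_demo () is expected to return EBUSY, because the
mutex is already held by the calling thread.  With a blocking
pthread_mutex_lock in the callback, as in step 6, the thread would wait on
itself instead.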

The registered callback in CUDA 8 does not call acc_get_device_type, so
the hang doesn't occur there.

This patch fixes the hang by detecting in acc_get_device_type that the
calling thread is the one currently initializing the OpenACC part of
libgomp, and returning acc_device_none in that case.  That is a legal
value, since the OpenACC standard states "If the device type has not yet
been selected, the value acc_device_none may be returned".
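
The guard can be illustrated in isolation as follows.  This is a sketch
under assumptions, not the libgomp code: the demo_* names, the
demo_device enum and the callback are hypothetical, and pthread mutexes
stand in for gomp_mutex_t.  It shows the state lock being separate from
the device lock, and the query returning early when it is called from the
thread that is in the middle of initialization.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins for acc_device_t values.  */
enum demo_device { DEMO_DEVICE_NONE, DEMO_DEVICE_HOST };

static pthread_mutex_t device_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t init_state_lock = PTHREAD_MUTEX_INITIALIZER;
static enum { uninitialized, initializing, initialized } init_state
  = uninitialized;
static pthread_t init_thread;
static enum demo_device current_device = DEMO_DEVICE_NONE;
static enum demo_device seen_by_callback = DEMO_DEVICE_HOST;

/* True if the calling thread is the one currently running demo_init_1.  */
static bool
self_initializing_p (void)
{
  pthread_mutex_lock (&init_state_lock);
  bool res = (init_state == initializing
	      && pthread_equal (init_thread, pthread_self ()));
  pthread_mutex_unlock (&init_state_lock);
  return res;
}

enum demo_device
demo_get_device_type (void)
{
  /* Called re-entrantly during initialization: do not touch device_lock.  */
  if (self_initializing_p ())
    return DEMO_DEVICE_NONE;
  pthread_mutex_lock (&device_lock);
  enum demo_device res = current_device;
  pthread_mutex_unlock (&device_lock);
  return res;
}

/* Hypothetical profiling callback that queries the device type.  */
static void
profiling_callback (void)
{
  seen_by_callback = demo_get_device_type ();
}

static void
demo_init_1 (void)
{
  pthread_mutex_lock (&init_state_lock);
  init_state = initializing;
  init_thread = pthread_self ();
  pthread_mutex_unlock (&init_state_lock);

  profiling_callback ();	/* Dispatched while device_lock is held.  */
  current_device = DEMO_DEVICE_HOST;

  pthread_mutex_lock (&init_state_lock);
  init_state = initialized;
  pthread_mutex_unlock (&init_state_lock);
}

/* Takes device_lock, initializes, and returns what the callback saw.  */
enum demo_device
demo_lazy_initialize (void)
{
  int rc = pthread_mutex_lock (&device_lock);
  assert (rc == 0);
  demo_init_1 ();
  pthread_mutex_unlock (&device_lock);
  return seen_by_callback;
}
```

In this sketch demo_lazy_initialize () returns DEMO_DEVICE_NONE, the value
the callback observed, and a later demo_get_device_type () call takes
device_lock normally and returns DEMO_DEVICE_HOST.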

Committed to the og7 branch.

Thanks,
- Tom

Fix hang when running oacc exec with CUDA 9.0 nvprof

2018-02-15  Tom de Vries  <tom@codesourcery.com>

	* oacc-init.c (acc_init_state_lock, acc_init_state, acc_init_thread):
	New variables.
	(acc_init_1): Set acc_init_thread to pthread_self ().  Set
	acc_init_state to initializing at the start, and to initialized at the
	end.
	(self_initializing_p): New function.
	(acc_get_device_type): Return acc_device_none if called by thread that
	is currently executing acc_init_1.

---
 libgomp/oacc-init.c   | 33 +++++++++++++++++++++++++++++++++
 2 files changed, 44 insertions(+)
diff --git a/libgomp/oacc-init.c b/libgomp/oacc-init.c
index 6dada0b..d8348c0 100644
--- a/libgomp/oacc-init.c
+++ b/libgomp/oacc-init.c
@@ -40,6 +40,11 @@
 
 static gomp_mutex_t acc_device_lock;
 
+static gomp_mutex_t acc_init_state_lock;
+static enum { uninitialized, initializing, initialized } acc_init_state
+  = uninitialized;
+static pthread_t acc_init_thread;
+
 /* A cached version of the dispatcher for the global "current" accelerator type,
    e.g. used as the default when creating new host threads.  This is the
    device-type equivalent of goacc_device_num (which specifies which device to
@@ -220,6 +225,11 @@ acc_dev_num_out_of_range (acc_device_t d, int ord, int ndevs)
 static struct gomp_device_descr *
 acc_init_1 (acc_device_t d, acc_construct_t parent_construct, int implicit)
 {
+  gomp_mutex_lock (&acc_init_state_lock);
+  acc_init_state = initializing;
+  acc_init_thread = pthread_self ();
+  gomp_mutex_unlock (&acc_init_state_lock);
+
   bool check_not_nested_p;
   if (implicit)
     {
@@ -312,6 +322,9 @@ acc_init_1 (acc_device_t d, acc_construct_t parent_construct, int implicit)
 				&api_info);
     }
 
+  gomp_mutex_lock (&acc_init_state_lock);
+  acc_init_state = initialized;
+  gomp_mutex_unlock (&acc_init_state_lock);
   return base_dev;
 }
 
@@ -644,6 +657,17 @@ acc_set_device_type (acc_device_t d)
 
 ialias (acc_set_device_type)
 
+static bool
+self_initializing_p (void)
+{
+  bool res;
+  gomp_mutex_lock (&acc_init_state_lock);
+  res = (acc_init_state == initializing
+	 && pthread_equal (acc_init_thread, pthread_self ()));
+  gomp_mutex_unlock (&acc_init_state_lock);
+  return res;
+}
+
 acc_device_t
 acc_get_device_type (void)
 {
@@ -653,6 +677,15 @@ acc_get_device_type (void)
 
   if (thr && thr->base_dev)
     res = acc_device_type (thr->base_dev->type);
+  else if (self_initializing_p ())
+    /* The Cuda libaccinj64.so version 9.0+ calls acc_get_device_type during the
+       acc_ev_device_init_start event callback, which is dispatched during
+       acc_init_1.  Trying to lock acc_device_lock during such a call (as we do
+       in the else clause below), will result in deadlock, since the lock has
+       already been taken by the acc_init_1 caller.  We work around this problem
+       by using the acc_get_device_type property "If the device type has not yet
+       been selected, the value acc_device_none may be returned".  */
+    ;
   else
     {
       acc_prof_info prof_info;