From patchwork Mon Feb 17 22:02:14 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Thomas Koenig X-Patchwork-Id: 1239625 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=209.132.180.131; helo=sourceware.org; envelope-from=gcc-patches-return-519678-incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=quarantine dis=none) header.from=netcologne.de Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.a=rsa-sha1 header.s=default header.b=AgxPbA9V; dkim=pass (2048-bit key; unprotected) header.d=netcologne.de header.i=@netcologne.de header.a=rsa-sha256 header.s=nc1116a header.b=D4FwP7h2; dkim-atps=neutral Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 48Lyf56GSTz9sRG for ; Tue, 18 Feb 2020 09:02:53 +1100 (AEDT) DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:to :from:subject:message-id:date:mime-version:content-type; q=dns; s=default; b=BaOfJhzwQrFqFLECLsgVFf7NeSfAjJlNv8C+AIlxPLg4L8gjaR TivBdU5loLb84pDkL0iB1EQrczYSRAiSGU8qpDCAMTN3R53oQvra/STYIO8Wp8ex yQiHNutP3kIE45aAxUvQ3jHD2deKOJAOZIM3hw+BhHCPdJfQubsYBsnR8= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:to :from:subject:message-id:date:mime-version:content-type; s= default; bh=kKbQ3brmdfrKYw3YRtCBxiMAA8I=; b=AgxPbA9VAPPl7H8i11zI JmR5ZoiPNPgFIpiBzUsZqB8xkPoWOldZlJKRup+PnFIF73laGW2VnZ3tPhrEHshX PrQqelsIZvtEdDBJCo8L5oiWM07lOvlhRqMcpE8Adt2i4GlCEhZq+C2i5zmM/+ca TR9/orIcbloOIBMQR3ScBoE= Received: (qmail 3753 invoked by alias); 17 Feb 2020 22:02:23 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 3679 invoked by uid 89); 17 Feb 2020 22:02:23 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-27.6 required=5.0 tests=BAYES_00, GIT_PATCH_0, GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3, RCVD_IN_DNSWL_LOW, SPF_PASS autolearn=ham version=3.3.1 spammy=rec, exhibited, H*r:sk:tkoenig X-HELO: cc-smtpout3.netcologne.de Received: from cc-smtpout3.netcologne.de (HELO cc-smtpout3.netcologne.de) (89.1.8.213) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Mon, 17 Feb 2020 22:02:20 +0000 Received: from cc-smtpin1.netcologne.de (cc-smtpin1.netcologne.de [89.1.8.201]) by cc-smtpout3.netcologne.de (Postfix) with ESMTP id C540C126AF; Mon, 17 Feb 2020 23:02:16 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=netcologne.de; s=nc1116a; t=1581976936; bh=t0xaT3SqFfFr2YMImdssTK93SkEomYgE1izIxn2LMhc=; h=To:From:Subject:Message-ID:Date:From; b=D4FwP7h2KePKa2MaqVBKsLgzssS+fOaQRiWrfA/XoZ9AKg67qE4hJ6FFRmDkZLdxR vyfQ4kisyaa1vFECFTFWT7gLMdboFqavopOWQGIFdi3ZsK3aW1BgKVsbjZhZzspT8t oiZGzGMVSUSN6bVhQan8Nxa0p4223gjLLRDq8azfDjdRdAf8w4hjUyilf7CttNkicl OkoRBEnyvReyvxFMnyg+NGq6aH3sC5g97PFBYfpJlv0jxUaPmnRvm9+NcN0RWq8UY3 /v0F4rmtpylNvJvuBay0hGxaeANpruZzbT5CgDC8ZkXDtFD/a9xvrR/PYbBEnaCT3x 1wOqoZXFFimSQ== Received: from localhost (localhost [127.0.0.1]) by cc-smtpin1.netcologne.de (Postfix) with ESMTP id B0D1611EE2; Mon, 17 Feb 2020 23:02:16 +0100 (CET) Received: from [2001:4dd4:fbfe:0:f116:d44:7d03:748a] (helo=cc-smtpin1.netcologne.de) by localhost with ESMTP (eXpurgate 4.11.6) (envelope-from ) id 5e4b0d68-45d8-7f0000012729-7f0000018a18-1 for ; Mon, 17 Feb 2020 23:02:16 +0100 Received: from linux-p51k.fritz.box (2001-4dd4-fbfe-0-f116-d44-7d03-748a.ipv6dyn.netcologne.de [IPv6:2001:4dd4:fbfe:0:f116:d44:7d03:748a]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by cc-smtpin1.netcologne.de (Postfix) with ESMTPSA; Mon, 17 Feb 2020 23:02:15 +0100 (CET) To: "fortran@gcc.gnu.org" , gcc-patches From: Thomas Koenig Subject: [patch, libfortran] Fix race condition in asynchronous I/O Message-ID: <63ab72d7-5629-6d41-41e5-7f00b2b0f196@netcologne.de> Date: Mon, 17 Feb 2020 23:02:14 +0100 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.4.1 MIME-Version: 1.0 Hello world, as you can see from the PR, async_io_4.f90 exhibited a race condition every few thousands to tens of thousands of cases. This was difficult to track down, and took some thinking, discussion with Nicolas and enabling of the debugging feature of the async I/O, which finally produced a failure which could then be analyzed. The solution was to remove the mutexes for the different conditions that can happen in the async library, relying on the lock that protects the async I/O queue instead. Tested by Bill Seurer (thanks!) with a previous version of the patch, and with valgrind --tool=helgrind and valgrind --tool=drd, both of which showed no (new) failures for all of the async_io_*.f90 tests. So, OK for trunk and gcc-9? Regards Thomas 2020-02-17 Thomas Koenig PR fortran/93599 * io/async.c (destroy_adv_cond): Do not destroy lock. (async_io): Make sure au->lock is locked for finishing of thread. Do not lock/unlock around signalling emptysignal. Unlock au->lock before return. (init_adv_cond): Do not initialize lock. (enqueue_transfer): Unlock after signal. (enqueue_done_id): Likewise. (enqueue_done): Likewise. (enqueue_close): Likewise. (enqueue_data_transfer): Likewise. (async_wait_id): Do not lock/unlock around signalling au->work. (async_wait): Unlock after signal. * io/async.h (SIGNAL): Add comment about needed au->lock. Remove locking/unlocking of advcond->lock. (WAIT_SIGNAL_MUTEX): Add comment. Remove locking/unlocking of advcond->lock. Unlock mutex only at the end. Loop on __ghread_cond_wait returning zero. (REVOKE_SIGNAL): Add comment. Remove locking/unlocking of advcond->lock. (struct adv_cond): Remove mutex from struct. diff --git a/libgfortran/io/async.c b/libgfortran/io/async.c index ab214af8a66..63b9158c0ba 100644 --- a/libgfortran/io/async.c +++ b/libgfortran/io/async.c @@ -80,7 +80,6 @@ update_pdt (st_parameter_dt **old, st_parameter_dt *new) { static void destroy_adv_cond (struct adv_cond *ac) { - T_ERROR (__gthread_mutex_destroy, &ac->lock); T_ERROR (__gthread_cond_destroy, &ac->signal); } @@ -156,6 +155,7 @@ async_io (void *arg) case AIO_CLOSE: NOTE ("Received AIO_CLOSE"); + LOCK (&au->lock); goto finish_thread; default: @@ -175,7 +175,6 @@ async_io (void *arg) else if (ctq->type == AIO_CLOSE) { NOTE ("Received AIO_CLOSE during error condition"); - UNLOCK (&au->lock); goto finish_thread; } } @@ -189,9 +188,7 @@ async_io (void *arg) au->tail = NULL; au->head = NULL; au->empty = 1; - UNLOCK (&au->lock); SIGNAL (&au->emptysignal); - LOCK (&au->lock); } finish_thread: au->tail = NULL; @@ -199,6 +196,7 @@ async_io (void *arg) au->empty = 1; SIGNAL (&au->emptysignal); free (ctq); + UNLOCK (&au->lock); return NULL; } @@ -223,7 +221,6 @@ static void init_adv_cond (struct adv_cond *ac) { ac->pending = 0; - __GTHREAD_MUTEX_INIT_FUNCTION (&ac->lock); __GTHREAD_COND_INIT_FUNCTION (&ac->signal); } @@ -279,8 +276,8 @@ enqueue_transfer (async_unit *au, transfer_args *arg, enum aio_do type) au->tail = tq; REVOKE_SIGNAL (&(au->emptysignal)); au->empty = false; - UNLOCK (&au->lock); SIGNAL (&au->work); + UNLOCK (&au->lock); } /* Enqueue an st_write_done or st_read_done which contains an ID. */ @@ -303,8 +300,8 @@ enqueue_done_id (async_unit *au, enum aio_do type) au->empty = false; ret = au->id.high++; NOTE ("Enqueue id: %d", ret); - UNLOCK (&au->lock); SIGNAL (&au->work); + UNLOCK (&au->lock); return ret; } @@ -324,8 +321,8 @@ enqueue_done (async_unit *au, enum aio_do type) au->tail = tq; REVOKE_SIGNAL (&(au->emptysignal)); au->empty = false; - UNLOCK (&au->lock); SIGNAL (&au->work); + UNLOCK (&au->lock); } /* Enqueue a CLOSE statement. */ @@ -344,8 +341,8 @@ enqueue_close (async_unit *au) au->tail = tq; REVOKE_SIGNAL (&(au->emptysignal)); au->empty = false; - UNLOCK (&au->lock); SIGNAL (&au->work); + UNLOCK (&au->lock); } /* The asynchronous unit keeps the currently active PDT around. @@ -374,9 +371,9 @@ enqueue_data_transfer_init (async_unit *au, st_parameter_dt *dt, int read_flag) au->tail->next = tq; au->tail = tq; REVOKE_SIGNAL (&(au->emptysignal)); - au->empty = 0; - UNLOCK (&au->lock); + au->empty = false; SIGNAL (&au->work); + UNLOCK (&au->lock); } /* Collect the errors that may have happened asynchronously. Return true if @@ -430,9 +427,7 @@ async_wait_id (st_parameter_common *cmp, async_unit *au, int i) NOTE ("Waiting for id %d", i); if (au->id.waiting < i) au->id.waiting = i; - UNLOCK (&au->lock); SIGNAL (&(au->work)); - LOCK (&au->lock); WAIT_SIGNAL_MUTEX (&(au->id.done), (au->id.low >= au->id.waiting || au->empty), &au->lock); LOCK (&au->lock); @@ -454,8 +449,8 @@ async_wait (st_parameter_common *cmp, async_unit *au) if (cmp == NULL) cmp = au->error.cmp; - SIGNAL (&(au->work)); LOCK (&(au->lock)); + SIGNAL (&(au->work)); if (au->empty) { diff --git a/libgfortran/io/async.h b/libgfortran/io/async.h index c6b2e0f94bd..17d303c127b 100644 --- a/libgfortran/io/async.h +++ b/libgfortran/io/async.h @@ -229,44 +229,44 @@ #if ASYNC_IO +/* au->lock has to be held when calling this macro. */ + #define SIGNAL(advcond) do{ \ - INTERN_LOCK (&(advcond)->lock); \ (advcond)->pending = 1; \ DEBUG_PRINTF ("%s%-75s %20s():%-5d %18p\n", aio_prefix, DEBUG_ORANGE "SIGNAL: " DEBUG_NORM \ #advcond, __FUNCTION__, __LINE__, (void *) advcond); \ T_ERROR (__gthread_cond_broadcast, &(advcond)->signal); \ - INTERN_UNLOCK (&(advcond)->lock); \ } while (0) +/* Has to be entered with mutex locked. */ + #define WAIT_SIGNAL_MUTEX(advcond, condition, mutex) do{ \ __label__ finish; \ - INTERN_LOCK (&((advcond)->lock)); \ DEBUG_PRINTF ("%s%-75s %20s():%-5d %18p\n", aio_prefix, DEBUG_BLUE "WAITING: " DEBUG_NORM \ #advcond, __FUNCTION__, __LINE__, (void *) advcond); \ - if ((advcond)->pending || (condition)){ \ - UNLOCK (mutex); \ + if ((advcond)->pending || (condition)) \ goto finish; \ - } \ - UNLOCK (mutex); \ - while (!__gthread_cond_wait(&(advcond)->signal, &(advcond)->lock)) { \ - { int cond; \ - LOCK (mutex); cond = condition; UNLOCK (mutex); \ - if (cond){ \ - DEBUG_PRINTF ("%s%-75s %20s():%-5d %18p\n", aio_prefix, DEBUG_ORANGE "REC: " DEBUG_NORM \ + while (1) \ + { \ + int err_ret = __gthread_cond_wait(&(advcond)->signal, mutex); \ + if (err_ret) internal_error (NULL, "WAIT_SIGNAL_MUTEX failed"); \ + if (condition) \ + { \ + DEBUG_PRINTF ("%s%-75s %20s():%-5d %18p\n", aio_prefix, DEBUG_ORANGE \ + "REC: " DEBUG_NORM \ #advcond, __FUNCTION__, __LINE__, (void *)advcond); \ break; \ } \ } \ - } \ finish: \ (advcond)->pending = 0; \ - INTERN_UNLOCK (&((advcond)->lock)); \ + UNLOCK (mutex); \ } while (0) +/* au->lock has to be held when calling this macro. */ + #define REVOKE_SIGNAL(advcond) do{ \ - INTERN_LOCK (&(advcond)->lock); \ (advcond)->pending = 0; \ - INTERN_UNLOCK (&(advcond)->lock); \ } while (0) #else @@ -330,7 +330,6 @@ struct adv_cond { #if ASYNC_IO int pending; - __gthread_mutex_t lock; __gthread_cond_t signal; #endif };