From patchwork Tue Jan 22 11:03:25 2013
X-Patchwork-Submitter: Alan Modra
X-Patchwork-Id: 214509
Date: Tue, 22 Jan 2013 21:33:25 +1030
From: Alan Modra
To: gcc-patches@gcc.gnu.org
Cc: Jakub Jelinek
Subject: PR libgomp/56073: benchmark regression due to PR libgomp/51376 fix
Message-ID: <20130122110324.GL3244@bubble.grove.modra.org>

SPEComp2012 376.kdtree shows a huge regression after my PR51376 fix:
the benchmark, which normally finishes in a few minutes, fails to
complete after hours of running on a power7 machine.  Using a reduced
data set typically showed over a 200x slowdown.

So why is this happening?  It turns out the benchmark hits exactly the
libgomp code paths that previously accessed task->children without
locking and that now need to take task_lock; the slowdown is due to
massive task_lock contention.  I tried the solution in
http://gcc.gnu.org/ml/gcc-patches/2011-12/msg00235.html and found it
gave a small improvement, but there was still far too much contention
from hitting task_lock in GOMP_task.

The question then was whether locking was really necessary there, and
on analysing it properly I believe I was far too conservative in
forcing the "children" access to be inside a task_lock-held region.
That particular task.children will only be set if the current thread
creates new tasks in its task work function!  That is, no other thread
will set it, so there's no need to worry that the current thread might
see a stale NULL value and miss calling gomp_clear_parent.  (It's true
that the current thread might see a stale non-NULL value, but that
doesn't matter: gaining the lock will ensure gomp_clear_parent sees
the real value of task.children.)
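To spell that argument out, here is an annotated sketch of the patched
GOMP_task epilogue, abbreviated and re-commented by me (the real code
is in the patch below):

  /* Sketch only.  task is local to GOMP_task; task.children starts
     out NULL, and the only NULL -> non-NULL transition happens when
     fn itself creates child tasks, i.e. on this very thread.  */
  fn (data);
  /* Unlocked read.  We cannot see a stale NULL and skip the cleanup,
     because any non-NULL value was written by this thread.  A stale
     non-NULL is possible (a child running on another thread may have
     emptied the list meanwhile) but harmless.  */
  if (task.children != NULL)
    {
      gomp_mutex_lock (&team->task_lock);
      /* Taking the lock synchronises with the other threads'
         updates, so gomp_clear_parent sees the current list (and it
         copes with a list that has since become empty).  */
      gomp_clear_parent (task.children);
      gomp_mutex_unlock (&team->task_lock);
    }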
With this patch, PR51376 stays fixed and we're back to a reasonable
time for kdtree.  I'm seeing a 20% slowdown in my quick and dirty
testing, but some of that will be due to different optimisation and
tuning in the libgomp builds.

I did consider (and test) another small refinement.  The release
barrier in gomp_sem_post is sufficient to ensure correct memory
ordering, so you can write:

	  if (parent->in_taskwait)
	    {
	      gomp_sem_post (&parent->taskwait_sem);
	      parent->children = NULL;
	    }
	  else
	    __atomic_store_n (&parent->children, NULL,
			      MEMMODEL_RELEASE);

However, this reorders posting the semaphore and writing
parent->children.  I think doing so is OK, but am wary of trying to be
too clever where multiple threads are involved.
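For reference, the release/acquire pairing the patch relies on can be
shown in a stand-alone toy.  This is an illustration, not libgomp
code: the "payload" and "children" names are mine, and I use the raw
gcc __ATOMIC_RELEASE/__ATOMIC_ACQUIRE builtin memory models that
libgomp's MEMMODEL_RELEASE/MEMMODEL_ACQUIRE correspond to.

  /* Build with: gcc -O2 -pthread handoff.c
     The child thread plays the gomp_barrier_handle_tasks role: do
     the work, then clear "children" with a release store.  The main
     thread plays the GOMP_taskwait role: once its acquire load sees
     NULL, everything the child wrote beforehand is visible.  */
  #include <pthread.h>
  #include <stdio.h>

  static int payload;			/* memory written by the task fn */
  static void *children = &payload;	/* stands in for parent->children */

  static void *
  child_thread (void *arg)
  {
    (void) arg;
    payload = 42;			/* the child task's work */
    __atomic_store_n (&children, NULL, __ATOMIC_RELEASE);
    return NULL;
  }

  int
  main (void)
  {
    pthread_t t;
    pthread_create (&t, NULL, child_thread, NULL);
    while (__atomic_load_n (&children, __ATOMIC_ACQUIRE) != NULL)
      ;					/* spin; libgomp sleeps on a semaphore */
    printf ("%d\n", payload);		/* always prints 42 */
    pthread_join (t, NULL);
    return 0;
  }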
Bootstrapped and regression tested powerpc64-linux.  OK to apply?

	PR libgomp/51376
	PR libgomp/56073
	* task.c (GOMP_task): Revert 2011-12-09 change.
	(GOMP_taskwait): Likewise.  Instead use atomic load with acquire
	barrier to read task->children..
	(gomp_barrier_handle_tasks): ..and matching atomic store with
	release barrier here when setting parent->children to NULL.

Index: libgomp/task.c
===================================================================
--- libgomp/task.c	(revision 195354)
+++ libgomp/task.c	(working copy)
@@ -116,11 +116,10 @@ GOMP_task (void (*fn) (void *), void *data, void (
     }
   else
     fn (data);
-  if (team != NULL)
+  if (task.children != NULL)
     {
       gomp_mutex_lock (&team->task_lock);
-      if (task.children != NULL)
-	gomp_clear_parent (task.children);
+      gomp_clear_parent (task.children);
       gomp_mutex_unlock (&team->task_lock);
     }
   gomp_end_task ();
@@ -258,7 +257,13 @@ gomp_barrier_handle_tasks (gomp_barrier_state_t st
 		parent->children = child_task->next_child;
 	      else
 		{
-		  parent->children = NULL;
+		  /* We access task->children in GOMP_taskwait
+		     outside of the task lock mutex region, so
+		     need a release barrier here to ensure memory
+		     written by child_task->fn above is flushed
+		     before the NULL is written.  */
+		  __atomic_store_n (&parent->children, NULL,
+				    MEMMODEL_RELEASE);
 		  if (parent->in_taskwait)
 		    gomp_sem_post (&parent->taskwait_sem);
 		}
@@ -291,7 +296,8 @@ GOMP_taskwait (void)
   struct gomp_task *child_task = NULL;
   struct gomp_task *to_free = NULL;
 
-  if (task == NULL || team == NULL)
+  if (task == NULL
+      || __atomic_load_n (&task->children, MEMMODEL_ACQUIRE) == NULL)
     return;
 
   gomp_mutex_lock (&team->task_lock);