From patchwork Sat May 5 08:11:39 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Tom de Vries X-Patchwork-Id: 909105 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=gcc.gnu.org (client-ip=209.132.180.131; helo=sourceware.org; envelope-from=gcc-patches-return-477278-incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=mentor.com Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.b="eWsiJTkW"; dkim-atps=neutral Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 40dM6t1mJJz9s2k for ; Sat, 5 May 2018 18:12:03 +1000 (AEST) DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:to :from:subject:message-id:date:mime-version:content-type; q=dns; s=default; b=UYvV34B10+d8V96eB4dSlRGHZrxjI06dRY3ljosfXJL5MkmpYt rK2nR5Q0zOhMfxaoPRlGsRGg4z7tOZ6Ly2k5eofGToUJTwS5YXJLvzR5G7NeX6ad aCv0kr/EZBpB6PO6f/PUT5Xs9b4TIF/o3k2LRcpe9wa1X9xuMzWysXb/o= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:to :from:subject:message-id:date:mime-version:content-type; s= default; bh=O5JwAAEBW0cQ+p9X4EUFDNQhh5M=; b=eWsiJTkWJbDa2XIxwXiI sr7ujsf5gG9fSlROB8WGx1J1Ma5yxDQzDp0GVl0S716EO6se459MNRA8rcK8OVuQ hJrCUjIQBn7PiiAzAnk6BVaozX6pbfdqm5/cgmxUYyQLLhvexqniQBUqJdLG+Dl5 xy/DmtyIHyLMAF+8hxRPN1A= Received: (qmail 6333 invoked by alias); 5 May 2018 08:11:55 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 6274 invoked by uid 89); 5 May 2018 08:11:50 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-24.9 required=5.0 tests=AWL, BAYES_00, GIT_PATCH_0, GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3, RCVD_IN_DNSWL_NONE, SPF_PASS, URIBL_RED autolearn=ham version=3.3.2 spammy=reside, reverted X-HELO: relay1.mentorg.com Received: from relay1.mentorg.com (HELO relay1.mentorg.com) (192.94.38.131) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Sat, 05 May 2018 08:11:48 +0000 Received: from nat-ies.mentorg.com ([192.94.31.2] helo=SVR-IES-MBX-04.mgc.mentorg.com) by relay1.mentorg.com with esmtps (TLSv1.2:ECDHE-RSA-AES256-SHA384:256) id 1fEsIE-0006nx-3k from Tom_deVries@mentor.com for gcc-patches@gcc.gnu.org; Sat, 05 May 2018 01:11:46 -0700 Received: from [172.30.73.137] (137.202.0.87) by SVR-IES-MBX-04.mgc.mentorg.com (139.181.222.4) with Microsoft SMTP Server (TLS) id 15.0.1320.4; Sat, 5 May 2018 09:11:41 +0100 To: GCC Patches From: Tom de Vries Subject: [nvptx, PR85653, committed] Add workaround for subsequent bar.syncs Message-ID: <179a62dd-b889-b81f-250a-d06afc15f334@mentor.com> Date: Sat, 5 May 2018 10:11:39 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.7.0 MIME-Version: 1.0 X-ClientProxiedBy: svr-ies-mbx-01.mgc.mentorg.com (139.181.222.1) To SVR-IES-MBX-04.mgc.mentorg.com (139.181.222.4) Hi, when compiling this testcase with the og7 branch without the recently committed "[nvptx, openacc] Don't emit barriers for empty loops": ... int main (void) { long long v1; #pragma acc parallel num_gangs (640) num_workers(1) vector_length (128) #pragma acc loop for (v1 = 0; v1 < 20; v1 += 2) ; return 0; } ... this ptx is generated: ... { bar.sync 0; bar.sync 0; ret; } ... This triggers some bug in the ptxas/JIT compiler that hangs the testcase on my quadro m1200. We can work around this by adding two membar.ctas inbetween. To the best of my knowledge, this is currently not triggering on trunk, but I'd rather have the workaround in place in case future changes will produce subsequent barsyncs again. Build trunk with x86_64 with nvptx accelerator and tested libgomp. Build og7 branch (both with the patch mentioned above in place, and reverted) with x86_64 with nvptx accelerator and tested libgomp. Committed to trunk. Thanks, - Tom [nvptx] Add workaround for subsequent bar.syncs 2018-05-04 Tom de Vries PR target/85653 * config/nvptx/nvptx.c (WORKAROUND_PTXJIT_BUG_3): Define. (workaround_barsyncs): New function. (nvptx_reorg): Use workaround_barsyncs. * config/nvptx/nvptx.md (define_c_enum "unspecv"): Add UNSPECV_MEMBAR. (define_expand "nvptx_membar_cta"): New define_expand. (define_insn "*nvptx_membar_cta"): New insn. --- gcc/config/nvptx/nvptx.c | 49 +++++++++++++++++++++++++++++++++++++++++++++++ gcc/config/nvptx/nvptx.md | 17 ++++++++++++++++ 2 files changed, 66 insertions(+) diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c index a0c7bc1..5608bee 100644 --- a/gcc/config/nvptx/nvptx.c +++ b/gcc/config/nvptx/nvptx.c @@ -79,6 +79,7 @@ #define WORKAROUND_PTXJIT_BUG 1 #define WORKAROUND_PTXJIT_BUG_2 1 +#define WORKAROUND_PTXJIT_BUG_3 1 /* The various PTX memory areas an object might reside in. */ enum nvptx_data_area @@ -4647,6 +4648,50 @@ prevent_branch_around_nothing (void) } #endif +#ifdef WORKAROUND_PTXJIT_BUG_3 +/* Insert two membar.cta insns inbetween two subsequent bar.sync insns. This + works around a hang observed at driver version 390.48 for sm_50. */ + +static void +workaround_barsyncs (void) +{ + bool seen_barsync = false; + for (rtx_insn *insn = get_insns (); insn; insn = NEXT_INSN (insn)) + { + if (INSN_P (insn) && recog_memoized (insn) == CODE_FOR_nvptx_barsync) + { + if (seen_barsync) + { + emit_insn_before (gen_nvptx_membar_cta (), insn); + emit_insn_before (gen_nvptx_membar_cta (), insn); + } + + seen_barsync = true; + continue; + } + + if (!seen_barsync) + continue; + + if (NOTE_P (insn) || DEBUG_INSN_P (insn)) + continue; + else if (INSN_P (insn)) + switch (recog_memoized (insn)) + { + case CODE_FOR_nvptx_fork: + case CODE_FOR_nvptx_forked: + case CODE_FOR_nvptx_joining: + case CODE_FOR_nvptx_join: + continue; + default: + break; + } + + seen_barsync = false; + } +} +#endif + /* PTX-specific reorganization - Split blocks at fork and join instructions - Compute live registers @@ -4730,6 +4775,10 @@ nvptx_reorg (void) prevent_branch_around_nothing (); #endif +#ifdef WORKAROUND_PTXJIT_BUG_3 + workaround_barsyncs (); +#endif + regstat_free_n_sets_and_refs (); df_finish_pass (true); diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md index 68bba36..9754219 100644 --- a/gcc/config/nvptx/nvptx.md +++ b/gcc/config/nvptx/nvptx.md @@ -56,6 +56,7 @@ UNSPECV_XCHG UNSPECV_BARSYNC UNSPECV_MEMBAR + UNSPECV_MEMBAR_CTA UNSPECV_DIM_POS UNSPECV_FORK @@ -1481,6 +1482,22 @@ "\\tmembar.sys;" [(set_attr "predicable" "false")]) +(define_expand "nvptx_membar_cta" + [(set (match_dup 0) + (unspec_volatile:BLK [(match_dup 0)] UNSPECV_MEMBAR_CTA))] + "" +{ + operands[0] = gen_rtx_MEM (BLKmode, gen_rtx_SCRATCH (Pmode)); + MEM_VOLATILE_P (operands[0]) = 1; +}) + +(define_insn "*nvptx_membar_cta" + [(set (match_operand:BLK 0 "" "") + (unspec_volatile:BLK [(match_dup 0)] UNSPECV_MEMBAR_CTA))] + "" + "\\tmembar.cta;" + [(set_attr "predicable" "false")]) + (define_insn "nvptx_nounroll" [(unspec_volatile [(const_int 0)] UNSPECV_NOUNROLL)] ""