From patchwork Tue Mar 20 09:17:14 2018
X-Patchwork-Submitter: Tom de Vries
X-Patchwork-Id: 888123
From: Tom de Vries
To: GCC Patches
CC: Jakub Jelinek, Richard Biener
References: <34fb1d00-dc5d-04f2-d601-ee6fe710ac3b@mentor.com>
Subject: [nvptx, PR84954, committed] Fix prevent_branch_around_nothing
Message-ID: <5ad5142d-7b99-9f1b-8513-92f77c25d823@mentor.com>
Date: Tue, 20 Mar 2018 10:17:14 +0100
In-Reply-To: <34fb1d00-dc5d-04f2-d601-ee6fe710ac3b@mentor.com>

[ was: Re: [PATCH, 2/2][nvptx, PR83589] Workaround for branch-around-nothing JIT bug ]

On 01/24/2018 11:41 AM, Tom de Vries wrote:
> Hi,
>
> this patch adds a workaround for the nvptx target JIT bug PR83589 -
> "[nvptx] mode-transitions.c
> and private-variables.{c,f90} execution
> FAILs at GOMP_NVPTX_JIT=-O0".
>
>
> When compiling a branch-around-nothing (where the branch is warp
> neutering, so it's a divergent branch):
> ...
>   .reg .pred %r36;
>   {
>     .reg .u32 %x;
>     mov.u32 %x,%tid.x;
>     setp.ne.u32 %r36,%x,0;
>   }
>
>   @ %r36 bra $L5;
>   $L5:
> ...
>
> The JIT fails to generate a convergence point here:
> ...
>          /*0128*/               @P0 BRA `(.L_1);
> .L_1:
> ...
>
> Consequently, we execute subsequent code in divergent mode, and when
> executing a shfl.idx a bit later we run into the undefined behaviour
> that shfl.idx has when executing in divergent mode.
>
> The workaround detects branch-around-nothing, and inserts a ptx
> operation that does nothing (I'm calling it a fake nop, I haven't been
> able to come up with a better term yet):
> ...
>   @ %r36 bra $L5;
>     {
>       .reg .u32 %nop_src;
>       .reg .u32 %nop_dst;
>       mov.u32 %nop_dst, %nop_src;
>     }
>   $L5:
> ...
> which makes the test pass, because then we generate a convergence point
> here at .L1:
> ...
>         /*0128*/                   SSY `(.L_1);
>         /*0130*/               @P0 SYNC (*"TARGET= .L_1 "*);
>         /*0138*/                   SYNC (*"TARGET= .L_1 "*);
> .L_1:
> ...
>
> The workaround is not minimal given that it inserts the fake nop in all
> branch-around-nothings it detects, not just the warp neutering ones, but
> I think this is more robust than trying to identify the warp neutering
> branches. Furthermore, I'm not going for optimality here anyway. The
> optimal way to fix this is making sure we don't generate
> branch-around-nothing, but that's for stage1.
>
> Build and reg-tested on x86_64 with nvptx accelerator.
>
> I'd like to commit in stage4, but I'd appreciate a review of the code.
> Does the patch look OK?
>
> Thanks,
> - Tom
>
> 0002-nvptx-PR83589-Workaround-for-branch-around-nothing-JIT-bug.patch
>
>
> [nvptx, PR83589] Workaround for branch-around-nothing JIT bug
>
> 2018-01-23  Tom de Vries
>
>     PR target/83589
>     * config/nvptx/nvptx.c (WORKAROUND_PTXJIT_BUG_2): Define to 1.
>     (nvptx_pc_set, nvptx_condjump_label): New function. Copy from jump.c.
>     Add strict parameter.
>     (prevent_branch_around_nothing): Insert dummy insn between branch to
>     label and label with no ptx insn inbetween.
>     * config/nvptx/nvptx.md (define_insn "fake_nop"): New insn.
>
>     * testsuite/libgomp.oacc-c-c++-common/pr83589.c: New test.
>
> ---
>  gcc/config/nvptx/nvptx.c                           | 92 ++++++++++++++++++++++
>  gcc/config/nvptx/nvptx.md                          |  9 +++
>  .../testsuite/libgomp.oacc-c-c++-common/pr83589.c  | 21 +++++
>  3 files changed, 122 insertions(+)
>
> +/* Insert a dummy ptx insn when encountering a branch to a label with no ptx
> +   insn inbetween the branch and the label.  This works around a JIT bug
> +   observed at driver version 384.111, at -O0 for sm_50.  */
> +
> +static void
> +prevent_branch_around_nothing (void)
> +{
> +  rtx_insn *seen_label = 0;
> +  for (rtx_insn *insn = get_insns (); insn; insn = NEXT_INSN (insn))
> +    {
> +      if (seen_label == 0)
> +        {
> +          if (INSN_P (insn) && condjump_p (insn))
> +            seen_label = label_ref_label (nvptx_condjump_label (insn, false));
> +
> +          continue;
> +        }
> +
> +      if (NOTE_P (insn))
> +        continue;
> +
> +      if (INSN_P (insn))
> +        switch (recog_memoized (insn))
> +          {
> +          case CODE_FOR_nvptx_fork:
> +          case CODE_FOR_nvptx_forked:
> +          case CODE_FOR_nvptx_joining:
> +          case CODE_FOR_nvptx_join:
> +            continue;
> +          default:
> +            seen_label = 0;
> +            continue;
> +          }
> +
> +      if (LABEL_P (insn) && insn == seen_label)
> +        emit_insn_before (gen_fake_nop (), insn);
> +
> +      seen_label = 0;
> +    }
> +}

Consider testcase:
...
int
main (void)
{
  int a[10];
#pragma acc parallel loop worker
  for (int i = 0; i < 10; i++)
    a[i] = i;
  return 0;
}
...
At -O2, we generate this, and fail to generate a fake nop:
...
        @ %r34 bra.uni $L8;
        @ %r33 bra $L9;
        // join 2;
$L9:
$L8:
...

What is happening in prevent_branch_around_nothing is:
- seen_label is NULL
- we process "@ %r34 bra.uni $L8" and seen_label becomes $L8
- we process "@ %r33 bra $L9" and since seen_label != NULL, we end up in the
  default case in the switch and reset seen_label to NULL
- we process the labels, seen_label remains NULL, and no fake nop is generated

What we want to happen instead, is that when processing "@ %r33 bra $L9",
seen_label is updated to $L9.

Patch below implements that.

Build and reg-tested on x86_64 with nvptx accelerator.

Committed to stage4 trunk.

Thanks,
- Tom

[nvptx] Fix prevent_branch_around_nothing

2018-03-20  Tom de Vries

    PR target/84954
    * config/nvptx/nvptx.c (prevent_branch_around_nothing): Also update
    seen_label if seen_label is already set.

---
 gcc/config/nvptx/nvptx.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index a6f4443..7b0b182 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -4419,14 +4419,15 @@ prevent_branch_around_nothing (void)
   rtx_insn *seen_label = NULL;
   for (rtx_insn *insn = get_insns (); insn; insn = NEXT_INSN (insn))
     {
-      if (seen_label == NULL)
+      if (INSN_P (insn) && condjump_p (insn))
         {
-          if (INSN_P (insn) && condjump_p (insn))
-            seen_label = label_ref_label (nvptx_condjump_label (insn, false));
-
+          seen_label = label_ref_label (nvptx_condjump_label (insn, false));
           continue;
         }
 
+      if (seen_label == NULL)
+        continue;
+
       if (NOTE_P (insn) || DEBUG_INSN_P (insn))
         continue;