From patchwork Mon May 17 15:49:31 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Tom de Vries X-Patchwork-Id: 1479611 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=8.43.85.97; helo=sourceware.org; envelope-from=gcc-patches-bounces@gcc.gnu.org; receiver=) Received: from sourceware.org (ip-8-43-85-97.sourceware.org [8.43.85.97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4FkNqR1BJWz9sSs for ; Tue, 18 May 2021 01:49:39 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 29FC33890439; Mon, 17 May 2021 15:49:37 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mx2.suse.de (mx2.suse.de [195.135.220.15]) by sourceware.org (Postfix) with ESMTPS id 2D817383582E for ; Mon, 17 May 2021 15:49:34 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 2D817383582E Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=suse.de Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=tdevries@suse.de X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 1F9D3AE93; Mon, 17 May 2021 15:49:33 +0000 (UTC) Date: Mon, 17 May 2021 17:49:31 +0200 From: Tom de Vries To: gcc-patches@gcc.gnu.org Subject: [PATCH][nvptx] Handle memmodel for atomic ops Message-ID: <20210517154930.GA9410@delia> MIME-Version: 1.0 Content-Disposition: inline User-Agent: Mutt/1.10.1 (2018-07-13) X-Spam-Status: No, score=-11.4 required=5.0 tests=BAYES_00, GIT_PATCH_0, KAM_DMARC_STATUS, KAM_STOCKGEN, RCVD_IN_MSPIKE_H3, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Tobias Burnus Errors-To: gcc-patches-bounces@gcc.gnu.org Sender: "Gcc-patches" Hi, [ Tobias, can you test this on volta ? ] The atomic ops in nvptx.md have memmodel arguments, which are currently ignored. Handle these, fixing test-case fails libgomp.c-c++-common/reduction-{5,6}.c on volta. Tested libgomp on x86_64-linux with nvptx accelerator. Any comments? Thanks, - Tom [nvptx] Handle memmodel for atomic ops gcc/ChangeLog: 2021-05-17 Tom de Vries PR target/100497 * config/nvptx/nvptx-protos.h (nvptx_output_atomic_insn): Declare * config/nvptx/nvptx.c (nvptx_output_barrier) (nvptx_output_atomic_insn): New function. (nvptx_print_operand): Add support for 'B'. * config/nvptx/nvptx.md: Use nvptx_output_atomic_insn for atomic insns. --- gcc/config/nvptx/nvptx-protos.h | 1 + gcc/config/nvptx/nvptx.c | 77 +++++++++++++++++++++++++++++++++++++++++ gcc/config/nvptx/nvptx.md | 31 ++++++++++++++--- 3 files changed, 104 insertions(+), 5 deletions(-) diff --git a/gcc/config/nvptx/nvptx-protos.h b/gcc/config/nvptx/nvptx-protos.h index 15122096487..b7e6ae26522 100644 --- a/gcc/config/nvptx/nvptx-protos.h +++ b/gcc/config/nvptx/nvptx-protos.h @@ -57,5 +57,6 @@ extern const char *nvptx_output_set_softstack (unsigned); extern const char *nvptx_output_simt_enter (rtx, rtx, rtx); extern const char *nvptx_output_simt_exit (rtx); extern const char *nvptx_output_red_partition (rtx, rtx); +extern const char *nvptx_output_atomic_insn (const char *, rtx *, int, int); #endif #endif diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c index ebbfa921589..722b0faa330 100644 --- a/gcc/config/nvptx/nvptx.c +++ b/gcc/config/nvptx/nvptx.c @@ -2444,6 +2444,53 @@ nvptx_output_mov_insn (rtx dst, rtx src) return "%.\tcvt%t0%t1\t%0, %1;"; } +/* Output a pre/post barrier for MEM_OPERAND according to MEMMODEL. */ + +static void +nvptx_output_barrier (rtx *mem_operand, int memmodel, bool pre_p) +{ + bool post_p = !pre_p; + + switch (memmodel) + { + case MEMMODEL_RELAXED: + return; + case MEMMODEL_CONSUME: + case MEMMODEL_ACQUIRE: + case MEMMODEL_SYNC_ACQUIRE: + if (post_p) + break; + return; + case MEMMODEL_RELEASE: + case MEMMODEL_SYNC_RELEASE: + if (pre_p) + break; + return; + case MEMMODEL_ACQ_REL: + case MEMMODEL_SEQ_CST: + case MEMMODEL_SYNC_SEQ_CST: + if (pre_p || post_p) + break; + return; + default: + gcc_unreachable (); + } + + output_asm_insn ("%.\tmembar%B0;", mem_operand); +} + +const char * +nvptx_output_atomic_insn (const char *asm_template, rtx *operands, int mem_pos, + int memmodel_pos) +{ + nvptx_output_barrier (&operands[mem_pos], INTVAL (operands[memmodel_pos]), + true); + output_asm_insn (asm_template, operands); + nvptx_output_barrier (&operands[mem_pos], INTVAL (operands[memmodel_pos]), + false); + return ""; +} + static void nvptx_print_operand (FILE *, rtx, int); /* Output INSN, which is a call to CALLEE with result RESULT. For ptx, this @@ -2660,6 +2707,36 @@ nvptx_print_operand (FILE *file, rtx x, int code) switch (code) { + case 'B': + if (SYMBOL_REF_P (XEXP (x, 0))) + switch (SYMBOL_DATA_AREA (XEXP (x, 0))) + { + case DATA_AREA_GENERIC: + /* Assume worst-case: global. */ + gcc_fallthrough (); /* FALLTHROUGH. */ + case DATA_AREA_GLOBAL: + break; + case DATA_AREA_SHARED: + fputs (".cta", file); + return; + case DATA_AREA_LOCAL: + case DATA_AREA_CONST: + case DATA_AREA_PARAM: + default: + gcc_unreachable (); + } + + /* There are 2 cases where membar.sys differs from membar.gl: + - host accesses global memory (f.i. systemwide atomics) + - 2 or more devices are setup in peer-to-peer mode, and one + peer can access global memory of other peer. + Neither are currently supported by openMP/OpenACC on nvptx, but + that could change, so we default to membar.sys. We could support + this more optimally by adding DATA_AREA_SYS and then emitting + .gl for DATA_AREA_GLOBAL and .sys for DATA_AREA_SYS. */ + fputs (".sys", file); + return; + case 'A': x = XEXP (x, 0); gcc_fallthrough (); /* FALLTHROUGH. */ diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md index 00bb8fea821..108de1c0c59 100644 --- a/gcc/config/nvptx/nvptx.md +++ b/gcc/config/nvptx/nvptx.md @@ -1642,7 +1642,11 @@ (set (match_dup 1) (unspec_volatile:SDIM [(const_int 0)] UNSPECV_CAS))] "" - "%.\\tatom%A1.cas.b%T0\\t%0, %1, %2, %3;" + { + const char *t + = "%.\\tatom%A1.cas.b%T0\\t%0, %1, %2, %3;"; + return nvptx_output_atomic_insn (t, operands, 1, 4); + } [(set_attr "atomic" "true")]) (define_insn "atomic_exchange" @@ -1654,7 +1658,11 @@ (set (match_dup 1) (match_operand:SDIM 2 "nvptx_nonmemory_operand" "Ri"))] ;; input "" - "%.\\tatom%A1.exch.b%T0\\t%0, %1, %2;" + { + const char *t + = "%.\tatom%A1.exch.b%T0\t%0, %1, %2;"; + return nvptx_output_atomic_insn (t, operands, 1, 3); + } [(set_attr "atomic" "true")]) (define_insn "atomic_fetch_add" @@ -1667,7 +1675,11 @@ (set (match_operand:SDIM 0 "nvptx_register_operand" "=R") (match_dup 1))] "" - "%.\\tatom%A1.add%t0\\t%0, %1, %2;" + { + const char *t + = "%.\\tatom%A1.add%t0\\t%0, %1, %2;"; + return nvptx_output_atomic_insn (t, operands, 1, 3); + } [(set_attr "atomic" "true")]) (define_insn "atomic_fetch_addsf" @@ -1680,7 +1692,11 @@ (set (match_operand:SF 0 "nvptx_register_operand" "=R") (match_dup 1))] "" - "%.\\tatom%A1.add%t0\\t%0, %1, %2;" + { + const char *t + = "%.\\tatom%A1.add%t0\\t%0, %1, %2;"; + return nvptx_output_atomic_insn (t, operands, 1, 3); + } [(set_attr "atomic" "true")]) (define_code_iterator any_logic [and ior xor]) @@ -1696,7 +1712,12 @@ (set (match_operand:SDIM 0 "nvptx_register_operand" "=R") (match_dup 1))] "mode == SImode || TARGET_SM35" - "%.\\tatom%A1.b%T0.\\t%0, %1, %2;" + { + const char *t + = "%.\\tatom%A1.b%T0.\\t%0, %1, %2;"; + return nvptx_output_atomic_insn (t, operands, 1, 3); + } + [(set_attr "atomic" "true")]) (define_expand "atomic_test_and_set"