Message ID | 20210517154930.GA9410@delia |
---|---|
State | New |
Headers | show |
Series | [nvptx] Handle memmodel for atomic ops | expand |
On 17.05.21 17:49, Tom de Vries wrote: > [ Tobias, can you test this on volta ? ] Unfortunately, it does not seem to help. On a non-Volta system, it still works (run time 0.3s) but on a Volta system it fails after 1.5s (abort). Looking (with an editor) at nvptx-none/lib/mgomp/libgomp.a, I still see @ %r25 atom.global.exch.b32 %r22,[atomic_lock],1; with no prior membar in GOMP_atomic_start. Likewise with nvptx-none/lib/libgomp.a which has atom.global.exch.b32 %r22,[atomic_lock],1; I thought a barrier would show up there? > The atomic ops in nvptx.md have memmodel arguments, which are currently > ignored. > Handle these, fixing test-case fails libgomp.c-c++-common/reduction-{5,6}.c > on volta. Is there a reason that PR target/96932 isn't listed in the ChangeLog? Or is it supposed that the barrier does not show up at GOMP_atomic_start (as it doesn't) and it should show up elsewhere and still help with those two testcases? Sorry for not having better news. (Unless I messed up and it is an issue on my side - but it doesn't look like.) Tobias ----------------- Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank Thürauf
On 5/17/21 6:47 PM, Tobias Burnus wrote: > On 17.05.21 17:49, Tom de Vries wrote: >> [ Tobias, can you test this on volta ? ] > > Unfortunately, it does not seem to help. On a non-Volta system, it still > works (run time 0.3s) but on a Volta system it fails after 1.5s (abort). > > Looking (with an editor) at nvptx-none/lib/mgomp/libgomp.a, I still see > @ %r25 atom.global.exch.b32 %r22,[atomic_lock],1; > with no prior membar in GOMP_atomic_start. I have: ... @ %r25 atom.global.exch.b32 %r22,[atomic_lock],1; @ %r25 membar.sys; ... > Likewise with > nvptx-none/lib/libgomp.a which has > atom.global.exch.b32 %r22,[atomic_lock],1; > I thought a barrier would show up there? > and: ... atom.global.exch.b32 %r22,[atomic_lock],1; membar.sys; ... So both look as expect. >> The atomic ops in nvptx.md have memmodel arguments, which are currently >> ignored. >> Handle these, fixing test-case fails >> libgomp.c-c++-common/reduction-{5,6}.c >> on volta. > > Is there a reason that PR target/96932 isn't listed in the > ChangeLog? Just that I'm going to mark it a duplicate when this is fixed. > Or is it supposed that the barrier does not show up > at GOMP_atomic_start (as it doesn't) and it should show up elsewhere > and still help with those two testcases? > Nope, it should show up. > Sorry for not having better news. (Unless I messed up and it is an > issue on my side - but it doesn't look like.) Well yes, it's possible that the patch somehow does not work, but then you'll need to investigate why that is. Thanks, - Tom
Hi Tom, short version works :-) now (and I don't know why it didn't). Long version: On 17.05.21 18:58, Tom de Vries wrote: > I have: > ... > @ %r25 atom.global.exch.b32 %r22,[atomic_lock],1; > @ %r25 membar.sys; > ... I tried with the PowerPC cross build - running on PowerPC + Quadro GV100 did work for both reduction-{5,6}.c and I can confirm that this line does appear in all four nvptx libgomp.a (x86->power, power->power and lib/ vs. lib/mgomp/) On the x86-64 build, which did fail before and where I didn't spot that line, it now also shows up now (after 'rm -rf' all files) – I wonder why it didn't work before; the script should have deleted the directory before compiling the files. In any case, it is now works there as well with non-Volta 'Tesla K80' and with 'TITAN V'. Sorry for the red herring and thanks for the patch! Tobias ----------------- Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank Thürauf
diff --git a/gcc/config/nvptx/nvptx-protos.h b/gcc/config/nvptx/nvptx-protos.h index 15122096487..b7e6ae26522 100644 --- a/gcc/config/nvptx/nvptx-protos.h +++ b/gcc/config/nvptx/nvptx-protos.h @@ -57,5 +57,6 @@ extern const char *nvptx_output_set_softstack (unsigned); extern const char *nvptx_output_simt_enter (rtx, rtx, rtx); extern const char *nvptx_output_simt_exit (rtx); extern const char *nvptx_output_red_partition (rtx, rtx); +extern const char *nvptx_output_atomic_insn (const char *, rtx *, int, int); #endif #endif diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c index ebbfa921589..722b0faa330 100644 --- a/gcc/config/nvptx/nvptx.c +++ b/gcc/config/nvptx/nvptx.c @@ -2444,6 +2444,53 @@ nvptx_output_mov_insn (rtx dst, rtx src) return "%.\tcvt%t0%t1\t%0, %1;"; } +/* Output a pre/post barrier for MEM_OPERAND according to MEMMODEL. */ + +static void +nvptx_output_barrier (rtx *mem_operand, int memmodel, bool pre_p) +{ + bool post_p = !pre_p; + + switch (memmodel) + { + case MEMMODEL_RELAXED: + return; + case MEMMODEL_CONSUME: + case MEMMODEL_ACQUIRE: + case MEMMODEL_SYNC_ACQUIRE: + if (post_p) + break; + return; + case MEMMODEL_RELEASE: + case MEMMODEL_SYNC_RELEASE: + if (pre_p) + break; + return; + case MEMMODEL_ACQ_REL: + case MEMMODEL_SEQ_CST: + case MEMMODEL_SYNC_SEQ_CST: + if (pre_p || post_p) + break; + return; + default: + gcc_unreachable (); + } + + output_asm_insn ("%.\tmembar%B0;", mem_operand); +} + +const char * +nvptx_output_atomic_insn (const char *asm_template, rtx *operands, int mem_pos, + int memmodel_pos) +{ + nvptx_output_barrier (&operands[mem_pos], INTVAL (operands[memmodel_pos]), + true); + output_asm_insn (asm_template, operands); + nvptx_output_barrier (&operands[mem_pos], INTVAL (operands[memmodel_pos]), + false); + return ""; +} + static void nvptx_print_operand (FILE *, rtx, int); /* Output INSN, which is a call to CALLEE with result RESULT. For ptx, this @@ -2660,6 +2707,36 @@ nvptx_print_operand (FILE *file, rtx x, int code) switch (code) { + case 'B': + if (SYMBOL_REF_P (XEXP (x, 0))) + switch (SYMBOL_DATA_AREA (XEXP (x, 0))) + { + case DATA_AREA_GENERIC: + /* Assume worst-case: global. */ + gcc_fallthrough (); /* FALLTHROUGH. */ + case DATA_AREA_GLOBAL: + break; + case DATA_AREA_SHARED: + fputs (".cta", file); + return; + case DATA_AREA_LOCAL: + case DATA_AREA_CONST: + case DATA_AREA_PARAM: + default: + gcc_unreachable (); + } + + /* There are 2 cases where membar.sys differs from membar.gl: + - host accesses global memory (f.i. systemwide atomics) + - 2 or more devices are setup in peer-to-peer mode, and one + peer can access global memory of other peer. + Neither are currently supported by openMP/OpenACC on nvptx, but + that could change, so we default to membar.sys. We could support + this more optimally by adding DATA_AREA_SYS and then emitting + .gl for DATA_AREA_GLOBAL and .sys for DATA_AREA_SYS. */ + fputs (".sys", file); + return; + case 'A': x = XEXP (x, 0); gcc_fallthrough (); /* FALLTHROUGH. */ diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md index 00bb8fea821..108de1c0c59 100644 --- a/gcc/config/nvptx/nvptx.md +++ b/gcc/config/nvptx/nvptx.md @@ -1642,7 +1642,11 @@ (set (match_dup 1) (unspec_volatile:SDIM [(const_int 0)] UNSPECV_CAS))] "" - "%.\\tatom%A1.cas.b%T0\\t%0, %1, %2, %3;" + { + const char *t + = "%.\\tatom%A1.cas.b%T0\\t%0, %1, %2, %3;"; + return nvptx_output_atomic_insn (t, operands, 1, 4); + } [(set_attr "atomic" "true")]) (define_insn "atomic_exchange<mode>" @@ -1654,7 +1658,11 @@ (set (match_dup 1) (match_operand:SDIM 2 "nvptx_nonmemory_operand" "Ri"))] ;; input "" - "%.\\tatom%A1.exch.b%T0\\t%0, %1, %2;" + { + const char *t + = "%.\tatom%A1.exch.b%T0\t%0, %1, %2;"; + return nvptx_output_atomic_insn (t, operands, 1, 3); + } [(set_attr "atomic" "true")]) (define_insn "atomic_fetch_add<mode>" @@ -1667,7 +1675,11 @@ (set (match_operand:SDIM 0 "nvptx_register_operand" "=R") (match_dup 1))] "" - "%.\\tatom%A1.add%t0\\t%0, %1, %2;" + { + const char *t + = "%.\\tatom%A1.add%t0\\t%0, %1, %2;"; + return nvptx_output_atomic_insn (t, operands, 1, 3); + } [(set_attr "atomic" "true")]) (define_insn "atomic_fetch_addsf" @@ -1680,7 +1692,11 @@ (set (match_operand:SF 0 "nvptx_register_operand" "=R") (match_dup 1))] "" - "%.\\tatom%A1.add%t0\\t%0, %1, %2;" + { + const char *t + = "%.\\tatom%A1.add%t0\\t%0, %1, %2;"; + return nvptx_output_atomic_insn (t, operands, 1, 3); + } [(set_attr "atomic" "true")]) (define_code_iterator any_logic [and ior xor]) @@ -1696,7 +1712,12 @@ (set (match_operand:SDIM 0 "nvptx_register_operand" "=R") (match_dup 1))] "<MODE>mode == SImode || TARGET_SM35" - "%.\\tatom%A1.b%T0.<logic>\\t%0, %1, %2;" + { + const char *t + = "%.\\tatom%A1.b%T0.<logic>\\t%0, %1, %2;"; + return nvptx_output_atomic_insn (t, operands, 1, 3); + } + [(set_attr "atomic" "true")]) (define_expand "atomic_test_and_set"