Message ID | 9cde09ef-8a7f-fbc7-3677-fb6763cc82b4@suse.de |
---|---|
State | New |
Headers | show |
Series | [RFC,nvptx,libgomp] Add 128-bit atomic support | expand |
On Wed, Sep 02, 2020 at 12:22:28PM +0200, Tom de Vries wrote: > And test-case passes on x86_64 with this patch (obviously, in > combination with trigger patch above). > > Jakub, WDYT? I guess the normal answer would be use libatomic, but it isn't ported for nvptx. I guess at least temporarily this is ok, though I'm wondering why you need __sync_*_16 rather than __atomic_*_16, or perhaps both __sync_* and __atomic_*. What happens if you try unsigned __int128 v; #pragma omp declare target (v) int main () { #pragma omp target { __atomic_add_fetch (&v, 1, __ATOMIC_RELAXED); __atomic_fetch_add (&v, 1, __ATOMIC_RELAXED); unsigned __int128v exp = 2; __atomic_compare_exchange_n (&v, &expected, 7, 0, __ATOMIC_RELEASE, __ATOMIC_ACQUIRE); } } etc. (see some gcc.dg/atomic* tests, ditto for __sync_*)? I guess better not to throw everything into one test, because not every target supports them all (e.g. I think x86_64 doesn't really do 128-bit atomic loads because the cmpxchg16b insn are not appropriate for .rodata locations). Jakub
On 9/2/20 12:22 PM, Tom de Vries wrote: > Tobias, can you try on powerpc? Testcase now compiles and runs w/o error message. On 9/2/20 12:44 PM, Jakub Jelinek wrote: > I guess the normal answer would be use libatomic, but it isn't ported for > nvptx. > I guess at least temporarily this is ok,though I'm wondering why > you need __sync_*_16 rather than __atomic_*_16, or perhaps both __sync_* and > __atomic_*. > > What happens if you try > unsigned __int128 v; ... I had to change "unsigned __int128" and "unsigned __int128v" to "__uint128_t" and "expected" to "exp". Result without offloading configured on x86-64-gnu-linux: aotmic.c:(.text+0x84): undefined reference to `__atomic_fetch_add_16' /usr/bin/ld: aotmic.c:(.text+0xa3): undefined reference to `__atomic_fetch_add_16' /usr/bin/ld: aotmic.c:(.text+0xda): undefined reference to `__atomic_compare_exchange_16' And on PowerPC with nvptx (without the RFC patch): atomic.c: In function 'main._omp_fn.0': atomic.c:6:11: internal compiler error: in write_fn_proto, at config/nvptx/nvptx.c:913 6 | #pragma omp target | ^ Tobias ----------------- Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander Walter
On 9/2/20 12:44 PM, Jakub Jelinek wrote: > On Wed, Sep 02, 2020 at 12:22:28PM +0200, Tom de Vries wrote: >> And test-case passes on x86_64 with this patch (obviously, in >> combination with trigger patch above). >> >> Jakub, WDYT? > > I guess the normal answer would be use libatomic, but it isn't ported for > nvptx. Ah, I was not aware of that one, filed https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96898 to look into that. > I guess at least temporarily this is ok, though I'm wondering why > you need __sync_*_16 rather than __atomic_*_16, That's what omp-expand.c uses in expand_omp_atomic_pipeline: BUILT_IN_SYNC_VAL_COMPARE_AND_SWAP_N . Thanks, - Tom > or perhaps both __sync_* and > __atomic_*. > > What happens if you try > unsigned __int128 v; > #pragma omp declare target (v) > int > main () > { > #pragma omp target > { > __atomic_add_fetch (&v, 1, __ATOMIC_RELAXED); > __atomic_fetch_add (&v, 1, __ATOMIC_RELAXED); > unsigned __int128v exp = 2; > __atomic_compare_exchange_n (&v, &expected, 7, 0, __ATOMIC_RELEASE, __ATOMIC_ACQUIRE); > } > } > etc. (see some gcc.dg/atomic* tests, ditto for __sync_*)? > I guess better not to throw everything into one test, because not every > target supports them all (e.g. I think x86_64 doesn't really do 128-bit > atomic loads because the cmpxchg16b insn are not appropriate for .rodata > locations). > > Jakub >
On 9/2/20 1:48 PM, Tom de Vries wrote: > On 9/2/20 12:44 PM, Jakub Jelinek wrote: >> On Wed, Sep 02, 2020 at 12:22:28PM +0200, Tom de Vries wrote: >>> And test-case passes on x86_64 with this patch (obviously, in >>> combination with trigger patch above). >>> >>> Jakub, WDYT? >> >> I guess the normal answer would be use libatomic, but it isn't ported for >> nvptx. > > Ah, I was not aware of that one, filed > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96898 to look into that. > >> I guess at least temporarily this is ok, though I'm wondering why >> you need __sync_*_16 rather than __atomic_*_16, > > That's what omp-expand.c uses in expand_omp_atomic_pipeline: > BUILT_IN_SYNC_VAL_COMPARE_AND_SWAP_N . > I've got an updated version of this patch. It: - no longer supplies the __atomic_load_16, since that's now handled by libatomic - the __sync_val_compare_and_swap now uses __atomic_compare_and_swap, which also falls back on libatomic. I'm currently retesting. Any comments? Otherwise, I'll commit on Monday. Thanks, - Tom
[ Fixing ENOPATCH. ] On 9/11/20 4:24 PM, Tom de Vries wrote: > On 9/2/20 1:48 PM, Tom de Vries wrote: >> On 9/2/20 12:44 PM, Jakub Jelinek wrote: >>> On Wed, Sep 02, 2020 at 12:22:28PM +0200, Tom de Vries wrote: >>>> And test-case passes on x86_64 with this patch (obviously, in >>>> combination with trigger patch above). >>>> >>>> Jakub, WDYT? >>> >>> I guess the normal answer would be use libatomic, but it isn't ported for >>> nvptx. >> >> Ah, I was not aware of that one, filed >> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96898 to look into that. >> >>> I guess at least temporarily this is ok, though I'm wondering why >>> you need __sync_*_16 rather than __atomic_*_16, >> >> That's what omp-expand.c uses in expand_omp_atomic_pipeline: >> BUILT_IN_SYNC_VAL_COMPARE_AND_SWAP_N . >> > > I've got an updated version of this patch. It: > - no longer supplies the __atomic_load_16, since that's now handled by > libatomic > - the __sync_val_compare_and_swap now uses __atomic_compare_and_swap, > which also falls back on libatomic. > > I'm currently retesting. > > Any comments? > > Otherwise, I'll commit on Monday. > > Thanks, > - Tom >
On Fri, Sep 11, 2020 at 04:24:42PM +0200, Tom de Vries wrote: > I've got an updated version of this patch. It: > - no longer supplies the __atomic_load_16, since that's now handled by > libatomic > - the __sync_val_compare_and_swap now uses __atomic_compare_and_swap, > which also falls back on libatomic. > > I'm currently retesting. > > Any comments? > > Otherwise, I'll commit on Monday. If some functions are now in libatomic, do we expect users to know that and pass -foffload=-latomic to link, or will mkoffload or whatever do that automatically? If the latter, will e.g. libgomp testsuite ensure that during testing the library can be found even non-installed, if the former, will libgomp testsuite add it for the respective testcases that need it, perhaps under special options? Jakub
On 11/09/2020 15:25, Tom de Vries wrote: > --- a/libgomp/testsuite/libgomp.c-c++-common/reduction-16.c > +++ b/libgomp/testsuite/libgomp.c-c++-common/reduction-16.c > > @@ -1,4 +1,5 @@ > > /*·{·dg-do·run·}·*/ > +/*·{·dg-additional-options·"-foffload=-latomic"·}·*/ This will probably break amdgcn, where libatomic does not exist. Andrew
On 2020-09-11 16:48, Andrew Stubbs wrote: > On 11/09/2020 15:25, Tom de Vries wrote: >> --- a/libgomp/testsuite/libgomp.c-c++-common/reduction-16.c >> +++ b/libgomp/testsuite/libgomp.c-c++-common/reduction-16.c >> >> @@ -1,4 +1,5 @@ >> >> /*·{·dg-do·run·}·*/ >> +/*·{·dg-additional-options·"-foffload=-latomic"·}·*/ > > This will probably break amdgcn, where libatomic does not exist. > It looks like the customary way to handle that is to use offload_target_nvptx. Thanks, - Tom
On 9/11/20 5:03 PM, tdevries wrote: > On 2020-09-11 16:48, Andrew Stubbs wrote: >> On 11/09/2020 15:25, Tom de Vries wrote: >>> --- a/libgomp/testsuite/libgomp.c-c++-common/reduction-16.c >>> +++ b/libgomp/testsuite/libgomp.c-c++-common/reduction-16.c >>> >>> @@ -1,4 +1,5 @@ >>> >>> /*·{·dg-do·run·}·*/ >>> +/*·{·dg-additional-options·"-foffload=-latomic"·}·*/ >> >> This will probably break amdgcn, where libatomic does not exist. >> > It looks like the customary way to handle that is to use > offload_target_nvptx. Or { target { powerpc*-*-* } } ? For some (known) reasons, the __sync_val_compare_and_swap_16 is produced for powerpc but not for x86-64. I could imagine that GCN is affected in the same way as nvptx, except that AMD's ROC is currently not supported for PowerPC, if I understand it correctly. If FAIL start to occur in some CPU/GPU combinations, it can be still revisited. Tobias ----------------- Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander Walter
[nvptx, libgomp] Add 128-bit atomic support --- libgomp/config/nvptx/atomic.c | 34 ++++++++++++++++++++++++++++++++++ 1 file changed, 34 insertions(+) diff --git a/libgomp/config/nvptx/atomic.c b/libgomp/config/nvptx/atomic.c new file mode 100644 index 00000000000..49a6d350827 --- /dev/null +++ b/libgomp/config/nvptx/atomic.c @@ -0,0 +1,34 @@ +#include <stdbool.h> + +#include "../../atomic.c" + +unsigned __int128 +__sync_val_compare_and_swap_16 (volatile void *vptr, unsigned __int128 oldval, + unsigned __int128 newval) +{ + volatile unsigned __int128 *ptr = vptr; + GOMP_atomic_start (); + unsigned __int128 val = *ptr; + if (val == oldval) + *ptr = newval; + GOMP_atomic_end (); + return val; +} + +bool +__sync_bool_compare_and_swap_16 (volatile void *vptr, unsigned __int128 oldval, + unsigned __int128 newval) +{ + return __sync_val_compare_and_swap_16 (vptr, oldval, newval) == oldval; +} + +unsigned __int128 +__atomic_load_16 (const volatile void *vptr, + int memorder __attribute__((unused))) +{ + const volatile unsigned __int128 *ptr = vptr; + GOMP_atomic_start (); + unsigned __int128 val = *ptr; + GOMP_atomic_end (); + return val; +}