Message ID | AM6PR03MB517068C1808B39346FDC2728E4FD0@AM6PR03MB5170.eurprd03.prod.outlook.com
---|---
State | New
Series | Avoid atomic for guard acquire when that is expensive
On 11/22/20 3:05 AM, Bernd Edlinger wrote:
> Hi,
>
> this avoids the need to use -fno-threadsafe-statics on
> arm-none-eabi or working around that problem by supplying
> a dummy __sync_synchronize function, which might
> just lead to silent code failure of the worst kind
> (non-reproducible, racy) at runtime, as was pointed out
> in previous discussions here.
>
> When the atomic access involves a call to __sync_synchronize
> it is better to call __cxa_guard_acquire unconditionally,
> since it either handles the atomics itself or is a non-threaded
> implementation when the target has no gthread support.
>
> This also fixes a bug for the ARM EABI big-endian target:
> previously the wrong bit was checked.

Instead of a new target macro, can't you follow
fold_builtin_atomic_always_lock_free/can_atomic_load_p?

Jason
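For context, the guard in question protects the one-time initialization of a function-local static. A minimal sketch of the construct and of the check the compiler emits for it (simplified pseudo-code; `compute` is a placeholder, and the real guard layout and builtins vary by target):

```c++
// A function-local static whose initialization must be thread-safe:
int &counter ()
{
  static int value = compute ();  // requires a guard variable
  return value;
}

// Roughly the sequence g++ emits for the initialization (simplified):
//
//   if ((__atomic_load_n (&guard_byte, __ATOMIC_ACQUIRE) & 1) == 0)
//     if (__cxa_guard_acquire (&guard))
//       {
//         value = compute ();
//         __cxa_guard_release (&guard);
//       }
//
// The patch discussed in this thread skips the inline atomic load and
// calls __cxa_guard_acquire unconditionally on targets where that load
// would itself expand to a call such as __sync_synchronize.
```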
Hi,

I'd like to ping for this patch:

https://gcc.gnu.org/pipermail/gcc-patches/2020-November/559882.html

Thanks
Bernd.

On 11/22/20 9:05 AM, Bernd Edlinger wrote:
> Hi,
>
> this avoids the need to use -fno-threadsafe-statics on
> arm-none-eabi or working around that problem by supplying
> a dummy __sync_synchronize function, which might
> just lead to silent code failure of the worst kind
> (non-reproducible, racy) at runtime, as was pointed out
> in previous discussions here.
>
> When the atomic access involves a call to __sync_synchronize
> it is better to call __cxa_guard_acquire unconditionally,
> since it either handles the atomics itself or is a non-threaded
> implementation when the target has no gthread support.
>
> This also fixes a bug for the ARM EABI big-endian target:
> previously the wrong bit was checked.
>
> Regression tested successfully on arm-none-eabi with newlib-3.3.0.
>
> Is it OK for trunk?
>
> Thanks
> Bernd.
On 11/30/20 3:08 PM, Bernd Edlinger wrote:
> Hi,
>
> I'd like to ping for this patch:

I reviewed it on the 24th:
https://gcc.gnu.org/pipermail/gcc-patches/2020-November/560118.html

> https://gcc.gnu.org/pipermail/gcc-patches/2020-November/559882.html
>
> Thanks
> Bernd.
>
> On 11/22/20 9:05 AM, Bernd Edlinger wrote:
>> Hi,
>>
>> this avoids the need to use -fno-threadsafe-statics on
>> arm-none-eabi or working around that problem by supplying
>> a dummy __sync_synchronize function, which might
>> just lead to silent code failure of the worst kind
>> (non-reproducible, racy) at runtime, as was pointed out
>> in previous discussions here.
>>
>> When the atomic access involves a call to __sync_synchronize
>> it is better to call __cxa_guard_acquire unconditionally,
>> since it either handles the atomics itself or is a non-threaded
>> implementation when the target has no gthread support.
>>
>> This also fixes a bug for the ARM EABI big-endian target:
>> previously the wrong bit was checked.
>>
>> Regression tested successfully on arm-none-eabi with newlib-3.3.0.
>>
>> Is it OK for trunk?
>>
>> Thanks
>> Bernd.
On 11/24/20 11:10 PM, Jason Merrill wrote:
> On 11/22/20 3:05 AM, Bernd Edlinger wrote:
>> Hi,
>>
>> this avoids the need to use -fno-threadsafe-statics on
>> arm-none-eabi or working around that problem by supplying
>> a dummy __sync_synchronize function, which might
>> just lead to silent code failure of the worst kind
>> (non-reproducible, racy) at runtime, as was pointed out
>> in previous discussions here.
>>
>> When the atomic access involves a call to __sync_synchronize
>> it is better to call __cxa_guard_acquire unconditionally,
>> since it either handles the atomics itself or is a non-threaded
>> implementation when the target has no gthread support.
>>
>> This also fixes a bug for the ARM EABI big-endian target:
>> previously the wrong bit was checked.
>
> Instead of a new target macro, can't you follow
> fold_builtin_atomic_always_lock_free/can_atomic_load_p?

Yes, thanks, that should work too.
Would you like this better?

Thanks
Bernd.
On 12/1/20 1:28 PM, Bernd Edlinger wrote:
> On 11/24/20 11:10 PM, Jason Merrill wrote:
>> On 11/22/20 3:05 AM, Bernd Edlinger wrote:
>>> Hi,
>>>
>>> this avoids the need to use -fno-threadsafe-statics on
>>> arm-none-eabi or working around that problem by supplying
>>> a dummy __sync_synchronize function, which might
>>> just lead to silent code failure of the worst kind
>>> (non-reproducible, racy) at runtime, as was pointed out
>>> in previous discussions here.
>>>
>>> When the atomic access involves a call to __sync_synchronize
>>> it is better to call __cxa_guard_acquire unconditionally,
>>> since it either handles the atomics itself or is a non-threaded
>>> implementation when the target has no gthread support.
>>>
>>> This also fixes a bug for the ARM EABI big-endian target:
>>> previously the wrong bit was checked.
>>
>> Instead of a new target macro, can't you follow
>> fold_builtin_atomic_always_lock_free/can_atomic_load_p?
>
> Yes, thanks, that should work too.
> Would you like this better?

> +is_atomic_expensive_p (machine_mode mode)
> +{
> +  if (!flag_inline_atomics)
> +    return false;

Why not true?

> +  if (!can_compare_and_swap_p (mode, false) || !can_atomic_load_p (mode))
> +    return false;

This also seems backwards; I'd think we want to return false if either
of those tests is true.  Or maybe just can_atomic_load_p, and not
bother about compare-and-swap.

> +  tree type = targetm.cxx.guard_mask_bit ()
> +              ? TREE_TYPE (guard) : char_type_node;
> +
> +  if (is_atomic_expensive_p (TYPE_MODE (type)))
> +    guard = integer_zero_node;
> +  else
> +    guard = build_atomic_load_type (guard, MEMMODEL_ACQUIRE, type);

It should still work to load a single byte, it just needs to be the
least-significant byte.  And this isn't an EABI issue; it looks like the
non-EABI code is also broken for big-endian targets, both the atomic
load and the normal load in get_guard_bits.

Jason
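For readers following the review, here is a sketch of the predicate with both inversions fixed as suggested above (this mirrors the v2 hunks quoted in the following messages; the patch at the end of the thread is authoritative):

```c++
/* Return true if an inline atomic load of MODE would be expensive,
   i.e. would expand to a library call such as __sync_synchronize.
   Sketch only; see the committed patch for the real code.  */
static bool
is_atomic_expensive_p (machine_mode mode)
{
  if (!flag_inline_atomics)
    return true;

  if (!can_compare_and_swap_p (mode, false) || !can_atomic_load_p (mode))
    return true;

  return false;
}
```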
On 12/2/20 7:57 PM, Jason Merrill wrote:
> On 12/1/20 1:28 PM, Bernd Edlinger wrote:
>> On 11/24/20 11:10 PM, Jason Merrill wrote:
>>> On 11/22/20 3:05 AM, Bernd Edlinger wrote:
>>>> Hi,
>>>>
>>>> this avoids the need to use -fno-threadsafe-statics on
>>>> arm-none-eabi or working around that problem by supplying
>>>> a dummy __sync_synchronize function, which might
>>>> just lead to silent code failure of the worst kind
>>>> (non-reproducible, racy) at runtime, as was pointed out
>>>> in previous discussions here.
>>>>
>>>> When the atomic access involves a call to __sync_synchronize
>>>> it is better to call __cxa_guard_acquire unconditionally,
>>>> since it either handles the atomics itself or is a non-threaded
>>>> implementation when the target has no gthread support.
>>>>
>>>> This also fixes a bug for the ARM EABI big-endian target:
>>>> previously the wrong bit was checked.
>>>
>>> Instead of a new target macro, can't you follow
>>> fold_builtin_atomic_always_lock_free/can_atomic_load_p?
>>
>> Yes, thanks, that should work too.
>> Would you like this better?
>
>> +is_atomic_expensive_p (machine_mode mode)
>> +{
>> +  if (!flag_inline_atomics)
>> +    return false;
>
> Why not true?

Oops...
Yes, I ought to return true here.
I must have made a mistake when I tested the last version of this
patch, sorry for the confusion.

>> +  if (!can_compare_and_swap_p (mode, false) || !can_atomic_load_p (mode))
>> +    return false;
>
> This also seems backwards; I'd think we want to return false if either
> of those tests is true.  Or maybe just can_atomic_load_p, and not
> bother about compare-and-swap.

Yes, you are right.
Unfortunately can_atomic_load_p is too weak, since it does not cover
the memory barrier.

And can_compare_and_swap_p (..., false) is actually a bit too strong,
but if it returns true, we should be able to use any atomic without
need for a library call.

>> +  tree type = targetm.cxx.guard_mask_bit ()
>> +              ? TREE_TYPE (guard) : char_type_node;
>> +
>> +  if (is_atomic_expensive_p (TYPE_MODE (type)))
>> +    guard = integer_zero_node;
>> +  else
>> +    guard = build_atomic_load_type (guard, MEMMODEL_ACQUIRE, type);
>
> It should still work to load a single byte, it just needs to be the
> least-significant byte.  And this isn't an EABI issue; it looks like the
> non-EABI code is also broken for big-endian targets, both the atomic
> load and the normal load in get_guard_bits.

I think the non-EABI code is always using bit 0 in the first byte,
by using the endian-neutral
#define _GLIBCXX_GUARD_BIT __guard_test_bit (0, 1).

Only ARM EABI uses bit 0 in byte 3 if big-endian and bit 0 in byte 0
otherwise.

For all other targets when _GLIBCXX_USE_FUTEX is defined,
__cxa_guard_XXX accesses the value as int* while the memory
is a 64-bit long, so I could imagine that is an aliasing violation.

But nothing that needs to be fixed immediately.

Attached is the corrected patch.

Tested again on arm-none-eabi with arm-sim.
Is it OK for trunk?

Thanks
Bernd.

> Jason
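To make the endianness point concrete, a small self-contained illustration (assuming the ARM EABI layout described above, where the flag is bit 0 of a 32-bit guard word):

```c++
#include <cstdio>

int main ()
{
  // Bit 0 of a 32-bit guard word lives in different *bytes* depending
  // on byte order, so loading byte 0 tests the wrong bit on big-endian.
  unsigned int guard = 1;  // "initialized" flag set in the word
  unsigned char byte0 = *reinterpret_cast<unsigned char *> (&guard);

  // Little-endian prints 1 (the flag is visible in byte 0);
  // big-endian prints 0 (the flag actually lives in byte 3).
  std::printf ("byte 0 of the guard word: %u\n", byte0);
  return 0;
}
```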
On 12/5/20 7:37 AM, Bernd Edlinger wrote:
> On 12/2/20 7:57 PM, Jason Merrill wrote:
>> On 12/1/20 1:28 PM, Bernd Edlinger wrote:
>>> On 11/24/20 11:10 PM, Jason Merrill wrote:
>>>> On 11/22/20 3:05 AM, Bernd Edlinger wrote:
>>>>> Hi,
>>>>>
>>>>> this avoids the need to use -fno-threadsafe-statics on
>>>>> arm-none-eabi or working around that problem by supplying
>>>>> a dummy __sync_synchronize function, which might
>>>>> just lead to silent code failure of the worst kind
>>>>> (non-reproducible, racy) at runtime, as was pointed out
>>>>> in previous discussions here.
>>>>>
>>>>> When the atomic access involves a call to __sync_synchronize
>>>>> it is better to call __cxa_guard_acquire unconditionally,
>>>>> since it either handles the atomics itself or is a non-threaded
>>>>> implementation when the target has no gthread support.
>>>>>
>>>>> This also fixes a bug for the ARM EABI big-endian target:
>>>>> previously the wrong bit was checked.
>>>>
>>>> Instead of a new target macro, can't you follow
>>>> fold_builtin_atomic_always_lock_free/can_atomic_load_p?
>>>
>>> Yes, thanks, that should work too.
>>> Would you like this better?
>>
>>> +is_atomic_expensive_p (machine_mode mode)
>>> +{
>>> +  if (!flag_inline_atomics)
>>> +    return false;
>>
>> Why not true?
>
> Oops...
> Yes, I ought to return true here.
> I must have made a mistake when I tested the last version of this
> patch, sorry for the confusion.
>
>>> +  if (!can_compare_and_swap_p (mode, false) || !can_atomic_load_p (mode))
>>> +    return false;
>>
>> This also seems backwards; I'd think we want to return false if either
>> of those tests is true.  Or maybe just can_atomic_load_p, and not
>> bother about compare-and-swap.
>
> Yes, you are right.
> Unfortunately can_atomic_load_p is too weak, since it does not cover
> the memory barrier.
>
> And can_compare_and_swap_p (..., false) is actually a bit too strong,
> but if it returns true, we should be able to use any atomic without
> need for a library call.

>>> +  tree type = targetm.cxx.guard_mask_bit ()
>>> +              ? TREE_TYPE (guard) : char_type_node;
>>> +
>>> +  if (is_atomic_expensive_p (TYPE_MODE (type)))
>>> +    guard = integer_zero_node;
>>> +  else
>>> +    guard = build_atomic_load_type (guard, MEMMODEL_ACQUIRE, type);
>>
>> It should still work to load a single byte, it just needs to be the
>> least-significant byte.

I still don't think you need to load the whole word to check the guard
bit.

>> And this isn't an EABI issue; it looks like the non-EABI code is also
>> broken for big-endian targets, both the atomic load and the normal
>> load in get_guard_bits.
>
> I think the non-EABI code is always using bit 0 in the first byte,
> by using the endian-neutral
> #define _GLIBCXX_GUARD_BIT __guard_test_bit (0, 1).

Except that set_guard sets the least-significant bit on all targets.

> Only ARM EABI uses bit 0 in byte 3 if big-endian and bit 0 in byte 0
> otherwise.
>
> For all other targets when _GLIBCXX_USE_FUTEX is defined,
> __cxa_guard_XXX accesses the value as int* while the memory
> is a 64-bit long, so I could imagine that is an aliasing violation.
>
> But nothing that needs to be fixed immediately.

Agreed.

> Attached is the corrected patch.
>
> Tested again on arm-none-eabi with arm-sim.
> Is it OK for trunk?
>
> Thanks
> Bernd.
>
>> Jason
On 12/7/20 4:04 PM, Jason Merrill wrote:
> On 12/5/20 7:37 AM, Bernd Edlinger wrote:
>> On 12/2/20 7:57 PM, Jason Merrill wrote:
>>> On 12/1/20 1:28 PM, Bernd Edlinger wrote:
>>>> +  tree type = targetm.cxx.guard_mask_bit ()
>>>> +              ? TREE_TYPE (guard) : char_type_node;
>>>> +
>>>> +  if (is_atomic_expensive_p (TYPE_MODE (type)))
>>>> +    guard = integer_zero_node;
>>>> +  else
>>>> +    guard = build_atomic_load_type (guard, MEMMODEL_ACQUIRE, type);
>>>
>>> It should still work to load a single byte, it just needs to be the
>>> least-significant byte.
>
> I still don't think you need to load the whole word to check the guard
> bit.

Of course that is also possible.  But I would not expect an
address offset and a byte access to be cheaper than a word access.

I just saw that get_guard_bits does the same thing:
access the first byte if !targetm.cxx.guard_mask_bit ()
and access the whole word otherwise, which is only
there for ARM EABI.

>>> And this isn't an EABI issue; it looks like the non-EABI code is also
>>> broken for big-endian targets, both the atomic load and the normal
>>> load in get_guard_bits.
>>
>> I think the non-EABI code is always using bit 0 in the first byte,
>> by using the endian-neutral
>> #define _GLIBCXX_GUARD_BIT __guard_test_bit (0, 1).
>
> Except that set_guard sets the least-significant bit on all targets.

Hmm, as I said, get_guard_bits gets the guard as a word if
!targetm.cxx.guard_mask_bit (), and as the first byte otherwise.
Of course it could get the third byte, if
!targetm.cxx.guard_mask_bit () && BYTES_BIG_ENDIAN, but it would
be more complicated this way, wouldn't it?

Bernd.

>> Only ARM EABI uses bit 0 in byte 3 if big-endian and bit 0 in byte 0
>> otherwise.
>>
>> For all other targets when _GLIBCXX_USE_FUTEX is defined,
>> __cxa_guard_XXX accesses the value as int* while the memory
>> is a 64-bit long, so I could imagine that is an aliasing violation.
>>
>> But nothing that needs to be fixed immediately.
>
> Agreed.
>
>> Attached is the corrected patch.
>>
>> Tested again on arm-none-eabi with arm-sim.
>> Is it OK for trunk?
>>
>> Thanks
>> Bernd.
>>
>>> Jason
On 12/7/20 11:17 AM, Bernd Edlinger wrote:
> On 12/7/20 4:04 PM, Jason Merrill wrote:
>> On 12/5/20 7:37 AM, Bernd Edlinger wrote:
>>> On 12/2/20 7:57 PM, Jason Merrill wrote:
>>>> On 12/1/20 1:28 PM, Bernd Edlinger wrote:
>>>>> +  tree type = targetm.cxx.guard_mask_bit ()
>>>>> +              ? TREE_TYPE (guard) : char_type_node;
>>>>> +
>>>>> +  if (is_atomic_expensive_p (TYPE_MODE (type)))
>>>>> +    guard = integer_zero_node;
>>>>> +  else
>>>>> +    guard = build_atomic_load_type (guard, MEMMODEL_ACQUIRE, type);
>>>>
>>>> It should still work to load a single byte, it just needs to be the
>>>> least-significant byte.
>>
>> I still don't think you need to load the whole word to check the guard
>> bit.
>
> Of course that is also possible.  But I would not expect an
> address offset and a byte access to be cheaper than a word access.

Fair point.

> I just saw that get_guard_bits does the same thing:
> access the first byte if !targetm.cxx.guard_mask_bit ()
> and access the whole word otherwise, which is only
> there for ARM EABI.

>>>> And this isn't an EABI issue; it looks like the non-EABI code is
>>>> also broken for big-endian targets, both the atomic load and the
>>>> normal load in get_guard_bits.
>>>
>>> I think the non-EABI code is always using bit 0 in the first byte,
>>> by using the endian-neutral
>>> #define _GLIBCXX_GUARD_BIT __guard_test_bit (0, 1).
>>
>> Except that set_guard sets the least-significant bit on all targets.
>
> Hmm, as I said, get_guard_bits gets the guard as a word if
> !targetm.cxx.guard_mask_bit (), and as the first byte otherwise.
> Of course it could get the third byte, if
> !targetm.cxx.guard_mask_bit () && BYTES_BIG_ENDIAN, but it would
> be more complicated this way, wouldn't it?

Ah, yes, I was overlooking that set_guard uses get_guard_bits.  The
patch is OK.

>>> Only ARM EABI uses bit 0 in byte 3 if big-endian and bit 0 in byte 0
>>> otherwise.
>>>
>>> For all other targets when _GLIBCXX_USE_FUTEX is defined,
>>> __cxa_guard_XXX accesses the value as int* while the memory
>>> is a 64-bit long, so I could imagine that is an aliasing violation.
>>>
>>> But nothing that needs to be fixed immediately.
>>
>> Agreed.
>>
>>> Attached is the corrected patch.
>>>
>>> Tested again on arm-none-eabi with arm-sim.
>>> Is it OK for trunk?
>>>
>>> Thanks
>>> Bernd.
>>>
>>>> Jason
From 9fd070407408cf10789f5e9eb5ddda2fb3798e6f Mon Sep 17 00:00:00 2001
From: Bernd Edlinger <bernd.edlinger@hotmail.de>
Date: Sun, 22 Nov 2020 08:11:14 +0100
Subject: [PATCH] Avoid atomic for guard acquire when that is expensive

When the atomic access involves a call to __sync_synchronize
it is better to call __cxa_guard_acquire unconditionally,
since it either handles the atomics itself or is a non-threaded
implementation when the target has no gthread support.

This also fixes a bug for the ARM EABI big-endian target:
previously the wrong bit was checked.

gcc:
2020-11-22  Bernd Edlinger  <bernd.edlinger@hotmail.de>

        * target.def (guard_atomic_expensive): New hook.
        * doc/tm.texi: Document TARGET_CXX_GUARD_ATOMIC_EXPENSIVE.
        * doc/tm.texi.in: Likewise.
        * config/arm/arm.c (arm_cxx_guard_atomic_expensive): New callback.

gcc/cp:
2020-11-22  Bernd Edlinger  <bernd.edlinger@hotmail.de>

        * decl2.c (build_atomic_load_byte): Rename to...
        (build_atomic_load_type): ... and add new parameter type.
        (get_guard_cond): Skip the atomic here if that is expensive.
        Use the correct type for the atomic load on certain targets.
---
 gcc/config/arm/arm.c | 13 +++++++++++++
 gcc/cp/decl2.c       | 12 ++++++++----
 gcc/doc/tm.texi      |  7 +++++++
 gcc/doc/tm.texi.in   |  2 ++
 gcc/target.def       | 10 ++++++++++
 5 files changed, 40 insertions(+), 4 deletions(-)

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 04190b1..04ca1fe 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -235,6 +235,7 @@ static rtx arm_dwarf_register_span (rtx);
 
 static tree arm_cxx_guard_type (void);
 static bool arm_cxx_guard_mask_bit (void);
+static bool arm_cxx_guard_atomic_expensive (void);
 static tree arm_get_cookie_size (tree);
 static bool arm_cookie_has_size (void);
 static bool arm_cxx_cdtor_returns_this (void);
@@ -593,6 +594,9 @@ static const struct attribute_spec arm_attribute_table[] =
 #undef TARGET_CXX_GUARD_MASK_BIT
 #define TARGET_CXX_GUARD_MASK_BIT arm_cxx_guard_mask_bit
 
+#undef TARGET_CXX_GUARD_ATOMIC_EXPENSIVE
+#define TARGET_CXX_GUARD_ATOMIC_EXPENSIVE arm_cxx_guard_atomic_expensive
+
 #undef TARGET_CXX_GET_COOKIE_SIZE
 #define TARGET_CXX_GET_COOKIE_SIZE arm_get_cookie_size
 
@@ -28882,6 +28886,15 @@ arm_cxx_guard_mask_bit (void)
 }
 
 
+/* Return true if atomics involve a call to __sync_synchronize.  */
+
+static bool
+arm_cxx_guard_atomic_expensive (void)
+{
+  return TARGET_AAPCS_BASED && !TARGET_HAVE_DMB && !TARGET_HAVE_DMB_MCR;
+}
+
+
 /* The EABI specifies that all array cookies are 8 bytes long.  */
 
 static tree
diff --git a/gcc/cp/decl2.c b/gcc/cp/decl2.c
index 1bc7b7e..e2b29a6 100644
--- a/gcc/cp/decl2.c
+++ b/gcc/cp/decl2.c
@@ -3300,15 +3300,15 @@ get_guard (tree decl)
 /* Return an atomic load of src with the appropriate memory model.  */
 
 static tree
-build_atomic_load_byte (tree src, HOST_WIDE_INT model)
+build_atomic_load_type (tree src, HOST_WIDE_INT model, tree type)
 {
-  tree ptr_type = build_pointer_type (char_type_node);
+  tree ptr_type = build_pointer_type (type);
   tree mem_model = build_int_cst (integer_type_node, model);
   tree t, addr, val;
   unsigned int size;
   int fncode;
 
-  size = tree_to_uhwi (TYPE_SIZE_UNIT (char_type_node));
+  size = tree_to_uhwi (TYPE_SIZE_UNIT (type));
   fncode = BUILT_IN_ATOMIC_LOAD_N + exact_log2 (size) + 1;
   t = builtin_decl_implicit ((enum built_in_function) fncode);
 
@@ -3350,8 +3350,12 @@ get_guard_cond (tree guard, bool thread_safe)
 
   if (!thread_safe)
     guard = get_guard_bits (guard);
+  else if (targetm.cxx.guard_atomic_expensive ())
+    guard = integer_zero_node;
   else
-    guard = build_atomic_load_byte (guard, MEMMODEL_ACQUIRE);
+    guard = build_atomic_load_type (guard, MEMMODEL_ACQUIRE,
+                                    targetm.cxx.guard_mask_bit ()
+                                    ? TREE_TYPE (guard) : char_type_node);
 
   /* Mask off all but the low bit.  */
   if (targetm.cxx.guard_mask_bit ())
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 2b88f78..92215cf 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -10728,6 +10728,13 @@ This hook determines how guard variables are used.  It should return
 @code{true} indicates that only the least significant bit should be used.
 @end deftypefn
 
+@deftypefn {Target Hook} bool TARGET_CXX_GUARD_ATOMIC_EXPENSIVE (void)
+This hook determines if the guard atomic is expensive.  It should return
+@code{false} (the default) if the atomic is inexpensive.  A return value of
+@code{true} indicates that the atomic is expensive, i.e., involves a call to
+__sync_synchronize.  In this case let __cxa_guard_acquire handle the atomics.
+@end deftypefn
+
 @deftypefn {Target Hook} tree TARGET_CXX_GET_COOKIE_SIZE (tree @var{type})
 This hook returns the size of the cookie to use when allocating an array
 whose elements have the indicated @var{type}.  Assumes that it is already
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 897f289..ce1d837 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -7321,6 +7321,8 @@ floating-point support; they are not included in this mechanism.
 
 @hook TARGET_CXX_GUARD_MASK_BIT
 
+@hook TARGET_CXX_GUARD_ATOMIC_EXPENSIVE
+
 @hook TARGET_CXX_GET_COOKIE_SIZE
 
 @hook TARGET_CXX_COOKIE_HAS_SIZE
diff --git a/gcc/target.def b/gcc/target.def
index 810d554..0c02d5c 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -6160,6 +6160,16 @@ DEFHOOK
  bool, (void),
  hook_bool_void_false)
 
+/* Return true if the guard atomic is expensive.  */
+DEFHOOK
+(guard_atomic_expensive,
+ "This hook determines if the guard atomic is expensive.  It should return\n\
+@code{false} (the default) if the atomic is inexpensive.  A return value of\n\
+@code{true} indicates that the atomic is expensive, i.e., involves a call to\n\
+__sync_synchronize.  In this case let __cxa_guard_acquire handle the atomics.",
+ bool, (void),
+ hook_bool_void_false)
+
 /* Returns the size of the array cookie for an array of type.  */
 DEFHOOK
 (get_cookie_size,
-- 
1.9.1
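As a closing note, the workaround this patch obsoletes looked roughly like the sketch below (reconstructed from the thread's description, not from any shipped code; it is exactly the kind of stub the first message warns can cause silent, racy failures):

```c++
// Pre-patch link-error workaround on arm-none-eabi: satisfy the
// reference to __sync_synchronize with a stub.  DANGEROUS: the stub
// emits no memory barrier, so guarded initialization can race on
// multi-threaded systems.  The other workaround was to compile with
// -fno-threadsafe-statics.
extern "C" void
__sync_synchronize (void)
{
  // Intentionally empty: no barrier is emitted.
}
```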