Message ID | 1413921274.8483.65.camel@triegel.csb |
---|---|
State | New |
Headers | show |
On Tue, 2014-10-21 at 21:54 +0200, Torvald Riegel wrote: > On powerpc, atomic_exchange_and_add is implemented without any barriers. > However, atomic_exchange_and_add_acq and atomic_exchange_and_add_rel > (which supposedly should have acquire / release barrier semantics) both > fall back to atomic_exchange_and_add if they are not defined (see > include/atomic.h). I have not reviewed existing code to see whether > this would indeed cause a bug, but this lack of barriers likely is a > (future) fault, and prevents any use of > atomic_exchange_and_add_{acq,rel} that actually rely on the barrier > semantics. Therefore, this patch defines > atomic_exchange_and_add_{acq,rel} using atomic_read_barrier / > atomic_write_barrier. > > I have NOT tested this. Can somebody who cares about powerpc please > have a look and test this? Thanks! This is incorrect. Do not apply these changes. The requirement for memory barriers depend on how the result of the exchange and add is used. For example if the value is a simple counter or the original value (result of the larx load) is used as an address there is NO need for a memory barrier. See PowerISA-2.07 BookII, Appendix B.2.1.2 So this is unnecessarily pessimistic and should not be the default.
On Fri, Oct 24, 2014 at 08:15:04AM -0500, Steven Munroe wrote: > On Tue, 2014-10-21 at 21:54 +0200, Torvald Riegel wrote: > > On powerpc, atomic_exchange_and_add is implemented without any barriers. > > However, atomic_exchange_and_add_acq and atomic_exchange_and_add_rel > > (which supposedly should have acquire / release barrier semantics) both > > fall back to atomic_exchange_and_add if they are not defined (see > > include/atomic.h). I have not reviewed existing code to see whether > > this would indeed cause a bug, but this lack of barriers likely is a > > (future) fault, and prevents any use of > > atomic_exchange_and_add_{acq,rel} that actually rely on the barrier > > semantics. Therefore, this patch defines > > atomic_exchange_and_add_{acq,rel} using atomic_read_barrier / > > atomic_write_barrier. > > > > I have NOT tested this. Can somebody who cares about powerpc please > > have a look and test this? Thanks! > > This is incorrect. Do not apply these changes. > > The requirement for memory barriers depend on how the result of the > exchange and add is used. For example if the value is a simple counter > or the original value (result of the larx load) is used as an address > there is NO need for a memory barrier. > > See PowerISA-2.07 BookII, Appendix B.2.1.2 > > So this is unnecessarily pessimistic and should not be the default. I think you misread Torvald's email/patch. The idea is not to make barriers the default, but to provide working versions with acquire and release barriers for callers that need them since the default lacks any barriers. Rich
On Fri, 2014-10-24 at 08:15 -0500, Steven Munroe wrote: > On Tue, 2014-10-21 at 21:54 +0200, Torvald Riegel wrote: > > On powerpc, atomic_exchange_and_add is implemented without any barriers. > > However, atomic_exchange_and_add_acq and atomic_exchange_and_add_rel > > (which supposedly should have acquire / release barrier semantics) both > > fall back to atomic_exchange_and_add if they are not defined (see > > include/atomic.h). I have not reviewed existing code to see whether > > this would indeed cause a bug, but this lack of barriers likely is a > > (future) fault, and prevents any use of > > atomic_exchange_and_add_{acq,rel} that actually rely on the barrier > > semantics. Therefore, this patch defines > > atomic_exchange_and_add_{acq,rel} using atomic_read_barrier / > > atomic_write_barrier. > > > > I have NOT tested this. Can somebody who cares about powerpc please > > have a look and test this? Thanks! > > This is incorrect. Do not apply these changes. Have a look at the different variants of atomic_exchange_and_add*. The _acq and _rel suffixes are used on the current atomic ops to designate those operations that have acquire and release barrier semantics. > The requirement for memory barriers depend on how the result of the > exchange and add is used. Yes, see above. Note that this doesn't add any barriers to atomic_exchange_and_add, just to atomic_exchange_and_add_acq and atomic_exchange_and_add_rel. If you are concerned about uses of unnecessary memory barriers on powerpc, the way forward is to help review concurrent code in glibc and help moving it over to the C11 memory model (see below). > For example if the value is a simple counter > or the original value (result of the larx load) is used as an address > there is NO need for a memory barrier. I do understand the concept of memory barriers, and their difference to atomic HW operations. Please also note that we won't be using address dependencies or such in the near future, because the language support isn't there yet. We don't have the resources to do the complicated dance with the compiler that the Linux kernel people do. See http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4215.pdf for more background. > See PowerISA-2.07 BookII, Appendix B.2.1.2 > > So this is unnecessarily pessimistic and should not be the default. It is not the default. And there are no defaults really, just that some implementations pick defaults. BTW, we're very likely to transition to using the C11 memory model. So at that point, all barriers will be explicit in the atomic operations, and we'll add actual memory_order_relaxed atomic operations (including for CAS and other atomic RMW ops). There won't be any default in the C11-like atomics we use.
On Tue, 2014-10-21 at 21:54 +0200, Torvald Riegel wrote: > On powerpc, atomic_exchange_and_add is implemented without any barriers. > However, atomic_exchange_and_add_acq and atomic_exchange_and_add_rel > (which supposedly should have acquire / release barrier semantics) both > fall back to atomic_exchange_and_add if they are not defined (see > include/atomic.h). I have not reviewed existing code to see whether > this would indeed cause a bug, but this lack of barriers likely is a > (future) fault, and prevents any use of > atomic_exchange_and_add_{acq,rel} that actually rely on the barrier > semantics. Therefore, this patch defines > atomic_exchange_and_add_{acq,rel} using atomic_read_barrier / > atomic_write_barrier. > > I have NOT tested this. Can somebody who cares about powerpc please > have a look and test this? Thanks! Ping.
On 20-11-2014 09:04, Torvald Riegel wrote: > On Tue, 2014-10-21 at 21:54 +0200, Torvald Riegel wrote: >> On powerpc, atomic_exchange_and_add is implemented without any barriers. >> However, atomic_exchange_and_add_acq and atomic_exchange_and_add_rel >> (which supposedly should have acquire / release barrier semantics) both >> fall back to atomic_exchange_and_add if they are not defined (see >> include/atomic.h). I have not reviewed existing code to see whether >> this would indeed cause a bug, but this lack of barriers likely is a >> (future) fault, and prevents any use of >> atomic_exchange_and_add_{acq,rel} that actually rely on the barrier >> semantics. Therefore, this patch defines >> atomic_exchange_and_add_{acq,rel} using atomic_read_barrier / >> atomic_write_barrier. >> >> I have NOT tested this. Can somebody who cares about powerpc please >> have a look and test this? Thanks! > Ping. > > Sorry, I missed it on my radar. I will review and test today, thanks.
Hi Torvald, On 21-10-2014 17:54, Torvald Riegel wrote: > \ > }) > +#define atomic_exchange_and_add_acq(mem, value) \ > + ({ \ > + __typeof (*(mem)) __result2; \ > + __result2 = atomic_exchange_and_add (mem, value); \ > + atomic_read_barrier (); \ > + __result2; Although it is not wrong by using a 'atomic_read_barrier' (lwsync), it adds a more expensive synchronization than required (isync). I would prefer if we use the already defined __arch_compare_and_exchange_val_[32|64]_[acq|rel] operations on powerpc. > \ > + }) > +#define atomic_exchange_and_add_rel(mem, value) \ > + ({ \ > + atomic_write_barrier (); \ > + atomic_exchange_and_add (mem, value); \ > + }) > > #define atomic_increment_val(mem) \ > ({ \
On Tue, 2014-11-25 at 13:07 -0200, Adhemerval Zanella wrote: > Hi Torvald, > > On 21-10-2014 17:54, Torvald Riegel wrote: > > \ > > }) > > +#define atomic_exchange_and_add_acq(mem, value) \ > > + ({ \ > > + __typeof (*(mem)) __result2; \ > > + __result2 = atomic_exchange_and_add (mem, value); \ > > + atomic_read_barrier (); \ > > + __result2; > > Although it is not wrong by using a 'atomic_read_barrier' (lwsync), it adds a more > expensive synchronization than required (isync). I would prefer if we use the > already defined __arch_compare_and_exchange_val_[32|64]_[acq|rel] operations on powerpc. That's fine with me. Do you want to go adapt and commit the patch (given that you can test this easily I guess), or should I?
On 25-11-2014 13:39, Torvald Riegel wrote: > On Tue, 2014-11-25 at 13:07 -0200, Adhemerval Zanella wrote: >> Hi Torvald, >> >> On 21-10-2014 17:54, Torvald Riegel wrote: >>> \ >>> }) >>> +#define atomic_exchange_and_add_acq(mem, value) \ >>> + ({ \ >>> + __typeof (*(mem)) __result2; \ >>> + __result2 = atomic_exchange_and_add (mem, value); \ >>> + atomic_read_barrier (); \ >>> + __result2; >> Although it is not wrong by using a 'atomic_read_barrier' (lwsync), it adds a more >> expensive synchronization than required (isync). I would prefer if we use the >> already defined __arch_compare_and_exchange_val_[32|64]_[acq|rel] operations on powerpc. > That's fine with me. Do you want to go adapt and commit the patch > (given that you can test this easily I guess), or should I? > I will do it, thanks.
commit 67da4e31ce865d57bb798d6247d893941f2148f3 Author: Torvald Riegel <triegel@redhat.com> Date: Tue Oct 21 21:21:57 2014 +0200 powerpc: Fix missing barriers in atomic_exchange_and_add_{acq,rel} diff --git a/sysdeps/powerpc/bits/atomic.h b/sysdeps/powerpc/bits/atomic.h index 2ffba48..b838631 100644 --- a/sysdeps/powerpc/bits/atomic.h +++ b/sysdeps/powerpc/bits/atomic.h @@ -253,6 +253,18 @@ typedef uintmax_t uatomic_max_t; abort (); \ __result; \ }) +#define atomic_exchange_and_add_acq(mem, value) \ + ({ \ + __typeof (*(mem)) __result2; \ + __result2 = atomic_exchange_and_add (mem, value); \ + atomic_read_barrier (); \ + __result2; \ + }) +#define atomic_exchange_and_add_rel(mem, value) \ + ({ \ + atomic_write_barrier (); \ + atomic_exchange_and_add (mem, value); \ + }) #define atomic_increment_val(mem) \ ({ \