Message ID | 1442418575-12297-5-git-send-email-boqun.feng@gmail.com (mailing list archive) |
---|---|
State | Superseded, archived |
Headers | show |
On Wed, Sep 16, 2015 at 11:49:32PM +0800, Boqun Feng wrote: > Implement xchg_relaxed and define atomic{,64}_xchg_* as xchg_relaxed, > based on these _relaxed variants, release/acquire variants can be built. > > Note that xchg_relaxed and atomic_{,64}_xchg_relaxed are not compiler > barriers. Hmm, and I note your previous patch creating the regular _relaxed thingies also removes the memory clobber. And looking at the ARM _relaxed patch from Will, I see their _relaxed ops are also not a compiler barrier. I must say I'm somewhat surprised by this level of relaxation, I had expected to only loose SMP barriers, not the program order ones. Is there a good argument for this?
On Thu, Oct 01, 2015 at 02:24:40PM +0200, Peter Zijlstra wrote: > On Wed, Sep 16, 2015 at 11:49:32PM +0800, Boqun Feng wrote: > > Implement xchg_relaxed and define atomic{,64}_xchg_* as xchg_relaxed, > > based on these _relaxed variants, release/acquire variants can be built. > > > > Note that xchg_relaxed and atomic_{,64}_xchg_relaxed are not compiler > > barriers. > > Hmm, and I note your previous patch creating the regular _relaxed > thingies also removes the memory clobber. > > And looking at the ARM _relaxed patch from Will, I see their _relaxed > ops are also not a compiler barrier. > > I must say I'm somewhat surprised by this level of relaxation, I had > expected to only loose SMP barriers, not the program order ones. > > Is there a good argument for this? Yes, when we say "relaxed", we really mean relaxed. ;-) Both the CPU and the compiler are allowed to reorder around relaxed operations. Thanx, Paul
On Thu, Oct 01, 2015 at 08:09:09AM -0700, Paul E. McKenney wrote: > On Thu, Oct 01, 2015 at 02:24:40PM +0200, Peter Zijlstra wrote: > > I must say I'm somewhat surprised by this level of relaxation, I had > > expected to only loose SMP barriers, not the program order ones. > > > > Is there a good argument for this? > > Yes, when we say "relaxed", we really mean relaxed. ;-) > > Both the CPU and the compiler are allowed to reorder around relaxed > operations. Is this documented somewhere, because I completely missed this part.
On Thu, Oct 01, 2015 at 07:13:04PM +0200, Peter Zijlstra wrote: > On Thu, Oct 01, 2015 at 08:09:09AM -0700, Paul E. McKenney wrote: > > On Thu, Oct 01, 2015 at 02:24:40PM +0200, Peter Zijlstra wrote: > > > > I must say I'm somewhat surprised by this level of relaxation, I had > > > expected to only loose SMP barriers, not the program order ones. > > > > > > Is there a good argument for this? > > > > Yes, when we say "relaxed", we really mean relaxed. ;-) > > > > Both the CPU and the compiler are allowed to reorder around relaxed > > operations. > > Is this documented somewhere, because I completely missed this part. Well, yes, these need to be added to the documentation. I am assuming that Will is looking to have the same effect as C11 memory_order_relaxed, which is relaxed in this sense. If he has something else in mind, he needs to tell us what it is and why. ;-) Thanx, Paul
On Thu, Oct 01, 2015 at 11:03:01AM -0700, Paul E. McKenney wrote: > On Thu, Oct 01, 2015 at 07:13:04PM +0200, Peter Zijlstra wrote: > > On Thu, Oct 01, 2015 at 08:09:09AM -0700, Paul E. McKenney wrote: > > > On Thu, Oct 01, 2015 at 02:24:40PM +0200, Peter Zijlstra wrote: > > > > > > I must say I'm somewhat surprised by this level of relaxation, I had > > > > expected to only loose SMP barriers, not the program order ones. > > > > > > > > Is there a good argument for this? > > > > > > Yes, when we say "relaxed", we really mean relaxed. ;-) > > > > > > Both the CPU and the compiler are allowed to reorder around relaxed > > > operations. > > > > Is this documented somewhere, because I completely missed this part. > > Well, yes, these need to be added to the documentation. I am assuming > that Will is looking to have the same effect as C11 memory_order_relaxed, > which is relaxed in this sense. If he has something else in mind, > he needs to tell us what it is and why. ;-) I suspect he is; but I'm not _that_ up to date on the whole C11 stuff.
On Thu, Oct 01, 2015 at 08:23:09PM +0200, Peter Zijlstra wrote: > On Thu, Oct 01, 2015 at 11:03:01AM -0700, Paul E. McKenney wrote: > > On Thu, Oct 01, 2015 at 07:13:04PM +0200, Peter Zijlstra wrote: > > > On Thu, Oct 01, 2015 at 08:09:09AM -0700, Paul E. McKenney wrote: > > > > On Thu, Oct 01, 2015 at 02:24:40PM +0200, Peter Zijlstra wrote: > > > > > > > > I must say I'm somewhat surprised by this level of relaxation, I had > > > > > expected to only loose SMP barriers, not the program order ones. > > > > > > > > > > Is there a good argument for this? > > > > > > > > Yes, when we say "relaxed", we really mean relaxed. ;-) > > > > > > > > Both the CPU and the compiler are allowed to reorder around relaxed > > > > operations. > > > > > > Is this documented somewhere, because I completely missed this part. > > > > Well, yes, these need to be added to the documentation. I am assuming > > that Will is looking to have the same effect as C11 memory_order_relaxed, > > which is relaxed in this sense. If he has something else in mind, > > he needs to tell us what it is and why. ;-) > > I suspect he is; but I'm not _that_ up to date on the whole C11 stuff. Lucky you! ;-) Thanx, Paul
On Thu, Oct 01, 2015 at 07:03:01PM +0100, Paul E. McKenney wrote: > On Thu, Oct 01, 2015 at 07:13:04PM +0200, Peter Zijlstra wrote: > > On Thu, Oct 01, 2015 at 08:09:09AM -0700, Paul E. McKenney wrote: > > > On Thu, Oct 01, 2015 at 02:24:40PM +0200, Peter Zijlstra wrote: > > > > > > I must say I'm somewhat surprised by this level of relaxation, I had > > > > expected to only loose SMP barriers, not the program order ones. > > > > > > > > Is there a good argument for this? > > > > > > Yes, when we say "relaxed", we really mean relaxed. ;-) > > > > > > Both the CPU and the compiler are allowed to reorder around relaxed > > > operations. > > > > Is this documented somewhere, because I completely missed this part. > > Well, yes, these need to be added to the documentation. I am assuming > that Will is looking to have the same effect as C11 memory_order_relaxed, > which is relaxed in this sense. If he has something else in mind, > he needs to tell us what it is and why. ;-) I was treating them purely as being single-copy atomic and not providing any memory ordering guarantees (much like the non *_return atomic operations that we already have). I think this lines up with C11, minus the bits about data races which we don't call out anyway. Will
On Mon, Oct 05, 2015 at 03:44:07PM +0100, Will Deacon wrote: > On Thu, Oct 01, 2015 at 07:03:01PM +0100, Paul E. McKenney wrote: > > On Thu, Oct 01, 2015 at 07:13:04PM +0200, Peter Zijlstra wrote: > > > On Thu, Oct 01, 2015 at 08:09:09AM -0700, Paul E. McKenney wrote: > > > > On Thu, Oct 01, 2015 at 02:24:40PM +0200, Peter Zijlstra wrote: > > > > > > > > I must say I'm somewhat surprised by this level of relaxation, I had > > > > > expected to only loose SMP barriers, not the program order ones. > > > > > > > > > > Is there a good argument for this? > > > > > > > > Yes, when we say "relaxed", we really mean relaxed. ;-) > > > > > > > > Both the CPU and the compiler are allowed to reorder around relaxed > > > > operations. > > > > > > Is this documented somewhere, because I completely missed this part. > > > > Well, yes, these need to be added to the documentation. I am assuming > > that Will is looking to have the same effect as C11 memory_order_relaxed, > > which is relaxed in this sense. If he has something else in mind, > > he needs to tell us what it is and why. ;-) > > I was treating them purely as being single-copy atomic and not providing > any memory ordering guarantees (much like the non *_return atomic operations > that we already have). I think this lines up with C11, minus the bits > about data races which we don't call out anyway. As long as it is single-copy atomic and not multi-copy atomic, I believe we are on the saem page. We have slowly been outlawing some sorts of data races over the past few years, and I would guess that this will continue, expecially if good tooling emerges (and KTSAN is showing some promise from what I can see). Thanx, Paul
Hi Paul, On Thu, Oct 01, 2015 at 11:03:01AM -0700, Paul E. McKenney wrote: > On Thu, Oct 01, 2015 at 07:13:04PM +0200, Peter Zijlstra wrote: > > On Thu, Oct 01, 2015 at 08:09:09AM -0700, Paul E. McKenney wrote: > > > On Thu, Oct 01, 2015 at 02:24:40PM +0200, Peter Zijlstra wrote: > > > > > > I must say I'm somewhat surprised by this level of relaxation, I had > > > > expected to only loose SMP barriers, not the program order ones. > > > > > > > > Is there a good argument for this? > > > > > > Yes, when we say "relaxed", we really mean relaxed. ;-) > > > > > > Both the CPU and the compiler are allowed to reorder around relaxed > > > operations. > > > > Is this documented somewhere, because I completely missed this part. > > Well, yes, these need to be added to the documentation. I am assuming Maybe it's good time for us to call it out which operation should be a compiler barrier or a CPU barrier? I had something in my mind while I was working on this series, not really sure whether it's correct, but probably a start point: All global and local atomic operations are at least atomic(no one can observe the middle state) and volatile(compilers can't optimize out the memory access). Based on this, there are four strictness levels, one can rely on them: RELAXED: neither a compiler barrier or a CPU barrier LOCAL: a compiler barrier PARTIAL: both a compiler barrier and a CPU barrier but not transitive FULL: both compiler barrier and a CPU barrier, and transitive. RELAXED includes all _relaxed variants and non-return atomics, LOCAL includes all local atomics(local_* and {cmp}xchg_local), PARTIAL includes _acquire and _release operations and FULL includes all fully ordered global atomic operations. Thoughts? Regards, Boqun > that Will is looking to have the same effect as C11 memory_order_relaxed, > which is relaxed in this sense. If he has something else in mind, > he needs to tell us what it is and why. ;-) > > Thanx, Paul >
On Mon, Oct 12, 2015 at 09:17:50AM +0800, Boqun Feng wrote: > On Thu, Oct 01, 2015 at 11:03:01AM -0700, Paul E. McKenney wrote: > > On Thu, Oct 01, 2015 at 07:13:04PM +0200, Peter Zijlstra wrote: > > > On Thu, Oct 01, 2015 at 08:09:09AM -0700, Paul E. McKenney wrote: > > > > On Thu, Oct 01, 2015 at 02:24:40PM +0200, Peter Zijlstra wrote: > > > > > > > > I must say I'm somewhat surprised by this level of relaxation, I had > > > > > expected to only loose SMP barriers, not the program order ones. > > > > > > > > > > Is there a good argument for this? > > > > > > > > Yes, when we say "relaxed", we really mean relaxed. ;-) > > > > > > > > Both the CPU and the compiler are allowed to reorder around relaxed > > > > operations. > > > > > > Is this documented somewhere, because I completely missed this part. > > > > Well, yes, these need to be added to the documentation. I am assuming > > Maybe it's good time for us to call it out which operation should be > a compiler barrier or a CPU barrier? > > I had something in my mind while I was working on this series, not > really sure whether it's correct, but probably a start point: > > All global and local atomic operations are at least atomic(no one can > observe the middle state) and volatile(compilers can't optimize out the > memory access). Based on this, there are four strictness levels, one > can rely on them: > > RELAXED: neither a compiler barrier or a CPU barrier > LOCAL: a compiler barrier > PARTIAL: both a compiler barrier and a CPU barrier but not transitive > FULL: both compiler barrier and a CPU barrier, and transitive. > > RELAXED includes all _relaxed variants and non-return atomics, LOCAL > includes all local atomics(local_* and {cmp}xchg_local), PARTIAL > includes _acquire and _release operations and FULL includes all fully > ordered global atomic operations. > > Thoughts? I think that's where we currently are already, apart from defining transitivity (see the other thread), which makes things a whole lot more muddy. That said, on Friday we seemed to be in broad agreement on the semantics -- the difficult part is getting the language right (which is why we started to discuss including litmus tests alongside the documentation). Will
On Mon, Oct 12, 2015 at 09:17:50AM +0800, Boqun Feng wrote: > Hi Paul, > > On Thu, Oct 01, 2015 at 11:03:01AM -0700, Paul E. McKenney wrote: > > On Thu, Oct 01, 2015 at 07:13:04PM +0200, Peter Zijlstra wrote: > > > On Thu, Oct 01, 2015 at 08:09:09AM -0700, Paul E. McKenney wrote: > > > > On Thu, Oct 01, 2015 at 02:24:40PM +0200, Peter Zijlstra wrote: > > > > > > > > I must say I'm somewhat surprised by this level of relaxation, I had > > > > > expected to only loose SMP barriers, not the program order ones. > > > > > > > > > > Is there a good argument for this? > > > > > > > > Yes, when we say "relaxed", we really mean relaxed. ;-) > > > > > > > > Both the CPU and the compiler are allowed to reorder around relaxed > > > > operations. > > > > > > Is this documented somewhere, because I completely missed this part. > > > > Well, yes, these need to be added to the documentation. I am assuming > > Maybe it's good time for us to call it out which operation should be > a compiler barrier or a CPU barrier? > > I had something in my mind while I was working on this series, not > really sure whether it's correct, but probably a start point: > > All global and local atomic operations are at least atomic(no one can > observe the middle state) and volatile(compilers can't optimize out the > memory access). Based on this, there are four strictness levels, one > can rely on them: > > RELAXED: neither a compiler barrier or a CPU barrier > LOCAL: a compiler barrier > PARTIAL: both a compiler barrier and a CPU barrier but not transitive > FULL: both compiler barrier and a CPU barrier, and transitive. As Will noted, we have two types of transitive. The first type is that of release-acquire chains, where the transitivity is only observable within the chain. The second type is that of smp_mb(), where the transitivity is observable globally. Thanx, Paul > RELAXED includes all _relaxed variants and non-return atomics, LOCAL > includes all local atomics(local_* and {cmp}xchg_local), PARTIAL > includes _acquire and _release operations and FULL includes all fully > ordered global atomic operations. > > Thoughts? > > Regards, > Boqun > > > that Will is looking to have the same effect as C11 memory_order_relaxed, > > which is relaxed in this sense. If he has something else in mind, > > he needs to tell us what it is and why. ;-) > > > > Thanx, Paul > >
diff --git a/arch/powerpc/include/asm/atomic.h b/arch/powerpc/include/asm/atomic.h index d1dcdcf..d9f570b 100644 --- a/arch/powerpc/include/asm/atomic.h +++ b/arch/powerpc/include/asm/atomic.h @@ -193,6 +193,7 @@ static __inline__ int atomic_dec_return(atomic_t *v) #define atomic_cmpxchg(v, o, n) (cmpxchg(&((v)->counter), (o), (n))) #define atomic_xchg(v, new) (xchg(&((v)->counter), new)) +#define atomic_xchg_relaxed(v, new) xchg_relaxed(&((v)->counter), (new)) /** * __atomic_add_unless - add unless the number is a given value @@ -461,6 +462,7 @@ static __inline__ long atomic64_dec_if_positive(atomic64_t *v) #define atomic64_cmpxchg(v, o, n) (cmpxchg(&((v)->counter), (o), (n))) #define atomic64_xchg(v, new) (xchg(&((v)->counter), new)) +#define atomic64_xchg_relaxed(v, new) xchg_relaxed(&((v)->counter), (new)) /** * atomic64_add_unless - add unless the number is a given value diff --git a/arch/powerpc/include/asm/cmpxchg.h b/arch/powerpc/include/asm/cmpxchg.h index ad6263c..66374f4 100644 --- a/arch/powerpc/include/asm/cmpxchg.h +++ b/arch/powerpc/include/asm/cmpxchg.h @@ -54,6 +54,32 @@ __xchg_u32_local(volatile void *p, unsigned long val) return prev; } +/* + * Atomic exchange relaxed + * + * Changes the memory location '*p' to be val and returns + * the previous value stored there. + * + * Note that this is not a compiler barrier, there is no order + * guarantee around. + */ +static __always_inline unsigned long +__xchg_u32_relaxed(u32 *p, unsigned long val) +{ + unsigned long prev; + + __asm__ __volatile__( +"1: lwarx %0,0,%2\n" + PPC405_ERR77(0, %2) +" stwcx. %3,0,%2\n" +" bne- 1b" + : "=&r" (prev), "+m" (*p) + : "r" (p), "r" (val) + : "cc"); + + return prev; +} + #ifdef CONFIG_PPC64 static __always_inline unsigned long __xchg_u64(volatile void *p, unsigned long val) @@ -90,6 +116,23 @@ __xchg_u64_local(volatile void *p, unsigned long val) return prev; } + +static __always_inline unsigned long +__xchg_u64_relaxed(u64 *p, unsigned long val) +{ + unsigned long prev; + + __asm__ __volatile__( +"1: ldarx %0,0,%2\n" + PPC405_ERR77(0, %2) +" stdcx. %3,0,%2\n" +" bne- 1b" + : "=&r" (prev), "+m" (*p) + : "r" (p), "r" (val) + : "cc"); + + return prev; +} #endif /* @@ -127,6 +170,21 @@ __xchg_local(volatile void *ptr, unsigned long x, unsigned int size) __xchg_called_with_bad_pointer(); return x; } + +static __always_inline unsigned long +__xchg_relaxed(void *ptr, unsigned long x, unsigned int size) +{ + switch (size) { + case 4: + return __xchg_u32_relaxed(ptr, x); +#ifdef CONFIG_PPC64 + case 8: + return __xchg_u64_relaxed(ptr, x); +#endif + } + __xchg_called_with_bad_pointer(); + return x; +} #define xchg(ptr,x) \ ({ \ __typeof__(*(ptr)) _x_ = (x); \ @@ -140,6 +198,12 @@ __xchg_local(volatile void *ptr, unsigned long x, unsigned int size) (unsigned long)_x_, sizeof(*(ptr))); \ }) +#define xchg_relaxed(ptr, x) \ +({ \ + __typeof__(*(ptr)) _x_ = (x); \ + (__typeof__(*(ptr))) __xchg_relaxed((ptr), \ + (unsigned long)_x_, sizeof(*(ptr))); \ +}) /* * Compare and exchange - if *p == old, set it to new, * and return the old value of *p.
Implement xchg_relaxed and define atomic{,64}_xchg_* as xchg_relaxed, based on these _relaxed variants, release/acquire variants can be built. Note that xchg_relaxed and atomic_{,64}_xchg_relaxed are not compiler barriers. Signed-off-by: Boqun Feng <boqun.feng@gmail.com> --- arch/powerpc/include/asm/atomic.h | 2 ++ arch/powerpc/include/asm/cmpxchg.h | 64 ++++++++++++++++++++++++++++++++++++++ 2 files changed, 66 insertions(+)