Message ID | 1444659246-24769-7-git-send-email-boqun.feng@gmail.com (mailing list archive) |
---|---|
State | Superseded, archived |
On Mon, Oct 12, 2015 at 10:14:06PM +0800, Boqun Feng wrote:
> Implement cmpxchg{,64}_relaxed and atomic{,64}_cmpxchg_relaxed, based on
> which _release variants can be built.
>
> To avoid superfluous barriers in _acquire variants, we implement these
> operations with assembly code rather than use __atomic_op_acquire() to
> build them automatically.

The "superfluous barriers" are for the case where the cmpxchg fails,
right?

And you don't do the same thing for release, because you want to avoid a
barrier in the middle of the critical section? (just checking I
understand your reasoning).

Will
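[For context, the generic acquire builder that the patch avoids looks roughly like the following. This is a sketch of include/linux/atomic.h as of that kernel era, not part of the patch; the point is that the barrier after the relaxed operation runs even when the cmpxchg fails, which is exactly the superfluous barrier in question.]

	/*
	 * Sketch of the generic _acquire builder (include/linux/atomic.h,
	 * circa v4.3).  smp_mb__after_atomic() is emitted unconditionally,
	 * so a failed cmpxchg still pays for a full barrier.  The powerpc
	 * assembly version instead branches past PPC_ACQUIRE_BARRIER when
	 * the compare fails.
	 */
	#define __atomic_op_acquire(op, args...)			\
	({								\
		typeof(op##_relaxed(args)) __ret = op##_relaxed(args);	\
		smp_mb__after_atomic();					\
		__ret;							\
	})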
On Tue, Oct 13, 2015 at 02:24:04PM +0100, Will Deacon wrote:
> On Mon, Oct 12, 2015 at 10:14:06PM +0800, Boqun Feng wrote:
> > Implement cmpxchg{,64}_relaxed and atomic{,64}_cmpxchg_relaxed, based on
> > which _release variants can be built.
> >
> > To avoid superfluous barriers in _acquire variants, we implement these
> > operations with assembly code rather than use __atomic_op_acquire() to
> > build them automatically.
>
> The "superfluous barriers" are for the case where the cmpxchg fails, right?

Yes.

> And you don't do the same thing for release, because you want to avoid a
> barrier in the middle of the critical section?
>

Mostly because of the comments in include/linux/atomic.h:

 * For compound atomics performing both a load and a store, ACQUIRE
 * semantics apply only to the load and RELEASE semantics only to the
 * store portion of the operation. Note that a failed cmpxchg_acquire
 * does -not- imply any memory ordering constraints.

so I thought only the barrier in cmpxchg_acquire() is conditional, and
the barrier in cmpxchg_release() is not. Maybe we'd better call it out
that the cmpxchg *family* doesn't have any ordering guarantee if the cmp
fails, as a complement of

ed2de9f74ecb ("locking/Documentation: Clarify failed cmpxchg() memory ordering semantics")

because it seems this commit only claims that the barriers in the fully
ordered version are conditional.

If cmpxchg_release doesn't have an ordering guarantee when it fails, I
guess I can implement it with a barrier in the middle, as you mentioned:

	unsigned int prev;

	__asm__ __volatile__ (
"1:	lwarx	%0,0,%2\n\
	cmpw	0,%0,%3\n\
	bne-	2f\n"
	PPC_RELEASE_BARRIER
"	stwcx.	%4,0,%2\n\
	bne-	1b"
	"\n\
2:"
	: "=&r" (prev), "+m" (*p)
	: "r" (p), "r" (old), "r" (new)
	: "cc", "memory");

	return prev;

However, I need to check whether the architecture allows this and
whether any other problem exists.

Besides, I don't think it's a good idea to do the "put a barrier in the
middle" thing in this patchset, because that seems a premature
optimization, and if we go further, I guess we could also replace the
PPC_RELEASE_BARRIER above with a "sync" to implement a fully ordered
version of cmpxchg(). Too much to investigate then...

> (just checking I understand your reasoning).
>

That actually helped me find a probably better implementation, if it's
allowed. Thank you ;-)

Regards,
Boqun
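[For comparison, the generic _release builder that ends up producing cmpxchg_release here is sketched below, again from the include/linux/atomic.h of the time. It hoists its barrier before the whole operation, so that barrier is unconditional and never lands between lwarx and stwcx.; this is the reading behind the comment quoted above.]

	/*
	 * Sketch of the generic _release builder (include/linux/atomic.h,
	 * circa v4.3).  The barrier precedes the entire LL/SC sequence,
	 * so it executes whether or not the cmpxchg succeeds, and it
	 * stays out of the reservation (lwarx/stwcx.) window.
	 */
	#define __atomic_op_release(op, args...)			\
	({								\
		smp_mb__before_atomic();				\
		op##_relaxed(args);					\
	})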
On Tue, Oct 13, 2015 at 10:32:59PM +0800, Boqun Feng wrote:
> On Tue, Oct 13, 2015 at 02:24:04PM +0100, Will Deacon wrote:
> > On Mon, Oct 12, 2015 at 10:14:06PM +0800, Boqun Feng wrote:
> > > Implement cmpxchg{,64}_relaxed and atomic{,64}_cmpxchg_relaxed, based on
> > > which _release variants can be built.
> > >
> > > To avoid superfluous barriers in _acquire variants, we implement these
> > > operations with assembly code rather than use __atomic_op_acquire() to
> > > build them automatically.
> >
> > The "superfluous barriers" are for the case where the cmpxchg fails, right?
>
> Yes.
>
> > And you don't do the same thing for release, because you want to avoid a
> > barrier in the middle of the critical section?
> >
>
> Mostly because of the comments in include/linux/atomic.h:
>
>  * For compound atomics performing both a load and a store, ACQUIRE
>  * semantics apply only to the load and RELEASE semantics only to the
>  * store portion of the operation. Note that a failed cmpxchg_acquire
>  * does -not- imply any memory ordering constraints.
>
> so I thought only the barrier in cmpxchg_acquire() is conditional, and
> the barrier in cmpxchg_release() is not. Maybe we'd better call it out
> that the cmpxchg *family* doesn't have any ordering guarantee if the cmp
> fails, as a complement of
>
> ed2de9f74ecb ("locking/Documentation: Clarify failed cmpxchg() memory ordering semantics")
>
> because it seems this commit only claims that the barriers in the fully
> ordered version are conditional.

I didn't think this was ambiguous... A failed cmpxchg_release doesn't
perform a store, so because the RELEASE semantics only apply to the
store portion of the operation, it therefore doesn't have any ordering
guarantees. Acquire is called out as a special case because it *does*
actually perform a load on the failure case.

> If cmpxchg_release doesn't have an ordering guarantee when it fails, I
> guess I can implement it with a barrier in the middle, as you mentioned:
>
> 	unsigned int prev;
>
> 	__asm__ __volatile__ (
> "1:	lwarx	%0,0,%2\n\
> 	cmpw	0,%0,%3\n\
> 	bne-	2f\n"
> 	PPC_RELEASE_BARRIER
> "	stwcx.	%4,0,%2\n\
> 	bne-	1b"
> 	"\n\
> 2:"
> 	: "=&r" (prev), "+m" (*p)
> 	: "r" (p), "r" (old), "r" (new)
> 	: "cc", "memory");
>
> 	return prev;
>
> However, I need to check whether the architecture allows this and
> whether any other problem exists.
>
> Besides, I don't think it's a good idea to do the "put a barrier in the
> middle" thing in this patchset, because that seems a premature
> optimization, and if we go further, I guess we could also replace the
> PPC_RELEASE_BARRIER above with a "sync" to implement a fully ordered
> version of cmpxchg(). Too much to investigate then...

Putting a barrier in the middle of that critical section is probably a
terrible idea, and that's why I thought you were avoiding it (hence my
original question). Perhaps just add a comment to that effect, since I
fear adding more words to memory-barriers.txt is just likely to create
further confusion.

Will
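[To make the point about failed cmpxchg_release concrete, here is a hypothetical caller; the names flag, data and publish() are made up for illustration and do not come from the thread.]

	/*
	 * Hypothetical illustration: relying on ordering from a *failed*
	 * cmpxchg_release() is broken.  RELEASE applies only to the store
	 * portion, and a failed cmpxchg performs no store.
	 */
	static unsigned int flag;
	static unsigned int data;

	static void publish(void)
	{
		unsigned int old;

		WRITE_ONCE(data, 1);
		/* Only on success is the store to 'flag' ordered after 'data'. */
		old = cmpxchg_release(&flag, 0, 1);
		if (old != 0) {
			/*
			 * Failure path: no store was performed, hence no
			 * RELEASE ordering; nothing here may assume the
			 * store to 'data' is ordered against anything.
			 */
		}
	}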
On Tue, Oct 13, 2015 at 10:32:59PM +0800, Boqun Feng wrote:
> On Tue, Oct 13, 2015 at 02:24:04PM +0100, Will Deacon wrote:
> > On Mon, Oct 12, 2015 at 10:14:06PM +0800, Boqun Feng wrote:
> > > Implement cmpxchg{,64}_relaxed and atomic{,64}_cmpxchg_relaxed, based on
> > > which _release variants can be built.
> > >
> > > To avoid superfluous barriers in _acquire variants, we implement these
> > > operations with assembly code rather than use __atomic_op_acquire() to
> > > build them automatically.
> >
> > The "superfluous barriers" are for the case where the cmpxchg fails, right?
>
> Yes.
>
> > And you don't do the same thing for release, because you want to avoid a
> > barrier in the middle of the critical section?
> >
>
> Mostly because of the comments in include/linux/atomic.h:
>
>  * For compound atomics performing both a load and a store, ACQUIRE
>  * semantics apply only to the load and RELEASE semantics only to the
>  * store portion of the operation. Note that a failed cmpxchg_acquire
>  * does -not- imply any memory ordering constraints.
>
> so I thought only the barrier in cmpxchg_acquire() is conditional, and
> the barrier in cmpxchg_release() is not. Maybe we'd better call it out
> that the cmpxchg *family* doesn't have any ordering guarantee if the cmp
> fails, as a complement of
>
> ed2de9f74ecb ("locking/Documentation: Clarify failed cmpxchg() memory ordering semantics")
>
> because it seems this commit only claims that the barriers in the fully
> ordered version are conditional.
>
> If cmpxchg_release doesn't have an ordering guarantee when it fails, I
> guess I can implement it with a barrier in the middle, as you mentioned:
>
> 	unsigned int prev;
>
> 	__asm__ __volatile__ (
> "1:	lwarx	%0,0,%2\n\
> 	cmpw	0,%0,%3\n\
> 	bne-	2f\n"
> 	PPC_RELEASE_BARRIER
> "	stwcx.	%4,0,%2\n\
> 	bne-	1b"
> 	"\n\
> 2:"
> 	: "=&r" (prev), "+m" (*p)
> 	: "r" (p), "r" (old), "r" (new)
> 	: "cc", "memory");
>
> 	return prev;
>
> However, I need to check whether the architecture allows this and
> whether any other problem exists.
>
> Besides, I don't think it's a good idea to do the "put a barrier in the
> middle" thing in this patchset, because that seems a premature
> optimization, and if we go further, I guess we could also replace the
> PPC_RELEASE_BARRIER above with a "sync" to implement a fully ordered

Correction: we can't just put the sync in the middle to implement a
fully ordered version. Sorry for that mistake...

Regards,
Boqun

> version of cmpxchg(). Too much to investigate then...
>
> > (just checking I understand your reasoning).
> >
>
> That actually helped me find a probably better implementation, if it's
> allowed. Thank you ;-)
>
> Regards,
> Boqun
On Tue, Oct 13, 2015 at 03:43:33PM +0100, Will Deacon wrote:
> On Tue, Oct 13, 2015 at 10:32:59PM +0800, Boqun Feng wrote:

[snip]

> >
> > Mostly because of the comments in include/linux/atomic.h:
> >
> >  * For compound atomics performing both a load and a store, ACQUIRE
> >  * semantics apply only to the load and RELEASE semantics only to the
> >  * store portion of the operation. Note that a failed cmpxchg_acquire
> >  * does -not- imply any memory ordering constraints.
> >
> > so I thought only the barrier in cmpxchg_acquire() is conditional, and
> > the barrier in cmpxchg_release() is not. Maybe we'd better call it out
> > that the cmpxchg *family* doesn't have any ordering guarantee if the cmp
> > fails, as a complement of
> >
> > ed2de9f74ecb ("locking/Documentation: Clarify failed cmpxchg() memory ordering semantics")
> >
> > because it seems this commit only claims that the barriers in the fully
> > ordered version are conditional.
>
> I didn't think this was ambiguous... A failed cmpxchg_release doesn't
> perform a store, so because the RELEASE semantics only apply to the
> store portion of the operation, it therefore doesn't have any ordering
> guarantees. Acquire is called out as a special case because it *does*
> actually perform a load on the failure case.
>

Makes sense.

> > If cmpxchg_release doesn't have an ordering guarantee when it fails, I
> > guess I can implement it with a barrier in the middle, as you mentioned:
> >
> > 	unsigned int prev;
> >
> > 	__asm__ __volatile__ (
> > "1:	lwarx	%0,0,%2\n\
> > 	cmpw	0,%0,%3\n\
> > 	bne-	2f\n"
> > 	PPC_RELEASE_BARRIER
> > "	stwcx.	%4,0,%2\n\
> > 	bne-	1b"
> > 	"\n\
> > 2:"
> > 	: "=&r" (prev), "+m" (*p)
> > 	: "r" (p), "r" (old), "r" (new)
> > 	: "cc", "memory");
> >
> > 	return prev;
> >
> > However, I need to check whether the architecture allows this and
> > whether any other problem exists.
> >
> > Besides, I don't think it's a good idea to do the "put a barrier in the
> > middle" thing in this patchset, because that seems a premature
> > optimization, and if we go further, I guess we could also replace the
> > PPC_RELEASE_BARRIER above with a "sync" to implement a fully ordered
> > version of cmpxchg(). Too much to investigate then...
>
> Putting a barrier in the middle of that critical section is probably a
> terrible idea, and that's why I thought you were avoiding it (hence my

The fact is that I hadn't thought of that way to implement
cmpxchg_release before you asked that question ;-) And I'm not going to
do that for now, and probably not in the future.

> original question). Perhaps just add a comment to that effect, since I

Are you suggesting that if I put a barrier in the middle, I'd better add
a comment, right? So if I don't do that, it's OK to leave this patch as
it is.

Regards,
Boqun

> fear adding more words to memory-barriers.txt is just likely to create
> further confusion.
>
> Will
On Tue, Oct 13, 2015 at 10:58:30PM +0800, Boqun Feng wrote:
> On Tue, Oct 13, 2015 at 03:43:33PM +0100, Will Deacon wrote:
> > Putting a barrier in the middle of that critical section is probably a
> > terrible idea, and that's why I thought you were avoiding it (hence my
>
> The fact is that I hadn't thought of that way to implement
> cmpxchg_release before you asked that question ;-) And I'm not going to
> do that for now, and probably not in the future.
>
> > original question). Perhaps just add a comment to that effect, since I
>
> Are you suggesting that if I put a barrier in the middle, I'd better add
> a comment, right? So if I don't do that, it's OK to leave this patch as
> it is.

No, I mean put a comment in your file to explain the reason why you
override _relaxed and _acquire, but not _release (because overriding
_release would introduce this weird barrier in the middle of the
critical section, which would likely cause the conditional store to
fail).

Will
On Tue, Oct 13, 2015 at 04:04:27PM +0100, Will Deacon wrote:
> On Tue, Oct 13, 2015 at 10:58:30PM +0800, Boqun Feng wrote:
> > On Tue, Oct 13, 2015 at 03:43:33PM +0100, Will Deacon wrote:
> > > Putting a barrier in the middle of that critical section is probably a
> > > terrible idea, and that's why I thought you were avoiding it (hence my
> >
> > The fact is that I hadn't thought of that way to implement
> > cmpxchg_release before you asked that question ;-) And I'm not going to
> > do that for now, and probably not in the future.
> >
> > > original question). Perhaps just add a comment to that effect, since I
> >
> > Are you suggesting that if I put a barrier in the middle, I'd better add
> > a comment, right? So if I don't do that, it's OK to leave this patch as
> > it is.
>
> No, I mean put a comment in your file to explain the reason why you
> override _relaxed and _acquire, but not _release (because overriding
> _release would introduce this weird barrier in the middle of the
> critical section, which would likely cause the conditional store to
> fail).
>

Good idea, I will do that. Thank you ;-)

Regards,
Boqun
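[A hypothetical wording for such a comment, sketched here for illustration and not the text that eventually landed in arch/powerpc/include/asm/cmpxchg.h, might read:]

	/*
	 * cmpxchg_relaxed, cmpxchg_acquire and the fully ordered cmpxchg
	 * are implemented in assembly so that a failed compare skips the
	 * barrier.  cmpxchg_release is left to the generic
	 * __atomic_op_release() builder: overriding it would put a release
	 * barrier between lwarx and stwcx., inside the critical section,
	 * which would likely cause the conditional store to fail.
	 */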
On Tue, Oct 13, 2015 at 04:04:27PM +0100, Will Deacon wrote:
> On Tue, Oct 13, 2015 at 10:58:30PM +0800, Boqun Feng wrote:
> > On Tue, Oct 13, 2015 at 03:43:33PM +0100, Will Deacon wrote:
> > > Putting a barrier in the middle of that critical section is probably a
> > > terrible idea, and that's why I thought you were avoiding it (hence my
> >
> > The fact is that I hadn't thought of that way to implement
> > cmpxchg_release before you asked that question ;-) And I'm not going to
> > do that for now, and probably not in the future.
> >
> > > original question). Perhaps just add a comment to that effect, since I
> >
> > Are you suggesting that if I put a barrier in the middle, I'd better add
> > a comment, right? So if I don't do that, it's OK to leave this patch as
> > it is.
>
> No, I mean put a comment in your file to explain the reason why you
> override _relaxed and _acquire, but not _release (because overriding

You mean overriding _acquire and the fully ordered version, right?

> _release would introduce this weird barrier in the middle of the
> critical section, which would likely cause the conditional store to
> fail).
>
> Will
On Wed, Oct 14, 2015 at 09:47:35AM +0800, Boqun Feng wrote:
> On Tue, Oct 13, 2015 at 04:04:27PM +0100, Will Deacon wrote:
> > On Tue, Oct 13, 2015 at 10:58:30PM +0800, Boqun Feng wrote:
> > > On Tue, Oct 13, 2015 at 03:43:33PM +0100, Will Deacon wrote:
> > > > Putting a barrier in the middle of that critical section is probably a
> > > > terrible idea, and that's why I thought you were avoiding it (hence my
> > >
> > > The fact is that I hadn't thought of that way to implement
> > > cmpxchg_release before you asked that question ;-) And I'm not going to
> > > do that for now, and probably not in the future.
> > >
> > > > original question). Perhaps just add a comment to that effect, since I
> > >
> > > Are you suggesting that if I put a barrier in the middle, I'd better add
> > > a comment, right? So if I don't do that, it's OK to leave this patch as
> > > it is.
> >
> > No, I mean put a comment in your file to explain the reason why you
> > override _relaxed and _acquire, but not _release (because overriding
>
> You mean overriding _acquire and the fully ordered version, right?

Yes, my mistake. Sounds like you get my drift, though.

Will
diff --git a/arch/powerpc/include/asm/atomic.h b/arch/powerpc/include/asm/atomic.h
index 8869fb3..9f44515 100644
--- a/arch/powerpc/include/asm/atomic.h
+++ b/arch/powerpc/include/asm/atomic.h
@@ -191,6 +191,11 @@ static __inline__ int atomic_dec_return_relaxed(atomic_t *v)
 #define atomic_dec_return_relaxed atomic_dec_return_relaxed
 
 #define atomic_cmpxchg(v, o, n) (cmpxchg(&((v)->counter), (o), (n)))
+#define atomic_cmpxchg_relaxed(v, o, n) \
+	cmpxchg_relaxed(&((v)->counter), (o), (n))
+#define atomic_cmpxchg_acquire(v, o, n) \
+	cmpxchg_acquire(&((v)->counter), (o), (n))
+
 #define atomic_xchg(v, new) (xchg(&((v)->counter), new))
 #define atomic_xchg_relaxed(v, new) xchg_relaxed(&((v)->counter), (new))
 
@@ -459,6 +464,11 @@ static __inline__ long atomic64_dec_if_positive(atomic64_t *v)
 }
 
 #define atomic64_cmpxchg(v, o, n) (cmpxchg(&((v)->counter), (o), (n)))
+#define atomic64_cmpxchg_relaxed(v, o, n) \
+	cmpxchg_relaxed(&((v)->counter), (o), (n))
+#define atomic64_cmpxchg_acquire(v, o, n) \
+	cmpxchg_acquire(&((v)->counter), (o), (n))
+
 #define atomic64_xchg(v, new) (xchg(&((v)->counter), new))
 #define atomic64_xchg_relaxed(v, new) xchg_relaxed(&((v)->counter), (new))
 
diff --git a/arch/powerpc/include/asm/cmpxchg.h b/arch/powerpc/include/asm/cmpxchg.h
index 17c7e14..fd4ea04 100644
--- a/arch/powerpc/include/asm/cmpxchg.h
+++ b/arch/powerpc/include/asm/cmpxchg.h
@@ -181,6 +181,48 @@ __cmpxchg_u32_local(volatile unsigned int *p, unsigned long old,
 	return prev;
 }
 
+static __always_inline unsigned long
+__cmpxchg_u32_relaxed(u32 *p, unsigned long old, unsigned long new)
+{
+	unsigned long prev;
+
+	__asm__ __volatile__ (
+"1:	lwarx	%0,0,%2		# __cmpxchg_u32_relaxed\n"
+"	cmpw	0,%0,%3\n"
+"	bne-	2f\n"
+	PPC405_ERR77(0, %2)
+"	stwcx.	%4,0,%2\n"
+"	bne-	1b\n"
+"2:"
+	: "=&r" (prev), "+m" (*p)
+	: "r" (p), "r" (old), "r" (new)
+	: "cc");
+
+	return prev;
+}
+
+static __always_inline unsigned long
+__cmpxchg_u32_acquire(u32 *p, unsigned long old, unsigned long new)
+{
+	unsigned long prev;
+
+	__asm__ __volatile__ (
+"1:	lwarx	%0,0,%2		# __cmpxchg_u32_acquire\n"
+"	cmpw	0,%0,%3\n"
+"	bne-	2f\n"
+	PPC405_ERR77(0, %2)
+"	stwcx.	%4,0,%2\n"
+"	bne-	1b\n"
+	PPC_ACQUIRE_BARRIER
+	"\n"
+"2:"
+	: "=&r" (prev), "+m" (*p)
+	: "r" (p), "r" (old), "r" (new)
+	: "cc", "memory");
+
+	return prev;
+}
+
 #ifdef CONFIG_PPC64
 static __always_inline unsigned long
 __cmpxchg_u64(volatile unsigned long *p, unsigned long old, unsigned long new)
@@ -224,6 +266,46 @@ __cmpxchg_u64_local(volatile unsigned long *p, unsigned long old,
 	return prev;
 }
 
+static __always_inline unsigned long
+__cmpxchg_u64_relaxed(u64 *p, unsigned long old, unsigned long new)
+{
+	unsigned long prev;
+
+	__asm__ __volatile__ (
+"1:	ldarx	%0,0,%2		# __cmpxchg_u64_relaxed\n"
+"	cmpd	0,%0,%3\n"
+"	bne-	2f\n"
+"	stdcx.	%4,0,%2\n"
+"	bne-	1b\n"
+"2:"
+	: "=&r" (prev), "+m" (*p)
+	: "r" (p), "r" (old), "r" (new)
+	: "cc");
+
+	return prev;
+}
+
+static __always_inline unsigned long
+__cmpxchg_u64_acquire(u64 *p, unsigned long old, unsigned long new)
+{
+	unsigned long prev;
+
+	__asm__ __volatile__ (
+"1:	ldarx	%0,0,%2		# __cmpxchg_u64_acquire\n"
+"	cmpd	0,%0,%3\n"
+"	bne-	2f\n"
+"	stdcx.	%4,0,%2\n"
+"	bne-	1b\n"
+	PPC_ACQUIRE_BARRIER
+	"\n"
+"2:"
+	: "=&r" (prev), "+m" (*p)
+	: "r" (p), "r" (old), "r" (new)
+	: "cc", "memory");
+
+	return prev;
+}
 #endif
 
 /* This function doesn't exist, so you'll get a linker error
@@ -262,6 +344,37 @@ __cmpxchg_local(volatile void *ptr, unsigned long old, unsigned long new,
 	return old;
 }
 
+static __always_inline unsigned long
+__cmpxchg_relaxed(void *ptr, unsigned long old, unsigned long new,
+		  unsigned int size)
+{
+	switch (size) {
+	case 4:
+		return __cmpxchg_u32_relaxed(ptr, old, new);
+#ifdef CONFIG_PPC64
+	case 8:
+		return __cmpxchg_u64_relaxed(ptr, old, new);
+#endif
+	}
+	__cmpxchg_called_with_bad_pointer();
+	return old;
+}
+
+static __always_inline unsigned long
+__cmpxchg_acquire(void *ptr, unsigned long old, unsigned long new,
+		  unsigned int size)
+{
+	switch (size) {
+	case 4:
+		return __cmpxchg_u32_acquire(ptr, old, new);
+#ifdef CONFIG_PPC64
+	case 8:
+		return __cmpxchg_u64_acquire(ptr, old, new);
+#endif
+	}
+	__cmpxchg_called_with_bad_pointer();
+	return old;
+}
 #define cmpxchg(ptr, o, n)						\
 ({									\
 	__typeof__(*(ptr)) _o_ = (o);					\
@@ -279,6 +392,23 @@ __cmpxchg_local(volatile void *ptr, unsigned long old, unsigned long new,
 			(unsigned long)_n_, sizeof(*(ptr)));		\
 })
 
+#define cmpxchg_relaxed(ptr, o, n)					\
+({									\
+	__typeof__(*(ptr)) _o_ = (o);					\
+	__typeof__(*(ptr)) _n_ = (n);					\
+	(__typeof__(*(ptr))) __cmpxchg_relaxed((ptr),			\
+			(unsigned long)_o_, (unsigned long)_n_,		\
+			sizeof(*(ptr)));				\
+})
+
+#define cmpxchg_acquire(ptr, o, n)					\
+({									\
+	__typeof__(*(ptr)) _o_ = (o);					\
+	__typeof__(*(ptr)) _n_ = (n);					\
+	(__typeof__(*(ptr))) __cmpxchg_acquire((ptr),			\
+			(unsigned long)_o_, (unsigned long)_n_,		\
+			sizeof(*(ptr)));				\
+})
 #ifdef CONFIG_PPC64
 #define cmpxchg64(ptr, o, n)						\
 ({									\
@@ -290,7 +420,16 @@ __cmpxchg_local(volatile void *ptr, unsigned long old, unsigned long new,
 	BUILD_BUG_ON(sizeof(*(ptr)) != 8);				\
 	cmpxchg_local((ptr), (o), (n));					\
 })
-#define cmpxchg64_relaxed	cmpxchg64_local
+#define cmpxchg64_relaxed(ptr, o, n)					\
+({									\
+	BUILD_BUG_ON(sizeof(*(ptr)) != 8);				\
+	cmpxchg_relaxed((ptr), (o), (n));				\
+})
+#define cmpxchg64_acquire(ptr, o, n)					\
+({									\
+	BUILD_BUG_ON(sizeof(*(ptr)) != 8);				\
+	cmpxchg_acquire((ptr), (o), (n));				\
+})
 #else
 #include <asm-generic/cmpxchg-local.h>
 #define cmpxchg64_local(ptr, o, n) __cmpxchg64_local_generic((ptr), (o), (n))
Implement cmpxchg{,64}_relaxed and atomic{,64}_cmpxchg_relaxed, based on
which _release variants can be built.

To avoid superfluous barriers in _acquire variants, we implement these
operations with assembly code rather than use __atomic_op_acquire() to
build them automatically. For the same reason, we keep the assembly
implementation of the fully ordered cmpxchg operations.

Note that cmpxchg{,64}_relaxed and atomic{,64}_cmpxchg_relaxed are not
compiler barriers.

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
---
 arch/powerpc/include/asm/atomic.h  |  10 +++
 arch/powerpc/include/asm/cmpxchg.h | 141 ++++++++++++++++++++++++++++++++++++-
 2 files changed, 150 insertions(+), 1 deletion(-)
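[As a usage sketch, the _acquire variant introduced by this patch fits a trylock-style fast path; the caller below is hypothetical, not part of the patch, and per the discussion above nothing may be inferred from the failure path.]

	/* Hypothetical trylock built on the new primitive. */
	static inline bool my_trylock(atomic_t *lock)
	{
		/*
		 * On success, ACQUIRE applies to the load portion, so the
		 * critical section cannot be reordered before the lock is
		 * taken.  On failure, no ordering is implied at all.
		 */
		return atomic_cmpxchg_acquire(lock, 0, 1) == 0;
	}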