
[RFC,v2,4/7] powerpc: atomic: Implement xchg_* and atomic{,64}_xchg_* variants

Message ID 1442418575-12297-5-git-send-email-boqun.feng@gmail.com (mailing list archive)
State Superseded, archived

Commit Message

Boqun Feng Sept. 16, 2015, 3:49 p.m. UTC
Implement xchg_relaxed and define atomic{,64}_xchg_relaxed as xchg_relaxed;
based on these _relaxed variants, release/acquire variants can be built.

Note that xchg_relaxed and atomic{,64}_xchg_relaxed are not compiler
barriers.
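The layering pattern can be sketched in portable C11 (an illustrative userspace analogue only, with made-up `my_` names; the kernel series does the equivalent with PowerPC barrier instructions in inline asm):

```c
#include <stdatomic.h>

/* Illustrative sketch (not the powerpc code): acquire/release exchange
 * operations layered on a relaxed exchange using explicit fences.
 * All names here are invented for the example. */
static unsigned long my_xchg_relaxed(atomic_ulong *p, unsigned long v)
{
	return atomic_exchange_explicit(p, v, memory_order_relaxed);
}

static unsigned long my_xchg_acquire(atomic_ulong *p, unsigned long v)
{
	unsigned long old = my_xchg_relaxed(p, v);

	atomic_thread_fence(memory_order_acquire); /* orders later accesses */
	return old;
}

static unsigned long my_xchg_release(atomic_ulong *p, unsigned long v)
{
	atomic_thread_fence(memory_order_release); /* orders earlier accesses */
	return my_xchg_relaxed(p, v);
}
```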

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
---
 arch/powerpc/include/asm/atomic.h  |  2 ++
 arch/powerpc/include/asm/cmpxchg.h | 64 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 66 insertions(+)

Comments

Peter Zijlstra Oct. 1, 2015, 12:24 p.m. UTC | #1
On Wed, Sep 16, 2015 at 11:49:32PM +0800, Boqun Feng wrote:
> Implement xchg_relaxed and define atomic{,64}_xchg_relaxed as xchg_relaxed;
> based on these _relaxed variants, release/acquire variants can be built.
> 
> Note that xchg_relaxed and atomic{,64}_xchg_relaxed are not compiler
> barriers.

Hmm, and I note your previous patch creating the regular _relaxed
thingies also removes the memory clobber.

And looking at the ARM _relaxed patch from Will, I see their _relaxed
ops are also not a compiler barrier.

I must say I'm somewhat surprised by this level of relaxation, I had
expected to only lose SMP barriers, not the program order ones.

Is there a good argument for this?
Paul E. McKenney Oct. 1, 2015, 3:09 p.m. UTC | #2
On Thu, Oct 01, 2015 at 02:24:40PM +0200, Peter Zijlstra wrote:
> On Wed, Sep 16, 2015 at 11:49:32PM +0800, Boqun Feng wrote:
> > Implement xchg_relaxed and define atomic{,64}_xchg_relaxed as xchg_relaxed;
> > based on these _relaxed variants, release/acquire variants can be built.
> > 
> > Note that xchg_relaxed and atomic{,64}_xchg_relaxed are not compiler
> > barriers.
> 
> Hmm, and I note your previous patch creating the regular _relaxed
> thingies also removes the memory clobber.
> 
> And looking at the ARM _relaxed patch from Will, I see their _relaxed
> ops are also not a compiler barrier.
> 
> I must say I'm somewhat surprised by this level of relaxation, I had
> expected to only lose SMP barriers, not the program order ones.
> 
> Is there a good argument for this?

Yes, when we say "relaxed", we really mean relaxed.  ;-)

Both the CPU and the compiler are allowed to reorder around relaxed
operations.
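A minimal C11 illustration of what that permits (hypothetical `publish_*` names; the difference is only observable from another thread, so single-threaded results are identical):

```c
#include <stdatomic.h>

/* With a relaxed exchange there is no compiler or CPU ordering, so the
 * plain store to 'data' may legally be moved past the exchange --
 * matching the patch's removal of the "memory" clobber. With a release
 * exchange, both compiler and CPU must keep the store before it. */
int data;
atomic_int flag;

int publish_relaxed(int d)
{
	data = d;	/* may be reordered past the exchange */
	return atomic_exchange_explicit(&flag, 1, memory_order_relaxed);
}

int publish_release(int d)
{
	data = d;	/* must stay before the exchange */
	return atomic_exchange_explicit(&flag, 1, memory_order_release);
}
```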

							Thanx, Paul
Peter Zijlstra Oct. 1, 2015, 5:13 p.m. UTC | #3
On Thu, Oct 01, 2015 at 08:09:09AM -0700, Paul E. McKenney wrote:
> On Thu, Oct 01, 2015 at 02:24:40PM +0200, Peter Zijlstra wrote:

> > I must say I'm somewhat surprised by this level of relaxation, I had
> > expected to only lose SMP barriers, not the program order ones.
> > 
> > Is there a good argument for this?
> 
> Yes, when we say "relaxed", we really mean relaxed.  ;-)
> 
> Both the CPU and the compiler are allowed to reorder around relaxed
> operations.

Is this documented somewhere, because I completely missed this part.
Paul E. McKenney Oct. 1, 2015, 6:03 p.m. UTC | #4
On Thu, Oct 01, 2015 at 07:13:04PM +0200, Peter Zijlstra wrote:
> On Thu, Oct 01, 2015 at 08:09:09AM -0700, Paul E. McKenney wrote:
> > On Thu, Oct 01, 2015 at 02:24:40PM +0200, Peter Zijlstra wrote:
> 
> > > I must say I'm somewhat surprised by this level of relaxation, I had
> > > expected to only lose SMP barriers, not the program order ones.
> > > 
> > > Is there a good argument for this?
> > 
> > Yes, when we say "relaxed", we really mean relaxed.  ;-)
> > 
> > Both the CPU and the compiler are allowed to reorder around relaxed
> > operations.
> 
> Is this documented somewhere, because I completely missed this part.

Well, yes, these need to be added to the documentation.  I am assuming
that Will is looking to have the same effect as C11 memory_order_relaxed,
which is relaxed in this sense.  If he has something else in mind,
he needs to tell us what it is and why.  ;-)
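For reference, the canonical legitimate use of C11 memory_order_relaxed is something like a statistics counter, where each increment must be atomic but needs no ordering against surrounding accesses (names below are illustrative):

```c
#include <pthread.h>
#include <stdatomic.h>

/* A statistics counter bumped from several threads with relaxed
 * atomics: every increment is atomic, so the total is exact, yet no
 * compiler or CPU barrier is implied around each add. */
static atomic_long nhits;

static void *worker(void *arg)
{
	(void)arg;
	for (int i = 0; i < 100000; i++)
		atomic_fetch_add_explicit(&nhits, 1, memory_order_relaxed);
	return NULL;
}

long count_with_threads(int nthreads)
{
	pthread_t tid[16];

	if (nthreads > 16)
		nthreads = 16;
	atomic_store_explicit(&nhits, 0, memory_order_relaxed);
	for (int i = 0; i < nthreads; i++)
		pthread_create(&tid[i], NULL, worker, NULL);
	for (int i = 0; i < nthreads; i++)
		pthread_join(tid[i], NULL);	/* join synchronizes, so the
						 * final load is exact */
	return atomic_load_explicit(&nhits, memory_order_relaxed);
}
```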

							Thanx, Paul
Peter Zijlstra Oct. 1, 2015, 6:23 p.m. UTC | #5
On Thu, Oct 01, 2015 at 11:03:01AM -0700, Paul E. McKenney wrote:
> On Thu, Oct 01, 2015 at 07:13:04PM +0200, Peter Zijlstra wrote:
> > On Thu, Oct 01, 2015 at 08:09:09AM -0700, Paul E. McKenney wrote:
> > > On Thu, Oct 01, 2015 at 02:24:40PM +0200, Peter Zijlstra wrote:
> > 
> > > > I must say I'm somewhat surprised by this level of relaxation, I had
> > > > expected to only lose SMP barriers, not the program order ones.
> > > > 
> > > > Is there a good argument for this?
> > > 
> > > Yes, when we say "relaxed", we really mean relaxed.  ;-)
> > > 
> > > Both the CPU and the compiler are allowed to reorder around relaxed
> > > operations.
> > 
> > Is this documented somewhere, because I completely missed this part.
> 
> Well, yes, these need to be added to the documentation.  I am assuming
> that Will is looking to have the same effect as C11 memory_order_relaxed,
> which is relaxed in this sense.  If he has something else in mind,
> he needs to tell us what it is and why.  ;-)

I suspect he is; but I'm not _that_ up to date on the whole C11 stuff.
Paul E. McKenney Oct. 1, 2015, 7:41 p.m. UTC | #6
On Thu, Oct 01, 2015 at 08:23:09PM +0200, Peter Zijlstra wrote:
> On Thu, Oct 01, 2015 at 11:03:01AM -0700, Paul E. McKenney wrote:
> > On Thu, Oct 01, 2015 at 07:13:04PM +0200, Peter Zijlstra wrote:
> > > On Thu, Oct 01, 2015 at 08:09:09AM -0700, Paul E. McKenney wrote:
> > > > On Thu, Oct 01, 2015 at 02:24:40PM +0200, Peter Zijlstra wrote:
> > > 
> > > > > I must say I'm somewhat surprised by this level of relaxation, I had
> > > > > expected to only lose SMP barriers, not the program order ones.
> > > > > 
> > > > > Is there a good argument for this?
> > > > 
> > > > Yes, when we say "relaxed", we really mean relaxed.  ;-)
> > > > 
> > > > Both the CPU and the compiler are allowed to reorder around relaxed
> > > > operations.
> > > 
> > > Is this documented somewhere, because I completely missed this part.
> > 
> > Well, yes, these need to be added to the documentation.  I am assuming
> > that Will is looking to have the same effect as C11 memory_order_relaxed,
> > which is relaxed in this sense.  If he has something else in mind,
> > he needs to tell us what it is and why.  ;-)
> 
> I suspect he is; but I'm not _that_ up to date on the whole C11 stuff.

Lucky you!  ;-)

							Thanx, Paul
Will Deacon Oct. 5, 2015, 2:44 p.m. UTC | #7
On Thu, Oct 01, 2015 at 07:03:01PM +0100, Paul E. McKenney wrote:
> On Thu, Oct 01, 2015 at 07:13:04PM +0200, Peter Zijlstra wrote:
> > On Thu, Oct 01, 2015 at 08:09:09AM -0700, Paul E. McKenney wrote:
> > > On Thu, Oct 01, 2015 at 02:24:40PM +0200, Peter Zijlstra wrote:
> > 
> > > > I must say I'm somewhat surprised by this level of relaxation, I had
> > > > expected to only lose SMP barriers, not the program order ones.
> > > > 
> > > > Is there a good argument for this?
> > > 
> > > Yes, when we say "relaxed", we really mean relaxed.  ;-)
> > > 
> > > Both the CPU and the compiler are allowed to reorder around relaxed
> > > operations.
> > 
> > Is this documented somewhere, because I completely missed this part.
> 
> Well, yes, these need to be added to the documentation.  I am assuming
> that Will is looking to have the same effect as C11 memory_order_relaxed,
> which is relaxed in this sense.  If he has something else in mind,
> he needs to tell us what it is and why.  ;-)

I was treating them purely as being single-copy atomic and not providing
any memory ordering guarantees (much like the non *_return atomic operations
that we already have). I think this lines up with C11, minus the bits
about data races which we don't call out anyway.

Will
Paul E. McKenney Oct. 5, 2015, 4:57 p.m. UTC | #8
On Mon, Oct 05, 2015 at 03:44:07PM +0100, Will Deacon wrote:
> On Thu, Oct 01, 2015 at 07:03:01PM +0100, Paul E. McKenney wrote:
> > On Thu, Oct 01, 2015 at 07:13:04PM +0200, Peter Zijlstra wrote:
> > > On Thu, Oct 01, 2015 at 08:09:09AM -0700, Paul E. McKenney wrote:
> > > > On Thu, Oct 01, 2015 at 02:24:40PM +0200, Peter Zijlstra wrote:
> > > 
> > > > > I must say I'm somewhat surprised by this level of relaxation, I had
> > > > > expected to only lose SMP barriers, not the program order ones.
> > > > > 
> > > > > Is there a good argument for this?
> > > > 
> > > > Yes, when we say "relaxed", we really mean relaxed.  ;-)
> > > > 
> > > > Both the CPU and the compiler are allowed to reorder around relaxed
> > > > operations.
> > > 
> > > Is this documented somewhere, because I completely missed this part.
> > 
> > Well, yes, these need to be added to the documentation.  I am assuming
> > that Will is looking to have the same effect as C11 memory_order_relaxed,
> > which is relaxed in this sense.  If he has something else in mind,
> > he needs to tell us what it is and why.  ;-)
> 
> I was treating them purely as being single-copy atomic and not providing
> any memory ordering guarantees (much like the non *_return atomic operations
> that we already have). I think this lines up with C11, minus the bits
> about data races which we don't call out anyway.

As long as it is single-copy atomic and not multi-copy atomic, I believe
we are on the same page.  We have slowly been outlawing some sorts of
data races over the past few years, and I would guess that this will
continue, especially if good tooling emerges (and KTSAN is showing some
promise from what I can see).

							Thanx, Paul
Boqun Feng Oct. 12, 2015, 1:17 a.m. UTC | #9
Hi Paul,

On Thu, Oct 01, 2015 at 11:03:01AM -0700, Paul E. McKenney wrote:
> On Thu, Oct 01, 2015 at 07:13:04PM +0200, Peter Zijlstra wrote:
> > On Thu, Oct 01, 2015 at 08:09:09AM -0700, Paul E. McKenney wrote:
> > > On Thu, Oct 01, 2015 at 02:24:40PM +0200, Peter Zijlstra wrote:
> > 
> > > > I must say I'm somewhat surprised by this level of relaxation, I had
> > > > expected to only lose SMP barriers, not the program order ones.
> > > > 
> > > > Is there a good argument for this?
> > > 
> > > Yes, when we say "relaxed", we really mean relaxed.  ;-)
> > > 
> > > Both the CPU and the compiler are allowed to reorder around relaxed
> > > operations.
> > 
> > Is this documented somewhere, because I completely missed this part.
> 
> Well, yes, these need to be added to the documentation.  I am assuming

Maybe it's a good time for us to call out which operations should be
compiler barriers or CPU barriers?

I had something in mind while I was working on this series; I'm not
really sure whether it's correct, but it's probably a starting point:

All global and local atomic operations are at least atomic (no one can
observe an intermediate state) and volatile (compilers can't optimize out
the memory access). Based on this, there are four strictness levels one
can rely on:

RELAXED: neither a compiler barrier nor a CPU barrier
LOCAL: a compiler barrier
PARTIAL: both a compiler barrier and a CPU barrier, but not transitive
FULL: both a compiler barrier and a CPU barrier, and transitive.

RELAXED includes all _relaxed variants and non-return atomics, LOCAL
includes all local atomics (local_* and {cmp}xchg_local), PARTIAL
includes the _acquire and _release operations, and FULL includes all fully
ordered global atomic operations.
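One possible (unofficial) reading of these four levels against the C11 orderings, using exchange as the example operation; this is a sketch for comparison only, not settled kernel semantics:

```c
#include <stdatomic.h>

atomic_int v;

/* RELAXED: atomic and not optimizable away, but no barriers at all. */
int level_relaxed(int n)
{
	return atomic_exchange_explicit(&v, n, memory_order_relaxed);
}

/* LOCAL: adds only a compiler barrier; atomic_signal_fence constrains
 * the compiler without emitting any CPU barrier instruction. */
int level_local(int n)
{
	int old;

	atomic_signal_fence(memory_order_seq_cst);
	old = atomic_exchange_explicit(&v, n, memory_order_relaxed);
	atomic_signal_fence(memory_order_seq_cst);
	return old;
}

/* PARTIAL: compiler and CPU barriers with acquire/release flavour. */
int level_partial(int n)
{
	return atomic_exchange_explicit(&v, n, memory_order_acq_rel);
}

/* FULL: sequentially consistent; whether that matches the kernel's
 * globally transitive smp_mb() semantics is exactly what is being
 * debated in this thread. */
int level_full(int n)
{
	return atomic_exchange_explicit(&v, n, memory_order_seq_cst);
}
```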

Thoughts?

Regards,
Boqun

> that Will is looking to have the same effect as C11 memory_order_relaxed,
> which is relaxed in this sense.  If he has something else in mind,
> he needs to tell us what it is and why.  ;-)
> 
> 							Thanx, Paul
>
Will Deacon Oct. 12, 2015, 9:28 a.m. UTC | #10
On Mon, Oct 12, 2015 at 09:17:50AM +0800, Boqun Feng wrote:
> On Thu, Oct 01, 2015 at 11:03:01AM -0700, Paul E. McKenney wrote:
> > On Thu, Oct 01, 2015 at 07:13:04PM +0200, Peter Zijlstra wrote:
> > > On Thu, Oct 01, 2015 at 08:09:09AM -0700, Paul E. McKenney wrote:
> > > > On Thu, Oct 01, 2015 at 02:24:40PM +0200, Peter Zijlstra wrote:
> > > 
> > > > > I must say I'm somewhat surprised by this level of relaxation, I had
> > > > > expected to only lose SMP barriers, not the program order ones.
> > > > > 
> > > > > Is there a good argument for this?
> > > > 
> > > > Yes, when we say "relaxed", we really mean relaxed.  ;-)
> > > > 
> > > > Both the CPU and the compiler are allowed to reorder around relaxed
> > > > operations.
> > > 
> > > Is this documented somewhere, because I completely missed this part.
> > 
> > Well, yes, these need to be added to the documentation.  I am assuming
> 
> Maybe it's a good time for us to call out which operations should be
> compiler barriers or CPU barriers?
> 
> I had something in mind while I was working on this series; I'm not
> really sure whether it's correct, but it's probably a starting point:
> 
> All global and local atomic operations are at least atomic (no one can
> observe an intermediate state) and volatile (compilers can't optimize out
> the memory access). Based on this, there are four strictness levels one
> can rely on:
> 
> RELAXED: neither a compiler barrier nor a CPU barrier
> LOCAL: a compiler barrier
> PARTIAL: both a compiler barrier and a CPU barrier, but not transitive
> FULL: both a compiler barrier and a CPU barrier, and transitive.
> 
> RELAXED includes all _relaxed variants and non-return atomics, LOCAL
> includes all local atomics (local_* and {cmp}xchg_local), PARTIAL
> includes the _acquire and _release operations, and FULL includes all fully
> ordered global atomic operations.
> 
> Thoughts?

I think that's where we currently are already, apart from defining
transitivity (see the other thread), which makes things a whole lot more
muddy.

That said, on Friday we seemed to be in broad agreement on the semantics
-- the difficult part is getting the language right (which is why we
started to discuss including litmus tests alongside the documentation).
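For instance, the classic message-passing (MP) litmus test can be written as runnable C11 rather than herd syntax (illustrative names; with release/acquire the outcome "flag observed set but data still 0" is forbidden, while downgrading both marked accesses to memory_order_relaxed would allow it):

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

static int data;
static atomic_int flag;

static void *writer(void *arg)
{
	(void)arg;
	data = 42;
	atomic_store_explicit(&flag, 1, memory_order_release);
	return NULL;
}

static void *reader(void *arg)
{
	(void)arg;
	while (!atomic_load_explicit(&flag, memory_order_acquire))
		;			/* spin until publication */
	assert(data == 42);		/* guaranteed by release/acquire */
	return NULL;
}

void run_mp_litmus(int iterations)
{
	for (int i = 0; i < iterations; i++) {
		pthread_t w, r;

		data = 0;
		atomic_store_explicit(&flag, 0, memory_order_relaxed);
		pthread_create(&w, NULL, writer, NULL);
		pthread_create(&r, NULL, reader, NULL);
		pthread_join(w, NULL);
		pthread_join(r, NULL);
	}
}
```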

Will
Paul E. McKenney Oct. 12, 2015, 11:24 p.m. UTC | #11
On Mon, Oct 12, 2015 at 09:17:50AM +0800, Boqun Feng wrote:
> Hi Paul,
> 
> On Thu, Oct 01, 2015 at 11:03:01AM -0700, Paul E. McKenney wrote:
> > On Thu, Oct 01, 2015 at 07:13:04PM +0200, Peter Zijlstra wrote:
> > > On Thu, Oct 01, 2015 at 08:09:09AM -0700, Paul E. McKenney wrote:
> > > > On Thu, Oct 01, 2015 at 02:24:40PM +0200, Peter Zijlstra wrote:
> > > 
> > > > > I must say I'm somewhat surprised by this level of relaxation, I had
> > > > > expected to only lose SMP barriers, not the program order ones.
> > > > > 
> > > > > Is there a good argument for this?
> > > > 
> > > > Yes, when we say "relaxed", we really mean relaxed.  ;-)
> > > > 
> > > > Both the CPU and the compiler are allowed to reorder around relaxed
> > > > operations.
> > > 
> > > Is this documented somewhere, because I completely missed this part.
> > 
> > Well, yes, these need to be added to the documentation.  I am assuming
> 
> Maybe it's a good time for us to call out which operations should be
> compiler barriers or CPU barriers?
> 
> I had something in mind while I was working on this series; I'm not
> really sure whether it's correct, but it's probably a starting point:
> 
> All global and local atomic operations are at least atomic (no one can
> observe an intermediate state) and volatile (compilers can't optimize out
> the memory access). Based on this, there are four strictness levels one
> can rely on:
> 
> RELAXED: neither a compiler barrier nor a CPU barrier
> LOCAL: a compiler barrier
> PARTIAL: both a compiler barrier and a CPU barrier, but not transitive
> FULL: both a compiler barrier and a CPU barrier, and transitive.

As Will noted, we have two types of transitivity.  The first type is that
of release-acquire chains, where the transitivity is only observable
within the chain.  The second type is that of smp_mb(), where the
transitivity is observable globally.
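The first kind, a release-acquire chain, can be sketched in C11 (illustrative names): T1 publishes data and release-stores f1; T2 acquire-loads f1 and release-stores f2; T3 acquire-loads f2 and is then guaranteed to see T1's data, even though T3 never synchronized with T1 directly.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

static int data;
static atomic_int f1, f2;

static void *t1(void *arg)
{
	(void)arg;
	data = 1;
	atomic_store_explicit(&f1, 1, memory_order_release);
	return NULL;
}

static void *t2(void *arg)
{
	(void)arg;
	while (!atomic_load_explicit(&f1, memory_order_acquire))
		;
	atomic_store_explicit(&f2, 1, memory_order_release);
	return NULL;
}

static void *t3(void *arg)
{
	(void)arg;
	while (!atomic_load_explicit(&f2, memory_order_acquire))
		;
	assert(data == 1);	/* guaranteed by chain transitivity */
	return NULL;
}

void run_chain(void)
{
	pthread_t a, b, c;

	data = 0;
	atomic_store_explicit(&f1, 0, memory_order_relaxed);
	atomic_store_explicit(&f2, 0, memory_order_relaxed);
	pthread_create(&a, NULL, t1, NULL);
	pthread_create(&b, NULL, t2, NULL);
	pthread_create(&c, NULL, t3, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	pthread_join(c, NULL);
}
```

Ordering here is only visible to the threads on the chain; the smp_mb() kind of transitivity is the stronger, globally observable one.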

							Thanx, Paul

> RELAXED includes all _relaxed variants and non-return atomics, LOCAL
> includes all local atomics (local_* and {cmp}xchg_local), PARTIAL
> includes the _acquire and _release operations, and FULL includes all fully
> ordered global atomic operations.
> 
> Thoughts?
> 
> Regards,
> Boqun
> 
> > that Will is looking to have the same effect as C11 memory_order_relaxed,
> > which is relaxed in this sense.  If he has something else in mind,
> > he needs to tell us what it is and why.  ;-)
> > 
> > 							Thanx, Paul
> >

Patch

diff --git a/arch/powerpc/include/asm/atomic.h b/arch/powerpc/include/asm/atomic.h
index d1dcdcf..d9f570b 100644
--- a/arch/powerpc/include/asm/atomic.h
+++ b/arch/powerpc/include/asm/atomic.h
@@ -193,6 +193,7 @@  static __inline__ int atomic_dec_return(atomic_t *v)
 
 #define atomic_cmpxchg(v, o, n) (cmpxchg(&((v)->counter), (o), (n)))
 #define atomic_xchg(v, new) (xchg(&((v)->counter), new))
+#define atomic_xchg_relaxed(v, new) xchg_relaxed(&((v)->counter), (new))
 
 /**
  * __atomic_add_unless - add unless the number is a given value
@@ -461,6 +462,7 @@  static __inline__ long atomic64_dec_if_positive(atomic64_t *v)
 
 #define atomic64_cmpxchg(v, o, n) (cmpxchg(&((v)->counter), (o), (n)))
 #define atomic64_xchg(v, new) (xchg(&((v)->counter), new))
+#define atomic64_xchg_relaxed(v, new) xchg_relaxed(&((v)->counter), (new))
 
 /**
  * atomic64_add_unless - add unless the number is a given value
diff --git a/arch/powerpc/include/asm/cmpxchg.h b/arch/powerpc/include/asm/cmpxchg.h
index ad6263c..66374f4 100644
--- a/arch/powerpc/include/asm/cmpxchg.h
+++ b/arch/powerpc/include/asm/cmpxchg.h
@@ -54,6 +54,32 @@  __xchg_u32_local(volatile void *p, unsigned long val)
 	return prev;
 }
 
+/*
+ * Atomic exchange relaxed
+ *
+ * Changes the memory location '*p' to be val and returns
+ * the previous value stored there.
+ *
+ * Note that this is not a compiler barrier; there is no
+ * ordering guarantee around it.
+ */
+static __always_inline unsigned long
+__xchg_u32_relaxed(u32 *p, unsigned long val)
+{
+	unsigned long prev;
+
+	__asm__ __volatile__(
+"1:	lwarx	%0,0,%2\n"
+	PPC405_ERR77(0, %2)
+"	stwcx.	%3,0,%2\n"
+"	bne-	1b"
+	: "=&r" (prev), "+m" (*p)
+	: "r" (p), "r" (val)
+	: "cc");
+
+	return prev;
+}
+
 #ifdef CONFIG_PPC64
 static __always_inline unsigned long
 __xchg_u64(volatile void *p, unsigned long val)
@@ -90,6 +116,23 @@  __xchg_u64_local(volatile void *p, unsigned long val)
 
 	return prev;
 }
+
+static __always_inline unsigned long
+__xchg_u64_relaxed(u64 *p, unsigned long val)
+{
+	unsigned long prev;
+
+	__asm__ __volatile__(
+"1:	ldarx	%0,0,%2\n"
+	PPC405_ERR77(0, %2)
+"	stdcx.	%3,0,%2\n"
+"	bne-	1b"
+	: "=&r" (prev), "+m" (*p)
+	: "r" (p), "r" (val)
+	: "cc");
+
+	return prev;
+}
 #endif
 
 /*
@@ -127,6 +170,21 @@  __xchg_local(volatile void *ptr, unsigned long x, unsigned int size)
 	__xchg_called_with_bad_pointer();
 	return x;
 }
+
+static __always_inline unsigned long
+__xchg_relaxed(void *ptr, unsigned long x, unsigned int size)
+{
+	switch (size) {
+	case 4:
+		return __xchg_u32_relaxed(ptr, x);
+#ifdef CONFIG_PPC64
+	case 8:
+		return __xchg_u64_relaxed(ptr, x);
+#endif
+	}
+	__xchg_called_with_bad_pointer();
+	return x;
+}
 #define xchg(ptr,x)							     \
   ({									     \
      __typeof__(*(ptr)) _x_ = (x);					     \
@@ -140,6 +198,12 @@  __xchg_local(volatile void *ptr, unsigned long x, unsigned int size)
      		(unsigned long)_x_, sizeof(*(ptr))); 			     \
   })
 
+#define xchg_relaxed(ptr, x)						\
+({									\
+	__typeof__(*(ptr)) _x_ = (x);					\
+	(__typeof__(*(ptr))) __xchg_relaxed((ptr),			\
+			(unsigned long)_x_, sizeof(*(ptr)));		\
+})
 /*
  * Compare and exchange - if *p == old, set it to new,
  * and return the old value of *p.