diff mbox

[v3,4/6] powerpc: atomic: Implement atomic{, 64}_*_return_* variants

Message ID 1444659246-24769-5-git-send-email-boqun.feng@gmail.com (mailing list archive)
State Superseded, archived
Headers show

Commit Message

Boqun Feng Oct. 12, 2015, 2:14 p.m. UTC
On powerpc, acquire and release semantics can be achieved with
lightweight barriers("lwsync" and "ctrl+isync"), which can be used to
implement __atomic_op_{acquire,release}.

For release semantics, since we only need to ensure all memory accesses
that issue before must take effects before the -store- part of the
atomics, "lwsync" is what we only need. On the platform without
"lwsync", "sync" should be used. Therefore, smp_lwsync() is used here.

For acquire semantics, "lwsync" is what we only need for the similar
reason.  However on the platform without "lwsync", we can use "isync"
rather than "sync" as an acquire barrier. So a new kind of barrier
smp_acquire_barrier__after_atomic() is introduced, which is barrier() on
UP, "lwsync" if available and "isync" otherwise.

__atomic_op_fence is defined as smp_lwsync() + _relaxed +
smp_mb__after_atomic() to guarantee a full barrier.

Implement atomic{,64}_{add,sub,inc,dec}_return_relaxed, and build other
variants with these helpers.

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
---
 arch/powerpc/include/asm/atomic.h | 122 +++++++++++++++++++++++++-------------
 1 file changed, 80 insertions(+), 42 deletions(-)

Comments

Will Deacon Oct. 13, 2015, 1:21 p.m. UTC | #1
On Mon, Oct 12, 2015 at 10:14:04PM +0800, Boqun Feng wrote:
> On powerpc, acquire and release semantics can be achieved with
> lightweight barriers("lwsync" and "ctrl+isync"), which can be used to
> implement __atomic_op_{acquire,release}.
> 
> For release semantics, since we only need to ensure all memory accesses
> that issue before must take effects before the -store- part of the
> atomics, "lwsync" is what we only need. On the platform without
> "lwsync", "sync" should be used. Therefore, smp_lwsync() is used here.
> 
> For acquire semantics, "lwsync" is what we only need for the similar
> reason.  However on the platform without "lwsync", we can use "isync"
> rather than "sync" as an acquire barrier. So a new kind of barrier
> smp_acquire_barrier__after_atomic() is introduced, which is barrier() on
> UP, "lwsync" if available and "isync" otherwise.
> 
> __atomic_op_fence is defined as smp_lwsync() + _relaxed +
> smp_mb__after_atomic() to guarantee a full barrier.
> 
> Implement atomic{,64}_{add,sub,inc,dec}_return_relaxed, and build other
> variants with these helpers.
> 
> Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
> ---
>  arch/powerpc/include/asm/atomic.h | 122 +++++++++++++++++++++++++-------------
>  1 file changed, 80 insertions(+), 42 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/atomic.h b/arch/powerpc/include/asm/atomic.h
> index 55f106e..3143af9 100644
> --- a/arch/powerpc/include/asm/atomic.h
> +++ b/arch/powerpc/include/asm/atomic.h
> @@ -12,6 +12,39 @@
>  
>  #define ATOMIC_INIT(i)		{ (i) }
>  
> +/*
> + * Since {add,sub}_return_relaxed and xchg_relaxed are implemented with
> + * a "bne-" instruction at the end, so an isync is enough as a acquire barrier
> + * on the platform without lwsync.
> + */
> +#ifdef CONFIG_SMP
> +#define smp_acquire_barrier__after_atomic() \
> +	__asm__ __volatile__(PPC_ACQUIRE_BARRIER : : : "memory")

I'm not keen on this barrier, as it sounds like it's part of the kernel
memory model, as opposed to an implementation detail on PowerPC (and
we've already got enough of that in the generic code ;).

Can you name it something different please (and maybe #undef it when
you're done)?

Will
Boqun Feng Oct. 13, 2015, 1:35 p.m. UTC | #2
On Tue, Oct 13, 2015 at 02:21:32PM +0100, Will Deacon wrote:
> On Mon, Oct 12, 2015 at 10:14:04PM +0800, Boqun Feng wrote:
[snip]
> > +/*
> > + * Since {add,sub}_return_relaxed and xchg_relaxed are implemented with
> > + * a "bne-" instruction at the end, so an isync is enough as a acquire barrier
> > + * on the platform without lwsync.
> > + */
> > +#ifdef CONFIG_SMP
> > +#define smp_acquire_barrier__after_atomic() \
> > +	__asm__ __volatile__(PPC_ACQUIRE_BARRIER : : : "memory")
> 
> I'm not keen on this barrier, as it sounds like it's part of the kernel
> memory model, as opposed to an implementation detail on PowerPC (and
> we've already got enough of that in the generic code ;).
> 

Indeed, but we still have smp_lwsync() ;-)

> Can you name it something different please (and maybe #undef it when
> you're done)?
> 

I've considered #undef it after used, but now I think open code this
into __atomic_op_acquire() of PPC is a better idea?


#define __atomic_op_acquire(op, args...)				\
({									\
	typeof(op##_relaxed(args)) __ret  = op##_relaxed(args);		\
	__asm__ __volatile__(PPC_ACQUIRE_BARRIER : : : "memory");	\
	__ret;								\
})

PPC_ACQUIRE_BARRIER will be empty if !SMP, so that will become a pure
compiler barrier and just what we need.

Regards,
Boqun
Boqun Feng Oct. 14, 2015, 1 a.m. UTC | #3
On Tue, Oct 13, 2015 at 09:35:54PM +0800, Boqun Feng wrote:
> On Tue, Oct 13, 2015 at 02:21:32PM +0100, Will Deacon wrote:
> > On Mon, Oct 12, 2015 at 10:14:04PM +0800, Boqun Feng wrote:
> [snip]
> > > +/*
> > > + * Since {add,sub}_return_relaxed and xchg_relaxed are implemented with
> > > + * a "bne-" instruction at the end, so an isync is enough as a acquire barrier
> > > + * on the platform without lwsync.
> > > + */
> > > +#ifdef CONFIG_SMP
> > > +#define smp_acquire_barrier__after_atomic() \
> > > +	__asm__ __volatile__(PPC_ACQUIRE_BARRIER : : : "memory")
> > 
> > I'm not keen on this barrier, as it sounds like it's part of the kernel
> > memory model, as opposed to an implementation detail on PowerPC (and
> > we've already got enough of that in the generic code ;).
> > 
> 
> Indeed, but we still have smp_lwsync() ;-)
> 
> > Can you name it something different please (and maybe #undef it when
> > you're done)?
> > 
> 
> I've considered #undef it after used, but now I think open code this
> into __atomic_op_acquire() of PPC is a better idea?
> 
> 
> #define __atomic_op_acquire(op, args...)				\
> ({									\
> 	typeof(op##_relaxed(args)) __ret  = op##_relaxed(args);		\
> 	__asm__ __volatile__(PPC_ACQUIRE_BARRIER : : : "memory");	\

Should be:
	__asm__ __volatile__(PPC_ACQUIRE_BARRIER "" : : : "memory");

> 	__ret;								\
> })
> 
> PPC_ACQUIRE_BARRIER will be empty if !SMP, so that will become a pure
> compiler barrier and just what we need.
> 
> Regards,
> Boqun
diff mbox

Patch

diff --git a/arch/powerpc/include/asm/atomic.h b/arch/powerpc/include/asm/atomic.h
index 55f106e..3143af9 100644
--- a/arch/powerpc/include/asm/atomic.h
+++ b/arch/powerpc/include/asm/atomic.h
@@ -12,6 +12,39 @@ 
 
 #define ATOMIC_INIT(i)		{ (i) }
 
+/*
+ * Since {add,sub}_return_relaxed and xchg_relaxed are implemented with
+ * a "bne-" instruction at the end, so an isync is enough as a acquire barrier
+ * on the platform without lwsync.
+ */
+#ifdef CONFIG_SMP
+#define smp_acquire_barrier__after_atomic() \
+	__asm__ __volatile__(PPC_ACQUIRE_BARRIER : : : "memory")
+#else
+#define smp_acquire_barrier__after_atomic() barrier()
+#endif
+#define __atomic_op_acquire(op, args...)				\
+({									\
+	typeof(op##_relaxed(args)) __ret  = op##_relaxed(args);		\
+	smp_acquire_barrier__after_atomic();				\
+	__ret;								\
+})
+
+#define __atomic_op_release(op, args...)				\
+({									\
+	smp_lwsync();							\
+	op##_relaxed(args);						\
+})
+
+#define __atomic_op_fence(op, args...)				\
+({									\
+	typeof(op##_relaxed(args)) __ret;				\
+	smp_lwsync();							\
+	__ret = op##_relaxed(args);					\
+	smp_mb__after_atomic();						\
+	__ret;								\
+})
+
 static __inline__ int atomic_read(const atomic_t *v)
 {
 	int t;
@@ -42,27 +75,27 @@  static __inline__ void atomic_##op(int a, atomic_t *v)			\
 	: "cc");							\
 }									\
 
-#define ATOMIC_OP_RETURN(op, asm_op)					\
-static __inline__ int atomic_##op##_return(int a, atomic_t *v)		\
+#define ATOMIC_OP_RETURN_RELAXED(op, asm_op)				\
+static inline int atomic_##op##_return_relaxed(int a, atomic_t *v)	\
 {									\
 	int t;								\
 									\
 	__asm__ __volatile__(						\
-	PPC_ATOMIC_ENTRY_BARRIER					\
-"1:	lwarx	%0,0,%2		# atomic_" #op "_return\n"		\
-	#asm_op " %0,%1,%0\n"						\
-	PPC405_ERR77(0,%2)						\
-"	stwcx.	%0,0,%2 \n"						\
+"1:	lwarx	%0,0,%3		# atomic_" #op "_return_relaxed\n"	\
+	#asm_op " %0,%2,%0\n"						\
+	PPC405_ERR77(0, %3)						\
+"	stwcx.	%0,0,%3\n"						\
 "	bne-	1b\n"							\
-	PPC_ATOMIC_EXIT_BARRIER						\
-	: "=&r" (t)							\
+	: "=&r" (t), "+m" (v->counter)					\
 	: "r" (a), "r" (&v->counter)					\
-	: "cc", "memory");						\
+	: "cc");							\
 									\
 	return t;							\
 }
 
-#define ATOMIC_OPS(op, asm_op) ATOMIC_OP(op, asm_op) ATOMIC_OP_RETURN(op, asm_op)
+#define ATOMIC_OPS(op, asm_op)						\
+	ATOMIC_OP(op, asm_op)						\
+	ATOMIC_OP_RETURN_RELAXED(op, asm_op)
 
 ATOMIC_OPS(add, add)
 ATOMIC_OPS(sub, subf)
@@ -71,8 +104,11 @@  ATOMIC_OP(and, and)
 ATOMIC_OP(or, or)
 ATOMIC_OP(xor, xor)
 
+#define atomic_add_return_relaxed atomic_add_return_relaxed
+#define atomic_sub_return_relaxed atomic_sub_return_relaxed
+
 #undef ATOMIC_OPS
-#undef ATOMIC_OP_RETURN
+#undef ATOMIC_OP_RETURN_RELAXED
 #undef ATOMIC_OP
 
 #define atomic_add_negative(a, v)	(atomic_add_return((a), (v)) < 0)
@@ -92,21 +128,19 @@  static __inline__ void atomic_inc(atomic_t *v)
 	: "cc", "xer");
 }
 
-static __inline__ int atomic_inc_return(atomic_t *v)
+static __inline__ int atomic_inc_return_relaxed(atomic_t *v)
 {
 	int t;
 
 	__asm__ __volatile__(
-	PPC_ATOMIC_ENTRY_BARRIER
-"1:	lwarx	%0,0,%1		# atomic_inc_return\n\
+"1:	lwarx	%0,0,%1		# atomic_inc_return_relaxed\n\
 	addic	%0,%0,1\n"
 	PPC405_ERR77(0,%1)
 "	stwcx.	%0,0,%1 \n\
 	bne-	1b"
-	PPC_ATOMIC_EXIT_BARRIER
 	: "=&r" (t)
 	: "r" (&v->counter)
-	: "cc", "xer", "memory");
+	: "cc", "xer");
 
 	return t;
 }
@@ -136,25 +170,26 @@  static __inline__ void atomic_dec(atomic_t *v)
 	: "cc", "xer");
 }
 
-static __inline__ int atomic_dec_return(atomic_t *v)
+static __inline__ int atomic_dec_return_relaxed(atomic_t *v)
 {
 	int t;
 
 	__asm__ __volatile__(
-	PPC_ATOMIC_ENTRY_BARRIER
-"1:	lwarx	%0,0,%1		# atomic_dec_return\n\
+"1:	lwarx	%0,0,%1		# atomic_dec_return_relaxed\n\
 	addic	%0,%0,-1\n"
 	PPC405_ERR77(0,%1)
 "	stwcx.	%0,0,%1\n\
 	bne-	1b"
-	PPC_ATOMIC_EXIT_BARRIER
 	: "=&r" (t)
 	: "r" (&v->counter)
-	: "cc", "xer", "memory");
+	: "cc", "xer");
 
 	return t;
 }
 
+#define atomic_inc_return_relaxed atomic_inc_return_relaxed
+#define atomic_dec_return_relaxed atomic_dec_return_relaxed
+
 #define atomic_cmpxchg(v, o, n) (cmpxchg(&((v)->counter), (o), (n)))
 #define atomic_xchg(v, new) (xchg(&((v)->counter), new))
 
@@ -285,26 +320,27 @@  static __inline__ void atomic64_##op(long a, atomic64_t *v)		\
 	: "cc");							\
 }
 
-#define ATOMIC64_OP_RETURN(op, asm_op)					\
-static __inline__ long atomic64_##op##_return(long a, atomic64_t *v)	\
+#define ATOMIC64_OP_RETURN_RELAXED(op, asm_op)				\
+static inline long							\
+atomic64_##op##_return_relaxed(long a, atomic64_t *v)			\
 {									\
 	long t;								\
 									\
 	__asm__ __volatile__(						\
-	PPC_ATOMIC_ENTRY_BARRIER					\
-"1:	ldarx	%0,0,%2		# atomic64_" #op "_return\n"		\
-	#asm_op " %0,%1,%0\n"						\
-"	stdcx.	%0,0,%2 \n"						\
+"1:	ldarx	%0,0,%3		# atomic64_" #op "_return_relaxed\n"	\
+	#asm_op " %0,%2,%0\n"						\
+"	stdcx.	%0,0,%3\n"						\
 "	bne-	1b\n"							\
-	PPC_ATOMIC_EXIT_BARRIER						\
-	: "=&r" (t)							\
+	: "=&r" (t), "+m" (v->counter)					\
 	: "r" (a), "r" (&v->counter)					\
-	: "cc", "memory");						\
+	: "cc");							\
 									\
 	return t;							\
 }
 
-#define ATOMIC64_OPS(op, asm_op) ATOMIC64_OP(op, asm_op) ATOMIC64_OP_RETURN(op, asm_op)
+#define ATOMIC64_OPS(op, asm_op)					\
+	ATOMIC64_OP(op, asm_op)						\
+	ATOMIC64_OP_RETURN_RELAXED(op, asm_op)
 
 ATOMIC64_OPS(add, add)
 ATOMIC64_OPS(sub, subf)
@@ -312,8 +348,11 @@  ATOMIC64_OP(and, and)
 ATOMIC64_OP(or, or)
 ATOMIC64_OP(xor, xor)
 
-#undef ATOMIC64_OPS
-#undef ATOMIC64_OP_RETURN
+#define atomic64_add_return_relaxed atomic64_add_return_relaxed
+#define atomic64_sub_return_relaxed atomic64_sub_return_relaxed
+
+#undef ATOPIC64_OPS
+#undef ATOMIC64_OP_RETURN_RELAXED
 #undef ATOMIC64_OP
 
 #define atomic64_add_negative(a, v)	(atomic64_add_return((a), (v)) < 0)
@@ -332,20 +371,18 @@  static __inline__ void atomic64_inc(atomic64_t *v)
 	: "cc", "xer");
 }
 
-static __inline__ long atomic64_inc_return(atomic64_t *v)
+static __inline__ long atomic64_inc_return_relaxed(atomic64_t *v)
 {
 	long t;
 
 	__asm__ __volatile__(
-	PPC_ATOMIC_ENTRY_BARRIER
 "1:	ldarx	%0,0,%1		# atomic64_inc_return\n\
 	addic	%0,%0,1\n\
 	stdcx.	%0,0,%1 \n\
 	bne-	1b"
-	PPC_ATOMIC_EXIT_BARRIER
 	: "=&r" (t)
 	: "r" (&v->counter)
-	: "cc", "xer", "memory");
+	: "cc", "xer");
 
 	return t;
 }
@@ -374,24 +411,25 @@  static __inline__ void atomic64_dec(atomic64_t *v)
 	: "cc", "xer");
 }
 
-static __inline__ long atomic64_dec_return(atomic64_t *v)
+static __inline__ long atomic64_dec_return_relaxed(atomic64_t *v)
 {
 	long t;
 
 	__asm__ __volatile__(
-	PPC_ATOMIC_ENTRY_BARRIER
 "1:	ldarx	%0,0,%1		# atomic64_dec_return\n\
 	addic	%0,%0,-1\n\
 	stdcx.	%0,0,%1\n\
 	bne-	1b"
-	PPC_ATOMIC_EXIT_BARRIER
 	: "=&r" (t)
 	: "r" (&v->counter)
-	: "cc", "xer", "memory");
+	: "cc", "xer");
 
 	return t;
 }
 
+#define atomic64_inc_return_relaxed atomic64_inc_return_relaxed
+#define atomic64_dec_return_relaxed atomic64_dec_return_relaxed
+
 #define atomic64_sub_and_test(a, v)	(atomic64_sub_return((a), (v)) == 0)
 #define atomic64_dec_and_test(v)	(atomic64_dec_return((v)) == 0)