Message ID | 1475474583-3085-1-git-send-email-anton@ozlabs.org (mailing list archive)
---|---
State | Accepted
On Mon, 3 Oct 2016 17:03:03 +1100 Anton Blanchard <anton@ozlabs.org> wrote:
> From: Anton Blanchard <anton@samba.org>
>
> I see quite a lot of static branch mispredictions on a simple
> web serving workload. The issue is in __atomic_add_unless(), called
> from _atomic_dec_and_lock(). There is no obvious common case, so it
> is better to let the hardware predict the branch.

Seems reasonable. How problematic is an unmatched lwarx for performance?
It seems that it will serialize a subsequent lwarx until it is next to
complete (in the case of atomic_dec_and_lock, the spin_lock will
immediately do another lwarx). Maybe that's not a big problem.

Putting a regular load and test here before the lwarx would be
disappointing, because many users have success as the common case.

> Signed-off-by: Anton Blanchard <anton@samba.org>
> ---
>  arch/powerpc/include/asm/atomic.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/atomic.h b/arch/powerpc/include/asm/atomic.h
> index f08d567..2b90335 100644
> --- a/arch/powerpc/include/asm/atomic.h
> +++ b/arch/powerpc/include/asm/atomic.h
> @@ -233,7 +233,7 @@ static __inline__ int __atomic_add_unless(atomic_t *v, int a, int u)
>  	PPC_ATOMIC_ENTRY_BARRIER
>  "1:	lwarx	%0,0,%1		# __atomic_add_unless\n\
>  	cmpw	0,%0,%3 \n\
> -	beq-	2f \n\
> +	beq	2f \n\
>  	add	%0,%2,%0 \n"
>  	PPC405_ERR77(0,%2)
>  "	stwcx.	%0,0,%1 \n\
> @@ -539,7 +539,7 @@ static __inline__ int atomic64_add_unless(atomic64_t *v, long a, long u)
>  	PPC_ATOMIC_ENTRY_BARRIER
>  "1:	ldarx	%0,0,%1		# __atomic_add_unless\n\
>  	cmpd	0,%0,%3 \n\
> -	beq-	2f \n\
> +	beq	2f \n\
>  	add	%0,%2,%0 \n"
>  "	stdcx.	%0,0,%1 \n\
>  	bne-	1b \n"
On Mon, 2016-03-10 at 06:03:03 UTC, Anton Blanchard wrote:
> From: Anton Blanchard <anton@samba.org>
>
> I see quite a lot of static branch mispredictions on a simple
> web serving workload. The issue is in __atomic_add_unless(), called
> from _atomic_dec_and_lock(). There is no obvious common case, so it
> is better to let the hardware predict the branch.
>
> Signed-off-by: Anton Blanchard <anton@samba.org>

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/61e98ebff3ba3d3b17e999dc483c26

cheers
diff --git a/arch/powerpc/include/asm/atomic.h b/arch/powerpc/include/asm/atomic.h
index f08d567..2b90335 100644
--- a/arch/powerpc/include/asm/atomic.h
+++ b/arch/powerpc/include/asm/atomic.h
@@ -233,7 +233,7 @@ static __inline__ int __atomic_add_unless(atomic_t *v, int a, int u)
 	PPC_ATOMIC_ENTRY_BARRIER
 "1:	lwarx	%0,0,%1		# __atomic_add_unless\n\
 	cmpw	0,%0,%3 \n\
-	beq-	2f \n\
+	beq	2f \n\
 	add	%0,%2,%0 \n"
 	PPC405_ERR77(0,%2)
 "	stwcx.	%0,0,%1 \n\
@@ -539,7 +539,7 @@ static __inline__ int atomic64_add_unless(atomic64_t *v, long a, long u)
 	PPC_ATOMIC_ENTRY_BARRIER
 "1:	ldarx	%0,0,%1		# __atomic_add_unless\n\
 	cmpd	0,%0,%3 \n\
-	beq-	2f \n\
+	beq	2f \n\
 	add	%0,%2,%0 \n"
 "	stdcx.	%0,0,%1 \n\
 	bne-	1b \n"
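For readers less familiar with the lwarx/stwcx. sequence being patched: the loop is an "add unless the value equals u" primitive, where the `beq`/`beq-` in question is the test that bails out without storing. A minimal portable sketch of the same shape, written with C11 atomics rather than the kernel's inline assembly (the name `atomic_add_unless_sketch` and the simplified 1/0 "did the add happen" return convention are assumptions of this illustration, not the kernel's exact API, which returns the old value):

```c
#include <stdatomic.h>

/* Sketch of an add-unless loop: atomically add 'a' to '*v' unless the
 * current value equals 'u'. Returns 1 if the add happened, 0 if the
 * value matched 'u' and nothing was stored. The compare-exchange retry
 * loop mirrors the lwarx/stwcx. structure: load, test against 'u',
 * attempt the conditional store, retry if another CPU intervened. */
int atomic_add_unless_sketch(atomic_int *v, int a, int u)
{
    int old = atomic_load_explicit(v, memory_order_relaxed);

    do {
        if (old == u)   /* the branch the patch un-hints: no clear common case */
            return 0;   /* bail out without performing the add */
    } while (!atomic_compare_exchange_weak_explicit(v, &old, old + a,
                 memory_order_seq_cst, memory_order_relaxed));
    /* on CAS failure, 'old' is refreshed with the current value,
     * so the u-test is repeated before the next store attempt */
    return 1;
}
```

The `beq-` suffix in the original assembly statically hinted this early-exit branch as unlikely; the patch drops the hint so the hardware predictor decides, since the success/failure mix is workload-dependent.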