arch/powerpc/include/asm/barrier.h: redefine rmb and wmb to lwsync

Message ID 20230222060107.70565-1-kconsul@linux.vnet.ibm.com (mailing list archive)
State Superseded
Series arch/powerpc/include/asm/barrier.h: redefine rmb and wmb to lwsync

Checks

Context Check Description
snowpatch_ozlabs/github-powerpc_selftests success Successfully ran 8 jobs.
snowpatch_ozlabs/github-powerpc_sparse fail sparse (ppc64, ubuntu-22.04, ppc64) failed at step Build.
snowpatch_ozlabs/github-powerpc_kernel_qemu fail kernel (corenet64_smp_defconfig, korg-5.5.0, /linux/arch/powerpc/configs/ppc64e-qemu.config) failed at step Build.
snowpatch_ozlabs/github-powerpc_ppctests success Successfully ran 8 jobs.
snowpatch_ozlabs/github-powerpc_clang success Successfully ran 6 jobs.

Commit Message

Kautuk Consul Feb. 22, 2023, 6:01 a.m. UTC
A page on ibm.com states:
"Ensures that all instructions preceding the call to __lwsync
 complete before any subsequent store instructions can be executed
 on the processor that executed the function. Also, it ensures that
 all load instructions preceding the call to __lwsync complete before
 any subsequent load instructions can be executed on the processor
 that executed the function. This allows you to synchronize between
 multiple processors with minimal performance impact, as __lwsync
 does not wait for confirmation from each processor."

That's why smp_rmb() and smp_wmb() are defined to lwsync.
But this same understanding applies to parallel pipeline
execution on each PowerPC processor.
So, use the lwsync instruction for rmb() and wmb() on the PPC
architectures that support it.

Also removed some useless spaces.

Signed-off-by: Kautuk Consul <kconsul@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/barrier.h | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)
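
As a minimal illustration of the ordering the commit message relies on
(hypothetical code, not part of the patch): rmb()/wmb() here pair loads
with loads and stores with stores, which lwsync orders for ordinary
cacheable memory.

/* Hypothetical message-passing sketch: each side needs only
 * store/store or load/load ordering on cacheable memory. */
static int payload;
static int flag;

static void producer(void)
{
	payload = 42;		/* store the data */
	wmb();			/* order the payload store before the flag store */
	flag = 1;		/* publish */
}

static void consumer(void)
{
	while (!flag)		/* wait until published */
		;
	rmb();			/* order the flag load before the payload load */
	/* payload is guaranteed to read as 42 here */
}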

Comments

Kautuk Consul Feb. 22, 2023, 6:05 a.m. UTC | #1
Hi All,

On Wed, Feb 22, 2023 at 11:31:07AM +0530, Kautuk Consul wrote:
>  /* The sub-arch has lwsync */
>  #if defined(CONFIG_PPC64) || defined(CONFIG_PPC_E500MC)
> -#    define SMPWMB      LWSYNC
> +#undef rmb
> +#undef wmb
> +/* Redefine rmb() to lwsync. */
> +#define rmb()	({__asm__ __volatile__ ("lwsync" : : : "memory"); })
> +/* Redefine wmb() to lwsync. */
> +#define wmb()	({__asm__ __volatile__ ("lwsync" : : : "memory"); })
> +#define SMPWMB      LWSYNC
>  #elif defined(CONFIG_BOOKE)
> -#    define SMPWMB      mbar
> +#define SMPWMB      mbar
>  #else
> -#    define SMPWMB      eieio
> +#define SMPWMB      eieio
>  #endif

I think I am conceptually right about this patch but I currently lack
the resources to test this out on PowerPC 64-bit servers.

I request IBM/non-IBM employees to test this patch out for:
a) functionality breakage. This patch is no good if it breaks current
   kernel functionality.
b) performance impact. If functionality doesn't break, can anyone do
   some reliable kernel load testing on ppc64 servers to see if there
   is any significant performance gain?

Thanks a lot!
Christophe Leroy Feb. 22, 2023, 7:02 a.m. UTC | #2
Le 22/02/2023 à 07:01, Kautuk Consul a écrit :
> A page on ibm.com states:
> "Ensures that all instructions preceding the call to __lwsync
>   complete before any subsequent store instructions can be executed
>   on the processor that executed the function. Also, it ensures that
>   all load instructions preceding the call to __lwsync complete before
>   any subsequent load instructions can be executed on the processor
>   that executed the function. This allows you to synchronize between
>   multiple processors with minimal performance impact, as __lwsync
>   does not wait for confirmation from each processor."
> 
> That's why smp_rmb() and smp_wmb() are defined to lwsync.
> But this same understanding applies to parallel pipeline
> execution on each PowerPC processor.
> So, use the lwsync instruction for rmb() and wmb() on the PPC
> architectures that support it.
> 
> Also removed some useless spaces.
> 
> Signed-off-by: Kautuk Consul <kconsul@linux.vnet.ibm.com>
> ---
>   arch/powerpc/include/asm/barrier.h | 12 +++++++++---
>   1 file changed, 9 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/barrier.h b/arch/powerpc/include/asm/barrier.h
> index e80b2c0e9315..553f5a5d20bd 100644
> --- a/arch/powerpc/include/asm/barrier.h
> +++ b/arch/powerpc/include/asm/barrier.h
> @@ -41,11 +41,17 @@
>   
>   /* The sub-arch has lwsync */
>   #if defined(CONFIG_PPC64) || defined(CONFIG_PPC_E500MC)
> -#    define SMPWMB      LWSYNC

This line shouldn't be changed by your patch

> +#undef rmb
> +#undef wmb

I see no point in defining something and then undefining it a few lines
later.

Instead, why not do:

#define mb()   __asm__ __volatile__ ("sync" : : : "memory")

#if defined(CONFIG_PPC64) || defined(CONFIG_PPC_E500MC)
#define rmb()	({__asm__ __volatile__ ("lwsync" : : : "memory"); })
#define wmb()	({__asm__ __volatile__ ("lwsync" : : : "memory"); })
#else
#define rmb()  __asm__ __volatile__ ("sync" : : : "memory")
#define wmb()  __asm__ __volatile__ ("sync" : : : "memory")
#endif

By the way, why put it inside a ({ })?
And I think nowadays we use "asm volatile", not "__asm__ __volatile__".

Shouldn't you also consider the same for mb()?

Note that your series will conflict with b6e259297a6b ("powerpc/kcsan: 
Memory barriers semantics") in powerpc/next tree.

> +/* Redefine rmb() to lwsync. */

What's the added value of this comment? Isn't it obvious from the line
below that rmb() is being defined to lwsync? Please avoid useless comments.

> +#define rmb()	({__asm__ __volatile__ ("lwsync" : : : "memory"); })
> +/* Redefine wmb() to lwsync. */
> +#define wmb()	({__asm__ __volatile__ ("lwsync" : : : "memory"); })
> +#define SMPWMB      LWSYNC
>   #elif defined(CONFIG_BOOKE)
> -#    define SMPWMB      mbar

This line shouldn't be changed by your patch

> +#define SMPWMB      mbar
>   #else
> -#    define SMPWMB      eieio

This line shouldn't be changed by your patch

> +#define SMPWMB      eieio
>   #endif
>   
>   /* clang defines this macro for a builtin, which will not work with runtime patching */
Kautuk Consul Feb. 22, 2023, 8:16 a.m. UTC | #3
On Wed, Feb 22, 2023 at 07:02:34AM +0000, Christophe Leroy wrote:
> 
> 
> Le 22/02/2023 à 07:01, Kautuk Consul a écrit :
> > A page on ibm.com states:
> > "Ensures that all instructions preceding the call to __lwsync
> >   complete before any subsequent store instructions can be executed
> >   on the processor that executed the function. Also, it ensures that
> >   all load instructions preceding the call to __lwsync complete before
> >   any subsequent load instructions can be executed on the processor
> >   that executed the function. This allows you to synchronize between
> >   multiple processors with minimal performance impact, as __lwsync
> >   does not wait for confirmation from each processor."
> > 
> > That's why smp_rmb() and smp_wmb() are defined to lwsync.
> > But this same understanding applies to parallel pipeline
> > execution on each PowerPC processor.
> > So, use the lwsync instruction for rmb() and wmb() on the PPC
> > architectures that support it.
> > 
> > Also removed some useless spaces.
> > 
> > Signed-off-by: Kautuk Consul <kconsul@linux.vnet.ibm.com>
> > ---
> >   arch/powerpc/include/asm/barrier.h | 12 +++++++++---
> >   1 file changed, 9 insertions(+), 3 deletions(-)
> > 
> > diff --git a/arch/powerpc/include/asm/barrier.h b/arch/powerpc/include/asm/barrier.h
> > index e80b2c0e9315..553f5a5d20bd 100644
> > --- a/arch/powerpc/include/asm/barrier.h
> > +++ b/arch/powerpc/include/asm/barrier.h
> > @@ -41,11 +41,17 @@
> >   
> >   /* The sub-arch has lwsync */
> >   #if defined(CONFIG_PPC64) || defined(CONFIG_PPC_E500MC)
> > -#    define SMPWMB      LWSYNC
> 
> This line shouldn't be changed by your patch
I mentioned it in the commit message.
But if you want I'll remove this. Did this because the rest
of the file doesn't have these spaces.
> 
> > +#undef rmb
> > +#undef wmb
> 
> I see no point in defining something and then undefining it a few lines
> later.
> 
> Instead, why not do:
> 
> #define mb()   __asm__ __volatile__ ("sync" : : : "memory")
> 
> #if defined(CONFIG_PPC64) || defined(CONFIG_PPC_E500MC)
> #define rmb()	({__asm__ __volatile__ ("lwsync" : : : "memory"); })
> #define wmb()	({__asm__ __volatile__ ("lwsync" : : : "memory"); })
> #else
> #define rmb()  __asm__ __volatile__ ("sync" : : : "memory")
> #define wmb()  __asm__ __volatile__ ("sync" : : : "memory")
> #endif
> 
I thought of doing that earlier, but there exists one more #elif
for CONFIG_BOOKE and then the #else.
That way we would have to put 3 different #defines for rmb and wmb
and I wanted to avoid that.
> By the way, why put it inside a ({ })?
checkpatch.pl asks for ({}).
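
(For reference, ({ }) is a GNU C statement expression: the block
evaluates to its last expression, so a multi-statement macro can be
used as an rvalue. A hypothetical example, not from the patch:)

/* Hypothetical illustration of a statement expression: the ({ ... })
 * block evaluates to its final expression, __v, so the macro can be
 * used on the right-hand side of an assignment. */
#define read_then_fence(p) ({					\
	int __v = *(volatile int *)(p);				\
	rmb();	/* order this load before any later loads */	\
	__v;							\
})

Usage would look like: int v = read_then_fence(&flag);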
> And I think nowadays we use "asm volatile", not "__asm__ __volatile__".
I was just following what was there in the file already.
Can change this if required.
> 
> Shouldn't you also consider the same for mb()?
My change wasn't meant to address newer usages of asm volatile
#defines. I just wanted to redefine the rmb and wmb #defines
to lwsync.
> 
> Note that your series will conflict with b6e259297a6b ("powerpc/kcsan: 
> Memory barriers semantics") in powerpc/next tree.
I thought of defining the __rmb and __wmb macros but I decided against
it because I didn't understand kcsan completely.
I used the standard Linus' tree, not powerpc/next.
Can you tell me which branch or git repo I should make this patch on?
> 
> > +/* Redefine rmb() to lwsync. */
> 
> What's the added value of this comment? Isn't it obvious from the line
> below that rmb() is being defined to lwsync? Please avoid useless comments.
Sure.
> 
> > +#define rmb()	({__asm__ __volatile__ ("lwsync" : : : "memory"); })
> > +/* Redefine wmb() to lwsync. */
> > +#define wmb()	({__asm__ __volatile__ ("lwsync" : : : "memory"); })
> > +#define SMPWMB      LWSYNC
> >   #elif defined(CONFIG_BOOKE)
> > -#    define SMPWMB      mbar
> 
> This line shouldn't be changed by your patch
> 
> > +#define SMPWMB      mbar
> >   #else
> > -#    define SMPWMB      eieio
Ok. Can change my patch.
> 
> This line shouldn't be changed by your patch
> 
> > +#define SMPWMB      eieio
> >   #endif
Sure. Will retain this too.
> >   
> >   /* clang defines this macro for a builtin, which will not work with runtime patching */
Kautuk Consul Feb. 22, 2023, 8:21 a.m. UTC | #4
> On Wed, Feb 22, 2023 at 07:02:34AM +0000, Christophe Leroy wrote:
> > > +/* Redefine rmb() to lwsync. */
> > 
> > What's the added value of this comment? Isn't it obvious from the line
> > below that rmb() is being defined to lwsync? Please avoid useless comments.
> Sure.
Sorry, forgot to add that I wasn't adding this useless comment.
It's just that checkpatch.pl complains that the memory barrier #define
doesn't have a comment for it.
> >
Christophe Leroy Feb. 22, 2023, 8:28 a.m. UTC | #5
Le 22/02/2023 à 09:21, Kautuk Consul a écrit :
>> On Wed, Feb 22, 2023 at 07:02:34AM +0000, Christophe Leroy wrote:
>>>> +/* Redefine rmb() to lwsync. */
>>>
>>> What's the added value of this comment? Isn't it obvious from the line
>>> below that rmb() is being defined to lwsync? Please avoid useless comments.
>> Sure.
> Sorry, forgot to add that I wasn't adding this useless comment.
> It's just that checkpatch.pl complains that the memory barrier #define
> doesn't have a comment for it.
>>>

See https://docs.kernel.org/dev-tools/checkpatch.html, it says:

Checkpatch is not always right. Your judgement takes precedence over 
checkpatch messages. If your code looks better with the violations, then 
its probably best left alone.

checkpatch wants a comment for uses of memory barriers. Here I think it 
is a false positive.
Kautuk Consul Feb. 22, 2023, 8:34 a.m. UTC | #6
On Wed, Feb 22, 2023 at 08:28:19AM +0000, Christophe Leroy wrote:
> 
> 
> Le 22/02/2023 à 09:21, Kautuk Consul a écrit :
> >> On Wed, Feb 22, 2023 at 07:02:34AM +0000, Christophe Leroy wrote:
> >>>> +/* Redefine rmb() to lwsync. */
> >>>
> >>> What's the added value of this comment? Isn't it obvious from the line
> >>> below that rmb() is being defined to lwsync? Please avoid useless comments.
> >> Sure.
> > Sorry, forgot to add that I wasn't adding this useless comment.
> > It's just that checkpatch.pl complains that the memory barrier #define
> > doesn't have a comment for it.
> >>>
> 
> See https://docs.kernel.org/dev-tools/checkpatch.html, it says:
> 
> Checkpatch is not always right. Your judgement takes precedence over 
> checkpatch messages. If your code looks better with the violations, then 
> its probably best left alone.
> 
> checkpatch wants a comment for uses of memory barriers. Here I think it 
> is a false positive.
Cool. I will make the changes you mentioned.
Can you tell me which branch or git repo I should re-make this patch on?
Christophe Leroy Feb. 22, 2023, 8:40 a.m. UTC | #7
Le 22/02/2023 à 09:16, Kautuk Consul a écrit :
> On Wed, Feb 22, 2023 at 07:02:34AM +0000, Christophe Leroy wrote:
>>
>>
>> Le 22/02/2023 à 07:01, Kautuk Consul a écrit :
>>> A page on ibm.com states:
>>> "Ensures that all instructions preceding the call to __lwsync
>>>    complete before any subsequent store instructions can be executed
>>>    on the processor that executed the function. Also, it ensures that
>>>    all load instructions preceding the call to __lwsync complete before
>>>    any subsequent load instructions can be executed on the processor
>>>    that executed the function. This allows you to synchronize between
>>>    multiple processors with minimal performance impact, as __lwsync
>>>    does not wait for confirmation from each processor."
>>>
>>> That's why smp_rmb() and smp_wmb() are defined to lwsync.
>>> But this same understanding applies to parallel pipeline
>>> execution on each PowerPC processor.
>>> So, use the lwsync instruction for rmb() and wmb() on the PPC
>>> architectures that support it.
>>>
>>> Also removed some useless spaces.
>>>
>>> Signed-off-by: Kautuk Consul <kconsul@linux.vnet.ibm.com>
>>> ---
>>>    arch/powerpc/include/asm/barrier.h | 12 +++++++++---
>>>    1 file changed, 9 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/arch/powerpc/include/asm/barrier.h b/arch/powerpc/include/asm/barrier.h
>>> index e80b2c0e9315..553f5a5d20bd 100644
>>> --- a/arch/powerpc/include/asm/barrier.h
>>> +++ b/arch/powerpc/include/asm/barrier.h
>>> @@ -41,11 +41,17 @@
>>>    
>>>    /* The sub-arch has lwsync */
>>>    #if defined(CONFIG_PPC64) || defined(CONFIG_PPC_E500MC)
>>> -#    define SMPWMB      LWSYNC
>>
>> This line shouldn't be changed by your patch
> I mentioned it in the commit message.
> But if you want I'll remove this. Did this because the rest
> of the file doesn't have these spaces.
>>
>>> +#undef rmb
>>> +#undef wmb
>>
>> I see no point in defining something and then undefining it a few lines
>> later.
>>
>> Instead, why not do:
>>
>> #define mb()   __asm__ __volatile__ ("sync" : : : "memory")
>>
>> #if defined(CONFIG_PPC64) || defined(CONFIG_PPC_E500MC)
>> #define rmb()	({__asm__ __volatile__ ("lwsync" : : : "memory"); })
>> #define wmb()	({__asm__ __volatile__ ("lwsync" : : : "memory"); })
>> #else
>> #define rmb()  __asm__ __volatile__ ("sync" : : : "memory")
>> #define wmb()  __asm__ __volatile__ ("sync" : : : "memory")
>> #endif
>>
> I thought of doing that earlier, but there exists one more #elif
> for CONFIG_BOOKE and then the #else.
> That way we would have to put 3 different #defines for rmb and wmb
> and I wanted to avoid that.

No, I don't mean to use the existing #ifdef/elif/else.

Define an #ifdef/#else dedicated to the xmb macros.

Something like this:

@@ -35,9 +35,15 @@
   * However, on CPUs that don't support lwsync, lwsync actually maps to a
   * heavy-weight sync, so smp_wmb() can be a lighter-weight eieio.
   */
+#if defined(CONFIG_PPC64) || defined(CONFIG_PPC_E500MC)
+#define __mb()   asm volatile ("lwsync" : : : "memory")
+#define __rmb()  asm volatile ("lwsync" : : : "memory")
+#define __wmb()  asm volatile ("lwsync" : : : "memory")
+#else
  #define __mb()   __asm__ __volatile__ ("sync" : : : "memory")
  #define __rmb()  __asm__ __volatile__ ("sync" : : : "memory")
  #define __wmb()  __asm__ __volatile__ ("sync" : : : "memory")
+#endif

  /* The sub-arch has lwsync */
  #if defined(CONFIG_PPC64) || defined(CONFIG_PPC_E500MC)
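
(For context, the double-underscore names matter because, after the
kcsan series referenced above, the generic header defines the public
macros on top of them. A rough sketch of the asm-generic/barrier.h
side, from memory and possibly differing in detail:)

/* Sketch of how asm-generic/barrier.h wraps the arch's double-underscore
 * definitions with KCSAN instrumentation (approximate, for context): */
#ifdef __mb
#define mb()	do { kcsan_mb(); __mb(); } while (0)
#endif

#ifdef __rmb
#define rmb()	do { kcsan_rmb(); __rmb(); } while (0)
#endif

#ifdef __wmb
#define wmb()	do { kcsan_wmb(); __wmb(); } while (0)
#endif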



>> By the way, why put it inside a ({ })?
> checkpatch.pl asks for ({}).
>> And I think nowadays we use "asm volatile", not "__asm__ __volatile__".
> I was just following what was there in the file already.
> Can change this if required.
>>
>> Shouldn't you also consider the same for mb()?
> My change wasn't meant to address newer usages of asm volatile
> #defines. I just wanted to redefine the rmb and wmb #defines
> to lwsync.

That's my point, why not also redefine mb to lwsync?

>>
>> Note that your series will conflict with b6e259297a6b ("powerpc/kcsan:
>> Memory barriers semantics") in powerpc/next tree.
> I thought of defining the __rmb and __wmb macros but I decided against
> it because I didn't understand kcsan completely.
> I used the standard Linus' tree, not powerpc/next.
> Can you tell me which branch or git repo I should make this patch on?

https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git

'merge' branch is a merge of branches 'master', 'fixes' and 'next'.

That's the branch to use, although it is not always in sync with fixes
and next; in that case all you have to do is locally merge the 'next' and
'fixes' branches until it's done remotely.

But just using 'next' branch also works most of the time.

Note that 'next' branch should already be part of linux-next so you may 
also use linux-next if you prefer.

>>
>>> +/* Redefine rmb() to lwsync. */
>>
>> What's the added value of this comment? Isn't it obvious from the line
>> below that rmb() is being defined to lwsync? Please avoid useless comments.
> Sure.
>>
>>> +#define rmb()	({__asm__ __volatile__ ("lwsync" : : : "memory"); })
>>> +/* Redefine wmb() to lwsync. */
>>> +#define wmb()	({__asm__ __volatile__ ("lwsync" : : : "memory"); })
>>> +#define SMPWMB      LWSYNC
>>>    #elif defined(CONFIG_BOOKE)
>>> -#    define SMPWMB      mbar
>>
>> This line shouldn't be changed by your patch
>>
>>> +#define SMPWMB      mbar
>>>    #else
>>> -#    define SMPWMB      eieio
> Ok. Can change my patch.
>>
>> This line shouldn't be changed by your patch
>>
>>> +#define SMPWMB      eieio
>>>    #endif
> Sure. Will retain this too.
>>>    
>>>    /* clang defines this macro for a builtin, which will not work with runtime patching */
Kautuk Consul Feb. 22, 2023, 8:47 a.m. UTC | #8
> No, I don't mean to use the existing #ifdef/elif/else.
> 
> Define an #ifdef/#else dedicated to the xmb macros.
> 
> Something like this:
> 
> @@ -35,9 +35,15 @@
>    * However, on CPUs that don't support lwsync, lwsync actually maps to a
>    * heavy-weight sync, so smp_wmb() can be a lighter-weight eieio.
>    */
> +#if defined(CONFIG_PPC64) || defined(CONFIG_PPC_E500MC)
> +#define __mb()   asm volatile ("lwsync" : : : "memory")
> +#define __rmb()  asm volatile ("lwsync" : : : "memory")
> +#define __wmb()  asm volatile ("lwsync" : : : "memory")
> +#else
>   #define __mb()   __asm__ __volatile__ ("sync" : : : "memory")
>   #define __rmb()  __asm__ __volatile__ ("sync" : : : "memory")
>   #define __wmb()  __asm__ __volatile__ ("sync" : : : "memory")
> +#endif
Ok. Got it. Will do.

> >> Shouldn't you also consider the same for mb()?
> > My change wasn't meant to address newer usages of asm volatile
> > #defines. I just wanted to redefine the rmb and wmb #defines
> > to lwsync.
> 
> That's my point, why not also redefine mb to lwsync?
That would be incorrect. lwsync orders load/load, load/store and
store/store pairs, but it does not order a prior store against a later
load. mb() is meant for barriering loads and stores against each other,
including store-load, so the sync instruction is correct for that one.
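
(A hypothetical store-buffering sketch of the case that sync forbids
but lwsync does not: with the two functions running concurrently on
different CPUs, the outcome r0 == 0 && r1 == 0 is possible whenever
the barrier does not order the store before the later load.)

/* Store-buffering litmus test (hypothetical illustration); x and y
 * start as 0.  If mb() were lwsync, both r0 and r1 could read 0,
 * because lwsync does not order a prior store against a later load. */
static int x, y;
static int r0, r1;

static void cpu0(void)
{
	x = 1;
	mb();		/* needs "sync" to order the store to x before the load of y */
	r0 = y;
}

static void cpu1(void)
{
	y = 1;
	mb();		/* likewise on the other CPU */
	r1 = x;
}
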
> 
> >>
> >> Note that your series will conflict with b6e259297a6b ("powerpc/kcsan:
> >> Memory barriers semantics") in powerpc/next tree.
> > I thought of defining the __rmb and __wmb macros but I decided against
> > it because I didn't understand kcsan completely.
> > I used the standard Linus' tree, not powerpc/next.
> > Can you tell me which branch or git repo I should make this patch on?
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git
> 
> 'merge' branch is a merge of branches 'master', 'fixes' and 'next'.
> 
> That's the branch to use, although it is not always in sync with fixes
> and next; in that case all you have to do is locally merge the 'next' and
> 'fixes' branches until it's done remotely.
> 
> But just using 'next' branch also works most of the time.
> 
> Note that 'next' branch should already be part of linux-next so you may 
> also use linux-next if you prefer.
> 
Will send another patch for this.
Thanks. Will use the linux-next branch for this.

Patch

diff --git a/arch/powerpc/include/asm/barrier.h b/arch/powerpc/include/asm/barrier.h
index e80b2c0e9315..553f5a5d20bd 100644
--- a/arch/powerpc/include/asm/barrier.h
+++ b/arch/powerpc/include/asm/barrier.h
@@ -41,11 +41,17 @@ 
 
 /* The sub-arch has lwsync */
 #if defined(CONFIG_PPC64) || defined(CONFIG_PPC_E500MC)
-#    define SMPWMB      LWSYNC
+#undef rmb
+#undef wmb
+/* Redefine rmb() to lwsync. */
+#define rmb()	({__asm__ __volatile__ ("lwsync" : : : "memory"); })
+/* Redefine wmb() to lwsync. */
+#define wmb()	({__asm__ __volatile__ ("lwsync" : : : "memory"); })
+#define SMPWMB      LWSYNC
 #elif defined(CONFIG_BOOKE)
-#    define SMPWMB      mbar
+#define SMPWMB      mbar
 #else
-#    define SMPWMB      eieio
+#define SMPWMB      eieio
 #endif
 
 /* clang defines this macro for a builtin, which will not work with runtime patching */