
[4/4] powerpc/64s: Use POWER10 stsync barrier for wmb()

Message ID 20230609100026.8946-4-npiggin@gmail.com
State Changes Requested
Series [1/4] powerpc: Make mmiowb a wmb

Checks

Context                                      Check    Description
snowpatch_ozlabs/github-powerpc_ppctests     success  Successfully ran 8 jobs.
snowpatch_ozlabs/github-powerpc_selftests    success  Successfully ran 8 jobs.
snowpatch_ozlabs/github-powerpc_sparse       success  Successfully ran 4 jobs.
snowpatch_ozlabs/github-powerpc_clang        success  Successfully ran 6 jobs.
snowpatch_ozlabs/github-powerpc_kernel_qemu  fail     2 of 24 jobs failed.

Commit Message

Nicholas Piggin June 9, 2023, 10 a.m. UTC
The most expensive ordering for hwsync to provide is the store-load
barrier, because all prior stores have to be drained to the caches
before subsequent instructions can complete.

stsync orders only stores, which means it can be implemented as a barrier
that goes down the store queue and orders draining, without preventing
completion of subsequent instructions. So it should be faster than
hwsync.

Use stsync for wmb(). Older processors that don't recognise the SC
field should treat this as hwsync.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/include/asm/barrier.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
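The expectation that older processors treat stsync as hwsync comes down to the instruction encoding. Michael's objdump output later in the thread shows stsync as 0x7c0304ac, while plain sync (hwsync) is 0x7c0004ac, so the two words differ only in two bits. The sketch below assumes those two bits are the SC field (IBM bits 14-15, i.e. bits 17:16 counting from the least significant bit); the macro and function names are illustrative, not the kernel's.

```c
#include <assert.h>
#include <stdint.h>

/* hwsync ("sync") and the stsync word from the objdump output in this thread. */
#define INSN_HWSYNC 0x7c0004acu
#define INSN_STSYNC 0x7c0304acu

/* Assumption: SC is the 2-bit field at IBM bits 14-15 (LSB bits 17:16). */
#define SYNC_SC_SHIFT 16
#define SYNC_SC_MASK  (0x3u << SYNC_SC_SHIFT)

/* Compile-time check: the two encodings differ only in the assumed SC bits. */
static_assert((INSN_STSYNC ^ INSN_HWSYNC) == SYNC_SC_MASK,
              "stsync should differ from hwsync only in SC");

/* Extract the SC field from a sync-family instruction word. */
static unsigned int sync_sc_field(uint32_t insn)
{
    return (insn & SYNC_SC_MASK) >> SYNC_SC_SHIFT;
}

/* What a core that treats SC as reserved and ignores it effectively executes. */
static uint32_t sync_without_sc(uint32_t insn)
{
    return insn & ~SYNC_SC_MASK;
}
```

With these definitions, sync_sc_field(INSN_STSYNC) is 3, and clearing SC from the stsync word gives exactly the hwsync word, which is why a core that ignores the field ends up executing a full sync.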

Comments

Michael Ellerman June 13, 2023, 1:59 p.m. UTC | #1
Nicholas Piggin <npiggin@gmail.com> writes:
> Use stsync for wmb(). Older processors that don't recognise the SC
> field should treat this as hwsync.

qemu (7.1) emulating ppc64e does not :/

  mpic: Setting up MPIC " OpenPIC  " version 1.2 at fe0040000, max 1 CPUs
  mpic: ISU size: 256, shift: 8, mask: ff
  mpic: Initializing for 256 sources
  Oops: Exception in kernel mode, sig: 4 [#1]

No more output.

(qemu) info registers
NIP c000000000df4264   LR c0000000000ce49c CTR 0000000000000000 XER 0000000020000000 CPU#0
MSR 0000000080001000 HID0 0000000000000000  HF 24020006 iidx 1 didx 1
...
 SRR0 c0000000000ce7c4  SRR1 0000000080081000    PVR 0000000080240020 VRSAVE 0000000000000000

$ objdump -d vmlinux | grep c0000000000ce7c4
c0000000000ce7c4:       7c 03 04 ac     stsync


That's qemu -M ppce500 -cpu e5500 or e6500.

I guess just put it behind an #ifdef 64S.

cheers
Michael Ellerman June 14, 2023, 5:56 a.m. UTC | #2
Michael Ellerman <mpe@ellerman.id.au> writes:
> Nicholas Piggin <npiggin@gmail.com> writes:
>> Use stsync for wmb(). Older processors that don't recognise the SC
>> field should treat this as hwsync.
>
> qemu (7.1) emulating ppc64e does not :/
>
>   mpic: Setting up MPIC " OpenPIC  " version 1.2 at fe0040000, max 1 CPUs
>   mpic: ISU size: 256, shift: 8, mask: ff
>   mpic: Initializing for 256 sources
>   Oops: Exception in kernel mode, sig: 4 [#1]
..
>
> I guess just put it behind an #ifdef 64S.

That doesn't work because qemu emulating a G5 also doesn't accept it.

So either we need to get qemu updated and wait a while for that to
percolate, or do some runtime patching of wmbs in the kernel >_<

cheers
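One way to read "runtime patching of wmbs" is: build every wmb() site as plain hwsync, record where those sites are, and rewrite them to stsync at boot only on CPUs known to implement it. The sketch below models that in user space, with an array standing in for kernel text. `struct wmb_site`, `patch_wmb_sites()` and the `cpu_has_stsync` flag are hypothetical names, and real kernel text patching also has to handle write-protected text and instruction-cache flushing, which this ignores.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define INSN_HWSYNC 0x7c0004acu  /* plain sync: safe on every CPU */
#define INSN_STSYNC 0x7c0304acu  /* stsync, as shown by objdump in this thread */

/* Hypothetical record of one wmb() call site in "text". */
struct wmb_site {
    uint32_t *insn;
};

/*
 * Hypothetical boot-time fixup: upgrade each recorded site from hwsync to
 * stsync, but only when the CPU is known to implement stsync.
 * Returns the number of sites rewritten.
 */
static size_t patch_wmb_sites(struct wmb_site *sites, size_t n, bool cpu_has_stsync)
{
    size_t patched = 0;

    if (!cpu_has_stsync)
        return 0;  /* keep the hwsync default everywhere else */

    for (size_t i = 0; i < n; i++) {
        /* Each site must still hold the default encoding before patching. */
        assert(*sites[i].insn == INSN_HWSYNC);
        *sites[i].insn = INSN_STSYNC;
        patched++;
    }
    return patched;
}
```

Defaulting to hwsync means unpatched kernels, unknown CPUs and older QEMU models keep working, at the cost of maintaining a list of sites to fix up.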
Nicholas Piggin June 15, 2023, 1:53 a.m. UTC | #3
On Wed Jun 14, 2023 at 3:56 PM AEST, Michael Ellerman wrote:
> Michael Ellerman <mpe@ellerman.id.au> writes:
> > I guess just put it behind an #ifdef 64S.
>
> That doesn't work because qemu emulating a G5 also doesn't accept it.
>
> So either we need to get qemu updated and wait a while for that to
> percolate, or do some runtime patching of wmbs in the kernel >_<

Gah, sorry. QEMU really should be ignoring reserved fields in
instructions :(

I guess leave it out for now. Should fix QEMU but we probably also need
to do patching so as not to break older QEMUs.

Thanks,
Nick
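The disagreement here is about reserved fields: real hardware and the ISA ignore them, while QEMU 7.1 raised an illegal instruction (sig 4, SIGILL) on stsync. The model below contrasts a strict decoder with a lenient one for the sync family. It assumes a pre-POWER10 decoder that only understands the primary/extended opcode and a 2-bit L field in IBM bits 9-10 (lwsync is L=1, ptesync is L=2) and treats every other bit, including SC, as reserved. The names are illustrative; this is not QEMU's decoder.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Primary opcode (IBM bits 0-5) and extended opcode (bits 21-30) of sync. */
#define SYNC_OPCODE_MASK 0xfc0007feu
#define SYNC_OPCODE      0x7c0004acu

/* Assumed 2-bit L field in IBM bits 9-10 (LSB bits 22:21). */
#define SYNC_L_SHIFT 21
#define SYNC_L_MASK  (0x3u << SYNC_L_SHIFT)

/* Bits this hypothetical older decoder understands; the rest are reserved. */
#define SYNC_KNOWN_MASK (SYNC_OPCODE_MASK | SYNC_L_MASK)

/* The SC bits (LSB 17:16) fall in the reserved part of the word here. */
static_assert((SYNC_KNOWN_MASK & (0x3u << 16)) == 0, "SC is reserved in this model");

typedef enum { EXEC_HWSYNC, EXEC_LWSYNC, EXEC_PTESYNC, EXEC_ILLEGAL } sync_exec_t;

static sync_exec_t decode_sync(uint32_t insn, bool strict_reserved)
{
    if ((insn & SYNC_OPCODE_MASK) != SYNC_OPCODE)
        return EXEC_ILLEGAL;  /* not a sync at all; out of scope here */

    /* Strict decoding: any set reserved bit is an illegal instruction. */
    if (strict_reserved && (insn & ~SYNC_KNOWN_MASK) != 0)
        return EXEC_ILLEGAL;

    /* Lenient decoding: reserved bits are ignored, only L matters. */
    switch ((insn & SYNC_L_MASK) >> SYNC_L_SHIFT) {
    case 0:  return EXEC_HWSYNC;
    case 1:  return EXEC_LWSYNC;
    case 2:  return EXEC_PTESYNC;
    default: return EXEC_ILLEGAL;  /* L=3: treated as illegal in this model */
    }
}
```

In this model decode_sync(0x7c0304ac, true) returns EXEC_ILLEGAL, matching the oops above, while decode_sync(0x7c0304ac, false) returns EXEC_HWSYNC, the fallback the commit message expects.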
Michael Ellerman June 15, 2023, 3:09 a.m. UTC | #4
"Nicholas Piggin" <npiggin@gmail.com> writes:
> Gah, sorry. QEMU really should be ignoring reserved fields in
> instructions :(

Yeah, it's an annoying discrepancy vs real hardware and the ISA.

> I guess leave it out for now. Should fix QEMU but we probably also need
> to do patching so as not to break older QEMUs.

I'll plan to take the first 3 patches, they seem OK as-is.

cheers
Michael Ellerman Aug. 24, 2023, 12:11 p.m. UTC | #5
Michael Ellerman <mpe@ellerman.id.au> writes:
> I'll plan to take the first 3 patches, they seem OK as-is.

I didn't do that in the end, because patch 2 suffers from the same
problem of not working on QEMU.

cheers
Michael Ellerman Aug. 24, 2023, 12:12 p.m. UTC | #6
Michael Ellerman <mpe@ellerman.id.au> writes:
> I didn't do that in the end, because patch 2 suffers from the same
                                             ^
                                             3
> problem of not working on QEMU.
>
> cheers
Joel Stanley Aug. 25, 2023, 12:28 a.m. UTC | #7
On Thu, 24 Aug 2023 at 12:12, Michael Ellerman <mpe@ellerman.id.au> wrote:
>
> Michael Ellerman <mpe@ellerman.id.au> writes:
> > I didn't do that in the end, because patch 2 suffers from the same
>                                              ^
>                                              3
> > problem of not working on QEMU.

Did we get a patch to fix this into Qemu?

Qemu has recently developed a stable tree process, so if we had a
backportable fix we could get it in there too.

Cheers,

Joel
Michael Ellerman Aug. 25, 2023, 6:59 a.m. UTC | #8
Joel Stanley <joel@jms.id.au> writes:
> Did we get a patch to fix this into Qemu?

No. Nick might have looked at it but he hasn't posted anything AFAIK.

cheers

Patch

diff --git a/arch/powerpc/include/asm/barrier.h b/arch/powerpc/include/asm/barrier.h
index f0ff5737b0d8..95e637c1a3b6 100644
--- a/arch/powerpc/include/asm/barrier.h
+++ b/arch/powerpc/include/asm/barrier.h
@@ -39,7 +39,7 @@ 
  */
 #define __mb()   __asm__ __volatile__ ("sync" : : : "memory")
 #define __rmb()  __asm__ __volatile__ ("sync" : : : "memory")
-#define __wmb()  __asm__ __volatile__ ("sync" : : : "memory")
+#define __wmb()  __asm__ __volatile__ (PPC_STSYNC : : : "memory")
 
 /* The sub-arch has lwsync */
 #if defined(CONFIG_PPC64) || defined(CONFIG_PPC_E500MC)