Message ID | 20230609100026.8946-4-npiggin@gmail.com (mailing list archive) |
---|---|
State | Changes Requested |
Series | [1/4] powerpc: Make mmiowb a wmb |
Context | Check | Description |
---|---|---|
snowpatch_ozlabs/github-powerpc_ppctests | success | Successfully ran 8 jobs. |
snowpatch_ozlabs/github-powerpc_selftests | success | Successfully ran 8 jobs. |
snowpatch_ozlabs/github-powerpc_sparse | success | Successfully ran 4 jobs. |
snowpatch_ozlabs/github-powerpc_clang | success | Successfully ran 6 jobs. |
snowpatch_ozlabs/github-powerpc_kernel_qemu | fail | 2 of 24 jobs failed. |
Nicholas Piggin <npiggin@gmail.com> writes:
> The most expensive ordering for hwsync to provide is the store-load
> barrier, because all prior stores have to be drained to the caches
> before subsequent instructions can complete.
>
> stsync just orders stores which means it can just be a barrier that
> goes down the store queue and orders draining, and does not prevent
> completion of subsequent instructions. So it should be faster than
> hwsync.
>
> Use stsync for wmb(). Older processors that don't recognise the SC
> field should treat this as hwsync.

qemu (7.1) emulating ppc64e does not :/

mpic: Setting up MPIC " OpenPIC " version 1.2 at fe0040000, max 1 CPUs
mpic: ISU size: 256, shift: 8, mask: ff
mpic: Initializing for 256 sources
Oops: Exception in kernel mode, sig: 4 [#1]

No more output.

(qemu) info registers
│ NIP c000000000df4264 LR c0000000000ce49c CTR 0000000000000000 XER 0000000020000000 CPU#0
│ MSR 0000000080001000 HID0 0000000000000000 HF 24020006 iidx 1 didx 1
│ ...
SRR0 c0000000000ce7c4 SRR1 0000000080081000 PVR 0000000080240020 VRSAVE 0000000000000000

$ objdump -d vmlinux | grep c0000000000ce7c4
c0000000000ce7c4: 7c 03 04 ac stsync

That's qemu -M ppce500 -cpu e5500 or e6500.

I guess just put it behind an #ifdef 64S.

cheers
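For readers following the encoding: the faulting word `7c 03 04 ac` is the ordinary `sync` opcode with the SC field set to 3. The sketch below reproduces that word at compile time. It assumes the Power ISA v3.1 field layout (L in instruction bits 8:10, SC in bits 14:15, IBM bit numbering); `SYNC_BASE` and `SYNC_ENCODE` are illustrative names, not the kernel's own macros.

```c
#include <assert.h>
#include <stdint.h>

/* Encoding of plain "sync" with L = 0 and SC = 0 (hwsync). */
#define SYNC_BASE 0x7c0004acu

/*
 * Illustrative helper (hypothetical name, not a kernel macro).
 * IBM bit numbering counts from the most significant bit, so
 * bits 8:10 (L) end at LSB position 21 and bits 14:15 (SC) end
 * at LSB position 16.
 */
#define SYNC_ENCODE(l, sc) \
	(SYNC_BASE | (((uint32_t)(l) & 0x7u) << 21) | \
	 (((uint32_t)(sc) & 0x3u) << 16))

/* stsync is "sync 0,3": matches the bytes 7c 03 04 ac shown by objdump. */
static_assert(SYNC_ENCODE(0, 3) == 0x7c0304acu, "stsync encoding");

/* lwsync is "sync 1", shown for comparison. */
static_assert(SYNC_ENCODE(1, 0) == 0x7c2004acu, "lwsync encoding");
```

An implementation that ignores the SC bits sees `0x7c0004ac`, i.e. hwsync, which is the fallback the commit message relies on. An emulator that rejects non-zero reserved bits raises an illegal-instruction exception instead, consistent with the `sig: 4` oops above.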
Michael Ellerman <mpe@ellerman.id.au> writes:
> Nicholas Piggin <npiggin@gmail.com> writes:
>> The most expensive ordering for hwsync to provide is the store-load
>> barrier, because all prior stores have to be drained to the caches
>> before subsequent instructions can complete.
>>
>> stsync just orders stores which means it can just be a barrier that
>> goes down the store queue and orders draining, and does not prevent
>> completion of subsequent instructions. So it should be faster than
>> hwsync.
>>
>> Use stsync for wmb(). Older processors that don't recognise the SC
>> field should treat this as hwsync.
>
> qemu (7.1) emulating ppc64e does not :/
>
> mpic: Setting up MPIC " OpenPIC " version 1.2 at fe0040000, max 1 CPUs
> mpic: ISU size: 256, shift: 8, mask: ff
> mpic: Initializing for 256 sources
> Oops: Exception in kernel mode, sig: 4 [#1]
..
>
> I guess just put it behind an #ifdef 64S.

That doesn't work because qemu emulating a G5 also doesn't accept it.

So either we need to get qemu updated and wait a while for that to
percolate, or do some runtime patching of wmbs in the kernel >_<

cheers
On Wed Jun 14, 2023 at 3:56 PM AEST, Michael Ellerman wrote:
> Michael Ellerman <mpe@ellerman.id.au> writes:
> > Nicholas Piggin <npiggin@gmail.com> writes:
> >> The most expensive ordering for hwsync to provide is the store-load
> >> barrier, because all prior stores have to be drained to the caches
> >> before subsequent instructions can complete.
> >>
> >> stsync just orders stores which means it can just be a barrier that
> >> goes down the store queue and orders draining, and does not prevent
> >> completion of subsequent instructions. So it should be faster than
> >> hwsync.
> >>
> >> Use stsync for wmb(). Older processors that don't recognise the SC
> >> field should treat this as hwsync.
> >
> > qemu (7.1) emulating ppc64e does not :/
> >
> > mpic: Setting up MPIC " OpenPIC " version 1.2 at fe0040000, max 1 CPUs
> > mpic: ISU size: 256, shift: 8, mask: ff
> > mpic: Initializing for 256 sources
> > Oops: Exception in kernel mode, sig: 4 [#1]
> ..
> >
> > I guess just put it behind an #ifdef 64S.
>
> That doesn't work because qemu emulating a G5 also doesn't accept it.
>
> So either we need to get qemu updated and wait a while for that to
> percolate, or do some runtime patching of wmbs in the kernel >_<

Gah, sorry. QEMU really should be ignoring reserved fields in
instructions :(

I guess leave it out for now. Should fix QEMU but we probably also need
to do patching so as not to break older QEMUs.

Thanks,
Nick
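The runtime patching idea floated above could look roughly like the following userspace sketch. It is only an illustration under stated assumptions: the image is built with hwsync at every wmb() site, a table records those sites, and the sites are upgraded to stsync only when the CPU is known to implement it. The function name `patch_wmb_sites`, the `sites` table, and the `cpu_has_stsync` flag are all hypothetical and do not correspond to the kernel's actual feature-fixup machinery.

```c
#include <stddef.h>
#include <stdint.h>

#define INSN_HWSYNC 0x7c0004acu /* sync 0,0 */
#define INSN_STSYNC 0x7c0304acu /* sync 0,3, as seen in the objdump output */

/*
 * Hypothetical sketch: "text" is an array of instruction words and
 * "sites" lists the indices of wmb() barriers, all built as hwsync.
 * If the CPU is known to implement stsync, rewrite each site;
 * otherwise leave the safe default in place.
 * Returns the number of sites rewritten.
 */
static size_t patch_wmb_sites(uint32_t *text, const size_t *sites,
			      size_t nsites, int cpu_has_stsync)
{
	size_t patched = 0;

	if (!cpu_has_stsync)
		return 0;

	for (size_t i = 0; i < nsites; i++) {
		/* Only touch words that still hold the expected default. */
		if (text[sites[i]] == INSN_HWSYNC) {
			text[sites[i]] = INSN_STSYNC;
			patched++;
		}
	}
	return patched;
}
```

Defaulting to hwsync means an emulator that rejects the SC field never sees the new encoding unless it also advertises a CPU on which the patch is enabled. A real implementation would also have to deal with instruction cache synchronisation, which this sketch leaves out.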
"Nicholas Piggin" <npiggin@gmail.com> writes: > On Wed Jun 14, 2023 at 3:56 PM AEST, Michael Ellerman wrote: >> Michael Ellerman <mpe@ellerman.id.au> writes: >> > Nicholas Piggin <npiggin@gmail.com> writes: >> >> The most expensive ordering for hwsync to provide is the store-load >> >> barrier, because all prior stores have to be drained to the caches >> >> before subsequent instructions can complete. >> >> >> >> stsync just orders stores which means it can just be a barrer that >> >> goes down the store queue and orders draining, and does not prevent >> >> completion of subsequent instructions. So it should be faster than >> >> hwsync. >> >> >> >> Use stsync for wmb(). Older processors that don't recognise the SC >> >> field should treat this as hwsync. >> > >> > qemu (7.1) emulating ppc64e does not :/ >> > >> > mpic: Setting up MPIC " OpenPIC " version 1.2 at fe0040000, max 1 CPUs >> > mpic: ISU size: 256, shift: 8, mask: ff >> > mpic: Initializing for 256 sources >> > Oops: Exception in kernel mode, sig: 4 [#1] >> .. >> > >> > I guess just put it behind an #ifdef 64S. >> >> That doesn't work because qemu emulating a G5 also doesn't accept it. >> >> So either we need to get qemu updated and wait a while for that to >> percolate, or do some runtime patching of wmbs in the kernel >_< > > Gah, sorry. QEMU really should be ignoring reserved fields in > instructions :( Yeah, it's an annoying discrepancy vs real hardware and the ISA. > I guess leave it out for now. Should fix QEMU but we probably also need > to do patching so as not to break older QEMUs. I'll plan to take the first 3 patches, they seem OK as-is. cheers
Michael Ellerman <mpe@ellerman.id.au> writes:
> "Nicholas Piggin" <npiggin@gmail.com> writes:
>> On Wed Jun 14, 2023 at 3:56 PM AEST, Michael Ellerman wrote:
>>> Michael Ellerman <mpe@ellerman.id.au> writes:
>>> > Nicholas Piggin <npiggin@gmail.com> writes:
>>> >> The most expensive ordering for hwsync to provide is the store-load
>>> >> barrier, because all prior stores have to be drained to the caches
>>> >> before subsequent instructions can complete.
>>> >>
>>> >> stsync just orders stores which means it can just be a barrier that
>>> >> goes down the store queue and orders draining, and does not prevent
>>> >> completion of subsequent instructions. So it should be faster than
>>> >> hwsync.
>>> >>
>>> >> Use stsync for wmb(). Older processors that don't recognise the SC
>>> >> field should treat this as hwsync.
>>> >
>>> > qemu (7.1) emulating ppc64e does not :/
>>> >
>>> > mpic: Setting up MPIC " OpenPIC " version 1.2 at fe0040000, max 1 CPUs
>>> > mpic: ISU size: 256, shift: 8, mask: ff
>>> > mpic: Initializing for 256 sources
>>> > Oops: Exception in kernel mode, sig: 4 [#1]
>>> ..
>>> >
>>> > I guess just put it behind an #ifdef 64S.
>>>
>>> That doesn't work because qemu emulating a G5 also doesn't accept it.
>>>
>>> So either we need to get qemu updated and wait a while for that to
>>> percolate, or do some runtime patching of wmbs in the kernel >_<
>>
>> Gah, sorry. QEMU really should be ignoring reserved fields in
>> instructions :(
>
> Yeah, it's an annoying discrepancy vs real hardware and the ISA.
>
>> I guess leave it out for now. Should fix QEMU but we probably also need
>> to do patching so as not to break older QEMUs.
>
> I'll plan to take the first 3 patches, they seem OK as-is.

I didn't do that in the end, because patch 2 suffers from the same
problem of not working on QEMU.

cheers
Michael Ellerman <mpe@ellerman.id.au> writes:
> Michael Ellerman <mpe@ellerman.id.au> writes:
>> "Nicholas Piggin" <npiggin@gmail.com> writes:
>>> On Wed Jun 14, 2023 at 3:56 PM AEST, Michael Ellerman wrote:
>>>> Michael Ellerman <mpe@ellerman.id.au> writes:
>>>> > Nicholas Piggin <npiggin@gmail.com> writes:
>>>> >> The most expensive ordering for hwsync to provide is the store-load
>>>> >> barrier, because all prior stores have to be drained to the caches
>>>> >> before subsequent instructions can complete.
>>>> >>
>>>> >> stsync just orders stores which means it can just be a barrier that
>>>> >> goes down the store queue and orders draining, and does not prevent
>>>> >> completion of subsequent instructions. So it should be faster than
>>>> >> hwsync.
>>>> >>
>>>> >> Use stsync for wmb(). Older processors that don't recognise the SC
>>>> >> field should treat this as hwsync.
>>>> >
>>>> > qemu (7.1) emulating ppc64e does not :/
>>>> >
>>>> > mpic: Setting up MPIC " OpenPIC " version 1.2 at fe0040000, max 1 CPUs
>>>> > mpic: ISU size: 256, shift: 8, mask: ff
>>>> > mpic: Initializing for 256 sources
>>>> > Oops: Exception in kernel mode, sig: 4 [#1]
>>>> ..
>>>> >
>>>> > I guess just put it behind an #ifdef 64S.
>>>>
>>>> That doesn't work because qemu emulating a G5 also doesn't accept it.
>>>>
>>>> So either we need to get qemu updated and wait a while for that to
>>>> percolate, or do some runtime patching of wmbs in the kernel >_<
>>>
>>> Gah, sorry. QEMU really should be ignoring reserved fields in
>>> instructions :(
>>
>> Yeah, it's an annoying discrepancy vs real hardware and the ISA.
>>
>>> I guess leave it out for now. Should fix QEMU but we probably also need
>>> to do patching so as not to break older QEMUs.
>>
>> I'll plan to take the first 3 patches, they seem OK as-is.
>
> I didn't do that in the end, because patch 2 suffers from the same
                                             ^
                                             3
> problem of not working on QEMU.
>
> cheers
On Thu, 24 Aug 2023 at 12:12, Michael Ellerman <mpe@ellerman.id.au> wrote:
>
> Michael Ellerman <mpe@ellerman.id.au> writes:
> > Michael Ellerman <mpe@ellerman.id.au> writes:
> >> "Nicholas Piggin" <npiggin@gmail.com> writes:
> >>> On Wed Jun 14, 2023 at 3:56 PM AEST, Michael Ellerman wrote:
> >>>> Michael Ellerman <mpe@ellerman.id.au> writes:
> >>>> > Nicholas Piggin <npiggin@gmail.com> writes:
> >>>> >> The most expensive ordering for hwsync to provide is the store-load
> >>>> >> barrier, because all prior stores have to be drained to the caches
> >>>> >> before subsequent instructions can complete.
> >>>> >>
> >>>> >> stsync just orders stores which means it can just be a barrier that
> >>>> >> goes down the store queue and orders draining, and does not prevent
> >>>> >> completion of subsequent instructions. So it should be faster than
> >>>> >> hwsync.
> >>>> >>
> >>>> >> Use stsync for wmb(). Older processors that don't recognise the SC
> >>>> >> field should treat this as hwsync.
> >>>> >
> >>>> > qemu (7.1) emulating ppc64e does not :/
> >>>> >
> >>>> > mpic: Setting up MPIC " OpenPIC " version 1.2 at fe0040000, max 1 CPUs
> >>>> > mpic: ISU size: 256, shift: 8, mask: ff
> >>>> > mpic: Initializing for 256 sources
> >>>> > Oops: Exception in kernel mode, sig: 4 [#1]
> >>>> ..
> >>>> >
> >>>> > I guess just put it behind an #ifdef 64S.
> >>>>
> >>>> That doesn't work because qemu emulating a G5 also doesn't accept it.
> >>>>
> >>>> So either we need to get qemu updated and wait a while for that to
> >>>> percolate, or do some runtime patching of wmbs in the kernel >_<
> >>>
> >>> Gah, sorry. QEMU really should be ignoring reserved fields in
> >>> instructions :(
> >>
> >> Yeah, it's an annoying discrepancy vs real hardware and the ISA.
> >>
> >>> I guess leave it out for now. Should fix QEMU but we probably also need
> >>> to do patching so as not to break older QEMUs.
> >>
> >> I'll plan to take the first 3 patches, they seem OK as-is.
> >
> > I didn't do that in the end, because patch 2 suffers from the same
>                                              ^
>                                              3
> > problem of not working on QEMU.

Did we get a patch to fix this in to Qemu?

Qemu has recently developed a stable tree process, so if we had a
backportable fix we could get it in there too.

Cheers,

Joel
Joel Stanley <joel@jms.id.au> writes:
> On Thu, 24 Aug 2023 at 12:12, Michael Ellerman <mpe@ellerman.id.au> wrote:
>>
>> Michael Ellerman <mpe@ellerman.id.au> writes:
>> > Michael Ellerman <mpe@ellerman.id.au> writes:
>> >> "Nicholas Piggin" <npiggin@gmail.com> writes:
>> >>> On Wed Jun 14, 2023 at 3:56 PM AEST, Michael Ellerman wrote:
>> >>>> Michael Ellerman <mpe@ellerman.id.au> writes:
>> >>>> > Nicholas Piggin <npiggin@gmail.com> writes:
>> >>>> >> The most expensive ordering for hwsync to provide is the store-load
>> >>>> >> barrier, because all prior stores have to be drained to the caches
>> >>>> >> before subsequent instructions can complete.
>> >>>> >>
>> >>>> >> stsync just orders stores which means it can just be a barrier that
>> >>>> >> goes down the store queue and orders draining, and does not prevent
>> >>>> >> completion of subsequent instructions. So it should be faster than
>> >>>> >> hwsync.
>> >>>> >>
>> >>>> >> Use stsync for wmb(). Older processors that don't recognise the SC
>> >>>> >> field should treat this as hwsync.
>> >>>> >
>> >>>> > qemu (7.1) emulating ppc64e does not :/
>> >>>> >
>> >>>> > mpic: Setting up MPIC " OpenPIC " version 1.2 at fe0040000, max 1 CPUs
>> >>>> > mpic: ISU size: 256, shift: 8, mask: ff
>> >>>> > mpic: Initializing for 256 sources
>> >>>> > Oops: Exception in kernel mode, sig: 4 [#1]
>> >>>> ..
>> >>>> >
>> >>>> > I guess just put it behind an #ifdef 64S.
>> >>>>
>> >>>> That doesn't work because qemu emulating a G5 also doesn't accept it.
>> >>>>
>> >>>> So either we need to get qemu updated and wait a while for that to
>> >>>> percolate, or do some runtime patching of wmbs in the kernel >_<
>> >>>
>> >>> Gah, sorry. QEMU really should be ignoring reserved fields in
>> >>> instructions :(
>> >>
>> >> Yeah, it's an annoying discrepancy vs real hardware and the ISA.
>> >>
>> >>> I guess leave it out for now. Should fix QEMU but we probably also need
>> >>> to do patching so as not to break older QEMUs.
>> >>
>> >> I'll plan to take the first 3 patches, they seem OK as-is.
>> >
>> > I didn't do that in the end, because patch 2 suffers from the same
>>                                              ^
>>                                              3
>> > problem of not working on QEMU.
>
> Did we get a patch to fix this in to Qemu?

No.

Nick might have looked at it but he hasn't posted anything AFAIK.

cheers
diff --git a/arch/powerpc/include/asm/barrier.h b/arch/powerpc/include/asm/barrier.h
index f0ff5737b0d8..95e637c1a3b6 100644
--- a/arch/powerpc/include/asm/barrier.h
+++ b/arch/powerpc/include/asm/barrier.h
@@ -39,7 +39,7 @@
  */
 #define __mb() __asm__ __volatile__ ("sync" : : : "memory")
 #define __rmb() __asm__ __volatile__ ("sync" : : : "memory")
-#define __wmb() __asm__ __volatile__ ("sync" : : : "memory")
+#define __wmb() __asm__ __volatile__ (PPC_STSYNC : : : "memory")
 
 /* The sub-arch has lwsync */
 #if defined(CONFIG_PPC64) || defined(CONFIG_PPC_E500MC)
The most expensive ordering for hwsync to provide is the store-load
barrier, because all prior stores have to be drained to the caches
before subsequent instructions can complete.

stsync just orders stores which means it can just be a barrier that
goes down the store queue and orders draining, and does not prevent
completion of subsequent instructions. So it should be faster than
hwsync.

Use stsync for wmb(). Older processors that don't recognise the SC
field should treat this as hwsync.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/include/asm/barrier.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)