Message ID | 20220720132132.903462-2-npiggin@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | [v3,1/2] powerpc: add ISA v3.0 / v3.1 wait opcode macro | expand |
Context | Check | Description |
---|---|---|
snowpatch_ozlabs/github-powerpc_selftests | success | Successfully ran 10 jobs. |
snowpatch_ozlabs/github-powerpc_ppctests | success | Successfully ran 10 jobs. |
snowpatch_ozlabs/github-powerpc_sparse | success | Successfully ran 4 jobs. |
snowpatch_ozlabs/github-powerpc_kernel_qemu | success | Successfully ran 23 jobs. |
snowpatch_ozlabs/github-powerpc_clang | success | Successfully ran 7 jobs. |
Hi! On Wed, Jul 20, 2022 at 11:21:32PM +1000, Nicholas Piggin wrote: > We want to move away from using SMT prioroty updates for cpu_relax, and (typo, "priority") > +#define spin_begin() \ > +do { \ > + asm volatile(ASM_FTR_IFCLR( \ > + "or 1,1,1", /* HMT_LOW */ \ > + "nop",/* POWER10 onward uses pause_short (wait 2,0) */ \ > + %0) :: "i" (CPU_FTR_ARCH_31) : "memory"); \ > +} while (0) Is that nop patched later? Or should you change the comment, maybe? > +#define spin_cpu_relax() \ > +do { \ > + asm volatile(ASM_FTR_IFCLR( \ > + /* Pre-POWER10 uses low / medium priority nops */ \ > + "nop", \ "nop" aka "or 0,0,0" does not change program priority? Medium low would be "or 6,6,6", not sure if that is the ppr you wanted here? > + /* POWER10 onward uses pause_short (wait 2,0) */ \ > + PPC_WAIT(2, 0), \ > + %0) :: "i" (CPU_FTR_ARCH_31) : "memory"); \ > +} while (0) > + > +#define spin_end() \ > +do { \ > + asm volatile(ASM_FTR_IFCLR( \ > + "or 2,2,2", /* HMT_MEDIUM */ \ > + "nop",/* POWER10 onward uses pause_short (wait 2,0) */ \ > + %0) :: "i" (CPU_FTR_ARCH_31) : "memory"); \ > +} while (0) Same comment as for spin_begin. Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org> Segher
Nicholas Piggin <npiggin@gmail.com> writes: > We want to move away from using SMT prioroty updates for cpu_relax, and > use a 'wait' instruction which is similar to x86. As well as being a > much better fit for what everybody else uses and tests with, priority > nops are stateful which is nasty (interrupts have to consider they might > be taken at a different priority), and they're expensive to execute, > similar to a mtSPR which can effect other threads in the pipe. > > Signed-off-by: Nicholas Piggin <npiggin@gmail.com> > --- > Unfortunately qemu TCG does not emulate pause_short properly and will > cause hangs. That _really_ sucks for testing, being able to use qemu is a huge benefit. I can boot test multiple kernels per minute using qemu, vs multiple minutes per kernel using real hardware. > I have a patch for it but not merged yet. But if we tune > qspinlock code it would be best to do it with this patch. What's the urgency on this patch? Can we wait for the qemu change to land? I guess Qemu 8 is not due until next year? :/ cheers
diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h index fdfaae194ddd..d926e59f9d1b 100644 --- a/arch/powerpc/include/asm/processor.h +++ b/arch/powerpc/include/asm/processor.h @@ -355,11 +355,31 @@ static inline unsigned long __pack_fe01(unsigned int fpmode) #ifdef CONFIG_PPC64 -#define spin_begin() HMT_low() - -#define spin_cpu_relax() barrier() - -#define spin_end() HMT_medium() +#define spin_begin() \ +do { \ + asm volatile(ASM_FTR_IFCLR( \ + "or 1,1,1", /* HMT_LOW */ \ + "nop",/* POWER10 onward uses pause_short (wait 2,0) */ \ + %0) :: "i" (CPU_FTR_ARCH_31) : "memory"); \ +} while (0) + +#define spin_cpu_relax() \ +do { \ + asm volatile(ASM_FTR_IFCLR( \ + /* Pre-POWER10 uses low / medium priority nops */ \ + "nop", \ + /* POWER10 onward uses pause_short (wait 2,0) */ \ + PPC_WAIT(2, 0), \ + %0) :: "i" (CPU_FTR_ARCH_31) : "memory"); \ +} while (0) + +#define spin_end() \ +do { \ + asm volatile(ASM_FTR_IFCLR( \ + "or 2,2,2", /* HMT_MEDIUM */ \ + "nop",/* POWER10 onward uses pause_short (wait 2,0) */ \ + %0) :: "i" (CPU_FTR_ARCH_31) : "memory"); \ +} while (0) #endif diff --git a/arch/powerpc/include/asm/vdso/processor.h b/arch/powerpc/include/asm/vdso/processor.h index 8d79f994b4aa..778d2b53041b 100644 --- a/arch/powerpc/include/asm/vdso/processor.h +++ b/arch/powerpc/include/asm/vdso/processor.h @@ -22,7 +22,15 @@ #endif #ifdef CONFIG_PPC64 -#define cpu_relax() do { HMT_low(); HMT_medium(); barrier(); } while (0) +#define cpu_relax() \ +do { \ + asm volatile(ASM_FTR_IFCLR( \ + /* Pre-POWER10 uses low ; medium priority nops */ \ + "or 1,1,1 ; or 2,2,2", \ + /* POWER10 onward uses pause_short (wait 2,0) */ \ + PPC_WAIT(2, 0), \ + %0) :: "i" (CPU_FTR_ARCH_31) : "memory"); \ +} while (0) #else #define cpu_relax() barrier() #endif
We want to move away from using SMT prioroty updates for cpu_relax, and use a 'wait' instruction which is similar to x86. As well as being a much better fit for what everybody else uses and tests with, priority nops are stateful which is nasty (interrupts have to consider they might be taken at a different priority), and they're expensive to execute, similar to a mtSPR which can effect other threads in the pipe. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> --- Unfortunately qemu TCG does not emulate pause_short properly and will cause hangs. I have a patch for it but not merged yet. But if we tune qspinlock code it would be best to do it with this patch. arch/powerpc/include/asm/processor.h | 30 +++++++++++++++++++---- arch/powerpc/include/asm/vdso/processor.h | 10 +++++++- 2 files changed, 34 insertions(+), 6 deletions(-)