Message ID | 20230121095352.2823517-1-npiggin@gmail.com (mailing list archive) |
---|---|
State | Accepted |
Commit | bc88ef663265676419555df2dc469a471c0add31 |
Series | powerpc/64s: Fix local irq disable when PMIs are disabled |

Context | Check | Description |
---|---|---|
snowpatch_ozlabs/github-powerpc_selftests | success | Successfully ran 8 jobs. |
snowpatch_ozlabs/github-powerpc_ppctests | success | Successfully ran 8 jobs. |
snowpatch_ozlabs/github-powerpc_sparse | success | Successfully ran 4 jobs. |
snowpatch_ozlabs/github-powerpc_clang | success | Successfully ran 6 jobs. |
snowpatch_ozlabs/github-powerpc_kernel_qemu | success | Successfully ran 24 jobs. |

On Sat, 21 Jan 2023 19:53:52 +1000, Nicholas Piggin wrote:
> When PMI interrupts are soft-masked, local_irq_save() will clear the PMI
> mask bit, allowing PMIs in and causing a race condition. This causes a
> deadlock in native_hpte_insert via hash_preload, which depends on PMIs
> being disabled since commit 8b91cee5eadd ("powerpc/64s/hash: Make hash
> faults work in NMI context"). native_hpte_insert calls local_irq_save().
> It's possible the lpar hash code is also affected when tracing is
> enabled because __trace_hcall_entry() calls local_irq_save().
>
> [...]

Applied to powerpc/fixes.

[1/1] powerpc/64s: Fix local irq disable when PMIs are disabled
      https://git.kernel.org/powerpc/c/bc88ef663265676419555df2dc469a471c0add31

cheers
diff --git a/arch/powerpc/include/asm/hw_irq.h b/arch/powerpc/include/asm/hw_irq.h
index 77fa88c2aed0..5156fe21284c 100644
--- a/arch/powerpc/include/asm/hw_irq.h
+++ b/arch/powerpc/include/asm/hw_irq.h
@@ -180,6 +180,9 @@ static inline unsigned long arch_local_save_flags(void)
 
 static inline void arch_local_irq_disable(void)
 {
+	if (IS_ENABLED(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG))
+		WARN_ON_ONCE((irq_soft_mask_return() != IRQS_ENABLED) &&
+			     (irq_soft_mask_return() != IRQS_DISABLED));
 	irq_soft_mask_set(IRQS_DISABLED);
 }
 
@@ -192,7 +195,7 @@ static inline void arch_local_irq_enable(void)
 
 static inline unsigned long arch_local_irq_save(void)
 {
-	return irq_soft_mask_set_return(IRQS_DISABLED);
+	return irq_soft_mask_or_return(IRQS_DISABLED);
 }
 
 static inline bool arch_irqs_disabled_flags(unsigned long flags)
When PMI interrupts are soft-masked, local_irq_save() will clear the PMI
mask bit, allowing PMIs in and causing a race condition. This causes a
deadlock in native_hpte_insert via hash_preload, which depends on PMIs
being disabled since commit 8b91cee5eadd ("powerpc/64s/hash: Make hash
faults work in NMI context"). native_hpte_insert calls local_irq_save().
It's possible the lpar hash code is also affected when tracing is
enabled because __trace_hcall_entry() calls local_irq_save().

Fix this by making arch_local_irq_save() _or_ the IRQS_DISABLED bit into
the mask. Add a warning in arch_local_irq_disable() to make sure it
isn't called with PMIs disabled.

This was found with the stress_hpt option with a kbuild workload running
together with `perf record -g`.

Fixes: f442d004806e ("powerpc/64s: Add support to mask perf interrupts and replay them")
Fixes: 8b91cee5eadd ("powerpc/64s/hash: Make hash faults work in NMI context")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
Lockup looks like this, note IRQMASK=1 in native_hpte_insert when we
expect it should be 3.

watchdog: CPU 16 Hard LOCKUP
watchdog: CPU 16 TB:6084087529753, last heartbeat TB:6075895318740 (16000ms ago)
CPU: 16 PID: 9319 Comm: check-local-exp
NIP: c00000000008b040 LR: c00000000037cd64 CTR: c000000000342160
REGS: c000003fffa3fd60 TRAP: 0100 Not tainted
MSR: 9000000000081033 <SF,HV,ME,IR,DR,RI,LE> CR: 88484808 XER: 20040078
CFAR: c00000000000dc3c IRQMASK: 3
GPR00: c0000000000e5b10 c000000088e17090 c0000000010c0100 c000000088e170f0
GPR04: 00007fffffffc690 0000000000000008 c0000000024f0100 fffffffffffffe00
GPR08: c000000012ac4cc0 bcffffffffffffff a8aaaaaaaaaaaaaa 0000000000004000
GPR12: c000000000342160 c000003fffff2880 0000000000000000 0000000000000000
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR24: 0000000000000001 fffffffffffffe00 c00000002c16d000 000ffffffffffff8
GPR28: 00007fffffffffdf 0000000000000000 00007fffffffc690 c000000088e171b0
NIP [c00000000008b040] __copy_tofrom_user_power7+0x20c/0x7ac
LR [c00000000037cd64] copy_from_user_nofault+0xa4/0x190
Call Trace:
[c000000088e17090] [c000003feb802030] 0xc000003feb802030 (unreliable)
[c000000088e170c0] [c0000000000e5b10] perf_callchain_user_64+0x170/0x4f0
[c000000088e17160] [c0000000000e5980] perf_callchain_user+0x20/0x40
[c000000088e17180] [c00000000035f054] get_perf_callchain+0x184/0x250
[c000000088e17210] [c000000000357874] perf_callchain+0x94/0xd0
[c000000088e17230] [c00000000035819c] perf_prepare_sample+0x6ac/0x8f0
[c000000088e17290] [c000000000358428] perf_event_output_forward+0x48/0xc0
[c000000088e17310] [c00000000034d6cc] __perf_event_overflow+0x12c/0x270
[c000000088e17360] [c0000000000e8b80] record_and_restart+0x340/0x830
[c000000088e17580] [c0000000000e9318] perf_event_interrupt+0x2a8/0x4a0
[c000000088e17620] [c000000000028b64] performance_monitor_exception_nmi+0x64/0xb0
[c000000088e17670] [c00000000000baac] performance_monitor_common_virt+0x2ac/0x390
--- interrupt: f00 at native_hpte_insert+0x174/0x210
NIP: c00000000007be84 LR: c00000000007bdd4 CTR: c00000000007bd10
REGS: c000000088e176a0 TRAP: 0f00 Not tainted (6.2.0-rc4-00077-gd368967cb103-dirty)
MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 44484802 XER: 00000078
CFAR: 0000000000000000 IRQMASK: 1
GPR00: c00000000007d2b8 c000000088e17940 c0000000010c0100 c000203fc2347b80
GPR04: 00b3b9708ffffff0 0000000000000010 04000000d5791196 0000000000001000
GPR08: 000b3b9708ffff85 c000203fc2347b88 000b3b9708ffff84 c000000002457fd0
GPR12: c00000000007bd10 c000003fffff2880 c000000002457e70 ffffffd1e43b9708
GPR16: 00b3b9708ffffff0 c000000002457e18 0000000000000001 0000000000000196
GPR20: c0000000024576b8 0800000000000000 0000000000000002 0000000000000002
GPR24: 00000000d5790000 0000000000000196 0000000000000003 000b3b9708ffff80
GPR28: 0000000000000000 0000000000000001 0000000000000000 c000203fc2347b80
NIP [c00000000007be84] native_hpte_insert+0x174/0x210
LR [c00000000007bdd4] native_hpte_insert+0xc4/0x210
--- interrupt: f00
[c000000088e17940] [c000000088e179c0] 0xc000000088e179c0 (unreliable)
[c000000088e179c0] [c00000000007d2b8] __hash_page_64K+0x218/0x4f0
[c000000088e17a70] [c0000000000761fc] __update_mmu_cache+0x30c/0x3b0
[c000000088e17b10] [c0000000003d00a0] do_wp_page+0xa50/0x1640
[c000000088e17bf0] [c0000000003d3ca4] __handle_mm_fault+0xb94/0x1b90
[c000000088e17d00] [c0000000003d4dc0] handle_mm_fault+0x120/0x300
[c000000088e17d50] [c00000000006cbc4] ___do_page_fault+0x2d4/0xac0
[c000000088e17df0] [c00000000006d460] hash__do_page_fault+0x30/0xc0
[c000000088e17e20] [c000000000075d88] do_hash_fault+0x258/0x340

Thanks,
Nick
---
 arch/powerpc/include/asm/hw_irq.h | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)
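
For readers who want to see why the trace shows IRQMASK 1 where 3 was expected, here is a minimal user-space sketch of the two save behaviours and of the new warning condition. It is not kernel code. The `model_*` helpers and the plain `unsigned long` standing in for the per-CPU soft mask are hypothetical, and the numeric mask values are assumed to match those in arch/powerpc/include/asm/hw_irq.h (the IRQMASK values 1 and 3 in the trace are consistent with that assumption).

```c
#include <assert.h>

/* Soft-mask values, assumed to match arch/powerpc/include/asm/hw_irq.h. */
#define IRQS_ENABLED      0UL
#define IRQS_DISABLED     1UL
#define IRQS_PMI_DISABLED 2UL
#define IRQS_ALL_DISABLED (IRQS_DISABLED | IRQS_PMI_DISABLED)

/*
 * Hypothetical model of the pre-patch arch_local_irq_save():
 * overwrite the mask with the new value and return the old one.
 */
static unsigned long model_set_return(unsigned long *soft_mask,
				      unsigned long mask)
{
	unsigned long old = *soft_mask;

	*soft_mask = mask;
	return old;
}

/*
 * Hypothetical model of the post-patch arch_local_irq_save():
 * OR the new bits into the mask and return the old value.
 */
static unsigned long model_or_return(unsigned long *soft_mask,
				     unsigned long mask)
{
	unsigned long old = *soft_mask;

	*soft_mask = old | mask;
	return old;
}

/*
 * Condition checked by the WARN_ON_ONCE() added to
 * arch_local_irq_disable(): the mask is neither fully enabled nor
 * "normal IRQs only" disabled, i.e. a PMI bit is set.
 */
static int model_irq_disable_warns(unsigned long soft_mask)
{
	return soft_mask != IRQS_ENABLED && soft_mask != IRQS_DISABLED;
}

/* Walk through the scenario from the commit message. */
static void model_selfcheck(void)
{
	unsigned long mask = IRQS_ALL_DISABLED;	/* PMIs already soft-masked */
	unsigned long flags;

	/* Old behaviour: the PMI bit is dropped, leaving mask 1, not 3. */
	flags = model_set_return(&mask, IRQS_DISABLED);
	assert(flags == IRQS_ALL_DISABLED);
	assert(mask == IRQS_DISABLED);

	/* New behaviour: the PMI bit survives the save. */
	mask = IRQS_ALL_DISABLED;
	flags = model_or_return(&mask, IRQS_DISABLED);
	assert(flags == IRQS_ALL_DISABLED);
	assert(mask == IRQS_ALL_DISABLED);

	/* The debug warning fires only when a PMI bit is set. */
	assert(model_irq_disable_warns(IRQS_ALL_DISABLED));
	assert(model_irq_disable_warns(IRQS_PMI_DISABLED));
	assert(!model_irq_disable_warns(IRQS_ENABLED));
	assert(!model_irq_disable_warns(IRQS_DISABLED));
}
```

Calling `model_selfcheck()` from a small `main()` exercises all three helpers. The first pair of assertions corresponds to the state seen in the trace, where native_hpte_insert runs with IRQMASK 1 even though its caller had PMIs masked.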