Message ID | 20231017121527.1574104-1-mpe@ellerman.id.au (mailing list archive) |
---|---|
State | Accepted |
Commit | 20045f0155ab79f8beb840022ea86bff46167f79 |
Series | powerpc/64s/radix: Don't warn on copros in radix__tlb_flush() |

Context | Check | Description |
---|---|---|
snowpatch_ozlabs/github-powerpc_ppctests | success | Successfully ran 8 jobs. |
snowpatch_ozlabs/github-powerpc_selftests | success | Successfully ran 8 jobs. |
snowpatch_ozlabs/github-powerpc_sparse | success | Successfully ran 4 jobs. |
snowpatch_ozlabs/github-powerpc_clang | success | Successfully ran 6 jobs. |
snowpatch_ozlabs/github-powerpc_kernel_qemu | success | Successfully ran 23 jobs. |

> On 17-Oct-2023, at 5:45 PM, Michael Ellerman <mpe@ellerman.id.au> wrote:
>
> Sachin reported a warning when running the inject-ra-err selftest:
>
>   # selftests: powerpc/mce: inject-ra-err
>   Disabling lock debugging due to kernel taint
>   MCE: CPU19: machine check (Severe) Real address Load/Store (foreign/control memory) [Not recovered]
>   MCE: CPU19: PID: 5254 Comm: inject-ra-err NIP: [0000000010000e48]
>   MCE: CPU19: Initiator CPU
>   MCE: CPU19: Unknown
>   ------------[ cut here ]------------
>   WARNING: CPU: 19 PID: 5254 at arch/powerpc/mm/book3s64/radix_tlb.c:1221 radix__tlb_flush+0x160/0x180
>   CPU: 19 PID: 5254 Comm: inject-ra-err Kdump: loaded Tainted: G M E 6.6.0-rc3-00055-g9ed22ae6be81 #4
>   Hardware name: IBM,9080-HEX POWER10 (raw) 0x800200 0xf000006 of:IBM,FW1030.20 (NH1030_058) hv:phyp pSeries
>   ...
>   NIP radix__tlb_flush+0x160/0x180
>   LR radix__tlb_flush+0x104/0x180
>   Call Trace:
>     radix__tlb_flush+0xf4/0x180 (unreliable)
>     tlb_finish_mmu+0x15c/0x1e0
>     exit_mmap+0x1a0/0x510
>     __mmput+0x60/0x1e0
>     exit_mm+0xdc/0x170
>     do_exit+0x2bc/0x5a0
>     do_group_exit+0x4c/0xc0
>     sys_exit_group+0x28/0x30
>     system_call_exception+0x138/0x330
>     system_call_vectored_common+0x15c/0x2ec
>
> And bisected it to commit e43c0a0c3c28 ("powerpc/64s/radix: combine
> final TLB flush and lazy tlb mm shootdown IPIs"), which added a warning
> in radix__tlb_flush() if mm->context.copros is still elevated.
>
> However it's possible for the copros count to be elevated if a process
> exits without first closing file descriptors that are associated with a
> copro, eg. VAS.
>
> If the process exits with a VAS file still open, the release callback
> is queued up for exit_task_work() via:
>   exit_files()
>     put_files_struct()
>       close_files()
>         filp_close()
>           fput()
>
> And called via:
>   exit_task_work()
>     ____fput()
>       __fput()
>         file->f_op->release(inode, file)
>           coproc_release()
>             vas_user_win_ops->close_win()
>               vas_deallocate_window()
>                 mm_context_remove_vas_window()
>                   mm_context_remove_copro()
>
> But that is after exit_mm() has been called from do_exit() and triggered
> the warning.
>
> Fix it by dropping the warning, and always calling __flush_all_mm().
>
> In the normal case of no copros, that will result in a call to
> _tlbiel_pid(mm->context.id, RIC_FLUSH_ALL) just as the current code
> does.
>
> If the copros count is elevated then it will cause a global flush, which
> should flush translations from any copros. Note that the process table
> entry was cleared in arch_exit_mmap(), so copros should not be able to
> fetch any new translations.
>
> Fixes: e43c0a0c3c28 ("powerpc/64s/radix: combine final TLB flush and lazy tlb mm shootdown IPIs")
> Reported-by: Sachin Sant <sachinp@linux.ibm.com>
> Closes: https://lore.kernel.org/all/A8E52547-4BF1-47CE-8AEA-BC5A9D7E3567@linux.ibm.com/
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
> ---

Thanks for the fix. This fixes the reported problem.

Tested-by: Sachin Sant <sachinp@linux.ibm.com>

- Sachin
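
The ordering problem described in the commit message can be sketched as a small userspace model. This is not kernel code: the `model_*` names and both structs are hypothetical stand-ins, and the only behaviour taken from the thread is that the release callback is deferred to exit_task_work(), which runs after exit_mm(). The model calls its exit_mm() step first; the commit message itself only states that the release runs after exit_mm().

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only; these are not the kernel's real structures. */
struct model_mm {
	int copros;		/* stands in for mm->context.copros */
};

struct model_task {
	struct model_mm mm;
	bool release_pending;	/* stands in for a queued ____fput() task work */
};

/* exit_mm(): report the copros count that the final TLB flush would see. */
static int model_exit_mm(const struct model_task *t)
{
	return t->mm.copros;
}

/* exit_files(): the final fput() only queues the release callback. */
static void model_exit_files(struct model_task *t)
{
	t->release_pending = true;
}

/* exit_task_work(): the deferred release finally drops the copro count. */
static void model_exit_task_work(struct model_task *t)
{
	if (!t->release_pending)
		return;

	assert(t->mm.copros > 0);
	t->mm.copros--;		/* models mm_context_remove_copro() */
	t->release_pending = false;
}

/*
 * Model of the do_exit() ordering: the mm is torn down before the
 * deferred release runs. Returns the copros count seen at flush time.
 */
int model_do_exit(struct model_task *t)
{
	int seen_at_flush = model_exit_mm(t);

	model_exit_files(t);
	model_exit_task_work(t);
	return seen_at_flush;
}
```

With one window still open at exit, `model_do_exit()` reports a count of 1 at flush time even though the count is back to 0 once the deferred release has run, which is the situation the removed WARN_ON_ONCE() treated as impossible.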

On Tue, 17 Oct 2023 23:15:27 +1100, Michael Ellerman wrote:
> Sachin reported a warning when running the inject-ra-err selftest:
>
>   # selftests: powerpc/mce: inject-ra-err
>   Disabling lock debugging due to kernel taint
>   MCE: CPU19: machine check (Severe) Real address Load/Store (foreign/control memory) [Not recovered]
>   MCE: CPU19: PID: 5254 Comm: inject-ra-err NIP: [0000000010000e48]
>   MCE: CPU19: Initiator CPU
>   MCE: CPU19: Unknown
>   ------------[ cut here ]------------
>   WARNING: CPU: 19 PID: 5254 at arch/powerpc/mm/book3s64/radix_tlb.c:1221 radix__tlb_flush+0x160/0x180
>   CPU: 19 PID: 5254 Comm: inject-ra-err Kdump: loaded Tainted: G M E 6.6.0-rc3-00055-g9ed22ae6be81 #4
>   Hardware name: IBM,9080-HEX POWER10 (raw) 0x800200 0xf000006 of:IBM,FW1030.20 (NH1030_058) hv:phyp pSeries
>   ...
>   NIP radix__tlb_flush+0x160/0x180
>   LR radix__tlb_flush+0x104/0x180
>   Call Trace:
>     radix__tlb_flush+0xf4/0x180 (unreliable)
>     tlb_finish_mmu+0x15c/0x1e0
>     exit_mmap+0x1a0/0x510
>     __mmput+0x60/0x1e0
>     exit_mm+0xdc/0x170
>     do_exit+0x2bc/0x5a0
>     do_group_exit+0x4c/0xc0
>     sys_exit_group+0x28/0x30
>     system_call_exception+0x138/0x330
>     system_call_vectored_common+0x15c/0x2ec
>
> [...]

Applied to powerpc/fixes.

[1/1] powerpc/64s/radix: Don't warn on copros in radix__tlb_flush()
      https://git.kernel.org/powerpc/c/20045f0155ab79f8beb840022ea86bff46167f79

cheers

diff --git a/arch/powerpc/mm/book3s64/radix_tlb.c b/arch/powerpc/mm/book3s64/radix_tlb.c
index 39acc2cbab4c..9e1f6558d026 100644
--- a/arch/powerpc/mm/book3s64/radix_tlb.c
+++ b/arch/powerpc/mm/book3s64/radix_tlb.c
@@ -1212,14 +1212,7 @@ void radix__tlb_flush(struct mmu_gather *tlb)
 
 			smp_mb(); /* see radix__flush_tlb_mm */
 			exit_flush_lazy_tlbs(mm);
-			_tlbiel_pid(mm->context.id, RIC_FLUSH_ALL);
-
-			/*
-			 * It should not be possible to have coprocessors still
-			 * attached here.
-			 */
-			if (WARN_ON_ONCE(atomic_read(&mm->context.copros) > 0))
-				__flush_all_mm(mm, true);
+			__flush_all_mm(mm, true);
 
 			preempt_enable();
 		} else {
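
The behavioural change in this hunk can be compared with a small userspace model. It is a sketch built only from the commit message's description: with no copros the flush stays local, with copros attached it becomes global. The `model_*` names and the outcome struct are hypothetical, and the real __flush_all_mm() bases its decision on more state than the copros count alone; the model assumes the mm is otherwise local to the flushing CPU.

```c
#include <assert.h>
#include <stdbool.h>

/* What a modelled flush path ended up doing. */
struct model_outcome {
	bool local_flush;	/* models _tlbiel_pid(pid, RIC_FLUSH_ALL) */
	bool global_flush;	/* models a broadcast flush that reaches copros */
	bool warned;		/* models WARN_ON_ONCE() firing */
};

/*
 * Model of __flush_all_mm() as described in the commit message:
 * local flush when no copros are attached, global flush otherwise.
 */
static struct model_outcome model_flush_all_mm(int copros)
{
	struct model_outcome o = { false, false, false };

	assert(copros >= 0);
	if (copros > 0)
		o.global_flush = true;
	else
		o.local_flush = true;
	return o;
}

/* Before the patch: always flush locally, then warn and flush again. */
struct model_outcome model_old_path(int copros)
{
	struct model_outcome o = { true, false, false };

	if (copros > 0) {
		o.warned = true;
		o.global_flush = model_flush_all_mm(copros).global_flush;
	}
	return o;
}

/* After the patch: one unconditional call, no warning. */
struct model_outcome model_new_path(int copros)
{
	return model_flush_all_mm(copros);
}
```

In this model both paths produce the same local flush when `copros` is 0. With `copros` at 1, the old path warns and flushes twice, while the new path performs a single global flush without warning.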