Message ID | 1458095370-26731-1-git-send-email-cyrilbur@gmail.com (mailing list archive) |
---|---|
State | Accepted |
Headers | show |
On Wed, 2016-16-03 at 02:29:30 UTC, Cyril Bur wrote: > Commit 70fe3d9 "powerpc: Restore FPU/VEC/VSX if previously used" introduces a > call to restore_math() late in the syscall return path, after MSR_RI has been > cleared. The MSR_RI flag is used to indicate whether the kernel can take > another exception or not. A cleared MSR_RI flag indicates that the kernel > cannot. > > Unfortunately when a machine is under high load an SLB miss can occur in > restore_math() which (with MSR_RI cleared) leads to an unrecoverable exception. ... > > Signed-off-by: Cyril Bur <cyrilbur@gmail.com> Applied to powerpc next, thanks. https://git.kernel.org/powerpc/c/6e669f085d595cb6053920832c cheers
On Wed, 2016-03-16 at 13:29 +1100, Cyril Bur wrote: > +#ifdef CONFIG_PPC_BOOK3S > + ld r10,PACAKMSR(r13) > + li r9,MSR_RI > + andc r11,r10,r9 /* Re-clear RI */ > + mtmsrd r11,1 > +#endif Do you need that ? IE, mtmsrd with "1" will only update RI and EE, is the value of EE in PACAMSR variable ? Cheers, Ben.
On Wed, 16 Mar 2016 22:45:20 +1100 Benjamin Herrenschmidt <benh@au1.ibm.com> wrote: > On Wed, 2016-03-16 at 13:29 +1100, Cyril Bur wrote: > > +#ifdef CONFIG_PPC_BOOK3S > > + ld r10,PACAKMSR(r13) > > + li r9,MSR_RI > > + andc r11,r10,r9 /* Re-clear RI */ > > + mtmsrd r11,1 > > +#endif > > Do you need that ? IE, mtmsrd with "1" will only update RI and EE, is > the value of EE in PACAMSR variable ? > Indeed, so, we should check with Anton who did the disable EE and clear RI just a bit before that. I had a quick look and to the best of my knowledge EE is not set in PACAKMSR. > Cheers, > Ben. >
On Wed, 2016-03-16 at 22:45 +1100, Benjamin Herrenschmidt wrote: > On Wed, 2016-03-16 at 13:29 +1100, Cyril Bur wrote: > > +#ifdef CONFIG_PPC_BOOK3S > > + ld r10,PACAKMSR(r13) > > + li r9,MSR_RI > > + andc r11,r10,r9 /* Re-clear RI */ > > + mtmsrd r11,1 > > +#endif > > Do you need that ? IE, mtmsrd with "1" will only update RI and EE, is > the value of EE in PACAMSR variable ? Yeah that's a bit brain dead when you look at it. But in Cyril's defence he just copied it from the existing code. PACAKMSR does not include EE, so the code is correct, just suboptimal. I've already merged it though in my rush to get my next branch working. Cyril is looking at reworking the EE/RI clearing in there so we'll fix it up as part of that. cheers
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S index 038e0a1..f3aa4b4 100644 --- a/arch/powerpc/kernel/entry_64.S +++ b/arch/powerpc/kernel/entry_64.S @@ -218,7 +218,16 @@ system_call: /* label this so stack traces look sane */ bne 3f #endif 2: addi r3,r1,STACK_FRAME_OVERHEAD +#ifdef CONFIG_PPC_BOOK3S + mtmsrd r10,1 /* Restore RI */ +#endif bl restore_math +#ifdef CONFIG_PPC_BOOK3S + ld r10,PACAKMSR(r13) + li r9,MSR_RI + andc r11,r10,r9 /* Re-clear RI */ + mtmsrd r11,1 +#endif ld r8,_MSR(r1) ld r3,RESULT(r1) li r11,-MAX_ERRNO
Commit 70fe3d9 "powerpc: Restore FPU/VEC/VSX if previously used" introduces a call to restore_math() late in the syscall return path, after MSR_RI has been cleared. The MSR_RI flag is used to indicate whether the kernel can take another exception or not. A cleared MSR_RI flag indicates that the kernel cannot. Unfortunately when a machine is under high load an SLB miss can occur in restore_math() which (with MSR_RI cleared) leads to an unrecoverable exception. Unrecoverable exception trace: powerpc: Restore FPU/VEC/VSX if previously used Unrecoverable exception 4100 at c0000000000088d8 cpu 0x0: Vector: 4100 at [c0000003fa473b20] pc: c0000000000088d8: .load_vr_state+0x70/0x110 lr: c00000000000f710: .restore_math+0x130/0x188 sp: c0000003fa473da0 msr: 9000000002003030 current = 0xc0000007f876f180 paca = 0xc00000000fff0000 softe: 0 irq_happened: 0x01 pid = 1944, comm = K08umountfs Linux version 4.5.0-rc3-g22ccd98 (kerkins@alpine1-p1) (gcc version 5.2.1 20151001 (GCC) ) #1 SMP Tue Mar 15 21:33:26 AEDT 2016 WARNING: exception is not recoverable, can't continue enter ? for help [link register ] c00000000000f710 .restore_math+0x130/0x188 [c0000003fa473da0] c0000003fa473e30 (unreliable) [c0000003fa473e30] c000000000007b6c system_call+0x84/0xfc --- Exception: c00 (System Call) at 000000000fe84328 0:mon> The clearing of MSR_RI is actually an optimisation to avoid multiple MSR writes, what must be disabled are interrupts. See comment in entry_64.S: /* * For performance reasons we clear RI the same time that we * clear EE. We only need to clear RI just before we restore r13 * below, but batching it with EE saves us one expensive mtmsrd call. * We have to be careful to restore RI if we branch anywhere from * here (eg syscall_exit_work). */ At the point of calling restore_math() r13 has not been restored, as such, the quick fix of turning MSR_RI back on for the call to restore_math() will eliminate the occurrence of an unrecoverable exception. We'd like to do a better fix in future. Signed-off-by: Cyril Bur <cyrilbur@gmail.com> --- arch/powerpc/kernel/entry_64.S | 9 +++++++++ 1 file changed, 9 insertions(+)