From patchwork Wed Mar 16 02:29:30 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Cyril Bur
X-Patchwork-Id: 598112
From: Cyril Bur
To: mpe@ellerman.id.au, linuxppc-dev@ozlabs.org
Subject: [PATCH] powerpc: Fix possible unrecoverable exception caused by '70fe3d9'
Date: Wed, 16 Mar 2016 13:29:30 +1100
Message-Id: <1458095370-26731-1-git-send-email-cyrilbur@gmail.com>
X-Mailer: git-send-email 2.7.3
List-Id: Linux on PowerPC Developers Mail List

Commit 70fe3d9 "powerpc: Restore FPU/VEC/VSX if previously used"
introduces a call to restore_math() late in the syscall return
path, after MSR_RI has been cleared. The MSR_RI flag is used to indicate
whether the kernel can take another exception or not. A cleared MSR_RI
flag indicates that the kernel cannot.

Unfortunately, when a machine is under high load, an SLB miss can occur
in restore_math() which (with MSR_RI cleared) leads to an unrecoverable
exception.

Unrecoverable exception trace:

  powerpc: Restore FPU/VEC/VSX if previously used
  Unrecoverable exception 4100 at c0000000000088d8
  cpu 0x0: Vector: 4100 at [c0000003fa473b20]
      pc: c0000000000088d8: .load_vr_state+0x70/0x110
      lr: c00000000000f710: .restore_math+0x130/0x188
      sp: c0000003fa473da0
     msr: 9000000002003030
    current = 0xc0000007f876f180
    paca    = 0xc00000000fff0000   softe: 0   irq_happened: 0x01
      pid   = 1944, comm = K08umountfs
  Linux version 4.5.0-rc3-g22ccd98 (kerkins@alpine1-p1) (gcc version 5.2.1 20151001 (GCC) ) #1 SMP Tue Mar 15 21:33:26 AEDT 2016
  WARNING: exception is not recoverable, can't continue
  enter ? for help
  [link register   ] c00000000000f710 .restore_math+0x130/0x188
  [c0000003fa473da0] c0000003fa473e30 (unreliable)
  [c0000003fa473e30] c000000000007b6c system_call+0x84/0xfc
  --- Exception: c00 (System Call) at 000000000fe84328
  0:mon>

The clearing of MSR_RI is actually an optimisation to avoid multiple MSR
writes; what must be disabled are interrupts. See the comment in
entry_64.S:

	/*
	 * For performance reasons we clear RI the same time that we
	 * clear EE. We only need to clear RI just before we restore r13
	 * below, but batching it with EE saves us one expensive mtmsrd call.
	 * We have to be careful to restore RI if we branch anywhere from
	 * here (eg syscall_exit_work).
	 */

At the point of calling restore_math() r13 has not been restored, so the
quick fix of turning MSR_RI back on for the call to restore_math() will
eliminate the occurrence of an unrecoverable exception. We'd like to do
a better fix in future.

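As a cross-check of the trace above, here is a minimal C sketch (not kernel
code) that decodes the RI and EE bits from the reported msr value. It assumes
the bit values 0x2 for MSR_RI and 0x8000 for MSR_EE, which is how I understand
the kernel's reg.h defines them; the helper names are hypothetical and exist
only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit values, meant to mirror MSR_EE and MSR_RI in the kernel headers. */
#define MSR_EE 0x8000ULL /* external interrupts enabled */
#define MSR_RI 0x2ULL    /* recoverable interrupt */

/* Hypothetical helper: 1 if the MSR image has RI set, 0 otherwise. */
static int msr_recoverable(uint64_t msr)
{
    return (msr & MSR_RI) != 0;
}

/* Hypothetical helper: 1 if the MSR image has EE set, 0 otherwise. */
static int msr_irqs_enabled(uint64_t msr)
{
    return (msr & MSR_EE) != 0;
}

/* Turn RI on, leaving every other bit untouched. */
static uint64_t msr_set_ri(uint64_t msr)
{
    return msr | MSR_RI;
}

/* C equivalent of "andc rD,rA,rB" with rB = MSR_RI: clears RI only. */
static uint64_t msr_clear_ri(uint64_t msr)
{
    uint64_t out = msr & ~MSR_RI;

    assert(!msr_recoverable(out));
    return out;
}
```

With the msr value from the trace, msr_recoverable(0x9000000002003030ULL) and
msr_irqs_enabled(0x9000000002003030ULL) both return 0, which matches the
description: RI and EE were both clear when the SLB miss hit. Setting RI and
then clearing it again returns the original value, so the round trip around
restore_math() changes no other MSR bit in this model.
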
Signed-off-by: Cyril Bur
---
 arch/powerpc/kernel/entry_64.S | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 038e0a1..f3aa4b4 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -218,7 +218,16 @@ system_call:			/* label this so stack traces look sane */
 	bne	3f
 #endif
 2:	addi	r3,r1,STACK_FRAME_OVERHEAD
+#ifdef CONFIG_PPC_BOOK3S
+	mtmsrd	r10,1		/* Restore RI */
+#endif
 	bl	restore_math
+#ifdef CONFIG_PPC_BOOK3S
+	ld	r10,PACAKMSR(r13)
+	li	r9,MSR_RI
+	andc	r11,r10,r9	/* Re-clear RI */
+	mtmsrd	r11,1
+#endif
 	ld	r8,_MSR(r1)
 	ld	r3,RESULT(r1)
 	li	r11,-MAX_ERRNO