Message ID | 1395992312-23035-1-git-send-email-dongsheng.wang@freescale.com |
---|---|
State | Rejected, archived |
On Fri, Mar 28, 2014 at 03:38:32PM +0800, Dongsheng Wang wrote:
> From: Wang Dongsheng <dongsheng.wang@freescale.com>
>
> If softirq use hardirq stack, we will get kernel panic when a hard irq coming again
> during __do_softirq enable local irq to deal with softirq action. So we need to switch
> stack into softirq stack when invoke soft irq.
>
> Task--->
>     | Task stack
>     |
> Interrupt->EXCEPTION->do_IRQ->
>                        ^  | Hard irq stack
>                        |  |
>                        |  irq_exit->__do_softirq->local_irq_enable--  -->local_irq_disable
>                        |                                          |      Hard irq stack
>                        |                                          |
>                        |                                 Interrupt coming again
>                        |   There will get a Interrupt nesting     |
>                        ----------------------------------------------------------
>
> Trace 1: Trap 900
>
> Kernel stack overflow in process e8152f40, r1=e8e05ec0
> CPU: 0 PID: 2399 Comm: image_compress/ Not tainted 3.13.0-rc3-03475-g2e3f85b #432
> task: e8152f40 ti: c080a000 task.ti: ef176000
> NIP: c05bec04 LR: c0305590 CTR: 00000010
> REGS: e8e05e10 TRAP: 0901 Not tainted (3.13.0-rc3-03475-g2e3f85b)

Could you double check if you got the following patch applied?
commit 1a18a66446f3f289b05b634f18012424d82aa63a
Author: Kevin Hao <haokexin@gmail.com>
Date:   Fri Jan 17 12:25:28 2014 +0800

    powerpc: Set the correct ksp_limit on ppc32 when switching to irq stack

    Guenter Roeck has got the following call trace on a p2020 board:
      Kernel stack overflow in process eb3e5a00, r1=eb79df90
      CPU: 0 PID: 2838 Comm: ssh Not tainted 3.13.0-rc8-juniper-00146-g19eca00 #4
      task: eb3e5a00 ti: c0616000 task.ti: ef440000
      NIP: c003a420 LR: c003a410 CTR: c0017518
      REGS: eb79dee0 TRAP: 0901 Not tainted (3.13.0-rc8-juniper-00146-g19eca00)
      MSR: 00029000 <CE,EE,ME> CR: 24008444 XER: 00000000
      GPR00: c003a410 eb79df90 eb3e5a00 00000000 eb05d900 00000001 65d87646 00000000
      GPR08: 00000000 020b8000 00000000 00000000 44008442
      NIP [c003a420] __do_softirq+0x94/0x1ec
      LR [c003a410] __do_softirq+0x84/0x1ec
      Call Trace:
      [eb79df90] [c003a410] __do_softirq+0x84/0x1ec (unreliable)
      [eb79dfe0] [c003a970] irq_exit+0xbc/0xc8
      [eb79dff0] [c000cc1c] call_do_irq+0x24/0x3c
      [ef441f20] [c00046a8] do_IRQ+0x8c/0xf8
      [ef441f40] [c000e7f4] ret_from_except+0x0/0x18
      --- Exception: 501 at 0xfcda524
          LR = 0x10024900
      Instruction dump:
      7c781b78 3b40000a 3a73b040 543c0024 3a800000 3b3913a0 7ef5bb78 48201bf9
      5463103a 7d3b182e 7e89b92e 7c008146 <3ba00000> 7e7e9b78 48000014 57fff87f
      Kernel panic - not syncing: kernel stack overflow
      CPU: 0 PID: 2838 Comm: ssh Not tainted 3.13.0-rc8-juniper-00146-g19eca00 #4
      Call Trace:

    The reason is that we have used the wrong register to calculate the
    ksp_limit in commit cbc9565ee826 (powerpc: Remove ksp_limit on ppc64).
    Just fix it.

    As suggested by Benjamin Herrenschmidt, also add the C prototype of the
    function in the comment in order to avoid such kind of errors in the
    future.

    Cc: stable@vger.kernel.org # 3.12
    Reported-by: Guenter Roeck <linux@roeck-us.net>
    Tested-by: Guenter Roeck <linux@roeck-us.net>
    Signed-off-by: Kevin Hao <haokexin@gmail.com>
    Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

Thanks,
Kevin
Thanks Kevin. Your patch works. :)

I am still a bit confused, though. If __do_softirq keeps getting interrupted, won't the hard irq stack eventually run out?

Regards,
-Dongsheng
On Fri, Mar 28, 2014 at 09:00:13AM +0000, Dongsheng.Wang@freescale.com wrote:
> Thanks Kevin. Your patch works. :)
>
> I am still a bit confused, though. If __do_softirq keeps getting interrupted, won't the hard irq stack eventually run out?

No, it won't. Please see the explanation in the following commit log.

commit cc1f027454929924471bea2f362431072e3c71be
Author: Frederic Weisbecker <fweisbec@gmail.com>
Date:   Tue Sep 24 17:17:47 2013 +0200

    irq: Optimize softirq stack selection in irq exit

    If irq_exit() is called on the arch's specified irq stack,
    it should be safe to run softirqs inline under that same
    irq stack as it is near empty by the time we call irq_exit().

    For example if we use the same stack for both hard and soft irqs here,
    the worst case scenario is:
    hardirq -> softirq -> hardirq. But then the softirq supersedes the
    first hardirq as the stack user since irq_exit() is called in
    a mostly empty stack. So the stack merge in this case looks acceptable.

    Stack overrun still have a chance to happen if hardirqs have more
    opportunities to nest, but then it's another problem to solve.

    So lets adapt the irq exit's softirq stack on top of a new Kconfig symbol
    that can be defined when irq_exit() runs on the irq stack. That way
    we can spare some stack switch on irq processing and all the cache
    issues that come along.

Thanks,
Kevin
On Fri, 2014-03-28 at 15:38 +0800, Dongsheng Wang wrote:
> From: Wang Dongsheng <dongsheng.wang@freescale.com>
>
> If softirq use hardirq stack, we will get kernel panic when a hard irq coming again
> during __do_softirq enable local irq to deal with softirq action. So we need to switch
> stack into softirq stack when invoke soft irq.

Yes, an interrupt can potentially nest, but we should be near the top of the stack at that point; as the comment in softirq.c says, it should be fine. And your backtrace doesn't seem to indicate a major overflow.

The code in do_IRQ() will make sure we don't switch stack again if we were already on either the hard or softirq stack.

I need a better analysis of your problem. Is that really a stack overflow? Or is it a false positive due to a bug in the overflow detection?

A while ago I moved the code that updates KSP_LIMIT on 32-bit into asm in misc_32.S, since we don't do that on 64-bit; maybe we are getting it wrong...

Cheers,
Ben.
On Fri, 2014-03-28 at 16:18 +0800, Kevin Hao wrote:
>     powerpc: Set the correct ksp_limit on ppc32 when switching to irq stack

Kevin. It looks like it was applied to 3.14 and sent to 3.12 stable but not 3.13 ... can you fix that up?

Cheers,
Ben.
On Sat, Mar 29, 2014 at 08:27:07AM +1100, Benjamin Herrenschmidt wrote:
> On Fri, 2014-03-28 at 16:18 +0800, Kevin Hao wrote:
>
> >     powerpc: Set the correct ksp_limit on ppc32 when switching to irq stack
> >
>
> Kevin. It looks like it was applied to 3.14 and sent to 3.12 stable but
> not 3.13 ... can you fix that up?

It was already merged into 3.13 stable as of 3.13.6:
https://lkml.org/lkml/2014/3/4/787

I guess that Dongsheng didn't use the latest 3.13 stable tree.

Thanks,
Kevin
```diff
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 957bf34..ffde3fb 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -139,7 +139,6 @@ config PPC
 	select OLD_SIGSUSPEND
 	select OLD_SIGACTION if PPC32
 	select HAVE_DEBUG_STACKOVERFLOW
-	select HAVE_IRQ_EXIT_ON_IRQ_STACK
 	select ARCH_USE_CMPXCHG_LOCKREF if PPC64
 
 config GENERIC_CSUM
```