Message ID | 1409842224-11847-3-git-send-email-atomlin@redhat.com (mailing list archive) |
---|---|
State | Not Applicable, archived |
Headers | show |
On Thu, Sep 04, 2014 at 05:32:31PM +0200, Peter Zijlstra wrote: > On Thu, Sep 04, 2014 at 03:50:24PM +0100, Aaron Tomlin wrote: > > Currently in the event of a stack overrun a call to schedule() > > does not check for this type of corruption. This corruption is > > often silent and can go unnoticed. However once the corrupted > > region is examined at a later stage, the outcome is undefined > > and often results in a sporadic page fault which cannot be > > handled. > > > > This patch checks for a stack overrun and takes appropriate > > action since the damage is already done, there is no point > > in continuing. > > > > Signed-off-by: Aaron Tomlin <atomlin@redhat.com> > > --- > > kernel/sched/core.c | 3 +++ > > 1 file changed, 3 insertions(+) > > > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c > > index ec1a286..d6af6a0 100644 > > --- a/kernel/sched/core.c > > +++ b/kernel/sched/core.c > > @@ -2660,6 +2660,9 @@ static noinline void __schedule_bug(struct task_struct *prev) > > */ > > static inline void schedule_debug(struct task_struct *prev) > > { > > + if (unlikely(prev != &init_task && > > + task_stack_end_corrupted(prev))) > > + BUG(); > > superfluous linebreak, also we appear to have BUG_ON() for situations > just like these. > > secondly, while I appreciate the 'feature' you're making schedule() > slower for everybody, what do you propose to do about that? Understood. I will wrap this with a suitable Kconfig option.
Currently in the event of a stack overrun a call to schedule() does not check for this type of corruption. This corruption is often silent and can go unnoticed. However once the corrupted region is examined at a later stage, the outcome is undefined and often results in a sporadic page fault which cannot be handled. The first patch adds a canary to init_task's end of stack. While the second patch provides a helper to determine the integrity of the canary. The third checks for a stack overrun and takes appropriate action since the damage is already done, there is no point in continuing. Changes since v1: * Rebased against v3.17-rc4 * Add a canary to init_task - Oleg Nesterov * Fix various code formatting issues - Peter Zijlstra * Introduce Kconfig option - Peter Zijlstra Aaron Tomlin (3): init/main.c: Give init_task a canary sched: Add helper for task stack page overrun checking sched: BUG when stack end location is over written arch/powerpc/mm/fault.c | 5 +---- arch/x86/mm/fault.c | 5 +---- include/linux/sched.h | 4 ++++ init/main.c | 1 + kernel/fork.c | 12 +++++++++--- kernel/sched/core.c | 4 ++++ kernel/trace/trace_stack.c | 4 +--- lib/Kconfig.debug | 12 ++++++++++++ 8 files changed, 33 insertions(+), 14 deletions(-)
Resending with "v2" added to each subject line: Currently in the event of a stack overrun a call to schedule() does not check for this type of corruption. This corruption is often silent and can go unnoticed. However once the corrupted region is examined at a later stage, the outcome is undefined and often results in a sporadic page fault which cannot be handled. The first patch adds a canary to init_task's end of stack. While the second patch provides a helper to determine the integrity of the canary. The third checks for a stack overrun and takes appropriate action since the damage is already done, there is no point in continuing. Changes since v1: * Rebased against v3.17-rc4 * Add a canary to init_task - Oleg Nesterov * Fix various code formatting issues - Peter Zijlstra * Introduce Kconfig option - Peter Zijlstra Aaron Tomlin (3): init/main.c: Give init_task a canary sched: Add helper for task stack page overrun checking sched: BUG when stack end location is over written arch/powerpc/mm/fault.c | 5 +---- arch/x86/mm/fault.c | 5 +---- include/linux/sched.h | 4 ++++ init/main.c | 1 + kernel/fork.c | 12 +++++++++--- kernel/sched/core.c | 4 ++++ kernel/trace/trace_stack.c | 4 +--- lib/Kconfig.debug | 12 ++++++++++++ 8 files changed, 33 insertions(+), 14 deletions(-)
Hi Peter, Please let me know if this iteration is satisfactory. Thanks. Currently in the event of a stack overrun a call to schedule() does not check for this type of corruption. This corruption is often silent and can go unnoticed. However once the corrupted region is examined at a later stage, the outcome is undefined and often results in a sporadic page fault which cannot be handled. The first patch adds a canary to init_task's end of stack. While the second patch provides a helper to determine the integrity of the canary. The third checks for a stack overrun and takes appropriate action since the damage is already done, there is no point in continuing. Changes since v2: * Use BUG_ON in schedule_debug() - Peter Zijlstra * Use a more explicit function name for setting the canary - Chuck Ebbert Changes since v1: * Rebased against v3.17-rc4 * Add a canary to init_task - Oleg Nesterov * Fix various code formatting issues - Peter Zijlstra * Introduce Kconfig option - Peter Zijlstra Aaron Tomlin (3): init/main.c: Give init_task a canary sched: Add helper for task stack page overrun checking sched: BUG when stack end location is over written arch/powerpc/mm/fault.c | 5 +---- arch/x86/mm/fault.c | 5 +---- include/linux/sched.h | 4 ++++ init/main.c | 1 + kernel/fork.c | 12 +++++++++--- kernel/sched/core.c | 3 +++ kernel/trace/trace_stack.c | 4 +--- lib/Kconfig.debug | 12 ++++++++++++ 8 files changed, 32 insertions(+), 14 deletions(-)
What's with the threading all versions together? Please don't do that -- also don't post a new version just for this though.
On Thu, Sep 11, 2014 at 05:53:03PM +0200, Peter Zijlstra wrote: > > What's with the threading all versions together? Please don't do that -- > also don't post a new version just for this though. Sorry about that. Noted.
From: Aaron Tomlin > Currently in the event of a stack overrun a call to schedule() > does not check for this type of corruption. This corruption is > often silent and can go unnoticed. However once the corrupted > region is examined at a later stage, the outcome is undefined > and often results in a sporadic page fault which cannot be > handled. > > The first patch adds a canary to init_task's end of stack. > While the second patch provides a helper to determine the > integrity of the canary. The third checks for a stack > overrun and takes appropriate action since the damage > is already done, there is no point in continuing. Clearly you've just been 'bitten' by a kernel stack overflow. But a simple 'canary' isn't going to find most of the overflows and will give an incorrect 'sense of security'. The canary will only work if the stack is densely written. In practise the stack alignment rules create gaps, and the most likely reason for overflow is a large on-stack buffer that isn't actually written to. The only real way to detect kernel stack overflow is to arrange for an unmapped page beyond the stack. That costs KVA, but not much else. David
On Thu, 11 Sep 2014 16:02:45 +0000 David Laight <David.Laight@ACULAB.COM> wrote: > From: Aaron Tomlin > > Currently in the event of a stack overrun a call to schedule() > > does not check for this type of corruption. This corruption is > > often silent and can go unnoticed. However once the corrupted > > region is examined at a later stage, the outcome is undefined > > and often results in a sporadic page fault which cannot be > > handled. > > > > The first patch adds a canary to init_task's end of stack. > > While the second patch provides a helper to determine the > > integrity of the canary. The third checks for a stack > > overrun and takes appropriate action since the damage > > is already done, there is no point in continuing. > > Clearly you've just been 'bitten' by a kernel stack overflow. > But a simple 'canary' isn't going to find most of the overflows > and will give an incorrect 'sense of security'. > > The canary will only work if the stack is densely written. > In practise the stack alignment rules create gaps, and the > most likely reason for overflow is a large on-stack buffer > that isn't actually written to. > > The only real way to detect kernel stack overflow is to arrange > for an unmapped page beyond the stack. > That costs KVA, but not much else. > That doesn't work either, because the threadinfo sits between the end of the stack and the beginning of the next page, making it possible to corrupt critical data without running off the page.
On Thu, Sep 11, 2014 at 04:02:45PM +0000, David Laight wrote: > From: Aaron Tomlin > > Currently in the event of a stack overrun a call to schedule() > > does not check for this type of corruption. This corruption is > > often silent and can go unnoticed. However once the corrupted > > region is examined at a later stage, the outcome is undefined > > and often results in a sporadic page fault which cannot be > > handled. > > > > The first patch adds a canary to init_task's end of stack. > > While the second patch provides a helper to determine the > > integrity of the canary. The third checks for a stack > > overrun and takes appropriate action since the damage > > is already done, there is no point in continuing. > > Clearly you've just been 'bitten' by a kernel stack overflow. > But a simple 'canary' isn't going to find most of the overflows > and will give an incorrect 'sense of security'. Please note that this is not suppose to be a 'perfect' solution. Rather a worth while check in this particular code path. Let's assume that the canary is damaged. In this situation it is rather likely that the thread_info object has been compromised too.
From: Chuck Ebbert > David Laight <David.Laight@ACULAB.COM> wrote: > > > From: Aaron Tomlin > > > Currently in the event of a stack overrun a call to schedule() > > > does not check for this type of corruption. This corruption is > > > often silent and can go unnoticed. However once the corrupted > > > region is examined at a later stage, the outcome is undefined > > > and often results in a sporadic page fault which cannot be > > > handled. > > > > > > The first patch adds a canary to init_task's end of stack. > > > While the second patch provides a helper to determine the > > > integrity of the canary. The third checks for a stack > > > overrun and takes appropriate action since the damage > > > is already done, there is no point in continuing. > > > > Clearly you've just been 'bitten' by a kernel stack overflow. > > But a simple 'canary' isn't going to find most of the overflows > > and will give an incorrect 'sense of security'. > > > > The canary will only work if the stack is densely written. > > In practise the stack alignment rules create gaps, and the > > most likely reason for overflow is a large on-stack buffer > > that isn't actually written to. > > > > The only real way to detect kernel stack overflow is to arrange > > for an unmapped page beyond the stack. > > That costs KVA, but not much else. > > > > That doesn't work either, because the threadinfo sits between the end of the > stack and the beginning of the next page, making it possible to corrupt critical > data without running off the page. Then flip the order of the allocations so that the threadinfo is at the other end. I'm not sure how many per-lwp structures the linux kernel has, but I know that on netbsd the main thing the kernel stack hits is the fpu save area - the end of that is the avx area which won't be needed when the stack usage is large. Everything else could be moved from the stack pages to the lwp struct itself. David
diff --git a/kernel/sched/core.c b/kernel/sched/core.c index ec1a286..d6af6a0 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -2660,6 +2660,9 @@ static noinline void __schedule_bug(struct task_struct *prev) */ static inline void schedule_debug(struct task_struct *prev) { + if (unlikely(prev != &init_task && + task_stack_end_corrupted(prev))) + BUG(); /* * Test if we are atomic. Since do_exit() needs to call into * schedule() atomically, we ignore that path. Otherwise whine
Currently in the event of a stack overrun a call to schedule() does not check for this type of corruption. This corruption is often silent and can go unnoticed. However once the corrupted region is examined at a later stage, the outcome is undefined and often results in a sporadic page fault which cannot be handled. This patch checks for a stack overrun and takes appropriate action since the damage is already done, there is no point in continuing. Signed-off-by: Aaron Tomlin <atomlin@redhat.com> --- kernel/sched/core.c | 3 +++ 1 file changed, 3 insertions(+)