diff mbox

[2/2] sched: BUG when stack end location is over written

Message ID 1409842224-11847-3-git-send-email-atomlin@redhat.com (mailing list archive)
State Not Applicable, archived
Headers show

Commit Message

Aaron Tomlin Sept. 4, 2014, 2:50 p.m. UTC
Currently in the event of a stack overrun a call to schedule()
does not check for this type of corruption. This corruption is
often silent and can go unnoticed. However once the corrupted
region is examined at a later stage, the outcome is undefined
and often results in a sporadic page fault which cannot be
handled.

This patch checks for a stack overrun and takes appropriate
action since the damage is already done, there is no point
in continuing.

Signed-off-by: Aaron Tomlin <atomlin@redhat.com>
---
 kernel/sched/core.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Aaron Tomlin Sept. 4, 2014, 4:11 p.m. UTC | #1
On Thu, Sep 04, 2014 at 05:32:31PM +0200, Peter Zijlstra wrote:
> On Thu, Sep 04, 2014 at 03:50:24PM +0100, Aaron Tomlin wrote:
> > Currently in the event of a stack overrun a call to schedule()
> > does not check for this type of corruption. This corruption is
> > often silent and can go unnoticed. However once the corrupted
> > region is examined at a later stage, the outcome is undefined
> > and often results in a sporadic page fault which cannot be
> > handled.
> > 
> > This patch checks for a stack overrun and takes appropriate
> > action since the damage is already done, there is no point
> > in continuing.
> > 
> > Signed-off-by: Aaron Tomlin <atomlin@redhat.com>
> > ---
> >  kernel/sched/core.c | 3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index ec1a286..d6af6a0 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -2660,6 +2660,9 @@ static noinline void __schedule_bug(struct task_struct *prev)
> >   */
> >  static inline void schedule_debug(struct task_struct *prev)
> >  {
> > +	if (unlikely(prev != &init_task &&
> > +		task_stack_end_corrupted(prev)))
> > +		BUG();
> 
> superfluous linebreak, also we appear to have BUG_ON() for situations
> just like these.
> 
> secondly, while I appreciate the 'feature' you're making schedule()
> slower for everybody, what do you propose to do about that?

Understood. I will wrap this with a suitable Kconfig option.
Aaron Tomlin Sept. 8, 2014, 7:23 p.m. UTC | #2
Currently in the event of a stack overrun a call to schedule()
does not check for this type of corruption. This corruption is
often silent and can go unnoticed. However once the corrupted
region is examined at a later stage, the outcome is undefined
and often results in a sporadic page fault which cannot be
handled.

The first patch adds a canary to init_task's end of stack.
While the second patch provides a helper to determine the
integrity of the canary. The third checks for a stack
overrun and takes appropriate action since the damage
is already done, there is no point in continuing.

Changes since v1:

 * Rebased against v3.17-rc4
 * Add a canary to init_task - Oleg Nesterov
 * Fix various code formatting issues - Peter Zijlstra
 * Introduce Kconfig option - Peter Zijlstra

Aaron Tomlin (3):
  init/main.c: Give init_task a canary
  sched: Add helper for task stack page overrun checking
  sched: BUG when stack end location is over written

 arch/powerpc/mm/fault.c    |  5 +----
 arch/x86/mm/fault.c        |  5 +----
 include/linux/sched.h      |  4 ++++
 init/main.c                |  1 +
 kernel/fork.c              | 12 +++++++++---
 kernel/sched/core.c        |  4 ++++
 kernel/trace/trace_stack.c |  4 +---
 lib/Kconfig.debug          | 12 ++++++++++++
 8 files changed, 33 insertions(+), 14 deletions(-)
Aaron Tomlin Sept. 9, 2014, 9:42 a.m. UTC | #3
Resending with "v2" added to each subject line:


Currently in the event of a stack overrun a call to schedule()
does not check for this type of corruption. This corruption is
often silent and can go unnoticed. However once the corrupted
region is examined at a later stage, the outcome is undefined
and often results in a sporadic page fault which cannot be
handled.

The first patch adds a canary to init_task's end of stack.
While the second patch provides a helper to determine the
integrity of the canary. The third checks for a stack
overrun and takes appropriate action since the damage
is already done, there is no point in continuing.

Changes since v1:

 * Rebased against v3.17-rc4
 * Add a canary to init_task - Oleg Nesterov
 * Fix various code formatting issues - Peter Zijlstra
 * Introduce Kconfig option - Peter Zijlstra

Aaron Tomlin (3):
  init/main.c: Give init_task a canary
  sched: Add helper for task stack page overrun checking
  sched: BUG when stack end location is over written

 arch/powerpc/mm/fault.c    |  5 +----
 arch/x86/mm/fault.c        |  5 +----
 include/linux/sched.h      |  4 ++++
 init/main.c                |  1 +
 kernel/fork.c              | 12 +++++++++---
 kernel/sched/core.c        |  4 ++++
 kernel/trace/trace_stack.c |  4 +---
 lib/Kconfig.debug          | 12 ++++++++++++
 8 files changed, 33 insertions(+), 14 deletions(-)
Aaron Tomlin Sept. 11, 2014, 3:41 p.m. UTC | #4
Hi Peter,

Please let me know if this iteration is satisfactory. Thanks.


Currently in the event of a stack overrun a call to schedule()
does not check for this type of corruption. This corruption is
often silent and can go unnoticed. However once the corrupted
region is examined at a later stage, the outcome is undefined
and often results in a sporadic page fault which cannot be
handled.

The first patch adds a canary to init_task's end of stack.
While the second patch provides a helper to determine the
integrity of the canary. The third checks for a stack
overrun and takes appropriate action since the damage
is already done, there is no point in continuing.

Changes since v2:

 * Use BUG_ON in schedule_debug() - Peter Zijlstra
 * Use a more explicit function
   name for setting the canary - Chuck Ebbert

Changes since v1:

 * Rebased against v3.17-rc4
 * Add a canary to init_task - Oleg Nesterov
 * Fix various code formatting issues - Peter Zijlstra
 * Introduce Kconfig option - Peter Zijlstra

Aaron Tomlin (3):
  init/main.c: Give init_task a canary
  sched: Add helper for task stack page overrun checking
  sched: BUG when stack end location is over written

 arch/powerpc/mm/fault.c    |  5 +----
 arch/x86/mm/fault.c        |  5 +----
 include/linux/sched.h      |  4 ++++
 init/main.c                |  1 +
 kernel/fork.c              | 12 +++++++++---
 kernel/sched/core.c        |  3 +++
 kernel/trace/trace_stack.c |  4 +---
 lib/Kconfig.debug          | 12 ++++++++++++
 8 files changed, 32 insertions(+), 14 deletions(-)
Peter Zijlstra Sept. 11, 2014, 3:53 p.m. UTC | #5
What's with the threading all versions together? Please don't do that --
also don't post a new version just for this though.
Aaron Tomlin Sept. 11, 2014, 3:59 p.m. UTC | #6
On Thu, Sep 11, 2014 at 05:53:03PM +0200, Peter Zijlstra wrote:
> 
> What's with the threading all versions together? Please don't do that --
> also don't post a new version just for this though.

Sorry about that. Noted.
David Laight Sept. 11, 2014, 4:02 p.m. UTC | #7
From: Aaron Tomlin

> Currently in the event of a stack overrun a call to schedule()

> does not check for this type of corruption. This corruption is

> often silent and can go unnoticed. However once the corrupted

> region is examined at a later stage, the outcome is undefined

> and often results in a sporadic page fault which cannot be

> handled.

> 

> The first patch adds a canary to init_task's end of stack.

> While the second patch provides a helper to determine the

> integrity of the canary. The third checks for a stack

> overrun and takes appropriate action since the damage

> is already done, there is no point in continuing.


Clearly you've just been 'bitten' by a kernel stack overflow.
But a simple 'canary' isn't going to find most of the overflows
and will give an incorrect 'sense of security'.

The canary will only work if the stack is densely written.
In practise the stack alignment rules create gaps, and the
most likely reason for overflow is a large on-stack buffer
that isn't actually written to.

The only real way to detect kernel stack overflow is to arrange
for an unmapped page beyond the stack.
That costs KVA, but not much else.

	David
Chuck Ebbert Sept. 11, 2014, 5:26 p.m. UTC | #8
On Thu, 11 Sep 2014 16:02:45 +0000
David Laight <David.Laight@ACULAB.COM> wrote:

> From: Aaron Tomlin
> > Currently in the event of a stack overrun a call to schedule()
> > does not check for this type of corruption. This corruption is
> > often silent and can go unnoticed. However once the corrupted
> > region is examined at a later stage, the outcome is undefined
> > and often results in a sporadic page fault which cannot be
> > handled.
> > 
> > The first patch adds a canary to init_task's end of stack.
> > While the second patch provides a helper to determine the
> > integrity of the canary. The third checks for a stack
> > overrun and takes appropriate action since the damage
> > is already done, there is no point in continuing.
> 
> Clearly you've just been 'bitten' by a kernel stack overflow.
> But a simple 'canary' isn't going to find most of the overflows
> and will give an incorrect 'sense of security'.
> 
> The canary will only work if the stack is densely written.
> In practise the stack alignment rules create gaps, and the
> most likely reason for overflow is a large on-stack buffer
> that isn't actually written to.
> 
> The only real way to detect kernel stack overflow is to arrange
> for an unmapped page beyond the stack.
> That costs KVA, but not much else.
> 

That doesn't work either, because the threadinfo sits between the end of the
stack and the beginning of the next page, making it possible to corrupt critical
data without running off the page.
Aaron Tomlin Sept. 11, 2014, 5:44 p.m. UTC | #9
On Thu, Sep 11, 2014 at 04:02:45PM +0000, David Laight wrote:
> From: Aaron Tomlin
> > Currently in the event of a stack overrun a call to schedule()
> > does not check for this type of corruption. This corruption is
> > often silent and can go unnoticed. However once the corrupted
> > region is examined at a later stage, the outcome is undefined
> > and often results in a sporadic page fault which cannot be
> > handled.
> > 
> > The first patch adds a canary to init_task's end of stack.
> > While the second patch provides a helper to determine the
> > integrity of the canary. The third checks for a stack
> > overrun and takes appropriate action since the damage
> > is already done, there is no point in continuing.
> 
> Clearly you've just been 'bitten' by a kernel stack overflow.
> But a simple 'canary' isn't going to find most of the overflows
> and will give an incorrect 'sense of security'.

Please note that this is not suppose to be a 'perfect' solution.
Rather a worth while check in this particular code path.
Let's assume that the canary is damaged. In this situation it is
rather likely that the thread_info object has been compromised too.
David Laight Sept. 12, 2014, 8:43 a.m. UTC | #10
From: Chuck Ebbert 
> David Laight <David.Laight@ACULAB.COM> wrote:
> 
> > From: Aaron Tomlin
> > > Currently in the event of a stack overrun a call to schedule()
> > > does not check for this type of corruption. This corruption is
> > > often silent and can go unnoticed. However once the corrupted
> > > region is examined at a later stage, the outcome is undefined
> > > and often results in a sporadic page fault which cannot be
> > > handled.
> > >
> > > The first patch adds a canary to init_task's end of stack.
> > > While the second patch provides a helper to determine the
> > > integrity of the canary. The third checks for a stack
> > > overrun and takes appropriate action since the damage
> > > is already done, there is no point in continuing.
> >
> > Clearly you've just been 'bitten' by a kernel stack overflow.
> > But a simple 'canary' isn't going to find most of the overflows
> > and will give an incorrect 'sense of security'.
> >
> > The canary will only work if the stack is densely written.
> > In practise the stack alignment rules create gaps, and the
> > most likely reason for overflow is a large on-stack buffer
> > that isn't actually written to.
> >
> > The only real way to detect kernel stack overflow is to arrange
> > for an unmapped page beyond the stack.
> > That costs KVA, but not much else.
> >
> 
> That doesn't work either, because the threadinfo sits between the end of the
> stack and the beginning of the next page, making it possible to corrupt critical
> data without running off the page.

Then flip the order of the allocations so that the threadinfo is at the other end.
I'm not sure how many per-lwp structures the linux kernel has, but
I know that on netbsd the main thing the kernel stack hits is the fpu
save area - the end of that is the avx area which won't be needed when
the stack usage is large.
Everything else could be moved from the stack pages to the lwp struct itself.

	David
diff mbox

Patch

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index ec1a286..d6af6a0 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2660,6 +2660,9 @@  static noinline void __schedule_bug(struct task_struct *prev)
  */
 static inline void schedule_debug(struct task_struct *prev)
 {
+	if (unlikely(prev != &init_task &&
+		task_stack_end_corrupted(prev)))
+		BUG();
 	/*
 	 * Test if we are atomic. Since do_exit() needs to call into
 	 * schedule() atomically, we ignore that path. Otherwise whine