diff mbox

[v3,3/3] sched: BUG when stack end location is over written

Message ID 1410450088-18236-4-git-send-email-atomlin@redhat.com (mailing list archive)
State Not Applicable, archived
Headers show

Commit Message

Aaron Tomlin Sept. 11, 2014, 3:41 p.m. UTC
Currently in the event of a stack overrun a call to schedule()
does not check for this type of corruption. This corruption is
often silent and can go unnoticed. However once the corrupted
region is examined at a later stage, the outcome is undefined
and often results in a sporadic page fault which cannot be
handled.

This patch checks for a stack overrun and takes appropriate
action since the damage is already done, there is no point
in continuing.

Signed-off-by: Aaron Tomlin <atomlin@redhat.com>
---
 kernel/sched/core.c |  3 +++
 lib/Kconfig.debug   | 12 ++++++++++++
 2 files changed, 15 insertions(+)

Comments

Michael Ellerman Sept. 12, 2014, 4:06 a.m. UTC | #1
On Thu, 2014-09-11 at 16:41 +0100, Aaron Tomlin wrote:
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index a285900..2a8280a 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -824,6 +824,18 @@ config SCHEDSTATS
>  	  application, you can say N to avoid the very slight overhead
>  	  this adds.
>  
> +config SCHED_STACK_END_CHECK
> +	bool "Detect stack corruption on calls to schedule()"
> +	depends on DEBUG_KERNEL
> +	default y

Did you really mean default y?

Doing so means it will be turned on more or less everywhere, which defeats the
purpose of having a config option in the first place.

> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index ec1a286..0b70b73 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -2660,6 +2660,9 @@ static noinline void __schedule_bug(struct task_struct *prev)
>   */
>  static inline void schedule_debug(struct task_struct *prev)
>  {
> +#ifdef CONFIG_SCHED_STACK_END_CHECK
> +	BUG_ON(unlikely(task_stack_end_corrupted(prev)))
> +#endif

If this was my code I'd make you put that in a static inline.

cheers
Michael Ellerman Sept. 12, 2014, 6:04 a.m. UTC | #2
On Thu, 2014-09-11 at 16:41 +0100, Aaron Tomlin wrote:
> Currently in the event of a stack overrun a call to schedule()
> does not check for this type of corruption. This corruption is
> often silent and can go unnoticed. However once the corrupted
> region is examined at a later stage, the outcome is undefined
> and often results in a sporadic page fault which cannot be
> handled.
> 
> This patch checks for a stack overrun and takes appropriate
> action since the damage is already done, there is no point
> in continuing.
> 
> Signed-off-by: Aaron Tomlin <atomlin@redhat.com>
> ---
>  kernel/sched/core.c |  3 +++
>  lib/Kconfig.debug   | 12 ++++++++++++
>  2 files changed, 15 insertions(+)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index ec1a286..0b70b73 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -2660,6 +2660,9 @@ static noinline void __schedule_bug(struct task_struct *prev)
>   */
>  static inline void schedule_debug(struct task_struct *prev)
>  {
> +#ifdef CONFIG_SCHED_STACK_END_CHECK
> +	BUG_ON(unlikely(task_stack_end_corrupted(prev)))
> +#endif

Spot the bug? Please compile your code in future.


../kernel/sched/core.c: In function ‘schedule_debug’:
../kernel/sched/core.c:2671:2: error: expected ‘;’ before ‘if’
  if (unlikely(in_atomic_preempt_off() && prev->state != TASK_DEAD))
  ^
../kernel/sched/core.c: At top level:
../kernel/sched/core.c:2635:22: warning: ‘__schedule_bug’ defined but not used [-Wunused-function]
 static noinline void __schedule_bug(struct task_struct *prev)
                      ^
make[3]: *** [kernel/sched/core.o] Error 1
make[2]: *** [kernel/sched] Error 2
make[1]: *** [kernel] Error 2
make: *** [sub-make] Error 2


cheers
Aaron Tomlin Sept. 12, 2014, 9:44 a.m. UTC | #3
On Fri, Sep 12, 2014 at 02:06:57PM +1000, Michael Ellerman wrote:
> On Thu, 2014-09-11 at 16:41 +0100, Aaron Tomlin wrote:
> > diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> > index a285900..2a8280a 100644
> > --- a/lib/Kconfig.debug
> > +++ b/lib/Kconfig.debug
> > @@ -824,6 +824,18 @@ config SCHEDSTATS
> >  	  application, you can say N to avoid the very slight overhead
> >  	  this adds.
> >  
> > +config SCHED_STACK_END_CHECK
> > +	bool "Detect stack corruption on calls to schedule()"
> > +	depends on DEBUG_KERNEL
> > +	default y
> 
> Did you really mean default y?
> 
> Doing so means it will be turned on more or less everywhere, which defeats the
> purpose of having a config option in the first place.

Only if Kconfig CONFIG_DEBUG_KERNEL is enabled in the first place.
Aaron Tomlin Sept. 12, 2014, 9:50 a.m. UTC | #4
On Fri, Sep 12, 2014 at 04:04:51PM +1000, Michael Ellerman wrote:
> On Thu, 2014-09-11 at 16:41 +0100, Aaron Tomlin wrote:
> > Currently in the event of a stack overrun a call to schedule()
> > does not check for this type of corruption. This corruption is
> > often silent and can go unnoticed. However once the corrupted
> > region is examined at a later stage, the outcome is undefined
> > and often results in a sporadic page fault which cannot be
> > handled.
> > 
> > This patch checks for a stack overrun and takes appropriate
> > action since the damage is already done, there is no point
> > in continuing.
> > 
> > Signed-off-by: Aaron Tomlin <atomlin@redhat.com>
> > ---
> >  kernel/sched/core.c |  3 +++
> >  lib/Kconfig.debug   | 12 ++++++++++++
> >  2 files changed, 15 insertions(+)
> > 
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index ec1a286..0b70b73 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -2660,6 +2660,9 @@ static noinline void __schedule_bug(struct task_struct *prev)
> >   */
> >  static inline void schedule_debug(struct task_struct *prev)
> >  {
> > +#ifdef CONFIG_SCHED_STACK_END_CHECK
> > +	BUG_ON(unlikely(task_stack_end_corrupted(prev)))
> > +#endif
> 
> Spot the bug? Please compile your code in future.

Oops! Sorry about that.
Mike Galbraith Sept. 12, 2014, 10:58 a.m. UTC | #5
On Fri, 2014-09-12 at 10:44 +0100, Aaron Tomlin wrote: 
> On Fri, Sep 12, 2014 at 02:06:57PM +1000, Michael Ellerman wrote:
> > On Thu, 2014-09-11 at 16:41 +0100, Aaron Tomlin wrote:
> > > diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> > > index a285900..2a8280a 100644
> > > --- a/lib/Kconfig.debug
> > > +++ b/lib/Kconfig.debug
> > > @@ -824,6 +824,18 @@ config SCHEDSTATS
> > >  	  application, you can say N to avoid the very slight overhead
> > >  	  this adds.
> > >  
> > > +config SCHED_STACK_END_CHECK
> > > +	bool "Detect stack corruption on calls to schedule()"
> > > +	depends on DEBUG_KERNEL
> > > +	default y
> > 
> > Did you really mean default y?
> > 
> > Doing so means it will be turned on more or less everywhere, which defeats the
> > purpose of having a config option in the first place.
> 
> Only if Kconfig CONFIG_DEBUG_KERNEL is enabled in the first place.

Which is likely enabled just about everywhere on the planet.

-Mike
Michael Ellerman Sept. 15, 2014, 2:39 a.m. UTC | #6
On Fri, 2014-09-12 at 12:58 +0200, Mike Galbraith wrote:
> On Fri, 2014-09-12 at 10:44 +0100, Aaron Tomlin wrote: 
> > On Fri, Sep 12, 2014 at 02:06:57PM +1000, Michael Ellerman wrote:
> > > On Thu, 2014-09-11 at 16:41 +0100, Aaron Tomlin wrote:
> > > > diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> > > > index a285900..2a8280a 100644
> > > > --- a/lib/Kconfig.debug
> > > > +++ b/lib/Kconfig.debug
> > > > @@ -824,6 +824,18 @@ config SCHEDSTATS
> > > >  	  application, you can say N to avoid the very slight overhead
> > > >  	  this adds.
> > > >  
> > > > +config SCHED_STACK_END_CHECK
> > > > +	bool "Detect stack corruption on calls to schedule()"
> > > > +	depends on DEBUG_KERNEL
> > > > +	default y
> > > 
> > > Did you really mean default y?
> > > 
> > > Doing so means it will be turned on more or less everywhere, which defeats the
> > > purpose of having a config option in the first place.
> > 
> > Only if Kconfig CONFIG_DEBUG_KERNEL is enabled in the first place.
> 
> Which is likely enabled just about everywhere on the planet.

Yes, including all distros I'm aware of.

cheers
diff mbox

Patch

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index ec1a286..0b70b73 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2660,6 +2660,9 @@  static noinline void __schedule_bug(struct task_struct *prev)
  */
 static inline void schedule_debug(struct task_struct *prev)
 {
+#ifdef CONFIG_SCHED_STACK_END_CHECK
+	BUG_ON(unlikely(task_stack_end_corrupted(prev)))
+#endif
 	/*
 	 * Test if we are atomic. Since do_exit() needs to call into
 	 * schedule() atomically, we ignore that path. Otherwise whine
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index a285900..2a8280a 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -824,6 +824,18 @@  config SCHEDSTATS
 	  application, you can say N to avoid the very slight overhead
 	  this adds.
 
+config SCHED_STACK_END_CHECK
+	bool "Detect stack corruption on calls to schedule()"
+	depends on DEBUG_KERNEL
+	default y
+	help
+	  This option checks for a stack overrun on calls to schedule().
+	  If the stack end location is found to be over written always panic as
+	  the content of the corrupted region can no longer be trusted.
+	  This is to ensure no erroneous behaviour occurs which could result in
+	  data corruption or a sporadic crash at a later stage once the region
+	  is examined. The runtime overhead introduced is minimal.
+
 config TIMER_STATS
 	bool "Collect kernel timers statistics"
 	depends on DEBUG_KERNEL && PROC_FS