Message ID | 1410450088-18236-4-git-send-email-atomlin@redhat.com (mailing list archive) |
---|---|
State | Not Applicable, archived |
Headers | show |
On Thu, 2014-09-11 at 16:41 +0100, Aaron Tomlin wrote: > diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug > index a285900..2a8280a 100644 > --- a/lib/Kconfig.debug > +++ b/lib/Kconfig.debug > @@ -824,6 +824,18 @@ config SCHEDSTATS > application, you can say N to avoid the very slight overhead > this adds. > > +config SCHED_STACK_END_CHECK > + bool "Detect stack corruption on calls to schedule()" > + depends on DEBUG_KERNEL > + default y Did you really mean default y? Doing so means it will be turned on more or less everywhere, which defeats the purpose of having a config option in the first place. > diff --git a/kernel/sched/core.c b/kernel/sched/core.c > index ec1a286..0b70b73 100644 > --- a/kernel/sched/core.c > +++ b/kernel/sched/core.c > @@ -2660,6 +2660,9 @@ static noinline void __schedule_bug(struct task_struct *prev) > */ > static inline void schedule_debug(struct task_struct *prev) > { > +#ifdef CONFIG_SCHED_STACK_END_CHECK > + BUG_ON(unlikely(task_stack_end_corrupted(prev))) > +#endif If this was my code I'd make you put that in a static inline. cheers
On Thu, 2014-09-11 at 16:41 +0100, Aaron Tomlin wrote: > Currently in the event of a stack overrun a call to schedule() > does not check for this type of corruption. This corruption is > often silent and can go unnoticed. However once the corrupted > region is examined at a later stage, the outcome is undefined > and often results in a sporadic page fault which cannot be > handled. > > This patch checks for a stack overrun and takes appropriate > action since the damage is already done, there is no point > in continuing. > > Signed-off-by: Aaron Tomlin <atomlin@redhat.com> > --- > kernel/sched/core.c | 3 +++ > lib/Kconfig.debug | 12 ++++++++++++ > 2 files changed, 15 insertions(+) > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c > index ec1a286..0b70b73 100644 > --- a/kernel/sched/core.c > +++ b/kernel/sched/core.c > @@ -2660,6 +2660,9 @@ static noinline void __schedule_bug(struct task_struct *prev) > */ > static inline void schedule_debug(struct task_struct *prev) > { > +#ifdef CONFIG_SCHED_STACK_END_CHECK > + BUG_ON(unlikely(task_stack_end_corrupted(prev))) > +#endif Spot the bug? Please compile your code in future. ../kernel/sched/core.c: In function ‘schedule_debug’: ../kernel/sched/core.c:2671:2: error: expected ‘;’ before ‘if’ if (unlikely(in_atomic_preempt_off() && prev->state != TASK_DEAD)) ^ ../kernel/sched/core.c: At top level: ../kernel/sched/core.c:2635:22: warning: ‘__schedule_bug’ defined but not used [-Wunused-function] static noinline void __schedule_bug(struct task_struct *prev) ^ make[3]: *** [kernel/sched/core.o] Error 1 make[2]: *** [kernel/sched] Error 2 make[1]: *** [kernel] Error 2 make: *** [sub-make] Error 2 cheers
On Fri, Sep 12, 2014 at 02:06:57PM +1000, Michael Ellerman wrote: > On Thu, 2014-09-11 at 16:41 +0100, Aaron Tomlin wrote: > > diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug > > index a285900..2a8280a 100644 > > --- a/lib/Kconfig.debug > > +++ b/lib/Kconfig.debug > > @@ -824,6 +824,18 @@ config SCHEDSTATS > > application, you can say N to avoid the very slight overhead > > this adds. > > > > +config SCHED_STACK_END_CHECK > > + bool "Detect stack corruption on calls to schedule()" > > + depends on DEBUG_KERNEL > > + default y > > Did you really mean default y? > > Doing so means it will be turned on more or less everywhere, which defeats the > purpose of having a config option in the first place. Only if Kconfig CONFIG_DEBUG_KERNEL is enabled in the first place.
On Fri, Sep 12, 2014 at 04:04:51PM +1000, Michael Ellerman wrote: > On Thu, 2014-09-11 at 16:41 +0100, Aaron Tomlin wrote: > > Currently in the event of a stack overrun a call to schedule() > > does not check for this type of corruption. This corruption is > > often silent and can go unnoticed. However once the corrupted > > region is examined at a later stage, the outcome is undefined > > and often results in a sporadic page fault which cannot be > > handled. > > > > This patch checks for a stack overrun and takes appropriate > > action since the damage is already done, there is no point > > in continuing. > > > > Signed-off-by: Aaron Tomlin <atomlin@redhat.com> > > --- > > kernel/sched/core.c | 3 +++ > > lib/Kconfig.debug | 12 ++++++++++++ > > 2 files changed, 15 insertions(+) > > > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c > > index ec1a286..0b70b73 100644 > > --- a/kernel/sched/core.c > > +++ b/kernel/sched/core.c > > @@ -2660,6 +2660,9 @@ static noinline void __schedule_bug(struct task_struct *prev) > > */ > > static inline void schedule_debug(struct task_struct *prev) > > { > > +#ifdef CONFIG_SCHED_STACK_END_CHECK > > + BUG_ON(unlikely(task_stack_end_corrupted(prev))) > > +#endif > > Spot the bug? Please compile your code in future. Oops! Sorry about that.
On Fri, 2014-09-12 at 10:44 +0100, Aaron Tomlin wrote: > On Fri, Sep 12, 2014 at 02:06:57PM +1000, Michael Ellerman wrote: > > On Thu, 2014-09-11 at 16:41 +0100, Aaron Tomlin wrote: > > > diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug > > > index a285900..2a8280a 100644 > > > --- a/lib/Kconfig.debug > > > +++ b/lib/Kconfig.debug > > > @@ -824,6 +824,18 @@ config SCHEDSTATS > > > application, you can say N to avoid the very slight overhead > > > this adds. > > > > > > +config SCHED_STACK_END_CHECK > > > + bool "Detect stack corruption on calls to schedule()" > > > + depends on DEBUG_KERNEL > > > + default y > > > > Did you really mean default y? > > > > Doing so means it will be turned on more or less everywhere, which defeats the > > purpose of having a config option in the first place. > > Only if Kconfig CONFIG_DEBUG_KERNEL is enabled in the first place. Which is likely enabled just about everywhere on the planet. -Mike
On Fri, 2014-09-12 at 12:58 +0200, Mike Galbraith wrote: > On Fri, 2014-09-12 at 10:44 +0100, Aaron Tomlin wrote: > > On Fri, Sep 12, 2014 at 02:06:57PM +1000, Michael Ellerman wrote: > > > On Thu, 2014-09-11 at 16:41 +0100, Aaron Tomlin wrote: > > > > diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug > > > > index a285900..2a8280a 100644 > > > > --- a/lib/Kconfig.debug > > > > +++ b/lib/Kconfig.debug > > > > @@ -824,6 +824,18 @@ config SCHEDSTATS > > > > application, you can say N to avoid the very slight overhead > > > > this adds. > > > > > > > > +config SCHED_STACK_END_CHECK > > > > + bool "Detect stack corruption on calls to schedule()" > > > > + depends on DEBUG_KERNEL > > > > + default y > > > > > > Did you really mean default y? > > > > > > Doing so means it will be turned on more or less everywhere, which defeats the > > > purpose of having a config option in the first place. > > > > Only if Kconfig CONFIG_DEBUG_KERNEL is enabled in the first place. > > Which is likely enabled just about everywhere on the planet. Yes, including all distros I'm aware of. cheers
diff --git a/kernel/sched/core.c b/kernel/sched/core.c index ec1a286..0b70b73 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -2660,6 +2660,9 @@ static noinline void __schedule_bug(struct task_struct *prev) */ static inline void schedule_debug(struct task_struct *prev) { +#ifdef CONFIG_SCHED_STACK_END_CHECK + BUG_ON(unlikely(task_stack_end_corrupted(prev))) +#endif /* * Test if we are atomic. Since do_exit() needs to call into * schedule() atomically, we ignore that path. Otherwise whine diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index a285900..2a8280a 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -824,6 +824,18 @@ config SCHEDSTATS application, you can say N to avoid the very slight overhead this adds. +config SCHED_STACK_END_CHECK + bool "Detect stack corruption on calls to schedule()" + depends on DEBUG_KERNEL + default y + help + This option checks for a stack overrun on calls to schedule(). + If the stack end location is found to be over written always panic as + the content of the corrupted region can no longer be trusted. + This is to ensure no erroneous behaviour occurs which could result in + data corruption or a sporadic crash at a later stage once the region + is examined. The runtime overhead introduced is minimal. + config TIMER_STATS bool "Collect kernel timers statistics" depends on DEBUG_KERNEL && PROC_FS
Currently in the event of a stack overrun a call to schedule() does not check for this type of corruption. This corruption is often silent and can go unnoticed. However once the corrupted region is examined at a later stage, the outcome is undefined and often results in a sporadic page fault which cannot be handled. This patch checks for a stack overrun and takes appropriate action since the damage is already done, there is no point in continuing. Signed-off-by: Aaron Tomlin <atomlin@redhat.com> --- kernel/sched/core.c | 3 +++ lib/Kconfig.debug | 12 ++++++++++++ 2 files changed, 15 insertions(+)