Message ID | 20190509121923.8339-1-pmladek@suse.com (mailing list archive) |
---|---|
State | Not Applicable |
Headers | show |
Series | vsprintf: Do not break early boot with probing addresses | expand |
Context | Check | Description |
---|---|---|
snowpatch_ozlabs/apply_patch | warning | Failed to apply on branch next (8150a153c013aa2dd1ffae43370b89ac1347a7fb) |
snowpatch_ozlabs/apply_patch | fail | Failed to apply to any branch |
On Thu, May 09, 2019 at 02:19:23PM +0200, Petr Mladek wrote: > The commit 3e5903eb9cff70730 ("vsprintf: Prevent crash when dereferencing > invalid pointers") broke boot on several architectures. The common > pattern is that probe_kernel_read() is not working during early > boot because userspace access framework is not ready. > > The check is only the best effort. Let's not rush with it during > the early boot. > > Details: > > 1. Report on Power: > > Kernel crashes very early during boot with with CONFIG_PPC_KUAP and > CONFIG_JUMP_LABEL_FEATURE_CHECK_DEBUG > > The problem is the combination of some new code called via printk(), > check_pointer() which calls probe_kernel_read(). That then calls > allow_user_access() (PPC_KUAP) and that uses mmu_has_feature() too early > (before we've patched features). With the JUMP_LABEL debug enabled that > causes us to call printk() & dump_stack() and we end up recursing and > overflowing the stack. > > Because it happens so early you don't get any output, just an apparently > dead system. > > The stack trace (which you don't see) is something like: > > ... > dump_stack+0xdc > probe_kernel_read+0x1a4 > check_pointer+0x58 > string+0x3c > vsnprintf+0x1bc > vscnprintf+0x20 > printk_safe_log_store+0x7c > printk+0x40 > dump_stack_print_info+0xbc > dump_stack+0x8 > probe_kernel_read+0x1a4 > probe_kernel_read+0x19c > check_pointer+0x58 > string+0x3c > vsnprintf+0x1bc > vscnprintf+0x20 > vprintk_store+0x6c > vprintk_emit+0xec > vprintk_func+0xd4 > printk+0x40 > cpufeatures_process_feature+0xc8 > scan_cpufeatures_subnodes+0x380 > of_scan_flat_dt_subnodes+0xb4 > dt_cpu_ftrs_scan_callback+0x158 > of_scan_flat_dt+0xf0 > dt_cpu_ftrs_scan+0x3c > early_init_devtree+0x360 > early_setup+0x9c > > 2. Report on s390: > > vsnprintf invocations, are broken on s390. For example, the early boot > output now looks like this where the first (efault) should be > the linux_banner: > > [ 0.099985] (efault) > [ 0.099985] setup: Linux is running as a z/VM guest operating system in 64-bit mode > [ 0.100066] setup: The maximum memory size is 8192MB > [ 0.100070] cma: Reserved 4 MiB at (efault) > [ 0.100100] numa: NUMA mode: (efault) > > The reason for this, is that the code assumes that > probe_kernel_address() works very early. This however is not true on > at least s390. Uaccess on KERNEL_DS works only after page tables have > been setup on s390, which happens with setup_arch()->paging_init(). > > Any probe_kernel_address() invocation before that will return -EFAULT. > It's seems as a good enough fix. Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Though in all cases would be nice to distinguish error pointers as well. Something like if (IS_ERR(ptr)) return err_pointer_str(ptr); in check_pointer_msg(). > Fixes: 3e5903eb9cff70730 ("vsprintf: Prevent crash when dereferencing invalid pointers") > Signed-off-by: Petr Mladek <pmladek@suse.com> > --- > lib/vsprintf.c | 9 +++++++-- > 1 file changed, 7 insertions(+), 2 deletions(-) > > diff --git a/lib/vsprintf.c b/lib/vsprintf.c > index 7b0a6140bfad..8b43a883be6b 100644 > --- a/lib/vsprintf.c > +++ b/lib/vsprintf.c > @@ -640,8 +640,13 @@ static const char *check_pointer_msg(const void *ptr) > if (!ptr) > return "(null)"; > > - if (probe_kernel_address(ptr, byte)) > - return "(efault)"; > + /* User space address handling is not ready during early boot. */ > + if (system_state <= SYSTEM_BOOTING) { > + if ((unsigned long)ptr < PAGE_SIZE) > + return "(efault)"; > + } else { > + if (probe_kernel_address(ptr, byte)) > + return "(efault)"; > > return NULL; > } > -- > 2.16.4 >
On Thu, 9 May 2019 14:19:23 +0200 Petr Mladek <pmladek@suse.com> wrote: > The commit 3e5903eb9cff70730 ("vsprintf: Prevent crash when dereferencing > invalid pointers") broke boot on several architectures. The common > pattern is that probe_kernel_read() is not working during early > boot because userspace access framework is not ready. > > The check is only the best effort. Let's not rush with it during > the early boot. > > Details: > > 1. Report on Power: > > Kernel crashes very early during boot with with CONFIG_PPC_KUAP and > CONFIG_JUMP_LABEL_FEATURE_CHECK_DEBUG > > The problem is the combination of some new code called via printk(), > check_pointer() which calls probe_kernel_read(). That then calls > allow_user_access() (PPC_KUAP) and that uses mmu_has_feature() too early > (before we've patched features). With the JUMP_LABEL debug enabled that > causes us to call printk() & dump_stack() and we end up recursing and > overflowing the stack. > > Because it happens so early you don't get any output, just an apparently > dead system. > > The stack trace (which you don't see) is something like: > > ... > dump_stack+0xdc > probe_kernel_read+0x1a4 > check_pointer+0x58 > string+0x3c > vsnprintf+0x1bc > vscnprintf+0x20 > printk_safe_log_store+0x7c > printk+0x40 > dump_stack_print_info+0xbc > dump_stack+0x8 > probe_kernel_read+0x1a4 > probe_kernel_read+0x19c > check_pointer+0x58 > string+0x3c > vsnprintf+0x1bc > vscnprintf+0x20 > vprintk_store+0x6c > vprintk_emit+0xec > vprintk_func+0xd4 > printk+0x40 > cpufeatures_process_feature+0xc8 > scan_cpufeatures_subnodes+0x380 > of_scan_flat_dt_subnodes+0xb4 > dt_cpu_ftrs_scan_callback+0x158 > of_scan_flat_dt+0xf0 > dt_cpu_ftrs_scan+0x3c > early_init_devtree+0x360 > early_setup+0x9c > > 2. Report on s390: > > vsnprintf invocations, are broken on s390. For example, the early boot > output now looks like this where the first (efault) should be > the linux_banner: > > [ 0.099985] (efault) > [ 0.099985] setup: Linux is running as a z/VM guest operating system in 64-bit mode > [ 0.100066] setup: The maximum memory size is 8192MB > [ 0.100070] cma: Reserved 4 MiB at (efault) > [ 0.100100] numa: NUMA mode: (efault) > > The reason for this, is that the code assumes that > probe_kernel_address() works very early. This however is not true on > at least s390. Uaccess on KERNEL_DS works only after page tables have > been setup on s390, which happens with setup_arch()->paging_init(). > > Any probe_kernel_address() invocation before that will return -EFAULT. Hmm, this sounds to me that probe_kernel_address() is broken for these architectures. Perhaps the system_state check should be in probe_kernel_address() for those architectures? > > Fixes: 3e5903eb9cff70730 ("vsprintf: Prevent crash when dereferencing invalid pointers") > Signed-off-by: Petr Mladek <pmladek@suse.com> > --- > lib/vsprintf.c | 9 +++++++-- > 1 file changed, 7 insertions(+), 2 deletions(-) > > diff --git a/lib/vsprintf.c b/lib/vsprintf.c > index 7b0a6140bfad..8b43a883be6b 100644 > --- a/lib/vsprintf.c > +++ b/lib/vsprintf.c > @@ -640,8 +640,13 @@ static const char *check_pointer_msg(const void *ptr) > if (!ptr) > return "(null)"; > > - if (probe_kernel_address(ptr, byte)) > - return "(efault)"; Either that, or we add a macro to those architectures and add: #ifdef ARCH_NO_EARLY_PROBE_KERNEL_ADDRESS > + /* User space address handling is not ready during early boot. */ > + if (system_state <= SYSTEM_BOOTING) { > + if ((unsigned long)ptr < PAGE_SIZE) > + return "(efault)"; > + } else { #endif Why add an unnecessary branch for archs this is not a problem for? -- Steve > + if (probe_kernel_address(ptr, byte)) > + return "(efault)"; > > return NULL; > }
On Thu, 9 May 2019 14:19:23 +0200 Petr Mladek <pmladek@suse.com> wrote: > The commit 3e5903eb9cff70730 ("vsprintf: Prevent crash when dereferencing > invalid pointers") broke boot on several architectures. The common > pattern is that probe_kernel_read() is not working during early > boot because userspace access framework is not ready. > > The check is only the best effort. Let's not rush with it during > the early boot. > > Details: > > 1. Report on Power: > > Kernel crashes very early during boot with with CONFIG_PPC_KUAP and > CONFIG_JUMP_LABEL_FEATURE_CHECK_DEBUG > > The problem is the combination of some new code called via printk(), > check_pointer() which calls probe_kernel_read(). That then calls > allow_user_access() (PPC_KUAP) and that uses mmu_has_feature() too early > (before we've patched features). There is early_mmu_has_feature for this case. mmu_has_feature does not work before patching so parts of kernel that can run before patching must use the early_ variant which actually runs code reading the feature bitmap to determine the answer. Thanks Michal
From: Michal Suchánek > Sent: 09 May 2019 14:38 ... > > The problem is the combination of some new code called via printk(), > > check_pointer() which calls probe_kernel_read(). That then calls > > allow_user_access() (PPC_KUAP) and that uses mmu_has_feature() too early > > (before we've patched features). > > There is early_mmu_has_feature for this case. mmu_has_feature does not > work before patching so parts of kernel that can run before patching > must use the early_ variant which actually runs code reading the > feature bitmap to determine the answer. Does the early_ variant get patched so the it is reasonably efficient after the 'patching' is done? Or should there be a third version which gets patched across? David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
On Thu 2019-05-09 09:13:57, Steven Rostedt wrote: > On Thu, 9 May 2019 14:19:23 +0200 > Petr Mladek <pmladek@suse.com> wrote: > > > The commit 3e5903eb9cff70730 ("vsprintf: Prevent crash when dereferencing > > invalid pointers") broke boot on several architectures. The common > > pattern is that probe_kernel_read() is not working during early > > boot because userspace access framework is not ready. > > > > The check is only the best effort. Let's not rush with it during > > the early boot. > > > > Details: > > > > 1. Report on Power: > > > > Kernel crashes very early during boot with with CONFIG_PPC_KUAP and > > CONFIG_JUMP_LABEL_FEATURE_CHECK_DEBUG > > > > The problem is the combination of some new code called via printk(), > > check_pointer() which calls probe_kernel_read(). That then calls > > allow_user_access() (PPC_KUAP) and that uses mmu_has_feature() too early > > (before we've patched features). With the JUMP_LABEL debug enabled that > > causes us to call printk() & dump_stack() and we end up recursing and > > overflowing the stack. > > > > Because it happens so early you don't get any output, just an apparently > > dead system. > > > > The stack trace (which you don't see) is something like: > > > > ... > > dump_stack+0xdc > > probe_kernel_read+0x1a4 > > check_pointer+0x58 > > string+0x3c > > vsnprintf+0x1bc > > vscnprintf+0x20 > > printk_safe_log_store+0x7c > > printk+0x40 > > dump_stack_print_info+0xbc > > dump_stack+0x8 > > probe_kernel_read+0x1a4 > > probe_kernel_read+0x19c > > check_pointer+0x58 > > string+0x3c > > vsnprintf+0x1bc > > vscnprintf+0x20 > > vprintk_store+0x6c > > vprintk_emit+0xec > > vprintk_func+0xd4 > > printk+0x40 > > cpufeatures_process_feature+0xc8 > > scan_cpufeatures_subnodes+0x380 > > of_scan_flat_dt_subnodes+0xb4 > > dt_cpu_ftrs_scan_callback+0x158 > > of_scan_flat_dt+0xf0 > > dt_cpu_ftrs_scan+0x3c > > early_init_devtree+0x360 > > early_setup+0x9c > > > > 2. Report on s390: > > > > vsnprintf invocations, are broken on s390. For example, the early boot > > output now looks like this where the first (efault) should be > > the linux_banner: > > > > [ 0.099985] (efault) > > [ 0.099985] setup: Linux is running as a z/VM guest operating system in 64-bit mode > > [ 0.100066] setup: The maximum memory size is 8192MB > > [ 0.100070] cma: Reserved 4 MiB at (efault) > > [ 0.100100] numa: NUMA mode: (efault) > > > > The reason for this, is that the code assumes that > > probe_kernel_address() works very early. This however is not true on > > at least s390. Uaccess on KERNEL_DS works only after page tables have > > been setup on s390, which happens with setup_arch()->paging_init(). > > > > Any probe_kernel_address() invocation before that will return -EFAULT. > > Hmm, this sounds to me that probe_kernel_address() is broken for these > architectures. Perhaps the system_state check should be in > probe_kernel_address() for those architectures? Yeah. Well, these problems are hard to debug. It left a dead power system with a blank screen. I am not sure if the added check is worth the pain. I hope that the check would help to debug problems. But it is yet another complexity in printk() path. I think that it is fine to keep it enabled only on the booted system for a while and get some more feedback. Best Regards, Petr
On (05/09/19 14:19), Petr Mladek wrote: > 1. Report on Power: > > Kernel crashes very early during boot with with CONFIG_PPC_KUAP and > CONFIG_JUMP_LABEL_FEATURE_CHECK_DEBUG > > The problem is the combination of some new code called via printk(), > check_pointer() which calls probe_kernel_read(). That then calls > allow_user_access() (PPC_KUAP) and that uses mmu_has_feature() too early > (before we've patched features). With the JUMP_LABEL debug enabled that > causes us to call printk() & dump_stack() and we end up recursing and > overflowing the stack. Hmm... hmm... PPC does an .opd-based symbol dereference, which eventually probe_kernel_read()-s. So early printk(%pS) will do printk(%pS) dereference_function_descriptor() probe_kernel_address() dump_stack() printk(%pS) dereference_function_descriptor() probe_kernel_address() dump_stack() printk(%pS) ... I'd say... that it's not vsprintf that we want to fix, it's the idea that probe_kernel_address() can dump_stack() on any platform. On some archs probe_kernel_address()->dump_stack() is going nowhere: dump_stack() does probe_kernel_address(), which calls dump_stack(), which calls printk(%pS)->probe_kernel_address() again and again, and again. -ss
[ Sorry about html and mobile crud, I'm not at the computer right now ] How about we just undo the whole misguided probe_kernel_address() thing? The excuse for is was that it would avoid crashes. It turns out that was wrong, and that it just made things much worse. Honestly, we haven't really had the crashes that the logic was supposed to protect against in the first place, so by now it's clear that the whole thing was a stupid and horrible mistake in the first place. So let's admit that and just go back to the old sane model. Seriously, we've never needed that odd probing. It causes issues. It's wrong and pointless. Linus On Thu, May 9, 2019, 21:32 Sergey Senozhatsky < sergey.senozhatsky.work@gmail.com> wrote: > On (05/09/19 14:19), Petr Mladek wrote: > > 1. Report on Power: > > > > Kernel crashes very early during boot with with CONFIG_PPC_KUAP and > > CONFIG_JUMP_LABEL_FEATURE_CHECK_DEBUG > > > > The problem is the combination of some new code called via printk(), > > check_pointer() which calls probe_kernel_read(). That then calls > > allow_user_access() (PPC_KUAP) and that uses mmu_has_feature() too early > > (before we've patched features). With the JUMP_LABEL debug enabled that > > causes us to call printk() & dump_stack() and we end up recursing and > > overflowing the stack. > > Hmm... hmm... PPC does an .opd-based symbol dereference, which > eventually probe_kernel_read()-s. So early printk(%pS) will do > > printk(%pS) > dereference_function_descriptor() > probe_kernel_address() > dump_stack() > printk(%pS) > dereference_function_descriptor() > probe_kernel_address() > dump_stack() > printk(%pS) > ... > > I'd say... that it's not vsprintf that we want to fix, it's > the idea that probe_kernel_address() can dump_stack() on any > platform. On some archs probe_kernel_address()->dump_stack() > is going nowhere: > dump_stack() does probe_kernel_address(), which calls dump_stack(), > which calls printk(%pS)->probe_kernel_address() again and again, > and again. > > -ss >
David Laight <David.Laight@ACULAB.COM> writes: > From: Michal Suchánek >> Sent: 09 May 2019 14:38 > ... >> > The problem is the combination of some new code called via printk(), >> > check_pointer() which calls probe_kernel_read(). That then calls >> > allow_user_access() (PPC_KUAP) and that uses mmu_has_feature() too early >> > (before we've patched features). >> >> There is early_mmu_has_feature for this case. mmu_has_feature does not >> work before patching so parts of kernel that can run before patching >> must use the early_ variant which actually runs code reading the >> feature bitmap to determine the answer. > > Does the early_ variant get patched so the it is reasonably > efficient after the 'patching' is done? No they don't get patched ever. The name is a bit misleading I guess. > Or should there be a third version which gets patched across? For a case like this it's entirely safe to just skip the code early in boot, so if it was a static_key_false everything would just work. Unfortunately the way the code is currently written we would have to change all MMU features to static_key_false and that risks breaking something else. We have a long standing TODO to rework all our feature logic and unify CPU/MMU/firmware/etc. features. Possibly as part of that we can come up with a scheme where the default value is per-feature bit. Having said all that, in this case the overhead of the test and branch is small compared to the cost of writing to the SPR which controls user access and then doing an isync, so it's all somewhat premature optimisation. cheers
diff --git a/lib/vsprintf.c b/lib/vsprintf.c index 7b0a6140bfad..8b43a883be6b 100644 --- a/lib/vsprintf.c +++ b/lib/vsprintf.c @@ -640,8 +640,13 @@ static const char *check_pointer_msg(const void *ptr) if (!ptr) return "(null)"; - if (probe_kernel_address(ptr, byte)) - return "(efault)"; + /* User space address handling is not ready during early boot. */ + if (system_state <= SYSTEM_BOOTING) { + if ((unsigned long)ptr < PAGE_SIZE) + return "(efault)"; + } else { + if (probe_kernel_address(ptr, byte)) + return "(efault)"; return NULL; }
The commit 3e5903eb9cff70730 ("vsprintf: Prevent crash when dereferencing invalid pointers") broke boot on several architectures. The common pattern is that probe_kernel_read() is not working during early boot because userspace access framework is not ready. The check is only the best effort. Let's not rush with it during the early boot. Details: 1. Report on Power: Kernel crashes very early during boot with with CONFIG_PPC_KUAP and CONFIG_JUMP_LABEL_FEATURE_CHECK_DEBUG The problem is the combination of some new code called via printk(), check_pointer() which calls probe_kernel_read(). That then calls allow_user_access() (PPC_KUAP) and that uses mmu_has_feature() too early (before we've patched features). With the JUMP_LABEL debug enabled that causes us to call printk() & dump_stack() and we end up recursing and overflowing the stack. Because it happens so early you don't get any output, just an apparently dead system. The stack trace (which you don't see) is something like: ... dump_stack+0xdc probe_kernel_read+0x1a4 check_pointer+0x58 string+0x3c vsnprintf+0x1bc vscnprintf+0x20 printk_safe_log_store+0x7c printk+0x40 dump_stack_print_info+0xbc dump_stack+0x8 probe_kernel_read+0x1a4 probe_kernel_read+0x19c check_pointer+0x58 string+0x3c vsnprintf+0x1bc vscnprintf+0x20 vprintk_store+0x6c vprintk_emit+0xec vprintk_func+0xd4 printk+0x40 cpufeatures_process_feature+0xc8 scan_cpufeatures_subnodes+0x380 of_scan_flat_dt_subnodes+0xb4 dt_cpu_ftrs_scan_callback+0x158 of_scan_flat_dt+0xf0 dt_cpu_ftrs_scan+0x3c early_init_devtree+0x360 early_setup+0x9c 2. Report on s390: vsnprintf invocations, are broken on s390. For example, the early boot output now looks like this where the first (efault) should be the linux_banner: [ 0.099985] (efault) [ 0.099985] setup: Linux is running as a z/VM guest operating system in 64-bit mode [ 0.100066] setup: The maximum memory size is 8192MB [ 0.100070] cma: Reserved 4 MiB at (efault) [ 0.100100] numa: NUMA mode: (efault) The reason for this, is that the code assumes that probe_kernel_address() works very early. This however is not true on at least s390. Uaccess on KERNEL_DS works only after page tables have been setup on s390, which happens with setup_arch()->paging_init(). Any probe_kernel_address() invocation before that will return -EFAULT. Fixes: 3e5903eb9cff70730 ("vsprintf: Prevent crash when dereferencing invalid pointers") Signed-off-by: Petr Mladek <pmladek@suse.com> --- lib/vsprintf.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-)