Message ID | 20220718154620.132195-1-grimm@linux.ibm.com |
---|---|
State | Accepted |
Series | mambo: Fix backtrace when trace mixes endian code |

Context | Check | Description |
---|---|---|
snowpatch_ozlabs/github-Docker_builds_and_checks | fail | check_build (ubuntu-rolling) failed at step Create Docker image. |
On Mon, 18 Jul 2022 11:46:20 -0400 Ryan Grimm <grimm@linux.ibm.com> wrote:

> In the case of LE kernel and BE skiboot, the bt functions triggers an
> illegal address when the kernel has a stack pointer in skiboot. For
> example, in copy_and_flush:
>
> pc:                      0x000000000000C25C +0x000000000000C25C
> lr:                      0x000000000000C240 +0x000000000000C240
> stack:0x0000000031C13D20 0x8428023000000000 +0x8428023000000000
> Illegal Address 0x001EC13100000007
>
> The bad address is from mem_display_64 and is fixed up by inverting
> the LE bit:
>
> systemsim % mem_display_64 [ expr 0x0000000031C13D20 ] 1
> 0x103EC13100000000
> systemsim % mem_display_64 [ expr 0x0000000031C13D20 ] 0
> 0x0000000031C13E10
>
> This patch tests the pointer by catching the illegal access and
> inverting the LE bit. Now the stack trace looks good:
>
> pc:                      0x000000000000C254 +0x000000000000C254
> lr:                      0x000000000000C240 +0x000000000000C240
> stack:0x0000000031C13D20 0x0000000030022884 .load_and_boot_kernel+0xc6c
> stack:0x0000000031C13E10 0x0000000030023344 .main_cpu_entry+0x8bc
>
> Opal calls also look good too now:
>
> pc:                      0x0000000030028588 .cpu_idle_delay+0xb8
> lr:                      0x000000003002856C .cpu_idle_delay+0x9c
> stack:0x0000000031C13A10 0x0000000030028514 .cpu_idle_delay+0x44
> stack:0x0000000031C13AB0 0x000000003002D6C0 .time_wait_nopoll+0x34
> stack:0x0000000031C13B20 0x000000003002D77C .time_wait+0xa8
> stack:0x0000000031C13BA0 0x000000003002821C .cpu_wait_job+0x3c
> stack:0x0000000031C13C40 0x0000000030029554 .opal_reinit_cpus+0x3c0
> stack:0x0000000031C13D10 0x00000000300038AC opal_entry+0x14c
> stack:0x000000000071FDA0 0xC0000000000537B0 opal_call+0x40
> stack:0x000000000071FE60 0xC00000000005450C opal_reinit_cpus+0x20
> stack:0x000000000071FED0 0xC00000000065FDAC opal_configure_cores+0x48
> stack:0x000000000071FF00 0xC000000000656554 early_setup+0x134
>
> Signed-off-by: Ryan Grimm <grimm@linux.ibm.com>

LGTM

Reviewed-by: Dan Horák <dan@danny.cz>

Dan

> ---
>  external/mambo/mambo_utils.tcl | 7 +++++++
>  1 file changed, 7 insertions(+)
>
> diff --git a/external/mambo/mambo_utils.tcl b/external/mambo/mambo_utils.tcl
> index 96f8971a..f8f64eb9 100644
> --- a/external/mambo/mambo_utils.tcl
> +++ b/external/mambo/mambo_utils.tcl
> @@ -423,6 +423,13 @@ proc bt { {sp 0} } {
>          set sym [addr2func $lr]
>          puts "stack:$pa \t$lr\t$sym"
>          if { $bc == 0 } { break }
> +
> +        # catch illegal address in case of endian mismatch
> +        set tstpa [ mysim cpu $p:$c:$t util dtranslate $bc ]
> +        if {[catch { set tst [ mem_display_64 $tstpa $le ] } ]} {
> +            set le [ expr ! $le ]
> +            set bc [ mem_display_64 $pa $le ]
> +        }
>          set sp $bc
>      }
>      puts ""
> --
> 2.31.1
>
> _______________________________________________
> Skiboot mailing list
> Skiboot@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/skiboot

Excerpts from Ryan Grimm's message of July 19, 2022 1:46 am:
> In the case of LE kernel and BE skiboot, the bt functions triggers an
> illegal address when the kernel has a stack pointer in skiboot. For
> example, in copy_and_flush:
>
> pc:                      0x000000000000C25C +0x000000000000C25C
> lr:                      0x000000000000C240 +0x000000000000C240
> stack:0x0000000031C13D20 0x8428023000000000 +0x8428023000000000
> Illegal Address 0x001EC13100000007
>
> The bad address is from mem_display_64 and is fixed up by inverting
> the LE bit:
>
> systemsim % mem_display_64 [ expr 0x0000000031C13D20 ] 1
> 0x103EC13100000000
> systemsim % mem_display_64 [ expr 0x0000000031C13D20 ] 0
> 0x0000000031C13E10
>
> This patch tests the pointer by catching the illegal access and
> inverting the LE bit. Now the stack trace looks good:
>
> pc:                      0x000000000000C254 +0x000000000000C254
> lr:                      0x000000000000C240 +0x000000000000C240
> stack:0x0000000031C13D20 0x0000000030022884 .load_and_boot_kernel+0xc6c
> stack:0x0000000031C13E10 0x0000000030023344 .main_cpu_entry+0x8bc
>
> Opal calls also look good too now:
>
> pc:                      0x0000000030028588 .cpu_idle_delay+0xb8
> lr:                      0x000000003002856C .cpu_idle_delay+0x9c
> stack:0x0000000031C13A10 0x0000000030028514 .cpu_idle_delay+0x44
> stack:0x0000000031C13AB0 0x000000003002D6C0 .time_wait_nopoll+0x34
> stack:0x0000000031C13B20 0x000000003002D77C .time_wait+0xa8
> stack:0x0000000031C13BA0 0x000000003002821C .cpu_wait_job+0x3c
> stack:0x0000000031C13C40 0x0000000030029554 .opal_reinit_cpus+0x3c0
> stack:0x0000000031C13D10 0x00000000300038AC opal_entry+0x14c
> stack:0x000000000071FDA0 0xC0000000000537B0 opal_call+0x40
> stack:0x000000000071FE60 0xC00000000005450C opal_reinit_cpus+0x20
> stack:0x000000000071FED0 0xC00000000065FDAC opal_configure_cores+0x48
> stack:0x000000000071FF00 0xC000000000656554 early_setup+0x134

Nice!

Acked-by: Nicholas Piggin <npiggin@gmail.com>

>
> Signed-off-by: Ryan Grimm <grimm@linux.ibm.com>
> ---
>  external/mambo/mambo_utils.tcl | 7 +++++++
>  1 file changed, 7 insertions(+)
>
> diff --git a/external/mambo/mambo_utils.tcl b/external/mambo/mambo_utils.tcl
> index 96f8971a..f8f64eb9 100644
> --- a/external/mambo/mambo_utils.tcl
> +++ b/external/mambo/mambo_utils.tcl
> @@ -423,6 +423,13 @@ proc bt { {sp 0} } {
>          set sym [addr2func $lr]
>          puts "stack:$pa \t$lr\t$sym"
>          if { $bc == 0 } { break }
> +
> +        # catch illegal address in case of endian mismatch
> +        set tstpa [ mysim cpu $p:$c:$t util dtranslate $bc ]
> +        if {[catch { set tst [ mem_display_64 $tstpa $le ] } ]} {
> +            set le [ expr ! $le ]
> +            set bc [ mem_display_64 $pa $le ]
> +        }
>          set sp $bc
>      }
>      puts ""
> --
> 2.31.1
>

On Mon, Jul 18, 2022 at 11:46:20AM -0400, Ryan Grimm wrote:
>diff --git a/external/mambo/mambo_utils.tcl b/external/mambo/mambo_utils.tcl
>index 96f8971a..f8f64eb9 100644
>--- a/external/mambo/mambo_utils.tcl
>+++ b/external/mambo/mambo_utils.tcl
>@@ -423,6 +423,13 @@ proc bt { {sp 0} } {
>         set sym [addr2func $lr]
>         puts "stack:$pa \t$lr\t$sym"
>         if { $bc == 0 } { break }
>+
>+        # catch illegal address in case of endian mismatch
>+        set tstpa [ mysim cpu $p:$c:$t util dtranslate $bc ]
>+        if {[catch { set tst [ mem_display_64 $tstpa $le ] } ]} {
>+            set le [ expr ! $le ]
>+            set bc [ mem_display_64 $pa $le ]
>+        }
>         set sp $bc
>     }
>     puts ""

Applied to master.

diff --git a/external/mambo/mambo_utils.tcl b/external/mambo/mambo_utils.tcl
index 96f8971a..f8f64eb9 100644
--- a/external/mambo/mambo_utils.tcl
+++ b/external/mambo/mambo_utils.tcl
@@ -423,6 +423,13 @@ proc bt { {sp 0} } {
         set sym [addr2func $lr]
         puts "stack:$pa \t$lr\t$sym"
         if { $bc == 0 } { break }
+
+        # catch illegal address in case of endian mismatch
+        set tstpa [ mysim cpu $p:$c:$t util dtranslate $bc ]
+        if {[catch { set tst [ mem_display_64 $tstpa $le ] } ]} {
+            set le [ expr ! $le ]
+            set bc [ mem_display_64 $pa $le ]
+        }
         set sp $bc
     }
     puts ""

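The control flow of this hunk can be modelled outside Mambo. The sketch below is a Python illustration only, not the simulator's API: `Memory`, `IllegalAddress`, `backtrace`, and `max_frames` are hypothetical names, address translation (`dtranslate`) is left out, and an unmapped address raising an exception stands in for the simulator's "Illegal Address" error. It follows the back chain and, when the next frame cannot be read, flips the endianness flag and re-reads the current frame's back chain, which is what the added lines do.

```python
class IllegalAddress(Exception):
    """Stands in for the simulator's 'Illegal Address' error."""


class Memory:
    """Toy memory: only doublewords that were explicitly written are mapped."""

    def __init__(self):
        self.words = {}  # address -> 8 raw bytes

    def write_u64(self, addr, value, le):
        self.words[addr] = value.to_bytes(8, "little" if le else "big")

    def read_u64(self, addr, le):
        if addr not in self.words:
            raise IllegalAddress(hex(addr))
        return int.from_bytes(self.words[addr], "little" if le else "big")


def backtrace(mem, sp, le, max_frames=200):
    """Follow the back chain from sp and return the frame addresses visited."""
    frames = []
    for _ in range(max_frames):  # bound is a safety guard for this sketch
        bc = mem.read_u64(sp, le)  # back chain, read from the frame address
        frames.append(sp)
        if bc == 0:
            break
        try:
            mem.read_u64(bc, le)  # probe the frame the back chain points to
        except IllegalAddress:
            le = not le  # assume an endian mismatch: flip the flag
            bc = mem.read_u64(sp, le)  # re-read this frame's back chain
        sp = bc
    return frames


# Two big-endian frames, using the addresses from the commit message.
# The zero back chain in the second frame is assumed so the walk terminates.
mem = Memory()
mem.write_u64(0x31C13D20, 0x31C13E10, le=False)
mem.write_u64(0x31C13E10, 0, le=False)

# Start with the little-endian flag set, as an LE kernel would.
for addr in backtrace(mem, 0x31C13D20, le=True):
    print(f"stack:0x{addr:016X}")
```

In this model the first read yields a byte-swapped back chain that is not mapped, the probe raises, and the walk continues with the flag inverted, visiting both frames. Without the `try`/`except` the walk would stop at the first frame with an error.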
In the case of LE kernel and BE skiboot, the bt function triggers an
illegal address when the kernel has a stack pointer in skiboot. For
example, in copy_and_flush:

pc:                      0x000000000000C25C +0x000000000000C25C
lr:                      0x000000000000C240 +0x000000000000C240
stack:0x0000000031C13D20 0x8428023000000000 +0x8428023000000000
Illegal Address 0x001EC13100000007

The bad address is from mem_display_64 and is fixed up by inverting
the LE bit:

systemsim % mem_display_64 [ expr 0x0000000031C13D20 ] 1
0x103EC13100000000
systemsim % mem_display_64 [ expr 0x0000000031C13D20 ] 0
0x0000000031C13E10

This patch tests the pointer by catching the illegal access and
inverting the LE bit. Now the stack trace looks good:

pc:                      0x000000000000C254 +0x000000000000C254
lr:                      0x000000000000C240 +0x000000000000C240
stack:0x0000000031C13D20 0x0000000030022884 .load_and_boot_kernel+0xc6c
stack:0x0000000031C13E10 0x0000000030023344 .main_cpu_entry+0x8bc

OPAL calls look good now too:

pc:                      0x0000000030028588 .cpu_idle_delay+0xb8
lr:                      0x000000003002856C .cpu_idle_delay+0x9c
stack:0x0000000031C13A10 0x0000000030028514 .cpu_idle_delay+0x44
stack:0x0000000031C13AB0 0x000000003002D6C0 .time_wait_nopoll+0x34
stack:0x0000000031C13B20 0x000000003002D77C .time_wait+0xa8
stack:0x0000000031C13BA0 0x000000003002821C .cpu_wait_job+0x3c
stack:0x0000000031C13C40 0x0000000030029554 .opal_reinit_cpus+0x3c0
stack:0x0000000031C13D10 0x00000000300038AC opal_entry+0x14c
stack:0x000000000071FDA0 0xC0000000000537B0 opal_call+0x40
stack:0x000000000071FE60 0xC00000000005450C opal_reinit_cpus+0x20
stack:0x000000000071FED0 0xC00000000065FDAC opal_configure_cores+0x48
stack:0x000000000071FF00 0xC000000000656554 early_setup+0x134

Signed-off-by: Ryan Grimm <grimm@linux.ibm.com>
---
 external/mambo/mambo_utils.tcl | 7 +++++++
 1 file changed, 7 insertions(+)
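
The two `mem_display_64` results in the commit message are byte-swapped views of the same eight bytes, and the bad `lr` value in the first trace is the byte swap of the good one in the second. A minimal Python sketch reproduces the values; `read_u64` is a hypothetical stand-in for `mem_display_64`, assuming its second argument selects little-endian when it is 1:

```python
def read_u64(raw: bytes, le: int) -> int:
    """Interpret 8 raw bytes as an unsigned 64-bit value.

    Hypothetical stand-in for mem_display_64: le=1 reads little-endian,
    le=0 reads big-endian (an assumption about the flag's meaning).
    """
    return int.from_bytes(raw, "little" if le else "big")


# Back chain as big-endian skiboot would store it at 0x0000000031C13D20.
back_chain = (0x0000000031C13E10).to_bytes(8, "big")
print(f"0x{read_u64(back_chain, 1):016X}")  # read with the wrong endianness
print(f"0x{read_u64(back_chain, 0):016X}")  # read with the right endianness

# Saved LR from the same frame, taken from the corrected trace.
saved_lr = (0x0000000030022884).to_bytes(8, "big")
print(f"0x{read_u64(saved_lr, 1):016X}")  # the value shown in the broken trace
```

This prints `0x103EC13100000000`, `0x0000000031C13E10`, and `0x8428023000000000`, matching the simulator output and the broken trace quoted in the commit message.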