
[v2] On ppc64le we HAVE_RELIABLE_STACKTRACE

Message ID: 20180305164928.GA17953@lst.de (mailing list archive)
State: Superseded, archived
Series: [v2] On ppc64le we HAVE_RELIABLE_STACKTRACE

Commit Message

Torsten Duwe March 5, 2018, 4:49 p.m. UTC
The "Power Architecture 64-Bit ELF V2 ABI" says in section 2.3.2.3:

[...] There are several rules that must be adhered to in order to ensure
reliable and consistent call chain backtracing:

* Before a function calls any other function, it shall establish its
  own stack frame, whose size shall be a multiple of 16 bytes.

 – In instances where a function’s prologue creates a stack frame, the
   back-chain word of the stack frame shall be updated atomically with
   the value of the stack pointer (r1) when a back chain is implemented.
   (This must be supported as default by all ELF V2 ABI-compliant
   environments.)
[...]
 – The function shall save the link register that contains its return
   address in the LR save doubleword of its caller’s stack frame before
   calling another function.

To me this sounds like the equivalent of HAVE_RELIABLE_STACKTRACE.
This patch may be unnecessarily limited to ppc64le, but OTOH the only
user of this flag so far is livepatching, which is only implemented on
64-bit little-endian PPC, a.k.a. ELF ABI v2.

This change also implements save_stack_trace_tsk_reliable() for ppc64
that checks for the above conditions, where possible.
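
For illustration, the frame header those rules guarantee looks like the
following (a sketch only, not part of the patch; the struct and its name
are made up, the code below uses the kernel's STACK_FRAME_LR_SAVE index
instead):

	/* ELF v2 stack frame header, as the unwinder sees it. */
	struct elfv2_frame {			/* hypothetical name */
		struct elfv2_frame *back_chain;	/* +0: caller's r1, set by stdu */
		unsigned long cr_save;		/* +8: CR save word */
		unsigned long lr_save;		/* +16: LR save doubleword */
	};

Unwinding then means reading the saved return address from lr_save and
following back_chain upwards until the chain leaves the thread's stack.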

Signed-off-by: Torsten Duwe <duwe@suse.de>

---
v2:
 * implemented save_stack_trace_tsk_reliable(), with a bunch of sanity
   checks. The test for a kernel code pointer is much nicer now, and
   the exit condition is exact (when compared to last week's follow-up)

---

Comments

Segher Boessenkool March 5, 2018, 5:09 p.m. UTC | #1
On Mon, Mar 05, 2018 at 05:49:28PM +0100, Torsten Duwe wrote:
> The "Power Architecture 64-Bit ELF V2 ABI" says in section 2.3.2.3:
> 
> [...] There are several rules that must be adhered to in order to ensure
> reliable and consistent call chain backtracing:
> 
> * Before a function calls any other function, it shall establish its
>   own stack frame, whose size shall be a multiple of 16 bytes.
> 
>  – In instances where a function’s prologue creates a stack frame, the
>    back-chain word of the stack frame shall be updated atomically with
>    the value of the stack pointer (r1) when a back chain is implemented.
>    (This must be supported as default by all ELF V2 ABI-compliant
>    environments.)
> [...]
>  – The function shall save the link register that contains its return
>    address in the LR save doubleword of its caller’s stack frame before
>    calling another function.

All of this is also true for the other PowerPC ABIs, fwiw (both 32-bit
and 64-bit; the offset of the LR save slot isn't the same in all ABIs).


Segher
Josh Poimboeuf March 8, 2018, 4:26 p.m. UTC | #2
On Mon, Mar 05, 2018 at 05:49:28PM +0100, Torsten Duwe wrote:
> The "Power Architecture 64-Bit ELF V2 ABI" says in section 2.3.2.3:
> 
> [...] There are several rules that must be adhered to in order to ensure
> reliable and consistent call chain backtracing:
> 
> * Before a function calls any other function, it shall establish its
>   own stack frame, whose size shall be a multiple of 16 bytes.
> 
>  – In instances where a function’s prologue creates a stack frame, the
>    back-chain word of the stack frame shall be updated atomically with
>    the value of the stack pointer (r1) when a back chain is implemented.
>    (This must be supported as default by all ELF V2 ABI-compliant
>    environments.)
> [...]
>  – The function shall save the link register that contains its return
>    address in the LR save doubleword of its caller’s stack frame before
>    calling another function.
> 
> To me this sounds like the equivalent of HAVE_RELIABLE_STACKTRACE.
> This patch may be unnecessarily limited to ppc64le, but OTOH the only
> user of this flag so far is livepatching, which is only implemented on
> 64-bit little-endian PPC, a.k.a. ELF ABI v2.
> 
> This change also implements save_stack_trace_tsk_reliable() for ppc64
> that checks for the above conditions, where possible.
> 
> Signed-off-by: Torsten Duwe <duwe@suse.de>

This doesn't seem to address some of my previous concerns:

- Bailing on interrupt/exception frames

- Function graph tracing return address conversion

- kretprobes return address conversion
Torsten Duwe March 9, 2018, 4:47 p.m. UTC | #3
On Thu, 8 Mar 2018 10:26:16 -0600
Josh Poimboeuf <jpoimboe@redhat.com> wrote:

> This doesn't seem to address some of my previous concerns:

You're right. That discussion quickly headed towards objtool
and I forgot about this one paragraph with the remarks.

> - Bailing on interrupt/exception frames

That is a good question. My current code keeps unwinding as long
as the trace looks sane. If the exception frame has a valid code
pointer in the LR slot it will continue. Couldn't there be cases
where this is desirable? Should this be configurable? Not that
I have an idea how this situation could occur for a thread
that is current or sleeping...

Michael, Balbir: is that possible? Any idea how to reliably detect
an exception frame? My approach would be to look at the next return
address and compare it to the usual suspects (i.e. collect all 
"b ret" addresses in the EXCEPTION_COMMON macro, for BookS).

> - Function graph tracing return address conversion
> 
> - kretprobes return address conversion

You mean like in arch/x86/kernel/unwind_frame.c the call to
ftrace_graph_ret_addr ?
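
For reference, that call looks roughly like this (a sketch of the x86
pattern in unwind_frame.c; state->graph_idx is per-unwind bookkeeping
kept by the unwinder):

	/* Undo the return-address substitution made by the graph tracer. */
	addr = ftrace_graph_ret_addr(state->task, &state->graph_idx,
				     addr, addr_p);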

Forgive my directness, but I don't see why these should be handled in
arch-dependent code, other than maybe a hook, if inevitable, that calls
back into the graph tracer / kretprobes in order to get the proper
address, or simply declares the trace unreliable in case it finds such a
return address.

I understand that x86 has a hard time with its multiple unwinders, but
there should be a simpler, generic solution for all other architectures.

	Torsten
Josh Poimboeuf March 12, 2018, 3:35 p.m. UTC | #4
On Fri, Mar 09, 2018 at 05:47:18PM +0100, Torsten Duwe wrote:
> On Thu, 8 Mar 2018 10:26:16 -0600
> Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> 
> > This doesn't seem to address some of my previous concerns:
> 
> You're right. That discussion quickly headed towards objtool
> and I forgot about this one paragraph with the remarks.
> 
> > - Bailing on interrupt/exception frames
> 
> That is a good question. My current code keeps unwinding as long
> as the trace looks sane. If the exception frame has a valid code
> pointer in the LR slot it will continue. Couldn't there be cases
> where this is desirable?

I thought we established in the previous discussion that this could
cause some functions to get skipped in the stack trace:

  https://lkml.kernel.org/r/20171219214652.u7qeb7fxov62ttke@treble

> Should this be configurable? Not that
> I have an idea how this situation could occur for a thread
> that is current or sleeping...

Page faults and preemption.

> Michael, Balbir: is that possible? Any Idea how to reliably detect
> an exception frame? My approach would be to look at the next return
> address and compare it to the usual suspects (i.e. collect all 
> "b ret" addresses in the EXCEPTION_COMMON macro, for BookS).

It looks like show_stack() already knows how to do this:

		/*
		 * See if this is an exception frame.
		 * We look for the "regshere" marker in the current frame.
		 */
		if (validate_sp(sp, tsk, STACK_INT_FRAME_SIZE)
		    && stack[STACK_FRAME_MARKER] == STACK_FRAME_REGS_MARKER) {

So you could do something similar.
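
E.g. something like this in your unwind loop (an untested sketch, reusing
the constants from show_stack()):

		/* Exception frame: functions may be missing from the
		 * trace, so report it as unreliable.
		 */
		if (validate_sp(sp, tsk, STACK_INT_FRAME_SIZE)
		    && stack[STACK_FRAME_MARKER] == STACK_FRAME_REGS_MARKER)
			return 1;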

> > - Function graph tracing return address conversion
> > 
> > - kretprobes return address conversion
> 
> You mean like in arch/x86/kernel/unwind_frame.c the call to
> ftrace_graph_ret_addr ?
> 
> Forgive my directness, but I don't see why these should be handled in
> arch-dependent code, other than maybe a hook, if inevitable, that calls
> back into the graph tracer / kretprobes in order to get the proper
> address,

I don't really follow, where exactly would you propose calling
ftrace_graph_ret_addr() from?

> or simply declares the trace unreliable in case it finds such a
> return address.

If you're going to make livepatch incompatible with function graph
tracing, there needs to be a good justification for it (and we'd need to
make sure existing users are fine with it).

Patch

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 73ce5dd07642..9f49913e19e3 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -220,6 +220,7 @@ config PPC
 	select HAVE_PERF_USER_STACK_DUMP
 	select HAVE_RCU_TABLE_FREE		if SMP
 	select HAVE_REGS_AND_STACK_ACCESS_API
+	select HAVE_RELIABLE_STACKTRACE		if PPC64 && CPU_LITTLE_ENDIAN
 	select HAVE_SYSCALL_TRACEPOINTS
 	select HAVE_VIRT_CPU_ACCOUNTING
 	select HAVE_IRQ_TIME_ACCOUNTING
diff --git a/arch/powerpc/kernel/stacktrace.c b/arch/powerpc/kernel/stacktrace.c
index d534ed901538..e14c2dfd5311 100644
--- a/arch/powerpc/kernel/stacktrace.c
+++ b/arch/powerpc/kernel/stacktrace.c
@@ -11,8 +11,11 @@ 
  */
 
 #include <linux/export.h>
+#include <linux/kallsyms.h>
+#include <linux/module.h>
 #include <linux/sched.h>
 #include <linux/sched/debug.h>
+#include <linux/sched/task_stack.h>
 #include <linux/stacktrace.h>
 #include <asm/ptrace.h>
 #include <asm/processor.h>
@@ -76,3 +79,77 @@ save_stack_trace_regs(struct pt_regs *regs, struct stack_trace *trace)
 	save_context_stack(trace, regs->gpr[1], current, 0);
 }
 EXPORT_SYMBOL_GPL(save_stack_trace_regs);
+
+#ifdef CONFIG_HAVE_RELIABLE_STACKTRACE
+int
+save_stack_trace_tsk_reliable(struct task_struct *tsk,
+				struct stack_trace *trace)
+{
+	unsigned long sp;
+	unsigned long stack_page = (unsigned long)task_stack_page(tsk);
+
+	/* The last frame (unwinding first) may not yet have saved
+	 * its LR onto the stack.
+	 */
+	int firstframe = 1;
+
+	if (tsk == current)
+		sp = current_stack_pointer();
+	else
+		sp = tsk->thread.ksp;
+
+	if (sp < stack_page + sizeof(struct thread_struct)
+	    || sp > stack_page + THREAD_SIZE - STACK_FRAME_OVERHEAD)
+		return 1;
+
+	for (;;) {
+		unsigned long *stack = (unsigned long *) sp;
+		unsigned long newsp, ip;
+
+		/* Sanity check: the ABI requires SP to be 16-byte aligned. */
+		if (sp & 0xF)
+			return 1;
+
+		newsp = stack[0];
+		/* Stack grows downwards; unwinder may only go up. */
+		if (newsp <= sp)
+			return 1;
+
+		if (newsp >= stack_page + THREAD_SIZE)
+			return 1; /* invalid backlink, too far up. */
+
+		/* Examine the saved LR: it must point into kernel code. */
+		ip = stack[STACK_FRAME_LR_SAVE];
+		if (!firstframe) {
+			if (!func_ptr_is_kernel_text((void *)ip)) {
+#ifdef CONFIG_MODULES
+				struct module *mod = __module_text_address(ip);
+
+				if (!mod)
+#endif
+					return 1;
+			}
+		}
+		firstframe = 0;
+
+		if (!trace->skip)
+			trace->entries[trace->nr_entries++] = ip;
+		else
+			trace->skip--;
+
+		/* SP value loaded on kernel entry, see "PACAKSAVE(r13)" in
+		 * _switch() and system_call_common()
+		 */
+		if (newsp == stack_page + THREAD_SIZE - /* SWITCH_FRAME_SIZE */
+		    (STACK_FRAME_OVERHEAD + sizeof(struct pt_regs)))
+			break;
+
+		if (trace->nr_entries >= trace->max_entries)
+			return -E2BIG;
+
+		sp = newsp;
+	}
+	return 0;
+}
+EXPORT_SYMBOL_GPL(save_stack_trace_tsk_reliable);
+#endif /* CONFIG_HAVE_RELIABLE_STACKTRACE */
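
For context, the only user of this interface so far is livepatch's
consistency model. A sketch of how it consumes the function, modeled on
klp_check_stack() in kernel/livepatch/transition.c (error reporting
abridged):

	static unsigned long entries[MAX_STACK_ENTRIES];
	struct stack_trace trace;
	int ret;

	trace.skip = 0;
	trace.nr_entries = 0;
	trace.max_entries = MAX_STACK_ENTRIES;
	trace.entries = entries;
	ret = save_stack_trace_tsk_reliable(task, &trace);
	WARN_ON_ONCE(ret == -ENOSYS);
	if (ret)
		return ret;	/* unreliable; the transition is retried later */
	/* ... each entry is then compared against the functions being patched. */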