Message ID | f2d5d66d47b28474b6224613787757fed3e92d3d.1718908016.git.naveen@kernel.org (mailing list archive) |
---|---|
State | Changes Requested |
Series | powerpc: Add support for ftrace direct and BPF trampolines |
On Fri Jun 21, 2024 at 4:54 AM AEST, Naveen N Rao wrote:
> On 32-bit powerpc, gcc generates a three instruction sequence for
> function profiling:
> 	mflr	r0
> 	stw	r0, 4(r1)
> 	bl	_mcount
>
> On kernel boot, the call to _mcount() is nop-ed out, to be patched back
> in when ftrace is actually enabled. The 'stw' instruction therefore is
> not necessary unless ftrace is enabled. Nop it out during ftrace init.
>
> When ftrace is enabled, we want the 'stw' so that stack unwinding works
> properly. Perform the same within the ftrace handler, similar to 64-bit
> powerpc.
>
> For 64-bit powerpc, early versions of gcc used to emit a three
> instruction sequence for function profiling (with -mprofile-kernel) with
> a 'std' instruction to mimic the 'stw' above. Address that scenario also
> by nop-ing out the 'std' instruction during ftrace init.

Cool! Could 32-bit use the 2-insn sequence as well if it had
-mprofile-kernel, out of curiosity?

>
> Signed-off-by: Naveen N Rao <naveen@kernel.org>
> ---
>  arch/powerpc/kernel/trace/ftrace.c       | 6 ++++--
>  arch/powerpc/kernel/trace/ftrace_entry.S | 4 ++--
>  2 files changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/arch/powerpc/kernel/trace/ftrace.c b/arch/powerpc/kernel/trace/ftrace.c
> index d8d6b4fd9a14..463bd7531dc8 100644
> --- a/arch/powerpc/kernel/trace/ftrace.c
> +++ b/arch/powerpc/kernel/trace/ftrace.c
> @@ -241,13 +241,15 @@ int ftrace_init_nop(struct module *mod, struct dyn_ftrace *rec)
>  		/* Expected sequence: 'mflr r0', 'stw r0,4(r1)', 'bl _mcount' */
>  		ret = ftrace_validate_inst(ip - 8, ppc_inst(PPC_RAW_MFLR(_R0)));
>  		if (!ret)
> -			ret = ftrace_validate_inst(ip - 4, ppc_inst(PPC_RAW_STW(_R0, _R1, 4)));
> +			ret = ftrace_modify_code(ip - 4, ppc_inst(PPC_RAW_STW(_R0, _R1, 4)),
> +						 ppc_inst(PPC_RAW_NOP()));
>  	} else if (IS_ENABLED(CONFIG_MPROFILE_KERNEL)) {
>  		/* Expected sequence: 'mflr r0', ['std r0,16(r1)'], 'bl _mcount' */
>  		ret = ftrace_read_inst(ip - 4, &old);
>  		if (!ret && !ppc_inst_equal(old, ppc_inst(PPC_RAW_MFLR(_R0)))) {
>  			ret = ftrace_validate_inst(ip - 8, ppc_inst(PPC_RAW_MFLR(_R0)));
> -			ret |= ftrace_validate_inst(ip - 4, ppc_inst(PPC_RAW_STD(_R0, _R1, 16)));
> +			ret |= ftrace_modify_code(ip - 4, ppc_inst(PPC_RAW_STD(_R0, _R1, 16)),
> +						  ppc_inst(PPC_RAW_NOP()));

So this is the old style path... Should you check the mflr validate
result first? Also do you know what GCC version, roughly? Maybe we
could have a comment here and eventually deprecate it.

You could split this change into its own patch.

>  		}
>  	} else {
>  		return -EINVAL;
> diff --git a/arch/powerpc/kernel/trace/ftrace_entry.S b/arch/powerpc/kernel/trace/ftrace_entry.S
> index 76dbe9fd2c0f..244a1c7bb1e8 100644
> --- a/arch/powerpc/kernel/trace/ftrace_entry.S
> +++ b/arch/powerpc/kernel/trace/ftrace_entry.S
> @@ -33,6 +33,8 @@
>   * and then arrange for the ftrace function to be called.
>   */
>  .macro	ftrace_regs_entry allregs
> +	/* Save the original return address in A's stack frame */
> +	PPC_STL		r0, LRSAVE(r1)
>  	/* Create a minimal stack frame for representing B */
>  	PPC_STLU	r1, -STACK_FRAME_MIN_SIZE(r1)
>
> @@ -44,8 +46,6 @@
>  	SAVE_GPRS(3, 10, r1)
>
>  #ifdef CONFIG_PPC64
> -	/* Save the original return address in A's stack frame */
> -	std	r0, LRSAVE+SWITCH_FRAME_SIZE+STACK_FRAME_MIN_SIZE(r1)
>  	/* Ok to continue? */
>  	lbz	r3, PACA_FTRACE_ENABLED(r13)
>  	cmpdi	r3, 0

That seems right to me.

Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
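The ordering question raised in the review above can be illustrated with a small user-space model. In the posted hunk, `ret |= ftrace_modify_code(...)` runs even when the preceding `mflr r0` validation has failed, so the 'std' slot could be patched at a site that does not match the expected sequence. The sketch below gates the modification on the validation result instead. It is not the kernel code and not necessarily what a later revision of the patch does: `model_validate`, `model_modify` and `model_init_nop_ppc64` are hypothetical names, the instruction stream is a plain array, and the constants are the standard PowerPC encodings of `mflr r0`, `std r0,16(r1)` and `nop`.

```c
#include <stddef.h>
#include <stdint.h>

/* Standard PowerPC instruction encodings. */
#define INSN_MFLR_R0      0x7c0802a6u /* mflr r0 */
#define INSN_STD_R0_16_R1 0xf8010010u /* std r0,16(r1) */
#define INSN_NOP          0x60000000u /* ori r0,r0,0 */

/* Hypothetical stand-in for a validate helper: 0 on match, -1 otherwise. */
static int model_validate(const uint32_t *slot, uint32_t expected)
{
	return *slot == expected ? 0 : -1;
}

/* Hypothetical stand-in for a modify helper: patch only if 'old' matches. */
static int model_modify(uint32_t *slot, uint32_t old, uint32_t new_insn)
{
	if (*slot != old)
		return -1;
	*slot = new_insn;
	return 0;
}

/* call_idx is the index of the 'bl _mcount' slot within text[]. */
static int model_init_nop_ppc64(uint32_t *text, size_t call_idx)
{
	int ret;

	if (call_idx < 1)
		return -1;

	/* Two-instruction form: 'mflr r0' directly precedes the call. */
	if (text[call_idx - 1] == INSN_MFLR_R0)
		return 0;

	/* Three-instruction form: check 'mflr r0' before touching 'std'. */
	if (call_idx < 2)
		return -1;
	ret = model_validate(&text[call_idx - 2], INSN_MFLR_R0);
	if (!ret)
		ret = model_modify(&text[call_idx - 1], INSN_STD_R0_16_R1, INSN_NOP);
	return ret;
}
```

With this ordering, a call site whose first instruction is not `mflr r0` is reported as an error and left unmodified.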
On Mon, Jul 01, 2024 at 06:57:12PM GMT, Nicholas Piggin wrote:
> On Fri Jun 21, 2024 at 4:54 AM AEST, Naveen N Rao wrote:
> > On 32-bit powerpc, gcc generates a three instruction sequence for
> > function profiling:
> > 	mflr	r0
> > 	stw	r0, 4(r1)
> > 	bl	_mcount
> >
> > On kernel boot, the call to _mcount() is nop-ed out, to be patched back
> > in when ftrace is actually enabled. The 'stw' instruction therefore is
> > not necessary unless ftrace is enabled. Nop it out during ftrace init.
> >
> > When ftrace is enabled, we want the 'stw' so that stack unwinding works
> > properly. Perform the same within the ftrace handler, similar to 64-bit
> > powerpc.
> >
> > For 64-bit powerpc, early versions of gcc used to emit a three
> > instruction sequence for function profiling (with -mprofile-kernel) with
> > a 'std' instruction to mimic the 'stw' above. Address that scenario also
> > by nop-ing out the 'std' instruction during ftrace init.
>
> Cool! Could 32-bit use the 2-insn sequence as well if it had
> -mprofile-kernel, out of curiosity?

Yes! It actually already does with the previous change to add support
for -fpatchable-function-entry. Commit 0f71dcfb4aef ("powerpc/ftrace:
Add support for -fpatchable-function-entry") changelog describes this:

    This changes the profiling instructions used on ppc32. The default -pg
    option emits an additional 'stw' instruction after 'mflr r0' and before
    the branch to _mcount 'bl _mcount'. This is very similar to the original
    -mprofile-kernel implementation on ppc64le, where an additional 'std'
    instruction was used to save LR to its save location in the caller's
    stackframe. Subsequently, this additional store was removed in later
    compiler versions for performance reasons. The same reasons apply for
    ppc32 so we only patch in a 'mflr r0'.

> >
> > Signed-off-by: Naveen N Rao <naveen@kernel.org>
> > ---
> >  arch/powerpc/kernel/trace/ftrace.c       | 6 ++++--
> >  arch/powerpc/kernel/trace/ftrace_entry.S | 4 ++--
> >  2 files changed, 6 insertions(+), 4 deletions(-)
> >
> > diff --git a/arch/powerpc/kernel/trace/ftrace.c b/arch/powerpc/kernel/trace/ftrace.c
> > index d8d6b4fd9a14..463bd7531dc8 100644
> > --- a/arch/powerpc/kernel/trace/ftrace.c
> > +++ b/arch/powerpc/kernel/trace/ftrace.c
> > @@ -241,13 +241,15 @@ int ftrace_init_nop(struct module *mod, struct dyn_ftrace *rec)
> >  		/* Expected sequence: 'mflr r0', 'stw r0,4(r1)', 'bl _mcount' */
> >  		ret = ftrace_validate_inst(ip - 8, ppc_inst(PPC_RAW_MFLR(_R0)));
> >  		if (!ret)
> > -			ret = ftrace_validate_inst(ip - 4, ppc_inst(PPC_RAW_STW(_R0, _R1, 4)));
> > +			ret = ftrace_modify_code(ip - 4, ppc_inst(PPC_RAW_STW(_R0, _R1, 4)),
> > +						 ppc_inst(PPC_RAW_NOP()));
> >  	} else if (IS_ENABLED(CONFIG_MPROFILE_KERNEL)) {
> >  		/* Expected sequence: 'mflr r0', ['std r0,16(r1)'], 'bl _mcount' */
> >  		ret = ftrace_read_inst(ip - 4, &old);
> >  		if (!ret && !ppc_inst_equal(old, ppc_inst(PPC_RAW_MFLR(_R0)))) {
> >  			ret = ftrace_validate_inst(ip - 8, ppc_inst(PPC_RAW_MFLR(_R0)));
> > -			ret |= ftrace_validate_inst(ip - 4, ppc_inst(PPC_RAW_STD(_R0, _R1, 16)));
> > +			ret |= ftrace_modify_code(ip - 4, ppc_inst(PPC_RAW_STD(_R0, _R1, 16)),
> > +						  ppc_inst(PPC_RAW_NOP()));
>
> So this is the old style path... Should you check the mflr validate
> result first? Also do you know what GCC version, roughly? Maybe we
> could have a comment here and eventually deprecate it.

Sure, this is gcc v5.5 for sure. gcc v6.3 doesn't seem to emit the
additional 'std' instruction.

>
> You could split this change into its own patch.

Indeed. I will do that.

>
> >  		}
> >  	} else {
> >  		return -EINVAL;
> > diff --git a/arch/powerpc/kernel/trace/ftrace_entry.S b/arch/powerpc/kernel/trace/ftrace_entry.S
> > index 76dbe9fd2c0f..244a1c7bb1e8 100644
> > --- a/arch/powerpc/kernel/trace/ftrace_entry.S
> > +++ b/arch/powerpc/kernel/trace/ftrace_entry.S
> > @@ -33,6 +33,8 @@
> >   * and then arrange for the ftrace function to be called.
> >   */
> >  .macro	ftrace_regs_entry allregs
> > +	/* Save the original return address in A's stack frame */
> > +	PPC_STL		r0, LRSAVE(r1)
> >  	/* Create a minimal stack frame for representing B */
> >  	PPC_STLU	r1, -STACK_FRAME_MIN_SIZE(r1)
> >
> > @@ -44,8 +46,6 @@
> >  	SAVE_GPRS(3, 10, r1)
> >
> >  #ifdef CONFIG_PPC64
> > -	/* Save the original return address in A's stack frame */
> > -	std	r0, LRSAVE+SWITCH_FRAME_SIZE+STACK_FRAME_MIN_SIZE(r1)
> >  	/* Ok to continue? */
> >  	lbz	r3, PACA_FTRACE_ENABLED(r13)
> >  	cmpdi	r3, 0
>
> That seems right to me.
>
> Reviewed-by: Nicholas Piggin <npiggin@gmail.com>

Thanks,
Naveen
diff --git a/arch/powerpc/kernel/trace/ftrace.c b/arch/powerpc/kernel/trace/ftrace.c
index d8d6b4fd9a14..463bd7531dc8 100644
--- a/arch/powerpc/kernel/trace/ftrace.c
+++ b/arch/powerpc/kernel/trace/ftrace.c
@@ -241,13 +241,15 @@ int ftrace_init_nop(struct module *mod, struct dyn_ftrace *rec)
 		/* Expected sequence: 'mflr r0', 'stw r0,4(r1)', 'bl _mcount' */
 		ret = ftrace_validate_inst(ip - 8, ppc_inst(PPC_RAW_MFLR(_R0)));
 		if (!ret)
-			ret = ftrace_validate_inst(ip - 4, ppc_inst(PPC_RAW_STW(_R0, _R1, 4)));
+			ret = ftrace_modify_code(ip - 4, ppc_inst(PPC_RAW_STW(_R0, _R1, 4)),
+						 ppc_inst(PPC_RAW_NOP()));
 	} else if (IS_ENABLED(CONFIG_MPROFILE_KERNEL)) {
 		/* Expected sequence: 'mflr r0', ['std r0,16(r1)'], 'bl _mcount' */
 		ret = ftrace_read_inst(ip - 4, &old);
 		if (!ret && !ppc_inst_equal(old, ppc_inst(PPC_RAW_MFLR(_R0)))) {
 			ret = ftrace_validate_inst(ip - 8, ppc_inst(PPC_RAW_MFLR(_R0)));
-			ret |= ftrace_validate_inst(ip - 4, ppc_inst(PPC_RAW_STD(_R0, _R1, 16)));
+			ret |= ftrace_modify_code(ip - 4, ppc_inst(PPC_RAW_STD(_R0, _R1, 16)),
+						  ppc_inst(PPC_RAW_NOP()));
 		}
 	} else {
 		return -EINVAL;
diff --git a/arch/powerpc/kernel/trace/ftrace_entry.S b/arch/powerpc/kernel/trace/ftrace_entry.S
index 76dbe9fd2c0f..244a1c7bb1e8 100644
--- a/arch/powerpc/kernel/trace/ftrace_entry.S
+++ b/arch/powerpc/kernel/trace/ftrace_entry.S
@@ -33,6 +33,8 @@
  * and then arrange for the ftrace function to be called.
  */
 .macro	ftrace_regs_entry allregs
+	/* Save the original return address in A's stack frame */
+	PPC_STL		r0, LRSAVE(r1)
 	/* Create a minimal stack frame for representing B */
 	PPC_STLU	r1, -STACK_FRAME_MIN_SIZE(r1)
 
@@ -44,8 +46,6 @@
 	SAVE_GPRS(3, 10, r1)
 
 #ifdef CONFIG_PPC64
-	/* Save the original return address in A's stack frame */
-	std	r0, LRSAVE+SWITCH_FRAME_SIZE+STACK_FRAME_MIN_SIZE(r1)
 	/* Ok to continue? */
 	lbz	r3, PACA_FTRACE_ENABLED(r13)
 	cmpdi	r3, 0
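The ftrace_entry.S hunks move the LR save from after the frames are set up to before, while apparently targeting the same stack slot. The arithmetic can be checked with a small sketch, assuming that by the point of the removed 'std' the stack pointer had been lowered by STACK_FRAME_MIN_SIZE and SWITCH_FRAME_SIZE (which is what the old offset expression suggests). The `SKETCH_*` values below are illustrative placeholders, not the kernel's real constants, and both function names are hypothetical.

```c
#include <stdint.h>

/* Illustrative placeholder values; the real ones depend on ABI and config. */
#define SKETCH_LRSAVE               16u
#define SKETCH_STACK_FRAME_MIN_SIZE 32u
#define SKETCH_SWITCH_FRAME_SIZE    400u

/* New placement: store at LRSAVE(r1) before any frame is created. */
static uintptr_t lr_slot_before_frames(uintptr_t r1_at_entry)
{
	return r1_at_entry + SKETCH_LRSAVE;
}

/*
 * Old placement: r1 has already moved down by both frame sizes, so the
 * offset adds them back to reach the same slot in A's stack frame.
 */
static uintptr_t lr_slot_after_frames(uintptr_t r1_at_entry)
{
	uintptr_t r1 = r1_at_entry;

	r1 -= SKETCH_STACK_FRAME_MIN_SIZE; /* minimal frame representing B */
	r1 -= SKETCH_SWITCH_FRAME_SIZE;    /* handler frame (assumed) */
	return r1 + SKETCH_LRSAVE + SKETCH_SWITCH_FRAME_SIZE +
	       SKETCH_STACK_FRAME_MIN_SIZE;
}
```

Under that assumption both functions return the same address for any entry value of r1, and using the width-agnostic `PPC_STL` lets the same store serve 32-bit and 64-bit.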
On 32-bit powerpc, gcc generates a three instruction sequence for
function profiling:
	mflr	r0
	stw	r0, 4(r1)
	bl	_mcount

On kernel boot, the call to _mcount() is nop-ed out, to be patched back
in when ftrace is actually enabled. The 'stw' instruction therefore is
not necessary unless ftrace is enabled. Nop it out during ftrace init.

When ftrace is enabled, we want the 'stw' so that stack unwinding works
properly. Perform the same within the ftrace handler, similar to 64-bit
powerpc.

For 64-bit powerpc, early versions of gcc used to emit a three
instruction sequence for function profiling (with -mprofile-kernel) with
a 'std' instruction to mimic the 'stw' above. Address that scenario also
by nop-ing out the 'std' instruction during ftrace init.

Signed-off-by: Naveen N Rao <naveen@kernel.org>
---
 arch/powerpc/kernel/trace/ftrace.c       | 6 ++++--
 arch/powerpc/kernel/trace/ftrace_entry.S | 4 ++--
 2 files changed, 6 insertions(+), 4 deletions(-)
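The life cycle described in the commit message for a 32-bit call site can be modelled in user space as follows. This is only a sketch: `sketch_replace`, `sketch_init_nop_ppc32` and `sketch_enable_ppc32` are hypothetical names, the call site is a three-word array, and `INSN_BL_PLACEHOLDER` is an arbitrary `bl` encoding standing in for 'bl _mcount' (the real branch offset depends on the call site). The other constants are the standard encodings of `mflr r0`, `stw r0,4(r1)` and `nop`.

```c
#include <stdint.h>

#define INSN_MFLR_R0        0x7c0802a6u /* mflr r0 */
#define INSN_STW_R0_4_R1    0x90010004u /* stw r0,4(r1) */
#define INSN_NOP            0x60000000u /* ori r0,r0,0 */
#define INSN_BL_PLACEHOLDER 0x48000101u /* bl .+0x100, stands in for 'bl _mcount' */

/* Replace *slot only if it currently holds 'expected'. */
static int sketch_replace(uint32_t *slot, uint32_t expected, uint32_t replacement)
{
	if (*slot != expected)
		return -1;
	*slot = replacement;
	return 0;
}

/* Init: site[0..2] is 'mflr r0', 'stw r0,4(r1)', 'bl _mcount'. */
static int sketch_init_nop_ppc32(uint32_t *site)
{
	if (site[0] != INSN_MFLR_R0)
		return -1;
	if (sketch_replace(&site[1], INSN_STW_R0_4_R1, INSN_NOP))
		return -1;
	/* The call itself is also nop-ed out until tracing is enabled. */
	return sketch_replace(&site[2], INSN_BL_PLACEHOLDER, INSN_NOP);
}

/* Enable: only the call is patched back; the 'stw' slot stays a nop. */
static int sketch_enable_ppc32(uint32_t *site)
{
	return sketch_replace(&site[2], INSN_NOP, INSN_BL_PLACEHOLDER);
}
```

After enabling, the site reads 'mflr r0', 'nop', 'bl', which is why the patch has the ftrace handler itself store r0 into the caller's frame.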