[RFC,0/4] Out-of-line static calls for powerpc64 ELF V2

Message ID 20220901055823.152983-1-bgray@linux.ibm.com (mailing list archive)

Message

Benjamin Gray Sept. 1, 2022, 5:58 a.m. UTC
WIP implementation of out-of-line static calls for PowerPC 64-bit ELF V2 ABI.
Static calls patch an indirect branch into a direct branch at runtime.
Out-of-line specifically has a caller directly call a trampoline, and
the trampoline gets patched to directly call the target. This current
implementation has a known issue described in detail below, and is
presented here for any comments or suggestions.
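
As a rough illustration of the out-of-line scheme described above, here is a userspace model in C. It is only a sketch: every name in it is hypothetical rather than kernel API, and a function pointer stands in for the patched branch, whereas the real trampoline holds a direct branch instruction that is rewritten in place.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel API. */
typedef int (*sc_func_t)(int);

static int impl_a(int x) { return x + 1; }
static int impl_b(int x) { return x * 2; }

/* Stands in for the branch instruction inside the trampoline. */
static sc_func_t sc_tramp_target = impl_a;

/* Callers always call this fixed address; only its target changes. */
static int sc_trampoline(int x)
{
	return sc_tramp_target(x);
}

/* Models patching the trampoline's branch to point at a new target. */
static void sc_update(sc_func_t new_target)
{
	assert(new_target != NULL);
	sc_tramp_target = new_target;
}
```

Calling `sc_trampoline(3)` reaches `impl_a` until `sc_update(impl_b)` runs, after which the same callsite reaches `impl_b` without being touched itself.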

64-bit ELF V2 specifies a table of contents (TOC) pointer stored in r2.
Functions that use a TOC can use it to perform various operations
relative to its value. When the caller and target use different TOCs,
the static call implementation must ensure the TOC is kept consistent
so that neither tries to use the other's TOC.

However, while the trampoline can change the caller's TOC to the target's
TOC, it cannot restore the caller's TOC when the target returns. For the
trampoline to do this would require the target to return to the trampoline,
and so the return address back to the caller would need to be saved to
the stack. But the trampoline cannot move the stack because the target
may be expecting parameters relative to the stack pointer (when registers
are insufficient or varargs are used). And as static calls are usable in
generic code, there can be no arch-specific restrictions on parameters
that would sidestep this issue.

Normally the TOC change issue is resolved by the caller, which will save
and restore its TOC if necessary. For static calls though the caller
sees the trampoline as a local function, so assumes it does not change
the TOC and treats r2 as nonvolatile (no save & restore added).
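
The failure mode can be modelled in plain C. This is a sketch under loud assumptions: a global variable stands in for r2, the TOC values are made up, and all function names are hypothetical. It only shows why a caller that treats r2 as nonvolatile ends up with the wrong TOC after the call.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: a global stands in for register r2. */
static uint64_t r2;

#define KERNEL_TOC 0x1000u	/* made-up TOC values */
#define MODULE_TOC 0x2000u

/* Target in a "module": relies on r2 holding its own TOC and
 * returns without restoring the caller's value. */
static int module_target(void)
{
	assert(r2 == MODULE_TOC);
	return 42;
}

/* The trampoline can install the target's TOC, but it tail-branches
 * to the target and so never gets a chance to put the old r2 back. */
static int toc_trampoline(void)
{
	r2 = MODULE_TOC;
	return module_target();
}

/* Caller that assumes r2 is nonvolatile across the call.
 * Returns 1 if its TOC is still intact afterwards. */
static int caller_without_save(void)
{
	r2 = KERNEL_TOC;
	(void)toc_trampoline();
	return r2 == KERNEL_TOC;
}

/* Caller that saves and restores r2 itself, as the compiler
 * arranges for calls it knows may change the TOC. */
static int caller_with_save(void)
{
	uint64_t saved;

	r2 = KERNEL_TOC;
	saved = r2;
	(void)toc_trampoline();
	r2 = saved;
	return r2 == KERNEL_TOC;
}
```

In this model `caller_without_save()` comes back with the module's TOC still in r2, while `caller_with_save()` recovers its own.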

This is a similar problem to that faced by livepatch. Static calls may have
a few more options though, because the call is performed through a
`static_call` macro, allowing annotation and insertion of inline assembly
at every callsite.

I can think of several possible solutions, but they are relatively complex:

1. Patching the callsites at runtime, as is done for inline static calls.
    This also requires some inline assembly to save `r2` to the TOC pointer
    Doubleword slot on the stack before each static call, as the caller may
    not have done so in its prologue. It should be easy to add though, because
    static calls are invoked through the `static_call` macro that can be
    modified appropriately. The runtime patching then modifies the trailing
    function call `nop` to restore this r2 value.

    The patching itself can probably be done at compile time at kernel callsites.

2. Use the livepatch stack method of growing the base of the stack backwards.
    I haven't looked too closely at the implementation though, especially
    regarding how much room is available.

    The benefit of this method is that there can be zero overhead when the
    trampoline and target share a TOC. So the trampoline in kernel-only
    calls can still just be a single direct branch.

3. Remove the local entry point from the trampoline. This makes the trampoline
    less efficient, as it cannot assume r2 to be correct, but should at least
    cause the caller to automatically save and restore r2 without manual patching.
    From the ABI manual:

    > 2.2.1. Function Call Linkage Protocols
    >   A function that uses a TOC pointer always has a separate local entry point
    >   [...], and preserves r2 when called via its local entry point.
    >
    > 2.2.2.1. Register Roles
    >   (a) Register r2 is nonvolatile with respect to calls between functions
    >       in the same compilation unit, except under the conditions in footnote (b)
    >   (b) Register r2 is volatile and available for use in a function that does not
    >       use a TOC pointer and that does not guarantee that it preserves r2.

    So not having a local entry point implies not using a TOC pointer, which
    implies r2 is volatile if the trampoline does not guarantee that it preserves
    r2. However experimenting with such a trampoline showed the caller still did
    not preserve its TOC when necessary, even when the trampoline used instructions
    that wrote to r2. Possibly there's an attribute that can be used to mark the
    necessary info, but I could not find one.
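
To make option 1 more concrete, the callsite layout and the patch step can be sketched in C over a plain instruction buffer. The helper names here are hypothetical and the buffer is ordinary memory; in the kernel the write would have to go through the code-patching API instead. The encodings assume the ELF V2 TOC save slot at 24(r1).

```c
#include <assert.h>
#include <stdint.h>

/* Power ISA encodings; helper names are hypothetical, not kernel API. */
#define INSN_NOP	0x60000000u	/* ori r0,r0,0 */
#define TOC_SAVE_OFFSET	24u		/* ELF V2 TOC save slot, relative to r1 */

/* std rs,ds(ra): primary opcode 62, DS-form, low two bits zero. */
static uint32_t insn_std(uint32_t rs, uint32_t ra, uint32_t ds)
{
	assert((ds & 3u) == 0 && ds < 0x8000u);
	return 0xF8000000u | (rs << 21) | (ra << 16) | ds;
}

/* ld rt,ds(ra): primary opcode 58, DS-form, low two bits zero. */
static uint32_t insn_ld(uint32_t rt, uint32_t ra, uint32_t ds)
{
	assert((ds & 3u) == 0 && ds < 0x8000u);
	return 0xE8000000u | (rt << 21) | (ra << 16) | ds;
}

/* bl: primary opcode 18 with LK=1; offset is relative to the bl itself. */
static uint32_t insn_bl(int32_t offset)
{
	return 0x48000001u | ((uint32_t)offset & 0x03FFFFFCu);
}

/* Emit "std r2,24(r1); bl tramp; nop" into a three-word callsite. */
static void callsite_init(uint32_t site[3], int32_t tramp_offset)
{
	site[0] = insn_std(2, 1, TOC_SAVE_OFFSET);
	site[1] = insn_bl(tramp_offset);
	site[2] = INSN_NOP;
}

/* Patch the trailing slot: restore r2 only when the target's TOC differs. */
static void callsite_patch(uint32_t site[3], int target_changes_toc)
{
	site[2] = target_changes_toc ? insn_ld(2, 1, TOC_SAVE_OFFSET)
				     : INSN_NOP;
}
```

The `std` is always present because the caller may not have saved r2 in its prologue; only the trailing word flips between `nop` and `ld r2,24(r1)`.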


Benjamin Gray (3):
  static_call: Move static call selftest to static_call.c
  powerpc/64: Add support for out-of-line static calls
  powerpc/64: Add tests for out-of-line static calls

Russell Currey (1):
  powerpc/code-patching: add patch_memory() for writing RO text

 arch/powerpc/Kconfig                     |  23 +-
 arch/powerpc/include/asm/code-patching.h |   2 +
 arch/powerpc/include/asm/static_call.h   |  45 +++-
 arch/powerpc/kernel/Makefile             |   4 +-
 arch/powerpc/kernel/static_call.c        | 184 +++++++++++++++-
 arch/powerpc/kernel/static_call_test.c   | 257 +++++++++++++++++++++++
 arch/powerpc/lib/code-patching.c         |  65 ++++++
 kernel/static_call.c                     |  43 ++++
 kernel/static_call_inline.c              |  43 ----
 9 files changed, 613 insertions(+), 53 deletions(-)
 create mode 100644 arch/powerpc/kernel/static_call_test.c


base-commit: c5e4d5e99162ba8025d58a3af7ad103f155d2df7
--
2.37.2

Comments

Christophe Leroy Sept. 1, 2022, 8:07 a.m. UTC | #1
CCing static call maintainers/reviewers.

Also note that my email address changed to
christophe.leroy@csgroup.eu months ago.

On 01/09/2022 at 07:58, Benjamin Gray wrote:
> WIP implementation of out-of-line static calls for PowerPC 64-bit ELF V2 ABI.
> Static calls patch an indirect branch into a direct branch at runtime.
> Out-of-line specifically has a caller directly call a trampoline, and
> the trampoline gets patched to directly call the target. This current
> implementation has a known issue described in detail below, and is
> presented here for any comments or suggestions.

To reach a wider audience, I recommend copying the people listed under
STATIC BRANCH/CALL in the MAINTAINERS file.


> 
> 64-bit ELF V2 specifies a table of contents (TOC) pointer stored in r2.
> Functions that use a TOC can use it to perform various operations
> relative to its value. When the caller and target use different TOCs,
> the static call implementation must ensure the TOC is kept consistent
> so that neither tries to use the other's TOC.
> 
> However, while the trampoline can change the caller's TOC to the target's
> TOC, it cannot restore the caller's TOC when the target returns. For the
> trampoline to do this would require the target to return to the trampoline,
> and so the return address back to the caller would need to be saved to
> the stack. But the trampoline cannot move the stack because the target
> may be expecting parameters relative to the stack pointer (when registers
> are insufficient or varargs are used). And as static calls are usable in
> generic code, there can be no arch-specific restrictions on parameters
> that would sidestep this issue.
> 
> Normally the TOC change issue is resolved by the caller, which will save
> and restore its TOC if necessary. For static calls though the caller
> sees the trampoline as a local function, so assumes it does not change
> the TOC and treats r2 as nonvolatile (no save & restore added).
> 
> This is a similar problem to that faced by livepatch. Static calls may have
> a few more options though, because the call is performed through a
> `static_call` macro, allowing annotation and insertion of inline assembly
> at every callsite.
> 
> I can think of several possible solutions, but they are relatively complex:
> 
> 1. Patching the callsites at runtime, as is done for inline static calls.
>      This also requires some inline assembly to save `r2` to the TOC pointer
>      Doubleword slot on the stack before each static call, as the caller may
>      not have done so in its prologue. It should be easy to add though, because
>      static calls are invoked through the `static_call` macro that can be
>      modified appropriately. The runtime patching then modifies the trailing
>      function call `nop` to restore this r2 value.

I'm working on implementing inline static calls for ppc32. I will copy you
on the next spin (if I don't forget).

> 
>      The patching itself can probably be done at compile time at kernel callsites.
> 
> 2. Use the livepatch stack method of growing the base of the stack backwards.
>      I haven't looked too closely at the implementation though, especially
>      regarding how much room is available.
> 
>      The benefit of this method is that there can be zero overhead when the
>      trampoline and target share a TOC. So the trampoline in kernel-only
>      calls can still just be a single direct branch.
> 
> 3. Remove the local entry point from the trampoline. This makes the trampoline
>      less efficient, as it cannot assume r2 to be correct, but should at least
>      cause the caller to automatically save and restore r2 without manual patching.
>      From the ABI manual:
> 
>      > 2.2.1. Function Call Linkage Protocols
>      >   A function that uses a TOC pointer always has a separate local entry point
>      >   [...], and preserves r2 when called via its local entry point.
>      >
>      > 2.2.2.1. Register Roles
>      >   (a) Register r2 is nonvolatile with respect to calls between functions
>      >       in the same compilation unit, except under the conditions in footnote (b)
>      >   (b) Register r2 is volatile and available for use in a function that does not
>      >       use a TOC pointer and that does not guarantee that it preserves r2.
> 
>      So not having a local entry point implies not using a TOC pointer, which
>      implies r2 is volatile if the trampoline does not guarantee that it preserves
>      r2. However experimenting with such a trampoline showed the caller still did
>      not preserve its TOC when necessary, even when the trampoline used instructions
>      that wrote to r2. Possibly there's an attribute that can be used to mark the
>      necessary info, but I could not find one.
> 

Another possible solution (at least for the kernel) is to restore r2 from
the PACA instead of from the stack, so it no longer matters whether the
caller saved it. Module code does something similar; see the comment
before create_ftrace_stub().



> 
> Benjamin Gray (3):
>    static_call: Move static call selftest to static_call.c
>    powerpc/64: Add support for out-of-line static calls
>    powerpc/64: Add tests for out-of-line static calls
> 
> Russell Currey (1):
>    powerpc/code-patching: add patch_memory() for writing RO text
> 
>   arch/powerpc/Kconfig                     |  23 +-
>   arch/powerpc/include/asm/code-patching.h |   2 +
>   arch/powerpc/include/asm/static_call.h   |  45 +++-
>   arch/powerpc/kernel/Makefile             |   4 +-
>   arch/powerpc/kernel/static_call.c        | 184 +++++++++++++++-
>   arch/powerpc/kernel/static_call_test.c   | 257 +++++++++++++++++++++++
>   arch/powerpc/lib/code-patching.c         |  65 ++++++
>   kernel/static_call.c                     |  43 ++++
>   kernel/static_call_inline.c              |  43 ----
>   9 files changed, 613 insertions(+), 53 deletions(-)
>   create mode 100644 arch/powerpc/kernel/static_call_test.c
> 
> 
> base-commit: c5e4d5e99162ba8025d58a3af7ad103f155d2df7
> --
> 2.37.2

Benjamin Gray Sept. 13, 2022, 3:31 a.m. UTC | #2
On Thu, 2022-09-01 at 15:58 +1000, Benjamin Gray wrote:
>     So not having a local entry point implies not using a TOC pointer, which
>     implies r2 is volatile if the trampoline does not guarantee that it preserves
>     r2. However experimenting with such a trampoline showed the caller still did
>     not preserve its TOC when necessary, even when the trampoline used instructions
>     that wrote to r2. Possibly there's an attribute that can be used to mark the
>     necessary info, but I could not find one.

The `.localentry` directive is more general than just specifying where
the local entry is: it can be used to set the relevant ELF bits
directly. So the solution here is setting `.localentry NAME, 1` on the
SC trampoline.

It's not an optimal solution, as it inserts another trampoline to save
r2 before calling the SC trampoline, but it would allow a correct
implementation without the work needed in the other choices.
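
For reference, the "relevant ELF bits" are the three most significant bits of the symbol's `st_other` field. The sketch below decodes them in C. The macro names mirror the ones I believe binutils and the kernel use, but treat them, the helper names, and my reading of the value 1 case (same entry point, r2 caller-saved) as assumptions to check against the ABI document.

```c
#include <assert.h>

/* Assumed layout: local entry field in bits 5-7 of st_other. */
#define STO_PPC64_LOCAL_BIT	5
#define STO_PPC64_LOCAL_MASK	(7u << STO_PPC64_LOCAL_BIT)

/* Raw field value, i.e. the second operand of `.localentry NAME, n`
 * when used to set the bits directly. */
static unsigned int local_entry_field(unsigned char st_other)
{
	return (st_other & STO_PPC64_LOCAL_MASK) >> STO_PPC64_LOCAL_BIT;
}

/* Byte offset from the global to the local entry point.
 * Field values 0 and 1 both give 0; 2 gives 4, 3 gives 8, and so on. */
static unsigned int local_entry_offset(unsigned char st_other)
{
	return ((1u << local_entry_field(st_other)) >> 2) << 2;
}

/* Field value 1: single entry point, and callers must treat r2 as
 * clobbered, which is what the SC trampoline wants to advertise. */
static int callee_may_clobber_r2(unsigned char st_other)
{
	assert((st_other & ~STO_PPC64_LOCAL_MASK & 0xE0u) == 0);
	return local_entry_field(st_other) == 1;
}
```

With the trampoline's symbol marked this way, the linker rather than the compiler is what sees that r2 is not preserved, which would explain the extra stub that saves r2 before the SC trampoline is entered.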