[v2,0/6] aarch64: avoid mprotect(PROT_BTI|PROT_EXEC) [BZ #26831]

Message ID cover.1606319495.git.szabolcs.nagy@arm.com

Message

Szabolcs Nagy Nov. 27, 2020, 1:19 p.m. UTC
This is v2 of
https://sourceware.org/pipermail/libc-alpha/2020-November/119305.html

To enable BTI support, re-mmap executable segments instead of
mprotecting them in case mprotect is seccomp filtered.

I would like linux to change to map the main exe with PROT_BTI when
that is marked as BTI compatible. From the linux side i heard the
following concerns about this:
- it's an ABI change so requires some ABI bump. (this is fine with
  me, i think glibc does not care about backward compat as nothing
  can reasonably rely on the current behaviour, but if we have a
  new bit in auxv or similar then we can save one mprotect call.)
- in case we discover compatibility issues with user binaries it's
  better if userspace can easily disable BTI (e.g. removing the
  mprotect based on some env var, but if kernel adds PROT_BTI and
  mprotect is filtered then we have no reliable way to remove that
  from executables. this problem already exists for static linked
  exes, although admittedly those are less of a compat concern.)
- ideally PROT_BTI would be added via a new syscall that does not
  interfere with PROT_EXEC filtering. (this does not conflict with
  the current patches: even with a new syscall we need a fallback.)
- solve it in systemd (e.g. turn off the filter, use better filter):
  i would prefer not to have aarch64 (or BTI) specific policy in
  user code. and there was no satisfying way to do this portably.

Other concerns about the approach:
- mmap is more expensive than mprotect: in my measurements using
  mmap instead of mprotect is 3-8x slower (and after mmap pages
  have to be faulted in again), but e.g. the exec time of a program
  with 4 deps only increases by < 8% due to the 4 new mmaps. (the
  kernel side resource usage may increase too, i didn't look at that.)
- _dl_signal_error is not valid from the _dl_process_gnu_property
  hook. The v2 set addresses this problem: i could either propagate
  the errors up until they can be handled or solve it in the aarch64
  backend by first recording failures and then dealing with them in
  _dl_open_check. I chose the latter, but did some refactorings in
  _dl_map_object_from_fd that make the former possible too.
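
The "record first, report later" pattern can be sketched roughly as below. This is only an illustration under assumed names: struct module, note_hook and open_check are hypothetical stand-ins, not the glibc link map or the real hooks.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a loaded object's bookkeeping.  */
struct module
{
  const char *name;
  bool bti_requested;   /* Object is marked BTI compatible.  */
  bool bti_failed;      /* Enabling BTI failed; reported later.  */
};

/* Phase 1: runs in a context where raising an error is not valid,
   so a failure is only recorded.  enable_result is 0 on success.  */
static void
note_hook (struct module *m, int enable_result)
{
  if (m->bti_requested && enable_result != 0)
    m->bti_failed = true;
}

/* Phase 2: runs where errors can be handled.  Returns the name of the
   first module with a recorded failure, or NULL if there is none.  */
static const char *
open_check (const struct module *mods, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (mods[i].bti_failed)
      return mods[i].name;
  return NULL;
}
```

The caller of open_check decides how to report the failure, so the hook itself never has to unwind.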

v2:
- [1/6]: New patch that fixes a missed BTI bug found during v2.
- [2-3/6]: New, _dl_map_object_from_fd failure handling improvements,
  these are independent of the rest of the series.
- [4/6]: Move the note handling to a different place (after l_phdr
  setup, but before fd is closed).
- [5/6]: Rebased.
- [6/6]: First record errors and only report them later. (this fixes
  various failure handling issues.)

Szabolcs Nagy (6):
  aarch64: Fix missing BTI protection from dependencies [BZ #26926]
  elf: lose is closely tied to _dl_map_object_from_fd
  elf: Fix failure handling in _dl_map_object_from_fd
  elf: Move note processing after l_phdr is updated
  elf: Pass the fd to note processing
  aarch64: Use mmap to add PROT_BTI instead of mprotect [BZ #26831]

 elf/dl-load.c              | 110 ++++++++++++++++++++-----------------
 elf/rtld.c                 |   4 +-
 sysdeps/aarch64/dl-bti.c   |  74 ++++++++++++++++++-------
 sysdeps/aarch64/dl-prop.h  |  14 +++--
 sysdeps/aarch64/linkmap.h  |   2 +-
 sysdeps/generic/dl-prop.h  |   6 +-
 sysdeps/generic/ldsodefs.h |   5 +-
 sysdeps/x86/dl-prop.h      |   6 +-
 8 files changed, 135 insertions(+), 86 deletions(-)

Comments

Szabolcs Nagy Nov. 30, 2020, 3:56 p.m. UTC | #1
The 11/27/2020 13:19, Szabolcs Nagy via Libc-alpha wrote:
> This is v2 of
> https://sourceware.org/pipermail/libc-alpha/2020-November/119305.html
> 
> To enable BTI support, re-mmap executable segments instead of
> mprotecting them in case mprotect is seccomp filtered.
> 
> I would like linux to change to map the main exe with PROT_BTI when
> that is marked as BTI compatible. From the linux side i heard the
> following concerns about this:
> - it's an ABI change so requires some ABI bump. (this is fine with
>   me, i think glibc does not care about backward compat as nothing
>   can reasonably rely on the current behaviour, but if we have a
>   new bit in auxv or similar then we can save one mprotect call.)
> - in case we discover compatibility issues with user binaries it's
>   better if userspace can easily disable BTI (e.g. removing the
>   mprotect based on some env var, but if kernel adds PROT_BTI and
>   mprotect is filtered then we have no reliable way to remove that
>   from executables. this problem already exists for static linked
>   exes, although admittedly those are less of a compat concern.)
> - ideally PROT_BTI would be added via a new syscall that does not
>   interfere with PROT_EXEC filtering. (this does not conflict with
>   the current patches: even with a new syscall we need a fallback.)
> - solve it in systemd (e.g. turn off the filter, use better filter):
>   i would prefer not to have aarch64 (or BTI) specific policy in
>   user code. and there was no satisfying way to do this portably.
> 
> Other concerns about the approach:
> - mmap is more expensive than mprotect: in my measurements using
>   mmap instead of mprotect is 3-8x slower (and after mmap pages
>   have to be faulted in again), but e.g. the exec time of a program
>   with 4 deps only increases by < 8% due to the 4 new mmaps. (the
>   kernel side resource usage may increase too, i didn't look at that.)

i tested glibc build time with mprotect vs mmap
which should be exec heavy.

the real time overhead was < 0.2% on a particular
4 core system with linux 5.3 ubuntu kernel, which
i consider to be small.

(used PROT_EXEC without PROT_BTI for the measurement).


Catalin Marinas Dec. 3, 2020, 5:30 p.m. UTC | #2
Hi Szabolcs,

On Fri, Nov 27, 2020 at 01:19:16PM +0000, Szabolcs Nagy wrote:
> This is v2 of
> https://sourceware.org/pipermail/libc-alpha/2020-November/119305.html
> 
> To enable BTI support, re-mmap executable segments instead of
> mprotecting them in case mprotect is seccomp filtered.
> 
> I would like linux to change to map the main exe with PROT_BTI when
> that is marked as BTI compatible. From the linux side i heard the
> following concerns about this:
> - it's an ABI change so requires some ABI bump. (this is fine with
>   me, i think glibc does not care about backward compat as nothing
>   can reasonably rely on the current behaviour, but if we have a
>   new bit in auxv or similar then we can save one mprotect call.)

I'm not concerned about the ABI change but there are workarounds like a
new auxv bit.

> - in case we discover compatibility issues with user binaries it's
>   better if userspace can easily disable BTI (e.g. removing the
>   mprotect based on some env var, but if kernel adds PROT_BTI and
>   mprotect is filtered then we have no reliable way to remove that
>   from executables. this problem already exists for static linked
>   exes, although admittedly those are less of a compat concern.)

This is our main concern. For static binaries, the linker could detect,
in theory, potential issues when linking and not set the corresponding
ELF information.

At runtime, a dynamic linker could detect issues and avoid enabling BTI.
In both cases, it's a (static or dynamic) linker decision that belongs
in user-space.

> - ideally PROT_BTI would be added via a new syscall that does not
>   interfere with PROT_EXEC filtering. (this does not conflict with
>   the current patches: even with a new syscall we need a fallback.)

This can be discussed as a long term solution.

> - solve it in systemd (e.g. turn off the filter, use better filter):
>   i would prefer not to have aarch64 (or BTI) specific policy in
>   user code. and there was no satisfying way to do this portably.

I agree. I think the best for now (as a back-portable glibc fix) is to
ignore the mprotect(PROT_EXEC|PROT_BTI) error that the dynamic loader
gets. BTI will be disabled if MDWX is enabled.

In the meantime, we should start (continue) looking at a solution that
works for both systemd and the kernel and is generic enough for other
architectures. The stateless nature of the current SECCOMP approach is
not suitable for this W^X policy. Kees had some suggestions here but the
thread seems to have died:

https://lore.kernel.org/kernel-hardening/202010221256.A4F95FD11@keescook/
Szabolcs Nagy Dec. 7, 2020, 8:03 p.m. UTC | #3
The 12/03/2020 17:30, Catalin Marinas wrote:
> On Fri, Nov 27, 2020 at 01:19:16PM +0000, Szabolcs Nagy wrote:
> > This is v2 of
> > https://sourceware.org/pipermail/libc-alpha/2020-November/119305.html
> > 
> > To enable BTI support, re-mmap executable segments instead of
> > mprotecting them in case mprotect is seccomp filtered.
> > 
> > I would like linux to change to map the main exe with PROT_BTI when
> > that is marked as BTI compatible. From the linux side i heard the
> > following concerns about this:
> > - it's an ABI change so requires some ABI bump. (this is fine with
> >   me, i think glibc does not care about backward compat as nothing
> >   can reasonably rely on the current behaviour, but if we have a
> >   new bit in auxv or similar then we can save one mprotect call.)
> 
> I'm not concerned about the ABI change but there are workarounds like a
> new auxv bit.
> 
> > - in case we discover compatibility issues with user binaries it's
> >   better if userspace can easily disable BTI (e.g. removing the
> >   mprotect based on some env var, but if kernel adds PROT_BTI and
> >   mprotect is filtered then we have no reliable way to remove that
> >   from executables. this problem already exists for static linked
> >   exes, although admittedly those are less of a compat concern.)
> 
> This is our main concern. For static binaries, the linker could detect,
> in theory, potential issues when linking and not set the corresponding
> ELF information.
> 
> At runtime, a dynamic linker could detect issues and avoid enabling BTI.
> In both cases, it's a (static or dynamic) linker decision that belongs
> in user-space.

note that the marking is tied to an elf module: if the static
linker can be trusted to produce correct marking then both the
static and dynamic linking cases work, otherwise neither works.
(the dynamic linker cannot detect bti issues, just apply user
supplied policy.)

1) if we consider bti part of the semantics of a marked module
then it should be always on if the system supports it and
ideally the loader of the module should deal with PROT_BTI.
(and if the marking is wrong then the binary is wrong.)

2) if we consider the marking to be a compatibility indicator
and let userspace policy decide what to do with it then the
static exe and vdso cases should be handled by that policy too.
(this makes sense if we expect that there are reasons to turn
bti off for a process independently of markings. this requires
the static linking startup code to do the policy decision and
self-apply PROT_BTI early.)

the current code does not fit either case well, but i was
planning to do (1). and ideally PROT_BTI would be added
reliably, but a best effort only PROT_BTI works too, however
it limits our ability to report real mprotect failures.

> > - ideally PROT_BTI would be added via a new syscall that does not
> >   interfere with PROT_EXEC filtering. (this does not conflict with
> >   the current patches: even with a new syscall we need a fallback.)
> 
> This can be discussed as a long term solution.
> 
> > - solve it in systemd (e.g. turn off the filter, use better filter):
> >   i would prefer not to have aarch64 (or BTI) specific policy in
> >   user code. and there was no satisfying way to do this portably.
> 
> I agree. I think the best for now (as a back-portable glibc fix) is to
> ignore the mprotect(PROT_EXEC|PROT_BTI) error that the dynamic loader
> gets. BTI will be disabled if MDWX is enabled.

ok.

we got back to the original proposal: silently ignore mprotect
failures. i'm still considering the mmap solution for libraries
only: at least then libraries are handled reliably on current
setups, but i will have to think about whether attack targets
are mainly in libraries like libc or in executables.

> 
> In the meantime, we should start (continue) looking at a solution that
> works for both systemd and the kernel and is generic enough for other
> architectures. The stateless nature of the current SECCOMP approach is
> not suitable for this W^X policy. Kees had some suggestions here but the
> thread seems to have died:
> 
> https://lore.kernel.org/kernel-hardening/202010221256.A4F95FD11@keescook/

it sounded like better W^X enforcement won't happen any time soon.
Catalin Marinas Dec. 11, 2020, 5:46 p.m. UTC | #4
On Mon, Dec 07, 2020 at 08:03:38PM +0000, Szabolcs Nagy wrote:
> The 12/03/2020 17:30, Catalin Marinas wrote:
> > On Fri, Nov 27, 2020 at 01:19:16PM +0000, Szabolcs Nagy wrote:
> > > This is v2 of
> > > https://sourceware.org/pipermail/libc-alpha/2020-November/119305.html
> > > 
> > > To enable BTI support, re-mmap executable segments instead of
> > > mprotecting them in case mprotect is seccomp filtered.
> > > 
> > > I would like linux to change to map the main exe with PROT_BTI when
> > > that is marked as BTI compatible. From the linux side i heard the
> > > following concerns about this:
> > > - it's an ABI change so requires some ABI bump. (this is fine with
> > >   me, i think glibc does not care about backward compat as nothing
> > >   can reasonably rely on the current behaviour, but if we have a
> > >   new bit in auxv or similar then we can save one mprotect call.)
> > 
> > I'm not concerned about the ABI change but there are workarounds like a
> > new auxv bit.
> > 
> > > - in case we discover compatibility issues with user binaries it's
> > >   better if userspace can easily disable BTI (e.g. removing the
> > >   mprotect based on some env var, but if kernel adds PROT_BTI and
> > >   mprotect is filtered then we have no reliable way to remove that
> > >   from executables. this problem already exists for static linked
> > >   exes, although admittedly those are less of a compat concern.)
> > 
> > This is our main concern. For static binaries, the linker could detect,
> > in theory, potential issues when linking and not set the corresponding
> > ELF information.
> > 
> > At runtime, a dynamic linker could detect issues and avoid enabling BTI.
> > In both cases, it's a (static or dynamic) linker decision that belongs
> > in user-space.
> 
> note that the marking is tied to an elf module: if the static
> linker can be trusted to produce correct marking then both the
> static and dynamic linking cases work, otherwise neither works.
> (the dynamic linker cannot detect bti issues, just apply user
> supplied policy.)

My assumption is that the dynamic linker may become smarter and detect
BTI issues, if necessary.

Let's say we link together multiple objects, some of them with BTI
instructions, others without. Does the static linker generate a
.note.gnu.property section with GNU_PROPERTY_AARCH64_FEATURE_1_BTI? I
guess not, otherwise the .text section would have a mixture of functions
with and without landing pads.
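
As a rough illustration of why a mixed link does not get the marking: the note type is GNU_PROPERTY_AARCH64_FEATURE_1_AND, and the merged value is the bitwise AND of the inputs, with an object that has no note counting as 0. The function name below is hypothetical; this sketches the rule and is not linker code.

```c
#include <stddef.h>
#include <stdint.h>

#ifndef GNU_PROPERTY_AARCH64_FEATURE_1_BTI
# define GNU_PROPERTY_AARCH64_FEATURE_1_BTI (1U << 0)
#endif

/* Hypothetical helper: combine the FEATURE_1_AND words of all input
   objects.  A feature bit survives only if every input sets it.  */
static uint32_t
merge_feature_1_and (const uint32_t *inputs, size_t n)
{
  if (n == 0)
    return 0;
  uint32_t merged = inputs[0];
  for (size_t i = 1; i < n; i++)
    merged &= inputs[i];
  return merged;
}
```

So a single input without landing pads clears the BTI bit for the whole output.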

In the dynamic linker case, if there are multiple shared objects where
some are missing BTI, I guess the dynamic linker currently invokes
mprotect(PROT_BTI) (or mmap()) on all objects with the corresponding
GNU_PROPERTY.

While I don't immediately see an issue with the dynamic loader always
turning on PROT_BTI based solely on the shared object it is linking in,
the static linker takes a more conservative approach. The dynamic linker
may not have a similar choice in the future if the kernel forced
PROT_BTI on the main executable. In both cases it was a user choice.

The dynamic loader itself is statically linked, so any potential
mismatch would have been detected at build time and the corresponding
GNU_PROPERTY unset.

> 1) if we consider bti part of the semantics of a marked module
> then it should be always on if the system supports it and
> ideally the loader of the module should deal with PROT_BTI.
> (and if the marking is wrong then the binary is wrong.)
> 
> 2) if we consider the marking to be a compatibility indicator
> and let userspace policy decide what to do with it then the
> static exe and vdso cases should be handled by that policy too.

For static exe, we assume that the compatibility was checked at link
time. However, you are right on the vdso, we always turn BTI on. So it
can indeed be argued that the kernel already made the decision for (some
of) the user modules.

> (this makes sense if we expect that there are reasons to turn
> bti off for a process independently of markings. this requires
> the static linking startup code to do the policy decision and
> self-apply PROT_BTI early.)

We currently left this policy decision to the dynamic loader (mostly,
apart from vdso).

> the current code does not fit either case well, but i was
> planning to do (1). and ideally PROT_BTI would be added
> reliably, but a best effort only PROT_BTI works too, however
> it limits our ability to report real mprotect failures.

If we (kernel people) agree to set PROT_BTI on for the main executable,
we can expose a bit (in AT_FLAGS or somewhere) to tell the dynamic
loader that PROT_BTI is already on. I presume subsequent objects will be
mapped with mmap().
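
On the loader side, checking such a bit might look roughly like the sketch below. The flag EXAMPLE_AT_FLAGS_MAIN_EXE_BTI and its value are purely hypothetical, since no such bit is defined; only getauxval and AT_FLAGS are existing interfaces.

```c
#include <stdbool.h>
#include <sys/auxv.h>

/* HYPOTHETICAL bit: no such flag exists, the name and value are
   made up for illustration only.  */
#define EXAMPLE_AT_FLAGS_MAIN_EXE_BTI (1UL << 0)

/* True if the (hypothetical) bit says the kernel already mapped the
   main executable with PROT_BTI.  */
static bool
flags_say_main_exe_has_bti (unsigned long at_flags)
{
  return (at_flags & EXAMPLE_AT_FLAGS_MAIN_EXE_BTI) != 0;
}

/* The loader would only need its own mprotect on the main executable
   when the kernel has not already done the work.  */
static bool
need_mprotect_for_main_exe (void)
{
  return !flags_say_main_exe_has_bti (getauxval (AT_FLAGS));
}
```

Keeping the bit test separate from the getauxval call makes the decision easy to exercise without depending on the running kernel.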

> > > - ideally PROT_BTI would be added via a new syscall that does not
> > >   interfere with PROT_EXEC filtering. (this does not conflict with
> > >   the current patches: even with a new syscall we need a fallback.)
> > 
> > This can be discussed as a long term solution.
> > 
> > > - solve it in systemd (e.g. turn off the filter, use better filter):
> > >   i would prefer not to have aarch64 (or BTI) specific policy in
> > >   user code. and there was no satisfying way to do this portably.
> > 
> > I agree. I think the best for now (as a back-portable glibc fix) is to
> > ignore the mprotect(PROT_EXEC|PROT_BTI) error that the dynamic loader
> > gets. BTI will be disabled if MDWX is enabled.
> 
> ok.
> 
> we got back to the original proposal: silently ignore mprotect
> failures. i'm still considering the mmap solution for libraries
> only: at least then libraries are handled reliably on current
> setups, but i will have to think about whether attack targets
> are mainly in libraries like libc or in executables.

I think ignoring the mprotect() error is the best we can do now. If we
add a kernel patch to turn PROT_BTI on together with an AT_FLAGS bit,
the user mprotect() would no longer be necessary.

In the absence of an AT_FLAGS bit, we could add PROT_BTI on the main exe
and backport the fix to when we first added BTI support. This way the
dynamic loader may just ignore the mprotect() altogether on the main
exe, assuming that people run latest stable kernels.

> > In the meantime, we should start (continue) looking at a solution that
> > works for both systemd and the kernel and is generic enough for other
> > architectures. The stateless nature of the current SECCOMP approach is
> > not suitable for this W^X policy. Kees had some suggestions here but the
> > thread seems to have died:
> >
> > https://lore.kernel.org/kernel-hardening/202010221256.A4F95FD11@keescook/
> 
> it sounded like better W^X enforcement won't happen any time soon.

Unfortunately, I think you are right here.

Anyway, looking for any other input from the kernel and systemd people.
If not, I'll post a patch at 5.11-rc1 turning PROT_BTI on for the main
exe and take it from there. I think such discussion shouldn't disrupt
the glibc fixes/improvements.