x86-64: Restore LD_PREFER_MAP_32BIT_EXEC support [BZ #28656]

Message ID 20220801195150.2160919-1-hjl.tools@gmail.com
State New
Series x86-64: Restore LD_PREFER_MAP_32BIT_EXEC support [BZ #28656]

Commit Message

H.J. Lu Aug. 1, 2022, 7:51 p.m. UTC
Crossing 2GB boundaries with indirect calls and jumps can use more
branch prediction resources on several Intel CPUs.  There is visible
performance improvement on workloads with many PLT calls when executable
and shared libraries are mmapped below 2GB.  Add the Prefer_MAP_32BIT_EXEC
bit so that mmap will try to map executable or denywrite pages with
MAP_32BIT first.

NB: Prefer_MAP_32BIT_EXEC reduces the bits available for address space
layout randomization (ASLR).  It is always disabled for SUID programs
and can only be enabled by setting the environment variable
LD_PREFER_MAP_32BIT_EXEC.
---
 elf/dl-support.c                              |  4 ++
 elf/rtld.c                                    | 20 ++++++++-
 sysdeps/generic/dl-librecon.h                 | 24 +++++++++++
 sysdeps/unix/sysv/linux/x86_64/64/Makefile    | 19 ++++++++
 .../unix/sysv/linux/x86_64/64/dl-librecon.h   | 43 +++++++++++++++++++
 .../unix/sysv/linux/x86_64/64/mmap_internal.h | 43 +++++++++++++++++++
 .../sysv/linux/x86_64/64/tst-map-32bit-mod.c  | 33 ++++++++++++++
 .../unix/sysv/linux/x86_64/64/tst-map-32bit.c | 34 +++++++++++++++
 sysdeps/x86/cpu-tunables.c                    |  7 +++
 ...cpu-features-preferred_feature_index_1.def |  1 +
 10 files changed, 226 insertions(+), 2 deletions(-)
 create mode 100644 sysdeps/generic/dl-librecon.h
 create mode 100644 sysdeps/unix/sysv/linux/x86_64/64/dl-librecon.h
 create mode 100644 sysdeps/unix/sysv/linux/x86_64/64/mmap_internal.h
 create mode 100644 sysdeps/unix/sysv/linux/x86_64/64/tst-map-32bit-mod.c
 create mode 100644 sysdeps/unix/sysv/linux/x86_64/64/tst-map-32bit.c
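
The "try MAP_32BIT first" behaviour described in the commit message can be sketched in user space as follows.  This is a minimal illustration, not the glibc code path: map_prefer_32bit is a hypothetical helper name, and the sketch assumes x86-64 Linux, where <sys/mman.h> defines MAP_32BIT under _GNU_SOURCE.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper (not part of glibc): with no address hint, try
   MAP_32BIT first and fall back to a plain mmap if that fails.  */
static void *
map_prefer_32bit (size_t len, int prot, int flags, int fd, off_t offset)
{
  assert (len > 0);
#ifdef MAP_32BIT
  void *ret = mmap (NULL, len, prot, flags | MAP_32BIT, fd, offset);
  if (ret != MAP_FAILED)
    return ret;
#endif
  /* The low 2GB may be exhausted, or MAP_32BIT may be unavailable on
     this target; retry without the restriction.  */
  return mmap (NULL, len, prot, flags, fd, offset);
}
```

The fallback matters because the low 2GB is a small, shared region: a caller that insisted on MAP_32BIT would fail once that region fills up, whereas retrying without the flag only loses the placement preference.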

Comments

Florian Weimer Aug. 2, 2022, 8 a.m. UTC | #1
* H. J. Lu via Libc-alpha:

> Crossing 2GB boundaries with indirect calls and jumps can use more
> branch prediction resources on several Intel CPUs.  There is visible
> performance improvement on workloads with many PLT calls when executable
> and shared libraries are mmapped below 2GB.  Add the Prefer_MAP_32BIT_EXEC
> bit so that mmap will try to map executable or denywrite pages with
> MAP_32BIT first.
>
> NB: Prefer_MAP_32BIT_EXEC reduces bits available for address space
> layout randomization (ASLR), which is always disabled for SUID programs
> and can only be enabled by setting environment variable,
> LD_PREFER_MAP_32BIT_EXEC.

If the performance benefits are significant, this should be handled at
the kernel level.  Only the kernel can put the main program, ld.so and
the vDSO into the same 2GB window (presumably with the main program at
the top, so that the heap can grow almost indefinitely).

For mapping shared objects, we can give the kernel a hint that they will
eventually contain an executable mapping.  If the kernel could reuse
MAP_DENYWRITE for that, no glibc changes would be needed after all.

Doing this in glibc is only a very partial solution, so I'd
appreciate it if this could be fixed properly in the kernel.

Thanks,
Florian

H.J. Lu Aug. 5, 2022, 9:53 p.m. UTC | #2
On Tue, Aug 2, 2022 at 1:00 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> * H. J. Lu via Libc-alpha:
>
> > Crossing 2GB boundaries with indirect calls and jumps can use more
> > branch prediction resources on several Intel CPUs.  There is visible
> > performance improvement on workloads with many PLT calls when executable
> > and shared libraries are mmapped below 2GB.  Add the Prefer_MAP_32BIT_EXEC
> > bit so that mmap will try to map executable or denywrite pages with
> > MAP_32BIT first.
> >
> > NB: Prefer_MAP_32BIT_EXEC reduces bits available for address space
> > layout randomization (ASLR), which is always disabled for SUID programs
> > and can only be enabled by setting environment variable,
> > LD_PREFER_MAP_32BIT_EXEC.
>
> If the performance benefits are significant, this should be handled at
> the kernel level.  Only the kernel can put the main program, ld.so and
> the vDSO into the same 2GB window (presumably with the main program at
> the top, so that the heap can grow almost indefinitely).

ld.so and vDSO aren't performance sensitive.  But we need to handle PIE.

> For mapping shared objects, we can give the kernel a hint that they will
> eventually contain an executable mapping.  If the kernel could reuse
> MAP_DENYWRITE for that, no glibc changes would be needed after all.
>
> Doing this in glibc is only a very partial solution, so I'd
> appreciate it if this could be fixed properly in the kernel.
>

There is no easy way for kernel to selectively mmap PIE with MAP_32BIT.
Can ld.so re-exec PIE with "ld.so PIE" so that ld.so can mmap PIE with
MAP_32BIT?

Florian Weimer Aug. 8, 2022, 1:29 p.m. UTC | #3
* H. J. Lu:

> On Tue, Aug 2, 2022 at 1:00 AM Florian Weimer <fweimer@redhat.com> wrote:
>>
>> * H. J. Lu via Libc-alpha:
>>
>> > Crossing 2GB boundaries with indirect calls and jumps can use more
>> > branch prediction resources on several Intel CPUs.  There is visible
>> > performance improvement on workloads with many PLT calls when executable
>> > and shared libraries are mmapped below 2GB.  Add the Prefer_MAP_32BIT_EXEC
>> > bit so that mmap will try to map executable or denywrite pages with
>> > MAP_32BIT first.
>> >
>> > NB: Prefer_MAP_32BIT_EXEC reduces bits available for address space
>> > layout randomization (ASLR), which is always disabled for SUID programs
>> > and can only be enabled by setting environment variable,
>> > LD_PREFER_MAP_32BIT_EXEC.
>>
>> If the performance benefits are significant, this should be handled at
>> the kernel level.  Only the kernel can put the main program, ld.so and
>> the vDSO into the same 2GB window (presumably with the main program at
>> the top, so that the heap can grow almost indefinitely).
>
> ld.so and vDSO aren't performance sensitive.  But we need to handle PIE.

I don't think this is necessarily true.  It depends on execution
profile.

clock_gettime in the vDSO could certainly matter to some workloads.

>> For mapping shared objects, we can give the kernel a hint that they will
>> eventually contain an executable mapping.  If the kernel could reuse
>> MAP_DENYWRITE for that, no glibc changes would be needed after all.
>>
>> Doing this in glibc is only a very partial solution, so I'd
>> appreciate it if this could be fixed properly in the kernel.
>>
>
> There is no easy way for kernel to selectively mmap PIE with MAP_32BIT.
> Can ld.so re-exec PIE with "ld.so PIE" so that ld.so can mmap PIE with
> MAP_32BIT?

In theory, yes, but that still leaves the vDSO issue.  The kernel could
cover that as well.

Regarding the performance issue, does everything have to be in the first
2 GiB or 4 GiB, or is it sufficient if everything is in the same
+/- 2 GiB window?

Thanks,
Florian

H.J. Lu Aug. 8, 2022, 5:02 p.m. UTC | #4
On Mon, Aug 8, 2022 at 6:29 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> * H. J. Lu:
>
> > On Tue, Aug 2, 2022 at 1:00 AM Florian Weimer <fweimer@redhat.com> wrote:
> >>
> >> * H. J. Lu via Libc-alpha:
> >>
> >> > Crossing 2GB boundaries with indirect calls and jumps can use more
> >> > branch prediction resources on several Intel CPUs.  There is visible
> >> > performance improvement on workloads with many PLT calls when executable
> >> > and shared libraries are mmapped below 2GB.  Add the Prefer_MAP_32BIT_EXEC
> >> > bit so that mmap will try to map executable or denywrite pages with
> >> > MAP_32BIT first.
> >> >
> >> > NB: Prefer_MAP_32BIT_EXEC reduces bits available for address space
> >> > layout randomization (ASLR), which is always disabled for SUID programs
> >> > and can only be enabled by setting environment variable,
> >> > LD_PREFER_MAP_32BIT_EXEC.
> >>
> >> If the performance benefits are significant, this should be handled at
> >> the kernel level.  Only the kernel can put the main program, ld.so and
> >> the vDSO into the same 2GB window (presumably with the main program at
> >> the top, so that the heap can grow almost indefinitely).
> >
> > ld.so and vDSO aren't performance sensitive.  But we need to handle PIE.
>
> I don't think this is necessarily true.  It depends on execution
> profile.

True.

> clock_gettime in the vDSO could certainly matter to some workloads.
>
> >> For mapping shared objects, we can give the kernel a hint that they will
> >> eventually contain an executable mapping.  If the kernel could reuse
> >> MAP_DENYWRITE for that, no glibc changes would be needed after all.
> >>
> >> Doing this in glibc is only a very partial solution, so I'd
> >> appreciate it if this could be fixed properly in the kernel.
> >>
> >
> > There is no easy way for kernel to selectively mmap PIE with MAP_32BIT.
> > Can ld.so re-exec PIE with "ld.so PIE" so that ld.so can mmap PIE with
> > MAP_32BIT?
>
> In theory, yes, but that still leaves the vDSO issue.  The kernel could
> cover that as well.

Kernel changes may not be easy.  Glibc changes can cover most of the
performance issues.  However, "ld.so PIE" may be difficult to debug.
Is it possible for ld.so to unmap the PIE and remap it with MAP_32BIT?

> Regarding the performance issue, does everything have to be in the first
> 2 GiB or 4 GiB, or is it sufficient if everything is in the same
> +/- 2 GiB window?

This doesn't apply since the issue is with indirect calls and jumps.
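
The two placements contrasted in the question (everything in the first 2 GiB versus everything within 2 GiB of each other) can be written as simple predicates.  This is only a sketch to make the distinction concrete: the helper names are hypothetical, and it makes no claim about which condition the affected CPUs actually depend on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TWO_GIB (UINT64_C (1) << 31)

/* Hypothetical helper: true if ADDR lies in the lowest 2 GiB, the range
   that MAP_32BIT allocates from.  */
static bool
below_2gib (uint64_t addr)
{
  return addr < TWO_GIB;
}

/* Hypothetical helper: true if A and B are less than 2 GiB apart,
   wherever they sit in the address space.  */
static bool
within_2gib_window (uint64_t a, uint64_t b)
{
  uint64_t distance = a > b ? a - b : b - a;
  return distance < TWO_GIB;
}
```

Two objects mapped next to each other high in the address space satisfy the second predicate but not the first, which is the case the question asks about.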

Patch

diff --git a/elf/dl-support.c b/elf/dl-support.c
index 4af0b5b2ce..7160bdfcf1 100644
--- a/elf/dl-support.c
+++ b/elf/dl-support.c
@@ -35,6 +35,7 @@ 
 #include <dl-machine.h>
 #include <libc-lock.h>
 #include <dl-cache.h>
+#include <dl-librecon.h>
 #include <dl-procinfo.h>
 #include <unsecvars.h>
 #include <hp-timing.h>
@@ -300,6 +301,9 @@  _dl_non_dynamic_init (void)
     {
       static const char unsecure_envvars[] =
 	UNSECURE_ENVVARS
+#ifdef EXTRA_UNSECURE_ENVVARS
+	EXTRA_UNSECURE_ENVVARS
+#endif
 	;
       const char *cp = unsecure_envvars;
 
diff --git a/elf/rtld.c b/elf/rtld.c
index cbbaf4a331..2ca8c128a3 100644
--- a/elf/rtld.c
+++ b/elf/rtld.c
@@ -32,6 +32,7 @@ 
 #include <fpu_control.h>
 #include <hp-timing.h>
 #include <libc-lock.h>
+#include <dl-librecon.h>
 #include <unsecvars.h>
 #include <dl-cache.h>
 #include <dl-osinfo.h>
@@ -2681,14 +2682,29 @@  process_envvars (struct dl_main_state *state)
 		= _dl_strtoul (&envline[21], NULL) > 1;
 	    }
 	  break;
+
+	  /* We might have some extra environment variable to handle.  This
+	     is tricky due to the pre-processing of the length of the name
+	     in the switch statement here.  The code here assumes that added
+	     environment variables have a different length.  */
+#ifdef EXTRA_LD_ENVVARS
+	  EXTRA_LD_ENVVARS
+#endif
 	}
     }
 
   /* Extra security for SUID binaries.  Remove all dangerous environment
      variables.  */
-  if (__glibc_unlikely (__libc_enable_secure))
+  if (__builtin_expect (__libc_enable_secure, 0))
     {
-      const char *nextp = UNSECURE_ENVVARS;
+      static const char unsecure_envvars[] =
+#ifdef EXTRA_UNSECURE_ENVVARS
+	EXTRA_UNSECURE_ENVVARS
+#endif
+	UNSECURE_ENVVARS;
+      const char *nextp;
+
+      nextp = unsecure_envvars;
       do
 	{
 	  unsetenv (nextp);
diff --git a/sysdeps/generic/dl-librecon.h b/sysdeps/generic/dl-librecon.h
new file mode 100644
index 0000000000..19fc70cb29
--- /dev/null
+++ b/sysdeps/generic/dl-librecon.h
@@ -0,0 +1,24 @@ 
+/* Optional code to distinguish library flavours.
+   Copyright (C) 1998-2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#ifndef _DL_LIBRECON_H
+#define _DL_LIBRECON_H	1
+
+/* In the general case we don't do anything.  */
+
+#endif /* dl-librecon.h */
diff --git a/sysdeps/unix/sysv/linux/x86_64/64/Makefile b/sysdeps/unix/sysv/linux/x86_64/64/Makefile
index a7b6dc5a53..53a401e35e 100644
--- a/sysdeps/unix/sysv/linux/x86_64/64/Makefile
+++ b/sysdeps/unix/sysv/linux/x86_64/64/Makefile
@@ -1,2 +1,21 @@ 
 # The default ABI is 64.
 default-abi := 64
+
+ifeq ($(subdir),elf)
+
+tests-map-32bit = \
+  tst-map-32bit \
+# tests-map-32bit
+tst-map-32bit-no-pie = yes
+tests += $(tests-map-32bit)
+
+modules-map-32bit = \
+  tst-map-32bit-mod \
+# modules-map-32bit
+modules-names += $(modules-map-32bit)
+
+tst-map-32bit-ENV = LD_PREFER_MAP_32BIT_EXEC=1
+$(objpfx)tst-map-32bit-mod.so: $(libsupport)
+$(objpfx)tst-map-32bit: $(objpfx)tst-map-32bit-mod.so
+
+endif
diff --git a/sysdeps/unix/sysv/linux/x86_64/64/dl-librecon.h b/sysdeps/unix/sysv/linux/x86_64/64/dl-librecon.h
new file mode 100644
index 0000000000..246abccd45
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/x86_64/64/dl-librecon.h
@@ -0,0 +1,43 @@ 
+/* Optional code to distinguish library flavours.  x86-64 version.
+   Copyright (C) 2015-2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#ifndef _DL_LIBRECON_H
+
+/* Recognizing extra environment variables.  For 64-bit applications,
+   branch prediction performance may be negatively impacted when the
+   target of a branch is more than 4GB away from the branch.  Add the
+   Prefer_MAP_32BIT_EXEC bit so that mmap will try to map executable
+   pages with MAP_32BIT first.  NB: MAP_32BIT will map to lower 2GB,
+   not lower 4GB, address.  Prefer_MAP_32BIT_EXEC reduces bits available
+   for address space layout randomization (ASLR).  Prefer_MAP_32BIT_EXEC
+   is always disabled for SUID programs and can be enabled by setting
+   environment variable, LD_PREFER_MAP_32BIT_EXEC.  */
+#define EXTRA_LD_ENVVARS \
+  case 21:								  \
+    if (!__libc_enable_secure						  \
+	&& memcmp (envline, "PREFER_MAP_32BIT_EXEC", 21) == 0)		  \
+      GLRO(dl_x86_cpu_features).preferred[index_arch_Prefer_MAP_32BIT_EXEC] \
+	|= bit_arch_Prefer_MAP_32BIT_EXEC;				  \
+    break;
+
+/* Extra unsecure variables.  The names are all stuffed in a single
+   string which means they have to be terminated with a '\0' explicitly.  */
+#define EXTRA_UNSECURE_ENVVARS \
+  "LD_PREFER_MAP_32BIT_EXEC\0"
+
+#endif /* dl-librecon.h */
diff --git a/sysdeps/unix/sysv/linux/x86_64/64/mmap_internal.h b/sysdeps/unix/sysv/linux/x86_64/64/mmap_internal.h
new file mode 100644
index 0000000000..7a3c18fb7b
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/x86_64/64/mmap_internal.h
@@ -0,0 +1,43 @@ 
+/* Linux mmap system call.  x86-64 version.
+   Copyright (C) 2015-2022 Free Software Foundation, Inc.
+
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public License as
+   published by the Free Software Foundation; either version 2.1 of the
+   License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#ifndef MMAP_X86_64_INTERNAL_H
+#define MMAP_X86_64_INTERNAL_H
+
+#include <ldsodefs.h>
+
+/* If the Prefer_MAP_32BIT_EXEC bit is set, try to map executable or
+    denywrite pages with MAP_32BIT first.  */
+#define MMAP_PREPARE(addr, len, prot, flags, fd, offset)		\
+  if ((addr) == NULL							\
+      && (((prot) & PROT_EXEC) != 0					\
+	  || ((flags) & MAP_DENYWRITE) != 0)				\
+      && HAS_ARCH_FEATURE (Prefer_MAP_32BIT_EXEC))			\
+    {									\
+      void *ret = (void*) INLINE_SYSCALL_CALL (mmap, (addr), (len),	\
+					      (prot),			\
+					      (flags) | MAP_32BIT,	\
+					      (fd), (offset));		\
+      if (ret != MAP_FAILED)						\
+	return ret;							\
+    }
+
+#include_next <mmap_internal.h>
+
+#endif
diff --git a/sysdeps/unix/sysv/linux/x86_64/64/tst-map-32bit-mod.c b/sysdeps/unix/sysv/linux/x86_64/64/tst-map-32bit-mod.c
new file mode 100644
index 0000000000..0e44d17105
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/x86_64/64/tst-map-32bit-mod.c
@@ -0,0 +1,33 @@ 
+/* Check that LD_PREFER_MAP_32BIT_EXEC works in shared library.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <stdio.h>
+#include <stdint.h>
+#include <support/check.h>
+
+static void
+dso_do_test (void)
+{
+}
+
+void
+dso_check_map_32bit (void)
+{
+  printf ("dso_do_test: %p\n", dso_do_test);
+  TEST_VERIFY ((uintptr_t) dso_do_test < 0xffffffffUL);
+}
diff --git a/sysdeps/unix/sysv/linux/x86_64/64/tst-map-32bit.c b/sysdeps/unix/sysv/linux/x86_64/64/tst-map-32bit.c
new file mode 100644
index 0000000000..ed96d8a4d0
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/x86_64/64/tst-map-32bit.c
@@ -0,0 +1,34 @@ 
+/* Check that LD_PREFER_MAP_32BIT_EXEC works in PIE and shared library.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <stdio.h>
+#include <stdint.h>
+#include <support/check.h>
+
+extern void dso_check_map_32bit (void);
+
+static int
+do_test (void)
+{
+  printf ("do_test: %p\n", do_test);
+  TEST_VERIFY ((uintptr_t) do_test < 0xffffffffUL);
+  dso_check_map_32bit ();
+  return 0;
+}
+
+#include <support/test-driver.c>
diff --git a/sysdeps/x86/cpu-tunables.c b/sysdeps/x86/cpu-tunables.c
index 94f4fbf243..8b4dfaa5b3 100644
--- a/sysdeps/x86/cpu-tunables.c
+++ b/sysdeps/x86/cpu-tunables.c
@@ -259,6 +259,13 @@  TUNABLE_CALLBACK (set_hwcaps) (tunable_val_t *valp)
 		 20);
 	    }
 	  break;
+	case 21:
+	    {
+	      CHECK_GLIBC_IFUNC_PREFERRED_BOTH (n, cpu_features,
+						Prefer_MAP_32BIT_EXEC,
+						disable, 21);
+	    }
+	  break;
 	case 23:
 	    {
 	      CHECK_GLIBC_IFUNC_PREFERRED_NEED_BOTH
diff --git a/sysdeps/x86/include/cpu-features-preferred_feature_index_1.def b/sysdeps/x86/include/cpu-features-preferred_feature_index_1.def
index 0e9090e74b..2b2c884be9 100644
--- a/sysdeps/x86/include/cpu-features-preferred_feature_index_1.def
+++ b/sysdeps/x86/include/cpu-features-preferred_feature_index_1.def
@@ -26,6 +26,7 @@  BIT (I586)
 BIT (I686)
 BIT (Slow_SSE4_2)
 BIT (AVX_Fast_Unaligned_Load)
+BIT (Prefer_MAP_32BIT_EXEC)
 BIT (Prefer_No_VZEROUPPER)
 BIT (Prefer_ERMS)
 BIT (Prefer_No_AVX512)
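
The two conventions the patch relies on, dispatching on the length of the variable name before comparing it and packing the unsecure names into one NUL-separated string, can be sketched in isolation.  This is an illustrative sketch rather than the glibc code: is_prefer_map_32bit_exec, count_names and example_unsecure_envvars are hypothetical names, and the list contents are chosen only for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* NUL-separated list in the style of UNSECURE_ENVVARS.  Each name ends
   with an explicit '\0'; the string literal's own terminator then acts
   as the empty name that ends the list.  */
static const char example_unsecure_envvars[] =
  "LD_PREFER_MAP_32BIT_EXEC\0"
  "LD_PRELOAD\0";

/* Hypothetical helper: count the names in such a list.  */
static size_t
count_names (const char *list)
{
  size_t n = 0;
  for (const char *p = list; *p != '\0'; p = strchr (p, '\0') + 1)
    ++n;
  return n;
}

/* Hypothetical stand-in for the length dispatch: NAME is the variable
   name after the "LD_" prefix and LEN is its length up to the '='.  */
static bool
is_prefer_map_32bit_exec (const char *name, size_t len)
{
  switch (len)
    {
    case 21:
      return memcmp (name, "PREFER_MAP_32BIT_EXEC", 21) == 0;
    default:
      return false;
    }
}
```

Switching on the length first means most variables are rejected without any string comparison, which is why the comment in rtld.c notes that added variables are assumed to have a different length from the existing cases.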