Message ID | 20190212194253.1951-2-mathieu.desnoyers@efficios.com |
---|---|
State | New |
Headers | show |
Series | Restartable Sequences support for glibc 2.30 | expand |
On 2/12/19 2:42 PM, Mathieu Desnoyers wrote: > Register rseq(2) TLS for each thread (including main), and unregister > for each thread (excluding main). "rseq" stands for Restartable > Sequences. Thanks, the implementation is looking good, before this goes in it needs some procedural cleanup and the following: (a) I would like to see a final position from Florian, either accept, object (objection recorded), or sustained objection (blocks this patch), on the issue of the registration protocol. I suggested a possible way forward to remove the registration protocol down the line, leaving only the __rseq_lib_abi as the residual artifact once we deprecate coordination with other libraries. Applications would have to be written with this in mind, and coordination still remains with libc for the time being. Perhaps Mathieu has another suggestion. (b) I would like to see a final position from Rich Felker, either accept, object (objection recorded), or sustained objection (blocks this patch), on the patch set as a whole. I trust Rich's opinion as an alternative libc maintainer in this matter and as a independent implementor of similar standards. Rich will have to implement this for musl, and it would be good to have consensus from him. So while I reviewed this patch, and provided nit-picky feedback below, I think we need a whole new thread, to finalize the way forward for the registration process. Next steps: - Start a new thread to talk *only* about options for removing the refcounted registration. > See the rseq(2) man page proposed here: > https://lkml.org/lkml/2018/9/19/647 Thank you for having a man page, that makes for a very easy to follow description of the structures, syscall and use cases. > > This patch is based on glibc-2.29. The rseq(2) system call was merged > into Linux 4.18. > > Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> > CC: Carlos O'Donell <carlos@redhat.com> > CC: Florian Weimer <fweimer@redhat.com> > CC: Joseph Myers <joseph@codesourcery.com> > CC: Szabolcs Nagy <szabolcs.nagy@arm.com> > CC: Thomas Gleixner <tglx@linutronix.de> > CC: Ben Maurer <bmaurer@fb.com> > CC: Peter Zijlstra <peterz@infradead.org> > CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> > CC: Boqun Feng <boqun.feng@gmail.com> > CC: Will Deacon <will.deacon@arm.com> > CC: Dave Watson <davejwatson@fb.com> > CC: Paul Turner <pjt@google.com> > CC: Rich Felker <dalias@libc.org> > CC: libc-alpha@sourceware.org > CC: linux-kernel@vger.kernel.org > CC: linux-api@vger.kernel.org > --- > Changes since v1: > - Move __rseq_refcount to an extra field at the end of __rseq_abi to > eliminate one symbol. > > All libraries/programs which try to register rseq (glibc, > early-adopter applications, early-adopter libraries) should use the > rseq refcount. It becomes part of the ABI within a user-space > process, but it's not part of the ABI shared with the kernel per se. > > - Restructure how this code is organized so glibc keeps building on > non-Linux targets. > > - Use non-weak symbol for __rseq_abi. > > - Move rseq registration/unregistration implementation into its own > nptl/rseq.c compile unit. > > - Move __rseq_abi symbol under GLIBC_2.29. > > Changes since v2: > - Move __rseq_refcount to its own symbol, which is less ugly than > trying to play tricks with the rseq uapi. > - Move __rseq_abi from nptl to csu (C start up), so it can be used > across glibc, including memory allocator and sched_getcpu(). The > __rseq_refcount symbol is kept in nptl, because there is no reason > to use it elsewhere in glibc. > > Changes since v3: > - Set __rseq_refcount TLS to 1 on register/set to 0 on unregister > because glibc is the first/last user. > - Unconditionally register/unregister rseq at thread start/exit, because > glibc is the first/last user. > - Add missing abilist items. > - Rebase on glibc master commit a502c5294. > - Add NEWS entry. > > Changes since v4: > - Do not use "weak" symbols for __rseq_abi and __rseq_refcount. Based on > "System V Application Binary Interface", weak only affects the link > editor, not the dynamic linker. > - Install a new sys/rseq.h system header on Linux, which contains the > RSEQ_SIG definition, __rseq_abi declaration and __rseq_refcount > declaration. Move those definition/declarations from rseq-internal.h > to the installed sys/rseq.h header. > - Considering that rseq is only available on Linux, move csu/rseq.c to > sysdeps/unix/sysv/linux/rseq-sym.c. > - Move __rseq_refcount from nptl/rseq.c to > sysdeps/unix/sysv/linux/rseq-sym.c, so it is only defined on Linux. > - Move both ABI definitions for __rseq_abi and __rseq_refcount to > sysdeps/unix/sysv/linux/Versions, so they only appear on Linux. > - Document __rseq_abi and __rseq_refcount volatile. > - Document the RSEQ_SIG signature define. > - Move registration functions from rseq.c to rseq-internal.h static > inline functions. Introduce empty stubs in misc/rseq-internal.h, > which can be overridden by architecture code in > sysdeps/unix/sysv/linux/rseq-internal.h. > - Rename __rseq_register_current_thread and __rseq_unregister_current_thread > to rseq_register_current_thread and rseq_unregister_current_thread, > now that those are only visible as internal static inline functions. > - Invoke rseq_register_current_thread() from libc-start.c LIBC_START_MAIN > rather than nptl init, so applications not linked against > libpthread.so have rseq registered for their main() thread. Note that > it is invoked separately for SHARED and !SHARED builds. > > Changes since v5: > - Replace __rseq_refcount by __rseq_lib_abi, which contains two > uint32_t: register_state and refcount. The "register_state" field > allows inhibiting rseq registration from signal handlers nested on top > of glibc registration and occuring after rseq unregistration by glibc. > - Introduce enum rseq_register_state, which contains the states allowed > for the struct rseq_lib_abi register_state field. > > Changes since v6: > - Introduce bits/rseq.h to define RSEQ_SIG for each architecture. > The generic bits/rseq.h does not define RSEQ_SIG, meaning that each > architecture implementing rseq needs to implement bits/rseq.h. > - Rename enum item RSEQ_REGISTER_NESTED to RSEQ_REGISTER_ONGOING. > - Port to glibc-2.29. > --- > NEWS | 11 +++ > csu/libc-start.c | 12 ++- > misc/Makefile | 3 +- > misc/rseq-internal.h | 34 +++++++ > nptl/pthread_create.c | 9 ++ > sysdeps/unix/sysv/linux/Makefile | 4 +- > sysdeps/unix/sysv/linux/Versions | 4 + > sysdeps/unix/sysv/linux/aarch64/bits/rseq.h | 24 +++++ > sysdeps/unix/sysv/linux/aarch64/libc.abilist | 2 + > sysdeps/unix/sysv/linux/alpha/libc.abilist | 2 + > sysdeps/unix/sysv/linux/arm/bits/rseq.h | 24 +++++ > sysdeps/unix/sysv/linux/arm/libc.abilist | 2 + > sysdeps/unix/sysv/linux/bits/rseq.h | 24 +++++ > sysdeps/unix/sysv/linux/hppa/libc.abilist | 2 + > sysdeps/unix/sysv/linux/i386/libc.abilist | 2 + > sysdeps/unix/sysv/linux/ia64/libc.abilist | 2 + > .../sysv/linux/m68k/coldfire/libc.abilist | 2 + > .../unix/sysv/linux/m68k/m680x0/libc.abilist | 2 + > .../unix/sysv/linux/microblaze/libc.abilist | 2 + > sysdeps/unix/sysv/linux/mips/bits/rseq.h | 24 +++++ > .../sysv/linux/mips/mips32/fpu/libc.abilist | 2 + > .../sysv/linux/mips/mips32/nofpu/libc.abilist | 2 + > .../sysv/linux/mips/mips64/n32/libc.abilist | 2 + > .../sysv/linux/mips/mips64/n64/libc.abilist | 2 + > sysdeps/unix/sysv/linux/nios2/libc.abilist | 2 + > sysdeps/unix/sysv/linux/powerpc/bits/rseq.h | 24 +++++ > .../linux/powerpc/powerpc32/fpu/libc.abilist | 2 + > .../powerpc/powerpc32/nofpu/libc.abilist | 2 + > .../linux/powerpc/powerpc64/be/libc.abilist | 2 + > .../linux/powerpc/powerpc64/le/libc.abilist | 2 + > .../unix/sysv/linux/riscv/rv64/libc.abilist | 2 + > sysdeps/unix/sysv/linux/rseq-internal.h | 91 +++++++++++++++++++ > sysdeps/unix/sysv/linux/rseq-sym.c | 54 +++++++++++ > sysdeps/unix/sysv/linux/s390/bits/rseq.h | 24 +++++ > .../unix/sysv/linux/s390/s390-32/libc.abilist | 2 + > .../unix/sysv/linux/s390/s390-64/libc.abilist | 2 + > sysdeps/unix/sysv/linux/sh/libc.abilist | 2 + > .../sysv/linux/sparc/sparc32/libc.abilist | 2 + > .../sysv/linux/sparc/sparc64/libc.abilist | 2 + > sysdeps/unix/sysv/linux/sys/rseq.h | 65 +++++++++++++ > sysdeps/unix/sysv/linux/x86/bits/rseq.h | 24 +++++ > .../unix/sysv/linux/x86_64/64/libc.abilist | 2 + > .../unix/sysv/linux/x86_64/x32/libc.abilist | 2 + > 43 files changed, 501 insertions(+), 6 deletions(-) > create mode 100644 misc/rseq-internal.h > create mode 100644 sysdeps/unix/sysv/linux/aarch64/bits/rseq.h > create mode 100644 sysdeps/unix/sysv/linux/arm/bits/rseq.h > create mode 100644 sysdeps/unix/sysv/linux/bits/rseq.h > create mode 100644 sysdeps/unix/sysv/linux/mips/bits/rseq.h > create mode 100644 sysdeps/unix/sysv/linux/powerpc/bits/rseq.h > create mode 100644 sysdeps/unix/sysv/linux/rseq-internal.h > create mode 100644 sysdeps/unix/sysv/linux/rseq-sym.c > create mode 100644 sysdeps/unix/sysv/linux/s390/bits/rseq.h > create mode 100644 sysdeps/unix/sysv/linux/sys/rseq.h > create mode 100644 sysdeps/unix/sysv/linux/x86/bits/rseq.h > > diff --git a/NEWS b/NEWS > index 912a9bdc0f..0608c60f7d 100644 > --- a/NEWS > +++ b/NEWS > @@ -5,6 +5,17 @@ See the end for copying conditions. > Please send GNU C library bug reports via <https://sourceware.org/bugzilla/> > using `glibc' in the "product" field. > > +Version 2.30 > + > +Major new features: > + > +* Support for automatically registering threads with the Linux rseq(2) > + system call has been added. This system call is implemented starting > + from Linux 4.18. In order to be activated, it requires that glibc is built > + against kernel headers that include this system call, and that glibc > + detects availability of that system call at runtime. What benefit does the feature have for users? Can you talk about that please? Why would a user want to use it. Please feel free to link to an external reference. > + > + > Version 2.29 > > Major new features: > diff --git a/csu/libc-start.c b/csu/libc-start.c > index 5d9c3675fa..8680afc0ef 100644 > --- a/csu/libc-start.c > +++ b/csu/libc-start.c > @@ -22,6 +22,7 @@ > #include <ldsodefs.h> > #include <exit-thread.h> > #include <libc-internal.h> > +#include <rseq-internal.h> OK. > > #include <elf/dl-tunables.h> > > @@ -140,7 +141,10 @@ LIBC_START_MAIN (int (*main) (int, char **, char ** MAIN_AUXVEC_DECL), > > __libc_multiple_libcs = &_dl_starting_up && !_dl_starting_up; > > -#ifndef SHARED > +#ifdef SHARED > + /* Register rseq ABI to the kernel. */ > + (void) rseq_register_current_thread (); OK, early registration of main thread in shared case. > +#else > _dl_relocate_static_pie (); > > char **ev = &argv[argc + 1]; > @@ -218,6 +222,9 @@ LIBC_START_MAIN (int (*main) (int, char **, char ** MAIN_AUXVEC_DECL), > } > # endif > > + /* Register rseq ABI to the kernel. */ > + (void) rseq_register_current_thread (); OK, early registration of main thread in static case. > + > /* Initialize libpthread if linked in. */ > if (__pthread_initialize_minimal != NULL) > __pthread_initialize_minimal (); > @@ -230,8 +237,7 @@ LIBC_START_MAIN (int (*main) (int, char **, char ** MAIN_AUXVEC_DECL), > # else > __pointer_chk_guard_local = pointer_chk_guard; > # endif > - > -#endif /* !SHARED */ > +#endif OK. > > /* Register the destructor of the dynamic linker if there is any. */ > if (__glibc_likely (rtld_fini != NULL)) > diff --git a/misc/Makefile b/misc/Makefile > index cf0daa1161..0ae1dbaf80 100644 > --- a/misc/Makefile > +++ b/misc/Makefile > @@ -36,7 +36,8 @@ headers := sys/uio.h bits/uio-ext.h bits/uio_lim.h \ > syslog.h sys/syslog.h \ > bits/syslog.h bits/syslog-ldbl.h bits/syslog-path.h bits/error.h \ > bits/select2.h bits/hwcap.h sys/auxv.h \ > - sys/sysmacros.h bits/sysmacros.h bits/types/struct_iovec.h > + sys/sysmacros.h bits/sysmacros.h bits/types/struct_iovec.h \ > + rseq-internal.h OK. > > routines := brk sbrk sstk ioctl \ > readv writev preadv preadv64 pwritev pwritev64 \ > diff --git a/misc/rseq-internal.h b/misc/rseq-internal.h > new file mode 100644 > index 0000000000..915122e4bf > --- /dev/null > +++ b/misc/rseq-internal.h > @@ -0,0 +1,34 @@ > +/* Copyright (C) 2018 Free Software Foundation, Inc. Add a one line description before copyright which describes the file. Is it a stub file. Is it the Linux implementation. etc. Please add a one sentence first line short description for all new files you create. > + This file is part of the GNU C Library. > + Contributed by Mathieu Desnoyers <mathieu.desnoyers@efficios.com>, 2018. We don't use "contributed by" markers anymore, we let the git commit author carry that information, along with git blame information. Please remove this contributed by line and all subsequent such lines. > + > + The GNU C Library is free software; you can redistribute it and/or > + modify it under the terms of the GNU Lesser General Public > + License as published by the Free Software Foundation; either > + version 2.1 of the License, or (at your option) any later version. > + > + The GNU C Library is distributed in the hope that it will be useful, > + but WITHOUT ANY WARRANTY; without even the implied warranty of > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > + Lesser General Public License for more details. > + > + You should have received a copy of the GNU Lesser General Public > + License along with the GNU C Library; if not, see > + <http://www.gnu.org/licenses/>. */ > + > +#ifndef RSEQ_INTERNAL_H > +#define RSEQ_INTERNAL_H > + > +static inline int > +rseq_register_current_thread (void) > +{ > + return -1; > +} > + > +static inline int > +rseq_unregister_current_thread (void) > +{ > + return -1; > +} > + OK. > +#endif /* rseq-internal.h */ > diff --git a/nptl/pthread_create.c b/nptl/pthread_create.c > index 2bd2b10727..90b3419390 100644 > --- a/nptl/pthread_create.c > +++ b/nptl/pthread_create.c > @@ -33,6 +33,7 @@ > #include <default-sched.h> > #include <futex-internal.h> > #include <tls-setup.h> > +#include <rseq-internal.h> Ok. > #include "libioP.h" > > #include <shlib-compat.h> > @@ -378,6 +379,7 @@ __free_tcb (struct pthread *pd) > START_THREAD_DEFN > { > struct pthread *pd = START_THREAD_SELF; > + bool has_rseq = false; OK. > > #if HP_TIMING_AVAIL > /* Remember the time when the thread was started. */ > @@ -396,6 +398,9 @@ START_THREAD_DEFN > if (__glibc_unlikely (atomic_exchange_acq (&pd->setxid_futex, 0) == -2)) > futex_wake (&pd->setxid_futex, 1, FUTEX_PRIVATE); > > + /* Register rseq TLS to the kernel. */ > + has_rseq = !rseq_register_current_thread (); OK, initialize rseq after enabling setxid but before setting up roubust list, no particularly reason it should be right here that I can see, that's OK though. > + > #ifdef __NR_set_robust_list > # ifndef __ASSUME_SET_ROBUST_LIST > if (__set_robust_list_avail >= 0) > @@ -573,6 +578,10 @@ START_THREAD_DEFN > } > #endif > > + /* Unregister rseq TLS from kernel. */ > + if (has_rseq && rseq_unregister_current_thread ()) > + abort(); OK. Calling abort() seems harsh because it kills the whole process, but I'm OK with failing catastrophically if that's what we want here. It seems unrecoverable. > + > advise_stack_range (pd->stackblock, pd->stackblock_size, (uintptr_t) pd, > pd->guardsize); > > diff --git a/sysdeps/unix/sysv/linux/Makefile b/sysdeps/unix/sysv/linux/Makefile > index 5f8c2c7c7d..5b541469ec 100644 > --- a/sysdeps/unix/sysv/linux/Makefile > +++ b/sysdeps/unix/sysv/linux/Makefile > @@ -1,5 +1,5 @@ > ifeq ($(subdir),csu) > -sysdep_routines += errno-loc > +sysdep_routines += errno-loc rseq-sym OK. > endif > > ifeq ($(subdir),assert) > @@ -48,7 +48,7 @@ sysdep_headers += sys/mount.h sys/acct.h sys/sysctl.h \ > bits/termios-c_iflag.h bits/termios-c_oflag.h \ > bits/termios-baud.h bits/termios-c_cflag.h \ > bits/termios-c_lflag.h bits/termios-tcflow.h \ > - bits/termios-misc.h > + bits/termios-misc.h sys/rseq.h bits/rseq.h OK. > > tests += tst-clone tst-clone2 tst-clone3 tst-fanotify tst-personality \ > tst-quota tst-sync_file_range tst-sysconf-iov_max tst-ttyname \ > diff --git a/sysdeps/unix/sysv/linux/Versions b/sysdeps/unix/sysv/linux/Versions > index f1e12d9c69..ad88c2b7ff 100644 > --- a/sysdeps/unix/sysv/linux/Versions > +++ b/sysdeps/unix/sysv/linux/Versions > @@ -174,6 +174,10 @@ libc { > GLIBC_2.29 { > getcpu; > } > + GLIBC_2.30 { > + __rseq_abi; > + __rseq_lib_abi; > + } OK, good using GLIBC_2.30 for master. > GLIBC_PRIVATE { > # functions used in other libraries > __syscall_rt_sigqueueinfo; > diff --git a/sysdeps/unix/sysv/linux/aarch64/bits/rseq.h b/sysdeps/unix/sysv/linux/aarch64/bits/rseq.h > new file mode 100644 > index 0000000000..543bc5388a > --- /dev/null > +++ b/sysdeps/unix/sysv/linux/aarch64/bits/rseq.h > @@ -0,0 +1,24 @@ Needs a one sentence description. > +/* Copyright (C) 2019 Free Software Foundation, Inc. > + This file is part of the GNU C Library. > + Contributed by Mathieu Desnoyers <mathieu.desnoyers@efficios.com>, 2019. > + > + The GNU C Library is free software; you can redistribute it and/or > + modify it under the terms of the GNU Lesser General Public > + License as published by the Free Software Foundation; either > + version 2.1 of the License, or (at your option) any later version. > + > + The GNU C Library is distributed in the hope that it will be useful, > + but WITHOUT ANY WARRANTY; without even the implied warranty of > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > + Lesser General Public License for more details. > + > + You should have received a copy of the GNU Lesser General Public > + License along with the GNU C Library; if not, see > + <http://www.gnu.org/licenses/>. */ > + > +#ifndef _SYS_RSEQ_H > +# error "Never use <bits/rseq.h> directly; include <sys/rseq.h> instead." > +#endif > + > +/* Signature required before each abort handler code. */ > +#define RSEQ_SIG 0xd428bc00 /* BRK #0x45E0. */ OK. > diff --git a/sysdeps/unix/sysv/linux/aarch64/libc.abilist b/sysdeps/unix/sysv/linux/aarch64/libc.abilist > index 9c330f325e..bc937f585d 100644 > --- a/sysdeps/unix/sysv/linux/aarch64/libc.abilist > +++ b/sysdeps/unix/sysv/linux/aarch64/libc.abilist > @@ -2141,3 +2141,5 @@ GLIBC_2.28 thrd_yield F > GLIBC_2.29 getcpu F > GLIBC_2.29 posix_spawn_file_actions_addchdir_np F > GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F > +GLIBC_2.30 __rseq_abi T 0x20 > +GLIBC_2.30 __rseq_lib_abi T 0x8 OK. > diff --git a/sysdeps/unix/sysv/linux/alpha/libc.abilist b/sysdeps/unix/sysv/linux/alpha/libc.abilist > index f630fa4c6f..89cc8b1cfb 100644 > --- a/sysdeps/unix/sysv/linux/alpha/libc.abilist > +++ b/sysdeps/unix/sysv/linux/alpha/libc.abilist > @@ -2036,6 +2036,8 @@ GLIBC_2.28 thrd_yield F > GLIBC_2.29 getcpu F > GLIBC_2.29 posix_spawn_file_actions_addchdir_np F > GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F > +GLIBC_2.30 __rseq_abi T 0x20 > +GLIBC_2.30 __rseq_lib_abi T 0x8 OK. > GLIBC_2.3 __ctype_b_loc F > GLIBC_2.3 __ctype_tolower_loc F > GLIBC_2.3 __ctype_toupper_loc F > diff --git a/sysdeps/unix/sysv/linux/arm/bits/rseq.h b/sysdeps/unix/sysv/linux/arm/bits/rseq.h > new file mode 100644 > index 0000000000..19d3755837 > --- /dev/null > +++ b/sysdeps/unix/sysv/linux/arm/bits/rseq.h > @@ -0,0 +1,24 @@ Needs a one sentence description. > +/* Copyright (C) 2019 Free Software Foundation, Inc. > + This file is part of the GNU C Library. > + Contributed by Mathieu Desnoyers <mathieu.desnoyers@efficios.com>, 2019. > + > + The GNU C Library is free software; you can redistribute it and/or > + modify it under the terms of the GNU Lesser General Public > + License as published by the Free Software Foundation; either > + version 2.1 of the License, or (at your option) any later version. > + > + The GNU C Library is distributed in the hope that it will be useful, > + but WITHOUT ANY WARRANTY; without even the implied warranty of > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > + Lesser General Public License for more details. > + > + You should have received a copy of the GNU Lesser General Public > + License along with the GNU C Library; if not, see > + <http://www.gnu.org/licenses/>. */ > + > +#ifndef _SYS_RSEQ_H > +# error "Never use <bits/rseq.h> directly; include <sys/rseq.h> instead." > +#endif > + > +/* Signature required before each abort handler code. */ > +#define RSEQ_SIG 0x53053053 Why isn't this an arm specific op code? Does the user have to mark this up as part of a constant pool when placing it in front of the abort handler to avoid disassembling the constant as code? This was at one point required to get gdb to work properly. > diff --git a/sysdeps/unix/sysv/linux/arm/libc.abilist b/sysdeps/unix/sysv/linux/arm/libc.abilist > index b96f45590f..e5055f2d4e 100644 > --- a/sysdeps/unix/sysv/linux/arm/libc.abilist > +++ b/sysdeps/unix/sysv/linux/arm/libc.abilist > @@ -126,6 +126,8 @@ GLIBC_2.28 thrd_yield F > GLIBC_2.29 getcpu F > GLIBC_2.29 posix_spawn_file_actions_addchdir_np F > GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F > +GLIBC_2.30 __rseq_abi T 0x20 > +GLIBC_2.30 __rseq_lib_abi T 0x8 OK. > GLIBC_2.4 _Exit F > GLIBC_2.4 _IO_2_1_stderr_ D 0xa0 > GLIBC_2.4 _IO_2_1_stdin_ D 0xa0 > diff --git a/sysdeps/unix/sysv/linux/bits/rseq.h b/sysdeps/unix/sysv/linux/bits/rseq.h > new file mode 100644 > index 0000000000..d60f02cfeb > --- /dev/null > +++ b/sysdeps/unix/sysv/linux/bits/rseq.h > @@ -0,0 +1,24 @@ > +/* Copyright (C) 2019 Free Software Foundation, Inc. Needs a one sentence description. > + This file is part of the GNU C Library. > + Contributed by Mathieu Desnoyers <mathieu.desnoyers@efficios.com>, 2019. > + > + The GNU C Library is free software; you can redistribute it and/or > + modify it under the terms of the GNU Lesser General Public > + License as published by the Free Software Foundation; either > + version 2.1 of the License, or (at your option) any later version. > + > + The GNU C Library is distributed in the hope that it will be useful, > + but WITHOUT ANY WARRANTY; without even the implied warranty of > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > + Lesser General Public License for more details. > + > + You should have received a copy of the GNU Lesser General Public > + License along with the GNU C Library; if not, see > + <http://www.gnu.org/licenses/>. */ > + > +#ifndef _SYS_RSEQ_H > +# error "Never use <bits/rseq.h> directly; include <sys/rseq.h> instead." > +#endif > + > +/* Each architecture supporting rseq should define RSEQ_SIG as a 32-bit > + signature inserted before each rseq abort label in the code section. */ Needs a huge explanation about RSEQ_SIG and the reasons why it's per-arch. Basically cut-and-paste what you wrote to libc-alpha about this. > diff --git a/sysdeps/unix/sysv/linux/hppa/libc.abilist b/sysdeps/unix/sysv/linux/hppa/libc.abilist > index 088a8ee369..546d073cdb 100644 > --- a/sysdeps/unix/sysv/linux/hppa/libc.abilist > +++ b/sysdeps/unix/sysv/linux/hppa/libc.abilist > @@ -1883,6 +1883,8 @@ GLIBC_2.28 thrd_yield F > GLIBC_2.29 getcpu F > GLIBC_2.29 posix_spawn_file_actions_addchdir_np F > GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F > +GLIBC_2.30 __rseq_abi T 0x20 > +GLIBC_2.30 __rseq_lib_abi T 0x8 OK. > GLIBC_2.3 __ctype_b_loc F > GLIBC_2.3 __ctype_tolower_loc F > GLIBC_2.3 __ctype_toupper_loc F > diff --git a/sysdeps/unix/sysv/linux/i386/libc.abilist b/sysdeps/unix/sysv/linux/i386/libc.abilist > index f7ff2c57b9..ac1de6e4b3 100644 > --- a/sysdeps/unix/sysv/linux/i386/libc.abilist > +++ b/sysdeps/unix/sysv/linux/i386/libc.abilist > @@ -2048,6 +2048,8 @@ GLIBC_2.28 thrd_yield F > GLIBC_2.29 getcpu F > GLIBC_2.29 posix_spawn_file_actions_addchdir_np F > GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F > +GLIBC_2.30 __rseq_abi T 0x20 > +GLIBC_2.30 __rseq_lib_abi T 0x8 OK. > GLIBC_2.3 __ctype_b_loc F > GLIBC_2.3 __ctype_tolower_loc F > GLIBC_2.3 __ctype_toupper_loc F > diff --git a/sysdeps/unix/sysv/linux/ia64/libc.abilist b/sysdeps/unix/sysv/linux/ia64/libc.abilist > index becd8b1033..cc3445b958 100644 > --- a/sysdeps/unix/sysv/linux/ia64/libc.abilist > +++ b/sysdeps/unix/sysv/linux/ia64/libc.abilist > @@ -1917,6 +1917,8 @@ GLIBC_2.28 thrd_yield F > GLIBC_2.29 getcpu F > GLIBC_2.29 posix_spawn_file_actions_addchdir_np F > GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F > +GLIBC_2.30 __rseq_abi T 0x20 > +GLIBC_2.30 __rseq_lib_abi T 0x8 OK. > GLIBC_2.3 __ctype_b_loc F > GLIBC_2.3 __ctype_tolower_loc F > GLIBC_2.3 __ctype_toupper_loc F > diff --git a/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist b/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist > index 74e42a5209..f7e28bd5a0 100644 > --- a/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist > +++ b/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist > @@ -127,6 +127,8 @@ GLIBC_2.28 thrd_yield F > GLIBC_2.29 getcpu F > GLIBC_2.29 posix_spawn_file_actions_addchdir_np F > GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F > +GLIBC_2.30 __rseq_abi T 0x20 > +GLIBC_2.30 __rseq_lib_abi T 0x8 OK. > GLIBC_2.4 _Exit F > GLIBC_2.4 _IO_2_1_stderr_ D 0x98 > GLIBC_2.4 _IO_2_1_stdin_ D 0x98 > diff --git a/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist b/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist > index 4af5a74e8a..b8f00f6111 100644 > --- a/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist > +++ b/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist > @@ -1992,6 +1992,8 @@ GLIBC_2.28 thrd_yield F > GLIBC_2.29 getcpu F > GLIBC_2.29 posix_spawn_file_actions_addchdir_np F > GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F > +GLIBC_2.30 __rseq_abi T 0x20 > +GLIBC_2.30 __rseq_lib_abi T 0x8 OK. > GLIBC_2.3 __ctype_b_loc F > GLIBC_2.3 __ctype_tolower_loc F > GLIBC_2.3 __ctype_toupper_loc F > diff --git a/sysdeps/unix/sysv/linux/microblaze/libc.abilist b/sysdeps/unix/sysv/linux/microblaze/libc.abilist > index ccef673fd2..19f191434f 100644 > --- a/sysdeps/unix/sysv/linux/microblaze/libc.abilist > +++ b/sysdeps/unix/sysv/linux/microblaze/libc.abilist > @@ -2133,3 +2133,5 @@ GLIBC_2.28 thrd_yield F > GLIBC_2.29 getcpu F > GLIBC_2.29 posix_spawn_file_actions_addchdir_np F > GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F > +GLIBC_2.30 __rseq_abi T 0x20 > +GLIBC_2.30 __rseq_lib_abi T 0x8 OK. > diff --git a/sysdeps/unix/sysv/linux/mips/bits/rseq.h b/sysdeps/unix/sysv/linux/mips/bits/rseq.h > new file mode 100644 > index 0000000000..19d3755837 > --- /dev/null > +++ b/sysdeps/unix/sysv/linux/mips/bits/rseq.h > @@ -0,0 +1,24 @@ > +/* Copyright (C) 2019 Free Software Foundation, Inc. Needs a one sentence description. > + This file is part of the GNU C Library. > + Contributed by Mathieu Desnoyers <mathieu.desnoyers@efficios.com>, 2019. > + > + The GNU C Library is free software; you can redistribute it and/or > + modify it under the terms of the GNU Lesser General Public > + License as published by the Free Software Foundation; either > + version 2.1 of the License, or (at your option) any later version. > + > + The GNU C Library is distributed in the hope that it will be useful, > + but WITHOUT ANY WARRANTY; without even the implied warranty of > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > + Lesser General Public License for more details. > + > + You should have received a copy of the GNU Lesser General Public > + License along with the GNU C Library; if not, see > + <http://www.gnu.org/licenses/>. */ > + > +#ifndef _SYS_RSEQ_H > +# error "Never use <bits/rseq.h> directly; include <sys/rseq.h> instead." > +#endif > + > +/* Signature required before each abort handler code. */ > +#define RSEQ_SIG 0x53053053 Why isn't this a mips-specific op code? > diff --git a/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist > index 1054bb599e..fe43507f55 100644 > --- a/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist > +++ b/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist > @@ -1970,6 +1970,8 @@ GLIBC_2.28 thrd_yield F > GLIBC_2.29 getcpu F > GLIBC_2.29 posix_spawn_file_actions_addchdir_np F > GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F > +GLIBC_2.30 __rseq_abi T 0x20 > +GLIBC_2.30 __rseq_lib_abi T 0x8 OK. > GLIBC_2.3 __ctype_b_loc F > GLIBC_2.3 __ctype_tolower_loc F > GLIBC_2.3 __ctype_toupper_loc F > diff --git a/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist > index 4f5b5ffebf..b247c6ea9b 100644 > --- a/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist > +++ b/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist > @@ -1968,6 +1968,8 @@ GLIBC_2.28 thrd_yield F > GLIBC_2.29 getcpu F > GLIBC_2.29 posix_spawn_file_actions_addchdir_np F > GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F > +GLIBC_2.30 __rseq_abi T 0x20 > +GLIBC_2.30 __rseq_lib_abi T 0x8 OK. > GLIBC_2.3 __ctype_b_loc F > GLIBC_2.3 __ctype_tolower_loc F > GLIBC_2.3 __ctype_toupper_loc F > diff --git a/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist > index 943aee58d4..5339ca52b6 100644 > --- a/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist > +++ b/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist > @@ -1976,6 +1976,8 @@ GLIBC_2.28 thrd_yield F > GLIBC_2.29 getcpu F > GLIBC_2.29 posix_spawn_file_actions_addchdir_np F > GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F > +GLIBC_2.30 __rseq_abi T 0x20 > +GLIBC_2.30 __rseq_lib_abi T 0x8 OK. > GLIBC_2.3 __ctype_b_loc F > GLIBC_2.3 __ctype_tolower_loc F > GLIBC_2.3 __ctype_toupper_loc F > diff --git a/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist > index 17a5d17ef9..11f24eb440 100644 > --- a/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist > +++ b/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist > @@ -1971,6 +1971,8 @@ GLIBC_2.28 thrd_yield F > GLIBC_2.29 getcpu F > GLIBC_2.29 posix_spawn_file_actions_addchdir_np F > GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F > +GLIBC_2.30 __rseq_abi T 0x20 > +GLIBC_2.30 __rseq_lib_abi T 0x8 OK. > GLIBC_2.3 __ctype_b_loc F > GLIBC_2.3 __ctype_tolower_loc F > GLIBC_2.3 __ctype_toupper_loc F > diff --git a/sysdeps/unix/sysv/linux/nios2/libc.abilist b/sysdeps/unix/sysv/linux/nios2/libc.abilist > index 4d62a540fd..fd223bfc44 100644 > --- a/sysdeps/unix/sysv/linux/nios2/libc.abilist > +++ b/sysdeps/unix/sysv/linux/nios2/libc.abilist > @@ -2174,3 +2174,5 @@ GLIBC_2.28 thrd_yield F > GLIBC_2.29 getcpu F > GLIBC_2.29 posix_spawn_file_actions_addchdir_np F > GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F > +GLIBC_2.30 __rseq_abi T 0x20 > +GLIBC_2.30 __rseq_lib_abi T 0x8 OK. > diff --git a/sysdeps/unix/sysv/linux/powerpc/bits/rseq.h b/sysdeps/unix/sysv/linux/powerpc/bits/rseq.h > new file mode 100644 > index 0000000000..19d3755837 > --- /dev/null > +++ b/sysdeps/unix/sysv/linux/powerpc/bits/rseq.h > @@ -0,0 +1,24 @@ > +/* Copyright (C) 2019 Free Software Foundation, Inc. Needs a one sentence description. > + This file is part of the GNU C Library. > + Contributed by Mathieu Desnoyers <mathieu.desnoyers@efficios.com>, 2019. > + > + The GNU C Library is free software; you can redistribute it and/or > + modify it under the terms of the GNU Lesser General Public > + License as published by the Free Software Foundation; either > + version 2.1 of the License, or (at your option) any later version. > + > + The GNU C Library is distributed in the hope that it will be useful, > + but WITHOUT ANY WARRANTY; without even the implied warranty of > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > + Lesser General Public License for more details. > + > + You should have received a copy of the GNU Lesser General Public > + License along with the GNU C Library; if not, see > + <http://www.gnu.org/licenses/>. */ > + > +#ifndef _SYS_RSEQ_H > +# error "Never use <bits/rseq.h> directly; include <sys/rseq.h> instead." > +#endif > + > +/* Signature required before each abort handler code. */ > +#define RSEQ_SIG 0x53053053 Why isn't this an opcode specific to power? > diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist > index ecc2d6fa13..cc53178e81 100644 > --- a/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist > +++ b/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist > @@ -1996,6 +1996,8 @@ GLIBC_2.28 thrd_yield F > GLIBC_2.29 getcpu F > GLIBC_2.29 posix_spawn_file_actions_addchdir_np F > GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F > +GLIBC_2.30 __rseq_abi T 0x20 > +GLIBC_2.30 __rseq_lib_abi T 0x8 OK. > GLIBC_2.3 __ctype_b_loc F > GLIBC_2.3 __ctype_tolower_loc F > GLIBC_2.3 __ctype_toupper_loc F > diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist > index f5830f9c33..2de3134bc7 100644 > --- a/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist > +++ b/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist > @@ -2000,6 +2000,8 @@ GLIBC_2.28 thrd_yield F > GLIBC_2.29 getcpu F > GLIBC_2.29 posix_spawn_file_actions_addchdir_np F > GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F > +GLIBC_2.30 __rseq_abi T 0x20 > +GLIBC_2.30 __rseq_lib_abi T 0x8 OK. > GLIBC_2.3 __ctype_b_loc F > GLIBC_2.3 __ctype_tolower_loc F > GLIBC_2.3 __ctype_toupper_loc F > diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist > index 633d8f4792..aae3def700 100644 > --- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist > +++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist > @@ -126,6 +126,8 @@ GLIBC_2.28 thrd_yield F > GLIBC_2.29 getcpu F > GLIBC_2.29 posix_spawn_file_actions_addchdir_np F > GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F > +GLIBC_2.30 __rseq_abi T 0x20 > +GLIBC_2.30 __rseq_lib_abi T 0x8 OK. > GLIBC_2.3 _Exit F > GLIBC_2.3 _IO_2_1_stderr_ D 0xe0 > GLIBC_2.3 _IO_2_1_stdin_ D 0xe0 > diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist > index 2c712636ef..8d582a3a9b 100644 > --- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist > +++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist > @@ -2231,3 +2231,5 @@ GLIBC_2.28 thrd_yield F > GLIBC_2.29 getcpu F > GLIBC_2.29 posix_spawn_file_actions_addchdir_np F > GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F > +GLIBC_2.30 __rseq_abi T 0x20 > +GLIBC_2.30 __rseq_lib_abi T 0x8 OK. > diff --git a/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist b/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist > index 195bc8b2cf..155953f6cf 100644 > --- a/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist > +++ b/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist > @@ -2103,3 +2103,5 @@ GLIBC_2.28 thrd_yield F > GLIBC_2.29 getcpu F > GLIBC_2.29 posix_spawn_file_actions_addchdir_np F > GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F > +GLIBC_2.30 __rseq_abi T 0x20 > +GLIBC_2.30 __rseq_lib_abi T 0x8 OK. > diff --git a/sysdeps/unix/sysv/linux/rseq-internal.h b/sysdeps/unix/sysv/linux/rseq-internal.h > new file mode 100644 > index 0000000000..d676da3701 > --- /dev/null > +++ b/sysdeps/unix/sysv/linux/rseq-internal.h > @@ -0,0 +1,91 @@ > +/* Copyright (C) 2018 Free Software Foundation, Inc. Needs a one sentence description. > + This file is part of the GNU C Library. > + Contributed by Mathieu Desnoyers <mathieu.desnoyers@efficios.com>, 2018. > + > + The GNU C Library is free software; you can redistribute it and/or > + modify it under the terms of the GNU Lesser General Public > + License as published by the Free Software Foundation; either > + version 2.1 of the License, or (at your option) any later version. > + > + The GNU C Library is distributed in the hope that it will be useful, > + but WITHOUT ANY WARRANTY; without even the implied warranty of > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > + Lesser General Public License for more details. > + > + You should have received a copy of the GNU Lesser General Public > + License along with the GNU C Library; if not, see > + <http://www.gnu.org/licenses/>. */ > + > +#ifndef RSEQ_INTERNAL_H > +#define RSEQ_INTERNAL_H > + > +#include <sysdep.h> > + > +#ifdef __NR_rseq > + > +#include <errno.h> > +#include <sys/rseq.h> > + This needs a giant comment describing the reference count algorithm and how it is intended to work. > +static inline int > +rseq_register_current_thread (void) > +{ > + int rc, ret = 0; > + INTERNAL_SYSCALL_DECL (err); > + > + if (__rseq_abi.cpu_id == RSEQ_CPU_ID_REGISTRATION_FAILED) > + return -1; > + /* Temporarily prevent nested signal handlers from registering rseq. */ > + __rseq_lib_abi.register_state = RSEQ_REGISTER_ONGOING; OK. > + if (__rseq_lib_abi.refcount == UINT_MAX) > + { > + ret = -1; > + goto end; > + } > + if (__rseq_lib_abi.refcount++) > + goto end; OK. > + rc = INTERNAL_SYSCALL_CALL (rseq, err, &__rseq_abi, sizeof (struct rseq), > + 0, RSEQ_SIG); > + if (!rc) > + goto end; > + if (INTERNAL_SYSCALL_ERRNO (rc, err) != EBUSY) > + __rseq_abi.cpu_id = RSEQ_CPU_ID_REGISTRATION_FAILED; > + ret = -1; > +end: > + __rseq_lib_abi.register_state = RSEQ_REGISTER_ALLOWED; OK. > + return ret; > +} > + > +static inline int > +rseq_unregister_current_thread (void) > +{ > + int rc, ret = 0; > + INTERNAL_SYSCALL_DECL (err); > + > + /* Setting __rseq_register_state = RSEQ_REGISTER_EXITING for the rest of the > + thread lifetime. Ensures signal handlers nesting just before thread exit > + don't try to register rseq. */ > + __rseq_lib_abi.register_state = RSEQ_REGISTER_EXITING; > + __rseq_lib_abi.refcount = 0; > + rc = INTERNAL_SYSCALL_CALL (rseq, err, &__rseq_abi, sizeof (struct rseq), > + RSEQ_FLAG_UNREGISTER, RSEQ_SIG); OK. > + if (!rc) > + goto end; > + ret = -1; > +end: > + return ret; > +} > +#else > +static inline int > +rseq_register_current_thread (void) > +{ > + return -1; > +} > + > +static inline int > +rseq_unregister_current_thread (void) > +{ > + return -1; > +} OK. > +#endif > + > +#endif /* rseq-internal.h */ > diff --git a/sysdeps/unix/sysv/linux/rseq-sym.c b/sysdeps/unix/sysv/linux/rseq-sym.c > new file mode 100644 > index 0000000000..99b277e9d6 > --- /dev/null > +++ b/sysdeps/unix/sysv/linux/rseq-sym.c > @@ -0,0 +1,54 @@ > +/* Copyright (C) 2018 Free Software Foundation, Inc. Needs a one sentence description. > + This file is part of the GNU C Library. > + Contributed by Mathieu Desnoyers <mathieu.desnoyers@efficios.com>, 2018. > + > + The GNU C Library is free software; you can redistribute it and/or > + modify it under the terms of the GNU Lesser General Public > + License as published by the Free Software Foundation; either > + version 2.1 of the License, or (at your option) any later version. > + > + The GNU C Library is distributed in the hope that it will be useful, > + but WITHOUT ANY WARRANTY; without even the implied warranty of > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > + Lesser General Public License for more details. > + > + You should have received a copy of the GNU Lesser General Public > + License along with the GNU C Library; if not, see > + <http://www.gnu.org/licenses/>. */ > + > +#include <sys/syscall.h> > +#include <stdint.h> > + > +#ifdef __NR_rseq > +#include <sys/rseq.h> > +#else > + > +enum rseq_cpu_id_state { > + RSEQ_CPU_ID_UNINITIALIZED = -1, > + RSEQ_CPU_ID_REGISTRATION_FAILED = -2, OK. > +}; > + > +/* linux/rseq.h defines struct rseq as aligned on 32 bytes. The kernel ABI > + size is 20 bytes. */ > +struct rseq { > + uint32_t cpu_id_start; > + uint32_t cpu_id; > + uint64_t rseq_cs; > + uint32_t flags; > +} __attribute__ ((aligned(4 * sizeof(uint64_t)))); OK, this is the kernel abi. > + > +struct rseq_lib_abi > +{ > + uint32_t register_state; > + uint32_t refcount; OK. This is the refcount coordination structure for registration. > +}; > + > +#endif > + > +/* volatile because fields can be read/updated by the kernel. */ > +__thread volatile struct rseq __rseq_abi = { > + .cpu_id = RSEQ_CPU_ID_UNINITIALIZED, OK. > +}; > + > +/* volatile because fields can be read/updated by signal handlers. */ > +__thread volatile struct rseq_lib_abi __rseq_lib_abi; OK. > diff --git a/sysdeps/unix/sysv/linux/s390/bits/rseq.h b/sysdeps/unix/sysv/linux/s390/bits/rseq.h > new file mode 100644 > index 0000000000..19d3755837 > --- /dev/null > +++ b/sysdeps/unix/sysv/linux/s390/bits/rseq.h > @@ -0,0 +1,24 @@ > +/* Copyright (C) 2019 Free Software Foundation, Inc. Needs a one sentence description. > + This file is part of the GNU C Library. > + Contributed by Mathieu Desnoyers <mathieu.desnoyers@efficios.com>, 2019. > + > + The GNU C Library is free software; you can redistribute it and/or > + modify it under the terms of the GNU Lesser General Public > + License as published by the Free Software Foundation; either > + version 2.1 of the License, or (at your option) any later version. > + > + The GNU C Library is distributed in the hope that it will be useful, > + but WITHOUT ANY WARRANTY; without even the implied warranty of > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > + Lesser General Public License for more details. > + > + You should have received a copy of the GNU Lesser General Public > + License along with the GNU C Library; if not, see > + <http://www.gnu.org/licenses/>. */ > + > +#ifndef _SYS_RSEQ_H > +# error "Never use <bits/rseq.h> directly; include <sys/rseq.h> instead." > +#endif > + > +/* Signature required before each abort handler code. */ > +#define RSEQ_SIG 0x53053053 Why not a s390 specific value here? > diff --git a/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist b/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist > index 334def033c..42316d8666 100644 > --- a/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist > +++ b/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist > @@ -2005,6 +2005,8 @@ GLIBC_2.28 thrd_yield F > GLIBC_2.29 getcpu F > GLIBC_2.29 posix_spawn_file_actions_addchdir_np F > GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F > +GLIBC_2.30 __rseq_abi T 0x20 > +GLIBC_2.30 __rseq_lib_abi T 0x8 OK. > GLIBC_2.3 __ctype_b_loc F > GLIBC_2.3 __ctype_tolower_loc F > GLIBC_2.3 __ctype_toupper_loc F > diff --git a/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist b/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist > index 536f4c4ced..c6c4e55a77 100644 > --- a/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist > +++ b/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist > @@ -1911,6 +1911,8 @@ GLIBC_2.29 __fentry__ F > GLIBC_2.29 getcpu F > GLIBC_2.29 posix_spawn_file_actions_addchdir_np F > GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F > +GLIBC_2.30 __rseq_abi T 0x20 > +GLIBC_2.30 __rseq_lib_abi T 0x8 OK. > GLIBC_2.3 __ctype_b_loc F > GLIBC_2.3 __ctype_tolower_loc F > GLIBC_2.3 __ctype_toupper_loc F > diff --git a/sysdeps/unix/sysv/linux/sh/libc.abilist b/sysdeps/unix/sysv/linux/sh/libc.abilist > index 30ae3b6ebb..8652dfea59 100644 > --- a/sysdeps/unix/sysv/linux/sh/libc.abilist > +++ b/sysdeps/unix/sysv/linux/sh/libc.abilist > @@ -1887,6 +1887,8 @@ GLIBC_2.28 thrd_yield F > GLIBC_2.29 getcpu F > GLIBC_2.29 posix_spawn_file_actions_addchdir_np F > GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F > +GLIBC_2.30 __rseq_abi T 0x20 > +GLIBC_2.30 __rseq_lib_abi T 0x8 OK. > GLIBC_2.3 __ctype_b_loc F > GLIBC_2.3 __ctype_tolower_loc F > GLIBC_2.3 __ctype_toupper_loc F > diff --git a/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist b/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist > index 68b107d080..95b58dfa67 100644 > --- a/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist > +++ b/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist > @@ -1999,6 +1999,8 @@ GLIBC_2.28 thrd_yield F > GLIBC_2.29 getcpu F > GLIBC_2.29 posix_spawn_file_actions_addchdir_np F > GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F > +GLIBC_2.30 __rseq_abi T 0x20 > +GLIBC_2.30 __rseq_lib_abi T 0x8 OK. > GLIBC_2.3 __ctype_b_loc F > GLIBC_2.3 __ctype_tolower_loc F > GLIBC_2.3 __ctype_toupper_loc F > diff --git a/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist b/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist > index e5b6a4da50..bfd24f9d1c 100644 > --- a/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist > +++ b/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist > @@ -1940,6 +1940,8 @@ GLIBC_2.28 thrd_yield F > GLIBC_2.29 getcpu F > GLIBC_2.29 posix_spawn_file_actions_addchdir_np F > GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F > +GLIBC_2.30 __rseq_abi T 0x20 > +GLIBC_2.30 __rseq_lib_abi T 0x8 OK. > GLIBC_2.3 __ctype_b_loc F > GLIBC_2.3 __ctype_tolower_loc F > GLIBC_2.3 __ctype_toupper_loc F > diff --git a/sysdeps/unix/sysv/linux/sys/rseq.h b/sysdeps/unix/sysv/linux/sys/rseq.h > new file mode 100644 > index 0000000000..83c8976f50 > --- /dev/null > +++ b/sysdeps/unix/sysv/linux/sys/rseq.h > @@ -0,0 +1,65 @@ Needs a one sentence description. > +/* Copyright (C) 2019 Free Software Foundation, Inc. > + This file is part of the GNU C Library. > + Contributed by Mathieu Desnoyers <mathieu.desnoyers@efficios.com>, 2019. > + > + The GNU C Library is free software; you can redistribute it and/or > + modify it under the terms of the GNU Lesser General Public > + License as published by the Free Software Foundation; either > + version 2.1 of the License, or (at your option) any later version. > + > + The GNU C Library is distributed in the hope that it will be useful, > + but WITHOUT ANY WARRANTY; without even the implied warranty of > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > + Lesser General Public License for more details. > + > + You should have received a copy of the GNU Lesser General Public > + License along with the GNU C Library; if not, see > + <http://www.gnu.org/licenses/>. */ > + > +#ifndef _SYS_RSEQ_H > +#define _SYS_RSEQ_H 1 > + > +/* We use the structures declarations from the kernel headers. */ > +#include <linux/rseq.h> > +/* Architecture-specific rseq signature. */ > +#include <bits/rseq.h> > +#include <stdint.h> OK. > + > +enum rseq_register_state > +{ > + /* Value RSEQ_REGISTER_ALLOWED means it is allowed to update > + the refcount field and to register/unregister rseq. */ > + RSEQ_REGISTER_ALLOWED = 0, > + /* Value RSEQ_REGISTER_ONGOING means a rseq registration is in progress, > + so it is temporarily forbidden to update the refcount field or to > + register/unregister rseq for this thread or signal handlers nested > + on this thread. */ > + RSEQ_REGISTER_ONGOING = 1, > + /* Value RSEQ_REGISTER_EXITING means it is forbidden to update the > + refcount field or to register/unregister rseq for the rest of the > + thread's lifetime. */ > + RSEQ_REGISTER_EXITING = 2, OK. > +}; > + > +struct rseq_lib_abi > +{ > + uint32_t register_state; /* enum rseq_register_state. */ > + /* The refcount field keeps track of rseq users, so early adopters > + of rseq can cooperate amongst each other and with glibc to > + share rseq thread registration. The refcount field can only be > + updated when allowed by the value of field register_state. > + Registering rseq should be performed when incrementing refcount > + from 0 to 1, and unregistering rseq should be performed when > + decrementing refcount from 1 to 0. */ OK. > + uint32_t refcount; > +}; > + > +/* volatile because fields can be read/updated by the kernel. */ > +extern __thread volatile struct rseq __rseq_abi > +__attribute__ ((tls_model ("initial-exec"))); OK. > + > +/* volatile because fields can be read/updated by signal handlers. */ > +extern __thread volatile struct rseq_lib_abi __rseq_lib_abi > +__attribute__ ((tls_model ("initial-exec"))); OK. > + > +#endif /* sys/rseq.h */ > diff --git a/sysdeps/unix/sysv/linux/x86/bits/rseq.h b/sysdeps/unix/sysv/linux/x86/bits/rseq.h > new file mode 100644 > index 0000000000..19d3755837 > --- /dev/null > +++ b/sysdeps/unix/sysv/linux/x86/bits/rseq.h > @@ -0,0 +1,24 @@ > +/* Copyright (C) 2019 Free Software Foundation, Inc. Add one line short description. > + This file is part of the GNU C Library. > + Contributed by Mathieu Desnoyers <mathieu.desnoyers@efficios.com>, 2019. > + > + The GNU C Library is free software; you can redistribute it and/or > + modify it under the terms of the GNU Lesser General Public > + License as published by the Free Software Foundation; either > + version 2.1 of the License, or (at your option) any later version. > + > + The GNU C Library is distributed in the hope that it will be useful, > + but WITHOUT ANY WARRANTY; without even the implied warranty of > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > + Lesser General Public License for more details. > + > + You should have received a copy of the GNU Lesser General Public > + License along with the GNU C Library; if not, see > + <http://www.gnu.org/licenses/>. */ > + > +#ifndef _SYS_RSEQ_H > +# error "Never use <bits/rseq.h> directly; include <sys/rseq.h> instead." > +#endif > + > +/* Signature required before each abort handler code. */ > +#define RSEQ_SIG 0x53053053 Why not an x86-specific op code? > diff --git a/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist b/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist > index 86dfb0c94d..e9f8411fb2 100644 > --- a/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist > +++ b/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist > @@ -1898,6 +1898,8 @@ GLIBC_2.28 thrd_yield F > GLIBC_2.29 getcpu F > GLIBC_2.29 posix_spawn_file_actions_addchdir_np F > GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F > +GLIBC_2.30 __rseq_abi T 0x20 > +GLIBC_2.30 __rseq_lib_abi T 0x8 OK. > GLIBC_2.3 __ctype_b_loc F > GLIBC_2.3 __ctype_tolower_loc F > GLIBC_2.3 __ctype_toupper_loc F > diff --git a/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist b/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist > index dd688263aa..f9432d07f1 100644 > --- a/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist > +++ b/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist > @@ -2149,3 +2149,5 @@ GLIBC_2.28 thrd_yield F > GLIBC_2.29 getcpu F > GLIBC_2.29 posix_spawn_file_actions_addchdir_np F > GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F > +GLIBC_2.30 __rseq_abi T 0x20 > +GLIBC_2.30 __rseq_lib_abi T 0x8 OK. >
Hi Carlos, ----- On Mar 22, 2019, at 4:09 PM, Carlos O'Donell codonell@redhat.com wrote: [...] I took care of all your comments for an upcoming round of patches, except the following that remain open (see answer inline). I'm adding Linux maintainers for ARM, aarch64, MIPS, powerpc, s390, x86 in CC to discuss the choice of code signature prior to the abort handler for each of those architectures. [...] >> diff --git a/NEWS b/NEWS >> index 912a9bdc0f..0608c60f7d 100644 >> --- a/NEWS >> +++ b/NEWS >> @@ -5,6 +5,17 @@ See the end for copying conditions. >> Please send GNU C library bug reports via <https://sourceware.org/bugzilla/> >> using `glibc' in the "product" field. >> >> +Version 2.30 >> + >> +Major new features: >> + >> +* Support for automatically registering threads with the Linux rseq(2) >> + system call has been added. This system call is implemented starting >> + from Linux 4.18. In order to be activated, it requires that glibc is built >> + against kernel headers that include this system call, and that glibc >> + detects availability of that system call at runtime. > > What benefit does the feature have for users? Can you talk about that please? > Why would a user want to use it. Please feel free to link to an external > reference. Would the following text work ? Version 2.30 Major new features: * Support for automatically registering threads with the Linux rseq(2) system call has been added. This system call is implemented starting from Linux 4.18. The Restartable Sequences ABI accelerates user-space operations on per-cpu data. It allows user-space to perform updates on per-cpu data without requiring heavy-weight atomic operations. See https://www.efficios.com/blog/2019/02/08/linux-restartable-sequences/ for further explanation. In order to be activated, it requires that glibc is built against kernel headers that include this system call, and that glibc detects availability of that system call at runtime. For reference the assembly code I'm pointing at below can be found in the Linux selftests under: tools/testing/selftests/rseq/rseq-*.h >> +++ b/sysdeps/unix/sysv/linux/arm/bits/rseq.h [...] >> + >> +/* Signature required before each abort handler code. */ >> +#define RSEQ_SIG 0x53053053 > > Why isn't this an arm specific op code? Does the user have to mark this > up as part of a constant pool when placing it in front of the abort handler > to avoid disassembling the constant as code? This was at one point required > to get gdb to work properly. > For arm, the abort is defined as: #define __RSEQ_ASM_DEFINE_ABORT(table_label, label, teardown, \ abort_label, version, flags, \ start_ip, post_commit_offset, abort_ip) \ ".balign 32\n\t" \ __rseq_str(table_label) ":\n\t" \ ".word " __rseq_str(version) ", " __rseq_str(flags) "\n\t" \ ".word " __rseq_str(start_ip) ", 0x0, " __rseq_str(post_commit_offset) ", 0x0, " __rseq_str(abort_ip) ", 0x0\n\t" \ ".word " __rseq_str(RSEQ_SIG) "\n\t" \ __rseq_str(label) ":\n\t" \ teardown \ "b %l[" __rseq_str(abort_label) "]\n\t" Which contains a copy of the struct rseq_cs for that critical section close to the actual critical section, within the code, followed by the signature. The reason why we have a copy of the struct rseq_cs there is to speed up entry into the critical section by using the "adr" instruction to compute the address to store into __rseq_cs->rseq_cs. AFAIU, a literal pool on ARM is defined as something which is always jumped over (never executed), which is the case here. We always have an unconditional branch instruction ("b") skipping over each RSEQ_ASM_DEFINE_ABORT(). Therefore, given that the signature is part of a literal pool on ARM, it can be any value we choose and should not need to be an actual valid instruction. aarch64 defines the abort as: #define RSEQ_ASM_DEFINE_ABORT(label, abort_label) \ " b 222f\n" \ " .inst " __rseq_str(RSEQ_SIG) "\n" \ __rseq_str(label) ":\n" \ " b %l[" __rseq_str(abort_label) "]\n" \ "222:\n" Where the signature actually maps to a valid instruction. Considering that aarch64 also have literal pools, and we branch over the signature, I wonder why it's so important to ensure the signature is a valid trap instruction. Perhaps Will Deacon can enlighten us ? [...] >> +++ b/sysdeps/unix/sysv/linux/bits/rseq.h >> @@ -0,0 +1,24 @@ [...] >> + >> +/* Each architecture supporting rseq should define RSEQ_SIG as a 32-bit >> + signature inserted before each rseq abort label in the code section. */ > > Needs a huge explanation about RSEQ_SIG and the reasons why it's per-arch. > > Basically cut-and-paste what you wrote to libc-alpha about this. Updated to: /* RSEQ_SIG is a signature required before each abort handler code. It is a 32-bit value that maps to actual architecture code compiled into applications and libraries. It needs to be defined for each architecture. When choosing this value, it needs to be taken into account that generating invalid instructions may have ill effects on tools like objdump, and may also have impact on the CPU speculative execution efficiency in some cases. */ [...] >> +++ b/sysdeps/unix/sysv/linux/mips/bits/rseq.h [...] >> + >> +/* Signature required before each abort handler code. */ >> +#define RSEQ_SIG 0x53053053 > > Why isn't this a mips-specific op code? MIPS also has a literal pool just before the abort handler, and it jumps over it. My understanding is that we can use any signature value we want, and it does not need to be a valid instruction, similarly to ARM: #define __RSEQ_ASM_DEFINE_ABORT(table_label, label, teardown, \ abort_label, version, flags, \ start_ip, post_commit_offset, abort_ip) \ ".balign 32\n\t" \ __rseq_str(table_label) ":\n\t" \ ".word " __rseq_str(version) ", " __rseq_str(flags) "\n\t" \ LONG " " U32_U64_PAD(__rseq_str(start_ip)) "\n\t" \ LONG " " U32_U64_PAD(__rseq_str(post_commit_offset)) "\n\t" \ LONG " " U32_U64_PAD(__rseq_str(abort_ip)) "\n\t" \ ".word " __rseq_str(RSEQ_SIG) "\n\t" \ __rseq_str(label) ":\n\t" \ teardown \ "b %l[" __rseq_str(abort_label) "]\n\t" Perhaps Paul Burton can confirm this ? [...] >> +++ b/sysdeps/unix/sysv/linux/powerpc/bits/rseq.h [...] >> +/* Signature required before each abort handler code. */ >> +#define RSEQ_SIG 0x53053053 > > Why isn't this an opcode specific to power? On powerpc 32/64, the abort is placed in a __rseq_failure executable section: #define RSEQ_ASM_DEFINE_ABORT(label, abort_label) \ ".pushsection __rseq_failure, \"ax\"\n\t" \ ".long " __rseq_str(RSEQ_SIG) "\n\t" \ __rseq_str(label) ":\n\t" \ "b %l[" __rseq_str(abort_label) "]\n\t" \ ".popsection\n\t" That section only contains snippets of those trampolines. Arguably, it would be good if disassemblers could find valid instructions there. Boqun Feng could perhaps shed some light on this signature choice ? Now would be a good time to decide once and for all whether a valid instruction would be a better choice. [...] >> +++ b/sysdeps/unix/sysv/linux/s390/bits/rseq.h [...] >> + >> +/* Signature required before each abort handler code. */ >> +#define RSEQ_SIG 0x53053053 > > Why not a s390 specific value here? s390 also has the abort handler in a __rseq_failure section: #define RSEQ_ASM_DEFINE_ABORT(label, teardown, abort_label) \ ".pushsection __rseq_failure, \"ax\"\n\t" \ ".long " __rseq_str(RSEQ_SIG) "\n\t" \ __rseq_str(label) ":\n\t" \ teardown \ "j %l[" __rseq_str(abort_label) "]\n\t" \ ".popsection\n\t" Same question applies as powerpc: since disassemblers will try to decode that instruction, would it be better to define it as a valid one ? [...] >> +++ b/sysdeps/unix/sysv/linux/x86/bits/rseq.h [...] >> +/* Signature required before each abort handler code. */ >> +#define RSEQ_SIG 0x53053053 > > Why not an x86-specific op code? On x86, we use this 4-byte signature as operand to a "no-op" instruction taking 4-byte immediate operand: x86-32: #define RSEQ_ASM_DEFINE_ABORT(label, teardown, abort_label) \ ".pushsection __rseq_failure, \"ax\"\n\t" \ /* Disassembler-friendly signature: nopl <sig>. */ \ ".byte 0x0f, 0x1f, 0x05\n\t" \ ".long " __rseq_str(RSEQ_SIG) "\n\t" \ __rseq_str(label) ":\n\t" \ teardown \ "jmp %l[" __rseq_str(abort_label) "]\n\t" \ ".popsection\n\t" x86-64: #define RSEQ_ASM_DEFINE_ABORT(label, teardown, abort_label) \ ".pushsection __rseq_failure, \"ax\"\n\t" \ /* Disassembler-friendly signature: nopl <sig>(%rip). */\ ".byte 0x0f, 0x1f, 0x05\n\t" \ ".long " __rseq_str(RSEQ_SIG) "\n\t" \ __rseq_str(label) ":\n\t" \ teardown \ "jmp %l[" __rseq_str(abort_label) "]\n\t" \ ".popsection\n\t" Thanks, Mathieu
On Mon, 25 Mar 2019 11:54:32 -0400 (EDT) Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote: > >> +++ b/sysdeps/unix/sysv/linux/s390/bits/rseq.h > [...] > >> + > >> +/* Signature required before each abort handler code. */ > >> +#define RSEQ_SIG 0x53053053 > > > > Why not a s390 specific value here? > > s390 also has the abort handler in a __rseq_failure section: > > #define RSEQ_ASM_DEFINE_ABORT(label, teardown, abort_label) \ > ".pushsection __rseq_failure, \"ax\"\n\t" \ > ".long " __rseq_str(RSEQ_SIG) "\n\t" \ > __rseq_str(label) ":\n\t" \ > teardown \ > "j %l[" __rseq_str(abort_label) "]\n\t" \ > ".popsection\n\t" > > Same question applies as powerpc: since disassemblers will try to decode > that instruction, would it be better to define it as a valid one ? > > [...] A 4-byte sequence starting with 0x53 is decoded as a "diebr" instruction. And please replace that "j %l[...]" with a "jg %l[...]", the branch target range of the "j" instruction is 64K, not enough for the general case.
----- On Mar 27, 2019, at 5:16 AM, schwidefsky schwidefsky@de.ibm.com wrote: > On Mon, 25 Mar 2019 11:54:32 -0400 (EDT) > Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote: > >> >> +++ b/sysdeps/unix/sysv/linux/s390/bits/rseq.h >> [...] >> >> + >> >> +/* Signature required before each abort handler code. */ >> >> +#define RSEQ_SIG 0x53053053 >> > >> > Why not a s390 specific value here? >> >> s390 also has the abort handler in a __rseq_failure section: >> >> #define RSEQ_ASM_DEFINE_ABORT(label, teardown, abort_label) \ >> ".pushsection __rseq_failure, \"ax\"\n\t" \ >> ".long " __rseq_str(RSEQ_SIG) "\n\t" \ >> __rseq_str(label) ":\n\t" \ >> teardown \ >> "j %l[" __rseq_str(abort_label) "]\n\t" \ >> ".popsection\n\t" >> >> Same question applies as powerpc: since disassemblers will try to decode >> that instruction, would it be better to define it as a valid one ? >> >> [...] > > A 4-byte sequence starting with 0x53 is decoded as a "diebr" instruction. Based on the z/Architecture reference summary, it appears that the DIEBR instruction's opcode is B353. So I suspect that just starting with 0x53 is not sufficient to make it a valid opcode in the instruction set. Is it something we should care about ? > And please replace that "j %l[...]" with a "jg %l[...]", the branch target > range of the "j" instruction is 64K, not enough for the general case. I'll also need to use "jg" for RSEQ_ASM_DEFINE_CMPFAIL: #define RSEQ_ASM_DEFINE_CMPFAIL(label, teardown, cmpfail_label) \ ".pushsection __rseq_failure, \"ax\"\n\t" \ __rseq_str(label) ":\n\t" \ teardown \ "jg %l[" __rseq_str(cmpfail_label) "]\n\t" \ ".popsection\n\t" I'll prepare a fix for Linux selftests, and I already pushed a fix within my librseq repository. Thanks for pointing it out! Mathieu > > -- > blue skies, > Martin. > > "Reality continues to ruin my life." - Calvin.
On 3/27/19 5:16 AM, Martin Schwidefsky wrote: > On Mon, 25 Mar 2019 11:54:32 -0400 (EDT) > Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote: > >>>> +++ b/sysdeps/unix/sysv/linux/s390/bits/rseq.h >> [...] >>>> + >>>> +/* Signature required before each abort handler code. */ >>>> +#define RSEQ_SIG 0x53053053 >>> >>> Why not a s390 specific value here? >> >> s390 also has the abort handler in a __rseq_failure section: >> >> #define RSEQ_ASM_DEFINE_ABORT(label, teardown, abort_label) \ >> ".pushsection __rseq_failure, \"ax\"\n\t" \ >> ".long " __rseq_str(RSEQ_SIG) "\n\t" \ >> __rseq_str(label) ":\n\t" \ >> teardown \ >> "j %l[" __rseq_str(abort_label) "]\n\t" \ >> ".popsection\n\t" >> >> Same question applies as powerpc: since disassemblers will try to decode >> that instruction, would it be better to define it as a valid one ? >> >> [...] > > A 4-byte sequence starting with 0x53 is decoded as a "diebr" instruction. > And please replace that "j %l[...]" with a "jg %l[...]", the branch target > range of the "j" instruction is 64K, not enough for the general case. Why was this particular operated selected? So on s390 the RSEQ_SIG will show up as an unexpected "divide to integer" instruction that can't be reached by any control flow? Can we use a NOP with a unique value in an immediate operand? The goal being to have something that won't confuse during a debug session, or that the debugger can ignore (like constant pools on Arm)
On Wed, 27 Mar 2019 16:38:32 -0400 "Carlos O'Donell" <codonell@redhat.com> wrote: > On 3/27/19 5:16 AM, Martin Schwidefsky wrote: > > On Mon, 25 Mar 2019 11:54:32 -0400 (EDT) > > Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote: > > > >>>> +++ b/sysdeps/unix/sysv/linux/s390/bits/rseq.h > >> [...] > >>>> + > >>>> +/* Signature required before each abort handler code. */ > >>>> +#define RSEQ_SIG 0x53053053 > >>> > >>> Why not a s390 specific value here? > >> > >> s390 also has the abort handler in a __rseq_failure section: > >> > >> #define RSEQ_ASM_DEFINE_ABORT(label, teardown, abort_label) \ > >> ".pushsection __rseq_failure, \"ax\"\n\t" \ > >> ".long " __rseq_str(RSEQ_SIG) "\n\t" \ > >> __rseq_str(label) ":\n\t" \ > >> teardown \ > >> "j %l[" __rseq_str(abort_label) "]\n\t" \ > >> ".popsection\n\t" > >> > >> Same question applies as powerpc: since disassemblers will try to decode > >> that instruction, would it be better to define it as a valid one ? > >> > >> [...] > > > > A 4-byte sequence starting with 0x53 is decoded as a "diebr" instruction. > > And please replace that "j %l[...]" with a "jg %l[...]", the branch target > > range of the "j" instruction is 64K, not enough for the general case. > > Why was this particular operated selected? > > So on s390 the RSEQ_SIG will show up as an unexpected "divide to integer" > instruction that can't be reached by any control flow? > > Can we use a NOP with a unique value in an immediate operand? > > The goal being to have something that won't confuse during a debug > session, or that the debugger can ignore (like constant pools on Arm) I was looking at the wrong table in regard to opcode 0x53. The pattern 0x53...... is not a known instruction as far as the disassembler is concerned. As Mathieu pointed out "diebr" is actually 0xb353.... Sorry about the confusion. But why do we need this value in the first place if it can not be reached?
----- On Mar 28, 2019, at 3:49 AM, schwidefsky schwidefsky@de.ibm.com wrote: > On Wed, 27 Mar 2019 16:38:32 -0400 > "Carlos O'Donell" <codonell@redhat.com> wrote: > >> On 3/27/19 5:16 AM, Martin Schwidefsky wrote: >> > On Mon, 25 Mar 2019 11:54:32 -0400 (EDT) >> > Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote: >> > >> >>>> +++ b/sysdeps/unix/sysv/linux/s390/bits/rseq.h >> >> [...] >> >>>> + >> >>>> +/* Signature required before each abort handler code. */ >> >>>> +#define RSEQ_SIG 0x53053053 >> >>> >> >>> Why not a s390 specific value here? >> >> >> >> s390 also has the abort handler in a __rseq_failure section: >> >> >> >> #define RSEQ_ASM_DEFINE_ABORT(label, teardown, abort_label) \ >> >> ".pushsection __rseq_failure, \"ax\"\n\t" \ >> >> ".long " __rseq_str(RSEQ_SIG) "\n\t" \ >> >> __rseq_str(label) ":\n\t" \ >> >> teardown \ >> >> "j %l[" __rseq_str(abort_label) "]\n\t" \ >> >> ".popsection\n\t" >> >> >> >> Same question applies as powerpc: since disassemblers will try to decode >> >> that instruction, would it be better to define it as a valid one ? >> >> >> >> [...] >> > >> > A 4-byte sequence starting with 0x53 is decoded as a "diebr" instruction. >> > And please replace that "j %l[...]" with a "jg %l[...]", the branch target >> > range of the "j" instruction is 64K, not enough for the general case. >> >> Why was this particular operated selected? The 0x53053053 signature was selected by myself on x86, where it is a 4-byte operand to a no-op instruction. That value looks like "SEQSEQSE" in hexadecimal. The goal was to have an uncommon code signature value. Then it has been used as-is on arm and mips within literal pools (which seems fine), and also on s390 and powerpc where those seem to generate invalid instruction within a separate code section. I'm mainly concerned about the choice of this value on s390 and powerpc. >> >> So on s390 the RSEQ_SIG will show up as an unexpected "divide to integer" >> instruction that can't be reached by any control flow? >> >> Can we use a NOP with a unique value in an immediate operand? >> >> The goal being to have something that won't confuse during a debug >> session, or that the debugger can ignore (like constant pools on Arm) > > I was looking at the wrong table in regard to opcode 0x53. The pattern > 0x53...... is not a known instruction as far as the disassembler is > concerned. As Mathieu pointed out "diebr" is actually 0xb353.... > Sorry about the confusion. > > But why do we need this value in the first place if it can not be reached? One reason is to help disassemblers in tools like objdump, gdb, and so on. Another reason for making this a valid instruction is if the CPU speculative execution can be helped by making this instruction a valid one, even though it's not reachable through normal execution. (this appears to be important at least on aarch64) However, we want that instruction to be an uncommon one, to reduce the chances that an attacker can use the rseq abort mechanism to redirect execution to an unrelated code block that would happen to follow that same 4-byte signature. For instance, I would not use a no-op which is typically generated by compilers there. In summary: * x86: Uses a no-op instruction ending with a 4-byte operand. * aarch64: Uses a trap instruction which also has a seldom-used operand value. * arm: The signature is in a literal pool (there is a jump over the signature). * mips: The signature is in a literal pool (there is a jump over the signature). * powerpc: the signature is an unreachable invalid(?) instruction within an executable section. * s390: the signature is an unreachable invalid instruction within an executable section. Thanks, Mathieu
Mathieu Desnoyers <mathieu.desnoyers@efficios.com> writes: > Hi Carlos, > > ----- On Mar 22, 2019, at 4:09 PM, Carlos O'Donell codonell@redhat.com wrote: ... > > [...] >>> +++ b/sysdeps/unix/sysv/linux/powerpc/bits/rseq.h > [...] >>> +/* Signature required before each abort handler code. */ >>> +#define RSEQ_SIG 0x53053053 >> >> Why isn't this an opcode specific to power? > > On powerpc 32/64, the abort is placed in a __rseq_failure executable section: > > #define RSEQ_ASM_DEFINE_ABORT(label, abort_label) \ > ".pushsection __rseq_failure, \"ax\"\n\t" \ > ".long " __rseq_str(RSEQ_SIG) "\n\t" \ > __rseq_str(label) ":\n\t" \ > "b %l[" __rseq_str(abort_label) "]\n\t" \ > ".popsection\n\t" > > That section only contains snippets of those trampolines. Arguably, it would be > good if disassemblers could find valid instructions there. Boqun Feng could perhaps > shed some light on this signature choice ? Now would be a good time to decide > once and for all whether a valid instruction would be a better choice. I'm a bit vague on what we're trying to do here. But it seems like you want some sort of "eye catcher" prior to the branch? That value is a valid instruction on current CPUs (rlwimi. r5,r24,6,1,9), and even if it wasn't it could become one in future. If you change it to 0x8053530 that is both a valid instruction and is a nop (conditional trap immediate but with no conditions set). cheers
* Michael Ellerman: > I'm a bit vague on what we're trying to do here. > > But it seems like you want some sort of "eye catcher" prior to the branch? > > That value is a valid instruction on current CPUs (rlwimi. > r5,r24,6,1,9), and even if it wasn't it could become one in future. > > If you change it to 0x8053530 that is both a valid instruction and is a > nop (conditional trap immediate but with no conditions set). I think we need something that is very unlikely to appear in the instruction stream. It's just a marker. The instruction will never be executed, and it does not have to be a trap, either (I believe that a standard trap instruction would be a bad choice). Thanks, Florian
On 4/2/19 2:02 AM, Michael Ellerman wrote: > Mathieu Desnoyers <mathieu.desnoyers@efficios.com> writes: >> Hi Carlos, >> >> ----- On Mar 22, 2019, at 4:09 PM, Carlos O'Donell codonell@redhat.com wrote: > ... >> >> [...] >>>> +++ b/sysdeps/unix/sysv/linux/powerpc/bits/rseq.h >> [...] >>>> +/* Signature required before each abort handler code. */ >>>> +#define RSEQ_SIG 0x53053053 >>> >>> Why isn't this an opcode specific to power? >> >> On powerpc 32/64, the abort is placed in a __rseq_failure executable section: >> >> #define RSEQ_ASM_DEFINE_ABORT(label, abort_label) \ >> ".pushsection __rseq_failure, \"ax\"\n\t" \ >> ".long " __rseq_str(RSEQ_SIG) "\n\t" \ >> __rseq_str(label) ":\n\t" \ >> "b %l[" __rseq_str(abort_label) "]\n\t" \ >> ".popsection\n\t" >> >> That section only contains snippets of those trampolines. Arguably, it would be >> good if disassemblers could find valid instructions there. Boqun Feng could perhaps >> shed some light on this signature choice ? Now would be a good time to decide >> once and for all whether a valid instruction would be a better choice. > > I'm a bit vague on what we're trying to do here. > > But it seems like you want some sort of "eye catcher" prior to the branch? > > That value is a valid instruction on current CPUs (rlwimi. > r5,r24,6,1,9), and even if it wasn't it could become one in future. > > If you change it to 0x8053530 that is both a valid instruction and is a > nop (conditional trap immediate but with no conditions set). The s390 IBM team needs to respond to this and I want to make sure they ACK the NOP being used here because it impacts them directly. I'd like to see Martin's opinion on this.
On 4/2/19 3:08 AM, Florian Weimer wrote: > * Michael Ellerman: > >> I'm a bit vague on what we're trying to do here. >> >> But it seems like you want some sort of "eye catcher" prior to the branch? >> >> That value is a valid instruction on current CPUs (rlwimi. >> r5,r24,6,1,9), and even if it wasn't it could become one in future. >> >> If you change it to 0x8053530 that is both a valid instruction and is a >> nop (conditional trap immediate but with no conditions set). > > I think we need something that is very unlikely to appear in the > instruction stream. It's just a marker. The instruction will never be > executed, and it does not have to be a trap, either (I believe that a > standard trap instruction would be a bad choice). I assume you want to avoid a standard trap instruction because it would be common, and so not meet the intent of the RSEQ_SIG choice as being something that is *uncommon* right? It is valuable that it be a trap, particularly for constant pools because it means that a jump into the constant pool will trap.
On 3/25/19 11:54 AM, Mathieu Desnoyers wrote: > Hi Carlos, > > ----- On Mar 22, 2019, at 4:09 PM, Carlos O'Donell codonell@redhat.com wrote: > > [...] > > I took care of all your comments for an upcoming round of patches, except the > following that remain open (see answer inline). I'm adding Linux maintainers > for ARM, aarch64, MIPS, powerpc, s390, x86 in CC to discuss the choice of > code signature prior to the abort handler for each of those architectures. Thank you for kicking off this conversation. Every architecture should have a reasonable RSEQ_SIG that applies to their ISA with a comment about why that instruction was chosen. It should be a conscious choice, without a default. > * Support for automatically registering threads with the Linux rseq(2) > system call has been added. This system call is implemented starting > from Linux 4.18. The Restartable Sequences ABI accelerates user-space > operations on per-cpu data. It allows user-space to perform updates > on per-cpu data without requiring heavy-weight atomic operations. See > https://www.efficios.com/blog/2019/02/08/linux-restartable-sequences/ > for further explanation. > > In order to be activated, it requires that glibc is built against > kernel headers that include this system call, and that glibc detects > availability of that system call at runtime. Suggest: * Support for automatically registering threads with the Linux rseq(2) system call has been added. This system call is implemented starting from Linux 4.18. The Restartable Sequences ABI accelerates user-space operations on per-cpu data. It allows user-space to perform updates on per-cpu data without requiring heavy-weight atomic operations. Automatically registering threads allows all libraries, including libc, to make immediate use of the rseq(2) support by using the documented ABI. See 'man 2 rseq' for the details of the ABI shared between libc and the kernel. > > For reference the assembly code I'm pointing at below can be found > in the Linux selftests under: > > tools/testing/selftests/rseq/rseq-*.h OK. >>> +++ b/sysdeps/unix/sysv/linux/arm/bits/rseq.h > [...] >>> + >>> +/* Signature required before each abort handler code. */ >>> +#define RSEQ_SIG 0x53053053 >> >> Why isn't this an arm specific op code? Does the user have to mark this >> up as part of a constant pool when placing it in front of the abort handler >> to avoid disassembling the constant as code? This was at one point required >> to get gdb to work properly. >> > > For arm, the abort is defined as: > > #define __RSEQ_ASM_DEFINE_ABORT(table_label, label, teardown, \ > abort_label, version, flags, \ > start_ip, post_commit_offset, abort_ip) \ > ".balign 32\n\t" \ > __rseq_str(table_label) ":\n\t" \ > ".word " __rseq_str(version) ", " __rseq_str(flags) "\n\t" \ > ".word " __rseq_str(start_ip) ", 0x0, " __rseq_str(post_commit_offset) ", 0x0, " __rseq_str(abort_ip) ", 0x0\n\t" \ > ".word " __rseq_str(RSEQ_SIG) "\n\t" \ > __rseq_str(label) ":\n\t" \ > teardown \ > "b %l[" __rseq_str(abort_label) "]\n\t" > > Which contains a copy of the struct rseq_cs for that critical section > close to the actual critical section, within the code, followed by the > signature. The reason why we have a copy of the struct rseq_cs there is > to speed up entry into the critical section by using the "adr" instruction > to compute the address to store into __rseq_cs->rseq_cs. > > AFAIU, a literal pool on ARM is defined as something which is always > jumped over (never executed), which is the case here. We always have > an unconditional branch instruction ("b") skipping over each > RSEQ_ASM_DEFINE_ABORT(). > > Therefore, given that the signature is part of a literal pool on ARM, > it can be any value we choose and should not need to be an actual valid > instruction. > > aarch64 defines the abort as: > > #define RSEQ_ASM_DEFINE_ABORT(label, abort_label) \ > " b 222f\n" \ > " .inst " __rseq_str(RSEQ_SIG) "\n" \ > __rseq_str(label) ":\n" \ > " b %l[" __rseq_str(abort_label) "]\n" \ > "222:\n" > > Where the signature actually maps to a valid instruction. Considering that > aarch64 also have literal pools, and we branch over the signature, I wonder > why it's so important to ensure the signature is a valid trap instruction. > Perhaps Will Deacon can enlighten us ? In the event that you accidentally jump to it then you trap? However, you want an *uncommon* trap insn. I think the order of preference is: 1. An uncommon insn (with random immediate values), in a literal pool, that is not a useful ROP/JOP sequence (very uncommon) 2a. A uncommon TRAP hopefully with some immediate data encoded (maybe uncommon) 2b. A NOP to avoid affecting speculative execution (maybe uncommon) With 2a/2b being roughly equivalent depending on speculative execution policy. >>> +/* Signature required before each abort handler code. */ >>> +#define RSEQ_SIG 0x53053053 >> >> Why isn't this a mips-specific op code? > > MIPS also has a literal pool just before the abort handler, and it > jumps over it. My understanding is that we can use any signature value > we want, and it does not need to be a valid instruction, similarly to ARM: > > #define __RSEQ_ASM_DEFINE_ABORT(table_label, label, teardown, \ > abort_label, version, flags, \ > start_ip, post_commit_offset, abort_ip) \ > ".balign 32\n\t" \ > __rseq_str(table_label) ":\n\t" \ > ".word " __rseq_str(version) ", " __rseq_str(flags) "\n\t" \ > LONG " " U32_U64_PAD(__rseq_str(start_ip)) "\n\t" \ > LONG " " U32_U64_PAD(__rseq_str(post_commit_offset)) "\n\t" \ > LONG " " U32_U64_PAD(__rseq_str(abort_ip)) "\n\t" \ > ".word " __rseq_str(RSEQ_SIG) "\n\t" \ > __rseq_str(label) ":\n\t" \ > teardown \ > "b %l[" __rseq_str(abort_label) "]\n\t" > > Perhaps Paul Burton can confirm this ? Yes please. You also want to avoid the value being a valid MIPS insn that's common. Did you check that? > [...] >>> +++ b/sysdeps/unix/sysv/linux/powerpc/bits/rseq.h > [...] >>> +/* Signature required before each abort handler code. */ >>> +#define RSEQ_SIG 0x53053053 >> >> Why isn't this an opcode specific to power? > > On powerpc 32/64, the abort is placed in a __rseq_failure executable section: > > #define RSEQ_ASM_DEFINE_ABORT(label, abort_label) \ > ".pushsection __rseq_failure, \"ax\"\n\t" \ > ".long " __rseq_str(RSEQ_SIG) "\n\t" \ > __rseq_str(label) ":\n\t" \ > "b %l[" __rseq_str(abort_label) "]\n\t" \ > ".popsection\n\t" > > That section only contains snippets of those trampolines. Arguably, it would be > good if disassemblers could find valid instructions there. Boqun Feng could perhaps > shed some light on this signature choice ? Now would be a good time to decide > once and for all whether a valid instruction would be a better choice. This seems questionable too. > [...] >>> +++ b/sysdeps/unix/sysv/linux/s390/bits/rseq.h > [...] >>> + >>> +/* Signature required before each abort handler code. */ >>> +#define RSEQ_SIG 0x53053053 >> >> Why not a s390 specific value here? > > s390 also has the abort handler in a __rseq_failure section: > > #define RSEQ_ASM_DEFINE_ABORT(label, teardown, abort_label) \ > ".pushsection __rseq_failure, \"ax\"\n\t" \ > ".long " __rseq_str(RSEQ_SIG) "\n\t" \ > __rseq_str(label) ":\n\t" \ > teardown \ > "j %l[" __rseq_str(abort_label) "]\n\t" \ > ".popsection\n\t" > > Same question applies as powerpc: since disassemblers will try to decode > that instruction, would it be better to define it as a valid one ? Yes, I think it needs to be a valid uncommon insn or nop. > [...] >>> +++ b/sysdeps/unix/sysv/linux/x86/bits/rseq.h > [...] >>> +/* Signature required before each abort handler code. */ >>> +#define RSEQ_SIG 0x53053053 >> >> Why not an x86-specific op code? > > On x86, we use this 4-byte signature as operand to a "no-op" instruction > taking 4-byte immediate operand: That makes perfect sense. Thanks. So what is left to audit? In summary: - Why does AArch64 choose a trap? - Is the current choice of 0x53053053 OK for MIPS? Does it map to a valid insn? - What better choice is there for power? Pick a real uncommon insn or nop? - What better choice is there for s390? Pick a real uncommon insn or nop? - Todays choice could become something special in the future since it's unassigned.
* Carlos O'Donell: > On 4/2/19 3:08 AM, Florian Weimer wrote: >> * Michael Ellerman: >> >>> I'm a bit vague on what we're trying to do here. >>> >>> But it seems like you want some sort of "eye catcher" prior to the branch? >>> >>> That value is a valid instruction on current CPUs (rlwimi. >>> r5,r24,6,1,9), and even if it wasn't it could become one in future. >>> >>> If you change it to 0x8053530 that is both a valid instruction and is a >>> nop (conditional trap immediate but with no conditions set). >> >> I think we need something that is very unlikely to appear in the >> instruction stream. It's just a marker. The instruction will never be >> executed, and it does not have to be a trap, either (I believe that a >> standard trap instruction would be a bad choice). > > I assume you want to avoid a standard trap instruction because it would > be common, and so not meet the intent of the RSEQ_SIG choice as being something > that is *uncommon* right? Ideally, RSEQ_SIG would be something that does not show up in the instruction stream at all, so that it is a reliable marker for the start of an rseq handler. I assume the intent here is that the kernel provides some validation on the program counter it reads from the rseq area, so that we do not end up with some easily-abused gadget in every process image. > It is valuable that it be a trap, particularly for constant pools because > it means that a jump into the constant pool will trap. Sorry, I don't understand why this matters in this context. Would you please elaborate? Thanks, Florian
On 4/5/19 5:16 AM, Florian Weimer wrote: > * Carlos O'Donell: >> It is valuable that it be a trap, particularly for constant pools because >> it means that a jump into the constant pool will trap. > > Sorry, I don't understand why this matters in this context. Would you > please elaborate? Sorry, I wasn't very clear. My point is only that any accidental jumps, either with off-by-one (like you fixed in gcc/glibc's signal unwinding most recently), result in a process fault rather than executing RSEQ_SIG as a valid instruction *and then* continuing onwards to the handler. A process fault is achieved either by a trap, or an invalid instruction, or a privileged insn (like suggested for MIPS in this thread).
Carlos O'Donell <codonell@redhat.com> writes: > On 4/5/19 5:16 AM, Florian Weimer wrote: >> * Carlos O'Donell: >>> It is valuable that it be a trap, particularly for constant pools because >>> it means that a jump into the constant pool will trap. >> >> Sorry, I don't understand why this matters in this context. Would you >> please elaborate? > > Sorry, I wasn't very clear. > > My point is only that any accidental jumps, either with off-by-one (like you > fixed in gcc/glibc's signal unwinding most recently), result in a process fault > rather than executing RSEQ_SIG as a valid instruction *and then* continuing > onwards to the handler. > > A process fault is achieved either by a trap, or an invalid instruction, or > a privileged insn (like suggested for MIPS in this thread). In that case, mtmsr (Move to Machine State Register) seems a good candidate. mtmsr is available both on 32 and 64 bits since their first implementations. It's a privileged instruction and should never appear in userspace code (causes SIGILL). Any comments?
On 4/8/19 3:20 PM, Tulio Magno Quites Machado Filho wrote: > Carlos O'Donell <codonell@redhat.com> writes: > >> On 4/5/19 5:16 AM, Florian Weimer wrote: >>> * Carlos O'Donell: >>>> It is valuable that it be a trap, particularly for constant pools because >>>> it means that a jump into the constant pool will trap. >>> >>> Sorry, I don't understand why this matters in this context. Would you >>> please elaborate? >> >> Sorry, I wasn't very clear. >> >> My point is only that any accidental jumps, either with off-by-one (like you >> fixed in gcc/glibc's signal unwinding most recently), result in a process fault >> rather than executing RSEQ_SIG as a valid instruction *and then* continuing >> onwards to the handler. >> >> A process fault is achieved either by a trap, or an invalid instruction, or >> a privileged insn (like suggested for MIPS in this thread). > > In that case, mtmsr (Move to Machine State Register) seems a good candidate. > > mtmsr is available both on 32 and 64 bits since their first implementations. > > It's a privileged instruction and should never appear in userspace > code (causes SIGILL). > > Any comments? That seems good to me. Mathieu, What's required to move this forward for POWER?
Carlos O'Donell <codonell@redhat.com> writes: > On 4/8/19 3:20 PM, Tulio Magno Quites Machado Filho wrote: >> Carlos O'Donell <codonell@redhat.com> writes: >> >>> On 4/5/19 5:16 AM, Florian Weimer wrote: >>>> * Carlos O'Donell: >>>>> It is valuable that it be a trap, particularly for constant pools because >>>>> it means that a jump into the constant pool will trap. >>>> >>>> Sorry, I don't understand why this matters in this context. Would you >>>> please elaborate? >>> >>> Sorry, I wasn't very clear. >>> >>> My point is only that any accidental jumps, either with off-by-one (like you >>> fixed in gcc/glibc's signal unwinding most recently), result in a process fault >>> rather than executing RSEQ_SIG as a valid instruction *and then* continuing >>> onwards to the handler. >>> >>> A process fault is achieved either by a trap, or an invalid instruction, or >>> a privileged insn (like suggested for MIPS in this thread). >> >> In that case, mtmsr (Move to Machine State Register) seems a good candidate. >> >> mtmsr is available both on 32 and 64 bits since their first implementations. >> >> It's a privileged instruction and should never appear in userspace >> code (causes SIGILL). I'd much rather we use a trap with a specific immediate value. Otherwise someone's going to waste time one day puzzling over why userspace is doing mtmsr. It would also complicate things if we ever wanted to emulate mtmsr. If we want something that is a trap rather than a nop then use 0x0fe50553. That's "compare the value in r5 with 0x553 and then trap unconditionally". It shows up in objdump as: 10000000: 53 05 e5 0f twui r5,1363 The immediate can be anything, I chose that value to mimic the x86 value Mathieu mentioned. There's no reason that instruction would ever be generated because the immediate value serves no purpose. So it satisfies the "very unlikely to appear" criteria AFAICS. cheers
On Tue, Apr 09, 2019 at 02:23:53PM +1000, Michael Ellerman wrote: > I'd much rather we use a trap with a specific immediate value. Otherwise > someone's going to waste time one day puzzling over why userspace is > doing mtmsr. It's data. We have other data in executable sections. Anyone who wonders about odd disassembly just hasn't realized they are disassembling data. > It would also complicate things if we ever wanted to emulate mtmsr. No, because it won't be executed. If I understand correctly, the only reason to choose an illegal, trap or privileged insn is to halt execution earlier rather than later when a program goes off in the weeds. > If we want something that is a trap rather than a nop then use 0x0fe50553. > > That's "compare the value in r5 with 0x553 and then trap unconditionally". > > It shows up in objdump as: > > 10000000: 53 05 e5 0f twui r5,1363 > > > The immediate can be anything, I chose that value to mimic the x86 value > Mathieu mentioned. > > There's no reason that instruction would ever be generated because the > immediate value serves no purpose. So it satisfies the "very unlikely > to appear" criteria AFAICS. Yes, looks fine to me, except that in VLE mode (do we care?) ".long 0x0fe50553" disassembles as 0: 0f e5 se_cmphl r5,r30 2: 05 53 se_mullw r3,r5 No illegal/trap/privileged insn there. ".long 0x0fe5000b" might be better to cover VLE.
Alan Modra <amodra@gmail.com> writes: > On Tue, Apr 09, 2019 at 02:23:53PM +1000, Michael Ellerman wrote: >> I'd much rather we use a trap with a specific immediate value. Otherwise >> someone's going to waste time one day puzzling over why userspace is >> doing mtmsr. > > It's data. We have other data in executable sections. Anyone who > wonders about odd disassembly just hasn't realized they are > disassembling data. > >> It would also complicate things if we ever wanted to emulate mtmsr. > > No, because it won't be executed. If I understand correctly, the only > reason to choose an illegal, trap or privileged insn is to halt > execution earlier rather than later when a program goes off in the > weeds. > >> If we want something that is a trap rather than a nop then use 0x0fe50553. >> >> That's "compare the value in r5 with 0x553 and then trap unconditionally". >> >> It shows up in objdump as: >> >> 10000000: 53 05 e5 0f twui r5,1363 >> >> >> The immediate can be anything, I chose that value to mimic the x86 value >> Mathieu mentioned. >> >> There's no reason that instruction would ever be generated because the >> immediate value serves no purpose. So it satisfies the "very unlikely >> to appear" criteria AFAICS. > > Yes, looks fine to me, except that in VLE mode (do we care?) > ".long 0x0fe50553" disassembles as > 0: 0f e5 se_cmphl r5,r30 > 2: 05 53 se_mullw r3,r5 > No illegal/trap/privileged insn there. > > ".long 0x0fe5000b" might be better to cover VLE. Looks good for me too. Actually, it better fits what Carlos O'Donnell had requested: >>> I think the order of preference is: >>> >>> 1. An uncommon insn (with random immediate values), in a literal pool, that is >>> not a useful ROP/JOP sequence (very uncommon) >>> 2a. A uncommon TRAP hopefully with some immediate data encoded (maybe uncommon) >>> 2b. A NOP to avoid affecting speculative execution (maybe uncommon) >>> >>> With 2a/2b being roughly equivalent depending on speculative execution policy.
On 4/9/19 9:58 AM, Tulio Magno Quites Machado Filho wrote: > Alan Modra <amodra@gmail.com> writes: >> Yes, looks fine to me, except that in VLE mode (do we care?) >> ".long 0x0fe50553" disassembles as >> 0: 0f e5 se_cmphl r5,r30 >> 2: 05 53 se_mullw r3,r5 >> No illegal/trap/privileged insn there. >> >> ".long 0x0fe5000b" might be better to cover VLE. > > Looks good for me too. The requirement that it be a valid instruction is simply to aid in the disassembly of rseq regions which may be hand written assembly with a thin veneer of CFI/DWARF information. It has already been pointed out that POWER uses data in the instruction stream for jump tables to implement switch statements, but that specific use has compiler support and one presumes good debug information. So as Alan says, there is already data in the insn stream, though such things can't be good for performance (pollutes D-cache, problematic for speculative execution). > Actually, it better fits what Carlos O'Donnell had requested: > >>>> I think the order of preference is: >>>> >>>> 1. An uncommon insn (with random immediate values), in a literal pool, that is >>>> not a useful ROP/JOP sequence (very uncommon) >>>> 2a. A uncommon TRAP hopefully with some immediate data encoded (maybe uncommon) >>>> 2b. A NOP to avoid affecting speculative execution (maybe uncommon) >>>> >>>> With 2a/2b being roughly equivalent depending on speculative execution policy. Yes, though "in a literal pool" is something that is not required, since users might not want literal pools and so we shouldn't require that feature (it also pollutes D-cache). Keep in mind the insn will never execute. If a trap insn calls out the nature of the signature more clearly then use that instead.
----- On Apr 9, 2019, at 10:13 AM, Carlos O'Donell codonell@redhat.com wrote: > On 4/9/19 9:58 AM, Tulio Magno Quites Machado Filho wrote: >> Alan Modra <amodra@gmail.com> writes: >>> Yes, looks fine to me, except that in VLE mode (do we care?) >>> ".long 0x0fe50553" disassembles as >>> 0: 0f e5 se_cmphl r5,r30 >>> 2: 05 53 se_mullw r3,r5 >>> No illegal/trap/privileged insn there. >>> >>> ".long 0x0fe5000b" might be better to cover VLE. >> >> Looks good for me too. > > The requirement that it be a valid instruction is simply to aid in the > disassembly of rseq regions which may be hand written assembly with a > thin veneer of CFI/DWARF information. > > It has already been pointed out that POWER uses data in the instruction > stream for jump tables to implement switch statements, but that specific > use has compiler support and one presumes good debug information. So as > Alan says, there is already data in the insn stream, though such things > can't be good for performance (pollutes D-cache, problematic for > speculative execution). > >> Actually, it better fits what Carlos O'Donnell had requested: >> >>>>> I think the order of preference is: >>>>> >>>>> 1. An uncommon insn (with random immediate values), in a literal pool, that is >>>>> not a useful ROP/JOP sequence (very uncommon) >>>>> 2a. A uncommon TRAP hopefully with some immediate data encoded (maybe uncommon) >>>>> 2b. A NOP to avoid affecting speculative execution (maybe uncommon) >>>>> >>>>> With 2a/2b being roughly equivalent depending on speculative execution policy. > > Yes, though "in a literal pool" is something that is not required, since > users might not want literal pools and so we shouldn't require that > feature (it also pollutes D-cache). > > Keep in mind the insn will never execute. > > If a trap insn calls out the nature of the signature more clearly then > use that instead. So based on the recent discussions, there are a few things we can conclude: - Choosing a random value and relying on literal pools is a bad idea, because some compilation environments disable them entirely, - We ideally want the signature to be a valid instruction in the instruction set so disassembler/emulator tools don't get confused and we don't hurt speculative execution. - Best option is a trap with an unlikely immediate opcode, because it traps on the instruction in case the program try to execute it by mistake, - Second best would be a no-op with an unlikely immediate opcode, - We may want to stay away from privileged instructions because they can confuse emulators with may try to emulate them, - Some architectures have big endian/little endian variants. We may need to carefully #ifdef each case so the numeric value matches actual instructions, - Some architectures have extensions to their instruction set (e.g. ARM thumb, power VLE) which can be combined with the basic instruction set within the same program. We need to decide whether we care what those signatures look like in those instruction set extensions or not. Is it a best effort to match real instructions or a hard requirement ? If it's a hard requirement, we may need to extend the rseq system call with new flags to accept more than one signature. Thoughts ? Thanks, Mathieu
----- On Apr 8, 2019, at 5:45 PM, Carlos O'Donell codonell@redhat.com wrote: > On 4/8/19 3:20 PM, Tulio Magno Quites Machado Filho wrote: >> Carlos O'Donell <codonell@redhat.com> writes: >> >>> On 4/5/19 5:16 AM, Florian Weimer wrote: >>>> * Carlos O'Donell: >>>>> It is valuable that it be a trap, particularly for constant pools because >>>>> it means that a jump into the constant pool will trap. >>>> >>>> Sorry, I don't understand why this matters in this context. Would you >>>> please elaborate? >>> >>> Sorry, I wasn't very clear. >>> >>> My point is only that any accidental jumps, either with off-by-one (like you >>> fixed in gcc/glibc's signal unwinding most recently), result in a process fault >>> rather than executing RSEQ_SIG as a valid instruction *and then* continuing >>> onwards to the handler. >>> >>> A process fault is achieved either by a trap, or an invalid instruction, or >>> a privileged insn (like suggested for MIPS in this thread). >> >> In that case, mtmsr (Move to Machine State Register) seems a good candidate. >> >> mtmsr is available both on 32 and 64 bits since their first implementations. >> >> It's a privileged instruction and should never appear in userspace >> code (causes SIGILL). >> >> Any comments? > > That seems good to me. > > Mathieu, > > What's required to move this forward for POWER? First, we need to decide whether we need rseq to support more than one signature per process to cover the VLE extension. If so, we'd need to extend rseq with a new flag for future kernels. Then, ideally, we'd need a patch on top of the Linux kernel tools/testing/selftests/rseq/rseq-ppc.h file that updates the signature value. I think the current discussion leads us towards a trap: /* * TODO: document instruction objdump output on each architecture and VLE * instruction sets. */ #define RSEQ_SIG 0x0fe5000b Should we do anything specific for big/little endian ? Is the byte order of the instruction encoding the same as data ? Thanks, Mathieu
----- On Apr 4, 2019, at 5:41 PM, Paul Burton paul.burton@mips.com wrote: > Hi Carlos / all, > > On Thu, Apr 04, 2019 at 04:50:08PM -0400, Carlos O'Donell wrote: >> > > > +/* Signature required before each abort handler code. */ >> > > > +#define RSEQ_SIG 0x53053053 >> > > >> > > Why isn't this a mips-specific op code? >> > >> > MIPS also has a literal pool just before the abort handler, and it >> > jumps over it. My understanding is that we can use any signature value >> > we want, and it does not need to be a valid instruction, similarly to ARM: >> > >> > #define __RSEQ_ASM_DEFINE_ABORT(table_label, label, teardown, \ >> > abort_label, version, flags, \ >> > start_ip, post_commit_offset, abort_ip) \ >> > ".balign 32\n\t" \ >> > __rseq_str(table_label) ":\n\t" \ >> > ".word " __rseq_str(version) ", " __rseq_str(flags) "\n\t" \ >> > LONG " " U32_U64_PAD(__rseq_str(start_ip)) "\n\t" \ >> > LONG " " U32_U64_PAD(__rseq_str(post_commit_offset)) "\n\t" \ >> > LONG " " U32_U64_PAD(__rseq_str(abort_ip)) "\n\t" \ >> > ".word " __rseq_str(RSEQ_SIG) "\n\t" \ >> > __rseq_str(label) ":\n\t" \ >> > teardown \ >> > "b %l[" __rseq_str(abort_label) "]\n\t" >> > >> > Perhaps Paul Burton can confirm this ? >> >> Yes please. >> >> You also want to avoid the value being a valid MIPS insn that's common. >> >> Did you check that? > > This does not decode as a standard MIPS instruction, though it does > decode for both the microMIPS (ori) & nanoMIPS (lwxs; sll) ISAs. > > I imagine I copied the value from another architecture when porting, and > since it doesn't get executed it seemed fine. > > One maybe nicer option along the same lines would be 0x72736571 or > 0x71657372 (ASCII 'rseq') neither of which decode as a MIPS instruction. > >> I think the order of preference is: >> >> 1. An uncommon insn (with random immediate values), in a literal pool, that is >> not a useful ROP/JOP sequence (very uncommon) > > For that option on MIPS we could do something like: > > sll $0, $0, 31 # effectively a nop, but looks weird > >> 2a. A uncommon TRAP hopefully with some immediate data encoded (maybe uncommon) > > Our break instruction has a 19b immediate in nanoMIPS (20b for microMIPS > & classic MIPS) so that could be something like: > > break 0x7273 # ASCII 'rs' > > That's pretty unlikely to be seen in normal code, or the teq instruction > has a rarely used code field (4b in microMIPS, 5b in nanoMIPS, 10b in > classic MIPS) that's meaningless to hardware so something like this > would be possible: > > teq $0, $0, 0x8 # ASCII backspace > >> 2b. A NOP to avoid affecting speculative execution (maybe uncommon) >> >> With 2a/2b being roughly equivalent depending on speculative execution policy. > > There are a bunch of potential odd looking nops possible, one of which > would be the sll I mentioned above. > > Another option would be to use a priveleged instruction which userland > code can't execute & should normally never contain. That would decode as > a valid instruction & effectively behave like a trap instruction but > look very odd to anyone reading disassembled code. eg: > > mfc0 $0, 13 # Try to read the cause register; take SIGILL > > In order to handle MIPS vs microMIPS vs nanoMIPS differences I'm > thinking it may be best to switch to one of these real instructions that > looks strange. The ugly part would be the nest of #ifdef's to deal with > endianness & ISA when defining it as a number... Note that we can have different signatures for each sub-architecture, as long as they don't have to co-exist within the same process. Ideally we'd need a patch on top of the Linux kernel tools/testing/selftests/rseq/rseq-mips.h file that updates the signature value. I think the current discussion leads us towards a trap with unlikely immediate operand. Note that we can special-case with #ifdef for each sub-architecture and endianness if need be. /* * TODO: document trap instruction objdump output on each sub-architecture * instruction sets. */ #define RSEQ_SIG 0x######## Should we do anything specific for big/little endian ? Is the byte order of the instruction encoding the same as data ? Thanks, Mathieu
----- On Apr 9, 2019, at 5:29 AM, Alan Modra amodra@gmail.com wrote: > On Tue, Apr 09, 2019 at 02:23:53PM +1000, Michael Ellerman wrote: >> I'd much rather we use a trap with a specific immediate value. Otherwise >> someone's going to waste time one day puzzling over why userspace is >> doing mtmsr. > > It's data. We have other data in executable sections. Anyone who > wonders about odd disassembly just hasn't realized they are > disassembling data. > >> It would also complicate things if we ever wanted to emulate mtmsr. > > No, because it won't be executed. If I understand correctly, the only > reason to choose an illegal, trap or privileged insn is to halt > execution earlier rather than later when a program goes off in the > weeds. > >> If we want something that is a trap rather than a nop then use 0x0fe50553. >> >> That's "compare the value in r5 with 0x553 and then trap unconditionally". >> >> It shows up in objdump as: >> >> 10000000: 53 05 e5 0f twui r5,1363 >> >> >> The immediate can be anything, I chose that value to mimic the x86 value >> Mathieu mentioned. >> >> There's no reason that instruction would ever be generated because the >> immediate value serves no purpose. So it satisfies the "very unlikely >> to appear" criteria AFAICS. > > Yes, looks fine to me, except that in VLE mode (do we care?) > ".long 0x0fe50553" disassembles as > 0: 0f e5 se_cmphl r5,r30 > 2: 05 53 se_mullw r3,r5 > No illegal/trap/privileged insn there. > > ".long 0x0fe5000b" might be better to cover VLE. Can you share with us the objdump output of ".long 0x0fe5000b" in VLE mode ? VLE mode support does not appear to be available in typical toolchains. Also, is VLE mode only for powerpc 32 be, or also for powerpc 64 be/le ? Thanks, Mathieu
----- On Apr 4, 2019, at 5:41 PM, Paul Burton paul.burton@mips.com wrote: [...] >> 2a. A uncommon TRAP hopefully with some immediate data encoded (maybe uncommon) > > Our break instruction has a 19b immediate in nanoMIPS (20b for microMIPS > & classic MIPS) so that could be something like: > > break 0x7273 # ASCII 'rs' > Hi Paul, I like this uncommon break instruction as signature choice. However, if I try to compile assembler with a break 0x7273 instruction with mips64 and mips32 toolchains (gcc version 8.2.0 (Ubuntu 8.2.0-1ubuntu2~18.04)) I get: /tmp/ccVh9F7T.s: Assembler messages: /tmp/ccVh9F7T.s:24: Error: operand 1 out of range `break 0x7273' It works up to the value 0x3FF, which seems to use the top 10 code bits: a: 03ff 0007 break 0x3ff Would a "break 0x350" be a good choice as well ? Any idea why 0x7273 is not accepted by my assembler ? I also tried crafting the assembler with values between 0x3FF and 0x7273 in the 20 code bits. It seems fine from an objdump perspective: ".long 0x03FFFC7\n\t" generates: 10: 003f ffc7 break 0x3f,0x3ff What I don't understand is why the instruction generated by my toolchain ends with the last 6 bits "000111", whereas the mips32 instruction set specifies break as ending with "001101" [1]. What am I missing ? Also, the nanomips break code [2] has a completely different instruction layout. Should we use a different signature when compiling for nanomips ? What #ifdef should we use ? Do I need a special toolchain to generate nanomips binaries ? Thanks, Mathieu [1] http://hades.mech.northwestern.edu/images/1/16/MIPS32_Architecture_Volume_II-A_Instruction_Set.pdf [2] https://s3-eu-west-1.amazonaws.com/downloads-mips/I7200/I7200+product+launch/MIPS_nanomips32_ISA_TRM_01_01_MD01247.pdf
----- On Apr 18, 2019, at 2:58 PM, Mathieu Desnoyers mathieu.desnoyers@efficios.com wrote: > ----- On Apr 4, 2019, at 5:41 PM, Paul Burton paul.burton@mips.com wrote: > [...] > >>> 2a. A uncommon TRAP hopefully with some immediate data encoded (maybe uncommon) >> >> Our break instruction has a 19b immediate in nanoMIPS (20b for microMIPS >> & classic MIPS) so that could be something like: >> >> break 0x7273 # ASCII 'rs' >> > > Hi Paul, > > I like this uncommon break instruction as signature choice. > > However, if I try to compile assembler with a break 0x7273 instruction > with mips64 and mips32 toolchains (gcc version 8.2.0 (Ubuntu > 8.2.0-1ubuntu2~18.04)) > I get: > > /tmp/ccVh9F7T.s: Assembler messages: > /tmp/ccVh9F7T.s:24: Error: operand 1 out of range `break 0x7273' > > It works up to the value 0x3FF, which seems to use the top 10 > code bits: > > a: 03ff 0007 break 0x3ff > > Would a "break 0x350" be a good choice as well ? > > Any idea why 0x7273 is not accepted by my assembler ? > > I also tried crafting the assembler with values between 0x3FF and 0x7273 > in the 20 code bits. It seems fine from an objdump perspective: > > ".long 0x03FFFC7\n\t" > > generates: > > 10: 003f ffc7 break 0x3f,0x3ff > > What I don't understand is why the instruction generated by my > toolchain ends with the last 6 bits "000111", whereas the mips32 > instruction set specifies break as ending with "001101" [1]. > What am I missing ? > > Also, the nanomips break code [2] has a completely different > instruction layout. Should we use a different signature when > compiling for nanomips ? What #ifdef should we use ? Do I > need a special toolchain to generate nanomips binaries ? Hi Paul, I'm still waiting for feedback on the MIPS front. Meanwhile, I plan to use #define RSEQ_SIG 0x0350000d which maps to: 0350000d break 0x350 and use RSEQ_SIG in assembly with: ".word " __rseq_str(RSEQ_SIG) "\n\t" on big and little endian MIPS, for MIPS32 and MIPS64, based on code generated with gcc version 8.2.0 (Ubuntu 8.2.0-1ubuntu2~18.04). Let me know if it needs to be tweaked. Thanks, Mathieu > > Thanks, > > Mathieu > > [1] > http://hades.mech.northwestern.edu/images/1/16/MIPS32_Architecture_Volume_II-A_Instruction_Set.pdf > [2] > https://s3-eu-west-1.amazonaws.com/downloads-mips/I7200/I7200+product+launch/MIPS_nanomips32_ISA_TRM_01_01_MD01247.pdf > > -- > Mathieu Desnoyers > EfficiOS Inc. > http://www.efficios.com
On Wed, 24 Apr 2019, Paul Burton wrote: > > > Any idea why 0x7273 is not accepted by my assembler ? > > I don't know why the assembler wants a smaller immediate than the > instruction encoding allows... There's a comment in the binutils file > include/opcode/mips.h that reads: > > > A breakpoint instruction uses OP, CODE and SPEC (10 bits of the > > breakpoint instruction are not defined; Kane says the breakpoint code > > field in BREAK is 20 bits; yet MIPS assemblers and debuggers only use > > ten bits). An optional two-operand form of break/sdbbp allows the > > lower ten bits to be set too, and MIPS32 and later architectures allow > > 20 bits to be set with a signal operand (using CODE20). > > I suspect there's some history here that predates my involvement (or > possibly just predates me). A useful explanation is in the Linux kernel (always good to look there), in arch/mips/kernel/traps.c: /* * There is the ancient bug in the MIPS assemblers that the break * code starts left to bit 16 instead to bit 6 in the opcode. * Gas is bug-compatible, but not always, grrr... * We handle both cases with a simple heuristics. --macro */ Unfortunately the bug has been carried over to the microMIPS instruction encoding in libopcodes for no reason (i.e. likely by copying the table mechanically without analysing it) and I didn't catch it when upstreaming. We should have permitted setting all bits in the 20-bit code field in the microMIPS encoding with a single operand, but you need two, like with the regular MIPS instruction set. The note on the MIPS32 assembly ISA permitting to set all the 20 bits with a single operand is a stale comment referring to the situation before binutils commit 1586d91e32ea ("/ 0 should send SIGFPE not SIGTRAP..."), <https://sourceware.org/ml/binutils/2004-07/msg00260.html>, which addressed a user ABI compatibility issue as discussed upthread here: <https://sourceware.org/ml/binutils/2004-06/msg00188.html> and previously: <https://www.linux-mips.org/cgi-bin/mesg.cgi?a=linux-mips&i=40C9F5A4.2050606%40avtrex.com>. As this is my mistake with the stale note, I have applied a fix to binutils now, commit cd0923370be1 ("MIPS/include: opcode/mips.h: Update stale comment for CODE20 operand"), so that it is clear that it is only SDBBP that accepts a single 20-bit operand for the code field (for the MIPS32 and later ISAs). FWIW, Maciej
diff --git a/NEWS b/NEWS index 912a9bdc0f..0608c60f7d 100644 --- a/NEWS +++ b/NEWS @@ -5,6 +5,17 @@ See the end for copying conditions. Please send GNU C library bug reports via <https://sourceware.org/bugzilla/> using `glibc' in the "product" field. +Version 2.30 + +Major new features: + +* Support for automatically registering threads with the Linux rseq(2) + system call has been added. This system call is implemented starting + from Linux 4.18. In order to be activated, it requires that glibc is built + against kernel headers that include this system call, and that glibc + detects availability of that system call at runtime. + + Version 2.29 Major new features: diff --git a/csu/libc-start.c b/csu/libc-start.c index 5d9c3675fa..8680afc0ef 100644 --- a/csu/libc-start.c +++ b/csu/libc-start.c @@ -22,6 +22,7 @@ #include <ldsodefs.h> #include <exit-thread.h> #include <libc-internal.h> +#include <rseq-internal.h> #include <elf/dl-tunables.h> @@ -140,7 +141,10 @@ LIBC_START_MAIN (int (*main) (int, char **, char ** MAIN_AUXVEC_DECL), __libc_multiple_libcs = &_dl_starting_up && !_dl_starting_up; -#ifndef SHARED +#ifdef SHARED + /* Register rseq ABI to the kernel. */ + (void) rseq_register_current_thread (); +#else _dl_relocate_static_pie (); char **ev = &argv[argc + 1]; @@ -218,6 +222,9 @@ LIBC_START_MAIN (int (*main) (int, char **, char ** MAIN_AUXVEC_DECL), } # endif + /* Register rseq ABI to the kernel. */ + (void) rseq_register_current_thread (); + /* Initialize libpthread if linked in. */ if (__pthread_initialize_minimal != NULL) __pthread_initialize_minimal (); @@ -230,8 +237,7 @@ LIBC_START_MAIN (int (*main) (int, char **, char ** MAIN_AUXVEC_DECL), # else __pointer_chk_guard_local = pointer_chk_guard; # endif - -#endif /* !SHARED */ +#endif /* Register the destructor of the dynamic linker if there is any. */ if (__glibc_likely (rtld_fini != NULL)) diff --git a/misc/Makefile b/misc/Makefile index cf0daa1161..0ae1dbaf80 100644 --- a/misc/Makefile +++ b/misc/Makefile @@ -36,7 +36,8 @@ headers := sys/uio.h bits/uio-ext.h bits/uio_lim.h \ syslog.h sys/syslog.h \ bits/syslog.h bits/syslog-ldbl.h bits/syslog-path.h bits/error.h \ bits/select2.h bits/hwcap.h sys/auxv.h \ - sys/sysmacros.h bits/sysmacros.h bits/types/struct_iovec.h + sys/sysmacros.h bits/sysmacros.h bits/types/struct_iovec.h \ + rseq-internal.h routines := brk sbrk sstk ioctl \ readv writev preadv preadv64 pwritev pwritev64 \ diff --git a/misc/rseq-internal.h b/misc/rseq-internal.h new file mode 100644 index 0000000000..915122e4bf --- /dev/null +++ b/misc/rseq-internal.h @@ -0,0 +1,34 @@ +/* Copyright (C) 2018 Free Software Foundation, Inc. + This file is part of the GNU C Library. + Contributed by Mathieu Desnoyers <mathieu.desnoyers@efficios.com>, 2018. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <http://www.gnu.org/licenses/>. */ + +#ifndef RSEQ_INTERNAL_H +#define RSEQ_INTERNAL_H + +static inline int +rseq_register_current_thread (void) +{ + return -1; +} + +static inline int +rseq_unregister_current_thread (void) +{ + return -1; +} + +#endif /* rseq-internal.h */ diff --git a/nptl/pthread_create.c b/nptl/pthread_create.c index 2bd2b10727..90b3419390 100644 --- a/nptl/pthread_create.c +++ b/nptl/pthread_create.c @@ -33,6 +33,7 @@ #include <default-sched.h> #include <futex-internal.h> #include <tls-setup.h> +#include <rseq-internal.h> #include "libioP.h" #include <shlib-compat.h> @@ -378,6 +379,7 @@ __free_tcb (struct pthread *pd) START_THREAD_DEFN { struct pthread *pd = START_THREAD_SELF; + bool has_rseq = false; #if HP_TIMING_AVAIL /* Remember the time when the thread was started. */ @@ -396,6 +398,9 @@ START_THREAD_DEFN if (__glibc_unlikely (atomic_exchange_acq (&pd->setxid_futex, 0) == -2)) futex_wake (&pd->setxid_futex, 1, FUTEX_PRIVATE); + /* Register rseq TLS to the kernel. */ + has_rseq = !rseq_register_current_thread (); + #ifdef __NR_set_robust_list # ifndef __ASSUME_SET_ROBUST_LIST if (__set_robust_list_avail >= 0) @@ -573,6 +578,10 @@ START_THREAD_DEFN } #endif + /* Unregister rseq TLS from kernel. */ + if (has_rseq && rseq_unregister_current_thread ()) + abort(); + advise_stack_range (pd->stackblock, pd->stackblock_size, (uintptr_t) pd, pd->guardsize); diff --git a/sysdeps/unix/sysv/linux/Makefile b/sysdeps/unix/sysv/linux/Makefile index 5f8c2c7c7d..5b541469ec 100644 --- a/sysdeps/unix/sysv/linux/Makefile +++ b/sysdeps/unix/sysv/linux/Makefile @@ -1,5 +1,5 @@ ifeq ($(subdir),csu) -sysdep_routines += errno-loc +sysdep_routines += errno-loc rseq-sym endif ifeq ($(subdir),assert) @@ -48,7 +48,7 @@ sysdep_headers += sys/mount.h sys/acct.h sys/sysctl.h \ bits/termios-c_iflag.h bits/termios-c_oflag.h \ bits/termios-baud.h bits/termios-c_cflag.h \ bits/termios-c_lflag.h bits/termios-tcflow.h \ - bits/termios-misc.h + bits/termios-misc.h sys/rseq.h bits/rseq.h tests += tst-clone tst-clone2 tst-clone3 tst-fanotify tst-personality \ tst-quota tst-sync_file_range tst-sysconf-iov_max tst-ttyname \ diff --git a/sysdeps/unix/sysv/linux/Versions b/sysdeps/unix/sysv/linux/Versions index f1e12d9c69..ad88c2b7ff 100644 --- a/sysdeps/unix/sysv/linux/Versions +++ b/sysdeps/unix/sysv/linux/Versions @@ -174,6 +174,10 @@ libc { GLIBC_2.29 { getcpu; } + GLIBC_2.30 { + __rseq_abi; + __rseq_lib_abi; + } GLIBC_PRIVATE { # functions used in other libraries __syscall_rt_sigqueueinfo; diff --git a/sysdeps/unix/sysv/linux/aarch64/bits/rseq.h b/sysdeps/unix/sysv/linux/aarch64/bits/rseq.h new file mode 100644 index 0000000000..543bc5388a --- /dev/null +++ b/sysdeps/unix/sysv/linux/aarch64/bits/rseq.h @@ -0,0 +1,24 @@ +/* Copyright (C) 2019 Free Software Foundation, Inc. + This file is part of the GNU C Library. + Contributed by Mathieu Desnoyers <mathieu.desnoyers@efficios.com>, 2019. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <http://www.gnu.org/licenses/>. */ + +#ifndef _SYS_RSEQ_H +# error "Never use <bits/rseq.h> directly; include <sys/rseq.h> instead." +#endif + +/* Signature required before each abort handler code. */ +#define RSEQ_SIG 0xd428bc00 /* BRK #0x45E0. */ diff --git a/sysdeps/unix/sysv/linux/aarch64/libc.abilist b/sysdeps/unix/sysv/linux/aarch64/libc.abilist index 9c330f325e..bc937f585d 100644 --- a/sysdeps/unix/sysv/linux/aarch64/libc.abilist +++ b/sysdeps/unix/sysv/linux/aarch64/libc.abilist @@ -2141,3 +2141,5 @@ GLIBC_2.28 thrd_yield F GLIBC_2.29 getcpu F GLIBC_2.29 posix_spawn_file_actions_addchdir_np F GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F +GLIBC_2.30 __rseq_abi T 0x20 +GLIBC_2.30 __rseq_lib_abi T 0x8 diff --git a/sysdeps/unix/sysv/linux/alpha/libc.abilist b/sysdeps/unix/sysv/linux/alpha/libc.abilist index f630fa4c6f..89cc8b1cfb 100644 --- a/sysdeps/unix/sysv/linux/alpha/libc.abilist +++ b/sysdeps/unix/sysv/linux/alpha/libc.abilist @@ -2036,6 +2036,8 @@ GLIBC_2.28 thrd_yield F GLIBC_2.29 getcpu F GLIBC_2.29 posix_spawn_file_actions_addchdir_np F GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F +GLIBC_2.30 __rseq_abi T 0x20 +GLIBC_2.30 __rseq_lib_abi T 0x8 GLIBC_2.3 __ctype_b_loc F GLIBC_2.3 __ctype_tolower_loc F GLIBC_2.3 __ctype_toupper_loc F diff --git a/sysdeps/unix/sysv/linux/arm/bits/rseq.h b/sysdeps/unix/sysv/linux/arm/bits/rseq.h new file mode 100644 index 0000000000..19d3755837 --- /dev/null +++ b/sysdeps/unix/sysv/linux/arm/bits/rseq.h @@ -0,0 +1,24 @@ +/* Copyright (C) 2019 Free Software Foundation, Inc. + This file is part of the GNU C Library. + Contributed by Mathieu Desnoyers <mathieu.desnoyers@efficios.com>, 2019. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <http://www.gnu.org/licenses/>. */ + +#ifndef _SYS_RSEQ_H +# error "Never use <bits/rseq.h> directly; include <sys/rseq.h> instead." +#endif + +/* Signature required before each abort handler code. */ +#define RSEQ_SIG 0x53053053 diff --git a/sysdeps/unix/sysv/linux/arm/libc.abilist b/sysdeps/unix/sysv/linux/arm/libc.abilist index b96f45590f..e5055f2d4e 100644 --- a/sysdeps/unix/sysv/linux/arm/libc.abilist +++ b/sysdeps/unix/sysv/linux/arm/libc.abilist @@ -126,6 +126,8 @@ GLIBC_2.28 thrd_yield F GLIBC_2.29 getcpu F GLIBC_2.29 posix_spawn_file_actions_addchdir_np F GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F +GLIBC_2.30 __rseq_abi T 0x20 +GLIBC_2.30 __rseq_lib_abi T 0x8 GLIBC_2.4 _Exit F GLIBC_2.4 _IO_2_1_stderr_ D 0xa0 GLIBC_2.4 _IO_2_1_stdin_ D 0xa0 diff --git a/sysdeps/unix/sysv/linux/bits/rseq.h b/sysdeps/unix/sysv/linux/bits/rseq.h new file mode 100644 index 0000000000..d60f02cfeb --- /dev/null +++ b/sysdeps/unix/sysv/linux/bits/rseq.h @@ -0,0 +1,24 @@ +/* Copyright (C) 2019 Free Software Foundation, Inc. + This file is part of the GNU C Library. + Contributed by Mathieu Desnoyers <mathieu.desnoyers@efficios.com>, 2019. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <http://www.gnu.org/licenses/>. */ + +#ifndef _SYS_RSEQ_H +# error "Never use <bits/rseq.h> directly; include <sys/rseq.h> instead." +#endif + +/* Each architecture supporting rseq should define RSEQ_SIG as a 32-bit + signature inserted before each rseq abort label in the code section. */ diff --git a/sysdeps/unix/sysv/linux/hppa/libc.abilist b/sysdeps/unix/sysv/linux/hppa/libc.abilist index 088a8ee369..546d073cdb 100644 --- a/sysdeps/unix/sysv/linux/hppa/libc.abilist +++ b/sysdeps/unix/sysv/linux/hppa/libc.abilist @@ -1883,6 +1883,8 @@ GLIBC_2.28 thrd_yield F GLIBC_2.29 getcpu F GLIBC_2.29 posix_spawn_file_actions_addchdir_np F GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F +GLIBC_2.30 __rseq_abi T 0x20 +GLIBC_2.30 __rseq_lib_abi T 0x8 GLIBC_2.3 __ctype_b_loc F GLIBC_2.3 __ctype_tolower_loc F GLIBC_2.3 __ctype_toupper_loc F diff --git a/sysdeps/unix/sysv/linux/i386/libc.abilist b/sysdeps/unix/sysv/linux/i386/libc.abilist index f7ff2c57b9..ac1de6e4b3 100644 --- a/sysdeps/unix/sysv/linux/i386/libc.abilist +++ b/sysdeps/unix/sysv/linux/i386/libc.abilist @@ -2048,6 +2048,8 @@ GLIBC_2.28 thrd_yield F GLIBC_2.29 getcpu F GLIBC_2.29 posix_spawn_file_actions_addchdir_np F GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F +GLIBC_2.30 __rseq_abi T 0x20 +GLIBC_2.30 __rseq_lib_abi T 0x8 GLIBC_2.3 __ctype_b_loc F GLIBC_2.3 __ctype_tolower_loc F GLIBC_2.3 __ctype_toupper_loc F diff --git a/sysdeps/unix/sysv/linux/ia64/libc.abilist b/sysdeps/unix/sysv/linux/ia64/libc.abilist index becd8b1033..cc3445b958 100644 --- a/sysdeps/unix/sysv/linux/ia64/libc.abilist +++ b/sysdeps/unix/sysv/linux/ia64/libc.abilist @@ -1917,6 +1917,8 @@ GLIBC_2.28 thrd_yield F GLIBC_2.29 getcpu F GLIBC_2.29 posix_spawn_file_actions_addchdir_np F GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F +GLIBC_2.30 __rseq_abi T 0x20 +GLIBC_2.30 __rseq_lib_abi T 0x8 GLIBC_2.3 __ctype_b_loc F GLIBC_2.3 __ctype_tolower_loc F GLIBC_2.3 __ctype_toupper_loc F diff --git a/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist b/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist index 74e42a5209..f7e28bd5a0 100644 --- a/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist +++ b/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist @@ -127,6 +127,8 @@ GLIBC_2.28 thrd_yield F GLIBC_2.29 getcpu F GLIBC_2.29 posix_spawn_file_actions_addchdir_np F GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F +GLIBC_2.30 __rseq_abi T 0x20 +GLIBC_2.30 __rseq_lib_abi T 0x8 GLIBC_2.4 _Exit F GLIBC_2.4 _IO_2_1_stderr_ D 0x98 GLIBC_2.4 _IO_2_1_stdin_ D 0x98 diff --git a/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist b/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist index 4af5a74e8a..b8f00f6111 100644 --- a/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist +++ b/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist @@ -1992,6 +1992,8 @@ GLIBC_2.28 thrd_yield F GLIBC_2.29 getcpu F GLIBC_2.29 posix_spawn_file_actions_addchdir_np F GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F +GLIBC_2.30 __rseq_abi T 0x20 +GLIBC_2.30 __rseq_lib_abi T 0x8 GLIBC_2.3 __ctype_b_loc F GLIBC_2.3 __ctype_tolower_loc F GLIBC_2.3 __ctype_toupper_loc F diff --git a/sysdeps/unix/sysv/linux/microblaze/libc.abilist b/sysdeps/unix/sysv/linux/microblaze/libc.abilist index ccef673fd2..19f191434f 100644 --- a/sysdeps/unix/sysv/linux/microblaze/libc.abilist +++ b/sysdeps/unix/sysv/linux/microblaze/libc.abilist @@ -2133,3 +2133,5 @@ GLIBC_2.28 thrd_yield F GLIBC_2.29 getcpu F GLIBC_2.29 posix_spawn_file_actions_addchdir_np F GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F +GLIBC_2.30 __rseq_abi T 0x20 +GLIBC_2.30 __rseq_lib_abi T 0x8 diff --git a/sysdeps/unix/sysv/linux/mips/bits/rseq.h b/sysdeps/unix/sysv/linux/mips/bits/rseq.h new file mode 100644 index 0000000000..19d3755837 --- /dev/null +++ b/sysdeps/unix/sysv/linux/mips/bits/rseq.h @@ -0,0 +1,24 @@ +/* Copyright (C) 2019 Free Software Foundation, Inc. + This file is part of the GNU C Library. + Contributed by Mathieu Desnoyers <mathieu.desnoyers@efficios.com>, 2019. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <http://www.gnu.org/licenses/>. */ + +#ifndef _SYS_RSEQ_H +# error "Never use <bits/rseq.h> directly; include <sys/rseq.h> instead." +#endif + +/* Signature required before each abort handler code. */ +#define RSEQ_SIG 0x53053053 diff --git a/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist index 1054bb599e..fe43507f55 100644 --- a/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist +++ b/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist @@ -1970,6 +1970,8 @@ GLIBC_2.28 thrd_yield F GLIBC_2.29 getcpu F GLIBC_2.29 posix_spawn_file_actions_addchdir_np F GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F +GLIBC_2.30 __rseq_abi T 0x20 +GLIBC_2.30 __rseq_lib_abi T 0x8 GLIBC_2.3 __ctype_b_loc F GLIBC_2.3 __ctype_tolower_loc F GLIBC_2.3 __ctype_toupper_loc F diff --git a/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist index 4f5b5ffebf..b247c6ea9b 100644 --- a/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist +++ b/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist @@ -1968,6 +1968,8 @@ GLIBC_2.28 thrd_yield F GLIBC_2.29 getcpu F GLIBC_2.29 posix_spawn_file_actions_addchdir_np F GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F +GLIBC_2.30 __rseq_abi T 0x20 +GLIBC_2.30 __rseq_lib_abi T 0x8 GLIBC_2.3 __ctype_b_loc F GLIBC_2.3 __ctype_tolower_loc F GLIBC_2.3 __ctype_toupper_loc F diff --git a/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist index 943aee58d4..5339ca52b6 100644 --- a/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist +++ b/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist @@ -1976,6 +1976,8 @@ GLIBC_2.28 thrd_yield F GLIBC_2.29 getcpu F GLIBC_2.29 posix_spawn_file_actions_addchdir_np F GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F +GLIBC_2.30 __rseq_abi T 0x20 +GLIBC_2.30 __rseq_lib_abi T 0x8 GLIBC_2.3 __ctype_b_loc F GLIBC_2.3 __ctype_tolower_loc F GLIBC_2.3 __ctype_toupper_loc F diff --git a/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist index 17a5d17ef9..11f24eb440 100644 --- a/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist +++ b/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist @@ -1971,6 +1971,8 @@ GLIBC_2.28 thrd_yield F GLIBC_2.29 getcpu F GLIBC_2.29 posix_spawn_file_actions_addchdir_np F GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F +GLIBC_2.30 __rseq_abi T 0x20 +GLIBC_2.30 __rseq_lib_abi T 0x8 GLIBC_2.3 __ctype_b_loc F GLIBC_2.3 __ctype_tolower_loc F GLIBC_2.3 __ctype_toupper_loc F diff --git a/sysdeps/unix/sysv/linux/nios2/libc.abilist b/sysdeps/unix/sysv/linux/nios2/libc.abilist index 4d62a540fd..fd223bfc44 100644 --- a/sysdeps/unix/sysv/linux/nios2/libc.abilist +++ b/sysdeps/unix/sysv/linux/nios2/libc.abilist @@ -2174,3 +2174,5 @@ GLIBC_2.28 thrd_yield F GLIBC_2.29 getcpu F GLIBC_2.29 posix_spawn_file_actions_addchdir_np F GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F +GLIBC_2.30 __rseq_abi T 0x20 +GLIBC_2.30 __rseq_lib_abi T 0x8 diff --git a/sysdeps/unix/sysv/linux/powerpc/bits/rseq.h b/sysdeps/unix/sysv/linux/powerpc/bits/rseq.h new file mode 100644 index 0000000000..19d3755837 --- /dev/null +++ b/sysdeps/unix/sysv/linux/powerpc/bits/rseq.h @@ -0,0 +1,24 @@ +/* Copyright (C) 2019 Free Software Foundation, Inc. + This file is part of the GNU C Library. + Contributed by Mathieu Desnoyers <mathieu.desnoyers@efficios.com>, 2019. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <http://www.gnu.org/licenses/>. */ + +#ifndef _SYS_RSEQ_H +# error "Never use <bits/rseq.h> directly; include <sys/rseq.h> instead." +#endif + +/* Signature required before each abort handler code. */ +#define RSEQ_SIG 0x53053053 diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist index ecc2d6fa13..cc53178e81 100644 --- a/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist +++ b/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist @@ -1996,6 +1996,8 @@ GLIBC_2.28 thrd_yield F GLIBC_2.29 getcpu F GLIBC_2.29 posix_spawn_file_actions_addchdir_np F GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F +GLIBC_2.30 __rseq_abi T 0x20 +GLIBC_2.30 __rseq_lib_abi T 0x8 GLIBC_2.3 __ctype_b_loc F GLIBC_2.3 __ctype_tolower_loc F GLIBC_2.3 __ctype_toupper_loc F diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist index f5830f9c33..2de3134bc7 100644 --- a/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist +++ b/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist @@ -2000,6 +2000,8 @@ GLIBC_2.28 thrd_yield F GLIBC_2.29 getcpu F GLIBC_2.29 posix_spawn_file_actions_addchdir_np F GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F +GLIBC_2.30 __rseq_abi T 0x20 +GLIBC_2.30 __rseq_lib_abi T 0x8 GLIBC_2.3 __ctype_b_loc F GLIBC_2.3 __ctype_tolower_loc F GLIBC_2.3 __ctype_toupper_loc F diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist index 633d8f4792..aae3def700 100644 --- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist +++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist @@ -126,6 +126,8 @@ GLIBC_2.28 thrd_yield F GLIBC_2.29 getcpu F GLIBC_2.29 posix_spawn_file_actions_addchdir_np F GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F +GLIBC_2.30 __rseq_abi T 0x20 +GLIBC_2.30 __rseq_lib_abi T 0x8 GLIBC_2.3 _Exit F GLIBC_2.3 _IO_2_1_stderr_ D 0xe0 GLIBC_2.3 _IO_2_1_stdin_ D 0xe0 diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist index 2c712636ef..8d582a3a9b 100644 --- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist +++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist @@ -2231,3 +2231,5 @@ GLIBC_2.28 thrd_yield F GLIBC_2.29 getcpu F GLIBC_2.29 posix_spawn_file_actions_addchdir_np F GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F +GLIBC_2.30 __rseq_abi T 0x20 +GLIBC_2.30 __rseq_lib_abi T 0x8 diff --git a/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist b/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist index 195bc8b2cf..155953f6cf 100644 --- a/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist +++ b/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist @@ -2103,3 +2103,5 @@ GLIBC_2.28 thrd_yield F GLIBC_2.29 getcpu F GLIBC_2.29 posix_spawn_file_actions_addchdir_np F GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F +GLIBC_2.30 __rseq_abi T 0x20 +GLIBC_2.30 __rseq_lib_abi T 0x8 diff --git a/sysdeps/unix/sysv/linux/rseq-internal.h b/sysdeps/unix/sysv/linux/rseq-internal.h new file mode 100644 index 0000000000..d676da3701 --- /dev/null +++ b/sysdeps/unix/sysv/linux/rseq-internal.h @@ -0,0 +1,91 @@ +/* Copyright (C) 2018 Free Software Foundation, Inc. + This file is part of the GNU C Library. + Contributed by Mathieu Desnoyers <mathieu.desnoyers@efficios.com>, 2018. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <http://www.gnu.org/licenses/>. */ + +#ifndef RSEQ_INTERNAL_H +#define RSEQ_INTERNAL_H + +#include <sysdep.h> + +#ifdef __NR_rseq + +#include <errno.h> +#include <sys/rseq.h> + +static inline int +rseq_register_current_thread (void) +{ + int rc, ret = 0; + INTERNAL_SYSCALL_DECL (err); + + if (__rseq_abi.cpu_id == RSEQ_CPU_ID_REGISTRATION_FAILED) + return -1; + /* Temporarily prevent nested signal handlers from registering rseq. */ + __rseq_lib_abi.register_state = RSEQ_REGISTER_ONGOING; + if (__rseq_lib_abi.refcount == UINT_MAX) + { + ret = -1; + goto end; + } + if (__rseq_lib_abi.refcount++) + goto end; + rc = INTERNAL_SYSCALL_CALL (rseq, err, &__rseq_abi, sizeof (struct rseq), + 0, RSEQ_SIG); + if (!rc) + goto end; + if (INTERNAL_SYSCALL_ERRNO (rc, err) != EBUSY) + __rseq_abi.cpu_id = RSEQ_CPU_ID_REGISTRATION_FAILED; + ret = -1; +end: + __rseq_lib_abi.register_state = RSEQ_REGISTER_ALLOWED; + return ret; +} + +static inline int +rseq_unregister_current_thread (void) +{ + int rc, ret = 0; + INTERNAL_SYSCALL_DECL (err); + + /* Setting __rseq_register_state = RSEQ_REGISTER_EXITING for the rest of the + thread lifetime. Ensures signal handlers nesting just before thread exit + don't try to register rseq. */ + __rseq_lib_abi.register_state = RSEQ_REGISTER_EXITING; + __rseq_lib_abi.refcount = 0; + rc = INTERNAL_SYSCALL_CALL (rseq, err, &__rseq_abi, sizeof (struct rseq), + RSEQ_FLAG_UNREGISTER, RSEQ_SIG); + if (!rc) + goto end; + ret = -1; +end: + return ret; +} +#else +static inline int +rseq_register_current_thread (void) +{ + return -1; +} + +static inline int +rseq_unregister_current_thread (void) +{ + return -1; +} +#endif + +#endif /* rseq-internal.h */ diff --git a/sysdeps/unix/sysv/linux/rseq-sym.c b/sysdeps/unix/sysv/linux/rseq-sym.c new file mode 100644 index 0000000000..99b277e9d6 --- /dev/null +++ b/sysdeps/unix/sysv/linux/rseq-sym.c @@ -0,0 +1,54 @@ +/* Copyright (C) 2018 Free Software Foundation, Inc. + This file is part of the GNU C Library. + Contributed by Mathieu Desnoyers <mathieu.desnoyers@efficios.com>, 2018. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <http://www.gnu.org/licenses/>. */ + +#include <sys/syscall.h> +#include <stdint.h> + +#ifdef __NR_rseq +#include <sys/rseq.h> +#else + +enum rseq_cpu_id_state { + RSEQ_CPU_ID_UNINITIALIZED = -1, + RSEQ_CPU_ID_REGISTRATION_FAILED = -2, +}; + +/* linux/rseq.h defines struct rseq as aligned on 32 bytes. The kernel ABI + size is 20 bytes. */ +struct rseq { + uint32_t cpu_id_start; + uint32_t cpu_id; + uint64_t rseq_cs; + uint32_t flags; +} __attribute__ ((aligned(4 * sizeof(uint64_t)))); + +struct rseq_lib_abi +{ + uint32_t register_state; + uint32_t refcount; +}; + +#endif + +/* volatile because fields can be read/updated by the kernel. */ +__thread volatile struct rseq __rseq_abi = { + .cpu_id = RSEQ_CPU_ID_UNINITIALIZED, +}; + +/* volatile because fields can be read/updated by signal handlers. */ +__thread volatile struct rseq_lib_abi __rseq_lib_abi; diff --git a/sysdeps/unix/sysv/linux/s390/bits/rseq.h b/sysdeps/unix/sysv/linux/s390/bits/rseq.h new file mode 100644 index 0000000000..19d3755837 --- /dev/null +++ b/sysdeps/unix/sysv/linux/s390/bits/rseq.h @@ -0,0 +1,24 @@ +/* Copyright (C) 2019 Free Software Foundation, Inc. + This file is part of the GNU C Library. + Contributed by Mathieu Desnoyers <mathieu.desnoyers@efficios.com>, 2019. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <http://www.gnu.org/licenses/>. */ + +#ifndef _SYS_RSEQ_H +# error "Never use <bits/rseq.h> directly; include <sys/rseq.h> instead." +#endif + +/* Signature required before each abort handler code. */ +#define RSEQ_SIG 0x53053053 diff --git a/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist b/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist index 334def033c..42316d8666 100644 --- a/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist +++ b/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist @@ -2005,6 +2005,8 @@ GLIBC_2.28 thrd_yield F GLIBC_2.29 getcpu F GLIBC_2.29 posix_spawn_file_actions_addchdir_np F GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F +GLIBC_2.30 __rseq_abi T 0x20 +GLIBC_2.30 __rseq_lib_abi T 0x8 GLIBC_2.3 __ctype_b_loc F GLIBC_2.3 __ctype_tolower_loc F GLIBC_2.3 __ctype_toupper_loc F diff --git a/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist b/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist index 536f4c4ced..c6c4e55a77 100644 --- a/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist +++ b/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist @@ -1911,6 +1911,8 @@ GLIBC_2.29 __fentry__ F GLIBC_2.29 getcpu F GLIBC_2.29 posix_spawn_file_actions_addchdir_np F GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F +GLIBC_2.30 __rseq_abi T 0x20 +GLIBC_2.30 __rseq_lib_abi T 0x8 GLIBC_2.3 __ctype_b_loc F GLIBC_2.3 __ctype_tolower_loc F GLIBC_2.3 __ctype_toupper_loc F diff --git a/sysdeps/unix/sysv/linux/sh/libc.abilist b/sysdeps/unix/sysv/linux/sh/libc.abilist index 30ae3b6ebb..8652dfea59 100644 --- a/sysdeps/unix/sysv/linux/sh/libc.abilist +++ b/sysdeps/unix/sysv/linux/sh/libc.abilist @@ -1887,6 +1887,8 @@ GLIBC_2.28 thrd_yield F GLIBC_2.29 getcpu F GLIBC_2.29 posix_spawn_file_actions_addchdir_np F GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F +GLIBC_2.30 __rseq_abi T 0x20 +GLIBC_2.30 __rseq_lib_abi T 0x8 GLIBC_2.3 __ctype_b_loc F GLIBC_2.3 __ctype_tolower_loc F GLIBC_2.3 __ctype_toupper_loc F diff --git a/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist b/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist index 68b107d080..95b58dfa67 100644 --- a/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist +++ b/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist @@ -1999,6 +1999,8 @@ GLIBC_2.28 thrd_yield F GLIBC_2.29 getcpu F GLIBC_2.29 posix_spawn_file_actions_addchdir_np F GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F +GLIBC_2.30 __rseq_abi T 0x20 +GLIBC_2.30 __rseq_lib_abi T 0x8 GLIBC_2.3 __ctype_b_loc F GLIBC_2.3 __ctype_tolower_loc F GLIBC_2.3 __ctype_toupper_loc F diff --git a/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist b/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist index e5b6a4da50..bfd24f9d1c 100644 --- a/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist +++ b/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist @@ -1940,6 +1940,8 @@ GLIBC_2.28 thrd_yield F GLIBC_2.29 getcpu F GLIBC_2.29 posix_spawn_file_actions_addchdir_np F GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F +GLIBC_2.30 __rseq_abi T 0x20 +GLIBC_2.30 __rseq_lib_abi T 0x8 GLIBC_2.3 __ctype_b_loc F GLIBC_2.3 __ctype_tolower_loc F GLIBC_2.3 __ctype_toupper_loc F diff --git a/sysdeps/unix/sysv/linux/sys/rseq.h b/sysdeps/unix/sysv/linux/sys/rseq.h new file mode 100644 index 0000000000..83c8976f50 --- /dev/null +++ b/sysdeps/unix/sysv/linux/sys/rseq.h @@ -0,0 +1,65 @@ +/* Copyright (C) 2019 Free Software Foundation, Inc. + This file is part of the GNU C Library. + Contributed by Mathieu Desnoyers <mathieu.desnoyers@efficios.com>, 2019. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <http://www.gnu.org/licenses/>. */ + +#ifndef _SYS_RSEQ_H +#define _SYS_RSEQ_H 1 + +/* We use the structures declarations from the kernel headers. */ +#include <linux/rseq.h> +/* Architecture-specific rseq signature. */ +#include <bits/rseq.h> +#include <stdint.h> + +enum rseq_register_state +{ + /* Value RSEQ_REGISTER_ALLOWED means it is allowed to update + the refcount field and to register/unregister rseq. */ + RSEQ_REGISTER_ALLOWED = 0, + /* Value RSEQ_REGISTER_ONGOING means a rseq registration is in progress, + so it is temporarily forbidden to update the refcount field or to + register/unregister rseq for this thread or signal handlers nested + on this thread. */ + RSEQ_REGISTER_ONGOING = 1, + /* Value RSEQ_REGISTER_EXITING means it is forbidden to update the + refcount field or to register/unregister rseq for the rest of the + thread's lifetime. */ + RSEQ_REGISTER_EXITING = 2, +}; + +struct rseq_lib_abi +{ + uint32_t register_state; /* enum rseq_register_state. */ + /* The refcount field keeps track of rseq users, so early adopters + of rseq can cooperate amongst each other and with glibc to + share rseq thread registration. The refcount field can only be + updated when allowed by the value of field register_state. + Registering rseq should be performed when incrementing refcount + from 0 to 1, and unregistering rseq should be performed when + decrementing refcount from 1 to 0. */ + uint32_t refcount; +}; + +/* volatile because fields can be read/updated by the kernel. */ +extern __thread volatile struct rseq __rseq_abi +__attribute__ ((tls_model ("initial-exec"))); + +/* volatile because fields can be read/updated by signal handlers. */ +extern __thread volatile struct rseq_lib_abi __rseq_lib_abi +__attribute__ ((tls_model ("initial-exec"))); + +#endif /* sys/rseq.h */ diff --git a/sysdeps/unix/sysv/linux/x86/bits/rseq.h b/sysdeps/unix/sysv/linux/x86/bits/rseq.h new file mode 100644 index 0000000000..19d3755837 --- /dev/null +++ b/sysdeps/unix/sysv/linux/x86/bits/rseq.h @@ -0,0 +1,24 @@ +/* Copyright (C) 2019 Free Software Foundation, Inc. + This file is part of the GNU C Library. + Contributed by Mathieu Desnoyers <mathieu.desnoyers@efficios.com>, 2019. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <http://www.gnu.org/licenses/>. */ + +#ifndef _SYS_RSEQ_H +# error "Never use <bits/rseq.h> directly; include <sys/rseq.h> instead." +#endif + +/* Signature required before each abort handler code. */ +#define RSEQ_SIG 0x53053053 diff --git a/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist b/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist index 86dfb0c94d..e9f8411fb2 100644 --- a/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist +++ b/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist @@ -1898,6 +1898,8 @@ GLIBC_2.28 thrd_yield F GLIBC_2.29 getcpu F GLIBC_2.29 posix_spawn_file_actions_addchdir_np F GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F +GLIBC_2.30 __rseq_abi T 0x20 +GLIBC_2.30 __rseq_lib_abi T 0x8 GLIBC_2.3 __ctype_b_loc F GLIBC_2.3 __ctype_tolower_loc F GLIBC_2.3 __ctype_toupper_loc F diff --git a/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist b/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist index dd688263aa..f9432d07f1 100644 --- a/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist +++ b/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist @@ -2149,3 +2149,5 @@ GLIBC_2.28 thrd_yield F GLIBC_2.29 getcpu F GLIBC_2.29 posix_spawn_file_actions_addchdir_np F GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F +GLIBC_2.30 __rseq_abi T 0x20 +GLIBC_2.30 __rseq_lib_abi T 0x8
Register rseq(2) TLS for each thread (including main), and unregister for each thread (excluding main). "rseq" stands for Restartable Sequences. See the rseq(2) man page proposed here: https://lkml.org/lkml/2018/9/19/647 This patch is based on glibc-2.29. The rseq(2) system call was merged into Linux 4.18. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> CC: Carlos O'Donell <carlos@redhat.com> CC: Florian Weimer <fweimer@redhat.com> CC: Joseph Myers <joseph@codesourcery.com> CC: Szabolcs Nagy <szabolcs.nagy@arm.com> CC: Thomas Gleixner <tglx@linutronix.de> CC: Ben Maurer <bmaurer@fb.com> CC: Peter Zijlstra <peterz@infradead.org> CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> CC: Boqun Feng <boqun.feng@gmail.com> CC: Will Deacon <will.deacon@arm.com> CC: Dave Watson <davejwatson@fb.com> CC: Paul Turner <pjt@google.com> CC: Rich Felker <dalias@libc.org> CC: libc-alpha@sourceware.org CC: linux-kernel@vger.kernel.org CC: linux-api@vger.kernel.org --- Changes since v1: - Move __rseq_refcount to an extra field at the end of __rseq_abi to eliminate one symbol. All libraries/programs which try to register rseq (glibc, early-adopter applications, early-adopter libraries) should use the rseq refcount. It becomes part of the ABI within a user-space process, but it's not part of the ABI shared with the kernel per se. - Restructure how this code is organized so glibc keeps building on non-Linux targets. - Use non-weak symbol for __rseq_abi. - Move rseq registration/unregistration implementation into its own nptl/rseq.c compile unit. - Move __rseq_abi symbol under GLIBC_2.29. Changes since v2: - Move __rseq_refcount to its own symbol, which is less ugly than trying to play tricks with the rseq uapi. - Move __rseq_abi from nptl to csu (C start up), so it can be used across glibc, including memory allocator and sched_getcpu(). The __rseq_refcount symbol is kept in nptl, because there is no reason to use it elsewhere in glibc. Changes since v3: - Set __rseq_refcount TLS to 1 on register/set to 0 on unregister because glibc is the first/last user. - Unconditionally register/unregister rseq at thread start/exit, because glibc is the first/last user. - Add missing abilist items. - Rebase on glibc master commit a502c5294. - Add NEWS entry. Changes since v4: - Do not use "weak" symbols for __rseq_abi and __rseq_refcount. Based on "System V Application Binary Interface", weak only affects the link editor, not the dynamic linker. - Install a new sys/rseq.h system header on Linux, which contains the RSEQ_SIG definition, __rseq_abi declaration and __rseq_refcount declaration. Move those definition/declarations from rseq-internal.h to the installed sys/rseq.h header. - Considering that rseq is only available on Linux, move csu/rseq.c to sysdeps/unix/sysv/linux/rseq-sym.c. - Move __rseq_refcount from nptl/rseq.c to sysdeps/unix/sysv/linux/rseq-sym.c, so it is only defined on Linux. - Move both ABI definitions for __rseq_abi and __rseq_refcount to sysdeps/unix/sysv/linux/Versions, so they only appear on Linux. - Document __rseq_abi and __rseq_refcount volatile. - Document the RSEQ_SIG signature define. - Move registration functions from rseq.c to rseq-internal.h static inline functions. Introduce empty stubs in misc/rseq-internal.h, which can be overridden by architecture code in sysdeps/unix/sysv/linux/rseq-internal.h. - Rename __rseq_register_current_thread and __rseq_unregister_current_thread to rseq_register_current_thread and rseq_unregister_current_thread, now that those are only visible as internal static inline functions. - Invoke rseq_register_current_thread() from libc-start.c LIBC_START_MAIN rather than nptl init, so applications not linked against libpthread.so have rseq registered for their main() thread. Note that it is invoked separately for SHARED and !SHARED builds. Changes since v5: - Replace __rseq_refcount by __rseq_lib_abi, which contains two uint32_t: register_state and refcount. The "register_state" field allows inhibiting rseq registration from signal handlers nested on top of glibc registration and occuring after rseq unregistration by glibc. - Introduce enum rseq_register_state, which contains the states allowed for the struct rseq_lib_abi register_state field. Changes since v6: - Introduce bits/rseq.h to define RSEQ_SIG for each architecture. The generic bits/rseq.h does not define RSEQ_SIG, meaning that each architecture implementing rseq needs to implement bits/rseq.h. - Rename enum item RSEQ_REGISTER_NESTED to RSEQ_REGISTER_ONGOING. - Port to glibc-2.29. --- NEWS | 11 +++ csu/libc-start.c | 12 ++- misc/Makefile | 3 +- misc/rseq-internal.h | 34 +++++++ nptl/pthread_create.c | 9 ++ sysdeps/unix/sysv/linux/Makefile | 4 +- sysdeps/unix/sysv/linux/Versions | 4 + sysdeps/unix/sysv/linux/aarch64/bits/rseq.h | 24 +++++ sysdeps/unix/sysv/linux/aarch64/libc.abilist | 2 + sysdeps/unix/sysv/linux/alpha/libc.abilist | 2 + sysdeps/unix/sysv/linux/arm/bits/rseq.h | 24 +++++ sysdeps/unix/sysv/linux/arm/libc.abilist | 2 + sysdeps/unix/sysv/linux/bits/rseq.h | 24 +++++ sysdeps/unix/sysv/linux/hppa/libc.abilist | 2 + sysdeps/unix/sysv/linux/i386/libc.abilist | 2 + sysdeps/unix/sysv/linux/ia64/libc.abilist | 2 + .../sysv/linux/m68k/coldfire/libc.abilist | 2 + .../unix/sysv/linux/m68k/m680x0/libc.abilist | 2 + .../unix/sysv/linux/microblaze/libc.abilist | 2 + sysdeps/unix/sysv/linux/mips/bits/rseq.h | 24 +++++ .../sysv/linux/mips/mips32/fpu/libc.abilist | 2 + .../sysv/linux/mips/mips32/nofpu/libc.abilist | 2 + .../sysv/linux/mips/mips64/n32/libc.abilist | 2 + .../sysv/linux/mips/mips64/n64/libc.abilist | 2 + sysdeps/unix/sysv/linux/nios2/libc.abilist | 2 + sysdeps/unix/sysv/linux/powerpc/bits/rseq.h | 24 +++++ .../linux/powerpc/powerpc32/fpu/libc.abilist | 2 + .../powerpc/powerpc32/nofpu/libc.abilist | 2 + .../linux/powerpc/powerpc64/be/libc.abilist | 2 + .../linux/powerpc/powerpc64/le/libc.abilist | 2 + .../unix/sysv/linux/riscv/rv64/libc.abilist | 2 + sysdeps/unix/sysv/linux/rseq-internal.h | 91 +++++++++++++++++++ sysdeps/unix/sysv/linux/rseq-sym.c | 54 +++++++++++ sysdeps/unix/sysv/linux/s390/bits/rseq.h | 24 +++++ .../unix/sysv/linux/s390/s390-32/libc.abilist | 2 + .../unix/sysv/linux/s390/s390-64/libc.abilist | 2 + sysdeps/unix/sysv/linux/sh/libc.abilist | 2 + .../sysv/linux/sparc/sparc32/libc.abilist | 2 + .../sysv/linux/sparc/sparc64/libc.abilist | 2 + sysdeps/unix/sysv/linux/sys/rseq.h | 65 +++++++++++++ sysdeps/unix/sysv/linux/x86/bits/rseq.h | 24 +++++ .../unix/sysv/linux/x86_64/64/libc.abilist | 2 + .../unix/sysv/linux/x86_64/x32/libc.abilist | 2 + 43 files changed, 501 insertions(+), 6 deletions(-) create mode 100644 misc/rseq-internal.h create mode 100644 sysdeps/unix/sysv/linux/aarch64/bits/rseq.h create mode 100644 sysdeps/unix/sysv/linux/arm/bits/rseq.h create mode 100644 sysdeps/unix/sysv/linux/bits/rseq.h create mode 100644 sysdeps/unix/sysv/linux/mips/bits/rseq.h create mode 100644 sysdeps/unix/sysv/linux/powerpc/bits/rseq.h create mode 100644 sysdeps/unix/sysv/linux/rseq-internal.h create mode 100644 sysdeps/unix/sysv/linux/rseq-sym.c create mode 100644 sysdeps/unix/sysv/linux/s390/bits/rseq.h create mode 100644 sysdeps/unix/sysv/linux/sys/rseq.h create mode 100644 sysdeps/unix/sysv/linux/x86/bits/rseq.h