Message ID | 87ikxfd7d3.fsf@oldenburg.str.redhat.com |
---|---|
State | New |
Headers | show |
Series | [v2] Linux: Make __rseq_size useful for feature detection (bug 31965) | expand |
On 2024-07-08 15:14, Florian Weimer wrote: > The __rseq_size value is now the active area of struct rseq > (so 20 initially), not the full struct size including padding > at the end (32 initially). > > Update misc/tst-rseq to print some additional diagnostics. LGTM. Reviewed-by: Michael Jeanson <mjeanson@efficios.com> Cheers, Michael
On 08-Jul-2024 09:14:00 PM, Florian Weimer wrote: > The __rseq_size value is now the active area of struct rseq > (so 20 initially), not the full struct size including padding > at the end (32 initially). > > Update misc/tst-rseq to print some additional diagnostics. Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> So applications that want to be fully compatible with the older behavior should consider a __rseq_size=32 as a special-case, in which case it should either: A) consider the feature size to be 20 (which is always safe but may be missing supported features), or B) use the result of getauxval(AT_RSEQ_FEATURE_SIZE) to figure out the supported kernel features. If getauxval(AT_RSEQ_FEATURE_SIZE)=0, 20 bytes of feature are supported, else the available features would be the minimum between 32 and getauxval(AT_RSEQ_FEATURE_SIZE). Note that we may want to consider ensuring getauxval(AT_RSEQ_FEATURE_SIZE) never returns the special value 32, e.g. by making sure we expose a few new fields at once in a kernel release. This way userspace could consider that __rseq_size=32 really means that 20 bytes are active. If we allow getauxval(AT_RSEQ_FEATURE_SIZE) to eventually return 32, then userspace using option (A) could "regress" when the kernel expose more features by assuming it means only 20 bytes are active, when in fact 32 bytes are supported. If userspace use option (B) and validate __rseq_size=32 with getauxval(AT_RSEQ_FEATURE_SIZE), we never have such possible feature regression. Thanks! Mathieu > > --- > v2: Treat as bug fix, add bug number, remove symbol versioning. > NEWS | 3 +++ > manual/threads.texi | 8 ++++++-- > sysdeps/nptl/dl-tls_init_tp.c | 8 +------- > sysdeps/unix/sysv/linux/rseq-internal.h | 23 +++++++++++++++++++++-- > sysdeps/unix/sysv/linux/tst-rseq.c | 10 +++++++++- > 5 files changed, 40 insertions(+), 12 deletions(-) > > diff --git a/NEWS b/NEWS > index f626896aa0..3824815917 100644 > --- a/NEWS > +++ b/NEWS > @@ -49,6 +49,9 @@ Deprecated and removed features, and other changes affecting compatibility: > <utmpx.h> (except for login_tty) due to locking and session management > problems. > > +* __rseq_size now denotes the size of the active rseq area (20 bytes > + initially), not the size of struct rseq (32 bytes initially). > + > Changes to build and runtime requirements: > > [Add changes to build and runtime requirements here] > diff --git a/manual/threads.texi b/manual/threads.texi > index e5544ff3da..25e99c9606 100644 > --- a/manual/threads.texi > +++ b/manual/threads.texi > @@ -1007,8 +1007,12 @@ This variable is either zero (if restartable sequence registration > failed or has been disabled) or the size of the restartable sequence > registration. This can be different from the size of @code{struct rseq} > if the kernel has extended the size of the registration. If > -registration is successful, @code{__rseq_size} is at least 32 (the > -initial size of @code{struct rseq}). > +registration is successful, @code{__rseq_size} is at least 20 (the > +initially active size of @code{struct rseq}). > + > +Previous versions of @theglibc{} set this to 32 even if the kernel only > +supported the initial area of 20 bytes because the value included unused > +padding at the end of the restartable sequence area. > @end deftypevar > > @deftypevar {unsigned int} __rseq_flags > diff --git a/sysdeps/nptl/dl-tls_init_tp.c b/sysdeps/nptl/dl-tls_init_tp.c > index 7eb35fb133..7803e19fd1 100644 > --- a/sysdeps/nptl/dl-tls_init_tp.c > +++ b/sysdeps/nptl/dl-tls_init_tp.c > @@ -46,10 +46,6 @@ rtld_mutex_dummy (pthread_mutex_t *lock) > > const unsigned int __rseq_flags; > > -/* The variables are in .data.relro but are not yet write-protected. */ > -extern unsigned int _rseq_size attribute_hidden; > -extern ptrdiff_t _rseq_offset attribute_hidden; > - > void > __tls_pre_init_tp (void) > { > @@ -106,9 +102,7 @@ __tls_init_tp (void) > bool do_rseq = true; > do_rseq = TUNABLE_GET (rseq, int, NULL); > if (rseq_register_current_thread (pd, do_rseq)) > - { > - _rseq_size = sizeof (pd->rseq_area); > - } > + _rseq_size = RSEQ_AREA_SIZE_INITIAL_USED; > > #ifdef RSEQ_SIG > /* This should be a compile-time constant, but the current > diff --git a/sysdeps/unix/sysv/linux/rseq-internal.h b/sysdeps/unix/sysv/linux/rseq-internal.h > index 48eebc1e16..7ea935b4ad 100644 > --- a/sysdeps/unix/sysv/linux/rseq-internal.h > +++ b/sysdeps/unix/sysv/linux/rseq-internal.h > @@ -25,15 +25,34 @@ > #include <stdio.h> > #include <sys/rseq.h> > > +/* 32 is the initially required value for the area size. The > + actually used rseq size may be less (20 bytes initially). */ > +#define RSEQ_AREA_SIZE_INITIAL 32 > +#define RSEQ_AREA_SIZE_INITIAL_USED 20 > + > +/* The variables are in .data.relro but are not yet write-protected. */ > +extern unsigned int _rseq_size attribute_hidden; > +extern ptrdiff_t _rseq_offset attribute_hidden; > + > #ifdef RSEQ_SIG > static inline bool > rseq_register_current_thread (struct pthread *self, bool do_rseq) > { > if (do_rseq) > { > + unsigned int size; > +#if IS_IN (rtld) > + /* Use the hidden symbol in ld.so. */ > + size = _rseq_size; > +#else > + size = __rseq_size; > +#endif > + if (size < RSEQ_AREA_SIZE_INITIAL) > + /* The initial implementation used only 20 bytes out of 32, > + but still expected size 32. */ > + size = RSEQ_AREA_SIZE_INITIAL; > int ret = INTERNAL_SYSCALL_CALL (rseq, &self->rseq_area, > - sizeof (self->rseq_area), > - 0, RSEQ_SIG); > + size, 0, RSEQ_SIG); > if (!INTERNAL_SYSCALL_ERROR_P (ret)) > return true; > } > diff --git a/sysdeps/unix/sysv/linux/tst-rseq.c b/sysdeps/unix/sysv/linux/tst-rseq.c > index 2c90409ba0..08a9533130 100644 > --- a/sysdeps/unix/sysv/linux/tst-rseq.c > +++ b/sysdeps/unix/sysv/linux/tst-rseq.c > @@ -29,6 +29,7 @@ > # include <stdlib.h> > # include <string.h> > # include <syscall.h> > +# include <sys/auxv.h> > # include <thread_pointer.h> > # include <tls.h> > # include "tst-rseq.h" > @@ -42,7 +43,8 @@ do_rseq_main_test (void) > TEST_COMPARE (__rseq_flags, 0); > TEST_VERIFY ((char *) __thread_pointer () + __rseq_offset > == (char *) &pd->rseq_area); > - TEST_COMPARE (__rseq_size, sizeof (pd->rseq_area)); > + /* The current implementation only supports the initial size. */ > + TEST_COMPARE (__rseq_size, 20); > } > > static void > @@ -52,6 +54,12 @@ do_rseq_test (void) > { > FAIL_UNSUPPORTED ("kernel does not support rseq, skipping test"); > } > + printf ("info: __rseq_size: %u\n", __rseq_size); > + printf ("info: __rseq_offset: %td\n", __rseq_offset); > + printf ("info: __rseq_flags: %u\n", __rseq_flags); > + printf ("info: getauxval (AT_RSEQ_FEATURE_SIZE): %ld\n", > + getauxval (AT_RSEQ_FEATURE_SIZE)); > + printf ("info: getauxval (AT_RSEQ_ALIGN): %ld\n", getauxval (AT_RSEQ_ALIGN)); > do_rseq_main_test (); > } > #else /* RSEQ_SIG */ > > base-commit: 9fc639f654dc004736836613be703e6bed0c36a8 >
* Michael Jeanson: > Reviewed-by: Michael Jeanson <mjeanson@efficios.com> * Mathieu Desnoyers: > Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Thanks. Andreas, is this okay for 2.40? Florian
Am Dienstag, 9. Juli 2024, 17:10:45 CEST schrieb Florian Weimer: > * Michael Jeanson: > > > Reviewed-by: Michael Jeanson <mjeanson@efficios.com> > > * Mathieu Desnoyers: > > > Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> > > Thanks. > > Andreas, is this okay for 2.40? > Sure, as already discussed
On 2024-07-09 11:10, Florian Weimer wrote: > * Michael Jeanson: > >> Reviewed-by: Michael Jeanson <mjeanson@efficios.com> > > * Mathieu Desnoyers: > >> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Hi Florian, I have created a prototype patch against librseq to integrate with your proposed changes. Feel free to have a look. Feedback is welcome! https://review.lttng.org/c/librseq/+/12910 Note that librseq does not have any release nor offers any ABI guarantees yet, so we can change the semantic as needed. Thanks, Mathieu > > Thanks. > > Andreas, is this okay for 2.40? > > Florian >
* Mathieu Desnoyers: > On 2024-07-09 11:10, Florian Weimer wrote: >> * Michael Jeanson: >> >>> Reviewed-by: Michael Jeanson <mjeanson@efficios.com> >> * Mathieu Desnoyers: >> >>> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> > > Hi Florian, > > I have created a prototype patch against librseq to integrate with > your proposed changes. Feel free to have a look. Feedback is welcome! > > https://review.lttng.org/c/librseq/+/12910 > > Note that librseq does not have any release nor offers any ABI > guarantees yet, so we can change the semantic as needed. Thanks. I've submitted a PR against CRIU to adjust to the new glibc behavior: Adjust to glibc __rseq_size semantic change <https://github.com/checkpoint-restore/criu/pull/2440> Let's see if the CRIU developers raise any concerns with the glibc changes. Florian
diff --git a/NEWS b/NEWS index f626896aa0..3824815917 100644 --- a/NEWS +++ b/NEWS @@ -49,6 +49,9 @@ Deprecated and removed features, and other changes affecting compatibility: <utmpx.h> (except for login_tty) due to locking and session management problems. +* __rseq_size now denotes the size of the active rseq area (20 bytes + initially), not the size of struct rseq (32 bytes initially). + Changes to build and runtime requirements: [Add changes to build and runtime requirements here] diff --git a/manual/threads.texi b/manual/threads.texi index e5544ff3da..25e99c9606 100644 --- a/manual/threads.texi +++ b/manual/threads.texi @@ -1007,8 +1007,12 @@ This variable is either zero (if restartable sequence registration failed or has been disabled) or the size of the restartable sequence registration. This can be different from the size of @code{struct rseq} if the kernel has extended the size of the registration. If -registration is successful, @code{__rseq_size} is at least 32 (the -initial size of @code{struct rseq}). +registration is successful, @code{__rseq_size} is at least 20 (the +initially active size of @code{struct rseq}). + +Previous versions of @theglibc{} set this to 32 even if the kernel only +supported the initial area of 20 bytes because the value included unused +padding at the end of the restartable sequence area. @end deftypevar @deftypevar {unsigned int} __rseq_flags diff --git a/sysdeps/nptl/dl-tls_init_tp.c b/sysdeps/nptl/dl-tls_init_tp.c index 7eb35fb133..7803e19fd1 100644 --- a/sysdeps/nptl/dl-tls_init_tp.c +++ b/sysdeps/nptl/dl-tls_init_tp.c @@ -46,10 +46,6 @@ rtld_mutex_dummy (pthread_mutex_t *lock) const unsigned int __rseq_flags; -/* The variables are in .data.relro but are not yet write-protected. */ -extern unsigned int _rseq_size attribute_hidden; -extern ptrdiff_t _rseq_offset attribute_hidden; - void __tls_pre_init_tp (void) { @@ -106,9 +102,7 @@ __tls_init_tp (void) bool do_rseq = true; do_rseq = TUNABLE_GET (rseq, int, NULL); if (rseq_register_current_thread (pd, do_rseq)) - { - _rseq_size = sizeof (pd->rseq_area); - } + _rseq_size = RSEQ_AREA_SIZE_INITIAL_USED; #ifdef RSEQ_SIG /* This should be a compile-time constant, but the current diff --git a/sysdeps/unix/sysv/linux/rseq-internal.h b/sysdeps/unix/sysv/linux/rseq-internal.h index 48eebc1e16..7ea935b4ad 100644 --- a/sysdeps/unix/sysv/linux/rseq-internal.h +++ b/sysdeps/unix/sysv/linux/rseq-internal.h @@ -25,15 +25,34 @@ #include <stdio.h> #include <sys/rseq.h> +/* 32 is the initially required value for the area size. The + actually used rseq size may be less (20 bytes initially). */ +#define RSEQ_AREA_SIZE_INITIAL 32 +#define RSEQ_AREA_SIZE_INITIAL_USED 20 + +/* The variables are in .data.relro but are not yet write-protected. */ +extern unsigned int _rseq_size attribute_hidden; +extern ptrdiff_t _rseq_offset attribute_hidden; + #ifdef RSEQ_SIG static inline bool rseq_register_current_thread (struct pthread *self, bool do_rseq) { if (do_rseq) { + unsigned int size; +#if IS_IN (rtld) + /* Use the hidden symbol in ld.so. */ + size = _rseq_size; +#else + size = __rseq_size; +#endif + if (size < RSEQ_AREA_SIZE_INITIAL) + /* The initial implementation used only 20 bytes out of 32, + but still expected size 32. */ + size = RSEQ_AREA_SIZE_INITIAL; int ret = INTERNAL_SYSCALL_CALL (rseq, &self->rseq_area, - sizeof (self->rseq_area), - 0, RSEQ_SIG); + size, 0, RSEQ_SIG); if (!INTERNAL_SYSCALL_ERROR_P (ret)) return true; } diff --git a/sysdeps/unix/sysv/linux/tst-rseq.c b/sysdeps/unix/sysv/linux/tst-rseq.c index 2c90409ba0..08a9533130 100644 --- a/sysdeps/unix/sysv/linux/tst-rseq.c +++ b/sysdeps/unix/sysv/linux/tst-rseq.c @@ -29,6 +29,7 @@ # include <stdlib.h> # include <string.h> # include <syscall.h> +# include <sys/auxv.h> # include <thread_pointer.h> # include <tls.h> # include "tst-rseq.h" @@ -42,7 +43,8 @@ do_rseq_main_test (void) TEST_COMPARE (__rseq_flags, 0); TEST_VERIFY ((char *) __thread_pointer () + __rseq_offset == (char *) &pd->rseq_area); - TEST_COMPARE (__rseq_size, sizeof (pd->rseq_area)); + /* The current implementation only supports the initial size. */ + TEST_COMPARE (__rseq_size, 20); } static void @@ -52,6 +54,12 @@ do_rseq_test (void) { FAIL_UNSUPPORTED ("kernel does not support rseq, skipping test"); } + printf ("info: __rseq_size: %u\n", __rseq_size); + printf ("info: __rseq_offset: %td\n", __rseq_offset); + printf ("info: __rseq_flags: %u\n", __rseq_flags); + printf ("info: getauxval (AT_RSEQ_FEATURE_SIZE): %ld\n", + getauxval (AT_RSEQ_FEATURE_SIZE)); + printf ("info: getauxval (AT_RSEQ_ALIGN): %ld\n", getauxval (AT_RSEQ_ALIGN)); do_rseq_main_test (); } #else /* RSEQ_SIG */