Message ID | 20240801-rseq-abi-v11-split-v12-1-0e87479dddc0@efficios.com |
---|---|
State | New |
Headers | show |
Series | Add rseq extensible ABI support | expand |
* Michael Jeanson: > diff --git a/sysdeps/unix/sysv/linux/dl-parse_auxv.h b/sysdeps/unix/sysv/linux/dl-parse_auxv.h > index ea2a58ecb1..d24daab3fc 100644 > --- a/sysdeps/unix/sysv/linux/dl-parse_auxv.h > +++ b/sysdeps/unix/sysv/linux/dl-parse_auxv.h > @@ -21,6 +21,7 @@ > #include <fpu_control.h> > #include <ldsodefs.h> > #include <link.h> > +#include <rseq-internal.h> > > typedef ElfW(Addr) dl_parse_auxv_t[AT_MINSIGSTKSZ + 1]; > > @@ -59,5 +60,9 @@ void _dl_parse_auxv (ElfW(auxv_t) *av, dl_parse_auxv_t auxv_values) > GLRO(dl_sysinfo) = auxv_values[AT_SYSINFO]; > #endif > > + _rseq_size = MAX (auxv_values[AT_RSEQ_FEATURE_SIZE], > + RSEQ_AREA_SIZE_INITIAL_USED); > + _rseq_align = MAX (auxv_values[AT_RSEQ_ALIGN], RSEQ_MIN_ALIGN); > + > DL_PLATFORM_AUXV > } I believe you need an else branch in __tls_init_tp in sysdeps/nptl/dl-tls_init_tp.c, to set _rseq_size to 0 if registration fails: bool do_rseq = true; do_rseq = TUNABLE_GET (rseq, int, NULL); if (rseq_register_current_thread (pd, do_rseq)) _rseq_size = RSEQ_AREA_SIZE_INITIAL_USED; Not sure why this doesn't result in a test failure (maybe it's fixed again later in the series?). Thanks, Florian
On 2024-08-06 02 h 11, Florian Weimer wrote: > * Michael Jeanson: > >> diff --git a/sysdeps/unix/sysv/linux/dl-parse_auxv.h b/sysdeps/unix/sysv/linux/dl-parse_auxv.h >> index ea2a58ecb1..d24daab3fc 100644 >> --- a/sysdeps/unix/sysv/linux/dl-parse_auxv.h >> +++ b/sysdeps/unix/sysv/linux/dl-parse_auxv.h >> @@ -21,6 +21,7 @@ >> #include <fpu_control.h> >> #include <ldsodefs.h> >> #include <link.h> >> +#include <rseq-internal.h> >> >> typedef ElfW(Addr) dl_parse_auxv_t[AT_MINSIGSTKSZ + 1]; >> >> @@ -59,5 +60,9 @@ void _dl_parse_auxv (ElfW(auxv_t) *av, dl_parse_auxv_t auxv_values) >> GLRO(dl_sysinfo) = auxv_values[AT_SYSINFO]; >> #endif >> >> + _rseq_size = MAX (auxv_values[AT_RSEQ_FEATURE_SIZE], >> + RSEQ_AREA_SIZE_INITIAL_USED); >> + _rseq_align = MAX (auxv_values[AT_RSEQ_ALIGN], RSEQ_MIN_ALIGN); >> + >> DL_PLATFORM_AUXV >> } > > I believe you need an else branch in __tls_init_tp in > sysdeps/nptl/dl-tls_init_tp.c, to set _rseq_size to 0 if registration > fails: > > bool do_rseq = true; > do_rseq = TUNABLE_GET (rseq, int, NULL); > if (rseq_register_current_thread (pd, do_rseq)) > _rseq_size = RSEQ_AREA_SIZE_INITIAL_USED; > > Not sure why this doesn't result in a test failure (maybe it's fixed > again later in the series?). Before the removal of _dl_rseq_feature_size there was no behavior change introduced until "Move the rseq area to the 'extra TLS' block" which is where _rseq_size is set to 0 on failure. I agree it would make more sense to have it in this patch, I'll queue this change for the next round. Thanks, Michael
On 2024-08-06 09:32, Michael Jeanson wrote: > Before the removal of _dl_rseq_feature_size there was no behavior change > introduced until "Move the rseq area to the 'extra TLS' block" which is where > _rseq_size is set to 0 on failure. I agree it would make more sense to have it > in this patch, I'll queue this change for the next round. I was wondering if this patch with the code to set _rseq_size to zero on registration failure, plus an additional modification to cap the _rseq_size to 32 (the size of the rseq area in the struct pthread) could be backported to 2.40 and earlier? It would give application code access to the 'node_id' and 'mm_cid' fields without introducing the whole 'extra tls' stuff. Thoughts? Thanks, Michael
* Michael Jeanson: > On 2024-08-06 09:32, Michael Jeanson wrote: >> Before the removal of _dl_rseq_feature_size there was no behavior >> change introduced until "Move the rseq area to the 'extra TLS' >> block" which is where _rseq_size is set to 0 on failure. I agree it >> would make more sense to have it in this patch, I'll queue this >> change for the next round. > > I was wondering if this patch with the code to set _rseq_size to zero > on registration failure, plus an additional modification to cap the > _rseq_size to 32 (the size of the rseq area in the struct pthread) > could be backported to 2.40 and earlier? It would give application > code access to the 'node_id' and 'mm_cid' fields without introducing > the whole 'extra tls' stuff. We already touched briefly on this. I was worried that kernels would only support two size arguments in the rseq system call: 32 and what they signal through auxiliary vector. And that passing 32 might actually mean 20 or 28 in a future kernel release, not 32. Then __rseq would be wrong again. Thanks, Florian
On 2024-08-07 13:52, Florian Weimer wrote: > We already touched briefly on this. I was worried that kernels would > only support two size arguments in the rseq system call: 32 and what > they signal through auxiliary vector. And that passing 32 might > actually mean 20 or 28 in a future kernel release, not 32. Then __rseq > would be wrong again. According to Mathieu, the plan for hypothetical future kernels with a feature size above 32 is to have 2 types of registration: - fixed 32 bytes, gives you access to ALL the fields that fit inside 32 bytes - at least the feature size reported in the auxiliary vector In the first case, reporting a feature size of 32 would be correct but I can see an issue with applications special casing '__rseq_size == 32' since it's what glibc < 2.40 reports and it's not a valid 'feature size' when reported by those libc's. Also the plan kernel side is to never have a feature size of 32 to allow userspace to special case it, but until such a kernel ships some other consideration could override this I suppose. We'll have to handle this when we get there. What we could do here is set a MAX of 28 instead of 32 on '__rseq_size' in this patch which would still give us access to all the fields that are currently available in released kernels without the confusion associated with '32'. Thanks, Michael
* Michael Jeanson: > On 2024-08-06 09:32, Michael Jeanson wrote: >> Before the removal of _dl_rseq_feature_size there was no behavior >> change introduced until "Move the rseq area to the 'extra TLS' >> block" which is where _rseq_size is set to 0 on failure. I agree it >> would make more sense to have it in this patch, I'll queue this >> change for the next round. > > I was wondering if this patch with the code to set _rseq_size to zero > on registration failure, plus an additional modification to cap the > _rseq_size to 32 (the size of the rseq area in the struct pthread) > could be backported to 2.40 and earlier? It would give application > code access to the 'node_id' and 'mm_cid' fields without introducing > the whole 'extra tls' stuff. > > Thoughts? I thought about this some more. I think we should also increase the reservation inside struct pthread beyond 32 bytes if we do this, so that we avoid the __rseq_size == 32 ambiguity. It currently doesn't break the GLIBC_PRIVATE ABI because rseq_area is at the end of struct pthread. (As long as we assume that ld.so gets updated before libc.so because it comes first in lexicographic order.) Thanks, Florian
On 2024-08-15 13:54, Florian Weimer wrote: > * Michael Jeanson: > >> On 2024-08-06 09:32, Michael Jeanson wrote: >>> Before the removal of _dl_rseq_feature_size there was no behavior >>> change introduced until "Move the rseq area to the 'extra TLS' >>> block" which is where _rseq_size is set to 0 on failure. I agree it >>> would make more sense to have it in this patch, I'll queue this >>> change for the next round. >> >> I was wondering if this patch with the code to set _rseq_size to zero >> on registration failure, plus an additional modification to cap the >> _rseq_size to 32 (the size of the rseq area in the struct pthread) >> could be backported to 2.40 and earlier? It would give application >> code access to the 'node_id' and 'mm_cid' fields without introducing >> the whole 'extra tls' stuff. >> >> Thoughts? > > I thought about this some more. I think we should also increase the > reservation inside struct pthread beyond 32 bytes if we do this, so that > we avoid the __rseq_size == 32 ambiguity. It currently doesn't break > the GLIBC_PRIVATE ABI because rseq_area is at the end of struct pthread. > (As long as we assume that ld.so gets updated before libc.so because it > comes first in lexicographic order.) So the plan for 2.40 would look something like this? - Increase the size of the rseq area in 'struct pthread' to 64 bytes or more? - Set __rseq_size from AT_RSEQ_FEATURE_SIZE with a min of 20 and max of 64 The resulting __rseq_size for different kernels: - kernel < 6.3 (AT_RSEQ_FEATURE_SIZE == 0) : __rseq_size == 20 - Same behavior as current 2.40 - kernel >= 6.3 (AT_RSEQ_FEATURE_SIZE == 28) : __rseq_size == 28 - Exposes 'node_id' and 'mm_cid' to userspace - future unreleased kernel (AT_RSEQ_FEATURE_SIZE == 32) : __rseq_size == 32 - We expect such a kernel to never exist, but would technically work - future unreleased kernel (AT_RSEQ_FEATURE_SIZE == 36) : __rseq_size == 36 - Exposes future fields to userspace - future unreleased kernel (AT_RSEQ_FEATURE_SIZE > 64) : __rseq_size == 0 - Attempting registration with a kernel that has a feature size bigger than our rseq area would result in a failed registration. We don't deal with the planned compat mode of 32 bytes. Does that align with what you have in mind? Thanks, Michael
diff --git a/sysdeps/nptl/dl-tls_init_tp.c b/sysdeps/nptl/dl-tls_init_tp.c index 7803e19fd1..b2abb2762d 100644 --- a/sysdeps/nptl/dl-tls_init_tp.c +++ b/sysdeps/nptl/dl-tls_init_tp.c @@ -46,6 +46,8 @@ rtld_mutex_dummy (pthread_mutex_t *lock) const unsigned int __rseq_flags; +size_t _rseq_align attribute_hidden; + void __tls_pre_init_tp (void) { diff --git a/sysdeps/unix/sysv/linux/dl-parse_auxv.h b/sysdeps/unix/sysv/linux/dl-parse_auxv.h index ea2a58ecb1..d24daab3fc 100644 --- a/sysdeps/unix/sysv/linux/dl-parse_auxv.h +++ b/sysdeps/unix/sysv/linux/dl-parse_auxv.h @@ -21,6 +21,7 @@ #include <fpu_control.h> #include <ldsodefs.h> #include <link.h> +#include <rseq-internal.h> typedef ElfW(Addr) dl_parse_auxv_t[AT_MINSIGSTKSZ + 1]; @@ -59,5 +60,9 @@ void _dl_parse_auxv (ElfW(auxv_t) *av, dl_parse_auxv_t auxv_values) GLRO(dl_sysinfo) = auxv_values[AT_SYSINFO]; #endif + _rseq_size = MAX (auxv_values[AT_RSEQ_FEATURE_SIZE], + RSEQ_AREA_SIZE_INITIAL_USED); + _rseq_align = MAX (auxv_values[AT_RSEQ_ALIGN], RSEQ_MIN_ALIGN); + DL_PLATFORM_AUXV } diff --git a/sysdeps/unix/sysv/linux/rseq-internal.h b/sysdeps/unix/sysv/linux/rseq-internal.h index 7ea935b4ad..2aae447b60 100644 --- a/sysdeps/unix/sysv/linux/rseq-internal.h +++ b/sysdeps/unix/sysv/linux/rseq-internal.h @@ -25,12 +25,23 @@ #include <stdio.h> #include <sys/rseq.h> -/* 32 is the initially required value for the area size. The - actually used rseq size may be less (20 bytes initially). */ +/* Minimum size of the rseq area allocation required by the syscall. The + actually used rseq feature size may be less (20 bytes initially). */ #define RSEQ_AREA_SIZE_INITIAL 32 + +/* Minimum used feature size of the rseq area. */ #define RSEQ_AREA_SIZE_INITIAL_USED 20 -/* The variables are in .data.relro but are not yet write-protected. */ +/* Minimum alignment of the rseq area. */ +#define RSEQ_MIN_ALIGN 32 + +/* Alignment requirement of the rseq area. + Populated from the auxiliary vector with a minimum of '32'. */ +extern size_t _rseq_align attribute_hidden; + +/* Size of the active features in the rseq area. + Populated from the auxiliary vector with a minimum of '20'. + In .data.relro but not yet write-protected. */ extern unsigned int _rseq_size attribute_hidden; extern ptrdiff_t _rseq_offset attribute_hidden;
Get the rseq feature size and alignment requirement from the auxiliary vector for use inside the dynamic loader. This will be used in the TLS block allocator to compute the size and alignment of the rseq area block for the extended ABI support. Signed-off-by: Michael Jeanson <mjeanson@efficios.com> --- Changes since v11: - Removed _dl_rseq_feature_size, use __rseq_size instead - Replace GLRO(dl_rseq_align) with a hidden global variable _rseq_align --- sysdeps/nptl/dl-tls_init_tp.c | 2 ++ sysdeps/unix/sysv/linux/dl-parse_auxv.h | 5 +++++ sysdeps/unix/sysv/linux/rseq-internal.h | 17 ++++++++++++++--- 3 files changed, 21 insertions(+), 3 deletions(-)