Message ID | 20200713193434.30440-1-mathieu.desnoyers@efficios.com |
---|---|
State | New |
Headers | show |
Series | [RFC] Linux: Use fixed rseq_len value for rseq registration | expand |
* Mathieu Desnoyers: > + /* The rseq_len parameter does not allow extending struct rseq. Fix its > + value to 32 as expected by the Linux kernel. */ > + ret = INTERNAL_SYSCALL_CALL (rseq, &__rseq_abi, 32, 0, RSEQ_SIG); If the layout of struct rseq can change in the kernel headers, than far more significant changes are needed. glibc cannot change its ABI depending on the version of the kernel headers it is compiled against. Thanks, Florian
----- On Jul 14, 2020, at 4:51 AM, Florian Weimer fweimer@redhat.com wrote: > * Mathieu Desnoyers: > >> + /* The rseq_len parameter does not allow extending struct rseq. Fix its >> + value to 32 as expected by the Linux kernel. */ >> + ret = INTERNAL_SYSCALL_CALL (rseq, &__rseq_abi, 32, 0, RSEQ_SIG); > > If the layout of struct rseq can change in the kernel headers, than far > more significant changes are needed. glibc cannot change its ABI > depending on the version of the kernel headers it is compiled against. What I have in mind for struct rseq is that the current structure layout is guaranteed as a "minimum" requirement. It can then be extended by the kernel UAPI in a backward compatible way: if user-space provides enough room for extension, and the kernel supports the additional features, then user-space can use those additional features. Adding extra room in the __rseq_abi can be done though: - Upgrading glibc with knowledge of the extra layout, - Preloading an interposition library which defines __rseq_abi with the extended layout, - Defining __rseq_abi with extended layout in the application binary. What I suspect is your concern here is that two glibc-2.32 builds against different versions of kernel headers could lead to different __rseq_abi size, which is unexpected. Would using the internal struct rseq all the time in glibc (rather than uapi linux/rseq.h) resolve your concern ? This way, glibc could integrate extended layouts on version bump (e.g. 2.33). Thanks, Mathieu
* Mathieu Desnoyers: > ----- On Jul 14, 2020, at 4:51 AM, Florian Weimer fweimer@redhat.com wrote: > >> * Mathieu Desnoyers: >> >>> + /* The rseq_len parameter does not allow extending struct rseq. Fix its >>> + value to 32 as expected by the Linux kernel. */ >>> + ret = INTERNAL_SYSCALL_CALL (rseq, &__rseq_abi, 32, 0, RSEQ_SIG); >> >> If the layout of struct rseq can change in the kernel headers, than far >> more significant changes are needed. glibc cannot change its ABI >> depending on the version of the kernel headers it is compiled against. > > What I have in mind for struct rseq is that the current structure layout > is guaranteed as a "minimum" requirement. It can then be extended by the > kernel UAPI in a backward compatible way: if user-space provides enough > room for extension, and the kernel supports the additional features, then > user-space can use those additional features. > > Adding extra room in the __rseq_abi can be done though: > > - Upgrading glibc with knowledge of the extra layout, > - Preloading an interposition library which defines __rseq_abi with the > extended layout, > - Defining __rseq_abi with extended layout in the application binary. > > What I suspect is your concern here is that two glibc-2.32 builds against > different versions of kernel headers could lead to different __rseq_abi > size, which is unexpected. Indeed. > Would using the internal struct rseq all the time in glibc (rather than > uapi linux/rseq.h) resolve your concern ? This way, glibc could integrate > extended layouts on version bump (e.g. 2.33). It works reliably as long as glibc only ever uses the minimum rseq size. And since glibc monopolizes the rseq registration, applications cannot register a larger area. So there is no way to make use of any future kernel extensions. Thanks, Florian
On Tue, Jul 14, 2020 at 03:28:47PM +0200, Florian Weimer via Libc-alpha wrote: > * Mathieu Desnoyers: > > > ----- On Jul 14, 2020, at 4:51 AM, Florian Weimer fweimer@redhat.com wrote: > > > >> * Mathieu Desnoyers: > >> > >>> + /* The rseq_len parameter does not allow extending struct rseq. Fix its > >>> + value to 32 as expected by the Linux kernel. */ > >>> + ret = INTERNAL_SYSCALL_CALL (rseq, &__rseq_abi, 32, 0, RSEQ_SIG); > >> > >> If the layout of struct rseq can change in the kernel headers, than far > >> more significant changes are needed. glibc cannot change its ABI > >> depending on the version of the kernel headers it is compiled against. > > > > What I have in mind for struct rseq is that the current structure layout > > is guaranteed as a "minimum" requirement. It can then be extended by the > > kernel UAPI in a backward compatible way: if user-space provides enough > > room for extension, and the kernel supports the additional features, then > > user-space can use those additional features. > > > > Adding extra room in the __rseq_abi can be done though: > > > > - Upgrading glibc with knowledge of the extra layout, > > - Preloading an interposition library which defines __rseq_abi with the > > extended layout, > > - Defining __rseq_abi with extended layout in the application binary. > > > > What I suspect is your concern here is that two glibc-2.32 builds against > > different versions of kernel headers could lead to different __rseq_abi > > size, which is unexpected. > > Indeed. > > > Would using the internal struct rseq all the time in glibc (rather than > > uapi linux/rseq.h) resolve your concern ? This way, glibc could integrate > > extended layouts on version bump (e.g. 2.33). > > It works reliably as long as glibc only ever uses the minimum rseq size. > And since glibc monopolizes the rseq registration, applications cannot > register a larger area. So there is no way to make use of any future > kernel extensions. But when you bump ABI in glibc you can switch to a new rseq size, no? Christian
* Christian Brauner: >> It works reliably as long as glibc only ever uses the minimum rseq size. >> And since glibc monopolizes the rseq registration, applications cannot >> register a larger area. So there is no way to make use of any future >> kernel extensions. > > But when you bump ABI in glibc you can switch to a new rseq size, no? We are expected to support interposers with their own definition __rseq_abi, which could be smaller. Thanks, Florian
----- On Jul 14, 2020, at 9:54 AM, Florian Weimer fweimer@redhat.com wrote: > * Christian Brauner: > >>> It works reliably as long as glibc only ever uses the minimum rseq size. >>> And since glibc monopolizes the rseq registration, applications cannot >>> register a larger area. So there is no way to make use of any future >>> kernel extensions. >> >> But when you bump ABI in glibc you can switch to a new rseq size, no? > > We are expected to support interposers with their own definition > __rseq_abi, which could be smaller. In this scenario, the interposers would have to follow the UAPI rules: The size should at least cover the original struct rseq fields, up to and including the flags field. This just means glibc would internally abide by the same rules as the other users if it want to use an extended rseq structure, and not expect to be the only possible library defining the __rseq_abi symbol. If glibc wants to use the extension feature, it would have to validate that the RSEQ_TLS_FLAG_SIZE flag is set in the __rseq_abi.flags, and then validate that __rseq_abi.kernel_size covers the feature it needs before accessing the feature field. It would have to do something similar anyway to make sure the kernel supports those extensions. Thanks, Mathieu
* Mathieu Desnoyers: > ----- On Jul 14, 2020, at 9:54 AM, Florian Weimer fweimer@redhat.com wrote: > >> * Christian Brauner: >> >>>> It works reliably as long as glibc only ever uses the minimum rseq size. >>>> And since glibc monopolizes the rseq registration, applications cannot >>>> register a larger area. So there is no way to make use of any future >>>> kernel extensions. >>> >>> But when you bump ABI in glibc you can switch to a new rseq size, no? >> >> We are expected to support interposers with their own definition >> __rseq_abi, which could be smaller. > > In this scenario, the interposers would have to follow the UAPI rules: > The size should at least cover the original struct rseq fields, > up to and including the flags field. If you use an ELF symbol, you also have to follow rules related to ELF linking (and C language requirements). > This just means glibc would internally abide by the same rules as the > other users if it want to use an extended rseq structure, and not expect > to be the only possible library defining the __rseq_abi symbol. If glibc > wants to use the extension feature, it would have to validate that the > RSEQ_TLS_FLAG_SIZE flag is set in the __rseq_abi.flags, and then validate > that __rseq_abi.kernel_size covers the feature it needs before accessing > the feature field. This still does not give us a way to perform the rseq registration with the size expected by an interposing definition of __rseq_abi. I think we are looking at this from the wrong perspective. It's not userspace that is setting the size here, it's the kernel based on the features it supports. So the kernel should put the size into the auxiliary vector, and the registration should use that size. But that doesn't align well with the use of an ELF TLS symbol. Thanks, Florian
----- On Jul 14, 2020, at 11:07 AM, Florian Weimer fweimer@redhat.com wrote: > * Mathieu Desnoyers: > >> ----- On Jul 14, 2020, at 9:54 AM, Florian Weimer fweimer@redhat.com wrote: >> >>> * Christian Brauner: >>> >>>>> It works reliably as long as glibc only ever uses the minimum rseq size. >>>>> And since glibc monopolizes the rseq registration, applications cannot >>>>> register a larger area. So there is no way to make use of any future >>>>> kernel extensions. >>>> >>>> But when you bump ABI in glibc you can switch to a new rseq size, no? >>> >>> We are expected to support interposers with their own definition >>> __rseq_abi, which could be smaller. >> >> In this scenario, the interposers would have to follow the UAPI rules: >> The size should at least cover the original struct rseq fields, >> up to and including the flags field. > > If you use an ELF symbol, you also have to follow rules related to ELF > linking (and C language requirements). Indeed. > >> This just means glibc would internally abide by the same rules as the >> other users if it want to use an extended rseq structure, and not expect >> to be the only possible library defining the __rseq_abi symbol. If glibc >> wants to use the extension feature, it would have to validate that the >> RSEQ_TLS_FLAG_SIZE flag is set in the __rseq_abi.flags, and then validate >> that __rseq_abi.kernel_size covers the feature it needs before accessing >> the feature field. > > This still does not give us a way to perform the rseq registration with > the size expected by an interposing definition of __rseq_abi. Yes it does. The interposing definition of __rseq_abi would have the __rseq_abi.flags RSEQ_TLS_FLAG_SIZE set, and would have __rseq_abi.user_size = offsetof(struct rseq, end) (or something similar) from its own perspective. That would be passed to the kernel on registration. The "rseq_len" system call parameter would always stay at value 32 though. > I think we are looking at this from the wrong perspective. It's not > userspace that is setting the size here, it's the kernel based on the > features it supports. So the kernel should put the size into the > auxiliary vector, and the registration should use that size. But that > doesn't align well with the use of an ELF TLS symbol. We have a few possible ways to do things here: 1) Kernel exports supported size, incompatible with ELF TLS symbol, 2) Userspace dictates supported size, compatible with ELF TLS symbol, triggers failure if the kernel supports a smaller size, 3) Userspace lets kernel know how much space is available for struct rseq (through user_size field), and the kernel lets user-space know how much of that structure is being filled (through kernel_size field). This would also be compatible with ELF TLS symbol AFAIU, and would allow extending struct rseq. Option (3) would allow us to have the speed gains that come with using a TLS from the fast-path, while allowing extension of struct rseq. Or is there anything in that scheme that breaks ELF rules or C language requirements ? Thanks, Mathieu
The 07/14/2020 11:30, Mathieu Desnoyers via Libc-alpha wrote: > > I think we are looking at this from the wrong perspective. It's not > > userspace that is setting the size here, it's the kernel based on the > > features it supports. So the kernel should put the size into the > > auxiliary vector, and the registration should use that size. But that > > doesn't align well with the use of an ELF TLS symbol. this is why it is better to use a function that returns a pointer than a tls symbol as public abi. > We have a few possible ways to do things here: > > 1) Kernel exports supported size, incompatible with ELF TLS symbol, > > 2) Userspace dictates supported size, compatible with ELF TLS symbol, > triggers failure if the kernel supports a smaller size, > > 3) Userspace lets kernel know how much space is available for struct rseq > (through user_size field), and the kernel lets user-space know how much > of that structure is being filled (through kernel_size field). This > would also be compatible with ELF TLS symbol AFAIU, and would allow > extending struct rseq. > > Option (3) would allow us to have the speed gains that come with using a > TLS from the fast-path, while allowing extension of struct rseq. > > Or is there anything in that scheme that breaks ELF rules or C language > requirements ? how would users access those extension fields? how would the public struct definition change?
----- On Jul 14, 2020, at 12:01 PM, Szabolcs Nagy szabolcs.nagy@arm.com wrote: > The 07/14/2020 11:30, Mathieu Desnoyers via Libc-alpha wrote: >> > I think we are looking at this from the wrong perspective. It's not >> > userspace that is setting the size here, it's the kernel based on the >> > features it supports. So the kernel should put the size into the >> > auxiliary vector, and the registration should use that size. But that >> > doesn't align well with the use of an ELF TLS symbol. > > this is why it is better to use a function that returns > a pointer than a tls symbol as public abi. > >> We have a few possible ways to do things here: >> >> 1) Kernel exports supported size, incompatible with ELF TLS symbol, >> >> 2) Userspace dictates supported size, compatible with ELF TLS symbol, >> triggers failure if the kernel supports a smaller size, >> >> 3) Userspace lets kernel know how much space is available for struct rseq >> (through user_size field), and the kernel lets user-space know how much >> of that structure is being filled (through kernel_size field). This >> would also be compatible with ELF TLS symbol AFAIU, and would allow >> extending struct rseq. >> >> Option (3) would allow us to have the speed gains that come with using a >> TLS from the fast-path, while allowing extension of struct rseq. >> >> Or is there anything in that scheme that breaks ELF rules or C language >> requirements ? > > how would users access those extension fields? if (__rseq_abi.flags & RSEQ_TLS_FLAG_SIZE) { /* Allowed to access user_size and kernel_size. */ if (__rseq_abi.kernel_size >= offsetof(struct rseq, myfield) + sizeof(((struct rseq *)NULL)->myfield)) { /* Allowed to access __rseq_abi.myfield. User code should remember this, e.g. in a static variable. */ } } Once the user program/library has confirmed that it can indeed access the field, it can directly use the field with __rseq_abi.myfield. > how would the public struct definition change? Additional fields would be added at the end of struct rseq. If we use the approach with the end[] flexible array member, then it would move down to stay at the very end of the structure. If we don't (because it is not c++ compliant), then we would update an "rseq_last_field" macro to point to the last field in struct rseq. Adding additional fields to struct rseq would increase its size due to added fields, and eventually may change its alignment to a larger value. Thanks, Mathieu
* Mathieu Desnoyers: > ----- On Jul 14, 2020, at 12:01 PM, Szabolcs Nagy szabolcs.nagy@arm.com wrote: > >> The 07/14/2020 11:30, Mathieu Desnoyers via Libc-alpha wrote: >>> > I think we are looking at this from the wrong perspective. It's not >>> > userspace that is setting the size here, it's the kernel based on the >>> > features it supports. So the kernel should put the size into the >>> > auxiliary vector, and the registration should use that size. But that >>> > doesn't align well with the use of an ELF TLS symbol. >> >> this is why it is better to use a function that returns >> a pointer than a tls symbol as public abi. >> >>> We have a few possible ways to do things here: >>> >>> 1) Kernel exports supported size, incompatible with ELF TLS symbol, >>> >>> 2) Userspace dictates supported size, compatible with ELF TLS symbol, >>> triggers failure if the kernel supports a smaller size, >>> >>> 3) Userspace lets kernel know how much space is available for struct rseq >>> (through user_size field), and the kernel lets user-space know how much >>> of that structure is being filled (through kernel_size field). This >>> would also be compatible with ELF TLS symbol AFAIU, and would allow >>> extending struct rseq. >>> >>> Option (3) would allow us to have the speed gains that come with using a >>> TLS from the fast-path, while allowing extension of struct rseq. >>> >>> Or is there anything in that scheme that breaks ELF rules or C language >>> requirements ? >> >> how would users access those extension fields? > > if (__rseq_abi.flags & RSEQ_TLS_FLAG_SIZE) { > /* Allowed to access user_size and kernel_size. */ > if (__rseq_abi.kernel_size >= offsetof(struct rseq, myfield) + sizeof(((struct rseq *)NULL)->myfield)) { > /* Allowed to access __rseq_abi.myfield. User code should remember this, e.g. in a static variable. */ > } > } Technically, this needs a compiler extension, so that the allowed accesses to the __rseq_abi.myfield field are not moved before the flags and size checks. Thanks, Florian
----- On Jul 14, 2020, at 1:03 PM, Florian Weimer fweimer@redhat.com wrote: > * Mathieu Desnoyers: > >> ----- On Jul 14, 2020, at 12:01 PM, Szabolcs Nagy szabolcs.nagy@arm.com wrote: >> >>> The 07/14/2020 11:30, Mathieu Desnoyers via Libc-alpha wrote: >>>> > I think we are looking at this from the wrong perspective. It's not >>>> > userspace that is setting the size here, it's the kernel based on the >>>> > features it supports. So the kernel should put the size into the >>>> > auxiliary vector, and the registration should use that size. But that >>>> > doesn't align well with the use of an ELF TLS symbol. >>> >>> this is why it is better to use a function that returns >>> a pointer than a tls symbol as public abi. >>> >>>> We have a few possible ways to do things here: >>>> >>>> 1) Kernel exports supported size, incompatible with ELF TLS symbol, >>>> >>>> 2) Userspace dictates supported size, compatible with ELF TLS symbol, >>>> triggers failure if the kernel supports a smaller size, >>>> >>>> 3) Userspace lets kernel know how much space is available for struct rseq >>>> (through user_size field), and the kernel lets user-space know how much >>>> of that structure is being filled (through kernel_size field). This >>>> would also be compatible with ELF TLS symbol AFAIU, and would allow >>>> extending struct rseq. >>>> >>>> Option (3) would allow us to have the speed gains that come with using a >>>> TLS from the fast-path, while allowing extension of struct rseq. >>>> >>>> Or is there anything in that scheme that breaks ELF rules or C language >>>> requirements ? >>> >>> how would users access those extension fields? >> >> if (__rseq_abi.flags & RSEQ_TLS_FLAG_SIZE) { >> /* Allowed to access user_size and kernel_size. */ >> if (__rseq_abi.kernel_size >= offsetof(struct rseq, myfield) + sizeof(((struct >> rseq *)NULL)->myfield)) { >> /* Allowed to access __rseq_abi.myfield. User code should remember this, e.g. in >> a static variable. */ >> } >> } > > Technically, this needs a compiler extension, so that the allowed > accesses to the __rseq_abi.myfield field are not moved before the flags > and size checks. If those accesses are performed as volatile loads, or with atomic relaxed semantic, the compiler should not be allowed to change the order. Likewise if we use asm volatile (::: "memory") clobber between the loads. Thanks, Mathieu
diff --git a/sysdeps/unix/sysv/linux/rseq-internal.h b/sysdeps/unix/sysv/linux/rseq-internal.h index 8f6772ca1d..3522668f3a 100644 --- a/sysdeps/unix/sysv/linux/rseq-internal.h +++ b/sysdeps/unix/sysv/linux/rseq-internal.h @@ -33,8 +33,9 @@ rseq_register_current_thread (void) if (__rseq_abi.cpu_id != RSEQ_CPU_ID_UNINITIALIZED) __libc_fatal ("glibc fatal error: " "rseq already initialized for this thread\n"); - ret = INTERNAL_SYSCALL_CALL (rseq, &__rseq_abi, sizeof (struct rseq), - 0, RSEQ_SIG); + /* The rseq_len parameter does not allow extending struct rseq. Fix its + value to 32 as expected by the Linux kernel. */ + ret = INTERNAL_SYSCALL_CALL (rseq, &__rseq_abi, 32, 0, RSEQ_SIG); if (INTERNAL_SYSCALL_ERROR_P (ret)) { const char *msg = NULL;