Message ID | 20190212194253.1951-1-mathieu.desnoyers@efficios.com |
---|---|
Series | Restartable Sequences support for glibc 2.30 |
On 2/12/19 2:42 PM, Mathieu Desnoyers wrote:
> The only point that still appears to not reach consensus is whether it's
> acceptable to define the RSEQ_SIG code signature for each architecture.
> If I missed other points that failed to reach consensus, please let me
> know!

I have no objection to RSEQ_SIG being a signature that each architecture uses.

Can you summarize the previous discussions here?
* Mathieu Desnoyers:

> The only point that still appears to not reach consensus is whether it's
> acceptable to define the RSEQ_SIG code signature for each architecture.
> If I missed other points that failed to reach consensus, please let me
> know!

I still think the registration mechanism is very problematic and should be avoided.
On 3/22/19 1:39 PM, Florian Weimer wrote:
> * Mathieu Desnoyers:
>
>> The only point that still appears to not reach consensus is whether it's
>> acceptable to define the RSEQ_SIG code signature for each architecture.
>> If I missed other points that failed to reach consensus, please let me
>> know!
>
> I still think the registration mechanism is very problematic and
> should be avoided.

The *entire* registration mechanism?
* Carlos O'Donell:

> On 3/22/19 1:39 PM, Florian Weimer wrote:
>> * Mathieu Desnoyers:
>>
>>> The only point that still appears to not reach consensus is whether it's
>>> acceptable to define the RSEQ_SIG code signature for each architecture.
>>> If I missed other points that failed to reach consensus, please let me
>>> know!
>>
>> I still think the registration mechanism is very problematic and
>> should be avoided.
>
> The *entire* registration mechanism?

The reference-counting part. It's going to be of limited use, for a
few years at most, and we'll have to carry it forward indefinitely.
I don't think it's worth the complexity.
----- On Mar 22, 2019, at 1:34 PM, Carlos O'Donell codonell@redhat.com wrote:

> On 2/12/19 2:42 PM, Mathieu Desnoyers wrote:
>> The only point that still appears to not reach consensus is whether it's
>> acceptable to define the RSEQ_SIG code signature for each architecture.
>> If I missed other points that failed to reach consensus, please let me
>> know!
>
> I have no objection to RSEQ_SIG being a signature that each architecture uses.
>
> Can you summarize the previous discussions here?

The culprit is whether we provide an architecture-agnostic "generic" value
for RSEQ_SIG for architectures that do not define a specific RSEQ_SIG, or
leave RSEQ_SIG undefined for architectures that do not define it yet.

Considering that RSEQ_SIG ends up being a 32-bit value that maps to actual
architecture code compiled into applications and libraries, it's
understandable that it needs to be defined for each architecture specifically.
For instance, generating invalid instructions may have ill effects on tools like
objdump, and may also have an impact on the CPU speculative execution efficiency
in some cases.

There have been ideas floating around about having a "generic" value for
RSEQ_SIG that would apply to all architectures that do not override it.
However, if we have this, it makes it impossible for those architectures
to ever define a different value if they ever choose to, because that value
ends up being an ABI that is built into applications and libraries including
the glibc rseq header. Changing that value would break applications.

So I favor _not_ defining any generic RSEQ_SIG, and adding per-architecture
RSEQ_SIG defines gradually. Therefore, user-space can #ifdef on whether
RSEQ_SIG is defined or not after including the glibc rseq header.

Thanks,

Mathieu
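The `#ifdef` check described above could look roughly like the following minimal sketch. It assumes the glibc rseq header is installed as `<sys/rseq.h>` (the thread only says "the glibc rseq header"), and it guards the include with `__has_include` so the file still compiles where that header is absent. The helper name `rseq_sig_is_defined` is hypothetical and not part of any proposed API.

```c
/* Sketch only: assumes the glibc rseq header is <sys/rseq.h>. */
#if defined(__has_include)
# if __has_include(<sys/rseq.h>)
#  include <sys/rseq.h>
# endif
#endif

/* Hypothetical helper: returns 1 when the included header defines RSEQ_SIG
   for this architecture, 0 when it does not and rseq critical sections
   have to be compiled out. */
int rseq_sig_is_defined(void)
{
#ifdef RSEQ_SIG
    return 1;
#else
    return 0;
#endif
}
```

Code that emits rseq critical sections would sit inside the same `#ifdef RSEQ_SIG` guard, so an architecture without a signature simply builds the fallback path.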
On 3/22/19 1:55 PM, Mathieu Desnoyers wrote:
> ----- On Mar 22, 2019, at 1:34 PM, Carlos O'Donell codonell@redhat.com wrote:
>
>> On 2/12/19 2:42 PM, Mathieu Desnoyers wrote:
>>> The only point that still appears to not reach consensus is whether it's
>>> acceptable to define the RSEQ_SIG code signature for each architecture.
>>> If I missed other points that failed to reach consensus, please let me
>>> know!
>>
>> I have no objection to RSEQ_SIG being a signature that each architecture uses.
>>
>> Can you summarize the previous discussions here?
>
> The culprit is whether we provide an architecture-agnostic "generic" value
> for RSEQ_SIG for architectures that do not define a specific RSEQ_SIG, or
> leave RSEQ_SIG undefined for architectures that do not define it yet.
>
> Considering that RSEQ_SIG ends up being a 32-bit value that maps to actual
> architecture code compiled into applications and libraries, it's
> understandable that it needs to be defined for each architecture specifically.
> For instance, generating invalid instructions may have ill effects on tools like
> objdump, and may also have an impact on the CPU speculative execution efficiency
> in some cases.
>
> There have been ideas floating around about having a "generic" value for
> RSEQ_SIG that would apply to all architectures that do not override it.
> However, if we have this, it makes it impossible for those architectures
> to ever define a different value if they ever choose to, because that value
> ends up being an ABI that is built into applications and libraries including
> the glibc rseq header. Changing that value would break applications.
>
> So I favor _not_ defining any generic RSEQ_SIG, and adding per-architecture
> RSEQ_SIG defines gradually. Therefore, user-space can #ifdef on whether
> RSEQ_SIG is defined or not after including the glibc rseq header.

That makes perfect sense.

I was particularly worried about hardware that wasn't designed for having
constant pools, where injecting invalid instructions into the instruction
stream can worsen speculative execution and break developer tools that need
to know about new instructions (like valgrind).
On 3/22/19 1:51 PM, Florian Weimer wrote:
> * Carlos O'Donell:
>
>> On 3/22/19 1:39 PM, Florian Weimer wrote:
>>> * Mathieu Desnoyers:
>>>
>>>> The only point that still appears to not reach consensus is whether it's
>>>> acceptable to define the RSEQ_SIG code signature for each architecture.
>>>> If I missed other points that failed to reach consensus, please let me
>>>> know!
>>>
>>> I still think the registration mechanism is very problematic and
>>> should be avoided.
>>
>> The *entire* registration mechanism?
>
> The reference-counting part. It's going to be of limited use, for a
> few years at most, and we'll have to carry it forward indefinitely.
> I don't think it's worth the complexity.

I can understand Mathieu's position here: he wants to enable all
kinds of users, and wants to write libraries that use rseq today
but which work with future glibc. This is a perfectly reasonable
thing to want. The question we have to ask is the cost.

My suggestion is as follows, tell me what you think:

(a) Add a RSEQ_REGISTER_ALWAYS. Its meaning is that the
    core C library does unconditional registration/unregistration
    for all threads, and that your application must not call
    rseq with flags 0 (register)/RSEQ_FLAG_UNREGISTER.

(b) In a few years we remove all the ref count code and define
    a __rseq_lib_abi with a register_state that is set to
    a constant value of RSEQ_REGISTER_ALWAYS, and do nothing else.

This way we have a way to back out the ref count process and just
leave a public data symbol as the only part of the ABI.

My idea is that we just need one more RSEQ_REGISTER_* value to
indicate that libc has taken over unconditional registration.

Thoughts?
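As a rough illustration of suggestion (a), a library could consult the register state before touching rseq itself. Everything in this sketch is a stand-in: the thread names only RSEQ_REGISTER_ALWAYS, __rseq_lib_abi and register_state, so the `sketch_` types, the other enumerator, and the numeric values are hypothetical placeholders and not the layout from the patch series.

```c
#include <stdint.h>

/* Hypothetical values: only the name RSEQ_REGISTER_ALWAYS comes from the
   thread; the numbers and the other enumerator are placeholders. */
enum sketch_rseq_register_state {
    SKETCH_RSEQ_REGISTER_REFCOUNTED = 0, /* stand-in for the ref-counted states */
    SKETCH_RSEQ_REGISTER_ALWAYS = 1      /* libc registers every thread itself */
};

/* Hypothetical stand-in for the public __rseq_lib_abi data symbol. */
struct sketch_rseq_lib_abi {
    uint32_t register_state;
};

/* Returns 1 if a library may still register/unregister rseq on its own,
   0 if libc has taken over unconditional registration. */
int sketch_may_self_register(const struct sketch_rseq_lib_abi *abi)
{
    return abi->register_state != SKETCH_RSEQ_REGISTER_ALWAYS;
}
```

Under step (b), the state would become a constant, so the check above would always tell the library to leave registration alone.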
----- On Mar 22, 2019, at 3:45 PM, Carlos O'Donell codonell@redhat.com wrote:

> On 3/22/19 1:51 PM, Florian Weimer wrote:
>> * Carlos O'Donell:
>>
>>> On 3/22/19 1:39 PM, Florian Weimer wrote:
>>>> * Mathieu Desnoyers:
>>>>
>>>>> The only point that still appears to not reach consensus is whether it's
>>>>> acceptable to define the RSEQ_SIG code signature for each architecture.
>>>>> If I missed other points that failed to reach consensus, please let me
>>>>> know!
>>>>
>>>> I still think the registration mechanism is very problematic and
>>>> should be avoided.
>>>
>>> The *entire* registration mechanism?
>>
>> The reference-counting part. It's going to be of limited use, for a
>> few years at most, and we'll have to carry it forward indefinitely.
>> I don't think it's worth the complexity.
>
> I can understand Mathieu's position here: he wants to enable all
> kinds of users, and wants to write libraries that use rseq today
> but which work with future glibc. This is a perfectly reasonable
> thing to want. The question we have to ask is the cost.
>
> My suggestion is as follows, tell me what you think:
>
> (a) Add a RSEQ_REGISTER_ALWAYS. Its meaning is that the
>     core C library does unconditional registration/unregistration
>     for all threads, and that your application must not call
>     rseq with flags 0 (register)/RSEQ_FLAG_UNREGISTER.
>
> (b) In a few years we remove all the ref count code and define
>     a __rseq_lib_abi with a register_state that is set to
>     a constant value of RSEQ_REGISTER_ALWAYS, and do nothing else.
>
> This way we have a way to back out the ref count process and just
> leave a public data symbol as the only part of the ABI.
>
> My idea is that we just need one more RSEQ_REGISTER_* value to
> indicate that libc has taken over unconditional registration.
>
> Thoughts?

I think we can make it even simpler. We can move the TLS refcount and
state to an external library (librseq).

glibc would expose a new global "int" variable symbol __rseq_handled
acting as a boolean. It would initially be 0. glibc would set it to 1
in its C startup code when it effectively handles rseq registration.
That symbol would _not_ be a TLS variable (it's global).

librseq would be a new library used by early rseq adopters. It would
expose a rseq register/unregister API, which internally would:

- Check whether __rseq_handled is true. If so, it would do nothing,
  leaving rseq registration to the libc.
- If __rseq_handled is false, deal with multiple early adopters using
  TLS refcount and state variables internal to librseq.so.

That should take care of minimizing these metrics:

- glibc ABI complexity and maintenance burden in the long term,
- pain for rseq early adopters when upgrading to newer glibc.

Does that make sense?

Thanks,

Mathieu
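The librseq register/unregister logic proposed above might be sketched as follows. This is a simulation under stated assumptions, not the actual librseq code: `sketch_rseq_handled` stands in for the proposed glibc symbol __rseq_handled, every function name is hypothetical, and the real rseq system calls are replaced by a per-thread flag so the reference counting can be observed.

```c
/* Stand-in for the proposed glibc global __rseq_handled (hypothetical here). */
int sketch_rseq_handled = 0;

/* Per-thread state that would be internal to librseq.so. */
static _Thread_local unsigned int sketch_refcount = 0;
static _Thread_local int sketch_kernel_registered = 0; /* simulates kernel state */

/* Hypothetical API: register the current thread, counting nested users. */
int sketch_rseq_register_current_thread(void)
{
    if (sketch_rseq_handled)
        return 0;                      /* libc owns registration: do nothing */
    if (sketch_refcount++ == 0)
        sketch_kernel_registered = 1;  /* real code: rseq syscall, flags 0 */
    return 0;
}

/* Hypothetical API: drop one reference, unregistering on the last one. */
int sketch_rseq_unregister_current_thread(void)
{
    if (sketch_rseq_handled)
        return 0;                      /* libc owns registration: do nothing */
    if (sketch_refcount == 0)
        return -1;                     /* unbalanced unregister */
    if (--sketch_refcount == 0)
        sketch_kernel_registered = 0;  /* real code: RSEQ_FLAG_UNREGISTER */
    return 0;
}

/* Observer for the simulated registration state of the calling thread. */
int sketch_is_registered(void)
{
    return sketch_kernel_registered;
}
```

With this split, two early adopters in the same thread can each register and unregister without tearing down the other's registration, and once libc sets the flag both calls become no-ops.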