diff mbox series

[v2] Linux: Make __rseq_size useful for feature detection (bug 31965)

Message ID 87ikxfd7d3.fsf@oldenburg.str.redhat.com
State New
Headers show
Series [v2] Linux: Make __rseq_size useful for feature detection (bug 31965) | expand

Commit Message

Florian Weimer July 8, 2024, 7:14 p.m. UTC
The __rseq_size value is now the active area of struct rseq
(so 20 initially), not the full struct size including padding
at the end (32 initially).

Update misc/tst-rseq to print some additional diagnostics.

---
v2: Treat as bug fix, add bug number, remove symbol versioning.
 NEWS                                    |  3 +++
 manual/threads.texi                     |  8 ++++++--
 sysdeps/nptl/dl-tls_init_tp.c           |  8 +-------
 sysdeps/unix/sysv/linux/rseq-internal.h | 23 +++++++++++++++++++++--
 sysdeps/unix/sysv/linux/tst-rseq.c      | 10 +++++++++-
 5 files changed, 40 insertions(+), 12 deletions(-)


base-commit: 9fc639f654dc004736836613be703e6bed0c36a8

Comments

Michael Jeanson July 9, 2024, 2:28 p.m. UTC | #1
On 2024-07-08 15:14, Florian Weimer wrote:
> The __rseq_size value is now the active area of struct rseq
> (so 20 initially), not the full struct size including padding
> at the end (32 initially).
> 
> Update misc/tst-rseq to print some additional diagnostics.

LGTM.

Reviewed-by: Michael Jeanson <mjeanson@efficios.com>

Cheers,

Michael
Mathieu Desnoyers July 9, 2024, 2:41 p.m. UTC | #2
On 08-Jul-2024 09:14:00 PM, Florian Weimer wrote:
> The __rseq_size value is now the active area of struct rseq
> (so 20 initially), not the full struct size including padding
> at the end (32 initially).
> 
> Update misc/tst-rseq to print some additional diagnostics.

Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>

So applications that want to be fully compatible with the older
behavior should consider a __rseq_size=32 as a special-case, in which
case it should either:

A) consider the feature size to be 20 (which is always safe but may be
   missing supported features), or

B) use the result of getauxval(AT_RSEQ_FEATURE_SIZE) to figure out the
   supported kernel features. If getauxval(AT_RSEQ_FEATURE_SIZE)=0,
   20 bytes of feature are supported, else the available features would
   be the minimum between 32 and getauxval(AT_RSEQ_FEATURE_SIZE).

Note that we may want to consider ensuring getauxval(AT_RSEQ_FEATURE_SIZE)
never returns the special value 32, e.g. by making sure we expose a few
new fields at once in a kernel release. This way userspace could consider
that __rseq_size=32 really means that 20 bytes are active.

If we allow getauxval(AT_RSEQ_FEATURE_SIZE) to eventually return 32,
then userspace using option (A) could "regress" when the kernel expose
more features by assuming it means only 20 bytes are active, when in
fact 32 bytes are supported. If userspace use option (B) and validate
__rseq_size=32 with getauxval(AT_RSEQ_FEATURE_SIZE), we never have such
possible feature regression.

Thanks!

Mathieu

> 
> ---
> v2: Treat as bug fix, add bug number, remove symbol versioning.
>  NEWS                                    |  3 +++
>  manual/threads.texi                     |  8 ++++++--
>  sysdeps/nptl/dl-tls_init_tp.c           |  8 +-------
>  sysdeps/unix/sysv/linux/rseq-internal.h | 23 +++++++++++++++++++++--
>  sysdeps/unix/sysv/linux/tst-rseq.c      | 10 +++++++++-
>  5 files changed, 40 insertions(+), 12 deletions(-)
> 
> diff --git a/NEWS b/NEWS
> index f626896aa0..3824815917 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -49,6 +49,9 @@ Deprecated and removed features, and other changes affecting compatibility:
>    <utmpx.h> (except for login_tty) due to locking and session management
>    problems.
>  
> +* __rseq_size now denotes the size of the active rseq area (20 bytes
> +  initially), not the size of struct rseq (32 bytes initially).
> +
>  Changes to build and runtime requirements:
>  
>    [Add changes to build and runtime requirements here]
> diff --git a/manual/threads.texi b/manual/threads.texi
> index e5544ff3da..25e99c9606 100644
> --- a/manual/threads.texi
> +++ b/manual/threads.texi
> @@ -1007,8 +1007,12 @@ This variable is either zero (if restartable sequence registration
>  failed or has been disabled) or the size of the restartable sequence
>  registration.  This can be different from the size of @code{struct rseq}
>  if the kernel has extended the size of the registration.  If
> -registration is successful, @code{__rseq_size} is at least 32 (the
> -initial size of @code{struct rseq}).
> +registration is successful, @code{__rseq_size} is at least 20 (the
> +initially active size of @code{struct rseq}).
> +
> +Previous versions of @theglibc{} set this to 32 even if the kernel only
> +supported the initial area of 20 bytes because the value included unused
> +padding at the end of the restartable sequence area.
>  @end deftypevar
>  
>  @deftypevar {unsigned int} __rseq_flags
> diff --git a/sysdeps/nptl/dl-tls_init_tp.c b/sysdeps/nptl/dl-tls_init_tp.c
> index 7eb35fb133..7803e19fd1 100644
> --- a/sysdeps/nptl/dl-tls_init_tp.c
> +++ b/sysdeps/nptl/dl-tls_init_tp.c
> @@ -46,10 +46,6 @@ rtld_mutex_dummy (pthread_mutex_t *lock)
>  
>  const unsigned int __rseq_flags;
>  
> -/* The variables are in .data.relro but are not yet write-protected.  */
> -extern unsigned int _rseq_size attribute_hidden;
> -extern ptrdiff_t _rseq_offset attribute_hidden;
> -
>  void
>  __tls_pre_init_tp (void)
>  {
> @@ -106,9 +102,7 @@ __tls_init_tp (void)
>      bool do_rseq = true;
>      do_rseq = TUNABLE_GET (rseq, int, NULL);
>      if (rseq_register_current_thread (pd, do_rseq))
> -      {
> -        _rseq_size = sizeof (pd->rseq_area);
> -      }
> +      _rseq_size = RSEQ_AREA_SIZE_INITIAL_USED;
>  
>  #ifdef RSEQ_SIG
>      /* This should be a compile-time constant, but the current
> diff --git a/sysdeps/unix/sysv/linux/rseq-internal.h b/sysdeps/unix/sysv/linux/rseq-internal.h
> index 48eebc1e16..7ea935b4ad 100644
> --- a/sysdeps/unix/sysv/linux/rseq-internal.h
> +++ b/sysdeps/unix/sysv/linux/rseq-internal.h
> @@ -25,15 +25,34 @@
>  #include <stdio.h>
>  #include <sys/rseq.h>
>  
> +/* 32 is the initially required value for the area size.  The
> +   actually used rseq size may be less (20 bytes initially).  */
> +#define RSEQ_AREA_SIZE_INITIAL 32
> +#define RSEQ_AREA_SIZE_INITIAL_USED 20
> +
> +/* The variables are in .data.relro but are not yet write-protected.  */
> +extern unsigned int _rseq_size attribute_hidden;
> +extern ptrdiff_t _rseq_offset attribute_hidden;
> +
>  #ifdef RSEQ_SIG
>  static inline bool
>  rseq_register_current_thread (struct pthread *self, bool do_rseq)
>  {
>    if (do_rseq)
>      {
> +      unsigned int size;
> +#if IS_IN (rtld)
> +      /* Use the hidden symbol in ld.so.  */
> +      size = _rseq_size;
> +#else
> +      size = __rseq_size;
> +#endif
> +      if (size < RSEQ_AREA_SIZE_INITIAL)
> +        /* The initial implementation used only 20 bytes out of 32,
> +           but still expected size 32.  */
> +        size = RSEQ_AREA_SIZE_INITIAL;
>        int ret = INTERNAL_SYSCALL_CALL (rseq, &self->rseq_area,
> -                                       sizeof (self->rseq_area),
> -                                       0, RSEQ_SIG);
> +                                       size, 0, RSEQ_SIG);
>        if (!INTERNAL_SYSCALL_ERROR_P (ret))
>          return true;
>      }
> diff --git a/sysdeps/unix/sysv/linux/tst-rseq.c b/sysdeps/unix/sysv/linux/tst-rseq.c
> index 2c90409ba0..08a9533130 100644
> --- a/sysdeps/unix/sysv/linux/tst-rseq.c
> +++ b/sysdeps/unix/sysv/linux/tst-rseq.c
> @@ -29,6 +29,7 @@
>  # include <stdlib.h>
>  # include <string.h>
>  # include <syscall.h>
> +# include <sys/auxv.h>
>  # include <thread_pointer.h>
>  # include <tls.h>
>  # include "tst-rseq.h"
> @@ -42,7 +43,8 @@ do_rseq_main_test (void)
>    TEST_COMPARE (__rseq_flags, 0);
>    TEST_VERIFY ((char *) __thread_pointer () + __rseq_offset
>                 == (char *) &pd->rseq_area);
> -  TEST_COMPARE (__rseq_size, sizeof (pd->rseq_area));
> +  /* The current implementation only supports the initial size.  */
> +  TEST_COMPARE (__rseq_size, 20);
>  }
>  
>  static void
> @@ -52,6 +54,12 @@ do_rseq_test (void)
>      {
>        FAIL_UNSUPPORTED ("kernel does not support rseq, skipping test");
>      }
> +  printf ("info: __rseq_size: %u\n", __rseq_size);
> +  printf ("info: __rseq_offset: %td\n", __rseq_offset);
> +  printf ("info: __rseq_flags: %u\n", __rseq_flags);
> +  printf ("info: getauxval (AT_RSEQ_FEATURE_SIZE): %ld\n",
> +          getauxval (AT_RSEQ_FEATURE_SIZE));
> +  printf ("info: getauxval (AT_RSEQ_ALIGN): %ld\n", getauxval (AT_RSEQ_ALIGN));
>    do_rseq_main_test ();
>  }
>  #else /* RSEQ_SIG */
> 
> base-commit: 9fc639f654dc004736836613be703e6bed0c36a8
>
Florian Weimer July 9, 2024, 3:10 p.m. UTC | #3
* Michael Jeanson:

> Reviewed-by: Michael Jeanson <mjeanson@efficios.com>

* Mathieu Desnoyers:

> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>

Thanks.

Andreas, is this okay for 2.40?

Florian
Andreas K. Huettel July 9, 2024, 4:37 p.m. UTC | #4
Am Dienstag, 9. Juli 2024, 17:10:45 CEST schrieb Florian Weimer:
> * Michael Jeanson:
> 
> > Reviewed-by: Michael Jeanson <mjeanson@efficios.com>
> 
> * Mathieu Desnoyers:
> 
> > Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> 
> Thanks.
> 
> Andreas, is this okay for 2.40?
> 

Sure, as already discussed
Mathieu Desnoyers July 9, 2024, 6:28 p.m. UTC | #5
On 2024-07-09 11:10, Florian Weimer wrote:
> * Michael Jeanson:
> 
>> Reviewed-by: Michael Jeanson <mjeanson@efficios.com>
> 
> * Mathieu Desnoyers:
> 
>> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>

Hi Florian,

I have created a prototype patch against librseq to integrate with
your proposed changes. Feel free to have a look. Feedback is welcome!

   https://review.lttng.org/c/librseq/+/12910

Note that librseq does not have any release nor offers any ABI
guarantees yet, so we can change the semantic as needed.

Thanks,

Mathieu

> 
> Thanks.
> 
> Andreas, is this okay for 2.40?
> 
> Florian
>
Florian Weimer July 10, 2024, 4:59 p.m. UTC | #6
* Mathieu Desnoyers:

> On 2024-07-09 11:10, Florian Weimer wrote:
>> * Michael Jeanson:
>> 
>>> Reviewed-by: Michael Jeanson <mjeanson@efficios.com>
>> * Mathieu Desnoyers:
>> 
>>> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
>
> Hi Florian,
>
> I have created a prototype patch against librseq to integrate with
> your proposed changes. Feel free to have a look. Feedback is welcome!
>
>   https://review.lttng.org/c/librseq/+/12910
>
> Note that librseq does not have any release nor offers any ABI
> guarantees yet, so we can change the semantic as needed.

Thanks.  I've submitted a PR against CRIU to adjust to the new glibc
behavior:

  Adjust to glibc __rseq_size semantic change
  <https://github.com/checkpoint-restore/criu/pull/2440>

Let's see if the CRIU developers raise any concerns with the glibc
changes.

Florian
diff mbox series

Patch

diff --git a/NEWS b/NEWS
index f626896aa0..3824815917 100644
--- a/NEWS
+++ b/NEWS
@@ -49,6 +49,9 @@  Deprecated and removed features, and other changes affecting compatibility:
   <utmpx.h> (except for login_tty) due to locking and session management
   problems.
 
+* __rseq_size now denotes the size of the active rseq area (20 bytes
+  initially), not the size of struct rseq (32 bytes initially).
+
 Changes to build and runtime requirements:
 
   [Add changes to build and runtime requirements here]
diff --git a/manual/threads.texi b/manual/threads.texi
index e5544ff3da..25e99c9606 100644
--- a/manual/threads.texi
+++ b/manual/threads.texi
@@ -1007,8 +1007,12 @@  This variable is either zero (if restartable sequence registration
 failed or has been disabled) or the size of the restartable sequence
 registration.  This can be different from the size of @code{struct rseq}
 if the kernel has extended the size of the registration.  If
-registration is successful, @code{__rseq_size} is at least 32 (the
-initial size of @code{struct rseq}).
+registration is successful, @code{__rseq_size} is at least 20 (the
+initially active size of @code{struct rseq}).
+
+Previous versions of @theglibc{} set this to 32 even if the kernel only
+supported the initial area of 20 bytes because the value included unused
+padding at the end of the restartable sequence area.
 @end deftypevar
 
 @deftypevar {unsigned int} __rseq_flags
diff --git a/sysdeps/nptl/dl-tls_init_tp.c b/sysdeps/nptl/dl-tls_init_tp.c
index 7eb35fb133..7803e19fd1 100644
--- a/sysdeps/nptl/dl-tls_init_tp.c
+++ b/sysdeps/nptl/dl-tls_init_tp.c
@@ -46,10 +46,6 @@  rtld_mutex_dummy (pthread_mutex_t *lock)
 
 const unsigned int __rseq_flags;
 
-/* The variables are in .data.relro but are not yet write-protected.  */
-extern unsigned int _rseq_size attribute_hidden;
-extern ptrdiff_t _rseq_offset attribute_hidden;
-
 void
 __tls_pre_init_tp (void)
 {
@@ -106,9 +102,7 @@  __tls_init_tp (void)
     bool do_rseq = true;
     do_rseq = TUNABLE_GET (rseq, int, NULL);
     if (rseq_register_current_thread (pd, do_rseq))
-      {
-        _rseq_size = sizeof (pd->rseq_area);
-      }
+      _rseq_size = RSEQ_AREA_SIZE_INITIAL_USED;
 
 #ifdef RSEQ_SIG
     /* This should be a compile-time constant, but the current
diff --git a/sysdeps/unix/sysv/linux/rseq-internal.h b/sysdeps/unix/sysv/linux/rseq-internal.h
index 48eebc1e16..7ea935b4ad 100644
--- a/sysdeps/unix/sysv/linux/rseq-internal.h
+++ b/sysdeps/unix/sysv/linux/rseq-internal.h
@@ -25,15 +25,34 @@ 
 #include <stdio.h>
 #include <sys/rseq.h>
 
+/* 32 is the initially required value for the area size.  The
+   actually used rseq size may be less (20 bytes initially).  */
+#define RSEQ_AREA_SIZE_INITIAL 32
+#define RSEQ_AREA_SIZE_INITIAL_USED 20
+
+/* The variables are in .data.relro but are not yet write-protected.  */
+extern unsigned int _rseq_size attribute_hidden;
+extern ptrdiff_t _rseq_offset attribute_hidden;
+
 #ifdef RSEQ_SIG
 static inline bool
 rseq_register_current_thread (struct pthread *self, bool do_rseq)
 {
   if (do_rseq)
     {
+      unsigned int size;
+#if IS_IN (rtld)
+      /* Use the hidden symbol in ld.so.  */
+      size = _rseq_size;
+#else
+      size = __rseq_size;
+#endif
+      if (size < RSEQ_AREA_SIZE_INITIAL)
+        /* The initial implementation used only 20 bytes out of 32,
+           but still expected size 32.  */
+        size = RSEQ_AREA_SIZE_INITIAL;
       int ret = INTERNAL_SYSCALL_CALL (rseq, &self->rseq_area,
-                                       sizeof (self->rseq_area),
-                                       0, RSEQ_SIG);
+                                       size, 0, RSEQ_SIG);
       if (!INTERNAL_SYSCALL_ERROR_P (ret))
         return true;
     }
diff --git a/sysdeps/unix/sysv/linux/tst-rseq.c b/sysdeps/unix/sysv/linux/tst-rseq.c
index 2c90409ba0..08a9533130 100644
--- a/sysdeps/unix/sysv/linux/tst-rseq.c
+++ b/sysdeps/unix/sysv/linux/tst-rseq.c
@@ -29,6 +29,7 @@ 
 # include <stdlib.h>
 # include <string.h>
 # include <syscall.h>
+# include <sys/auxv.h>
 # include <thread_pointer.h>
 # include <tls.h>
 # include "tst-rseq.h"
@@ -42,7 +43,8 @@  do_rseq_main_test (void)
   TEST_COMPARE (__rseq_flags, 0);
   TEST_VERIFY ((char *) __thread_pointer () + __rseq_offset
                == (char *) &pd->rseq_area);
-  TEST_COMPARE (__rseq_size, sizeof (pd->rseq_area));
+  /* The current implementation only supports the initial size.  */
+  TEST_COMPARE (__rseq_size, 20);
 }
 
 static void
@@ -52,6 +54,12 @@  do_rseq_test (void)
     {
       FAIL_UNSUPPORTED ("kernel does not support rseq, skipping test");
     }
+  printf ("info: __rseq_size: %u\n", __rseq_size);
+  printf ("info: __rseq_offset: %td\n", __rseq_offset);
+  printf ("info: __rseq_flags: %u\n", __rseq_flags);
+  printf ("info: getauxval (AT_RSEQ_FEATURE_SIZE): %ld\n",
+          getauxval (AT_RSEQ_FEATURE_SIZE));
+  printf ("info: getauxval (AT_RSEQ_ALIGN): %ld\n", getauxval (AT_RSEQ_ALIGN));
   do_rseq_main_test ();
 }
 #else /* RSEQ_SIG */