
[v1,1/1] rseq: Validate read-only fields under DEBUG_RSEQ config

Message ID 20241111192214.1988000-1-mathieu.desnoyers@efficios.com
State New
Series [v1,1/1] rseq: Validate read-only fields under DEBUG_RSEQ config

Commit Message

Mathieu Desnoyers Nov. 11, 2024, 7:22 p.m. UTC
The rseq uapi requires cooperation between users of the rseq fields
to ensure that all libraries and applications using rseq within a
process do not interfere with each other.

This is especially important for fields which are meant to be read-only
from user-space, as documented in uapi/linux/rseq.h:

  - cpu_id_start,
  - cpu_id,
  - node_id,
  - mm_cid.

Storing to those fields from a user-space library prevents any sharing
of the rseq ABI with other libraries and applications, as other users
are not aware that the content of those fields has been altered by a
third-party library.

This is unfortunately the current behavior of tcmalloc: it purposefully
overlaps part of a cached value with the cpu_id_start upper bits to get
notified about preemption, because the kernel clears those upper bits
before returning to user-space. This behavior does not conform to the
rseq uapi header ABI.
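
For illustration, the overlap trick can be sketched in a few lines of plain C (a simulation with made-up names, not tcmalloc's actual code): the library stashes a marker bit in the upper bits of cpu_id_start, and the kernel's next store of a small CPU number clears it, signalling preemption:

```c
#include <stdint.h>
#include <assert.h>

/* Stand-in for the rseq ABI field; illustration only. */
struct fake_rseq {
	uint32_t cpu_id_start;
};

/* CPU numbers are small, so genuine cpu_id_start values never have
 * the top bit set.  The (non-conforming) trick stores a marker there. */
#define CACHED_FLAG 0x80000000u

/* The library marks its per-CPU cache valid by writing to a field the
 * ABI declares read-only -- this is the ABI violation at issue. */
static void cache_mark_valid(struct fake_rseq *r)
{
	r->cpu_id_start |= CACHED_FLAG;
}

/* The kernel rewrites the whole field before returning to user-space,
 * clobbering the marker. */
static void kernel_update(struct fake_rseq *r, uint32_t cpu)
{
	r->cpu_id_start = cpu;
}

static int cache_is_valid(const struct fake_rseq *r)
{
	return (r->cpu_id_start & CACHED_FLAG) != 0;
}
```

Any other rseq user reading cpu_id_start between the marking and the kernel update would see a corrupted value, which is exactly why sharing the ABI becomes impossible.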

This prevents tcmalloc from using rseq when rseq is registered by the
GNU C library 2.35+. It requires tcmalloc users to disable glibc rseq
registration with a glibc tunable, which is a sad state of affairs.

Considering that tcmalloc and the GNU C library are the first two
upstream projects using rseq, and that they are already incompatible due
to the use of this hack, adding kernel-level validation of all read-only
field contents is necessary to ensure future users of rseq abide by the
rseq ABI requirements.

Validate that user-space does not corrupt the read-only fields and
conform to the rseq uapi header ABI when the kernel is built with
CONFIG_DEBUG_RSEQ=y. This is done by storing a copy of the read-only
fields in the task_struct, and validating the prior values present in
user-space before updating them. If the values do not match, print
a warning on the console (printk_ratelimited()).
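
The shadow-copy scheme can be modelled in user-space C (a simplified sketch of the idea, not the kernel code; names are illustrative): keep a private copy of the read-only fields, compare it against the shared structure before each update, and report any mismatch:

```c
#include <stdint.h>
#include <string.h>
#include <assert.h>

/* The four fields the rseq uapi declares read-only (sketch). */
struct ro_fields {
	uint32_t cpu_id_start;
	uint32_t cpu_id;
	uint32_t node_id;
	uint32_t mm_cid;
};

/* "Kernel" side: a shadow copy kept out of user reach (in the patch
 * this lives in task_struct when CONFIG_DEBUG_RSEQ=y). */
static struct ro_fields shadow;

/* Returns 0 when the user-visible copy still matches the shadow,
 * nonzero when user-space has scribbled on a read-only field. */
static int validate_ro_fields(const struct ro_fields *user)
{
	return memcmp(user, &shadow, sizeof(shadow)) != 0;
}

/* Each update stores the new values in both copies, so the shadow
 * always reflects what the kernel last wrote. */
static void update_ro_fields(struct ro_fields *user,
			     uint32_t cpu_id, uint32_t node_id, uint32_t mm_cid)
{
	user->cpu_id_start = cpu_id;
	user->cpu_id = cpu_id;
	user->node_id = node_id;
	user->mm_cid = mm_cid;
	shadow = *user;
}
```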

This is a first step to identify misuses of the rseq ABI by printing
a warning on the console. After giving userspace some time to
correct its use of rseq, the plan is to eventually terminate offending
processes with SIGSEGV.

This change is expected to produce warnings for the upstream tcmalloc
implementation, but tcmalloc developers mentioned they were open to
adapting their implementation to this kernel-level change.

Link: https://lore.kernel.org/all/CACT4Y+beLh1qnHF9bxhMUcva8KyuvZs7Mg_31SGK5xSoR=3m1A@mail.gmail.com/
Link: https://github.com/google/tcmalloc/issues/144
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Oskolkov <posk@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Marco Elver <elver@google.com>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Carlos O'Donell <carlos@redhat.com>
Cc: DJ Delorie <dj@redhat.com>
Cc: libc-alpha@sourceware.org
---
Changes since v0:
- A structure ending with a flexible array member cannot be embedded
  within another structure (unless it is the last member). Fix this by
  declaring the kernel copy placeholder as a char array instead.
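
The C restriction behind this change, and the char-array workaround, can be shown with a minimal example (names here are made up for illustration):

```c
#include <stddef.h>
#include <stdalign.h>
#include <assert.h>

/* A structure ending in a flexible array member, like struct rseq. */
struct with_fam {
	int n;
	char data[];		/* flexible array member */
};

/* Declaring 'struct with_fam member;' here would be rejected by the
 * compiler unless it were the last member.  Instead, reserve raw,
 * suitably aligned storage and cast on access: */
struct holder {
	alignas(struct with_fam) char fields[sizeof(struct with_fam)];
	int other_member;	/* members may follow the placeholder */
};

static struct with_fam *holder_fields(struct holder *h)
{
	return (struct with_fam *)h->fields;
}
```

sizeof on a flexible-array structure counts only the fixed part, which is exactly what the placeholder needs to cover: the known, fixed-offset fields.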
---
 include/linux/sched.h |  9 +++++
 kernel/rseq.c         | 92 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 101 insertions(+)

Comments

Peter Zijlstra Nov. 12, 2024, 10:04 a.m. UTC | #1
On Mon, Nov 11, 2024 at 02:22:14PM -0500, Mathieu Desnoyers wrote:

So I'm entirely agreeing with the intent, but perhaps we can argue a
little on the implementation :-)

> +#ifdef CONFIG_DEBUG_RSEQ
> +static struct rseq *rseq_kernel_fields(struct task_struct *t)
> +{
> +	return (struct rseq *) t->rseq_fields;
> +}
> +
> +static int rseq_validate_ro_fields(struct task_struct *t)
> +{
> +	u32 cpu_id_start, cpu_id, node_id, mm_cid;
> +	struct rseq __user *rseq = t->rseq;
> +
> +	/*
> +	 * Validate fields which are required to be read-only by
> +	 * user-space.
> +	 */
> +	if (!user_read_access_begin(rseq, t->rseq_len))
> +		goto efault;
> +	unsafe_get_user(cpu_id_start, &rseq->cpu_id_start, efault_end);
> +	unsafe_get_user(cpu_id, &rseq->cpu_id, efault_end);
> +	unsafe_get_user(node_id, &rseq->node_id, efault_end);
> +	unsafe_get_user(mm_cid, &rseq->mm_cid, efault_end);
> +	user_read_access_end();
> +
> +	if (cpu_id_start != rseq_kernel_fields(t)->cpu_id_start)
> +		printk_ratelimited(KERN_WARNING
> +			"Detected rseq cpu_id_start field corruption. Value: %u, expecting: %u (pid=%d).\n",
> +			cpu_id_start, rseq_kernel_fields(t)->cpu_id_start, t->pid);
> +	if (cpu_id != rseq_kernel_fields(t)->cpu_id)
> +		printk_ratelimited(KERN_WARNING
> +			"Detected rseq cpu_id field corruption. Value: %u, expecting: %u (pid=%d).\n",
> +			cpu_id, rseq_kernel_fields(t)->cpu_id, t->pid);
> +	if (node_id != rseq_kernel_fields(t)->node_id)
> +		printk_ratelimited(KERN_WARNING
> +			"Detected rseq node_id field corruption. Value: %u, expecting: %u (pid=%d).\n",
> +			node_id, rseq_kernel_fields(t)->node_id, t->pid);
> +	if (mm_cid != rseq_kernel_fields(t)->mm_cid)
> +		printk_ratelimited(KERN_WARNING
> +			"Detected rseq mm_cid field corruption. Value: %u, expecting: %u (pid=%d).\n",
> +			mm_cid, rseq_kernel_fields(t)->mm_cid, t->pid);

So aside from this just being ugly, this also has the problem of getting
the ratelimits out of sync and perhaps only showing partial corruption
for any one task. 

Completely untested hackery below.

> @@ -423,6 +504,17 @@ SYSCALL_DEFINE4(rseq, struct rseq __user *, rseq, u32, rseq_len,
>  	current->rseq = rseq;
>  	current->rseq_len = rseq_len;
>  	current->rseq_sig = sig;
> +#ifdef CONFIG_DEBUG_RSEQ
> +	/*
> +	 * Initialize the in-kernel rseq fields copy for validation of
> +	 * read-only fields.
> +	 */
> +	if (get_user(rseq_kernel_fields(current)->cpu_id_start, &rseq->cpu_id_start) ||
> +	    get_user(rseq_kernel_fields(current)->cpu_id, &rseq->cpu_id) ||
> +	    get_user(rseq_kernel_fields(current)->node_id, &rseq->node_id) ||
> +	    get_user(rseq_kernel_fields(current)->mm_cid, &rseq->mm_cid))
> +		return -EFAULT;
> +#endif

So I didn't change the above, but wouldn't it make more sense to do
rseq_reset_rseq_cpu_node_id() here, but without the validation?

---
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1364,6 +1364,15 @@ struct task_struct {
 	 * with respect to preemption.
 	 */
 	unsigned long rseq_event_mask;
+# ifdef CONFIG_DEBUG_RSEQ
+	/*
+	 * This is a place holder to save a copy of the rseq fields for
+	 * validation of read-only fields. The struct rseq has a
+	 * variable-length array at the end, so it cannot be used
+	 * directly. Reserve a size large enough for the known fields.
+	 */
+	char 				rseq_fields[sizeof(struct rseq)];
+# endif
 #endif
 
 #ifdef CONFIG_SCHED_MM_CID
--- a/kernel/rseq.c
+++ b/kernel/rseq.c
@@ -25,6 +25,79 @@
 				  RSEQ_CS_FLAG_NO_RESTART_ON_SIGNAL | \
 				  RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE)
 
+#ifdef CONFIG_DEBUG_RSEQ
+static struct rseq *rseq_kernel_fields(struct task_struct *t)
+{
+	return (struct rseq *) t->rseq_fields;
+}
+
+static int rseq_validate_ro_fields(struct task_struct *t)
+{
+	static DEFINE_RATELIMIT_STATE(_rs,
+				      DEFAULT_RATELIMIT_INTERVAL,
+				      DEFAULT_RATELIMIT_BURST);
+	u32 cpu_id_start, cpu_id, node_id, mm_cid;
+	struct rseq __user *rseq = t->rseq;
+
+	/*
+	 * Validate fields which are required to be read-only by
+	 * user-space.
+	 */
+	if (!user_read_access_begin(rseq, t->rseq_len))
+		goto efault;
+	unsafe_get_user(cpu_id_start, &rseq->cpu_id_start, efault_end);
+	unsafe_get_user(cpu_id, &rseq->cpu_id, efault_end);
+	unsafe_get_user(node_id, &rseq->node_id, efault_end);
+	unsafe_get_user(mm_cid, &rseq->mm_cid, efault_end);
+	user_read_access_end();
+
+	if ((cpu_id_start != rseq_kernel_fields(t)->cpu_id_start ||
+	     cpu_id != rseq_kernel_fields(t)->cpu_id ||
+	     node_id != rseq_kernel_fields(t)->node_id ||
+	     mm_cid != rseq_kernel_fields(t)->mm_cid) && __ratelimit(&_rs)) {
+
+		pr_warn("Detected rseq corruption for pid: %d;\n"
+			"  cpu_id_start: %u ?= %u\n"
+			"  cpu_id:       %u ?= %u\n"
+			"  node_id:      %u ?= %u\n"
+			"  mm_cid:       %u ?= %u\n"
+			t->pid,
+			cpu_id_start, rseq_kernel_fields(t)->cpu_id_start,
+			cpu_id, rseq_kernel_fields(t)->cpu_id,
+			node_id, rseq_kernel_fields(t)->node_id,
+			mm_cid, rseq_kernel_fields(t)->mm_cid);
+	}
+
+	/* For now, only print a console warning on mismatch. */
+	return 0;
+
+efault_end:
+	user_read_access_end();
+efault:
+	return -EFAULT;
+}
+
+static void rseq_set_ro_fields(struct task_struct *t, u32 cpu_id_start, u32 cpu_id,
+			       u32 node_id, u32 mm_cid)
+{
+	rseq_kernel_fields(t)->cpu_id_start = cpu_id;
+	rseq_kernel_fields(t)->cpu_id = cpu_id;
+	rseq_kernel_fields(t)->node_id = node_id;
+	rseq_kernel_fields(t)->mm_cid = mm_cid;
+}
+
+#else
+static int rseq_validate_ro_fields(struct task_struct *t)
+{
+	return 0;
+}
+
+static void rseq_set_ro_fields(struct task_struct *t, u32 cpu_id_start, u32 cpu_id,
+			       u32 node_id, u32 mm_cid)
+{
+}
+#endif
+
 /*
  *
  * Restartable sequences are a lightweight interface that allows
@@ -92,6 +165,11 @@ static int rseq_update_cpu_node_id(struc
 	u32 node_id = cpu_to_node(cpu_id);
 	u32 mm_cid = task_mm_cid(t);
 
+	/*
+	 * Validate read-only rseq fields.
+	 */
+	if (rseq_validate_ro_fields(t))
+		goto efault;
 	WARN_ON_ONCE((int) mm_cid < 0);
 	if (!user_write_access_begin(rseq, t->rseq_len))
 		goto efault;
@@ -105,6 +183,7 @@ static int rseq_update_cpu_node_id(struc
 	 * t->rseq_len != ORIG_RSEQ_SIZE.
 	 */
 	user_write_access_end();
+	rseq_set_ro_fields(t, cpu_id, cpu_id, node_id, mm_cid);
 	trace_rseq_update(t);
 	return 0;
 
@@ -120,6 +199,11 @@ static int rseq_reset_rseq_cpu_node_id(s
 	    mm_cid = 0;
 
 	/*
+	 * Validate read-only rseq fields.
+	 */
+	if (!rseq_validate_ro_fields(t))
+		return -EFAULT;
+	/*
 	 * Reset cpu_id_start to its initial state (0).
 	 */
 	if (put_user(cpu_id_start, &t->rseq->cpu_id_start))
@@ -141,6 +225,9 @@ static int rseq_reset_rseq_cpu_node_id(s
 	 */
 	if (put_user(mm_cid, &t->rseq->mm_cid))
 		return -EFAULT;
+
+	rseq_set_ro_fields(t, cpu_id_start, cpu_id, node_id, mm_cid);
+
 	/*
 	 * Additional feature fields added after ORIG_RSEQ_SIZE
 	 * need to be conditionally reset only if
@@ -423,6 +510,17 @@ SYSCALL_DEFINE4(rseq, struct rseq __user
 	current->rseq = rseq;
 	current->rseq_len = rseq_len;
 	current->rseq_sig = sig;
+#ifdef CONFIG_DEBUG_RSEQ
+	/*
+	 * Initialize the in-kernel rseq fields copy for validation of
+	 * read-only fields.
+	 */
+	if (get_user(rseq_kernel_fields(current)->cpu_id_start, &rseq->cpu_id_start) ||
+	    get_user(rseq_kernel_fields(current)->cpu_id, &rseq->cpu_id) ||
+	    get_user(rseq_kernel_fields(current)->node_id, &rseq->node_id) ||
+	    get_user(rseq_kernel_fields(current)->mm_cid, &rseq->mm_cid))
+		return -EFAULT;
+#endif
 	/*
 	 * If rseq was previously inactive, and has just been
 	 * registered, ensure the cpu_id_start and cpu_id fields
Mathieu Desnoyers Nov. 12, 2024, 2:27 p.m. UTC | #2
On 2024-11-12 05:04, Peter Zijlstra wrote:
> On Mon, Nov 11, 2024 at 02:22:14PM -0500, Mathieu Desnoyers wrote:
> 
> So I'm entirely agreeing with the intent, but perhaps we can argue a
> little on the implementation :-)

Of course, thanks for providing feedback!

> 
>> +#ifdef CONFIG_DEBUG_RSEQ
>> +static struct rseq *rseq_kernel_fields(struct task_struct *t)
>> +{
>> +	return (struct rseq *) t->rseq_fields;
>> +}
>> +
>> +static int rseq_validate_ro_fields(struct task_struct *t)
>> +{
>> +	u32 cpu_id_start, cpu_id, node_id, mm_cid;
>> +	struct rseq __user *rseq = t->rseq;
>> +
>> +	/*
>> +	 * Validate fields which are required to be read-only by
>> +	 * user-space.
>> +	 */
>> +	if (!user_read_access_begin(rseq, t->rseq_len))
>> +		goto efault;
>> +	unsafe_get_user(cpu_id_start, &rseq->cpu_id_start, efault_end);
>> +	unsafe_get_user(cpu_id, &rseq->cpu_id, efault_end);
>> +	unsafe_get_user(node_id, &rseq->node_id, efault_end);
>> +	unsafe_get_user(mm_cid, &rseq->mm_cid, efault_end);
>> +	user_read_access_end();
>> +
>> +	if (cpu_id_start != rseq_kernel_fields(t)->cpu_id_start)
>> +		printk_ratelimited(KERN_WARNING
>> +			"Detected rseq cpu_id_start field corruption. Value: %u, expecting: %u (pid=%d).\n",
>> +			cpu_id_start, rseq_kernel_fields(t)->cpu_id_start, t->pid);
>> +	if (cpu_id != rseq_kernel_fields(t)->cpu_id)
>> +		printk_ratelimited(KERN_WARNING
>> +			"Detected rseq cpu_id field corruption. Value: %u, expecting: %u (pid=%d).\n",
>> +			cpu_id, rseq_kernel_fields(t)->cpu_id, t->pid);
>> +	if (node_id != rseq_kernel_fields(t)->node_id)
>> +		printk_ratelimited(KERN_WARNING
>> +			"Detected rseq node_id field corruption. Value: %u, expecting: %u (pid=%d).\n",
>> +			node_id, rseq_kernel_fields(t)->node_id, t->pid);
>> +	if (mm_cid != rseq_kernel_fields(t)->mm_cid)
>> +		printk_ratelimited(KERN_WARNING
>> +			"Detected rseq mm_cid field corruption. Value: %u, expecting: %u (pid=%d).\n",
>> +			mm_cid, rseq_kernel_fields(t)->mm_cid, t->pid);
> 
> So aside from this just being ugly, this also has the problem of getting
> the ratelimits out of sync and perhaps only showing partial corruption
> for any one task.
> 
> Completely untested hackery below.

Your approach looks indeed better than mine, I'll steal it with your
permission. :)

> 
>> @@ -423,6 +504,17 @@ SYSCALL_DEFINE4(rseq, struct rseq __user *, rseq, u32, rseq_len,
>>   	current->rseq = rseq;
>>   	current->rseq_len = rseq_len;
>>   	current->rseq_sig = sig;
>> +#ifdef CONFIG_DEBUG_RSEQ
>> +	/*
>> +	 * Initialize the in-kernel rseq fields copy for validation of
>> +	 * read-only fields.
>> +	 */
>> +	if (get_user(rseq_kernel_fields(current)->cpu_id_start, &rseq->cpu_id_start) ||
>> +	    get_user(rseq_kernel_fields(current)->cpu_id, &rseq->cpu_id) ||
>> +	    get_user(rseq_kernel_fields(current)->node_id, &rseq->node_id) ||
>> +	    get_user(rseq_kernel_fields(current)->mm_cid, &rseq->mm_cid))
>> +		return -EFAULT;
>> +#endif
> 
> So I didn't change the above, but wouldn't it make more sense to do
> rseq_reset_rseq_cpu_node_id() here, but without the validation?

Indeed we could do this (for both DEBUG_RSEQ={y,n}), but it would add extra
useless stores to those userspace fields on rseq registration, which is
performed on every thread creation starting from glibc 2.35. The
rseq_set_notify_resume() invoked at the end of registration ensures that
those fields get populated before return to userspace.

So I am not against a more robust approach, but I'm reluctant to add redundant
work on thread creation.

> 
> ---
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1364,6 +1364,15 @@ struct task_struct {
>   	 * with respect to preemption.
>   	 */
>   	unsigned long rseq_event_mask;
> +# ifdef CONFIG_DEBUG_RSEQ
> +	/*
> +	 * This is a place holder to save a copy of the rseq fields for
> +	 * validation of read-only fields. The struct rseq has a
> +	 * variable-length array at the end, so it cannot be used
> +	 * directly. Reserve a size large enough for the known fields.
> +	 */
> +	char 				rseq_fields[sizeof(struct rseq)];
> +# endif
>   #endif
>   
>   #ifdef CONFIG_SCHED_MM_CID
> --- a/kernel/rseq.c
> +++ b/kernel/rseq.c

We should #include <linux/ratelimit.h> then.

> @@ -25,6 +25,79 @@
>   				  RSEQ_CS_FLAG_NO_RESTART_ON_SIGNAL | \
>   				  RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE)
>   
> +#ifdef CONFIG_DEBUG_RSEQ
> +static struct rseq *rseq_kernel_fields(struct task_struct *t)
> +{
> +	return (struct rseq *) t->rseq_fields;
> +}
> +
> +static int rseq_validate_ro_fields(struct task_struct *t)
> +{
> +	static DEFINE_RATELIMIT_STATE(_rs,
> +				      DEFAULT_RATELIMIT_INTERVAL,
> +				      DEFAULT_RATELIMIT_BURST);
> +	u32 cpu_id_start, cpu_id, node_id, mm_cid;
> +	struct rseq __user *rseq = t->rseq;
> +
> +	/*
> +	 * Validate fields which are required to be read-only by
> +	 * user-space.
> +	 */
> +	if (!user_read_access_begin(rseq, t->rseq_len))
> +		goto efault;
> +	unsafe_get_user(cpu_id_start, &rseq->cpu_id_start, efault_end);
> +	unsafe_get_user(cpu_id, &rseq->cpu_id, efault_end);
> +	unsafe_get_user(node_id, &rseq->node_id, efault_end);
> +	unsafe_get_user(mm_cid, &rseq->mm_cid, efault_end);
> +	user_read_access_end();
> +
> +	if ((cpu_id_start != rseq_kernel_fields(t)->cpu_id_start ||
> +	     cpu_id != rseq_kernel_fields(t)->cpu_id ||
> +	     node_id != rseq_kernel_fields(t)->node_id ||
> +	     mm_cid != rseq_kernel_fields(t)->mm_cid) && __ratelimit(&_rs)) {
> +
> +		pr_warn("Detected rseq corruption for pid: %d;\n"
> +			"  cpu_id_start: %u ?= %u\n"
> +			"  cpu_id:       %u ?= %u\n"
> +			"  node_id:      %u ?= %u\n"
> +			"  mm_cid:       %u ?= %u\n",
> +			t->pid,
> +			cpu_id_start, rseq_kernel_fields(t)->cpu_id_start,
> +			cpu_id, rseq_kernel_fields(t)->cpu_id,
> +			node_id, rseq_kernel_fields(t)->node_id,
> +			mm_cid, rseq_kernel_fields(t)->mm_cid);
> +	}

It looks better, thanks.

> +
> +	/* For now, only print a console warning on mismatch. */
> +	return 0;
> +
> +efault_end:
> +	user_read_access_end();
> +efault:
> +	return -EFAULT;
> +}
> +
> +static void rseq_set_ro_fields(struct task_struct *t, u32 cpu_id_start, u32 cpu_id,
> +			       u32 node_id, u32 mm_cid)
> +{
> +	rseq_kernel_fields(t)->cpu_id_start = cpu_id_start;
> +	rseq_kernel_fields(t)->cpu_id = cpu_id;
> +	rseq_kernel_fields(t)->node_id = node_id;
> +	rseq_kernel_fields(t)->mm_cid = mm_cid;
> +}

This too.

Thanks!

Mathieu
Peter Zijlstra Nov. 12, 2024, 2:47 p.m. UTC | #3
On Tue, Nov 12, 2024 at 09:27:23AM -0500, Mathieu Desnoyers wrote:

> Your approach looks indeed better than mine, I'll steal it with your
> permission. :)

Ofc.


> > > +#ifdef CONFIG_DEBUG_RSEQ
> > > +	/*
> > > +	 * Initialize the in-kernel rseq fields copy for validation of
> > > +	 * read-only fields.
> > > +	 */
> > > +	if (get_user(rseq_kernel_fields(current)->cpu_id_start, &rseq->cpu_id_start) ||
> > > +	    get_user(rseq_kernel_fields(current)->cpu_id, &rseq->cpu_id) ||
> > > +	    get_user(rseq_kernel_fields(current)->node_id, &rseq->node_id) ||
> > > +	    get_user(rseq_kernel_fields(current)->mm_cid, &rseq->mm_cid))
> > > +		return -EFAULT;
> > > +#endif
> > 
> > So I didn't change the above, but wouldn't it make more sense to do
> > rseq_reset_rseq_cpu_node_id() here, but without the validation?
> 
> Indeed we could do this (for both DEBUG_RSEQ={y,n}), but it would add extra
> useless stores to those userspace fields on rseq registration, which is
> performed on every thread creation starting from glibc 2.35. The
> rseq_set_notify_resume() invoked at the end of registration ensures that
> those fields get populated before return to userspace.
> 
> So I am not against a more robust approach, but I'm reluctant to add redundant
> work on thread creation.

Ah, indeed. Oh well..

Patch

diff --git a/include/linux/sched.h b/include/linux/sched.h
index bb343136ddd0..b7c42cea2aac 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1364,6 +1364,15 @@  struct task_struct {
 	 * with respect to preemption.
 	 */
 	unsigned long rseq_event_mask;
+# ifdef CONFIG_DEBUG_RSEQ
+	/*
+	 * This is a placeholder to save a copy of the rseq fields for
+	 * validation of read-only fields. The struct rseq has a
+	 * variable-length array at the end, so it cannot be used
+	 * directly. Reserve a size large enough for the known fields.
+	 */
+	char 				rseq_fields[sizeof(struct rseq)];
+# endif
 #endif
 
 #ifdef CONFIG_SCHED_MM_CID
diff --git a/kernel/rseq.c b/kernel/rseq.c
index 9de6e35fe679..c3be2e498891 100644
--- a/kernel/rseq.c
+++ b/kernel/rseq.c
@@ -25,6 +25,61 @@ 
 				  RSEQ_CS_FLAG_NO_RESTART_ON_SIGNAL | \
 				  RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE)
 
+#ifdef CONFIG_DEBUG_RSEQ
+static struct rseq *rseq_kernel_fields(struct task_struct *t)
+{
+	return (struct rseq *) t->rseq_fields;
+}
+
+static int rseq_validate_ro_fields(struct task_struct *t)
+{
+	u32 cpu_id_start, cpu_id, node_id, mm_cid;
+	struct rseq __user *rseq = t->rseq;
+
+	/*
+	 * Validate fields which are required to be read-only by
+	 * user-space.
+	 */
+	if (!user_read_access_begin(rseq, t->rseq_len))
+		goto efault;
+	unsafe_get_user(cpu_id_start, &rseq->cpu_id_start, efault_end);
+	unsafe_get_user(cpu_id, &rseq->cpu_id, efault_end);
+	unsafe_get_user(node_id, &rseq->node_id, efault_end);
+	unsafe_get_user(mm_cid, &rseq->mm_cid, efault_end);
+	user_read_access_end();
+
+	if (cpu_id_start != rseq_kernel_fields(t)->cpu_id_start)
+		printk_ratelimited(KERN_WARNING
+			"Detected rseq cpu_id_start field corruption. Value: %u, expecting: %u (pid=%d).\n",
+			cpu_id_start, rseq_kernel_fields(t)->cpu_id_start, t->pid);
+	if (cpu_id != rseq_kernel_fields(t)->cpu_id)
+		printk_ratelimited(KERN_WARNING
+			"Detected rseq cpu_id field corruption. Value: %u, expecting: %u (pid=%d).\n",
+			cpu_id, rseq_kernel_fields(t)->cpu_id, t->pid);
+	if (node_id != rseq_kernel_fields(t)->node_id)
+		printk_ratelimited(KERN_WARNING
+			"Detected rseq node_id field corruption. Value: %u, expecting: %u (pid=%d).\n",
+			node_id, rseq_kernel_fields(t)->node_id, t->pid);
+	if (mm_cid != rseq_kernel_fields(t)->mm_cid)
+		printk_ratelimited(KERN_WARNING
+			"Detected rseq mm_cid field corruption. Value: %u, expecting: %u (pid=%d).\n",
+			mm_cid, rseq_kernel_fields(t)->mm_cid, t->pid);
+
+	/* For now, only print a console warning on mismatch. */
+	return 0;
+
+efault_end:
+	user_read_access_end();
+efault:
+	return -EFAULT;
+}
+#else
+static int rseq_validate_ro_fields(struct task_struct *t)
+{
+	return 0;
+}
+#endif
+
 /*
  *
  * Restartable sequences are a lightweight interface that allows
@@ -92,6 +147,11 @@  static int rseq_update_cpu_node_id(struct task_struct *t)
 	u32 node_id = cpu_to_node(cpu_id);
 	u32 mm_cid = task_mm_cid(t);
 
+	/*
+	 * Validate read-only rseq fields.
+	 */
+	if (rseq_validate_ro_fields(t))
+		goto efault;
 	WARN_ON_ONCE((int) mm_cid < 0);
 	if (!user_write_access_begin(rseq, t->rseq_len))
 		goto efault;
@@ -105,6 +165,13 @@  static int rseq_update_cpu_node_id(struct task_struct *t)
 	 * t->rseq_len != ORIG_RSEQ_SIZE.
 	 */
 	user_write_access_end();
+#ifdef CONFIG_DEBUG_RSEQ
+	/* Save a copy of the values which are read-only into kernel-space. */
+	rseq_kernel_fields(t)->cpu_id_start = cpu_id;
+	rseq_kernel_fields(t)->cpu_id = cpu_id;
+	rseq_kernel_fields(t)->node_id = node_id;
+	rseq_kernel_fields(t)->mm_cid = mm_cid;
+#endif
 	trace_rseq_update(t);
 	return 0;
 
@@ -119,6 +186,11 @@  static int rseq_reset_rseq_cpu_node_id(struct task_struct *t)
 	u32 cpu_id_start = 0, cpu_id = RSEQ_CPU_ID_UNINITIALIZED, node_id = 0,
 	    mm_cid = 0;
 
+	/*
+	 * Validate read-only rseq fields.
+	 */
+	if (rseq_validate_ro_fields(t))
+		return -EFAULT;
 	/*
 	 * Reset cpu_id_start to its initial state (0).
 	 */
@@ -141,6 +213,15 @@  static int rseq_reset_rseq_cpu_node_id(struct task_struct *t)
 	 */
 	if (put_user(mm_cid, &t->rseq->mm_cid))
 		return -EFAULT;
+#ifdef CONFIG_DEBUG_RSEQ
+	/*
+	 * Reset the in-kernel rseq fields copy.
+	 */
+	rseq_kernel_fields(t)->cpu_id_start = cpu_id_start;
+	rseq_kernel_fields(t)->cpu_id = cpu_id;
+	rseq_kernel_fields(t)->node_id = node_id;
+	rseq_kernel_fields(t)->mm_cid = mm_cid;
+#endif
 	/*
 	 * Additional feature fields added after ORIG_RSEQ_SIZE
 	 * need to be conditionally reset only if
@@ -423,6 +504,17 @@  SYSCALL_DEFINE4(rseq, struct rseq __user *, rseq, u32, rseq_len,
 	current->rseq = rseq;
 	current->rseq_len = rseq_len;
 	current->rseq_sig = sig;
+#ifdef CONFIG_DEBUG_RSEQ
+	/*
+	 * Initialize the in-kernel rseq fields copy for validation of
+	 * read-only fields.
+	 */
+	if (get_user(rseq_kernel_fields(current)->cpu_id_start, &rseq->cpu_id_start) ||
+	    get_user(rseq_kernel_fields(current)->cpu_id, &rseq->cpu_id) ||
+	    get_user(rseq_kernel_fields(current)->node_id, &rseq->node_id) ||
+	    get_user(rseq_kernel_fields(current)->mm_cid, &rseq->mm_cid))
+		return -EFAULT;
+#endif
 	/*
 	 * If rseq was previously inactive, and has just been
 	 * registered, ensure the cpu_id_start and cpu_id fields