
[v4,05/13] powerpc/rtas: Facilitate high-level call sequences

Message ID 20231117-papr-sys_rtas-vs-lockdown-v4-5-b794d8cb8502@linux.ibm.com (mailing list archive)
State Superseded
Series powerpc/pseries: New character devices for system parameters and VPD

Commit Message

Nathan Lynch via B4 Relay Nov. 18, 2023, 5:14 a.m. UTC
From: Nathan Lynch <nathanl@linux.ibm.com>

On RTAS platforms there is a general restriction that the OS must not
enter RTAS on more than one CPU at a time. This low-level
serialization requirement is satisfied by holding a spin
lock (rtas_lock) across most RTAS function invocations.

However, some pseries RTAS functions require multiple successive calls
to complete a logical operation. Beginning a new call sequence for such a
function may disrupt any other sequences of that function already in
progress. Safe and reliable use of these functions effectively
requires higher-level serialization beyond what is already done at the
level of RTAS entry and exit.

Where a sequence-based RTAS function is invoked only through
sys_rtas(), with no in-kernel users, there is no issue as far as the
kernel is concerned. User space is responsible for appropriately
serializing its call sequences. (Whether user space code actually
takes measures to prevent sequence interleaving is another matter.)
Examples of such functions currently include ibm,platform-dump and
ibm,get-vpd.

But where a sequence-based RTAS function has both user space and
in-kernel users, there is a hazard. Even if the in-kernel call sites
of such a function serialize their sequences correctly, a user of
sys_rtas() can invoke the same function at any time, potentially
disrupting a sequence in progress.

So in order to prevent disruption of kernel-based RTAS call sequences,
they must serialize not only with themselves but also with sys_rtas()
users, preferably without adding global locks or more function-specific
hacks to sys_rtas(). This is a prerequisite for adding an in-kernel
call sequence of ibm,get-vpd, which comes in a change to follow.

Note that it has never been feasible for the kernel to prevent
sys_rtas()-based sequences from being disrupted because control
returns to user space on every call. sys_rtas()-based users of these
functions have always been, and continue to be, responsible for
coordinating their call sequences with other users, even those which
may invoke the RTAS functions through less direct means than
sys_rtas(). This is an unavoidable consequence of exposing
sequence-based RTAS functions through sys_rtas().

* Add new rtas_function_lock() and rtas_function_unlock() APIs for use
  with sequence-based RTAS functions.

* Add an optional per-function mutex to struct rtas_function. When this
  member is set, kernel-internal callers of the RTAS function are
  required to guard their call sequences with rtas_function_lock() and
  rtas_function_unlock(). This requirement will be enforced in a later
  change, after all affected call sites are updated.

* Populate the lock members of function table entries where
  serialization of call sequences is known to be necessary, along with
  justifying commentary.

* In sys_rtas(), acquire the per-function mutex when it is present.

There should be no perceivable change introduced here except that
concurrent callers of the same RTAS function via sys_rtas() may block
on a mutex instead of spinning on rtas_lock. Changes to follow will
add rtas_function_lock()/unlock() pairs to kernel-based call
sequences.
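
For example, an in-kernel ibm,activate-firmware sequence would be
bracketed like this (sketch of the usage pattern only; the actual
conversion appears in a later patch in this series):

	int token = rtas_function_token(RTAS_FN_IBM_ACTIVATE_FIRMWARE);
	int fwrc;

	rtas_function_lock(RTAS_FN_IBM_ACTIVATE_FIRMWARE);

	do {
		fwrc = rtas_call(token, 0, 1, NULL);
	} while (rtas_busy_delay(fwrc));

	rtas_function_unlock(RTAS_FN_IBM_ACTIVATE_FIRMWARE);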

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
---
 arch/powerpc/include/asm/rtas.h |   2 +
 arch/powerpc/kernel/rtas.c      | 101 +++++++++++++++++++++++++++++++++++++++-
 2 files changed, 101 insertions(+), 2 deletions(-)

Comments

Aneesh Kumar K.V Nov. 20, 2023, 8:10 a.m. UTC | #1
Nathan Lynch via B4 Relay <devnull+nathanl.linux.ibm.com@kernel.org>
writes:

> From: Nathan Lynch <nathanl@linux.ibm.com>
>
> On RTAS platforms there is a general restriction that the OS must not
> enter RTAS on more than one CPU at a time. This low-level
> serialization requirement is satisfied by holding a spin
> lock (rtas_lock) across most RTAS function invocations.
>
> However, some pseries RTAS functions require multiple successive calls
> to complete a logical operation. Beginning a new call sequence for such a
> function may disrupt any other sequences of that function already in
> progress. Safe and reliable use of these functions effectively
> requires higher-level serialization beyond what is already done at the
> level of RTAS entry and exit.
>
> Where a sequence-based RTAS function is invoked only through
> sys_rtas(), with no in-kernel users, there is no issue as far as the
> kernel is concerned. User space is responsible for appropriately
> serializing its call sequences. (Whether user space code actually
> takes measures to prevent sequence interleaving is another matter.)
> Examples of such functions currently include ibm,platform-dump and
> ibm,get-vpd.
>
> But where a sequence-based RTAS function has both user space and
> in-kernel uesrs, there is a hazard. Even if the in-kernel call sites
> of such a function serialize their sequences correctly, a user of
> sys_rtas() can invoke the same function at any time, potentially
> disrupting a sequence in progress.
>
> So in order to prevent disruption of kernel-based RTAS call sequences,
> they must serialize not only with themselves but also with sys_rtas()
> users, somehow. Preferably without adding global locks or adding more
> function-specific hacks to sys_rtas(). This is a prerequisite for
> adding an in-kernel call sequence of ibm,get-vpd, which is in a change
> to follow.
>
> Note that it has never been feasible for the kernel to prevent
> sys_rtas()-based sequences from being disrupted because control
> returns to user space on every call. sys_rtas()-based users of these
> functions have always been, and continue to be, responsible for
> coordinating their call sequences with other users, even those which
> may invoke the RTAS functions through less direct means than
> sys_rtas(). This is an unavoidable consequence of exposing
> sequence-based RTAS functions through sys_rtas().
>
> * Add new rtas_function_lock() and rtas_function_unlock() APIs for use
>   with sequence-based RTAS functions.
>
> * Add an optional per-function mutex to struct rtas_function. When this
>   member is set, kernel-internal callers of the RTAS function are
>   required to guard their call sequences with rtas_function_lock() and
>   rtas_function_unlock(). This requirement will be enforced in a later
>   change, after all affected call sites are updated.
>
> * Populate the lock members of function table entries where
>   serialization of call sequences is known to be necessary, along with
>   justifying commentary.
>
> * In sys_rtas(), acquire the per-function mutex when it is present.
>
> There should be no perceivable change introduced here except that
> concurrent callers of the same RTAS function via sys_rtas() may block
> on a mutex instead of spinning on rtas_lock. Changes to follow will
> add rtas_function_lock()/unlock() pairs to kernel-based call
> sequences.
>

Can you add an example of the last part? I had to look at patch 06 to
find the details:

	rtas_function_lock(RTAS_FN_IBM_ACTIVATE_FIRMWARE);

	do {
		fwrc = rtas_call(token, 0, 1, NULL);
	} while (rtas_busy_delay(fwrc));

	rtas_function_unlock(RTAS_FN_IBM_ACTIVATE_FIRMWARE);

Reviewed-by: Aneesh Kumar K.V (IBM) <aneesh.kumar@kernel.org>

>
> Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
> ---
>  arch/powerpc/include/asm/rtas.h |   2 +
>  arch/powerpc/kernel/rtas.c      | 101 +++++++++++++++++++++++++++++++++++++++-
>  2 files changed, 101 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
> index b73010583a8d..9a20caba6858 100644
> --- a/arch/powerpc/include/asm/rtas.h
> +++ b/arch/powerpc/include/asm/rtas.h
> @@ -421,6 +421,8 @@ static inline bool rtas_function_implemented(const rtas_fn_handle_t handle)
>  {
>  	return rtas_function_token(handle) != RTAS_UNKNOWN_SERVICE;
>  }
> +void rtas_function_lock(rtas_fn_handle_t handle);
> +void rtas_function_unlock(rtas_fn_handle_t handle);
>  extern int rtas_token(const char *service);
>  extern int rtas_service_present(const char *service);
>  extern int rtas_call(int token, int, int, int *, ...);
> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
> index 1fc0b3fffdd1..52f2242d0c28 100644
> --- a/arch/powerpc/kernel/rtas.c
> +++ b/arch/powerpc/kernel/rtas.c
> @@ -18,6 +18,7 @@
>  #include <linux/kernel.h>
>  #include <linux/lockdep.h>
>  #include <linux/memblock.h>
> +#include <linux/mutex.h>
>  #include <linux/of.h>
>  #include <linux/of_fdt.h>
>  #include <linux/reboot.h>
> @@ -70,14 +71,34 @@ struct rtas_filter {
>   *                            ppc64le, and we want to keep it that way. It does
>   *                            not make sense for this to be set when @filter
>   *                            is NULL.
> + * @lock: Pointer to an optional dedicated per-function mutex. This
> + *        should be set for functions that require multiple calls in
> + *        sequence to complete a single operation, and such sequences
> + *        will disrupt each other if allowed to interleave. Users of
> + *        this function are required to hold the associated lock for
> + *        the duration of the call sequence. Add an explanatory
> + *        comment to the function table entry if setting this member.
>   */
>  struct rtas_function {
>  	s32 token;
>  	const bool banned_for_syscall_on_le:1;
>  	const char * const name;
>  	const struct rtas_filter *filter;
> +	struct mutex *lock;
>  };
>  
> +/*
> + * Per-function locks. Access these through the
> + * rtas_function_lock/unlock APIs, not directly.
> + */
> +static DEFINE_MUTEX(rtas_ibm_activate_firmware_lock);
> +static DEFINE_MUTEX(rtas_ibm_get_dynamic_sensor_state_lock);
> +static DEFINE_MUTEX(rtas_ibm_get_indices_lock);
> +static DEFINE_MUTEX(rtas_ibm_get_vpd_lock);
> +static DEFINE_MUTEX(rtas_ibm_lpar_perftools_lock);
> +static DEFINE_MUTEX(rtas_ibm_physical_attestation_lock);
> +static DEFINE_MUTEX(rtas_ibm_set_dynamic_indicator_lock);
> +
>  static struct rtas_function rtas_function_table[] __ro_after_init = {
>  	[RTAS_FNIDX__CHECK_EXCEPTION] = {
>  		.name = "check-exception",
> @@ -125,6 +146,13 @@ static struct rtas_function rtas_function_table[] __ro_after_init = {
>  			.buf_idx1 = -1, .size_idx1 = -1,
>  			.buf_idx2 = -1, .size_idx2 = -1,
>  		},
> +		/*
> +		 * PAPR doesn't explicitly impose any restriction, but
> +		 * this typically requires multiple calls before
> +		 * success, and there's no reason to allow sequences
> +		 * to interleave.
> +		 */
> +		.lock = &rtas_ibm_activate_firmware_lock,
>  	},
>  	[RTAS_FNIDX__IBM_CBE_START_PTCAL] = {
>  		.name = "ibm,cbe-start-ptcal",
> @@ -196,6 +224,12 @@ static struct rtas_function rtas_function_table[] __ro_after_init = {
>  			.buf_idx1 = 1, .size_idx1 = -1,
>  			.buf_idx2 = -1, .size_idx2 = -1,
>  		},
> +		/*
> +		 * PAPR+ R1–7.3.19–3 is explicit that the OS must not
> +		 * call ibm,get-dynamic-sensor-state with different
> +		 * inputs until a non-retry status has been returned.
> +		 */
> +		.lock = &rtas_ibm_get_dynamic_sensor_state_lock,
>  	},
>  	[RTAS_FNIDX__IBM_GET_INDICES] = {
>  		.name = "ibm,get-indices",
> @@ -203,6 +237,12 @@ static struct rtas_function rtas_function_table[] __ro_after_init = {
>  			.buf_idx1 = 2, .size_idx1 = 3,
>  			.buf_idx2 = -1, .size_idx2 = -1,
>  		},
> +		/*
> +		 * PAPR+ R1–7.3.17–2 says that the OS must not
> +		 * interleave ibm,get-indices call sequences with
> +		 * different inputs.
> +		 */
> +		.lock = &rtas_ibm_get_indices_lock,
>  	},
>  	[RTAS_FNIDX__IBM_GET_RIO_TOPOLOGY] = {
>  		.name = "ibm,get-rio-topology",
> @@ -220,6 +260,11 @@ static struct rtas_function rtas_function_table[] __ro_after_init = {
>  			.buf_idx1 = 0, .size_idx1 = -1,
>  			.buf_idx2 = 1, .size_idx2 = 2,
>  		},
> +		/*
> +		 * PAPR+ R1–7.3.20–4 indicates that sequences
> +		 * should not be allowed to interleave.
> +		 */
> +		.lock = &rtas_ibm_get_vpd_lock,
>  	},
>  	[RTAS_FNIDX__IBM_GET_XIVE] = {
>  		.name = "ibm,get-xive",
> @@ -239,6 +284,11 @@ static struct rtas_function rtas_function_table[] __ro_after_init = {
>  			.buf_idx1 = 2, .size_idx1 = 3,
>  			.buf_idx2 = -1, .size_idx2 = -1,
>  		},
> +		/*
> +		 * PAPR+ R1–7.3.26–6 says the OS should allow only one
> +		 * call sequence in progress at a time.
> +		 */
> +		.lock = &rtas_ibm_lpar_perftools_lock,
>  	},
>  	[RTAS_FNIDX__IBM_MANAGE_FLASH_IMAGE] = {
>  		.name = "ibm,manage-flash-image",
> @@ -277,6 +327,14 @@ static struct rtas_function rtas_function_table[] __ro_after_init = {
>  			.buf_idx1 = 0, .size_idx1 = 1,
>  			.buf_idx2 = -1, .size_idx2 = -1,
>  		},
> +		/*
> +		 * This follows a sequence-based pattern similar to
> +		 * ibm,get-vpd et al. Since PAPR+ restricts
> +		 * interleaving call sequences for other functions of
> +		 * this style, assume the restriction applies here,
> +		 * even though it's not explicit in the spec.
> +		 */
> +		.lock = &rtas_ibm_physical_attestation_lock,
>  	},
>  	[RTAS_FNIDX__IBM_PLATFORM_DUMP] = {
>  		.name = "ibm,platform-dump",
> @@ -284,6 +342,13 @@ static struct rtas_function rtas_function_table[] __ro_after_init = {
>  			.buf_idx1 = 4, .size_idx1 = 5,
>  			.buf_idx2 = -1, .size_idx2 = -1,
>  		},
> +		/*
> +		 * PAPR+ 7.3.3.4.1 indicates that concurrent sequences
> +		 * of ibm,platform-dump are allowed if they are
> +		 * operating on different dump tags. So leave
> +		 * the lock pointer unset for now. This may need
> +		 * reconsideration if kernel-internal users appear.
> +		 */
>  	},
>  	[RTAS_FNIDX__IBM_POWER_OFF_UPS] = {
>  		.name = "ibm,power-off-ups",
> @@ -326,6 +391,12 @@ static struct rtas_function rtas_function_table[] __ro_after_init = {
>  			.buf_idx1 = 2, .size_idx1 = -1,
>  			.buf_idx2 = -1, .size_idx2 = -1,
>  		},
> +		/*
> +		 * PAPR+ R1–7.3.18–3 says the OS must not call this
> +		 * function with different inputs until a non-retry
> +		 * status has been returned.
> +		 */
> +		.lock = &rtas_ibm_set_dynamic_indicator_lock,
>  	},
>  	[RTAS_FNIDX__IBM_SET_EEH_OPTION] = {
>  		.name = "ibm,set-eeh-option",
> @@ -556,9 +627,9 @@ static int __init rtas_token_to_function_xarray_init(void)
>  }
>  arch_initcall(rtas_token_to_function_xarray_init);
>  
> -static const struct rtas_function *rtas_token_to_function(s32 token)
> +static struct rtas_function *rtas_token_to_function(s32 token)
>  {
> -	const struct rtas_function *func;
> +	struct rtas_function *func;
>  
>  	if (WARN_ONCE(token < 0, "invalid token %d", token))
>  		return NULL;
> @@ -581,6 +652,28 @@ static const struct rtas_function *rtas_token_to_function(s32 token)
>  	return NULL;
>  }
>  
> +static void __rtas_function_lock(struct rtas_function *func)
> +{
> +	if (func && func->lock)
> +		mutex_lock(func->lock);
> +}
> +
> +static void __rtas_function_unlock(struct rtas_function *func)
> +{
> +	if (func && func->lock)
> +		mutex_unlock(func->lock);
> +}
> +
> +void rtas_function_lock(const rtas_fn_handle_t handle)
> +{
> +	__rtas_function_lock(rtas_function_lookup(handle));
> +}
> +
> +void rtas_function_unlock(const rtas_fn_handle_t handle)
> +{
> +	__rtas_function_unlock(rtas_function_lookup(handle));
> +}
> +
>  /* This is here deliberately so it's only used in this file */
>  void enter_rtas(unsigned long);
>  
> @@ -1885,6 +1978,8 @@ SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs)
>  
>  	buff_copy = get_errorlog_buffer();
>  
> +	__rtas_function_lock(rtas_token_to_function(token));
> +
>  	raw_spin_lock_irqsave(&rtas_lock, flags);
>  	cookie = lockdep_pin_lock(&rtas_lock);
>  
> @@ -1900,6 +1995,8 @@ SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs)
>  	lockdep_unpin_lock(&rtas_lock, cookie);
>  	raw_spin_unlock_irqrestore(&rtas_lock, flags);
>  
> +	__rtas_function_unlock(rtas_token_to_function(token));
> +
>  	if (buff_copy) {
>  		if (errbuf)
>  			log_error(errbuf, ERR_TYPE_RTAS_LOG, 0);
>
> -- 
> 2.41.0
Nathan Lynch Nov. 28, 2023, 3:35 p.m. UTC | #2
"Aneesh Kumar K.V (IBM)" <aneesh.kumar@kernel.org> writes:
> Nathan Lynch via B4 Relay <devnull+nathanl.linux.ibm.com@kernel.org>
> writes:
>> There should be no perceivable change introduced here except that
>> concurrent callers of the same RTAS function via sys_rtas() may block
>> on a mutex instead of spinning on rtas_lock. Changes to follow will
>> add rtas_function_lock()/unlock() pairs to kernel-based call
>> sequences.
>>
>
> Can you add an example of the last part. I did look at to find 06 to
> find the details
>
> 	rtas_function_lock(RTAS_FN_IBM_ACTIVATE_FIRMWARE);
>
> 	do {
> 		fwrc = rtas_call(token, 0, 1, NULL);
> 	} while (rtas_busy_delay(fwrc));
>
> 	rtas_function_unlock(RTAS_FN_IBM_ACTIVATE_FIRMWARE);

Sure, I'll add a simple example of the API usage in the commit message,
thanks.
Michael Ellerman Nov. 28, 2023, 10:30 p.m. UTC | #3
Nathan Lynch via B4 Relay <devnull+nathanl.linux.ibm.com@kernel.org>
writes:
> From: Nathan Lynch <nathanl@linux.ibm.com>
>
> On RTAS platforms there is a general restriction that the OS must not
> enter RTAS on more than one CPU at a time. This low-level
> serialization requirement is satisfied by holding a spin
> lock (rtas_lock) across most RTAS function invocations.
...
> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
> index 1fc0b3fffdd1..52f2242d0c28 100644
> --- a/arch/powerpc/kernel/rtas.c
> +++ b/arch/powerpc/kernel/rtas.c
> @@ -581,6 +652,28 @@ static const struct rtas_function *rtas_token_to_function(s32 token)
>  	return NULL;
>  }
>  
> +static void __rtas_function_lock(struct rtas_function *func)
> +{
> +	if (func && func->lock)
> +		mutex_lock(func->lock);
> +}

This is obviously going to defeat most static analysis tools. I assume
lockdep is OK with it though?

cheers
Nathan Lynch Nov. 28, 2023, 11:05 p.m. UTC | #4
Michael Ellerman <mpe@ellerman.id.au> writes:

> Nathan Lynch via B4 Relay <devnull+nathanl.linux.ibm.com@kernel.org>
> writes:
>> From: Nathan Lynch <nathanl@linux.ibm.com>
>>
>> On RTAS platforms there is a general restriction that the OS must not
>> enter RTAS on more than one CPU at a time. This low-level
>> serialization requirement is satisfied by holding a spin
>> lock (rtas_lock) across most RTAS function invocations.
> ...
>> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
>> index 1fc0b3fffdd1..52f2242d0c28 100644
>> --- a/arch/powerpc/kernel/rtas.c
>> +++ b/arch/powerpc/kernel/rtas.c
>> @@ -581,6 +652,28 @@ static const struct rtas_function *rtas_token_to_function(s32 token)
>>  	return NULL;
>>  }
>>  
>> +static void __rtas_function_lock(struct rtas_function *func)
>> +{
>> +	if (func && func->lock)
>> +		mutex_lock(func->lock);
>> +}
>
> This is obviously going to defeat most static analysis tools.

I guess it's not that obvious to me :-) Is it because the mutex_lock()
is conditional? I'll improve this if it's possible.

> I assume lockdep is OK with it though?

Seems to be, yes.
Michael Ellerman Nov. 29, 2023, 2:11 a.m. UTC | #5
Nathan Lynch via B4 Relay <devnull+nathanl.linux.ibm.com@kernel.org>
writes:
> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
> index 1fc0b3fffdd1..52f2242d0c28 100644
> --- a/arch/powerpc/kernel/rtas.c
> +++ b/arch/powerpc/kernel/rtas.c
> @@ -196,6 +224,12 @@ static struct rtas_function rtas_function_table[] __ro_after_init = {
>  			.buf_idx1 = 1, .size_idx1 = -1,
>  			.buf_idx2 = -1, .size_idx2 = -1,
>  		},
> +		/*
> +		 * PAPR+ R1–7.3.19–3 is explicit that the OS must not

When you cite PAPR+ can you please include the version number?

That's a general comment on this patch and in some other places in the
series too.

cheers
Nathan Lynch Nov. 29, 2023, 2:37 a.m. UTC | #6
Michael Ellerman <mpe@ellerman.id.au> writes:
> Nathan Lynch via B4 Relay <devnull+nathanl.linux.ibm.com@kernel.org>
> writes:
>> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
>> index 1fc0b3fffdd1..52f2242d0c28 100644
>> --- a/arch/powerpc/kernel/rtas.c
>> +++ b/arch/powerpc/kernel/rtas.c
>> @@ -196,6 +224,12 @@ static struct rtas_function rtas_function_table[] __ro_after_init = {
>>  			.buf_idx1 = 1, .size_idx1 = -1,
>>  			.buf_idx2 = -1, .size_idx2 = -1,
>>  		},
>> +		/*
>> +		 * PAPR+ R1–7.3.19–3 is explicit that the OS must not
>
> When you cite PAPR+ can you please include the version number?
>
> That's a general comment on this patch and in some other places in the
> series too.

OK. I assume v2.13 is fine even though most of the citations refer to
passages that significantly predate that version.
Michael Ellerman Nov. 29, 2023, 3:16 a.m. UTC | #7
Nathan Lynch <nathanl@linux.ibm.com> writes:
> Michael Ellerman <mpe@ellerman.id.au> writes:
>> Nathan Lynch via B4 Relay <devnull+nathanl.linux.ibm.com@kernel.org>
>> writes:
>>> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
>>> index 1fc0b3fffdd1..52f2242d0c28 100644
>>> --- a/arch/powerpc/kernel/rtas.c
>>> +++ b/arch/powerpc/kernel/rtas.c
>>> @@ -196,6 +224,12 @@ static struct rtas_function rtas_function_table[] __ro_after_init = {
>>>  			.buf_idx1 = 1, .size_idx1 = -1,
>>>  			.buf_idx2 = -1, .size_idx2 = -1,
>>>  		},
>>> +		/*
>>> +		 * PAPR+ R1–7.3.19–3 is explicit that the OS must not
>>
>> When you cite PAPR+ can you please include the version number?
>>
>> That's a general comment on this patch and in some other places in the
>> series too.
>
> OK. I assume v2.13 is fine even though most of the citations refer to
> passages that significantly predate that version.

Yeah whatever version you are referring to.

It just means if there's ever confusion about what's in the kernel
comments vs the then current version of PAPR, we can go back and refer
to the exact version you were using.

It also avoids confusion vs LoPAPR, which is similar but has some
differently numbered chapters.

cheers
Michael Ellerman Nov. 29, 2023, 1:20 p.m. UTC | #8
Nathan Lynch <nathanl@linux.ibm.com> writes:
> Michael Ellerman <mpe@ellerman.id.au> writes:
>
>> Nathan Lynch via B4 Relay <devnull+nathanl.linux.ibm.com@kernel.org>
>> writes:
>>> From: Nathan Lynch <nathanl@linux.ibm.com>
>>>
>>> On RTAS platforms there is a general restriction that the OS must not
>>> enter RTAS on more than one CPU at a time. This low-level
>>> serialization requirement is satisfied by holding a spin
>>> lock (rtas_lock) across most RTAS function invocations.
>> ...
>>> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
>>> index 1fc0b3fffdd1..52f2242d0c28 100644
>>> --- a/arch/powerpc/kernel/rtas.c
>>> +++ b/arch/powerpc/kernel/rtas.c
>>> @@ -581,6 +652,28 @@ static const struct rtas_function *rtas_token_to_function(s32 token)
>>>  	return NULL;
>>>  }
>>>  
>>> +static void __rtas_function_lock(struct rtas_function *func)
>>> +{
>>> +	if (func && func->lock)
>>> +		mutex_lock(func->lock);
>>> +}
>>
>> This is obviously going to defeat most static analysis tools.
>
> I guess it's not that obvious to me :-) Is it because the mutex_lock()
> is conditional? I'll improve this if it's possible.

Well maybe I'm not giving modern static analysis tools enough credit :)

But what I mean is that it's not easy to reason about what the function
does in isolation, i.e. all you can say is that it may or may not lock a
mutex, and you can't say which mutex.

>> I assume lockdep is OK with it though?
>
> Seems to be, yes.

OK.

cheers
Nathan Lynch Nov. 30, 2023, 6:26 p.m. UTC | #9
Michael Ellerman <mpe@ellerman.id.au> writes:

> Nathan Lynch <nathanl@linux.ibm.com> writes:
>> Michael Ellerman <mpe@ellerman.id.au> writes:
>>
>>> Nathan Lynch via B4 Relay <devnull+nathanl.linux.ibm.com@kernel.org>
>>> writes:
>>>> From: Nathan Lynch <nathanl@linux.ibm.com>
>>>>
>>>> On RTAS platforms there is a general restriction that the OS must not
>>>> enter RTAS on more than one CPU at a time. This low-level
>>>> serialization requirement is satisfied by holding a spin
>>>> lock (rtas_lock) across most RTAS function invocations.
>>> ...
>>>> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
>>>> index 1fc0b3fffdd1..52f2242d0c28 100644
>>>> --- a/arch/powerpc/kernel/rtas.c
>>>> +++ b/arch/powerpc/kernel/rtas.c
>>>> @@ -581,6 +652,28 @@ static const struct rtas_function *rtas_token_to_function(s32 token)
>>>>  	return NULL;
>>>>  }
>>>>  
>>>> +static void __rtas_function_lock(struct rtas_function *func)
>>>> +{
>>>> +	if (func && func->lock)
>>>> +		mutex_lock(func->lock);
>>>> +}
>>>
>>> This is obviously going to defeat most static analysis tools.
>>
>> I guess it's not that obvious to me :-) Is it because the mutex_lock()
>> is conditional? I'll improve this if it's possible.
>
> Well maybe I'm not giving modern static analysis tools enough credit :)
>
> But what I mean that it's not easy to reason about what the function
> does in isolation. ie. all you can say is that it may or may not lock a
> mutex, and you can't say which mutex.

I've pulled the thread on this a little bit and here is what I can do:

* Discard rtas_lock_function() and rtas_unlock_function() and make the
  function mutexes extern as needed. As of now only
  rtas_ibm_get_vpd_lock will need to be exposed. This enables us to put
  __acquires(), __releases(), and __must_hold() annotations in
  papr-vpd.c since it explicitly manipulates the mutex.

* Then sys_rtas() becomes the only site that needs
  __rtas_function_lock() and __rtas_function_unlock(), which can be
  open-coded and commented (and, one hopes, not emulated elsewhere).

This will look something like:

SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs)
{
        struct rtas_function *func = rtas_token_to_function(token);

        if (func->lock)
                mutex_lock(func->lock);

        [ ... acquire rtas_lock, enter RTAS, fetch any errors ... ]

        if (func->lock)
                mutex_unlock(func->lock);

The indirection seems unavoidable since we're working backwards from a
token value (supplied by the user and not known at build time) to the
function descriptor.

Is that tolerable for now?

Alternatively, sys_rtas() could be refactored into locking and
non-locking paths, e.g.

static long __do_sys_rtas(struct rtas_function *func)
{
	// [ ... acquire rtas_lock, enter RTAS, fetch any error etc ... ]
}

static long do_sys_rtas(struct rtas_function *func, struct mutex *mtx)
{
	mutex_lock(mtx);
	ret = __do_sys_rtas(func);
	mutex_unlock(mtx);

	return ret;
}

SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs)
{
	// real code does copy_from_user etc
	struct rtas_function *func = rtas_token_to_function(uargs->token);
	long ret;

	// [ ... input validation and filtering ... ]

	if (func->lock)
		ret = do_sys_rtas(func, func->lock);
	else
		ret = __do_sys_rtas(func);

	// [ ... copy out results ... ]

	return ret;
}
Nathan Lynch Nov. 30, 2023, 9:41 p.m. UTC | #10
Nathan Lynch <nathanl@linux.ibm.com> writes:
> Alternatively, sys_rtas() could be refactored into locking and
> non-locking paths, e.g.
>
> static long __do_sys_rtas(struct rtas_function *func)
> {
> 	// [ ... acquire rtas_lock, enter RTAS, fetch any error etc ... ]
> }

Of course this conflicts with the code generated by SYSCALL_DEFINE1(rtas), so
a different function name would be needed here.
Michael Ellerman Nov. 30, 2023, 10:46 p.m. UTC | #11
Nathan Lynch <nathanl@linux.ibm.com> writes:
> Michael Ellerman <mpe@ellerman.id.au> writes:
>> Nathan Lynch <nathanl@linux.ibm.com> writes:
>>> Michael Ellerman <mpe@ellerman.id.au> writes:
>>>> Nathan Lynch via B4 Relay <devnull+nathanl.linux.ibm.com@kernel.org>
>>>> writes:
>>>>> From: Nathan Lynch <nathanl@linux.ibm.com>
>>>>>
>>>>> On RTAS platforms there is a general restriction that the OS must not
>>>>> enter RTAS on more than one CPU at a time. This low-level
>>>>> serialization requirement is satisfied by holding a spin
>>>>> lock (rtas_lock) across most RTAS function invocations.
>>>> ...
>>>>> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
>>>>> index 1fc0b3fffdd1..52f2242d0c28 100644
>>>>> --- a/arch/powerpc/kernel/rtas.c
>>>>> +++ b/arch/powerpc/kernel/rtas.c
>>>>> @@ -581,6 +652,28 @@ static const struct rtas_function *rtas_token_to_function(s32 token)
>>>>>  	return NULL;
>>>>>  }
>>>>>  
>>>>> +static void __rtas_function_lock(struct rtas_function *func)
>>>>> +{
>>>>> +	if (func && func->lock)
>>>>> +		mutex_lock(func->lock);
>>>>> +}
>>>>
>>>> This is obviously going to defeat most static analysis tools.
>>>
>>> I guess it's not that obvious to me :-) Is it because the mutex_lock()
>>> is conditional? I'll improve this if it's possible.
>>
>> Well maybe I'm not giving modern static analysis tools enough credit :)
>>
>> But what I mean that it's not easy to reason about what the function
>> does in isolation. ie. all you can say is that it may or may not lock a
>> mutex, and you can't say which mutex.
>
> I've pulled the thread on this a little bit and here is what I can do:
>
> * Discard rtas_lock_function() and rtas_unlock_function() and make the
>   function mutexes extern as needed. As of now only
>   rtas_ibm_get_vpd_lock will need to be exposed. This enables us to put
>   __acquires(), __releases(), and __must_hold() annotations in
>   papr-vpd.c since it explicitly manipulates the mutex.
>
> * Then sys_rtas() becomes the only site that needs
>   __rtas_function_lock() and __rtas_function_unlock(), which can be
>   open-coded and commented (and, one hopes, not emulated elsewhere).
>
> This will look something like:
>
> SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs)
> {
>         struct rtas_function *func = rtas_token_to_function(token);
>
>         if (func->lock)
>                 mutex_lock(func->lock);
>
>         [ ... acquire rtas_lock, enter RTAS, fetch any errors ... ]
>
>         if (func->lock)
>                 mutex_unlock(func->lock);
>
> The indirection seems unavoidable since we're working backwards from a
> token value (supplied by the user and not known at build time) to the
> function descriptor.
>
> Is that tolerable for now?

Yeah. Thanks for looking into it.

I wasn't unhappy with the original version, but just slightly uneasy
about the locking via pointer.

But that new proposal sounds good, more code will have static lock
annotations, and only sys_rtas() which is already weird, will have the
dynamic stuff.

> Alternatively, sys_rtas() could be refactored into locking and
> non-locking paths, e.g.
>
> static long __do_sys_rtas(struct rtas_function *func)
> {
> 	// [ ... acquire rtas_lock, enter RTAS, fetch any error etc ... ]
> }
>
> static long do_sys_rtas(struct rtas_function *func, struct mutex *mtx)
> {
> 	mutex_lock(mtx);
> 	ret = __do_sys_rtas(func);
> 	mutex_unlock(mtx);
>
> 	return ret;
> }
>
> SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs)
> {
> 	// real code does copy_from_user etc
> 	struct rtas_function *func = rtas_token_to_function(uargs->token);
> 	long ret;
>
> 	// [ ... input validation and filtering ... ]
>
> 	if (func->lock)
> 		ret = do_sys_rtas(func, func->lock);
> 	else
> 		ret = __do_sys_rtas(func);
>
> 	// [ ... copy out results ... ]
>
> 	return ret;
> }

You could go even further and switch on the token, and handle each case
separately so that you can then statically take the appropriate lock.
But that's probably overkill.
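
If you did want to go that way, it might look roughly like this
(purely hypothetical sketch, not something I'm asking for; it reuses
the helper names from your earlier mail and assumes RTAS_FN_* handles
exist for the functions in question):

	/* illustrative only: dispatch on the function descriptor */
	struct rtas_function *func = rtas_token_to_function(token);
	long ret;

	if (func == rtas_function_lookup(RTAS_FN_IBM_GET_VPD)) {
		mutex_lock(&rtas_ibm_get_vpd_lock);
		ret = __do_sys_rtas(func);
		mutex_unlock(&rtas_ibm_get_vpd_lock);
	} else if (func == rtas_function_lookup(RTAS_FN_IBM_GET_INDICES)) {
		mutex_lock(&rtas_ibm_get_indices_lock);
		ret = __do_sys_rtas(func);
		mutex_unlock(&rtas_ibm_get_indices_lock);
	} else {
		ret = __do_sys_rtas(func);
	}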

cheers
Nathan Lynch Nov. 30, 2023, 11:53 p.m. UTC | #12
Michael Ellerman <mpe@ellerman.id.au> writes:
> Nathan Lynch <nathanl@linux.ibm.com> writes:
>> Michael Ellerman <mpe@ellerman.id.au> writes:
>>> Nathan Lynch <nathanl@linux.ibm.com> writes:
>>>> Michael Ellerman <mpe@ellerman.id.au> writes:
>>>>> Nathan Lynch via B4 Relay <devnull+nathanl.linux.ibm.com@kernel.org>
>>>>> writes:
>>>>>> From: Nathan Lynch <nathanl@linux.ibm.com>
>>>>>>
>>>>>> On RTAS platforms there is a general restriction that the OS must not
>>>>>> enter RTAS on more than one CPU at a time. This low-level
>>>>>> serialization requirement is satisfied by holding a spin
>>>>>> lock (rtas_lock) across most RTAS function invocations.
>>>>> ...
>>>>>> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
>>>>>> index 1fc0b3fffdd1..52f2242d0c28 100644
>>>>>> --- a/arch/powerpc/kernel/rtas.c
>>>>>> +++ b/arch/powerpc/kernel/rtas.c
>>>>>> @@ -581,6 +652,28 @@ static const struct rtas_function *rtas_token_to_function(s32 token)
>>>>>>  	return NULL;
>>>>>>  }
>>>>>>  
>>>>>> +static void __rtas_function_lock(struct rtas_function *func)
>>>>>> +{
>>>>>> +	if (func && func->lock)
>>>>>> +		mutex_lock(func->lock);
>>>>>> +}
>>>>>
>>>>> This is obviously going to defeat most static analysis tools.
>>>>
>>>> I guess it's not that obvious to me :-) Is it because the mutex_lock()
>>>> is conditional? I'll improve this if it's possible.
>>>
>>> Well maybe I'm not giving modern static analysis tools enough credit :)
>>>
>>> But what I mean that it's not easy to reason about what the function
>>> does in isolation. ie. all you can say is that it may or may not lock a
>>> mutex, and you can't say which mutex.
>>
>> I've pulled the thread on this a little bit and here is what I can do:
>>
>> * Discard rtas_lock_function() and rtas_unlock_function() and make the
>>   function mutexes extern as needed. As of now only
>>   rtas_ibm_get_vpd_lock will need to be exposed. This enables us to put
>>   __acquires(), __releases(), and __must_hold() annotations in
>>   papr-vpd.c since it explicitly manipulates the mutex.
>>
>> * Then sys_rtas() becomes the only site that needs
>>   __rtas_function_lock() and __rtas_function_unlock(), which can be
>>   open-coded and commented (and, one hopes, not emulated elsewhere).
>>
>> This will look something like:
>>
>> SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs)
>> {
>>         struct rtas_function *func = rtas_token_to_function(token);
>>
>>         if (func->lock)
>>                 mutex_lock(func->lock);
>>
>>         [ ... acquire rtas_lock, enter RTAS, fetch any errors ... ]
>>
>>         if (func->lock)
>>                 mutex_unlock(func->lock);
>>
>> The indirection seems unavoidable since we're working backwards from a
>> token value (supplied by the user and not known at build time) to the
>> function descriptor.
>>
>> Is that tolerable for now?
>
> Yeah. Thanks for looking into it.
>
> I wasn't unhappy with the original version, but just slightly uneasy
> about the locking via pointer.
>
> But that new proposal sounds good, more code will have static lock
> annotations, and only sys_rtas() which is already weird, will have the
> dynamic stuff.

OK, I'll work that up then.
Nathan Lynch Dec. 5, 2023, 4:51 p.m. UTC | #13
Nathan Lynch <nathanl@linux.ibm.com> writes:
> Michael Ellerman <mpe@ellerman.id.au> writes:
>> Nathan Lynch <nathanl@linux.ibm.com> writes:
>>> Michael Ellerman <mpe@ellerman.id.au> writes:
>>>> Nathan Lynch <nathanl@linux.ibm.com> writes:
>>>>> Michael Ellerman <mpe@ellerman.id.au> writes:
>>>>>> Nathan Lynch via B4 Relay <devnull+nathanl.linux.ibm.com@kernel.org>
>>>>>> writes:
>>>>>>> From: Nathan Lynch <nathanl@linux.ibm.com>
>>>>>>>
>>>>>>> On RTAS platforms there is a general restriction that the OS must not
>>>>>>> enter RTAS on more than one CPU at a time. This low-level
>>>>>>> serialization requirement is satisfied by holding a spin
>>>>>>> lock (rtas_lock) across most RTAS function invocations.
>>>>>> ...
>>>>>>> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
>>>>>>> index 1fc0b3fffdd1..52f2242d0c28 100644
>>>>>>> --- a/arch/powerpc/kernel/rtas.c
>>>>>>> +++ b/arch/powerpc/kernel/rtas.c
>>>>>>> @@ -581,6 +652,28 @@ static const struct rtas_function *rtas_token_to_function(s32 token)
>>>>>>>  	return NULL;
>>>>>>>  }
>>>>>>>  
>>>>>>> +static void __rtas_function_lock(struct rtas_function *func)
>>>>>>> +{
>>>>>>> +	if (func && func->lock)
>>>>>>> +		mutex_lock(func->lock);
>>>>>>> +}
>>>>>>
>>>>>> This is obviously going to defeat most static analysis tools.
>>>>>
>>>>> I guess it's not that obvious to me :-) Is it because the mutex_lock()
>>>>> is conditional? I'll improve this if it's possible.
>>>>
>>>> Well maybe I'm not giving modern static analysis tools enough credit :)
>>>>
>>>> But what I mean that it's not easy to reason about what the function
>>>> does in isolation. ie. all you can say is that it may or may not lock a
>>>> mutex, and you can't say which mutex.
>>>
>>> I've pulled the thread on this a little bit and here is what I can do:
>>>
>>> * Discard rtas_lock_function() and rtas_unlock_function() and make the
>>>   function mutexes extern as needed. As of now only
>>>   rtas_ibm_get_vpd_lock will need to be exposed. This enables us to put
>>>   __acquires(), __releases(), and __must_hold() annotations in
>>>   papr-vpd.c since it explicitly manipulates the mutex.
>>>
>>> * Then sys_rtas() becomes the only site that needs
>>>   __rtas_function_lock() and __rtas_function_unlock(), which can be
>>>   open-coded and commented (and, one hopes, not emulated elsewhere).
>>>
>>> This will look something like:
>>>
>>> SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs)
>>> {
>>>         struct rtas_function *func = rtas_token_to_function(token);
>>>
>>>         if (func->lock)
>>>                 mutex_lock(func->lock);
>>>
>>>         [ ... acquire rtas_lock, enter RTAS, fetch any errors ... ]
>>>
>>>         if (func->lock)
>>>                 mutex_unlock(func->lock);
>>>
>>> The indirection seems unavoidable since we're working backwards from a
>>> token value (supplied by the user and not known at build time) to the
>>> function descriptor.
>>>
>>> Is that tolerable for now?
>>
>> Yeah. Thanks for looking into it.
>>
>> I wasn't unhappy with the original version, but just slightly uneasy
>> about the locking via pointer.
>>
>> But that new proposal sounds good, more code will have static lock
>> annotations, and only sys_rtas() which is already weird, will have the
>> dynamic stuff.
>
> OK, I'll work that up then.

Well, apparently the annotations aren't useful with mutexes; see these
threads:

https://lore.kernel.org/all/8e8d93ee2125c739caabe5986f40fa2156c8b4ce.1579893447.git.jbi.octave@gmail.com/
https://lore.kernel.org/all/20200601184552.23128-5-jbi.octave@gmail.com/

And indeed I can't get sparse to accept them when added to the papr-vpd
code:

$ make C=2 arch/powerpc/platforms/pseries/papr-vpd.o
  CHECK   scripts/mod/empty.c
  CALL    scripts/checksyscalls.sh
  CHECK   arch/powerpc/platforms/pseries/papr-vpd.c
    arch/powerpc/platforms/pseries/papr-vpd.c:235:13: warning: context
      imbalance in 'vpd_sequence_begin' - wrong count at exit
    arch/powerpc/platforms/pseries/papr-vpd.c:269:13: warning: context
      imbalance in 'vpd_sequence_end' - wrong count at exit

I don't think it's my own mistake since I see existing code with the
same problem, such as net/core/sock.c:

static void *proto_seq_start(struct seq_file *seq, loff_t *pos)
	__acquires(proto_list_mutex)
{
	mutex_lock(&proto_list_mutex);
	return seq_list_start_head(&proto_list, *pos);
}

static void proto_seq_stop(struct seq_file *seq, void *v)
	__releases(proto_list_mutex)
{
	mutex_unlock(&proto_list_mutex);
}

which yields:

net/core/sock.c:4018:13: warning: context imbalance in 'proto_seq_start'
  - wrong count at exit
net/core/sock.c:4030:13: warning: context imbalance in 'proto_seq_stop'
  - wrong count at exit

So I'll give up on static annotations for this series and look for
opportunities to add lockdep_assert_held() etc.
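
e.g. something like this (rough sketch; the helper names are from
papr-vpd.c but the signatures and bodies here are illustrative):

	static void vpd_sequence_begin(struct vpd_sequence *seq)
	{
		/* illustrative: take the function mutex directly */
		mutex_lock(&rtas_ibm_get_vpd_lock);
		/* ... set up sequence state ... */
	}

	static void vpd_sequence_end(struct vpd_sequence *seq)
	{
		/* document the locking requirement at runtime */
		lockdep_assert_held(&rtas_ibm_get_vpd_lock);
		/* ... tear down sequence state ... */
		mutex_unlock(&rtas_ibm_get_vpd_lock);
	}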
Michael Ellerman Dec. 7, 2023, 1:02 a.m. UTC | #14
Nathan Lynch <nathanl@linux.ibm.com> writes:
> Nathan Lynch <nathanl@linux.ibm.com> writes:
>> Michael Ellerman <mpe@ellerman.id.au> writes:
>>> Nathan Lynch <nathanl@linux.ibm.com> writes:
>>>> Michael Ellerman <mpe@ellerman.id.au> writes:
>>>>> Nathan Lynch <nathanl@linux.ibm.com> writes:
>>>>>> Michael Ellerman <mpe@ellerman.id.au> writes:
>>>>>>> Nathan Lynch via B4 Relay <devnull+nathanl.linux.ibm.com@kernel.org>
>>>>>>> writes:
>>>>>>>> From: Nathan Lynch <nathanl@linux.ibm.com>
>>>>>>>>
>>>>>>>> On RTAS platforms there is a general restriction that the OS must not
>>>>>>>> enter RTAS on more than one CPU at a time. This low-level
>>>>>>>> serialization requirement is satisfied by holding a spin
>>>>>>>> lock (rtas_lock) across most RTAS function invocations.
>>>>>>> ...
>>>>>>>> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
>>>>>>>> index 1fc0b3fffdd1..52f2242d0c28 100644
>>>>>>>> --- a/arch/powerpc/kernel/rtas.c
>>>>>>>> +++ b/arch/powerpc/kernel/rtas.c
>>>>>>>> @@ -581,6 +652,28 @@ static const struct rtas_function *rtas_token_to_function(s32 token)
>>>>>>>>  	return NULL;
>>>>>>>>  }
>>>>>>>>  
>>>>>>>> +static void __rtas_function_lock(struct rtas_function *func)
>>>>>>>> +{
>>>>>>>> +	if (func && func->lock)
>>>>>>>> +		mutex_lock(func->lock);
>>>>>>>> +}
>>>>>>>
>>>>>>> This is obviously going to defeat most static analysis tools.
>>>>>>
>>>>>> I guess it's not that obvious to me :-) Is it because the mutex_lock()
>>>>>> is conditional? I'll improve this if it's possible.
>>>>>
>>>>> Well maybe I'm not giving modern static analysis tools enough credit :)
>>>>>
>>>>> But what I mean that it's not easy to reason about what the function
>>>>> does in isolation. ie. all you can say is that it may or may not lock a
>>>>> mutex, and you can't say which mutex.
>>>>
>>>> I've pulled the thread on this a little bit and here is what I can do:
>>>>
>>>> * Discard rtas_lock_function() and rtas_unlock_function() and make the
>>>>   function mutexes extern as needed. As of now only
>>>>   rtas_ibm_get_vpd_lock will need to be exposed. This enables us to put
>>>>   __acquires(), __releases(), and __must_hold() annotations in
>>>>   papr-vpd.c since it explicitly manipulates the mutex.
>>>>
>>>> * Then sys_rtas() becomes the only site that needs
>>>>   __rtas_function_lock() and __rtas_function_unlock(), which can be
>>>>   open-coded and commented (and, one hopes, not emulated elsewhere).
>>>>
>>>> This will look something like:
>>>>
>>>> SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs)
>>>> {
>>>>         struct rtas_function *func = rtas_token_to_function(token);
>>>>
>>>>         if (func->lock)
>>>>                 mutex_lock(func->lock);
>>>>
>>>>         [ ... acquire rtas_lock, enter RTAS, fetch any errors ... ]
>>>>
>>>>         if (func->lock)
>>>>                 mutex_unlock(func->lock);
>>>>
>>>> The indirection seems unavoidable since we're working backwards from a
>>>> token value (supplied by the user and not known at build time) to the
>>>> function descriptor.
>>>>
>>>> Is that tolerable for now?
>>>
>>> Yeah. Thanks for looking into it.
>>>
>>> I wasn't unhappy with the original version, but just slightly uneasy
>>> about the locking via pointer.
>>>
>>> But that new proposal sounds good, more code will have static lock
>>> annotations, and only sys_rtas() which is already weird, will have the
>>> dynamic stuff.
>>
>> OK, I'll work that up then.
>
> Well, apparently the annotations aren't useful with mutexes; see these
> threads:
>
> https://lore.kernel.org/all/8e8d93ee2125c739caabe5986f40fa2156c8b4ce.1579893447.git.jbi.octave@gmail.com/
> https://lore.kernel.org/all/20200601184552.23128-5-jbi.octave@gmail.com/
>
> And indeed I can't get sparse to accept them when added to the papr-vpd
> code:
>
> $ make C=2 arch/powerpc/platforms/pseries/papr-vpd.o
>   CHECK   scripts/mod/empty.c
>   CALL    scripts/checksyscalls.sh
>   CHECK   arch/powerpc/platforms/pseries/papr-vpd.c
>     arch/powerpc/platforms/pseries/papr-vpd.c:235:13: warning: context
>       imbalance in 'vpd_sequence_begin' - wrong count at exit
>     arch/powerpc/platforms/pseries/papr-vpd.c:269:13: warning: context
>       imbalance in 'vpd_sequence_end' - wrong count at exit

Oof. Sorry to send you on that goose chase.

> I don't think it's my own mistake since I see existing code with the
> same problem, such as net/core/sock.c:
>
> static void *proto_seq_start(struct seq_file *seq, loff_t *pos)
> 	__acquires(proto_list_mutex)
> {
> 	mutex_lock(&proto_list_mutex);
> 	return seq_list_start_head(&proto_list, *pos);
> }
>
> static void proto_seq_stop(struct seq_file *seq, void *v)
> 	__releases(proto_list_mutex)
> {
> 	mutex_unlock(&proto_list_mutex);
> }
>
> which yields:
>
> net/core/sock.c:4018:13: warning: context imbalance in 'proto_seq_start'
>   - wrong count at exit
> net/core/sock.c:4030:13: warning: context imbalance in 'proto_seq_stop'
>   - wrong count at exit
>
> So I'll give up on static annotations for this series and look for
> opportunities to add lockdep_assert_held() etc.

OK. There are other static analysers than sparse, and even for lockdep
having more of the locks taken statically is probably still preferable.

But feel free to ignore me and just repost what you had :)

cheers

Patch

diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
index b73010583a8d..9a20caba6858 100644
--- a/arch/powerpc/include/asm/rtas.h
+++ b/arch/powerpc/include/asm/rtas.h
@@ -421,6 +421,8 @@  static inline bool rtas_function_implemented(const rtas_fn_handle_t handle)
 {
 	return rtas_function_token(handle) != RTAS_UNKNOWN_SERVICE;
 }
+void rtas_function_lock(rtas_fn_handle_t handle);
+void rtas_function_unlock(rtas_fn_handle_t handle);
 extern int rtas_token(const char *service);
 extern int rtas_service_present(const char *service);
 extern int rtas_call(int token, int, int, int *, ...);
diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index 1fc0b3fffdd1..52f2242d0c28 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -18,6 +18,7 @@ 
 #include <linux/kernel.h>
 #include <linux/lockdep.h>
 #include <linux/memblock.h>
+#include <linux/mutex.h>
 #include <linux/of.h>
 #include <linux/of_fdt.h>
 #include <linux/reboot.h>
@@ -70,14 +71,34 @@  struct rtas_filter {
  *                            ppc64le, and we want to keep it that way. It does
  *                            not make sense for this to be set when @filter
  *                            is NULL.
+ * @lock: Pointer to an optional dedicated per-function mutex. This
+ *        should be set for functions that require multiple calls in
+ *        sequence to complete a single operation, and such sequences
+ *        will disrupt each other if allowed to interleave. Users of
+ *        this function are required to hold the associated lock for
+ *        the duration of the call sequence. Add an explanatory
+ *        comment to the function table entry if setting this member.
  */
 struct rtas_function {
 	s32 token;
 	const bool banned_for_syscall_on_le:1;
 	const char * const name;
 	const struct rtas_filter *filter;
+	struct mutex *lock;
 };
 
+/*
+ * Per-function locks. Access these through the
+ * rtas_function_lock/unlock APIs, not directly.
+ */
+static DEFINE_MUTEX(rtas_ibm_activate_firmware_lock);
+static DEFINE_MUTEX(rtas_ibm_get_dynamic_sensor_state_lock);
+static DEFINE_MUTEX(rtas_ibm_get_indices_lock);
+static DEFINE_MUTEX(rtas_ibm_get_vpd_lock);
+static DEFINE_MUTEX(rtas_ibm_lpar_perftools_lock);
+static DEFINE_MUTEX(rtas_ibm_physical_attestation_lock);
+static DEFINE_MUTEX(rtas_ibm_set_dynamic_indicator_lock);
+
 static struct rtas_function rtas_function_table[] __ro_after_init = {
 	[RTAS_FNIDX__CHECK_EXCEPTION] = {
 		.name = "check-exception",
@@ -125,6 +146,13 @@  static struct rtas_function rtas_function_table[] __ro_after_init = {
 			.buf_idx1 = -1, .size_idx1 = -1,
 			.buf_idx2 = -1, .size_idx2 = -1,
 		},
+		/*
+		 * PAPR doesn't explicitly impose any restriction, but
+		 * this typically requires multiple calls before
+		 * success, and there's no reason to allow sequences
+		 * to interleave.
+		 */
+		.lock = &rtas_ibm_activate_firmware_lock,
 	},
 	[RTAS_FNIDX__IBM_CBE_START_PTCAL] = {
 		.name = "ibm,cbe-start-ptcal",
@@ -196,6 +224,12 @@  static struct rtas_function rtas_function_table[] __ro_after_init = {
 			.buf_idx1 = 1, .size_idx1 = -1,
 			.buf_idx2 = -1, .size_idx2 = -1,
 		},
+		/*
+		 * PAPR+ R1–7.3.19–3 is explicit that the OS must not
+		 * call ibm,get-dynamic-sensor-state with different
+		 * inputs until a non-retry status has been returned.
+		 */
+		.lock = &rtas_ibm_get_dynamic_sensor_state_lock,
 	},
 	[RTAS_FNIDX__IBM_GET_INDICES] = {
 		.name = "ibm,get-indices",
@@ -203,6 +237,12 @@  static struct rtas_function rtas_function_table[] __ro_after_init = {
 			.buf_idx1 = 2, .size_idx1 = 3,
 			.buf_idx2 = -1, .size_idx2 = -1,
 		},
+		/*
+		 * PAPR+ R1–7.3.17–2 says that the OS must not
+		 * interleave ibm,get-indices call sequences with
+		 * different inputs.
+		 */
+		.lock = &rtas_ibm_get_indices_lock,
 	},
 	[RTAS_FNIDX__IBM_GET_RIO_TOPOLOGY] = {
 		.name = "ibm,get-rio-topology",
@@ -220,6 +260,11 @@  static struct rtas_function rtas_function_table[] __ro_after_init = {
 			.buf_idx1 = 0, .size_idx1 = -1,
 			.buf_idx2 = 1, .size_idx2 = 2,
 		},
+		/*
+		 * PAPR+ R1–7.3.20–4 indicates that sequences
+		 * should not be allowed to interleave.
+		 */
+		.lock = &rtas_ibm_get_vpd_lock,
 	},
 	[RTAS_FNIDX__IBM_GET_XIVE] = {
 		.name = "ibm,get-xive",
@@ -239,6 +284,11 @@  static struct rtas_function rtas_function_table[] __ro_after_init = {
 			.buf_idx1 = 2, .size_idx1 = 3,
 			.buf_idx2 = -1, .size_idx2 = -1,
 		},
+		/*
+		 * PAPR+ R1–7.3.26–6 says the OS should allow only one
+		 * call sequence in progress at a time.
+		 */
+		.lock = &rtas_ibm_lpar_perftools_lock,
 	},
 	[RTAS_FNIDX__IBM_MANAGE_FLASH_IMAGE] = {
 		.name = "ibm,manage-flash-image",
@@ -277,6 +327,14 @@  static struct rtas_function rtas_function_table[] __ro_after_init = {
 			.buf_idx1 = 0, .size_idx1 = 1,
 			.buf_idx2 = -1, .size_idx2 = -1,
 		},
+		/*
+		 * This follows a sequence-based pattern similar to
+		 * ibm,get-vpd et al. Since PAPR+ restricts
+		 * interleaving call sequences for other functions of
+		 * this style, assume the restriction applies here,
+		 * even though it's not explicit in the spec.
+		 */
+		.lock = &rtas_ibm_physical_attestation_lock,
 	},
 	[RTAS_FNIDX__IBM_PLATFORM_DUMP] = {
 		.name = "ibm,platform-dump",
@@ -284,6 +342,13 @@  static struct rtas_function rtas_function_table[] __ro_after_init = {
 			.buf_idx1 = 4, .size_idx1 = 5,
 			.buf_idx2 = -1, .size_idx2 = -1,
 		},
+		/*
+		 * PAPR+ 7.3.3.4.1 indicates that concurrent sequences
+		 * of ibm,platform-dump are allowed if they are
+		 * operating on different dump tags. So leave
+		 * the lock pointer unset for now. This may need
+		 * reconsideration if kernel-internal users appear.
+		 */
 	},
 	[RTAS_FNIDX__IBM_POWER_OFF_UPS] = {
 		.name = "ibm,power-off-ups",
@@ -326,6 +391,12 @@  static struct rtas_function rtas_function_table[] __ro_after_init = {
 			.buf_idx1 = 2, .size_idx1 = -1,
 			.buf_idx2 = -1, .size_idx2 = -1,
 		},
+		/*
+		 * PAPR+ R1–7.3.18–3 says the OS must not call this
+		 * function with different inputs until a non-retry
+		 * status has been returned.
+		 */
+		.lock = &rtas_ibm_set_dynamic_indicator_lock,
 	},
 	[RTAS_FNIDX__IBM_SET_EEH_OPTION] = {
 		.name = "ibm,set-eeh-option",
@@ -556,9 +627,9 @@  static int __init rtas_token_to_function_xarray_init(void)
 }
 arch_initcall(rtas_token_to_function_xarray_init);
 
-static const struct rtas_function *rtas_token_to_function(s32 token)
+static struct rtas_function *rtas_token_to_function(s32 token)
 {
-	const struct rtas_function *func;
+	struct rtas_function *func;
 
 	if (WARN_ONCE(token < 0, "invalid token %d", token))
 		return NULL;
@@ -581,6 +652,28 @@  static const struct rtas_function *rtas_token_to_function(s32 token)
 	return NULL;
 }
 
+static void __rtas_function_lock(struct rtas_function *func)
+{
+	if (func && func->lock)
+		mutex_lock(func->lock);
+}
+
+static void __rtas_function_unlock(struct rtas_function *func)
+{
+	if (func && func->lock)
+		mutex_unlock(func->lock);
+}
+
+void rtas_function_lock(const rtas_fn_handle_t handle)
+{
+	__rtas_function_lock(rtas_function_lookup(handle));
+}
+
+void rtas_function_unlock(const rtas_fn_handle_t handle)
+{
+	__rtas_function_unlock(rtas_function_lookup(handle));
+}
+
 /* This is here deliberately so it's only used in this file */
 void enter_rtas(unsigned long);
 
@@ -1885,6 +1978,8 @@  SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs)
 
 	buff_copy = get_errorlog_buffer();
 
+	__rtas_function_lock(rtas_token_to_function(token));
+
 	raw_spin_lock_irqsave(&rtas_lock, flags);
 	cookie = lockdep_pin_lock(&rtas_lock);
 
@@ -1900,6 +1995,8 @@  SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs)
 	lockdep_unpin_lock(&rtas_lock, cookie);
 	raw_spin_unlock_irqrestore(&rtas_lock, flags);
 
+	__rtas_function_unlock(rtas_token_to_function(token));
+
 	if (buff_copy) {
 		if (errbuf)
 			log_error(errbuf, ERR_TYPE_RTAS_LOG, 0);