Message ID | 20231117-papr-sys_rtas-vs-lockdown-v4-5-b794d8cb8502@linux.ibm.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | powerpc/pseries: New character devices for system parameters and VPD | expand |
Nathan Lynch via B4 Relay <devnull+nathanl.linux.ibm.com@kernel.org> writes: > From: Nathan Lynch <nathanl@linux.ibm.com> > > On RTAS platforms there is a general restriction that the OS must not > enter RTAS on more than one CPU at a time. This low-level > serialization requirement is satisfied by holding a spin > lock (rtas_lock) across most RTAS function invocations. > > However, some pseries RTAS functions require multiple successive calls > to complete a logical operation. Beginning a new call sequence for such a > function may disrupt any other sequences of that function already in > progress. Safe and reliable use of these functions effectively > requires higher-level serialization beyond what is already done at the > level of RTAS entry and exit. > > Where a sequence-based RTAS function is invoked only through > sys_rtas(), with no in-kernel users, there is no issue as far as the > kernel is concerned. User space is responsible for appropriately > serializing its call sequences. (Whether user space code actually > takes measures to prevent sequence interleaving is another matter.) > Examples of such functions currently include ibm,platform-dump and > ibm,get-vpd. > > But where a sequence-based RTAS function has both user space and > in-kernel uesrs, there is a hazard. Even if the in-kernel call sites > of such a function serialize their sequences correctly, a user of > sys_rtas() can invoke the same function at any time, potentially > disrupting a sequence in progress. > > So in order to prevent disruption of kernel-based RTAS call sequences, > they must serialize not only with themselves but also with sys_rtas() > users, somehow. Preferably without adding global locks or adding more > function-specific hacks to sys_rtas(). This is a prerequisite for > adding an in-kernel call sequence of ibm,get-vpd, which is in a change > to follow. > > Note that it has never been feasible for the kernel to prevent > sys_rtas()-based sequences from being disrupted because control > returns to user space on every call. sys_rtas()-based users of these > functions have always been, and continue to be, responsible for > coordinating their call sequences with other users, even those which > may invoke the RTAS functions through less direct means than > sys_rtas(). This is an unavoidable consequence of exposing > sequence-based RTAS functions through sys_rtas(). > > * Add new rtas_function_lock() and rtas_function_unlock() APIs for use > with sequence-based RTAS functions. > > * Add an optional per-function mutex to struct rtas_function. When this > member is set, kernel-internal callers of the RTAS function are > required to guard their call sequences with rtas_function_lock() and > rtas_function_unlock(). This requirement will be enforced in a later > change, after all affected call sites are updated. > > * Populate the lock members of function table entries where > serialization of call sequences is known to be necessary, along with > justifying commentary. > > * In sys_rtas(), acquire the per-function mutex when it is present. > > There should be no perceivable change introduced here except that > concurrent callers of the same RTAS function via sys_rtas() may block > on a mutex instead of spinning on rtas_lock. Changes to follow will > add rtas_function_lock()/unlock() pairs to kernel-based call > sequences. > Can you add an example of the last part. I did look at to find 06 to find the details rtas_function_lock(RTAS_FN_IBM_ACTIVATE_FIRMWARE); do { fwrc = rtas_call(token, 0, 1, NULL); } while (rtas_busy_delay(fwrc)); rtas_function_unlock(RTAS_FN_IBM_ACTIVATE_FIRMWARE); Reviewed-by: Aneesh Kumar K.V (IBM) <aneesh.kumar@kernel.org> > > Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> > --- > arch/powerpc/include/asm/rtas.h | 2 + > arch/powerpc/kernel/rtas.c | 101 +++++++++++++++++++++++++++++++++++++++- > 2 files changed, 101 insertions(+), 2 deletions(-) > > diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h > index b73010583a8d..9a20caba6858 100644 > --- a/arch/powerpc/include/asm/rtas.h > +++ b/arch/powerpc/include/asm/rtas.h > @@ -421,6 +421,8 @@ static inline bool rtas_function_implemented(const rtas_fn_handle_t handle) > { > return rtas_function_token(handle) != RTAS_UNKNOWN_SERVICE; > } > +void rtas_function_lock(rtas_fn_handle_t handle); > +void rtas_function_unlock(rtas_fn_handle_t handle); > extern int rtas_token(const char *service); > extern int rtas_service_present(const char *service); > extern int rtas_call(int token, int, int, int *, ...); > diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c > index 1fc0b3fffdd1..52f2242d0c28 100644 > --- a/arch/powerpc/kernel/rtas.c > +++ b/arch/powerpc/kernel/rtas.c > @@ -18,6 +18,7 @@ > #include <linux/kernel.h> > #include <linux/lockdep.h> > #include <linux/memblock.h> > +#include <linux/mutex.h> > #include <linux/of.h> > #include <linux/of_fdt.h> > #include <linux/reboot.h> > @@ -70,14 +71,34 @@ struct rtas_filter { > * ppc64le, and we want to keep it that way. It does > * not make sense for this to be set when @filter > * is NULL. > + * @lock: Pointer to an optional dedicated per-function mutex. This > + * should be set for functions that require multiple calls in > + * sequence to complete a single operation, and such sequences > + * will disrupt each other if allowed to interleave. Users of > + * this function are required to hold the associated lock for > + * the duration of the call sequence. Add an explanatory > + * comment to the function table entry if setting this member. > */ > struct rtas_function { > s32 token; > const bool banned_for_syscall_on_le:1; > const char * const name; > const struct rtas_filter *filter; > + struct mutex *lock; > }; > > +/* > + * Per-function locks. Access these through the > + * rtas_function_lock/unlock APIs, not directly. > + */ > +static DEFINE_MUTEX(rtas_ibm_activate_firmware_lock); > +static DEFINE_MUTEX(rtas_ibm_get_dynamic_sensor_state_lock); > +static DEFINE_MUTEX(rtas_ibm_get_indices_lock); > +static DEFINE_MUTEX(rtas_ibm_get_vpd_lock); > +static DEFINE_MUTEX(rtas_ibm_lpar_perftools_lock); > +static DEFINE_MUTEX(rtas_ibm_physical_attestation_lock); > +static DEFINE_MUTEX(rtas_ibm_set_dynamic_indicator_lock); > + > static struct rtas_function rtas_function_table[] __ro_after_init = { > [RTAS_FNIDX__CHECK_EXCEPTION] = { > .name = "check-exception", > @@ -125,6 +146,13 @@ static struct rtas_function rtas_function_table[] __ro_after_init = { > .buf_idx1 = -1, .size_idx1 = -1, > .buf_idx2 = -1, .size_idx2 = -1, > }, > + /* > + * PAPR doesn't explicitly impose any restriction, but > + * this typically requires multiple calls before > + * success, and there's no reason to allow sequences > + * to interleave. > + */ > + .lock = &rtas_ibm_activate_firmware_lock, > }, > [RTAS_FNIDX__IBM_CBE_START_PTCAL] = { > .name = "ibm,cbe-start-ptcal", > @@ -196,6 +224,12 @@ static struct rtas_function rtas_function_table[] __ro_after_init = { > .buf_idx1 = 1, .size_idx1 = -1, > .buf_idx2 = -1, .size_idx2 = -1, > }, > + /* > + * PAPR+ R1–7.3.19–3 is explicit that the OS must not > + * call ibm,get-dynamic-sensor-state with different > + * inputs until a non-retry status has been returned. > + */ > + .lock = &rtas_ibm_get_dynamic_sensor_state_lock, > }, > [RTAS_FNIDX__IBM_GET_INDICES] = { > .name = "ibm,get-indices", > @@ -203,6 +237,12 @@ static struct rtas_function rtas_function_table[] __ro_after_init = { > .buf_idx1 = 2, .size_idx1 = 3, > .buf_idx2 = -1, .size_idx2 = -1, > }, > + /* > + * PAPR+ R1–7.3.17–2 says that the OS must not > + * interleave ibm,get-indices call sequences with > + * different inputs. > + */ > + .lock = &rtas_ibm_get_indices_lock, > }, > [RTAS_FNIDX__IBM_GET_RIO_TOPOLOGY] = { > .name = "ibm,get-rio-topology", > @@ -220,6 +260,11 @@ static struct rtas_function rtas_function_table[] __ro_after_init = { > .buf_idx1 = 0, .size_idx1 = -1, > .buf_idx2 = 1, .size_idx2 = 2, > }, > + /* > + * PAPR+ R1–7.3.20–4 indicates that sequences > + * should not be allowed to interleave. > + */ > + .lock = &rtas_ibm_get_vpd_lock, > }, > [RTAS_FNIDX__IBM_GET_XIVE] = { > .name = "ibm,get-xive", > @@ -239,6 +284,11 @@ static struct rtas_function rtas_function_table[] __ro_after_init = { > .buf_idx1 = 2, .size_idx1 = 3, > .buf_idx2 = -1, .size_idx2 = -1, > }, > + /* > + * PAPR+ R1–7.3.26–6 says the OS should allow only one > + * call sequence in progress at a time. > + */ > + .lock = &rtas_ibm_lpar_perftools_lock, > }, > [RTAS_FNIDX__IBM_MANAGE_FLASH_IMAGE] = { > .name = "ibm,manage-flash-image", > @@ -277,6 +327,14 @@ static struct rtas_function rtas_function_table[] __ro_after_init = { > .buf_idx1 = 0, .size_idx1 = 1, > .buf_idx2 = -1, .size_idx2 = -1, > }, > + /* > + * This follows a sequence-based pattern similar to > + * ibm,get-vpd et al. Since PAPR+ restricts > + * interleaving call sequences for other functions of > + * this style, assume the restriction applies here, > + * even though it's not explicit in the spec. > + */ > + .lock = &rtas_ibm_physical_attestation_lock, > }, > [RTAS_FNIDX__IBM_PLATFORM_DUMP] = { > .name = "ibm,platform-dump", > @@ -284,6 +342,13 @@ static struct rtas_function rtas_function_table[] __ro_after_init = { > .buf_idx1 = 4, .size_idx1 = 5, > .buf_idx2 = -1, .size_idx2 = -1, > }, > + /* > + * PAPR+ 7.3.3.4.1 indicates that concurrent sequences > + * of ibm,platform-dump are allowed if they are > + * operating on different dump tags. So leave > + * the lock pointer unset for now. This may need > + * reconsideration if kernel-internal users appear. > + */ > }, > [RTAS_FNIDX__IBM_POWER_OFF_UPS] = { > .name = "ibm,power-off-ups", > @@ -326,6 +391,12 @@ static struct rtas_function rtas_function_table[] __ro_after_init = { > .buf_idx1 = 2, .size_idx1 = -1, > .buf_idx2 = -1, .size_idx2 = -1, > }, > + /* > + * PAPR+ R1–7.3.18–3 says the OS must not call this > + * function with different inputs until a non-retry > + * status has been returned. > + */ > + .lock = &rtas_ibm_set_dynamic_indicator_lock, > }, > [RTAS_FNIDX__IBM_SET_EEH_OPTION] = { > .name = "ibm,set-eeh-option", > @@ -556,9 +627,9 @@ static int __init rtas_token_to_function_xarray_init(void) > } > arch_initcall(rtas_token_to_function_xarray_init); > > -static const struct rtas_function *rtas_token_to_function(s32 token) > +static struct rtas_function *rtas_token_to_function(s32 token) > { > - const struct rtas_function *func; > + struct rtas_function *func; > > if (WARN_ONCE(token < 0, "invalid token %d", token)) > return NULL; > @@ -581,6 +652,28 @@ static const struct rtas_function *rtas_token_to_function(s32 token) > return NULL; > } > > +static void __rtas_function_lock(struct rtas_function *func) > +{ > + if (func && func->lock) > + mutex_lock(func->lock); > +} > + > +static void __rtas_function_unlock(struct rtas_function *func) > +{ > + if (func && func->lock) > + mutex_unlock(func->lock); > +} > + > +void rtas_function_lock(const rtas_fn_handle_t handle) > +{ > + __rtas_function_lock(rtas_function_lookup(handle)); > +} > + > +void rtas_function_unlock(const rtas_fn_handle_t handle) > +{ > + __rtas_function_unlock(rtas_function_lookup(handle)); > +} > + > /* This is here deliberately so it's only used in this file */ > void enter_rtas(unsigned long); > > @@ -1885,6 +1978,8 @@ SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs) > > buff_copy = get_errorlog_buffer(); > > + __rtas_function_lock(rtas_token_to_function(token)); > + > raw_spin_lock_irqsave(&rtas_lock, flags); > cookie = lockdep_pin_lock(&rtas_lock); > > @@ -1900,6 +1995,8 @@ SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs) > lockdep_unpin_lock(&rtas_lock, cookie); > raw_spin_unlock_irqrestore(&rtas_lock, flags); > > + __rtas_function_unlock(rtas_token_to_function(token)); > + > if (buff_copy) { > if (errbuf) > log_error(errbuf, ERR_TYPE_RTAS_LOG, 0); > > -- > 2.41.0
"Aneesh Kumar K.V (IBM)" <aneesh.kumar@kernel.org> writes: > Nathan Lynch via B4 Relay <devnull+nathanl.linux.ibm.com@kernel.org> > writes: >> There should be no perceivable change introduced here except that >> concurrent callers of the same RTAS function via sys_rtas() may block >> on a mutex instead of spinning on rtas_lock. Changes to follow will >> add rtas_function_lock()/unlock() pairs to kernel-based call >> sequences. >> > > Can you add an example of the last part. I did look at to find 06 to > find the details > > rtas_function_lock(RTAS_FN_IBM_ACTIVATE_FIRMWARE); > > do { > fwrc = rtas_call(token, 0, 1, NULL); > } while (rtas_busy_delay(fwrc)); > > rtas_function_unlock(RTAS_FN_IBM_ACTIVATE_FIRMWARE); Sure, I'll add a simple example of the API usage in the commit message, thanks.
Nathan Lynch via B4 Relay <devnull+nathanl.linux.ibm.com@kernel.org> writes: > From: Nathan Lynch <nathanl@linux.ibm.com> > > On RTAS platforms there is a general restriction that the OS must not > enter RTAS on more than one CPU at a time. This low-level > serialization requirement is satisfied by holding a spin > lock (rtas_lock) across most RTAS function invocations. ... > diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c > index 1fc0b3fffdd1..52f2242d0c28 100644 > --- a/arch/powerpc/kernel/rtas.c > +++ b/arch/powerpc/kernel/rtas.c > @@ -581,6 +652,28 @@ static const struct rtas_function *rtas_token_to_function(s32 token) > return NULL; > } > > +static void __rtas_function_lock(struct rtas_function *func) > +{ > + if (func && func->lock) > + mutex_lock(func->lock); > +} This is obviously going to defeat most static analysis tools. I assume lockdep is OK with it though? cheers
Michael Ellerman <mpe@ellerman.id.au> writes: > Nathan Lynch via B4 Relay <devnull+nathanl.linux.ibm.com@kernel.org> > writes: >> From: Nathan Lynch <nathanl@linux.ibm.com> >> >> On RTAS platforms there is a general restriction that the OS must not >> enter RTAS on more than one CPU at a time. This low-level >> serialization requirement is satisfied by holding a spin >> lock (rtas_lock) across most RTAS function invocations. > ... >> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c >> index 1fc0b3fffdd1..52f2242d0c28 100644 >> --- a/arch/powerpc/kernel/rtas.c >> +++ b/arch/powerpc/kernel/rtas.c >> @@ -581,6 +652,28 @@ static const struct rtas_function *rtas_token_to_function(s32 token) >> return NULL; >> } >> >> +static void __rtas_function_lock(struct rtas_function *func) >> +{ >> + if (func && func->lock) >> + mutex_lock(func->lock); >> +} > > This is obviously going to defeat most static analysis tools. I guess it's not that obvious to me :-) Is it because the mutex_lock() is conditional? I'll improve this if it's possible. > I assume lockdep is OK with it though? Seems to be, yes.
Nathan Lynch via B4 Relay <devnull+nathanl.linux.ibm.com@kernel.org> writes: > diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c > index 1fc0b3fffdd1..52f2242d0c28 100644 > --- a/arch/powerpc/kernel/rtas.c > +++ b/arch/powerpc/kernel/rtas.c > @@ -196,6 +224,12 @@ static struct rtas_function rtas_function_table[] __ro_after_init = { > .buf_idx1 = 1, .size_idx1 = -1, > .buf_idx2 = -1, .size_idx2 = -1, > }, > + /* > + * PAPR+ R1–7.3.19–3 is explicit that the OS must not When you cite PAPR+ can you please include the version number? That's a general comment on this patch and in some other places in the series too. cheers
Michael Ellerman <mpe@ellerman.id.au> writes: > Nathan Lynch via B4 Relay <devnull+nathanl.linux.ibm.com@kernel.org> > writes: >> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c >> index 1fc0b3fffdd1..52f2242d0c28 100644 >> --- a/arch/powerpc/kernel/rtas.c >> +++ b/arch/powerpc/kernel/rtas.c >> @@ -196,6 +224,12 @@ static struct rtas_function rtas_function_table[] __ro_after_init = { >> .buf_idx1 = 1, .size_idx1 = -1, >> .buf_idx2 = -1, .size_idx2 = -1, >> }, >> + /* >> + * PAPR+ R1–7.3.19–3 is explicit that the OS must not > > When you cite PAPR+ can you please include the version number? > > That's a general comment on this patch and in some other places in the > series too. OK. I assume v2.13 is fine even though most of the citations refer to passages that significantly predate that version.
Nathan Lynch <nathanl@linux.ibm.com> writes: > Michael Ellerman <mpe@ellerman.id.au> writes: >> Nathan Lynch via B4 Relay <devnull+nathanl.linux.ibm.com@kernel.org> >> writes: >>> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c >>> index 1fc0b3fffdd1..52f2242d0c28 100644 >>> --- a/arch/powerpc/kernel/rtas.c >>> +++ b/arch/powerpc/kernel/rtas.c >>> @@ -196,6 +224,12 @@ static struct rtas_function rtas_function_table[] __ro_after_init = { >>> .buf_idx1 = 1, .size_idx1 = -1, >>> .buf_idx2 = -1, .size_idx2 = -1, >>> }, >>> + /* >>> + * PAPR+ R1–7.3.19–3 is explicit that the OS must not >> >> When you cite PAPR+ can you please include the version number? >> >> That's a general comment on this patch and in some other places in the >> series too. > > OK. I assume v2.13 is fine even though most of the citations refer to > passages that significantly predate that version. Yeah whatever version you are referring to. It just means if there's ever confusion about what's in the kernel comments vs the then current version of PAPR, we can go back and refer to the exact version you were using. It also avoids confusion vs LoPAPR, which is simliar but has some differently numbered chapters. cheers
Nathan Lynch <nathanl@linux.ibm.com> writes: > Michael Ellerman <mpe@ellerman.id.au> writes: > >> Nathan Lynch via B4 Relay <devnull+nathanl.linux.ibm.com@kernel.org> >> writes: >>> From: Nathan Lynch <nathanl@linux.ibm.com> >>> >>> On RTAS platforms there is a general restriction that the OS must not >>> enter RTAS on more than one CPU at a time. This low-level >>> serialization requirement is satisfied by holding a spin >>> lock (rtas_lock) across most RTAS function invocations. >> ... >>> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c >>> index 1fc0b3fffdd1..52f2242d0c28 100644 >>> --- a/arch/powerpc/kernel/rtas.c >>> +++ b/arch/powerpc/kernel/rtas.c >>> @@ -581,6 +652,28 @@ static const struct rtas_function *rtas_token_to_function(s32 token) >>> return NULL; >>> } >>> >>> +static void __rtas_function_lock(struct rtas_function *func) >>> +{ >>> + if (func && func->lock) >>> + mutex_lock(func->lock); >>> +} >> >> This is obviously going to defeat most static analysis tools. > > I guess it's not that obvious to me :-) Is it because the mutex_lock() > is conditional? I'll improve this if it's possible. Well maybe I'm not giving modern static analysis tools enough credit :) But what I mean that it's not easy to reason about what the function does in isolation. ie. all you can say is that it may or may not lock a mutex, and you can't say which mutex. >> I assume lockdep is OK with it though? > > Seems to be, yes. OK. cheers
Michael Ellerman <mpe@ellerman.id.au> writes: > Nathan Lynch <nathanl@linux.ibm.com> writes: >> Michael Ellerman <mpe@ellerman.id.au> writes: >> >>> Nathan Lynch via B4 Relay <devnull+nathanl.linux.ibm.com@kernel.org> >>> writes: >>>> From: Nathan Lynch <nathanl@linux.ibm.com> >>>> >>>> On RTAS platforms there is a general restriction that the OS must not >>>> enter RTAS on more than one CPU at a time. This low-level >>>> serialization requirement is satisfied by holding a spin >>>> lock (rtas_lock) across most RTAS function invocations. >>> ... >>>> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c >>>> index 1fc0b3fffdd1..52f2242d0c28 100644 >>>> --- a/arch/powerpc/kernel/rtas.c >>>> +++ b/arch/powerpc/kernel/rtas.c >>>> @@ -581,6 +652,28 @@ static const struct rtas_function *rtas_token_to_function(s32 token) >>>> return NULL; >>>> } >>>> >>>> +static void __rtas_function_lock(struct rtas_function *func) >>>> +{ >>>> + if (func && func->lock) >>>> + mutex_lock(func->lock); >>>> +} >>> >>> This is obviously going to defeat most static analysis tools. >> >> I guess it's not that obvious to me :-) Is it because the mutex_lock() >> is conditional? I'll improve this if it's possible. > > Well maybe I'm not giving modern static analysis tools enough credit :) > > But what I mean that it's not easy to reason about what the function > does in isolation. ie. all you can say is that it may or may not lock a > mutex, and you can't say which mutex. I've pulled the thread on this a little bit and here is what I can do: * Discard rtas_lock_function() and rtas_unlock_function() and make the function mutexes extern as needed. As of now only rtas_ibm_get_vpd_lock will need to be exposed. This enables us to put __acquires(), __releases(), and __must_hold() annotations in papr-vpd.c since it explicitly manipulates the mutex. * Then sys_rtas() becomes the only site that needs __rtas_function_lock() and __rtas_function_unlock(), which can be open-coded and commented (and, one hopes, not emulated elsewhere). This will look something like: SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs) { struct rtas_function *func = rtas_token_to_function(token); if (func->lock) mutex_lock(func->lock); [ ... acquire rtas_lock, enter RTAS, fetch any errors ... ] if (func->lock) mutex_unlock(func->lock); The indirection seems unavoidable since we're working backwards from a token value (supplied by the user and not known at build time) to the function descriptor. Is that tolerable for now? Alternatively, sys_rtas() could be refactored into locking and non-locking paths, e.g. static long __do_sys_rtas(struct rtas_function *func) { // [ ... acquire rtas_lock, enter RTAS, fetch any error etc ... ] } static long do_sys_rtas(struct rtas_function *func, struct mutex *mtx) { mutex_lock(mtx); ret = __do_sys_rtas(func); mutex_unlock(mtx); return ret; } SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs) { // real code does copy_from_user etc struct rtas_function *func = rtas_token_to_function(uargs->token); long ret; // [ ... input validation and filtering ... ] if (func->lock) ret = do_sys_rtas(func, func->lock); else ret = __do_sys_rtas(func); // [ ... copy out results ... ] return ret; }
Nathan Lynch <nathanl@linux.ibm.com> writes: > Alternatively, sys_rtas() could be refactored into locking and > non-locking paths, e.g. > > static long __do_sys_rtas(struct rtas_function *func) > { > // [ ... acquire rtas_lock, enter RTAS, fetch any error etc ... ] > } Of course this conflicts with the code generated by SYSCALL_DEFINE1(rtas), so a different function name would be needed here.
Nathan Lynch <nathanl@linux.ibm.com> writes: > Michael Ellerman <mpe@ellerman.id.au> writes: >> Nathan Lynch <nathanl@linux.ibm.com> writes: >>> Michael Ellerman <mpe@ellerman.id.au> writes: >>>> Nathan Lynch via B4 Relay <devnull+nathanl.linux.ibm.com@kernel.org> >>>> writes: >>>>> From: Nathan Lynch <nathanl@linux.ibm.com> >>>>> >>>>> On RTAS platforms there is a general restriction that the OS must not >>>>> enter RTAS on more than one CPU at a time. This low-level >>>>> serialization requirement is satisfied by holding a spin >>>>> lock (rtas_lock) across most RTAS function invocations. >>>> ... >>>>> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c >>>>> index 1fc0b3fffdd1..52f2242d0c28 100644 >>>>> --- a/arch/powerpc/kernel/rtas.c >>>>> +++ b/arch/powerpc/kernel/rtas.c >>>>> @@ -581,6 +652,28 @@ static const struct rtas_function *rtas_token_to_function(s32 token) >>>>> return NULL; >>>>> } >>>>> >>>>> +static void __rtas_function_lock(struct rtas_function *func) >>>>> +{ >>>>> + if (func && func->lock) >>>>> + mutex_lock(func->lock); >>>>> +} >>>> >>>> This is obviously going to defeat most static analysis tools. >>> >>> I guess it's not that obvious to me :-) Is it because the mutex_lock() >>> is conditional? I'll improve this if it's possible. >> >> Well maybe I'm not giving modern static analysis tools enough credit :) >> >> But what I mean that it's not easy to reason about what the function >> does in isolation. ie. all you can say is that it may or may not lock a >> mutex, and you can't say which mutex. > > I've pulled the thread on this a little bit and here is what I can do: > > * Discard rtas_lock_function() and rtas_unlock_function() and make the > function mutexes extern as needed. As of now only > rtas_ibm_get_vpd_lock will need to be exposed. This enables us to put > __acquires(), __releases(), and __must_hold() annotations in > papr-vpd.c since it explicitly manipulates the mutex. > > * Then sys_rtas() becomes the only site that needs > __rtas_function_lock() and __rtas_function_unlock(), which can be > open-coded and commented (and, one hopes, not emulated elsewhere). > > This will look something like: > > SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs) > { > struct rtas_function *func = rtas_token_to_function(token); > > if (func->lock) > mutex_lock(func->lock); > > [ ... acquire rtas_lock, enter RTAS, fetch any errors ... ] > > if (func->lock) > mutex_unlock(func->lock); > > The indirection seems unavoidable since we're working backwards from a > token value (supplied by the user and not known at build time) to the > function descriptor. > > Is that tolerable for now? Yeah. Thanks for looking into it. I wasn't unhappy with the original version, but just slightly uneasy about the locking via pointer. But that new proposal sounds good, more code will have static lock annotations, and only sys_rtas() which is already weird, will have the dynamic stuff. > Alternatively, sys_rtas() could be refactored into locking and > non-locking paths, e.g. > > static long __do_sys_rtas(struct rtas_function *func) > { > // [ ... acquire rtas_lock, enter RTAS, fetch any error etc ... ] > } > > static long do_sys_rtas(struct rtas_function *func, struct mutex *mtx) > { > mutex_lock(mtx); > ret = __do_sys_rtas(func); > mutex_unlock(mtx); > > return ret; > } > > SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs) > { > // real code does copy_from_user etc > struct rtas_function *func = rtas_token_to_function(uargs->token); > long ret; > > // [ ... input validation and filtering ... ] > > if (func->lock) > ret = do_sys_rtas(func, func->lock); > else > ret = __do_sys_rtas(func); > > // [ ... copy out results ... ] > > return ret; > } You could go even further and switch on the token, and handle each case separately so that you can then statically take the appropriate lock. But that's probably overkill. cheers
Michael Ellerman <mpe@ellerman.id.au> writes: > Nathan Lynch <nathanl@linux.ibm.com> writes: >> Michael Ellerman <mpe@ellerman.id.au> writes: >>> Nathan Lynch <nathanl@linux.ibm.com> writes: >>>> Michael Ellerman <mpe@ellerman.id.au> writes: >>>>> Nathan Lynch via B4 Relay <devnull+nathanl.linux.ibm.com@kernel.org> >>>>> writes: >>>>>> From: Nathan Lynch <nathanl@linux.ibm.com> >>>>>> >>>>>> On RTAS platforms there is a general restriction that the OS must not >>>>>> enter RTAS on more than one CPU at a time. This low-level >>>>>> serialization requirement is satisfied by holding a spin >>>>>> lock (rtas_lock) across most RTAS function invocations. >>>>> ... >>>>>> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c >>>>>> index 1fc0b3fffdd1..52f2242d0c28 100644 >>>>>> --- a/arch/powerpc/kernel/rtas.c >>>>>> +++ b/arch/powerpc/kernel/rtas.c >>>>>> @@ -581,6 +652,28 @@ static const struct rtas_function *rtas_token_to_function(s32 token) >>>>>> return NULL; >>>>>> } >>>>>> >>>>>> +static void __rtas_function_lock(struct rtas_function *func) >>>>>> +{ >>>>>> + if (func && func->lock) >>>>>> + mutex_lock(func->lock); >>>>>> +} >>>>> >>>>> This is obviously going to defeat most static analysis tools. >>>> >>>> I guess it's not that obvious to me :-) Is it because the mutex_lock() >>>> is conditional? I'll improve this if it's possible. >>> >>> Well maybe I'm not giving modern static analysis tools enough credit :) >>> >>> But what I mean that it's not easy to reason about what the function >>> does in isolation. ie. all you can say is that it may or may not lock a >>> mutex, and you can't say which mutex. >> >> I've pulled the thread on this a little bit and here is what I can do: >> >> * Discard rtas_lock_function() and rtas_unlock_function() and make the >> function mutexes extern as needed. As of now only >> rtas_ibm_get_vpd_lock will need to be exposed. This enables us to put >> __acquires(), __releases(), and __must_hold() annotations in >> papr-vpd.c since it explicitly manipulates the mutex. >> >> * Then sys_rtas() becomes the only site that needs >> __rtas_function_lock() and __rtas_function_unlock(), which can be >> open-coded and commented (and, one hopes, not emulated elsewhere). >> >> This will look something like: >> >> SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs) >> { >> struct rtas_function *func = rtas_token_to_function(token); >> >> if (func->lock) >> mutex_lock(func->lock); >> >> [ ... acquire rtas_lock, enter RTAS, fetch any errors ... ] >> >> if (func->lock) >> mutex_unlock(func->lock); >> >> The indirection seems unavoidable since we're working backwards from a >> token value (supplied by the user and not known at build time) to the >> function descriptor. >> >> Is that tolerable for now? > > Yeah. Thanks for looking into it. > > I wasn't unhappy with the original version, but just slightly uneasy > about the locking via pointer. > > But that new proposal sounds good, more code will have static lock > annotations, and only sys_rtas() which is already weird, will have the > dynamic stuff. OK, I'll work that up then.
Nathan Lynch <nathanl@linux.ibm.com> writes: > Michael Ellerman <mpe@ellerman.id.au> writes: >> Nathan Lynch <nathanl@linux.ibm.com> writes: >>> Michael Ellerman <mpe@ellerman.id.au> writes: >>>> Nathan Lynch <nathanl@linux.ibm.com> writes: >>>>> Michael Ellerman <mpe@ellerman.id.au> writes: >>>>>> Nathan Lynch via B4 Relay <devnull+nathanl.linux.ibm.com@kernel.org> >>>>>> writes: >>>>>>> From: Nathan Lynch <nathanl@linux.ibm.com> >>>>>>> >>>>>>> On RTAS platforms there is a general restriction that the OS must not >>>>>>> enter RTAS on more than one CPU at a time. This low-level >>>>>>> serialization requirement is satisfied by holding a spin >>>>>>> lock (rtas_lock) across most RTAS function invocations. >>>>>> ... >>>>>>> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c >>>>>>> index 1fc0b3fffdd1..52f2242d0c28 100644 >>>>>>> --- a/arch/powerpc/kernel/rtas.c >>>>>>> +++ b/arch/powerpc/kernel/rtas.c >>>>>>> @@ -581,6 +652,28 @@ static const struct rtas_function *rtas_token_to_function(s32 token) >>>>>>> return NULL; >>>>>>> } >>>>>>> >>>>>>> +static void __rtas_function_lock(struct rtas_function *func) >>>>>>> +{ >>>>>>> + if (func && func->lock) >>>>>>> + mutex_lock(func->lock); >>>>>>> +} >>>>>> >>>>>> This is obviously going to defeat most static analysis tools. >>>>> >>>>> I guess it's not that obvious to me :-) Is it because the mutex_lock() >>>>> is conditional? I'll improve this if it's possible. >>>> >>>> Well maybe I'm not giving modern static analysis tools enough credit :) >>>> >>>> But what I mean that it's not easy to reason about what the function >>>> does in isolation. ie. all you can say is that it may or may not lock a >>>> mutex, and you can't say which mutex. >>> >>> I've pulled the thread on this a little bit and here is what I can do: >>> >>> * Discard rtas_lock_function() and rtas_unlock_function() and make the >>> function mutexes extern as needed. As of now only >>> rtas_ibm_get_vpd_lock will need to be exposed. This enables us to put >>> __acquires(), __releases(), and __must_hold() annotations in >>> papr-vpd.c since it explicitly manipulates the mutex. >>> >>> * Then sys_rtas() becomes the only site that needs >>> __rtas_function_lock() and __rtas_function_unlock(), which can be >>> open-coded and commented (and, one hopes, not emulated elsewhere). >>> >>> This will look something like: >>> >>> SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs) >>> { >>> struct rtas_function *func = rtas_token_to_function(token); >>> >>> if (func->lock) >>> mutex_lock(func->lock); >>> >>> [ ... acquire rtas_lock, enter RTAS, fetch any errors ... ] >>> >>> if (func->lock) >>> mutex_unlock(func->lock); >>> >>> The indirection seems unavoidable since we're working backwards from a >>> token value (supplied by the user and not known at build time) to the >>> function descriptor. >>> >>> Is that tolerable for now? >> >> Yeah. Thanks for looking into it. >> >> I wasn't unhappy with the original version, but just slightly uneasy >> about the locking via pointer. >> >> But that new proposal sounds good, more code will have static lock >> annotations, and only sys_rtas() which is already weird, will have the >> dynamic stuff. > > OK, I'll work that up then. Well, apparently the annotations aren't useful with mutexes; see these threads: https://lore.kernel.org/all/8e8d93ee2125c739caabe5986f40fa2156c8b4ce.1579893447.git.jbi.octave@gmail.com/ https://lore.kernel.org/all/20200601184552.23128-5-jbi.octave@gmail.com/ And indeed I can't get sparse to accept them when added to the papr-vpd code: $ make C=2 arch/powerpc/platforms/pseries/papr-vpd.o CHECK scripts/mod/empty.c CALL scripts/checksyscalls.sh CHECK arch/powerpc/platforms/pseries/papr-vpd.c arch/powerpc/platforms/pseries/papr-vpd.c:235:13: warning: context imbalance in 'vpd_sequence_begin' - wrong count at exit arch/powerpc/platforms/pseries/papr-vpd.c:269:13: warning: context imbalance in 'vpd_sequence_end' - wrong count at exit I don't think it's my own mistake since I see existing code with the same problem, such as net/core/sock.c: static void *proto_seq_start(struct seq_file *seq, loff_t *pos) __acquires(proto_list_mutex) { mutex_lock(&proto_list_mutex); return seq_list_start_head(&proto_list, *pos); } static void proto_seq_stop(struct seq_file *seq, void *v) __releases(proto_list_mutex) { mutex_unlock(&proto_list_mutex); } which yields: net/core/sock.c:4018:13: warning: context imbalance in 'proto_seq_start' - wrong count at exit net/core/sock.c:4030:13: warning: context imbalance in 'proto_seq_stop' - wrong count at exit So I'll give up on static annotations for this series and look for opportunities to add lockdep_assert_held() etc.
Nathan Lynch <nathanl@linux.ibm.com> writes: > Nathan Lynch <nathanl@linux.ibm.com> writes: >> Michael Ellerman <mpe@ellerman.id.au> writes: >>> Nathan Lynch <nathanl@linux.ibm.com> writes: >>>> Michael Ellerman <mpe@ellerman.id.au> writes: >>>>> Nathan Lynch <nathanl@linux.ibm.com> writes: >>>>>> Michael Ellerman <mpe@ellerman.id.au> writes: >>>>>>> Nathan Lynch via B4 Relay <devnull+nathanl.linux.ibm.com@kernel.org> >>>>>>> writes: >>>>>>>> From: Nathan Lynch <nathanl@linux.ibm.com> >>>>>>>> >>>>>>>> On RTAS platforms there is a general restriction that the OS must not >>>>>>>> enter RTAS on more than one CPU at a time. This low-level >>>>>>>> serialization requirement is satisfied by holding a spin >>>>>>>> lock (rtas_lock) across most RTAS function invocations. >>>>>>> ... >>>>>>>> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c >>>>>>>> index 1fc0b3fffdd1..52f2242d0c28 100644 >>>>>>>> --- a/arch/powerpc/kernel/rtas.c >>>>>>>> +++ b/arch/powerpc/kernel/rtas.c >>>>>>>> @@ -581,6 +652,28 @@ static const struct rtas_function *rtas_token_to_function(s32 token) >>>>>>>> return NULL; >>>>>>>> } >>>>>>>> >>>>>>>> +static void __rtas_function_lock(struct rtas_function *func) >>>>>>>> +{ >>>>>>>> + if (func && func->lock) >>>>>>>> + mutex_lock(func->lock); >>>>>>>> +} >>>>>>> >>>>>>> This is obviously going to defeat most static analysis tools. >>>>>> >>>>>> I guess it's not that obvious to me :-) Is it because the mutex_lock() >>>>>> is conditional? I'll improve this if it's possible. >>>>> >>>>> Well maybe I'm not giving modern static analysis tools enough credit :) >>>>> >>>>> But what I mean that it's not easy to reason about what the function >>>>> does in isolation. ie. all you can say is that it may or may not lock a >>>>> mutex, and you can't say which mutex. >>>> >>>> I've pulled the thread on this a little bit and here is what I can do: >>>> >>>> * Discard rtas_lock_function() and rtas_unlock_function() and make the >>>> function mutexes extern as needed. As of now only >>>> rtas_ibm_get_vpd_lock will need to be exposed. This enables us to put >>>> __acquires(), __releases(), and __must_hold() annotations in >>>> papr-vpd.c since it explicitly manipulates the mutex. >>>> >>>> * Then sys_rtas() becomes the only site that needs >>>> __rtas_function_lock() and __rtas_function_unlock(), which can be >>>> open-coded and commented (and, one hopes, not emulated elsewhere). >>>> >>>> This will look something like: >>>> >>>> SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs) >>>> { >>>> struct rtas_function *func = rtas_token_to_function(token); >>>> >>>> if (func->lock) >>>> mutex_lock(func->lock); >>>> >>>> [ ... acquire rtas_lock, enter RTAS, fetch any errors ... ] >>>> >>>> if (func->lock) >>>> mutex_unlock(func->lock); >>>> >>>> The indirection seems unavoidable since we're working backwards from a >>>> token value (supplied by the user and not known at build time) to the >>>> function descriptor. >>>> >>>> Is that tolerable for now? >>> >>> Yeah. Thanks for looking into it. >>> >>> I wasn't unhappy with the original version, but just slightly uneasy >>> about the locking via pointer. >>> >>> But that new proposal sounds good, more code will have static lock >>> annotations, and only sys_rtas() which is already weird, will have the >>> dynamic stuff. >> >> OK, I'll work that up then. > > Well, apparently the annotations aren't useful with mutexes; see these > threads: > > https://lore.kernel.org/all/8e8d93ee2125c739caabe5986f40fa2156c8b4ce.1579893447.git.jbi.octave@gmail.com/ > https://lore.kernel.org/all/20200601184552.23128-5-jbi.octave@gmail.com/ > > And indeed I can't get sparse to accept them when added to the papr-vpd > code: > > $ make C=2 arch/powerpc/platforms/pseries/papr-vpd.o > CHECK scripts/mod/empty.c > CALL scripts/checksyscalls.sh > CHECK arch/powerpc/platforms/pseries/papr-vpd.c > arch/powerpc/platforms/pseries/papr-vpd.c:235:13: warning: context > imbalance in 'vpd_sequence_begin' - wrong count at exit > arch/powerpc/platforms/pseries/papr-vpd.c:269:13: warning: context > imbalance in 'vpd_sequence_end' - wrong count at exit Oof. Sorry to send you on that goose chase. > I don't think it's my own mistake since I see existing code with the > same problem, such as net/core/sock.c: > > static void *proto_seq_start(struct seq_file *seq, loff_t *pos) > __acquires(proto_list_mutex) > { > mutex_lock(&proto_list_mutex); > return seq_list_start_head(&proto_list, *pos); > } > > static void proto_seq_stop(struct seq_file *seq, void *v) > __releases(proto_list_mutex) > { > mutex_unlock(&proto_list_mutex); > } > > which yields: > > net/core/sock.c:4018:13: warning: context imbalance in 'proto_seq_start' > - wrong count at exit > net/core/sock.c:4030:13: warning: context imbalance in 'proto_seq_stop' > - wrong count at exit > > So I'll give up on static annotations for this series and look for > opportunities to add lockdep_assert_held() etc. OK. There are other static analysers than sparse, and even for lockdep having more of the locks taken statically is probably still preferable. But feel free to ignore me and just repost what you had :) cheers
diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h index b73010583a8d..9a20caba6858 100644 --- a/arch/powerpc/include/asm/rtas.h +++ b/arch/powerpc/include/asm/rtas.h @@ -421,6 +421,8 @@ static inline bool rtas_function_implemented(const rtas_fn_handle_t handle) { return rtas_function_token(handle) != RTAS_UNKNOWN_SERVICE; } +void rtas_function_lock(rtas_fn_handle_t handle); +void rtas_function_unlock(rtas_fn_handle_t handle); extern int rtas_token(const char *service); extern int rtas_service_present(const char *service); extern int rtas_call(int token, int, int, int *, ...); diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c index 1fc0b3fffdd1..52f2242d0c28 100644 --- a/arch/powerpc/kernel/rtas.c +++ b/arch/powerpc/kernel/rtas.c @@ -18,6 +18,7 @@ #include <linux/kernel.h> #include <linux/lockdep.h> #include <linux/memblock.h> +#include <linux/mutex.h> #include <linux/of.h> #include <linux/of_fdt.h> #include <linux/reboot.h> @@ -70,14 +71,34 @@ struct rtas_filter { * ppc64le, and we want to keep it that way. It does * not make sense for this to be set when @filter * is NULL. + * @lock: Pointer to an optional dedicated per-function mutex. This + * should be set for functions that require multiple calls in + * sequence to complete a single operation, and such sequences + * will disrupt each other if allowed to interleave. Users of + * this function are required to hold the associated lock for + * the duration of the call sequence. Add an explanatory + * comment to the function table entry if setting this member. */ struct rtas_function { s32 token; const bool banned_for_syscall_on_le:1; const char * const name; const struct rtas_filter *filter; + struct mutex *lock; }; +/* + * Per-function locks. Access these through the + * rtas_function_lock/unlock APIs, not directly. + */ +static DEFINE_MUTEX(rtas_ibm_activate_firmware_lock); +static DEFINE_MUTEX(rtas_ibm_get_dynamic_sensor_state_lock); +static DEFINE_MUTEX(rtas_ibm_get_indices_lock); +static DEFINE_MUTEX(rtas_ibm_get_vpd_lock); +static DEFINE_MUTEX(rtas_ibm_lpar_perftools_lock); +static DEFINE_MUTEX(rtas_ibm_physical_attestation_lock); +static DEFINE_MUTEX(rtas_ibm_set_dynamic_indicator_lock); + static struct rtas_function rtas_function_table[] __ro_after_init = { [RTAS_FNIDX__CHECK_EXCEPTION] = { .name = "check-exception", @@ -125,6 +146,13 @@ static struct rtas_function rtas_function_table[] __ro_after_init = { .buf_idx1 = -1, .size_idx1 = -1, .buf_idx2 = -1, .size_idx2 = -1, }, + /* + * PAPR doesn't explicitly impose any restriction, but + * this typically requires multiple calls before + * success, and there's no reason to allow sequences + * to interleave. + */ + .lock = &rtas_ibm_activate_firmware_lock, }, [RTAS_FNIDX__IBM_CBE_START_PTCAL] = { .name = "ibm,cbe-start-ptcal", @@ -196,6 +224,12 @@ static struct rtas_function rtas_function_table[] __ro_after_init = { .buf_idx1 = 1, .size_idx1 = -1, .buf_idx2 = -1, .size_idx2 = -1, }, + /* + * PAPR+ R1–7.3.19–3 is explicit that the OS must not + * call ibm,get-dynamic-sensor-state with different + * inputs until a non-retry status has been returned. + */ + .lock = &rtas_ibm_get_dynamic_sensor_state_lock, }, [RTAS_FNIDX__IBM_GET_INDICES] = { .name = "ibm,get-indices", @@ -203,6 +237,12 @@ static struct rtas_function rtas_function_table[] __ro_after_init = { .buf_idx1 = 2, .size_idx1 = 3, .buf_idx2 = -1, .size_idx2 = -1, }, + /* + * PAPR+ R1–7.3.17–2 says that the OS must not + * interleave ibm,get-indices call sequences with + * different inputs. + */ + .lock = &rtas_ibm_get_indices_lock, }, [RTAS_FNIDX__IBM_GET_RIO_TOPOLOGY] = { .name = "ibm,get-rio-topology", @@ -220,6 +260,11 @@ static struct rtas_function rtas_function_table[] __ro_after_init = { .buf_idx1 = 0, .size_idx1 = -1, .buf_idx2 = 1, .size_idx2 = 2, }, + /* + * PAPR+ R1–7.3.20–4 indicates that sequences + * should not be allowed to interleave. + */ + .lock = &rtas_ibm_get_vpd_lock, }, [RTAS_FNIDX__IBM_GET_XIVE] = { .name = "ibm,get-xive", @@ -239,6 +284,11 @@ static struct rtas_function rtas_function_table[] __ro_after_init = { .buf_idx1 = 2, .size_idx1 = 3, .buf_idx2 = -1, .size_idx2 = -1, }, + /* + * PAPR+ R1–7.3.26–6 says the OS should allow only one + * call sequence in progress at a time. + */ + .lock = &rtas_ibm_lpar_perftools_lock, }, [RTAS_FNIDX__IBM_MANAGE_FLASH_IMAGE] = { .name = "ibm,manage-flash-image", @@ -277,6 +327,14 @@ static struct rtas_function rtas_function_table[] __ro_after_init = { .buf_idx1 = 0, .size_idx1 = 1, .buf_idx2 = -1, .size_idx2 = -1, }, + /* + * This follows a sequence-based pattern similar to + * ibm,get-vpd et al. Since PAPR+ restricts + * interleaving call sequences for other functions of + * this style, assume the restriction applies here, + * even though it's not explicit in the spec. + */ + .lock = &rtas_ibm_physical_attestation_lock, }, [RTAS_FNIDX__IBM_PLATFORM_DUMP] = { .name = "ibm,platform-dump", @@ -284,6 +342,13 @@ static struct rtas_function rtas_function_table[] __ro_after_init = { .buf_idx1 = 4, .size_idx1 = 5, .buf_idx2 = -1, .size_idx2 = -1, }, + /* + * PAPR+ 7.3.3.4.1 indicates that concurrent sequences + * of ibm,platform-dump are allowed if they are + * operating on different dump tags. So leave + * the lock pointer unset for now. This may need + * reconsideration if kernel-internal users appear. + */ }, [RTAS_FNIDX__IBM_POWER_OFF_UPS] = { .name = "ibm,power-off-ups", @@ -326,6 +391,12 @@ static struct rtas_function rtas_function_table[] __ro_after_init = { .buf_idx1 = 2, .size_idx1 = -1, .buf_idx2 = -1, .size_idx2 = -1, }, + /* + * PAPR+ R1–7.3.18–3 says the OS must not call this + * function with different inputs until a non-retry + * status has been returned. + */ + .lock = &rtas_ibm_set_dynamic_indicator_lock, }, [RTAS_FNIDX__IBM_SET_EEH_OPTION] = { .name = "ibm,set-eeh-option", @@ -556,9 +627,9 @@ static int __init rtas_token_to_function_xarray_init(void) } arch_initcall(rtas_token_to_function_xarray_init); -static const struct rtas_function *rtas_token_to_function(s32 token) +static struct rtas_function *rtas_token_to_function(s32 token) { - const struct rtas_function *func; + struct rtas_function *func; if (WARN_ONCE(token < 0, "invalid token %d", token)) return NULL; @@ -581,6 +652,28 @@ static const struct rtas_function *rtas_token_to_function(s32 token) return NULL; } +static void __rtas_function_lock(struct rtas_function *func) +{ + if (func && func->lock) + mutex_lock(func->lock); +} + +static void __rtas_function_unlock(struct rtas_function *func) +{ + if (func && func->lock) + mutex_unlock(func->lock); +} + +void rtas_function_lock(const rtas_fn_handle_t handle) +{ + __rtas_function_lock(rtas_function_lookup(handle)); +} + +void rtas_function_unlock(const rtas_fn_handle_t handle) +{ + __rtas_function_unlock(rtas_function_lookup(handle)); +} + /* This is here deliberately so it's only used in this file */ void enter_rtas(unsigned long); @@ -1885,6 +1978,8 @@ SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs) buff_copy = get_errorlog_buffer(); + __rtas_function_lock(rtas_token_to_function(token)); + raw_spin_lock_irqsave(&rtas_lock, flags); cookie = lockdep_pin_lock(&rtas_lock); @@ -1900,6 +1995,8 @@ SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs) lockdep_unpin_lock(&rtas_lock, cookie); raw_spin_unlock_irqrestore(&rtas_lock, flags); + __rtas_function_unlock(rtas_token_to_function(token)); + if (buff_copy) { if (errbuf) log_error(errbuf, ERR_TYPE_RTAS_LOG, 0);