Message ID | 20211203154321.13168-1-ldufour@linux.ibm.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | [v3] powerpc/pseries: read the lpar name from the firmware | expand |
Context | Check | Description |
---|---|---|
snowpatch_ozlabs/github-powerpc_selftests | success | Successfully ran 8 jobs. |
snowpatch_ozlabs/github-powerpc_ppctests | success | Successfully ran 8 jobs. |
snowpatch_ozlabs/github-powerpc_sparse | success | Successfully ran 4 jobs. |
snowpatch_ozlabs/github-powerpc_kernel_qemu | success | Successfully ran 24 jobs. |
snowpatch_ozlabs/github-powerpc_clang | success | Successfully ran 7 jobs. |
Hi Laurent, Laurent Dufour <ldufour@linux.ibm.com> writes: > +/* > + * PAPR defines, in section "7.3.16 System Parameters Option", the token 55 to > + * read the LPAR name. > + */ > +#define SPLPAR_LPAR_NAME_TOKEN 55 > +static void read_lpar_name(struct seq_file *m) > +{ > + int rc, len, token; > + union { > + char raw_buffer[RTAS_DATA_BUF_SIZE]; > + struct { > + __be16 len; This: > + char name[RTAS_DATA_BUF_SIZE-2]; ^^^^^^ should be 4000, not (4K - 2), according to PAPR (it's weird and I don't know the reason). > + }; > + } *local_buffer; > + > + token = rtas_token("ibm,get-system-parameter"); > + if (token == RTAS_UNKNOWN_SERVICE) > + return; > + > + local_buffer = kmalloc(sizeof(*local_buffer), GFP_KERNEL); > + if (!local_buffer) > + return; > + > + do { > + spin_lock(&rtas_data_buf_lock); > + memset(rtas_data_buf, 0, RTAS_DATA_BUF_SIZE); > + rc = rtas_call(token, 3, 1, NULL, SPLPAR_LPAR_NAME_TOKEN, > + __pa(rtas_data_buf), RTAS_DATA_BUF_SIZE); > + if (!rc) > + memcpy(local_buffer->raw_buffer, rtas_data_buf, > + RTAS_DATA_BUF_SIZE); > + spin_unlock(&rtas_data_buf_lock); > + } while (rtas_busy_delay(rc)); > + > + if (rc != 0) { > + pr_err_once( > + "%s %s Error calling get-system-parameter (0x%x)\n", > + __FILE__, __func__, rc); The __FILE__ and __func__ in the message seem unnecessary, and rc should be reported in decimal so the error meaning is apparent. Is there a reasonable fallback for VMs where this parameter doesn't exist? PowerVM partitions should always have it, but what do we want the behavior to be on other hypervisors? > + } else { > + /* Force end of string */ > + len = be16_to_cpu(local_buffer->len); > + if (len >= (RTAS_DATA_BUF_SIZE-2)) > + len = RTAS_DATA_BUF_SIZE-2; Could use min() or clamp(), and it would be better to build the expression using the value of sizeof(local_buffer->name). > + local_buffer->name[len] = '\0'; If 'len' can be (RTAS_DATA_BUF_SIZE - 2), then this writes past the end of the buffer, no?
On 07/12/2021, 15:32:39, Nathan Lynch wrote: > Hi Laurent, > > Laurent Dufour <ldufour@linux.ibm.com> writes: >> +/* >> + * PAPR defines, in section "7.3.16 System Parameters Option", the token 55 to >> + * read the LPAR name. >> + */ >> +#define SPLPAR_LPAR_NAME_TOKEN 55 >> +static void read_lpar_name(struct seq_file *m) >> +{ >> + int rc, len, token; >> + union { >> + char raw_buffer[RTAS_DATA_BUF_SIZE]; >> + struct { >> + __be16 len; > > This: > >> + char name[RTAS_DATA_BUF_SIZE-2]; > ^^^^^^ > > should be 4000, not (4K - 2), according to PAPR (it's weird and I don't > know the reason). That's true, PAPR defines the largest output buffer for ibm,get-system-parameter to 4002, so we could limit this to 4002, not sure whether this would make a big difference here. Anyway I will limit that buffer size this way. > > >> + }; >> + } *local_buffer; >> + >> + token = rtas_token("ibm,get-system-parameter"); >> + if (token == RTAS_UNKNOWN_SERVICE) >> + return; >> + >> + local_buffer = kmalloc(sizeof(*local_buffer), GFP_KERNEL); >> + if (!local_buffer) >> + return; >> + >> + do { >> + spin_lock(&rtas_data_buf_lock); >> + memset(rtas_data_buf, 0, RTAS_DATA_BUF_SIZE); >> + rc = rtas_call(token, 3, 1, NULL, SPLPAR_LPAR_NAME_TOKEN, >> + __pa(rtas_data_buf), RTAS_DATA_BUF_SIZE); >> + if (!rc) >> + memcpy(local_buffer->raw_buffer, rtas_data_buf, >> + RTAS_DATA_BUF_SIZE); >> + spin_unlock(&rtas_data_buf_lock); >> + } while (rtas_busy_delay(rc)); >> + >> + if (rc != 0) { >> + pr_err_once( >> + "%s %s Error calling get-system-parameter (0x%x)\n", >> + __FILE__, __func__, rc); > > The __FILE__ and __func__ in the message seem unnecessary, and rc should > be reported in decimal so the error meaning is apparent. Fair enough. > Is there a reasonable fallback for VMs where this parameter doesn't > exist? PowerVM partitions should always have it, but what do we want the > behavior to be on other hypervisors? In that case, there is no value displayed in the /proc/powerpc/lparcfg and the lparstat -i command will fall back to the device tree value. I can't see any valid reason to report the value defined in the device tree here. > > >> + } else { >> + /* Force end of string */ >> + len = be16_to_cpu(local_buffer->len); >> + if (len >= (RTAS_DATA_BUF_SIZE-2)) >> + len = RTAS_DATA_BUF_SIZE-2; > > Could use min() or clamp(), and it would be better to build the > expression using the value of sizeof(local_buffer->name). Fair enough. > >> + local_buffer->name[len] = '\0'; > > If 'len' can be (RTAS_DATA_BUF_SIZE - 2), then this writes past the end > of the buffer, no? Oh yes, the previous test should be if (len >= (RTAS_DATA_BUF_SIZE-2)) len = RTAS_DATA_BUF_SIZE-3; Thanks, Laurent.
Laurent Dufour <ldufour@linux.ibm.com> writes: > On 07/12/2021, 15:32:39, Nathan Lynch wrote: >> Is there a reasonable fallback for VMs where this parameter doesn't >> exist? PowerVM partitions should always have it, but what do we want the >> behavior to be on other hypervisors? > > In that case, there is no value displayed in the /proc/powerpc/lparcfg and > the lparstat -i command will fall back to the device tree value. I can't > see any valid reason to report the value defined in the device tree > here. Here's a valid reason :-) lparstat isn't the only possible consumer of the interface, and the 'ibm,partition-name' property and the dynamic system parameter clearly serve a common purpose. 'ibm,partition-name' is provided by qemu. In any case, the function should not print an error when the return value is -3 (parameter not supported).
On 07/12/2021, 18:07:50, Nathan Lynch wrote: > Laurent Dufour <ldufour@linux.ibm.com> writes: >> On 07/12/2021, 15:32:39, Nathan Lynch wrote: >>> Is there a reasonable fallback for VMs where this parameter doesn't >>> exist? PowerVM partitions should always have it, but what do we want the >>> behavior to be on other hypervisors? >> >> In that case, there is no value displayed in the /proc/powerpc/lparcfg and >> the lparstat -i command will fall back to the device tree value. I can't >> see any valid reason to report the value defined in the device tree >> here. > > Here's a valid reason :-) > > lparstat isn't the only possible consumer of the interface, and the > 'ibm,partition-name' property and the dynamic system parameter clearly > serve a common purpose. 'ibm,partition-name' is provided by qemu. If the hypervisor is not providing this value, this is not the goal of this interface to fetch it from the device tree. Any consumer should be able to fall back on the device tree value, and there is no added value to do such a trick in the kernel when it can be done in the user space. > In any case, the function should not print an error when the return > value is -3 (parameter not supported). That's a valid requirement.
On 07/12/2021, 18:18:40, Laurent Dufour wrote: > On 07/12/2021, 18:07:50, Nathan Lynch wrote: >> Laurent Dufour <ldufour@linux.ibm.com> writes: >>> On 07/12/2021, 15:32:39, Nathan Lynch wrote: >>>> Is there a reasonable fallback for VMs where this parameter doesn't >>>> exist? PowerVM partitions should always have it, but what do we want the >>>> behavior to be on other hypervisors? >>> >>> In that case, there is no value displayed in the /proc/powerpc/lparcfg and >>> the lparstat -i command will fall back to the device tree value. I can't >>> see any valid reason to report the value defined in the device tree >>> here. >> >> Here's a valid reason :-) >> >> lparstat isn't the only possible consumer of the interface, and the >> 'ibm,partition-name' property and the dynamic system parameter clearly >> serve a common purpose. 'ibm,partition-name' is provided by qemu. > > If the hypervisor is not providing this value, this is not the goal of this > interface to fetch it from the device tree. > > Any consumer should be able to fall back on the device tree value, and > there is no added value to do such a trick in the kernel when it can be > done in the user space. > >> In any case, the function should not print an error when the return >> value is -3 (parameter not supported). > > That's a valid requirement. I sent a v4 which is printing an error message even if the parameter is not supported by the hypervisor. This is unlikely and since this a call to pr_err_once(), it would be printed only once, so not really annoying. I don't think a v5 is required for such a minor message.
Laurent Dufour <ldufour@linux.ibm.com> writes: > On 07/12/2021, 18:07:50, Nathan Lynch wrote: >> Laurent Dufour <ldufour@linux.ibm.com> writes: >>> On 07/12/2021, 15:32:39, Nathan Lynch wrote: >>>> Is there a reasonable fallback for VMs where this parameter doesn't >>>> exist? PowerVM partitions should always have it, but what do we want the >>>> behavior to be on other hypervisors? >>> >>> In that case, there is no value displayed in the /proc/powerpc/lparcfg and >>> the lparstat -i command will fall back to the device tree value. I can't >>> see any valid reason to report the value defined in the device tree >>> here. >> >> Here's a valid reason :-) >> >> lparstat isn't the only possible consumer of the interface, and the >> 'ibm,partition-name' property and the dynamic system parameter clearly >> serve a common purpose. 'ibm,partition-name' is provided by qemu. > > If the hypervisor is not providing this value, this is not the goal of this > interface to fetch it from the device tree. > > Any consumer should be able to fall back on the device tree value, and > there is no added value to do such a trick in the kernel when it can be > done in the user space. There is value in imposing a level of abstraction so that the semantics are: * Report the name assigned to the guest by the hosting environment, if available as opposed to * Return the string returned by a RTAS call to ibm,get-system-parameter with token 55, if implemented The benefit is that consumers of lparcfg do not have to be coded with the knowledge that "if a partition_name= line is absent, the ibm,get-system-parameter RTAS call must have failed, so now I should read /sys/firmware/devicetree/base/ibm,partition_name." That's the sort of esoterica that is appropriate for the kernel to encapsulate. And I'd say the effort involved (falling back to a root node property lookup) is proportional to the benefit.
On 08/12/2021, 16:21:29, Nathan Lynch wrote: > Laurent Dufour <ldufour@linux.ibm.com> writes: >> On 07/12/2021, 18:07:50, Nathan Lynch wrote: >>> Laurent Dufour <ldufour@linux.ibm.com> writes: >>>> On 07/12/2021, 15:32:39, Nathan Lynch wrote: >>>>> Is there a reasonable fallback for VMs where this parameter doesn't >>>>> exist? PowerVM partitions should always have it, but what do we want the >>>>> behavior to be on other hypervisors? >>>> >>>> In that case, there is no value displayed in the /proc/powerpc/lparcfg and >>>> the lparstat -i command will fall back to the device tree value. I can't >>>> see any valid reason to report the value defined in the device tree >>>> here. >>> >>> Here's a valid reason :-) >>> >>> lparstat isn't the only possible consumer of the interface, and the >>> 'ibm,partition-name' property and the dynamic system parameter clearly >>> serve a common purpose. 'ibm,partition-name' is provided by qemu. >> >> If the hypervisor is not providing this value, this is not the goal of this >> interface to fetch it from the device tree. >> >> Any consumer should be able to fall back on the device tree value, and >> there is no added value to do such a trick in the kernel when it can be >> done in the user space. > > There is value in imposing a level of abstraction so that the semantics > are: > > * Report the name assigned to the guest by the hosting environment, if > available > > as opposed to > > * Return the string returned by a RTAS call to ibm,get-system-parameter > with token 55, if implemented > > The benefit is that consumers of lparcfg do not have to be coded with > the knowledge that "if a partition_name= line is absent, the > ibm,get-system-parameter RTAS call must have failed, so now I should > read /sys/firmware/devicetree/base/ibm,partition_name." That's the sort > of esoterica that is appropriate for the kernel to encapsulate. > > And I'd say the effort involved (falling back to a root node property > lookup) is proportional to the benefit. > I don't agree. From the kernel point of view, I can't see any benefit, this is adding more complexity to do in the kernel what can be done easily in user space. This is typically what should be implemented in a user space shared library.
Laurent Dufour <ldufour@linux.ibm.com> writes: > On 08/12/2021, 16:21:29, Nathan Lynch wrote: >> Laurent Dufour <ldufour@linux.ibm.com> writes: >>> On 07/12/2021, 18:07:50, Nathan Lynch wrote: >>>> Laurent Dufour <ldufour@linux.ibm.com> writes: >>>>> On 07/12/2021, 15:32:39, Nathan Lynch wrote: >>>>>> Is there a reasonable fallback for VMs where this parameter doesn't >>>>>> exist? PowerVM partitions should always have it, but what do we want the >>>>>> behavior to be on other hypervisors? >>>>> >>>>> In that case, there is no value displayed in the /proc/powerpc/lparcfg and >>>>> the lparstat -i command will fall back to the device tree value. I can't >>>>> see any valid reason to report the value defined in the device tree >>>>> here. >>>> >>>> Here's a valid reason :-) >>>> >>>> lparstat isn't the only possible consumer of the interface, and the >>>> 'ibm,partition-name' property and the dynamic system parameter clearly >>>> serve a common purpose. 'ibm,partition-name' is provided by qemu. >>> >>> If the hypervisor is not providing this value, this is not the goal of this >>> interface to fetch it from the device tree. >>> >>> Any consumer should be able to fall back on the device tree value, and >>> there is no added value to do such a trick in the kernel when it can be >>> done in the user space. >> >> There is value in imposing a level of abstraction so that the semantics >> are: >> >> * Report the name assigned to the guest by the hosting environment, if >> available >> >> as opposed to >> >> * Return the string returned by a RTAS call to ibm,get-system-parameter >> with token 55, if implemented >> >> The benefit is that consumers of lparcfg do not have to be coded with >> the knowledge that "if a partition_name= line is absent, the >> ibm,get-system-parameter RTAS call must have failed, so now I should >> read /sys/firmware/devicetree/base/ibm,partition_name." That's the sort >> of esoterica that is appropriate for the kernel to encapsulate. >> >> And I'd say the effort involved (falling back to a root node property >> lookup) is proportional to the benefit. >> > > I don't agree. > From the kernel point of view, I can't see any benefit, this is adding more > complexity to do in the kernel what can be done easily in user space. Applying this logic, I don't see how adding this to lparcfg would be justified at all, because user space can already get at the parameter using the privileged rtas syscall. Publish it to unprivileged programs over D-Bus or something. That would minimize complexity for the kernel.
On 09/12/2021, 14:53:31, Nathan Lynch wrote: > Laurent Dufour <ldufour@linux.ibm.com> writes: >> On 08/12/2021, 16:21:29, Nathan Lynch wrote: >>> Laurent Dufour <ldufour@linux.ibm.com> writes: >>>> On 07/12/2021, 18:07:50, Nathan Lynch wrote: >>>>> Laurent Dufour <ldufour@linux.ibm.com> writes: >>>>>> On 07/12/2021, 15:32:39, Nathan Lynch wrote: >>>>>>> Is there a reasonable fallback for VMs where this parameter doesn't >>>>>>> exist? PowerVM partitions should always have it, but what do we want the >>>>>>> behavior to be on other hypervisors? >>>>>> >>>>>> In that case, there is no value displayed in the /proc/powerpc/lparcfg and >>>>>> the lparstat -i command will fall back to the device tree value. I can't >>>>>> see any valid reason to report the value defined in the device tree >>>>>> here. >>>>> >>>>> Here's a valid reason :-) >>>>> >>>>> lparstat isn't the only possible consumer of the interface, and the >>>>> 'ibm,partition-name' property and the dynamic system parameter clearly >>>>> serve a common purpose. 'ibm,partition-name' is provided by qemu. >>>> >>>> If the hypervisor is not providing this value, this is not the goal of this >>>> interface to fetch it from the device tree. >>>> >>>> Any consumer should be able to fall back on the device tree value, and >>>> there is no added value to do such a trick in the kernel when it can be >>>> done in the user space. >>> >>> There is value in imposing a level of abstraction so that the semantics >>> are: >>> >>> * Report the name assigned to the guest by the hosting environment, if >>> available >>> >>> as opposed to >>> >>> * Return the string returned by a RTAS call to ibm,get-system-parameter >>> with token 55, if implemented >>> >>> The benefit is that consumers of lparcfg do not have to be coded with >>> the knowledge that "if a partition_name= line is absent, the >>> ibm,get-system-parameter RTAS call must have failed, so now I should >>> read /sys/firmware/devicetree/base/ibm,partition_name." That's the sort >>> of esoterica that is appropriate for the kernel to encapsulate. >>> >>> And I'd say the effort involved (falling back to a root node property >>> lookup) is proportional to the benefit. >>> >> >> I don't agree. >> From the kernel point of view, I can't see any benefit, this is adding more >> complexity to do in the kernel what can be done easily in user space. > > Applying this logic, I don't see how adding this to lparcfg would be > justified at all, because user space can already get at the parameter > using the privileged rtas syscall. Publish it to unprivileged programs > over D-Bus or something. That would minimize complexity for the kernel. As you know, there is a need for unprivileged user to read that value. Adding this to lparcfg solves this issue. Also, this makes it easy to read from user space, without the need to get plumbing like D-Bus in the picture. This is simple, working fine, and user space can easily make the choice to read the DT value if needed. Please keep in mind that most of the time the hypervisor is providing that value.
diff --git a/arch/powerpc/platforms/pseries/lparcfg.c b/arch/powerpc/platforms/pseries/lparcfg.c index f71eac74ea92..5f6dbc23d7d2 100644 --- a/arch/powerpc/platforms/pseries/lparcfg.c +++ b/arch/powerpc/platforms/pseries/lparcfg.c @@ -311,6 +311,59 @@ static void parse_mpp_x_data(struct seq_file *m) seq_printf(m, "coalesce_pool_spurr=%ld\n", mpp_x_data.pool_spurr_cycles); } +/* + * PAPR defines, in section "7.3.16 System Parameters Option", the token 55 to + * read the LPAR name. + */ +#define SPLPAR_LPAR_NAME_TOKEN 55 +static void read_lpar_name(struct seq_file *m) +{ + int rc, len, token; + union { + char raw_buffer[RTAS_DATA_BUF_SIZE]; + struct { + __be16 len; + char name[RTAS_DATA_BUF_SIZE-2]; + }; + } *local_buffer; + + token = rtas_token("ibm,get-system-parameter"); + if (token == RTAS_UNKNOWN_SERVICE) + return; + + local_buffer = kmalloc(sizeof(*local_buffer), GFP_KERNEL); + if (!local_buffer) + return; + + do { + spin_lock(&rtas_data_buf_lock); + memset(rtas_data_buf, 0, RTAS_DATA_BUF_SIZE); + rc = rtas_call(token, 3, 1, NULL, SPLPAR_LPAR_NAME_TOKEN, + __pa(rtas_data_buf), RTAS_DATA_BUF_SIZE); + if (!rc) + memcpy(local_buffer->raw_buffer, rtas_data_buf, + RTAS_DATA_BUF_SIZE); + spin_unlock(&rtas_data_buf_lock); + } while (rtas_busy_delay(rc)); + + if (rc != 0) { + pr_err_once( + "%s %s Error calling get-system-parameter (0x%x)\n", + __FILE__, __func__, rc); + } else { + /* Force end of string */ + len = be16_to_cpu(local_buffer->len); + if (len >= (RTAS_DATA_BUF_SIZE-2)) + len = RTAS_DATA_BUF_SIZE-2; + local_buffer->name[len] = '\0'; + + seq_printf(m, "partition_name=%s\n", local_buffer->name); + } + + kfree(local_buffer); +} + + #define SPLPAR_CHARACTERISTICS_TOKEN 20 #define SPLPAR_MAXLENGTH 1026*(sizeof(char)) @@ -496,6 +549,7 @@ static int pseries_lparcfg_data(struct seq_file *m, void *v) if (firmware_has_feature(FW_FEATURE_SPLPAR)) { /* this call handles the ibm,get-system-parameter contents */ + read_lpar_name(m); parse_system_parameter_string(m); parse_ppp_data(m); parse_mpp_data(m);
The LPAR name may be changed after the LPAR has been started in the HMC. In that case lparstat command is not reporting the updated value because it reads it from the device tree which is read at boot time. However this value could be read from RTAS. Adding this value in the /proc/powerpc/lparcfg output allows to read the updated value. Cc: Nathan Lynch <nathanl@linux.ibm.com> Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com> --- v3: address Michael's comments. v2: address Nathan's comments. change title to partition_name aligning with existing partition_id --- arch/powerpc/platforms/pseries/lparcfg.c | 54 ++++++++++++++++++++++++ 1 file changed, 54 insertions(+)