[v7,05/15] machine: Improve the error reporting of smp parsing

Message ID 20210823122804.7692-6-wangyanan55@huawei.com
State New
Series machine: smp parsing fixes and improvement

Commit Message

wangyanan (Y) Aug. 23, 2021, 12:27 p.m. UTC
We have two requirements for a valid SMP configuration:
the product of "sockets * cores * threads" must represent all the
possible cpus, i.e., max_cpus, and must include the initially
present cpus, i.e., smp_cpus.

So we only need to ensure 1) "sockets * cores * threads == maxcpus"
first and then 2) "maxcpus >= cpus". With the sanity checks in this
order, we can simplify the error reporting code. When reporting an
error, we also print the exact value of each topology member so that
users can easily see what's going on.

Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta@ionos.com>
---
 hw/core/machine.c | 22 +++++++++-------------
 hw/i386/pc.c      | 24 ++++++++++--------------
 2 files changed, 19 insertions(+), 27 deletions(-)
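
The check order described in the commit message can be sketched as a
small stand-alone helper. This is only an illustration, not the QEMU
code: smp_check() and its buffer-based error reporting are hypothetical
stand-ins for smp_parse() and error_setg(), while the message strings
follow the ones in the patch.

```c
#include <stdio.h>

/*
 * Hypothetical stand-alone helper, not QEMU code: validates a topology
 * in the order used by the patch and writes the error text into 'msg'.
 * Returns 0 if the configuration is valid, -1 otherwise.
 * 'len' must be at least 1.
 */
static int smp_check(unsigned sockets, unsigned cores, unsigned threads,
                     unsigned cpus, unsigned maxcpus,
                     char *msg, size_t len)
{
    /* 1) The hierarchy must describe every possible CPU. */
    if (sockets * cores * threads != maxcpus) {
        snprintf(msg, len, "Invalid CPU topology: "
                 "product of the hierarchy must match maxcpus: "
                 "sockets (%u) * cores (%u) * threads (%u) "
                 "!= maxcpus (%u)",
                 sockets, cores, threads, maxcpus);
        return -1;
    }

    /* 2) The possible CPUs must cover the initially present ones. */
    if (maxcpus < cpus) {
        snprintf(msg, len, "Invalid CPU topology: "
                 "maxcpus must be equal to or greater than smp: "
                 "sockets (%u) * cores (%u) * threads (%u) "
                 "== maxcpus (%u) < smp_cpus (%u)",
                 sockets, cores, threads, maxcpus, cpus);
        return -1;
    }

    msg[0] = '\0'; /* no error */
    return 0;
}
```

Because check 1) runs first, the message for check 2) can state
"== maxcpus" unconditionally: by that point the product is known to
match. For example, sockets=2, cores=4, threads=2 with maxcpus=8 fails
the first check, while sockets=1, cores=2, threads=2 with maxcpus=4 and
cpus=8 passes it and fails the second.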

Comments

Philippe Mathieu-Daudé Aug. 23, 2021, 1:17 p.m. UTC | #1
On 8/23/21 2:27 PM, Yanan Wang wrote:
> We have two requirements for a valid SMP configuration:
> the product of "sockets * cores * threads" must represent all the
> possible cpus, i.e., max_cpus, and then must include the initially
> present cpus, i.e., smp_cpus.
> 
> So we only need to ensure 1) "sockets * cores * threads == maxcpus"
> at first and then ensure 2) "maxcpus >= cpus". With a reasonable
> order of the sanity check, we can simplify the error reporting code.
> When reporting an error message we also report the exact value of
> each topology member to make users easily see what's going on.
> 
> Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
> Reviewed-by: Andrew Jones <drjones@redhat.com>
> Reviewed-by: Pankaj Gupta <pankaj.gupta@ionos.com>
> ---
>  hw/core/machine.c | 22 +++++++++-------------
>  hw/i386/pc.c      | 24 ++++++++++--------------
>  2 files changed, 19 insertions(+), 27 deletions(-)
> 
> diff --git a/hw/core/machine.c b/hw/core/machine.c
> index 85908abc77..093c0d382d 100644
> --- a/hw/core/machine.c
> +++ b/hw/core/machine.c
> @@ -779,25 +779,21 @@ static void smp_parse(MachineState *ms, SMPConfiguration *config, Error **errp)
>      maxcpus = maxcpus > 0 ? maxcpus : sockets * cores * threads;
>      cpus = cpus > 0 ? cpus : maxcpus;
>  
> -    if (sockets * cores * threads < cpus) {
> -        error_setg(errp, "cpu topology: "
> -                   "sockets (%u) * cores (%u) * threads (%u) < "
> -                   "smp_cpus (%u)",
> -                   sockets, cores, threads, cpus);
> +    if (sockets * cores * threads != maxcpus) {
> +        error_setg(errp, "Invalid CPU topology: "
> +                   "product of the hierarchy must match maxcpus: "
> +                   "sockets (%u) * cores (%u) * threads (%u) "
> +                   "!= maxcpus (%u)",
> +                   sockets, cores, threads, maxcpus);
>          return;
>      }

Thinking about scalability, MachineClass could have a
parse_cpu_topology() handler, and this would be the
generic one, principally because architectures don't
use the same terms, and the die/socket/core/thread arrangement
is machine-specific (besides being arch-specific).
This is not a problem as of today, but the way we try to handle
this generically seems over-engineered to me.

[unrelated to this particular patch]
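
For readers following the thread, the handler idea above could look
roughly like the sketch below. Everything in it is hypothetical: the
Topology and MachineClassSketch types and the function names are made
up for illustration and do not match QEMU's real MachineClass. It only
shows a per-machine callback falling back to a generic default.

```c
/* Hypothetical types for illustration; QEMU's MachineClass differs. */
typedef struct Topology {
    unsigned sockets, dies, cores, threads, maxcpus;
} Topology;

typedef int (*ParseCpuTopologyFn)(const Topology *t);

typedef struct MachineClassSketch {
    const char *name;
    /* Optional machine-specific handler; a null pointer selects the generic one. */
    ParseCpuTopologyFn parse_cpu_topology;
} MachineClassSketch;

/* Generic handler: sockets * cores * threads must equal maxcpus. */
static int generic_parse_cpu_topology(const Topology *t)
{
    return t->sockets * t->cores * t->threads == t->maxcpus ? 0 : -1;
}

/* Machine-specific handler that also accounts for dies. */
static int pc_parse_cpu_topology(const Topology *t)
{
    return t->sockets * t->dies * t->cores * t->threads == t->maxcpus
           ? 0 : -1;
}

/* Dispatch to the machine's handler, or to the generic one if unset. */
static int machine_parse_cpu_topology(const MachineClassSketch *mc,
                                      const Topology *t)
{
    ParseCpuTopologyFn fn = mc->parse_cpu_topology
                            ? mc->parse_cpu_topology
                            : generic_parse_cpu_topology;
    return fn(t);
}
```

With a topology of sockets=2, dies=2, cores=4, threads=1 and maxcpus=16,
a machine using the generic handler rejects the configuration (2 * 4 * 1
is 8), while one that installs pc_parse_cpu_topology accepts it.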
wangyanan (Y) Aug. 24, 2021, 4:51 a.m. UTC | #2
On 2021/8/23 21:17, Philippe Mathieu-Daudé wrote:
> On 8/23/21 2:27 PM, Yanan Wang wrote:
>> We have two requirements for a valid SMP configuration:
>> the product of "sockets * cores * threads" must represent all the
>> possible cpus, i.e., max_cpus, and then must include the initially
>> present cpus, i.e., smp_cpus.
>>
>> So we only need to ensure 1) "sockets * cores * threads == maxcpus"
>> at first and then ensure 2) "maxcpus >= cpus". With a reasonable
>> order of the sanity check, we can simplify the error reporting code.
>> When reporting an error message we also report the exact value of
>> each topology member to make users easily see what's going on.
>>
>> Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
>> Reviewed-by: Andrew Jones <drjones@redhat.com>
>> Reviewed-by: Pankaj Gupta <pankaj.gupta@ionos.com>
>> ---
>>   hw/core/machine.c | 22 +++++++++-------------
>>   hw/i386/pc.c      | 24 ++++++++++--------------
>>   2 files changed, 19 insertions(+), 27 deletions(-)
>>
>> diff --git a/hw/core/machine.c b/hw/core/machine.c
>> index 85908abc77..093c0d382d 100644
>> --- a/hw/core/machine.c
>> +++ b/hw/core/machine.c
>> @@ -779,25 +779,21 @@ static void smp_parse(MachineState *ms, SMPConfiguration *config, Error **errp)
>>       maxcpus = maxcpus > 0 ? maxcpus : sockets * cores * threads;
>>       cpus = cpus > 0 ? cpus : maxcpus;
>>   
>> -    if (sockets * cores * threads < cpus) {
>> -        error_setg(errp, "cpu topology: "
>> -                   "sockets (%u) * cores (%u) * threads (%u) < "
>> -                   "smp_cpus (%u)",
>> -                   sockets, cores, threads, cpus);
>> +    if (sockets * cores * threads != maxcpus) {
>> +        error_setg(errp, "Invalid CPU topology: "
>> +                   "product of the hierarchy must match maxcpus: "
>> +                   "sockets (%u) * cores (%u) * threads (%u) "
>> +                   "!= maxcpus (%u)",
>> +                   sockets, cores, threads, maxcpus);
>>           return;
>>       }
> Thinking about scalability, MachineClass could have a
> parse_cpu_topology() handler, and this would be the
> generic one. Principally because architectures don't
> use the same terms, and die/socket/core/thread arrangement
> is machine specific (besides being arch-spec).
> Not a problem as of today, but the way we try to handle
> this generically seems over-engineered to me.
Hi Philippe,

The reason for introducing a generic implementation and avoiding
specific ones is that we thought there is little difference in parsing
logic between the specific parsers. Most of the parsing is the
automatic calculation of missing values and the related error reporting,
where the only difference between parsers is the handling of specific
(whether arch-specific or machine-specific) parameters.

So it may be better to keep the parsing logic unified if we can easily
achieve that. And actually we can use compat stuff to handle specific
topology parameters well. See the implementation in patch #10.

There have been patches on the list introducing new specific members
(s390 related in [1] and ARM related in [2]), and each of them needs
a specific parser. However, based on the generic one we can
extend without increasing code duplication.

There is also some discussion about generic vs. specific parsers in [1],
which can serve as a reference.

[1] https://lore.kernel.org/qemu-devel/1626281596-31061-2-git-send-email-pmorel@linux.ibm.com/
[2] https://lore.kernel.org/qemu-devel/20210516103228.37792-1-wangyanan55@huawei.com/
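
As a rough illustration of the unified approach, a single parser could
take a per-machine capability flag for a specific parameter. The sketch
below is simplified and hypothetical: the SmpConfig type, the
dies_supported argument and the auto-calculation rules are assumptions
made for this example and are not taken from patch #10.

```c
/* Hypothetical type; 0 in any field means "not given by the user". */
typedef struct SmpConfig {
    unsigned sockets, dies, cores, threads, maxcpus;
} SmpConfig;

/*
 * Simplified generic parser sketch. 'dies_supported' stands for a
 * per-machine capability flag (an assumed name). Returns 0 on success,
 * -1 on an invalid configuration.
 */
static int generic_smp_parse(SmpConfig *c, int dies_supported)
{
    unsigned per_socket;

    /* Reject a parameter the machine does not support. */
    if (!dies_supported && c->dies > 1) {
        return -1;
    }

    /* Fill in missing values; a machine without dies uses dies = 1. */
    c->dies = c->dies > 0 ? c->dies : 1;
    c->cores = c->cores > 0 ? c->cores : 1;
    c->threads = c->threads > 0 ? c->threads : 1;

    per_socket = c->dies * c->cores * c->threads;
    if (c->sockets == 0) {
        c->sockets = c->maxcpus > 0 ? c->maxcpus / per_socket : 1;
    }
    if (c->maxcpus == 0) {
        c->maxcpus = c->sockets * per_socket;
    }

    /* The same sanity check serves every machine type. */
    if (c->sockets * per_socket != c->maxcpus) {
        return -1;
    }
    return 0;
}
```

The point of the design is that the automatic calculation and the final
check are shared, and only the handling of the machine-specific
parameter depends on the flag.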

Thanks,
Yanan
> [unrelated to this particular patch]
Philippe Mathieu-Daudé Aug. 24, 2021, 7:29 a.m. UTC | #3
On 8/24/21 6:51 AM, wangyanan (Y) wrote:
> On 2021/8/23 21:17, Philippe Mathieu-Daudé wrote:
>> On 8/23/21 2:27 PM, Yanan Wang wrote:
>>> We have two requirements for a valid SMP configuration:
>>> the product of "sockets * cores * threads" must represent all the
>>> possible cpus, i.e., max_cpus, and then must include the initially
>>> present cpus, i.e., smp_cpus.
>>>
>>> So we only need to ensure 1) "sockets * cores * threads == maxcpus"
>>> at first and then ensure 2) "maxcpus >= cpus". With a reasonable
>>> order of the sanity check, we can simplify the error reporting code.
>>> When reporting an error message we also report the exact value of
>>> each topology member to make users easily see what's going on.
>>>
>>> Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
>>> Reviewed-by: Andrew Jones <drjones@redhat.com>
>>> Reviewed-by: Pankaj Gupta <pankaj.gupta@ionos.com>
>>> ---
>>>   hw/core/machine.c | 22 +++++++++-------------
>>>   hw/i386/pc.c      | 24 ++++++++++--------------
>>>   2 files changed, 19 insertions(+), 27 deletions(-)
>>>
>>> diff --git a/hw/core/machine.c b/hw/core/machine.c
>>> index 85908abc77..093c0d382d 100644
>>> --- a/hw/core/machine.c
>>> +++ b/hw/core/machine.c
>>> @@ -779,25 +779,21 @@ static void smp_parse(MachineState *ms, SMPConfiguration *config, Error **errp)
>>>       maxcpus = maxcpus > 0 ? maxcpus : sockets * cores * threads;
>>>       cpus = cpus > 0 ? cpus : maxcpus;
>>>   
>>> -    if (sockets * cores * threads < cpus) {
>>> -        error_setg(errp, "cpu topology: "
>>> -                   "sockets (%u) * cores (%u) * threads (%u) < "
>>> -                   "smp_cpus (%u)",
>>> -                   sockets, cores, threads, cpus);
>>> +    if (sockets * cores * threads != maxcpus) {
>>> +        error_setg(errp, "Invalid CPU topology: "
>>> +                   "product of the hierarchy must match maxcpus: "
>>> +                   "sockets (%u) * cores (%u) * threads (%u) "
>>> +                   "!= maxcpus (%u)",
>>> +                   sockets, cores, threads, maxcpus);
>>>           return;
>>>       }
>> Thinking about scalability, MachineClass could have a
>> parse_cpu_topology() handler, and this would be the
>> generic one. Principally because architectures don't
>> use the same terms, and die/socket/core/thread arrangement
>> is machine specific (besides being arch-spec).
>> Not a problem as of today, but the way we try to handle
>> this generically seems over-engineered to me.
> Hi Philippe,
> 
> The reason for introducing a generic implementation and avoiding
> specific ones is that we thought there is little difference in parsing
> logic between the specific parsers. Most part of the parsing is the
> automatic calculation of missing values and the related error reporting,
> in which the only difference between parsers is the handling of specific
> (no matter of arch-specific or machine-specific) parameters.
> 
> So it may be better to keep the parsing logic unified if we can easily
> realize that. And actually we can use compat stuff to handle specific
> topology parameters well. See implementation in patch #10.
> 
> There have been patches on list introducing new specific members
> (s390 related in [1] and ARM related in [2]), and in each of them there
> is a specific parser needed. However, based on generic one we can
> extend without the increasing code duplication.
> 
> There is also some discussion about generic/specific parser in [1],
> which can be a reference.
> 
> [1]
> https://lore.kernel.org/qemu-devel/1626281596-31061-2-git-send-email-pmorel@linux.ibm.com/
> 
> [2]
> https://lore.kernel.org/qemu-devel/20210516103228.37792-1-wangyanan55@huawei.com/

OK I read Daniel's rationale here:
https://lore.kernel.org/qemu-devel/YPFN83pKBt7F97kW@redhat.com/

Thanks,

Phil.
wangyanan (Y) Aug. 24, 2021, 8:23 a.m. UTC | #4
On 2021/8/24 15:29, Philippe Mathieu-Daudé wrote:
> On 8/24/21 6:51 AM, wangyanan (Y) wrote:
>> On 2021/8/23 21:17, Philippe Mathieu-Daudé wrote:
>>> On 8/23/21 2:27 PM, Yanan Wang wrote:
>>>> We have two requirements for a valid SMP configuration:
>>>> the product of "sockets * cores * threads" must represent all the
>>>> possible cpus, i.e., max_cpus, and then must include the initially
>>>> present cpus, i.e., smp_cpus.
>>>>
>>>> So we only need to ensure 1) "sockets * cores * threads == maxcpus"
>>>> at first and then ensure 2) "maxcpus >= cpus". With a reasonable
>>>> order of the sanity check, we can simplify the error reporting code.
>>>> When reporting an error message we also report the exact value of
>>>> each topology member to make users easily see what's going on.
>>>>
>>>> Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
>>>> Reviewed-by: Andrew Jones <drjones@redhat.com>
>>>> Reviewed-by: Pankaj Gupta <pankaj.gupta@ionos.com>
>>>> ---
>>>>    hw/core/machine.c | 22 +++++++++-------------
>>>>    hw/i386/pc.c      | 24 ++++++++++--------------
>>>>    2 files changed, 19 insertions(+), 27 deletions(-)
>>>>
>>>> diff --git a/hw/core/machine.c b/hw/core/machine.c
>>>> index 85908abc77..093c0d382d 100644
>>>> --- a/hw/core/machine.c
>>>> +++ b/hw/core/machine.c
>>>> @@ -779,25 +779,21 @@ static void smp_parse(MachineState *ms, SMPConfiguration *config, Error **errp)
>>>>        maxcpus = maxcpus > 0 ? maxcpus : sockets * cores * threads;
>>>>        cpus = cpus > 0 ? cpus : maxcpus;
>>>>    
>>>> -    if (sockets * cores * threads < cpus) {
>>>> -        error_setg(errp, "cpu topology: "
>>>> -                   "sockets (%u) * cores (%u) * threads (%u) < "
>>>> -                   "smp_cpus (%u)",
>>>> -                   sockets, cores, threads, cpus);
>>>> +    if (sockets * cores * threads != maxcpus) {
>>>> +        error_setg(errp, "Invalid CPU topology: "
>>>> +                   "product of the hierarchy must match maxcpus: "
>>>> +                   "sockets (%u) * cores (%u) * threads (%u) "
>>>> +                   "!= maxcpus (%u)",
>>>> +                   sockets, cores, threads, maxcpus);
>>>>            return;
>>>>        }
>>> Thinking about scalability, MachineClass could have a
>>> parse_cpu_topology() handler, and this would be the
>>> generic one. Principally because architectures don't
>>> use the same terms, and die/socket/core/thread arrangement
>>> is machine specific (besides being arch-spec).
>>> Not a problem as of today, but the way we try to handle
>>> this generically seems over-engineered to me.
>> Hi Philippe,
>>
>> The reason for introducing a generic implementation and avoiding
>> specific ones is that we thought there is little difference in parsing
>> logic between the specific parsers. Most part of the parsing is the
>> automatic calculation of missing values and the related error reporting,
>> in which the only difference between parsers is the handling of specific
>> (no matter of arch-specific or machine-specific) parameters.
>>
>> So it may be better to keep the parsing logic unified if we can easily
>> realize that. And actually we can use compat stuff to handle specific
>> topology parameters well. See implementation in patch #10.
>>
>> There have been patches on list introducing new specific members
>> (s390 related in [1] and ARM related in [2]), and in each of them there
>> is a specific parser needed. However, based on generic one we can
>> extend without the increasing code duplication.
>>
>> There is also some discussion about generic/specific parser in [1],
>> which can be a reference.
>>
>> [1]
>> https://lore.kernel.org/qemu-devel/1626281596-31061-2-git-send-email-pmorel@linux.ibm.com/
>>
>> [2]
>> https://lore.kernel.org/qemu-devel/20210516103228.37792-1-wangyanan55@huawei.com/
> OK I read Daniel's rationale here:
> https://lore.kernel.org/qemu-devel/YPFN83pKBt7F97kW@redhat.com/
exactly. :)

Thanks,
Yanan
> Thanks,
>
> Phil.
Patch

diff --git a/hw/core/machine.c b/hw/core/machine.c
index 85908abc77..093c0d382d 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -779,25 +779,21 @@  static void smp_parse(MachineState *ms, SMPConfiguration *config, Error **errp)
     maxcpus = maxcpus > 0 ? maxcpus : sockets * cores * threads;
     cpus = cpus > 0 ? cpus : maxcpus;
 
-    if (sockets * cores * threads < cpus) {
-        error_setg(errp, "cpu topology: "
-                   "sockets (%u) * cores (%u) * threads (%u) < "
-                   "smp_cpus (%u)",
-                   sockets, cores, threads, cpus);
+    if (sockets * cores * threads != maxcpus) {
+        error_setg(errp, "Invalid CPU topology: "
+                   "product of the hierarchy must match maxcpus: "
+                   "sockets (%u) * cores (%u) * threads (%u) "
+                   "!= maxcpus (%u)",
+                   sockets, cores, threads, maxcpus);
         return;
     }
 
     if (maxcpus < cpus) {
-        error_setg(errp, "maxcpus must be equal to or greater than smp");
-        return;
-    }
-
-    if (sockets * cores * threads != maxcpus) {
         error_setg(errp, "Invalid CPU topology: "
+                   "maxcpus must be equal to or greater than smp: "
                    "sockets (%u) * cores (%u) * threads (%u) "
-                   "!= maxcpus (%u)",
-                   sockets, cores, threads,
-                   maxcpus);
+                   "== maxcpus (%u) < smp_cpus (%u)",
+                   sockets, cores, threads, maxcpus, cpus);
         return;
     }
 
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 9ad7ae5254..fcf6905219 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -747,25 +747,21 @@  static void pc_smp_parse(MachineState *ms, SMPConfiguration *config, Error **err
     maxcpus = maxcpus > 0 ? maxcpus : sockets * dies * cores * threads;
     cpus = cpus > 0 ? cpus : maxcpus;
 
-    if (sockets * dies * cores * threads < cpus) {
-        error_setg(errp, "cpu topology: "
-                   "sockets (%u) * dies (%u) * cores (%u) * threads (%u) < "
-                   "smp_cpus (%u)",
-                   sockets, dies, cores, threads, cpus);
+    if (sockets * dies * cores * threads != maxcpus) {
+        error_setg(errp, "Invalid CPU topology: "
+                   "product of the hierarchy must match maxcpus: "
+                   "sockets (%u) * dies (%u) * cores (%u) * threads (%u) "
+                   "!= maxcpus (%u)",
+                   sockets, dies, cores, threads, maxcpus);
         return;
     }
 
     if (maxcpus < cpus) {
-        error_setg(errp, "maxcpus must be equal to or greater than smp");
-        return;
-    }
-
-    if (sockets * dies * cores * threads != maxcpus) {
-        error_setg(errp, "Invalid CPU topology deprecated: "
+        error_setg(errp, "Invalid CPU topology: "
+                   "maxcpus must be equal to or greater than smp: "
                    "sockets (%u) * dies (%u) * cores (%u) * threads (%u) "
-                   "!= maxcpus (%u)",
-                   sockets, dies, cores, threads,
-                   maxcpus);
+                   "== maxcpus (%u) < smp_cpus (%u)",
+                   sockets, dies, cores, threads, maxcpus, cpus);
         return;
     }