[PULL,03/18] numa: Validate cluster and NUMA node boundary if required

Message ID 20230626111445.163573-4-pbonzini@redhat.com
State New
Series [PULL,01/18] build: further refine build.ninja rules

Commit Message

Paolo Bonzini June 26, 2023, 11:14 a.m. UTC
From: Gavin Shan <gshan@redhat.com>

On some architectures, such as ARM64, multiple CPUs in one cluster can be
associated with different NUMA nodes. This is an irregular configuration
that does not occur on bare metal, and it causes Linux guests to
misbehave, as the following warning messages indicate.

  -smp 6,maxcpus=6,sockets=2,clusters=1,cores=3,threads=1 \
  -numa node,nodeid=0,cpus=0-1,memdev=ram0                \
  -numa node,nodeid=1,cpus=2-3,memdev=ram1                \
  -numa node,nodeid=2,cpus=4-5,memdev=ram2

  ------------[ cut here ]------------
  WARNING: CPU: 0 PID: 1 at kernel/sched/topology.c:2271 build_sched_domains+0x284/0x910
  Modules linked in:
  CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.14.0-268.el9.aarch64 #1
  pstate: 00400005 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
  pc : build_sched_domains+0x284/0x910
  lr : build_sched_domains+0x184/0x910
  sp : ffff80000804bd50
  x29: ffff80000804bd50 x28: 0000000000000002 x27: 0000000000000000
  x26: ffff800009cf9a80 x25: 0000000000000000 x24: ffff800009cbf840
  x23: ffff000080325000 x22: ffff0000005df800 x21: ffff80000a4ce508
  x20: 0000000000000000 x19: ffff000080324440 x18: 0000000000000014
  x17: 00000000388925c0 x16: 000000005386a066 x15: 000000009c10cc2e
  x14: 00000000000001c0 x13: 0000000000000001 x12: ffff00007fffb1a0
  x11: ffff00007fffb180 x10: ffff80000a4ce508 x9 : 0000000000000041
  x8 : ffff80000a4ce500 x7 : ffff80000a4cf920 x6 : 0000000000000001
  x5 : 0000000000000001 x4 : 0000000000000007 x3 : 0000000000000002
  x2 : 0000000000001000 x1 : ffff80000a4cf928 x0 : 0000000000000001
  Call trace:
   build_sched_domains+0x284/0x910
   sched_init_domains+0xac/0xe0
   sched_init_smp+0x48/0xc8
   kernel_init_freeable+0x140/0x1ac
   kernel_init+0x28/0x140
   ret_from_fork+0x10/0x20

Improve the situation by warning when multiple CPUs in one cluster are
associated with different NUMA nodes. A single NUMA node, however, may
still be associated with CPUs in different clusters.

Signed-off-by: Gavin Shan <gshan@redhat.com>
Acked-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Acked-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20230509002739.18388-2-gshan@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 hw/core/machine.c   | 42 ++++++++++++++++++++++++++++++++++++++++++
 include/hw/boards.h |  1 +
 2 files changed, 43 insertions(+)

Comments

Peter Maydell July 20, 2023, 1:10 p.m. UTC | #1
On Mon, 26 Jun 2023 at 12:15, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> From: Gavin Shan <gshan@redhat.com>
>
> For some architectures like ARM64, multiple CPUs in one cluster can be
> associated with different NUMA nodes, which is irregular configuration
> because we shouldn't have this in baremetal environment. The irregular
> configuration causes Linux guest to misbehave, as the following warning
> messages indicate.
>
>   -smp 6,maxcpus=6,sockets=2,clusters=1,cores=3,threads=1 \
>   -numa node,nodeid=0,cpus=0-1,memdev=ram0                \
>   -numa node,nodeid=1,cpus=2-3,memdev=ram1                \
>   -numa node,nodeid=2,cpus=4-5,memdev=ram2                \

Hi. This new warning shows up a lot in "make check" output:

$ grep -c 'can cause OSes' /tmp/parn3ofA.par
44

Looks like this is all in the qtest-aarch64/numa-test test.

Please can you investigate and either:
 (1) fix the test not to do the bad thing that's causing the warning
 (2) change the warning so it doesn't show up in stderr when
     running a correct and passing test
?

thanks
-- PMM
Gavin Shan July 21, 2023, 10:50 a.m. UTC | #2
On 7/20/23 23:10, Peter Maydell wrote:
> On Mon, 26 Jun 2023 at 12:15, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>
>> From: Gavin Shan <gshan@redhat.com>
>>
>> For some architectures like ARM64, multiple CPUs in one cluster can be
>> associated with different NUMA nodes, which is irregular configuration
>> because we shouldn't have this in baremetal environment. The irregular
>> configuration causes Linux guest to misbehave, as the following warning
>> messages indicate.
>>
>>    -smp 6,maxcpus=6,sockets=2,clusters=1,cores=3,threads=1 \
>>    -numa node,nodeid=0,cpus=0-1,memdev=ram0                \
>>    -numa node,nodeid=1,cpus=2-3,memdev=ram1                \
>>    -numa node,nodeid=2,cpus=4-5,memdev=ram2                \
> 
> Hi. This new warning shows up a lot in "make check" output:
> 
> $ grep -c 'can cause OSes' /tmp/parn3ofA.par
> 44
> 
> Looks like this is all in the qtest-aarch64/numa-test test.
> 
> Please can you investigate and either:
>   (1) fix the test not to do the bad thing that's causing the warning
>   (2) change the warning so it doesn't show up in stderr when
>       running a correct and passing test
> ?
> 

Yes, all the warning messages come from tests/qtest/numa-test.c. Three of
its configurations break the boundary between CPU cluster and NUMA node,
as expected. I've sent a patch to disable the validation for qtest.

https://lists.nongnu.org/archive/html/qemu-arm/2023-07/msg00440.html

With the patch applied, I didn't see similar warning messages from
"make -j 40 check-qtest".

Thanks,
Gavin
Patch

diff --git a/hw/core/machine.c b/hw/core/machine.c
index 1000406211f..46f8f9a2b04 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -1262,6 +1262,45 @@  static void machine_numa_finish_cpu_init(MachineState *machine)
     g_string_free(s, true);
 }
 
+static void validate_cpu_cluster_to_numa_boundary(MachineState *ms)
+{
+    MachineClass *mc = MACHINE_GET_CLASS(ms);
+    NumaState *state = ms->numa_state;
+    const CPUArchIdList *possible_cpus = mc->possible_cpu_arch_ids(ms);
+    const CPUArchId *cpus = possible_cpus->cpus;
+    int i, j;
+
+    if (state->num_nodes <= 1 || possible_cpus->len <= 1) {
+        return;
+    }
+
+    /*
+     * The Linux scheduling domain can't be parsed when the multiple CPUs
+     * in one cluster have been associated with different NUMA nodes. However,
+     * it's fine to associate one NUMA node with CPUs in different clusters.
+     */
+    for (i = 0; i < possible_cpus->len; i++) {
+        for (j = i + 1; j < possible_cpus->len; j++) {
+            if (cpus[i].props.has_socket_id &&
+                cpus[i].props.has_cluster_id &&
+                cpus[i].props.has_node_id &&
+                cpus[j].props.has_socket_id &&
+                cpus[j].props.has_cluster_id &&
+                cpus[j].props.has_node_id &&
+                cpus[i].props.socket_id == cpus[j].props.socket_id &&
+                cpus[i].props.cluster_id == cpus[j].props.cluster_id &&
+                cpus[i].props.node_id != cpus[j].props.node_id) {
+                warn_report("CPU-%d and CPU-%d in socket-%" PRId64 "-cluster-%" PRId64
+                             " have been associated with node-%" PRId64 " and node-%" PRId64
+                             " respectively. It can cause OSes like Linux to"
+                             " misbehave", i, j, cpus[i].props.socket_id,
+                             cpus[i].props.cluster_id, cpus[i].props.node_id,
+                             cpus[j].props.node_id);
+            }
+        }
+    }
+}
+
 MemoryRegion *machine_consume_memdev(MachineState *machine,
                                      HostMemoryBackend *backend)
 {
@@ -1355,6 +1394,9 @@  void machine_run_board_init(MachineState *machine, const char *mem_path, Error *
         numa_complete_configuration(machine);
         if (machine->numa_state->num_nodes) {
             machine_numa_finish_cpu_init(machine);
+            if (machine_class->cpu_cluster_has_numa_boundary) {
+                validate_cpu_cluster_to_numa_boundary(machine);
+            }
         }
     }
 
diff --git a/include/hw/boards.h b/include/hw/boards.h
index a385010909d..6b267c21ce7 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -274,6 +274,7 @@  struct MachineClass {
     bool nvdimm_supported;
     bool numa_mem_supported;
     bool auto_enable_numa;
+    bool cpu_cluster_has_numa_boundary;
     SMPCompatProps smp_props;
     const char *default_ram_id;