
[2/2] powerpc/numa: Return the first online node if device tree mapping returns a not online node

Message ID 20220623125442.645240-2-aneesh.kumar@linux.ibm.com (mailing list archive)
State Changes Requested
Headers show
Series [1/2] powerpc/numa: Return the first online node instead of 0 | expand

Checks

Context Check Description
snowpatch_ozlabs/github-powerpc_ppctests success Successfully ran 10 jobs.
snowpatch_ozlabs/github-powerpc_selftests success Successfully ran 10 jobs.
snowpatch_ozlabs/github-powerpc_sparse success Successfully ran 4 jobs.
snowpatch_ozlabs/github-powerpc_clang success Successfully ran 7 jobs.
snowpatch_ozlabs/github-powerpc_kernel_qemu success Successfully ran 23 jobs.

Commit Message

Aneesh Kumar K V June 23, 2022, 12:54 p.m. UTC
While building the cpu_to_node map, make sure we always use an online node
to build the mapping table. In general this should not be an issue
because the kernel uses a similar lookup mechanism (vphn_get_nid()) to mark
nodes online (setup_node_data()). Hence the NUMA nodes we find during the
lookup in numa_setup_cpu() will always be online.

To keep the logic simple and correct, make sure that if the hypervisor
or the device tree returned an offline node, we don't use it to build
the map table. Instead, use first_online_node.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 arch/powerpc/mm/numa.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Srikar Dronamraju June 24, 2022, 8:50 a.m. UTC | #1
* Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> [2022-06-23 18:24:42]:

> While building the cpu_to_node map, make sure we always use an online node
> to build the mapping table. In general this should not be an issue
> because the kernel uses a similar lookup mechanism (vphn_get_nid()) to mark
> nodes online (setup_node_data()). Hence the NUMA nodes we find during the
> lookup in numa_setup_cpu() will always be online.
> 
> To keep the logic simple and correct, make sure that if the hypervisor
> or the device tree returned an offline node, we don't use it to build
> the map table. Instead, use first_online_node.

Why should the returned nid already be online? Are we facing any problem
with the current code?

Since the general idea is to keep cpu_to_node constant, by assigning it
first_online_node we may end up assigning the wrong node, resulting in a
performance penalty later on; i.e., a CPU may actually belong to node 4 but we
assigned it to node 1. Also, the coregroup id is derived from the associativity
array. If there is a mismatch between the coregroup id and the nid, the
scheduler will crib.

> 
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> ---
>  arch/powerpc/mm/numa.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
> index 0801b2ce9b7d..f387b9eb9dc9 100644
> --- a/arch/powerpc/mm/numa.c
> +++ b/arch/powerpc/mm/numa.c
> @@ -741,7 +741,7 @@ static int numa_setup_cpu(unsigned long lcpu)
>  	of_node_put(cpu);
> 
>  out_present:
> -	if (nid < 0 || !node_possible(nid))
> +	if (nid < 0 || !node_online(nid))
>  		nid = first_online_node;
> 
>  	/*
> -- 
> 2.36.1
>
Aneesh Kumar K V June 27, 2022, 5:27 a.m. UTC | #2
Srikar Dronamraju <srikar@linux.vnet.ibm.com> writes:

> * Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> [2022-06-23 18:24:42]:
>
>> While building the cpu_to_node map, make sure we always use an online node
>> to build the mapping table. In general this should not be an issue
>> because the kernel uses a similar lookup mechanism (vphn_get_nid()) to mark
>> nodes online (setup_node_data()). Hence the NUMA nodes we find during the
>> lookup in numa_setup_cpu() will always be online.
>> 
>> To keep the logic simple and correct, make sure that if the hypervisor
>> or the device tree returned an offline node, we don't use it to build
>> the map table. Instead, use first_online_node.
>
> Why should the returned nid already be online? Are we facing any problem
> with the current code?

There is no issue with the current upstream code. The problem was
detected in a distro kernel that carried a partial backport: we didn't
initialize the NUMA node details correctly, so only a subset of the actual
NUMA nodes got marked online. This resulted in a crash in a completely
independent part of the kernel, because the kernel tried to dereference a
NODE_DATA() that was never initialized.

>
> Since the general idea is to keep cpu_to_node constant, by assigning it
> first_online_node we may end up assigning the wrong node, resulting in a
> performance penalty later on; i.e., a CPU may actually belong to node 4 but we
> assigned it to node 1. Also, the coregroup id is derived from the associativity
> array. If there is a mismatch between the coregroup id and the nid, the
> scheduler will crib.

The changed code will never be executed unless something broke between
the NUMA node init (setup_node_data()) and numa_setup_cpu(). The reason
the code change was made was to keep it consistent with the fact that we
only initialize NODE_DATA for online NUMA nodes, and hence should not
return a merely possible (offline) NUMA node from numa_setup_cpu().

>
>> 
>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
>> ---
>>  arch/powerpc/mm/numa.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
>> index 0801b2ce9b7d..f387b9eb9dc9 100644
>> --- a/arch/powerpc/mm/numa.c
>> +++ b/arch/powerpc/mm/numa.c
>> @@ -741,7 +741,7 @@ static int numa_setup_cpu(unsigned long lcpu)
>>  	of_node_put(cpu);
>> 
>>  out_present:
>> -	if (nid < 0 || !node_possible(nid))
>> +	if (nid < 0 || !node_online(nid))
>>  		nid = first_online_node;
>> 
>>  	/*
>> -- 
>> 2.36.1
>> 
>
> -- 
> Thanks and Regards
> Srikar Dronamraju

Patch

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 0801b2ce9b7d..f387b9eb9dc9 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -741,7 +741,7 @@ static int numa_setup_cpu(unsigned long lcpu)
 	of_node_put(cpu);
 
 out_present:
-	if (nid < 0 || !node_possible(nid))
+	if (nid < 0 || !node_online(nid))
 		nid = first_online_node;
 
 	/*