Message ID | 20220623125442.645240-2-aneesh.kumar@linux.ibm.com (mailing list archive) |
---|---|
State | Changes Requested |
Series | [1/2] powerpc/numa: Return the first online node instead of 0 |
Context | Check | Description |
---|---|---|
snowpatch_ozlabs/github-powerpc_ppctests | success | Successfully ran 10 jobs. |
snowpatch_ozlabs/github-powerpc_selftests | success | Successfully ran 10 jobs. |
snowpatch_ozlabs/github-powerpc_sparse | success | Successfully ran 4 jobs. |
snowpatch_ozlabs/github-powerpc_clang | success | Successfully ran 7 jobs. |
snowpatch_ozlabs/github-powerpc_kernel_qemu | success | Successfully ran 23 jobs. |
* Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> [2022-06-23 18:24:42]:

> While building the cpu_to_node map make sure we always use an online node
> to build the mapping table. In general this should not be an issue
> because the kernel uses a similar lookup mechanism (vphn_get_nid()) to
> mark nodes online (setup_node_data()). Hence the NUMA nodes we find
> during the lookup in numa_setup_cpu() will always be found online.
>
> To keep the logic simple and correct, make sure that if the hypervisor
> or device tree returned a node that is not online, we don't use it to
> build the map table. Instead, use first_online_node.

Why should the returned nid already be online? Are we facing any problem
with the current code?

Since the general idea is to keep cpu_to_node constant, by assigning it
the first_online_node we may end up assigning a wrong node, resulting in
a performance penalty later on; i.e. the CPU may actually belong to node 4
but we assigned it to node 1. Also, the coregroup id is derived from the
associativity array. If there is a mismatch between the coregroup id and
the nid, the scheduler will crib.

> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> ---
>  arch/powerpc/mm/numa.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
> index 0801b2ce9b7d..f387b9eb9dc9 100644
> --- a/arch/powerpc/mm/numa.c
> +++ b/arch/powerpc/mm/numa.c
> @@ -741,7 +741,7 @@ static int numa_setup_cpu(unsigned long lcpu)
>  	of_node_put(cpu);
>
>  out_present:
> -	if (nid < 0 || !node_possible(nid))
> +	if (nid < 0 || !node_online(nid))
>  		nid = first_online_node;
>
>  	/*
> --
> 2.36.1
Srikar Dronamraju <srikar@linux.vnet.ibm.com> writes:

> * Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> [2022-06-23 18:24:42]:
>
>> While building the cpu_to_node map make sure we always use an online node
>> to build the mapping table. In general this should not be an issue
>> because the kernel uses a similar lookup mechanism (vphn_get_nid()) to
>> mark nodes online (setup_node_data()). Hence the NUMA nodes we find
>> during the lookup in numa_setup_cpu() will always be found online.
>>
>> To keep the logic simple and correct, make sure that if the hypervisor
>> or device tree returned a node that is not online, we don't use it to
>> build the map table. Instead, use first_online_node.
>
> Why should the returned nid already be online? Are we facing any problem
> with the current code?

There is no issue with the current upstream code. The problem was
detected in a distro kernel that carried a partial backport: we didn't
initialize the NUMA node details correctly, so only a subset of the
actual NUMA nodes got marked online. This resulted in a crash in a
completely independent part of the kernel, because the kernel tried to
dereference a NODE_DATA() that was never initialized.

> Since the general idea is to keep cpu_to_node constant, by assigning it
> the first_online_node we may end up assigning a wrong node, resulting in
> a performance penalty later on; i.e. the CPU may actually belong to node 4
> but we assigned it to node 1. Also, the coregroup id is derived from the
> associativity array. If there is a mismatch between the coregroup id and
> the nid, the scheduler will crib.

The changed code will never be executed unless something broke between
the NUMA node init (setup_node_data()) and numa_setup_cpu(). The change
was made to keep the code consistent with the fact that we only
initialize NODE_DATA for online NUMA nodes, and hence should not return
a possible-but-offline NUMA node from numa_setup_cpu().
>>
>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
>> ---
>>  arch/powerpc/mm/numa.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
>> index 0801b2ce9b7d..f387b9eb9dc9 100644
>> --- a/arch/powerpc/mm/numa.c
>> +++ b/arch/powerpc/mm/numa.c
>> @@ -741,7 +741,7 @@ static int numa_setup_cpu(unsigned long lcpu)
>>  	of_node_put(cpu);
>>
>>  out_present:
>> -	if (nid < 0 || !node_possible(nid))
>> +	if (nid < 0 || !node_online(nid))
>>  		nid = first_online_node;
>>
>>  	/*
>> --
>> 2.36.1
>>
>
> --
> Thanks and Regards
> Srikar Dronamraju
diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 0801b2ce9b7d..f387b9eb9dc9 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -741,7 +741,7 @@ static int numa_setup_cpu(unsigned long lcpu)
 	of_node_put(cpu);
 
 out_present:
-	if (nid < 0 || !node_possible(nid))
+	if (nid < 0 || !node_online(nid))
 		nid = first_online_node;
 
 	/*
While building the cpu_to_node map, make sure we always use an online
node to build the mapping table. In general this should not be an issue,
because the kernel uses a similar lookup mechanism (vphn_get_nid()) to
mark nodes online (setup_node_data()). Hence the NUMA nodes we find
during the lookup in numa_setup_cpu() will always be found online.

To keep the logic simple and correct, make sure that if the hypervisor
or device tree returned a node that is not online, we don't use it to
build the map table. Instead, use first_online_node.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 arch/powerpc/mm/numa.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)