Message ID | 20231229120107.2281153-5-mpe@ellerman.id.au (mailing list archive) |
---|---|
State | Accepted |
Commit | 0875f1ceba974042069f04946aa8f1d4d1e688da |
Headers | show |
Series | [RFC,1/5] powerpc/smp: Adjust nr_cpu_ids to cover all threads of a core | expand |
Context | Check | Description |
---|---|---|
snowpatch_ozlabs/github-powerpc_ppctests | success | Successfully ran 8 jobs. |
snowpatch_ozlabs/github-powerpc_selftests | success | Successfully ran 8 jobs. |
snowpatch_ozlabs/github-powerpc_sparse | success | Successfully ran 4 jobs. |
snowpatch_ozlabs/github-powerpc_clang | success | Successfully ran 6 jobs. |
snowpatch_ozlabs/github-powerpc_kernel_qemu | success | Successfully ran 23 jobs. |
Michael Ellerman <mpe@ellerman.id.au> writes: > If nr_cpu_ids is too low to include the boot CPU, remap the boot CPU > onto logical core 0. Hi guys, I finally got time to look at this issue. I think this series should fix the problems that have been seen. I've tested this fairly thoroughly with a qemu script, and also a few boots on a real machine. If you can test it with your setups that would be great. Hopefully there isn't some obscure case I've missed. cheers
On Fri, Dec 29, 2023 at 8:07 PM Michael Ellerman <mpe@ellerman.id.au> wrote: > > Michael Ellerman <mpe@ellerman.id.au> writes: > > If nr_cpu_ids is too low to include the boot CPU, remap the boot CPU > > onto logical core 0. > > Hi guys, > > I finally got time to look at this issue. I think this series should fix Thanks a lot for sparing time on it and hope we can close this prolonged issue soon. And loop in Wen Xiong and Ming Lei, who care for this issue too. Best Regards, Pingfan > the problems that have been seen. I've tested this fairly thoroughly > with a qemu script, and also a few boots on a real machine. > > If you can test it with your setups that would be great. Hopefully there > isn't some obscure case I've missed. > > cheers >
Michael Ellerman <mpe@ellerman.id.au> writes: .... > #ifdef CONFIG_PPC64 > int boot_cpu_hwid = -1; > @@ -492,12 +493,26 @@ void __init smp_setup_cpu_maps(void) > avail = !of_property_match_string(dn, > "enable-method", "spin-table"); > > - cpu = assign_threads(cpu, nthreads, avail, intserv); > + if (boot_core_hwid >= 0) { > + if (cpu == 0) { > + pr_info("Skipping CPU node %pOF to allow for boot core.\n", dn); > + cpu = nthreads; > + continue; > + } > > - if (cpu >= nr_cpu_ids) { > + if (be32_to_cpu(intserv[0]) == boot_core_hwid) { > + pr_info("Renumbered boot core %pOF to logical 0\n", dn); > + assign_threads(0, nthreads, avail, intserv); > + of_node_put(dn); > + break; > I was expecting a 'continue' here. Why 'break' the loop? The condition that should break the loop should be cpu >= nr_cpu_ids > + } > + } else if (cpu >= nr_cpu_ids) { > of_node_put(dn); > break; > } > + > + if (cpu < nr_cpu_ids) > + cpu = assign_threads(cpu, nthreads, avail, intserv); > } > > /* If no SMT supported, nthreads is forced to 1 */ > -- > 2.43.0 -aneesh
On Tue, Jan 02, 2024 at 10:16:04AM +0530, Aneesh Kumar K.V wrote: > Michael Ellerman <mpe@ellerman.id.au> writes: > > .... > > > #ifdef CONFIG_PPC64 > > int boot_cpu_hwid = -1; > > @@ -492,12 +493,26 @@ void __init smp_setup_cpu_maps(void) > > avail = !of_property_match_string(dn, > > "enable-method", "spin-table"); > > > > - cpu = assign_threads(cpu, nthreads, avail, intserv); > > + if (boot_core_hwid >= 0) { > > + if (cpu == 0) { > > + pr_info("Skipping CPU node %pOF to allow for boot core.\n", dn); > > + cpu = nthreads; > > + continue; > > + } > > > > - if (cpu >= nr_cpu_ids) { > > + if (be32_to_cpu(intserv[0]) == boot_core_hwid) { > > + pr_info("Renumbered boot core %pOF to logical 0\n", dn); > > + assign_threads(0, nthreads, avail, intserv); > > + of_node_put(dn); > > + break; > > > > I was expecting a 'continue' here. Why 'break' the loop? The condition that > should break the loop should be cpu >= nr_cpu_ids No, the patch seems correct: We're in the "if (boot_core_hwid >= 0)" branch, meaning that it was determined by early_init_dt_scan_cpus() that boot_cpuid >= nr_cpu_ids. So we loop until we get to the boot CPU, so it can be renumbered to 0. Once we do that we break, because we know we are already past nr_cpu_ids - otherwise boot_core_hwid would not be >= 0. > > + } > > + } else if (cpu >= nr_cpu_ids) { > > of_node_put(dn); > > break; > > } Here is what you expected - in case the boot CPU was < nr_cpu_ids we break as soon as nr_cpu_ids is reached. > > + > > + if (cpu < nr_cpu_ids) this ensures that CPUs between nr_cpu_ids and the boot CPU are correctly ignored in case we're already past nr_cpu_ids and only scanning further to find the boot CPU to be renumbered to 0
>>And loop in Wen Xiong and Ming Lei, who care for this issue too. Hi Pingfan, Thanks for your email! Hi Michael, Thanks for your new patchset! In past month, Our several test team have modified /etc/sysconfig/kdump file with nr_cpus=1, triggered kdump over the following IO devices and kdump works fine with generating vmcore file. Test kernel: the latest upstream kernel + your patchset the latest rhel94 kernel + backport of your patchset Test hardware: Nvme drives, FC adapter, NVmf-over-FC Test device drivers: nvme, lpfc, nvme-fc Thanks for your work! Please consideration for inclusion in upstream kernel. Thanks for your help! Wendy > the problems that have been seen. I've tested this fairly thoroughly > with a qemu script, and also a few boots on a real machine. > > If you can test it with your setups that would be great. Hopefully > there isn't some obscure case I've missed. > > cheers >
Jiri Bohac <jbohac@suse.cz> writes: > On Tue, Jan 02, 2024 at 10:16:04AM +0530, Aneesh Kumar K.V wrote: >> Michael Ellerman <mpe@ellerman.id.au> writes: >> >> .... >> >> > #ifdef CONFIG_PPC64 >> > int boot_cpu_hwid = -1; >> > @@ -492,12 +493,26 @@ void __init smp_setup_cpu_maps(void) >> > avail = !of_property_match_string(dn, >> > "enable-method", "spin-table"); >> > >> > - cpu = assign_threads(cpu, nthreads, avail, intserv); >> > + if (boot_core_hwid >= 0) { >> > + if (cpu == 0) { >> > + pr_info("Skipping CPU node %pOF to allow for boot core.\n", dn); >> > + cpu = nthreads; >> > + continue; >> > + } >> > >> > - if (cpu >= nr_cpu_ids) { >> > + if (be32_to_cpu(intserv[0]) == boot_core_hwid) { >> > + pr_info("Renumbered boot core %pOF to logical 0\n", dn); >> > + assign_threads(0, nthreads, avail, intserv); >> > + of_node_put(dn); >> > + break; >> > >> >> I was expecting a 'continue' here. Why 'break' the loop? The condition that >> should break the loop should be cpu >= nr_cpu_ids > > No, the patch seems correct: > > We're in the "if (boot_core_hwid >= 0)" branch, meaning that it > was determined by early_init_dt_scan_cpus() that boot_cpuid >= > nr_cpu_ids. So we loop until we get to the boot CPU, so it can be > renumbered to 0. Once we do that we break, because we > know we are already past nr_cpu_ids - otherwise boot_core_hwid > would not be >= 0. Yes that's exactly right. Thanks for answering for me (was on leave and still catching up). cheers
diff --git a/arch/powerpc/include/asm/smp.h b/arch/powerpc/include/asm/smp.h index aaaa576d0e15..b77927ccb0ab 100644 --- a/arch/powerpc/include/asm/smp.h +++ b/arch/powerpc/include/asm/smp.h @@ -27,6 +27,7 @@ extern int boot_cpuid; extern int boot_cpu_hwid; /* PPC64 only */ +extern int boot_core_hwid; extern int spinning_secondaries; extern u32 *cpu_to_phys_id; extern bool coregroup_enabled; diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c index 77364729a1b6..af2b70585b2e 100644 --- a/arch/powerpc/kernel/prom.c +++ b/arch/powerpc/kernel/prom.c @@ -368,8 +368,6 @@ static int __init early_init_dt_scan_cpus(unsigned long node, if (found < 0) return 0; - DBG("boot cpu: logical %d physical %d\n", found, - be32_to_cpu(intserv[found_thread])); boot_cpuid = found; if (IS_ENABLED(CONFIG_PPC64)) @@ -382,11 +380,19 @@ static int __init early_init_dt_scan_cpus(unsigned long node, } if (boot_cpuid >= nr_cpu_ids) { - set_nr_cpu_ids(min(CONFIG_NR_CPUS, ALIGN(boot_cpuid + 1, nthreads))); - pr_warn("Boot CPU %d >= nr_cpu_ids, adjusted nr_cpu_ids to %d\n", - boot_cpuid, nr_cpu_ids); + // Remember boot core for smp_setup_cpu_maps() + boot_core_hwid = be32_to_cpu(intserv[0]); + + pr_warn("Boot CPU %d (core hwid %d) >= nr_cpu_ids, adjusted boot CPU to %d\n", + boot_cpuid, boot_core_hwid, found_thread); + + // Adjust boot CPU to appear on logical core 0 + boot_cpuid = found_thread; } + DBG("boot cpu: logical %d physical %d\n", boot_cpuid, + be32_to_cpu(intserv[found_thread])); + /* * PAPR defines "logical" PVR values for cpus that * meet various levels of the architecture: diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c index d9f8ed9bd2fc..5844f3d113a5 100644 --- a/arch/powerpc/kernel/setup-common.c +++ b/arch/powerpc/kernel/setup-common.c @@ -85,6 +85,7 @@ EXPORT_SYMBOL(machine_id); int boot_cpuid = -1; EXPORT_SYMBOL_GPL(boot_cpuid); +int __initdata boot_core_hwid = -1; #ifdef CONFIG_PPC64 int boot_cpu_hwid = -1; @@ -492,12 +493,26 @@ void __init smp_setup_cpu_maps(void) avail = !of_property_match_string(dn, "enable-method", "spin-table"); - cpu = assign_threads(cpu, nthreads, avail, intserv); + if (boot_core_hwid >= 0) { + if (cpu == 0) { + pr_info("Skipping CPU node %pOF to allow for boot core.\n", dn); + cpu = nthreads; + continue; + } - if (cpu >= nr_cpu_ids) { + if (be32_to_cpu(intserv[0]) == boot_core_hwid) { + pr_info("Renumbered boot core %pOF to logical 0\n", dn); + assign_threads(0, nthreads, avail, intserv); + of_node_put(dn); + break; + } + } else if (cpu >= nr_cpu_ids) { of_node_put(dn); break; } + + if (cpu < nr_cpu_ids) + cpu = assign_threads(cpu, nthreads, avail, intserv); } /* If no SMT supported, nthreads is forced to 1 */
If nr_cpu_ids is too low to include the boot CPU, remap the boot CPU onto logical core 0. This is achieved in two stages. In early_init_dt_scan_cpus() the boot CPU is renumbered to be on logical core 0, and the original boot core's hardware ID is recorded. Later in smp_setup_cpu_maps(), if the original boot core ID is set, the logical CPU numbers on the 0th core are skipped in the normal device tree search over CPU device tree nodes. Then the search is continued until the device tree node matching the boot core is found, and those CPUs are assigned the CPU numbers starting at 0. This allows kdump kernels to be booted with low values for nr_cpu_ids to conserve memory, while also allowing the crashing/boot CPU to be any CPU. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> --- arch/powerpc/include/asm/smp.h | 1 + arch/powerpc/kernel/prom.c | 16 +++++++++++----- arch/powerpc/kernel/setup-common.c | 19 +++++++++++++++++-- 3 files changed, 29 insertions(+), 7 deletions(-)