Message ID | 20231017022806.4523-3-piliu@redhat.com (mailing list archive) |
---|---|
State | Superseded |
Series | enable nr_cpus for powerpc |
Context | Check | Description |
---|---|---|
snowpatch_ozlabs/github-powerpc_ppctests | success | Successfully ran 8 jobs. |
snowpatch_ozlabs/github-powerpc_selftests | success | Successfully ran 8 jobs. |
snowpatch_ozlabs/github-powerpc_kernel_qemu | success | Successfully ran 23 jobs. |
snowpatch_ozlabs/github-powerpc_clang | success | Successfully ran 6 jobs. |
snowpatch_ozlabs/github-powerpc_sparse | success | Successfully ran 4 jobs. |
On 17/10/23 7:58 am, Pingfan Liu wrote:
> *** Idea ***
> For kexec -p, the boot cpu can be not the cpu0, this causes the problem
> of allocating memory for paca_ptrs[]. However, in theory, there is no
> requirement to assign cpu's logical id as its present sequence in the
> device tree. But there is something like cpu_first_thread_sibling(),
> which makes assumption on the mapping inside a core. Hence partially
> loosening the mapping, i.e. unbind the mapping of core while keep the
> mapping inside a core.
>
> *** Implement ***
> At this early stage, there are plenty of memory to utilize. Hence, this
> patch allocates interim memory to link the cpu info on a list, then
> reorder cpus by changing the list head. As a result, there is a rotate
> shift between the sequence number in dt and the cpu logical number.
>
> *** Result ***
> After this patch, a boot-cpu's logical id will always be mapped into the
> range [0,threads_per_core).
>
> Besides this, at this phase, all threads in the boot core are forced to
> be onlined. This restriction will be lifted in a later patch with
> extra effort.
>
> Signed-off-by: Pingfan Liu <piliu@redhat.com>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Nicholas Piggin <npiggin@gmail.com>
> Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
> Cc: Mahesh Salgaonkar <mahesh@linux.ibm.com>
> Cc: Wen Xiong <wenxiong@us.ibm.com>
> Cc: Baoquan He <bhe@redhat.com>
> Cc: Ming Lei <ming.lei@redhat.com>
> Cc: Sourabh Jain <sourabhjain@linux.ibm.com>
> Cc: Hari Bathini <hbathini@linux.ibm.com>
> Cc: kexec@lists.infradead.org
> To: linuxppc-dev@lists.ozlabs.org

Thanks for working on this, Pingfan.
Looks good to me.

Acked-by: Hari Bathini <hbathini@linux.ibm.com>

> ---
>  arch/powerpc/kernel/prom.c         | 25 +++++----
>  arch/powerpc/kernel/setup-common.c | 84 +++++++++++++++++++++++-------
>  2 files changed, 82 insertions(+), 27 deletions(-)
>
> diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
> index ec82f5bda908..7ed9034912ca 100644
> --- a/arch/powerpc/kernel/prom.c
> +++ b/arch/powerpc/kernel/prom.c
> @@ -76,7 +76,9 @@ u64 ppc64_rma_size;
>  unsigned int boot_cpu_node_count __ro_after_init;
>  #endif
>  static phys_addr_t first_memblock_size;
> +#ifdef CONFIG_SMP
>  static int __initdata boot_cpu_count;
> +#endif
>
>  static int __init early_parse_mem(char *p)
>  {
> @@ -331,8 +333,7 @@ static int __init early_init_dt_scan_cpus(unsigned long node,
>  	const __be32 *intserv;
>  	int i, nthreads;
>  	int len;
> -	int found = -1;
> -	int found_thread = 0;
> +	bool found = false;
>
>  	/* We are scanning "cpu" nodes only */
>  	if (type == NULL || strcmp(type, "cpu") != 0)
> @@ -355,8 +356,15 @@ static int __init early_init_dt_scan_cpus(unsigned long node,
>  	for (i = 0; i < nthreads; i++) {
>  		if (be32_to_cpu(intserv[i]) ==
>  		    fdt_boot_cpuid_phys(initial_boot_params)) {
> -			found = boot_cpu_count;
> -			found_thread = i;
> +			/*
> +			 * always map the boot-cpu logical id into the
> +			 * range of [0, thread_per_core)
> +			 */
> +			boot_cpuid = i;
> +			found = true;
> +			/* This forces all threads in a core to be online */
> +			if (nr_cpu_ids % nthreads != 0)
> +				set_nr_cpu_ids(ALIGN(nr_cpu_ids, nthreads));
>  		}
>  #ifdef CONFIG_SMP
>  		/* logical cpu id is always 0 on UP kernels */
> @@ -365,14 +373,13 @@ static int __init early_init_dt_scan_cpus(unsigned long node,
>  	}
>
>  	/* Not the boot CPU */
> -	if (found < 0)
> +	if (!found)
>  		return 0;
>
> -	DBG("boot cpu: logical %d physical %d\n", found,
> -	    be32_to_cpu(intserv[found_thread]));
> -	boot_cpuid = found;
> +	DBG("boot cpu: logical %d physical %d\n", boot_cpuid,
> +	    be32_to_cpu(intserv[boot_cpuid]));
>
> -	boot_cpu_hwid = be32_to_cpu(intserv[found_thread]);
> +	boot_cpu_hwid = be32_to_cpu(intserv[boot_cpuid]);
>
>  	/*
>  	 * PAPR defines "logical" PVR values for cpus that
> diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c
> index 707f0490639d..9802c7e5ee2f 100644
> --- a/arch/powerpc/kernel/setup-common.c
> +++ b/arch/powerpc/kernel/setup-common.c
> @@ -36,6 +36,7 @@
>  #include <linux/of_irq.h>
>  #include <linux/hugetlb.h>
>  #include <linux/pgtable.h>
> +#include <linux/list.h>
>  #include <asm/io.h>
>  #include <asm/paca.h>
>  #include <asm/processor.h>
> @@ -425,6 +426,13 @@ static void __init cpu_init_thread_core_maps(int tpc)
>
>  u32 *cpu_to_phys_id = NULL;
>
> +struct interrupt_server_node {
> +	struct list_head node;
> +	bool avail;
> +	int len;
> +	__be32 intserv[];
> +};
> +
>  /**
>   * setup_cpu_maps - initialize the following cpu maps:
>   *                  cpu_possible_mask
> @@ -446,11 +454,16 @@ u32 *cpu_to_phys_id = NULL;
>  void __init smp_setup_cpu_maps(void)
>  {
>  	struct device_node *dn;
> -	int cpu = 0;
> -	int nthreads = 1;
> +	int shift = 0, cpu = 0;
> +	int j, nthreads = 1;
> +	int len;
> +	struct interrupt_server_node *intserv_node, *n;
> +	struct list_head *bt_node, head;
> +	bool avail, found_boot_cpu = false;
>
>  	DBG("smp_setup_cpu_maps()\n");
>
> +	INIT_LIST_HEAD(&head);
>  	cpu_to_phys_id = memblock_alloc(nr_cpu_ids * sizeof(u32),
>  					__alignof__(u32));
>  	if (!cpu_to_phys_id)
> @@ -460,7 +473,6 @@ void __init smp_setup_cpu_maps(void)
>  	for_each_node_by_type(dn, "cpu") {
>  		const __be32 *intserv;
>  		__be32 cpu_be;
> -		int j, len;
>
>  		DBG(" * %pOF...\n", dn);
>
> @@ -480,29 +492,65 @@ void __init smp_setup_cpu_maps(void)
>  			}
>  		}
>
> -		nthreads = len / sizeof(int);
> +		avail = of_device_is_available(dn);
> +		if (!avail)
> +			avail = !of_property_match_string(dn,
> +					"enable-method", "spin-table");
>
> -		for (j = 0; j < nthreads && cpu < nr_cpu_ids; j++) {
> -			bool avail;
>
> -			DBG(" thread %d -> cpu %d (hard id %d)\n",
> -			    j, cpu, be32_to_cpu(intserv[j]));
> -
> -			avail = of_device_is_available(dn);
> -			if (!avail)
> -				avail = !of_property_match_string(dn,
> -						"enable-method", "spin-table");
> +		intserv_node = memblock_alloc(sizeof(struct interrupt_server_node) + len,
> +					__alignof__(u32));
> +		if (!intserv_node)
> +			panic("%s: Failed to allocate %zu bytes align=0x%zx\n",
> +				__func__,
> +				sizeof(struct interrupt_server_node) + len,
> +				__alignof__(u32));
> +		intserv_node->len = len;
> +		memcpy(intserv_node->intserv, intserv, len);
> +		intserv_node->avail = avail;
> +		list_add_tail(&intserv_node->node, &head);
> +
> +		if (!found_boot_cpu) {
> +			nthreads = len / sizeof(int);
> +			for (j = 0 ; j < nthreads; j++) {
> +				if (be32_to_cpu(intserv[j]) == boot_cpu_hwid) {
> +					bt_node = &intserv_node->node;
> +					found_boot_cpu = true;
> +					/*
> +					 * Record the round-shift between dt
> +					 * seq and cpu logical number
> +					 */
> +					shift = cpu - j;
> +					break;
> +				}
> +
> +				cpu++;
> +			}
> +		}
>
> +	}
> +	cpu = 0;
> +	list_del_init(&head);
> +	/* Select the primary thread, the boot cpu's slibing, as the logic 0 */
> +	list_add_tail(&head, bt_node);
> +	pr_info("the round shift between dt seq and the cpu logic number: %d\n", shift);
> +	list_for_each_entry(intserv_node, &head, node) {
> +
> +		avail = intserv_node->avail;
> +		nthreads = intserv_node->len / sizeof(int);
> +		for (j = 0; j < nthreads && cpu < nr_cpu_ids; j++) {
>  			set_cpu_present(cpu, avail);
>  			set_cpu_possible(cpu, true);
> -			cpu_to_phys_id[cpu] = be32_to_cpu(intserv[j]);
> +			cpu_to_phys_id[cpu] = be32_to_cpu(intserv_node->intserv[j]);
> +			DBG(" thread %d -> cpu %d (hard id %d)\n",
> +			    j, cpu, be32_to_cpu(intserv_node->intserv[j]));
>  			cpu++;
>  		}
> +	}
>
> -		if (cpu >= nr_cpu_ids) {
> -			of_node_put(dn);
> -			break;
> -		}
> +	list_for_each_entry_safe(intserv_node, n, &head, node) {
> +		len = sizeof(struct interrupt_server_node) + intserv_node->len;
> +		memblock_free(intserv_node, len);
>  	}
>
>  	/* If no SMT supported, nthreads is forced to 1 */
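The reordering described under "Implement" in the quoted message can be modelled outside the kernel as a ring walk over the cores that starts at the boot core. The following is a minimal userspace sketch, not the patch itself. The names find_boot_core(), assign_logical_ids(), first_thread_sibling() and align_nr_cpu_ids() are hypothetical, as are the flat hwids[] array and the assumption that every core has the same power-of-two number of threads.

```c
#include <assert.h>

/*
 * Hypothetical userspace model of the renumbering, not kernel code.
 * Assumption: hwids[] holds the hardware thread ids of ncores cores in
 * device-tree order, tpc threads per core, tpc a power of two.
 */

/* Index (in device-tree order) of the core containing boot_hwid, or -1. */
static int find_boot_core(const unsigned int *hwids, int ncores, int tpc,
			  unsigned int boot_hwid)
{
	for (int core = 0; core < ncores; core++)
		for (int t = 0; t < tpc; t++)
			if (hwids[core * tpc + t] == boot_hwid)
				return core;
	return -1;
}

/* Round nr_cpu_ids up to a whole number of cores (what ALIGN() computes). */
static int align_nr_cpu_ids(int nr_cpu_ids, int tpc)
{
	assert(tpc > 0 && (tpc & (tpc - 1)) == 0);
	return (nr_cpu_ids + tpc - 1) & ~(tpc - 1);
}

/*
 * Walk the cores as a ring starting at the boot core and hand out logical
 * ids 0, 1, 2, ... until nr_cpu_ids is reached. Returns the logical id of
 * the boot thread, or -1 if it was not assigned one.
 */
static int assign_logical_ids(const unsigned int *hwids, int ncores, int tpc,
			      unsigned int boot_hwid, int nr_cpu_ids,
			      unsigned int *cpu_to_hwid)
{
	int boot_core = find_boot_core(hwids, ncores, tpc, boot_hwid);
	int boot_cpu = -1;
	int cpu = 0;

	if (boot_core < 0)
		return -1;

	for (int k = 0; k < ncores; k++) {
		int core = (boot_core + k) % ncores;	/* rotate, keep core order */

		for (int t = 0; t < tpc && cpu < nr_cpu_ids; t++, cpu++) {
			cpu_to_hwid[cpu] = hwids[core * tpc + t];
			if (cpu_to_hwid[cpu] == boot_hwid)
				boot_cpu = cpu;
		}
	}
	return boot_cpu;
}

/* Same arithmetic as cpu_first_thread_sibling() for a power-of-two tpc. */
static int first_thread_sibling(int cpu, int tpc)
{
	return cpu & ~(tpc - 1);
}
```

With three 4-thread cores and the boot thread being the third thread of the second core, the model gives the boot thread logical id 2 even when nr_cpu_ids is 4, and first_thread_sibling(2, 4) is 0, so the within-core mapping that cpu_first_thread_sibling() relies on is preserved.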
On Tue, Oct 17, 2023 at 6:39 PM Hari Bathini <hbathini@linux.ibm.com> wrote:
>
> Thanks for working on this, Pingfan.
> Looks good to me.
>
> Acked-by: Hari Bathini <hbathini@linux.ibm.com>
>

Thank you for kindly reviewing. I hope that after all these years, we
have accomplished the objective.

Best Regards,

Pingfan
On 18/10/23 1:51 pm, Pingfan Liu wrote:
> On Tue, Oct 17, 2023 at 6:39 PM Hari Bathini <hbathini@linux.ibm.com> wrote:
>>
>> Thanks for working on this, Pingfan.
>> Looks good to me.
>>
>> Acked-by: Hari Bathini <hbathini@linux.ibm.com>
>>
>
> Thank you for kindly reviewing. I hope that after all these years, we
> have accomplished the objective.
>

I hope so too.
Thanks!
With this patch series applied first and kdump kernel boots fine with
nr_cpus=1 on both PowerNV and PowerVM platforms.

For both patches:
Tested-by: Sourabh Jain <sourabhjain@linux.ibm.com>

- Sourabh Jain

On 17/10/23 07:58, Pingfan Liu wrote:
> *** Idea ***
> For kexec -p, the boot cpu can be not the cpu0, this causes the problem
> of allocating memory for paca_ptrs[]. However, in theory, there is no
> requirement to assign cpu's logical id as its present sequence in the
> device tree. But there is something like cpu_first_thread_sibling(),
> which makes assumption on the mapping inside a core. Hence partially
> loosening the mapping, i.e. unbind the mapping of core while keep the
> mapping inside a core.
>
> *** Implement ***
> At this early stage, there are plenty of memory to utilize. Hence, this
> patch allocates interim memory to link the cpu info on a list, then
> reorder cpus by changing the list head. As a result, there is a rotate
> shift between the sequence number in dt and the cpu logical number.
>
> *** Result ***
> After this patch, a boot-cpu's logical id will always be mapped into the
> range [0,threads_per_core).
>
> Besides this, at this phase, all threads in the boot core are forced to
> be onlined. This restriction will be lifted in a later patch with
> extra effort.
>
> Signed-off-by: Pingfan Liu <piliu@redhat.com>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Nicholas Piggin <npiggin@gmail.com>
> Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
> Cc: Mahesh Salgaonkar <mahesh@linux.ibm.com>
> Cc: Wen Xiong <wenxiong@us.ibm.com>
> Cc: Baoquan He <bhe@redhat.com>
> Cc: Ming Lei <ming.lei@redhat.com>
> Cc: Sourabh Jain <sourabhjain@linux.ibm.com>
> Cc: Hari Bathini <hbathini@linux.ibm.com>
> Cc: kexec@lists.infradead.org
> To: linuxppc-dev@lists.ozlabs.org
Hi Pingfan, Michael,

On 17/10/23 4:03 pm, Hari Bathini wrote:
>
>
> On 17/10/23 7:58 am, Pingfan Liu wrote:
>> *** Idea ***
>> For kexec -p, the boot cpu can be not the cpu0, this causes the problem
>> of allocating memory for paca_ptrs[]. However, in theory, there is no
>> requirement to assign cpu's logical id as its present sequence in the
>> device tree. But there is something like cpu_first_thread_sibling(),
>> which makes assumption on the mapping inside a core. Hence partially
>> loosening the mapping, i.e. unbind the mapping of core while keep the
>> mapping inside a core.
>>
>> *** Implement ***
>> At this early stage, there are plenty of memory to utilize. Hence, this
>> patch allocates interim memory to link the cpu info on a list, then
>> reorder cpus by changing the list head. As a result, there is a rotate
>> shift between the sequence number in dt and the cpu logical number.
>>
>> *** Result ***
>> After this patch, a boot-cpu's logical id will always be mapped into the
>> range [0,threads_per_core).
>>
>> Besides this, at this phase, all threads in the boot core are forced to
>> be onlined. This restriction will be lifted in a later patch with
>> extra effort.
>>
>> Signed-off-by: Pingfan Liu <piliu@redhat.com>
>> Cc: Michael Ellerman <mpe@ellerman.id.au>
>> Cc: Nicholas Piggin <npiggin@gmail.com>
>> Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
>> Cc: Mahesh Salgaonkar <mahesh@linux.ibm.com>
>> Cc: Wen Xiong <wenxiong@us.ibm.com>
>> Cc: Baoquan He <bhe@redhat.com>
>> Cc: Ming Lei <ming.lei@redhat.com>
>> Cc: Sourabh Jain <sourabhjain@linux.ibm.com>
>> Cc: Hari Bathini <hbathini@linux.ibm.com>
>> Cc: kexec@lists.infradead.org
>> To: linuxppc-dev@lists.ozlabs.org
>
> Thanks for working on this, Pingfan.
> Looks good to me.
>
> Acked-by: Hari Bathini <hbathini@linux.ibm.com>
>

On second thoughts, probably better off with no impact for
bootcpu < nr_cpu_ids case and changing only two cores logical
numbering otherwise. Something like the below (Please share
your thoughts):

diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
index ec82f5bda908..78a8312aa8c4 100644
--- a/arch/powerpc/kernel/prom.c
+++ b/arch/powerpc/kernel/prom.c
@@ -76,7 +76,9 @@ u64 ppc64_rma_size;
 unsigned int boot_cpu_node_count __ro_after_init;
 #endif
 static phys_addr_t first_memblock_size;
+#ifdef CONFIG_SMP
 static int __initdata boot_cpu_count;
+#endif

 static int __init early_parse_mem(char *p)
 {
@@ -357,6 +359,25 @@ static int __init early_init_dt_scan_cpus(unsigned long node,
 		    fdt_boot_cpuid_phys(initial_boot_params)) {
 			found = boot_cpu_count;
 			found_thread = i;
+			/*
+			 * Map boot-cpu logical id into the range
+			 * of [0, thread_per_core) if it can't be
+			 * accommodated within nr_cpu_ids.
+			 */
+			if (i != boot_cpu_count && boot_cpu_count >= nr_cpu_ids) {
+				boot_cpuid = i;
+				DBG("Logical CPU number for boot CPU changed from %d to %d\n",
+				    boot_cpu_count, i);
+			} else {
+				boot_cpuid = boot_cpu_count;
+			}
+
+			/* Ensure boot thread is acconted for in nr_cpu_ids */
+			if (boot_cpuid >= nr_cpu_ids) {
+				set_nr_cpu_ids(boot_cpuid + 1);
+				DBG("Adjusted nr_cpu_ids to %u, to include boot CPU.\n",
+				    nr_cpu_ids);
+			}
 		}
 #ifdef CONFIG_SMP
 		/* logical cpu id is always 0 on UP kernels */
@@ -368,9 +389,8 @@ static int __init early_init_dt_scan_cpus(unsigned long node,
 	if (found < 0)
 		return 0;

-	DBG("boot cpu: logical %d physical %d\n", found,
+	DBG("boot cpu: logical %d physical %d\n", boot_cpuid,
 	    be32_to_cpu(intserv[found_thread]));
-	boot_cpuid = found;

 	boot_cpu_hwid = be32_to_cpu(intserv[found_thread]);

diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c
index b7b733474b60..f7179525c774 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -409,6 +409,12 @@ static void __init cpu_init_thread_core_maps(int tpc)

 u32 *cpu_to_phys_id = NULL;

+struct interrupt_server_node {
+	bool avail;
+	int len;
+	__be32 intserv[];
+};
+
 /**
  * setup_cpu_maps - initialize the following cpu maps:
  *                  cpu_possible_mask
@@ -429,9 +435,13 @@ u32 *cpu_to_phys_id = NULL;
  */
 void __init smp_setup_cpu_maps(void)
 {
+	struct interrupt_server_node *core0_node = NULL, *bt_node = NULL;
+	int orig_boot_cpu = -1, orig_boot_thread = -1;
+	bool found_boot_cpu = false;
 	struct device_node *dn;
-	int cpu = 0;
 	int nthreads = 1;
+	int cpu = 0;
+	int j, len;

 	DBG("smp_setup_cpu_maps()\n");

@@ -442,9 +452,9 @@ void __init smp_setup_cpu_maps(void)
 		      __func__, nr_cpu_ids * sizeof(u32), __alignof__(u32));

 	for_each_node_by_type(dn, "cpu") {
+		bool avail, skip = false;
 		const __be32 *intserv;
 		__be32 cpu_be;
-		int j, len;

 		DBG(" * %pOF...\n", dn);

@@ -466,29 +476,121 @@ void __init smp_setup_cpu_maps(void)

 		nthreads = len / sizeof(int);

-		for (j = 0; j < nthreads && cpu < nr_cpu_ids; j++) {
-			bool avail;
+		avail = of_device_is_available(dn);
+		if (!avail)
+			avail = !of_property_match_string(dn,
+					"enable-method", "spin-table");
+
+		for (j = 0; (cpu == 0 || !found_boot_cpu) && j < nthreads; j++) {
+			if (be32_to_cpu(intserv[j]) == boot_cpu_hwid) {
+				found_boot_cpu = true;
+				if (cpu == 0)
+					break;
+
+				/* Original logical CPU number of thread0 in boot core */
+				orig_boot_cpu = cpu;
+				orig_boot_thread = j;
+				bt_node = memblock_alloc(sizeof(struct interrupt_server_node) + len,
+						__alignof__(u32));
+				if (!bt_node)
+					panic("%s: Failed to allocate %zu bytes align=0x%zx\n",
+						__func__,
+						sizeof(struct interrupt_server_node) + len,
+						__alignof__(u32));
+				bt_node->len = len;
+				memcpy(bt_node->intserv, intserv, len);
+				bt_node->avail = avail;
+				skip = true;
+				break;
+			}
+		}

+		/*
+		 * Boot CPU not on core0.
+		 * Hold off adding core0 until boot core is found as core0
+		 * may have to be replaced with boot core if boot core can
+		 * not be accommodated within nr_cpu_ids with its original
+		 * logical CPU numbering.
+		 */
+		if (cpu == 0 && !found_boot_cpu) {
+			core0_node = memblock_alloc(sizeof(struct interrupt_server_node) + len,
+					__alignof__(u32));
+			if (!core0_node)
+				panic("%s: Failed to allocate %zu bytes align=0x%zx\n",
+					__func__,
+					sizeof(struct interrupt_server_node) + len,
+					__alignof__(u32));
+			core0_node->len = len;
+			memcpy(core0_node->intserv, intserv, len);
+			core0_node->avail = avail;
+			skip = true;
+		}
+
+		if (skip) {
+			/* Assumes same number of threads for all cores */
+			cpu += nthreads;
+			continue;
+		}
+
+		for (j = 0; j < nthreads && cpu < nr_cpu_ids; j++) {
 			DBG(" thread %d -> cpu %d (hard id %d)\n",
 			    j, cpu, be32_to_cpu(intserv[j]));

-			avail = of_device_is_available(dn);
-			if (!avail)
-				avail = !of_property_match_string(dn,
-						"enable-method", "spin-table");
-
 			set_cpu_present(cpu, avail);
 			set_cpu_possible(cpu, true);
 			cpu_to_phys_id[cpu] = be32_to_cpu(intserv[j]);
 			cpu++;
 		}

-		if (cpu >= nr_cpu_ids) {
+		if (found_boot_cpu && cpu >= nr_cpu_ids) {
 			of_node_put(dn);
 			break;
 		}
 	}

+	/*
+	 * Boot CPU not on core0.
+	 *
+	 * If nr_cpu_ids does not accommodate the original logical CPU numbering for
+	 * boot CPU core, use logical CPU numbers 0 to nthreads for boot CPU core.
+	 * Note that boot cpu is already assigned with logical CPU number somewhere
+	 * between 0 to nthreads (depending on the boot thread within the core) in
+	 * early_init_dt_scan_cpus() for this case.
+	 *
+	 * Otherwise, stick with the original logical CPU numbering.
+	 */
+	if (bt_node) {
+		int core0_cpu;
+
+		if (orig_boot_cpu + orig_boot_thread >= nr_cpu_ids) {
+			cpu = 0;
+			core0_cpu = orig_boot_cpu;
+		} else {
+			cpu = orig_boot_cpu;
+			core0_cpu = 0;
+		}
+
+		for (j = 0; j < nthreads && core0_cpu < nr_cpu_ids; j++) {
+			DBG(" thread %d -> cpu %d (hard id %d)\n",
+			    j, core0_cpu, be32_to_cpu(core0_node->intserv[j]));
+
+			set_cpu_present(core0_cpu, core0_node->avail);
+			set_cpu_possible(core0_cpu, true);
+			cpu_to_phys_id[core0_cpu] = be32_to_cpu(core0_node->intserv[j]);
+			core0_cpu++;
+		}
+
+		for (j = 0; j < nthreads && cpu < nr_cpu_ids; j++) {
+			DBG(" thread %d -> cpu %d (hard id %d)\n",
+			    j, cpu, be32_to_cpu(bt_node->intserv[j]));
+
+			set_cpu_present(cpu, bt_node->avail);
+			set_cpu_possible(cpu, true);
+			cpu_to_phys_id[cpu] = be32_to_cpu(bt_node->intserv[j]);
+			cpu++;
+		}
+	}
+
 	/* If no SMT supported, nthreads is forced to 1 */
 	if (!cpu_has_feature(CPU_FTR_SMT)) {
 		DBG(" SMT disabled ! nthreads forced to 1\n");
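The boot-CPU numbering rule in the prom.c hunk of the proposal above reduces to a small decision. The sketch below restates it as a standalone C function for illustration only. struct boot_numbering, pick_boot_cpuid() and its parameter names are hypothetical, with dt_seq standing in for boot_cpu_count and thread for the loop index i.

```c
#include <assert.h>

/* Hypothetical result type: the chosen logical id and the adjusted limit. */
struct boot_numbering {
	int boot_cpuid;
	int nr_cpu_ids;
};

/*
 * dt_seq:     logical id the boot thread would get in device-tree order
 * thread:     index of the boot thread within its core
 * nr_cpu_ids: CPU limit in effect (for example from nr_cpus=N)
 */
static struct boot_numbering pick_boot_cpuid(int dt_seq, int thread,
					     int nr_cpu_ids)
{
	struct boot_numbering r = { dt_seq, nr_cpu_ids };

	assert(thread >= 0 && thread <= dt_seq && nr_cpu_ids > 0);

	/* Renumber only when the device-tree position does not fit. */
	if (thread != dt_seq && dt_seq >= nr_cpu_ids)
		r.boot_cpuid = thread;

	/* Make sure the boot thread itself is covered by nr_cpu_ids. */
	if (r.boot_cpuid >= r.nr_cpu_ids)
		r.nr_cpu_ids = r.boot_cpuid + 1;

	return r;
}
```

For a boot thread at device-tree position 10 that is thread 2 of its core, a limit of 16 keeps logical id 10, a limit of 4 renumbers it to 2, and a limit of 1 renumbers it to 2 and raises the limit to 3.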
Hi Hari,

On Mon, Nov 27, 2023 at 12:30 PM Hari Bathini <hbathini@linux.ibm.com> wrote:
>
> Hi Pingfan, Michael,
>
> On second thoughts, probably better off with no impact for
> bootcpu < nr_cpu_ids case and changing only two cores logical
> numbering otherwise. Something like the below (Please share
> your thoughts):
>

I am afraid that it may not be as ideal as it looks, considering the
following factors:
-1. For the case of 'bootcpu < nr_cpu_ids', crash can happen evenly
across any cpu in the system, which seriously undermines the
protection intended here (Under the most optimistic scenario, there is
a 50% chance of success)

-2. For the re-ordering of logical numbering, IMHO, if there is
concern that re-ordering will break something, the partial re-ordering
can not avoid that. We ought to spot probable hazards so as to ease
worries.

Thanks,

Pingfan
> + */ > + if (bt_node) { > + int core0_cpu; > + > + if (orig_boot_cpu + orig_boot_thread >= nr_cpu_ids) { > + cpu = 0; > + core0_cpu = orig_boot_cpu; > + } else { > + cpu = orig_boot_cpu; > + core0_cpu = 0; > + } > + > + for (j = 0; j < nthreads && core0_cpu < nr_cpu_ids; j++) { > + DBG(" thread %d -> cpu %d (hard id %d)\n", > + j, core0_cpu, be32_to_cpu(core0_node->intserv[j])); > + > + set_cpu_present(core0_cpu, core0_node->avail); > + set_cpu_possible(core0_cpu, true); > + cpu_to_phys_id[core0_cpu] = be32_to_cpu(core0_node->intserv[j]); > + core0_cpu++; > + } > + > + for (j = 0; j < nthreads && cpu < nr_cpu_ids; j++) { > + DBG(" thread %d -> cpu %d (hard id %d)\n", > + j, cpu, be32_to_cpu(bt_node->intserv[j])); > + > + set_cpu_present(cpu, bt_node->avail); > + set_cpu_possible(cpu, true); > + cpu_to_phys_id[cpu] = be32_to_cpu(bt_node->intserv[j]); > + cpu++; > + } > + } > + > /* If no SMT supported, nthreads is forced to 1 */ > if (!cpu_has_feature(CPU_FTR_SMT)) { > DBG(" SMT disabled ! nthreads forced to 1\n"); >
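For readers comparing the two schemes, the numbering that the proposal quoted above would produce can be sketched roughly as follows. This is only an illustration under stated assumptions: swapped_logical_cpu() is a hypothetical helper and not code from either patch, it assumes every core has the same number of threads (tpc), and it ignores the truncation at nr_cpu_ids that the real loops apply.

```c
/*
 * Hypothetical helper (not from the patch): logical cpu number of
 * thread `thread` in device-tree core `core` under the "swap two
 * cores" proposal. Assumes tpc threads in every core.
 */
static int swapped_logical_cpu(int core, int thread, int boot_core,
			       int boot_thread, int tpc, int nr_cpu_ids)
{
	/* Logical number of thread 0 of the boot core in plain dt order */
	int orig_boot_cpu = boot_core * tpc;
	/* Swap only if the boot cpu does not fit with its original number */
	int swap = orig_boot_cpu + boot_thread >= nr_cpu_ids;

	if (swap && core == boot_core)
		core = 0;		/* boot core takes core0's slot */
	else if (swap && core == 0)
		core = boot_core;	/* core0 takes the boot core's slot */

	return core * tpc + thread;
}
```

With 8 threads per core, the boot cpu on thread 5 of dt core 2 and nr_cpu_ids = 8, the boot core and core0 trade places, so the boot cpu becomes logical cpu 5 and dt core 0 starts at 16. With nr_cpu_ids = 32 nothing changes and the boot cpu keeps logical number 21.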
Hi Michael,

I am fine with either approach. I was trying to address your concerns
in my way. Looking for your inputs here on how to go about this now..

On 29/11/23 7:00 am, Pingfan Liu wrote:
> I am afraid that it may not be as ideal as it looks, considering the
> following factors:
> -1. For the case of 'bootcpu < nr_cpu_ids', crash can happen evenly
> across any cpu in the system, which seriously undermines the
> protection intended here (Under the most optimistic scenario, there is
> a 50% chance of success)
>
> -2. For the re-ordering of logical numbering, IMHO, if there is
> concern that re-ordering will break something, the partial re-ordering
> can not avoid that. We ought to spot probable hazards so as to ease
> worries.

Thanks
Hari
On 09/01/24 9:57 am, Hari Bathini wrote:
> Hi Michael,
>

Sorry, Michael. I am just about getting back to work and I spoke too
soon. You already seem to have posted a set with the approach you had
in mind:

https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=388350

Thanks
Hari

> I am fine with either approach. I was trying to address your concerns
> in my way. Looking for your inputs here on how to go about this now..
diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
index ec82f5bda908..7ed9034912ca 100644
--- a/arch/powerpc/kernel/prom.c
+++ b/arch/powerpc/kernel/prom.c
@@ -76,7 +76,9 @@ u64 ppc64_rma_size;
 unsigned int boot_cpu_node_count __ro_after_init;
 #endif
 static phys_addr_t first_memblock_size;
+#ifdef CONFIG_SMP
 static int __initdata boot_cpu_count;
+#endif

 static int __init early_parse_mem(char *p)
 {
@@ -331,8 +333,7 @@ static int __init early_init_dt_scan_cpus(unsigned long node,
 	const __be32 *intserv;
 	int i, nthreads;
 	int len;
-	int found = -1;
-	int found_thread = 0;
+	bool found = false;

 	/* We are scanning "cpu" nodes only */
 	if (type == NULL || strcmp(type, "cpu") != 0)
@@ -355,8 +356,15 @@ static int __init early_init_dt_scan_cpus(unsigned long node,
 	for (i = 0; i < nthreads; i++) {
 		if (be32_to_cpu(intserv[i]) ==
 		    fdt_boot_cpuid_phys(initial_boot_params)) {
-			found = boot_cpu_count;
-			found_thread = i;
+			/*
+			 * always map the boot-cpu logical id into the
+			 * range of [0, thread_per_core)
+			 */
+			boot_cpuid = i;
+			found = true;
+			/* This forces all threads in a core to be online */
+			if (nr_cpu_ids % nthreads != 0)
+				set_nr_cpu_ids(ALIGN(nr_cpu_ids, nthreads));
 		}
 #ifdef CONFIG_SMP
 		/* logical cpu id is always 0 on UP kernels */
@@ -365,14 +373,13 @@ static int __init early_init_dt_scan_cpus(unsigned long node,
 	}

 	/* Not the boot CPU */
-	if (found < 0)
+	if (!found)
 		return 0;

-	DBG("boot cpu: logical %d physical %d\n", found,
-	    be32_to_cpu(intserv[found_thread]));
-	boot_cpuid = found;
+	DBG("boot cpu: logical %d physical %d\n", boot_cpuid,
+	    be32_to_cpu(intserv[boot_cpuid]));

-	boot_cpu_hwid = be32_to_cpu(intserv[found_thread]);
+	boot_cpu_hwid = be32_to_cpu(intserv[boot_cpuid]);

 	/*
 	 * PAPR defines "logical" PVR values for cpus that
diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c
index 707f0490639d..9802c7e5ee2f 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -36,6 +36,7 @@
 #include <linux/of_irq.h>
 #include <linux/hugetlb.h>
 #include <linux/pgtable.h>
+#include <linux/list.h>
 #include <asm/io.h>
 #include <asm/paca.h>
 #include <asm/processor.h>
@@ -425,6 +426,13 @@ static void __init cpu_init_thread_core_maps(int tpc)

 u32 *cpu_to_phys_id = NULL;

+struct interrupt_server_node {
+	struct list_head node;
+	bool avail;
+	int len;
+	__be32 intserv[];
+};
+
 /**
  * setup_cpu_maps - initialize the following cpu maps:
  * cpu_possible_mask
@@ -446,11 +454,16 @@ u32 *cpu_to_phys_id = NULL;
 void __init smp_setup_cpu_maps(void)
 {
 	struct device_node *dn;
-	int cpu = 0;
-	int nthreads = 1;
+	int shift = 0, cpu = 0;
+	int j, nthreads = 1;
+	int len;
+	struct interrupt_server_node *intserv_node, *n;
+	struct list_head *bt_node, head;
+	bool avail, found_boot_cpu = false;

 	DBG("smp_setup_cpu_maps()\n");

+	INIT_LIST_HEAD(&head);
 	cpu_to_phys_id = memblock_alloc(nr_cpu_ids * sizeof(u32),
 					__alignof__(u32));
 	if (!cpu_to_phys_id)
@@ -460,7 +473,6 @@ void __init smp_setup_cpu_maps(void)
 	for_each_node_by_type(dn, "cpu") {
 		const __be32 *intserv;
 		__be32 cpu_be;
-		int j, len;

 		DBG(" * %pOF...\n", dn);

@@ -480,29 +492,65 @@ void __init smp_setup_cpu_maps(void)
 			}
 		}

-		nthreads = len / sizeof(int);
+		avail = of_device_is_available(dn);
+		if (!avail)
+			avail = !of_property_match_string(dn,
+					"enable-method", "spin-table");

-		for (j = 0; j < nthreads && cpu < nr_cpu_ids; j++) {
-			bool avail;

-			DBG(" thread %d -> cpu %d (hard id %d)\n",
-			    j, cpu, be32_to_cpu(intserv[j]));
-
-			avail = of_device_is_available(dn);
-			if (!avail)
-				avail = !of_property_match_string(dn,
-						"enable-method", "spin-table");

+		intserv_node = memblock_alloc(sizeof(struct interrupt_server_node) + len,
+					__alignof__(u32));
+		if (!intserv_node)
+			panic("%s: Failed to allocate %zu bytes align=0x%zx\n",
+				__func__,
+				sizeof(struct interrupt_server_node) + len,
+				__alignof__(u32));
+		intserv_node->len = len;
+		memcpy(intserv_node->intserv, intserv, len);
+		intserv_node->avail = avail;
+		list_add_tail(&intserv_node->node, &head);
+
+		if (!found_boot_cpu) {
+			nthreads = len / sizeof(int);
+			for (j = 0 ; j < nthreads; j++) {
+				if (be32_to_cpu(intserv[j]) == boot_cpu_hwid) {
+					bt_node = &intserv_node->node;
+					found_boot_cpu = true;
+					/*
+					 * Record the round-shift between dt
+					 * seq and cpu logical number
+					 */
+					shift = cpu - j;
+					break;
+				}
+
+				cpu++;
+			}
+		}
+	}
+	cpu = 0;
+	list_del_init(&head);
+	/* Select the primary thread, the boot cpu's slibing, as the logic 0 */
+	list_add_tail(&head, bt_node);
+	pr_info("the round shift between dt seq and the cpu logic number: %d\n", shift);
+	list_for_each_entry(intserv_node, &head, node) {
+
+		avail = intserv_node->avail;
+		nthreads = intserv_node->len / sizeof(int);
+		for (j = 0; j < nthreads && cpu < nr_cpu_ids; j++) {
 			set_cpu_present(cpu, avail);
 			set_cpu_possible(cpu, true);
-			cpu_to_phys_id[cpu] = be32_to_cpu(intserv[j]);
+			cpu_to_phys_id[cpu] = be32_to_cpu(intserv_node->intserv[j]);
+			DBG(" thread %d -> cpu %d (hard id %d)\n",
+			    j, cpu, be32_to_cpu(intserv_node->intserv[j]));
 			cpu++;
 		}
+	}

-		if (cpu >= nr_cpu_ids) {
-			of_node_put(dn);
-			break;
-		}
+	list_for_each_entry_safe(intserv_node, n, &head, node) {
+		len = sizeof(struct interrupt_server_node) + intserv_node->len;
+		memblock_free(intserv_node, len);
 	}

 	/* If no SMT supported, nthreads is forced to 1 */
*** Idea ***
For kexec -p, the boot cpu may not be cpu0, which causes a problem
when allocating memory for paca_ptrs[]. In theory, however, there is no
requirement that a cpu's logical id follow its order of appearance in
the device tree. But helpers such as cpu_first_thread_sibling() make
assumptions about the mapping inside a core. Hence the mapping is only
partially loosened: the mapping between cores is unbound, while the
mapping inside a core is kept.

*** Implement ***
At this early stage, there is plenty of memory to utilize. Hence, this
patch allocates interim memory to link the cpu info on a list, then
reorders the cpus by changing the list head. As a result, there is a
rotate shift between the sequence number in the dt and the cpu logical
number.

*** Result ***
After this patch, the boot cpu's logical id will always be mapped into
the range [0,threads_per_core).

Besides this, at this phase, all threads in the boot core are forced to
be onlined. This restriction will be lifted in a later patch with
extra effort.

Signed-off-by: Pingfan Liu <piliu@redhat.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Mahesh Salgaonkar <mahesh@linux.ibm.com>
Cc: Wen Xiong <wenxiong@us.ibm.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Sourabh Jain <sourabhjain@linux.ibm.com>
Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: kexec@lists.infradead.org
To: linuxppc-dev@lists.ozlabs.org
---
 arch/powerpc/kernel/prom.c         | 25 +++++----
 arch/powerpc/kernel/setup-common.c | 84 +++++++++++++++++++++++-------
 2 files changed, 82 insertions(+), 27 deletions(-)
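
The rotate shift described above can be illustrated with a small sketch. It is not code from the patch: rotated_logical_cpu() and round_up_to() are hypothetical helpers, and the sketch assumes that every core has the same number of threads (tpc) and that cores are visited in device-tree order starting from the boot core and wrapping around.

```c
/*
 * Hypothetical helper (not from the patch): logical cpu number of
 * thread `thread` in device-tree core `core` once the core list is
 * rotated so that the boot core comes first.
 */
static int rotated_logical_cpu(int core, int thread, int boot_core,
			       int ncores, int tpc)
{
	/* Position of this core in the rotated list, boot core at 0 */
	int pos = (core - boot_core + ncores) % ncores;

	return pos * tpc + thread;
}

/*
 * Round n up to a multiple of step. For a power-of-two step this gives
 * the same result as the kernel's ALIGN(), which the patch applies to
 * nr_cpu_ids so that the whole boot core fits.
 */
static int round_up_to(int n, int step)
{
	return ((n + step - 1) / step) * step;
}
```

With 4 cores of 8 threads and the boot cpu on thread 5 of dt core 2, the boot cpu gets logical id 5, dt core 3 starts at 8, dt core 0 at 16 and dt core 1 at 24. Thread order inside each core is unchanged, which is the property that helpers like cpu_first_thread_sibling() depend on. Under the same assumptions, a request for nr_cpus=1 would be rounded up to 8.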