diff mbox series

[PATCHv8,1/5] powerpc/setup : Enable boot_cpu_hwid for PPC32

Message ID 20231009113036.45988-2-piliu@redhat.com (mailing list archive)
State Superseded
Headers show
Series enable nr_cpus for powerpc | expand

Commit Message

Pingfan Liu Oct. 9, 2023, 11:30 a.m. UTC
In order to identify the boot cpu, its intserv[] should be recorded and
checked in smp_setup_cpu_maps().

smp_setup_cpu_maps() is shared between PPC64 and PPC32. Since PPC64 has
already used boot_cpu_hwid to carry that information, enabling this
variable on PPC32 so later it can also be used to carry that information
for PPC32 in the coming patch.

Signed-off-by: Pingfan Liu <piliu@redhat.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Mahesh Salgaonkar <mahesh@linux.ibm.com>
Cc: Wen Xiong <wenxiong@us.ibm.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: kexec@lists.infradead.org
To: linuxppc-dev@lists.ozlabs.org
---
 arch/powerpc/include/asm/smp.h     | 2 +-
 arch/powerpc/kernel/prom.c         | 3 +--
 arch/powerpc/kernel/setup-common.c | 2 --
 3 files changed, 2 insertions(+), 5 deletions(-)

Comments

Sourabh Jain Oct. 10, 2023, 4:44 a.m. UTC | #1
Hello Pingfan,

With this patch series applied, the kdump kernel fails to boot on 
powerpc with nr_cpus=1.

Console logs:
-------------------
[root]# echo c > /proc/sysrq-trigger
[   74.783235] sysrq: Trigger a crash
[   74.783244] Kernel panic - not syncing: sysrq triggered crash
[   74.783252] CPU: 58 PID: 3838 Comm: bash Kdump: loaded Not tainted 
6.6.0-rc5pf-nr-cpus+ #3
[   74.783259] Hardware name: POWER10 (raw) phyp pSeries
[   74.783275] Call Trace:
[   74.783280] [c00000020f4ebac0] [c000000000ed9f38] 
dump_stack_lvl+0x6c/0x9c (unreliable)
[   74.783291] [c00000020f4ebaf0] [c000000000150300] panic+0x178/0x438
[   74.783298] [c00000020f4ebb90] [c000000000936d48] 
sysrq_handle_crash+0x28/0x30
[   74.783304] [c00000020f4ebbf0] [c00000000093773c] 
__handle_sysrq+0x10c/0x250
[   74.783309] [c00000020f4ebc90] [c000000000937fa8] 
write_sysrq_trigger+0xc8/0x168
[   74.783314] [c00000020f4ebcd0] [c000000000665d8c] 
proc_reg_write+0x10c/0x1b0
[   74.783321] [c00000020f4ebd00] [c00000000058da54] vfs_write+0x104/0x4b0
[   74.783326] [c00000020f4ebdc0] [c00000000058dfdc] ksys_write+0x7c/0x140
[   74.783331] [c00000020f4ebe10] [c000000000033a64] 
system_call_exception+0x144/0x3a0
[   74.783337] [c00000020f4ebe50] [c00000000000c554] 
system_call_common+0xf4/0x258
[   74.783343] --- interrupt: c00 at 0x7fffa0721594
[   74.783352] NIP:  00007fffa0721594 LR: 00007fffa0697bf4 CTR: 
0000000000000000
[   74.783364] REGS: c00000020f4ebe80 TRAP: 0c00   Not tainted 
(6.6.0-rc5pf-nr-cpus+)
[   74.783376] MSR:  800000000280f033 
<SF,VEC,VSX,EE,PR,FP,ME,IR,DR,RI,LE>  CR: 28222202  XER: 00000000
[   74.783394] IRQMASK: 0
[   74.783394] GPR00: 0000000000000004 00007ffffc4b6800 00007fffa0807300 
0000000000000001
[   74.783394] GPR04: 000000013549ea60 0000000000000002 0000000000000010 
0000000000000000
[   74.783394] GPR08: 0000000000000000 0000000000000000 0000000000000000 
0000000000000000
[   74.783394] GPR12: 0000000000000000 00007fffa0abaf70 0000000040000000 
000000011a0f9798
[   74.783394] GPR16: 000000011a0f9724 000000011a097688 000000011a02ff70 
000000011a0fd568
[   74.783394] GPR20: 0000000135554bf0 0000000000000001 000000011a0aa478 
00007ffffc4b6a24
[   74.783394] GPR24: 00007ffffc4b6a20 000000011a0faf94 0000000000000002 
000000013549ea60
[   74.783394] GPR28: 0000000000000002 00007fffa08017a0 000000013549ea60 
0000000000000002
[   74.783440] NIP [00007fffa0721594] 0x7fffa0721594
[   74.783443] LR [00007fffa0697bf4] 0x7fffa0697bf4
[   74.783447] --- interrupt: c00
I'm in purgatory
[    0.000000] radix-mmu: Page sizes from device-tree:
[    0.000000] radix-mmu: Page size shift = 12 AP=0x0
[    0.000000] radix-mmu: Page size shift = 16 AP=0x5
[    0.000000] radix-mmu: Page size shift = 21 AP=0x1
[    0.000000] radix-mmu: Page size shift = 30 AP=0x2
[    0.000000] Activating Kernel Userspace Access Prevention
[    0.000000] Activating Kernel Userspace Execution Prevention
[    0.000000] radix-mmu: Mapped 0x0000000000000000-0x0000000000010000 
with 64.0 KiB pages (exec)
[    0.000000] radix-mmu: Mapped 0x0000000000010000-0x0000000000200000 
with 64.0 KiB pages
[    0.000000] radix-mmu: Mapped 0x0000000000200000-0x0000000020000000 
with 2.00 MiB pages
[    0.000000] radix-mmu: Mapped 0x0000000020000000-0x0000000022600000 
with 2.00 MiB pages (exec)
[    0.000000] radix-mmu: Mapped 0x0000000022600000-0x0000000040000000 
with 2.00 MiB pages
[    0.000000] radix-mmu: Mapped 0x0000000040000000-0x0000000180000000 
with 1.00 GiB pages
[    0.000000] radix-mmu: Mapped 0x0000000180000000-0x00000001a0000000 
with 2.00 MiB pages
[    0.000000] lpar: Using radix MMU under hypervisor
[    0.000000] Linux version 6.6.0-rc5pf-nr-cpus+ 
(root@ltcever7x0-lp1.aus.stglabs.ibm.com) (gcc (GCC) 8.5.0 20210514 (Red 
Hat 8.5.0-20), GNU ld version 2.30-123.el8) #3 SMP Mon Oct  9 11:07:
41 CDT 2023
[    0.000000] Found initrd at 0xc000000022e60000:0xc0000000248f08d8
[    0.000000] Hardware name: IBM,9043-MRX POWER10 (raw) 0x800200 
0xf000006 of:IBM,FW1060.00 (NM1060_016) hv:phyp pSeries
[    0.000000] printk: bootconsole [udbg0] enabled
[    0.000000] the round shift between dt seq and the cpu logic number: 56
[    0.000000] BUG: Unable to handle kernel data access on write at 
0xc0000001a0000000
[    0.000000] Faulting instruction address: 0xc000000022009c64
[    0.000000] Oops: Kernel access of bad area, sig: 11 [#1]
[    0.000000] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
[    0.000000] Modules linked in:
[    0.000000] CPU: 2 PID: 0 Comm: swapper Not tainted 
6.6.0-rc5pf-nr-cpus+ #3
[    0.000000] Hardware name:  POWER10 (raw)  hv:phyp pSeries
[    0.000000] NIP:  c000000022009c64 LR: c000000022009c54 CTR: 
c0000000201ff348
[    0.000000] REGS: c000000022aebb00 TRAP: 0300   Not tainted 
(6.6.0-rc5pf-nr-cpus+)
[    0.000000] MSR:  8000000000001033 <SF,ME,IR,DR,RI,LE> CR: 28222824  
XER: 00000001
[    0.000000] CFAR: c000000020031574 DAR: c0000001a0000000 DSISR: 
42000000 IRQMASK: 1
[    0.000000] GPR00: c000000022009ba0 c000000022aebda0 c0000000213d1300 
0000000000000004
[    0.000000] GPR04: 0000000000000001 c000000022aebbc0 c000000022aebbb8 
0000000000000000
[    0.000000] GPR08: 0000000000000001 c00000019ffffff8 000000000000003a 
c0000000229c8a78
[    0.000000] GPR12: 0000000000002000 c000000022e4a800 c0000000211d34b8 
c0000000211d3aa8
[    0.000000] GPR16: c0000000211d75a0 c0000000211d75b0 c0000000225f3b98 
0000000000000000
[    0.000000] GPR20: 0000000000000001 0000000000000001 0000000000000001 
0000000000000001
[    0.000000] GPR24: 0000000000000008 0000000000000000 0000000000000001 
c00000019ffffdc0
[    0.000000] GPR28: 0000000000000002 c000000022b368e0 c000000022aebe08 
0000000000000008
[    0.000000] NIP [c000000022009c64] smp_setup_cpu_maps+0x420/0x724
[    0.000000] LR [c000000022009c54] smp_setup_cpu_maps+0x410/0x724
[    0.000000] Call Trace:
[    0.000000] [c000000022aebda0] [c000000022009ba0] 
smp_setup_cpu_maps+0x35c/0x724 (unreliable)
[    0.000000] [c000000022aebeb0] [c00000002200a19c] setup_arch+0x1b8/0x54c
[    0.000000] [c000000022aebf30] [c000000022003f88] start_kernel+0xb0/0x768
[    0.000000] [c000000022aebfe0] [c00000002000d888] 
start_here_common+0x1c/0x20
[    0.000000] Code: 3929ffff 7f89e040 409c002c 7ec4b378 7f83e378 
4a027939 7f83e378 4a0278e5 e95b0018 3d22017d e929f028 7d4ac42c 
<7d49c12e> eb7b0000 7e99a378 4bffff3c
[    0.000000] ---[ end trace 0000000000000000 ]---
[    0.000000]
[    0.000000] Kernel panic - not syncing: Fatal exception
[    0.000000] Rebooting in 180 seconds..

However, the kdump kernel boots fine if the kernel crashes on CPU 0.

Thanks,
Sourabh Jain


On 09/10/23 17:00, Pingfan Liu wrote:
> In order to identify the boot cpu, its intserv[] should be recorded and
> checked in smp_setup_cpu_maps().
>
> smp_setup_cpu_maps() is shared between PPC64 and PPC32. Since PPC64 has
> already used boot_cpu_hwid to carry that information, enabling this
> variable on PPC32 so later it can also be used to carry that information
> for PPC32 in the coming patch.
>
> Signed-off-by: Pingfan Liu <piliu@redhat.com>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Nicholas Piggin <npiggin@gmail.com>
> Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
> Cc: Mahesh Salgaonkar <mahesh@linux.ibm.com>
> Cc: Wen Xiong <wenxiong@us.ibm.com>
> Cc: Baoquan He <bhe@redhat.com>
> Cc: Ming Lei <ming.lei@redhat.com>
> Cc: kexec@lists.infradead.org
> To: linuxppc-dev@lists.ozlabs.org
> ---
>   arch/powerpc/include/asm/smp.h     | 2 +-
>   arch/powerpc/kernel/prom.c         | 3 +--
>   arch/powerpc/kernel/setup-common.c | 2 --
>   3 files changed, 2 insertions(+), 5 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/smp.h b/arch/powerpc/include/asm/smp.h
> index aaaa576d0e15..5db9178cc800 100644
> --- a/arch/powerpc/include/asm/smp.h
> +++ b/arch/powerpc/include/asm/smp.h
> @@ -26,7 +26,7 @@
>   #include <asm/percpu.h>
>   
>   extern int boot_cpuid;
> -extern int boot_cpu_hwid; /* PPC64 only */
> +extern int boot_cpu_hwid;
>   extern int spinning_secondaries;
>   extern u32 *cpu_to_phys_id;
>   extern bool coregroup_enabled;
> diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
> index 0b5878c3125b..ec82f5bda908 100644
> --- a/arch/powerpc/kernel/prom.c
> +++ b/arch/powerpc/kernel/prom.c
> @@ -372,8 +372,7 @@ static int __init early_init_dt_scan_cpus(unsigned long node,
>   	    be32_to_cpu(intserv[found_thread]));
>   	boot_cpuid = found;
>   
> -	if (IS_ENABLED(CONFIG_PPC64))
> -		boot_cpu_hwid = be32_to_cpu(intserv[found_thread]);
> +	boot_cpu_hwid = be32_to_cpu(intserv[found_thread]);
>   
>   	/*
>   	 * PAPR defines "logical" PVR values for cpus that
> diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c
> index d2a446216444..1b19a9815672 100644
> --- a/arch/powerpc/kernel/setup-common.c
> +++ b/arch/powerpc/kernel/setup-common.c
> @@ -87,9 +87,7 @@ EXPORT_SYMBOL(machine_id);
>   int boot_cpuid = -1;
>   EXPORT_SYMBOL_GPL(boot_cpuid);
>   
> -#ifdef CONFIG_PPC64
>   int boot_cpu_hwid = -1;
> -#endif
>   
>   /*
>    * These are used in binfmt_elf.c to put aux entries on the stack
Sourabh Jain Oct. 10, 2023, 8:24 a.m. UTC | #2
Hello Pingfan,

>
> With this patch series applied, the kdump kernel fails to boot on 
> powerpc with nr_cpus=1.
>
> Console logs:
> -------------------
> [root]# echo c > /proc/sysrq-trigger
> [   74.783235] sysrq: Trigger a crash
> [   74.783244] Kernel panic - not syncing: sysrq triggered crash
> [   74.783252] CPU: 58 PID: 3838 Comm: bash Kdump: loaded Not tainted 
> 6.6.0-rc5pf-nr-cpus+ #3
> [   74.783259] Hardware name: POWER10 (raw) phyp pSeries
> [   74.783275] Call Trace:
> [   74.783280] [c00000020f4ebac0] [c000000000ed9f38] 
> dump_stack_lvl+0x6c/0x9c (unreliable)
> [   74.783291] [c00000020f4ebaf0] [c000000000150300] panic+0x178/0x438
> [   74.783298] [c00000020f4ebb90] [c000000000936d48] 
> sysrq_handle_crash+0x28/0x30
> [   74.783304] [c00000020f4ebbf0] [c00000000093773c] 
> __handle_sysrq+0x10c/0x250
> [   74.783309] [c00000020f4ebc90] [c000000000937fa8] 
> write_sysrq_trigger+0xc8/0x168
> [   74.783314] [c00000020f4ebcd0] [c000000000665d8c] 
> proc_reg_write+0x10c/0x1b0
> [   74.783321] [c00000020f4ebd00] [c00000000058da54] 
> vfs_write+0x104/0x4b0
> [   74.783326] [c00000020f4ebdc0] [c00000000058dfdc] 
> ksys_write+0x7c/0x140
> [   74.783331] [c00000020f4ebe10] [c000000000033a64] 
> system_call_exception+0x144/0x3a0
> [   74.783337] [c00000020f4ebe50] [c00000000000c554] 
> system_call_common+0xf4/0x258
> [   74.783343] --- interrupt: c00 at 0x7fffa0721594
> [   74.783352] NIP:  00007fffa0721594 LR: 00007fffa0697bf4 CTR: 
> 0000000000000000
> [   74.783364] REGS: c00000020f4ebe80 TRAP: 0c00   Not tainted 
> (6.6.0-rc5pf-nr-cpus+)
> [   74.783376] MSR:  800000000280f033 
> <SF,VEC,VSX,EE,PR,FP,ME,IR,DR,RI,LE>  CR: 28222202  XER: 00000000
> [   74.783394] IRQMASK: 0
> [   74.783394] GPR00: 0000000000000004 00007ffffc4b6800 
> 00007fffa0807300 0000000000000001
> [   74.783394] GPR04: 000000013549ea60 0000000000000002 
> 0000000000000010 0000000000000000
> [   74.783394] GPR08: 0000000000000000 0000000000000000 
> 0000000000000000 0000000000000000
> [   74.783394] GPR12: 0000000000000000 00007fffa0abaf70 
> 0000000040000000 000000011a0f9798
> [   74.783394] GPR16: 000000011a0f9724 000000011a097688 
> 000000011a02ff70 000000011a0fd568
> [   74.783394] GPR20: 0000000135554bf0 0000000000000001 
> 000000011a0aa478 00007ffffc4b6a24
> [   74.783394] GPR24: 00007ffffc4b6a20 000000011a0faf94 
> 0000000000000002 000000013549ea60
> [   74.783394] GPR28: 0000000000000002 00007fffa08017a0 
> 000000013549ea60 0000000000000002
> [   74.783440] NIP [00007fffa0721594] 0x7fffa0721594
> [   74.783443] LR [00007fffa0697bf4] 0x7fffa0697bf4
> [   74.783447] --- interrupt: c00
> I'm in purgatory
> [    0.000000] radix-mmu: Page sizes from device-tree:
> [    0.000000] radix-mmu: Page size shift = 12 AP=0x0
> [    0.000000] radix-mmu: Page size shift = 16 AP=0x5
> [    0.000000] radix-mmu: Page size shift = 21 AP=0x1
> [    0.000000] radix-mmu: Page size shift = 30 AP=0x2
> [    0.000000] Activating Kernel Userspace Access Prevention
> [    0.000000] Activating Kernel Userspace Execution Prevention
> [    0.000000] radix-mmu: Mapped 0x0000000000000000-0x0000000000010000 
> with 64.0 KiB pages (exec)
> [    0.000000] radix-mmu: Mapped 0x0000000000010000-0x0000000000200000 
> with 64.0 KiB pages
> [    0.000000] radix-mmu: Mapped 0x0000000000200000-0x0000000020000000 
> with 2.00 MiB pages
> [    0.000000] radix-mmu: Mapped 0x0000000020000000-0x0000000022600000 
> with 2.00 MiB pages (exec)
> [    0.000000] radix-mmu: Mapped 0x0000000022600000-0x0000000040000000 
> with 2.00 MiB pages
> [    0.000000] radix-mmu: Mapped 0x0000000040000000-0x0000000180000000 
> with 1.00 GiB pages
> [    0.000000] radix-mmu: Mapped 0x0000000180000000-0x00000001a0000000 
> with 2.00 MiB pages
> [    0.000000] lpar: Using radix MMU under hypervisor
> [    0.000000] Linux version 6.6.0-rc5pf-nr-cpus+ 
> (root@ltcever7x0-lp1.aus.stglabs.ibm.com) (gcc (GCC) 8.5.0 20210514 
> (Red Hat 8.5.0-20), GNU ld version 2.30-123.el8) #3 SMP Mon Oct  9 11:07:
> 41 CDT 2023
> [    0.000000] Found initrd at 0xc000000022e60000:0xc0000000248f08d8
> [    0.000000] Hardware name: IBM,9043-MRX POWER10 (raw) 0x800200 
> 0xf000006 of:IBM,FW1060.00 (NM1060_016) hv:phyp pSeries
> [    0.000000] printk: bootconsole [udbg0] enabled
> [    0.000000] the round shift between dt seq and the cpu logic 
> number: 56
> [    0.000000] BUG: Unable to handle kernel data access on write at 
> 0xc0000001a0000000
> [    0.000000] Faulting instruction address: 0xc000000022009c64
> [    0.000000] Oops: Kernel access of bad area, sig: 11 [#1]
> [    0.000000] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
> [    0.000000] Modules linked in:
> [    0.000000] CPU: 2 PID: 0 Comm: swapper Not tainted 
> 6.6.0-rc5pf-nr-cpus+ #3
> [    0.000000] Hardware name:  POWER10 (raw)  hv:phyp pSeries
> [    0.000000] NIP:  c000000022009c64 LR: c000000022009c54 CTR: 
> c0000000201ff348
> [    0.000000] REGS: c000000022aebb00 TRAP: 0300   Not tainted 
> (6.6.0-rc5pf-nr-cpus+)
> [    0.000000] MSR:  8000000000001033 <SF,ME,IR,DR,RI,LE> CR: 
> 28222824  XER: 00000001
> [    0.000000] CFAR: c000000020031574 DAR: c0000001a0000000 DSISR: 
> 42000000 IRQMASK: 1
> [    0.000000] GPR00: c000000022009ba0 c000000022aebda0 
> c0000000213d1300 0000000000000004
> [    0.000000] GPR04: 0000000000000001 c000000022aebbc0 
> c000000022aebbb8 0000000000000000
> [    0.000000] GPR08: 0000000000000001 c00000019ffffff8 
> 000000000000003a c0000000229c8a78
> [    0.000000] GPR12: 0000000000002000 c000000022e4a800 
> c0000000211d34b8 c0000000211d3aa8
> [    0.000000] GPR16: c0000000211d75a0 c0000000211d75b0 
> c0000000225f3b98 0000000000000000
> [    0.000000] GPR20: 0000000000000001 0000000000000001 
> 0000000000000001 0000000000000001
> [    0.000000] GPR24: 0000000000000008 0000000000000000 
> 0000000000000001 c00000019ffffdc0
> [    0.000000] GPR28: 0000000000000002 c000000022b368e0 
> c000000022aebe08 0000000000000008
> [    0.000000] NIP [c000000022009c64] smp_setup_cpu_maps+0x420/0x724
> [    0.000000] LR [c000000022009c54] smp_setup_cpu_maps+0x410/0x724
> [    0.000000] Call Trace:
> [    0.000000] [c000000022aebda0] [c000000022009ba0] 
> smp_setup_cpu_maps+0x35c/0x724 (unreliable)
> [    0.000000] [c000000022aebeb0] [c00000002200a19c] 
> setup_arch+0x1b8/0x54c
> [    0.000000] [c000000022aebf30] [c000000022003f88] 
> start_kernel+0xb0/0x768
> [    0.000000] [c000000022aebfe0] [c00000002000d888] 
> start_here_common+0x1c/0x20
> [    0.000000] Code: 3929ffff 7f89e040 409c002c 7ec4b378 7f83e378 
> 4a027939 7f83e378 4a0278e5 e95b0018 3d22017d e929f028 7d4ac42c 
> <7d49c12e> eb7b0000 7e99a378 4bffff3c
> [    0.000000] ---[ end trace 0000000000000000 ]---
> [    0.000000]
> [    0.000000] Kernel panic - not syncing: Fatal exception
> [    0.000000] Rebooting in 180 seconds..
>
> However, the kdump kernel boots fine if the kernel crashes on CPU 0.

Found a pattern in kdump kernel failure with nr_cpus=1.

On CPU 0, 8, 16, 24, 32, 40, it boots fine.
On CPUs 1-7, 9-15, 17-23, 25-31, 33-39, it fails to boot.

Hope this helps.

Thanks,
Sourabh
Sourabh Jain Oct. 10, 2023, 9:08 a.m. UTC | #3
Hello Pingfan,

>
> With this patch series applied, the kdump kernel fails to boot on 
> powerpc with nr_cpus=1.
>
> Console logs:
> -------------------
> [root]# echo c > /proc/sysrq-trigger
> [   74.783235] sysrq: Trigger a crash
> [   74.783244] Kernel panic - not syncing: sysrq triggered crash
> [   74.783252] CPU: 58 PID: 3838 Comm: bash Kdump: loaded Not tainted 
> 6.6.0-rc5pf-nr-cpus+ #3
> [   74.783259] Hardware name: POWER10 (raw) phyp pSeries
> [   74.783275] Call Trace:
> [   74.783280] [c00000020f4ebac0] [c000000000ed9f38] 
> dump_stack_lvl+0x6c/0x9c (unreliable)
> [   74.783291] [c00000020f4ebaf0] [c000000000150300] panic+0x178/0x438
> [   74.783298] [c00000020f4ebb90] [c000000000936d48] 
> sysrq_handle_crash+0x28/0x30
> [   74.783304] [c00000020f4ebbf0] [c00000000093773c] 
> __handle_sysrq+0x10c/0x250
> [   74.783309] [c00000020f4ebc90] [c000000000937fa8] 
> write_sysrq_trigger+0xc8/0x168
> [   74.783314] [c00000020f4ebcd0] [c000000000665d8c] 
> proc_reg_write+0x10c/0x1b0
> [   74.783321] [c00000020f4ebd00] [c00000000058da54] 
> vfs_write+0x104/0x4b0
> [   74.783326] [c00000020f4ebdc0] [c00000000058dfdc] 
> ksys_write+0x7c/0x140
> [   74.783331] [c00000020f4ebe10] [c000000000033a64] 
> system_call_exception+0x144/0x3a0
> [   74.783337] [c00000020f4ebe50] [c00000000000c554] 
> system_call_common+0xf4/0x258
> [   74.783343] --- interrupt: c00 at 0x7fffa0721594
> [   74.783352] NIP:  00007fffa0721594 LR: 00007fffa0697bf4 CTR: 
> 0000000000000000
> [   74.783364] REGS: c00000020f4ebe80 TRAP: 0c00   Not tainted 
> (6.6.0-rc5pf-nr-cpus+)
> [   74.783376] MSR:  800000000280f033 
> <SF,VEC,VSX,EE,PR,FP,ME,IR,DR,RI,LE>  CR: 28222202  XER: 00000000
> [   74.783394] IRQMASK: 0
> [   74.783394] GPR00: 0000000000000004 00007ffffc4b6800 
> 00007fffa0807300 0000000000000001
> [   74.783394] GPR04: 000000013549ea60 0000000000000002 
> 0000000000000010 0000000000000000
> [   74.783394] GPR08: 0000000000000000 0000000000000000 
> 0000000000000000 0000000000000000
> [   74.783394] GPR12: 0000000000000000 00007fffa0abaf70 
> 0000000040000000 000000011a0f9798
> [   74.783394] GPR16: 000000011a0f9724 000000011a097688 
> 000000011a02ff70 000000011a0fd568
> [   74.783394] GPR20: 0000000135554bf0 0000000000000001 
> 000000011a0aa478 00007ffffc4b6a24
> [   74.783394] GPR24: 00007ffffc4b6a20 000000011a0faf94 
> 0000000000000002 000000013549ea60
> [   74.783394] GPR28: 0000000000000002 00007fffa08017a0 
> 000000013549ea60 0000000000000002
> [   74.783440] NIP [00007fffa0721594] 0x7fffa0721594
> [   74.783443] LR [00007fffa0697bf4] 0x7fffa0697bf4
> [   74.783447] --- interrupt: c00
> I'm in purgatory
> [    0.000000] radix-mmu: Page sizes from device-tree:
> [    0.000000] radix-mmu: Page size shift = 12 AP=0x0
> [    0.000000] radix-mmu: Page size shift = 16 AP=0x5
> [    0.000000] radix-mmu: Page size shift = 21 AP=0x1
> [    0.000000] radix-mmu: Page size shift = 30 AP=0x2
> [    0.000000] Activating Kernel Userspace Access Prevention
> [    0.000000] Activating Kernel Userspace Execution Prevention
> [    0.000000] radix-mmu: Mapped 0x0000000000000000-0x0000000000010000 
> with 64.0 KiB pages (exec)
> [    0.000000] radix-mmu: Mapped 0x0000000000010000-0x0000000000200000 
> with 64.0 KiB pages
> [    0.000000] radix-mmu: Mapped 0x0000000000200000-0x0000000020000000 
> with 2.00 MiB pages
> [    0.000000] radix-mmu: Mapped 0x0000000020000000-0x0000000022600000 
> with 2.00 MiB pages (exec)
> [    0.000000] radix-mmu: Mapped 0x0000000022600000-0x0000000040000000 
> with 2.00 MiB pages
> [    0.000000] radix-mmu: Mapped 0x0000000040000000-0x0000000180000000 
> with 1.00 GiB pages
> [    0.000000] radix-mmu: Mapped 0x0000000180000000-0x00000001a0000000 
> with 2.00 MiB pages
> [    0.000000] lpar: Using radix MMU under hypervisor
> [    0.000000] Linux version 6.6.0-rc5pf-nr-cpus+ 
> (root@ltcever7x0-lp1.aus.stglabs.ibm.com) (gcc (GCC) 8.5.0 20210514 
> (Red Hat 8.5.0-20), GNU ld version 2.30-123.el8) #3 SMP Mon Oct  9 11:07:
> 41 CDT 2023
> [    0.000000] Found initrd at 0xc000000022e60000:0xc0000000248f08d8
> [    0.000000] Hardware name: IBM,9043-MRX POWER10 (raw) 0x800200 
> 0xf000006 of:IBM,FW1060.00 (NM1060_016) hv:phyp pSeries
> [    0.000000] printk: bootconsole [udbg0] enabled
> [    0.000000] the round shift between dt seq and the cpu logic 
> number: 56
> [    0.000000] BUG: Unable to handle kernel data access on write at 
> 0xc0000001a0000000
> [    0.000000] Faulting instruction address: 0xc000000022009c64
> [    0.000000] Oops: Kernel access of bad area, sig: 11 [#1]
> [    0.000000] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
> [    0.000000] Modules linked in:
> [    0.000000] CPU: 2 PID: 0 Comm: swapper Not tainted 
> 6.6.0-rc5pf-nr-cpus+ #3
> [    0.000000] Hardware name:  POWER10 (raw)  hv:phyp pSeries
> [    0.000000] NIP:  c000000022009c64 LR: c000000022009c54 CTR: 
> c0000000201ff348
> [    0.000000] REGS: c000000022aebb00 TRAP: 0300   Not tainted 
> (6.6.0-rc5pf-nr-cpus+)
> [    0.000000] MSR:  8000000000001033 <SF,ME,IR,DR,RI,LE> CR: 
> 28222824  XER: 00000001
> [    0.000000] CFAR: c000000020031574 DAR: c0000001a0000000 DSISR: 
> 42000000 IRQMASK: 1
> [    0.000000] GPR00: c000000022009ba0 c000000022aebda0 
> c0000000213d1300 0000000000000004
> [    0.000000] GPR04: 0000000000000001 c000000022aebbc0 
> c000000022aebbb8 0000000000000000
> [    0.000000] GPR08: 0000000000000001 c00000019ffffff8 
> 000000000000003a c0000000229c8a78
> [    0.000000] GPR12: 0000000000002000 c000000022e4a800 
> c0000000211d34b8 c0000000211d3aa8
> [    0.000000] GPR16: c0000000211d75a0 c0000000211d75b0 
> c0000000225f3b98 0000000000000000
> [    0.000000] GPR20: 0000000000000001 0000000000000001 
> 0000000000000001 0000000000000001
> [    0.000000] GPR24: 0000000000000008 0000000000000000 
> 0000000000000001 c00000019ffffdc0
> [    0.000000] GPR28: 0000000000000002 c000000022b368e0 
> c000000022aebe08 0000000000000008
> [    0.000000] NIP [c000000022009c64] smp_setup_cpu_maps+0x420/0x724
> [    0.000000] LR [c000000022009c54] smp_setup_cpu_maps+0x410/0x724
> [    0.000000] Call Trace:
> [    0.000000] [c000000022aebda0] [c000000022009ba0] 
> smp_setup_cpu_maps+0x35c/0x724 (unreliable)
> [    0.000000] [c000000022aebeb0] [c00000002200a19c] 
> setup_arch+0x1b8/0x54c
> [    0.000000] [c000000022aebf30] [c000000022003f88] 
> start_kernel+0xb0/0x768
> [    0.000000] [c000000022aebfe0] [c00000002000d888] 
> start_here_common+0x1c/0x20
> [    0.000000] Code: 3929ffff 7f89e040 409c002c 7ec4b378 7f83e378 
> 4a027939 7f83e378 4a0278e5 e95b0018 3d22017d e929f028 7d4ac42c 
> <7d49c12e> eb7b0000 7e99a378 4bffff3c

The faulting instruction address, 0xc000000022009c6, corresponds to the 
code below:

File:
arch/powerpc/kernel/setup-common.c

Function
void __init smp_setup_cpu_maps(void)
{
             ...
             cpu_to_phys_id[bt_thread] = 
be32_to_cpu(intserv_node->intserv[bt_thread]);
             ...
}

Hope it helps.

Thanks,
Sourabh Jain
Pingfan Liu Oct. 11, 2023, 2:30 a.m. UTC | #4
On Tue, Oct 10, 2023 at 02:38:40PM +0530, Sourabh Jain wrote:
> Hello Pingfan,
> 
> > 
> > With this patch series applied, the kdump kernel fails to boot on
> > powerpc with nr_cpus=1.
> > 
> > Console logs:
> > -------------------
> > [root]# echo c > /proc/sysrq-trigger
> > [   74.783235] sysrq: Trigger a crash
> > [   74.783244] Kernel panic - not syncing: sysrq triggered crash
> > [   74.783252] CPU: 58 PID: 3838 Comm: bash Kdump: loaded Not tainted
> > 6.6.0-rc5pf-nr-cpus+ #3
> > [   74.783259] Hardware name: POWER10 (raw) phyp pSeries
> > [   74.783275] Call Trace:
> > [   74.783280] [c00000020f4ebac0] [c000000000ed9f38]
> > dump_stack_lvl+0x6c/0x9c (unreliable)
> > [   74.783291] [c00000020f4ebaf0] [c000000000150300] panic+0x178/0x438
> > [   74.783298] [c00000020f4ebb90] [c000000000936d48]
> > sysrq_handle_crash+0x28/0x30
> > [   74.783304] [c00000020f4ebbf0] [c00000000093773c]
> > __handle_sysrq+0x10c/0x250
> > [   74.783309] [c00000020f4ebc90] [c000000000937fa8]
> > write_sysrq_trigger+0xc8/0x168
> > [   74.783314] [c00000020f4ebcd0] [c000000000665d8c]
> > proc_reg_write+0x10c/0x1b0
> > [   74.783321] [c00000020f4ebd00] [c00000000058da54]
> > vfs_write+0x104/0x4b0
> > [   74.783326] [c00000020f4ebdc0] [c00000000058dfdc]
> > ksys_write+0x7c/0x140
> > [   74.783331] [c00000020f4ebe10] [c000000000033a64]
> > system_call_exception+0x144/0x3a0
> > [   74.783337] [c00000020f4ebe50] [c00000000000c554]
> > system_call_common+0xf4/0x258
> > [   74.783343] --- interrupt: c00 at 0x7fffa0721594
> > [   74.783352] NIP:  00007fffa0721594 LR: 00007fffa0697bf4 CTR:
> > 0000000000000000
> > [   74.783364] REGS: c00000020f4ebe80 TRAP: 0c00   Not tainted
> > (6.6.0-rc5pf-nr-cpus+)
> > [   74.783376] MSR:  800000000280f033
> > <SF,VEC,VSX,EE,PR,FP,ME,IR,DR,RI,LE>  CR: 28222202  XER: 00000000
> > [   74.783394] IRQMASK: 0
> > [   74.783394] GPR00: 0000000000000004 00007ffffc4b6800 00007fffa0807300
> > 0000000000000001
> > [   74.783394] GPR04: 000000013549ea60 0000000000000002 0000000000000010
> > 0000000000000000
> > [   74.783394] GPR08: 0000000000000000 0000000000000000 0000000000000000
> > 0000000000000000
> > [   74.783394] GPR12: 0000000000000000 00007fffa0abaf70 0000000040000000
> > 000000011a0f9798
> > [   74.783394] GPR16: 000000011a0f9724 000000011a097688 000000011a02ff70
> > 000000011a0fd568
> > [   74.783394] GPR20: 0000000135554bf0 0000000000000001 000000011a0aa478
> > 00007ffffc4b6a24
> > [   74.783394] GPR24: 00007ffffc4b6a20 000000011a0faf94 0000000000000002
> > 000000013549ea60
> > [   74.783394] GPR28: 0000000000000002 00007fffa08017a0 000000013549ea60
> > 0000000000000002
> > [   74.783440] NIP [00007fffa0721594] 0x7fffa0721594
> > [   74.783443] LR [00007fffa0697bf4] 0x7fffa0697bf4
> > [   74.783447] --- interrupt: c00
> > I'm in purgatory
> > [    0.000000] radix-mmu: Page sizes from device-tree:
> > [    0.000000] radix-mmu: Page size shift = 12 AP=0x0
> > [    0.000000] radix-mmu: Page size shift = 16 AP=0x5
> > [    0.000000] radix-mmu: Page size shift = 21 AP=0x1
> > [    0.000000] radix-mmu: Page size shift = 30 AP=0x2
> > [    0.000000] Activating Kernel Userspace Access Prevention
> > [    0.000000] Activating Kernel Userspace Execution Prevention
> > [    0.000000] radix-mmu: Mapped 0x0000000000000000-0x0000000000010000
> > with 64.0 KiB pages (exec)
> > [    0.000000] radix-mmu: Mapped 0x0000000000010000-0x0000000000200000
> > with 64.0 KiB pages
> > [    0.000000] radix-mmu: Mapped 0x0000000000200000-0x0000000020000000
> > with 2.00 MiB pages
> > [    0.000000] radix-mmu: Mapped 0x0000000020000000-0x0000000022600000
> > with 2.00 MiB pages (exec)
> > [    0.000000] radix-mmu: Mapped 0x0000000022600000-0x0000000040000000
> > with 2.00 MiB pages
> > [    0.000000] radix-mmu: Mapped 0x0000000040000000-0x0000000180000000
> > with 1.00 GiB pages
> > [    0.000000] radix-mmu: Mapped 0x0000000180000000-0x00000001a0000000
> > with 2.00 MiB pages
> > [    0.000000] lpar: Using radix MMU under hypervisor
> > [    0.000000] Linux version 6.6.0-rc5pf-nr-cpus+
> > (root@ltcever7x0-lp1.aus.stglabs.ibm.com) (gcc (GCC) 8.5.0 20210514 (Red
> > Hat 8.5.0-20), GNU ld version 2.30-123.el8) #3 SMP Mon Oct  9 11:07:
> > 41 CDT 2023
> > [    0.000000] Found initrd at 0xc000000022e60000:0xc0000000248f08d8
> > [    0.000000] Hardware name: IBM,9043-MRX POWER10 (raw) 0x800200
> > 0xf000006 of:IBM,FW1060.00 (NM1060_016) hv:phyp pSeries
> > [    0.000000] printk: bootconsole [udbg0] enabled
> > [    0.000000] the round shift between dt seq and the cpu logic number:
> > 56
> > [    0.000000] BUG: Unable to handle kernel data access on write at
> > 0xc0000001a0000000
> > [    0.000000] Faulting instruction address: 0xc000000022009c64
> > [    0.000000] Oops: Kernel access of bad area, sig: 11 [#1]
> > [    0.000000] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
> > [    0.000000] Modules linked in:
> > [    0.000000] CPU: 2 PID: 0 Comm: swapper Not tainted
> > 6.6.0-rc5pf-nr-cpus+ #3
> > [    0.000000] Hardware name:  POWER10 (raw)  hv:phyp pSeries
> > [    0.000000] NIP:  c000000022009c64 LR: c000000022009c54 CTR:
> > c0000000201ff348
> > [    0.000000] REGS: c000000022aebb00 TRAP: 0300   Not tainted
> > (6.6.0-rc5pf-nr-cpus+)
> > [    0.000000] MSR:  8000000000001033 <SF,ME,IR,DR,RI,LE> CR: 28222824 
> > XER: 00000001
> > [    0.000000] CFAR: c000000020031574 DAR: c0000001a0000000 DSISR:
> > 42000000 IRQMASK: 1
> > [    0.000000] GPR00: c000000022009ba0 c000000022aebda0 c0000000213d1300
> > 0000000000000004
> > [    0.000000] GPR04: 0000000000000001 c000000022aebbc0 c000000022aebbb8
> > 0000000000000000
> > [    0.000000] GPR08: 0000000000000001 c00000019ffffff8 000000000000003a
> > c0000000229c8a78
> > [    0.000000] GPR12: 0000000000002000 c000000022e4a800 c0000000211d34b8
> > c0000000211d3aa8
> > [    0.000000] GPR16: c0000000211d75a0 c0000000211d75b0 c0000000225f3b98
> > 0000000000000000
> > [    0.000000] GPR20: 0000000000000001 0000000000000001 0000000000000001
> > 0000000000000001
> > [    0.000000] GPR24: 0000000000000008 0000000000000000 0000000000000001
> > c00000019ffffdc0
> > [    0.000000] GPR28: 0000000000000002 c000000022b368e0 c000000022aebe08
> > 0000000000000008
> > [    0.000000] NIP [c000000022009c64] smp_setup_cpu_maps+0x420/0x724
> > [    0.000000] LR [c000000022009c54] smp_setup_cpu_maps+0x410/0x724
> > [    0.000000] Call Trace:
> > [    0.000000] [c000000022aebda0] [c000000022009ba0]
> > smp_setup_cpu_maps+0x35c/0x724 (unreliable)
> > [    0.000000] [c000000022aebeb0] [c00000002200a19c]
> > setup_arch+0x1b8/0x54c
> > [    0.000000] [c000000022aebf30] [c000000022003f88]
> > start_kernel+0xb0/0x768
> > [    0.000000] [c000000022aebfe0] [c00000002000d888]
> > start_here_common+0x1c/0x20
> > [    0.000000] Code: 3929ffff 7f89e040 409c002c 7ec4b378 7f83e378
> > 4a027939 7f83e378 4a0278e5 e95b0018 3d22017d e929f028 7d4ac42c
> > <7d49c12e> eb7b0000 7e99a378 4bffff3c
> 
> The faulting instruction address, 0xc000000022009c6, corresponds to the code
> below:
> 
> File:
> arch/powerpc/kernel/setup-common.c
> 
> Function
> void __init smp_setup_cpu_maps(void)
> {
>             ...
>             cpu_to_phys_id[bt_thread] =
> be32_to_cpu(intserv_node->intserv[bt_thread]);
>             ...
> }
> 
> Hope it helps.
> 

Appreciate your help.

This issue should be linked with the capability of cpu_to_phys_id[].

Could you please to try the fix suggested at the end of the email? 
It should be a fix for 
[PATCHv8 3/5] powerpc/setup: Handle the case when boot_cpuid greater than nr_cpus


Thanks,

Pingfan

---

diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c
index bd7853a4bc91..849adc7a4b47 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -464,12 +464,6 @@ void __init smp_setup_cpu_maps(void)
 	DBG("smp_setup_cpu_maps()\n");
 
 	INIT_LIST_HEAD(&head);
-	cpu_to_phys_id = memblock_alloc(nr_cpu_ids * sizeof(u32),
-					__alignof__(u32));
-	if (!cpu_to_phys_id)
-		panic("%s: Failed to allocate %zu bytes align=0x%zx\n",
-		      __func__, nr_cpu_ids * sizeof(u32), __alignof__(u32));
-
 	for_each_node_by_type(dn, "cpu") {
 		const __be32 *intserv;
 		__be32 cpu_be;
@@ -533,6 +527,16 @@ void __init smp_setup_cpu_maps(void)
 		}
 
 	}
+
+	/* There may be hole between cpu0 and boot cpu */
+	j = (bt_thread + 1) > nr_cpu_ids ? (bt_thread + 1) : nr_cpu_ids;
+	cpu_to_phys_id = memblock_alloc(j * sizeof(u32),
+					__alignof__(u32));
+	if (!cpu_to_phys_id)
+		panic("%s: Failed to allocate %zu bytes align=0x%zx\n",
+		      __func__, nr_cpu_ids * sizeof(u32), __alignof__(u32));
+
+
 	cpu = 0;
 	list_del_init(&head);
 	/* Select the primary thread, the boot cpu's slibing, as the logic 0 */
Sourabh Jain Oct. 11, 2023, 10:53 a.m. UTC | #5
Hello Pingfan,
>>> With this patch series applied, the kdump kernel fails to boot on
>>> powerpc with nr_cpus=1.
>>>
>>> Console logs:
>>> -------------------
>>> [root]# echo c > /proc/sysrq-trigger
>>> [   74.783235] sysrq: Trigger a crash
>>> [   74.783244] Kernel panic - not syncing: sysrq triggered crash
>>> [   74.783252] CPU: 58 PID: 3838 Comm: bash Kdump: loaded Not tainted
>>> 6.6.0-rc5pf-nr-cpus+ #3
>>> [   74.783259] Hardware name: POWER10 (raw) phyp pSeries
>>> [   74.783275] Call Trace:
>>> [   74.783280] [c00000020f4ebac0] [c000000000ed9f38]
>>> dump_stack_lvl+0x6c/0x9c (unreliable)
>>> [   74.783291] [c00000020f4ebaf0] [c000000000150300] panic+0x178/0x438
>>> [   74.783298] [c00000020f4ebb90] [c000000000936d48]
>>> sysrq_handle_crash+0x28/0x30
>>> [   74.783304] [c00000020f4ebbf0] [c00000000093773c]
>>> __handle_sysrq+0x10c/0x250
>>> [   74.783309] [c00000020f4ebc90] [c000000000937fa8]
>>> write_sysrq_trigger+0xc8/0x168
>>> [   74.783314] [c00000020f4ebcd0] [c000000000665d8c]
>>> proc_reg_write+0x10c/0x1b0
>>> [   74.783321] [c00000020f4ebd00] [c00000000058da54]
>>> vfs_write+0x104/0x4b0
>>> [   74.783326] [c00000020f4ebdc0] [c00000000058dfdc]
>>> ksys_write+0x7c/0x140
>>> [   74.783331] [c00000020f4ebe10] [c000000000033a64]
>>> system_call_exception+0x144/0x3a0
>>> [   74.783337] [c00000020f4ebe50] [c00000000000c554]
>>> system_call_common+0xf4/0x258
>>> [   74.783343] --- interrupt: c00 at 0x7fffa0721594
>>> [   74.783352] NIP:  00007fffa0721594 LR: 00007fffa0697bf4 CTR:
>>> 0000000000000000
>>> [   74.783364] REGS: c00000020f4ebe80 TRAP: 0c00   Not tainted
>>> (6.6.0-rc5pf-nr-cpus+)
>>> [   74.783376] MSR:  800000000280f033
>>> <SF,VEC,VSX,EE,PR,FP,ME,IR,DR,RI,LE>  CR: 28222202  XER: 00000000
>>> [   74.783394] IRQMASK: 0
>>> [   74.783394] GPR00: 0000000000000004 00007ffffc4b6800 00007fffa0807300
>>> 0000000000000001
>>> [   74.783394] GPR04: 000000013549ea60 0000000000000002 0000000000000010
>>> 0000000000000000
>>> [   74.783394] GPR08: 0000000000000000 0000000000000000 0000000000000000
>>> 0000000000000000
>>> [   74.783394] GPR12: 0000000000000000 00007fffa0abaf70 0000000040000000
>>> 000000011a0f9798
>>> [   74.783394] GPR16: 000000011a0f9724 000000011a097688 000000011a02ff70
>>> 000000011a0fd568
>>> [   74.783394] GPR20: 0000000135554bf0 0000000000000001 000000011a0aa478
>>> 00007ffffc4b6a24
>>> [   74.783394] GPR24: 00007ffffc4b6a20 000000011a0faf94 0000000000000002
>>> 000000013549ea60
>>> [   74.783394] GPR28: 0000000000000002 00007fffa08017a0 000000013549ea60
>>> 0000000000000002
>>> [   74.783440] NIP [00007fffa0721594] 0x7fffa0721594
>>> [   74.783443] LR [00007fffa0697bf4] 0x7fffa0697bf4
>>> [   74.783447] --- interrupt: c00
>>> I'm in purgatory
>>> [    0.000000] radix-mmu: Page sizes from device-tree:
>>> [    0.000000] radix-mmu: Page size shift = 12 AP=0x0
>>> [    0.000000] radix-mmu: Page size shift = 16 AP=0x5
>>> [    0.000000] radix-mmu: Page size shift = 21 AP=0x1
>>> [    0.000000] radix-mmu: Page size shift = 30 AP=0x2
>>> [    0.000000] Activating Kernel Userspace Access Prevention
>>> [    0.000000] Activating Kernel Userspace Execution Prevention
>>> [    0.000000] radix-mmu: Mapped 0x0000000000000000-0x0000000000010000
>>> with 64.0 KiB pages (exec)
>>> [    0.000000] radix-mmu: Mapped 0x0000000000010000-0x0000000000200000
>>> with 64.0 KiB pages
>>> [    0.000000] radix-mmu: Mapped 0x0000000000200000-0x0000000020000000
>>> with 2.00 MiB pages
>>> [    0.000000] radix-mmu: Mapped 0x0000000020000000-0x0000000022600000
>>> with 2.00 MiB pages (exec)
>>> [    0.000000] radix-mmu: Mapped 0x0000000022600000-0x0000000040000000
>>> with 2.00 MiB pages
>>> [    0.000000] radix-mmu: Mapped 0x0000000040000000-0x0000000180000000
>>> with 1.00 GiB pages
>>> [    0.000000] radix-mmu: Mapped 0x0000000180000000-0x00000001a0000000
>>> with 2.00 MiB pages
>>> [    0.000000] lpar: Using radix MMU under hypervisor
>>> [    0.000000] Linux version 6.6.0-rc5pf-nr-cpus+
>>> (root@ltcever7x0-lp1.aus.stglabs.ibm.com) (gcc (GCC) 8.5.0 20210514 (Red
>>> Hat 8.5.0-20), GNU ld version 2.30-123.el8) #3 SMP Mon Oct  9 11:07:
>>> 41 CDT 2023
>>> [    0.000000] Found initrd at 0xc000000022e60000:0xc0000000248f08d8
>>> [    0.000000] Hardware name: IBM,9043-MRX POWER10 (raw) 0x800200
>>> 0xf000006 of:IBM,FW1060.00 (NM1060_016) hv:phyp pSeries
>>> [    0.000000] printk: bootconsole [udbg0] enabled
>>> [    0.000000] the round shift between dt seq and the cpu logic number:
>>> 56
>>> [    0.000000] BUG: Unable to handle kernel data access on write at
>>> 0xc0000001a0000000
>>> [    0.000000] Faulting instruction address: 0xc000000022009c64
>>> [    0.000000] Oops: Kernel access of bad area, sig: 11 [#1]
>>> [    0.000000] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
>>> [    0.000000] Modules linked in:
>>> [    0.000000] CPU: 2 PID: 0 Comm: swapper Not tainted
>>> 6.6.0-rc5pf-nr-cpus+ #3
>>> [    0.000000] Hardware name:  POWER10 (raw)  hv:phyp pSeries
>>> [    0.000000] NIP:  c000000022009c64 LR: c000000022009c54 CTR:
>>> c0000000201ff348
>>> [    0.000000] REGS: c000000022aebb00 TRAP: 0300   Not tainted
>>> (6.6.0-rc5pf-nr-cpus+)
>>> [    0.000000] MSR:  8000000000001033 <SF,ME,IR,DR,RI,LE> CR: 28222824
>>> XER: 00000001
>>> [    0.000000] CFAR: c000000020031574 DAR: c0000001a0000000 DSISR:
>>> 42000000 IRQMASK: 1
>>> [    0.000000] GPR00: c000000022009ba0 c000000022aebda0 c0000000213d1300
>>> 0000000000000004
>>> [    0.000000] GPR04: 0000000000000001 c000000022aebbc0 c000000022aebbb8
>>> 0000000000000000
>>> [    0.000000] GPR08: 0000000000000001 c00000019ffffff8 000000000000003a
>>> c0000000229c8a78
>>> [    0.000000] GPR12: 0000000000002000 c000000022e4a800 c0000000211d34b8
>>> c0000000211d3aa8
>>> [    0.000000] GPR16: c0000000211d75a0 c0000000211d75b0 c0000000225f3b98
>>> 0000000000000000
>>> [    0.000000] GPR20: 0000000000000001 0000000000000001 0000000000000001
>>> 0000000000000001
>>> [    0.000000] GPR24: 0000000000000008 0000000000000000 0000000000000001
>>> c00000019ffffdc0
>>> [    0.000000] GPR28: 0000000000000002 c000000022b368e0 c000000022aebe08
>>> 0000000000000008
>>> [    0.000000] NIP [c000000022009c64] smp_setup_cpu_maps+0x420/0x724
>>> [    0.000000] LR [c000000022009c54] smp_setup_cpu_maps+0x410/0x724
>>> [    0.000000] Call Trace:
>>> [    0.000000] [c000000022aebda0] [c000000022009ba0]
>>> smp_setup_cpu_maps+0x35c/0x724 (unreliable)
>>> [    0.000000] [c000000022aebeb0] [c00000002200a19c]
>>> setup_arch+0x1b8/0x54c
>>> [    0.000000] [c000000022aebf30] [c000000022003f88]
>>> start_kernel+0xb0/0x768
>>> [    0.000000] [c000000022aebfe0] [c00000002000d888]
>>> start_here_common+0x1c/0x20
>>> [    0.000000] Code: 3929ffff 7f89e040 409c002c 7ec4b378 7f83e378
>>> 4a027939 7f83e378 4a0278e5 e95b0018 3d22017d e929f028 7d4ac42c
>>> <7d49c12e> eb7b0000 7e99a378 4bffff3c
>> The faulting instruction address, 0xc000000022009c6, corresponds to the code
>> below:
>>
>> File:
>> arch/powerpc/kernel/setup-common.c
>>
>> Function
>> void __init smp_setup_cpu_maps(void)
>> {
>>              ...
>>              cpu_to_phys_id[bt_thread] =
>> be32_to_cpu(intserv_node->intserv[bt_thread]);
>>              ...
>> }
>>
>> Hope it helps.
>>
> Appreciate your help.
>
> This issue should be linked with the capability of cpu_to_phys_id[].
>
> Could you please to try the fix suggested at the end of the email?
> It should be a fix for
> [PATCHv8 3/5] powerpc/setup: Handle the case when boot_cpuid greater than nr_cpus
>
>
> Thanks,
>
> Pingfan
>
> ---
>
> diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c
> index bd7853a4bc91..849adc7a4b47 100644
> --- a/arch/powerpc/kernel/setup-common.c
> +++ b/arch/powerpc/kernel/setup-common.c
> @@ -464,12 +464,6 @@ void __init smp_setup_cpu_maps(void)
>   	DBG("smp_setup_cpu_maps()\n");
>   
>   	INIT_LIST_HEAD(&head);
> -	cpu_to_phys_id = memblock_alloc(nr_cpu_ids * sizeof(u32),
> -					__alignof__(u32));
> -	if (!cpu_to_phys_id)
> -		panic("%s: Failed to allocate %zu bytes align=0x%zx\n",
> -		      __func__, nr_cpu_ids * sizeof(u32), __alignof__(u32));
> -
>   	for_each_node_by_type(dn, "cpu") {
>   		const __be32 *intserv;
>   		__be32 cpu_be;
> @@ -533,6 +527,16 @@ void __init smp_setup_cpu_maps(void)
>   		}
>   
>   	}
> +
> +	/* There may be hole between cpu0 and boot cpu */
> +	j = (bt_thread + 1) > nr_cpu_ids ? (bt_thread + 1) : nr_cpu_ids;
> +	cpu_to_phys_id = memblock_alloc(j * sizeof(u32),
> +					__alignof__(u32));
> +	if (!cpu_to_phys_id)
> +		panic("%s: Failed to allocate %zu bytes align=0x%zx\n",
> +		      __func__, nr_cpu_ids * sizeof(u32), __alignof__(u32));
> +
> +
>   	cpu = 0;
>   	list_del_init(&head);
>   	/* Select the primary thread, the boot cpu's slibing, as the logic 0 */

With the above changes applied, kdump kernel boots fine with a WARNING:

[root]# echo c > /proc/sysrq-trigger
[  310.748248] sysrq: Trigger a crash
[  310.748256] Kernel panic - not syncing: sysrq triggered crash
[  310.748266] CPU: 26 PID: 2610 Comm: bash Kdump: loaded Not tainted 
6.6.0-rc5-fix-setup-common+ #3
[  310.748273] Hardware name: IBM,9043-MRX POWER10  hv:phyp pSeries
[  310.748280] Call Trace:
[  310.748284] [c000000184717ac0] [c000000000ecf8d8] 
dump_stack_lvl+0x6c/0x9c (unreliable)
[  310.748298] [c000000184717af0] [c000000000150310] panic+0x178/0x438
[  310.748307] [c000000184717b90] [c00000000092c8b8] 
sysrq_handle_crash+0x28/0x30
[  310.748316] [c000000184717bf0] [c00000000092d2ac] 
__handle_sysrq+0x10c/0x250
[  310.748330] [c000000184717c90] [c00000000092db18] 
write_sysrq_trigger+0xc8/0x168
[  310.748339] [c000000184717cd0] [c00000000065c21c] 
proc_reg_write+0x10c/0x1b0
[  310.748349] [c000000184717d00] [c000000000583f94] vfs_write+0x104/0x4b0
[  310.748356] [c000000184717dc0] [c00000000058451c] ksys_write+0x7c/0x140
[  310.748365] [c000000184717e10] [c000000000033a54] 
system_call_exception+0x144/0x3a0
[  310.748377] [c000000184717e50] [c00000000000c554] 
system_call_common+0xf4/0x258
[  310.748389] --- interrupt: c00 at 0x7fff97720c34
[  310.748395] NIP:  00007fff97720c34 LR: 00007fff97697c74 CTR: 
0000000000000000
[  310.748404] REGS: c000000184717e80 TRAP: 0c00   Not tainted 
(6.6.0-rc5-fix-setup-common+)
[  310.748413] MSR:  800000000280f033 
<SF,VEC,VSX,EE,PR,FP,ME,IR,DR,RI,LE>  CR: 28222202  XER: 00000000
[  310.748430] IRQMASK: 0
[  310.748430] GPR00: 0000000000000004 00007fffffabc510 00007fff97807300 
0000000000000001
[  310.748430] GPR04: 00000001624f7910 0000000000000002 0000000000000010 
00007fff97669724
[  310.748430] GPR08: 0000000000000000 0000000000000000 0000000000000000 
0000000000000000
[  310.748430] GPR12: 0000000000000000 00007fff97a5aee0 0000000040000000 
0000000125c39798
[  310.748430] GPR16: 0000000125c39724 0000000125bd8128 0000000125b70370 
0000000125c3d568
[  310.748430] GPR20: 0000000162551030 0000000000000001 0000000125beaf18 
00007fffffabc734
[  310.748430] GPR24: 00007fffffabc730 0000000125c3af94 0000000000000002 
00000001624f7910
[  310.748430] GPR28: 0000000000000002 00007fff97801798 00000001624f7910 
0000000000000002
[  310.748475] NIP [00007fff97720c34] 0x7fff97720c34
[  310.748478] LR [00007fff97697c74] 0x7fff97697c74
[  310.748482] --- interrupt: c00
I'm in purgatory
[    0.000000] radix-mmu: Page sizes from device-tree:
[    0.000000] radix-mmu: Page size shift = 12 AP=0x0
[    0.000000] radix-mmu: Page size shift = 16 AP=0x5
[    0.000000] radix-mmu: Page size shift = 21 AP=0x1
[    0.000000] radix-mmu: Page size shift = 30 AP=0x2
[    0.000000] Activating Kernel Userspace Access Prevention
[    0.000000] Activating Kernel Userspace Execution Prevention
[    0.000000] radix-mmu: Mapped 0x0000000000000000-0x0000000000010000 
with 64.0 KiB pages (exec)
[    0.000000] radix-mmu: Mapped 0x0000000000010000-0x0000000000200000 
with 64.0 KiB pages
[    0.000000] radix-mmu: Mapped 0x0000000000200000-0x0000000020000000 
with 2.00 MiB pages
[    0.000000] radix-mmu: Mapped 0x0000000020000000-0x0000000022600000 
with 2.00 MiB pages (exec)
[    0.000000] radix-mmu: Mapped 0x0000000022600000-0x0000000040000000 
with 2.00 MiB pages

Trimmed logs ....

[    0.001738] Mount-cache hash table entries: 16384 (order: 1, 131072 
bytes, linear)
[    0.001751] Mountpoint-cache hash table entries: 16384 (order: 1, 
131072 bytes, linear)
[    0.007339] ------------[ cut here ]------------
[    0.007356] WARNING: CPU: 2 PID: 1 at arch/powerpc/kernel/smp.c:941 
update_mask_from_threadgroup+0x128/0x1a0
[    0.007371] Modules linked in:
[    0.007377] CPU: 2 PID: 1 Comm: swapper/2 Not tainted 
6.6.0-rc5-fix-setup-common+ #3
[    0.007385] Hardware name: IBM,9043-MRX POWER10 hv:phyp pSeries
[    0.007393] NIP:  c000000022011ed8 LR: c000000022011e10 CTR: 
0000000000000000
[    0.007411] REGS: c0000000256338f0 TRAP: 0700   Not tainted 
(6.6.0-rc5-fix-setup-common+)
[    0.007425] MSR:  8000000002029033 <SF,VEC,EE,ME,IR,DR,RI,LE>  CR: 
44000842  XER: 0000000c
[    0.007444] CFAR: c000000022011e78 IRQMASK: 0
[    0.007444] GPR00: c000000022011e10 c000000025633b90 c0000000213c1300 
0000000000000002
[    0.007444] GPR04: 0000000000000000 0000000000000005 0000000000000001 
0000000000000002
[    0.007444] GPR08: 0000000000000008 0000000000000001 0000000000000002 
0000000000000004
[    0.007444] GPR12: 0000000000000000 c000000022e3ac00 c000000020010138 
0000000000000000
[    0.007444] GPR16: 0000000000000000 0000000000000000 0000000000000000 
0000000000000000
[    0.007444] GPR20: 0000000000000018 c000000022150968 c000000022093580 
c0000000253df000
[    0.007444] GPR24: 0000000000000002 0000000000000000 c000000022b32058 
0000000000000000
[    0.007444] GPR28: c00000015fca0a68 c000000022ba0330 c00000002209352c 
0000000000000000
[    0.007520] NIP [c000000022011ed8] 
update_mask_from_threadgroup+0x128/0x1a0
[    0.007528] LR [c000000022011e10] update_mask_from_threadgroup+0x60/0x1a0
[    0.007536] Call Trace:
[    0.007539] [c000000025633b90] [c000000022011e10] 
update_mask_from_threadgroup+0x60/0x1a0 (unreliable)
[    0.007550] [c000000025633be0] [c000000022012210] 
init_thread_group_cache_map+0x2c0/0x338
[    0.007559] [c000000025633c50] [c0000000220125a0] 
smp_prepare_cpus+0x318/0x510
[    0.007568] [c000000025633d10] [c000000022004874] 
kernel_init_freeable+0x198/0x3cc
[    0.007578] [c000000025633de0] [c000000020010164] kernel_init+0x34/0x1b0
[    0.007586] [c000000025633e50] [c00000002000cd94] 
ret_from_kernel_user_thread+0x14/0x1c
[    0.007596] --- interrupt: 0 at 0x0
[    0.007601] NIP:  0000000000000000 LR: 0000000000000000 CTR: 
0000000000000000
[    0.007608] REGS: c000000025633e80 TRAP: 0000   Not tainted 
(6.6.0-rc5-fix-setup-common+)
[    0.007632] MSR:  0000000000000000 <>  CR: 00000000  XER: 00000000
[    0.007651] CFAR: 0000000000000000 IRQMASK: 0
[    0.007651] GPR00: 0000000000000000 0000000000000000 0000000000000000 
0000000000000000
[    0.007651] GPR04: 0000000000000000 0000000000000000 0000000000000000 
0000000000000000
[    0.007651] GPR08: 0000000000000000 0000000000000000 0000000000000000 
0000000000000000
[    0.007651] GPR12: 0000000000000000 0000000000000000 0000000000000000 
0000000000000000
[    0.007651] GPR16: 0000000000000000 0000000000000000 0000000000000000 
0000000000000000
[    0.007651] GPR20: 0000000000000000 0000000000000000 0000000000000000 
0000000000000000
[    0.007651] GPR24: 0000000000000000 0000000000000000 0000000000000000 
0000000000000000
[    0.007651] GPR28: 0000000000000000 0000000000000000 0000000000000000 
0000000000000000
[    0.007742] NIP [0000000000000000] 0x0
[    0.007756] LR [0000000000000000] 0x0
[    0.007769] --- interrupt: 0
[    0.007779] Code: 7ca507b4 79081764 7d1e4214 8108000c 7f882000 
409effdc 48000010 38e70001 7ce707b4 4bffffa4 2f8affff 409e0010 
<0fe00000> 3860ffc3 4800004c 7f9b5000
[    0.007805] ---[ end trace 0000000000000000 ]---
[    0.007997] RCU Tasks Rude: Setting shift to 2 and lim to 1 
rcu_task_cb_adjust=1.
[    0.008018] RCU Tasks Trace: Setting shift to 2 and lim to 1 
rcu_task_cb_adjust=1.
[    0.008043] POWER10 performance monitor hardware support registered
[    0.008071] rcu: Hierarchical SRCU implementation.
[    0.008078] rcu:     Max phase no-delay instances is 1000.
[    0.008516] smp: Bringing up secondary CPUs ...
[    0.008735] smp: Brought up 1 node, 2 CPUs
...

Note: no warning observed if crashing CPU is 0, 8, 16, 24, 32, ....

Code that generates warning:

File: arch/powerpc/kernel/smp.c
Function: update_mask_from_threadgroup
...
         if (unlikely(i_group_start == -1)) {
             WARN_ON_ONCE(1);
             return -ENODATA;
         }


Thanks,
Sourabh
Pingfan Liu Oct. 12, 2023, 1:20 p.m. UTC | #6
On Wed, Oct 11, 2023 at 6:53 PM Sourabh Jain <sourabhjain@linux.ibm.com> wrote:
>
> Hello Pingfan,
> >>> With this patch series applied, the kdump kernel fails to boot on
> >>> powerpc with nr_cpus=1.
> >>>
> >>> Console logs:
> >>> -------------------
> >>> [root]# echo c > /proc/sysrq-trigger
> >>> [   74.783235] sysrq: Trigger a crash
> >>> [   74.783244] Kernel panic - not syncing: sysrq triggered crash
> >>> [   74.783252] CPU: 58 PID: 3838 Comm: bash Kdump: loaded Not tainted
> >>> 6.6.0-rc5pf-nr-cpus+ #3
> >>> [   74.783259] Hardware name: POWER10 (raw) phyp pSeries
> >>> [   74.783275] Call Trace:
> >>> [   74.783280] [c00000020f4ebac0] [c000000000ed9f38]
> >>> dump_stack_lvl+0x6c/0x9c (unreliable)
> >>> [   74.783291] [c00000020f4ebaf0] [c000000000150300] panic+0x178/0x438
> >>> [   74.783298] [c00000020f4ebb90] [c000000000936d48]
> >>> sysrq_handle_crash+0x28/0x30
> >>> [   74.783304] [c00000020f4ebbf0] [c00000000093773c]
> >>> __handle_sysrq+0x10c/0x250
> >>> [   74.783309] [c00000020f4ebc90] [c000000000937fa8]
> >>> write_sysrq_trigger+0xc8/0x168
> >>> [   74.783314] [c00000020f4ebcd0] [c000000000665d8c]
> >>> proc_reg_write+0x10c/0x1b0
> >>> [   74.783321] [c00000020f4ebd00] [c00000000058da54]
> >>> vfs_write+0x104/0x4b0
> >>> [   74.783326] [c00000020f4ebdc0] [c00000000058dfdc]
> >>> ksys_write+0x7c/0x140
> >>> [   74.783331] [c00000020f4ebe10] [c000000000033a64]
> >>> system_call_exception+0x144/0x3a0
> >>> [   74.783337] [c00000020f4ebe50] [c00000000000c554]
> >>> system_call_common+0xf4/0x258
> >>> [   74.783343] --- interrupt: c00 at 0x7fffa0721594
> >>> [   74.783352] NIP:  00007fffa0721594 LR: 00007fffa0697bf4 CTR:
> >>> 0000000000000000
> >>> [   74.783364] REGS: c00000020f4ebe80 TRAP: 0c00   Not tainted
> >>> (6.6.0-rc5pf-nr-cpus+)
> >>> [   74.783376] MSR:  800000000280f033
> >>> <SF,VEC,VSX,EE,PR,FP,ME,IR,DR,RI,LE>  CR: 28222202  XER: 00000000
> >>> [   74.783394] IRQMASK: 0
> >>> [   74.783394] GPR00: 0000000000000004 00007ffffc4b6800 00007fffa0807300
> >>> 0000000000000001
> >>> [   74.783394] GPR04: 000000013549ea60 0000000000000002 0000000000000010
> >>> 0000000000000000
> >>> [   74.783394] GPR08: 0000000000000000 0000000000000000 0000000000000000
> >>> 0000000000000000
> >>> [   74.783394] GPR12: 0000000000000000 00007fffa0abaf70 0000000040000000
> >>> 000000011a0f9798
> >>> [   74.783394] GPR16: 000000011a0f9724 000000011a097688 000000011a02ff70
> >>> 000000011a0fd568
> >>> [   74.783394] GPR20: 0000000135554bf0 0000000000000001 000000011a0aa478
> >>> 00007ffffc4b6a24
> >>> [   74.783394] GPR24: 00007ffffc4b6a20 000000011a0faf94 0000000000000002
> >>> 000000013549ea60
> >>> [   74.783394] GPR28: 0000000000000002 00007fffa08017a0 000000013549ea60
> >>> 0000000000000002
> >>> [   74.783440] NIP [00007fffa0721594] 0x7fffa0721594
> >>> [   74.783443] LR [00007fffa0697bf4] 0x7fffa0697bf4
> >>> [   74.783447] --- interrupt: c00
> >>> I'm in purgatory
> >>> [    0.000000] radix-mmu: Page sizes from device-tree:
> >>> [    0.000000] radix-mmu: Page size shift = 12 AP=0x0
> >>> [    0.000000] radix-mmu: Page size shift = 16 AP=0x5
> >>> [    0.000000] radix-mmu: Page size shift = 21 AP=0x1
> >>> [    0.000000] radix-mmu: Page size shift = 30 AP=0x2
> >>> [    0.000000] Activating Kernel Userspace Access Prevention
> >>> [    0.000000] Activating Kernel Userspace Execution Prevention
> >>> [    0.000000] radix-mmu: Mapped 0x0000000000000000-0x0000000000010000
> >>> with 64.0 KiB pages (exec)
> >>> [    0.000000] radix-mmu: Mapped 0x0000000000010000-0x0000000000200000
> >>> with 64.0 KiB pages
> >>> [    0.000000] radix-mmu: Mapped 0x0000000000200000-0x0000000020000000
> >>> with 2.00 MiB pages
> >>> [    0.000000] radix-mmu: Mapped 0x0000000020000000-0x0000000022600000
> >>> with 2.00 MiB pages (exec)
> >>> [    0.000000] radix-mmu: Mapped 0x0000000022600000-0x0000000040000000
> >>> with 2.00 MiB pages
> >>> [    0.000000] radix-mmu: Mapped 0x0000000040000000-0x0000000180000000
> >>> with 1.00 GiB pages
> >>> [    0.000000] radix-mmu: Mapped 0x0000000180000000-0x00000001a0000000
> >>> with 2.00 MiB pages
> >>> [    0.000000] lpar: Using radix MMU under hypervisor
> >>> [    0.000000] Linux version 6.6.0-rc5pf-nr-cpus+
> >>> (root@ltcever7x0-lp1.aus.stglabs.ibm.com) (gcc (GCC) 8.5.0 20210514 (Red
> >>> Hat 8.5.0-20), GNU ld version 2.30-123.el8) #3 SMP Mon Oct  9 11:07:
> >>> 41 CDT 2023
> >>> [    0.000000] Found initrd at 0xc000000022e60000:0xc0000000248f08d8
> >>> [    0.000000] Hardware name: IBM,9043-MRX POWER10 (raw) 0x800200
> >>> 0xf000006 of:IBM,FW1060.00 (NM1060_016) hv:phyp pSeries
> >>> [    0.000000] printk: bootconsole [udbg0] enabled
> >>> [    0.000000] the round shift between dt seq and the cpu logic number:
> >>> 56
> >>> [    0.000000] BUG: Unable to handle kernel data access on write at
> >>> 0xc0000001a0000000
> >>> [    0.000000] Faulting instruction address: 0xc000000022009c64
> >>> [    0.000000] Oops: Kernel access of bad area, sig: 11 [#1]
> >>> [    0.000000] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
> >>> [    0.000000] Modules linked in:
> >>> [    0.000000] CPU: 2 PID: 0 Comm: swapper Not tainted
> >>> 6.6.0-rc5pf-nr-cpus+ #3
> >>> [    0.000000] Hardware name:  POWER10 (raw)  hv:phyp pSeries
> >>> [    0.000000] NIP:  c000000022009c64 LR: c000000022009c54 CTR:
> >>> c0000000201ff348
> >>> [    0.000000] REGS: c000000022aebb00 TRAP: 0300   Not tainted
> >>> (6.6.0-rc5pf-nr-cpus+)
> >>> [    0.000000] MSR:  8000000000001033 <SF,ME,IR,DR,RI,LE> CR: 28222824
> >>> XER: 00000001
> >>> [    0.000000] CFAR: c000000020031574 DAR: c0000001a0000000 DSISR:
> >>> 42000000 IRQMASK: 1
> >>> [    0.000000] GPR00: c000000022009ba0 c000000022aebda0 c0000000213d1300
> >>> 0000000000000004
> >>> [    0.000000] GPR04: 0000000000000001 c000000022aebbc0 c000000022aebbb8
> >>> 0000000000000000
> >>> [    0.000000] GPR08: 0000000000000001 c00000019ffffff8 000000000000003a
> >>> c0000000229c8a78
> >>> [    0.000000] GPR12: 0000000000002000 c000000022e4a800 c0000000211d34b8
> >>> c0000000211d3aa8
> >>> [    0.000000] GPR16: c0000000211d75a0 c0000000211d75b0 c0000000225f3b98
> >>> 0000000000000000
> >>> [    0.000000] GPR20: 0000000000000001 0000000000000001 0000000000000001
> >>> 0000000000000001
> >>> [    0.000000] GPR24: 0000000000000008 0000000000000000 0000000000000001
> >>> c00000019ffffdc0
> >>> [    0.000000] GPR28: 0000000000000002 c000000022b368e0 c000000022aebe08
> >>> 0000000000000008
> >>> [    0.000000] NIP [c000000022009c64] smp_setup_cpu_maps+0x420/0x724
> >>> [    0.000000] LR [c000000022009c54] smp_setup_cpu_maps+0x410/0x724
> >>> [    0.000000] Call Trace:
> >>> [    0.000000] [c000000022aebda0] [c000000022009ba0]
> >>> smp_setup_cpu_maps+0x35c/0x724 (unreliable)
> >>> [    0.000000] [c000000022aebeb0] [c00000002200a19c]
> >>> setup_arch+0x1b8/0x54c
> >>> [    0.000000] [c000000022aebf30] [c000000022003f88]
> >>> start_kernel+0xb0/0x768
> >>> [    0.000000] [c000000022aebfe0] [c00000002000d888]
> >>> start_here_common+0x1c/0x20
> >>> [    0.000000] Code: 3929ffff 7f89e040 409c002c 7ec4b378 7f83e378
> >>> 4a027939 7f83e378 4a0278e5 e95b0018 3d22017d e929f028 7d4ac42c
> >>> <7d49c12e> eb7b0000 7e99a378 4bffff3c
> >> The faulting instruction address, 0xc000000022009c6, corresponds to the code
> >> below:
> >>
> >> File:
> >> arch/powerpc/kernel/setup-common.c
> >>
> >> Function
> >> void __init smp_setup_cpu_maps(void)
> >> {
> >>              ...
> >>              cpu_to_phys_id[bt_thread] =
> >> be32_to_cpu(intserv_node->intserv[bt_thread]);
> >>              ...
> >> }
> >>
> >> Hope it helps.
> >>
> > Appreciate your help.
> >
> > This issue should be linked with the capability of cpu_to_phys_id[].
> >
> > Could you please to try the fix suggested at the end of the email?
> > It should be a fix for
> > [PATCHv8 3/5] powerpc/setup: Handle the case when boot_cpuid greater than nr_cpus
> >
> >
> > Thanks,
> >
> > Pingfan
> >
> > ---
> >
> > diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c
> > index bd7853a4bc91..849adc7a4b47 100644
> > --- a/arch/powerpc/kernel/setup-common.c
> > +++ b/arch/powerpc/kernel/setup-common.c
> > @@ -464,12 +464,6 @@ void __init smp_setup_cpu_maps(void)
> >       DBG("smp_setup_cpu_maps()\n");
> >
> >       INIT_LIST_HEAD(&head);
> > -     cpu_to_phys_id = memblock_alloc(nr_cpu_ids * sizeof(u32),
> > -                                     __alignof__(u32));
> > -     if (!cpu_to_phys_id)
> > -             panic("%s: Failed to allocate %zu bytes align=0x%zx\n",
> > -                   __func__, nr_cpu_ids * sizeof(u32), __alignof__(u32));
> > -
> >       for_each_node_by_type(dn, "cpu") {
> >               const __be32 *intserv;
> >               __be32 cpu_be;
> > @@ -533,6 +527,16 @@ void __init smp_setup_cpu_maps(void)
> >               }
> >
> >       }
> > +
> > +     /* There may be hole between cpu0 and boot cpu */
> > +     j = (bt_thread + 1) > nr_cpu_ids ? (bt_thread + 1) : nr_cpu_ids;
> > +     cpu_to_phys_id = memblock_alloc(j * sizeof(u32),
> > +                                     __alignof__(u32));
> > +     if (!cpu_to_phys_id)
> > +             panic("%s: Failed to allocate %zu bytes align=0x%zx\n",
> > +                   __func__, nr_cpu_ids * sizeof(u32), __alignof__(u32));
> > +
> > +
> >       cpu = 0;
> >       list_del_init(&head);
> >       /* Select the primary thread, the boot cpu's slibing, as the logic 0 */
>
> With the above changes applied, kdump kernel boots fine with a WARNING:
>
> [root]# echo c > /proc/sysrq-trigger
> [  310.748248] sysrq: Trigger a crash
> [  310.748256] Kernel panic - not syncing: sysrq triggered crash
> [  310.748266] CPU: 26 PID: 2610 Comm: bash Kdump: loaded Not tainted
> 6.6.0-rc5-fix-setup-common+ #3
> [  310.748273] Hardware name: IBM,9043-MRX POWER10  hv:phyp pSeries
> [  310.748280] Call Trace:
> [  310.748284] [c000000184717ac0] [c000000000ecf8d8]
> dump_stack_lvl+0x6c/0x9c (unreliable)
> [  310.748298] [c000000184717af0] [c000000000150310] panic+0x178/0x438
> [  310.748307] [c000000184717b90] [c00000000092c8b8]
> sysrq_handle_crash+0x28/0x30
> [  310.748316] [c000000184717bf0] [c00000000092d2ac]
> __handle_sysrq+0x10c/0x250
> [  310.748330] [c000000184717c90] [c00000000092db18]
> write_sysrq_trigger+0xc8/0x168
> [  310.748339] [c000000184717cd0] [c00000000065c21c]
> proc_reg_write+0x10c/0x1b0
> [  310.748349] [c000000184717d00] [c000000000583f94] vfs_write+0x104/0x4b0
> [  310.748356] [c000000184717dc0] [c00000000058451c] ksys_write+0x7c/0x140
> [  310.748365] [c000000184717e10] [c000000000033a54]
> system_call_exception+0x144/0x3a0
> [  310.748377] [c000000184717e50] [c00000000000c554]
> system_call_common+0xf4/0x258
> [  310.748389] --- interrupt: c00 at 0x7fff97720c34
> [  310.748395] NIP:  00007fff97720c34 LR: 00007fff97697c74 CTR:
> 0000000000000000
> [  310.748404] REGS: c000000184717e80 TRAP: 0c00   Not tainted
> (6.6.0-rc5-fix-setup-common+)
> [  310.748413] MSR:  800000000280f033
> <SF,VEC,VSX,EE,PR,FP,ME,IR,DR,RI,LE>  CR: 28222202  XER: 00000000
> [  310.748430] IRQMASK: 0
> [  310.748430] GPR00: 0000000000000004 00007fffffabc510 00007fff97807300
> 0000000000000001
> [  310.748430] GPR04: 00000001624f7910 0000000000000002 0000000000000010
> 00007fff97669724
> [  310.748430] GPR08: 0000000000000000 0000000000000000 0000000000000000
> 0000000000000000
> [  310.748430] GPR12: 0000000000000000 00007fff97a5aee0 0000000040000000
> 0000000125c39798
> [  310.748430] GPR16: 0000000125c39724 0000000125bd8128 0000000125b70370
> 0000000125c3d568
> [  310.748430] GPR20: 0000000162551030 0000000000000001 0000000125beaf18
> 00007fffffabc734
> [  310.748430] GPR24: 00007fffffabc730 0000000125c3af94 0000000000000002
> 00000001624f7910
> [  310.748430] GPR28: 0000000000000002 00007fff97801798 00000001624f7910
> 0000000000000002
> [  310.748475] NIP [00007fff97720c34] 0x7fff97720c34
> [  310.748478] LR [00007fff97697c74] 0x7fff97697c74
> [  310.748482] --- interrupt: c00
> I'm in purgatory
> [    0.000000] radix-mmu: Page sizes from device-tree:
> [    0.000000] radix-mmu: Page size shift = 12 AP=0x0
> [    0.000000] radix-mmu: Page size shift = 16 AP=0x5
> [    0.000000] radix-mmu: Page size shift = 21 AP=0x1
> [    0.000000] radix-mmu: Page size shift = 30 AP=0x2
> [    0.000000] Activating Kernel Userspace Access Prevention
> [    0.000000] Activating Kernel Userspace Execution Prevention
> [    0.000000] radix-mmu: Mapped 0x0000000000000000-0x0000000000010000
> with 64.0 KiB pages (exec)
> [    0.000000] radix-mmu: Mapped 0x0000000000010000-0x0000000000200000
> with 64.0 KiB pages
> [    0.000000] radix-mmu: Mapped 0x0000000000200000-0x0000000020000000
> with 2.00 MiB pages
> [    0.000000] radix-mmu: Mapped 0x0000000020000000-0x0000000022600000
> with 2.00 MiB pages (exec)
> [    0.000000] radix-mmu: Mapped 0x0000000022600000-0x0000000040000000
> with 2.00 MiB pages
>
> Trimmed logs ....
>
> [    0.001738] Mount-cache hash table entries: 16384 (order: 1, 131072
> bytes, linear)
> [    0.001751] Mountpoint-cache hash table entries: 16384 (order: 1,
> 131072 bytes, linear)
> [    0.007339] ------------[ cut here ]------------
> [    0.007356] WARNING: CPU: 2 PID: 1 at arch/powerpc/kernel/smp.c:941
> update_mask_from_threadgroup+0x128/0x1a0
> [    0.007371] Modules linked in:
> [    0.007377] CPU: 2 PID: 1 Comm: swapper/2 Not tainted
> 6.6.0-rc5-fix-setup-common+ #3
> [    0.007385] Hardware name: IBM,9043-MRX POWER10 hv:phyp pSeries
> [    0.007393] NIP:  c000000022011ed8 LR: c000000022011e10 CTR:
> 0000000000000000
> [    0.007411] REGS: c0000000256338f0 TRAP: 0700   Not tainted
> (6.6.0-rc5-fix-setup-common+)
> [    0.007425] MSR:  8000000002029033 <SF,VEC,EE,ME,IR,DR,RI,LE>  CR:
> 44000842  XER: 0000000c
> [    0.007444] CFAR: c000000022011e78 IRQMASK: 0
> [    0.007444] GPR00: c000000022011e10 c000000025633b90 c0000000213c1300
> 0000000000000002
> [    0.007444] GPR04: 0000000000000000 0000000000000005 0000000000000001
> 0000000000000002
> [    0.007444] GPR08: 0000000000000008 0000000000000001 0000000000000002
> 0000000000000004
> [    0.007444] GPR12: 0000000000000000 c000000022e3ac00 c000000020010138
> 0000000000000000
> [    0.007444] GPR16: 0000000000000000 0000000000000000 0000000000000000
> 0000000000000000
> [    0.007444] GPR20: 0000000000000018 c000000022150968 c000000022093580
> c0000000253df000
> [    0.007444] GPR24: 0000000000000002 0000000000000000 c000000022b32058
> 0000000000000000
> [    0.007444] GPR28: c00000015fca0a68 c000000022ba0330 c00000002209352c
> 0000000000000000
> [    0.007520] NIP [c000000022011ed8]
> update_mask_from_threadgroup+0x128/0x1a0
> [    0.007528] LR [c000000022011e10] update_mask_from_threadgroup+0x60/0x1a0
> [    0.007536] Call Trace:
> [    0.007539] [c000000025633b90] [c000000022011e10]
> update_mask_from_threadgroup+0x60/0x1a0 (unreliable)
> [    0.007550] [c000000025633be0] [c000000022012210]
> init_thread_group_cache_map+0x2c0/0x338
> [    0.007559] [c000000025633c50] [c0000000220125a0]
> smp_prepare_cpus+0x318/0x510
> [    0.007568] [c000000025633d10] [c000000022004874]
> kernel_init_freeable+0x198/0x3cc
> [    0.007578] [c000000025633de0] [c000000020010164] kernel_init+0x34/0x1b0
> [    0.007586] [c000000025633e50] [c00000002000cd94]
> ret_from_kernel_user_thread+0x14/0x1c
> [    0.007596] --- interrupt: 0 at 0x0
> [    0.007601] NIP:  0000000000000000 LR: 0000000000000000 CTR:
> 0000000000000000
> [    0.007608] REGS: c000000025633e80 TRAP: 0000   Not tainted
> (6.6.0-rc5-fix-setup-common+)
> [    0.007632] MSR:  0000000000000000 <>  CR: 00000000  XER: 00000000
> [    0.007651] CFAR: 0000000000000000 IRQMASK: 0
> [    0.007651] GPR00: 0000000000000000 0000000000000000 0000000000000000
> 0000000000000000
> [    0.007651] GPR04: 0000000000000000 0000000000000000 0000000000000000
> 0000000000000000
> [    0.007651] GPR08: 0000000000000000 0000000000000000 0000000000000000
> 0000000000000000
> [    0.007651] GPR12: 0000000000000000 0000000000000000 0000000000000000
> 0000000000000000
> [    0.007651] GPR16: 0000000000000000 0000000000000000 0000000000000000
> 0000000000000000
> [    0.007651] GPR20: 0000000000000000 0000000000000000 0000000000000000
> 0000000000000000
> [    0.007651] GPR24: 0000000000000000 0000000000000000 0000000000000000
> 0000000000000000
> [    0.007651] GPR28: 0000000000000000 0000000000000000 0000000000000000
> 0000000000000000
> [    0.007742] NIP [0000000000000000] 0x0
> [    0.007756] LR [0000000000000000] 0x0
> [    0.007769] --- interrupt: 0
> [    0.007779] Code: 7ca507b4 79081764 7d1e4214 8108000c 7f882000
> 409effdc 48000010 38e70001 7ce707b4 4bffffa4 2f8affff 409e0010
> <0fe00000> 3860ffc3 4800004c 7f9b5000
> [    0.007805] ---[ end trace 0000000000000000 ]---
> [    0.007997] RCU Tasks Rude: Setting shift to 2 and lim to 1
> rcu_task_cb_adjust=1.
> [    0.008018] RCU Tasks Trace: Setting shift to 2 and lim to 1
> rcu_task_cb_adjust=1.
> [    0.008043] POWER10 performance monitor hardware support registered
> [    0.008071] rcu: Hierarchical SRCU implementation.
> [    0.008078] rcu:     Max phase no-delay instances is 1000.
> [    0.008516] smp: Bringing up secondary CPUs ...
> [    0.008735] smp: Brought up 1 node, 2 CPUs
> ...
>
> Note: no warning observed if crashing CPU is 0, 8, 16, 24, 32, ....
>
> Code that generates warning:
>
> File: arch/powerpc/kernel/smp.c
> Function: update_mask_from_threadgroup
> ...
>          if (unlikely(i_group_start == -1)) {
>              WARN_ON_ONCE(1);
>              return -ENODATA;
>          }
>

It seems that the crash cpu passed the statements in
init_thread_group_cache_map()
{
if (unlikely(cpu_group_start == -1)) {
WARN_ON_ONCE(1);
return -ENODATA;
}

}

But raising warn in the above snippet.  So it means that
get_cpu_thread_group_start(i, tg) for the @first_thread failed in
update_mask_from_threadgroup().  At present, I have no idea about it.


And is this warning observed if only applying [1-2/5] ?

According to my collected data, percpu area will cost 1792 kB per cpu.
Forcing all eight threads in a core online will cost 10752KB more than
the result if applying the whole series. Maybe I can put [3-5/5] aside
as Hari suggested, and try them later if needed.

Thanks,

Pingfan
Sourabh Jain Oct. 16, 2023, 6:43 a.m. UTC | #7
Hello Pingfan,

>>>>> With this patch series applied, the kdump kernel fails to boot on
>>>>> powerpc with nr_cpus=1.
>>>>>
>>>>> Console logs:
>>>>> -------------------
>>>>> [root]# echo c > /proc/sysrq-trigger
>>>>> [   74.783235] sysrq: Trigger a crash
>>>>> [   74.783244] Kernel panic - not syncing: sysrq triggered crash
>>>>> [   74.783252] CPU: 58 PID: 3838 Comm: bash Kdump: loaded Not tainted
>>>>> 6.6.0-rc5pf-nr-cpus+ #3
>>>>> [   74.783259] Hardware name: POWER10 (raw) phyp pSeries
>>>>> [   74.783275] Call Trace:
>>>>> [   74.783280] [c00000020f4ebac0] [c000000000ed9f38]
>>>>> dump_stack_lvl+0x6c/0x9c (unreliable)
>>>>> [   74.783291] [c00000020f4ebaf0] [c000000000150300] panic+0x178/0x438
>>>>> [   74.783298] [c00000020f4ebb90] [c000000000936d48]
>>>>> sysrq_handle_crash+0x28/0x30
>>>>> [   74.783304] [c00000020f4ebbf0] [c00000000093773c]
>>>>> __handle_sysrq+0x10c/0x250
>>>>> [   74.783309] [c00000020f4ebc90] [c000000000937fa8]
>>>>> write_sysrq_trigger+0xc8/0x168
>>>>> [   74.783314] [c00000020f4ebcd0] [c000000000665d8c]
>>>>> proc_reg_write+0x10c/0x1b0
>>>>> [   74.783321] [c00000020f4ebd00] [c00000000058da54]
>>>>> vfs_write+0x104/0x4b0
>>>>> [   74.783326] [c00000020f4ebdc0] [c00000000058dfdc]
>>>>> ksys_write+0x7c/0x140
>>>>> [   74.783331] [c00000020f4ebe10] [c000000000033a64]
>>>>> system_call_exception+0x144/0x3a0
>>>>> [   74.783337] [c00000020f4ebe50] [c00000000000c554]
>>>>> system_call_common+0xf4/0x258
>>>>> [   74.783343] --- interrupt: c00 at 0x7fffa0721594
>>>>> [   74.783352] NIP:  00007fffa0721594 LR: 00007fffa0697bf4 CTR:
>>>>> 0000000000000000
>>>>> [   74.783364] REGS: c00000020f4ebe80 TRAP: 0c00   Not tainted
>>>>> (6.6.0-rc5pf-nr-cpus+)
>>>>> [   74.783376] MSR:  800000000280f033
>>>>> <SF,VEC,VSX,EE,PR,FP,ME,IR,DR,RI,LE>  CR: 28222202  XER: 00000000
>>>>> [   74.783394] IRQMASK: 0
>>>>> [   74.783394] GPR00: 0000000000000004 00007ffffc4b6800 00007fffa0807300
>>>>> 0000000000000001
>>>>> [   74.783394] GPR04: 000000013549ea60 0000000000000002 0000000000000010
>>>>> 0000000000000000
>>>>> [   74.783394] GPR08: 0000000000000000 0000000000000000 0000000000000000
>>>>> 0000000000000000
>>>>> [   74.783394] GPR12: 0000000000000000 00007fffa0abaf70 0000000040000000
>>>>> 000000011a0f9798
>>>>> [   74.783394] GPR16: 000000011a0f9724 000000011a097688 000000011a02ff70
>>>>> 000000011a0fd568
>>>>> [   74.783394] GPR20: 0000000135554bf0 0000000000000001 000000011a0aa478
>>>>> 00007ffffc4b6a24
>>>>> [   74.783394] GPR24: 00007ffffc4b6a20 000000011a0faf94 0000000000000002
>>>>> 000000013549ea60
>>>>> [   74.783394] GPR28: 0000000000000002 00007fffa08017a0 000000013549ea60
>>>>> 0000000000000002
>>>>> [   74.783440] NIP [00007fffa0721594] 0x7fffa0721594
>>>>> [   74.783443] LR [00007fffa0697bf4] 0x7fffa0697bf4
>>>>> [   74.783447] --- interrupt: c00
>>>>> I'm in purgatory
>>>>> [    0.000000] radix-mmu: Page sizes from device-tree:
>>>>> [    0.000000] radix-mmu: Page size shift = 12 AP=0x0
>>>>> [    0.000000] radix-mmu: Page size shift = 16 AP=0x5
>>>>> [    0.000000] radix-mmu: Page size shift = 21 AP=0x1
>>>>> [    0.000000] radix-mmu: Page size shift = 30 AP=0x2
>>>>> [    0.000000] Activating Kernel Userspace Access Prevention
>>>>> [    0.000000] Activating Kernel Userspace Execution Prevention
>>>>> [    0.000000] radix-mmu: Mapped 0x0000000000000000-0x0000000000010000
>>>>> with 64.0 KiB pages (exec)
>>>>> [    0.000000] radix-mmu: Mapped 0x0000000000010000-0x0000000000200000
>>>>> with 64.0 KiB pages
>>>>> [    0.000000] radix-mmu: Mapped 0x0000000000200000-0x0000000020000000
>>>>> with 2.00 MiB pages
>>>>> [    0.000000] radix-mmu: Mapped 0x0000000020000000-0x0000000022600000
>>>>> with 2.00 MiB pages (exec)
>>>>> [    0.000000] radix-mmu: Mapped 0x0000000022600000-0x0000000040000000
>>>>> with 2.00 MiB pages
>>>>> [    0.000000] radix-mmu: Mapped 0x0000000040000000-0x0000000180000000
>>>>> with 1.00 GiB pages
>>>>> [    0.000000] radix-mmu: Mapped 0x0000000180000000-0x00000001a0000000
>>>>> with 2.00 MiB pages
>>>>> [    0.000000] lpar: Using radix MMU under hypervisor
>>>>> [    0.000000] Linux version 6.6.0-rc5pf-nr-cpus+
>>>>> (root@ltcever7x0-lp1.aus.stglabs.ibm.com) (gcc (GCC) 8.5.0 20210514 (Red
>>>>> Hat 8.5.0-20), GNU ld version 2.30-123.el8) #3 SMP Mon Oct  9 11:07:
>>>>> 41 CDT 2023
>>>>> [    0.000000] Found initrd at 0xc000000022e60000:0xc0000000248f08d8
>>>>> [    0.000000] Hardware name: IBM,9043-MRX POWER10 (raw) 0x800200
>>>>> 0xf000006 of:IBM,FW1060.00 (NM1060_016) hv:phyp pSeries
>>>>> [    0.000000] printk: bootconsole [udbg0] enabled
>>>>> [    0.000000] the round shift between dt seq and the cpu logic number:
>>>>> 56
>>>>> [    0.000000] BUG: Unable to handle kernel data access on write at
>>>>> 0xc0000001a0000000
>>>>> [    0.000000] Faulting instruction address: 0xc000000022009c64
>>>>> [    0.000000] Oops: Kernel access of bad area, sig: 11 [#1]
>>>>> [    0.000000] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
>>>>> [    0.000000] Modules linked in:
>>>>> [    0.000000] CPU: 2 PID: 0 Comm: swapper Not tainted
>>>>> 6.6.0-rc5pf-nr-cpus+ #3
>>>>> [    0.000000] Hardware name:  POWER10 (raw)  hv:phyp pSeries
>>>>> [    0.000000] NIP:  c000000022009c64 LR: c000000022009c54 CTR:
>>>>> c0000000201ff348
>>>>> [    0.000000] REGS: c000000022aebb00 TRAP: 0300   Not tainted
>>>>> (6.6.0-rc5pf-nr-cpus+)
>>>>> [    0.000000] MSR:  8000000000001033 <SF,ME,IR,DR,RI,LE> CR: 28222824
>>>>> XER: 00000001
>>>>> [    0.000000] CFAR: c000000020031574 DAR: c0000001a0000000 DSISR:
>>>>> 42000000 IRQMASK: 1
>>>>> [    0.000000] GPR00: c000000022009ba0 c000000022aebda0 c0000000213d1300
>>>>> 0000000000000004
>>>>> [    0.000000] GPR04: 0000000000000001 c000000022aebbc0 c000000022aebbb8
>>>>> 0000000000000000
>>>>> [    0.000000] GPR08: 0000000000000001 c00000019ffffff8 000000000000003a
>>>>> c0000000229c8a78
>>>>> [    0.000000] GPR12: 0000000000002000 c000000022e4a800 c0000000211d34b8
>>>>> c0000000211d3aa8
>>>>> [    0.000000] GPR16: c0000000211d75a0 c0000000211d75b0 c0000000225f3b98
>>>>> 0000000000000000
>>>>> [    0.000000] GPR20: 0000000000000001 0000000000000001 0000000000000001
>>>>> 0000000000000001
>>>>> [    0.000000] GPR24: 0000000000000008 0000000000000000 0000000000000001
>>>>> c00000019ffffdc0
>>>>> [    0.000000] GPR28: 0000000000000002 c000000022b368e0 c000000022aebe08
>>>>> 0000000000000008
>>>>> [    0.000000] NIP [c000000022009c64] smp_setup_cpu_maps+0x420/0x724
>>>>> [    0.000000] LR [c000000022009c54] smp_setup_cpu_maps+0x410/0x724
>>>>> [    0.000000] Call Trace:
>>>>> [    0.000000] [c000000022aebda0] [c000000022009ba0]
>>>>> smp_setup_cpu_maps+0x35c/0x724 (unreliable)
>>>>> [    0.000000] [c000000022aebeb0] [c00000002200a19c]
>>>>> setup_arch+0x1b8/0x54c
>>>>> [    0.000000] [c000000022aebf30] [c000000022003f88]
>>>>> start_kernel+0xb0/0x768
>>>>> [    0.000000] [c000000022aebfe0] [c00000002000d888]
>>>>> start_here_common+0x1c/0x20
>>>>> [    0.000000] Code: 3929ffff 7f89e040 409c002c 7ec4b378 7f83e378
>>>>> 4a027939 7f83e378 4a0278e5 e95b0018 3d22017d e929f028 7d4ac42c
>>>>> <7d49c12e> eb7b0000 7e99a378 4bffff3c
>>>> The faulting instruction address, 0xc000000022009c6, corresponds to the code
>>>> below:
>>>>
>>>> File:
>>>> arch/powerpc/kernel/setup-common.c
>>>>
>>>> Function
>>>> void __init smp_setup_cpu_maps(void)
>>>> {
>>>>               ...
>>>>               cpu_to_phys_id[bt_thread] =
>>>> be32_to_cpu(intserv_node->intserv[bt_thread]);
>>>>               ...
>>>> }
>>>>
>>>> Hope it helps.
>>>>
>>> Appreciate your help.
>>>
>>> This issue should be linked with the capability of cpu_to_phys_id[].
>>>
>>> Could you please to try the fix suggested at the end of the email?
>>> It should be a fix for
>>> [PATCHv8 3/5] powerpc/setup: Handle the case when boot_cpuid greater than nr_cpus
>>>
>>>
>>> Thanks,
>>>
>>> Pingfan
>>>
>>> ---
>>>
>>> diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c
>>> index bd7853a4bc91..849adc7a4b47 100644
>>> --- a/arch/powerpc/kernel/setup-common.c
>>> +++ b/arch/powerpc/kernel/setup-common.c
>>> @@ -464,12 +464,6 @@ void __init smp_setup_cpu_maps(void)
>>>        DBG("smp_setup_cpu_maps()\n");
>>>
>>>        INIT_LIST_HEAD(&head);
>>> -     cpu_to_phys_id = memblock_alloc(nr_cpu_ids * sizeof(u32),
>>> -                                     __alignof__(u32));
>>> -     if (!cpu_to_phys_id)
>>> -             panic("%s: Failed to allocate %zu bytes align=0x%zx\n",
>>> -                   __func__, nr_cpu_ids * sizeof(u32), __alignof__(u32));
>>> -
>>>        for_each_node_by_type(dn, "cpu") {
>>>                const __be32 *intserv;
>>>                __be32 cpu_be;
>>> @@ -533,6 +527,16 @@ void __init smp_setup_cpu_maps(void)
>>>                }
>>>
>>>        }
>>> +
>>> +     /* There may be hole between cpu0 and boot cpu */
>>> +     j = (bt_thread + 1) > nr_cpu_ids ? (bt_thread + 1) : nr_cpu_ids;
>>> +     cpu_to_phys_id = memblock_alloc(j * sizeof(u32),
>>> +                                     __alignof__(u32));
>>> +     if (!cpu_to_phys_id)
>>> +             panic("%s: Failed to allocate %zu bytes align=0x%zx\n",
>>> +                   __func__, nr_cpu_ids * sizeof(u32), __alignof__(u32));
>>> +
>>> +
>>>        cpu = 0;
>>>        list_del_init(&head);
>>>        /* Select the primary thread, the boot cpu's slibing, as the logic 0 */
>> With the above changes applied, kdump kernel boots fine with a WARNING:
>>
>> [root]# echo c > /proc/sysrq-trigger
>> [  310.748248] sysrq: Trigger a crash
>> [  310.748256] Kernel panic - not syncing: sysrq triggered crash
>> [  310.748266] CPU: 26 PID: 2610 Comm: bash Kdump: loaded Not tainted
>> 6.6.0-rc5-fix-setup-common+ #3
>> [  310.748273] Hardware name: IBM,9043-MRX POWER10  hv:phyp pSeries
>> [  310.748280] Call Trace:
>> [  310.748284] [c000000184717ac0] [c000000000ecf8d8]
>> dump_stack_lvl+0x6c/0x9c (unreliable)
>> [  310.748298] [c000000184717af0] [c000000000150310] panic+0x178/0x438
>> [  310.748307] [c000000184717b90] [c00000000092c8b8]
>> sysrq_handle_crash+0x28/0x30
>> [  310.748316] [c000000184717bf0] [c00000000092d2ac]
>> __handle_sysrq+0x10c/0x250
>> [  310.748330] [c000000184717c90] [c00000000092db18]
>> write_sysrq_trigger+0xc8/0x168
>> [  310.748339] [c000000184717cd0] [c00000000065c21c]
>> proc_reg_write+0x10c/0x1b0
>> [  310.748349] [c000000184717d00] [c000000000583f94] vfs_write+0x104/0x4b0
>> [  310.748356] [c000000184717dc0] [c00000000058451c] ksys_write+0x7c/0x140
>> [  310.748365] [c000000184717e10] [c000000000033a54]
>> system_call_exception+0x144/0x3a0
>> [  310.748377] [c000000184717e50] [c00000000000c554]
>> system_call_common+0xf4/0x258
>> [  310.748389] --- interrupt: c00 at 0x7fff97720c34
>> [  310.748395] NIP:  00007fff97720c34 LR: 00007fff97697c74 CTR:
>> 0000000000000000
>> [  310.748404] REGS: c000000184717e80 TRAP: 0c00   Not tainted
>> (6.6.0-rc5-fix-setup-common+)
>> [  310.748413] MSR:  800000000280f033
>> <SF,VEC,VSX,EE,PR,FP,ME,IR,DR,RI,LE>  CR: 28222202  XER: 00000000
>> [  310.748430] IRQMASK: 0
>> [  310.748430] GPR00: 0000000000000004 00007fffffabc510 00007fff97807300
>> 0000000000000001
>> [  310.748430] GPR04: 00000001624f7910 0000000000000002 0000000000000010
>> 00007fff97669724
>> [  310.748430] GPR08: 0000000000000000 0000000000000000 0000000000000000
>> 0000000000000000
>> [  310.748430] GPR12: 0000000000000000 00007fff97a5aee0 0000000040000000
>> 0000000125c39798
>> [  310.748430] GPR16: 0000000125c39724 0000000125bd8128 0000000125b70370
>> 0000000125c3d568
>> [  310.748430] GPR20: 0000000162551030 0000000000000001 0000000125beaf18
>> 00007fffffabc734
>> [  310.748430] GPR24: 00007fffffabc730 0000000125c3af94 0000000000000002
>> 00000001624f7910
>> [  310.748430] GPR28: 0000000000000002 00007fff97801798 00000001624f7910
>> 0000000000000002
>> [  310.748475] NIP [00007fff97720c34] 0x7fff97720c34
>> [  310.748478] LR [00007fff97697c74] 0x7fff97697c74
>> [  310.748482] --- interrupt: c00
>> I'm in purgatory
>> [    0.000000] radix-mmu: Page sizes from device-tree:
>> [    0.000000] radix-mmu: Page size shift = 12 AP=0x0
>> [    0.000000] radix-mmu: Page size shift = 16 AP=0x5
>> [    0.000000] radix-mmu: Page size shift = 21 AP=0x1
>> [    0.000000] radix-mmu: Page size shift = 30 AP=0x2
>> [    0.000000] Activating Kernel Userspace Access Prevention
>> [    0.000000] Activating Kernel Userspace Execution Prevention
>> [    0.000000] radix-mmu: Mapped 0x0000000000000000-0x0000000000010000
>> with 64.0 KiB pages (exec)
>> [    0.000000] radix-mmu: Mapped 0x0000000000010000-0x0000000000200000
>> with 64.0 KiB pages
>> [    0.000000] radix-mmu: Mapped 0x0000000000200000-0x0000000020000000
>> with 2.00 MiB pages
>> [    0.000000] radix-mmu: Mapped 0x0000000020000000-0x0000000022600000
>> with 2.00 MiB pages (exec)
>> [    0.000000] radix-mmu: Mapped 0x0000000022600000-0x0000000040000000
>> with 2.00 MiB pages
>>
>> Trimmed logs ....
>>
>> [    0.001738] Mount-cache hash table entries: 16384 (order: 1, 131072
>> bytes, linear)
>> [    0.001751] Mountpoint-cache hash table entries: 16384 (order: 1,
>> 131072 bytes, linear)
>> [    0.007339] ------------[ cut here ]------------
>> [    0.007356] WARNING: CPU: 2 PID: 1 at arch/powerpc/kernel/smp.c:941
>> update_mask_from_threadgroup+0x128/0x1a0
>> [    0.007371] Modules linked in:
>> [    0.007377] CPU: 2 PID: 1 Comm: swapper/2 Not tainted
>> 6.6.0-rc5-fix-setup-common+ #3
>> [    0.007385] Hardware name: IBM,9043-MRX POWER10 hv:phyp pSeries
>> [    0.007393] NIP:  c000000022011ed8 LR: c000000022011e10 CTR:
>> 0000000000000000
>> [    0.007411] REGS: c0000000256338f0 TRAP: 0700   Not tainted
>> (6.6.0-rc5-fix-setup-common+)
>> [    0.007425] MSR:  8000000002029033 <SF,VEC,EE,ME,IR,DR,RI,LE>  CR:
>> 44000842  XER: 0000000c
>> [    0.007444] CFAR: c000000022011e78 IRQMASK: 0
>> [    0.007444] GPR00: c000000022011e10 c000000025633b90 c0000000213c1300
>> 0000000000000002
>> [    0.007444] GPR04: 0000000000000000 0000000000000005 0000000000000001
>> 0000000000000002
>> [    0.007444] GPR08: 0000000000000008 0000000000000001 0000000000000002
>> 0000000000000004
>> [    0.007444] GPR12: 0000000000000000 c000000022e3ac00 c000000020010138
>> 0000000000000000
>> [    0.007444] GPR16: 0000000000000000 0000000000000000 0000000000000000
>> 0000000000000000
>> [    0.007444] GPR20: 0000000000000018 c000000022150968 c000000022093580
>> c0000000253df000
>> [    0.007444] GPR24: 0000000000000002 0000000000000000 c000000022b32058
>> 0000000000000000
>> [    0.007444] GPR28: c00000015fca0a68 c000000022ba0330 c00000002209352c
>> 0000000000000000
>> [    0.007520] NIP [c000000022011ed8]
>> update_mask_from_threadgroup+0x128/0x1a0
>> [    0.007528] LR [c000000022011e10] update_mask_from_threadgroup+0x60/0x1a0
>> [    0.007536] Call Trace:
>> [    0.007539] [c000000025633b90] [c000000022011e10]
>> update_mask_from_threadgroup+0x60/0x1a0 (unreliable)
>> [    0.007550] [c000000025633be0] [c000000022012210]
>> init_thread_group_cache_map+0x2c0/0x338
>> [    0.007559] [c000000025633c50] [c0000000220125a0]
>> smp_prepare_cpus+0x318/0x510
>> [    0.007568] [c000000025633d10] [c000000022004874]
>> kernel_init_freeable+0x198/0x3cc
>> [    0.007578] [c000000025633de0] [c000000020010164] kernel_init+0x34/0x1b0
>> [    0.007586] [c000000025633e50] [c00000002000cd94]
>> ret_from_kernel_user_thread+0x14/0x1c
>> [    0.007596] --- interrupt: 0 at 0x0
>> [    0.007601] NIP:  0000000000000000 LR: 0000000000000000 CTR:
>> 0000000000000000
>> [    0.007608] REGS: c000000025633e80 TRAP: 0000   Not tainted
>> (6.6.0-rc5-fix-setup-common+)
>> [    0.007632] MSR:  0000000000000000 <>  CR: 00000000  XER: 00000000
>> [    0.007651] CFAR: 0000000000000000 IRQMASK: 0
>> [    0.007651] GPR00: 0000000000000000 0000000000000000 0000000000000000
>> 0000000000000000
>> [    0.007651] GPR04: 0000000000000000 0000000000000000 0000000000000000
>> 0000000000000000
>> [    0.007651] GPR08: 0000000000000000 0000000000000000 0000000000000000
>> 0000000000000000
>> [    0.007651] GPR12: 0000000000000000 0000000000000000 0000000000000000
>> 0000000000000000
>> [    0.007651] GPR16: 0000000000000000 0000000000000000 0000000000000000
>> 0000000000000000
>> [    0.007651] GPR20: 0000000000000000 0000000000000000 0000000000000000
>> 0000000000000000
>> [    0.007651] GPR24: 0000000000000000 0000000000000000 0000000000000000
>> 0000000000000000
>> [    0.007651] GPR28: 0000000000000000 0000000000000000 0000000000000000
>> 0000000000000000
>> [    0.007742] NIP [0000000000000000] 0x0
>> [    0.007756] LR [0000000000000000] 0x0
>> [    0.007769] --- interrupt: 0
>> [    0.007779] Code: 7ca507b4 79081764 7d1e4214 8108000c 7f882000
>> 409effdc 48000010 38e70001 7ce707b4 4bffffa4 2f8affff 409e0010
>> <0fe00000> 3860ffc3 4800004c 7f9b5000
>> [    0.007805] ---[ end trace 0000000000000000 ]---
>> [    0.007997] RCU Tasks Rude: Setting shift to 2 and lim to 1
>> rcu_task_cb_adjust=1.
>> [    0.008018] RCU Tasks Trace: Setting shift to 2 and lim to 1
>> rcu_task_cb_adjust=1.
>> [    0.008043] POWER10 performance monitor hardware support registered
>> [    0.008071] rcu: Hierarchical SRCU implementation.
>> [    0.008078] rcu:     Max phase no-delay instances is 1000.
>> [    0.008516] smp: Bringing up secondary CPUs ...
>> [    0.008735] smp: Brought up 1 node, 2 CPUs
>> ...
>>
>> Note: no warning observed if crashing CPU is 0, 8, 16, 24, 32, ....
>>
>> Code that generates warning:
>>
>> File: arch/powerpc/kernel/smp.c
>> Function: update_mask_from_threadgroup
>> ...
>>           if (unlikely(i_group_start == -1)) {
>>               WARN_ON_ONCE(1);
>>               return -ENODATA;
>>           }
>>
> It seems that the crash cpu passed the statements in
> init_thread_group_cache_map()
> {
> if (unlikely(cpu_group_start == -1)) {
> WARN_ON_ONCE(1);
> return -ENODATA;
> }
>
> }
>
> But raising warn in the above snippet.  So it means that
> get_cpu_thread_group_start(i, tg) for the @first_thread failed in
> update_mask_from_threadgroup().  At present, I have no idea about it.
>
>
> And is this warning observed if only applying [1-2/5] ?

No warning observed with just 1-2/5 patches.

>
> According to my collected data, percpu area will cost 1792 kB per cpu.
> Forcing all eight threads in a core online will cost 107z52KB more than
> the result if applying the whole series. Maybe I can put [3-5/5] aside
> as Hari suggested, and try them later if needed.
In my experiment 7MB was allocated for Percpu for both nr_cpus=1 and
nr_cpus=8 if only 1-2/5 patches are applied.

Trimmed output of lscpu and cat /proc/meminfo

With nr_cpus=1
============

kdump:/# lscpu
Architecture:            ppc64le
   Byte Order:            Little Endian
CPU(s):                  8
   On-line CPU(s) list:   0,3
   Off-line CPU(s) list:  1,2,4-7
Model name:              POWER10
kdump:/#
kdump:/# cat /proc/meminfo | grep Percpu
Percpu:             7168 kB
kdump:/#


with nr_cpus=8
============

kdump:/# lscpu
Architecture:            ppc64le
   Byte Order:            Little Endian
CPU(s):                  8
   On-line CPU(s) list:   0,2
   Off-line CPU(s) list:  1,3-7
Model name:              POWER10

kdump:/#
kdump:/# cat /proc/meminfo | grep Percpu
Percpu:             7168 kB


Thanks,
Sourabh Jain
Pingfan Liu Oct. 17, 2023, 2:12 a.m. UTC | #8
On Mon, Oct 16, 2023 at 12:13:53PM +0530, Sourabh Jain wrote:
> Hello Pingfan,
> 
> > > > > > With this patch series applied, the kdump kernel fails to boot on
> > > > > > powerpc with nr_cpus=1.
> > > > > > 
> > > > > > Console logs:
> > > > > > -------------------
> > > > > > [root]# echo c > /proc/sysrq-trigger
> > > > > > [   74.783235] sysrq: Trigger a crash
> > > > > > [   74.783244] Kernel panic - not syncing: sysrq triggered crash
> > > > > > [   74.783252] CPU: 58 PID: 3838 Comm: bash Kdump: loaded Not tainted
> > > > > > 6.6.0-rc5pf-nr-cpus+ #3
> > > > > > [   74.783259] Hardware name: POWER10 (raw) phyp pSeries
> > > > > > [   74.783275] Call Trace:
> > > > > > [   74.783280] [c00000020f4ebac0] [c000000000ed9f38]
> > > > > > dump_stack_lvl+0x6c/0x9c (unreliable)
> > > > > > [   74.783291] [c00000020f4ebaf0] [c000000000150300] panic+0x178/0x438
> > > > > > [   74.783298] [c00000020f4ebb90] [c000000000936d48]
> > > > > > sysrq_handle_crash+0x28/0x30
> > > > > > [   74.783304] [c00000020f4ebbf0] [c00000000093773c]
> > > > > > __handle_sysrq+0x10c/0x250
> > > > > > [   74.783309] [c00000020f4ebc90] [c000000000937fa8]
> > > > > > write_sysrq_trigger+0xc8/0x168
> > > > > > [   74.783314] [c00000020f4ebcd0] [c000000000665d8c]
> > > > > > proc_reg_write+0x10c/0x1b0
> > > > > > [   74.783321] [c00000020f4ebd00] [c00000000058da54]
> > > > > > vfs_write+0x104/0x4b0
> > > > > > [   74.783326] [c00000020f4ebdc0] [c00000000058dfdc]
> > > > > > ksys_write+0x7c/0x140
> > > > > > [   74.783331] [c00000020f4ebe10] [c000000000033a64]
> > > > > > system_call_exception+0x144/0x3a0
> > > > > > [   74.783337] [c00000020f4ebe50] [c00000000000c554]
> > > > > > system_call_common+0xf4/0x258
> > > > > > [   74.783343] --- interrupt: c00 at 0x7fffa0721594
> > > > > > [   74.783352] NIP:  00007fffa0721594 LR: 00007fffa0697bf4 CTR:
> > > > > > 0000000000000000
> > > > > > [   74.783364] REGS: c00000020f4ebe80 TRAP: 0c00   Not tainted
> > > > > > (6.6.0-rc5pf-nr-cpus+)
> > > > > > [   74.783376] MSR:  800000000280f033
> > > > > > <SF,VEC,VSX,EE,PR,FP,ME,IR,DR,RI,LE>  CR: 28222202  XER: 00000000
> > > > > > [   74.783394] IRQMASK: 0
> > > > > > [   74.783394] GPR00: 0000000000000004 00007ffffc4b6800 00007fffa0807300
> > > > > > 0000000000000001
> > > > > > [   74.783394] GPR04: 000000013549ea60 0000000000000002 0000000000000010
> > > > > > 0000000000000000
> > > > > > [   74.783394] GPR08: 0000000000000000 0000000000000000 0000000000000000
> > > > > > 0000000000000000
> > > > > > [   74.783394] GPR12: 0000000000000000 00007fffa0abaf70 0000000040000000
> > > > > > 000000011a0f9798
> > > > > > [   74.783394] GPR16: 000000011a0f9724 000000011a097688 000000011a02ff70
> > > > > > 000000011a0fd568
> > > > > > [   74.783394] GPR20: 0000000135554bf0 0000000000000001 000000011a0aa478
> > > > > > 00007ffffc4b6a24
> > > > > > [   74.783394] GPR24: 00007ffffc4b6a20 000000011a0faf94 0000000000000002
> > > > > > 000000013549ea60
> > > > > > [   74.783394] GPR28: 0000000000000002 00007fffa08017a0 000000013549ea60
> > > > > > 0000000000000002
> > > > > > [   74.783440] NIP [00007fffa0721594] 0x7fffa0721594
> > > > > > [   74.783443] LR [00007fffa0697bf4] 0x7fffa0697bf4
> > > > > > [   74.783447] --- interrupt: c00
> > > > > > I'm in purgatory
> > > > > > [    0.000000] radix-mmu: Page sizes from device-tree:
> > > > > > [    0.000000] radix-mmu: Page size shift = 12 AP=0x0
> > > > > > [    0.000000] radix-mmu: Page size shift = 16 AP=0x5
> > > > > > [    0.000000] radix-mmu: Page size shift = 21 AP=0x1
> > > > > > [    0.000000] radix-mmu: Page size shift = 30 AP=0x2
> > > > > > [    0.000000] Activating Kernel Userspace Access Prevention
> > > > > > [    0.000000] Activating Kernel Userspace Execution Prevention
> > > > > > [    0.000000] radix-mmu: Mapped 0x0000000000000000-0x0000000000010000
> > > > > > with 64.0 KiB pages (exec)
> > > > > > [    0.000000] radix-mmu: Mapped 0x0000000000010000-0x0000000000200000
> > > > > > with 64.0 KiB pages
> > > > > > [    0.000000] radix-mmu: Mapped 0x0000000000200000-0x0000000020000000
> > > > > > with 2.00 MiB pages
> > > > > > [    0.000000] radix-mmu: Mapped 0x0000000020000000-0x0000000022600000
> > > > > > with 2.00 MiB pages (exec)
> > > > > > [    0.000000] radix-mmu: Mapped 0x0000000022600000-0x0000000040000000
> > > > > > with 2.00 MiB pages
> > > > > > [    0.000000] radix-mmu: Mapped 0x0000000040000000-0x0000000180000000
> > > > > > with 1.00 GiB pages
> > > > > > [    0.000000] radix-mmu: Mapped 0x0000000180000000-0x00000001a0000000
> > > > > > with 2.00 MiB pages
> > > > > > [    0.000000] lpar: Using radix MMU under hypervisor
> > > > > > [    0.000000] Linux version 6.6.0-rc5pf-nr-cpus+
> > > > > > (root@ltcever7x0-lp1.aus.stglabs.ibm.com) (gcc (GCC) 8.5.0 20210514 (Red
> > > > > > Hat 8.5.0-20), GNU ld version 2.30-123.el8) #3 SMP Mon Oct  9 11:07:
> > > > > > 41 CDT 2023
> > > > > > [    0.000000] Found initrd at 0xc000000022e60000:0xc0000000248f08d8
> > > > > > [    0.000000] Hardware name: IBM,9043-MRX POWER10 (raw) 0x800200
> > > > > > 0xf000006 of:IBM,FW1060.00 (NM1060_016) hv:phyp pSeries
> > > > > > [    0.000000] printk: bootconsole [udbg0] enabled
> > > > > > [    0.000000] the round shift between dt seq and the cpu logic number:
> > > > > > 56
> > > > > > [    0.000000] BUG: Unable to handle kernel data access on write at
> > > > > > 0xc0000001a0000000
> > > > > > [    0.000000] Faulting instruction address: 0xc000000022009c64
> > > > > > [    0.000000] Oops: Kernel access of bad area, sig: 11 [#1]
> > > > > > [    0.000000] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
> > > > > > [    0.000000] Modules linked in:
> > > > > > [    0.000000] CPU: 2 PID: 0 Comm: swapper Not tainted
> > > > > > 6.6.0-rc5pf-nr-cpus+ #3
> > > > > > [    0.000000] Hardware name:  POWER10 (raw)  hv:phyp pSeries
> > > > > > [    0.000000] NIP:  c000000022009c64 LR: c000000022009c54 CTR:
> > > > > > c0000000201ff348
> > > > > > [    0.000000] REGS: c000000022aebb00 TRAP: 0300   Not tainted
> > > > > > (6.6.0-rc5pf-nr-cpus+)
> > > > > > [    0.000000] MSR:  8000000000001033 <SF,ME,IR,DR,RI,LE> CR: 28222824
> > > > > > XER: 00000001
> > > > > > [    0.000000] CFAR: c000000020031574 DAR: c0000001a0000000 DSISR:
> > > > > > 42000000 IRQMASK: 1
> > > > > > [    0.000000] GPR00: c000000022009ba0 c000000022aebda0 c0000000213d1300
> > > > > > 0000000000000004
> > > > > > [    0.000000] GPR04: 0000000000000001 c000000022aebbc0 c000000022aebbb8
> > > > > > 0000000000000000
> > > > > > [    0.000000] GPR08: 0000000000000001 c00000019ffffff8 000000000000003a
> > > > > > c0000000229c8a78
> > > > > > [    0.000000] GPR12: 0000000000002000 c000000022e4a800 c0000000211d34b8
> > > > > > c0000000211d3aa8
> > > > > > [    0.000000] GPR16: c0000000211d75a0 c0000000211d75b0 c0000000225f3b98
> > > > > > 0000000000000000
> > > > > > [    0.000000] GPR20: 0000000000000001 0000000000000001 0000000000000001
> > > > > > 0000000000000001
> > > > > > [    0.000000] GPR24: 0000000000000008 0000000000000000 0000000000000001
> > > > > > c00000019ffffdc0
> > > > > > [    0.000000] GPR28: 0000000000000002 c000000022b368e0 c000000022aebe08
> > > > > > 0000000000000008
> > > > > > [    0.000000] NIP [c000000022009c64] smp_setup_cpu_maps+0x420/0x724
> > > > > > [    0.000000] LR [c000000022009c54] smp_setup_cpu_maps+0x410/0x724
> > > > > > [    0.000000] Call Trace:
> > > > > > [    0.000000] [c000000022aebda0] [c000000022009ba0]
> > > > > > smp_setup_cpu_maps+0x35c/0x724 (unreliable)
> > > > > > [    0.000000] [c000000022aebeb0] [c00000002200a19c]
> > > > > > setup_arch+0x1b8/0x54c
> > > > > > [    0.000000] [c000000022aebf30] [c000000022003f88]
> > > > > > start_kernel+0xb0/0x768
> > > > > > [    0.000000] [c000000022aebfe0] [c00000002000d888]
> > > > > > start_here_common+0x1c/0x20
> > > > > > [    0.000000] Code: 3929ffff 7f89e040 409c002c 7ec4b378 7f83e378
> > > > > > 4a027939 7f83e378 4a0278e5 e95b0018 3d22017d e929f028 7d4ac42c
> > > > > > <7d49c12e> eb7b0000 7e99a378 4bffff3c
> > > > > The faulting instruction address, 0xc000000022009c6, corresponds to the code
> > > > > below:
> > > > > 
> > > > > File:
> > > > > arch/powerpc/kernel/setup-common.c
> > > > > 
> > > > > Function
> > > > > void __init smp_setup_cpu_maps(void)
> > > > > {
> > > > >               ...
> > > > >               cpu_to_phys_id[bt_thread] =
> > > > > be32_to_cpu(intserv_node->intserv[bt_thread]);
> > > > >               ...
> > > > > }
> > > > > 
> > > > > Hope it helps.
> > > > > 
> > > > Appreciate your help.
> > > > 
> > > > This issue should be linked with the capability of cpu_to_phys_id[].
> > > > 
> > > > Could you please to try the fix suggested at the end of the email?
> > > > It should be a fix for
> > > > [PATCHv8 3/5] powerpc/setup: Handle the case when boot_cpuid greater than nr_cpus
> > > > 
> > > > 
> > > > Thanks,
> > > > 
> > > > Pingfan
> > > > 
> > > > ---
> > > > 
> > > > diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c
> > > > index bd7853a4bc91..849adc7a4b47 100644
> > > > --- a/arch/powerpc/kernel/setup-common.c
> > > > +++ b/arch/powerpc/kernel/setup-common.c
> > > > @@ -464,12 +464,6 @@ void __init smp_setup_cpu_maps(void)
> > > >        DBG("smp_setup_cpu_maps()\n");
> > > > 
> > > >        INIT_LIST_HEAD(&head);
> > > > -     cpu_to_phys_id = memblock_alloc(nr_cpu_ids * sizeof(u32),
> > > > -                                     __alignof__(u32));
> > > > -     if (!cpu_to_phys_id)
> > > > -             panic("%s: Failed to allocate %zu bytes align=0x%zx\n",
> > > > -                   __func__, nr_cpu_ids * sizeof(u32), __alignof__(u32));
> > > > -
> > > >        for_each_node_by_type(dn, "cpu") {
> > > >                const __be32 *intserv;
> > > >                __be32 cpu_be;
> > > > @@ -533,6 +527,16 @@ void __init smp_setup_cpu_maps(void)
> > > >                }
> > > > 
> > > >        }
> > > > +
> > > > +     /* There may be hole between cpu0 and boot cpu */
> > > > +     j = (bt_thread + 1) > nr_cpu_ids ? (bt_thread + 1) : nr_cpu_ids;
> > > > +     cpu_to_phys_id = memblock_alloc(j * sizeof(u32),
> > > > +                                     __alignof__(u32));
> > > > +     if (!cpu_to_phys_id)
> > > > +             panic("%s: Failed to allocate %zu bytes align=0x%zx\n",
> > > > +                   __func__, nr_cpu_ids * sizeof(u32), __alignof__(u32));
> > > > +
> > > > +
> > > >        cpu = 0;
> > > >        list_del_init(&head);
> > > >        /* Select the primary thread, the boot cpu's slibing, as the logic 0 */
> > > With the above changes applied, kdump kernel boots fine with a WARNING:
> > > 
> > > [root]# echo c > /proc/sysrq-trigger
> > > [  310.748248] sysrq: Trigger a crash
> > > [  310.748256] Kernel panic - not syncing: sysrq triggered crash
> > > [  310.748266] CPU: 26 PID: 2610 Comm: bash Kdump: loaded Not tainted
> > > 6.6.0-rc5-fix-setup-common+ #3
> > > [  310.748273] Hardware name: IBM,9043-MRX POWER10  hv:phyp pSeries
> > > [  310.748280] Call Trace:
> > > [  310.748284] [c000000184717ac0] [c000000000ecf8d8]
> > > dump_stack_lvl+0x6c/0x9c (unreliable)
> > > [  310.748298] [c000000184717af0] [c000000000150310] panic+0x178/0x438
> > > [  310.748307] [c000000184717b90] [c00000000092c8b8]
> > > sysrq_handle_crash+0x28/0x30
> > > [  310.748316] [c000000184717bf0] [c00000000092d2ac]
> > > __handle_sysrq+0x10c/0x250
> > > [  310.748330] [c000000184717c90] [c00000000092db18]
> > > write_sysrq_trigger+0xc8/0x168
> > > [  310.748339] [c000000184717cd0] [c00000000065c21c]
> > > proc_reg_write+0x10c/0x1b0
> > > [  310.748349] [c000000184717d00] [c000000000583f94] vfs_write+0x104/0x4b0
> > > [  310.748356] [c000000184717dc0] [c00000000058451c] ksys_write+0x7c/0x140
> > > [  310.748365] [c000000184717e10] [c000000000033a54]
> > > system_call_exception+0x144/0x3a0
> > > [  310.748377] [c000000184717e50] [c00000000000c554]
> > > system_call_common+0xf4/0x258
> > > [  310.748389] --- interrupt: c00 at 0x7fff97720c34
> > > [  310.748395] NIP:  00007fff97720c34 LR: 00007fff97697c74 CTR:
> > > 0000000000000000
> > > [  310.748404] REGS: c000000184717e80 TRAP: 0c00   Not tainted
> > > (6.6.0-rc5-fix-setup-common+)
> > > [  310.748413] MSR:  800000000280f033
> > > <SF,VEC,VSX,EE,PR,FP,ME,IR,DR,RI,LE>  CR: 28222202  XER: 00000000
> > > [  310.748430] IRQMASK: 0
> > > [  310.748430] GPR00: 0000000000000004 00007fffffabc510 00007fff97807300
> > > 0000000000000001
> > > [  310.748430] GPR04: 00000001624f7910 0000000000000002 0000000000000010
> > > 00007fff97669724
> > > [  310.748430] GPR08: 0000000000000000 0000000000000000 0000000000000000
> > > 0000000000000000
> > > [  310.748430] GPR12: 0000000000000000 00007fff97a5aee0 0000000040000000
> > > 0000000125c39798
> > > [  310.748430] GPR16: 0000000125c39724 0000000125bd8128 0000000125b70370
> > > 0000000125c3d568
> > > [  310.748430] GPR20: 0000000162551030 0000000000000001 0000000125beaf18
> > > 00007fffffabc734
> > > [  310.748430] GPR24: 00007fffffabc730 0000000125c3af94 0000000000000002
> > > 00000001624f7910
> > > [  310.748430] GPR28: 0000000000000002 00007fff97801798 00000001624f7910
> > > 0000000000000002
> > > [  310.748475] NIP [00007fff97720c34] 0x7fff97720c34
> > > [  310.748478] LR [00007fff97697c74] 0x7fff97697c74
> > > [  310.748482] --- interrupt: c00
> > > I'm in purgatory
> > > [    0.000000] radix-mmu: Page sizes from device-tree:
> > > [    0.000000] radix-mmu: Page size shift = 12 AP=0x0
> > > [    0.000000] radix-mmu: Page size shift = 16 AP=0x5
> > > [    0.000000] radix-mmu: Page size shift = 21 AP=0x1
> > > [    0.000000] radix-mmu: Page size shift = 30 AP=0x2
> > > [    0.000000] Activating Kernel Userspace Access Prevention
> > > [    0.000000] Activating Kernel Userspace Execution Prevention
> > > [    0.000000] radix-mmu: Mapped 0x0000000000000000-0x0000000000010000
> > > with 64.0 KiB pages (exec)
> > > [    0.000000] radix-mmu: Mapped 0x0000000000010000-0x0000000000200000
> > > with 64.0 KiB pages
> > > [    0.000000] radix-mmu: Mapped 0x0000000000200000-0x0000000020000000
> > > with 2.00 MiB pages
> > > [    0.000000] radix-mmu: Mapped 0x0000000020000000-0x0000000022600000
> > > with 2.00 MiB pages (exec)
> > > [    0.000000] radix-mmu: Mapped 0x0000000022600000-0x0000000040000000
> > > with 2.00 MiB pages
> > > 
> > > Trimmed logs ....
> > > 
> > > [    0.001738] Mount-cache hash table entries: 16384 (order: 1, 131072
> > > bytes, linear)
> > > [    0.001751] Mountpoint-cache hash table entries: 16384 (order: 1,
> > > 131072 bytes, linear)
> > > [    0.007339] ------------[ cut here ]------------
> > > [    0.007356] WARNING: CPU: 2 PID: 1 at arch/powerpc/kernel/smp.c:941
> > > update_mask_from_threadgroup+0x128/0x1a0
> > > [    0.007371] Modules linked in:
> > > [    0.007377] CPU: 2 PID: 1 Comm: swapper/2 Not tainted
> > > 6.6.0-rc5-fix-setup-common+ #3
> > > [    0.007385] Hardware name: IBM,9043-MRX POWER10 hv:phyp pSeries
> > > [    0.007393] NIP:  c000000022011ed8 LR: c000000022011e10 CTR:
> > > 0000000000000000
> > > [    0.007411] REGS: c0000000256338f0 TRAP: 0700   Not tainted
> > > (6.6.0-rc5-fix-setup-common+)
> > > [    0.007425] MSR:  8000000002029033 <SF,VEC,EE,ME,IR,DR,RI,LE>  CR:
> > > 44000842  XER: 0000000c
> > > [    0.007444] CFAR: c000000022011e78 IRQMASK: 0
> > > [    0.007444] GPR00: c000000022011e10 c000000025633b90 c0000000213c1300
> > > 0000000000000002
> > > [    0.007444] GPR04: 0000000000000000 0000000000000005 0000000000000001
> > > 0000000000000002
> > > [    0.007444] GPR08: 0000000000000008 0000000000000001 0000000000000002
> > > 0000000000000004
> > > [    0.007444] GPR12: 0000000000000000 c000000022e3ac00 c000000020010138
> > > 0000000000000000
> > > [    0.007444] GPR16: 0000000000000000 0000000000000000 0000000000000000
> > > 0000000000000000
> > > [    0.007444] GPR20: 0000000000000018 c000000022150968 c000000022093580
> > > c0000000253df000
> > > [    0.007444] GPR24: 0000000000000002 0000000000000000 c000000022b32058
> > > 0000000000000000
> > > [    0.007444] GPR28: c00000015fca0a68 c000000022ba0330 c00000002209352c
> > > 0000000000000000
> > > [    0.007520] NIP [c000000022011ed8]
> > > update_mask_from_threadgroup+0x128/0x1a0
> > > [    0.007528] LR [c000000022011e10] update_mask_from_threadgroup+0x60/0x1a0
> > > [    0.007536] Call Trace:
> > > [    0.007539] [c000000025633b90] [c000000022011e10]
> > > update_mask_from_threadgroup+0x60/0x1a0 (unreliable)
> > > [    0.007550] [c000000025633be0] [c000000022012210]
> > > init_thread_group_cache_map+0x2c0/0x338
> > > [    0.007559] [c000000025633c50] [c0000000220125a0]
> > > smp_prepare_cpus+0x318/0x510
> > > [    0.007568] [c000000025633d10] [c000000022004874]
> > > kernel_init_freeable+0x198/0x3cc
> > > [    0.007578] [c000000025633de0] [c000000020010164] kernel_init+0x34/0x1b0
> > > [    0.007586] [c000000025633e50] [c00000002000cd94]
> > > ret_from_kernel_user_thread+0x14/0x1c
> > > [    0.007596] --- interrupt: 0 at 0x0
> > > [    0.007601] NIP:  0000000000000000 LR: 0000000000000000 CTR:
> > > 0000000000000000
> > > [    0.007608] REGS: c000000025633e80 TRAP: 0000   Not tainted
> > > (6.6.0-rc5-fix-setup-common+)
> > > [    0.007632] MSR:  0000000000000000 <>  CR: 00000000  XER: 00000000
> > > [    0.007651] CFAR: 0000000000000000 IRQMASK: 0
> > > [    0.007651] GPR00: 0000000000000000 0000000000000000 0000000000000000
> > > 0000000000000000
> > > [    0.007651] GPR04: 0000000000000000 0000000000000000 0000000000000000
> > > 0000000000000000
> > > [    0.007651] GPR08: 0000000000000000 0000000000000000 0000000000000000
> > > 0000000000000000
> > > [    0.007651] GPR12: 0000000000000000 0000000000000000 0000000000000000
> > > 0000000000000000
> > > [    0.007651] GPR16: 0000000000000000 0000000000000000 0000000000000000
> > > 0000000000000000
> > > [    0.007651] GPR20: 0000000000000000 0000000000000000 0000000000000000
> > > 0000000000000000
> > > [    0.007651] GPR24: 0000000000000000 0000000000000000 0000000000000000
> > > 0000000000000000
> > > [    0.007651] GPR28: 0000000000000000 0000000000000000 0000000000000000
> > > 0000000000000000
> > > [    0.007742] NIP [0000000000000000] 0x0
> > > [    0.007756] LR [0000000000000000] 0x0
> > > [    0.007769] --- interrupt: 0
> > > [    0.007779] Code: 7ca507b4 79081764 7d1e4214 8108000c 7f882000
> > > 409effdc 48000010 38e70001 7ce707b4 4bffffa4 2f8affff 409e0010
> > > <0fe00000> 3860ffc3 4800004c 7f9b5000
> > > [    0.007805] ---[ end trace 0000000000000000 ]---
> > > [    0.007997] RCU Tasks Rude: Setting shift to 2 and lim to 1
> > > rcu_task_cb_adjust=1.
> > > [    0.008018] RCU Tasks Trace: Setting shift to 2 and lim to 1
> > > rcu_task_cb_adjust=1.
> > > [    0.008043] POWER10 performance monitor hardware support registered
> > > [    0.008071] rcu: Hierarchical SRCU implementation.
> > > [    0.008078] rcu:     Max phase no-delay instances is 1000.
> > > [    0.008516] smp: Bringing up secondary CPUs ...
> > > [    0.008735] smp: Brought up 1 node, 2 CPUs
> > > ...
> > > 
> > > Note: no warning observed if crashing CPU is 0, 8, 16, 24, 32, ....
> > > 
> > > Code that generates warning:
> > > 
> > > File: arch/powerpc/kernel/smp.c
> > > Function: update_mask_from_threadgroup
> > > ...
> > >           if (unlikely(i_group_start == -1)) {
> > >               WARN_ON_ONCE(1);
> > >               return -ENODATA;
> > >           }
> > > 
> > It seems that the crash cpu passed the statements in
> > init_thread_group_cache_map()
> > {
> > if (unlikely(cpu_group_start == -1)) {
> > WARN_ON_ONCE(1);
> > return -ENODATA;
> > }
> > 
> > }
> > 
> > But raising warn in the above snippet.  So it means that
> > get_cpu_thread_group_start(i, tg) for the @first_thread failed in
> > update_mask_from_threadgroup().  At present, I have no idea about it.
> > 
> > 
> > And is this warning observed if only applying [1-2/5] ?
> 
> No warning observed with just 1-2/5 patches.
> 

Good to know it. I think that [1-2/5] can be a first step.

I will post V9, which trims [3-5/5] later.

> > 
> > According to my collected data, percpu area will cost 1792 kB per cpu.
> > Forcing all eight threads in a core online will cost 107z52KB more than
> > the result if applying the whole series. Maybe I can put [3-5/5] aside
> > as Hari suggested, and try them later if needed.
> In my experiment 7MB was allocated for Percpu for both nr_cpus=1 and
> nr_cpus=8 if only 1-2/5 patches are applied.
> 
> Trimmed output of lscpu and cat /proc/meminfo
> 
> With nr_cpus=1
> ============
> 
> kdump:/# lscpu
> Architecture:            ppc64le
>   Byte Order:            Little Endian
> CPU(s):                  8
>   On-line CPU(s) list:   0,3
>   Off-line CPU(s) list:  1,2,4-7

The nr_cpus has the semantic of possible cpu, instead of online cpu.
It includes both On-line and Off-line CPU(s) list.

So Percpu area allocates memory for eight cpus.

> Model name:              POWER10
> kdump:/#
> kdump:/# cat /proc/meminfo | grep Percpu
> Percpu:             7168 kB
> kdump:/#
> 
> 
> with nr_cpus=8
> ============
> 
> kdump:/# lscpu
> Architecture:            ppc64le
>   Byte Order:            Little Endian
> CPU(s):                  8
>   On-line CPU(s) list:   0,2
>   Off-line CPU(s) list:  1,3-7
> Model name:              POWER10
> 
> kdump:/#
> kdump:/# cat /proc/meminfo | grep Percpu
> Percpu:             7168 kB
> 

Here Percpu area also allocates memory for eight cpus, hence the size is
identical to the former one.


Again, thank you for your help and precious time.

Regards,

	Pingfan
diff mbox series

Patch

diff --git a/arch/powerpc/include/asm/smp.h b/arch/powerpc/include/asm/smp.h
index aaaa576d0e15..5db9178cc800 100644
--- a/arch/powerpc/include/asm/smp.h
+++ b/arch/powerpc/include/asm/smp.h
@@ -26,7 +26,7 @@ 
 #include <asm/percpu.h>
 
 extern int boot_cpuid;
-extern int boot_cpu_hwid; /* PPC64 only */
+extern int boot_cpu_hwid;
 extern int spinning_secondaries;
 extern u32 *cpu_to_phys_id;
 extern bool coregroup_enabled;
diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
index 0b5878c3125b..ec82f5bda908 100644
--- a/arch/powerpc/kernel/prom.c
+++ b/arch/powerpc/kernel/prom.c
@@ -372,8 +372,7 @@  static int __init early_init_dt_scan_cpus(unsigned long node,
 	    be32_to_cpu(intserv[found_thread]));
 	boot_cpuid = found;
 
-	if (IS_ENABLED(CONFIG_PPC64))
-		boot_cpu_hwid = be32_to_cpu(intserv[found_thread]);
+	boot_cpu_hwid = be32_to_cpu(intserv[found_thread]);
 
 	/*
 	 * PAPR defines "logical" PVR values for cpus that
diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c
index d2a446216444..1b19a9815672 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -87,9 +87,7 @@  EXPORT_SYMBOL(machine_id);
 int boot_cpuid = -1;
 EXPORT_SYMBOL_GPL(boot_cpuid);
 
-#ifdef CONFIG_PPC64
 int boot_cpu_hwid = -1;
-#endif
 
 /*
  * These are used in binfmt_elf.c to put aux entries on the stack