Message ID | 20170808175936.28793-1-npiggin@gmail.com |
---|---|
State | New |
On Wed, Aug 09, 2017 at 03:59:36AM +1000, Nicholas Piggin wrote:
> Unicast H_SIGNAL_SYS_RESET does not find the target CPU if it
> is not the current CPU.
>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>
> Unfortunately this slipped through without my noticing because the
> Linux driver for NMI IPIs has a fallback to using regular IPIs, and
> because Linux did not make much use of unicasts. A new watchdog
> has started using them. After this patch, this function works
> properly:

In fact this bug was already fixed in the for-2.11 branch. If you've
hit this for real, I guess it's more important than I realized, so
I'll pull it into 2.10 instead.
On Wed, 9 Aug 2017 14:05:46 +1000
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Wed, Aug 09, 2017 at 03:59:36AM +1000, Nicholas Piggin wrote:
> > Unicast H_SIGNAL_SYS_RESET does not find the target CPU if it
> > is not the current CPU.
> >
> > Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> > ---
> >
> > Unfortunately this slipped through without my noticing because the
> > Linux driver for NMI IPIs has a fallback to using regular IPIs, and
> > because Linux did not make much use of unicasts. A new watchdog
> > has started using them. After this patch, this function works
> > properly:
>
> In fact this bug was already fixed in the for-2.11 branch. If you've
> hit this for real, I guess it's more important than I realized, so
> I'll pull it into 2.10 instead.

Oh sorry I didn't notice the 2.11 branch fix. Yes please pull it
into 2.10 if possible.

Thanks,
Nick
On Wed, Aug 09, 2017 at 03:07:19PM +1000, Nicholas Piggin wrote:
> On Wed, 9 Aug 2017 14:05:46 +1000
> David Gibson <david@gibson.dropbear.id.au> wrote:
>
> > On Wed, Aug 09, 2017 at 03:59:36AM +1000, Nicholas Piggin wrote:
> > > Unicast H_SIGNAL_SYS_RESET does not find the target CPU if it
> > > is not the current CPU.
> > >
> > > Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> > > ---
> > >
> > > Unfortunately this slipped through without my noticing because the
> > > Linux driver for NMI IPIs has a fallback to using regular IPIs, and
> > > because Linux did not make much use of unicasts. A new watchdog
> > > has started using them. After this patch, this function works
> > > properly:
> >
> > In fact this bug was already fixed in the for-2.11 branch. If you've
> > hit this for real, I guess it's more important than I realized, so
> > I'll pull it into 2.10 instead.
>
> Oh sorry I didn't notice the 2.11 branch fix. Yes please pull it
> into 2.10 if possible.

Already pulled into my ppc-for-2.10 tree, I'm preparing a pull request
at this moment.

> Thanks,
> Nick
diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
index 72ea5a8247..f50e979b43 100644
--- a/hw/ppc/spapr_hcall.c
+++ b/hw/ppc/spapr_hcall.c
@@ -1432,7 +1432,9 @@ static target_ulong h_signal_sys_reset(PowerPCCPU *cpu,
     } else {
         /* Unicast */
         CPU_FOREACH(cs) {
-            if (cpu->cpu_dt_id == target) {
+            PowerPCCPU *c = POWERPC_CPU(cs);
+
+            if (c->cpu_dt_id == target) {
                 run_on_cpu(cs, spapr_do_system_reset_on_cpu, RUN_ON_CPU_NULL);
                 return H_SUCCESS;
             }
Unicast H_SIGNAL_SYS_RESET does not find the target CPU if it
is not the current CPU.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---

Unfortunately this slipped through without my noticing because the
Linux driver for NMI IPIs has a fallback to using regular IPIs, and
because Linux did not make much use of unicasts. A new watchdog
has started using them. After this patch, this function works
properly:

Watchdog CPU:0 detected Hard LOCKUP other CPUS:3
*** Unicast NMI IPI is sent here ***
Watchdog CPU:3 Hard LOCKUP
Modules linked in:
CPU: 3 PID: 1 Comm: swapper/0 Not tainted 4.13.0-rc3-00305-ge84cf82ae73a-dirty #1191
task: c00000001e440000 task.stack: c00000001e480000
NIP: c000000000023800 LR: c00000000000da28 CTR: c000000000626db0
REGS: c00000001ff97d80 TRAP: 0100 Not tainted (4.13.0-rc3-00305-ge84cf82ae73a-dirty)
MSR: 8000000002001033 <SF,VEC,ME,IR,DR,RI,LE>
CR: 48000224 XER: 20000000
CFAR: c00000000002380c SOFTE: 0
GPR00: c00000000000d9cc c00000001e483dc0 c000000000ebd900 000000000007d000
GPR04: f000000000078680 c00000001e1a0048 c00000001e1a0048 0000000000000001
GPR08: 0000000000000000 0000000000075036 00000003d5f633c2 0000000000000020
GPR12: 0000000000000000 c00000000fd80f00 c00000000000d988 0000000000000000
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR28: 0000000000000000 0000000000000000 0000000000000000 0000000000000b60
NIP [c000000000023800] udelay+0x40/0x60
LR [c00000000000da28] kernel_init+0xa8/0x1b0
Call Trace:
[c00000001e483dc0] [c00000000000d9cc] kernel_init+0x4c/0x1b0 (unreliable)
[c00000001e483e30] [c00000000000bb1c] ret_from_kernel_thread+0x5c/0xc0
Instruction dump:
7c6349d2 7c210b78 7d4c42a6 7d2c42a6 7d2a4850 7fa34840 409d0028 48000014
60000000 60000000 60000000 60420000 <7d2c42a6> 7d2a4850 7fa34840 419dfff4

 hw/ppc/spapr_hcall.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
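
The bug described in the commit message can be reproduced in a small standalone model. This is only a sketch and not QEMU code: `ModelCPU`, `find_target_buggy`, `find_target_fixed` and `demo_lookup` are hypothetical names introduced for illustration, and a plain array stands in for the list that `CPU_FOREACH` walks. The buggy variant tests the calling CPU's id on every iteration, so it can never match a different CPU; the fixed variant tests the id of the CPU being visited, as the patch does with `c->cpu_dt_id`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a vCPU; only the device-tree id matters here. */
typedef struct {
    int cpu_dt_id;
} ModelCPU;

/*
 * Buggy lookup (models the pre-patch loop): every iteration compares the
 * id of the *calling* CPU, not the CPU being visited. It returns -1 unless
 * the caller targets itself, and then it returns index 0, the first CPU
 * visited, whichever CPU the caller actually is.
 */
int find_target_buggy(const ModelCPU *cpus, size_t n,
                      const ModelCPU *caller, int target)
{
    (void)cpus; /* the bug: the visited CPU is never inspected */
    for (size_t i = 0; i < n; i++) {
        if (caller->cpu_dt_id == target) {
            return (int)i;
        }
    }
    return -1;
}

/* Fixed lookup (models the patched loop): compare the visited CPU's id. */
int find_target_fixed(const ModelCPU *cpus, size_t n, int target)
{
    for (size_t i = 0; i < n; i++) {
        if (cpus[i].cpu_dt_id == target) {
            return (int)i;
        }
    }
    return -1;
}

/* Demonstration: CPU 0 signals CPU 3 in a four-CPU model. */
void demo_lookup(void)
{
    ModelCPU cpus[4] = { {0}, {1}, {2}, {3} };

    assert(find_target_buggy(cpus, 4, &cpus[0], 3) == -1); /* target missed */
    assert(find_target_fixed(cpus, 4, 3) == 3);            /* target found */
}
```

Calling `demo_lookup()` from a test harness exercises the same scenario as the watchdog log above, where CPU 0 sends a unicast to CPU 3. In this model the buggy lookup falls through the loop without a match, while the fixed lookup returns the index of the target.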