Message ID | 1486407496-12151-1-git-send-email-bauerman@linux.vnet.ibm.com (mailing list archive) |
---|---|
State | RFC |
Headers | show |
On Mon, Feb 06, 2017 at 04:58:16PM -0200, Thiago Jung Bauermann wrote: > This was reproduced in v4.10-rc6 as well, but I don't have a crash log > handy for that version right now. Sorry. > This is the crash log of v4.10: roselp4 login: [ 505.097727] sysrq: SysRq : Changing Loglevel [ 505.097743] sysrq: Loglevel set to 9 [ 595.183727] iommu: Removing device 0021:01:00.0 from group 0 [ 605.213053] pci_bus 0021:01: busn_res: [bus 01-ff] is released [ 605.213204] rpadlpar_io: slot PHB 33 removed [ 624.623544] min_free_kbytes is not updated to 16826 because user defined value 547628 is preferred [ 647.574039] cpu 152 (hwid 152) Ready to die... [ 647.803624] cpu 153 (hwid 153) Ready to die... [ 647.813371] cpu 154 (hwid 154) Ready to die... [ 647.823646] cpu 155 (hwid 155) Ready to die... [ 647.833371] cpu 156 (hwid 156) Ready to die... [ 647.843392] cpu 157 (hwid 157) Ready to die... [ 647.853368] cpu 158 (hwid 158) Ready to die... [ 647.863358] cpu 159 (hwid 159) Ready to die... [ 648.233756] cpu 144 (hwid 144) Ready to die... [ 648.383417] cpu 145 (hwid 145) Ready to die... [ 648.383488] cpu 146 (hwid 146) Ready to die... [ 648.383551] cpu 147 (hwid 147) Ready to die... [ 648.402845] cpu 148 (hwid 148) Ready to die... [ 648.403329] cpu 149 (hwid 149) Ready to die... [ 648.403384] cpu 150 (hwid 150) Ready to die... [ 648.413398] cpu 151 (hwid 151) Ready to die... [ 648.583466] cpu 136 (hwid 136) Ready to die... [ 648.833672] cpu 137 (hwid 137) Ready to die... [ 648.843481] cpu 138 (hwid 138) Ready to die... [ 648.843541] cpu 139 (hwid 139) Ready to die... [ 648.843589] cpu 140 (hwid 140) Ready to die... [ 648.843635] cpu 141 (hwid 141) Ready to die... [ 648.853426] cpu 142 (hwid 142) Ready to die... [ 648.863105] Querying DEAD? cpu 142 (142) shows 2 [ 648.863697] cpu 143 (hwid 143) Ready to die... [ 649.013280] cpu 128 (hwid 128) Ready to die... [ 649.253575] cpu 129 (hwid 129) Ready to die... [ 649.253634] cpu 130 (hwid 130) Ready to die... [ 649.253689] cpu 131 (hwid 131) Ready to die... [ 649.253741] cpu 132 (hwid 132) Ready to die... [ 649.253793] cpu 133 (hwid 133) Ready to die... [ 649.253836] cpu 134 (hwid 134) Ready to die... [ 649.253885] cpu 135 (hwid 135) Ready to die... [ 649.353116] cpu 120 (hwid 120) Ready to die... [ 649.553776] cpu 121 (hwid 121) Ready to die... [ 649.563626] Querying DEAD? cpu 122 (122) shows 2 cpu 0x7b: Vector: 400 (Instruction Access) at [c000000007bc7d40] pc: f81f0070801f0078 lr: f81f0070801f0078 sp: ec20d28 msr: 8000000042801000 current = 0xc000000474838600 paca = 0xc000000007b65300 softe: 0 irq_happened: 0x01 pid = 0, comm = swapper/123 Linux version 4.10.0-rc6 (bauermann@u1604le) (gcc version 5.4.0 20160609 (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.4) ) #1 SMP Fri Feb 3 18:55:20 BRST 2017 WARNING: exception is not recoverable, can't continue enter ? for help SP (ec20d28) is in userspace 7b:mon> e cpu 0x7b: Vector: 400 (Instruction Access) at [c000000007bc7d40] pc: f81f0070801f0078 lr: f81f0070801f0078 sp: ec20d28 msr: 8000000042801000 current = 0xc000000474838600 paca = 0xc000000007b65300 softe: 0 irq_happened: 0x01 pid = 0, comm = swapper/123 Linux version 4.10.0-rc6 (bauermann@u1604le) (gcc version 5.4.0 20160609 (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.4) ) #1 SMP Fri Feb 3 18:55:20 BRST 2017 7b:mon>
Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> writes: > When testing DLPAR CPU add/remove on a system under stress, pseries_cpu_die > doesn't wait long enough for a CPU to die and the kernel ends up crashing: > > [ 446.143648] cpu 152 (hwid 152) Ready to die... > [ 446.464057] cpu 153 (hwid 153) Ready to die... > [ 446.473525] cpu 154 (hwid 154) Ready to die... > [ 446.474077] cpu 155 (hwid 155) Ready to die... > [ 446.483529] cpu 156 (hwid 156) Ready to die... > [ 446.493532] cpu 157 (hwid 157) Ready to die... > [ 446.494078] cpu 158 (hwid 158) Ready to die... > [ 446.503527] cpu 159 (hwid 159) Ready to die... > [ 446.664534] cpu 144 (hwid 144) Ready to die... > [ 446.964113] cpu 145 (hwid 145) Ready to die... > [ 446.973525] cpu 146 (hwid 146) Ready to die... > [ 446.974094] cpu 147 (hwid 147) Ready to die... > [ 446.983944] cpu 148 (hwid 148) Ready to die... > [ 446.984062] cpu 149 (hwid 149) Ready to die... > [ 446.993518] cpu 150 (hwid 150) Ready to die... > [ 446.993543] Querying DEAD? cpu 150 (150) shows 2 > [ 446.994098] cpu 151 (hwid 151) Ready to die... > [ 447.133726] cpu 136 (hwid 136) Ready to die... > [ 447.403532] cpu 137 (hwid 137) Ready to die... > [ 447.403772] cpu 138 (hwid 138) Ready to die... > [ 447.403839] cpu 139 (hwid 139) Ready to die... > [ 447.403887] cpu 140 (hwid 140) Ready to die... > [ 447.403937] cpu 141 (hwid 141) Ready to die... > [ 447.403979] cpu 142 (hwid 142) Ready to die... > [ 447.404038] cpu 143 (hwid 143) Ready to die... > [ 447.513546] cpu 128 (hwid 128) Ready to die... > [ 447.693533] cpu 129 (hwid 129) Ready to die... > [ 447.693999] cpu 130 (hwid 130) Ready to die... > [ 447.703530] cpu 131 (hwid 131) Ready to die... > [ 447.704087] Querying DEAD? cpu 132 (132) shows 2 > [ 447.704102] cpu 132 (hwid 132) Ready to die... > [ 447.713534] cpu 133 (hwid 133) Ready to die... > [ 447.714064] Querying DEAD? cpu 134 (134) shows 2 > cpu 0x86: Vector: 300 (Data Access) at [c000000007b0fd40] > pc: 000000001ec3072c > lr: 000000001ec2fee0 > sp: 1faf6bd0 > msr: 8000000102801000 > dar: 212d6c1a2a20c > dsisr: 42000000 > current = 0xc000000474c6d600 > paca = 0xc000000007b6b600 softe: 0 irq_happened: 0x01 > pid = 0, comm = swapper/134 > Linux version 4.8.0-34-generic (buildd@bos01-ppc64el-026) (gcc version 5.4.0 20160609 (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.4) ) #36~16.04.1-Ubuntu SMP Wed Dec 21 18:53:20 UTC 2016 (Ubuntu 4.8.0-34.36~16.04.1-generic 4.8.11) > WARNING: exception is not recoverable, can't continue > > This was reproduced in v4.10-rc6 as well, but I don't have a crash log > handy for that version right now. Sorry. We shouldn't be crashing. So we need to fix that. We may also need to increase the timeout, though it's pretty gross TBH. But step one is make sure we don't crash. cheers
On Mon, Feb 06, 2017 at 04:58:16PM -0200, Thiago Jung Bauermann wrote: > [ 447.714064] Querying DEAD? cpu 134 (134) shows 2 > cpu 0x86: Vector: 300 (Data Access) at [c000000007b0fd40] > pc: 000000001ec3072c > lr: 000000001ec2fee0 > sp: 1faf6bd0 > msr: 8000000102801000 > dar: 212d6c1a2a20c This looks like we accessed a bad address, but why? > dsisr: 42000000 > current = 0xc000000474c6d600 > paca = 0xc000000007b6b600 softe: 0 irq_happened: 0x01 > pid = 0, comm = swapper/134 > Linux version 4.8.0-34-generic (buildd@bos01-ppc64el-026) (gcc version 5.4.0 20160609 (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.4) ) #36~16.04.1-Ubuntu SMP Wed Dec 21 18:53:20 UTC 2016 (Ubuntu 4.8.0-34.36~16.04.1-generic 4.8.11) > WARNING: exception is not recoverable, can't continue > > This was reproduced in v4.10-rc6 as well, but I don't have a crash log > handy for that version right now. Sorry. > > This is a race between one CPU stopping and another one calling > pseries_cpu_die to wait for it to stop. That function does a short > busy loop calling RTAS query-cpu-stopped-state on the stopping CPU > to verify that it is stopped. > > As can be seen in the dmesg right before or after the "Querying DEAD?" > messages, if pseries_cpu_die waited a little longer it would have seen > the CPU in the stopped state. > > I see two cases that can be causing this race: > > 1. It's possible that CPU 134 was inactive at the time it was unplugged. > In that case, dlpar_offline_cpu calls H_PROD on the CPU and immediately > calls pseries_cpu_die. Meanwhile, the prodded CPU activates and start > the process of stopping itself. It's possible that the busy loop is not > long enough to allow for the CPU to wake up and complete the stopping > process. > 2. If CPU 134 was online at the time it was unplugged, it would have gone > through the new CPU hotplug state machine in kernel/cpu.c that was > introduced in v4.6 to get itself stopped. It's possible that the busy > loop in pseries_cpu_die was long enough for the older hotplug code but > not for the new hotplug state machine. > > Either way, the solution is the same: wait an adequate amount in > pseries_cpu_die. > > The simple solution is to increase the number of tries in the loop. > This was done to solve a similar problem in > commit 940ce422a367 ("powerpc/pseries: Increase cpu die timeout"), so > it's not as lame as it sounds. :-) > > Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> > --- > > Notes: > A solution that is probably better is to have pseries_cpu_die wait > on a per-CPU semaphore at the beginning of the function, before doing a > short busy loop. Then the CPU that is stopping unlocks that semaphore right > before stopping itself, probably at pseries_mach_cpu_die. > > What do you think? I can implement that if there is interest. > > arch/powerpc/platforms/pseries/hotplug-cpu.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c b/arch/powerpc/platforms/pseries/hotplug-cpu.c > index a1b63e00b2f7..3d43317eec1b 100644 > --- a/arch/powerpc/platforms/pseries/hotplug-cpu.c > +++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c > @@ -206,7 +206,7 @@ static void pseries_cpu_die(unsigned int cpu) > } > } else if (get_preferred_offline_state(cpu) == CPU_STATE_OFFLINE) { > > - for (tries = 0; tries < 25; tries++) { > + for (tries = 0; tries < 5000; tries++) { This fixes some of the asymmetry between handling of CPU_STATE_INACTIVE and CPU_STATE_OFFLINE, but I think we can probably move the cpu_relax() to msleep(1). Please also see 940ce42 powerpc/pseries: Increase cpu die timeout Balbir Singh.
Am Dienstag, 7. Februar 2017, 08:26:45 BRST schrieb Balbir Singh: > On Mon, Feb 06, 2017 at 04:58:16PM -0200, Thiago Jung Bauermann wrote: > > [ 447.714064] Querying DEAD? cpu 134 (134) shows 2 > > cpu 0x86: Vector: 300 (Data Access) at [c000000007b0fd40] > > > > pc: 000000001ec3072c > > lr: 000000001ec2fee0 > > sp: 1faf6bd0 > > > > msr: 8000000102801000 > > dar: 212d6c1a2a20c > > This looks like we accessed a bad address, but why? Am Dienstag, 7. Februar 2017, 13:10:22 BRST schrieb Michael Ellerman: > We shouldn't be crashing. > > So we need to fix that. > > We may also need to increase the timeout, though it's pretty gross TBH. > > But step one is make sure we don't crash. I didn't analyze exactly what is causing the CPU to crash because the root cause is the inconsistency between what the kernel thinks the CPU state is and reality. But if we have to be able to handle that inconsistency I will keep digging and try to fix that. > > --- a/arch/powerpc/platforms/pseries/hotplug-cpu.c > > +++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c > > @@ -206,7 +206,7 @@ static void pseries_cpu_die(unsigned int cpu) > > > > } > > > > } else if (get_preferred_offline_state(cpu) == CPU_STATE_OFFLINE) { > > > > - for (tries = 0; tries < 25; tries++) { > > + for (tries = 0; tries < 5000; tries++) { > > This fixes some of the asymmetry between handling of CPU_STATE_INACTIVE > and CPU_STATE_OFFLINE, but I think we can probably move the cpu_relax() > to msleep(1). I didn't change it to msleep() because I thought it would introduce a regression. commit b906cfa397fd ("powerpc/pseries: Fix cpu hotplug") changed a msleep(200) that was there to a cpu_relax() with this explanation: Currently, pseries_cpu_die() calls msleep() while polling RTAS for the status of the dying cpu. However, if the cpu that is going down also happens to be the one doing the tick then we're hosed as the tick_do_timer_cpu 'baton' is only passed later on in tick_shutdown() when _cpu_down() does the CPU_DEAD notification. Therefore jiffies won't be updated anymore. This replaces that msleep() with a cpu_relax() to make sure we're not going to schedule at that point. With this patch my test box survives a 100k iterations hotplug stress test on _all_ cpus, whereas without it, it quickly dies after ~50 iterations. I can try to add it back and see what happens. Perhaps that situation won't happen anymore with today's kernel. > Please also see > 940ce42 powerpc/pseries: Increase cpu die timeout Yes, that is the commit that I mentioned in the patch description.
Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> writes: > Am Dienstag, 7. Februar 2017, 08:26:45 BRST schrieb Balbir Singh: >> On Mon, Feb 06, 2017 at 04:58:16PM -0200, Thiago Jung Bauermann wrote: >> > [ 447.714064] Querying DEAD? cpu 134 (134) shows 2 >> > cpu 0x86: Vector: 300 (Data Access) at [c000000007b0fd40] >> > >> > pc: 000000001ec3072c >> > lr: 000000001ec2fee0 >> > sp: 1faf6bd0 >> > >> > msr: 8000000102801000 >> > dar: 212d6c1a2a20c >> >> This looks like we accessed a bad address, but why? > > Am Dienstag, 7. Februar 2017, 13:10:22 BRST schrieb Michael Ellerman: >> We shouldn't be crashing. >> >> So we need to fix that. >> >> We may also need to increase the timeout, though it's pretty gross TBH. >> >> But step one is make sure we don't crash. > > I didn't analyze exactly what is causing the CPU to crash because the root > cause is the inconsistency between what the kernel thinks the CPU state is and > reality. But if we have to be able to handle that inconsistency I will keep > digging and try to fix that. Please do. It's obviously not a good situation, but it shouldn't crash the kernel. I realise that may be easier said than done, but that should be the goal. I see smp_ops->cpu_die() can't fail (returns void), which may be part of the problem. >> > --- a/arch/powerpc/platforms/pseries/hotplug-cpu.c >> > +++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c >> > @@ -206,7 +206,7 @@ static void pseries_cpu_die(unsigned int cpu) >> > >> > } >> > >> > } else if (get_preferred_offline_state(cpu) == CPU_STATE_OFFLINE) { >> > >> > - for (tries = 0; tries < 25; tries++) { >> > + for (tries = 0; tries < 5000; tries++) { >> >> This fixes some of the asymmetry between handling of CPU_STATE_INACTIVE >> and CPU_STATE_OFFLINE, but I think we can probably move the cpu_relax() >> to msleep(1). > > I didn't change it to msleep() because I thought it would introduce a > regression. commit b906cfa397fd ("powerpc/pseries: Fix cpu hotplug") changed a > msleep(200) that was there to a cpu_relax() with this explanation: > > Currently, pseries_cpu_die() calls msleep() while polling RTAS for > the status of the dying cpu. > > However, if the cpu that is going down also happens to be the one > doing the tick then we're hosed as the tick_do_timer_cpu 'baton' is > only passed later on in tick_shutdown() when _cpu_down() does the > CPU_DEAD notification. Therefore jiffies won't be updated anymore. > > This replaces that msleep() with a cpu_relax() to make sure we're not > going to schedule at that point. > > With this patch my test box survives a 100k iterations hotplug stress > test on _all_ cpus, whereas without it, it quickly dies after ~50 > iterations. > > I can try to add it back and see what happens. Perhaps that situation won't > happen anymore with today's kernel. The relative timing of the events may have changed. But the fundamental problem still exists AFAIK. So we'd need a pretty solid guarantee that the problem can't happy anymore before I'd change it to msleep(). However it could/should at least use udelay(), which AFAIK is not reliant on jiffies. That way the loop will actually take a set amount of wall time, rather than just cycles. cheers
diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c b/arch/powerpc/platforms/pseries/hotplug-cpu.c index a1b63e00b2f7..3d43317eec1b 100644 --- a/arch/powerpc/platforms/pseries/hotplug-cpu.c +++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c @@ -206,7 +206,7 @@ static void pseries_cpu_die(unsigned int cpu) } } else if (get_preferred_offline_state(cpu) == CPU_STATE_OFFLINE) { - for (tries = 0; tries < 25; tries++) { + for (tries = 0; tries < 5000; tries++) { cpu_status = smp_query_cpu_stopped(pcpu); if (cpu_status == QCSS_STOPPED || cpu_status == QCSS_HARDWARE_ERROR)
When testing DLPAR CPU add/remove on a system under stress, pseries_cpu_die doesn't wait long enough for a CPU to die and the kernel ends up crashing: [ 446.143648] cpu 152 (hwid 152) Ready to die... [ 446.464057] cpu 153 (hwid 153) Ready to die... [ 446.473525] cpu 154 (hwid 154) Ready to die... [ 446.474077] cpu 155 (hwid 155) Ready to die... [ 446.483529] cpu 156 (hwid 156) Ready to die... [ 446.493532] cpu 157 (hwid 157) Ready to die... [ 446.494078] cpu 158 (hwid 158) Ready to die... [ 446.503527] cpu 159 (hwid 159) Ready to die... [ 446.664534] cpu 144 (hwid 144) Ready to die... [ 446.964113] cpu 145 (hwid 145) Ready to die... [ 446.973525] cpu 146 (hwid 146) Ready to die... [ 446.974094] cpu 147 (hwid 147) Ready to die... [ 446.983944] cpu 148 (hwid 148) Ready to die... [ 446.984062] cpu 149 (hwid 149) Ready to die... [ 446.993518] cpu 150 (hwid 150) Ready to die... [ 446.993543] Querying DEAD? cpu 150 (150) shows 2 [ 446.994098] cpu 151 (hwid 151) Ready to die... [ 447.133726] cpu 136 (hwid 136) Ready to die... [ 447.403532] cpu 137 (hwid 137) Ready to die... [ 447.403772] cpu 138 (hwid 138) Ready to die... [ 447.403839] cpu 139 (hwid 139) Ready to die... [ 447.403887] cpu 140 (hwid 140) Ready to die... [ 447.403937] cpu 141 (hwid 141) Ready to die... [ 447.403979] cpu 142 (hwid 142) Ready to die... [ 447.404038] cpu 143 (hwid 143) Ready to die... [ 447.513546] cpu 128 (hwid 128) Ready to die... [ 447.693533] cpu 129 (hwid 129) Ready to die... [ 447.693999] cpu 130 (hwid 130) Ready to die... [ 447.703530] cpu 131 (hwid 131) Ready to die... [ 447.704087] Querying DEAD? cpu 132 (132) shows 2 [ 447.704102] cpu 132 (hwid 132) Ready to die... [ 447.713534] cpu 133 (hwid 133) Ready to die... [ 447.714064] Querying DEAD? cpu 134 (134) shows 2 cpu 0x86: Vector: 300 (Data Access) at [c000000007b0fd40] pc: 000000001ec3072c lr: 000000001ec2fee0 sp: 1faf6bd0 msr: 8000000102801000 dar: 212d6c1a2a20c dsisr: 42000000 current = 0xc000000474c6d600 paca = 0xc000000007b6b600 softe: 0 irq_happened: 0x01 pid = 0, comm = swapper/134 Linux version 4.8.0-34-generic (buildd@bos01-ppc64el-026) (gcc version 5.4.0 20160609 (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.4) ) #36~16.04.1-Ubuntu SMP Wed Dec 21 18:53:20 UTC 2016 (Ubuntu 4.8.0-34.36~16.04.1-generic 4.8.11) WARNING: exception is not recoverable, can't continue This was reproduced in v4.10-rc6 as well, but I don't have a crash log handy for that version right now. Sorry. This is a race between one CPU stopping and another one calling pseries_cpu_die to wait for it to stop. That function does a short busy loop calling RTAS query-cpu-stopped-state on the stopping CPU to verify that it is stopped. As can be seen in the dmesg right before or after the "Querying DEAD?" messages, if pseries_cpu_die waited a little longer it would have seen the CPU in the stopped state. I see two cases that can be causing this race: 1. It's possible that CPU 134 was inactive at the time it was unplugged. In that case, dlpar_offline_cpu calls H_PROD on the CPU and immediately calls pseries_cpu_die. Meanwhile, the prodded CPU activates and start the process of stopping itself. It's possible that the busy loop is not long enough to allow for the CPU to wake up and complete the stopping process. 2. If CPU 134 was online at the time it was unplugged, it would have gone through the new CPU hotplug state machine in kernel/cpu.c that was introduced in v4.6 to get itself stopped. It's possible that the busy loop in pseries_cpu_die was long enough for the older hotplug code but not for the new hotplug state machine. Either way, the solution is the same: wait an adequate amount in pseries_cpu_die. The simple solution is to increase the number of tries in the loop. This was done to solve a similar problem in commit 940ce422a367 ("powerpc/pseries: Increase cpu die timeout"), so it's not as lame as it sounds. :-) Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> --- Notes: A solution that is probably better is to have pseries_cpu_die wait on a per-CPU semaphore at the beginning of the function, before doing a short busy loop. Then the CPU that is stopping unlocks that semaphore right before stopping itself, probably at pseries_mach_cpu_die. What do you think? I can implement that if there is interest. arch/powerpc/platforms/pseries/hotplug-cpu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)